Dell PERC Question (Server Down)



  • @BRRABill said:

    Turns out xByte thinks it IS the drives.

    Working with them now to figure out a fix, which is definitely going to involve new DELL-branded drives.

    Why is ML always right? 😉

    Because the collective is always better than the individual? "You will be assimilated."



  • @BRRABill said:

    Turns out xByte thinks it IS the drives.

    Working with them now to figure out a fix, which is definitely going to involve new DELL-branded drives.

    Why is ML always right? 😉

    Wait - what? Why? Does xByte no longer believe in their SSDs?



  • @Dashrender said:

    Wait - what? Why? Does xByte no longer believe in their SSDs?

    I would not say that.

    I'm not really sure WHAT is going on. I did ask for clarification as to what they think the issue is.

    Now I just pray the thing stays up until I can get a new array in there. I'm probably going to start a "which hard drive" thread as well as a "how do I copy this XS instance" thread. 🙂



  • If you were building a SuperMicro server I might suggest something like the Samsung 850 Pro drives. Under-provision them by 20% and you'll statistically be fine for the life of a standard server.

    So I wonder - where is the issue with the Edge drives.



  • @Dashrender said:

    So I wonder - where is the issue with the Edge drives.

    I think it is something with how the DELL servers talk to them.

    The DELLs only like it when their drives are in there. The DELL tech I spoke with said that if the drives don't return exactly what the PERC is looking for, it can offline the array, and that the error I saw is almost always a drive issue.

    I'm still waiting to hear back from my rep (Brad) at xByte on the specifics of why they think this did not work.

    Who (whom?) is the main xByte contact here at ML? Maybe we can loop them in.



  • @Dashrender said:

    If you were building a SuperMicro server I might suggest something like the Samsung 850 Pro drives. Under-provision them by 20% and you'll statistically be fine for the life of a standard server.

    That was the first thing I was told. If you buy DELL, stick with the ecosystem. But I felt confident the EDGE drives would be OK.



  • @BRRABill said:

    @Dashrender said:

    If you were building a SuperMicro server I might suggest something like the Samsung 850 Pro drives. Under-provision them by 20% and you'll statistically be fine for the life of a standard server.

    That was the first thing I was told. If you buy DELL, stick with the ecosystem. But I felt confident the EDGE drives would be OK.

    Stick with ecosystem = yes. But - as I understand it - and frankly I can't believe no one from xByte has jumped in here yet - xByte had the EDGE drives built to answer the exact calls coming from the PERC cards so they basically look like DELL drives to the PERC cards. Is that not the case? Psst.. that's not for you to answer, that's for xByte to answer.



  • I'll ping my rep. 🙂

    @BradfromxByte


  • Vendor

    Hi @BRRABill , who is your rep at xByte? I am going to look into this.



  • @BRRABill said:

    I'll ping my rep. 🙂

    @BradfromxByte

    @BradfromxByte is awesome.



  • @Lyndsie_xByte

    HI:

    It is Brad. I just spoke with him on the phone. (We've been back and forth all morning.)

    He took care of it from a customer service aspect 100% to my liking. (Full refund for the drives.) He said he was going to get someone else to jump on here to explain what happened.



  • @JaredBusch said:

    @BradfromxByte is awesome.

    Yeah, he has been awesome so far, in my limited dealings with xByte. Really above and beyond.

    They are going to 100% refund the drives. So now I just have to decide what to upgrade to.

    They offered to send out replacement SSDs, but I'm not sure if I trust that route.

    I'll wait until someone techy from xByte pops on to describe what happened, and how they feel about trying another SSD.



  • @Lyndsie_xByte

    If you (or someone else) could speak to why this happened, and if trying another set of SSDs might help, that would be helpful for me, and also for the ML community looking to buy SSDs in the future.


  • Vendor

    @JaredBusch Thanks!



  • We are reaching out to Edge so they can reply directly to this thread.



  • @ryan-from-xbyte said:

    We are reaching out to Edge so they can reply directly to this thread.

    Great.

    Maybe it is "fixable" and I won't have to do anything. (Fingers crossed.)



  • Hello @BRRABill. My name is Justin Leskovsky and I work for EDGE Memory. After reading over the issue you described, I am inclined to agree with the Dell representative that this error was most likely just a fluke. That being said, I did have a couple of questions for you:

    1. You mentioned re-seating the drives in the system. Are the EDGE SSDs currently being recognized by the PERC H710 controller after you re-seated them?
    2. Assuming that the drives are currently recognized by the system, I saw that it was suggested that you go ahead and import the foreign configuration that was recognized by the controller. Did you attempt this import process? Was it successfully able to correct the issue?


  • @jleskovsky said:

    Hello @BRRABill. My name is Justin Leskovsky and I work for EDGE Memory. After reading over the issue you described, I am inclined to agree with the Dell representative that this error was most likely just a fluke. That being said, I did have a couple of questions for you:

    1. You mentioned re-seating the drives in the system. Are the EDGE SSDs currently being recognized by the PERC H710 controller after you re-seated them?
    2. Assuming that the drives are currently recognized by the system, I saw that it was suggested that you go ahead and import the foreign configuration that was recognized by the controller. Did you attempt this import process? Was it successfully able to correct the issue?

    I can kind of answer both the questions at the same time.

    After importing the foreign configuration, the array came back up, and the drives were again recognized by the PERC.

    They were NOT recognized at the hardware level (by iDRAC) until the config was re-imported.

    xByte seemed to think it was a drive issue. The DELL rep, while he said it might have been a fluke, said that error almost always happens with faulty drives.

    So, you think it was just a fluke?



  • @jleskovsky

    BTW: welcome to MangoLassi!



  • Thank you for hopping in to answer questions @jleskovsky



  • Based on all the testing that I have done personally on Dell blade and rack servers here in-house at EDGE, I've seen a similar issue only once before, and it was after a PERC controller firmware update on a 12th Gen R720xd. Like your situation, I was able to simply import the "foreign" configuration and the system was back to business as usual. It's actually still up and running Ubuntu now, months later, without giving any indication that the "error" ever even occurred.

    Given that the issue appears to currently be resolved, I would still agree with Dell that this could easily be a one-off fluke. That being said, please keep an eye on it and please update this thread if anything unusual occurs with those drives. You can also e-mail me directly at [email protected] and I will be more than happy to help in any way I can to get the situation resolved for you. Thanks!



  • @jleskovsky said:

    Given that the issue appears to currently be resolved, I would still agree with Dell that this could easily be a one-off fluke. That being said, please keep an eye on it and please update this thread if anything unusual occurs with those drives. You can also e-mail me directly at [email protected] and I will be more than happy to help in any way I can to get the situation resolved for you. Thanks!

    OK, let's keep our fingers crossed and hope for the best. I will keep you (and ML) updated.

    I did update the iDRAC on Friday and it did not require a reboot. Who knows ... maybe that did something.



  • @BRRABill said:

    @jleskovsky said:

    Given that the issue appears to currently be resolved, I would still agree with Dell that this could easily be a one-off fluke. That being said, please keep an eye on it and please update this thread if anything unusual occurs with those drives. You can also e-mail me directly at [email protected] and I will be more than happy to help in any way I can to get the situation resolved for you. Thanks!

    OK, let's keep our fingers crossed and hope for the best. I will keep you (and ML) updated.

    I did update the iDRAC on Friday and it did not require a reboot. Who knows ... maybe that did something.

    updating iLo on my HP server caused the fans to spin up and down constantly... only solution - downgrade firmware on iLo.



  • @Dashrender said:

    updating iLo on my HP server caused the fans to spin up and down constantly... only solution - downgrade firmware on iLo.

    I'm not blaming it, I guess. Just thought it was weird it didn't reboot the server. And in about 24 hours the issue happened.

    I've run that SSD array up in testing for weeks with no problems.

    Well, hopefully it was a fluke. @scottalanmiller said flukes happen all the time.



  • @BRRABill said:

    @Dashrender said:

    updating iLo on my HP server caused the fans to spin up and down constantly... only solution - downgrade firmware on iLo.

    I'm not blaming it, I guess. Just thought it was weird it didn't reboot the server. And in about 24 hours the issue happened.

    I've run that SSD array up in testing for weeks with no problems.

    Well, hopefully it was a fluke. @scottalanmiller said flukes happen all the time.

    iLo and iDrac are completely independent from the servers. They are designed to allow you access to the system regardless of the system's state. Through iLo and iDrac you can mount an ISO from your desktop/laptop as if it were a DVD-ROM and boot the server, so you can install it completely remotely, etc.

    The general idea is that IT personnel generally stay out of DCs and bench techs take care of the hardware, cabinets, etc. in the DC.



  • @Dashrender said:

    iLo and iDrac are completely independent from the servers.

    Suuuuuuuuuuuuure.....



  • @BRRABill said:

    @jleskovsky said:

    Given that the issue appears to currently be resolved, I would still agree with Dell that this could easily be a one-off fluke. That being said, please keep an eye on it and please update this thread if anything unusual occurs with those drives. You can also e-mail me directly at [email protected] and I will be more than happy to help in any way I can to get the situation resolved for you. Thanks!

    OK, let's keep our fingers crossed and hope for the best. I will keep you (and ML) updated.

    I did update the iDRAC on Friday and it did not require a reboot. Who knows ... maybe that did something.

    Can't require a reboot, it is its own computer with its own CPU, memory, firmware, etc. If you reboot the server, the iLo or iDrac does not reboot. They stay on even when the system is powered down.



  • @scottalanmiller said:

    @BRRABill said:

    @jleskovsky said:

    Given that the issue appears to currently be resolved, I would still agree with Dell that this could easily be a one-off fluke. That being said, please keep an eye on it and please update this thread if anything unusual occurs with those drives. You can also e-mail me directly at [email protected] and I will be more than happy to help in any way I can to get the situation resolved for you. Thanks!

    OK, let's keep our fingers crossed and hope for the best. I will keep you (and ML) updated.

    I did update the iDRAC on Friday and it did not require a reboot. Who knows ... maybe that did something.

    Can't require a reboot, it is its own computer with its own CPU, memory, firmware, etc. If you reboot the server, the iLo or iDrac does not reboot. They stay on even when the system is powered down.

    Assuming the server has power.



  • Well, everything has been OK thus far. We shall see.

    Gremlins, perhaps.



  • @jleskovsky

    Is there any issue with updating any of the firmware/BIOS of the DELL server? Is there ever a chance that might mess with the co-operation between the EDGE drive and the server?

