
    Invalid Drive Movement from HP SmartArray P411 RAID Controller with StorageWorks MSA60

    Tags: raid, das, storageworks msa60, hpe, smartarray p411, smartarray, hewlett-packard, storage
    24 Posts 4 Posters 7.7k Views
    • S
      Shuey @travisdh1
      last edited by

      @travisdh1 said in "Invalid Drive Movement" (HP Smart Array P411):

      Any number of things. Do you schedule reboots on all your equipment? If not you really should for just this reason. The one server we have, XS decided the array wasn't ready in time and didn't mount the main storage volume on boot. Always nice to know these things ahead of time, right?

      I actually rebooted this server multiple times about a month ago when I installed updates on it. The reboots went fine. We also completely powered that server down at around the same time because I added more RAM to it. Again, after powering everything back on, the server and raid array information was all intact.

      • scottalanmillerS
        scottalanmiller @Shuey
        last edited by

        @Shuey said in "Invalid Drive Movement" (HP Smart Array P411):

        @travisdh1 said in "Invalid Drive Movement" (HP Smart Array P411):

        Any number of things. Do you schedule reboots on all your equipment? If not you really should for just this reason. The one server we have, XS decided the array wasn't ready in time and didn't mount the main storage volume on boot. Always nice to know these things ahead of time, right?

        I actually rebooted this server multiple times about a month ago when I installed updates on it. The reboots went fine. We also completely powered that server down at around the same time because I added more RAM to it. Again, after powering everything back on, the server and raid array information was all intact.

        Does your normal reboot schedule of your server include a reboot of the MSA? Could it be that they were powered back on in the incorrect order? MSAs are notoriously flaky, likely that is where the issue is.

        I'd call HPE support. The MSA is a flaky unit but HPE support is quite good.

        • S
          Shuey @scottalanmiller
          last edited by Shuey

          @scottalanmiller said in "Invalid Drive Movement" (HP Smart Array P411):

          @Shuey said in "Invalid Drive Movement" (HP Smart Array P411):

          I actually rebooted this server multiple times about a month ago when I installed updates on it. The reboots went fine. We also completely powered that server down at around the same time because I added more RAM to it. Again, after powering everything back on, the server and raid array information was all intact.

          Does your normal reboot schedule of your server include a reboot of the MSA? Could it be that they were powered back on in the incorrect order? MSAs are notoriously flaky, likely that is where the issue is.

          I'd call HPE support. The MSA is a flaky unit but HPE support is quite good.

          We unfortunately don't have a "normal reboot schedule" for ANY of our servers :-/...

          I'm not even sure what the correct order is :-S... I would assume that the MSA would get powered on first, then the ESXi host. If this is correct, we have already tried doing that since we first discovered this issue today, and the issue remains :(.

          We don't have a support contract on this server or the attached MSA, and they're likely way out of warranty (ProLiant DL360 G8 and a StorageWorks MSA60), so I'm not sure how much we'd have to spend in order to get HP to "help" us :-S...

          • scottalanmillerS
            scottalanmiller @Shuey
            last edited by

            @Shuey said in "Invalid Drive Movement" (HP Smart Array P411):

            @scottalanmiller said in "Invalid Drive Movement" (HP Smart Array P411):

            @Shuey said in "Invalid Drive Movement" (HP Smart Array P411):

            I actually rebooted this server multiple times about a month ago when I installed updates on it. The reboots went fine. We also completely powered that server down at around the same time because I added more RAM to it. Again, after powering everything back on, the server and raid array information was all intact.

            Does your normal reboot schedule of your server include a reboot of the MSA? Could it be that they were powered back on in the incorrect order? MSAs are notoriously flaky, likely that is where the issue is.

            I'd call HPE support. The MSA is a flaky unit but HPE support is quite good.

            We unfortunately don't have a "normal reboot schedule" for ANY of our servers :-/...

            I should not have said schedule. I should have said your "Normal reboot process." Regardless of the regularity of the reboots, is the process a standard one?

            • scottalanmillerS
              scottalanmiller @Shuey
              last edited by

              @Shuey said in "Invalid Drive Movement" (HP Smart Array P411):

              I'm not even sure what the correct order is :-S... I would assume that the MSA would get powered on first, then the ESXi host. If this is correct, we have already tried doing that since we first discovered this issue today, and the issue remains :(.

              You are correct, the MSA needs to be up first.
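A minimal sketch of that ordering rule, for anyone who wants to write the power-up procedure down. The device names and the dependency table are assumptions made for illustration; the thread only states that the MSA must be up before the host. The reverse order for shutdown is common practice rather than something stated here.

```python
from graphlib import TopologicalSorter

# Hypothetical device names. Each key maps to the set of devices
# that must already be powered on before it is started.
POWER_DEPENDENCIES = {
    "MSA60 enclosure": set(),
    "ESXi host": {"MSA60 enclosure"},
}


def power_on_order(dependencies):
    """Return device names in an order that satisfies every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


def power_off_order(dependencies):
    """Assumed shutdown order: the reverse of power-on (host before storage)."""
    return list(reversed(power_on_order(dependencies)))


if __name__ == "__main__":
    for step, device in enumerate(power_on_order(POWER_DEPENDENCIES), start=1):
        print(f"{step}. power on {device}")
```

With more enclosures or hosts, adding entries to the table is enough; the sorter raises `graphlib.CycleError` if the dependencies contradict each other.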

              • scottalanmillerS
                scottalanmiller @Shuey
                last edited by

                @Shuey said in "Invalid Drive Movement" (HP Smart Array P411):

                We don't have a support contract on this server or the attached MSA, and they're likely way out of warranty (ProLiant DL360 G8 and a StorageWorks MSA60), so I'm not sure how much we'd have to spend in order to get HP to "help" us :-S...

                A bit. Why is there an MSA out of contract? The only benefit to an MSA is the support contract. Not that that makes it worth it, but proprietary storage requires a warranty contract to be viable. The rule is that any storage of that nature needs to be decommissioned the day before the support contract runs out because there isn't necessarily any path to recovery in the event of an "incident" without one. It's not a standard server that you can just fix yourself with third party parts. Sometimes you can, but as it is a closed, proprietary system, you are generally totally dependent on your support contract from the vendor to keep it working.

                There is a good chance that this is a "replace the MSA and restore from backup" situation in that case.

                • scottalanmillerS
                  scottalanmiller
                  last edited by

                  Because the scenario that you are in is not one that should arise, I am going to guess that tracking down info on this will be difficult. But here is something that I found. Worth trying while we look for something more to help.

                  [Image: Screenshot from 2016-10-08 19-37-28.png]

                  • scottalanmillerS
                    scottalanmiller
                    last edited by

                    I see that you have this posted here as well: http://serverfault.com/questions/807892/how-to-recover-from-invalid-drive-movement-hp-smartarray-p411

                    • scottalanmillerS
                      scottalanmiller
                      last edited by

                      Hopefully the controller offers an option to continue even with the invalid drive movement, but it might not. Updating the firmware might enable that, or might not.

                      • Reid CooperR
                        Reid Cooper
                        last edited by

                        If you have no support options and get desperate, you could try some really desperate things like wiping the array controller and forcing it to pick up the array as if it had never had drives before. That might work, but it is risky and I would not do it unless you have exhausted other options.

                        • S
                          Shuey @scottalanmiller
                          last edited by

                          @scottalanmiller said in "Invalid Drive Movement" (HP Smart Array P411):

                          @Shuey said in "Invalid Drive Movement" (HP Smart Array P411):

                          @scottalanmiller said in "Invalid Drive Movement" (HP Smart Array P411):

                          @Shuey said in "Invalid Drive Movement" (HP Smart Array P411):

                          I actually rebooted this server multiple times about a month ago when I installed updates on it. The reboots went fine. We also completely powered that server down at around the same time because I added more RAM to it. Again, after powering everything back on, the server and raid array information was all intact.

                          Does your normal reboot schedule of your server include a reboot of the MSA? Could it be that they were powered back on in the incorrect order? MSAs are notoriously flaky, likely that is where the issue is.

                          I'd call HPE support. The MSA is a flaky unit but HPE support is quite good.

                          We unfortunately don't have a "normal reboot schedule" for ANY of our servers :-/...

                          I should not have said schedule. I should have said your "Normal reboot process." Regardless of the regularity of the reboots, is the process a standard one?

                          I'm not sure we have a "standard"... we only reboot this particular ESXi host when absolutely necessary, and this weekend is possibly the first time we've rebooted the MSA in a year or more :-S...

                          • S
                            Shuey @scottalanmiller
                            last edited by

                            @scottalanmiller said in "Invalid Drive Movement" (HP Smart Array P411):

                            @Shuey said in "Invalid Drive Movement" (HP Smart Array P411):

                            We don't have a support contract on this server or the attached MSA, and they're likely way out of warranty (ProLiant DL360 G8 and a StorageWorks MSA60), so I'm not sure how much we'd have to spend in order to get HP to "help" us :-S...

                            A bit. Why is there an MSA out of contract? The only benefit to an MSA is the support contract. Not that that makes it worth it, but proprietary storage requires a warranty contract to be viable. The rule is that any storage of that nature needs to be decommissioned the day before the support contract runs out because there isn't necessarily any path to recovery in the event of an "incident" without one. It's not a standard server that you can just fix yourself with third party parts. Sometimes you can, but as it is a closed, proprietary system, you are generally totally dependent on your support contract from the vendor to keep it working.

                            There is a good chance that this is a "replace the MSA and restore from backup" situation in that case.

                            Unfortunately, my company's philosophy on "investing in IT infrastructure" goes like this: "We'll spend hundreds to thousands of dollars every time our PACS vendor tells us they need it. Then, when they say that they need to upgrade their equipment, we'll re-purpose their old stuff for the rest of our production environment (because we don't understand the importance of spending money on the rest of our infrastructure, and we don't trust the knowledgeable people we hired in our IT department)"

                            • scottalanmillerS
                              scottalanmiller @Shuey
                              last edited by

                              @Shuey said in "Invalid Drive Movement" (HP Smart Array P411):

                              @scottalanmiller said in "Invalid Drive Movement" (HP Smart Array P411):

                              @Shuey said in "Invalid Drive Movement" (HP Smart Array P411):

                              @scottalanmiller said in "Invalid Drive Movement" (HP Smart Array P411):

                              @Shuey said in "Invalid Drive Movement" (HP Smart Array P411):

                              I actually rebooted this server multiple times about a month ago when I installed updates on it. The reboots went fine. We also completely powered that server down at around the same time because I added more RAM to it. Again, after powering everything back on, the server and raid array information was all intact.

                              Does your normal reboot schedule of your server include a reboot of the MSA? Could it be that they were powered back on in the incorrect order? MSAs are notoriously flaky, likely that is where the issue is.

                              I'd call HPE support. The MSA is a flaky unit but HPE support is quite good.

                              We unfortunately don't have a "normal reboot schedule" for ANY of our servers :-/...

                              I should not have said schedule. I should have said your "Normal reboot process." Regardless of the regularity of the reboots, is the process a standard one?

                              I'm not sure we have a "standard"... we only reboot this particular ESXi host when absolutely necessary, and this weekend is possibly the first time we've rebooted the MSA in a year or more :-S...

                              Sadly it is too late now, but for the future, consider these things...

                              • A monthly reboot, at the least, of everything, not just some components, lets you test that things are really working, and at a time when you can best fix them.
                              • Avoid devices like the MSA in general, they add a lot of risk fundamentally.
                              • Avoid any proprietary "black box" system that is out of support. While these systems can be good when under support, the moment that they are out of support their value hits a literal zero. They are effectively bricks. Would you consider running the business on a junk consumer QNAP device? This device when out of support is far worse.
                              • scottalanmillerS
                                scottalanmiller @Shuey
                                last edited by

                                @Shuey said in "Invalid Drive Movement" (HP Smart Array P411):

                                @scottalanmiller said in "Invalid Drive Movement" (HP Smart Array P411):

                                @Shuey said in "Invalid Drive Movement" (HP Smart Array P411):

                                We don't have a support contract on this server or the attached MSA, and they're likely way out of warranty (ProLiant DL360 G8 and a StorageWorks MSA60), so I'm not sure how much we'd have to spend in order to get HP to "help" us :-S...

                                A bit. Why is there an MSA out of contract? The only benefit to an MSA is the support contract. Not that that makes it worth it, but proprietary storage requires a warranty contract to be viable. The rule is that any storage of that nature needs to be decommissioned the day before the support contract runs out because there isn't necessarily any path to recovery in the event of an "incident" without one. It's not a standard server that you can just fix yourself with third party parts. Sometimes you can, but as it is a closed, proprietary system, you are generally totally dependent on your support contract from the vendor to keep it working.

                                There is a good chance that this is a "replace the MSA and restore from backup" situation in that case.

                                Unfortunately, my company's philosophy on "investing in IT infrastructure" goes like this: "We'll spend hundreds to thousands of dollars every time our PACS vendor tells us they need it. Then, when they say that they need to upgrade their equipment, we'll re-purpose their old stuff for the rest of our production environment (because we don't understand the importance of spending money on the rest of our infrastructure, and we don't trust the knowledgeable people we hired in our IT department)"

                                Simply explain that an unsupported MSA is a dead device, totally useless. When asked to use it, explain that it's not even something you'd play around with at home.

                                Even a brand new, supported MSA falls below my home line. But once out of support, it's below any home line.

                                http://www.smbitjournal.com/2014/11/the-home-line/

                                • scottalanmillerS
                                  scottalanmiller
                                  last edited by

                                  What have you been trying thus far? What's your current triage strategy assuming that we can't fix this?

                                  • scottalanmillerS
                                    scottalanmiller
                                    last edited by

                                    Edited to add tags and upgrade the title for SEO and rapid visual determination.

                                    • S
                                      Shuey
                                      last edited by

                                      You guys are not going to believe this...

                                      First I attempted a fresh cold boot of the existing MSA, waited a couple of minutes, then powered up the ESXi host, but the issue remained. I then shut down the host and MSA, moved the drives into our spare MSA, powered it up, waited a couple of minutes, then powered up the ESXi host; the issue still remained.

                                      At that point, I figured I was pretty much screwed, and there was nothing during the initialization of the RAID controller where I had an option to re-enable a failed logical drive. So I booted into the RAID config, verified again that there were no logical drives present, and I created a new logical drive (RAID 1+0 with two spare drives; same as we did about 2 years ago when we first set up this host and storage).
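For reference, a rough sketch of the capacity arithmetic behind a "RAID 1+0 with two spare drives" layout. The drive count and drive size in the example are hypothetical; the thread does not state them. The function assumes identical drives and the usual rule that RAID 1+0 needs an even number of data drives, at least four.

```python
def raid10_usable_tb(total_drives, hot_spares, drive_size_tb):
    """Usable capacity of a RAID 1+0 set: half of the non-spare drives."""
    data_drives = total_drives - hot_spares
    if hot_spares < 0 or data_drives < 4 or data_drives % 2 != 0:
        raise ValueError("RAID 1+0 needs an even number of data drives, at least 4")
    # Every drive is mirrored, so only half the data drives hold unique data.
    return (data_drives // 2) * drive_size_tb


if __name__ == "__main__":
    # Hypothetical example: 12 bays filled with 2 TB drives, 2 kept as spares.
    print(raid10_usable_tb(total_drives=12, hot_spares=2, drive_size_tb=2.0))
```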

                                      Then I let the server boot back into vSphere and I accessed it via vCenter. The first thing I did was remove the host from inventory, then re-add it (I was hoping to clear all the inaccessible guest VMs this way, but it didn't clear them from the inventory). Once the host was back in my inventory, I removed each of the guest VMs one at a time. Once the inventory was cleared, I verified that no datastore existed and that the disks were basically ready and waiting as "data disks". So I went ahead and created a new datastore (again, same as we did a couple of years ago, using VMFS). I was eventually prompted to specify a mount option and I had the option of "keep the existing signature". At this point, I figured it'd be worth a shot to keep the signature - if things didn't work out, I could always blow it away and re-create the datastore again. After I finished the process of building the datastore with the keep signature option, I tried navigating to the datastore to see if anything was in it - it appeared empty. Just out of curiosity, I SSH'd to the host and checked from there, and to my surprise, I could see all my old data and all my old guest VMs! I went back into vCenter and re-scanned storage and refreshed the console, and all of our old guest VMs were there! I re-registered each VM and was able to recover everything! All of our guest VMs are back up and successfully communicating on the network.
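The re-registration step can be tedious with many guests. Below is a hedged sketch, not what was actually run in this thread, that walks a datastore directory and prints one `vim-cmd solo/registervm` line per `.vmx` file for review before running them on the host. The datastore path in the example and the one-folder-per-VM layout are assumptions.

```python
from pathlib import Path
import shlex


def registervm_commands(datastore_root):
    """Build one `vim-cmd solo/registervm` line per .vmx file found.

    Assumes the usual layout of one folder per VM directly under the
    datastore root, e.g. <root>/<vm-name>/<vm-name>.vmx.
    """
    root = Path(datastore_root)
    vmx_files = sorted(root.glob("*/*.vmx"))
    return [f"vim-cmd solo/registervm {shlex.quote(str(p))}" for p in vmx_files]


if __name__ == "__main__":
    # Hypothetical datastore name; substitute the real mount point.
    for line in registervm_commands("/vmfs/volumes/datastore1"):
        print(line)
```

Printing the commands rather than executing them keeps the script read-only, which seems the safer choice on storage that has just been recovered.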

                                      I think most people in the IT community would agree that the chances of having something like this happen are extremely low to impossible.

                                      As far as I'm concerned, this was a miracle of God...

                                      • scottalanmillerS
                                        scottalanmiller
                                        last edited by

                                        That is seriously amazing!

                                        • scottalanmillerS
                                          scottalanmiller
                                          last edited by

                                          Next step... get local drives and decom that MSA60. It just sent a shot across your bow and has exposed how dangerous and precarious it is. Don't fail to heed its warning.

                                          • S
                                            Shuey @scottalanmiller
                                            last edited by

                                            @scottalanmiller said in Invalid Drive Movement from HP SmartArray P411 RAID Controller with StorageWorks MSA60:

                                            Next step... get local drives and decom that MSA60. It just sent a shot across your bow and has exposed how dangerous and precarious it is. Don't fail to heed its warning.

                                            Absolutely, Scott! I'm gonna be talking more with my boss about this as soon as possible!
