
    ESXi recovery woes

    • C
      Carnival Boy last edited by

      Four ESXi hosts, two running ESXi version 5.1 (HP Proliant DL380 G6s) and two running version 5.5 (DL380 Gen9s). I have a VM running Windows 2008R2 and a document management system called Meridian that uses a proprietary database called Hypertrieve.

      I take an online backup of the VM.

      If I restore the backup to any of the hosts, the VM appears to boot fine and will try to recover the database. The recovery works fine on the two hosts running version 5.1, but fails on the two hosts running 5.5.

      I’ve compared the Event Logs of both machines. They start off similar, with events like:
      “Start Restore Database (0)”
      “Start copying snapshot back (0)”
      “End copying snapshot back (0)”
      “Restore: database opened (0)”
      “Restore : start truncate log (0)”
      “Restore: end truncate log (0 bytes discarded) (0)”

      But then, the good host logs this:
      “Restore: end processing log (0)”
      “Restore: complete”

      But the bad host logs this:
      “Error: File locked (133) OurDatabase.HDB”

      I appreciate none of you will know the specifics of how a Hypertrieve database restores itself from an unclean shutdown, but I was wondering if any of you had a clue as to why there is inconsistency depending on where the VM is restored to. I'm really at a loss as to where to look. I'd always previously assumed that if the server will boot it will work the same regardless of which version of ESXi it is or which host it is on as the VM should operate transparently to the hypervisor.

      The software vendor simply says they don't support hypervisor issues, so I'm kind of on my own on this (nice).
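To make the comparison of the two event logs concrete, here is a minimal Python sketch that finds the first point where they diverge. It assumes the leading events are identical on both hosts (the post only says they "start off similar"), and the function name `first_divergence` is made up for illustration.

```python
from itertools import zip_longest

def first_divergence(good, bad):
    """Return (index, good_event, bad_event) for the first differing entry, or None."""
    for i, (g, b) in enumerate(zip_longest(good, bad)):
        if g != b:
            return i, g, b
    return None

# Events quoted in the post; assumed to be identical on both hosts up to this point.
common = [
    "Start Restore Database (0)",
    "Start copying snapshot back (0)",
    "End copying snapshot back (0)",
    "Restore: database opened (0)",
    "Restore : start truncate log (0)",
    "Restore: end truncate log (0 bytes discarded) (0)",
]
good_host = common + ["Restore: end processing log (0)", "Restore: complete"]
bad_host = common + ["Error: File locked (133) OurDatabase.HDB"]

result = first_divergence(good_host, bad_host)
print(result)
```

On the quoted events, the divergence lands right after the log truncation step, where the good host goes on to process the log and the bad host reports the file lock.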

      • scottalanmiller
        scottalanmiller last edited by

        So the real answer is that I have no idea. But, just guessing: if this is happening repeatably and reliably, then the ESXi 5.5 snapshot mechanism is just different enough that it causes the database to see a different type of corruption. In both cases you are only crash consistent, which is common and expected. As for why they recover differently, I'm guessing that the imaging agent changed, but I don't know how.

        • scottalanmiller
          scottalanmiller @Carnival Boy last edited by

          @Carnival-Boy said in ESXi recovery woes:

          The software vendor simply says they don't support hypervisor issues, so I'm kind of on my own on this (nice).

          No problem, explain to him that this is a storage issue and has nothing to do with the hypervisor. The identical thing would happen if you were doing SAN snaps of a running VM.

          • Dashrender
            Dashrender last edited by

            @scottalanmiller said in ESXi recovery woes:

            So the real answer is that I have no idea. But, just guessing: if this is happening repeatably and reliably, then the ESXi 5.5 snapshot mechanism is just different enough that it causes the database to see a different type of corruption. In both cases you are only crash consistent, which is common and expected. As for why they recover differently, I'm guessing that the imaging agent changed, but I don't know how.

            Wait a second. Are the snaps being taken on 5.1 only, or are they being taken on both?

            I guess the way I read it was that the backups (what I assume Scott means by a snap) were taken on a 5.1 machine.

            Scott, are you saying that the data backed up from 5.1 would somehow be different when restored on 5.5? Even if you aren't saying that, why would the way snaps are taken make any difference? The data is taken on a 5.1 server, backed up to some media, then restored from that media onto a 5.5 server - why would the snapping tech be involved here?

            • scottalanmiller
              scottalanmiller last edited by

              Sorry, I was thinking that they were being snapped on 5.1 AND 5.5 and restored to what they were snapped from.

              • coliver
                coliver last edited by

                Really the question is how does the database lock? It sounds like there is some means of locking the database that is specific to a VM. Does the MAC address change when you try and restore to ESXi 5.5?

                • scottalanmiller
                  scottalanmiller @coliver last edited by

                  @coliver said in ESXi recovery woes:

                  Really the question is how does the database lock? It sounds like there is some means of locking the database that is specific to a VM. Does the MAC address change when you try and restore to ESXi 5.5?

                  Nothing would be specific to a VM. The virtual nature here is a red herring. This is purely about storage, at least as a root cause for the corruption. Why it sees the resulting storage differently in the two cases is likely virtualization related. But the database corruption in the first place is caused purely by the storage.

                  • Dashrender
                    Dashrender last edited by

                    How is the storage different? Not that I disagree, just looking for more information.

                    • scottalanmiller
                      scottalanmiller @Dashrender last edited by

                      @Dashrender said in ESXi recovery woes:

                      How is the storage different? Not that I disagree, just looking for more information.

                      That we don't know. What we know is that snapping will cause corruption with a database. So the corruption is expected and universal. What we don't know is why the snaps are loading one way in one version and another in another. I can only imagine that the block driver was changed between the two and something additionally is being affected.

                      • C
                        Carnival Boy last edited by Carnival Boy

                        I'm not sure what you mean by snapping? Do you mean VSS?

                        The issue happens if I take a backup on a 5.5 host and try and restore it on a 5.5 host. If I do that it will fail. But I can restore that same 5.5 host backup to a 5.1 host and it works fine. So the source of the backup doesn't seem to be an issue as much as the destination.
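The results reported so far can be laid out as a small matrix of backup source against restore destination. The Python sketch below is only a way of reading the thread: the four rows are taken from the posts as I understand them, and the helper name `outcomes_by` is made up for illustration.

```python
from collections import defaultdict

# (backup source, restore destination, database recovered?)
# Rows are read from the thread; they are not measurements of my own.
results = [
    ("5.1", "5.1", True),
    ("5.1", "5.5", False),
    ("5.5", "5.1", True),
    ("5.5", "5.5", False),
]

def outcomes_by(rows, column):
    """Group the set of observed outcomes by the value in the given column."""
    grouped = defaultdict(set)
    for row in rows:
        grouped[row[column]].add(row[2])
    return dict(grouped)

by_source = outcomes_by(results, 0)
by_destination = outcomes_by(results, 1)

print("by source:", by_source)
print("by destination:", by_destination)
```

Grouped by source, each ESXi version shows both a success and a failure, so the source does not predict the outcome. Grouped by destination, each version shows a single outcome, which matches the observation that the destination is what matters.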

                        • scottalanmiller
                          scottalanmiller @Carnival Boy last edited by

                          @Carnival-Boy said in ESXi recovery woes:

                          I'm not sure what you mean by snapping?

                          Slang for "taking a snapshot." That's the process that is introducing the initial corruption, I assume. The corruption should come from a block-based snapshot of the running database files.

                          • scottalanmiller
                            scottalanmiller last edited by

                            It was the phrase "online backup of the VM" that I took to be a description of a snapshot-based backup, like Veeam would do.

                            • C
                              Carnival Boy last edited by

                              I assume so. I have used both Veeam and Unitrends.

                              • Dashrender
                                Dashrender @Carnival Boy last edited by

                                @Carnival-Boy said in ESXi recovery woes:

                                I'm not sure what you mean by snapping? Do you mean VSS?

                                The issue happens if I take a backup on a 5.5 host and try and restore it on a 5.5 host. If I do that it will fail. But I can restore that same 5.5 host backup to a 5.1 host and it works fine. So the source of the backup doesn't seem to be an issue as much as the destination.

                                Oh, then I stand corrected: you are taking backups (using snaps) on the 5.5 as well as the 5.1. So you have two of these Hypertrieve servers? One on 5.1 and another on 5.5?

                                • Dashrender
                                  Dashrender last edited by

                                  So here's a question - does Hypertrieve have its own backup process for an online DB? Some things do. Before you kick off the backup of the VM, you kick off the backup process on the Hypertrieve DB, then the VM backup happens. Then when you restore, the Hypertrieve software will do its own restore (you might have to do it manually). This is all in the name of preventing corruption.

                                  • C
                                    Carnival Boy last edited by

                                    I shut down the VM on the 5.1 host, migrated it to the 5.5 host, powered it on, took another backup, then restored it back to both the 5.1 and the 5.5 host. I've been pretty busy!

                                    • Dashrender
                                      Dashrender @Carnival Boy last edited by

                                      @Carnival-Boy said in ESXi recovery woes:

                                      I shut down the VM on the 5.1 host, migrated it to the 5.5 host, powered it on, took another backup, then restored it back to both the 5.1 and the 5.5 host. I've been pretty busy!

                                      So you can restore a snap taken from a 5.5 on a 5.1, but not back to the original 5.5 it came from... hmmm..

                                      • C
                                        Carnival Boy last edited by Carnival Boy

                                        I'm still working on this 😧

                                        Even if I shut down the VM, back it up in a powered off state, restore to a 5.5 host, and power it on, the Hypertrieve service starts and opens the database, which I can successfully browse, then after about ten seconds it crashes and I can no longer browse the database.

                                        Since this is a restore of a powered off VM, it can't be a snapshotting issue.

                                        I have had a reply from the vendor, who writes:
                                        "So, it's clear for us what happened, the virtualization abstraction generates a conflict when you instance a new VM just copying, the Disk Hash is not the same, and crashes the EDM sometimes, I don't recommend a Server Copy, always the backup procedure."

                                        I don't really understand this. Anyone?

                                        By "backup procedure" I think they are talking about taking a Hypertrieve backup via the Hypertrieve software and restoring the database that way after migrating.

                                        Which I'm hoping to try next, but, to compound the issue, Unitrends (which I hate, by the way) has stopped working for me, so I can no longer restore the VM! It's just one thing after another with this - I can feel my life slowly slipping away!

                                        • DustinB3403
                                          DustinB3403 last edited by

                                          "So, it's clear for us what happened, the virtualization abstraction generates a conflict when you instance a new VM just copying, the Disk Hash is not the same, and crashes the EDM sometimes, I don't recommend a Server Copy, always the backup procedure."

                                          This means that when you are importing the VM into the other host, it has a new Disk ID which is causing the issue, as the snapshot process creates a custom disk ID.

                                          What they are recommending you do is a full backup, and import that which should resolve the issue.

                                          Is there no built-in way with the ESXi version to create a full backup? (I'm thinking of XO at this point, so don't mind me if I'm completely wrong.)

                                          • Dashrender
                                            Dashrender @Carnival Boy last edited by

                                            @Carnival-Boy said in ESXi recovery woes:

                                            I have had a reply from the vendor, who writes:

                                            "So, it's clear for us what happened, the virtualization abstraction generates a conflict when you instance a new VM just copying, the Disk Hash is not the same, and crashes the EDM sometimes, I don't recommend a Server Copy, always the backup procedure."

                                            So does ESXi 5.1 somehow maintain the Disk Hash, and VMware changed this practice in 5.5? Something for you to investigate.

                                            @DustinB3403 said in ESXi recovery woes:

                                            This means that when you are importing the VM into the other host, it has a new Disk ID which is causing the issue, as the snapshot process creates a custom disk ID.

                                            Eh? Actually, the OP proved it has nothing to do with the snapshots by taking a backup while the VM was shut down.

                                            This is a restore-to-a-new-VM problem. It's a problem because the vendor has the system checking the Disk ID, presumably for copy protection reasons, yet the check is easily thwarted by using the backup and restore procedure of the DB/application software itself. This of course means that restoring a system potentially takes much longer, because not only do you have to restore the VM, but then you have to restore the DB inside the VM - assuming this is even possible, because I suppose you might have to reinstall the application before restoring the DB so that the application recognizes the new Disk Hash.
