Hyper-V High availability? or only VMware
-
@LAH3385 said:
What I mean by High Availability is for our production team to keep on working without interruption. Currently our file server is on the same server as DC, AD, DHCP, DNS, etc. Back in July, AD got corrupted and went into a BSOD loop. This caused our production to freeze for half a day before we were able to get the backup restored.
That incident cost us potentially thousands of dollars in only half a day. If it happens again and it goes down for days, then we may be out of business. With that said, we are looking into redundant servers or high availability.
AD is HA at the application level. Even if you have the most HA Hyper-V or VMware platform you would never have AD utilize it. AD would always be set to run as normal. The issue you ran into would not have been resolved by having platform HA, and in this particular case could have been resolved by having two AD VMs on a single host.
In fact, if you had full VMware Fault Tolerance, your AD BSOD would have replicated to the other host and your VMware would have extended the problem rather than solving it! This is a great example of how HA is something you do, not something that you buy. You need to design your HA solution workload by workload. Tools like Hyper-V can be really important parts of that design, but rarely will it be the primary one.
-
@LAH3385 said:
Can you shed some light based on the requirements I stated earlier?
I just skimmed the SW thread. Your answer is drop the entire project and fire the current IT consultant.
Then start over with someone who is not out to screw you over.
-
@LAH3385 said:
What do we want? We want something that works like RAID 5, but for servers.
Just be aware that the issue you described would be subject to the same kinds of issues that RAID is. In your example, AD failed above the platform level, the OS itself failed. So the platform HA would have done nothing to protect you. Platform level HA protects exclusively against hardware failures, not software ones. That's why you only do that when application level HA is not available or reasonable - because it only solves half or less of the issues.
In the same way RAID is great - until the issue is files deleted from the disk, cryptoware or file system corruption. RAID will, just like the HA virtualization solutions, replicate the problem to all nodes instantly leaving you with nothing working no matter how much you spend on the HA.
These are cases where application level HA would solve most problems and a good backup system would solve others.
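To make the layering concrete, here is a rough sketch of which protection layer helps with which kind of failure. The failure names and the coverage table are my own simplification of the points above (an assumption for illustration, not taken from any vendor tool), and "backup" here means you can recover, with downtime, rather than stay up.

```python
# Illustrative only: the failure categories and coverage sets below are a
# simplification of the discussion, not output from any real product.
COVERAGE = {
    # Platform HA (Hyper-V / VMware failover) only protects against hardware loss.
    "platform_ha": {"hardware_failure"},
    # Application-level HA (e.g. a second AD domain controller) also survives
    # one instance's OS or service breaking.
    "application_ha": {"hardware_failure", "os_crash", "service_corruption"},
    # Backups let you recover (with downtime) from almost anything,
    # including problems that replication would faithfully copy.
    "backup": {
        "hardware_failure",
        "os_crash",
        "service_corruption",
        "deleted_files",
        "ransomware",
    },
}


def layers_that_help(failure):
    """Return the protection layers that address the given failure, sorted by name."""
    return sorted(layer for layer, covered in COVERAGE.items() if failure in covered)


if __name__ == "__main__":
    for failure in ("hardware_failure", "service_corruption", "ransomware"):
        print(failure, "->", layers_that_help(failure))
```

Note that platform HA only shows up for the hardware case, which is the whole point: the AD corruption in July sits in a row it does not cover.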
-
Love this thread - it's finally getting down to "thinking about the problem, not just choosing a solution based on those already given to you"
You mentioned that you had an outage because you had AD corruption. What caused that corruption? How will having anything you've asked for so far prevent or solve this problem in the future?
Things we don't know - Did you only have one AD server? If yes, would having a second AD server have solved this issue?
You've told us your current file storage is on the AD server itself. OK, that's easy to solve: make sure to put it on its own VM in the future. You might find yourself needing a lot of Windows licensing here depending on your setup. If you're expecting a full failover situation, you'll need the same number of licenses for each server. Assuming you only need one VM per host, you'll need to purchase 1 Windows Server license per host, but if you need two or more, you'll need at least two Windows Server licenses per host to allow the failover/maintenance to happen legally. Also, do you need real HA? Can you afford 10 minutes of downtime while you boot up another VM on the other host? And so on.
-
@LAH3385 said:
I do not want to overspend on something that can be done and deliver a similar result for less. I have many more areas I could use some more budget on.
Nor do we want you to. In many cases this will come down to not buying more, or buying only a little more, but rather planning better, changing how and what you implement, and being far more thoughtful rather than looking at HA as a "solution." HA as a concept is awesome and you should always work towards it, all other factors being equal.
So that's what we need to do. It might make sense to make a separate thread for a number of workloads (maybe one thread per workload) and link here for a higher level description, and we can break down each workload and how it should or could be addressed.
Active Directory, for example, needs to be thought about uniquely. It's actually the easiest to deal with, as normal SMBs all have HA for AD. But typically the "vendor's sales guy idea" of what to do not only triples your cost, it very often breaks the HA you already had!
-
@LAH3385 said:
What kind of HDD type is recommended for Starwind VSAN? RAID10 with at least 3TB storage space. SATA7.2K or SAS 10K/15K? I doubt we can afford SSD.
If VMware is on the table, you can afford SSDs no problem. Not that you need them, just considering the one guarantees the budget for the other, if that makes sense.
StarWind does not recommend RAID 10 normally. Normally they would push towards RAID 6 or less.
Adding in @StarWind_Software @KOOLER @original_anvil
-
@scottalanmiller said:
StarWind does not recommend RAID 10 normally. Normally they would push towards RAID 6 or less.
Wait, what? StarWind recommends RAID 6 for the sync'ed underbelly of your VM infrastructure?
-
@Dashrender said:
@scottalanmiller said:
StarWind does not recommend RAID 10 normally. Normally they would push towards RAID 6 or less.
Wait, what? StarWind recommends RAID 6 for the sync'ed underbelly of your VM infrastructure?
Often they recommend RAID 0. I, however, do not.
-
@scottalanmiller said:
@Dashrender said:
@scottalanmiller said:
StarWind does not recommend RAID 10 normally. Normally they would push towards RAID 6 or less.
Wait, what? StarWind recommends RAID 6 for the sync'ed underbelly of your VM infrastructure?
Often they recommend RAID 0. I, however, do not.
Wow.. I guess that would be the really poor man's option.. but if you are that poor.. why do you have two servers? why not just one that costs less than the total cost of two but more powerful (if needed) than the single? Seems like the wrong way to go about things.
This reminds me of @scottalanmiller's post about how all eggs in one basket isn't really worse than splitting them over two baskets.
-
@LAH3385 to determine the storage needs (drives, RAID, etc.) we would need some good info about the storage capacity and IOPS required. It is very possible that normal SATA or NL-SAS drives will do the trick. For file servers and AD, slow SATA is more than enough.
-
@Dashrender said:
@scottalanmiller said:
@Dashrender said:
@scottalanmiller said:
StarWind does not recommend RAID 10 normally. Normally they would push towards RAID 6 or less.
Wait, what? StarWind recommends RAID 6 for the sync'ed underbelly of your VM infrastructure?
Often they recommend RAID 0. I, however, do not.
Wow.. I guess that would be the really poor man's option.. but if you are that poor.. why do you have two servers? why not just one that costs less than the total cost of two but more powerful (if needed) than the single? Seems like the wrong way to go about things.
Because many people worry solely about compute node failure and nothing else, just like the logic that leads people to spend a fortune on an inverted pyramid while having huge risk from a single, cheap, fragile SAN - they get sidetracked thinking about a single failure mode rather than focusing on overall reliability.
But keep in mind, StarWind with RAID 0 is still overall RAID 01. But I would almost want RAID 6 in there myself to avoid node failover caused by storage whenever possible. Resulting in RAID 61.
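For anyone trying to picture the RAID 01 versus RAID 61 trade-off, here is a small sketch. It assumes two identical nodes whose local arrays are mirrored over the network; the drive count and size are made-up example inputs, and the function names are just for illustration.

```python
# Sketch: usable capacity and local drive-failure tolerance when the array on
# each of two nodes is mirrored over the network (local RAID x becomes RAID x1).
# Example inputs are hypothetical, not a sizing recommendation.

def local_array(level, drives, size_tb):
    """Return (usable_tb, drive failures one node survives without losing its array)."""
    if level == "0":
        return drives * size_tb, 0
    if level == "6":
        if drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return (drives - 2) * size_tb, 2
    if level == "10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        # Guaranteed tolerance is one drive; more only if failures land in
        # different mirror pairs.
        return (drives // 2) * size_tb, 1
    raise ValueError(f"unsupported RAID level: {level}")


def mirrored_pair(level, drives, size_tb):
    """Describe the combined layout of two nodes mirroring the same local array."""
    usable, tolerated = local_array(level, drives, size_tb)
    return {
        "layout": f"RAID {level}1",
        # The network mirror keeps a full copy per node, so usable capacity
        # is that of a single node's array.
        "usable_tb": usable,
        # Drive failures a node absorbs before storage forces a node failover.
        "local_drive_failures_tolerated": tolerated,
    }


if __name__ == "__main__":
    for level in ("0", "6", "10"):
        print(mirrored_pair(level, drives=4, size_tb=2.0))
```

With RAID 0 underneath, the very first dead drive takes the whole node's storage offline and you are running on one node. With RAID 6 underneath, a node can lose two drives before that happens, which is the reason for leaning towards RAID 61.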
-
@scottalanmiller said:
But keep in mind, StarWind with RAID 0 is still overall RAID 01. But I would almost want RAID 6 in there myself to avoid node failover caused by storage whenever possible. Resulting in RAID 61.
I was thinking the same thing. I'd really hate to lose a node, then lose the other server because of a drive failure that I was unlucky enough to have during a node failure.
But even that seems really undesirable (RAID 6 that is) because of the performance penalties - I 'feel' like a single server would be better in general in that case with RAID 10 Spinning Rust, or RAID 5 SSD.
-
I wonder if @scottalanmiller would still recommend OBR 10 instead of RAID 6 for use with Starwind?
-
We have 2 main production teams: I'll call them A and B for simplicity.
A requires File Server as they only need to gather documents and other stuff. Applications that they need are Chrome, Adobe, Office.
B requires some File Server and DB access (Access, SQL, some other accounting programs). B is more mission critical. For B, the server cannot go down during production, period. B is what really requires HA.
Currently both A and B are on different physical servers but B still has some files on A server. When server B goes down, it causes DB corruption. The fix is easy and only takes 30 minutes to relink files and restore some as needed from backup.
The AD corruption back in July made the File Server inaccessible, and that was what really dealt the most damage. If File Server is the only Mission Critical workload, then failover DFS should be enough. But my boss wants the OS to be failover-able.
-
The question is why? I'm guessing because he's old school (kinda like me). But he and I both need to join the 21st century.
Using Application Level failover is much better than using hardware fail over whenever possible. Of course it's not always possible, so we have hardware fail over as another thing we can add to the reliability chain.
-
@LAH3385 said:
The AD corruption back in July made the File Server inaccessible, and that was what really dealt the most damage. If File Server is the only Mission Critical workload, then failover DFS should be enough. But my boss wants the OS to be failover-able.
What caused your AD corruption? Did AD corruption prevent access to the server completely?
-
@Dashrender said:
But even that seems really undesirable (RAID 6 that is) because of the performance penalties - I 'feel' like a single server would be better in general in that case with RAID 10 Spinning Rust, or RAID 5 SSD.
Depends on the workload. Most SMB workloads are perfectly fine on RAID 6. Remember you have two arrays here, not just one, so you're getting some performance from each, but you do have the write overhead of the network.
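If you want a rough feel for the RAID 6 versus RAID 10 write penalty, the usual back-of-the-envelope estimate looks like the sketch below. The write penalties (2 for RAID 10, 6 for RAID 6) are the standard rule-of-thumb values. The 75 IOPS per 7.2K drive and the 30% write mix are assumptions purely for illustration, and this models a single array only, ignoring controller cache, the second node's array, and the network sync overhead.

```python
# Rule-of-thumb IOPS estimate for a single array. All example inputs are
# hypothetical; measure your real workload before sizing anything.

# Back-end I/O operations generated by one front-end write.
WRITE_PENALTY = {"0": 1, "10": 2, "5": 4, "6": 6}


def effective_iops(level, drives, iops_per_drive, write_fraction):
    """Estimate front-end IOPS for a given RAID level and read/write mix."""
    if level not in WRITE_PENALTY:
        raise ValueError(f"unsupported RAID level: {level}")
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    raw = drives * iops_per_drive
    read_iops = raw * (1.0 - write_fraction)
    write_iops = raw * write_fraction / WRITE_PENALTY[level]
    return read_iops + write_iops


if __name__ == "__main__":
    for level in ("10", "6"):
        estimate = effective_iops(level, drives=8, iops_per_drive=75, write_fraction=0.3)
        print(f"RAID {level}: ~{estimate:.0f} IOPS")
```

With those example numbers the gap is roughly 510 versus 450 IOPS, which for a file server and AD is unlikely to matter. The gap widens quickly as the write share goes up, so a write-heavy database is where you would look harder at RAID 10.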
-
@dafyre said:
I wonder if @scottalanmiller would still recommend OBR 10 instead of RAID 6 for use with Starwind?
All depends on the workload. In this case you would normally gravitate towards OBR6 unless you need the speed of OBR10 rather than looking primarily at reliability.
-
@LAH3385 said:
but my boss wants the OS to be failover-able.
Did you explain to him that the OS failing over could DIRECTLY undermine the ability to meet the business need of keeping the files available? OS failover is a fallback for when your file server fails to fail over; it's not his goal.
Sounds like he is leading with "proximate" needs rather than "goal" needs.
-
@LAH3385 said:
Currently both A and B are on different physical servers but B still has some files on A server. When server B goes down, it causes DB corruption.
We should address this and fix why the database is corrupting as a good starting point.