ZFS Based Storage for Medium VMware Workload
- 
 @dafyre said: @donaldlandru said: The politics are likely to be harder to play as we just renewed our SnS for both Essentials and Essentials plus in January for three years. 
 <snip>
 Another important piece of information with the local storage is that everything is based on 2.5" disks -- and all but two servers only have two bays each, so getting any real kind of local storage without going external direct attached (non-shared) is going to be a challenge. He brings up a good point about the 2 bays and 2.5" drives... Do they even make 4 / 6 TB drives in 2.5" form yet? If not, would it be worth getting an external DAS shelf for each of the servers? It's been 15 years, but I've seen DAS shelves that can be split between two hosts. Assuming those are still made, and there are enough disk slots, that would save a small amount. 
- 
 @donaldlandru said: To get this all into a single cluster (and hopefully using something like VSAN) would require us to upgrade to standard or higher, we would be able to use acceleration kits to get us there but is no small investment. Going to VSAN, Starwind, DRBD, etc. would be an "orders of magnitude leap" that is not warranted. It just can't make sense. What you have today and what you are talking about moving to are insanely "low availability." Crazy low. And no one had any worries or concerns about that, right? What I am proposing is that you make the single order of magnitude leap from "acceptably low" reliability to "standard reliability which is good enough for any normal SMB" while dropping your cost dramatically. It's a massive win. Saving a fortune AND leaping far beyond your reliability needs. Going to something like VSAN just can't make sense. You didn't need something like this before, why would you suddenly need to leapfrog from "super low availability" right over top of normal all the way to "they don't even need this on most of Wall St" super high availability at massively high cost that would require that you upgrade your compute nodes and licensing high cost storage replication technologies? Not only would it require that but it would require bigger or more nodes in order to handle those needs. It's a little like someone who has been riding a bicycle for years (but paying a fortune for it) finding out that they can get a Chevy Cruze for half the price, but having seen what cars are like, deciding that they should buy a Ferrari for their first car when a bicycle was fine all along. 
- 
 @Dashrender said: It's been 15 years, but I've seen DAS shelves that can be split between two hosts. Assuming those are still made, and there are enough disk slots, that would save a small amount. DAS by definition can be split. 
- 
 @Dashrender said: But that is completely unnecessary if you move to Xen (or is it XenServer - still confused) or Hyper-V If he moves to HyperV or XenServer he would still need proprietary replicated local storage options at his node count. But it would be free at the platform layer (saving $10K at least) and far cheaper at the storage layer (saving many thousands more.) 
- 
 @scottalanmiller said: It's important to recognize that it is a SPOF. But being a SPOF is not the core issue, believe it or not, just the one that causes the biggest emotional reaction. If you were to buy a super high end active/active EMC or HDS device for this (mainframe class storage, start around $50K for the smallest possible units) the fact that it was a SPOF would be heavily mitigated. The whole mainframe concept is built around making a SPOF that is unlikely to fail. But your issues are bigger. Here are the big issues that you are left with in both of your scenarios: - Single point of failure on which everything rests (the thing most likely to fail causes EVERYTHING to fail.)
- No risk mitigation for the other layers in the dependency chain. This isn't a 3-2-1 as traditionally described but actually a (1/1/1-1) meaning ANY server failure results in unmitigated (literally) failure AND any storage failure results in total failure. You have a dramatic increase in failure risk with this design, not just a small or moderate increase like most people see (because most people are confused and heavily mitigate risk at one or two but not all three layers.) So it is very important to realize that this is at least one full order of magnitude more risky than a traditional inverted pyramid of doom.
- The single point of failure that you have is actually a pretty fragile one. Probably more fragile than the servers themselves. So not only is the risk of failure doubled by having two completely separate places for things to fail, but the single point of failure that impacts everything is the most fragile piece of all.
- This has the highest cost both today AND going into the future.
 Ok, if we split this into two separate topics, the only unmitigated failure point in operations is the single SAN. Two options to mitigate the risk are: - Add a second SAN that replicates with the first (HP MSA easy to do, not so nice price tag)
- Move to local storage and create redundant servers for items that can't be down (split-scope DHCP, second Exchange server) not sure how to mitigate the risk to SharePoint being offline since it is the free version, plus the SQL server would be another single point
 When dealing with the Microsoft licensing to create the redundancy to obtain the reliability the business wants I think we are coming in at around the same price. Going with local storage here would reduce the complexity and if I can convince the organization to go with Office 365 we actually have a lot lower risk here and wouldn't need to create a bunch of highly available services. The second topic (scope) is the development environments and you are 100% correct, even if we have active/active SAN clusters the failure will always be at the server level. The lack of vmotion in this "cluster" and the lack of available resources to do a failover, make the compute layer the biggest problem. If we lose a compute node those servers are offline until replaced. The business accepts that risk as long as we have a fast way of spinning down VMs and bringing up the VMs the team is working on. This is much easier with shared storage than local, in my opinion. So I do have multiple problems to solve, with different sets of requirements. 
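 To put rough numbers on the dependency-chain argument quoted above, here is a minimal Python sketch. It assumes failures are independent and uses made-up availability figures (99.9% for a host, 99.9% for the SAN); neither number comes from this thread, and the function names are purely illustrative. All it shows is that a VM needing both a host and a shared SAN is down whenever either one is down, so with similar figures the expected downtime roughly doubles compared with a host running from its own local disks.

```python
from math import prod

HOURS_PER_YEAR = 24 * 365


def chain_availability(*layers: float) -> float:
    """Availability of a chain in which every layer must be up at once."""
    return prod(layers)


def expected_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    return (1 - availability) * HOURS_PER_YEAR


# Hypothetical figures for illustration only; not measured, not from the thread.
HOST = 0.999  # one compute node, including its own local disks
SAN = 0.999   # the single shared storage chassis

local_storage = chain_availability(HOST)        # VM depends on the host only
shared_storage = chain_availability(HOST, SAN)  # VM depends on host AND SAN

print(f"local : {expected_downtime_hours(local_storage):.1f} h/year")
print(f"shared: {expected_downtime_hours(shared_storage):.1f} h/year")
```

 This per-VM number leaves out blast radius: a SAN outage takes down every host at the same moment, while a host outage only takes down the VMs on that host.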
- 
 If you were going to go with RLS, which is completely crazy given the scenario and historically accepted risk then the best investment would be to do the following: - Replace all nodes with adequately sized nodes built on the HP DL380 G9 platform or the Dell R730xd platform. These have enough compute to replace several of your nodes in one, enough memory to handle all of your needs and more than 600% greater per node storage capacity!
- Move to either HyperV + Starwind or XenServer + DRBD (HA-Lizard)
- Make two clusters of two servers each keeping every software piece free and simple
 
- 
 Going the XenServer HA route, the guy who actually makes HA-Lizard is here in the community, so that is a big deal: not only do you have XS resources here, but you have *the* XS HA resource. 
- 
 @donaldlandru said: Ok, if we split this into two separate topics, the only unmitigated failure point in operations is the single SAN. Two options to mitigate the risk are: Not currently. You had said that your nodes do not have the tools or the overhead to absorb the load from a failed node, correct? That makes the risk of those nodes failing unmitigated as well. You only have enough nodes to handle your capacity, not enough to use them for failure mitigation. 
- 
 My next biggest concern, like any technology, is how do I get there from here. I have enough budget for a storage node, and we are going to run out of space within the next 60 days. I do not have, and will not receive additional funding this year for new servers. So some form of "in-place" style of upgrade has to occur. Obviously, this is a server down, convert vm bring it back up type of process that has an unknown LoE. Trying to not paint a picture of a rock and a hard place, but realistically where else am I at right now? 
- 
 @scottalanmiller said: If you were going to go with RLS, which is completely crazy given the scenario and historically accepted risk then the best investment would be to do the following: - Replace all nodes with adequately sized nodes built on the HP DL380 G9 platform or the Dell R730xd platform. These have enough compute to replace several of your nodes in one, enough memory to handle all of your needs and more than 600% greater per node storage capacity!
- Move to either HyperV + Starwind or XenServer + DRBD (HA-Lizard)
- Make two clusters of two servers each keeping every software piece free and simple
 That would cost a lot more than his current $14,000 budget (assuming that number was a budget number). 
- 
 @donaldlandru said: - Add a second SAN that replicates with the first (HP MSA easy to do, not so nice price tag)
 I've never seen someone do this successfully. That doesn't suggest that it doesn't work, but are you sure that the MSA series will do SAN mirroring with fault tolerance? I'm not confident that that is a feature (but certainly not confident that it isn't.) Double check that to be sure as I talk to MSA users daily and no one has ever led me to believe that this was even an option. I know that Dell's MD series cannot do this, only the EQL series. 
- 
 @scottalanmiller said: @donaldlandru said: Ok, if we split this into two separate topics, the only unmitigated failure point in operations is the single SAN. Two options to mitigate the risk are: Not currently. You had said that your nodes do not have the tools or the overhead to absorb the load from a failed node, correct? That makes the risk of those nodes failing unmitigated as well. You only have enough nodes to handle your capacity, not enough to use them for failure mitigation. In Operations, the two node cluster, I said they do have the necessary resources to absorb the other node failing. It is the development "cluster that isn't a cluster" that cannot absorb. 
- 
 @Dashrender said: That would cost a lot more than his current $14,000 budget (assuming that number was a budget number). Yes, but it would cost far less than what he was proposing. My original recommendations were to lower his cost while improving reliability. Then he leapt to the Ferrari scenario, so I proposed another solution that beats that one: it keeps the Ferrari features while spending only a fraction as much money. 
- 
 @donaldlandru said: In Operations, the two node cluster, I said they do have the necessary resources to absorb the other node failing. It is the development "cluster that isn't a cluster" that cannot absorb. Oh okay. So mitigated where it matters, I assume, and unmitigated where it doesn't matter so much. That I was not clear about. 
- 
 @scottalanmiller said: Going to VSAN, Starwind, DRBD, etc. would be an "orders of magnitude leap" that is not warranted. It just can't make sense. What you have today and what you are talking about moving to are insanely "low availability." Crazy low. And no one had any worries or concerns about that, right? That's just it - the company probably thinks they have that super high level of availability and the fact that they've never had a failure feeds that fire of belief. This has always been the issue I've had when I try to redesign/spend more money on a new solution. I get the push back - "well, we did it that cheap way before and it worked for 8 years, why do I suddenly now need to use this other way, clearly the old cheap way works." 
- 
 @Dashrender said: @scottalanmiller said: Going to VSAN, Starwind, DRBD, etc. would be an "orders of magnitude leap" that is not warranted. It just can't make sense. What you have today and what you are talking about moving to are insanely "low availability." Crazy low. And no one had any worries or concerns about that, right? That's just it - the company probably thinks they have that super high level of availability and the fact that they've never had a failure feeds that fire of belief. This has always been the issue I've had when I try to redesign/spend more money on a new solution. I get the push back - "well, we did it that cheap way before and it worked for 8 years, why do I suddenly now need to use this other way, clearly the old cheap way works." This -- all day long! It worked for the 7 years before you got here, it will keep working long after. I fight this fight every day 
- 
 @scottalanmiller From previous posts, it sounds like they are most concerned with the Dev environment right now, since the Ops cluster appears to be ok. 
- 
 @donaldlandru said: @Dashrender said: @scottalanmiller said: Going to VSAN, Starwind, DRBD, etc. would be an "orders of magnitude leap" that is not warranted. It just can't make sense. What you have today and what you are talking about moving to are insanely "low availability." Crazy low. And no one had any worries or concerns about that, right? That's just it - the company probably thinks they have that super high level of availability and the fact that they've never had a failure feeds that fire of belief. This has always been the issue I've had when I try to redesign/spend more money on a new solution. I get the push back - "well, we did it that cheap way before and it worked for 8 years, why do I suddenly now need to use this other way, clearly the old cheap way works." This -- all day long! It worked for the 7 years before you got here, it will keep working long after. I fight this fight every day All you need is one good outage and the tune changes. Or at least that's what I experienced at my last position. 
- 
 @scottalanmiller said: @donaldlandru said: - Add a second SAN that replicates with the first (HP MSA easy to do, not so nice price tag)
 I've never seen someone do this successfully. That doesn't suggest that it doesn't work, but are you sure that the MSA series will do SAN mirroring with fault tolerance? I'm not confident that that is a feature (but certainly not confident that it isn't.) Double check that to be sure as I talk to MSA users daily and no one has ever led me to believe that this was even an option. I know that Dell's MD series cannot do this, only the EQL series. In real life I am not sure if it works; on paper it does. It is a false sense of security, but the MSA does have active/active controllers built in (10Gb iSCSI), redundant power supplies, and of course the disks are in a RAID. The risks that are not mitigated by the single chassis are: - Chassis failure (I am sure it can happen, but the only part in the chassis is the backplane and some power routing)
- Software bug -- most likely failure to occur
- Human error (oops I just unplugged the storage chassis)
 All in all I think the operations is pretty well protected, minus the three risks listed above. It is two nodes that can absorb either node failing, it is on redundant 10gig top of rack switches and redundant 1gig switches. Also, backups are done and tested as well with Veeam. Am I missing something here? Unless I am mistaken, and Scott please correct me if I am, it is the three node development cluster that is in sorry shape. 
- 
 In your Dev environment, you have 3 servers with 288 GB, 64 GB, and 16 GB of RAM. Assuming RAM compatibility, what happens if you balance out those three servers and get them at least close to having the same amount of RAM? Does that help you at all? If that is a good idea, then why not look at converting them to XenServer and switching to local storage? You could then replicate the VMs to each of the three hosts, or you could set up HA-Lizard. 
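 A quick way to see what balancing would buy is to compare how much RAM is left after the worst single-host failure, before and after. The sketch below is only that: it takes the three RAM figures from this post, assumes the memory could be spread perfectly evenly (real DIMM sizes and slot counts will not allow that), and uses a hypothetical 200 GB of VM memory demand that does not come from the thread. The function names are illustrative.

```python
def surviving_ram(nodes_gb, failed_index):
    """Total RAM (GB) left in the cluster if the node at failed_index is lost."""
    return sum(gb for i, gb in enumerate(nodes_gb) if i != failed_index)


def worst_case_surviving_ram(nodes_gb):
    """RAM (GB) left after the most damaging single-node failure."""
    return min(surviving_ram(nodes_gb, i) for i in range(len(nodes_gb)))


def can_absorb_failure(nodes_gb, demand_gb):
    """True if any single node can fail and the rest still hold demand_gb."""
    return worst_case_surviving_ram(nodes_gb) >= demand_gb


current = [288, 64, 16]                               # figures from the post
balanced = [sum(current) / len(current)] * len(current)  # idealised even split

DEMAND_GB = 200  # hypothetical VM memory demand, for illustration only

for label, nodes in (("current", current), ("balanced", balanced)):
    print(
        f"{label:8s} worst-case surviving RAM: "
        f"{worst_case_surviving_ram(nodes):.1f} GB, "
        f"absorbs a failure at {DEMAND_GB} GB: "
        f"{can_absorb_failure(nodes, DEMAND_GB)}"
    )
```

 With the current layout, losing the 288 GB host leaves only 80 GB across the other two; an even split would leave about two thirds of the total no matter which host fails. RAM is only one constraint, so CPU and storage would need the same kind of check.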




