ZFS Based Storage for Medium VMWare Workload



  • Ok, so a little background. The storage situation at my organization is the weakest link in our network. Currently we have a single HP MSA P2000 with 12 spindles (7,200 rpm) serving two separate ESXi clusters: a 2-node cluster for our operations (Exchange, AD, SharePoint Foundation, and other miscellaneous applications) and a 3-node cluster for development machines. Development is our core business; in simple terms, we do SI work for Oracle Retail applications, which includes custom development. Some in the organization argue this data may be even more important than the aforementioned operations systems; thankfully, IMO, my boss (the CEO) disagrees with that opinion. Also, when I presented this same information (rolled up to better speak CEO), my boss's response was to go with whatever I think is the better solution. The company really does stand behind what I suggest; I just don't want to add additional risk.

    It is not uncommon for us to max out the disk I/O on 12 spindles sharing the load of almost 150 virtual machines, and everyone is on board that something needs to change.

    Here is what the business cares about in a solution: a reliable solution that provides the resources the development environments need to operate effectively (read: we do not do performance testing in-house, as by its very nature it is very much a your-mileage-may-vary exercise depending on your deployment situation).

    In addition to the business requirements, I have added my own requirements that my boss agrees with and blesses.

    1. Operations and Development must be on separate storage devices
    2. Storage systems must be built from business-class hardware (no WD Red drives -- although I would allow them in a future Veeam backup storage target)
    3. Must be expandable to accommodate future growth

    Requirements for development storage

    • 9+ TiB of usable storage
    • Support for a minimum of 1,100 random IOPS (what our current system peaks at)
    • Disks must be in some kind of array (ZFS, hardware RAID, mdadm, etc.)

    Proposed solutions:

    #1 a.k.a. the safe option
    HP StoreVirtual 4530 with 12 TB (7.2k) spindles in RAID 6 -- this is our vendor's recommendation. This is an HP Renew quote with 3 years of 5x9 support, next-day on-site, for ~$15,000

    Pros
    Can purchase support
    Single-vendor -- "one throat to choke"
    Integrated solution
    Cons
    Less performance than solution #2 out of the box
    More expensive to upgrade later (additional shelves and drives at HP prices)
    All used hardware

    #2 ZFS Solution ~$10,000
    24 spindles, 900 GB each (7.2k SAS), in 12 mirrored vdevs
    Based on Supermicro SC216E16 chassis
    X9SRH-7F Motherboard
    Intel E5-1620v2 CPU
    64 GB of RAM
    No L2ARC or ZIL planned
    Dual 10gig NICs
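
    As a rough sanity check, here is some back-of-the-envelope math comparing the two proposals against my own 9+ TiB / 1,100 IOPS requirements. The ~75 IOPS per 7.2k spindle, the RAID 6 write penalty of 6, and the assumption that the HP quote means 12 x 1 TB disks are my own guesses, not vendor numbers:

```python
TIB = 2**40

def striped_mirrors(spindles, disk_gb, iops_per_disk=75):
    """Stripe of 2-disk mirrors: half the raw capacity,
    full read IOPS, roughly half the write IOPS."""
    usable_tib = (spindles // 2) * disk_gb * 10**9 / TIB
    return usable_tib, spindles * iops_per_disk, (spindles * iops_per_disk) // 2

def raid6(spindles, disk_gb, iops_per_disk=75):
    """RAID 6: capacity of N-2 disks, random-write penalty of ~6."""
    usable_tib = (spindles - 2) * disk_gb * 10**9 / TIB
    return usable_tib, spindles * iops_per_disk, (spindles * iops_per_disk) // 6

# Proposal #2: 24 x 900 GB in mirrored pairs
tib, read, write = striped_mirrors(24, 900)
print(f"ZFS mirrors: {tib:.1f} TiB usable, ~{read} read / ~{write} write IOPS")

# Proposal #1 (assuming 12 x 1 TB) in RAID 6
tib, read, write = raid6(12, 1000)
print(f"HP RAID 6:   {tib:.1f} TiB usable, ~{read} read / ~{write} write IOPS")
```

    Under those assumptions both layouts clear 9 TiB usable, but the doubled spindle count and the lack of a parity write penalty are why I expect noticeably better random-write performance from the mirror layout.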

    Pros
    Better performance out of the box (twice the spindle count)
    Non-vendor specific parts means upgrades require less investment

    Cons
    Self-supported
    I am the support contract :-/
    Multiple vendors and suppliers to acquire parts
    Combination of new and used hardware (the chassis) to get this price point

    Alright, tear me apart, tell me I am wrong, or provide any other useful feedback. The biggest concerns I have exist on both platforms (drives fail, controllers fail, data goes bad, etc.) and have to be mitigated either way. That is what we have backups for -- in my opinion, the HP gets me the following things:

    1. The "ability" to purchase a support contract
    2. Next-day on-site of a tech or parts if needed

    With the $4,000 saved by not buying the HP support contract, I can buy a duplicate Supermicro system and a couple of extra hard drives, and have the same level of protection.

    Note: this is my first time posting an actual give-me-feedback topic; I tried to include all the information I felt was relevant. If more is needed, I can provide it.


  • Service Provider

    Before I dive into it, what is the need around ZFS? It sounds like you are leading with the solution rather than the goal, which will not lead us in the direction of a best answer. We should step back, think at the goal level, and determine what it is that we want to accomplish. Maybe ZFS will be the answer, but what if it isn't? Leading with the answer and looking for the question isn't the best way to design a solution.


  • Service Provider

    @donaldlandru said:

    We have a 2 node cluster for our operations (Exchange, AD, SharePoint Foundation, and other miscellaneous applications) and a 3 node cluster for development machines.

    So a two node cluster and a three node cluster. This seems straightforward.... no external storage at all. The rule of thumb of external storage is that it should not be considered until you are above four nodes in a single cluster and even then, not normally until much larger. What is the purpose of having external storage at all?


  • Service Provider

    Another question: what is the purpose of the clusters? Currently you have an inverted pyramid of doom, not the best design as you know. But this implies that there are no needs around high availability. In fact, it means that you are currently below "standard availability," and this should mean that dropping the clusters and going to standalone servers would itself be an improvement. What is the reason for having clusters at all, given that reliability hasn't been a factor thus far?



  • @scottalanmiller said:

    Before I dive into it, what is the need around ZFS? It sounds like you are leading with the solution rather than the goal, which will not lead us in the direction of a best answer. We should step back, think at the goal level, and determine what it is that we want to accomplish. Maybe ZFS will be the answer, but what if it isn't? Leading with the answer and looking for the question isn't the best way to design a solution.

    In a sense I am, but only because, outside of the MSA and Windows-based storage, this is what I am most familiar with. If we don't go with a vendor-supported solution, this would require the least effort to support. That doesn't make it the right answer, just the one I am most comfortable putting my name next to.


  • Service Provider

    @donaldlandru said:

    1. Operations and Development must be on separate storage devices

    Mostly makes sense. This heavily suggests that the local storage options will be best, then, as you lose the only real potential leverage for having external storage, which was the small cost savings that might have arisen by having five servers share one storage unit. Without that, it is really hard to come up with a justification for external storage. It was essentially impossible even with five.


  • Service Provider

    @donaldlandru said:

    2. Storage systems must be built of business class hardware (no RED drives -- although I would allow this in a future Veeam backup storage target)

    What's the reason for this? Red drives are just as reliable, or very nearly so, as any other drive type in the right scenarios. I'm not saying that Red is going to be right or make any sense, but as a requirement this doesn't match the concept of a business goal. This is another "solution looking for a problem." Red drives are perfectly viable for the most enterprise of applications, when they fit the bill.

    Even for a SAM-SD, which by definition is all about being enterprise storage, WD Red are perfectly acceptable. The idea that consumer drives are risky is purely one tied to the use of already more risky parity arrays. The same factors that would make you classify WD Red as "non-business class" also qualify RAID 6 in the same way. So it would rule out both or neither, depending on the application of this rule, but not one or the other.



  • @scottalanmiller said:

    @donaldlandru said:

    We have a 2 node cluster for our operations (Exchange, AD, SharePoint Foundation, and other miscellaneous applications) and a 3 node cluster for development machines.

    So a two node cluster and a three node cluster. This seems straightforward.... no external storage at all. The rule of thumb of external storage is that it should not be considered until you are above four nodes in a single cluster and even then, not normally until much larger. What is the purpose of having external storage at all?

    This setup was implemented when I first started four years ago. We used a third-party consultant, and they designed this as the solution for the operations cluster. There were initial plans to do something different for the development cluster, but due to the cost of the SAN (which may or may not have been needed) it was value-engineered by the people leading the project, with little regard to my input, as I was the new guy.

    My initial plan was to build a four-node cluster with shared storage without the ops/dev silos. The ops (2-node) cluster is licensed with VMware Essentials Plus and the dev cluster with VMware Essentials. I do rely on vMotion and DRS in the ops cluster for better utilizing resources and doing maintenance.

    vMotion is of little use to me in the dev cluster, as these machines (RAM: 288 GB, 64 GB, 16 GB) don't have enough resources to host everything should a node drop, so it is mainly licensed for the backup API access.


  • Service Provider

    @donaldlandru said:

    3. Must be expandable to accommodate future growth

    Expandability often costs a ton today and delivers very little value "tomorrow." Is this truly an important business goal? It is very often cheaper to do the right thing for today and the immediate future and evaluate again in one, two, or five years -- whenever factors have changed and you are in a position to make a new decision. Planning for expansion introduces unnecessary risk to the project.


  • Service Provider

    @donaldlandru said:

    vMotion is of little use to me in the dev cluster, as these machines (RAM: 288 GB, 64 GB, 16 GB) don't have enough resources to host everything should a node drop, so it is mainly licensed for the backup API access.

    This tells us two things:


  • Service Provider

    By dropping VMware vSphere Essentials you are looking at roughly $1,200 in savings right away. Both Hyper-V and XenServer will do what you need absolutely free.


  • Service Provider

    That $1,200 number was based on Essentials. I just saw that you have Essentials Plus. What is that for? Eliminating that will save you many thousands of dollars! This just went from a "little win" to a major one!


  • Service Provider

    @donaldlandru said:

    I do rely on vmotion and drs in the ops cluster for better utilizing resources and doing maintenance.

    Better to be fast and cheap than to be slow, expensive and have to balance. Easier to throw "speed" at the problem than to do live balancing if that is all that you are getting out of it.

    Maintenance should be trivial, what planned outages are you avoiding that warrant the heavier risk of unplanned ones?


  • Service Provider

    @donaldlandru said:

    Requirements for development storage

    • 9+ Tib of usable storage
    • Support a minimum of 1100 random iops (what our current system is peaking at)

    If split between five nodes, that's a minimal number. My eight-year-old desktop has 100,000 IOPS! This is less than 250 IOPS per machine; you can often hit that with a small RAID 1 pair in each box! And 10 TB is just 2 TB per box. This isn't a big problem to tackle when you break it down. These are actually pretty moderate needs.
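
    The arithmetic is simple enough to lay out; an even five-way split and ~75 IOPS per 7.2k spindle are assumptions just to show the scale:

```python
# Aggregate requirement split across five hosts.
total_iops, total_tib, nodes = 1100, 9, 5

print(total_iops / nodes)  # 220.0 IOPS per node
print(total_tib / nodes)   # 1.8 TiB per node

# A local RAID 1 pair of 7.2k disks gives roughly 2 x 75 = 150 random
# read IOPS per box; 10k SAS or a small SSD closes the remaining gap.
print(2 * 75)  # 150
```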



  • @scottalanmiller said:

    That $1,200 number was based on Essentials. I just saw that you have Essentials Plus. What is that for? Eliminating that will save you many thousands of dollars! This just went from a "little win" to a major one!

    Essentials Plus is to allow us to use vMotion on the operations cluster. While it would likely be cheaper in the long run to acquire MS Server Datacenter licensing and build redundant services, this was the approved solution for moving VMs back and forth for node maintenance/upgrades.

    The ops layout is
    2x AD DC (one hosts DHCP server)
    1x SQL server for SharePoint
    1x SharePoint foundation
    1x Exchange server
    1x File Server (hosts a bunch of other services because of no additional server licenses)
    A handful of other CentOS servers for monitoring, help desk, and an internal web server

    The ops cluster could likely be decommissioned and the few remaining services colocated in the dev environment, if I could only convince the owners to go with Office 365.


  • Service Provider

    @donaldlandru said:

    #1 a.k.a the safe option
    HP StoreVirtual 4530 with 12 TB (7.2k) spindles in RAID6 -- this is our vendor recommendation. This is an HP renew quote with 3 years 5x9 support next-day on-site for ~$15,000

    http://www8.hp.com/us/en/products/disk-storage/product-detail.html?oid=6255484

    Other than being able to blame a vendor for losing data or uptime rather than being on the hook yourself, what makes this safe? Looking at it architecturally, I would call it reckless for the business, as it is an inverted pyramid of doom. The unit is nothing but a normal server on which everything rests. How do you handle it failing? How do you do maintenance if you can't bring it down? And it is just RAID 6, which is fine, but no aspect of this makes it very safe.

    Having a vendor to blame is nice, but the vendor is only responsible for the product, not the system architectural design. Outages caused by this would still be your throat, not HP's. It's not that it is a bad unit, I just don't see how it could be used appropriately in this kind of a setup.


  • Service Provider

    @donaldlandru said:

    The biggest concerns I have exist in both platforms (drives fail, controllers fail, data goes bad, etc) and have to be mitigated either way. That is what we have backups for -- in my opinion the HP gets me the following things:

    This is where you really have to look carefully. You have this big risk (and cost) that you know this does not mitigate. But having local drives in standalone servers would partially mitigate this, and local drives with replication would mitigate it better than nearly any possible approach. So you appear to have options that are faster, cheaper, and potentially easier that also solve the biggest problem.


  • Service Provider

    @donaldlandru said:

    24 spindle 900Gb (7.2k SAS) in 12 mirrored vdevs

    That's RAID 01, you never want that. You want 12 mirrors in a stripe for RAID 10.

    Understanding RAID 10 and RAID 01.
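
    The practical difference shows up when a second disk fails. A quick Monte Carlo sketch makes the point; the 24-disk arrangement and pairings here are illustrative assumptions, not the actual quote:

```python
import random

random.seed(1)
DISKS, TRIALS = 24, 100_000

def survives_raid10(a, b):
    # Stripe of 12 mirrors: fatal only if both failures land in the same pair.
    return a // 2 != b // 2

def survives_raid01(a, b):
    # Mirror of two 12-disk stripes: the first failure takes down one whole
    # side, so any second failure on the *other* side is fatal.
    return a // 12 == b // 12

r10 = r01 = 0
for _ in range(TRIALS):
    a, b = random.sample(range(DISKS), 2)  # two distinct failed disks
    r10 += survives_raid10(a, b)
    r01 += survives_raid01(a, b)

print(f"RAID 10 survives a random second failure in ~{r10 / TRIALS:.0%} of trials")
print(f"RAID 01 survives a random second failure in ~{r01 / TRIALS:.0%} of trials")
```

    The exact odds for 24 disks are 22/23 (about 96%) versus 11/23 (about 48%), which is why you always want the mirrors at the bottom layer and the stripe on top.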



  • Ok... your feedback is actually showing something I have been afraid of: I have severe tunnel vision in servicing the current solution.
    Doing a quick inventory of why I am trying to do that:

    1. We have the investment in this. Like another recent thread here discussed, once an SMB gets heavily invested one way, it is hard to switch. To be honest, I am not sure how I could convince them to at this point. This actually seems like an opportunity for a great learning experience
    2. Training of supporting resources -- I have a counterpart in our off-shore office who is just getting up to speed on how VMware works -- to me this will be even harder to change
    3. I have been using VMware for 4 years at the office and at home, so I am comfortable with it. This reason should probably also make the list of reasons why I should change.

    One limiting factor I see right now is that our current chassis are 1U with 2-4 drive bays, which would hamper a local storage deployment.

    Edit -- Stepping back and thinking, the lack of drive bays is not a valid limiting factor, as I could easily add SAS and do DAS storage on these nodes.



  • @scottalanmiller said:

    @donaldlandru said:

    24 spindle 900Gb (7.2k SAS) in 12 mirrored vdevs

    That's RAID 01, you never want that. You want 12 mirrors in a stripe for RAID 10.

    Understanding RAID 10 and RAID 01.

    This was modeled after the way TrueNAS (commercial version of FreeNAS) quoted us.


  • Service Provider

    @donaldlandru said:

    @scottalanmiller said:

    @donaldlandru said:

    24 spindle 900Gb (7.2k SAS) in 12 mirrored vdevs

    That's RAID 01, you never want that. You want 12 mirrors in a stripe for RAID 10.

    Understanding RAID 10 and RAID 01.

    This was modeled after the way TrueNAS (commercial version of FreeNAS) quoted us.

    These are the exact people I warn everyone against.

    http://www.smbitjournal.com/2015/07/the-jurassic-park-effect/

    The FreeNAS community should be avoided completely. The worst storage advice and misunderstandings of storage basics I've ever seen. FreeNAS, by its nature, collects storage misunderstandings and creates a community of the worst storage advice possible.


  • Service Provider

    The FreeNAS community tends to do things like promote software RAID when it doesn't make sense and attempts to dupe people by using carefully crafted marketing phrases like "in order for FreeNAS to monitor the disks", leaving out critical advice like "that isn't something you want FreeNAS to be doing."


  • Service Provider

    @donaldlandru said:

    1. We have the investment in this. Like another recent thread here discussed, once an SMB gets heavily invested one way, it is hard to switch. To be honest, I am not sure how I could convince them to at this point. This actually seems like an opportunity for a great learning experience

    You have what investment in it now? Once you replace the storage that you have today, aren't you effectively starting over? Really, this is about stopping you from wasting a new investment rather than protecting a current one. Everything that you proposed is, I believe, a greater "reinvestment" than what I am proposing. So, if I'm understanding the concern here correctly, your HP and/or ZFS approach is actually the one that this concern would rule out, correct? Since it requires a much larger new investment.


  • Service Provider

    Also, referencing point one: what you are sensing is the fear of people giving in to the sunk cost fallacy. Even if they don't end up doing this, take a moment to sit back and understand how the sunk cost fallacy can be destructive, and maybe even have a talk with the decision makers about this fiscal trap before looking at options, to make sure that people are thinking about it logically before they get the amygdala (fight-or-flight) emotional reaction to the idea of changing direction.


  • Service Provider

    @donaldlandru said:

    2. Training of supporting resources -- I have a counterpart in our off-shore office who is just getting up to speed on how VMware works -- to me this will be even harder to change

    All the more reason to go to a simpler architecture with fewer moving parts and fewer things to support. Moving from VMware to XenServer or Hyper-V should take maybe an hour, tops. These are all very similar products that all do very little. Hypervisors should not require any real training. Most people can move from VMware vSphere to XenServer in literally a few minutes. It's all super simple GUI management; they should be able to just look at the interface and know what to do.


  • Service Provider

    @donaldlandru said:

    Edit -- Stepping back and thinking, the lack of drive bays are not a valid limiting factor as I could easily add SAS and do DAS storage on these nodes.

    You can do a hybrid too. Local for some workloads and DAS or shared for others.

    Figuring out if you need to just do local storage, which is super simple, or if you need to have replicated local storage, which is more complex, is the place to start. From the description, it sounds like straight local storage might be the way to go. Very cheap, very easy to tune for big time performance. XenCenter will happily put many independent (non-clustered) nodes into a single interface to make it super simple for the support staff wherever they are.



    It seems I remember @donaldlandru mentioning making one big 5-host cluster. If he were to use something such as XenServer, he would get the big cluster and still be able to separate the workloads between the dev servers and the ops servers, and still have "local" storage, right?

    Even if the answer to the "local" storage question (I say that because XenServer can do its own shared storage now, right?) is a resounding "no," he can still leverage replication to replicate the dev hosts into the ops environment and vice versa for maintenance and emergencies, right?



  • @dafyre said:

    It seems I remember @donaldlandru mentioning making one big 5 host cluster. If he were to use something such as XenServer he would get the big cluster and still be able to separate the workloads out between the dev servers and the ops servers and still have "Local" storage right?

    Even if the answer to the "local" storage question (I say that because XenServer can do its own shared storage now, right?) is a resounding "no," he can still leverage replication to replicate the dev hosts into the ops environment and vice versa for maintenance and emergencies, right?

    The answer to all your questions is yes. XenServer can deploy VMs on the same "cluster" to different storage devices. It will also do live migrations between various storage devices.



    If that's the case, then @donaldlandru could just build one big 5-host cluster (assuming he can get the politics taken care of and the CPUs are compatible -- if that is even an issue) on XenServer and be happy... Upgrade to 4 or 6 TB drives per host (RAID 10) and also be happy.


  • Service Provider

    @dafyre said:

    It seems I remember @donaldlandru mentioning making one big 5 host cluster. If he were to use something such as XenServer he would get the big cluster and still be able to separate the workloads out between the dev servers and the ops servers and still have "Local" storage right?

    Even if the answer to the "local" storage question (I say that because XenServer can do its own shared storage now, right?) is a resounding "no," he can still leverage replication to replicate the dev hosts into the ops environment and vice versa for maintenance and emergencies, right?

    Correct. This would actually make you question the term "cluster," as the boxes would not actually be associated with each other except that they are all managed from the same interface. Is that a cluster? Not to most people. Does it look like a single entity to someone managing it? Yes.

    He could replicate things into other environments, yes.


