ZFS Based Storage for Medium VMWare Workload



  • Ok, so a little background. The storage situation at my organization is the weakest link in our network. Currently we have a single HP MSA P2000 with 12 spindles (7,200 rpm) serving two separate ESXi clusters: a 2-node cluster for our operations (Exchange, AD, SharePoint Foundation, and other miscellaneous applications) and a 3-node cluster for development machines. Development is our core business; in simple terms, we do SI work for Oracle Retail applications, which includes custom development. Some in the organization argue this data may be even more important than the aforementioned operations systems; thankfully, IMO, my boss (the CEO) disagrees with that opinion. Also, when I presented this same information (rolled up to better speak CEO), my boss's response was to go with whatever I think is the better solution. The company really does stand behind what I suggest; I just don't want to add additional risk.

    It is not uncommon for us to max out the disk I/O on 12 spindles sharing the load of almost 150 virtual machines, and everyone is on board that something needs to change.

    Here is what the business cares about in a solution: a reliable solution that provides the resources the development environments need to operate effectively (read: we do not do performance testing in-house, as by its very nature it is very much a your-mileage-may-vary exercise depending on your deployment situation).

    In addition to the business requirements, I have added my own requirements that my boss agrees with and blesses.

    1. Operations and Development must be on separate storage devices
    2. Storage systems must be built from business-class hardware (no WD Red drives -- although I would allow them in a future Veeam backup storage target)
    3. Must be expandable to accommodate future growth

    Requirements for development storage

    • 9+ TiB of usable storage
    • Support for a minimum of 1,100 random IOPS (what our current system peaks at)
    • Disks must be in some kind of array (ZFS, hardware RAID, mdadm, etc.)

    Proposed solutions:

    #1 a.k.a. the safe option
    HP StoreVirtual 4530 with 12 TB (7.2k) spindles in RAID 6 -- this is our vendor's recommendation. This is an HP Renew quote with 3 years of 5x9 support, next-day on-site, for ~$15,000

    Pros
    Can purchase support
    Single-vendor -- "one throat to choke"
    Integrated solution
    Cons
    Less performance than solution #2 out of the box
    More expensive to upgrade later (additional shelves and drives at HP prices)
    All used hardware

    #2 ZFS Solution ~$10,000
    24 spindles, 900 GB each (7.2k SAS), in 12 mirrored vdevs
    Based on Supermicro SC216E16 chassis
    X9SRH-7F Motherboard
    Intel E5-1620v2 CPU
    64 GB of RAM
    No L2ARC or ZIL planned
    Dual 10gig NICs
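
    As a rough sanity check, here is some back-of-the-envelope math comparing the two proposals against my own 9+ TiB / 1,100 IOPS requirements. The ~75 IOPS per 7.2k spindle, the RAID 6 write penalty of 6, and the assumption that the HP quote means 12 x 1 TB disks are my own guesses, not vendor numbers:

```python
TIB = 2**40

def striped_mirrors(spindles, disk_gb, iops_per_disk=75):
    """Stripe of 2-disk mirrors: half the raw capacity,
    full read IOPS, roughly half the write IOPS."""
    usable_tib = (spindles // 2) * disk_gb * 10**9 / TIB
    return usable_tib, spindles * iops_per_disk, (spindles * iops_per_disk) // 2

def raid6(spindles, disk_gb, iops_per_disk=75):
    """RAID 6: capacity of N-2 disks, random-write penalty of ~6."""
    usable_tib = (spindles - 2) * disk_gb * 10**9 / TIB
    return usable_tib, spindles * iops_per_disk, (spindles * iops_per_disk) // 6

# Proposal #2: 24 x 900 GB in mirrored pairs
tib, read, write = striped_mirrors(24, 900)
print(f"ZFS mirrors: {tib:.1f} TiB usable, ~{read} read / ~{write} write IOPS")

# Proposal #1 (assuming 12 x 1 TB) in RAID 6
tib, read, write = raid6(12, 1000)
print(f"HP RAID 6:   {tib:.1f} TiB usable, ~{read} read / ~{write} write IOPS")
```

    Under those assumptions both layouts clear 9 TiB usable, but the doubled spindle count and the lack of a parity write penalty are why I expect noticeably better random-write performance from the mirror layout.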

    Pros
    Better performance out of the box (twice the spindle count)
    Non-vendor specific parts means upgrades require less investment

    Cons
    Self-supported
    I am the support contract :-/
    Multiple vendors and suppliers to acquire parts
    Combination of new and used hardware (the chassis) to get this price point

    Alright, tear me apart, tell me I am wrong, or provide any other useful feedback. The biggest concerns I have exist on both platforms (drives fail, controllers fail, data goes bad, etc.) and have to be mitigated either way. That is what we have backups for -- in my opinion, the HP gets me the following things:

    1. The "ability" to purchase a support contract
    2. Next-day on-site of a tech or parts if needed

    With the $4,000 saved by not buying the HP support contract, I can buy a duplicate Supermicro system and a couple of extra hard drives, and have the same level of protection.

    Note: this is my first time posting an actual give-me-feedback topic; I tried to include all the information I felt was relevant. If more is needed, I can provide it.


  • Service Provider

    Before I dive into it, what is the need around ZFS? It sounds like you are leading with the solution rather than the goal, which will not lead us in the direction of a best answer. We should step back, think at the goal level, and determine what it is that we want to accomplish. Maybe ZFS will be the answer, but what if it isn't? Leading with the answer and looking for the question isn't the best way to design a solution.


  • Service Provider

    @donaldlandru said:

    We have a 2 node cluster for our operations (Exchange, AD, SharePoint Foundation, and other miscellaneous applications) and a 3 node cluster for development machines.

    So a two node cluster and a three node cluster. This seems straightforward.... no external storage at all. The rule of thumb of external storage is that it should not be considered until you are above four nodes in a single cluster and even then, not normally until much larger. What is the purpose of having external storage at all?


  • Service Provider

    Another question: what is the purpose of the clusters? Currently you have an inverted pyramid of doom, not the best design as you know. But this implies that there are no needs around high availability. In fact, it means that you are currently below "standard availability," and this should mean that dropping the clusters and going to standalone servers would itself be an improvement. What is the reason for having clusters at all, given that reliability hasn't been a factor thus far?



  • @scottalanmiller said:

    Before I dive into it, what is the need around ZFS? It sounds like you are leading with the solution rather than the goal, which will not lead us in the direction of a best answer. We should step back, think at the goal level, and determine what it is that we want to accomplish. Maybe ZFS will be the answer, but what if it isn't? Leading with the answer and looking for the question isn't the best way to design a solution.

    In a sense I am, but only because, outside of the MSA and Windows-based storage, this is what I am most familiar with. If we don't go with a vendor-supported solution, this would require the least effort to support. That doesn't make it the right answer, just the one I am most comfortable putting my name next to.


  • Service Provider

    @donaldlandru said:

    1. Operations and Development must be on separate storage devices

    Mostly makes sense. This heavily suggests that the local storage options will be best, then, as you lose the only real potential leverage for having external storage, which was the small cost savings that might have arisen by having five servers share one storage unit. Without that, it is really hard to come up with a justification for external storage. It was essentially impossible even with five.


  • Service Provider

    @donaldlandru said:

    2. Storage systems must be built of business class hardware (no RED drives -- although I would allow this in a future Veeam backup storage target)

    What's the reason for this? Red drives are just as reliable, or very nearly so, as any other drive type in the right scenarios. I'm not saying that Red is going to be right or make any sense, but as a requirement this doesn't match the concept of a business goal. This is another "solution looking for a problem." Red drives are perfectly viable for the most enterprise of applications, when they fit the bill.

    Even for a SAM-SD, which by definition is all about being enterprise storage, WD Red are perfectly acceptable. The idea that consumer drives are risky is purely one tied to the use of already more risky parity arrays. The same factors that would make you classify WD Red as "non-business class" also qualify RAID 6 in the same way. So it would rule out both or neither, depending on the application of this rule, but not one or the other.



  • @scottalanmiller said:

    @donaldlandru said:

    We have a 2 node cluster for our operations (Exchange, AD, SharePoint Foundation, and other miscellaneous applications) and a 3 node cluster for development machines.

    So a two node cluster and a three node cluster. This seems straightforward.... no external storage at all. The rule of thumb of external storage is that it should not be considered until you are above four nodes in a single cluster and even then, not normally until much larger. What is the purpose of having external storage at all?

    This setup was implemented when I first started four years ago. We used a third-party consultant, and they designed this as the solution for the operations cluster. There were initial plans to do something different for the development cluster, but due to the cost of the SAN (which may or may not have been needed) it was value-engineered by the people leading the project, with little regard to my input, as I was the new guy.

    My initial plan was to build a four-node cluster with shared storage without the ops/dev silos. The ops (2-node) cluster is licensed with VMware Essentials Plus and the dev cluster with VMware Essentials. I do rely on vMotion and DRS in the ops cluster for better utilizing resources and doing maintenance.

    vMotion is of little use to me in the dev cluster, as these machines (RAM: 288 GB, 64 GB, 16 GB) don't have enough resources to host everything should a node drop, so it is mainly licensed for the backup API access.


  • Service Provider

    @donaldlandru said:

    3. Must be expandable to accommodate future growth

    Expandability often costs a ton today and delivers very little value "tomorrow." Is this truly an important business goal? It is very often cheaper to do the right thing for today and the immediate future and evaluate again in one, two, or five years -- whenever factors have changed and you are in a position to make a new decision. Planning for expansion introduces unnecessary risk to the project.


  • Service Provider

    @donaldlandru said:

    vMotion is of little use to me in the dev cluster, as these machines (RAM: 288 GB, 64 GB, 16 GB) don't have enough resources to host everything should a node drop, so it is mainly licensed for the backup API access.

    This tells us two things:


  • Service Provider

    By dropping VMware vSphere Essentials you are looking at roughly $1,200 in savings right away. Both Hyper-V and XenServer will do what you need absolutely free.


  • Service Provider

    That $1,200 number was based on Essentials. I just saw that you have Essentials Plus. What is that for? Eliminating that will save you many thousands of dollars! This just went from a "little win" to a major one!


  • Service Provider

    @donaldlandru said:

    I do rely on vmotion and drs in the ops cluster for better utilizing resources and doing maintenance.

    Better to be fast and cheap than to be slow, expensive and have to balance. Easier to throw "speed" at the problem than to do live balancing if that is all that you are getting out of it.

    Maintenance should be trivial, what planned outages are you avoiding that warrant the heavier risk of unplanned ones?


  • Service Provider

    @donaldlandru said:

    Requirements for development storage

    • 9+ Tib of usable storage
    • Support a minimum of 1100 random iops (what our current system is peaking at)

    If split between five nodes, that's a minimal number. My eight-year-old desktop has 100,000 IOPS! This is less than 250 IOPS per machine; you can often hit that with a small RAID 1 pair in each box! And 10 TB is just 2 TB per box. This isn't a big problem to tackle when you break it down. These are actually pretty moderate needs.
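
    The arithmetic is simple enough to lay out; an even five-way split and ~75 IOPS per 7.2k spindle are assumptions just to show the scale:

```python
# Aggregate requirement split across five hosts.
total_iops, total_tib, nodes = 1100, 9, 5

print(total_iops / nodes)  # 220.0 IOPS per node
print(total_tib / nodes)   # 1.8 TiB per node

# A local RAID 1 pair of 7.2k disks gives roughly 2 x 75 = 150 random
# read IOPS per box; 10k SAS or a small SSD closes the remaining gap.
print(2 * 75)  # 150
```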



  • @scottalanmiller said:

    That $1,200 number was based on Essentials. I just saw that you have Essentials Plus. What is that for? Eliminating that will save you many thousands of dollars! This just went from a "little win" to a major one!

    Essentials Plus is to allow us to use vMotion on the operations cluster. While it would likely be cheaper in the long run to acquire MS Server Datacenter licensing and build redundant services, this was the approved solution for moving VMs back and forth for node maintenance/upgrades.

    The ops layout is
    2x AD DC (one hosts DHCP server)
    1x SQL server for SharePoint
    1x SharePoint foundation
    1x Exchange server
    1x File Server (hosts a bunch of other services because of no additional server licenses)
    A handful of other CentOS servers for monitoring, help desk, and an internal web server

    The ops cluster could likely be decommissioned and the few remaining services colocated in the dev environment, if I could only convince the owners to go with Office 365.


  • Service Provider

    @donaldlandru said:

    #1 a.k.a the safe option
    HP StoreVirtual 4530 with 12 TB (7.2k) spindles in RAID6 -- this is our vendor recommendation. This is an HP renew quote with 3 years 5x9 support next-day on-site for ~$15,000

    http://www8.hp.com/us/en/products/disk-storage/product-detail.html?oid=6255484

    Other than being able to blame a vendor for losing data or uptime rather than being on the hook yourself, what makes this safe? Looking at it architecturally, I would call it reckless for the business, as it is an inverted pyramid of doom. The unit is nothing but a normal server on which everything rests. How do you handle it failing? How do you do maintenance if you can't bring it down? And it is just RAID 6, which is fine, but no aspect of this makes it very safe.

    Having a vendor to blame is nice, but the vendor is only responsible for the product, not the system architectural design. Outages caused by this would still be your throat, not HP's. It's not that it is a bad unit, I just don't see how it could be used appropriately in this kind of a setup.


  • Service Provider

    @donaldlandru said:

    The biggest concerns I have exist in both platforms (drives fail, controllers fail, data goes bad, etc) and have to be mitigated either way. That is what we have backups for -- in my opinion the HP gets me the following things:

    This is where you really have to look carefully. You have this big risk (and cost) that you know this does not mitigate. But having local drives in standalone servers would partially mitigate this, and local drives with replication would mitigate it better than nearly any possible approach. So you appear to have options that are faster, cheaper, and potentially easier that also solve the biggest problem.


  • Service Provider

    @donaldlandru said:

    24 spindle 900Gb (7.2k SAS) in 12 mirrored vdevs

    That's RAID 01, you never want that. You want 12 mirrors in a stripe for RAID 10.

    Understanding RAID 10 and RAID 01.
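
    The practical difference shows up when a second disk fails. A quick Monte Carlo sketch makes the point; the 24-disk arrangement and pairings here are illustrative assumptions, not the actual quote:

```python
import random

random.seed(1)
DISKS, TRIALS = 24, 100_000

def survives_raid10(a, b):
    # Stripe of 12 mirrors: fatal only if both failures land in the same pair.
    return a // 2 != b // 2

def survives_raid01(a, b):
    # Mirror of two 12-disk stripes: the first failure takes down one whole
    # side, so any second failure on the *other* side is fatal.
    return a // 12 == b // 12

r10 = r01 = 0
for _ in range(TRIALS):
    a, b = random.sample(range(DISKS), 2)  # two distinct failed disks
    r10 += survives_raid10(a, b)
    r01 += survives_raid01(a, b)

print(f"RAID 10 survives a random second failure in ~{r10 / TRIALS:.0%} of trials")
print(f"RAID 01 survives a random second failure in ~{r01 / TRIALS:.0%} of trials")
```

    The exact odds for 24 disks are 22/23 (about 96%) versus 11/23 (about 48%), which is why you always want the mirrors at the bottom layer and the stripe on top.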



  • Ok... your feedback is actually showing something I have been afraid of: I have severe tunnel vision in servicing the current solution.
    Doing a quick inventory of why I am trying to do that:

    1. We have the investment in this. Like another recent thread here discussed, once an SMB gets heavily invested one way, it is hard to switch. To be honest, I am not sure how I could convince them to at this point. This actually seems like an opportunity for a great learning experience
    2. Training of supporting resources -- I have a counterpart in our off-shore office who is just getting up to speed on how VMware works -- to me this will be even harder to change
    3. I have been using VMware for 4 years at the office and at home, so I am comfortable with it. This reason should probably also make the list of reasons why I should change.

    One limiting factor I see right now is that our current chassis are 1U with 2-4 drive bays, which would hamper a local storage deployment.

    Edit -- Stepping back and thinking, the lack of drive bays is not a valid limiting factor, as I could easily add SAS and do DAS storage on these nodes.



  • @scottalanmiller said:

    @donaldlandru said:

    24 spindle 900Gb (7.2k SAS) in 12 mirrored vdevs

    That's RAID 01, you never want that. You want 12 mirrors in a stripe for RAID 10.

    Understanding RAID 10 and RAID 01.

    This was modeled after the way TrueNAS (commercial version of FreeNAS) quoted us.


  • Service Provider

    @donaldlandru said:

    @scottalanmiller said:

    @donaldlandru said:

    24 spindle 900Gb (7.2k SAS) in 12 mirrored vdevs

    That's RAID 01, you never want that. You want 12 mirrors in a stripe for RAID 10.

    Understanding RAID 10 and RAID 01.

    This was modeled after the way TrueNAS (commercial version of FreeNAS) quoted us.

    These are the exact people I warn everyone against.

    http://www.smbitjournal.com/2015/07/the-jurassic-park-effect/

    The FreeNAS community should be avoided completely. The worst storage advice and misunderstandings of storage basics I've ever seen. FreeNAS, by its nature, collects storage misunderstandings and creates a community of the worst storage advice possible.


  • Service Provider

    The FreeNAS community tends to do things like promote software RAID when it doesn't make sense and attempts to dupe people by using carefully crafted marketing phrases like "in order for FreeNAS to monitor the disks", leaving out critical advice like "that isn't something you want FreeNAS to be doing."


  • Service Provider

    @donaldlandru said:

    1. We have the investment in this. Like another recent thread here discussed, once an SMB gets heavily invested one way, it is hard to switch. To be honest, I am not sure how I could convince them to at this point. This actually seems like an opportunity for a great learning experience

    You have what investment in it now? Once you replace the storage that you have today, aren't you effectively starting over? Really, this is about stopping you from wasting a new investment rather than protecting a current one. Everything that you proposed is, I believe, a greater "reinvestment" than what I am proposing. So, if I'm understanding the concern here correctly, your HP and/or ZFS approach is actually the one that this concern would rule out, correct? Since it requires a much larger new investment.


  • Service Provider

    Also, referencing point one: what you are sensing is the fear of people giving in to the sunk cost fallacy. Even if they don't end up doing this, take a moment to sit back and understand how the sunk cost fallacy can be destructive, and maybe even have a talk with the decision makers about this fiscal trap before looking at options, to make sure that people are thinking about it logically before they get the amygdala (fight-or-flight) emotional reaction to the idea of changing direction.


  • Service Provider

    @donaldlandru said:

    2. Training of supporting resources -- I have a counterpart in our off-shore office who is just getting up to speed on how VMware works -- to me this will be even harder to change

    All the more reason to go to a simpler architecture with fewer moving parts and fewer things to support. Moving from VMware to XenServer or Hyper-V should take maybe an hour, tops. These are all very similar products that all do very little. Hypervisors should not require any real training. Most people can move from VMware vSphere to XenServer in literally a few minutes. It's all super simple GUI management; they should be able to just look at the interface and know what to do.


  • Service Provider

    @donaldlandru said:

    Edit -- Stepping back and thinking, the lack of drive bays are not a valid limiting factor as I could easily add SAS and do DAS storage on these nodes.

    You can do a hybrid too. Local for some workloads and DAS or shared for others.

    Figuring out if you need to just do local storage, which is super simple, or if you need to have replicated local storage, which is more complex, is the place to start. From the description, it sounds like straight local storage might be the way to go. Very cheap, very easy to tune for big time performance. XenCenter will happily put many independent (non-clustered) nodes into a single interface to make it super simple for the support staff wherever they are.



    It seems I remember @donaldlandru mentioning making one big 5-host cluster. If he were to use something such as XenServer, he would get the big cluster and still be able to separate the workloads between the dev servers and the ops servers, and still have "local" storage, right?

    Even if the answer to the "local" storage question (I say that because XenServer can do its own shared storage now, right?) is a resounding "no," he can still leverage replication to replicate the dev hosts into the ops environment and vice versa for maintenance and emergencies, right?



  • @dafyre said:

    It seems I remember @donaldlandru mentioning making one big 5 host cluster. If he were to use something such as XenServer he would get the big cluster and still be able to separate the workloads out between the dev servers and the ops servers and still have "Local" storage right?

    Even if the answer to the "local" storage question (I say that because XenServer can do its own shared storage now, right?) is a resounding "no," he can still leverage replication to replicate the dev hosts into the ops environment and vice versa for maintenance and emergencies, right?

    The answer to all your questions is yes. XenServer can deploy VMs on the same "cluster" to different storage devices. It will also do live migrations between various storage devices.



    If that's the case, then @donaldlandru could just build one big 5-host cluster (assuming he can get the politics taken care of and the CPUs are compatible -- if that is even an issue) on XenServer and be happy... Upgrade to 4 or 6 TB drives per host (RAID 10) and also be happy.


  • Service Provider

    @dafyre said:

    It seems I remember @donaldlandru mentioning making one big 5 host cluster. If he were to use something such as XenServer he would get the big cluster and still be able to separate the workloads out between the dev servers and the ops servers and still have "Local" storage right?

    Even if the answer to the "local" storage question (I say that because XenServer can do its own shared storage now, right?) is a resounding "no," he can still leverage replication to replicate the dev hosts into the ops environment and vice versa for maintenance and emergencies, right?

    Correct. This would actually make you question the term "cluster," as the boxes would not actually be associated with each other except that they are all managed from the same interface. Is that a cluster? Not to most people. Does it look like a single entity to someone managing it? Yes.

    He could replicate things into other environments, yes.


