I have a POF (single point of failure) setup: an enterprise server (IBM x3550 M4) running on a single Intel P3600 1.6 TB SSD in PCIe-card form factor. It hosts an ERP with an MS SQL database and a file server with ~400 GB of regular office files. I have run several benchmarks over the last few months and everything is running fine.
I will put this into production soon, and I have a spare x3550 M4 (new, with 1.2 TB of 10k RPM spinning disks). Within my budget (~1500 euro, less is better), I'm considering two architectural patterns to get better reliability than the single POF node:
- Buy another enterprise SSD (maybe the Samsung PM1725a, also 1.6 TB) for the other node, set up SQL replication for the database and a file sync daemon for the file server (DFSR, Syncthing, etc.), and fail over DNS if the first node fails for any reason. We can tolerate a few hours of downtime;
- Buy a couple of smaller SAS SSDs in RAID 1 and use them as primary storage, using the P3600 or the spinning disks for the replicas.
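For the first pattern, the logic that decides when to move the DNS record could look roughly like this. It is a minimal sketch, assuming the health check is a plain TCP probe of SQL Server's default port 1433 on the primary. The IP addresses, the `FailoverDecider` class and the three-failure threshold are hypothetical, and actually updating the DNS record (and promoting the SQL replica) is not shown.

```python
import socket


def is_reachable(host: str, port: int = 1433, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Port 1433 is SQL Server's default; a real check should also run a query.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


class FailoverDecider:
    """Hypothetical helper: decides which node the DNS record should point to.

    It fails over only after `threshold` consecutive failed probes of the
    primary, so one dropped packet does not trigger a switch. Failback to the
    primary is deliberately left manual (its data may be stale by then).
    """

    def __init__(self, primary: str, standby: str, threshold: int = 3):
        self.primary = primary
        self.standby = standby
        self.threshold = threshold
        self.consecutive_failures = 0
        self.active = primary

    def observe(self, primary_ok: bool) -> str:
        """Record one probe result and return the address DNS should point to."""
        if self.active == self.primary:
            if primary_ok:
                self.consecutive_failures = 0
            else:
                self.consecutive_failures += 1
                if self.consecutive_failures >= self.threshold:
                    self.active = self.standby
        return self.active
```

In use, a scheduled task would call `decider.observe(is_reachable("10.0.0.1"))` every minute or so and update DNS when the returned address changes. Since a few hours of downtime are acceptable here, the same check could just as well alert a human who triggers the failover by hand.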
A couple of considerations about that:
- I think that today's enterprise-class PCIe SSDs (like the ones I mentioned), in their first 5 years after deployment and with the right overprovisioning, are almost as reliable as a RAID controller, because they have a full solid-state storage controller, no moving parts, and a very reassuring declared MTBF;
- These services can tolerate even one day of downtime every 3-5 years without a major impact on revenue;
- I don't see the point of RAID, or of node-level reliability in general, if I can rely on better application-level clustering. This idea was inspired by hyperscaler/Open Compute machines, which run on a single PSU etc. because their reliability is achieved at a higher level.
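The MTBF and downtime points above can be checked with a bit of arithmetic. This is a sketch assuming a constant failure rate (exponential model), which ignores infant mortality and wear-out, so it says nothing about flash endurance (TBW/DWPD). The 2,000,000-hour MTBF used below is an assumed figure; check the real datasheet values.

```python
import math

HOURS_PER_YEAR = 8760.0


def prob_failure(mtbf_hours: float, years: float = 1.0) -> float:
    """Probability that one device fails within `years`,
    assuming a constant failure rate of 1 / mtbf_hours (exponential model)."""
    return 1.0 - math.exp(-years * HOURS_PER_YEAR / mtbf_hours)


def availability(downtime_hours: float, period_years: float) -> float:
    """Fraction of time the service is up, given total downtime over a period."""
    return 1.0 - downtime_hours / (period_years * HOURS_PER_YEAR)


if __name__ == "__main__":
    mtbf = 2_000_000  # assumed MTBF in hours; replace with the datasheet value
    print(f"1-year failure probability: {prob_failure(mtbf):.2%}")
    print(f"5-year failure probability: {prob_failure(mtbf, 5):.2%}")
    # Downtime budget from the post: one day (24 h) every 3 to 5 years.
    for years in (3, 5):
        print(f"1 day every {years} years -> {availability(24, years):.3%} availability")
```

Under these assumptions, roughly 0.4% of such drives fail per year and about 2% within five years. That is small but not zero for a single device with no redundancy. One day of downtime every 3 to 5 years works out to roughly 99.9% availability.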
What do you think? Any hints are welcome!