Why Anecdotes Fail
-
I felt similarly regarding MS SBS 2003. It was a 'huge risk' to have a single server that served so much, and I made it worse for myself (shooting myself in the foot) by running it with only 4GB of memory. But it ran, daily, with only one 'failure', which was a virus, and I may have been able to resolve it without wiping the system. But I chose to wipe it as a precaution.
But that SBS box ran for nearly 6 straight years with very little day to day futzing. We did have downtime, but there isn't anything that I could do about the Windstream hub down the road being under 5 feet of water. We were running, just with no internet.
Still like the SBS model - though in my new office it doesn't work since we have nearly 300 people.
-
While others might disagree with me, there is nothing wrong with the SBS model. If you were doing it today in an environment that SBS was made for (fewer than 75 users) you'd probably have email running in its own VM (and this is where everyone starts yelling that I'm a fool and should be using hosted email/Office 365) and the AD/file/print in a second VM.
Of course you have to take into account what Scott said about making 'solid business decisions around the risk and cost model', which in most cases for a business of this size will only require a single server with a manual DR plan.
-
When I first considered buying a couple of SANs, on the advice of my vendor, I tried to find out some stats for Proliant reliability. But I couldn't find any. Do HP publish them? I asked around and no-one seemed to know. So I couldn't rely on stats even if I wanted to.
I love stats. I mean really, really love them. But getting hold of accurate ones is pretty tricky. So I rely on anecdotal evidence mostly, not through choice but through necessity.
I'm talking here about independent stats. I'm constantly bombarded with stats from vendors trying to sell me something. But in the same way @scottalanmiller would say never rely on the advice of a vendor, I'd say never rely on the stats of a vendor.
-
Anyway....anecdotally I've been responsible for servers for 3 SMEs over the last 15 years. In that time I've probably got through around 15 Proliant servers. The total downtime during that time is precisely zero. I've also got through hundreds of HP desktops and can count the number of failures on one hand. So my anecdotal experience is that hardware is incredibly reliable. The only things that generally fail are power supplies and hard drives, but this hasn't resulted in server downtime as these two items have redundancy. I've even run mission-critical server software on old re-allocated PCs, which isn't the wisest thing to do, but again, has given me relatively little trouble.
So without any reliable statistics to tell me otherwise, I can only rely on my own experience, which is that Proliant downtime isn't a big problem for me. I couldn't justify the cost of buying two SANs purely to address a risk that I'd never personally experienced, even if that risk was real.
My personal experience is that software is far, far less reliable than hardware, so my tight budget tends to address making software more reliable and not hardware. I'd be interested to know how many people spend thousands on a SAN but then fail to patch their software in a timely manner. I bet it happens and it's crazy because patching is generally free and SANs are expensive.
If anyone has stats to disprove my theory then I'd love to see them!
-
@Carnival-Boy said:
When I first considered buying a couple of SANs, on the advice of my vendor, I tried to find out some stats for Proliant reliability. But I couldn't find any. Do HP publish them? I asked around and no-one seemed to know. So I couldn't rely on stats even if I wanted to.
I have an article coming about that. No one publishes stats for that stuff because no one else does. Not the SAN vendors, not the server vendors, no one. But what you can know is that almost no SAN vendor makes a server on par with a Proliant. So a SAN at an equal level is going to be slightly less reliable than the equivalent Proliant. That's just the nature of scale, quality, engineering, etc. But they are roughly identical.
-
@Carnival-Boy said:
I'm talking here about independent stats.
If you think about the billions of dollars that would have to be spent to do a study of a proper scale, and that the study would be too old to be useful by the time that it was completed.... there is no way to have stats like that in IT.
Imagine if someone did a current generation Proliant versus PowerEdge study. You'd need at least a thousand servers of each model you want to test and at least ten years to gather the lifetime stats. At $20K per server that is a $20,000,000 investment per server model, so $40,000,000 minimum for a single server to server comparison of a single configuration. And that is before operational costs (power, cooling, etc.) for many years. And when they were done, they could tell us which ones we "should have" bought at the same time that we've been retiring those servers.
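To make that arithmetic easy to rerun with other assumptions, here is a rough sketch in Python. The server count, unit price and number of models are the figures from the post above; the function and variable names are just illustrative, and operating costs (power, cooling, staff) are deliberately left out because no figures were given for them.

```python
# Back-of-envelope hardware cost of an independent reliability study.
# Inputs are the figures quoted in the post; nothing here is measured data.
SERVERS_PER_MODEL = 1_000      # "at least a thousand servers of each model"
COST_PER_SERVER_USD = 20_000   # "$20K per server"
MODELS_COMPARED = 2            # e.g. one Proliant model vs one PowerEdge model


def hardware_cost(models: int,
                  servers_per_model: int = SERVERS_PER_MODEL,
                  cost_per_server: int = COST_PER_SERVER_USD) -> int:
    """Purchase cost only; excludes power, cooling and staff over the study."""
    return models * servers_per_model * cost_per_server


per_model = hardware_cost(1)
total = hardware_cost(MODELS_COMPARED)

print(f"Per server model: ${per_model:,}")
print(f"Two-model comparison: ${total:,}")
```

Each extra model or configuration added to the comparison adds another $20,000,000 under these assumptions, which is why the cost scales so badly.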
-
I've done informal studies in massive environments (tens of thousands of servers) and you are correct, hardware almost never fails. The drive to make your hardware redundant is misdirected in most cases. It's chasing a problem that does exist, but at huge cost, and quite often the complexity of the solution causes twice as much downtime as it theoretically protects against.
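A minimal sketch of how that trade-off can be reasoned about, in Python. Every number below is hypothetical and chosen only to illustrate the "twice as much downtime" case; none of it comes from the informal studies mentioned above. The model is deliberately crude: expected downtime per year is outages per year multiplied by hours per outage.

```python
# Illustrative only: all rates and durations below are made-up assumptions.
def expected_downtime_hours(outages_per_year: float,
                            hours_per_outage: float) -> float:
    """Expected annual downtime under a simple rate x duration model."""
    return outages_per_year * hours_per_outage


# Single server: hypothetical one hardware outage every 20 years, 8 hours each.
single = expected_downtime_hours(outages_per_year=0.05, hours_per_outage=8)

# Redundant pair: assume failover fully hides hardware outages, but the extra
# sync/failover layer itself causes a hypothetical outage every 10 years.
redundant = expected_downtime_hours(outages_per_year=0.10, hours_per_outage=8)

print(f"Single server:  {single:.1f} h/year")
print(f"Redundant pair: {redundant:.1f} h/year")
print(f"Ratio: {redundant / single:.1f}x")
```

The point is not the specific numbers but the comparison: redundancy only pays off if the outages introduced by the added complexity are rarer or shorter than the hardware outages it removes.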
-
@scottalanmiller said:
But what you can know is that almost no SAN vendor makes a server on par with a Proliant.
IIRC some HP SANs are or were Proliants. I recall the HP 4300 was basically a Proliant. When we were looking at getting a pair, we were basically told our existing Proliant was at risk of failure and in order to mitigate this risk we needed to effectively replace it with two Proliants, plus some software to keep the two in sync. It's adding redundancy to something that I've never personally had fail.
But like the majority of SMEs, we have no redundancy at the software level. We're running single databases for our ERP system and for our Exchange system, for example. So if the database fails we're down. Having a SAN would just mean the failure occurs across two pieces of hardware instead of one.
Another point to make about redundancy. I am really, really confident about the ability of my Proliants to handle disk failure. I've had quite a few over the years, and am now pretty relaxed about the process. That little red light comes on, I phone HP, a new drive arrives, I pop out the old drive, I pop in the new drive, the lights flash, and I walk away. Completely confident that the array will rebuild. It still makes me nervous, but it's a controlled nervousness. I doubt having a SAN fail is anywhere near as straightforward. My point being that I like simple redundancy, I dislike complex redundancy.
-
@Carnival-Boy said:
When I first considered buying a couple of SANs, on the advice of my vendor, I tried to find out some stats for Proliant reliability. But I couldn't find any. Do HP publish them? I asked around and no-one seemed to know. So I couldn't rely on stats even if I wanted to.
I love stats. I mean really, really love them. But getting hold of accurate ones is pretty tricky. So I rely on anecdotal evidence mostly, not through choice but through necessity.
I'm talking here about independent stats. I'm constantly bombarded with stats from vendors trying to sell me something. But in the same way @scottalanmiller would say never rely on the advice of a vendor, I'd say never rely on the stats of a vendor.
Just to be clear, I don't trust vendor stats either. Get nothing that promotes sales from a vendor.
-
@Carnival-Boy said:
@scottalanmiller said:
But what you can know is that almost no SAN vendor makes a server on par with a Proliant.
IIRC some HP SANs are or were Proliants. I recall the HP 4300 was basically a Proliant. When we were looking at getting a pair, we were basically told our existing Proliant was at risk of failure and in order to mitigate this risk we needed to effectively replace it with two Proliants, plus some software to keep the two in sync. It's adding redundancy to something that I've never personally had fail.
But like the majority of SMEs, we have no redundancy at the software level. We're running single databases for our ERP system and for our Exchange system, for example. So if the database fails we're down. Having a SAN would just mean the failure occurs across two pieces of hardware instead of one.
Another point to make about redundancy. I am really, really confident about the ability of my Proliants to handle disk failure. I've had quite a few over the years, and am now pretty relaxed about the process. That little red light comes on, I phone HP, a new drive arrives, I pop out the old drive, I pop in the new drive, the lights flash, and I walk away. Completely confident that the array will rebuild. It still makes me nervous, but it's a controlled nervousness. I doubt having a SAN fail is anywhere near as straightforward. My point being that I like simple redundancy, I dislike complex redundancy.
Exactly.
Many HP low end SANs are in fact Proliants. Often set up by Dot Hill and not by HP.