How Does Local Storage Offer High Availability

scottalanmiller

I get asked this a lot and a recent response was rather thorough so I figured that I would post it here.

Question: I would like an explanation as to how you can have HA with local storage. As you say, "Local storage is faster, cheaper AND more reliable (closer to HA.) A SAN increases risk so moves you farther from HA. You CAN create HA using either as a building block, but SAN requires more work to get there as it is inherently more risky than local storage."

Answer: Well the first way to think about it is.... ask "How can you have HA with a SAN?"

Answer is, of course, that you can't. You can only talk about HA when you have two SANs, which are replicated to each other, and failover transparently with no close coupling. So, since a SAN is nothing but local storage stretched precariously over a network adding risk and bottlenecks, we can apply the same logic to local storage. How do you get HA with local storage? You have to replicate it, exactly like you do with a SAN. Remember that a SAN IS local storage - stretched over a network connection. So ANY feature that SAN would give us, local storage gives us too, but better.

So just like you must replicate SANs to make a SAN a part of an HA solution, you replicate your local storage to make it HA. The difference is that local storage starts out safer, cheaper and faster than a SAN so you have a leg up on the game. And ergo, once replicated, it replicates faster, cheaper and more safely too.
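To put rough numbers on that argument, here is a minimal sketch of the usual series/parallel availability arithmetic. The availability figures are made up purely for illustration (they are not measurements of any product), and the model assumes failures are independent and that each replicated "stack" is simply a copy of the single one.

```python
from math import prod


def series(*parts: float) -> float:
    """Every part must be up for the stack to be up, so availabilities multiply."""
    return prod(parts)


def replicated(node: float, copies: int = 2) -> float:
    """The service is up if at least one of `copies` independent nodes is up."""
    return 1 - (1 - node) ** copies


# Hypothetical availability figures, chosen only to illustrate the shape of the math.
SERVER = 0.999    # a server with its own local disks
NETWORK = 0.9995  # the storage network between server and SAN
SAN = 0.999       # the SAN chassis itself

local_single = SERVER                        # local storage: nothing extra in the chain
san_single = series(SERVER, NETWORK, SAN)    # SAN: server AND network AND SAN must all work

local_replicated = replicated(local_single)  # two servers with replicated local storage
san_replicated = replicated(san_single)      # two full server+network+SAN stacks

print(f"single, local storage:     {local_single:.6f}")
print(f"single, SAN:               {san_single:.6f}")
print(f"replicated, local storage: {local_replicated:.6f}")
print(f"replicated, SAN:           {san_replicated:.6f}")
```

Whatever figures you plug in, adding components in series can only lower the starting number, which is why the replicated SAN design has to work harder to reach the same result.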

Just as SAN replication is unique to the SAN in question, local storage replication is unique to the local storage in question. If you have XenServer, for example, local storage replication is provided by DRBD. If you are on KVM, same thing. If you are on Hyper-V, you use Starwind. If you are on VMware you lose the best features; it's just not up to par with its competitors. But you can do Starwind for two nodes and VSAN at high cost.

It's the same tools that we used to make server clusters more reliable than SAN in the pre-virtualization world.

The important thing to remember is that SAN is nothing more than your local disk hanging off of a network connection. So once we know that, we know that SAN cannot add reliability, it only takes it away.

More on RLS or Replicated Local Storage.

Of course you can do things like use mainframe-level RAS technologies to make a SAN highly available in a single chassis. But any technology that does this comes from the server side and is cheaper there, so you could apply it to local storage as well. Apples to apples, local remains more reliable.

antonit

I love SAN, but hate that it's pricey when you want HA. Even with VSAN in VMware (being local storage), you still need it replicated to have full HA. It's always a tricky one when it comes to how much $$$ you're willing to spend.

scottalanmiller

@antonit said:

I love SAN, but hate that it's pricey when you want HA. Even with VSAN in VMware (being local storage), you still need it replicated to have full HA. It's always a tricky one when it comes to how much $$$ you're willing to spend.

That VSAN approach for RLS is free in Hyper-V, XenServer and KVM, though. Only VMware lacks that functionality for free.

scottalanmiller

Even on physical systems, Linux and the BSD family do replicated local storage for free as part of the operating system functionality.

antonit

@scottalanmiller I was talking more on a virtual, local storage level. As I'm a VMware guy, I tend to steer towards their technologies. Didn't know about all those other offerings, though. Thanks for pointing them out!

scottalanmiller

@antonit said:

As I'm a VMware guy, I tend to steer towards their technologies.

Think of the ecosystem like this... if someone offers it for VMware, it is probably built in to everything else. The degree to which VMware lacks the basics is mind boggling, especially when you find out it isn't free and everyone else is. It's not that the technology is bad. It isn't, it's excellent. It's just so amazingly lacking. It's crazy that VMware can't do things like software RAID and Replicated Local Storage, which its competitors had from day one, over a decade ago.

bbigford

I've read your article on the Inverted Pyramid of Doom (nice) in the past. I was thinking about something the other day when someone told me about their SAN having dual controllers, one for failover. But I cannot remember what you said about dual controllers... In a Mango thread sometime in the past, someone posted, "Well we have some redundancy, we have dual controllers." But you had commented, "Well dual controllers doesn't mean redundancy, because if one controller fails..." -bleh-. Can't remember how you finished a sentence like that. Help me out here? (The obvious thing here is that one box by itself is a single point of failure. Regardless of what kind of resources you put in it, it's still one box.) But there was something specifically about a dual controller setup that sales people throw out there that you spoke to, and it was pretty killer.

travisdh1

@BBigford Are you remembering those cases of fake dual controllers? What I mean by fake dual controllers is that some systems advertised dual controllers, but when one of the controllers went down it would take the other one with it as well. In those cases dual controllers actually made the system much less stable than a single controller would have. The one example I remember off the top of my head is Dell VRTX, but I know some SANs have had the same issue.

bbigford

@travisdh1 I do remember that one but I thought I had read SAM say something else. Maybe I'm just crazy. I'm probably crazy. It could very well have been that though. That feature was completely misleading and criminal to even put on a feature sheet.

scottalanmiller

@BBigford said:

I've read your article on the Inverted Pyramid of Doom (nice) in the past. I was thinking about something the other day when someone told me about their SAN having dual controllers, one for failover. But I cannot remember what you said about dual controllers... In a Mango thread sometime in the past, someone posted, "Well we have some redundancy, we have dual controllers." But you had commented, "Well dual controllers doesn't mean redundancy, because if one controller fails..." -bleh-. Can't remember how you finished a sentence like that. Help me out here? (The obvious thing here is that one box by itself is a single point of failure. Regardless of what kind of resources you put in it, it's still one box.) But there was something specifically about a dual controller setup that sales people throw out there that you spoke to, and it was pretty killer.

It's the straw house problem. Tightly coupled controllers in a single box kill each other rather than protecting each other. If you have a fire coming, having redundant straw houses right next to each other for redundancy is pretty silly - not only will the same fire cause them both to burn, the fire at one house will set the next on fire.

And then the second problem.... it's just one component. Controllers don't normally fail. That's why it's the one thing not redundant in a normal server - because it is a pointless place to have redundancy because that's not how you get HA.

And the third problem - high end servers DO have redundant controllers, so the SAN having it only means that the SAN is "as good" as the server but since the SAN is external and extra, it is not better, it's worse.

http://mangolassi.it/topic/6190/redundancy-is-never-a-goal-reliability-is-a-goal-redundancy-is-a-tool
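The straw house problem can be sketched as a tiny probability model. This is only an illustration under stated assumptions: `p` is a made-up chance that one controller fails on its own in some period, and `coupling` is a hypothetical chance that a failing controller takes its partner down with it. Neither number comes from any vendor or from this thread.

```python
def pair_failure(p: float, coupling: float) -> float:
    """Chance that a dual-controller box loses BOTH controllers.

    p        -- assumed chance that one controller fails on its own
    coupling -- assumed chance that a failing controller kills its partner
                (0.0 = truly independent, 1.0 = always takes the other down)
    """
    both_fail_independently = p * p
    one_fails_and_kills_partner = 2 * p * (1 - p) * coupling
    return both_fail_independently + one_fails_and_kills_partner


P = 0.01  # hypothetical single-controller failure chance

print(f"single controller:         {P:.4f}")
print(f"pair, independent:         {pair_failure(P, 0.0):.4f}")
print(f"pair, half the time fatal: {pair_failure(P, 0.5):.4f}")
print(f"pair, tightly coupled:     {pair_failure(P, 1.0):.4f}")
```

In this toy model the pair only beats a single controller while `coupling` stays below one half. Once a failing controller reliably takes its partner with it, the second controller mostly adds another thing that can start the fire.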

scottalanmiller

@travisdh1 said:

@BBigford Are you remembering those cases of fake dual controllers? What I mean by fake dual controllers is that some systems advertised dual controllers, but when one of the controllers went down it would take the other one with it as well. In those cases dual controllers actually made the system much less stable than a single controller would have. The one example I remember off the top of my head is Dell VRTX, but I know some SANs have had the same issue.

It's not a rare thing where an example has to be sought out. Every dual controller system under a certain price point and essentially all but a handful of vendors do the "tightly coupled" controllers. Every product in the SMB price range is that way.

HDS and EMC make a few true active/active controller units. But if you are buying them, you could have bought a dual controller server instead, for less money and with more power and more reliability. Even active/active doesn't make sense, it just makes it reliable as a single object.

scottalanmiller

@BBigford said:

@travisdh1 I do remember that one but I thought I had read SAM say something else. Maybe I'm just crazy. I'm probably crazy. It could very well have been that though. That feature was completely misleading and criminal to even put on a feature sheet.

As long as they only called it redundant. Most IT people don't care about reliability, they just want redundancy. So why not sell it to them?

bbigford

@scottalanmiller said:

@BBigford said:

@travisdh1 I do remember that one but I thought I had read SAM say something else. Maybe I'm just crazy. I'm probably crazy. It could very well have been that though. That feature was completely misleading and criminal to even put on a feature sheet.

As long as they only called it redundant. Most IT people don't care about reliability, they just want redundancy. So why not sell it to them?

I can't remember if they were labeled as redundant and it being a straight up lie, or if they were marketed as dual controllers and it being implied, which is basically misleading...

scottalanmiller

@BBigford said:

@scottalanmiller said:

@BBigford said:

@travisdh1 I do remember that one but I thought I had read SAM say something else. Maybe I'm just crazy. I'm probably crazy. It could very well have been that though. That feature was completely misleading and criminal to even put on a feature sheet.

As long as they only called it redundant. Most IT people don't care about reliability, they just want redundancy. So why not sell it to them?

I can't remember if they were labeled as redundant and it being a straight up lie, or if they were marketed as dual controllers and it being implied, which is basically misleading...

Redundant is the correct term. Redundant means nothing, it only means there are two of them. Like you have redundant seat belts in your car. Only one is useful, but it doesn't stop the term from being correct. In the UK, being redundant implies you are useless and being fired. The idea that redundant means more reliable is purely made up by IT people and is one of those things that IT does to aid marketers. The marketer says something useless, like redundant, and the IT person just assumes that the marketer meant to say reliable, so in their head they replace the term. But redundant suggests nothing of the sort. It's not lying in the least, the lie is that redundancy has value.

scottalanmiller

They are truly dual controllers. There is nothing misleading. The only thing that would be misleading is if someone says that dual controllers and/or redundancy gives you high availability. That's the misleading part.

bbigford

@scottalanmiller said:

They are truly dual controllers. There is nothing misleading. The only thing that would be misleading is if someone says that dual controllers and/or redundancy gives you high availability. That's the misleading part.

When it really boils down, I think maybe the problem is not in the marketing, but in the assumption by the buyer about what they are receiving without asking all the qualifying questions.

scottalanmiller

@BBigford said:

When it really boils down, I think maybe the problem is not in the marketing, but in the assumption by the buyer about what they are receiving without asking all the qualifying questions.

Yup. I help IT people with this all of the time. I hear it constantly. "But they told me this...." Then I say "Did they really? That would be lying, what did they actually say?"

Then it turns out, nearly every time, that the sales person said something factual, but not positive about the product. Then the IT person, certain that the sales person meant to say something positive, corrected what they said "in their head" and made it so that they heard something positive that was never stated or implied at all.

It's like a word problem, something else IT people often struggle with. Read many posted IT questions. If you look, you'll often see a question about an application configuration issue or a license mistake but they will post it to a virtualization forum and put a title that only mentions things that aren't related to the issue and have a huge description with the problem hidden in it.

Why do people do this? Because they are confused and cannot decipher what is and isn't relevant and what it means. If IT people so often have problems expressing what matters when they are the ones telling the problem, imagine what a problem it is when hearing a description!

All marketing has to do is say enough stuff, just random stuff, and the IT Pros will, much of the time, hear something close enough to something that they wanted to hear that they will fill in the gaps!

scottalanmiller

Examples of negatives or neutrals that sales people use and IT people turn into positives....

It's Redundant:

IT Hears: It's highly reliable.
What it actually means: It costs more because you are double spending!

It has a large ecosystem

IT Hears: It's so popular everyone makes software for it.
What it actually means: The product is so lacking in features that a market has sprung up fixing it!

It's Closed Source

IT Hears: It comes with support.
What it actually means: You are a hostage to the vendor and there is no incentive to provide good support!

bbigford

@scottalanmiller said:

Examples of negatives or neutrals that sales people use and IT people turn into positives....

It's Redundant:

IT Hears: It's highly reliable.
What it actually means: It costs more because you are double spending!

It has a large ecosystem

IT Hears: It's so popular everyone makes software for it.
What it actually means: The product is so lacking in features that a market has sprung up fixing it!

It's Closed Source

IT Hears: It comes with support.
What it actually means: You are a hostage to the vendor and there is no incentive to provide good support!

It's Closed Source: I definitely agree with you there.

It's Redundant: We're just talking about controllers still, right? (Thinking about clusters).

It has a large ecosystem: I didn't really understand that one, with the market springing up to fix it. A large ecosystem to me would mean there are tons of different devices available for development on the platform, and tons of development going on. Like OpenStack would be something I would call a fairly large ecosystem...

scottalanmiller

@BBigford said:

It's Redundant: We're just talking about controllers still, right? (Thinking about clusters).

No, the term means "two of something". Nothing more. If you feel the term redundancy means something positive, there is a misunderstanding.

Redundancy isn't inherently bad, but it is also not inherently good. The term carries no such connotation.