New Infrastructure to Replace Scale Cluster

DustinB3403

If you are familiar with Hyper-V and require hyperconvergence, why not use Hyper-V and StarWind vSAN?

It's absolutely free and scalable. Support may be a bit more difficult to arrange, but I'm sure the support costs are reasonable.

scottalanmiller

@Dashrender said in New Infrastructure to Replace Scale Cluster:

@mroth911 said in Ovirt:

@DustinB3403 Yes, that is correct. I already have a Scale 1150 cluster, but it's 3 years old. Hard drives are failing, and I can't manage it at all. It just runs, and that's it.

I'm afraid that if it craps out, I am screwed.

Why can't you manage the Scale cluster? The demos I've seen make it look super simple. It didn't seem that hard to manage.

If it's failing drives, buy the drives direct from the manufacturer. Done.

I'm not sure that you can do that, that is, buy drives from the manufacturer. Scale hardware is not generic, and you can't just slap anything in there.

scottalanmiller

@DustinB3403 said in New Infrastructure to Replace Scale Cluster:

@Dashrender said in Ovirt:

If it's failing drives, buy the drives direct from the manufacturer. Done.

Exactly my thought, and my question relates directly to the obvious answer. Scale uses hardware RAID, so replacing the drives should be the easy part.

Scale does not use hardware RAID. In fact, it uses neither hardware nor RAID. It is software RAIN.

But replacing drives is super simple. Acquiring properly firmwared drives might not be.

scottalanmiller

@mroth911 said in New Infrastructure to Replace Scale Cluster:

I would build it and know the bones and how it functions. Besides the hardware, I will know how to fix things if it breaks.

It would function the same as the Scale, just with some different shared storage solution instead of SCRIBE. The hardware will be essentially identical, Scale was just standard Dell servers at the time.

Basically, you'd be building another cluster of a similar era using the same components as the Scale you have now, but needing loads of special knowledge, though with a much easier ability to replace parts.

scottalanmiller

@mroth911 said in New Infrastructure to Replace Scale Cluster:

@JaredBusch It's all me at my company.

What's their plan if you are sick, hit by a bus, get a better offer elsewhere, etc.?

With the Scale, they pick up the phone and get 100% support instantly. With a build-it-yourself solution, sure, they could call some of us, and hey, we'd love that. But it's not quite the same as having primary vendor support for the entire stack instantly.

DustinB3403

@scottalanmiller said in New Infrastructure to Replace Scale Cluster:

@mroth911 said in New Infrastructure to Replace Scale Cluster:

@JaredBusch It's all me at my company.

What's their plan if you are sick, hit by a bus, get a better offer elsewhere, etc.?

With the Scale, they pick up the phone and get 100% support instantly. With a build-it-yourself solution, sure, they could call some of us, and hey, we'd love that. But it's not quite the same as having primary vendor support for the entire stack instantly.

I'm assuming this is a one-man band.

scottalanmiller

@DustinB3403 said in New Infrastructure to Replace Scale Cluster:

Posted for formatting

@mroth911 said in Ovirt:

@DustinB3403 said in Ovirt:
So to ask a few questions.

What about the Scale system are you unable to support?

Here's a situation I had: a hard drive failed, and the system wouldn't recognize the new hard drive I put in. So I had to call them to do something in the backend to activate port 2, re-analyze it, and make the drive active.

Yeah, this is where I think you get stuck. Third party hardware is unlikely to work in the Scale, even at the hard drive level. Because the system does firmware management, if the drive doesn't match perfectly it will not use it, AFAIK. Hence his fear and struggle to support the hardware.

scottalanmiller

@Dashrender said in New Infrastructure to Replace Scale Cluster:

I know Scott does both with Hostadillo, but he doesn't use his own hardware; he offloads that to Vultr. It's not worth his time and effort to manage the hardware. He resells Vultr (or other) hosting while also selling web development.

That's true, but that's also because of scale (web hosting doesn't need a lot of costly features like big storage), and because it does need crazy bandwidth. So things like Vultr are tuned perfectly for that workload.

His workload might be polar opposite. Say he is running NextCloud, that works extremely poorly on any public cloud and is easily cheaper to buy your own servers for.

DustinB3403

Found it. @scottalanmiller, in a post somewhere, @mroth911 said he has 24 cPanel systems that can't go offline.

He's hosting websites of some sort locally.

scottalanmiller

@mroth911 said in New Infrastructure to Replace Scale Cluster:

To ask the question again, does your own personal business actually require HA? Or would Near-HA be good enough?

I would like the availability so that if one server goes off, the whole thing doesn't shit the bed.

So that's reasonable, to a point. But here is what I always say...

This should never be a "I would like" or "I want". It should be numbers, and numbers only. HA is always a math decision. How much risk do you have of downtime? How much does downtime cost? What is the downtime mitigation path(s)? How much does HA cost?

If the cost of downtime doesn't outweigh the cost of HA, you don't do it. If it does, you do. It's that simple (the math is hard; the resulting decision is simple). There is never a place for an emotional view of wanting or not wanting HA. In a lab you might want HA because you want to learn and support HA systems, and that's fine. But once we are talking production, math is our sole tool for deciding the direction.
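To make "it's a math decision" concrete, here is a minimal sketch of that comparison. The model and every number in it are hypothetical illustrations, not figures from this thread: it assumes downtime cost is simply outages × hours × cost per hour, and that HA removes only part of the downtime, since HA brings its own risks.

```python
from dataclasses import dataclass


@dataclass
class HAScenario:
    """Hypothetical yearly inputs; all values are illustrative assumptions."""
    outages_per_year: float      # expected outages per year without HA
    hours_per_outage: float      # expected downtime hours per outage
    cost_per_hour: float         # business cost of one hour of downtime
    ha_residual_fraction: float  # share of downtime HA still leaves (HA is never perfect)
    ha_annual_cost: float        # extra yearly cost of HA: hardware, licences, support, labour


def expected_downtime_cost(s: HAScenario) -> float:
    """Expected yearly cost of downtime without HA."""
    return s.outages_per_year * s.hours_per_outage * s.cost_per_hour


def ha_is_worth_it(s: HAScenario) -> bool:
    """HA pays off only if the downtime cost it avoids exceeds what it costs."""
    avoided = expected_downtime_cost(s) * (1 - s.ha_residual_fraction)
    return avoided > s.ha_annual_cost


# Same outage profile and HA cost, different value of an hour of uptime.
small_shop = HAScenario(2, 8, 500, 0.25, 10_000)
busy_shop = HAScenario(2, 8, 2_000, 0.25, 10_000)
print(expected_downtime_cost(small_shop), ha_is_worth_it(small_shop))
print(expected_downtime_cost(busy_shop), ha_is_worth_it(busy_shop))
```

With identical hardware and outage rates, the decision flips purely on what an hour of downtime costs the business, which is the point: the inputs are hard to estimate, but the decision that falls out of them is mechanical.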

It sounds like the decision processes behind the original Scale, which can't be supported now, are playing out again, and that's what we are trying to protect against. When that Scale was purchased, the money to fully fund it (maintaining support until end of life) was not guaranteed, and now you are looking to retire a good cluster at half its lifespan, making its "per year" cost astronomically higher than it should have been. And now you're starting down a path that may be cheaper, but is still potentially much more expensive than necessary, and doesn't really sound like it makes sense.
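The "per year" point is simple amortization. A minimal sketch, assuming a hypothetical purchase price and a planned life of about six years (implied by calling a 3-year-old cluster "half its lifespan"); neither figure comes from the thread, and ongoing costs are ignored:

```python
def cost_per_year(purchase_price: float, years_in_service: float) -> float:
    """Capital cost spread over the years the hardware is actually used."""
    return purchase_price / years_in_service


def early_retirement_penalty(planned_years: float, actual_years: float) -> float:
    """How many times higher the yearly cost is when retired early (price cancels out)."""
    return planned_years / actual_years


price = 60_000  # hypothetical purchase price, for illustration only
print(cost_per_year(price, 6))   # kept for the full planned life
print(cost_per_year(price, 3))   # retired at half its lifespan
print(early_retirement_penalty(6, 3))
```

Retiring at the halfway point doubles the yearly capital cost no matter what was paid, before counting the cost of whatever replaces it.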

Servers with warranties are cheaper than multiple servers. The rule of thumb is, you never get HA without good support warranties. If you even question having a warranty, you've already ruled out HA as making sense.

scottalanmiller

@mroth911 said in New Infrastructure to Replace Scale Cluster:

@Dashrender said in Ovirt:

I do both, and it just seems easier when I have to build an application or do custom dev: I can just spin up a machine in my own environment and do what I need to do. Plus, I have all this equipment already.

I actually often find the opposite. It's so easy to spin up on cloud. But that's neither here nor there. The decision around that is based on other factors.

scottalanmiller

@mroth911 said in New Infrastructure to Replace Scale Cluster:

Running web servers, PBX servers, Jira server, DCs

DCs are not good for a cloud service, but are easy to handle without all of this complexity as they do not use failover. Probably good to start yet another thread and discuss why you have Active Directory in an environment such as this. We don't know enough to say, but it sounds like they might not make sense based on other factors.

Web Servers - Absolutely Ideal for Cloud, definitely get those there
Jira - Just a web server, see above
PBX - Again, ideal for cloud in nearly all cases.

Sounds like you have nearly an ideal setup for cloud, and a really bad one for on premises. Combine that with a fast WAN link, and you are in even better shape.

scottalanmiller

@DustinB3403 said in New Infrastructure to Replace Scale Cluster:

@mroth911 said in Ovirt:

To ask the question again, does your own personal business actually require HA? Or would Near-HA be good enough?

I would like the availability so that if one server goes off, the whole thing doesn't shit the bed.

Running web servers, PBX servers, Jira server, DCs

and some other custom stuff for doctors to get into.

Okay, so you can do this with 2, 3, or 100 hosts and not need hyperconvergence. Pooling in Hyper-V, XCP-ng, and even ESXi (though that costs licensing) all allow your VMs to migrate in the event of an outage.

But not transparently, they go through an outage before migrating.

scottalanmiller

@mroth911 said in New Infrastructure to Replace Scale Cluster:

@DustinB3403 said in Ovirt:

XCP-ng

I really want a product that works and can be put into production. I have 24 cPanel servers that I can't have go offline.

Okay, now I'm lost. Why do you have cPanel internally? Can't go offline for whom? You have internal users on your LAN using cPanel? This seems... really odd.

Also, why is this your reaction to XCP? You pretty much just stated why XCP is good, but stated it as if it is your reason for not using it.

scottalanmiller

@DustinB3403 said in New Infrastructure to Replace Scale Cluster:

@mroth911 said in Ovirt:

@DustinB3403 said in Ovirt:

XCP-ng

I really want a product that works and can be put into production. I have 24 cPanel servers that I can't have go offline.

You're still looking at this with brown colored glasses.

First, you're wanting to set up something you've never used for production right out of the gate. Granted, there is a lot of documentation, but that same volume of documentation can make setting this up and managing it difficult.

oVirt is powerful, but you have 3 hosts. That means if you lose power, your systems go offline anyway. All 24 cPanel servers down.

The same thing would occur with any on-premises solution: a power, internet, or switching issue.

The idea that XCP-ng (or ESXi or Hyper-V or XenServer) isn't production ready is weird when you have access to numerous choices for support. Only with ESXi is support more difficult to obtain freely.

Assuming he has a datacenter with dual HVAC, dual power, dual WAN, and all the right infrastructure and generators, he might be fine. But that doesn't match his "can't afford warranty support" for small units.

scottalanmiller

@DustinB3403 said in New Infrastructure to Replace Scale Cluster:

@mroth911 said in Ovirt:

@DustinB3403 said in Ovirt:

XCP-ng

I really want a product that works and can be put into production. I have 24 cPanel servers that I can't have go offline.

You're still looking at this with brown colored glasses.

First, you're wanting to set up something you've never used for production right out of the gate. Granted, there is a lot of documentation, but that same volume of documentation can make setting this up and managing it difficult.

oVirt is powerful, but you have 3 hosts. That means if you lose power, your systems go offline anyway. All 24 cPanel servers down.

The same thing would occur with any on-premises solution: a power, internet, or switching issue.

The idea that XCP-ng (or ESXi or Hyper-V or XenServer) isn't production ready is weird when you have access to numerous choices for support. Only with ESXi is support more difficult to obtain freely.

Support for oVirt isn't cheap, either. And getting it requires that you not build it yourself. Production support for oVirt requires buying RHEV, which is really expensive.

scottalanmiller

@DustinB3403 said in New Infrastructure to Replace Scale Cluster:

@mroth911 said in Ovirt:

I have 24 cPanel servers that I can't have go offline.

This statement here leads me to think that you need to purchase support. Period. Or host these in the cloud.

On-premise is the "I can accept some downtime option".

No, on premise is "I expect some downtime."

scottalanmiller

@mroth911 said in New Infrastructure to Replace Scale Cluster:

I have multiple ips, 2 generators, on LP.

Not IPs, but ISPs, I assume you mean.

That stuff is awesome, but that sounds like a hella lot of money to spend to avoid saving money on cloud. How can you afford all of that stuff, but not basic support on small systems? On one hand, it sounds like you have money pouring out of your ears, on the other, there isn't as much as I have at home.

DustinB3403

@scottalanmiller said in New Infrastructure to Replace Scale Cluster:

@mroth911 said in New Infrastructure to Replace Scale Cluster:

I have multiple ips, 2 generators, on LP.

Not IPs, but ISPs, I assume you mean.

That stuff is awesome, but that sounds like a hella lot of money to spend to avoid saving money on cloud. How can you afford all of that stuff, but not basic support on small systems? On one hand, it sounds like you have money pouring out of your ears, on the other, there isn't as much as I have at home.

I'd take the other approach.

I'm spending so much money on these services for my business that I can't afford to spend money on supporting my infrastructure.

scottalanmiller

@mroth911 said in New Infrastructure to Replace Scale Cluster:

I get everyone's view on this. However, I have a ton of equipment, and I already have 5-year contracts for fiber. Cancelling those contracts to put my stuff in the cloud would cost me 20K per ISP. So while I have all this equipment here collecting dust, I can use it to make sure I stay up and running.

Kind of. I get it, money has been spent. I guess this is starting to explain why the money has run out. I'm assuming that at some point there was lots of funding, people bought gear willy-nilly, used up all of the money, and now it's just you having to make do with the cast-offs of an earlier investment era while the money runs out?

It's a truly bizarre situation. But okay. I still wonder whether everything that must be learned to support it properly will create more risk than you should be carrying. Making your own HC is cool, and it can do a lot. But you really have to get it right and really have to know how to support all of the parts, or even the smallest thing could mean downtime far bigger than if you'd never done it.

Remember that no "HA" system comes without its own risks. Only takes a small accident for HA to spell disaster as all of the pieces are so much more interdependent.