Resurrecting the Past: a Project Post Mortem

scottalanmiller

Ha ha ha. Mighty close. However, we did tell them (the bean counters) that we would need a generator to keep things online, and they said "No, just stick with the UPSes." That move was as much a political thing as it was a money thing.

Then, hopefully, the comeback is "if you don't want to even remotely talk about reliability, why are you spending all this money where it does no good?"

Or "what is the point of IT if arbitrary IT decisions are made without IT oversight?"

dafyre

One thing I will mention, since you like to hear the business side of things as well... We were doing this with the goals that the administration had set before us:

Keep live data in 2 locations -- check, done with Storage Cluster
Keep systems up as much as possible -- check, done with Storage Cluster, VMware features, and Windows Failover Clustering

We made suggestions for having a good generator installed, but were shot down repeatedly. A lot of the shoot-downs involved high-level politics that I just didn't want to get into (I hate politics. Just tell me what needs to be done, and let me get help getting it done).

The decisions were made by the IT team, not just me. So the 4 or 5 of us liked the solution that we picked, and liked it even more after we saw it in action.

dafyre

@scottalanmiller said:

Or "what is the point of IT if arbitrary IT decisions are made without IT oversight?"

There was still a lot of that going on at the time. IT was shown $product and told to make it work with $other_product.... Sometimes this was possible, and other times it was not.

Fortunately, after the fire disaster and once we got things settled in with the SAN, there were few IT decisions made without IT involvement. We made things noticeably better for the campus, so they realized that we weren't terribly stupid.

scottalanmiller

So, for a modern deployment, it sounds like the system is small enough that likely you could go down to two nodes, no external storage, and get full failover, even higher reliability through a reduction of failure points and simplification of design. Cost savings, of course, as you only need two nodes. Performance increase by reducing bottlenecks.

Hyper-V and StarWind do this really well, without even the need for node licensing of any sort!
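The "fewer failure points" argument can be put in rough numbers. The following is a minimal sketch, not a model of the environment discussed in this thread: every availability figure is a made-up assumption, components are assumed to fail independently, and the function names are hypothetical. Components that must all be up multiply their availabilities, while redundant replicas only fail together.

```python
from math import prod


def series_availability(parts):
    """Availability of a chain in which every part must be up."""
    return prod(parts)


def redundant_availability(a, n):
    """Availability of n identical, independent replicas where any one suffices."""
    return 1 - (1 - a) ** n


# Hypothetical figures for illustration only; not taken from the thread.
HOST = 0.999
SWITCH = 0.9995
SHARED_STORAGE = 0.999

# Redundant hosts and switches that all depend on one shared storage unit.
with_external_storage = series_availability([
    redundant_availability(HOST, 2),
    redundant_availability(SWITCH, 2),
    SHARED_STORAGE,
])

# Two hosts mirroring their own local storage, nothing else in the path.
two_node_local = redundant_availability(HOST, 2)

print(f"hosts + switches + shared storage: {with_external_storage:.6f}")
print(f"two nodes, local storage:          {two_node_local:.6f}")
```

Under these assumptions the shorter chain comes out ahead because the shared storage unit caps the whole stack at its own availability. Real numbers depend heavily on the hardware and on how the storage itself is made redundant.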

A Former User

We have all EqualLogic SANs here. Mostly because it was proven that buying many cheap EQ SANs and planning on them to fail was better than buying fewer, more expensive EMC etc. SANs. But we also have these and many sites replicating, plus AppAssure and then Azure SAN Cloud Replication. (Azure is a major part of our DR).

A Former User

I'm curious how many of the people who have a SAN actually need one. We went without one at the county with under 10 servers. The Town had one, but they liked to waste IT budget and mostly used the SAN as a file server, which made no sense.

dafyre

The Modern Deployment as I left it:

3gig fiber link to a new server room, backed by both UPS and generator. 1gbit redundant fiber link, 4 node Scale HC3x cluster (servers w/64GB of RAM) and 7.2GB of storage. HP SAN is mostly retired and VMware servers are no longer in production.

SQL Servers are now virtualized and clustered. I think there are now only 3 physical servers left, and the current team is working to finish that off. There are now 30 VMs, as we have separated roles out heavily so if we need to do Windows updates, it only takes out one service.

Things are much more reliable and available (since @scottalanmiller won't let me say they are more HA) than they have previously been in my 10 years at that job.

scottalanmiller

@thecreativeone91 said:

We have all EqualLogic SANs here. Mostly because it was proven that buying many cheap EQ SANs and planning on them to fail was better than buying fewer, more expensive EMC etc. SANs. But we also have these and many sites replicating, plus AppAssure and then Azure SAN Cloud Replication. (Azure is a major part of our DR).

If you are planning on them to fail, and you have enough scale for them to make things cheaper than local storage, it can be a cost saver.

scottalanmiller

@thecreativeone91 said:

I'm curious how many of the people who have a SAN actually need one. We went without one at the county with under 10 servers. The Town had one, but they liked to waste IT budget and mostly used the SAN as a file server, which made no sense.

I can only imagine that below the enterprise space, it has to be less than 5%. These days a deployment with local storage and a small node count can scale a very long way.

scottalanmiller

But now the other question.... what would we have done in 2007? Today is easy: consolidate and hyperconverge. Done. Easy peasy.

In 2007.....

scottalanmiller

So the big question around 2007: how many hosts would the environment have been consolidated down to with ESX back then? That's a starting point.

dafyre

As a small shop, we started with 16 physical servers and got that number down to 6 physical servers, and could have gone lower if not for RAM constraints... So we got 5:1 consolidation right off the bat.

A Former User

@dafyre said:

As a small shop

I thought you worked for a state college?

scottalanmiller

@dafyre said:

As a small shop, we started with 16 physical servers and got that number down to 6 physical servers, and could have gone lower if not for RAM constraints... So we got 5:1 consolidation right off the bat.

So brownfield constraints limited us to 5:1 and 6 resulting nodes. Okay. Is this three per location, for full failover between the two sites?

dafyre

@scottalanmiller That was the plan. We got hit with budget cuts shortly after that due to the economy crashing and all. We had that on the road map until then. Where we left off in the original deployment was settling for having 2 (VMware) servers in the "main DC" and a third server over in the secondary "dc" (which was really just a closet with a server, one of the SAN nodes, 2 switches, and a UPS that was cooled by a window unit, lol).

By the time we were ready to fix it, our VM usage was enough to max out all 3 of the servers. The plan in the event of a host failure was to make sure the critical services stayed up.

By the end of the VMware deployment, we were around 8:1 (24 VMs across 3 servers).
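The 8:1 figure, and why a host failure meant keeping only critical services up, comes down to simple arithmetic. A minimal sketch, using the 24 VMs and 3 hosts quoted above; the function names are hypothetical and the VMs are treated as equally sized, which is an assumption for illustration:

```python
def consolidation_ratio(vms, hosts):
    """Average number of VMs per physical host."""
    return vms / hosts


def vms_per_surviving_host(vms, hosts, failed=1):
    """Average VMs each remaining host would carry if every VM were restarted."""
    survivors = hosts - failed
    if survivors <= 0:
        raise ValueError("no hosts left to carry the load")
    return vms / survivors


print(consolidation_ratio(24, 3))      # 8.0
print(vms_per_surviving_host(24, 3))   # 12.0
```

With all three hosts already maxed out at 8 VMs each, the two survivors would each have to carry 12, which is why only the critical services could be brought back after a failure.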

dafyre

@thecreativeone91 said:

@dafyre said:

As a small shop

I thought you worked for a state college?

That is my current job. 8-)

The deployments I am talking about are from a previous employer.