Is this server strategy reckless and/or insane?

creayt

I have 2 servers. Other than one having 4 more processor cores total, the servers are identical. Specs are:

R620
a: 2x octacore xeon, b: 2x decacore xeon
128GB ram each
1GB Perc H710P RAID controller
2x 256GB Samsung 850 Pros ( Os and installs live here, Raid 1 )
5x 1TB Samsung 850 Pros ( Data and file uploads live here, Raid 0 ) ( can add 1 more on the decacore and up to 3 more on the octacore later if desired, but this RAID controller gets pretty saturated at 4-5 from what I've read )

My question is this: I like the benefit of not having to leave the box to go from app code to database, since the goal of this project is for it to feel as instant and fast as possible. So my plan is to configure the servers like so:
Full serverware stack on each ( IIS, app server, MySQL )
One of the two will be the MySQL master and replicate to the other ( all writes will go to this server )
The other server will be the image upload and processing box, and final images will all be copied to the other server in the background
The two machines will be clustered for all web traffic save those two uses ( db writes and file uploads )
Both will be ready, w/ just a few minutes downtime, to take over the full workload should the other fail, which includes a disk in either Raid 0 failing

The plan would be
If any piece of the DB master server fails, it drops out of the picture until I can get the failure resolved, and the 2nd server takes over all duties. The only piece I can't automate is the switch from slave to master, which I would handle manually and up to an hour of downtime is acceptable

If any piece of the image processing server fails, all traffic would automatically go to the other server until I can resolve the failure, and no perceptible downtime would occur
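
To make the plan above concrete, here is a minimal sketch of the role assignment it describes. The labels "A" (MySQL master) and "B" (image upload box) and every name in the code are made up for illustration. It models the plan as first posted, where promoting the slave to master is a manual step.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical labels: "A" is the MySQL master, "B" is the image upload box.
DB_MASTER = "A"
IMAGE_BOX = "B"


@dataclass(frozen=True)
class Roles:
    web: Tuple[str, ...]        # servers that take general web traffic
    db_writes: Optional[str]    # server that receives all database writes
    uploads: Optional[str]      # server that handles image uploads
    manual_promotion: bool      # True if the slave must be promoted by hand


def assign_roles(master_up: bool, image_box_up: bool) -> Roles:
    """Return who does what, given which of the two servers is healthy."""
    if master_up and image_box_up:
        # Normal operation: web traffic is shared, writes and uploads are split.
        return Roles((DB_MASTER, IMAGE_BOX), DB_MASTER, IMAGE_BOX, False)
    if master_up:
        # Image box failed: the master absorbs everything automatically.
        return Roles((DB_MASTER,), DB_MASTER, DB_MASTER, False)
    if image_box_up:
        # Master failed: the survivor takes over once promoted to master.
        return Roles((IMAGE_BOX,), IMAGE_BOX, IMAGE_BOX, True)
    # Both down: nothing to route to.
    return Roles((), None, None, False)


if __name__ == "__main__":
    for state in [(True, True), (True, False), (False, True), (False, False)]:
        print(state, assign_roles(*state))
```

The only case that needs a human in this sketch is losing the master, which matches the "up to an hour of downtime" scenario.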

The only thing running on these servers will be a hobby project I made, and I'm ok w/ a little downtime in the event of a presumably unlikely hardware failure.

What do you think? Is this a completely unorthodox approach? I like the idea of most web site requests being able to go through either server so I can make use of the horsepower of both. My goal is to make the fastest web site I've ever used, so keeping the db and the app code that touches it on the same machine is ideal for me, as is using high-performance-within-my-budget techniques like a Raid 0 of SSDs.

Let me know what you think. I'm a programmer, not a server pro, so there may be a ton of negatives I haven't thought of in this setup.

Thanks!

Dashrender

I assume you don't care about the data on the RAID 0?

Dashrender

You're already saturating your RAID controller, so why not toss one more drive into each server and make them RAID 5 (assuming these are SSDs)? That's one less thing to rebuild, restore, or resync if you have a drive failure.

scottalanmiller

That's pretty common. Master to slave automated, manual fail back. Works fine as long as you are around most of the time.

creayt

I was wrong, it looks like you can fully automatically fail over to the slave and set it as the new master w/ the latest MySQL set up, so that makes the decision a bit easier.

As far as Raid 5 instead of 0, I'd thought that the performance of Raid 5 was absolutely terrible and that almost no one used it anymore, is that a wrong memory?

DustinB3403

@creayt said in Is this server strategy reckless and/or insane?:

I was wrong, it looks like you can fully automatically fail over to the slave and set it as the new master w/ the latest MySQL set up, so that makes the decision a bit easier.

As far as Raid 5 instead of 0, I'd thought that the performance of Raid 5 was absolutely terrible and that almost no one used it anymore, is that a wrong memory?

No one uses RAID5 with spinning rust.

RAID5 is perfectly acceptable with SSDs

creayt

@dashrender I care about it, but because it's automatically replicated after each write there's a fully up-to-date, ready-to-go backup of it the next U down at all times. Could/would also push nightly backups offsite somewhere I suppose.

Looks like Raid 5 for SSDs can also, possibly, shorten their lifespan because of the parity writes: https://serverfault.com/questions/513909/what-are-the-main-points-to-avoid-raid5-with-ssd

scottalanmiller

@creayt said in Is this server strategy reckless and/or insane?:

As far as Raid 5 instead of 0, I'd thought that the performance of Raid 5 was absolutely terrible and that almost no one used it anymore, is that a wrong memory?

RAID 5 is the standard for SSDs. But you will take performance hits. But whether or not you can tell is the question. On an all flash array with caching, the hit is pretty small.

scottalanmiller

@creayt said in Is this server strategy reckless and/or insane?:

@dashrender I care about it, but because it's automatically replicated after each write there's a fully up-to-date, ready-to-go backup of it the next U down at all times. Could/would also push nightly backups offsite somewhere I suppose.

Looks like Raid 5 for SSDs can also, possibly, shorten their lifespan because of the parity writes: https://serverfault.com/questions/513909/what-are-the-main-points-to-avoid-raid5-with-ssd

Yes, but with enterprise drives and cache buffering, that's trivial. You are typically looking at decades before failure.
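
As a rough illustration of the endurance arithmetic, here is a back-of-the-envelope sketch. Every number in it is a placeholder, not a datasheet value: the rated endurance (TBW) and the daily write volume are assumptions you would replace with your own figures. The factor of 2 for RAID 5 is the worst case for small writes (one data write plus one parity write per logical write), and the sketch ignores controller caching and the SSD's internal write amplification.

```python
def years_of_endurance(rated_tbw: float,
                       array_writes_gb_per_day: float,
                       drives: int,
                       parity_write_factor: float = 1.0) -> float:
    """Years until each drive reaches its rated terabytes written (TBW).

    Assumes writes are spread evenly across all drives in the array.
    parity_write_factor is 1.0 for RAID 0 and up to 2.0 for RAID 5
    small writes (data block plus parity block).
    """
    if rated_tbw <= 0 or array_writes_gb_per_day <= 0 or drives < 1:
        raise ValueError("inputs must be positive")
    per_drive_gb_per_day = array_writes_gb_per_day * parity_write_factor / drives
    per_drive_tb_per_year = per_drive_gb_per_day * 365 / 1000
    return rated_tbw / per_drive_tb_per_year


if __name__ == "__main__":
    # Placeholder inputs: check the drive's spec sheet and your real workload.
    tbw = 300.0          # hypothetical rated endurance per drive, in TB
    daily_gb = 100.0     # hypothetical writes to the whole array per day
    print("RAID 0:", round(years_of_endurance(tbw, daily_gb, 5, 1.0), 1), "years")
    print("RAID 5:", round(years_of_endurance(tbw, daily_gb, 5, 2.0), 1), "years")
```

Under these assumptions, the worst-case parity overhead halves the endurance estimate, so whether it matters depends on how far the workload is from the rated limit to begin with.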

Obsolesce

@scottalanmiller said in Is this server strategy reckless and/or insane?:

@creayt said in Is this server strategy reckless and/or insane?:

@dashrender I care about it, but because it's automatically replicated after each write there's a fully up-to-date, ready-to-go backup of it the next U down at all times. Could/would also push nightly backups offsite somewhere I suppose.

Looks like Raid 5 for SSDs can also, possibly, shorten their lifespan because of the parity writes: https://serverfault.com/questions/513909/what-are-the-main-points-to-avoid-raid5-with-ssd

Yes, but with enterprise drives and cache buffering, that's trivial. You are typically looking at decades before failure.

850 pros are not enterprise drives.

scottalanmiller

@tim_g said in Is this server strategy reckless and/or insane?:

@scottalanmiller said in Is this server strategy reckless and/or insane?:

@creayt said in Is this server strategy reckless and/or insane?:

@dashrender I care about it, but because it's automatically replicated after each write there's a fully up-to-date, ready-to-go backup of it the next U down at all times. Could/would also push nightly backups offsite somewhere I suppose.

Looks like Raid 5 for SSDs can also, possibly, shorten their lifespan because of the parity writes: https://serverfault.com/questions/513909/what-are-the-main-points-to-avoid-raid5-with-ssd

Yes, but with enterprise drives and cache buffering, that's trivial. You are typically looking at decades before failure.

850 pros are not enterprise drives.

Whoops, missed that.

Dashrender

@tim_g said in Is this server strategy reckless and/or insane?:

@scottalanmiller said in Is this server strategy reckless and/or insane?:

@creayt said in Is this server strategy reckless and/or insane?:

@dashrender I care about it, but because it's automatically replicated after each write there's a fully up-to-date, ready-to-go backup of it the next U down at all times. Could/would also push nightly backups offsite somewhere I suppose.

Looks like Raid 5 for SSDs can also, possibly, shorten their lifespan because of the parity writes: https://serverfault.com/questions/513909/what-are-the-main-points-to-avoid-raid5-with-ssd

Yes, but with enterprise drives and cache buffering, that's trivial. You are typically looking at decades before failure.

850 pros are not enterprise drives.

I was too slow to respond. I didn't miss that.

creayt

Let me ask this.

The only thing that'll be stored on each Raid 0/5 is

The MySQL data files ( not the MySQL installation )
and
The image uploads

So if a drive in the Raid 0 fails, I simply replace the drive, recreate the virtual disk, and then copy the database and images, which I think takes just a few minutes w/ two systems of this caliber 1U away from each other especially w/ so many cores to spare ( won't be competing w/ the load of the live site ).

So, since I have to drive an SSD over to the datacenter 10 minutes away, open the box, and get it in, a few more minutes for the copy feels like it'll be negligibly more time than if it failed w/ a Raid 5, where it would stay online ( though I don't know if my set up lets you do the Raid 5 replacement while the OS is running, maybe it does, or maybe I just hot swap the drive I'm not sure ).

So, because the full penalty for a Raid 0 failing vs. a Raid 5 in my set up is basically a few more minutes to copy the stuff manually, seems like the performance improvements would be worth the gamble. Is that logic sound or do y'all think just keeping the array online is better so 5 is the way to go anyway?
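
For a sanity check on the "few more minutes" estimate, here is a rough sketch of the copy-time arithmetic. The thread doesn't state the link speed between the two servers or how much data will be on the array, so both are assumptions here, as is the efficiency factor for protocol overhead.

```python
def copy_minutes(data_gb: float, link_gbps: float, efficiency: float = 0.9) -> float:
    """Minutes to copy data_gb gigabytes over a link_gbps gigabit/s link.

    efficiency is an assumed fraction of the raw link speed that the copy
    actually achieves; it is a guess, not a measured value.
    """
    if data_gb < 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid input")
    gigabits = data_gb * 8                      # bytes to bits
    seconds = gigabits / (link_gbps * efficiency)
    return seconds / 60


if __name__ == "__main__":
    # Hypothetical data sizes and link speeds, for comparison only.
    for data_gb in (100, 1000):
        for link_gbps in (1, 10):
            minutes = copy_minutes(data_gb, link_gbps)
            print(f"{data_gb} GB over {link_gbps} Gbps: {minutes:.1f} min")
```

If the link is the bottleneck rather than the SSDs, the restore time scales with how full the array is, so "a few minutes" holds for a small data set but not necessarily for a nearly full one.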

Obsolesce

Just an FYI:

 
Posted by DELL-Josh Cr on 16 Mar 2015 15:41

Hi,
...if it is not a Dell drive we won’t have put our firmware on it that is designed for our controllers and we will not have validated it....

Thanks,
Josh Craig
Dell EMC | Enterprise Support Services

creayt

@creayt Also forgot to bring up that Raid 0 also gives me way more capacity right so it'd give me terabyte(s) more before I had to scale to extra hardware? Can't remember how much Raid 5 subtracts.
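
On how much Raid 5 subtracts: it costs one drive's worth of capacity for parity, while Raid 0 uses all of it. A quick sketch of the arithmetic for identical drives (the function name is just for illustration):

```python
def usable_capacity_tb(drives: int, drive_tb: float, level: int) -> float:
    """Usable capacity in TB for RAID 0 or RAID 5 built from identical drives."""
    if level == 0:
        if drives < 1:
            raise ValueError("RAID 0 needs at least 1 drive")
        return drives * drive_tb            # no redundancy, all space usable
    if level == 5:
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * drive_tb      # one drive's worth goes to parity
    raise ValueError("only RAID 0 and RAID 5 are handled here")


if __name__ == "__main__":
    for n in (5, 6):
        r0 = usable_capacity_tb(n, 1.0, 0)
        r5 = usable_capacity_tb(n, 1.0, 5)
        print(f"{n} x 1TB: RAID 0 = {r0} TB, RAID 5 = {r5} TB")
```

So with the current 5x 1TB drives the difference is 5 TB versus 4 TB, and adding a sixth drive under Raid 5 gets back to 5 TB usable.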

scottalanmiller

That's not a horrible recovery strategy. But if the question is performance, how much downtime or effort caused by that offsets the performance difference? That's a real question. Will anyone notice the performance difference day to day? Will they notice five minutes or an hour of downtime? Will you notice having to do all of that work that could have been avoided?

Those are the real questions.

DustinB3403

@creayt said in Is this server strategy reckless and/or insane?:

Let me ask this.

The only thing that'll be stored on each Raid 0/5 is

The MySQL data files ( not the MySQL installation )
and
The image uploads

So if a drive in the Raid 0 fails, I simply replace the drive, recreate the virtual disk, and then copy the database and images, which I think takes just a few minutes w/ two systems of this caliber 1U away from each other especially w/ so many cores to spare ( won't be competing w/ the load of the live site ).

So, since I have to drive an SSD over to the datacenter 10 minutes away, open the box, and get it in, a few more minutes for the copy feels like it'll be negligibly more time than if it failed w/ a Raid 5, where it would stay online ( though I don't know if my set up lets you do the Raid 5 replacement while the OS is running, maybe it does, or maybe I just hot swap the drive I'm not sure ).

So, because the full penalty for a Raid 0 failing vs. a Raid 5 in my set up is basically a few more minutes to copy the stuff manually, seems like the performance improvements would be worth the gamble. Is that logic sound or do y'all think just keeping the array online is better so 5 is the way to go anyway?

Keeping the OBR5 online and recovering from that would be faster than having to completely rebuild an OBR0.

creayt

@tim_g What are the implications of this, do you know? For what it's worth none of these drives do the amber light thing in either server, all green and they report as SSDs etc. in the lifecycle tooling.

DustinB3403

@creayt said in Is this server strategy reckless and/or insane?:

@creayt Also forgot to bring up that Raid 0 also gives me way more capacity right so it'd give me terabyte(s) more before I had to scale to extra hardware? Can't remember how much Raid 5 subtracts.

How much storage does this system need?

creayt

@dustinb3403 It's a community style site that's some kind of hybrid between Reddit and something like Mango Lassi, so the more users I get, the more content they'll generate ( mostly in the form of MySQL data ) and the more footprint I'll need, eventually having to go cloud probably if it takes off. But will be a huge volume of small database writes happening pretty much 24/7.