Is this server strategy reckless and/or insane?
-
@storageninja ok smartphone here. Will be ultrashort.
0- really enlightening, thank you!
1- I was thinking about a simple layout for the bench: OS, RAID controller, disks; no hypervisor, no apps on the system like network FS servers and so on.
I tested the machine w/ CentOS with iozone. So my fault: with controller I meant RAID controller. 2- yes, the cache is on the disk controller board.
3- so when my RAID controller card asks me to disable the disks' onboard cache, and performance actually drops a lot on SSD, what actually happens? Is the DRAM still alive?
-
@matteo-nunziati said in Is this server strategy reckless and/or insane?:
@storageninja ok smartphone here. Will be ultrashort.
0- really enlightening, thank you!
1- I was thinking about a simple layout for the bench: OS, RAID controller, disks; no hypervisor, no apps on the system like network FS servers and so on.
I tested the machine w/ CentOS with iozone. So my fault: with controller I meant RAID controller. 2- yes, the cache is on the disk controller board.
3- so when my RAID controller card asks me to disable the disks' onboard cache, and performance actually drops a lot on SSD, what actually happens? Is the DRAM still alive?
Depends on the vendor and the drive, but I would suspect the DRAM cache is still being used (again, to protect endurance); it's just delaying the ACK until the write gets to the lower level. Now, some enterprise drives that have capacitors (so they can protect that DRAM completely on power loss) will sometimes still ACK a write in DRAM anyway (nothing really changes, and it's why those drives can post giant performance numbers). On a drive that has full power-loss protection built in, benching with the cache disabled is dumb: we don't care what the raw NAND can do, we care what the drive can do under a given load. You're better off in this case, if you want to stress the drives, doing two tests:
- 75% write, small block (some drives fall over on mixed workloads).
- 100% sequential write large block (256KB).
Even then, if your workload doesn't look like this (most don't), then it's kinda pointless finding the break points of drives. The point of benchmarking is to make sure a system will handle your workload, not to find its break point.
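A sketch of those two stress tests as fio job files (fio is a common Linux IO benchmark; the device path, runtimes, and queue depths here are placeholder assumptions, so adjust for your hardware):

```ini
; Two fio jobs matching the suggested stress tests.
; WARNING: filename points at a raw device, which is destructive.
; /dev/nvme0n1 is a placeholder path.
[global]
ioengine=libaio
direct=1
runtime=300
time_based
filename=/dev/nvme0n1

; Test 1: 75% random write / 25% read, small block
; (mixed workloads are where some drives fall over)
[mixed-smallblock]
rw=randrw
rwmixwrite=75
bs=4k
iodepth=32
numjobs=4

; Test 2: 100% sequential write, large block (256KB)
[seq-largeblock]
stonewall
rw=write
bs=256k
iodepth=16
```

Run with `fio jobs.fio` and watch latency percentiles, not just throughput, since steady-state latency is where cache exhaustion shows up.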
Most people screw up and accidentally test a cupcake (a DRAM cache for reads somewhere), or try to break it with an unrealistic workload. Outside of engineering labs for SSD drives and storage products, there isn't a lot of use for this.
Another thing to note is that you can capture and replay an existing workload using vSCSI trace capture and one of the VMware storage flings. You can even "accelerate" it or duplicate it several times over. This helps you know what your REAL workload will look like on a platform.
-
Another trend in benchmarking is using stuff like HCIBench or VM Fleet to test LOTS of workloads. A single worker in a single VM doesn't show what contention looks like at scale.
-
@storageninja best meme I've seen in a long time.
-
enterprise drives that have capacitors
This. I asked the reseller about this feature. Their answer: disable the SSD cache anyway and use the controller cache.
The former is a safer choice, while the latter is too new/untested a feature...
-
You're better off in this case, if you want to stress the drives, doing two tests.
I did a test w/ random read and write, simulating a thread per expected user.
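That kind of per-user simulation can be sketched as an fio job too (a hedged example; the thread count, file size, and directory are assumptions to fill in from your expected user count):

```ini
; One thread per expected user, mixed random read/write.
; numjobs, size, and directory are placeholders:
; set numjobs to the expected number of concurrent users.
[per-user-sim]
ioengine=libaio
direct=1
rw=randrw
rwmixread=50
bs=4k
numjobs=50
iodepth=1
size=1g
directory=/mnt/bench
runtime=300
time_based
group_reporting
```

`iodepth=1` per thread models users issuing one request at a time; `group_reporting` aggregates the per-thread results into one summary.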
-
@matteo-nunziati said in Is this server strategy reckless and/or insane?:
This. I asked the reseller about this feature. Their answer: disable the SSD cache anyway and use the controller cache.
The former is a safer choice, while the latter is too new/untested a feature...
To be blunt, the reseller doesn't know what they are talking about. Every enterprise SSD in the modern era (using some sort of FTL) uses this design and has for years. They are configured this way even in big enterprise storage arrays, with the unique exception of Pure Storage, who rewrite their firmware to basically use drives as dumb NAND devices (and then have MASSIVE NVRAM buffers fronting the drives that do the same damn thing at a global level).
Some SDS systems want you to explicitly disable the front cache, as it will coalesce data and prevent data-proximity optimizations in the actual raw data placement. It also exists as yet another place where data can be lost or corrupted, and systems that want to "own" IO integrity end to end want to know where stuff is.
Then again, what do I know...
-
@storageninja said in Is this server strategy reckless and/or insane?:
Then again, what do I know...
According to a vendor, or anyone that's got a clue?
-
@travisdh1 My job is to fly drink and talk primarily
-
@storageninja said in Is this server strategy reckless and/or insane?:
@travisdh1 My job is to fly drink and talk primarily
I don't know how to "fly drink" or to "talk primarily"!
-
@scottalanmiller said in Is this server strategy reckless and/or insane?:
@storageninja said in Is this server strategy reckless and/or insane?:
@travisdh1 My job is to fly drink and talk primarily
I don't know how to "fly drink" or to "talk primarily"!
Your job description includes talking to people on web forums now, doesn't it? Also, when do you stop drinking?
-
@travisdh1 said in Is this server strategy reckless and/or insane?:
Your job description includes talking to people on web forums now, doesn't it? Also, when do you stop drinking?
Fly, Drink, Talk. There you go.
No, hanging out on web forums is not my job.
I actually didn't drink that much this weekend (it was too hot, and I was working on the beach house). My day job involves...
-
Flying to conferences and speaking. I have 11 conference presentations in the next 4 weeks. Crowd sizes are 200-800.
-
Flying to fun places and meeting with people. I'll be in India soon meeting with customers, partners, and SEs, training them, taking questions, and collecting feedback for engineering.
-
Breaking things. Technically I am classified as an R&D employee and have full access to our nightly builds, our BAT private cloud, and a dozen "fully loaded" servers for a lab. I test the new stuff, send feedback through my customer [0] Team, and meet with engineers to capture the subtleties of what's coming out. I don't write the technical publications (core documentation), but I do draft thousands upon thousands of words for design, sizing, and usage guides, plus blogs.
-
I host a podcast for the lols.
-
-
That RAID tho. 5 drives in a 0 seems to be the magic number for this controller.
-
@creayt said in Is this server strategy reckless and/or insane?:
That RAID tho. 5 drives in a 0 seems to be the magic number for this controller.
In RAID 0? That makes sense: there are no penalties (aside from reliability) and you get all the performance of all the drives.
-
@coliver Indeed, but what's interesting is how 5 drives specifically beat other quantities of the same drive in RAID 0 on the same hardware, from my earlier posts (can't link to them because MangoLassi has been freaking out on me and doing weird stuff, including not rendering the images as I scroll).