NVMe and RAID?
-
@travisdh1 Yeah, it would be a Debian VM providing the SMB share (via Proxmox or XCP-ng), so MD RAID isn't an issue. Proxmox can use ZFS RAID 1 (a mirror), whilst XCP-ng can do standard MD RAID.
Edit: Dell even has that BOSS add-in system that allows for a RAID 1 bootable volume just for the OS. The NVMe drives could be VM storage only if I go that route.
-
@biggen NVMe storage is indeed ridiculously fast. When I say fast, think latency rather than throughput. In practice, it really shines with heavily used relational databases. Doing RAID over the network with NVMe would require at least 25 GbE with RDMA support end-to-end, and would work even better with an NVMe-oF initiator. Otherwise, network latency would be the bottleneck. However, for 4K video editing, 10 GbE end-to-end with SSD storage on the server should be sufficient.
There is a better alternative to interface bonding between a single file server and its clients: SMB Multichannel, which uses multiple network interfaces for data transfers (clients need to have multiple NICs, though). This way network bandwidth is aggregated over active-active paths rather than load balanced with active-passive. The downside is that while SMB Multichannel works reliably in Windows environments, its Samba implementation is patchy. macOS doesn't support it at all AFAIK.
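A rough model of the difference for one large file copy from one client, as a sketch. It assumes (a simplification) that with bonding a single SMB session hashes onto one member link, while Multichannel spreads it over every NIC. `single_copy_gbit` is a hypothetical helper and the link speeds are illustrative.

```python
def single_copy_gbit(nic_speeds_gbit: list[float], multichannel: bool) -> float:
    """Best-case line rate (Gb/s) for one file copy from one client.

    Simplifying assumption: with interface bonding, one SMB session lands on
    a single member link; with SMB Multichannel it uses all NICs at once.
    """
    if not nic_speeds_gbit:
        raise ValueError("need at least one NIC")
    return sum(nic_speeds_gbit) if multichannel else max(nic_speeds_gbit)


# A client with two 10 GbE NICs
print(single_copy_gbit([10, 10], multichannel=False))  # 10
print(single_copy_gbit([10, 10], multichannel=True))   # 20
```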
-
NVMe drives cost the same as SAS3 drives with the same write endurance from the same manufacturer.
If you go Dell, because you want them holding your hand, you'll pay 2-3 times as much for the drives. That's just the way it is.
Consider that more than one person can access the fileserver at the same time. You can get away with 10GbE at the clients (bonding doesn't help at the client). That means a 100 GB video file will take roughly 100 seconds to transfer.
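A quick check of that figure, as a sketch. The 0.8 efficiency factor is an assumption (line rate minus protocol overhead) chosen to reproduce the "10 GbE is about 1 GB/s" rule of thumb; `transfer_seconds` is a hypothetical helper.

```python
def transfer_seconds(size_gb: float, link_gbit: float, efficiency: float = 0.8) -> float:
    """Seconds to move size_gb gigabytes over a link_gbit gigabit/s link.

    efficiency is an assumed fraction of line rate left after protocol
    overhead; 0.8 turns 10 GbE into roughly 1 GB/s.
    """
    usable_gbytes_per_s = link_gbit * efficiency / 8  # bits -> bytes
    return size_gb / usable_gbytes_per_s


print(round(transfer_seconds(100, 10)))  # 100
```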
However, you need more than that on the server, and your array needs to be able to handle more than 1 GB (gigabyte) per second.
Most 10GbE switches have 40GbE ports as well. So a two-port 40GbE NIC on the server will allow 8 streams of 1 GB/sec, for a total of 8 GB/sec.
That means your array needs to handle 8 GB/sec. Without NVMe drives you need a lot of drives to get that kind of performance.
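The same arithmetic on the server side, as a sketch. The two-port 40 GbE NIC comes from the post; the 0.8 efficiency factor is an assumption matching the 1 GB/s-per-10 GbE rule of thumb, and `server_capacity` is a hypothetical helper.

```python
def server_capacity(ports: int, port_gbit: float, client_gbit: float = 10,
                    efficiency: float = 0.8) -> tuple[int, float]:
    """Return (full-speed client streams, usable GB/s) for the server NIC."""
    total_gbit = ports * port_gbit
    streams = int(total_gbit // client_gbit)        # 10 GbE clients at line rate
    usable_gbytes_per_s = total_gbit * efficiency / 8
    return streams, usable_gbytes_per_s


streams, gbytes = server_capacity(ports=2, port_gbit=40)
print(streams, round(gbytes, 1))  # 8 8.0
```

Whatever the NIC can deliver is also what the array has to sustain, which is the point made above.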
If you do a fileserver like this, skip the hypervisor completely and run it on bare metal. You'll lose a ton of performance otherwise.
Also, latency means nothing in your application. It's all about transfer rate.
So something like Debian on bare metal, MD RAID, and 4TB or larger U.2 NVMe drives.
Go for a CPU with a high base frequency. High I/O rates from NVMe drives will use quite a bit of CPU power. You don't need lots and lots of cores, though. Go for drives with 1 DWPD for best value.
-
@taurex Thanks for that information. More to go over for me it seems!
@Pete-S I figure going Dell or HPE is the way to go for him. He needs to have a support contract behind something like this and it doesn't need to be me.
I hadn't considered uplinks of 40GbE+. Makes sense.
Skip the hypervisor, huh? I figured it would add a performance penalty, but it makes backups so much easier. I don't even know how to perform bare metal backups on servers. Backing up the video files being worked on would be easy via a traditional Synology NAS (or a custom-built solution), but backing up the OS in the event that an update breaks it would take some thought.
I assume Samba could keep up with 8GB/sec (assumes ~8 users all transferring at the same time) so long as the underlying storage is performant enough so Samba isn't waiting?
-
@Pete-S said in NVMe and RAID?:
If you do a fileserver like this, skip the hypervisor completely and run it on bare metal. You'll lose a ton of performance otherwise.
Agreed. This is one of those rare exceptions.
-
@biggen said in NVMe and RAID?:
I figured it would add a performance penalty but makes backups that are so much easier.
It shouldn't. What do you need to grab.... one Samba config file and the SMB share? Hypervisor won't make backing that up any easier.
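If the share's data is already covered by the NAS backup, the OS-side piece can be as small as an archive of the Samba config. A minimal sketch using Python's standard `tarfile` module; `backup_config` is a hypothetical helper, and the archive naming is an assumption.

```python
import tarfile
import time
from pathlib import Path


def backup_config(paths, dest_dir) -> Path:
    """Write a timestamped .tar.gz containing the given config files."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest / f"fileserver-config-{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        for p in map(Path, paths):
            # Store the path without the leading "/" so extraction is relative.
            tar.add(str(p), arcname=p.as_posix().lstrip("/"))
    return archive
```

On Debian the Samba config normally lives at `/etc/samba/smb.conf`, so usage would be something like `backup_config(["/etc/samba/smb.conf"], "/mnt/backup/fileserver")`, where the destination is a placeholder for whatever location is actually backed up.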
-
@scottalanmiller said in NVMe and RAID?:
@Pete-S said in NVMe and RAID?:
If you do a fileserver like this, skip the hypervisor completely and run it on bare metal. You'll lose a ton of performance otherwise.
Agreed. This is one of those rare exceptions.
I'm not sure about this claim. Maybe ten years ago.
The above solution I mentioned has the workloads virtualized. We've had no issues saturating a setup with IOPS or throughput by utilizing virtual machines.
It's all in the system configuration, OS tuning, and fabric putting it all together. Much like setting up a 6.2L boosted application, there are a lot of pieces to the puzzle.
EDIT: As a qualifier, we're an all-Microsoft house. No VMware here.
-
@PhlipElder said in NVMe and RAID?:
I'm not sure about this claim. Maybe ten years ago.
The above solution I mentioned has the workloads virtualized. We've had no issues saturating a setup with IOPS or throughput by utilizing virtual machines.
It's all in the system configuration, OS tuning, and fabric putting it all together. Much like setting up a 6.2L boosted application, there are a lot of pieces to the puzzle.
EDIT: As a qualifier, we're an all-Microsoft house. No VMware here.
We're not talking about any fabric because we are talking about local NVMe storage. Data goes straight from the drive over the PCIe bus directly to the CPU.
For high performance I/O workloads the difference between virtualized and bare metal has increased, not decreased, because the amount of I/O you can generate has increased.
When everyone was running spinners and SAS, you couldn't generate enough I/O for the small overhead that virtualizing added to matter. A few percent at most.
As NVMe drives become faster and faster and CPUs get more and more PCIe lanes, it's not difficult to generate massive amounts of I/O. Then every little bit of added overhead per I/O operation becomes more and more noticeable, because the overhead becomes a larger percentage of the total time as the I/O operation itself gets shorter.
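The effect is a fixed cost divided by a shrinking total. A sketch with hypothetical numbers: the 5 µs per-I/O virtualization overhead and the per-device I/O times below are illustrative assumptions, not measurements.

```python
def overhead_share(io_time_us: float, overhead_us: float) -> float:
    """Fraction of each I/O operation spent in a fixed per-operation overhead."""
    return overhead_us / (io_time_us + overhead_us)


# Hypothetical: the same 5 us overhead added to progressively faster devices.
for label, io_us in [("HDD seek", 5000.0), ("SAS SSD", 200.0), ("NVMe", 80.0)]:
    print(f"{label:8s} {overhead_share(io_us, 5.0):.1%}")
```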
That's why the bare metal cloud market has had massive growth the last three years or so. There is simply no way to compete with bare metal performance.
Typical bare metal server instances, such as the ones Oracle offers, run on all-NVMe local flash storage. They put 9 NVMe drives in each server. With high-performance NVMe drives that's almost 20 gigabytes of data per second.
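Back-of-envelope arithmetic using the thread's own numbers (9 drives, almost 20 GB/s, and the earlier 8 GB/s target), as a sketch. It assumes throughput scales linearly with drive count and ignores RAID mirroring/parity overhead; `drives_needed` is a hypothetical helper.

```python
import math


def drives_needed(target_gbytes_per_s: float, per_drive_gbytes_per_s: float) -> int:
    """Drives required to reach a target aggregate rate (no RAID overhead)."""
    return math.ceil(target_gbytes_per_s / per_drive_gbytes_per_s)


per_drive = 20 / 9  # about 2.2 GB/s per drive
print(round(per_drive, 1))          # 2.2
print(drives_needed(8, per_drive))  # 4
```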
-
One of the first Dell servers with hot-swap NVMe was the R7415, so yeah:
https://www.dell.com/en-us/work/shop/povw/poweredge-r7415
Not sure what others have seen.
-
@dbeato said in NVMe and RAID?:
One of the first Dell servers with hot-swap NVMe was the R7415, so yeah:
https://www.dell.com/en-us/work/shop/povw/poweredge-r7415
Not sure what others have seen.
The newer ones have a 5 in the model number, so R7515, R6515 etc.
Those are the ones you want to buy, with AMD EPYC 2 (Rome) CPUs. Dual-socket models are R7525, R6525, etc.
And to make this complete: 6 is 1U and 7 is 2U. R7515, R6515 etc.
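The naming rules above can be captured in a few lines, as a sketch. `decode_poweredge` is a hypothetical helper, not a Dell tool, and it only handles the five-character R-series models mentioned here. Reading the third digit as the socket count is inferred from R7515 (single) vs R7525 (dual), and treating a second digit of 4 as the older generation is inferred from the R7415.

```python
FORM_FACTOR = {"6": "1U", "7": "2U"}


def decode_poweredge(model: str) -> dict:
    """Split a model name like 'R7525' using the rules described in this thread.

    Assumptions: 1st digit is the height (6 = 1U, 7 = 2U), a 2nd digit of 5
    marks the newer EPYC Rome generation, and the 3rd digit is the socket count.
    """
    m = model.upper()
    if len(m) != 5 or not m.startswith("R") or not m[1:].isdigit():
        raise ValueError(f"unexpected model format: {model!r}")
    return {
        "height": FORM_FACTOR.get(m[1], "unknown"),
        "newer_generation": m[2] == "5",
        "sockets": int(m[3]),
    }


for name in ["R7415", "R6515", "R7525"]:
    print(name, decode_poweredge(name))
```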
-
@Pete-S said in NVMe and RAID?:
The newer ones have a 5 in the model number, so R7515, R6515 etc.
Those are the ones you want to buy, with AMD EPYC 2 (Rome) CPUs. Dual-socket models are R7525, R6525, etc.
And to make this complete: 6 is 1U and 7 is 2U. R7515, R6515 etc.
Too helpful must downvote.
-
From chatting with Dell, they don't offer any of their EPYC servers with 40GbE options. They only go up to dual 25GbE. They offer HDR100 InfiniBand and Fibre Channel, but these are pretty foreign to me and I don't even know if they can be used.
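Putting Dell's ceiling next to the earlier 8 GB/s target, as a rough sketch. It uses nominal line rates and the same assumed 0.8 efficiency factor as the 1 GB/s-per-10 GbE rule of thumb; `usable_gbytes_per_s` is a hypothetical helper.

```python
def usable_gbytes_per_s(total_gbit: float, efficiency: float = 0.8) -> float:
    """Usable GB/s for a given line rate, with an assumed protocol efficiency."""
    return total_gbit * efficiency / 8


dual_25 = 2 * 25  # Dell's largest Ethernet option, Gb/s
dual_40 = 2 * 40  # the two-port 40 GbE NIC suggested earlier, Gb/s
# Columns: usable GB/s, number of 10 GbE clients served at full speed
print(usable_gbytes_per_s(dual_25), dual_25 // 10)
print(usable_gbytes_per_s(dual_40), dual_40 // 10)
```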
-
@biggen said in NVMe and RAID?:
They offer HDR100 InfiniBand and Fibre Channel, but these are pretty foreign to me and I don't even know if they can be used.
They replace Ethernet. FC is the standard SAN connection. InfiniBand can be used anywhere Ethernet can.
-
@biggen said in NVMe and RAID?:
From chatting with Dell, they don't offer any of their EPYC servers with 40GbE options. They only go up to dual 25GbE. They offer HDR100 InfiniBand and Fibre Channel, but these are pretty foreign to me and I don't even know if they can be used.
It's totally random what Dell offers and what they don't.
They have the Intel X710-T2L, which is dual-port 10 GbE, but not the XL710-QDA2, which is dual-port 10/40 GbE. It's the same driver and everything.
You could of course buy the network card anywhere you'd like and plug it in.
-
40 GbE is actually 4x10 GbE inside the interface. That's why 10 GbE switches have 40 GbE uplink ports.
The interface is called QSFP+ as in Quad SFP+ (SFP+ being 10GbE, SFP being 1GbE)
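The "Quad" naming is just lane multiplication. A sketch of that arithmetic for the pluggable interfaces discussed in this post, using nominal per-lane rates (line encoding ignored); `INTERFACES` and `port_gbit` are illustrative names.

```python
# (lanes, nominal Gb/s per lane) for the pluggable interfaces in this thread
INTERFACES = {
    "SFP": (1, 1),
    "SFP+": (1, 10),
    "QSFP+": (4, 10),
    "SFP28": (1, 25),
    "QSFP28": (4, 25),
}


def port_gbit(interface: str) -> int:
    """Nominal port speed: number of lanes times the per-lane rate."""
    lanes, per_lane = INTERFACES[interface]
    return lanes * per_lane


for name in INTERFACES:
    print(f"{name:7s} {port_gbit(name):4d} GbE")
```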
And 25 GbE is the successor to 10 GbE. The interface is called SFP28, with the same physical dimensions as SFP+.
And 25 GbE switches have 100 GbE uplinks, because 100 GbE is 4x25 GbE internally. That interface is called QSFP28.
-
@Pete-S I'd stay away from the 7xx Intel NICs; I've heard lots of bad things on various IT forums about how they play up. Mellanox NICs would be my first choice for anything with RDMA support.
-
@taurex said in NVMe and RAID?:
@Pete-S I'd stay away from the 7xx Intel NICs; I've heard lots of bad things on various IT forums about how they play up. Mellanox NICs would be my first choice for anything with RDMA support.
I just picked it because that is what Dell sells. It's a simple card, no RDMA, but I don't think RDMA is needed in a fileserver application like this with huge files.
I'm surprised to hear that people have problems with it, because it's been around for 5-6 years now and Intel has newer cards as well. You would kind of assume they've worked out the kinks by now.
Anyway, it's more of a proof of concept at this point. You've got to have some numbers to play with to see if it's economically feasible for the customer. What you end up with will depend on the budget and what the needs actually are. Switches are also a big cost when it comes to 10GbE and faster.
And yes, Mellanox is good stuff.
-
I appreciate all the help, guys. Yeah, I'm compiling a price list but it ain't cheap. The server alone would be about $7k, and that's on the low end with smaller NVMe drives (1.6TB). Then I still have to purchase the switch and the 10GbE NICs for the workstations themselves.
It's a large investment that I bet never sees the light of day. It will turn into "I have $2k, what can you build with that?"
-
@biggen said in NVMe and RAID?:
I appreciate all the help, guys. Yeah, I'm compiling a price list but it ain't cheap. The server alone would be about $7k, and that's on the low end with smaller NVMe drives (1.6TB). Then I still have to purchase the switch and the 10GbE NICs for the workstations themselves.
It's a large investment that I bet never sees the light of day. It will turn into "I have $2k, what can you build with that?"
FleaBay is your best friend.
10GbE pNIC: Intel X540: $100 to $125 each.
For the 10GbE switch, go for the NETGEAR XS712T, XS716T, or XS728T depending on the port density needed. The 12-port is $1K.
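Totting up the network side with the prices quoted above, as a sketch. The workstation count of 8 is an assumption borrowed from the "~8 users" estimate earlier in the thread; cabling, the server, and its NIC are not included, and `network_cost` is a hypothetical helper.

```python
def network_cost(workstations: int, nic_low: float = 100, nic_high: float = 125,
                 switch: float = 1000) -> tuple[float, float]:
    """Low/high estimate: one 12-port switch plus one 10 GbE NIC per workstation."""
    return switch + workstations * nic_low, switch + workstations * nic_high


print(network_cost(8))  # (1800, 2000)
```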
As far as the server goes, is this a proof of concept driven project?
- ASRock Rack board
  - Dual 10GbE onboard (designated by -2T)
- Intel Xeon Scalable or AMD EPYC Rome CPU
- Crucial/Samsung ECC memory
- Power supply
The board should have at least one SlimSAS x8 port, preferably two. Each of those ports gives you two NVMe drives. An SFF-8654 Y cable to connect to a two-drive enclosure would be needed. I suggest ICY DOCK.
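The "two drives per port" figure is PCIe lane arithmetic: an x8 SlimSAS port split by a Y cable into two x4 links, one per U.2 NVMe drive. A sketch; the constant and function names are illustrative.

```python
LANES_PER_SLIMSAS_X8 = 8
LANES_PER_NVME_DRIVE = 4  # U.2 NVMe drives use a PCIe x4 link


def nvme_drives(slimsas_x8_ports: int) -> int:
    """Drives attachable when each x8 port is split into two x4 links."""
    return slimsas_x8_ports * LANES_PER_SLIMSAS_X8 // LANES_PER_NVME_DRIVE


print(nvme_drives(1), nvme_drives(2))  # 2 4
```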
The build will cost a fraction of a Tier 1 box.
Once the PoC has been run and the kinks worked out, then go for the Tier 1 box tailored to your needs.
-
@PhlipElder said in NVMe and RAID?:
FleaBay is your best friend.
Is that where people trade their pets?