CSV... what happens at a lower level?

Jimmy9008

Hi folks,

I have storage which is being presented to several Windows Server 2019 hosts over iSCSI. From each host, I can write to the storage at an actual 1 GB/s over the storage network. I have tested this by copying a 14GB ISO file.

I then add the storage to Failover Cluster storage as CSV. When I try to put the same ISO on the CSV, it's running at only 80 - 100 MB/s.

What is happening at a lower level that causes the rate to drop so much? I remove the CSV and can then write at 1 GB/s from any host again to the iSCSI target.

Best,
Jim

scottalanmiller

@Jimmy9008 said in CSV... what happens at a lower level?:

What is happening at a lower level that causes the rate to drop so much? I remove the CSV and can then write at 1 GB/s from any host again to the iSCSI target.

CSV is a filesystem, like NTFS or ReFS. So by choosing it you fundamentally change how the disks are being used, and like any change in filesystem, this potentially has a giant impact on performance. NTFS is the fastest option, ReFS is slower but still reasonably fast, and CSV is expected to cause a big drop.

CSV, unlike NTFS or ReFS, is a clustered file system. That means it has to do a ton of work that traditional file systems do not, both on disk (by writing metadata that is not needed at all without clustering) and in the drivers (by reading and checking a lot of data that otherwise would not exist at all, and enforcing security systems that normally do not exist). Because of this, there is way more disk activity, way more code paths, and more waiting and verifying than you are used to. So it is to be expected that the performance tanks.

scottalanmiller

How clustered file systems work is pretty universal, so we'll talk in broad generalities. There is a reason that you never use them outside of cases where nothing else will do: they are complex and slow, but they have to be. They make it feasible to have multiple device drivers talk to a single dumb storage device, at the same time, without corrupting the data. They are a kludge for making raw disks able to be shared when there is no intelligent logic system existing to handle it.

Clustered file systems are to the raw disk level what shared files are higher up the stack. You know what it is like when you have two or more users trying to access a single Access database at the same time? It works, but it slows down a lot and increases risk, because sharing a single file isn't how the system is designed. Access has to put extra stuff inside its file to make it possible at all, and the systems accessing that file have to "play nice", because it is purely at their discretion whether they will do so or not.
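
To make the "play nice" point concrete, here is a minimal Python sketch of cooperative (advisory) locking with a lock file. This is only an illustration of the general idea, not how Access or CSV actually implement sharing; the file name `shared.db` and the helper names are made up for the example.

```python
import os
import tempfile

def try_acquire(lock_path):
    """Try to create the lock file atomically; return True if we got it."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # someone else already holds the lock
    os.close(fd)
    return True

def release(lock_path):
    """Give the lock back by deleting the lock file."""
    os.remove(lock_path)

def demo():
    workdir = tempfile.mkdtemp()
    shared = os.path.join(workdir, "shared.db")  # hypothetical shared file
    lock = shared + ".lock"

    first = try_acquire(lock)   # first user gets the lock
    second = try_acquire(lock)  # second user is refused while it is held

    # A user that ignores the lock can still write: nothing enforces it.
    with open(shared, "w") as f:
        f.write("written without holding the lock")
    rogue_wrote = os.path.exists(shared)

    release(lock)
    third = try_acquire(lock)   # free again after release
    release(lock)
    return first, second, rogue_wrote, third
```

Calling `demo()` shows both halves of the problem: well-behaved users have to stop and check the lock before every access, and a badly behaved one is not stopped by anything.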

Jimmy9008

That makes sense, would a drop from 1GB/s to 100MB/s be expected? Seems huge...

scottalanmiller

So what is happening in any clustered file system is that on access, each attached drive controller (this is identical whether sharing on iSCSI, directly attaching on a single P-SCSI cable or whatever) needs a place on the disk where it can safely write and tell other controllers what it is doing, has done, and plans to do.

Any file that gets touched, at all, has to have a journal where the controllers record that this has happened, because the controllers can't cache or record actions as they happen like they normally do. So they are writing to the disk all kinds of data that is normally handled automatically in memory-based cache, and this takes a long time. Then every other controller has to read that data constantly to know what is going on on the disk. Normally this takes zero time because they have it in cache automatically, but now there are "other cooks in the kitchen" that only communicate through a written diary.

Imagine cooking dinner for a restaurant where the cooks are all blind and deaf. They have to feel their way around the kitchen, and if they are alone they work from memory of where things are and where they have left them. But once you have two cooks, you can't work from memory. So each cook has to go to a journal (written in braille) and write down every step that they are doing, and every detail as to where they are going to move a knife or spatula. So the amount of time spent writing down what will be done is huge, and so is the amount of time spent reading whether anything has been done, so that they don't step on each other's toes, lose things, or mess up the recipe by each doing the same steps.
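
The journal overhead described above can be put into a rough counting model. This is a toy sketch in Python, not a measurement of CSV or of any real clustered file system: it simply assumes one journal write per data write and one journal read by every other node, which is a simplification chosen for illustration.

```python
def cached_disk_ops(n_blocks):
    """Single owner: bookkeeping stays in memory, so only data hits the disk."""
    return n_blocks  # one disk write per data block

def journaled_disk_ops(n_blocks, n_nodes):
    """Shared disk: every data write is surrounded by journal traffic."""
    ops = 0
    for _ in range(n_blocks):
        ops += 1            # writer records its intent in the on-disk journal
        ops += 1            # writer writes the data block itself
        ops += n_nodes - 1  # every other node reads the journal entry
    return ops

blocks = 1000
nodes = 3  # assumed cluster size, purely illustrative
print(cached_disk_ops(blocks))            # 1000
print(journaled_disk_ops(blocks, nodes))  # 4000
```

Even in this simplified model, three nodes turn 1000 disk operations into 4000, and the extra operations are small synchronous ones that the cache would normally have absorbed.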

scottalanmiller

@Jimmy9008 said in CSV... what happens at a lower level?:

That makes sense, would a drop from 1GB/s to 100MB/s be expected? Seems huge...

Depending on the system, yeah, especially with certain kinds of operations.
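
For a sense of scale, here is the arithmetic on the numbers from the original post as a small Python sketch. It assumes decimal units (1 GB = 1000 MB) and a perfectly steady transfer rate, which real copies will not have.

```python
def copy_seconds(size_gb, rate_mb_per_s):
    """Time to copy size_gb gigabytes at a steady rate, with 1 GB = 1000 MB."""
    return size_gb * 1000 / rate_mb_per_s

iso_gb = 14  # the ISO size from the original post

direct = copy_seconds(iso_gb, 1000)   # plain iSCSI target at 1 GB/s
csv_best = copy_seconds(iso_gb, 100)  # CSV at the top of the reported range
csv_worst = copy_seconds(iso_gb, 80)  # CSV at the bottom of the reported range

print(direct, csv_best, csv_worst)    # 14.0 140.0 175.0
print(csv_best / direct, csv_worst / direct)  # 10.0 12.5
```

So the same ISO goes from about 14 seconds to somewhere between 140 and 175 seconds, a slowdown of roughly 10x to 12.5x.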