File transfer drop



  • Hi folks,

    Can anybody else verify this issue? Take two Windows Server 2019 VMs on two different hosts, on a 10 gigabit network. If you copy a large file (>10 GB) from \\vm1\c$ to \\vm2\c$, does the transfer start fast at ~600 MB/s and then drop to ~30 MB/s after about 10 seconds, staying there for the rest of the transfer?

    Best,
    Jim



  • You left out the parts that tend to matter, like the storage. I'm guessing you've got a spinner in the chain there somewhere. If so, yup, that's as expected.



  • @scottalanmiller said in File transfer drop:

    You left out the parts that tend to matter, like the storage. I'm guessing you've got a spinner in the chain there somewhere. If so, yup, that's as expected.

    The servers are Hyper-V hosts. Each has one VM only (vm1 and vm2 referred to in the original post). The physical hosts each have dual 2.1 GHz 22-core processors, 768 GB RAM, and 11 x 600 GB SSDs in RAID 5 behind a hardware RAID controller. A Dell card, I believe, with 8 GB cache.

    From host to host over the network, I get a solid 1.6 GB/s. From VM <-> VM, the transfer starts at ~600 MB/s then drops to ~30 MB/s on a 6 GB file with 1 GB left. Each and every time. Both VMs run Server 2019, each with a 200 GB disk on the array, access to the 10GbE network, and 10 GB RAM assigned...



  • To add, I just did this test with two new Windows 10 VMs on the hosts: solid ~600 MB/s transfer, no slowdown...

    WTF is 2019 doing!



  • @Jimmy9008 said in File transfer drop:

    @scottalanmiller said in File transfer drop:

    You left out the parts that tend to matter, like the storage. I'm guessing you've got a spinner in the chain there somewhere. If so, yup, that's as expected.

    The servers are Hyper-V hosts. Each has one VM only (vm1 and vm2 referred to in the original post). The physical hosts each have dual 2.1 GHz 22-core processors, 768 GB RAM, and 11 x 600 GB SSDs in RAID 5 behind a hardware RAID controller. A Dell card, I believe, with 8 GB cache.

    From host to host over the network, I get a solid 1.6 GB/s. From VM <-> VM, the transfer starts at ~600 MB/s then drops to ~30 MB/s on a 6 GB file with 1 GB left. Each and every time. Both VMs run Server 2019, each with a 200 GB disk on the array, access to the 10GbE network, and 10 GB RAM assigned...

    Dual 22 core processors (44 cores total)? Damn, those VMs cost like $5000 each in licensing (well per pair of VMs). Definitely expensive.
    or did I miss something?

    (hint - I made up the $5000 number, but it's definitely going to be WAY more expensive than the default 16 core setup at $800 for two VMs)



  • Is it the VMQ issue?



  • @Dashrender said in File transfer drop:

    Is it the VMQ issue?

    1. That was supposedly resolved.
    2. It only ever affected 1 Gb NICs.


  • This sounds like the exact same problem as in the thread below:
    https://mangolassi.it/topic/21725/disabling-spectre-mitigations-in-2020

    "What I find puzzling / frustrating is that I've tested installing Server2019 and W10 1909 as guests on the same hardware with the same XenTools and get hugely different performance results. My understanding was that they were essentially a shared code-base so I'm at a loss as to why such a performance difference. Specific test case is a huge file transfer, the W10 guest spikes a cpu thread to 95 or 100% while the Server 2019 guest runs a thread up to about 20% and it stays there for the duration of the file transfer."



  • @Dashrender said in File transfer drop:

    @Jimmy9008 said in File transfer drop:

    @scottalanmiller said in File transfer drop:

    You left out the parts that tend to matter, like the storage. I'm guessing you've got a spinner in the chain there somewhere. If so, yup, that's as expected.

    The servers are Hyper-V hosts. Each has one VM only (vm1 and vm2 referred to in the original post). The physical hosts each have dual 2.1 GHz 22-core processors, 768 GB RAM, and 11 x 600 GB SSDs in RAID 5 behind a hardware RAID controller. A Dell card, I believe, with 8 GB cache.

    From host to host over the network, I get a solid 1.6 GB/s. From VM <-> VM, the transfer starts at ~600 MB/s then drops to ~30 MB/s on a 6 GB file with 1 GB left. Each and every time. Both VMs run Server 2019, each with a 200 GB disk on the array, access to the 10GbE network, and 10 GB RAM assigned...

    Dual 22 core processors (44 cores total)? Damn, those VMs cost like $5000 each in licensing (well per pair of VMs). Definitely expensive.
    or did I miss something?

    (hint - I made up the $5000 number, but it's definitely going to be WAY more expensive than the default 16 core setup at $800 for two VMs)

    Currently only two test VMs. There will eventually be hundreds of VMs.



  • @Dashrender said in File transfer drop:

    Is it the VMQ issue?

    VMQ is disabled on all hosts and VMs.



  • One thing I have found is that if the VMs are given 50 GB RAM, they get a solid transfer of ~300 MB/s. I guess a percentage of VM RAM is used as a cache, and once that's full, the network speed drops. Not sure though. And no idea where that cache is or how to edit it. Just a guess.
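
The cache guess above can be sanity-checked with a toy model. This is a sketch under an assumption, not a diagnosis: it supposes some buffer in the path absorbs data at the fast rate while draining at the slow rate, and that the slowdown begins once the buffer is full. The function name is illustrative only.

```python
def implied_buffer_mb(fast_mb_s, slow_mb_s, seconds_until_drop):
    """Size (MB) of a buffer that fills at fast_mb_s, drains at slow_mb_s,
    and becomes full after seconds_until_drop seconds (toy model)."""
    return (fast_mb_s - slow_mb_s) * seconds_until_drop


if __name__ == "__main__":
    # Figures reported in the thread: ~600 MB/s for about 10 s, then ~30 MB/s.
    print(implied_buffer_mb(600, 30, 10), "MB")
```

Under that assumption the buffer would be a few GB, the same ballpark as the roughly 5 GB copied before the slowdown on the 6 GB file. That fits the guess but does not prove it.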



  • How are you copying the file from VM to VM?



  • Are you sure you were actually getting 1.6 GB/s, and not 1.6 Gbit/s, with iPerf or similar? 1.6 GB/s is more than the theoretical maximum bandwidth of a 10 Gbit link, unless it's the sum of results for two vNICs or something. Copy-pasting the file across remote desktops won't give you an accurate figure, btw.

    When copying large files over 10 Gbit links or faster, you'd need jumbo frames enabled on all devices along the path. Otherwise, you won't be able to fully utilise the available bandwidth.

    Your storage config should not be a bottleneck. On the other hand, file copy is not a real measuring tool for VM storage performance. You should be using something like DiskSpd or Iometer inside the VMs to get more accurate results for typical virtual workloads (random in nature). Storage latency and IOPS are more accurate metrics for them, IMO.
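
On the bandwidth point in this post, the unit conversion is easy to check. A minimal sketch, assuming decimal units (1 GB = 10^9 bytes) and ignoring protocol overhead, which would lower the usable figure further; the function names are illustrative.

```python
def gbit_to_gbyte_per_s(gbit_per_s):
    """Convert a line rate in Gbit/s to GB/s (8 bits per byte, decimal units)."""
    return gbit_per_s / 8


def gbyte_to_gbit_per_s(gbyte_per_s):
    """Convert a throughput in GB/s to the Gbit/s line rate it would need."""
    return gbyte_per_s * 8


if __name__ == "__main__":
    print("10 Gbit/s link ceiling:", gbit_to_gbyte_per_s(10), "GB/s")
    print("1.6 GB/s would need:", gbyte_to_gbit_per_s(1.6), "Gbit/s")
```

A single 10 Gbit link tops out at 1.25 GB/s before overhead, so a reported 1.6 GB/s points to either a units mix-up or more than one link in use.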



  • @Pete-S said in File transfer drop:

    This sounds like the exact same problem as in the thread below:
    https://mangolassi.it/topic/21725/disabling-spectre-mitigations-in-2020

    "What I find puzzling / frustrating is that I've tested installing Server2019 and W10 1909 as guests on the same hardware with the same XenTools and get hugely different performance results. My understanding was that they were essentially a shared code-base so I'm at a loss as to why such a performance difference. Specific test case is a huge file transfer, the W10 guest spikes a cpu thread to 95 or 100% while the Server 2019 guest runs a thread up to about 20% and it stays there for the duration of the file transfer."

    Not sure, he seems to have better / more consistent performance on W10 than Server 2019, which is the opposite of what I was seeing with my setup.



  • So a couple of things I'd be looking at if it were me:

    • RAID card config: write-through / write-back will have performance impacts (but should hit S2019 and W10 equally)
    • Network vs storage:
      -- iperf3 only runs in memory, so it completely removes storage from the troubleshooting equation. If you see the same type of drop-off when testing with iperf3, you know there's a networking gremlin somewhere that needs to be dealt with.
      -- Something like LANSpeedTest actually writes and reads a file on the far-end storage, so it should give the same results as your typical file transfer. You can also arbitrarily set the transfer size, in case you want to test something bigger than what you've got as a static file.
    • What's actually running in the OS at the same time
      -- Use something like processhacker to see what else might be using the network or other IO when your file transfer slows to a crawl.
      -- Maybe there are security configs being applied to your servers and not the W10 guests that aren't being taken into consideration.
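
The write-a-file style of test described in the list above can be roughed out in a few lines. This is a sketch only, not a replacement for LANSpeedTest or iperf3: the function name and defaults are made up for illustration, and the zero-filled data may be compressed or deduplicated by some storage, which would flatter the numbers.

```python
import os
import tempfile
import time


def timed_write(path, total_bytes, chunk_bytes=1024 * 1024, window=8):
    """Write total_bytes to path and return (bytes_done, MB/s) samples,
    one sample per `window` chunks, so a mid-transfer drop-off is visible."""
    chunk = b"\0" * chunk_bytes
    samples = []
    written = 0
    window_bytes = 0
    window_start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            f.write(chunk[:n])
            written += n
            window_bytes += n
            if window_bytes >= window * chunk_bytes or written == total_bytes:
                # Push the data out of the local buffers before timing the window.
                f.flush()
                os.fsync(f.fileno())
                elapsed = time.perf_counter() - window_start
                rate = window_bytes / elapsed / 1e6 if elapsed > 0 else float("inf")
                samples.append((written, rate))
                window_bytes = 0
                window_start = time.perf_counter()
    return samples


if __name__ == "__main__":
    # Local demo with a small temp file; change the path to test elsewhere.
    with tempfile.TemporaryDirectory() as tmp:
        for done, rate in timed_write(os.path.join(tmp, "test.bin"), 16 * 1024 * 1024):
            print(f"{done / 1e6:8.1f} MB written, {rate:8.1f} MB/s")
```

Run inside one VM with `path` pointing at a share on the other VM and a total size larger than the point where the copy slows down; the per-window rates should then show whether and when the drop happens.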


  • @Jimmy9008 said in File transfer drop:

    @Dashrender said in File transfer drop:

    @Jimmy9008 said in File transfer drop:

    @scottalanmiller said in File transfer drop:

    You left out the parts that tend to matter, like the storage. I'm guessing you've got a spinner in the chain there somewhere. If so, yup, that's as expected.

    The servers are Hyper-V hosts. Each has one VM only (vm1 and vm2 referred to in the original post). The physical hosts each have dual 2.1 GHz 22-core processors, 768 GB RAM, and 11 x 600 GB SSDs in RAID 5 behind a hardware RAID controller. A Dell card, I believe, with 8 GB cache.

    From host to host over the network, I get a solid 1.6 GB/s. From VM <-> VM, the transfer starts at ~600 MB/s then drops to ~30 MB/s on a 6 GB file with 1 GB left. Each and every time. Both VMs run Server 2019, each with a 200 GB disk on the array, access to the 10GbE network, and 10 GB RAM assigned...

    Dual 22 core processors (44 cores total)? Damn, those VMs cost like $5000 each in licensing (well per pair of VMs). Definitely expensive.
    or did I miss something?

    (hint - I made up the $5000 number, but it's definitely going to be WAY more expensive than the default 16 core setup at $800 for two VMs)

    Currently only two test VMs. There will eventually be hundreds of VMs.

    Well, assuming Windows VMs then DC licensed out will be super cost effective.



  • @notverypunny said in File transfer drop:

    So a couple of things I'd be looking at if it were me:

    • RAID card config: write-through / write-back will have performance impacts (but should hit S2019 and W10 equally)
    • Network vs storage:
      -- iperf3 only runs in memory, so it completely removes storage from the troubleshooting equation. If you see the same type of drop-off when testing with iperf3, you know there's a networking gremlin somewhere that needs to be dealt with.
      -- Something like LANSpeedTest actually writes and reads a file on the far-end storage, so it should give the same results as your typical file transfer. You can also arbitrarily set the transfer size, in case you want to test something bigger than what you've got as a static file.
    • What's actually running in the OS at the same time
      -- Use something like processhacker to see what else might be using the network or other IO when your file transfer slows to a crawl.
      -- Maybe there are security configs being applied to your servers and not the W10 guests that aren't being taken into consideration.

    I'm not sure if these will help...

    The physical servers work fine over the network. Full speed ahead! So it can't be RAID settings, a network issue or storage. Physical <-> physical is perfect. What is the point of testing with iperf when I'm already saying physical <-> physical is perfect...

    The issue is with the VMs. From a VM on host A to a VM on host B, I'm seeing much slower speeds. From physical A to physical B, it's fine.



  • @Jimmy9008 said in File transfer drop:

    @notverypunny said in File transfer drop:

    So a couple of things I'd be looking at if it were me:

    • RAID card config: write-through / write-back will have performance impacts (but should hit S2019 and W10 equally)
    • Network vs storage:
      -- iperf3 only runs in memory, so it completely removes storage from the troubleshooting equation. If you see the same type of drop-off when testing with iperf3, you know there's a networking gremlin somewhere that needs to be dealt with.
      -- Something like LANSpeedTest actually writes and reads a file on the far-end storage, so it should give the same results as your typical file transfer. You can also arbitrarily set the transfer size, in case you want to test something bigger than what you've got as a static file.
    • What's actually running in the OS at the same time
      -- Use something like processhacker to see what else might be using the network or other IO when your file transfer slows to a crawl.
      -- Maybe there are security configs being applied to your servers and not the W10 guests that aren't being taken into consideration.

    I'm not sure if these will help...

    The physical servers work fine over the network. Full speed ahead! So it can't be RAID settings, a network issue or storage. Physical <-> physical is perfect. What is the point of testing with iperf when I'm already saying physical <-> physical is perfect...

    The issue is with the VMs. From a VM on host A to a VM on host B, I'm seeing much slower speeds. From physical A to physical B, it's fine.

    Granted, your physical network and storage may be up to par from host to host. You still need to identify where the VMs are having issues, so it's from within your VMs that you need to do this testing, to see if the problem is with your virtualized storage / network / other. Virtualization passthrough sounds like a miracle but it doesn't always work as expected.



  • Server 2019 enables a Hyper-V feature called RSC (Receive Segment Coalescing) by default. I wonder if this is your issue. Someone had the same issue as me. I turned it off, and my read speed went up to 900 Mbps, which is now limited by switch speed.

    https://serverfault.com/questions/976324/very-poor-network-performance-with-server-2019
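
For anyone wanting to check the RSC setting mentioned in this post, here is a hedged sketch that wraps the PowerShell calls. It assumes the Hyper-V module's `Get-VMSwitch` output exposes a `SoftwareRscEnabled` property and that `Set-VMSwitch` accepts `-EnableSoftwareRsc`, as on Server 2019; verify both on your own host first. The helper names and the switch name are hypothetical.

```python
import subprocess


def rsc_status_command():
    """PowerShell command listing each virtual switch and its software RSC state."""
    return ["powershell", "-NoProfile", "-Command",
            "Get-VMSwitch | Select-Object Name, SoftwareRscEnabled"]


def disable_rsc_command(switch_name):
    """PowerShell command turning software RSC off on one virtual switch."""
    return ["powershell", "-NoProfile", "-Command",
            f"Set-VMSwitch -Name '{switch_name}' -EnableSoftwareRsc $false"]


def run(command):
    """Run a command and return its text output (needs an elevated session on the host)."""
    return subprocess.run(command, capture_output=True, text=True, check=True).stdout
```

On a Hyper-V host, `print(run(rsc_status_command()))` would show the current state, and `run(disable_rsc_command("vSwitch1"))` would change it for a switch named `vSwitch1` (a placeholder). Nothing is executed until `run` is called, so the commands can be reviewed before use.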



  • @magicmarker said in File transfer drop:

    Server 2019 enables a Hyper-V feature called RSC (Receive Segment Coalescing) by default. I wonder if this is your issue. Someone had the same issue as me. I turned it off, and my read speed went up to 900 Mbps, which is now limited by switch speed.

    https://serverfault.com/questions/976324/very-poor-network-performance-with-server-2019

    @Jimmy9008 Looks like this might be your silver bullet

