Comparing Server CPU Capabilities?



  • As I explained in another thread, I am trying to build out 2 new ESXi hosts. I currently have 2 R720XD servers, each with 2 Intel Xeon E5-2609 2.4 GHz (v1) 4-core CPUs. I am looking at replacing them with 2 R740XD servers, each with a single Intel Xeon Gold 6242 2.8 GHz (turbo 3.9 GHz) 16-core CPU.

    This is the only benchmark I could find. The 6242 only has 1 benchmark test sample, so it is hard to say how much weight it holds.
    https://www.cpubenchmark.net/compare/Intel-Xeon-Gold-6242-vs-Intel-Xeon-E5-2609/3516vs1429

    My question is: how can I use my existing servers' performance history to see how much of a benefit the new CPU would provide?



  • @wrx7m said in Comparing Server CPU Capabilities?:

    My question is: how can I use my existing servers' performance history to see how much of a benefit the new CPU would provide?

    Bottom line is... you can't. That's not something that capacity planning data can tell you. You can't tell how much faster CPU resources would make things. You can tell that things won't get slower, you can tell that there will be more capacity. But you can't tell if anyone will notice.

    The problem is, unless your old CPUs are maxed out pretty much full time, the app was CPU bound, and you know how far past capacity you were... you've got nothing to go on.



  • If you have an issue where, for example, you can tell that a user is getting a 1-minute delay caused by something being CPU bound during that time, and you get a faster CPU, and you can estimate pretty accurately how much "unbinding" that will do, then you can estimate a benefit.

    But you aren't replacing the old servers because they are CPU bound. Almost no one does. CPUs are rarely a bottleneck in modern systems.

    With your new CPU you are getting NUMA improvements, clock speed improvements, CPU generational improvements, cache improvements, etc. It's all "better". But it is likely to be like comparing the value of a Ferrari over a Toyota for a commuter. Is the Ferrari better? Heck yeah. Will you be able to measure an improvement in commute time? No, because the speed of the car wasn't what was making it take 20 minutes to get to the office - that's mostly determined by the speed limit and traffic.



  • @scottalanmiller So, if my average CPU usage on one of the hosts is over 50% and the max is 95% for the past 6 months, I can't use that to say how much more CPU I need if I wanted to double the number of running VMs with similar performance needs?



  • @wrx7m said in Comparing Server CPU Capabilities?:

    @scottalanmiller So, if my average CPU usage on one of the hosts is over 50% and the max is 95% for the past 6 months, I can't use that to say how much more CPU I need if I wanted to double the number of running VMs with similar performance needs?

    How can you define similar, though?



  • @wrx7m said in Comparing Server CPU Capabilities?:

    @scottalanmiller So, if my average CPU usage on one of the hosts is over 50% and the max is 95% for the past 6 months, I can't use that to say how much more CPU I need if I wanted to double the number of running VMs with similar performance needs?

    That's a bit different. You can estimate the amount of "additional capacity" available. The question becomes... can you put a value number on that additional capacity?



  • In theory, you could pretty easily say that you could get about 30% more workload onto your systems using the new CPU... assuming that nothing else is a bottleneck.

    CPUs are super complex. You are doubling your cores, removing NUMA issues, increasing clock speed, improving per-cycle performance... it's a lot of factors. But latency and throughput are not easily measured, and if you knew exactly how your workloads performed in the past, exactly how new ones would perform, and how they would interact with each other, then you'd have some chance of calculating value. But that's a lot of stuff to know.

    Some tricky things can be IO-wait-related issues on the CPU in the past. A CPU that is thrashing can be busy through no fault of its own. Changing RAM or storage can dramatically change the characteristics of the CPU/workload relationship. What seems CPU heavy with too little RAM or slow disks might seem CPU light when RAM is plentiful and storage is fast.

    What kind of processing are you doing that is eating up so much CPU today? Knowing what the CPUs are doing will give us more insight into what faster CPUs might be able to accomplish.
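    If you did want to put a rough number on it anyway, the arithmetic itself is simple even though the assumptions are heroic. Here is a deliberately naive sketch: it treats CPU as the only constraint and assumes load scales linearly with VM count, which is exactly what the caveats above warn against. The function name, default ceiling, and inputs are all illustrative, not a real sizing method:

```python
def extra_vm_headroom(avg_util: float, capacity_ratio: float,
                      target_util: float = 0.70) -> float:
    """Rough multiplier for how many times the current VM load could
    fit on a new host, IF CPU were the only constraint and load scaled
    linearly with VM count (big assumptions, per the thread).

    avg_util       -- current average host CPU utilization (0.0-1.0)
    capacity_ratio -- new host CPU capacity / old host CPU capacity
    target_util    -- utilization ceiling to leave burst headroom
    """
    return target_util * capacity_ratio / avg_util

# e.g. a host averaging 50% CPU, moving to a CPU roughly 3.7x faster:
print(extra_vm_headroom(0.50, 3.7))  # ~5.2x the current load, on paper
```

    On paper that says the host could carry about five times the load; in practice, everything said above about IO wait, NUMA, and workload interaction applies.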



  • Do you have any issues with high CPU ready times today? I'd be interested to know the vCPU size of VMs in your environment currently and how you expect that to change in the new world. Some of the "how much more CPU do I need" will be based on current contention (if any), how wide the VMs are today, and how you plan to change that when the number of running VMs gets doubled.
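    For anyone pulling these numbers: vSphere reports CPU ready as a summation value in milliseconds per sample interval, and the common guideline is to keep it under roughly 5% per vCPU. A small conversion sketch, assuming the 20-second real-time chart interval that vSphere uses:

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0,
                      vcpus: int = 1) -> float:
    """Convert a vSphere CPU ready summation (ms) to a percentage.

    The real-time performance charts sample every 20 s; divide by the
    vCPU count to get a per-vCPU average for multi-vCPU VMs.
    """
    return ready_ms / (interval_s * 1000 * vcpus) * 100

print(cpu_ready_percent(1000))           # 5.0 -> at the guideline threshold
print(cpu_ready_percent(2000, vcpus=4))  # 2.5 -> per-vCPU average
```

    Note that historical (non-real-time) charts use longer intervals, so adjust `interval_s` to match the chart you are reading.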



  • @scottalanmiller said in Comparing Server CPU Capabilities?:

    @wrx7m said in Comparing Server CPU Capabilities?:

    @scottalanmiller So, if my average CPU usage on one of the hosts is over 50% and the max is 95% for the past 6 months, I can't use that to say how much more CPU I need if I wanted to double the number of running VMs with similar performance needs?

    That's a bit different. You can estimate the amount of "additional capacity" available. The question becomes... can you put a value number on that additional capacity?

    That was also my question. If I could (and knew what it was), I would be able to do more than take a shot in the dark when choosing the new CPU. Currently, I stopped at 16 cores for Windows licensing and CPU/hardware cost. I only chose 1 CPU per server due to the performance overhead that you and others have talked about when using more than 1 CPU with other specs being the same.



  • @wrx7m said in Comparing Server CPU Capabilities?:

    @scottalanmiller said in Comparing Server CPU Capabilities?:

    @wrx7m said in Comparing Server CPU Capabilities?:

    @scottalanmiller So, if my average CPU usage on one of the hosts is over 50% and the max is 95% for the past 6 months, I can't use that to say how much more CPU I need if I wanted to double the number of running VMs with similar performance needs?

    That's a bit different. You can estimate the amount of "additional capacity" available. The question becomes... can you put a value number on that additional capacity?

    That was also my question. If I could (and knew what it was), I would be able to do more than take a shot in the dark when choosing the new CPU. Currently, I stopped at 16 cores for Windows licensing and CPU/hardware cost. I only chose 1 CPU per server due to the performance overhead that you and others have talked about when using more than 1 CPU with other specs being the same.

    Doubling your cores, while also adding clock speed and more efficient cycles, is a huge leap to add for workloads that don't exist yet. Is there a reason to expect the company to be adding so many more computationally heavy workloads? That's a huge amount of capacity.



  • @scottalanmiller said in Comparing Server CPU Capabilities?:

    What kind of processing are you doing that is eating up so much CPU today?

    I have several Windows Server systems using different DB technologies, like my ERP (not even sure what the DB is), MS SQL (WSUS, PRTG, PDQ, Veeam BR, Veeam One, etc.), FileMaker (more RAM intensive), and MySQL (Fishbowl Inventory). I have a 2008 R2 RDS server that is being migrated to Server 2016. Based on my experience with the current environment, Server 2016 uses more CPU than previous versions, especially during Windows updates. I also have 3 Windows file and print servers, but they are more disk than CPU.

    On one host, the most CPU intensive are a Ruckus wireless controller and a PRTG monitoring server, currently showing 31% and 36% CPU utilization, respectively.

    On the other, the most CPU intensive are PDQ Inventory/Deploy, Veeam BR and Veeam One, and the current RDS server, currently showing 30%, 33%, and 11% CPU utilization, respectively.



  • @scottalanmiller said in Comparing Server CPU Capabilities?:

    Is there a reason to expect the company to be adding so many more computationally heavy workloads?

    I don't know that we would be doubling it; I was just using that as an example of how I can associate some numbers with this planning process. If someone says we are spending $25K per server, how much more capability/capacity do we get from our investment? For RAM and storage, that is easy: faster and greater RAM and storage capacity, adding tiered storage by introducing SSDs in RAID 5 and using NL-SAS in RAID 1 for "bulk" storage.

    Apparently, due to the complexity and advancements in CPU tech, there isn't an easy way to calculate or express a tangible value for it.



  • @NetworkNerd said in Comparing Server CPU Capabilities?:

    Do you have any issues with high CPU ready times today?

    Here is the CPU Ready chart (previous month) at the host level for the first host . It has 9 active VMs.
    f6d54a6e-1dc6-43eb-8db6-64cf550f7c5d-image.png

    Here is the CPU Ready chart (previous month) at the host level for the second host. It has 17 active VMs.
    dc212634-524b-40ff-8851-ae181e859374-image.png

    The differential between the two, in terms of number of VMs, is due to current storage constraints. One of the VMs on the first host has significantly greater storage usage than all the others; that usage has grown and forced the uneven allocation of VMs, compounding over time.



  • @IRJ said in Comparing Server CPU Capabilities?:

    @wrx7m said in Comparing Server CPU Capabilities?:

    @scottalanmiller So, if my average CPU usage on one of the hosts is over 50% and the max is 95% for the past 6 months, I can't use that to say how much more CPU I need if I wanted to double the number of running VMs with similar performance needs?

    How can you define similar, though?

    Whatever my average CPU utilization is for my existing VMs on my existing hosts with 2 x Intel Xeon E5-2609 v1 CPUs.



  • You have E5-2600 v1, v2, v3, v4, and then Scalable Gen 1 and Gen 2 as the latest generation. So the new CPU is 5 generations newer.

    A rough number is that you get about a 10% increase in performance per core for each generation, at the same clock speed. Sometimes it's more and often it's less, but memory speeds have also increased, so I think 10% is a fair number. That said, a lot of the overall performance increase has come from increasing the clock speed or, in most cases, increasing the number of cores.

    Anyway, 5 generations at 10% compounded is about a 60% increase. Add the clock speed increase from 2.4 to 2.8 GHz and you are at roughly 87%. Add going from 2x4 cores to 16 cores and you have a total of about 3.7 times the CPU performance.

    To keep the actual performance of the VMs the same but run 3-4 times as many VMs, you would also need the same increase in storage and network performance. That is quite possible if you are also moving from spinning disks to SSD and perhaps from 1 GbE to 10 GbE on the hypervisors. You also need 3-4 times as much RAM, of course.

    Depending on what you are upgrading, I'd say the conservative number is that you will get twice the capacity and the optimistic number is four times the capacity.
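    The generational arithmetic above can be checked with a quick back-of-envelope calculation (the 10% per generation is the assumption stated above, not a measured figure):

```python
per_gen_gain = 1.10        # assumed ~10% per-core gain per generation
generations = 5            # E5-2600 v1 -> Xeon Scalable Gen 2
clock_ratio = 2.8 / 2.4    # base clocks: Gold 6242 vs E5-2609 v1
core_ratio = 16 / (2 * 4)  # one 16-core CPU vs dual 4-core CPUs

total = per_gen_gain ** generations * clock_ratio * core_ratio
print(f"estimated capacity ratio: {total:.1f}x")  # ~3.8x
```

    Compounding the 10% rather than adding it is what nudges the result slightly above the "about 3.7 times" quoted above.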



  • @wrx7m said in Comparing Server CPU Capabilities?:

    https://www.cpubenchmark.net/compare/Intel-Xeon-Gold-6242-vs-Intel-Xeon-E5-2609/3516vs1429

    The numbers in the link are a little misleading because you are going from dual E5-2609 v1 to a single 6242. The multi-core performance for that comparison is 8162 versus 25313 for the 6242.

    One sample could be misleading, but you could always check other similar CPUs. To me the numbers look reasonable, though.
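    As a sanity check, the ratio of those quoted composite scores lands in the same ballpark as the generational estimate earlier in the thread. The scores are the ones quoted above and will drift as PassMark collects more samples:

```python
old_dual_score = 8162   # quoted multi-thread estimate, dual E5-2609 v1
new_score = 25313       # quoted multi-thread score, single Gold 6242

ratio = new_score / old_dual_score
print(f"benchmark ratio: {ratio:.1f}x")  # ~3.1x
```

    Roughly 3.1x from the benchmark versus roughly 3.7x from the generational estimate; close enough for capacity planning at this level of precision.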



  • @wrx7m said in Comparing Server CPU Capabilities?:

    I don't know that we would be doubling it; I was just using that as an example of how I can associate some numbers with this planning process. If someone says we are spending $25K per server, how much more capability/capacity do we get from our investment? For RAM and storage, that is easy: faster and greater RAM and storage capacity, adding tiered storage by introducing SSDs in RAID 5 and using NL-SAS in RAID 1 for "bulk" storage.

    The problem there is... if you don't need it, that capacity is wasted. So unless you need X capacity, whether you get it or not for the investment is moot.



  • @Pete-S said in Comparing Server CPU Capabilities?:

    The numbers in the link are a little misleading because you are going from dual E5-2609 v1 to a single 6242. The multi-core performance for that comparison is 8162 versus 25313 for the 6242.

    Yes, you have to remember to double the numbers when looking at the multi-threaded composites. But you also have to shave a little off for the dual-socket overhead, so doubling is close.



  • Another element to consider is how the newer-gen CPUs deal with Spectre/Meltdown and friends compared to the older generation. Also keep in mind that if you're going single socket, your RAM options will be limited compared to dual socket. I'm not saying that one or the other is better, as it all depends on budget and purpose, but I didn't notice either factor being mentioned when skimming the thread.



  • For analysis of the current utilization of your hosts, the best bet would be running a LiveOptics collection. The resulting report will present granular details on the utilization of your CPU, RAM, and storage.
    After the run is complete, it would be highly beneficial to discuss your requirements for the future solution with the vendor of your choice.



  • @Pete-S said in Comparing Server CPU Capabilities?:

    To keep the actual performance of the VMs the same but run 3-4 times as many VMs, you would also need the same increase in storage and network performance. That is quite possible if you are also moving from spinning disks to SSD and perhaps from 1 GbE to 10 GbE on the hypervisors. You also need 3-4 times as much RAM, of course.
    Depending on what you are upgrading, I'd say the conservative number is that you will get twice the capacity and the optimistic number is four times the capacity.

    Thanks. The majority of workloads will be moved to SSD, with the file servers running on spinning disks. I already have 2 x 10 Gig for VMs and 2 for vMotion, per server. That is a good point about the RAM. Current RAM utilization is a bit constrained; each server only has 128 GB. I am looking at 320 GB. This will allow for some growth and also allow running all VMs on a single host if necessary.



  • @wrx7m said in Comparing Server CPU Capabilities?:

    @Pete-S said in Comparing Server CPU Capabilities?:

    To keep the actual performance of the VMs the same but run 3-4 times as many VMs, you would also need the same increase in storage and network performance. That is quite possible if you are also moving from spinning disks to SSD and perhaps from 1 GbE to 10 GbE on the hypervisors. You also need 3-4 times as much RAM, of course.
    Depending on what you are upgrading, I'd say the conservative number is that you will get twice the capacity and the optimistic number is four times the capacity.

    Thanks. The majority of workloads will be moved to SSD, with the file servers running on spinning disks. I already have 2 x 10 Gig for VMs and 2 for vMotion, per server. That is a good point about the RAM. Current RAM utilization is a bit constrained; each server only has 128 GB. I am looking at 320 GB. This will allow for some growth and also allow running all VMs on a single host if necessary.

    I'm really wondering if you have a need for two hosts right off the get-go.

    Scott's comment about buying more than you need for today is something to seriously consider.



  • @wrx7m said in Comparing Server CPU Capabilities?:

    @Pete-S said in Comparing Server CPU Capabilities?:

    To keep the actual performance of the VMs the same but run 3-4 times as many VMs, you would also need the same increase in storage and network performance. That is quite possible if you are also moving from spinning disks to SSD and perhaps from 1 GbE to 10 GbE on the hypervisors. You also need 3-4 times as much RAM, of course.
    Depending on what you are upgrading, I'd say the conservative number is that you will get twice the capacity and the optimistic number is four times the capacity.

    Thanks. The majority of workloads will be moved to SSD, with the file servers running on spinning disks. I already have 2 x 10 Gig for VMs and 2 for vMotion, per server. That is a good point about the RAM. Current RAM utilization is a bit constrained; each server only has 128 GB. I am looking at 320 GB. This will allow for some growth and also allow running all VMs on a single host if necessary.

    By the looks of it, you have 12 memory slots for use with your single CPU, so you have more than enough room.



  • @Dashrender said in Comparing Server CPU Capabilities?:

    @wrx7m said in Comparing Server CPU Capabilities?:

    @Pete-S said in Comparing Server CPU Capabilities?:

    To keep the actual performance of the VMs the same but run 3-4 times as many VMs, you would also need the same increase in storage and network performance. That is quite possible if you are also moving from spinning disks to SSD and perhaps from 1 GbE to 10 GbE on the hypervisors. You also need 3-4 times as much RAM, of course.
    Depending on what you are upgrading, I'd say the conservative number is that you will get twice the capacity and the optimistic number is four times the capacity.

    Thanks. The majority of workloads will be moved to SSD, with the file servers running on spinning disks. I already have 2 x 10 Gig for VMs and 2 for vMotion, per server. That is a good point about the RAM. Current RAM utilization is a bit constrained; each server only has 128 GB. I am looking at 320 GB. This will allow for some growth and also allow running all VMs on a single host if necessary.

    I'm really wondering if you have a need for two hosts right off the get-go.

    Scott's comment about buying more than you need for today is something to seriously consider.

    Well, since this is production, yes, we do need 2.



  • @wrx7m said in Comparing Server CPU Capabilities?:

    Well, since this is production, yes, we do need 2.

    You might need two, but being production wouldn't tell us that. Only HA environments need two, and that's super rare.



  • @scottalanmiller said in Comparing Server CPU Capabilities?:

    @wrx7m said in Comparing Server CPU Capabilities?:

    Well, since this is production, yes, we do need 2.

    You might need two, but being production wouldn't tell us that. Only HA environments need two, and that's super rare.

    If we don't have any of our servers running, no one can do any work except for chat and email. I don't have the exact cost, but I can say that it is expensive.



  • @wrx7m said in Comparing Server CPU Capabilities?:

    If we don't have any of our servers running, no one can do any work except for chat and email. I don't have the exact cost, but I can say that it is expensive.

    Try ballparking it. And ballpark the cost of the second server with all of the setup, risks, and licensing.

    Downtime is usually shockingly cheap. Like, often 1-5% as much as people think it is. Especially when things like chat and email keep working! Those are the core apps.

    What functions stop in the first five minutes, hour, day if the server goes down?



  • @scottalanmiller said in Comparing Server CPU Capabilities?:

    @wrx7m said in Comparing Server CPU Capabilities?:

    If we don't have any of our servers running, no one can do any work except for chat and email. I don't have the exact cost, but I can say that it is expensive.

    Try ballparking it. And ballpark the cost of the second server with all of the setup, risks, and licensing.

    Downtime is usually shockingly cheap. Like, often 1-5% as much as people think it is. Especially when things like chat and email keep working! Those are the core apps.

    What functions stop in the first five minutes, hour, day if the server goes down?

    Order processing will stop completely. If no one has access to the ERP and other LOB apps we use, they can't do much of anything.



  • The rule of thumb is that downtime is cheap and HA is expensive. It's far from always the case, but it is generally true. It is a super rare company that feels even a complete outage of a few hours in any significant way, and very rare that a company comes to a full standstill without its computers.

    And if you can keep some things still working, even better.

    From the business side of things, we always want our downtime to sound 100x more expensive than it is. We talk in terms of our busiest day, not our average. We talk in terms of "lost money" when really it is normally "aggravation." We talk in terms of "outages" rather than "inconvenient temporary workarounds."

    Mitigation techniques for an outage are pretty strong for most companies. Some work just keeps going, at least for a while. Some functions keep running. Some people switch to available tasks to stay busy. Some people take breaks or go home. Cost reductions in labour, insurance, power, etc. all offset outages. And most companies can simply shift tasks to another time. No company runs at 100% capacity 24x7, none. It's not sustainable. Some companies have very little capability to make up work later, but most can.

    It's actually not uncommon for full day outages to end up having an "effectively zero dollar" cost when it is all said and done.



  • @wrx7m said in Comparing Server CPU Capabilities?:

    Order processing will stop completely. If no one has access to the ERP and other LOB apps we use, they can't do much of anything.

    Sure, but what does that really cost you? I've worked in a lot of factories and even at IBM this would have cost us almost nothing... because over the course of a few days or maybe a couple weeks we'd just run slightly faster and catch up.

    A single-server outage is rarely more than a few hours, maybe a day at most. A second server, whether full HA or just a standby, is there to reduce that time from "several hours" to "minutes or maybe an hour tops." The average outage that HA protects against is actually just a few minutes. Full outages, like from total hardware failure, are crazy rare, and if you are in a major city, part swaps are normally just a few hours.

    Add on to that the risk that the HA system itself might cause an outage and it gets harder to justify.