Comparing Server CPU Capabilities?



  • As I explained in another thread, I am trying to build out 2 new ESXi hosts. I currently have 2 R720XD servers, each with 2 Intel Xeon E5-2609 2.4 GHz (v1) 4-core CPUs. I am looking at replacing them with 2 R740XD servers, each with a single Intel Xeon Gold 6242 2.8 GHz (turbo 3.9 GHz) 16-core CPU.

    This is the only benchmark I could find. The 6242 only has 1 benchmark test sample, so it is hard to say how much weight it holds.
    https://www.cpubenchmark.net/compare/Intel-Xeon-Gold-6242-vs-Intel-Xeon-E5-2609/3516vs1429

    My question is: how can I use my existing servers' performance history to see how much of a benefit the new CPU would provide?



  • @wrx7m said in Comparing Server CPU Capabilities?:

    My question is: how can I use my existing servers' performance history to see how much of a benefit the new CPU would provide?

    Bottom line is... you can't. That's not something that capacity planning data can tell you. You can't tell how much faster CPU resources would make things. You can tell that things won't get slower, you can tell that there will be more capacity. But you can't tell if anyone will notice.

    The problem is, unless your old CPUs are maxed out pretty much full time, the app was CPU bound, and you know how far past capacity you were... you've got nothing to go on.



  • If you have an issue where, for example, you can tell that a user is getting a 1-minute delay caused by something being CPU bound during that time, and you get a faster CPU, and you can estimate pretty accurately how much "unbinding" that will do, then you can estimate a benefit.

    But you aren't replacing the old servers because they are CPU bound. Almost no one does. CPUs are rarely a bottleneck in modern systems.

    With your new CPU you are getting NUMA improvements, clock speed improvements, CPU generational improvements, cache improvements, etc. It's all "better". But it is likely to be like comparing the value of a Ferrari over a Toyota for a commuter. Is the Ferrari better? Heck yeah. Will you be able to measure an improvement in commute time? No, because the speed of the car wasn't what was making it take 20 minutes to get to the office - that's mostly determined by the speed limit and traffic.



  • @scottalanmiller So, if my average CPU usage on one of the hosts is over 50% and the max is 95% for the past 6 months, I can't use that to say how much more CPU I need if I wanted to double the number of running VMs with similar performance needs?



  • @wrx7m said in Comparing Server CPU Capabilities?:

    @scottalanmiller So, if my average CPU usage on one of the hosts is over 50% and the max is 95% for the past 6 months, I can't use that to say how much more CPU I need if I wanted to double the number of running VMs with similar performance needs?

    How can you define similar, though?



  • @wrx7m said in Comparing Server CPU Capabilities?:

    @scottalanmiller So, if my average CPU usage on one of the hosts is over 50% and the max is 95% for the past 6 months, I can't use that to say how much more CPU I need if I wanted to double the number of running VMs with similar performance needs?

    That's a bit different. You can estimate the amount of "additional capacity" available. The question becomes... can you put a value number on that additional capacity?



  • In theory, you could pretty easily say that you could get about 30% more workload onto your systems using the new CPU... assuming that nothing else is a bottleneck.

    CPUs are super complex. You are doubling your cores, removing NUMA issues, increasing clock speed, improving per-cycle performance... it's a lot of factors. But latency and throughput are not easily measured, and if you knew exactly how your workloads performed in the past, exactly how new ones would perform, and how they would interact with each other, then you'd have some chance of calculating value. But that's a lot of stuff to know.

    Some tricky things can be IO-wait-related issues on the CPU in the past. A CPU that is thrashing can be busy through no fault of its own. Changing RAM or storage can dramatically change the characteristics of the CPU/workload relationship. What seems CPU heavy with too little RAM or slow disks might seem CPU light when RAM is plentiful and storage is fast.

    What kind of processing are you doing that is eating up so much CPU today? Knowing what the CPUs are doing will give us more insight into what faster CPUs might be able to accomplish.
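    If you did want to put a rough number on it anyway, the arithmetic itself is simple even though the assumptions are heroic. Here is a deliberately naive sketch: it treats CPU as the only constraint and assumes load scales linearly with VM count, which is exactly what the caveats above warn against. The function name, default ceiling, and inputs are all illustrative, not a real sizing method:

```python
def extra_vm_headroom(avg_util: float, capacity_ratio: float,
                      target_util: float = 0.70) -> float:
    """Rough multiplier for how many times the current VM load could
    fit on a new host, IF CPU were the only constraint and load scaled
    linearly with VM count (big assumptions, per the thread).

    avg_util       -- current average host CPU utilization (0.0-1.0)
    capacity_ratio -- new host CPU capacity / old host CPU capacity
    target_util    -- utilization ceiling to leave burst headroom
    """
    return target_util * capacity_ratio / avg_util

# e.g. a host averaging 50% CPU, moving to a CPU roughly 3.7x faster:
print(extra_vm_headroom(0.50, 3.7))  # ~5.2x the current load, on paper
```

    On paper that says the host could carry about five times the load; in practice, everything said above about IO wait, NUMA, and workload interaction applies.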



  • Do you have any issues with high CPU ready times today? I'd be interested to know the vCPU size of VMs in your environment currently and how you expect that to change in the new world. Some of the "how much more CPU do I need" will be based on current contention (if any), how wide the VMs are today, and how you plan to change that when the number of running VMs gets doubled.
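    For anyone pulling these numbers: vSphere reports CPU ready as a summation value in milliseconds per sample interval, and the common guideline is to keep it under roughly 5% per vCPU. A small conversion sketch, assuming the 20-second real-time chart interval that vSphere uses:

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0,
                      vcpus: int = 1) -> float:
    """Convert a vSphere CPU ready summation (ms) to a percentage.

    The real-time performance charts sample every 20 s; divide by the
    vCPU count to get a per-vCPU average for multi-vCPU VMs.
    """
    return ready_ms / (interval_s * 1000 * vcpus) * 100

print(cpu_ready_percent(1000))           # 5.0 -> at the guideline threshold
print(cpu_ready_percent(2000, vcpus=4))  # 2.5 -> per-vCPU average
```

    Note that historical (non-real-time) charts use longer intervals, so adjust `interval_s` to match the chart you are reading.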



  • @scottalanmiller said in Comparing Server CPU Capabilities?:

    @wrx7m said in Comparing Server CPU Capabilities?:

    @scottalanmiller So, if my average CPU usage on one of the hosts is over 50% and the max is 95% for the past 6 months, I can't use that to say how much more CPU I need if I wanted to double the number of running VMs with similar performance needs?

    That's a bit different. You can estimate the amount of "additional capacity" available. The question becomes... can you put a value number on that additional capacity?

    That was also my question. If I could (and knew what it was), I would be able to do more than take a shot in the dark when choosing the new CPU. Currently, I stopped at 16 cores for Windows licensing and CPU/hardware cost. I only chose 1 CPU per server due to the performance overhead that you and others have talked about when using more than 1 CPU with other specs being the same.



  • @wrx7m said in Comparing Server CPU Capabilities?:

    @scottalanmiller said in Comparing Server CPU Capabilities?:

    @wrx7m said in Comparing Server CPU Capabilities?:

    @scottalanmiller So, if my average CPU usage on one of the hosts is over 50% and the max is 95% for the past 6 months, I can't use that to say how much more CPU I need if I wanted to double the number of running VMs with similar performance needs?

    That's a bit different. You can estimate the amount of "additional capacity" available. The question becomes... can you put a value number on that additional capacity?

    That was also my question. If I could (and knew what it was), I would be able to do more than take a shot in the dark when choosing the new CPU. Currently, I stopped at 16 cores for Windows licensing and CPU/hardware cost. I only chose 1 CPU per server due to the performance overhead that you and others have talked about when using more than 1 CPU with other specs being the same.

    Doubling your cores, while also adding clock speed and more efficient cycles, is a huge leap to add for workloads that don't exist yet. Is there a reason to expect the company to be adding so many more computationally heavy workloads? That's a huge amount of capacity.



  • @scottalanmiller said in Comparing Server CPU Capabilities?:

    What kind of processing are you doing that is eating up so much CPU today?

    I have several Windows Server systems using different DB technologies, like my ERP (not even sure what the DB is), MS SQL (WSUS, PRTG, PDQ, Veeam BR, Veeam One, etc.), FileMaker (more RAM intensive), and MySQL (Fishbowl Inventory). I have a 2008 R2 RDS server that is being migrated to Server 2016. Based on my experience with the current environment, Server 2016 uses more CPU than previous versions, especially during Windows updates. I also have 3 Windows file and print servers, but they are more disk than CPU.

    On one host, the most CPU intensive are a Ruckus wireless controller and a PRTG monitoring server, currently showing 31% and 36% CPU utilization, respectively.

    On the other, the most CPU intensive are PDQ Inventory/Deploy, Veeam BR and Veeam One, and the current RDS server, currently showing 30%, 33%, and 11% CPU utilization, respectively.



  • @scottalanmiller said in Comparing Server CPU Capabilities?:

    Is there a reason to expect the company to be adding so many more computationally heavy workloads?

    I don't know that we would be doubling it; I was just using that as an example of how I can associate some numbers with this planning process. If someone says we are spending $25K per server, how much more capability/capacity do we get from our investment? For RAM and storage, that is easy: faster and greater RAM and storage capacity, adding tiered storage by introducing SSDs in RAID 5 and using NL-SAS in RAID 1 for "bulk" storage.

    Apparently, due to the complexity and advancements in CPU tech, there isn't an easy way to calculate or express a tangible value for it.



  • @NetworkNerd said in Comparing Server CPU Capabilities?:

    Do you have any issues with high CPU ready times today?

    Here is the CPU Ready chart (previous month) at the host level for the first host . It has 9 active VMs.
    f6d54a6e-1dc6-43eb-8db6-64cf550f7c5d-image.png

    Here is the CPU Ready chart (previous month) at the host level for the second host. It has 17 active VMs.
    dc212634-524b-40ff-8851-ae181e859374-image.png

    The differential between the two, in terms of number of VMs, is due to current storage constraints. One of the VMs on the first host has significantly greater storage usage than all the others; that usage has grown and forced the uneven allocation of VMs, compounding over time.



  • @IRJ said in Comparing Server CPU Capabilities?:

    @wrx7m said in Comparing Server CPU Capabilities?:

    @scottalanmiller So, if my average CPU usage on one of the hosts is over 50% and the max is 95% for the past 6 months, I can't use that to say how much more CPU I need if I wanted to double the number of running VMs with similar performance needs?

    How can you define similar, though?

    Whatever my average CPU utilization is for my existing VMs on my existing hosts with 2 x Intel Xeon E5-2609 v1 CPUs.



  • You have E5-2600 v1, v2, v3, v4, and then Scalable Gen 1 and Gen 2 as the latest generation. So the new CPU is 5 generations newer.

    A rough number is that you get about a 10% increase in performance per core for each generation, at the same clock speed. Sometimes it's more and often it's less, but memory speeds have also increased, so I think 10% is a fair number. That said, a lot of the overall performance increase has come from increasing the clock speed or, in most cases, increasing the number of cores.

    Anyway, 5 generations at 10% compounded is about a 60% increase. Add the clock speed increase from 2.4 to 2.8 GHz and you are at roughly 87%. Add going from 2x4 cores to 16 cores and you have a total of about 3.7 times the CPU performance.

    To keep the actual performance of the VMs the same but run 3-4 times as many VMs, you would also need the same increase in storage and network performance. That is quite possible if you are also moving from spinning disks to SSD and perhaps from 1 GbE to 10 GbE on the hypervisors. You also need 3-4 times as much RAM, of course.

    Depending on what you are upgrading, I'd say the conservative number is that you will get twice the capacity and the optimistic number is four times the capacity.
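    The generational arithmetic above can be checked with a quick back-of-envelope calculation (the 10% per generation is the assumption stated above, not a measured figure):

```python
per_gen_gain = 1.10        # assumed ~10% per-core gain per generation
generations = 5            # E5-2600 v1 -> Xeon Scalable Gen 2
clock_ratio = 2.8 / 2.4    # base clocks: Gold 6242 vs E5-2609 v1
core_ratio = 16 / (2 * 4)  # one 16-core CPU vs dual 4-core CPUs

total = per_gen_gain ** generations * clock_ratio * core_ratio
print(f"estimated capacity ratio: {total:.1f}x")  # ~3.8x
```

    Compounding the 10% rather than adding it is what nudges the result slightly above the "about 3.7 times" quoted above.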



  • @wrx7m said in Comparing Server CPU Capabilities?:

    https://www.cpubenchmark.net/compare/Intel-Xeon-Gold-6242-vs-Intel-Xeon-E5-2609/3516vs1429

    The numbers in the link are a little misleading because you are going from dual E5-2609 v1 to a single 6242. The multi-core performance for that comparison is 8162 versus 25313 for the 6242.

    One sample could be misleading, but you could always check other similar CPUs. To me the numbers look reasonable, though.
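    As a sanity check, the ratio of those quoted composite scores lands in the same ballpark as the generational estimate earlier in the thread. The scores are the ones quoted above and will drift as PassMark collects more samples:

```python
old_dual_score = 8162   # quoted multi-thread estimate, dual E5-2609 v1
new_score = 25313       # quoted multi-thread score, single Gold 6242

ratio = new_score / old_dual_score
print(f"benchmark ratio: {ratio:.1f}x")  # ~3.1x
```

    Roughly 3.1x from the benchmark versus roughly 3.7x from the generational estimate; close enough for capacity planning at this level of precision.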



  • @wrx7m said in Comparing Server CPU Capabilities?:

    I don't know that we would be doubling it; I was just using that as an example of how I can associate some numbers with this planning process. If someone says we are spending $25K per server, how much more capability/capacity do we get from our investment? For RAM and storage, that is easy: faster and greater RAM and storage capacity, adding tiered storage by introducing SSDs in RAID 5 and using NL-SAS in RAID 1 for "bulk" storage.

    The problem there is... if you don't need it, that capacity is wasted. So unless you need X capacity, whether you get it or not for the investment is moot.



  • @Pete-S said in Comparing Server CPU Capabilities?:

    The numbers in the link are a little misleading because you are going from dual E5-2609 v1 to a single 6242. The multi-core performance for that comparison is 8162 versus 25313 for the 6242.

    Yes, you have to remember to double the numbers when looking at the multi-threaded composites. But you also have to shave a little off for the dual-socket overhead, so doubling is close.



  • Another element to consider is how the newer-gen CPUs deal with Spectre/Meltdown and friends compared to the older generation. Also keep in mind that if you're going single socket, your RAM options will be limited compared to dual socket. I'm not saying that one or the other is better, as it all depends on budget and purpose, but I didn't notice either factor being mentioned when skimming the thread.



  • For analysis of the current utilization of your hosts, the best bet would be running a LiveOptics collection. The resulting report will present granular details on the utilization of your CPU, RAM, and storage.
    After the run is complete, it would be highly beneficial to discuss your requirements for the future solution with the vendor of your choice.



  • @Pete-S said in Comparing Server CPU Capabilities?:

    To keep the actual performance of the VMs the same but run 3-4 times as many VMs, you would also need the same increase in storage and network performance. That is quite possible if you are also moving from spinning disks to SSD and perhaps from 1 GbE to 10 GbE on the hypervisors. You also need 3-4 times as much RAM, of course.
    Depending on what you are upgrading, I'd say the conservative number is that you will get twice the capacity and the optimistic number is four times the capacity.

    Thanks. The majority of workloads will be moved to SSD, with the file servers running on spinning disks. I already have 2 x 10 Gig for VMs and 2 for vMotion, per server. That is a good point about the RAM. Current RAM utilization is a bit constrained; each server only has 128 GB. I am looking at 320 GB. This will allow for some growth and also allow running all VMs on a single host if necessary.



  • @wrx7m said in Comparing Server CPU Capabilities?:

    @Pete-S said in Comparing Server CPU Capabilities?:

    To keep the actual performance of the VMs the same but run 3-4 times as many VMs, you would also need the same increase in storage and network performance. That is quite possible if you are also moving from spinning disks to SSD and perhaps from 1 GbE to 10 GbE on the hypervisors. You also need 3-4 times as much RAM, of course.
    Depending on what you are upgrading, I'd say the conservative number is that you will get twice the capacity and the optimistic number is four times the capacity.

    Thanks. The majority of workloads will be moved to SSD, with the file servers running on spinning disks. I already have 2 x 10 Gig for VMs and 2 for vMotion, per server. That is a good point about the RAM. Current RAM utilization is a bit constrained; each server only has 128 GB. I am looking at 320 GB. This will allow for some growth and also allow running all VMs on a single host if necessary.

    I'm really wondering if you have a need for two hosts right off the get-go.

    Scott's comment about buying more than you need for today is something to seriously consider.



  • @wrx7m said in Comparing Server CPU Capabilities?:

    @Pete-S said in Comparing Server CPU Capabilities?:

    To keep the actual performance of the VMs the same but run 3-4 times as many VMs, you would also need the same increase in storage and network performance. That is quite possible if you are also moving from spinning disks to SSD and perhaps from 1 GbE to 10 GbE on the hypervisors. You also need 3-4 times as much RAM, of course.
    Depending on what you are upgrading, I'd say the conservative number is that you will get twice the capacity and the optimistic number is four times the capacity.

    Thanks. The majority of workloads will be moved to SSD, with the file servers running on spinning disks. I already have 2 x 10 Gig for VMs and 2 for vMotion, per server. That is a good point about the RAM. Current RAM utilization is a bit constrained; each server only has 128 GB. I am looking at 320 GB. This will allow for some growth and also allow running all VMs on a single host if necessary.

    By the looks of it, you have 12 memory slots for use with your single CPU, so you have more than enough room.



  • @Dashrender said in Comparing Server CPU Capabilities?:

    @wrx7m said in Comparing Server CPU Capabilities?:

    @Pete-S said in Comparing Server CPU Capabilities?:

    To keep the actual performance of the VMs the same but run 3-4 times as many VMs, you would also need the same increase in storage and network performance. That is quite possible if you are also moving from spinning disks to SSD and perhaps from 1 GbE to 10 GbE on the hypervisors. You also need 3-4 times as much RAM, of course.
    Depending on what you are upgrading, I'd say the conservative number is that you will get twice the capacity and the optimistic number is four times the capacity.

    Thanks. The majority of workloads will be moved to SSD, with the file servers running on spinning disks. I already have 2 x 10 Gig for VMs and 2 for vMotion, per server. That is a good point about the RAM. Current RAM utilization is a bit constrained; each server only has 128 GB. I am looking at 320 GB. This will allow for some growth and also allow running all VMs on a single host if necessary.

    I'm really wondering if you have a need for two hosts right off the get-go.

    Scott's comment about buying more than you need for today is something to seriously consider.

    Well, since this is production, yes, we do need 2.



  • @wrx7m said in Comparing Server CPU Capabilities?:

    Well, since this is production, yes, we do need 2.

    You might need two, but being production wouldn't tell us that. Only HA environments need two, and that's super rare.



  • @scottalanmiller said in Comparing Server CPU Capabilities?:

    @wrx7m said in Comparing Server CPU Capabilities?:

    Well, since this is production, yes, we do need 2.

    You might need two, but being production wouldn't tell us that. Only HA environments need two, and that's super rare.

    If we don't have any of our servers running, no one can do any work except for chat and email. I don't have the exact cost, but I can say that it is expensive.



  • @wrx7m said in Comparing Server CPU Capabilities?:

    If we don't have any of our servers running, no one can do any work except for chat and email. I don't have the exact cost, but I can say that it is expensive.

    Try ballparking it. And ballpark the cost of the second server with all of the setup, risks, and licensing.

    Downtime is usually shockingly cheap. Like, often 1-5% as much as people think it is. Especially when things like chat and email keep working! Those are the core apps.

    What functions stop in the first five minutes, hour, day if the server goes down?



  • @scottalanmiller said in Comparing Server CPU Capabilities?:

    @wrx7m said in Comparing Server CPU Capabilities?:

    If we don't have any of our servers running, no one can do any work except for chat and email. I don't have the exact cost, but I can say that it is expensive.

    Try ballparking it. And ballpark the cost of the second server with all of the setup, risks, and licensing.

    Downtime is usually shockingly cheap. Like, often 1-5% as much as people think it is. Especially when things like chat and email keep working! Those are the core apps.

    What functions stop in the first five minutes, hour, day if the server goes down?

    Order processing will stop completely. If no one has access to the ERP and other LOB apps we use, they can't do much of anything.



  • The rule of thumb is that downtime is cheap and HA is expensive. It's far from always the case, but it is generally true. It is a super rare company that feels even a complete outage of a few hours in any significant way, and very rare that a company comes to a full standstill without its computers.

    And if you can keep some things still working, even better.

    From the business side of things, we always want our downtime to sound 100x more expensive than it is. We talk in terms of our busiest day, not our average. We talk in terms of "lost money" when really it is normally "aggravation." We talk in terms of "outages" rather than "inconvenient temporary workarounds."

    Mitigation techniques for an outage are pretty strong for most companies. Some work just keeps going, at least for a while. Some functions keep running. Some people switch to available tasks to stay busy. Some people take breaks or go home. Cost reductions in labour, insurance, power, etc. all offset outages. And most companies can simply shift tasks to another time. No company runs at 100% capacity 24x7, none. It's not sustainable. Some companies have very little capability to make up work later, but most can.

    It's actually not uncommon for full day outages to end up having an "effectively zero dollar" cost when it is all said and done.



  • @wrx7m said in Comparing Server CPU Capabilities?:

    Order processing will stop completely. If no one has access to the ERP and other LOB apps we use, they can't do much of anything.

    Sure, but what does that really cost you? I've worked in a lot of factories and even at IBM this would have cost us almost nothing... because over the course of a few days or maybe a couple weeks we'd just run slightly faster and catch up.

    A single-server outage is rarely more than a few hours, maybe a day at most. A second server, whether full HA or just a standby, is there to reduce that time from "several hours" to "minutes or maybe an hour tops." The average outage that HA protects against is actually just a few minutes. Full outages, like from total hardware failure, are crazy rare, and if you are in a major city, part swaps are normally just a few hours.

    Add on to that the risk that the HA system itself might cause an outage and it gets harder to justify.