Cores and Processor Speed: What Do I Need?

scottalanmiller

In an old article I talked about how to understand the differences between CPUs, cores, hyperthreading, logical processors and such. Now that we know the terminology and how to determine what we have we can take a different approach and look at what we need.

Determining our needs around processors is a bit of a dark art. Being good at it requires a bit of experience, a lot of knowledge about workloads and a lot of luck. But with a good understanding of what the factors are we can do a decently good job of finding a good starting point.

Virtual CPUs vs. Physical CPUs: It is very important to understand that this discussion is about actual logical processors and CPU speed, not vCPUs and presented CPU speeds as used by VMs running on hypervisors. A vCPU is not directly related to a real logical processor and requires an entirely different discussion to understand. This article is going to look at what hardware needs exist. Assignment of those real resources as virtual resources is an abstraction layer and is dependent on the specific hypervisor and how it consumes and exposes those real resources as virtual ones.

Different workloads have different needs. Long ago, most software was written during a time when computers rarely, if ever, had more than a single processor or core. In the desktop world this was up until the mid-2000s for most people and even in the server world it was rare for servers to have more than two processors, with only a single core each. In the enterprise space with very large servers of course this was different but the software that they ran was generally very specialized and written specifically for that hardware.

In a similar manner, many arcade machines and even video game consoles were multi-processor long before mainstream computing was. Even the Commodore 128 had two processors.

It is relatively recent that processor makers have shifted from working primarily to increase the clock speed and efficiency of a processor to giving a processor more cores. Speed, efficiency and core count are the three big factors that we must consider in processor performance. But they all have very different effects.

Processor Speed, generally measured in GHz, is the speed at which a processor's clock ticks. It is easy to understand because if we have two processors that are otherwise identical and one has twice the clock speed, it means that it does twice as much in the same amount of time. It is literally "twice as fast." (Of course this is only the processor; its memory, drives and other factors do not speed up, so often this additional speed is wasted.)

Processor Efficiency is how much a processor can do per clock cycle. Each clock cycle a processor gets a chance to "do something." Some processors do a lot more during a clock cycle than others. An IBM Power 8 does many, many more "things" per clock cycle than does an Intel 8088. Sometimes efficiency moves backwards; famously the Intel Pentium 4 was less efficient than the Pentium III had been.

The P3 to P4 move was a famous example of processor design failure. The P4 gave up efficiency in exchange for a super fast system clock making for a far easier time marketing what sounded like a fast processor, but in reality was not. The P4 was also the first place that hyperthreading was introduced, but it failed miserably and gave the technology a black eye as it made the processors slower in most cases.
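To make the speed versus efficiency trade-off concrete, here is a toy model in Python. It is only a sketch: the function name and every number in it are made up for illustration and do not describe any real chip, and real processors vary far too much by workload for a formula like this to be used for actual comparisons.

```python
def work_per_second(clock_ghz: float, work_per_cycle: float) -> float:
    """Toy model: billions of 'things' one core finishes per second.

    clock_ghz      -- how many billion times per second the clock ticks
    work_per_cycle -- how many 'things' get done on each tick (efficiency)
    """
    return clock_ghz * work_per_cycle


# Hypothetical design A: lower clock, but more done on every tick.
efficient = work_per_second(clock_ghz=1.4, work_per_cycle=1.0)

# Hypothetical design B: higher clock, but less done on every tick.
high_clock = work_per_second(clock_ghz=1.8, work_per_cycle=0.7)

print("efficient design :", efficient)
print("high clock design:", high_clock)
print("higher clock wins?", high_clock > efficient)
```

With these invented numbers the chip with the bigger GHz figure is the slower one, which is the general shape of the trap described above.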

Cores are how many full logic engines exist on the CPU. Traditionally a CPU had one core, but today they can have many. Each core is a complete logic unit able to process a workload independent of other cores. A processor with two cores can literally process two things at the same time instead of just one. In theory, the processor can do double the work, but realistically this is not how it works.

Putting these factors together is rather complex. We need to understand our specific processor options, for one thing. Typically we get only a limited number of options as most workloads are relatively specific. If we are running IBM AIX, we will be running on a Power 6, Power 7 or Power 8 processor only. If we want to use VMware vSphere our choices are limited to Intel Xeon and AMD Opteron, both of which are AMD64 architecture.

In normal usage, it is the AMD64 world where processor selection is the most complicated as AMD and Intel take extremely different approaches to processor design. AMD focuses strongly on solid, high core count systems. Intel, on the other hand, focuses on per core performance. Intel also has optional hyperthreading, which allows you to choose between enabling HT to get the appearance of more cores at a loss of per core efficiency or disabling HT to get maximum per core performance.

Determining what processor to pick, even from within a single processor range (such as only from within the available AMD Opterons) is difficult enough. Adding any amount of variety makes it extremely hard.

What we must look at first is our workload. The easiest thing to determine is our threading needs. Unlike other factors, our thread need or "load" is relatively straightforward as it is a discrete, real number. In the old days with applications written for DOS, and even DOS itself, everything was single threaded - meaning that any given process could only use a single logical processor no matter how many were in the system and idle. So having a second logical processor would have been pointless. Today many workloads remain single threaded and at this point this is not likely to change again for a long time.
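As a quick illustration of why a single threaded process cannot benefit from idle logical processors, here is a small Python sketch. `os.cpu_count()` is a standard library call that reports the logical processors the operating system sees; the helper function is a hypothetical name of my own and only shows the upper bound on how much of a machine a given number of fully busy threads can occupy.

```python
import os

# Logical processors reported by the operating system (may be None
# on platforms where the count cannot be determined).
logical = os.cpu_count()
print("Logical processors visible to the OS:", logical)


def max_share_of_machine(busy_threads: int, logical_processors: int) -> float:
    """Upper bound on the fraction of the machine that N busy threads can use.

    Each thread can occupy at most one logical processor at a time, so
    threads beyond the processor count add nothing to this bound.
    """
    return min(busy_threads, logical_processors) / logical_processors


# One single threaded process on a hypothetical eight logical processor box.
print(max_share_of_machine(busy_threads=1, logical_processors=8))
```

On that hypothetical eight processor system the single thread can never use more than one eighth of the machine, no matter how idle the rest of it is.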

On the desktop side of the world, the trend is still towards systems with low core counts. Thankfully because operating systems are all multi-threaded today even single threaded applications benefit from having additional cores. For example, an old video game or legacy business application that only runs on a single core can get all of the resources of that core while the other core(s) handle other tasks for the system such as operating system tasks, polling hardware, handling memory management, disk management and such. And a system with multiple cores can still give a full core to each single threaded application at the same time allowing each to run at full speed without being blocked by another single threaded application. So for desktop usability, the usefulness of multi-core systems preceded the availability of multi-threaded applications by quite some time. And even today many applications have so little need for CPU processing that leveraging more than one thread may not be useful. CPUs are so fast today that few applications outside of video games can take full advantage of them.

It should be noted that what we are discussing here is referred to as symmetrical multi-processing and refers to having many identical logical thread engines / logical processors. This is different from what GPUs and RAID cards do for processing offload, as those typically use very different architectures and are asymmetrical multi-processing.

In a server, typically, we are running very different workloads than on a desktop. It is very common for servers to run many processes at once or to have code that is specifically written to take advantage of many available logical processors. This is both because servers are generally used to service many end users at once and also because server hardware typically has the ability to do so, and had it decades before desktops did. (It is common for a desktop today to have four physical cores while it is common for even a small server to have twenty to forty. High end desktops normally top out around eight cores, while high end servers will go into the hundreds or thousands of cores.)

There are a few common approaches to utilizing the logical processors on a server. One is to write software that is multi-threaded itself. This is very complex but also very powerful. The need both to provide this kind of software and to make it easier and safer to write has given rise to a number of languages that specifically make this kind of programming easier, namely Scala, Clojure, F#, Haskell and the like. This is done to allow complicated logic and shared state to exist across the multi-threaded application.
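To give a feel for what "shared state across a multi-threaded application" means, here is a minimal Python sketch. The class and function names are my own inventions for illustration. One caveat: standard CPython builds only run Python code on one logical processor at a time, so this shows the shared state problem and its locking, not a real performance gain. The languages named above do not have that limitation.

```python
import threading


class SharedCounter:
    """State shared by every thread; the lock keeps updates from colliding."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # Only one thread at a time may read-modify-write the value.
        with self._lock:
            self.value += 1


def handle_requests(counter: SharedCounter, how_many: int) -> None:
    """Pretend each loop iteration is one unit of work that updates shared state."""
    for _ in range(how_many):
        counter.increment()


counter = SharedCounter()
threads = [
    threading.Thread(target=handle_requests, args=(counter, 10_000))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)  # 4 threads x 10,000 increments = 40000
```

The lock is the "complex" part: every piece of shared state needs this kind of care, which is a large part of why this style of software is hard to write safely.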

Another approach is to spin off many individual processes. A great example of this is the Apache web server. As an Apache system gets busy, it simply creates more processes to handle incoming web requests. Each individual process is single threaded and can never run on more than a single logical processor at a time. But by having many of them, the illusion of multi-threading is maintained. Apache is able to easily do this because each process has no need of communicating with other sibling processes, they each do their own thing independently.
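The many-independent-processes pattern can be sketched roughly like this in Python. This is not how Apache itself is implemented; it is only an illustration of the idea using the standard `subprocess` module, and the worker script, function name and request paths are all hypothetical.

```python
import subprocess
import sys

# Each worker is its own single threaded operating system process. It knows
# nothing about its siblings and shares no state with them.
WORKER = (
    "import os, sys; "
    "print(f'worker pid={os.getpid()} handled request {sys.argv[1]}')"
)


def serve(requests):
    """Start one independent worker process per request, then collect output."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER, str(req)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for req in requests
    ]
    # All workers are already running at this point; the operating system
    # is free to schedule each one on a different logical processor.
    return [p.communicate()[0].strip() for p in procs]


results = serve(["/index.html", "/about.html", "/contact.html"])
for line in results:
    print(line)
```

Because no worker needs to talk to any other, nothing here requires locks or shared memory, which is what makes this approach so much simpler than true multi-threading.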

Likewise it is common for processes that are single threaded or that have limited threading support to be handled manually. A great example of this is NodeBB. NodeBB itself is single threaded. In order to handle additional workload a single system with multiple logical processors can run one NodeBB instance per thread so manually configuring, as an example, eight instances of NodeBB to run on an eight core system would allow every core to handle the workload of a single NodeBB instance without interference from other instances.
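Planning "one instance per logical processor" is simple enough to sketch in a few lines of Python. To be clear, this is not NodeBB's own configuration format. The function name and the base port number are assumptions made purely for illustration, and something would still need to spread incoming requests across the instances, which is outside this sketch.

```python
def plan_instances(logical_processors: int, base_port: int = 4567) -> list[dict]:
    """Return one planned instance per logical processor, each on its own port.

    base_port is a hypothetical starting point; use whatever your setup needs.
    """
    return [
        {"instance": i, "port": base_port + i}
        for i in range(logical_processors)
    ]


# The eight core example from above: eight single threaded instances.
plan = plan_instances(8)
for entry in plan:
    print(entry)
```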

Of course, it should be noted that it is very rare that any process would use an entire core or logical processor all of the time. Generally even busy ones are only busy some of the time. So the concept of running eight processes on eight cores is a bit hypothetical. In reality a busy desktop might run close to one hundred processes and a server may easily go into the thousands. But most of those threads are idle the majority of the time so the number of processes that a server can handle has to be seen in that light. A four core server might still run sixteen or more Apache processes for efficiency. Perhaps even more.

How any given process runs has to be understood. A single process that needs to hog a logical processor full time and a process that uses one only one percent of the time create very different scenarios. Then we have to consider needs around latency and throughput but that is for another discussion.
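Some back-of-the-envelope arithmetic shows why process count alone says so little. This Python sketch uses a hypothetical helper name and assumed busy fractions. It only estimates the average load and says nothing about bursts, latency or throughput.

```python
def average_busy_processors(process_count: int, busy_fraction: float) -> float:
    """Average number of logical processors kept busy by a set of processes.

    busy_fraction is the share of time each process actually needs a
    processor (1.0 = hogs it full time, 0.01 = busy one percent of the time).
    """
    return process_count * busy_fraction


# Sixteen web server processes, each assumed busy a quarter of the time,
# keep about four logical processors occupied on average.
print(average_busy_processors(16, 0.25))  # 4.0

# One process that hogs a logical processor full time.
print(average_busy_processors(1, 1.0))  # 1.0

# Eight processes that are each assumed busy only half the time.
print(average_busy_processors(8, 0.5))  # 4.0
```

Sixteen mostly idle processes and four always busy ones put the same average load on a four core server, but they behave very differently when demand spikes.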

So having logical processors determines how many threads or processes we can service at any given instant. Figuring out how many we need to service is the tougher part.

The speed of processors obviously plays an important role. The faster the processor, the more that each logical processor can process in a given amount of time. All other things being equal, a processor of double the clock speed can do twice the work of one half its speed. But rarely are processors available across that wide a range of speeds, and generally many other things vary along with clock speed. No matter what, a processor of twice the speed is more useful than having double the logical processors as all workloads, no matter what they are, can and will take advantage of a faster clock even if only in reducing latency, but only certain workloads can take advantage of additional logical processors.
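One common way to put numbers on this is Amdahl's law, which I am bringing in here as an outside illustration rather than something this post is built on. In this Python sketch the function names and the parallel fractions are assumptions. It also treats the parallel part of the work as scaling perfectly, which real workloads rarely do.

```python
def speedup_from_faster_clock(clock_multiplier: float) -> float:
    """All other things being equal, every workload speeds up with the clock."""
    return clock_multiplier


def speedup_from_more_processors(parallel_fraction: float,
                                 processor_multiplier: float) -> float:
    """Amdahl's law: only the parallel part of the work gets faster.

    parallel_fraction    -- share of the work that can spread across processors
    processor_multiplier -- how many times more logical processors we add
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processor_multiplier)


print("2x clock speed:", speedup_from_faster_clock(2.0))
for p in (0.0, 0.5, 0.9, 1.0):
    print(f"2x processors, {p:.0%} parallel work:",
          round(speedup_from_more_processors(p, 2.0), 2))
```

Doubling the logical processors only matches doubling the clock when the work is entirely parallel. For a purely single threaded workload it gives no speedup at all.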

The architecture of the processor determines its efficiency and this is, by far, the hardest performance characteristic to understand. Every processor works differently. If you took a single core each of an Intel Xeon, AMD Opteron, IBM Power, Oracle Sparc and Intel IA64 Itanium, ran them all at identical clock speeds and tested them for performance, you would get wildly different results and, in fact, would get wildly different results depending on the kind of test that is run. Not only is a single core of each much faster or slower but the kinds of workloads that each is good at are very different. It is very rare that we have an opportunity to decide between such a broad range of processor designs, but it is critical to understand that logical processor count and/or straight core count and clock speed cannot simply be put into a formula to derive a performance number for comparison. A good example is AMD Opterons, which are designed to have a large number of physical cores, compared to Intel Xeons, which are designed to have relatively few physical cores and more logical processors via hyperthreading. Even with hyperthreading the Xeons have fewer logical processors than their Opteron brethren, yet for the bulk of workloads the Intel Xeon is currently the performance champion.

So we need to understand our own workloads - primarily how a single threaded operation will be impacted by processor speed and then how "threaded" our workload will be. In nearly all cases our architecture is chosen because of other factors and we only need to make decisions around our investments in speed and cores. It is very common today, after decades of CPUs being lacking in servers, for companies, especially in the SMB, to heavily invest in processors so powerful that they have no means of making use of them with their existing workloads while skimping on other aspects of the server such as memory and storage where real bottlenecks exist. CPUs are getting more powerful more quickly than software workloads are becoming complex or expansive. And operating systems routinely get faster and more efficient, as do programming frameworks, often making the need for powerful processors decrease rather than increase!

For example, a busy web server is likely to need many threads but have little requirement for high per thread performance. Often a relational database will be quite the opposite. How individual applications scale and block are key factors to consider.

Of course the world of virtualization has changed general buying requirements. Traditionally an individual workload would dramatically influence CPU purchasing decisions. But today we can balance different types of workloads as needed to spread out use between many processors and can use multiple virtual machines to utilize unused logical processors on our physical hosts. Because of consolidation through virtualization the ability to leverage a large logical processor count has grown significantly. This has been a key factor as to why desktops have stagnated with four physical cores and servers have pushed to many times that. Large logical processor counts simply allow greater consolidation onto a single physical host.

At the end of the day, knowing how to choose processor specs requires a deep understanding of the workload itself as well as the performance needs of those who use those workloads. Above all it requires understanding how additional logical processors will be utilized: will there be enough simultaneous threads to use the available processors?

art_of_shred

I better keep my replies short. There's no room on this thread for comments. You used all the space already.

DustinB3403

@art_of_shred said:

I better keep my replies short. There's no room on this thread for comments. You used all the space already.

TL;DR can you summarize what you said for me?

Dashrender

@DustinB3403 said:

@art_of_shred said:

I better keep my replies short. There's no room on this thread for comments. You used all the space already.

TL;DR can you summarize what you said for me?

Post =long, reply = must short.

DustinB3403

@Dashrender Thx