jasong
04-21-2005, 06:11 PM
I haven't read the article as of this posting, but you can find it here. (http://www.hexus.net/content/reviews/review.php?dXJsX3Jldmlld19JRD0xMTI5)
they've got the article covered in an advertisement, but copy and paste still works, NYAAAH, NYAAAH!!!!
It's all about the dual-core these days. From Intel launching its split-core Smithfield processor range, from the Pentium Extreme Edition 840 down, to today's official launch of AMD's dual-core Opteron range, all the hype in the processor world revolves around putting more than one core inside a processor package. The current approaches of AMD and Intel to multi-core processors aren't new; other CPU vendors have had multi-core designs for a long time. However, those processors aren't x86 designs and fall outside the usual HEXUS remit, and the eye of the consumer and many workstation and server vendors.
Before I dive into things, there are some concepts and terms to digest that I'll be using to describe multi-core processors in this article. Intel's Smithfield processor is made up of two separate processor dies on the same physical package, under the same heatspreader. So if you were to take the heatspreader off a Pentium D or a Pentium Extreme Edition 840, you'd see two separate dies. AMD's approach is different, with both cores occupying one die. Take the heatspreader off the Opteron 875 that I'll show you soon and you'd see just one die, albeit one which houses two processor cores.
While technically they're both dual-core processors, the two current approaches show there's a significant difference in the ways of getting more than one processor core into one physical package. Both ways of doing chip multiprocessing, where you place multiple physical cores on the same package, have the cores sharing an interconnect to the rest of the system. The cores on Intel's Smithfield simply ride the same GTL+ bus that connects the processor to the memory controller and the rest of the system. AMD's approach has the cores sharing a memory controller and HyperTransport links to the rest of the system.
Regardless of the approach, multi-core processors are designed to exploit threading models and thread-level parallelism on modern operating systems. Common sense tells you that doing more work in the same amount of time on a computer system will increase performance. That's the driving mantra behind all multi-processor systems: if there's another complete set of execution resources on another processor, you should use it wherever possible.
The barrier to widespread adoption of multi-processor systems is the software. Most consumer software is single-threaded: the application does all its work in one thread, which can only run on one processor at any given time. If you've got multiple processors, the single-threaded application you're using will ignore all but one of the CPUs. However, with x86 CPU vendors running out of ways to increase single-threaded performance, essentially clock speed and larger caches, because of the limits of the manufacturing processes the CPUs are built with, they've had to go wide, building multi-core processors that keep sane clock speeds and cache sizes but let you run multiple threads of execution.
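To make that single-threaded versus multi-threaded distinction concrete, here's a rough sketch of my own (not from the article) of the sort of change application writers have to make: splitting one loop of work across two POSIX threads so a dual-core chip can work on both halves at once. Compile with something like gcc -O2 -pthread.

#include <pthread.h>
#include <stdio.h>

#define N 1000000

static double data[N];

struct slice { int start; int end; double sum; };

static void *partial_sum(void *arg)
{
    struct slice *s = arg;          /* each thread sums its own half */
    s->sum = 0.0;
    for (int i = s->start; i < s->end; i++)
        s->sum += data[i];
    return NULL;
}

int main(void)
{
    for (int i = 0; i < N; i++)
        data[i] = 1.0;

    /* One thread per core: the work is split into two independent halves. */
    struct slice halves[2] = { { 0, N / 2, 0.0 }, { N / 2, N, 0.0 } };
    pthread_t tid[2];

    for (int t = 0; t < 2; t++)
        pthread_create(&tid[t], NULL, partial_sum, &halves[t]);
    for (int t = 0; t < 2; t++)
        pthread_join(tid[t], NULL);

    /* A single-threaded version would do all N iterations on one core;
     * here a dual-core CPU can run both halves at the same time. */
    printf("total = %.0f\n", halves[0].sum + halves[1].sum);
    return 0;
}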
The mass-market introduction of multi-core x86 processors should therefore force operating system and application vendors to seriously consider multi-threading wherever possible. There's massive scope for parallelising many consumer software applications, so pervasive multi-core processors in systems worldwide will only increase the number of well-written multi-threaded applications, which in turn will help drive the reverse: further adoption of PCs, workstations and servers with multi-core processors, because more applications are available to exploit them. And while single-threaded performance can theoretically rise just by giving the OS you're running access to more than one CPU, the large gains come from explicitly multi-threaded applications running on that same OS.
In the server and workstation world, the software is most of the way there already, and multi-core processors will simply allow vendors to pack more processing power into the same space. That brings benefits in power (you can double your processing power in the same number of chassis, so you're not spending any more of the power budget on things like disks and memory), in size (to double your processing power in the same size of chassis, just use dual-core processors) and therefore in money.
Hopefully the benefits of a multi-core processor, like Intel's new dual-core desktop and workstation processors and AMD's dual-core Opteron range, are obvious when paired with software that can exploit it. With Tarinder having looked at dual-core for the average consumer recently, it's my turn to look at dual-core Opteron for the workstation and server markets.
Let's jump right in with a look at how AMD engineered the dual-core Opteron processor.
Sharing a memory controller
I mentioned on the previous page that AMD's dual-core approach has both cores sharing the same die. They've done so to allow both cores to share a memory controller, which sits on the CPU in AMD's K8 generation, of which dual-core Opteron is a member. Each core therefore gets all the benefits of AMD's on-die memory controller that a single-core processor does. So while available memory bandwidth doesn't increase when you add the second core, since there's only one memory controller, the benefits of the low-latency controller are available to both cores, keeping performance as high as possible.
Sharing HyperTransport links to the system and other CPUs
Every Opteron processor has one HyperTransport link that allows it to connect to devices in the system. Then, depending on the Opteron model, there may be one or two other HyperTransport links available for communicating with other processors in the system. A 1-series Opteron doesn't have any other links for communication with other Opterons in the system, since it's the single processor version. 2-series Opterons have one link, allowing them to connect to one other processor for a maximum of two in the same system. 8-series Opteron has two links, and depending on the topology employed by the mainboard they sit in, that allows you to connect up to eight Opterons together.
Dual-core Opteron doesn't change any of that, with the cores sharing those links via glue logic on the die. You can still place up to eight physical processor packages in a system with a dual-core 8-series Opteron, but the dual-core nature of the processors gives you sixteen processor cores to do work on. So it's not quite as optimal as having sixteen physical CPUs, each with its own memory controller, but it allows you to double processing power in any existing system that supports existing Opteron CPUs.
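To summarise the line-up described above in one place, here's a small sketch of my own; the field names are just my shorthand, not AMD's terminology.

#include <stdio.h>

/* CPU-to-CPU HyperTransport links and socket counts per Opteron series,
 * as described above. */
struct opteron_series {
    const char *series;
    int cpu_to_cpu_ht_links;  /* HT links available for talking to other CPUs */
    int max_sockets;          /* maximum physical packages per system */
};

int main(void)
{
    const struct opteron_series lineup[] = {
        { "1xx", 0, 1 },  /* single-processor only */
        { "2xx", 1, 2 },  /* one link to a second processor */
        { "8xx", 2, 8 },  /* two links; up to eight sockets, topology permitting */
    };
    const int cores_per_package = 2;  /* dual-core Opteron */

    for (int i = 0; i < 3; i++)
        printf("Opteron %s: %d CPU-to-CPU HT link(s), up to %d socket(s) = %d cores\n",
               lineup[i].series, lineup[i].cpu_to_cpu_ht_links,
               lineup[i].max_sockets, lineup[i].max_sockets * cores_per_package);
    return 0;
}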
Cache coherency with MOESI
In any multi-processor system, the caches for each processor core need to be able to talk to each other to maintain coherency, should any processor in the system need data from the cache of any other CPU.
Back in the days of the Athlon MP, AMD implemented the MOESI cache coherency protocol. MOESI stands for Modified, Owned, Exclusive, Shared and Invalid. Each of those is a state a cache line in the system can occupy, depending on what's being done with it by the CPU cores. For example, say that core one updates some memory in its cache before writing it back out to main memory. Core two is always snooping core one's traffic, and as it spots that happening, the line is marked Modified to indicate the caches aren't coherent. In a MESI cache coherency scheme, without the Owned state, if core two wanted to read that memory it would have to ask core one for it, and core one would tell it to hang on while it writes the data back out to main memory.
Since the Athlon MP, however, AMD's SMP-capable processors, single-core Opteron and now dual-core Opteron included, have used the Owned state. In the case above, it allows core one to pass the data that core two wanted over the core-to-core interconnect and update the cache on the other CPU directly, without writing it back out to main memory, with the line then marked as Shared. You can see how that would increase performance.
There's less latency when cache data needs to be updated, since you don't need two trips out to main memory, one per core, for a read and write to get the caches back in sync. It's worth noting that Intel's multi-processor Xeon systems currently implement the MESI protocol, so they do have to go out to main memory if cache data is marked Invalid or Modified.
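Here's a toy sketch of my own, a gross simplification of what the hardware actually does, showing why that extra Owned state saves the two main-memory trips when core two reads a line core one has modified.

#include <stdio.h>

/* The five MOESI states; MESI is the same set minus OWNED. */
enum line_state { INVALID, SHARED, EXCLUSIVE, OWNED, MODIFIED };

/* What happens when another core reads a line this core holds as Modified,
 * and how many trips to main memory are needed to service the read. */
static void remote_read(enum line_state *line, int protocol_has_owned, int *memory_trips)
{
    if (*line != MODIFIED)
        return;

    if (protocol_has_owned) {
        /* MOESI: supply the dirty data cache-to-cache; this core keeps it as
         * Owned, the reader gets a Shared copy, and DRAM isn't touched yet. */
        *line = OWNED;
    } else {
        /* MESI: write the dirty line back to DRAM, then the reader fetches it
         * from DRAM; both copies end up Shared. */
        *memory_trips += 2;
        *line = SHARED;
    }
}

int main(void)
{
    enum line_state mesi_line = MODIFIED, moesi_line = MODIFIED;
    int trips_mesi = 0, trips_moesi = 0;

    remote_read(&mesi_line, 0, &trips_mesi);
    remote_read(&moesi_line, 1, &trips_moesi);

    printf("MESI:  %d trip(s) to main memory\n", trips_mesi);   /* prints 2 */
    printf("MOESI: %d trip(s) to main memory\n", trips_moesi);  /* prints 0 */
    return 0;
}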
So there's a fast core-to-core link that allows the cores in any dual-core Opteron system, even one with multiple processors, to update each other's caches as fast as possible, with little latency. If the caches to be updated reside on separate physical packages, the cache updates are conducted over HyperTransport. The important thing to keep in mind is that they don't need to hit main memory to do so, unlike current Xeon and the dual-core Pentium D and Extreme Edition.