Dual Cores: Are They the Future?

x0d

Junior Member
Jul 25, 2004
3
0
0
Now that AMD and Intel are both going to offer dual-core chips,
I wonder whether a shift of the PC to a VLIW-based architecture (like the Itanium's IA-64)
wouldn't yield better results.
It seems the MHz game is pretty much over now and the industry is looking for new ways to boost speed.

I see it this way:

Dual core benefits:
1) Can use simpler cores
2) Easier to design
3) Better for server use

Dual core disadvantages:
1) Two caches
2) In a NUMA-based (AMD) architecture, the separate memory banks for each core may increase
memory latency; in a unified (Intel) memory architecture, the bus bandwidth will severely decrease
3) Duplicate execution units, memory controller units and bus units all cost transistors and produce heat
4) Contention for I/O resources
5) Harder to write efficient multithreaded software, especially in games (e.g., dividing current algorithms
to take advantage of the second core; see the sketch after this list)
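
As a rough sketch of what point 5 involves in the simplest possible case (my own toy example with made-up names, not anything from a real game engine), here is a loop split across two POSIX threads; the OS can then schedule one thread on each core:

#include <pthread.h>
#include <stdio.h>

#define N 1000000
static float data[N];

struct range { int start; int end; };

/* Worker: processes its half of the array. */
static void *process_half(void *arg)
{
    struct range *r = (struct range *)arg;
    int i;
    for (i = r->start; i < r->end; i++)
        data[i] *= 2.0f;   /* stand-in for the real per-element work */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    struct range lo = { 0, N / 2 };
    struct range hi = { N / 2, N };

    pthread_create(&t1, NULL, process_half, &lo);
    pthread_create(&t2, NULL, process_half, &hi);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("first element after processing: %f\n", data[0]);
    return 0;
}

Splitting an embarrassingly parallel loop like this is the easy case; the hard part in games is that physics, AI and rendering all depend on each other's results, so the work rarely divides this cleanly.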

At least for personal/entertainment computing, I feel VLIW is the better solution, although it is not
backwards compatible.
Look at the DSP world: dual core is a last-resort solution there, isn't it?
 

TuxDave

Lifer
Oct 8, 2002
10,571
3
71
I haven't seen a comprehensive study yet, but the popular reason for the move to dual-core chips is to improve the power/performance ratio. Someone must have figured out that, for the same throughput, a dual-core chip can run at lower power levels than a single-core chip. The single core needs to clock faster and use smaller transistors, which have more leakage. Dual cores will not need to be clocked as fast, and that may be where the power reduction comes from. If not, then there wouldn't be a point in moving over, since it would be power limited again, just at a lower performance level. I consider the whole thing a shift from pipelining for power-performance to parallelism for power-performance.
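
To put rough numbers behind that reasoning (my own back-of-the-envelope sketch, not from any study), dynamic CMOS power scales roughly as P ≈ C · V² · f, where C is the switched capacitance, V the supply voltage and f the clock (leakage ignored):

one core at f and V:                  P_single ≈ C · V² · f
two cores at f/2 and a lower V' < V:  P_dual ≈ 2 · C · V'² · (f/2) = C · V'² · f

So for the same nominal throughput (assuming the work actually splits across both cores), the dual-core part comes out ahead whenever halving the clock lets you drop the supply voltage at all.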

At least this puts a little more pressure on the computer science guys. Now they have to optimize their software for multiple cores.
 

x0d

Junior Member
Jul 25, 2004
3
0
0
So why not VLIW with multiple execution units?
Look at DSPs like TI's C64xx, which are VLIW and can issue up to 8 instructions per cycle.
They are considered (IIRC) the highest-MIPS/watt DSPs in the industry.
Multiple on-chip execution units IS a form of parallelism.
Granted, it takes more money and time to develop, but you don't need two of everything like in dual core,
and I think it is better suited to the entertainment market.
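
To make that concrete, here's a minimal C sketch of the kind of loop a VLIW compiler thrives on (my own illustration, not TI code; the function name and unroll factor are made up). The four adds in each unrolled iteration are independent of each other, so the compiler can place them in separate slots of one wide instruction, one per execution unit:

/* Unrolled vector add: the four statements per iteration have no
   dependencies on each other, so a VLIW compiler can issue them in
   the same cycle on different ALUs. */
void vector_add(const int *a, const int *b, int *out, int n)
{
    int i;
    for (i = 0; i + 3 < n; i += 4) {
        out[i]     = a[i]     + b[i];
        out[i + 1] = a[i + 1] + b[i + 1];
        out[i + 2] = a[i + 2] + b[i + 2];
        out[i + 3] = a[i + 3] + b[i + 3];
    }
    for (; i < n; i++)   /* leftover elements */
        out[i] = a[i] + b[i];
}

No second core is needed for this kind of parallelism, but it only helps within a single instruction stream.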

 

TuxDave

Lifer
Oct 8, 2002
10,571
3
71
The lack of backwards compatibility makes VLIW a problem. That's probably why they don't want to make a chip that'll bomb in the market. But you are correct, VLIW is a sort of parallelism. Most designs are shifting in that direction, whether it's by replicating everything or just the execution units.

 

Mday

Lifer
Oct 14, 1999
18,647
1
81
Well, dual core is "free" since you don't have to completely redesign the core; you just add in some buffers and shift some things around. Not to mention more than doubling the real estate. Dual core is not a core design change as much as it is a core application/implementation change. It really is just putting 2 processors in one chip.

VLIW is a core CHANGE, period. Not to mention a CS change.

So with dual core, what you do is add in a library (DLL or drivers, whatever) to the OS so that it treats the chip as 2 processors. With a shift to VLIW, you have to change the OS kernel itself. The former requires little change, the latter a major one.
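
As a small illustration of "the OS treats it as 2 processors" (a Linux/glibc sketch of my own, not the driver-level changes described above; _SC_NPROCESSORS_ONLN is a common glibc extension):

/* Ask the OS how many logical processors it currently exposes. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long ncpus = sysconf(_SC_NPROCESSORS_ONLN);  /* a dual-core chip reports 2 */
    printf("online processors reported by the OS: %ld\n", ncpus);
    return 0;
}

Ordinary software just sees one more processor; a program doesn't have to know whether those are two sockets or two cores on one die.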
 

Titan

Golden Member
Oct 15, 1999
1,819
0
0
I liken the dual-core (multi-core) approach to building an engine, with the present goal shifted towards power consumption. It's like CPUs have been a 1-cylinder engine forever, and we have been improving performance by increasing fuel intake and driving RPMs as high as we can go. Now we have reached a limit, so to compensate we are adding more cylinders, which is a good idea because we don't have to keep the RPMs as high. But you are right that modern software does not play nicely with multithreaded environments, and traditionally multi-CPU setups are a server idea. For dual cores to fully work, there is going to have to be a fundamental shift in how software is written, and I'll be curious to see how they plan to do this and how hard the proponents will push. But in my mind, (in the future) the overall CPU utilization should rarely reach 100% on a multi-core chip; that would be like redlining an engine.
 

imported_devnull

Junior Member
May 22, 2004
9
0
0
Well, that is a good question you're posing, although I believe you are missing some parts of the picture.

So let's get down to it.

Modern CPUs are single-threaded (Intel is trying to play some games here, but it has a lot more to do!). That means there is a single thread (or sequence) of instructions that they execute. Don't confuse that with anything at the software level, where multiple threads/programs seem to execute in parallel. Down in the CPU, only one stream of instructions is executed.

Now, since modern CPUs have the luxury of abundant die space, their designers have given them several execution units, so more than one instruction may be executed simultaneously.

Let's say that an assembly program is the following:

ADD A,B and put the result in X
ADD A,C and put the result in Y
ADD D,E and put the result in Z

If the CPU has 3 ALUs (Arithmetic Logic Units), it could execute these instructions in parallel and produce the three results at the same time. As you can see, there is no interference in this code; each instruction is totally independent of the others. But this is rather ideal: if, say, the third instruction were to add D and X, there would be a problem, because X is calculated by the first instruction, so the two cannot be executed in parallel.

So although a CPU may have 3 duplicate units, it is not guaranteed that it can always use them in parallel. To exploit this Instruction Level Parallelism (ILP) there are two approaches. The first one (superscalar) is implemented in hardware: during execution the CPU examines a window of instructions and issues together those that can be executed in parallel. This is the approach in Intel's (x86), AMD's, IBM's and Sun's processors.

Another approach to exploiting ILP is to put all the work on the compiler. The compiler tries to find the instructions that can be executed in parallel and places them in one large instruction. If, for example (oversimplified), the CPU has 3 ALUs, then the instruction will have 3 fields, each containing an ALU command. That is the VLIW (Very Long Instruction Word) solution, and you can find it in DSPs, Itaniums, etc.

The thing is that a VLIW core is, again, a single-thread execution unit. Yes, parallelism is present, but only within a single stream of instructions. You don't pack instructions from different programs into a single VLIW instruction!

The main advantage of a dual-core design is that it can behave like a dual-processor system with the benefits of smaller space, lower power consumption, quicker interprocessor communication and much greater scalability!

In a server environment, where a system may be running a great number of processes/threads simultaneously, this is very, very important. Maybe a single VLIW core could be faster, but that would only be true if it ran a single process/thread.
 

imgod2u

Senior member
Sep 16, 2000
993
0
0
Dual core and VLIW are not mutually exclusive, any more than SMT and superscalar solutions are mutually exclusive. If anything, dual-core and VLIW follow the same guideline in that both seek a simpler core design and push the task of finding parallelism onto the programmer/compiler. Upcoming versions of the Itanium (Montecito and later Tukwila) will be multi-core as well as SMT-enabled (although on Montecito it may be coarse-grained multithreading or superthreading rather than the SMT style of NetBurst).
Dual-core is a solution to increase performance at minimal design cost and heat density (note: not heat, heat density). The heat and power requirements double and performance does not double, not even theoretically. With a processor clocked at 2x the clock speed, or a processor with 2x the execution bandwidth, the theoretical gain is 2x in performance in any application. With dual-core, the theoretical gain in single-threaded applications is 0, while the gain in very well balanced multithreaded apps is 2x. It is inherently less efficient from a power point of view. However, it has the advantage that you can design it without having to significantly alter the architecture (such as extending the pipeline to the point where any gain from the frequency increase is masked by a combination of higher branch penalties, longer memory latencies and, probably most critically, longer cache latencies). The other advantage is that the surface area doubles for double the power consumption, so heat dissipation per unit area remains the same.
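
One way to see both extremes at once (an Amdahl's-law style framing of my own, not something imgod2u spelled out): if a fraction p of a program's work can be split perfectly across the two cores, then

speedup on 2 cores = 1 / ((1 - p) + p / 2)

p = 0 (purely single-threaded):      speedup = 1, i.e. zero gain
p = 1 (perfectly balanced threads):  speedup = 2

Real applications fall somewhere in between, which is why the theoretical 2x rarely shows up in practice.
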
Needless to say, it is not something that would have been introduced to the desktop except out of desperation for more performance without going to extremes with methods of increasing ILP in a single thread. We saw with Prescott what happens when you do such a thing. But again, ILP and TLP do not exclude each other and should not. VLIW may or may not catch on, but that would have nothing to do with whether dual-core is successful or not.