RISC or CISC CPUs?

dakels

Platinum Member
Nov 20, 2002
2,809
2
0
Looking at the vector processing conversation reminded me of something.

I use both Mac and PC. One thing I have never learned is why Intel and other PC manufacturers use CISC-based processors and not RISC-based ones like DEC and PowerPC. The PowerPC runs at a much lower MHz, which equates to a much lower operating temperature, yet it seems to get more bang for the MHz buck. I would imagine that a 3.08 GHz RISC processor would smoke the current CISC P4?

A huge benefit too is the temperatures, as I stated. My G4 can run in very hot ambient conditions with no CPU fan, just a heat sink. The AMD machine next to it will crash and burn in that situation.

Someone fill me in? (Maybe I should read more of the vector processing thread later, too.)
 

Sohcan

Platinum Member
Oct 10, 1999
2,127
0
0
The PowerPC runs at a much lower MHz, which equates to a much lower operating temperature, yet it seems to get more bang for the MHz buck. I would imagine that a 3.08 GHz RISC processor would smoke the current CISC P4?

One cannot focus only on the instruction set style to make such comparisons; doing so ignores the internal microarchitectural implementation. The equation for performance is:

Time/program = cycles/instruction * seconds/cycle * instructions/program

Cycles/instruction (CPI, the inverse of IPC) is dictated by microarchitecture (organization) and instruction set, seconds/cycle (inverse of clock rate) by the microarchitecture and the implementation (circuit and physical design), and the instructions/program (instruction count) by the software and the instruction set.
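
To make that equation concrete, here's a quick C sketch plugging in made-up numbers (purely illustrative, not measurements of any real x86 or RISC chip): a hypothetical x86-style CPU with a higher clock rate versus a hypothetical RISC CPU with a lower CPI but about 1.2x the instruction count.

#include <stdio.h>

/* Toy illustration of time/program = cycles/instruction * seconds/cycle
 * * instructions/program. All numbers are invented for the example and do
 * not describe any real processor. */
int main(void)
{
    /* Hypothetical x86-style CPU: fewer instructions, higher CPI, 3 GHz. */
    double x86_insns = 100e6, x86_cpi = 1.2, x86_hz = 3.0e9;

    /* Hypothetical RISC CPU: ~1.2x the instructions, lower CPI, 1.25 GHz. */
    double risc_insns = 120e6, risc_cpi = 0.9, risc_hz = 1.25e9;

    double x86_time  = x86_insns  * x86_cpi  / x86_hz;   /* 0.040 s */
    double risc_time = risc_insns * risc_cpi / risc_hz;  /* 0.086 s */

    printf("x86-like:  %.1f ms\n", x86_time * 1e3);
    printf("RISC-like: %.1f ms\n", risc_time * 1e3);
    return 0;
}

In this made-up case the clock-rate advantage dominates, which is exactly why you can't call the race from the ISA column alone; change any one of the three factors and the ranking can flip.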

But the important thing to note is that the instruction set has only a second-order influence on cycles/instruction; the internal organization is far more important. The Athlon and P4 share the same basic microarchitecture as most high-performance RISC CPUs, namely dynamically scheduled superscalar, in which multiple instructions can issue each cycle out of program order. The Athlon, P3, P4, and G4 all issue/retire 3 instructions/cycle, while most server-class RISC CPUs issue and retire 4 or 5 instructions/cycle. The Athlon and P4 both decode x86 instructions into smaller RISC-like operations (typically one arithmetic operation and one memory operation) to facilitate pipelining. Thus the main effects of using the legacy x86 ISA are more engineering difficulty in decoding x86 instructions and tracking instructions and their condition codes down the pipeline, as well as some performance loss due to x86's fewer logical registers (8 vs. 32 in classic RISC architectures). The restriction in the number of registers makes x86 CPUs more dependent on the memory subsystem and can restrict some kinds of code generation by compilers.
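
To see what the register-pressure and decoding points mean in practice, here's a trivial C fragment; the behavior described in the comments is a simplified sketch of what a compiler and front end might do, not the exact output of any particular compiler or CPU.

/* A loop with many simultaneously live values: four array pointers, the
 * index, the bound, and the running sum all want to live in registers. */
long dot4(const long *a, const long *b, const long *c, const long *d, long n)
{
    long sum = 0;
    for (long i = 0; i < n; i++)
        sum += a[i] * b[i] + c[i] * d[i];
    return sum;
}

/* With 32 logical registers (classic RISC), everything fits in registers and
 * the loop body is just loads, multiplies, adds, a compare, and a branch.
 *
 * With only 8 logical registers (x86), some of the pointers or temporaries
 * may have to be spilled to the stack and reloaded each iteration, turning
 * register traffic into extra memory traffic -- the dependence on the memory
 * subsystem mentioned above.
 *
 * Separately, an x86 instruction that combines a memory access with
 * arithmetic (e.g. adding a value loaded from memory into a register) is
 * typically cracked by the Athlon/P4 front end into a load operation plus
 * an arithmetic operation before it enters the execution core. */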

I would imagine that a 3.08 GHz RISC processor would smoke the current CISC P4?
Depends on the processor. A 3 GHz Alpha EV7 would definitely be very fast compared to a 3 GHz P4, but one must look at the other differences as well. The EV7 has a far more robust microarchitecture: it has a higher issue and retire rate, a far more advanced branch predictor, and more aggressive scheduling (including the ability to issue loads and stores out of order). It also has a more advanced memory system, with larger and higher-bandwidth L1 and L2 caches and a much higher-bandwidth main memory system.

So while the Athlon and P4 generally sustain fewer instructions/cycle than many server-class RISC CPUs (though the Athlon is certainly comparable to the G4), they have the advantage of a higher clock rate. This is due to a number of reasons: there is the obvious pipeline-length factor, though the IBM POWER4 has an integer pipe that is longer than the Athlon's. Generally speaking, x86 CPUs achieve a higher clock rate due to their volume and target market. A higher-volume chip can achieve better speed bins during manufacturing. And because they are produced in such high volume, they generally move to finer manufacturing process technologies before server-class CPUs, which are often produced on older, tried-and-true manufacturing processes in order to maximize yield given their large die size. For example, the P4 is moving to a 90nm process node at the end of this year, while the Alpha EV7 is being moved to a 130nm process next year.

Thus, because the increasing number of transistors per chip has allowed x86 CPUs to adopt advanced microarchitectures, and because of their volume manufacturing, x86 CPUs have generally caught up to server-class RISC performance in many respects. Just check out the position of the Athlon and P4 in SPECint 2K and SPECfp 2K (used as cross-platform benchmarks of integer and floating-point workstation-class performance).

One thing I have never learned is why Intel and other PC manufacturers use CISC-based processors and not RISC-based ones like DEC and PowerPC.
Frankly, because this industry loves backwards compatibility. It should be noted that Intel tried twice before to get rid of x86, not including IA-64. x86 was originally a stop-gap measure put out by Intel because of the numerous delays of the iAPX 432, a so-called "super-CISC" chip that directly executed high-level object-oriented code. It was over five years late to market and performed very poorly. Intel's second attempt was the i860 in the late '80s, a RISC CPU that had some very interesting features, including a VLIW-like dual-instruction issue mode. Though it achieved success as a microcontroller and in the Paragon supercomputer, it flopped as a general-purpose desktop and workstation microprocessor.

* not speaking for Intel Corp. *
 

Sohcan

Platinum Member
Oct 10, 1999
2,127
0
0
I should probably clarify myself before I lead someone off on the wrong track. While the ISA often has only a second-order effect on performance (except, perhaps, in the case of vector and VLIW ISAs, where the ISA shapes the implementation style and performance), it can have a large effect on implementation. ISA decisions can make pipelining and out-of-order execution difficult; but, like many problems in computer architecture, they can be solved if you throw enough transistors at them.

I should also note that many of the RISC ISAs have decidedly "non-RISC" features that break the philosophy of providing primitives in the ISA, not solutions. Some were "good ideas at the time" that tried to solve contemporary problems in the ISA, only to be an annoyance later: branch delay slots and (arguably) register windows. Others were features carried over from CISC architectures: a large number of addressing modes, autoincrement addressing, condition codes, and non-orthogonal register sets. There's a reason that the Alpha ISA, developed nearly a decade after the first-generation RISC ISAs and with the intention of facilitating high-performance implementations, is much cleaner than most of the RISC ISAs.

* not speaking for Intel Corp. *
 

BFG10K

Lifer
Aug 14, 2000
22,709
3,003
126
The line between CISC and RISC has been blurred so much that it really doesn't mean very much, especially for desktop processors. x86 processors take CISC instructions and decode them into RISC-like ones before executing them. Also, the PPC isn't a true RISC processor, since it breaks a lot of the rules of RISC design.
 

Originally posted by: Sohcan
<snip>

:confused:
I have no idea what you just said, but it sounds cool. :)
 

borealiss

Senior member
Jun 23, 2000
913
0
0
dunno if anybody answered this, those posts above are novels, and i'm feeling kinda lazy :)
anyways, x86 is basically just a wrapper now, a common interface to do stuff. every high-performance cisc cpu breaks these things up into risc-like micro-ops. so essentially a cisc cpu is just a risc cpu with a heavy front end and lots of die space for decoding. so it's kinda hard to say. if you stick with any isa for long enough, i think the cpus will eventually evolve to become cisc-based unless you want to scrap the entire isa and forgo backwards compatibility. the line is kinda blurred between the two, but i'll take cisc for my needs, risc for enterprise.
 

sxr7171

Diamond Member
Jun 21, 2002
5,079
40
91
Originally posted by: Sohcan
<snip>

Excellent post, very well written; it explains a complex concept in plain language.
 

rc240sx

Member
Nov 14, 2002
27
0
0
RISC processor advantages = Runs cooler, uses less electricity, simpler processor design.
RISC disadvantage = Simpler processor design means more complicated software. Software is the heart of everything you do. Can only run real instructions (one at a time).

CISC processor advantages = Can run pseudo instructions (a collection of instructions, like DIV in assembly). Software is much easier to implement because of this.
CISC processor disadvantages = Runs hotter, more complex processor design, won't last as long in laptops.

I heard somewhere that a laptop with a CISC processor playing a DVD will last about 2 hours, but one with a RISC processor will last more like 8 hours.

CISC is the dominant architecture. All PCs have CISC processors. However, the new Intel chip is moving toward a RISC architecture.
 

rc240sx

Member
Nov 14, 2002
27
0
0
So as you can see, there are advantages and disadvantages to both. It's almost like it's your preference.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: rc240sx
RISC processor advantages = Runs cooler, uses less electricity, simpler processor design.
RISC disadvantage = Simpler processor design means more complicated software. Software is the heart of everything you do. Can only run real instructions (one at a time).

No, nothing says a RISC CPU can't be implemented superscalar. In fact, a grad student I know is doing a 4-way superscalar in-order MIPS chip, IIRC.
 

glugglug

Diamond Member
Jun 9, 2002
5,340
1
81
It used to be that RISC was more efficient back when processors were much simpler than they are today.

Today, CISC (on an external level) has the advantage, because each instruction does more; hence the code to complete any task is smaller, loads into cache more quickly, and purges fewer other items out.

At a low level, Athlons & P4s are RISC (after the x86 instructions each get decoded into multiple simpler microinstructions internally).

The disadvantages of CISC turned into advantages.

The smaller number of exposed registers in CISC instruction sets is part of what makes hyperthreading practical. Having duplicates of the 15 or so 32-bit registers in an x86 (8 general purpose, PC, stack ptr, flags, segment ptrs, etc.) is far cheaper than having duplicates of the 32 64-bit general purpose registers and various other registers in a PowerPC 601 (I think a G4 doubles this, not sure).
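
For what it's worth, here's the rough arithmetic behind that claim, taking the register counts as stated above at face value (architected integer state only; FP/vector/AltiVec registers, rename registers, and other machine state are deliberately ignored, so the totals are only illustrative):

#include <stdio.h>

/* Back-of-envelope: how much architected integer state would have to be
 * duplicated per hardware thread, using the counts from the post above. */
int main(void)
{
    int x86_bits = 15 * 32;   /* ~15 registers x 32 bits =  480 bits */
    int ppc_bits = 32 * 64;   /* 32 GPRs x 64 bits       = 2048 bits */

    printf("x86 integer state per thread: %d bits (%d bytes)\n",
           x86_bits, x86_bits / 8);
    printf("PPC integer state per thread: %d bits (%d bytes)\n",
           ppc_bits, ppc_bits / 8);
    return 0;
}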

This limited number of registers has forced compiler writers to optimize load/store logic more heavily on x86 sooner, and the higher difficulty of decoding the instructions accelerated innovations within the processor like microcode-caching. (Even RISC processors decode to a lower level microcode nowadays, but since the decoding is simpler they don't yet cache post-decode results.)

It's gotten to the point where a P4 or Athlon will outperform any PPC. My 1 GHz Athlon scores higher on Speedometer (a Mac benchmark) running in an emulator than my brother's 350 MHz G3 does running native code. A 2 GHz P4 actually outperforms a 1 GHz G4 running Mac software in emulation.

Linus Torvalds (Linux author) made a really good post on this here

thread view
 

Sohcan

Platinum Member
Oct 10, 1999
2,127
0
0
Excellent post, very well written; it explains a complex concept in plain language.
Thanks. :)



Software is much easier to implement because of this [CISC].
This is debatable...ease of software design was one of the motivating factors behind complex instructions when compilers were immature and assembly was more common. But as compilers improved in the late '70s and early '80s, it was found that complex instructions were rather useless. Compilers like orthogonal instructions (where the operation, the operands, and the addressing modes can be chosen independently), but CISC ISAs tended to have awkward, non-orthogonal rules for assembling complex instructions. As a result, a small subset (around 10 or so in x86 and VAX) of simple instructions typically made up the most commonly used instructions (one of the motivating factors behind RISC).
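
As a concrete illustration, take a mundane C loop; the characterization in the comment of what it compiles to is a general sketch, not the literal output of any specific compiler:

/* Count the positive elements of an array -- about as ordinary as
 * compiled code gets. */
int count_positive(const int *v, int n)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (v[i] > 0)
            count++;
    return count;
}

/* The loop body boils down to instructions from the simple subset of x86:
 * a load, a compare, a couple of conditional branches, and an add or
 * increment. None of the elaborate CISC instructions (string operations,
 * BCD arithmetic, and so on) appear, which is why a handful of simple
 * instructions dominates the dynamic instruction mix. */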

Originally posted by: glugglug
It used to be that RISC was more efficient back when processors were much simpler than they are today.

Today, CISC (on an external level) has the advantage, because each instruction does more; hence the code to complete any task is smaller, loads into cache more quickly, and purges fewer other items out.
This was true three decades ago, when memory was expensive and fetch bandwidth was perhaps the most limiting factor in performance. It hasn't been true in over 15 years. Frankly, Linus' similar arguments are not canon; his views are too influenced by his role in OS design, where code size is a little more important due to compulsory misses. Otherwise the ~1.2X increase in code size with most RISC ISAs is hardly critical. Given that code size is extremely important in embedded processors, and that they use RISC ISAs with modifications to keep code size down (16-bit instructions or compression), the CISC code-size advantage doesn't hold much ground.
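
To put that ~1.2X figure in perspective, here's a trivial calculation; the 64 KB instruction-cache size is an assumed example value, not a property of any particular chip:

#include <stdio.h>

/* If RISC code for the same program is ~1.2x the size of the x86 code, a
 * fixed-size instruction cache effectively covers ~1/1.2 of the footprint
 * it would cover in the x86 case. */
int main(void)
{
    double icache_kb = 64.0;        /* assumed cache size, for illustration */
    double code_size_ratio = 1.2;   /* RISC vs. x86 static code size */

    printf("Effective capacity in x86-code terms: %.1f KB\n",
           icache_kb / code_size_ratio);   /* ~53.3 KB */
    return 0;
}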

And "doing more work per instruction" isn't affecting x86 code size...the ten most-used x86 instructions (load, cond branch, compare, store, add, and, sub, reg move, call, return) are all relatively simple and on average account for 96% of used instructions (in SPECin92). The slightly smaller code size is far more affecting by the variable-length encoding.

At a low level, Athlons & P4s are RISC (after the x86 instructions each get decoded into multiple simpler microinstructions internally).
While I've blurred the distinction at times as well, it's not appropriate to just call the P4 and Athlon RISC microprocessors with a decoder slapped in front. The x86 ISA still creates implementation difficulties in the pipeline, despite the decoding, with regard to tracking and retiring instructions, flags, and condition codes.

The smaller number of exposed registers in CISC instruction sets is part of what makes hyperthreading practical. Having duplicates of the 15 or so 32-bit registers in an x86 (8 general purpose, PC, stack ptr, flags, segment ptrs, etc.) is far cheaper than having duplicates of the 32 64-bit general purpose registers and various other registers in a PowerPC 601 (I think a G4 doubles this, not sure).
Dramatically increasing the number of registers does make the implementation more difficult, but 8 vs. 32 logical registers isn't a major implementation barrier for SMT when rename register files have already surpassed 100 registers and will continue to grow as OOOE window sizes increase. Implementation techniques can make large register files practical...the EV8 was going to have 4-way SMT with 256 architected registers and 256 rename registers and still achieve 1.8 GHz on a low-volume process.

and the higher difficulty of decoding the instructions accelerated innovations within the processor like microcode-caching
The trace cache wasn't motivated by x86 -> RISC-like op decoding (I know one of the inventors ;)). The trace cache is a method of high-bandwidth instruction fetch that builds dynamic instruction traces. Its ability to decouple x86's complex decoding from the main pipeline was just an added bonus when it was adapted to the P4...in fact the first simulations were performed on SPARC and MIPS, which did not have the same motivation to decouple a complex decode process.

Even RISC processors decode to a lower level microcode nowadays
AFAIK this is limited to the POWER4, and it only cracks a few seldom-used instructions into smaller ones.

but since the decoding is simpler they don't yet cache post-decode results
Sure they do...the Alpha EV6 instruction prefetcher predecodes instructions in its instruction cache with information about the target functional unit and assists in fetch control.

Linus Torvalds (Linux author) made a really good post on this here

thread view
Frankly, while I agree with some of Linus' comments, others show he's a bit out of touch. He seems to think that x86 is successful because of its ISA; I'd say it's successful despite its ISA. He doesn't seem to acknowledge that volume manufacturing, huge design teams, endless speed-path tweaks, quicker moves to finer manufacturing processes, and intense competition have had far more of an effect on x86's high-performance implementations than any instruction set feature. Give any RISC design with a similar microarchitecture the same design and manufacturing resources as an x86 chip and it will match (and perhaps exceed) the clock rate.

* not speaking for Intel Corp. *
 

Eug

Lifer
Mar 11, 2000
24,154
1,801
126
Originally posted by: glugglug
The smaller number of exposed registers in CISC instruction sets is part of what makes hyperthreading practical. Having duplicates of the 15 or so 32-bit registers in an x86 (8 general purpose, PC, stack ptr, flags, segment ptrs, etc.) is far cheaper than having duplicates of the 32 64-bit general purpose registers and various other registers in a PowerPC 601 (I think a G4 doubles this, not sure).

It's gotten to the point where a P4 or Athlon will outperform any PPC. My 1 GHz Athlon scores higher on Speedometer (a Mac benchmark) running in an emulator than my brother's 350 MHz G3 does running native code. A 2 GHz P4 actually outperforms a 1 GHz G4 running Mac software in emulation.
Which benchmark? Anyways, a 350 MHz G3 is terribly slow, so I can believe a 1 GHz Athlon might be able to beat it. However, I don't believe a 2 GHz P4 would outperform a 1 GHz G4 running Mac software in emulation.

By the way, Power5 will be hyperthreaded.
 

Buddha Bart

Diamond Member
Oct 11, 1999
3,064
0
0
By the way, Power5 will be hyperthreaded.
Got a source? I'm not calling you a liar or anything, I just seriously have not been able to find anything on POWER5. From what I've read about the POWER4/PowerPC 970, they seem to be the last hurrah of the CISC/RISC era. Everything and the kitchen sink is thrown in there. Will POWER5 try to push that further, or go VLIW like MAJC/Crusoe/Itanium?

bart
 

Eug

Lifer
Mar 11, 2000
24,154
1,801
126
Originally posted by: Buddha Bart
By the way, Power5 will be hyperthreaded.
Got a source? I'm not calling you a liar or anything, I just seriously have not been able to find anything on POWER5. From what I've read about the POWER4/PowerPC 970, they seem to be the last hurrah of the CISC/RISC era. Everything and the kitchen sink is thrown in there. Will POWER5 try to push that further, or go VLIW like MAJC/Crusoe/Itanium?
Power5 will be the world's first multi-core multi-threaded CPU.

The dual-core chip will handle four threads simultaneously in a design that could give server makers a four-fold performance boost over systems using IBM's current Power4 processor.

Take that 4X number with a grain of salt. Not that I know anything about this sort of thing, but others think it will be more like 2X in real-life. However, that's still a nice speed boost. Don't ask me any more questions, because you're over my head. ;)