Originally posted by: CTho9305
Originally posted by: Gamingphreek
Jesus Christ, that's his first post. That is incredibly advanced!! I'm speechless (I'm referring to devnull)... I'll have to sit for a long time and look at that to interpret it. Then I'll have to ask some people to help. Wow, AWESOME post!!
Just out of curiosity, what is the fastest, most expensive, "best" processor in the world?
First, "best" and "most expensive" are not necessarily the same. Case in point: at a given performance level, you'll generally pay less for an AMD CPU than an Intel CPU. Second, there is no one "best" processor. It really depends on the task.
If your task has a 32 MB working set, it would crawl on any Opteron or Itanium, since you'd be pounding on the RAM, but it would absolutely FLY on one of these HP PA-RISC CPUs. SPECint/SPECfp don't benefit as much from the huge cache, so a different processor gets a higher rating on those benchmarks.
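To see why the working set vs. cache size matters so much, here's a toy latency model (all the numbers are made up for illustration, not taken from any real part): once the working set outgrows the cache, most accesses pay main-memory latency instead of cache-hit latency, and average access time explodes.

```python
# Toy model of average memory access time (hypothetical latencies).
CACHE_SIZE_MB = 32      # e.g. a huge cache like the high-end PA-RISC parts
HIT_LATENCY_NS = 5      # assumed cache-hit latency
MISS_LATENCY_NS = 150   # assumed DRAM latency

def avg_access_ns(working_set_mb, cache_mb=CACHE_SIZE_MB):
    """Crude model: hit rate ~ fraction of the working set that fits in cache."""
    hit_rate = min(1.0, cache_mb / working_set_mb)
    return hit_rate * HIT_LATENCY_NS + (1.0 - hit_rate) * MISS_LATENCY_NS

print(avg_access_ns(32))    # working set fits entirely -> 5.0 ns average
print(avg_access_ns(256))   # mostly misses -> 131.875 ns average
```

A real CPU's behavior is far messier (prefetching, associativity, TLBs), but the basic shape holds: a 32 MB working set on a 1 MB cache is a totally different workload than the same code on a 32 MB cache.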
Also, who would use the Alpha and Crusoe chips? What does Alpha run on (microarchitecture-wise)?
Alpha is its own architecture (as in, not x86, not PPC). Alpha is also pretty much dead; Intel killed it to avoid competition with their IA-64 products. The EV8 would have been one heck of a CPU 🙁. Alphas were used in very high-end workstations and servers.
Crusoe is a mobile part - it goes in some laptops. I think Fujitsu has some Transmeta-based laptops. The performance isn't great, but the battery life should be.
Originally posted by: imgod2u
Originally posted by: CTho9305
What happens when, two years down the line, 2 ALUs aren't enough? Do you have to recompile code to take advantage of the new, more powerful 4-ALU, 4-FPU processor? Or does the new processor have to pull the same old superscalar tricks on the old code?
Strictly speaking, for VLIW, this would be true. However, this isn't true for Itanium/IA-64. IA-64 doesn't let you specify which ALU to use; rather, it lets you specify which instructions can be done in parallel. So, in a sense, you are still issuing individual instructions (e.g. add 5, 3, 2), but you're "bundling" them into groups that are defined to be executed in parallel. This way, the back-end doesn't have to check for dependencies before it executes them, using all the resources available.
There is a fixed width for the instruction bundles, so you can only specify so much parallelism with each one. If more parallelism is available than the bundle can express, you're not maximizing your throughput (and can't, without extra superscalar-esque hardware). If there isn't enough parallelism, you're wasting memory bandwidth on no-ops.
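The tradeoff above can be sketched in a toy compile-time scheduler. This is just an illustration of the EPIC idea, not real IA-64 encoding (actual Itanium bundles are 3 slots with template and stop bits; the names, widths, and "instructions" here are all made up): independent instructions get packed into fixed-width groups, and when there isn't enough parallelism, the leftover slots are filled with no-ops.

```python
# Toy EPIC-style scheduler: pack independent "instructions" into fixed-width
# bundles at compile time. Hypothetical format, for illustration only.
BUNDLE_WIDTH = 3

# (dest, sources) tuples standing in for instructions
program = [
    ("a", []),          # a = load ...
    ("b", []),          # b = load ...
    ("c", ["a", "b"]),  # c = a + b   (depends on a and b)
    ("d", ["c"]),       # d = c * 2   (depends on c)
]

def bundle(instrs, width=BUNDLE_WIDTH):
    bundles, ready = [], set()
    remaining = list(instrs)
    while remaining:
        # pick instructions whose inputs are already computed
        group = [i for i in remaining if all(s in ready for s in i[1])][:width]
        # pad with explicit no-ops when there isn't enough parallelism
        bundles.append([i[0] for i in group] + ["nop"] * (width - len(group)))
        ready.update(i[0] for i in group)
        remaining = [i for i in remaining if i not in group]
    return bundles

print(bundle(program))
# -> [['a', 'b', 'nop'], ['c', 'nop', 'nop'], ['d', 'nop', 'nop']]
```

Notice the wasted slots once the dependency chain kicks in: that's the "not enough parallelism" case eating bandwidth. Conversely, if the code had six independent loads, a 3-wide bundle couldn't express all of that parallelism at once, which is the other horn of the dilemma.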