Multi-cycle vs pipelined processors

Rainsford · Oct 31, 2003

I've got a question for you guys who spend your days designing processors. In one of my current computer engineering classes we are talking about processor design; we started out with multi-cycle designs and are now talking about pipelined processors. As I understand it, pipelined is better because the hardware doesn't "go to waste", since an instruction only uses one part of the processor at a time. So beginning a second instruction before the first one is finished seems like a really good way to use otherwise idle parts of the processor.

But my question is this: how much does this help with real code in the real world? Pipelining seems great, except when an instruction depends on the result of the previous instruction. This prevents you from running the instructions right alongside each other (obviously). And then there is the problem of branches: if you take the "wrong" branch you have to throw out all the work that is in the pipeline, right? Branch prediction units help, but how good are they really?

The reason I'm asking is because it seems like a pipelined processor would be much more complex to actually build given the numerous problems the design introduces. In the real world, is it still faster and does it still make better use of hardware?

Thanks for any help you guys and girls can give me in understanding this better.

Matthias99 · Oct 31, 2003

Since both AMD and Intel seem to have settled on pipelined designs for their chips, there are three options:

1) Pipelined superscalar designs are faster than the alternatives.
2) Pipelined designs are significantly easier/cheaper to design/manufacture than the competitors
3) AMD and Intel are both really dumb.

Just an observation.

AFAIK, pipelining works well enough in practice to be worth the extra complexity in the chip. Sure, there will be times when there are interdependencies, or contention for various parts (like the ALU), but in theory you shouldn't ever do *worse* than a non-pipelined design at an equivalent clock speed (although a non-pipelined chip would probably run at a higher clock and/or use fewer transistors for the same functional design). I know that Intel used to make a big deal out of their "advanced" branch prediction algorithms back in the early days of the Pentium line, but they don't talk about it much anymore.

CTho9305 · Oct 31, 2003

Originally posted by: Rainsford
I've got a question for you guys who spend your days designing processors. In one of my current computer engineering classes we are talking about processor design; we started out with multi-cycle designs and are now talking about pipelined processors. As I understand it, pipelined is better because the hardware doesn't "go to waste", since an instruction only uses one part of the processor at a time. So beginning a second instruction before the first one is finished seems like a really good way to use otherwise idle parts of the processor.

But my question is this: how much does this help with real code in the real world?

A lot. Even if you had lots of pipeline stalls for conflicts, you're still going to get much closer to 1 instruction per cycle than a multi-cycle design, which spends several cycles on every instruction.

Pipelined seems great, except when the instructions depend on the result of the previous instruction. This prevents you from running the instructions right along-side each other (obviously).

Not always. Sometimes, once the result has been computed, you can forward the value straight to an earlier stage, before it reaches the "normal" writeback stage.
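For anyone following along, here is a rough sketch of how much forwarding can help. It assumes the classic 5-stage pipeline (fetch, decode, execute, memory, writeback) with a register file that writes before it reads within a cycle. The `Instr` tuple, the `count_stalls` helper, and the four-instruction program are all made up for illustration and are not from anyone's lab code.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    dest: Optional[str]        # register written, or None (e.g. a store)
    srcs: Tuple[str, ...]      # registers read
    is_load: bool = False      # loads produce their value one stage later


def count_stalls(program, forwarding: bool) -> int:
    """Count stall cycles caused by read-after-write hazards."""
    earliest_decode = {}   # register -> first cycle a reader may sit in decode
    next_decode = 0        # decode cycle of the next instruction if no stall
    stalls = 0
    for ins in program:
        decode = next_decode
        for reg in ins.srcs:
            decode = max(decode, earliest_decode.get(reg, 0))
        stalls += decode - next_decode
        if ins.dest is not None:
            if not forwarding:
                gap = 3    # reader must wait for the producer's writeback
            elif ins.is_load:
                gap = 2    # load data is only ready after the memory stage
            else:
                gap = 1    # ALU result is forwarded straight into execute
            earliest_decode[ins.dest] = decode + gap
        next_decode = decode + 1
    return stalls


# Hypothetical sequence: lw t0,0(a0); add t1,t0,t0; sub t2,t1,t0; sw t2,4(a0)
PROGRAM = [
    Instr("t0", ("a0",), is_load=True),
    Instr("t1", ("t0", "t0")),
    Instr("t2", ("t1", "t0")),
    Instr(None, ("t2", "a0")),
]

STALLS_WITHOUT_FORWARDING = count_stalls(PROGRAM, forwarding=False)
STALLS_WITH_FORWARDING = count_stalls(PROGRAM, forwarding=True)
```

Under these assumptions the dependent chain costs 6 stall cycles without forwarding and 1 with it; the one that remains is the load-use stall, which forwarding alone cannot remove.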

And then there is the problem of branches: if you take the "wrong" branch you have to throw out all the work that is in the pipeline, right? Branch prediction units help, but how good are they really?

Yes, you wasted all the cycles you spent on instructions that you shouldn't have executed. Simple branch predictors can get you upwards of 80% accuracy... modern ones give you something like 95% accuracy.
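To give a feel for what a "simple" predictor looks like, here is a minimal sketch of a saturating-counter predictor for a single branch. The `correct_predictions` helper and the loop-shaped branch history (taken 9 times, then not taken, repeated 10 times) are assumptions chosen for illustration, not measurements from any real CPU.

```python
def correct_predictions(outcomes, bits: int) -> int:
    """Simulate one n-bit saturating counter on a single branch's history."""
    top = (1 << bits) - 1          # highest counter value
    threshold = (top + 1) // 2     # counter >= threshold predicts "taken"
    counter = 0                    # start at strongly "not taken"
    correct = 0
    for taken in outcomes:
        if (counter >= threshold) == taken:
            correct += 1
        # Move toward the actual outcome, saturating at both ends.
        counter = min(top, counter + 1) if taken else max(0, counter - 1)
    return correct


# Hypothetical loop branch: taken 9 times, then falls through, 10 times over.
HISTORY = ([True] * 9 + [False]) * 10

ONE_BIT_CORRECT = correct_predictions(HISTORY, bits=1)   # last-outcome predictor
TWO_BIT_CORRECT = correct_predictions(HISTORY, bits=2)
```

On this made-up history the 1-bit version gets 80 of 100 right, because it mispredicts twice per loop (on exit and again on re-entry), while the 2-bit version gets 88 right because a single loop exit no longer flips its prediction. Real predictors keep a table of such counters indexed by branch address and often by recent history.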

The reason I'm asking is because it seems like a pipelined processor would be much more complex to actually build given the numerous problems the design introduces. In the real world, is it still faster and does it still make better use of hardware?

It's a lot faster. It's a lot more complicated, especially as the pipeline gets deeper. I don't know if I'd say it makes better use of hardware... I guess it depends on your definition of "good use".

Superscalar is the next step... multiple pipelines in parallel. It adds another set of conflicts... because an instruction in one of the pipelines could depend on one in the other pipeline, so you need the data even sooner.

AFAIK, pipelining works well enough in practice to be worth the extra complexity in the chip. Sure, there will be times when there are interdependencies, or contention for various parts (like the ALU), but in theory you shouldn't ever do *worse* than a non-pipelined design at an equivalent clock speed (although a non-pipelined chip would probably run at a higher clock and/or use fewer transistors for the same functional design). I know that Intel used to make a big deal out of their "advanced" branch prediction algorithms back in the early days of the Pentium line, but they don't talk about it much anymore.

If by non-pipelined you mean multicycle, yes you could probably clock it a little faster. However, the pipeline will be a LOT faster.
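A back-of-the-envelope comparison makes the point. This is only a sketch at equal clock speed: the instruction count, the average of 4 cycles per instruction for the multi-cycle design, and the 300 stall cycles for the pipeline are all assumed numbers, and both function names are made up.

```python
def multicycle_cycles(n_instr: int, cycles_per_instr: int) -> int:
    """Each instruction runs start to finish before the next one begins."""
    return n_instr * cycles_per_instr


def pipelined_cycles(n_instr: int, depth: int, stall_cycles: int) -> int:
    """Fill the pipe once, then finish one instruction per cycle, plus stalls."""
    if n_instr == 0:
        return 0
    return (depth - 1) + n_instr + stall_cycles


N = 1000                                                  # assumed program length
MULTI = multicycle_cycles(N, cycles_per_instr=4)          # assumed average CPI
PIPE = pipelined_cycles(N, depth=5, stall_cycles=300)     # assumed stall total
SPEEDUP = MULTI / PIPE
```

With these numbers the multi-cycle design needs 4000 cycles and the pipeline 1304, roughly a 3x speedup even with a stall on almost a third of the instructions. The multi-cycle clock would have to be about three times faster to break even.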

edit: typo.

Right now, I'm taking a computer architecture class, and the final project will be a 2-way superscalar, 5-stage MIPS processor (all in Verilog). So far, we've completed a single 5-stage pipeline, but without any branch prediction. The current lab assignment is adding branch prediction.

Lynx516 · Oct 31, 2003

To do a basic pipelined CPU isn't much harder than designing a non-pipelined one. You just need something to detect conflicts and branches. If you take the approach of stalling the pipeline whenever there is a conflict, and waiting until you know the result whenever there is a branch, it will still be significantly faster than a non-pipelined CPU, because when the pipeline stalls it just goes back to being effectively non-pipelined again.

Pipelined CPUs are not by definition overly complex; it is when you get to superscalar execution and the like that things get hairy. Then you have to actively schedule which instruction goes where, etc., and things get more and more complex for less gain.

Pipelining is a much better use of hardware, as all of the hardware has the possibility of being used at once, while a non-pipelined design can only have part of it in use at any one time.

PS Superscalar means multiple execution units not multiple pipelines. There is a difference.

rjain · Oct 31, 2003

Pipelined isn't too hard, tho. It's adding superscalar abilities to that that makes it hard, because at that point, you basically need register renaming. Allowing out-of-order execution (OOE) is what really complicates it and is basically as advanced as modern CPUs have gotten.

Lynx516 · Oct 31, 2003

Not to hijack your thread but CTho what are your pipeline stages?

CTho9305 · Oct 31, 2003

Instruction fetch, decode / operand fetch, execute, data mem, writeback. It's a MIPS processor (load-store architecture), so operands can only come from the register file. I might post a block diagram if I make one that doesn't suck.

edit: Click here, click "Lab", project 3 is the current project.

Lynx516 · Oct 31, 2003

Interesting page there. I have at the moment almost finished a basic pipelined MIPS-type CPU. Teaching oneself isn't that easy, and developing your own architecture that you think could be realistic isn't either. I am doing it in VHDL. Some areas are very, very easy; however, for some of the functions, like having to calculate memory addresses, I don't quite understand the need.

CTho9305 · Oct 31, 2003

Originally posted by: Lynx516
Interesting page there. I am a 16 year old and I have at the moment almost finished a basic pipelined MIPS-type CPU. Teaching oneself isn't that easy, and developing your own architecture that you think could be realistic isn't either. I am doing it in VHDL. Some areas are very, very easy; however, for some of the functions, like having to calculate memory addresses, I don't quite understand the need.

Hehe, I read the textbook for this class (Patterson & Hennessy, Computer Organization and Design) in high school... but didn't know Verilog and didn't know how to go about learning it or VHDL, so I couldn't really do anything with it.

What do you mean, calculating memory addresses?

rjain · Oct 31, 2003

You need to calculate memory addresses because of (indirect) offset addressing modes.

Lynx516 · Oct 31, 2003

Ah, OK, my architecture doesn't support that yet. It's simple addition and subtraction relative to a base address, am I correct? So pretty easy to implement in VHDL/Verilog.

Rainsford · Nov 2, 2003

Thanks for all the info guys, that helps a lot. Our final project in the class is making a pipelined or multi-cycle processor that does standard MIPS instructions plus some extra ones. After reading all of this, I'm pretty sure I'm going to go with pipelined; coming up with the extra stuff seems interesting anyway, even if it is more complex. Plus, faster is always good.

rjain · Nov 3, 2003

Originally posted by: Lynx516
Ah, OK, my architecture doesn't support that yet. It's simple addition and subtraction relative to a base address, am I correct? So pretty easy to implement in VHDL/Verilog.

Yep
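As a concrete illustration of that address calculation: MIPS loads and stores add a sign-extended 16-bit immediate to a base register. The sketch below models that in Python rather than VHDL/Verilog; the function names are made up, and the 32-bit wraparound is an assumption about the datapath width.

```python
def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def effective_address(base: int, imm16: int) -> int:
    """Base register plus sign-extended offset, wrapped to 32 bits."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF


ADDR_FORWARD = effective_address(0x1000, 0x0004)    # like lw $t0, 4($a0)
ADDR_BACKWARD = effective_address(0x1000, 0xFFFC)   # like lw $t0, -4($a0)
```

Here `ADDR_FORWARD` is 0x1004 and `ADDR_BACKWARD` is 0x0FFC. In hardware this is just the existing ALU adder fed with the sign-extended immediate, which is why the execute stage sits before the data memory stage.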

jhu · Nov 3, 2003

why mips?

CTho9305 · Nov 3, 2003

Originally posted by: jhu
why mips?

All instructions are fixed-length, and the instruction set is fairly small. None of the operations are particularly complicated. Translation: it's easy.
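A quick sketch of why fixed-length helps: every MIPS instruction is one 32-bit word, so the fields always sit at the same bit positions and decoding is just slicing. The `decode_rtype` helper below is made up for illustration and only handles the R-type layout.

```python
def decode_rtype(word: int) -> dict:
    """Split a 32-bit MIPS R-type instruction word into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,   # bits 31..26
        "rs":     (word >> 21) & 0x1F,   # bits 25..21, first source register
        "rt":     (word >> 16) & 0x1F,   # bits 20..16, second source register
        "rd":     (word >> 11) & 0x1F,   # bits 15..11, destination register
        "shamt":  (word >> 6) & 0x1F,    # bits 10..6, shift amount
        "funct":  word & 0x3F,           # bits 5..0, selects the ALU operation
    }


# add $t0, $t1, $t2 encodes as 0x012A4020 ($t0 = 8, $t1 = 9, $t2 = 10).
FIELDS = decode_rtype(0x012A4020)
```

Because the register numbers are in fixed positions, the register file read can start in the decode stage before the opcode has even been fully interpreted.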

jhu · Nov 3, 2003

how about power? too many instructions?

CTho9305 · Nov 3, 2003

Originally posted by: jhu
how about power? too many instructions?

What about power? Power as in heat? We haven't concerned ourselves with that - power-aware computing is a graduate course here. What do you mean by too many instructions?

jhu · Nov 4, 2003

well, i meant power as in powerpc

CTho9305 · Nov 4, 2003

Originally posted by: jhu
well, i meant power as in powerpc

Ah. I don't know the instruction set, but I doubt it is as simple as MIPS. Besides, as we aren't doing virtual memory, real programs won't run anyway.

Lynx516 · Nov 4, 2003

PowerPC is a very complex architecture. You would have to implement virtual memory, etc.; it would be horrendous to start off with.

rjain · Nov 4, 2003

Don't know if the PPC 4xx needs an MMU, but the ISA is very complex, with all kinds of funky instructions. Would be a real pain to implement as a class project. The basic MIPS ISA is designed for simplicity. Pipelining it is almost trivial.

jhu · Nov 6, 2003

do you guys have to implement an mmu also?

CTho9305 · Nov 8, 2003

No, wouldn't that practically be implementing virtual memory? (with a virtually-indexed cache)

Matthias99 · Nov 9, 2003

Virtual memory requires paging -- you can use an MMU by itself to do things like protecting areas of memory from certain processes. It also happens to be used for VM, so that the remapping is transparent to the application layer.
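A toy model of the protection-only use: the sketch below checks each access against a list of regions with permissions and does no address remapping at all. The `Region` tuple, `access_allowed`, and the memory map are hypothetical and do not describe any particular MMU.

```python
from typing import NamedTuple


class Region(NamedTuple):
    start: int       # first address in the region
    size: int        # length in bytes
    writable: bool   # reads are always allowed inside a region


def access_allowed(regions, addr: int, is_write: bool) -> bool:
    """Permit the access only if addr falls in a region that allows it."""
    for region in regions:
        if region.start <= addr < region.start + region.size:
            return region.writable or not is_write
    return False     # unmapped addresses fault


# Hypothetical map: read-only code at 0x0000, read/write data at 0x1000.
MEMORY_MAP = [
    Region(start=0x0000, size=0x1000, writable=False),
    Region(start=0x1000, size=0x1000, writable=True),
]
```

With this map a write to 0x0010 is refused, a write to 0x1800 is allowed, and any access to 0x3000 faults, all without translating a single address.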
