Quick question about pipelines

LeftSide · Feb 18, 2004

Ok, all this talk about pipelines has me confused...
When the Prescott has 31 pipeline stages, does it take 31 clock cycles to get 1 instruction through? Or is it just 1 clock cycle broken into 31 stages? I have heard both, but I think that it takes 31 clock cycles. This would explain why AMD gets so much more done with only a 15 stage pipeline.

Please feed me information, and solve the Pipeline mystery once and for all!!!

Lynx516 · Feb 18, 2004

It does take 31 clocks to get one instruction through the pipeline. Though in an ideal world a 15 stage pipeline would perform identically to a 31 stage pipeline. If you fill the pipeline you have (in Prescott's case) 31 instructions in flight at once, so one instruction is finished per clock. It is only when you cannot keep the pipeline full that performance begins to drop.

LeftSide · Feb 18, 2004

Ok, that makes sense. Now I see why the cache is so important for Intel. Thanks

sao123 · Feb 18, 2004

Branching must be predicted... this is the true nemesis of long pipelines.
Branch prediction algorithms are very difficult to make very accurate. The longer the pipeline, the longer the stall when a misprediction happens.

aka1nas · Feb 18, 2004

Or to expand a bit, the longer pipeline is only "slower" for the first 30 clock cycles while the pipeline is still partially empty. After that it starts cranking out instructions more or less as fast as the shorter pipeline, provided that it doesn't have a branch misprediction.
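
To put rough numbers on the fill cost described above, here is a minimal sketch. It assumes an idealized pipeline that issues one instruction per cycle and never stalls; the function name and the instruction counts are made up for illustration, not taken from any real CPU.

```python
def cycles_to_finish(num_instructions: int, stages: int) -> int:
    """Cycles an idealized pipeline needs to retire num_instructions.

    Assumes one instruction enters per cycle and nothing ever stalls.
    """
    if num_instructions == 0:
        return 0
    # The first instruction needs `stages` cycles to come out the far end;
    # every later instruction retires one cycle after the one before it.
    return stages + num_instructions - 1


for n in (1, 100, 1_000_000):
    short = cycles_to_finish(n, 15)
    deep = cycles_to_finish(n, 31)
    print(f"{n} instructions: 15-stage={short} cycles, 31-stage={deep} cycles")
```

The gap between the two is always 16 cycles in this model, which matters for a single instruction and is lost in the noise for a long stream.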

CTho9305 · Feb 18, 2004

Originally posted by: Lynx516
It does take 31 clocks to get one instruction through the pipeline. Though in an ideal world a 15 stage pipeline would perform identically to a 31 stage pipeline. If you fill the pipeline you have (in Prescott's case) 31 instructions in flight at once, so one instruction is finished per clock. It is only when you cannot keep the pipeline full that performance begins to drop.

No. In an ideal world, the 31 stage pipeline would be faster. Both pipelines would be retiring one instruction per cycle, but the 31 stage pipeline would be clocked about twice as fast.

The problem is that the world isn't ideal, and branches exist (something like 1 in every 5 instructions, according to various professors), so if you can predict 95% of them correctly, and a 30 stage pipeline has 6 branches in flight, you're going to be wasting a lot of time following incorrect paths.
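
A back-of-the-envelope sketch of those numbers, using the figures from this post (1 branch in 5 instructions, 95% prediction accuracy, 6 branches in flight). Two assumptions here are mine and not from the post: predictions are treated as independent, and a mispredict is charged a penalty equal to the full pipeline depth, which is on the pessimistic side. The function names are hypothetical.

```python
def p_all_correct(accuracy: float, branches_in_flight: int) -> float:
    """Chance that every branch currently in the pipeline was predicted right.

    Assumes each prediction is independent of the others.
    """
    return accuracy ** branches_in_flight


def cycles_per_instruction(stages: int,
                           branch_fraction: float = 0.2,
                           accuracy: float = 0.95) -> float:
    """Average cycles per instruction in a crude model.

    Base cost is 1 cycle per instruction; each mispredicted branch is
    assumed to throw away `stages` cycles of work (a simplification).
    """
    mispredicts_per_instruction = branch_fraction * (1.0 - accuracy)
    return 1.0 + mispredicts_per_instruction * stages


print(p_all_correct(0.95, 6))        # chance all 6 in-flight branches are right
print(cycles_per_instruction(15))    # shorter pipeline
print(cycles_per_instruction(30))    # longer pipeline
```

With 6 branches in flight, the chance that all of them were predicted correctly is only about 0.735, so roughly one time in four the pipeline is holding some wrong-path work. In this crude model the 30 stage pipeline averages about 1.3 cycles per instruction against about 1.15 for the 15 stage one, so whether it wins overall depends on how much extra clock speed the deeper pipeline buys.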

Lynx516 · Feb 19, 2004

I was using the ideal world where transistor propagation delay didn't exist.

Brucmack · Feb 19, 2004

Originally posted by: Lynx516
I was using the ideal world where transistor propagation delay didn't exist.

If everything happens in 0 seconds, then you've just simplified to the point where it doesn't matter how many stages you have anymore... At an infinite clock speed everything's going to get done pretty quickly.

Matthew Daws · Feb 20, 2004

It's not just branches which cause a problem, but also pipeline stalls. These occur when an instruction in the pipeline needs the result of an earlier instruction which is still in the pipeline and hasn't finished yet. Think about multiplying two numbers and adding a third: you need the result of the multiply before you can do the add (and many instructions alter the CPU flags, which may or may not affect how subsequent instructions behave). If the pipeline is shorter, then it takes fewer cycles to get the result out and allow the rest of the pipeline to continue. This is what Out Of Order Execution is all about: the scheduling part of the CPU tries to re-arrange the order in which instructions are put into the pipeline so that independent work fills the gaps. This is also why RISC processors have a lot of registers, to allow programmers (or compilers) to not overuse the same register (and why both the Athlon and P4 have a lot of hidden registers and on-the-fly register renaming).
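
Here is a toy sketch of the multiply-then-add case. It is only a model under stated assumptions: instructions issue in order, at most one per cycle, a result becomes usable `latency` cycles after its instruction issues, and there is no forwarding or reordering. The function, the register names and the latencies are all hypothetical.

```python
def issue_cycles(program, latency):
    """Cycle at which the last result is ready, for in-order issue.

    program: list of (dest_register, source_registers) tuples.
    latency: cycles from issue until the result can be used (assumed equal
             for every instruction, standing in for pipeline depth).
    """
    ready = {}   # register -> cycle at which its value becomes available
    cycle = 0    # earliest cycle the next instruction may issue
    finish = 0
    for dest, sources in program:
        # Stall until every source register has actually been produced.
        start = max([cycle] + [ready.get(reg, 0) for reg in sources])
        ready[dest] = start + latency
        finish = max(finish, ready[dest])
        cycle = start + 1   # in-order: the next instruction issues later
    return finish


# t = a * b ; r = t + c   (the add must wait for the multiply)
dependent = [("t", ("a", "b")), ("r", ("t", "c"))]
# t = a * b ; r = c + d   (no dependency, so the add follows right behind)
independent = [("t", ("a", "b")), ("r", ("c", "d"))]

for latency in (4, 8):
    print(latency, issue_cycles(dependent, latency), issue_cycles(independent, latency))
```

In this model, doubling the latency from 4 to 8 takes the dependent pair from 8 cycles to 16, while the independent pair only goes from 5 to 9. That is the gap a scheduler tries to close by finding independent instructions to slot in.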

Of course, as CTho9305 points out, the longer pipeline of the P4 allows it to hit much higher clock speeds, and in an ideal case (e.g. multimedia streaming using SSE2 stuff) it is a lot faster.

--Matt
