Yonah article here on Anandtech Part II


Betwon

Member
Dec 20, 2005
81
0
0
Originally posted by: RichUK
so what is the disadvantages of the POWER5, heat?

Not to mention they are not x86
The different architecture is the key. POWER5 is also wider than 3-issue, which is what current x86 CPUs are.

Conroe is 4-issue, a different architecture from K8 and every other x86 CPU, so a rule of thumb based on the number of pipes is not useful for comparing Conroe and K8.
 

coldpower27

Golden Member
Jul 18, 2004
1,676
0
76
Why does everyone keep saying Conroe has 4 more pipes? Conroe has a 14-stage pipeline. Isn't this only 2 more pipes than Athlon 64, and 1-3 pipes more than Yonah? If you're talking pipeline stages, at least.
 

Betwon

Member
Dec 20, 2005
81
0
0
Originally posted by: coldpower27
Why does everyone keep saying Conroe has 4 more pipes? Conroe has a 14-stage pipeline. Isn't this only 2 more pipes than Athlon 64, and 1-3 pipes more than Yonah? If you're talking pipeline stages, at least.
Number of pipeline stages != issue width (instructions issued per clock).

K8, Prescott and Yonah are all 3-issue, but their pipeline depths differ: 12 stages, 31 stages and 11-13 stages(?) respectively.
 

pm

Elite Member Mobile Devices
Jan 25, 2000
7,419
22
81
Originally posted by: fbrdphreak
Mostly right here. More cache != more latency. It is all in how they design/implement the cache. Dothan had 2MB L2, as did the later P4's; but the P4's L2 was much much higher latency.
I'd just add that more cache does usually imply more latency. There's more to the equation than just cache size and latency - how many read/write ports there are, how many ways, what restrictions there are on reading and writing ports, parity/ECC, how the tag compare is performed, how much power the circuit can afford to burn, how big the cache cell is (versus yield and low-voltage operation). But if everything is equal in terms of feature set, then adding more memory to the cache will slow it down either by adding more fan-in to the calculations, or by adding more loading to the cache lines slowing down the evaluate.
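The size-versus-latency trade-off described above can be put in numbers with the textbook average memory access time (AMAT) formula: hit time plus miss rate times miss penalty. The sketch below is only an illustration. The function name `amat` and every figure in it (hit latencies, miss rates, the 200-cycle memory penalty) are made-up assumptions, not measurements of Dothan, the P4 or any other CPU.

```python
def amat(hit_cycles: float, miss_rate: float, miss_penalty_cycles: float) -> float:
    """Average memory access time, in cycles, for a single cache level."""
    return hit_cycles + miss_rate * miss_penalty_cycles


# All numbers below are hypothetical, chosen only to show the trade-off.
MISS_PENALTY = 200  # assumed cycles to reach main memory

# Smaller cache: quick to hit, but misses more often.
small_fast = amat(hit_cycles=10, miss_rate=0.05, miss_penalty_cycles=MISS_PENALTY)

# Bigger cache with a slower hit: it pays off only if the miss rate
# drops far enough to cover the extra hit latency.
big_slow_wins = amat(hit_cycles=14, miss_rate=0.02, miss_penalty_cycles=MISS_PENALTY)
big_slow_loses = amat(hit_cycles=14, miss_rate=0.04, miss_penalty_cycles=MISS_PENALTY)

for label, cycles in [
    ("small, fast cache", small_fast),
    ("big, slow cache (miss rate 2%)", big_slow_wins),
    ("big, slow cache (miss rate 4%)", big_slow_loses),
]:
    print(f"{label}: {cycles:.1f} cycles on average")
```

With these assumed inputs the larger cache comes out ahead in one case and behind in the other, which is the "it depends on the implementation" point both posters are making.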
Originally posted by: coldpower27
Why does everyone keep saying Conroe has 4 more pipes? Conroe has a 14-stage pipeline. Isn't this only 2 more pipes than Athlon 64, and 1-3 pipes more than Yonah? If you're talking pipeline stages, at least.
I believe there is confusion over how many stages the pipeline is (how long the pipeline is), versus how many instructions can be issued (how wide the pipeline is).

 

coldpower27

Golden Member
Jul 18, 2004
1,676
0
76
Originally posted by: Betwon
Originally posted by: coldpower27
Why does everyone keep saying Conroe has 4 more pipes? Conroe has a 14-stage pipeline. Isn't this only 2 more pipes than Athlon 64, and 1-3 pipes more than Yonah? If you're talking pipeline stages, at least.
Number of pipeline stages != issue width (instructions issued per clock).

K8, Prescott and Yonah are all 3-issue, but their pipeline depths differ: 12 stages, 31 stages and 11-13 stages(?) respectively.


This doesn't really explain anything to me either. I know the pipeline stages of the other architectures, and I know Conroe is a 4-issue-wide architecture. How is that 4 more pipes than a 3-issue-wide architecture such as K8, Pentium M or NetBurst?
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,241
16,107
136
Originally posted by: Zebo
It's all about low power, Mark. I think in that first review system power was 100W loaded vs. 135W for the same-speed X2. Granted, that's not a huge difference like those 300W Plutonium D setups, but it's still significant. Another feature we should get is much better mobos, with HD (Azalia) audio from Intel and better IDE/ActiveArmor drivers than NF4.

The main questions are a) overclocks, b) price and c) when, to determine whether it's really any good. It's certainly not the performance I expected; clock for clock, I expected a little bit better than the X2s.

What I found most interesting about the review was the talk about Conroe, the processor Intel is really releasing to compete with AMD on the desktop. Evidently Conroe will have 4 more pipes, hurting performance more but allowing for higher clock speeds too. Next year should be fun. :)

I agree, Zebo. I just hate the usual troll comments from fatty. It's too early to make any real judgements based on price, OCability, etc., and these are laptop chips! It is a BIG step in the right direction though.
 

RichUK

Lifer
Feb 14, 2005
10,341
678
126
Originally posted by: pm
Originally posted by: fbrdphreak
Mostly right here. More cache != more latency. It is all in how they design/implement the cache. Dothan had 2MB L2, as did the later P4's; but the P4's L2 was much much higher latency.
I'd just add that more cache does usually imply more latency. There's more to the equation than just cache size and latency - how many read/write ports there are, how many ways, what restrictions there are on reading and writing ports, parity/ECC, how the tag compare is performed, how much power the circuit can afford to burn, how big the cache cell is (versus yield and low-voltage operation). But if everything is equal in terms of feature set, then adding more memory to the cache will slow it down either by adding more fan-in to the calculations, or by adding more loading to the cache lines slowing down the evaluate.
Originally posted by: coldpower27
Why does everyone keep saying Conroe has 4 more pipes? Conroe has a 14-stage pipeline. Isn't this only 2 more pipes than Athlon 64, and 1-3 pipes more than Yonah? If you're talking pipeline stages, at least.
I believe there is confusion over how many stages the pipeline is (how long the pipeline is), versus how many instructions can be issued (how wide the pipeline is).

So what does 4-issue mean? Is this to do with the width of the pipeline, and how does it relate to the length of the pipeline?

I am familiar with the basic concept of a working x86 processor, but this stuff is a little beyond me (the 4 issue stuff :confused: ).
 

Leper Messiah

Banned
Dec 13, 2004
7,973
8
0
Originally posted by: RichUK
So what does 4-issue mean? Is this to do with the width of the pipeline, and how does it relate to the length of the pipeline?

I am familiar with the basic concept of a working x86 processor, but this stuff is a little beyond me (the 4 issue stuff :confused: ).
The best way I can describe it: the number of pipeline stages is like the "speed limit" of the proc, and 3-way vs. 4-way is like the number of lanes on the highway.

But I'm not an engineer, so that could just be FUD.
 

Lithan

Platinum Member
Aug 2, 2004
2,919
0
0
AMD still beats it clock for clock in games? And it's not a margin-of-error victory either. Is it still on a 133 bus or something? If that's performance on a 200 bus, then I'm really disappointed. If it's a 133 bus, then I wanna see some results overclocked to a 200 bus.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,241
16,107
136
Originally posted by: Betwon
Yonah (1.66 GHz) is about $240.

Athlon 64 X2 (2 GHz) is about $315. If AMD sold an Athlon 64 X2 at 1.8 GHz, the price might be $270.
If AMD sold a Turion X2 at 1.6 GHz, the price might be higher than Yonah's.

So the price of Yonah (1.66 GHz) is reasonable.

You can't even buy one. Where are you getting that information ?
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,241
16,107
136
You mean might be? You can't buy one yet, and the dual-core Turion might come out by then. Wait until it's available and has a price!!!! Then we can comment.
 

Betwon

Member
Dec 20, 2005
81
0
0
Originally posted by: Markfw900
You mean might be? You can't buy one yet, and the dual-core Turion might come out by then. Wait until it's available and has a price!!!! Then we can comment.

Yonah will arrive in January 2006, a matter of days from now.

Where is the Turion X2? Maybe December 2006? And a Turion X2 (1.6 GHz) may be more expensive than $240.

Yonah (1.66 GHz) is about $240. That information comes from The Inquirer or Intel, so it is certain.
 

Hacp

Lifer
Jun 8, 2005
13,923
2
81
Originally posted by: Lithan
AMD still beats it clock for clock in games? And it's not a margin-of-error victory either. Is it still on a 133 bus or something? If that's performance on a 200 bus, then I'm really disappointed. If it's a 133 bus, then I wanna see some results overclocked to a 200 bus.

Wow, nice job, you gained like 2 frames in games at resolutions/settings that matter. Woot, that's great.
 

Betwon

Member
Dec 20, 2005
81
0
0
Originally posted by: Leper Messiah
The best way I can describe it: the number of pipeline stages is like the "speed limit" of the proc, and 3-way vs. 4-way is like the number of lanes on the highway.

But I'm not an engineer, so that could just be FUD.
Easy to understand:
K8 pipeline:
3-issue --- width
12-stage--- length
Yonah pipeline:
3-issue --- width
11/12 or 13-stage--- length
Merom pipeline:
4-issue --- width
14-stage--- length

More width may bring more ILP (instruction-level parallelism); more length may allow a higher frequency.
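The width/length distinction in the list above can be sketched with an idealized pipeline model. This is a rough illustration, not a model of any real core: the `Pipeline` class and its `ideal_cycles` method are hypothetical names, Yonah's depth is taken as 12 purely as an assumption (the thread gives 11-13), and the model assumes perfectly independent instructions with no stalls, cache misses or branch mispredictions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pipeline:
    name: str
    issue_width: int  # instructions issued per clock ("width")
    stages: int       # pipeline depth ("length")

    def ideal_cycles(self, n_instructions: int) -> int:
        """Cycles to finish n independent instructions with no stalls.

        The first group needs `stages` cycles to travel down the pipe;
        after that, one more group of `issue_width` completes every cycle.
        """
        if n_instructions <= 0:
            return 0
        groups = math.ceil(n_instructions / self.issue_width)
        return self.stages + groups - 1


# Width and depth as quoted in the thread; Yonah's 12 is an assumed midpoint.
k8 = Pipeline("K8", issue_width=3, stages=12)
yonah = Pipeline("Yonah", issue_width=3, stages=12)
merom = Pipeline("Merom", issue_width=4, stages=14)

for cpu in (k8, yonah, merom):
    cycles = cpu.ideal_cycles(1200)
    print(f"{cpu.name}: {cpu.issue_width}-issue, {cpu.stages} stages, "
          f"{cycles} cycles for 1200 independent instructions")
```

In this idealized model the depth is a one-time fill cost while the width sets the steady-state rate, so the wider pipe finishes a long run in fewer cycles despite being deeper. Real code rarely sustains the peak width, and deeper pipes pay more on every mispredicted branch, neither of which the sketch captures.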
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
4-wide issue on merom refers to the frontend -> backend width. I don't really know why they decided to hype that particular pipestage since there is a lot more to merom than that one improvement over previous p-m's, but I guess the talkers needed something to latch on to. Though I admit it is damn difficult to make 4-wide issue go fast.
 

RichUK

Lifer
Feb 14, 2005
10,341
678
126
Does the term "4-wide issue" imply that it can process, say, four floating-point operations per clock cycle, against three FP ops on the K8?

If that is correct, then that is the added parallelism.


 

Betwon

Member
Dec 20, 2005
81
0
0
The new Conroe/Merom architecture is different from both P-M and NetBurst. If you want to relate it to a previous CPU, P-M is the easier one to latch on to. Certainly not NetBurst.
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
No that would be execution bandwidth.

A digression: comparing the number of FUs (functional units, heh) between P-M and K8 to judge throughput is not too accurate. One of the reasons is that P-M uses a unified scheduler, whereas P4 and K8 (afaik) use segmented schedulers. From what I've seen, P-M's FUs are busy more of the time than those of K8 and P4.
 

Betwon

Member
Dec 20, 2005
81
0
0
Originally posted by: dmens
No that would be execution bandwidth.

A digression: comparing the number of FUs (functional units, heh) between P-M and K8 to judge throughput is not too accurate. One of the reasons is that P-M uses a unified scheduler, whereas P4 and K8 (afaik) use segmented schedulers. From what I've seen, P-M's FUs are busy more of the time than those of K8 and P4.

How do you know that? On P-M, are floating-point uops and integer uops placed into the same reservation station?

Are you sure that Pentium 3 works that way?
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
P6 and all its derivatives had unified schedulers with stack execution units.
 

Betwon

Member
Dec 20, 2005
81
0
0
Originally posted by: dmens
P6 and all its derivatives had unified schedulers with stack execution units.

The main part of the scheduler is the reservation station, which is the key to out-of-order execution.

Stack execution units are for x87 floating-point calculation.

Can you give a source?

Does "unified scheduler" really mean that FP uops and integer uops can be placed into the same reservation station?
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
Yes that's what i meant by a unified scheduler, all uops go into one structure.

I say stack because the P6 execution units are also arranged physically in a stack... no reference to the floating stack.

Sorry I don't have any details available on hand, I only know this because I'm working on a P6 derivative.
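The unified-versus-segmented idea can be illustrated with a deliberately tiny simulation. Everything in it is an assumption made for illustration and says nothing about how P6, P-M, P4 or K8 are actually built: the `simulate` function is hypothetical, there is one integer unit and one FP unit, uops have no dependencies, and both schedulers get the same total of four entries (either one shared pool, or two entries per kind).

```python
from collections import deque


def simulate(stream, capacity, issue_width=4):
    """Return the number of cycles needed to dispatch every uop in `stream`.

    stream   -- uop kinds ("int" or "fp") in program order
    capacity -- {"any": n} for one unified pool, or {"int": n, "fp": m}
                for separate per-kind (segmented) queues
    Toy assumptions: one execution unit per kind, no data dependencies.
    """
    unified = "any" in capacity
    waiting = []              # uops held in the scheduler, oldest first
    pending = deque(stream)   # uops the front end has not inserted yet
    cycles = 0
    while pending or waiting:
        cycles += 1
        # 1) Dispatch: each unit takes the oldest waiting uop of its kind.
        for kind in ("int", "fp"):
            if kind in waiting:
                waiting.remove(kind)  # list.remove drops the first match
        # 2) Insert in program order; stop at the first uop that does not fit.
        for _ in range(issue_width):
            if not pending:
                break
            kind = pending[0]
            used = len(waiting) if unified else waiting.count(kind)
            limit = capacity["any"] if unified else capacity[kind]
            if used >= limit:
                break
            waiting.append(pending.popleft())
    return cycles


# A bursty mix: runs of FP uops followed by runs of integer uops.
STREAM = (["fp"] * 4 + ["int"] * 4) * 2

segmented_cycles = simulate(STREAM, {"int": 2, "fp": 2})
unified_cycles = simulate(STREAM, {"any": 4})
print(f"segmented (2 int + 2 fp entries): {segmented_cycles} cycles")
print(f"unified (4 shared entries): {unified_cycles} cycles")
```

In this toy, a burst of FP uops fills the small FP queue and stalls the front end while the integer entries sit empty, whereas the shared pool lets the burst borrow those entries so the integer uops behind it arrive sooner. That is one plausible reading of why the functional units could stay busier with a unified scheduler.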
 

Betwon

Member
Dec 20, 2005
81
0
0
Originally posted by: dmens
Yes that's what i meant by a unified scheduler, all uops go into one structure.

I say stack because the P6 execution units are also arranged physically in a stack... no reference to the floating stack.

Sorry I don't have any details available on hand, I only know this because I'm working on a P6 derivative.
Really?

Load and store uops go to the memory order buffer, so there must be another "RS" for loads and stores.

Do you mean the reorder buffer?

I do not think "the reorder buffer holds all uops" is the same thing as a unified scheduler.

On K8, all macro-ops go into one reorder buffer.

I'm working on VTune.