Yonah article here on Anandtech Part II

Betwon · Dec 21, 2005

Originally posted by: RichUK
So what are the disadvantages of the POWER5? Heat?

Not to mention they are not x86

The different architecture is the key. POWER5 is also wider than 3-issue, which is where x86 CPUs are now.

Conroe is 4-issue, a different architecture from K8 and any other x86 CPU, so counting pipes is not a useful rule for comparing Conroe and K8.

coldpower27 · Dec 21, 2005

Why does everyone keep saying Conroe has 4 more pipes? Conroe has a 14-stage pipeline. Isn't that only 2 more stages than Athlon 64, and 1-3 more than Yonah? If you're talking pipeline stages, at least.

Betwon · Dec 21, 2005

Originally posted by: coldpower27
Why does everyone keep saying Conroe has 4 more pipes? Conroe has a 14-stage pipeline. Isn't that only 2 more stages than Athlon 64, and 1-3 more than Yonah? If you're talking pipeline stages, at least.

Number of pipeline stages != number of instructions issued per cycle.

K8/Prescott/Yonah are all 3-issue, but their pipeline depths differ: 12-stage/31-stage/11-13-stage(?).

pm · Dec 21, 2005

Originally posted by: fbrdphreak
Mostly right here. More cache != more latency. It is all in how they design/implement the cache. Dothan had 2MB L2, as did the later P4's; but the P4's L2 was much much higher latency.

I'd just add that more cache does usually imply more latency. There's more to the equation than just cache size and latency - how many read/write ports there are, how many ways, what restrictions there are on reading and writing ports, parity/ECC, how the tag compare is performed, how much power the circuit can afford to burn, how big the cache cell is (versus yield and low-voltage operation). But if everything is equal in terms of feature set, then adding more memory to the cache will slow it down either by adding more fan-in to the calculations, or by adding more loading to the cache lines slowing down the evaluate.

Originally posted by: coldpower27
Why does everyone keep saying Conroe has 4 more pipes? Conroe has a 14-stage pipeline. Isn't that only 2 more stages than Athlon 64, and 1-3 more than Yonah? If you're talking pipeline stages, at least.

I believe there is confusion over how many stages the pipeline is (how long the pipeline is), versus how many instructions can be issued (how wide the pipeline is).

coldpower27 · Dec 21, 2005

Originally posted by: Betwon

Number of pipeline stages != number of instructions issued per cycle.

K8/Prescott/Yonah are all 3-issue, but their pipeline depths differ: 12-stage/31-stage/11-13-stage(?).

This doesn't really explain anything to me either. I know the pipeline stages of the other architectures, and Conroe is a 4-issue-wide architecture. How is this 4 more pipes than a 3-issue-wide architecture such as K8, Pentium M, or NetBurst?

Markfw · Dec 21, 2005

Originally posted by: Zebo
It's all about low power, Mark. I think in that first review system power was 100W loaded vs. 135W for the same-speed X2. Granted, that's not a huge difference like those 300W Plutonium D setups, but it's still significant. Another feature we should get is much better mobos, with HD Azalia audio from Intel and better IDE/ActiveArmor drivers than NF4.

The main questions are a) overclocks, b) price, and c) when, to determine if it's really any good or not. It's certainly not the performance I expected; clock for clock, I expected a little bit better than X2s.

What I found most interesting about the review was the talk about Conroe, the processor Intel is really releasing to compete with AMD on the desktop. Evidently Conroe will have 4 more pipes, hurting performance more but allowing for higher clock speeds too. Next year should be fun.

I agree, Zebo. I just hate the usual troll comments from fatty. It's too early to make any real judgements based on price, OCability, etc., and these are laptop chips! It is a BIG step in the right direction, though.

RichUK · Dec 21, 2005

Originally posted by: pm

I'd just add that more cache does usually imply more latency. There's more to the equation than just cache size and latency - how many read/write ports there are, how many ways, what restrictions there are on reading and writing ports, parity/ECC, how the tag compare is performed, how much power the circuit can afford to burn, how big the cache cell is (versus yield and low-voltage operation). But if everything is equal in terms of feature set, then adding more memory to the cache will slow it down either by adding more fan-in to the calculations, or by adding more loading to the cache lines slowing down the evaluate.

I believe there is confusion over how many stages the pipeline is (how long the pipeline is), versus how many instructions can be issued (how wide the pipeline is).

So what does 4-issue mean? Is this to do with the width of the pipeline, and how does it relate to the length of the pipeline?

I am familiar with the basic concept of a working x86 processor, but this stuff is a little beyond me (the 4-issue stuff).

Leper Messiah · Dec 21, 2005

Originally posted by: RichUK

So what does 4-issue mean? Is this to do with the width of the pipeline, and how does it relate to the length of the pipeline?

I am familiar with the basic concept of a working x86 processor, but this stuff is a little beyond me (the 4-issue stuff).

The best way I think it can be described is that the number of pipeline stages is like the "speed limit" of the proc, and 3-way vs. 4-way is like the number of lanes on the highway.

But I'm not an engineer, so that could just be FUD.

Lithan · Dec 21, 2005

AMD still beats it clock for clock in games? And it's not a margin-of-error victory either. Is it still on a 133 bus or something? If that's performance on a 200 bus, then I'm really disappointed. If it's a 133 bus, then I wanna see some results overclocked to a 200 bus.

AkumaX · Dec 21, 2005

Originally posted by: fatty4ksu
Score one for Yonah!

Great benchies for Intel!

why my thread

Via will pwn all!!!

Markfw · Dec 21, 2005

Originally posted by: Betwon
Yonah (1.66GHz) is about $240.

Athlon64 X2 (2GHz) is about $315. If AMD sold an Athlon64 X2 (1.8GHz), the price might be $270.
If AMD sold a Turion X2 (1.6GHz), the price might be higher than Yonah's.

So the price of Yonah (1.66GHz) is reasonable.

You can't even buy one. Where are you getting that information?

fatty4ksu · Dec 21, 2005

Originally posted by: Wahsapa
yonah is the best laptop chip to date.

:thumbsup:
:thumbsup:

:thumbsup:

:thumbsup:

:thumbsup:

Markfw · Dec 21, 2005

You mean might be? You can't buy one yet, and the dual-core Turion might come out by then. Wait until it's available and has a price!!!! Then we can comment.

Betwon · Dec 21, 2005

Originally posted by: Markfw900
You mean might be? You can't buy one yet, and the dual-core Turion might come out by then. Wait until it's available and has a price!!!! Then we can comment.

Yonah will arrive in January 2006... some day soon...

Where is Turion X2? Maybe December 2006? And Turion X2 (1.6GHz) may be more expensive than $240.

Yonah (1.66GHz) is about $240 -- that information is from The Inquirer or Intel. You can be sure of that.

Hacp · Dec 21, 2005

Originally posted by: Lithan
AMD still beats it clock for clock in games? And it's not a margin-of-error victory either. Is it still on a 133 bus or something? If that's performance on a 200 bus, then I'm really disappointed. If it's a 133 bus, then I wanna see some results overclocked to a 200 bus.

Wow, nice job, you gained like 2 frames in games at resolutions/settings that matter. Woot, that's great.

Betwon · Dec 21, 2005

Originally posted by: Leper Messiah

The best way I think it can be described is that the number of pipeline stages is like the "speed limit" of the proc, and 3-way vs. 4-way is like the number of lanes on the highway.

But I'm not an engineer, so that could just be FUD.

Easy to understand:
K8 pipeline:
3-issue --- width
12-stage--- length
Yonah pipeline:
3-issue --- width
11/12 or 13-stage--- length
Merom pipeline:
4-issue --- width
14-stage--- length

More width may bring more ILP (instruction-level parallelism); more length may allow a higher frequency.
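
To put rough numbers on the width/length distinction, here is a minimal Python sketch of an idealized pipeline. It assumes fully independent instructions with no stalls, cache misses, or branch mispredictions, which no real CPU achieves, and `ideal_cycles` is a made-up helper for illustration. The width and stage figures are the ones listed above, with Yonah arbitrarily taken as 12 stages.

```python
import math


def ideal_cycles(n_instructions: int, width: int, depth: int) -> int:
    """Cycles to finish n independent instructions on an idealized pipeline.

    Assumption: no stalls, dependencies, or mispredictions of any kind.
    """
    if n_instructions == 0:
        return 0
    # Number of issue groups of `width` instructions each.
    groups = math.ceil(n_instructions / width)
    # The first group needs `depth` cycles to drain; each later group
    # finishes one cycle after the previous one.
    return depth + groups - 1


# (width, depth) pairs as listed in the post; Yonah's 12 is an assumption.
designs = {"K8": (3, 12), "Yonah": (3, 12), "Merom": (4, 14)}

for name, (width, depth) in designs.items():
    print(name, ideal_cycles(1200, width, depth))
```

For a long instruction stream the width term dominates (1200 instructions take roughly 400 cycles at 3-wide versus roughly 300 at 4-wide), while the extra stages only add a small fixed fill cost. The toy model says nothing about clock frequency, which is where pipeline length comes in.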

dmens · Dec 21, 2005

4-wide issue on merom refers to the frontend -> backend width. I don't really know why they decided to hype that particular pipestage since there is a lot more to merom than that one improvement over previous p-m's, but I guess the talkers needed something to latch on to. Though I admit it is damn difficult to make 4-wide issue go fast.

RichUK · Dec 21, 2005

Does the term "4-wide issue" imply that it is capable of processing, say, four floating-point operations per clock cycle, against, say, three FP ops on the K8s?

If that is correct, then that is the added parallelism.

Betwon · Dec 21, 2005

The new Conroe/Merom architecture is different from both P-M and NetBurst. If you want to relate it to a previous CPU, latching on to P-M may be easier. Of course... not NetBurst.

dmens · Dec 21, 2005

No that would be execution bandwidth.

A digression: Comparing the number of fu (functional unit heh) between p-m and k8 to judge throughput is not too accurate. One of the reasons is that p-m uses a unified scheduler, whereas P4 and K8 (afaik) use a segmented scheduler. From what I've seen, p-m fu's are busy more often than not compared to K8 and P4.
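
A rough way to see the difference between issue width and execution bandwidth is a bound like the one sketched below in Python. The unit counts and instruction mixes are hypothetical, not the real K8 or Merom configurations, and `sustained_per_cycle` is an illustrative helper that ignores dependencies, scheduling, and memory.

```python
def sustained_per_cycle(issue_width, units, mix):
    """Upper bound on instructions per cycle for a given instruction mix.

    issue_width: instructions the front end can hand to the back end per cycle
    units:       functional units per class, e.g. {"int": 3, "fp": 2}
    mix:         fraction of instructions in each class, summing to 1.0
    """
    bounds = [issue_width]
    for kind, fraction in mix.items():
        if fraction > 0:
            # A class with `units[kind]` units can absorb at most that many
            # instructions per cycle, so total IPC <= units / fraction.
            bounds.append(units[kind] / fraction)
    return min(bounds)


hypothetical_units = {"int": 3, "fp": 2}  # made-up unit counts

# Pure FP code: limited by the two FP units, not by the 4-wide issue.
print(sustained_per_cycle(4, hypothetical_units, {"fp": 1.0}))

# Half integer, half FP: here the 4-wide issue is the limit.
print(sustained_per_cycle(4, hypothetical_units, {"int": 0.5, "fp": 0.5}))
```

With these made-up numbers, a 4-wide machine with two FP units still tops out at 2 per cycle on pure FP code, so 4-wide issue does not by itself mean four FP ops per clock.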

Betwon · Dec 21, 2005

Originally posted by: dmens
No that would be execution bandwidth.

A digression: Comparing the number of fu (functional unit heh) between p-m and k8 to judge throughput is not too accurate. One of the reasons is that p-m uses a unified scheduler, whereas P4 and K8 (afaik) use a segmented scheduler. From what I've seen, p-m fu's are busy more often than not compared to K8 and P4.

How do you know that? For P-M, are floating-point uops and integer uops placed into the same reservation station?

Are you sure the Pentium 3 works that way?

dmens · Dec 21, 2005

P6 and all its derivatives had unified schedulers with stack execution units.

Betwon · Dec 21, 2005

Originally posted by: dmens
P6 and all its derivatives had unified schedulers with stack execution units.

The main part of the scheduler is the reservation station, which is the key to out-of-order execution.

Stack execution units are for x87 floating-point calculations.

Can you give a source?

Does "unified scheduler" really mean that FP uops and int uops can be placed into the same reservation station?

dmens · Dec 21, 2005

Yes that's what i meant by a unified scheduler, all uops go into one structure.

I say stack because the P6 execution units are also arranged physically in a stack... no reference to the floating stack.

Sorry I don't have any details available on hand, I only know this because I'm working on a P6 derivative.
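
As a toy illustration of one plausible reason a unified structure can keep units busier, the Python sketch below compares one shared pool of entries with two per-class queues of the same total size. The entry counts and the uop burst are hypothetical, and the model only counts buffering capacity; it does not model wakeup, dispatch ports, or any real P6, K8, or P4 scheduler.

```python
def accepted_unified(uops, capacity):
    """Count uops that fit when every uop shares one pool of `capacity` entries."""
    return min(len(uops), capacity)


def accepted_segmented(uops, capacity_per_class):
    """Count uops that fit when each class ("int", "fp") has its own queue."""
    free = dict(capacity_per_class)  # remaining entries per class
    accepted = 0
    for kind in uops:
        if free.get(kind, 0) > 0:
            free[kind] -= 1
            accepted += 1
    return accepted


# Hypothetical integer-heavy burst of 12 uops.
burst = ["int"] * 10 + ["fp"] * 2

print(accepted_unified(burst, 12))                      # one 12-entry pool -> 12
print(accepted_segmented(burst, {"int": 6, "fp": 6}))   # two 6-entry queues -> 8
```

In this toy case the shared pool buffers all 12 uops, while the split queues accept only 8 because the integer queue fills up and the FP queue sits mostly empty.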

Betwon · Dec 21, 2005

Originally posted by: dmens
Yes that's what i meant by a unified scheduler, all uops go into one structure.

I say stack because the P6 execution units are also arranged physically in a stack... no reference to the floating stack.

Sorry I don't have any details available on hand, I only know this because I'm working on a P6 derivative.

Really?

Load uops and store uops go to the memory order buffer. There must be another "RS" for loads and stores.

Do you mean the reorder buffer?

I do not think that a reorder buffer holding all ops == a unified scheduler.

For K8, all macro-ops go to one reorder buffer.

I'm working on VTune.
