POLL: How awesome would Prescott have been with only a 20-stage pipeline?

butch84

Golden Member
Jan 26, 2001
1,202
0
76
If Intel had done all the optimizations described in Anand's article but kept the pipeline at 20 stages, how fast would that chip be? Damn, it seems like it would be pretty schaweet, considering the new optimizations seem to almost make up for the 55% longer pipeline.

Anyway, I guess it just wasn't meant to be, 'cause such chips (too bad they don't exist) wouldn't scale as well, and IPC doesn't sell chips, clockspeed does :(

Just curious on your thoughts :)
 

Elcs

Diamond Member
Apr 27, 2002
6,278
6
81
With a 20-stage pipeline, I think Prescott would have been a foolish move to introduce. With the long pipeline, Prescott seemingly recreates the P3 vs. P4 fight: looking rather weak at similar clockspeeds, but able to be ramped up considerably. Looks like Prescott may be able to do what the P4 did.

It's early days, so I'm not putting any money on the table. Just sitting and waiting. Games are more my forte, so AMD wins the battle for me at the minute.
 

JeremiahTheGreat

Senior member
Oct 19, 2001
552
0
0
Oops... that "Hands A64 its A$$" vote was me :(

I thought it read something along the lines of "A64 would still kick its A$$". But I agree with Acanthus: without the changes, it would never be able to scale as high as it should now.
 

ntrights

Senior member
Mar 10, 2002
319
0
0
Originally posted by: Acanthus
With a 20-stage pipeline, they couldn't ramp up the clockspeed, or even match the clockspeed of the 3.4GHz Northwood.

It's all in the cache. Latency of 4 cycles on L1 and 25 cycles on L2, instead of 2 and 16, has something to do with it. Prescott's cache is clearly extremely slow compared to Northwood's, but as we have seen, the better overall architecture (0.09 µm process, prefetcher, and HT) makes up for the slow cache.

Edit: I don't think it would have been much faster with a shortened pipeline... the problem is in the latency of the cache, not the pipeline.
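To put rough numbers on it, here's a toy average-memory-access-time comparison (the hit rates and the 200-cycle trip to main memory are numbers I made up for illustration; only the L1/L2 latencies come from the review):

```python
# Toy average memory access time (AMAT), in core cycles.
def amat(l1_lat, l2_lat, mem_lat=200, l1_hit=0.90, l2_hit=0.95):
    # AMAT = L1 hit time + L1 miss rate * (L2 hit time + L2 miss rate * memory trip)
    return l1_lat + (1 - l1_hit) * (l2_lat + (1 - l2_hit) * mem_lat)

print(f"Northwood (2-cycle L1, 16-cycle L2): {amat(2, 16):.1f} cycles")
print(f"Prescott  (4-cycle L1, 25-cycle L2): {amat(4, 25):.1f} cycles")
# Doubling the cache sizes raises the hit rates, which claws some of it back:
print(f"Prescott, higher hit rates:          {amat(4, 25, l1_hit=0.93, l2_hit=0.97):.1f} cycles")
```

Even with better hit rates the average access still comes out worse, which is exactly why the prefetcher and HT have to pick up the slack.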
 

AyashiKaibutsu

Diamond Member
Jan 24, 2004
9,306
4
81
What's the point of releasing a Prescott that can just barely compete with the current chips? They should have released a 20-stage Prescott now and dropped the 31-stage version onto the table once they got to about 4GHz.
 

Corsairpro

Platinum Member
Feb 12, 2001
2,543
0
0
PEOPLE, IT'S NOT HARD TO UNDERSTAND

Northwood was running out of clockspeed headroom around 3.6GHz (similar to the P3 when it first got to around 1GHz).
They lengthened the pipeline to increase clockspeed SIGNIFICANTLY in the future (whether that takes 1 month or 6 months).
The optimizations allow the longer-pipelined CPU to perform roughly the same as the current NW.
When Prescott achieves speeds faster than NW could stably attain en masse (3.6GHz+), that's when the redesign pays off.
THUS Prescott isn't "all that" right now, but it has a much brighter future than Northwood.


Most companies are FORWARD LOOKING. Anyone with a rudimentary business understanding refers to this as the going concern assumption. Intel plans on being in business next year, in ten years, ad infinitum. Thus releasing a 20-stage CPU with the "Prescott" optimizations would be foolish. It would have doubled the amount of work and troubleshooting needed to get the CPUs ready for market release, thus doubling R&D costs.
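If you want to see the tradeoff in numbers, here's a back-of-the-envelope model (the 1%-per-instruction misprediction rate, the flush-costs-one-cycle-per-stage penalty, and the 4.5GHz figure are all my own illustrative assumptions, not Intel's):

```python
# Crude model: deeper pipeline -> higher clock ceiling, but a longer
# misprediction penalty drags IPC down. All constants are illustrative.
def perf(ghz, stages, mispredicts_per_insn=0.01):
    cpi = 1.0 + mispredicts_per_insn * stages  # assume base CPI of 1.0
    return ghz / cpi                           # ~ instructions per ns

print(f"20 stages @ 3.4GHz (NW today):      {perf(3.4, 20):.2f}")
print(f"31 stages @ 3.4GHz (Prescott now):  {perf(3.4, 31):.2f}")
print(f"31 stages @ 4.5GHz (the bet):       {perf(4.5, 31):.2f}")
```

At the same clock the deeper pipe loses; the whole gamble is the third line.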
 

butch84

Golden Member
Jan 26, 2001
1,202
0
76
That's a good point, Corsairpro, I realize Intel is forward-looking... I'm just curious how badass the hypothetical Prescott I'm suggesting would be, and I realize it wouldn't make much sense to sink all that money into a CPU that wouldn't scale well :)
 

Acanthus

Lifer
Aug 28, 2001
19,915
2
76
ostif.org
Originally posted by: Corsairpro
PEOPLE, IT'S NOT HARD TO UNDERSTAND

Northwood was running out of clockspeed headroom around 3.6GHz (similar to the P3 when it first got to around 1GHz).
They lengthened the pipeline to increase clockspeed SIGNIFICANTLY in the future (whether that takes 1 month or 6 months).
The optimizations allow the longer-pipelined CPU to perform roughly the same as the current NW.
When Prescott achieves speeds faster than NW could stably attain en masse (3.6GHz+), that's when the redesign pays off.
THUS Prescott isn't "all that" right now, but it has a much brighter future than Northwood.


Most companies are FORWARD LOOKING. Anyone with a rudimentary business understanding refers to this as the going concern assumption. Intel plans on being in business next year, in ten years, ad infinitum. Thus releasing a 20-stage CPU with the "Prescott" optimizations would be foolish. It would have doubled the amount of work and troubleshooting needed to get the CPUs ready for market release, thus doubling R&D costs.

Thank you, you explained it better than I did.
 

MadRat

Lifer
Oct 14, 1999
11,978
295
126
It's not just a longer pipeline, it's a dynamic length for the pipeline depending on the situation. It's also a large cache system, which by itself is the single best improvement to the processor. The cache system has double the latency of the older cache structure, which is contradictory to the original Willamette design, but with everything basically doubled in size, who cares!? The Northwood topped out at 3.4GHz, but the Prescott can ramp right on past 4GHz. Anyone who thinks Prescott isn't a superior DESKTOP PROCESSOR in comparison to the Northwood needs to have their head examined.

Prescott has some super secret additions that won't be unveiled until the 13th. I personally think it's some sort of IA64 compatibility, using Prescott and Tejas to bridge consumers to future IA64 use. There is no reason IA64 cannot be emulated on the Prescott, but it may be more than simple emulation. What would be even more impressive is if the IA64 code could somehow be run on the Prescott's architecture without emulation, like by bypassing the x86 decoders...
 

TAKKLE

Member
Jan 8, 2004
55
0
0
I'm thinking towards the end of the year I'll get a 4.0GHz Prescott; that's when I think it'll really start to be worth having a Prescott over a NW, unless you wanna seriously OC that 3.4 Prescott now... no one here would do that, right? ;)
 

Sahakiel

Golden Member
Oct 19, 2001
1,746
0
86
So, where's the "worse than Northwood" option?

Seriously, I doubt there would be any way for them to add some of the optimizations with only 20 stages and still produce the same clock speed. It would be more transistors and higher latency with the same pipeline depth. Add in the extra latency to the cache (which will pay off at higher speeds) and we'd be looking at a slower proc.


Originally posted by: MadRat
Prescott has some super secret additions that won't be unveiled until the 13th. I personally think it's some sort of IA64 compatibility, using Prescott and Tejas to bridge consumers to future IA64 use. There is no reason IA64 cannot be emulated on the Prescott, but it may be more than simple emulation. What would be even more impressive is if the IA64 code could somehow be run on the Prescott's architecture without emulation, like by bypassing the x86 decoders...

Technically, you can emulate any architecture in software.
Hardware is a different problem. From what little I know of the Itanium architecture, that would be very hard to do. Remember that the 20- and 31-stage pipelines don't factor in the x86 decoding; the trace cache effectively isolates the integer pipeline from the legacy decoders. It would make more sense to add extra decoding logic for IA64 instructions. However, the Itanium borrows some aspects of VLIW design and RISC philosophy (let the compiler deal with it) and incorporates an insane number of registers (something like 256 or so). You could probably get around the instruction bundling pretty easily, and maybe even the conditional branches, but it'd be damn hard trying to emulate such a high register count on the Pentium 4's architecture.
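To make the register problem concrete, here's a toy sketch (purely illustrative; the instruction format is made up and this is not how any real emulator works): the host can't hold all the guest registers, so nearly every guest register access becomes a memory touch.

```python
# Toy software emulation of a large guest register file.
GUEST_REGS = 128  # Itanium-style integer register count

class GuestCPU:
    """Guest registers live in a memory-backed array, since the host
    has nowhere near enough architectural registers to hold them."""
    def __init__(self):
        self.regs = [0] * GUEST_REGS
        self.mem_touches = 0  # each access would be a real load/store

    def read(self, r):
        self.mem_touches += 1
        return self.regs[r]

    def write(self, r, val):
        self.mem_touches += 1
        self.regs[r] = val

    def add(self, rd, ra, rb):
        # emulate a single guest instruction: rd = ra + rb
        self.write(rd, self.read(ra) + self.read(rb))

cpu = GuestCPU()
cpu.write(1, 40)
cpu.write(2, 2)
cpu.add(3, 1, 2)
print(f"result={cpu.regs[3]}, memory touches so far: {cpu.mem_touches}")
```

Three memory touches for a single guest add, before you even fetch or decode anything.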
 

MadRat

Lifer
Oct 14, 1999
11,978
295
126
Prescott has, according to one quote I read somewhere, nearly 5 million transistors that nobody has explained the purpose of yet...

Could the 256 registers and IA64 decoders be done in 5 mil transistors? Or would those 5 mil transistors more likely fit the AMD64 extensions?
 

Sahakiel

Golden Member
Oct 19, 2001
1,746
0
86
Originally posted by: MadRat
Prescott has, according to one quote I read somewhere, nearly 5 million transistors that nobody has explained the purpose of yet...

Could the 256 registers and IA64 decoders be done in 5 mil transistors? Or would those 5 mil transistors more likely fit the AMD64 extensions?

I'm more inclined to believe the extra 5 million would be x86-64 support. I have no experience in chip fabrication yet, but I have doubts about fitting 200+ extra registers onto a superscalar out-of-order processor within 5 million transistors.
On the other hand, it's also possible somebody miscalculated. The new Prescott core is around what, 125M transistors?
 

Accord99

Platinum Member
Jul 2, 2001
2,259
172
106
Originally posted by: MadRat
Prescott has, according to one quote I read somewhere, nearly 5 million transistors that nobody has explained the purpose of yet...

Could the 256 registers and IA64 decoders be done in 5 mil transistors? Or would those 5 mil transistors more likely fit the AMD64 extensions?
The number of unexplained transistors is actually between 30 and 40 million, which is almost enough to fit another Northwood core, and an IA64 core. AMD has been quoted as saying that the AMD64 extensions only add about 5% to the core size.
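That 30-40M figure is roughly what you get from a back-of-the-envelope count (assuming the commonly cited ~55M/~125M totals and a standard 6T SRAM cell; this ignores tag and decoder overhead, which would shrink the leftover):

```python
# Back-of-the-envelope transistor budget; totals are the commonly
# cited figures, the rest is rough math.
KB = 1024
def sram_transistors(kbytes, cell=6):  # standard 6-transistor SRAM cell
    return kbytes * KB * 8 * cell

delta = 125_000_000 - 55_000_000      # Prescott total minus Northwood total
extra_l2 = sram_transistors(512)      # L2 grew 512KB -> 1MB
extra_l1 = sram_transistors(8)        # L1 data grew 8KB -> 16KB
unexplained = delta - extra_l2 - extra_l1
print(f"extra L2: ~{extra_l2/1e6:.1f}M, extra L1: ~{extra_l1/1e6:.1f}M")
print(f"left over: ~{unexplained/1e6:.0f}M transistors")
```

Accounting for the bigger trace cache and the cache tag/decoder overhead knocks that leftover down into the 30-40M range.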

 

VIAN

Diamond Member
Aug 22, 2003
6,575
1
0
I chose the A64 one, but I think it would have been about 10 percent better.
 

MadRat

Lifer
Oct 14, 1999
11,978
295
126
Originally posted by: Accord99
Originally posted by: MadRat
Prescott has, according to one quote I read somewhere, nearly 5 million transistors that nobody has explained the purpose of yet...

Could the 256 registers and IA64 decoders be done in 5 mil transistors? Or would those 5 mil transistors more likely fit the AMD64 extensions?
The number of unexplained transistors is actually between 30 and 40 million, which is almost enough to fit another Northwood core, and an IA64 core. AMD has been quoted as saying that the AMD64 extensions only add about 5% to the core size.

I was referring to the number left over after they account for the second execution unit and its associated d-cache.

 

Sahakiel

Golden Member
Oct 19, 2001
1,746
0
86
Originally posted by: MadRat

I was referring to the number left over after they account for the second execution unit and its associated d-cache.

There's a second execution unit?
Hm... a quick scan of Anand's review says "Despite the lack of any new execution units..." Nope. No new execution units.

 

MadRat

Lifer
Oct 14, 1999
11,978
295
126
Here's the Prescott under the microscope: Image
Compare to the original P4 - Willamette: Link

The picture looks suspiciously like dual cores. The original poster claims it is two D-caches and two execution units. Back in 2002, AT did an article stipulating that Intel was investigating twin cores, but cores at different speeds - one fast, one slow - to keep thermal dissipation down. So if it is twin cores, don't necessarily expect two 3GHz+ cores.
 

zephyrprime

Diamond Member
Feb 18, 2001
7,512
2
81
Looking at the image, it looks to me like the thing is dual-core. That could explain the decision to go with much larger caches rather than keeping the lower-latency caches; it would be needed to help deal with the much-increased memory contention. But then again, it doesn't seem like the duplicate unit would be big enough to hold an FPU, so could it possibly be some sort of integer-only second core???
 

Chrono

Diamond Member
Jan 2, 2001
4,959
0
71
The whole P4 architecture was based on a stretched pipeline. I believe this has something to do with thermal design.
 

Accord99

Platinum Member
Jul 2, 2001
2,259
172
106
Originally posted by: zephyrprime
Looking at the image, it looks to me like the thing is dual-core. That could explain the decision to go with much larger caches rather than keeping the lower-latency caches; it would be needed to help deal with the much-increased memory contention. But then again, it doesn't seem like the duplicate unit would be big enough to hold an FPU, so could it possibly be some sort of integer-only second core???
That's in fact what it is: a second integer execution engine, including a second L1 data cache. It doesn't appear that other key components are duplicated, though. What the purpose of having two integer engines is remains a big mystery.

 

zephyrprime

Diamond Member
Feb 18, 2001
7,512
2
81
That's in fact what it is: a second integer execution engine, including a second L1 data cache. It doesn't appear that other key components are duplicated, though. What the purpose of having two integer engines is remains a big mystery.
Is this actually known for a fact? I wonder if the 2 int units provide the ability to act like a dual-core processor. I guess 2 int units would be almost like having a dual-processor computer, so long as no significant amount of floating point/MMX/SSE code is used. So the thing would be great for servers of all kinds. And for code that has a lot of floating point in it, it could still be useful so long as there is also a significant amount of integer code. (Of course, any such programs would need to be multithreaded to take advantage of the second core.)

Hmm. The P4's trace cache would make it much more suitable for a cheap partial second core than non-trace-cache designs, because the trace cache reduces the load on the instruction decoders; you could get away without an expensive second decoder and not lose too much performance.

OR,

Instead of what I was talking about above, the second int unit could maybe be used to reduce branch prediction failures. At any branch point that the predictor handles with low accuracy, the processor could simply take both paths. That way, a misprediction at that point is impossible (so long as it's a simple branch).
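A sketch of the idea (toy model; the 0.7 confidence cutoff and the 2-cycle dual-issue overhead are numbers I made up):

```python
# "Eager execution" toy model: on a low-confidence branch, issue both
# paths (say, one down each integer engine) and discard the wrong one,
# so a flush is impossible. All constants are illustrative.
PIPELINE_DEPTH = 31   # cycles lost when a mispredict flushes the pipe
EAGER_COST = 2        # assumed overhead of issuing down both paths

def branch_cost(confidence, predicted_right):
    if confidence >= 0.7:      # predictor trusted: speculate one path
        return 0 if predicted_right else PIPELINE_DEPTH
    return EAGER_COST          # both paths in flight, never a flush

print("hard branch, wrong guess:", branch_cost(0.55, False))  # -> 2
print("easy branch, wrong guess:", branch_cost(0.99, False))  # -> 31
```

You pay a small constant cost on the hard branches instead of the occasional full 31-cycle flush.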
 

Sahakiel

Golden Member
Oct 19, 2001
1,746
0
86
Or, it could just be a coincidence. Last I heard, the Prescott design has a few optimizations for hyperthreading. With that in mind, it wouldn't be too much of a stretch to simply take the two existing ALUs in the older P4 design and tweak them to run better, resulting in a die layout that screams "dual core" to most people.
Just a thought.