Well, modern computing is all about memory. The CPU spends way too much time waiting for memory.
Prescott, believe it or not, actually has quite a few improvements over Northwood in terms of "IPC". It added a dedicated shifter, it issues more micro-ops per cycle, and its cache fetching scheme is slightly improved. All in all, it wasn't just about hiking clockspeed. I think the problem is that Intel wanted it both ways and ended up getting neither. If the caching system and pipeline had been left alone, Prescott would've outperformed Northwood quite a bit at the same clockspeed. Of course, they just had to chase more clockspeed, so they hiked the L1 cache latency by 4x and the L2 cache latency by 2x to allow it, and they lengthened the pipeline as well. Now, common stigma aside, I doubt the increase in pipeline length by itself caused much of a performance hazard. Prescott's branch prediction algorithms were improved, and with modern branch prediction being ~95% accurate or better, the longer flush penalty only bites on the rare mispredict; even an extra ten or so cycles per miss averages out to well under a cycle per branch. Of course, all of this added to transistor count and therefore heat, so without the added clock scalability it was a waste.
However, in terms of performance, it's all about the memory. 4x the L1 latency means integer instructions (the ones that run on the double-pumped ALUs at *twice* the core clock) now wait 4 cycles for data instead of 1 or 2. Modern code achieves an ILP of maybe 2 if you're lucky; most of the time it sustains less than 1. So out-of-order execution simply cannot find enough independent instructions to keep the execution core fed while those loads are outstanding.
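To make that concrete, here's a rough C sketch (my own illustration, not anything measured on Prescott). The linked-list walk is one long dependent load chain, so with a 4-cycle load-to-use latency the core can at best start one load every 4 cycles no matter how wide or out-of-order it is; the plain array sum exposes independent loads the scheduler can overlap:

#include <stddef.h>

struct node { struct node *next; int payload; };

/* Fully dependent chain: each load's address comes from the previous load,
   so effective ILP is ~1 and the loop is latency-bound. */
int walk_list(const struct node *n)
{
    int sum = 0;
    while (n) {
        sum += n->payload;
        n = n->next;      /* must wait the full load latency before the next load can start */
    }
    return sum;
}

/* Independent loads: the out-of-order core can keep several in flight at once,
   so the same latency is mostly hidden. */
int sum_array(const int *a, size_t len)
{
    int sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += a[i];
    return sum;
}

Average code sits somewhere between these two, and usually closer to the first.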
The case is slightly less bad for FP processing, but the latency it has to hide is worse: all FP data is fetched from the L2 cache, and with a 20+ cycle latency, I have a hard time imagining any code that can provide enough ILP to mask that. The best scheduler in the world (which happens to be in Prescott) cannot extract that much ILP from your average code.
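As a rough illustration of what it would take (again my own sketch, not from any Prescott document): to hide a ~20-cycle latency you need on the order of latency-divided-by-throughput independent FP operations in flight, which means code deliberately written with several separate dependency chains. A single-accumulator loop is one chain and just stalls:

#include <stddef.h>

/* One dependency chain: every add waits on the previous add plus its load. */
double dot_naive(const double *a, const double *b, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* Four chains: up to four multiply-adds can overlap, hiding part of the latency. */
double dot_unrolled(const double *a, const double *b, size_t n)
{
    double acc0 = 0.0, acc1 = 0.0, acc2 = 0.0, acc3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 += a[i]     * b[i];
        acc1 += a[i + 1] * b[i + 1];
        acc2 += a[i + 2] * b[i + 2];
        acc3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; i++)   /* remainder */
        acc0 += a[i] * b[i];
    return (acc0 + acc1) + (acc2 + acc3);
}

Most real-world FP code looks like the first version, which is exactly why the scheduler has so little to work with.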
So we have reached the point where not only is main memory too slow to feed the processor, but with Prescott, the *cache* itself is too slow to feed the execution core. With only 8 GPRs (of which only 6 are usable by the programmer, IIRC) and 8 SIMD/MMX/FP registers, trips to the cache happen constantly.
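Register pressure is easy to demonstrate. In a hypothetical function like the one below (my example, nothing more), more values are live at once than x86 has architectural GPRs, so the compiler has to spill some of them to the stack; every spill and reload is a store plus a load through that 4-cycle L1, even though the code never touches an actual data structure:

/* More live values than the ~6 freely usable GPRs can hold, so some get
   spilled to the stack and reloaded through the L1 cache. */
long mix(long a, long b, long c, long d,
         long e, long f, long g, long h)
{
    long t0 = a * b + c;
    long t1 = c * d + e;
    long t2 = e * f + g;
    long t3 = g * h + a;
    /* t0..t3 plus several of the inputs are all still live here. */
    return (t0 ^ t1) + (t2 ^ t3) - (b + d + f + h);
}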
How to improve performance? Well, there are quite a few ways. For one, since the longer pipeline isn't doing anything but adding heat, take it back down to around Northwood's length. Another is to return the caching structure to Northwood's: 1- or 2-cycle L1 latency would help dramatically, and lower L2 latency would be great too. If the larger L1 data cache really is what costs so much latency, then shrink it back to 8KB. Some of Prescott's additions should be kept as well: the dedicated shifter, the wider-issue trace cache, and the improved branch prediction algorithms.
Some more experimental ideas: an added stack cache, since a good portion of memory accesses go to the stack rather than the heap, and a stack-like structure is easier to design for high speed. A stack manager would help as well. Micro-op fusion (perhaps even better than on Dothan/Banias) and a more flexible FP execution unit would also help. Currently, Netburst is only good at FP when performing packed (vector) SSE operations; another FP issue port would help much more and let scalar code execute much faster, as the sketch below illustrates.
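For what I mean by "only good at vector SSE", here's a small sketch with SSE intrinsics (mine, purely illustrative). The packed version handles four floats per issued operation, which is where Netburst's FP throughput comes from; the scalar version pushes one float at a time through the lone FP port, and that's exactly the case a second FP issue port would speed up:

#include <xmmintrin.h>  /* SSE intrinsics */
#include <stddef.h>

/* Scalar: one float per add, bottlenecked on the single FP issue port. */
void add_scalar(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Packed SSE: four floats per add.  Assumes n is a multiple of 4 and the
   pointers are 16-byte aligned, since the aligned load/store forms are used. */
void add_packed(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_load_ps(&a[i]);
        __m128 vb = _mm_load_ps(&b[i]);
        _mm_store_ps(&dst[i], _mm_add_ps(va, vb));
    }
}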
These are only some ideas; I'm sure there are many more. Sadly, the future of MPU design seems to be moving away from making more elegant and efficient cores. Instead, the vendors seem content to be lazy, slap multiple cores together, and call that a performance increase.