Prescott: What would you have done?


dmens

Platinum Member
Mar 18, 2005
2,275
965
136
Originally posted by: Slaimus
I do not understand why Intel does not handpick less leaky, but slower transistors in the new revision, since they know that they will not have to break the 4GHz barrier.

That was done on prescott as well... not that it did a lot of good.
 

clarkey01

Diamond Member
Feb 4, 2004
3,419
1
0
Originally posted by: dmens
Originally posted by: Slaimus
I do not understand why Intel does not handpick less leaky, but slower transistors in the new revision, since they know that they will not have to break the 4GHz barrier.

That was done on prescott as well... not that it did a lot of good.

I think one Intel worker left because he believed NetBurst was just a marketing tool (think his name was Bob Colwell). Ah well, he may have been right, but look who dominated the desktop anyhow.
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
Originally posted by: clarkey01
I think one Intel worker left because he believed NetBurst was just a marketing tool (think his name was Bob Colwell). Ah well, he may have been right, but look who dominated the desktop anyhow.

people who think marketing shouldn't be part of this game are kidding themselves. the netburst design philosophy started out as a serious attempt to scale to 10ghz but as reality set in the marketers tried to turn freq scaling into a selling point... which worked out pretty well.
 

clarkey01

Diamond Member
Feb 4, 2004
3,419
1
0
Originally posted by: dmens
Originally posted by: clarkey01
I think one Intel worker left because he believed NetBurst was just a marketing tool (think his name was Bob Colwell). Ah well, he may have been right, but look who dominated the desktop anyhow.

people who think marketing shouldn't be part of this game are kidding themselves. the netburst design philosophy started out as a serious attempt to scale to 10ghz but as reality set in the marketers tried to turn freq scaling into a selling point... which worked out pretty well.

Yeah it did. But do you agree that if work had continued on the P6 design for the desktop, things would have worked out better? (Tech front)


 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
Originally posted by: clarkey01
Yeah it did. But do you agree that if work had continued on the P6 design for the desktop, things would have worked out better? (Tech front)

Hindsight is 20/20. And even if recent work on the P6 was for the mobile market, the design is versatile enough to make it a desktop/server product fairly quickly.
 

clarkey01

Diamond Member
Feb 4, 2004
3,419
1
0
Originally posted by: dmens
Originally posted by: clarkey01
Yeah it did. But do you agree that if work had continued on the P6 design for the desktop, things would have worked out better? (Tech front)

Hindsight is 20/20. And even if recent work on the P6 was for the mobile market, the design is versatile enough to make it a desktop/server product fairly quickly.

If you could change the design of Prescott, what would you do? I can't say, as I don't work for Intel.
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
Originally posted by: clarkey01
If you could change the design of Prescott, what would you do? I can't say, as I don't work for Intel.

i wouldn't change a thing on the architectural design. everything we do to the core at the high architectural level is scrutinized and debated over, with countless benchmarks and ROI justification.

i could come up with some random bs about how i could make prescott faster, but i'm sure it's been thought of and discarded for a good reason. i don't know enough to make judgements even though i work in design and spent years in college studying the exact same crap. if i did know enough to pass judgement i'd be getting paid a lot more...

a couple comments on the circuit side of things... obviously power was not a big concern for the p4 projects. if i could change anything it would be to drill power discipline into all the designers... power cannot be an afterthought. then again, given the types of circuits used in p4, i'm not sure if that s*** can be tamed at all. lol. also, the prescott project was horribly mismanaged. that is the biggest problem. not the design.
 

LithographWoker

Junior Member
Apr 14, 2005
19
0
0
The branch prediction unit in Intel's NetBurst processors is built around the Branch Target Buffer (BTB), a 4K-entry structure that stores statistics about branches that have already been resolved. In other words, Intel's branch prediction is based on a probabilistic model: the CPU evaluates each branch as likely taken or not according to the collected statistics. This algorithm proved very efficient, but it is useless when there is no history for a given branch. In that case the Northwood-based CPUs statically predicted a "backward" branch as taken, on the assumption that backward branches are mostly loop branches, which are usually taken.
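
A minimal C sketch of what that static fallback means in practice (my illustration, not from Intel's documentation): the loop-closing branch below jumps backward, so with no BTB history the "backward = taken" guess is right on every iteration except the final exit.

```c
#include <stddef.h>

/* The compiler emits the loop-closing conditional branch as a
 * backward jump. With no BTB entry yet, NetBurst's static rule
 * predicts backward branches as taken, so the only misprediction
 * here is the final loop exit. */
long sum(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}
```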
 

uOpt

Golden Member
Oct 19, 2004
1,628
0
0
Originally posted by: LithographWoker
The branch prediction unit in Intel's NetBurst processors is built around the Branch Target Buffer (BTB), a 4K-entry structure that stores statistics about branches that have already been resolved. In other words, Intel's branch prediction is based on a probabilistic model: the CPU evaluates each branch as likely taken or not according to the collected statistics. This algorithm proved very efficient, but it is useless when there is no history for a given branch. In that case the Northwood-based CPUs statically predicted a "backward" branch as taken, on the assumption that backward branches are mostly loop branches, which are usually taken.

You can also tell the CPU which branch is more likely, but I am not aware of a compiler doing that on its own (because the compiler usually doesn't know either, except maybe for exceptions, where it can assume the exception won't be thrown).

However, Intel documents what will be assumed by default, so you can put the more probable code on the path the CPU assumes is more probable. This of course stops mattering once the branch has a history in the BTB.
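
For what it's worth, GCC does let the programmer supply the hint explicitly via __builtin_expect (a sketch; process_packet and its error condition are made-up names):

```c
/* __builtin_expect tells GCC which way a branch usually goes; GCC
 * then lays the unlikely path out of line, which also lines up with
 * the static "forward branch = not taken" default described above. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

int process_packet(int fd, const char *buf, int len)
{
    if (unlikely(fd < 0))   /* rare error path, predicted not-taken */
        return -1;
    /* ... common case ... */
    return len;
}
```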

That is all fine but really, for Pentium-3s, Pentium-Ms and AMDs I don't have to do any of this nonsense to get good performance. Why would I invest time into doing this optimization if the result is only required for the biggest power hog on the market which will never end up in our datacenter racks?
 

Fox5

Diamond Member
Jan 31, 2005
5,957
7
81
Originally posted by: MartinCracauer
Originally posted by: LithographWoker
The branch prediction unit in Intel's NetBurst processors is built around the Branch Target Buffer (BTB), a 4K-entry structure that stores statistics about branches that have already been resolved. In other words, Intel's branch prediction is based on a probabilistic model: the CPU evaluates each branch as likely taken or not according to the collected statistics. This algorithm proved very efficient, but it is useless when there is no history for a given branch. In that case the Northwood-based CPUs statically predicted a "backward" branch as taken, on the assumption that backward branches are mostly loop branches, which are usually taken.

You can also tell the CPU which branch is more likely, but I am not aware of a compiler doing that on its own (because the compiler usually doesn't know either, except maybe for exceptions, where it can assume the exception won't be thrown).

However, Intel documents what will be assumed by default, so you can put the more probable code on the path the CPU assumes is more probable. This of course stops mattering once the branch has a history in the BTB.

That is all fine but really, for Pentium-3s, Pentium-Ms and AMDs I don't have to do any of this nonsense to get good performance. Why would I invest time into doing this optimization if the result is only required for the biggest power hog on the market which will never end up in our datacenter racks?


I'd imagine there's quite a bit you can do to optimize for AMD CPUs; the Athlon core is theoretically a very powerful core. (But maybe the integrated memory controller of the A64 already did everything that needed to be done? Maybe only hardware optimizations, and not software, can help?)
 

uOpt

Golden Member
Oct 19, 2004
1,628
0
0
Originally posted by: Fox5

I'd imagine there's quite a bit you can do to optimize for AMD CPUs; the Athlon core is theoretically a very powerful core. (But maybe the integrated memory controller of the A64 already did everything that needed to be done? Maybe only hardware optimizations, and not software, can help?)

No question. The AMD CPUs in particular like cache prefetch hints.
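
For example, with GCC's __builtin_prefetch (a sketch; the prefetch distance of 16 elements is a made-up number you'd have to tune per CPU):

```c
#include <stddef.h>

/* Pull the cache line we'll need a few iterations from now into
 * cache while we work on the current element. __builtin_prefetch
 * never faults, so running past the end of the array is harmless. */
void scale(float *a, size_t n, float k)
{
    for (size_t i = 0; i < n; i++) {
        __builtin_prefetch(&a[i + 16], 1, 0);  /* 1 = prefetch for write */
        a[i] *= k;
    }
}
```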

But the point is that the Pentium-4 requires about twice as many tricks as everybody else, and the second half of those tricks is not even required for Intel's own Pentium-M CPU.

Some of the P4 requirements are also very hard to comply with. For example, integer multiplication, division and bit shifting are all slower on it. How am I supposed to get rid of type tags in integers? Every possible way to do that is slower on the P4. On the other hand it has huge cache lines, and the recent ones have fast memory. So if I had to follow that advice and write a compiler that produces fast Pentium-4 code for a language requiring type bits, I'd have to rewrite the whole thing to use tag bytes or words, or external type tags in the next cache line. That means rewriting much of the compiler. Not going to happen, since no other CPU requires it.
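
Roughly what I mean by type tags (a hypothetical fixnum scheme of the kind Lisp-family compilers use; the names are mine): the tag sits in the low bits, so untagging is a shift, and a tagged multiply needs one too.

```c
#include <stdint.h>

/* Hypothetical 2-bit tag: a fixnum n is stored as n << 2 (low bits 00).
 * Every untag is a shift, and shifts were slow on the P4. */
#define TAG_BITS 2

static inline intptr_t tag_fixnum(intptr_t n)   { return n << TAG_BITS; }
static inline intptr_t untag_fixnum(intptr_t v) { return v >> TAG_BITS; }

/* Tagged add needs no shift at all (the tags cancel out)... */
static inline intptr_t fix_add(intptr_t a, intptr_t b) { return a + b; }

/* ...but tagged multiply needs an untag first: (a >> 2) * b yields a
 * correctly tagged product. That shift is the cost the P4 punishes. */
static inline intptr_t fix_mul(intptr_t a, intptr_t b)
{
    return untag_fixnum(a) * b;
}
```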

Personally I think that Itanium was only a psychological trick to make compiler writers accept the Pentium-4 better, as in the noise :p
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Of course there is a reason why Prescott consumes so much power even on a 90nm process. Most people blame the 90nm process itself, but they are overlooking the fact that Prescott has 125 million transistors, and if you calculate how many are needed for the extra caches, you can see that the logic transistor count increased SIGNIFICANTLY, and logic transistors consume a lot of power. Note that the extra 1MB of L2 on the Prescott 6xx series had essentially no effect on TDP. I saw a Hot Chips presentation on Prescott, and now even the AGU is double pumped.
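
Back-of-envelope, assuming roughly six transistors per SRAM bit and ignoring tag/ECC overhead: Prescott's 1MB L2 is 8Mbit x 6, about 50 million transistors, leaving roughly 70 million of the 125 million for logic. Northwood's 512KB L2 is about 25 million of its 55 million total, leaving roughly 30 million for logic. So the logic transistor budget roughly doubled, and it's the logic that burns dynamic power, not the cache.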

By the way, as AnandTech pointed out a couple of months ago, the double-pumped ALU is one of the coolest-running parts of Prescott.