Shanghai / Istanbul vs Nehalem?


harpoon84

Golden Member
Jul 16, 2006
1,084
0
0
AMD has the smaller cores, but Intel has the higher cache density. That is why a quad-core Nehalem will have roughly the same die size as a hex-core Istanbul despite each Nehalem core being roughly twice the size.

So in reality, it's ~1.5x the cores for a given die size when factoring in L2/L3 cache.
 

Martimus

Diamond Member
Apr 24, 2007
4,490
157
106
Originally posted by: harpoon84
AMD has the smaller cores, but Intel has the higher cache density. That is why a quad-core Nehalem will have roughly the same die size as a hex-core Istanbul despite each Nehalem core being roughly twice the size.

So in reality, it's ~1.5x the cores for a given die size when factoring in L2/L3 cache.

That is true, although the Nehalem L3 cache is inclusive, so it isn't quite an apples-to-apples comparison. That may be why they have the additional 2MB of L3 cache, since roughly 2MB of it is used to duplicate the L1 and L2 caches (assuming that Nehalem uses 512KB/core of L2 cache and 64KB/core of L1 cache - but I don't remember really), and they have a higher cache density to make up for it.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: Martimus
Originally posted by: harpoon84
AMD has the smaller cores, but Intel has the higher cache density. That is why a quad-core Nehalem will have roughly the same die size as a hex-core Istanbul despite each Nehalem core being roughly twice the size.

So in reality, it's ~1.5x the cores for a given die size when factoring in L2/L3 cache.

That is true, although the Nehalem L3 cache is inclusive, so it isn't quite an apples-to-apples comparison. That may be why they have the additional 2MB of L3 cache, since roughly 2MB of it is used to duplicate the L1 and L2 caches (assuming that Nehalem uses 512KB/core of L2 cache and 64KB/core of L1 cache - but I don't remember really), and they have a higher cache density to make up for it.

The inclusive vs. exclusive cache difference between Shanghai and Nehalem is an interesting one to me.

Anyone have a compelling reason why one is believed to be superior to the other in their expected form of implementation (Shanghai vs. Nehalem)? Surely AMD believes exclusive is superior to inclusive for some good reasons, and surely Intel believes the opposite for some good reasons as well.

Anyone happen to know the reasons given by both sides? I'd really be keen to know more.
 

Martimus

Diamond Member
Apr 24, 2007
4,490
157
106
Originally posted by: Idontcare
Originally posted by: Martimus
Originally posted by: harpoon84
AMD has the smaller cores, but Intel has the higher cache density. That is why a quad-core Nehalem will have roughly the same die size as a hex-core Istanbul despite each Nehalem core being roughly twice the size.

So in reality, it's ~1.5x the cores for a given die size when factoring in L2/L3 cache.

That is true, although the Nehalem L3 cache is inclusive, so it isn't quite an apples-to-apples comparison. That may be why they have the additional 2MB of L3 cache, since roughly 2MB of it is used to duplicate the L1 and L2 caches (assuming that Nehalem uses 512KB/core of L2 cache and 64KB/core of L1 cache - but I don't remember really), and they have a higher cache density to make up for it.

The inclusive vs. exclusive cache difference between Shanghai and Nehalem is an interesting one to me.

Anyone have a compelling reason why one is believed to be superior to the other in their expected form of implementation (Shanghai vs. Nehalem)? Surely AMD believes exclusive is superior to inclusive for some good reasons, and surely Intel believes the opposite for some good reasons as well.

Anyone happen to know the reasons given by both sides? I'd really be keen to know more.

I think that inclusive is easier to do, and easier to code for (if AMD had an inclusive L3 cache, the TLB errata would not have been an issue - I think, I am a little fuzzy about that now), but it effectively reduces your cache size to the outermost level (in Nehalem's case 8MB). Technically, AMD has 8.5MB of cache available to the processor, but 2.5MB (640KB per core) of that is in L1 and L2 cache, which can only be used by each individual core. I believe that AMD would use inclusive cache just like Intel if it had a better process and could just add the cache without using up much space, but they can't if they are going to compete with Intel.

The reason I say it is easier is based on all the reading I did about the TLB errata to try to figure out what it was. Most of what I read came down to the fact that AMD used exclusive cache rather than inclusive, so they needed additional steps to ensure data integrity. There is a good AnandTech article from when the 9850 was released that explains it pretty well. I don't have time to link it right now, but it should be easy to find.

So each core on a Shanghai processor should have access to 6.625MB of cache (L1+L2+L3), while each core of a Nehalem processor should have access to just under 6.5MB of cache (L3 - (L1+L2 of the three other cores)), so they both have approximately the same cache available per core.

edit: My logic was flawed. Nehalem would be exactly 6.5MB per core, because the L1 cache is inclusive within the L2 cache as well. (Still making the assumption that the L2 cache is 512KB like Shanghai's, although it may only be 256KB.)
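To put rough numbers on that, here is a minimal Python sketch of the arithmetic above, using the cache sizes assumed in this post (the 512KB L2 per Nehalem core is the assumption flagged above; if it is really 256KB the Nehalem figure comes out larger):

CORES = 4

# Shanghai (exclusive L3): each core sees its own private L1+L2 plus the
# whole shared L3, since the L3 holds no copies of the private caches.
shanghai_private_per_core = 128 + 512            # KB: 2x64KB L1 + 512KB L2
shanghai_l3 = 6 * 1024                           # KB: 6MB shared, exclusive
shanghai_per_core = shanghai_private_per_core + shanghai_l3

# Nehalem (inclusive L3): the L3 duplicates whatever the private caches hold,
# so the other three cores' private data effectively uses up L3 space.
nehalem_l2_per_core = 512                        # KB: assumption made above
nehalem_l3 = 8 * 1024                            # KB: 8MB shared, inclusive
nehalem_per_core = nehalem_l3 - (CORES - 1) * nehalem_l2_per_core

print(f"Shanghai: {shanghai_per_core / 1024:.3f} MB effective per core")   # 6.625
print(f"Nehalem : {nehalem_per_core / 1024:.3f} MB effective per core")    # 6.500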

Also, here is the link to the AnandTech article I suggested earlier: TLB Errata Explanation
 

Cookie Monster

Diamond Member
May 7, 2005
5,161
32
86
Is the Nehalem IMC/L3 running synchronously with the core clock? According to rumours, Deneb/Shanghai still has slow L3/IMC clock speeds.
 

daw123

Platinum Member
Aug 30, 2008
2,593
0
0
Thanks guys and girls for the posts, it's interesting to read.

Has any website done a direct comparative benchmark between Shanghai and Nehalem, or is it still too early for that?
 

myocardia

Diamond Member
Jun 21, 2003
9,291
30
91
Originally posted by: Zstream
Originally posted by: Idontcare
Understand I want to see a >20% IPC improvement, but I can't force myself to blindly hope and believe it will happen based on the scant few architecture tweak details that have leaked to date. I'm not ignorant enough of chip design to be blissful about this.

Obviously you are...

Who said anything about 4MB of L3 cache? It is 6MB and it is quite a bit faster. THE LATENCY IS WHAT KILLS L3 ON THE CHIP.

Anyway, read up on server applications and farms and then come back.

So the noob AMD fanboy FUD-spreader is trying to tell the EE, who has experience designing/taping out CPUs, that he doesn't know anything about CPUs? :laugh: That's ****ing classic! Oh yeah, and the EE happens to have a PhD, no less! ****ing classic, to say the least.

My guess is that Shanghai will provide a 5-10% performance boost in the server segment, e.g. MySQL and the like. Then again, if they've figured out how to run the L3$ at full speed, I can see it being somewhat higher.
 

myocardia

Diamond Member
Jun 21, 2003
9,291
30
91
Originally posted by: daw123
Has any website done a direct comparative benchmark between Shanghai and Nehalem, or is it still too early for that?

How could there be a comparison of two CPUs that haven't been released yet? :confused:
 

daw123

Platinum Member
Aug 30, 2008
2,593
0
0
Sorry, I was being a bit of a pillock in my previous post. Ignore my late night ramblings.

 

OCGuy

Lifer
Jul 12, 2000
27,224
37
91
Originally posted by: hooflung
Originally posted by: Ocguy31
Originally posted by: BLaber
Originally posted by: Ocguy31
Originally posted by: daw123
Hello guys and gals,

I recently registered with AMDZone for the hell of it and I was wondering how Shanghai (or even Istanbul) compares to the i7 based on the current information available / released by AMD and Intel.

I realise that with none of these chips being released on the open market yet, benchmarks and information may not be accurate or could be misleading.

btw the people on AMDZone are AMD fanatics and really hate Intel (or "spIntel" as they call them). Is this just me, or have others had the same impression? It makes me not believe the information posted on that forum because it is generally so biased toward AMD.

This topic is purely for my own curiosity.

Thanks for reading.


They are diehard fanbois that just love to pay for inferior products. I would take everything they say with a grain of salt, just like if you signed up for "IntelLand" or some Intel fansite.

Just like that .... I thought the AnandTech forums are unofficially known as "INTELZONE", just like we have AMDZONE for AMD fanbois ...

No, it's full of people who just want the best. Unless you have blinders on, right now Intel is the hands-down winner for enthusiasts. Can you run everything fine on a Phenom? Of course. But you aren't taking a $130 dual-core and cranking it to 4.0GHz anytime soon.

Go visit the video card section. This site is full of AMD fans that are just hoping and praying that they make a move in the CPU market. They are grudgingly using Intel for the time being because it is the smart move.



....your assertion that the Intel C2D is a superior product is flawed. It is not, especially when considering platforms. Right now the P35 and P45 and the X-3/4 series are nothing special outside of the overclocking.

Overclocking is a value-added feature of the chip, but it is not driving the market or their stock. Intel's ability to pump them out in amazing quantity is. Not only that, clock for clock AMD Phenoms are very, very competitive. And if you want to dive into overclocking, then the Phenoms are now seriously giving Intel problems. Too little, too late? Maybe, but they are very competitive on price and performance for enthusiasts, and AMD platforms are much more comprehensive.


Putting the uncalled-for personal attack on me aside.....

I'm sorry, but I can't take you seriously after that statement.
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
Originally posted by: hooflung
They are good chips and in games they are very good products. However, in server environments such as virtualization and databases they are not superior. Since this isn't the 'PC Gaming' forum or the 'Video Card' forum, your assertion that the Intel C2D is a superior product is flawed. It is not, especially when considering platforms. Right now the P35 and P45 and the X-3/4 series are nothing special outside of the overclocking.

is this a database or virtualization forum then? or maybe a high traffic floating point computation forum. LOL. no one is claiming the C2D is superior in every aspect. just the shit that a lot of people care about.

speaking of which, can't wait for nehalem reviews to come out. then you'll have to cherry pick even more selectively.

Overclocking is a value-added feature of the chip, but it is not driving the market or their stock. Intel's ability to pump them out in amazing quantity is. Not only that, clock for clock AMD Phenoms are very, very competitive. And if you want to dive into overclocking, then the Phenoms are now seriously giving Intel problems. Too little, too late? Maybe, but they are very competitive on price and performance for enthusiasts, and AMD platforms are much more comprehensive.

nobody with a brain gives a shit about clock for clock comparisons. you can compare raw performance, or performance per watt, but deliberately equalizing the clock is garbage because it throws out the clock rate design axis. the only so-called clock-to-clock comparison that is remotely fair is to reduce the voltage of the parts to the bare minimum for the equalized clock and then compare performance/power.

and last i checked, phenoms don't overclock anywhere near a 65nm C2D, never mind a 45nm one. what makes you think a 45nm phenom will make any giant leaps? and you still have not seen nehalem overclocking.

amd platforms are comprehensive how? they have better integrated video or something... yeah, that's really something an enthusiast looks for.

 

daw123

Platinum Member
Aug 30, 2008
2,593
0
0
Back to the land of sanity after venturing into the other forum (not mentioning any names :)).

I agree with dmens - you have to compare overall performance. Clock for clock doesn't mean anything; I remember when Intel went to the Pentium M processors for mobile (laptop) technology, with significantly reduced core speeds compared to the P4 laptop equivalents. That gave me a headache trying to buy a laptop on my dad's behalf and trying to figure out performance equivalents when the technology (and speeds) differed so much.

I hope I have grasped the ethos of your post, and if not, I've done you a great disservice.

 

Keysplayr

Elite Member
Jan 16, 2003
21,219
55
91
Keep in mind, folks, that Nehalem isn't going to be all that much faster (if at all) than C2D at desktop apps and gaming (unless you're using apps that utilize more than 4 cores, of course).
Nehalem is more geared toward server platforms. Not saying they won't be great desktop CPUs and great overclockers, but they won't shine over C2D like some think they will.
I think Shanghai may have a shadow of a chance to close some of the gap. They had better use this "lull" wisely.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: keysplayr2003
Keep in mind, folks, that Nehalem isn't going to be all that much faster (if at all) than C2D at desktop apps and gaming (unless you're using apps that utilize more than 4 cores, of course).

Goes to show how optimized the Penryn cores are.

Given the stated policy that an architecture improvement was not incorporated into Nehalem unless it would improve performance by a greater percentage than it increased power consumption... I take the fact that single-thread performance wasn't markedly improved as an indication that the architecture improvements left to be made (over those present in Penryn) to increase single-thread IPC on Nehalem all require a higher percentage increase in power consumption than the percentage IPC increase they would deliver.

All the low-hanging fruit for boosting performance with a commensurate increase in power consumption relies on extracting parallelism, not single-thread performance. (Apparently - naturally I have no way of knowing if any of this is true or fact.)
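As a trivial illustration of that selection rule as stated above - a hedged Python sketch where the candidate features and all the percentages are invented for the example, not anything Intel has published:

def worth_adding(perf_gain_pct, power_cost_pct):
    """Accept a candidate tweak only if it buys more performance (%) than it costs in power (%)."""
    return perf_gain_pct > power_cost_pct

candidates = {
    "hypothetical single-thread tweak A": (3.0, 5.0),   # invented numbers
    "hypothetical single-thread tweak B": (2.0, 4.0),
    "SMT on a many-threaded workload":    (20.0, 10.0),
}

for name, (gain, cost) in candidates.items():
    verdict = "keep" if worth_adding(gain, cost) else "drop"
    print(f"{name}: +{gain}% perf for +{cost}% power -> {verdict}")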
 

bryanW1995

Lifer
May 22, 2007
11,144
32
91
Conroe > Phenom
Penryn > Conroe
Nehalem >> Penryn

Shanghai had better show some Chinese-style improvement if it hopes to overcome the incredible ass-kicking that Intel has been inflicting upon AMD for the past couple of years. Expect more of the same at least until 2010.
 

Dadofamunky

Platinum Member
Jan 4, 2005
2,184
0
0
Originally posted by: Idontcare
Originally posted by: BLaber
Just like that .... I thought the AnandTech forums are unofficially known as "INTELZONE", just like we have AMDZONE for AMD fanbois ...

For someone with a join date of 6/23/2008, you are likely to have little to no memory of just how disloyal the majority of members here were when the A64 came out, and again when Northwood came out, and again when Conroe came out.

It's just how we are, a bunch of disloyal opportunists looking out for numero uno when it comes to the rigs we use.

Without AMD and their Athlons I would not have been able to pursue my PhD (chemical physics with an emphasis on computational chemistry), as Intel just didn't have the FP horsepower/$ my research projects required at the time to complete them in a reasonable timeline.

And yet today my basement has five Intel quad-core systems running foreign currency exchange simulations to maximize the rate of coin going into my pocket from my business.

Don't get me wrong, I am a loyal fanboi...of my family and whatever is in their best financial interest.

+1
 

Lonyo

Lifer
Aug 10, 2002
21,938
6
81
Originally posted by: Idontcare
Originally posted by: Martimus
Originally posted by: harpoon84
AMD has the smaller cores, but Intel has the higher cache density. That is why a quad-core Nehalem will have roughly the same die size as a hex-core Istanbul despite each Nehalem core being roughly twice the size.

So in reality, it's ~1.5x the cores for a given die size when factoring in L2/L3 cache.

That is true, although the Nehalem L3 cache is inclusive, so it isn't quite an apples-to-apples comparison. That may be why they have the additional 2MB of L3 cache, since roughly 2MB of it is used to duplicate the L1 and L2 caches (assuming that Nehalem uses 512KB/core of L2 cache and 64KB/core of L1 cache - but I don't remember really), and they have a higher cache density to make up for it.

The inclusive vs. exclusive cache difference between Shanghai and Nehalem is an interesting one to me.

Anyone have a compelling reason why one is believed to be superior to the other in their expected form of implementation (Shanghai vs. Nehalem)? Surely AMD believes exclusive is superior to inclusive for some good reasons, and surely Intel believes the opposite for some good reasons as well.

Anyone happen to know the reasons given by both sides? I'd really be keen to know more.

http://www.anandtech.com/cpuch...howdoc.aspx?i=3382&p=9
The L3 cache is shared by all cores and in the initial Core i7 processors will be 8MB large, although its size will vary depending on the number of cores. Multi-threaded applications that are being worked on by all cores will enjoy the large, shared L3 cache.

Intel defended its reasoning for using an inclusive cache architecture with Nehalem, something it has always done in the past. Nehalem's L3 cache is inclusive in that it contains all data stored in the L1 and L2 caches as well. The benefit is that if the CPU looks for data in L3 and doesn't find it, it knows that the data doesn't exist in any core's L1 or L2 caches - thereby saving core snoop traffic, which not only improves performance but reduces power consumption as well.

An inclusive cache also prevents core snoop traffic from getting out of hand as you increase the number of cores, something that Nehalem has to worry about given its aspirations of extending beyond 4 cores.
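(A toy Python sketch of the point in that quote - invented code, not Intel's actual protocol - just to show why an L3 miss can skip the per-core snoops when the L3 is inclusive:)

class InclusiveL3:
    def __init__(self):
        self.lines = set()   # by construction, a superset of every core's L1/L2 contents

    def lookup(self, addr, cores):
        if addr in self.lines:
            return "hit in L3"
        # Inclusion guarantees no private cache holds the line either.
        return f"miss in L3 -> go to memory, 0 of {len(cores)} cores snooped"

class ExclusiveL3:
    def __init__(self):
        self.lines = set()   # holds victims only; says nothing about L1/L2 contents

    def lookup(self, addr, cores):
        if addr in self.lines:
            return "hit in L3"
        # Without extra filtering, every core has to be probed on an L3 miss.
        return f"miss in L3 -> snoop all {len(cores)} cores before going to memory"

cores = ["core0", "core1", "core2", "core3"]
print("inclusive:", InclusiveL3().lookup(0x1000, cores))
print("exclusive:", ExclusiveL3().lookup(0x1000, cores))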


http://www.anandtech.com/cpuch...howdoc.aspx?i=2939&p=9
AMD
With four cores sharing a single die, AMD didn't want to complicate its design by introducing a large unified L2 cache. Instead, it took the K8 cache hierarchy and added a third level of cache to the mix - shared among all four cores. At 65nm, a quad-core Barcelona will have a 2MB L3 cache that is shared by all four cores.

The hierarchy in Barcelona works like this: the L2 caches are filled with victims from the L1 cache. When a cache gets full, data that was not recently used is evicted to make room for new data that the cache controller determines is good to keep in the cache. In a victim cache structure, the evicted data is placed in a storage area known as a victim cache instead of being removed from the cache altogether. If the data should become useful again, the cache controller simply has to fetch it from the victim cache rather than from much slower main memory; victims from Barcelona's L1 are kicked out to the L2 cache.

The new L3 cache acts as a victim cache for the L2. So when the small L2 cache fills up, evicted data is sent to the larger L3 cache, where it is kept until space is needed. The algorithms that govern the L3 cache's operation are designed to accommodate data that is likely to be needed by multiple cores. If the CPU fetches a bit of code, a copy is left in the L3 cache, since the code is likely to be shared among the four cores.
Pure data load requests, however, go through a separate process. The cache controller looks at history, and if the data has been shared before, a copy will be left in the L3 cache; otherwise it will be invalidated.
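(Again just a toy Python sketch of the victim-cache flow described in the quote, not AMD's actual controller logic; the replacement policy here is a plain LRU stand-in:)

from collections import OrderedDict

class VictimL3:
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()           # LRU order: oldest victim first

    def insert_victim(self, addr):
        """Called when a core's L2 evicts a line."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)   # drop the least recently used victim
        self.lines[addr] = True

    def fetch(self, addr):
        """On an L2 miss: a hit here avoids a trip to main memory."""
        if addr in self.lines:
            # Exclusive-style behaviour: the line moves back up and leaves the L3
            # (per the quote, shared code/data may in reality leave a copy behind).
            del self.lines[addr]
            return "hit in L3 victim cache"
        return "miss -> fetch from main memory"

l3 = VictimL3(capacity=4)
l3.insert_victim(0xA0)          # an L2 evicts a line; it lands in the L3
print(l3.fetch(0xA0))           # later reuse is serviced from the L3
print(l3.fetch(0xB0))           # a never-cached line still goes to memory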
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: Martimus
I believe that AMD would use inclusive cache just like Intel if it had a better process and could just add the cache without using up much space, but they can't if they are going to compete with Intel.
.
.
.
Also, here is the link to the AnandTech article I suggested earlier: TLB Errata Explanation

I'm inclined to believe that is the most straightforward reason from an Occam's razor standpoint - it comes down to cache density and total effective cache size. It is interestingly coincidental that both Shanghai and Nehalem result in nearly identical total effective cache available to a given core.

And thanks for the link. I did google for the 9850 (as well as a bunch of cache articles) but was not having much luck finding a "dumb enough but not too dumb" explanation - probably because I am not asking for the definition of the cache styles but rather for the pros and cons from the suppliers themselves, and of course that is going to result in everything being a matter of opinion. Cache density, your suggestion, is a very believable one in the absence of a more compelling explanation.

Originally posted by: Lonyo
The benefit is that if the CPU looks for data in L3 and doesn't find it, it knows that the data doesn't exist in any core's L1 or L2 caches - thereby saving core snoop traffic, which not only improves performance but reduces power consumption as well.

An inclusive cache also prevents core snoop traffic from getting out of hand as you increase the number of cores, something that Nehalem has to worry about given its aspirations of extending beyond 4 cores.

Lonyo that is awesome :beer: Thanks for digging up the links and the relevant quotes. This explanation of the pros of Nehalem going with an inclusive L3$ is crisp and clear.

Presumably the con would be that, since you are allocating xtors on the die to holding duplicate data, you increase manufacturing costs with a larger die size (or decrease performance by removing xtors from the budget that would otherwise have gone toward implementing another feature elsewhere in the logic block of the core).

Is that a reasonable presumption?

Originally posted by: Lonyo
The hierarchy in Barcelona works like this:

This one was a little less pro/con and more an explanation of how exclusive cache works. Has AMD, or anyone with authority on the subject matter, discussed the pros and cons of exclusive cache on AMD's K10?

Smack me with a wet fish here if this makes no sense, but I vaguely remember reading something long ago about AMD's K10 being set up such that core-to-core communications (cache snooping, etc.) did not have to treat the L3$ as the closest point of contact to a core; a core could directly snoop another core's L1$ or L2$ to query the contents, and retrieving the data that way (if it was found) was faster than waiting for the slower L3$ to respond and send the data if it was contained therein.

Is this true? If so, could this be a pro for exclusive cache? You get to use all the xtors in your SRAM for potentially storing 100% unique data (no duplication), and you could have faster core-to-core cache transfers than with an inclusive cache hierarchy as done on Nehalem?

I really find this dichotomy between Intel and AMD to be quite intriguing. Cache density could simply be the Occam's razor here though; it's a simple, beautiful proposition.

Another theory that cannot be ruled out a priori is that this is an artificial situation brought about by IP restrictions on one side or the other. There may be patents involved here that Intel has licensed and AMD has not, or vice versa.
 

Acanthus

Lifer
Aug 28, 2001
19,915
2
76
ostif.org
Some of you seem to forget (or just don't know) that we have Intel and AMD engineers with PhDs who post on these forums...

To accuse forum members of being "fanbois" without knowing who they are can quickly ruin your reputation as a reasonable and well-thought-out poster here at AnandTech.

On to Nehalem... The performance jump will be large on the server side and decent for basic computing/games.

Intel gets to see the jump from going to an IMC, and the inclusive cache hierarchy just makes sense. On top of that, we will see improvements in efficiency, new instructions, the return of hyperthreading for highly threaded apps and CPU-intensive multitasking... etc.

My prediction (speculation based on some experience):
15-30% clock for clock in server applications, scientific apps, heavy multitasking, and encoding.

5-15% everywhere else.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: Acanthus
On to Nehalem... The performance jump will be large on the server side and decent for basic computing/games.

Intel gets to see the jump from going to an IMC, and the inclusive cache hierarchy just makes sense. On top of that, we will see improvements in efficiency, new instructions, the return of hyperthreading for highly threaded apps and CPU-intensive multitasking... etc.

My prediction (speculation based on some experience):
15-30% clock for clock in server applications, scientific apps, heavy multitasking, and encoding.

5-15% everywhere else.

AnandTech's own Nehalem preview gives us some cause for concern about single-threaded performance improvements over Penryn:

Cinebench shows us only a 2% increase in core-to-core performance from Penryn to Nehalem at the same clock speed. For applications that don't go out to main memory much and can stay confined to a single core, Nehalem behaves very much like Penryn.

http://www.anandtech.com/cpuch...howdoc.aspx?i=3326&p=7

I'm happy to see multi-threaded apps getting faster and faster, but I would have liked to have seen some xtors budgeted toward giving us a 10% improvement in single-threaded IPC as well, given that Nehalem is a tock and all.
 

harpoon84

Golden Member
Jul 16, 2006
1,084
0
0
Idontcare, what single-threaded app is so demanding that Penryn-level IPC is found wanting? Sure, even faster ST performance is preferable, but what good will it do for your computing experience? Any truly CPU-intensive task nowadays is already multithreaded, with the slight exception of games perhaps, but even games nowadays require at least a dual-core CPU to run well.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Well, if you think about it, everything that is multithreaded but has fewer than 8 threads will benefit from more robust thread-processing performance at the core level. If this weren't the case, there would be no performance-based reason to have more than one clockspeed SKU. IPC still matters.

A dual-threaded application will not (loosely speaking) see a speed-up any more than a single-threaded application will, if you think about it - not unless the dual-threaded application was seriously challenging the FSB on C2D.

Penryn had 6MB of shared L2$ for dual-core already... going to 8MB of shared L3$ and an IMC should help improve performance of a dual-threaded app, but it should not improve it any more than it will help a single-threaded app. (Again, this is only true, clearly, provided the dual-threaded app was not limited on C2D by interprocessor communication rates.)
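A toy Python model of that argument (invented numbers; it assumes perfect scaling up to the core count and no shared-resource bottleneck such as the FSB):

def throughput(threads, cores, per_core_perf):
    # Work done per unit time, assuming the app scales perfectly up to `cores`.
    return min(threads, cores) * per_core_perf

penryn  = dict(cores=4, per_core_perf=1.00)   # normalised per-core baseline
nehalem = dict(cores=4, per_core_perf=1.02)   # ~2% per-core gain from the preview

for threads in (1, 2, 4):
    speedup = throughput(threads, **nehalem) / throughput(threads, **penryn)
    print(f"{threads} thread(s): {speedup:.2f}x")   # identical 1.02x in every case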

For me specifically, my applications of interest are TMPGEnc and Metatrader 4.

TMPGEnc is multi-threaded, but it is always faster to finish the entire encoding job if you bust the project up into multiple projects (takes maybe 20s of prep work to do that), load them into the batch encode tool, and allow the batch processor to run multiple projects in parallel.

I do a fair amount of home video of the kids these days for the grandparents and myself, and my encode jobs take nearly 24hrs on my 2.67GHz QX6700. (Not hard-drive speed limited - 100% CPU load, just a lot of filters enabled to make it all to my liking.)

Metatrader 4 is multithreaded, but the rate-limiting process is the backtester, which itself runs as a single thread. This is OK because I run some 20 instances in parallel so the cores are always fully loaded, but there are weeks when I just need really fast time-to-results in a single instance of the application, and I am most certainly IPC limited, as my runs can take 4-6 weeks to complete on a 3.3GHz Q6600.

Also, I have a host of legacy applications that I'd just as soon not spend another $1000 upgrading to the latest versions (Photoshop, Mathematica, Acrobat, etc.) just so I can buy the latest multi-threaded release. I'd like my newest-generation $500 processor to add some value here.

At any rate... since this actually is relevant to the thread, I would argue that if Nehalem does not increase single-thread IPC over Penryn but rather delivers performance boosts by scaling multithreaded apps better than Penryn (and little more), then it actually opens a huge gaping door for Deneb to deliver Nehalem-like performance for 80-90% of the applications consumers have on their computers this fall. That's a big deal if you ask me. Not many consumers are going to buy Nehalem to run their 4+ threaded application (Excel and Cinebench R11? Any others?) if they can buy a Deneb at half the price and get the same performance on all their 4- and sub-4-threaded applications.

edit: had the wrong L2$ size for Penryn, doh! Thanks myo!
 

Lonyo

Lifer
Aug 10, 2002
21,938
6
81
Plus, for enthusiasts, there's the issue of overclocking.
If you can get a quad-core Penryn-based CPU that will overclock to 3.6GHz versus buying a more expensive Nehalem, which may do 8 threads but may not overclock as well (who knows yet whether it will or won't), then you can increase your single-threaded app performance by increasing the Core 2's clock speed rather than spending more on Nehalem.

I think the best option is to wait and see how the increases pan out across the board, though - single- and multi-threaded apps - and also to contemplate what sort of clockspeed headroom Nehalem will have for future processor introductions (since 3.2GHz is the launch speed and 3.2GHz is the top speed for Core 2 Quads, I'd assume Intel is expecting Nehalem to eventually scale above 3.2GHz to give increased single-threaded performance through higher clocks, although that won't be a launch boost).
 

Acanthus

Lifer
Aug 28, 2001
19,915
2
76
ostif.org
Originally posted by: Idontcare
Originally posted by: Acanthus
On to Nehalem... The performance jump will be large on the server side and decent for basic computing/games.

Intel gets to see the jump from going to an IMC, and the inclusive cache hierarchy just makes sense. On top of that, we will see improvements in efficiency, new instructions, the return of hyperthreading for highly threaded apps and CPU-intensive multitasking... etc.

My prediction (speculation based on some experience):
15-30% clock for clock in server applications, scientific apps, heavy multitasking, and encoding.

5-15% everywhere else.

AnandTech's own Nehalem preview gives us some cause for concern about single-threaded performance improvements over Penryn:

Cinebench shows us only a 2% increase in core-to-core performance from Penryn to Nehalem at the same clock speed. For applications that don't go out to main memory much and can stay confined to a single core, Nehalem behaves very much like Penryn.

http://www.anandtech.com/cpuch...howdoc.aspx?i=3326&p=7

I'm happy to see multi-threaded apps getting faster and faster, but I would have liked to have seen some xtors budgeted toward giving us a 10% improvement in single-threaded IPC as well, given that Nehalem is a tock and all.

Cinebench has never really been a good indicator of real-world performance; I remain optimistic.