AMD K10.5 is 10-20 percent faster than K10


Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: CTho9305
How do you disable Phenom cores?

Originally posted by: Tomshardware
All it required was an Asus M3A32-MVP Deluxe motherboard, which allows the user to select how many Phenom cores to use. We took the time and benchmarked Phenom with four, three, two and even a single core
http://www.tomshardware.com/20...md_triple_core_phenom/

Originally posted by: CTho9305
Without a reasonable number of people doing controlled tests you can't really tell if it wasn't just dumb luck that that guy ended up with one slow core and 3 fast ones.

Technically you will always have one core which is the slowest. Simple consequence of having >1 core. It is the slowest core which becomes the clock-limiting logic circuit for the overall monolithic IC. I know you know this ;)
 

heyheybooboo

Diamond Member
Jun 29, 2007
6,278
0
0
Originally posted by: CTho9305
Originally posted by: heyheybooboo
Originally posted by: CTho9305
Originally posted by: Martimus
Originally posted by: Idontcare
Originally posted by: Kuzi
So with higher performance/smaller size than Phenom, if AMD can clock Shanghai higher (I'm sure they can) than 3GHz, they should be good competition to Yorkfield.

Is there a consensus on the interweb as to what is limiting Phenom clocks at 65nm?

Is it TDP limited? xtor clocking limited (Vcore)? clock-skew limited (die-size)? or speed-path limited (layout)?

From what I have gathered, the biggest issue is the IMC clock speed. Once the core speed gets farther and farther away from the memory speed, the chip gets errors when multiple cores go to access the memory. (Because two cores could be accessing the same memory location at different clock cycles, but since the IMC is running slower, the memory hasn't caught up yet giving the second access incorrect data.) Although there is so little out there about the technical limitations (at least that I have read) that this is the only thing I can think of. This is pure conjecture though, so I am probably wrong.

Has anyone tried underclocking the northbridge/memory controller to verify this?

I stumbled across a forum post (which I can't locate now) whereby someone disabled 'core2' of the Phenom in AMD Overdrive and succeeded in a generous overclock and control of the IMC/northbridge. It was difficult to determine if this was 'factual or FUD' because of the tone of the post.

Without a reasonable number of people doing controlled tests you can't really tell if it wasn't just dumb luck that that guy ended up with one slow core and 3 fast ones. You'll never get controlled tests out of overclockers though - everybody will have some other tweak set slightly differently or be using different cooling/voltage/definition of stability/etc.

How do you disable Phenom cores?

'Disabled' was a poor choice of words. What he did was to use AMD Overdrive to independently overclock the other individual cores. I cannot recall what he did with that specific core, whether it remained at stock or was underclocked in some fashion.

Sorry I lost the link :( because it would have been interesting for the early phenom adopters to play with. It's definitely a 'Brave New World' for the Phenom tweakers out there ...
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
True, Intel is more efficient with their cache; they can pack more cache than AMD can in the same die area. What is interesting is that AMD licensed Z-RAM technology about two years ago, but hasn't used it yet.

Not true anymore. Barcelona's cache is now equal/smaller per MB compared to Merom's cache: http://www.chip-architect.com/...19_Various_Images.html

Things like the IMC and the HTT bus logic are taking up quite a lot of space too.

The core logic on the Barcelona is actually smaller in die size than the Conroe/Merom core.

If the predictions of Shanghai being 10-20% faster than Barcelona are for SpecCPU, then it'll still be behind Conroe/Penryn per clock in integer for single thread.
 

Regs

Lifer
Aug 9, 2002
16,666
21
81
Originally posted by: IntelUser2000

If the predictions of Shanghai being 10-20% faster than Barcelona are for SpecCPU, then it'll still be behind Conroe/Penryn per clock in integer for single thread.

And that's only if it makes a performance difference across the board, which it won't. The only bottleneck I see is that the core has been oversimplified to make up for the monolithic design. Another trip to Africa, maybe?
 

Kuzi

Senior member
Sep 16, 2007
572
0
0
Originally posted by: IntelUser2000
True, Intel is more efficient with their cache; they can pack more cache than AMD can in the same die area. What is interesting is that AMD licensed Z-RAM technology about two years ago, but hasn't used it yet.

Not true anymore. Barcelona's cache is now equal/smaller per MB compared to Merom's cache: http://www.chip-architect.com/...19_Various_Images.html

Things like the IMC and the HTT bus logic are taking up quite a lot of space too.

The core logic on the Barcelona is actually smaller in die size than the Conroe/Merom core.

If the predictions of Shanghai being 10-20% faster than Barcelona are for SpecCPU, then it'll still be behind Conroe/Penryn per clock in integer for single thread.

Thanks for the link; it seems you are correct about the IMC and HTT logic taking up sizable space on AMD CPUs, and about the cache density improvements on Barcelona.

About the SpecCPU performance predictions, I don't really know. What is important is how it will perform in everyday applications, games, encoding, etc. So in those apps, even a 10% increase will allow Shanghai to compete nicely with Penryn.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: Idontcare
Originally posted by: CTho9305
How do you disable Phenom cores?

Originally posted by: Tomshardware
All it required was an Asus M3A32-MVP Deluxe motherboard, which allows the user to select how many Phenom cores to use. We took the time and benchmarked Phenom with four, three, two and even a single core
http://www.tomshardware.com/20...md_triple_core_phenom/

Originally posted by: CTho9305
Without a reasonable number of people doing controlled tests you can't really tell if it wasn't just dumb luck that that guy ended up with one slow core and 3 fast ones.

Technically you will always have one core which is the slowest. Simple consequence of having >1 core. It is the slowest core which becomes the clock-limiting logic circuit for the overall monolithic IC. I know you know this ;)

Sure. I meant a significant difference. If you had fine enough granularity in clock speed measurement and consistent tests, you'd find no two cores are ever the same. The intended audience of that post wasn't you or me. ;)

The picture Tom's used for that article made me laugh.

Originally posted by: IntelUser2000
True, Intel is more efficient with their cache; they can pack more cache than AMD can in the same die area. What is interesting is that AMD licensed Z-RAM technology about two years ago, but hasn't used it yet.

Not true anymore. Barcelona's cache is now equal/smaller per MB compared to Merom's cache: http://www.chip-architect.com/...19_Various_Images.html

Cool, I didn't know that :)

Originally posted by: heyheybooboo
'Disabled' was a poor choice of words. What he did was to use AMD Overdrive to independently overclock the other individual cores. I cannot recall what he did with that specific core, whether it remained at stock or was underclocked in some fashion.

Sorry I lost the link :( because it would have been interesting for the early phenom adopters to play with. It's definitely a 'Brave New World' for the Phenom tweakers out there ...

These guys appear to be RMAing CPUs because they're not OCable enough. That's some nice ethics they've got.
 

Kuzi

Senior member
Sep 16, 2007
572
0
0
Originally posted by: Kuzi
I saw some benchmarks a few months ago at the xtremesystems forums of a Phenom with the IMC OC'ed to 3GHz; the L3 cache latency went down 15ns or so and performance went up. Don't remember by how much performance went up, though.

I'll try to find the post again and link it here.

I've found the thread on XS forums:

2400MHz, IMC@1980MHz, HT@1100MHz, RAM=440 4-4-4-4 2T:

2600MHz, IMC@3120MHz, HT@2160MHz, RAM=480 5-5-5-5 2T:

Notice how the L3 cache latency dropped from 51ns with the IMC clocked at 1980MHz to 33ns at a 3120MHz IMC frequency.

The CPU frequency is only 200MHz higher, but that is not enough to cause the huge decrease in L3 latency (35%!).
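As a quick sanity check of that 35% figure, here's the arithmetic (a small Python sketch; the latency and clock numbers are simply the ones quoted above):

```python
# Rough check of the quoted L3 latency improvement (numbers from the
# benchmark screenshots referenced above; treated here as approximate).
lat_slow_ns = 51.0   # L3 latency with IMC @ 1980 MHz
lat_fast_ns = 33.0   # L3 latency with IMC @ 3120 MHz

reduction = (lat_slow_ns - lat_fast_ns) / lat_slow_ns
print(f"L3 latency reduction: {reduction:.0%}")   # ~35%

# The CPU clock only rose from 2400 to 2600 MHz (~8%), so the core
# frequency bump alone cannot explain a 35% latency drop.
core_speedup = 2600 / 2400 - 1
print(f"Core clock increase: {core_speedup:.0%}")  # ~8%
```

So the bulk of the latency improvement really does have to come from the IMC/northbridge clock, not the core clock.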

Now I want to see a 3.2GHz Shanghai with 6MB L3 cache and IMC at 3.2GHz also :)

P.S The full and very long Phenom OC thread can be checked Here
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: Kuzi
Originally posted by: Kuzi
I saw some benchmarks a few months ago at the xtremesystems forums of a Phenom with the IMC OC'ed to 3GHz; the L3 cache latency went down 15ns or so and performance went up. Don't remember by how much performance went up, though.

I'll try to find the post again and link it here.

I've found the thread on XS forums:

2400MHz, IMC@1980MHz, HT@1100MHz, RAM=440 4-4-4-4 2T:

2600MHz, IMC@3120MHz, HT@2160MHz, RAM=480 5-5-5-5 2T:

Notice how the L3 cache latency dropped from 51ns with the IMC clocked at 1980MHz to 33ns at a 3120MHz IMC frequency.

The CPU frequency is only 200MHz higher, but that is not enough to cause the huge decrease in L3 latency (35%!).

Now I want to see a 3.2GHz Shanghai with 6MB L3 cache and IMC at 3.2GHz also :)

P.S The full and very long Phenom OC thread can be checked Here

Kuzi, without diving into the links you provided (I will do that in time though) can you tell me why (official AMD party-line or enthusiast speculation) the IMC is so vastly underclocked on stock Phenom parts?

I was under the impression that this was intentional to manage the TDP budget. I.e. clock the IMC lower (less power consumption) so you can turn around and clock the cores higher (increasing power consumption) such that for the same total power consumption you balance IMC latency versus core performance and maximize the overall platform performance.

But I am beginning to wonder if I am just making this up and it never actually was part of the AMD strategy.
 

Foxery

Golden Member
Jan 24, 2008
1,709
0
0
Originally posted by: Kuzi

I've found the thread on XS forums:

2400MHz, IMC@1980MHz, HT@1100MHz, RAM=440 4-4-4-4 2T:

2600MHz, IMC@3120MHz, HT@2160MHz, RAM=480 5-5-5-5 2T:

P.S The full and very long Phenom OC thread can be checked Here

Hmm, the thread is 51 pages. Was he able to run a more meaningful benchmark on the machine while the IMC was overclocked? If the system is able to perform something complex at these settings, that would be juicy information indeed. :)

Also, given that AMD couldn't run the IMC and the core logic synchronously (for whatever reason), why didn't they at least make them differ by some easily manageable ratio like 1/2, rather than fixing the IMC at an arbitrary 1800MHz? I'm no expert, but round numbers have a special place in my heart.
 

Foxery

Golden Member
Jan 24, 2008
1,709
0
0
I just realized why this all sounds so familiar. In the very first incarnation of the Athlon, when it came in Slot form (!), AMD couldn't produce L2 cache at high frequencies. I'm talking about the Pentium III-equivalent generation, the 500-950 MHz range.

The top few speed grades had to run with a multiplier of 2X between the core+L1 and the L2 cache. Guess what? They scaled poorly, because the core was always starved for data. The next revision, with new and improved synchronous L2 cache, was a performance monster that propelled them straight through the P4/AthlonXP generation.

I'm greatly oversimplifying, but I hope we see a similar boost when they improve this generation's bottleneck. :)
 

heyheybooboo

Diamond Member
Jun 29, 2007
6,278
0
0
My FUD ...

'Synchronicity' and the AMD 200MHz reference clock, multipliers, memory divisors, etc., started going out the window with DDR2/AM2.

AM2+ has thrown it out the window. The L3/IMC is contained on a 'variant' of the northbridge that now runs at its own frequency. AFAIK L1/2 cache still runs at cpu speed. I guess (!) a series of instructions (algorithms) handles the relationship between the 'triad' (cpu/cache - IMC/cache - memory)

When coupled with the split power planes, it seems like this is AMD Bizarro Land. But I think there is a reason to their madness.

Since AM2, AMD has been undergoing a series of platform transitions. With AM2+ we have the independence of the 'triad' and the split power planes.

I think this configuration (and transition) is a necessary step to ... Fusion.

I hope this made sense - I gotta run ...



 

exar333

Diamond Member
Feb 7, 2004
8,518
8
91
Originally posted by: heyheybooboo
Originally posted by: CTho9305
Originally posted by: Martimus
Originally posted by: Idontcare
Originally posted by: Kuzi
So with higher performance/smaller size than Phenom, if AMD can clock Shanghai higher (I'm sure they can) than 3GHz, they should be good competition to Yorkfield.

Is there a consensus on the interweb as to what is limiting Phenom clocks at 65nm?

Is it TDP limited? xtor clocking limited (Vcore)? clock-skew limited (die-size)? or speed-path limited (layout)?

From what I have gathered, the biggest issue is the IMC clock speed. Once the core speed gets farther and farther away from the memory speed, the chip gets errors when multiple cores go to access the memory. (Because two cores could be accessing the same memory location at different clock cycles, but since the IMC is running slower, the memory hasn't caught up yet giving the second access incorrect data.) Although there is so little out there about the technical limitations (at least that I have read) that this is the only thing I can think of. This is pure conjecture though, so I am probably wrong.

Has anyone tried underclocking the northbridge/memory controller to verify this?

I stumbled across a forum post (which I can't locate now) whereby someone disabled 'core2' of the Phenom in AMD Overdrive and succeeded in a generous overclock and control of the IMC/northbridge. It was difficult to determine if this was 'factual or FUD' because of the tone of the post.

And I was of the general impression that erratum 298 was partially based on the 'flush' (or lack thereof) of the L2 in one core that effectively created a memory-addressing conflict within the cache structure of the CPU.

Someone above my paygrade will have to determine the connection (if there is one) with the nb/imc speed limitation of 1.8GHz(?)

AMD could gain some traction with an upfront explanation of the problem and their success in addressing it with B3 ... but I guess I'm asking for too much :)

Originally posted by: Idontcare


Are AMD's 65nm node transistors just so weak (Ion/Ioff) that the switching speed effectively caps the maximum clockspeed of the Phenom to the current ~2.8GHz max?

What's the fastest clocked 65nm part AMD has ever shipped? What about 90nm?

90nm - X2 6400+ @ 3.2GHz "Windsor"
65nm - X2 5400+ @ 2.8GHz "Brisbane"


I don't know how to characterize the clockspeed issues with the Phenom but have always assumed it related to the 1.8GHz limitation of the imc/nb

Doesn't anyone else here find it interesting that as the process gets smaller, the top clock speed is dropping? That is exactly the opposite of Intel. The other poster may be right that AMD milked so much out of 90nm that they didn't prep well enough for 65nm and smaller. This is a disturbing trend, to be sure. Hopefully they can buck it and get back on track with 45nm. What worries me is that we have seen zero test demos of 45nm. Intel was demoing 45nm parts over a year before they became retail, to show the process was maturing well. AMD seems pretty close-mouthed, and that is usually not for the best.
 

Martimus

Diamond Member
Apr 24, 2007
4,490
157
106
Originally posted by: Kuzi
Originally posted by: Kuzi
I saw some benchmarks a few months ago at the xtremesystems forums of a Phenom with the IMC OC'ed to 3GHz; the L3 cache latency went down 15ns or so and performance went up. Don't remember by how much performance went up, though.

I'll try to find the post again and link it here.

I've found the thread on XS forums:

2400MHz, IMC@1980MHz, HT@1100MHz, RAM=440 4-4-4-4 2T:

2600MHz, IMC@3120MHz, HT@2160MHz, RAM=480 5-5-5-5 2T:

Notice how the L3 cache latency dropped from 51ns with the IMC clocked at 1980MHz to 33ns at a 3120MHz IMC frequency.

The CPU frequency is only 200MHz higher, but that is not enough to cause the huge decrease in L3 latency (35%!).

Now I want to see a 3.2GHz Shanghai with 6MB L3 cache and IMC at 3.2GHz also :)

P.S The full and very long Phenom OC thread can be checked Here

That xtremesystems forum link is by far the best information on Phenom overclocking I have seen, thanks! It is interesting to see that Toms Hardware didn't properly disable the TLB fix in their benchmarks after reading that. It seems like a very complicated process to change anything on those MSI boards.
 

Kuzi

Senior member
Sep 16, 2007
572
0
0
Originally posted by: Idontcare
Kuzi, without diving into the links you provided (I will do that in time though) can you tell me why (official AMD party-line or enthusiast speculation) the IMC is so vastly underclocked on stock Phenom parts?

I was under the impression that this was intentional to manage the TDP budget. I.e. clock the IMC lower (less power consumption) so you can turn around and clock the cores higher (increasing power consumption) such that for the same total power consumption you balance IMC latency versus core performance and maximize the overall platform performance.

You are correct in your assumption. From the excellent OC tests that KTE posted at XS forums, power consumption does go way up with increased IMC and HT speeds. So that is one reason why AMD would clock Phenom IMC so low at 1.8GHz.

Originally posted by: Foxery
Hmm, the thread is 51 pages. Was he able to run a more meaningful benchmark on the machine while the IMC was overclocked? If the system is able to perform something complex at these settings, that would be juicy information indeed. :)

Yes check the 71st post on this Page

He ran many, many benches with the Phenom at 2.6GHz, IMC at 3120MHz, and HT at 2160MHz. Remember these tests were done about 3 months ago, so the BIOS probably was not 100% optimized yet.

And I think his RAM was running only in single-channel mode at the time he tested; it could have been a mobo or BIOS problem, not sure. So these results can't really be compared with anything else :(
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: Foxery
I just realized why this all sounds so familiar. In the very first incarnation of the Athlon, when it came in Slot form (!), AMD couldn't produce L2 cache at high frequencies. I'm talking about the Pentium III-equivalent generation, the 500-950 MHz range.

The top few speed grades had to run with a multiplier of 2X between the core+L1 and the L2 cache. Guess what? They scaled poorly, because the core was always starved for data. The next revision, with new and improved synchronous L2 cache, was a performance monster that propelled them straight through the P4/AthlonXP generation.

I thought they purchased the SRAM chips from other suppliers. Didn't the Pentiums of the time do similar things? Also, IIRC the top speed grades were actually at 1/3 clock for the L2; the slowest CPU speeds had 1/2 clock.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: CTho9305
Originally posted by: Foxery
I just realized why this all sounds so familiar. In the very first incarnation of the Athlon, when it came in Slot form (!), AMD couldn't produce L2 cache at high frequencies. I'm talking about the Pentium III-equivalent generation, the 500-950 MHz range.

The top few speed grades had to run with a multiplier of 2X between the core+L1 and the L2 cache. Guess what? They scaled poorly, because the core was always starved for data. The next revision, with new and improved synchronous L2 cache, was a performance monster that propelled them straight through the P4/AthlonXP generation.

I thought they purchased the SRAM chips from other suppliers. Didn't the Pentiums of the time do similar things? Also, IIRC the top speed grades were actually at 1/3 clock for the L2; the slowest CPU speeds had 1/2 clock.

Correct. The Argon and Pluto cores had an MCM with the L2 cache. At 700MHz and below, the L2 ran at a 1/2 multiplier; from there up to 850MHz it ran at a 2/5 (not quite 1/2) multiplier; and above 850MHz the L2 ran at a 1/3 multiplier.
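Those tiers can be sketched as a tiny lookup (a hypothetical helper; the speed-grade boundaries are as recalled above, not official figures):

```python
# Effective L2 cache clock for Slot-A Athlons, per the divider tiers
# described above. Boundaries are as recalled in this thread, not official.
def slot_a_l2_clock(core_mhz: float) -> float:
    if core_mhz <= 700:
        divider = 1 / 2   # lower speed grades: half-speed L2
    elif core_mhz <= 850:
        divider = 2 / 5   # mid grades: not quite half
    else:
        divider = 1 / 3   # top grades: third-speed L2
    return core_mhz * divider

for core in (650, 800, 1000):
    print(f"{core} MHz core -> {slot_a_l2_clock(core):.0f} MHz L2")
```

Note how the effective L2 clock barely moves (roughly 320-333MHz) even as the core climbs from 650MHz to 1GHz, which is exactly why those top speed grades scaled so poorly.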

Apologies for the Wiki link, but having owned 26 Athlons at the time I was merely looking for a link which I can vouch for based on my experience. The wiki link here is correct insofar as the cache discussion.
http://en.wikipedia.org/wiki/Athlon

Who here remembers the "goldfinger"?
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: Idontcare
Originally posted by: CTho9305
Originally posted by: Foxery
I just realized why this all sounds so familiar. In the very first incarnation of the Athlon, when it came in Slot form (!), AMD couldn't produce L2 cache at high frequencies. I'm talking about the Pentium III-equivalent generation, the 500-950 MHz range.

The top few speed grades had to run with a multiplier of 2X between the core+L1 and the L2 cache. Guess what? They scaled poorly, because the core was always starved for data. The next revision, with new and improved synchronous L2 cache, was a performance monster that propelled them straight through the P4/AthlonXP generation.

I thought they purchased the SRAM chips from other suppliers. Didn't the Pentiums of the time do similar things? Also, IIRC the top speed grades were actually at 1/3 clock for the L2; the slowest CPU speeds had 1/2 clock.

Correct. The Argon and Pluto cores had an MCM with the L2 cache. At 700MHz and below, the L2 ran at a 1/2 multiplier; from there up to 850MHz it ran at a 2/5 (not quite 1/2) multiplier; and above 850MHz the L2 ran at a 1/3 multiplier.

Apologies for the Wiki link, but having owned 26 Athlons at the time I was merely looking for a link which I can vouch for based on my experience. The wiki link here is correct insofar as the cache discussion.
http://en.wikipedia.org/wiki/Athlon

Who here remembers the "goldfinger"?

I used a GFD.
 

coldpower27

Golden Member
Jul 18, 2004
1,676
0
76
Originally posted by: Kuzi
Originally posted by: GFORCE100
The Phenom is 285mm2, with 4x 512KB L2 cache = 2MB, plus 2MB L3 cache = 4MB total cache. You're suggesting they would have made Phenom with 8MB total cache, meaning a 570mm2 die size. Never in a million years... the massive die size would kill them, as yields would be virtually non-existent at that size, given the problems AMD already has at 65nm with their K8s. At 45nm the die size would still be large.

Actually I meant that Shanghai, K10.5, will have 8MB (2MB L2 + 6MB L3) of cache. So if Phenom's weakness was really the small 512K L2 cache per core, then I'm suggesting AMD engineers would have designed K10.5 with 1MB L2 cache per core and 4MB of shared L3 (instead of 6MB). That would still give K10.5 8MB total cache (4MB+4MB), so the die size would be similar to what the actual K10.5 will be.

Would that be possible? L3 cache is obviously easier to yield as it is considerably slower, so maybe it's a cost trade-off? Though even with double the total cache, I expect the Shanghai derivatives to be smaller than the current Barcelona.
 

coldpower27

Golden Member
Jul 18, 2004
1,676
0
76
Originally posted by: GFORCE100
Originally posted by: Kuzi
Originally posted by: GFORCE100
IMHO the L3 cache on the K10 really isn't the major problem, it's the L2 cache size as 512K per core is really too small for today's applications.

If that is really the case then AMD engineers would have designed K10.5 to have 1MB L2 cache per core and 4MB L3 shared cache. The die size should be similar to a 512k L2 per core / 6MB L3 CPU.

The Phenom is 285mm2, with 4x 512KB L2 cache = 2MB, plus 2MB L3 cache = 4MB total cache. You're suggesting they would have made Phenom with 8MB total cache, meaning a 570mm2 die size. Never in a million years... the massive die size would kill them, as yields would be virtually non-existent at that size, given the problems AMD already has at 65nm with their K8s. At 45nm the die size would still be large.

No, you're forgetting to incorporate the fact that a lot of the Barcelona die is core logic; doubling the cache on Barcelona would not make it 570mm2. 1MB of cache at the 65nm node shouldn't be much more than 25mm2, so you're looking at close to 400mm2. Still way too costly for anything besides the MP environment, though.

At 45nm, 8MB of total cache with the Barcelona core as it is now would probably come in at the 210-270mm2 range, which is quite doable.
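For a rough sanity check of that estimate (all figures are the thread's own ballpark numbers, not measured die data):

```python
# Ballpark die-area math for doubling Barcelona's cache, using the
# figures quoted in this thread: 285 mm^2 total die, ~25 mm^2 per MB
# of 65nm cache. Forum estimates, not official measurements.
phenom_die_mm2 = 285   # total Barcelona/Phenom die at 65nm
mm2_per_mb = 25        # rough 65nm SRAM density estimate

extra_cache_mb = 4     # going from 4MB total cache to 8MB
bigger_die = phenom_die_mm2 + extra_cache_mb * mm2_per_mb
print(f"Estimated die with 8MB cache: ~{bigger_die} mm^2")  # ~385 mm^2
```

Which lands right around the "close to 400mm2" figure: nowhere near the doubled 570mm2, since only the cache arrays grow, not the core logic, IMC, or HT links.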
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: coldpower27
No, you're forgetting to incorporate the fact that a lot of the Barcelona die is core logic; doubling the cache on Barcelona would not make it 570mm2. 1MB of cache at the 65nm node shouldn't be much more than 25mm2, so you're looking at close to 400mm2. Still way too costly for anything besides the MP environment, though.

At 45nm, 8MB of total cache with the Barcelona core as it is now would probably come in at the 210-270mm2 range, which is quite doable.

On the topic of cache sizes, I was always surprised that AMD didn't spin a teh-uber-cache-size Opteron targeted at the 8xxx server market. Sure, the die size would be intentionally large, sub-600mm^2, but it would mostly be large regions of cache arrays that could be fused off to maintain yields, as well as down-binned to desktop SKUs to ensure the whole mix was saleable material.

I never quite settled on a rationale for why AMD only differentiated their Sempron chips by cache size at the lower end, but not their upper-end Opterons (and now Barcelonas).
 

Kuzi

Senior member
Sep 16, 2007
572
0
0
Originally posted by: coldpower27
Originally posted by: Kuzi
Actually I meant that Shanghai, K10.5, will have 8MB (2MB L2 + 6MB L3) of cache. So if Phenom's weakness was really the small 512K L2 cache per core, then I'm suggesting AMD engineers would have designed K10.5 with 1MB L2 cache per core and 4MB of shared L3 (instead of 6MB). That would still give K10.5 8MB total cache (4MB+4MB), so the die size would be similar to what the actual K10.5 will be.

Would that be possible? L3 cache is obviously easier to yield as it is considerably slower, so maybe it's a cost trade-off? Though even with double the total cache, I expect the Shanghai derivatives to be smaller than the current Barcelona.

It is my understanding that as the processor gets more cores, a large shared cache pool becomes more important. I believe even Intel will start producing desktop processors with large L3 caches in the future.

As many people mentioned before, the L2 cache size in AMD CPUs is less important because of the integrated memory controller. It's just that in the current Barcelona/Phenom state, the IMC is running too slow at 1.8GHz, which increases L3 cache latency and lowers overall performance. Hopefully this problem will be rectified when Shanghai is released. And who knows, there could be other variables slowing Phenom right now.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: Kuzi
Originally posted by: coldpower27
Originally posted by: Kuzi
Actually I meant that Shanghai, K10.5, will have 8MB (2MB L2 + 6MB L3) of cache. So if Phenom's weakness was really the small 512K L2 cache per core, then I'm suggesting AMD engineers would have designed K10.5 with 1MB L2 cache per core and 4MB of shared L3 (instead of 6MB). That would still give K10.5 8MB total cache (4MB+4MB), so the die size would be similar to what the actual K10.5 will be.

Would that be possible? L3 cache is obviously easier to yield as it is considerably slower, so maybe it's a cost trade-off? Though even with double the total cache, I expect the Shanghai derivatives to be smaller than the current Barcelona.

It is my understanding that as the processor gets more cores, a large shared cache pool becomes more important. I believe even Intel will start producing desktop processors with large L3 caches in the future.

As many people mentioned before, the L2 cache size in AMD CPUs is less important because of the integrated memory controller. It's just that in the current Barcelona/Phenom state, the IMC is running too slow at 1.8GHz, which increases L3 cache latency and lowers overall performance. Hopefully this problem will be rectified when Shanghai is released. And who knows, there could be other variables slowing Phenom right now.

In Intel's case it's more cores and more threads per core (Nehalem = 2 threads/core via SMT).

Speaking of hyperthreading, does AMD have any plans to boost the threads/core > 1? They need something like this on their roadmap for 2H/09, don't they?
 

Kuzi

Senior member
Sep 16, 2007
572
0
0
Originally posted by: Idontcare
In Intel's case it's more cores and more threads per core (Nehalem = 2 threads/core via SMT).

Speaking of hyperthreading, does AMD have any plans to boost the threads/core > 1? They need something like this on their roadmap for 2H/09, don't they?

Notice how most desktop applications (not synthetic) today don't use more than two cores, and even the ones that do make use of 2+ cores only get a minor boost in performance going from 2 to 4 cores, generally speaking of course. So the more cores you have, the harder it becomes to make use of the extra cores; how many programs are out right now that give 4x the performance from 4 cores vs a single core?

As to Hyperthreading in Nehalem (8x threads), my guess is that the increase in performance will be minimal, for two reasons. First, it will never be as effective as having 8 real cores, and second is what I mentioned above: in general, most applications today don't get much benefit even from 4 cores. On the server side it might be very different, and my guess is that Nehalem with Hyperthreading will show its true strength there.

As to AMD, my guess is that they plan to have two quad-core processors on the same die (non-native), which is what Intel has been doing with their CPUs for a while. You know it's cheaper and easier to do, but even at 45nm the die size would be too big, so maybe they need to wait till 32nm to do that. Good luck, AMD.
 

Viditor

Diamond Member
Oct 25, 1999
3,290
0
0
Originally posted by: Kuzi
Originally posted by: Idontcare
In Intel's case it's more cores and more threads per core (Nehalem = 2 threads/core via SMT).

Speaking of hyperthreading, does AMD have any plans to boost the threads/core > 1? They need something like this on their roadmap for 2H/09, don't they?

Notice how most desktop applications (not synthetic) today don't use more than two cores, and even the ones that do make use of 2+ cores only get a minor boost in performance going from 2 to 4 cores, generally speaking of course. So the more cores you have, the harder it becomes to make use of the extra cores; how many programs are out right now that give 4x the performance from 4 cores vs a single core?

As to Hyperthreading in Nehalem (8x threads), my guess is that the increase in performance will be minimal, for two reasons. First, it will never be as effective as having 8 real cores, and second is what I mentioned above: in general, most applications today don't get much benefit even from 4 cores. On the server side it might be very different, and my guess is that Nehalem with Hyperthreading will show its true strength there.

As to AMD, my guess is that they plan to have two quad-core processors on the same die (non-native), which is what Intel has been doing with their CPUs for a while. You know it's cheaper and easier to do, but even at 45nm the die size would be too big, so maybe they need to wait till 32nm to do that. Good luck, AMD.

I agree completely about hyperthreading...in fact, I don't know of a good reason for Intel to be bringing it back (unless it helps with the CSI interface in some way).

As to AMD's upcoming 8 and 16 cores CPUs, I don't know...
What we know so far is:
1. They are part of the fusion project in that they will be designed for modularity. This lends credence to your prediction of an MCM, in that with all of the different variables (xCPU + xGPU = 8 cores), it would be VERY expensive to design and produce all of them...

2. On the other hand, we also know that they will utilize DC architecture and have a crossbar switch. This has always been a feature of monolithic design only (in fact I don't know how you could do it with an MCM). There's also the problem of how you deal with the on-die memory controller in an MCM...

BTW, while an MCM is cheaper from a yield standpoint, it's not necessarily cheaper overall (they are very expensive to design, which is why AMD doesn't have one).
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
Originally posted by: Viditor
Originally posted by: Kuzi
Originally posted by: Idontcare
In Intel's case it's more cores and more threads per core (Nehalem = 2 threads/core via SMT).

Speaking of hyperthreading, does AMD have any plans to boost the threads/core > 1? They need something like this on their roadmap for 2H/09, don't they?

Notice how most desktop applications (not synthetic) today don't use more than two cores, and even the ones that do make use of more than 2 cores only get a minor boost in performance (going from 2 to 4 cores), generally speaking of course. So the more cores you have, the harder and harder it becomes to make use of the extra cores. How many programs are out right now that give 4x performance from 4 cores vs a single core?

As to Hyperthreading in Nehalem (8 threads), my guess is that the increase in performance will be minimal, for two reasons. First, it will never be as effective as having 8 real cores; second is what I mentioned above, that in general most applications today don't get much benefit even from 4 cores. On the server side it might be very different, and my guess is that Nehalem with Hyperthreading will show its true strength there.

As to AMD, my guess is that they plan to put two quad-core processors in the same package (non-native), which is what Intel has been doing with their CPUs for a while. You know it's cheaper and easier to do, but even at 45nm the die size would be too big, so maybe they need to wait till 32nm to do that. Good luck AMD.

I agree completely about hyperthreading...in fact, I don't know of a good reason for Intel to be bringing it back (unless it helps with the CSI interface in some way).

As to AMD's upcoming 8 and 16 cores CPUs, I don't know...
What we know so far is:
1. They are part of the fusion project in that they will be designed for modularity. This lends credence to your prediction of an MCM, in that with all of the different variables (xCPU + xGPU = 8 cores), it would be VERY expensive to design and produce all of them...

2. On the other hand, we also know that they will utilize DC architecture and have a crossbar switch. This has always been a feature of monolithic design only (in fact I don't know how you could do it with an MCM). There's also the problem of how you deal with the on-die memory controller in an MCM...

BTW, while an MCM is cheaper from a yield standpoint, it's not necessarily cheaper overall (they are very expensive to design, which is why AMD doesn't have one).

Because Nehalem is very wide, and SMT will show some nice performance gains.

I'd be very interested in what makes you think an MCM is any more expensive to design than any other CPU architecture. Especially since the "expense" would be man-hours, and Intel did it in nine months. There's also the "expense" of AMD not being in the quad-core market for a year.

AMD doesn't have an MCM because they picked the wrong strategy. Need I post the link to the Hector Ruiz quote that he wished he had gone MCM instead of native?