If Phenom had 8MB L3 cache

PeteRoy

Senior member
Jun 28, 2004
958
2
91
www.youtube.com
I am thinking that if only AMD had put a lot more cache in their CPUs, Phenom would have matched Intel's Core 2 Duo or even Nehalem.

And if it did not match them, it would at least have closed the gap.

What do you think?
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
From what I've gathered regarding K10's L3$, it's not the size but the speed that is the problem.

I think most people would prefer a clock synchronized L3$ rather than a larger slow cache.
 

SlowSpyder

Lifer
Jan 12, 2005
17,305
1,002
126
Deneb will have 6MB. From what I understand, a large cache isn't as important since they have the IMC. As IDC said, I think they'd gain more if the L3 ran at the core speed. My chip is at 2.8GHz, but my L3 cache is only at 2.0GHz.
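To put rough numbers on the 2.8GHz core / 2.0GHz L3 split, here is a minimal sketch. The 40-cycle L3 access cost is an assumption picked for illustration, not a measured Phenom figure, and the sketch ignores any extra cost of crossing between the two clock domains.

```python
# Assumption: a made-up L3 access cost of 40 cache-clock cycles, used only
# for illustration. It is not a measured Phenom figure.
L3_CYCLES = 40
CORE_GHZ = 2.8  # core clock quoted in the post above


def latency_ns(cycles: float, clock_ghz: float) -> float:
    """Time in nanoseconds for `cycles` ticks of a clock running at `clock_ghz`."""
    return cycles / clock_ghz


def core_cycles_waited(cycles: float, cache_ghz: float, core_ghz: float) -> float:
    """Core clock ticks that pass while the cache spends `cycles` of its own clock."""
    return latency_ns(cycles, cache_ghz) * core_ghz


for cache_ghz in (2.0, 2.8):
    ns = latency_ns(L3_CYCLES, cache_ghz)
    waited = core_cycles_waited(L3_CYCLES, cache_ghz, CORE_GHZ)
    print(f"L3 at {cache_ghz:.1f} GHz: {ns:.1f} ns, about {waited:.0f} core cycles")
```

Under that assumption the same access costs the core about 56 of its own cycles with the L3 at 2.0GHz, versus 40 if the L3 ran at core speed.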
 

Comdrpopnfresh

Golden Member
Jul 25, 2006
1,202
2
81
The latency is even more of an issue than speed. AMD would see a larger improvement (in single-threaded performance at least) from a larger L2, but with the L3 added, I believe their L2 has a higher latency than in the previous generation. Their yields are probably stretched enough that adding more L3 would mean loosening the latencies further. As mentioned above, with the IMC, the cache now serves as immediate table space for whatever the CPU is doing: L2 for a core (or is it a pair?), and L3 for the whole chip. I don't remember so well, but I think the reason Nehalem will continue to have such large cache quantities is that the architecture's cache isn't exclusive, or some such term: what is there for one core is there for all.
 

Sylvanas

Diamond Member
Jan 20, 2004
3,752
0
0
Have a look at some preliminary Deneb results here. Whether it is the 6MB L3 or something else, there is a leap in performance (and 57.3W lower load power consumption, and that's with all the extra L3 transistors that come with a larger cache).
 

Soulkeeper

Diamond Member
Nov 23, 2001
6,736
156
106
my theory is that running the IMC at a lower multiple to the cpu frequency is done to save power

i guess there are trade-offs ...
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: Comdrpopnfresh
The latency is even more of an issue than speed.

Unfortunately we can't arbitrarily change the latency, so we can't make statements like that and have any confidence in being right.

We can change the speed though, and many reviews have hit on this.

Increasing the L3$ clockspeed from 1.8GHz to 2GHz (and higher) has been noted as having a meaningful improvement to Phenom's performance.

Improving latency no doubt will also improve performance, but with zero data points on how changing the latency would impact performance none of us are qualified to say that doing so would produce a bigger performance impact than the already observable performance impact that comes with increasing clockspeed.

Originally posted by: Soulkeeper
my theory is that running the IMC at a lower multiple to the cpu frequency is done to save power

i guess there are trade-offs ...

If that were true then you would see folks with BE Phenoms clocking their L3$ and IMC's up to 3GHz on water-cooled or phase-cooled systems.

The truth is there appears to be some manner of speedpath issue with the logic involving the IMC and/or L3$, as you will rarely find a person who reports overclocking their L3$ to 2.8GHz or higher, but core clocks can hit 3.2GHz or better.

The L3$ clockspeed does not appear to be thermally limited.

In hindsight it would appear that Charlie's "dancing in the aisles" article on TheInq was probably spot-on but completely misinterpreted by him and his readers. AMD was excited because they figured out how to make the L3$/IMC asynchronous to the core clocks, thus liberating the core clockspeed from whatever speedlimiting demons were lurking in that L3$/IMC circuitry. Everyone read Charlie's article the other way around, as being that AMD had found the speedlimiting demon and had successfully evicted it thus heralding a regime of clockspeeds that would clobber Clovertown.
 

Comdrpopnfresh

Golden Member
Jul 25, 2006
1,202
2
81
Originally posted by: Sylvanas
Whether it is the 6MB L3 or something else, there is a leap in performance

I vote something else. lol. It's the 45nm refresh, isn't it? Certainly at 45nm you have a lot less power consumption, and are able to have more, faster, and lower latency cache (is it this core when they have the high-k gates they've been working on w/ IBM?). All three factors should improve bandwidth and performance. They might have also tweaked the architecture and made slight improvements.

Originally posted by: Idontcare
Unfortunately we can't arbitrarily change the latency, so we can't make statements like that and have any confidence in being right.

I hear ya. But there are so many parallels to my point. We can't change the amount of cache on a processor either. When HyperTransport came about you could say it didn't make a difference because you had no alternative to the FSB on Intel chips for comparison, but we all know it made a difference. Speed increases are always used to mask latencies, which is why we have so many crucial numbers to look for in computing. The cache on a hard disk hides the latency of reads and writes, and L2 cache hides the latency of fetching data across a bus to the RAM. Quad-pumping FSBs masks the low internal clock, and a higher overall FSB speed masks the latency of the bus itself.

My point to the OP (which only comes across when reading past the first sentence, and taking meaning from the whole post combined) is that, on a microarchitecture with an IMC, cache sizes are less important.
The only reason the L3 is present is that it is a quad-core chip. Adding an L3 to an Athlon X2 would do very little for the power consumption and die footprint it would add. In the same manner, adding more L3 to the Phenom wouldn't do all that much. Barcelona/Phenom was designed for the server market, where you will load all the cores and have cross traffic on the die much more frequently than in a consumer desktop; it just so happens that having more cores aids desktop multitasking. So for a consumer desktop, L2 is more important than L3 (an exception always exists). If adding more L3 were effective, you'd see a performance gain on the X3 chips compared to X4s: the same amount of L3 split three ways instead of four should make an improvement, but it isn't strikingly there. To make the tiered cache system, AMD has cache of a higher latency vs the last generation. L3 has a natively higher latency than L2, just as L2 does compared to L1. In adding an extra tier, AMD also (if I correctly remember reading) added latency to the existing L2.

To the OP: AMD Phenom and X2 performance won't increase all that much with more cache at the current node. Current Intel chips have so much, and Penryn chips get a performance boost from more, because having lots of cache, as well as lower latencies and higher speeds (in the case of Penryn), masks the delay or latency (and performance hit) you get when the processor has to get data from the RAM over the FSB, which is slow. On current Intel parts, a request for data in the RAM goes from the CPU to the northbridge to the RAM. Integrated memory controllers drastically reduce latency. Reducing latency increases the speed at which a request is completed, just like increasing the speed reduces latency: two sides of a coin aimed at the same goal really. Which is why Nehalem chips will have much smaller L2 caches: Nehalem will have an integrated memory controller, so it will be quicker, thus 'okay', to ask for data from the RAM. However, if memory serves me right, there is such a large amount of L3 cache on Nehalem because the L3 tier also holds a copy of what is in the L2s. So the L3 on a Nehalem contains the contents of cores 1-4's individual L2s (once again, not 100% sure, but I vaguely remember reading this).
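The "cache matters less once memory is closer" argument can be sketched with the textbook average memory access time (AMAT) formula. Every number below is hypothetical and chosen only to show the shape of the effect; none of them are measurements of any Intel or AMD part, and the model has a single cache level in front of RAM.

```python
def amat_ns(hit_ns: float, miss_rate: float, memory_ns: float) -> float:
    """Average memory access time for one cache level in front of RAM."""
    return hit_ns + miss_rate * memory_ns


# Hypothetical values (assumptions, not measurements).
HIT_NS = 5.0             # cache hit time
SMALL_CACHE_MISS = 0.10  # miss rate with the smaller cache
LARGE_CACHE_MISS = 0.05  # miss rate with the larger cache
FSB_MEMORY_NS = 100.0    # trip to RAM through a northbridge
IMC_MEMORY_NS = 50.0     # trip to RAM through an on-die controller


def gain_from_bigger_cache(memory_ns: float) -> float:
    """Nanoseconds saved per access by moving to the larger cache."""
    small = amat_ns(HIT_NS, SMALL_CACHE_MISS, memory_ns)
    large = amat_ns(HIT_NS, LARGE_CACHE_MISS, memory_ns)
    return small - large


for name, memory_ns in (("FSB", FSB_MEMORY_NS), ("IMC", IMC_MEMORY_NS)):
    print(f"{name}: bigger cache saves {gain_from_bigger_cache(memory_ns):.1f} ns per access")
```

With these made-up inputs the same drop in miss rate saves about 5 ns per access behind the slow memory path and about 2.5 ns behind the fast one. The saving scales with the miss penalty, which is the point being made above.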
The point is that going from an FSB to an IMC is a much greater improvement in performance than, say, going from DDR2-667 to DDR2-800, or from DDR2 to DDR3. In both those cases, you gain higher speeds at the cost of greater latency. The memory subsystem of a computer (and the L2/3 used to mask its latency) is more impacted by latency than speed. Otherwise, we'd have a higher-than-linear ramp in improvement between the same processors running at higher speeds, since in most modern processors the L2 is synchronous with the CPU clock: you'd have the performance gain of the higher core clock plus that of the higher L2 clock. But this just isn't so. Both do a little as frequency increases, but not even linearly when combined. But we see JUMPS in speed across generations as nodes shrink and latencies come down. The same thing can be said as to why research is ongoing into using light as a transmission medium in future computing: not because processors using light will automatically run at a billion GHz, but because the signals, being light, move faster, which means less latency.
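For the DDR2-667 vs DDR2-800 comparison, a quick sketch of how the rated speed and the CAS latency in nanoseconds move separately. The CL values are assumed, typical-looking timings for illustration, not the specs of any particular module, and the bandwidth figure is the theoretical peak of one 64-bit channel.

```python
def cas_latency_ns(cas_cycles: int, data_rate_mts: float) -> float:
    """CAS latency in nanoseconds.

    DDR transfers data twice per I/O clock, so the clock in MHz is half
    the data rate in MT/s.
    """
    clock_mhz = data_rate_mts / 2
    return cas_cycles / clock_mhz * 1000


def peak_bandwidth_mb_s(data_rate_mts: float, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth of one 64-bit (8-byte) channel in MB/s."""
    return data_rate_mts * bus_bytes


# Assumed timings (hypothetical): name -> (CAS cycles, data rate in MT/s).
modules = {
    "DDR2-667 CL4": (4, 667),
    "DDR2-800 CL5": (5, 800),
}

for name, (cl, rate) in modules.items():
    print(f"{name}: {peak_bandwidth_mb_s(rate):.0f} MB/s peak, "
          f"CAS {cas_latency_ns(cl, rate):.1f} ns")
```

With these assumed timings the faster module has about 20% more peak bandwidth, while its CAS latency in nanoseconds is no better and in fact slightly worse.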
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
there are a variety of design decisions that AMD took that makes cache less important.
 

jones377

Senior member
May 2, 2004
462
64
91
Originally posted by: Idontcare
From what I've gathered regarding K10's L3$, it's not the size but the speed that is the problem.

I think most people would prefer a clock synchronized L3$ rather than a larger slow cache.

There isn't a single fix to performance. A bigger cache will improve performance. A higher clocked cache will improve performance and a lower latency cache will also improve performance. In the case of Phenom I'd say the boost from biggest to smallest is in that order for the L3 cache but all 3 are important.
 

myocardia

Diamond Member
Jun 21, 2003
9,291
30
91
Originally posted by: Comdrpopnfresh
My point to the OP (which only comes across when reading past the first sentence, and taking meaning from the whole post combined) is that, on a microarchitecture with an IMC, cache sizes are less important.

I agree completely.

The memory subsystem of a computer (and the L2/3 used to mask its latency) is more impacted by latency than speed.

I would agree with this statement, if it read "The memory subsystem of a computer with a processor that has an IMC is more impacted by latency than by speed", since with current FSB-based processors, that generally doesn't apply in most instances.

Otherwise, we'd have a higher-than-linear ramp in improvement between the same processors running at higher speeds, since in most modern processors the L2 is synchronous with the CPU clock: you'd have the performance gain of the higher core clock plus that of the higher L2 clock. But this just isn't so.

It's never possible to have higher-than-linear ramps in performance, assuming the microarchitecture doesn't change (also assuming the chipset and memory subsystem are unchanged).
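A minimal sketch of why the ramp tops out at linear: split run time into a part that scales with the core clock (core work plus synchronous cache accesses) and a memory-stall part that does not. The cycle count and stall time are made-up numbers for illustration only.

```python
def run_time_s(core_cycles: float, core_ghz: float, memory_stall_s: float) -> float:
    """Run time = clock-scaled part + fixed time spent waiting on memory."""
    return core_cycles / (core_ghz * 1e9) + memory_stall_s


def speedup(core_cycles: float, memory_stall_s: float,
            base_ghz: float, new_ghz: float) -> float:
    """How much faster the same workload runs at `new_ghz` than at `base_ghz`."""
    base = run_time_s(core_cycles, base_ghz, memory_stall_s)
    new = run_time_s(core_cycles, new_ghz, memory_stall_s)
    return base / new


# Hypothetical workload: 2.4 billion clock-scaled cycles, 0.25 s of memory stalls.
CYCLES = 2.4e9
STALL_S = 0.25

clock_ratio = 3.0 / 2.4
print(f"clock ratio:            {clock_ratio:.3f}")
print(f"speedup with stalls:    {speedup(CYCLES, STALL_S, 2.4, 3.0):.3f}")
print(f"speedup without stalls: {speedup(CYCLES, 0.0, 2.4, 3.0):.3f}")
```

In this model the speedup equals the clock ratio only when nothing waits on memory. Any fixed stall time pulls it below linear, and nothing in the model can push it above.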

But we see JUMPS in speed across generations as nodes shrink and latencies come down.

I would guess that's because of architectural improvements for the most part, since in the past we've seen no major differences in performance (at the same clockspeed) with die shrinks, unless they were also accompanied by architectural improvements. For instance, a 45nm Penryn is only ~5% faster than a similarly clocked 65nm Kentsfield/Conroe (excluding DivX and other SSE4.1-optimized software), and that seems to be almost entirely from the extra L2 cache. Likewise, the 65nm X2s weren't faster at all than their 90nm counterparts.

BTW, that was a very informative post. Thanks for taking the time to write it. I'm sure the OP appreciates it even more than the rest of us.
 

Cogman

Lifer
Sep 19, 2000
10,286
145
106
I would think that Phenom would see larger benefits from a larger L1 and L2 cache per core rather than a larger L3, especially since most applications are still single-threaded as opposed to multithreaded.
 

Philippart

Golden Member
Jul 9, 2006
1,290
0
0
more cache doesn't necessarily mean more speed,

AMD and Intel have very different architectures right now
 

Cogman

Lifer
Sep 19, 2000
10,286
145
106
Originally posted by: Philippart
more cache doesn't necessarily mean more speed,

AMD and Intel have very different architectures right now

No, it doesn't, but with applications placing larger and larger demands on cache, it does provide some speed increases (in at least some applications).

Encoding, for example, generally benefits from larger amounts of cache even on the same architecture.
 

VirtualLarry

No Lifer
Aug 25, 2001
56,587
10,225
126
I thought encoding used such large datasets that cache size really didn't matter. An IMC would matter more.
 

Soulkeeper

Diamond Member
Nov 23, 2001
6,736
156
106
Originally posted by: Idontcare
Originally posted by: Comdrpopnfresh
The latency is even more of an issue than speed.

Unfortunately we can't arbitrarily change the latency, so we can't make statements like that and have any confidence in being right.

We can change the speed though, and many reviews have hit on this.

Increasing the L3$ clockspeed from 1.8GHz to 2GHz (and higher) has been noted as having a meaningful improvement to Phenom's performance.

Improving latency no doubt will also improve performance, but with zero data points on how changing the latency would impact performance none of us are qualified to say that doing so would produce a bigger performance impact than the already observable performance impact that comes with increasing clockspeed.

Originally posted by: Soulkeeper
my theory is that running the IMC at a lower multiple to the cpu frequency is done to save power

i guess there are trade-offs ...

If that were true then you would see folks with BE Phenoms clocking their L3$ and IMC's up to 3GHz on water-cooled or phase-cooled systems.

The truth is there appears to be some manner of speedpath issue with the logic involving the IMC and/or L3$, as you will rarely find a person who reports overclocking their L3$ to 2.8GHz or higher, but core clocks can hit 3.2GHz or better.

The L3$ clockspeed does not appear to be thermally limited.

In hindsight it would appear that Charlie's "dancing in the aisles" article on TheInq was probably spot-on but completely misinterpreted by him and his readers. AMD was excited because they figured out how to make the L3$/IMC asynchronous to the core clocks, thus liberating the core clockspeed from whatever speedlimiting demons were lurking in that L3$/IMC circuitry. Everyone read Charlie's article the other way around, as being that AMD had found the speedlimiting demon and had successfully evicted it thus heralding a regime of clockspeeds that would clobber Clovertown.


i've never owned one of these chips, are you saying a person can clock the L3/IMC independently of the core frequency ?
i'm just suggesting that the choice to not run the IMC 1:1 with the core by default is at least in part due to power usage concerns, that cannot be proven wrong.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: Soulkeeper
i've never owned one of these chips, are you saying a person can clock the L3/IMC independently of the core frequency ?

Yes, that is correct. And individual cores can be clocked separately.

Originally posted by: Soulkeeper
i'm just suggesting that the choice to not run the IMC 1:1 with the core by default is at least in part due to power usage concerns, that cannot be proven wrong.

I'm not grasping what you are saying. How can it not be proven wrong?

The reason the IMC and L3$ are not synced with the core frequencies is quite clearly that AMD wanted to sell Phenoms with clockspeeds >2.4GHz (the approximate upper limit of stable L3$ clocks from most reports in the forums).
 

Soulkeeper

Diamond Member
Nov 23, 2001
6,736
156
106
Originally posted by: Idontcare
Originally posted by: Soulkeeper
i've never owned one of these chips, are you saying a person can clock the L3/IMC independently of the core frequency ?

Yes, that is correct. And individual cores can be clocked separately.

Originally posted by: Soulkeeper
i'm just suggesting that the choice to not run the IMC 1:1 with the core by default is at least in part due to power usage concerns, that cannot be proven wrong.

I'm not grasping what you are saying. How can it not be proven wrong?

The reason the IMC and L3$ are not synced with the core frequencies is quite clearly that AMD wanted to sell Phenoms with clockspeeds >2.4GHz (the approximate upper limit of stable L3$ clocks from most reports in the forums).



thanks for clearing this up for me