
Question: Why do caches have bandwidth measured in GB/s or even TB/s, while their capacity is only in MB?

FlameTail

Diamond Member
Dec 15, 2021
4,384
2,762
106
So this is something that has eluded me for a long time. Why do caches have more bandwidth than their capacity?

M2 Pro’s L2 is no slouch. The iGPU can pull over 1 TB/s from L2, putting it ahead of some older discrete GPUs like Radeon R9 390. A big iGPU demands more shared cache bandwidth, and M2 Pro is no exception. - https://chipsandcheese.com/2023/10/31/a-brief-look-at-apples-m2-pro-igpu/
Second generation AMD 3D V-Cache has up to 2.5 TB/s bandwidth - VideoCardz.com
I guess someone who has no understanding of this subject would wonder the same thing reading the statements quoted above.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,696
3,260
136
So this is something that has eluded me for a long time. Why do caches have more bandwidth than their capacity?
What's so hard about that? Of course the capacity is limited by die size. Cache (SRAM) is just not as dense as DRAM or NAND memory, but it is much faster.

64 MB of V-Cache on a 7 nm process takes 41 mm².

For just 1024 MB of V-Cache you would need a 656 mm² 7 nm chip. That's just too expensive and not really worth it.
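
The 656 mm² figure is just linear scaling of the 41 mm² per 64 MB quoted above. A minimal sketch of that arithmetic follows; the function name is made up for illustration, and linear scaling is an assumption that ignores tags, wiring, and other overhead.

```python
# Area figure taken from the post: 64 MB of V-Cache is about 41 mm^2 on 7 nm.
AREA_MM2_PER_64MB = 41.0


def sram_area_mm2(capacity_mb: float) -> float:
    """Estimate die area by scaling linearly with capacity (assumption)."""
    return AREA_MM2_PER_64MB * capacity_mb / 64.0


if __name__ == "__main__":
    for capacity_mb in (64, 256, 1024):
        print(f"{capacity_mb:>5} MB -> {sram_area_mm2(capacity_mb):.0f} mm^2")
```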
 

FlameTail

Diamond Member
Dec 15, 2021
4,384
2,762
106
What's so hard about that? Of course the capacity is limited by die size. Cache (SRAM) is just not as dense as DRAM or NAND memory, but it is much faster.

64 MB of V-Cache on a 7 nm process takes 41 mm².

For just 1024 MB of V-Cache you would need a 656 mm² 7 nm chip. That's just too expensive and not really worth it.
That was not my question. My question is, for example: why does the 64 MB V-Cache have 2.5 TB/s of bandwidth? That is far more than its capacity. What's the use of having so much bandwidth if you can only read/write 64 MB at maximum?
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,696
3,260
136
That was not my question. My question is, for example: why does the 64 MB V-Cache have 2.5 TB/s of bandwidth? That is far more than its capacity.
And why should the capacity and speed be the same or close to each other? One is speed, the other is capacity. What do they have in common?
What's the use of having so much bandwidth if you can only read/write 64 MB at maximum?
Because it's used as a buffer.
What do you think would happen if modern CPUs had no cache and only RAM?
The highest priority is speed, so that the cores don't have to wait for data.



P.S. Not sure how they calculated 2.5TB/s.
 

FlameTail

Diamond Member
Dec 15, 2021
4,384
2,762
106
And why should the capacity and speed be the same or close to each other? One is speed, the other is capacity. What do they have in common?
Well, I was thinking you could read/write the cache only once per second. Hence 2.5 TB/s for a 64 MB cache made no sense. Of course, that was the wrong line of thought.
Surely the hint is in the per second thing? These chips do a lot of things every second!
This comment nails it. My question is answered.
 

yottabit

Golden Member
Jun 5, 2008
1,671
874
146
Well, I was thinking you could read/write the cache only once per second. Hence 2.5 TB/s for a 64 MB cache made no sense. Of course, that was the wrong line of thought.

This comment nails it. My question is answered.
Once per second, 5.7 billion times per second, pretty close (the reality is somewhere in between but closer to the latter)

There’s a nice writeup here

It makes a lot more sense intuitively to think of the bandwidth in terms of “bytes per cycle”
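
To make "bytes per cycle" concrete, here is a rough sketch that converts the quoted 2.5 TB/s into bytes per clock cycle, and into how many times per second the whole 64 MB could be streamed through. It assumes the "5.7 billion times per second" above stands for a 5.7 GHz clock and that 64 MB means 64 × 1024² bytes; both are assumptions, and the function names are just for illustration.

```python
def bytes_per_cycle(bandwidth_bytes_per_s: float, clock_hz: float) -> float:
    """Bandwidth expressed per clock cycle instead of per second."""
    return bandwidth_bytes_per_s / clock_hz


def turnovers_per_second(bandwidth_bytes_per_s: float, capacity_bytes: int) -> float:
    """How many times per second the full capacity could be read or written."""
    return bandwidth_bytes_per_s / capacity_bytes


BANDWIDTH = 2.5e12            # 2.5 TB/s, the V-Cache figure quoted in the thread
CAPACITY = 64 * 1024**2       # 64 MB cache (binary megabytes assumed)
CLOCK_HZ = 5.7e9              # assumed clock: 5.7 billion cycles per second

if __name__ == "__main__":
    print(f"{bytes_per_cycle(BANDWIDTH, CLOCK_HZ):.0f} bytes per cycle")
    print(f"{turnovers_per_second(BANDWIDTH, CAPACITY):,.0f} full passes per second")
```

Seen this way, a few hundred bytes per cycle spread over many cores is not an outlandish number, and nothing stops the same 64 MB from being read tens of thousands of times per second.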
 

SarahKerrigan

Senior member
Oct 12, 2014
735
2,036
136
As a single data point: Intel Golden Cove can do a pair of 512-bit loads per cycle. That means a single 4 GHz Golden Cove core can potentially generate 512 GB/s of cache traffic (2 × 64 bytes × 4 GHz).

Being limited to main-memory bandwidth from cache, even at lower latency, would be a severe constraint on many or most real code streams.
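
The 512 GB/s figure can be checked with a few lines of arithmetic. This is only a sketch of the peak number from the post (two 512-bit loads per cycle at 4 GHz); the function name is hypothetical, and sustained rates on real code will be lower.

```python
def peak_load_bandwidth_gb_per_s(loads_per_cycle: int,
                                 load_width_bits: int,
                                 clock_ghz: float) -> float:
    """Peak load bandwidth of one core in GB/s (1 GB = 10^9 bytes)."""
    bytes_per_cycle = loads_per_cycle * load_width_bits // 8
    return bytes_per_cycle * clock_ghz


if __name__ == "__main__":
    # Figures from the post: a pair of 512-bit loads per cycle at 4 GHz.
    print(peak_load_bandwidth_gb_per_s(2, 512, 4.0), "GB/s")
```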
 

biostud

Lifer
Feb 27, 2003
19,934
7,039
136
The CPU pipeline needs to be fed data that it can process. The lower the latency and the higher the bandwidth, the less chance there is of the pipeline waiting for data. And for technical reasons, as someone has already explained, you cannot have as much cache as you have RAM. As the size goes up, bandwidth goes down and latency goes up: L1 -> L2 -> L3 -> RAM
 

Schmide

Diamond Member
Mar 7, 2002
5,745
1,036
126
The way caches are really limited is by their ways (associativity). The number of ways is how many windows into memory (or into another cache) a cache can mirror. For every way, the cache has to look up and redirect requests to and from it.

For example, Zen 3 had an 8-way L1 and L2 and a 16-way L3, the L3 being a victim cache. Victim means it does not load from main memory, only from the caches above it.
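
To illustrate what a "way" means in practice, here is a small sketch of how a set-associative cache splits an address. The geometry (32 KiB, 8-way, 64-byte lines) is an assumption chosen to resemble an 8-way L1, and the class is purely illustrative, not a model of any real design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    size_bytes: int
    ways: int
    line_bytes: int = 64  # assumed line size

    @property
    def num_sets(self) -> int:
        # Each set holds `ways` lines, so sets = capacity / (ways * line size).
        return self.size_bytes // (self.ways * self.line_bytes)

    def split(self, address: int):
        """Return (tag, set_index, line_offset) for a byte address."""
        offset = address % self.line_bytes
        line_number = address // self.line_bytes
        return line_number // self.num_sets, line_number % self.num_sets, offset


if __name__ == "__main__":
    l1 = CacheGeometry(size_bytes=32 * 1024, ways=8)  # hypothetical 8-way L1
    tag, set_index, offset = l1.split(0x12345)
    print(f"sets={l1.num_sets} tag={tag} set={set_index} offset={offset}")
```

An address maps to exactly one set, but the line may sit in any of the 8 ways of that set, so every lookup has to compare the tag against all 8 candidates. That comparison work is the cost of adding more ways.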

As for the 2.5 TB/s, those numbers are full-system bandwidth. So you have

32-96 bytes per cycle × 16 cores × clock (GHz) = TB/s of data.
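
Plugging numbers into that formula shows where a figure like 2.5 TB/s can come from. The post does not give a clock, so the 5.0 GHz used here is an assumption, and the function name is hypothetical.

```python
def aggregate_bandwidth_tb_per_s(bytes_per_cycle: int,
                                 cores: int,
                                 clock_ghz: float) -> float:
    """Full-system cache bandwidth in TB/s (1 TB = 10^12 bytes)."""
    return bytes_per_cycle * cores * clock_ghz / 1000.0


if __name__ == "__main__":
    ASSUMED_CLOCK_GHZ = 5.0  # not stated in the post; illustrative only
    for width in (32, 64, 96):
        tb_s = aggregate_bandwidth_tb_per_s(width, 16, ASSUMED_CLOCK_GHZ)
        print(f"{width} B/cycle x 16 cores x {ASSUMED_CLOCK_GHZ} GHz = {tb_s:.2f} TB/s")
```

Under these assumptions, the low end of the range (32 bytes per cycle per core) already lands at roughly 2.5 TB/s.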
 

moinmoin

Diamond Member
Jun 1, 2017
5,248
8,463
136
So this is something that has eluded me for a long time. Why do caches have more bandwidth than their capacity?
Not sure how this can have eluded you if you have a basic understanding of CPUs and why different forms of memory exist to begin with.

To keep it very short:
  1. CPUs compute at a very high rate, and ideally all the data they need is instantly available (so in the same cycle).
  2. But holding and delivering data at that rate is very costly, both power usage and area wise.
  3. So such data storage at essentially zero latency is kept at a minimum size.
  4. That's why you have "tiny" level one (or zero) cache that has excellent latency and bandwidth.
  5. At the opposite end you want to actually permanently store data. There bandwidth is the lowest and latency is the highest, but the advantage is storing a lot of data is cheap.
  6. All the different caching levels in between those two extremes are further compromises included to lift different bottlenecks.
 

Doug S

Diamond Member
Feb 8, 2020
3,632
6,413
136
Is this a serious question? You think a 128K L1 should only have 128Kbps of bandwidth? Do a little math on that and tell us the latency in cycles for a CPU clocked at 4 GHz to load a full 64 byte line from that cache.
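
For anyone who wants to do that math, here is a quick sketch. It assumes 128 Kbps means 128,000 bits per second and that the line is transferred at exactly that rate with no other latency; the function name is made up for illustration.

```python
def line_fill_cycles(line_bytes: int,
                     bandwidth_bits_per_s: int,
                     clock_hz: int) -> float:
    """CPU cycles spent transferring one cache line at the given bandwidth."""
    return line_bytes * 8 * clock_hz / bandwidth_bits_per_s


if __name__ == "__main__":
    cycles = line_fill_cycles(line_bytes=64,
                              bandwidth_bits_per_s=128_000,   # 128 Kbps (assumed decimal)
                              clock_hz=4_000_000_000)         # 4 GHz
    print(f"{cycles:,.0f} cycles to load one 64-byte line")
```

That works out to 16 million cycles (4 ms) for a single line, compared with the handful of cycles a real L1 takes.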
 

biostud

Lifer
Feb 27, 2003
19,934
7,039
136
Is this a serious question? You think a 128K L1 should only have 128Kbps of bandwidth? Do a little math on that and tell us the latency in cycles for a CPU clocked at 4 GHz to load a full 64 byte line from that cache.
I think OP might not have a full understanding of how a CPU works, so I think the educational route is the right way to go. :)
 

coercitiv

Diamond Member
Jan 24, 2014
7,380
17,494
136
I think OP might not have a full understanding of how a CPU works, so I think the educational route is the right way to go. :)
I think it goes deeper than CPUs, but the educational route is still the way to go.
 

FlameTail

Diamond Member
Dec 15, 2021
4,384
2,762
106
Ok guys. Can I divulge something? When I was writing the original post, I realised what the answer was. But I decided to post it anyway and pretend I had no idea. :p

I think it goes deeper than CPUs, but the educational route is still the way to go.
How far? High school? Bachelors? PhD?

I will admit I do not have the education in this regard. I opted to not take CS, and chose another subject of my liking. Hence most of my knowledge comes from reading articles from the likes of Anandtech, Semianalysis, Wikichip, Chips&Cheese, Angstronomics etc...
 

biostud

Lifer
Feb 27, 2003
19,934
7,039
136
Ok guys. Can I divulge something? When I was writing the original post, I realised what the answer was. But I decided to post it anyway and pretend I had no idea. :p


How far? High school? Bachelors? PhD?

I will admit I do not have the education in this regard. I opted to not take CS, and chose another subject of my liking. Hence most of my knowledge comes from reading articles from the likes of Anandtech, Semianalysis, Wikichip, Chips&Cheese, Angstronomics etc...
Well, it's not about educational level. I have a candidate degree in biology, but have had computers as a hobby for the last 30+ years. And since you did ask the question, it felt like you didn't have a basic understanding of why cache needs to be extremely fast, even though it is relatively small. And instead of treating you as an idiot, I prefer to enlighten and educate. :)

There's a lot of technical stuff I don't understand as well, so I'm always grateful when someone can explain it to me.
 

coercitiv

Diamond Member
Jan 24, 2014
7,380
17,494
136
How far? High school? Bachelors? PhD?
Let's say high school. How can a pipe let through several times more water per second than its own volume? Why does a capacitor hold orders of magnitude more energy than other electrical components of the same size? How can a photon travel at ~300,000 km per second when it has no mass? All of these questions have something in common: they lack contextual information about the systems they are describing, while also implying false dependencies between properties of the systems. In other words, one needs to examine their purpose and properties more closely and things will soon become clear.
  • What do caches do? They serve as speed multipliers for the memory system.
  • How do they do that? They store and send data orders of magnitude faster than other volatile and non-volatile memory.
  • What makes caches better? The balance between size, speed, power and cost. For each application one needs to weigh all these factors in.
PS: most of us hang around these forums to learn. I thought I was here to learn about computers and practice English, turns out I learned how and when and why to debate.
 

FlameTail

Diamond Member
Dec 15, 2021
4,384
2,762
106
Well, it's not about educational level, I have a candidate in biology, but have had computers as a hobby the last 30+ years.
Your username checks out! XD.
And since you did ask the question, it felt like you didn't have a basic understanding of why cache needs to be extremely fast, even though it is relatively small.
Oh I sure do. CPUs need to access data quickly, and main memory simply isn't fast enough. Hence caches exist (in different levels). My quibble was with the TB/s thing, which is explained by the fact that CPUs do so many things within one second. As the other commenter said, it intuitively makes sense to think of bandwidth as "bytes per cycle".
And instead of treating you as an idiot, I prefer to enlighten and educate. :)
I highly respect that attitude:)
 

DAPUNISHER

Super Moderator CPU Forum Mod and Elite Member
Super Moderator
Aug 22, 2001
32,051
32,573
146
PS: most of us hang around these forums to learn. I thought I was here to learn about computers and practice English, turns out I learned how and when and why to debate.
And you are a better netizen for it. 🫶

Your English is usually impeccable BTW.
Ok guys. Can I divulge something? When I was writing the original post, I realised what the answer was. But I decided to post it anyway and pretend I had no idea. :p
You are doing it wrong. Cunningham's Law: "the best way to get the right answer on the internet is not to ask a question; it's to post the wrong answer."