> any guesses on how AMD will go about naming these processors with multiple L2 / L3 cache combos?
X4D

> any guesses on how AMD will go about naming these processors with multiple L2 / L3 cache combos?
AICache
> Cache sharing?
lmao never.
You'll get a fat L2 at least. Maybe.

> Could this be a blueprint for AMD future cache technologies?
It's just moar L2 via SoIC-X.
> That doesn't make physics sense. As far as I understand, L2 is way, way too fast for 3D stacking, and the more ways you put in a cache, the slower it is. So you can't just make it bigger. But you can improve its communication with nearby assets, which is basically what Telum does.
Weird, cos they're gonna 3D stack it.

> That doesn't make physics sense
yeah it does.
> As far as I understand, L2 is way, way too fast for 3D stacking
No?
> So you can't just make it bigger
How does Intel's 3 MB nice and comfy L2 slab work then?
> But you can improve its communication with nearby assets, which is basically what Telum does.
I suggest you forget all esoterica and always pick the simplest, most straightforward solution in your mind. That's how AMD does things.
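For what it's worth, the "bigger and more ways means slower" intuition can be put into a crude toy model (the constants below are invented for illustration, not from AMD, Intel, or a real CACTI run): access time grows roughly with the log of capacity, plus a small cost per way for the extra tag compares and muxing.

import math

# Crude, illustrative-only model of SRAM access latency in core cycles.
# The constants are made up for this sketch; real numbers come from tools
# like CACTI and from the actual process and circuit implementation.
def cache_cycles(size_kib, ways, base=3.0, per_doubling=1.0, per_way=0.25):
    return base + per_doubling * math.log2(size_kib / 32) + per_way * ways

for size_kib, ways in [(512, 8), (1024, 8), (2048, 16), (6144, 16)]:
    print(f"{size_kib / 1024:.1f} MiB, {ways}-way: ~{cache_cycles(size_kib, ways):.1f} cycles")

With these invented constants, a several-times-larger L2 comes out "a few cycles slower", not "unusably slow"; whether that holds on real silicon is exactly what the argument above is about.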
> I mean 3D cache stacking (the way the L3 cache works now). An L2 would never work with that level of latency. Seriously. It made the already slow L3 a few ticks slower, and I'm sure there is some secret logic sauce where the faster L3 makes up for the slower stacked part.
Ah, that's the best part. This reduces latency (at the same capacity).

> So your argument is that the patent is impossible to implement?
His argument is that he wants Telum caches (shared L2, basically) at AMD.
Well, it ain't about latency reduction per se, but piling up like 5 or 6 megs of private L2 at an acceptable cycle count.
Seems like you are suggesting that the patent posted is invalid, impossible to implement, or you are ignoring it.
The patent suggests 2 things:
- L2 can be moved to a stacked die
- L2 latency, after it is moved to the stacked die, will go down (!!!).
When the L3 received a stacked cache, latency went up, and that's a victim cache.
The patent is valid. The point is there are trade-offs for every operation. If you do more work, it costs power or latency.
There may be some process that makes the above puzzle work. I look forward to seeing the solution.
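To make that trade-off concrete, here is a back-of-the-envelope average memory access time (AMAT) sketch in Python. Every number in it is a made-up placeholder rather than anything from the patent or a real Zen core; the point is only that a bigger (or stacked) L2 can afford a few extra cycles of hit latency if it cuts the number of trips to the slower L3 and DRAM.

# Toy AMAT (average memory access time) model, illustrative numbers only.
def amat(l2_hit_cycles, l2_miss_rate, l3_hit_cycles, l3_miss_rate, dram_cycles):
    # Latency seen by an access that has already missed the L1.
    l3_level = l3_hit_cycles + l3_miss_rate * dram_cycles
    return l2_hit_cycles + l2_miss_rate * l3_level

# Hypothetical baseline: small on-die L2.
base = amat(l2_hit_cycles=14, l2_miss_rate=0.30, l3_hit_cycles=50, l3_miss_rate=0.40, dram_cycles=400)

# Hypothetical fat/stacked L2: slower to hit, but misses less often.
fat = amat(l2_hit_cycles=18, l2_miss_rate=0.18, l3_hit_cycles=50, l3_miss_rate=0.40, dram_cycles=400)

print(f"baseline L2 AMAT ~{base:.1f} cycles, fat L2 AMAT ~{fat:.1f} cycles")

With these placeholder numbers the fatter-but-slower L2 still comes out ahead; whether it does in practice depends entirely on the real hit rates and cycle counts.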
> IBM came out with a cache system where the L2 was shared. (Changed my mind, I found it; friend of the show Dr. Ian Cutress and George?)
For reference, here is AnandTech's article on the original Telum (z16), i.e. Dr. Ian Cutress reporting on IBM's Hot Chips 2021 presentation:
Dr. Ian Cutress said:
How Is This Possible?
Magic. Honestly, the first time I saw this I was a bit astounded as to what was actually going on.
In the Q&A following the session, Dr. Christian Jacobi (Chief Architect of Z) said that the system is designed to keep track of data on a cache miss, uses broadcasts, and memory state bits are tracked for broadcasts to external chips. These go across the whole system, and when data arrives it makes sure it can be used and confirms that all other copies are invalidated before working on the data. In the slack channel as part of the event, he also stated that lots of cycle counting goes on!
I’m going to stick with magic.
Truth be told, a lot of work goes into something like this, and there's likely still a lot of considerations to put forward to IBM about its operation, such as active power, or if caches can be powered down in idle or even be excluded from accepting evictions altogether to guarantee performance consistency of a single core. It makes me think what might be relevant and possible in x86 land, or even with consumer devices.
I’d be remiss in talking caches if I didn’t mention AMD’s upcoming V-cache technology, which is set to enable 96 MB of L3 cache per chiplet rather than 32 MB by adding a vertically stacked 64 MB L3 chiplet on top. But what would it mean to performance if that chiplet wasn’t L3, but considered an extra 8 MB of L2 per core instead, with the ability to accept virtual L3 cache lines?
Ultimately I spoke with some industry peers about IBM’s virtual caching idea, with comments ranging from ‘it shouldn’t work well’ to ‘it’s complex’ and ‘if they can do it as stated, that’s kinda cool’.
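As a rough illustration of the "virtual L3" idea described above, here is a simplified sketch (my own toy code, not IBM's actual protocol; it ignores the broadcasts, invalidations, and cycle counting Dr. Jacobi mentions): on a private-L2 miss, peer L2s are probed before memory, so a line another core still holds effectively behaves like an L3 hit.

# Simplified Telum-style lookup: each core has a private L2; on a miss,
# peer L2s are probed as a "virtual L3" before going to memory.
# Coherence traffic, eviction policy, and timing are deliberately left out.
class Core:
    def __init__(self, name):
        self.name = name
        self.l2 = {}          # address -> data (stand-in for a real cache)

def load(core, peers, addr, memory):
    if addr in core.l2:                       # private L2 hit
        return core.l2[addr], "L2 hit"
    for peer in peers:                        # virtual-L3 hit in a peer's L2
        if addr in peer.l2:
            data = peer.l2[addr]
            core.l2[addr] = data              # fill the requester's L2
            return data, f"virtual L3 hit in {peer.name}"
    data = memory[addr]                       # miss everywhere: go to DRAM
    core.l2[addr] = data
    return data, "memory"

memory = {0x100: "cacheline"}
c0, c1 = Core("core0"), Core("core1")
c1.l2[0x100] = "cacheline"                    # pretend core1 still holds this line
print(load(c0, [c1], 0x100, memory))          # -> ('cacheline', 'virtual L3 hit in core1')

The part the article calls "magic" is everything this sketch leaves out: keeping those peer copies coherent and doing the probes without blowing up latency or power.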
> The patent suggests 2 things:
> - L2 can be moved to a stacked die
> - L2 latency, after it is moved to the stacked die, will go down (!!!).
Go down compared to an on-die cache of the same size, associativity, etc., right? (I haven't read the patent.)

> When the L3 received a stacked cache, latency went up, and that's a victim cache.
At the same time, the L3$ size was tripled.
> I have not read (or searched for the link to) the full patent either. If anybody has the link handy, it would be appreciated.
RGT has got the links.
RGT thinks this is Zen 8 or later.
AMD's SECRET WEAPON For Zen: Stacked L2 Cache Patent Analysis
A new secret weapon for future Zen CPUs has been revealed! In this video, we dive into everything revealed in the newly discovered/leaked AMD patent for stacked L2 cache for Ryzen processors. There is a lot to dive into today, but this is VERY exciting for future generations of Ryzen CPUs, and it could have huge impacts not only on the hardware specs and architecture but also on unlocking new levels of performance. But just what kind of specifications and performance improvements can we expect from AMD's new Ryzen tech for stacked L2 cache? And just when will we see this new technology implemented in AMD's roadmap for Ryzen gaming CPUs?
SOURCES
https://redgamingtech.com/amds-nightmare-intels-razor-hammer-lake-serpent-lake-leaks/
https://patents.google.com/patent/US20260003794A1/en?oq=US20260003794A1
https://www.latitudeds.com/post/amd-development-of-cache-architecture-from-planar-to-3d-integration
https://globaldossier.uspto.gov/details/US/18758517/A/125173
> AICache
On Desktop it's still mostly GameCache, regardless of quantity. AMD can or may re-record a new version of the video, just to include new markings for certain processors.
> I think a bigger deal for the stacked L2 cache would be for use by AI GPUs, given the amount of money that is behind it and the almost "money is no object" race to achieve max performance.
You do understand that GPU caches are very different things, built for very different reasons?

> Something interesting has to be going on in that die if AMD is spending big bucks on it.
It's cheap given what those things will retail for. That's it.

> Since AMD is getting better at using L2 to derive better performance and to optimize external bandwidth utilization, AI GPUs need a lot more of that and have large budgets to make it happen.
You should just stop.

> Another tidbit: Nvidia is trying to maximize memory bandwidth by twisting the arms of HBM suppliers to increase the clocks. In the meantime, AMD says: we will take the cheap, low-speed / lower-bandwidth HBMs.
AMD ain't paying any less for HBM4, they're just not willing to deal with the thermal nightmare of 11 Gbps HBM4 at the current DRAM nodes.
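Rough, placeholder arithmetic on why a big on-package cache and raw HBM speed trade off against each other (the numbers are invented, not specs of any real accelerator): traffic that hits in the last-level cache never reaches the HBM, so raising the hit rate buys back external bandwidth much like raising the HBM clock would.

# Placeholder numbers only: how last-level-cache hit rate filters HBM traffic.
def hbm_traffic(request_bw_gbs, llc_hit_rate):
    # Only the misses actually consume HBM bandwidth.
    return request_bw_gbs * (1.0 - llc_hit_rate)

requests = 20000  # GB/s of memory requests generated by the compute units (made up)
for hit in (0.5, 0.7, 0.85):
    print(f"LLC hit rate {hit:.0%}: ~{hbm_traffic(requests, hit):.0f} GB/s reaches HBM")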
