Broadwell's L4 was a different beast, though, since it was an MCM solution.
I'm a bit confused about performance expectations. Someone cited Broadwell with the extra cache and said it only performed better in games.
Are you sure? Geil promised 10ns true latency modules (7200 MT/s @ CL36 and 6400 MT/s @ CL32) at launch, which is exactly the same as the vast majority of overclocked DDR4 modules. XPG even promises 7400 MT/s modules at unknown latencies.
A large L4 cache might be required for most enthusiast-level future CPUs paired with DDR5, to hide the rumored higher DRAM latencies, the one critical thing that DDR5 doesn't improve upon.
Not to mention that if the latencies of the off-CCD L3 are very similar to the standard L3, this would point to multiple CCDs without L3 being interconnected to a large off-die L3 cache, thus improving communication and coherency among CCDs by orders of magnitude.
No, inter-CCD latencies remain the same and there are no interconnections anywhere. They are stacking an additional layer of silicon, full of SRAM, on the existing L3 area and using vias to connect it to achieve bandwidth. Latency is great because it is basically the same physical distance to the consumers, and it uses the same address hashing to put lines into slices (just the number of slices is greater; see the toy sketch below).
No need to invent complicated schemes when 3D stacking has solved the problems.
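To picture the slice scheme described above, here is a toy model (the actual hash in Zen parts is undocumented, so the function below is purely illustrative): lines are spread across L3 slices by hashing the physical address, so adding stacked SRAM grows slice count/capacity rather than adding a new lookup level.

```python
def l3_slice(phys_addr: int, num_slices: int) -> int:
    """Toy slice selector: XOR-fold the line address across the slice bits.

    Illustrative only -- not AMD's real hash. Assumes 64-byte cache lines
    and a power-of-two slice count.
    """
    bits = num_slices.bit_length() - 1        # log2(num_slices)
    line = phys_addr >> 6                     # drop the 64 B line offset
    h = 0
    while line:
        h ^= line & (num_slices - 1)          # fold in the low slice bits
        line >>= bits
    return h

# Consecutive lines scatter across slices; the same hash scheme keeps
# working after stacking, just over more/larger slices.
print([l3_slice(addr << 6, 8) for addr in range(16)])
```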
I mean in future CPUs, not the one demoed there.
A large L4 cache might be required for most enthusiast-level future CPUs paired with DDR5, to hide the rumored higher DRAM latencies, the one critical thing that DDR5 doesn't improve upon.
Are you sure? Geil promised 10ns true latency modules (7200 MT/s @ CL36 and 6400 MT/s @ CL32) at launch, which is exactly the same as the vast majority of overclocked DDR4 modules. XPG even promises 7400 MT/s modules at unknown latencies.
I know that Ian was worried about latencies in this article, but that does not seem to have materialized.
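For anyone who wants to check the 10ns arithmetic above: true latency in ns is just CAS cycles divided by the memory clock, and the DDR clock is half the transfer rate. A quick sketch (the DDR4-3200 CL16 entry is my own comparison point, not from the Geil announcement):

```python
def true_latency_ns(mt_per_s: int, cas: int) -> float:
    """CAS latency in ns: CL cycles / memory clock (DDR clock = MT/s / 2)."""
    return cas * 2000 / mt_per_s  # 2000 = 2 transfers per clock * 1000 ns

for name, mts, cl in [("DDR5-7200 CL36", 7200, 36),
                      ("DDR5-6400 CL32", 6400, 32),
                      ("DDR4-3200 CL16", 3200, 16)]:
    print(f"{name}: {true_latency_ns(mts, cl):.1f} ns")
# All three come out to exactly 10.0 ns
```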
Not to mention that if the latencies of the off-CCD L3 are very similar to the standard L3, this would point to multiple CCDs without L3 being interconnected to a large off-die L3 cache, thus improving communication and coherency among CCDs by orders of magnitude.
eDRAM or HBM are "L4" solutions, so they either need tags (which take space that could otherwise go to L3 cache) or they serve as a so-called "system cache" on the memory side of things, acting as a huge buffer. (There is the possibility of outright replacing some DRAM with, say, HBM, so that the first 16GB of address space is served by HBM and so on, but that is a different solution.)
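The tag-space point is easy to quantify. A back-of-the-envelope for a hypothetical 128 MB L4 with assumed but plausible parameters (64 B lines, 16-way, 48-bit physical addresses; none of these are any real part's specs):

```python
# Tag overhead estimate for a hypothetical 128 MB L4.
# All parameters are illustrative assumptions.
CACHE_BYTES = 128 * 2**20   # 128 MB of eDRAM/HBM used as an L4
LINE_BYTES  = 64
WAYS        = 16
PHYS_BITS   = 48
STATE_BITS  = 3             # e.g. valid/dirty/coherence state

lines  = CACHE_BYTES // LINE_BYTES           # 2,097,152 lines
sets   = lines // WAYS
idx    = sets.bit_length() - 1               # log2(sets) = 17 index bits
offset = LINE_BYTES.bit_length() - 1         # 6 offset bits
tag    = PHYS_BITS - idx - offset            # 25 tag bits per line

tag_mb = lines * (tag + STATE_BITS) / 8 / 2**20
print(f"{tag_mb:.1f} MB of SRAM just for tags")  # ~7.0 MB
```

That is on the order of an entire on-die L3's worth of SRAM spent purely on bookkeeping, which is exactly why the tags-vs-L3 trade-off matters.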
This won't happen. Off-die will increase latency. On-die is always faster, and it uses less power doing it.
AMD affirmed that the additional stack of L3 has practically the same latency as on-die L3.
You are not listening to either me or JoeRambo.
AMD's V-cache is not, I repeat NOT off-die.
Unless you are misunderstanding what "off-die" means. In this case it means something like a separate CCX with caches only.
Then even what I am speaking about is not off-die L3; you, too, are not trying to understand what I meant.
Not to mention that if the latencies of the off-CCD L3 are very similar to the standard L3, this would point to multiple CCDs without L3 being interconnected to a large off-die L3 cache, thus improving communication and coherency among CCDs by orders of magnitude.
Are you sure? Geil promised 10ns true latency modules (7200 MT/s @ CL36 and 6400 MT/s @ CL32) at launch, which is exactly the same as the vast majority of overclocked DDR4 modules. XPG even promises 7400 MT/s modules at unknown latencies.
I know that Ian was worried about latencies in this article, but that does not seem to have materialized.
Let me quote what you said originally.
And? I clarified that a stack is OFF-die in my view, because it needs a separate die and an assembly process to be connected to the CCD.
Fine.
But you should use common terminology. AMD's V-cache is vertical stacking. Off-die means off-die.
Thanks for the reply, I learned a lot.
Caches are different, because they are easier to manufacture and have higher yields thanks to their repetitive structure (rough yield sketch after this post). They are also quite power efficient.
You are applying the square root law in the wrong way. The square root law is a big penalty there because the power consumption in the core increases just as much.
This cache is going to add at most 3-4W.
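On the yield point raised above, a simple Poisson defect model shows why a small, repetitive SRAM die is cheap: yield falls exponentially with area, and SRAM can additionally repair defects with spare rows/columns. The die areas and defect density below are illustrative assumptions, not published figures:

```python
from math import exp

def poisson_yield(area_mm2: float, d0_per_cm2: float) -> float:
    """Classic Poisson yield model: Y = exp(-area * D0)."""
    return exp(-(area_mm2 / 100.0) * d0_per_cm2)

D0 = 0.1  # defects/cm^2 -- assumed defect density for a mature node
for name, area in [("~80 mm^2 compute die", 80.0),
                   ("~36 mm^2 cache die", 36.0)]:
    print(f"{name}: {poisson_yield(area, D0):.1%} yield")
# The small die yields better per attempt, and because SRAM is repetitive,
# spare rows/columns can repair many defects, raising effective yield further.
# Bonding yield for the stack itself is a separate, additional factor.
```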
Thanks for the reply, I learned a lot.
I was wrong about it being two stacks; it's only one. So the cache dies are much cheaper than I expected, but is joining them to the compute die simple and high-yield?
I think we're talking about different square root laws.