
Skylake-EP/EX 28 Cores L3-Cache 38.5 MB

csbin

Senior member
https://www.computerbase.de/2016-09/skylake-ep-28-kerne-server-cpu-cache/






 
Why does it have so little L3 cache? Broadwell Xeons top out at 60 MB of L3, so this is a serious reduction. Maybe they increased the speed of the cache; Xeons with lots of L3 have slower L3 than lower-end SKUs.
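The drop looks even starker per core. A quick sanity check, using the 38.5 MB / 28-core figure from the linked article and the 24-core Broadwell-EX E7-8890 v4 (60 MB L3) as the comparison point:

```python
# Per-core L3 comparison: Skylake-EP figure from the linked ComputerBase
# report vs. the 24-core Broadwell-EX E7-8890 v4 (60 MB L3).
skylake_l3_mb, skylake_cores = 38.5, 28
broadwell_l3_mb, broadwell_cores = 60.0, 24

skylake_per_core = skylake_l3_mb / skylake_cores      # 1.375 MB per core
broadwell_per_core = broadwell_l3_mb / broadwell_cores  # 2.5 MB per core

print(f"Skylake-EP:   {skylake_per_core:.3f} MB L3 per core")
print(f"Broadwell-EX: {broadwell_per_core:.3f} MB L3 per core")
```

So per-core L3 shrinks from 2.5 MB to 1.375 MB, roughly a 45% cut.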
 
I would strongly suspect that this means the chip will have a healthy helping of L4 eDRAM. From a performance perspective, it makes sense to have a smaller but faster L3 and a larger but slower L4. The real question is what the communication penalty is for an off-chip, but on-package, eDRAM cache, and whether the overall performance of the memory subsystem is substantially higher despite that penalty (in typical server workloads). For my own purposes (highly parallel, memory-intensive scientific-ish computing), it seems like a trade-off I would be more than happy to make. And, as someone who has only recently branched out into this area from theory, it would certainly make it easier to cache-optimize my data structures.
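One back-of-the-envelope way to frame that question is an average-memory-access-time (AMAT) calculation. All latencies and hit rates below are made-up placeholders purely to illustrate the trade-off, not measured numbers for any real part:

```python
def amat(levels, dram_latency):
    """Average memory access time for a hierarchy of (hit_latency, hit_rate)
    levels. Each access that reaches a level pays its hit latency; misses
    fall through to the next level and ultimately to DRAM."""
    time, miss = 0.0, 1.0
    for hit_latency, hit_rate in levels:
        time += miss * hit_latency   # fraction of accesses reaching this level
        miss *= (1.0 - hit_rate)     # fraction falling through to the next level
    return time + miss * dram_latency

# Placeholder latencies (cycles) and hit rates -- illustrative only.
# Option A: L1, L2, and a big-but-slow L3.
big_slow_l3 = amat([(4, 0.9), (12, 0.7), (50, 0.6)], dram_latency=200)
# Option B: L1, L2, a smaller/faster L3, plus an on-package eDRAM L4.
fast_l3_plus_l4 = amat([(4, 0.9), (12, 0.7), (35, 0.5), (90, 0.7)], dram_latency=200)

print(f"big slow L3 only:   {big_slow_l3:.1f} cycles")
print(f"fast L3 + eDRAM L4: {fast_l3_plus_l4:.1f} cycles")
```

Whether option B actually wins depends entirely on the L4's hit rate and the on-package hop latency, which is exactly the unknown here.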
 

IBM's Z series (mainframe) and POWER series systems already use this approach. I would not be surprised to see Intel follow suit.

I also suspect that Skylake Xeon will have a 512 KB L2, which is needed to feed AVX-512.
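For scale (my own arithmetic, not a confirmed spec): a 512-bit ZMM register is 64 bytes, exactly one cache line on Intel parts, so a larger L2 directly translates into more AVX-512-width vectors resident close to the core:

```python
# A 512-bit AVX-512 register spans exactly one 64-byte cache line,
# so L2 capacity in lines equals capacity in ZMM-width vectors.
zmm_bytes = 512 // 8
line_bytes = 64
assert zmm_bytes == line_bytes

for l2_kb in (256, 512):
    lines = l2_kb * 1024 // line_bytes
    print(f"{l2_kb} KB L2 holds {lines} cache lines ({lines} ZMM-width vectors)")
```

Doubling the L2 from Broadwell's 256 KB to 512 KB would double the vector working set that stays out of the (now smaller) L3.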
 