Mhmm. Like following the yellow brick road to find Alice's nemesis the queen and her archery chessboard played with humans.
They have to solve the latency issue first.
...because future AMD systems will support up to 64 MCA banks per CPU.
MAX_NR_BANKS is used to allocate a number of data structures, and it is
used as a ceiling for values read from MCG_CAP[Count]. Therefore, this
change will have no functional effect on existing systems with 32 or
fewer MCA banks per CPU.
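To make the "ceiling" point in that commit message concrete, here is a rough sketch of the clamping it describes. This is not the kernel code: `usable_banks` is a made-up helper name, and the only hardware fact assumed is that the bank count sits in the low 8 bits of MCG_CAP.

```python
# Rough model of how a compile-time ceiling interacts with MCG_CAP[Count].
# Assumption: the bank count occupies bits 7:0 of the MCG_CAP register.
MCG_BANKCNT_MASK = 0xFF


def usable_banks(mcg_cap: int, max_nr_banks: int) -> int:
    """Hypothetical helper: banks software will use, given the ceiling."""
    reported = mcg_cap & MCG_BANKCNT_MASK
    return min(reported, max_nr_banks)


if __name__ == "__main__":
    # Illustrative values only, not read from real hardware.
    for reported in (23, 32, 40, 64):
        old = usable_banks(reported, 32)
        new = usable_banks(reported, 64)
        print(f"reported={reported:2d}  ceiling 32 -> {old:2d}  ceiling 64 -> {new:2d}")
```

A system reporting 32 or fewer banks gets the same answer under either ceiling, which is why the change is a no-op for existing parts.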
When asked about what kind of performance gain Milan's CPU core microarchitecture, which is known as Zen 3, will deliver relative to the Zen 2 microarchitecture that Rome relies on in terms of instructions processed per CPU clock cycle (IPC), Norrod observed that -- unlike Zen 2, which was more of an evolution of the Zen microarchitecture that powers first-gen Epyc CPUs -- Zen 3 will be based on a completely new architecture.
Norrod did qualify his remarks by pointing out that Zen 2 delivered a bigger IPC gain than what's normal for an evolutionary upgrade -- AMD has said it's about 15% on average -- since it implemented some ideas that AMD originally had for Zen but had to leave on the cutting board. However, he also asserted that Zen 3 will deliver performance gains "right in line with what you would expect from an entirely new architecture."
How does that work with Zen 2 MCM packages? As we can see, the number of L3 blocks is 8, fitting for a single chiplet with 8 cores and their L3 slices, but not sufficient for multiple of them. Does Zen 2 have separate MCA banks per chiplet instead of per CPU, and might Zen 3 be unifying that as well?
Current MCA banks in Family 17h
The banks are logical, and in the end they produce status that is present in the MCA registers. Some are per core and some are global.
Don't think that's as big an issue as you claim. A long time ago, I remember hearing a GPU designer claim that they could work around latency issues fairly easily. I don't think there's a lot of branching as compared to most CPU programs.
Every time I go looking for new changes in the manuals and the kernel, I keep wondering about Forrest's comment about Zen 3.
I have seen and read that, at some point I just stopped bothering to respond.
There is no time for the signals to go through a crossbar switch. We are talking about a situation where wire delay alone is significant. A crossbar switch is made to switch any input to any output simultaneously. That isn’t a simple circuit, and the latency would be too high. You don’t need a crossbar switch for direct connection, since it is just one-to-many, not many-to-many. There would be some control circuitry, buffers or queues, etc., but basically, with each core connected directly to all slices, you just need a 1 to 4 multiplexor (very simple) on each slice and a 4 to 1 multiplexor on each core. There is no direct connection between slices. Connections are between cores and slices. The complexity is in determining the location of the cache line and accessing it with good latency, not in transferring it. That is a massive simplification, but I think the general idea is correct. Cache design has not been simple in a long time; it is just about the most important part of the chip.
The L2$ controllers aren't a crossbar, they are simple p2p switches. A crossbar would be all 8 L2 controllers being connected to a fully meshed switch which is then connected to each L3 slice. That, or I've completely forgotten what I learned from the EEs while working at an enterprise network hardware company.
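For anyone who wants to see the direct core-to-slice wiring described above as a toy model, here is a small sketch. Everything in it is assumed for illustration: 4 cores and 4 slices, a 64-byte line, and especially `slice_for`, which interleaves lines across slices by low address bits. The real slice-selection function is not something this thread establishes.

```python
from itertools import product

CORES = 4        # assumed: cores sharing the L3
SLICES = 4       # assumed: L3 slices
LINE_BYTES = 64  # assumed cache line size

# Every core has its own link to every slice; there are no slice-to-slice links.
links = set(product(range(CORES), range(SLICES)))


def slice_for(addr: int) -> int:
    """Hypothetical placement: interleave cache lines across slices."""
    return (addr // LINE_BYTES) % SLICES


def route(core: int, addr: int):
    """Return the single (core, slice) link a request from `core` would use."""
    link = (core, slice_for(addr))
    if link not in links:
        raise ValueError(f"no direct link for {link}")
    return link


if __name__ == "__main__":
    print(len(links), "point-to-point links")
    print("core 2, addr 0x1040 ->", route(2, 0x1040))
```

The point of the model is only that each request uses exactly one pre-wired link once the slice is known, so the hard part is the lookup, not the transfer.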
[L3 crossbar] is not the conventional interpretation
Both your and Vattila's proposals are topologically the same as far as 4xL3$ is concerned, just drawn differently.
Completely new architecture means completely new architecture. If it didn't it wouldn't be a completely new architecture.
Technically Ryzen was a completely new architecture and it gave us a 52% improvement. *ducks*
EDIT: Can you imagine Zen 3 having 50% higher performance? That would be nuts! Note that I don’t in any way think it will happen, but one can dream...
That 52% was compared to Bulldozer though, and literally anything would have been better than Bulldozer.
Given Forrest's comments and Zen 2's 15% IPC uplift over Zen 1, I'd consider a Zen 2 to Zen 3 IPC uplift below 20% to be disappointing.
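For scale, here is what those percentages look like when stacked. This is just arithmetic on the numbers quoted in the thread (about 15% for Zen 1 to Zen 2, and the 20% bar set above for Zen 2 to Zen 3); it is not a prediction, and `compound_gain` is a made-up helper.

```python
def compound_gain(*gains: float) -> float:
    """Combine successive fractional uplifts (0.15 means +15%) into one."""
    total = 1.0
    for g in gains:
        total *= 1.0 + g
    return total - 1.0


if __name__ == "__main__":
    # 15% followed by a hypothetical 20% uplift.
    print(f"{compound_gain(0.15, 0.20):.0%} over Zen 1")
```

Uplifts multiply rather than add, so a 20% step on top of 15% lands at 1.15 x 1.20 = 1.38, or 38% over the first generation.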
>=20% is happening. Apparently, Zen 4 is even bigger. Fun times ahead.
Not correct: AMD compared Zen (Summit Ridge) to Excavator (2015). Bulldozer was the first generation, released in October 2011.
https://en.wikipedia.org/wiki/Bulldozer_(microarchitecture)
https://en.wikipedia.org/wiki/Excavator_(microarchitecture)
Excavator was still a pile of garbage though. Lipstick on a pig if you will.
Zen3 features a different CCX layout implying a new L3 compared to family 17h. This fact alone makes it a different AMD architecture...
Why, what is that? Snails on a baguette?
I just love that expression in English.
French isn't as funny as that one.