64 core EPYC Rome (Zen2) Architecture Overview?

Mar 11, 2004
18,183
586
126
I was gonna say, wasn't there talk that K12 and Zen were being co-developed with similar ideas behind them? Just one being x86 and the other ARM. Which, especially in hindsight, they were smart to focus development, but I do wonder if they'd be able to apply Zen ideas to ARM for benefits to get into that space. I feel like AMD should at least consider ARM, if for no other reason than to offer hybrid designs, where they could have a bank of ARM cores and a bank of AMD x86 cores. I think that would help them out in the laptop space, where on battery it relies more on the ARM stuff, but then docked, or for certain use cases like gaming, it uses the x86 stuff. Really, there's seemingly nothing that would stop them from integrating another company's ARM chip just like they do their own modules, so they could slap a Snapdragon chip in there and offer all the benefits it has but also x86 compatibility. Which I think would be the way to do the Surface Book design: have the ARM chip in the tablet part so it falls back to that, and then have the x86 and GPU in the laptop base.

Which, by now the standard ARM cores are probably as good or better than what K12 would have been (maybe not though?), so ditch the custom ARM and just integrate standard ones for low power use. Plus it would let them do Chrome boxes/books, and they could even tout it as a developer feature (where, say, they could run the ARM stuff in a VM so they could test out their Android code). I have a hunch it'd be beneficial for consoles too (especially if they want to win back Nintendo's favor).

Interesting about combining the good parts of Bulldozer and Jaguar. Which, the latter was probably where the CCX design came from, no? I know the console chips, or maybe it's just the Xbox, had Jaguar cores in two groups of 4 cores. Not sure what was positive about Bulldozer, although there probably were some aspects; if nothing else, some of the logic blocks could probably be ported over, and then they could improve them a bit at a time (kinda like they did with the 3rd party memory controller).
 

amd6502

Senior member
Apr 21, 2017
505
66
86
I kind of hope they stay far away from ARM cores. There isn't much wrong with licensing available cores, and those have come along pretty well; so maybe at most, a shoestring budget project like an A1100 successor. For a sister core, it seems like one would have to pour a whole lot of resources and time into reinventing the wheel just to make it somewhat better. There is an interesting Register article on the history of AMD's A1100 acorn server soc "Seattle", see reddit thread and link: https://old.reddit.com/r/Amd/comments/a13vwc/the_register_amazons_homegrown_23ghz_64bit/

Back then when the leading core (A57) was so weak may have been the time when one would think, hey "let's do our own much better version". Yet in hindsight that may not have paid off, as along came the A73 (in less time than one could have finished something homegrown).
 

moinmoin

Senior member
Jun 1, 2017
736
233
96
Note that AMD does still use ARM as part of their SoCs, specifically for TrustZone, likely also for other parts (I'd suspect the SCF with all its sensors and all the fancy PR names for "SenseMI" technology is internally running on ARM). ARM just isn't used as a performance product itself like K12 should have been, and that's a good decision since ARM competition is less predictable while within the x86 ecosystem AMD will always be a valid second source in the worst case (something they are already exploiting well using their semi custom business).
 
Mar 11, 2004
18,183
586
126
I kind of hope they stay far away from ARM cores. There isn't much wrong with licensing available cores, and those have come along pretty well; so maybe at most, a shoestring budget project like an A1100 successor. For a sister core, it seems like one would have to pour a whole lot of resources and time into reinventing the wheel just to make it somewhat better. There is an interesting Register article on the history of AMD's A1100 acorn server soc "Seattle", see reddit thread and link: https://old.reddit.com/r/Amd/comments/a13vwc/the_register_amazons_homegrown_23ghz_64bit/

Back then when the leading core (A57) was so weak may have been the time when one would think, hey "let's do our own much better version". Yet in hindsight that may not have paid off, as along came the A73 (in less time than one could have finished something homegrown).
Yeah, the standard ARM designs are good enough that they'd be good to go with. Actually would probably be the best route just for the software alone (no specialization needed like a custom core likely would).

My argument is that, instead of trying to build Zen to scale down (not that it's horrible, and especially Zen 2 should be good in laptops, but it won't touch ARM in low power), just use ARM. Pair it with Zen (kinda like how they're approaching memory/storage as a tiered system, do the same with processors) to offer hybrid setups, where on battery or idle it uses ARM. It would let them get into markets they currently aren't in or don't compete well in (Chromebooks, tablets, laptops in general), and it's not just lower cost stuff; I think it could help them get into premium segments better as well (stuff like the Surface line; if I were AMD I'd have a Surface Book like 2-in-1/convertible, where the ARM chip is in the tablet part and the CPU/GPU in the base). They could partner with other companies who would focus on the ARM stuff, saving AMD the resources of that.

I think getting in good with Google and Microsoft would be beneficial for software development as well, so it's more than just the sales. Google is working on a hybrid OS that fuses ChromeOS and Android, which I think would be a great fit for the type of thing I'm talking about. And I think it'd be a good fit for Surface (the Pro as well, where on battery it could focus on the ARM, and then it could have a dock to enable higher performance kinda like the Switch does - which there was a Surface dock before), and Microsoft is doing a push for Windows on ARM (because I think Microsoft recognizes that ARM is now powerful enough to start expanding beyond phones and even tablets). I think it would be a good fit for Apple too (MacBook Pro, where it has an Apple SoC but also like an 8+ core Zen 2 CPU with a decent AMD GPU; I'd think that'd be an Apple developer's dream - plenty of power for Mac development and editing/content creation, with the ability to test iOS code right on the chip).

I think working with ARM companies would help their console/custom business too. And maybe they can license their GPU for use in other companies' ARM designs. So it's key they get their GPU designs to improve substantially (especially in efficiency). Oh, and speaking of GPU, I think these types of designs would be well suited for external GPU boxes/docks, so they could push thin and light laptops that have usable battery life because of ARM, but can offer substantial power when docked.

ARM SoCs would also help them with networking stuff, as it would let them add cellular (and even just Wi-Fi and the like) without having to do their own.

I think it will also be important for their future. I think it's just a matter of time until ARM takes over most consumer devices. It's angling for laptops, but I think as consumer stuff becomes more of a simplified terminal relying on a cloud backend to do the heavy lifting (keep Zen development for that), ARM will become the norm because that stuff will focus on small size, low power, and connectivity. Stuff like AR glasses, where the ARM chip is mostly just handling I/O and communication.

Note that AMD does still use ARM as part of their SoCs, specifically for TrustZone, likely also for other parts (I'd suspect the SCF with all its sensors and all the fancy PR names for "SenseMI" technology is internally running on ARM). ARM just isn't used as a performance product itself like K12 should have been, and that's a good decision since ARM competition is less predictable while within the x86 ecosystem AMD will always be a valid second source in the worst case (something they are already exploiting well using their semi custom business).
Right, but that's very different from what I'm talking about.

Absolutely, I think they'd be smart to eschew customizing ARM and instead just use the standard design or partnering with other companies that are doing ARM stuff.
 

jpiniero

Diamond Member
Oct 1, 2010
6,287
197
126
big.LITTLE on different architectures would be weird. If they were going to do this, Jaguar would be the pick, especially if they can get Sony/MS to pay for the majority of the design costs of shrinking it to 7.
 

JohnTheHero

Junior Member
Nov 19, 2018
1
2
41
big.LITTLE of two different architectures would be a nightmare for developers. That would mean two totally different instruction sets, and therefore you would actually need two different binaries. Doesn't ARM allow big.LITTLE only between selected compatible cores of the same ISA variant?
I cannot even imagine how ISA-heterogeneous SoCs would work; how do you migrate a thread from big to little? If you cannot migrate a task at runtime (application restart required), or if it requires too much work to do so (takes too much time and energy), then it wouldn't make sense to implement it.
I guess the ISA-variant compatibility problem also stands in the way of a Zen-cat mix.
 
Apr 27, 2000
11,446
790
126
ISA heterogeneous SoCs already exist. AMD has had ARM cores on-die for a while, with TrustZone etc. Now from a code perspective, it's more complicated since you rarely (if ever) address a chip like TrustZone from userspace.

Note that big.LITTLE is (as of now) out-of-date. It's all DynamIQ now. Which is basically the same thing, except that the core configurations are now more flexible.

Furthermore, I see no reason for AMD to continue using cat cores. Development on those has mostly ended. Low-power Zen would outperform cat cores easily. Do not underestimate what Zen can do at low voltages; AMD has simply chosen not to target any market (yet) where that would be viable, since the margins are so low.
 

DisEnchantment

Senior member
Mar 3, 2017
220
21
106
Interesting that you mention this. I was digging through some patents and also watching a Coreteks video (at 1315 seconds) about HSA in AMD's context.


Here, US Patent Application #20190129489:

http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=/netahtml/PTO/srchnum.html&r=1&f=G&l=50&s1="20190129489".PGNR.&OS=DN/20190129489&RS=DN/20190129489


HSA.png

A heterogeneous processor system, comprising: a first processor implementing an instruction set architecture (ISA) including a set of ISA features and configured to support a first subset of the set of ISA features; and a second processor implementing the ISA including the set of ISA features and configured to support a second subset of the set of ISA features, wherein the first subset and the second subset of the set of ISA features are different from each other.
They are talking about big and small clusters. Could this be arm64 + x64, or something like big.LITTLE like Intel? This application is from 2017, so AMD has probably given it at least as much thought as Intel has recently.
And the patent mentions a completely different ISA.

AMD has had big visions for HSA for some time now. It may not result in a product, but it's nonetheless very interesting.


Interesting when you consider that they have many patents tied to their fabric:
- CPUs with GPUs [there are many for this]
- CPUs and FPGAs [App #20190028752]
- CPU with Video Codec and Inference Engine [App #20190042313]

AMD has a new term for these: APDs, Accelerated Processing Devices.
 

amd6502

Senior member
Apr 21, 2017
505
66
86
That is a very interesting patent link. The purpose of it seems to be to extend the power and performance range of big cores, which would be a big asset for mobile. The patent refers to the high feature processor as #1 and low feature processor as #2 and having power conserving features. Both #1 and #2 share a basic ISA but also contain extensions, or subsets of instructions outside of the basic ISA that differ.
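To make that reading concrete, here is a minimal sketch of how a scheduler might use such feature subsets: a thread goes to the cheapest core whose subset covers everything the thread needs. This is only an illustration of the idea as described in the post, not the mechanism claimed in the patent; the `Core` class, `pick_core`, the feature names, and the power numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    name: str
    features: frozenset  # ISA features this core supports (hypothetical names)
    power_cost: int      # relative power draw, lower is cheaper (made-up units)


# Both cores share the base ISA but differ in which extensions they support.
BIG = Core("big", frozenset({"base", "simd_wide", "crypto"}), power_cost=10)
SMALL = Core("small", frozenset({"base", "crypto"}), power_cost=2)


def pick_core(required, cores=(SMALL, BIG)):
    """Return the cheapest core whose feature set covers `required`."""
    candidates = [c for c in cores if set(required) <= c.features]
    if not candidates:
        raise ValueError(f"no core supports {sorted(required)}")
    return min(candidates, key=lambda c: c.power_cost)


print(pick_core({"base"}).name)               # only needs the shared base ISA
print(pick_core({"base", "simd_wide"}).name)  # needs an extension only the big core has
```

A thread that only uses the shared base ISA lands on the small core, while one that needs an extension outside the small core's subset has to stay on the big core.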

So the Coreteks video is way off base in taking the word heterogeneous to mean differing architectures such as x86 and acorn, or x86 and GCN. (Neither of these pairs has a common ISA.)

My guess is that there is another architecture in the works, probably an extension of Jaguar, but with added coarse grain multithreading plus FPU SMT, and adapted to a highly efficient process such as 7nm+ or 6nm, and optimized, like Puma. This allows offloading of threads from Zen cores during low p-states and when underutilization conditions are detected, allowing Zen cores to sleep.

As far as the acorn server market, AMD has been testing the waters with the A1100 series. So far I think the acorn market is saturated and slow, but there are some niche pockets. https://www.theregister.co.uk/2019/05/01/softiron_hybrid_arrays/

A high cost 7nm project would be hard to justify, but I could see an A1100 successor on a low cost node like 22FDX. I kind of wonder if they could be in chiplet form and reuse the Rome IO hub.
 

Yotsugi

Senior member
Oct 16, 2017
794
196
96
My guess is that there is another architecture in the works
No.
It's all yearly Zen iterations.
AMD is not, and will never be interested in anything but highest performance parts, as long as Lisa is the CEO.
 

amd6502

Senior member
Apr 21, 2017
505
66
86
Unlikely.

You can also think of it as a low power extension to Zen.

Zen is somewhat under 200M transistors (180-190? with L1+L2)
Jaguar would be under 40M with similar L2, maybe ~30M with minimal L2.
i386 is around 1/4 M transistors.
Zen2 likely is well above 200M transistors (~300M wild guess)

So you add 10% to 20% to that if you strap one or a pair of Jaguar++ to a Zen core as low power extension (and with the option of having extra threads as bonus).
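The arithmetic behind that 10% to 20% figure can be checked quickly. The sketch below uses the rough per-core guesses from this post (about 300M transistors for Zen2, about 30M for a Jaguar-class core with minimal L2); the constant and function names are just for illustration.

```python
# Per-core transistor estimates in millions, taken from the guesses above.
ZEN2_MTRANSISTORS = 300   # "wild guess" for a Zen2 core
JAGUAR_MTRANSISTORS = 30  # Jaguar-class core with minimal L2


def overhead(num_small, small=JAGUAR_MTRANSISTORS, big=ZEN2_MTRANSISTORS):
    """Fraction of extra transistors from adding small cores to one big core."""
    return num_small * small / big


for n in (1, 2):
    print(f"{n} small core(s): +{overhead(n):.0%}")
```

With these inputs, one small core adds 10% and a pair adds 20%. Using the 40M upper figure for Jaguar instead would push that to roughly 13% and 27%.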

So a mobile oriented mini-CCX would consist of a 2c/4t big-core part plus 4c Jag++, with the SoC now having 4 big threads and 4 small threads. During idle and low-demand conditions it would conserve a lot of power compared to a full-CCX Zen 1 or 2.
 
Apr 27, 2000
11,446
790
126
AMD has had big visions for HSA for some time now. It may not result in a product, but it's nonetheless very interesting.


Interesting when you consider that they have many patents tied around their fabric.
Also interesting when you look at these patents while taking CCIX into account. Also see some of the issues NV is having trying to put together useful render clusters featuring a multitude of their dGPUs connected by NVLink (using UVM).
 

DisEnchantment

Senior member
Mar 3, 2017
220
21
106
When you think about it:
One CCX of high-powered cores, another CCX of low-powered cores, and with something like a dynamic Ryzen Master kind of functionality they can switch the CCXs on and off, like how it is now, to schedule the tasks on one or the other.
Now I remember there is another patent application to move execution context across cores (#20180113797).

Does not seem far fetched to implement given that they have the CCX concept already in place.
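As a rough sketch of what such switching could look like, the toy policy below picks the active CCX from a utilisation figure, with hysteresis so it does not flap when load hovers near one threshold. Everything here (the `choose_ccx` name, the two thresholds, the load samples) is hypothetical and not taken from any AMD patent or tool.

```python
def choose_ccx(load, current, up=0.60, down=0.30):
    """Pick which CCX should be active for a given utilisation (0.0 to 1.0).

    Hysteresis: move to the big CCX only above `up`, and back to the small
    one only below `down`, so the choice is stable between the thresholds.
    """
    if current == "small" and load > up:
        return "big"
    if current == "big" and load < down:
        return "small"
    return current


active = "small"
for load in (0.10, 0.50, 0.70, 0.50, 0.20):
    active = choose_ccx(load, active)
    print(f"load={load:.2f} -> {active} CCX")
```

The gap between the two thresholds is the design choice that matters: a real implementation would also have to account for the cost of moving execution context between the CCXs.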
 
Apr 27, 2000
11,446
790
126
I'm not sure if they can shut down an entire CCX fully. IF links would probably still be active. Otherwise, I see no reason why they couldn't do that. ARM has been load-and-power balancing between atypical cores for years. It would require more work on the Windows scheduler (which MS is already doing/has already done for Win10-on-ARM anyway).
 

Tuna-Fish

Senior member
Mar 4, 2011
952
151
136

moinmoin

Senior member
Jun 1, 2017
736
233
96
I wonder who was the target audience?
It was presented at the GDC so game developers I guess? ;)
To me it seems more like a primer to get developers already familiar with Jaguar in PS4 and Xbox One used to what is coming in PS5 and likely the next Xbox as well. It's telling that the presenter is part of the Radeon Technologies Group (so primarily graphics, not CPU), which also handles the semi-custom business (including consoles) nowadays.
 

DisEnchantment

Senior member
Mar 3, 2017
220
21
106
See this post on Twitter on the possible moves by AMD.


I also happen to be following AMD's patent applications regularly, and the following topics have indeed become the norm:
- 3d stacked memory, stacked GPU/Accelerators are very frequent
- versatile interconnect to plug in FPGAs/GPUs/Accelerators/Other CPUs etc are common
- integrated thermoelectric layer between stacked dies
- Integrated NV memory
 

amd6502

Senior member
Apr 21, 2017
505
66
86
Neat pointers. I like the integrated Peltier effect patent. That could speed up heat flux over natural diffusion. Maybe it would slightly help address some of the hot spot issues of 7nm and very small nodes. Somebody on the Ryzen thread mentioned TSMC 7nm is HP oriented. While 7nm should be great for servers (3-4 GHz), the same can't be taken for granted for HEDT frequencies past 4.5 GHz; it just kind of boggles my intuition that you can pump all that power through these smaller nodes and make good frequency gains over 12nm.
 

