What’s the fate of K12/ARM at AMD?


DrMrLordX

Lifer
Apr 27, 2000
20,272
9,327
136
Wasn't K12 planned to have the same backend as their x86-64 cores?
Near as I can tell, yes.

How then do you imagine it would end up much faster? Maybe smaller but Zen is already minuscule.
Wait, what is supposed to be faster than what here? I'm confused.

The crux of my argument is that AMD has no incentive to go ARM when other companies are already rolling out cores that would steamroll K12 from 2016/2017 on a per-core basis. I know that A12 makes my 1800x @ 4.0 GHz look bad in GB4 ST. Which is kind of embarrassing.

And RPI 4...
RPI4 is only A72. It's decent for what it is. I'm more interested in Rockchip's upcoming A76 SoC. Pine will probably have one of those on an SBC.
 

Thala

Golden Member
Nov 12, 2014
1,351
651
136
Did he say how much? Most studies have shown that ISA is nearly irrelevant for modern compiler-produced machine code. Although I suppose a smaller decoder would allow for larger buffers, queues and caches.
Nope, he did not give any numbers. It was purely qualitative. And the ISA is very relevant, as it is the interface between HW and SW. In addition there is the memory model, which is drastically different between contemporary RISC architectures and x86/x64.
The argument that x64 instructions are translated into RISC-like uops anyway is a very simplified view.

Not sure what studies you have read, but evidence shows the contrary.

Coming back to K12 - if it had been just Zen (size and performance) with a different instruction set, it would not be competitive with recent ARM designs.
 

Thala

Golden Member
Nov 12, 2014
1,351
651
136
RPI4 is only A72. It's decent for what it is. I'm more interested in Rockchip's upcoming A76 SoC. Pine will probably have one of those on an SBC.
Yup, I was a bit disappointed that the RPI4 SoC is still 28nm - this puts quite a limit on performance. I assume you could fit an A76 at 10nm into the same or an even lower power envelope.
Also they should have put a Mali GPU in there - it has much broader documentation and SW support.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,422
2,390
136
Nope, he did not give any numbers. It was purely qualitative. And the ISA is very relevant, as it is the interface between HW and SW. In addition there is the memory model, which is drastically different between contemporary RISC architectures and x86/x64.
The argument that x64 instructions are translated into RISC-like uops anyway is a very simplified view.

Not sure what studies you have read, but evidence shows the contrary.

Coming back to K12 - if it had been just Zen (size and performance) with a different instruction set, it would not be competitive with recent ARM designs.
It was the SkyBridge talk, and he said the K12 OoOE engine was 10% bigger than the x86 one. So that's only going to be a point or two of IPC.
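
A rough back-of-envelope sketch of that estimate, assuming Pollack's rule (performance scales roughly with the square root of core area). The share of core area taken by the out-of-order engine is a hypothetical parameter here, not a number from the talk, and the function name is made up for illustration:

```python
import math


def ipc_gain_pollack(engine_growth: float, engine_share: float) -> float:
    """Estimate fractional IPC gain under Pollack's rule (perf ~ sqrt(area)).

    engine_growth: fractional growth of the OoO engine (0.10 means 10% bigger).
    engine_share:  ASSUMED share of total core area used by the OoO engine.
    """
    # Only the OoO engine grows, so the whole core grows by growth * share.
    core_growth = 1.0 + engine_growth * engine_share
    return math.sqrt(core_growth) - 1.0


if __name__ == "__main__":
    # The shares below are illustrative guesses, not measured values.
    for share in (0.3, 0.5, 1.0):
        gain = ipc_gain_pollack(0.10, share)
        print(f"engine share {share:.0%}: ~{gain:.1%} IPC")
```

With the engine at roughly a third of the core this lands between one and two percent, and even if the whole core grew by 10% the rule gives under five percent, so "a point or two" is in the right ballpark under these assumptions.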
 

Thala

Golden Member
Nov 12, 2014
1,351
651
136
How do you figure? VideoCore VI is supported by the V3D driver, which has been in Mesa for over a year.
In this case it seems I did not notice that there is a publicly available VideoCore VI Mesa branch - I thought the latest was VideoCore V. So I take back my claim about better SW support for Mali.
 

Thala

Golden Member
Nov 12, 2014
1,351
651
136
It was the SkyBridge talk, and he said the K12 OoOE engine was 10% bigger than the x86 one. So that's only going to be a point or two of IPC.
I was referring to the AMD core innovation summit 2014, where Keller said that the ARM version was built with a bigger engine, but he did not mention any numbers at this event.
 

gdansk

Golden Member
Feb 8, 2011
1,209
1,182
136
Nope, he did not give any numbers. It was purely qualitative. And the ISA is very relevant, as it is the interface between HW and SW. In addition there is the memory model, which is drastically different between contemporary RISC architectures and x86/x64.
The argument that x64 instructions are translated into RISC-like uops anyway is a very simplified view.

Not sure what studies you have read, but evidence shows the contrary.

Coming back to K12 - if it had been just Zen (size and performance) with a different instruction set, it would not be competitive with recent ARM designs.
Blem, for example, finds that ISA is a toss-up.

Regardless, I don't think Zen 1/2 with an ARMv8 front-end would be appealing. There's too much software that works better with x64. Very few companies are willing to invest the money to port and optimize software to ARMv8 for a hypothetical 5-10% performance gain when they could just wait a year for AMD/Intel's new slightly faster uarch instead. Or for the number of cores to double again.
 
Mar 11, 2004
22,144
4,472
146
I see more opportunity for a coprocessor than a replacement chip. Maybe it’s the tinkerer in me, but I would just like some sort of secondary chipset to revive serial analog connections and interface with various new/legacy hardware. I hate that laptops are going to all USB all the time. I have to service too much old machinery that needs legacy connectors, and if I had a configurable serial interface I could slap together a port monitor and read and format the data I need without old legacy equipment... idk, it’s a bit of a pipe dream
Yeah, gonna say AMD probably isn't interested in making a chip to cater to your niche of a niche. I actually think there likely already exists ARM based stuff that does what you want (there's lots of Raspberry Pi and other "tinker" boards with tons of different uses and I could swear I've seen someone that made one just to easily interface with older serial interfaces for various reasons like you're apparently looking to do).

I also don't know why you don't like USB; it sounds like it's making it so you wouldn't need such a thing?

K12 appears to be shelved. I don't think it is in AMD's interests to encourage ARM servers. Right now the server AMD64 CPU market is basically a duopoly, and AMD and Intel aren't giving out IP licenses to allow for new competitors- whereas ARM will happily sell a license to anyone with a big enough cheque book. Better for AMD to let the ARM market flounder and die, and keep a nice big chunk of a restricted market.
Yeah K12 is dead. If AMD were to go ARM it'd almost certainly be using standard ARM designs, possibly tweaked some. I think AMD would look to leverage their other IP (stuff like their GPU, InfinityFabric) more as their means of differentiating their ARM products. They'd also likely target server markets since there's not a lot of ARM competition and it would let them leverage more of their IP.

I've asked this before: why wouldn't AMD buddy up with ARM to develop a new high performance ARM core (maybe for servers or something), where AMD could license IP like InfinityFabric and/or their GPU (which gives them a leg up on getting software built for their hardware)? AMD gets to influence the core design (which means they'll know what's in it before any other licensee, so they'll have a near perpetual leg up on the competition) and gets IP licensing, while hedging their bets should there be any issues with x86 licensing.

ARM isn't floundering or dying. It never really took off in the server space, if that's what you're talking about. But I think that absolutely could change. There's going to be a lot of change in that market in the next 5-10 years, and I feel like ARM is likely more adaptable to changes than x86 is, though AMD and Intel are making that possibly not true. I think that is an area where ARM's licensing model is actually somewhat limiting (by that I mean that they develop a core and other aspects, then companies can license it and build from there, but there's an inherent delay in the development). x86 looks like it'll be doing very well (i.e. dominate outside of mobile) for another 5 years. Past that though, we'll see. But by then I think x86 vs ARM will probably be even less of an issue/factor, so it might be entirely moot. I think the issue will be more about how those integrate into the overall system. It would be my guess that ARM will be more adaptable to a new programming chain where maybe GPUs would be the dominant piece programmed for. Although maybe it's about what can do the control core aspect the best (i.e. where it's dictating data to and from specialized hardware), and I feel like x86 has an edge there (but don't know if that's true or not).

The new iPad OS is a very strong push towards Mac OS functionality, I would not be entirely surprised to find that it has even more in common with the Mac OS codebase than iOS does.

Especially considering it addresses a lot of the productivity functionality that was missing in iOS that would be necessary for migrating Mac apps to iOS/iPadOS and ARM.
Actually I'd guess it's the other way around: Apple made iPad OS because they're slowly making MacOS more like iOS, but they kinda need to split iOS into a higher performance version for higher end hardware, while they can keep tailoring regular iOS to phones and other devices. Although, one of the biggest announcements from the recent Apple dev conference is some method of translating iOS apps onto MacOS, as I think they're looking to converge development, so it could be argued that they're more going the route you said.

Which, I wonder why they wouldn't just run iOS on Mac in a VM. And then maybe even in the future do the same with MacOS. Where they can maintain compatibility, and then use whichever one as they see fit. Maybe even do a third VM that is just the GUI so they can tailor that to individual products. But then I think we'll see things simplify by the time they'd get around to that (basically I think AR glasses will become the dominant base computing form factor, powered by cloud processing although initially it'll probably tether to phone/tablet/laptop/desktop as an intermediary until we get wireless networking that could link to the cloud platforms; but instead of having a phone display or tablet display or other monitor, most people will be using AR glasses, so UI will target that). From what I've gathered there's very little (almost no) performance penalty these days and the VM environments can have low enough level access to the hardware, but can also be isolated for security and stability.

This seems like one of those instances where Microsoft has a good idea (that's essentially what Microsoft has been doing on the Xbox One, as they've been moving Xbox and even Windows toward being services), but because it's Microsoft, Apple has to try and accomplish the same thing in a different way just because. Kinda like how they used to say the Surface line is bad and flawed by being a 2 in 1 and merging tablet and laptop, but then have been doing exactly that (and even started marketing the iPad Pro as a computer and not a tablet).

Wasn't K12 planned to have the same backend as their x86-64 cores? How then do you imagine it would end up much faster? Maybe smaller but Zen is already minuscule.

They developed a high performance x86-64 core around 4mm^2 with up to 64 in a single socket. I can see why they abandoned the ARM route. Instead they'll bring more cores to the x86-64 market. It seems like a good strategy when ARM servers are still a rounding error in the server market. Put another way: they could develop an ARM chip similar to Rome, but how much smaller would that market be?
That's the thing: arguably I think ARM would let AMD cram more cores into a die than x86 (which is why I think they were co-developing them, as they likely could see the writing on the wall that ARM would keep advancing, and ARM likely would let them stuff more cores in for the markets that they were targeting - servers/HPC/etc). Zen is a good design, but that means K12 likely would've been similarly good.

I think what made AMD balk is that the consumer ARM space is crowded, so they'd really struggle to make inroads there, and the software wasn't mature for that (Windows being the key one, and Android was still a mess so it wasn't going to be a viable alternative to Windows for computing devices). Plus at the time they were likely deciding this stuff, Apple had started to make their own ARM chips and had just started to make custom ones that were opening people's eyes about ARM's potential (so AMD wouldn't be able to court Apple with ARM cores). Point being, I think there were a lot of reasons why AMD decided to focus on x86, and I think it was a smart move for the time.

But I think ARM has potential, and I think it has more potential now than it did before. Much of what made Zen good could be applied to ARM (InfinityFabric, I/O die, CPU die modules, etc). So I wouldn't rule out AMD looking at ARM in the future. They don't need to now or for the foreseeable future, and x86 hasn't been a hindrance yet (and possibly won't be; weirdly it seems like a lot of people think that ARM is going to be eclipsed by MIPS or something, which I'm just not seeing at all). But I feel like we'll see ARM show some benefits that could help it start to win out versus x86. Most if not all of the things advancing x86 processors these days can be utilized almost directly in ARM. I think the next test will be which one can transition to this next level of memory/storage tiers and unified addressing, as that's what will fuel performance the most. Dropping overall system data latency is the next big computing performance step. It's Intel's focus (I think it's why they're looking at GPUs, as they can integrate their own interconnects and work to minimize latency for large datasets; of course they use the GPU for processing the datasets so they can make money from it, but unifying aspects of the overall platform, much in the same way that AMD is looking at doing, means they can offer a jump in overall performance that CPU advancement or GPU advancement alone couldn't achieve; and likewise they can't get that from trying to work with other companies, where driver software and the like comes into play - that can be mitigated with highly tailored software for supercomputers, but for the more general HPC/enterprise markets it can't).

One area that I think has potential in the HPC space: I've wondered if we might even see ARM be able to realize the idea behind Intel's Larrabee. Granted, GPUs have since offered most of that by becoming much more programmable, but the thought behind it was cramming in as many programmable cores as possible. Intel couldn't realize that and only kinda half-heartedly tried at it (I think they talked about like 512 core versions, but we only got like 7x something core chips - which even with 4 way multi-threading only offered 2xx threads). Even that was good for some markets. ARM I think could cram in a lot more, or I think it can be integrated into other processors (think how AMD was looking to merge CPU and GPU).

Two last things to note. Most of these new processes seem built with ARM designs more in mind, so they seem to favor ARM designs somewhat inherently. The other thing to note is that Nvidia, almost out of necessity, is planning on pushing ARM more, since AMD and Intel both now have reasons to prioritize their own stuff. I think Nvidia saw that coming, and that's why they started on Tegra (since that happened around the time that Intel showed substantially more interest in GPUs with the iGPU advancement in Sandy Bridge, and AMD had been talking about that for years), especially the custom cores. But they're talking about integrating ARM into their GPUs even for HPC markets. Maybe they'd integrate an ARM core into their base GPU SM block, so however many SMs a chip has, it'd have that many ARM cores. I'd guess it'd be more like a block of them with 1/2 to 1/4 CPU core to SM core count.
 

Thala

Golden Member
Nov 12, 2014
1,351
651
136
Blem, for example, finds that ISA is a toss-up.

Regardless, I don't think Zen 1/2 with an ARMv8 front-end would be appealing. There's too much software that works better with x64. Very few companies are willing to invest the money to port and optimize software to ARMv8 for a hypothetical 5-10% performance gain when they could just wait a year for AMD/Intel's new slightly faster uarch instead. Or for the number of cores to double again.
That's the thing: with Moore's Law slowing down, efficiency is the single most important metric. It allows you to extract the most performance out of a given area, power and thermal budget. We are no longer in a situation where "waiting a year" solves issues.

Regarding the paper, some of the numbers seem very dubious, in particular on the performance side - the methodology used is largely OK though. Further, Intel's 45nm process already had high-k metal gate, reducing static power quite a bit compared to the 45nm ARMs. Also Intel's Atom cores are largely hard IP and full custom blocks - while the OMAPs certainly use soft IP. This kicks in when they apply their DVFS scaling and artificially reduce the voltage of Atom. And most importantly, many of the advantages of implementing a RISC-like ISA only kick in when looking at wide OoOE architectures - where the goal is to extract as much instruction level parallelism as possible. Just look at more recent ISA developments like RISC-V.
Today we have ARM cores with a very streamlined ARMv8 ISA with similar or better per-cycle performance than the best x86 designs at higher efficiency points. In one of the Microprocessor Report issues from last year there is an article comparing Cortex A76 to Skylake for example - here is where things really get interesting.

Otherwise I agree, Zen with ARMv8 (aka K12) with somewhat higher performance would not be particularly impressive in the ARM landscape, where you can find cores half as large at a similar per-cycle performance level.
 

SarahKerrigan

Senior member
Oct 12, 2014
229
253
136
That's the thing: with Moore's Law slowing down, efficiency is the single most important metric. It allows you to extract the most performance out of a given area, power and thermal budget. We are no longer in a situation where "waiting a year" solves issues.

Regarding the paper, some of the numbers seem very dubious, in particular on the performance side - the methodology used is largely OK though. Further, Intel's 45nm process already had high-k metal gate, reducing static power quite a bit compared to the 45nm ARMs. Also Intel's Atom cores are largely hard IP and full custom blocks - while the OMAPs certainly use soft IP. This kicks in when they apply their DVFS scaling and artificially reduce the voltage of Atom. And most importantly, many of the advantages of implementing a RISC-like ISA only kick in when looking at wide OoOE architectures - where the goal is to extract as much instruction level parallelism as possible. Just look at more recent ISA developments like RISC-V.
Today we have ARM cores with a very streamlined ARMv8 ISA with similar or better per-cycle performance than the best x86 designs at higher efficiency points. In one of the Microprocessor Report issues from last year there is an article comparing Cortex A76 to Skylake for example - here is where things really get interesting.

Otherwise I agree, Zen with ARMv8 (aka K12) with somewhat higher performance would not be particularly impressive in the ARM landscape, where you can find cores half as large at a similar per-cycle performance level.
Yes, the A76 and A77 deliver very impressive iso-clock performance, but I'm not aware of either of them being shipped or announced in excess of 3GHz - so concluding that ARM is inherently better than x86 based on that is, I think, dubious. Throwing RISC-V into the mix is even stranger, since no announced RISC-V core I'm aware of thus far has iso-clock performance above Cortex-A72.

I'm also not sure what you're referring to by "advantages of implementing a RISC-like ISA." I write a lot of low-level code on wide, aggressive RISC cores (Power) and single-thread is merely good, not amazing. As far as I know, in absolute general-purpose single-thread performance, x86 is hard to beat right now; the mainframe folks (mainly z) may match or exceed it but those, too, are CISC. Apple is probably very close, and I'm curious what kind of clock headroom they have if thermal and power limitations are removed.

Anyway, none of this is intended as a criticism of ARM; I think their microarchitectures are absolutely excellent and the roadmap looks solid on both the client and Neoverse sides. There is potential for making real gains in the PC ecosystem, if the stars align properly - but those stars include factors beyond just microarchitecture quality. Compatibility, available hardware, and consumer familiarity all play in ways that are favorable to incumbents.
 

SarahKerrigan

Senior member
Oct 12, 2014
229
253
136
I'd also like to add on to the above that most of the interesting problems with manycore chips are not particularly ISA-bound. Interconnects and coherence are hard problems and continue to be equally hard regardless of the ISA. If aggressive server-style uncores aren't necessary, Tilera would happily sell you a 72-core chip with 3-wide cores and some interesting peripherals in sixty watts, eight years ago. Kalray has, today, 256 cores (plus a few dozen auxiliary ones), each 5-wide, on a die that dissipates something like 20W. It is eminently possible to put lots of cores on a die without ARM coming into the picture.

Anyway, it's entirely possible I'm wrong. It's certainly happened before. But when I think of "interesting server problems that can be solved by going to ARM", it tends to be more in the realm of "I want to do a specialized chip with powerful licensed cores on die", rather than "ARM renders everything else obsolete."

On the flip side, there's genuinely a lot going for them; the licensable interconnect situation seems to have become quite favorable, N1 looks like a great core, and SVE, provided they ever ship a licensable core with it, is wonderful.
 

NTMBK

Diamond Member
Nov 14, 2011
9,975
4,356
136
I see more opportunity for a coprocessor than a replacement chip. Maybe it’s the tinkerer in me, but I would just like some sort of secondary chipset to revive serial analog connections and interface with various new/legacy hardware. I hate that laptops are going to all USB all the time. I have to service too much old machinery that needs legacy connectors, and if I had a configurable serial interface I could slap together a port monitor and read and format the data I need without old legacy equipment... idk, it’s a bit of a pipe dream
Lots of modern motherboards still have at least one COM port header on the motherboard. You just need to buy a cheap header-to-port device, that puts a COM port in one of your empty PCI slots on the back of the case and plugs it into your motherboard.
 

Thala

Golden Member
Nov 12, 2014
1,351
651
136
Yes, the A76 and A77 deliver very impressive iso-clock performance, but I'm not aware of either of them being shipped or announced in excess of 3GHz - so concluding that ARM is inherently better than x86 based on that is, I think, dubious.
Well, frequency scaling is mostly a process property. As long as ARM cores achieve the same frequency at about the same voltage, you can assume they will scale similarly with voltage. That is a little simplified, as in order to really reach high frequencies you would need to optimize the cell mix, add buffers and clean up hold violations at high voltages - but that is mostly backend work.
That ARM cores are not shipping at very high frequencies is not a good metric, as desktop-class ARM implementations running in the overdrive voltage region of 1.3+V are just not available - for obvious reasons.

Throwing RISC-V into the mix is even stranger, since no announced RISC-V core I'm aware of thus far has iso-clock performance above Cortex-A72.
This was again not an argument about existing cores but about ISA design principles. David Patterson and others have written at length about these principles in a few recent publications. I am sure RISC-V implementations can achieve similar efficiency to ARMv8 implementations.

I'm also not sure what you're referring to by "advantages of implementing a RISC-like ISA." I write a lot of low-level code on wide, aggressive RISC cores (Power) and single-thread is merely good, not amazing. As far as I know, in absolute general-purpose single-thread performance, x86 is hard to beat right now; the mainframe folks (mainly z) may match or exceed it but those, too, are CISC. Apple is probably very close, and I'm curious what kind of clock headroom they have if thermal and power limitations are removed.
I was referring to the HW implementation of an ISA, namely the microarchitecture.
 

soresu

Golden Member
Dec 19, 2014
1,856
1,023
136
Yes, the A76 and A77 deliver very impressive iso-clock performance, but I'm not aware of either of them being shipped or announced in excess of 3GHz - so concluding that ARM is inherently better than x86 based on that is, I think, dubious. Throwing RISC-V into the mix is even stranger, since no announced RISC-V core I'm aware of thus far has iso-clock performance above Cortex-A72.
For server/datacenter systems dominated by huge cooling and power costs, I'd say that ARM exhibits an opportunity for per thread power efficiency that is highly beneficial in that use case.

[Image: Infra Tech Day 2019 slide, Filippo, Neoverse N1]

If this holds up in practice then that proposition is very good, and ARM has already shown that they are looking at SMT going forward with Neoverse E1/Cortex-A65AE.

As someone also stated, SVE/SVE2 shows a path going forward to compete with AVX2, and higher vector length SIMD (up to 2048 bit if that floats your boat for custom HPC cores).

SVE2 is also capable of replacing NEON in the future (media/DSP instruction parity for SVE, same/higher 128 bit performance), and TME will compete with Intel's TSX Transactional Memory scheme - both of these were heralded as long term multi year investments by ARM, so they are extremely likely to be part of future off-the-shelf Cortex-Axx core IP.

At a guess, I would assume that any first SVE2 enabled core will have no greater than 256 bit SIMD, given the likely jump in power consumption that will bring - I'd estimate late 2021-2022 at the earliest.
 

Shivansps

Diamond Member
Sep 11, 2013
3,641
1,329
136
RPI4 is only A72. It's decent for what it is. I'm more interested in Rockchip's upcoming A76 SoC. Pine will probably have one of those on an SBC.
The RPI 3's A53 cores were already giving about half the ST/MT Geekbench score of a quad-core Sempron 3850; I'm expecting the RPI4's A72s to perform like the Sempron 3850 or very close to it. The major issue is the IGP, but again this is nothing Microsoft can't fix if they want to.
 

gdansk

Golden Member
Feb 8, 2011
1,209
1,182
136
The other thing to note is that Nvidia, almost out of necessity, is planning on pushing ARM more, since AMD and Intel both now have reasons to prioritize their own stuff.
That's true because it is the only other ISA with widespread software support. But, for example, Nvidia is using their own custom designed RISC-V cores internally on future GPUs because of the lack of licensing fees.

in the ARM landscape, where you can find cores half as large at a similar per-cycle performance level.
I think Apple's Vortex is the exception. Every ARMv8 uarch you can actually buy for servers (Vulcan, Falkor, Skylark) is nearer to Zen than it is to Vortex in IPC.
 

Thala

Golden Member
Nov 12, 2014
1,351
651
136
And it's the AES and SGEMM that are really pushing in favor of the 3850.
I expect an even higher score under 64 bit. If I remember correctly, Cortex A72 phones were achieving 1800-1900 single core scores - at 2.2GHz or something.
But hey, the Sempron 3850 is a very good comparison, as it was also manufactured in 28nm - albeit as a 25W TDP chip.
 

Richie Rich

Senior member
Jul 28, 2019
470
228
76
A few questions, if somebody has an idea what K12 was:
  1. As Keller mentioned he could get more performance from ARM than x86, does it mean the K12 architecture was planned as the successor to the Zen 1/Zen 2 line (higher performance than Zen 2)?
  2. If K12 was the next generation after Zen 1/2, how likely is it that Keller designed K12 as an ultra-wide 6xALU core like the one he knew from Apple?
  3. How likely is it that Zen 3 is based on some ideas/pieces developed for K12? It's hard to imagine a low-resource company like AMD wasting all the work that was done on K12.
  4. Was K12 killed completely, or did they just kill the ARM branch and put all resources into the x86 version? Is it possible that Zen 3 is the x86 branch of K12, just named to fit the Zen chronology? Forrest Norrod said that Zen 3 is a completely new uarch, which opens the possibility for this option.
 
