What’s the fate of K12/ARM at AMD?

Page 5 - AnandTech community

itsmydamnation

Platinum Member
Feb 6, 2011
2,765
3,131
136
Is this really true? XV's FPU is 50% more capable than that of PD/SR?
Like most things that come from Mr 83 it's just wrong.

BD/PD had 4 FPU pipes, all 128-bit. 2 pipes did FP/AVX, the other two did int-style SIMD (XOP/MMX).

Later, on SR/EX, they reduced to 3 pipes that shared both int and FP, again all 128-bit.
 

Thunder 57

Platinum Member
Aug 19, 2007
2,674
3,796
136
To me it sounded like a (twin) sister architecture. So it would have been essentially the same design as Zen1 (14nm), except with an ARM front end. So that'd be a 4+2 wide integer core.

Right, as opposed to this mythical 6+2 (?) K12.

Like most things that come from Mr 83 it's just wrong.

What is that supposed to be a reference to, for those that don't get it?
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,765
3,131
136
What is that supposed to be a reference to, for those that don't get it?

Just figured I'll call him (Richie Rich) Mr 83 from now on.

Can you PLEASE make this '83% higher IPC sheet' your signature? It'd make life so much easier for a lot of us on this forum. Not for you though, as I guess you permanently have it on your clipboard.
 

Gideon

Golden Member
Nov 27, 2007
1,625
3,650
136
I think I read somewhere that the A1100 was the end result of a commission by, or joint venture with, Amazon. The partnership didn't really work out and Amazon decided to part ways and do their own totally in-house development. The A1100 seemed like it was just quickly wrapped up into a pretty practical (networking-oriented) package (likely with Amazon buying some number of these).

This! I also remember reading that Amazon ordered the A1100 from AMD and AMD blew the power budget, which is why Amazon decided to do the chip in-house.

Amazon alone would have justified bringing K12 to market, but after the A1100 failed, the profitability of the endeavour became questionable. Considering AMD's R&D budget at the time, IMO the correct call was made.
 

NTMBK

Lifer
Nov 14, 2011
10,232
5,013
136
It's not like AMD immediately fired everyone who worked on K12 and lit their hard drives on fire. They still have a bunch of institutional knowledge about how to make a good ARM CPU, and I would be very surprised if they didn't still have a tiny team dedicated to keeping track of latest ARM developments. If the ARM server market takes off, I'm sure AMD can jump back in... Especially now they will be well funded from all their success with Zen.
 

krumme

Diamond Member
Oct 9, 2009
5,952
1,585
136
It's not like AMD immediately fired everyone who worked on K12 and lit their hard drives on fire. They still have a bunch of institutional knowledge about how to make a good ARM CPU, and I would be very surprised if they didn't still have a tiny team dedicated to keeping track of latest ARM developments. If the ARM server market takes off, I'm sure AMD can jump back in... Especially now they will be well funded from all their success with Zen.
This. I remember Keller said they learned a lot from the K12 project. Such departures are always good for an organization to learn and stay flexible.
You can't just ask people to think outside the box and be creative, but doing practical work on an ARM arch, being used to an x86 perspective, requires them to do exactly that.
Knowledge development requires a long-term perspective, and our companies seldom have that. Most of the time it comes from unintentional actions. Lol.

It's fairly easy to count what is wasted, but difficult, near impossible, to measure what is gained. We will see in Zen 3 and 4.
 

DrMrLordX

Lifer
Apr 27, 2000
21,620
10,830
136
This! I also remember reading that Amazon ordered the A1100 from AMD and AMD blew the power budget, which is why Amazon decided to do the chip in-house.

Amazon alone would have justified bringing K12 to market, but after the A1100 failed, the profitability of the endeavour became questionable. Considering AMD's R&D budget at the time, IMO the correct call was made.

Wonder how close Amazon's ARM chips of today are to what K12 would have been?
 

Nothingness

Platinum Member
Jul 3, 2013
2,405
736
136
The A1100 seemed like it was just quickly wrapped up into a pretty practical (networking-oriented) package (likely with Amazon buying some number of these).
The idea of the A1100 was to help port software to AArch64. But AMD underdelivered and was very late.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
Yes I wonder why no one tries to make a CPU as good as Apple, it's so trivial to make a high-perf CPU.

You need a solid business case first and then some big investment into development. It is not necessarily a technical showstopper.
 

Nothingness

Platinum Member
Jul 3, 2013
2,405
736
136
You need a solid business case first and then some big investment into development. It is not necessarily a technical showstopper.
Yes, that's required, as you need a very experienced CPU design team, which means a lot of money. And that's why, even if there was a serious business opportunity, very few companies can afford it.
 

moinmoin

Diamond Member
Jun 1, 2017
4,944
7,656
136
Just a repeated reminder that ARM is already part of every single Zen chip, as part of the SCF (with all its flexibility, like the boosting algorithm) as well as the ("optional") AMD Secure Processor. AMD definitely retained knowledge about ARM; aside from ARM TrustZone, it's just all for internal use and not publicly exposed. Zen essentially runs on ARM: without the ARM firmware in the UEFI, no Zen chip will boot.
 

maddie

Diamond Member
Jul 18, 2010
4,738
4,667
136
The techniques that make ARM better (or not) than x86, outside of the decoder, are things that can be applied to both strains of CPUs.

Do we remember the early days of RISC? For those who don't, RISC was supposed to destroy X86. What really happened was that X86 designs incorporated the essence of RISC in their cores. What we had was not enough improvement to be worth the switching effort.

Maybe this time will be different, but I'm guessing it won't.

A question to the more knowledgeable members. Is there anything in ARM, outside of the instruction set, that cannot be co-opted by X86 to make itself better?
 

naukkis

Senior member
Jun 5, 2002
705
576
136
A question to the more knowledgeable members. Is there anything in ARM, outside of the instruction set, that cannot be co-opted by X86 to make itself better?

x86 is a total-store-order (TSO) arch, arm64 isn't; with ARM they can use the store buffer much more efficiently to improve load/store performance. ARM also has more architectural registers, which are much needed when trying to get a wide arch to perform well. Good luck to x86 keeping up; it's outright impossible to have similar IPC in a similar perf/watt envelope. Let's see how Intel's Cove archs do; it seems that chasing IPC improvements drives power usage through the roof. With ARM's looser memory ordering they can keep data inside the core much more and reduce cache accesses, which burn lots of power in high-IPC designs.
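
A toy sketch of the store-buffer point, in Python. Everything here is an illustrative assumption and not a model of any real core: the function names `drain_tso` and `drain_relaxed`, the 64-byte line size, the 8-entry buffer, and the rule that an in-order (TSO-style) drain only merges back-to-back stores to the same cache line while a relaxed drain may merge any stores to the same line that sit in the buffer together.

```python
def drain_tso(stores, line_size=64):
    """Count cache-line writes when stores must drain in program order.

    Toy rule (assumption): only back-to-back stores to the same line merge.
    """
    writes = 0
    prev_line = None
    for addr in stores:
        line = addr // line_size
        if line != prev_line:
            writes += 1
            prev_line = line
    return writes


def drain_relaxed(stores, line_size=64, buffer_entries=8):
    """Count cache-line writes when stores in the buffer may merge freely.

    Toy rule (assumption): within each window of `buffer_entries` stores,
    all stores to the same line collapse into a single cache write.
    """
    writes = 0
    for start in range(0, len(stores), buffer_entries):
        window = stores[start:start + buffer_entries]
        writes += len({addr // line_size for addr in window})
    return writes


# Hypothetical access pattern: stores alternating between two cache lines.
stores = [0, 64, 8, 72, 16, 80, 24, 88]

print("in-order drain writes:", drain_tso(stores))
print("relaxed drain writes: ", drain_relaxed(stores))
```

On this hand-picked pattern the relaxed drain touches the cache far less often, which is the effect described above. Real x86 and ARM cores have many more tricks (and ARM code still needs barriers where ordering matters), so treat this as intuition only.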
 

lobz

Platinum Member
Feb 10, 2017
2,057
2,856
136
x86 is a total-store-order (TSO) arch, arm64 isn't; with ARM they can use the store buffer much more efficiently to improve load/store performance. ARM also has more architectural registers, which are much needed when trying to get a wide arch to perform well. Good luck to x86 keeping up; it's outright impossible to have similar IPC in a similar perf/watt envelope. Let's see how Intel's Cove archs do; it seems that chasing IPC improvements drives power usage through the roof. With ARM's looser memory ordering they can keep data inside the core much more and reduce cache accesses, which burn lots of power in high-IPC designs.
And the best part is that there's no downside to this!

Oh wait...
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
Just figured I'll call him (Richie Rich) Mr 83 from now on.
I agree to the nickname Mr. 83, provided you accept that I'll name you by the same rule: Intel's average IPC gain per generation is about +4%, so I'm gonna call you Mr. 4%. Do you agree? :D

Like most things that come from Mr 83 it's just wrong.

BD/PD had 4 FPU pipes, all 128-bit. 2 pipes did FP/AVX, the other two did int-style SIMD (XOP/MMX).

Later, on SR/EX, they reduced to 3 pipes that shared both int and FP, again all 128-bit.
OK, thanks for the info, I'm still learning. However, this proves my opinion that Zen's FPU is based on DNA from XV/SR/BD. And Thunder57 was wrong saying "I don't agree with what you are saying about Zen borrowing anything significant from BD." Thanks for the clarification and we can close this OT.


x86 is a total-store-order (TSO) arch, arm64 isn't; with ARM they can use the store buffer much more efficiently to improve load/store performance. ARM also has more architectural registers, which are much needed when trying to get a wide arch to perform well. Good luck to x86 keeping up; it's outright impossible to have similar IPC in a similar perf/watt envelope. Let's see how Intel's Cove archs do; it seems that chasing IPC improvements drives power usage through the roof. With ARM's looser memory ordering they can keep data inside the core much more and reduce cache accesses, which burn lots of power in high-IPC designs.
So you're saying that, at least on perf/watt, ARM has a significant advantage and x86 cannot do anything about it? IMHO that's why Intel had to back off from the smartphone/tablet market. This can be a key advantage when you have a massive multi-core, chiplet-based server CPU like Epyc 2. In other words, AMD with K12 would be able to put more cores within the same 280W TDP or clock them higher. But how much could this be, something like +25% or even more? (assuming iso-IPC)
 

lobz

Platinum Member
Feb 10, 2017
2,057
2,856
136
I agree to the nickname Mr. 83, provided you accept that I'll name you by the same rule: Intel's average IPC gain per generation is about +4%, so I'm gonna call you Mr. 4%. Do you agree? :D


OK, thanks for the info, I'm still learning. However, this proves my opinion that Zen's FPU is based on DNA from XV/SR/BD. And Thunder57 was wrong saying "I don't agree with what you are saying about Zen borrowing anything significant from BD." Thanks for the clarification and we can close this OT.



So you're saying that, at least on perf/watt, ARM has a significant advantage and x86 cannot do anything about it? IMHO that's why Intel had to back off from the smartphone/tablet market. This can be a key advantage when you have a massive multi-core, chiplet-based server CPU like Epyc 2. In other words, AMD with K12 would be able to put more cores within the same 280W TDP or clock them higher. But how much could this be, something like +25% or even more? (assuming iso-IPC)
You realize that he doesn't insert an Excel sheet into all his comments, claiming the same godforsaken thing over and over again, right? I don't think you could call him anything based on the same rule.
 

lobz

Platinum Member
Feb 10, 2017
2,057
2,856
136
x86 is a total-store-order (TSO) arch, arm64 isn't; with ARM they can use the store buffer much more efficiently to improve load/store performance. ARM also has more architectural registers, which are much needed when trying to get a wide arch to perform well. Good luck to x86 keeping up; it's outright impossible to have similar IPC in a similar perf/watt envelope. Let's see how Intel's Cove archs do; it seems that chasing IPC improvements drives power usage through the roof. With ARM's looser memory ordering they can keep data inside the core much more and reduce cache accesses, which burn lots of power in high-IPC designs.
I still see these '83% higher IPC ZOMG ZOMG' comparisons as empirical at best. The 2 fast cores in the A13 have, what, an 8 MB L2 cache? They are designed for completely different use cases and conditions.
 

RetroZombie

Senior member
Nov 5, 2019
464
386
96
The only thing Jim Keller said is that the team knows how to make high-frequency cores (Bulldozer) and also small/low-power cores (cats); I don't see where he said he would be reusing any 'units' of those for Zen.

What I get from him is that Zen would be fast (clocks), small and efficient, which it is.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
You know mobile GPUs in those ARM chips are ahead of iGPUs in the x86 chips too, right?

It can't be Apples to Apples ISA comparison when one group executes far better.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,765
3,131
136
The only thing Jim Keller said is that the team knows how to make high-frequency cores (Bulldozer) and also small/low-power cores (cats); I don't see where he said he would be reusing any 'units' of those for Zen.

What I get from him is that Zen would be fast (clocks), small and efficient, which it is.
It's not about what he said, it's about how they work. You think AMD rebuilt all their adders, muls, divs, queues, decode logic, addr gen logic, power control logic, FRP, retire etc. etc. from scratch? Why? What were the big limitations of those units? I'm sure if we could look at version control we would still find circuits fundamentally designed in the K7 days alive in Zen. You don't waste your time rebuilding things that were never a problem. You fix the things that are a problem.

So what did AMD do? Fixed caches, improved load/store, improved prediction, improved decode (uop cache), made the core wider (ALUs) and the OOOE engine bigger.

Also, on the K12 thing, the only thing I'm aware Jim Keller ever stated was that the OOOE engine would be about 10% bigger in K12 than Zen. That's operations in flight, i.e. ROB, FRP, not any of this crazy super wide stuff.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
So what did AMD do? Fixed caches, improved load/store, improved prediction, improved decode (uop cache), made the core wider (ALUs) and the OOOE engine bigger.

Also, on the K12 thing, the only thing I'm aware Jim Keller ever stated was that the OOOE engine would be about 10% bigger in K12 than Zen. That's operations in flight, i.e. ROB, FRP, not any of this crazy super wide stuff.
Where does that 10% number come from? I'd like to read that, do you have a link? In that interview Keller mentioned a "bigger engine" without specifying any number.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
The Real World Tech forum has info about the K12 uarch. Pretty interesting stuff, and maybe a hint at what the Zen3 uarch might look like.

K12:
  • 4x ALU + 1x Br (a second Br is shared with the 4th ALU) .... 5-wide integer (+25%)
  • 3x AGU (1x Load, 1x Load/Store, 1x Store) ... +50% over Zen1, similar to Zen2
Zen 1&2:
  • 4x ALU (2x Br are shared with the 3rd and 4th ALU) .... 4-wide integer
  • 2x AGU (2x Load/Store in 3x pipes) in Zen1, +1x AGU for store in Zen2 (still using 3x pipes)

So the "wider" engine of Jim Keller's K12 was significantly stronger than Zen1 in integer. Going from 4-wide to 5-wide is theoretically +25% integer IPC. Maybe the Zen3 uarch is based on K12's 5-wide integer core, we'll see.

AGU throughput was also significantly higher (+50% peak). Loads and stores make up 35-40% of all instructions. Clearly Zen2's triple AGUs were inspired by Jim's K12 ARM.
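
A back-of-the-envelope check of those numbers, as a Python sketch. The helper names `peak_gain` and `amdahl_speedup` are made up for illustration, and the Amdahl-style bound is an added assumption (it pretends the load/store share of the work scales perfectly with AGU count and nothing else overlaps); none of it comes from the Real World Tech post.

```python
def peak_gain(old_units: int, new_units: int) -> float:
    """Relative increase in peak issue slots, e.g. 4 ALUs -> 5 ALUs."""
    return (new_units - old_units) / old_units


def amdahl_speedup(fraction: float, local_speedup: float) -> float:
    """Overall speedup if only `fraction` of the work gets `local_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


alu_gain = peak_gain(4, 5)  # 4-wide -> 5-wide integer
agu_gain = peak_gain(2, 3)  # 2 AGUs -> 3 AGUs

print(f"ALU peak gain: {alu_gain:+.0%}")
print(f"AGU peak gain: {agu_gain:+.0%}")

# Assumption: the 35-40% load/store share is entirely AGU-throughput-bound.
for mem_fraction in (0.35, 0.40):
    bound = amdahl_speedup(mem_fraction, 1.0 + agu_gain)
    print(f"{mem_fraction:.0%} loads/stores -> at most {bound - 1.0:+.1%} overall")
```

Under that optimistic assumption the extra AGU is worth roughly 13 to 15% overall, well short of +50%. Peak width is an upper limit on throughput, not a measured IPC gain.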
 

Asterox

Golden Member
May 15, 2012
1,026
1,775
136
Well, hm let's say AMD K12 is back or returned from hibernation. :mask: