New Zen microarchitecture details

el etro

Golden Member
Jul 21, 2013
1,581
14
81
Kaveri underdelivered pretty much. Carrizo promised far less and ended up overdelivering a bit.
 

monstercameron

Diamond Member
Feb 12, 2013
3,818
1
0
Kaveri underdelivered pretty much. Carrizo promised far less and ended up overdelivering a bit.

Having had a Kaveri desktop/laptop long term and a Carrizo laptop currently, I will tell you that Kaveri was a great piece of kit.

Carrizo, on the other hand, was so boring and underperforming, and has been so underwhelming, that it has been hard to get motivated to update my blog and subreddit.

Those are just my thoughts on the matter.
 

Adored

Senior member
Mar 24, 2016
256
1
16
There's only so much you can do with a fundamentally broken arch. Everything post-Piledriver is just open research in preparation for Zen.
 

JDG1980

Golden Member
Jul 18, 2013
1,663
570
136
I feel completely the opposite. For example the APUs released after Trinity are something I previously would never have thought someone would actually release to the market.

With Steamroller, a huge number of features are either not fully working or completely broken / missing, some things don't work as documented, etc. It's like someone pulled the plug on the project and many parts of the design were left incomplete, broken or untested. The situation improved quite a lot in Carrizo / Bristol Ridge, but even they still contain some silly errors.

IIRC Trinity was the last project which had John Bruno as the chief engineer. It might be a coincidence, but Trinity is IMO the very last AMD design which is fully functional.

I think it's more likely because Trinity is the last APU that AMD thought might actually be competitive. At some point after the Piledriver architecture, AMD's R&D pipeline was flushed; the original plans for Steamroller and Excavator were canned, and what we got under those names were basically hacked-together stopgaps. There's a reason why AMD never even bothered to make HEDT or server chips for anything past Piledriver.

In other words, AMD's best CPU/platform engineers have all been working on Zen for the past couple of years, and the holdover construction core products got the "B" team.
 

nismotigerwvu

Golden Member
May 13, 2004
1,568
33
91
I think it's more likely because Trinity is the last APU that AMD thought might actually be competitive. At some point after the Piledriver architecture, AMD's R&D pipeline was flushed; the original plans for Steamroller and Excavator were canned, and what we got under those names were basically hacked-together stopgaps. There's a reason why AMD never even bothered to make HEDT or server chips for anything past Piledriver.

In other words, AMD's best CPU/platform engineers have all been working on Zen for the past couple of years, and the holdover construction core products got the "B" team.

Given their total R&D budget it was the right move to make. All of these sorts of plans play out on a timescale of years, but AMD saw two key facts: 1) they were "generations" ahead on the GPU side, and 2) they were "even more generations" behind on the CPU side. Fact number 1 promised them at least some sort of niche, however small it may be, by simply trotting out iterative APUs to keep the ship afloat. Fact number 2 meant it simply wasn't reasonable to keep pouring money into Bulldozer-derived cores, as the ROI was never going to be favorable. Zen represented the best case for higher ROI, and considering their budget limitations, could you really blame them for jumping in with both feet?
 

YBS1

Golden Member
May 14, 2000
1,945
129
106
Having had a Kaveri desktop/laptop long term and a Carrizo laptop currently, I will tell you that Kaveri was a great piece of kit.

I agree, I couldn't be happier with my Kaveri. For the money I put into it, and the versatility it offers, I can't complain about any aspect of it. I wonder if there is any technical reason they couldn't have offered an 8-core, iGPU-less version of this for the FM2+ platform? In theory that would have outperformed the FX line. I think it would have done well for them as a holdover until Zen.
 

deasd

Senior member
Dec 31, 2013
516
746
136
Sorry for picking this up late, but I still had that tab open. ;)
Did you see the different scores at different TDP settings? I think the consensus so far was that, while discussing CMT or SMT scaling, we left out power constraints and turbo modes. Constant clock frequency tests would be ideal. With the P3DNow! data I get 73% (CB15) and 79% (CB11.5).

I agree with this; the TDP setting can affect the result a lot. At least the module penalty is heavier than most expected, especially those who still believe it's a dual core.
 

Abwx

Lifer
Apr 2, 2011
10,939
3,440
136
At least the module penalty is heavier than most expected, especially those who still believe it's a dual core.

It's the other way around: a lot of people exaggerate the penalty. For instance, in CB R15, if a single thread is 100% then two threads will yield 188%, so the penalty is 6%...
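To make that arithmetic explicit, here is a minimal Python sketch. It assumes the penalty is measured against ideal 2x scaling of the single-thread score; the function name `cmt_penalty` is made up for illustration and is not from any benchmark tool.

```python
def cmt_penalty(single_thread_score: float, two_thread_score: float) -> float:
    """Fraction of ideal two-thread throughput lost when both threads share a module.

    Assumption: "ideal" means exactly twice the single-thread score.
    """
    ideal = 2.0 * single_thread_score
    return 1.0 - two_thread_score / ideal


# Figures from the post: one thread = 100%, two threads in one module = 188%.
penalty = cmt_penalty(100.0, 188.0)
print(f"{penalty:.0%}")
```

With the quoted numbers, 188 out of an ideal 200 leaves a 6% shortfall, which matches the figure in the post.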
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
I would be happy to hear that people who worked on Hounds, Stars, Bulldozer, Piledriver and Cat cores also worked on Zen. It's the people who worked on Steamroller and Excavator who should be keelhauled, flogged and then banned from working on semiconductors for life.

It's interesting you should mention that because the rumors I heard some years ago were that AMD had stripped their main team from the construction cores and put Steamroller and Excavator into the hands of a team mostly dedicated to power efficiency.

I believe the core designs for SR and XV were already figured out, so you had bits left dangling as the best talent went to work on Zen and Zen+ under Keller.
 

Sheep221

Golden Member
Oct 28, 2012
1,843
27
81
It's interesting you should mention that because the rumors I heard some years ago were that AMD had stripped their main team from the construction cores and put Steamroller and Excavator into the hands of a team mostly dedicated to power efficiency.

I believe the core designs for SR and XV were already figured out, so you had bits left dangling as the best talent went to work on Zen and Zen+ under Keller.
I'm sure it was because of bad management and organization, not the engineers. Many brilliant ideas don't make it to design/volume production, or are left incomplete. The only reason AMD is not making good CPUs is that they have a bad attitude towards things.

btw, your avatar is so scary
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
I agree, I couldn't be happier with my Kaveri. For the money I put into it, and the versatility it offers, I can't complain about any aspect of it. I wonder if there is any technical reason they couldn't have offered an 8-core, iGPU-less version of this for the FM2+ platform? In theory that would have outperformed the FX line. I think it would have done well for them as a holdover until Zen.

There is no technical reason why this didn't happen. The reasons were entirely economic.
 

DrMrLordX

Lifer
Apr 27, 2000
21,620
10,829
136
Kaveri was a pretty good chip, don't understand how you came to this conclusion that it is "sad".

What they did a reasonably good job of doing was covering up the flaws in Kaveri. It did have some serious problems that still dog its users today. Pointless throttling and poor configurability of the memory controller (stuck @ DDR3-2400) are the two most glaring flaws.
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
jhu, let me quote you from another thread here:
Well, depends on the test and how it's run. From my own tests with Povray, I got a 5% IPC increase (1 thread per core) or 30% IPC increase (2 threads per core) over Core 2.
We've got the 40% IPC increase claim for Zen over XV. I have often heard the question of whether this is about per-thread (ST) or per-core throughput. I think your example gives a good data point, as the microarchitectural + architectural changes seem to be much smaller than those between XV and Zen. Yet there is a 30% IPC increase for the 2-threaded core over the 1-threaded one, without adding lots of execution resources, large increases in L1 B/W, etc. (even latency went up from 3 to 4).
conroe_nehalemxqsd2.png

More here and here.

So I question any claims that the 40% number is only achieved with SMT.
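As a rough illustration of how the two quoted Povray figures relate, here is a small Python sketch. It assumes both gains are measured against the same Core 2 baseline, so the ratio between them is the extra per-core throughput from the second thread; `smt_yield` is a made-up helper name.

```python
def smt_yield(st_gain: float, mt_gain: float) -> float:
    """Extra per-core throughput from a second thread.

    Assumption: st_gain (1 thread per core) and mt_gain (2 threads per core)
    are both relative to the same baseline core.
    """
    return (1.0 + mt_gain) / (1.0 + st_gain) - 1.0


# jhu's Povray figures over Core 2: +5% with 1 thread per core, +30% with 2.
print(f"{smt_yield(0.05, 0.30):.1%}")
```

On this reading, most of the 30% comes from the second thread and only 5% from the single-thread improvement, which is why it matters whether a "40% IPC" claim is per thread or per core.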
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
I'm sure it was because of bad management and organization, not the engineers. Many brilliant ideas don't make it to design/volume production, or are left incomplete. The only reason AMD is not making good CPUs is that they have a bad attitude towards things.

btw, your avatar is so scary

Products being released half-done is definitely an issue of management. I think the only reason we saw Steamroller and Excavator at all was because of prior promises AMD made to release them, back even before Bulldozer was released. Legal issues abound when you abandon promised products.

Management otherwise didn't care about SR or XV - they didn't help market positioning, they were expenses that would not bring higher revenue, and so on. A six-core SR or XV could have made a nice chip, but graphics performance was more important to maintain the existing market.

Thanks, I love my avatar :p
 

majord

Senior member
Jul 26, 2015
433
523
136
Thanks Stilt for the detailed info!

What they did a reasonably good job of doing was covering up the flaws in Kaveri. It did have some serious problems that still dog its users today. Pointless throttling and poor configurability of the memory controller (stuck @ DDR3-2400) are the two most glaring flaws.

Well (and this applies to Stilt's complaints too), whilst these are valid complaints in isolation, I think for the target market it's a non-issue to be honest.

Whilst AMD Marketing pushes some of these as semi-enthusiast chips, the reality is that MOST buyers are not enthusiasts buying APUs just for the 'fun' of pushing them to the limit, and therefore aren't ever going to want to pair fast RAM with these things, purely because it costs too much and is really poor value for the benefit. 2133MHz support was more than enough at the time Kaveri was released.

If anything, Stilt's comment about 2400 being the maximum multiplier on BR (and, from what we've heard, possibly also Zen) is more of a concern going forward. It could hurt enthusiasts' enthusiasm for sure, as 2400+ speeds are already starting to become more cost effective.
 

DrMrLordX

Lifer
Apr 27, 2000
21,620
10,829
136
Well (and this applies to Stilt's complaints too), whilst these are valid complaints in isolation, I think for the target market it's a non-issue to be honest.

. . . and they covered up the problems by isolating Kaveri to markets where the throttling and RAM speeds would be non-issues. There was so much potential in Kaveri that was wasted . . . it's really quite sad. Don't get me wrong, I like my 7700k, but I've been fighting with it for a while to get the most I can out of it.
 

swilli89

Golden Member
Mar 23, 2010
1,558
1,181
136
jhu, let me quote you from another thread here:

We've got the 40% IPC increase claim for Zen over XV. I have often heard the question of whether this is about per-thread (ST) or per-core throughput. I think your example gives a good data point, as the microarchitectural + architectural changes seem to be much smaller than those between XV and Zen. Yet there is a 30% IPC increase for the 2-threaded core over the 1-threaded one, without adding lots of execution resources, large increases in L1 B/W, etc. (even latency went up from 3 to 4).
conroe_nehalemxqsd2.png

More here and here.

So I question any claims that the 40% number is only achieved with SMT.

Dresdenboy, can you elaborate on how this applies in the case of XV's 1 module versus Zen's 1 core, and on the implications for Zen's performance?
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
Again, bits & chips ;)

http://www.bitsandchips.it/52-english-news/6815-speculations-about-zen-after-our-april-s-fool

They were right before on many occasions, but forum users will always neglect that. There is much more reason to believe the words of Fottemberg about this. You have to distinguish which is info he provides and which is his speculation (fabbing process).

I'm skeptical about their Zen speculation. The freq/W given is extremely optimistic. And the claim that 2x128-bit FPU is a simplification vs 1x256-bit makes no sense. Fewer but wider SIMD units are simpler and use less power, area, etc for the amount of work done. That's the entire point of SIMD.

What are some of these many occasions Bits and Chips was right before? nVidia's interposer? The impression I get is that they could be a lot like Charlie at SemiAccurate: some legitimate sources and leaks but poor interpretation and analysis.
 

Abwx

Lifer
Apr 2, 2011
10,939
3,440
136
I'm skeptical about their Zen speculation. The freq/W given is extremely optimistic.

The frequency/watt improvement is about what GF announced for 14nm LPP LVT relative to the 28nm HPP used for Kaveri/Carrizo.

Such a 4C APU would fit within 25W/3.7GHz if shrunk to 14nm; an 8C Zen is logically 4x more power hungry for the expected 4x FP throughput.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,764
3,131
136
I'm skeptical about their Zen speculation. The freq/W given is extremely optimistic. And the claim that 2x128-bit FPU is a simplification vs 1x256-bit makes no sense. Fewer but wider SIMD units are simpler and use less power, area, etc for the amount of work done. That's the entire point of SIMD.

What are some of these many occasions Bits and Chips was right before? nVidia's interposer? The impression I get is that they could be a lot like Charlie at SemiAccurate: some legitimate sources and leaks but poor interpretation and analysis.

I think the Zen FPU can be viewed from a few different perspectives:

1. FPU design
2. Core/SIMD width
3. SIMD throughput
4. SIMD latency
5. X86 code

1. FPU design:
This looks anything but simple. It looks very much like a future derivative of the Bridged FMA design that AMD released in a white paper pre-Bulldozer; its focus was to not increase the latency or power consumption of ADDs or MULs while still allowing FMA.

2. Designing a core for 256-bit datapaths vs 128-bit has a power cost, not just in the execution units but at least from the L1D and possibly the L2 (bandwidth) through the core and all the way back again. I guess you can call this a simplification, especially in terms of the effort to optimize your pipeline for power when executing ops that aren't 256 bits wide.

3. This is where it gets interesting, as Zen has 4 pipelines but only 256/128 bits of load/store, and mixed operations across those pipes. So if the data is "in the core" (FPU PRF) then a Zen core can be 512 bits wide for non-FMA operations, but obviously this would be very hard to sustain.

4. This is one area where Zen will hopefully be good, especially compared to Bulldozer: having "simple" non-SIMD FP instructions like add take 5 cycles, compared to 3 in STARS or 2 in Jaguar, isn't awesome.

5. In enterprise server software SIMD is rare, and 256-bit and FMA are rarer still. In consumer software that is also the case. Where it gets interesting is what popular software uses: outside of things like encoders and other throughput workloads, thanks to Intel themselves, that is firmly SSE.


Putting all of that together, I think that for most workloads the Zen team looks to have made very good choices, but a Zen core isn't going to look great in workloads that have a large amount of 256-bit FMA, where it would be something like 4*2*2 ops for Zen vs 8*2*2 ops for >= Haswell.
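For anyone wanting to see where numbers like those come from, here is a minimal Python sketch. It rests on one plausible reading of the factors (single-precision lanes per vector x 2 flops per FMA x 2 FMA-capable pipes); the pipe counts are the thread's working assumptions rather than confirmed specs, and `peak_flops_per_cycle` is a made-up helper.

```python
def peak_flops_per_cycle(simd_bits: int, fma_pipes: int, element_bits: int = 32) -> int:
    """Peak FMA flops per cycle per core under the stated assumptions."""
    lanes = simd_bits // element_bits  # elements processed per vector op
    flops_per_fma = 2                  # one multiply plus one add per lane
    return lanes * flops_per_fma * fma_pipes


# Assumed configurations, taken from the discussion above and not from official specs:
zen = peak_flops_per_cycle(simd_bits=128, fma_pipes=2)      # 4 * 2 * 2
haswell = peak_flops_per_cycle(simd_bits=256, fma_pipes=2)  # 8 * 2 * 2
print(zen, haswell)
```

Under those assumptions the 2x gap only shows up when code is dominated by 256-bit FMA; 128-bit SSE code would see the same lane count on both.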


TL;DR: I don't think it's simple :D
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
itsmydamnation, I'm not making an argument that there aren't reasonable tradeoffs between 128-bit and 256-bit SIMD, or that the latter is always preferable. Simply saying that what Bits and Chips stated, that 2x128-bit is simpler than 1x256-bit, is nonsense. You need a lot more than just twice as many data lines to make the former happen. Also, once you do have 2x128-bit symmetrical pipelines there isn't really that much you need to add to make it split AVX to both pipes simultaneously. The instruction set was even designed with this in mind, e.g. by not having full shuffles. So I never really got what Bulldozer's SIMD "fusing" was supposed to amount to.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,764
3,131
136
itsmydamnation, I'm not making an argument that there aren't reasonable tradeoffs between 128-bit and 256-bit SIMD, or that the latter is always preferable. Simply saying that what Bits and Chips stated, that 2x128-bit is simpler than 1x256-bit, is nonsense.
I was agreeing with you :D

The instruction set was even designed with this in mind, e.g. by not having full shuffles. So I never really got what Bulldozer's SIMD "fusing" was supposed to amount to.
As far as I know, Bulldozer never fused its 128-bit units together, but executed the two 128-bit halves back to back for 256-bit ops, unless you mean something else.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
I was agreeing with you :D

Okay, nevermind then ;)

As far as I know, Bulldozer never fused its 128-bit units together, but executed the two 128-bit halves back to back for 256-bit ops, unless you mean something else.

AFAIK, a fastpath double instruction was issued, and there was nothing stopping it from executing on two SIMD ports in parallel, so long as they were available.

But a lot of people said that two pipes "fused" together to become one AVX pipe. AFAIK AMD themselves used such language. It's fairly meaningless. But I think this concept is what the Bits and Chips article is alluding to.