New Zen microarchitecture details

Discussion in 'CPUs and Overclocking' started by Dresdenboy, Mar 1, 2016.

  1. The Stilt

    The Stilt Golden Member

    Joined:
    Dec 5, 2015
    Messages:
    1,202
    Likes Received:
    1,363
    It will definitely be interesting to see how Polaris overclocks, especially the smaller variant. Bonaire is basically the benchmark for Polaris, since it is the best-clocking 28nm GCN GPU AMD has. AIBs have shipped these cards at 1200MHz out of the box and they hit 1400MHz quite easily on stock cooling.

    Since certain people claim that shrinking virtually any design from 28nm bulk to 14nm FinFET LPP will immediately increase the Fmax by 50 - 100%, I'm really keen to see how that will work out :sneaky:
     
  2. KTE

    KTE Senior member

    Joined:
    May 26, 2016
    Messages:
    477
    Likes Received:
    130
    Be careful of process PR tho. IIRC it's 30% frequency gain or 50% less power from a process viewpoint... At a standard voltage and temp (like 1V). It most definitely won't be FMax. It's in one of the links I posted.

    But yield is a different spanner altogether. 14nm yield is nowhere near that of 28nm (which is due to remain in high volume production till 2020 at minimum) and the production cost for 14nm is more than 2.5x as high.

    And the way PC markets function with the TDP caps, additional features on the chip are typically favoured over frequency. Those FinFET power improvements tend to be mostly static rather than switching, so 50% less sounds like a lot, but taken out of, for example, 15W on a 100W chip, it's not so much.
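
    To put rough numbers on that example, here is a minimal sketch. It assumes the 15W figure is the static (leakage) share of a 100W chip and that the 50% reduction applies to that share only; the function name is made up for illustration.

```python
def chip_power_after_shrink(total_w, static_w, static_reduction):
    """Return (new_total_w, saved_fraction) when only static power shrinks."""
    dynamic_w = total_w - static_w                     # switching power, left unchanged
    new_static_w = static_w * (1.0 - static_reduction)
    new_total_w = dynamic_w + new_static_w
    return new_total_w, (total_w - new_total_w) / total_w


# Assumed inputs: 100W chip, 15W of it static, static power halved.
new_total, saved = chip_power_after_shrink(100.0, 15.0, 0.50)
print(f"{new_total:.1f} W total, {saved:.1%} saved")
```

    Under those assumptions the chip only drops to 92.5W, which is why a headline "50% less power" says little about the headroom left for clocks.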

    I think I would agree: looking at the design with many cores and SMT, hitting BD frequencies at <125W on 14nm would be a remarkable achievement. Even more so with the limited Vt scaling these days.

    I also feel that one of the next major features to come would be incorporating GPU SPs in place of the FPU with the breakthroughs in HBM/eDRAM. I think that's why the HSA/APU movement originally began.

    Sent from HTC 10
    (Opinions are own)
     
  3. KTE

    KTE Senior member

    Joined:
    May 26, 2016
    Messages:
    477
    Likes Received:
    130
    BTW... Looking at DT Excavator, I wouldn't want AMD to project anything based off it!

    It's certainly not even 5% better than PD on average. It wins a few but loses more, by a landslide!

    Absolute performance wise, it's ppp. It needs FAR higher clocks but it doesn't scale at all. It's a mobile chip, simply put, low frequency+graphics optimized.

    It's no better than the old Regor chips were at the time. Kuma caned them, literally.

     
  4. itsmydamnation

    itsmydamnation Golden Member

    Joined:
    Feb 6, 2011
    Messages:
    1,433
    Likes Received:
    369
    It's not even close to Regor vs Kuma. For one, in almost anything that is not a throughput test/workload, Excavator is anywhere up to 15% faster per clock than Kaveri.

    An actual fair comparison would be something like an X6 vs Llano: L3 vs no L3, APU-focused vs not. Regor vs Kuma was about cost reduction.

    The only way Trinity wins out vs Excavator per clock is because of the extra L2, and that's completely disregarding TDP. As pointed out by Stilt, the L2 is very power hungry and clock limited. If looking at integer IPC, outside of the L2 difference there is nothing in Excavator that sacrifices performance for lower power.

    the funny thing about IPC is the C, so the point of your post was........
     
  5. coercitiv

    coercitiv Golden Member

    Joined:
    Jan 24, 2014
    Messages:
    1,838
    Likes Received:
    602
    It's not about automatic Fmax but rather about giving the chip some breathing room in power usage, hence potentially higher clocks.

    However, I have yet to see people hope or claim 50-100% increase, when even Nvidia brought about 40% at best. How about we hope for 20-40% instead? :)
     
    #2030 coercitiv, Jun 18, 2016
    Last edited: Jun 18, 2016
  6. The Stilt

    The Stilt Golden Member

    Joined:
    Dec 5, 2015
    Messages:
    1,202
    Likes Received:
    1,363
    The average figures quoted by AMD are pretty accurate based on my own testing. According to AMD, Steamroller is 10% faster than Piledriver and Excavator is 5% faster than Steamroller. There certainly are extremes in both directions, but generally that's the actual average.

    So on average Excavator should have 15.5% higher IPC than Piledriver. In most cases it has, unless its primary weakness is exploited (insufficient L2).
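
    The 15.5% comes from compounding the two generational gains rather than adding them. A minimal sketch, assuming only the 10% and 5% averages quoted above (the helper name is made up):

```python
def compound_gain(*step_gains):
    """Combine successive relative gains, e.g. 0.10 then 0.05, into one total gain."""
    total = 1.0
    for gain in step_gains:
        total *= 1.0 + gain
    return total - 1.0


# Piledriver -> Steamroller (+10%), Steamroller -> Excavator (+5%)
pd_to_xv = compound_gain(0.10, 0.05)
print(f"{pd_to_xv:.1%}")  # 15.5%
```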
     
  7. itsmydamnation

    itsmydamnation Golden Member

    Joined:
    Feb 6, 2011
    Messages:
    1,433
    Likes Received:
    369
    I posted this on Beyond3D in regard to the discussion on Zeppelin PCI-E/GMI; figured I'd share it here to see what people think:

     
  8. KTE

    KTE Senior member

    Joined:
    May 26, 2016
    Messages:
    477
    Likes Received:
    130
    In which DT apps? (PD vs SR) Are we talking real apps or synthetics here, and did you try CPU Queen or Fritz Chess by any chance?

    K10 is still faster than even EX...

    http://www.pcgameshardware.de/Athlon-X4-845-CPU-261962/Tests/Excavator-Benchmarks-Test-1191570/
    http://www.overclockersclub.com/reviews/amd_athlon_x4_845_cpu/5.htm
    http://excavator.looncraz.net/
    http://www.ferra.ru/ru/system/review/amd-excavator-athlon-x4-845/#.V2UtSZ_TXqA
    http://www.neoseeker.com/Articles/Hardware/Reviews/amd-athlon-x4-845/2.html

     
  9. Dresdenboy

    Dresdenboy Golden Member

    Joined:
    Jul 28, 2003
    Messages:
    1,709
    Likes Received:
    504
    Let's not forget the missing L3 and the higher IMC latencies, which might also make XV's results worse, as shown by KTE.
     
  10. Dresdenboy

    Dresdenboy Golden Member

    Joined:
    Jul 28, 2003
    Messages:
    1,709
    Likes Received:
    504
    Were TDPs taken care of?

    Here are some results with x4 845 @ 95W:
    http://www.planet3dnow.de/cms/22697-erste-benchmarks-des-athlon-x4-845/

    And regarding IPC (similar to Looncraz's work), there is also the P3DNow test:
    http://www.planet3dnow.de/cms/18564...cavator-leistungsvergleich-der-architekturen/
    (article index is below the headline)
     
  11. KTE

    KTE Senior member

    Joined:
    May 26, 2016
    Messages:
    477
    Likes Received:
    130
    OT:

    It has always been a pretty funny exercise comparing benchmarks online - 16 years and enthusiasts still haven't developed firm approaches. Quite a conundrum :)

    Before Zen is out, I should say it. I have a problem with some common data. Let's face it,

    • Some benches are purely synthetics, true to their name with little correlation to reality.
    • Some synthetics show best case for an architecture and some show worst case results.
    • Then there are those which are popular among benchers.
    • There are even synthetic benchmarks which contain a good instruction mix and which correlate very well with what you find in the public domain.
    • Then benches which are testing a future capability.
    • Also benches which scale well or poorly from 1C to nC.
    • Furthermore, there are those which are optimized for one architecture more than the other.

    Then you have the real world apps, which can be divided to be:

    1. All or any of the above...
    2. RW apps most use and those most do not use.
    3. Benches running tasks and or functions which most use/do not use.
    4. Benches which run at sizes uncommon in RW.
    5. Benches which show results in percentages which mean little for actual runtimes (relevancy threshold)
    6. Then you have outliers, corner cases (best/worst) in the results, improper setups, test bed hampering, etc.

    My gripe is usually with 3-5. It is so easy to get caught up in numbers. What does 5fps mean at 70fps? What does 7s mean to unzip anything, unless it's a sub-25s runtime? But even then? Does 20s vs 25s boot time really matter? Would consumers perceive it? Would it improve their survival age by a day? :eek:

    We need to be able to look for a relevance threshold when advising and comparing, especially reviewers, and this is done by taking the benchmark, its result in actual figures, its applicability to the real world, and its total runtime and sizes into consideration. Percentage statistics are useful for analysis, to determine which is faster, but don't tell the complete tale for an end user.
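
    One way to picture such a threshold is to require a gap to be large both in actual figures and in percent before calling it relevant. This is only a sketch: the function name, the thresholds and the video encode case are made up for illustration, and the other two cases reuse the numbers from above.

```python
def assess(a, b, min_abs_diff, min_rel_diff):
    """Compare two results and decide whether the gap clears both thresholds."""
    abs_diff = abs(a - b)
    rel_diff = abs_diff / max(a, b)
    relevant = abs_diff >= min_abs_diff and rel_diff >= min_rel_diff
    return abs_diff, rel_diff, relevant


# (label, result A, result B, min absolute gap, min relative gap)
# All thresholds are hypothetical; pick ones that match what a user would notice.
cases = [
    ("boot time (s)", 20.0, 25.0, 10.0, 0.10),
    ("game (fps)", 70.0, 75.0, 10.0, 0.10),
    ("video encode (s)", 600.0, 900.0, 60.0, 0.10),
]
for label, a, b, min_abs, min_rel in cases:
    abs_diff, rel_diff, relevant = assess(a, b, min_abs, min_rel)
    print(f"{label}: gap {abs_diff:g} ({rel_diff:.0%}), relevant: {relevant}")
```

    With these made-up thresholds the 20% boot time gap is not flagged, because 5s is below the absolute cutoff, while the long encode is.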
     
  12. KTE

    KTE Senior member

    Joined:
    May 26, 2016
    Messages:
    477
    Likes Received:
    130
  13. Dresdenboy

    Dresdenboy Golden Member

    Joined:
    Jul 28, 2003
    Messages:
    1,709
    Likes Received:
    504
    He's still posting there in the forum, but less often, also not actively working on articles.
     
  14. KTE

    KTE Senior member

    Joined:
    May 26, 2016
    Messages:
    477
    Likes Received:
    130
    It would be good to see him back now in order to review Zen... As he's very knowledgeable with AMD CPUs and had pioneering experience from the Agena days.

     
  15. happy medium

    happy medium Lifer

    Joined:
    Jun 8, 2003
    Messages:
    13,379
    Likes Received:
    159
    AMD Zen ~= Intel 3930k @ 130watts.

    That's a good guess, and more than I expect.
     
  16. moonbogg

    moonbogg Diamond Member

    Joined:
    Jan 8, 2011
    Messages:
    9,036
    Likes Received:
    449
    If that's the case then Zen would have to be given away for nearly free. Most people would just buy a new Intel quad and get better performance. If it performs like Sandy Bridge, then AMD is stuck being the budget brand forever. Also, people don't care about 8 cores and they don't need them. A few of us around here might care but no one else does and no one needs it. Sad truth is, a fast quad is going to be just fine for another decade.
     
  17. Abwx

    Abwx Diamond Member

    Joined:
    Apr 2, 2011
    Messages:
    8,246
    Likes Received:
    154
  18. Azuma Hazuki

    Azuma Hazuki Senior member

    Joined:
    Jun 18, 2012
    Messages:
    515
    Likes Received:
    149
    Well, according to Passmark's CPU list that puts it at about 12000 points, where the 6700K is just below 11000. If they're selling this for $400 or below, especially $350 or below, this is a no-brainer. Single thread is fast enough, multithread is exceptional, and with DX12 this should be a decent enough gamer CPU.

    Personally, as a Gentoo fan who's stuck on Arch because compiling on a Core 2 Duo is made of pain, I am looking forward to this.
     
  19. The Stilt

    The Stilt Golden Member

    Joined:
    Dec 5, 2015
    Messages:
    1,202
    Likes Received:
    1,363
    Could you post a link where AMD said that? AMD said "Orochi" and the "FX-8350" was nothing but an interpretation of WCCF.

    If you look at the first (Orochi vs. Summit) and the second (Excavator vs. Zen) version of the slide, there is a pretty clear pattern. I think that's the very reason why the slides got pulled away :sneaky:

    When viewed in the original size, the height of Summit's / Zen's column is 658 pixels in the first version of the slide (Orochi vs. Summit) and 667 pixels in the second version (Excavator vs. Zen). Meanwhile the heights of Orochi's and Excavator's columns are 328 and 370 pixels respectively.

    658 / 328 ≈ 2.0061, i.e. 100.61% higher (Orochi vs. Summit)
    667 / 370 ≈ 1.8027, i.e. 80.27% higher (Excavator vs. Zen)

    In the very same section of the slide AMD states "significant performance leap expected - 40% IPC improvement".

    Is the ~80% higher column for Zen (vs. Excavator) just a coincidence, or does the slide have a 2:1 scale ;) Halved, it comes to ~40%, the same as the IPC improvement stated on the slide.
    50.3% would fit perfectly as the difference between Orochi (Piledriver) and Zen, considering the average performance difference between Orochi and Excavator in Cinebench.
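
    The column arithmetic can be reproduced with a few lines. This is a sketch that uses only the pixel heights given above; the 2:1 exaggeration is the hypothesis being tested, not something AMD has stated, and the function name is made up.

```python
def percent_higher(new_px, old_px):
    """How much taller the new column is than the old one, in percent."""
    return (new_px / old_px - 1.0) * 100.0


# Column heights in pixels, measured on the original slides.
orochi_vs_summit = percent_higher(658, 328)
xv_vs_zen = percent_higher(667, 370)

# Hypothetical 2:1 scale: the drawn difference would be twice the real one.
SCALE = 2.0
print(f"Orochi vs. Summit: drawn {orochi_vs_summit:.2f}%, implied {orochi_vs_summit / SCALE:.1f}%")
print(f"Excavator vs. Zen: drawn {xv_vs_zen:.2f}%, implied {xv_vs_zen / SCALE:.1f}%")
```

    Under that assumption the implied gains come out at roughly 50.3% and 40.1%, the latter lining up with the "40% IPC improvement" text on the slide.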

    Here are the originals (no resizing, lossless).

    [​IMG]


    [​IMG]
     
    #2044 The Stilt, Jun 19, 2016
    Last edited: Jun 19, 2016
  20. coercitiv

    coercitiv Golden Member

    Joined:
    Jan 24, 2014
    Messages:
    1,838
    Likes Received:
    602
    In other words, Zen 8C will have double the throughput only at the same frequency. Once a ~25% clock speed difference gets factored in, the throughput advantage drops from +100% to +50%.

    It all depends on final clocks.
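
    As a rough sketch of that dependency, assuming throughput scales linearly with clock speed and taking the 2x same-frequency figure as given (the clock deficits other than 25% are just illustrative, and the function name is made up):

```python
def throughput_ratio(same_clock_ratio, clock_deficit):
    """Throughput relative to the old chip once a lower clock is factored in."""
    return same_clock_ratio * (1.0 - clock_deficit)


SAME_CLOCK_RATIO = 2.0  # "double the throughput" at equal frequency

for deficit in (0.0, 0.10, 0.25):
    ratio = throughput_ratio(SAME_CLOCK_RATIO, deficit)
    print(f"{deficit:.0%} lower clock: {ratio - 1.0:+.0%} throughput")
```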
     
  21. Abwx

    Abwx Diamond Member

    Joined:
    Apr 2, 2011
    Messages:
    8,246
    Likes Received:
    154
    Why should there be a 2/1 scale..?..

    The 80% obviously doesn't apply to the IPC, so it's the other metric mentioned, that is, a Zen core has 80% more perf than an XV core. So this obviously applies to throughput; as you know, 80% over XV is 100% better than Piledriver in, say, Cinebench 11.5.

    You know that I do not use CB R15 because it doesn't mimic the results of CB 11.5 for SR and XV, so there's another factor at play, possibly cache size, and this produces results that aren't meaningful IPC-wise for these APUs, as the L3 cache equipped FX doesn't seem to suffer from this detail.
     
  22. KTE

    KTE Senior member

    Joined:
    May 26, 2016
    Messages:
    477
    Likes Received:
    130
    Of course, that depends on which application AMD means (unknown) or how many of them they are considering.
     
  23. Dresdenboy

    Dresdenboy Golden Member

    Joined:
    Jul 28, 2003
    Messages:
    1,709
    Likes Received:
    504
    Yep, that'll be good. I also wrote a few articles there (and am still waiting for my BD sample :D), and provided one part about a uarch last year. Maybe I'll be involved regarding Zen. ;)
     
  24. KTE

    KTE Senior member

    Joined:
    May 26, 2016
    Messages:
    477
    Likes Received:
    130
    It'd be good if you wrote the architectural side to Zen at least. Esp. in the preface, to mention the more important bandwidths and instruction latencies that have changed. You rarely if ever see that covered anywhere (except Aces/RWT, formerly).

    Also an idea for you to pass on for Zen (I haven't spoken to MusicIsMyLife since 2007 ;)): Electrical testing of the various voltage supply lines (like ht4u.net used to do).
     
  25. superstition

    superstition Platinum Member

    Joined:
    Feb 2, 2008
    Messages:
    2,219
    Likes Received:
    215
    The best thing is to use benchmarks that measure the different aspects of the CPU in a concise and targeted manner, such as:

    cache performance
    integer performance
    floating-point performance
    AVX performance
    AVX-2 performance
    graphical performance (for integrated graphics)

    performance per watt, maximum and minimum
    performance per watt over time

    A one-size-fits-all benchmark can have big drawbacks, like relying overly on a FP-heavy benchmark to characterize the difference between an 8370 and a 3770K. 8 integer cores and 4 floating point units is a design that is going to look particularly sub-par in a FP-heavy benchmark, unless those 4 floating point units are really powerful.

    "Real-world" is tricky because applications can be coded in a manner to favor one architecture over another. Cinebench, for instance, could be updated to further favor Intel by leaning even more heavily on something like AVX-2. I assume Skylake and Kaby Lake are going to have stronger AVX-2 performance than Zen. So, all one needs to cook up a benchmark that proves how sad Zen is is something "real world" like Cinebench "12" that leans very heavily on a specific Intel advantage. The opposite is a benchmark that doesn't use AVX at all in a circumstance where it would provide additional performance.

    If Intel were putting Broadwell C-style EDRAM in its chips a "real world" benchmark could also be designed to heavily favor a large victim cache. Since Zen is unlikely to have anything like that initially then that could be a big marketing edge. "Zen falls short of Kaby Lake by 60%!" (in benchmark that leans heavily on 128 MB of L4 cache). Of course, if Intel and the benchmark maker were to pursue this more obviously-exposed avenue it would have been wise to put the L4 on all Skylake as well as Broadwell E parts. Even more clever would have been to withhold the L4 until Skylake to claim that the big performance advantage is due to the newer Skylake cores.