ZEN ES Benchmark from french hardware Magazine

Arachnotronic · Dec 27, 2016

bjt2 said:
I don't remember, but recent Intel archs have the L3 clock separate from the core clock. Maybe Sandy Bridge has the L3 at core clock? Or maybe Sandy Bridge does not have a ring bus?

SNB ran the L3$ at core clock. It was the first consumer implementation of the ring-bus from Intel.

zinfamous · Dec 27, 2016

itsmydamnation said:
So who exactly is buying an i5/i7 level desktop system (4core 8 thread) to "surf the web, play videos, play Facebook/Flash games, etc"? Sounds like the I3/Pentium range to me.

Surfacebook. I would not have bought this thing if it were packed with an i3.

SpaceBeer · Dec 27, 2016

You know ULV i5/i7 (for Surface Book/ultrabooks) are basically the same as desktop i3, only with lower clocks :D

rvborgh · Dec 27, 2016

Anyone know why K10's L1 cache latency is so fast? Seems that even Zen hasn't beaten it.

zinfamous · Dec 27, 2016

railroadmaster said:
I think Zen won't initially impact the mainstream or gamers as much. The 8-core, 16-thread Summit Ridge and 32-core, 64-thread Naples parts will be most useful for workstation stuff such as:

Digital audio workstations/music production

3D models, rendering and video game development

Software development, building applications from source tarballs. Building applications in IDEs.

Video editing and production, rendering, encoding, HandBrake for converting MakeMKV Blu-ray backups.

For gamers the Core i5 series and AMD FX series are still great for games, and the prices will only get better as time goes on. A Core i7-6900K or AMD Summit Ridge 8-core, 16-thread part only provides a benefit in games with large maps, like strategy games, tycoon games, or city builders, but not in the majority of games. To better meet the needs of gamers, AMD should have a lower-power, lower-cost FX, a higher-clocked quad-core Zen, and maybe a quad-core Zen with an integrated Vega GPU that can CrossFire with a dedicated Vega GPU.

For servers AMD is already planning products, but it remains to be seen if they can make a significant dent, as this is Intel's biggest golden goose by far.

For Ultrabook, Chromebook, convertible, mini desktop, HDMI stick, and Windows tablet class products, AMD needs something to compete with the ULV Core i series, Intel Atoms and Core M products. I think AMD can crack this egg, but this is a harder egg to crack. ARM will probably be a better competitor in this area. I would love an 8" tablet with an active digitizer and AMD processor and graphics.

For essential products or low-end computers, such as the Celeron or Pentium series, AMD has competing products, but maybe Zen plus Vega can bring needed improvement in this segment.

Finally, another area that will be hard for AMD to crack is Intel's laptop-grade, full-voltage Core i series processors. Maybe improved APUs can help here, but this is by far the hardest egg of them all to crack. The laptop Core i5 processor is ubiquitous, so it will be hard to compete in this area.

My .02 Federal Reserve Notes.

Agree for the most part, but this really depends on pricing. We don't know anything about the 4/8 and 6/12 Zen parts, but those appear to be coming out after Ryzen anyway. I'm in the camp that believes Ryzen, if these benchmarks are real and might indeed get better, will be priced somewhere in the $500-600 range, minimum, for the gold-binned parts anyway. Maybe even higher? Now, if AMD puts out an 8/16 Ryzen chip, say at 3.2GHz or so, for $400, maybe with some later bundles with 480/Vega and/or Mobo+DDR3 come Memorial Day or other sales holidays, or some $350 rebate deals at Microcenter, Fry's or whatever, then you will have a ton of potential 6700K/7700K customers thinking about doubling their true core count for a few extra shekels.

And I'm talking about gamers exclusively. The latest benchmarks with Watchdogs 2, any modern RTS/strategy game, and other upcoming, optimized DX12 titles tell the informed gamer that more real cores are the future, and sooner than some want us to believe.

bjt2 · Dec 27, 2016

rvborgh said:
Anyone know why K10's L1 cache latency is so fast? Seems that even Zen hasn't beaten it.

K10 is a high-FO4, "high"-IPC, low-frequency design, so within the design's cycle time they managed a 3-cycle L1. On 32nm (Llano) it hit a wall at 3GHz with 1.4V Vcore... Lowering FO4 to get a higher clock at low voltage required increasing the cache latency, to fit within the new cycle timings...
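
To put rough numbers on that trade-off: absolute latency is just cycles divided by clock, so a longer-latency L1 only breaks even if the clock rises enough. A minimal sketch, assuming nothing beyond that arithmetic; the helper names are hypothetical and the 4-cycle case is illustrative, not a measured Zen figure.

```python
import math


def latency_ns(cycles: int, clock_ghz: float) -> float:
    """Absolute latency in nanoseconds: cycle count divided by clock in GHz."""
    return cycles / clock_ghz


def break_even_clock_ghz(old_cycles: int, old_clock_ghz: float, new_cycles: int) -> float:
    """Clock a design with new_cycles needs to match the old latency in ns."""
    return old_clock_ghz * new_cycles / old_cycles


if __name__ == "__main__":
    # 3-cycle L1 at the 3 GHz wall quoted above for Llano.
    base = latency_ns(3, 3.0)
    # Illustrative assumption: a 4-cycle L1 on a design built to clock higher.
    needed = break_even_clock_ghz(3, 3.0, 4)
    print(f"3 cycles @ 3.0 GHz = {base:.3f} ns")
    print(f"a 4-cycle L1 matches that at {needed:.2f} GHz")
    assert math.isclose(latency_ns(4, needed), base)
```

Below the break-even clock the 4-cycle cache is slower in wall-clock terms; above it, it is faster despite the extra cycle.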

zinfamous · Dec 27, 2016

SpaceBeer said:
You know ULV i5/i7 (for Surface Book/ultrabooks) are basically the same as desktop i3, only with lower clocks :D

ORYLY? damn tricksters!

Thunder 57 · Dec 27, 2016

rvborgh said:
Anyone know why K10's L1 cache latency is so fast? Seems that even Zen hasn't beaten it.

bjt2 said:
K10 is a high-FO4, "high"-IPC, low-frequency design, so within the design's cycle time they managed a 3-cycle L1. On 32nm (Llano) it hit a wall at 3GHz with 1.4V Vcore... Lowering FO4 to get a higher clock at low voltage required increasing the cache latency, to fit within the new cycle timings...

Basically by design. Intel's Penryn had a 3-cycle L1 latency as well. They bumped it up to 4 cycles in Nehalem to allow higher frequencies. I know there is a review (I believe on Anandtech) that goes into this a little bit, but I can't seem to find it.

krumme · Dec 27, 2016

bjt2 said:
K10 is a high-FO4, "high"-IPC, low-frequency design, so within the design's cycle time they managed a 3-cycle L1. On 32nm (Llano) it hit a wall at 3GHz with 1.4V Vcore... Lowering FO4 to get a higher clock at low voltage required increasing the cache latency, to fit within the new cycle timings...

Hmm, I need this cache explained.

1. If, as you assume, the FO4 for Zen is low, and lower than SKL's, what does the similar latency mean?

2. And why can you just increase associativity without increasing latency? I mean, then why not go 1MB 32-way? BDW-E's 256KB is 8-way and SKL's 256KB is 4-way...

FlanK3r · Dec 27, 2016

cache latency seems good

coercitiv · Dec 27, 2016

witeken said:
What do you mean?

It wasn't long ago you foresaw Intel integrating the PCH on die, obviously for some kind of benefit.

simas · Dec 27, 2016

zinfamous said:
Now, if AMD puts out an 8/16 Ryzen chip, say at 3.2GHz or so, for $400, maybe with some later bundles with 480/Vega and/or Mobo+DDR3 come Memorial Day or other sales holidays, or some $350 rebate deals at Microcenter, Fry's or whatever, then you will have a ton of potential 6700K/7700K customers thinking about doubling their true core count for a few extra shekels.

I would be one of those customers seriously thinking about it, but I would need some guidance from AMD on when in 1H 2017 they are planning to do that, as well as an understanding of what their partners have motherboard-wise in terms of features. Show me an AM4 motherboard with 32GB+ RAM support and a decent number of PCIe lanes (don't care about CF/SLI at all), at a reasonable price without the insane Intel premium. If you can fit a network port with >1 Gb speed, you have an instant sale.

bjt2 · Dec 27, 2016

krumme said:
Hmm, I need this cache explained.

1. If, as you assume, the FO4 for Zen is low, and lower than SKL's, what does the similar latency mean?

2. And why can you just increase associativity without increasing latency? I mean, then why not go 1MB 32-way? BDW-E's 256KB is 8-way and SKL's 256KB is 4-way...

I am not so into cache design, but maybe even if the latency is the same, the ways are different and maybe the Zen cache can be clocked higher? Or maybe the cache is not the bottleneck. There is also the 6T/8T cell design in play. There are many unknowns. Knowing only the latency is not enough to say which cache will clock higher...
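
For the associativity question, the usual set-associative arithmetic at least shows what changes between the configurations mentioned. A rough sketch, assuming 64-byte lines (not stated in the thread) and a hypothetical `CacheGeometry` helper; it only counts sets and parallel tag compares and says nothing about the real circuit timing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    size_bytes: int
    ways: int
    line_bytes: int = 64  # assumption: 64-byte cache lines

    @property
    def sets(self) -> int:
        """Number of sets = capacity / (ways * line size)."""
        return self.size_bytes // (self.ways * self.line_bytes)

    @property
    def index_bits(self) -> int:
        """Address bits used to pick a set (valid for power-of-two set counts)."""
        return self.sets.bit_length() - 1

    @property
    def tag_compares_per_lookup(self) -> int:
        """Every way in the selected set is checked, so one compare per way."""
        return self.ways


if __name__ == "__main__":
    KIB = 1024
    configs = {
        "256KB 8-way": CacheGeometry(256 * KIB, 8),
        "256KB 4-way": CacheGeometry(256 * KIB, 4),
        "1MB 32-way (the hypothetical above)": CacheGeometry(1024 * KIB, 32),
    }
    for name, g in configs.items():
        print(f"{name}: {g.sets} sets, {g.index_bits} index bits, "
              f"{g.tag_compares_per_lookup} tag compares per lookup")
```

More ways means more tags to compare and a wider way-select mux per lookup, which is the usual reason associativity is not free; whether that lands on the critical path of a given design is exactly the kind of unknown mentioned above.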

witeken · Dec 27, 2016

coercitiv said:
It wasn't long ago you foresaw Intel integrating the PCH on die, obviously for some kind of benefit.

Not sure I said that. I have never been the biggest proponent of integrating the PCH on-die. People like Ashraf have argued that it will reduce, say power consumption, but I've always doubted it would be significant, but sure it would help, especially for smartphones (phablets) which is what CNL-Y will target. Unless Zen targets phones (lol), it's not all that important.

jpiniero · Dec 27, 2016

witeken said:
Unless Zen targets phones (lol), it's not all that important.

It would save power, but it also would save on motherboard costs if OEMs can get away with chipsetless desktops. That's why AMD did it; because you know OEMs will appreciate that.

coercitiv · Dec 27, 2016

witeken said:
Not sure I said that.

You did. Check the conversation.

witeken said:
I have never been the biggest proponent of integrating the PCH on-die. People like Ashraf have argued that it will reduce, say power consumption, but I've always doubted it would be significant, but sure it would help, especially for smartphones (phablets) which is what CNL-Y will target.

Here I'm inclined to agree with you, whether it's useful to integrate on die or just on package is a decision to make based on multiple factors, and lower power may not necessarily win even in mobile chips. This is a subject I hope we'll revisit on the forums.

witeken said:
Unless Zen targets phones (lol), it's not all that important.

I think I should further clarify what I meant by small form factor, since we're thinking about different things: even on mITX boards space is a problem nowadays. My Z170 board comes with the M.2 slot placed on the back of the board, and MSI only managed to make it a 60mm slot (at 80mm there's no room for the mounting screw, other components are in the way). One may argue my example is anecdotal and better engineers might have solved that, but it still paints an interesting picture of what happens with all this miniaturization.

AMD already experimented with Kabini as an SoC, so they have some experience. As long as they make it work financially, there are benefits to be had from the integration, even if we're not talking mobile (15W TDP and below).

USER8000 · Dec 27, 2016

The Xeon D is an SOC - I assume AMD might be also going after the same market in the future?

Dygaza · Dec 27, 2016

zinfamous said:
And I'm talking about gamers exclusively. The latest benchmarks with Watchdogs 2, any modern RTS/strategy game, and other upcoming, optimized DX12 titles tell the informed gamer that more real cores are the future, and sooner than some want us to believe.

This is exactly where we are heading. Slowly, yes, but still heading. If you already have a 4C/8T CPU, there is no real reason to upgrade to anything but a 6C+ CPU in the future. Of course it's a personal opinion, but I find it very hard to justify going to a 4C/8T CPU when there are already games where 8 hyperthreaded threads simply aren't enough. Naturally, single-core performance will stay important as well.

Insert_Nickname · Dec 27, 2016

jpiniero said:
It would save power, but it also would save on motherboard costs if OEMs can get away with chipsetless desktops. That's why AMD did it; because you know OEMs will appreciate that.

coercitiv said:
AMD already experimented with Kabini as an SoC, so they have some experience. As long as they make it work financially, there are benefits to be had from the integration, even if we're not talking mobile (15W TDP and below).

I'm just speculating here, but I'd assume Summit Ridge's integrated FCH is primarily there because of the common AM4 socket. Since AMD has already done an integrated FCH on Kabini and Bristol Ridge, you can bet the upcoming Raven Ridge APUs will have one as well. So it might also be there for compatibility and commonality reasons.

Dresdenboy · Dec 27, 2016

Thunder 57 said:
Basically by design. Intel's Penryn had a 3-cycle L1 latency as well. They bumped it up to 4 cycles in Nehalem to allow higher frequencies. I know there is a review (I believe on Anandtech) that goes into this a little bit, but I can't seem to find it.

Anand himself said:
The L1 cache is the same size as what we have in Penryn, but it’s actually slower (4 cycles vs. 3 cycles). Intel slowed down the L1 cache as it was gating clock speed, especially as the chip grew in size and complexity. Intel estimated a 2 - 3% performance hit due to the higher latency L1 cache in Nehalem.

http://www.anandtech.com/show/2594/9

Other articles:
http://www.anandtech.com/show/2658
http://www.anandtech.com/show/2663

KTE · Dec 27, 2016

bjt2 said:
K10 is a high-FO4, "high"-IPC, low-frequency design, so within the design's cycle time they managed a 3-cycle L1. On 32nm (Llano) it hit a wall at 3GHz with 1.4V Vcore... Lowering FO4 to get a higher clock at low voltage required increasing the cache latency, to fit within the new cycle timings...

Llano hit its wall for reasons other than being a low-pipeline-stage design.

F10h went to 3.7GHz shippable on 45nm...

Caches are designed by the architect compromising between speed/size/associativity.

Sent from HTC 10
(Opinions are own)

bjt2 · Dec 27, 2016

Other scans of the French magazine: http://imgur.com/a/qo9pH

IBQ: queue of bytes from L1I. 20x16 bytes, 10 per thread in SMT mode. In older CPUs it was 2 or 4 x16B, if I remember correctly.

uop cache: 2K uops. AFAIK Intel has 1.5K uops, and in Zen, for microcoded instructions, there should be only the pointer anyway, unlike Intel. Moreover, AMD's MOPs are denser, so a given program should translate into fewer uops.

uop cache throughput: 8 uops/cycle. This was not specified in the Hot Chips slides (I checked). The diagram is simplified (it lacks the ucode ROM and stack memfile). Intel has 1.5K uops and 6 uops/cycle throughput.
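
The figures above reduce to a few lines of arithmetic. A quick sketch, taking the numbers exactly as quoted from the magazine scan (the 8 uops/cycle value is unconfirmed) and assuming "K" means 1024; the variable names are only for illustration.

```python
from fractions import Fraction

K = 1024  # assumption: "2K" and "1.5K" are multiples of 1024

# Instruction byte queue: 20 entries of 16 bytes, split evenly across 2 SMT threads.
IBQ_ENTRIES = 20
IBQ_ENTRY_BYTES = 16
ibq_total_bytes = IBQ_ENTRIES * IBQ_ENTRY_BYTES
ibq_per_thread_bytes = (IBQ_ENTRIES // 2) * IBQ_ENTRY_BYTES

# uop cache capacity and delivery rate as quoted (Zen per the scan, Intel per the post).
zen_uops, zen_uops_per_cycle = 2 * K, 8
intel_uops, intel_uops_per_cycle = 3 * K // 2, 6

capacity_ratio = Fraction(zen_uops, intel_uops)
throughput_ratio = Fraction(zen_uops_per_cycle, intel_uops_per_cycle)

if __name__ == "__main__":
    print(f"IBQ: {ibq_total_bytes} bytes total, {ibq_per_thread_bytes} bytes per thread in SMT")
    print(f"uop cache capacity ratio (Zen/Intel): {capacity_ratio}")
    print(f"uop cache throughput ratio (Zen/Intel): {throughput_ratio}")
```

If the real delivery rate turns out to be 6 uops/cycle rather than 8, the throughput ratio drops to 1 while the capacity ratio stays the same.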

raghu78 · Dec 27, 2016

https://twitter.com/CPCHardware/with_replies

Canard PC Hardware @CPCHardware · 18 hours ago
@Dresdenboy @InstLatX64 Yep. SMT activated. But current samples seem to have a serious bug with SMT & µop cache enabled.

If the sample that Canard PC used had a bug, then there is no way we can arrive at an accurate conclusion about Zen IPC and SR performance. I think the final production chip that goes out to reviewers is what matters. Hopefully AMD can sample the tech press at CES 2017 and launch by Feb 2017. AMD can let the press run benchmarks and allow a preview with benchmarks and architectural details, but withhold pricing information until retail launch.

Ajay · Dec 27, 2016

Bugs with the uop cache w/SMT enabled?? Has AMD spun a new A1?! Damn, can't read French.

Dresdenboy · Dec 27, 2016

bjt2 said:
uop cache throughput: 8 uops/cycle. This was not specified in the Hot Chips slides (I checked). The diagram is simplified (it lacks the ucode ROM and stack memfile). Intel has 1.5K uops and 6 uops/cycle throughput.

I saw that, too (having the print edition). I think those 8 uops (actually "instr") per cycle are wrong and should be 6, as shown at Hot Chips. There should also be 32B/cycle going from L2$ to L1I$.
