Zen ES Benchmark from French Hardware Magazine

Mar 10, 2006
11,715
2,012
126
I don't remember, but recent Intel archs have the L3 clock separate from the core clock. Maybe Sandy Bridge has L3 at core clock? Or maybe Sandy Bridge does not have a ring bus?

SNB ran the L3$ at core clock. It was the first consumer implementation of the ring-bus from Intel.
 
Reactions: rvborgh

SpaceBeer

Senior member
Apr 2, 2016
307
100
116
You know the ULV i5/i7 (for Surface Book/ultrabooks) are basically the same as the desktop i3, only with lower clocks :D
 
Reactions: Headfoot

zinfamous

No Lifer
Jul 12, 2006
110,568
29,179
146
I think Zen won't initially impact the mainstream or gamers as much. The 8-core/16-thread Summit Ridge and 32-core/64-thread Naples parts will be most useful for workstation tasks such as:
  • Digital audio workstations/music production
  • 3D modeling, rendering, and video game development
  • Software development: building applications from source tarballs and building applications in IDEs
  • Video editing and production, rendering, and encoding, e.g. HandBrake for converting MakeMKV Blu-ray backups
For gamers, the Core i5 series and AMD FX series are still great for games, and the prices will only get better as time goes on. A Core i7-6900K or an 8-core/16-thread AMD Summit Ridge only provides a benefit in games with large maps, like strategy games, tycoon games, or city builders, but not in the majority of games. To better meet the needs of gamers, AMD should have a lower-power, lower-cost FX, a higher-clocked quad-core Zen, and maybe a quad-core Zen with an integrated Vega GPU that can CrossFire with a dedicated Vega GPU.

For servers, AMD is already planning products, but it remains to be seen whether they can make a significant dent, as this is Intel's biggest golden goose by far.

For Ultrabooks, Chromebooks, convertibles, mini desktops, HDMI sticks, and Windows tablet class products, AMD needs something to compete with the ULV Core i series, Intel Atom, and Core M products. I think AMD can crack this egg, but it is a harder egg to crack. ARM will probably be a better competitor in this area. I would love an 8" tablet with an active digitizer and an AMD processor and graphics.

For essential products or low-end computers, such as those built on the Celeron or Pentium series, AMD has competing products, but maybe Zen plus Vega can bring needed improvement in this segment.

Finally, another area that will be hard for AMD to crack is Intel's laptop-grade, full-voltage Core i series processors. Maybe improved APUs can help here, but this is by far the hardest egg of them all to crack. The laptop Core i5 processor is ubiquitous, so it will be hard to compete in this area.

My .02 Federal Reserve Notes.

Agree for the most part, but this really depends on pricing. We don't know anything about the 4/8 and 6/12 Zen parts, but those appear to be coming out after Ryzen, anyway. I'm of the camp that believes that Ryzen, if these benchmarks are real and might indeed get better, will be priced somewhere in the $500-600 range, minimum, for the gold-binned parts, anyway. Maybe even higher? Now, if AMD puts out an 8/16 Ryzen chip, say at 3.2GHz or so, for $400, maybe with some later bundles with 480/Vega and/or Mobo+DDR3 come Memorial Day or other sales holidays, or some $350 rebate deals at Microcenter, Fry's or wherever, then you will have a ton of potential 6700K/7700K customers thinking about doubling their true core count for a few extra shekels.

And I'm talking about gamers exclusively. The latest benchmarks with Watch Dogs 2, any modern RTS/strategy game, and other upcoming, optimized DX12 titles tell the informed gamer that more real cores are the future, and sooner than some want us to believe.
 

bjt2

Senior member
Sep 11, 2016
784
180
86
Anyone know why K10's L1 cache latency is so fast? Seems that even Zen hasn't beaten it.
K10 is a high-FO4, "high"-IPC, low-frequency design, so within the cycle time of the design they managed to have a 3-cycle L1. On 32nm (Llano) it hit a wall at 3GHz with 1.4V Vcore... Lowering FO4 to get higher clocks at low voltage required increasing the cache latency to stay within the new cycle timings...
 
Reactions: rvborgh

Thunder 57

Platinum Member
Aug 19, 2007
2,674
3,796
136
Anyone know why K10's L1 cache latency is so fast? Seems that even Zen hasn't beaten it.

K10 is a high-FO4, "high"-IPC, low-frequency design, so within the cycle time of the design they managed to have a 3-cycle L1. On 32nm (Llano) it hit a wall at 3GHz with 1.4V Vcore... Lowering FO4 to get higher clocks at low voltage required increasing the cache latency to stay within the new cycle timings...

Basically by design. Intel's Penryn had a 3-cycle L1 latency as well. They bumped it up to 4 cycles in Nehalem to allow higher frequencies. I know there is a review (I believe on AnandTech) that goes into this a little bit, but I can't seem to find it o_O.
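To put the cycle-count trade-off discussed above in concrete terms, here is a minimal Python sketch. The 3 GHz figure is the Llano wall mentioned in the thread; the 4 GHz clock is purely a hypothetical target chosen for illustration, not a measured part. It only shows that an extra L1 cycle need not cost anything in absolute time if the clock rises with it.

```python
def latency_ns(cycles: int, clock_ghz: float) -> float:
    """Absolute latency in nanoseconds for a hit that takes `cycles` at `clock_ghz`."""
    return cycles / clock_ghz

# 3-cycle L1 at the ~3 GHz wall quoted above for Llano.
slow_clock = latency_ns(3, 3.0)

# 4-cycle L1 at a hypothetical 4 GHz clock (assumed value, for illustration only).
fast_clock = latency_ns(4, 4.0)

print(f"3 cycles @ 3.0 GHz: {slow_clock:.2f} ns")
print(f"4 cycles @ 4.0 GHz: {fast_clock:.2f} ns")
```

Both cases come out to the same wall-clock latency, which is the sense in which a shorter cycle time "requires" more L1 cycles: the SRAM access takes roughly the same time either way.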
 
Reactions: rvborgh

krumme

Diamond Member
Oct 9, 2009
5,952
1,585
136
K10 is a high-FO4, "high"-IPC, low-frequency design, so within the cycle time of the design they managed to have a 3-cycle L1. On 32nm (Llano) it hit a wall at 3GHz with 1.4V Vcore... Lowering FO4 to get higher clocks at low voltage required increasing the cache latency to stay within the new cycle timings...
Hmm, I need this cache explained.

1. If, as you assume, the FO4 is low for Zen and lower than SKL, what does the similar latency mean?

2. And why can you just increase associativity without increasing latency? I mean, then why not go 1MB 32-way? BDW-E 256KB is 8-way and SKL 256KB is 4-way...
 

simas

Senior member
Oct 16, 2005
412
107
116
Now, if AMD puts out an 8/16 Ryzen chip, say at 3.2GHz or so, for $400, maybe with some later bundles with 480/Vega and/or Mobo+DDR3 come Memorial Day or other sales holidays, or some $350 rebate deals at Microcenter, Fry's or wherever, then you will have a ton of potential 6700K/7700K customers thinking about doubling their true core count for a few extra shekels.


I would be one of those customers seriously thinking about it, but I would need some guidance from AMD on when in 1H 2017 they are planning to do that, as well as an understanding of what their partners have motherboard-wise in terms of features. Show me an AM4 motherboard with 32GB+ RAM support and a decent number of PCIe lanes (don't care about CF/SLI at all) at a reasonable price, without the insane Intel premium. If you can fit a network port with >1 Gb speed, you have an instant sale.
 

bjt2

Senior member
Sep 11, 2016
784
180
86
Hmm, I need this cache explained.

1. If, as you assume, the FO4 is low for Zen and lower than SKL, what does the similar latency mean?

2. And why can you just increase associativity without increasing latency? I mean, then why not go 1MB 32-way? BDW-E 256KB is 8-way and SKL 256KB is 4-way...

I am not that into cache design, but maybe even if the latency is the same, the ways are different, and maybe the Zen cache can be clocked higher? Or maybe the cache is not the bottleneck. There is also the 6T/8T design in play. There are many unknowns. Knowing only the latency is not enough to say which cache will clock higher...
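For reference on the associativity question, a small Python sketch of how size, ways, and sets relate. It assumes 64-byte cache lines (stated here as an assumption, since the thread does not give a line size), and the 1MB 32-way case is the hypothetical from the question above, not a real part.

```python
def num_sets(size_bytes: int, ways: int, line_bytes: int = 64) -> int:
    """Number of sets in a set-associative cache: size / (ways * line size)."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("size must be a multiple of ways * line_bytes")
    return sets

KIB = 1024

# The two 256KB configurations mentioned in the question.
print("256KB  8-way:", num_sets(256 * KIB, 8), "sets")
print("256KB  4-way:", num_sets(256 * KIB, 4), "sets")

# The hypothetical "why not 1MB 32-way?" case.
print("1MB   32-way:", num_sets(1024 * KIB, 32), "sets")
```

Each lookup has to compare the tag against every way in the selected set, so going from 4 to 8 to 32 ways means proportionally more parallel tag comparisons and a wider way-select mux. That is one reason associativity is not free, even when the headline latency in cycles stays the same.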
 

witeken

Diamond Member
Dec 25, 2013
3,899
193
106
It wasn't long ago you foresaw Intel integrating the PCH on die, obviously for some kind of benefit.
Not sure I said that. I have never been the biggest proponent of integrating the PCH on-die. People like Ashraf have argued that it will reduce, say, power consumption. I've always doubted it would be significant, but sure, it would help, especially for smartphones (phablets), which is what CNL-Y will target. Unless Zen targets phones (lol), it's not all that important.
 

jpiniero

Lifer
Oct 1, 2010
14,585
5,207
136
Unless Zen targets phones (lol), it's not all that important.

It would save power, but it also would save on motherboard costs if OEMs can get away with chipsetless desktops. That's why AMD did it; because you know OEMs will appreciate that.
 

coercitiv

Diamond Member
Jan 24, 2014
6,187
11,858
136
Not sure I said that.
You did. Check the conversation.

I have never been the biggest proponent of integrating the PCH on-die. People like Ashraf have argued that it will reduce, say, power consumption. I've always doubted it would be significant, but sure, it would help, especially for smartphones (phablets), which is what CNL-Y will target.
Here I'm inclined to agree with you. Whether it's useful to integrate on die or just on package is a decision to make based on multiple factors, and lower power may not necessarily win even in mobile chips. This is a subject I hope we'll revisit on the forums.

Unless Zen targets phones (lol), it's not all that important.
I think I should further clarify what I meant by small form factor, since we're thinking about different things: even on mITX boards space is a problem nowadays. My Z170 board comes with the M.2 slot placed on the back of the board, and MSI only managed to make it a 60mm slot (at 80mm there's no room for the mounting screw, as other components are in the way). One may argue my example is anecdotal and better engineers might have solved that, but it still paints an interesting picture of what happens with all this miniaturization.

AMD already experimented with Kabini as a SoC, so they have some experience. As long as they make it work financially, there are benefits to be had from the integration, even if we're not talking mobile (15W TDP and below).
 

USER8000

Golden Member
Jun 23, 2012
1,542
780
136
The Xeon D is an SoC - I assume AMD might also be going after the same market in the future?
 

Dygaza

Member
Oct 16, 2015
176
34
101
And I'm talking about gamers exclusively. The latest benchmarks with Watch Dogs 2, any modern RTS/strategy game, and other upcoming, optimized DX12 titles tell the informed gamer that more real cores are the future, and sooner than some want us to believe.

This is exactly where we are heading. Slowly, yes, but still heading. If you already have a 4C/8T CPU, there is no real reason to upgrade to anything but a 6C+ CPU in the future. Of course it's a personal opinion, but I find it very hard to justify going to a 4C/8T CPU when there are already games where 8 hyperthreaded threads simply aren't enough. Naturally, single-core performance will stay important as well.
 

Insert_Nickname

Diamond Member
May 6, 2012
4,971
1,691
136
It would save power, but it also would save on motherboard costs if OEMs can get away with chipsetless desktops. That's why AMD did it; because you know OEMs will appreciate that.

AMD already experimented with Kabini as a SoC, so they have some experience. As long as they make it work financially, there are benefits to be had from the integration, even if we're not talking mobile (15W TDP and below).

I'm just speculating here, but I'd assume Summit Ridge's integrated FCH is primarily there because of the common AM4 socket. Since AMD has already done an integrated FCH on Kabini and Bristol Ridge, you can bet the upcoming Raven Ridge APUs will have one as well. So it might also be there for compatibility and commonality reasons.
 
Reactions: coercitiv

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
Basically by design. Intel's Penryn had a 3-cycle L1 latency as well. They bumped it up to 4 cycles in Nehalem to allow higher frequencies. I know there is a review (I believe on AnandTech) that goes into this a little bit, but I can't seem to find it o_O.

Anand himself said:
The L1 cache is the same size as what we have in Penryn, but it’s actually slower (4 cycles vs. 3 cycles). Intel slowed down the L1 cache as it was gating clock speed, especially as the chip grew in size and complexity. Intel estimated a 2 - 3% performance hit due to the higher latency L1 cache in Nehalem.
http://www.anandtech.com/show/2594/9

Other articles:
http://www.anandtech.com/show/2658
http://www.anandtech.com/show/2663
 

KTE

Senior member
May 26, 2016
478
130
76
K10 is a high-FO4, "high"-IPC, low-frequency design, so within the cycle time of the design they managed to have a 3-cycle L1. On 32nm (Llano) it hit a wall at 3GHz with 1.4V Vcore... Lowering FO4 to get higher clocks at low voltage required increasing the cache latency to stay within the new cycle timings...

Llano is walled for reasons other than being a design with few pipeline stages.

F10h went to 3.7GHz shippable on 45nm...

Caches are designed by the architect compromising between speed/size/associativity.

Sent from HTC 10
(Opinions are own)
 

bjt2

Senior member
Sep 11, 2016
784
180
86
Other scans of the French magazine: http://imgur.com/a/qo9pH

IBQ: queue of bytes from L1I. 20x16 bytes, 10 per thread in SMT mode. In old CPUs it was 2 or 4 x 16B, if I remember correctly.

uop cache: 2K uops. AFAIK Intel has 1.5K uops, and in Zen, for microcoded instructions, there should be only the pointer anyway, unlike Intel. Moreover, AMD's MOPs are denser, so a given program should translate into fewer uops.

uop cache throughput: 8 uops/cycle. This was not specified in the Hot Chips slides (I checked). The diagram is simplified (it lacks the ucode ROM and stack memfile). Intel has 1.5K uops and 6 uops/cycle throughput.
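As a quick sanity check on the front-end figures above, here is a small Python sketch of the arithmetic. All inputs are the numbers quoted from the magazine scans in this post, which are unconfirmed. Reading "2k" as 2048 and "1.5K" as 1536 entries is an assumption, and an even split of the IBQ between two SMT threads is taken from the "10 per thread" remark.

```python
# Figures as quoted in the post (unconfirmed engineering-sample data).
IBQ_ENTRIES = 20        # instruction byte queue entries
IBQ_ENTRY_BYTES = 16    # bytes per entry
SMT_THREADS = 2

ZEN_UOP_CACHE = 2048    # "2k uops", assumed to mean 2048 entries
INTEL_UOP_CACHE = 1536  # "1.5K uops", assumed to mean 1536 entries

ibq_total_bytes = IBQ_ENTRIES * IBQ_ENTRY_BYTES
ibq_bytes_per_thread = (IBQ_ENTRIES // SMT_THREADS) * IBQ_ENTRY_BYTES
uop_cache_ratio = ZEN_UOP_CACHE / INTEL_UOP_CACHE

print(f"IBQ total: {ibq_total_bytes} bytes")
print(f"IBQ per thread in SMT mode: {ibq_bytes_per_thread} bytes")
print(f"uop cache capacity, Zen vs Intel: {uop_cache_ratio:.2f}x")
```

The capacity ratio counts entries only. It says nothing about the density of what each entry holds, which is the separate point made above about MOPs and microcode pointers.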
 
Reactions: psolord

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
https://twitter.com/CPCHardware/with_replies

Canard PC Hardware @CPCHardware, 18 hours ago
@Dresdenboy @InstLatX64 Yep. SMT activated. But current samples seem to have a serious bug with SMT & µop cache enabled.

If the sample that Canard PC used had a bug, then there is no way we can arrive at an accurate conclusion about Zen IPC and SR performance. I think the final production chip that goes out to reviewers is what matters. Hopefully AMD can sample the tech press at CES 2017 and launch by Feb 2017. AMD can let the press run benchmarks and publish a preview with benchmarks and architectural details, but withhold pricing information until retail launch.
 

Ajay

Lifer
Jan 8, 2001
15,431
7,849
136
Bugs with the uop cache w/SMT enabled?? Has AMD spun a new A1?! Damn, can't read French.
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
uop cache throughput: 8 uops/cycle. This was not specified in the Hot Chips slides (I checked). The diagram is simplified (it lacks the ucode ROM and stack memfile). Intel has 1.5K uops and 6 uops/cycle throughput.
I saw that, too (having the print edition). I think those 8 uops (actually "instr") per cycle are wrong and should be 6, as shown at Hot Chips. There should also be 32B/cycle going from L2$ to L1I$.