New Zen microarchitecture details

The Stilt · Jun 4, 2016

DrMrLordX said:
AMD has said that they aren't going to be the "bargain" brand anymore. $299 Summit Ridge makes them the bargain brand.

While completely true, what does releasing Radeon RX 480 at 199$ make them? :sneaky:

Personally, I understood the "not going to be a bargain brand anymore" statement to mean that they won't have those <70$ APUs (which nobody really wants, even for free) available in retail anymore (after Zen happens).

sirmo · Jun 4, 2016

The Stilt said:
While completely true, what does releasing Radeon RX 480 at 199$ make them? :sneaky:

Personally, I understood the "not going to be a bargain brand anymore" statement to mean that they won't have those <70$ APUs (which nobody really wants, even for free) available in retail anymore (after Zen happens).

Bargain brand is a broad term. AMD saying they won't be a bargain brand is them saying what we all know to be true which is that Bulldozer derived chips aren't competitive at the high end.

That doesn't mean they aren't allowed to compete on price; they have market share to take back. And the RX 480 is exactly that. Pound for pound it looks like a screamer. And they never said they scrapped the Vega parts. People like to take everything AMD says out of context and spin it in the worst possible light for some reason.

JDG1980 · Jun 4, 2016

The Stilt said:
While completely true, what does releasing Radeon RX 480 at 199$ make them? :sneaky:

Personally, I understood the "not going to be a bargain brand anymore" statement to mean that they won't have those <70$ APUs (which nobody really wants, even for free) available in retail anymore (after Zen happens).

The specific statement was as follows: "The idea that AMD is a cheap solution has to be replaced with the idea that AMD is a very competitive solution." Now, "cheap" in this context could simply mean low price, but that term often has connotations of poor quality or corner-cutting as well.

What AMD wants to avoid is the situation where they have to blow out their products at bargain-basement prices (making little or no profit) because those products are simply not competitive at all. They have been stuck in that position in the x86 market since late 2011, when Bulldozer first dropped (and flopped).

Part of offering a "competitive solution" is a good price. Not bargain-basement blowouts, but competition means a company like AMD with lesser brand equity than its major competitors needs to offer better perf/$ if it wants to gain market share. As you noted, the Polaris 10 announcement seems to indicate they realize this.

Boze · Jun 4, 2016

The Stilt said:
So far I think I've said anything between 2.6GHz - 3.2GHz as the base and anything between 3.2GHz - 3.8GHz for the maximum boost.

It'll need to be at least 3.0 GHz to really turn any heads. I can't imagine a lot of enthusiastic consumer support for a 2.8 GHz 8c/16t chip, but I could be wrong.

The biggest plus of this, ironically, is that it should push Intel to start releasing consumer level K chips with 6c/12t.

This era of 4c/4t and 4c/8t is at an end. DirectX 12 has ensured that.

JDG1980 · Jun 4, 2016

coffeemonster said:
What are you basing that on? That is wildly contradictory to what has already been shown in the Carrizo/Bristol Ridge threads.

Athlon X4 845 (Excavator with 3.8 GHz max turbo) gets 95 points in Cinebench R15, as mentioned previously.

The Nehalem i7-990X has a turbo clock speed almost the same (3.73 GHz). It gets 117 points in Cinebench R15. Thus, Nehalem beats Excavator in IPC by no less than 23 percent. That's a huge margin.

There was no Thuban chip with a turbo clock as fast as X4 845, but the X6 1100T comes close, with a 3.7 GHz turbo. It scores 94 points in Cinebench R15. Now, if that turbo clock is actually being used, then it indicates Thuban IPC is roughly equivalent to Excavator IPC. But Anandtech described Thuban turbo core as "pretty much non-functional". If that X6 1100T was actually running at only the 3.3 GHz base clock, then Excavator IPC is still substantially worse than Thuban.

Conroe is known to have very similar IPC to Thuban so the same applies here. (No Conroe CPU, as far as I know, ever had an official clock rate as high as the other products discussed above.)
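To make the per-clock comparison explicit, here is a rough sketch that divides each Cinebench R15 score quoted above by the clock the chip is assumed to be running at. It assumes the score scales linearly with clock speed, which is only an approximation, and the function names are made up for illustration.

```python
# Cinebench R15 single-thread scores and clocks quoted in this post.
# Assumption: score scales linearly with clock, so score / GHz approximates IPC.

def points_per_ghz(score, ghz):
    """Return the benchmark score normalised to 1 GHz."""
    return score / ghz

def relative_ipc(score_a, ghz_a, score_b, ghz_b):
    """Return chip A's per-clock score divided by chip B's (1.0 means equal)."""
    return points_per_ghz(score_a, ghz_a) / points_per_ghz(score_b, ghz_b)

excavator = (95, 3.8)       # Athlon X4 845 at max turbo
nehalem = (117, 3.73)       # i7-990X at turbo
thuban_turbo = (94, 3.7)    # X6 1100T if turbo is actually engaged
thuban_base = (94, 3.3)     # X6 1100T if it stays at base clock

for name, chip in [("Nehalem", nehalem),
                   ("Thuban @ turbo", thuban_turbo),
                   ("Thuban @ base", thuban_base)]:
    ratio = relative_ipc(*chip, *excavator)
    print(f"{name}: {100 * (ratio - 1):+.1f}% per clock vs Excavator")
```

Under these assumptions Nehalem comes out roughly 25% ahead per clock, while Thuban is about level with Excavator if its turbo is engaged and roughly 14% ahead if it was really running at base clock.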

The Stilt · Jun 4, 2016

"Hounds" get ~100 points in CB R15 at 3.8GHz.

krumme · Jun 4, 2016

I think you guys underestimate the TDP limitation. (And the huge advantage of it, btw.)
It's -less- than 2/3 of BDW-E's.
And it hits at the base frequency. It's got to hurt.
What frequency do you think an 8c BDW-E would have at a 95W TDP? (And please, no "Intel's 140W TDP is really 110W".)

And that's even with a process tuned for the purpose, unlike the GF process, where the history is less than convincing.

How does a 60W TDP 8-core sound to you? To my ears it's lots of business opportunities. The point is getting into a situation where you are superior. That's where the profit is.

Striving for 8c BDW-E-like performance minus 10% is just a 100% dead end and it's not going to happen. Besides, it's more probably SKL-E that is the main competitor. I hope they target this product differently. Much leaner and cheaper.

The Stilt · Jun 4, 2016

Hardware.fr measured 117.6W as VRIN power from EPS12V during Prime95 on i7-6950X :sneaky: When VRM (~85%) and FIVR (~80%) losses are accounted for, that's around 80W.
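For anyone who wants to check the arithmetic, here is a minimal sketch of the loss chain. The two efficiency figures are the rough assumptions stated above, not measured values, and the function name is purely illustrative.

```python
def power_after_losses(input_watts, *efficiencies):
    """Multiply the input power by each conversion-stage efficiency in turn."""
    remaining = input_watts
    for eff in efficiencies:
        remaining *= eff
    return remaining

vrin_input_w = 117.6  # EPS12V input measured by Hardware.fr (Prime95, i7-6950X)
vrm_eff = 0.85        # assumed motherboard VRM efficiency
fivr_eff = 0.80       # assumed on-die FIVR efficiency

core_w = power_after_losses(vrin_input_w, vrm_eff, fivr_eff)
print(f"Estimated power reaching the cores: {core_w:.1f} W")
```

Lower efficiency assumptions push the estimate for the cores down further, which is why the choice of these two figures matters so much.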

Zeppelin will be available in various different TDP configurations, however 95W is expected to be the highest for consumer platform (Summit Ridge, AM4).

EDIT: Link http://www.hardware.fr/news/14643/intel-lance-i7-bdw-e-i7-6950x-tete.html

coffeemonster · Jun 4, 2016

JDG1980 said:
Athlon X4 845 (Excavator with 3.8 GHz max turbo) gets 95 points in Cinebench R15, as mentioned previously.
The Nehalem i7-990X has a turbo clock speed almost the same (3.73 GHz). It gets 117 points in Cinebench R15. Thus, Nehalem beats Excavator in IPC by no less than 23 percent. That's a huge margin.
...Excavator IPC is still substantially worse than Thuban.
...Conroe is known to have very similar IPC to Thuban

I'm no expert on benches, but why then does Carrizo beat Thuban and Nehalem in Geekbench and Passmark (and SuperPi too, I believe)?
Intel Core i7-975 (Nehalem), turbo 3.6GHz, Passmark single thread 1463 https://www.cpubenchmark.net/cpu.php?cpu=Intel+Core+i7-975+@+3.33GHz&id=841
Phenom II X4 980 (K10), 3.7GHz, Passmark single thread 1302 http://www.cpubenchmark.net/cpu.php?cpu=AMD+Phenom+II+X4+980
Athlon X4 740 (Piledriver), turbo 3.7GHz, Passmark single thread 1284 https://www.cpubenchmark.net/cpu.php?cpu=AMD+Athlon+X4+740+Quad+Core
Athlon X4 840 (Steamroller), turbo 3.8GHz, Passmark single thread 1536 https://www.cpubenchmark.net/cpu.php?cpu=AMD+Athlon+X4+840&id=2463
Athlon X4 845 (Excavator), turbo 3.8GHz, Passmark single thread 1775 https://www.cpubenchmark.net/cpu.php?cpu=AMD+Athlon+X4+845&id=2721

For gaming and real desktop usage, I frequently read that Piledriver and newer quads trump Phenoms even at the same clock (Piledriver mainly for newer ISAs, PM and OC potential). Which is what the above benches would suggest.
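To put the Passmark numbers above on a per-clock footing, here is a quick sketch that divides each single-thread score by the listed clock. It assumes each chip actually held the listed clock during the run and that the score scales linearly with frequency; neither is guaranteed, so treat the ranking as a rough indication only.

```python
# (name, clock in GHz, Passmark single-thread score) as listed in this post.
chips = [
    ("Core i7-975 (Nehalem)", 3.6, 1463),
    ("Phenom II X4 980 (K10)", 3.7, 1302),
    ("Athlon X4 740 (Piledriver)", 3.7, 1284),
    ("Athlon X4 840 (Steamroller)", 3.8, 1536),
    ("Athlon X4 845 (Excavator)", 3.8, 1775),
]

def score_per_ghz(chip):
    """Return the single-thread score normalised to 1 GHz."""
    _, ghz, score = chip
    return score / ghz

# Highest per-clock score first.
ranked = sorted(chips, key=score_per_ghz, reverse=True)

for chip in ranked:
    print(f"{chip[0]:<30} {score_per_ghz(chip):6.1f} points/GHz")
```

On these figures the X4 845 leads per clock, with the i7-975 and X4 840 close together behind it and the Phenom II and X4 740 at the bottom.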

KTE · Jun 4, 2016

The Stilt said:
Hardware.fr measured 117.6W as VRIN power from EPS12V during Prime95 on i7-6950X :sneaky: When VRM (~85%) and FIVR (~80%) losses are accounted for, that's around 80W.

Zeppelin will be available in various different TDP configurations, however 95W is expected to be the highest for consumer platform (Summit Ridge, AM4).

EDIT: Link http://www.hardware.fr/news/14643/intel-lance-i7-bdw-e-i7-6950x-tete.html

These processors since Nehalem pull power through other rails too (e.g. http://m.ht4u.net/reviews/2008/intel_nehalem_core_i7/index30.php)

The Stilt said:
I would assume that on Intel, base and all-core turbo exist for the same reason Broadwell-E got "Turbo Boost Max 3.0", i.e. different clocks for AVX2 and non-AVX2 workloads (difference in power draw).

Zen shouldn't have a drastic difference in power consumption between AVX2 and non-AVX2 workloads, as far as I've understood it.

It's because some instructions (FP) are far higher power users than anything else. One method of processor power management monitors the pipeline and reduces clocks only when such instructions are executing (like IBM's). It's a very advanced, fine-grained method.

DrMrLordX said:
Its current pricing is due to its position as last generation's boutique flagship product. Do not think Intel must sell it at that price. There are any number of Xeons providing similar performance at lower prices, albeit without the unlocked multi.

Bottom line is that Intel has had multicore Haswell-E parts since Q3 2014 with 8c and more. Yet AMD is chasing Sandy Bridge?!? The TDP is nice, but Intel's relative performance per watt has improved with Broadwell-E already, and Skylake-E is the next competitor in line . . .

I agree with you here.

Sent from HTC 10

mikk · Jun 4, 2016

coffeemonster said:
I'm no expert on benches, but why then does Carrizo beat thuban and nahelam in geekbench and passmark(and superpi too I believe)?

The main reason in Geekbench is AES acceleration in hardware. If you exclude this, Nehalem IPC is certainly better.

The Stilt · Jun 4, 2016

KTE said:
These processors since Nehalem pull power through other rails too (e.g. http://m.ht4u.net/reviews/2008/intel_nehalem_core_i7/index30.php)

Sent from HTC 10

I would expect all of the other planes (VRING, VSA, VIO) are also fed by the EPS12V since they have their own smart buck circuits. These kinds of planes are rarely fed by anything other than 12V, and the ATX connector alone is not sufficient to feed them (considering the potential PCI-E bus power draw).

krumme · Jun 4, 2016

The Stilt said:
Hardware.fr measured 117.6W as VRIN power from EPS12V during Prime95 on i7-6950X :sneaky: When VRM (~85%) and FIVR (~80%) losses are accounted for, that's around 80W.

Did you briefly transform into another member of this forum, or is this sarcasm? Man, I think I have seen 2000 posts like that over the last 5 years explaining how some iteration of BD is not using so much power, constantly debating it with Mr. "because it's AMD, it sucks".

Better call Intel and tell them it was a printing error to raise the TDP from 130W to 140W going from HSW-E to BDW-E.

Does anyone know if the team working on e.g. the 390 boards is the same one doing the new VRM specs for the Zen CPU, or are the CPU and GPU teams completely separate?

The Stilt · Jun 4, 2016

krumme said:
Did you briefly transform into another member of this forum, or is this sarcasm? Man, I think I have seen 2000 posts like that over the last 5 years explaining how some iteration of BD is not using so much power, constantly debating it with Mr. "because it's AMD, it sucks".

Better call Intel and tell them it was a printing error to raise the TDP from 130W to 140W going from HSW-E to BDW-E.

Does anyone know if the team working on e.g. the 390 boards is the same one doing the new VRM specs for the Zen CPU, or are the CPU and GPU teams completely separate?

I don't quite follow?
The figures I used (85% & 80%) are actually rather optimistic, or pessimistic in terms of the actual CPU power draw. The VRIN VRM efficiency is higher than normal because the output current is rather low (due to the significantly higher output voltage). Also, for Haswell's FIVR solution, Intel has quoted an efficiency of 75% (IIRC). The FIVR efficiency will always be pretty poor, since it operates at up to 500 times higher fSW than a conventional VRM. The hyper-high fSW is required due to the inductors being integrated into the core (i.e. tiny and low inductance).

sirmo · Jun 4, 2016

Even dedicated VRs rarely cross 85% efficiency, because you can optimize them for a certain load but the load changes. If you read the datasheets, 90%+ efficiency is only achievable in specific scenarios with low currents at low temperatures.

KTE · Jun 4, 2016

The Stilt said:
I would expect all of the other planes (VRING, VSA, VIO) are also fed by the EPS12V since they have their own smart buck circuits. These kinds of planes are rarely fed by anything other than 12V, and the ATX connector alone is not sufficient to feed them (considering the potential PCI-E bus power draw).

I haven't read an in-depth analysis on CPU power feeds in a long time (since SNB) but yes, you are right AFAIK.

Sent from HTC 10

VirtualLarry · Jun 4, 2016

Boze said:
This era of 4c/4t and 4c/8t is at an end. DirectX 12 has ensured that.

I certainly hope so! I've even been toying with pulling my Thuban 1045T CPUs out of retirement with some new mobos (with PCI-E M.2 slots), and replacing my Z170 mobos with G4400 CPUs in my Raidmax Cobra cases.

I mean, my G4400 Pentium Skylake dual-cores, especially overclocked to 4.4GHz, have screaming single-threaded performance, but I'm thinking things are getting more multi-threaded, so they just aren't enough. (Well, except for web browsing.)

krumme · Jun 4, 2016

The Stilt said:
I don't quite follow?
The figures I used (85% & 80%) are actually rather optimistic, or pessimistic in terms of the actual CPU power draw. The VRIN VRM efficiency is higher than normal because the output current is rather low (due to the significantly higher output voltage). Also, for Haswell's FIVR solution, Intel has quoted an efficiency of 75% (IIRC). The FIVR efficiency will always be pretty poor, since it operates at up to 500 times higher fSW than a conventional VRM. The hyper-high fSW is required due to the inductors being integrated into the core (i.e. tiny and low inductance).

The same can be said for Zen, right? It's still less than 2/3 of the TDP.
If we look at lower-TDP Xeons, they take a sizable frequency hit.

Abwx · Jun 4, 2016

AtenRa said:

Those numbers are the only ones that currently matter.

From those numbers we can accurately predict that an average-power Zen will require no more than 0.865V at 2.8GHz.

Also, the transistors used in these tests are not the fastest available on 14nm LPP; there's a higher grade in this respect (the sLVT option), but since its leakage is substantially higher, it is likely to be used only in the higher-clocked parts of a design such as Zen.

krumme · Jun 4, 2016

A bit OT: even audio DACs get their own in-house-designed low-noise voltage regulator these days.
http://www.esstech.com/index.php/en/products/low-noise-low-dropout-regulator/
Lol. 1-bit DACs are by nature prone to both clock jitter and voltage noise, but of course it's also about simplifying implementation. ESS gives you the voltage regulator, DAC and line-out amp. One company can manage it all today, even within such a small market.

The Stilt · Jun 4, 2016

krumme said:
The same can be said for Zen, right? It's still less than 2/3 of the TDP.
If we look at lower-TDP Xeons, they take a sizable frequency hit.

Zeppelin has hardware built into the die, which Broadwell-E doesn't have. No idea how much power it consumes on 14nm LPP, but I would expect it to stay below 10% of the total TDP.

krumme · Jun 4, 2016

The Stilt said:
Zeppelin has hardware built into the die, which Broadwell-E doesn't have. No idea how much power it consumes on 14nm LPP, but I would expect it to stay below 10% of the total TDP.

Ahh f. I forgot it's not on the die on BDW-E. But anyway, the difference on paper is still huge. How that translates to real-world scenarios is yet to be seen and could well be different. But IMO efficiency is the master key.

itsmydamnation · Jun 4, 2016

DrMrLordX said:
No they wouldn't. Their TCO is entirely too high even taking into account throughput.

You can't even remotely say things like this with any confidence, because if throughput were the key metric organisations wanted, then server market share would have at least stabilized with Bulldozer's release, and we would have seen Steamroller-based and Excavator-based server parts released. Those cores themselves would also likely look completely different than they do now.

If that were true, neither Intel nor AMD would even bother selling HT-capable CPUs as datacenter products. Total throughput is still going to matter. You are also failing to take into account the fact that Zen will have to have significantly higher performance from the first thread of each core vs. the first thread of each XV module in order to exceed XV's throughput given identical clockspeeds. AMD has to deliver on both counts.

You realize that if you have high ILP (aka throughput) then SMT provides very little in the way of performance benefit, because you're already bottlenecked elsewhere in the core.

I don't know how many datacentres you have been in, but I've designed converged infrastructure for more than I care to count. What does the average datacentre compute layer look like? It looks like one of two things:

1. racks of 1RU pizza boxes (normally 2 proc)
2. racks of ~10RU blade enclosures (normally 2 proc)

What, in almost every case, is running on the bare metal? KVM/ESX/Hyper-V. These days the only x86 stuff you see that isn't running on a hypervisor is big DB.

Now what does HT allow you to do? It allows you to increase your density of VMs. This is because the hypervisor has a NUMA-aware scheduler, so if you have a VM that is running a throughput workload and maxing a core (or several), it will move workloads around accordingly to deliver the best real-time performance it can to all VMs. So two VMs that aren't doing much end up sharing a core at that point.

Now you can always just oversubscribe the number of vthreads to real threads (you almost always do, regardless of having HT or not), but when that ratio becomes too high (this varies with workload), response times become really poor. Having more threads allows keeping that ratio lower, thus giving better performance (particularly in latency).
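As a rough illustration of the ratio I mean, here is a small sketch. The host configuration and vCPU count below are made-up numbers for the example, not figures from any real deployment, and the function name is hypothetical.

```python
def oversubscription_ratio(vcpus, sockets, cores_per_socket, threads_per_core):
    """Return provisioned vCPUs divided by hardware threads on the host."""
    hw_threads = sockets * cores_per_socket * threads_per_core
    return vcpus / hw_threads

# Hypothetical 2-socket host with 8 cores per socket and 64 vCPUs provisioned.
vcpus = 64
without_smt = oversubscription_ratio(vcpus, sockets=2, cores_per_socket=8, threads_per_core=1)
with_smt = oversubscription_ratio(vcpus, sockets=2, cores_per_socket=8, threads_per_core=2)

print(f"vCPU:thread ratio without SMT: {without_smt:.1f}:1")
print(f"vCPU:thread ratio with SMT:    {with_smt:.1f}:1")
```

Same box, same VMs, but the scheduler has twice as many hardware threads to place them on, so the ratio halves. How much of that shows up as better latency depends entirely on the workload.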

Each Zen core has twice the fp resources of a single XV module.

No it doesn't; it all depends on how you count it. An XV module has 2x FMA units; Zen has 2x FMA per core. An XV module has one FPU store port; we don't know how many Zen has, but also likely one. An XV module has two adds; Zen has two adds. XV has two muls; Zen has two muls.

What has changed is the way the hardware is arranged and its exposure to the FPU scheduler. That change in arrangement will help with non-FMA SSE and AVX, but not that much for throughput, and the reason for that is that the load/store system of Zen is 2 loads/1 store per cycle. This is 1/2 of an XV module, and in throughput workloads you are going to have lots of loads and stores to the memory subsystem. That's why there is such a big AVX256 jump from Ivy Bridge to Haswell: the load/store bandwidth doubled.

What the Zen FPU design does is reduce latency vs XV. Lots of SSE/AVX workloads can have dependent instructions (think physics, games, etc.). Zen will reduce the cycle time of those operations vs XV and should be able, on average, to schedule them earlier (more exposed ports). That's an IPC increase.

Now, as to integer IPC, there are lots of things in Zen to help integer IPC; these will also help FP that has dependent instructions. The big one is the cache system: Bulldozer's L2 is a mess, and it's CMT's fault. High L2 latency doesn't hurt throughput, but it does hurt serial code a lot. Next, integer execution units are actually doubled from a CON core. This will allow things to be scheduled sooner and give a better distribution of complex instructions over the ports (i.e. not having the only branch and only imul on the same port like in CON cores).

After that it's the incremental things: better prefetch and prediction, better memory disambiguation. There are other things that might help but we don't really know yet, like the stack cache implementation (if it's explicit and has lower latency) and the "uop/trace cache" that, according to the patent, exists in the execution/retirement stage of the pipeline, not in instruction decode like in Core 2 CPUs (there also appears to be a loop buffer in decode).

Doom2pro · Jun 4, 2016

sirmo said:
Even dedicated VRs rarely cross 85% efficiency, because you can optimize them for a certain load but the load changes. If you read the datasheets, 90%+ efficiency is only achievable in specific scenarios with low currents at low temperatures.

For 90+ efficiency with varying loads you need multi-tapped inductors, and a complicated switcher to utilize the different taps for different loads... Hopefully hybrid planar inductor+mosfet VRMs will solve this problem in the future, as they can have integrated multiplexing switchers (one piece of silicon for switchers, synchronous rectification and driver IC) coupled to multiple taps on the inductor and multiplexed in real time, in step with the amount of load, all in one package (presumably all encased in sintered ferrite).

Right now using discrete components would be excessive, too much package overhead, taking up too much board real estate.

LTC8K6 · Jun 4, 2016

The Stilt said:
I would assume that on Intel, base and all-core turbo exist for the same reason Broadwell-E got "Turbo Boost Max 3.0", i.e. different clocks for AVX2 and non-AVX2 workloads (difference in power draw).

Zen shouldn't have a drastic difference in power consumption between AVX2 and non-AVX2 workloads, as far as I've understood it.

http://www.intel.com/content/www/us/en/support/processors/000005523.html

No mention of AVX2 in the turbo freq tables.

Most of them have an all core turbo speed that is higher than the base clock.
The 6700K being a notable exception...
