Bulldozer wasn't supposed to be good at everything; it was supposed to excel at INT workloads while remaining reasonable on FP workloads. It also relied on higher core counts to compensate for its lower IPC.
Not that I think AMD will release a product as bad as Bulldozer, but the strategy behind it isn't anything new.
Cache latency was one of the big IPC killers on Bulldozer and its progeny. AMD's presentation slides specifically indicated improved cache performance for Zen, so at least there's internal awareness of the issue and effort going into fixing it.
Also, it's important to distinguish between trading off FPU performance in general (which Bulldozer did) and trading off AVX performance in particular. The FPU in Bulldozer handles a lot of work that routinely shows up in mainstream apps: not just actual floating-point math, but also MMX and SSE2 integer vector operations go through it. That's where the corner cutting really hurts. Sacrificing AVX performance, on the other hand, is much more acceptable: in practice AVX often gives only marginal improvements over SSE2, and far fewer applications use it.
It's worth pointing out that even Sandy Bridge made some compromises with AVX to save die space:
Sandy Bridge allows 256-bit AVX instructions to borrow 128 bits of the integer SIMD datapath. This minimizes the impact of AVX on execution die area while doubling FP throughput: you get two 256-bit AVX operations per clock (plus one 256-bit AVX load).
I think we can expect something similar from AMD this time around.
Also, remember that Intel didn't consider AVX-512 important enough to justify its die-space cost on desktop and mobile SKUs. Above 128 bits, vector operations run into diminishing returns - unless your whole code path is massively parallel, in which case you should just be running it on a GPU instead.