New Zen microarchitecture details


Glo.

Diamond Member
It's exactly the opposite, because to write code for the GPU we need to go back to low-level languages and paradigms, and yes, I count C++ as low-level. Yes, there are libraries for using OpenCL from high-level languages, but you still need to greatly adjust (or rather, dumb down) your code to work with OpenCL or GPUs in general: from objects back to arrays and scalars. The point being, it is much harder and takes longer to develop, and hence will only be used in software where it really brings a great benefit in terms of absolute and relative performance increase.
The problem here is the abstraction of the software. If the APIs or any other software were as optimized as CUDA (i.e. zero abstraction), we would see a GIGANTIC leap in performance.
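As a minimal sketch of what "from objects back to arrays and scalars" means in practice (plain C++, all names invented, no OpenCL required):

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// High-level, object-oriented layout: array of structures (AoS).
struct Particle {
    float x, y, z, mass;
};

void integrate_aos(std::vector<Particle>& ps, float dt) {
    for (auto& p : ps)
        p.x += p.mass * dt;  // strided access: touches whole structs
}

// GPU-friendly layout: structure of arrays (SoA) of plain scalars.
// Each vector maps directly onto a flat OpenCL buffer.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass;
};

void integrate_soa(ParticlesSoA& ps, float dt) {
    for (std::size_t i = 0; i < ps.x.size(); ++i)
        ps.x[i] += ps.mass[i] * dt;  // contiguous, coalescable access
}

int main() {
    std::vector<Particle> aos(4, {0.f, 0.f, 0.f, 2.f});
    ParticlesSoA soa{{0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {2, 2, 2, 2}};
    integrate_aos(aos, 0.5f);
    integrate_soa(soa, 0.5f);
    std::printf("aos x[0] = %f, soa x[0] = %f\n", aos[0].x, soa.x[0]);
    return 0;
}
```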
 

Doom2pro

Senior member
It's exactly the opposite, because to write code for the GPU we need to go back to low-level languages and paradigms, and yes, I count C++ as low-level. Yes, there are libraries for using OpenCL from high-level languages, but you still need to greatly adjust (or rather, dumb down) your code to work with OpenCL or GPUs in general: from objects back to arrays and scalars. The point being, it is much harder and takes longer to develop, and hence will only be used in software where it really brings a great benefit in terms of absolute and relative performance increase.

You are taking the metaphor too seriously... The point I was making was that back then there was something new that could improve things, but not everyone was adopting it because they were stuck in their ways. I wasn't trying to draw any parallels (no pun intended) with anything low-level.
 

krumme

Diamond Member
The kind of serial code that will probably affect user experience is often just not-so-well-written code at some stage. There are many ways to create something slow, and you might find them anywhere: device drivers, the OS, application code. Even shader compilers might show bad habits here.

Where did you get this $1.2B number from? You don't need money to develop the main GPU components; they are IP blocks, also used in dGPU chips. Putting this all together into a design where the necessary blocks (Zen cores, GPU cores, UMC, FCH, etc.) already exist, mostly on the same process, would, if a $5-$10 interposer solution is omitted, cost less than $100M for new mask sets.

Clarkdale and Arrandale had GPUs on package. Would that have been anything close to Fusion if the GPU could have done OpenCL stuff?

Why wouldn't it be close to Fusion?
The hardware is just executing an abstraction.

The $1.2B is probably way wrong. But at, say, $100M for design and masks on the same process versus $10 per unit for an interposer, one has to be very careful which path to take.
For a company that makes no profit, $100M is a huge amount and exposes it to threats and risk. Risk is money; that's why you pay for insurance, and why huge corporations invest in derivatives: to minimize risk.
Think about this: Intel, in Otellini style, goes for the economic impact and reaction.
They know AMD has invested $100M in some huge APU for a specific segment. What do you think their response is?
 

lolfail9001

Golden Member
Price depends on production volume: the higher the production, the lower the costs. AMD can get a much better deal than anyone else on HBM because they co-developed it, and they will have at least two products that use HBM2 (so they can buy at higher volume, lowering the manufacturing costs, and therefore lowering the price for AMD).
You are forgetting yields. It is possible that HBM2 yields are so terrible that the proper volumes simply cannot be produced; hence it only lands in the highest-margin products. Also, bear in mind that we have no substantiated information about the other Vega die; it may actually not use HBM2, you know. It is unlikely, but it is in the picture.
For example HIP, or the HCC compiler. Companies are starting to move away from CUDA to OpenCL, and here HIP has been extremely helpful for compiling CUDA code to OpenCL.
You did not actually answer the question. Plus, I suggest that the second statement needs a source.
Yes, I know that Intel GPUs are not bad right now; however, they are only not bad at the high end. The parts with eDRAM are not bad. The rest is rubbish.
Both Intel GPUs and AMD GPUs have hit the same memory bottleneck nowadays, and that applies to low-end parts as well.
Each memory cell uses between 3 and 5 W of power, depending on its clocks. HBM was using 5 W of power per stack, and Fiji's memory was using 20 W of power in total. HBM2 uses 3 or 4 W of power, depending on the clocks.
May I see a source? For now, deducing from the Hynix page and the AMD slide deck on HBM1, I come to the conclusion that one stack of 1 GHz HBM2 [so 1024-wide @ 2 Gbps] would consume about 7 to 8 watts. For comparison, the 1060's memory consumes about 20 watts, if my memory serves me right.
Let me give you some perspective. A 95W APU with 4C/8T can offer Core i7-4790S levels of performance (a 65W CPU) plus a GPU that offers WX 5100 levels of performance (75W TDP).
Sure, if it does not power throttle its clocks down to the levels of a 35W CPU and a 50W TDP GPU :). Where are the last 10 watts, you ask? Why, in HBM2 of course. That's the part you leave out of the equation: just because the TDP is set at 95W, it does not mean that it will work at full capacity within this TDP. Intel's mobile line-up does that ALL the time.

In the end, do I think it is possible that a single-die APU with HBM2 or what have you will happen? Yes. Do I think it will be part of the Raven Ridge line-up, or even an Apple semi-custom? Not at all, and I think I have laid out my reasoning... badly, but you get it.
 

lolfail9001

Golden Member
What kind of point is that? Due to iGPUs being severely memory bandwidth limited, they have usually been slow and priced accordingly. If one could reduce the bandwidth bottleneck, would the latter still hold true?
If given the choice between a cookie-cutter system with an APU and a raw CPU+dGPU at the same cost and largely the same performance (for the sake of it, let's just say the APU is slightly faster), which one would the average layman choose? That's what mindshare stands for. You can actually witness this in the laptop market, where bottom-tier dGPUs coupled with mainstream -U chips deliver performance similar to Iris -U chips at similar cost and worse power consumption. Sometimes it gets so bad that even the mainstream GT2 iGPU has the same or better performance as the dGPU, and the latter does nothing but waste power. Want to take a shot at which version sells more units?
 

PPB

Golden Member
At this point AMD needs to get real and pair an iGPU that won't be bottlenecked in 80% of cases with, for example, DDR4-2666 sticks. 384 shaders at 700-800 MHz would probably suffice at this stage.

Obviously this would be bad marketing if said iGPU were the top one, but mindlessly throwing CUs at the problem without significantly better system bandwidth in non-HBM scenarios is also stupid.


Edit: bad marketing compared to your previous top iGPU with 512 shaders, obviously.
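As a rough sanity check of the bandwidth argument above; every figure below is an illustrative assumption (dual-channel DDR4-2666, FMA counted as 2 FLOPs per shader per clock), not a vendor spec:

```cpp
#include <cstdio>

int main() {
    // Dual-channel DDR4-2666: 2 channels * 8 bytes * 2.666 GT/s.
    const double mem_gbs = 2 * 8 * 2.666;       // ~42.7 GB/s
    // Hypothetical 384-shader iGPU at 0.8 GHz, 2 FLOPs/shader/clock.
    const double gflops = 384 * 0.8 * 2;        // ~614 GFLOPS
    std::printf("bandwidth: %.1f GB/s\n", mem_gbs);
    std::printf("compute:   %.0f GFLOPS\n", gflops);
    // A kernel must do at least this many FLOPs per byte fetched
    // to avoid being memory-bound on this configuration:
    std::printf("FLOPs per byte: %.1f\n", gflops / mem_gbs);
    // A bigger 512-shader, 1.1 GHz configuration on the same memory
    // is proportionally more starved:
    const double gflops_big = 512 * 1.1 * 2;    // ~1126 GFLOPS
    std::printf("bigger iGPU FLOPs per byte: %.1f\n", gflops_big / mem_gbs);
    return 0;
}
```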
 

Glo.

Diamond Member
You are forgetting yields. It is possible that HBM2 yields are so terrible that the proper volumes simply cannot be produced; hence it only lands in the highest-margin products. Also, bear in mind that we have no substantiated information about the other Vega die; it may actually not use HBM2, you know. It is unlikely, but it is in the picture.
You need to bring numbers; otherwise you are creating a problem that does not exist in order to prove your point about higher manufacturing cost. Is it possible? Yes. Is it likely? No.

You did not actually answer the question. Plus, I suggest that the second statement needs a source.
AMD has already provided a few announcements about HPC customers switching to their platform from CUDA.

Both Intel GPUs and AMD GPUs have hit the same memory bottleneck nowadays, and that applies to low-end parts as well.
What does that have to do with the possibility of having 512 GB/s of HBM2 in a 95W APU, and its effect on performance? Would it then be memory bandwidth starved?

May I see a source? For now, deducing from the Hynix page and the AMD slide deck on HBM1, I come to the conclusion that one stack of 1 GHz HBM2 [so 1024-wide @ 2 Gbps] would consume about 7 to 8 watts. For comparison, the 1060's memory consumes about 20 watts, if my memory serves me right.
http://www.anandtech.com/show/9390/the-amd-radeon-r9-fury-x-review/5
Given that AMD opted to spend some of their gains on increasing memory bandwidth as opposed to just power savings, the final power savings aren’t 3X, but by AMD’s estimates the amount of power they’re spending on HBM is around 15-20W, which has saved R9 Fury X around 20-30W of power relative to R9 290X
The R9 290X uses 16 memory chips on a 512-bit memory bus. If they consume around 50 W of power, then each chip consumes around 3.1 W at a 5000 MHz clock.

There you are. You have proven me correct, and your conclusion is incorrect. Each memory chip is 32-bit; to get a 192-bit memory bus you need 6 memory chips. 20 W divided by 6 gives you around 3.3 W of power per chip, at 8000 MHz memory clocks.

The SK Hynix PDF about HBM2 shows that HBM2 uses 8% less power per stack than HBM. So if an HBM package of 4 stacks was consuming 15-20 W (3-4 W per stack), we can expect power consumption within this range for HBM2 as well. So we are looking at about 8 W for two stacks of HBM2.
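Spelling out the arithmetic behind these estimates (the wattages are the posters' figures, not measurements):

```cpp
#include <cstdio>

int main() {
    // R9 290X: 512-bit bus / 32-bit per GDDR5 chip = 16 chips sharing ~50 W.
    const int chips_290x = 512 / 32;
    std::printf("290X per-chip: %.2f W\n", 50.0 / chips_290x);   // ~3.1 W
    // GTX 1060: 192-bit bus -> 6 chips sharing ~20 W.
    const int chips_1060 = 192 / 32;
    std::printf("1060 per-chip: %.2f W\n", 20.0 / chips_1060);   // ~3.3 W
    // HBM2 claimed to draw 8% less per stack than HBM1's 3-4 W per stack;
    // the post rounds the two-stack total up to ~8 W.
    std::printf("HBM2, two stacks: %.1f-%.1f W\n",
                2 * 3.0 * 0.92, 2 * 4.0 * 0.92);                 // ~5.5-7.4 W
    return 0;
}
```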
Sure, if it does not power throttle its clocks down to the levels of a 35W CPU and a 50W TDP GPU :). Where are the last 10 watts, you ask? Why, in HBM2 of course. That's the part you leave out of the equation: just because the TDP is set at 95W, it does not mean that it will work at full capacity within this TDP. Intel's mobile line-up does that ALL the time.
I am not taking HBM2 out of the equation. You are just overestimating its power draw. It will be between 3 and 4 W for each stack of HBM2 on the package.
In the end, do I think it is possible that a single-die APU with HBM2 or what have you will happen? Yes. Do I think it will be part of the Raven Ridge line-up, or even an Apple semi-custom? Not at all, and I think I have laid out my reasoning... badly, but you get it.
Fair enough. But you have not provided anything besides your reasoning.
 

sm625

Diamond Member
That's what mindshare stands for.

AMD needs to capture mindshare. The only way to do that is to create a solid brand. Let's say that every Vega GPU in every desktop APU AND every notebook is, at bare minimum, a 16 CU APU with 8 GB of HBM, or a discrete card with at least that level of performance. Then AMD would have a brand. People could look at the spec sheet and know that if it says Vega, then it is at least this powerful. AMD has sullied every attempt at branding. Take FX, for example: that used to mean something, now it means nothing. If they release an 8 CU single-channel DDR4 turd, then all their higher tiers are going to be tied to it. And you know that if they even allow single-channel DDR4 as a possibility, then that is what they will all be. They need to cut that possibility off completely.
 

lixlax

Member
In a way it would make sense for AMD to release the SR7 as 8C/16T, the SR5 as 8C/8T, and the SR3 as 4C/8T. The performance delta would be exactly the same as between the i3, i5 and i7, just with 2x the cores and threads for each segment. Assuming IPC around Haswell/Broadwell levels and base clocks of 3.4 GHz or higher (good enough), they could price their tiers something like this:
SR7: $400-500
SR5: $275-350
SR3: $200-225
Some months later, release a 4C/4T part as the Athlon X4 at around $100 to compete with the Pentiums.

Of course it could backfire on sales, because the average Joe wouldn't understand why he should pay more for an SR7 than for the same number from the "superior" company.
 

Dresdenboy

Golden Member
Why wouldn't it be close to Fusion?
The hardware is just executing an abstraction.
If the GMA had supported OpenCL (we might also count the former OpenGL compute stuff), this would still be like a CPU and a GPU connected via QPI instead of AGP/PCIe, with the MC even sitting on the GPU. Things got closer together, but the differences only started to appear when, for example, memory pages were shared between the different compute hardware instances. I think one main goal for Fusion/HSA was to reduce the overhead of the "old-world" way of doing CPU-GPU computing, as depicted here. If the GMA had had OpenCL support, it might even count as package-level "Fusion" according to the linked roadmap.
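As a schematic sketch of the "old-world" overhead Fusion/HSA aimed to remove (plain C++; the "device" is modeled with ordinary host memory, real code would use OpenCL/HSA APIs, but the data-movement pattern is the point):

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <vector>

void kernel(float* data, std::size_t n) {  // stand-in for a GPU kernel
    for (std::size_t i = 0; i < n; ++i) data[i] *= 2.0f;
}

// "Old world": allocate a device buffer, copy in, compute, copy back.
void offload_with_copies(std::vector<float>& host) {
    std::vector<float> device(host.size());  // device-side allocation
    std::memcpy(device.data(), host.data(), host.size() * sizeof(float));
    kernel(device.data(), device.size());
    std::memcpy(host.data(), device.data(), host.size() * sizeof(float));
}

// Fusion/HSA goal: CPU and GPU share page tables, so the kernel can
// consume the host pointer directly, with no staging copies at all.
void offload_shared(std::vector<float>& host) {
    kernel(host.data(), host.size());
}

int main() {
    std::vector<float> a(8, 1.0f), b(8, 1.0f);
    offload_with_copies(a);
    offload_shared(b);
    std::printf("a[0] = %f, b[0] = %f\n", a[0], b[0]);
    return 0;
}
```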

The $1.2B is probably way wrong. But at, say, $100M for design and masks on the same process versus $10 per unit for an interposer, one has to be very careful which path to take.
For a company that makes no profit, $100M is a huge amount and exposes it to threats and risk. Risk is money; that's why you pay for insurance, and why huge corporations invest in derivatives: to minimize risk.
Think about this: Intel, in Otellini style, goes for the economic impact and reaction.
They know AMD has invested $100M in some huge APU for a specific segment. What do you think their response is?
Indeed, it's also a matter of costs. So if AMD in your example doesn't plan to sell more than 10M units over the next years, the interposer might be better. But aside from this being likely or not, there might be other issues, like limited interposer production or packaging capabilities.

I think that the decision whether to invest $100M depends a lot on a product's prospects, not only on the available money.

If given the choice between a cookie-cutter system with an APU and a raw CPU+dGPU at the same cost and largely the same performance (for the sake of it, let's just say the APU is slightly faster), which one would the average layman choose? That's what mindshare stands for. You can actually witness this in the laptop market, where bottom-tier dGPUs coupled with mainstream -U chips deliver performance similar to Iris -U chips at similar cost and worse power consumption. Sometimes it gets so bad that even the mainstream GT2 iGPU has the same or better performance as the dGPU, and the latter does nothing but waste power. Want to take a shot at which version sells more units?
That APU vs. CPU+dGPU discussion has been repeated many times. And the layman might buy a system with "Quadcore processor" and "Radeon graphics" stickers on it. Does he care about it being an iGPU system? Why did AMD sell APUs all these years? System builders/OEMs might prefer to put one component on the board (plus RAM) instead of two.

What would the average layman do, in your opinion? And is it black/white, or can we talk percentages? I don't want to hear that 50.1% would buy discrete GPUs and so the argument has been won; this is no election. 49.9% can still be enough to do business. ;)

I don't care about stupid laptop configurations. There are also good ones, and nobody needs to buy the bad designs. How many systems get sold that have Iris graphics and a 2-3x faster dGPU?
 

Glo.

Diamond Member
Apr 25, 2015
5,723
4,594
136
I don't care about stupid laptop configurations. There are also good ones, and nobody needs to buy the bad designs. How many systems get sold that have Iris graphics and a 2-3x faster dGPU?
There was even a stupid design with Iris Pro and a similarly performing dGPU.

*cough* Apple *cough*.
 

krumme

Diamond Member
Indeed, it's also a matter of costs. So if AMD in your example doesn't plan to sell more than 10M units over the next years, the interposer might be better. But aside from this being likely or not, there might be other issues, like limited interposer production or packaging capabilities.

I think that the decision whether to invest $100M depends a lot on a product's prospects, not only on the available money.

In this case, to go for the on-die option AMD will have to be 100% sure they can sell 10M units plus interest (the internal interest rate is probably in excess of 30% right now at AMD, because there are so many other good investment possibilities, but that's another discussion).

Here comes the catch: they can't be 100% sure what they will sell, as they don't know Intel's response in that segment. It's difficult to assess the risk. Now imagine this (it's just a way to frame it): if AMD had to go to an insurance company and ask them to cover the cost of the project if it failed and was not profitable, what would that cost? You have to add that cost to the project, so to speak. Easily in the order of $20M, I would say.

So in this case the investment is better viewed as roughly $150M: design and masks, plus interest and risk. At AMD it's probably more a question of where to use sparse funds, but it goes to show that AMD has to plan to sell well in excess of 15M units in this situation to select the single die. If it's a 16-core die with a huge GPU, I personally wouldn't do it unless you have a proposition for a segment where Intel can't bring any competition at all and it's an absolute necessity. That's what I meant by exposing yourself to risk: you simply make yourself vulnerable to a counterattack from Intel. This is strategic thinking AMD needs to be far better at.

At the top level, risk management is a vital part of strategy. With such enormous startup costs and huge risk, Infinity Fabric is a gift from the gods.
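Spelled out with the post's own assumptions ($100M design and masks, roughly $20M of risk premium, a ~30% internal hurdle rate for one year, and a $10-per-unit interposer alternative), the breakeven math looks like this:

```cpp
#include <cstdio>

int main() {
    const double design_masks = 100e6;  // design + mask sets, assumed
    const double risk_premium = 20e6;   // the "insurance" cost, assumed
    const double hurdle       = 0.30;   // internal hurdle rate, one year
    const double total = (design_masks + risk_premium) * (1.0 + hurdle);
    const double interposer_per_unit = 10.0;  // per-unit alternative cost
    std::printf("risk-adjusted investment: $%.0fM\n", total / 1e6);   // ~$156M
    std::printf("breakeven vs. interposer: %.1fM units\n",
                total / interposer_per_unit / 1e6);                   // ~15.6M
    return 0;
}
```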
 

jpiniero

Lifer
I was thinking about this: what do you guys think the odds are that AMD has non-Ryzen-branded SR models at launch (i.e. locked)?
 

Dresdenboy

Golden Member
https://twitter.com/CPCHardware/status/826132904593985536

We might see 5 Ryzen SKUs announced. The actual availability of the 4C/8T and 4C/4T parts might happen in Q2.
This would actually be a change versus the older plans (8/16, 8/8, 4/8), maybe due to an SMT-related bug in earlier steppings (the 4/4 part) and bad cores (defective or out of spec -> 6/12).

There was even a stupid design with Iris Pro and a similarly performing dGPU.

*cough* Apple *cough*.
Seems there are lots of good sales people out there! Long ago I learned that not every human decision can be explained with logic and reason. ;)
 

Dresdenboy

Golden Member
In this case, to go for the on-die option AMD will have to be 100% sure they can sell 10M units plus interest (the internal interest rate is probably in excess of 30% right now at AMD, because there are so many other good investment possibilities, but that's another discussion).

Here comes the catch: they can't be 100% sure what they will sell, as they don't know Intel's response in that segment. It's difficult to assess the risk. Now imagine this (it's just a way to frame it): if AMD had to go to an insurance company and ask them to cover the cost of the project if it failed and was not profitable, what would that cost? You have to add that cost to the project, so to speak. Easily in the order of $20M, I would say.

So in this case the investment is better viewed as roughly $150M: design and masks, plus interest and risk. At AMD it's probably more a question of where to use sparse funds, but it goes to show that AMD has to plan to sell well in excess of 15M units in this situation to select the single die. If it's a 16-core die with a huge GPU, I personally wouldn't do it unless you have a proposition for a segment where Intel can't bring any competition at all and it's an absolute necessity. That's what I meant by exposing yourself to risk: you simply make yourself vulnerable to a counterattack from Intel. This is strategic thinking AMD needs to be far better at.

At the top level, risk management is a vital part of strategy. With such enormous startup costs and huge risk, Infinity Fabric is a gift from the gods.
We might quickly get lost in interesting theories here. But let's return to basics (KISS principle, Occam's razor, etc.): they have a pool of good IP, now including a viable CPU core too; they have access to two current and one or two future not-so-crappy processes, with at least a little flexibility in capacity and without the fixed cost burden of the past; they have Infinity Fabric, experience with interposers and HBM1/2, they are looking at adding PIM and FPGAs to the mix, the software stack is getting better, and whatever else I forgot to list. Intel has a different mix, a mindshare advantage, and lots of production capacity, in an environment of shareholder expectations accustomed to Intel's current numbers. That could mean they have to be careful with cutting prices.

We might also have to adjust our estimated volume. It's difficult to find useful numbers, so let's try with these:
In Q1'14 AMD sold 17M GPUs with 4.36M being dGPUs. This leaves about 13M APUs (they probably sold some chipsets with iGPU, too) in a first quarter. Given that the 28nm APUs are uarch, power, and memory bandwidth constrained, and sold in combination with not-so-fast CPU cores, the prospects of Zen cores, 14LPP, a faster uarch and DDR4 memory, on a nicely upgradable socket for DT at least, should be good enough to sell at least 10M per quarter again, or 40M per year (seasonality removed). This means that, if sold for only one year, roughly $4 per chip would suffice to recover the R&D investment incl. risk. Add $30 per packaged good die, and some margin to offset other costs, and $45 should be a good ballpark estimate. This would define where AMD could begin to earn money, aside from sold chipsets. How would Intel's numbers change if they lowered the prices of similarly performing parts to <$100?
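The amortization estimate, made explicit (the $150M investment, 40M-unit volume, $30 packaged-die cost and margin figures are the post's assumptions):

```cpp
#include <cstdio>

int main() {
    const double investment = 150e6;   // design + masks + risk, as above
    const double units      = 40e6;    // ~10M APUs/quarter over one year
    const double rnd_per_chip = investment / units;       // ~$3.75
    const double packaged_die = 30.0;                     // assumed cost
    const double margin       = 10.0;                     // assumed margin
    std::printf("R&D recovery per chip: $%.2f\n", rnd_per_chip);
    // ~$44, which the post rounds to a $45 ballpark:
    std::printf("breakeven ASP ballpark: $%.0f\n",
                rnd_per_chip + packaged_die + margin);
    return 0;
}
```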
 

krumme

Diamond Member
Seems there are lots of good sales people out there! Long ago I learned that not every human decision can be explained with logic and reason. ;)

We make decisions with cognition, feelings, and our senses: different parts of the brain. Logic is handled by our cognitive abilities, but sometimes using feelings or senses is more rational and effective, and it has historically been of great use, for example if you meet a tiger or an attractive woman. That's why the cognitive and logical abilities occupy only a few millimeters in the frontal lobes of the brain; the rest is for muscle control, feelings and so on. So there is plenty of reason to sell to those parts of the brain. In the consumer market that's vital. NV knows how to do it. AMD doesn't. They have competences and a culture geared for the B2B market.
 

jpiniero

Lifer
In Q1'14 AMD sold 17M GPUs with 4.36M being dGPUs. This leaves about 13M APUs (they probably sold some chipsets with iGPU, too) in a first quarter.

That's 2014 though. AMD has to be selling a lot less than 13M APUs now; the entire PC market is only around 72M units as of Q4 2016, and Intel's desktop volumes in Q4 were down 10% compared to Q4 2015, too. That's sort of the biggest question, I think: even the extreme best-case scenario for AMD might not do much for them when the PC market as a whole is in decline.
 

Atari2600

Golden Member
In the consumer market that's vital. NV knows how to do it. AMD doesn't. They have competences and a culture geared for the B2B market.

AMD's marketing has always been a disaster.

Simple reason being its inconsistency.

They get a new marketing "guru" in every 6 months who wipes the slate clean and starts again.

A number of AMD ideas in the past were fundamentally good, including:

- Torrenza (about 10 years ahead of its time, given that all the talk now is of using FPGAs to accelerate specialised workloads).
- Fusion
- AMD Vision

But not one of these marketing programs had any staying power, even though the ideas behind them are now poised to be fundamental aspects of AMD products in the near future.
 

lolfail9001

Golden Member
You need to bring numbers; otherwise you are creating a problem that does not exist in order to prove your point about higher manufacturing cost. Is it possible? Yes. Is it likely? No.
Valid, but neither of us may find direct numbers. There is a solid hint in SK Hynix only listing 1.6 Gbps HBM2 this quarter.
AMD has already provided a few announcements about HPC customers switching to their platform from CUDA.
Names, I need names. My Google fails to find any.
What does that have to do with the possibility of having 512 GB/s of HBM2 in a 95W APU, and its effect on performance? Would it then be memory bandwidth starved?
Nothing, but it has to do with Intel's iGPUs being as good as AMD's nowadays because of the same bottleneck. Oh, by the way, you are now hoping for 2 stacks of 2 Gbps HBM2 in an APU, not even 1.
The R9 290X uses 16 memory chips on a 512-bit memory bus. If they consume around 50 W of power, then each chip consumes around 3.1 W at a 5000 MHz clock.
You are comparing apples and oranges, let's start with this.
The SK Hynix PDF about HBM2 shows that HBM2 uses 8% less power per stack than HBM. So if an HBM package of 4 stacks was consuming 15-20 W (3-4 W per stack), we can expect power consumption within this range for HBM2 as well. So we are looking at about 8 W for two stacks of HBM2.
There are two reference points for this: AMD lists bandwidth per watt in their PDF on HBM. Since power consumption scales basically linearly with clocks, and HBM2's gains over HBM are all clocks, there is reason to suspect that the unlabeled graph compares bandwidth per watt (or bytes per joule, whatever). As such, you can derive the power consumption from the 204 GB/s bandwidth (with a 1.6 Gbps die) yourself; it works out to about 7 watts, and would be 8 with a 2 Gbps die.
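A worked version of that derivation; the GB/s-per-watt efficiency is an assumed figure in the ballpark of AMD's HBM1 marketing numbers, so treat the outputs as estimates:

```cpp
#include <cstdio>

int main() {
    const double gbs_per_watt = 30.0;     // assumed efficiency, not a spec
    // One HBM2 stack: 1024-bit interface; bandwidth = (width/8) * data rate.
    const double bw_16 = 1024 / 8 * 1.6;  // 204.8 GB/s at 1.6 Gbps
    const double bw_20 = 1024 / 8 * 2.0;  // 256.0 GB/s at 2.0 Gbps
    std::printf("1.6 Gbps stack: ~%.1f W\n", bw_16 / gbs_per_watt);  // ~6.8 W
    std::printf("2.0 Gbps stack: ~%.1f W\n", bw_20 / gbs_per_watt);  // ~8.5 W
    return 0;
}
```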
You are just overestimating its power draw.
Overestimating would be claiming that Fiji's HBM1 stacks consume a total of 30 watts... Wait a minute, did anyone actually measure HBM1 power consumption? It would not even be hard to do.
Fair enough. But you have not provided anything besides your reasoning.
Doing anything else would require connections I do not possess.
 

Phynaz

Lifer
Names, I need names. My Google fails to find any.

It's true. A couple of universities switched after AMD paid them to switch. It's not a bad concept: get people in college to use your platform, and they then carry that experience into the business world.
 

inf64

Diamond Member
People are forgetting about XFR and its potential :). I would like to see a super-binned, high-leakage 16T Black Edition part rated at 95W with a rather lowish base clock (think 3.3-3.4 GHz) and no fixed Turbo rating. Then pair that thing with some premium water cooling and let it Turbo boost to whatever AMD presets for this range. Of course they'd have to test these parts and make sure they do run at, say, 4.3 or 4.4 GHz at elevated Vcore settings on all cores. These could also be OCed manually, e.g. for competitions/HWBot. They could easily price this part at $800.
 