> It's not true at all that Intel or AMD made any comments on this. Their comments were strictly on licensing.
> There is too much smoke, and while there doesn't seem to be any licensing deal, that has nothing to do with putting two dies in a single package and maybe using Intel's silicon bridge to link them.
> Intel makes the CPU, AMD makes the GPU, and that's it. Not much different than pairing an Intel CPU with an AMD discrete card.
> It's cost effective, it makes some sense, and it's likely happening.
> Stacking memory on logic is the path forward and will happen, there is no doubt about it, but we are not there yet. Cost and thermals are issues, but you save power and gain performance.

Not until spintronics. Thermal density sets the limit for performance. 2.5D doesn't try to circumvent this, and that is why it is so successful: it beat every timetable, whereas HMC failed. I even got a response that they weren't competing. 🙂
I'm sorry, but that still contradicts Lisa's statement. An Intel package with Intel CPU & AMD GPU would compete against an AMD APU.
Source: AMD
Anyways, sorry everybody for being off-topic.
> Not if it's done at the request of a customer, without availability outside of that.
> It's no different than selling discrete, and it's a gain for AMD at Nvidia's expense.
> It would also be followed by similar AMD solutions for other OEMs, taking share away from discrete and Nvidia.

I can speak from direct correspondence that their CPUs are more important for AMD than their GPUs at this time.
> EPYC multi-die sockets aren't even for HEDT; they are for servers. Match!

Sorry, I thought we were talking about Raven Ridge? As in: the mobile/low-to-mid-power desktop chip? And besides, stacking makes even less sense in the server space, where CPUs typically have ~125-140 W TDPs and very strict thermal tolerances due to high density. Good luck shoving any kind of secondary IC in between the CPU and the IHS/HSF there without toasting the entire chip.
> Ofc they are, because that's where they need the BW.
> However, they are not moving away from stacked chips, as you call it; they are moving towards them, just like everybody else.
> They are moving away from the silicon interposer, as those are too costly.
> Even the fan-out solution for mobile shows stacked DRAM.
> 2.5D and 3D packaging is a great thing and will be used more and more, as monolithic 3D is not quite here yet and process shrinks don't provide much any more.
> But nobody will use it for fun; it has to make sense.

Moving away from the interposer and moving towards PoP stacking are not necessarily the same thing. Also, funny that you use the "fan-out" solution as an example, given that the 2nd-gen fan-out clearly shows them not stacking ICs on top of each other any more. Now, I wonder why they might stop doing that? Might it be related to thermals? Hmmmm.
> What?! Did I miss the news that talks about the technology going in this direction, HBM on top of CPUs? Can I have a source, please?

Yeah, I'd like one too. And I'd like to see one demonstration of stacked memory on top of a power-consuming chip like a GPU, CPU or SoC where it doesn't thermally limit performance, even at very low wattages.
> Depends on the market. Cost is the bigger problem in some markets. It's not the same scenario in a 100 W ASIC, a 4 W SoC or a 200 mW SoC, and of course there are all kinds of memories.
> Logic on logic is hard, but that will come too.

No, it won't. That is an NP-hard question.
> Are you high?
> In mobile they move from PoP to fan-out with stacked DRAM. DRAM is an IC, BTW.
> In the high-end solution they just replace the Si interposer for lower cost but still use HBM.
> You really have no idea what's in that slide; maybe Google "PoP", since you clearly don't know what that is.

Okay. Let me be clear: the first "fan-out" picture shows DRAM stacked on top of the application processor, i.e. PoP. The second clearly doesn't. Now, I might be misreading the slide, but to my eyes that's a pretty clear illustration of no longer stacking RAM on top of the SoC. Of course DRAM is an IC (don't insult my intelligence, you dolt), but it's not one that produces any noticeable amount of heat. Instead, if stacked on top of a heat-producing IC such as a CPU, GPU or SoC, it acts as an insulator. Is that so hard to grasp? The second "fan-out" image is labeled "side by side"; what do you suppose they mean by that? And to be even clearer: I've never, ever talked about stacking DRAM on top of DRAM (which is an utterly obvious thing to do, and which all HBM does), in case your reading comprehension is that awful. I've been talking about stacking DRAM and other, hotter ICs together, which is the whole core of this discussion, of HBM and Raven Ridge. Oh, and the Server & Network part of the image only shows the use of a cheaper, easier-to-produce material for the substrate, no other change than that.
One's mobile tech; the other is cutting-edge, power-restricted, budget-restricted, feasibility-limited prototype runs. Completely different. The only way it comes is when HBM is cheaper than regular memory, a.k.a. the inverse of the proposition.
> PoP is not stacking chips; it's "stacking" packages. Two very, very different things.
> You claim that DRAM isn't producing any "noticeable amount of heat" and I'm the one that you deem necessary to insult? ROFL.

Very different only in production, not in thermal characteristics. Sure, removing the substrate of the top package removes one layer of thermal insulation; there are quite a few left. And yes, in comparison to an SoC, CPU or GPU, RAM produces negligible amounts of heat. Show me a DRAM stack/chip with a TDP above 2-3 W, even in a high-power scenario like a GPU, and I'll buy you an ice cream. In mobile, where RAM is extremely heavily optimized for power (rather than performance, like in a GPU), RAM might still be one of the more power-consuming parts, but nowhere near the SoC or display. If that were the case, phones with more RAM would have significantly worse battery life...

As an example, check out this paper. Sure, memory is a measurable power draw, and in certain scenarios a significant one. But compared to the SoC, even in a smartphone with a ridiculously low SoC TDP, it's only a fraction.
> You really need to stop talking; you keep saying crazier and crazier things.
> Here's an old slide from the AMD Fury stack. In mobile it's much worse, where the DRAM uses a couple of W, but this is more than enough. You are going on the ignore list.

Well, I guess you won't see this (ignoring people that disagree with you makes for fun discussions!), but thanks for demonstrating my point, I suppose. That slide shows that a huge 512-bit GDDR5 bus (consisting of 16 32-bit interfaces, I assume) consumes gobs of power (duh), while four stacks of HBM drastically lower that. And that's in a high-power environment (GPU) where the RAM is constantly being hammered with use, i.e. those are maximum numbers or close to it. From that, we can tell that a single stack of HBM1 would consume something in the realm of 7 W max. Now, HBM2 is supposed to be noticeably more power efficient than that, ~20% at the same bandwidth, IIRC. That brings us down to 5.5 W max, while still running at high clocks. And as power draw with increased clocks is non-linear for RAM, as for anything else, lowering clocks would cut that drastically. I wouldn't be surprised at all if you could get a single HBM2 stack down to ~2-3 W sustained max at mobile-suitable clocks. Will it be as fast? Of course not. But that's neither necessary nor desirable in the mobile space. Oh, and that 3 W figure would rarely, if ever, actually be hit, due to the nature of mobile workloads (see the paper above).
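The stack-power arithmetic above can be put in a few lines. All inputs are assumptions taken from the discussion (roughly 30 W for four HBM1 stacks, in line with the ~7 W per stack estimated above; a ~20% HBM2 efficiency gain; and a hypothetical mobile operating point), not datasheet values:

```python
# Back-of-the-envelope version of the estimate above.
# All inputs are assumptions from the discussion, not datasheet values.

hbm1_four_stacks_w = 30.0                       # assumed total for 4 HBM1 stacks
hbm1_per_stack = hbm1_four_stacks_w / 4         # ~7.5 W per stack
hbm2_per_stack = hbm1_per_stack * (1 - 0.20)    # HBM2 ~20% more efficient -> ~6 W

# DRAM power scales roughly with clock and with voltage squared, so
# lowering clocks (and voltage with them) cuts power faster than linearly.
def scaled_power(p_max_w, clock_frac, voltage_frac):
    return p_max_w * clock_frac * voltage_frac ** 2

# Hypothetical mobile operating point: 60% clocks at 85% voltage.
mobile_w = scaled_power(hbm2_per_stack, 0.6, 0.85)
print(round(hbm1_per_stack, 1), round(hbm2_per_stack, 1), round(mobile_w, 1))
```

With these assumed inputs, the sketch lands in the same ~2-3 W sustained range argued above for a downclocked HBM2 stack.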
> It wouldn't be low-perf (i.e. not the ~11 CUs that RR will now ship with); more like 20+ CUs.
> It'd be for use (1) in areas which benefit from limited acceleration across GPUs, so it presents a more balanced solution than dGPUs, and (2) as a high-end APU for consumers/prosumers.
> HBM is obviously needed to feed case (2), and in terms of (1), having a local on-die area that can act as a... I'll not say cache, as the pure latency isn't there for that... but more a second storage using separate memory "channels" can lift the bandwidth load off the main memory access, and that would apply to both x86 and the GPU.
> Erm... just had a thought.
> With the way Infinity Fabric is... isn't that virtually the same as a CCX+GPU? Would the HBCC in Vega allow for seamless integration into IF, I wonder?
> Hmmmm....

I always understood HBCC to be parsed as (HBC)C, in other words a (high-bandwidth cache) controller. No HBC, a.k.a. HBM2 or equivalent, makes an HBCC ineffective or inefficient. Just my thoughts.
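The "separate channels" idea above can be given rough numbers. Everything here is an illustrative assumption (dual-channel DDR4-3200 for the system pool, a single HBM2 stack at 1.6 Gb/s per pin for the local pool), not a product spec:

```python
# Rough numbers for the "separate channels" argument.
# Figures are illustrative assumptions, not product specs.

# Dual-channel DDR4-3200: 2 channels x 8 bytes x 3.2 GT/s.
ddr4_gbs = 2 * 8 * 3.2                # 51.2 GB/s for the system pool

# One HBM2 stack: 1024-bit bus at an assumed 1.6 Gb/s per pin.
hbm2_gbs = 1024 / 8 * 1.6             # 204.8 GB/s for the local pool

# Traffic steered to the local pool adds bandwidth instead of
# contending for the same DDR4 channels.
total_gbs = ddr4_gbs + hbm2_gbs
print(round(ddr4_gbs, 1), round(hbm2_gbs, 1), round(total_gbs, 1))
```

The point being that a local HBM pool on its own channels serves the GPU's streaming traffic while leaving the DDR4 channels largely to the CPU, rather than both sharing one bus.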
> No, they can't. That's the issue. Even Kaby-G is probably an Apple semi-exclusive.

Some of us predicted this last year. Strangely, I find myself agreeing with you on this.
That's what I am predicting AMD is doing with Navi.
> No way. You know that stacked memory leads to cooling issues for tablets, right? With ~2.5-4 W SoCs? Stacking memory on top of a 150+ W GPU would be a recipe for disaster. It would probably destroy itself rather quickly, even if it had some sort of thermal vias embedded into the HBM.

You are making the error of assigning the total power budget to a single stack. The power of scaling is to distribute power and computation. What if each stack is 30 W or 50 W? Workable then?
You mean "3x better performance for the fraction of a second it takes it to overheat and throttle". The announcements made sense in regard to further developing an up-and-coming memory technology, making it suitable for more use cases in the future.
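The "3x until it throttles" point above can be sketched as a toy average: a short burst at high speed followed by a throttled crawl yields little sustained gain. The durations and speed factors below are made up purely for illustration:

```python
# Toy model of "3x faster until it throttles": a short burst at high speed
# followed by a throttled crawl averages out to little sustained gain.
# All durations and speed factors are made-up for illustration.

def average_speedup(burst_x, burst_s, throttled_x, total_s):
    throttled_s = total_s - burst_s
    return (burst_x * burst_s + throttled_x * throttled_s) / total_s

# 3x for 2 seconds, then 0.8x (throttled) for the rest of a 60 s workload:
print(round(average_speedup(3.0, 2.0, 0.8, 60.0), 2))
```

Under these assumed numbers the sustained average is actually below 1x, i.e. worse than never bursting at all.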
I tend to agree, but I also think that depends on how powerful it turns out to be. If it's comparable to a low-end dGPU, I'd see some takers. The problem with Iris is that while it's far faster than HD 620 and the like, it's not fast enough that that actually matters. 620 does all the acceleration and media playback you want, but Iris can't really game even at 720p. If KBL-G could replace a RX 550 or GT 1030, I would want one.
> You are making the error of assigning the total power budget to a single stack. The power of scaling is to distribute power and computation. What if each stack is 30 W or 50 W? Workable then?

No memory cell will ever have a 30-50 W TDP, unless sizes change dramatically.
> You are making the error of assigning the total power budget to a single stack. The power of scaling is to distribute power and computation. What if each stack is 30 W or 50 W? Workable then?

It sounds like you're mixing up VRAM power and GPU power (or you're not expressing yourself quite clearly). I (we?) was talking about stacking HBM on top of a relatively high-powered chip like a CPU, GPU or APU. Even though the HBM does consume some power, it's bound to be far lower than the chip beneath, unless you're talking about a Core m competitor, in which case a couple of stacks of low-clocked HBM2 would probably be roughly equal. The problem here is twofold: the first (and lesser of the two) is that you're concentrating heat-generating chips into a smaller surface area, i.e. concentrating the heat. The second, and bigger, issue is that the stacked-on-top HBM acts as a thermal insulator between the CPU/GPU/APU die and the IHS (or cold plate in a laptop), making thermal transfer (cooling) far more difficult. In essence, you're trapping part of the heat from the lower chip, not allowing it to be carried away by the cooling system.
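The insulation argument above can be illustrated with a minimal 1-D steady-state thermal model, where the temperature rise equals power times the summed thermal resistance of the layers in the heat path. The layer resistance values below are invented for illustration, not measured:

```python
# Minimal 1-D steady-state model: T_junction = T_ambient + P * sum(R_layers).
# Layer resistances (K/W) are invented for illustration only.

def junction_temp_c(power_w, ambient_c, layer_resistances):
    return ambient_c + power_w * sum(layer_resistances)

direct_path = [0.10]                # die -> TIM -> IHS (assumed TIM resistance)
through_dram = [0.10, 0.40, 0.10]   # die -> TIM -> DRAM stack -> TIM -> IHS

t_direct = junction_temp_c(15.0, 25.0, direct_path)    # 15 W SoC, 25 C ambient
t_stacked = junction_temp_c(15.0, 25.0, through_dram)
print(round(t_direct, 1), round(t_stacked, 1))
```

Even at identical power, the extra series resistance of a DRAM stack in the heat path raises the die temperature; in practice the stack also hinders lateral heat spreading, so a real case would look worse than this sketch.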
> Raven Ridge looks to be exactly what an AMD APU was supposed to be in the first place: a cheap yet powerful CPU+iGPU, and now at low wattage as well.
> With 40% higher iGPU perf than Bristol Ridge at 50% less power, it could easily become the best APU for laptops in 2017.

Yeah, despite wishing for an HBM-equipped APU, I think RR will be pretty great. I'll probably be in the market for a new laptop in the next year, so hopefully they get some good design wins. I'd like something Surface Pro-like (with pen support, and preferably some sort of docking that enables a cTDP-up state), or a simple 13-14" ultrabook like the XPS 13 with great battery life. Should be very doable, and should be able to play some decent games. I'm not looking for AAA on my laptop, but the odd round of Rocket League (without it looking like crap) would be nice.
> Yeah, despite wishing for an HBM-equipped APU, I think RR will be pretty great. I'll probably be in the market for a new laptop in the next year, so hopefully they get some good design wins. I'd like something Surface Pro-like (with pen support, and preferably some sort of docking that enables a cTDP-up state), or a simple 13-14" ultrabook like the XPS 13 with great battery life. Should be very doable, and should be able to play some decent games. I'm not looking for AAA on my laptop, but the odd round of Rocket League (without it looking like crap) would be nice.

I am looking very much at AA on my laptop, in case I ever buy one. The quoted 40% GPU improvement is a direct count from GPU cores. I can see where that fits, but there is still the question of AA: it requires bandwidth.
> With 40% higher iGPU perf than Bristol Ridge at 50% less power, it could easily become the best APU for laptops in 2017.

We both know how that 40% number is calculated; I wonder how it translates in real life, especially with the power claim.
> I am looking very much at AA on my laptop, in case I ever buy one. The quoted 40% GPU improvement is a direct count from GPU cores. I can see where that fits, but there is still the question of AA: it requires bandwidth.

I think you misread my post entirely. I was talking about "triple-A" games (e.g. CoD, Tomb Raider, that kind of stuff), not anti-aliasing.
I cannot live without narrow-tent: SMAA + MLAA + 4x adaptive MSAA with a narrow-tent filter. It would be great if it returned to the filtering stack. It really is something for me.