Discussion Intel current and future Lakes & Rapids thread


Shivansps

Diamond Member
Sep 11, 2013
3,835
1,514
136
-Vega 8 is too small so it has little room to stretch with extra TDP

Vega 8 IS too small. I was the only one to point that out when they downgraded the IGP on Renoir; not sure why it took Intel actually coming up with a decent IGP for people to see it.

Now, I'm really not sure about the 12 CU on RMB, because I don't think you can feed that with just 128-bit LPDDR5.
 

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
12 CU of RDNA2 should do just fine with 128-bit LPDDR5. RDNA2 has significantly better memory compression and bandwidth management than Vega does. The DDR5 implementation should give at least 50% more effective bandwidth, and the bandwidth needed per CU per MHz should actually decrease.
 

uzzi38

Platinum Member
Oct 16, 2019
2,565
5,574
146
12 CU of RDNA2 should do just fine with 128-bit LPDDR5. RDNA2 has significantly better memory compression and bandwidth management than Vega does. The DDR5 implementation should give at least 50% more effective bandwidth, and the bandwidth needed per CU per MHz should actually decrease.

Assuming LPDDR5-5500 you get ~30% extra memory bandwidth over LPDDR4X. On top of that, the 5700 XT performs ~25% better than the Vega 64 with ~10% less bandwidth (or alternatively, compare the equally performing 5600 XT with the Vega 64 and you're looking at ~40% lower bandwidth). Just combining those numbers gets you the ~80% greater performance I mentioned before.
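A quick back-of-the-envelope sketch of how those figures combine (assuming LPDDR4X-4266 as the baseline, which the post doesn't state explicitly; all other ratios are the rough estimates quoted above):

```python
# Rough combination of the estimates above; all inputs are approximate
# forum figures, not measurements.
bandwidth_gain = 5500 / 4266       # LPDDR5-5500 vs LPDDR4X-4266 -> ~1.29 (~30% more bandwidth)
perf_per_bandwidth = 1.25 / 0.90   # 5700 XT: ~25% faster than Vega 64 on ~10% less bandwidth -> ~1.39

combined = bandwidth_gain * perf_per_bandwidth
print(f"Combined scaling estimate: ~{(combined - 1) * 100:.0f}%")  # prints ~79%, i.e. roughly 80%
```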
 

Thunder 57

Platinum Member
Aug 19, 2007
2,647
3,706
136
Vega 8 IS too small. I was the only one to point that out when they downgraded the IGP on Renoir; not sure why it took Intel actually coming up with a decent IGP for people to see it.

Now, I'm really not sure about the 12 CU on RMB, because I don't think you can feed that with just 128-bit LPDDR5.

And I still think you are wrong. One of the key wins of Renoir was having up to 8 cores. If that means cutting 3 CUs, so be it. With the increase in frequency, performance was basically a wash anyway.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
I agree. Based on the numbers I cannot conclude that Iris Xe is very memory bandwidth bound. They were at one point, with the 40EU Iris versions using Gen 8 graphics without eDRAM, but ever since Gen 9 it hasn't really been sensitive.

The 28W DDR4 significantly outperforms the 15W LPDDR4X, yet the 28W LPDDR4X isn't much faster.

Also the consensus is that Iris Xe fares significantly better against Vega 8 at 28W than it does at 15W. At 15W it just falls behind, while at 28W it can be noticeably faster.

Possibilities:
-Vega 8 is too small so it has little room to stretch with extra TDP
-Intel CPU cores are power hungry and take away budget from the GPU
-Tigerlake's design point is at 25W+
-Xe scales more linearly with extra resources

Alderlake can improve on #2, #1 means 12 CU or more on Rembrandt will be awesome, #4 means dGPU may not be terrible.

Independent of whether the observations can be sufficiently explained by bandwidth or power limits, what can be concluded is that Vega has both higher power efficiency and higher bandwidth efficiency. You said it yourself: Vega 8 is more compute limited than anything else at the moment (see #1). Since this conclusion can be drawn before the process improvement + RDNA 2 changes (which improve power and bandwidth efficiency even further), it should be clear that AMD has the opportunity to beat ADL convincingly with Rembrandt. I say "has the opportunity" because it still requires upping the compute performance sufficiently (e.g. having enough CUs).
I wonder when people will get it into their heads: efficiency == performance (everything else being the same).
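A minimal way to write down that last point, purely as an illustration with perf/W as the efficiency metric:

```latex
\[
\text{Perf} = \underbrace{\frac{\text{Perf}}{\text{Power}}}_{\text{efficiency}} \times \text{Power}
\qquad\Rightarrow\qquad
\text{Perf} \propto \text{efficiency at a fixed power budget.}
\]
```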
 
Last edited:

Shivansps

Diamond Member
Sep 11, 2013
3,835
1,514
136
And I still think you are wrong. One of the key wins of Renoir was having up to 8 cores. If that means cutting 3 CUs, so be it. With the increase in frequency, performance was basically a wash anyway.

You are assuming that they were only able to achieve 8C by cutting 3 CUs off the IGP; we don't know that.

And let's remember that when I had this discussion, the main argument was that Vega 8 was bandwidth starved, so there was no point in more CUs (at the same time, AMD was clocking Vega in Renoir very aggressively to compensate for the lost CUs). We now know that Vega 8 is not as bandwidth starved as so many people believed.
 
Last edited:
  • Like
Reactions: mikk

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
It's obvious that AMD could have chosen to increase the size of the Renoir/Lucienne/Cezanne die by the few percent that would have been needed to add two or more extra CUs. That's not the question. The question is, was it worth the various trade-offs that they made in the final chip that was produced? I suggest that it was. At the time it was released, it was the fastest game in town in its full configuration. We also know that AMD has been capacity constrained with their APUs since Renoir was released, so the limited number of CUs has evidently not harmed their ability to sell the chip. Whether we like it or not is not really on their mind.

Going forward, we know that, save for a refresh of Cezanne on N6, Vega 8 is done in APUs. There may be a cut-down version sold in the value sector, but that's about it. It did what it was supposed to do, and it is serving well where it is used.

I look forward to the next generation of APUs.

I wonder if AMD will leave any ray tracing abilities in the RDNA2 CUs in APUs? I doubt that they will be sufficient for game use, but they may have other uses.
 

Mopetar

Diamond Member
Jan 31, 2011
7,797
5,899
136
I don't think they'll include any RT hardware in their APUs. Even cards like a 3090 don't have enough to provide acceptable frame rates without using DLSS, so for an APU it's a complete non-starter. I can't think of any useful functionality that dedicated RT hardware would provide that couldn't be done well enough using the shaders or CPU cores instead. The problem would essentially have to be something that looks like an RT problem to take advantage of the specific hardware.
 

Thunder 57

Platinum Member
Aug 19, 2007
2,647
3,706
136
You are assuming that they were only able to achieve 8C by cutting 3 CUs off the IGP; we don't know that.

And let's remember that when I had this discussion, the main argument was that Vega 8 was bandwidth starved, so there was no point in more CUs (at the same time, AMD was clocking Vega in Renoir very aggressively to compensate for the lost CUs). We now know that Vega 8 is not as bandwidth starved as so many people believed.

I'm sure they could have left them there, at the cost of die size and power usage. I'm sure they looked at the trade-offs and did what they did for a reason. They went with a design that most people would probably prefer. Besides, laptops often get stuck with crap RAM anyway.
 

Mopetar

Diamond Member
Jan 31, 2011
7,797
5,899
136
The 7nm process from TSMC is probably mature enough that there's not much need for the additional redundancy that might get put there when first using a node.

Most of the segmentation across particular parts will be artificial at this point. There's not a lot of reason to include additional CUs that you'll wind up turning off for a lot of parts anyway.
 
  • Like
Reactions: Tlh97

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
That GB score is roughly in the ballpark of the 5900H. The 11800H beats it in single core by a bit, but loses in multi-core. That tends to fit with what we believe about the process, in that it tends to have slightly inferior thermal/power efficiency to N7, resulting in a probable limit to multi-core performance that's related to heat dissipation or total package power draw. At least, that's my guess as to what we're seeing. Of note, the 11800H requires 2.5x the L2 cache and 1.5x the L3 cache to achieve those numbers.
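For reference, a rough check of those cache ratios, assuming the commonly published cache configurations for both 8-core parts (figures from public spec listings, not from this thread):

```python
# Sanity check of the "2.5x L2 / 1.5x L3" figures, using assumed,
# publicly listed cache sizes for the two 8-core parts.
l2_11800h = 8 * 1.25   # MB total L2 on the 11800H (1.25 MB per Willow Cove core)
l2_5900h  = 8 * 0.5    # MB total L2 on the 5900H (512 KB per Zen 3 core)
l3_11800h = 24.0       # MB L3 on the 11800H
l3_5900h  = 16.0       # MB L3 on the 5900H

print(f"L2 ratio: {l2_11800h / l2_5900h:.1f}x")  # 2.5x
print(f"L3 ratio: {l3_11800h / l3_5900h:.1f}x")  # 1.5x
```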
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
I can't think of any useful functionality that dedicated RT hardware would provide that couldn't be done well enough using the shaders or CPU cores instead.

Accurate lighting, including global illumination, refractions, occlusion and reflections, is almost impossible to do with only shader cores (or with the CPU) with sufficient speed in real time. Game developers work with very crude approximations, including screen-space effects, when not utilizing raytracing.
That having been said, raytracing can be done with the CPU and the shaders - it is just highly inefficient.
 
Last edited:
  • Like
Reactions: Tlh97

Mopetar

Diamond Member
Jan 31, 2011
7,797
5,899
136
Accurate lighting, including global illumination, refractions, occlusion and reflections, is almost impossible to do with only shader cores (or with the CPU) with sufficient speed in real time. Game developers work with very crude approximations, including screen-space effects, when not utilizing raytracing.
That having been said, raytracing can be done with the CPU and the shaders - it is just highly inefficient.

The problem is that in an APU you can't include very much of that hardware and even cards like the 3090 don't really have enough of it to make it useful without also using something like DLSS to compensate for the massive performance hit. As such, adding that hardware to an APU is even more pointless. They already struggle to run most games with acceptable framerates at 1080p low settings and even 720p can be taxing for certain titles. The far better approach would be to include specialized hardware to accelerate whatever AMD's version of DLSS is because it would make a larger difference than any RT hardware would.

The new Switch is using a similar approach and I honestly think this is the best use case for DLSS because it lets users get a lot more out of this kind of low-power hardware. I don't know if that specialized hardware would be any more useful than specialized RT hardware, but in terms of what the end-user would get out of it, it's loads better than wasting die space in an APU on any kind of ray tracing. My comment was specifically in response to a question about whether that hardware could be used for things that aren't ray tracing and I doubt it.
 
  • Like
Reactions: coercitiv

Thala

Golden Member
Nov 12, 2014
1,355
653
136
The problem is that in an APU you can't include very much of that hardware and even cards like the 3090 don't really have enough of it to make it useful without also using something like DLSS to compensate for the massive performance hit. As such, adding that hardware to an APU is even more pointless. They already struggle to run most games with acceptable framerates at 1080p low settings and even 720p can be taxing for certain titles. The far better approach would be to include specialized hardware to accelerate whatever AMD's version of DLSS is because it would make a larger difference than any RT hardware would.

The question is not one or the other. DLSS itself only helps you with resolution but does not help you with raytracing - so I'd say both are very valuable. In the bigger picture, both RT and DLSS accelerating HW have the goal of keeping framerates up while improving IQ.
That having been said, the question is whether HW in the performance class of an APU would make a raytracing game playable. I guess that depends on the preferences of the user. The thing is, at least in the case of NVidia, the RT cores are an inherent part of the TPC and are, from an area perspective, only a tiny fraction (around 7%). The question is, from a design and verification perspective, whether it would be worth removing the RT cores.

If we think outside of gaming, RT hardware is also able to accelerate common raytracing tasks for several raytracing applications. As it turns out, using the RT cores for the render backends is faster than just using a generic GPU backend utilizing the compute units (shaders).
 
Last edited:
  • Like
Reactions: Tlh97

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
That GB score is roughly in the ballpark of the 5900H. The 11800H beats it in single core by a bit, but loses in multi-core.

Geekbench tends to favor Intel CPUs compared to AMD. It's also the only benchmark I've seen where Amberlake gets fairly close to the U chips (in all other scenarios Amberlake really falls behind). So it might represent a bursty workload where it performs like a thermally unconstrained part.
 

mikk

Diamond Member
May 15, 2012
4,112
2,106
136
Actually, Hardware Unboxed tests show that while LPDDR4X benefits the Iris Xe G7 a lot at 15W, the differences aren't as profound at 28W:


This test is a mess. If there is a bandwidth benefit at 15W, it won't disappear at 28W. Actually, the difference should be bigger with a higher power budget because the iGPU runs faster. They are mixing so many different Tigerlake devices, and we don't have 3DMark scores to check how fast each one is in comparison with other devices, so this test is useless. And some scores don't make sense; check out the Counter-Strike 15W difference: 124 fps LPDDR4X, 86 fps DDR4. Or Rainbow Six Siege: 46 fps LPDDR4X, 32 fps DDR4. This gap appears too big, and by the way, they should add the exact specification of the memory: x8/x16, single/dual rank, etc. It makes a big difference on Iris Xe. The 15W DDR4 device might run a slower DDR4 combination, which is why 3DMark scores are so important. Some 25W+ DDR4-3200 Iris Xe devices barely reach 4000 points in Fire Strike.
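For what it's worth, both of those 15W gaps work out to roughly the same ~44% advantage for LPDDR4X; a quick check using the fps numbers quoted above:

```python
# Relative LPDDR4X-over-DDR4 gaps at 15W, using the fps numbers quoted above.
results_15w = {
    "Counter-Strike":    (124, 86),  # (LPDDR4X fps, DDR4 fps)
    "Rainbow Six Siege": (46, 32),
}

for game, (lpddr4x, ddr4) in results_15w.items():
    gain = (lpddr4x / ddr4 - 1) * 100
    print(f"{game}: LPDDR4X ~{gain:.0f}% faster")  # ~44% in both cases
```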
 
  • Like
Reactions: Tlh97

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Geekbench tends to favor Intel CPUs compared to AMD. It's also the only benchmark I've seen where Amberlake gets fairly close to the U chips(in all other scenarios Amberlake really falls behind). So it might represent a bursty workload where it performs like a thermally unconstrained part.

Yes, Geekbench is bursty by design.

do we want a benchmark that reflects this thermal throttling issue or do we want a benchmark that sort of represents “This is what the processor can do.” A lot of workloads on phones are bursty. If you’re not playing a game, it’s going to be something like you open up facebook, you upload that picture, you scroll through, you see know like how many likes you have, and then you close your phone, and you’re contented with your life. it’s a lot of bursty stuff like that. Checking email or loading a web page. And you’re going to spend a lot of time where the CPU is just going to be idle.

...

But anyway, we made the decision of “Let’s insert these gaps, let’s give the processor a little bit of time to cool down”

 

Hulk

Diamond Member
Oct 9, 1999
4,191
1,975
136
Will we see U series (~15W) 8 core Intel mobile processors before Alder Lake?
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Will we see U series (~15W) 8 core Intel mobile processors before Alder Lake?

Not on the roadmaps. Considering how power hungry their cores are, they might have to underclock it so significantly that it wouldn't be worth it.

I guess Tigerlake-H set at 35W counts? 35W is what some U systems run at.

This test is a mess.

Yea it could be better. I wonder if they could play with XTU and test by underclocking the LPDDR4X to 3200MT/s?

Still, I think if it gained a lot, Intel would have said something. We don't even know if Alderlake's Xe clocks higher!
 
  • Like
Reactions: Tlh97