Discussion Ada/'Lovelace'? Next gen Nvidia gaming architecture speculation

Page 82 of the AnandTech forums discussion.

insertcarehere

Senior member
Jan 17, 2013
Now this is pretty interesting: Nvidia seems to have changed how they package their mobile GPUs on the PCB between Ampere and Ada.

[Image: Ampere mobile package sizes]
In Ampere the PCB1 "large" package (for large notebooks) covered everything from the 3080 Ti down to the 3050 Ti, while the PCB2 "small" package covered the 3050 Ti downwards (the 3050 Ti was offered in both package sizes).
[Image: Ada mobile package sizes]
In Ada the PCB1 "large" package is limited to RTX 4090/4080 laptops, while the PCB2 "small" package covers everything from RTX 4070 laptops downwards.

This means all the thin-and-light notebooks that were limited to a 3050 Ti on Ampere could technically spec up to an RTX 4070 this generation.
 

leoneazzurro

Senior member
Jul 26, 2016
That is quite probably a consequence of the shrinking die sizes and memory buses on Ada: the 3070 mobile was based on GA104 (392 mm²) with a 256-bit bus, the 4070 mobile on AD106 (190 mm²) with a 128-bit bus. The 3060 also had a bigger die than AD106: GA106 was 276 mm² with up to a 192-bit bus.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
First RTX 4060 Laptop tested (Videocardz)
[Image: RTX 4060 laptop GPU comparison chart (Videocardz)]

I think performance is pretty good, scoring ~12% higher than the fastest RTX 3060 laptop in the notebookcheck database.
Not to my liking is the mere 8GB of VRAM, but that was to be expected.
What is terrible are the laptop prices, unless you are OK with a low-to-mid Alder Lake; those weren't as expensive.
I hope prices will be lower with an AMD CPU, but I am pretty sceptical.
 

MrTeal

Diamond Member
Dec 7, 2003

It appears the 4060 desktop has the same specs as the laptop part rather than using a further cut-down AD106. At those specs it might have difficulty beating the 3060 Ti.
Depending on clocks it should have 15-16 TF of compute, which is right around the 3060 Ti. That's similar to the 3090 Ti and 4070 Ti, so that seems reasonable. The proposed 4060 has more bandwidth relative to the 3060 Ti than the 4070 Ti does to the 3090 Ti, though, so maybe it won't suffer as much at higher resolutions relative to last gen.
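The 15-16 TF figure can be sanity-checked with the usual FP32 throughput formula (2 FLOPs per CUDA core per clock). A minimal sketch; the 3072-core count and the 2.45-2.6 GHz clock range are illustrative assumptions, not confirmed specs:

```python
# Rough FP32 throughput: 2 FLOPs (one FMA) per CUDA core per clock.
def fp32_tflops(cuda_cores: int, boost_ghz: float) -> float:
    return 2 * cuda_cores * boost_ghz / 1000  # cores * GFLOP/s per core -> TF

# Assumed 4060 config: 3072 CUDA cores at plausible boost clocks.
for clock in (2.45, 2.6):
    print(f"{clock} GHz -> {fp32_tflops(3072, clock):.1f} TF")
# 2.45 GHz gives ~15.1 TF and 2.6 GHz ~16.0 TF, bracketing the 15-16 TF estimate.
```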
 

Mopetar

Diamond Member
Jan 31, 2011
The VRAM is going to make it look worse than a 3060 in some titles where 8 GB isn't enough. The benchmarks for the Harry Potter game where RT was turned on and the 3060 was beating the 3080 were kind of funny.
 

insertcarehere

Senior member
Jan 17, 2013
The VRAM is going to make it look worse than a 3060 in some titles where 8 GB isn't enough. The benchmarks for the Harry Potter game where RT was turned on and the 3060 was beating the 3080 were kind of funny.
That benchmark was from HWUB, but the 3080 seems to do fine here as per Computerbase. [Image: Computerbase Hogwarts Legacy benchmark]

I think Hogwarts Legacy just has performance issues that prevent it from being a consistently benchmarkable game in its current state.
 

jpiniero

Lifer
Oct 1, 2010
That benchmark was from HWUB, but the 3080 seems to do fine here as per Computerbase.

I think Hogwarts Legacy just has performance issues that prevent it from being a consistently benchmarkable game in its current state.

I haven't watched the video but I think the HUB benchmarks are from further in the game.
 

jpiniero

Lifer
Oct 1, 2010
Nvidia just announced that Jensen is doing a GTC keynote on March 21. That could be when the 4070 and 4060 Ti are officially announced.
 

Mopetar

Diamond Member
Jan 31, 2011
Let's just be glad we're getting 2GB of VRAM, and I'm glad he hasn't decided to start selling cards with no additional VRAM (to offset increasing costs) and then sell it as an optional upgrade that can be purchased separately.


Everything old is new again. Just make sure to get in before the shader extension packs make an appearance.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
Notebookcheck.net
| Laptop | GPU | TDP | TimeSpy graphics | Cyberpunk 2077 QHD | F1 22 QHD | The Witcher 3 v4.0 QHD |
| --- | --- | --- | --- | --- | --- | --- |
| Schenker XMG Neo 17 (Engineering Sample) | RTX 4070 | 115W + 25W | 12529 (147%) | 57 (178%) | 52 (158%) | 54 (146%) |
| Razer Blade 18 | RTX 4070 | 115W + 25W | 11683 (137%) | 54 (169%) | - | - |
| Gigabyte Aero 16 | RTX 4070 | 80W + 25W | - | 53 (166%) | 44 (133%) | - |
| MSI Katana 17 | RTX 4060 | 80W + 25W | 10299 (121%) | 46 (144%) | 41 (124%) | 41 (111%) |
| Schenker XMG Focus 15 (Engineering Sample) | RTX 4050 | 115W + 25W | 8536 (100%) | 32 (100%) | 33 (100%) | 37 (100%) |
Many more games are tested in that link; I just included a few.
In my opinion, the RTX 4060 looks best of the bunch, even if it has only 75% of the TDP. Regrettably, they didn't have the highest-TDP version; I think it would mean at least +10% performance.
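As a quick check, the percentages in the table are just each TimeSpy score divided by the RTX 4050 baseline:

```python
# Percentages in the notebookcheck table = score relative to the RTX 4050.
baseline = 8536  # RTX 4050, TimeSpy graphics
scores = {"XMG Neo 17 (RTX 4070)": 12529,
          "Razer Blade 18 (RTX 4070)": 11683,
          "MSI Katana 17 (RTX 4060)": 10299}
for name, score in scores.items():
    print(f"{name}: {round(100 * score / baseline)}%")
# -> 147%, 137%, 121%, matching the table.
```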
 

leoneazzurro

Senior member
Jul 26, 2016
Notebookcheck.net
| Laptop | GPU | TDP | TimeSpy graphics | Cyberpunk 2077 QHD | F1 22 QHD | The Witcher 3 v4.0 QHD |
| --- | --- | --- | --- | --- | --- | --- |
| Schenker XMG Neo 17 (Engineering Sample) | RTX 4070 | 115W + 25W | 12529 (147%) | 57 (178%) | 52 (158%) | 54 (146%) |
| Razer Blade 18 | RTX 4070 | 115W + 25W | 11683 (137%) | 54 (169%) | - | - |
| Gigabyte Aero 16 | RTX 4070 | 80W + 25W | - | 53 (166%) | 44 (133%) | - |
| MSI Katana 17 | RTX 4060 | 80W + 25W | 10299 (121%) | 46 (144%) | 41 (124%) | 41 (111%) |
| Schenker XMG Focus 15 (Engineering Sample) | RTX 4050 | 115W + 25W | 8536 (100%) | 32 (100%) | 33 (100%) | 37 (100%) |
Many more games are tested in that link; I just included a few.
In my opinion, the RTX 4060 looks best of the bunch, even if it has only 75% of the TDP. Regrettably, they didn't have the highest-TDP version; I think it would mean at least +10% performance.

That's unlikely, as the 4060 already starts to go asymptotically flat at 105W.


Here are all the performance curves in terms of Timespy for each chip and TDP, compared to the last generation, from a reliable source.

Edit: One thing to note is that XMG/Clevo laptops with the 4070/4080/4090 can optionally use a liquid cooling system. Jarrod received it with their 4070 model but did not use it, to compare apples with apples. It is possible that using that system substantially increases the GPU's performance. I say this because there is quite a difference between the Razer Blade and the XMG systems in the notebookcheck results, even at the same TDP settings. So either:

1) 4070 GPU chips vary greatly in performance from one individual chip to another
2) XMG's cooling is much better than the Razer Blade's
3) XMG is using the LCS

Looking at the Skyjuice table, however, it seems that 2) is the likely answer.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
That's unlikely, as the 4060 already starts to go asymptotically flat at 105W.


Here are all the performance curves in terms of Timespy for each chip and TDP, compared to the last generation, from a reliable source.
The question is whether this is also true for demanding games or only TimeSpy.
If yes, then I don't understand why they go up to 140W (115W + 25W), 33-47% higher.
[Image: TimeSpy score vs. TGP curves per chip]

The 4050 stops scaling at 95W; the 4060 and 4070 stop scaling at 105W.
This means they hit their highest clock speed at that TGP, and more does nothing except increase power consumption. Nvidia should have increased the frequency or lowered the TGP to improve perf/W.
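A rough illustration of the perf/W argument, using the RTX 4060's TimeSpy score from the earlier table and assuming, per the scaling curves, that the score stays flat past 105W:

```python
# If the score is flat past the saturation TGP, extra watts only hurt perf/W.
# Stand-in flat score: the MSI Katana 17's TimeSpy graphics result (assumption).
score = 10299
for tgp in (105, 140):  # saturation point vs. the maximum configurable TGP
    print(f"{tgp} W: {score / tgp:.1f} pts/W")
# Going from 105 W to 140 W costs 25% in perf/W for no extra performance.
```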

edit:
Those two chips are too close in size and performance to each other, but AD106 has an 8% lower max boost.
AD106: 186mm², 36 SM
AD107: 156mm², 24 SM
In my opinion, AD106 should have been made larger: 40 SM + 192-bit 12GB. I don't think it would be more than 210mm².
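A back-of-envelope check on that 210mm² guess, using the AD106/AD107 numbers above; the marginal per-SM area and the 64-bit PHY cost are rough assumptions, not measurements:

```python
# How big would a 40 SM, 192-bit AD106 be, roughly?
# Marginal area per SM inferred from the AD106/AD107 delta; this attributes
# the whole delta to SMs, which overstates the per-SM cost a bit.
area_per_sm = (186 - 156) / (36 - 24)   # = 2.5 mm^2 per SM
extra_sms = 4 * area_per_sm             # growing from 36 to 40 SM
extra_phy = 12                          # extra 64-bit GDDR6 PHY, rough guess
print(f"~{186 + extra_sms + extra_phy:.0f} mm^2")  # lands under the 210 mm^2 guess
```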
 

leoneazzurro

Senior member
Jul 26, 2016
Timespy seems to be a good index of performance scaling for the same or similar architectures. From it, you could predict the performance of the 4060 compared to the 4070/4080/4090, and also why Jarrod's and HWUB's tests saw practically no difference between the 3070 Ti and the 4070 (which is confirmed by other sources, not only Skyjuice and co.). Increasing frequency means adding more power consumption. As for why they push the 4070 and 4060 to a higher power than needed to max out their performance, I cannot say; it is a commercial choice, and in any case performance/W is better than the previous gen, though not at all TGP points. Possibly with better cooling they could squeeze out a little more performance at max TGP (the LCS example).
In any case, I have always found notebookcheck tests not really reliable; they don't always specify the exact game settings or discuss the dependence on the CPU used, while Jarrod and HWUB are far more complete and detailed.
 

MrTeal

Diamond Member
Dec 7, 2003
In the curves by Geekerwan, are those 30-series comparisons mobile parts (i.e. 3080 = GA104) or desktop (GA102)? I'm guessing RTX 3080 mobile based on the data source and the tight grouping with the 3070 line, but that's pretty terrible if the RTX 4070 is only ~10% faster than the RTX 3070, even if it is at lower power.
 

leoneazzurro

Senior member
Jul 26, 2016
These are the mobile RTX 3000 parts. The reason the 4070 posts these results is that even though Ada is far more efficient and more powerful per SM than Ampere, the dies chosen for the mobile parts are far more limited in raw resources. The 4070 as it stands has 4608 CUDA cores while the 3070 Ti has 5888, and the bus width is halved (128-bit vs 256-bit), so even though the boost clock is lower on the 3070 Ti, the efficiency gap is filled by the difference in raw resources. TSMC's 4N process is probably much more expensive than the Samsung one used for Ampere, so die costs are probably not much different. These Ada chips do quite well for their size, and they can be run at much higher power efficiency, but in absolute performance they are being compared against much bigger dies that play the "slower but wider" card and have more bandwidth available.
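The raw-resource gap described above, in numbers; core counts and bus widths are taken from the post, everything else is left out:

```python
# Raw-resource gap: mobile 4070 (AD106) vs. mobile 3070 Ti (GA104).
cores_3070ti, cores_4070 = 5888, 4608
bus_3070ti, bus_4070 = 256, 128
print(f"CUDA cores: {cores_3070ti / cores_4070:.2f}x")  # 1.28x more cores
print(f"Bus width:  {bus_3070ti / bus_4070:.0f}x")      # 2x the bus width
# Ada's per-SM and clock gains roughly cancel a 28% core deficit and,
# with the large L2 cache, a halved memory bus.
```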
 

TESKATLIPOKA

Platinum Member
May 1, 2020
In the curves by Geekerwan, are those 30-series comparisons mobile parts (i.e. 3080 = GA104) or desktop (GA102)? I'm guessing RTX 3080 mobile based on the data source and the tight grouping with the 3070 line, but that's pretty terrible if the RTX 4070 is only ~10% faster than the RTX 3070, even if it is at lower power.
Actually, the RTX 4070 laptop is doing pretty well against the previous gen. It performs a bit better than an RTX 3080 laptop, which is a full 48 SM GA104.
It could do even better if it weren't limited to just 2175MHz, vs 2370MHz for AD107.
From a performance perspective I think the new generation is doing pretty well; what I don't like is the amount of VRAM. AD106 should have been made with 192-bit 12GB memory; then it would be a decent midrange GPU for QHD.
 

MrTeal

Diamond Member
Dec 7, 2003
Actually, the RTX 4070 laptop is doing pretty well against the previous gen. It performs a bit better than an RTX 3080 laptop, which is a full 48 SM GA104.
It could do even better if it weren't limited to just 2175MHz, vs 2370MHz for AD107.
From a performance perspective I think the new generation is doing pretty well; what I don't like is the amount of VRAM. AD106 should have been made with 192-bit 12GB memory; then it would be a decent midrange GPU for QHD.
Yeah for the dies themselves AD106 is doing decently well compared to GA104, providing similar performance at ~75-80% of the power. In terms of product positioning though, in the upper mainstream where you see x70 laptops, there's really not much improvement in performance between a 3070 laptop and a 4070 laptop.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
Yeah for the dies themselves AD106 is doing decently well compared to GA104, providing similar performance at ~75-80% of the power. In terms of product positioning though, in the upper mainstream where you see x70 laptops, there's really not much improvement in performance between a 3070 laptop and a 4070 laptop.
Honestly speaking, I also don't understand the point of releasing AD106 with those specs.
It's 20% bigger, with 20% higher performance than AD107, but the same VRAM.
The AD106 laptop part theoretically has 1.38x the TFLOPs, yet performance is <=20% better.
AD106 should have come with a wider bus and 12GB of VRAM; then it would be a good GPU.
Even a few more SMs wouldn't hurt.
I think a 40 SM + 192-bit GDDR6 AD106 would be <215mm².
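The 1.38x figure follows from the SM counts and the boost clocks quoted earlier (2175 vs 2370 MHz):

```python
# Theoretical FP32 advantage of AD106 over AD107 in these laptop parts:
# 50% more SMs (36 vs 24), but an 8% lower max boost (2175 vs 2370 MHz).
ratio = (36 / 24) * (2175 / 2370)
print(f"{ratio:.2f}x")  # ~1.38x, matching the figure above
```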

Don't get me started on laptop prices; just looking at them makes me sad.
Honestly speaking, buying a new laptop with only 8GB of VRAM is not a good option, but paying 2499€ for the cheapest RTX 4080 12GB is seriously a lot.
N32 doesn't look like it will be any cheaper, based on what N22-based laptops are currently going for.
 

leoneazzurro

Senior member
Jul 26, 2016
Honestly speaking, I also don't understand the point of releasing AD106 with those specs.
It's 20% bigger, with 20% higher performance than AD107, but the same VRAM.
The AD106 laptop part theoretically has 1.38x the TFLOPs, yet performance is <=20% better.
AD106 should have come with a wider bus and 12GB of VRAM; then it would be a good GPU.
Even a few more SMs wouldn't hurt.
I think a 40 SM + 192-bit GDDR6 AD106 would be <215mm².

In general there may be several reasons:

Number one is cost: 5N and 4N are very expensive processes, so the more mainstream products must be small to be profitable in that market range.
Another reason is that these dies are mobile-first and also destined to cover the market for thinner and lighter notebooks, which cannot accommodate large memory buses/bigger dies (because of space/packaging limits); moreover, a bigger die with a 128-bit bus would have been bottlenecked by it.
But in general, yes, there is space for a GPU with intermediate specs between the 4070 mobile and 4080 mobile, possibly a "4070 Ti" with a 192-bit bus, which will probably come out next year with the usual refresh of the line (a 4060 Ti, 4080 Ti, and 4090 Ti are quite possible as well).
 

TESKATLIPOKA

Platinum Member
May 1, 2020
In general there may be several reasons:

Number one is cost: 5N and 4N are very expensive processes, so the more mainstream products must be small to be profitable in that market range.
Another reason is that these dies are mobile-first and also destined to cover the market for thinner and lighter notebooks, which cannot accommodate large memory buses/bigger dies (because of space/packaging limits); moreover, a bigger die with a 128-bit bus would have been bottlenecked by it.
1. That beefed-up AD106 would cost more (an extra 25-30mm² and 4GB of VRAM), true, but performance would also be better. Nvidia could ask more for it, which would naturally result in a higher laptop price; that's bad, but still cheaper than the RTX 4080 12GB and more future-proof than only 8GB of VRAM.

2. As you said, space is a problem. If they used HBM it would be solved, but they keep using GDDR6. With HBM they wouldn't even need a big L2 cache, and the memory PHY would also be smaller. The amount of VRAM would also no longer be a problem.

3. The current AD106 is a "big die" paired with only a 128-bit bus. It is already bottlenecked.
 

Heartbreaker

Diamond Member
Apr 3, 2006
Those two chips are too close in size and performance to each other, but AD106 has an 8% lower max boost.
AD106: 186mm², 36 SM
AD107: 156mm², 24 SM
In my opinion, AD106 should have been made larger: 40 SM + 192-bit 12GB. I don't think it would be more than 210mm².

36 is 50% more than 24 SMs. That's a fairly substantial step.
 