Discussion: Ada/'Lovelace'? Next-gen Nvidia gaming architecture speculation


TESKATLIPOKA

Platinum Member
May 1, 2020
2,355
2,848
106
The 4070 is very likely a cut AD104 with 7168 shaders and maybe a 160-bit bus. I doubt you will even see that soon, let alone AD106.

After all, the 3060 Ti GDDR6X is going to be announced soon.
AD104 with 7168 shaders could also be the 4070 Ti, unless Ti is reserved for higher-clocked versions in the future.
 

Mopetar

Diamond Member
Jan 31, 2011
7,835
5,982
136
The fight AD106 vs N33 will be very interesting.

edit: AD106 with 40 SM should perform a lot better than the 3070-3070 Ti.

N33 is on N6 so I would think Nvidia would have a general advantage. The memory bus size and cache sizes are identical, but AD106 has 25% more shaders, so AMD would need some combination of 25% more clock speed or architectural advantages to go head to head.

They would have a cost advantage being on N6 with a similar overall die size, but the full die AD106 should have an advantage on average.
 
  • Like
Reactions: Tlh97 and Leeea

GodisanAtheist

Diamond Member
Nov 16, 2006
6,783
7,117
136
I'm starting to think N31 and N32 will occupy product branding all the way down to the 7700 line. N33 will be 7600XT and below.

AD104 will likely be competing with the cutdown N32 packages.

If AMD can get away with it, they absolutely will and NV has given them quite the opening.
 
  • Like
Reactions: Tlh97 and Leeea

TESKATLIPOKA

Platinum Member
May 1, 2020
2,355
2,848
106
N33 is on N6 so I would think Nvidia would have a general advantage. The memory bus size and cache sizes are identical, but AD106 has 25% more shaders, so AMD would need some combination of 25% more clock speed or architectural advantages to go head to head.

They would have a cost advantage being on N6 with a similar overall die size, but the full die AD106 should have an advantage on average.
RDNA2 and Ampere are comparable in performance with the same number of CUs vs SMs, based on TPU's game average at 4K.

            Boost clock       CU(WGP) or SM   Shaders        TFLOPs
RX 6950XT   2310 MHz (139%)   80 (40)         5120 (100%)    23.7 (100%)
RTX 3080Ti  1665 MHz (100%)   80              10240 (200%)   34.1 (144%)

Possible specs:

        Die size [mm2]   Frequency [MHz]   CU(WGP) or SM   Shaders   TMU     ROP       Cache [MB]   Bus [bit]
AD106   ?                2610 ? (100%)     40 ?            5120 ?    160 ?   40-48 ?   32 ?         128 ?
N33     ?                3200 ?            32 (16)         4096      128 ?   48-64 ?   32           128
N22x    ?                3600 (138%)       40 (20)         2048      80      64        96           192
I am not sure AD106 has 40 SM, but it's highly likely after looking at the Ampere lineup.
It's hard to predict N33 vs AD106 performance, so I made a super-clocked N22x for comparison purposes, which should perform about like AD106.

N33 vs N22x
N22x has 12.5% higher clocks and 25% more WGPs than N33; this should add up to ~41% higher performance.
Now, the big question is how much better is RDNA3 WGP vs RDNA2 WGP.
We can guess as much as we want here.
It certainly won't be 2x better, but 50% better should be doable, and then N33 would be a bit faster than N22x.
If this turns out to be true, then it can also compete with AD106.
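The back-of-the-envelope scaling above can be sketched in a few lines of Python. The ratios are the speculative numbers from this post, not measurements, and the purely multiplicative model is itself an assumption:

```python
# Naive multiplicative performance model: perf scales with clock ratio,
# WGP-count ratio, and per-WGP (architectural) throughput ratio.
def scaled_perf(clock_ratio: float, wgp_ratio: float, ipc_ratio: float = 1.0) -> float:
    return clock_ratio * wgp_ratio * ipc_ratio

# N22x vs N33: +12.5% clocks and +25% WGPs -> ~1.41x
n22x_over_n33 = scaled_perf(1.125, 1.25)
print(f"N22x over N33 (equal WGP IPC): {n22x_over_n33:.3f}x")  # 1.406x

# If an RDNA3 WGP is 50% better than an RDNA2 WGP, N33 edges ahead of N22x:
n33_over_n22x = scaled_perf(1 / 1.125, 1 / 1.25, 1.5)
print(f"N33 over N22x (RDNA3 WGP +50%): {n33_over_n22x:.3f}x")  # 1.067x
```

In this toy model a 50% per-WGP gain is exactly what it takes for N33 to come out slightly ahead, which is the post's point.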
 

Timorous

Golden Member
Oct 27, 2008
1,608
2,753
136
I'm starting to think N31 and N32 will occupy product branding all the way down to the 7700 line. N33 will be 7600XT and below.

AD104 will likely be competing with the cutdown N32 packages.

If AMD can get away with it, they absolutely will and NV has given them quite the opening.

This has been pretty obvious since N33 was very strongly rumoured to have just a 128-bit bus, because I don't see AMD going backwards on VRAM at the x700 tier. It would go down like a lead balloon.

Looking at what we have so far, the 4080 12GB is roughly 3080 Ti / 6950 XT performance in raster. I expect a cut N32 will have more shaders than the 6950 XT and run them at higher clocks. While I do see an IPC drop-off, I don't think it will be enough to offset the clockspeed gain, so I expect the 7700 XT to come in around or slightly above 3090 Ti performance. If that is true, it lands between the 4080 12GB and the 4080 16GB in a package with a lower BOM than the 4080 12GB.

I also think that at 1080p N33 will be closer to the 4080 12GB in performance than AD106 is to N33.

So ultimately I agree. AMD have a great chance to offer a very compelling product stack and really challenge NV for outright raster performance superiority as well as perf/$ superiority, and they can do it without gutting their margins, which makes it that little bit more likely.
 

Timorous

Golden Member
Oct 27, 2008
1,608
2,753
136
I doubt that, even if we talk only about raster.

This is the rumoured spec of AD106.

https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/1414765a-6d42-4436-9158-e4071bcca163_1024x619.jpeg


The 4090 manages to show a 60% performance uplift in raster (based on the limited data we have seen so far) from 52% more shaders/TMUs running 35% higher boost clocks (not sure on actual clocks, obviously), with 71% more ROPs, the same bandwidth, and 96MB of L2 cache. It has upgrades all around.

AD106 vs GA106 has 20% more shaders (maybe TMUs too), an unknown boost clock increase, the same ROP count, less bandwidth, and 32MB of L2 cache. It looks like an upgrade, but it is not the same across-the-board upgrade the 4090 is over the 3090 Ti. If we say this config shows a 50% performance bump, I think that is generous, and that puts it into 3070 performance territory (at 1440p, because 128-bit + 32MB cache starts to fall off at 4K).

N33 vs N23 has 100% more shaders with an unknown clock increase, the same ROP count, and more bandwidth. A 50% performance bump here is probably on the low end, but that would already land it around 6800 numbers (1440p again), and a 60% increase is around 3080/6800XT tier.

Given the 4080 12GB looks to be ~3080Ti performance, if we normalise to the 3070 then we get this ranking:

Current GPU   Future GPU   Guess performance rating
3070          AD106        100
              N33          111 - 125
3080Ti        4080 12GB    131

So personally I think the above is generous to AD106's gains and understates N33's gains, and we come out with N33 being between 10% and 25% faster than AD106, and between 85% and 95% of 4080 12GB performance.

So I do think N33 is going to be closer to the 4080 12GB in performance than AD106 is to N33. At worst N33 might slot in bang in the middle, but given it has a lower BOM than AD106, I think AMD are onto a winner. If you can get 90% of 4080 12GB performance at 1440p for half the price with a smaller die, then you are doing a 4870 again, except AMD will have faster products to sell you as well.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,355
2,848
106
This is the rumoured spec of AD106.

https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/1414765a-6d42-4436-9158-e4071bcca163_1024x619.jpeg

The 4090 manages to show a 60% performance uplift in raster (based on the limited data we have seen so far) from 52% more shaders/TMUs running 35% higher boost clocks (not sure on actual clocks, obviously), with 71% more ROPs, the same bandwidth, and 96MB of L2 cache. It has upgrades all around.

AD106 vs GA106 has 20% more shaders (maybe TMUs too), an unknown boost clock increase, the same ROP count, less bandwidth, and 32MB of L2 cache. It looks like an upgrade, but it is not the same across-the-board upgrade the 4090 is over the 3090 Ti. If we say this config shows a 50% performance bump, I think that is generous, and that puts it into 3070 performance territory (at 1440p, because 128-bit + 32MB cache starts to fall off at 4K).
Tomorrow we will see how much performance uplift the RTX 4090 really brings.
Full GA106 had 30 SM and GA104 48 SM, a 60% difference.
Full AD104 has 60 SM. If AD106 has only 36 SM, then the difference is 66.67%.
20% more SM (36 vs 30) means +20% shaders and +20% TMUs.
The bump in frequency is ~50% (2.7 GHz).
36 SM * 128 CUDA * 2 FLOP * 2.7 GHz = 24.88 TFLOPs; that's ~15% above the 3070 Ti.
Let's say it performs 50% better than the RTX 3060 (28 SM), although it has +93% TFLOPs, +93% texture fillrate and +50% pixel fillrate, with only bandwidth being lower.
That's about RTX 3070 Ti level of performance at 1080p and RTX 3070 at 1440p (TPU: Cyberpunk 2077).
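For what it's worth, the TFLOPs arithmetic above checks out; here is a quick sketch (the 36 SM and 2.7 GHz figures for AD106 are the rumoured/assumed values from this post, not confirmed specs):

```python
# FP32 TFLOPs = SMs * CUDA cores per SM * 2 FLOP per clock (FMA) * clock (GHz) / 1000
def fp32_tflops(sm: int, cores_per_sm: int, clock_ghz: float) -> float:
    return sm * cores_per_sm * 2 * clock_ghz / 1000

ad106 = fp32_tflops(36, 128, 2.7)        # rumoured AD106 config
rtx_3070ti = fp32_tflops(48, 128, 1.77)  # full GA104, 1.77 GHz official boost
# AD106 lands ~14-15% above the 3070 Ti on paper
print(f"AD106 ~{ad106:.2f} TFLOPs vs 3070 Ti ~{rtx_3070ti:.2f} TFLOPs")
```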

N33 vs N23 has 100% more shaders with an unknown clock increase, the same ROP count, and more bandwidth. A 50% performance bump here is probably on the low end, but that would already land it around 6800 numbers (1440p again), and a 60% increase is around 3080/6800XT tier.

Given the 4080 12GB looks to be ~3080Ti performance, if we normalise to the 3070 then we get this ranking:

Current GPU   Future GPU   Guess performance rating
3070          AD106        100
              N33          111 - 125
3080Ti        4080 12GB    131

So personally I think the above is generous to AD106's gains and understates N33's gains, and we come out with N33 being between 10% and 25% faster than AD106, and between 85% and 95% of 4080 12GB performance.

So I do think N33 is going to be closer to the 4080 12GB in performance than AD106 is to N33. At worst N33 might slot in bang in the middle, but given it has a lower BOM than AD106, I think AMD are onto a winner. If you can get 90% of 4080 12GB performance at 1440p for half the price with a smaller die, then you are doing a 4870 again, except AMD will have faster products to sell you as well.
50% more performance than N23 would be between the RTX 3070 Ti and RX 6800 at 1080p and around the RTX 3070 at 1440p (TPU: Cyberpunk 2077).
That's the same performance as AD106 at 1440p and a bit better at 1080p.
Of course if it's 60% it will aim higher, but can it? Even with 24 Gbps GDDR6 it would have only 37% higher bandwidth than the RX 6650 XT. The number of shader engines, WGPs, IC and ROPs stays the same as N23; the increase in performance will have to come from higher clockspeed and the WGPs (shaders and maybe TMUs).
BTW, AD104 has +67% SM (CUDA and TMU), +67% ROPs, 50% more L2 cache and 50% higher bandwidth, so only 31% higher performance than AD106 looks too low in my opinion.
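The 37% bandwidth figure is easy to verify. Note the 24 Gbps GDDR6 on N33 is hypothetical; the 17.5 Gbps over a 128-bit bus is the RX 6650 XT's actual spec:

```python
# Memory bandwidth in GB/s = data rate (Gbps per pin) * bus width (bits) / 8
def bandwidth_gb_s(gbps_per_pin: float, bus_bits: int) -> float:
    return gbps_per_pin * bus_bits / 8

n33_hypo = bandwidth_gb_s(24.0, 128)    # hypothetical N33 with 24 Gbps GDDR6
rx_6650xt = bandwidth_gb_s(17.5, 128)   # RX 6650 XT
print(n33_hypo, rx_6650xt, f"{n33_hypo / rx_6650xt - 1:.0%}")  # 384.0 280.0 37%
```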
 

exquisitechar

Senior member
Apr 18, 2017
657
871
136
N33 vs N23 has 100% more shaders with an unknown clock increase, the same ROP count, and more bandwidth. A 50% performance bump here is probably on the low end, but that would already land it around 6800 numbers (1440p again), and a 60% increase is around 3080/6800XT tier.
N33 doesn't really have 100% more shaders.
 

Timorous

Golden Member
Oct 27, 2008
1,608
2,753
136
Tomorrow we will see how much performance uplift the RTX 4090 really brings.
Full GA106 had 30 SM and GA104 48 SM, a 60% difference.
Full AD104 has 60 SM. If AD106 has only 36 SM, then the difference is 66.67%.
20% more SM (36 vs 30) means +20% shaders and +20% TMUs.
The bump in frequency is ~50% (2.7 GHz).
36 SM * 128 CUDA * 2 FLOP * 2.7 GHz = 24.88 TFLOPs; that's ~15% above the 3070 Ti.
Let's say it performs 50% better than the RTX 3060 (28 SM), although it has +93% TFLOPs, +93% texture fillrate and +50% pixel fillrate, with only bandwidth being lower.
That's about RTX 3070 Ti level of performance at 1080p and RTX 3070 at 1440p (TPU: Cyberpunk 2077).


50% more performance than N23 would be between the RTX 3070 Ti and RX 6800 at 1080p and around the RTX 3070 at 1440p (TPU: Cyberpunk 2077).
That's the same performance as AD106 at 1440p and a bit better at 1080p.

[chart: TPU relative performance at 2560x1440]


Taken from the ARC review because it is the latest chart TPU have, so it will be on the latest drivers and their latest game suite.

50% over the 6600XT is 146%, which is bang in the middle of the 3070 Ti and 6800; vs a 6650XT it would be closer to the 6800.

50% over the 3060 is 135%, which is 3070 tier.

Given the uplift of AD102 vs GA102 and the 60-65% performance gain over the 3090 Ti shown so far, AD106 has less of an uplift over GA106, so yeah, a 50% uplift seems generous but not impossible.

Of course if it's 60% it will aim higher, but can it? Even with 24 Gbps GDDR6 it would have only 37% higher bandwidth than the RX 6650 XT. The number of shader engines, WGPs, IC and ROPs stays the same as N23; the increase in performance will have to come from higher clockspeed and the WGPs (shaders and maybe TMUs).
BTW, AD104 has +67% SM (CUDA and TMU), +67% ROPs, 50% more L2 cache and 50% higher bandwidth, so only 31% higher performance than AD106 looks too low in my opinion.

GPU        Shaders         ROPs          L2 [MB]       Bus [bit]     Boost clock       TFLOPs   Delta vs lower tier   Perf gap   TFLOP-to-FPS scale
4090       16384 (1.68x)   192 (1.71x)   96 (1.5x)     384 (1.5x)    2.51GHz (1x)      82.575   1.69x                 1.43x      0.85x
4080 16GB  9728 (1.27x)    112 (1.27x)   64 (1.33x)    256 (1.33x)   2.52GHz (0.97x)   49.029   1.23x                 1.2x       0.98x
4080 12GB  7680 (1.67x)    80 (1.67x)    48 (1.5x)     192 (1.5x)    2.61GHz (0.97x)   40.089   1.61x
AD106      4608            48            32            128           ~2.7GHz           24.883
6900XT     5120 (2x)       128 (2x)      128 (1.33x)   256 (1.33x)   2.25GHz (0.87x)   23.040   1.74x                 1.43x      0.827x
6700XT     2560            64            96            192           2.581GHz          13.214

I eyeballed the charts NV showed for the gap to the lower tier, so AD104 may be closer to 1.37x AD106 judging by the 4090 scaling, but we will know more tomorrow.
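The TFLOPs column in the table above can be reproduced from shaders and boost clock. AD106's shader count and clock are rumoured/assumed, and I use 2.52 GHz for the 4090 since that is what its listed 82.575 TFLOPs implies:

```python
# FP32 TFLOPs = shaders * 2 FLOP per clock (FMA) * boost clock (GHz) / 1000
def fp32_tflops(shaders: int, boost_ghz: float) -> float:
    return shaders * 2 * boost_ghz / 1000

specs = {              # (shaders, boost GHz)
    "4090":      (16384, 2.52),
    "4080 16GB": (9728, 2.52),
    "4080 12GB": (7680, 2.61),
    "AD106":     (4608, 2.70),  # rumoured config
    "6900XT":    (5120, 2.25),
    "6700XT":    (2560, 2.581),
}
for gpu, (shaders, ghz) in specs.items():
    print(f"{gpu:10s} {fp32_tflops(shaders, ghz):7.3f} TFLOPs")
```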

Obviously without actual clock speeds we can only get a ballpark figure but it looks like ballpark will be
AD106 ~= 3070
N33 ~= 6800
4080 12GB ~= 3080Ti / 3090

It looks like quite a large gap between AD106 and the 4080 12GB, but a cut-down AD104 with 10GB of RAM could fill that gap to make a 4070.

It boils down to the N33 uplift IMO, but at 1440p I can see it splitting the difference between AD106 and the 4080 12GB. There is also every possibility that the GPU that uses AD106 is not using the full die, so I stand by my estimate that N33 will be closer to the 4080 12GB than whatever uses AD106 is to N33.

N33 doesn't really have 100% more shaders.

It will have double the FP32 and INT32 throughput, so it is closer to double than Ampere was over Turing. Yes, other bits are staying the same, like ROP count and so on, but RDNA2 seems more shader-limited anyway, so that just feels like a rebalance.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,355
2,848
106
It will have double the FP32 and INT32 throughput so it is closer to double than Ampere was over Turing. Yes other bits are staying the same like ROP count and so on but RDNA2 seems more shader limited anyway so that just feels like a rebalance.
N33 has 2x more FP32 units, so throughput should be >2.5x depending on the clockspeed increase (>25%), but it looks like you didn't mean theoretical throughput.
N22 has the same number of ROPs as N23 and wasn't limited, so it shouldn't be a major bottleneck.

Does anyone know when will the RTX 4090 reviews come out tomorrow?
 

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
The same, they have the same suppliers. :p

What do you mean? Fabs create the silicon wafers themselves based on their own processes. Unless you are speaking tongue-in-cheek about the raw materials; then sure, they may both get the raw minerals from the same places. But that's like saying a Daewoo is equivalent to a Porsche because they both get steel and aluminium from the same place.
 
  • Like
Reactions: igor_kavinski

maddie

Diamond Member
Jul 18, 2010
4,738
4,667
136
What do you mean? Fabs create the silicon wafers themselves based on their own processes. Unless you are speaking tongue in cheek in regards to the raw materials. Then sure, they may both get the raw minerals from the same places. But thats like saying a daewoo is equivalent to Porsche because they both get steel and aluminum from the same place.
I don't think so. Here is a list (2022) of the big players. This is part of the B2B supply chain that's generally hidden from the public.

What they do with these wafers is where the fab magic takes place.


GlobalWafers Singapore Pte. Ltd. (Headquarter - Missouri, United States)

Okmetic Oy (Headquarter - Vantaa, Finland)

Shanghai Simgui Technology Co. Ltd. (Headquarter - Shanghai, China)

Shin-Etsu Chemical Co. (Headquarter - Tokyo, Japan)

Silicon Materials Inc. (Headquarter - Pennsylvania, United States)

Siltronic AG (Headquarter - Munich, Germany)

SK Siltron Co., Ltd. (Headquarter - Gumi City, South Korea)

Sumco Corporation (Headquarter - Tokyo, Japan)

Tokuyama Corporation (Headquarter - Tokyo, Japan)

Virginia Semiconductor, Inc. (Headquarter - Virginia, United States)

Wafer Works Corporation (Headquarter -Taoyuan City, Taiwan)
 
  • Like
Reactions: Tlh97 and Leeea

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
I don't think so. Here is a list (2022) of the big players. This is part of the B2B supply chain that's generally hidden from the public.

What they do with these wafers is where the fab magic takes place.


GlobalWafers Singapore Pte. Ltd. (Headquarter - Missouri, United States)

Okmetic Oy (Headquarter - Vantaa, Finland)

Shanghai Simgui Technology Co. Ltd. (Headquarter - Shanghai, China)

Shin-Etsu Chemical Co. (Headquarter - Tokyo, Japan)

Silicon Materials Inc. (Headquarter - Pennsylvania, United States)

Siltronic AG (Headquarter - Munich, Germany)

SK Siltron Co., Ltd. (Headquarter - Gumi City, South Korea)

Sumco Corporation (Headquarter - Tokyo, Japan)

Tokuyama Corporation (Headquarter - Tokyo, Japan)

Virginia Semiconductor, Inc. (Headquarter - Virginia, United States)

Wafer Works Corporation (Headquarter -Taoyuan City, Taiwan)

Ok, slight wording snafu on my part. Several companies provide the wafers to, say, TSMC. However, they are manufactured to TSMC's design specs. If you took a wafer intended for TSMC and gave it to Samsung to do the lithography, it would not work out.
 

BFG10K

Lifer
Aug 14, 2000
22,709
2,958
126
We've reached a new level of idiocy, folks:


Bigger than a keyboard, almost as wide & tall as a PS5, won't even fit into his case, LMFAO.

I used to spend a lot of money on nVidia hardware, but they'll never get a cent from me for such garbage engineering.
 

Tup3x

Senior member
Dec 31, 2016
959
942
136
We've reached a new level of idiocy, folks:


Bigger than a keyboard, almost as wide & tall as a PS5, won't even fit into his case, LMFAO.

I used to spend a lot of money on nVidia hardware, but they'll never get a cent from me for such garbage engineering.
Well, the 4090 FE is actually a bit smaller than the 3090 FE. You can bet that the Radeons will be just as large. AIBs are going nuts.
 
  • Like
Reactions: amenx
Jul 27, 2020
16,165
10,240
106
NVIDIA GeForce RTX 4090 - Where the misconception about 600 watts really comes from and why the cards are so huge | igor'sLAB (igorslab.de)

Finally, we have an explanation of what went wrong. Nvidia engineers gave a thermal design guide of 600W to AIBs but then the chip yield and quality proved to be phenomenal. Unfortunately, the manufacturing wheels had already started turning so it was too late to stop. Seems there should be revised boards soon, once the initial supply of oversized cards runs out. My guess is around January?
 

gorobei

Diamond Member
Jan 7, 2007
3,668
993
136
NVIDIA GeForce RTX 4090 - Where the misconception about 600 watts really comes from and why the cards are so huge | igor'sLAB (igorslab.de)

Finally, we have an explanation of what went wrong. Nvidia engineers gave a thermal design guide of 600W to AIBs but then the chip yield and quality proved to be phenomenal. Unfortunately, the manufacturing wheels had already started turning so it was too late to stop. Seems there should be revised boards soon, once the initial supply of oversized cards runs out. My guess is around January?
So the 4000 series could potentially have been a little cheaper than the 3000 series from the better yields, but because NV overproduced the 3000 series so massively for miners and scalpers, the glut forced them to overprice and delay the smaller Ada cards until Ampere supply clears out. So they look like greedy, arrogant bastards for a few months and rely on the koolaid crowd to forget and forgive when they can price things at the normal team-green tax instead of the super mega team-green tax.

The molds for the AIB cooler plastic exteriors have already been cut into injection dies, so they aren't going to pay for re-machining a smaller/shorter tool die. At best they might be able to get the heatpipe/finstack manufacturer (usually Cooler Master) to recycle the old, thinner Ampere finstacks, meaning the revised cards won't be that much cheaper.
 

psolord

Golden Member
Sep 16, 2009
1,913
1,192
136
We've reached a new level of idiocy, folks:


Bigger than a keyboard, almost as wide & tall as a PS5, won't even fit into his case, LMFAO.

I used to spend a lot of money on nVidia hardware, but they'll never get a cent from me for such garbage engineering.

Considering it will be something like 6 times faster than the PS5 in raster and 12 times faster in RT, it's not THAT bad in terms of engineering.

But yeah, with these prices they are not getting a cent from me either.