Discussion: Ada/'Lovelace'? Next-gen Nvidia gaming architecture speculation


TESKATLIPOKA

Platinum Member
May 1, 2020
2,355
2,848
106
The 4070 is very likely a cut AD104 with 7168 shaders and maybe a 160-bit bus. I doubt you will even see that soon, let alone AD106.

After all, the 3060 Ti GDDR6X is going to be announced soon.
AD104 with 7168 shaders could also be the 4070 Ti, unless Ti is reserved for higher-clocked versions in the future.
 

Mopetar

Diamond Member
Jan 31, 2011
7,835
5,982
136
The fight AD106 vs N33 will be very interesting.

edit: AD106 with 40 SM should perform a lot better than the 3070-3070 Ti.

N33 is on N6 so I would think Nvidia would have a general advantage. The memory bus size and cache sizes are identical, but AD106 has 25% more shaders, so AMD would need some combination of 25% more clock speed or architectural advantages to go head to head.

They would have a cost advantage being on N6 with a similar overall die size, but the full die AD106 should have an advantage on average.
 
  • Like
Reactions: Tlh97 and Leeea

GodisanAtheist

Diamond Member
Nov 16, 2006
6,783
7,117
136
I'm starting to think N31 and N32 will occupy product branding all the way down to the 7700 line. N33 will be 7600XT and below.

AD104 will likely be competing with the cutdown N32 packages.

If AMD can get away with it, they absolutely will and NV has given them quite the opening.
 
  • Like
Reactions: Tlh97 and Leeea

TESKATLIPOKA

Platinum Member
May 1, 2020
2,355
2,848
106
N33 is on N6 so I would think Nvidia would have a general advantage. The memory bus size and cache sizes are identical, but AD106 has 25% more shaders, so AMD would need some combination of 25% more clock speed or architectural advantages to go head to head.

They would have a cost advantage being on N6 with a similar overall die size, but the full die AD106 should have an advantage on average.
RDNA2 and Ampere are comparable in performance with the same number of CUs vs SMs, based on TPU's game average at 4K.

            Boost clock       CU(WGP) or SM   Shaders        TFLOPs
RX 6950XT   2310 MHz (139%)   80 (40)         5120 (100%)    23.7 (100%)
RTX 3080Ti  1665 MHz (100%)   80              10240 (200%)   34.1 (144%)

Possible specs:

        Die size [mm2]   Frequency [MHz]   CU(WGP) or SM   Shaders   TMU     ROP       Cache [MB]   Bus [bit]
AD106   ?                2610 ? (100%)     40 ?            5120 ?    160 ?   40-48 ?   32 ?         128 ?
N33     ?                3200 ?            32 (16)         4096      128 ?   48-64 ?   32           128
N22x    ?                3600 (138%)       40 (20)         2048      80      64        96           192
I am not sure AD106 has 40 SM, but it's highly likely after looking at the Ampere lineup.
It's hard to predict N33 vs AD106 performance, so I made a super-clocked N22x for comparison purposes, which should perform about like AD106.

N33 vs N22x
N22x has 12.5% higher clocks and 25% more WGPs than N33; this should add up to ~41% higher performance.
Now, the big question is how much better is RDNA3 WGP vs RDNA2 WGP.
We can guess as much as we want here.
It certainly won't be 2x better, but 50% better should be doable, and then N33 would be a bit faster than N22x.
If this turns out to be true, then it can also compete with AD106.
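The back-of-the-envelope scaling above can be sketched in a few lines of Python. The ratios are the speculative numbers from this post, not measurements, and the purely multiplicative model is itself an assumption:

```python
# Naive multiplicative performance model: perf scales with clock ratio,
# WGP-count ratio, and per-WGP (architectural) throughput ratio.
def scaled_perf(clock_ratio: float, wgp_ratio: float, ipc_ratio: float = 1.0) -> float:
    return clock_ratio * wgp_ratio * ipc_ratio

# N22x vs N33: +12.5% clocks and +25% WGPs -> ~1.41x
n22x_over_n33 = scaled_perf(1.125, 1.25)
print(f"N22x over N33 (equal WGP IPC): {n22x_over_n33:.3f}x")  # 1.406x

# If an RDNA3 WGP is 50% better than an RDNA2 WGP, N33 edges ahead of N22x:
n33_over_n22x = scaled_perf(1 / 1.125, 1 / 1.25, 1.5)
print(f"N33 over N22x (RDNA3 WGP +50%): {n33_over_n22x:.3f}x")  # 1.067x
```

In this toy model a 50% per-WGP gain is exactly what it takes for N33 to come out slightly ahead, which is the post's point.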
 

Timorous

Golden Member
Oct 27, 2008
1,608
2,753
136
I'm starting to think N31 and N32 will occupy product branding all the way down to the 7700 line. N33 will be 7600XT and below.

AD104 will likely be competing with the cutdown N32 packages.

If AMD can get away with it, they absolutely will and NV has given them quite the opening.

This has been pretty obvious since N33 was very strongly rumoured to have just a 128-bit bus, because I don't see AMD going backwards on VRAM at the x700 tier. It would go down like a lead balloon.

Looking at what we have so far, the 4080 12GB is roughly 3080 Ti / 6950 XT performance in raster. I expect a cut N32 will have more shaders than the 6950 XT and run them at higher clocks. While I do see an IPC drop-off, I don't think it will be enough to offset the clockspeed gain, so I expect the 7700 XT to come in around or slightly above 3090 Ti performance. If that is true, it lands between the 4080 12GB and the 4080 16GB in a package with a lower BOM than the 4080 12GB.

I also think that at 1080p N33 will be closer to the 4080 12GB in performance than AD106 is to N33.

So ultimately I agree. AMD have a great chance to offer a very compelling product stack and really challenge NV for outright raster performance superiority as well as perf/$ superiority, and they can do it without gutting their margins, which makes it that little bit more likely.
 

Timorous

Golden Member
Oct 27, 2008
1,608
2,753
136
I doubt that, even if we talk only about raster.

This is the rumoured spec of AD106.

https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/1414765a-6d42-4436-9158-e4071bcca163_1024x619.jpeg


The 4090 manages to show a 60% performance uplift in raster (based on the limited data we have seen so far) from 52% more shaders/TMUs running 35% higher boost clocks (not sure on actual clocks, obviously), with 71% more ROPs, the same bandwidth, and 96MB of L2 cache. It has upgrades all around.

AD106 vs GA106 has 20% more shaders (maybe TMUs too), an unknown boost clock increase, the same ROP count, less bandwidth, and 32MB of L2 cache. It looks like an upgrade, but it is not the same across-the-board upgrade the 4090 is over the 3090 Ti. If we say this config shows a 50% performance bump, I think that is generous, and that puts it into 3070 performance territory (at 1440p, because 128-bit + 32MB cache starts to fall off at 4K).

N33 vs N23 has 100% more shaders with an unknown clock increase, the same ROP count, and more bandwidth. A 50% performance bump here is probably on the low end, but that would already land it around 6800 numbers (1440p again), and a 60% increase is around 3080/6800XT tier.

Given the 4080 12GB looks to be ~3080Ti performance, if we normalise to the 3070 then we get this ranking:

Current GPU   Future GPU   Guess performance rating
3070          AD106        100
              N33          111 - 125
3080Ti        4080 12GB    131

So personally I think the above is generous to AD106's gains and understates N33's gains, and we come out with N33 being between 10% and 25% faster than AD106, and between 85% and 95% of 4080 12GB performance.

So I do think N33 is going to be closer to the 4080 12GB in performance than AD106 is to N33. At worst N33 might slot in bang in the middle, but given it has a lower BOM than AD106, I think AMD are onto a winner. If you can get 90% of 4080 12GB performance at 1440p for half the price with a smaller die, then you are doing a 4870 again, except AMD will have faster products to sell you as well.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,355
2,848
106
This is the rumoured spec of AD106.

https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/1414765a-6d42-4436-9158-e4071bcca163_1024x619.jpeg

The 4090 manages to show a 60% performance uplift in raster (based on the limited data we have seen so far) from 52% more shaders/TMUs running 35% higher boost clocks (not sure on actual clocks, obviously), with 71% more ROPs, the same bandwidth, and 96MB of L2 cache. It has upgrades all around.

AD106 vs GA106 has 20% more shaders (maybe TMUs too), an unknown boost clock increase, the same ROP count, less bandwidth, and 32MB of L2 cache. It looks like an upgrade, but it is not the same across-the-board upgrade the 4090 is over the 3090 Ti. If we say this config shows a 50% performance bump, I think that is generous, and that puts it into 3070 performance territory (at 1440p, because 128-bit + 32MB cache starts to fall off at 4K).
Tomorrow we will see how much performance uplift the RTX 4090 really brings.
Full GA106 had 30 SM and GA104 48 SM, a 60% difference.
Full AD104 has 60 SM. If AD106 has only 36 SM, then the difference is 66.67%.
20% more SM (36 vs 30) means +20% shaders and +20% TMUs.
The bump in frequency is ~50% (2.7 GHz).
36 SM * 128 CUDA * 2 FLOP * 2.7 GHz = 24.88 TFLOPs; that's ~15% above the 3070 Ti.
Let's say it performs 50% better than the RTX 3060 (28 SM), although it has +93% TFLOPs, +93% texture fillrate and +50% pixel fillrate, with only bandwidth being lower.
That's about RTX 3070 Ti level of performance at 1080p and RTX 3070 at 1440p (TPU: Cyberpunk 2077).
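For what it's worth, the TFLOPs arithmetic above checks out; here is a quick sketch (the 36 SM and 2.7 GHz figures for AD106 are the rumoured/assumed values from this post, not confirmed specs):

```python
# FP32 TFLOPs = SMs * CUDA cores per SM * 2 FLOP per clock (FMA) * clock (GHz) / 1000
def fp32_tflops(sm: int, cores_per_sm: int, clock_ghz: float) -> float:
    return sm * cores_per_sm * 2 * clock_ghz / 1000

ad106 = fp32_tflops(36, 128, 2.7)        # rumoured AD106 config
rtx_3070ti = fp32_tflops(48, 128, 1.77)  # full GA104, 1.77 GHz official boost
# AD106 lands ~14-15% above the 3070 Ti on paper
print(f"AD106 ~{ad106:.2f} TFLOPs vs 3070 Ti ~{rtx_3070ti:.2f} TFLOPs")
```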

N33 vs N23 has 100% more shaders with an unknown clock increase, the same ROP count, and more bandwidth. A 50% performance bump here is probably on the low end, but that would already land it around 6800 numbers (1440p again), and a 60% increase is around 3080/6800XT tier.

Given the 4080 12GB looks to be ~3080Ti performance, if we normalise to the 3070 then we get this ranking:

Current GPU   Future GPU   Guess performance rating
3070          AD106        100
              N33          111 - 125
3080Ti        4080 12GB    131

So personally I think the above is generous to AD106's gains and understates N33's gains, and we come out with N33 being between 10% and 25% faster than AD106, and between 85% and 95% of 4080 12GB performance.

So I do think N33 is going to be closer to the 4080 12GB in performance than AD106 is to N33. At worst N33 might slot in bang in the middle, but given it has a lower BOM than AD106, I think AMD are onto a winner. If you can get 90% of 4080 12GB performance at 1440p for half the price with a smaller die, then you are doing a 4870 again, except AMD will have faster products to sell you as well.
50% more performance than N23 would be between the RTX 3070 Ti and RX 6800 at 1080p and around the RTX 3070 at 1440p (TPU: Cyberpunk 2077).
That's the same performance as AD106 at 1440p and a bit better at 1080p.
Of course if it's 60% it will aim higher, but can it? Even with 24 Gbps GDDR6 it would have only 37% higher bandwidth than the RX 6650 XT. The number of shader engines, WGPs, IC and ROPs stays the same as N23; the increase in performance will have to come from higher clockspeed and the WGPs (shaders and maybe TMUs).
BTW, AD104 has +67% SM (CUDA and TMU), +67% ROPs, 50% more L2 cache and 50% higher bandwidth, so only 31% higher performance than AD106 looks too low in my opinion.
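The 37% bandwidth figure is easy to verify. Note the 24 Gbps GDDR6 on N33 is hypothetical; the 17.5 Gbps over a 128-bit bus is the RX 6650 XT's actual spec:

```python
# Memory bandwidth in GB/s = data rate (Gbps per pin) * bus width (bits) / 8
def bandwidth_gb_s(gbps_per_pin: float, bus_bits: int) -> float:
    return gbps_per_pin * bus_bits / 8

n33_hypo = bandwidth_gb_s(24.0, 128)    # hypothetical N33 with 24 Gbps GDDR6
rx_6650xt = bandwidth_gb_s(17.5, 128)   # RX 6650 XT
print(n33_hypo, rx_6650xt, f"{n33_hypo / rx_6650xt - 1:.0%}")  # 384.0 280.0 37%
```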
 

exquisitechar

Senior member
Apr 18, 2017
657
871
136
N33 vs N23 has 100% more shaders with an unknown clock increase, the same ROP count, and more bandwidth. A 50% performance bump here is probably on the low end, but that would already land it around 6800 numbers (1440p again), and a 60% increase is around 3080/6800XT tier.
N33 doesn't really have 100% more shaders.
 

Timorous

Golden Member
Oct 27, 2008
1,608
2,753
136
Tomorrow we will see how much performance uplift the RTX 4090 really brings.
Full GA106 had 30 SM and GA104 48 SM, a 60% difference.
Full AD104 has 60 SM. If AD106 has only 36 SM, then the difference is 66.67%.
20% more SM (36 vs 30) means +20% shaders and +20% TMUs.
The bump in frequency is ~50% (2.7 GHz).
36 SM * 128 CUDA * 2 FLOP * 2.7 GHz = 24.88 TFLOPs; that's ~15% above the 3070 Ti.
Let's say it performs 50% better than the RTX 3060 (28 SM), although it has +93% TFLOPs, +93% texture fillrate and +50% pixel fillrate, with only bandwidth being lower.
That's about RTX 3070 Ti level of performance at 1080p and RTX 3070 at 1440p (TPU: Cyberpunk 2077).


50% more performance than N23 would be between the RTX 3070 Ti and RX 6800 at 1080p and around the RTX 3070 at 1440p (TPU: Cyberpunk 2077).
That's the same performance as AD106 at 1440p and a bit better at 1080p.

[chart: TPU relative performance at 2560x1440]


Taken from the ARC review because it is the latest chart TPU have, so it will be on the latest drivers and their latest game suite.

50% over the 6600XT is 146%, which is bang in the middle of the 3070 Ti and 6800; vs a 6650XT it would be closer to the 6800.

50% over the 3060 is 135%, which is 3070 tier.

Given the uplift of AD102 vs GA102 and the 60-65% performance gain over the 3090 Ti shown so far, AD106 has less of an uplift over GA106, so yeah, a 50% uplift seems generous but not impossible.

Of course if it's 60% it will aim higher, but can it? Even with 24 Gbps GDDR6 it would have only 37% higher bandwidth than the RX 6650 XT. The number of shader engines, WGPs, IC and ROPs stays the same as N23; the increase in performance will have to come from higher clockspeed and the WGPs (shaders and maybe TMUs).
BTW, AD104 has +67% SM (CUDA and TMU), +67% ROPs, 50% more L2 cache and 50% higher bandwidth, so only 31% higher performance than AD106 looks too low in my opinion.

GPU        Shaders         ROPs          L2 [MB]       Bus [bit]     Boost clock       TFLOPs   Delta vs lower tier   Perf gap   TFLOP-to-FPS scale
4090       16384 (1.68x)   192 (1.71x)   96 (1.5x)     384 (1.5x)    2.51GHz (1x)      82.575   1.69x                 1.43x      0.85x
4080 16GB  9728 (1.27x)    112 (1.27x)   64 (1.33x)    256 (1.33x)   2.52GHz (0.97x)   49.029   1.23x                 1.2x       0.98x
4080 12GB  7680 (1.67x)    80 (1.67x)    48 (1.5x)     192 (1.5x)    2.61GHz (0.97x)   40.089   1.61x
AD106      4608            48            32            128           ~2.7GHz           24.883
6900XT     5120 (2x)       128 (2x)      128 (1.33x)   256 (1.33x)   2.25GHz (0.87x)   23.040   1.74x                 1.43x      0.827x
6700XT     2560            64            96            192           2.581GHz          13.214

I eyeballed the charts NV showed for the gap to the lower tier, so AD104 may be closer to 1.37x AD106 judging by the 4090 scaling, but we will know more tomorrow.
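The TFLOPs column in the table above can be reproduced from shaders and boost clock. AD106's shader count and clock are rumoured/assumed, and I use 2.52 GHz for the 4090 since that is what its listed 82.575 TFLOPs implies:

```python
# FP32 TFLOPs = shaders * 2 FLOP per clock (FMA) * boost clock (GHz) / 1000
def fp32_tflops(shaders: int, boost_ghz: float) -> float:
    return shaders * 2 * boost_ghz / 1000

specs = {              # (shaders, boost GHz)
    "4090":      (16384, 2.52),
    "4080 16GB": (9728, 2.52),
    "4080 12GB": (7680, 2.61),
    "AD106":     (4608, 2.70),  # rumoured config
    "6900XT":    (5120, 2.25),
    "6700XT":    (2560, 2.581),
}
for gpu, (shaders, ghz) in specs.items():
    print(f"{gpu:10s} {fp32_tflops(shaders, ghz):7.3f} TFLOPs")
```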

Obviously without actual clock speeds we can only get a ballpark figure but it looks like ballpark will be
AD106 ~= 3070
N33 ~= 6800
4080 12GB ~= 3080Ti / 3090

It looks like quite a large gap between AD106 and the 4080 12GB, but a cut-down AD104 with 10GB of RAM could fill that gap to make a 4070.

It boils down to the N33 uplift IMO, but at 1440p I can see it splitting the difference between AD106 and the 4080 12GB. There is also every possibility that the GPU that uses AD106 is not using the full die, so I stand by my estimate that N33 will be closer to the 4080 12GB than whatever uses AD106 is to N33.

N33 doesn't really have 100% more shaders.

It will have double the FP32 and INT32 throughput, so it is closer to double than Ampere was over Turing. Yes, other bits are staying the same, like ROP count and so on, but RDNA2 seems more shader-limited anyway, so that just feels like a rebalance.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,355
2,848
106
It will have double the FP32 and INT32 throughput so it is closer to double than Ampere was over Turing. Yes other bits are staying the same like ROP count and so on but RDNA2 seems more shader limited anyway so that just feels like a rebalance.
N33 has 2x more FP32 units, so throughput should be >2.5x depending on the clockspeed increase (>25%), but it looks like you didn't mean theoretical throughput.
N22 has the same number of ROPs as N23 and wasn't limited, so it shouldn't be a major bottleneck.

Does anyone know when will the RTX 4090 reviews come out tomorrow?
 

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
The same, they have the same suppliers. :p

What do you mean? Fabs create the silicon wafers themselves based on their own processes. Unless you are speaking tongue-in-cheek about the raw materials; then sure, they may both get the raw minerals from the same places. But that's like saying a Daewoo is equivalent to a Porsche because they both get steel and aluminium from the same place.
 
  • Like
Reactions: igor_kavinski

maddie

Diamond Member
Jul 18, 2010
4,738
4,667
136
What do you mean? Fabs create the silicon wafers themselves based on their own processes. Unless you are speaking tongue in cheek in regards to the raw materials. Then sure, they may both get the raw minerals from the same places. But thats like saying a daewoo is equivalent to Porsche because they both get steel and aluminum from the same place.
I don't think so. Here is a list (2022) of the big players. This is part of the B2B supply chain that's generally hidden from the public.

What they do with these wafers is where the fab magic takes place.


GlobalWafers Singapore Pte. Ltd. (Headquarter - Missouri, United States)

Okmetic Oy (Headquarter - Vantaa, Finland)

Shanghai Simgui Technology Co. Ltd. (Headquarter - Shanghai, China)

Shin-Etsu Chemical Co. (Headquarter - Tokyo, Japan)

Silicon Materials Inc. (Headquarter - Pennsylvania, United States)

Siltronic AG (Headquarter - Munich, Germany)

SK Siltron Co., Ltd. (Headquarter - Gumi City, South Korea)

Sumco Corporation (Headquarter - Tokyo, Japan)

Tokuyama Corporation (Headquarter - Tokyo, Japan)

Virginia Semiconductor, Inc. (Headquarter - Virginia, United States)

Wafer Works Corporation (Headquarter -Taoyuan City, Taiwan)
 
  • Like
Reactions: Tlh97 and Leeea

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
I don't think so. Here is a list (2022) of the big players. This is part of the B2B supply chain that's generally hidden from the public.

What they do with these wafers is where the fab magic takes place.


GlobalWafers Singapore Pte. Ltd. (Headquarter - Missouri, United States)

Okmetic Oy (Headquarter - Vantaa, Finland)

Shanghai Simgui Technology Co. Ltd. (Headquarter - Shanghai, China)

Shin-Etsu Chemical Co. (Headquarter - Tokyo, Japan)

Silicon Materials Inc. (Headquarter - Pennsylvania, United States)

Siltronic AG (Headquarter - Munich, Germany)

SK Siltron Co., Ltd. (Headquarter - Gumi City, South Korea)

Sumco Corporation (Headquarter - Tokyo, Japan)

Tokuyama Corporation (Headquarter - Tokyo, Japan)

Virginia Semiconductor, Inc. (Headquarter - Virginia, United States)

Wafer Works Corporation (Headquarter -Taoyuan City, Taiwan)

Ok, slight wording snafu on my part. Several companies provide the wafers to, say, TSMC. However, they are manufactured to TSMC's design specs. If you took a wafer intended for TSMC and gave it to Samsung to do the lithography, it would not work out.
 

BFG10K

Lifer
Aug 14, 2000
22,709
2,958
126
We've reached a new level of idiocy, folks:


Bigger than a keyboard, almost as wide & tall as a PS5, won't even fit into his case, LMFAO.

I used to spend a lot of money on nVidia hardware, but they'll never get a cent from me for such garbage engineering.
 

Tup3x

Senior member
Dec 31, 2016
959
942
136
We've reached a new level of idiocy, folks:


Bigger than a keyboard, almost as wide & tall as a PS5, won't even fit into his case, LMFAO.

I used to spend a lot of money on nVidia hardware, but they'll never get a cent from me for such garbage engineering.
Well, the 4090 FE is actually a bit smaller than the 3090 FE. You can bet that the Radeons will be just as large. AIBs are going nuts.
 
  • Like
Reactions: amenx
Jul 27, 2020
16,165
10,240
106
NVIDIA GeForce RTX 4090 - Where the misconception about 600 watts really comes from and why the cards are so huge | igor'sLAB (igorslab.de)

Finally, we have an explanation of what went wrong. Nvidia engineers gave a thermal design guide of 600W to AIBs but then the chip yield and quality proved to be phenomenal. Unfortunately, the manufacturing wheels had already started turning so it was too late to stop. Seems there should be revised boards soon, once the initial supply of oversized cards runs out. My guess is around January?
 

gorobei

Diamond Member
Jan 7, 2007
3,668
993
136
NVIDIA GeForce RTX 4090 - Where the misconception about 600 watts really comes from and why the cards are so huge | igor'sLAB (igorslab.de)

Finally, we have an explanation of what went wrong. Nvidia engineers gave a thermal design guide of 600W to AIBs but then the chip yield and quality proved to be phenomenal. Unfortunately, the manufacturing wheels had already started turning so it was too late to stop. Seems there should be revised boards soon, once the initial supply of oversized cards runs out. My guess is around January?
So the 4000 series could potentially have been a little cheaper than the 3000 series from the better yields, but because NV overproduced the 3000 series so massively for miners and scalpers, the glut forced them to overprice and delay the smaller Ada cards until Ampere supply clears out. So they look like greedy, arrogant bastards for a few months and rely on the koolaid crowd to forget and forgive when they can price things at the normal team-green tax instead of the super mega team-green tax.

The molds for the AIB cooler plastic exteriors have already been cut into injection dies, so they aren't going to pay for re-machining a smaller/shorter tool die. At best they might be able to get the heatpipe/finstack manufacturer (usually Cooler Master) to recycle the old, thinner Ampere finstacks, meaning the revised cards won't be that much cheaper.
 

psolord

Golden Member
Sep 16, 2009
1,913
1,192
136
We've reached a new level of idiocy, folks:


Bigger than a keyboard, almost as wide & tall as a PS5, won't even fit into his case, LMFAO.

I used to spend a lot of money on nVidia hardware, but they'll never get a cent from me for such garbage engineering.

Considering it will be something like 6 times faster than the PS5 in raster and 12 times faster in RT, it's not THAT bad in terms of engineering.

But yeah, with these prices they are not getting a cent from me either.