Discussion Ada/'Lovelace'? Next gen Nvidia gaming architecture speculation


utahraptor

Golden Member
Apr 26, 2004
1,052
199
106
Sorry for so many posts, but to add some closure: I found a Haswell rig on FB Marketplace for $140 that had no GPU or hard drive. I put my RAM, SSD, PSU and 1080 in it and I am back in action! I can't believe the 4770K is running so much cooler on a simple air cooler than my 3770K was on a closed-loop cooler. Maybe I had it hooked up wrong all those years. Set it to auto OC at 4.2 GHz with no issue and just updated the chipset, audio and LAN drivers to be safe.

 

biostud

Lifer
Feb 27, 2003
18,251
4,764
136
Sorry for so many posts, but to add some closure: I found a Haswell rig on FB Marketplace for $140 that had no GPU or hard drive. I put my RAM, SSD, PSU and 1080 in it and I am back in action! I can't believe the 4770K is running so much cooler on a simple air cooler than my 3770K was on a closed-loop cooler. Maybe I had it hooked up wrong all those years. Set it to auto OC at 4.2 GHz with no issue and just updated the chipset, audio and LAN drivers to be safe.

The pump on my old CLC cooler died, so I replaced it with a Hyper 212 EVO and haven't looked back. Maybe your cooler wasn't working properly.
 
  • Like
Reactions: igor_kavinski
Jul 27, 2020
16,328
10,340
106
Sorry for so many posts, but to add some closure: I found a Haswell rig on FB Marketplace for $140 that had no GPU or hard drive. I put my RAM, SSD, PSU and 1080 in it and I am back in action! I can't believe the 4770K is running so much cooler on a simple air cooler than my 3770K was on a closed-loop cooler. Maybe I had it hooked up wrong all those years. Set it to auto OC at 4.2 GHz with no issue and just updated the chipset, audio and LAN drivers to be safe.

OK, my good man, pay your bills and let's put that 4090 bad boy back in for another trial run :D
 

utahraptor

Golden Member
Apr 26, 2004
1,052
199
106
OK, my good man, pay your bills and let's put that 4090 bad boy back in for another trial run :D
So at the time my computer died and I called Gigabyte technical support, I was not aware that my entire computer had died. I was in the denial stage of grief, and in my mind only the GPU could have failed, nothing as drastic as the entire PC. I entered a replacement request with Newegg, which was approved, drove to the UPS store over my lunch and mailed the 4090 back to Newegg. I don't know if it is dead or not. I thought it was at the time, but had no way to determine that. I don't know yet what Newegg will do or what the status of the card is. In any event it will not fit in the new case, and I would not risk putting it in there anyway. I will buy a large case (maybe a Fractal Pop XL Air) whenever the Ryzen 7000 3D V-Cache chips launch, and if I obtain a new 4090 it will go in there.
 
  • Like
Reactions: CP5670

In2Photos

Golden Member
Mar 21, 2007
1,629
1,651
136
NVIDIA making more changes to the 4080 already.

 
  • Like
Reactions: Ranulf and Mopetar

TESKATLIPOKA

Platinum Member
May 1, 2020
2,356
2,848
106

I take this as a bad sign in that they felt like higher frequencies didn't matter because of the memory bottleneck. Still probably 3070 - 3070 Ti performance.
It's a cut-down version of AD106.
4060 Ti: 34 SM; 160 W; 32 MB L2; 128-bit 18 Gbps GDDR6 -> 288 GB/s
4070 Ti: 60 SM (+76.5%); 285 W (+78%); 48 MB L2 (+50%); 192-bit 21 Gbps GDDR6X -> 504 GB/s (+75%)

There is no memory bottleneck, and frequencies should be comparable.
The only problems with this card are the 8 GB of VRAM and, likely, the price.
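For anyone who wants to check the arithmetic, here is a quick sketch of where those bandwidth figures and percentage deltas come from (plain Python, using only the rumored specs above):

```python
# Peak memory bandwidth (GB/s) = bus width in bits / 8 * per-pin data rate in Gbps
def mem_bw_gbs(bus_bits: int, gbps: float) -> float:
    return bus_bits / 8 * gbps

rtx_4060_ti = {"sm": 34, "watts": 160, "l2_mb": 32, "bw": mem_bw_gbs(128, 18)}  # 288.0 GB/s
rtx_4070_ti = {"sm": 60, "watts": 285, "l2_mb": 48, "bw": mem_bw_gbs(192, 21)}  # 504.0 GB/s

for key in ("sm", "watts", "l2_mb", "bw"):
    delta = rtx_4070_ti[key] / rtx_4060_ti[key] - 1
    print(f"{key}: +{delta:.1%}")   # sm: +76.5%, watts: +78.1%, l2_mb: +50.0%, bw: +75.0%
```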
 

MrTeal

Diamond Member
Dec 7, 2003
3,569
1,699
136
It's probably a little early to say that a 4060 Ti with those specs won't have bandwidth issues. At least looking back at AMD's slides from the introduction of Infinity Cache, hit rates are primarily resolution and cache-size dependent. The relative difference in bandwidth and in SM count between the 4070 Ti and the 4060 Ti is about the same, but the 4060 Ti might have to go out to GDDR6 more often. If AMD's slide is still accurate, it shows 32MB of IC having a 55% hit rate at 1080p, while 48MB is more like 66%. That's a miss rate of 45% versus 34%, meaning the 32MB GPU might need ~33% more trips to memory than the 48MB one.

Maybe it won't be so bad, but 288GB/s is just so low. That's 64% of what the 3060 Ti had and is basically halfway between a 3060 and a 3050.

Edit: Especially considering that the 4070 Ti already shows indications of being bandwidth limited, going from leading a 3090 Ti at 1080p to being basically tied at 1440p and ~10% down at 4K.
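To make the miss-rate arithmetic explicit, here is a rough sketch. It borrows the hit rates from AMD's RDNA2 Infinity Cache slide as a stand-in; Ada's L2 may well behave differently:

```python
# External-memory traffic scales with the miss rate (traffic ≈ data demand × miss rate)
hit_32mb_1080p = 0.55   # 32MB cache at 1080p, per AMD's slide (assumption for Ada's L2)
hit_48mb_1080p = 0.66   # 48MB cache at 1080p

miss_32 = 1 - hit_32mb_1080p
miss_48 = 1 - hit_48mb_1080p
extra_trips = miss_32 / miss_48 - 1
print(f"The 32MB part goes out to VRAM ~{extra_trips:.0%} more often")  # ~32%, roughly a third more trips
```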
 
Last edited:

TESKATLIPOKA

Platinum Member
May 1, 2020
2,356
2,848
106
It's probably a little early to say that a 4060 Ti with those specs won't have bandwidth issues. At least looking back at AMD's slides from the introduction of Infinity Cache, hit rates are primarily resolution and cache-size dependent. The relative difference in bandwidth and in SM count between the 4070 Ti and the 4060 Ti is about the same, but the 4060 Ti might have to go out to GDDR6 more often. If AMD's slide is still accurate, it shows 32MB of IC having a 55% hit rate at 1080p, while 48MB is more like 66%. That's a miss rate of 45% versus 34%, meaning the 32MB GPU might need ~33% more trips to memory than the 48MB one.

Maybe it won't be so bad, but 288GB/s is just so low. That's 64% of what the 3060 Ti had and is basically halfway between a 3060 and a 3050.
If the 4070 Ti has 76.5% more SMs and its TFLOPs will be >66% higher, doesn't that mean it also needs >66% more data to access?
Then the 4070 Ti would need to go out to GDDR6X more often than the 4060 Ti, because its miss rate is only ~33% better in comparison.
The 4070 Ti would need a 27% miss rate, i.e. a 73% hit rate, to be on par with the 4060 Ti. That would take ~72MB at 1080p.
Is this correct, or did I write BS?

Edit: The 4070 Ti has only 48MB of L2, which is only ~35% hit rate at 4K, so it's not surprising it starts to lose against a GPU with 2x more BW. Actually, even the gap between the 3090 Ti and the 4080 is smaller at 4K than at full HD; 64MB of L2 is only ~40% hit rate at 4K.
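A sketch of the arithmetic behind that hit-rate target, under the same assumptions (data demand scales with compute, and AMD's RDNA2 figures stand in for Ada's L2):

```python
# Miss rate the 4070 Ti would need for its absolute VRAM traffic to match the
# 4060 Ti's, if it has to touch ~66% more data per frame.
miss_4060ti_1080p = 0.45   # 32MB cache at 1080p (AMD slide, used as an assumption)
compute_scaling = 1.66     # ">66% higher" TFLOPs

required_miss = miss_4060ti_1080p / compute_scaling
print(f"required miss rate: {required_miss:.0%} (hit rate: {1 - required_miss:.0%})")
# -> required miss rate: 27% (hit rate: 73%)
```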
 
Last edited:

MrTeal

Diamond Member
Dec 7, 2003
3,569
1,699
136
I don't believe so, but I would definitely cede the point to someone more knowledgeable than I am. My basic understanding is that increasing framerates bring increasing bandwidth requirements, as the GPU needs to go to memory (either cache or external) for every frame. However, the amount of memory required depends on how much data needs to be stored, which is resolution (and game) dependent. If the data needed is in cache you don't need to go to external memory; otherwise you do. For a given game and resolution, having a larger L2 means higher hit rates, so fewer trips to external memory. In the case of something like the 4070 Ti vs the 4060 Ti, the larger cache means fewer misses, and thus less data is needed from external memory at a given resolution, and the wider bus and faster memory mean that data takes less time to transfer.

This supposes that hit rate doesn't go down at higher framerates vs lower ones, but I don't think it logically does. Scene data is more time dependent than framerate dependent; you move from one room to another in X seconds, not in Y number of frames. That's just speculation on my part though.

Both vendors are using big caches to compensate for narrower bus widths and lower bandwidth though. Something like the 6750 XT used a big 96MB cache and still had 432GB/s to external memory. It'll be interesting to see how a card that should have similar gaming performance does at higher resolutions with a cache 1/3 the size and 2/3 the memory bandwidth.
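A toy model of that intuition; all the numbers here (bytes touched per frame, hit rates) are made up purely to show the shape of the relationship, not measured values:

```python
# Required VRAM bandwidth ≈ frames per second × data touched per frame × miss rate.
# Hit rate is modeled as a function of resolution (working-set size), not of framerate.
def vram_bw_needed_gbs(fps: float, frame_data_gb: float, hit_rate: float) -> float:
    return fps * frame_data_gb * (1 - hit_rate)

# Hypothetical per-frame working sets and hit rates, illustrative only
print(vram_bw_needed_gbs(fps=120, frame_data_gb=4.0, hit_rate=0.55))  # ~1080p: 216 GB/s
print(vram_bw_needed_gbs(fps=120, frame_data_gb=9.0, hit_rate=0.34))  # ~4K:    712.8 GB/s
```

Doubling the framerate doubles the required bandwidth in this model, while the hit rate only moves when the resolution (and hence the working set) changes.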
 
  • Like
Reactions: TESKATLIPOKA
Jul 27, 2020
16,328
10,340
106
Scene data is more time dependent than framerate dependent; you move from one room to another in X seconds, not in Y number of frames. That's just speculation on my part though.
The proliferation of open-world game designs is exacerbating the memory pressure issue. If you are travelling in a high-speed vehicle, for instance, the game is constantly streaming data, and this introduces more latency as new data has to be loaded into VRAM and older data evicted to make space for it. Clever design where a lot of textures are re-used seamlessly, without letting it be too obvious to the gamer, is one solution, but it requires more creativity and increases the development time needed to figure everything out. That's why more VRAM makes more sense now than increasing the cache size. There's a limit to cache size, but it's much easier and more economical to fit a larger VRAM pool, as long as you are not using crazy-expensive memory tech like GDDR6X or HBM3.

I also think that 32GB of onboard Optane on a graphics card, as an additional cache layer, could further reduce that latency.
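The streaming-and-eviction behavior described above is essentially an LRU cache over game assets. A minimal sketch, with hypothetical asset names and Python's OrderedDict standing in for VRAM residency:

```python
from collections import OrderedDict

class VramCache:
    """Toy LRU model of VRAM residency: the least recently used asset is evicted first."""
    def __init__(self, capacity_mb: int):
        self.capacity_mb = capacity_mb
        self.assets = OrderedDict()          # asset name -> size in MB

    def request(self, name: str, size_mb: int) -> str:
        if name in self.assets:
            self.assets.move_to_end(name)    # touched again: mark as most recently used
            return f"{name}: hit"
        while self.assets and sum(self.assets.values()) + size_mb > self.capacity_mb:
            self.assets.popitem(last=False)  # evict the oldest asset to make space
        self.assets[name] = size_mb
        return f"{name}: miss, streamed in"

vram = VramCache(capacity_mb=8192)
for tex in ["city_block_a", "city_block_b", "city_block_a", "city_block_c"]:
    print(vram.request(tex, size_mb=3000))   # the second city_block_a request is a hit
```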
 

MrTeal

Diamond Member
Dec 7, 2003
3,569
1,699
136
I also think that 32GB of onboard Optane on a graphics card, as an additional cache layer, could further reduce that latency.
What would be the benefit of that? I wouldn't think there would be a use for the non-volatile nature of it on a GPU, and would there be much latency benefit to onboard Optane vs going over PCIe to main memory? I thought, at least on the CPU side, that the benefit of Optane was providing a layer of NVM between DRAM and SSD, with much larger capacities than you'd have with DRAM but still faster than a traditional NVMe SSD.
Or do you mean using Optane as a cache for calls to NVMe through DirectStorage or something similar? If that were the case, wouldn't you want something a lot bigger than 32GB?
 
Jul 27, 2020
16,328
10,340
106
Or do you mean using Optane as a cache for calls to NVMe through DirectStorage or something similar? If that were the case, wouldn't you want something a lot bigger than 32GB?
64GB or 128GB would be better, but I don't expect Intel to let that be economical. It should save time because the Optane's access times would be lower than those of the SSD you would otherwise be pulling data from.
 

MrTeal

Diamond Member
Dec 7, 2003
3,569
1,699
136
64GB or 128GB would be better, but I don't expect Intel to let that be economical. It should save time because the Optane's access times would be lower than those of the SSD you would otherwise be pulling data from.
Ah, yeah. At that point it's almost like the AMD SSG, but with somewhat faster memory. It's kind of beside the point now, I guess, since Optane is EOL.
 
Jul 27, 2020
16,328
10,340
106
Ah, yeah. At that point it's almost like the AMD SSG, but with somewhat faster memory. It's kind of beside the point now, I guess, since Optane is EOL.
I hope Raja Koduri reads these forums. He can finally find Optane a nice home and purpose.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,356
2,848
106
I don't believe so, but I would definitely cede the point to someone more knowledgeable than I am. My basic understanding is that increasing framerates bring increasing bandwidth requirements, as the GPU needs to go to memory (either cache or external) for every frame. However, the amount of memory required depends on how much data needs to be stored, which is resolution (and game) dependent. If the data needed is in cache you don't need to go to external memory; otherwise you do. For a given game and resolution, having a larger L2 means higher hit rates, so fewer trips to external memory. In the case of something like the 4070 Ti vs the 4060 Ti, the larger cache means fewer misses, and thus less data is needed from external memory at a given resolution, and the wider bus and faster memory mean that data takes less time to transfer.

This supposes that hit rate doesn't go down at higher framerates vs lower ones, but I don't think it logically does. Scene data is more time dependent than framerate dependent; you move from one room to another in X seconds, not in Y number of frames. That's just speculation on my part though.

Both vendors are using big caches to compensate for narrower bus widths and lower bandwidth though. Something like the 6750 XT used a big 96MB cache and still had 432GB/s to external memory. It'll be interesting to see how a card that should have similar gaming performance does at higher resolutions with a cache 1/3 the size and 2/3 the memory bandwidth.
I think you are right, and hit rate won't be affected by higher FPS.

Then what should matter is the actual BW needs.
If GPU A is 50% faster than GPU B and has the same 32MB L2 as GPU B but 50% higher BW to VRAM, are they comparable, or is the same L2 size a bottleneck for the stronger one?
I will calculate based on Infinity Cache 2, because there is practically no info about Ada's L2.


| Infinity Cache 2 | 16 MB | 32 MB | 48 MB | 64 MB | 96 MB | 128 MB (doesn't exist) |
| --- | --- | --- | --- | --- | --- | --- |
| B/clk | 384 | 768 | 1152 | 1536 | 2304 | 3072 |
| Theoretical BW at 2.5 GHz | 960 GB/s | 1920 GB/s | 2880 GB/s | 3840 GB/s | 5760 GB/s | 7680 GB/s |
| Hit rate at 1080p (BW) | 37% (355 GB/s) | 55% (1056 GB/s) | 65% (1872 GB/s) | 72% (2765 GB/s) | 78% (4493 GB/s) | 80% (6144 GB/s) |
| Hit rate at 1440p (BW) | 23% (221 GB/s) | 38% (730 GB/s) | 48% (1382 GB/s) | 57% (2189 GB/s) | 69% (3974 GB/s) | 74% (5683 GB/s) |
| Hit rate at 4K (BW) | 19% (182 GB/s) | 27% (518 GB/s) | 34% (979 GB/s) | 42% (1613 GB/s) | 53% (3053 GB/s) | 62% (4762 GB/s) |
At 1080p, 48MB of IC has 77% higher delivered BW than 32MB.

I can't tell if the reason for the 48MB L2 in AD104 is the hit rate at higher resolutions, or also to provide higher BW at the same resolution because it's needed.

P.S. It looks like Infinity Cache 2 is working at ~1.9GHz, a similar speed to the first Infinity Cache.
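For reference, here is how the table's numbers fall out, assuming bytes per clock scale linearly with cache size, the hit rates are AMD's published RDNA2 figures, and the cache runs at 2.5 GHz:

```python
# Delivered cache bandwidth ≈ bytes/clock × clock × hit rate
CLOCK_GHZ = 2.5
BYTES_PER_CLK = {16: 384, 32: 768, 48: 1152, 64: 1536, 96: 2304, 128: 3072}    # per cache size in MB
HIT_RATE_1080P = {16: 0.37, 32: 0.55, 48: 0.65, 64: 0.72, 96: 0.78, 128: 0.80}

for size_mb, bpc in BYTES_PER_CLK.items():
    theoretical = bpc * CLOCK_GHZ                       # GB/s peak out of the cache
    delivered = theoretical * HIT_RATE_1080P[size_mb]   # GB/s actually served at 1080p
    print(f"{size_mb:>3} MB: {theoretical:6.0f} GB/s peak, {delivered:6.1f} GB/s delivered at 1080p")
```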
 
Last edited:

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
The proliferation of open-world game designs is exacerbating the memory pressure issue. If you are travelling in a high-speed vehicle, for instance, the game is constantly streaming data, and this introduces more latency as new data has to be loaded into VRAM and older data evicted to make space for it. Clever design where a lot of textures are re-used seamlessly, without letting it be too obvious to the gamer, is one solution, but it requires more creativity and increases the development time needed to figure everything out. That's why more VRAM makes more sense now than increasing the cache size. There's a limit to cache size, but it's much easier and more economical to fit a larger VRAM pool, as long as you are not using crazy-expensive memory tech like GDDR6X or HBM3.

I also think that 32GB of onboard Optane on a graphics card, as an additional cache layer, could further reduce that latency.

Having an SSD would be really bad for this use case. The longevity would be poor due to the relatively low write life of solid state storage.

I suppose it could be argued that a level 4(?) cache could be handy, where the GPU itself has its L1 and L2 memory on the GPU. Then you have the main graphics memory, which basically acts as an L3 cache and needs to be very fast with high bandwidth. But you could then have an L4 pool of memory that could be much cheaper and not as performant, yet still faster than going back to the CPU, which then goes to system storage to grab additional resources. It would essentially put an end to load stuttering when entering new areas in a game, or any time assets need to be loaded for the first time; they would just be pre-loaded into this slower memory.
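A back-of-the-envelope way to look at such a hierarchy; every hit rate and latency below is a hypothetical placeholder, only meant to show how an extra tier changes the average cost of an asset fetch:

```python
# Average fetch cost over a tiered memory hierarchy:
# sum over tiers of (fraction of requests served there × that tier's latency)
tiers = [
    ("L2 cache on the GPU",     0.55, 0.0002),  # (name, fraction served, latency in ms) - hypothetical
    ("VRAM",                    0.40, 0.001),
    ("onboard L4 pool",         0.04, 0.05),
    ("system storage via CPU",  0.01, 5.0),
]

avg_ms = sum(fraction * latency_ms for _, fraction, latency_ms in tiers)
print(f"average fetch cost: {avg_ms:.4f} ms")   # dominated by the rare trips out to storage
```

Even a slow L4 tier helps mainly by shrinking that last line item, the fraction of requests that have to go all the way back to system storage.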
 
  • Love
Reactions: igor_kavinski

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
Are any of the 4000-series cards released so far a full die?

No, and I don't expect there to be any. They will leave room for Ti or Super refreshes.

Correction: the 4070 Ti is a full-die AD104.
 
Last edited: