Memory BW is obviously why the 3090 Ti pulls ahead at 4K.
The 4070 Ti has literally only half the memory bandwidth, and that matters most at the highest resolutions. It would be more surprising if this weren't the case.
A lot of those extra transistors are there to compensate for the lost bandwidth, but they fall short at 4K. Some of the other transistors go toward better RT, and more transistors can also be spent on reaching higher clock speeds.
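The "half the bandwidth" claim is easy to sanity-check from the published specs: both cards use 21 Gbps GDDR6X, but the 3090 Ti has a 384-bit bus versus 192-bit on the 4070 Ti. A quick sketch (spec values from the cards' published datasheets):

```python
def mem_bw_gbs(bus_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s = bus width in bytes * per-pin data rate."""
    return bus_bits / 8 * data_rate_gbps

# Both cards ship 21 Gbps GDDR6X; only the bus width differs.
bw_3090ti = mem_bw_gbs(384, 21)  # 1008 GB/s
bw_4070ti = mem_bw_gbs(192, 21)  # 504 GB/s

print(bw_3090ti, bw_4070ti, bw_4070ti / bw_3090ti)  # ratio is exactly 0.5
```

So "literally half" is exact, not an exaggeration; the 4070 Ti leans on its much larger L2 cache to make up the difference.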
OK. I should have compared only at 1080p; at that resolution it's a bit faster than the RTX 3090 Ti.
What you said is true, but don't forget GA102 has more of everything: SMs, CUDA cores, ROPs, TMUs, memory bus width, etc.
Here is maybe a better comparison.
| | Transistors (B) | SM | CUDA | TMU | ROP | RT cores | Tensor cores | Bus width | L2 cache | Base clock | Boost clock |
|---|---|---|---|---|---|---|---|---|---|---|---|
| GA104 | 17.4 | 48 | 6144 | 192 | 96 | 48 | 192 | 256-bit | 4 MB | 1580 MHz | 1770 MHz |
| AD104 | 35.8 (+106%) | 60 (+25%) | 7680 (+25%) | 240 (+25%) | 80 (-17%) | 60 (+25%) | 240 (+25%) | 192-bit (-25%) | 48 MB (+1100%) | 2310 MHz (+46%) | 2610 MHz (+47%) |
Performance is only 41-43% better, but the transistor count is more than 100% higher.
I wonder how many of those transistors went to the extra cache.
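The percentage deltas in the table above are straightforward to verify; a minimal sketch using a few of the table's values (the dict keys here are just illustrative names, not official spec labels):

```python
# Spec values copied from the GA104 vs AD104 comparison table.
ga104 = {"transistors_b": 17.4, "sm": 48, "cuda": 6144, "boost_mhz": 1770}
ad104 = {"transistors_b": 35.8, "sm": 60, "cuda": 7680, "boost_mhz": 2610}

def delta_pct(old: float, new: float) -> int:
    """Percent change from old to new, rounded to the nearest integer."""
    return round((new - old) / old * 100)

for key in ga104:
    print(f"{key}: {delta_pct(ga104[key], ad104[key]):+d}%")
# transistors +106%, SM +25%, CUDA +25%, boost clock +47%
```

Run against the table, this reproduces the quoted figures: a +106% transistor budget buying +25% more units and roughly +47% clocks, which lines up with the observed ~41-43% performance gain plus a very large L2.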