The immediate oddity here is that power efficiency is normally measured at a fixed level of power consumption, not a fixed level of performance. With the power consumption of a transistor scaling at roughly the cube of the voltage (dynamic power goes with the square of the voltage, and clock speed in turn scales roughly with voltage), a “wider” part like Ampere with more functional blocks can clock itself at a much lower frequency to hit the same overall performance as Turing. In essence, this graph compares Turing at its worst to Ampere at its best, asking “what would it be like if we downclocked Ampere to be as slow as Turing?” rather than “how much faster is Ampere than Turing under the same constraints?” NVIDIA’s graph is not presenting an apples-to-apples performance comparison at a specific power draw.
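The cubic relationship above can be sketched numerically. This is an illustrative back-of-the-envelope model, not measured data for any real GPU: it assumes dynamic power scales as frequency times voltage squared, with voltage scaling roughly linearly with frequency.

```python
# Rough sketch of why a wider, lower-clocked chip can look very efficient.
# Assumption: dynamic power P ~ f * V^2, and V scales ~linearly with f,
# so P ~ f^3. All numbers are illustrative, not real GPU measurements.

def relative_power(freq_ratio: float) -> float:
    """Power relative to baseline when frequency (and voltage) scale by freq_ratio."""
    return freq_ratio ** 3

# A hypothetical part with 2x the functional units needs only ~half the
# clock for the same throughput. Per-unit power falls to 0.5^3 = 12.5%,
# so total chip power is 2 * 12.5% = 25% of the narrower part's.
wide_chip_power = 2 * relative_power(0.5)
print(wide_chip_power)  # 0.25
```

This is exactly why a fixed-performance comparison flatters the wider chip: it gets to ride down the steep part of the voltage/frequency curve.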
If you actually make a fixed-wattage comparison, then Ampere doesn’t look quite as good as NVIDIA’s graph implies. Whereas Turing hits 60fps at 240W in this example, at that same 240W Ampere’s performance curve has it at roughly 90fps. To be sure, this is still a sizable improvement, but it’s only a 50% improvement in performance-per-watt. Ultimately the exact improvement in power efficiency will depend on where in the graph you sample, but it’s clear that NVIDIA’s power efficiency gains with Ampere, as defined by more conventional metrics, are not going to be the 90% that NVIDIA’s slide claims.
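The fixed-wattage arithmetic from this example works out as follows, using the article's figures of 60fps for Turing and roughly 90fps for Ampere at the same 240W:

```python
# Fixed-wattage comparison using the figures cited in the article:
# at 240 W, Turing delivers 60 fps and Ampere roughly 90 fps.
turing_fps, ampere_fps, watts = 60, 90, 240

turing_eff = turing_fps / watts   # fps per watt
ampere_eff = ampere_fps / watts   # fps per watt

# Relative perf-per-watt improvement at the same power draw
improvement = ampere_eff / turing_eff - 1
print(f"{improvement:.0%}")  # 50%
```

Since the wattage is identical on both sides, the perf-per-watt gain reduces to the raw fps ratio, 90/60 = 1.5x, i.e. 50%.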
All of which is reflected in the TDP ratings of the new RTX 30 series cards. The RTX 3090 draws a whopping 350 watts, and even the RTX 3080 pulls 320W. If we take NVIDIA’s performance claims at face value – that the RTX 3080 offers up to 100% more performance than the RTX 2080 – then that comes with a 49% hike in power consumption, for an effective increase in performance-per-watt of just 34%. The comparison for the RTX 3090 is even harsher, with NVIDIA claiming a 50% performance increase for a 25% increase in power consumption, for a net power efficiency gain of just 20%.
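These TDP-based figures can be double-checked with the same arithmetic. The 215W baseline for the RTX 2080 is an assumption on my part (its reference TDP), but it is consistent with the article's 49% power-hike figure; the RTX 3090 numbers use the 1.5x performance and 1.25x power ratios stated above.

```python
# Perf-per-watt check against the article's TDP figures and NVIDIA's
# performance claims. The 215 W RTX 2080 baseline is assumed (its
# reference TDP), consistent with the article's "49% hike" figure.

def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """Net efficiency gain when performance and power both increase."""
    return perf_ratio / power_ratio - 1

# RTX 3080 vs RTX 2080: claimed up to 2x performance at 320/215 ~= 1.49x power
print(f"{perf_per_watt_gain(2.0, 320 / 215):.0%}")   # ~34%

# RTX 3090: claimed 1.5x performance at 1.25x power
print(f"{perf_per_watt_gain(1.5, 1.25):.0%}")        # 20%
```

In both cases the headline performance gain is largely bought with extra watts, which is the article's core point.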