I think Nvidia's architecture naming runs in alphabetical order: Fermi < Kepler < Maxwell < Pascal < Turing < Volta. So a consumer Volta might still be coming.
But...
Ryan Smith said:
For complete details on the Turing architecture, please see our companion article. But in short Turing is an evolution of the Volta architecture, taking everything that made the GV100 fast, and then improving on it.
On the other hand, all of the new features for neural network acceleration and raytracing acceleration are irrelevant to existing Distributed Computing applications, which are FP32-centric. (FP64 projects, notably Milkyway, are the rare exceptions, but they do not benefit from those features either.)
I am aware of one change in Volta over Pascal which affects FP32, but I don't quite understand whether or not existing applications benefit from it: the CUDA cores (re-?)gained individual program counters and stacks, allowing for finer-grained thread scheduling. I have only briefly looked at the articles on Turing so far, and am not sure whether this update of Volta was carried over to Turing.
Edit, also, L1 and L2 caches in Volta were tweaked, but the corresponding details for Turing are not yet published.
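To make the per-thread program counter point above more concrete, here is a minimal sketch, a toy kernel of my own rather than anything taken from a DC application, of an intra-warp handoff that can hang on pre-Volta GPUs but is guaranteed to make progress under Volta-style independent thread scheduling:

    #include <cstdio>

    // Toy kernel: even lanes spin-wait on a flag that only the odd lanes of the
    // same warp set. Pre-Volta, the warp executes one side of a divergent branch
    // at a time; if the spinning side happens to be scheduled first, the odd
    // lanes never run and the kernel can hang. With Volta/Turing independent
    // thread scheduling, every lane has its own program counter and stack, so
    // the odd lanes still make progress and the kernel finishes.
    __global__ void intra_warp_handoff(volatile int *flag)
    {
        if (threadIdx.x % 2 == 0) {
            while (*flag == 0) { /* spin until an odd lane publishes the flag */ }
        } else {
            *flag = 1;
        }
    }

    int main()
    {
        int *flag = nullptr;
        cudaMalloc((void **)&flag, sizeof(int));
        cudaMemset(flag, 0, sizeof(int));

        intra_warp_handoff<<<1, 32>>>(flag);   // a single warp is enough
        cudaDeviceSynchronize();

        printf("kernel returned\n");           // may never print on pre-Volta GPUs
        cudaFree(flag);
        return 0;
    }

Whether any of the existing FP32 DC applications actually contain handoffs like this, and therefore gain anything from the change, is exactly what I am unsure about.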
There was an arguably small step up in process technology from TSMC 16 nm FinFET (Pascal) to TSMC 12 nm FFN (Volta and Turing), which promises somewhat increased performance per Watt, though not as much as the step from TSMC 28 nm (Maxwell) to Pascal, AFAIU.
Looking at specs that are relevant to FP32 GPGPU computing:
1070 vs. 2070
150 W : 175 W (1 : 1.167)
1920 shaders : 2304 shaders (1 : 1.200)
The shader count was increased a little bit more than the power target.
This is good for performance as well as for perf/Watt, at least in workloads which are able to utilize all shaders.
1080 vs. 2080
180 W : 215 W (1 : 1.194)
2560 shaders : 2944 shaders (1 : 1.150)
The shader count was not increased as much as the power target.
While performance should go up, this is a bad sign for perf/Watt.
1080Ti vs. 2080Ti
250 W : 250 W (1 : 1.000)
3584 shaders : 4352 shaders (1 : 1.214)
The shader count was increased, but not the power target.
Good for perf/Watt and for performance, at least in workloads which are able to utilize all shaders.
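To put rough numbers on "good/bad for perf/Watt", one can divide the shader-count ratio by the power-target ratio. This is only a crude proxy, it ignores clocks, IPC and memory changes, and it simply re-uses the spec-sheet values quoted above:

    #include <cstdio>

    // Crude perf/Watt proxy: shader-count ratio divided by power-target ratio,
    // everything else (clocks, IPC, memory) assumed equal.
    struct Pair { const char *name; int shaders_old, shaders_new, watts_old, watts_new; };

    int main()
    {
        const Pair pairs[] = {
            { "1070 -> 2070",     1920, 2304, 150, 175 },
            { "1080 -> 2080",     2560, 2944, 180, 215 },
            { "1080Ti -> 2080Ti", 3584, 4352, 250, 250 },
        };

        for (const Pair &p : pairs) {
            double shader_ratio = (double)p.shaders_new / p.shaders_old;
            double power_ratio  = (double)p.watts_new  / p.watts_old;
            printf("%-18s  shaders x%.3f, power x%.3f, perf/Watt proxy x%.2f\n",
                   p.name, shader_ratio, power_ratio, shader_ratio / power_ratio);
        }
        return 0;
    }

This prints roughly x1.03 for the 2070, x0.96 for the 2080 and x1.21 for the 2080Ti, i.e. the same qualitative picture as above.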
(Note, there are plenty of Distributed Computing applications which are not able to utilize all shaders out of the box. In other words, they do not scale well to GPUs with higher shader counts. In Folding@home, this can be partially fixed by switching from Windows to Linux. In BOINC, there are fixes like running two or more jobs on the same GPU at once, giving arcane, application-specific command line arguments in app_config, or finding optimized applications from third parties.)
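For the "two or more jobs on the same GPU" workaround, the usual mechanism is an app_config.xml in the project's directory. A minimal sketch, using SETI@home's setiathome_v8 application name purely as an example (substitute the application name of the project you actually run):

    <app_config>
      <app>
        <name>setiathome_v8</name>
        <gpu_versions>
          <!-- 0.5 tells the BOINC client that one task occupies half a GPU,
               so it schedules two such tasks per GPU at the same time. -->
          <gpu_usage>0.5</gpu_usage>
          <cpu_usage>1.0</cpu_usage>
        </gpu_versions>
      </app>
    </app_config>

Then have the client re-read its config files (or restart it). Whether two concurrent tasks actually help depends on the application; it mainly pays off when a single task cannot keep all shaders busy.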
Edit,
Only two things are potentially improved with the 2000 series: RAM speed and IPC. The 2070 and 2080 have 14 Gbps GDDR6, while the 1080 only has 10 Gbps GDDR5X, all on a 256-bit bus. But how many projects are VRAM-limited? Maybe, I'm guessing, Folding@home and PrimeGrid GFN?
Good point. I am seeing appreciable memory controller utilization in SETI@home/cuda90 as well. On average it is not as high as the shader utilization, but there are occasional peaks, and having headroom for those peaks may help overall throughput a little bit.
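For reference, the raw bandwidth behind that quote is easy to work out from per-pin data rate and bus width (a back-of-the-envelope check only, ignoring real-world efficiency):

    #include <cstdio>

    // Peak memory bandwidth = per-pin data rate (Gbit/s) * bus width (bits) / 8 bits per byte.
    int main()
    {
        const double bus_bits = 256.0;
        printf("GTX 1080,    10 Gbps GDDR5X: %.0f GB/s\n", 10.0 * bus_bits / 8.0);  // 320 GB/s
        printf("RTX 2070/80, 14 Gbps GDDR6:  %.0f GB/s\n", 14.0 * bus_bits / 8.0);  // 448 GB/s
        return 0;
    }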