I think Nvidia's architecture naming runs in alphabetical order: Fermi < Kepler < Maxwell < Pascal < Turing < Volta. So a consumer Volta might still be coming.
But...
Ryan Smith said:
For complete details on the Turing architecture, please see our companion article. But in short Turing is an evolution of the Volta architecture, taking everything that made the GV100 fast, and then improving on it.
On the other hand, all of the new features for neural network acceleration and raytracing acceleration are irrelevant to existing Distributed Computing applications, which are FP32-centric. (FP64 projects, notably Milkyway, are the rare exceptions, but they do not benefit from those features either.)
I am aware of one change in Volta over Pascal which affects FP32, but I don't quite understand whether or not existing applications benefit from it: the CUDA cores (re-?)gained individual program counters and stacks, allowing for finer-grained thread scheduling. I have only briefly looked at the articles on Turing so far, and am not sure whether this update of Volta was carried over to Turing.
Edit, also, L1 and L2 caches in Volta were tweaked, but the corresponding details for Turing are not yet published.
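To make the per-thread program counter point above more concrete, here is a minimal sketch, a toy kernel of my own rather than anything taken from a DC application, of an intra-warp handoff that can hang on pre-Volta GPUs but is guaranteed to make progress under Volta-style independent thread scheduling:

    #include <cstdio>

    // Toy kernel: even lanes spin-wait on a flag that only the odd lanes of the
    // same warp set. Pre-Volta, the warp executes one side of a divergent branch
    // at a time; if the spinning side happens to be scheduled first, the odd
    // lanes never run and the kernel can hang. With Volta/Turing independent
    // thread scheduling, every lane has its own program counter and stack, so
    // the odd lanes still make progress and the kernel finishes.
    __global__ void intra_warp_handoff(volatile int *flag)
    {
        if (threadIdx.x % 2 == 0) {
            while (*flag == 0) { /* spin until an odd lane publishes the flag */ }
        } else {
            *flag = 1;
        }
    }

    int main()
    {
        int *flag = nullptr;
        cudaMalloc((void **)&flag, sizeof(int));
        cudaMemset(flag, 0, sizeof(int));

        intra_warp_handoff<<<1, 32>>>(flag);   // a single warp is enough
        cudaDeviceSynchronize();

        printf("kernel returned\n");           // may never print on pre-Volta GPUs
        cudaFree(flag);
        return 0;
    }

Whether any of the existing FP32 DC applications actually contain handoffs like this, and therefore gain anything from the change, is exactly what I am unsure about.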
There was an arguably small step up in process technology from TSMC 16 nm FinFET (Pascal) to TSMC 12 nm FFN (Volta and Turing), which promises somewhat increased performance per Watt, though not as much as the step from TSMC 28 nm (Maxwell) to Pascal, AFAIU.
Looking at specs that are relevant to FP32 GPGPU computing:
1070 vs. 2070
150 W : 175 W (1 : 1.167)
1920 shaders : 2304 shaders (1 : 1.200)
The shader count was increased a little bit more than the power target.
This is good for performance as well as for perf/Watt, at least in workloads which are able to utilize all shaders.
1080 vs. 2080
180 W : 215 W (1 : 1.194)
2560 shaders : 2944 shaders (1 : 1.150)
The shader count was not increased as much as the power target.
While performance should go up, this is a bad sign for perf/Watt.
1080Ti vs. 2080Ti
250 W : 250 W (1 : 1.000)
3584 shaders : 4352 shaders (1 : 1.214)
The shader count was increased, but not the power target.
Good for perf/Watt and for performance, at least in workloads which are able to utilize all shaders.
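To put rough numbers on "good/bad for perf/Watt", one can divide the shader-count ratio by the power-target ratio. This is only a crude proxy, it ignores clocks, IPC and memory changes, and it simply re-uses the spec-sheet values quoted above:

    #include <cstdio>

    // Crude perf/Watt proxy: shader-count ratio divided by power-target ratio,
    // everything else (clocks, IPC, memory) assumed equal.
    struct Pair { const char *name; int shaders_old, shaders_new, watts_old, watts_new; };

    int main()
    {
        const Pair pairs[] = {
            { "1070 -> 2070",     1920, 2304, 150, 175 },
            { "1080 -> 2080",     2560, 2944, 180, 215 },
            { "1080Ti -> 2080Ti", 3584, 4352, 250, 250 },
        };

        for (const Pair &p : pairs) {
            double shader_ratio = (double)p.shaders_new / p.shaders_old;
            double power_ratio  = (double)p.watts_new  / p.watts_old;
            printf("%-18s  shaders x%.3f, power x%.3f, perf/Watt proxy x%.2f\n",
                   p.name, shader_ratio, power_ratio, shader_ratio / power_ratio);
        }
        return 0;
    }

This prints roughly x1.03 for the 2070, x0.96 for the 2080 and x1.21 for the 2080Ti, i.e. the same qualitative picture as above.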
(Note, there are plenty of Distributed Computing applications which are not able to utilize all shaders out of the box. In other words, they do not scale well to GPUs with higher shader counts. In Folding@home, this can be partially fixed by switching from Windows to Linux. In BOINC, there are fixes like running two or more jobs on the same GPU at once, giving arcane, application-specific command line arguments in app_config, or finding optimized applications from third parties.)
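For the "two or more jobs on the same GPU" workaround, the usual mechanism is an app_config.xml in the project's directory. A minimal sketch, using SETI@home's setiathome_v8 application name purely as an example (substitute the application name of the project you actually run):

    <app_config>
      <app>
        <name>setiathome_v8</name>
        <gpu_versions>
          <!-- 0.5 tells the BOINC client that one task occupies half a GPU,
               so it schedules two such tasks per GPU at the same time. -->
          <gpu_usage>0.5</gpu_usage>
          <cpu_usage>1.0</cpu_usage>
        </gpu_versions>
      </app>
    </app_config>

Then have the client re-read its config files (or restart it). Whether two concurrent tasks actually help depends on the application; it mainly pays off when a single task cannot keep all shaders busy.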
Edit,
Only two things are potentially improved with the 2000 series: RAM speed and IPC. The 2070 and 2080 have 14 Gbps GDDR6, while the 1080 only has 10 Gbps GDDR5X, all on a 256-bit bus. But how many projects are VRAM-limited? Maybe, I'm guessing, Folding@home and PrimeGrid GFN?
Good point. I am seeing appreciable memory controller utilization in SETI@home/cuda90 as well. On average it is not as high as the shader utilization, but there are occasional peaks, and having headroom for those peaks may help overall throughput a little bit.
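For reference, the raw bandwidth behind that quote is easy to work out from per-pin data rate and bus width (a back-of-the-envelope check only, ignoring real-world efficiency):

    #include <cstdio>

    // Peak memory bandwidth = per-pin data rate (Gbit/s) * bus width (bits) / 8 bits per byte.
    int main()
    {
        const double bus_bits = 256.0;
        printf("GTX 1080,    10 Gbps GDDR5X: %.0f GB/s\n", 10.0 * bus_bits / 8.0);  // 320 GB/s
        printf("RTX 2070/80, 14 Gbps GDDR6:  %.0f GB/s\n", 14.0 * bus_bits / 8.0);  // 448 GB/s
        return 0;
    }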