GeForce Titan coming end of February


boxleitnerb

Platinum Member
Nov 1, 2011
There is always a tradeoff between power and die space. A much larger but lower-clocked GPU should always be more efficient.
 

sontin

Diamond Member
Sep 12, 2011
The only trade-off right now is power, not die size.
TSMC's 28nm allows clock rates up to 1100MHz at less than 1.2V on Kepler cards.

So the question is: what gives nVidia more performance,
a smaller chip with a much higher clock or a wider chip with a lower clock?
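To put rough numbers on that question, here is a back-of-envelope sketch, assuming dynamic power scales roughly with units × f × V² and that voltage must rise with frequency near the top of the range. Every figure below is illustrative, not an NVIDIA spec.

```python
# Back-of-envelope: dynamic power scales roughly with units * f * V^2,
# and voltage has to rise with frequency near the top of the range.
# Every number below is illustrative, not an NVIDIA spec.

configs = {
    "narrow/fast": dict(units=1.0, f=1.10, v=1.15),  # small die, high clock
    "wide/slow":   dict(units=1.6, f=0.69, v=1.00),  # big die, low clock
}

for name, c in configs.items():
    perf = c["units"] * c["f"]          # perfectly parallel workload assumed
    power = c["units"] * c["f"] * c["v"] ** 2
    print(f"{name}: perf={perf:.2f} power={power:.2f} "
          f"perf/W={perf / power:.2f} perf/area={perf / c['units']:.2f}")

# Both configs land near perf=1.10, but the wide/slow one needs ~24%
# less power while costing 60% more area: better perf/W, worse perf/mm^2.
```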
 

raghu78

Diamond Member
Aug 23, 2012
I meant you can achieve a certain goal with two different approaches. Each one has its own pros and cons regarding power and die size.

I believe Titan could have better performance/W than the GTX680, but in turn, the GTX680 should have better perf/mm².

Well said. Given that graphics is a massively parallel problem, by going wider (more CUDA cores) but slower (lower core clocks) Nvidia will gain on perf/watt but lose on perf/mm². It looks likely that Nvidia will gain 50% performance over the GTX 680 for a 30% higher TDP. Such an increase in performance is plausible.
 

f1sherman

Platinum Member
Apr 5, 2011
Well said. Given that graphics is a massively parallel problem, by going wider (more CUDA cores) but slower (lower core clocks) Nvidia will gain on perf/watt but lose on perf/mm². It looks likely that Nvidia will gain 50% performance over the GTX 680 for a 30% higher TDP. Such an increase in performance is plausible.

Finally someone who sees it my way :p
 

SirPauly

Diamond Member
Apr 28, 2009
One other thing to keep in mind is that the GTX 680 is clocked past optimal perf/watt levels. It's a mid-range die from Nvidia that had its TDP budget pushed past ideal levels to compete with the 7970.

I don't know; I actually think nVidia was conservative on TDP.
 

SirPauly

Diamond Member
Apr 28, 2009
nVidia has been touting efficiency with Kepler since the beginning; it may be the same with the potential Titan SKU.
 

Arzachel

Senior member
Apr 7, 2011
Well said. Given that graphics is a massively parallel problem, by going wider (more CUDA cores) but slower (lower core clocks) Nvidia will gain on perf/watt but lose on perf/mm². It looks likely that Nvidia will gain 50% performance over the GTX 680 for a 30% higher TDP. Such an increase in performance is plausible.

While this might seem true at first look, shaders are not the only thing affecting performance: a bigger memory bus, more ROPs, more TMUs, more cache, etc. all contribute. On the same node and architecture, SKUs get increasingly less efficient as performance rises (salvage parts skew this somewhat, but it's generally true). I'd say 50% more performance for a 50% increase in actual power draw on the same node would be crazy good; 50% faster for 65-70% more power draw is far more likely, and still pretty good.
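For concreteness, here is what the two estimates above imply for perf/W relative to the GTX 680 (a quick sketch; the percentages are the posters' own guesses, not measurements):

```python
# Comparing the two estimates above in perf/W terms (illustrative only):
scenarios = {
    "raghu78: +50% perf, +30% TDP":    (1.50, 1.30),
    "Arzachel: +50% perf, +65% power": (1.50, 1.65),
    "Arzachel: +50% perf, +70% power": (1.50, 1.70),
}
for label, (perf, power) in scenarios.items():
    print(f"{label}: perf/W vs GTX 680 = {perf / power:.2f}x")

# ~1.15x, ~0.91x, ~0.88x respectively: the first implies Titan improves
# perf/W over GK104; the latter two imply it regresses slightly.
```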
 

boxleitnerb

Platinum Member
Nov 1, 2011
I don't know; I actually think nVidia was conservative on TDP.

The GTX680 is already bandwidth limited compared to the GTX670 (that is why they are so close). A 1536 ALU card with 10-15% lower clocks would be quite a bit more efficient. I think Nvidia had this target in mind originally, but had to increase clocks to match Tahiti.
 

tviceman

Diamond Member
Mar 25, 2008
The GTX680 is already bandwidth limited compared to the GTX670 (that is why they are so close). A 1536 ALU card with 10-15% lower clocks would be quite a bit more efficient. I think Nvidia had this target in mind originally, but had to increase clocks to match Tahiti.

They may have had target clocks in mind before samples came back, but they can't simply adjust clocks to match the competition. Not to a substantial degree anyway, and definitely not without detrimental consequences for power draw. Yields dictate the vast majority of a chip's final clocks.
 

boxleitnerb

Platinum Member
Nov 1, 2011
True. As far as I know, Nvidia clocked GK104 about 10% higher than they originally planned. 10% is certainly within reason.
 

Lepton87

Platinum Member
Jul 28, 2009
Seems like this card will be limited by memory bandwidth, just like GK104. It has roughly the same memory bandwidth as Tahiti despite having far more processing power. GPGPU applications don't benefit as much from memory bandwidth as graphics rendering does, which is why NV didn't go with a wider 512-bit bus, or even 448-bit. Sacrificing some shader performance for another memory controller would have made the card much faster in games, but it's clear that games are not the primary focus of GK110. On top of that, a wider memory bus would have made the PCB more complex. That's why I find claims that it will be 2x faster than the GTX680 absurd. I think it will be 40-60% faster than the GTX680 depending on the game. In games that use compute extensively it could be a lot faster than that, but AFAIK we don't have such games just yet.

One way to increase its performance without increasing its TDP would be to make the turbo boost much larger percentage-wise. Clearly the TDP must account for stressing both SP and DP shaders, unless they artificially limit its DP performance, which doesn't seem far-fetched. Games don't need DP shaders at all. It would even make some sense aside from product segmentation: it could let them raise the base clock within the same TDP without resorting to a large, unpredictable turbo boost.
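A toy model of that idea, with every number invented for illustration (none is a known GK110 figure):

```python
# Toy model: TDP must cover the worst-case load. If uncapped DP stress
# sets the worst case, rate-limiting DP lowers that ceiling and frees
# headroom for a higher base clock. All numbers are invented.

TDP = 235.0           # W, hypothetical board limit
WATTS_PER_MHZ = 0.22  # crude linear power model for game (SP) loads

dp_stress = {"DP uncapped": 1.25,  # DP stress draws ~25% more than games
             "DP capped":   1.05}  # rate-limited DP: near game-level draw

for label, factor in dp_stress.items():
    # The base clock must keep even the worst-case load inside the TDP:
    f_max = TDP / (WATTS_PER_MHZ * factor)
    print(f"{label}: max base clock ~{f_max:.0f} MHz")

# ~855 MHz uncapped vs ~1017 MHz capped: capping DP raises the base
# clock within the same TDP, without a large unpredictable turbo range.
```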
 

MrK6

Diamond Member
Aug 9, 2004
Seems like this card will be limited by memory bandwidth, just like GK104. It has roughly the same memory bandwidth as Tahiti despite having far more processing power. GPGPU applications don't benefit as much from memory bandwidth as graphics rendering does, which is why NV didn't go with a wider 512-bit bus, or even 448-bit. Sacrificing some shader performance for another memory controller would have made the card much faster in games, but it's clear that games are not the primary focus of GK110. On top of that, a wider memory bus would have made the PCB more complex. That's why I find claims that it will be 2x faster than the GTX680 absurd. I think it will be 40-60% faster than the GTX680 depending on the game. In games that use compute extensively it could be a lot faster than that, but AFAIK we don't have such games just yet.

One way to increase its performance without increasing its TDP would be to make the turbo boost much larger percentage-wise. Clearly the TDP must account for stressing both SP and DP shaders, unless they artificially limit its DP performance, which doesn't seem far-fetched. Games don't need DP shaders at all. It would even make some sense aside from product segmentation: it could let them raise the base clock within the same TDP without resorting to a large, unpredictable turbo boost.
Considering most people play at 1080p or less, this is a sound strategy and probably what they're hoping for. It works against Nvidia's last few card generations that many review sites test at higher resolutions, as that's where AMD's cards seem to have the advantage. I'm not sure what in particular about Nvidia's designs makes them perform better at lower resolutions (maybe it's the extra ROPs, more efficient loading of the cores, or something else), but it's a trend I've noticed for a while now.

That said, I agree that memory bandwidth will be a major factor at 1440p+, which is becoming more popular among enthusiasts thanks to the availability of the Korean monitors. If they do keep the clocks low to favor TDP targets but keep the shaders active to hit performance targets, that's a very favorable setup for overclockers, as long as we're given easy access to overvolting options.
 

RussianSensation

Elite Member
Sep 5, 2003
I think it will be 40-60% faster than the GTX680 depending on the game. In games that use compute extensively it could be a lot faster than that, but AFAIK we don't have such games just yet.

Well, there are 4 compute-heavy titles out (Sleeping Dogs, Dirt Showdown, Sniper Elite V2, Hitman Absolution). HD7970 OC is >50% faster than the GTX680 in Sleeping Dogs. Dirt Showdown is a blowout. Bioshock Infinite may or may not be compute-heavy, but the slide for the game shows DirectCompute is used. So there are some compute-heavy games, but only a very small number.

With 2688 SPs @ 925MHz and 288GB/sec (6GHz GDDR5), GK110 would have about 53% more shader/pixel fillrate/memory bandwidth and 64% more texture fillrate than the 680.
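The shader and bandwidth ratios can be checked directly (a quick sketch; the GTX 680 comparison figures of 1536 SPs at a ~1058MHz boost clock and ~192GB/s are assumed reference specs, not from the post):

```python
# Reproducing the ratio math above. The GK110 figures are the rumored
# ones from the post; the GTX 680 figures are assumed reference specs.
gk110 = dict(sps=2688, mhz=925, bw=288.0)
gk104 = dict(sps=1536, mhz=1058, bw=192.3)

shader = (gk110["sps"] * gk110["mhz"]) / (gk104["sps"] * gk104["mhz"])
bandwidth = gk110["bw"] / gk104["bw"]
print(f"shader throughput: +{(shader - 1) * 100:.0f}%")     # ~+53%
print(f"memory bandwidth:  +{(bandwidth - 1) * 100:.0f}%")  # ~+50%
```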

Using Balla's suggestion of the GTX670 as a better indicator of performance/watt for GK104 than the factory-overclocked 680: the 670 draws about 152W peak in demanding games.

I am going to take a wild stab and say the Titan has 905-925MHz GPU clocks with 235W power consumption in the same game. But then I can't reconcile how a 550mm² chip with 6GB of VRAM can hit 925MHz on the 28nm node when the 365mm², 925MHz 7970 used 189W with 3GB of GDDR5. Could the 28nm node have matured that much?
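Putting the poster's own numbers side by side shows why the guess is hard to reconcile (the 235W figure is the wild stab above; the 7970 figures are as quoted):

```python
# Quantifying the puzzle above (235W for Titan is the poster's guess;
# the 7970 figures are as quoted in the post).
titan = dict(watts=235.0, area_mm2=550.0)   # guessed
hd7970 = dict(watts=189.0, area_mm2=365.0)  # quoted

for name, c in (("Titan (guess)", titan), ("HD 7970", hd7970)):
    print(f"{name}: {c['watts'] / c['area_mm2']:.2f} W/mm^2")

# ~0.43 vs ~0.52 W/mm^2: the guess implies ~17% lower power density at
# the same 925MHz, which is hard to reconcile without assuming the
# 28nm process (or the design) improved substantially.
```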
 

tviceman

Diamond Member
Mar 25, 2008
Well, there are 4 compute-heavy titles out (Sleeping Dogs, Dirt Showdown, Sniper Elite V2, Hitman Absolution). HD7970 OC is >50% faster than the GTX680 in Sleeping Dogs.

If AA is turned to low, is Sleeping Dogs still 50% faster?

EDIT: I just ran the internal benchmark with an OC'd GTX 670 @ 1440p. Exact frame rates were: 82.5 average, 109.8 max, 50.0 minimum with AA on low; 23.5 average, 29.0 maximum, 15.7 minimum with AA maxed out. I don't think compute is what is killing Kepler in that game. Pretty sure memory bandwidth is what is killing frame rates on Kepler in Sleeping Dogs.
 

sontin

Diamond Member
Sep 12, 2011
RussianSensation has no clue. He thinks that OGSSAA is the same as compute (DirectCompute). Or that MSAA is the same as compute (DirectCompute). And he thinks it's a new scientific breakthrough that a card with 32% more compute performance is actually faster...
 

tviceman

Diamond Member
Mar 25, 2008
I am going to take a wild stab and say the Titan has 905-925MHz GPU clocks with 235W power consumption in the same game. But then I can't reconcile how a 550mm² chip with 6GB of VRAM can hit 925MHz on the 28nm node when the 365mm², 925MHz 7970 used 189W with 3GB of GDDR5. Could the 28nm node have matured that much?

http://www.techpowerup.com/reviews/HIS/Radeon_HD_6970/27.html

Not really all that different between Cayman and GF110, and GF110 was only a little better than the GTX480 (in other words, still not very efficient).
 

RussianSensation

Elite Member
Sep 5, 2003
RussianSensation has no clue. He thinks that OGSSAA is the same as compute (DirectCompute). Or that MSAA is the same as compute (DirectCompute). And he thinks it's a new scientific breakthrough that a card with 32% more compute performance is actually faster...

What are you on about? Where did you get that the HD7970 GE is 32% faster in DirectCompute / compute shader performance than the GTX680? Do not directly mix single-precision floating point with DirectCompute performance. It doesn't work like that.

Comparing the GFLOPs, memory bandwidth, and other theoretical parameters of the HD7870 vs. the GTX670, and then looking at Dirt Showdown benchmarks, shows how flawed your entire argument is.

[chart: Dirt Showdown benchmark]


The HD7870 has less memory bandwidth, less texture fillrate, fewer GFLOPs (because the GTX670 effectively boosts over 915MHz in games), the same 2GB of VRAM as the 670, and barely more pixel fillrate, yet it outperforms the 670 by 28% in Dirt Showdown at 2560x1600 with 8xAA.

I suggest you head over to this thread and start doing some research/reading. boxleitnerb was open-minded about it, but you seem to be stuck in Denial Land, insisting that GK104 doesn't have an issue with DirectCompute / compute shaders. There I explain it in detail using examples from several compute games, and show why memory bandwidth and GFLOPS alone do not explain the discrepancies, and why GK104/VLIW architectures have issues in compute-shader-heavy titles compared to GCN parts. The mathematics, discussion of what DirectCompute means for games, graphs: it's all there. You should read up on how the GCN architecture actually works to understand what was so special about its redesign for compute. It appears most people didn't read that article, because to them it's still shocking that GCN Tahiti XT is far more advanced than GK104/VLIW architectures for compute shaders.

Don't get your panties in a bunch when a stock HD7970GE destroys the GTX680 in 3 of the 4 compute games I listed and obliterates it in Dirt Showdown (which just happens to be the most compute-heavy title this generation).

Also, in case you didn't notice, I said HD7970 OC. The 1050MHz 7970 already has a 30%+ lead, and piling on more stream processor performance (1330MHz) will extend the lead to >50% in Sleeping Dogs. And guess what the stream processors in a Tahiti XT CU can do? Perform DirectCompute work faster. You keep using GFLOPs and stream processor performance interchangeably because they happen to align in that one game by accident. The GFLOPs math completely fails in Sniper Elite V2 and Dirt Showdown, which throws the GFLOPs theory right out the window (just compare the GTX580 to the 680 in Sniper Elite V2, or the HD6970 vs. the 7870).

How about this:

Care to explain how a GTX690 has vastly superior theoretical performance in every possible metric, including memory bandwidth and GFLOPs, compared to a single HD7970GE, but gets obliterated in Sleeping Dogs?

And please don't say it's VRAM-bottlenecked, because an HD7850 2GB is still beating it. Also don't say SLI scaling doesn't work, because it's at 77% compared to a single 680.

[chart: Sleeping Dogs 5760x1080 benchmark]


Even if GK110 can match or beat the HD7970GE in some of those DirectCompute-heavy titles, per mm² GCN is still going to be way more efficient, which means a 550mm² Tahiti XT would crush GK110 in every single one of those compute titles as well.

There is not a single game released so far that incorporates heavy use of compute shaders in which the GTX680 beats the HD7970GHz. Known memory bandwidth hogs are Metro 2033 and Aliens vs. Predator. Don't start mixing and matching different game engines to prove a point.

Admit the facts: GK104 is inferior for DirectCompute / compute shaders, and move on. NV caught AMD with its pants down on tessellation for 2 generations, and NV got caught with its pants down on compute.

Keep digging in your arsenal of excuses for why GK104 is not a garbage chip for DirectCompute. You might as well spend a month preparing a logical rebuttal.

===========================

DirectCompute is off-topic anyway. If you want to discuss it, head over to the relevant thread. Let's keep the topic to Titan here.
 

sontin

Diamond Member
Sep 12, 2011
What are you on about? Where did you get that the HD7970 GE is 32% faster in compute than the GTX680? Single-precision floating point != DirectCompute performance. It doesn't work like that. I suggest you head over to this thread and do some research/reading. boxleitnerb was open-minded about it, but you seem to be stuck in Denial Land, insisting that GK104 doesn't have an issue with DirectCompute / compute shaders.

I read it and I thought the exact same thing.

I explain it in detail using examples from several compute games, and show why memory bandwidth and GFLOPS alone do not explain the discrepancies, and why GK104/VLIW architectures have issues in compute-shader-heavy titles compared to GCN parts. The mathematics, discussion of what DirectCompute means for games, graphs: it's all there. You should read up on how the GCN architecture actually works to understand what was so special about its redesign for compute. It appears most people didn't read that article, because to them it's still shocking that GCN Tahiti XT is far more advanced than GK104/VLIW architectures for compute shaders.
Tahiti XT has between 25% and 40% more compute performance than a GTX680.
Maybe I should use a car comparison to show that, in theory, a car with more hp is faster than a car with less...

Don't get your panties in a bunch when a stock HD7970GE destroys the GTX680 in 3 of the 4 compute games I listed and obliterates it in Dirt Showdown (which just happens to be the most compute-heavy title this generation).
Right, and in the 15 others it is only a few % faster. But I guess those are not "compute games". BTW: why are Sleeping Dogs, Hitman and Sniper Elite V2 "compute-heavy titles"? Crysis 2 looks much better than Sniper Elite V2...

Also, in case you didn't notice, I said HD7970 OC. The 1050MHz 7970 already has a 30%+ lead, and piling on more shader performance (1330MHz) will extend the lead to >50% in Sleeping Dogs, in cases outside of MSAA and 1600p too.
Yeah, MSAA in Sleeping Dogs. :rolleyes:

Care to explain how a GTX690 has vastly superior theoretical performance in every possible metric, including memory bandwidth and GFLOPs, compared to a single HD7970GE, but gets obliterated in Sleeping Dogs?

And please don't say it's VRAM-bottlenecked, because an HD7850 2GB is still beating it. Also don't say SLI scaling doesn't work, because it's at 77% compared to a single 680.

[chart: Sleeping Dogs 5760x1080 benchmark]
These numbers are not real. Wizzard made a mistake.
Look at these:
[chart: Sleeping Dogs 2560x1600 benchmark]


I hope you find the difference...

There is not a single game released so far that incorporates heavy use of compute shaders in which the GTX680 beats the HD7970GHz. Known memory bandwidth hogs are Metro 2033 and Aliens vs. Predator. Don't start mixing and matching different game engines to prove a point.
Yeah, and a Golf with 140PS is not faster than my BMW 120d with 177PS. I guess my BMW 120d is a wonder of a car, right...

Admit the facts: GK104 is inferior for DirectCompute / compute shaders, and move on. NV caught AMD with its pants down on tessellation for 2 generations, and NV got caught with its pants down on compute.
Lol, "pants down"? You mean like AMD with the 7970 for $549 which got beaten by the GTX680 with the inferior compute performance?

Comparing the GFLOPs, memory bandwidth, and other theoretical parameters of the HD7870 vs. the GTX670, and then looking at Dirt Showdown benchmarks, shows how flawed your entire argument is.

[chart: Dirt Showdown benchmark]


Keep digging in your arsenal of excuses for why GK104 is not a garbage chip for DirectCompute. You might as well spend a month preparing a rebuttal.
Wow, now we're using Showdown to show that GK104 is a "garbage chip for DirectCompute"?
Man, here is Assassin's Creed III: http://ht4u.net/reviews/2013/50_directx_11_grafikkarten_im_test/index15.php

The GTX680 is 2x faster than the 7870.
Keep digging in your arsenal of excuses for why [GCN] is not a garbage chip for DirectCompute. You might as well spend a month preparing a rebuttal.
 

RussianSensation

Elite Member
Sep 5, 2003
EDIT: I just ran the internal benchmark with an OC'd GTX 670 @ 1440p. Exact frame rates were: 82.5 average, 109.8 max, 50.0 minimum with AA on low; 23.5 average, 29.0 maximum, 15.7 minimum with AA maxed out. I don't think compute is what is killing Kepler in that game. Pretty sure memory bandwidth is what is killing frame rates on Kepler in Sleeping Dogs.

AA and compute are related in those games. Some of the AA is calculated using compute shaders. Same story with Dirt Showdown's forward+ AA. That's the whole point: AMD is using compute shaders to perform certain graphical functions faster. SSAA in Sleeping Dogs is one of those. It says so on the AMD blog.

It's not memory bandwidth, since an HD7850 can outperform the GTX680 in Sleeping Dogs.
 

notty22

Diamond Member
Jan 1, 2010
Wait, we can't find a good thing to say about Crysis 2-3, and now we are showcasing a game because it has lopsided performance results?
Sleeping Dogs PC review: Grab a rolled up newspaper



It only took about 2 hours of playing this sloppy PC port before I sent it to the doghouse.
Severely sloppy execution in numerous areas made me want to take a rolled up newspaper to Sleeping Dogs. The entirety of the first two hours I played was punctuated with sloppy programming, poor execution, and inexcusable bugs.
Once I’m in the game and switch it to full screen mode, however, I immediately notice that the user interface is a crappy console port that sort of lets you use the mouse for some functions (like clicking on buttons), but also insists that you navigate with the keyboard as well.
But my experience with the game suggests it's a lousy afterthought of a console port. Maybe it's a great game on the consoles. I'd recommend you try it there if you must play it. But on the PC...
Sleeping Dogs (PC) is Chinese water torture in gaming form.
 

RussianSensation

Elite Member
Sep 5, 2003
Tahiti XT has between 25% and 40% more compute performance than a GTX680. Maybe I should use a car comparison to show that, in theory, a car with more hp is faster than a car with less...

You are hopeless. You have no clue about the difference between single-precision compute and DirectCompute / compute shaders. I don't think you even understand what a compute shader is. You keep talking about single-precision performance. Single-precision compute != DirectCompute shader performance. That argument can be dismissed:

GTX580 1.58 TFLOPs vs. GTX680 3.09 TFLOPs
HD6970 2.7 TFLOPs vs. HD7870 2.56 TFLOPs
HD7970GE 4.3 TFLOPs vs. HD6970 2.7 TFLOPs

[chart: Sleeping Dogs 2560x1600 benchmark]


Now calculate the performance gains of those GPUs against each other. Your argument has already failed. None of the math adds up. The missing link: compute shader performance.
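The ratio arithmetic behind that challenge, using the TFLOPS figures as quoted (the game results themselves are in the chart above):

```python
# Theoretical SP-throughput ratios for the pairs above (TFLOPS figures
# as quoted in the post):
pairs = {
    "GTX 680 vs GTX 580":    (3.09, 1.58),
    "HD 7870 vs HD 6970":    (2.56, 2.70),
    "HD 7970 GE vs HD 6970": (4.30, 2.70),
}
for label, (a, b) in pairs.items():
    print(f"{label}: {a / b:.2f}x theoretical SP throughput")

# ~1.96x, ~0.95x, ~1.59x. If raw GFLOPS decided Sleeping Dogs, the 680
# would nearly double the 580 and the 7870 would trail the 6970; the
# benchmark chart shows neither, which is the point being made.
```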

Why are Sleeping Dogs, Hitman and Sniper Elite V2 "compute-heavy titles"? Crysis 2 looks much better than Sniper Elite V2...

So now you are debating the efficiency of games vs. their graphics? What DirectCompute features does Crysis 2 have? I am all ears. Global illumination, contact-hardening shadows? Let's go, name them. That's like me dismissing tessellation because Crysis 1 looks better than Batman: AC without it. It changes nothing about the discussion of the actual feature.

Yeah, MSAA in Sleeping Dogs. :rolleyes:

Performance is accelerated because AMD leveraged compute shaders in the game engine. :rolleyes:

These numbers are not real. Wizzard made a mistake. I hope you find the difference...

Xbitlabs has the HD7870 beating the GTX670 no problem in Dirt Showdown, a heavy compute title.

Lol, "pants down"? You mean like AMD with the 7970 for $549 which got beaten by the GTX680 with the inferior compute performance?

Irrelevant to the discussion of DirectCompute. That's like me downplaying the tessellation advantage of the GTX480 by focusing on its launch 6 months later, or its price against the 5870. What does that have to do with Fermi's tessellation advantage over Cypress/Cayman? Nothing. Dropping a red herring, I see.

Wow, now we're using Showdown to show that GK104 is a "garbage chip for DirectCompute"?
Man, here is Assassin's Creed III:
The GTX680 is 2x faster than the 7870.

What DirectCompute features does AC3 have? I am all ears. Global illumination, contact-hardening shadows? Let's go, name them.

While you're at it, please explain why an i7 3770K and a GTX690 experience sporadic FPS drops to 20-30 fps in some parts of that game. AC3 is a shoddy console port, one of the worst-coded games of last year. Again, that has nothing to do with DirectCompute though.

===========================

Wait, we can't find a good thing to say about Crysis 2-3, and now we are showcasing a game because it has lopsided performance results?

Gameplay, level design, AI, etc. are all irrelevant if Sleeping Dogs is a bad game. The discussion started regarding games that use DirectCompute for graphics. Now if you think the game is a poorly coded turd, that's a completely different argument. GCN parts still dominate VLIW and GK104 in the same poorly coded game. So in essence it doesn't change the conclusion that GK104 is slower in DirectCompute. 3 more games prove this.

Bioshock Infinite could be a 5th. How many more compute-heavy games need to come out with the HD7970GE beating the 680? 10, 15, 25?

==========================

DirectCompute is off-topic anyway. If you want to discuss it, head over to the relevant thread. Let's keep the topic to Titan here. It's only fair to stop derailing this thread.
 

RussianSensation

Elite Member
Sep 5, 2003
What explains the 7870 or 7850 dominating GK104 [in Dirt Showdown]?

Apparently the HD7850/7870's GFLOPs, texture fillrate, pixel fillrate, VRAM, GPU clock speed, tessellation, and memory bandwidth advantages over the GTX680........oh wait. [Clue: contact-hardening shadows, global illumination, and HDAO/AO are accelerated using DirectCompute in that game.]
 