GK110--when do you think it will be released?


toyota

Lifer
Apr 15, 2001
12,957
1
0
Well, reading the thread again, RussianSensation and ViRGE have already suggested a possible GK114 (quite thoroughly); I largely missed it. I suppose it's possible for it to have 9 or 10, or even 12, clusters with a 320-bit memory interface. (How much memory? 2GB or 3GB? Maybe 2.5GB?) That sounds like quite a bit of work, though.
The way Nvidia names things, GK114 would just be an improved GK104 chip. Since nothing is locked off on GK104, all the specs such as cores, TMUs, ROPs and bus width would be unchanged; you can't just tack that stuff on, because it would have to be a different chip. Surely they are working on something, though, that can slot between the behemoth GK110 and GK104/GK114.
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
The way Nvidia names things, GK114 would just be an improved GK104 chip. Since nothing is locked off on GK104, all the specs such as cores, TMUs, ROPs and bus width would be unchanged; you can't just tack that stuff on, because it would have to be a different chip. Surely they are working on something, though, that can slot between the behemoth GK110 and GK104/GK114.

There has been the statement that there wouldn't be a GK110 Geforce card unless AMD pulled a rabbit out of a hat and came out with something really powerful. If that's the line of thinking, then they might be trying to make some super GK114 chip to compete against the 8970.
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
What I don't understand is how FP32 could be enough precision to emulate MSAA with 100% accuracy when an RGBA16FP color buffer and a 64-bit FP depth buffer are used.

Have a good use for FP64 as a depth buffer? I've seen some theoretical fractal demos, that's about it.

Also, a 32-bit FP z-buffer can't be 100% linear in eye space (the error rate can be rather high with an FP32 1 - (z/w) z-buffer), so I don't understand how double-precision shaders wouldn't be useful for reducing the error when trying to emulate a 32-bit fixed-point logarithmic z-buffer.

If you are worried about perfect linearity in eye space, why not use W instead? Perhaps I'm not seeing where you are going with your comments. Workarounds to both exist; changing the code base, hardware and data structures around to deal with things that aren't real problems isn't normally what I'd call a reasonable logical progression. I'm not saying they aren't issues; I'm trying to figure out what you are doing that would require you to go about it in the manner you describe.

Also, how are CG movies not rendered with double precision if it's standard with SSE2?

Heh, I was working with offline render engines when they started implementing SSE support. They use it as a vector processor: very high-throughput FP32. All of the potential accuracy issues you bring up for games are due to the approximation hacks we use to get decent results. You aren't dealing with a dozen different fast-as-possible shader routines running into rounding errors when combined. You don't have to worry much about an MSAA routine for CGI either. :)
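
For anyone curious about the precision argument above, here is a minimal numeric sketch (my own illustration, not from either poster) of how much eye-space error one float32 ULP of stored depth represents for a conventional perspective depth versus a logarithmic mapping. The near/far planes and the exact mappings are assumptions chosen for illustration; real engines use the 1 - (z/w) form mentioned above and vary in convention.

```python
# Hedged sketch: eye-space error implied by one float32 ULP of stored depth,
# for a conventional perspective depth versus a logarithmic depth mapping.
# NEAR/FAR and the mappings themselves are illustrative assumptions.
import math
import numpy as np

NEAR, FAR = 0.1, 10_000.0  # hypothetical near/far planes

def projected_depth(z):
    # conventional perspective depth in [0, 1]: d = f/(f-n) * (1 - n/z)
    return (FAR / (FAR - NEAR)) * (1.0 - NEAR / z)

def log_depth(z):
    # logarithmic depth in [0, 1]: d = log(z/n) / log(f/n)
    return math.log(z / NEAR) / math.log(FAR / NEAR)

def eye_space_error(depth_fn, z, dz=1e-3):
    """Eye-space distance covered by one float32 ULP of the stored depth at z."""
    ulp = float(np.spacing(np.float32(depth_fn(z))))          # one ULP of the stored value
    slope = (depth_fn(z + dz) - depth_fn(z - dz)) / (2 * dz)  # dd/dz via central difference
    return ulp / abs(slope)

for z in (1.0, 10.0, 100.0, 1_000.0, 9_000.0):
    print(f"z = {z:8.1f}  projected: {eye_space_error(projected_depth, z):.3e}"
          f"  logarithmic: {eye_space_error(log_depth, z):.3e}")
```

The far-field error of the projected form grows roughly with z squared, while the logarithmic form keeps it roughly proportional to z, which is the trade-off the two posts are circling around.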
 

tviceman

Diamond Member
Mar 25, 2008
6,734
514
126
www.facebook.com
There has been the statement that there wouldn't be a GK110 Geforce card unless AMD pulled a rabbit out of a hat and came out with something really powerful. If that's the line of thinking, then they might be trying to make some super GK114 chip to compete against the 8970.

I think the entire statement was something like there would be no GK110 Geforce card THIS YEAR unless AMD pulled a rabbit out of a hat...
 

Anarchist420

Diamond Member
Feb 13, 2010
8,645
0
76
www.facebook.com
Have a good use for FP64 as a depth buffer? I've seen some theoretical fractal demos, that's about it.
Thanks for answering. :) To answer your question... probably not, if a W-buffer can be emulated with no sacrifice, or if a 32-bit fixed-point linear z-buffer is available in hardware with no sacrifice (except in terms of performance, since a 32-bit log z-buffer can only be done without early Z culling).
I'm not saying they aren't issues; I'm trying to figure out what you are doing that would require you to go about it in the manner you describe.
Well, approximation hacks can't always be good enough for accuracy. I'm no programmer, incidentally, and sometimes hacks can be better because they're faster or they look better (although what looks better is subjective)... however, I think that more precision is necessary, because approximation hacks can only go so far in terms of accuracy.
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
however, I think that more precision is necessary, because approximation hacks can only go so far in terms of accuracy.

The thing is that we are dealing with accuracy issues because we are using dozens of different hacks on top of each other to simulate things like ray tracing and radiosity. What's interesting is that actually using those techniques requires a lot less accuracy: you aren't dealing with an averaged weighting of 20 truncated values, you are using one value with a slight rounding error.

Just to give an idea, FP32 color has 1.097 trillion color values. Going to FP64 in computer graphics may well have some very viable uses, but I don't know of them yet.
 

Granseth

Senior member
May 6, 2009
258
0
71
(...)

Just to give an idea, FP32 color has 1.097 trillion color values. Going to FP64 in computer graphics may well have some very viable uses, but I don't know of them yet.

Won't 32 bits give you a little more than 4 billion combinations? I think you need 40 bits to get to 1 trillion.
I don't know much about color values outside of computer images, but there you often have other channels that don't describe color, such as transparency and brightness.
 

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
lopri,

If you recall the HD6970, GTX480, HD7970 and GTX680 rumored specs, they were all wrong until very near launch. Interestingly enough, all those rumors shared one universal theme: unrealistic increases in performance / number of CUDA cores / shader units.

I believe the 6970 was rumoured to come in at 1792 SPs or so, the GTX480 at 768 SPs, and HD7970 rumours ranged anywhere from 2304 to 3072 SPs, while many of us expected a 'real' flagship Kepler GTX680 to completely wipe the 7970 off the map after NV commented that they were "relieved after seeing 7970 benchmarks." None of these very optimistic spec predictions came true.

Right now, many are once again being very optimistic and calling for a 1GHz 2880 SP GTX780. I am going with the track record of all previous GPU spec misses (i.e., human nature makes us want as much of an increase as possible, so we'd want a 1GHz 2880 SP GK110 in theory). The reality is likely to be something between 1536 and 2880 SPs @ 1GHz+ clocks, OR much lower clock speeds with the full 2880 SPs.

Just to remind you how off the mark the rumors were for GTX680 even in February 2012, less than 2 months away from launch:

"If the above rumors are true, GTX680 will offer approx 45% better performance than AMD Radeon HD7970, which is it's direct competitor. GTX670 will be a 20% improvement over HD7950, while GTX660Ti will perform 10% better than HD7950." ~ CPU World

I am putting about 0.1% credibility into the 2880 SP @ 1GHz GK110 by Spring 2013 rumor at this point, simply because every time someone has a rumor calling for a 60-70% GPU speed increase, it doesn't come true. Even if GTX780 is 40-45% faster than GTX680, that would be pretty impressive as it is.
 
Last edited:

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
Won't 32 bits give you a little more than 4 billion combinations? I think you need 40 bits to get to 1 trillion.

FP32 color in D3D is actually a 128-bit specification, 32 bits per component.
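
For reference, a quick sketch of the counting both posts are doing (my own arithmetic; these are raw bit-pattern counts rather than distinct usable FP32 values, which lose some patterns to NaNs and infinities):

```python
# Raw bit-pattern counts at a few widths versus a 128-bit RGBA float format
# (32 bits per channel). These are bit patterns, not distinct usable values.
for bits in (24, 32, 40):
    print(f"{bits}-bit color: {2**bits:,} combinations")

print(f"FP32 patterns per component:  {2**32:,}")
print(f"128-bit RGBA32F patterns:     {2**128:,}")
```

A 32-bit total gives a little over 4 billion combinations and 40 bits gives about 1.1 trillion, which is the point the question above is making; the 128-bit format spends 32 bits on each of the four channels separately.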
 

railven

Diamond Member
Mar 25, 2010
6,604
561
126
Oak Ridge National Laboratory is getting its first batches.
http://www.hpcwire.com/hpcwire/2012...h_of_kepler_gpus_for_titan_supercomputer.html

ORNL WHY U BIAS TOO MUCH?!!
Going with ultra expensive K20, while 7970 offers 1000 GFLOPS of double precision for only $429 :rolleyes:

WTF?

When all is said and done, the Titan supercomputer will perform at an estimated 20 peak petaflops. This will be achieved with 18,688 nodes running the latest 16-core AMD CPUs and 14,592 K20 GPUs. Each node will have 32 GB of memory, supplying 2GB per CPU core.

:eek:

EDIT: Didn't get to finish my post. AMD processors? What kind of black magic is this?
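
The quoted figures lend themselves to some quick back-of-envelope math (my own, and note the 20 PF peak includes the CPU contribution, so the per-GPU share below is only an upper bound, not a K20 spec):

```python
# Back-of-envelope arithmetic from the quoted Titan figures.
nodes, gpus, peak_pflops = 18_688, 14_592, 20.0
mem_per_node_gb, cpu_cores_per_node = 32, 16

print(f"memory per CPU core:  {mem_per_node_gb / cpu_cores_per_node:.0f} GB")
print(f"total system memory:  {nodes * mem_per_node_gb / 1024:.0f} TB")
print(f"peak per GPU (upper bound, CPUs included in the 20 PF): "
      f"{peak_pflops * 1e3 / gpus:.2f} TFLOPS")
```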
 
Last edited:

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
No magic, just a constraint: Cray is using HyperTransport, so they must use AMD processors.

That's also why they did not build the announced system with Fermi: Bulldozer was so late that they are going directly to Kepler.
 

boxleitnerb

Platinum Member
Nov 1, 2011
2,605
6
81
Great news. I thought they would ship the first GPUs much later this year. Now I'm really pissed that I have to wait for the GTX780 until March 2013 (or even later).

Would it be so bad to have a "mini-launch" with maybe 2,000 GK110 cards worldwide for the enthusiasts? It wouldn't cost Nvidia that much, but it would be a good publicity stunt.
 

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
Oak Ridge National Laboratory is getting its first batches.
http://www.hpcwire.com/hpcwire/2012...h_of_kepler_gpus_for_titan_supercomputer.html

ORNL WHY U BIAS TOO MUCH?!!
Going with ultra expensive K20, while 7970 offers 1000 GFLOPS of double precision for only $429 :rolleyes:

Does the article say that they bought the GPUs for double-precision performance? How do you know that in a specific application (scientific computing) they aren't using single precision, or that the K20 just works MUCH faster in a specific program that benefits substantially from the CUDA architecture?

Pretty funny you keep rolling your eyes though. You should post in this section of our forum to the guys who run MilkyWay, etc. that double-precision compute is irrelevant. Just because all you do is play videogames doesn't mean the entire world only buys GPUs for games. If someone wants to participate in distributed computing projects that use double precision, NV's consumer videocards are worthless. Same story for ALU performance for hashing --> bitcoin mining. This generation you ignored both of these unique AMD features. NV will gladly sell you similar MilkyWay@Home performance for 7-10x more $.
 
Last edited:

Keysplayr

Elite Member
Jan 16, 2003
21,211
50
91
Does the article say that they bought the GPUs for double-precision performance? How do you know that in a specific application (scientific computing) they aren't using single precision, or that the K20 just works MUCH faster in a specific program that benefits substantially from the CUDA architecture?

Pretty funny you keep rolling your eyes though. You should post in this section of our forum to the guys who run Collatz Conjecture, MilkyWay, etc. that double-precision compute is irrelevant. Just because all you do is play videogames doesn't mean the entire world only buys GPUs for games. If someone wants to participate in distributed computing projects that use double precision, NV's consumer videocards are worthless. Same story for ALU performance for hashing --> bitcoin mining. This generation you ignored both of these unique AMD features.

I thought the K20 was supposed to have the full gamut of computing power and functionality, including monstrous double-precision ability: "Big K", GK110. As opposed to the K10, "Little K" (GK104), which is aimed at more specific compute functionality.
 

tviceman

Diamond Member
Mar 25, 2008
6,734
514
126
www.facebook.com
Great news. I thought they would ship the first GPUs much later this year. Now I'm really pissed that I have to wait for the GTX780 until March 2013 (or even later).

Would it be so bad to have a "mini-launch" with maybe 2,000 GK110 cards worldwide for the enthusiasts? It wouldn't cost Nvidia that much, but it would be a good publicity stunt.

The question is, will you be willing to drop $599-649 for a GK110 Geforce card? I fully believe that is what we're going to get price-wise. If the performance of GK110 over GK104 can be extrapolated from GF110's performance over GF114, I think we're going to see a gtx785 @ $649, gtx780 @ $549, and a gtx770 @ $449, all based on GK110 and all faster than GK104. GK104's refresh, and again I am guessing, will start at $379-399.

One good thing is that if GK110 is going into full production in October / November, then hopefully by the time they bring it to their Geforce lineup yields will be good enough that the highest end part won't have any fused off SMX units. It will be a BEAST right from the start.
 

Keysplayr

Elite Member
Jan 16, 2003
21,211
50
91
The question is, will you be willing to drop $599-649 for a GK110 Geforce card? I fully believe that is what we're going to get price-wise. If the performance of GK110 over GK104 can be extrapolated from GF110's performance over GF114, I think we're going to see a gtx785 @ $649, gtx780 @ $549, and a gtx770 @ $449, all based on GK110 and all faster than GK104. GK104's refresh, and again I am guessing, will start at $379-399.

One good thing is that if GK110 is going into full production in October / November, then hopefully by the time they bring it to their Geforce lineup yields will be good enough that the highest end part won't have any fused off SMX units. It will be a BEAST right from the start.

Those prices are reminiscent of GT200 launch prices. Then AMD undercut with the 48xx series. The GTX280 was $649 and the GTX260 was $449 at launch.
 

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
I thought the K20 was supposed to have the full gamut of computing power and functionality, including monstrous double-precision ability: "Big K", GK110. As opposed to the K10, "Little K" (GK104), which is aimed at more specific compute functionality.

It does, but I can't buy it for $429. I would have LOVED a GK110 gaming chip in a GTX680 with strong DP compute since I don't care about 275W of power consumption as long as superior performance is there.

But even if NV delivers 1 TFLOPS of DP with GTX780, HD8970 will have even more DP compute than the 7970, so NV will still be behind in this area, I bet.

One good thing is that if GK110 is going into full production in October / November, then hopefully by the time they bring it to their Geforce lineup yields will be good enough that the highest end part won't have any fused off SMX units. It will be a BEAST right from the start.

GTX660Ti peaks at 146W
GTX670 peaks at 152W
GTX680 peaks at 186W
http://tpucdn.com/reviews/Palit/GeForce_GTX_660_Ti_Jet_Stream/images/power_peak.gif

All 3 have very similar GPU clocks, the GTX660Ti/670 share the same number of CUDA cores (1344), while the GTX670/680 share the same number of ROPs and the same memory bandwidth.

The GTX680's power consumption is significantly higher, but its memory bandwidth and 32 ROPs are exactly the same as the 670's, with only a small increase in GPU clocks and somewhat more TMUs and shaders.

How do you suppose NV will be able to sell a 2880 SP, 240 TMU, 48 ROP, 384-bit bus GTX780 without blowing way past 250W of GPU power and without dropping GPU clocks well below 1GHz? Also, why wouldn't NV just play it safe and launch a much higher-yielding 12-13 SMX GK110 chip? When you go to a much larger mass-production scale, trying to yield a full 15 SMX 550mm^2 die is very difficult (we saw this with GTX480). Plus, that means 48 ROPs and a 384-bit memory bus; the power consumption of such a chip would be high, unless I am missing something. Backing into the GPU clocks of K20 using its peak 1 TFLOP DP performance suggests around a 725-750MHz GPU clock (unless my math is totally off).
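
To make the "backing into the clock" step concrete, here is a sketch of that arithmetic (my own illustration). The 192-cores-per-SMX figure, the candidate SMX counts, and especially the assumed DP-to-SP rate are assumptions, and the estimate swings a lot depending on which ratio you pick.

```python
# Sketch: clock needed to hit a target peak double-precision throughput,
# assuming peak DP FLOPS = (SP cores * dp_ratio) * 2 FLOPs-per-FMA * clock.
# Per-SMX core count, SMX counts and dp_ratio are illustrative assumptions.
def dp_clock_mhz(target_tflops, smx, dp_ratio, cores_per_smx=192):
    dp_units = smx * cores_per_smx * dp_ratio
    return target_tflops * 1e12 / (dp_units * 2) / 1e6

for dp_ratio, label in ((1/3, "DP = 1/3 of SP"), (1/4, "DP = 1/4 of SP")):
    for smx in (13, 14, 15):
        print(f"{smx} SMX, {label}: ~{dp_clock_mhz(1.0, smx, dp_ratio):.0f} MHz "
              f"for 1 TFLOP peak DP")
```

Depending on the SMX count and DP ratio assumed, the implied clock for 1 TFLOP of peak DP ranges from roughly 500 MHz to 800 MHz, which is why estimates like the 725-750MHz figure above are so sensitive to the starting assumptions.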
 
Last edited:

Keysplayr

Elite Member
Jan 16, 2003
21,211
50
91
It does, but I can't buy it for $429. I would have LOVED a GK110 gaming chip in GTX680 with DP compute since I don't care about 275W of power consumption as long as superior performance is there. I have to wait until next generation but then HD8970 will have even more DP compute than K20, so NV will still be behind.

To be fair, we should just wait and see what Big K is capable of. It might run right alongside Tahiti in DP and other compute functionality, or be leagues ahead of it, or anywhere in between.
Would be nice to buy one at $429. But you have a 7970 to hold you over for a while.
 

tviceman

Diamond Member
Mar 25, 2008
6,734
514
126
www.facebook.com
Those prices are reminiscent of GT200 launch prices. Then AMD undercut with the 48xx series. The GTX280 was $649 and the GTX260 was $449 at launch.

Well, we all got a taste of how much AMD is itching to drive up its ASP, so I doubt they will want to severely undercut Nvidia again anytime soon. What you said is true and plausible, though. But again, I'm extrapolating GK110's full performance from how much faster GF110 was over GF114. A gtx580 had 33% more cores operating 6.5% slower than a gtx560ti, yet performance was regularly 40% faster in AnandTech benchmarks. Comparing situations where the gtx560ti was not bandwidth-bottlenecked, we get:

Crysis Warhead - 40.4% faster
Metro2033 - 41.3% faster
Batman AC - 61.4% faster
Shogun 2 - 30.5% faster
BF3 - 29% faster
Skyrim - 39.2% faster
Civ V - 45% faster

So even if GK110 doesn't scale over GK104 the way GF110 did over GF114, if we guesstimate that GK110's cores perform the same as GK104's for graphics, a fully functional GK110 has 2880 cores (87.5% more than GK104), and if it's operating at 850MHz we still end up with roughly 52% more graphical horsepower (and 50% more memory bandwidth). GK110, on paper, is a bigger step up from GK104 than GF110 was from GF114. If it delivers in line with what Fermi did, then it will be an absolute beast and come with a beastly price tag, because I think the performance delta between GK110 and AMD's best Sea Islands part will be greater than the one between Cayman and GF110.
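
As a rough check on that extrapolation, here is the raw throughput arithmetic in code form (my own sketch; the GF114/GF110 clocks and the GTX680 boost clock below are assumptions, and measured game performance also depends on bandwidth, ROPs and drivers, which is why GF110's real-world lead exceeded its raw shader advantage):

```python
# Sketch: theoretical shader throughput scales roughly with cores * clock.
# Clock figures below are assumed reference values, not measured data.
def throughput_gain(cores_a, clock_a, cores_b, clock_b):
    """Fractional raw shader-throughput advantage of config A over config B."""
    return (cores_a * clock_a) / (cores_b * clock_b) - 1.0

# GTX 580 (GF110, 512 cores @ ~772 MHz) vs GTX 560 Ti (GF114, 384 cores @ ~822 MHz)
print(f"GF110 over GF114 (raw):      {throughput_gain(512, 772, 384, 822):+.1%}")

# Hypothetical full GK110 (2880 cores @ 850 MHz) vs GTX 680 (1536 cores @ ~1058 MHz boost)
print(f"Full GK110 over GK104 (raw): {throughput_gain(2880, 850, 1536, 1058):+.1%}")
```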
 
Last edited:

tviceman

Diamond Member
Mar 25, 2008
6,734
514
126
www.facebook.com
How do you suppose NV will be able to sell a 2880 SP, 240 TMU, 48 ROP, 384-bit bus GTX780 without blowing way past 250W of GPU power and without dropping GPU clocks well below 1GHz? Also, why wouldn't NV just play it safe and launch a much higher-yielding 12-13 SMX GK110 chip? When you go to a much larger mass-production scale, trying to yield a full 15 SMX 550mm^2 die is very difficult (we saw this with GTX480). Plus, that means 48 ROPs and a 384-bit memory bus; the power consumption of such a chip would be high, unless I am missing something.

What you're saying makes perfect sense, and I agree; I believe the power-draw barrier will be GK110's biggest obstacle to its full performance potential. My previous post right above this went on a little bit about how I think it will be a performance monster. Having 87.5% more cores than GK104 and 50% more bandwidth, it can be clocked 200MHz lower than GK104 and still have a > 50% theoretical core throughput advantage. Since the cores aren't all that different from Fermi in functionality, I used GF110's performance over GF114 to show that 33% more cores operating at a slightly slower frequency still netted over 40% performance improvements in most situations. So I'm using that logic to say 87.5% more cores, operating at say 850MHz (23.5% slower than a reference gtx680), are still going to be incredibly powerful.

Also, if they are yielding fully functional chips in a large enough volume in March, then why wouldn't they release them as fully functional chips and sell them at a higher price?
 
Last edited: