Titan X Launch

Paratus · Mar 22, 2015

This may be a dumb question, but if Titan X is the full GM200 and 601mm^2, which I think is the maximum die size TSMC can manufacture, but only has 1/32 DP capability is there another version of GM200 that has 1/3DP for the professional market?

Would they be the same chip with the Titan having been crippled or does the professional card have more transistors? If it's the latter how are they manufacturing it?

alcoholbob · Mar 22, 2015

Paratus said:
This may be a dumb question, but if Titan X is the full GM200 and 601mm^2, which I think is the maximum die size TSMC can manufacture, but only has 1/32 DP capability is there another version of GM200 that has 1/3DP for the professional market?

Would they be the same chip with the Titan having been crippled or does the professional card have more transistors? If it's the latter how are they manufacturing it?

Remember Maxwell originally was built for the TSMC 20nm process before it ended up being too expensive and Nvidia had to go back to the drawing board and redesign it for 28nm.

The GM204 is 398mm2 on 28nm, if it had been on 20nm it would have been around 280-290mm2 which is the size of the GTX 680.

GM200 on the other hand is 601mm2 on 28nm, if it had been on 20nm it would have been 430mm2, which would be smaller than any flagship Nvidia had released since 2007.

A full-size compute/gaming flagship of say 561mm2 on 20nm would scale to 785mm2 on 28nm, or 30% bigger than GM200.

So essentially it couldn't fit on the die on this process and still be a fast gaming card.

RaulF · Mar 22, 2015

Paratus said:
This may be a dumb question, but if Titan X is the full GM200 and 601mm^2, which I think is the maximum die size TSMC can manufacture, but only has 1/32 DP capability is there another version of GM200 that has 1/3DP for the professional market?

Would they be the same chip with the Titan having been crippled or does the professional card have more transistors? If it's the latter how are they manufacturing it?

From reading other post here, it seems GM200 was built from the ground up as a gaming card full on. That's why it does not have a big compute side to it.

3DVagabond · Mar 22, 2015

RaulF said:
From reading other post here, it seems GM200 was built from the ground up as a gaming card full on. That's why it does not have a big compute side to it.

That's really strange. Are nVidia simply conceding that they can't compete with GCN on compute loads, so they aren't even trying? It's such a large part of their business (profits wise).

SolMiester · Mar 23, 2015

3DVagabond said:
That's really strange. Are nVidia simply conceding that they can't compete with GCN on compute loads, so they aren't even trying? It's such a large part of their business (profits wise).

LOL, they already have Tesla, I believe they are skipping Tesla on Maxwell this time around....

SPBHM · Mar 23, 2015

3DVagabond said:
That's really strange. Are nVidia simply conceding that they can't compete with GCN on compute loads, so they aren't even trying? It's such a large part of their business (profits wise).

they have GK210 for DP
http://www.anandtech.com/show/8729/nvidia-launches-tesla-k80-gk210-gpu

3DVagabond · Mar 23, 2015

SPBHM said:
they have GK210 for DP
http://www.anandtech.com/show/8729/nvidia-launches-tesla-k80-gk210-gpu

Hawaii kills Kepler though in DP. 2.62 TFLOPS from a single GPU. I'm just surprised.

ViRGE · Mar 23, 2015

3DVagabond said:
Hawaii kills Kepler though in DP. 2.62 TFLOPS from a single GPU. I'm just surprised.

Tesla sells extremely well, and it runs CUDA*. On paper AMD should be doing better than they do, but in practice NVIDIA's killing it. In any case, NV's compute business is big enough now that they aren't piggybacking on graphics GPUs anymore, hence GM200 is 602mm^2 of graphics while GK210 is used solely for compute.

* CUDA was miles ahead of OpenCL in functionality in the early years, and that lead has resulted in the creation of a large NV-only ecosystem

Cookie Monster · Mar 23, 2015

3DVagabond said:
Hawaii kills Kepler though in DP. 2.62 TFLOPS from a single GPU. I'm just surprised.

Theoretical maximums does not necessarily equal real world results. I can tell you just how hard it is to extract GCN's peak numbers if even possible to do so for a real world application (maybe for very specific things?). And its also a combination of things like having vast libraries/tools (platform) to really help take advantage of the hardware. Yet AMD doesn't do very well here. This is actually well documented and a reason why CUDA is so successful in the world of academia and in the industrial side of things e.g Ansys software which I use alot really benefits from GPGPU (nVIDIA only though at this point in time).. saves days off some of my very complex simulation runs.

It's the same story all the way from G80 days. The choice of architecture meant that AMD GPUs although having very high peak theoretical maximums would not be able to sustain it in real world application due to the complexity involved in utilizing all the available hardware resources. nVIDIA on the other hand has always had lower peaks but it is much much easier to not only code, but extract high utilization and i.e. sustain a higher rate than AMD.

As SPBHM pointed out, GK210 (a re-spin of GK110 with increased register count/cache for compute) is there to fill in the gap. No ones abandoning such a lucrative business especially when nVIDIA was the one heavily driving the GPGPU approach. I bet nVIDIA had no choice but to focus on gaming with Maxwell since its heavily restricted by physical die size and power consumption limit (28nm limitation). I think its a good idea for gamers especially because compute also doesn't necessarily mean FP64 i.e DP. Some real world apps require FP64, some FP32 and then theres FP16.

3DVagabond · Mar 23, 2015

ViRGE said:
Tesla sells extremely well, and it runs CUDA*. On paper AMD should be doing better than they do, but in practice NVIDIA's killing it. In any case, NV's compute business is big enough now that they aren't piggybacking on graphics GPUs anymore, hence GM200 is 602mm^2 of graphics while GK210 is used solely for compute.

* CUDA was miles ahead of OpenCL in functionality in the early years, and that lead has resulted in the creation of a large NV-only ecosystem

So, because of CUDA nVidia isn't going to continue to compete at the hardware level? How does that make sense? GK210 doesn't come close to matching Hawaii in DP performance. I think it's 1.45 TFLOPS to 2.6 TFLOPS in favor of Hawaii. Even with no more improvement in efficiency Fiji "could" be more than 2x as fast as GK210. I would think these big commercial installations can't simply ignore greater than 2x the efficiency.

Cookie Monster · Mar 23, 2015

ViRGE said:
Tesla sells extremely well, and it runs CUDA*. On paper AMD should be doing better than they do, but in practice NVIDIA's killing it. In any case, NV's compute business is big enough now that they aren't piggybacking on graphics GPUs anymore, hence GM200 is 602mm^2 of graphics while GK210 is used solely for compute.

* CUDA was miles ahead of OpenCL in functionality in the early years, and that lead has resulted in the creation of a large NV-only ecosystem

Exactly. It is the reason why OpenCL is severely neglected by nVIDIA.

Cookie Monster · Mar 23, 2015

3DVagabond said:
So, because of CUDA nVidia isn't going to continue to compete at the hardware level? How does that make sense? GK210 doesn't come close to matching Hawaii in DP performance. I think it's 1.45 TFLOPS to 2.6 TFLOPS in favor of Hawaii. Even with no more improvement in efficiency Fiji "could" be more than 2x as fast as GK210. I would think these big commercial installations can't simply ignore greater than 2x the efficiency.

If you start talking about commercial installations.. you can't really use an one to one comparison. Many GPUs goes into a rack where there is a power requirement and cooling requirement (often with heatsinks cooled by the intake/outtake fans). These cards often are underclocked to meet these targets with many many validation checks. The question here is, just how well do these cards stack up once equalized in terms of power consumption?

If there is a 10W difference, say over a 1000 of them. That is simply alot. Imagine an extra 10kW of heat being dissipated. The air conditioning system has to be more powerful. They all have a knock on effect. And we all know when it comes to realworld perf/w efficiency in such markets, GK110 is pretty good at it.

As for fiji, it will be interesting to see what its capable of in this area.

3DVagabond · Mar 23, 2015

Cookie Monster said:
If you start talking about commercial installations.. you can't really use an one to one comparison. Many GPUs goes into a rack where there is a power requirement and cooling requirement (often with heatsinks cooled by the intake/outtake fans). These cards often are underclocked to meet these targets with many many validation checks. The question here is, just how well do these cards stack up once equalized in terms of power consumption?

If there is a 10W difference, say over a 1000 of them. That is simply alot. Imagine an extra 10kW of heat being dissipated. The air conditioning system has to be more powerful. They all have a knock on effect. And we all know when it comes to realworld perf/w efficiency in such markets, GK110 is pretty good at it.

As for fiji, it will be interesting to see what its capable of in this area.

Hawaii wins in efficiency too.

Now, I'm not dismissing CUDA or it's importance in the equation. Hardware is worthless without software and nVidia has put a lot into developing CUDA. It's not unlike DX. But, as we are seeing with DX, other API's can come along and compete. As we're seeing with ARM vs X86, it doesn't matter how entrenched something is it's not safe from competition. You let your hardware slip too much and you can reach a point where it's just to much to continue competing.

Sorry. To get it a bit back on topic, People are talking like it's a conscious decision on nVidia's part to neglect compute for gaming with their latest designs. People were even saying that with Kepler because it wasn't as powerful as GCN. I find this questionable though. Is it by design? (I personally doubt it considering the importance of compute for nVidia's business.) Or is it because they've had to? (Is this the only way they can get the efficiency increase they are going for?)

JoeRambo · Mar 23, 2015

Cookie Monster said:
As for fiji, it will be interesting to see what its capable of in this area.

Is there ECC variant of HBM atm? Without ECC noone will touch it in computing. Also there is issue of HBM VRAM capacity.

Great memory bandwidth combined with lower latency can work wonders in science computing. Combined with HBM power efficiency, AMD could at least improve their technical standing.

Keysplayr · Mar 23, 2015

3DVagabond said:
So, because of CUDA nVidia isn't going to continue to compete at the hardware level? How does that make sense? GK210 doesn't come close to matching Hawaii in DP performance. I think it's 1.45 TFLOPS to 2.6 TFLOPS in favor of Hawaii. Even with no more improvement in efficiency Fiji "could" be more than 2x as fast as GK210. I would think these big commercial installations can't simply ignore greater than 2x the efficiency.

Not exactly. A whole "ecosystem", as ViRGE put it, is needed to fulfill buyer confidence in a product and a company that backs it. Tesla units are the prime choice for compute and it's due to Nvidia's early delve into this not yet established arena. Everyone laughed (I mean everyone. Nvidia proponents and opponents alike) at first. Now look at them. They run the joint.

3DVagabond · Mar 23, 2015

Keysplayr said:
Not exactly. A whole "ecosystem", as ViRGE put it, is needed to fulfill buyer confidence in a product and a company that backs it. Tesla units are the prime choice for compute and it's due to Nvidia's early delve into this not yet established arena. Everyone laughed (I mean everyone. Nvidia proponents and opponents alike) at first. Now look at them. They run the joint.

That's not in debate. Their two latest arch's being relatively poor in compute is. Especially it appears so for Maxwell even more than Kepler which wasn't particularly good.

tviceman · Mar 23, 2015

Astrallite said:
Remember Maxwell originally was built for the TSMC 20nm process before it ended up being too expensive and Nvidia had to go back to the drawing board and redesign it for 28nm.

The GM204 is 398mm2 on 28nm, if it had been on 20nm it would have been around 280-290mm2 which is the size of the GTX 680.

GM200 on the other hand is 601mm2 on 28nm, if it had been on 20nm it would have been 430mm2, which would be smaller than any flagship Nvidia had released since 2007.

A full-size compute/gaming flagship of say 561mm2 on 20nm would scale to 785mm2 on 28nm, or 30% bigger than GM200.

So essentially it couldn't fit on the die on this process and still be a fast gaming card.

I'm curious how you are deriving your math. TSMC's 20nm has up to 1.9x the transistor density of 28nm. Not that Nvidia would have achieved perfect scaling, but even 1.5x scaling would have resulted in smaller dies than what you are guesstimating.

Dribble · Mar 23, 2015

I suspect that titan X was originally never a flagship, just the xx114 chip for 20nm. It would have been the new the 680, however when 20nm was late they just tweaked it a bit and made it at 28nm, called it something fancy and charged a lot of money for it. Hence it not being a true 64 bit compute chip like previous *real* titans.

The next big compute monster for tesla is probably too big for 28nm and is waiting for 20nm.

Erenhardt · Mar 23, 2015

JoeRambo said:
Is there ECC variant of HBM atm? Without ECC noone will touch it in computing. Also there is issue of HBM VRAM capacity.

Great memory bandwidth combined with lower latency can work wonders in science computing. Combined with HBM power efficiency, AMD could at least improve their technical standing.

most likely. Think about HBM as a regular GDDR5 downclocked to idle speeds, but so wide its not even funny.
No reason why there shouldn't be ECC

ViRGE · Mar 23, 2015

3DVagabond said:
That's not in debate. Their two latest arch's being relatively poor in compute is. Especially it appears so for Maxwell even more than Kepler which wasn't particularly good.

On the contrary, Kepler (GK110/GK210) is extremely good for compute. DGEMM efficiency is at 93%, which is very high. Even Maxwell should be quite good for compute, though only for FP32.

Meanwhile, as Cookie has already noted, AMD has the higher performance on paper, but there are multiple factors in getting all of that performance in the real world. While AMD is no slouch, for whatever reason it seems to be harder to get peak performance out of GCN.

Sorry. To get it a bit back on topic, People are talking like it's a conscious decision on nVidia's part to neglect compute for gaming with their latest designs. People were even saying that with Kepler because it wasn't as powerful as GCN. I find this questionable though. Is it by design? (I personally doubt it considering the importance of compute for nVidia's business.) Or is it because they've had to? (Is this the only way they can get the efficiency increase they are going for?)

Can it not be both? They already have GK210, which at 550mm2 is quite large. GM200 stripped the FP64 capabilities for more graphics capabilities. That option isn't really on the table for compute.

Plus Pascal will be here next year anyhow. So a Tesla Maxwell would be very short lived compared to the 3-4 years of Tesla Kepler.

Keysplayr · Mar 23, 2015

3DVagabond said:
So, because of CUDA nVidia isn't going to continue to compete at the hardware level? How does that make sense? GK210 doesn't come close to matching Hawaii in DP performance. I think it's 1.45 TFLOPS to 2.6 TFLOPS in favor of Hawaii. Even with no more improvement in efficiency Fiji "could" be more than 2x as fast as GK210. I would think these big commercial installations can't simply ignore greater than 2x the efficiency.

See post 434 for a really great explanation that directly addresses your question.

Paratus · Mar 23, 2015

Thanks guys. I think that answered my question on the state of NVs high end gameing vs entry compute cards.

:thumbsup:

Jaydip · Mar 23, 2015

Another important thing to consider is ROI.Last year we upgraded from Q6000 to QK6000 and if NV releases another K6000 successor now many customers will be upset.Big customers don't upgrade in a jiffy you know

, it takes a hell long time before making bulk purchases for a new successor.

Baasha · Mar 23, 2015

Get ready for some 4-Way SLI Titan X + 5K Madness!

Annisman* · Mar 23, 2015

I'm more impressed with that tissue box than anything else, Baasha got style.

Edit: it looks like a treasure chest. I can't stop looking at it.

Edit2: I bet you've got those lotion medicated tissues in there don't ya.

Titan X Launch

Lifer

Diamond Member

Senior member

Lifer

Diamond Member

Diamond Member

Lifer

Elite Member, Moderator Emeritus

Diamond Member

Lifer

Diamond Member

Diamond Member

Lifer

Golden Member

Elite Member

Lifer

Diamond Member

Platinum Member

Diamond Member

Elite Member, Moderator Emeritus

Elite Member

Lifer

Diamond Member

Golden Member

Golden Member