
NVIDIA Pascal Thread

Page 37
inside Pascal:
https://devblogs.nvidia.com/parallelforall/inside-pascal/

again, who said Pascal is Maxwell on 16FF ? 😀😀😀
Nvidia themselves:
http://cdn.wccftech.com/wp-content/uploads/2015/09/NVIDIA-Pascal-GPU_Compute-Performance.jpg

That's the FP32 cores. Add the 1792 FP64 cores and it is actually 5376.

Also confirmed: 610 mm², even bigger than GM200. A new node, and right up to the reticle limit straight away. Incredible.

Nope. It looks like FP64 on Tesla P100 uses two FP32 cores, to get the code executed.

Yup:
Nvidiablog said:
GP100’s SM incorporates 64 single-precision (FP32) CUDA Cores. In contrast, the Maxwell and Kepler SMs had 128 and 192 FP32 CUDA Cores, respectively. The GP100 SM is partitioned into two processing blocks, each having 32 single-precision CUDA Cores, an instruction buffer, a warp scheduler, and two dispatch units. While a GP100 SM has half the total number of CUDA Cores of a Maxwell SM, it maintains the same register file size and supports similar occupancy of warps and thread blocks.

It is a 3584 CUDA core FP32 GPU, and FP64 is executed by using two FP32 cores at the same time.
 
No it doesn't, look at the diagram (separate SP and DP cores):

gp100_SM_diagram-624x452.png


Just like Kepler:

smxdiagram.png
 
I'm not sure if they changed the arrangement of their geometry processors but if the rule still holds that 1 GPC = 1 raster engine then Pascal is somewhat of a disappointment triangle output wise since it can still only rasterize 6 triangles per cycle which is the same triangle rate compared to GM200 ...

I can't be too sure but I don't see dedicated compute engines either on the GP100 diagram so interpret that as you will ...
 
antihelten, go back to my post again. What's more, your screenshot of GP100 shows exactly what I said. Look at the cache, and read what Nvidia said about it.

It is a 3584 CUDA core GPU that uses two FP32 cores to execute FP64.

I have to say: it is an efficient design for FP64...
 
Try counting the cores in the diagram. There are 64 separate SP cores and 32 separate DP cores (or units as they are called in the diagram).

And what you quoted from Nvidia makes no mention whatsoever of SP cores doing double duty as DP cores. In fact, if you read a bit further on the blog you would find this:

Each GP100 SM has 32 FP64 units, providing a 2:1 ratio of single- to double-precision throughput.
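
The per-SM figures quoted from the Nvidia blog make the counting argument easy to check. Here is a minimal Python sketch, assuming the 3584 FP32 core total cited in this thread is split evenly across SMs. The SM count is derived here rather than quoted, and the helper names are made up for illustration:

```python
# Figures quoted in this thread from the Nvidia blog.
FP32_PER_SM = 64    # single-precision CUDA cores per GP100 SM
FP64_PER_SM = 32    # double-precision units per GP100 SM
TOTAL_FP32 = 3584   # CUDA core count cited for Tesla P100


def sm_count(total_fp32: int, fp32_per_sm: int) -> int:
    """Derive the number of enabled SMs (assumes an even split)."""
    if total_fp32 % fp32_per_sm != 0:
        raise ValueError("core total is not a multiple of cores per SM")
    return total_fp32 // fp32_per_sm


def unit_totals(total_fp32: int, fp32_per_sm: int, fp64_per_sm: int) -> dict:
    """Count FP32 cores, FP64 units, and their sum across all SMs."""
    sms = sm_count(total_fp32, fp32_per_sm)
    fp64 = sms * fp64_per_sm
    return {
        "sms": sms,
        "fp32": total_fp32,
        "fp64": fp64,
        "all_units": total_fp32 + fp64,
    }


if __name__ == "__main__":
    # Prints: {'sms': 56, 'fp32': 3584, 'fp64': 1792, 'all_units': 5376}
    print(unit_totals(TOTAL_FP32, FP32_PER_SM, FP64_PER_SM))
```

Both numbers in the argument are consistent with the blog: 3584 is the FP32 (CUDA core) count, and 5376 is what you get if you also count the 1792 separate FP64 units.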
 
Holy shit, a 1500 MHz boost for big Pascal. Also a new architecture.
This GPU will annihilate GM200.

I am eager to see GP104... without DP units... base clock 1400 MHz and boost 1600 MHz?
Maybe:
2560 SP
160 TMU
TDP 180 W
Base clock 1400 MHz
Boost clock 1600 MHz
 
Try counting the cores in the diagram. There are 64 separate SP cores and 32 separate DP cores (or units as they are called in the diagram).

And what you quoted from Nvidia makes no mention whatsoever of SP cores doing double duty as DP cores. In fact, if you read a bit further on the blog you would find this:

FP64 unit doesn't necessarily mean it's a full FP64 core that runs independently of the FP32 cores.

Does Kepler have 2880 FP32 cores + 960 FP64 cores, or does it have 2880 cores that can do FP64 at 1/3rd rate?
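
One way to frame the Kepler question is to count units per SM. The sketch below assumes that a GK110 SMX carries 64 dedicated DP units next to its 192 FP32 cores. That 64 is an assumption not stated in this thread, while the 192 per SM and the 2880 total are. Under that assumption both the 1/3 rate and the 960 figure fall out of the unit counts:

```python
from fractions import Fraction


def fp64_summary(total_fp32: int, fp32_per_sm: int, fp64_per_sm: int):
    """Return (total FP64 units, FP64:FP32 throughput ratio)."""
    sms = total_fp32 // fp32_per_sm
    return sms * fp64_per_sm, Fraction(fp64_per_sm, fp32_per_sm)


# Kepler GK110: 64 DP units per SMX is an assumption (see lead-in).
kepler = fp64_summary(total_fp32=2880, fp32_per_sm=192, fp64_per_sm=64)
# Pascal GP100: 32 FP64 units per SM is quoted from the Nvidia blog.
pascal = fp64_summary(total_fp32=3584, fp32_per_sm=64, fp64_per_sm=32)

print(kepler)  # (960, Fraction(1, 3))
print(pascal)  # (1792, Fraction(1, 2))
```

The arithmetic is the same either way, so the counts alone cannot tell you whether the FP64 units are physically separate. That has to come from the block diagram or Nvidia's own description.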
 
Try counting the cores in the diagram. There are 64 separate SP cores and 32 separate DP cores (or units as they are called in the diagram).

And what you quoted from Nvidia makes no mention whatsoever of SP cores doing double duty as DP cores. In fact, if you read a bit further on the blog you would find this:

If they were separate, then OK. But it does not work the way you described.

Overall you are right, it does not use FP32 cores to get FP64. But it still has 3584 CUDA cores.
 
It's the GPU die size, not the substrate.
I don't mean the size of the substrate itself. I was talking about the components mounted on the substrate (GPU + memory).

Both Fiji and Maxwell had GPUs around 600 mm² at 28nm. I'm a bit skeptical that Nvidia will come out with a GPU the exact same size on 16nm. The transistor count would have to be incredibly high to account for a GPU that large on 16nm. I think it's more likely that the 610 mm² reference was for the area of the GPU plus the area of the HBM2, since they're now packaged together on the substrate as a single unit.
 
inside Pascal:
https://devblogs.nvidia.com/parallelforall/inside-pascal/

again, who said Pascal is Maxwell on 16FF ? 😀😀😀

The only true leak that came from Nvidia was this one:

Pascal was going to be a Maxwell with bigger SP and DP performance. That was the only actual leak we had from Nvidia themselves.

Turns out it's true, and it turns out to be the only true rumor/leak so far.

This is a Titan X:
http://images.anandtech.com/doci/9059/TITAN_X_Block_Diagram_FINAL.png
This is Pascal:
https://devblogs.nvidia.com/paralle...ads/2016/04/gp100_block_diagram-1-624x368.png

Take out NVLink and the HBM memory links and what you have is what they actually said.

(Also, on that devblog they said they have a bigger number of threads compared to Kepler, while they have the same as Maxwell...)
 
FP64 unit doesn't necessarily mean it's a full FP64 core that runs independently of the FP32 cores.

Does Kepler have 2880 FP32 cores + 960 FP64 cores, or does it have 2880 cores that can do FP64 at 1/3rd rate?

I'm not saying it's running independently as such, merely that an FP64 unit/core is not simply an abstraction of 2 FP32 cores joining together to run DP, but that it is instead a separate functional unit of its own.

If they were separate, then OK. But it does not work the way you described.

I never said anything about how it worked, you were the only one to do so (saying that DP workloads were handled by 2 FP32 cores instead of by a separate FP64 core).

I simply said that the FP64 cores were distinct, separate units.
 
I don't mean the size of the substrate itself. I was talking about the components mounted on the substrate (GPU + memory).

Both Fiji and Maxwell had GPUs around 600 mm² at 28nm. I'm a bit skeptical that Nvidia will come out with a GPU the exact same size on 16nm. The transistor count would have to be incredibly high to account for a GPU that large on 16nm. I think it's more likely that the 610 mm² reference was for the area of the GPU plus the area of the HBM2, since they're now packaged together on the substrate as a single unit.

https://devblogs.nvidia.com/parallelforall/inside-pascal/

Nvidia specifically says 610mm² for the GPU die itself, and shows the known 601mm² die size for Maxwell.
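
The "transistor count would have to be incredibly high" worry can be sanity-checked with a density calculation. This is a rough sketch: the die areas are the ones given in this post, while the transistor counts (15.3 billion for GP100, 8 billion for GM200) are assumptions brought in from Nvidia's published figures rather than from this thread. The function name is illustrative:

```python
def density_m_per_mm2(transistors: float, die_area_mm2: float) -> float:
    """Transistor density in millions of transistors per mm^2."""
    return transistors / die_area_mm2 / 1e6


# Die areas from this thread; transistor counts are assumptions (see lead-in).
gp100 = density_m_per_mm2(transistors=15.3e9, die_area_mm2=610)
gm200 = density_m_per_mm2(transistors=8.0e9, die_area_mm2=601)

print(f"GP100: {gp100:.1f} M/mm^2")
print(f"GM200: {gm200:.1f} M/mm^2")
print(f"Density ratio: {gp100 / gm200:.2f}x")
```

Under those assumptions the density comes out a bit under 2x GM200's, which is plausible for a 28nm to 16nm FinFET move and fits a 610 mm² figure that refers to the GPU die alone.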
 
So much for the posters claiming that the new single gpu Titan was imminent! I think we're in for another year's wait.
 
It's a single big 610 mm² die.

The clocks are ok but not that impressive given we're looking at a 300W part compared to a 250W Maxwell. Plenty of 980 Ti's out there at 1300MHz and beyond...
 
Hearing some excitement from some folks and some disappointment from others. I am not as tech-savvy as many folks here, so I can't form such an opinion myself; if you have a reaction, can you explain why you feel the way you do about GP100?

From my less tech-savvy perception, the increase in CUDA cores seems a bit incremental, less than a 20% increase over GM200. On the other hand, the factory base/boost clocks are pretty impressive at 1328/1480. You can get there with GM200, but that requires an overclock. So I guess it really depends on how much overclocking headroom is available over those factory clocks, but we won't know that for years. The HBM2 is also very impressive. Memory bandwidth limits will be a thing of the past with a 4096-bit memory interface. 15.3B transistors sounds really impressive, but given the focus on compute I wonder how much of that will help with gaming performance, which was Maxwell's primary focus.

Anyway, I am having a bit of a hard time understanding where things sit with GP100 compared to today's stuff. Surely it's better, but by how much?

EDIT: I am disappointed in the timelines being so far out. If we're looking at Q1 2017 for Tesla, we won't see consumer versions of this for at least another year.
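
To put rough numbers on the core count and memory points above, here is a small Python sketch. Only the 3584 cores and the 4096-bit interface come from this thread. GM200's 3072 cores, its 384-bit bus at 7 Gbps per pin, and the 1.4 Gbps per-pin rate used for HBM2 are assumptions added for illustration:

```python
def percent_increase(new: float, old: float) -> float:
    """Relative increase of new over old, in percent."""
    return (new - old) / old * 100.0


def bandwidth_gb_s(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak memory bandwidth: bus width times per-pin rate, in GB/s."""
    return bus_width_bits * gbps_per_pin / 8.0


# GM200's 3072 cores is an assumption; 3584 is from the thread.
core_gain = percent_increase(new=3584, old=3072)

# Per-pin data rates are assumptions (see lead-in).
hbm2 = bandwidth_gb_s(bus_width_bits=4096, gbps_per_pin=1.4)
gddr5 = bandwidth_gb_s(bus_width_bits=384, gbps_per_pin=7.0)

print(f"CUDA core increase: {core_gain:.1f}%")
print(f"HBM2 (4096-bit):  {hbm2:.0f} GB/s")
print(f"GDDR5 (384-bit):  {gddr5:.0f} GB/s")
```

The point of the bandwidth function is that bus width alone doesn't tell the story: HBM2 runs each pin far slower than GDDR5, so the 4096-bit interface works out to roughly double the bandwidth under these assumptions, not ten times.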
 
I'm not saying it's running independently as such, merely that an FP64 unit/core is not simply an abstraction of 2 FP32 cores joining together to run DP, but that it is instead a separate functional unit of its own.

I never said anything about how it worked, you were the only one to do so (saying that DP workloads were handled by 2 FP32 cores instead of by a separate FP64 core).

I simply said that the FP64 cores were distinct, separate units.

Yes, you are right on that: it does not use FP32 cores to get FP64, my bad. But it is still not a 53-something-core GPU. It is still a 3584 CUDA core GPU.

P.S. They still show DP units on Maxwell and Kepler; that should tell you how to read the Pascal diagram as well.
 
Hearing some excitement from some folks and some disappointment from others. I am not as tech-savvy as many folks here, so I can't form such an opinion myself; if you have a reaction, can you explain why you feel the way you do about GP100?

From my less tech-savvy perception, the increase in CUDA cores seems a bit incremental, less than a 20% increase over GM200. On the other hand, the factory base/boost clocks are pretty impressive at 1328/1480. You can get there with GM200, but that requires an overclock. So I guess it really depends on how much overclocking headroom is available over those factory clocks, but we won't know that for years. The HBM2 is also very impressive. Memory bandwidth limits will be a thing of the past with a 4096-bit memory interface. 15.3B transistors sounds really impressive, but given the focus on compute I wonder how much of that will help with gaming performance, which was Maxwell's primary focus.

Anyway, I am having a bit of a hard time understanding where things sit with GP100 compared to today's stuff. Surely it's better, but by how much?

It's meh, probably noticeably worse in perf/transistor in gaming scenarios but the engineering feat is nothing short of great ...
 
Hearing some excitement from some folks and some disappointment from others. I am not as tech-savvy as many folks here, so I can't form such an opinion myself; if you have a reaction, can you explain why you feel the way you do about GP100?

From my less tech-savvy perception, the increase in CUDA cores seems a bit incremental, less than a 20% increase over GM200. On the other hand, the factory base/boost clocks are pretty impressive at 1328/1480. You can get there with GM200, but that requires an overclock. So I guess it really depends on how much overclocking headroom is available over those factory clocks, but we won't know that for years. The HBM2 is also very impressive. Memory bandwidth limits will be a thing of the past with a 4096-bit memory interface. 15.3B transistors sounds really impressive, but given the focus on compute I wonder how much of that will help with gaming performance, which was Maxwell's primary focus.

Anyway, I am having a bit of a hard time understanding where things sit with GP100 compared to today's stuff. Surely it's better, but by how much?

AMD themselves stated they have "uplifted" the frequency in Polaris and/or Vega, so a 1.6~1.7 GHz OC for Pascal is not out of reach.

I think Pascal is surely impressive, but not to the degree Maxwell was compared to Kepler.
 
Guys, if I'm correct, Pascal will only bring around a 75% increase in gaming performance. Going from GM200 to P100, the GFLOPS increase is only 75%.

I'm not a computer expert but is this correct?
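
The roughly 75% figure can be reproduced from core counts and clocks, since peak FP32 throughput is conventionally counted as 2 FLOPs (one fused multiply-add) per core per clock. The sketch below uses the 3584 cores and 1480 MHz boost mentioned in this thread. The GM200 numbers (3072 cores at a 1000 MHz base clock) are assumptions added for the comparison, and the function name is illustrative:

```python
def peak_fp32_gflops(cores: int, clock_mhz: float) -> float:
    """Theoretical peak: 2 FLOPs (one FMA) per core per clock."""
    return 2 * cores * clock_mhz / 1000.0


p100 = peak_fp32_gflops(cores=3584, clock_mhz=1480)   # figures from the thread
gm200 = peak_fp32_gflops(cores=3072, clock_mhz=1000)  # assumed GM200 figures

gain_percent = (p100 / gm200 - 1) * 100

print(f"P100:  {p100:.0f} GFLOPS")
print(f"GM200: {gm200:.0f} GFLOPS")
print(f"Increase: {gain_percent:.0f}%")
```

So the arithmetic roughly holds under these assumptions, but it compares P100 at boost against GM200 at base clock, and theoretical FLOPS is only a ceiling. Gaming performance also depends on geometry rate, memory bandwidth, and how well games keep the cores busy, so it won't necessarily scale by the same amount.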
 
AMD themselves stated they have "uplifted" the frequency in Polaris and/or Vega, so a 1.6~1.7 GHz OC for Pascal is not out of reach.

I think Pascal is surely impressive, but not to the degree Maxwell was compared to Kepler.
Not to mention the ~18B transistors in Vega will hold them in good stead, 14nm might just be the difference between the two GPU makers this time around.
 