News Intel GPUs - Intel launches A580


Dayman1225

Golden Member
Aug 14, 2017
1,152
974
146
If the release is going to be in June, that doesn't seem like that much time from when they have an actual working sample.
Clarification - Swan said "Power on exit" which apparently means this:

that means the initial validation testing Intel has been carrying out on its first discrete Xe graphics card in its own labs has been finished, and finished satisfactorily. Which we assume means the next phase of the process is to get the hardware out for external validation with developers and system builders, whether laptop or desktop.
 

jpiniero

Lifer
Oct 1, 2010
14,580
5,203
136
The next question to me is how high up the food chain we're talking about. Are we talking about an Ampere "MX350" competitor, or something better than that?
 

beginner99

Diamond Member
Jun 2, 2009
5,210
1,580
136
jpiniero said:
The next question to me is how high up the food chain we're talking about. Are we talking about an Ampere "MX350" competitor, or something better than that?

Obviously this isn't known. I thought it was on 14nm, but now that I'm writing this, it raises another question: what process is actually used? If it's 10nm, it can't be big...
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
beginner99 said:
Obviously this isn't known. I thought it was on 14nm, but now that I'm writing this, it raises another question: what process is actually used? If it's 10nm, it can't be big...

The top one is 512 EUs with what's likely a GDDR6 interface. It can't be that small either. Gen 11 with 64 EUs is ~40mm². If you scale that up directly you get a 300-350mm² die size. Probably 350-400mm² if you consider the added I/O interface and the enhanced architecture.
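
A quick back-of-envelope of that scaling (the 64 EU / ~40mm² baseline is from above; the I/O and architecture adder is just my guess):

```python
# Back-of-envelope: scale Gen 11's EU-to-area ratio up to 512 EUs.
# The ~40mm2 / 64 EU baseline is from the post above; the I/O +
# architecture overhead range is purely a guess.
GEN11_EUS = 64
GEN11_AREA_MM2 = 40

target_eus = 512
linear_mm2 = GEN11_AREA_MM2 * target_eus / GEN11_EUS  # pure linear scaling

extra_lo, extra_hi = 30, 80  # guessed GDDR6 PHY + arch-enhancement overhead
print(f"linear scaling: ~{linear_mm2:.0f} mm2")                  # ~320 mm2
print(f"with I/O/arch overhead: ~{linear_mm2 + extra_lo:.0f}-"
      f"{linear_mm2 + extra_hi:.0f} mm2")                        # ~350-400 mm2
```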

Based on the early rumors about the problems they were having, I don't think the yield issue is a functional one, but rather something else. The rumors were that they had a hard time making large Gen 10 GPUs on Cannonlake.

A functional yield problem doesn't really match with a GPU, since GPUs can yield higher due to their repetitive structure. I'm guessing 10nm didn't really save power at higher frequencies, limiting performance.

Gen 11 is probably rearchitected (along with 10nm) to account for that.
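
To illustrate the repetitive-structure point, here's a toy Poisson defect model (all numbers are illustrative, not Intel's): a die whose single defect lands in a redundant EU can still be sold with that EU fused off, so effective yield is much higher than the perfect-die rate.

```python
# Toy Poisson defect model: why repetitive GPU dies harvest well.
# All numbers below are illustrative, not Intel's.
from math import exp

def poisson_p(k, lam):
    """P(exactly k defects) for a defect count ~ Poisson(lam)."""
    p = exp(-lam)
    for i in range(1, k + 1):
        p *= lam / i
    return p

die_cm2 = 3.5   # ~350 mm2 die, per the estimate above
d0 = 0.2        # defects per cm^2, illustrative
lam = die_cm2 * d0

perfect = poisson_p(0, lam)                       # fully working dies
# If one defect hits, assume ~90% odds it lands in the repetitive EU
# array and can be fused off (EU harvesting) rather than killing the die.
harvestable = perfect + poisson_p(1, lam) * 0.9
print(f"perfect: {perfect:.0%}, sellable with EU harvesting: {harvestable:.0%}")
```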

So, when someone has the time I would appreciate a (short) summary of what DG1 & DG2 are, expected due dates, and process nodes. I'm a bit confused - a lot of info seems to come from Twitter and Reddit - not my usual stomping grounds. I'm getting old I guess :(

There's very little info other than some driver based leaks.

DG1 had the "LP" or low power designation and DG2 had "HP" or high power/high performance designation, and that's the one with 128/256/512 EUs. Both look to be 10nm.

I don't know what the 2 means. Will it come later, with low power parts coming first? Or is it just indicating LP and HP?
 

jpiniero

Lifer
Oct 1, 2010
14,580
5,203
136
IntelUser2000 said:
There's very little info other than some driver based leaks.

DG1 had the "LP" or low power designation and DG2 had "HP" or high power/high performance designation, and that's the one with 128/256/512 EUs. Both look to be 10nm.

I don't know what the 2 means. Will it come later, with low power parts coming first? Or is it just indicating LP and HP?

That was kind of what I was getting at... just because the driver has it in there doesn't mean it will actually get released.
 

Dayman1225

Golden Member
Aug 14, 2017
1,152
974
146
Intel's 7nm GPGPU codename is Ponte Vecchio, which will be used in the exascale supercomputer named Aurora.

Ponte Vecchio will use Foveros 3D stacking and the CXL interconnect. Ponte Vecchio also supports "ultra high cache", "high memory bandwidth", and "high double precision FP throughput".

In a supposed other slide Intel lists all the markets that Xe will target:
  1. HPC/Exascale
  2. DL/Training
  3. Cloud GFX
  4. Media Transcode
  5. Analytics
  6. Workstation
  7. Gaming
  8. PC Mobile
  9. Ultra Mobile
The Aurora supercomputer will use two Sapphire Rapids CPUs and six Ponte Vecchio GPUs per node, as well as Intel's oneAPI.

Intel will share more details on November 17th.

Source - VideoCardz
 

DrMrLordX

Lifer
Apr 27, 2000
21,616
10,823
136
The original Foveros patent described a CPU, but it's likely relevant. It had chiplets on top of a mesh+L3 base die.

Oh I don't doubt it's possible to use it on a GPU design. I want to see what exactly is being stacked here, and how they're going to deal with heat. Are we talking multiple GPU dice stacked on top of one another? Are we talking GPU stacked on top of memory modules? The possibilities here are endless.
 

jpiniero

Lifer
Oct 1, 2010
14,580
5,203
136
DrMrLordX said:
Oh I don't doubt it's possible to use it on a GPU design. I want to see what exactly is being stacked here, and how they're going to deal with heat. Are we talking multiple GPU dice stacked on top of one another? Are we talking GPU stacked on top of memory modules? The possibilities here are endless.

Probably not that exotic. Maybe multiple GPU chiplets on top of the base die, and HBM2/3 connected via EMIB like Kaby-G.
 

Ajay

Lifer
Jan 8, 2001
15,422
7,842
136
jpiniero said:
Probably not that exotic. Maybe multiple GPU chiplets on top of the base die, and HBM2/3 connected via EMIB like Kaby-G.
Well, IIRC, a typical HPC node has 4 mezzanine GPUs per board. With 6 GPUs per mainboard, a small mezzanine card may be required. Stacking the likely large amount of HBM above or below the GPU die/dice would reduce the size of the mezzanine and minimize the mainboard space required. Anyway, just a thought.

I am really interested in how Granite Rapids may use Foveros. Interesting times ahead.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
jpiniero said:
Probably not that exotic. Maybe multiple GPU chiplets on top of the base die, and HBM2/3 connected via EMIB like Kaby-G.

I agree with this comment. They'll probably have the mesh and L3 like the patent, acting as the active interposer, with the GPU and HBM chips on top. If the die size is 300mm², then you can fit something like 800MB of SRAM cache on the bottom.

At that point, though, I assume they'll switch SRAM out for denser technologies, because when it's that big the speed advantages are nullified due to the travel distance.
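
A rough sanity check on that 800MB figure, assuming Intel's published ~0.0312µm² high-density 10nm SRAM bitcell and a guessed ~70% array efficiency:

```python
# Sanity check: how much SRAM fits on a ~300mm2 base die on 10nm?
# Assumptions: ~0.0312 um^2 high-density 10nm SRAM bitcell (Intel's
# published figure) and ~70% array efficiency (sense amps, decoders,
# redundancy, routing) -- the efficiency is a guess for this estimate.
DIE_AREA_MM2 = 300
BITCELL_UM2 = 0.0312
ARRAY_EFFICIENCY = 0.7

bits = DIE_AREA_MM2 * 1e6 / BITCELL_UM2 * ARRAY_EFFICIENCY
print(f"~{bits / 8 / 2**20:.0f} MB of SRAM")   # ~800 MB
```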
 

Ajay

Lifer
Jan 8, 2001
15,422
7,842
136
IntelUser2000 said:
I agree with this comment. They'll probably have the mesh and L3 like the patent, acting as the active interposer, with the GPU and HBM chips on top. If the die size is 300mm², then you can fit something like 800MB of SRAM cache on the bottom.

Well, not sure about Intel, but NV and AMD GPU caches are a bit more convoluted than in CPUs. Moving the LLC off die would create more problems than it's worth, IMHO. GPU cache sizes are being increased, but high-bandwidth, low-latency caches are still essential for GPU performance. Stacking GPU DRAM makes a lot of sense though. The HPC-targeted GPU could be built using chiplets for a larger size, as compute workloads are already designed to be allocated to multiple discrete GPU targets.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Ajay said:
Well, not sure about Intel, but NV and AMD GPU caches are a bit more convoluted than in CPUs. Moving the LLC off die would create more problems than it's worth, IMHO.

This is what Foveros allows, because it's true 3D stacking: it has immense bandwidth and low power per bit, not dissimilar to monolithic dies. The cost and difficulty of production are high, but if you are aiming it at the datacenter and selling it for $3000+ like with Quadro/Tesla, it can be justified. It's like how Volta uses a custom process with an 815mm² die and a silicon interposer on top of that.

They'll likely use EMIB to connect with HBM2, though. The reason I don't think they'll stack DRAM is because a) anything beyond EMIB is not needed, b) the cost is high, and c) Intel doesn't manufacture HBM, leading to further complexities. 3D stacks such as Foveros might be a waste, because the bottleneck shifts from the interconnect to the media (in this case DRAM) itself. Maybe they'll put their eDRAM to use, or STT-MRAM like in some early patents.

While the production costs and difficulty of new processes are increasing, we see products being sold at an ever higher end as well. Up until 2005 prices were decreasing, but since then computer prices have been going the other way. At the same time we're getting exotic technologies such as 3D stacking and HBM. These may always stay at the high end, since we're past the days when semiconductors became cheaper every two years.
 

jpiniero

Lifer
Oct 1, 2010
14,580
5,203
136

It appears that DG1 is indeed 128 EUs; this one has 96 enabled. So yeah, it does look like Intel is only going to compete in the low end. That might just be because of the 10nm yields, in that they don't bother with anything bigger/more competitive at this point.
 

mikk

Diamond Member
May 15, 2012
4,133
2,136
136
DG1, which is based on Gen12LP, competes in the low end; DG2, based on Gen12HP, in the higher-end market. Intel just confirmed at HPC DevCon that Gen12LP operates in the 5-20W range, so yes, this is a 96EU or 128EU low-end part. But don't be mistaken and believe Intel won't go higher with their Xe.
 

Dayman1225

Golden Member
Aug 14, 2017
1,152
974
146
mikk said:
DG1, which is based on Gen12LP, competes in the low end; DG2, based on Gen12HP, in the higher-end market. Intel just confirmed at HPC DevCon that Gen12LP operates in the 5-20W range, so yes, this is a 96EU or 128EU low-end part. But don't be mistaken and believe Intel won't go higher with their Xe.

I'd like to add that they also said it could scale to 50W.
 

swilli89

Golden Member
Mar 23, 2010
1,558
1,181
136
Hot take: Intel's Xe will never be released in consumer form. It will never be competitive in Vulkan/DX12 versus NV or AMD.

What it will be used for is supercomputer and enterprise applications, where Intel can help with custom libraries and support for specific data sets and algorithms. It will probably succeed in this space, but it will never be a Radeon/GeForce competitor.
 

DrMrLordX

Lifer
Apr 27, 2000
21,616
10,823
136
So Intel wants nothing to do with stuff like OpenCAPI, Gen-Z, or CCIX. Hmm. Also, it looks like Ice Lake-SP is going to be late 2020 (surprise!), so if Intel is delivering Sapphire Rapids in 2021, that gives Ice Lake-SP a year or less of market presence.
 

lobz

Platinum Member
Feb 10, 2017
2,057
2,856
136
DrMrLordX said:
So Intel wants nothing to do with stuff like OpenCAPI, Gen-Z, or CCIX. Hmm. Also, it looks like Ice Lake-SP is going to be late 2020 (surprise!), so if Intel is delivering Sapphire Rapids in 2021, that gives Ice Lake-SP a year or less of market presence.
They are also saying it comes on 7nm in 2021... I'm calling 10000% utter BS here, unless they mean foundry manufacturing. Which I also don't believe.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
"EMIB for HBM"
"Foveros for RAMBO cache"

Pretty much expected. :)

40x DP FP FLOPs per EU. The Gen 9 GPU has 1/4-rate FP64, so they are comparing it to Gen 11, which basically has the FP64 units removed. I wonder what Gen 11's DP FP compute is, then?

Unfortunately the information revealed today has little to no relevance for us.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Interesting thing regarding power for the Aurora nodes.

So Ian is estimating in his article that Aurora consists of 2400 nodes, with each node having 2x Sapphire Rapids CPUs and 6x Ponte Vecchio GPUs. Based on that, he's estimating the Ponte Vecchio portion having a DP FP throughput of 67 TFLOPS.

The new systems are aiming for the 40MW power range. If you assume 90% of that power is used by the nodes, then we end up with 15kW (15,000W) of power per node. If we further assume 80% of that is taken up by the compute GPUs, that's 12kW for the six GPUs, or 2kW per GPU.

If we instead assume the nodes only take 80% of the power, the GPUs only take 70% of that, and there are 10% power conversion losses (meaning 40MW is the actual power used), then we end up with each GPU using 1.4kW.

1400-2000W for a compute-class GPU.
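
The same arithmetic as a quick script (the share fractions are, as stated above, pure assumptions):

```python
# Per-GPU power estimate for Aurora, following the assumptions above.
TOTAL_MW = 40
NODES = 2400
GPUS_PER_NODE = 6

def watts_per_gpu(node_share, gpu_share, conversion_eff=1.0):
    """node_share: fraction of facility power reaching the nodes,
    gpu_share: fraction of node power used by the GPUs,
    conversion_eff: 1 minus power conversion losses."""
    node_w = TOTAL_MW * 1e6 * conversion_eff * node_share / NODES
    return node_w * gpu_share / GPUS_PER_NODE

print(f"{watts_per_gpu(0.9, 0.8):.0f} W")        # ~2000 W per GPU
print(f"{watts_per_gpu(0.8, 0.7, 0.9):.0f} W")   # ~1400 W per GPU
```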