News Intel GPUs - Intel launches A580


Dayman1225

Golden Member
Aug 14, 2017
1,152
974
146
If the release is going to be in June, that doesn't seem like that much time from when they have an actual working sample.
Clarification - Swan said "Power on exit" which apparently means this:

that means the initial validation testing Intel has been carrying out on its first discrete Xe graphics card in its own labs has been finished, and finished satisfactorily. Which we assume means the next phase of the process is to get the hardware out for external validation with developers and system builders, whether laptop or desktop.
 

jpiniero

Lifer
Oct 1, 2010
14,580
5,203
136
The next question to me is how high up the food chain we're talking about. Are we talking about an Ampere "MX350" competitor, or something better than that?
 

beginner99

Diamond Member
Jun 2, 2009
5,210
1,580
136
jpiniero said:
The next question to me is how high up the food chain we're talking about. Are we talking about an Ampere "MX350" competitor, or something better than that?

Obviously this isn't known. I thought it was on 14nm, but now that I'm writing this, it raises another question: what process is actually used? If it's 10nm, it can't be big...
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
beginner99 said:
Obviously this isn't known. I thought it was on 14nm, but now that I'm writing this, it raises another question: what process is actually used? If it's 10nm, it can't be big...

The top one is 512 EUs with what's likely a GDDR6 interface. It can't be that small either. Gen 11 with 64 EUs is ~40mm². If you scale that up directly you get a 300-350mm² die size. Probably 350-400mm² if you consider the added I/O interface and the enhanced architecture.
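
A quick back-of-envelope of that scaling (the 64 EU / ~40mm² baseline is from above; the I/O and architecture adder is just my guess):

```python
# Back-of-envelope: scale Gen 11's EU-to-area ratio up to 512 EUs.
# The ~40mm2 / 64 EU baseline is from the post above; the I/O +
# architecture overhead range is purely a guess.
GEN11_EUS = 64
GEN11_AREA_MM2 = 40

target_eus = 512
linear_mm2 = GEN11_AREA_MM2 * target_eus / GEN11_EUS  # pure linear scaling

extra_lo, extra_hi = 30, 80  # guessed GDDR6 PHY + arch-enhancement overhead
print(f"linear scaling: ~{linear_mm2:.0f} mm2")                  # ~320 mm2
print(f"with I/O/arch overhead: ~{linear_mm2 + extra_lo:.0f}-"
      f"{linear_mm2 + extra_hi:.0f} mm2")                        # ~350-400 mm2
```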

Based on the early rumors about the problems they were having, I don't think the yield issue is a functional one, but rather something else. The rumors were that they had a hard time making large Gen 10 GPUs on Cannonlake.

A functional yield problem doesn't really match with a GPU, since GPUs can yield higher due to their repetitive structure. I'm guessing 10nm didn't really save power at higher frequencies, limiting performance.

Gen 11 is probably rearchitected (along with 10nm) to account for that.
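
To illustrate the repetitive-structure point, here's a toy Poisson defect model (all numbers are illustrative, not Intel's): a die whose single defect lands in a redundant EU can still be sold with that EU fused off, so effective yield is much higher than the perfect-die rate.

```python
# Toy Poisson defect model: why repetitive GPU dies harvest well.
# All numbers below are illustrative, not Intel's.
from math import exp

def poisson_p(k, lam):
    """P(exactly k defects) for a defect count ~ Poisson(lam)."""
    p = exp(-lam)
    for i in range(1, k + 1):
        p *= lam / i
    return p

die_cm2 = 3.5   # ~350 mm2 die, per the estimate above
d0 = 0.2        # defects per cm^2, illustrative
lam = die_cm2 * d0

perfect = poisson_p(0, lam)                       # fully working dies
# If one defect hits, assume ~90% odds it lands in the repetitive EU
# array and can be fused off (EU harvesting) rather than killing the die.
harvestable = perfect + poisson_p(1, lam) * 0.9
print(f"perfect: {perfect:.0%}, sellable with EU harvesting: {harvestable:.0%}")
```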

So, when someone has the time I would appreciate a (short) summary of what DG1 & DG2 are, expected due dates, and process nodes. I'm a bit confused - a lot of info seems to come from Twitter and Reddit - not my usual stomping grounds. I'm getting old I guess :(

There's very little info other than some driver based leaks.

DG1 had the "LP" or low power designation and DG2 had "HP" or high power/high performance designation, and that's the one with 128/256/512 EUs. Both look to be 10nm.

I don't know what the 2 means. Will it come later, with low power parts coming first? Or is it just indicating LP and HP?
 

jpiniero

Lifer
Oct 1, 2010
14,580
5,203
136
IntelUser2000 said:
There's very little info other than some driver based leaks.

DG1 had the "LP" or low power designation and DG2 had "HP" or high power/high performance designation, and that's the one with 128/256/512 EUs. Both look to be 10nm.

I don't know what the 2 means. Will it come later, with low power parts coming first? Or is it just indicating LP and HP?

That was kind of what I was getting at... just because the driver has it in there doesn't mean it will actually get released.
 

Dayman1225

Golden Member
Aug 14, 2017
1,152
974
146
Intel's 7nm GPGPU codename is Ponte Vecchio, which will be used in the exascale supercomputer named Aurora.

Ponte Vecchio will use Foveros 3D stacking and the CXL interconnect. Ponte Vecchio also supports "ultra high cache", "high memory bandwidth", and "high double precision FP throughput".

In a supposed other slide Intel lists all the markets that Xe will target:
  1. HPC/Exascale
  2. DL/Training
  3. Cloud GFX
  4. Media Transcode
  5. Analytics
  6. Workstation
  7. Gaming
  8. PC Mobile
  9. Ultra Mobile
The Aurora supercomputer will use two Sapphire Rapids CPUs and six Ponte Vecchio GPUs per node, as well as Intel's oneAPI.

Intel will share more details on November 17th.

Source - VideoCardz
 

DrMrLordX

Lifer
Apr 27, 2000
21,616
10,823
136
The original Foveros patent described a CPU, but it's likely relevant. It had chiplets on top of a mesh+L3 base die.

Oh I don't doubt it's possible to use it on a GPU design. I want to see what exactly is being stacked here, and how they're going to deal with heat. Are we talking multiple GPU dice stacked on top of one another? Are we talking GPU stacked on top of memory modules? The possibilities here are endless.
 

jpiniero

Lifer
Oct 1, 2010
14,580
5,203
136
DrMrLordX said:
Oh I don't doubt it's possible to use it on a GPU design. I want to see what exactly is being stacked here, and how they're going to deal with heat. Are we talking multiple GPU dice stacked on top of one another? Are we talking GPU stacked on top of memory modules? The possibilities here are endless.

Probably not that exotic. Maybe multiple GPU chiplets on top of the base die, and HBM2/3 connected via EMIB like Kaby-G.
 

Ajay

Lifer
Jan 8, 2001
15,422
7,842
136
jpiniero said:
Probably not that exotic. Maybe multiple GPU chiplets on top of the base die, and HBM2/3 connected via EMIB like Kaby-G.
Well, IIRC, a typical HPC node has 4 mezzanine GPUs per board. With 6 GPUs per mainboard, a small mezzanine card may be required. Stacking the likely large amount of HBM above or below the GPU die/dice would reduce the size of the mezzanine and minimize the mainboard space required. Anyway, just a thought.

I am really interested in how Granite Rapids may use Foveros. Interesting times ahead.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
jpiniero said:
Probably not that exotic. Maybe multiple GPU chiplets on top of the base die, and HBM2/3 connected via EMIB like Kaby-G.

I agree with this comment. They'll probably have the mesh and L3 like the patent, acting as the active interposer, with the GPU and HBM chips on top. If the die size is 300mm², then you can fit something like 800MB of SRAM cache on the bottom.

At that point, though, I assume they'll switch SRAM out for denser technologies, because when it's that big the speed advantages are nullified due to the travel distance.
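
A rough sanity check on that 800MB figure, assuming Intel's published ~0.0312µm² high-density 10nm SRAM bitcell and a guessed ~70% array efficiency:

```python
# Sanity check: how much SRAM fits on a ~300mm2 base die on 10nm?
# Assumptions: ~0.0312 um^2 high-density 10nm SRAM bitcell (Intel's
# published figure) and ~70% array efficiency (sense amps, decoders,
# redundancy, routing) -- the efficiency is a guess for this estimate.
DIE_AREA_MM2 = 300
BITCELL_UM2 = 0.0312
ARRAY_EFFICIENCY = 0.7

bits = DIE_AREA_MM2 * 1e6 / BITCELL_UM2 * ARRAY_EFFICIENCY
print(f"~{bits / 8 / 2**20:.0f} MB of SRAM")   # ~800 MB
```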
 

Ajay

Lifer
Jan 8, 2001
15,422
7,842
136
IntelUser2000 said:
I agree with this comment. They'll probably have the mesh and L3 like the patent, acting as the active interposer, with the GPU and HBM chips on top. If the die size is 300mm², then you can fit something like 800MB of SRAM cache on the bottom.

Well, not sure about Intel, but NV and AMD GPU caches are a bit more convoluted than in CPUs. Moving the LLC off die would create more problems than it's worth, IMHO. GPU cache sizes are being increased, but high-bandwidth, low-latency caches are still essential for GPU performance. Stacking GPU DRAM makes a lot of sense though. The HPC-targeted GPU could be built using chiplets for a larger size, as compute workloads are already designed to be allocated to multiple discrete GPU targets.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Ajay said:
Well, not sure about Intel, but NV and AMD GPU caches are a bit more convoluted than in CPUs. Moving the LLC off die would create more problems than it's worth, IMHO.

This is what Foveros allows, because it's true 3D stacking: it has immense bandwidth and low power per bit, not dissimilar to monolithic dies. The cost and difficulty of production are high, but if you are aiming it at the datacenter and selling it for $3000+ like with Quadro/Tesla, it can be justified. It's like how Volta uses a custom process with an 815mm² die and a silicon interposer on top of that.

They'll likely use EMIB to connect with HBM2, though. The reason I don't think they'll stack DRAM is because a) anything beyond EMIB is not needed, b) the cost is high, and c) Intel doesn't manufacture HBM, leading to further complexities. 3D stacks such as Foveros might be a waste, because the bottleneck shifts from the interconnect to the media (in this case DRAM) itself. Maybe they'll put their eDRAM to use, or STT-MRAM like in some early patents.

While the production costs and difficulty of new processes are increasing, we see products being sold at an ever higher end as well. Up until 2005 prices were decreasing, but since then computer prices have been going the other way. At the same time we're getting exotic technologies such as 3D stacking and HBM. These may always stay at the high end, since we're past the days when semiconductors became cheaper every two years.
 

jpiniero

Lifer
Oct 1, 2010
14,580
5,203
136

It appears that DG1 is indeed 128 EUs; this one has 96 enabled. So yeah, it does look like Intel is only going to compete in the low end. That might just be because of the 10nm yields, in that they don't bother with anything bigger/more competitive at this point.
 

mikk

Diamond Member
May 15, 2012
4,133
2,136
136
DG1, which is based on Gen12LP, competes in the low end; DG2, based on Gen12HP, in the higher-end market. Intel just confirmed at HPC DevCon that Gen12LP operates in the 5-20W range, so yes, this is a 96EU or 128EU low-end part. But don't be mistaken and believe Intel won't go higher with their Xe.
 

Dayman1225

Golden Member
Aug 14, 2017
1,152
974
146
mikk said:
DG1, which is based on Gen12LP, competes in the low end; DG2, based on Gen12HP, in the higher-end market. Intel just confirmed at HPC DevCon that Gen12LP operates in the 5-20W range, so yes, this is a 96EU or 128EU low-end part. But don't be mistaken and believe Intel won't go higher with their Xe.

I'd like to add that they also said it could scale to 50W.
 

swilli89

Golden Member
Mar 23, 2010
1,558
1,181
136
Hot take: Intel's Xe will never be released in consumer form. It will never be competitive in Vulkan/DX12 versus NV or AMD.

What it will be used for is supercomputer and enterprise applications, where Intel can help with custom libraries and support for specific data sets and algorithms. It will probably succeed in this space, but it will never be a Radeon/GeForce competitor.
 

DrMrLordX

Lifer
Apr 27, 2000
21,616
10,823
136
So Intel wants nothing to do with stuff like OpenCAPI, Gen-Z, or CCIX. Hmm. Also, it looks like Ice Lake-SP is going to be late 2020 (surprise!), so if Intel is delivering Sapphire Rapids in 2021, that gives Ice Lake-SP a year or less of market presence.
 

lobz

Platinum Member
Feb 10, 2017
2,057
2,856
136
DrMrLordX said:
So Intel wants nothing to do with stuff like OpenCAPI, Gen-Z, or CCIX. Hmm. Also, it looks like Ice Lake-SP is going to be late 2020 (surprise!), so if Intel is delivering Sapphire Rapids in 2021, that gives Ice Lake-SP a year or less of market presence.
They are also saying it comes on 7nm in 2021... I'm calling 10000% utter BS here, unless they mean foundry manufacturing. Which I also don't believe.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
"EMIB for HBM"
"Foveros for RAMBO cache"

Pretty much expected. :)

40x DP FP FLOPs per EU. The Gen 9 GPU has 1/4-rate FP64, so they are comparing it to Gen 11, which basically has the FP64 units removed. I wonder what Gen 11's DP FP compute is, then?

Unfortunately the information revealed today has little to no relevance for us.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Interesting thing regarding power for the Aurora nodes.

So Ian is estimating in his article that Aurora consists of 2400 nodes, with each node having 2x Sapphire Rapids CPUs and 6x Ponte Vecchio GPUs. Based on that, he's estimating the Ponte Vecchio portion having a DP FP throughput of 67 TFLOPS.

The new systems are aiming for the 40MW power range. If you assume 90% of that power is used by the nodes, then we end up with 15kW (15,000W) of power per node. If we further assume 80% of that is taken up by the compute GPUs, that's 12kW for the six GPUs, or 2kW per GPU.

If we instead assume the nodes only take 80% of the power, the GPUs only take 70% of that, and there are 10% power conversion losses (meaning 40MW is the actual power used), then we end up with each GPU using 1.4kW.

1400-2000W for a compute-class GPU.
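
The same arithmetic as a quick script (the share fractions are, as stated above, pure assumptions):

```python
# Per-GPU power estimate for Aurora, following the assumptions above.
TOTAL_MW = 40
NODES = 2400
GPUS_PER_NODE = 6

def watts_per_gpu(node_share, gpu_share, conversion_eff=1.0):
    """node_share: fraction of facility power reaching the nodes,
    gpu_share: fraction of node power used by the GPUs,
    conversion_eff: 1 minus power conversion losses."""
    node_w = TOTAL_MW * 1e6 * conversion_eff * node_share / NODES
    return node_w * gpu_share / GPUS_PER_NODE

print(f"{watts_per_gpu(0.9, 0.8):.0f} W")        # ~2000 W per GPU
print(f"{watts_per_gpu(0.8, 0.7, 0.9):.0f} W")   # ~1400 W per GPU
```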