Discussion: Next Navi GPU specifications (Hypothetical!)


extide

Senior member
MCM != chiplet.
Not all MCMs are chiplet-based, but all chiplet packages are MCMs. They are talking about four compute dies on a single package (along with HBM in this example, but that's not strictly necessary). So your statement here is pointless.

Did you even read the paper (and workloads in it)?
Yes, I have read the whole thing; it's been a while, but yes. It does seem they are using all compute benchmarks here; I thought there were some graphics workload examples too. In any case, the optimizations that they discussed in the article should apply just fine to graphics workloads as well.

You are basically just crapping all over this thread without actually contributing anything useful, so it's probably best to just ignore your posts in this thread anyway.
 

Yotsugi

Golden Member
Not all MCMs are chiplet-based, but all chiplet packages are MCMs.
W-rong.
I can do chiplets in 3D SoIC and it's not gonna be MCM.
It does seem they are using all compute benchmarks here
They didn't do that just because.
They did that for a reason.
In any case, the optimizations that they discussed in the article should apply just fine to graphics workloads as well.
Graphics isn't FP64 in a vacuum; hammer that into your head already.
 

Ottonomous

Senior member
Doesn't have "Some GFX10 bug with misaligned multi-dword LDS access in WGP mode."
Workgroup processing problem with LDS allocation? Aren't workgroups effectively limited by the LDS size? Why would there be misalignment in some forms of addressing?
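To make the question concrete, here is a rough CUDA-style sketch (shared memory standing in for LDS; names made up, purely illustrative of the access pattern, not the actual GFX10/WGP case) of what a "misaligned multi-dword" access looks like: a 64-bit read issued at an address that is only dword-aligned.

[CODE]
#include <cstdio>
#include <cstring>

__global__ void misaligned_multidword_demo(const int* in, long long* out)
{
    __shared__ int lds[256];              // stand-in for LDS
    int tid = threadIdx.x;
    lds[tid] = in[tid];
    __syncthreads();

    // A 64-bit (two-dword) read starting 4 bytes into the array: the address
    // is dword-aligned but not naturally aligned for the 64-bit type, i.e. a
    // misaligned multi-dword access. memcpy keeps this well defined here; the
    // interesting part is how hardware/compilers split or handle such loads.
    const char* base = reinterpret_cast<const char*>(lds) + 4;
    long long v;
    memcpy(&v, base, sizeof(v));
    out[tid] = v;
}

int main()
{
    int* d_in;
    long long* d_out;
    cudaMalloc(&d_in, 256 * sizeof(int));
    cudaMalloc(&d_out, 256 * sizeof(long long));
    cudaMemset(d_in, 1, 256 * sizeof(int));
    misaligned_multidword_demo<<<1, 256>>>(d_in, d_out);
    cudaDeviceSynchronize();
    printf("done\n");
    return 0;
}
[/CODE]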
 

extide

Senior member
W-rong.
I can do chiplets in 3D SoIC and it's not gonna be MCM.

TSMC's press release literally says "Bonding chips together in a 3D structure will allow chip makers to utilise a multi-chip design while benefitting from low latency interconnects and fewer of the performance downsides that are seen in some of today's multi-chip products."
You are really grasping at straws there. I mean, MCM literally means an assembly with multiple chips on it.

So what's your point here? I mean, even you said there will be multi-chip GPUs in 5-7+ years, and obviously these companies are working on them right now; we wouldn't have a two-year-old research paper out in public otherwise. I really think we will see something a fair bit sooner than that, more in the 3-4 year range, and I wouldn't be surprised if it isn't AMD that brings out the first product. Or is your point that we will never see a multi-chip graphics GPU? "Never" is a pretty bold claim to make, and you'd need to bring some pretty solid evidence to back a statement like that.

Honestly, 3dfx was doing multi-chip graphics back in the '90s; it wasn't the most elegant solution, but it did work. Compared to compute, graphics adds some fixed-function stuff like geometry, texturing, and rasterization, which, sure, makes it harder, but not impossible. Frankly, all you are doing in this thread is telling people they are wrong and making snide comments while not adding anything useful or thoughtful. Why don't you try contributing something instead?
 

Yotsugi

Golden Member
Bonding chips together in a 3D structure will allow chip makers to utilise a multi-chip design while benefitting from low latency interconnects and fewer of the performance downsides that are seen in some of today's multi-chip products
MCM in OSAT terminology means multiple chips on an organic carrier.
I mean, even you said there will be multi-chip GPUs in 5-7+ years
That's me being very optimistic.
Honestly, 3dfx was doing multi-chip graphics back in the '90s; it wasn't the most elegant solution, but it did work
We're not talking mGPU.
We're talking fully coherent multi-die solutions.
 

extide

Senior member
MCM in OSAT terminology means multiple chips on an organic carrier.
So those guys wouldn't consider Pentium Pro or POWER5 to be MCM?

Splitting hairs here, surely.

We're talking fully coherent multi-die solutions.

And this is the only way we can get chiplet-based graphics for games.

And, funnily enough, that NVIDIA research paper fully covered this, including coming up with some basic optimizations to help overcome the reduced bandwidth and additional latency to 'remote' caches and memory.
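To give a rough idea of what those optimizations boil down to, here is a toy host-side sketch (my own made-up names and numbers, not code from the paper): schedule a thread block on the die whose local memory partition owns the data it touches, so most accesses never cross the on-package links.

[CODE]
#include <cstdio>

// Hypothetical model of locality-aware block placement on a four-die package.
struct McmGpuModel {
    int num_dies      = 4;        // compute dies on the package
    int pages_per_die = 1 << 20;  // pages mapped to each die's local partition

    // Which die's local memory partition "owns" a given page.
    int owning_die(long long page) const {
        return static_cast<int>((page / pages_per_die) % num_dies);
    }

    // Place a block on the die that owns the first page it will touch, in the
    // spirit of the first-touch/locality ideas discussed in the paper.
    int die_for_block(long long first_page_touched) const {
        return owning_die(first_page_touched);
    }
};

int main()
{
    McmGpuModel gpu;
    long long pages[] = {0, 1LL << 20, 3LL << 20, 5LL << 20};
    for (long long page : pages)
        printf("block touching page %lld -> die %d\n", page, gpu.die_for_block(page));
    return 0;
}
[/CODE]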
 

Yotsugi

Golden Member
So those guys wouldn't consider Pentium Pro or POWER5 to be MCM?
These are too ancient for modern OSAT terminology.
funnily enough, that NVIDIA research paper fully covered this
Unfortunately for you, it didn't cover jobs that don't easily scale to n chips/boards/nodes/you name it.
Jobs like realtime computer graphics.
 

Yotsugi

Golden Member
Why don't they? Specifically
Very tangible synchronisation and memory access overhead for anything graphics that could easily kill perf.
It kind of hurts compute too, particularly training (hence why we use xboxhueg dies like V100/Spring Crest and weird scale-up setups like DGX-2/whatever Nervana is doing), but it's manageable there.
 

extide

Senior member
Very tangible synchronisation and memory access overhead for anything graphics that could easily kill perf.
It kind of hurts compute too, particularly training (hence why we use xboxhueg dies like V100/Spring Crest and weird scale-up setups like DGX-2/whatever Nervana is doing), but it's manageable there.

Compute suffers just as much from synchronisation and memory access overhead. (Love how you added "Very tangible" to your statement.) This is all discussed in the paper, and as I have mentioned at least twice before, they presented some simple optimizations in the paper to help overcome those issues. You don't need to entirely eliminate the overhead, just reduce it enough that you net more performance because of the greater overall GPU resources you have available; as a purely hypothetical example, four dies scaling at only 80% efficiency would still land around 3.2x a single die. The paper presents it very well: they talk about the performance of the largest GPU you could physically build, and then ways to get more performance than that.

If you are trying to come up with a good argument as to why compute is easy to do on multi-chip, but graphics is not, you need to talk about problems that are unique to (or at least more difficult on) graphics. You keep mentioning that this is "impossible for graphics" but you haven't said anything specifically about how the additional aspects of graphics (stuff like geometry, rasterization, texturing, etc) cannot scale across multiple dies. Rasterization is already done in tiles on a lot of architectures, and each tile can be small and worked on independently. In fact, this approach was originally devised to save on memory accesses and improve data locality -- exactly what we need here. Geometry and texturing can similarly be done by splitting the scene up into chunks. I do agree that graphics is more difficult, but I do not agree that it is impossible.
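Just to sketch the kind of split I mean (a toy example of my own, not any vendor's actual design): interleave screen tiles across the dies so each die rasterizes the tiles it owns and keeps those tiles' render targets in its local memory partition.

[CODE]
#include <cstdio>

// Toy model: a 3840x2160 screen cut into 64x64 tiles, interleaved across
// four compute dies for data locality. A real scheme would also balance load.
struct TilePartition {
    int tile_size = 64;
    int tiles_x   = (3840 + 63) / 64;
    int tiles_y   = (2160 + 63) / 64;
    int num_dies  = 4;

    int die_for_tile(int tx, int ty) const {
        return (ty * tiles_x + tx) % num_dies;   // simple interleave
    }

    int die_for_pixel(int x, int y) const {
        return die_for_tile(x / tile_size, y / tile_size);
    }
};

int main()
{
    TilePartition part;
    printf("pixel (0,0)       -> die %d\n", part.die_for_pixel(0, 0));
    printf("pixel (1920,1080) -> die %d\n", part.die_for_pixel(1920, 1080));
    return 0;
}
[/CODE]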

Not sure why you mention DGX as an example of avoiding synchronisation and memory access overhead, as that uses multi-GPU over NVLink and has much worse synchronisation and memory access overhead than even a hypothetical multi-chip GPU using on-package interconnects.

Also, I just want to add that the whole discussion about what is or isn't an MCM is pointless. The point is whether a GPU manufacturer has chosen to invest in solutions for overcoming the hurdles that building a GPU out of multiple dies presents. For the purposes of this discussion it doesn't really matter whether the dies sit on an organic substrate, a ceramic one, a silicon interposer, or something more exotic. Sure, the more exotic options will probably allow better interconnects, but that's not the point here; remember, your argument is that multi-die graphics GPUs are essentially impossible and that it will be 5-7+ years before we see any multi-chip GPU solution.
 

Yotsugi

Golden Member
Compute suffers just as much from synchronisation and memory access overhead
Nowhere near as much as graphics.
Try scaling anything real-time to at least multiple GPUs.
If you are trying to come up with a good argument as to why compute is easy to do on multi-chip, but graphics is not, you need to talk about problems that are unique to (or at least more difficult on) graphics
Easily, but not here; you can always annoy the likes of sebbbi on Twitter if you want to.
specifically about how the additional aspects of graphics (stuff like geometry, rasterization, texturing, etc)
Computer graphics isn't just fixed-function stuff.
Geometry and texturing can similarly be done by splitting the scene up into chunks.
Congrats, you're doing good old mGPU.
Next.
Not sure why you mention DGX as an example of avoiding synchronisation and memory access overhead
Where did I say "avoiding"?
Your headcanon doesn't work here.
 

extide

Senior member
Nowhere near as much as graphics.
Try scaling anything real-time to at least multiple GPUs.

No substance there.

Easily, but not here; you can always annoy the likes of sebbbi on Twitter if you want to.

Nope, none here either.

Where did I say "avoiding"?
Your headcanon doesn't work here.

Let's take a look at the statement:

Very tangible synchronisation and memory access overhead for anything graphics that could easily kill perf.
It kind of hurts compute too, particularly training (hence why we use xboxhueg dies like V100/Spring Crest and weird scale-up setups like DGX-2/whatever Nervana is doing), but it's manageable there.

You basically say "there is synchronization and memory access overhead for graphics, which kind of hurts compute too, so we use big monolithic GPUs like V100." That part makes sense, because big monolithic GPUs avoid the synchronization and memory access overhead you get from multi-die. Then you add "and weird scale-up setups like DGX-2," which flies completely against the first half of the statement, because DGX-2 is pretty much the worst-case scenario for synchronization and memory access overhead.

In any case, NO SUBSTANCE! You are basically making a hand-wavy argument that it doesn't work "just because," and it's really starting to seem like you don't understand enough about how graphics workloads actually work to make a solid argument as to exactly why they can't be scaled across multiple dies. But oh the latency!!! Then make sure you have good data locality and build intelligent local caches to hide that deficit. (Again, some good techniques are discussed in the whitepaper.)
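And "intelligent local caches" isn't hand-waving either; even a toy sketch (my own, not the paper's actual cache design) shows why keeping a small local copy of remote data slashes traffic over the inter-die links:

[CODE]
#include <cstdio>
#include <unordered_map>

// Hypothetical cache for data that lives in another die's memory partition:
// repeated reads of the same line only cross the on-package link once.
struct RemoteDataCache {
    std::unordered_map<long long, long long> lines;  // line address -> data
    long long remote_fetches = 0;

    long long read(long long line_addr) {
        auto it = lines.find(line_addr);
        if (it != lines.end())
            return it->second;           // hit: served from the local die
        ++remote_fetches;                // miss: one trip over the link
        long long data = line_addr * 2;  // stand-in for the remote read
        lines[line_addr] = data;
        return data;
    }
};

int main()
{
    RemoteDataCache cache;
    for (int i = 0; i < 4; ++i)
        cache.read(42);                  // only the first read goes remote
    printf("remote fetches: %lld\n", cache.remote_fetches);
    return 0;
}
[/CODE]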
 

Yotsugi

Golden Member
No substance there.
No, genuinely go try it.
Then try some CUDA.
Nope, none here either
Do you expect me to write you some examples for free?
Jeez.
which flies completely against the first half of the statement, because DGX-2 is pretty much the worst-case scenario for synchronization and memory access overhead
It's like you never did anything that already scales to multiple nodes.
High bandwidth and coherent > p2p weirdness > going off the node.
But oh the latency!!!
Wow, your headcanon is getting stronger.
Impressive in your desperation.
Very cute though.
 

extide

Senior member
No, genuinely go try it.
Then try some CUDA.

Do you expect me to write you some examples for free?
Jeez.

An example of how it is difficult to scale to multiple discrete GPUs is not an example of why it is harder or impossible to scale graphics than compute to multiple dies. NVIDIA's research paper already proves that it is possible to scale compute across multiple dies; that's not even a question. You keep forgetting your argument: that graphics is somehow impossible to do on multiple dies.

It's like you never did anything that already scales to multiple nodes.
High bandwidth and coherent > p2p weirdness > going off the node.

This is random gibberish. What are you even trying to say here? And why do you bring DGX-2 into the discussion when you are trying to say that multi-die graphics is impossible, yet DGX-2 has more latency and less bandwidth than any multi-chip solution would ever have? People use DGX-2 because they figure out ways to mitigate the shortcomings of the platform in exchange for its greater performance, which is, funnily enough, exactly what you'd have to do for multi-die. Kind of shooting yourself in the foot there.


Wow, your headcanon is getting stronger.
Impressive in your desperation.
Very cute though.

I mean, I suppose if you can't even formulate a cohesive argument you can resort to this... Or is this some sort of real-life example of the Dunning–Kruger effect, where I am the one bringing a reference into the argument while you are using nothing but hand-wavy magic, yet you fire 'headcanon' at me? Nice.
 

Yotsugi

Golden Member
An example of how it is difficult to scale to multiple discrete GPUs is not an example of why it is harder or impossible to scale graphics than compute to multiple dies
It's the very same thing, except with tons more bandwidth between GPUs.
This is random gibberish
You never having done any mGPU work doesn't make it gibberish.
What kills mGPU graphics hurts compute too.
Fortunately, for some workloads we mitigate that by replacing PCIe p2p with something faster and coherent, and then just making a fatter node.
People use DGX-2 because they figure out ways to mitigate the shortcomings of the platform in exchange for its greater performance
People use DGX-2 for those jobs that scale better with fatter nodes instead of node count.
You know, the good old scale-up versus scale-out.
 

Guru

Senior member
There is no need for a chiplet design for GPUs right now for gaming workloads, because the drawbacks are quite big and the advantages small. For compute and AI, a chiplet design does make sense, and it could be done without many of the negatives that apply to gaming.