The solution to midrange vs. high-end markets... Why can't AMD make a scalable GPU?

Irenicus

Member
Jul 10, 2008
94
0
0
This applies to Nvidia as well, but I don't care about them as much.

People always talk about yields being lower on a new process, hence the cost of larger dies being much greater with a new generation. This leads companies like AMD and Nvidia to choose smaller die sizes initially while yields improve over time.



Multi-GPU often sucks; it's dependent on developer support, and even with that it often has issues.


So why can't AMD create a Megazord-like GPU? A GPU where several smaller GPU dies come together to form a single larger die and, most importantly, BEHAVE like a single larger die? What is the technical constraint there?


Is it the proximity to the memory? The inability to properly share the same data across the different smaller GPU pieces at the same time?

But can't there be a common pool of memory stuck somewhere on the board, with optical interconnects to the individual GPU components? I thought that was the whole point of all that talk about The Machine HP was building and later backed away from?

Would optical interconnects between the different GPU dies be insufficient to provide enough bandwidth and speed to work as a single unit? What else is the constraint here?

There must be an answer. Or is this the kind of thing AMD means on that Navi slide when talking about a more scalable GPU?


If AMD had this tech, there would be no need to choose between targeting the midrange and the high end; they could create smaller dies that are less expensive due to the way wafer defects affect the economics of chip production, and scale them up to whatever they wanted.
 

Bacon1

Diamond Member
Feb 14, 2016
3,430
1,018
91
GPUs are basically already like that. Unlike CPUs, which have 2-8 cores, GPUs have thousands. What makes a big GPU big is having 2-3x the number of "cores" of the smaller GPUs.

Think of a 2-core CPU as a 1,000-core GPU and an 8-core CPU as a 4,000-core GPU.

The reason mGPU doesn't work well is that the GPUs have to pass each other information about what work each is going to do and what work the other has done (and often share texture data and everything else).
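
As a toy illustration of why that overhead hurts scaling (all numbers here are made up): the compute work divides across GPUs, but the per-frame cost of exchanging work assignments and shared data does not.

```python
# Toy scaling model: compute splits across GPUs, but the per-frame cost of
# synchronizing and sharing data between them is paid regardless of GPU count.
COMPUTE_MS = 16.0  # hypothetical per-frame compute time on a single GPU
SYNC_MS = 4.0      # hypothetical per-frame cost of exchanging work/texture data

for n_gpus in (1, 2, 4):
    frame_ms = COMPUTE_MS / n_gpus + (SYNC_MS if n_gpus > 1 else 0.0)
    print(f"{n_gpus} GPU(s): {frame_ms:5.1f} ms/frame, {COMPUTE_MS / frame_ms:.2f}x speedup")
```

Under those made-up numbers, four GPUs only get you a 2x speedup.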
 

Fallen Kell

Diamond Member
Oct 9, 1999
6,009
417
126
Most GPUs have a small portion of the die that is designed to tolerate a manufacturing defect, so the chip can be binned as the next card down in the performance/market segment. But no one does this for the entire card. The constraint is the product design, testing, and cost associated with it. While in theory it sounds all well and good to sell larger dies that had more of their portions fail during manufacturing as a smaller/less powerful product line, the reality is that the manufacturing process eventually matures and you end up selling fully working dies with a portion intentionally disabled, at a lower price than you could get for that exact same part. Your fixed costs on that piece are still the same, and you are now intentionally saying, "I am going to sell this for less money than I could otherwise get for the product," to fill a specific market segment.

It also costs a lot more money to design such a product, money that AMD quite frankly does not have. If you haven't looked at their financials, they have been bleeding cash for the last 5 years!
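
To put some rough numbers behind the yield side of this, here is a minimal sketch using the textbook Poisson yield model; the die sizes, defect density, and wafer cost below are made-up round numbers, only the 300 mm wafer diameter is real.

```python
import math

def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300):
    """Standard approximation: gross candidate dies minus edge loss."""
    r = wafer_diameter_mm / 2
    return (math.pi * r ** 2 / die_area_mm2
            - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))

def yield_poisson(die_area_mm2, defect_density_per_mm2):
    """Poisson yield model: fraction of dies that catch zero defects."""
    return math.exp(-die_area_mm2 * defect_density_per_mm2)

WAFER_COST = 7000        # hypothetical wafer cost in dollars
DEFECT_DENSITY = 0.002   # hypothetical defects per mm^2 (0.2 per cm^2)

for area in (150, 600):  # illustrative small die vs. big die
    gross = dies_per_wafer(area)
    good = gross * yield_poisson(area, DEFECT_DENSITY)
    print(f"{area} mm^2: {gross:.0f} gross dies, {good:.0f} good dies, "
          f"${WAFER_COST / good:.0f} per good die")
```

Under those assumptions the big die costs roughly ten times as much per good unit, which is why harvesting partly defective big dies is attractive early on, and why it stops making sense once yields mature.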
 
Last edited:

richaron

Golden Member
Mar 27, 2012
1,357
329
136
AMD has been working on similar tech for ages, and is years ahead of the competition.

The answer is Navi.
 

Glo.

Diamond Member
Apr 25, 2015
5,658
4,417
136
Everything you are talking about is the HSA 2.0 foundation principle. Nothing new. Next-generation GPUs from AMD will be designed with this in mind from the start.

The problems to overcome are at the software level. That is the biggest hurdle here.

This is the last node where we will see 600 mm² die sizes from either vendor (Nvidia or AMD). Everything will simply migrate to smaller die sizes, like 80 mm² and below. So the high end in the future will not be one single big 250 W GPU, but two 350 mm² dies combined on a single PCB with 300 W of TDP total. The high-end market becomes the enthusiast market, the mainstream market becomes high-end, and the low-end market becomes mainstream. At least that is the picture from a production cost perspective.
 

tonyfreak215

Senior member
Nov 21, 2008
274
0
76
AMD has been working on similar tech for ages, and is years ahead of the competition.

The answer is Navi.

Everything you are talking about is the HSA 2.0 foundation principle. Nothing new. Next-generation GPUs from AMD will be designed with this in mind from the start.

The problems to overcome are at the software level. That is the biggest hurdle here.

I believe this is part of AMD's long-term strategy with DirectX 12.

AdoredTV did a great video on it.
https://www.youtube.com/watch?v=aSYBO1BrB1I
 

maddie

Diamond Member
Jul 18, 2010
4,723
4,628
136
This applies to Nvidia as well, but I don't care about them as much.

People always talk about yields being lower on a new process, hence the cost of larger dies being much greater with a new generation. This leads companies like AMD and Nvidia to choose smaller die sizes initially while yields improve over time.

Multi-GPU often sucks; it's dependent on developer support, and even with that it often has issues.

So why can't AMD create a Megazord-like GPU? A GPU where several smaller GPU dies come together to form a single larger die and, most importantly, BEHAVE like a single larger die? What is the technical constraint there?

Is it the proximity to the memory? The inability to properly share the same data across the different smaller GPU pieces at the same time?

But can't there be a common pool of memory stuck somewhere on the board, with optical interconnects to the individual GPU components? I thought that was the whole point of all that talk about The Machine HP was building and later backed away from?

Would optical interconnects between the different GPU dies be insufficient to provide enough bandwidth and speed to work as a single unit? What else is the constraint here?

There must be an answer. Or is this the kind of thing AMD means on that Navi slide when talking about a more scalable GPU?

If AMD had this tech, there would be no need to choose between targeting the midrange and the high end; they could create smaller dies that are less expensive due to the way wafer defects affect the economics of chip production, and scale them up to whatever they wanted.
There is nothing technical preventing a segmented GPU design. Just look at SoC designs: each of their component blocks once stood alone. So technical possibility is not the problem.

My view is that it needed some method of transferring the vast data volumes associated with GPUs. Now that interposers are becoming cheap enough for general use, that blocker is removed, and the new problem is the energy consumed over the increased distances the data has to travel. Energy use rises steeply as data transport distances increase.
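
A crude sketch of that effect, assuming the energy to move a bit scales with wire capacitance (roughly proportional to length) times V²; the capacitance-per-mm figure is an illustrative placeholder, not a process number:

```python
# Crude wire-energy model: energy per bit ~ C_wire * V^2, with C_wire
# proportional to wire length. The constants are illustrative only.
CAP_PER_MM_PF = 0.2  # hypothetical wire capacitance per mm, in picofarads
VOLTAGE = 1.0        # volts

def energy_per_bit_pj(length_mm, volts=VOLTAGE):
    return CAP_PER_MM_PF * length_mm * volts ** 2  # picojoules (pF * V^2)

for length_mm in (1, 10, 40):  # local hop vs. cross-die vs. off-package distance
    print(f"{length_mm:>2} mm wire: ~{energy_per_bit_pj(length_mm):.1f} pJ per bit")
```

Even in this simple model, going far off-die costs an order of magnitude more energy per bit than staying local, before any interface overhead.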

My guess is that you can downclock the entire system, make it much wider, and get overall savings. HBM, as developed by AMD, follows this philosophy. We'll get to see what Navi brings in this regard.

Here we have the two GPU companies pursuing polar opposite designs: slower and wider versus faster and narrower. It will be interesting to watch the next few years.

My view is also that this wider, slower, segmented design has been in the making for years, and Navi will be the first implementation of this new branch of GPU design. Read the research papers and patents.

AMD also appears to be leveraging the benefits of Vulkan and DX12 in multi-GPU to help with some of the practical problems of a pure hardware solution.
 

Concillian

Diamond Member
May 26, 2004
3,751
8
81
Consider that GPU silicon dedicates most, if not all, of the outer "ring" of the die to pads that connect I/O and power between the package and the silicon. Logistically, these must be on the outside; you cannot interconnect with the center of a piece of silicon, so they need to be on the outer ring of the die. You need a lot of I/O pathways for the wide memory buses that video cards use, so many that the lower limit on die size for a given memory bus width is determined by the amount of space you need for these pads/interconnects. Usually your lowest-end GPUs are close to this minimum size, maybe 10% larger, something in that range.

If you were to design a GPU with the option of using 1, 2, or 4 dies, then you couldn't have a full ring of these pads; you'd be limited to 2 sides on the smallest version, because the other sides would need the option of connecting to other dies in your multi-die setup. Since you'd have HALF the linear space for pads on the smallest GPU (2 sides instead of 4), and the area is the product of the sides, each of those 2 sides needs to be double the length to fit all the interconnects you could have fit using all 4 sides, so the minimum area of the die QUADRUPLES.

This is not at all economically feasible for what should be very obvious reasons. All of your product die sizes become HUGE.

Let's use the GTX 750 as an example: that silicon is 148 mm². I don't know how close to the minimum size this is, so let's say the actual minimum for 75 W and a 128-bit memory bus to have enough pad space is an even 100 mm², i.e. a 10 mm x 10 mm die with 4 sides of 10 mm, or 40 mm of linear space to make all your interconnects. Your design gives up 2 of those sides for die-to-die connections, so the remaining 2 sides need to be 20 mm long each... and now your smallest possible silicon size becomes 400 mm²?! That's roughly the size of a GTX 980 die, for a 128-bit memory bus. How do you think a card with the execution units and price of a GTX 980 would fare when coupled with the memory bandwidth of a GTX 750 Ti?
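
The same back-of-the-envelope math in code form (using the hypothetical 100 mm² / 40 mm-of-pad-edge minimum from above):

```python
# If a square die needs a fixed total edge length for I/O and power pads,
# halving the number of usable sides doubles the required edge length and
# quadruples the minimum die area.
REQUIRED_PAD_LENGTH_MM = 40.0  # hypothetical: 4 sides x 10 mm on a 100 mm^2 die

for usable_sides in (4, 2):
    edge_mm = REQUIRED_PAD_LENGTH_MM / usable_sides  # length each usable side must provide
    area_mm2 = edge_mm ** 2                          # area of a square die with that edge
    print(f"{usable_sides} usable sides -> {edge_mm:.0f} mm edge, {area_mm2:.0f} mm^2 minimum die")
```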

This, of course, ignores the engineering hurdles of building high-speed data paths that work when you don't cut the thing into quarters but are inactive when you do. Plus there's the wasted silicon space ($$$) you'd need to leave on your high-end GPUs to make room for the cuts that produce your low-end GPUs. Silicon is very expensive, and wasted area is a good way to put your operating expenses way above your competition's. This idea doesn't work on many levels.
 
Last edited:

dark zero

Platinum Member
Jun 2, 2015
2,655
138
106
It's a similar story for why Nvidia can't deliver lower-tier dies... the process is not mature enough for either vendor.
 

maddie

Diamond Member
Jul 18, 2010
4,723
4,628
136
Consider that GPU silicon dedicates most, if not all, of the outer "ring" of the die to pads that connect I/O and power between the package and the silicon. Logistically, these must be on the outside; you cannot interconnect with the center of a piece of silicon, so they need to be on the outer ring of the die. You need a lot of I/O pathways for the wide memory buses that video cards use, so many that the lower limit on die size for a given memory bus width is determined by the amount of space you need for these pads/interconnects. Usually your lowest-end GPUs are close to this minimum size, maybe 10% larger, something in that range.

If you were to design a GPU with the option of using 1, 2, or 4 dies, then you couldn't have a full ring of these pads; you'd be limited to 2 sides on the smallest version, because the other sides would need the option of connecting to other dies in your multi-die setup. Since you'd have HALF the linear space for pads on the smallest GPU (2 sides instead of 4), and the area is the product of the sides, each of those 2 sides needs to be double the length to fit all the interconnects you could have fit using all 4 sides, so the minimum area of the die QUADRUPLES.

This is not at all economically feasible for what should be very obvious reasons. All of your product die sizes become HUGE.

Let's use the GTX 750 as an example: that silicon is 148 mm². I don't know how close to the minimum size this is, so let's say the actual minimum for 75 W and a 128-bit memory bus to have enough pad space is an even 100 mm², i.e. a 10 mm x 10 mm die with 4 sides of 10 mm, or 40 mm of linear space to make all your interconnects. Your design gives up 2 of those sides for die-to-die connections, so the remaining 2 sides need to be 20 mm long each... and now your smallest possible silicon size becomes 400 mm²?! That's roughly the size of a GTX 980 die, for a 128-bit memory bus. How do you think a card with the execution units and price of a GTX 980 would fare when coupled with the memory bandwidth of a GTX 750 Ti?

This, of course, ignores the engineering hurdles of building high-speed data paths that work when you don't cut the thing into quarters but are inactive when you do. Plus there's the wasted silicon space ($$$) you'd need to leave on your high-end GPUs to make room for the cuts that produce your low-end GPUs. Silicon is very expensive, and wasted area is a good way to put your operating expenses way above your competition's. This idea doesn't work on many levels.
While this would be true for a traditional design, the use of interposers and the associated microbump technology allows orders-of-magnitude improvements in connection density.

You can also use the area inside the edge of the die and not be limited to the edges.

How else do you explain the use of a 4096-bit memory bus on a ~600 mm² GPU? And that is not even close to the microbump limit, which is around 400/mm², with research pushing toward even higher densities.

Xilinx has had a 4-die, ~600 mm² processor with more than 10,000 connections between each pair of adjacent sub-units for some years now.

This means that you CAN use very small dies for a distributed GPU without the old worries of pad space limitations.
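
A rough comparison of the two approaches (the 400/mm² microbump density is the figure cited above; the 100 mm² die and ~50 µm edge-pad pitch are hypothetical round numbers):

```python
# Edge-pad vs. interposer-microbump connection counts for a small square die.
DIE_AREA_MM2 = 100.0      # hypothetical small die
PAD_PITCH_MM = 0.05       # hypothetical ~50 um edge-pad pitch
MICROBUMPS_PER_MM2 = 400  # density figure cited above

edge_mm = DIE_AREA_MM2 ** 0.5
perimeter_pads = int(4 * edge_mm / PAD_PITCH_MM)          # pads limited to the die edge
area_microbumps = int(DIE_AREA_MM2 * MICROBUMPS_PER_MM2)  # bumps spread across the whole face

print(f"Edge pads:       ~{perimeter_pads}")
print(f"Area microbumps: ~{area_microbumps}")
```

Even with generous edge-pad assumptions, area microbumps give tens of thousands of connections where the perimeter gives hundreds.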

I think you're giving out false information.
 

xpea

Senior member
Feb 14, 2014
429
135
116
My view is that it needed some method of transferring the vast data volumes associated with GPUs. Now that interposers are becoming cheap enough for general use, that blocker is removed, and the new problem is the energy consumed over the increased distances the data has to travel. Energy use rises steeply as data transport distances increase.
^^ THIS
multi-GPU will never be competitive in terms of power efficiency, because calculation (FLOPs) is cheap in terms of energy but moving data is not...
 

maddie

Diamond Member
Jul 18, 2010
4,723
4,628
136
^^ THIS
multi-GPU will never be competitive in terms of power efficiency, because calculation (FLOPs) is cheap in terms of energy but moving data is not...
Don't you think that is a bit harsh? Localization of data is the key to reducing energy use, and this applies to a monolithic GPU as well as a multi-die one. As with all things, the key is to find cost-effective solutions.

At the upper limit, a traditional monolithic die appears to max out at around 600 mm². A multi-die design, however, can go past that limit easily.

For the roughest of comparisons, say a 900 mm² multi-die GPU is made to compete with a 600 mm² monolithic die. We could assume 2/3 the clock speed gives equivalent shader performance. Power would be much lower.
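
A first-order sketch of that comparison, assuming dynamic power goes roughly as area x f x V², that voltage tracks frequency near the top of the curve, and that shader throughput goes as area x clock (all of which are simplifications):

```python
# First-order model: power ~ area * f * V^2, throughput ~ area * f.
# Crude assumption: voltage scales roughly with frequency in this regime.
mono_area, mono_clock = 600.0, 1.0        # monolithic die, normalized clock
multi_area, multi_clock = 900.0, 2.0 / 3  # wider multi-die part at 2/3 clock

def throughput(area, clock):
    return area * clock

def power(area, clock):
    volts = clock  # V tracks f (crude)
    return area * clock * volts ** 2

print("throughput ratio:", throughput(multi_area, multi_clock) / throughput(mono_area, mono_clock))
print("power ratio:     ", power(multi_area, multi_clock) / power(mono_area, mono_clock))
```

Under those assumptions the 900 mm² part at 2/3 clock matches the 600 mm² part's throughput at well under half the dynamic power, which is the same wide-and-slow argument HBM makes for memory.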

Look at HBM memory: very wide, with a much lower clock rate, leading to 70%+ savings in power consumed, and this is directly related to data movement, not processing.

Finally, with through-silicon vias providing electrical pathways and a way to transport heat from the internal dies to the top, maybe we will see true 3D integration. This is by far the best way to reduce signal lengths and thus the power lost to data movement. Multi-die with 3D integration might be a solution.
 

tonyfreak215

Senior member
Nov 21, 2008
274
0
76
That is freaking awesome. If he is right about MS and Sony going multi-GPU in the next consoles, it is about to change everything in a big way.

AMD is already confirmed for the next generation of consoles. If it doesn't happen then, it will certainly happen in the generation after that. If only AMD can hold on until then.
 

Mopetar

Diamond Member
Jan 31, 2011
7,797
5,899
136
AMD is already confirmed for the next generation of consoles. If it doesn't happen then, it will certainly happen in the generation after that. If only AMD can hold on until then.

AMD will still be around in ten years even if they're still horribly unsuccessful up to that point. Without AMD around, Intel would be considered a de facto monopoly, which Intel does not want to deal with at all.

Zen should be good enough that AMD can start making money again, and eventually they'll be completely free of GlobalFoundries, which is another millstone removed from their neck.
 

tonyfreak215

Senior member
Nov 21, 2008
274
0
76
AMD will still be around in ten years even if they're still horribly unsuccessful up to that point. Without AMD around, Intel would be considered a de facto monopoly, which Intel does not want to deal with at all.

Zen should be good enough that AMD can start making money again, and eventually they'll be completely free of GlobalFoundries, which is another millstone removed from their neck.

But would Intel be a monopoly? It's been facing a lot of competition from ARM recently. Even then, it's not like Intel would give AMD cash to keep them afloat.

I'm very excited for Zen. I've always been an AMD fan, but their recent processors have been horrible.
 

Mat3

Junior Member
Jun 3, 2016
4
0
0
Rather than two identical GPUs on an interposer, what about separating different parts of one GPU?

Specifically, I mean separating the shader array from the rest of the chip. So we'd have one processor, the shader array, connected to a second processor (ROPs, memory controllers and interface, misc. logic), which is itself connected to a couple of stacks of HBM RAM. The two chips, especially the middle one that connects to both the memory and the shader chip, would be long and rectangular to maximize the perimeter available for all the interfaces.

This idea wouldn't be completely unheard of: the Xbox 360 is somewhat similar in concept. It keeps the ROPs (the biggest bandwidth user?) closest to the memory, and it lets you split a 600 mm² GPU like Fiji into two more manageable chips. The interposer should allow the shader array enough bandwidth, and you wouldn't have to worry about how to share memory.

Thoughts? Feasible? Terrible idea?
 

maddie

Diamond Member
Jul 18, 2010
4,723
4,628
136
Rather than two identical GPUs on an interposer, what about separating different parts of one GPU?

Specifically, I mean separating the shader array from the rest of the chip. So we'd have one processor, the shader array, connected to a second processor (ROPs, memory controllers and interface, misc. logic), which is itself connected to a couple of stacks of HBM RAM. The two chips, especially the middle one that connects to both the memory and the shader chip, would be long and rectangular to maximize the perimeter available for all the interfaces.

This idea wouldn't be completely unheard of: the Xbox 360 is somewhat similar in concept. It keeps the ROPs (the biggest bandwidth user?) closest to the memory, and it lets you split a 600 mm² GPU like Fiji into two more manageable chips. The interposer should allow the shader array enough bandwidth, and you wouldn't have to worry about how to share memory.

Thoughts? Feasible? Terrible idea?
Post #7 in this thread has a link you should find most interesting.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Cost is as important as anything else, and it's going up in all aspects. You need very different memory that's hard to produce, like HBM. You need complicated physical structures in your process, like FinFETs.

There are so many issues that come up that we really have no idea what's feasible and what's not.

We have transistors in the billions. While a lot of the work is automated, a lot is still done by humans. Very smart humans, but they have limits. Having made my own circuits at a very small scale, I still need to worry about how the layout is going to turn out. Even with circuits that simple, the most important thing for me is the same thing that matters most to the big circuit designers at Intel/AMD/Nvidia: reliable production.

What works out in a schematic and in theory doesn't necessarily translate to the real world. It's extremely easy to disregard all the advances and the sweat and tears that went into making these things and be like the guys who just plot the advances on a graph and say, "Oh, it'll continue like that for 50 more years and we'll have computers that replace humans." All that really tells us is how little those who speculate know about the very things they speculate about.
 
Last edited: