
[VideoCardz] NVIDIA GP104 and first Polaris GPUs supposedly spotted on Zauba

Yes. I actually think they might have exposed the left side of one interposer together with the right side of the adjacent interposer in one mask exposure. The interposers would then be cut apart down the middle of the area the mask exposed.

How would that have worked? Fiji is right in the middle of the interposer.
 
NVIDIA-GP104-Pascal-Zauba-Listing-900x326.png


What makes these things a video card or GPU? I must be missing something. This looks like components for a TEC cooler to me.

Those were my thoughts as well, which I mentioned in the post above yours. Nothing in that shipping list indicates there's a board design or chip being shipped.
 
You would have to access memory in a NUMA-style fashion. Cache coherency across chips etc.

This one is easily solved, because the cache coherency that exists on GPUs, even between different blocks on the same die, is very limited. L2 is attached directly to the memory controller, and all coherent operations are resolved in the L2, just like it works on GPUs today.

And it adds latency as well.

That it does. Latency really doesn't matter that much on a GPU, though. Double the vector register count on CUs and that will hide all the latency this adds and then some. (cough cough pet peeve cough cough)
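
To put rough numbers on the register argument: a GPU hides memory latency by keeping other wavefronts ready to issue, and the number of resident wavefronts is capped by how many vector registers each one needs. The sketch below is a simplified occupancy model, not a description of any real chip. The register file size, the wave cap, the per-wave register usage and the cycles of work per wave are all illustrative assumptions.

```python
def resident_waves(regfile_vgprs: int, vgprs_per_wave: int, max_waves: int) -> int:
    """Wavefronts that fit in one SIMD's register file, capped by the scheduler limit."""
    return min(max_waves, regfile_vgprs // vgprs_per_wave)


def hidden_latency_cycles(waves: int, issue_cycles_per_wave: int) -> int:
    """Cycles of work the other resident waves can issue while one wave waits on memory."""
    return (waves - 1) * issue_cycles_per_wave


# All values below are hypothetical, chosen only to illustrate the trend.
BASE_REGFILE_VGPRS = 256   # registers available per lane in one SIMD
MAX_WAVES = 10             # scheduler cap on resident waves per SIMD
VGPRS_PER_WAVE = 84        # a register-hungry shader
ISSUE_CYCLES = 40          # ALU cycles each wave has between memory requests

for regfile in (BASE_REGFILE_VGPRS, 2 * BASE_REGFILE_VGPRS):
    waves = resident_waves(regfile, VGPRS_PER_WAVE, MAX_WAVES)
    hidden = hidden_latency_cycles(waves, ISSUE_CYCLES)
    print(f"regfile={regfile}: {waves} waves resident, ~{hidden} cycles of latency hidden")
```

With these made-up numbers, doubling the register file takes the SIMD from 3 to 6 resident waves, so the latency that can be covered grows from about 80 to about 200 cycles. Shaders that already hit the wave cap would see no benefit.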

I can't imagine it being practical vs a single big die.

I can see possible worlds where it would make sense, I just wouldn't bet that we are living in one.
 
How would that have worked? Fiji is right in the middle of the interposer.
If you look at the Fiji interposer, you see the memory data lines entering from both sides of the GPU die. These are too far apart to fit within the reticle limit and etch with one exposure. On adjacent interposers, however, the left side of one lies close enough to the right side of the other to fit. Remember, the mask in this case is not for a complete integrated circuit. You can have left and right traces on the mask separated by an empty space, and you slice right down this space. In normal use you would mask and etch a complete circuit array and then slice at the boundaries, but this is not a hard limitation. I would imagine there are no etched signal lines in the center of the interposer.

The mask will not align directly over the interposer but will straddle two adjacent ones, centered over the boundary, ensuring the lefthand HBM traces on the interposer are etched by the righthand side of the mask, with the opposite happening to the adjacent interposer.

Am I describing this well enough?
 
Not really, no. If you expose the left side of one interposer along with the right side of the adjacent interposer, your exposed size still needs to be as large as the whole interposer.
 
My limitation.

You might be thinking that the left and right side of the interposer in my description means the left and right half. The mask will only etch partially into the interposer, not going all the way to the center. If it did, then yes, the mask would need to be as large as the interposer. Visually, I estimate the mask would need to be about 75% of the interposer's vertical dimension and 60% of its horizontal dimension to give the desired effect.
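
The geometry of this proposal can be checked with a few lines of arithmetic. The sketch below is only an illustration: the 75% and 60% ratios come from the post, but the interposer dimensions and the reticle field size are placeholder assumptions, not measured values for Fiji or for the fab's 65nm process.

```python
def fits_reticle(w_mm, h_mm, limit_w_mm, limit_h_mm):
    """True if a w x h exposure field fits the limit in either orientation."""
    return ((w_mm <= limit_w_mm and h_mm <= limit_h_mm)
            or (w_mm <= limit_h_mm and h_mm <= limit_w_mm))


def straddling_mask(interposer_w, interposer_h, frac_w=0.60, frac_h=0.75):
    """Mask size and coverage when the mask is centered on the boundary
    between two side-by-side interposers."""
    mask_w = frac_w * interposer_w
    mask_h = frac_h * interposer_h
    strip_w = mask_w / 2                      # pattern reaching into each interposer edge
    center_gap = interposer_w - 2 * strip_w   # band in the middle that gets no pattern
    return mask_w, mask_h, strip_w, center_gap


# Assumptions (not from the thread): a 26 x 33 mm exposure field and a
# hypothetical 36 x 28 mm interposer, roughly 1000 mm^2.
RETICLE_W, RETICLE_H = 26.0, 33.0
INTERPOSER_W, INTERPOSER_H = 36.0, 28.0

mask_w, mask_h, strip_w, gap = straddling_mask(INTERPOSER_W, INTERPOSER_H)
print("whole interposer fits:", fits_reticle(INTERPOSER_W, INTERPOSER_H, RETICLE_W, RETICLE_H))
print("straddling mask fits: ", fits_reticle(mask_w, mask_h, RETICLE_W, RETICLE_H))
print(f"strip per edge: {strip_w:.1f} mm, unpatterned center: {gap:.1f} mm")
```

With these placeholder numbers the full interposer does not fit in one exposure while the straddling mask does, at the price of a central band about 14.4 mm wide that receives no pattern at all.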
 
Sorry, still not seeing it. Are you saying the mask wouldn't expose the center of the interposer under the Fiji GPU? The mask would need to fully expose the underside of the GPU because, besides the memory interface, there's still a massive number of pins needed for other things like the display outputs, PCIe lanes, and enough power and ground bumps to support a few hundred amps.

Is there a reason why you think they might have exposed the die in the way you suggested instead of how Anandtech proposed they did it?
 
I don't think you need the entire Fiji die area on the interposer to be exposed. With what I stated, you still have a lot of area for the additional input/output. I think the max microbump density is around 500-600 connections/mm^2. The main reason for traditionally accessing the entire base of a die was a much lower pad density, and you also needed all of the memory lanes in addition to all the power, video, etc. Memory was a large part of this, but we no longer need this connection to the PCB.

All that I read on Anandtech was this: "The actual interposer die is believed to exceed the reticle limit of the 65nm process AMD is using to have it built, and as a result the interposer is carefully constructed so that only the areas that need connectivity receive metal layers. This allows AMD to put down such a large interposer without actually needing a fab capable of reaching such a large reticle limit."

Did they mention how it was accomplished?
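
For a sense of scale on the microbump figure, here is a back-of-the-envelope sketch. The 500-600 connections/mm^2 range is the estimate from the post. The exposed area is a hypothetical value, and the count of 4096 HBM data lines (four stacks of 1024 bits) is an added assumption that ignores address, command, power and ground connections, so the result is a lower bound rather than a real pin budget.

```python
import math


def bumps_available(area_mm2: float, density_per_mm2: float) -> int:
    """Upper bound on microbumps that fit in a given area at a given density."""
    return math.floor(area_mm2 * density_per_mm2)


def area_needed_mm2(connections: int, density_per_mm2: float) -> float:
    """Area required to land a given number of connections at a given density."""
    return connections / density_per_mm2


DENSITIES = (500, 600)       # connections per mm^2, estimate quoted in the post
HBM_DATA_LINES = 4 * 1024    # assumption: four stacks, 1024 data bits each
EXPOSED_AREA_MM2 = 100.0     # hypothetical patterned area under the die edges

for density in DENSITIES:
    avail = bumps_available(EXPOSED_AREA_MM2, density)
    need = area_needed_mm2(HBM_DATA_LINES, density)
    print(f"{density}/mm^2: {avail} bumps in {EXPOSED_AREA_MM2} mm^2, "
          f"data lines alone need {need:.2f} mm^2")
```

This only covers the die-to-interposer side. It says nothing about the bump density on the bottom of the interposer, which is a separate constraint.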
 
You're still limited by the bump density on the bottom of the interposer. Silicon to silicon bump density on the top might be high, but it won't be any different than a standard GPU on the bottom of the interposer. Just to be clear, let me just toss up a picture based on my understanding (including the 60% and 75% ratios you mentioned) of what you're proposing.
a4lwg4.png

The exposed part of the interposer is in blue, with obviously another one on the opposite sides of those two chips. It seems rather impractical and fragile. You would still need to move all the power for the chip through the silicon to power the shaders in the center of the die, which is not an ideal situation. Given that there is no fundamental reason that a signal should need to travel from one half to the other, you could pattern the interposer in two separate exposures, similar to what they do with a multi-layer mask.
264j11f.png

Really, it's likely that the GPU bumpout could have been done in such a way that a mirror of each side could be used. Even if the reticle limit of the 65nm process is much smaller than the ~1000mm^2 interposer, that would allow you to get full coverage on the interposer, with no more potential registration issues than what you're proposing.
 
Regarding Polaris and Zen release dates:

With Polaris, AMD is considerably more aggressive, but products based on it are also much closer to market than those based on Zen. Su reiterated that AMD would pursue the number one position in the graphics market, a position the company held in the past but has steadily lost ground on. The new architecture will therefore be used in all market segments, although not all at the same time. The target is mid-2016. In particular, not just desktop solutions but above all notebook graphics chips should be available for the back-to-school period, which is important in the United States. It is precisely there that a better ratio of performance to power is most noticeable, Su explained, referring to AMD's own performance figures for a mainstream desktop solution based on the Polaris architecture.

AMD's chief gave reassurance that Zen is still "on schedule": 2016 should be the year of sampling for the high-end desktop and server market, with "early series production" and the first chips available at the end of the year. Large numbers of chips are then planned for 2017. She is already getting mail from enthusiasts who would like to see Zen earlier, Su said with a smile.
 
Guys, just ignore the magic interposer theorists. They will never fold, and when they're proven wrong they'll just bring the theory back next year.
 
I wonder what makes it exactly GP104 and not say GP100 or 102 or 107.
It's a guess based on the package size. 37.5mm would be too small for GP100, and likely larger than you would expect a smaller die to be.

Edit: Also, nothing in there actually indicates a chip, as people have said. The things they have listed are:
-A 650W TEC Cooler and a water cooling lid
-Signal, power and ground probes
-A retainer, guide plate, socket base
-A Thermal Head
 
What makes you so sure, oh the source of truth?

The fact that none of you guys who are so absolutely sure of this happening without any real evidence were able to answer my question on the last page is proof enough to me. Seriously, you guys were wrong about Fiji being dual Tonga, so I don't know what makes you so confident this time when nothing has changed.
 
I wonder what makes it exactly GP104 and not say GP100 or 102 or 107.

I'm wondering what makes it a GP* anything? I don't see anything on that list that denotes a GPU or parts for a video card, unless they are going to be releasing some kind of TEC-cooled, socketed GPU.

We have a 650W TEC, a guide plate, a signal probe, a power and ground probe, a retainer, a thermal head, a water cooling lid, and a socket base. What about that denotes a graphics card?
 
The fact that none of you guys who are so absolutely sure of this happening without any real evidence were able to answer my question on the last page is proof enough to me. Seriously, you guys were wrong about Fiji being dual Tonga, so I don't know what makes you so confident this time when nothing has changed.

I was NOT writing Fiji being dual Tonga, get your facts straight. I was vehemently against that theory, because 28nm is very mature and yields are high, so there's nothing to gain from using a risky tech to put multiple chips together.

AMD likes to be at the forefront of new tech, but they need experience with HBM first before moving to multi-chip on an interposer. But now they have that experience and the circumstance of the new nodes means only small chips for a long while.

If not on 14nm, then the next node down, chip makers will have to move to a technology that makes multiple smaller dies behave like a single monolithic chip due to yield and costs.
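
The yield side of this argument can be illustrated with the simple Poisson yield model, Y = exp(-D0 * A), where D0 is the defect density and A the die area. This is a textbook approximation swapped in for illustration, not anything AMD or a foundry has published for 14nm. The die areas and defect densities below are hypothetical, and the comparison ignores interposer, assembly and inter-die link costs entirely.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero defects under the Poisson yield model."""
    return math.exp(-defects_per_cm2 * die_area_mm2 / 100.0)  # 100 mm^2 per cm^2


def silicon_cost_ratio(big_mm2: float, small_mm2: float, defects_per_cm2: float) -> float:
    """How much more wafer area one good mm^2 costs on the big die than on the
    small die, assuming bad small dies are discarded before assembly."""
    return poisson_yield(small_mm2, defects_per_cm2) / poisson_yield(big_mm2, defects_per_cm2)


BIG_DIE_MM2 = 600.0     # hypothetical monolithic GPU
SMALL_DIE_MM2 = 150.0   # hypothetical quarter-size die

for d0 in (0.1, 0.5):   # hypothetical defect densities: mature vs early process
    y_big = poisson_yield(BIG_DIE_MM2, d0)
    y_small = poisson_yield(SMALL_DIE_MM2, d0)
    ratio = silicon_cost_ratio(BIG_DIE_MM2, SMALL_DIE_MM2, d0)
    print(f"D0={d0}/cm^2: big die yield {y_big:.1%}, small die yield {y_small:.1%}, "
          f"silicon cost ratio {ratio:.2f}x")
```

In this model the gap between one large die and several small ones is wide when the defect density is high and shrinks as the process matures. Whether the savings in silicon outweigh the extra packaging and design cost is not something this sketch can answer.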
 
I'd really like Maddie to answer, but he/she doesn't seem to have interest in answering questions that can't be used to prove the possibility.

Either way, this is a stupid conversation to have. In order for this idea to work, the smaller dies need to use HBM2. If yields are part of the issue, we can throw this whole idea out of the window. On top of that, there's no reason to believe that this solution would cost significantly less to develop than a single die, considering how much new technology is needed. Also, as I said before, if this is so much cheaper to develop, Nvidia would do it too. It's not happening now, end of story.
 
It's a guess based on the package size. 37.5mm would be too small for GP100, and likely larger than you would expect a smaller die to be.

Edit: Also, nothing in there actually indicates a chip, as people have said. The things they have listed are:
-A 650W TEC Cooler and a water cooling lid
-Signal, power and ground probes
-A retainer, guide plate, socket base
-A Thermal Head

Oh, thanks for the explanation.
 
You're still limited by the bump density on the bottom of the interposer. Silicon to silicon bump density on the top might be high, but it won't be any different than a standard GPU on the bottom of the interposer. Just to be clear, let me just toss up a picture based on my understanding (including the 60% and 75% ratios you mentioned) of what you're proposing.
a4lwg4.png

The exposed part of the interposer is in blue, with obviously another one on the opposite sides of those two chips. It seems rather impractical and fragile. You would still need to move all the power for the chip through the silicon to power the shaders in the center of the die, which is not an ideal situation. Given that there is no fundamental reason that a signal should need to travel from one half to the other, you could pattern the interposer in two separate exposures, similar to what they do with a multi-layer mask.
264j11f.png

Really, it's likely that the GPU bumpout could have been done in such a way that a mirror of each side could be used. Even if the reticle limit of the 65nm process is much smaller than the ~1000mm^2 interposer, that would allow you to get full coverage on the interposer, with no more potential registration issues than what you're proposing.
Had to run and see about an issue yesterday.

OK. I see your suggestion. Sounds better than mine. Better die coverage.

Can a multi-layer mask that large be used?
 
I'd really like Maddie to answer, but he/she doesn't seem to have interest in answering questions that can't be used to prove the possibility.

Either way, this is a stupid conversation to have. In order for this idea to work, the smaller dies need to use HBM2. If yields are part of the issue, we can throw this whole idea out of the window. On top of that, there's no reason to believe that this solution would cost significantly less to develop than a single die, considering how much new technology is needed. Also, as I said before, if this is so much cheaper to develop, Nvidia would do it too. It's not happening now, end of story.
Is the post below the one whose questions you wanted answered?

I didn't answer because Tuna-Fish gave you an answer I agreed with. I saw no reason to post a reply as you addressed your post to all. What is the issue with the answer you received?

To be unambiguous, I'm not saying that it is certain that AMD will have a multi-die solution, but that everything is in place for them to do it.

They have experience with and have used an interposer.
Yields on 14nm will be bad initially for larger dies.
They have a lot of experience with mixing IP blocks to make custom SoCs.
They appear to be going the wide and slow approach for GPUs [850MHz demo].

What do you mean by prove the possibility? I'm only pointing to the possible. You seem angry and I can't see why this is.

To everyone with this nutty multi-die idea, I have one question: If it really is more cost effective to do this, why is it only being applied to AMD? Does Nvidia just not want to save money?
 
His answer was a non-answer to the first part (which is the most important part and this idea completely falls apart if this doesn't save a huge amount of money over designing a new chip), and the second part is meaningless because none of the known work they've done with interposers goes farther than what they've done with Fiji, so Nvidia has the same level of experience.

Also, the red is completely irrelevant conjecture because:

  1. The chip wasn't running at full speed, so you can't extrapolate anything from the speed it runs at
  2. It's a small die aimed at mobile, so even if you could extrapolate a little from that, the desktop chip would run faster
  3. Since it's a small die, it's going to use GDDR5(X), which means that it could not be used as part of a bigger HBM2 package, and
  4. Even if you argue that it is possible, I'd imagine that matching buses would be necessary... unless you're expecting multiple magic interposer chips?

The point being that there isn't anything concrete whatsoever to support this theory. There aren't even any hints that AMD is hiding something huge from us. This would be the best-kept secret in the tech world in over a decade if it were true. However, you have nothing. You're just reading a bunch of random things as evidence because you've already decided that it's something that's probably going to happen. When there's actual substance behind this theory, I'll take it seriously. Until then, this theory falls into the same category as expecting big Polaris (if that's even a thing) to be 3x as fast and efficient as GM200 or expecting Nintendo NX to have Fiji-level performance.
 
It doesn't have to save a huge amount of money. Once it saves some it's worthwhile to do it. They have designed a new chip anyhow.

One person's random thing is another one's pattern.

Fair enough. Don't bother with any speculation. I'm not asking you to agree with the speculation.

As to the apparent anger?????????
 