AMD's next GPU uarch is called "Polaris"


Techhog

Platinum Member
Sep 11, 2013
2,834
2
26
AMD already stated they will use TSMC as well for other chips. So that should answer your question.

Not to mention I already posted about this previously.

But how do we know that it's not related to volume? We do know that GloFo struggles with that. At the same time, we also know that Samsung's A9 produces better battery life than TSMC's at the expense of a very small amount of performance. Given these facts, and assuming what you just said is true, isn't another possible explanation that the 14LPP process is better for lower-end chips, while 16FF+ is better suited for high-power chips?
 

MrTeal

Diamond Member
Dec 7, 2003
3,916
2,700
136
But how do we know that it's not related to volume? We do know that GloFo struggles with that. At the same time, we also know that Samsung's A9 produces better battery life than TSMC's at the expense of a very small amount of performance. Given these facts, and assuming what you just said is true, isn't another possible explanation that the 14LPP process is better for lower-end chips, while 16FF+ is better suited for high-power chips?

Do you have a link to that? I thought it was the opposite; Samsung's A9 was more dense but produced worse battery life under CPU load.
 

Techhog

Platinum Member
Sep 11, 2013
2,834
2
26
Do you have a link to that? I thought it was the opposite; Samsung's A9 was more dense but produced worse battery life under CPU load.

Hm... You might be right. I haven't looked in a while.

EDIT: Yep, I got mixed up. The chip is faster but a little less efficient.
 

Azix

Golden Member
Apr 18, 2014
1,438
67
91
Maybe they will use TSMC for refreshes of 28nm products. Or they will keep making some 28nm products (if the claim was not that they would use 16nm from TSMC).
 

beginner99

Diamond Member
Jun 2, 2009
5,315
1,760
136
LOL

4790K with DDR4 RAM.

[Image: Radeon Technologies Group, Graphics 2016 slide deck, page 16]

nice catch.
 

MrTeal

Diamond Member
Dec 7, 2003
3,916
2,700
136
nice catch.

The whole slide deck is riddled with poor phrasing and inconsistent layout. They obviously didn't run it by marketing, or even by one of the engineers who has taken more than a single mandatory technical writing course.
 

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
The whole slide deck is riddled with poor phrasing and inconsistent layout. They obviously didn't run it by marketing, or even by one of the engineers who has taken more than a single mandatory technical writing course.

Meh, has nothing to do with the actual end results, or the highlighted products being compared -- graphics cards. Could have put i7 4790K with DDR2 and it wouldn't change anything since there is an actual video backing up the data. A lot better than claiming 64 ROPs and 4GB fast GDDR5 on a real product and then when confronted, doing absolutely nothing about it, instead continuing to sell it with wrong specs to this day to millions of unsuspecting customers.
 

MrTeal

Diamond Member
Dec 7, 2003
3,916
2,700
136
Meh, has nothing to do with the actual end results, or the highlighted products being compared -- graphics cards. Could have put i7 4790K with DDR2 and it wouldn't change anything since there is an actual video backing up the data. A lot better than claiming 64 ROPs and 4GB fast GDDR5 on a real product and then when confronted, doing absolutely nothing about it, instead continuing to sell it with wrong specs to this day to millions of unsuspecting customers.

True, just pointing out that far from being some kind of underhanded move, it's just par for the course in that presentation. Even Ganesh was mocking it on Twitter.
https://twitter.com/ganeshts/status/684156199927857152

It could create a little confusion though. It should be obvious, since 2600 is common with DDR3 and not with DDR4, but claiming DDR4 and a 4x4GB memory configuration could lead someone to think that the typo is in the CPU, i.e. that it was a Haswell-E chip instead of a 4790K.
 

Headfoot

Diamond Member
Feb 28, 2008
4,444
641
126
It's just a typo, guys... If you watch the video, the typo is corrected and it says DDR3.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
If you take that typo and combine it with the strange driver numbering of 16.10 you get the launch date of 4/10/16 :)

LOL

First the stealth conspiracy, then this. What's next? Aliens? ehehehehehehe
 

maddie

Diamond Member
Jul 18, 2010
5,147
5,523
136
Just bringing us back to earth for a moment.

What do we know:
2 new 14nm GPU die this year [Raja Koduri]
1st die released probably = 100-110 mm^2 [GTX950 price range @PcPer]

Infer:
A 250mm^2 die needed to equal FuryX assuming some architectural gains
A 300 mm^2 die needed to give FuryX + 20%
Implies 100mm^2 and 300mm^2 as the two new die designs

Problems:
A huge gap between them [worse ratio than R7 260 : R9 290X]
Lots of value wasted in die harvesting

Conclusion:
Unrealistic for cash poor AMD
Missing some important information

Suggested Solution:
100-110 mm^2 GDDR5 die
200-225 mm^2 HBM die
Interposer multi-die approach for high end market [Fury interposer is big enough]
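
To put rough numbers on the size gap discussed above, here is a minimal sketch that estimates gross (untested) dies per 300 mm wafer for the die sizes mentioned in the post. It uses a common first-order approximation and ignores scribe lines, edge exclusion and yield, so treat the figures as ballpark only. The function name and the choice of formula are assumptions for illustration, not anything AMD or the foundries have published.

```python
import math

WAFER_DIAMETER_MM = 300.0  # assumed standard wafer; scribe lines and edge exclusion ignored


def gross_dies_per_wafer(die_area_mm2: float,
                         wafer_diameter_mm: float = WAFER_DIAMETER_MM) -> int:
    """First-order estimate: wafer area / die area, minus an edge-loss term."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)


# Die sizes taken from the post: small die, suggested HBM die, large die.
for area in (100, 225, 300):
    print(f"{area} mm^2: ~{gross_dies_per_wafer(area)} candidate dies per wafer")

print(f"area ratio 300:100 = {300 / 100:.2f}x, 225:100 = {225 / 100:.2f}x")
```

The small die gets more than three times as many candidates per wafer as the 300 mm^2 die because edge losses hurt large dies more, which is part of why covering the mid range with a cut-down big die is expensive.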
 

Techhog

Platinum Member
Sep 11, 2013
2,834
2
26
Just bringing us back to earth for a moment.

What do we know:
2 new 14nm GPU die this year [Raja Koduri]
1st die released probably = 100-110 mm^2 [GTX950 price range @PcPer]

Infer:
A 250mm^2 die needed to equal FuryX assuming some architectural gains
A 300 mm^2 die needed to give FuryX + 20%
Implies 100mm^2 and 300mm^2 as the two new die designs

Problems:
A huge gap between them [worse ratio than R7 260 : R9 290X]
Lots of value wasted in die harvesting

Conclusion:
Unrealistic for cash poor AMD
Missing some important information

Suggested Solution:
100-110 mm^2 GDDR5 die
200-225 mm^2 HBM die
Interposer multi-die approach for high end market [Fury interposer is big enough]

These two statements contradict one another. Does anybody have proof that this bullcrap is even possible? And how does this not count as being its own chip if they're on the same die? It's just pure stupidity. I am so positive that this won't ever happen with any chip that I'm willing to put money on it.
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
Just bringing us back to earth for a moment.

What do we know:
2 new 14nm GPU die this year [Raja Koduri]
1st die released probably = 100-110 mm^2 [GTX950 price range @PcPer]

Infer:
A 250mm^2 die needed to equal FuryX assuming some architectural gains
A 300 mm^2 die needed to give FuryX + 20%
Implies 100mm^2 and 300mm^2 as the two new die designs

Problems:
A huge gap between them [worse ratio than R7 260 : R9 290X]
Lots of value wasted in die harvesting


Conclusion:
Unrealistic for cash poor AMD
Missing some important information

Suggested Solution:
100-110 mm^2 GDDR5 die
200-225 mm^2 HBM die
Interposer multi-die approach for high end market [Fury interposer is big enough]

Actually, it's well known that 16/14nm yields for die sizes above 200 sq mm are really bad. In fact, only TSMC 16FF+ is going to be feasible for 300+ sq mm GPUs in terms of yields. But TSMC will be capacity constrained, as demand far outstrips supply at 16FF+. Moreover, TSMC will give first priority to Apple's A9, A9X and A10/A10X (Q3 2016 release). Nvidia and AMD are better served by using 16FF+ for high-performance GPUs selling at USD 300+. AMD's choice to go with two GPU dies, a 110-120 sq mm low-power die fabbed at GF 14LPP and a high-performance 300 sq mm die fabbed at TSMC 16FF+, thus makes sense.

My guess is that the low-power GPU specs will be:
R7 470 - 768 sp,
R7 470X - 1024 sp, 1 geometry engine, 1 raster engine, 32 ROPs, 128-bit memory bus, 8 GHz GDDR5

The performance will be on par with GTX 960 for the fully enabled SKU and GTX 950 for the salvage SKU.

The high-performance GPU using HBM2 will power 4 SKUs, as I expect yields to be really bad for 300 sq mm GPUs in 2016. I think there are going to be heavily salvaged SKUs in 2016 to fill the product stack. We will see a dedicated mid-range chip in 2017 once yields are much better.

R9 490X - 4096 sp, 4 geometry engines, 4 raster engines, 128 ROPs, 2048-bit HBM2, 512 GB/s, 8 GB.

R9 490 - 3072 sp, 4 geometry engines, 4 raster engines, 128 ROPs, 2048-bit HBM2, 512 GB/s, 8 GB.

R9 480X - 2048 sp, 2 geometry engines, 2 raster engines, 64 ROPs, 1024-bit HBM2, 256 GB/s, 4 GB.
R9 480 - 1792 sp, 2 geometry engines, 2 raster engines, 64 ROPs, 1024-bit HBM2, 256 GB/s, 4 GB.
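
As a sanity check on the bandwidth figures in these guessed specs, peak memory bandwidth is just bus width times per-pin data rate divided by 8. The sketch below assumes HBM2 at 2 Gbps per pin (not stated in the post, but it is the rate that makes 2048-bit come out at 512 GB/s) and reads "8 GHz GDDR5" as 8 Gbps per pin effective. The function name is made up for illustration.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s: total bits per second across the bus, divided by 8."""
    return bus_width_bits * data_rate_gbps_per_pin / 8


HBM2_RATE_GBPS = 2.0   # assumption: per-pin rate, not given in the post
GDDR5_RATE_GBPS = 8.0  # the post's "8 GHz" read as 8 Gbps effective per pin

configs = {
    "2048-bit HBM2 (guessed R9 490X/490)": (2048, HBM2_RATE_GBPS),
    "1024-bit HBM2 (guessed R9 480X/480)": (1024, HBM2_RATE_GBPS),
    "128-bit GDDR5 (guessed R7 470X)": (128, GDDR5_RATE_GBPS),
}

for name, (width, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gb_s(width, rate):.0f} GB/s")
```

Under those assumptions the 512 GB/s and 256 GB/s figures are internally consistent with the bus widths listed, and the small GDDR5 die would land at 128 GB/s.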

AMD's approach makes a lot of sense, as they use GF 14LPP to serve the high-volume GPU market and AMD has a WSA to meet. GF will be able to yield a 110-120 sq mm die reasonably well, and AMD can try to push as much volume as possible through GF 14LPP. TSMC 16FF+ will be used for the bleeding-edge GPUs of 2016.

I expect 4th gen GCN to have significant improvements in perf/sp and thus I think we can expect a 25-30% faster flagship R9 490X GPU compared to Fury X. I think Nvidia will come out with a faster GPU as Maxwell already has impressive perf/cc and Pascal should bring more. I think the Nvidia GPU will be 10% faster than AMD's flagship GPU.
 

MrTeal

Diamond Member
Dec 7, 2003
3,916
2,700
136
These two statements contradict one another. Does anybody have proof that this bullcrap is even possible? And how does this not count as being its own chip if they're on the same die? It's just pure stupidity. I am so positive that this won't ever happen with any chip that I'm willing to put money on it.

I don't see why it wouldn't be possible, though I'd agree with you that we're probably not likely to see a true dual GPU chip. Even if they did do something like this, there's no indication that AMD included any synchronization technology that would make the thing anything but a glorified 295X2 with both GPUs under a common heatsink.
 

maddie

Diamond Member
Jul 18, 2010
5,147
5,523
136
These two statements contradict one another. Does anybody have proof that this bullcrap is even possible? And how does this not count as being its own chip if they're on the same die? It's just pure stupidity. I am so positive that this won't ever happen with any chip that I'm willing to put money on it.
Just bringing us back to earth for a moment.

My error in not being clear. This was for Kenmitch, two posts earlier. It had nothing to do with the rest of the post. Since yesterday he has been dropping some hilarious posts. Strangely though, quite a few here are taking him at face value adding to the humor.

Now to the rest of your post.

Just found this example of high performance multi-die on interposer:

http://www.semiwiki.com/forum/showw...ree-Dimensional+Integrated+Circuit+3D+IC+Wiki

"Production interposer: Xilinx Virtex-7
Xilinx is using this technology for their Virtex-7 FPGAs. They call the technology “stacked silicon interconnect” and claim that it gives them twice the FPGA capacity at each process node. This is because very large FPGAs only become viable late after process introduction when a lot of yield learning has taken place. Earlier in the lifetime of the process, Xilinx have calculated, it makes more sense to create smaller die and then put several of them on a silicon interposer instead. It ends up cheaper despite the additional cost of the interposer because such a huge die would not yield economic volumes.
The Xilinx interposer consists of 4 layers of 65um metal on a silicon substrate. TSVs through the interposer allow this metal to be connected to the package substrate. Microbumps allow 4 FPGA die to be flipped and connected to the interposer. See the picture above. An additional advantage of the interposer is that it makes power distribution across the whole die simpler. This seems to be the only design in high volume production today."

I bolded the advantages possible in the above quote.

All I'm saying is what I posted before. We get strange conclusions if RTG is only producing 2 Die and the 1st one is 100-110 mm^2.

Do you believe that they will surpass Fury X this year?
What size Die?
Where is the mid range?
Is mid range a 50-60% activated full die?
Can AMD afford this waste as mid-range will sell many multiples of high end?

If this can be done, we can have very high performance 14nm GPUs very early in the cycle. The interposer will introduce latency, but as I keep reading on this forum, "GPU latency can be designed around".

In my opinion the advantages are compelling. Surely AMD has thought about it.
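
The yield argument behind the multi-die idea can be sketched with the simple Poisson yield model, Y = exp(-A * D0). The defect density below is a made-up placeholder for an immature process, not a real 14nm or 16nm figure, and the model ignores interposer cost, assembly yield and die harvesting. The function names and the 450 vs. 2 x 225 mm^2 comparison are illustrative assumptions only.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under the Poisson model Y = exp(-A * D0)."""
    return math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)


def silicon_per_good_product(die_area_mm2: float, dies_per_product: int,
                             defects_per_cm2: float) -> float:
    """Wafer area (mm^2) consumed per working product.

    Assumes dies are tested before assembly, so a bad die only wastes itself.
    """
    good_fraction = poisson_yield(die_area_mm2, defects_per_cm2)
    return dies_per_product * die_area_mm2 / good_fraction


D0 = 0.5  # hypothetical defects per cm^2, chosen only for illustration

monolithic = silicon_per_good_product(450, 1, D0)  # one big die
multi_die = silicon_per_good_product(225, 2, D0)   # two dies on an interposer

print(f"monolithic 450 mm^2: {monolithic:.0f} mm^2 of wafer per good product")
print(f"2 x 225 mm^2:        {multi_die:.0f} mm^2 of wafer per good product")
print(f"silicon saved:       {1 - multi_die / monolithic:.0%}")
```

The advantage shrinks toward zero as D0 falls, which matches the Xilinx point quoted above: splitting a large design pays off early in a node's life, when yields are poor.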
 

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
How did a 90W GTX950 get up to 140W?

It's total system power usage with an i7 4790 and 950 vs. i7 4790 and Polaris. In the video, the NVIntel system actually hits 152-154W at the wall. In AMD's slides, they were actually nicer to the 950 system since they quoted 140W.
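
For a feel of how far a wall reading sits from card-only power, here is a rough sketch. The PSU efficiency and the rest-of-system draw are hypothetical placeholders, since neither the slide nor the video gives them, and real PSU efficiency varies with load. Only the 140W wall figure comes from the discussion above.

```python
def card_power_estimate_w(wall_power_w: float,
                          rest_of_system_dc_w: float,
                          psu_efficiency: float = 0.85) -> float:
    """Estimate card power from a wall reading.

    wall_power_w * psu_efficiency approximates the DC power delivered to the
    components; subtracting everything that is not the card leaves the card.
    """
    return wall_power_w * psu_efficiency - rest_of_system_dc_w


def card_power_delta_w(wall_a_w: float, wall_b_w: float,
                       psu_efficiency: float = 0.85) -> float:
    """DC-side difference between two systems that differ only in the card."""
    return (wall_a_w - wall_b_w) * psu_efficiency


# 140 W is the wall figure quoted for the GTX 950 system; 50 W for the rest of
# the system and 85% PSU efficiency are assumptions for illustration only.
print(f"{card_power_estimate_w(140, 50):.0f} W estimated for the card alone")
```

The point is only that a wall number well above a card's TDP is not a contradiction, and that perf/W computed at the wall understates the card-level gap between two otherwise identical systems.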

Also, the other common mistake you made is equating TDP with power usage. No matter how many times it's been stated that TDP does not mean actual power usage, the two keep being used interchangeably. This is especially misleading for NV, which is known to low-ball its TDP figures.
https://www.techpowerup.com/reviews/EVGA/GTX_950_SSC/28.html

Just like 99% of after-market 970s do not draw the 145W TDP but more like 170-185W. By now you should have noticed that NV's TDP has little to do with what actual NV products draw.

https://www.youtube.com/watch?v=5g3eQejGJ_A

Easily. Especially since 16FF+ is better than 14LPP.

There is no way it is that simple, for several reasons.

#1. Samsung's node may work better for AMD's dense architectures aimed at lower clocks.

#2. The cost structure alone could mean AMD may be able to afford a larger 14nm Samsung die than a more expensive 16nm FinFET TSMC die. It's possible that, to lure new customers, Samsung offered some big incentives to use their fabs and gain a new book of business. The cost structure is also tied to the WSA with GloFo, which could mean AMD's penalties for not fulfilling the agreement made it a necessity to split manufacturing across more than one fab partner.

#3. I have not seen any evidence to conclusively state that 16FF+ is better than 14LPP in all key metrics.

[Image: FeatureSizes1.png]

http://www.extremetech.com/computing/215217-samsung-tsmc-both-fab-apples-a9-processor-report

#4. In many cases, AMD's problem has been supply and yields. At the end of the day, you can have a product that's 5-10% faster at Fab #2, but if it takes 1-2 quarters more to launch it, AMD risks losing some key OEM design wins around key launch dates of certain products. For instance, there will be new laptops around the Kaby Lake launch date and around Apple's refresh cycle in early summer 2016. It's probably a good idea to try to secure those design wins by at least showing up on time. For AMD, showing up on time is arguably way more crucial than outright winning. Had 290/290X and then 380/380X/390/390X/Fury/Fury X launched 3-6 months earlier, the current 28nm generation would have played out differently. For AMD, showing up earlier is FAR more important than it is for Intel/NV.

Even if there could be extra performance with TSMC, it may not be the best strategy in the context of AMD's GPU turnaround strategy. Cards with very mediocre performance and price/performance like 750/750Ti obliterated AMD's market share this generation in the <=$150 pricing segments, well before 970/980 even showed up. Unless we have a direct way of knowing the launch dates, wafer pricing, wafer supply and yields of both 14nm and 16FF+, it's probably not that easy to conclude that one is definitively better than the other for AMD.
 

LTC8K6

Lifer
Mar 10, 2004
28,520
1,575
126
It's wall power.

Well, it specifically says that each card consumed...not "the system consumed"...

And it talks about performance per watt, just as they said about Fiji cards.

It may indeed mean wall power, but there's nothing to indicate that in the slide.