From where would AMD pull off 2x the performance/watt on 28nm? You realize how architectures work, right? GCN was AMD's major redesign in 2011, and they have stated on many occasions that they are not changing this architecture. GCN will see gradual improvements in all areas, but the core architecture will remain. If you look at AMD's history, they tend to keep the same fundamental base and grow the functional units, with upgrades to memory bandwidth, geometry, and texture/color fill rate efficiency. Their last VLIW architecture lasted a very long time.
RS, the 2x perf/watt at 28nm claim is not correct. Nvidia compares a 398 sq mm GM204 with a 294 sq mm GK104 and concludes they got 2x perf/watt. You have to normalize for die size and wattage before you can call it a true 2x perf/watt scaling, and once you do, it isn't. If 2x perf/watt were real, GM200 should double the performance of GK110. But given that GM200 is likely to be 600+ sq mm, 250W TDP, and maybe 50% faster than GTX 780 Ti, it's clear that 2x perf/watt is just marketing talk.
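To make the normalization point concrete, here's a quick back-of-envelope sketch. The 1.6x performance ratio is my own rough assumption for illustration only; the die sizes and TDPs are the commonly quoted figures:

```python
# Rough sanity check of the "2x perf/watt" claim for GM204 (GTX 980) vs GK104 (GTX 680).
gk104_die_mm2, gm204_die_mm2 = 294.0, 398.0
gk104_tdp_w, gm204_tdp_w = 195.0, 165.0
perf_ratio = 1.6  # assumed GM204-vs-GK104 performance, illustration only

perf_per_watt_gain = perf_ratio * (gk104_tdp_w / gm204_tdp_w)
perf_per_mm2_gain = perf_ratio * (gk104_die_mm2 / gm204_die_mm2)

print(f"perf/watt gain: {perf_per_watt_gain:.2f}x")  # ~1.89x, short of a clean 2x
print(f"perf/mm^2 gain: {perf_per_mm2_gain:.2f}x")   # ~1.18x once die size is normalized
```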
Nvidia did achieve a significant improvement in performance per shader and perf/sq mm. They did it by improving SM efficiency. Maxwell was a redesign of Kepler aimed at SM efficiency, perf/watt, and perf/sq mm: 1 Maxwell SM (SMM) with 128 cc provides 90% of the performance of a Kepler SMX, which has 192 cc. Kepler improved on Fermi by going with a single clock domain and focusing on perf/watt and perf/sq mm; Maxwell refined it. The changes are significant, but the core architectures are not radically different. You get 2 to 3 (max) ground-up new architectures in a decade. For Nvidia it was G80 (late 2006) and GF100 (mid 2010). Mostly the ground-up new architectures are designed around the latest DX API they are targeting: DX10 for G80 and DX11 for GF100. Nvidia's next ground-up new architecture should be Pascal (DX12, HBM, NVLink).
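Taking that 90% figure at face value, the per-core gain is easy to work out (my arithmetic, not an official number):

```python
# Per-core efficiency implied by "1 SMM (128 cc) ~ 90% of 1 SMX (192 cc)".
smx_cores, smm_cores = 192, 128
smm_relative_perf = 0.90  # SMM throughput relative to a full SMX

per_core_gain = smm_relative_perf / (smm_cores / smx_cores)
print(f"perf per CUDA core: {per_core_gain:.2f}x Kepler")  # ~1.35x
```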
Even though GCN 2.0 should bring some improvement in performance/watt, AMD will gain a lot more from a shift to less power hungry HBM, a more efficient 20nm, and WC. AMD doesn't have NV's resources to keep 2-3 separate teams working on next-gen architectures every 3-4 years.
Equating R&D size with ability to compete is not always correct. How is it that AMD executed so well with HD 4870 and HD 5870 and provided some of the most intense competition and price wars the industry has ever seen (especially during the GTX 275/HD 4890 timeframe, when both launched for USD 250)?
With NV, they said that when Fermi launched, a team had already been working on Kepler for 4 years. So obviously it was a different team. Same with Maxwell.
Do you believe Nvidia did not incorporate the lessons from the mistakes made in developing the initial Fermi GF100 chip into future chip projects? These teams might work in parallel, but the lessons learned are definitely what contribute to improvement and drive future product goals. With Fermi GF100, Nvidia must have known from mid-late 2009 that they had problems. They fixed a few of them with GF110 in late 2010. But the major focus on perf/watt and perf/sq mm was definitely driven by the mistakes and lessons of Fermi.
AMD can't do this, but you continue living in the clouds, assuming AMD = NV or Intel in resources. They aren't in the same league, not to mention graphics is not even AMD's main business. They have many product lines, from CPUs to APUs to servers, that require resources and R&D. NV has what, Tegra? They can focus 90% of their R&D and resources ONLY on graphics. That means even with identical R&D budgets and headcount, AMD has less to put towards desktop graphics. How do you keep ignoring this?
Today AMD's future depends even more on graphics than it did a few years back. What is saving the company from death are the semi-custom wins, which were driven primarily by AMD's APU expertise and graphics IP. In fact, Nvidia and AMD are structured quite similarly today:
1.) Both have CPU cores. Nvidia has the custom 64-bit Denver core. AMD has 2 CPU cores in the works - Zen and K12. From 2016, AMD can easily use Zen to address the entire breadth of their product requirements. AMD is not interested in smartphones; they are interested in tablets, notebooks and desktops. So if a Broadwell core can be used in 4.5W Core M SOCs all the way up to 80W desktop SOCs and even 150W Xeon server chips, then Zen can be used to drive all of AMD's SOCs from 4.5W to 95W.
2.) Both AMD and Nvidia develop GPU cores and standalone discrete GPUs.
3.) Both AMD and Nvidia develop SOCs - Nvidia with Tegra, AMD with x86-64 and ARMv8 SOCs.
I agree AMD has fewer resources, but that does not mean they cannot compete.
Therefore, for AMD to keep up, they need to take a lot more risks in adopting new technologies to overcome their resource disadvantage.
Even though there are a lot of challenges, AMD has a lot of opportunity to grow their business thanks to x86. I am looking forward to a well designed quad core x86-64 Zen SOC with 1024 GCN 2.0 cores and 128 GB/s of HBM bandwidth. That would most likely be faster than HD 7870, and CPU performance should be competitive with an Intel Haswell Core i5.
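A quick back-of-envelope on that speculation (HD 7870's specs are public; the APU clock needed to match it is just derived arithmetic, and real-world results would also depend on the 128 GB/s of bandwidth):

```python
# What GPU clock would a 1024-SP GCN APU need to match an HD 7870 on paper?
hd7870_sps, hd7870_clock_ghz = 1280, 1.0
hd7870_tflops = hd7870_sps * 2 * hd7870_clock_ghz / 1000  # 2 FLOPs/SP/cycle (FMA)

apu_sps = 1024
required_clock_ghz = hd7870_tflops * 1000 / (apu_sps * 2)

print(f"HD 7870: {hd7870_tflops:.2f} TFLOPS")                 # 2.56 TFLOPS
print(f"APU needs ~{required_clock_ghz:.2f} GHz to match it") # ~1.25 GHz
```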
That's why AMD is way more likely to try more exotic options first like HBM, WC and 20nm. They don't have time or money to do major GPU architectural redesigns. We are going to get GCN 2.0, then 3.0, maybe 4.0 before AMD ditches GCN.
I would not presume to predict what AMD will do in the future. But if history is any indication, GCN iterations will be here till 2017-2018. AMD has had 3 major architectures in the past 12 years: R300 in 2002 (9700, 9800, X1800, X1900 series), R600 in 2005 (Xbox 360, HD 2900XT, HD 4870, HD 5870, HD 6970) and GCN in 2012. That does not mean these iterations will not bring significant improvements in architectural efficiency.
I guess what they meant in that article is that yields are too low and the risk of defects too high for the node to produce large-die high performance GPUs in 2015. They are basically saying AMD and NV are going to skip 20nm entirely. This contradicts Lisa Su's statements that AMD will have graphics on the 20nm node. I can't imagine AMD building a 550mm2 die on 28nm, which is what they'd realistically need to compete. Imagine how hot and power hungry such a chip would be, after the 438mm2 Hawaii. If they reduce transistor density like NV did, they will have more difficulty scaling SPs, TMUs, etc.
Lisa did not state that they are making GPUs specifically. She said they are designing in 20nm, but stopped short of confirming which exact products are moving to 20nm. AMD did confirm the Skybridge 20nm x86-64 and ARMv8 SOCs, but we know those are low power and small die size. Btw, what makes you think AMD cannot improve perf/shader, perf/CU, and efficiency per CU? What makes you think AMD cannot go to a 128 stream processor CU design and improve perf/sq mm and perf/watt?
Let me throw this in for speculation: 32-wide SIMDs, 4 SIMDs per CU (so 128 SPs per CU), 8 CUs per shader engine, 4 shader engines.
So the total CU count would remain the same as in HD 7970 (32 CUs), but with double the SPs per CU.
Shader engine, geometry engine and raster engine counts would remain the same as in Hawaii, but they too could sport performance and efficiency improvements.
http://www.anandtech.com/show/5261/amd-radeon-hd-7970-review/3
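Working out the totals from that speculated layout (just my arithmetic on the numbers above):

```python
# Totals implied by the speculated layout:
# 32-wide SIMDs, 4 SIMDs/CU, 8 CUs/shader engine, 4 shader engines.
simd_width, simds_per_cu = 32, 4
cus_per_engine, engines = 8, 4

sps_per_cu = simd_width * simds_per_cu  # 128, double GCN 1.x's 64 per CU
total_cus = cus_per_engine * engines    # 32, same as Tahiti (HD 7970)
total_sps = sps_per_cu * total_cus      # 4096

print(total_cus, sps_per_cu, total_sps)  # 32 128 4096
```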
So you are looking at much better perf/sq mm and perf/watt due to the improved SIMD and CU layout. Add architectural improvements bringing 10-20% in perf/shader. Combine that with the better memory efficiency from Tonga (1.4x) and 60% higher raw memory bandwidth (512 GB/s on R9 390X vs 320 GB/s on R9 290X): 1.6 x 1.4 = 2.24x the effective bandwidth, which works out to 2.24 / (4096 / 2816) = 1.54 times the effective memory bandwidth per shader compared to Hawaii. Add improved ROP performance as seen in Tonga, plus tessellation improvements and other tweaks.
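The same numbers as a quick calculation (the 1.4x Tonga efficiency figure and 512 GB/s are the speculative inputs from the paragraph above):

```python
# Effective memory bandwidth per shader: speculated R9 390X vs R9 290X (Hawaii).
hawaii_bw_gbps, r390x_bw_gbps = 320.0, 512.0
tonga_efficiency_gain = 1.4  # effective-bandwidth multiplier seen in Tonga
hawaii_sps, r390x_sps = 2816, 4096

effective_bw_gain = (r390x_bw_gbps / hawaii_bw_gbps) * tonga_efficiency_gain  # 1.6 * 1.4 = 2.24
per_shader_gain = effective_bw_gain / (r390x_sps / hawaii_sps)                # 2.24 / ~1.45

print(f"effective bandwidth gain: {effective_bw_gain:.2f}x")  # 2.24x
print(f"per shader: {per_shader_gain:.2f}x Hawaii")           # ~1.54x
```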
Remember, if AMD achieves 50% better perf/watt and 50% better performance than R9 290X, they would have an extremely competitive GPU against GM200. Not to mention they already have superior multi-GPU scaling wrt CF vs SLI. My gut says AMD has gone with GF 28SHP (which also brings reduced leakage, as seen in Beema/Mullins).
http://www.anandtech.com/show/7974/...hitecture-a10-micro-6700t-performance-preview
"
AMD claims a 19% reduction in core leakage/static current for Puma+ compared to Jaguar at 1.2V, and a 38% reduction for the GPU. The drop in leakage directly contributes to a substantially lower power profile for Beema and Mullins."
AMD is making Kaveri, semi-custom game console chips, and GPUs on GF 28SHP. Just to remind you, GF, Amkor and Hynix have been partnering with AMD since 2011 on 2.5D stacking.
http://sites.amd.com/la/Documents/TFE2011_001AMC.pdf
http://www.amd.com/Documents/TFE2011_006HYN.pdf
http://www.amkor.com/index.cfm?objectid=E6A2243B-0017-10F6-B680958B1E902E87
http://electroiq.com/blog/2013/12/amd-and-hynix-announce-joint-development-of-hbm-memory-stacks/
http://www.setphaserstostun.org/hc2...Bandwidth-Kim-Hynix-Hot Chips HBM 2014 v7.pdf
2.5D stacking brings a fundamental change to the way GPUs are built: it is no longer just about the foundry partner, but about the alliance of partners you have worked with for years to bring the solution to market.