ambidextrous computing; AMD project skyla..skybridge!

NostaSeronx · May 7, 2014

cbn said:
Anyone want to field opinions on why AMD chose Cortex A57 for their Hierofalcon storage server SoC?

It's not a storage server SoC.

It's an embedded High Performance CPU SoC.
http://hardwarebbq.com/wp-content/uploads/2013/09/AMD_EMBEDDED_31.png

cbn · May 8, 2014

NostaSeronx said:
It's not a storage server SoC.

It's an embedded High Performance CPU SoC.
http://hardwarebbq.com/wp-content/uploads/2013/09/AMD_EMBEDDED_31.png

Thanks.

It looks like I was getting Hierofalcon mixed up with Seattle (Opteron A1100):

http://www.anandtech.com/show/7724/...arm-based-server-soc-64bit8core-opteron-a1100


cbn · May 8, 2014

So back to my original question:

Jaguar/Puma+ vs. Cortex A57 for a storage server?

What am I missing here?

Vesku · May 8, 2014

Unlikely they will have to compete head-to-head with Intel in any potential ARM server market. Despite failing as a company, Calxeda did demonstrate there is quite a lot of interest in ARM servers, just not 32-bit ones.

cbn · May 8, 2014

Vesku said:
Unlikely they will have to compete head-to-head with Intel in any potential ARM server market.

Comparing Seattle to Avoton for storage server:

1. Both have 8 small cpu cores
2. Both have integrated LAN
3. Both have a good amount of SATA ports and PCI-E lanes

.....but one will run x86 code and the other ARM code.

Vesku · May 8, 2014

AMD looking to be a market leader in a potential new market. Second fiddle in x86 but first in ARM -

https://technology.ihs.com/452584/arm-servers-aim-higher-despite-currrent-miniscule-market

monstercameron · May 8, 2014

cbn said:
Comparing Seattle to Avoton for storage server:

1. Both have 8 small cpu cores
2. Both have integrated LAN
3. Both have a good amount of SATA ports and PCI-E lanes

.....but one will run x86 code and the other ARM code.

Which doesn't matter in the server market, as you most likely will be running BSD or Linux. You will have the source and be able to compile for that system...

erunion · May 8, 2014

mrmt said:
AMD cannot use TSMC.

Because it would be contrary to previous statements about the WSA? When has the WSA not been confusing and contradictory? I'd say it's actually likely that previous guidance is not followed.

What I find unlikely is GF launching 20 nm at the same time as TSMC.

cbn · May 8, 2014

monstercameron said:
Which doesn't matter in the server market, as you most likely will be running BSD or Linux. You will have the source and be able to compile for that system...

Maybe this is the wrong thread for my question, but as I roughly understand things Jaguar/Puma+ has much better FPU performance compared to something designed by ARM.

Does compiling for ARM (from x86 code) fully compensate for these differences in FPU?

Idontcare · May 8, 2014

erunion said:
Several minor things.

20nm is what caused nvidia's infamous wafer cost revolt.
http://www.extremetech.com/computin...y-with-tsmc-claims-22nm-essentially-worthless

Now the rumor is there will be no 20nm GPUs until 2015. And that Nvidia is back porting Maxwell to 28nm for its 800 series.

And unfortunately I don't have a link, but I remember someone from TSMC saying that in hindsight they probably shouldn't have launched 20nm planar and then transitioned to finfet a year later.
20nm was always going to be a short lived node, but it will be interesting to see if customers decide to skip 20nm and go straight to 16nm

Ah, OK, I recommend you don't use Nvidia as an indicator of TSMC's 20nm HOL (health of line) because there are some things going on which are causing that and it doesn't have much to do with TSMC or its 20nm situation.

Nvidia is having issues adjusting to the reality that the GPU stopped being the fab filler they thought it would always be because TSMC found that the mobile business (anything ARM) is ridiculously higher volume than the discrete GPU business and the customers in mobile to be far less "violently disagreeable" in public.

To put it another way, Nvidia's lack of access to inexpensive 20nm wafer starts is entirely due to supply and demand; with other companies such as Apple and Qualcomm representing a new demand vector willing to pay a price premium to access the precious limited wafer availability during the first 6 months of ramp.

I can tell you 20nm is ramping quite nicely, but like everything in life the wafers are supply constrained and the customers who are willing to pay for them are getting them.

Nvidia's butthurt aside, if they really are worried about 20nm pricing they should scout out Samsung, a bit of a fire-sale going on there right now.

NostaSeronx · May 8, 2014

erunion said:
Because it would be contrary to previous statements about the WSA? When has the WSA not been confusing and contradictory? I'd say it's actually likely that previous guidance is not followed.

28nm-HPP down to 28nm-LPS from GlobalFoundries was bad in an unknown metric. So, AMD decided to continue doing 32nm-SHP designs for an extended amount of time to offset GlobalFoundries' loss. With the usage of 28nm-SHP, AMD is locked to GlobalFoundries, with no future designs going to TSMC unless that design is semi-custom. Semi-custom designs are not purely AMD designs, so those designs don't have the same limitations.

erunion said:
What I find unlikely is GF launching 20 nm at the same time as TSMC.

GlobalFoundries came second but was the cheaper option. TSMC's 20nm started at costs magnitudes higher than 40nm to 28nm, while GlobalFoundries' 20nm in early 2015 will be at the perfect price point.

Idontcare said:
I can tell you 20nm is ramping quite nicely, but like everything in life the wafers are supply constrained and the customers who are willing to pay for them are getting them.

It is artificially constrained; all three of the major TSMC foundries are producing 20nm at full volume.

Those already using 20nm from TSMC have the money to afford an expensive node.

monstercameron · May 8, 2014

cbn said:
Maybe this is the wrong thread for my question, but as I roughly understand things Jaguar/Puma+ has much better FPU performance compared to something designed by ARM.

Does compiling for ARM (from x86 code) fully compensate for these differences in FPU?

I couldn't say, but there is tuning that could be done at compile time for different uarchs.
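As a rough illustration of what that compile-time tuning can look like, here is a minimal sketch that builds GCC command lines for the two cores under discussion. The flags `-march=btver2` (Jaguar family) and `-mcpu=cortex-a57` are real GCC options, but the toolchain name `aarch64-linux-gnu-gcc`, the `-O2` level, and the file names are assumptions chosen only for the example. Tuning flags affect instruction selection and scheduling; they do not change the width of the FPU hardware.

```python
import shlex

# Per-target compiler and tuning flag.
# Assumption: a native x86 GCC plus a typical Linux AArch64 cross toolchain.
TARGETS = {
    "jaguar": {"compiler": "gcc", "tune": "-march=btver2"},
    "cortex-a57": {"compiler": "aarch64-linux-gnu-gcc", "tune": "-mcpu=cortex-a57"},
}


def build_command(target, source, output):
    """Return the compiler invocation (as an argument list) for one target."""
    cfg = TARGETS[target]
    return [cfg["compiler"], "-O2", cfg["tune"], "-o", output, source]


# Print one command line per target for a hypothetical benchmark source file.
for target in TARGETS:
    print(shlex.join(build_command(target, "bench.c", f"bench-{target}")))
```

The same source is compiled twice, once per microarchitecture, which is the sense in which having the source makes the ISA difference less of an obstacle on Linux or BSD.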

NTMBK · May 8, 2014

Idontcare said:
Nvidia's butthurt aside, if they really are worried about 20nm pricing they should scout out Samsung, a bit of a fire-sale going on there right now.

Not surprising when their biggest customer moved over to TSMC for 20nm! (Or so the rumour mill says.)

ams23 · May 8, 2014

Idontcare said:
Nvidia is having issues adjusting to the reality that the GPU stopped being the fab filler they thought it would always be because TSMC found that the mobile business (anything ARM) is ridiculously higher volume than the discrete GPU business and the customers in mobile to be far less "violently disagreeable" in public.

Oh please, "violently disagreeable"? That was an erroneous spin by Extremetech's author. There was nothing violent nor disagreeable about that slide reel (nor was it ever meant for the public), but the reality it spoke of is true.

To put it another way, Nvidia's lack of access to inexpensive 20nm wafer starts is entirely due to supply and demand; with other companies such as Apple and Qualcomm representing a new demand vector willing to pay a price premium to access the precious limited wafer availability during the first 6 months of ramp

This is an overly simplistic argument. NVIDIA's higher end GPUs have much larger die sizes and much higher transistor counts than any ultra mobile SoC or ultra mobile modem product. For a variety of different reasons, it doesn't always make sense to jump at the chance of adopting the most bleeding edge fab process node.

sontin · May 8, 2014

Idontcare said:
Nvidia is having issues adjusting to the reality that the GPU stopped being the fab filler they thought it would always be because TSMC found that the mobile business (anything ARM) is ridiculously higher volume than the discrete GPU business and the customers in mobile to be far less "violently disagreeable" in public.

You should talk to Qualcomm about what they thought about the 28nm shortage two years ago. :sneaky:

BTW: 20nm will be supply constrained and more expensive than 28nm at the beginning. Tegra K1 is on 28nm and has a one generation leap over every other SoC. Maxwell v1 is a generation leap over Kepler on the same node.
If you need 20nm to catch up you are screwed.

NTMBK · May 8, 2014

sontin said:
You should talk to Qualcomm about what they thought about the 28nm shortage two years ago. :sneaky:

I rather suspect that IDC has a better idea of what's going on at TSMC than we do.

jdubs03 · May 8, 2014

Maxwell should certainly be 20nm for the higher-end models, and Erista should be as well. Erista/dual-core on 20nm will be quite the model. But I can't wait til 16nm though for Volta/and whatever they're calling their SOC.

Arachnotronic · May 8, 2014

Idontcare said:
Ah, OK, I recommend you don't use Nvidia as an indicator of TSMC's 20nm HOL (health of line) because there are some things going on which are causing that and it doesn't have much to do with TSMC or its 20nm situation.

Nvidia is having issues adjusting to the reality that the GPU stopped being the fab filler they thought it would always be because TSMC found that the mobile business (anything ARM) is ridiculously higher volume than the discrete GPU business and the customers in mobile to be far less "violently disagreeable" in public.

To put it another way, Nvidia's lack of access to inexpensive 20nm wafer starts is entirely due to supply and demand; with other companies such as Apple and Qualcomm representing a new demand vector willing to pay a price premium to access the precious limited wafer availability during the first 6 months of ramp.

I can tell you 20nm is ramping quite nicely, but like everything in life the wafers are supply constrained and the customers who are willing to pay for them are getting them.

Nvidia's butthurt aside, if they really are worried about 20nm pricing they should scout out Samsung, a bit of a fire-sale going on there right now.

Idontcare

Can you tell us about TSMC's 16 FinFET? When should we expect products in the market based on this process?

I am estimating 1H 2016 for high volume mobile, but I have a sneaking suspicion that if TSMC wants Apple's business it'll need to have FinFETs ready for the iPhone 6s for a 4Q 2015 launch.

erunion · May 8, 2014

Thanks for the insight, IDC.

We've been hearing rumors about Apple leaving Samsung for years, I don't pay much attention to them anymore. But your statement about Samsung having a fire sale sounds like a pretty strong hint that you know something we don't.

Idontcare · May 8, 2014

Intel17 said:
Idontcare

Can you tell us about TSMC's 16 FinFET? When should we expect products in the market based on this process?

I am estimating 1H 2016 for high volume mobile, but I have a sneaking suspicion that if TSMC wants Apple's business it'll need to have FinFETs ready for the iPhone 6s for a 4Q 2015 launch.

Last time I had the opportunity to discuss 16FF with people who would know first hand, development was definitely on track as needed to put production readiness in 2H 2015 timeframe.

As to whether or not they remain on track in terms of making the required/necessary improvements in both yield and reliability between now and then remains to be seen (but there are no crisis-like unresolved barriers at this time impeding the remaining development path).

Furthermore, as to whether or not they will be in a position of pushing the high-volume button for a 4Q 2015 Apple launch is an entirely different question as well that depends on 2015 global financials (not to mention Apple's).

What I can tell you is they are building out their fab capacity for 20nm and 16FF at a rate that is simply unparalleled anywhere else in the world for node-on-node wafer capacity expansion in the history of my history with this industry. It is breathtaking to see the construction sites. Somebody at TSMC is gearing up to push a ridiculously large quantity of 20nm/16FF wafers.

Personally, and this is just me being a jaded and cynical process development engineer of yore, I fully expect TSMC to be late(r) to market relative to what they are currently aiming for and guiding their customers towards. It would just be too unlike TSMC to actually deliver their deliverables on time and on target, so I'm expecting history to repeat (but would love to be surprised to the upside on that).

cbn · May 8, 2014

cbn said:
Comparing Seattle to Avoton for storage server:

1. Both have 8 small cpu cores
2. Both have integrated LAN
3. Both have a good amount of SATA ports and PCI-E lanes

.....but one will run x86 code and the other ARM code.

monstercameron said:
Which doesn't matter in the server market, as you most likely will be running BSD or Linux. You will have the source and be able to compile for that system...

cbn said:
Maybe this is the wrong thread for my question, but as I roughly understand things Jaguar/Puma+ has much better FPU performance compared to something designed by ARM.

Does compiling for ARM (from x86 code) fully compensate for these differences in FPU?

monstercameron said:
I couldn't say, but there is tuning that could be done at compile time for different uarchs.

Here is an Anandtech article I found examining FPU on ARM CPUs.

http://www.anandtech.com/show/6971/exploring-the-floating-point-performance-of-modern-arm-processors

In the big picture, readers may want to know how the floating point capabilities of these cores compare to x86 cores. I consider Intel's Ivy Bridge and Haswell as datapoints for big x86 cores, and AMD Jaguar as a datapoint for a small x86 core. For double-precision (fp64), current ARM cores appear to be limited to 2 flops/cycle for FMAC-heavy workloads and 1 flops/cycle for non-FMAC workloads. Ivy Bridge can have a throughput of up to 8 flops/cycle and Haswell can do 16 flops/cycle with AVX2 instructions. Jaguar can execute up to 3 flops/cycle. Thus, current ARM cores are noticeably behind in this case. Apart from the usual reasons (power and area constraints, very client focused designs), current ARM cores also particularly lag behind in this case because currently NEON does not have vector instructions for fp64. ARMv8 ISA adds fp64 vector instructions and high performance implementations of the ISA such as Cortex A57 should begin to reduce the gap.

For fp32, Ivy Bridge can execute up to 16 fp32 flops/cycle, Haswell can do up to 32 fp32 flops/cycle and AMD's Jaguar can perform 8 fp32 flops/cycle. Current ARM cores can do up to 8 flops/cycle using NEON instructions. However, ARM NEON instructions are not IEEE 754 compliant, whereas SSE and AVX floating point instructions are IEEE 754 compliant. Thus, comparing flops obtained in NEON instructions to SSE instructions is not apples-to-apples comparison. Applications that require IEEE 754 compliant arithmetic cannot use NEON but more consumer oriented applications such as multimedia applications should be able to use NEON. Again, ARMv8 will fix this issue and will bring fully IEEE 754-compliant fp32 vector instructions.

To conclude, Cortex A15 clearly leads amongst the CPUs tested today with Krait 300 very close behind. It is also somewhat disappointing that none of the CPU cores tested displayed a throughput of more than 1 FP instruction/cycle in these tests. I end at a cautionary note that the tests here are synthetic tests that only stress the FP units. Floating point ALU peaks are only a part of a microarchitecture. Performance of real-world applications will depend upon rest of the microarchitecture such as cache hierarchy, out of order execution capabilities and so on. We will continue to make further investigations into these CPUs to understand them better.

So with ARMv8, the FPU is getting better.

......Looking forward to tests comparing Intel's Atom and AMD Jaguar/Puma+ to the stock Cortex A57.
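For a rough sense of scale, the per-cycle figures quoted above from the AnandTech article can be turned into theoretical peaks with the usual formula: peak = flops/cycle x clock x cores. The sketch below does only that arithmetic. The 2.0 GHz clock and 8-core count are hypothetical values picked for illustration, not specifications of any chip mentioned in this thread, and theoretical peaks say nothing about real workloads.

```python
# Peak flops/cycle per core, as quoted from the AnandTech article.
# The fp64 ARM figure is the FMAC-heavy case; the fp32 ARM figure uses NEON,
# which the article notes is not IEEE 754 compliant on the cores it tested.
FLOPS_PER_CYCLE = {
    "fp64": {"current ARM": 2, "Jaguar": 3, "Ivy Bridge": 8, "Haswell": 16},
    "fp32": {"current ARM": 8, "Jaguar": 8, "Ivy Bridge": 16, "Haswell": 32},
}

# Hypothetical values, chosen only so the output has concrete numbers.
ASSUMED_CLOCK_GHZ = 2.0
ASSUMED_CORES = 8


def peak_gflops(flops_per_cycle, clock_ghz, cores=1):
    """Theoretical peak in GFLOPS: flops/cycle x clock (GHz) x cores."""
    return flops_per_cycle * clock_ghz * cores


def relative_to(baseline, precision):
    """Per-clock, per-core throughput of each core relative to a baseline."""
    table = FLOPS_PER_CYCLE[precision]
    return {name: value / table[baseline] for name, value in table.items()}


for precision, table in FLOPS_PER_CYCLE.items():
    ratios = relative_to("Jaguar", precision)
    for name, fpc in table.items():
        gflops = peak_gflops(fpc, ASSUMED_CLOCK_GHZ, ASSUMED_CORES)
        print(f"{precision} {name:12s} {gflops:6.1f} GFLOPS peak, "
              f"{ratios[name]:.2f}x Jaguar per clock")
```

At equal clock and core count, the quoted figures put current ARM cores at two-thirds of Jaguar's fp64 peak and at parity on paper for fp32, with the IEEE 754 caveat from the article applying to the fp32 case.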

Arachnotronic · May 8, 2014

Idontcare said:
Last time I had the opportunity to discuss 16FF with people who would know first hand, development was definitely on track as needed to put production readiness in 2H 2015 timeframe.

As to whether or not they remain on track in terms of making the required/necessary improvements in both yield and reliability between now and then remains to be seen (but there are no crisis-like unresolved barriers at this time impeding the remaining development path).

Furthermore, as to whether or not they will be in a position of pushing the high-volume button for a 4Q 2015 Apple launch is an entirely different question as well that depends on 2015 global financials (not to mention Apple's).

What I can tell you is they are building out their fab capacity for 20nm and 16FF at a rate that is simply unparalleled anywhere else in the world for node-on-node wafer capacity expansion in the history of my history with this industry. It is breathtaking to see the construction sites. Somebody at TSMC is gearing up to push a ridiculously large quantity of 20nm/16FF wafers.

Personally, and this is just me being a jaded and cynical process development engineer of yore, I fully expect TSMC to be late(r) to market relative to what they are currently aiming for and guiding their customers towards. It would just be too unlike TSMC to actually deliver their deliverables on time and on target, so I'm expecting history to repeat (but would love to be surprised to the upside on that).

Thank you, Idontcare.

A follow up if I may be so bold...how do you view the relative positioning of the remaining semiconductor manufacturing players in terms of process technology (Intel, Samsung, TSMC)?

i.e. as an outsider, should I buy Intel's marketing spin that they're 2 years ahead of the industry blah blah or that 16 FinFET+ will offer a magical 15% performance improvement at no power penalty shortly after 16 FinFET hits the market, etc.? Oh and Intel's density claim...that one I'd love to hear addressed by an expert.

How should I think about this space in general? What's the no-BS view from somebody who actually knows what he's talking about?

Thanks!

cbn · May 9, 2014

Regarding node advantage, I wonder how ARM developing multiple cores for ARMv8 could factor into this?

If ARM ends up with six designs for ARMv8 like they did for ARMv7 (Cortex A5, Cortex A7, Cortex A8, Cortex A9, Cortex A12, Cortex A15) they could potentially have every niche between 1 watt and 100 watts covered (with each core design working squarely in a tightly defined efficiency zone).

In contrast, with Intel we are seeing only two CPU core designs. In some cases, the Core series chips are being pushed into very low TDP service (e.g., Y series chips), and I believe the performance per watt is less than what it could be in these conditions.

piesquared · May 9, 2014

I think clearly Intel's 14nm node is delayed longer than expected. It is hurting their competitive position badly. Mobile x86 28nm products are already competing with their 22nm products. There's no reason to think that 20nm products won't be just as good or better than their 14nm products. Then what? Furthermore, there is no way they are going to compete with ARM in efficiency. They'll need 14nm just to shrink the margin of ARM products already on the market for at least a year. Not only that, with the volume of 20nm products flooding into the market, chances are it'll be A LOT cheaper than what Intel can produce. The mobile ecosystem is owned by ARM with BILLIONS of products on the market.

I think AMD's strategy is extremely sound. If X86 contracts it will contract upward, staying just ahead of ARM in performance. ARM will continue to be the mobile dominant force as the ARM ISA is so good and AMD has products for both. Skybridge gives them the opportunity to target both, at a minimal cost to developers.

Lest we forget that almost all 20nm ARM products from AMD, as well as from the rest of the HSA Foundation members, will be HSA products. The benefits of HSA are clear; one just needs to look up the LibreOffice results. The next and final step of AMD's HSA initiative is system integration, which is what Project Skybridge is. It's extremely cool. Jim Keller said something along the lines of 'if you want to work on cutting edge new designs, you come and work for AMD'.

AMD's ambidextrous computing roadmap includes:
"Project SkyBridge" - This design framework, available starting in 2015, will feature a new family of 20 nanometer APUs and SoCs that are expected to be the world's first pin-compatible ARM and x86 processors. The 64-bit ARM variant of "Project SkyBridge" will be based on the ARM Cortex®-A57 core and is AMD's first Heterogeneous System Architecture ("HSA") platform for Android; the x86 variant will feature next-generation "Puma+" CPU cores. The "Project SkyBridge" family will feature full SoC integration, AMD Graphics Core Next technology, HSA, and AMD Secure Technology via a dedicated Platform Security Processor (PSP).
"K12" - A new high-performance, low-power ARM-based core that takes deep advantage of AMD's ARM architectural license, extensive 64-bit design expertise, and a core development team led by Chief CPU Architect Jim Keller. The first products based on "K12" are planned for introduction in 2016.

http://ir.amd.com/phoenix.zhtml?c=74093&p=irol-newsArticle&ID=1926976&highlight=

stuff_me_good · May 9, 2014

Please enlighten me!

So AMD is making its custom ARM core just like Samsung, Nvidia, Qualcomm and all the rest. But what will change and how will it be any better than vanilla? There probably will be a completely new and faster vanilla core by the time AMD has its own ready.

I mean, more cache, custom power savings, what? How much faster will it be?
