Qualcomm moves Cortex A72 to the mid-range

jdubs03

The title seems a bit misleading. The A72 may be used for mid-range devices, but that comes down to more mature processes. With 16FF+ and 14FF-Samsung, that architecture will be used for high-end devices. That to me seems like the differentiating factor.

ARM did mention Maya AND Artemis, so maybe ARM just hasn't shown their cards just yet (though Artemis was supposed to be the lower-power uarch).

to nothingness:
You seem to think having big.LITTLE is a handicap. Why do you think so? I think having very low power cores alongside high perf cores is good to have as long as the big core is power efficient enough, which A72 might be if we are to believe ARM (though only time will tell of course).

I think the concept is quite logical, and there are a myriad of ways of implementing it. Why not 2 Super-ARM cores with x amount of low-power cores? The 4x4 implementation isn't the lone possibility. Maybe 3x3 would make sense, allowing for higher single-core performance while not skimping too much on the low-power cores.
 

Nothingness

big.LITTLE on one hand seems much better for low power.... but for maximum sustained speed at a certain load two or three big cores appear to be a better solution
And this is exactly what Qualcomm will offer: two big cores. Or perhaps you think having b.L will slow down the two big cores?

These A53 cores are basically vestigial while the A72 cores operate... that is a lot of wasted die space.
A53 is really tiny on a SoC :)
 

Valantar

Am I the only one who's seriously underwhelmed by the gains made by the A57? It's not radically faster than the highest-end Krait, while seemingly guzzling power at the same time. The AT A15/A57 comparison, as I read it, spoke of significant gains in only a few areas (like crypto, which has very specific reasons for improving), all the while increasing power draw and heat by at least the same amount. Of course, this is only a single, immature implementation versus a very mature A15 - but the A15 was never really competitive with Krait at the high end in the first place. I'm seriously hoping for the A72 to at least stay within the same power envelope (at both varied and sustained loads) as Krait while outperforming it by at least 20% across the board (not to mention the gains due to handling 64-bit instructions). Otherwise it's just not worth upgrading. I'm very curious as to what Qualcomm comes up with next. At the same time, the A57 article has made me all the more appreciative of the engineering brilliance in the Cyclone architecture. It blows both ARM and QC out of the water in terms of both performance and power scaling. The others have some serious catching up to do.

But more on topic: this seems fitting, to be honest. If the A72 falls in line with the gains made by the A57, then it would look great in a mid-range 2+4 (or even 2+2) setup, seeing how ARM's designs seem unable to approach the power scaling of Cyclone and Krait. Not crazy power hungry even when stressed, but still comparable to today's high end when it comes to performance. Let's just hope that this actually ends up in some decent devices, and doesn't get artificially gimped by OEMs with lackluster hardware in other areas.
 

MisterLilBig

I'm surprised that the A72 is coming this year!

And sadly they will keep doing the 4 weaker A53s... I'd rather they add more to the iGPUs instead!
 

krumme

big.LITTLE on one hand seems much better for low power.... but for maximum sustained speed at a certain load two or three big cores appear to be a better solution. These A53 cores are basically vestigial while the A72 cores operate... that is a lot of wasted die space.

I think Apple's solution of two or three big cores, along with superb software power management, is a more elegant solution than big.LITTLE. In theory big.LITTLE will always be more power efficient, but if the core switching isn't done perfectly it becomes more marketing than a real advantage.

I think the main thing big.LITTLE gives Qualcomm and Samsung is the ability to claim they have 8 cores in their phones vs Apple's "measly" 2 cores. That's where they will see the benefit monetarily - in marketing.

An A53 is 0.7mm², hardly a lot of die space lol

People also don't know/remember that a single Cyclone core = one quad-core cluster of A57 in mm²!
 

krumme

I'm just skeptical. I'd be happy to be proven wrong by an independent site like AT.

The world would look different if AT's prior "independent" tests had somehow resembled reality.
Then the x86 power myth would have been busted, everything would run CT+, and Intel would hardly be losing 4B on mobile.
Sorry, but historically AT was not right.

(edit: in all fairness, Johan was right - but the prior history was wrong)
 

imported_ats

An A53 is 0.7mm², hardly a lot of die space lol

People also don't know/remember that a single Cyclone core = one quad-core cluster of A57 in mm²!

Well, except that 1 Cyclone core gives better performance at significantly lower power than a 4-pack of A57! Seriously, for phones there are only downsides to 4- and 8-core designs. And certainly the issues with big.LITTLE are well documented, well known, and realistically not really solvable.
 

krumme

Well, except that 1 Cyclone core gives better performance at significantly lower power than a 4-pack of A57! Seriously, for phones there are only downsides to 4- and 8-core designs. And certainly the issues with big.LITTLE are well documented, well known, and realistically not really solvable.

Well, I'd prefer a phone with 4x A57 over a single-core Cyclone. But Cyclone is damn impressive. And the dual-core A8 is of course far better, and twice as expensive, so to speak.
But they are different targets.

Go look at single-thread perf of the A8 vs. Samsung's 20nm A57, as Sweepr has shown. And that on 25% of the area. Add extra cores for the rare times they're needed.

It's a shame big.LITTLE is not working yet. But let's have a look in 18 months. Perhaps with the successor to the A72 :)
 

krumme

And 2/4 cyclone perf in midrange sounds nice to me. And all the Chinese producers will dump it on the market so the low end will have it too. Great lol.
 

imported_ats

Well, I'd prefer a phone with 4x A57 over a single-core Cyclone. But Cyclone is damn impressive. And the dual-core A8 is of course far better, and twice as expensive, so to speak.
But they are different targets.

Why? The phone physically can't run all 4 A57 cores without burning up. And the Cyclone core gives you the performance you actually need, which is ST performance.

Go look at single-thread perf of the A8 vs. Samsung's 20nm A57, as Sweepr has shown. And that on 25% of the area. Add extra cores for the rare times they're needed.

The area is basically immaterial even in a phone SoC. The area is generally vastly dominated by non-GPU/non-CPU logic. Power is what is important, and at any given power level Cyclone delivers vastly superior performance, not least because Cyclone can actually run continuously, unlike the A57, as aptly demonstrated by this very site.

It's a shame big.LITTLE is not working yet. But let's have a look in 18 months. Perhaps with the successor to the A72 :)

It's not a shame. It's reality. It's like people saying just give VLIW more time for the compilers to eventually catch up. Except it's been decades and the compilers still haven't caught up. bl relies on something that basically requires precognition.

The fundamental problem with bl is that the cost of switching processes between contexts is too high, and always will be too high, from both a power and a performance perspective. The only way that bl works is if you integrate the bl into a single core. Basically you have your high performance core design that has advanced clock and power gating and de-speculation abilities.

So what you actually do is design a 3-4 wide core that can reduce fetch and fetch-related speculation down to 2 wide when required. You design your OoO queues such that they can be effectively reduced to cover pipeline delays only. You design your various pipelines such that pipelines beyond the bare minimum can be clock and power gated. And what you end up with is in reality what already exists in advanced power-efficient cores like Cyclone, Core ix, etc.

bl therefore is basically a dead end. It is a nice intellectual idea that ignores all the realities.
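
The switching-cost argument above can be put as a break-even check. What follows is only a rough sketch: the `Core` type, the power and throughput figures, and the migration cost are all invented for illustration, not measurements of any real SoC. It compares the energy needed to finish a task where it is against the energy needed to move it and finish it on the other core.

```python
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    power_w: float      # hypothetical active power draw, in watts
    throughput: float   # hypothetical work units completed per second


def energy_j(core: Core, work: float) -> float:
    """Energy in joules to finish `work` units on `core`."""
    return core.power_w * (work / core.throughput)


def migration_pays_off(work: float, src: Core, dst: Core, migration_cost_j: float) -> bool:
    """True if moving the remaining work from src to dst saves energy
    once the one-off migration cost is added in."""
    return energy_j(dst, work) + migration_cost_j < energy_j(src, work)


def break_even_work(src: Core, dst: Core, migration_cost_j: float) -> float:
    """Remaining work (in units) above which migrating src -> dst saves energy."""
    saving_per_unit = src.power_w / src.throughput - dst.power_w / dst.throughput
    return migration_cost_j / saving_per_unit


# Invented figures, for illustration only.
big = Core("big", power_w=1.5, throughput=3.0)
little = Core("LITTLE", power_w=0.2, throughput=1.0)
cost = 0.03  # hypothetical energy cost of one migration, in joules

print(migration_pays_off(0.05, big, little, cost))  # short burst
print(migration_pays_off(1.0, big, little, cost))   # long-running task
print(break_even_work(big, little, cost))
```

The decision depends on `work`, the amount of work still to come, which the scheduler does not know when it has to decide. That is the "precognition" problem described above: a real scheduler can only estimate it from past behavior.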
 

imported_ats

What are the downsides of 4xa53 in a little big concept?

First off, 2x A53 will give you 99.9999999999% of the performance with a significant power reduction. Second, any little big concept is fundamentally flawed, as I've pointed out. For bl/lb to work, it requires precognition and/or zero effect from switching context. Neither of which can exist.

And 2/4 cyclone perf in midrange sounds nice to me. And all the Chinese producers will dump it on the market so the low end will have it too. Great lol.

The low end and high end would be much better off with just ~2x A53 in 99.99% of cases. If they could ever get the power draw of an A57 to a reasonable level, they would be even better off with 1x A57 + 1x A53 and a fairly simple modification to the bog-standard Linux scheduler. The only reason we have 4- and 8-core devices is marketing to the ignorant masses.
 

lopri

But big.LITTLE was conceived not just for phones, at least from ARM's perspective. In MT bench it shows its strength. The latest implementation (Exynos 7420) is fast in ST performance and its MT performance is often 5 times the ST performance or more, per Geekbench subtests.
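
As a back-of-the-envelope reading of that 5x figure, here is a small sketch. It rests on loud assumptions that are not in the Geekbench data itself: all eight cores of the 4+4 Exynos 7420 run at once, the big cores hold their single-thread speed under MT load, and scaling is perfect. `implied_little_perf` is a made-up helper name.

```python
def implied_little_perf(mt_over_st: float, n_big: int = 4, n_little: int = 4) -> float:
    """Per-core LITTLE throughput relative to one big core (big = 1.0).

    Assumes MT score = n_big * 1.0 + n_little * x, with perfect scaling
    and no throttling, then solves for x.
    """
    return (mt_over_st - n_big) / n_little


# MT = 5x ST on a 4+4 design under the assumptions above
print(implied_little_perf(5.0))
```

Under those idealized assumptions each LITTLE core would be adding about a quarter of a big core's throughput. Real chips throttle, so the true split is likely different.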

I have my misgivings on these 8 core little.LITTLE configurations, though. Also I am not sure if 2 + 4 configuration is really all that. I know 2 big + 4 LITTLE has a certain intuitive appeal, but we are talking about general purpose cores that are supposed to run the same instructions. If there is a significant overhead already in 4+4, I wonder if the overhead will be even bigger in an asymmetrical design like 2+4? (stalls, misses, waking up wrong cores, etc.)
 

lopri

I too would like to see more 2+2 designs. (even 1+1 design) I guess A7 and A53 are so small and cheap that OEMs do not feel the need for more optimization there. According to AT's latest investigation, the LITTLE cores are scary small. (0.40mm² for A7, 0.70mm² for A53, on 20nm)
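
To put those per-core figures in cluster terms, a quick sketch using only the two areas quoted above. The assumption, worth stating loudly, is that a cluster's core area is just the per-core area times the core count; L2 cache and interconnect are left out.

```python
# Per-core areas on 20nm, in mm², as quoted in the post above.
AREA_MM2 = {"A7": 0.40, "A53": 0.70}


def cluster_core_area(core: str, count: int) -> float:
    """Total core area for `count` cores; excludes L2 cache and interconnect."""
    return AREA_MM2[core] * count


for count in (1, 2, 4):
    print(f"{count}x A53: {cluster_core_area('A53', count):.2f} mm²")
```

Going from four A53 cores to two would therefore save about 1.4mm² of core area, which goes some way to explaining why OEMs don't bother.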
 

MisterLilBig

Fewer cores sure didn't help the Nexus 9!

People need to stop generalizing. Apple made a great SoC, and I expect it to keep growing. Jeez, it's not like they will stay at 2-3 cores forever.

But what other SoC that has less than 4 cores is great? None!
 

lopri

I agree, MisterLilBig. They are different approaches to different applications. I do not see inherent (dis)advantage of either approach, and in the long run they all seem to gravitate towards 4-thread performance anyway.
 

NTMBK

Asymmetrical core counts imply that they are using global task scheduling, which should work much better than the old cluster migration. Put main app thread on an A72, put the garbage collector on an A53, get a responsive app without blowing up the power budget.
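
A toy sketch of that per-task placement idea follows. It is not how the Linux scheduler implements global task scheduling; the `Task` type, the load estimates, the 0.5 threshold, and the core labels are all hypothetical. Heavy tasks take a big core while one is free, and everything else goes to a LITTLE core.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    load: float  # hypothetical utilisation estimate in [0, 1]


def place(tasks, n_big=2, n_little=4, threshold=0.5):
    """Greedy per-task placement for a 2+4 style layout.

    Returns a dict mapping task name to a core label, or "queued"
    when every core is already taken.
    """
    free_big = [f"A72-{i}" for i in range(n_big)]
    free_little = [f"A53-{i}" for i in range(n_little)]
    placement = {}
    # Consider the heaviest tasks first so they get first pick of big cores.
    for task in sorted(tasks, key=lambda t: t.load, reverse=True):
        if task.load >= threshold and free_big:
            placement[task.name] = free_big.pop(0)
        elif free_little:
            placement[task.name] = free_little.pop(0)
        elif free_big:
            placement[task.name] = free_big.pop(0)
        else:
            placement[task.name] = "queued"
    return placement


tasks = [
    Task("main_thread", 0.9),
    Task("garbage_collector", 0.2),
    Task("audio", 0.1),
]
print(place(tasks))
```

The contrast with cluster migration is that each task is placed on its own, so one heavy thread does not drag every other thread onto the big cluster with it.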
 

Nothingness

So what you actually do is design a 3-4 wide core that can reduce fetch and fetch-related speculation down to 2 wide when required. You design your OoO queues such that they can be effectively reduced to cover pipeline delays only. You design your various pipelines such that pipelines beyond the bare minimum can be clock and power gated. And what you end up with is in reality what already exists in advanced power-efficient cores like Cyclone, Core ix, etc.
Clock gating is used but as far as I know no CPU limits resources the way you describe (width reduction, queue restrictions, etc.) dynamically to reduce power consumption. Power gating is very expensive and is done only on larger blocks (FPU for instance).

I might be wrong but I'll need a serious reference to be convinced :)
 

Hans de Vries

www.chip-architect.com
So Qualcomm may have been one of the lead licensees for the A72 in Q2 2014:


ARM CEO Simon Segars said:
The 41 licensees signed in 2Q14 included 7 additional ARMv8-A processor licenses including lead licensees for two forthcoming processor core codenamed Artemis and Maia. A total of 8 Mali processor licenses were sold including licenses for video and display processors and 20 Cortex-M licenses of cores suitable for microcontrollers.

http://electronics360.globalspec.com/article/4392/strong-licensing-drives-arm-s-q2-results
 

NTMBK

I just read a review of Asus' fanless UX305 with 5Y10. It has ~Haswell-U i3 performance, but with a 3X lower TDP.

Hah, right. For a more realistic analysis of Broadwell-U/Y, take a look at TechReport's review of the Broadwell NUC: http://techreport.com/review/27798/intel-broadwell-powered-nuc-mini-pc-reviewed It gets compared with the Haswell NUC, which has the same form factor and power budget for the APU. It's the fairest comparison we can get, with no massive 13" display sucking down power and throwing power consumption comparisons way out. The result? Broadwell-U is slightly faster than Haswell-U, with ~10-15% power reduction. Not shabby, but none of this 3X madness.