Qualcomm 615, an ARM little.LITTLE?

Roland00Address

Platinum Member
Dec 17, 2008
2,196
260
126
So I just learned that the Qualcomm 615 is an octa-core based on Cortex-A53. I already knew it was an octa-core prior to yesterday, but ...

What I just learned is that it is two clusters of four Cortex-A53 cores, with one cluster running at up to 1.0 GHz and the other at up to 1.7 GHz. The clusters can be used simultaneously, but very little software will take advantage of 8 cores, and furthermore you likely could not run all 8 cores at max clocks in many devices for an extended period of time, since ARM SoCs rely on burst behavior much like Intel does. In other words, Qualcomm purposefully capped the second cluster at 1.0 GHz instead of a higher frequency and is using the lower-frequency cluster for power savings.

So my question is: how does this make sense beyond the marketing standpoint of being able to put "octa-core" on your materials? Why would you use a Qualcomm 615 and deal with the hassle of cluster-switching between 1.0 GHz and 1.7 GHz instead of just using the Qualcomm 610, which is a single cluster of Cortex-A53 that runs up to 1.7 GHz, the same clock speed as the 615?

Is Qualcomm just being stupid and making a part to keep the OEMs happy, since OEMs love small die sizes and marketing BS?
 

witeken

Diamond Member
Dec 25, 2013
3,899
193
106
The 6xx series has become really pathetic; most of the mobile SoC market has, actually. The new S610 and S615 are actually downgrades from the S600. Where have the A9s gone? Where are the new A12/A17s? The A9s are old, but at 20nm they should still be quite a bit faster than the in-order cores Qualcomm and the whole market are shoving down our throats.

The market has stopped innovating in the low end, and now it's up to Intel to change that. I'd rather have a dual-core SoFIA than a slow octa-core.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
I think Qualcomm just went full AMD D:

 

Nothingness

Diamond Member
Jul 3, 2013
3,316
2,386
136
Where are the new A12/A17s?
They are there, but indeed not that numerous. Look for 3085 and 3086 on Geekbench (3086 is the Cortex-A17 CPU ID, and 3085 is the A12). Lenovo and Meizu are using the MT6595 and the scores are very good.

They are humiliating the poor Intel CPU in particular, despite the lack of AES: http://browser.primatelabs.com/geekbench3/compare/1189293?baseline=1131578. I guess things will get worse with SoFIA. OTOH the MT6595 (and Cortex-A17 in general) is not really low-end but rather mid-range.

As far as the 8-core thing goes, I agree with you, it is utterly stupid. 2 or 3 beefier cores is the way to go IMHO.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
Having two clusters makes sense because it allows separate clock domains and voltage rails for each cluster. This allows some level of asynchronous DVFS, which lets certain workloads operate more efficiently. It makes sense that Qualcomm would want this, since their Krait CPUs operate on independent voltage rails per core. Per-cluster is a good compromise between fully synchronous DVFS and fully asynchronous per-core DVFS. (Krait suffered from a lot of L2 cache latency because its L2 ran on a separate clock and voltage domain, while the two clusters here don't share an L2 cache.)

It's also possible that the layouts have been optimized differently, so that the cluster limited to 1 GHz is more power efficient at lower frequencies. That would be similar to the approach nVidia took with their "companion core."

The two options aren't just cluster switching or running all 8 cores; those are only the two extremes. You can have any mix of CPUs from either cluster running. Anything less than 4+4 would mean you couldn't run even 4 cores simultaneously with full flexibility as to which clock and voltage domain each one uses.
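
To put rough numbers on the shared-rail vs. per-cluster point, here's a toy model in Python; the capacitance and voltage figures are invented for illustration, not actual Snapdragon 615 numbers:

```python
# Toy model of why per-cluster DVFS helps: dynamic power goes as
# C * V^2 * f, and on a shared rail every core pays the voltage the
# fastest core needs. All figures below are assumed, for illustration.

VOLTAGE = {1.0e9: 0.90, 1.7e9: 1.10}  # Hz -> volts needed (assumed)
CAP = 1.5e-10                         # effective switched capacitance per core (assumed)

def dyn_power(freq_hz, volts):
    """Dynamic power of one core: P = C * V^2 * f."""
    return CAP * volts ** 2 * freq_hz

# Workload: one busy core wants 1.7 GHz, three light threads need only 1 GHz.
busy, light = 1.7e9, 1.0e9

# One shared voltage/clock domain: all four cores run at the busy core's point.
shared = 4 * dyn_power(busy, VOLTAGE[busy])

# Per-cluster domains: busy core on the fast cluster, light threads on the
# slow cluster at its own lower voltage.
per_cluster = dyn_power(busy, VOLTAGE[busy]) + 3 * dyn_power(light, VOLTAGE[light])

print(f"shared rail: {shared * 1e3:.0f} mW, per-cluster: {per_cluster * 1e3:.0f} mW")
```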

The A9s are old, but at 20nm they should still be quite a bit faster than the in-order cores Qualcomm and the whole market are shoving down our throats.

I think you're underestimating the A53. Having some instruction reordering doesn't make the A9 universally better; for example, its decoupled L2 will have higher latency than the A53's. In practice I think they have similar perf/MHz (sometimes the A53 is worse, sometimes better), with the A53 probably being more power efficient. And it has the advantage of supporting 64-bit, which is a performance benefit as well.

And it's not like the S610 or S615 are on 20nm either; in fact they're not even on 28HPM but 28LP, the process Qualcomm first introduced Krait on in January 2012, which explains the clock limits a little better. No one wants to use a leading-edge process like TSMC 20nm (and pay a price premium) with a now-old core like the Cortex-A9.
 

witeken

Diamond Member
Dec 25, 2013
3,899
193
106
I think you're underestimating the A53. Having some instruction reordering doesn't make the A9 universally better; for example, its decoupled L2 will have higher latency than the A53's. In practice I think they have similar perf/MHz (sometimes the A53 is worse, sometimes better), with the A53 probably being more power efficient. And it has the advantage of supporting 64-bit, which is a performance benefit as well.

And it's not like the S610 or S615 are on 20nm either; in fact they're not even on 28HPM but 28LP, the process Qualcomm first introduced Krait on in January 2012, which explains the clock limits a little better. No one wants to use a leading-edge process like TSMC 20nm (and pay a price premium) with a now-old core like the Cortex-A9.

I didn't specifically mean the A53, but rather the A7. It's slower per clock and runs at lower frequencies. I feel like there isn't really anything mid-range on the market. That will fortunately change with the new competition in '15.
 

NTMBK

Lifer
Nov 14, 2011
10,461
5,846
136
I didn't specifically mean the A53, but rather the A7. It's slower per clock and runs at lower frequencies. I feel like there isn't really anything mid-range on the market. That will fortunately change with the new competition in '15.

The A7 is fine, frankly. I have a quad 1.2 GHz A7 in my Moto G, and it's genuinely fine. Browses the internet fine, plays back video fine, runs podcasts fine, sends text messages fine. I don't see what more I would really want from my smartphone processor. I would rather double the RAM so it didn't need to reload apps and tabs so often.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
I didn't specifically mean the A53, but rather the A7. It's slower per clock and runs at lower frequencies. I feel like there isn't really anything mid-range on the market. That will fortunately change with the new competition in '15.

Kind of weird to complain about the S610 and S615 and ask where the A9s have gone, then say you're not even talking about A53s.

The A7 was never positioned as a replacement for or successor to the A9, and the A9 itself was never positioned as low end (even the Tegra 4i was trying to be mid-range, although it bombed pretty hard). Look at Qualcomm's low-end offerings going back several years: they migrated from ARM11 to Cortex-A5 to Cortex-A7.

They've also moved the 400 series to Cortex-A53, meaning they moved on from the A7 in 2014.

I agree the numbering is a little weird now on the 600 series, since it goes from an up-to-1.9 GHz Krait 300 to a 1.7 GHz Cortex-A53, but they're kind of in an awkward position with that one. Maybe they could have been more aggressive with the clock speeds; MediaTek says the MT6795 can clock all the way up to 2.2 GHz.
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,596
136
Having two clusters makes sense because it allows separate clock domains and voltage rails for each cluster. This allows some level of asynchronous DVFS, which lets certain workloads operate more efficiently. It makes sense that Qualcomm would want this, since their Krait CPUs operate on independent voltage rails per core. Per-cluster is a good compromise between fully synchronous DVFS and fully asynchronous per-core DVFS. (Krait suffered from a lot of L2 cache latency because its L2 ran on a separate clock and voltage domain, while the two clusters here don't share an L2 cache.)

It's also possible that the layouts have been optimized differently, so that the cluster limited to 1 GHz is more power efficient at lower frequencies. That would be similar to the approach nVidia took with their "companion core."

The two options aren't just cluster switching or running all 8 cores; those are only the two extremes. You can have any mix of CPUs from either cluster running. Anything less than 4+4 would mean you couldn't run even 4 cores simultaneously with full flexibility as to which clock and voltage domain each one uses.



I think you're underestimating the A53. Having some instruction reordering doesn't make the A9 universally better; for example, its decoupled L2 will have higher latency than the A53's. In practice I think they have similar perf/MHz (sometimes the A53 is worse, sometimes better), with the A53 probably being more power efficient. And it has the advantage of supporting 64-bit, which is a performance benefit as well.

And it's not like the S610 or S615 are on 20nm either; in fact they're not even on 28HPM but 28LP, the process Qualcomm first introduced Krait on in January 2012, which explains the clock limits a little better. No one wants to use a leading-edge process like TSMC 20nm (and pay a price premium) with a now-old core like the Cortex-A9.

Thanks for the great explanation!
It went from making zero sense to making perfect sense. Lol.
 

videogames101

Diamond Member
Aug 24, 2005
6,783
27
91
When you target 1.7 GHz, you make compromises in the implementation phase in order to meet the frequency goal: you'll use lower-Vt cells, higher drive currents, etc. In making these changes you take a hit to power consumption even when you run the thing at a lower frequency, say 1 GHz.

When you target 1 GHz you can use higher-Vt cells and lower drive currents, and generally hit a much lower power number than a cluster that was targeted at 1.7 GHz but is running at 1 GHz.

So there is some sense in shoving two quad-core clusters in there, especially if area is basically free and if you do power gating (using sleep transistors, etc.) on the cluster that's not active.
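
A back-of-the-envelope sketch of that trade-off; every figure here is made up to show the shape of the argument, not a measured number:

```python
# Two implementations of the same core compared at the same 1 GHz clock:
# the speed-targeted (low-Vt) build leaks heavily and needs a bit more
# voltage margin, while the power-targeted (high-Vt) build leaks little.
# All figures are invented for illustration.

FREQ = 1.0e9   # both clusters clocked at 1 GHz for the comparison
CAP = 1.5e-10  # effective switched capacitance per core (assumed)

# name -> (voltage at 1 GHz, leakage power in watts), both assumed
IMPL = {
    "speed-targeted, closed at 1.7 GHz": (0.95, 0.120),
    "power-targeted, closed at 1.0 GHz": (0.85, 0.015),
}

for name, (volts, leak) in IMPL.items():
    dyn = CAP * volts ** 2 * FREQ
    total = dyn + leak
    print(f"{name}: {dyn * 1e3:.0f} mW dynamic + {leak * 1e3:.0f} mW leakage "
          f"= {total * 1e3:.0f} mW at 1 GHz")
```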
 

Piroko

Senior member
Jan 10, 2013
905
79
91
Reads like a chip specifically designed for the Chinese market; the number eight being a lucky number will probably sell more chips than any quad-core design might, no matter how rubbish the actual chip is.
 

witeken

Diamond Member
Dec 25, 2013
3,899
193
106
ARM's naming scheme looks much saner than Intel's
Then what sane logic is there in having random single-digit names and then magically jumping to 5x names with no pattern? If you have an i7-4770K, then you know the next one will be an i7-5770K (or an i7-5790K if it's the successor of DC).
 

Nothingness

Diamond Member
Jul 3, 2013
3,316
2,386
136
Then what sane logic is there in having random single-digit names and then magically jumping to 5x names with no pattern?
Isn't it obvious that 5x means an ARMv8 CPU?

If you have an i7-4770K, then you know the next one will be an i7-5770K (or an i7-5790K if it's the successor of DC).
Yeah, sure. And how do you relate 1230 to 4770, or to G3240T? What are G3240T and G3320TE? Do you want me to continue? Intel's naming scheme is plain horrible. But at least their ARK site helps a lot.
 

gdansk

Diamond Member
Feb 8, 2011
4,619
7,801
136
The Cortex-A5x series is 64-bit, right? They should have jumped to a Cortex-A6x series, but they went to 5x instead. Who knows why?

The Cortex-A53 is a really nice core if the power figures ARM claims are true: 2.2 DMIPS/MHz, a 64-bit architecture, and lower power consumption than an A9.
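
Taking the quoted figure at face value, the arithmetic is simple enough; the A9 comparison point below uses the commonly quoted ~2.5 DMIPS/MHz and an assumed clock, so treat it as a rough yardstick only:

```python
# Peak Dhrystone throughput from the quoted per-clock figures.
a53_dmips = 2.2 * 1700  # ARM's 2.2 DMIPS/MHz at the S615's 1.7 GHz cap
a9_dmips = 2.5 * 1000   # ~2.5 DMIPS/MHz A9 at an assumed 1.0 GHz low-end clock
print(f"A53 @ 1.7 GHz: {a53_dmips:.0f} DMIPS; A9 @ 1.0 GHz: {a9_dmips:.0f} DMIPS")
```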
 

witeken

Diamond Member
Dec 25, 2013
3,899
193
106
Isn't it obvious that 5x means an ARMv8 CPU?
Then why not use A87 and A83?

Yeah, sure. And how do you relate 1230 to 4770, or to G3240T? What are G3240T and G3320TE? Do you want me to continue? Intel's naming scheme is plain horrible. But at least their ARK site helps a lot.

It might be horrible, but it isn't too bad once you know what K, T, etc. stand for, and once you realize that a higher 00x0 is better, that 0x00 is also a bit better and moves from i3 to i5 to i7, and that x000 is the generation. At least it's consistent, at least from Ivy Bridge to Haswell to Broadwell.
 

Roland00Address

Platinum Member
Dec 17, 2008
2,196
260
126
I understand why having two clusters would make sense if there is something different at the transistor level, where one cluster is optimized for power and the other for switching speed.


What I don't understand is why you would not just scale the voltage up or down with dynamic voltage and frequency scaling if the clusters are identical at the transistor level. You can always scale the 1.7 GHz part down to lower clock speeds such as 1 GHz, or even a few hundred MHz. The only way this would make sense to me is if it is faster to switch from one cluster to the other and then turn off the high-power cluster than it is to change the voltage and frequency. I seriously doubt this is the case, for you would have to copy all that data from one cache to the other, and that would add latency.
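
To put rough numbers on what I mean by just scaling down (the voltage points here are my own assumptions, not published figures):

```python
# Dynamic power goes as V^2 * f, and voltage can drop as frequency drops,
# so power falls much faster than performance. Voltage points are assumed.
CAP = 1.5e-10  # effective switched capacitance (assumed)
points = [(1.7e9, 1.10), (1.0e9, 0.90), (0.3e9, 0.75)]  # (Hz, volts)

peak = CAP * points[0][1] ** 2 * points[0][0]
for freq, volts in points:
    power = CAP * volts ** 2 * freq
    print(f"{freq / 1e9:.1f} GHz @ {volts:.2f} V: {power * 1e3:.0f} mW "
          f"({100 * power / peak:.0f}% power at {100 * freq / points[0][0]:.0f}% clock)")
```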

I understand big.LITTLE; that makes sense to me, for those cores are different at the transistor level, and it makes sense to suffer some latency switching from one cluster to the other in exchange for faster clock speeds once the switch is done, or better battery life if you go the other way. But this little.LITTLE seems like it would rarely make sense beyond the marketing standpoint of saying you have more cores than your opponent.

CPUs are not GPUs; you can't just throw more cores at the problem and hope it will scale, and unless the transistors are different I doubt there would be any significant power savings.

----

We will eventually see devices and find out whether this implementation makes sense, but when I see it, all I can do is a double take and wonder what they were thinking. Maybe I am just not smart enough on the technical level.
 

Nothingness

Diamond Member
Jul 3, 2013
3,316
2,386
136
Then why don't use A87 and A83?
It doesn't have to match the architecture version. All you need to know is that A5x is for ARMv8 64-bit cores :)


It might be horrible, but it isn't too bad once you know what K, T, etc. stand for, and once you realize that a higher 00x0 is better, that 0x00 is also a bit better and moves from i3 to i5 to i7, and that x000 is the generation. At least it's consistent, at least from Ivy Bridge to Haswell to Broadwell.
The first digit isn't the generation; re-read my post, it's far from consistent.
- 1230 represents 3 different generations depending on whether it has no suffix, v2, or v3
- 2680 ditto
- 4770 is a Haswell core
- G3240 is also Haswell
- 2980U is also Haswell
Of course you can make sense of it... But your claim was utterly wrong: looking at the first digit won't tell you the generation ;)
 

witeken

Diamond Member
Dec 25, 2013
3,899
193
106
I didn't realize there were exceptions :p. At least most of the time it will match the generation. I guess they give the low-end CPUs a lower generation number to push people into buying i3/i5/i7. When in doubt, just buy a CPU with the highest first digit and an i3/i5/i7 depending on how much performance and how many threads you want ;).
 

videogames101

Diamond Member
Aug 24, 2005
6,783
27
91
I understand why having two clusters would make sense if there is something different at the transistor level, where one cluster is optimized for power and the other for switching speed.


What I don't understand is why you would not just scale the voltage up or down with dynamic voltage and frequency scaling if the clusters are identical at the transistor level. You can always scale the 1.7 GHz part down to lower clock speeds such as 1 GHz, or even a few hundred MHz. The only way this would make sense to me is if it is faster to switch from one cluster to the other and then turn off the high-power cluster than it is to change the voltage and frequency. I seriously doubt this is the case, for you would have to copy all that data from one cache to the other, and that would add latency.

I understand big.LITTLE; that makes sense to me, for those cores are different at the transistor level, and it makes sense to suffer some latency switching from one cluster to the other in exchange for faster clock speeds once the switch is done, or better battery life if you go the other way. But this little.LITTLE seems like it would rarely make sense beyond the marketing standpoint of saying you have more cores than your opponent.

CPUs are not GPUs; you can't just throw more cores at the problem and hope it will scale, and unless the transistors are different I doubt there would be any significant power savings.

----

We will eventually see devices and find out whether this implementation makes sense, but when I see it, all I can do is a double take and wonder what they were thinking. Maybe I am just not smart enough on the technical level.

If they are truly the same at the transistor level, then you're correct. But I think it's fairly likely that they are different at the transistor level: these are two A53s with the same digital logic, but probably with different transistor implementations to optimize for speed or power.

That's the thing with largely synthesized cores: you can implement them for totally different power/speed/area targets and get very different-looking designs at the transistor level, even though the logic is the same.
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,596
136
If they are truly the same at the transistor level, then you're correct. But I think it's fairly likely that they are different at the transistor level: these are two A53s with the same digital logic, but probably with different transistor implementations to optimize for speed or power.

That's the thing with largely synthesized cores: you can implement them for totally different power/speed/area targets and get very different-looking designs at the transistor level, even though the logic is the same.

After understanding this concept, it looks brilliant to me:

In-order cores, with one cluster tuned for low power on top of that (at the Vt/transistor level). That cluster will probably be used 98% of the active time anyway = battery life in spades = consumer love #1

Synthesized cores that simply reuse the digital logic = dirt cheap to develop

Cores at maybe 0.7mm2 each including L1, on a dirt-cheap 28nm LP process; 4 extra cores hardly make any difference to the total die budget (see the quick check below) = dirt cheap to produce

Marketing power in spades = sells great, as seen with the Tegra 3

From a business perspective this is just a stellar product $$$
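
Quick sanity check on the area point, using my ~0.7mm2-per-core guess and an assumed total die size:

```python
# Area cost of the second quad cluster, with a ~0.7 mm^2/core guess and
# an assumed ~70 mm^2 total die; both numbers are ballpark assumptions.
extra = 4 * 0.7  # mm^2 for four extra A53 cores
die = 70.0       # assumed total SoC die area in mm^2
print(f"{extra:.1f} mm^2 extra = {100 * extra / die:.0f}% of a {die:.0f} mm^2 die")
```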
 