Qualcomm 615, an ARM little.LITTLE?

Roland00Address

Platinum Member
Dec 17, 2008
2,196
260
126
So I just learned that the Qualcomm 615 is an octa-core based on Cortex-A53. I already knew it was an octa-core prior to yesterday, but ...

What I just learned is that it is two clusters of four Cortex-A53 cores, with one cluster running at up to 1.0 GHz and the other at up to 1.7 GHz. The clusters can be used simultaneously, but very little software will take advantage of 8 cores, and furthermore you likely could not run all 8 cores at max clocks in many devices for an extended period of time, since ARM SoCs rely on burst behavior much like Intel does. In other words, Qualcomm purposefully capped the second cluster at 1.0 GHz instead of a higher frequency and is using the lower-frequency cluster for power savings.

So my question is: how does this make sense beyond the marketing standpoint of being able to put "octa-core" on your materials? Why would you use a Qualcomm 615 and deal with the hassle of cluster-switching between 1.0 GHz and 1.7 GHz instead of just using the Qualcomm 610, which is a single cluster of Cortex-A53 that runs up to 1.7 GHz, the same clock speed as the 615?

Is Qualcomm just being stupid and making a part to keep the OEMs happy, since OEMs love small die sizes and marketing BS?
 

witeken

Diamond Member
Dec 25, 2013
3,899
193
106
The 6xx series has become really pathetic; most of the mobile SoC market has, actually. The new S610 and S615 are actually downgrades from the S600. Where have the A9s gone? Where are the new A12/A17s? The A9s are old, but at 20nm they should still be quite a bit faster than the in-order cores Qualcomm and the whole market are shoving down our throats.

The market has stopped innovating in the low end, and now it's up to Intel to change that. I'd rather have a dual-core SoFIA than a slow octa-core.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
I think Qualcomm just went full AMD D:

 

Nothingness

Diamond Member
Jul 3, 2013
3,316
2,386
136
Where are the new A12/A17s?
They are there, but indeed not that numerous. Look for 3085 and 3086 on Geekbench (3086 is the Cortex-A17 CPU ID, and 3085 is the A12). Lenovo and Meizu are using the MT6595 and the scores are very good.

They are humiliating the poor Intel CPU in particular, despite the lack of AES: http://browser.primatelabs.com/geekbench3/compare/1189293?baseline=1131578. I guess things will get worse with SoFIA. OTOH the MT6595 (and Cortex-A17 in general) is not really low-end but rather mid-range.

As far as the 8-core thing goes, I agree with you, it is utterly stupid. 2 or 3 beefier cores is the way to go IMHO.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
Having two clusters makes sense because it allows separate clock domains and voltage rails for each cluster. This allows some level of asynchronous DVFS, which lets certain workloads operate more efficiently. It makes sense that Qualcomm would want this, since their Krait CPUs operate on independent voltage rails per core. Per-cluster is a good compromise between fully synchronous DVFS and fully asynchronous per-core DVFS. (Krait suffered from a lot of L2 cache latency because its L2 ran on a separate clock and voltage domain, while the two clusters here don't share an L2 cache.)

It's also possible that the layouts have been optimized differently, so that the cluster limited to 1 GHz is more power efficient at lower frequencies. That would be similar to the approach nVidia took with their "companion core."

The two options aren't just cluster switching or running all 8 cores; those are only the two extremes. You can have any mix of CPUs from either cluster running. Anything less than 4+4 would mean you couldn't run even 4 cores simultaneously with full flexibility as to which clock and voltage domain each one uses.
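
To put rough numbers on the shared-rail vs. per-cluster point, here's a toy model in Python; the capacitance and voltage figures are invented for illustration, not actual Snapdragon 615 numbers:

```python
# Toy model of why per-cluster DVFS helps: dynamic power goes as
# C * V^2 * f, and on a shared rail every core pays the voltage the
# fastest core needs. All figures below are assumed, for illustration.

VOLTAGE = {1.0e9: 0.90, 1.7e9: 1.10}  # Hz -> volts needed (assumed)
CAP = 1.5e-10                         # effective switched capacitance per core (assumed)

def dyn_power(freq_hz, volts):
    """Dynamic power of one core: P = C * V^2 * f."""
    return CAP * volts ** 2 * freq_hz

# Workload: one busy core wants 1.7 GHz, three light threads need only 1 GHz.
busy, light = 1.7e9, 1.0e9

# One shared voltage/clock domain: all four cores run at the busy core's point.
shared = 4 * dyn_power(busy, VOLTAGE[busy])

# Per-cluster domains: busy core on the fast cluster, light threads on the
# slow cluster at its own lower voltage.
per_cluster = dyn_power(busy, VOLTAGE[busy]) + 3 * dyn_power(light, VOLTAGE[light])

print(f"shared rail: {shared * 1e3:.0f} mW, per-cluster: {per_cluster * 1e3:.0f} mW")
```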

The A9s are old, but at 20nm they should still be quite a bit faster than the in-order cores Qualcomm and the whole market are shoving down our throats.

I think you're underestimating the A53. Having some instruction reordering doesn't make the A9 universally better; for example, its decoupled L2 will have higher latency than the A53's. In practice I think they have similar perf/MHz (sometimes the A53 is worse, sometimes better), with the A53 probably being more power efficient. And it has the advantage of supporting 64-bit, which is a performance benefit as well.

And it's not like the S610 or S615 are on 20nm either; in fact they're not even on 28HPM but 28LP, the process Qualcomm first introduced Krait on in January 2012, which explains the clock limits a little better. No one wants to use a leading-edge process like TSMC 20nm (and pay a price premium) with a now-old core like the Cortex-A9.
 

witeken

Diamond Member
Dec 25, 2013
3,899
193
106
I think you're underestimating the A53. Having some instruction reordering doesn't make the A9 universally better; for example, its decoupled L2 will have higher latency than the A53's. In practice I think they have similar perf/MHz (sometimes the A53 is worse, sometimes better), with the A53 probably being more power efficient. And it has the advantage of supporting 64-bit, which is a performance benefit as well.

And it's not like the S610 or S615 are on 20nm either; in fact they're not even on 28HPM but 28LP, the process Qualcomm first introduced Krait on in January 2012, which explains the clock limits a little better. No one wants to use a leading-edge process like TSMC 20nm (and pay a price premium) with a now-old core like the Cortex-A9.

I didn't specifically mean the A53, but rather the A7. It's slower per clock and runs at lower frequencies. I feel like there isn't really anything mid-range on the market. That will fortunately change with the new competition in '15.
 

NTMBK

Lifer
Nov 14, 2011
10,461
5,846
136
I didn't specifically mean the A53, but rather the A7. It's slower per clock and runs at lower frequencies. I feel like there isn't really anything mid-range on the market. That will fortunately change with the new competition in '15.

The A7 is fine, frankly. I have a quad 1.2 GHz A7 in my Moto G, and it's genuinely fine. Browses the internet fine, plays back video fine, runs podcasts fine, sends text messages fine. I don't see what more I would really want from my smartphone processor. I would rather double the RAM so it didn't need to reload apps and tabs so often.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
I didn't specifically mean the A53, but rather the A7. It's slower per clock and runs at lower frequencies. I feel like there isn't really anything mid-range on the market. That will fortunately change with the new competition in '15.

Kind of weird to complain about the S610 and S615 and ask where the A9s have gone, then say you're not even talking about A53s.

The A7 was never positioned as a replacement for or successor to the A9, and the A9 itself was never positioned as low end (even the Tegra 4i was trying to be mid-range, although it bombed pretty hard). Look at Qualcomm's low-end offerings going back several years: they migrated from ARM11 to Cortex-A5 to Cortex-A7.

They've also moved the 400 series to Cortex-A53, meaning they moved on from the A7 in 2014.

I agree the numbering is a little weird now on the 600 series, since it goes from an up-to-1.9 GHz Krait 300 to a 1.7 GHz Cortex-A53, but they're kind of in an awkward position with that one. Maybe they could have been more aggressive with the clock speeds; MediaTek says the MT6795 can clock all the way up to 2.2 GHz.
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,596
136
Having two clusters makes sense because it allows separate clock domains and voltage rails for each cluster. This allows some level of asynchronous DVFS, which lets certain workloads operate more efficiently. It makes sense that Qualcomm would want this, since their Krait CPUs operate on independent voltage rails per core. Per-cluster is a good compromise between fully synchronous DVFS and fully asynchronous per-core DVFS. (Krait suffered from a lot of L2 cache latency because its L2 ran on a separate clock and voltage domain, while the two clusters here don't share an L2 cache.)

It's also possible that the layouts have been optimized differently, so that the cluster limited to 1 GHz is more power efficient at lower frequencies. That would be similar to the approach nVidia took with their "companion core."

The two options aren't just cluster switching or running all 8 cores; those are only the two extremes. You can have any mix of CPUs from either cluster running. Anything less than 4+4 would mean you couldn't run even 4 cores simultaneously with full flexibility as to which clock and voltage domain each one uses.



I think you're underestimating the A53. Having some instruction reordering doesn't make the A9 universally better; for example, its decoupled L2 will have higher latency than the A53's. In practice I think they have similar perf/MHz (sometimes the A53 is worse, sometimes better), with the A53 probably being more power efficient. And it has the advantage of supporting 64-bit, which is a performance benefit as well.

And it's not like the S610 or S615 are on 20nm either; in fact they're not even on 28HPM but 28LP, the process Qualcomm first introduced Krait on in January 2012, which explains the clock limits a little better. No one wants to use a leading-edge process like TSMC 20nm (and pay a price premium) with a now-old core like the Cortex-A9.

Thanks for the great explanation!
It went from making zero sense to making perfect sense. Lol.
 

videogames101

Diamond Member
Aug 24, 2005
6,783
27
91
When you target 1.7 GHz, you make compromises in the implementation phase in order to meet the frequency goal: you'll use lower-Vt cells, higher drive currents, etc. In making these changes you take a hit to power consumption even when you run the thing at a lower frequency, say 1 GHz.

When you target 1 GHz you can use higher-Vt cells and lower drive currents, and generally hit a much lower power number than a cluster that was targeted at 1.7 GHz but is running at 1 GHz.

So there is some sense in shoving two quad-core clusters in there, especially if area is basically free and if you do power gating (using sleep transistors, etc.) on the cluster that's not active.
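
A back-of-the-envelope sketch of that trade-off; every figure here is made up to show the shape of the argument, not a measured number:

```python
# Two implementations of the same core compared at the same 1 GHz clock:
# the speed-targeted (low-Vt) build leaks heavily and needs a bit more
# voltage margin, while the power-targeted (high-Vt) build leaks little.
# All figures are invented for illustration.

FREQ = 1.0e9   # both clusters clocked at 1 GHz for the comparison
CAP = 1.5e-10  # effective switched capacitance per core (assumed)

# name -> (voltage at 1 GHz, leakage power in watts), both assumed
IMPL = {
    "speed-targeted, closed at 1.7 GHz": (0.95, 0.120),
    "power-targeted, closed at 1.0 GHz": (0.85, 0.015),
}

for name, (volts, leak) in IMPL.items():
    dyn = CAP * volts ** 2 * FREQ
    total = dyn + leak
    print(f"{name}: {dyn * 1e3:.0f} mW dynamic + {leak * 1e3:.0f} mW leakage "
          f"= {total * 1e3:.0f} mW at 1 GHz")
```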
 

Piroko

Senior member
Jan 10, 2013
905
79
91
Reads like a chip specifically designed for the Chinese market; the number eight being a lucky number will probably sell more chips than any quad-core design might, no matter how rubbish the actual chip is.
 

witeken

Diamond Member
Dec 25, 2013
3,899
193
106
ARM's naming scheme looks much saner than Intel's
Then what sane logic is there in having random single-digit names and then magically jumping to 5x names with no pattern? If you have an i7-4770K, then you know the next one will be an i7-5770K (or an i7-5790K if it's the successor of DC).
 

Nothingness

Diamond Member
Jul 3, 2013
3,316
2,386
136
Then what sane logic is there in having random single-digit names and then magically jumping to 5x names with no pattern?
Isn't it obvious that 5x means an ARMv8 CPU?

If you have an i7-4770K, then you know the next one will be an i7-5770K (or an i7-5790K if it's the successor of DC).
Yeah, sure. And how do you relate 1230 to 4770, or to G3240T? What are G3240T and G3320TE? Do you want me to continue? Intel's naming scheme is plain horrible. But at least their ARK site helps a lot.
 

gdansk

Diamond Member
Feb 8, 2011
4,619
7,801
136
The Cortex-A5x series is 64-bit, right? They should have jumped to a Cortex-A6x series, but they went to 5x instead. Who knows why?

The Cortex-A53 is a really nice core if the power figures ARM claims are true: 2.2 DMIPS/MHz, a 64-bit architecture, and lower power consumption than an A9.
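
Taking the quoted figure at face value, the arithmetic is simple enough; the A9 comparison point below uses the commonly quoted ~2.5 DMIPS/MHz and an assumed clock, so treat it as a rough yardstick only:

```python
# Peak Dhrystone throughput from the quoted per-clock figures.
a53_dmips = 2.2 * 1700  # ARM's 2.2 DMIPS/MHz at the S615's 1.7 GHz cap
a9_dmips = 2.5 * 1000   # ~2.5 DMIPS/MHz A9 at an assumed 1.0 GHz low-end clock
print(f"A53 @ 1.7 GHz: {a53_dmips:.0f} DMIPS; A9 @ 1.0 GHz: {a9_dmips:.0f} DMIPS")
```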
 

witeken

Diamond Member
Dec 25, 2013
3,899
193
106
Isn't it obvious that 5x means an ARMv8 CPU?
Then why not use A87 and A83?

Yeah, sure. And how do you relate 1230 to 4770, or to G3240T? What are G3240T and G3320TE? Do you want me to continue? Intel's naming scheme is plain horrible. But at least their ARK site helps a lot.

It might be horrible, but it isn't too bad once you know what K, T, etc. stand for, and once you realize that a higher 00x0 is better, that 0x00 is also a bit better and moves from i3 to i5 to i7, and that x000 is the generation. At least it's consistent, at least from Ivy Bridge to Haswell to Broadwell.
 

Roland00Address

Platinum Member
Dec 17, 2008
2,196
260
126
I understand why having two clusters would make sense if there is something different at the transistor level, where one cluster is optimized for power and the other for switching speed.


What I don't understand is why you would not just scale the voltage up or down with dynamic voltage and frequency scaling if the clusters are identical at the transistor level. You can always scale the 1.7 GHz part down to lower clock speeds such as 1 GHz, or even a few hundred MHz. The only way this would make sense to me is if it is faster to switch from one cluster to the other and then turn off the high-power cluster than it is to change the voltage and frequency. I seriously doubt this is the case, for you would have to copy all that data from one cache to the other, and that would add latency.
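
To put rough numbers on what I mean by just scaling down (the voltage points here are my own assumptions, not published figures):

```python
# Dynamic power goes as V^2 * f, and voltage can drop as frequency drops,
# so power falls much faster than performance. Voltage points are assumed.
CAP = 1.5e-10  # effective switched capacitance (assumed)
points = [(1.7e9, 1.10), (1.0e9, 0.90), (0.3e9, 0.75)]  # (Hz, volts)

peak = CAP * points[0][1] ** 2 * points[0][0]
for freq, volts in points:
    power = CAP * volts ** 2 * freq
    print(f"{freq / 1e9:.1f} GHz @ {volts:.2f} V: {power * 1e3:.0f} mW "
          f"({100 * power / peak:.0f}% power at {100 * freq / points[0][0]:.0f}% clock)")
```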

I understand big.LITTLE; that makes sense to me, for those cores are different at the transistor level, and it makes sense to suffer some latency switching from one cluster to the other in exchange for faster clock speeds once the switch is done, or better battery life if you go the other way. But this little.LITTLE seems like it would rarely make sense beyond the marketing standpoint of saying you have more cores than your opponent.

CPUs are not GPUs; you can't just throw more cores at the problem and hope it will scale, and unless the transistors are different I doubt there would be any significant power savings.

----

We will eventually see devices and find out whether this implementation makes sense, but when I see it, all I can do is a double take and wonder what they were thinking. Maybe I am just not smart enough on the technical level.
 

Nothingness

Diamond Member
Jul 3, 2013
3,316
2,386
136
Then why don't use A87 and A83?
It doesn't have to match the architecture version. All you need to know is that A5x is for ARMv8 64-bit cores :)


It might be horrible, but it isn't too bad once you know what K, T, etc. stand for, and once you realize that a higher 00x0 is better, that 0x00 is also a bit better and moves from i3 to i5 to i7, and that x000 is the generation. At least it's consistent, at least from Ivy Bridge to Haswell to Broadwell.
The first digit isn't the generation; re-read my post, it's far from consistent.
- 1230 represents 3 different generations depending on whether it has no suffix, v2, or v3
- 2680 ditto
- 4770 is a Haswell core
- G3240 is also Haswell
- 2980U is also Haswell
Of course you can make sense of it... But your claim was utterly wrong: looking at the first digit won't tell you the generation ;)
 

witeken

Diamond Member
Dec 25, 2013
3,899
193
106
I didn't realize there were exceptions :p. At least most of the time it will match the generation. I guess they give the low-end CPUs a lower generation number to push people into buying i3/i5/i7. When in doubt, just buy a CPU with the highest first digit and an i3/i5/i7 depending on how much performance and how many threads you want ;).
 

videogames101

Diamond Member
Aug 24, 2005
6,783
27
91
I understand why having two clusters would make sense if there is something different at the transistor level, where one cluster is optimized for power and the other for switching speed.


What I don't understand is why you would not just scale the voltage up or down with dynamic voltage and frequency scaling if the clusters are identical at the transistor level. You can always scale the 1.7 GHz part down to lower clock speeds such as 1 GHz, or even a few hundred MHz. The only way this would make sense to me is if it is faster to switch from one cluster to the other and then turn off the high-power cluster than it is to change the voltage and frequency. I seriously doubt this is the case, for you would have to copy all that data from one cache to the other, and that would add latency.

I understand big.LITTLE; that makes sense to me, for those cores are different at the transistor level, and it makes sense to suffer some latency switching from one cluster to the other in exchange for faster clock speeds once the switch is done, or better battery life if you go the other way. But this little.LITTLE seems like it would rarely make sense beyond the marketing standpoint of saying you have more cores than your opponent.

CPUs are not GPUs; you can't just throw more cores at the problem and hope it will scale, and unless the transistors are different I doubt there would be any significant power savings.

----

We will eventually see devices and find out whether this implementation makes sense, but when I see it, all I can do is a double take and wonder what they were thinking. Maybe I am just not smart enough on the technical level.

If they are truly the same at the transistor level, then you're correct. But I think it's fairly likely that they are different at the transistor level: these are two A53s with the same digital logic, but probably with different transistor implementations to optimize for speed or power.

That's the thing with largely synthesized cores: you can implement them for totally different power/speed/area targets and get very different-looking designs at the transistor level, even though the logic is the same.
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,596
136
If they are truly the same at the transistor level, then you're correct. But I think it's fairly likely that they are different at the transistor level: these are two A53s with the same digital logic, but probably with different transistor implementations to optimize for speed or power.

That's the thing with largely synthesized cores: you can implement them for totally different power/speed/area targets and get very different-looking designs at the transistor level, even though the logic is the same.

After understanding this concept, it looks brilliant to me:

In-order cores, with one cluster tuned for low power on top of that (at the Vt/transistor level). That cluster will probably be used 98% of the active time anyway = battery life in spades = consumer love #1

Synthesized cores that simply reuse the digital logic = dirt cheap to develop

Cores at maybe 0.7mm2 each including L1, on a dirt-cheap 28nm LP process; 4 extra cores hardly make any difference to the total die budget (see the quick check below) = dirt cheap to produce

Marketing power in spades = sells great, as seen with the Tegra 3

From a business perspective this is just a stellar product $$$
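
Quick sanity check on the area point, using my ~0.7mm2-per-core guess and an assumed total die size:

```python
# Area cost of the second quad cluster, with a ~0.7 mm^2/core guess and
# an assumed ~70 mm^2 total die; both numbers are ballpark assumptions.
extra = 4 * 0.7  # mm^2 for four extra A53 cores
die = 70.0       # assumed total SoC die area in mm^2
print(f"{extra:.1f} mm^2 extra = {100 * extra / die:.0f}% of a {die:.0f} mm^2 die")
```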
 