ARM CPU big.LITTLE process transfer - how does it work?

Fjodor2001 · Jan 11, 2013

Hi,

I just wonder how the ARM CPU big.LITTLE concept is supposed to work when transferring a process from one CPU core to another?

For example, the Samsung Exynos 5 Octa SoC has 4xCortex-A7 and 4xCortex-A15 cores. But doesn't the ARM Cortex A15 have instructions that are specifically tailored for that CPU core that the Cortex-A7 cannot perform? I.e. similar to the TSX and AVX2 instructions that Haswell will introduce on x86?

If so, how is it possible to transfer a process from a Cortex-A15 core to a Cortex-A7 core when the OS decides that it should go into low power & low performance mode?

Any idea?

ShintaiDK · Jan 11, 2013

http://www.arm.com/files/downloads/big_LITTLE_Final_Final.pdf

Exophase · Jan 11, 2013

Cortex-A15 and Cortex-A7 are architecturally compatible, at least in user space (not sure if there's anything kernel level that isn't). That doesn't just help big.LITTLE but cross-platform compatibility in general.

Fjodor2001 · Jan 11, 2013

ShintaiDK said:
http://www.arm.com/files/downloads/big_LITTLE_Final_Final.pdf

So short version: ARM Cortex-A7 can execute all instructions that Cortex-A15 can, but does it slower? Is that correct?

Then what about a similar x86 solution that is said to be developed? Can they make a small & power efficient x86 core that still has support for all advanced TSX/AVX/AVX2/etc instructions?

Exophase · Jan 11, 2013

Fjodor2001 said:
So short version: ARM Cortex-A7 can execute all instructions that Cortex-A15 can, but does it slower? Is that correct?

Yes, although saying it does it slower is an over-simplification.

Fjodor2001 said:
Then what about a similar x86 solution that is said to be developed? Can they make a small & power efficient x86 core that still has support for all advanced TSX/AVX/AVX2/etc instructions?

Sure they could. Will they? Right now Intel doesn't want to enable all of its instruction set extensions in every current gen processor SKU, and Atom is further behind. But of course that can change.

One of the reasons why ARM can justify big.LITTLE with Cortex-A15 and Cortex-A7 is that both have useful market segments by themselves, and continue off of work established in other markets (for instance Cortex-A7 is the successor to work done on Cortex-A5, which was coupled to work done on Cortex-R4/R5 which are important in totally different markets). While Intel could do something like Haswell + Silvermont or AMD could do Steamroller + Jaguar going for something overall lower power, more aligned with A15 + A7, would require new CPU cores that are designed specifically for this purpose. And it's a bit more challenging to push an x86 CPU down to A7 perf/area and perf/W levels.

Some people are highly skeptical that big.LITTLE has any merit to begin with and think that one CPU can adequately scale to both levels, and can be accommodated with fully asynchronous multicore. I'm sure there are various pros and cons involved.

Fjodor2001 · Jan 11, 2013

Exophase said:
Sure they could. Will they? Right now Intel doesn't want to enable all of its instruction set extensions in every current gen processor SKU, and Atom is further behind. But of course that can change.

But would it be as good as an ARM big.LITTLE solution? Doesn't e.g. Haswell have more complex instructions than Cortex-A15, making it harder to do a small & power efficient core that still can execute all TSX/AVX2/etc instructions?

Exophase said:
Some people are highly skeptical that big.LITTLE has any merit to begin with and think that one CPU can adequately scale to both levels, and can be accommodated with fully asynchronous multicore. I'm sure there are various pros and cons involved.

You mean it would be equally as good to CPU frequency scale a Cortex-A15 even lower than transferring the process to the Cortex A-7?

Or do you mean that the CPU core should be redesigned to internally switch between low power & low performance vs high power & high performance? So you can keep a single CPU core but still get the effect of the big.LITTLE concept? Would that really be possible without more or less designing a CPU core that actually is 2 cores in 1? I.e. the end result would be the same as big.LITTLE?

Finally, what would be the pros/cons of big.LITTLE?
-The pros would be much better performance & power scaling.
-The cons I can think is that it will have to save and restore the CPU state (registers & interrupts) when transferring a process to another core. But according to the presentation that ShintaiDK linked to that takes only 20.000 CPU cycles (or 20 microseconds at 1 GHz), which is not that much if not done very frequently.
-Anything else?

ShintaiDK · Jan 11, 2013

Another issue with big.LITTLE is the additional die space it takes up and complexity overhead on the software front. If you dont go with the big.LITTLE MP solution, then one type of cores at a time is essentially dead space.

Its gonna be interesting to see tho. x86, PPC, IA64 and MIPS sofar went with the 1 core type solution for power saving and ARM with the 2 type solution. Perhaps the issue is rather that ARM couldn´t get A15 down to the levels required. R&D budget quickly comes to mind besides the issue of being behind process wise. Or maybe ARM just saw the light the others missed. We will see in the near future how it ends.

Fjodor2001 · Jan 12, 2013

ShintaiDK said:
Another issue with big.LITTLE is the additional die space it takes up [...]. If you dont go with the big.LITTLE MP solution, then one type of cores at a time is essentially dead space.

But note that the Cortex-A7 cores are very small, so it's not that much space that potentially can be wasted. Also, as I understand it there is a mode where both the big and LITTLE cores can execute code in parallel. In that case the cores will not be wasted.

See the presentation that you linked to which says:

"Since a big.LITTLE system containing Cortex-A15 and Cortex-A7 is fully coherent through CCI-400 another logical use-model is to allow both Cortex-A15 and Cortex-A7 to be powered on and simultaneously executing code. This is termed big.LITTLE MP, which is essentially Heterogeneous Multi-Processing. Note that in this use model Cortex-A15 only needs to be powered on and simultaneously executing next to Cortex-A7 if there are threads that need that level of processing performance. If not, only Cortex-A7 needs to be powered on."

ShintaiDK said:
...and complexity overhead on the software front

As I understand it big.LITTLE only requires modifications to the OS. Or actually it requires the addition of a "switcher" done in SW to transfer/mitigate processes between cores. And ARM already provides such a "switcher". No modification to the applications should be needed.

So for e.g. mobile phones where the manufacturer controls what OS goes into the device it's just a matter of making sure that big.LITTLE support is enabled and then it won't be a problem. And if big.LITTLE support makes it into the Linux Kernel, Apple iOS and possibly Windows (for Windows on ARM), then the SW problem is more or less solved.

ShintaiDK said:
Its gonna be interesting to see tho. x86, PPC, IA64 and MIPS sofar went with the 1 core type solution for power saving and ARM with the 2 type solution. Perhaps the issue is rather that ARM couldn´t get A15 down to the levels required. R&D budget quickly comes to mind besides the issue of being behind process wise. Or maybe ARM just saw the light the others missed. We will see in the near future how it ends.

How does a 1 core equivalent to the big.LITTLE concept work? Are you just talking about having CPU cores that are power efficient in general, or is there some specific architecture for this? For example do you mean that the single CPU core e.g. internally can switch between low power & low performance mode vs high power & high performance mode?

beginner99 · Jan 12, 2013

Fjodor2001 said:
How does a 1 core equivalent to the big.LITTLE concept work? Are you just talking about having CPU cores that are power efficient in general, or is there some specific architecture for this? For example do you mean that the single CPU core e.g. internally can switch between low power & low performance mode vs high power & high performance mode?

power efficient cores, different dies for different applications and binning. Use those that can run a low voltage for low power use.

The 17W Ivy Bridge and 77W Desktops parts have the same core. And with haswell the 7 W part willl have the same core as the 95 W desktop part.

or what Tegra 4 does. separate A15 core on a separate low-power process and other enhancement that it uses less power (and obviously runs at lower frequencies).

Fjodor2001 · Jan 12, 2013

beginner99 said:
power efficient cores, different dies for different applications and binning. Use those that can run a low voltage for low power use.

The 17W Ivy Bridge and 77W Desktops parts have the same core. And with haswell the 7 W part willl have the same core as the 95 W desktop part.

or what Tegra 4 does. separate A15 core on a separate low-power process and other enhancement that it uses less power (and obviously runs at lower frequencies).

How would that be comparable to the big.LITTLE concept? It's not the same thing at all as I see it.

Also, you will not get as good power & performance scaling as with big.LITTLE that way. For example a 7 W Haswell core will never even be close to as power efficient as a Cortex-A7 core in low performance mode.

See:

ShintaiDK · Jan 12, 2013

Fjodor2001 said:
How would that be comparable to the big.LITTLE concept? It's not the same thing at all as I see it.

Also, you will not get as good power & performance scaling as with big.LITTLE that way. For example a 7 W Haswell core will never even be close to as power efficient as a Cortex-A7 core in low performance mode.

See:

Haswell is not the main competitor. Atom is.

I think his point is that you got 84W chips in 1 end, 7-10W in the other end. And both with SOIx states that can idle in the miliwatts.

Now transform this to Atom and you should get the point. Silvermont with a new uarch should show where Atom stands in the game.

ARM simply choose the big.LITTLE concept instead of making more power efficient cores. The A15 is a guzzler compared to A7.

ShintaiDK · Jan 12, 2013

Fjodor2001 said:
But note that the Cortex-A7 cores are very small, so it's not that much space that potentially can be wasted. Also, as I understand it there is a mode where both the big and LITTLE cores can execute code in parallel. In that case the cores will not be wasted.

See the presentation that you linked to which says:

"Since a big.LITTLE system containing Cortex-A15 and Cortex-A7 is fully coherent through CCI-400 another logical use-model is to allow both Cortex-A15 and Cortex-A7 to be powered on and simultaneously executing code. This is termed big.LITTLE MP, which is essentially Heterogeneous Multi-Processing. Note that in this use model Cortex-A15 only needs to be powered on and simultaneously executing next to Cortex-A7 if there are threads that need that level of processing performance. If not, only Cortex-A7 needs to be powered on."

As I understand it big.LITTLE only requires modifications to the OS. Or actually it requires the addition of a "switcher" done in SW to transfer/mitigate processes between cores. And ARM already provides such a "switcher". No modification to the applications should be needed.

So for e.g. mobile phones where the manufacturer controls what OS goes into the device it's just a matter of making sure that big.LITTLE support is enabled and then it won't be a problem. And if big.LITTLE support makes it into the Linux Kernel, Apple iOS and possibly Windows (for Windows on ARM), then the SW problem is more or less solved.

How does a 1 core equivalent to the big.LITTLE concept work? Are you just talking about having CPU cores that are power efficient in general, or is there some specific architecture for this? For example do you mean that the single CPU core e.g. internally can switch between low power & low performance mode vs high power & high performance mode?

6mm2 here and there. Its the volume that matters. A penny is no money, a billion pennies is alot of money.

The cores are not wasted as I wrote if you use big.LITTLE MP. But for smartphones, I am quite sure the main method will be big.LITTLE and not the MP version.

Only changes to the OS? You make it sound so easy

I think its already explain, ARM uses other types cores to do their powersaving, instead of making their faster cores more energy efficient and the ability to idle/gate/etc in lower states. It could be that ARM simply lacks the IPs or R&D money. Or they just saw something the others missed. But sofar ARM is the one doing "something else". And that can be positive or negative. Only time will tell.

Fjodor2001 · Jan 12, 2013

ShintaiDK said:
6mm2 here and there. Its the volume that matters. A penny is no money, a billion pennies is alot of money.

The cores are not wasted as I wrote if you use big.LITTLE MP. But for smartphones, I am quite sure the main method will be big.LITTLE and not the MP version.

Sure. But on the other hand the device manufacturers will not use a big.LITTLE SoC unless they also put an OS in the device that actually supports big.LITTLE. So in reality, no cores will be wasted.

It's not like with a desktop PC where the end user can install any OS of choice, which possibly could be one that does not support big.LITTLE.

ShintaiDK said:
Only changes to the OS? You make it sound so easy

Actually, it's not THAT complicated if you check out the whitepaper that you linked to previously. Think of it as a somewhat more complicated OS kernel process scheduler.

Also, what I meant was that it's not a very intrusive solution compared to having to redesign or recompile all applications. It's a huge difference. Compare to for example Intel adding TSX/AVX2 instructions in Haswell, where all applications will have to be recompiled to make use of that.

In addition, ARM already provides the necessary "switcher SW" for process transfer.

ShintaiDK said:
I think its already explain, ARM uses other types cores to do their powersaving, instead of making their faster cores more energy efficient and the ability to idle/gate/etc in lower states. It could be that ARM simply lacks the IPs or R&D money. Or they just saw something the others missed. But sofar ARM is the one doing "something else". And that can be positive or negative. Only time will tell.

Couldn't the reason also be that ARM has a different CPU architecture? I.e. it doesn't have as many complex instructions compared to for example modern x86 CPUs such as Haswell. So that makes it easier for ARM to do a low power CPU core that still can support all instructions.

So maybe a big.LITTLE solution is not so easy to archive for Intel with x86, even if they wanted to?

ShintaiDK · Jan 12, 2013

Fjodor2001 said:
Sure. But on the other hand the device manufacturers will not use a big.LITTLE SoC unless they also put an OS in the device that actually supports big.LITTLE. So in reality, no cores will be wasted.

If you only use 4 at a time. Then you do have 4 cores wasted. Simply because only 4 cores actually archive the performance/watt wished. I assume big.LITTLE MP will mostly be used in tablets.

Fjodor2001 said:
It's not like with a desktop PC where the end user can install any OS of choice, which possibly could be one that does not support big.LITTLE.

But it could or could not hinder future updates. The device maker (usually)dont make the OS either. For Samsung its Android, meaning Google.

Fjodor2001 said:
Actually, it's not THAT complicated if you check out the whitepaper that you linked to previously. Think of it as a somewhat more complicated OS kernel process scheduler.

Windows and Bulldozer/Pilediver uarch scheduler. How hard could it be? Or the years spend on getting HT scheduling working.

Fjodor2001 said:
Also, what I meant was that it's not a very intrusive solution compared to having to redesign or recompile all applications. It's a huge difference. Compare to for example Intel adding TSX/AVX2 instructions in Haswell, where all applications will have to be recompiled to make use of that.

In addition, ARM already provides the necessary "switcher SW" for process transfer.

I have no clue what you are trying to point out here. I think you get confused with compiler paths and move actually memory running applications and OS realtime between different types of cores.

Fjodor2001 said:
Couldn't the reason also be that ARM has a different CPU architecture? I.e. it doesn't have as many complex instructions compared to for example modern x86 CPUs such as Haswell. So that makes it easier for ARM to do a low power CPU core that still can support all instructions.

So maybe a big.LITTLE solution is not so easy to archive for Intel with x86, even if they wanted to?

Nothing at all points to this.

sefsefsefsef · Jan 12, 2013

A fully compatible "little" x86 core is certainly possible. It could even be super tiny and still maintain ISA support for all advanced extensions, like AVX2. Processing AVX2 instructions, for example, doesn't require tons of floating point hardware to produce a "correct" result (programatically speaking), and it could in fact even be accomplished using a single floating point unit, iterating over many clock cycles.

The only thing you can't get away from, in terms of ISA compatibility, is that you cannot take any shortcuts when it comes to the size of your architected register file. You can fudge everything else using very limited hardware, and the register file itself isn't *that* big.

Fjodor2001 · Jan 12, 2013

ShintaiDK said:
If you only use 4 at a time. Then you do have 4 cores wasted. Simply because only 4 cores actually archive the performance/watt wished. I assume big.LITTLE MP will mostly be used in tablets.

Ok, then I see what you mean. But I don't consider it as those cores being wasted. It's just that they are not used all of the time. It's simply the price you pay to get the additional power saving benefits. By design so to speak.

ShintaiDK said:
But it could or could not hinder future updates. The device maker (usually)dont make the OS either. For Samsung its Android, meaning Google.

Actually the delivery chain is like this:

Linux Kernel (from kernel.org) -> Google Android -> SoC Vendor Android (e.g. Qualcomm) -> Device Vendor Android (e.g. LG / Sony Mobile / HTC)

In each steps SW patches are added compared to the previous step. And the end of the chain is actually very similar to the original Linux Kernel. Not that many percent of the around 10 million lines of code in the Linux Kernel are actually modified along the way. For example Google Android adds about 250 patches or 25.000 lines of code in the kernel, see here. Then they of course add lots of code outside of the Linux Kernel too, but that's another story.

Depending on how the "big.LITTLE task switcher/migrator" is actually implemented, my guess is that big.LITTLE support will make it into the Linux Kernel, which means that the downstream gets it automatically too. If not, Google or the SoC Vendor will get it from ARM and add it as relevant SW patches.

Regardless, the Device Vendor at the end always has the possibility to add the necessary patches to make sure it works. But normally the Device Vendor will require that the SoC Vendor (or further upstream) provides that before signing up to buy their SoC.

ShintaiDK said:
Windows and Bulldozer/Pilediver uarch scheduler. How hard could it be? Or the years spend on getting HT scheduling working.

But hey, that was Windows dude...

But seriously, the Linux Kernel scheduler already works perfectly fine on a huge amount of CPU. Then there may of course always be tuning and details that can be improved upon.

ShintaiDK said:
I have no clue what you are trying to point out here. I think you get confused with compiler paths and move actually memory running applications and OS realtime between different types of cores.

The point is that if you would have needed to redesign and/or recompile the applications for big.LITTLE to work, it would have been far more intrusive and complicated. But since changes are isolated to the OS or "task switcher SW", then the changes are quite local and support for big.LITTLE is much easier to introduce software-wise.

Compare modifying and/or recompiling one part (the OS) vs modifying hundreds of thousands of parts (all applications) distributed across thousands of different companies.

ShintaiDK said:
Nothing at all points to this.

Because?

ShintaiDK · Jan 12, 2013

Fjodor2001 said:
Ok, then I see what you mean. But I don't consider it as those cores being wasted. It's just that they are not used all of the time. It's simply the price you pay to get the additional power saving benefits. By design so to speak.

Its a tradeoff. Just like the GT3 in ULV Haswell. But the money spend only comes from one place.

Fjodor2001 said:
Actually the delivery chain is like this:

Linux Kernel (from kernel.org) -> Google Android -> SoC Vendor Android (e.g. Qualcomm) -> Device Vendor Android (e.g. LG / Sony Mobile / HTC)

In each steps SW patches are added compared to the previous step. And the end of the chain is actually very similar to the original Linux Kernel. Not that many percent of the around 10 million lines of code in the Linux Kernel are actually modified along the way. For example Google Android adds about 250 patches or 25.000 lines of code in the kernel, see here. Then they of course add lots of code outside of the Linux Kernel too, but that's another story.

Depending on how the "big.LITTLE task switcher/migrator" is actually implemented, my guess is that big.LITTLE support will make it into the Linux Kernel, which means that the downstream gets it automatically too. If not, Google or the SoC Vendor will get it from ARM and add it as relevant SW patches.

Regardless, the Device Vendor at the end always has the possibility to add the necessary patches to make sure it works. But normally the Device Vendor will require that the SoC Vendor (or further upstream) provides that before signing up to buy their SoC.

And that chain works fast right?

Fjodor2001 said:
But hey, that was Windows dude...

What a brilliant argument there. Seems like I am wasting time on you again.

Fjodor2001 said:
But seriously, the Linux Kernel scheduler already works perfectly fine on a huge amount of CPU. Then there may of course always be tuning and details that can be improved upon.

Irrelevant to big.LITTLE

Fjodor2001 said:
The point is that if you would have needed to redesign and/or recompile the applications for big.LITTLE to work, it would have been far more intrusive and complicated. But since changes are isolated to the OS or "task switcher SW", then the changes are quite local and support for big.LITTLE is much easier to introduce software-wise.

Compare modifying and/or recompiling one part (the OS) vs modifying hundreds of thousands of parts (all applications) distributed across thousands of different companies.

The wast majority of apps are Java based. Same reason you can use these apps across different ARM uarchs and x86.

When you compile a program you often have several paths. For example if you play a game. It might support SSE2, SSE3, SSE4.1/2 AVX1, AVX2 and so on. But you can still play the game on a machine that only supports SSE2 for example. Simply due to more paths.

Fjodor2001 said:
Because?

Because I dont see anything pointing to this. sefsefsefsef essentially also answered this.

Either ARM saw some benefit everyone else didnt. Or the real issue is that ARM couldnt afford it R&D wise, didnt have enough manpower or simply lack/not willing to pay for the IPs needed.

dagamer34 · Jan 12, 2013

ShintaiDK said:
Haswell is not the main competitor. Atom is.

I think his point is that you got 84W chips in 1 end, 7-10W in the other end. And both with SOIx states that can idle in the miliwatts.

Now transform this to Atom and you should get the point. Silvermont with a new uarch should show where Atom stands in the game.

ARM simply choose the big.LITTLE concept instead of making more power efficient cores. The A15 is a guzzler compared to A7.

The original purpose of Cortex-A15 was for servers.

taltamir · Jan 12, 2013

Fjodor2001 said:
Then what about a similar x86 solution that is said to be developed? Can they make a small & power efficient x86 core that still has support for all advanced TSX/AVX/AVX2/etc instructions?

Isn't that what intel did with the atom? (beyond the die shrink, etc)
IIRC it is based on the P5 architecture, the original Pentium that didn't even have MMX.
They added to it MMX, SSE, SSE2, SSE3, SSSE3, Enhanced Intel SpeedStep Technology (EIST), XD bit (an NX bit implementation), and Hyper-threading.
And in the desktop parts also Intel 64

Fjodor2001 · Jan 12, 2013

ShintaiDK said:
Its a tradeoff. Just like the GT3 in ULV Haswell. But the money spend only comes from one place.

Agreed. And you would not call the GT3 waste, right?

ShintaiDK said:
And that chain works fast right?

Yes, in fact it does. We're usually talking months. And urgent patches can be delivered in weeks or days.

Urgent patches can also be cherry picked directly e.g. from the Linux Kernel to the Device Vendor source code tree.

Also, we're talking about adding this big.LITTLE functionality once. Then the rest will be improvements or bug fixes as usual. So I don't see where your "upgrade fears" come into play, where you thought the functionality suddenly would be removed due to an OS upgrade?

ShintaiDK said:
What a brilliant argument there. Seems like I am wasting time on you again.

Did you notice the text afterwards saying "But seriously"... ?

ShintaiDK said:
Irrelevant to big.LITTLE

No, it's an example indicating that big.LITTLE scheduling and process transfer likely is not as complex as you are trying to imply.

ShintaiDK said:
The wast majority of apps are Java based. Same reason you can use these apps across different ARM uarchs and x86.

On Android yes, on Windows no.

So e.g. with TSX/AVX2 you'll have to recompile all Windows applications that use native code for them to make use of those instructions.

Also, note that TSX/AVX2 is not in any way functionality-wise related to big.LITTLE, which you perhaps thought I meant and got confused by earlier. But it's interesting anyway when comparing "how hard is it to roll out a new CPU change", which is relevant for both big.LITTLE and TSX/AVX2.

ShintaiDK said:
When you compile a program you often have several paths. For example if you play a game. It might support SSE2, SSE3, SSE4.1/2 AVX1, AVX2 and so on. But you can still play the game on a machine that only supports SSE2 for example. Simply due to more paths.

True. But for the programs that are already compiled without support for AVX2 (which is the case for most current programs), they will not benefit from Haswell having AVX2 support. The programs will have to be recompiled to reap the benefit of AVX2 (they can of course benefit from e.g. shared libs containing AVX2 instructions though).

If you compare that to big.LITTLE, you do not have to recompile any applications for them to make use of the new "functionality".

ShintaiDK said:
Because I dont see anything pointing to this. sefsefsefsef essentially also answered this.

I just read his post, and he has a point. But could it be that even if it's possible to implement very complex x86 instructions in a small core, it may not be possible to do it as efficiently as in the ARM case? I.e. on x86, the complex instructions will perhaps be insanely slow on the low power core?

ShintaiDK said:
Either ARM saw some benefit everyone else didnt. Or the real issue is that ARM couldnt afford it R&D wise, didnt have enough manpower or simply lack/not willing to pay for the IPs needed.

Yes, it's hard to tell. I guess will see in the next few years what comes out of this. Anyway, I think it's a cool idea, and ARM should at least get some credit for being innovative...

Exophase · Jan 12, 2013

taltamir said:
Isn't that what intel did with the atom? (beyond the die shrink, etc)
IIRC it is based on the P5 architecture, the original Pentium that didn't even have MMX.
They added to it MMX, SSE, SSE2, SSE3, SSSE3, Enhanced Intel SpeedStep Technology (EIST), XD bit (an NX bit implementation), and Hyper-threading.
And in the desktop parts also Intel 64

The Atom design isn't related to P5C in the slightest. You're either thinking of of Larrabee (which must be heavily modified in its own right) or tech journalists who think that dual-issue and in-order means it's the same chip. The uarch is totally different, top to bottom, new design, with everything about it being a deliberate design decision and not a lazy reuse of something ancient.

dagamer34 said:
The original purpose of Cortex-A15 was for servers.

Hardly. ARM isn't so dumb, they know they'd get almost no market share with that, and even if they did the market is too low volume even at high success - what were they going to do, charge 20x their usual license fee? Something like X-Gene is ARM specifically for server products because it's backed by a company that can deliver that whole product themselves. And because it's actually 64-bit.

ARM doesn't offer Cortex-A7 alongside Cortex-A15s because it's cheaper or easier than making a "good" A15, anymore than Intel offering Atom because it's cheaper than scaling Core CPUs down that low. It's a trade-off, one that not just ARM but SoC vendors and fabs make. Cortex-A15 in general deliberately trades some perf/W and area for peak perf vs Cortex-A8 and Cortex-A9, while Cortex-A7 does exactly the opposite. Some people are jumping all over A15 for using a lot more power at peak perf vs Atom, but I don't see anyone even trying to characterize whole perf/W curves, and this is just for one part where specific binning is involved (Exynos 5 actually has a ton of different voltage group bins in the Linux kernel code, so there's a lot of variation), and specifying trade-offs made for instance static leakage vs dynamic. On that point, the claim that it has bad idle characteristics is just bunk, which anyone can see from the reviews.

No matter how good you are at designing chips, if you tighten the execution range you can ALWAYS do better. A CPU designed to only run at up to 1GHz will always be able to do it more efficiently than one that can go up to 3GHz. You trade a lot of resources to make the latter possible. A15 is geared for those higher end targets (2.5+GHz being real design goals) and is fairly wide. It could well be that another company like Apple designs a processor like Swift to only comfortably hit ~1.5GHz on 32/28nm and they're more efficient for that, I won't contest that.

Does it look like ARM must be crazy for doing something no one else is doing? They're not the only ones doing asynchronous/heterogeneous multicore, nor have they been for ages - they're just refining the concept. It's way too early to say something like that is indicative of going against what the rest of the market knows to be best.

Fjodor2001 · Jan 13, 2013

Here are two very interesting articles on how Linux can support big.LITTLE and the complications involved:

"Update on big.LITTLE scheduling experiments":

http://www.linuxplumbersconf.org/20...12-lpc-scheduler-task-placement-rasmussen.pdf

"Linux support for ARM big.LITTLE":

http://lwn.net/Articles/481055

Fjodor2001 · Jan 13, 2013

Exophase said:
No matter how good you are at designing chips, if you tighten the execution range you can ALWAYS do better. A CPU designed to only run at up to 1GHz will always be able to do it more efficiently than one that can go up to 3GHz. You trade a lot of resources to make the latter possible.

^^^^
This is why I think it will be hard for Intel and others to do a single core equivalent that will be as good as the big.LITTLE solution (which has one low power & low performance core, combined with one separate high power & high performance core). Two separate cores tailor made for their respective use cases will do better than one single core that should be able to handle all use cases.

Idontcare · Jan 13, 2013

Exophase said:
The Atom design isn't related to P5C in the slightest. You're either thinking of of Larrabee (which must be heavily modified in its own right) or tech journalists who think that dual-issue and in-order means it's the same chip. The uarch is totally different, top to bottom, new design, with everything about it being a deliberate design decision and not a lazy reuse of something ancient.

I suspect Talta was accidentally conflating Atom's pedigree with that of Larrabee. I do the same thing from time to time, completely by accident, even though I personally have little to no excuse for doing so

:\

taltamir · Jan 13, 2013

I keep on using "thought" I should be more clear and concise.
I remember reading but I don't remember where I read it that larrabee is an array of modified atom chips.
I remember reading but I don't remember where I read it that atom was made by taking the original pentium chip, shrinking it, and then modernizing it as necessary.
Actually now that I think about it, I think I remember the atom bit coming from an interview

Idontcare said:
I suspect Talta was accidentally conflating Atom's pedigree with that of Larrabee. I do the same thing from time to time, completely by accident, even though I personally have little to no excuse for doing so :\

Thanks for the link.
The pertinent info is actually on the previous page 4 not 5 of that article you linked

The first step in the pathfinding effort was to find out if Bonnell could be based on an existing Intel microarchitecture. The small team looked at reusing the Pentium M or the yet-to-be-released Core 2 and some analysis was done on both of these options. It didn't take long to come to the conclusion that it wasn't possible to hit the power and performance targets using an existing design. As Intel discovered with Centrino, in order to target a new market it would need to use a new microprocessor design.

The team went back to a clean sheet of paper and started with the simplest of microprocessor architectures, a single-issue, in-order core and built up from there. The designers kept iterating until the performance and power targets at that time were met.

ARM CPU big.LITTLE process transfer - how does it work?

Diamond Member

Lifer

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Lifer

Diamond Member

Diamond Member

Diamond Member

Lifer

Lifer

Diamond Member

Lifer

Senior member

Diamond Member

Lifer

Platinum Member

Lifer

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Elite Member

Lifer