Mobile Haswell achieves all day battery life - beginning of the end for ARM?


Exophase

Diamond Member
Pick the fastest ARM SoC GPU you know of, then compare that to a Radeon HD 6310. What's your conclusion?

The fastest GPUs in ARM SoCs out today are irrelevant, because you're comparing against an SoC that isn't out in devices. Furthermore, since you only said "Cortex-A15 system", we're free to pick any SoC ever released with a Cortex-A15, no matter how far into the future. Neither of us can say how weak or how powerful the best one will be, but since ARM IP is deployed by a wide variety of manufacturers, you'll get some with a stronger emphasis on GPU capabilities than others.

You've been ignoring the fact that I'm talking about system performance.

No, I've been pointing out over and over again that it doesn't make any sense..

Well, in the HTML5 Fishbowl demo, Clovertrail was only showing 1 fish at 30fps whereas Temash was showing 80 at 60fps undocked.

So what? Is that your way of claiming Temash is 160 times faster? You know that's not a correct interpretation, right? All you can say for sure is that Temash was at least 2x faster (which is of course no big accomplishment). If you want a real GPU benchmark you have to have the same load on both units.

the Exynos 5 Dual can go up to 8W and needs to throttle the CPU down to 800MHz to keep it under 4W at load. How do you think an 800MHz A15 will compare with Temash now?

Yeah, if you run both CPU cores and GPU at full speed, which is rarely what a game will actually want. It's also just one particular Cortex-A15 SoC with one particular GPU and one particular set of thermal limits. The Temash you compare against isn't 4W TDP (it's 5.9W) and we can already infer that Tegra 4 in Shield isn't going to be thermally limited to 4W either, on account of battery life figures suggesting a system total of 8W, and the thing being packed with a fairly heavy fan and heatsink.

Samsung probably had a reason to switch back to IMG for Exynos 5 Octa, and perf/W of the GPU may well have something to do with that. But you can't blame the CPU for that.
 

krumme

Diamond Member
Cortex-A15s will show up in other applications besides tablets and below. For example: http://semiaccurate.com/2013/02/26/st-ericsson-shows-off-3-0ghz-arm-a9-tablet/#.USzXFjdWl0R

I'm sure you could find a way to consume 25W with a Cortex-A15 SoC if the SoC designer so desires to target it this way. The uarch is capable of clocking up to at least 2.5GHz.

The empirical evidence has been presented throughout this thread. Cortex-A15 also has a 128-bit FPU with FMA. I don't know the full details but it has at least some ability to co-issue 128-bit NEON operations (as in I've measured such a thing).

Of course we're still not talking about GPUs here. But if an SoC vendor wants to license available IP to scale up to the GPU levels of Jaguar and beyond, I'm sure it's very doable. IMG's Series 5 could scale up to 16 cores and Series 6 is rumored to scale even higher. I've heard talk of 1 TFLOPS in the highest-end configurations. Of course you pay for this in power and area, but those are options, and there are many other companies putting different GPU IP on SoCs..

But so long as you talk about "Cortex-A15", how can I not omit the GPU part? There is no GPU in that. There's no implied GPU either, except in the minds of people making various assumptions.

Yes, you can theoretically put a lot of GPU power on an A15 and scale it to 3GHz.
But it's just talk and PowerPoint-style BS, like the SA article about an A9 on drugs, and it will not happen on the market.

big.LITTLE is a great concept, but it will not be used in the same performance bracket as Temash/Kabini.
 

Exophase

Diamond Member
I don't know what you mean by "talk" and "BS"; did you not see that the A15 is being used in chips meant specifically for servers and networking? Why do you think only tablet vendors are interested in this?

I think ARM knew better than you do when they made a chip that can scale this high, and they knew they had partners interested in it. Otherwise they made a pretty big mistake, because if no one ever uses it at >2GHz they wasted efficiency in the design by allowing it to scale this high (likely a big reason why Apple did their own, because they don't need a chip that can be useful at >10W).

What is Jaguar going to clock at in 25W anyway? You don't need a 3GHz Cortex-A15 to be competitive with a 2GHz Jaguar.
 

Maragark

Member
First of all, the Radeon HD 6310 is a graphics card, not an SoC :)

It's not a graphics card, it's the GPU used by the E-350.

Anyway, for lack of a better metric for comparison, according to Wikipedia the Radeon HD 6310 has a throughput of 80 GFLOPS. The SGX 554MP4 GPU in the A6X SoC has a throughput of 71 GFLOPS. The ULP GeForce GPU in Tegra 4 has a throughput of 97 GFLOPS.
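
For reference, those peak figures are just ALU count x flops per ALU per clock x clock speed. A rough sketch of the arithmetic (assuming the usual 2 flops, i.e. one MAD, per ALU per clock, which is how the HD 6310 and Tegra 4 numbers are normally derived; the SGX 554MP4's wider ALUs are counted differently):

# peak theoretical throughput, not a measure of delivered performance
def peak_gflops(alus, flops_per_alu_per_clock, clock_mhz):
    return alus * flops_per_alu_per_clock * clock_mhz / 1000.0

print(peak_gflops(80, 2, 492))  # Radeon HD 6310: ~78.7, rounded to 80 GFLOPS
print(peak_gflops(72, 2, 672))  # Tegra 4 ULP GeForce: ~96.8, quoted as 97 GFLOPS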

What's its pixel and texture rate? Here's the data for the HD 6310.

You are misinterpreting Anand's words. What Anand said is that the Exynos 5 Dual SoC will allow the CPU and GPU to consume up to ~4W each before ramping down voltage. In a game such as Modern Combat 3, which is much more GPU-intensive than CPU-intensive, the GPU voltage ramps up while the CPU voltage ramps down. In a benchmark such as CoreMark, which is much more CPU-intensive than GPU-intensive, the CPU voltage ramps up while the GPU voltage ramps down.

Here's what Anand actually said:

Anand said:
The SoC is allowed to reach 8W, making that its max TDP by conventional definitions, but seems to strive for around 4W as its typical power under load. Why are these two numbers important? With Haswell, Intel has demonstrated interest (and ability) to deliver a part with an 8W TDP. In practice, Intel would need to deliver about half that to really fit into a device like the Nexus 10 but all of the sudden it seems a lot more feasible. Samsung hits 4W by throttling its CPU cores when both the CPU and GPU subsystems are being taxed, I wonder what an 8W Haswell would look like in a similar situation...

I've highlighted the relevant part. I don't see how you could possibly think I misinterpreted what was said. It's pretty specific. When the system is under load, the CPU will get throttled to get it down to 4W.
 

Maragark

Member
But that benchmark was very misleading. It's a DX9 GPU, while IE uses Direct2D and DirectWrite, which require DX10. And any missing functionality gets CPU-emulated via the platform update.

That's a possible explanation but it was only an assumption on my part. I'd say it needs more investigation before accepting it as fact.
 

Exophase

Diamond Member
A6X is 8 TMUs + 8 ROPs @ 280MHz = 2240 MPixel/s.
Tegra 4 is 4 TMUs + 4 ROPs @ 672MHz = 2688 MPixel/s.
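
(Those figures are just unit count x clock; a quick sketch of the arithmetic:)

def fillrate_mpixels(units, clock_mhz):
    # works for pixel rate (ROPs) and texel rate (TMUs) alike
    return units * clock_mhz

print(fillrate_mpixels(8, 280))  # A6X:     2240 MPixel/s
print(fillrate_mpixels(4, 672))  # Tegra 4: 2688 MPixel/s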

Any official information on what Temash's GPU will clock at when in its 5.9W TDP mode? Don't tell me you think it'll be at the same level as the GPU in the 18W Zacate. A combination of a better process and a better uarch will certainly improve power consumption, but not by that much.

You know, it's funny. Back when AMD was doing their demos they suggested running some CPU-intensive workload in the background during a game to watch IGP performance go down. Now Intel has suggested doing that same thing to Exynos 5 to watch the CPU performance go down. I wonder who will do it to AMD when Temash is being benchmarked?

Here's some useful information: 1) almost no game comes even close to balancing CPU and GPU demands so that both happen to peak at the same time; 2) these days, with all of the advanced DVFS we have for both CPU and GPU, it's the SoC's job to balance the workload so you give the power budget to the part that needs it more, thereby coming close to balancing it for the game.

Silly scenarios like running CPU intensive benchmarks in the background while running a game throw a wrench in that. Good thing that's not a useful usage scenario.
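
To illustrate what I mean by the SoC balancing it (a toy model only, not how any particular vendor's power controller actually works), the governor's job is roughly to hand whichever block is under more demand the larger slice of a fixed budget:

SOC_BUDGET_W = 4.0  # hypothetical shared CPU+GPU budget

def split_budget(cpu_demand, gpu_demand):
    # demands are 0..1 utilization estimates from the last DVFS interval;
    # each block gets a share of the budget proportional to its demand
    total = cpu_demand + gpu_demand
    if total == 0:
        return 0.0, 0.0
    return (SOC_BUDGET_W * cpu_demand / total,
            SOC_BUDGET_W * gpu_demand / total)

# a GPU-bound game: most of the budget goes to the GPU
print(split_budget(0.3, 0.9))  # -> (1.0, 3.0)
# pile a CPU benchmark on top and the game's GPU budget gets raided
print(split_budget(1.0, 0.9))  # -> (~2.1, ~1.9)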
 

ShintaiDK

Lifer
That's a possible explanation but it was only an assumption on my part. I'd say it needs more investigation before accepting it as fact.

There are no DX10 drivers for the SGX545 in Clover Trail. So it's a solid fact that it's CPU-emulated.
 

ams23

Senior member
It's not a graphics card, it's the GPU used by the E-350.

The Radeon HD 6310 is an integrated graphics card. But one thing we know for sure is that it is not an SoC :D

What's its pixel and texture rate? Here's the data for the HD 6310.

The pixel and texel fillrate for Tegra 4 is 2.7 GP/s, so it has a higher pixel fillrate but a lower texel fillrate vs. the HD 6310. That said, the comparison between Tegra 4 and the HD 6310 is not quite apples to apples, since they have completely different shader architectures. Tegra 4 has a total of 72 non-unified shaders operating at 672MHz, while the HD 6310 has a total of 80 unified shaders operating at 492MHz.

I've highlighted the relevant part. I don't see how you could possibly think I misinterpreted what was said. It's pretty specific. When the system is under load, the CPU will get throttled to get it down to 4W.

You clearly did misinterpret what Anand said. You claimed that the Exynos 5 Dual needs to "throttle the CPU down to 800MHz to keep it under 4W at load", and Anand never said anything of the sort. All Anand said was that CPU voltage starts to ramp down once power consumption for the CPU reaches 4W. That would be no different from what would happen with Jaguar cores in a similar scenario.
 

Maragark

Member
There are no DX10 drivers for the SGX545 in Clover Trail. So it's a solid fact that it's CPU-emulated.

I just checked the datasheet for the Z2760 and it clearly states that it supports DX9.3 and doesn't mention anything about DX10. So, that wraps up the case of the lone fish.
 

Maragark

Member
The Radeon HD 6310 is an integrated graphics card. But one thing we know for sure is that it is not an SoC :D

The pixel and texel fillrate for Tegra 4 is 2.7 GP/s, so it has a higher pixel fillrate but a lower texel fillrate vs. the HD 6310. That said, the comparison between Tegra 4 and the HD 6310 is not quite apples to apples, since they have completely different shader architectures. Tegra 4 has a total of 72 non-unified shaders operating at 672MHz, while the HD 6310 has a total of 80 unified shaders operating at 492MHz.

So which do you think is the better GPU and how much better is it? Is the difference in performance significant? Now, if the best ARM SoC GPU can only provide similar performance to AMD's ancient VLIW5 architecture, is it not obvious that GCN will be better?

You clearly did misinterpret what Anand said. You claimed that the Exynos 5 Dual needs to "throttle the CPU down to 800MHz to keep it under 4W at load", and Anand never said anything of the sort. All Anand said was that CPU voltage starts to ramp down once power consumption for the CPU reaches 4W. That would be no different from what would happen with Jaguar cores in a similar scenario.

Here's the previous paragraph to the one I quoted earlier:

Next, while CoreMark is still running on both cores, we switch back to Modern Combat 3 (pink section of the graph). GPU voltage ramps way up, power consumption is around 4W, but note what happens to CPU power consumption. The CPU cores step down to a much lower voltage/frequency for the background task (~800MHz from 1.7GHz). Total SoC TDP jumps above 4W but the power controller quickly responds by reducing CPU voltage/frequency in order to keep things under control at ~4W. To confirm that CoreMark is still running, we then switch back to the benchmark (blue segment) and you see CPU performance ramps up as GPU performance winds down. Finally we switch back to MC3, combined CPU + GPU power is around 8W for a short period of time before the CPU is throttled

Again, I've highlighted the relevant part.
 

Exophase

Diamond Member
So just to be clear, you're criticizing that Exynos 5250 (at least in this configuration) doesn't have a power budget that lets it run both CPU + GPU at max clocks simultaneously.

Then we take Temash which, when capped at ~6W, can run all four CPU cores at 1GHz when the GPU load is close to zero. So what'll Temash do when you try to run this heavy benchmark that stresses all cores and a game with non-trivial 3D load at the same time? There can be one of three scenarios:

1) There's almost no power budget left over, and it'll refuse to let you use the GPU anywhere close to its peak clock (wrong decision, since the game will want GPU perf first)
2) There's almost no power budget left over, and it'll start throttling the CPU clocks below 1GHz - (right decision, since the game will normally not tax all four cores)
3) There was actually lots of power budget left over, which is great since it means the cores are really low power, but bad since it means turbo isn't doing anything to turn that power overhead into more MHz for AMD's test.

I'm not going to be that surprised if, while running a game AND running non-yielding loads on all four cores, the SoC pushes them below 1GHz (or only some of them if asynchronously clocked.. does anyone know? I doubt it since they have a shared L2 with a core derived clock..)
 

Maragark

Member
The fastest GPUs in ARM SoCs out today are irrelevant, because you're comparing against an SoC that isn't out in devices. Furthermore, since you only said "Cortex-A15 system", we're free to pick any SoC ever released with a Cortex-A15, no matter how far into the future. Neither of us can say how weak or how powerful the best one will be, but since ARM IP is deployed by a wide variety of manufacturers, you'll get some with a stronger emphasis on GPU capabilities than others.

Wait too long and it won't be competing against Temash; it will be competing against its successors.

So what? Is that your way of claiming Temash is 160 times faster? You know that's not a correct interpretation, right? All you can say for sure is that Temash was at least 2x faster (which is of course no big accomplishment). If you want a real GPU benchmark you have to have the same load on both units.

No, that was me passing on information about AMD's MWC demo.

Yeah, if you run both CPU cores and GPU at full speed, which is rarely what a game will actually want. It's also just one particular Cortex-A15 SoC with one particular GPU and one particular set of thermal limits. The Temash you compare against isn't 4W TDP (it's 5.9W) and we can already infer that Tegra 4 in Shield isn't going to be thermally limited to 4W either, on account of battery life figures suggesting a system total of 8W, and the thing being packed with a fairly heavy fan and heatsink.

Well, if it approaches 15W then it's getting into Kabini territory. The fact remains, though, that the A15 SoC can't run at full load at 1.7GHz and will throttle down to 800MHz. At that clock, its performance will be closer to Hondo than Temash.

Samsung probably had a reason to switch back to IMG for Exynos 5 Octa, and perf/W of the GPU may well have something to do with that. But you can't blame the CPU for that.

Like I keep telling you, though, I'm talking about overall system performance, and I believe the example of the A15 SoC shows why it makes perfect sense to do so.
 

Exophase

Diamond Member
Now you're using Exynos 5 as "the A15 SoC." First it was Cortex-A15, now "Cortex-A15 system", now Exynos 5250. I don't see how you can make this blanket argument when all these other Cortex-A15 SoCs are coming out soon, in devices around when Temash will be. Using it exclusively for your argument doesn't make any sense.
 

ams23

Senior member
So which do you think is the better GPU and how much better is it? Is the difference in performance significant? Now, if the best ARM SoC GPU can only provide similar performance to AMD's ancient VLIW5 architecture, is it not obvious that GCN will be better?

Without any performance data on the Temash GPU, it is hard to say exactly how it would compare to the Tegra 4 GPU. With respect to ancient GPU architectures, the Tegra 4 non-unified GPU architecture has roots that are far older than the Radeon HD 6xxx GPU architecture, but it has clearly been architected for ultra-low-power usage and is meant for use in both tablets and high-end smartphones.

Again, I've highlighted the relevant part.

What Anand says is that the CPU voltage steps way down for background (i.e. non-CPU-intensive) tasks, where the CPU operating frequency drops down to 800MHz.
 

Maragark

Member
A6X is 8 TMUs + 8 ROPs @ 280MHz = 2240 MPixel/s.
Tegra 4 is 4 TMUs + 4 ROPs @ 672MHz = 2688 MPixel/s.

Any idea of the texture rate for the A6X?

Any official information on what Temash's GPU will clock at when in 5.9W TDP mode? Don't tell me you think it'll be the same level as the GPU in the 18W Zacate. A combination of better process and better uarch will certainly improve power consumption but not by this much.

I'm sure I saw a benchmark saying 300MHz undocked and 500MHz docked. AMD claims a 100% GPU performance increase over its predecessor, which would be Hondo. So, I'd say it should be a bit faster than the HD 6310.
 

Maragark

Member
So just to be clear, you're criticizing that Exynos 5250 (at least in this configuration) doesn't have a power budget that lets it run both CPU + GPU at max clocks simultaneously.

Then we take Temash which, when capped at ~6W, can run all four CPU cores at 1GHz when the GPU load is close to zero. So what'll Temash do when you try to run this heavy benchmark that stresses all cores and a game with non-trivial 3D load at the same time? There can be one of three scenarios:

1) There's almost no power budget left over, and it'll refuse to let you use the GPU anywhere close to its peak clock (wrong decision, since the game will want GPU perf first)
2) There's almost no power budget left over, and it'll start throttling the CPU clocks below 1GHz - (right decision, since the game will normally not tax all four cores)
3) There was actually lots of power budget left over, which is great since it means the cores are really low power, but bad since it means turbo isn't doing anything to turn that power overhead into more MHz for AMD's test.

I'm not going to be that surprised if, while running a game AND running non-yielding loads on all four cores, the SoC pushes them below 1GHz (or only some of them if asynchronously clocked.. does anyone know? I doubt it since they have a shared L2 with a core derived clock..)

What makes you think that Temash can't run both the GPU and CPU at load at the default clocks within its specified TDP? I've seen nothing to suggest this. Do you have any sources to back up this claim?
 

Exophase

Diamond Member
Texture rate and pixel rate are the same for both of them, they have the same number of TMUs and ROPs.

If you really want to look at Apple SoCs (which aren't even Cortex-A15, of course) then I'm sure they'll have something newer out not that long after Temash is out, and of course long before Temash is succeeded. Probably based on IMG's Series 6 (Rogue), which changes things again. I'm sure there will be A15 SoCs using it too, with at least one announced so far.

What makes you think that Temash can't run both the GPU and CPU at load at the default clocks within its specified TDP? I've seen nothing to suggest this. Do you have any sources to back up this claim?

You must not be following my reasoning, because I'm not claiming that - I gave that as two of the possible scenarios. The third scenario says it can do that but questions why it couldn't turbo the CPU beyond default clocks when the GPU isn't being used. But without knowing what default clock even means for the GPU who really knows?
 

ams23

Senior member
Any idea of the texture rate for the A6X?

Since the number of TMUs is equal to the number of ROPs for A6X and Tegra 4, the texel fillrate is equal to the pixel fillrate. That said, these metrics alone don't tell a complete story, because they ignore the shader/ALU throughput.
 

Maragark

Member
Now you're using Exynos 5 as "the A15 SoC." First it was Cortex-A15, now "Cortex-A15 system", now Exynos 5250. I don't see how you can make this blanket argument when all these other Cortex-A15 SoCs are coming out soon, in devices around when Temash will be. Using it exclusively for your argument doesn't make any sense.

I was quite clearly talking about the A15 SoC that Anand was reviewing.
 

Exophase

Diamond Member
In a way that you think makes it characteristic of every other SoC that has A15s in it. It's because of that GPU's power consumption that the SoC has to throttle. While knowing nothing about the perf/W of everyone else's GPUs (on different processes as well, Samsung is already superseding the process Exynos 5 was made on) you can't draw conclusions about what headroom any other SoC will have.
 

Maragark

Member
Texture rate and pixel rate are the same for both of them, they have the same number of TMUs and ROPs.

If you really want to look at Apple SoCs (which aren't even Cortex-A15, of course) then I'm sure they'll have something newer out not that long after Temash is out, and of course long before Temash is succeeded. Probably based on IMG's Series 6 (Rogue), which changes things again. I'm sure there will be A15 SoCs using it too, with at least one announced so far.

That sounds awfully close to admitting that there will be no faster GPU available for use in an ARM SoC when Temash is released in a couple of months.

You must not be following my reasoning, because I'm not claiming that - I gave that as two of the possible scenarios. The third scenario says it can do that but questions why it couldn't turbo the CPU beyond default clocks when the GPU isn't being used. But without knowing what default clock even means for the GPU who really knows?

So basically, because I showed you that a certain ARM SoC can't run both its GPU and CPU at load without throttling the CPU, you invent some scenario in which Temash is in the same situation? Yeah, that's not completely dishonest.
 

ams23

Senior member
That sounds awfully close to admitting that there will be no faster GPU available for use in an ARM SoC when Temash is released in a couple of months.

That is not what he said at all. FYI, Tegra 4 devices (with quad core A15 CPU and 72 core ULP Geforce GPU) will come to market in both Q2 and Q3 2013. Temash is not expected before that timeframe. Anyway, there is essentially no GPU performance data available yet on Temash, so it's a moot point.
 

Exophase

Diamond Member
That sounds awfully close to admitting that there will be no faster GPU available for use in an ARM SoC when Temash is released in a couple of months.

Um, no? Just because I'm saying yet another GPU is coming out in SoCs soon doesn't mean I'm saying anything one way or the other about relative performance vs Temash....

First when I brought up future GPUs you said that Temash's successor will be out by then. Now that I'm saying they're right around the corner you're saying well, Temash will still beat them to the door. I'm sorry, but when you say "Everything I've heard so far tells me that neither the A15 or Atom will come close to the performance of Temash" I didn't know that only applied to a tiny timespan right after the product is released..

So basically, because I showed you that a certain ARM SoC can't run both its GPU and CPU at load without throttling the CPU, you invent some scenario in which Temash is in the same situation? Yeah, that's not completely dishonest.

What I said has nothing to do with Exynos 5250. I'm sorry that you're having a hard time following the reasoning: if it runs all cores at 1GHz when the GPU is barely used, that means there either isn't a lot of TDP left or the SoC is doing a poor job utilizing it for turbo (since we know the CPU cores are capable of clocking up to at least 1.4GHz, we know it's purely a power limit and not a design one). I think it's pretty straightforward. Note that I haven't made an assertive claim about either of these situations. Just that one must be true. You pick if you want.
 

IntelUser2000

Elite Member
We know some specs of Temash's GPU. It has 128 SPs. At 500MHz, that makes it 128 GFLOPS, and at 300MHz that makes it 76.8 GFLOPS.
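
(That's the standard peak-throughput arithmetic of 128 SPs x 2 flops per SP per clock:)

def temash_peak_gflops(clock_mhz, sps=128, flops_per_sp_per_clock=2):
    # peak theoretical figure, not measured performance
    return sps * flops_per_sp_per_clock * clock_mhz / 1000.0

print(temash_peak_gflops(500))  # docked:   128.0 GFLOPS
print(temash_peak_gflops(300))  # undocked:  76.8 GFLOPS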

Depending on the clock speed, it may very well be at the level of the GPUs used in ARM SoCs. Based on the same info, though, it suggests that without the Turbo Dock it only runs at 300MHz.

Trinity benchmarks show that running CPU+GPU heavy loads reduces clock speeds of both below the Base frequency. So AMD is obviously not exempt from using throttling when both are loaded heavily.
 

Exophase

Diamond Member
Trinity benchmarks show that running CPU+GPU heavy loads reduces clock speeds of both below the Base frequency. So AMD is obviously not exempt from using throttling when both are loaded heavily.

People view turbo boost as a big positive and throttling as a big negative, when they're really comparing a glass half full with a glass half empty. It all depends on what you define "base frequency" (usually, some advertised clock speed) to mean, which is pretty arbitrary.

Sure, you can establish both CPU and GPU frequencies that you guarantee the SoC will never go below. If you ask me, any design with aggressive load-balanced DVFS where CPU perf is heavily power-constrained should make the minimum guaranteed GPU speed the lowest one it can run at, because there's no real lower bound to how little GPU an application may need. Same thing with the CPU, more or less.

Of course there are limits where you can't lower frequency anymore or where doing so gives you no power benefit. So it makes sense to stop there. And you can advertise base speeds here if you really want. That seems to be what Intel is doing with the very low GPU base speeds on their 17W parts.

But establishing a TDP that allows all of the CPU and GPU cores to run at this advertised frequency is, IMO, a bad marketing decision, because it gives a baseline that's outside what most people are interested in using. In other words, it sells the device short.
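
A made-up illustration of that framing problem (hypothetical numbers): take a part whose CPU actually ends up anywhere between 1.0 and 1.8GHz depending on how much of the budget the GPU is eating. The silicon behaves identically either way; only the advertised baseline moves.

ACTUAL_MIN_GHZ, ACTUAL_MAX_GHZ = 1.0, 1.8  # hypothetical part

def marketing_story(advertised_base_ghz):
    # same hardware, two different framings of the same DVFS range
    if advertised_base_ghz <= ACTUAL_MIN_GHZ:
        return "turbos up to %.1fGHz" % ACTUAL_MAX_GHZ
    return "throttles to %.1fGHz under combined CPU+GPU load" % ACTUAL_MIN_GHZ

print(marketing_story(1.0))  # glass half full
print(marketing_story(1.8))  # glass half empty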