Intel Medfield Performance Oddities

Arachnotronic · Oct 28, 2012

Okay, so I'm at a complete loss to explain the following, so any help would be much appreciated.

In the Anandtech reviews of the Surface and the iPhone 5, we see benchmarks of the Atom absolutely dominating the Qualcomm Snapdragon S4, the Nvidia Tegra 3, and even the Apple A6. However, when I peruse the Geekbench database, I see that the Medfield, while faster than a dual core Cortex A9 @ 1GHz, is slower than the Tegra 3 and the Snapdragon S4.

What explains this disparity? Why does Medfield/Clover Trail win all these more "real-world" CPU oriented tests but seems to be just "competitive" in Geekbench?

LogOver · Oct 28, 2012

The first reason is that Anand tested JavaScript, which is single-threaded. Geekbench tests both single-threaded and multithreaded performance.
The second reason is compiler differences between operating systems. For example, the Atom N450 has almost the same frequency as the Z2460, but its Linux scores are much higher.

ShintaiDK · Oct 28, 2012

Remember any platform differences as well. It's basically the classic issue of comparing two laptops: same CPU, possibly completely different performance.

jones377 · Oct 28, 2012

In addition, JavaScript performance on x86 has been optimized a lot over the years due to the browser wars.

Nemesis 1 · Oct 28, 2012

Intel17 said:
Okay, so I'm at a complete loss to explain the following, so any help would be much appreciated.

In the Anandtech reviews of the Surface and the iPhone 5, we see benchmarks of the Atom absolutely dominating the Qualcomm Snapdragon S4, the Nvidia Tegra 3, and even the Apple A6. However, when I peruse the Geekbench database, I see that the Medfield, while faster than a dual core Cortex A9 @ 1GHz, is slower than the Tegra 3 and the Snapdragon S4.

What explains this disparity? Why does Medfield/Clover Trail win all these more "real-world" CPU oriented tests but seems to be just "competitive" in Geekbench?

Was the other review using a RAZR i? If not, that's your difference. For a single core, Medfield is great for Intel's first SoC. Had they used newer, better Imagination graphics, it would be really good.

Arachnotronic · Oct 28, 2012

LogOver said:
The first reason is that Anand tested JavaScript, which is single-threaded. Geekbench tests both single-threaded and multithreaded performance.
The second reason is compiler differences between operating systems. For example, the Atom N450 has almost the same frequency as the Z2460, but its Linux scores are much higher.

He tested the Kraken benchmark, which is very CPU heavy & multithreaded. Medfield won there, too.

LogOver · Oct 28, 2012

Intel17 said:
He tested the Kraken benchmark, which is very CPU heavy & multithreaded. Medfield won there, too.

Yes, but it is still JavaScript (so single-threaded). The Z2460 has better single-threaded performance than ARM A9 processors, but it has fewer cores.
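The single-threaded vs. core-count trade-off argued here can be put into rough numbers with Amdahl's law scaled by per-core speed. This is a minimal sketch with hypothetical speeds (1.3 for one fast core, 1.0 for each of two slower cores), not measurements of the Z2460 or any ARM SoC, and it ignores Hyper-Threading, turbo and memory effects.

```python
def relative_throughput(single_thread_speed: float, cores: int, parallel_fraction: float) -> float:
    """Amdahl's law, scaled by how fast each core is relative to a baseline core.

    single_thread_speed: per-core speed vs. a baseline core (hypothetical number)
    cores: physical cores the workload can use
    parallel_fraction: share of the work that spreads across all cores (0..1)
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    return single_thread_speed / (serial_fraction + parallel_fraction / cores)


# Hypothetical chips: one fast core vs. two slower cores (NOT measured data).
for p in (0.0, 0.5, 0.9):
    one_fast = relative_throughput(1.3, cores=1, parallel_fraction=p)
    two_slow = relative_throughput(1.0, cores=2, parallel_fraction=p)
    print(f"parallel={p:.1f}  one fast core={one_fast:.2f}  two slow cores={two_slow:.2f}")
```

A purely single-threaded test sits at p = 0, where only per-core speed matters. With these made-up numbers the single fast core stays ahead only while less than roughly 46% of the work is parallel.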

Nemesis 1 · Oct 28, 2012

We saw Medfield at the show last January, so that's when we should see Intel's version of the 22nm super smartphone, since Intel is on yearly updates now, as they said they would be.

LogOver · Oct 28, 2012

jones377 said:
In addition, JavaScript performance on x86 has been optimized a lot over the years due to the browser wars.

Not necessarily true now. For example, Win8 shares the same JavaScript source code across both the x86 and ARM versions, as does Chrome for Android.

Nemesis 1 · Oct 28, 2012

Well, I really don't care what apps or benchmarks are used. The RAZR i with the 2GHz Atom runs great, and that's not the best 32nm core to be released. Intel is likely already shipping the Z2580, the dual-core Atom with updated graphics. When we'll see the first one from Motorola is anyone's guess. That will be the high-end Saltwell (Medfield). The present RAZR i is the middle model for Google, then the Z2000 fills out 32nm Intel Saltwell (Medfield) on the low end for Google, plus any other design wins Intel gets. So Intel will likely show 22nm in January. Phone CPU releases are nothing like PC releases.

Nemesis 1 · Oct 28, 2012

Here is AT's take on the whole line:

http://www.anandtech.com/print/5592

Exophase · Oct 29, 2012

LogOver said:
Not necessarily true now. For example, Win8 shares the same JavaScript source code across both the x86 and ARM versions, as does Chrome for Android.

Same browser doesn't mean same code for the JS JIT (which dominates performance), unless you think the same code will emit two completely different types of machine instructions...

It's dead obvious how much more mature JS performance is on x86 than ARM if you simply compare how benchmark scores have changed in various browsers for both over the last couple years.

LogOver · Oct 29, 2012

Exophase said:
Same browser doesn't mean same code for the JS JIT (which dominates performance), unless you think the same code will emit two completely different types of machine instructions...

Chrome/Chromium uses the same open-source V8 JavaScript engine across all platforms:
http://code.google.com/p/v8/

MS also claimed that Windows8 shares source code with Windows RT.

Exophase said:
It's dead obvious how much more mature JS performance is on x86 than ARM if you simply compare how benchmark scores have changed in various browsers for both over the last couple years.

I would say that JavaScript performance has progressed much faster on the ARM platform. I actually think that some companies (MS, Apple) have started to optimize their JavaScript engines for specific benchmarks (for example SunSpider, which contains a big chunk of "dead code").
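Why dead code in a benchmark matters: if a benchmark computes values it never uses, a compiler that can prove this skips the work entirely, which raises the score without making real pages faster. A minimal sketch of dead-code elimination over straight-line code, assuming a hypothetical three-address form (`Instr`) in which every operation is free of observable side effects; a real JS engine has to prove that absence of side effects before it can drop anything.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Operand = Union[str, int]  # a variable name or an integer constant


@dataclass(frozen=True)
class Instr:
    dest: str
    op: str
    args: Tuple[Operand, ...]


def eliminate_dead_code(program: List[Instr], result: str) -> List[Instr]:
    """Backward liveness pass over straight-line, side-effect-free code."""
    live = {result}
    kept: List[Instr] = []
    for instr in reversed(program):
        if instr.dest not in live:
            continue  # nothing later reads this value, so drop the instruction
        live.discard(instr.dest)  # this definition satisfies the later use
        live.update(a for a in instr.args if isinstance(a, str))
        kept.append(instr)
    kept.reverse()
    return kept


# Hypothetical benchmark-style kernel: "junk" is computed but never used.
program = [
    Instr("a", "const", (2,)),
    Instr("b", "mul", ("a", 3)),
    Instr("junk", "mul", ("b", 1000)),
    Instr("c", "add", ("a", 1)),
]
print([i.dest for i in eliminate_dead_code(program, "c")])  # ['a', 'c']
```

Note that `b` disappears too: its only consumer was the unused `junk`, so removing one dead value can expose more.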

Exophase · Oct 29, 2012

LogOver said:
Chrome/Chromium uses the same open-source V8 JavaScript engine across all platforms:
http://code.google.com/p/v8/

MS also claimed that Windows8 shares source code with Windows RT.

I don't think you see what I'm saying. I know that the same project is used cross platform and much of the code is shared, but what you're saying is like claiming that GCC generates the same quality code on all platforms. It doesn't. A lot of performance in a compiler is specific to the code generation on a specific platform.

LogOver said:
I would say that JavaScript performance has progressed much faster on the ARM platform. I actually think that some companies (MS, Apple) have started to optimize their JavaScript engines for specific benchmarks (for example SunSpider, which contains a big chunk of "dead code").

That's exactly my point. If something is improving a lot faster that means it's less mature. Think about it.

LogOver · Oct 30, 2012

Exophase said:
I don't think you see what I'm saying. I know that the same project is used cross platform and much of the code is shared, but what you're saying is like claiming that GCC generates the same quality code on all platforms. It doesn't. A lot of performance in a compiler is specific to the code generation on a specific platform.

Yes, it always comes down to the compiler (as I said in the beginning). But who knows which compiler is more mature now? With all the hype surrounding ARM CPUs lately, it would not be an exaggeration to suggest that much more resources are being thrown into ARM GCC optimization (who would really care about Atom GCC optimization for Linux?). Think about it.

Exophase said:
That's exactly my point. If something is improving a lot faster that means it's less mature. Think about it.

Yes, but using a car analogy: if one car is riding behind another but has a higher speed, then at some point the first car will overtake the second. How do we know whether this point in time has already occurred or not?

Exophase · Oct 30, 2012

LogOver said:
Yes, it always comes down to the compiler (as I said in the beginning). But who knows which compiler is more mature now? With all the hype surrounding ARM CPUs lately, it would not be an exaggeration to suggest that much more resources are being thrown into ARM GCC optimization (who would really care about Atom GCC optimization for Linux?). Think about it.

It has nothing to do with GCC. I'm not sure you understand that V8, like every other major JavaScript engine, uses a JIT and therefore generates its own code. That's what's performance sensitive here. The Linux part also means nothing. It's not about specific Atom optimization, it's much more about the ISA, and JS engines have targeted x86 for a lot longer.

LogOver said:
Yes, but using a car analogy: if one car is riding behind another but has a higher speed, then at some point the first car will overtake the second. How do we know whether this point in time has already occurred or not?

Because in the history of compiler improvement you don't tend to see a back-end start way behind then improve tremendously and quickly overtake another one that had been the focus for years longer. Progress follows a pretty smooth and monotonic curve, with big improvements at first followed by smaller and smaller ones gradually.

For something like JavaScript the situation favors the first implementation more, because there are a lot of extra compiler idioms that try to optimize around dynamic typing. With these systems originally designed with x86 targets in mind (for instance with respect to things like specializing code using self-modification), there's going to be more entrenched code that has to be redone for best performance.

ARM is harder to generate good code for than x86, because you have to manage large immediates, have to cache more in registers, and have to give additional special support for folded shifts, predication, address post-increment, and block memory instructions, as well as a bunch of others that were added since ARMv6. There's just more room to grow.
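To illustrate the "large immediates" point: in the classic ARM (A32) encoding, a data-processing instruction can only embed a constant that is an 8-bit value rotated right by an even amount. The sketch below checks whether a 32-bit constant fits that form; when it does not, a code generator has to emit extra instructions (for example a MOVW/MOVT pair on ARMv7) or load the constant from a literal pool, whereas most x86 ALU instructions can take a full 32-bit immediate directly. Thumb-2 uses a different immediate scheme, which this sketch does not model, and the function names are illustrative.

```python
from typing import Optional, Tuple

MASK32 = 0xFFFFFFFF


def rol32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def arm_modified_immediate(value: int) -> Optional[Tuple[int, int]]:
    """Return (imm8, rot) with value == ROR(imm8, 2 * rot), or None if not encodable.

    Models the classic A32 data-processing immediate: 8 bits, rotated right
    by an even amount (0, 2, ..., 30).
    """
    value &= MASK32
    for rot in range(16):
        # Rotating left undoes the encoder's rotate-right.
        candidate = rol32(value, 2 * rot)
        if candidate < 256:
            return candidate, rot
    return None


for v in (0xFF, 0xFF000000, 0x3FC, 0x101, 0x12345678):
    print(hex(v), arm_modified_immediate(v))
```

Constants like `0x101` or `0x12345678` fall outside this form, which is one source of the extra work an ARM JIT back-end has to do.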

But if that's too speculative for you, just look at the thread's opening question: why does Atom do so well vs ARM in JS but not nearly so well in other things? It's not just Geekbench; the Phoronix tests show it too...

LogOver · Oct 30, 2012

Exophase said:
It has nothing to do with GCC. I'm not sure you understand that V8, like every other major JavaScript engine, uses a JIT and therefore generates its own code. That's what's performance sensitive here.

V8 does not use a JIT (it does not execute bytecode). V8 compiles JavaScript into native code before execution (like any other compiler).

Exophase said:
it's much more about the ISA, and JS engines have targeted x86 for a lot longer.

That's not true regarding V8. The project started sometime in 2008 for x86 and ARM simultaneously.

Exophase said:
But if that's too speculative for you, just look at the thread's opening question: why does Atom do so well vs ARM in JS but not nearly so well in other things? It's not just Geekbench; the Phoronix tests show it too...

Actually it seems like Atom does well vs. ARM in "other things" too (thanks for the Phoronix hint).
http://www.phoronix.com/scan.php?page=article&item=calxeda_ecx1000_atom&num=1
And they only tested well-multithreaded workloads. This actually supports what I said before: Atom is faster in single-threaded workloads, but lately ARM chips have more cores.

Exophase · Oct 30, 2012

LogOver said:
V8 does not use a JIT (it does not execute bytecode). V8 compiles JavaScript into native code before execution (like any other compiler).

No it doesn't. https://developers.google.com/v8/design Do you understand what dynamic code generation means? It handles it "when it's executed." Not ahead of time.

Being able to dynamically compile code is important for a JS engine because they make optimistic assumptions about data types for a variable, then have to compile alternative versions when these assumptions are proven wrong.
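The speculate-then-recompile pattern described above can be modeled in a few lines. This is a toy sketch: `SpeculativeAdd` and `Deopt` are hypothetical names, ordinary Python closures stand in for generated machine code, and when the guard fails this sketch simply falls back to a fully generic version, whereas a real engine might re-specialize for the newly observed types.

```python
from typing import Any, Callable, List, Optional


class Deopt(Exception):
    """Raised when a speculative guard fails (toy stand-in for a deopt bailout)."""


class SpeculativeAdd:
    def __init__(self) -> None:
        self.code: Optional[Callable[[Any, Any], Any]] = None
        self.history: List[str] = []

    def _int_specialized(self) -> Callable[[Any, Any], Any]:
        self.history.append("int-specialized")

        def fast(a: Any, b: Any) -> Any:
            # Guard for the optimistic assumption "both operands are ints".
            if type(a) is not int or type(b) is not int:
                raise Deopt
            return a + b

        return fast

    def _generic(self) -> Callable[[Any, Any], Any]:
        self.history.append("generic")

        def slow(a: Any, b: Any) -> Any:
            return a + b  # makes no assumptions about operand types

        return slow

    def __call__(self, a: Any, b: Any) -> Any:
        if self.code is None:
            self.code = self._int_specialized()  # optimistic first compile
        try:
            return self.code(a, b)
        except Deopt:
            # Assumption proven wrong: discard the specialized version
            # and install one that no longer speculates.
            self.code = self._generic()
            return self.code(a, b)


add = SpeculativeAdd()
print(add(1, 2))      # 3
print(add(1.5, 2))    # 3.5
print(add.history)    # ['int-specialized', 'generic']
```

The second call is where the "alternative version" gets compiled: only at run time does the engine learn its int assumption was wrong.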

And even if you were right GCC would have nothing to do with it. Unless you seriously think V8 converts JS to C code and has GCC compile it.

LogOver said:
That's not true regarding V8. The project started sometime in 2008 for x86 and ARM simultaneously.

It doesn't matter; the point is that aggressive optimizations on x86 were started much earlier, with the big pissing match in 2008 when Chrome was first released. Who was comparing browser performance on ARM then? Were you? Android was barely being used.

LogOver said:
Actually it seems like Atom does well vs. ARM in "other things" too (thanks for the Phoronix hint).
http://www.phoronix.com/scan.php?page=article&item=calxeda_ecx1000_atom&num=1
And they only tested well-multithreaded workloads. This actually supports what I said before: Atom is faster in single-threaded workloads, but lately ARM chips have more cores.

Sure, a 1.8GHz dual core Atom does well vs a 1GHz dual core Cortex-A9 (particularly one on a fairly old SoC running ES chips with a bunch of errata like Pandaboard; look up Exynos 4 results, they're much better). And on multithreaded loads the gap is bigger because Atom has HT.

On single-threaded workloads Cortex-A9 typically does a little better clock for clock (except with JS), which is exactly what you'd expect given the respective uarchs. Especially with Medfield, which has to contend with the smaller register file of x86-32, which is a bigger problem for Atom.
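To make "clock for clock" concrete: dividing a single-threaded score by the clock it ran at gives a crude per-clock figure. The scores and clocks below are hypothetical, not measured results, and the approach assumes performance scales linearly with frequency, which memory latency and turbo behavior (relevant to Medfield, since it is unclear which clock to divide by) can break.

```python
def per_ghz(score: float, clock_ghz: float) -> float:
    """Crude 'clock for clock' figure: benchmark score divided by clock speed."""
    if clock_ghz <= 0:
        raise ValueError("clock_ghz must be positive")
    return score / clock_ghz


# Hypothetical single-threaded scores and clocks, NOT measured results.
chips = {
    "chip_a": (1000.0, 1.0),
    "chip_b": (1400.0, 1.6),
}
for name, (score, ghz) in chips.items():
    print(f"{name}: score={score:.0f} at {ghz} GHz -> {per_ghz(score, ghz):.0f} per GHz")
```

Here chip_b wins on absolute score while chip_a does more per clock, which is exactly why the two framings in this thread can both be "right".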

LogOver · Oct 30, 2012

Exophase said:
No it doesn't. https://developers.google.com/v8/design Do you understand what dynamic code generation means? It handles it "when it's executed." Not ahead of time.

Read it again:
V8 compiles JavaScript source code directly into machine code when it is first executed. There are no intermediate byte codes, no interpreter. Property access is handled by inline cache code that may be patched with other machine instructions as V8 executes.
Do you understand what it means and how it differs from the JIT in other JavaScript engines?
But in this context it really doesn't matter. I still don't see any proof that the x86 version is better optimized.
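The "inline cache" in the quoted V8 description can be illustrated with a toy monomorphic cache. Everything here is a simplification for illustration, not V8's implementation: `Shape` stands in for a hidden class, each property-access site keeps a single cached (shape, slot) pair, and "patching" rewrites two fields rather than machine instructions.

```python
from typing import Dict, List, Optional, Tuple


class Shape:
    """Stand-in for a hidden class: maps property names to slot indices."""

    def __init__(self, names: Tuple[str, ...]) -> None:
        self.offsets: Dict[str, int] = {name: i for i, name in enumerate(names)}


class Obj:
    def __init__(self, shape: Shape, slots: List[object]) -> None:
        self.shape = shape
        self.slots = slots


class PropertyLoadSite:
    """One property access in the source (e.g. `p.x`) with a one-entry cache."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cached_shape: Optional[Shape] = None
        self.cached_offset = -1
        self.hits = 0
        self.misses = 0

    def load(self, obj: Obj) -> object:
        if obj.shape is self.cached_shape:
            # Fast path: one identity check plus an indexed read.
            self.hits += 1
            return obj.slots[self.cached_offset]
        # Slow path: look the name up, then "patch" the site for next time.
        self.misses += 1
        offset = obj.shape.offsets[self.name]
        self.cached_shape, self.cached_offset = obj.shape, offset
        return obj.slots[offset]


xy = Shape(("x", "y"))
yx = Shape(("y", "x"))
site = PropertyLoadSite("x")
for o in (Obj(xy, [1, 2]), Obj(xy, [3, 4]), Obj(yx, [5, 6])):
    print(site.load(o))        # 1, then 3, then 6
print(site.hits, site.misses)  # 1 2
```

Objects sharing a shape hit the fast path; a new shape forces the slow lookup and re-patches the site.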

Exophase said:
It doesn't matter; the point is that aggressive optimizations on x86 were started much earlier, with the big pissing match in 2008 when Chrome was first released.

Oh, as if ARM didn't exist before 2008.

Exophase said:
Who was comparing browser performance on ARM then? Were you?

And who was comparing browser performance on x86? SunSpider first appeared in mobile phone review articles. No one cares about JavaScript performance on PC (MS is miles behind other browsers and is in no hurry to fix it).

Exophase said:
Sure, a 1.8GHz dual core Atom does well vs a 1GHz dual core Cortex-A9 (particularly one on a fairly old SoC running ES chips with a bunch of errata like Pandaboard; look up Exynos 4 results, they're much better). And on multithreaded loads the gap is bigger because Atom has HT.

Did you read the article at all? How did 4-core, 1.4GHz A9 turn into 1GHz dual-core?

Exophase said:
On single-threaded workloads Cortex-A9 typically does a little better clock for clock (except with JS), which is exactly what you'd expect given the respective uarchs.

It would be a real shame if the out-of-order A9 lost to the in-order Atom clock for clock (but the A9 is really not much better). Anyway, who cares about clock for clock?

Exophase · Oct 30, 2012

LogOver said:
Read it again:
V8 compiles JavaScript source code directly into machine code when it is first executed. There are no intermediate byte codes, no interpreter. Property access is handled by inline cache code that may be patched with other machine instructions as V8 executes.
Do you understand what it means and how it differs from the JIT in other JavaScript engines?
But in this context it really doesn't matter. I still don't see any proof that the x86 version is better optimized.

You seriously have no idea what you're talking about.

JIT = dynamic compilation. Period. "When it's first executed" = "just in time." You don't have to be hot-spot based to be a JIT.
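The "compiled when first executed" behavior can be sketched as a wrapper that defers compilation until the first call and then reuses the result. `LazyFunction` is a hypothetical name, and Python's built-in `compile()` producing Python bytecode stands in for native code generation; the point is only when compilation happens, not what it produces.

```python
from typing import Any, Callable, Dict, Optional


class LazyFunction:
    """Toy 'compile on first call' wrapper around a function's source text."""

    def __init__(self, name: str, source: str) -> None:
        self.name = name
        self.source = source
        self.compiled: Optional[Callable[..., Any]] = None
        self.compile_count = 0

    def __call__(self, *args: Any) -> Any:
        if self.compiled is None:
            # Compilation happens here, at the first call, not when the
            # source is loaded: that timing is what makes it "just in time".
            namespace: Dict[str, Any] = {}
            exec(compile(self.source, f"<{self.name}>", "exec"), namespace)
            self.compiled = namespace[self.name]
            self.compile_count += 1
        return self.compiled(*args)


square = LazyFunction("square", "def square(n):\n    return n * n\n")
print(square.compile_count)  # 0: loaded but not yet compiled
print(square(7))             # 49: first call triggers compilation
print(square(8))             # 64: reuses the compiled function
print(square.compile_count)  # 1
```

Whether the output is bytecode or machine code, compiling at first execution rather than ahead of time is what puts it in the JIT category.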

LogOver said:
Oh, as if ARM didn't exist before 2008.

No, I said Android. Or you think that Google spent a lot of time optimizing for ARM before Android became popular? Dalvik itself didn't even have a JIT until 2010, and certainly not a good one (2-3 times slower than JME) and yet you think that in 2008 Google put as much effort into V8 for ARM as they did for x86, where it DID have a JIT?

Talk about very bad priorities.

LogOver said:
And who was comparing browser performance on x86? SunSpider first appeared in mobile phone review articles. No one cares about JavaScript performance on PC (MS is miles behind other browsers and is in no hurry to fix it).

In 2008 when Chrome was released? Seriously? You think JS benchmarks started with phones? You must have been paying zero attention back then. JS performance was one of Chrome's biggest selling points on its first release. Before then no one really cared, but it brought that topic to the forefront. See for yourself:

http://news.cnet.com/8301-1001_3-10030888-92.html

MS isn't very far behind anymore either, and has published its own articles on JS performance improvements.

LogOver said:
Did you read the article at all? How did 4-core, 1.4GHz A9 turn into 1GHz dual-core?

I was talking about the Pandaboard, obviously. The 4 core Calxeda nodes were shown beating the D525 a lot of the time at much lower clocks...

LogOver said:
It would be a real shame if the out-of-order A9 lost to the in-order Atom clock for clock (but the A9 is really not much better). Anyway, who cares about clock for clock?

I care about clock for clock when you can get a phone with a Cortex-A9 at 1.6GHz, while Medfield only turbos to 1.6-2GHz. Atom only yields a real clock speed advantage when it budgets a bunch of extra TDP for it. See, here I thought we were talking about Medfield, not the D525...

LogOver · Oct 31, 2012

Exophase said:
No, I said Android. Or you think that Google spent a lot of time optimizing for ARM before Android became popular? Dalvik itself didn't even have a JIT until 2010, and certainly not a good one (2-3 times slower than JME)

What are you talking about? What does the Dalvik JIT have to do with V8?

Exophase said:
and yet you think that in 2008 Google put as much effort into V8 for ARM as they did for x86, where it DID have a JIT?

I think you underestimate Google. Here is a small history lesson for you:
http://www.niallkennedy.com/blog/2008/09/google-chrome.html
Clearly V8 development started from scratch with x86 and ARM optimization in mind.

Exophase said:
In 2008 when Chrome was released? Seriously? You think JS benchmarks started with phones? You must have been paying zero attention back then. JS performance was one of Chrome's biggest selling points on its first release.

Using JS benchmarks as processor benchmarks started with phones. On PC, JS benchmarks were used as arguments in the "browser wars" but never as a CPU speed test.
But anyway, back then Chrome was faster not because of specific x86 optimizations but because of the different approach they took (compiling JavaScript into native code vs. interpreting bytecode).

Exophase said:
MS isn't very far behind anymore either

Try running any JS benchmark (Octane, Kraken, V8...) and see for yourself.

Exophase said:
The 4 core Calxeda nodes were shown beating the D525 a lot of the time at much lower clocks...

Let me fix it for you:
The 4 core Calxeda nodes were shown beating the D525 sometimes at much lower clocks but with 2 more cores.
I hope you understand the difference between a core and Hyper-Threading.

Exophase said:
I care about clock for clock when you can get a phone with a Cortex-A9 at 1.6GHz, while Medfield only turbos to 1.6-2GHz. Atom only yields a real clock speed advantage when it budgets a bunch of extra TDP for it.

And you think 1.6 GHz for the A9 is an instant clock? And can you provide any proof that a 4-core ARM under load (at full clock speed) consumes less power than Medfield at turbo? The only comparison I saw is this:
http://www.anandtech.com/show/6330/the-iphone-5-review/12
Medfield at 2GHz consumes a little more than the dual-core Krait, while Tegra 3 consumes a lot more.

Exophase · Oct 31, 2012

LogOver said:
What are you talking about? What does the Dalvik JIT have to do with V8?

The point is that Dalvik performance is more fundamental to Android than V8 is, and yet Google saw fit to release Dalvik without even having a JIT for quite a long time.

LogOver said:
I think you underestimate Google. Here is a small history lesson for you:
http://www.niallkennedy.com/blog/2008/09/google-chrome.html
Clearly V8 development started from scratch with x86 and ARM optimization in mind.

Mentioning ARM and x86 in the same sentence there doesn't mean they got equal attention.

In 2011 V8 got some very basic optimizations to massively improve ARM performance:

http://blogs.arm.com/software-enablement/456-googles-v8-on-arm-five-times-better/

And they came from ARM employees, not Google. You can't possibly say Google was aggressively optimizing for ARM when they didn't even have VFP code generation for a float heavy language. I don't think you really understand the implications here. In 2011 was JS performance improving 5 times for V8 on x86? No.

LogOver said:
Using JS benchmarks as processor benchmarks started with phones. On PC, JS benchmarks were used as arguments in the "browser wars" but never as a CPU speed test.
But anyway, back then Chrome was faster not because of specific x86 optimizations but because of the different approach they took (compiling JavaScript into native code vs. interpreting bytecode).

It makes absolutely zero difference if people were using them as CPU benchmarks, the fact is that in 2008 people were very interested in Javascript performance on browsers all of a sudden. Meaning that the browser with the best JS performance won, and this meant the best x86 code gen. ARM wasn't on the radar at all.

Chrome was the first browser to use JITs for their JS engine, yes. But the others (Firefox, IE, Safari, Opera, etc) really quickly followed suit, because of all the attention Google was getting over it. I never claimed that Chrome's advantage was due to targeting x86 more aggressively than the others, that's silly. Rather they were all targeting code gen and they were all focusing on x86 because that's what people were using. ARM performance wasn't taken seriously until at least 2010.

LogOver said:
Try running any JS benchmark (Octane, Kraken, V8...) and see for yourself.

Many already have. http://codehenge.net/blog/2012/08/javascript-performance-rundown-2012/

Of the three tested V8 only has a big win in the benchmark they developed themselves, big surprise there right?

And MS is still aggressively pursuing improved JS performance, like everyone else. http://encosia.com/interesting-details-on-ie10s-javascript-performance-tweaks/

LogOver said:
Let me fix it for you:
The 4 core Calxeda nodes were shown beating the D525 sometimes at much lower clocks but with 2 more cores.
I hope you understand the difference between a core and Hyper-Threading.

Of course I do.

The point is that the test is extremely apples to oranges for a CPU uarch test because you're looking at very different core and clock speed configurations. And Phoronix isn't reporting things like CPU utilization to try to get an idea of what threading requirements were necessary for each task.

LogOver said:
And you think 1.6 GHz for the A9 is an instant clock? And can you provide any proof that a 4-core ARM under load (at full clock speed) consumes less power than Medfield at turbo? The only comparison I saw is this:
http://www.anandtech.com/show/6330/the-iphone-5-review/12
Medfield at 2GHz consumes a little more than the dual-core Krait, while Tegra 3 consumes a lot more.

What do you mean "instant clock"? There are SoCs that can clock A9 that high. Where have I ever claimed that FOUR Cortex-A9 cores consume less power than ONE Atom core at similar clocks? Now you're being ridiculous.

Let's focus on what I did say: Cortex-A9 tends to perform a little better clock per clock in single threaded workloads vs Atom, and the two reach similar clocks (although in Medfield Atom can turbo - this could possibly be done in software with A9 SoCs but no one does it. But it's not really a uarch feature, IMO). I haven't said anything about power consumption, and it varies a lot depending on implementation, ie not just what manufacturing process is used but how it was laid out and optimized. But I would easily expect the Cortex-A9s on Samsung's 32nm process, that is in A5r2 and Exynos 44xx for instance, to consume a lot less power at the same clock speed as Saltwell. So yes, much better perf/W for the CPU core, but if the L2 cache and/or memory controller suck that can change things.

I know people like to say stuff about single core Medfields beating 2-4 core Cortex-A9s and Kraits but this is ridiculous, they only look at single threaded tests while making these claims...

Nemesis 1 · Nov 1, 2012

So show us more real results that show multicore use. Let's see the results.
