Exo,
I can believe the bolded sentence. Krait isn't that much faster than Atom, except in FP. General performance doesn't seem to be hugely better than that of the regular A9 cores.
I think it's more nuanced than that, but it's really hard to form expectations of a chip when so little information is available. One other impression I got from the benches is that raw memory performance was substantially better.
According to ARM though, a 1.4GHz A9 gets a SpecInt2K score of ~455. The original Silverthorne-based core at 1.6GHz gets 653, and the 1.8GHz Pineview D525 gets 725.
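Normalizing those per clock, just using the numbers as quoted (a quick sketch):

```python
# Per-GHz normalization of the SpecInt2K scores quoted above.
for name, score, ghz in [("Cortex-A9", 455, 1.4),
                         ("Atom (Silverthorne)", 653, 1.6),
                         ("Atom D525 (Pineview)", 725, 1.8)]:
    print(f"{name}: ~{score / ghz:.0f} SpecInt2K/GHz")
```

That puts Atom roughly 25% ahead per clock, so the gap isn't just frequency.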
Where has ARM ever delivered SpecInt2K scores for Cortex-A9? Benchmarks like Spec are influenced by more than the CPU core, especially in the case of Cortex-A9 where even the L2 cache isn't part of the core. Of course ARM could do it on an engineering chip that uses their L2 controller (like everyone does AFAIK) and their memory controller (which not everyone does)..
There's a paper by VIA comparing SpecInt2K scores between Atom and Nano:
http://www.via.com.tw/en/downloads/whitepapers/processors/NanoX2_whitepaper_201107.pdf

There's an important additional point in there: on Atom, ICC produces scores 25% better than GCC. It also produces scores 20% better on Nano, which shows this isn't all down to superior scheduling for an in-order processor that's more sensitive to it. Of course Intel uses ICC when reporting Spec numbers, but no one is using ICC on Android, and I doubt many are using it on Windows 8, especially if also targeting WinRT. This was with GCC 4.5.1, so a little old, but typical x86 performance hasn't improved that much since then (typical ARM performance has improved more).
Part of the advantage from ICC is due to it being a legitimately better compiler, but another part is probably due to Intel specifically tuning for Spec (although given the real-world nature of the Spec tests it's hard to separate the two).
Phoronix tests are same-version GCC vs GCC, so I find them more even-handed and useful for hardware comparison. Hopefully someone gets Linux onto a Clover Trail tablet so they can test there, although they could probably do a lot of the tests on a stock Win8 machine.
The Acer W510 benchmarks show a 45-65% advantage in TouchXPRT, which corresponds to the approximate difference in SpecInt2K performance.
That difference shrinks to 20-60% in WebXPRT, which can be explained by the application being at least somewhat multi-threaded.
Those are some data points. Another is the Word boot test, where the Tegra 3 machine seems to beat the Clover Trail one. That could be down to storage speed instead, although the big CPU usage spikes suggest it isn't I/O-bound the whole time.
I asked in the comments for CPU utilization graphs so we could do real analysis of threading vs performance instead of just guessing. This should be easy for Anand to do, and I hope the request gains traction.
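Something like this would be enough; a minimal sketch, assuming the third-party psutil package is installed on the test machine (the file name and sample interval are my arbitrary choices):

```python
import time
import psutil  # third-party package; works on Windows 8 and Linux alike

# Log per-core utilization at 0.5s intervals while a benchmark runs,
# so threading behavior can be separated from raw per-core speed.
with open("cpu_log.csv", "w") as log:
    for _ in range(240):  # ~2 minutes of samples
        per_core = psutil.cpu_percent(percpu=True)
        log.write(",".join(f"{u:.1f}" for u in per_core) + "\n")
        time.sleep(0.5)
```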
It's going to take more than "multi-threaded to some degree" to really make a difference here when comparing 4C vs 2C/4T on a system that gets a large boost from HT. You'd probably need over a 2.5x threading gain before Tegra 3 starts to pull ahead. I personally don't think four synchronously clocked, same-voltage cores are a great setup for client loads. That's why I think HT, Turbo Boost, Qualcomm's asynchronous clock domains, and big.LITTLE (but much less so Tegra's "companion core") are very nice features for this space.
Atom's Hyper-Threading brings ~35% in SpecInt_rate. Combined with the ~60% single-threaded advantage, that translates into a ~2.2x gain, while the quad core in Tegra probably brings not double but 1.7-1.8x over two cores. That makes sense too.
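Spelling out that arithmetic (all of these are the assumed figures from this thread, and I'm reading the 1.7-1.8x as the gain of four A9 cores over two):

```python
atom_st   = 1.60   # assumed ~60% per-core single-thread advantage over A9
ht_gain   = 1.35   # ~35% from Hyper-Threading, per the SpecInt_rate figure
quad_gain = 1.75   # four A9 cores over two: 1.7-1.8x rather than an ideal 2x

atom_2c4t = 2 * atom_st * ht_gain        # ~4.3 "A9-core units"
tegra_4c  = 2 * quad_gain                # ~3.5 "A9-core units"
print(f"single-threaded: ~{atom_st:.2f}x")               # 1.6x
print(f"per core pair:   ~{atom_st * ht_gain:.2f}x")     # the ~2.2x above
print(f"fully threaded:  ~{atom_2c4t / tegra_4c:.2f}x")  # ~1.23x
```

By that math the Atom advantage would run from ~1.6x single-threaded down to ~1.2x fully threaded, which brackets the 20-60% WebXPRT spread reasonably well.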
1.7-1.8x on what, very parallel stuff that maxes out all the cores without adding a lot of overhead in doing so? If so, why only that much? L2 cache competition, bandwidth starvation? The latter is going to be more of a Tegra 3 problem than a Cortex-A9 problem, given everyone else is using dual-channel controllers..
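Bandwidth starvation at least would be easy to check. Here's a crude STREAM-style probe (a sketch; the array size is an arbitrary choice on my part); running several copies at once would show how throughput degrades under contention on a single-channel controller:

```python
import time
import numpy as np

n = 10_000_000                 # ~80 MB per array, far beyond any L2
a = np.zeros(n)
b = np.random.rand(n)
c = np.random.rand(n)

t0 = time.perf_counter()
a[:] = b + 2.0 * c             # triad: two streams read, one written
dt = time.perf_counter() - t0
print(f"~{3 * a.nbytes / dt / 1e9:.2f} GB/s sustained")
```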
Some more regarding memory bandwidth.. Anand didn't say what type of memory is used on the tablets (it doesn't seem to be in the W510 review either). ARK says Clover Trail supports up to 400MHz LPDDR2, so we can go with that (http://us.acer.com/ac/en/US/content/model-datasheet/NT.L0KAA.001 at least confirms LPDDR2, and going with anything under 400MHz seems crazy to me) vs 750MHz DDR3L in Surface RT. That means the raw bandwidth is not that different, but it has implications for power consumption. Even though it's one channel vs two, memory power consumption doesn't scale linearly with frequency; it's much worse than that (just like with processors). So Tegra 3 takes a big hit running its RAM at such a high frequency. It also takes a hit for using DDR3L instead of LPDDR2, especially at idle.
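For reference, the raw-bandwidth arithmetic (the 2x32-bit vs 1x32-bit channel configurations are my assumptions about these particular SoCs, not something stated in the reviews):

```python
def peak_gbps(channels, bus_bits, mt_per_s):
    # channels * bus width in bytes * transfers per second
    return channels * (bus_bits / 8) * mt_per_s / 1000

print(peak_gbps(2, 32, 800))   # Clover Trail, 2x32-bit LPDDR2-800: 6.4 GB/s
print(peak_gbps(1, 32, 1500))  # Tegra 3, 1x32-bit DDR3L-1500:      6.0 GB/s
```

Within ~7% of each other at peak, but reached at very different clocks, which is exactly where the power argument comes in.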
Memory power consumption will tend to couple pretty closely with CPU power consumption. I don't know if the rails measured are the ones the memory draws from, but even if they aren't, the memory controller will also use more power at the much higher frequency despite its lower width (a 64-bit memory controller probably doesn't use anywhere near twice what a 32-bit one does, because a lot of the command work is redundant).
Anand doesn't mention memory at all and just attributes everything to the SoC, which is IMO sloppy.
This is also a major, rarely mentioned factor in the Cortex-A15 Chromebook power consumption tests, where dual-channel 800MHz DDR3L is used.
The A15 is indeed much faster, with a Spec score of ~900 at 1.5GHz.
Source please.. is this from ARM's recent claims regarding A57?