2013 core sizes: A7-A15-Jaguar-Atom-Haswell


Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
You wish. AVX2 will be quickly adopted. We will see programs already when Haswell launches.

How many AVX-enabled apps exist now? Serious question. I am aware of only one - LinX/IBT.

The thing that sucks about new ISA extensions is you (the consumer) have to pay for them twice. First you have to buy the hardware, the CPU, and then you have to buy yet-another-version of your existing software.

Legacy software cost a bundle to acquire. I only upgrade my software if I absolutely have to. I still use the aged and antiquated Photoshop 6.0 (from circa 1998?) because it cost me $700 and it'll cost me another $700 to update it. Investing $1400 into Photoshop software is insane from my perspective.

I am far more interested in buying a $200 CPU that will increase the performance of my 12yr old software than I am interested in buying a few thousand dollars worth of updated software that would run faster thanks to having incorporated new ISA extensions.

Gaussian98 is another example, that software cost me nearly a thousand dollars, and to replace it (even as a consumer) will be another thousand bucks :\

Another app that comes to mind is Mathematica, I finally broke down and upgraded my 10+ yr old Mathematica 2.0 to Mathematica 8.0 but holy hell if it didn't cost me some $500 :eek:

When the hardware is dirt cheap compared to the software, I don't want new hardware (that I am already paying for) to require me to further spend money on acquiring new(er) software. I want my legacy apps to run faster with the new hardware.

AVX2 sounds great, so did AVX1, but I have a feeling they won't be prevalent and relevant for another 5-8yrs.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Which begs the question: why is AMD pouring any resources into BD when they are so far behind in every imaginable metric, when they could allocate those resources to Kabini where they are ahead and where their schwerpunkt lies? At the very very least, these guys should not be running 2 BD programs (Trinity and Vishera).

AMD is crazy.

For a very long time, I think AMD management was simply deluded about the value they were going to get from Bulldozer(/PD/SR/etc). I think they wanted to be a server company - they probably looked at Xeon (or 2003-2005 Opteron) and the potential margin %s made their eyes turn into dollar signs. Somehow they ended up convinced that Bulldozer could actually deliver those margins (whether it was through self-delusion or unrealistic projections from engineering doesn't really matter), and dumped massive resources into those projects. How that continued after Bulldozer silicon was in hand is beyond me. Deception coming from engineering about how much improvement the subsequent cores in the BD family would deliver?

Past experience for AMD probably convinced them that margins in the client side are terrible, but it's strange that they didn't realize sooner that the client market can deliver nice margins when you have a competitive product (the cat cores).
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
How many AVX-enabled apps exist now? Serious question. I am aware of only one - LinX/IBT.

The thing that sucks about new ISA extensions is you (the consumer) have to pay for them twice. First you have to buy the hardware, the CPU, and then you have to buy yet-another-version of your existing software.

The other annoyance is that Intel fuses AVX off on a lot of parts that could be supporting it, mainly Pentiums and Celerons that I'm sure make up a large part of the sales volume. I'm hoping they'll change their approach with AVX2 but I'm not that optimistic.
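Because AVX can be missing even on a recent part, software generally has to check for it at runtime and fall back if it isn't there. A minimal sketch of that dispatch, assuming Linux-style `/proc/cpuinfo` text as input; the function names `cpu_flags` and `pick_code_path` are made up for illustration and are not from any library:

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no flags line found: assume nothing is supported


def pick_code_path(flags):
    """Choose the widest SIMD path that the CPU reports as available."""
    if "avx2" in flags:
        return "avx2"
    if "avx" in flags:
        return "avx"
    if "sse2" in flags:
        return "sse2"
    return "scalar"
```

On a Linux box you would feed it the contents of `/proc/cpuinfo`; a part with AVX fused off simply lands on the `sse2` path, which is why a developer can't count on AVX being present just because the chip is new.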

What you'll probably hear is the claim that AVX2 will bring forth a revolution in compiler auto-vectorization because of vector gather support, and that programs will go from near 0% SIMD to near 100% in the performance critical parts. That now any loops with independent iterations will be auto-vectorized. Of course that's ignoring any conditional flow within the loop body and is assuming all loops will be better off with 8x vectorization.
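To make the conditional-flow point concrete, here is a rough model of what a vectorizer has to do with an indexed loop that branches. It is plain Python that only imitates the lane-by-lane behaviour (no real SIMD runs), and the loop body, the lane count of 8 (32-bit elements in a 256-bit register), and the function names are all assumptions picked for illustration:

```python
# Illustrative only: models how a vectorizer might handle an indexed,
# conditional loop using gather plus a mask. No real SIMD is executed.

LANES = 8  # assumption: eight 32-bit elements per 256-bit register


def scalar_loop(table, idx, threshold):
    """Reference loop: indirect load, then a branch per element."""
    out = []
    for i in idx:
        v = table[i]           # indirect load (what gather covers)
        if v > threshold:      # conditional flow inside the loop body
            out.append(v * 2)
        else:
            out.append(v + 1)
    return out


def simd_style_loop(table, idx, threshold):
    """Same result, LANES elements at a time, with no per-element branch."""
    out = []
    for base in range(0, len(idx), LANES):
        lanes = idx[base:base + LANES]
        gathered = [table[i] for i in lanes]      # one "gather"
        mask = [v > threshold for v in gathered]  # one vector compare
        taken = [v * 2 for v in gathered]         # both sides of the branch
        not_taken = [v + 1 for v in gathered]     # are computed for all lanes
        out.extend(t if m else n                  # then blended by the mask
                   for m, t, n in zip(mask, taken, not_taken))
    return out
```

Both functions return the same list, but the SIMD-style version always does the work for both sides of the branch, so the more expensive or lopsided the branch is, the less of the nominal 8x survives.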

I do a lot of integer SIMD personally and if I were writing stuff for high end x86 I'd probably love AVX2, not just for 256-bit integer and gather but I really like the new vector extract and deposit instructions, very useful for a lot of stuff I do. But I still don't think this is going to offer a huge average boost to everything that's recompiled. Especially not crazy things like 5x, come on.
 

pablo87

Senior member
Nov 5, 2012
374
0
0
For a very long time, I think AMD management was simply deluded about the value they were going to get from Bulldozer(/PD/SR/etc). I think they wanted to be a server company - they probably looked at Xeon (or 2003-2005 Opteron) and the potential margin %s made their eyes turn into dollar signs. Somehow they ended up convinced that Bulldozer could actually deliver those margins (whether it was through self-delusion or unrealistic projections from engineering doesn't really matter), and dumped massive resources into those projects. How that continued after Bulldozer silicon was in hand is beyond me. Deception coming from engineering about how much improvement the subsequent cores in the BD family would deliver?

Past experience for AMD probably convinced them that margins in the client side are terrible, but it's strange that they didn't realize sooner that the client market can deliver nice margins when you have a competitive product (the cat cores).

You hit the nail on the head. They bought ATI but were still thinking like AMD (we can only prosper with high end stuff) instead of thinking like AMD/ATI.

The WSA also affected their thinking IMO. Had Brazos been fabbed at Global Foundries, I personally believe they would have pursued a follow on much harder. I digress but that WSA is so bad, if as private equity you wanted to buy AMD, how do you (negatively) value such an off balance sheet liability that carries on until 2024 - negative $10B perhaps?

In this regard, we have our answer today from fud that Kabini will be fabbed at TSMC and thus will not contribute to the $1.1B take or pay which now explains why Richland carries on...:rolleyes:

I don't think investors have figured this out yet... though corporate customers seem to have, which is why server, almost regardless of how close to Intel they get even at half the price (certainly not at half the cost), will never get back to the glory days. Corporate customers think like investors, and what they're thinking - I'm guessing here - is that this WSA/GloFo stuff is a nightmare for this company.

Sorry Hans for derailing your thread
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
146
106
How many AVX-enabled apps exist now? Serious question. I am aware of only one - LinX/IBT.

I think there are more than you would normally expect, since compiler support has been around for quite some time. The bigger question, however, is how many AVX-enabled apps actually depend heavily on AVX instructions for their total performance.

Example of AVX enabled products:
http://www.niksoftware.com/site/

The story behind:
http://softtalkblog.com/2011/04/12/...se-study-for-parallel-computing-optimisation/

Cakewalk SONAR is another example of AVX enabled products.
http://software.intel.com/sites/default/files/m/d/4/1/d/8/Cakewalk_AVX_White_Paper.pdf

But it's not something you publicly get informed of with a sticker on a product, logo or something. That might be something for the future tho.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
An even more interesting question, IMO, would be how much of popularly used software is compiled with ICC as opposed to MSVC or GCC. That is, the software that's C/C++/Fortran in the first place. Stuff generated by a JIT or immature compiler for a less popular or newer language is pretty unlikely to use AVX.
 

Fox5

Diamond Member
Jan 31, 2005
5,957
7
81
How many AVX-enabled apps exist now? Serious question. I am aware of only one - LinX/IBT.

The thing that sucks about new ISA extensions is you (the consumer) have to pay for them twice. First you have to buy the hardware, the CPU, and then you have to buy yet-another-version of your existing software.

Legacy software cost a bundle to acquire. I only upgrade my software if I absolutely have to. I still use the aged and antiquated Photoshop 6.0 (from circa 1998?) because it cost me $700 and it'll cost me another $700 to update it. Investing $1400 into Photoshop software is insane from my perspective.

I am far more interested in buying a $200 CPU that will increase the performance of my 12yr old software than I am interested in buying a few thousand dollars worth of updated software that would run faster thanks to having incorporated new ISA extensions.

Gaussian98 is another example, that software cost me nearly a thousand dollars, and to replace it (even as a consumer) will be another thousand bucks :\

Another app that comes to mind is Mathematica, I finally broke down and upgraded my 10+ yr old Mathematica 2.0 to Mathematica 8.0 but holy hell if it didn't cost me some $500 :eek:

When the hardware is dirt cheap compared to the software, I don't want new hardware (that I am already paying for) to require me to further spend money on acquiring new(er) software. I want my legacy apps to run faster with the new hardware.

AVX2 sounds great, so did AVX1, but I have a feeling they won't be prevalent and relevant for another 5-8yrs.


If you're able to get by with such outdated versions of the software, you're probably not a power user of the software. If you're not a power user, you could probably get by with the open source clones that do get free updates and will benefit from new extensions.
Photoshop has easy replacements in the gimp (plus gimpshop) or paint.net.
Mathematica...less easy replacements, but Sage and Octave are worth taking a look at.
 

grimpr

Golden Member
Aug 21, 2007
1,095
7
81
IDC's plan is very clever. Since he is a basic user of those programs and they fit his needs exactly, why bother upgrading to the latest bloated version instead of upgrading his CPU, which will make all the old code run lightning fast? That's why he craves IPC above all else. Intel's chips cope very well with old code, in contrast to AMD's, which are server/workstation designs.
 

NTMBK

Lifer
Nov 14, 2011
10,450
5,833
136
If you're able to get by with such outdated versions of the software, you're probably not a power user of the software. If you're not a power user, you could probably get by with the open source clones that do get free updates and will benefit from new extensions.
Photoshop has easy replacements in the gimp (plus gimpshop) or paint.net.
Mathematica...less easy replacements, but Sage and Octave are worth taking a look at.

He's using "outdated" versions with over a decade's worth of CPU improvements. That's an order of magnitude more performance than he started out with (more, if the apps were multi-core aware). They still do the same tasks he needed them to do when he bought them, just now they do it a hell of a lot quicker.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
If you're able to get by with such outdated versions of the software, you're probably not a power user of the software. If you're not a power user, you could probably get by with the open source clones that do get free updates and will benefit from new extensions.
Photoshop has easy replacements in the gimp (plus gimpshop) or paint.net.
Mathematica...less easy replacements, but Sage and Octave are worth taking a look at.

I use the apps every day, multiple times a day. It is just that for these particular apps I have not bothered to upgrade them because the features added to the newer releases are features that I do not need (and they do not supplant or augment the existing features I do use).

That isn't to say that I don't have plans to update them, as I do update software regularly, but I'm not going to prioritize the updating of those three specific applications over the updating of other software packages unless there is some tangible ROI from doing so.

Thus far the higher ROI has been obtainable through hardware updates, not software updates. I'm expecting the ROI to not be there in about 4yrs though, so the software updates will happen then.
 

evangel76

Junior Member
Dec 14, 2008
10
0
0
OK, the 75% claim of the original post is totally bogus, as it is a comparison of theoretical capabilities. It assumes that the code does not branch and does not wait for memory. Better said, it is just not a real comparison and does not represent ANY real code.

Francois Piednoel
 

Acanthus

Lifer
Aug 28, 2001
19,915
2
76
ostif.org
For me, the story will be about performance and battery life in the apps I use.

A7 is pretty slow at 1.2GHz in my HTC Sensation on Jelly Bean when I am running encryption and browsing on WiFi in a full (non-mobile) browser.

I would be interested in a dual core phone with an upgraded camera and a standby battery life of more than 3 days with a talk time of more than 4 hours.

Quad core doesn't really interest me, there's no app support for that kind of threading.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
OK, the 75% claim of the original post is totally bogus, as it is a comparison of theoretical capabilities. It assumes that the code does not branch and does not wait for memory. Better said, it is just not a real comparison and does not represent ANY real code.

Francois Piednoel

So it is peak IPC then.

Do you have a better number? For realized IPC I mean.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
I don't see the problem with the 75% A9 IPC number. Two things are brought up, branches and memory stalls, so let's look at both..

A7 has a similar pipeline length to A9, meaning it has a similar branch mispredict penalty - in-order doesn't change this. On A8, A9, and A15 taken branches always have a 1-cycle fetch bubble, while on A7 this is alleviated if it hits the 4-entry BTIC (so this will apply to most loop branches, for instance). But it's small compared to the relatively huge BTAC in A9 (configurable 512-4096); the 8-entry BTAC in A7 is just used for indirect branches. Without knowing the penalty for missing on either processor it's hard to say what the real world implications are. Effective prediction accuracy will also be somewhat lower due to having a 256-entry GHB vs A9's 1024-16384 entry GHB array; in neither case is history register length specified (it's 10-bits on A8 though). So I'd expect overall branch performance to be only slightly worse at most.

For memory stalls, OoO does help reduce miss penalty but only modestly in A9's case, given that it can't reorder memory accesses. Beyond that A7 isn't really disadvantaged - L1 and L2 can be configured to similar sizes (up to 1MB in A7's case, which is bog standard for A9 SoCs) and the L2 is likely going to be lower latency because it's more tightly coupled. Main memory performance is out of the core's hands in both cases, but there's no reason why it has to be worse for an A7 implementation either. Both of them have auto-prefetchers which hit L1 directly. A9 does have better TLB options: A7 has 10/10 entry L1 DTLB/ITLB and 256 entry L2 TLB vs A9's 32 DTLB, 32/64 ITLB (configurable) and 64-512 L2 (the 512 option is probably not popular though).

Probably the biggest contributors to the 75% number are the lack of OoO and reduced pairing capability.
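As a back-of-envelope check on how a peak IPC ratio relates to a realized one, here is a simple additive CPI model: base cycles per instruction plus branch-mispredict and memory-stall cycles. Every number in it is a made-up placeholder rather than a measurement of A7 or A9, and the model assumes stalls simply add on top of the base CPI, which ignores whatever overlap OoO provides:

```python
def realized_ipc(peak_ipc, branch_frac, mispredict_rate, mispredict_penalty,
                 miss_per_instr, miss_penalty):
    """Additive CPI model: base CPI plus branch and memory stall CPI."""
    base_cpi = 1.0 / peak_ipc
    branch_cpi = branch_frac * mispredict_rate * mispredict_penalty
    memory_cpi = miss_per_instr * miss_penalty
    return 1.0 / (base_cpi + branch_cpi + memory_cpi)


# Hypothetical stall behaviour, deliberately identical for both cores.
STALLS = dict(branch_frac=0.2, mispredict_rate=0.05, mispredict_penalty=10,
              miss_per_instr=0.01, miss_penalty=50)

core_a = realized_ipc(peak_ipc=1.0, **STALLS)    # reference core
core_b = realized_ipc(peak_ipc=0.75, **STALLS)   # core with 75% of the peak

print(f"peak ratio:     {0.75 / 1.0:.2f}")
print(f"realized ratio: {core_b / core_a:.2f}")
```

With identical stall terms the realized ratio comes out closer to 1 than the peak ratio, because the stall cycles dilute the base CPI difference. To get a realized gap wider than the peak gap, the slower core would need noticeably worse mispredict or miss behaviour, which is exactly what the branch and memory comparison above is about.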

For me, the story will be about performance and battery life in the apps I use.

A7 is pretty slow at 1.2GHz in my HTC Sensation on Jelly Bean when I am running encryption and browsing on WiFi in a full (non-mobile) browser.

I would be interested in a dual core phone with an upgraded camera and a standby battery life of more than 3 days with a talk time of more than 4 hours.

Quad core doesn't really interest me, there's no app support for that kind of threading.

That's a Qualcomm Scorpion, not a Cortex-A7. Those two processors have nothing to do with each other.

So it is peak IPC then.

Do you have a better number? For realized IPC I mean.

You do yourself a disservice taking his claim at face value with zero explanation or reference.

ARM has web benchmark numbers which show A7 slightly outperforming A8 at same clock (in Dhrystone the situation is opposite, A8 slightly outperforms A7) - this isn't surprising considering the smaller pipeline and automatic prefetching. So the situation in so-called realized IPC probably favors A7 vs A8. It's easy to see A9 gets around 20-30% better IPC than A8 in typical real world programs if you don't involve hand optimized NEON code.
 

pm

Elite Member Mobile Devices
Jan 25, 2000
7,419
22
81
42Pfft.

Regarding the thread in general, are the chips going to seriously be that small? Or is there something not being considered here?

Are these just SoC/BGA variations, rather than PGA, LGA?

These are the sizes of the individual cores, not the sizes of the SoC ICs themselves. I was confused at first too.
 

Acanthus

Lifer
Aug 28, 2001
19,915
2
76
ostif.org
That's a Qualcomm Scorpion, not a Cortex-A7. Those two processors have nothing to do with each other.

ARM has web benchmark numbers which show A7 slightly outperforming A8 at same clock (in Dhrystone the situation is opposite, A8 slightly outperforms A7) - this isn't surprising considering the smaller pipeline and automatic prefetching. So the situation in so-called realized IPC probably favors A7 vs A8. It's easy to see A9 gets around 20-30% better IPC than A8 in typical real world programs if you don't involve hand optimized NEON code.

A dual-core A8 has similar performance to a dual-core A7.

Hence my post about it being sluggish in what I use it for.
 

MightyMalus

Senior member
Jan 3, 2013
292
0
0
What wlee15 is trying to point out is that Scorpion (and also Krait) are custom chips.
The only things similar to ARM's chips are some parts of the architecture, which for Scorpion is a mix of A8 and A9.
 

mrmt

Diamond Member
Aug 18, 2012
3,974
0
76
I don't see the problem with the 75% A9 IPC number. Two things are brought up, branches and memory stalls, so let's look at both..

Only 25% penalty for being in-order and not being able to issue the same number of instructions per clock?
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,596
136
I guess for most phone users, web browsing is where speed is needed most - it's where you can really feel your old A11 variant lagging.

If ARM's claim holds that the improved memory subsystem gives a better web browsing experience (e.g. a 256-entry TLB vs. the standard 128-entry TLB for A9), I think most users will not be able to tell an A9 from an A7 in a phone.

http://www.arm.com/products/processors/cortex-a/cortex-a7.php
"Increased TLB size (256 entry, up from 128 entry for Cortex-A9 and Cortex-A5): Increases performance for large workloads like web browsing"

If that is true, then this 0.45mm² core is able to saturate a huge part of the market, even beyond the $100 low-end market.
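For a rough sense of what the 256 vs. 128 entry figures in ARM's quote mean, TLB reach is just entries times page size. A quick sketch, assuming 4 KB pages (an assumption on my part; larger page sizes would scale the numbers up):

```python
PAGE_SIZE_BYTES = 4 * 1024  # assumption: 4 KB pages


def tlb_reach_bytes(entries, page_size=PAGE_SIZE_BYTES):
    """Memory covered by a fully populated TLB without a page-table walk."""
    return entries * page_size


for name, entries in (("128-entry TLB", 128), ("256-entry TLB", 256)):
    print(f"{name}: {tlb_reach_bytes(entries) // 1024} KB covered")
```

That works out to 512 KB vs. 1024 KB of address space reachable without a page-table walk, so a working set between those two sizes is where the larger TLB would be expected to help most.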
 

djgandy

Member
Nov 2, 2012
78
0
0
I guess for most phone users, web browsing is where speed is needed most - it's where you can really feel your old A11 variant lagging.

If ARM's claim holds that the improved memory subsystem gives a better web browsing experience (e.g. a 256-entry TLB vs. the standard 128-entry TLB for A9), I think most users will not be able to tell an A9 from an A7 in a phone.

http://www.arm.com/products/processors/cortex-a/cortex-a7.php
"Increased TLB size (256 entry, up from 128 entry for Cortex-A9 and Cortex-A5): Increases performance for large workloads like web browsing"

If that is true, then this 0.45mm² core is able to saturate a huge part of the market, even beyond the $100 low-end market.

By the time you factor in the whole SoC, going to 0.45mm² is often quite pointless for area saving alone. At these sizes, area cost is really not that important. You need to pick the product that creates the best package. The A7 is there to keep power usage low. I don't see it as a product that allows much cheaper devices.

Cheaper devices come from the ability to put more on the SoC. Only having to manufacture a single chip has huge cost advantages. Plus a lot of stuff is just "good enough" so ends up being shrunk and shrunk over time (video decode for example)

Even then though, the physical SoC itself (excluding R&D) is a tiny cost compared to all the other parts of mobile devices.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
Only 25% penalty for being in-order and not being able to issue the same number of instructions per clock?

And why not?

OoO isn't a binary quality, it's a continuum and even consists of many different techniques. If you do the bare minimum amount of reordering you won't get a fantastic performance improvement, especially if your compilers are any good. A9's OoO isn't that aggressive.

Saying that it can't issue the same number of instructions per clock is an unfair generalization. Both CPUs have 64-bit fetch (2-4 instructions depending if in ARM or Thumb-2) and 2-wide decoders. The difference is that A7 doesn't have the same co-issue restrictions A8 or A9 does (ie different execution unit layout).

ARM hasn't been specific on what this means exactly. All we know is that there are four execution units and a "dual issue" unit. I'm guessing it can't do things like pair memory, branches, and multiplies together. That dual issue unit is probably capable of some if not all ALU instructions (for instance, may not be capable of shifts).

Qualcomm MSM8260 Dual-Core 1.2Ghz Cortex A8 / Scorpion CPU with Adreno 220 GPU

http://www.mobiletechworld.com/2011/04/12/htc-sensation-hardware-specifications/

That site is wrong. Scorpion isn't Cortex-A8, it's a totally unrelated CPU. There has been a lot of misinformation about this from the start. Performance is similar though. I can't say how similar because it's not something people looked into very carefully.

I would be doing myself a disservice if I didn't ask him to expand on the matter, which is all I was doing there.

You were giving him the benefit of the doubt that his claim that 75% IPC only applies in a theoretical situation was actually true. You weren't asking him to expand on it, you were asking him what the real IPC is. Of course maybe you were just exposing that he has no references and nothing to pull an answer from...