From the link you posted:
" After 30 minutes of full CPU loading, we found that the core temperatures were held below 100 C, the clock speeds remained at 3.2 GHz for the CPU (the GPU was 'idling' at 200 MHz) and there was no thermal throttling to be seen. At this point, we introduced Furmark loading into the picture. It becomes clear that the system gives preference to CPU performance. >>The GPU remains throttled at 200 MHz, while the CPU cores don't thermally throttle<<."
See? Base clock is base clock for a reason, at least in Intel processors. That is more than we can say of AMD processors.
BTW take a look at that screenshot - a 65W TDP CPU is reporting a 77W Package Power usage under combined load, while CPU load alone is enough to reach 67W.
Bear in mind that, in some circumstances, you won't be spending more for an iGPU. Consider what most people recommend for a budget-to-midrange machine nowadays: a Haswell i5 on LGA1150. The cheapest is the i5-4440 (so far as I can tell), which is around $170. In contrast, you can grab an A10-7850K for ~$135-$140, depending on where and when you buy it. If DX12 shows up big for iGPUs and the 7850K starts consistently beating i5s in new games while running a 290X or 380X or whatever, then you haven't spent extra money buying the better DX12 gaming CPU.
But Kaveri will never consistently beat an i5 in gaming. Locked i5s have more throughput and single-threaded performance. At best the two will be GPU-limited and equal, but Kaveri will never be ahead.
I think the assumption was that Kaveri will pull ahead because of GPU compute on the iGPU.
Remains to be seen though. I remain skeptical.
But the i5 also has a pretty good IGP, and I would hope that would get used too. (If DX12 somehow magically made developers care about mixed-GPU game engines, which I kind of doubt it will.)
The 32-bit fp and integer throughput of Kaveri's iGPU alone is higher than the 32-bit fp and integer throughput of an i5's Haswell cores and its iGPU combined.
Case in point:
http://www.anandtech.com/show/6993/intel-iris-pro-5200-graphics-review-core-i74950hq-tested/2
The HD4600 has a peak theoretical throughput of 432 GFlops (presumably in 32-bit). I have an A10-7700K with only 384 shaders pushing 786.6 GFlops of 32-bit fp @ 1028 MHz. Four Haswell cores are not going to make up that deficit, HT or no. Also, the HD4600 on a locked i5 is not going to put out 432 GFlops consistently, since actual products with that iGPU have a base clock of only 350 MHz (1.2 GHz turbo, but how long is it going to stay at the turbo clock under full utilization?).
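For anyone who wants to sanity-check those peak figures, here's the usual back-of-the-envelope math (theoretical peaks only; the shader counts and clocks are the ones being discussed above, and the 20 EUs x 16 FLOPs/clock figure for HD4600 is the standard Gen7.5 GT2 arithmetic):

```python
# Peak theoretical single-precision GFLOPS = FLOPs issued per clock x clock (GHz).
# Theoretical peaks only - neither chip sustains these numbers in real workloads.

def peak_sp_gflops(flops_per_clock: float, clock_ghz: float) -> float:
    return flops_per_clock * clock_ghz

# GCN-style iGPU: each shader does one FMA (2 FLOPs) per clock.
# 384 shaders x 2 FLOPs = 768 FLOPs/clock, at the ~1.03 GHz quoted above:
print(peak_sp_gflops(384 * 2, 1.028))   # ~789 GFLOPS, in line with the 786.6 quoted

# HD4600 (Haswell GT2): 20 EUs x 16 FLOPs/clock = 320 FLOPs/clock.
print(peak_sp_gflops(320, 1.35))        # 432 GFLOPS at its maximum turbo clock
print(peak_sp_gflops(320, 0.35))        # 112 GFLOPS if it were pinned at the 350 MHz base
```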
So long as the API can help the programmer keep the iGPU pegged at 100% utilization doing physics, handling loops, and whatever stuff can be offloaded to the iGPU (which is more stuff than can be thrown at a dGPU, thanks to latency), Kaveri will always win. What remains to be seen is how such usage will affect actual framerates. If the game in question makes only intermittent use of the iGPU, it may make for a less-than-smooth experience.
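To put the latency argument in concrete terms, here's a toy model of the offload trade-off (every number in it is hypothetical; it just shows why short-lived tasks can make sense on an iGPU sharing system memory even when a dGPU would crunch the kernel itself faster):

```python
# Toy model: cost of offloading one task = dispatch/transfer overhead + kernel time.
# All numbers are made up for illustration; the point is the crossover, not the values.

def offload_cost_us(kernel_us: float, overhead_us: float) -> float:
    return overhead_us + kernel_us

IGPU_OVERHEAD_US = 5.0    # hypothetical: shared memory, cheap dispatch, no PCIe copy
DGPU_OVERHEAD_US = 50.0   # hypothetical: PCIe copies in/out plus launch latency
IGPU_SLOWDOWN = 2.0       # hypothetical: assume the iGPU runs the kernel 2x slower

for kernel_us in (10, 50, 200, 1000):
    igpu = offload_cost_us(kernel_us * IGPU_SLOWDOWN, IGPU_OVERHEAD_US)
    dgpu = offload_cost_us(kernel_us, DGPU_OVERHEAD_US)
    winner = "iGPU" if igpu < dgpu else "dGPU"
    print(f"{kernel_us:>5} us kernel: iGPU {igpu:>7.1f} us, dGPU {dgpu:>7.1f} us -> {winner}")
```

With those made-up numbers the iGPU only wins on very short tasks, which is exactly the class of work (physics sub-steps, small loops) being described above.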
a) throttling of any sort in a power virus scenario isn't something i'm much concerned with. thermal throttling under real world usage is something that i might be concerned with. furmark doesn't even get you fake internet points.
Theoretical maximum throughput doesn't translate into real world performance.
For a start, both Intel and AMD APUs will face the same dual-channel DDR3 bottleneck - AMD is rated for higher-performance DIMMs, but Intel's memory controller is more efficient and is backed up by a big fat L3 cache shared between GPU and CPU.
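For scale, peak dual-channel DDR3 bandwidth is just the transfer rate times 8 bytes per 64-bit channel times the channel count; a quick sketch with the usual speed grades:

```python
# Theoretical peak bandwidth of a dual-channel DDR3 setup:
# transfer rate (MT/s) x 8 bytes per transfer (64-bit channel) x channels.

def ddr3_peak_gbs(mt_per_s: int, channels: int = 2) -> float:
    return mt_per_s * 8 * channels / 1000.0

for speed in (1600, 1866, 2133, 2400):
    print(f"DDR3-{speed} dual channel: {ddr3_peak_gbs(speed):.1f} GB/s")
# 25.6-38.4 GB/s total, shared between the CPU cores and the iGPU on either platform.
```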
And of course as soon as you start seriously loading that Kaveri GPU, the CPU clock is going to plummet and affect overall performance.
But yeah, I do hope someone gets round to implementing GPU physics on the iGPU. Almost every modern gaming build has a very competent parallel processor sitting idle on the APU, and it's a real shame.
Speaking of OEMs, they seem to lack the creativity (or just be too dumb) to make decent AMD laptops. The only models out there seem to be running ultra-low-end E1 chips and the like, saddled with too little memory (2GB) and 5400rpm HDDs. These barely enter the "usable" category, which isn't exactly good for AMD's reputation (especially when you can get a Haswell Celeron (or at least an Atom "Celeron") for maybe €50 more with much more decent performance). For example, they could translate the savings of using a cheap AMD APU into ditching the (slow!) 320-500GB HDD they always use in favour of a 128GB SSD. Even a low-end, cheapo SSD would make such a system feel FAR faster than a comparably priced Intel system for general usage. Plus, most users simply don't need that amount of storage - or if they do, they have an external HDD by now.
This, I am totally in favor of SSD for lower-end laptops. For many people, it's not about the space, it's about the performance. It's funny, too: a decent 120GB SSD ($60?) isn't very different in price from a 500GB or 1TB laptop HDD ($50-60) in terms of component costs in the consumer upgrade market. Either OEMs are getting insanely sweet deals on magnetic storage, or they're just stupid. Surely they could get some OEM Kingston V300 equivalent (with the cheaper async NAND) for cheap if they bought millions.
Assuming you're referring to the P5-state Kaveri throttling under iGPU load, bear in mind that nearly any significant iGPU load will do this, even if the iGPU isn't necessarily the bottleneck and even if thermals aren't a big deal. In Windows, Kaveri's CPU cores throttle down to the P5 state in any 3DMark, any Unigine test... you could probably get it to throttle running GLQuake (lulz). It doesn't take a power virus to make this behavior emerge.
So you could actually have an older game (or a newer game at low res with low details) that is actually CPU-limited suffer CPU throttling because the iGPU is seeing some non-trivial amount of use. It can also affect 3DMark physics scores, which are CPU-dependent.
it's not lack of creativity, it's market segmentation. the OEMs don't make much money on <$500 laptops. so, people who are in the know enough to want an SSD have to pay a premium for it and step up to something that's closer to (or over) $1000. way more margin on that. it's just like premium screens or anything else. see: the airline industry.
It's all about either gaming (high budget) or media/office (low budget).
APUs are not good enough at gaming, or better put, they are not a good bargain for the level of gaming they provide. You can buy a dGPU + CPU for the same amount of money and have similar to slightly better performance.
Depends. Haswell's iGPU has access to L3. The iGPU will run at full turbo indefinitely on desktop chips on the standard voltage line. Not to mention that GFLOPS are not comparable and Kaveri is bandwidth-limited.
what was the review on steam for 3dmark? worst pay to win game ever?
regardless, that would be sub-optimal behavior. it'd be fantastic if it could monitor instantaneous frame rates to determine where the bottleneck is to dynamically reassign power consumption. but that might require very tight integration between drivers, operating system, and processor firmware. it's something apple could accomplish being vertically integrated. amd? not so much.
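a rough sketch of what that kind of control loop could look like (entirely hypothetical -- the frame-time and power-split hooks below are made-up stand-ins, since the real thing would need driver/firmware plumbing that doesn't exist today):

```python
# purely hypothetical sketch of the "watch frame times, shift the power budget" idea.
# read_frame_time_ms() and set_power_split() are made-up stand-ins (here they just
# simulate/print); a real version would need driver and firmware hooks.

import random

def read_frame_time_ms() -> float:
    # stand-in: pretend to sample the last frame's render time
    return random.uniform(12.0, 25.0)

def set_power_split(cpu_share: float) -> None:
    # stand-in: pretend to bias the shared package power budget toward the CPU
    print(f"cpu share of package power budget -> {cpu_share:.2f}")

def control_loop(iterations: int = 20, target_ms: float = 16.7, cpu_share: float = 0.5) -> None:
    for _ in range(iterations):
        if read_frame_time_ms() > target_ms:
            # missing the frame-time target: nudge budget toward the assumed bottleneck
            # (a real controller would need per-engine utilization data to pick a direction)
            cpu_share = min(0.8, cpu_share + 0.05)
        else:
            cpu_share = max(0.2, cpu_share - 0.05)
        set_power_split(cpu_share)

control_loop()
```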
Kaveri is only bandwidth-limited in games; GPGPU doesn't need high memory bandwidth in the majority of applications.
http://forums.anandtech.com/showthread.php?t=2394878
For $90 (A8-7600), what CPU + dGPU can you buy that will offer the same or better performance (both CPU and iGPU)?
Yes, you'll never get 432 GFlops out of HD4600. At least in some benchmarks, I can get 786 GFlops out of Kaveri.
Ah, now you are getting into murkier territory. Can HD4600 make use of the CPU cache hierarchy? And will many (or any) of these compute tasks being passed off to the iGPU be bandwidth-sensitive? Some of the Luxmark testing done by AtenRa indicates that, for that application (rendering using the Luxrender core), it is somewhat sensitive to memory bandwidth, but not by all that much (even in the CPU + GPU render testing, where the CPU and iGPU should theoretically be fighting one another over a shared pool of bandwidth). In contrast, GPUPI ignores memory bandwidth within realistic boundaries (no difference on iGPU performance between DDR3-1600 and DDR3-2400 on Kaveri).
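One way to frame the bandwidth question is roofline-style: compare a kernel's arithmetic intensity (FLOPs per byte of memory traffic) to the machine balance (peak GFLOPS over peak GB/s). A rough sketch with illustrative round numbers, not measurements:

```python
# Roofline-style check: a kernel is bandwidth-bound if its arithmetic intensity
# (FLOPs per byte of memory traffic) falls below the machine balance
# (peak GFLOPS / peak GB/s). All figures below are illustrative round numbers.

def machine_balance(peak_gflops: float, peak_gbs: float) -> float:
    return peak_gflops / peak_gbs  # FLOPs available per byte of bandwidth

kaveri_balance = machine_balance(737.0, 34.1)   # illustrative Kaveri peak vs DDR3-2133 dual channel
print(f"machine balance: ~{kaveri_balance:.1f} FLOPs per byte")

# Example kernels:
# - streaming vector add (a[i] = b[i] + c[i]): ~1 FLOP per 12 bytes moved
#   -> far below the balance point, so faster RAM helps directly (bandwidth-bound)
# - a compute-heavy kernel like GPUPI or a blocked matrix multiply reuses data
#   from registers/cache, so its FLOPs/byte sits well above the balance point
#   -> consistent with DDR3-1600 vs DDR3-2400 making no difference there
for name, intensity in (("vector add", 1 / 12), ("blocked matmul", 60.0)):
    bound = "bandwidth-bound" if intensity < kaveri_balance else "compute-bound"
    print(f"{name}: {intensity:.2f} FLOPs/byte -> {bound}")
```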
Only in Windows, and only if the end-user isn't using amdmsrtweaker to stop that behavior.
It would be a boon for Intel and AMD processors, especially once Gen8/Gen9 comes into ubiquity. We've had GPU physics for a while.
Like I said, GAMING. No run-of-the-mill user is going to care about CPU + iGPU computing; people care about either gaming or office/media.
This is not a turbo; they have two states, one for low usage and one for high usage. It can run all day at the full clock (just like any dGPU). More than that, you can even overclock the iGPU on any Haswell and it will still drop to the low clock when there is no need for more.
More to the point, iGPU computing will not catch on because its usefulness is just as limited as that of many cores in general; people just don't care about Luxmark or whatever fringe software can utilize it.
AIDA64 CPU benchmarks are heavily optimized for Haswell – and all other modern CPU architectures – and they utilize all available instruction set extensions, such as SSE, AVX, AVX2, FMA or XOP, as well as full vectorization.
Using FMA and AVX2, a quad-core Haswell's x86/x64 part can indeed perform exceptionally, way better than the GT2 iGPU. It is, however, much easier to write such optimized code for the iGPU in OpenCL than for the CPU using a machine code generator or x86/x64 assembly.
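For context on how the x86/x64 side gets there: Haswell has two 256-bit FMA units per core, so with AVX2+FMA code it can retire 32 single-precision FLOPs per core per clock, which puts a quad-core in the same ballpark as these iGPUs on paper (theoretical peak only, and only with exactly the kind of hand-optimized code described above):

```python
# Theoretical peak single-precision FLOPs for a Haswell CPU using AVX2 + FMA:
# 2 FMA units per core x 8 SP lanes (256-bit) x 2 FLOPs per FMA = 32 FLOPs/clock/core.

def haswell_cpu_peak_gflops(cores: int, clock_ghz: float) -> float:
    flops_per_clock_per_core = 2 * 8 * 2
    return cores * flops_per_clock_per_core * clock_ghz

print(haswell_cpu_peak_gflops(4, 3.1))  # ~397 GFLOPS for a 4-core i5 around 3.1 GHz
print(haswell_cpu_peak_gflops(4, 3.9))  # ~499 GFLOPS for a 4790K-class part near 3.9 GHz
```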
Kaveri uses some silly Application Power Management and adjusts CPU and GPU clocks according to workload and bottleneck. But I'm old school and would love to set my clocks to a certain value and keep them there, stable. More direct control over overclocking. I guess the new method caters more towards casual users though.
1) Yes, the GPU shares the LLC with the CPU (though not the L2 or L1 private caches, obviously). It also shares the eDRAM on higher end parts, though the i5 we were talking about does not share it.
2) Different GPGPU workloads will be more or less sensitive to memory bandwidth. Some of them will just hammer the ALUs on tiny datasets, whereas others will spend most of their time streaming data in and out. It depends on what kernel you are running.
3) If the end user has to hack their APU's firmware to get it running well, that's not a good sign. Heck, we might as well start overclocking at that point. Do we know that Kaveri even stays within its power window with that "fix"?
It completely depends on your application. What you demonstrated is just that most "GPGPU" benchmarks are poor tests of the memory subsystem. There are plenty of GPGPU apps which rely massively on the memory subsystem.