From the link you posted:
" After 30 minutes of full CPU loading, we found that the core temperatures were held below 100 C, the clock speeds remained at 3.2 GHz for the CPU (the GPU was 'idling' at 200 MHz) and there was no thermal throttling to be seen. At this point, we introduced Furmark loading into the picture. It becomes clear that the system gives preference to CPU performance. >>The GPU remains throttled at 200 MHz, while the CPU cores don't thermally throttle<<."
See? Base clock is base clock for a reason, at least in Intel processors. That is more than we can say of AMD processors.
BTW take a look at that screenshot - a 65W TDP CPU is reporting a 77W Package Power usage under combined load, while CPU load alone is enough to reach 67W.
Bear in mind that, in some circumstances, you won't be spending more for an iGPU. Consider what most people recommend for a budget-to-midrange machine nowadays: a Haswell i5 on LGA1150. The cheapest is the i5-4440 (so far as I can tell), which is around $170. In contrast, you can grab an A10-7850K for ~$135-$140, depending on where and when you buy it. If DX12 shows up big for iGPUs and the 7850K starts consistently beating i5s in new games while running a 290X or 380X or whatever, then you haven't spent extra money buying the better DX12 gaming CPU.
But Kaveri will never consistently beat an i5 in gaming. Locked i5s have more throughput and single-threaded performance. At best the two will be GPU-limited and equal, but Kaveri will never be ahead.
I think the assumption was that Kaveri will pull ahead because of GPU compute on the iGPU.
Remains to be seen though. I remain skeptical.
But the i5 also has a pretty good IGP, and I would hope that would get used too. (If DX12 somehow magically made developers care about mixed-GPU game engines, which I kind of doubt it will.)
The 32-bit fp and integer throughput of Kaveri's iGPU alone is higher than the 32-bit fp and integer throughput of an i5's Haswell cores and its iGPU combined.
Case in point:
http://www.anandtech.com/show/6993/intel-iris-pro-5200-graphics-review-core-i74950hq-tested/2
The HD4600 has a peak theoretical throughput of 432 GFlops (presumably in 32-bit). I have an A10-7700K with only 384 shaders pushing 786.6 GFlops of 32-bit fp @ 1028 MHz. Four Haswell cores are not going to make up that deficit, HT or no. Also, the HD4600 on a locked i5 is not going to put out 432 GFlops consistently, since actual products with that iGPU have a base clock of only 350 MHz (1.2 GHz turbo, but how long is it going to stay at the turbo clock under full utilization?).
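For anyone who wants to sanity-check those peak figures, here's the usual back-of-the-envelope math (theoretical peaks only; the shader counts and clocks are the ones being discussed above, and the 20 EUs x 16 FLOPs/clock figure for HD4600 is the standard Gen7.5 GT2 arithmetic):

```python
# Peak theoretical single-precision GFLOPS = FLOPs issued per clock x clock (GHz).
# Theoretical peaks only - neither chip sustains these numbers in real workloads.

def peak_sp_gflops(flops_per_clock: float, clock_ghz: float) -> float:
    return flops_per_clock * clock_ghz

# GCN-style iGPU: each shader does one FMA (2 FLOPs) per clock.
# 384 shaders x 2 FLOPs = 768 FLOPs/clock, at the ~1.03 GHz quoted above:
print(peak_sp_gflops(384 * 2, 1.028))   # ~789 GFLOPS, in line with the 786.6 quoted

# HD4600 (Haswell GT2): 20 EUs x 16 FLOPs/clock = 320 FLOPs/clock.
print(peak_sp_gflops(320, 1.35))        # 432 GFLOPS at its maximum turbo clock
print(peak_sp_gflops(320, 0.35))        # 112 GFLOPS if it were pinned at the 350 MHz base
```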
So long as the API can help the programmer keep the iGPU pegged at 100% utilization doing physics, handling loops, and whatever stuff can be offloaded to the iGPU (which is more stuff than can be thrown at a dGPU, thanks to latency), Kaveri will always win. What remains to be seen is how such usage will affect actual framerates. If the game in question makes only intermittent use of the iGPU, it may make for a less-than-smooth experience.
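To put the latency argument in concrete terms, here's a toy model of the offload trade-off (every number in it is hypothetical; it just shows why short-lived tasks can make sense on an iGPU sharing system memory even when a dGPU would crunch the kernel itself faster):

```python
# Toy model: cost of offloading one task = dispatch/transfer overhead + kernel time.
# All numbers are made up for illustration; the point is the crossover, not the values.

def offload_cost_us(kernel_us: float, overhead_us: float) -> float:
    return overhead_us + kernel_us

IGPU_OVERHEAD_US = 5.0    # hypothetical: shared memory, cheap dispatch, no PCIe copy
DGPU_OVERHEAD_US = 50.0   # hypothetical: PCIe copies in/out plus launch latency
IGPU_SLOWDOWN = 2.0       # hypothetical: assume the iGPU runs the kernel 2x slower

for kernel_us in (10, 50, 200, 1000):
    igpu = offload_cost_us(kernel_us * IGPU_SLOWDOWN, IGPU_OVERHEAD_US)
    dgpu = offload_cost_us(kernel_us, DGPU_OVERHEAD_US)
    winner = "iGPU" if igpu < dgpu else "dGPU"
    print(f"{kernel_us:>5} us kernel: iGPU {igpu:>7.1f} us, dGPU {dgpu:>7.1f} us -> {winner}")
```

With those made-up numbers the iGPU only wins on very short tasks, which is exactly the class of work (physics sub-steps, small loops) being described above.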
a) throttling of any sort in a power virus scenario isn't something i'm much concerned with. thermal throttling under real world usage is something that i might be concerned with. furmark doesn't even get you fake internet points.
Theoretical maximum throughput doesn't translate into real world performance.
For a start, both Intel and AMD APUs will face the same dual-channel DDR3 bottleneck - AMD is rated for higher-performance DIMMs, but Intel's memory controller is more efficient and is backed up by a big fat L3 cache shared between GPU and CPU.
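For scale, peak dual-channel DDR3 bandwidth is just the transfer rate times 8 bytes per 64-bit channel times the channel count; a quick sketch with the usual speed grades:

```python
# Theoretical peak bandwidth of a dual-channel DDR3 setup:
# transfer rate (MT/s) x 8 bytes per transfer (64-bit channel) x channels.

def ddr3_peak_gbs(mt_per_s: int, channels: int = 2) -> float:
    return mt_per_s * 8 * channels / 1000.0

for speed in (1600, 1866, 2133, 2400):
    print(f"DDR3-{speed} dual channel: {ddr3_peak_gbs(speed):.1f} GB/s")
# 25.6-38.4 GB/s total, shared between the CPU cores and the iGPU on either platform.
```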
And of course as soon as you start seriously loading that Kaveri GPU, the CPU clock is going to plummet and affect overall performance.
But yeah, I do hope someone gets round to implementing GPU physics on the iGPU. Almost every modern gaming build has a very competent parallel processor sitting idle on the APU, and it's a real shame.
Speaking of OEMs, they seem to lack the creativity (or just be too dumb) to make decent AMD laptops. The only models out there seem to be running ultra-low-end E1 chips and the like, saddled with too little memory (2GB) and 5400rpm HDDs. These barely enter the "usable" category, which isn't exactly good for AMD's reputation (especially when you can get a Haswell Celeron (or at least an Atom "Celeron") for maybe €50 more with much more decent performance). For example, they could translate the savings of using a cheap AMD APU into ditching the (slow!) 320-500GB HDD they always use in favour of a 128GB SSD. Even a low-end, cheapo SSD would make such a system feel FAR faster than a comparably priced Intel system for general usage. Plus, most users simply don't need that amount of storage - or if they do, they have an external HDD by now.
This, I am totally in favor of SSD for lower-end laptops. For many people, it's not about the space, it's about the performance. It's funny, too: a decent 120GB SSD ($60?) isn't very different in price from a 500GB or 1TB laptop HDD ($50-60) in terms of component costs in the consumer upgrade market. Either OEMs are getting insanely sweet deals on magnetic storage, or they're just stupid. Surely they could get some OEM Kingston V300 equivalent (with the cheaper async NAND) for cheap if they bought millions.
Assuming you're referring to the P5-state Kaveri throttling under iGPU load, bear in mind that nearly any significant iGPU load will do this, even if the iGPU isn't necessarily the bottleneck and even if thermals aren't a big deal. In Windows, Kaveri's CPU cores throttle down to the P5 state in any 3DMark, any Unigine test... you could probably get it to throttle running GLQuake (lulz). It doesn't take a power virus to make this behavior emerge.
So you could actually have an older game (or a newer game at low res with low details) that is actually CPU-limited suffer CPU throttling because the iGPU is seeing some non-trivial amount of use. It can also affect 3DMark physics scores, which are CPU-dependent.
it's not lack of creativity, it's market segmentation. the OEMs don't make much money on <$500 laptops. so, people who are in the know enough to want an SSD have to pay a premium for it and step up to something that's closer to (or over) $1000. way more margin on that. it's just like premium screens or anything else. see: the airline industry.
It's all about either gaming (high budget) or media/office (low budget).
APUs are not good enough at gaming, or better put, they are not a good bargain for the level of gaming they provide. You can buy a dGPU + CPU for the same amount of money and have similar to slightly better performance.
Depends. Haswell's iGPU has access to L3. The iGPU will run at full turbo indefinitely on desktop chips on the standard voltage line. Not to mention that GFLOPS are not comparable and Kaveri is bandwidth-limited.
what was the review on steam for 3dmark? worst pay to win game ever?
regardless, that would be sub-optimal behavior. it'd be fantastic if it could monitor instantaneous frame rates to determine where the bottleneck is to dynamically reassign power consumption. but that might require very tight integration between drivers, operating system, and processor firmware. it's something apple could accomplish being vertically integrated. amd? not so much.
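a rough sketch of what that kind of control loop could look like (entirely hypothetical -- the frame-time and power-split hooks below are made-up stand-ins, since the real thing would need driver/firmware plumbing that doesn't exist today):

```python
# purely hypothetical sketch of the "watch frame times, shift the power budget" idea.
# read_frame_time_ms() and set_power_split() are made-up stand-ins (here they just
# simulate/print); a real version would need driver and firmware hooks.

import random

def read_frame_time_ms() -> float:
    # stand-in: pretend to sample the last frame's render time
    return random.uniform(12.0, 25.0)

def set_power_split(cpu_share: float) -> None:
    # stand-in: pretend to bias the shared package power budget toward the CPU
    print(f"cpu share of package power budget -> {cpu_share:.2f}")

def control_loop(iterations: int = 20, target_ms: float = 16.7, cpu_share: float = 0.5) -> None:
    for _ in range(iterations):
        if read_frame_time_ms() > target_ms:
            # missing the frame-time target: nudge budget toward the assumed bottleneck
            # (a real controller would need per-engine utilization data to pick a direction)
            cpu_share = min(0.8, cpu_share + 0.05)
        else:
            cpu_share = max(0.2, cpu_share - 0.05)
        set_power_split(cpu_share)

control_loop()
```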
Kaveri is only bandwidth-limited in games; GPGPU doesn't need high memory bandwidth in the majority of applications.
http://forums.anandtech.com/showthread.php?t=2394878
For $90 (A8-7600), what CPU + dGPU can you buy that will offer the same or better performance (both CPU and iGPU)?
Yes, you'll never get 432 GFlops out of HD4600. At least in some benchmarks, I can get 786 GFlops out of Kaveri.
Ah, now you are getting into murkier territory. Can HD4600 make use of the CPU cache hierarchy? And will many (or any) of these compute tasks being passed off to the iGPU be bandwidth-sensitive? Some of the Luxmark testing done by AtenRa indicates that, for that application (rendering using the Luxrender core), it is somewhat sensitive to memory bandwidth, but not by all that much (even in the CPU + GPU render testing, where the CPU and iGPU should theoretically be fighting one another over a shared pool of bandwidth). In contrast, GPUPI ignores memory bandwidth within realistic boundaries (no difference on iGPU performance between DDR3-1600 and DDR3-2400 on Kaveri).
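One way to frame the bandwidth question is roofline-style: compare a kernel's arithmetic intensity (FLOPs per byte of memory traffic) to the machine balance (peak GFLOPS over peak GB/s). A rough sketch with illustrative round numbers, not measurements:

```python
# Roofline-style check: a kernel is bandwidth-bound if its arithmetic intensity
# (FLOPs per byte of memory traffic) falls below the machine balance
# (peak GFLOPS / peak GB/s). All figures below are illustrative round numbers.

def machine_balance(peak_gflops: float, peak_gbs: float) -> float:
    return peak_gflops / peak_gbs  # FLOPs available per byte of bandwidth

kaveri_balance = machine_balance(737.0, 34.1)   # illustrative Kaveri peak vs DDR3-2133 dual channel
print(f"machine balance: ~{kaveri_balance:.1f} FLOPs per byte")

# Example kernels:
# - streaming vector add (a[i] = b[i] + c[i]): ~1 FLOP per 12 bytes moved
#   -> far below the balance point, so faster RAM helps directly (bandwidth-bound)
# - a compute-heavy kernel like GPUPI or a blocked matrix multiply reuses data
#   from registers/cache, so its FLOPs/byte sits well above the balance point
#   -> consistent with DDR3-1600 vs DDR3-2400 making no difference there
for name, intensity in (("vector add", 1 / 12), ("blocked matmul", 60.0)):
    bound = "bandwidth-bound" if intensity < kaveri_balance else "compute-bound"
    print(f"{name}: {intensity:.2f} FLOPs/byte -> {bound}")
```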
Only in Windows, and only if the end-user isn't using amdmsrtweaker to stop that behavior.
It would be a boon for Intel and AMD processors, especially once Gen8/Gen9 comes into ubiquity. We've had GPU physics for a while.
Like I said, GAMING. No run-of-the-mill user is going to care about CPU + iGPU computing; people care about either gaming or office/media.
This is not a turbo; they have two states, one for low usage and one for high usage. It can run all day at the full clock (just like any dGPU). More than that, you can even overclock the iGPU on any Haswell and it will still drop to the low clock when there is no need for more.
More to the point, iGPU computing will not catch on because its usefulness is just as limited as that of many cores in general; people just don't care about Luxmark or whatever fringe software can utilize it.
AIDA64 CPU benchmarks are heavily optimized for Haswell – and all other modern CPU architectures – and they utilize all available instruction set extensions, such as SSE, AVX, AVX2, FMA or XOP, as well as full vectorization.
Using FMA and AVX2, a quad-core Haswell's x86/x64 part can indeed perform exceptionally, way better than the GT2 iGPU. It is, however, much easier to write such optimized code for the iGPU in OpenCL than for the CPU using a machine code generator or x86/x64 assembly.
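For context on how the x86/x64 side gets there: Haswell has two 256-bit FMA units per core, so with AVX2+FMA code it can retire 32 single-precision FLOPs per core per clock, which puts a quad-core in the same ballpark as these iGPUs on paper (theoretical peak only, and only with exactly the kind of hand-optimized code described above):

```python
# Theoretical peak single-precision FLOPs for a Haswell CPU using AVX2 + FMA:
# 2 FMA units per core x 8 SP lanes (256-bit) x 2 FLOPs per FMA = 32 FLOPs/clock/core.

def haswell_cpu_peak_gflops(cores: int, clock_ghz: float) -> float:
    flops_per_clock_per_core = 2 * 8 * 2
    return cores * flops_per_clock_per_core * clock_ghz

print(haswell_cpu_peak_gflops(4, 3.1))  # ~397 GFLOPS for a 4-core i5 around 3.1 GHz
print(haswell_cpu_peak_gflops(4, 3.9))  # ~499 GFLOPS for a 4790K-class part near 3.9 GHz
```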
Kaveri uses some silly Application Power Management and adjusts CPU and GPU clocks according to workload and bottleneck. But I'm old school and would love to set my clocks to a certain value and keep them there, stable. More direct control over overclocking. I guess the new method caters more towards casual users though.
1) Yes, the GPU shares the LLC with the CPU (though not the L2 or L1 private caches, obviously). It also shares the eDRAM on higher end parts, though the i5 we were talking about does not share it.
2) Different GPGPU workloads will be more or less sensitive to memory bandwidth. Some of them will just hammer the ALUs on tiny datasets, whereas others will spend most of their time streaming data in and out. It depends on what kernel you are running.
3) If the end user has to hack their APU's firmware to get it running well, that's not a good sign. Heck, we might as well start overclocking at that point. Do we know that Kaveri even stays within its power window with that "fix"?
It completely depends on your application. What you demonstrated is just that most "GPGPU" benchmarks are poor tests of the memory subsystem. There are plenty of GPGPU apps which rely massively on the memory subsystem.