I don't see why the PS4 would spur on GPGPU apps.
I'll give you tessellation seeing as how it's not implemented at all in SNB, but neither OpenCL nor 'UVD' can be counted against SNB die space. OpenCL is more a matter of a few bug fixes than actual die size, and last I checked the SNB decode was on par with both AMD and NVIDIA offerings. As for IQ, no question that it's behind there due to its lack of proper anisotropic filtering, but that's effectively fixed with IVB and again didn't affect area much as all the logic was present already, just not working properly.
It is? That must be why, according to notebookcheck.net, the 2557M in a MacBook Air gets 1360 on the 3DMark Vantage P GPU test while the 2620M in a MacBook Pro gets 1477... Oh wait, that's pretty much exactly the difference between their 1.2GHz and 1.3GHz turbo speeds.
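For what it's worth, a quick ratio check on those numbers (just plugging in the scores and turbo clocks quoted above, and assuming the two parts are otherwise comparable):

```python
# Ratio check using the scores and turbo clocks quoted above:
# the 3DMark Vantage GPU score gap tracks the GPU turbo clock gap almost 1:1.
score_2557m, score_2620m = 1360, 1477   # 3DMark Vantage P GPU scores (notebookcheck.net)
turbo_2557m, turbo_2620m = 1.2, 1.3     # peak GPU turbo clocks in GHz

print(round(score_2620m / score_2557m, 3))  # ~1.086
print(round(turbo_2620m / turbo_2557m, 3))  # ~1.083
```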
It will be interesting to see Trinity in a CrossFire setup with the IGP and a discrete GPU. Intel cannot do this.
There's a lot of room for improvement in this CPU: the high-latency cache, the longer pipeline. The L1 data cache was cut from Phenom II's 64K to 12K. The L2 and L3 caches are 40 percent slower than Phenom's, and since that's where the CPU pulls its instructions from, who's to say this CPU isn't hamstrung by 40 percent. Imagine if it had 40 percent more performance over its current state.
That's an oversimplification, and it's a 16KB L1, not 12KB.
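To put the 40 percent figure in perspective, here's a rough back-of-the-envelope sketch of how L2/L3 latency folds into average memory access time. The latencies and miss rates below are made up for illustration, not measured Bulldozer or Phenom II figures, and this only models memory access time, not overall performance:

```python
# Back-of-the-envelope sketch with illustrative (not measured) numbers:
# how a ~40% slower L2 folds into average memory access time.
def amat(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, mem_latency):
    """Average memory access time (cycles) for a simple two-level hierarchy."""
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem_latency)

baseline  = amat(l1_hit=4, l1_miss_rate=0.05, l2_hit=15, l2_miss_rate=0.20, mem_latency=150)
slower_l2 = amat(l1_hit=4, l1_miss_rate=0.05, l2_hit=21, l2_miss_rate=0.20, mem_latency=150)

print(baseline, slower_l2)                 # 6.25 vs 6.55 cycles
print(round(slower_l2 / baseline - 1, 3))  # ~0.048, i.e. roughly 5%, not 40%
```

The point is simply that a 40 percent slower cache doesn't translate directly into a 40 percent slower CPU, because most accesses never see that latency.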
And GPU-Z tells me that the HD 2000/3000 is actually 66mm^2.
And anisotropic filtering still isn't on par with AMD/Nvidia on Ivy.
Let me tell you why Ivy Bridge's AF differences won't show up in real-world scenarios.
Now the Ivy Bridge one: http://www.anandtech.com/show/5626/ivy-bridge-preview-core-i7-3770k/16 "What you won't see however is a difference, particularly with our static screenshots. When discussing the matter, AMD noted that the difference in perceived quality between the old algorithm and the new one was practically the same."
Throughput (incl. instructions) is not bound by latency. Many people, including Agner Fog, Andreas Stiller, and me, think the decoder's throughput is the main bottleneck.
For a single instruction thread, Bulldozer offers more front end bandwidth than its predecessor. The front end is wider and just as capable so this makes sense. But note what happens when we scale up core count.
Since fetch and decode hardware is shared per module, and AMD counts each module as two cores, given an equivalent number of cores the old Phenom II actually offers a higher peak instruction fetch/decode rate than the FX. The theory is obviously that the situations where you're fetch/decode bound are infrequent enough to justify the sharing of hardware. AMD is correct for the most part. Many instructions can take multiple cycles to decode, and by switching between threads each cycle the pipelined front end hardware can be more efficiently utilized. It's only in unusually bursty situations where the front end can become a limit.
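For a concrete sense of the peak numbers being compared, here's a small sketch; the decode widths used (3 instructions per cycle per Phenom II core, 4 per cycle per shared Bulldozer module front end) are the commonly cited figures and should be treated as assumptions here rather than quotes from the article:

```python
# Peak x86 decode rate per "core" per clock, under the assumed widths:
# 3/cycle per Phenom II (K10) core, 4/cycle per Bulldozer module front
# end shared between two cores.
phenom_per_core  = 3
module_decode    = 4
cores_per_module = 2

fx_per_core_shared = module_decode / cores_per_module  # 2.0 when both cores in the module are busy
fx_per_core_alone  = module_decode                     # 4 when only one core of the module is active

print(phenom_per_core, fx_per_core_shared, fx_per_core_alone)
```

So core for core, the shared front end gives each FX core a lower peak decode rate than a Phenom II core, unless only one core in the module is busy.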
Cyclos and AMD didn't go into too much detail about Piledriver, though they did say it will consist of a 4GHz+ x86-64 core built on a 32nm CMOS process.
Will they actually implement aCF? Trinity's GPU will be VLIW4 and the only other VLIW4 products are the 6970 and 6950. They would have to do VLIW4+VLIW5 CF.
AFAIK Crossfire will work with any two AMD GPUs. So long as they're similar in performance it should work fine.
How much, if at all, can this be improved, if it is the bottleneck?
Possibly up to +100% (extreme case): Agner Fog found that 256-bit AVX instructions can only be decoded at a rate of 1 per cycle, while the hardware would actually allow decoding 2.
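As a toy illustration of that extreme case, assuming a stream that is purely decode-bound and made up entirely of 256-bit AVX instructions (real code mixes instructions, so the actual gain would be smaller):

```python
# Extreme-case arithmetic: going from 1 to 2 AVX-256 instructions decoded
# per cycle halves the cycle count for an all-AVX, decode-bound stream.
instructions   = 1_000_000
cycles_current = instructions / 1   # 1 AVX-256 instruction decoded per cycle
cycles_ideal   = instructions / 2   # 2 per cycle, if the decoder allowed it

print(cycles_current / cycles_ideal - 1)   # 1.0, i.e. up to +100% throughput
```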