Originally posted by: IntelUser2000
Originally posted by: chizow
I'm not sure what you are getting at, perhaps the same thing, but my point was that cache makes up a significant portion of traditional CPUs and is the simplest transistor structure, allowing clock speeds to scale higher. Those die shots are consistent with my point, showing 40-50% of that die space being dedicated to cache.
That's very different from GPUs, which dedicate at least that much of their die space to execution units rather than cache. Translated to Larrabee, where each discrete core will be a mix of both cache and functional/execution units, that sets the expectation of clock speeds being closer to those of a GPU than to those of Intel's CPUs, especially given the estimated size and TDP of Larrabee.
There's no difference. It's very complex + simple (core vs. caches in a CPU) vs. relatively complex (GPU). GPUs have shaders, which aren't really cores.
Look at the GT200 die shot:
http://www.anandtech.com/video/showdoc.aspx?i=3334
Even from the die micrograph alone you can clearly see it has a significant amount of repetitive circuitry put together.
In Nehalem, only 50mm2 out of the 263mm2 die is L2 and L3 caches. In Beckton, out of the ~600mm2 or so die size, the 8x 256KB L2 and 24MB L3 will occupy only 160mm2.
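As a rough sanity check on those figures, here is a minimal Python sketch that turns them into fractions of the die. The numbers are the ones quoted above (the Beckton die size is only approximate); the variable and function names are just illustrative.

```python
# Die and cache areas in mm^2, taken from the figures quoted above.
# The Beckton die size ("~600mm2 or so") is an approximation.
nehalem_die, nehalem_cache = 263.0, 50.0
beckton_die, beckton_cache = 600.0, 160.0


def cache_fraction(cache_mm2, die_mm2):
    """Return the fraction of the die area occupied by cache."""
    return cache_mm2 / die_mm2


nehalem_frac = cache_fraction(nehalem_cache, nehalem_die)
beckton_frac = cache_fraction(beckton_cache, beckton_die)

print(f"Nehalem cache share: {nehalem_frac:.0%}")
print(f"Beckton cache share: {beckton_frac:.0%}")
```

With these inputs both shares come out well under a third of the die, which is noticeably lower than the 40-50% mentioned in the quote.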
When there are so many damn repetitive units, like in GPUs, the layout can be optimized for space. While it's more complex than cache, it's nowhere near as complex as the non-cache part of a CPU. In the end it's all the same.
And yes I agree that Intel has more experience with larger chips.
Oh, and about the "x86 tax". Let me briefly state what the engineer from AMD said about decoder size in their Athlon processors. "Each of the 9 decoders in the Athlon processors only take about the space of 4KB SRAM". I don't know why they stated 9 decoders when there are only 3, but the total space taken up by the decoders is equal to a mere 36KB of SRAM.
With Intel's 45nm process, SRAM takes only 6mm2 per MB of capacity. Since AMD's SRAM is less dense, let's assume that in Intel terms the "9 decoders" cited by the AMD engineer take space equal to 72KB of SRAM. Larrabee has only 2 decoders per core, versus the 3 in Athlon CPUs, so let's take that 72KB figure and cut it by 1/3, making it approximately 50KB.
Let's assume that Larrabee will have 48 cores for the 600mm2 die version. 50KB x 48 is approximately 2.4MB, which even with conservative estimates will only take up 15mm2. Inefficient circuitry placement by itself will cost more than 15mm2!!
Sure, you can call 2.5% of the die space a tax if you so wish. But that's nothing that can't be overcome by clever design.
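For anyone who wants to rerun the estimate with different guesses, the arithmetic above can be sketched in a few lines of Python. Everything here is an assumption carried over from the post (72KB SRAM-equivalent for the Athlon decoders, 2 vs. 3 decoders, 48 cores, a 600mm2 die, 6mm2 per MB), plus my own assumption of 1024KB per MB; the function name is made up for illustration.

```python
KB_PER_MB = 1024  # assumption: binary megabytes

SRAM_MM2_PER_MB = 6.0     # Intel 45nm SRAM density quoted in the post
ATHLON_DECODER_KB = 72.0  # assumed SRAM-equivalent size of the Athlon decoders
ATHLON_DECODERS = 3
LARRABEE_DECODERS = 2
CORES = 48                # assumed core count
DIE_MM2 = 600.0           # assumed die size


def decoder_area_mm2(per_core_kb, cores=CORES, mm2_per_mb=SRAM_MM2_PER_MB):
    """Estimate the total decoder area as an SRAM-equivalent area in mm^2."""
    total_mb = per_core_kb * cores / KB_PER_MB
    return total_mb * mm2_per_mb


# Scale the Athlon figure down by the decoder count: 72KB * 2 / 3 = 48KB.
per_core_kb = ATHLON_DECODER_KB * LARRABEE_DECODERS / ATHLON_DECODERS

area_exact = decoder_area_mm2(per_core_kb)  # using the unrounded 48KB
area_rounded = decoder_area_mm2(50.0)       # using the post's rounded 50KB

for label, area in (("48KB/core", area_exact), ("50KB/core", area_rounded)):
    print(f"{label}: {area:.2f} mm^2, {area / DIE_MM2:.2%} of the die")
```

Either way the result lands slightly under the 15mm2 and 2.5% used above, so those figures are, if anything, rounded up.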
Back in 1996, when the first Pentium was built on 0.35u or even 0.5u, it was a significant impairment. That's no longer true.