AtenRa said:
In a Phenom II design (Deneb) you need the front end, execution unit, L2 etc. In Bulldozer you only need the Integer Core (12% more in the Module).
First mistake, If you add 4 Deneb cores in a Phenom II Quad CPU you don’t get an 8 Core Bulldozer CPU but an 8 Core Deneb CPU. (8 Integer Execution Units + 8 FP Execution Units)
If you add 4 Deneb cores to a Quad Core Phenom II then you add 50% more die because you take 4 times the Front End and the L2 Cache. In Bulldozer you don’t add the front end, nor the L2 thought you don’t add 50%. Go it ???
Hope I was able to help
Alrighty, let's give this another shot. This isn't just for you, in case you start feeling I'm singling you out, but I'm quoting you since you've quoted me.
[Warning for all - long wall of text incoming. SKIP if no interest in the Bulldozer 5% issue]
To set the context, let's have an example everybody here understands - broadband internet.
Say you've got a 2mbps line, and a carrier is trying to sell you a 3mbps line. Sounds good. They give you a presentation, and their pamphlet or marketing slides or website tell you "2.5mbps download speed, .5mbps upload". Sounds good still. That totally kicks the ass of your current 2mbps line, right?
So, are you gonna bite? Of course not yet. They are quoting a useless metric, most likely, because all they advertise is the burst speed, and in real life you will rarely get that much. What you need is the committed information rate.
So what you do is compare the CIR of your current 2mbps line against the CIR of the new offering, and if you discover that your current line has a much better CIR, then "upgrading" is obviously uncalled for.
Of course, consumers who are unaware of such details will be puzzled when you tell them you chose a 2mbps line from provider A instead of a 3mbps line from provider B. They have no idea what burst speed and CIR are. All they see is what's in the marketing materials - 2mbps VS 3mbps. To be clear, nobody is lying to anybody - it's just that the new provider (offering 3mbps) isn't really telling you the whole picture, and is instead quoting big numbers (that are factual, but pretty much meaningless) to sell you something. It just so happens that the numbers not quoted may actually not be in your favor.
That is what's happening here when the 5% - 12% - 50% figure issue exploded. I don't see it as malice on anyone's part that they insist on 5, 12, or 50. Rather, it's a difference in the understanding of the topic based on someone's depth of knowledge on the issue - much like those aware of CIR and burst speed reaching a different conclusion (or, even if the conclusion was the same, the thought process would have been different) from those consumers who just don't really know the intricacies of broadband connections.
Now that the context is settled, let's get this out of the way:
khon said:
An 8-core BD could well be smaller than a 4-core SB, since adding the four secondary cores apparently only increases size by 5%
This is pretty much what started it, as far as my involvement in the discussion goes, and there is more here than meets the eye, none of which is the fault of khon (I want to make this clear, so that khon harbors no ill feelings).
Khon and, I am sure, a lot of other people saw the presentation, saw the quoted figures, and now have the idea planted in their mind that goes something like this: "Wow! More cores for just 5% more space!". Let's study the quote from Khon again (I can quote others as well, from here and other forums, but let's stick to what we've already got):
khon said:
An 8-core BD could well be smaller than a 4-core SB, since adding the four secondary cores apparently only increases size by 5%
Emphasis mine, of course. So AMD made a presentation, without lying (facts presented are what we call "true facts"), but in such a way that would make a totally different impression. AMD makes a statement:
int cores are just 5% of the die size (true), module wise the int cores themselves take about 12%, no biggie. And now, after that, people are:
Wow! What a breakthrough in design and engineering! Extra cores only take 5%! Damn, this is gonna be so much smaller than what they would have accomplished in a Deneb-shrink, or what Intel is going to get with SB!
See, AMD actually never mentioned any of that, but it was sort of implied, right? They didn't claim anything of the sort (only more perf/watt or perf/size or some other metric that has ALWAYS been touted - happened in Shanghai, happened in Istanbul, etc), but most people reading it just interpret it a wee bit differently, in a more favorable light.
Was AMD lying to us? Not really. Because 5% IS the size of an int core, relative to the whole die. Of course, this is just an int core, not really "one whole CPU core" the way we understand CPU cores today (which is not how it has always been, and probably not how it will always be - for the moment, if you do not understand that or why, let's leave it be and get back to it later). So, is this "5% int core" really amazing?
Let's look at Gulftown:
This is Gulftown's die map, naturally. Now, let's pretend I am a janitor for Intel, and on the day of Hot Chips, the guy supposed to make the Intel Gulftown presentation got a case of the runs. Hey, it happens. So, being the only good-looking janitor he's ever seen in Santa Clara, he asks me to make a presentation for him, to highlight how awesome our design, engineering, and industry-leading process tech is. Naturally, since I would rather not be cleaning up the clogged lavatories again, I accept the challenge.
So in the presentation, I establish how awesome our design is (note: from here on, I will mimic the AMD "style", but I am in no way mocking AMD - it is simply a necessity for educational purposes). And I start with the die map above, and then follow it up with the next 2 pictures - the same thing but with overlays:
"Hi guys, welcome to Hot Chips, and here I'd like to show you something that's really gonna be a hot chip! This is gulftown, and this will boggle your mind! From the die map earlier, here I've traced out one core, so we can see just how awesome our design is. The red rectangle is exactly the size of a single core. Let's see what happens if we overlay more of those rectangles across most of the die - next slide:"
"I've alternated the colors so the overlay blocks are easier to distinguish from each other. As you can see, we already have 10 of these blocks, and it is not quite the entire die yet! We can estimate then that the cores only take up about 6-7% of the die size! Once again, Intel's industry-leading process tech has made magic! This is totally why we are completely innocent of that whole Dell brouhaha, I mean come on, if we are this awesome, why would we bother paying off the bastards at Dell, amirite?"
Ok, end the conference, back to the real world.
Let's review the "AMD computation":
JFAMD said:
Here is the math. Start with a full die. Remove 1 integer core from each module and 1 integer schedule (everything else stays where it is). Measure the die size. Your new number is 95% of the total die.
Ok, let's do that for Gulftown, based on the pretend presentation I made earlier:
Here is the math. Start with a full Gulftown die. Remove 1 whole core, like how I shaded it in the presentation (everything else stays where it is). Measure the die size. Your new number is 93-94% of the total die.
See what I did there? (And note that a Gulftown core even has its own FPU, it's not just an "int core") Like AMD several months ago, AMD at the conference, and AMD now through JFAMD, I am also not lying. But it just doesn't mean anything, and it certainly doesn't mean that Intel only needs 7% die area to keep on adding more cores. What it does do, however, is imply awesomeness, design ingenuity, and breath-taking innovation, and make you think something like this (obviously, since we are only concerned about % of die sizes, let's ignore that one is 32nm while the other is 45nm):
adapting a quote from khon said:
An 8-core Gulftown could well be smaller than a 6-core Thuban, since adding the cores apparently only increases size by 6-7%
But it is certainly not true, because you don't just add cores to Gulftown, right? There are other parts that are not the core (which is where "uncore" came from), and increasing cores but not the uncore means performance (per core) will not remain the same (which is a bad thing), even though it may increase multi-threaded throughput as more cores are involved.
I suppose by now we should get back to Bulldozer, to end it all. And let's get back to the quote from AtenRa:
In a Phenom II design (Deneb) you need the front end, execution unit, L2 etc. In Bulldozer you only need the Integer Core (12% more in the Module).
See, this is where it gets muddy. You only need "12%" more, because AMD did the fuzzy math thing that I emulated a while ago (start with a full die, remove a core, etc).
The purpose of "pop quiz #1" was to establish that FROM A SINGLE CORE, making it a dual core means doubling everything, because it isn't just an int core you need. But when a "Bulldozer" is presented, and then the "lesser silicon" idea is hammered, it is easy to forget this, especially if you aren't really a CPU guy.
Let's look at it this way: if there was only a single core, would AMD have "invented" the module? Obviously not, right? In fact, the very purpose of the module is so that a monolithic dual core comes into existence. So "removing one int core" is actually pretty senseless - because if you remove the int core, then you also remove a lot of circuitry that is otherwise wasted. Why would you need a complex FPU that can serve two cores? Nope, don't need it, stick a normal FPU on it. Same with everything else - all are big enough and designed to be complex such as to allow servicing two int cores, so they all gotta go too.
So you can't really say: "Bulldozer, you only need to add an int core", because that's actually not true. They started with two full cores (this is probably the 6th time I have to repeat it, but right now I actually don't mind at all), die size is 100% for this normal dual core. They got the idea for "multi-threaded" efficiency, so from a normal dual core, they tweak the design until they fused the two cores together (and in the process, there was more or less a radical change in the design, such that now there are naked int cores), sharing resources such that they ended up reducing the size of the dual core. If the dual core was 100%, now they have ~75%, which means it only took them 50% of what would have been needed to go from single core to dual core. To illustrate:
1 core = 100% size
2 cores = 200% size (normal dual core)
2 BD-style cores (i.e., 1 "module") = 150%
Total space savings thanks to going the "module" design route: 50% of a single core's size (150% instead of 200%)
Therefore, cost of an additional BD core: only 50%, compared to 100% for the normal way of adding cores.
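For anyone who prefers to see the arithmetic spelled out, here is a minimal Python sketch of the numbers above. The names are made up for illustration, and the 150% module figure is the rough number used in this post, not a measured die area.

```python
# Sizes are relative to one conventional core (= 100), as in the list above.
SINGLE_CORE = 100   # reference size
NORMAL_DUAL = 200   # two full conventional cores
BD_MODULE = 150     # one two-core module (rough figure from this post)

def marginal_cost(one_core, two_cores):
    """Extra area needed for the second core, as a % of the single-core size."""
    return (two_cores - one_core) / one_core * 100

old_way = marginal_cost(SINGLE_CORE, NORMAL_DUAL)   # cost of core #2, old way
bd_way = marginal_cost(SINGLE_CORE, BD_MODULE)      # cost of core #2, module way
module_vs_dual = BD_MODULE / NORMAL_DUAL * 100      # module size vs normal dual core

print(old_way, bd_way, module_vs_dual)
```

The point of measuring from the single core is that the second core's cost includes everything that had to grow to serve two int cores, not just the int core itself.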
Of course, that's when you look at it from scratch, which is the right perspective. You see, if you start with the module being there already and you just want to add a core, you will think that it only took 12% more from the core you already had. But, if you had a module in the first place, then you already consumed more space than a normal single core. And if you already had a module but only had a single core, that means you wasted precious space (and design and engineering time) to design something for multithreaded efficiency that only had one core. It just doesn't make sense, exactly because it is the wrong perspective.
I don't know how to make it any clearer, so let's recap the "old way" of making more cores:
Old Way said:
Start with 1 core.
Core size is 100%, for reference purposes only.
Add another core to make it a dual core
Core size is now 200%.
Add another two cores, to make it a quad core
Core size is now 400%
And here is AMD's "new way", using their architected modules
AMD "Bulldozer Module" Way said:
Start with 1 core.
Core size is 100%, for reference purposes only.
Add another core to make it a dual core, but "modularize" it to become a Bulldozer
Core size is now 150%
Add another two cores to make it a quad core Bulldozer
Core size is now 300%
Clearly, there is significant die space savings. This time, core size grew only up to 300%, whereas for the old way it would have gone to 400%.
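The two recaps above boil down to a simple scaling rule, sketched here in Python. The function name and the "old"/"module" labels are hypothetical, and the 100 and 150 figures are the rough reference numbers from this post, assuming every module holds exactly two cores.

```python
def core_area(n_cores, style):
    """Total core area, in % of one conventional core.

    style="old":    every core is a full conventional core (100 each).
    style="module": cores come in pairs, one module per pair (150 each).
    """
    if style == "old":
        return n_cores * 100
    if style == "module":
        if n_cores % 2 != 0:
            raise ValueError("modules hold two cores, so n_cores must be even")
        return (n_cores // 2) * 150
    raise ValueError("style must be 'old' or 'module'")

# Print core count, old-way area, module-way area.
for n in (2, 4, 8):
    print(n, core_area(n, "old"), core_area(n, "module"))
```

Going from 4 to 8 cores doubles the core area either way. The module route just starts from a smaller base, which is where the savings come from.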
Let's get back to pop quiz #2 as answered by AtenRa:
First mistake, If you add 4 Deneb cores in a Phenom II Quad CPU you don’t get an 8 Core Bulldozer CPU but an 8 Core Deneb CPU. (8 Integer Execution Units + 8 FP Execution Units)
If you add 4 Deneb cores to a Quad Core Phenom II then you add 50% more die because you take 4 times the Front End and the L2 Cache. In Bulldozer you don’t add the front end, nor the L2 thought you don’t add 50%. Go it ???
See, it's no mistake at all. That pop quiz was to gauge the understanding of how CPUs are designed and made. If you have 4 cores, and you want to make it an octo-core, you will either have to increase the core size by 100% (old way), or 50% (Bulldozer style).
So it doesn't take AMD "5%" or "12%" to add cores. The number is more like 50%, accounting for everything else needed to make a module, relative to what you would have needed if you only had a single core to start with (think about it: would a module exist at all if there was only a single core?). So from 100% size of a single core, the dual core would then be 150%.
Where did we get this 50%? From AMD themselves, when they corrected Anand, because Anand thought from 1 core, you need 5% more and you get a dual core. Which is not true, and if you've gotten this far into my post, you probably understand now that from 1 core, you need 50% more to make a dual core - exactly what AMD told Anand as a correction.
And, of course, after consuming that 50%, you now have a module, a "Bulldozer", in which an individual int core takes up 12% of its size (but quite naturally doesn't mean it only took 12% to make a dual core).
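To see how the 12% and the 50% can both be true at once, here is a rough Python sketch. It assumes the reference numbers used in this post (single core = 100, module = 150) and takes the quoted 12% as the int core's share of the module. The variable names are made up, and the split is only illustrative, not a real die measurement.

```python
# All areas are relative to one conventional core (= 100).
SINGLE_CORE = 100
MODULE = 150             # rough module size used in this post
INT_CORE_SHARE_PCT = 12  # int core as a % of the module (the quoted figure)

# The int core itself, expressed in single-core units.
int_core_area = MODULE * INT_CORE_SHARE_PCT / 100

# Everything extra you pay to go from one core to a two-core module.
extra_for_second_core = MODULE - SINGLE_CORE

# The part of that extra that is NOT the int core: the bigger shared
# front end, FPU, L2 and so on that exist to serve two int cores.
shared_growth = extra_for_second_core - int_core_area

print(int_core_area, extra_for_second_core, shared_growth)
```

Under these assumptions the added int core accounts for only about 18 of the 50 extra units. The rest is the shared circuitry that had to be designed bigger, which is exactly what the "remove a core and measure" math leaves out.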
There we go.
In the closing words of AtenRa:
Hope I was able to help
