Global Foundries 32nm Process Status

jvroig · Jul 25, 2010

DrMrLordX said:
That being said, I am also interested in knowing what a bit slower and a bit hotter entails. If it's, oh I don't know, 100-200 less mhz and maybe 10-15W of extra TDP at launch, then big deal; they can fix that in future steppings I'm sure.

That's a relatively mild case, so yeah, not end-of-the-world big deal. I cringed when I first read it because it gave me flashbacks of Barcelona (maxing out at what, 2.6Ghz for the flagship?). But with Llano expected to be above 3Ghz, I guess this is just me being pessimistic. I can't remember the clock expectation for Bulldozer (if any were mentioned at all in the past), but it can't be lower than Llano by much.

IntelUser2000 · Jul 25, 2010

jvroig said:
That's a relatively mild case, so yeah, not end-of-the-world big deal. I cringed when I first read it because it gave me flashbacks of Barcelona (maxing out at what, 2.6Ghz for the flagship?). But with Llano expected to be above 3Ghz, I guess this is just me being pessimistic. I can't remember the clock expectation for Bulldozer (if any were mentioned at all in the past), but it can't be lower than Llano by much.

That's actually a lot. Each new generation of process tech brings ~20% increase in clock speeds OR 30% power reduction assuming everything else is same.

100-200MHz less + 10-15W also means 300-400MHz less at same TDP, or same clock speed and maybe 30-40W. That cuts down the advantage by more than half.

Little should mean lower than that.
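To put rough numbers on that, here is a minimal sketch. The 3.0 GHz baseline is an assumption chosen for illustration, not a figure from the thread; the ~20% per-node clock gain and the 300-400 MHz shortfall at equal TDP are the ones quoted in the post above.

```python
# Back-of-the-envelope check of the "more than half" claim.
# BASE_CLOCK_MHZ is a hypothetical baseline, not a figure from the thread.
BASE_CLOCK_MHZ = 3000      # assumed flagship clock on the previous node
NODE_CLOCK_GAIN = 0.20     # ~20% clock gain per process generation (from the post)


def fraction_of_gain_lost(shortfall_mhz, base_mhz=BASE_CLOCK_MHZ, gain=NODE_CLOCK_GAIN):
    """Share of the expected per-node clock gain that a given shortfall eats up."""
    expected_gain_mhz = base_mhz * gain
    return shortfall_mhz / expected_gain_mhz


if __name__ == "__main__":
    for shortfall in (300, 400):
        lost = fraction_of_gain_lost(shortfall)
        print(f"{shortfall} MHz short at the same TDP: {lost:.0%} of the node's clock gain lost")
```

Under these assumptions the shortfall comes out at roughly half to two thirds of the expected gain; a lower baseline clock would make the share larger.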

Idontcare · Jul 25, 2010

DrMrLordX said:
Oh, of course . . . K8 vs Netburst is a classic example of that. I'm more focused, first and foremost, on release date. The sooner a relatively blunder-free launch for Bulldozer occurs, the better for all of us.

That being said, I am also interested in knowing what a bit slower and a bit hotter entails. If it's, oh I don't know, 100-200 less mhz and maybe 10-15W of extra TDP at launch, then big deal; they can fix that in future steppings I'm sure.

An upward shift of this magnitude in the k-effective value for the dielectric stack across a few critical metal levels would be expected to impact clockspeed and power-consumption by nothing more than around 5% in a case like this.

I'd say your MHz impact is probably about on target but your TDP impact is about 2x too much (5-8W).
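As a quick sanity check of that "around 5%" ceiling, the sketch below applies it to a couple of hypothetical parts. The clock and TDP pairs are assumptions for illustration only; nothing in the thread states Bulldozer's actual clock or TDP.

```python
# Apply a ~5% worst-case impact to assumed clock/TDP figures.
# The (clock, TDP) pairs are hypothetical, chosen only to show the scale.
IMPACT = 0.05  # "nothing more than around 5%" (from the post)


def worst_case_impact(clock_mhz, tdp_w, impact=IMPACT):
    """Return (MHz lost, extra watts) if both clock and power shift by `impact`."""
    return clock_mhz * impact, tdp_w * impact


if __name__ == "__main__":
    for clock, tdp in ((3000, 95), (3000, 125)):
        mhz, watts = worst_case_impact(clock, tdp)
        print(f"{clock} MHz / {tdp} W part: about {mhz:.0f} MHz and {watts:.2f} W")
```

With these assumed figures the clock hit lands inside the 100-200 MHz guess, while the power hit stays in single-digit watts rather than 10-15 W.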

...some of the IPC expectations bandied about in here are a bit, uhm, silly? 100% IPC improvement in a single architecture generation? Wow.

IntelUser2000 · Jul 25, 2010

People are confusing "per thread" or "single thread" performance with "per CPU" performance.

Bulldozer addresses one of the relative weaknesses for AMD and that's lack of multi-threading technologies like SMT. Whatever they want to call it, the focus of their approach is about multi-threading.

Going from Pentium 4 Presler to Core 2 resulted in 90% improvement of IPC, but 100% improvement from Sandy Bridge? Did anyone also forget Pentium 4 clocked higher too, meaning the performance improvement wasn't 100%? :roll:

DrMrLordX · Jul 25, 2010

jvroig said:
That's a relatively mild case, so yeah, not end-of-the-world big deal. I cringed when I first read it because it gave me flashbacks of Barcelona (maxing out at what, 2.6Ghz for the flagship?). But with Llano expected to be above 3Ghz, I guess this is just me being pessimistic. I can't remember the clock expectation for Bulldozer (if any were mentioned at all in the past), but it can't be lower than Llano by much.

Your tendency towards pessimism is understandable, though, at least this time, AMD is going into Bulldozer's launch with a strong 45 nm process to back them up. We won't have another situation where AMD is forced to limp along a few extra months on products as weak as Brisbane before their delayed "next big thing" hits the market. And that is assuming that anything outside of desktop Llano will be delayed.

Also, will Llano launch at speeds as high as 3 ghz? I had heard 2.5 ghz for the desktop part, but that was only from one source.

IntelUser2000 said:
That's actually a lot. Each new generation of process tech brings ~20% increase in clock speeds OR 30% power reduction assuming everything else is same.

100-200MHz less + 10-15W also means 300-400MHz less at same TDP, or same clock speed and maybe 30-40W. That cuts down the advantage by more than half.

Little should mean lower than that.

Shew. Yeah, taken together, a drop in clocks to that extent along with that increase in TDP would be um . . . less than good. Let us hope that my guesswork is wrong.

Idontcare said:
An upward shift of this magnitude in the k-effective value for the dielectric stack across a few critical metal levels would be expected to impact clockspeed and power-consumption by nothing more than around 5% in a case like this.

I'd say your MHz impact is probably about on target but your TDP impact is about 2x too much (5-8W).

One out of two ain't bad.

...some of the IPC expectations bandied about in here are a bit, uhm, silly? 100% IPC improvement in a single architecture generation? Wow.

Hey, it'd be great if it happened, but I won't hold my breath waiting for it. That would be a bigger launch than Conroe, and Conroe was preceded by months of engineering sample leaks (which was brilliant marketing for Intel, intentional or otherwise). We have seen neither hide nor hair of Bulldozer samples in the wild making it hard to justify any estimate of Bulldozer's performance.

Well, correction, I haven't seen any ESes floating around. Somebody else might have, but that's news to me.

IntelUser2000 · Jul 25, 2010

Let's put it this way. 100% IPC improvement in AMD terms is K6-2 to Deneb. All in one generation. Not possible.

Scali · Jul 25, 2010

IntelUser2000 said:
Bulldozer addresses one of the relative weaknesses for AMD and that's lack of multi-threading technologies like SMT. Whatever they want to call it, the focus of their approach is about multi-threading.

But Bulldozer addresses only integer performance. You'll get 8 integer units per unit, but only 2 float units.
If we compare a Bulldozer unit to two cores from Intel, it's not even more. Intel has 4 ALUs per core (AMD has had 3 since K7. Intel had 2 in the PIII, now 4 since Core2).

So it will be interesting to see how well AMD can feed these units. Intel feeds 4 of their ALUs via 2 threads, using SMT. So 8 ALUs run 4 threads.
If I understood Bulldozer's units correctly, you'd get 8 ALUs with 2 threads.
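The comparison above boils down to ALUs per hardware thread. A minimal sketch follows, using only the counts as stated in this post; they are the poster's understanding at the time, not confirmed specifications, and the configuration labels are made up for the example.

```python
from fractions import Fraction

# ALU and thread counts exactly as described in the post above.
# These are the poster's understanding, not confirmed specifications.
CONFIGS = {
    "two Intel cores with SMT": (8, 4),   # 4 ALUs per core, 2 threads per core
    "one Bulldozer unit": (8, 2),         # 8 ALUs shared by 2 threads
}


def alus_per_thread(alus, threads):
    """Exact ratio of integer ALUs to hardware threads."""
    return Fraction(alus, threads)


if __name__ == "__main__":
    for name, (alus, threads) in CONFIGS.items():
        print(f"{name}: {alus_per_thread(alus, threads)} ALUs per thread")
```

On those numbers the Bulldozer unit offers twice as many ALUs per thread, which is why the question of how well a single thread can keep them fed matters.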

JFAMD · Jul 25, 2010

DrMrLordX said:
Hmm. I wonder how this squares with Bulldozer allegedly being on track? And just how much slower is a bit slower going to be?

Hmmm, let me see, we have 2 statements that are at odds with each other.

One is by the CEO of a company on a quarterly conference call that is being recorded. There are actual laws governing those types of disclosures, so that person would have to be a.) truthful and b.) 110% sure of what they were saying.

The other is by a person who (probably) does not work at the company and posts information passed on to them by someone else on an internet forum.

You can choose to believe whatever you want. As someone on the inside, I did not find the original post to be believable.

IntelUser2000 · Jul 25, 2010

Scali said:
So it will be interesting to see how well AMD can feed these units. Intel feeds 4 of their ALUs via 2 threads, using SMT. So 8 ALUs run 4 threads.
If I understood Bulldozer's units correctly, you'd get 8 ALUs with 2 threads.

Intel has 3, and performance isn't always execution unit bound.

Scali · Jul 25, 2010

IntelUser2000 said:
Intel has 3, and performance isn't always execution unit bound.

No, but that would still work to Intel's advantage. SMT was introduced because the execution units could not be kept busy enough with 1 thread alone.
So I don't really see the idea of throwing more integer units at a single thread.
I would think the opposite would make more sense... Use the same, or even fewer, integer units per thread, so that the cores are smaller, and you can fit more cores in the same transistor budget.

Idontcare · Jul 25, 2010

JFAMD said:
Hmmm, let me see, we have 2 statements that are at odds with each other.

One is by the CEO of a company on a quarterly conference call that is being recorded. There are actual laws governing those types of disclosures, so that person would have to be a.) truthful and b.) 110% sure of what they were saying.

The other is by a person who (probably) does not work at the company and posts information passed on to them by someone else on an internet forum.

You can choose to believe whatever you want. As someone on the inside, I did not find the original post to be believable.

Given the generous range in the publicly stated release timeline, a delay of this magnitude could easily be accommodated by the schedule and still give you and your employer plenty of "wiggle" room to claim you hit the schedule all along.

I don't see any contradictions nor a requirement that any party is lying/making it up. You guys give nice 6-12 month windows for product releases for a reason, to accommodate the unexpected delays in internal milestones as well as that of your foundry's process development and qualification team.

From the sounds of it here we now have some unconfirmed info regarding where a few months of this amply wide release window have gone.

Is it false? Can you publicly confirm or disavow that there were packaging or reliability issues with the 32nm low-k dielectric of choice by the IBM researchers for the lower metal levels?

JFAMD said:
You can choose to believe whatever you want. As someone on the inside, I did not find the original post to be believable.

In the months leading up to the glofo spinoff it was surprising how few AMD'ers were in the know. Many claimed, here even, that such a thing simply could not and would not happen and they too disavowed - as insiders and employees - any rumors that were being spread of the impending spinoff.

It really is not our concern whether anything in this thread is true, we aren't part of the decision loop and there are no action items generated for us by this info. As consumers it is fun to see and hear about some of the behind the scenes battles going on, hence the appeal of Anand's articles regarding ATI's GPU development decisions and so forth.

But given the impact on you, I can see where it can start to feel personal, and that makes it easy to lose sight of the forest for the trees. I remember being in a similar position when TI decided to cancel 32nm and 45nm development. My vendors and suppliers knew before I did, and when they tried to tell me about the rumor I was quick to dismiss it as simply absurd.

Back to the topic at hand though, I don't see anything here as unbelievable or out of the realm of plausibility even if I didn't have confirmation from folks in the know.

AMD gives nice wide release windows for a reason, for good reason imo, and this info simply explains why things might not happen at the earliest of possible release months in the range given by AMD. Seems fair enough to me.

jvroig · Jul 25, 2010

Scali said:
If I understood Bulldozer's units correctly, you'd get 8 ALUs with 2 threads.

Yeah, that's the tricky part. At first Dresdenboy (Matthias) thought Bulldozer's 4 pipelines per integer core was 2 / 2 (ALU / AGU; sometime November 2009), but then July 2010 he became sure it was 4 ALU and at least 3 AGU. Since this is for each integer core, this means a module will have at least 8 ALUs as you said. This does seem too much since execution resources aren't always the bottleneck (feeding them is, which is a hyperthreading win), but Matthias' wading through patents and Open64 source code gives him the opinion that Bulldozer may be latency tolerant (+ data speculation, checkpointing, replay and runahead execution to cover L1 and L2 misses), which may compensate for having only 2 threads and require the extra execution resources they made available.

Right now, without more info on Bulldozer, it seems HT is just a more elegant solution as it keeps the thread:ALU ratio more straightforward. It's not perfect (server loads, Rubycon loves talking about it), but it gets the job done. What AMD seems to be doing is going about the same thing in a rather more complicated (read, more prone to negative scenarios) approach.

We'll get to know more in a month's time, but given that Bulldozer is still more than a year away, I'm sure we wouldn't know everything yet. Still, fun to speculate and make a mental exercise out of it every once in a while.

Idontcare · Jul 25, 2010

jvroig said:
Right now, without more info on Bulldozer, it seems HT is just a more elegant solution as it keeps the thread:ALU ratio more straightforward. It's not perfect (server loads, Rubycon loves talking about it), but it gets the job done. What AMD seems to be doing is going about the same thing in a rather more complicated (read, more prone to negative scenarios) approach.

I think it is great we have such microarchitectural diversity.

Given the wide breadth of end-user applications out there it is simply impractical to expect a "one size fits all" approach to optimizing architecture to maximal benefit of all applications.

With Intel and AMD taking very divergent paths in their architectural evolution everyone (with their unique application needs) now stands twice the chance of finding a superbly performing COTS (commercial off-the-shelf) hardware solution that will deftly handle their needs.

Kinda like the two-party system in the USA...most Americans find that either the Democratic or Republican party adequately represents them in matters of governance. It's not ideal, but it appears to be "good enough" and arguably better than a single-party system.

At any rate, what it means imo is that at any given price point every end-user gets a choice between two processors which perform comparably in standardized applications but will trade blows when it comes to specific niche end-user apps. I think this is just simply wonderful.

wlee15 · Jul 25, 2010

Idontcare said:
It was spun-off as a company called Spansion, which then went bankrupt a year ago (around the time that qimonda went bankrupt).

Spansion actually survived bankruptcy and managed to restructure with much less debt. Qimonda, on the other hand, wasn't so lucky, as it has been liquidated.

DrMrLordX · Jul 26, 2010

JFAMD said:
Hmmm, let me see, we have 2 statements that are at odds with each other.

To add to what IDC said, there is always the possibility that Bulldozer would not have needed Globalfoundries to nail down their 32nm process any earlier than the OP indicated in order for it to launch on time.

If I recall correctly, the latest projections of launch windows for 32nm products from AMD/Globalfoundries were desktop Llano in Q4 2010/Q1 2011 (presumably Jan 2011) and then Bulldozer in H2 2011 (presumably July 2011, maybe later). If 32nm production is being pushed back 6 months, that would certainly affect the Llano launch (well, the desktop Llano launch . . . mobile Llano is 40nm bulk silicon from TSMC) but not necessarily the Bulldozer launch. If anything we may see desktop Llano and Bulldozer launch at about the same time.

Idontcare said:
Is it false? Can you publicly confirm or disavow that there were packaging or reliability issues with the 32nm low-k dielectric of choice by the IBM researchers for the lower metal levels?

I too would like official or semi-official confirmation/denial, but at the same time, the last thing I want to do is get JF into any trouble (or put him in an uncomfortable position). He got enough flak over the AMDZone April Fools prank as it is, and that wasn't even remotely of his doing. Poor guy.

Idontcare said:
I think it is great we have such microarchitectural diversity.

Given the wide breadth of end-user applications out there it is simply impractical to expect a "one size fits all" approach to optimizing architecture to maximal benefit of all applications.

With Intel and AMD taking very divergent paths in their architectural evolution everyone (with their unique application needs) now stands twice the chance of finding a superbly performing COTS (commercial off-the-shelf) hardware solution that will deftly handle their needs.

Kinda like the two-party system in the USA...most Americans find that either the Democratic or Republican party adequately represents them in matters of governance. It's not ideal, but it appears to be "good enough" and arguably better than a single-party system.

At any rate, what it means imo is that at any given price point every end-user gets a choice between two processors which perform comparably in standardized applications but will trade blows when it comes to specific niche end-user apps. I think this is just simply wonderful.

You know, with stuff like CUDA and OpenCL floating around in the mix, I think consumers may get more than two choices overall, provided that GPGPU acceleration of applications becomes more commonplace. Furthermore, it looks like AMD may maintain K10.5 variants and Bulldozer in the consumer market for a time as well, giving AMD buyers a choice between which AMD uarch they want.

jvroig · Jul 26, 2010

Idontcare said:
I think it is great we have such microarchitectural diversity.

Absolutely. It's just that the roles seem reversed - AMD, not having R&D dollars to burn through and not feel it, should (or at least, instinctively should) be taking the "tried and true" approach, and then it should be Intel who is doing all the R&D for "diversity", since they can have Larrabee fail, can it, and not worry about it and celebrate yet another best quarter in history. At least, that's how it is in my mental exercises: "Come on, AMD, do what works, stop fooling around".

I can be totally wrong, and AMD may execute their plan so well that it will work better than we'd expect (hehe, not so hard, given how pessimistic the view is - or at least, how pessimistic my view is).

DrMrLordX said:
He got enough flak over the AMDZone April Fools prank as it is, and that wasn't even remotely of his doing

Haha, yeah, I read that in a "news" article, which turned out to be an April Fools prank. It almost had me fooled since the author name-dropped, but Bulldozer was too awesome in the benchmarks, and that didn't fool the pessimist in me.

Furthermore, it looks like AMD may maintain K10.5 variants and Bulldozer in the consumer market for a time as well, giving AMD buyers a choice between which AMD uarch they want.

They will? Where did this come from?

busydude · Jul 26, 2010

jvroig said:
They will? Where did this come from?

CPU part of Llano is K10.5 derived. It will have a different socket though.

Scali · Jul 26, 2010

jvroig said:
We'll get to know more in a month's time, but given that Bulldozer is still more than a year away, I'm sure we wouldn't know everything yet. Still, fun to speculate and make a mental exercise out of it every once in a while.

Yea, I just get this nasty deja-vu regarding Barcelona.
Back then they were talking about a native quadcore-design which would be 40% faster than Kentsfield.
But from the technical data they had provided, I could not derive where that 40% would be coming from. The "secret sauce" was missing.

Seems we're in a similar situation now... From the data we have today, we don't know how it is going to feed those execution units, so where is the performance going to come from? What is the "secret sauce" that makes Bulldozer a success?
I do feel they're trying to do something similar to Sun's Niagara CPUs: don't bother too much about floating point, servers are mainly about integer performance (Niagara shares a single FPU unit between 8 cores/64 threads). The thing is, Niagara's "secret sauce" is SMT, which as far as I know, AMD has always denied.

Martimus · Jul 26, 2010

For those who would like a more detailed explanation on what is going on, I'll try to explain it here as it was explained to me:

The High-k dielectric is used for the transistors (called the front-end of the line or FEOL for short), which are wired together by the metal levels which themselves are insulated from one another by a low-k dielectric (called the back-end of the line or BEOL for short).

The higher the k-value for the dielectric the better it is for making transistors. And the lower the k-value for the dielectric the better it is for insulating the metal wires.

The performance issues with GloFo's 32nm are two-fold. The first is the choice of gate-first integration. This means the choice of high-k dielectric and metal gate materials is completely non-optimal for performance. It's entirely optimized for low cost and low die size. (Compared to gate-last, that is; naturally, gate-first integration is still better than sticking with traditional doped poly for the gate.)

The second issue is IBM's problematic choice of low-k dielectric for the BEOL. Originally the plan was to use a new low-k dielectric for the four lowest metal levels (the ones closest to the transistors, which are critical for speed/performance). This particular low-k material has a lower k-value than what they currently use at 45nm.

The problem is that it cracks when they package the chip. So they are going to just use this new lower-k material at metal level 4 or 3 (just one of the metal levels) and fall back to using the 45nm low-k material for the lowest metal levels. It is still a low-k dielectric, just not as low as they originally planned.
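To see the scale of that fallback, here is a deliberately crude sketch. It assumes wire capacitance on a metal level is proportional to the k-value of its insulator and weights the four lowest levels equally. Both k-values are hypothetical placeholders, since the thread does not give the real ones, and this is not how a foundry computes k-effective.

```python
# Crude model: wire capacitance on a level scales linearly with the insulator's k.
# K_NEW and K_OLD are hypothetical placeholders; the thread gives no real values.
K_NEW = 2.4   # assumed k of the new 32nm low-k material
K_OLD = 2.7   # assumed k of the existing 45nm low-k material


def mean_k(levels):
    """Unweighted average k across a list of metal levels (a simplification)."""
    return sum(levels) / len(levels)


planned = [K_NEW] * 4              # original plan: new material on the four lowest levels
fallback = [K_OLD] * 3 + [K_NEW]   # fallback: new material on only one of them

# Relative increase in average k, and so in modelled wire capacitance.
increase = mean_k(fallback) / mean_k(planned) - 1

if __name__ == "__main__":
    print(f"planned mean k:  {mean_k(planned):.3f}")
    print(f"fallback mean k: {mean_k(fallback):.3f}")
    print(f"modelled capacitance increase on these levels: {increase:.1%}")
```

Even with these made-up values the change is a modest percentage on a few levels, and since wire capacitance is only one contributor to chip-level delay and power, the overall effect would be smaller still.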

VirtualLarry · Jul 26, 2010

That makes it sound much less bad than your OP sounded.

Martimus · Jul 26, 2010

VirtualLarry said:
That makes it sound much less bad than your OP sounded.

That was part of the reason to elaborate on what was happening. Some people took the original post as a doomsday sign for the process. It is just that it won't be as good as expected; but that doesn't mean it will be 'bad'. The worst part of the news is the 6-month delay, in my opinion.

Although, even after reading over the original post again, I don't see how the technical portion appears to be conveyed as all that bad. (It definitely isn't conveyed as good, but it isn't exactly good news.)

heyheybooboo · Jul 26, 2010

Martimus said:
For those who would like a more detailed explanation on what is going on, I'll try to explain it here as it was explained to me:

The High-k dielectric is used for the transistors (called the front-end of the line or FEOL for short), which are wired together by the metal levels which themselves are insulated from one another by a low-k dielectric (called the back-end of the line or BEOL for short).

The higher the k-value for the dielectric the better it is for making transistors. And the lower the k-value for the dielectric the better it is for insulating the metal wires.

The performance issues with GloFo's 32nm are two-fold. The first is the choice of gate-first integration. This means the choice of high-k dielectric and metal gate materials is completely non-optimal for performance. It's entirely optimized for low cost and low die size. (Compared to gate-last, that is; naturally, gate-first integration is still better than sticking with traditional doped poly for the gate.)

The second issue is IBM's problematic choice of low-k dielectric for the BEOL. Originally the plan was to use a new low-k dielectric for the four lowest metal levels (the ones closest to the transistors, which are critical for speed/performance). This particular low-k material has a lower k-value than what they currently use at 45nm.

The problem is that it cracks when they package the chip. So they are going to just use this new lower-k material at metal level 4 or 3 (just one of the metal levels) and fall back to using the 45nm low-k material for the lowest metal levels. It is still a low-k dielectric, just not as low as they originally planned.

My understanding (always questionable) is the exact opposite: the end result is increased performance and higher density due to reduction in leakage.

How 'hybrid' layers impact the overall chip, I'm clueless. I do take a bit of solace however from the apparent gains at EO 45nm (without knowing how it may 'drop down' to 32nm and 'hybrid' with gate-first). As typical with AMD, it's baby-steps and refine along the way.

What's been missed in the "Llano/1H11" discussion is the Zambezi chip (which I feel folks essentially 'booted' into Q4 2010 - I among them - LOL) and the 'bump' given to Ontario.

(from Anand - 11/11/2009)

As I understand it, Zambezi is BD without the GPU on-die. While folks have fixated on Llano (which is 'Stars' plus a GPU), Zambezi is the first spin or 'proof of concept' which will lead to the actual 'Fusion' (with GPU on-die) in 2012 --- Llano is simply the mid-point, or mini-step, in the process (like Clarkdale with maybe a little AVX thrown in for good measure).

Did I get this right (and did it make sense) ??


aphorism · Jul 26, 2010

heyheybooboo said:
My understanding (always questionable) is the exact opposite: the end result is increased performance and higher density due to reduction in leakage.

He is not referring to high-k in general; he is referring to gate-first integration of HKMG. This implementation is simpler but not as fast as gate replacement.

Integrating metal is complex because you can't crystallize it like you can with other materials, and you can't get the temperature of the wafer too high because that would destroy the dopants in the wells and channels. This limits what kinds of materials are possible for HKMG.

cbn · Jul 26, 2010

IntelUser2000 said:
Bulldozer addresses one of the relative weaknesses for AMD and that's lack of multi-threading technologies like SMT. Whatever they want to call it, the focus of their approach is about multi-threading.

After an internet search of "Bulldozer" + "multithreading" I found this:

http://www.xbitlabs.com/news/cpu/di..._Simultaneous_Multi_Threading_Technology.html

Hopefully next month at Hot Chips more light will be shed on this interesting issue brought up by Scali.

cbn · Jul 26, 2010

Scali said:
I do feel they're trying to do something similar to Sun's Niagara CPUs: don't bother too much about floating point, servers are mainly about integer performance (Niagara shares a single FPU unit between 8 cores/64 threads).

Yep, the Xbit article suggests that AMD may be using that approach.

If AMD goes with the SMT you are proposing it sounds like their customers would get a licensing discount when using VMware (although I have no idea how significant this is).

http://www.anandtech.com/show/3827/virtualization-ask-the-experts-1

There are two areas where Intel has an objective advantage. The first one is licensing. The twelve-core AMD Opteron 6100 and six-core Xeon 5600 perform more or less the same. However, if you would like to buy VMware vSphere Essentials (which is an interesting option if you can run your services on 3 servers) you get a license for 3 servers, 2 CPUs per server and 6 cores per CPU. You have to buy additional licenses if you have more cores per CPU.
