AMD describes multichip module: 12-core Magny-Cours

Page 3 - AnandTech Forums

DrMrLordX

Lifer
Apr 27, 2000
21,664
10,906
136
Originally posted by: geokilla

However, there is a problem with this. WHAT'S THE POINT OF HAVING MORE THAN QUAD CORES?!

The real question here isn't "what's the point", but "how expensive is the total platform for 12-24+ cores?". There are already multi-socket servers out there with 16+ cores (AMD has been selling 8p boards ever since the name Opteron hit the streets).

In the dual-core days, it took an 8p board and 8 very expensive CPUs to hit 16 cores on an AMD platform.
In the quad-core days, it took a 4p board and 4 very expensive CPUs to hit 16 cores on an AMD platform.
Once Magny-Cours is released, it will take a 2p board and 2 very expensive CPUs to hit - wait for it - 24 cores on an AMD platform.
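The progression above is just ceiling division of the target core count by cores per CPU; a trivial sketch:

```python
import math

def sockets_needed(target_cores: int, cores_per_cpu: int) -> int:
    """Sockets (and therefore CPUs) required to reach a target core count."""
    return math.ceil(target_cores / cores_per_cpu)

# The same 16-core target across AMD's generations of parts:
for cores_per_cpu in (2, 4, 12):
    print(cores_per_cpu, "cores/CPU ->", sockets_needed(16, cores_per_cpu), "sockets")
# 2 cores/CPU -> 8 sockets; 4 -> 4; 12 -> 2 (and a 2p board actually gives you 24 cores)
```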

A board with fewer sockets is going to be cheaper to produce for many different reasons, you're dealing with fewer CPUs overall to reach (or exceed) the desired core count, and you're achieving all this without losing any IPC in single- or multi-threaded apps, assuming clock speeds stay the same or improve.

The boards will be smaller too, and should be easier to cool. Can you imagine a 2p Magny-Cours in a 1U rack?

The only disadvantage I see to more cores-per-CPU is that you'll wind up with less memory per core unless there are a hell of a lot of DIMM slots available. But that's not necessarily a problem.
 

Viditor

Diamond Member
Oct 25, 1999
3,290
0
0
Originally posted by: Phynaz
Only you would be silly enough to ask someone to prove a negative...

That's a nice one: you say it's impossible, and then say it's impossible to prove that it's impossible.

Maybe you should prove the positive first...all you've shown is an obscure offhand comment by Otellini that may or may not have anything to do with CPUs.

First, I mentioned testing and packaging together because they happen in the same place (i.e. I never said that you would package first).

Packaging does not always happen in the fab. I'm not even sure that GF currently packages in the fab.

GF does not package in the fab, nor did AMD; packaging and testing centers are located next to supply centers to reduce costs.

So you actually believe that Intel can create a full turn AND package/test all in half the time that any other company can just get a wafer out???

No, I never said that. Please go back and read again.

True, you only strongly implied it by comparing AMD's 13 weeks (which is their time to create a full turn AND package/test/ship).

Industry standard for just the wafer out is 10-12 weeks.

But Tux says they run them in six weeks...Who do we believe, an AMD fanboy or an Intel engineer?

No, he didn't...
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
Time to get your glasses checked friend :)

Okay, you win, AMD can produce chips faster and cheaper than Intel.
And the WSJ is an obscure publication.
And Otellini's remarks in an interview are offhand comments.
And I have my facts mixed up.
And IDC will be along anytime now to prove you're right.


Feel better now? We wouldn't want your ego bruised or your feelings to get hurt.



Oh, BTW, How's that Athlon II 550 working out for you? :laugh::laugh::laugh::laugh::laugh:
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
Gentlemen if I could weigh in here I think I may be able to shed some light on how you all are likely talking past each other.

First up, TuxDave is probably the easiest to explain. He is referring to reticle respins (steppings).

When a design team completes a new mask layer for tapeout, it's "sent to the factory", which really means it's sent to the photomask maker (Intel uses an internal shop, practically everyone else uses Toppan). Once the mask maker has completed the reticle (which includes its own bevy of inspections, verifications, and validations), the reticle goes to the fab, where it is physically placed into a scanner's library and used to print wafers. A modern MPU design will have 60-80 photomasks.

Now, for the design cycle of creating new steppings, there is a certain degree of priority placed on the wafers in the fab to get the silicon turned thru the fab and back into the hands of the design team, as they are gated by data at that point and time is money.

I've worked in a fab where we had four levels of priority: the highest was called P0 (pronounced pee-zero), the next highest was P1, then P2. Manufacturing (the vast volume of production wafers in the fab) operated at P3, the lowest priority.

Each priority level corresponds to a targeted "days per mask level" metric. P0 priority was targeted to enable 0.5 days per mask level. This was "crisis mode, OMFG we are going to lose a key account with one of our top-ten customers if this hot lot moves any slower than this" type speed in the fab. The lot was hand-carried 24/7 from tool to tool, and tools were "held up" in advance to ensure they had nothing in their queue that would prevent the P0 lot from immediately processing. We had a team of employees hired and dedicated to 24/7 on-site situational monitoring of these P0 lots, with full engineering support dedicated every night (even as an R&D process development engineer I had to work the night shift on P0 lot coverage for the fab once every 30-45 days for 2 yrs) as well as during the day.

This support infrastructure was above and beyond what it took to run the rest of the fab to get the other 99% of the wafers out of the fab at P3 priority.

And even with all these resources dedicated to hitting 0.5 day/mask-level cycle time the whole fab and operation could not support more than three (3!) P0 lots at any one time.

(P1 lots were targeted at 1 day per mask level, P2 lots were 1.5 days per mask level)

So I know firsthand what it takes to make a lot move thru a fab at that kind of cycle-time. A 9LM device with about 60 masks moving at absolute breakneck speeds (which was only done for respins, quals, and absolute top corporate priority hot lots) will spend no less than 30 days in the fab, not including test and packaging.
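The priority tiers above boil down to simple arithmetic: fab cycle time is mask levels times days per mask level. A minimal sketch (the P3 rate here is my own assumption, taken from the roughly 2 days per mask level that bulk production runs at):

```python
# Targeted "days per mask level" by lot priority (TI nomenclature).
# P0-P2 are the targets quoted above; the P3 figure is an assumption.
DAYS_PER_MASK = {"P0": 0.5, "P1": 1.0, "P2": 1.5, "P3": 2.0}

def fab_days(mask_levels: int, priority: str) -> float:
    """Idealized fab turnaround time, excluding test and packaging."""
    return mask_levels * DAYS_PER_MASK[priority]

print(fab_days(60, "P0"))  # 30.0 days: the floor quoted for a ~60-mask hot lot
print(fab_days(60, "P1"))  # 60.0 days
```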

You could daisy-chain the tools end-to-end such that the lot goes from one tool right into the next in one long assembly line and for a 60 mask IC the cycle time will not be any faster than 0.5 days per mask. (one caveat there, pm me if you really must have even more details)

So for TuxDave to say the design guys see fab turnaround times of ~6 wks on stuff that is gated by getting silicon back from the fab is quite explainable, and at the same time entirely different from saying the average cycle-time for the other 99% of the fab's WIP is ~6 weeks. The typical respin at TI was done at P1 priority, about 6 weeks fab turnaround time including packaging. The impact of running P0-type lots thru a fab on the rest of the operation is so severe that it really is reserved for the truly crisis situations; not every new chip design or reticle respin warrants 0.5 days/mask cycle-time.

There, done with that, please no more conflating design loop and development R&D cycle-time with production cycle-time metrics.

Now to the situation with Otellini's comment regarding the factories hitting 6 weeks...we really have no idea what all was factored into his statement; he could have been talking about 65nm chipsets with 6LM and so forth, for all we really know.

What we do know is that Otellini and John Fruehe are all but guaranteed to be two guys speaking about two widely disparately defined cycle-time metrics, unless by some odd coincidence the assessment methodologies employed by both individuals (or, more likely, their direct reports) reference the exact same IC or competing ICs.

Is a 6 wk cycle-time possible for a 45nm 9LM device with double-patterning steps (mask adders) and replacement-gate (mask adders) integration? Sure it is; even 4 weeks is possible. At P1-level priority (TI nomenclature, not Intel's) I'd expect the lot to take 6 weeks to get through the fab.

But can an entire fab operate at P1 level of pace (1 day per mask level)? Phenomenal if true, which is what I left it at.

Is 13 wks a reasonable average production lot cycle-time for a 45nm 10LM SOI immersion-litho and standard gate integration? Yes, absolutely. Outside of the solitary quote from Otellini I know of no other fab or IDM that operates any faster than that (right around 2 days per mask level) for their bulk production volume in the fab.
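As a sanity check on those numbers (the implied mask count below is my own back-of-envelope estimate, not a quoted figure):

```python
# Cross-checking the quoted figures: a 13-week production cycle-time at the
# industry-typical ~2 days per mask level implies roughly 45 mask levels.
weeks = 13
days_per_mask = 2.0
implied_masks = weeks * 7 / days_per_mask
print(implied_masks)  # -> 45.5, plausible for a 45nm 10LM flow
```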

(And by the way, casting cycle-time metrics as "days per mask level" is an industry standard, kind of like how the airline industry uses "seat-miles" as a common metric for benchmarking each other. So when I see Otellini making woefully unqualified cycle-time comments and not even using the standard cycle-time metric lingo, it really raises a red flag for me that this guy just might not be talking about what we all think he is talking about. Same with Fruehe and his 13wk comment: unqualified as to device, technology, and mask levels, so it is a worthless data point for comparing to other IDMs.)
 

deputc26

Senior member
Nov 7, 2008
548
1
76
I love these discussions because I always learn a lot, the arguments help me to construct a more precise picture of what is going on. So thanks to Phynaz, Viditor, Tux Dave, IDC and BitByBit for contributing knowledge to this forum that is difficult to find/stumble on elsewhere.
 

heyheybooboo

Diamond Member
Jun 29, 2007
6,278
0
0
Magny-Cours and other improvements on the enterprise side will keep AMD treading water for the next few years - in the fight for survival against the eeeville maniacal WIntel (excuse me if I vent too much - I'll try to keep a good thread (for the most part) on track).

Originally posted by: DrMrLordX ~~~

In the dual-core days, it took an 8p board and 8 very expensive CPUs to hit 16 cores on an AMD platform.
In the quad-core days, it took a 4p board and 4 very expensive CPUs to hit 16 cores on an AMD platform.
Once Magny-Cours is released, it will take a 2p board and 2 very expensive CPUs to hit - wait for it - 24 cores on an AMD platform.
~~~


Consider what the Doc has said, relate it to Viditor's "hops" and roll in NUMA.

"Hops" are a big deal in the AMD enterprise arch, and soon to be a big deal with the enterprise i7. Previously, Intel MP communicated over the FSB - no longer the case moving forward with QPI.

Each socket is a node with its own DIMM bank. To keep it simple, think of it as "Node0-Bank0", "Node1-Bank1", etc. Best practice is for each thread or process spawned on a node to access memory allocated locally. Node0 <---> Bank0, right?

In AMD parlance, when a CPU must look outside its node it is called a 'hop': "Node0 <---> Bank1" being a single hop. In the original 4p systems, HT links existed only between neighboring nodes, so "Node0 <---> Bank3" would be 2 hops. Thankfully, I have never dealt with 8P - LOL - and have no idea how folks deal with those hops (actually, I do, and am thankful I never have had to deal with them).
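The hop counting above is just a shortest-path search over the HT link topology. A sketch, where the adjacency is an illustrative assumption of the original 4p square layout (sockets at the corners, links only between neighbors), not a statement about any specific board:

```python
from collections import deque

# Hypothetical 4p square: node 3 sits diagonal from node 0,
# so reaching Bank3 from Node0 takes two HT hops.
HT_LINKS = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}

def hops(src: int, dst: int) -> int:
    """Breadth-first search for the minimum number of HT hops between nodes."""
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if node == dst:
            return dist
        for nbr in HT_LINKS[node]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    raise ValueError("unreachable node")

print(hops(0, 0))  # 0 hops: local access, Node0 <---> Bank0
print(hops(0, 1))  # 1 hop:  Node0 <---> Bank1
print(hops(0, 3))  # 2 hops: Node0 <---> Bank3, across the diagonal
```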

The end result of all these hops is page faults, and this can be a serious NUMA bummer. I think in the past the second hop has been called a 'crossfire' (not the other one), and this may affect operations by more than 30%.

The reason I used the snarky 'WIntel' comment is that heretofore Microsoft has considered the NUMA architecture to be esoteric (their words). With Intel abandoning the FSB, integrating the memory controller, and using QPI, Microsoft has become more 'NUMA-aware' in promoting multi-core non-uniform memory access to harness performance and capacity effectively.

Setting process and thread affinity with allocated local memory will help but if software developers are slow with parallelization for multiple cores how do you think confining them to a single node is going to work? :D

Hopefully, all this will lead to big improvements in the Windows NUMA APIs and operation, which will in turn positively affect the development and operation of Magny-Cours and the i7.

 

VirtualLarry

No Lifer
Aug 25, 2001
56,369
10,067
126
Originally posted by: Nemesis 1

Well see about the performance VS Intel . But your way wromg about AMD s being Cheaper than INTELS.

1 Intel is a fully modular Chip= Intel can Add cores very cheaply.

2. Amd pays Intel for every chip/core they sell. AMD uses more expensive process.

AMD has more defects because of Intels Modular Chip. Intel SELLs in higher Volumn so they utilize more of their resources to produce lowest cost chip . = MARGINS

AS for MCM Intel used L2 cache for core communacations . FSB was for offcore memory . AMD is not using L3cache for the 2 Six core AMD to cummunicate with each other . If ya don't get this part your hopelessly lost.

Dude, when you make no technical sense, your fanboyism shows through.

I fail to see how Intel can "add cores very cheaply". Each core or set of cores that they add still has to go through all of the debugging that any design goes through, and more cores == more silicon real-estate == lower yields == higher costs. Or did you completely forget the lesson that Intel's C2Q dual-die chips taught you about mfg costs?

I don't know if AMD pays Intel for every chip that they sell. If so, then Intel must pay AMD for every chip that they sell, since all modern Intel chips are 64-bit, and AMD invented those extensions.

Truly, they probably have a wide-reaching patent cross-license agreement, and they probably pay each other nothing.

AMD using the more expensive process might not be true. Intel's 45nm HKMG is probably a bit more costly than AMD's 45nm non-HKMG. And 32nm is more costly than that, as far as the actual process technology goes. So you're wrong there, AMD is using a cheaper process.

Now if you want to talk yields and overall mfg efficiency and cost, Intel may have a leg up there. But their more advanced process technology itself is not cheaper, it's more expensive.

"AMD has more defects because of Intel's modular chips".

What are you trying to say there? It's a bit of a nonsensical statement. I fail to see what one has to do with the other.

 

TuxDave

Lifer
Oct 8, 2002
10,572
3
71
Originally posted by: Nemesis 1

"AMD has more defects because of Intel's modular chips".

What are you trying to say there? It's a bit of a non-sensical statement. I fail to see what one has anything to do with another.

While we're starting to venture into fabrication space (of which I only have a 10,000-mile-high view), I think there are a couple of implied steps missing. Assuming an MCM design vs. a single chip that's twice the size, you get a lower number of dies per wafer and a higher defect rate per die (due to its larger area).

So that could translate to a lower # of usable chips per wafer.
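TuxDave's tradeoff can be made concrete with a simple Poisson yield model, Y = exp(-D0 * A). All of the numbers below (wafer area, defect density, die sizes) are illustrative assumptions, and edge loss is ignored:

```python
import math

WAFER_AREA_MM2 = 70_685    # 300 mm wafer, ignoring edge loss
D0 = 0.005                 # assumed defect density: 0.5 defects/cm^2

def good_dies(die_area_mm2: int) -> int:
    """Usable dies per wafer under a Poisson yield model Y = exp(-D0 * A)."""
    dies = WAFER_AREA_MM2 // die_area_mm2
    yield_frac = math.exp(-D0 * die_area_mm2)
    return int(dies * yield_frac)

mono = good_dies(200)      # one monolithic die, twice the area
mcm = good_dies(100) // 2  # two smaller dies packaged together as an MCM
print(mono, mcm)           # the MCM route produces more usable chips per wafer
```

Doubling the die area cuts the candidate die count in half and compounds the yield loss, which is exactly why the MCM approach can win on usable chips per wafer.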
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
Originally posted by: TuxDave
Originally posted by: VirtualLarry

"AMD has more defects because of Intel's modular chips".

What are you trying to say there? It's a bit of a non-sensical statement. I fail to see what one has anything to do with another.

While we're starting to venture to fabrication space (which I only have a 10,000 mile high view of), I think there's a couple implied steps missing. Assuming an MCM design vs a single chip that's twice the size, you get lower # of dies per wafer and a higher defect rate per die (due to its larger area).

So that could translate to a lower # of usable chips per wafer.

TuxDave, your argument is valid, as is any argument relating the tradeoffs between yield and die size, but the miscommunication here over this Nemesis quote is that he was specifically discussing the modular core design aspects of the Nehalem architecture but referred to it as a modular chip (meaning the resultant product of a modular core design is the chip, and he is talking about the chip in the end), which has spawned all manner of confusion and posting about MCM vs. defect density that is entirely beside the point.

The beef over the statement is that no Intel design engineer is lawfully impacting the defect density inside AMD's fab. There are unlawful ways this could be done, but I don't think Nemesis means that.

AMD's defect levels and Nehalem's modular core architecture (enabling easier 2C, 4C, 6C and 8C product generation, at least according to Intel PR) are two entirely uncorrelated events. Both exist, neither is the progenitor or modulator of the other.
 

TuxDave

Lifer
Oct 8, 2002
10,572
3
71
Originally posted by: Idontcare
Originally posted by: TuxDave
Originally posted by: VirtualLarry

"AMD has more defects because of Intel's modular chips".

What are you trying to say there? It's a bit of a non-sensical statement. I fail to see what one has anything to do with another.

While we're starting to venture to fabrication space (which I only have a 10,000 mile high view of), I think there's a couple implied steps missing. Assuming an MCM design vs a single chip that's twice the size, you get lower # of dies per wafer and a higher defect rate per die (due to its larger area).

So that could translate to a lower # of usable chips per wafer.

TuxDave your argument is valid, as is any argument relating the tradeoffs between yield and diesize, but the miscommunication here on this Nemesis quote is that he was specifically discussing the modular core design aspects of Nehalem architecture but he referred to it as modular chip (meaning the resultant product of a modular core design is the chip, and he is talking about the chip in the end) which has spawned all manner of confusion and posting about MCM vs. defect density which are entirely besides the point.

The beef over the statement is that no Intel design engineer is lawfully impacting the defect density inside AMD's fab. There are unlawful ways this could be done, but I don't think Nemesis means that.

AMD's defect levels and Nehalem's modular core architecture (enabling easier 2C, 4C, 6C and 8C product generation, at least according to Intel PR) are two entirely uncorrelated events. Both exist, neither is the progenitor or modulator of the other.

Yeah, taken directly from how it was written, it doesn't make much sense. I'm just helping the discussion along. I have a pet peeve about discarding arguments on the basis that they were poorly communicated (mainly because engineers are horrible at communication, and if I did that on a regular basis I would be throwing away tons of great ideas).
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Originally posted by: deputc26
I love these discussions because I always learn a lot, the arguments help me to construct a more precise picture of what is going on. So thanks to Phynaz, Viditor, Tux Dave, IDC and BitByBit for contributing knowledge to this forum that is difficult to find/stumble on elsewhere.

Yes, this is a really good thread, one of the best ones going on the net right now on hardware. Add in how we agree to disagree, and it's almost perfect. Looks like men chatting.