6 core AMD on Desktop Preview

Idontcare · Oct 20, 2009

Originally posted by: tommo123
no idea tbh, just assumed it would be easier for em to stitch 2 quads together than 2 6 cores

Good point, once they invest in developing the infrastructure necessary to field MCM'ed CPU products there really is no technical reasoning why they wouldn't proceed to create 2x2, 2x3, 2x4, 2x5, 2x6, etc SKU's to fill-out their portfolio. I had not considered this until now, good posting on your behalf :thumbsup:

aigomorla · Oct 20, 2009

the 6core chip is way too big to stack side by side on the same platform.

tommo123 · Oct 20, 2009

Originally posted by: Idontcare

Originally posted by: tommo123
no idea tbh, just assumed it would be easier for em to stitch 2 quads together than 2 6 cores

Good point, once they invest in developing the infrastructure necessary to field MCM'ed CPU products there really is no technical reasoning why they wouldn't proceed to create 2x2, 2x3, 2x4, 2x5, 2x6, etc SKU's to fill-out their portfolio. I had not considered this until now, good posting on your behalf :thumbsup:

ta, it's rare for me

tommo123 · Oct 20, 2009

Originally posted by: aigomorla
the 6core chip is way too big to stack side by side on the same platform.

as it is now? what about when they do a node shrink?

aigomorla · Oct 20, 2009

Originally posted by: tommo123

Originally posted by: aigomorla
the 6core chip is way too big to stack side by side on the same platform.

as it is now? what about when they do a node shrink?

ummm..

the 32nm 6 core chip is still too big to stack side by side.

Trust me.

You think 45nm to 32nm is a significant shrinkage?

Lonyo · Oct 20, 2009

Originally posted by: aigomorla

Originally posted by: tommo123

Originally posted by: aigomorla
the 6core chip is way too big to stack side by side on the same platform.

as it is now? what about when they do a node shrink?

ummm..

the 32nm 6 core chip is still too big to stack side by side.

Trust me.

You think 45nm to 32nm is a significant shrinkage?

You think it matters?

http://arstechnica.com/hardwar...12-core-server-cpu.ars

Due in 2010 on AMD's 45nm SOI process, Magny-Cours uses the same basic core microarchitecture as the current Shanghai quad-core server processor, so if there's any improvement in per-thread performance it will have to come from better system design.

The basic idea behind Magny-Cours is simple: take two six-core Istanbul processors, downclock them a bit to reduce power, and squeeze them into a multichip module (MCM) so that they can fit into a single socket.

That's AMD and 45nm.
You think that 32nm 2x6 wouldn't be possible if they think 45nm 2x6 is possible?

aigomorla · Oct 20, 2009

Originally posted by: Lonyo

That's AMD and 45nm.
You think that 32nm 2x6 wouldn't be possible if they think 45nm 2x6 is possible?

nice 1...

i stand corrected...

Accord99 · Oct 20, 2009

The 12 core requires the much bigger Socket G34 though. There's virtually no chance a version will ever be released on anything existing today.

aigomorla · Oct 20, 2009

Originally posted by: Accord99
The 12 core requires the much bigger Socket G34 though.

then thats not fair.. LOL...

evolucion8 · Oct 20, 2009

Isn't the Core 2 Quad CPU connected through MCM aka Dual Core sandwich? But the CPU die is considerably smaller than AMD's Istanbul.

cbn · Oct 21, 2009

Maybe it would be better if AMD got to 32nm dual cores quicker (ie, smaller dies on the new process like they are doing with the graphics cards).

Too bad Intel's E6xxx is an 82mm² die to the Athlon II X2's 117mm² die... both at 45nm.

Cookie Monster · Oct 21, 2009

I dont think istanbul's desktop performance is representative of what thuban will be. As thilan pointed out, the platform/memory cripples the CPU big time (that nVIDIA chipset is old and thats probably an understatement).

Idontcare · Oct 21, 2009

Originally posted by: Just learning
Maybe it would be better if AMD got to 32nm dual cores quicker (ie, smaller dies on the new process like they are doing with the graphics cards).

Too bad Intel's E6xxx is an 82mm² die to the Athlon II X2's 117mm² die... both at 45nm.

At those die sizes (under 120mm² but larger than 75mm²), the difference in functional yield is quite a minimal cost adder for the larger die.

If GlobalFoundries were operating near capacity and were wafer-starts limited, then the larger die would have a bigger cost footprint as it would drive capex costs, but with the ongoing recession GF is nowhere near 100% utilization on their 45nm line.

So, yes the die-size delta does result in a slightly higher cost structure for the Athlon II X2's versus the Intel E6xxx's but I'd be surprised if the net cost delta resulted in more than a 10-15% hit to gross margins per chip (not great but not problematic either).

Some back-of-the-envelope calcs... first let's estimate die per wafer (DPW) using this simplistic equation for DPW estimation. (If you happen to know the LxW specifics of the respective chips, we can use Hackerott's Java calculator to get an even better estimate of DPW.)

DPW = d·π·(d/(4S) − 1/√(2S)), where d is the wafer diameter in mm and S is the die size in mm². (Note we use 294mm for the effective wafer diameter to accommodate a 3mm wafer edge exclusion (WEE).)

For Athlon II X2 I get an estimate of 519 DPW, and for E6xxx I get 755 DPW (using your diesize numbers, which I am assuming are correct, have not verified).
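
As a quick check of that arithmetic, here is a minimal Python sketch of the DPW estimate. It assumes the formula reads DPW = d·π·(d/(4S) − 1/√(2S)) and that the result is rounded down to whole dies; the function name `die_per_wafer` is made up for illustration and does not come from any real tool.

```python
import math


def die_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Estimate gross die per wafer, rounded down to whole dies.

    First term: wafer area divided by die area.
    Second term: correction for partial dies lost around the wafer edge.
    """
    d = wafer_diameter_mm
    s = die_area_mm2
    return math.floor(d * math.pi * (d / (4 * s) - 1 / math.sqrt(2 * s)))


# 300mm wafer minus a 3mm edge exclusion on each side, as in the post.
EFFECTIVE_DIAMETER_MM = 300 - 2 * 3

athlon_dpw = die_per_wafer(EFFECTIVE_DIAMETER_MM, 117)  # Athlon II X2 die size
e6xxx_dpw = die_per_wafer(EFFECTIVE_DIAMETER_MM, 82)    # Intel E6xxx die size
print(athlon_dpw, e6xxx_dpw)
```

With the quoted die sizes this reproduces the 519 and 755 figures above.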

Now we need to make some assumption regarding production costs per wafer (CPW) and yields. From production cost standpoint we have Intel's double-pattern (higher CT, higher cost, lower yield) HKMG (higher cost) 45nm process technology which has been in production roughly 12 months longer than GF's (expect lower D0 from better process maturity), versus GF's immersion-litho (lower CT, higher cost, lower yield) SOI (higher cost) 45nm process technology which has been in production roughly 12 months less than Intel's (thus we'd expect slightly higher D0 a priori).

So I'd propose that from a cost-per-wafer standpoint we just call it a wash; they both probably cost pretty close to the same amount, for very different reasons. A leading-edge 4X nm wafer at a foundry costs around $6k-$7k (that includes the foundry's markup, which GF is going to charge AMD but which Intel of course doesn't charge itself in a way that carries through to EPS), so the net wafer cost to Intel is probably closer to $3k-$4k, whereas AMD's net cost per wafer is likely to be some 30-50% higher ($4k-$6k).

(BTW my basis for this cost estimation comes from the real-life experience with the cost/pricing difference between TI's internal process tech versus buying wafers on the same process node at leading edge foundries, I'm doing my best to keep these estimates "real world'ish" just in case anyone is wondering what basis I am using for estimating AMD's cost structure versus Intel's)

Now that we've got our DPW estimates and our CPW estimates, we just need to make a reasonable stab at yield estimates. There are two kinds of yield, functional and parametric. Functional we can estimate; parametric we basically have to just guess. (And when we have to basically just guess, the safest thing to do is slap an a priori clause on our guess and say "we expect them to be essentially equivalent for the purposes of this exercise".)

We know that both die size and fab defectivity levels (D0) contribute to functional yield. This resource discusses a bevy of yield-estimation equations; however, I personally like to use this equation, as discussed in the experimental section of this paper, since it lets us account for the clustering nature of killer defect sources, which is a little more realistic than random distribution models.

Alpha is the defect clustering factor from the classical negative binomial distribution used to calculate yields. It's what distinguishes yield estimates based on areal properties from those created by assuming an entirely random Poisson distribution of killer defects (the simpler rule-of-thumb equation that you see here, which is the limit as alpha goes to infinity).

The relevance of this equation is that this is how we account for the process maturity difference between Intel and AMD given that there is about a one year gap between the production release of their respective 45nm process technologies. A reasonable D0 value for a mature process technology is a D0 of 0.10 defect/cm². To see how D0 impacts functional yield levels for the same IC I prepared this reference graph. Risk production for a chip typically occurs when the D0 is anywhere below 1 defect/cm² and entitlement D0 is anywhere below 0.2 defect/cm².

It only seems fair to assume GF's 45nm D0 is slightly higher than Intel's 45nm D0, given that Intel has had an additional year in production to improve the process maturity. Based on experience it usually takes about one year to reduce one's D0 by 50% once a node is in production (sub-0.5 defects/cm²), so GF's D0 can reasonably be expected to be around 2x that of Intel's. But let's assume GF has used their APC (advanced process control) to get their D0 down at an even more aggressive pace (say 1.1x that of Intel, or 55% DD reduction per year), which would mean the 12-month production gap really only incurs about a 36% penalty to D0 from reduced process maturity, by my calcs.

So let's assume Intel's 45nm is operating at 0.15 defects/cm² and GF's 45nm is operating at 0.20 defects/cm². Then going by our graph above, the functional yield of E6xxx is around 88% and the functional yield of Athlon II X2 is around 80% (see graph).
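
A rough Python sketch of the functional-yield step. The exact equation linked above isn't reproduced in this thread, so this assumes the textbook negative binomial form Y = (1 + A·D0/α)^(−α), with A the die area in cm². The clustering factor α = 4 is a guess picked for illustration, not a number from the post. The Poisson form exp(−A·D0) is included as the α → ∞ limit.

```python
import math


def negative_binomial_yield(die_area_mm2: float, d0_per_cm2: float, alpha: float) -> float:
    """Functional yield with clustered defects: (1 + A*D0/alpha) ** -alpha."""
    area_cm2 = die_area_mm2 / 100.0  # 100 mm² per cm²
    return (1.0 + area_cm2 * d0_per_cm2 / alpha) ** (-alpha)


def poisson_yield(die_area_mm2: float, d0_per_cm2: float) -> float:
    """Functional yield with purely random defects: exp(-A*D0)."""
    area_cm2 = die_area_mm2 / 100.0
    return math.exp(-area_cm2 * d0_per_cm2)


ALPHA = 4.0  # ASSUMPTION: illustrative clustering factor, not from the post

# Die sizes and D0 values as assumed in the post.
e6xxx_yield = negative_binomial_yield(82, 0.15, ALPHA)
athlon_yield = negative_binomial_yield(117, 0.20, ALPHA)

print(f"E6xxx: {e6xxx_yield:.1%}  Athlon II X2: {athlon_yield:.1%}")
print(f"Poisson limit, E6xxx: {poisson_yield(82, 0.15):.1%}")
```

Under these assumptions the two yields land close to the 88% and 80% read off the graph, and the Poisson estimate comes out slightly more pessimistic than the clustered one.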

Now we don't know parametric yield; that's the part where you lose chips that simply require too much Vcc to operate at sellable clockspeeds, or simply can't reach sellable clockspeeds regardless of the Vcc. But as mentioned above, for the sake of argument let's assume both companies are operating at or nearly at the same parametric yield levels. (That is a big caveat, but we've got nothing to justify a quantitative assumption of any other value.)

Let's tally our costs (sans parametric yield losses)...we've got a net 668 sellable DPW for E6xxx and a net 413 DPW for Athlon II X2. We've got Intel's per wafer production costs sitting around $3k-$4k and AMD's per wafer net production costs sitting around $4k-$6k (includes GF's markup).

That puts the estimated production cost for an E6xxx at $4.50-$6.00 and for an Athlon II X2 at $9.70-$14.50. (Remember the ranges of the cost estimates are correlated: if you use Intel's upper range then you must use AMD's upper range, and vice versa for the lower range.)

So we are looking at AMD's cost structure for the larger diesize Athlon II X2 as incurring between $5-$8 more than its competition and the bulk of that cost differential is not the die-size delta but rather the markup costs that GF is going to be charging AMD for their foundry services.
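
The tally can be reproduced with a few lines of Python. The net sellable die counts (668 and 413) and the wafer cost ranges are the estimates from this post; parametric yield is ignored, as in the text, and the names used here are just for illustration.

```python
def cost_per_die(wafer_cost_usd: float, sellable_die_per_wafer: int) -> float:
    """Production cost of one sellable die, ignoring parametric yield loss."""
    return wafer_cost_usd / sellable_die_per_wafer


E6XXX_NET_DPW = 668   # 755 gross DPW at roughly 88% functional yield
ATHLON_NET_DPW = 413  # 519 gross DPW at roughly 80% functional yield

# The cost ranges are correlated: pair low with low and high with high.
# Each entry is (Intel wafer cost, AMD wafer cost including GF's markup).
scenarios = {"low": (3000, 4000), "high": (4000, 6000)}

results = {}
for name, (intel_wafer_cost, amd_wafer_cost) in scenarios.items():
    intel_cost = cost_per_die(intel_wafer_cost, E6XXX_NET_DPW)
    amd_cost = cost_per_die(amd_wafer_cost, ATHLON_NET_DPW)
    results[name] = (intel_cost, amd_cost, amd_cost - intel_cost)
    print(f"{name}: E6xxx ${intel_cost:.2f}, Athlon II X2 ${amd_cost:.2f}, "
          f"delta ${amd_cost - intel_cost:.2f}")
```

Setting both wafer costs equal in a scenario isolates the part of the delta that comes from die size and yield alone, which is a quick way to see how much of the gap is foundry markup.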

cliffs: Despite being large in diesize, the Athlon II X2 is only costing AMD around $5-$8 more to produce and sell than Intel's smaller sized E6xxx chips.

evolucion8 · Oct 21, 2009

AMD claimed that Istanbul was designed to offer 50% more performance compared to its Phenom counterpart at the same clock speed. So in a desktop environment where there's no RAM performance limitation like in those tests, it should perform really close to Nehalem.

Cookie Monster · Oct 21, 2009

The 50% figure was for server applications and it does that for sure. I dont know about desktop performance.

IntelUser2000 · Oct 21, 2009

I can't say Thuban will reach clock speeds similar to Phenom's. On servers, Shanghai's top frequency wasn't high, so Istanbul could get fairly close to Shanghai's clock; on desktops, 3.4GHz is near the top, if not the top, frequency of the part.

I expect it in the 2.8-3.0GHz range, similar to Istanbul today. AMD might fare a little better against Intel at 6 cores each than they did at 4 cores, considering:

-Gulftown's frequency is rumored to be significantly lower(Turbo or not)
-In multi-threaded apps, some apps might use 6 threads, but not 12
-Per-thread performance is probably the only gain with Gulftown(relatively) as the shared L3 gets larger

aigomorla · Oct 21, 2009

Originally posted by: IntelUser2000
-Gulftown's frequency is rumored to be significantly lower(Turbo or not)
-In multi-threaded apps, some apps might use 6 threads, but not 12
-Per-thread performance is probably the only gain with Gulftown(relatively) as the shared L3 gets larger

umm... no.. and cant tell ya..

ummm no... everything ive thrown at it used all 12

ummm no... die shrinkage.. did u forget that?

Okey i go quiet again.

If AMD keeps the price below 500.
I think their hexcore might win.
If its priced in the 4 figures... gulftown will smash it.

IntelUser2000 · Oct 22, 2009

First response: There isn't much info out there, so if your sources are close to Intel, I can't refute that. The clocks I've heard so far are 3.06GHz for high-end (prob EE), 2.53GHz and 2.4GHz.

Second: http://global.hkepc.com/3846/page/5#view

A significant majority of the apps don't use more than 4 threads.

Third: Comparisons of Clarkdale vs. Havendale have shown there are no architectural changes in Westmere beyond the AES instruction set and more cache with the higher number of cores.

aigomorla · Oct 22, 2009

Ummm.... ask any mod here..

I gave the mods a special sneak peek.

Trust me it used all 12 threads, and might have even made some people cry at its sheer computational power.

Also look at what i found on the web..

http://img406.imageshack.us/img406/7638/34199880.jpg

Guess what that is... its 32nm... and i dont think its possible to stack them side by side.

evolucion8 · Oct 22, 2009

Originally posted by: IntelUser2000
First response: There isn't much info out there, so if your sources are close to Intel, I can't refute that. The clocks I've heard so far are 3.06GHz for high-end (prob EE), 2.53GHz and 2.4GHz.

Second: http://global.hkepc.com/3846/page/5#view

A significant majority of the apps don't use more than 4 threads.

Third: Comparisons of Clarkdale vs. Havendale have shown there are no architectural changes in Westmere beyond the AES instruction set and more cache with the higher number of cores.

I read that somewhere, and if it proves to be true, it could mean that the AMD hex core will be more competitive against the Nehalem architecture this time, which makes some sense. Nehalem is a quad core with Hyper-Threading, which means each core shares its execution resources between two threads. So the best-case improvement should theoretically be 50% over an identical Nehalem with Hyper-Threading disabled at the same clock speed, but it will never behave like a true octa core or even a hexa core, and in practice the gains usually aren't that significant.

But a hex-core processor can run 6 threads, each with its own execution engine. So while the two architectures can't be compared exactly because they're quite different, it should be more competitive against Nehalem, and if it's priced right it can also put slight pressure on Gulftown. We need competition so that prices can be driven down.

*Disclaimer; All that I posted is just speculative, don't flame me please, if you can post more informative stuff it would be great

piesquared · Oct 22, 2009

Originally posted by: aigomorla
Ummm.... ask any mod here..

I gave the mods a special sneak peek.

Trust me it used all 12 threads, and might have even made some people cry at its sheer computational power.

Also look at what i found on the web..

http://img406.imageshack.us/img406/7638/34199880.jpg

Guess what that is... its 32nm... and i dont think its possible to stack them side by side.

12 threads? Just wait til you see Magny Cours numbers then..

Accord99 · Oct 22, 2009

Originally posted by: evolucion8
But a hex-core processor can run 6 threads, each with its own execution engine. So while the two architectures can't be compared exactly because they're quite different, it should be more competitive against Nehalem, and if it's priced right it can also put slight pressure on Gulftown. We need competition so that prices can be driven down.

Nehalem+HT has already proven itself to be a multi-threaded monster though, it's currently already more than a match for 6-core Istanbuls or Dunningtons in well-threaded tasks and with Turbo, doesn't sacrifice performance in typical desktop applications.

Originally posted by: piesquared
12 threads? Just wait til you see Magny Cours numbers then..

The impressive thing is Gulftown with a large enough clock differential (say 3GHz versus 2Ghz which wouldn't be unexpected given TDP limits) will beat Magny Cours even in well-threaded applications and tasks.

aigomorla · Oct 22, 2009

Originally posted by: piesquared

Originally posted by: aigomorla
Ummm.... ask any mod here..

I gave the mods a special sneak peek.

Trust me it used all 12 threads, and might have even made some people cry at its sheer computational power.

Also look at what i found on the web..

http://img406.imageshack.us/img406/7638/34199880.jpg

Guess what that is... its 32nm... and i dont think its possible to stack them side by side.

12 threads? Just wait til you see Magny Cours numbers then..

Umm im beating full blown 3.2ghz paired gainestowns (16 threads) on a single gulftown overclocked.
(wont tell you by how much tho)

Magny Cours you say?

Originally posted by: Accord99

Originally posted by: piesquared
12 threads? Just wait til you see Magny Cours numbers then..

The impressive thing is Gulftown with a large enough clock differential (say 3GHz versus 2Ghz which wouldn't be unexpected given TDP limits) will beat Magny Cours even in well-threaded applications and tasks.

LOL read my post above... im already beating dual gainestown systems with a high enough clock differential.. and yes its possible to do even on air.

Sub 500 AMD won hexcore... 4 figures.. get ready for slaughter..

Im being serious guys... 15k CB10 scores isnt even impressive.

Let me see an AMD pull average of 30K CB10 scores, and then we can start comparing.

Idontcare · Oct 22, 2009

Other than generating impressive CB10 benchmark scorez, what will folks be using Gulftown on desktop for? (or Magny-Cours for that matter?)

I know what I'll be doing, but that is so coarse-grained that I could do it with three quad-core rigs wired with 1G-ethernet just as well as I can with a single 12 core rig. For me it's all about TCO/core and these poor-man multi-core rigs are great.

But what the heck else are people doing with them?

Oh I just realized one thing I'd do with a gulftown right now, this freaking TMPGEnc batch that is running here will take another 24hrs to complete. I wouldn't mind if that was just another 5 minutes instead of 24hrs. TMPGEnc doesn't do distributed computing, so I need all my threads available to the same windows session.

aigomorla · Oct 22, 2009

LOL IDC....

Check your PM Box...
