The technology of AMD's jaguar

inf64 · Sep 1, 2012

Jaguar will be a speedy little chip 🙂. I wonder how an 18W Jaguar will compare to an 18W Trinity; it should be interesting.

Arkadrel · Sep 1, 2012

inf64 said:
Jaguar will be a speedy little chip 🙂. I wonder how an 18W Jaguar will compare to an 18W Trinity; it should be interesting.

I think a 15W Jaguar with 4 cores @ 2GHz or so will beat what a Trinity @ 17W can do,
simply because Jaguar was designed for the low-power envelope, and because of the node differences (28nm vs 32nm).

Homeles · Sep 1, 2012

28nm bulk should be comparable to 32nm SOI.

Phynaz · Sep 1, 2012

grimpr said:
I trust these simulations

Phenom 1 my friend, Phenom 1.

Olikan · Sep 1, 2012

Phynaz said:
BULLDOZER my friend, BULLDOZER.

corrected

Cerb · Sep 1, 2012

Olikan said:
IPC = "big fat cores" my @T$

That's partly because they made Trinity's parent a frequency monster, and partly because Bobcat and now Jaguar aren't that.

Both their high end and low end are now made to be "enough" rather than "the best," at least per-core. The high end was executed somewhat poorly, however, while the low end was executed extremely well.

BD ought to have been able to take advantage of its big fatness to get near peak IPC fairly often, instead of long delays everywhere, had it been made for decent speeds (excepting some code that might only be able to get 1-2 instructions/cycle through the shared decoder--x86 VLE and all that).

Homeles · Sep 1, 2012

Phynaz said:
Phenom 1 my friend, Phenom 1.

Phenom I had a crippling flaw: its top clock speeds were terrible. I'm sure Steamroller will have its own set of issues, but it'd be pretty hard to top Phenom I in the "flop category."

Olikan · Sep 1, 2012

Homeles said:
Phenom I had a crippling flaw: its top clock speeds were terrible. I'm sure Steamroller will have its own set of issues, but it'd be pretty hard to top Phenom I in the "flop category."

well, that's true...
usually the first chips of new architectures are the most problematic....
...PhI, BD, P4, Fermi, Cell...and so on...

inf64 · Sep 1, 2012

It's obvious that the SR core will be the best "bulldozer" when it launches. What we don't know is how much faster it will be. If they get a 15-20% IPC jump and a 10% clock jump vs the FX-8150, then I would consider it a job well done.

jpiniero · Sep 1, 2012

inf64 said:
It's obvious that the SR core will be the best "bulldozer" when it launches. What we don't know is how much faster it will be. If they get a 15-20% IPC jump and a 10% clock jump vs the FX-8150, then I would consider it a job well done.

Aren't there rumors that Steamroller isn't going to be released on desktop, only mobile and server?

inf64 · Sep 1, 2012

jpiniero said:
Aren't there rumors that Steamroller isn't going to be released on desktop, only mobile and server?

No, the SR core will be in client chips too.

grimpr · Sep 1, 2012

Phynaz said:
Phenom 1 my friend, Phenom 1.

That was a long, long time ago. Compute power and software for CPU simulations have improved about 2x, and Jaguar is a pretty beefed-up Bobcat, not a brand-new core like BD was.

pelov · Sep 1, 2012

Does anyone know what the size of the chip will be and how beefy the graphics are? This is the most interesting AMD product since Bobcat and I'm wondering if anyone has a bit more info. One of these little guys in a slim design laptop might sell quite well.

Homeles · Sep 1, 2012

jpiniero said:
Aren't there rumors that Steamroller isn't going to be released on desktop, only mobile and server?

Those rumors are started by morons that don't understand that the chips that don't make server-grade qualification are binned as desktop chips. AMD isn't going to suddenly start throwing away less-than-perfect but still fully functional chips.

Arkadrel · Sep 1, 2012

pelov said:
Does anyone know what the size of the chip will be and how beefy the graphics are? This is the most interesting AMD product since Bobcat and I'm wondering if anyone has a bit more info. One of these little guys in a slim design laptop might sell quite well.

http://www.semiaccurate.com/forums/showpost.php?p=167771&postcount=84

APUs have lower transistor density than GPUs.
380M transistors ÷ 75 mm² ≈ 5.07M transistors/mm²

http://www.chip-architect.com/news/A...eview_Atom.jpg
Bobcat ~75mm²
cpu part is 3+3+4.6+4.6=15.2mm²(cores and cache)
75-15.2=59.8mm² for the IGP+memory controller
The discrete GPU equivalent to this IGP is the Radeon HD 7350, and its size is 59mm² on the 40nm process, so the GPU part has the same density.

Even if transistor density doubled for going from 40 nm to 28 nm (actually the more realistic scaling would be 1.42x), it still wouldn't be in the same ball park as Cape Verde and Pitcairn, which have transistor densities of >12M transistors/mm².

Actually, moving from 40nm to 28nm, transistor density doubled in GPUs.
Turks 118mm² 716 million Transistors
Cape Verde 123mm² 1500 million Transistors
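
For anyone who wants to check the density arithmetic in this exchange, here is a minimal Python sketch. It only uses the die sizes and transistor counts as posted in the thread (not independently verified), and the function name `density_m_per_mm2` is made up for illustration.

```python
def density_m_per_mm2(transistors_millions: float, area_mm2: float) -> float:
    """Transistor density in millions of transistors per mm^2."""
    return transistors_millions / area_mm2

# (transistors in millions, die area in mm^2), as quoted in the thread.
chips = {
    "Bobcat APU (40nm)": (380, 75),
    "Turks (40nm)": (716, 118),
    "Cape Verde (28nm)": (1500, 123),
}

densities = {name: density_m_per_mm2(t, a) for name, (t, a) in chips.items()}
for name, d in densities.items():
    print(f"{name}: {d:.2f}M transistors/mm^2")

# Ratio of the 28nm GPU density to the 40nm GPU density.
scaling = densities["Cape Verde (28nm)"] / densities["Turks (40nm)"]
print(f"Turks -> Cape Verde density scaling: {scaling:.2f}x")
```

With these inputs the Turks to Cape Verde ratio comes out at roughly 2x, which is the doubling claimed above. It is a comparison between two specific GPU dies, so it says nothing certain about how an APU would scale.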

Back to Jaguar die size:
Radeon HD 7470 is 67mm² with 370 million transistors (configuration: 160:8:4; 64bit)
So a GCN chip with ~700 million transistors should be around the same size, but in my opinion 2 CUs shouldn't need more than 500-550 million transistors, so I think the size would be ~55-60mm² for the IGP + memory controller.
Let's add 4 cores: 4 x 3.1mm² + 4 x 3mm² (the cache should be smaller, because this value is for 40nm and not for 28nm), which adds up to 24.4mm².
My final estimate for the Jaguar APU is ~79.4-84.4mm². At worst I don't think it would be more than 90mm².
Considering you get more than double the CPU and IGP power, I think this die size is very nice, at least if AMD's estimate is correct this time.
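
The estimate in the quoted post can be reproduced with a short Python sketch. All inputs are the poster's own guesses (3.1mm² per core, 3mm² of cache per core, 55-60mm² for the IGP and memory controller); the function name `jaguar_die_estimate` is hypothetical and only there to make the sum explicit.

```python
def jaguar_die_estimate(gpu_mc_mm2: float,
                        cores: int = 4,
                        core_mm2: float = 3.1,
                        cache_per_core_mm2: float = 3.0) -> float:
    """Sum of CPU cores, per-core cache, and the IGP + memory controller area."""
    cpu_mm2 = cores * core_mm2 + cores * cache_per_core_mm2
    return cpu_mm2 + gpu_mc_mm2

# Low and high guesses for the IGP + memory controller, taken from the post.
low = jaguar_die_estimate(55.0)
high = jaguar_die_estimate(60.0)
print(f"Estimated Jaguar APU die size: {low:.1f}-{high:.1f} mm^2")
```

The CPU part alone works out to 24.4mm², so the whole range depends almost entirely on the GPU guess, which is the least certain input.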

^ This guy probably knows better than I do, and he's guessing it's gonna be around 79-85mm^2,
which is slightly bigger than the 75mm^2 of the current Brazos chips (the E-350/E-450).

However:

Considering you get more than double the CPU and IGP power, I think this die size is very nice

It's gonna be nice for laptops 🙂

from Anandtech:
http://www.anandtech.com/show/5491/amds-2012-2013-client-cpugpuapu-roadmap-revealed

Kabini and Temash will also integrate the Fusion Controller Hub (FCH, aka South Bridge) making these two APUs AMD's first true single-chip solutions.

This is bound to make them more energy efficient than the Bobcat systems too,
since the FCH will no longer be on a separate chip (which is built on a larger, older node).

pelov · Sep 1, 2012

Back to jaguar die size
Radeon HD 7470 is 67mm² with 370 million Transistors(configuration: 160:8:4; 64bit)

7470 isn't GCN. AFAIK the upcoming 28nm low end APUs will feature GCN architecture. Considering there currently are no low-shader count GCN GPUs, whether APU or discrete, it's difficult to judge just what it's going to look like. If AMD is following their Trinity/Llano trend then the GPU might take up even more of the entire die space. How many CUs will the chips have? He's saying 2, but I'm not sure where he's getting that from.

Let's add 4 cores 4x 3.1mm² + 4x3mm²(cache should be smaller because this value is for 40nm and not for 28nm)

2 ALUs to 1 FPU? Instruction sets? Wider pipeline? These are all variables that make a huge difference in overall core size (and the size of the GPU even more so). What we do know is that it's 2MB shared (likely dynamically, like Steamroller will feature?). The 512KB per core shouldn't change, but the 28nm bulk shrink favors cache because caches shrink linearly.

Does anyone have any AMD info regarding the above points?

NostaSeronx · Sep 2, 2012

ALU width is the same
AGU width is the same
The necessary FPU items have been changed from 64-bit to 128-bit.

Other than the items required for the FPU not to get bottlenecked, 90% of Bobcat is in Jaguar.

With everything provided, Jaguar should be no larger than 100 mm².

Arkadrel · Sep 2, 2012

How many CUs will the chips have? He's saying 2, but I'm not sure where he's getting that from.

It's an educated guess, based on 1 CU being too little of an upgrade in terms of performance,
and 4 CUs being too big and probably memory-bandwidth starved.

7470 isn't GCN.

Yeah, but what then? Compare it to a 7770 with its 10 CUs?
That's like 123mm^2 / 5 = ~24.6mm^2 for the GPU portion?

The reason he used the 7470 was to illustrate how transistors/die space look @40nm.
(370 million transistors@40nm vs ~550 million@28nm for 2 CUs = sameish space taken up)

I think he was being conservative when he said 79-85mm^2 for the entire chip.
Worst case, it ends up being around 90mm^2.

I think it'll be slightly smaller, probably closer to 79mm^2 than it is to 90mm^2.
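
To make the 7770 comparison concrete, here is a rough Python sketch of that scaling. It assumes the whole 123mm^2 Cape Verde die scales linearly with CU count, which is a crude simplification: parts such as the memory interface and display logic would not shrink with fewer CUs, so the real figure is probably larger. The constant and function names are made up for illustration.

```python
CAPE_VERDE_AREA_MM2 = 123.0  # HD 7770 die size as quoted in this thread
CAPE_VERDE_CUS = 10          # CU count of the 7770 as quoted in this thread

def naive_gpu_area(cus: int) -> float:
    """Scale the whole Cape Verde die linearly by CU count (crude assumption)."""
    return CAPE_VERDE_AREA_MM2 * cus / CAPE_VERDE_CUS

# Compare the 1, 2 and 4 CU options discussed above.
for cus in (1, 2, 4):
    print(f"{cus} CU: ~{naive_gpu_area(cus):.1f} mm^2")
```

For 2 CUs this gives the ~24.6mm^2 mentioned above, well under the 55-60mm^2 allowed for the IGP and memory controller in the quoted estimate.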

happysmiles · Sep 2, 2012

wouldn't surprise me if 10+ hour battery life became the norm next year

Frankly, if I can carry an 11.6" and play TF2 on max settings, I'd be happy

Gideon · Sep 2, 2012

It seems that Jaguar could work very well in servers. If a 16 core chip with 128bit memory interface and a big L3 cache could be made, imagine 2 of these on one die. It should totally smoke current 16-core server Bulldozers in parallel applications and wouldn't be that far behind in serial because of the better IPC.

Cerb · Sep 2, 2012

Gideon said:
It seems that Jaguar could work very well in servers. If a 16 core chip with 128bit memory interface and a big L3 cache could be made, imagine 2 of these on one die. It should totally smoke current 16-core server Bulldozers in parallel applications and wouldn't be that far behind in serial because of the better IPC.

With enough cache, it probably could be competitive with the low-2GHz SKUs, sadly; but that would mean 4 of them, not 2, and about as much cache as BD, too. I seriously doubt, even if it could reach 3GHz and beyond, that performance would scale.

pelov · Sep 2, 2012

Arkadrel said:
Yeah, but what then? Compare it to a 7770 with its 10 CUs?
That's like 123mm^2 / 5 = ~24.6mm^2 for the GPU portion?

Look at the Bobcat die shot:

That's a huuuuuuge GPU. We also know nothing about the GPU in the new 28nm APUs other than that they're GCN, so you've got more than 50% of the die missing. Just how can you make an educated guess with more than 50% of the die being an unknown variable?

I didn't ask because I couldn't pull a number out of my butt; I can. I have many numbers in my butt and pull them out quite liberally (admittedly it can be a pretty messy process). I asked because I was hoping AMD had a slide or document presented at Hot Chips that pertained to the die size instead of having to take a complete shot in the dark... or reach in deep and pull out slowly. Whichever you prefer.

Cerb · Sep 3, 2012

I've finally gone over more of it, and I find it interesting that they mention several D$, and particularly D$TLB, improvements; and reworked L/S. From other CPUs, I expected some of the times Bobcat choked had to do with I$/I$TLB, but maybe it was data address walks or LSU all along. Faster or concurrent (just says, "enhanced") PT walks sure can't hurt, whether they were a major bottleneck or not, on anything x86, though.

CTho9305 · Sep 4, 2012

Homeles said:
28nm bulk should be comparable to 32nm SOI.

Is there a comparison of device characteristics somewhere? Or an apples-to-apples product comparison?

Olikan said:
Jaguar will have 15% more IPC, and (almost) double the FPU right?

IPC = "big fat cores" my @T$

Does anyone know the benchmarks used to produce their IPC estimates?
