
The technology of AMD's Jaguar

Page 3 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.
Jaguar will be a speedy little chip 🙂. I wonder how an 18W Jaguar will compare to an 18W Trinity; should be interesting.
 
Jaguar will be a speedy little chip 🙂. I wonder how an 18W Jaguar will compare to an 18W Trinity; should be interesting.

I think a 15W Jaguar with 4 cores @ 2GHz or so will beat what a Trinity @ 17W can do,
simply because Jaguar was designed for the low-power envelope, and because of the node difference (28nm vs 32nm).
 
IPC = "big fat cores" my @T$
That's partly because they made Trinity's parent a frequency monster, and partly because Bobcat and now Jaguar aren't that.

Both their high end and low end are now made to be "enough," rather than "the best," at least per-core. The high end was executed somewhat poorly, however, while the low end was executed extremely well.

BD ought to have been able to take advantage of its big fatness to get near peak IPC fairly often, instead of long delays everywhere, had it been made for decent speeds (excepting some code that might only be able to get 1-2 instructions/cycle through the shared decoder--x86 VLE and all that).
 
Phenom 1 my friend, Phenom 1.
Phenom I had a crippling flaw: its top clock speeds were terrible. I'm sure Steamroller will have its own set of issues, but it'd be pretty hard to top Phenom I in the "flop category."
 
Phenom I had a crippling flaw: its top clock speeds were terrible. I'm sure Steamroller will have its own set of issues, but it'd be pretty hard to top Phenom I in the "flop category."

Well, that's true...
usually the first chips of new architectures are the most problematic...
...PhI, BD, P4, Fermi, Cell... and so on...
 
It's obvious that the SR core will be the best "Bulldozer" when it launches. What we don't know is how much faster it will be. If they get a 15-20% IPC jump and a 10% clock jump vs the FX-8150, then I would consider it a job well done.
 
It's obvious that the SR core will be the best "Bulldozer" when it launches. What we don't know is how much faster it will be. If they get a 15-20% IPC jump and a 10% clock jump vs the FX-8150, then I would consider it a job well done.

Aren't there rumors that Steamroller isn't going to be released on desktop, only mobile and server?
 
Does anyone know what the size of the chip will be and how beefy the graphics are? This is the most interesting AMD product since Bobcat and I'm wondering if anyone has a bit more info. One of these little guys in a slim design laptop might sell quite well.
 
Aren't there rumors that Steamroller isn't going to be released on desktop, only mobile and server?
Those rumors are started by morons that don't understand that the chips that don't make server-grade qualification are binned as desktop chips. AMD isn't going to suddenly start throwing away less-than-perfect but still fully functional chips.
 
Does anyone know what the size of the chip will be and how beefy the graphics are? This is the most interesting AMD product since Bobcat and I'm wondering if anyone has a bit more info. One of these little guys in a slim design laptop might sell quite well.

http://www.semiaccurate.com/forums/showpost.php?p=167771&postcount=84

APUs have lower transistor density than GPUs.
380M transistors ÷ 75 mm² = 5.06M transistors/mm²
http://www.chip-architect.com/news/A...eview_Atom.jpg
Bobcat ~75mm²
The CPU part is 3+3+4.6+4.6 = 15.2mm² (cores and cache)
75-15.2 = 59.8mm² for the IGP + memory controller
The discrete equivalent of this IGP is the Radeon HD 7350, and its size is 59mm² on the 40nm process, so the GPU part has the same density.
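For anyone who wants to re-run the arithmetic in the quoted post, here is a minimal Python sketch. The numbers are the ones given in the post (380M transistors, 75mm² die, CPU blocks of 3, 3, 4.6 and 4.6mm²); the function and constant names are made up for illustration and don't come from any AMD material.

```python
# Figures taken from the quoted post; names are hypothetical helpers.
BRAZOS_TRANSISTORS_M = 380.0            # millions of transistors
BRAZOS_DIE_MM2 = 75.0                   # whole die
CPU_BLOCKS_MM2 = [3.0, 3.0, 4.6, 4.6]   # cores and cache, per the post


def density_m_per_mm2(transistors_m, area_mm2):
    """Average transistor density in millions of transistors per mm²."""
    return transistors_m / area_mm2


def non_cpu_area_mm2(die_mm2, cpu_blocks_mm2):
    """Die area left for the IGP, memory controller and everything else."""
    return die_mm2 - sum(cpu_blocks_mm2)


if __name__ == "__main__":
    print(f"density: {density_m_per_mm2(BRAZOS_TRANSISTORS_M, BRAZOS_DIE_MM2):.2f} M/mm²")
    print(f"CPU area: {sum(CPU_BLOCKS_MM2):.1f} mm²")
    print(f"non-CPU area: {non_cpu_area_mm2(BRAZOS_DIE_MM2, CPU_BLOCKS_MM2):.1f} mm²")
```

Note that this is an average over the whole die, so it says nothing about how density differs between the CPU and GPU blocks.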

Even if transistor density doubled for going from 40 nm to 28 nm (actually the more realistic scaling would be 1.42x), it still wouldn't be in the same ball park as Cape Verde and Pitcairn, which have transistor densities of >12M transistors/mm².
Actually, moving from 40nm to 28nm, transistor density doubled in GPUs.
Turks 118mm² 716 million Transistors
Cape Verde 123mm² 1500 million Transistors
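The Turks and Cape Verde figures above can be checked with a few lines of Python. This is only a sketch using the numbers as posted; the "ideal" scaling line assumes density goes with the square of the node-name ratio, which is a simplification since node names are not exact feature sizes. Under that assumption, the 1.42x figure in the quote is close to the linear ratio (40/28), while the area ratio is its square.

```python
def density_m_per_mm2(transistors_m, area_mm2):
    """Average transistor density in millions of transistors per mm²."""
    return transistors_m / area_mm2


# Figures as posted in the thread.
turks_density = density_m_per_mm2(716.0, 118.0)         # 40nm
cape_verde_density = density_m_per_mm2(1500.0, 123.0)   # 28nm

observed_ratio = cape_verde_density / turks_density

# Assumption: node names taken at face value.
linear_ratio = 40.0 / 28.0             # shrink along one dimension
ideal_area_ratio = linear_ratio ** 2   # density scales with area

if __name__ == "__main__":
    print(f"Turks:      {turks_density:.2f} M/mm²")
    print(f"Cape Verde: {cape_verde_density:.2f} M/mm²")
    print(f"observed density ratio: {observed_ratio:.2f}x")
    print(f"linear ratio:           {linear_ratio:.2f}x")
    print(f"ideal area ratio:       {ideal_area_ratio:.2f}x")
```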

Back to Jaguar die size:
Radeon HD 7470 is 67mm² with 370 million transistors (configuration: 160:8:4; 64-bit).
So a GCN chip with ~700 million transistors should be around the same size, but 2 CUs in my opinion shouldn't be more than 500-550 million transistors, so I think the size would be ~55-60mm² for IGP + memory controller.
Let's add 4 cores: 4 x 3.1mm² + 4 x 3mm² (cache should be smaller, because this value is for 40nm and not for 28nm), and this adds up to 24.4mm².
My final estimate for the Jaguar APU is ~79.4-84.4mm². At worst I don't think it would be more than 90mm².
Considering you get more than double the CPU and IGP power, I think this die size is very nice, at least if AMD's estimate is correct this time.
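The estimate above boils down to a small sum, sketched here in Python. All inputs are the quoted poster's guesses (3.1mm² per core, 3mm² of cache per core, 55-60mm² for IGP + memory controller), not AMD figures, and the function name is hypothetical.

```python
def jaguar_die_estimate_mm2(cores=4, core_mm2=3.1, cache_mm2=3.0,
                            uncore_range_mm2=(55.0, 60.0)):
    """Return (cpu_area, total_low, total_high) in mm².

    Defaults are the quoted poster's guesses, not published numbers.
    """
    cpu_area = cores * (core_mm2 + cache_mm2)
    low, high = uncore_range_mm2
    return cpu_area, cpu_area + low, cpu_area + high


if __name__ == "__main__":
    cpu, total_low, total_high = jaguar_die_estimate_mm2()
    print(f"CPU cores + cache: {cpu:.1f} mm²")
    print(f"whole APU: {total_low:.1f} to {total_high:.1f} mm²")
```

Changing `uncore_range_mm2` is the quickest way to see how much the total depends on the GPU guess, which is the largest and least certain term.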
The poster quoted above probably knows better than I do, and he's guessing it's gonna be around 79-85mm^2,
which is slightly bigger than the 75mm^2 of the current Brazos chips (the E-350/E-450).

However:

Considering you get more than double the CPU and IGP power, I think this die size is very nice

It's gonna be nice for laptops 🙂


From AnandTech:
http://www.anandtech.com/show/5491/amds-2012-2013-client-cpugpuapu-roadmap-revealed

Kabini and Temash will also integrate the Fusion Controller Hub (FCH, aka South Bridge) making these two APUs AMD's first true single-chip solutions.

This is bound to make them more energy efficient than the Bobcat systems too,
since the FCH will no longer be on a separate chip (one built on a larger, older node).
 
Back to Jaguar die size:
Radeon HD 7470 is 67mm² with 370 million transistors (configuration: 160:8:4; 64-bit).

7470 isn't GCN. AFAIK the upcoming 28nm low end APUs will feature GCN architecture. Considering there currently are no low-shader count GCN GPUs, whether APU or discrete, it's difficult to judge just what it's going to look like. If AMD is following their Trinity/Llano trend then the GPU might take up even more of the entire die space. How many CUs will the chips have? He's saying 2, but I'm not sure where he's getting that from.

Let's add 4 cores 4x 3.1mm² + 4x3mm²(cache should be smaller because this value is for 40nm and not for 28nm)

2 ALUs to 1 FPU? Instruction sets? Wider pipeline? These are all variables that make a huge difference in overall core size (and the size of the GPU even more so). What we do know is that it's 2MB shared (likely dynamically, like Steamroller will feature?). The 512KB per core shouldn't change, but the 28nm bulk shrink favors cache shrinks because they shrink linearly.

Does anyone have any AMD info regarding the above points?
 
ALU width is the same
AGU width is the same
FPU necessary items have been changed from 64-bit to 128-bit.

Other than the items required to keep the FPU from being bottlenecked, 90% of Bobcat is in Jaguar.

With everything provided, Jaguar should be no larger than 100 mm².
 
How many CUs will the chips have? He's saying 2, but I'm not sure where he's getting that from.
It's an educated guess, based on 1 CU being too small an upgrade in terms of performance,
and 4 CUs being too big and probably memory-bandwidth starved.

7470 isn't GCN.
Yeah, but what then? Compare it to a 7770 with its 10 CUs?
That's like 123mm^2 / 5 = ~24.6mm^2 for the GPU portion?

The reason he used the 7470 was to illustrate transistors per die area @ 40nm.
(370 million transistors @ 40nm vs ~550 million @ 28nm for 2 CUs = roughly the same space taken up)
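The two ways of guessing the GPU area mentioned in this post can be put side by side. This is a rough Python sketch with hypothetical helper names: one scales Cape Verde's die by CU count, the other divides the guessed 2-CU transistor budget (500-550 million, the quoted poster's guess) by Cape Verde's average density. Both are crude, since Cape Verde's 123mm² also includes its memory controllers, display logic and so on, not just CUs.

```python
# Cape Verde (HD 7770) figures as posted in the thread.
CAPE_VERDE_MM2 = 123.0
CAPE_VERDE_CUS = 10
CAPE_VERDE_TRANSISTORS_M = 1500.0


def area_by_cu_share(cus):
    """Naive estimate: take the same fraction of the die as the CU fraction."""
    return CAPE_VERDE_MM2 * cus / CAPE_VERDE_CUS


def area_by_density(transistors_m):
    """Estimate from a transistor budget at Cape Verde's average density."""
    density = CAPE_VERDE_TRANSISTORS_M / CAPE_VERDE_MM2
    return transistors_m / density


if __name__ == "__main__":
    print(f"2 CUs by CU share:    {area_by_cu_share(2):.1f} mm²")
    print(f"500M by 28nm density: {area_by_density(500.0):.1f} mm²")
    print(f"550M by 28nm density: {area_by_density(550.0):.1f} mm²")
```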


I think he was being conservative when he said 79-85mm^2 for the entire chip.
Worst case, it ends up being around 90mm^2.

I think it'll be slightly smaller, probably closer to 79mm^2 than to 90mm^2.
 
Wouldn't surprise me if 10+ hour battery life became the norm next year.

Frankly, if I can carry an 11.6" and play TF2 on max settings, I'd be happy.
 
It seems that Jaguar could work very well in servers. If a 16 core chip with 128bit memory interface and a big L3 cache could be made, imagine 2 of these on one die. It should totally smoke current 16-core server Bulldozers in parallel applications and wouldn't be that far behind in serial because of the better IPC.
 
It seems that Jaguar could work very well in servers. If a 16 core chip with 128bit memory interface and a big L3 cache could be made, imagine 2 of these on one die. It should totally smoke current 16-core server Bulldozers in parallel applications and wouldn't be that far behind in serial because of the better IPC.
With enough cache, it probably could be competitive with the low-2GHz SKUs, sadly; but that would mean 4 of them, not 2, and about as much cache as BD, too. I seriously doubt, even if it could reach 3GHz and beyond, that performance would scale.
 
Yeah, but what then? Compare it to a 7770 with its 10 CUs?
That's like 123mm^2 / 5 = ~24.6mm^2 for the GPU portion?

Look at the Bobcat die shot:

ontario_vs_atom.jpg


That's a huuuuuuge GPU. We also know nothing about the GPU in the new 28nm APUs other than that they're GCN, so you've got more than 50% of the die missing. Just how can you make an educated guess with more than 50% of the die being an unknown variable?

I didn't ask because I couldn't pull a number out of my butt; I can. I have many numbers in my butt and pull them out quite liberally (admittedly it can be a pretty messy process). I asked because I was hoping AMD had a slide or document presented at hot chips that pertained to the die size instead of having to take a complete shot in the dark... or reach in deep and pull out slowly. Whichever you prefer.
 
I've finally gone over more of it, and I find it interesting that they mention several D$, and particularly D$TLB, improvements; and reworked L/S. From other CPUs, I expected some of the times Bobcat choked had to do with I$/I$TLB, but maybe it was data address walks or LSU all along. Faster or concurrent (just says, "enhanced") PT walks sure can't hurt, whether they were a major bottleneck or not, on anything x86, though.
 