Observations with an FX-8350


Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
Your Super PI times look similar to my 955's when I first got it. I found out it wouldn't ramp up right away and stay at full speed for the entire run. I used a program to set the load threshold and my times at 1M went from 23-24 to 18.3 seconds. By comparison my 8M time is 3m 52.08s. I don't think Super PI loads the CPU enough to keep it at full speed.

I track clockspeed with AI Suite during the runs, and have Turbo Core disabled. For my Super PI scores the cores were running at 4012 MHz (20 x 200.6).

Whatever AMD did to the piledriver microarchitecture to boost clockspeeds it really came at a sacrifice to the IPC of whatever instructions are involved in the super-PI application.
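
For reference, the clock figure quoted above is just multiplier times reference clock. A minimal sketch of that arithmetic (the function name is made up for illustration; the 20 and 200.6 come from the post):

```python
def core_clock_mhz(multiplier: float, ref_clock_mhz: float) -> float:
    """Core clock in MHz as CPU multiplier times reference clock."""
    return multiplier * ref_clock_mhz


# Values from the post: 20x multiplier, 200.6 MHz reference clock.
clock = core_clock_mhz(20, 200.6)
print(f"{clock:.0f} MHz = {clock / 1000:.3f} GHz")
```

That works out to roughly 4.0 GHz, which matches the stock FX-8350 base clock with Turbo Core off.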
 

Hitman928

Diamond Member
Apr 15, 2012
5,324
8,009
136
I track clockspeed with AI Suite during the runs, and have Turbo Core disabled. For my Super PI scores the cores were running at 4012 MHz (20 x 200.6).

Whatever AMD did to the piledriver microarchitecture to boost clockspeeds it really came at a sacrifice to the IPC of whatever instructions are involved in the super-PI application.

It's the x87 instruction set IIRC and AMD took out native support for space reasons as x87 is pretty much irrelevant today.
 

bononos

Diamond Member
Aug 21, 2011
3,889
158
106
It's the x87 instruction set IIRC and AMD took out native support for space reasons as x87 is pretty much irrelevant today.

AMD is at a disadvantage here. They have to integrate/emulate both their own stuff (3DNow!) and Intel's instructions, which are the de facto standard.
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
It's the x87 instruction set IIRC and AMD took out native support for space reasons as x87 is pretty much irrelevant today.

I have no issue believing this to be the case, but damn, that had to save them how much die-area? Maybe 0.1mm^2? The x87 instruction set itself is rather darn small, so small in fact that even back in the time of 486 processors it was viable to incorporate it onto the die.

Removing it now has got to be the equivalent of writing this post but leaving off the period at the end of the sentence just for the sake of saving the effort of pressing one more key

So it's being emulated using microcode?

If it is then that is some darn impressive emulation considering performance is just lowly, not miserable or abysmal.
 

Abwx

Lifer
Apr 2, 2011
10,967
3,488
136
AMD is at a disadvantage here. They have to integrate/emulate both their own stuff (3DNow!) and Intel's instructions, which are the de facto standard.

Current AMD CPUs support more Intel instructions than Intel's own processors, including IBs... :biggrin:
 

wlee15

Senior member
Jan 7, 2009
313
31
91
It's the x87 instruction set IIRC and AMD took out native support for space reasons as x87 is pretty much irrelevant today.

Even a cursory look at the instruction tables in either the official software optimization guide or Agner Fog's guide would tell you that's not true at all.
 

Abwx

Lifer
Apr 2, 2011
10,967
3,488
136
Yes, particularly in floating point, where current software doesn't reach even half of its peak throughput...
 

Hitman928

Diamond Member
Apr 15, 2012
5,324
8,009
136
Even a cursory look at the instruction tables in either the official software optimization guide or Agner Fog's guide would tell you that's not true at all.

As I mentioned before, I was running mostly off of memory, so I decided to look it up in AMD's optimization guide for 15h (a quick google search will find it). In there I found hardly any references to x87 (AMD is really wanting people to get away from it) but here's what I found (granted, I'm no x86 expert so if you happen to have a deeper knowledge, then feel free to share).

First, AMD changed the make-up of their FPU. This is pretty much already known. Now, rather than having a 64-128 bit wide data path for FADD and FMUL, everything is handled by what AMD is calling the 128-bit FMAC. Fused multiply-accumulate is a more modern way of handling floating-point math. Now, according to AMD, x87 FADDs and FMULs are handled by the FMAC. Given AMD's apparent extreme aversion to x87 for 15h and beyond, it seems to me that the "FMAC" units can run the x87 instruction set, but do so horribly inefficiently because that is not what they are intended to do. They basically have to do a lot of manipulation of the registers and data to "massage" it through the FPU. For instance, if you look at this small chunk of x87 instructions:

qev7B.png


You can see the instructions going through the FMAC. Also of note, if you look at the bottom instruction, the latency of this instruction (far right number) more than doubles from K10 (114). Across most of the more complex instructions I looked through, the latency took a dramatic hit from K10.

So, perhaps emulation wasn't the right word (though I'm still not convinced that at least some of the instructions aren't done through emulation), but to me it is pretty clear that AMD's new fpu can do x87 for compatibility reasons only and did not give it much dedicated resources for fast execution.
 

Hitman928

Diamond Member
Apr 15, 2012
5,324
8,009
136
I have no issue believing this to be the case, but damn, that had to save them how much die-area? Maybe 0.1mm^2? The x87 instruction set itself is rather darn small, so small in fact that even back in the time of 486 processors it was viable to incorporate it onto the die.

Removing it now has got to be the equivalent of writing this post but leaving off the period at the end of the sentence just for the sake of saving the effort of pressing one more key

I think it is one piece of a larger movement. 3dnow is also on the chopping block (or was it cut in Bulldozer?) and one MMX pipe will be shared between two steamroller cores (rather than one per core). I think AMD is trying to make a move to 'trim the fat' to not only cut down on die space, but also get people to move to more modern instructions which AMD's latest chips are more designed to run.
 

SPBHM

Diamond Member
Sep 12, 2012
5,056
409
126
Thanks again for all the testing. The Super PI result is quite interesting: the sharing of resources doesn't seem to have a big effect on it. I expected it to be a lot more significant when I asked for it.
To see how it works with HT, I tested here with 2M on an i3 (two Super PI 2M runs at the same time): around 31s with 2C, 40s with 1C/2T, and 50s with 1C/1T.
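
A rough way to read those i3 timings as scaling figures. This is only a sketch: it assumes both Super PI runs finish at about the same time in each configuration, it uses 1C/1T as the baseline, and the function name is made up for illustration. Only the three timings come from the post.

```python
# Wall time in seconds for two simultaneous Super PI 2M runs on the i3,
# as reported in the post above.
times = {"2C": 31.0, "1C/2T": 40.0, "1C/1T": 50.0}


def relative_throughput(times_s: dict, baseline: str) -> dict:
    """Throughput of each configuration relative to the baseline.

    The amount of work is the same everywhere (two runs), so relative
    throughput is simply baseline time divided by configuration time.
    """
    base = times_s[baseline]
    return {cfg: base / t for cfg, t in times_s.items()}


scaling = relative_throughput(times, "1C/1T")
for cfg, factor in scaling.items():
    print(f"{cfg}: {factor:.2f}x")
```

On these numbers Hyper-Threading on one core gives about 1.25x the throughput of a single thread, while a second physical core gives about 1.61x.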
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
Thanks again for all the testing. The Super PI result is quite interesting: the sharing of resources doesn't seem to have a big effect on it. I expected it to be a lot more significant when I asked for it.
To see how it works with HT, I tested here with 2M on an i3 (two Super PI 2M runs at the same time): around 31s with 2C, 40s with 1C/2T, and 50s with 1C/1T.

There really is no question, overall comparing AMD's current implementation of CMT to that of Intel's SMT (hyperthreading), the CMT approach is superior in terms of performance scaling.

Where it falls flat is in the marketing claims of "cores", Intel rightly avoids setting consumer expectations to expect their 4C/8T processors as being 8C processors (so the glass is always half-full from the consumer perspective - you bought 4 cores but you get a little more than that) whereas AMD sets themselves up for the consumer to have expectations that simply can never be met because at best their 8-core processor is going to perform like a 6-7 core processor (so the glass is always half-empty from the consumer perspective - you bought 8 cores but you always get a bit less than that).

That is what I call dumb marketing because you know in advance that you are not going to fool anyone for very long, eventually everyone will be on to you and they'll be irritated for having been deceived. Not that AMD is lying, of course there are technically 8 cores there, but expectations are set on the basis of technicalities that are then not achieved, and customer perception becomes one of buyer's remorse at that point. It takes dumb marketing to invite that sort of customer perception, especially when they would have known well in advance that it was going to be an issue.

Look at how ARM is handling it: they could call a 4+4 big.LITTLE chip "8 cores of power! It goes to 11!" or they could rightly set customers' expectations such that the customer knows in advance the product has 4 big cores and 4 little cores. Yes, there are 8 cores in there, but no one is trying to pull the wool over anyone's eyes; it is accepted that 4 of those cores are going to be lesser-performing siblings compared to the other 4 cores.

And that is what the FX-8350 is: a 4x4 x86 big.LITTLE-style microarchitecture of sorts. If you power up all four modules and use the chip as a quad-core you get really good performance scaling, and as a quad-core the overall performance is not bad. If you go power-miser style and power up just two modules, power-gating the other two, then your four threads still get decent performance but at a reduced power footprint.

BenchwellCMTTaxat4GHz.png


The other thing that unfortunately undermines the credibility of AMD's CMT approach is that the overall performance per core is lower than that of their previous cores (as well as the competition) which is not the fault of CMT itself per se. AMD really made some unfortunate design choices with bulldozer/piledriver versus what they had going for them with thuban at least.

Not only is the performance lacking, but the core size itself is also rather large. In other words they win on exactly no fronts with their current implementation - die size comes out too large, performance comes out too low, and power comes out too high. It is the perfect design if your goal was to bankrupt the company :(

And it didn't have to be this way, that is the real kicker.
 

SlowSpyder

Lifer
Jan 12, 2005
17,305
1,001
126
I wonder how many millions of dollars went into R&D for Bulldozer/Piledriver. I am not an expert by any means, but my semi-uneducated opinion is that AMD could have done so much better shrinking and tweaking Thuban vs. what they have out now. I bet the R&D dollars to do that would have been much less than what was sunk into Bulldozer/Piledriver, also.
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
Not only is the performance lacking, but the core size itself is also rather large. In other words they win on exactly no fronts with their current implementation - die size comes out too large, performance comes out too low, and power comes out too high. It is the perfect design if your goal was to bankrupt the company :(

And it didn't have to be this way, that is the real kicker.

I have said it before, AMD is one+ year behind Intel in manufacturing process. Piledriver is faster in 99% of MT applications against the 8-threaded Sandy Bridge Core i7 2600K and 2700K.

http://www.tomshardware.com/reviews/cpu-performance-comparison,3370-14.html

http://www.tomshardware.com/reviews/cpu-performance-comparison,3370-15.html

If AMD FX8350 was launched in 2011 things would be way different. The design is fine, CMT is working as it should, they are out of phase in product releases.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
I wonder how many millions of dollars went into R&D for Bulldozer/Piledriver. I am not an expert by any means, but my semi-uneducated opinion is that AMD could have done so much better shrinking and tweaking Thuban vs. what they have out now. I bet the R&D dollars to do that would have been much less than what was sunk into Bulldozer/Piledriver, also.

Hard to say. But I guess we are talking at least $2-3B.

It's a lose/lose situation though, because even a tweaked Thuban wouldn't be enough to combat Intel.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
Piledriver is faster in 99% of MT applications against the 8-threaded Sandy Bridge Core i7 2600K and 2700K.

That's completely wrong. Piledriver is only faster in a small range of highly scalable applications with high scaling efficiency per thread. In other words, applications that contain nearly no single-threaded code. Add something as simple as 10-25% single-threaded code and Piledriver is far from good again. And the big problem for Piledriver is that what you call 99% is more like 0.1% of all multithreaded applications.
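
The serial-code point can be illustrated with Amdahl's law. This is a deliberately idealized sketch, not a model of either chip: it treats every thread as a full, equal core and ignores CMT module sharing, Hyper-Threading, and clock speed differences. The 8-versus-4 comparison and the function name are assumptions picked only to show how a thread-count advantage shrinks as the serial fraction grows.

```python
def amdahl_speedup(serial_fraction: float, threads: int) -> float:
    """Ideal speedup over one thread when serial_fraction of the work
    cannot be parallelized (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / threads)


# Serial fractions from the post (10% and 25%), plus 0% as the ideal case.
for s in (0.0, 0.10, 0.25):
    eight = amdahl_speedup(s, 8)
    four = amdahl_speedup(s, 4)
    print(f"serial {s:.0%}: 8 threads {eight:.2f}x, "
          f"4 threads {four:.2f}x, ratio {eight / four:.2f}")
```

With no serial code, eight equal threads are worth 2x four. At 10% serial code that advantage falls to roughly 1.5x, and at 25% to roughly 1.3x, before any per-core performance difference is taken into account.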

An example of a highly multithreaded application. But with a small amount of serial code:
res_syn_3dmark_physics.png

res_syn_fritz.png


And remember the benchmark you linked uses 1866MHz memory for Piledriver and 1333MHz for SB. So if the benchmark is heavily memory-bandwidth dependent, then your benchmarks are skewed as well.

I have said it before, AMD is one+ year behind Intel in manufacturing process.

Even comparing 32nm vs 32nm shows how grim it is for AMD. We even have 45nm Intel chips that can easily compete with Piledriver.
 

Puppies04

Diamond Member
Apr 25, 2011
5,909
17
76
I have said it before, AMD is one+ year behind Intel in manufacturing process. Piledriver is faster in 99% of MT applications against the 8-threaded Sandy Bridge Core i7 2600K and 2700K.

http://www.tomshardware.com/reviews/cpu-performance-comparison,3370-14.html

http://www.tomshardware.com/reviews/cpu-performance-comparison,3370-15.html

If AMD FX8350 was launched in 2011 things would be way different. The design is fine, CMT is working as it should, they are out of phase in product releases.


Your fanboyism frankly amazes me at times. If you are going to claim AMD is "X" amount of time behind Intel, let's compare it to the last Intel chip developed before Piledriver was released, namely the 3770K. In both your links the 3770K beats the 8350 in every single benchmark, so what was the point you were trying to prove again?


Edit: Also, the 2600K/2700K (near as damn it a 2-year-old architecture) beats the 8350 in almost every single benchmark, often by large margins, and when they do lose out it is by a small amount.
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
Your fanboyism frankly amazes me at times. If you are going to claim AMD is "X" amount of time behind Intel, let's compare it to the last Intel chip developed before Piledriver was released, namely the 3770K. In both your links the 3770K beats the 8350 in every single benchmark, so what was the point you were trying to prove again?

Your lack of knowledge doesn't amaze me at all. I was comparing 32nm products, but you always see fanboys. Next time try to read first, you might learn a thing or two.


Edit: Also, the 2600K/2700K (near as damn it a 2-year-old architecture) beats the 8350 in almost every single benchmark, often by large margins, and when they do lose out it is by a small amount.

The 2600K/2700K lose in 99% of highly MT applications, even in Cinebench. Bulldozer was designed for MT loads, not single thread, and at the same process node it's doing exactly that: being faster than Sandy Bridge.
Now you may not like the idea of an AMD product that is faster than Intel, but that's how it is. As I have said before, the problem is in the timing of the product release and manufacturing, not in the design.
 

ozzy702

Golden Member
Nov 1, 2011
1,151
530
136
Blah blah blah blah blah

You really are missing the point. Piledriver beats the 2600K by the smallest of margins in best-case scenarios, and only because of the 0.6GHz advantage it has. 2 Piledriver "cores" = 1 hyperthreaded i7 core. Only in the most optimal of apps will Piledriver shine, and even then that's without taking into account the extreme power consumption vs SB, let alone IB.

In the short term AMD chips still show value but over time they not only have worse performance but will cost you more when the cost of power is factored.
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
You really are missing the point. Piledriver beats the 2600K by the smallest of margins in best-case scenarios, and only because of the 0.6GHz advantage it has. 2 Piledriver "cores" = 1 hyperthreaded i7 core. Only in the most optimal of apps will Piledriver shine, and even then that's without taking into account the extreme power consumption vs SB, let alone IB.

In the short term AMD chips still show value but over time they not only have worse performance but will cost you more when the cost of power is factored.

This is not the best-case scenario, it is the only scenario I was talking about. I have clearly stated from my first post that I was talking about the MT scenario only. I haven't talked about power consumption or anything else; I was purely talking about MT performance, the reason for the Bulldozer (CMT) design. In those applications the FX-8350 is faster than Sandy Bridge, and if Piledriver had been released in 2011 things would have been different.

PS: CPU frequency is irrelevant. We don't have a low-frequency, high-IPC contest; we are simply evaluating MT performance.
 

2is

Diamond Member
Apr 8, 2012
4,281
131
106
AMD is better than Intel in MT apps 99% of the time only if you use the 1% of MT apps out there that are actually better on AMD, 99% of the time. In other words, Intel > AMD in every metric.
 

ozzy702

Golden Member
Nov 1, 2011
1,151
530
136
This is not the best-case scenario, it is the only scenario I was talking about. I have clearly stated from my first post that I was talking about the MT scenario only. I haven't talked about power consumption or anything else; I was purely talking about MT performance, the reason for the Bulldozer (CMT) design. In those applications the FX-8350 is faster than Sandy Bridge, and if Piledriver had been released in 2011 things would have been different.

PS: CPU frequency is irrelevant. We don't have a low-frequency, high-IPC contest; we are simply evaluating MT performance.

CPU frequency most certainly is relevant, especially given chips that overclock to roughly the same frequency. AMD putting the 8350 at 4GHz is their attempt to hotrod the chip as much as possible, whereas the 2600K at 3.4GHz is a very, very conservative effort on Intel's part. You're arguing the MT performance of an architecture, which you simply cannot do without discussing frequencies.

You're looking at the 0.1% of the time that Piledriver beats two-year-old i7 tech and calling it a win. Bulldozer and Piledriver are both poor implementations of a great idea.