Observations with an FX-8350

Idontcare · Dec 24, 2012

hdfxst said:
Your Super PI times look similar to my 955's when I first got it. I found out it wouldn't ramp up right away and stay at full speed for the entire run. I used a program to set the load threshold, and my 1M times went from 23-24 seconds to 18.3 seconds. By comparison, my 8M time is 3m 52.08s. I don't think Super PI loads the CPU enough to keep it at full speed.

I track clockspeed with AI Suite during the runs, and have turbo-core disabled. For my Super PI scores the cores were running at 4012 MHz (20x200.6).
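As a quick sanity check on that figure, the core clock is just the multiplier times the reference clock. A minimal sketch of the arithmetic follows; the function name is only for illustration and has nothing to do with AI Suite itself.

```python
def core_clock_mhz(multiplier: float, ref_clock_mhz: float) -> float:
    """Effective core clock in MHz = multiplier x reference clock."""
    return multiplier * ref_clock_mhz

# Values from the post: 20x multiplier, 200.6 MHz reference clock.
clock = core_clock_mhz(20, 200.6)
print(f"{clock:.0f} MHz = {clock / 1000:.3f} GHz")
```

So a 20x multiplier on a 200.6 MHz reference works out to roughly 4.0 GHz.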

Whatever AMD did to the piledriver microarchitecture to boost clockspeeds it really came at a sacrifice to the IPC of whatever instructions are involved in the super-PI application.

Hitman928 · Dec 24, 2012

Idontcare said:
I track clockspeed with AI Suite during the runs, and have turbo-core disabled. For my Super PI scores the cores were running at 4012 MHz (20x200.6).

Whatever AMD did to the piledriver microarchitecture to boost clockspeeds it really came at a sacrifice to the IPC of whatever instructions are involved in the super-PI application.

It's the x87 instruction set IIRC and AMD took out native support for space reasons as x87 is pretty much irrelevant today.

grimpr · Dec 24, 2012

Hitman928 said:
It's the x87 instruction set IIRC and AMD took out native support for space reasons as x87 is pretty much irrelevant today.

This. Soon the world's best pi-cruncher, y-cruncher, will be out with FMA4 and XOP support; x87 is obsolete.

http://www.numberworld.org/y-cruncher/

Ajay · Dec 24, 2012

Hitman928 said:
It's the x87 instruction set IIRC and AMD took out native support for space reasons as x87 is pretty much irrelevant today.

So it's being emulated using microcode?

bononos · Dec 24, 2012

Hitman928 said:
It's the x87 instruction set IIRC and AMD took out native support for space reasons as x87 is pretty much irrelevant today.

AMD is at a disadvantage here. They have to integrate/emulate both their own stuff (3DNow!) and Intel's instructions, which are the de facto standard.

Idontcare · Dec 24, 2012

Hitman928 said:
It's the x87 instruction set IIRC and AMD took out native support for space reasons as x87 is pretty much irrelevant today.

I have no issue believing this to be the case, but damn, that had to save them how much die-area? Maybe 0.1mm^2? The x87 instruction set itself is rather darn small, so small in fact that even back in the time of 486 processors it was viable to incorporate it onto the die.

Removing it now has got to be the equivalent of writing this post but leaving off the period at the end of the sentence just for the sake of saving the effort of pressing one more key

Ajay said:
So it's being emulated using microcode?

If it is then that is some darn impressive emulation considering performance is just lowly, not miserable or abysmal.

Abwx · Dec 24, 2012

bononos said:
AMD is at a disadvantage here. They have to integrate/emulate both their own stuff (3DNow!) and Intel's instructions, which are the de facto standard.

Current AMD CPUs support more Intel instructions than Intel's own processors, including IBs... :biggrin:

wlee15 · Dec 24, 2012

Hitman928 said:
It's the x87 instruction set IIRC and AMD took out native support for space reasons as x87 is pretty much irrelevant today.

Even a cursory look at the instruction tables in either the official software optimization guide or Agner Fog's guide would tell you that's not true at all.

Arkaign · Dec 24, 2012

Abwx said:
Current AMD CPUs support more Intel instructions than Intel's own processors, including IBs... :biggrin:

That must be why their performance is so spectacular!

Abwx · Dec 24, 2012

Yes, particularly in floating point, where current software doesn't reach even half of its peak throughput...

gigatexal · Dec 25, 2012

hmm thats a lot of power draw

Hitman928 · Dec 25, 2012

wlee15 said:
Even a cursory look at the instruction tables in either the official software optimization guide or Agner Fog's guide would tell you that's not true at all.

As I mentioned before, I was running mostly off of memory, so I decided to look it up in AMD's optimization guide for 15h (a quick google search will find it). In there I found hardly any references to x87 (AMD is really wanting people to get away from it) but here's what I found (granted, I'm no x86 expert so if you happen to have a deeper knowledge, then feel free to share).

First, AMD changed the make-up of their FPU. This is pretty much already known. Now, rather than having a 64-128 bit wide data path for FADD and FMUL, everything is handled by what AMD is calling the 128-bit FMAC. The FMAC instruction is a more modern way of handling FP math. Now, according to AMD, x87 FADDs and FMULs are handled by the FMAC. Given AMD's apparent extreme aversion to x87 for 15h and beyond, it seems to me that the "FMAC" units can run the x87 instruction set but do so horribly inefficiently, as that is not what they are intended to do. They basically have to do a lot of manipulation of the registers and data to "massage" it through the FPU. For instance, if you look at this small chunk of x87 instructions:

You can see the instructions going through the FMAC. Also of note, if you look at the bottom instruction, the latency of this instruction (far right number) more than doubles from K10 (114). Across most of the more complex instructions I looked through, the latency took a dramatic hit from K10.

So, perhaps emulation wasn't the right word (though I'm still not convinced that at least some of the instructions aren't done through emulation), but to me it is pretty clear that AMD's new fpu can do x87 for compatibility reasons only and did not give it much dedicated resources for fast execution.

Hitman928 · Dec 25, 2012

Idontcare said:
I have no issue believing this to be the case, but damn, that had to save them how much die-area? Maybe 0.1mm^2? The x87 instruction set itself is rather darn small, so small in fact that even back in the time of 486 processors it was viable to incorporate it onto the die.

Removing it now has got to be the equivalent of writing this post but leaving off the period at the end of the sentence just for the sake of saving the effort of pressing one more key

I think it is one piece of a larger movement. 3DNow! is also on the chopping block (or was it cut in Bulldozer?) and one MMX pipe will be shared between two Steamroller cores (rather than one per core). I think AMD is trying to 'trim the fat' to not only cut down on die space, but also get people to move to more modern instructions, which AMD's latest chips are better designed to run.

SPBHM · Dec 25, 2012

Thanks again for all the testing. The Super PI result is quite interesting: it seems the sharing of resources doesn't really have a big effect on it. I expected it to be a lot more significant when I asked for it.
To see how it works with HT, I tested here with 2M on an i3. It was around 31s for 2c, 40s for 1c/2t, and 50s for 1c/1t (two Super PI 2M runs at the same time).
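To put those three timings on one scale, here is a rough sketch that turns them into relative throughput. It assumes each figure is the time for the pair of Super PI 2M runs to finish in that configuration, which is one reading of the post; the dictionary and function names are only for illustration.

```python
# Times from the post above, in seconds, for two Super PI 2M runs on an i3.
# ASSUMPTION: each figure is the time for the pair of runs to complete.
times_s = {
    "1c/1t": 50.0,  # one core, one hardware thread
    "1c/2t": 40.0,  # one core, two hardware threads (HT)
    "2c": 31.0,     # two physical cores
}

def speedup(baseline_s: float, time_s: float) -> float:
    """Throughput relative to the baseline configuration."""
    return baseline_s / time_s

baseline = times_s["1c/1t"]
for config, t in times_s.items():
    print(f"{config}: {speedup(baseline, t):.2f}x")
```

Under that assumption, HT gives about a 1.25x gain over a single thread, while a second physical core gives about 1.61x.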

Idontcare · Dec 25, 2012

SPBHM said:
Thanks again for all the testing. The Super PI result is quite interesting: it seems the sharing of resources doesn't really have a big effect on it. I expected it to be a lot more significant when I asked for it.
To see how it works with HT, I tested here with 2M on an i3. It was around 31s for 2c, 40s for 1c/2t, and 50s for 1c/1t (two Super PI 2M runs at the same time).

There really is no question, overall comparing AMD's current implementation of CMT to that of Intel's SMT (hyperthreading), the CMT approach is superior in terms of performance scaling.

Where it falls flat is in the marketing claims of "cores", Intel rightly avoids setting consumer expectations to expect their 4C/8T processors as being 8C processors (so the glass is always half-full from the consumer perspective - you bought 4 cores but you get a little more than that) whereas AMD sets themselves up for the consumer to have expectations that simply can never be met because at best their 8-core processor is going to perform like a 6-7 core processor (so the glass is always half-empty from the consumer perspective - you bought 8 cores but you always get a bit less than that).

That is what I call dumb marketing because you know in advance that you are not going to fool anyone for very long, eventually everyone will be on to you and they'll be irritated for having been deceived. Not that AMD is lying, of course there are technically 8 cores there, but expectations are set on the basis of technicalities that are then not achieved, and customer perception becomes one of buyer's remorse at that point. It takes dumb marketing to invite that sort of customer perception, especially when they would have known well in advance that it was going to be an issue.

Look at how ARM is handling it: they could call a 4+4 big.LITTLE chip "8 cores of power! It goes to 11!" or they could rightly set customers' expectations such that the customer knows in advance the product has 4 big cores and 4 little cores. Yes, there are 8 cores in there, but no one is trying to pull the wool over anyone's eyes; it is accepted that 4 of those cores are going to be lesser-performing siblings compared to the other 4 cores.

And that is what the FX-8350 is: a 4x4 x86 big.LITTLE-style microarchitecture of sorts. If you power up all four modules and use the chip as a quad-core, you get really good performance scaling, and as a quad-core the overall performance is not bad. If you go power-miser style and power up just two modules, power-gating the other two, then your four threads still get decent performance but at a reduced power footprint.

The other thing that unfortunately undermines the credibility of AMD's CMT approach is that the overall performance per core is lower than that of their previous cores (as well as the competition) which is not the fault of CMT itself per se. AMD really made some unfortunate design choices with bulldozer/piledriver versus what they had going for them with thuban at least.

Not only is the performance lacking, but the core size itself is also rather large. In other words they win on exactly no fronts with their current implementation - die size comes out too large, performance comes out too low, and power comes out too high. It is the perfect design if your goal was to bankrupt the company

And it didn't have to be this way, that is the real kicker.

SlowSpyder · Dec 25, 2012

I wonder how many millions of dollars went into R&D for Bulldozer/Piledriver. I am not an expert by any means, but my semi-uneducated opinion is that AMD could have done so much better shrinking and tweaking Thuban vs. what they have out now. I bet the R&D dollars to do that would have been much less than what was sunk into Bulldozer/Piledriver, also.

AtenRa · Dec 25, 2012

Idontcare said:
Not only is the performance lacking, but the core size itself is also rather large. In other words they win on exactly no fronts with their current implementation - die size comes out too large, performance comes out too low, and power comes out too high. It is the perfect design if your goal was to bankrupt the company

And it didn't have to be this way, that is the real kicker.

I have said it before: AMD is a year or more behind Intel in manufacturing process. Piledriver is faster in 99% of MT applications against the 8-threaded Sandy Bridge Core i7 2600K and 2700K.

http://www.tomshardware.com/reviews/cpu-performance-comparison,3370-14.html

http://www.tomshardware.com/reviews/cpu-performance-comparison,3370-15.html

If the AMD FX-8350 had launched in 2011, things would be way different. The design is fine and CMT is working as it should; they are out of phase in product releases.

ShintaiDK · Dec 25, 2012

SlowSpyder said:
I wonder how many millions of dollars went into R&D for Bulldozer/Piledriver. I am not an expert by any means, but my semi-uneducated opinion is that AMD could have done so much better shrinking and tweaking Thuban vs. what they have out now. I bet the R&D dollars to do that would have been much less than what was sunk into Bulldozer/Piledriver, also.

Hard to say. But I guess we are talking at least $2-3B.

It's a lose/lose situation, though, because even a tweaked Thuban wouldn't be enough to combat Intel.

ShintaiDK · Dec 25, 2012

AtenRa said:
Piledriver is faster in 99% of MT applications against the 8-threaded Sandy Bridge Core i7 2600K and 2700K.

That's completely wrong. Piledriver is only faster in a small range of highly scalable applications with high scaling efficiency per thread. In other words, applications that contain nearly no single-threaded code. Add something as simple as 10-25% single-threaded code and Piledriver is far from good again. And the big problem for Piledriver is that what you call 99% is more like 0.1% of all multithreaded applications.
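The effect of a small serial portion can be sketched with Amdahl's law. This is the idealized textbook model, not a measurement of any chip: it ignores module resource sharing, turbo, and memory effects, and the 8-thread count is simply the FX-8350's thread count.

```python
def amdahl_speedup(serial_fraction: float, n_threads: int) -> float:
    """Amdahl's law: speedup over one thread when a fixed fraction of the work is serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_threads)

# 0% serial is the ideal case; 10% and 25% are the range mentioned above.
for serial in (0.0, 0.10, 0.25):
    print(f"serial={serial:.0%}: 8 threads -> {amdahl_speedup(serial, 8):.2f}x")
```

With 10% serial code the ideal 8x drops to about 4.71x, and with 25% it drops to about 2.91x, so per-thread speed starts to matter far more than thread count.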

An example of a highly multithreaded application. But with a small amount of serial code:

And remember, the benchmark you link uses 1866 MHz memory for Piledriver and 1333 MHz for SB. So if the benchmark is heavily memory-bandwidth dependent, then you screw up your benchmarks as well.

AtenRa said:
I have said it before: AMD is a year or more behind Intel in manufacturing process.

Even comparing 32nm vs. 32nm shows how grim it is for AMD. There are even 45nm Intel chips that can easily compete with Piledriver.

Puppies04 · Dec 25, 2012

AtenRa said:
I have said it before: AMD is a year or more behind Intel in manufacturing process. Piledriver is faster in 99% of MT applications against the 8-threaded Sandy Bridge Core i7 2600K and 2700K.

http://www.tomshardware.com/reviews/cpu-performance-comparison,3370-14.html

http://www.tomshardware.com/reviews/cpu-performance-comparison,3370-15.html

If the AMD FX-8350 had launched in 2011, things would be way different. The design is fine and CMT is working as it should; they are out of phase in product releases.

Your fanboyism frankly amazes me at times. If you are going to claim AMD is "X" amount of time behind Intel, let's compare it to the last Intel chip developed before Piledriver was released, namely the 3770K. In both your links the 3770K beats the 8350 in every single benchmark, so what was the point you were trying to prove again?

Edit. Also, the 2600K/2700K (near as damn it a 2-year-old architecture) beats the 8350 in almost every single benchmark, often by large margins, and when they do lose out it is by a small amount.

AtenRa · Dec 25, 2012

Puppies04 said:
Your fanboyism frankly amazes me at times. If you are going to claim AMD is "X" amount of time behind Intel, let's compare it to the last Intel chip developed before Piledriver was released, namely the 3770K. In both your links the 3770K beats the 8350 in every single benchmark, so what was the point you were trying to prove again?

Your lack of knowledge doesn't amaze me at all. I was comparing 32nm products, but you always see fanboys. Next time try to read first; you might learn a thing or two.

Puppies04 said:
Edit. Also, the 2600K/2700K (near as damn it a 2-year-old architecture) beats the 8350 in almost every single benchmark, often by large margins, and when they do lose out it is by a small amount.

The 2600K/2700K lose in 99% of highly MT applications, even in Cinebench. Bulldozer was designed for MT loads, not single thread, and at the same process node it's doing exactly that: being faster than Sandy Bridge.
Now, you may not like the idea of an AMD product that is faster than Intel, but that's how it is. As I have said before, the problem is in the timing of the product release and manufacturing, not in the design.

ozzy702 · Dec 25, 2012

AtenRa said:
Blah blah blah blah blah

You really are missing the point. Piledriver beats the 2600k by the smallest of margins in best case scenarios and only because of the .6ghz advantage it has. 2 piledriver "cores" = 1 hyperthreaded i7 core. In only the most optimal of apps will piledriver shine, and even then that's without taking into account the extreme power consumption vs SB, let alone IB.

In the short term AMD chips still show value but over time they not only have worse performance but will cost you more when the cost of power is factored.

AtenRa · Dec 25, 2012

evilwhitey said:
You really are missing the point. Piledriver beats the 2600k by the smallest of margins in best case scenarios and only because of the .6ghz advantage it has. 2 piledriver "cores" = 1 hyperthreaded i7 core. In only the most optimal of apps will piledriver shine, and even then that's without taking into account the extreme power consumption vs SB, let alone IB.

In the short term AMD chips still show value but over time they not only have worse performance but will cost you more when the cost of power is factored.

This is not the best-case scenario, it is the only scenario I was talking about. I have clearly stated from my first post that I was talking about the MT scenario only. I haven't talked about power consumption or anything else; I was purely talking about MT performance, the reason for the Bulldozer (CMT) design. In those applications the FX-8350 is faster than Sandy Bridge, and if Piledriver had been released in 2011 things would have been different.

PS: CPU frequency is irrelevant. We don't have a low-frequency, high-IPC contest; we are simply evaluating MT performance.

2is · Dec 25, 2012

AMD is better than Intel in MT apps 99% of the time only if you use the 1% of MT apps out there that are actually better on AMD, 99% of the time. In other words, Intel > AMD in every metric.

ozzy702 · Dec 25, 2012

AtenRa said:
This is not the best-case scenario, it is the only scenario I was talking about. I have clearly stated from my first post that I was talking about the MT scenario only. I haven't talked about power consumption or anything else; I was purely talking about MT performance, the reason for the Bulldozer (CMT) design. In those applications the FX-8350 is faster than Sandy Bridge, and if Piledriver had been released in 2011 things would have been different.

PS: CPU frequency is irrelevant. We don't have a low-frequency, high-IPC contest; we are simply evaluating MT performance.

CPU frequency most certainly is relevant, especially given chips that overclock to roughly the same frequency. AMD putting the 8350 at 4 GHz is their attempt to hot-rod the chip as much as possible, whereas the 2600K at 3.4 GHz is a very conservative effort on Intel's part. You're arguing the MT performance of an architecture, which you simply cannot do without discussing frequencies.
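One rough way to fold frequency into the comparison is to normalize a benchmark score by clock speed. In the sketch below the clocks are the stock figures quoted in this thread, but the scores are placeholders chosen to be equal, not measurements; plug in real results to use it.

```python
def per_ghz(score: float, clock_ghz: float) -> float:
    """Benchmark score normalized by clock speed (higher is better)."""
    return score / clock_ghz

# Clocks as quoted in this thread. Scores are PLACEHOLDERS, not measured data.
chips = {
    "FX-8350": {"clock_ghz": 4.0, "score": 100.0},
    "i7-2600K": {"clock_ghz": 3.4, "score": 100.0},
}

for name, c in chips.items():
    print(f"{name}: {per_ghz(c['score'], c['clock_ghz']):.1f} per GHz")
```

Even if the two chips tied on a given MT benchmark, the 4.0 GHz part would be doing about 15% less work per clock (3.4 / 4.0 = 0.85), which is why the stock clocks matter when arguing about the architecture.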

You're looking at the .1% of time that Piledriver beats two year old i7 tech and calling it a win. Bulldozer and Piledriver are both poor implementations of a great idea.
