SAKaveri Versus Richland: A Performance Per Clock Comparison

AnandTech Forums - Page 3

mrmt

Diamond Member
Aug 18, 2012
3,974
0
76
TDP is defined as an average power draw over a (defined) period of time, not as peak power consumption.

TDP is the maximum sustainable power consumption within certain parameters (such as temperature), a worst-case scenario when running commercial applications. A processor only breaks TDP in thermally insignificant spikes. That holds for processors without Turbo; processors with Turbo do breach TDP specs while boosting, but they scale back to standard levels once the thermal headroom runs out.

We can only say that AMD is breaching their TDP specs with their FX line because MSI went on record to say that they were indeed breaching it. MSI should have all the parameters needed to calculate TDP, i.e. the thermal datasheet, and after running the calculations they arrived at the 140W number. As for the other AMD chips, we can't say for sure, because nobody has gone on record and AMD refused to release the thermal datasheets for their post-Bulldozer processors.

It's a mistake to take peak power consumption and say that AMD is breaching TDP specs, because we don't know how long that power spike lasts. But if a given AMD processor is consuming more power than its TDP for a thermally significant period of time, then either we are operating the processor outside the planned envelope or AMD is indeed breaching their own TDP specs.
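The windowed-average distinction can be made concrete with a minimal sketch (the power trace, window length, and 45W figure below are hypothetical, not measured data):

```python
def breaches_tdp(samples_w, tdp_w, window):
    """True if the average power over any `window` consecutive samples
    exceeds the TDP. Short spikes above TDP are fine as long as the
    windowed average stays at or below it."""
    for i in range(len(samples_w) - window + 1):
        if sum(samples_w[i:i + window]) / window > tdp_w:
            return True
    return False

# Hypothetical trace for a 45W part: two 60W spikes in ten samples.
trace = [40, 40, 60, 60, 40, 40, 40, 40, 40, 40]
print(breaches_tdp(trace, 45, window=1))   # True: instantaneous peaks exceed 45W
print(breaches_tdp(trace, 45, window=10))  # False: the 10-sample average is 44W
```

The same trace "breaches" or "respects" TDP depending only on the window length, which is why peak readings alone prove nothing.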

That said, it's pretty fishy to see today's 45W processors consuming more juice than last year's 45W processors. One of four things happened:

- The measurements were taken using very different sets of components, leaving the comparison moot.

- Last year's processor consumes less than 45W

- Today's processor consumes more than 45W

- AMD changed the set of parameters used to calculate TDP in the new processor, rendering the comparison moot in the process.

Pick your choice.
 
Last edited:

9enesis

Member
Oct 14, 2012
77
0
0
Actually it is, yes, but you are the second one to leave out the significant part of the sentence: "average power draw over a (defined) period of time". Time is very important here, and somehow all people read is "average power".

If you have a processor which is guaranteed by the manufacturer to work at 1W for 99% of the time and 1kW for 1% of the time, then you may encounter two very different scenarios:
- If the CPU uses 1kW in extremely short bursts, then the OEM needs to design an 11W TDP cooling solution.
- If the time window the CPU spends at 1kW is a thermally significant period, then the OEM needs to... design a 1kW TDP cooling solution.

In both cases the average power consumption is 11W and the peak power consumption is 1kW, but the average power consumption over a thermally relevant period of time can lead to either an 11W TDP or a 1kW TDP.
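As a quick sanity check on the arithmetic above (using the hypothetical 1W/1kW duty cycle from the example):

```python
# Hypothetical duty cycle: 99% of the time at 1 W, 1% of the time at 1 kW.
avg_w = 0.99 * 1 + 0.01 * 1000
print(avg_w)  # 10.99, i.e. the ~11W average cited above

# Whether the cooler must be sized near 11W or near 1kW depends on
# whether the 1kW bursts are shorter or longer than the thermally
# significant window of the cooling solution.
```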


Wrong....ok, i guess i have to explain this to you ;) in order for your CPU to work at average power X while peaking anywhere from 1W to 1kW, your MoBo manufacturer has to design a mobo that can deliver 1kW peaks on the power rail......do you get it now? or should i continue?
 

Chiropteran

Diamond Member
Nov 14, 2003
9,811
110
106
TDP is and will be TDP.

I'm trying for the life of me to figure out what your point is and I just don't see it.

Obviously X = X.

If you are trying to say TDP is some solid useful figure, I don't see that. Intel's TDP has been useless for the last few generations of CPUs, as CPUs with identical TDPs actually draw very different amounts of power, and once turbo is considered even some Intel CPUs exceed TDP, so it's not even a hard limit.
 

9enesis

Member
Oct 14, 2012
77
0
0
If you are trying to say TDP is some solid useful figure, I don't see that.

it is a most significant figure for OEMs to deal with: if you design a MoBo you have to consider TDP because of two things:

1) the POWER delivery rail for the CPU or APU

2) the amount of heat generated (in laptops, for example)

since you know that your CPU/APU is rated at 65W TDP, there is no problem for you to make an adequate cooling solution AND power supply.
 

coercitiv

Diamond Member
Jan 24, 2014
6,151
11,681
136
Wrong....ok, i guess i have to explain this to you ;) in order for your CPU to work at average power X while peaking anywhere from 1W to 1kW, your MoBo manufacturer has to design a mobo that can deliver 1kW peaks on the power rail......do you get it now? or should i continue?
There's no need for you to continue, thank you.
 

lagokc

Senior member
Mar 27, 2013
808
1
41
The problem is the limited usage of the GPU. STREAM, CUDA, OpenCL, etc. haven't moved that barrier.

Small cores only serve a limited usage. In the consoles you already see games capped at 30FPS because of that. And AMD can't bet on big cores, because they can't afford them. It's all about R&D there.

HPC and small cores are a super tiny part of the server segment. And the first ARM server designer went belly up.


I was under the impression the XB-One was limited to 30fps because its GPU is too weak to do 60fps at 1080p, whereas the PS4 has no trouble at 60fps for most games.
 

Jovec

Senior member
Feb 24, 2008
579
2
81
AMD said:
http://support.amd.com/TechDocs/43375.pdf

TDP. Thermal Design Power. The thermal design power is the maximum power a processor can draw for a thermally significant period while running commercially useful software. The constraining conditions for TDP are specified in the notes in the thermal and power tables.

The part in bold is where the confusion comes from.

To most of us, we expect the TDP to never be exceeded (Regardless of how Intel and AMD want to define it, this is how we think of it). To AMD, it will never be exceeded while running useful programs. Stability testing and "power virus" apps such as IBT and P95 are not considered commercially useful software by AMD.

Yes, AMD CPU performance is sufficient for the vast majority of uses, but compared to Intel on the benchmarks that review sites use they are noticeably behind. AMD could choose to never exceed the TDP at all, but that would mean lowering voltage and clocks so that the worst-case scenario was accounted for. Instead, they exclude IBT, P95, and the like, and this allows them to boost clocks to a point where standard apps will approach the TDP limit (like an x264 encode).

I watch my 8350 pull more than 125w under P95 at stock settings and downclock cores at 55C. I watch my 5800k pull more than 95w under P95 at stock settings. I watched my (officially) low-power 705e Phenom II fail P95 at stock settings yet never otherwise have issues. It's something they've been doing for a long time.

Edit: AMD and Nvidia have been doing this for a while with their GPUs too. It's also one of the reasons mobile AMD performance is so much worse than desktop AMD.
 
Last edited:

9enesis

Member
Oct 14, 2012
77
0
0
@Jovec

please provide us with details on how you got your numbers - what you measured and how - to support your statements.
 

Abwx

Lifer
Apr 2, 2011
10,847
3,297
136
- The measurements were taken using very different sets of components, leaving the comparison moot.

- Last year's processor consumes less than 45W

- Today's processor consumes more than 45W

- AMD changed the set of parameters used to calculate TDP in the new processor, rendering the comparison moot in the process.

Pick your choice.

My choice is BS....

In Hardware.fr's test of the 7850K, the APU's consumption including VRM losses was 55% of the platform's total consumption when benching Fritz Chess with 4 threads. The ratio must be much lower with a 45W TDP APU, given that it operates at a point where the PSU has even lower efficiency; likely slightly below 50% of the total consumption would be drawn by such an APU.

In the Tech Report test the PSU has about 75% efficiency at 80W input, so only 60W goes to the platform, which correlates with the 45W TDP of the APU.

Edit: Tom's Hardware's test of the 7600 is done with the TDP configured at 65W.
 

mrmt

Diamond Member
Aug 18, 2012
3,974
0
76
The part in bold is where the confusion comes from.

To most of us, we expect the TDP to never be exceeded (Regardless of how Intel and AMD want to define it, this is how we think of it). To AMD, it will never be exceeded while running useful programs. Stability testing and "power virus" apps such as IBT and P95 are not considered commercially useful software by AMD.

Prime95 is a commercially useful application. I should be able to calculate Mersenne primes with my CPU, shouldn't I? The fact that you can't on AMD processors without throttling shouldn't be blamed on the software, but on AMD.
 
Last edited:

Jovec

Senior member
Feb 24, 2008
579
2
81
@Jovec

please provide us with details on how you got your numbers - what you measured and how - to support your statements.

It's actually closer/lower than I remember, so I'll eat a little crow. The 705e is long gone, so I can't use that. If I care enough I'll bust out the 8350 (which also exhibits the clock cycling).

Cobbled together a 5800k system: 2x4GB DDR3-1600 at 1.5V, SS X650 Gold PSU, Intel 320 SSD, AMD stock 125W cooler (aka the heatpipe version), GB GA-F2A85XM-D3H with F3 BIOS and optimized defaults, Win 8.1 Pro 64.

Observed via Kill-a-Watt:
  • In BIOS: 102w
  • Idle: 23w
  • P95 Large FFTs: 147w peak, settles around 128w once cores start downclocking
  • GPUZ Render Test: 57w
  • P95 Large FFTs and GPUZ Render Test: 147w

At 87% Gold efficiency for 20%+ load:
  • In BIOS: 88w
  • Idle: under 20% max load, no efficiency spec
  • P95 Large FFTs: 128w peak, settles around 111w once cores start downclocking
  • GPUZ Render Test: under 20% max load, no efficiency spec
  • P95 Large FFTs and GPUZ Render Test: 128w

In order for the CPU to stay at 100w, the motherboard, RAM, SSD, and CPU fan would have to be consuming 28w or more under load. Without voltage monitoring points on the mobo, this is the best I can do.
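For reference, the wall-to-DC conversion used in these figures is just the meter reading scaled by PSU efficiency; a minimal sketch, assuming a flat 87% efficiency at these load points:

```python
def dc_power(wall_w, psu_efficiency):
    """Estimate DC power delivered to the system from a wall-meter
    reading, assuming a flat PSU efficiency at that load point."""
    return wall_w * psu_efficiency

# The Kill-a-Watt figures above, scaled by the 87% Gold efficiency:
print(round(dc_power(147, 0.87)))  # 128: P95 peak
print(round(dc_power(128, 0.87)))  # 111: after the cores downclock
```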

However, this GIF shows the clock cycling. I can't be sure if it's an attempt to keep temps in check (but they are under 50C), power usage in check, or just AMD's crappy turbo. If I disable turbo then the clocks stay at 3.8 and the VID at 1.125.

[Image: 5800k Clock cycle.gif]


If I enable Gigabyte's "Turbo CPB" setting in the BIOS, I can hit 215w+ on the meter before crashing under P95. This pushes the Vcore to over 1.5v. I won't attempt to discover why: CPU, VRMs, bad implementation by GB, etc...

Just for my curiosity, I also did a Handbrake encode using the High Profile preset. For the "Turbo On" test the CPU did not down-clock as low as it did with P95, but it still did.

Turbo on: 29m, 51s - 117w average
Turbo off: 30m, 36s - 105w average
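Turning those two runs into total energy (a quick sketch from the times and wall-power averages above):

```python
def energy_wh(minutes, seconds, avg_w):
    """Energy for a run in watt-hours: duration (s) * power (W) / 3600."""
    return (minutes * 60 + seconds) * avg_w / 3600

turbo_on = energy_wh(29, 51, 117)   # ~58.2 Wh
turbo_off = energy_wh(30, 36, 105)  # ~53.6 Wh
# Turbo finishes ~45 s sooner but costs roughly 9% more total energy.
print(turbo_on, turbo_off)
```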
 

9enesis

Member
Oct 14, 2012
77
0
0
@Jovec

thanks. i think you have done well, considering the tools and info available.

but i'm still not sure (in your original post) why you say: "I watch my 5800k pull more than 95w under P95 at stock settings" - is that just because of that 28W MoBo-etc. figure and the frequency drop?
 

Headfoot

Diamond Member
Feb 28, 2008
4,444
641
126
How is it that people come here with a straight face and say 7.7% is "solid" when 10% on Haswell was a huge fail?
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
I'm probably just repeating what others have said but I wouldn't include wPrime or TrueCrypt in the average. Keep the results so people get an idea of an improvement for those very specific instances, but don't let it contribute as representative of programs in general because they really aren't.

wPrime is definitely the most synthetic and limited benchmark there (yes, considerably worse than SunSpider and in a totally different league vs Cinebench). It's nothing more than a tight loop of calculating square roots via floating point Newton-Raphson iteration and comparisons for an error threshold. There are really big dependency chains involved with fairly high latency operations. I haven't dug through any code disassemblies or found any online, and would like to see the actual code, but I doubt it's much different than expected. What's weird is why it gets such a big boost in the first place. Maybe it has something to do with the flow control derived from floating point comparisons.

While square root calculation is occasionally useful (although often not as useful as reciprocal square root calculation), if it's used at all it'll be a very tiny part of a program's runtime. And this sort of code isn't similar to very much FP code at all, let alone general purpose code.
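For illustration, a Newton-Raphson square-root loop of the kind described is just a serial chain of dependent FP operations; a minimal sketch (not wPrime's actual code, which hasn't been disassembled here):

```python
def nr_sqrt(a, tol=1e-12):
    """Newton-Raphson (Heron) square root for a > 0. Each new estimate
    depends on the previous one, so the loop is one long latency chain
    of FP divide/add operations with no instruction-level parallelism."""
    x = a  # crude initial guess
    while abs(x * x - a) > tol * a:
        x = 0.5 * (x + a / x)  # next estimate needs the previous x
    return x

print(nr_sqrt(2.0))  # 1.4142135623730951
```

A core that improves FP divide latency (or dependent-op turnaround) will look disproportionately good on code like this, without that saying much about general FP or integer workloads.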

TrueCrypt could be a bigger and more meaningfully useful and representative program, but it's tainted by its use of hardware acceleration AES instructions.

I'd probably throw out Luxmark too if you want an actual CPU comparison. End result is around 6% improvement. But this could really be overly negative based on an unfortunate benchmark choice. Really could benefit from some more/better choices. Optimistically speaking, something like 10% isn't out of the question, but I wouldn't expect higher.
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
wPrime is definitely the most synthetic and limited benchmark there (yes, considerably worse than SunSpider and in a totally different league vs Cinebench). It's nothing more than a tight loop of calculating square roots via floating point Newton-Raphson iteration and comparisons for an error threshold. There are really big dependency chains involved with fairly high latency operations. I haven't dug through any code disassemblies or found any online, and would like to see the actual code, but I doubt it's much different than expected. What's weird is why it gets such a big boost in the first place. Maybe it has something to do with the flow control derived from floating point comparisons.
Similar behaviour has been seen with the Passmark integer test. A HW divider actually boosted the result. I don't recall whether it has been checked, but an equal distribution of instruction types, e.g. single-cycle instructions and integer division, causes most of the execution time to be taken by the divisions, creating an overweighting situation.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
Similar behaviour has been seen with the Passmark integer test. A HW divider actually boosted the result. I don't recall whether it has been checked, but an equal distribution of instruction types, e.g. single-cycle instructions and integer division, causes most of the execution time to be taken by the divisions, creating an overweighting situation.

Yeah, that situation with Passmark was very bad (and they admitted they weight it like you say, although they at least changed the ratio later once they realized how bad it was). I was surprised that someone making popular benchmark software was actually using such a poor test. At least Dhrystone tried to include an appropriate distribution of operations based on some kind of measurement of real programs of the day. You're not in good shape if you make Dhrystone look competent in comparison.

I haven't done NR iterations for square roots, so I don't know if you can avoid the division, but either way it's probably in there... so yeah, wPrime could also be highly dominated by FP division performance, which they may have improved.
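To put rough numbers on the overweighting effect (the cycle counts below are made-up round figures, not measured latencies):

```python
def time_share(mix):
    """Given {op: (count, cycles_per_op)}, return each op's share of
    total cycles on a naive in-order execution model."""
    total = sum(n * c for n, c in mix.values())
    return {op: n * c / total for op, (n, c) in mix.items()}

# An equal mix of 1-cycle ALU ops and 20-cycle integer divides:
shares = time_share({"alu": (50, 1), "div": (50, 20)})
print(shares["div"])  # ~0.95: divides eat ~95% of the runtime, so the
# score mostly measures divider latency, not general integer throughput.
```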
 
Last edited:

sm625

Diamond Member
May 6, 2011
8,172
137
106
But why would Cinebench and LuxMark regress? Compiler nonsense aside, the new chip should simply outperform the old chip across the board. There should be no regressions. How often do you see an Ivy Bridge outperform a Haswell, given the same clock speed, cache, etc.? It shouldn't happen, and definitely not by almost 10%.
 

Abwx

Lifer
Apr 2, 2011
10,847
3,297
136

It's pretty clear that that isn't what sm625 was talking about.

It is.

Performing 10% better with 15% higher thermal dissipation is not that hard; all you have to do is crank up the clocks by 10% on the same chip. AMD could have done it as well with Kaveri and presented tremendous results versus Richland.
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
How is it that people come here with a straight face and say 7.7% is "solid" when 10% on Haswell was a huge fail?

Personally, I think it's disappointing but expected given the delays. Why would they delay if they had a great chip on their hands?

Still, from a mobile perspective I've been interested in an AMD option since Trinity but OEMs have yet to provide a decent selection of AMD mobile products. If mobile Kaveri can manage to attract some OEM design wins in the ~25-35W category that will be a significant improvement in and of itself.