Benchmark skullduggery

Viditor · Jul 31, 2008

Ars Technica

No, it's not an Intel conspiracy...
It's PCMark 2005...

They tested the Atom and the Nano in PCMark, and then they changed the CPUID vendor string to see whether the test was fair.

"Swap CentaurHauls for AuthenticAMD, and Nano's performance magically jumps about 10 percent. Swap for GenuineIntel, and memory performance goes up no less than 47.4 percent. This is not a test error or random occurrence; I benchmarked each CPUID multiple times across multiple reboots on completely clean Windows XP installations. The gains themselves are not confined to a small group of tests within the memory subsystem evaluation, but stretch across the entire series of read/write tests. Only the memory latency results remain unchanged between the two CPUIDs"

nerp · Jul 31, 2008

Further evidence that benchmark suites are a huge crop of junk.

The only reliable testing method is real world apps, real world tasks and a script to run them in order with someone sitting with a stopwatch.

Gillbot · Jul 31, 2008

Originally posted by: nerp
Further evidence that benchmark suites are a huge crop of junk.

The only reliable testing method is real world apps, real world tasks and a script to run them in order with someone sitting with a stopwatch.

I use my own SOTP (seat-of-the-pants) meter.

If my apps run better, I say it is better.

Phynaz · Jul 31, 2008

skullduggery

noun
verbal misrepresentation intended to take advantage of you in some way.

A little presumptuous of you, isn't it?

VirtualLarry · Jul 31, 2008

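This reminds me of the Quake/Quack debacle, where ATI's drivers scored lower in Quake III once the executable was renamed from quake3.exe to quack3.exe.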

Fox5 · Aug 1, 2008

Originally posted by: Viditor
Ars Technica

No, it's not an Intel conspiracy...
It's PCMark 2005...

They tested the Atom and the Nano in PCMark, and then they changed the CPUID vendor string to see whether the test was fair.

"Swap CentaurHauls for AuthenticAMD, and Nano's performance magically jumps about 10 percent. Swap for GenuineIntel, and memory performance goes up no less than 47.4 percent. This is not a test error or random occurrence; I benchmarked each CPUID multiple times across multiple reboots on completely clean Windows XP installations. The gains themselves are not confined to a small group of tests within the memory subsystem evaluation, but stretch across the entire series of read/write tests. Only the memory latency results remain unchanged between the two CPUIDs"

More likely laziness than intent. At the time the benchmark was made, it was probably easiest to decide on optimizations (such as SSE) based on CPU identification. Perhaps all Intel CPUs sold at the time supported SSE2, whereas AMD's were limited to SSE/3DNow!, and everything else wasn't worth bothering with.

nyker96 · Aug 1, 2008

This could very well just be a programming mistake that enables instruction optimizations of some type on chips that identify as Intel. It does look like the benchmark was written for an era when some instructions were available only on Intel or AMD chips and no other brand, so it does a CPUID check to decide whether to turn those instructions on or off.
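A minimal Python sketch of the kind of vendor check described above, assuming the CPUID leaf 0 register values have already been read (Python can't execute CPUID itself, so they're passed in). The vendor string really is stored little-endian across EBX, EDX and ECX in that order, but the dispatch table and the codepath names are hypothetical, not PCMark's actual code. Any vendor missing from the table, such as CentaurHauls, falls through to the generic path no matter what the chip supports, which is how swapping the vendor string alone could change the results.

```python
import struct

# Hypothetical codepath names, for illustration only.
VENDOR_PATHS = {
    "GenuineIntel": "intel_tuned_path",
    "AuthenticAMD": "amd_tuned_path",
}


def vendor_string(ebx: int, edx: int, ecx: int) -> str:
    """Reassemble the 12-byte vendor ID returned by CPUID leaf 0.

    The bytes are stored little-endian across EBX, EDX, ECX, in that order.
    """
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


def pick_codepath_by_vendor(vendor: str) -> str:
    """Choose a codepath by vendor name only (the approach being criticized).

    Unknown vendors get the generic path regardless of supported features.
    """
    return VENDOR_PATHS.get(vendor, "generic_path")


if __name__ == "__main__":
    nano = vendor_string(0x746E6543, 0x48727561, 0x736C7561)
    print(nano, "->", pick_codepath_by_vendor(nano))  # CentaurHauls -> generic_path
```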

Lorne · Aug 1, 2008

CONSPIRACY!!!! OMG!!!

I think this also happened in the past with 3DMark99 or 3DMark2000 or something like that, when one of the programmers was a fanboy and all hell broke loose.

But I have wondered about that myself; I've always seen some oddities in certain readings.
Lately I've wondered whether games that show the Intel logo during installation make an optimization change during setup. The installer does check the CPUID and notes it in a file.
Maybe someone could paste different CPUIDs into those files and see if it makes a difference.

Games are different from benchmarks, though: they are programmed differently, their engine optimizations differ, and if a CPU manufacturer wishes to sponsor a game, that's up to the game developer.
But 3DMark shouldn't make any changes whatsoever or try to optimize based on CPUID. If a CPU doesn't support a feature, then that test should fail or score low. The CPUID should be used only to show you (and anyone reading posted results online) which CPU was used, nothing more.

CTho9305 · Aug 1, 2008

I think it's very irresponsible for reviewers to include synthetic benchmarks (like memory bandwidth, ALU throughput, etc.). These benchmarks are extremely sensitive to microarchitecture details, and often you'll find that the code has been optimized for some processors but not others.

Originally posted by: Fox5
More likely laziness than intent. At the time the benchmark was made, it was probably easiest to decide on optimizations (such as SSE) based on CPU identification. Perhaps all Intel CPUs sold at the time supported SSE2, whereas AMD's were limited to SSE/3DNow!, and everything else wasn't worth bothering with.

That has never been the case. The CPUID instruction returns feature bits that software is supposed to use to determine whether or not a given instruction set extension (e.g. SSE3) is supported by the processor.
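For contrast, a minimal sketch of dispatching on feature bits rather than on the vendor name, again assuming the CPUID leaf 1 EDX/ECX values are passed in. The bit positions are the documented ones (EDX bit 25 = SSE, EDX bit 26 = SSE2, ECX bit 0 = SSE3); the codepath names are hypothetical. Because the vendor is never consulted, a Nano reporting SSE3 would get the same path as an Intel or AMD chip reporting SSE3.

```python
# Documented CPUID leaf 1 feature-bit positions.
SSE_EDX_BIT = 25
SSE2_EDX_BIT = 26
SSE3_ECX_BIT = 0


def has_bit(reg: int, bit: int) -> bool:
    """Return True if the given bit is set in a register value."""
    return (reg >> bit) & 1 == 1


def supported_extensions(edx: int, ecx: int) -> list:
    """List the SSE-family extensions advertised in CPUID leaf 1."""
    exts = []
    if has_bit(edx, SSE_EDX_BIT):
        exts.append("SSE")
    if has_bit(edx, SSE2_EDX_BIT):
        exts.append("SSE2")
    if has_bit(ecx, SSE3_ECX_BIT):
        exts.append("SSE3")
    return exts


def pick_codepath_by_features(edx: int, ecx: int) -> str:
    """Pick the best hypothetical codepath the CPU claims to support.

    The vendor string is deliberately ignored.
    """
    if has_bit(ecx, SSE3_ECX_BIT):
        return "sse3_path"
    if has_bit(edx, SSE2_EDX_BIT):
        return "sse2_path"
    if has_bit(edx, SSE_EDX_BIT):
        return "sse_path"
    return "scalar_path"
```

This only decides which instructions are safe to use; as noted below, it says nothing about which instruction sequence is fastest on a given microarchitecture.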

That said, there is still a possibility that this wasn't malicious: the PCMark developers might have written codepaths that were optimized for each vendor's architecture regardless of the supported instructions. Each code sequence has different performance characteristics on each architecture, so it may be that AMD and Intel are given different SSE2 codepaths, or that one vendor's implementation of SSE3 doesn't buy as much speedup, so older instructions are used instead. In this case, we would see a particularly strong effect because the performance characteristics of Nano are drastically different from those of the previous Via processors, and much more similar to AMD/Intel processors (since Nano finally supports out-of-order execution). Code optimized for narrow-issue in-order processors is not going to run as well on a wide out-of-order processor as code optimized for that processor, regardless of the vendor or microarchitectural details.

This particular case may be different from the Intel compiler maliciousness that was seen a few years ago, where non-Intel CPUs never got an SSE codepath. In this case, the PCMark developers may have optimized for existing Via CPUs, but not yet optimized for Nano; in Intel's case they simply deoptimized 100% when a non-Intel CPU was used. It's also possible that Via was getting some generic codepath that still used SSE instructions but wasn't specifically optimized for any architecture, and it turns out that Nano looks enough like the AMD/Intel microarchitectures that code optimized for either of them runs pretty fast.

This is primarily a problem with "highly optimized" code: it's highly optimized for a given microarchitecture and may perform poorly on a different microarchitecture, even though the second microarchitecture may be "better" in general. K8 can trash Core 2 on a code sequence that is perfectly tuned for K8's available resources. It would be simple to write a small program that shows Core 2 in a good light against K8; it's probably possible to write something that even makes Via Nano look very good (Nano's FP execution latency is very low, so a dependent chain of FP operations would probably be relatively fast on Nano clock-for-clock).

Did anybody else notice that nobody compared Nano to mainstream processors? Did Via force reviewers not to do that?

AdamK47 · Aug 1, 2008

It's no surprise to me that Futuremark would do this. I never take results from their apps as an indication of anything performance-related. They're good for looping and stress testing, though.

Idontcare · Aug 1, 2008

Of more interest to me is whether they can find any real-world applications that produce similar effect.

If it is just benchmarks that are suspect, then it's hardly a surprise... paying for a benchmark for personal use to see how your system performs is very much like paying for cable TV only to still have to sit there and be exposed to commercial advertisements.

The benchmarking company does not exist for the benefit of consumers (neither does the cable company) and just because they charge the public consumer for use of their product does not mean the public consumer is the targeted/sought-after customer.

For cable companies, the customers are quite clearly the commercial entities willing to pay for advertising. Likewise, I haven't dug into a benchmarking company that didn't turn out to have a commercial side seeking "subsidies" from Intel/AMD/ATI/NVIDIA to ensure its benchmarks were best suited to reflect the "strengths" of those IDMs' products.

All this PCMark situation tells me is that Intel loaned more engineering expertise to them for optimizing the benchmark to highlight Intel's CPU strengths whenever it was run on an Intel chip than AMD (or Via) apparently allocated to optimizing code for their CPUs.

Invest nothing into the benchmarking firm and guess how much time/money they are going to invest in you? Same for cable TV. Same for every stock market analyst who freely gives their advice on CNBC's Squawk Box. (Gee, how nice of them.)

We consumers get exactly what we deserve because we surely don't demand/require anything more.

Idontcare · Aug 1, 2008

Originally posted by: CTho9305
Did anybody else notice that nobody compared Nano to mainstream processors?

Yes, same as with Atom.

Originally posted by: CTho9305
Did Via force reviewers not to do that?

I refuse to believe anyone works on a review with a gun to their head.

I do, however, very strongly believe in following the money, as none of these review sites are non-profit... so the question I would entertain is how the revenue path gets enhanced in such a favorable manner that every single review site declines to step up and compare Atom and Nano to a mainstream CPU, even as a one-time check.

CTho9305 · Aug 1, 2008

Originally posted by: Idontcare

Originally posted by: CTho9305
Did anybody else notice that nobody compared Nano to mainstream processors?


Yes, same as with Atom.

Originally posted by: CTho9305
Did Via force reviewers not to do that?


I refuse to believe anyone works on a review with a gun to their head.

I do, however, very strongly believe in following the money, as none of these review sites are non-profit... so the question I would entertain is how the revenue path gets enhanced in such a favorable manner that every single review site declines to step up and compare Atom and Nano to a mainstream CPU, even as a one-time check.

Sure, there's no gun to the head, but there's always the "do what we say or you never get a review sample again"...

magreen · Aug 1, 2008

Originally posted by: Phynaz
skullduggery

noun
verbal misrepresentation intended to take advantage of you in some way.

A little presumptuous of you, isn't it?

presumptuous

adjective
Full of presumption; presuming; overconfident or venturesome; audacious; rash; taking liberties unduly; arrogant; insolent.

A little supercilious of you, isn't it?

Idontcare · Aug 1, 2008

Originally posted by: CTho9305
Sure, there's no gun to the head, but there's always the "do what we say or you never get a review sample again"...

I am in no way disagreeing with your stated strategy as the likely one employed by the hardware vendors, but you have to admit it works for only one reason...

Fill in the blank: "... and this deters the reviewers from straying off the scripted track because ____."

(hint - it has to do with $$$, ad money, web hits, page views, etc)

There's no forcing anyone here, in my opinion. Is it a little surly? Perhaps, but only to the naive consumer, and who cares what they think anyway?

aigomorla · Aug 1, 2008

So when do they come in cell phones?

I don't care which is better, just who's going to be first.

I would love either one in a cell phone.

Nathelion · Aug 1, 2008

When cell phones have batteries that can power a 50-60W device for more than twenty minutes

Idontcare · Aug 1, 2008

Originally posted by: Nathelion
When cell phones have batteries that can power a 50-60W device for more than twenty minutes

What, you don't want to be sporting this on your belt-clip!?

aigomorla · Aug 1, 2008

Originally posted by: Idontcare

Originally posted by: Nathelion
When cell phones have batteries that can power a 50-60W device for more than twenty minutes


What, you don't want to be sporting this on your belt-clip!?

ROFLMAO

I want one of those.

That's almost as funny as the Nintendo belt.
