Bapco (Sysmark) favoring Intel?

kuk

Platinum Member
Jul 20, 2000
2,925
0
0
I was reading through this thread @ AcesHardware. Taking Anand's benchmarks and analysis as a base, the author suggests that Intel appears favored in each new release of Sysmark. Now that AMD has joined BAPCo, I don't know what will happen ... but this is nevertheless a good read.

I have posted this information before. But I am posting it again due to the fact that I have had a longstanding argument with Dean Kent regarding the merits of Sysmark and BAPCo. Dean said in a thread below:

"> I have seen that SYSmark 2001 and 2002 favor P4, and I have
> provided some possible reasons why this is the case. I have
> not seen any evidence that SYSmark 99 and 2000 favors PIII
> over its rival processors to some unusual extent, nor have
> I seen anything to suggest that SYSmark 95/96 and 97/98 had
> any similar issues. Your scientific trendline seems to include
> only two data points. "

It now becomes apparent that Dean never read my previous post before passing judgement, as I did provide more than two data points.

So here it is again. As someone pointed out before, although I have attempted to minimize it, the motherboards in these examples often do change. This might affect the results, but I will assume that Anand used the best motherboards available, and the trend is clear to see regardless.

All scores taken from www.anandtech.com.


http://www.anandtech.com/showdoc.html?i=1051&p=4
Sysmark 98 (October 14, 1999):
Athlon 600 (FIC SD11, ???): 267
P3 600 (ABit BX6, BX??): 239

AMD clearly the leader in Sysmark 98.


http://www.anandtech.com/showdoc.html?i=1247&p=6
Sysmark 2000 (May 24, 2000):
Athlon 600 (KX133): 127
P3 600/100 (BX): 133

These are the first Sysmark 2000 scores I could find. Coppermine appeared about 6 months before these scores were taken. Coppermine has on-die cache, which Athlon still does not have.

Anand says:
"Instead, it is correct to say that SYSMark 2000 is highly dependent on a fast L2 cache"

OK, Sysmark 2000 stresses the speed of the cache more than 98. P3 has faster cache than Athlon. This favours Intel, but it should as fast cache should increase performance.
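To put a number on the shift between the two versions, here is a rough sketch using only the two 600 MHz pairs quoted above. The helper name `relative_lead` is made up for illustration, and the scores come from different reviews and motherboards, so treat the output as approximate.

```python
def relative_lead(amd_score, intel_score):
    """Return AMD's lead over Intel in percent (negative means AMD trails)."""
    return (amd_score / intel_score - 1.0) * 100.0

# Scores quoted above, both CPUs at 600 MHz.
sysmark98 = relative_lead(267, 239)    # Athlon 600 vs P3 600, Sysmark 98
sysmark2000 = relative_lead(127, 133)  # Athlon 600 vs P3 600/100, Sysmark 2000

print(f"Sysmark 98:   {sysmark98:+.1f}%")    # about +11.7%
print(f"Sysmark 2000: {sysmark2000:+.1f}%")  # about -4.5%
```

So at the same clock speed the Athlon goes from roughly an 11.7% lead to roughly a 4.5% deficit between the two versions.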


http://www.anandtech.com/showdoc.html?i=1252&p=13
Sysmark 2000 (June 4, 2000) Thunderbird
P3 800/100 (BX): 161
P3 i820: 166
TBird (KT133) 800: 165

AMD introduces TBird with fast cache and catches up. Again, a fair result.


http://www.anandtech.com/showdoc.html?i=1441&p=7
Sysmark 2000 (March 22, 2001):
TBird 1.33 (760 DDR): 261
TBird 1.33 (KT133A): 252
P3 1Ghz (ASUS CUSL2): 233
P4 1.5(ABIT TH7-RAID): 220
P4 1.3(ABIT TH7-RAID): 202

P4 introduced in Nov. 2000. AMD has 133 FSB TBird, DDR.
AMD is running away with the race due to high MHz advantage over P3.
P4 is introduced and fares poorly.

Anand says:
"Although a few programs that compose the suite are particularly memory
bandwidth intensive, the same cannot be said about the whole.
For this reason, combined with the fact that there is no multitasking,
the AMD760's DDR SDRAM does not offer any incredible performance advantages here"

i.e. Bandwidth is not stressed in Sysmark 2000.
Up until the P4 introduction AMD has had the bandwidth advantage due to a much faster bus than the P3. Just a coincidence that Sysmark has concentrated on cache instead of FSB. I'm sure they will continue to.

BAPCo introduces Webmark2001 (same review as above):
P4 1.5: 244.95
TBird 1.33 220.64

P4 also does very well in Webmark 2001B and Anand comments:
"The 'B' portion of the benchmark is actually quite easy to explain since 65%
of this score comes from a Microsoft Media Encoder test which easily shows off
the Pentium 4's impressive memory bandwidth figures."

BAPCo has introduced Webmark2001 just as the P4 is released. "Luckily" for Intel, Webmark shows the P4's strengths very well. This just happens to be bandwidth. Fine, we all know that the "Web" is at least 65% about videos.


http://www.anandtech.com/showdoc.html?i=1460&p=9
Sysmark 2001 (April 23, 2001):

Sysmark 2001 Internet Content Creation:
P4 1.7: 191
P4 1.5: 169
P4 1.3: 148
TBird 1.33 (760): 147

Anand says:
"First we see that the Internet Content Creation portion of the SYSMark 2001
benchmark is heavily dominated by the Pentium 4 processor.
The reasoning behind this is simple; a large portion of this test is
based on a Windows Media Encoder benchmark that happens to be quite bandwidth intensive."

Hold on...I'm witnessing a pattern here... Sysmark2000 not bandwidth intensive... AMD has bandwidth lead over P3 here because of DDR. Sysmark 2001... bandwidth intensive... P4 has higher bandwidth than Athlon. Strange coincidence. I'm sure it is just that.


Sysmark 2001 Office Productivity:
TBird 1.33 (760): 152
P4 1.7: 146
P4 1.5: 140
P4 1.3: 126

AMD still holds the lead here. But compare the 2001 to the 2000 scores.
TBird 1.33 drops 41.76% while P4 1.5 drops 36.36%. Results are more favourable
to Intel than AMD. Have office apps changed? Have the processors changed? No. No. The drop should have been identical.
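For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes the two percentages come from pairing the Sysmark 2000 scores quoted earlier (TBird 1.33 on the 760 DDR board: 261, P4 1.5: 220) with the Sysmark 2001 Office Productivity scores above; `pct_drop` is just an illustrative helper name.

```python
def pct_drop(old, new):
    """Percentage by which a score fell going from old to new."""
    return (old - new) / old * 100.0

tbird_drop = pct_drop(261, 152)  # TBird 1.33 (760 DDR): Sysmark 2000 -> 2001 Office
p4_drop = pct_drop(220, 140)     # P4 1.5: Sysmark 2000 -> 2001 Office

print(f"TBird 1.33: {tbird_drop:.2f}%")  # 41.76%
print(f"P4 1.5:     {p4_drop:.2f}%")     # 36.36%
```

Note that this sets an overall Sysmark 2000 score against an Office Productivity sub-score, so the size of each drop means little on its own; the point is the gap between the two drops.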


October 2001, Athlon XP released.

http://www.anandtech.com/showdoc.html?i=1554&p=4
Internet Content Creation (Nov 5, 2001):
Athlon XP 1.6GHz: 215
P4 2.0 GHz: 213
P4 1.7: 187
TBird 1.4: 163

Athlon XP takes the lead even in internet content creation. In office it is
even worse for Intel:
Athlon XP 1.6GHz: 190
P4 2.0 GHz: 171
P4 1.7: 152
TBird 1.4: 173

The 2.0 Ghz P4 is being beaten by the 1.4 TBird!

Northwood is released January 2002 and the 2.2 P4A takes the lead. AMD fares not too badly, as the 2.0 P4A is still beaten by the 1800+ in Sysmark 2001 Office Productivity.
(http://www.anandtech.com/showdoc.html?i=1574&p=6)
P4A 2.2Ghz: 199
Athlon XP 1.67GHz: 194
Athlon XP 1.60GHz: 193
Athlon XP 1.53GHz: 187
P4A 2.0Ghz: 186


By March 13, 2002, Sysmark 2002 is out (again, 6 months after the XP came out).
http://www.anandtech.com/showdoc.html?i=1595&p=7

For Sysmark 2002, Office productivity:
P4A 2.2Ghz: 165
P4A 2.0Ghz: 158
Athlon XP 1.73GHz: 153
Athlon XP 1.67GHz: 150
Athlon XP 1.53GHz: 141

Compare the percentage speed drops:
P4A 2.2GHz: 17.09%
Athlon XP 1.67GHz: 22.68%
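The same comparison can be run for every CPU that appears in both Office Productivity lists above (Sysmark 2001 and Sysmark 2002), not just the two shown. This is only a sketch: the two sets of scores come from different reviews, so the test beds may not be identical.

```python
# Office Productivity scores quoted above: (Sysmark 2001, Sysmark 2002).
scores = {
    "P4A 2.2GHz": (199, 165),
    "P4A 2.0GHz": (186, 158),
    "Athlon XP 1.67GHz": (194, 150),
    "Athlon XP 1.53GHz": (187, 141),
}

# Percentage drop from the 2001 score to the 2002 score for each CPU.
drops = {cpu: (old - new) / old * 100.0 for cpu, (old, new) in scores.items()}

for cpu, drop in drops.items():
    print(f"{cpu}: {drop:.2f}%")
```

With these numbers both P4A parts drop by about 15-17%, while both Athlon XP parts drop by about 23-25%.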

Again, another release of Sysmark, another benchmark favouring Intel.

The coincidences are starting to add up.

This is more than two data points. And perhaps it isn't "conclusive" proof. But I am willing to bet that the majority of the people reading this can see a pattern. Combine this with the fact that Intel and BAPCo originally shared the same address (and that this fact has subsequently been covered up), that the BAPCo website was originally created by Intel (again, subsequently covered up), and you get one hell of a smoking gun!


Kuk :confused:
 

ToBeMe

Diamond Member
Jun 21, 2000
5,711
0
0
Well of course..........doesn't everyone know that whenever an Intel chip wins a benchmark, the benchmark is always favoring the Intel chip...................;)
 

Snoop

Golden Member
Oct 11, 1999
1,424
0
76
I posted this a few months back but it still applies:

Although Sysmark 2002 is supposed to enable SSE for the Athlons, when compared to Sysmark 2001 (patched by Anandtech to use SSE) the Athlon loses considerable performance. For instance, the Athlon XP 1800+, which beat the P4 2.0 by ~4% in Overall System Performance in 2001, now loses to the same P4 2.0 by ~8% in 2002. That is a 12% performance delta. A 14% variance in this particular benchmark is roughly equivalent to the difference between a P4 1.6 and a P4 2.0. So between versions of the Sysmark benchmark suite, Intel is able to gain what equates to 4 speed grades of benchmarked performance over AMD.

IMO, this is not an acceptable deviation of a benchmark, which by definition (a 'standard') should remain semi-constant. This pattern is repeating as the Athlon does worse in SysMark 2002 than it did in 2001, and worse in 2001 than it did in 2000.... the same pattern held with the K6-2/3, they performed better in '98 than they did in 2000.
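The 12% figure above is a swing in percentage points (a ~4% lead turning into a ~8% deficit). A small sketch of that, plus one alternative way to express it as a change in the Athlon/P4 score ratio. The alternative assumes "loses by ~8%" means the Athlon scores 8% below the P4, which the post does not spell out.

```python
lead_2001_pct = 4      # XP 1800+ ahead of P4 2.0 in Sysmark 2001 (approx.)
deficit_2002_pct = 8   # XP 1800+ behind P4 2.0 in Sysmark 2002 (approx.)

# Simple swing in percentage points, as used in the post.
swing_points = lead_2001_pct + deficit_2002_pct

# Alternative reading (assumption): change in the Athlon/P4 score ratio.
ratio_2001 = 1 + lead_2001_pct / 100
ratio_2002 = 1 - deficit_2002_pct / 100
ratio_change_pct = (ratio_2002 / ratio_2001 - 1) * 100

print(swing_points)                 # 12
print(round(ratio_change_pct, 1))   # -11.5
```

Either way the relative shift comes out at around 12%.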

Links:
2001
2002

 

First

Lifer
Jun 3, 2002
10,518
271
136
Here's what I was told by the BAPCo Operations Manager. The paragraphs labeled "1)" and "2)" are the questions I sent to BAPCo:

Evan,
Below are the answers to your questions.

1) Since SYSMark2002 debuted, there's been quite a
noticeable difference in scores between SYSMark2001
and SYSMark2002 on certain platforms. For example,
using the 2002 version, P4 platforms (DDR and RD) are
scoring exceptionally well against Athlon XP
platforms, whereas the difference was relatively slim
with the 2001 version. Is the 2002 version more
bandwidth dependent? Latency dependent?

One of BAPCo's goals in developing SYSmark 2002
was to make it a more balanced system-level
benchmark. As a result, system characteristics like
memory latencies and I/O play a larger role in the
performance of a system running SYSmark 2002. The
combination of updated applications and workloads
give rise to different code and system level
characteristics that BAPCo has attempted to capture
in its White paper
(http://www.bapco.com/SYSmark2002Methodology.pdf).
For example, as seen in the graphs in the White
paper, SYSmark 2002 is more sensitive to memory and
I/O requirements. One of the other and important
beneficial side effects of the new applications and
workloads is that the distribution of the weights for
applications in SYSmark 2002 is more realistic as
seen in the Tables in the White paper.

2) Which applications did you add/remove going from
SYSMark2001 to SYSMark2002? What makes the 2002
version "better" or more advanced than 2001?

SYSmark 2002 uses the latest versions of the exact
same applications used in SYSmark 2001. For example,
we upgraded all Office applications to Office XP and
also updated Photoshop and Windows Media Encoder.
SYSmark 2002 thus contains the latest shipping
version of the software from the software vendor.
SYSmark 2002 also has better usability features like
support for several non-English Operating Systems,
command line operation, easier installation and
improved error reporting. As mentioned above, the
benchmark is also a more balanced system-level
benchmark.
 

Snoop

Golden Member
Oct 11, 1999
1,424
0
76
Their explanation is completely plausible.
My problem is that it is implausible that every year the changes benefit the same company. To me it seems unlikely that every year Intel's platforms increase their performance, while their competitors' performance goes down. It would be fine if this happened for one or two revisions of a benchmark, but it seems that EVERY revision that I have bothered to look at increases the standardized scores of Intel chips while reducing them for AMD chips.
By the law of averages, I would think that out of Sysmark 2002, 2001, 2000, and 1998 at least one would have increased the normalized scores of a non-Intel chip compared to an Intel chip. This is either incredible luck for Intel, or this benchmark is weighted toward the strongpoints of certain brand chips.
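The "law of averages" argument can be made concrete with a back-of-the-envelope sketch. The big assumption here, which is not in the thread, is a coin-flip model: each new revision is independent and equally likely to shift the relative scores toward either vendor. Real revisions are not independent, so this is only a rough gauge.

```python
def prob_all_same_direction(n_revisions, p_favor=0.5):
    """Chance that all n independent revisions shift toward one given vendor."""
    return p_favor ** n_revisions

# Three transitions between the four versions: 98 -> 2000 -> 2001 -> 2002.
three_steps = prob_all_same_direction(3)   # 0.125
# If each of the four versions is counted as its own trial instead.
four_steps = prob_all_same_direction(4)    # 0.0625

print(f"3 revisions, all toward Intel: {three_steps:.4f}")
print(f"4 revisions, all toward Intel: {four_steps:.4f}")
```

Under this model the chance of every revision going Intel's way is between about 6% and 13%, depending on how the revisions are counted, so more data points would be needed before chance could be ruled out on these numbers alone.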
 

lambasa

Member
Mar 30, 2002
60
0
0
Their explanation is completely plausible. My problem is that it is implausible that every year the changes benefit the same company. To me it seems unlikely that every year Intel's platforms increase their performance, while their competitors' performance goes down. It would be fine if this happened for one or two revisions of a benchmark, but it seems that EVERY revision that I have bothered to look at increases the standardized scores of Intel chips while reducing them for AMD chips.

I believe there is a very understandable phenomenon allowing Intel to gain vs. AMD in subsequent versions. Historically Intel, due to their large size, has been able to dictate the code optimization standards for x86 code. Based on their microarchitecture, they write a performance optimization guide. This guide is generally used by compiler writers (and a few assembly coders) to optimize their code. Compilers based on the new guide might only be released months after the new microarchitecture is released as a CPU. Applications which use this compiler might take months to over a year to show the change. Benchmarks which use these applications might take a few more months. I believe this is why you are seeing Sysmark 2002 showing significant improvements for P4 systems: the optimizations are finally making it in.

AMD, due to their smaller market share, cannot dictate code optimizations to standard (industry-wide) compilers. They are forced to design their CPUs in a way that performs well on a large subset of the existing code. Intel can afford to get hit with a few penalties up front, but rely on "well-behaved" code to be shown in their best light.

This trend has been seen time and time again... Pentium, PPro, now P4. I would not bet against seeing the same behavior on Intel's next new microarchitecture.
 

imgod2u

Senior member
Sep 16, 2000
993
0
0
Well, why is it important? It's a synthetic benchmark. This is why it is synthetic, in that it has code that is never commonly used. Is it biased? Maybe. But then again, what code isn't biased? You'll always have code that is favorable (or rather, works better) towards one architecture. That's life. The point is not whether it's favorable but whether it's actually important. Does Sysmark give a good indication of the performance your average consumer should expect? If not, then it's invalid; if so, then it doesn't matter what kind of optimizations it has. Try to keep some perspective here, people. Benchmarks were not invented to help you fanboys bitch at each other; they're supposed to show relative performance in commonly used applications. If one processor does better than another at a benchmark that truly represents commonly used applications, then it shouldn't matter what specialty code it uses. The person who actually uses the application's not gonna care why it runs faster on a P4 or Athlon.
Is Sysmark a good indication of commonly used applications? I don't think so. It is a simulation after all, and not a very good one. However, this "it's unfair because it uses SSE2" argument is pure BS. To everyone who says only non-SSE2 benchmarks are "fair", I say we should expand that to "non-x87 intensive benchmarks are fair". After all, x87 is an extension of the x86 ISA and software needs to be specifically written to use it. Coincidentally, it is also why the Athlon performs so well, because of its strong x87 FP power. I suppose that's unfair because it's not "pure" x86 code?
 

lambasa

Member
Mar 30, 2002
60
0
0
imgod2u:

What do you consider synthetic? While not a real application, Sysmark is comprised of a suite of real applications. BAPCo has attempted to characterize a standard (for an average consumer) workload using these applications. I would not throw BAPCo into the synthetic benchmark club, where I think benchmarks like Sandra belong.

I agree with you that using SSE2 (and other ISA extensions) is "fair" if real consumers get the same benefit as the benchmark. If the benchmark is based off real applications, they will.
 

andreasl

Senior member
Aug 25, 2000
419
0
0
Just reactivating this thread... Apparently some new issues about Sysmark have come up. Van Smith will claim that he has broken this story, but I would like to make you guys aware of a post made by Dean Kent (the RWT guy) on Aces Forum before Van Smith published his article. Find Dean's post here.

I really don't like linking anything to Van's, but if you read through the rhetoric you may find that the facts themselves are hard to dispute. I would expect more publications to publish such articles within the next few days. Even Anand hinted that he was investigating issues with Sysmark 2002 and left it out of his latest XP2600+ article for that reason. I think whoever contacted Van and Dean probably also contacted Anand and probably others as well.

Van Smith's article

I would also recommend that if you want a balanced view of what is going on here you should listen to what Dean Kent has to say over probably anyone else (except perhaps Anand, I hope he publishes something about this soon). He will also publish an article (series even I think) about this issue.
 

MadRat

Lifer
Oct 14, 1999
11,961
278
126
Could this latest attack on BAPCo be timed to coincide with the class-action lawsuits against Intel and HP?
 

sandorski

No Lifer
Oct 10, 1999
70,616
6,170
126
BAPCo's answers to Evan's questions are the complete and absolute truth. BAPCo uses the exact same apps in 2002 as they did in 2001. However, it seems that what they do with those apps has changed dramatically.

Photoshop, for instance. In 2001 AMD won 8 tests performing certain tasks. 2002 no longer uses those tests (they were all dropped), but it kept all the rest, which the P4 won. 2002 also introduced a whole bunch of new Photoshop tests, all of which happen to favour the P4. If you want the details you'll need to go to the link given by Andreasl. Now some may not think anything is afoot here, but it sure seems strange that though Photoshop is essentially the same, suddenly the filters that favored the competition (it should be noted that BAPCo is Intel, for all intents and purposes) are removed.

Hmm, I suppose people have just quit using those filters, they went out of style or something.
:rolleyes: