Memory bandwidth benchmarking?

Scipionix

Golden Member
May 30, 2002
1,408
0
0
I've been running Sandra on my Asus P4T533-C/P1.8A/PC1066 system and the memory benchmarks show bandwidth of 3.4GB/s, or about 80% of the 4.2GB/s theoretical maximum. Is this typical for the real world or is something wrong with my stuff? Oh, and does anyone have any ideas on how to cool my MCH? I think it could use a fan.
 

zephyrprime

Diamond Member
Feb 18, 2001
7,512
2
81
You can try this memory benchmark program that I've been working on. I got it off the web and modified it to suit my needs. You can download it here. I'd be interested in hearing your results.

link
 

Scipionix

Whatever this means to you . . .

SGI ex1: 40.723612ms = 804.643763mb/sec
SGI ex2: 25.738467ms = 1273.113908mb/sec
SGI ex3: 25.547102ms = 1282.650394mb/sec
SGI ex4: 31.662963ms = 1034.899995mb/sec

memcpy 40.538392ms = 808.320163mb/sec
memfill 12.268040ms = 2671.005387mb/sec
memfill 11.974427ms = 2736.498392mb/sec
memfill 11.986719ms = 2733.692184mb/sec
memfill 12.005436ms = 2729.430134mb/sec

memset 37.522087ms = 873.298965mb/sec


Zhi's memfill with rep movs 36.740703ms = 891.871882mb/sec
Zhi's memcpy with sse: 25.520003ms = 1284.012376mb/sec
AMD's memcpy with REP MOVSB: 38.189770ms = 858.030830mb/sec
AMD's memcpy with REP MOVSD: 38.267992ms = 856.276960mb/sec
AMD's memcpy with mmx & block prefetch: 24.035736ms = 1363.303353mb/sec
AMD's memcpy with sse & block prefetch: 24.155584ms = 1356.539340mb/sec
 

zzzz

Diamond Member
Sep 1, 2000
5,498
1
76
3.4 sounds about right for PC1066. I had a th7-2 which gave about the same when run at PC1066 speeds.
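For anyone wondering where the 4.2GB/s figure comes from, here is the arithmetic as a rough C sketch. It assumes two RDRAM channels, each 16 bits (2 bytes) wide, running at 1066 million transfers per second. That layout is my assumption about this kind of board, not something stated in the thread, and the function names are made up for illustration.

```c
#include <assert.h>

/* Peak rate in MB/s: channels x bytes per transfer x million transfers/sec.
 * The channel count and width passed in are assumptions, not measured values. */
double peak_mb_per_sec(int channels, int bytes_per_transfer,
                       double mega_transfers_per_sec)
{
    assert(channels > 0 && bytes_per_transfer > 0);
    assert(mega_transfers_per_sec > 0.0);
    return channels * bytes_per_transfer * mega_transfers_per_sec;
}

/* Fraction of the peak that a measured figure reaches (same unit for both). */
double efficiency(double measured, double peak)
{
    assert(peak > 0.0);
    return measured / peak;
}
```

With those assumptions, peak_mb_per_sec(2, 2, 1066.0) comes out a bit above 4.2GB/s, and efficiency(3400.0, peak) lands at roughly the 80% mentioned in the first post.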
 

zephyrprime

Hmmm, those results are really interesting. I used an algorithm provided by AMD, but it seems to work poorly on Intel processors. The program also uses an algorithm SGI wrote for P3s, and that one stinks too. I wonder what algorithm Sandra uses? Maybe it's a nontemporal qword mov. Hmmm....
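To show what a nontemporal store looks like in practice, here is a minimal sketch using compiler intrinsics. Note the swap: it uses the 16-byte SSE2 streaming store (`_mm_stream_si128`) rather than the 8-byte qword form mentioned above, and it falls back to plain memset when SSE2 isn't available. The function name is hypothetical, and nothing here is known to be what Sandra actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2_STREAM 1
#else
#define HAVE_SSE2_STREAM 0
#endif

/* Fill n bytes at dst with value, using cache-bypassing stores when possible. */
void fill_nontemporal(void *dst, unsigned char value, size_t n)
{
    unsigned char *p = (unsigned char *)dst;
    assert(dst != NULL);
#if HAVE_SSE2_STREAM
    /* Plain byte stores until p is 16-byte aligned, as the streaming store requires. */
    while (n > 0 && ((uintptr_t)p & 15u) != 0) {
        *p++ = value;
        n--;
    }
    __m128i v = _mm_set1_epi8((char)value);
    while (n >= 16) {
        _mm_stream_si128((__m128i *)p, v); /* nontemporal 16-byte store */
        p += 16;
        n -= 16;
    }
    _mm_sfence(); /* make the streamed data visible before returning */
#endif
    memset(p, value, n); /* leftover tail, or the whole buffer without SSE2 */
}
```

The idea is that a big fill never reads the data back soon, so bypassing the cache avoids pulling each line in just to overwrite it. Whether that helps on a given chipset is something only a benchmark can tell.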

Regarding your question about real world programs, it's quite scary but real world programs usually don't give a rat's ass about memory performance. These two lines:

memcpy 40.538392ms = 808.320163mb/sec
memfill 12.268040ms = 2671.005387mb/sec

describe the performance of the Visual C++ 6.0 standard C library routines for memory copying and memory filling. As you can see, they stink.
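In case it helps to see how a "ms = mb/sec" line like those gets produced, here is a minimal sketch in C. It is not the benchmark program from this thread; the function names are made up, and it uses the coarse standard `clock()` timer. Multiplying time by rate in the posted results gives about 32.77 million bytes for every test, so that is presumably the amount moved per run, but that is a guess.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Convert "bytes moved in ms milliseconds" to millions of bytes per second. */
double mb_per_sec(double bytes, double ms)
{
    assert(ms > 0.0);
    return (bytes / 1e6) / (ms / 1e3);
}

/* Average time in milliseconds for one memcpy of n bytes, over reps runs.
 * clock() has coarse resolution, so use a large n or many reps. */
double time_memcpy_ms(void *dst, const void *src, size_t n, int reps)
{
    assert(reps > 0);
    clock_t start = clock();
    for (int i = 0; i < reps; i++)
        memcpy(dst, src, n);
    clock_t end = clock();
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC / reps;
}
```

Feeding mb_per_sec the assumed 32,768,000 bytes and the 40.538392ms from the memcpy line gives back a figure very close to the posted one. An optimizing compiler may drop copies whose result is never read, so a real benchmark should check or use the destination buffer afterwards.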
 

Scipionix

I know as close to nothing as possible about programming, so what exactly is the problem? Are compilers not keeping up with hardware advances?
 

zephyrprime

"Are compilers not keeping up with hardware advances?"
Sorta. It's more like compiler makers are slackers or are poor.

Microsoft is a slacker. The PIII was out for like 2 years before they released a patch for Visual C++ that let it use SSE. On the other hand, the folks that make GCC are poor since they don't have any budget at all! Borland C++ is more concerned with preserving market share and fighting Microsoft than worrying about performance.

The situation is so bad that Intel has had to resort to releasing their own C++ compiler, but not many people use it. For me, it has some compatibility problems too.

However, compiler makers are only responding to their customers - i.e. software companies. Software companies rely more on hardware advances than on low level efficiency to achieve adequate performance. There's never enough time to dick around with assembler code for the sake of good performance.

Personally, I think software companies shouldn't bother with low level performance unless they really need to. Few programmers know how, and the code that is produced is too arcane, which makes it difficult to manage. However, compiler makers should be held to a higher standard since their code is depended upon by so many other products. Visual C++'s memfill code is a joke. It's just a while loop that writes stuff one byte at a time. What awesome 8-bit technology.
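To make the one-byte-at-a-time complaint concrete, here is a rough sketch of the two styles in C. This is not the actual Visual C++ library source; both function names are hypothetical and only illustrate the difference between a byte loop and a loop that writes 8 bytes per store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One byte per iteration: the style of loop described above. */
void fill_bytes(unsigned char *dst, unsigned char value, size_t n)
{
    assert(dst != NULL);
    while (n--)
        *dst++ = value;
}

/* Same result, but the bulk of the buffer is written 8 bytes per store. */
void fill_words(unsigned char *dst, unsigned char value, size_t n)
{
    /* Replicate the byte into all 8 lanes of a 64-bit word. */
    uint64_t word = 0x0101010101010101ULL * value;
    assert(dst != NULL);
    /* Byte stores until dst is 8-byte aligned. */
    while (n > 0 && ((uintptr_t)dst & 7u) != 0) {
        *dst++ = value;
        n--;
    }
    while (n >= 8) {
        memcpy(dst, &word, 8); /* usually compiled to one 64-bit store */
        dst += 8;
        n -= 8;
    }
    /* Remaining tail bytes. */
    while (n > 0) {
        *dst++ = value;
        n--;
    }
}
```

Both functions leave the buffer in the same state. The second one just issues about an eighth as many stores, which is the kind of cheap win a library routine could reasonably be expected to take.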

Also, I think most performance problems are more a matter of poor design and poor algorithm selection/invention than a lack of low level optimizations.