Memory bandwidth tests... any real differences (PC5300 vs. PC8888)

graysky

Senior member
Mar 8, 2007
796
1
81
Common sense tells you that higher memory bandwidth should mean faster results, right? I set out to put this thought to the test by looking at just two different memory dividers on my o/c'ed Q6600 system. At a FSB of 333 MHz, the slowest and fastest dividers I could run are:

1:1 a.k.a. PC5300 (667 MHz)
3:5 a.k.a. PC8888 (1,111 MHz)

CPU-Z Screenshots

Just for reference, as they relate to DDR2 memory:
PC4300=533 MHz
PC5300=667 MHz
PC6400=800 MHz
PC7100=900 MHz
PC8000=1,000 MHz
PC8500=1,066 MHz
PC8888=1,111 MHz
PC10600=1,333 MHz

The highest divider is 1:2 aka PC10600 (1,333 MHz) and it just wasn't stable with my hardware @ 333 MHz.
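
For anyone who wants to check where these numbers come from, here is a minimal sketch in Python. It assumes the usual DDR2 arithmetic: the DRAM clock is the FSB base clock scaled by the divider (written FSB:DRAM), DDR transfers twice per clock, and a 64-bit module moves 8 bytes per transfer. The function names are made up for illustration and do not come from any tool.

```python
def ddr2_data_rate(fsb_mhz, fsb_part, dram_part):
    """Effective DDR2 rate in MT/s for an FSB:DRAM divider such as 3:5."""
    dram_clock_mhz = fsb_mhz * dram_part / fsb_part
    return dram_clock_mhz * 2  # double data rate: two transfers per clock


def peak_mb_per_s(data_rate):
    """Theoretical peak of one 64-bit module: 8 bytes per transfer."""
    return data_rate * 8


FSB_MHZ = 333.34  # base clock used in these tests

for fsb_part, dram_part in [(1, 1), (3, 5), (1, 2)]:
    rate = ddr2_data_rate(FSB_MHZ, fsb_part, dram_part)
    print(f"{fsb_part}:{dram_part} -> DDR2-{rate:.0f}, "
          f"about {peak_mb_per_s(rate):.0f} MB/s per module")
```

The data rates round to 667, 1,111 and 1,333, and the MB/s figures land close to the PC5300, PC8888 and PC10600 labels, which is where those names come from.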

All other BIOS settings were held constant:
FSB = 333.34 MHz and multiplier = 9.0 which gives an overall core rate of 3.0 GHz.
DRAM voltage was 2.25V and timings were 5-5-5-15-4-30-10-10-10-11.

You can think of memory bandwidth as the diameter (size) of your memory's pipe. Quite often, the pipe's diameter isn't the bottleneck for a modern Intel-based system; it is usually much larger than the information flow to/from the processor. Think of it this way: if you can only flush your toilet twice per minute, it doesn't matter if the drain pipe connecting your home to the sewer is 3 inches around, or 8 inches around, or 18 inches around. The rate-limiting step in removing water from your home is the toilet flushing/recycling and the pull of gravity, not the size of your drain line. The same is true for memory bandwidth.

After seeing the data I generated on a quad core @ 3.0 GHz, I concluded that this toilet analogy is pretty true: the higher memory bandwidth gave more or less no appreciable difference for real world applications. Shocked? I was.

Further, I should point out that in order for my system to run stable in PC8888 mode @ a FSB of 333, I had to boost my NB vcore two notches and raise my ICH to the max (both of which the BIOS colored red, meaning "high risk"). The increased voltage means more heat production and greater power consumption -- not worth it for the small gains realized, in my opinion. Anyway, the test details and results are below if you want to read on.

Relevant test hardware:

Motherboard: Asus P5B-Deluxe (BIOS 1215)
CPU: Intel C2Q - Q6600 (B3 revision)
Memory: Ballistix DDR2-1066

"Real-World" Application Based Tests

I chose the following apps: lameenc, x264, winrar, and the trial version of Photoshop CS3. I ran these tests on a freshly installed Windows XP Pro SP2 machine.

Lame version 3.97 - Encoded the same test file (about 60 MB wav) with these commandline options:
lame -V 2 --vbr-new test.wav
(which is equivalent to the old --alt-preset fast standard) a total of 8 times and averaged play/CPU data as the benchmark.

x264 version 0.55.663 - Ran a 2-pass encode on the same MPEG-2 (720x480 DVD source) file a total of 5 times and averaged the results. Without getting into too much detail, the benchmark is 1,749 frames @ 23 fps. Based on these numbers, I reported the time it would take to encode 215,784 frames (which is your average 2.5 h of video @ 23 fps). Why did I do this? The differences over just 1,749 frames were too small to see.

Shameless promotion --> you can read more about the x264 Benchmark at this URL which contains results for hundreds of systems. You can also download the benchmark and test your own machine.

RAR version 3.62 - rar.exe ran my standard backup batch file which generated about 1.09 GB of rars (1,654 files in total). Here is the commandline used:

rar a -u -m0 -md2048 -v51200 -rv5 -msjpg;mp3;tif;avi;zip;rar;gpg;jpg "E:\Backups\Backup.rar" @list.txt
where list.txt is a list of all the dirs I want it to back up. Benchmark results are an average of two runs timed with a stopwatch.

Trial of Photoshop CS3 - The batch function in PSCS3 was used to do three things to a total of twenty-nine 10.1 MP jpeg files:

1) bicubic resize 10.1 MP to 2.2 MP (3872x2592 --> 1800x1200) which is the perfect size for a 4x6 print @ 300 dpi.
2) unsharp mask filter (60 %, 0.8 px radius, threshold 12)
3) saved the resulting files as a quality 8 jpg.

Benchmark results are an average of two runs timed with a stopwatch.

"Synthetic" Application Based Tests

Just two of these were chosen to illustrate a point about theoretical gains vs. real world gains. Actually, I did SuperPI for the hell of it. WinRAR served to illustrate that point.

SuperPI / mod1.5 XS - The 16M test was run twice, and the average of the two runs is the benchmark.

WinRAR version 3.62 - If you hit alt-B in WinRAR, it'll run a synthetic benchmark. This was run twice (stopped after 100 MB) and is the average of two runs.

Raw Data - "Real-World" Apps
Lameenc play/cpu (average 8 runs) @ PC5300: 30.7935
Lameenc play/cpu (average 8 runs) @ PC8888: 30.8045
Result: PC8888 is 0.04 % faster

x264 time to encode 2.5 h DVD @ PC5300: 01:48:54
x264 time to encode 2.5 h DVD @ PC8888: 01:46:14
Result: PC8888 is 2.5 % faster

rar.exe back-up (average 2 runs) @ PC5300: 45 sec
rar.exe back-up (average 2 runs) @ PC8888: 44 sec
Result: PC8888 is 2.2 % faster

Photoshop CS3 Trial batch (average 2 runs) @ PC5300: 33 sec
Photoshop CS3 Trial batch (average 2 runs) @ PC8888: 33 sec
Result: PC8888 is 0.0 % faster

So stop right here and ask yourself if a 2-3 % gain is worth the higher voltage and heat.

Raw Data - "Synthetic" Apps

SuperPI/16M test (average 2 runs) @ PC5300: 8 m 8.546 s
SuperPI/16M test (average 2 runs) @ PC8888: 7 m 33.328 s
Result: PC8888 is 7.8 % faster

Winrar internal benchmark (average 2 runs) @ PC5300: 1,515 KB/s
Winrar internal benchmark (average 2 runs) @ PC8888: 2,079 KB/s
Result: PC8888 is 37.2 % faster

...but who uses their system exclusively for running internal and synthetic benchmarks? Recall that for my 1.09 GB backup, I only gained about 2 % doing "real work" by using the higher divider. Hard drives are notorious bottlenecks in systems and serve to nullify any memory bandwidth increases. In this case the 37 % theoretical increase translated into only a 2 % "real world" increase, likely limited by how fast the hard drive and rar can read/write the data. Again, this seems kinda wasteful to me.
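
For anyone who wants to redo the percentages, here is a small Python sketch that recomputes them from the raw data listed above. It treats "faster" as a ratio of speeds: slow time divided by fast time for timed tests, and fast rate divided by slow rate for rate-based tests. That convention is an assumption about how the figures were derived, and the helper names are made up for illustration.

```python
def to_seconds(h=0, m=0, s=0.0):
    """Convert hours/minutes/seconds to seconds."""
    return h * 3600 + m * 60 + s


def pct_faster_from_times(slow_s, fast_s):
    """Speed-up in percent when the metric is elapsed time (lower is better)."""
    return (slow_s / fast_s - 1) * 100


def pct_faster_from_rates(slow_rate, fast_rate):
    """Speed-up in percent when the metric is a rate (higher is better)."""
    return (fast_rate / slow_rate - 1) * 100


# Raw data from the post: PC5300 value first, PC8888 value second.
results = {
    "lame play/cpu": pct_faster_from_rates(30.7935, 30.8045),
    "x264 2.5 h encode": pct_faster_from_times(
        to_seconds(1, 48, 54), to_seconds(1, 46, 14)),
    "rar.exe backup": pct_faster_from_times(45, 44),
    "Photoshop batch": pct_faster_from_times(33, 33),
    "SuperPI 16M": pct_faster_from_times(
        to_seconds(m=8, s=8.546), to_seconds(m=7, s=33.328)),
    "WinRAR benchmark": pct_faster_from_rates(1515, 2079),
}

for name, pct in results.items():
    print(f"{name}: PC8888 is {pct:.2f} % faster")
```

Dividing the time saved by the slower time instead gives slightly smaller figures for the timed tests (2.2 % rather than 2.3 % for the rar backup), so small differences in the last digit come down to which convention is used.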

I will admit that there might be special cases where running at high memory dividers may produce more substantial gains: apps such as folding@home or seti@home, etc. may benefit from the higher memory bandwidth since they tend to make exclusive use of the system memory bandwidth and rely much less on the hard drive. I have no data to back this up though. Also lacking in my experiments are any game data. I'd be interested in knowing if the higher bandwidth can be leveraged by game engines such as UT3, Crysis, etc., but I didn't look at these here either.

Finally, since I held everything else constant, I didn't look at the tighter timings in 1:1 mode that people can often use which may give additional gains. For example, I can get away with 3-3-3-9 @ 1:1 vs. the slower 5-5-5-15 @ 3:5 with this memory.

Anyway, I hope you found this useful and maybe this will inspire someone else to look at the gaps pointed out above (and the gaps I haven't thought of too!)
 

graysky

Senior member
Mar 8, 2007
796
1
81
Yeah, I read about how they are more dependent on the DRAM speed... this result is just for a q6600...
 

firewolfsm

Golden Member
Oct 16, 2005
1,848
29
91
And these are two extremes, 667 vs. 1100... for those of us straining our motherboards or RAM for 50-100 MHz, it's completely pointless.

I do remember seeing anand reviews with a much larger difference. I think games showed up to 10% improvements.
 

JustaGeek

Platinum Member
Jan 27, 2007
2,827
0
71
Great job, graysky!

Your testing confirms my "little theory" that the actual bandwidth and the overall throughput are more dependent on the FSB frequency than the RAM speed itself. The memory must only be able to accommodate it, regardless of its frequency.

BTW, I believe that DDR2 667 is called PC 5300, correct...?
 

bfdd

Lifer
Feb 3, 2007
13,312
1
0
Originally posted by: JustaGeek
Great job, graysky!

Your testing confirms my "little theory" that the actual bandwidth and the overall throughput are more dependent on the FSB frequency than the RAM speed itself. The memory must only be able to accommodate it, regardless of its frequency.

BTW, I believe that DDR2 667 is called PC 5300, correct...?

and ddr2 1066 is PC-8500, this guy just has his numbers all jumbled up =)
 

graysky

Senior member
Mar 8, 2007
796
1
81
@bfdd and justageek - good catch, guys. My BIOS actually reported them as such! I corrected it - thanks!
 

21stHermit

Senior member
Dec 16, 2003
927
1
81
graysky,

Your memory tests are counterintuitive, hence not what the "experts" base their "expert opinions" on. This is another example of "One test is worth One-thousand Expert Opinions."

Thanks for all your efforts. :beer:

Hermit
 

21stHermit

Senior member
Dec 16, 2003
927
1
81
Originally posted by: JustaGeek
Your testing confirms my "little theory" that the actual bandwidth and the overall throughput are more dependent on the FSB frequency than the RAM speed itself. The memory must only be able to accommodate it, regardless of its frequency.

While I overall agree, I've not seen this scale linearly. IOW, a jump in FSB from 1066 to 1333 won't yield anywhere near a 25% performance boost; you'd be lucky to get 5%. That says to me the CPU is the limiter and it's not being starved for memory.

Hermit

 

wgoldfarb

Senior member
Aug 26, 2006
239
0
0
Graysky

Great info! This helps newbies like me focus our efforts on other areas of overclocking, now that we know that extracting every last bit of bandwidth out of the RAM will probably not pay off in the end.

Now for the next part of this question: I have read some posts suggesting that tighter timings might have as much of an impact as higher frequencies. Have you looked at the same real world gains with changes in timings? For example, in your case, with a FSB at 333, the data suggests that running DDR800 RAM at 1:1 (DDR2 667) will not result in much of a performance loss. Yet, by doing this you may be able to get tighter timings and, ironically, actually improve performance by running your RAM at a lower speed. So, testing two very different sets of timings at the same "low" frequency (667 in this case) could be the other side of this coin. Assuming there is some performance gain from the tighter timings, the argument to run RAM at lower speeds becomes even stronger.

If there is no performance gain from the tight timings, well, maybe we can save some cash next time we buy RAM ;)
 

JAG87

Diamond Member
Jan 3, 2006
3,921
3
76
Graysky, there is no sense in having memory bandwidth when you have no interconnect bandwidth. You are pretty much maxing out a 333 MHz FSB with 333 MHz memory. If you want to see the 555 MHz memory shine you need to increase the FSB, otherwise the RAM is being bottlenecked by it.

But if you increase the FSB, the chipset strap will increase, and the latencies will increase, and it's all back to square one. So deep down you have the correct idea, but your attempt to prove it is very biased.
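
A rough back-of-the-envelope check of this argument, sketched in Python. It assumes the standard figures for this platform: a 64-bit (8-byte) quad-pumped FSB, 64-bit DDR2 channels that transfer twice per clock, and dual-channel operation, which the thread does not state explicitly. These are theoretical peaks, not measured throughput, and the function names are made up for illustration.

```python
BYTES_PER_TRANSFER = 8  # 64-bit bus, assumed for both the FSB and each DRAM channel


def fsb_peak_mb_s(base_clock_mhz):
    """Quad-pumped FSB: four transfers per base clock."""
    return base_clock_mhz * 4 * BYTES_PER_TRANSFER


def dram_peak_mb_s(dram_clock_mhz, channels=2):
    """DDR2: two transfers per clock, per channel (dual channel assumed)."""
    return dram_clock_mhz * 2 * BYTES_PER_TRANSFER * channels


FSB_BASE_MHZ = 333.34
fsb = fsb_peak_mb_s(FSB_BASE_MHZ)

for label, dram_clock in [("1:1 (333 MHz DRAM)", FSB_BASE_MHZ),
                          ("3:5 (555 MHz DRAM)", FSB_BASE_MHZ * 5 / 3)]:
    dram = dram_peak_mb_s(dram_clock)
    print(f"{label}: DRAM peak {dram:.0f} MB/s vs FSB peak {fsb:.0f} MB/s, "
          f"ratio {dram / fsb:.2f}")
```

Under these assumptions the 1:1 setting already matches the FSB peak, and the 3:5 setting offers about 1.67 times what the FSB could carry to the CPU even in theory.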
 

hokiealumnus

Senior member
Sep 18, 2007
332
0
71
www.overclockers.com
Originally posted by: JAG87
Graysky, there is no sense in having memory bandwidth when you have no interconnect bandwidth. You are pretty much maxing out a 333 MHz FSB with 333 MHz memory. If you want to see the 555 MHz memory shine you need to increase the FSB, otherwise the RAM is being bottlenecked by it.

But if you increase the FSB, the chipset strap will increase, and the latencies will increase, and it's all back to square one. So deep down you have the correct idea, but your attempt to prove it is very biased.

From what you're saying, it appears the quad-pumped FSB number is a complete hoax and no board runs above ~500MHz on its FSB. Is that a correct interpretation? Please source as this would be very interesting to read about.
 

harpoon84

Golden Member
Jul 16, 2006
1,084
0
0
Interesting results! I think most enthusiasts realise that C2D performance only scales marginally with faster memory; numerous tests have come to that conclusion and yours is no different. Winrar is an obvious exception of course, so people constantly compressing/decompressing large files may want to get faster memory. :)

Hmm, I might dabble in a bit of testing on my E4400. I know my Corsair VS-667 tops out at 1000MHz 4-4-4-12 so I'll use a 250MHz FSB (250 x 10 / 2.5GHz) for testing purposes. So the test will involve DDR2-500 @ 1:1 and DDR2-1000 @ 1:2.

Should I use optimal timings for the 1:1 ratio? It will easily run @ 3-3-3-9 @ 500MHz. I know for 'scientific testing' purposes I should keep it at 4-4-4-12... opinions?
 

JAG87

Diamond Member
Jan 3, 2006
3,921
3
76
Originally posted by: hokiealumnus
Originally posted by: JAG87
Graysky, there is no sense in having memory bandwidth when you have no interconnect bandwidth. You are pretty much maxing out a 333 MHz FSB with 333 MHz memory. If you want to see the 555 MHz memory shine you need to increase the FSB, otherwise the RAM is being bottlenecked by it.

But if you increase the FSB, the chipset strap will increase, and the latencies will increase, and it's all back to square one. So deep down you have the correct idea, but your attempt to prove it is very biased.

From what you're saying, it appears the quad-pumped FSB number is a complete hoax and no board runs above ~500MHz on its FSB. Is that a correct interpretation? Please source as this would be very interesting to read about.


Please install Everest, go into the bios and try it yourself, like I have been doing for the past 2 years of my life.

I am running DDR2-800, yet going from 1066 FSB to 1600 FSB results in a 35% boost in read/write/copy for my memory, and obviously kills my latency (from 50 ns to 70 ns because of the chipset strap).

Shouldn't 1066 FSB, since it's quad pumped, provide more than enough bandwidth for DDR2-800? The base clock is what matters, my friend: having memory whose base clock is higher than the FSB base clock is totally futile. The other way around is futile too, but at least you don't have to pay for it.
 

JustaGeek

Platinum Member
Jan 27, 2007
2,827
0
71
With the FSB of 1066MHz, with the Memory Divider of 2:3, my Memory Bandwidth was ~5400MB/s (SANDRA).

With the FSB of 1300MHz, and the Divider of 16/13, my Memory Bandwidth is ~6600MB/s.

I have also tweaked the subtimings based on the G.Skill recommendations for my MB, with the basic timings being 4-4-4-12-2T, and the RAM frequency set to 800MHz.

Changing the frequency of memory itself to ~867MHz yielded an additional ~200MB/s, but I decided that it was not worth it.
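
Plugging the approximate figures above into a quick Python calculation puts the two effects side by side. This is only a sketch: the inputs are the rounded "~" values from this post, and the 6800 MB/s figure is simply 6600 plus the reported extra ~200 MB/s.

```python
def pct_change(before, after):
    """Relative change in percent."""
    return (after / before - 1) * 100


# FSB raised from 1066 to 1300 with the RAM held at 800 MHz
fsb_gain = pct_change(1066, 1300)
bw_gain_from_fsb = pct_change(5400, 6600)

# RAM raised from 800 to ~867 MHz with the FSB held at 1300
ram_gain = pct_change(800, 867)
bw_gain_from_ram = pct_change(6600, 6600 + 200)

print(f"FSB +{fsb_gain:.1f} % -> bandwidth +{bw_gain_from_fsb:.1f} %")
print(f"RAM +{ram_gain:.1f} % -> bandwidth +{bw_gain_from_ram:.1f} %")
```

With these numbers the measured bandwidth tracks the FSB increase almost one for one, while the RAM frequency bump returns well under half of its own percentage.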

I am running my memory at ~2.0V, BTW.

I believe that it is the reason why the newest motherboards are still equipped with the DDR2 RAM slots. The gains from using the faster DDR3 with the very relaxed timings are still not worth the price of DDR3 RAM today.
 

21stHermit

Senior member
Dec 16, 2003
927
1
81
Originally posted by: JustaGeek
With the FSB of 1066MHz, with the Memory Divider of 2:3, my Memory Bandwidth was ~5400MB/s (SANDRA).

With the FSB of 1300MHz, and the Divider of 16/13, my Memory Bandwidth is ~6600MB/s.

Since Sandra is a memory test utility, not an application, would these numbers show up in any real world application?

Thanks
Hermit

 

graysky

Senior member
Mar 8, 2007
796
1
81
Thanks for the feedback all. I don't have any plans to do more testing with latencies for example. Anyone else willing to do it?
 

harpoon84

Golden Member
Jul 16, 2006
1,084
0
0
OK guys, here are a few numbers of my own...

Specs:
C2D E4400 @ 2.5GHz (10 x 250) / P5B Deluxe / 2GB Corsair VS667 / 8800GTS 320 @ 660c/1060m

Tests were run at 1:1 DDR2-500 3-3-3-9 and 1:2 DDR2-1000 4-4-4-12

wPrime 32M:
DDR2-500: 37.531s
DDR2-1000: 37.50s
No statistically significant difference to speak of.

Aquamark 3 1024x768 0xAA/4xAF Max Details:
DDR2-500: 127.88fps
DDR2-1000: 144.07fps
Quite a substantial difference, though it was run at low res. Unfortunately the free version doesn't allow testing at higher resolutions, which I assume would result in a much closer result.

SuperPi 1M:
DDR2-500: 23.094s
DDR2-1000: 22.016s
A slight performance gain at best...

WinRAR benchmark:
DDR2-500: 959 KB/s
DDR2-1000: 1308 KB/s
Well this was to be expected, WinRAR has always shown significant performance gains from faster memory.

X3 Reunion rolling demo 1024x768 0xAA/0xAF Low Details
DDR2-500: 76.36fps
DDR2-1000: 89.305fps
I deliberately ran this at the lowest possible resolution/details to put as much emphasis on the CPU/memory as possible. Quite a significant gain there. 'Real world' resolutions tell a different story though...

X3 Reunion rolling demo 1680x1050 8xAA/16xAF High Details
DDR2-500: 63.532fps
DDR2-1000: 66.306fps
In 'real world' or GPU bound settings, the difference is much smaller, as expected.

Cinebench R10:
DDR2-500: 4716 CB / 3:07s
DDR2-1000: 4851 CB / 3:02s
Minor performance gain, a 5 second boost in rendering times over 3 minutes, nice to have I guess.

Well that wraps things up for now, I might do more tests later when I have time.

EDIT - Just downloaded grayskys x264 encoding benchmark, will give it a run now. :)
 

graysky

Senior member
Mar 8, 2007
796
1
81
Nice job harpoon! Your data seems to confirm my original assertion. I suspect I know the outcome of your x264 results... marginal :) Please let us know.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,994
15,948
136
You need to bench using the 1:1 divider both times, or the results are useless IMO. You need some PC8500 RAM (or something like that) and some slower RAM.
 

harpoon84

Golden Member
Jul 16, 2006
1,084
0
0
Originally posted by: graysky
Nice job harpoon! Your data seems to confirm my original assertion. I suspect I know the outcome of your x264 results... marginal :) Please let us know.

You guessed right... no prizes though! :p

Results: (avg over 5 runs)

DDR2-500 3-3-3-9
Pass 1 - 70.32 fps
Pass 2 - 17.386 fps

DDR2-1000 4-4-4-12
Pass 1 - 73.73 fps
Pass 2 - 17.854 fps

So approx 2.5 - 5% gains from a doubling in bandwidth. Not insignificant I guess, but nothing to lose sleep over. :)
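
To turn the per-pass fps into one overall number, here is a minimal Python sketch. It assumes both passes process all 1,749 frames of the benchmark clip (the frame count comes from the first post) and that total encode time is just the sum of the two passes. The function name is made up for illustration.

```python
FRAMES = 1749  # length of the benchmark clip, per the first post


def two_pass_seconds(pass1_fps, pass2_fps, frames=FRAMES):
    """Total time for a 2-pass encode, assuming each pass covers every frame."""
    return frames / pass1_fps + frames / pass2_fps


# (pass 1 fps, pass 2 fps), averaged over 5 runs
runs = {
    "DDR2-500 3-3-3-9": (70.32, 17.386),
    "DDR2-1000 4-4-4-12": (73.73, 17.854),
}

times = {name: two_pass_seconds(*fps) for name, fps in runs.items()}
slow = times["DDR2-500 3-3-3-9"]
fast = times["DDR2-1000 4-4-4-12"]

pass1_gain = (73.73 / 70.32 - 1) * 100
pass2_gain = (17.854 / 17.386 - 1) * 100
overall_gain = (slow / fast - 1) * 100

print(f"pass 1: +{pass1_gain:.1f} %, pass 2: +{pass2_gain:.1f} %, "
      f"whole encode: +{overall_gain:.1f} %")
```

Because the second pass is so much slower, it dominates the total, and the overall gain sits much closer to the pass 2 figure than to the pass 1 figure.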

Btw, for those interested, some further reading on this topic:
http://www.xbitlabs.com/articl...emory-guide.html#sect0
 

JustaGeek

Platinum Member
Jan 27, 2007
2,827
0
71
Originally posted by: Markfw900
You need to bench using the 1:1 divider both times, or the results are useless IMO. You need some PC8500 RAM (or something like that) and some slower RAM.

If you bench using the 1:1 divider, you are increasing the FSB significantly, so it is not the memory test, but the FSB and overall throughput test.

Again, the faster memory will accommodate the increased bandwidth generated by the faster FSB much more easily, but if you slow down the memory frequency by using the appropriate divider, the testing will not show any significant gains with the same FSB speed.

I think that graysky and harpoon84 have done an excellent job.
 

JustaGeek

Platinum Member
Jan 27, 2007
2,827
0
71
Originally posted by: 21stHermit
Originally posted by: JustaGeek
With the FSB of 1066MHz, with the Memory Divider of 2:3, my Memory Bandwidth was ~5400MB/s (SANDRA).

With the FSB of 1300MHz, and the Divider of 16/13, my Memory Bandwidth is ~6600MB/s.
Since Sandra is a memory test utility, not an application, would these numbers show up in any real world application?

Thanks
Hermit

SANDRA is a memory test utility, but also a CPU and other components testing utility.

IMO, everyone must find a happy medium between the speed and real life performance, as well as the longevity of all the components.

I have found that happy medium with my CPU running virtually trouble free at 2.925GHz, RAM at 800MHz and 2.0V, and the 7950 SC running at stock OC of 600MHz. Trying different other frequencies for my CPU, Video Card and RAM would help achieve better PCMark05, 3DMark06 and SANDRA results, but the games would be a little "stuttery", and other applications just "did not feel right" (missed double click, windows taking longer to close etc.)

These numbers will not impress the "speed demons" in this forum, but I am perfectly happy with, as you've said, my "real life applications' " performance.
 

graysky

Senior member
Mar 8, 2007
796
1
81
@justageek - thanks for the kind words. You're right, after all, it's YOUR system. If low temps and longevity are what are important to you, set up your hardware accordingly. If ultra performance and squeezing every MHz out of the thing are important to you... etc. :)