Memory bandwidth tests... any real differences (part 2)

graysky

Senior member
Mar 8, 2007
796
1
81
About 7 months ago I posted data comparing two memory dividers (1:1 and 3:5 @ 333 MHz) on my then Q6600/P965 based system and concluded that for the 67 % increase in memory bandwidth, the marginal gains in actual performance weren't worth the extra voltage/heat.

Since then I've upgraded my hardware to an X3360/P35 setup and wanted to revisit this issue. Again, I looked at two dividers at each of two FSB settings: one pair running @ 8.5x333=2.83 GHz, and another running @ 8.5x400=3.40 GHz:

333 MHz FSB:
1:1 a.k.a. PC2-5300 (667 MHz)
5:8 a.k.a. PC2-8500 (1,067 MHz)

400 MHz FSB:
1:1 a.k.a. PC2-6400 (800 MHz)
4:5 a.k.a. PC2-8000 (1,000 MHz)

I figured there would be a much greater difference in the 333 MHz FSB case since the memory bandwidth increased by 60 % vs. 25 % in the 400 MHz FSB case. All other BIOS settings were held constant with the exception of the divider (and the strap) and the given FSB. Subtimings were set to auto, so they could vary as managed by the board. I found this was required, since manually setting some of the subtimings led to either an incomplete POST or an unstable system.
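For anyone who wants to check the divider arithmetic, here is a minimal Python sketch. It assumes the divider is written FSB:DRAM and that the DDR2 rating is twice the real DRAM clock; the function names are made up for illustration.

```python
def ddr2_rating(fsb_mhz, fsb_part, dram_part):
    """Effective DDR2 rating (MT/s) for an FSB clock and an FSB:DRAM divider."""
    dram_clock = fsb_mhz * dram_part / fsb_part  # real DRAM clock in MHz
    return 2 * dram_clock  # DDR transfers data twice per clock


def percent_gain(base, faster):
    """Percentage increase going from base to faster."""
    return (faster - base) / base * 100


fsb_333 = 1000 / 3  # the nominal "333 MHz" FSB
low_333 = ddr2_rating(fsb_333, 1, 1)
high_333 = ddr2_rating(fsb_333, 5, 8)
print(round(low_333), round(high_333), round(percent_gain(low_333, high_333)))

low_400 = ddr2_rating(400, 1, 1)
high_400 = ddr2_rating(400, 4, 5)
print(round(low_400), round(high_400), round(percent_gain(low_400, high_400)))
```

The two printed lines should reproduce the 667/1,067 (60 %) and 800/1,000 (25 %) figures quoted above.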

The benchmarks were broken down into three categories:
1) "Real-World" Applications
2) 3D Games
3) Synthetic Benchmarks

The following "real-world" apps were chosen: x264, WinRAR, and the trial version of Photoshop CS3. All were run on a freshly installed copy of Windows XP Pro x64 SP2 w/ all relevant hotfixes. The 3D games were just Doom3 (an older game) and Crysis (a newer game). Finally, I threw in some synthetic benchmarks consisting of the WinRAR self test, Super Pi-mod, and Everest's synthetic memory benchmark. Here is an explanation of the specifics:

Trial of Photoshop CS3 - The batch function in PSCS3 v10.0.1 was used to process a total of fifty-six 10.1 MP jpeg files (226 MB in total):

1) bicubic resize 10.1 MP to 2.2 MP (3872x2592 --> 1800x1200) which is the perfect size for a 4x6 print @ 300 dpi.
2) smart sharpen (120 %, 0.9 px radius, more accurate, lens blur setting)
3) auto levels
4) saved the resulting files as a quality 10 jpg.

Benchmark results are an average of two runs timed with a stopwatch.

RAR version 3.71 - rar.exe ran my standard backup batch file, which generated about 955 MB of rars containing 5,210 files in total. Here is the command line used:
rar a -m3 -md4096 -v100m -rv40p -msjpg;mp3;tif;avi;zip;rar;gpg;jpg "f:\Backups\Backup.rar" @list.txt
where list.txt is a list of all the target files/dirs included in the backup set. Benchmark results are an average of two runs timed with a stopwatch.
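If you would rather not use a stopwatch, a rough sketch like the one below can time a command and average the runs. This is not what was used for the numbers in this post; the helper name `time_command` is hypothetical, and the demo command is only a placeholder so the script runs anywhere.

```python
import subprocess
import sys
import time


def time_command(cmd, runs=2):
    """Run cmd `runs` times and return the mean wall-clock time in seconds."""
    elapsed = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # raises if the command fails
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / len(elapsed)


if __name__ == "__main__":
    # Placeholder command; swap in the real benchmark command as a list of arguments.
    demo = [sys.executable, "-c", "pass"]
    print(f"{time_command(demo):.3f} s")
```

For an archiving job, any output left over from the previous run would presumably need to be cleaned up between runs so each run does the same work.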

x264 Benchmark HD - Automatically runs a 2-pass encode on the same 720p MPEG-2 (1280x720 DVD source) file four times in total. It contains two versions of x264.exe and runs both. The benchmark is the best three of four runs (FPS) converted to total encode time.
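The "best three of four, converted to encode time" step can be sketched as below. The frame count and FPS values are invented purely to show the arithmetic (they are not results from this test), and the function names are illustrative.

```python
def best_of(values, keep=3):
    """Mean of the `keep` highest values, i.e. drop the slowest runs."""
    top = sorted(values, reverse=True)[:keep]
    return sum(top) / len(top)


def encode_seconds(fps, frame_count):
    """Total time needed to encode frame_count frames at the given average fps."""
    return frame_count / fps


# Made-up numbers for illustration only, not measured data.
runs_fps = [20.0, 21.0, 19.0, 22.0]
avg_fps = best_of(runs_fps)
print(avg_fps, encode_seconds(avg_fps, 2100))
```

The same `best_of` helper would also cover the Crysis runs, which use the best three of four as well.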

Shameless promotion --> you can read more about the x264 Benchmark HD which contains results for hundreds of systems. You can also download the benchmark and test your own machine.

3D Games Based Benchmarks

Doom3 - Ran timeddemo demo1 a total of three times and averaged the fps as the result. Settings were 1,280x1,024, ultra quality with 8x AA.

Crysis - Ran the included "Benchmark_CPU.bat" and "Benchmark_GPU.bat", both of which run the pre-defined timedemo, looped four times. I took the best three of four (average FPS) and averaged them together as the benchmark. Settings were 1,024x768, very high for all (used the DX9 very high settings hack), and 2x AA.

"Synthetic" Application Based Tests

WinRAR version 3.71 - If you hit alt-B in WinRAR, it'll run a synthetic benchmark. This was run twice (stopped after 150 MB) and is the average of four runs.

SuperPI / mod1.5 XS - The 16M test was run twice, and the average of the two is the benchmark.

Everest v4.50.1330 Memory Benchmark - Ran this benchmark a total of three times and averaged the results.

Hardware specs:
D.F.I. LP LT P35-TR2 (BIOS: LP35D317)
Intel X3360 @ 8.5x400=3.40 GHz
Corsair Dominator DDR2-1066 (TWIN2X4096-8500C5DF)
2x 2 GB @ 5-5-5-15 (all subtimings on auto)

(tRD=8) @ 667 MHz (1:1) @ 2.100V
(tRD=7) @ 1,066 MHz (5:8) @ 2.100V
(tRD=8) @ 800 MHz (1:1) @ 2.100V
(tRD=6) @ 1,000 MHz (4:5) @ 2.100V

EVGA Geforce 8800GTS (G92) w/ 512 meg
Core=770 MHz
Shader=1,923 MHz
Memory=2,000 MHz
Note: the performance levels (tRD) are set automatically by the board which wouldn't POST if I manually tweaked them. Even though they're different, I still feel the data are valid since this is the only way I can run them. In other words, if I'm going to run the higher dividers, it'll be as such or it won't POST!

Without further ado, here are the data starting first with a 333 MHz FSB comparing the 1:1 vs. 5:8 divider (DDR2-667 vs. DDR2-1066):
8.5x333 @ 1:1 and 5:8

Here are the averaged data visualized graphically:
Averages graphed

Now on to the 400 MHz FSB comparing the 1:1 vs. 4:5 divider (DDR2-800 vs. DDR2-1000):
8.5x400 @ 1:1 and 4:5

And graphically:
Averages graphed

As you can see, there was nothing spectacular in either the real-world category or the 3D games category in comparison to the massive increase in memory bandwidth (shown on the graphs in red). In fact, I was surprised to see that there were really no gains by Doom3 and minimal gains by Crysis. This is probably due to the fact that the video card shoulders the burden of these games, with Doom3 being the lightweight of the two. As expected, the synthetic benchmarks did pick up on the larger bandwidth, but only in the case of the 400 MHz FSB did I see anything approaching the theoretical increase (14 % of 25 % vs 15 % of 60 %).
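To put the "14 % of 25 % vs 15 % of 60 %" comparison on one scale, here is a quick sketch that works out how much of the theoretical bandwidth increase actually showed up in the synthetic results. The function name is just for illustration.

```python
def realized_share(measured_gain_pct, bandwidth_gain_pct):
    """Share (in %) of the theoretical bandwidth gain that was actually measured."""
    return measured_gain_pct / bandwidth_gain_pct * 100


cases = [
    ("400 MHz FSB, 1:1 vs. 4:5", 14, 25),
    ("333 MHz FSB, 1:1 vs. 5:8", 15, 60),
]
for label, measured, theoretical in cases:
    print(f"{label}: {realized_share(measured, theoretical):.0f} % of the theoretical gain")
```

In other words, roughly half of the extra bandwidth was realized at 400 MHz FSB, but only about a quarter at 333 MHz FSB.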

If you read my first memory bandwidth post, perhaps the same conclusions can be drawn from these new data. One thing I'll add is that this new MB doesn't require extra voltage like my older P5B-Deluxe did to run the higher dividers, so it's not producing that much more heat. That said, I'm actually running the system with the 4:5 divider, since things seem to feel faster to me (windows opening, responsiveness, etc.) which are all unfortunately intangibles I can't measure.
 

n7

Elite Member
Jan 4, 2004
21,281
4
81
Thanx for re-sharing your discovery.

However, I'm seeing a severe lack of info on what settings were used for this testing.

Of particular concern are:
Primary timings
Subtimings, especially tRD.

You can show us those with Everest, or MemSet.
CPU-Z doesn't help, as it shows only primary timings.

If you are leaving all the subtimings on auto....eh, let's just say you're generally defeating the purpose of running a higher RAM ratio, especially on a motherboard like your DFI, where tRD can be tightened at higher ratios, improving performance.
 

Alaa

Senior member
Apr 26, 2005
839
8
81
Off topic-

What's the difference between running 266 MHz, 333 MHz and 400 MHz? Are there any real gains from decreasing the multi and increasing the FSB?
 

graysky

@n7 - Sorry about that dude, for some reason that paragraph was omitted (edited). BTW, I tried monkeying w/ my subtimings but it wouldn't POST. If you have a guide you can point me to I'm all ears :) Also, I saw the following in your sig
4x2 GB Mushkin 996580 @ DDR2-1050 5-5-5-18
Can you post a screenshot of everest displaying your subtimings?

BTW, here is a screenshot of mine to augment what I put in the first thread.
 

lopri

Elite Member
Jul 27, 2002
13,314
690
126
Any specific reason you're running Doom 3 with 8x AA? Doom 3 is highly and consistently sensitive to memory performance, if my memory serves me right. :D The differences will be small, but consistent nonetheless.
 

graysky

@lopri - as I recall, it looks better. I haven't played d3 in a loooooong time though :)
 

n7

Everest subtimings details for DDR2-1050 5-5-5-18 tRD 5.

Usually, there's nothing wrong with using auto for the majority of subtimings.
But tRD [tightening] has a major effect on performance, & tRFC [loosening] has a major effect on how high you can clock 2 GB dimms.

When you run auto for different RAM ratios, subtimings automatically change, which tends to somewhat skew benchmark results.

When you did your previous testing, you used the same mobo i used to have, the P5B-D.
The subtimings you could control in the bios for that mobo are limited, & important ones like tRD weren't available, which meant when you were benching, it was changing depending on the ratios you were running.

Anyway, what we need to see is a screenshot of your subtimings @ 1:1, & then one @ 5:6, as the subtimings will be different.
I'm mainly concerned with what tRD is for the different ratios.

Your mobo should be very similar to mine.
tRD is worded as "Performance Level", or in the bios, as "Performance LVL (Read Delay)".

You can lower it, &/or also "Enable" "Read Delay Phase Adjust", which effectively results in a -1 for each phase...again, more details in the AT reviews.
You don't usually need to mess with many of them, as auto is usually fine for most, but AT's article here explains them in great detail.
http://www.anandtech.com/mb/showdoc.aspx?i=3129

I guess what i'm getting at, is that i'm just not a huge fan of the 1:1 myth is all.

If i'm seeing benchmarks on 1:1 vs. higher, i'd like very accurate results, & using auto subtimings, at least tRD, results in less than ideal accuracy.

Now in fairness, to really compare 1:1 vs. higher, you need to be tightening timings at lower speeds, since that's generally doable.

I see comments all over claiming 1:1 offers the best or equivalent to higher ratios performance, when that's not really true.

I do agree that the real world performance improvement is very small, but when overclocking, the difference between 3 GHz & 3.2 GHz is also very small, yet you never see people suggesting running the slower speed...
RAM ratios offer even less of a performance improvement, but i think most overclockers like squeezing all possible performance out of their systems within reason, & running say 4:5 vs. 1:1 is generally one way to get that little bit more.

The main exception comes with the nForce chipset, where 1:1 tends to be best, simply due to the ability to run 1T, something Intel chipsets can't truly achieve.

Intel designs for their own chipsets though, & bandwidth has always been a strong point on theirs.
That's why you see their own generic motherboards giving the option of 333/400 MHz DDR2 with a 266 MHz FSB, e.g...they know very well 1:1 doesn't provide the best performance.

Same reason we're seeing 1:2 being the ideal ratio of choice when using DDR3-based systems.

I have nothing against 1:1.
It's regarded as the easy ideal default when overclocking.

But it's not the best (excepting nForce platforms or 975X...we'll leave that one alone), & that's why 1:1 is going to be basically extinct with DDR3.





 

graysky

@n7 - can't get mine to POST on anything but performance level 7.
Also, which strap are you using? I'm 333/800. Finally, can you point me to a resource that can explain the implications of changing the strap? I just dropped mine down to either 266/800 or 266/667 (can't remember which) and am now at performance level 6 and a higher divider (4:5) so the DRAM is now @ 1,000 MHz.
 

n7

I edited my post bigtime to convey what i'm trying to get at (though again, i do appreciate your benches, even though i'd say i disagree with your conclusion).

I am lucky to have a CPU with a 10x multi, which allows me to run a low FSB (good for keeping voltages low), & high RAM ratios.

I run 2:3, or 266/800.
I'm not sure if you'd have that option in your bios, but you won't be able to run that @ 400 MHz FSB & that RAM though, at least not stably.

That said, you have very nice RAM...it should do DDR2-1100 if you keep tRFC loose, or looser.

Anyway, can you check "Performance Level" @ 1:1 vs. 5:6 as i mentioned in the above post?

That's mainly what i'm trying to determine.
 

n7

Ah sweet, i see you can do 266/800.
266/800 is 2:3 btw.
266/667 is 4:5.
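A small sketch of how the strap notation maps to a divider, for anyone following along. It assumes the strap is written as FSB clock / DDR2 rating, that the real DRAM clock is half the DDR2 rating, and that the nominal clocks sit on a 33.33 MHz grid; `strap_ratio` is a made-up helper name.

```python
from fractions import Fraction


def strap_ratio(fsb_mhz, ddr2_rating):
    """Reduce an FSB / DDR2-rating strap to an FSB:DRAM ratio."""
    step = 100 / 3  # nominal clocks are multiples of 33.33 MHz
    fsb_steps = round(fsb_mhz / step)
    dram_steps = round((ddr2_rating / 2) / step)  # real DRAM clock is half the rating
    return Fraction(fsb_steps, dram_steps)


for fsb, rating in [(266, 800), (266, 667), (333, 800)]:
    ratio = strap_ratio(fsb, rating)
    print(f"{fsb}/{rating} -> {ratio.numerator}:{ratio.denominator}")
```

Under those assumptions 266/800 comes out as 2:3 and 266/667 as 4:5, matching the above, and 333/800 comes out as 5:6.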
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: n7
Everest subtimings details for DDR2-1050 5-5-5-18 tRD 5.

Usually, there's nothing wrong with using auto for the majority of subtimings.
But tRD [tightening] has a major effect on performance, & tRFC [loosening] has a major effect on how high you can clock 2 GB dimms.

When you run auto for different RAM ratios, subtimings automatically change, which tends to somewhat skew benchmark results.

When you did your previous testing, you used the same mobo i used to have, the P5B-D.
The subtimings you could control in the bios for that mobo are limited, & important ones like tRD weren't available, which meant when you were benching, it was changing depending on the ratios you were running.

Anyway, what we need to see is a screenshot of your subtimings @ 1:1, & then one @ 5:6, as the subtimings will be different.
I'm mainly concerned with what tRD is for the different ratios.

Your mobo should be very similar to mine.
tRD is worded as "Performance Level", or in the bios, as "Performance LVL (Read Delay)".

You can lower it, &/or also "Enable" "Read Delay Phase Adjust", which effectively results in a -1 for each phase...again, more details in the AT reviews.
You don't usually need to mess with many of them, as auto is usually fine for most, but AT's article here explains them in great detail.
http://www.anandtech.com/mb/showdoc.aspx?i=3129

I guess what i'm getting at, is that i'm just not a huge fan of the 1:1 myth is all.

If i'm seeing benchmarks on 1:1 vs. higher, i'd like very accurate results, & using auto subtimings, at least tRD, results in less than ideal accuracy.

Now in fairness, to really compare 1:1 vs. higher, you need to be tightening timings at lower speeds, since that's generally doable.

I see comments all over claiming 1:1 offers the best or equivalent to higher ratios performance, when that's not really true.

I do agree that the real world performance improvement is very small, but when overclocking, the difference between 3 GHz & 3.2 GHz is also very small, yet you never see people suggesting running the slower speed...
RAM ratios offer even less of a performance improvement, but i think most overclockers like squeezing all possible performance out of their systems within reason, & running say 4:5 vs. 1:1 is generally one way to get that little bit more.

The main exception comes with the nForce chipset, where 1:1 tends to be best, simply due to the ability to run 1T, something Intel chipsets can't truly achieve.

Intel designs for their own chipsets though, & bandwidth has always been a strong point on theirs.
That's why you see their own generic motherboards giving the option of 333/400 MHz DDR2 with a 266 MHz FSB, e.g...they know very well 1:1 doesn't provide the best performance.

Same reason we're seeing 1:2 being the ideal ratio of choice when using DDR3-based systems.

I have nothing against 1:1.
It's regarded as the easy ideal default when overclocking.

But it's not the best (excepting nForce platforms or 975X...we'll leave that one alone), & that's why 1:1 is going to be basically extinct with DDR3.

To be frank it's these very "long on words, short on data" posts that wax enthusiastically about higher memory speeds that wear thin on me.

I appreciate Graysky's effort to reduce the situation to real-world applications. If you've got data to support any of your above stated "expectations" then I'm sure everyone here (including Graysky) would love to see it.
 

graysky

@n7 - thanks for the info... I'll boot tomorrow and answer your questions about the performance level @ 1:1 and 5:6. Right now I'm running p95 on the 266/667 (4:5) level to see if it's stable as my 333/800 was stable. I read the article you mentioned, but I found it lacking in the explanation of the features. I really need some 101 or 201 level material on this strap setting :)

BTW, I edited my post as you were replying to it... you might wanna see it again (the one above your last reply).
 

Alaa

Am I going to get any reply? At least a link pointing me to another topic!
 

n7

Idontcare, if minimal performance improvements aren't your thing, that's quite fine with me.
But most of us here like maximum performance if possible ;)

I'm not trying to say there is a big difference; there isn't.
I have no issue with running 1:1, it's easy, & it offers good performance.

I said that already.

But what i am getting at is that claiming that 1:1 is the same performance as 4:5 or 2:3, etc. is simply not true in most cases, & i see that implied in this review, as well as on forums all over.

I have done more than enough benching on my own systems to see this, & there are many reviews out there that verify this.

It's not news; it's been the case for years.

Anyway, graysky, i don't know if i've found a straps & ratios guide 101 anywhere...AT's articles do a damn good job IMO...
Here's what i'd suggest reading thru for info on straps, ratios, & tRD:
http://www.anandtech.com/mb/showdoc.aspx?i=3208
Outstanding info there ^
You already know of the article for our mobos (or what's very similar to yours).
This new DFI one also offers more info: http://www.anandtech.com/mb/showdoc.aspx?i=3294

This is mostly theoretical, but it basically shows why performance doesn't drop at higher speeds & looser timings.
http://www.thetechrepository.com/showthread.php?t=160

Alaa, please read thru the links i posted, especially the Asus review one.
It explains somewhat why you don't need a high FSB.

To simplify what they state in the AT reviews...
You do generally gain performance from a higher FSB, but you also then have to use a lower RAM ratio.
Generally speaking, the higher the FSB, the looser the tRD settings, which usually more than negates the benefit of running a high FSB in the first place.

I can't explain it nearly as well as AT can, but what you need to understand is that the FSB, NB, & RAM are all connected, so when one changes, the others must as well.

If just increasing the FSB was all that was actually happening, then performance would always be better.
But raising the FSB comes at the expense of a looser tRD (& who knows what other hidden NB settings), which makes doing so less than ideal in most cases.
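To make the FSB vs. tRD tradeoff a bit more concrete, here is a rough sketch that converts a tRD setting into nanoseconds. It assumes tRD is counted in FSB clock cycles, which is how i read the AT articles linked above, so treat it as an assumption rather than gospel. The tRD values plugged in are the ones graysky listed in his first post; the function name is made up.

```python
def trd_ns(trd_cycles, fsb_mhz):
    """Read delay in nanoseconds, ASSUMING tRD is counted in FSB clock cycles."""
    return trd_cycles / fsb_mhz * 1000


fsb_333 = 1000 / 3  # the nominal "333 MHz" FSB
settings = [
    ("333 MHz FSB, 1:1, tRD=8", 8, fsb_333),
    ("333 MHz FSB, 5:8, tRD=7", 7, fsb_333),
    ("400 MHz FSB, 1:1, tRD=8", 8, 400),
    ("400 MHz FSB, 4:5, tRD=6", 6, 400),
]
for label, trd, fsb in settings:
    print(f"{label}: {trd_ns(trd, fsb):.1f} ns")
```

If that assumption holds, the same tRD number means a shorter delay at a higher FSB, & a change of even one step in tRD between two ratios shifts the read delay by a few ns, which is why auto tRD can muddy a ratio comparison.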
 

Alaa

n7,
Thanks for the info.

Edit:
I think that the Gigabyte P35 DS3L doesn't have an option to set the tRD manually. Correct? Does auto set it to the lowest value? For example, (using E2180) setting FSB to 266 MHz, CAS at 5, tRD 5 and RAM to 400 MHz, satisfies the equation with x = 1 and N = 3/2. Does that mean the motherboard will set tRD to 5?
 

BitByBit

Senior member
Jan 2, 2005
474
2
81
Originally posted by: graysky

Again, I was disappointed in the performance gains I measured. This time I looked at two dividers at two different FSB settings:

333 MHz FSB:
1:1 a.k.a. PC5300 (667 MHz)
5:8 a.k.a. PC8500 (1,067 MHz)

400 MHz FSB:
1:1 a.k.a. PC6400 (800 MHz)
5:8 a.k.a. PC7700 (960 MHz)

I figured there would be a much greater difference in the 333 FSB case since the memory bandwidth increased by 60 % vs. 20 % in the 400 MHz FSB case. All other BIOS settings were held constant with the exception of the divider @ the given FSB.

Why do you expect a gain in performance going from PC5300 to PC8500 on a 333MHz FSB? The aim of your investigation seems to be to determine the performance gain from additional bandwidth, but in both of the above cases the FSB is saturated. A 1333MHz (effective) FSB can handle at the most 10664MB/s of bandwidth, which is what two DDR2-667 modules can supply.

The only advantage with running faster memory such that the memory bus and FSB run asynchronously is a possible reduction in latency, at the same CL, and assuming no penalty for running the two asynchronously.
Comparing different FSBs with 1:1 ratios, at the same CPU frequency would be more indicative of the impact of additional bandwidth.
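The saturation argument can be checked with a few lines of Python. This is only a sketch under stated assumptions: a 64-bit (8-byte) FSB, 64-bit DDR2 channels, and two modules running in dual-channel mode; the names are illustrative.

```python
BUS_WIDTH_BYTES = 8  # assumed: 64-bit FSB and 64-bit DDR2 channels


def fsb_bandwidth_mbs(effective_mts):
    """Peak FSB bandwidth in MB/s for a given effective (quad-pumped) rate."""
    return effective_mts * BUS_WIDTH_BYTES


def dram_bandwidth_mbs(ddr2_rating, channels=2):
    """Peak DRAM bandwidth in MB/s, assuming dual-channel by default."""
    return ddr2_rating * BUS_WIDTH_BYTES * channels


fsb = fsb_bandwidth_mbs(1333)
for rating in (667, 1066):
    dram = dram_bandwidth_mbs(rating)
    # The CPU can never pull more than the slower of the two links delivers.
    print(f"DDR2-{rating}: DRAM {dram} MB/s, FSB {fsb} MB/s, usable {min(dram, fsb)} MB/s")
```

With these assumptions the usable peak stays at roughly the FSB figure for both memory speeds, which is the point being made about the 333 MHz FSB case.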
 

graysky

@n7 - I edited the first post switching the highest 400 MHz FSB run from 5:6 to 4:5 (960 MHz vs. 1,000 MHz) and included the info you asked about.