Athlon 64 X2 and Pentium D memory comparison.

dguy6789

Diamond Member
Dec 9, 2002
Today, my colleague Yuriman and I decided to do some performance comparisons on the effects of memory bandwidth on our computer systems. The evidence presented here is one-hundred percent repeatable and one-hundred percent verifiable. The test was to see what a 40% reduction in system memory bandwidth would do to the performance of each of our systems. Of course, before running the test we formed a hypothesis, based on what we had learned on these forums, that the Athlon 64 X2 platform would take less of a performance hit than the Pentium D platform.

Before we begin, it must be stated that both of these machines pass dual Prime95 runs for more than 24 hours, have no viruses or spyware, have all the latest drivers as of 4/29/06, and are free of any detectable defects other than those mentioned. To put it simply, these machines are rock solid.

The two platforms tested were as follows:

System One:

AMD Athlon 64 X2 3800+ @ 2.5GHz
ATi Radeon X850XT @ 560/560
DFI Lanparty NF4 Ultra
2x1GB of PC4000 DDR at 3-4-3-8 timings (250MHz memory)
(All other system information is irrelevant)

System Two:

Intel Pentium D 820 @ 2.8GHz
ATi Radeon X850XT @ 560/560
Intel D945PSN
2x1GB of PC2-5300 DDR2 at 4-4-4-13 timings (333MHz memory)
(All other system information is irrelevant)

The applications that were used are as follows:

UMark
Super Pi
Everest Home Edition
Pocket DivX Encoder
CPUMark
Sisoft Sandra
WinRAR

Each of these applications is the latest version available as of 4/29/06.


The tests were run on each system first at the settings stated in the system specifications, and then a second time with the following changes:

System One: memory clock changed to 150MHz

System Two: memory clock changed to 200MHz

The resulting drop in overall system memory bandwidth is a theoretical 40%.
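The 40% figure can be checked with a short calculation. This is a minimal sketch, assuming the usual peak-bandwidth model (two transfers per memory clock, 8 bytes per 64-bit channel, two channels); the helper names are illustrative only. It describes what the memory modules can deliver, not necessarily what each CPU can actually use.

```python
def peak_bandwidth_gbs(mem_clock_mhz: float, channels: int = 2) -> float:
    """Theoretical peak of DDR/DDR2 memory in GB/s (decimal units).

    DDR and DDR2 both transfer data twice per memory clock, and each
    channel is 64 bits (8 bytes) wide.
    """
    return mem_clock_mhz * 2 * 8 * channels / 1000


def percent_drop(before: float, after: float) -> float:
    """Percentage reduction going from `before` to `after`."""
    return (before - after) / before * 100


# Memory clocks before and after the change, as listed above
for name, before_mhz, after_mhz in [("System One", 250, 150), ("System Two", 333, 200)]:
    before = peak_bandwidth_gbs(before_mhz)
    after = peak_bandwidth_gbs(after_mhz)
    print(f"{name}: {before:.1f} GB/s -> {after:.1f} GB/s ({percent_drop(before, after):.0f}% drop)")
```

This prints 8.0 -> 4.8 GB/s for System One and 10.7 -> 6.4 GB/s for System Two, a 40% drop in both cases, because each memory clock falls to 60% of its original value.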



Each result is listed as the original configuration, followed by the configuration with a 40% reduction in memory bandwidth, followed by the percentage of performance lost. For all benchmarks except those recorded as times (and Everest latency), higher is better. The percentage is always expressed as a loss, so a positive figure means the system got slower whether the benchmark reports a time or a score; this keeps time-based and score-based results directly comparable.
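The loss figures can be recomputed from the raw numbers with a few lines of code. This is a minimal sketch under one assumed convention: for scores and MB/s, loss = (before - after) / before; for times, the loss in throughput, (after - before) / after, since throughput is the reciprocal of time. The function names are illustrative only.

```python
def score_loss(before: float, after: float) -> float:
    """Percent lost when higher is better (scores, MB/s)."""
    return (before - after) / before * 100


def time_loss(before_s: float, after_s: float) -> float:
    """Percent throughput lost when lower is better (seconds).

    Throughput is 1/time, so the loss is 1 - before/after = (after - before)/after.
    """
    return (after_s - before_s) / after_s * 100


def mmss_to_seconds(text: str) -> int:
    """Convert an 'm:ss' string such as '9:09' to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


# A few results recomputed from the raw numbers posted in this thread
print(f"UMark Flyby, System One: {score_loss(2622.852, 2250.548):.1f}%")
print(f"Super Pi 1M, System One: {time_loss(35.782, 39.312):.1f}%")
divx = time_loss(mmss_to_seconds("13:28"), mmss_to_seconds("13:39"))
print(f"Pocket DivX, System Two: {divx:.1f}%")
```

With the raw figures posted here, this prints 14.2%, 9.0% and 1.3%.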

The results are as follows:

UMark Flyby:

System One (AMD) 2622.852 -> 2250.548 -> 14.2%
System Two (Intel) 1940.389 -> 1890.542 -> 2.6%


UMark BotMatch

System One 1166.952 -> 1001.583 -> 14.2%
System Two 840.949 -> 811.129 -> 3.5%


Super Pi 1M

System One 35.782 Seconds -> 39.312 Seconds -> 9%
System Two 46.969 Seconds -> 48.735 Seconds -> 3.7%


Everest Memory Read

System One 5075MB/s -> 3641MB/s -> 28.3%
System Two 5415MB/s -> 4719MB/s -> 12.9%

Everest Memory Write

System One 2088MB/s -> 1426MB/s -> 31.7%
System Two 2225MB/s -> 1725MB/s -> 22.5%

Everest Memory Latency

System One 52.3ns -> 82.9ns -> 37%
System Two 100.6ns -> 118.3ns -> 15%


Pocket DivX Encoder

System One 9:09 -> 9:28 -> 3.4%
System Two 13:28 -> 13:39 -> 1.3%


CPUMark

System One 286 -> 270 -> 5.6%
System Two 163 -> 163 -> 0%


Sandra Memory Bandwidth

System One Int 6199MB/s -> Int 3693MB/s -> 40.4%
Float 6083MB/s -> Float 3656MB/s -> 39.9%

System Two Int 4961MB/s -> Int 4498MB/s -> 9.3%
Float 4973MB/s -> Float 4494MB/s -> 9.6%


Sandra Multimedia

System One Int 46923 it/s -> Int 47119 it/s -> -0.4% (slight gain)
Float 51134 it/s -> Float 50941 it/s -> 0.4%

System Two Int 31382 it/s -> Int 31364 it/s -> 0%
Float 37201 it/s -> Float 37216 it/s -> 0%


Sandra Arithmetic

System One ALU 22285 MIPS -> ALU 22244 MIPS -> 0%
Whetstone 10120 MFLOPS -> Whetstone 10128 MFLOPS -> 0%

System Two ALU 15096 MIPS -> ALU 15088 MIPS -> 0%
Whetstone 7080 MFLOPS -> Whetstone 7078 MFLOPS -> 0%


WinRAR Decompression

System One 31 Seconds -> 31 Seconds -> 0%
System Two 46 Seconds -> 47 Seconds -> 2.2%


WinRAR Compression

System One 206 Seconds -> 263 Seconds -> 21.7%
System Two 549 Seconds -> 568 Seconds -> 3.4%


It is first necessary to understand that this is not a comparison of the performance of these two systems. It is a comparison of processors, their memory architectures, and consequently, their respective relations to the overall performance of the system.

It is supposedly common knowledge that AMD K8 processors use memory bandwidth more efficiently than Netburst processors. These tests do not prove that true or false. However, what they do show is that AMD processors are capable of benefiting from increased memory bandwidth just as much as Intel processors, if not more. The tests show, with little variance, that the AMD processor's performance changes more when memory bandwidth is altered.

Because of some non-standard configuration settings, we must also take the platforms themselves into consideration. The Intel system uses an entry-level dual-core processor; consequently, its memory bandwidth requirements may be lower than those of a higher-end processor. By contrast, the AMD processor runs at a clock speed that puts it at, or near, the top end of high-performing processors; as a result, its memory bandwidth requirements may be greater than average. Nonetheless, the results shown by these tests should not be ignored. The fact that the AMD system's performance changes when the memory speed is changed shows that dual-core AMD processors, at least, are capable of benefiting from even faster memory.

The results lead us to believe that AM2 may bring a greater-than-expected performance increase to AMD processors in at least some scenarios. This raises the question of whether the extra bandwidth that DDR2 provides can offset its increase in latency. Perhaps later we will do extensive testing of memory timings on both DDR and DDR2 platforms.

Discuss.
 

stevty2889

Diamond Member
Dec 13, 2003
The bus speeds would have had to be the same for a more valid comparison; otherwise, the lower FSB of the Pentium D at stock speeds is going to skew the results somewhat. Either way, a 2.5GHz X2 is going to be much, much, much faster than a 2.8GHz Pentium D anyway.
 

Yuriman

Diamond Member
Jun 25, 2004
Bus speed makes no difference, AFAIK, since the Athlon64 has a direct connection to the memory. Also, we were not comparing performance of AMD vs Intel, but the percentage performance drop when we lowered memory speeds on each of them, which we stated several times in the original thread.
 

stevty2889

Diamond Member
Dec 13, 2003
Originally posted by: Yuriman
Bus speed makes no difference, AFAIK, since the Athlon64 has a direct connection to the memory. Also, we were not comparing performance of AMD vs Intel, but the percentage performance drop when we lowered memory speeds on each of them.

Oops, my bad. I wasn't looking closely enough and didn't see the part where the memory speed was lowered; I thought they were running at 250MHz and 333MHz.
 

dguy6789

Diamond Member
Dec 9, 2002
Originally posted by: stevty2889
The bus speeds would have had to be the same for a more valid comparison; otherwise, the lower FSB of the Pentium D at stock speeds is going to skew the results somewhat. Either way, a 2.5GHz X2 is going to be much, much, much faster than a 2.8GHz Pentium D anyway.


It is necessary for you to understand a few things. For one thing, we are not comparing the performance of a Pentium D to an Athlon 64. We are comparing the performance impact that occurs on each when memory bandwidth is changed by an equivalent amount.

Edit: I see your reply. It appears that you now understand.
 

Furen

Golden Member
Oct 21, 2004
I have a few nitpicks about your test...

First off, why overclock one system but not the other? A higher clock will, of course, lead to higher memory usage and, thus, a bigger performance drop from being bandwidth-starved. Assuming that a 2.5GHz X2 is roughly equivalent to a 3.6GHz P-D, you should have overclocked the Intel chip as well to make the performance of the two processors similar.

Second, you are using different memory technologies, one of which delivers insane amounts of bandwidth while the other delivers very low (comparatively) latencies. They're just not the kinds of things that you can compare directly.

Third, just because UNDERCLOCKING the memory in an overclocked system led to major performance degradation, you cannot infer that increasing the bandwidth will help in ANY scenario at all. Your test says nothing about performance with the memory at stock, nor does it show that you would be limited at stock unless the bandwidth requirements of AMD CPUs increased by 66% (with, for example, a 70% clock speed hike). On the other hand, you CAN infer that quad-cores would be insanely bandwidth-limited on the AMD platform, since dual-channel DDR is likely close enough to being maxed out that it could not handle a doubling of cores (again).
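The 66% figure is the reverse of the 40% cut: if 150MHz is 60% of 250MHz, then 250MHz offers 1/0.6, or about 1.67 times, the bandwidth of 150MHz. A minimal sketch of that headroom calculation, assuming peak bandwidth scales linearly with memory clock (the function name is illustrative):

```python
def headroom_percent(full_mhz: float, reduced_mhz: float) -> float:
    """How much more bandwidth the full setting offers than the reduced one, in percent."""
    return (full_mhz / reduced_mhz - 1) * 100


# Peak bandwidth scales with the memory clock, so the clock ratio is enough here
print(f"System One: {headroom_percent(250, 150):.1f}% more bandwidth at 250MHz than at 150MHz")
```

This prints 66.7%, which is why a 40% reduction and a roughly 66% increase describe the same gap.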

 

dexvx

Diamond Member
Feb 2, 2000
Probably under the assumption that 2.6GHz X2s are equivalent to the high-end AM2 chips.

But one thing the OP is missing is that DDR2 has much higher latency than standard DDR. 3-4-3-8 is in reality 4-4-4-14 on DDR2-800.
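Comparing timings across memory generations is easier in nanoseconds, because each timing is a count of memory clock cycles. A minimal sketch, assuming CAS latency = CL / memory clock and using the clocks from this thread plus DDR2-800 (400MHz); it covers CAS latency only, so measured latency (such as the Everest figures in the first post) also depends on the memory controller and the other timings.

```python
def cas_latency_ns(cl_cycles: int, mem_clock_mhz: float) -> float:
    """CAS latency in nanoseconds: CL cycles at the given memory clock."""
    return cl_cycles * 1000 / mem_clock_mhz


for label, cl, mhz in [
    ("DDR-500 (250MHz), CL3", 3, 250),
    ("DDR2-667 (333MHz), CL4", 4, 333),
    ("DDR2-800 (400MHz), CL4", 4, 400),
]:
    print(f"{label}: {cas_latency_ns(cl, mhz):.1f} ns")
```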
 

F1shF4t

Golden Member
Oct 18, 2005
OK, in these tests you have to use processors that have similar performance. Why? It's simplest to explain in terms of cache.
Say you have two CPUs, both at 2GHz, with the same bus, cache speed and memory speed, except that one has 1MB of cache and the other has 2MB. Now, as you start raising their clocks, the CPU with the smaller cache will start to gain less performance per MHz much earlier than the one with the larger cache. This happens because the cache is no longer large enough to keep the core fed with enough instructions. Cache speed and memory speed have similar effects.

So the best way you could have conducted the test is to find the CPU speeds at which the two perform as closely as possible, then reduce the memory speed. Since the X2 in your test is MUCH quicker than the PD, it will hit memory bandwidth problems much earlier, as it can process things much more quickly.

Hey, I could be wrong; I'm basing this on the performance difference I've seen between a Northwood Celeron and a Northwood P4 at the same bus, memory and clock speeds, the only difference being the cache.
 

Brunnis

Senior member
Nov 15, 2004
I'm not sure I'm following here... The Pentium-D 820 has a 200MHz quad-pumped FSB providing a theoretical max of 6.4GB/s. At first you're running the DDR2 memory at 333MHz (DDR2-667). This easily exceeds the theoretical max of the Pentium's FSB. You then lower the speed of the DDR2 memory so that its bandwidth matches that of the FSB (DDR2 @ 200MHz offers 6.4GB/s in dual channel).

So, the fact of the matter is that you didn't really reduce the theoretical bandwidth of the Pentium system at all. It has access to 6.4GB/s in both cases. What's causing the performance difference between the two memory settings is probably partly related to the faster access latency times when the memory runs at 333MHz.

This explains why you got the results you got. To really test the impact of reduced memory bandwidth on the Pentium, you'd have to start at 200MHz and reduce the memory frequency from there. You'd get very different results.

I might have missed something and seriously screwed up with this post, though... :p
 

F1shF4t

Golden Member
Oct 18, 2005



That could be a point: if the RAM is clocked faster than the FSB, it helps latency but will not have an effect on bandwidth.
 

Brunnis

Senior member
Nov 15, 2004
506
71
91
Originally posted by: Dark Cupcake
That could be a point: if the RAM is clocked faster than the FSB, it helps latency but will not have an effect on bandwidth.
It will actually have an effect on bandwidth too, as can be seen from the tests. The Pentium-D in the test is pretty far from its theoretical max with either of the memory frequencies. This is probably partly because of the high latencies and partly because of inefficiencies in the system and memory design. Increasing the frequency helps with some of those inefficiencies and brings the bandwidth closer to the theoretical max of the FSB.
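One way to put numbers on this is to divide the measured Everest read bandwidth by the FSB ceiling. A minimal sketch, assuming the 6.4GB/s ceiling given in this thread and treating 1GB/s as 1000MB/s (Everest's own unit convention may differ, so the percentages are approximate):

```python
# Theoretical FSB ceiling for the Pentium D 820, as given in this thread:
# 200MHz x 4 transfers per clock x 8 bytes = 6400MB/s (decimal units assumed).
FSB_CEILING_MBS = 200 * 4 * 8

# Everest memory read results for System Two, from the first post
everest_read_mbs = {"333MHz memory": 5415, "200MHz memory": 4719}

for setting, measured in everest_read_mbs.items():
    share = measured / FSB_CEILING_MBS
    print(f"Pentium D, {setting}: {measured}MB/s = {share:.0%} of the FSB ceiling")
```

This prints roughly 85% at 333MHz and 74% at 200MHz, which fits the idea that faster memory lets the Pentium D get closer to, but not past, its FSB limit.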
 

dguy6789

Diamond Member
Dec 9, 2002
8,558
3
76
It is necessary to keep in mind that this is not a test of the performance of the two systems (it is common knowledge that a 2.5GHz X2 will be faster than a 2.8GHz Pentium D). It is a test that looks for the performance effects the systems would incur if maximum memory bandwidth were dropped by 40%. We started with two systems running at their everyday settings, then ran the tests on each with the bandwidth reduced by 40%. This thread is meant to show what kind of performance difference occurs when you lower the maximum bandwidth.

One more thing to keep in mind is that this thread does not claim that all AMD chips will have this large a performance drop or that all Intels will not. It claims that AMDs clocked this high will have this performance drop and that Intels clocked this low will not. This was stated in the original post. We left out the 3800+ at stock speeds purely for lack of time.

It appears that many simply read the benchmark results and not the analysis. I have stated that the test does not prove which platform in general is more efficient when memory bandwidth is lost; it only speaks to these two categories of systems.

As for the statement that AM2 may be faster than first thought due to increased bandwidth: I said it may be possible that AM2 offers more performance than people first thought. I did not say anywhere that I was certain of this, nor did I even say that it was probable. If memory bandwidth could affect an Athlon 64 X2 clocked at 2.5GHz by that much, one must wonder how much a 3GHz dual core or a quad core could benefit from dual-channel PC2-6400. It is just a hypothesis, nothing more.

As for the Pentium D having a 200MHz FSB with memory at 333MHz: keep in mind that both systems were running the memory at speeds higher than what each processor is supposed to be fed. This keeps it moderately fair. We then reduced each system's bandwidth by a percentage, not by a fixed number of MHz.

 

Yuriman

Diamond Member
Jun 25, 2004
We would have liked to overclock the Pentium D, but due to motherboard limitations we were forced to run it at stock. The results would probably have been different, perhaps by a large amount, had we raised the FSB to 333MHz, but since we couldn't, I will state again that we are not comparing the two systems directly, but showing the effects on each system.

Also, since there is no "true" 1:1 on an Athlon64, we might as well have run it at 200MHz with a 5:4 divider; the memory would still be at 250MHz. This just shows how a higher-clocked Athlon64 will react.
 

Yuriman

Diamond Member
Jun 25, 2004
I just remembered something which should have made it into the original post. Most of the tests we ran were only single-threaded, so in many cases, the second core was not used much, if at all. The drop would likely be much larger in multi-threaded applications.
 

Brunnis

Senior member
Nov 15, 2004
Originally posted by: dguy6789
As for the Pentium D having a 200MHz FSB with memory at 333MHz: keep in mind that both systems were running the memory at speeds higher than what each processor is supposed to be fed. This keeps it moderately fair.
No, it's not fair. The difference is that the Athlon64, with its 1GHz HyperTransport bus, actually makes use of the extra bandwidth when running the memory at 250MHz. The Pentium-D, however, has a regular FSB and cannot take advantage of the huge bandwidth from the PC2-5300 memory. In short:

Athlon64 system with 250MHz mem: Theoretical max of 8.0GB/s to CPU
Athlon64 system with 150MHz mem: Theoretical max of 4.8GB/s to CPU

Pentium D system with 333MHz mem: Theoretical max of 6.4GB/s to CPU
Pentium D system with 200MHz mem: Theoretical max of 6.4GB/s to CPU

See the difference? It explains why the two systems reacted so differently to the change in memory frequency. I was under the impression that the meaning of the test was to investigate if a Netburst CPU really is more keen on high bandwidth than a K8 CPU. Since you never really changed the available bandwidth to the Netburst CPU in your test, it is not valid to draw any conclusions on the subject. I do appreciate the effort, though. :)
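The four figures above follow from one rule: the CPU can receive no more than the smaller of the memory's peak bandwidth and any bus sitting between memory and CPU. A minimal sketch, assuming the same peak-bandwidth model (two transfers per clock, 8 bytes per channel, two channels) and modelling the Athlon 64 as having no separate cap, since its memory controller is on the CPU die; the function names are illustrative.

```python
from typing import Optional


def dual_channel_peak_gbs(mem_clock_mhz: float) -> float:
    """Peak of dual-channel DDR/DDR2: 2 transfers per clock, 8 bytes per channel."""
    return mem_clock_mhz * 2 * 8 * 2 / 1000


def bandwidth_to_cpu_gbs(mem_clock_mhz: float, bus_cap_gbs: Optional[float]) -> float:
    """Bandwidth the CPU can actually see: the smaller of the memory and bus limits."""
    memory = dual_channel_peak_gbs(mem_clock_mhz)
    return memory if bus_cap_gbs is None else min(memory, bus_cap_gbs)


PENTIUM_D_FSB_GBS = 200 * 4 * 8 / 1000  # 200MHz quad-pumped, 64-bit bus = 6.4GB/s

for label, mhz, cap in [
    ("Athlon64 system with 250MHz mem", 250, None),
    ("Athlon64 system with 150MHz mem", 150, None),
    ("Pentium D system with 333MHz mem", 333, PENTIUM_D_FSB_GBS),
    ("Pentium D system with 200MHz mem", 200, PENTIUM_D_FSB_GBS),
]:
    print(f"{label}: {bandwidth_to_cpu_gbs(mhz, cap):.1f}GB/s to CPU")
```

The output reproduces 8.0, 4.8, 6.4 and 6.4GB/s: the Athlon's figure follows the memory clock, while the Pentium D's is pinned at the FSB limit in both settings.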
 

Yuriman

Diamond Member
Jun 25, 2004
The Hypertransport bus has nothing to do with accessing the memory, AFAIK. It is the connection between the processor and the northbridge, but the memory controller is not on the northbridge. Athlon64s can access the memory at any divider of the CPU's speed, because the memory controller is on the processor itself and has a direct connection to the memory.

There is no fair comparison: Athlon64s will have full-speed access to memory no matter what divider or bus settings you run.
 

dguy6789

Diamond Member
Dec 9, 2002
Originally posted by: Brunnis
Originally posted by: dguy6789
As for the Pentium D having a 200MHz FSB with memory at 333MHz: keep in mind that both systems were running the memory at speeds higher than what each processor is supposed to be fed. This keeps it moderately fair.
No, it's not fair. The difference is that the Athlon64, with its 1GHz HyperTransport bus, actually makes use of the extra bandwidth when running the memory at 250MHz. The Pentium-D, however, has a regular FSB and cannot take advantage of the huge bandwidth from the PC2-5300 memory. In short:

Athlon64 system with 250MHz mem: Theoretical max of 8.0GB/s to CPU
Athlon64 system with 150MHz mem: Theoretical max of 4.8GB/s to CPU

Pentium D system with 333MHz mem: Theoretical max of 6.4GB/s to CPU
Pentium D system with 200MHz mem: Theoretical max of 6.4GB/s to CPU

See the difference? It explains why the two systems reacted so differently to the change in memory frequency. I was under the impression that the meaning of the test was to investigate if a Netburst CPU really is more keen on high bandwidth than a K8 CPU. Since you never really changed the available bandwidth to the Netburst CPU in your test, it is not valid to draw any conclusions on the subject. I do appreciate the effort, though. :)



I do suppose that your statement does make sense. Perhaps it would have made more sense to simply have each machine run the tests in dual channel, then in single channel. This way both machines would have a 100% reduction in memory bandwidth, and the Intel machine would have a real difference in bandwidth that is available to the processor.
 

Brunnis

Senior member
Nov 15, 2004
Originally posted by: Yuriman
The Hypertransport bus has nothing to do with accessing the memory, AFAIK. It is the connection between the processor and the northbridge, but the memory controller is not on the northbridge. Athlon64s can access the memory at any divider of the CPU's speed, because the memory controller is on the processor itself and has a direct connection to the memory.
Yes, I may have been wrong about bringing HT into this discussion.

Originally posted by: Yuriman
There is no fair comparison: Athlon64s will have full-speed access to memory no matter what divider or bus settings you run.
I was just reflecting on the fact that the way the test was conducted defeats its purpose. Since you never decreased the bandwidth that the Netburst CPU sees, it is impossible to draw any conclusions about how bandwidth-hungry it really is.

However, if the actual bandwidth available to the CPU had been cut equally for each system, the test would have produced the kind of results you were after from the start.

Originally posted by: dguy6789
I do suppose that your statement does make sense. Perhaps it would have made more sense to simply have each machine run the tests in dual channel, then in single channel. This way both machines would have a 50% reduction in memory bandwidth, and the Intel machine would have a real difference in bandwidth that is available to the processor.
Yep, that would have been more informative.
 

dguy6789

Diamond Member
Dec 9, 2002
Originally posted by: Brunnis
Originally posted by: dguy6789
I do suppose that your statement does make sense. Perhaps it would have made more sense to simply have each machine run the tests in dual channel, then in single channel. This way both machines would have a 50% reduction in memory bandwidth, and the Intel machine would have a real difference in bandwidth that is available to the processor.
Yep, that would have been more informative.


Woops, meant 50%.
 

Yuriman

Diamond Member
Jun 25, 2004
I have been preparing some more in-depth results, but they will take a few days. These will, of course, be for an Athlon64 only; I'm trying to determine at what clock speed the memory bandwidth requirements exceed the bandwidth available.
 

zephyrprime

Diamond Member
Feb 18, 2001
This test isn't valid. When you OC the 3800+ to 2.5GHz, you increase the memory clock to 250MHz, so the AMD machine gets the full max bandwidth of the DDR500 RAM. With the Intel Pentium D, using DDR2-667 doesn't increase the Intel's max bandwidth at all. Its FSB is still running at 200MHz (800MT/s quad-pumped), which only matches DDR2-400 in dual channel. With either DDR2-667 or DDR2-400, the Pentium D 820 has a max bandwidth of 6.4GB/s. Running with a setup like this improves memory performance a bit because it decreases absolute latency, but it doesn't increase bandwidth at all. To make an extreme point, let's say I put 1THz memory on a Pentium D 820. It will still have a maximum bandwidth of 6.4GB/s even though the memory's max bandwidth is 16TB/s.
 

DrMrLordX

Lifer
Apr 27, 2000
One other nitpick: by decreasing the memory clock, you're also effectively increasing latency on both platforms if you keep the same timings as when you ran the memory at full speed. A platform more sensitive to latency will suffer not only from a loss of bandwidth but also from an increase in latency.
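Because timings are counted in memory clock cycles, keeping them fixed while lowering the clock stretches every timing in nanoseconds. A minimal sketch covering CAS latency only, assuming the CL values from the system specifications were left unchanged after the underclock (the thread does not say either way):

```python
def cycles_to_ns(cycles: int, mem_clock_mhz: float) -> float:
    """Duration of a timing given in memory clock cycles, in nanoseconds."""
    return cycles * 1000 / mem_clock_mhz


# CAS latency only; assumes CL was left unchanged when the clock was lowered
for system, cl, full_mhz, reduced_mhz in [
    ("System One", 3, 250, 150),
    ("System Two", 4, 333, 200),
]:
    before = cycles_to_ns(cl, full_mhz)
    after = cycles_to_ns(cl, reduced_mhz)
    print(f"{system}: CL{cl} is {before:.1f} ns at {full_mhz}MHz, {after:.1f} ns at {reduced_mhz}MHz")
```

Under that assumption both systems go from about 12 ns to 20 ns of CAS latency, so part of each measured slowdown may be latency rather than bandwidth.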

I agree that running each configuration in dual and single-channel mode would be the most efficient way to compare each platform without messing with latencies or otherwise skewing results. I would recommend running both platforms at stock speeds for both HTT/FSB and memory as has already been suggested here.