AMD chips slow with linkedlists?

OCedHrt

Senior member
Oct 4, 2002
613
0
0
Right now for a cs program assignment, I have to write up a report regarding different ADTs. While running the time tests, I noticed that linkedlists are VERY VERY VERY slow on AMD cpus.

The dataset is 125000 integer inserts, and 125000 integer deletes in same order.
School's P4 or P3(not sure of speed): 87s.
My Athlon XP Barton(2.2ghz): 247s.
My friend's Athlon 64 (2ghz): 112s. <- Improvment, yay!
My Pentium M laptop(1.6ghz): 68s.

125000 integer inserts and 125000 random deletions.
School's P4(not sure of speed): 76s.
My Athlon XP Barton(2.2ghz): 169s
My Pentium M laptop(1.6ghz): 37s.

The difference here is huge. I remember this also greatly affects cursor lists, but each run is quite long so don't want to test it right now.

The compiled program that does the testing is not optimized in any way. Optimizing with /O2 /G7 /arch:SSE offers no real benefit to the AMDs at 240s and 165s, while drastically cutting runtime on the Pentium M 1.6 ghz down to 41s &amp; 25s respectively.

I suppose this could probably have to do with the chipset and not necessarily the cpu since the laptop is using an Intel chipset and the school computers probably are as well.
 

Hajime

Senior member
Oct 18, 2004
617
0
71
That's......odd, to say the least.

Going to run some tests on my A64 box/P4 box/AXP box real quick.

What language is this in, btw? (I'll write up a real quick hack for both Java and C++, but if you are using a diff language...)
 

Hajime

Senior member
Oct 18, 2004
617
0
71
Prelim results in C++ and Java, averaged out over 2 tries in C++:

2.2ghz P4: 92s
2.5ghz Barton AXP: 115s
2.3ghz A64: 72s

Mod to random deletions
2.2ghz P4: 84s
2.5ghz Barton AXP: 90s
2.3ghz A64: 67s


Compiled with standard gcc, my generic header file/standard generic functions and libs, and nothing special used other then that. If it's in C++ or Java, and you'd post the code, I might be able to spot something wrong specifically with the code, but other then that....

I'll post the Java results later.
 

OCedHrt

Senior member
Oct 4, 2002
613
0
0
It's in C++. The code is actually Weiss's code from Data Structures &amp; Algorithms Analysis in C++.

rar package.

Interesting, rar actually came out much smaller than 7-zip.
 

Hajime

Senior member
Oct 18, 2004
617
0
71
OC: That's possible. I don't use Windows for programming the majority of the time. If you can attach them, pm me. Otherwise, just e-mail me them at jbailes at gmail dot . com please.

I'll try to figure out the problem.
 

Hajime

Senior member
Oct 18, 2004
617
0
71
...You are just making a linked list of ints, with standard insertion/deletions, and you use -that- code? Jeebus.

No insult, but the writer of the book you use needs a reality check. A better language for that style would be Java, -not- C++.

My advice would be to refactor the code, but as you are taking a Data Structures course, I'm unsure of what your current ability with C++ is. Glancing over it, however, those results do appear accurate.
 

OCedHrt

Senior member
Oct 4, 2002
613
0
0
Well no, the code provided by the book and its author was meant for other purposes I'm sure. Our prof just decided to use that code to save himself the hassle :p Using ints was just to keep things simple. We are just analyzing the result a writing up a report, factoring in the dataset.

What I'm wondering about is why it is so much slower on the AMD chips, at first I thought it might have to do with the CPU cache but then I highly doubt the school PC's have more than 512kb of cache.

Also, your A64 results, although different from my friend's, isn't really that far off. 111 vs 70s while being 15% slower. Also, he wasn't really benching it, he had other things running at the same time. so the A64 result is probably the same. However, my Barton AXP doesn't even come close. 247 vs 115 is crazy. And, the 1.6 Ghz Pentium M still blows away everything, tying even the 2.3 A64.

My system is running a KT600 at 400 FSB while my friend's is the nForce3 at 200x10.
 

Hajime

Senior member
Oct 18, 2004
617
0
71
125k ints in a linked list?

Sizeof int on my Windows-based system reports 4 bytes.

4 * 125 = 500k.

500k bytes. Unfamiliar with the overhead associated with linked lists offhand, but his code has sufficient overhead to push you past the 512k point, and make you go outside of cache. My implementation should, for 125k instances, result in a memory footprint of approx. 510k

Does your friend have 1mb of cache? If so, then that's your reason.

(Edit) Fiddled with it and did some tweaking. Looks like it's cache. I don't have a chip onhand with 2mb of cache, but if that P-M has a large amount of cache, that would explain it all.
 

OCedHrt

Senior member
Oct 4, 2002
613
0
0
No my friend has the Newcastle core, so 512kb cache as well. And the school PCs defintely have 512kb or less. There's no way they'd use high end cpus in these labs. They're probably something like the first gen P4's as they aren't that new either.

Oh, and the other ADTs like the trees and hashes run just fine on the AMD (actually faster than the 1.6 P-M). Other than BSTs, BSTs don't work for File1-3 for some reason.

So to sum it up, LL, CL, P-M owns. Everything else, other than BSTs which don't work for some reason, my A-XP is faster by far (1.3s vs 1.9s with BTree M=3, L=1).
 

Hajime

Senior member
Oct 18, 2004
617
0
71
Okay....

Hrm. It looks like it is his specific implementation then, OC. Can't tell ya anything else - my implementation was quite a bit different, being a throwback to highly-optimized standard C.
 

OCedHrt

Senior member
Oct 4, 2002
613
0
0
Probably, it is very interesting though. Compiler optimizations on the linkedlists and cursor lists have no benefits as well. Benefits are apparant on the other ADTs though. With this in mind, this could explain some programs that are much faster on the P4.

The school's intel chips also do the inserts and deletes in 87s which is faster than your 2.2 P4 and I'm 99% sure the school chips are much slower with same or less cache. The P-M I have does have 2 mb cache but that doesn't explain the performance on the school computers.

Is there some way to tell what cpu the computer is using from a ssh shell?
 

Sahakiel

Golden Member
Oct 19, 2001
1,746
0
86
ECS110, eh... odd, because if I recall correctly, this is the first programming assignment.
*shrug*
Which computers are you using? Only CS ones I can remember had bird names.
There was also a fairly new set of Pentium 4's somewhere with ~2GHz procs; Northwoods, if I remember correctly. I believe there should be a set of P3's somewhere running about 800 MHz, unless those were scrapped already. God knows the EEC department needs upgrading, but they can't move from UNIX thanks to software. CS boys are lucky in that respect.
At any rate, the dataset is designed to overflow the cache. This assignment is essentially a demonstration of memory latencies. I'm guessing the Pentium 4's cache structure is better suited for in-order insertions. I could go on, but then I'd be helping you with your report.
 

OCedHrt

Senior member
Oct 4, 2002
613
0
0
Lol, well, I don't have to write anything about why AMD is slower, I was just curious. And I did find out how to find out what cpu they're running.

model name : Intel(R) Pentium(R) 4 CPU 2.00GHz
stepping : 4
cpu MHz : 1993.975
cache size : 512 KB

Same 512 KB cache.

Hajime ran some tests w/ different code earlier and his results made more sense. Apparently Weiss's code runs slightly faster (87s @ 2.0 ghz vs 92s @ 2.2 ghz) on Intel but 3x slower on AMD.

Finally figured out how to mount the damn usb key under linux. Dumb mount /dev/sda instead of /dev/sda1. 71.13 from the Pentium M on Linux. Gonna try it on my A-XP next, if I can even get the live cd to boot (failed the first time I tried).

The first assignment only had 6 ADTs, now we have 15. I noticed the problem on the first one, but didin't bother.

Also, now that I'm running it on my laptop under Linux, seems like all of them are faster under Linux, other than the linked lists. Kudos to Linux. :p