athlon 64: why is its poor multitasking ignored/downplayed?

CaiNaM

Diamond Member
Oct 26, 2000
3,718
0
0
previously titled, "amd vs intel question"

this may be something that doesn't apply to many people, but i ran across something today i'm hoping someone might be able to shed some light on.

i have 3.2ghz p4 northwood and an a64 3200+. while i haven't compared many benchmarks, the amd seems to have the edge in gaming performance, and that general consensus is shared by the majority of people i talk to.

the issue i ran into today involves an mmorpg, dark age of camelot. when running two clients, the p4 runs it smooth as silk, but the a64's performance is dismal. this isn't the case when only 1 copy is running. both systems run the same OS, same memory (1gb kingston hyperX), and same video, drivers, etc. the only significant difference is cpu/motherboard (p4, msi 865pe neo2 vs a64, chaintech vnf3-250).

anyway, is there something in the intel architecture which makes it run these 2 game processes at the same time more efficiently, or any reason the a64 has problems running both simultaneously?

 

clarkey01

Diamond Member
Feb 4, 2004
3,419
1
0
I know AMD has something in the works to combat HT.

Before we get Duive in here screaming about how good HT is, you have to remember the P4 only has it due to its 31 stage pipeline. AMD has a 12 stage with the a64, xp had a 10 stage, and Pentium M has about 13/14, which is what we guess @ work. Even the folk @ santa clara said it would be useless to use HT with a P-M due to its shorter pipeline.
 

Viditor

Diamond Member
Oct 25, 1999
3,290
0
0
HT should help the P4, but nowhere near the degree you seem to be seeing...(maybe by 5% if it's a multithreaded app).
It seems there is a known issue with the Chaintech board and large fast Ram...
Newegg
devhardware
 

Algere

Platinum Member
Feb 29, 2004
2,157
0
0
Originally posted by: Viditor
HT should help the P4, but nowhere near the degree you seem to be seeing...(maybe by 5% if it's a multithreaded app).
It seems there is a known issue with the Chaintech board and large fast Ram...
Newegg
devhardware
When I run 2 separate applications that both vie for 100% usage of my A64's processing power, it slows down by a lot. As to whether the P4 acts the same way or not due to HT tech, I wouldn't know.

P.S. Using ASUS K8V Deluxe FYI

EDIT: The large fast DIMM problem from the NewEgg comment is, I assume, the limitation that is also inherent in the VIA K8T800 northbridge chipset (or the memory controller in the A64 itself?): if you install more than 2x double sided and/or banked PC3200 memory modules, the motherboard will downclock the PC3200 memory modules to PC2700 specs/speed.
 

DAPUNISHER

Super Moderator CPU Forum Mod and Elite Member
Super Moderator
Aug 22, 2001
31,737
31,674
146
I've never built or owned a P4c system, but it certainly sounds like you are seeing the benefit HT can provide. clarkey's points are valid and Viditor may be on to something about the ram issue holding the A64 back, but I have spoken with multiple P4c users who run DC projects and the gains are substantially more than 5% for F@H and SETI. Regardless of the reason for Intc implementing HT or of the average gains one can expect, there are areas where HT shines. Judging by your results CaiNaM, I'd inquire if others have similar results when running 2 clients with either CPU. At the moment I'm inclined to believe HT is responsible for the nice performance gains you are seeing.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,105
16,017
136
2 clients on the network (one on each machine) ? or 2 clients on EACH machine ? And how can you run the game twice on one box ?
 

DAPUNISHER

Super Moderator CPU Forum Mod and Elite Member
Super Moderator
Aug 22, 2001
31,737
31,674
146
Originally posted by: Markfw900
2 clients on the network (one on each machine) ? or 2 clients on EACH machine ? And how can you run the game twice on one box ?
They added support for 2 clients in windowed mode starting with patch 1.60 B. I didn't read enough to be certain, but it sounds like they may have HT and/or MP specific support with the newer patches.
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
Originally posted by: CaiNaM
this may be something that doesn't apply to many people, but i ran across something today i'm hoping someone might be able to shed some light on.

i have 3.2ghz p4 northwood and an a64 3200+. while i haven't compared many benchmarks, the amd seems to have the edge in gaming performance, and that general consensus is shared by the majority of people i talk to.

the issue i ran into today involves an mmorpg, dark age of camelot. when running two clients, the p4 runs it smooth as silk, but the a64's performance is dismal. this isn't the case when only 1 copy is running. both systems run the same OS, same memory (1gb kingston hyperX), and same video, drivers, etc. the only significant difference is cpu/motherboard (p4, msi 865pe neo2 vs a64, chaintech vnf3-250).

anyway, is there something in the intel architecture which makes it run these 2 game processes at the same time more efficiently, or any reason the a64 has problems running both simultaneously?

why don't you try disabling smt and see how that affects things.

 

Concillian

Diamond Member
May 26, 2004
3,751
8
81
Exactly.

If HT is why it's smooth, then if you disable HT, it should run like the AMD system. If it's still smooth, then look for a different suspect.
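A quick way to confirm the BIOS toggle actually took effect before re-running the two-client test is to check how many logical CPUs the OS reports. This is only a minimal sketch: `ht_state` is a hypothetical helper, and the interpretation assumes a single-socket, single-core P4, where 2 logical CPUs can only come from hyperthreading.

```python
import os

def ht_state(logical_cpus):
    """Interpret the logical CPU count on a single-socket, single-core P4.

    Assumption: on such a system, 2 logical CPUs means HT is on.
    """
    if logical_cpus is None:
        return "unknown"  # os.cpu_count() may return None if it cannot tell
    return "HT enabled" if logical_cpus >= 2 else "HT disabled"

count = os.cpu_count()
print(count, "logical CPU(s):", ht_state(count))
```

On a multi-core or multi-socket machine this count says nothing about HT by itself, so the helper is only meaningful for the P4 in this thread.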
 

sonoran

Member
May 9, 2002
174
0
0
Originally posted by: Viditor
HT should help the P4, but nowhere near the degree you seem to be seeing...(maybe by 5% if it's a multithreaded app).
Actually, hyperthreading makes a huge difference in this usage scenario - quite literally the difference between the game being playable or not - as evidenced by the initial poster's experiences.

 

DAPUNISHER

Super Moderator CPU Forum Mod and Elite Member
Super Moderator
Aug 22, 2001
31,737
31,674
146
Originally posted by: sonoran
Originally posted by: Viditor
HT should help the P4, but nowhere near the degree you seem to be seeing...(maybe by 5% if it's a multithreaded app).
Actually, hyperthreading makes a huge difference in this usage scenario - quite literally the difference between the game being playable or not - as evidenced by the initial poster's experiences.
That's my thinking too. Once he turns off HT the answer will be revealed for certain.
 

CaiNaM

Diamond Member
Oct 26, 2000
3,718
0
0
ahhh.. ok, seems a couple people suggest seeing how it runs on the intel system with hyperthreading disabled. that's a good idea, thanks. i'll try that and post back. will be awhile tho.. xp is hanging trying to do a .net framework security update; no errors, it just stops - it shows it's doing the update, but no disc access and no cpu usage. everything else is responsive. grrr....

Originally posted by: Algere
HT should help the P4, but nowhere near the degree you seem to be seeing...(maybe by 5% if it's a multithreaded app).
It seems there is a known issue with the Chaintech board and large fast Ram...
Newegg
devhardware

i have a newer stepping, and am not seeing that issue. the bios comments they mention seem accurate tho: if i run the 5/9 bios, which fixes the cpu temp output and allows changing the cpu multiplier, it will only recognize my ram as 333mhz. the 7/29 bios is unstable on my system. the shipping bios doesn't allow for multiplier adj. and the cpu temp is incorrect, but the system is perfectly stable, at least from the point of view of crashes/hangs.

Originally posted by: Viditor
When I run 2 separate applications that both vie for 100% usage of my A64's processing power, it slows down by a lot. As to whether the P4 acts the same way or not due to HT tech, I wouldn't know.

P.S. Using ASUS K8V Deluxe FYI

EDIT: The large fast DIMM problem from the NewEgg comment is, I assume, the limitation that is also inherent in the VIA K8T800 northbridge chipset (or the memory controller in the A64 itself?): if you install more than 2x double sided and/or banked PC3200 memory modules, the motherboard will downclock the PC3200 memory modules to PC2700 specs/speed.

what you mention about 2 separate apps vying for 100% cpu usage slowing your system down seems to be the same thing i'm seeing..

Originally posted by: Markfw900
2 clients on the network (one on each machine) ? or 2 clients on EACH machine ? And how can you run the game twice on one box ?

yea, if you have 1gb ram and run windowed mode, you can run 2 clients simultaneously on the same pc.


 

Lithan

Platinum Member
Aug 2, 2004
2,919
0
0
Originally posted by: clarkey01
I know AMD has something in the works to combat HT.

Before we get Duive in here screaming about how good HT is, you have to remember the P4 only has it due to its 31 stage pipeline. AMD has a 12 stage with the a64, xp had a 10 stage, and Pentium M has about 13/14, which is what we guess @ work. Even the folk @ santa clara said it would be useless to use HT with a P-M due to its shorter pipeline.

I'd guess just dual core cpus. Of course they will need low end models and affordable dual-core support boards to accomplish that. Frankly, I think AMD is ignoring HT. If you need it, you should be able to afford an SMP system. How many businesses do you know that say "We don't need a quad Xeon or opteron server! We can just buy a 3.4EE with Hyperthreading! Look at these benchmarks of someone running "SuperPi" and "3dmark" at the same time! Hyperthreading is the SHIZNIT!!!!1!!!!!!!!1111111111111111"?
 

clarkey01

Diamond Member
Feb 4, 2004
3,419
1
0
Lithan, you could say the same about Intel with hypertransport, they just dissed it in an interview.
 

Lithan

Platinum Member
Aug 2, 2004
2,919
0
0
Of course Hypertransport has proven itself to offer sizable performance in pretty well every application. Hyperthreading is mimicking existing technology and falling short. (Like running a 3d application on a 2d card.) Sure you could. And for a couple years a number of people did. But eventually people had to admit that it just failed in comparison to the real thing. And 3d cards had a much bigger mark-up (In %) over 2d cards than SMP does over HT procs. (Hell, there was that 2x1.6ghz XEON deal for $120... MUCH less than even the cheapest p4 HT. Amusingly, the Xeons had HT themselves.) I'd compare amd's x86-64 to Hyperthreading before I compared their Hypertransport to it. (x86-64 of course doesn't perform as well as a True, full 64bit processor does. But it's cheaper and it offers much better 32bit compatibility.)
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
Originally posted by: Lithan
Of course Hypertransport has proven itself to offer sizable performance in pretty well every application. Hyperthreading is mimicking existing technology and falling short. (Like running a 3d application on a 2d card.) Sure you could. And for a couple years a number of people did. But eventually people had to admit that it just failed in comparison to the real thing. And 3d cards had a much bigger mark-up (In %) over 2d cards than SMP does over HT procs. (Hell, there was that 2x1.6ghz XEON deal for $120... MUCH less than even the cheapest p4 HT. Amusingly, the Xeons had HT themselves.) I'd compare amd's x86-64 to Hyperthreading before I compared their Hypertransport to it. (x86-64 of course doesn't perform as well as a True, full 64bit processor does. But it's cheaper and it offers much better 32bit compatibility.)

what exactly is a 'true 64-bit' processor?
 

CaiNaM

Diamond Member
Oct 26, 2000
3,718
0
0
Originally posted by: Lithan
Of course Hypertransport has proven itself to offer sizable performance in pretty well every application. Hyperthreading is mimicking existing technology and falling short. (Like running a 3d application on a 2d card.) Sure you could. And for a couple years a number of people did. But eventually people had to admit that it just failed in comparison to the real thing. And 3d cards had a much bigger mark-up (In %) over 2d cards than SMP does over HT procs. (Hell, there was that 2x1.6ghz XEON deal for $120... MUCH less than even the cheapest p4 HT. Amusingly, the Xeons had HT themselves.) I'd compare amd's x86-64 to Hyperthreading before I compared their Hypertransport to it. (x86-64 of course doesn't perform as well as a True, full 64bit processor does. But it's cheaper and it offers much better 32bit compatibility.)

the amd64 setup is underperforming compared to a similarly rated intel setup in this specific scenario. the purpose is to try and find why this is the case, not to make general statements that HyperTransport rocks and HyperThreading sucks.. feel free to debate that in a separate thread, i'm trying to get to the bottom of something here and would like to stay on topic. thanks.
 

Vee

Senior member
Jun 18, 2004
689
0
0
If you've described the scenario accurately, I assume you are using a hyperthreading P4 and have hyperthreading enabled.

The thing is that the client is poorly programmed for your way of using it, in various ways. The Athlon64 (like all CPUs prior to the P4C) does not have any threading in the CPU. The thread switching is done entirely by Windows' scheduler, and Windows only releases one thread to the CPU at any time.

This does not need to be a problem. The scheduler can switch threads every 20 milliseconds, and run two (or more) threads smoothly, seemingly concurrently. The threads will run half as fast of course, but that shouldn't be much of a problem. Actually 100 concurrent threads work fine too, but by then you would notice each one stop running intermittently.
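The arithmetic behind that can be sketched in a few lines. This is a rough model, not how Windows actually accounts for time: it assumes every thread is CPU-bound, all have equal priority, and the slice is the 20 ms figure used above (the real quantum varies by Windows version and settings). `round_robin_share` is a hypothetical helper name.

```python
QUANTUM_MS = 20  # time slice quoted above; an assumption, the real value varies

def round_robin_share(n_threads, quantum_ms=QUANTUM_MS):
    """Return (cpu_share, wait_ms) for one of n equal-priority, CPU-bound threads."""
    if n_threads < 1:
        raise ValueError("need at least one thread")
    cpu_share = 1.0 / n_threads             # fraction of the CPU each thread receives
    wait_ms = (n_threads - 1) * quantum_ms  # time a thread sits idle between its turns
    return cpu_share, wait_ms

for n in (1, 2, 100):
    share, wait = round_robin_share(n)
    print(f"{n:3d} threads: {share:.1%} of the CPU each, {wait} ms between turns")
```

With two threads each one pauses for only 20 ms at a time, which is hard to notice. With 100 threads each one waits almost two seconds between turns, which is the intermittent stopping described above.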

This is only the case when you have several threads with equal priority running simultaneously. This is not the normal multitasking case. Windows' scheduler computes a new priority for every thread each time it makes a decision, then runs the one with the highest priority. One of the things that results in a higher priority is if the thread is a "Windows Message Queue" thread and that particular window happens to be active. This works fine for the typical multitasking scenario:

An interactive application typically uses very little CPU-time during work. Mostly it just waits for the user to press down a key or move the mouse. This means the unused CPU capacity can be used to perform a large chunk of computing, in the background, on spare CPU time.

On a well working software setup, where the background task has lower priority than the interactive one, you will not notice the background threads in any way. The foreground, interactive app will be as responsive as if it were the only application running on the PC. But at the same time, the background job will in reality use up close to 100% of the CPU capacity. This is the setup you have if you're running something like seti@home at an enlightened and smart setting, like "idle". Nothing magical about it. The scheduler runs SETI while it's waiting for you to press the next key. When you do, it instantly suspends SETI and processes your key input instead. That doesn't take long, and it instantly goes back to running SETI again.
Finally, when you're ready to let the interactive application do a chunk of lengthy work, like compiling, it will do so, and this time SETI will be mostly shut down for the duration.

The OS' shell should have even higher priority though, so now you can start/switch to another interactive application. If your Windows/OS is configured right, the active window will define the app with the highest priority. Now it is this app that will stay snappy and responsive, while the compiling will be the background task and use spare CPU time. And SETI still won't run much at all, until compiling has finished. (Windows should of course be configured to give priority to the foreground app.)
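The interactive-plus-idle-priority case can be sketched as a toy strict-priority scheduler. This is only an illustration under simplifying assumptions: one CPU, two tasks, time divided into equal ticks, and a hypothetical function name `run_single_cpu`. The real Windows scheduler has many more priority levels and dynamic boosts.

```python
def run_single_cpu(ticks, interactive_busy):
    """Simulate one CPU with strict priority: interactive (high) beats background (idle).

    interactive_busy: set of tick numbers at which the interactive app has work,
    such as a keypress to process. Returns the ticks each task received.
    """
    usage = {"interactive": 0, "background": 0}
    for tick in range(ticks):
        if tick in interactive_busy:
            usage["interactive"] += 1  # a runnable high-priority thread always wins
        else:
            usage["background"] += 1   # spare time goes to the idle-priority job
    return usage

# The interactive app needs 5 ticks out of 100; the background job gets the rest.
print(run_single_cpu(100, interactive_busy={10, 11, 40, 41, 90}))
```

The interactive task never waits, yet the background job still collects 95 of the 100 ticks, which is the "close to 100% of the CPU capacity" situation described above.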

If you work this way, you really don't have much use for hyperthreading.

So what about hyperthreading? Hyperthreading basically means the CPU itself can do threading within. It works by making the OS' scheduler think you've got dual CPUs, so it sends a second thread to the CPU. In the above example, SETI will also start to run. Unless of course, our background task (compiling) is using 2 threads, or we have 2 background apps. Then the OS scheduler still won't run the one with least priority much.

So basically, hyperthreading works like you have 2 CPUs, for multitasking purposes. There has to be something 'wrong' with your software setup if you're noticing any benefit to responsiveness from HT. That shouldn't happen in an ideal world. But apps are far from ideal. So you will in practice see a real benefit from hyperthreading under some circumstances. Same as two CPUs.

Much Windows unresponsiveness is, however, due to Windows blocking threads for purposes of event synchronization and resource contention handling. Hyperthreading will have no effect whatsoever on that.

Hyperthreading is, not to forget, also a way to get more work out of a P4. A P4's execution units spend a lot of time waiting, not knowing what to do until more data has arrived. This time can be used to run a different thread instead. So hyperthreading, in a way, increases the performance of the P4, provided you are running more than one concurrent heavy thread. Again, same as a dual CPU setup, though lower performing. I'm not exactly sure how much additional work can be squeezed out of a P4 this way, but I think it's like 15%-25%. Computer 3D image rendering, as in recent versions of 3ds max, can make very good use of this.
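To put that estimate in terms of the two-client case, here is a back-of-the-envelope sketch. It assumes both clients are fully CPU-bound and split the total throughput evenly, and it takes the 15%-25% figure above at face value; `per_client_speed` is a hypothetical helper.

```python
def per_client_speed(ht_gain):
    """Speed of each of two CPU-bound clients relative to one client running alone.

    ht_gain: extra total throughput from hyperthreading (0.0 means no HT).
    """
    total_throughput = 1.0 + ht_gain
    return total_throughput / 2  # assumption: the two clients share evenly

for gain in (0.0, 0.15, 0.25):
    print(f"HT gain {gain:.0%}: each client runs at {per_client_speed(gain):.1%}")
```

Under these assumptions each client goes from half speed to somewhat under two thirds speed, which is a real gain but not obviously the difference between dismal and smooth.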


Well now, back to your case. Why doesn't it work like it *should* with Windows' scheduler? The situation is not completely uncommon. But it is not clear either, and calls for some speculation. For some reason that has to do with the program model and structure of the app, the running thread sticks to a higher priority even though it's waiting for something, probably a server response, and thus doesn't allow the other client to run.

On your P4C on the other hand, Windows' scheduler sends 2 threads to the CPU, so it can run the second client as well, even when it has lower priority.
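That speculation can be expressed as a toy model. Everything here is an assumption for illustration: the priority numbers are made up, client A is assumed to stay runnable at a boosted priority the whole time, and the scheduler is reduced to "run the N highest-priority threads each tick". Real Windows also has anti-starvation boosts, so the second client would not get exactly zero time.

```python
def simulate(logical_cpus, ticks=100):
    """Each tick, run the `logical_cpus` highest-priority runnable threads."""
    # Hypothetical priorities: client A holds the foreground boost and never blocks.
    threads = {"client_a": 9, "client_b": 8}
    ran = {name: 0 for name in threads}
    for _ in range(ticks):
        ranked = sorted(threads, key=threads.get, reverse=True)
        for name in ranked[:logical_cpus]:
            ran[name] += 1  # this thread was dispatched during this tick
    return ran

print("1 logical CPU :", simulate(1))
print("2 logical CPUs:", simulate(2))
```

With one logical CPU the lower-priority client is never dispatched, while with two it is dispatched every tick. The two logical CPUs of a P4 share execution units, so a dispatched tick there is worth less than a full tick, but the second client is no longer starved.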
 

Vee

Senior member
Jun 18, 2004
689
0
0
Originally posted by: Lithan
(x86-64 of course doesn't perform as well as a True, full 64bit processor does. But it's cheaper and it offers much better 32bit compatibility.)

Well, yes, Lithan. :) X86-64 is indeed cheaper and does offer much better 32-bit compatibility. However, it is also most surely "a True, full 64bit processor" in every way. And I don't really see any other 64-bit processors outperform it much either.
 

Algere

Platinum Member
Feb 29, 2004
2,157
0
0
Originally posted by: CaiNaM
ahhh.. ok, seems a couple people suggest seeing how it runs on the intel system with hyperthreading disabled. that's a good idea, thanks. i'll try that and post back. will be awhile tho.. xp is hanging trying to do a .net framework security update; no errors, it just stops - it shows it's doing the update, but no disc access and no cpu usage. everything else is responsive. grrr....

Originally posted by: Algere
HT should help the P4, but nowhere near the degree you seem to be seeing...(maybe by 5% if it's a multithreaded app).
It seems there is a known issue with the Chaintech board and large fast Ram...
Newegg
devhardware

i have a newer stepping, and am not seeing that issue. the bios comments they mention seem accurate tho: if i run the 5/9 bios, which fixes the cpu temp output and allows changing the cpu multiplier, it will only recognize my ram as 333mhz. the 7/29 bios is unstable on my system. the shipping bios doesn't allow for multiplier adj. and the cpu temp is incorrect, but the system is perfectly stable, at least from the point of view of crashes/hangs.

Lies!! Deceiver!! Hiss... Hiss... :evil:


I didn't say that :p
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
Originally posted by: Vee
Originally posted by: Lithan
(x86-64 of course doesn't perform as well as a True, full 64bit processor does. But it's cheaper and it offers much better 32bit compatibility.)

Well, yes, Lithan. :) X86-64 is indeed cheaper and does offer much better 32-bit compatibility. However, it is also most surely "a True, full 64bit processor" in every way. And I don't really see any other 64-bit processors outperform it much either.

at least not in terms of performance/price.
 

Lithan

Platinum Member
Aug 2, 2004
2,919
0
0
A64 is a 32bit processor with 64bit extensions. There are a few proprietary processor manufacturers that make fully 64bit cpus. And they, mhz for mhz, absolutely stomp all over the A64. And surely whatever 32bit processors Intel decides to enable 64bit on.