athlon 64: why is its poor multitasking ignored/downplayed?


CaiNaM

Diamond Member
Oct 26, 2000
3,718
0
0
Originally posted by: HeroOfPellinor
Originally posted by: CaiNaM
you know.. i wonder what difference 1mb l2 cache and/or dual channel ram would make (tho i doubt it's bandwidth related; a64 with its on-die memory controller does not benefit from dual channel anywhere near as much as intel does)? wish i could make that comparison... =/

at any rate, i made a short video (14mb) of how rthdribl impacts this system. if you want to d/l it, you can get it here.

here's another observation. right now i'm running the athlon64 @ 2.3ghz/230htt (10x230). haven't installed all apps on this system yet, so i'm using winmpg to convert formats. takes like 1 sec to load, and 18 seconds to convert the 35mb .mov file to a 14mb divx file.

with rthdribl loaded and running, it takes 4 seconds to load the app, and 46 sec. to convert that same file to divx. that's over 2.5x as long. during this time, rthdribl, which normally runs 70fps give or take, drops down to about 40fps. i'll let you guys draw your own conclusions.

I conclude, that people who enjoy running rthdribl while doing something else should get P4s.

well, since rthdribl is only an example to represent any "taxing" application, you're saying, "that people who enjoy running <insert ANY application here> while doing something else should get P4s."

i certainly wouldn't go THAT far, however i do think it's something to take into consideration if someone often does multiple things on the pc simultaneously.
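
For reference, the slowdown figures in the quoted post can be worked out directly. A minimal sketch in Python: the function names are made up for illustration, and the inputs are just the timings and frame rates reported above.

```python
def slowdown(alone_s, loaded_s):
    """How many times longer a job takes under load than it does alone."""
    return loaded_s / alone_s


def retained(alone_rate, loaded_rate):
    """Fraction of the original rate (e.g. fps) that survives under load."""
    return loaded_rate / alone_rate


# figures reported in the post: convert takes 18 s alone, 46 s with rthdribl running
encode = slowdown(18, 46)
# rthdribl itself: about 70 fps alone, about 40 fps during the convert
fps = retained(70, 40)

print(f"encode takes {encode:.2f}x as long")   # 2.56x
print(f"rthdribl keeps {fps:.0%} of its fps")  # 57%
```

So the encode is slowed by a bit more than 2.5x while the renderer keeps a little over half its frame rate.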

 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,280
16,121
136
Well, I think that if you run 2 instances of something that is taxing, and each runs half as fast, then the cpu is doing its job.... You can't get blood out of a rock, not even with a P4.
 

CaiNaM

Diamond Member
Oct 26, 2000
3,718
0
0
Originally posted by: Markfw900
Well, I think that if you run 2 instances of something that is taxing, and each runs half as fast, then the cpu is doing its job.... You can't get blood out of a rock, not even with a P4.

you certainly can. at least if the rock is not bleeding all the way from performing the first task ;p

the entire issue seems to be that when running multiple apps, the p4 takes advantage of being slightly slower in single tasks. the athlon 64 keeps its short, 12-stage pipe full almost all the time - a significant part of the equation which makes the a64 much more efficient clock for clock compared to a p4, which has trouble keeping its 20-stage pipe full.

HT simply makes use of this by filling the pipes with instructions from the second application, increasing the efficiency of the longer pipeline, allowing it to do more work, clock for clock, than it can when running a single application. combined with its much higher clock rate, this can make a significant difference.

using your analogy, since the a64 architecture uses everything it has to perform a task, it has little left over to handle a second, simultaneous operation w/o taking away cpu cycles from its original task. as you suggest, it simply can't "bleed" anymore; the p4, on the other hand, is "coasting" due to its inability to use its stages at max. efficiency when performing a single task. in essence, it has "blood" left over for a second task, as HT makes use of the unused portions to process the second task simultaneously.

this also explains why prescott, with its 31-stage pipeline, scales better than the northwood as clockspeed increases.
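
The argument above can be put as a toy model. This is a deliberately crude sketch, not a description of how either cpu really schedules work: it assumes a core is a single pool of execution capacity, that one thread on its own keeps only a fraction of that pool busy, and that HT lets a second thread use whatever the first leaves idle. The utilization figures (0.9 and 0.6) are invented for illustration, not measurements.

```python
def per_thread_share_timesliced(solo_util):
    """Two equal threads taking turns on a core without HT.

    Each thread gets the core half the time, so each achieves half
    of what it would achieve alone.
    """
    return 0.5 * solo_util


def per_thread_share_smt(solo_util):
    """Two equal threads sharing a core with HT (toy model).

    Combined work is capped by the core's full capacity (1.0);
    the two threads split whatever total is reached.
    """
    combined = min(1.0, 2 * solo_util)
    return combined / 2


def relative_to_solo(share, solo_util):
    """Per-thread speed as a fraction of its single-task speed."""
    return share / solo_util


# hypothetical numbers: an "efficient" core that is 90% busy on one thread,
# and a "long pipeline" core that is only 60% busy on one thread
a64_like = relative_to_solo(per_thread_share_timesliced(0.9), 0.9)
p4_like = relative_to_solo(per_thread_share_smt(0.6), 0.6)

print(f"no HT, 90% busy solo: each task runs at {a64_like:.0%} of solo speed")
print(f"HT, 60% busy solo:    each task runs at {p4_like:.0%} of solo speed")
```

In this model the HT core loses less per task only because it had idle capacity to begin with, which is the point both sides of the thread are circling.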

 
Dec 27, 2001
11,272
1
0
Originally posted by: CaiNaM
Originally posted by: HeroOfPellinor
I conclude, that people who enjoy running rthdribl while doing something else should get P4s.

well, since rthdribl is only an example to represent any "taxing" application, you're saying, "that people who enjoy running <insert ANY application here> while doing something else should get P4s."

i certainly wouldn't go THAT far, however i do think it's something to take into consideration if someone often does multiple things on the pc simultaneously.

Seriously, I don't know any experienced computer user who runs two taxing things at the same time. One can usually wait and if it absolutely cannot wait, then you usually have a separate rig for it. It's not like the P4 suffers ZERO slowdown when running two taxing apps.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,280
16,121
136
Originally posted by: CaiNaM
Originally posted by: Markfw900
Well, I think that if you run 2 instances of something that is taxing, and each runs half as fast, then the cpu is doing its job.... You can't get blood out of a rock, not even with a P4.

you certainly can. at least if the rock is not bleeding all the way from performing the first task ;p

the entire issue seems to be that when running multiple apps, the p4 takes advantage of being slightly slower in single tasks. the athlon 64 keeps its short, 12-stage pipe full almost all the time - a significant part of the equation which makes the a64 much more efficient clock for clock compared to a p4, which has trouble keeping its 20-stage pipe full.

HT simply makes use of this by filling the pipes with instructions from the second application, increasing the efficiency of the longer pipeline, allowing it to do more work, clock for clock, than it can when running a single application. combined with its much higher clock rate, this can make a significant difference.

using your analogy, since the a64 architecture uses everything it has to perform a task, it has little left over to handle a second, simultaneous operation w/o taking away cpu cycles from its original task. as you suggest, it simply can't "bleed" anymore; the p4, on the other hand, is "coasting" due to its inability to use its stages at max. efficiency when performing a single task. in essence, it has "blood" left over for a second task, as HT makes use of the unused portions to process the second task simultaneously.

this also explains why prescott, with its 31-stage pipeline, scales better than the northwood as clockspeed increases.
So what you are saying is that an inefficient CPU becomes a little more efficient with HT when running multiple apps. So? I will take the efficient CPU that runs each process at 50% when running two of the same demanding app.
 

CaiNaM

Diamond Member
Oct 26, 2000
3,718
0
0
Originally posted by: Markfw900
So what you are saying is that an inefficient CPU becomes a little more efficient with HT when running multiple apps. So? I will take the efficient CPU that runs each process at 50% when running two of the same demanding app.

no you wouldn't. if you would, you wouldn't have dual opts.

we all know amd is more efficient clock for clock. we also know that in most instances amd's are a bit faster when comparing PR ratings to intel's actual MHz. if you want to give up being able to do simultaneous tasks smoothly to get a few % pts faster in single tasks and benchmarks, that's great; as i stated several times some ppl just don't use their pc's in that manner. but again, you obviously wouldn't as you have a dual opteron setup which effectively eliminates this weakness with the a64 architecture.

it's obvious you're an amd fan, and hey, if you want to wave your amd banner, and if that makes you happy, that's just fine... but it still doesn't change the facts... :disgust:
 

CaiNaM

Diamond Member
Oct 26, 2000
3,718
0
0
Originally posted by: HeroOfPellinor
Originally posted by: CaiNaM
Originally posted by: HeroOfPellinor
I conclude, that people who enjoy running rthdribl while doing something else should get P4s.

well, since rthdribl is only an example to represent any "taxing" application, you're saying, "that people who enjoy running <insert ANY application here> while doing something else should get P4s."

i certainly wouldn't go THAT far, however i do think it's something to take into consideration if someone often does multiple things on the pc simultaneously.

Seriously, I don't know any experienced computer user who runs two taxing things at the same time. One can usually wait and if it absolutely cannot wait, then you usually have a separate rig for it. It's not like the P4 suffers ZERO slowdown when running two taxing apps.
if that's the case, i can't quite figure out why you're spending so much of your time following a thread about multitasking, as from what you state it obviously doesn't relate to you in any way whatsoever.

again, the world is bigger than the room around you. that you use something in a particular way is fine, but that doesn't mean everyone else uses their pc the same way as you do.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,280
16,121
136
I also own an Athlon64 3000+, and the dual opterons were so I could have my SCSI array running at 133 MHz and at 64 bit, NOT for the dual part. But I must say, now that I have one, I will always have at least one. I have 10 computers, so I have a wide array to test on: the XP series is OK vs the P4 in single tasks, the Athlon64 is great at everything, but the duallies just can't be stopped !

And the facts are they are fine at multitasking.....
 

KDKPSJ

Diamond Member
Dec 13, 2002
3,288
58
91
Originally posted by: Markfw900
Originally posted by: CaiNaM
Originally posted by: Markfw900
Well, I think that if you run 2 instances of something that is taxing, and each runs half as fast, then the cpu is doing its job.... You can't get blood out of a rock, not even with a P4.

you certainly can. at least if the rock is not bleeding all the way from performing the first task ;p

the entire issue seems to be that when running multiple apps, the p4 takes advantage of being slightly slower in single tasks. the athlon 64 keeps its short, 12-stage pipe full almost all the time - a significant part of the equation which makes the a64 much more efficient clock for clock compared to a p4, which has trouble keeping its 20-stage pipe full.

HT simply makes use of this by filling the pipes with instructions from the second application, increasing the efficiency of the longer pipeline, allowing it to do more work, clock for clock, than it can when running a single application. combined with its much higher clock rate, this can make a significant difference.

using your analogy, since the a64 architecture uses everything it has to perform a task, it has little left over to handle a second, simultaneous operation w/o taking away cpu cycles from its original task. as you suggest, it simply can't "bleed" anymore; the p4, on the other hand, is "coasting" due to its inability to use its stages at max. efficiency when performing a single task. in essence, it has "blood" left over for a second task, as HT makes use of the unused portions to process the second task simultaneously.

this also explains why prescott, with its 31-stage pipeline, scales better than the northwood as clockspeed increases.
So what you are saying is that an inefficient CPU becomes a little more efficient with HT when running multiple apps. So? I will take the efficient CPU that runs each process at 50% when running two of the same demanding app.

Well, let me put myself into the topic again. Isn't a *really tough job* such as video encoding or a 3D game different from the normal applications mentioned above? As I mentioned earlier, my Barton was having a hard time with multitasking while encoding was running. I don't know how encoding programs work, but it seems like encoding tools eat up all the CPU power there is, everything. So, my guess is that, because HT logically divides the long pipeline into two parts, it always leaves at least some of the pipeline for the other logical CPU even if a really tough job eats up everything the CPU has. And the leftover pipeline is used for jobs other than encoding. In contrast, on an AMD CPU the encoding tool eats up everything the CPU has because there's no logical barrier or limit on pipeline use. So, even though AMD's CPU is actually faster and more efficient than Intel's, Intel works better in such a situation. Maybe I am wrong about this, because it's from an article I read about a year or so ago, when I was looking to upgrade.

EDIT:
Ok, I just turned off HT to see how everything goes. And the result is.. awful. It's about the same as the Barton I had. Of course, the encoding rate is absolutely better than the Barton's, but it can't do much multitasking. So, it's like a Barton with a higher clock if I don't use HT, and thus I assume the A64 would be about the same as well (because the A64 is a higher clocked Barton anyway, except for x86-64 =P). But again, maybe it's a limit of M$ windows, I guess. I am going back to turn on HT, guess I can't live w/o HT anymore, lol.
 

CaiNaM

Diamond Member
Oct 26, 2000
3,718
0
0
Originally posted by: Markfw900
I also own an Athlon64 3000+, and the dual opterons were so I could have my SCSI array running at 133 MHz and at 64 bit, NOT for the dual part. But I must say, now that I have one, I will always have at least one. I have 10 computers, so I have a wide array to test on: the XP series is OK vs the P4 in single tasks, the Athlon64 is great at everything, but the duallies just can't be stopped !

And the facts are they are fine at multitasking.....

quite the opposite, as i have shown and others here have stated... as well as the few sparse reviews that have actually compared this and have expressed similar concerns.. but i guess your testimonial is somehow more valid?
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,280
16,121
136
Originally posted by: CaiNaM
Originally posted by: HeroOfPellinor
Originally posted by: CaiNaM
Originally posted by: HeroOfPellinor
I conclude, that people who enjoy running rthdribl while doing something else should get P4s.

well, since rthdribl is only an example to represent any "taxing" application, you're saying, "that people who enjoy running <insert ANY application here> while doing something else should get P4s."

i certainly wouldn't go THAT far, however i do think it's something to take into consideration if someone often does multiple things on the pc simultaneously.

Seriously, I don't know any experienced computer user who runs two taxing things at the same time. One can usually wait and if it absolutely cannot wait, then you usually have a separate rig for it. It's not like the P4 suffers ZERO slowdown when running two taxing apps.
if that's the case, i can't quite figure out why you're spending so much of your time following a thread about multitasking, as from what you state it obviously doesn't relate to you in any way whatsoever.

again, the world is bigger than the room around you. that you use something in a particular way is fine, but that doesn't mean everyone else uses their pc the same way as you do.
OK, I think the reason he and I and others are following the thread so close, is YES, the P4 is a little better at multitasking, only to make up for its sub-standard performance otherwise compared to the Athlon64, but the thread says POOR performance for the Athlon64, there is a big difference, and it's just not true.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,280
16,121
136
Originally posted by: SimsFreak
maybe it's because windows 64bit is still beta?
I'm not even using windows 64 bit, and neither is anybody else, we are talking 32bit here.....

 

Lithan

Platinum Member
Aug 2, 2004
2,919
0
0
Originally posted by: CaiNaM
Originally posted by: Markfw900
I also own an Athlon64 3000+, and the dual opterons were so I could have my SCSI array running at 133 MHz and at 64 bit, NOT for the dual part. But I must say, now that I have one, I will always have at least one. I have 10 computers, so I have a wide array to test on: the XP series is OK vs the P4 in single tasks, the Athlon64 is great at everything, but the duallies just can't be stopped !

And the facts are they are fine at multitasking.....

quite the opposite, as i have shown and others here have stated... as well as the few sparse reviews that have actually compared this and have expressed similar concerns.. but i guess your testimonial is somehow more valid?

Yes. I'd say his testimonial is more valid than reviews which have been discredited and someone who continues to reference them still.

Anyhow. I got my hands on a copy of winxp corporate, so next time I get my hands on a p4 (c) proc (Next really good 400sc deal probably). I'll be able to give performance numbers in typical desktop situations. Of course, I'm not a retarded fanboy, so "typical desktop conditions" won't include encoding while I try to play doom3 then bragging about my awesome 15 fps vs someone elses 6fps.
 

CaiNaM

Diamond Member
Oct 26, 2000
3,718
0
0
Originally posted by: Markfw900
again, the world is bigger than the room around you. that you use something in a particular way is fine, but that doesn't mean everyone else uses their pc the same way as you do.
OK, I think the reason he and I and others are following the thread so close, is YES, the P4 is a little better at multitasking, only to make up for its sub-standard performance otherwise compared to the Athlon64, but the thread says POOR performance for the Athlon64, there is a big difference, and it's just not true.

perhaps you should read the thread title then? it doesn't say poor performance, it says, "poor multitasking", which has been the subject at hand.

as far as "sub-standard" performance of the p4, that's not really accurate, although i guess that would depend on what one's "standards" are. i'm not playing favorites here, just trying to be accurate.. some points i've made earlier:

- gaming benchmarks have shown 5-10% performance advantage in favor of my amd64.
- while intel does encode most things at a faster rate, there are some things the amd does just as well.
- most applications seem to perform slightly better on the amd vs a similarly rated intel.

those are all positives in amd's favor.

- multitasking performance is noticeably lower on the amd when performing more than 1 cpu intensive task simultaneously.

which seems to be the only weakness regarding the a64. while i've shown/observed some things impacted more negatively than others, taking over 2.5x as long to encode a 60 second clip in the background shows a substantial drop off in performance. imagine if it was a 2 hour video... granted, not everyone encodes videos, per se, but other applications are affected as well, tho perhaps not as significantly.

i've also stated this may not be as pronounced on s939 (tho again, it's my opinion dual channel mem simply doesn't have the impact on a64's) or on chips featuring 1mb l2 cache. i'd certainly like to compare them, and i may, tho i'm not sure it's worth the risk of spending even more money on the a64 platform (already bought 2 motherboards and another 1gb mem just to eliminate that possibility).

i'd certainly be happy if the amd's "substandard" multitasking performance was as slight as the p4's "substandard" single tasking performance, but the differences are far greater, and while it may not impact everyone to the degree it does me, it's certainly a valid observation, as i've both shown in actual application and references from online reviews, as well as a logical conclusion as to why this is the case.

it essentially boils down to this: while the p4 can process multiple threads simultaneously, the shorter a64 pipeline forces it to rely much more on windows thread tasking, and i have not heard a single explanation or theory which refutes this. your statement, "And the facts are they are fine at multitasking....." certainly doesn't.

Originally posted by: SimsFreak
maybe it's because windows 64bit is still beta?

well, it's certainly possible that, relying more on the OS for task switching, this situation may be improved by 64b windows. unfortunately, as you say it's still beta, and the lack of 64-bit drivers at this point certainly doesn't help.

it's my feeling that dual core a64s will eliminate what i perceive as amd's last weakness, at which time it will be really interesting to see how intel responds. will also be interesting to see where the pricepoints will be. amd needs to stay aggressive as it's clear they are making a sizeable dent in intel's marketshare dominance. this is perhaps their best opportunity yet.

Originally posted by: Lithan
Yes. I'd say his testimonial is more valid than reviews which have been discredited and someone who continues to reference them still.

Anyhow. I got my hands on a copy of winxp corporate, so next time I get my hands on a p4 (c) proc (Next really good 400sc deal probably). I'll be able to give performance numbers in typical desktop situations. Of course, I'm not a retarded fanboy, so "typical desktop conditions" won't include encoding while I try to play doom3 then bragging about my awesome 15 fps vs someone elses 6fps.

lol.. yes, you've already proven how relevant your comments have been... you've ignored multiple sources and multiple logical explanations with nothing more credible than your own obviously biased opinion. i was gonna say, "sure, spoken as one fanboy to another", but that wouldn't exactly be fair to Markfw900.

and the extremetech article was never discredited (which was one test within a single review, not "reviews" as you claim), other than one person saying that - and even then it was only stated regarding the fs2004 "test", not to other tests within the same article which showed similar behavior, but that's also not the only article i've "continued to reference", as other reviews have made similar observations.



 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,280
16,121
136
CaiNaM, I meant multitasking when I said performance, I should have said multitasking performance, I just thought it was implied, in fact obvious. The one comment I disagree with is :
- multitasking performance is noticeably lower on the amd when performing more than 1 cpu intensive task simultaneously.
Although possibly it is a quantitative one. I can't see a problem without a freaking benchmark using the Athlon64 in multitasking (NOT dual Opterons, they are a killer), but I do notice the single app performance difference without a benchmark with respect to the P4.

How many different ways do I have to word this ?
 

CaiNaM

Diamond Member
Oct 26, 2000
3,718
0
0
Originally posted by: Markfw900
CaiNaM, I meant multitasking when I said performance, I should have said multitasking performance, I just thought it was implied, in fact obvious. The one comment I disagree with is :
- multitasking performance is noticeably lower on the amd when performing more than 1 cpu intensive task simultaneously.
Although possibly it is a quantitative one. I can't see a problem without a freaking benchmark using the Athlon64 in multitasking (NOT dual Opterons, they are a killer), but I do notice the single app performance difference without a benchmark with respect to the P4.

How many different ways do I have to word this ?
i dunno.. in all honesty i don't see the single task advantage of my a64 over my p4 without a benchmark. i know it's there, i've benchmarked myself and seen it, but say, playing farcry at 55fps (a64) vs 50fps (p4) is not something i notice. in terms of "realworld", there isn't a noticeable difference (tho perhaps for some there may be a perceptual difference).

otoh, taking over 2.5x as long to encode something, seeing the d/l rate in a background ftp session cut in half when i do something else, or being unable to run a second daoc client is definitely noticeable. maybe i just have higher standards having a p4 that CAN do all those things without performance issues (tho i'd think you'd feel the same being spoiled by your dual opt rig).

i'd certainly be more accepting of your claims if you offered even a logical explanation that this doesn't exist - i mean.. not proof.. even a logical theory. i'd certainly prefer solving this issue rather than spending that much more money on a new platform (again), but seeing what i am seeing, and hearing what i've heard from others (even in this thread), i simply cannot accept a simple, "And the facts are they are fine at multitasking", as anything other than a brandname loyal response.

i mean, no offense to you meant, as this is a general observation, but questioning anything regarding amd on AT lately is a lot like going on rage3d and saying anything that's not "pro ati"... if you follow ANY of my posts here at AT, rage3d, nvnews, and other forums, it would be obviously apparent i don't go into these things with any brand bias whatsoever - as a matter of fact, i also have both nv40 and r420 in my pc's..... along with another amd chip (barton) to go with my 2 intel pc's. while i certainly prefer to favor the underdog, i am not motivated by epenis envy or brand loyalty, and i certainly would not be wasting my time (and everyone else's - this thread has 4000 views) bringing these things up if they weren't valid.

for me, it's all about sharing experiences, knowledge, and information, not about rooting for one side or the other. if there's anything i despise, it's FUD above all, followed by those who spread it...

 

glugglug

Diamond Member
Jun 9, 2002
5,340
1
81
The poor multitasking when RTHDRIBL is running on my machine seems limited to IE.
Firefox is quite zippy with it running right now. MSN page takes <1s to load. Partially because I set network.http.max-connections=48 and network.http.max-connections-per-server=16 on firefox, but still.... very different from the 13s in IE.
 

CaiNaM

Diamond Member
Oct 26, 2000
3,718
0
0
Originally posted by: glugglug
The poor multitasking when RTHDRIBL is running on my machine seems limited to IE.
Firefox is quite zippy with it running right now. MSN page takes <1s to load. Partially because I set network.http.max-connections=48 and network.http.max-connections-per-server=16 on firefox, but still.... very different from the 13s in IE.

hey glug.. which cpu do you have? 939 or 754? 1mb cache or 512?
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,280
16,121
136
CaiNam, again you are missing my point, when I encode, and multitask, I am gaming, so if it takes 30 min, or 1 hr 30 min, I don't care, but I don't like my games to lag. I give up on this thread, you have your mind set. Buy a P4, it works for you, fine.....
 

CaiNaM

Diamond Member
Oct 26, 2000
3,718
0
0
hmm.. okay, so we use the same chipset, and have the same l2 cache, so that wouldn't explain the difference.. the only other difference is dual channel memory (curiously, the p4 also has dual channel memory, of course). i'm wondering if, while there's no significant difference in most tasks, somehow the extra memory bandwidth allowed by the 939 has some impact when doing multiple tasks? while you see the same thing i do to a certain degree, it seems what i see is more pronounced.

i wonder how we could test this.... anyone offer any suggestions?
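
One way to put numbers on this, sketched in Python: time a cpu-bound job on its own, then time it again while a second process keeps the cpu busy, and compare the ratio on each machine. Everything here is a stand-in (busy_work is not an encoder, and all the names are invented for this example), so it only shows the shape of the test, not a result.

```python
import multiprocessing as mp
import time


def busy_work(n):
    """CPU-bound stand-in for an encode job: sum of squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def _spin(stop_event):
    """Background load: keep computing until told to stop."""
    while not stop_event.is_set():
        busy_work(10_000)


def timed(fn, *args):
    """Wall-clock seconds taken by fn(*args)."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


def compare(n=2_000_000):
    """Return (seconds alone, seconds with a competing cpu-bound process)."""
    alone = timed(busy_work, n)
    stop = mp.Event()
    hog = mp.Process(target=_spin, args=(stop,))
    hog.start()
    try:
        loaded = timed(busy_work, n)
    finally:
        # always stop the background process, even if timing fails
        stop.set()
        hog.join()
    return alone, loaded
```

Running compare() on the 754 box, the 939 box and the p4, then looking at loaded / alone, would give a like-for-like ratio. On Windows it has to be called from under `if __name__ == "__main__":`, since multiprocessing re-imports the script there. To probe memory bandwidth specifically, busy_work would need swapping for something that streams through a large buffer; this version mostly exercises the core itself.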
 

CaiNaM

Diamond Member
Oct 26, 2000
3,718
0
0
Originally posted by: Markfw900
CaiNam, again you are missing my point, when I encode, and multitask, I am gaming, so if it takes 30 min, or 1 hr 30 min, I don't care, but I don't like my games to lag. I give up on this thread, you have your mind set. Buy a P4, it works for you, fine.....

no, i don't "miss" your point.. that you "don't care" doesn't make the fact this condition exists conveniently "disappear"... and you certainly didn't say, "yea, but while that may be true, it doesn't really affect me..", rather you suggest the issue doesn't exist, regardless of whether it affects the way YOU use your pc.

as a matter of fact, i stated just that earlier when someone specifically asked what he should buy - that this may not bother some people as they just don't use their pc in that manner, and specifically referenced gaming and encoding.