Dyflam - I hate to add more to this thread, but regarding v3.0: L2 cache size is no longer an issue (e.g., my dual Xeon II 400 1MB now SUCKS with v3.0 compared to 2.x). The working set of the 3.0 client *will* fit in the Coppermine PIII's 256K cache, whereas 2.4 needed something like 800K of cache. The Duron's total cache is naturally a bit smaller than the PIII's (i.e., the T-bird is more comparable to the PIII). However, cache *speed* seems to help, along with FSB speed.
Your PIII 450 should be a Katmai core with half-speed 512K cache, whereas the Duron has full-speed cache (192K total, I believe?), so you'd expect it to do better. But since the AMDs run on a VIA chipset, you're unfortunately not getting the full performance out of it. There's some tweaking you can do using H. Oda's tools that would speed up your times on that Duron, e.g., enabling 4-way memory interleaving.
So you do have some room for improvement - check over on Distributed Computing and folks can help you.
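
For anyone who wants the cache-size argument spelled out, here is a rough sketch in Python. The ~800K figure for 2.4 and the cache sizes are the ones quoted in this post. The 3.0 working set is only known to fit in 256K, so 256 is used as an assumed upper bound, not a measured size, and all the names are made up for illustration.

```python
# Cache sizes in KB as quoted in this post. Note the Duron figure is the
# total (L1 + L2), while the others are L2 only, so this is a rough comparison.
CACHES_KB = {
    "Xeon II 400 (1MB L2)": 1024,
    "PIII Katmai (512K half-speed L2)": 512,
    "PIII Coppermine (256K L2)": 256,
    "Duron (192K total)": 192,
}

# Approximate client working sets in KB.
# 800 is the rough figure quoted for 2.4; 256 for 3.0 is an ASSUMED upper
# bound (the post only says it fits in 256K), so the real value may be smaller.
WORKING_SET_KB = {"2.4": 800, "3.0": 256}


def fits(version, cache_kb):
    """Return True if the client's working set is no larger than the cache."""
    return WORKING_SET_KB[version] <= cache_kb


def chips_that_fit(version):
    """List the chips whose quoted cache can hold the whole working set."""
    return [name for name, kb in CACHES_KB.items() if fits(version, kb)]


if __name__ == "__main__":
    for version in WORKING_SET_KB:
        print(version, "fits in:", chips_that_fit(version))
```

With these numbers only the 1MB Xeon holds the 2.4 working set, which is why big caches mattered so much for 2.x. Because 256 is just an upper bound for 3.0, the sketch can't say whether the Duron's 192K is enough. And it only looks at size, not cache or FSB speed.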