Games and DVD encoding are repetitive tasks? Please, reconsider.
Sure, they're more computationally repetitious in their CPU execution graphs and data access patterns than an email client or web browser
would be, but they're HARDLY good examples of compute-bound, memory-bound, or cache-bound situations.
Games are interactive, and the vast majority of their performance dependency lies in the capabilities of
your GPU. That's WHY many people spend $200 to $600+ on high-powered GPU cards -- to make GAMES
perform faster. Try running almost any fast-action, graphically rich, heavily textured 3D game at 1600x1200 on JUST a CPU with
a basic GPU and see how far you get toward acceptable performance, no matter WHAT CPU you have -- even the most
EXTREME EDITION, QUAD CORE or SINGLE CORE, OVERCLOCKED, WATER COOLED one, whatever, will pathetically fail to achieve
what even a $150 graphics card can do.
Actually, that's the POSTER CHILD example proving my very point: the computational ARCHITECTURE is often the MAIN
factor in real-world performance, not mere CPU MHz. It may surprise you to know that GPUs have very LITTLE cache; instead
they have massive numbers of independent processors, each running at very high speed, coupled with lots of very high-speed
memory. The reason is that GPU rendering is anything BUT a computationally repetitive activity -- it's called a
STREAMING activity, where you calculate a given triangle's position and shading and then move on to the next one, probably
never processing the previous triangle AGAIN until the shader finishes, which is a very long time and a very large number
of triangles in the future. I said very clearly that cache benefits LOCALLY REPETITIVE processing of code/data, and
3D graphics in the traditional sense is not that.
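If you want to see the difference concretely, here's a minimal C sketch contrasting the two access patterns (the buffer sizes and loop counts are numbers I made up purely for illustration, not from any real renderer):

#include <stdio.h>
#include <stdlib.h>

#define N   (16 * 1024 * 1024)  /* 16M floats -- far bigger than any CPU cache */
#define HOT (8 * 1024)          /* 8K floats -- small enough to stay in cache  */
#define PASSES 2048

/* Streaming, like walking a triangle list: touch each element exactly
   once.  The cache can't help, because nothing is ever revisited. */
double stream_once(const float *buf) {
    double sum = 0.0;
    for (size_t i = 0; i < N; i++)
        sum += buf[i];
    return sum;
}

/* Locally repetitive: hammer a small working set over and over.  After
   the first pass the data lives in cache and main memory sits idle. */
double reuse_hot(const float *buf) {
    double sum = 0.0;
    for (int p = 0; p < PASSES; p++)
        for (size_t i = 0; i < HOT; i++)
            sum += buf[i];
    return sum;
}

int main(void) {
    float *buf = malloc(N * sizeof *buf);
    if (!buf) return 1;
    for (size_t i = 0; i < N; i++) buf[i] = 1.0f;
    /* Both loops do a comparable amount of arithmetic, but the second
       runs many times faster on any CPU with a decent cache. */
    printf("%.0f %.0f\n", stream_once(buf), reuse_hot(buf));
    free(buf);
    return 0;
}

Time those two loops yourself: the first is what rendering's data access looks like, and no amount of cache saves it, which is why a GPU spends its transistors on raw ALUs and memory bandwidth instead.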
The overall GAME performance is therefore GPU-limited at any reasonably high resolution for most cutting-edge games,
and what the CPU is left to do isn't on the critical path, nor is it necessarily very local or repetitious.
Now, as for DVD encoding: sure, it's computationally intensive, but again, it's PRIMARILY either limited by
SPECIAL PURPOSE HARDWARE or a streaming, not-very-repetitive (in its data) activity.
To compress a given line of video you need to know that line, the previous couple of lines, the next couple of lines, and
maybe the same lines of the previous and next frames of video. Once you have those data you can do your motion
estimation, spatial quantization, chromatic quantization, et al. You process those lines once, and outside of
that small window of time and spatial context you NEVER TOUCH THEM AGAIN. Accessing data a few dozen times and
never touching it again isn't my (or a CPU's) definition of a super repetitive process. It's more of a streaming process.
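In code terms, that sliding window looks roughly like the following C sketch (the field dimensions, context width, and the toy "motion cost" are all invented for illustration -- a real MPEG encoder is enormously more involved):

#include <stdlib.h>

#define W   720   /* pixels per line (DVD-ish, illustrative)  */
#define H   480   /* lines per field                          */
#define CTX 2     /* lines of context above and below         */

/* Toy "encode" of one line: reads the line itself, CTX lines above,
   CTX lines below, and the matching line of the previous field.
   Each pixel is therefore touched only a handful of times before
   the window slides past it for good. */
static long encode_line(const unsigned char *cur,
                        const unsigned char *prev, int y) {
    long acc = 0;
    for (int dy = -CTX; dy <= CTX; dy++) {
        int yy = y + dy;
        if (yy < 0 || yy >= H) continue;
        for (int x = 0; x < W; x++)
            acc += abs(cur[yy * W + x] - prev[y * W + x]); /* toy motion cost */
    }
    return acc;
}

long encode_field(const unsigned char *cur, const unsigned char *prev) {
    long total = 0;
    for (int y = 0; y < H; y++)                /* window only slides forward */
        total += encode_line(cur, prev, y);
    return total;
}

Each line gets read about five times while the window passes over it, and then never again -- exactly the few-touches-then-gone pattern I'm describing.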
At 60 fields per second that's 216,000 fields of video per hour, and you're accessing any given
field maybe four times total, since once you've read it and encoded it, you move on to the next, totally unrelated
few fields. Sure, a properly written software encoder can still make GOOD use of the CPU cache to deliver
substantial performance benefits -- maybe 10x compared to the same code that
either didn't use the cache right at all or had to run on a CPU with WOEFULLY too little cache.
However, the fraction of even POSSIBLY cacheable accesses out of the total is still small, due to the low data-access
repetition, even if the code itself is probably a lot more repetitive for the DCTs, quantizations, etc.
Anyway, that's all somewhat moot, since anyone who's REALLY concerned about high video transcoding performance
probably has special-purpose DSP encoder hardware inside their GPU, video chipset, or digital video peripheral
that does about 90% if not 99% of the computing work involved in DCT / MPEG-2 / MPEG-4 / H.264 type encoding
and decoding -- hardware specially designed to do just that task, ten to fifty times faster than any
general-purpose CPU-based software encoder can touch. Ever seen ATI's AVIVO transcoder stuff?
The performance advantage of hardware MPEG-2 encoders/decoders for HDTV? I rest my case.
Saying that CPU cache gives only a moderate advantage to algorithms that don't even mostly
run on the CPU, or run in a repetitive manner, just points out how generally USEFUL
CPU cache is: it gives decently noteworthy gains even to programs that aren't great utilizers of it.
Now let's look at another REAL-WORLD example, one that's a LOT more repetitive in certain configurations --
from the Folding@Home forum. Observe the following gentleman's comment about a REAL-WORLD
2X performance benefit, due to cache, on computations that take HOURS (much longer than just a DVD encode!).
His 4MB-cache Core 2 can do the same type of work in half the time as his other machine that has
less cache.
As I've said before, you can (and many people should, if they want the best performance)
OVERCLOCK *ANY* CPU QUITE A BIT, BUT YOU CANNOT OVERCACHE ONE AT ALL!
So if you happen to be running some program that's not hardware accelerated and needs
repetitive computations for CPU-hours per day or week, you CAN AND WILL get
up to a 3X to 10X benefit (depending on the application) from having enough cache for
the program that the 'hit ratio' is very high, as opposed to having so little cache that
the hit ratio is well under 20%.
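You can put rough numbers on that with the textbook average-memory-access-time formula. Here's a trivial C calculation (the cycle counts are ballpark guesses I picked for illustration, not measurements of any particular CPU):

#include <stdio.h>

/* Textbook average memory access time (AMAT):
   AMAT = hit_time + miss_rate * miss_penalty */
static double amat(double hit_time, double hit_ratio, double miss_penalty) {
    return hit_time + (1.0 - hit_ratio) * miss_penalty;
}

int main(void) {
    /* Illustrative guesses: ~3-cycle cache hit, ~200-cycle trip to DRAM */
    double good = amat(3.0, 0.95, 200.0);  /* 95% hit ratio ->  13 cycles */
    double bad  = amat(3.0, 0.20, 200.0);  /* 20% hit ratio -> 163 cycles */
    printf("good: %.0f cycles, bad: %.0f cycles, gap: %.1fx\n",
           good, bad, bad / good);         /* roughly a 12x gap per access */
    return 0;
}

When memory accesses dominate the run time, a per-access gap like that is exactly where an application-level 3X to 10X difference comes from.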
You just CANNOT buy at an attractive price, OR OVERCLOCK, a CPU 3X faster than a modern
mid-range one. So if you're doing very repetitive accesses to code/data and that's taking
something like 80% of your total run time, it's the cache effectiveness that's MOSTLY
responsible for your performance, NOT the CPU MHz!
Look up the terms "compute bound", "memory bound", and "cache bound" -- some things
need little cache and a fast CPU and main memory (streaming programs), some things need super-effective cache,
and some need a mix of both.
Assuming you overclock WHATEVER processor you buy to the max, and overclock whatever RAM
you have to the max, the KEY difference in whether you should even BOTHER trying to run certain
programs is whether they fit well in the cache.
Know why you don't hear about a lot of programs that run miserably on PCs with small
caches in common benchmarks? Because nobody would even TRY to run such a thing
on a PC UNLESS it had enough cache to perform well. Otherwise you'd buy a
workstation-class or supercomputer-class processor, or a Beowulf cluster, or whatever, because
you'd just be wasting time and money trying to run something that would be crippled
on a PC by its small cache.
It's only recently that common PCs have even HAD 1MB+ of cache, and it's no
coincidence that we now see a DRAMATIC increase in program POWER and CAPABILITY
on PCs, because such programs aren't even practical unless you have at least
1MB+ of cache, at least 2 CPU cores, and at least 2GB of RAM -- none of which
was available in the majority of consumer-class CPUs before the past year or two.
From:
http://forum.folding-community.org/
What the code would need to do is compare the time it takes for different units to complete, or even just look at the points per hour, and look for wildly varying numbers between supposedly similar projects.
As an example, here's some snippets from my 2Mb cache machine:
Index 4: finished 1523.00 pts (28.569 pt/hr) 1.8 X min speed
begin: Fri Jul 20 11:38:32 2007 end: Sun Jul 22 16:57:04 2007;
Index 5: finished 1760.00 pts (59.755 pt/hr) 3.26 X min speed
begin: Sun Jul 22 17:16:41 2007 end: Mon Jul 23 22:43:54 2007;
53.3 hours vs. 29.5... and on my 4Mb cache machine the complete times for those two projects are virtually identical. Admittedly these numbers wouldn't be as easy to identify on machines that spend a lot of time turned off, but downtime is going to average out some across the last ten units listed in the queue. A disparity that large shouldn't be that hard to identify.
Originally posted by: SerpentRoyal
Want repetitive tasks? Click on the next page of that article for video encoding and gaming data. Those are REAL-WORLD applications. Faster core speed wins with video encoding. Larger cache has a small advantage with some games. So where's the HUGE benefit of a larger cache?
You continue to harp on those bandwidth numbers. Let's translate those numbers to a real-world condition using my test rig. E4300 and E6320 on the same test platform. Both CPUs are set at 3.4GHz. During the re-building of a DVD movie, 6320 is 5% faster. Since the E6320 has an 7x multiplier, I have to crank up the memory speed and FSB speed to match the core speed of the E4300 (9x multi). If you discount the benefit of FASTER RAM speed, then the E6320 is only faster by 3 to 4%.
http://www.anandtech.com/cpuch...howdoc.aspx?i=2903&p=5
http://www.anandtech.com/cpuch...howdoc.aspx?i=2903&p=6