Simple question: given that newer architectures over the past decade have turned to data prefetching as a way to hide memory load delays (especially since CPU:FSB/memory-bus ratios keep increasing), how is prefetching accomplished when those ratios, and therefore the memory load delays, keep changing?
To clarify: whether it's the IPF/x86/PowerPC/et al. family of CPU/chipset hardware, the fact remains that speed bumps and newer, better system designs for a given architecture keep arriving. This means the time taken to load data from memory, while generally decreasing, changes with every CPU speed bump. At the same time, my understanding of prefetching is that the hardware/compiler places prefetch-load instructions at strategic points in the compiled code so as to bring potentially needed data as close to the execution units as possible by the time it is needed. A minimal sketch of what that looks like on the software side is below.
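To make the software side concrete, here is a rough sketch (not taken from any real compiler's output) of what a compiler-inserted prefetch might look like, using GCC/Clang's `__builtin_prefetch`; the prefetch distance of 16 elements is an assumed, machine-dependent value, which is exactly the kind of tuning my question is about:

```c
/* Sketch of software prefetching in a streaming loop.
 * PREFETCH_AHEAD is an assumed value: it must roughly match the
 * memory load latency on a particular CPU/bus combination. */
#include <stddef.h>

#define PREFETCH_AHEAD 16  /* elements ahead; machine-dependent assumption */

double sum_array(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (i + PREFETCH_AHEAD < n) {
            /* Hint: start bringing a[i + PREFETCH_AHEAD] toward the caches
             * before the loop reaches it. __builtin_prefetch is a no-op on
             * targets without a prefetch instruction. */
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0 /* read */, 1 /* low temporal locality */);
        }
        sum += a[i];
    }
    return sum;
}
```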
How, then, can the same binary be just as efficient after, say, a CPU speed bump, when the CPU:FSB/memory-bus ratio has changed? That hardware change raises the effective memory load delay (in CPU cycles). As I interpret it, this means the data may NOT be in the desired location/level of cache/registers by the time it's needed on the newer configuration (faster CPU on the same system board, etc.).
A side question: what's the difference between doing this prefetching in hardware versus in software/compilers (I guess the latter option may only pertain to IPF/EPIC architectures)? Pros/cons?
