The CAS latency of a memory module in nanoseconds is a function of its latency in cycles and its clock frequency.
For a DDR400 CL3 module, the latency in nanoseconds is calculated as follows:
Latency (ns) = T X CAS
Where T is the period of the clock cycle, and CAS refers to the latency in cycles.
So, for a 200MHz DDR400 CL3 module, the latency is given by:
(1 / (200 X 10^6)) X 3 = 15 X 10^-9 s = 15ns.
If you want to calculate the latency in ns of other modules, just substitute your own values into this formula:
(1 / clock cycles per second) X CAS Latency in cycles.
<Edit>
That can also be rewritten as: CAS Latency / Clock speed (actual), where the actual clock speed is half the DDR rating (200MHz for DDR400).
</Edit>
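For anyone who would rather not do this by hand, here is a minimal Python sketch of the formula above. The function name `cas_latency_ns` is just something made up for illustration, and it assumes you pass in the actual clock speed in Hz (200MHz for DDR400), not the DDR rating.

```python
def cas_latency_ns(cas_cycles, clock_hz):
    """Return the CAS latency in nanoseconds.

    cas_cycles: CAS latency in clock cycles (e.g. 3 for CL3).
    clock_hz:   actual clock speed in Hz (assumed: 200e6 for DDR400).
    """
    period_s = 1.0 / clock_hz          # T, the period of one clock cycle
    return cas_cycles * period_s * 1e9  # seconds -> nanoseconds


# DDR400 CL3 from the example above: about 15 ns
print(f"DDR400 CL3: {cas_latency_ns(3, 200e6):.1f} ns")
# DDR400 CL2: about 10 ns
print(f"DDR400 CL2: {cas_latency_ns(2, 200e6):.1f} ns")
```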
Don't forget, the above calculates the theoretical latency of the memory module.
It does not give you the latency the processor sees, which is more dependent on the memory interface speed.
Also, if you want to compare modules of different frequency and CAS latency, first find the latency in ns of the module you want to compare against.
For example, if you want a CAS3 module of unknown frequency to match the latency of a CAS2 DDR400 module, set the two latencies equal to each other and rearrange to make the unknown the subject.
We know that the latency in ns of a CAS2 DDR400 module is 10ns (2/3 of 15ns) using the example above.
To find out the frequency a CAS3 module must reach to achieve the same latency, do it as follows:
10ns = (1 / Clock speed) X 3 {Simplifying...}
10ns = 3 / Clock speed {Making Clock speed the subject...}
Clock speed = 3 / (10 X 10^-9 s) [1 ns = 1 X 10^-9 s]
Clock speed = 3 X 10^8 Hz = 300MHz.
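The same rearrangement can be sketched in Python for the more complicated cases. The function name `clock_for_latency_hz` is hypothetical, and the result is the actual clock speed, so double it to get the DDR rating.

```python
def clock_for_latency_hz(cas_cycles, target_latency_ns):
    """Return the actual clock speed (Hz) at which a module with the given
    CAS latency (in cycles) reaches the target latency (in ns)."""
    target_latency_s = target_latency_ns * 1e-9  # 1 ns = 1e-9 s
    return cas_cycles / target_latency_s


# CAS3 module matching the 10 ns of a CAS2 DDR400 module
clock_hz = clock_for_latency_hz(3, 10)
print(f"Clock needed: {clock_hz / 1e6:.0f} MHz (DDR{2 * clock_hz / 1e6:.0f})")
```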
Of course, common sense could have revealed the clock speed necessary in that case, but that formula can be used for more complicated examples.
However, memory performance is not determined by latency alone.
Memory is very often accessed in bursts: once the initial access penalty (latency) is over, data is transmitted every clock cycle (twice per cycle for DDR). In the example above, a CAS3 DDR600 module will outperform a CAS2 DDR400 module most of the time.
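To put rough numbers on the burst argument, here is a simplified Python sketch. It is a toy model under stated assumptions, not a full timing calculation: it assumes a burst of 8 transfers, two transfers per clock cycle (as DDR does), and it ignores every timing other than CAS latency. The name `burst_time_ns` is made up for illustration.

```python
def burst_time_ns(cas_cycles, clock_hz, burst_length=8):
    """Rough time (ns) to complete one burst: CAS latency plus transfer time.

    Assumptions (not from the original post): burst_length transfers,
    two transfers per clock cycle (DDR), all other timings ignored.
    """
    period_ns = 1e9 / clock_hz
    latency_ns = cas_cycles * period_ns           # initial access penalty
    transfer_ns = (burst_length / 2) * period_ns  # two transfers per cycle
    return latency_ns + transfer_ns


ddr400_cl2 = burst_time_ns(2, 200e6)  # CAS2 DDR400, 200MHz actual clock
ddr600_cl3 = burst_time_ns(3, 300e6)  # CAS3 DDR600, 300MHz actual clock
print(f"CAS2 DDR400: {ddr400_cl2:.1f} ns per burst")
print(f"CAS3 DDR600: {ddr600_cl3:.1f} ns per burst")
```

Both modules pay the same 10ns up front, but under these assumptions the faster-clocked module finishes the burst sooner.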
On the subject of which platform prefers lower latency and which prefers bandwidth, it isn't as simple as the P4 preferring bandwidth and the Athlon preferring lower latency.
The Athlon XP couldn't make use of the bandwidth of faster memory modules in dual channel mode simply because it used a 64-bit 400MHz FSB, which could only utilise the bandwidth of one DDR400 module. The P4 has a 64-bit FSB but with an effective speed of 800MHz, meaning it responds well to moving from DDR333 to DDR400 memory in dual channel mode.
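The bandwidth matching here can be checked with a quick Python sketch. It computes theoretical peak figures only, assuming each DDR module has a 64-bit data bus and taking 1 GB/s as 10^9 bytes per second. The function name `peak_bandwidth_gb_s` is hypothetical.

```python
def peak_bandwidth_gb_s(bus_width_bits, effective_mhz, channels=1):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    effective_mhz is the effective transfer rate, e.g. 400 for DDR400
    or for a 400MHz (effective) FSB.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_mhz * 1e6 * channels / 1e9


# Assumption: each DDR module is 64 bits wide.
print(f"Athlon XP FSB (64-bit, 400MHz): {peak_bandwidth_gb_s(64, 400):.1f} GB/s")
print(f"DDR400, single channel:         {peak_bandwidth_gb_s(64, 400):.1f} GB/s")
print(f"P4 FSB (64-bit, 800MHz):        {peak_bandwidth_gb_s(64, 800):.1f} GB/s")
print(f"DDR400, dual channel:           {peak_bandwidth_gb_s(64, 400, channels=2):.1f} GB/s")
```

On these figures, one DDR400 module already saturates the Athlon XP's FSB, while the P4's FSB has room for two channels.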
The degree of benefit a processor can derive from higher bandwidth memory is really determined by the maximum sustained execution rate it can manage. Of course, memory bandwidth isn't a problem if the processor's caches are being used fully, but in cases such as encoding and compression, memory bandwidth suddenly becomes a bottleneck.
The Athlon 64, with its ultra-fast memory interface, still benefits from higher memory bandwidth. Looking at gaming benchmarks, for example, the benefit derived from higher bandwidth can be anything up to 10%.
If the memory cannot feed the processor with the data it needs at a high enough rate, then pipeline bubbles are going to start to appear. The faster the processor, the higher its throughput, and in order to keep the pipeline full, it needs to be supplied with data that much more quickly.