The CAS latency is only one of many variables that determine memory performance.
By itself it is unlikely to be noticeable, assuming the transfer rate is high enough to make up for it.
Here is an overview of some formulas,
assuming the following:
Prefetch (burst length) = 8 for DDR4, 16 for DDR5
Bus width = 128 bit for dual channel (2 x 64-bit channels; DDR5 further splits each channel into 2 x 32-bit subchannels)
Bandwidth DDR4-3200 (3200 MT/s, commonly mislabeled "MHz"):
(Bus width) * 3200 / 1000 = 128 * 3.2 = 409.6 Gbit/s = 51.2 GB/s
Bandwidth DDR5-6400 (6400 MT/s):
(Bus width) * 6400 / 1000 = 128 * 6.4 = 819.2 Gbit/s = 102.4 GB/s
(The prefetch cancels out of the peak-bandwidth formula: core clock = transfer rate / prefetch, and bandwidth = core clock * prefetch * bus width, which reduces to transfer rate * bus width. Per 64-bit channel that is 25.6 GB/s for DDR4-3200 and 51.2 GB/s for DDR5-6400.)
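As a rough sketch of the bandwidth arithmetic (a simplified peak-rate model, not a memory-controller simulation; the helper name is just for illustration):

```python
# Peak bandwidth model: transfer rate (MT/s) x bus width (bits).
# The prefetch cancels out, since the core clock is rate / prefetch
# and peak bandwidth = core clock * prefetch * bus width.

def peak_bandwidth_gbs(rate_mts, bus_width_bits=128):
    """Peak bandwidth in GB/s for a given transfer rate and bus width."""
    return rate_mts * bus_width_bits / 8 / 1000

print(peak_bandwidth_gbs(3200))      # DDR4-3200, 128-bit dual channel -> 51.2
print(peak_bandwidth_gbs(6400))      # DDR5-6400, 128-bit dual channel -> 102.4
print(peak_bandwidth_gbs(3200, 64))  # single 64-bit channel -> 25.6
```

This is theoretical peak; real sustained bandwidth is lower due to refresh, command overhead, and access patterns.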
Latency DDR4-3200 CL 20:
(first data) 20 * 2000/3200 = 12.5ns
(full burst, 8 transfers = 64 bytes on one channel) 20 * 2000/3200 + 7 * 1000/3200 = 14.69ns
Latency DDR5-6400 CL 40:
(first data) 40 * 2000/6400 = 12.5ns
(8 transfers) 40 * 2000/6400 + 7 * 1000/6400 = 13.59ns
(full burst, 16 transfers = 64 bytes on one 32-bit subchannel) 40 * 2000/6400 + 15 * 1000/6400 = 14.84ns
(CL is counted in memory clock cycles, and the clock runs at half the transfer rate, hence the 2000/rate factor for the nanosecond conversion.)
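The latency arithmetic above can be sketched as a small helper (same simplified model: CAS delay plus one bus beat per remaining transfer in the burst; the function name is just for illustration):

```python
# Simplified access-latency model: CAS latency in ns, plus the time for
# the remaining beats of the burst. CL is counted in memory clock cycles,
# and the clock runs at half the transfer rate, hence 2000/rate for ns.

def access_latency_ns(cl, rate_mts, transfers=1):
    """Time from read command to the last of `transfers` data beats."""
    first = cl * 2000 / rate_mts              # CAS latency in ns
    rest = (transfers - 1) * 1000 / rate_mts  # one bus beat per extra transfer
    return first + rest

print(access_latency_ns(20, 3200))      # DDR4-3200 CL20, first data -> 12.5 ns
print(access_latency_ns(20, 3200, 8))   # DDR4-3200 CL20, full BL8  -> 14.6875 ns
print(access_latency_ns(40, 6400, 16))  # DDR5-6400 CL40, full BL16 -> 14.84375 ns
```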
These latencies are essentially the time to complete a burst for a single access to memory. If you read more than one burst's worth of data (more than 8 transfers for DDR4, or 16 for DDR5), then you are limited by bandwidth and other factors. Also, none of this even begins to take into account other complexities: command scheduling, bank conflicts, cache, etc.
I should have used DDR5-4800 CL40 for the comparison.
The general idea being that latency might be worse for some things, but the bandwidth and CPU cache are likely to make up for it in most scenarios. Once we see ~6400 with fast timings, the overall latency will be comparable even in the worst case.
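For the DDR5-4800 CL40 part, the same first-data arithmetic works out as follows (same simplified model, ignoring everything but CAS latency):

```python
# First-data latency: CL cycles at a clock running at half the transfer rate.
ddr4 = 20 * 2000 / 3200  # DDR4-3200 CL20 -> 12.5 ns
ddr5 = 40 * 2000 / 4800  # DDR5-4800 CL40 -> ~16.7 ns
print(ddr4, ddr5)
```

So early DDR5 gives up a few nanoseconds of CAS latency, which the doubled bandwidth and the cache hierarchy have to absorb.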
We didn't get a prefetch ("burst length") increase going from DDR3 to DDR4, so this will be a bigger transition than we saw then. Also, the increased bank and bank group counts should help a lot.
For gaming, and most apps, DDR5 with a higher CL will be faster than DDR4 just due to its other improvements.
I likely made some mistakes in here, but you should get the basic idea.