K10.5 memory latency

DrMrLordX · Dec 15, 2009

I'd like to take a survey of system memory latency as recorded by CPU-Z's latency.exe tool:

http://www.cpuid.com/download/latency.zip

Just unzip and run the executable. Usually the bottom-right number in the grid represents your effective memory latency.
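For anyone collecting results from several machines, the bottom-right number can be pulled out of the pasted output automatically. This is a minimal sketch, assuming the output has the layout posted in this thread (a "stride" header line, a "size (Kb)" line, then rows of whole numbers); the function name `bottom_right_latency` and the `SAMPLE` text are made up for illustration and are not part of latency.exe.

```python
def bottom_right_latency(output: str) -> int:
    """Return the last value of the last grid row (largest size, largest stride)."""
    rows = []
    in_grid = False
    for line in output.splitlines():
        fields = line.split()
        if fields and fields[0] == "stride":
            in_grid = True  # the header line marks the start of the grid
            continue
        if not in_grid:
            continue
        if fields and all(f.isdigit() for f in fields):
            rows.append([int(f) for f in fields])
        elif rows:
            break  # first non-numeric line after the rows ends the grid
    if not rows:
        raise ValueError("no latency grid found")
    return rows[-1][-1]


# Shortened sample in the same layout as the output posted in this thread.
SAMPLE = """\
stride 4 8 16 32 64 128 256 512
size (Kb)
16384 3 4 6 13 27 90 99 174
32768 3 4 6 13 26 90 98 177

3 cache levels detected
"""

print(bottom_right_latency(SAMPLE))
```

The "size (Kb)" line is skipped because it is not all digits and no rows have been read yet.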

The platforms I'd like to survey are AM3/DDR3 Phenom II and/or Athlon II X4 machines.

All I really need to know is:

NB speed
memory multiplier
CPU multiplier
memory timings (just the main four, though every setting available in your BIOS is okay with me if you care to record all the values)
HTT speed
system memory latency as reported by CPU-Z's latency.exe tool

If you want to report on Intel machines, AMD DDR2 AM2+/AM2 machines, or anything else, that's fine too, but the feedback I want the most is from AM3/DDR3 platforms. Doesn't even have to be overclocked.

Thanks in advance.

schenley101 · Dec 15, 2009

Phenom 2 955 stock 16*200
ddr3 1066 6-6-6-20

Level 1 size = 64Kb latency = 3 cycles
Level 2 size = 512Kb latency = 17 cycles
Level 3 size = 4096Kb latency = 56 cycles

DrMrLordX · Dec 15, 2009

schenley101 said:
Phenom 2 955 stock 16*200
ddr3 1066 6-6-6-20

Level 1 size = 64Kb latency = 3 cycles
Level 2 size = 512Kb latency = 17 cycles
Level 3 size = 4096Kb latency = 56 cycles

It didn't list your system memory latency? That's what the large number grid at the top is for. Thanks for the feedback though.

schenley101 · Dec 15, 2009

DrMrLordX said:
It didn't list your system memory latency? That's what the large number grid at the top is for. Thanks for the feedback though.

Cache latency computation, ver 1.0
www.cpuid.com

Computing ...

stride 4 8 16 32 64 128 256 512
size (Kb)
1 3 3 3 3 3 3 3 3
2 3 3 3 3 3 3 3 3
4 3 3 3 3 3 3 4 3
8 3 3 3 3 3 3 3 3
16 3 3 3 3 3 3 3 3
32 3 3 3 3 3 3 3 3
64 3 3 3 3 3 3 3 3
128 3 3 3 5 9 15 15 16
256 3 3 3 5 9 15 15 16
512 3 3 4 7 12 22 22 23
1024 3 4 6 12 23 54 54 55
2048 3 4 6 12 22 54 54 55
4096 3 4 6 13 24 57 59 62
8192 3 4 6 13 27 80 98 164
16384 3 4 6 13 27 90 99 174
32768 3 4 6 13 26 90 98 177

3 cache levels detected
Level 1 size = 64Kb latency = 3 cycles
Level 2 size = 512Kb latency = 17 cycles
Level 3 size = 4096Kb latency = 56 cycles

Sorry, I thought I had copied and pasted the whole thing.

DrMrLordX · Dec 15, 2009

That's cool, thanks. So, 177 . . . hmm! Interesting.
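To compare results across machines running at different clock speeds, it can help to convert cycles to nanoseconds. The sketch below assumes the tool reports latency in CPU core clock cycles (the thread does not state this outright) and uses the 16 x 200 MHz clock schenley101 posted. The helper name `cycles_to_ns` is made up for illustration.

```python
def cycles_to_ns(cycles: float, cpu_mhz: float) -> float:
    """Convert a latency in CPU clock cycles to nanoseconds."""
    # One cycle lasts 1000 / cpu_mhz nanoseconds.
    return cycles * 1000.0 / cpu_mhz


cpu_mhz = 16 * 200  # multiplier x reference clock from the post above = 3200 MHz
print(round(cycles_to_ns(177, cpu_mhz), 1))  # 177 cycles at 3.2 GHz, in ns
```

On that assumption, 177 cycles at 3.2 GHz works out to roughly 55 ns.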

Idontcare · Dec 15, 2009

What is it for an otherwise equivalently clocked Athlon II X4? Without that L3$ snoop getting in the way?

DrMrLordX · Dec 16, 2009

That is a very good question. Had I a Propus, I would test for that. Hopefully someone who owns such a chip can chime in. Or maybe I'll upgrade after Christmas if things go well . . . but that is wishful thinking.

The other question that comes to mind is: how much of that memory latency is a product of latency between the cores and the NB/IMC and how much of it is a product of latency between the NB/IMC and the DIMMs? I ask this because I was tinkering with my sorry old x2-3600+ earlier today and ran an odd little test at low CPU clock speeds.

What I did was configure the memory with the worst timings allowed in the BIOS (6-6-6-18, plus some other timings were increased) and set the ratio to 1:1 (DDR2-400). That came up with a latency of 175 cycles (on average) at a clock speed of 1.7 or 1.8 GHz or whatever it was.

Then I just switched the ratio to 2:1 (DDR2-800) without changing anything else and two funky things happened:

1). The Brisbane IMC bug/"feature" reared its ugly head, causing the memory to run at DDR2-720 instead of DDR2-800
2). The reported system memory latency was 115 cycles.

Since I only got 80% of the memory speed increase I was supposed to get (DDR2-720 instead of DDR2-800), I figured I only got 80% of the latency reduction I was supposed to get. The reduction I measured was 175 - 115 = 60 cycles, and dividing that by .8 gives 75 cycles, for a final estimated latency of 100 cycles (I can't test this for real due to the bizarre way Brisbanes clock RAM). Which means that had I switched to DDR2-800 for real, I should have taken off 75 cycles.

The strange part is that you'd expect a doubling of memory speed, with no change in timings or processor speed, to cut the memory's share of the latency in half. A 75-cycle reduction would therefore point to an initial memory latency of 150 cycles, not 175. Unless . . .

Unless there's an intrinsic 25 cycles of latency between the cores and the IMC on my Brisbane. Then it would all work out; after all, increasing memory speed and/or reducing memory timings does nothing to affect latency between the cores and the IMC.
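The arithmetic above can be sketched in a few lines. `linear_estimate` reproduces the reasoning in this post: scale the measured 60-cycle drop up to the full DDR2-800 step, then split the 175 cycles into a memory-speed-dependent part and a fixed part. `inverse_estimate` is not from the post; it is an alternative assumption that the memory-dependent part scales inversely with memory speed, included only to show how sensitive the split is to the model. All names here are made up for illustration.

```python
BASE_MTS, ACTUAL_MTS, TARGET_MTS = 400, 720, 800  # DDR2-400, DDR2-720, DDR2-800
LAT_BASE, LAT_ACTUAL = 175, 115                   # measured latencies in cycles


def linear_estimate():
    """The post's method: latency drop is proportional to the speed increase."""
    drop = LAT_BASE - LAT_ACTUAL  # 60 cycles measured
    # Scale the measured drop up from the 320 MT/s step to the full 400 MT/s step.
    full_drop = drop * (TARGET_MTS - BASE_MTS) / (ACTUAL_MTS - BASE_MTS)
    # If going from BASE to TARGET removes full_drop, the speed-dependent part is:
    memory_part = full_drop * TARGET_MTS / (TARGET_MTS - BASE_MTS)
    fixed_part = LAT_BASE - memory_part
    return full_drop, LAT_BASE - full_drop, fixed_part


def inverse_estimate():
    """Assumption (not from the post): memory part scales as 1 / memory speed."""
    drop = LAT_BASE - LAT_ACTUAL
    # Solve fixed + m = LAT_BASE and fixed + m * BASE/ACTUAL = LAT_ACTUAL for m.
    memory_part = drop * ACTUAL_MTS / (ACTUAL_MTS - BASE_MTS)
    fixed_part = LAT_BASE - memory_part
    at_target = fixed_part + memory_part * BASE_MTS / TARGET_MTS
    return memory_part, fixed_part, at_target


print(linear_estimate())   # (full drop, estimated DDR2-800 latency, fixed part)
print(inverse_estimate())  # (memory part, fixed part, estimated DDR2-800 latency)
```

The linear version gives the 75-cycle drop, 100-cycle estimate, and 25-cycle fixed part described above. The inverse-scaling version puts the fixed part nearer 40 cycles, so the exact size of the core-to-IMC share depends on which model you trust.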

I would think there would be a similar, if not more serious, situation on K10 due to the NB so often running at clock speeds below that of the cores. And, as you indicated, when you have L3 on-die, there's a snoop penalty.

Idontcare · Dec 16, 2009

Oh yeah, absolutely you are dealing with a system of serialized events (chronology) and when you increase the speed of just one of the events it does not make everything speed up.

http://en.wikipedia.org/wiki/SDRAM_latency

Speeding up the ram does reduce the latency of the ram, but the ram itself doesn't represent the entire topology across which the data in the ram is to travel.

You know this is true because it's why integrating the memory controller onto the CPU die (making it an IMC), versus having it in the northbridge chipset, has such a huge impact on latency even with no improvement in ram speed: latencies are additive.

Now, I vaguely remember reading something a year or so ago that outlined how and why the L3$ on Phenom and Phenom II actually increases latency when reading from ram. As I (again vaguely) recall, the reasoning was that the memory hierarchy forces the cpu to check the L3$ and confirm a copy is not already there before it can submit the request to the IMC to read the data from the ram (it doesn't do both in parallel, which would be faster but more complex to implement).

I didn't think much of it as the L3$-less version of the architecture did not exist at the time, but with Athlon II X4 it does.

DrMrLordX · Dec 16, 2009

Idontcare said:
Speeding up the ram does reduce the latency of the ram, but the ram itself doesn't represent the entire topology across which the data in the ram is to travel.

Indeedy. It's been a bit tricky trying to measure what that latency will be since publicly-available tools with which one can measure memory latency just give you one number, rather than a breakdown of how much latency there is per step.

Idontcare said:
I didn't think much of it as the L3$-less version of the architecture did not exist at the time, but with Athlon II X4 it does.

Some early K10-capable boards had BIOSes that could disable the L3 as a "fix" for the dreaded TLB bug. Woulda been interesting to use a board like that to see how L3 was affecting memory latency.

schenley101 · Dec 17, 2009

I changed the timing to 6-6-6-18

what exactly are you looking for?

Cache latency computation, ver 1.0
www.cpuid.com

Computing ...

stride 4 8 16 32 64 128 256 512
size (Kb)
1 3 3 3 3 3 3 3 3
2 3 3 3 3 3 3 3 3
4 3 3 3 3 3 3 3 3
8 3 3 3 3 3 3 3 3
16 3 3 3 3 3 3 3 3
32 3 3 3 3 3 3 3 3
64 3 3 3 3 3 3 3 3
128 3 3 3 5 9 15 15 15
256 3 3 3 5 9 15 15 15
512 3 3 3 5 9 15 15 15
1024 3 4 5 11 22 54 54 55
2048 3 4 6 12 22 53 54 55
4096 3 4 6 12 24 55 56 58
8192 3 4 6 12 24 78 97 168
16384 3 4 6 12 23 90 98 172
32768 3 4 6 12 24 90 98 174

3 cache levels detected
Level 1 size = 64Kb latency = 3 cycles
Level 2 size = 512Kb latency = 15 cycles
Level 3 size = 4096Kb latency = 54 cycles
