The toy after a week of use


Dadofamunky

Platinum Member
Jan 4, 2005
2,184
0
0
Originally posted by: BTRY B 529th FA BN
Man, this thread is just making me peer over into the AMD court again. I just would be lost on a board.

I know, me too! :D
 

soonerproud

Golden Member
Jun 30, 2007
1,874
0
0
Originally posted by: Dadofamunky
Originally posted by: BTRY B 529th FA BN
Man, this thread is just making me peer over into the AMD court again. I just would be lost on a board.

I know, me too! :D

I know, me three!!!!!!!!!

Oh, wait a minute.

Silly me, I already did. LOL

:roll:
 
Dec 30, 2004
12,553
2
76
Originally posted by: Idontcare
Or worse, who wants to buy four of them to get that board up to 32GB? Ouches.

Would rather just get a dual core server motherboard (with 8 slots for RAM)...
 

lopri

Elite Member
Jul 27, 2002
13,325
706
126
Originally posted by: soccerballtux
Lopri could you post some numbers about performance at 3.8ghz while simply cranking the multiplier up, vs. taking the time to find the HTT/memory limit? (HTT is the right term correct? It's been a while since I overclocked an AMD system and my memory is terrible :p. IE we know you lose very little [1-2%] performance on the Core 2 series chips by running at a higher multiplier vs higher bus; what happens on the AMDs?)
What numbers would you like to see? As far as I can tell, there is no meaningful difference between the two configurations when they end up at the same final frequency. (This is actually what's nice about these 'Black Edition' or 'Extreme Edition' CPUs.)

Here is another observation I made. At the default multi, everything scales along with HTT up to 224. Once I reach 225 and beyond, the CPU keeps going up as it should (i.e. 16x225, 16x226, 16x227, ...), yet HT (the equivalent of FSB/QPI) and NB don't follow the CPU. It looks as if their PLL resets, and instead of 10x225 (default NB multi x the HTT I set) it becomes 10x205. From there it climbs to 224 and resets again. (That's where the CPU is 16x249.)

Even more mysteriously, the reset value (which looks to be 25-20 at the default CPU multi, x16) differs per multiplier. I haven't figured out things like this yet, but time will tell.

 

lopri

Elite Member
Jul 27, 2002
13,325
706
126
~50C while running Linpack if NB is not overclocked.
~60C when NB is @2.6GHz.

But like I commented a few days ago, I can hardly believe a quad-core CPU @3.8GHz is running so cool. I think the 'CoreTemp' is about 10C higher, judging by my 'feel' test: I remove the fan from the heatsink and load the CPU lightly (like ~50%), then touch the heatsink to see how quickly it heats up. Compared to the Q6600, the 955BE is definitely cooler. Compared to the Q9400, I am not sure.

Originally posted by: soccerballtux
Cinebench would be fine for me; any speed differences realized in that seem to be mirrored in games reliably.
Will do it for ya.
 

error8

Diamond Member
Nov 28, 2007
3,204
0
76
Well, it really looks like a hell of a chip. Too bad that where I live it costs $290. :(
 

lopri

Elite Member
Jul 27, 2002
13,325
706
126
Originally posted by: soccerballtux
Cinebench would be fine for me; any speed differences realized in that seem to be mirrored in games reliably.

I have tested Cinebench at 3 different HTT points, and the differences are negligible and within the margin of error (and the run-to-run variation). But this may or may not be what it's supposed to be. My board doesn't allow dividers/timings outside what it wants, so I can't manipulate the parameters as precisely as I want.

3.6 GHz | 18 x 200 -> 14531
3.6 GHz | 16 x 225 -> 14513
3.6 GHz | 12 x 300 -> 14565

From the above, it seems as if raising HTT actually hurts performance, because it gave the same score despite the higher NB frequency. But it also has the disadvantage of an asynchronous memory divider, so it's not a definite answer (i.e. not apples-to-apples). I hope Gigabyte releases a decent BIOS for the board sooner or later.
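For anyone who wants to put a number on "within the margin of error", here is a minimal Python sketch using only the three scores listed above. The helper name `spread_percent` is made up for illustration; it simply expresses the max-to-min gap as a percentage of the mean score, which is one rough way (not the only one) to compare against Cinebench's run-to-run noise.

```python
from statistics import mean

# Cinebench scores reported above, keyed by (CPU multiplier, HTT in MHz).
scores = {(18, 200): 14531, (16, 225): 14513, (12, 300): 14565}


def spread_percent(values):
    """Return (max - min) as a percentage of the mean (hypothetical helper)."""
    values = list(values)
    return (max(values) - min(values)) / mean(values) * 100


# Every configuration should land on the same 3600 MHz core clock.
for (multi, htt), score in scores.items():
    print(f"{multi} x {htt} = {multi * htt} MHz -> {score}")

print(f"spread: {spread_percent(scores.values()):.2f}% of the mean")
```

The spread across all three HTT settings comes out well under half a percent, which fits the "too close to call" reading.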
 

lopri

Elite Member
Jul 27, 2002
13,325
706
126
BTW can anyone with i7 run a cpu-z latency test? It looks like this. Stock, overclocked, whichever is convenient for you.
 

Flipped Gazelle

Diamond Member
Sep 5, 2004
6,666
3
81
Originally posted by: lopri
Originally posted by: soccerballtux
Cinebench would be fine for me; any speed differences realized in that seem to be mirrored in games reliably.

I have tested Cinebench at 3 different HTT points, and the differences are negligible and within the margin of error (and the run-to-run variation). But this may or may not be what it's supposed to be. My board doesn't allow dividers/timings outside what it wants, so I can't manipulate the parameters as precisely as I want.

3.6 GHz | 18 x 200 -> 14531
3.6 GHz | 16 x 225 -> 14513
3.6 GHz | 12 x 300 -> 14565

From the above, it seems as if raising HTT actually hurts performance, because it gave the same score despite the higher NB frequency. But it also has the disadvantage of an asynchronous memory divider, so it's not a definite answer (i.e. not apples-to-apples). I hope Gigabyte releases a decent BIOS for the board sooner or later.

I think those scores are too close to call - Cinebench, in my experience, does not produce the exact same scores on every run.

My X3 710, 4th core unlocked, @ 3.5 GHz (13x269) scored 14698. NB is at 2.4 GHz, RAM (DDR2) is ~ 840.
 
Dec 30, 2004
12,553
2
76
Originally posted by: Flipped Gazelle
I think those scores are too close to call - Cinebench, in my experience, does not produce the exact same scores on every run.

My X3 710, 4th core unlocked, @ 3.5 GHz (13x269) scored 14698. NB is at 2.4 GHz, RAM (DDR2) is ~ 840.

Too close to call is just the sort of verdict I was looking for. Looks like all we have to worry about is cranking up the multiplier.
 

soonerproud

Golden Member
Jun 30, 2007
1,874
0
0
Originally posted by: soccerballtux
Too close to call is just the sort of verdict I was looking for. Looks like all we have to worry about is cranking up the multiplier.

Yup, and I think I will give that a try now and see how far I can go on stock cooling.
 

lopri

Elite Member
Jul 27, 2002
13,325
706
126
Originally posted by: jaredpace
Originally posted by: lopri
judging by my 'feel' test - I remove the fan from a heatsink and load the CPU lightly

Someone just posted a video of a 'feel' test:
http://forums.anandtech.com/me...=2301050&enterthread=y
Oh no~s. He's not my type. :laugh:

Edit: My 955BE setup is now moving on to the fine-tuning stage before the final setup. So far I have managed to lower the NB voltage to 1.30V (from 1.35V) while maintaining 2.60 GHz. Once I'm finished with this, I will save the BIOS and move on to the more fun part. Yay!
 

lopri

Elite Member
Jul 27, 2002
13,325
706
126
I think this will be the last report on my 955BE overclocking. Can't believe it's already been a week since I got this chip. I've learned quite a bit clocking-wise, but the mystery of Phenom is profound. :laugh: (more on that later)

My setup is the #3 rig in my sig. OS is 64-bit Windows 7, and I am utilizing 4x2GB of DDR2-800 RAM (SPD is DDR2-667).

  • Max OC using air-cooling (Scythe Infinity): CPU 3.8 GHz (1.39V) / NB 2.6 GHz (1.30V)
  • Max CPU OC using two fans (push-pull): CPU 3.9 GHz (1.42V) / NB 2.0 GHz (1.10V)
I have learned that this CPU likes a 'just right' amount of voltage, and the NB seems to be very sensitive to temperature (i.e. the lower the better). In the end, my choice of 'sweet spot' for a 24/7 OC is CPU 3.6 GHz (1.35V) / NB 2.6 GHz (1.25V). This combination gave me the best blend of performance and thermal characteristics. Another plus is that this combo doesn't require raising the core voltage. Stability was tested with many loops of Linpack, PCMark Vantage, 3DMark Vantage, Cinebench, and 4 simultaneous instances of Super Pi.

Screenshots: Linpack 20 pass (running), Linpack 20 pass (ending), Linpack & 3DMark06, 3DMark Vantage, PCMark Vantage

Very satisfying experience, to say the least. :)

P.S. It has been brought up in this thread, but the way Phenom's IMC/NB works is pretty unique. Not that I have figured it out, but I have some data to start with. I ran a quick CPU-Z latency test with the CPU and NB at the same frequency and at different frequencies, and recorded the L3 cache latency each time. The result is intriguing.

CPU -------- NB -------- L3 Latency
1600 ------ 1600 ------ 41 cycles
1800 ------ 1800 ------ 41 cycles
2000 ------ 2000 ------ 41 cycles
..
2600 ------ 2600 ------ 41 cycles

L3 latency stays the same (41 cycles) as long as the CPU and NB are at the same frequency. Next I fixed the NB frequency at 1.6GHz and raised the CPU frequency using multipliers. I don't know how this discrepancy will affect the performance of the CPU, but hopefully I'll find out with others' help.

CPU -------- NB -------- L3 Latency
1600 ------ 1600 ------ 41 cycles
1800 ------ 1600 ------ 42 cycles
2000 ------ 1600 ------ 45 cycles
2200 ------ 1600 ------ 49 cycles
2400 ------ 1600 ------ 51 cycles
2600 ------ 1600 ------ 55 cycles

I don't know what this means, but this is the beginning. Screenshots if you're interested -> Click
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: lopri
P.S. It has been brought up in this thread, but the way Phenom's IMC/NB works is pretty unique. Not that I figured it out, but I have some data to start with. I ran quick cpu-z latency test with CPU and NB under same frequencies, and different frequencies, then measured the L3 cache latency. The result is intriguing.

CPU -------- NB -------- L3 Latency
1600 ------ 1600 ------ 41ns
1800 ------ 1800 ------ 41ns
2000 ------ 2000 ------ 41ns
..
2600 ------ 2600 ------ 41ns

L3 latency stays the same (41ns) as long as CPU and NB are a same frequency. Next I fixed the NB frequency @1.6GHz and raised the CPU frequency using multipliers.

CPU -------- NB -------- L3 Latency
1600 ------ 1600 ------ 41ns
1800 ------ 1600 ------ 42ns
2000 ------ 1600 ------ 45ns
2200 ------ 1600 ------ 49ns
2400 ------ 1600 ------ 51ns
2600 ------ 1600 ------ 55ns

I don't know what this means, but this is the beginning. Screenshots if you're interested -> Click

Lopri, those are latencies in clock cycles, not nanoseconds.

When your L3$ is clocked the same as the core, the L3$ latency is 41 cycles regardless of the core/L3$ frequency.

When you clock the CPU higher while holding the NB (L3$) frequency unchanged, the latency (in CPU clock cycles) gets worse, but the absolute time for the memory access is fixed and does not change (because the L3$ clockspeed is not changing).
 

lopri

Elite Member
Jul 27, 2002
13,325
706
126
You're correct. For some reason I always thought that cpu-z latency gives ns values. I have changed the post accordingly.

What would be an ideal test to look for the impact of core-NB discrepancy? I'd like absolute numbers but if that's not possible, something very consistent.

Edit: Or would that discrepancy directly translate to performance? Imagine a PII at 3.6GHz with its NB running at 3.6GHz as well. I haven't tested 3.6 GHz / 1.6 GHz, but I'd assume the latency in cycles will only get higher.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Take the 1.6GHz L3$ example...41 cycles @ 1.6GHz = 25.63ns

1.6GHz = 1.6E9 cycles/s

41 cycles/(1.6E9 cycles/s) = 2.56E-08 s = 25.63 ns

Now at 1.8GHz the latency in cycles is the same, 41 cycles, but there are more cycles/s (200 million more, to be precise), so the time per cycle is less and thus the time for 41 cycles is less.

41 cycles @ 1.8GHz = 22.78ns

And so on with the other core/uncore clock-synchronized combos:

41 cycles @ 2.0GHz = 20.50ns

41 cycles @ 2.6GHz = 15.77ns

So you see taking your L3$ from 1.6GHz all the way up to 2.6GHz has decreased the latency (in time) from 25.63ns to 15.77ns.

It is still 41 cycle latency, but the cycles are commensurately shorter (faster) in time so your L3$ blows thru the latency cycles all the quicker when clocked higher.
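The same arithmetic as a small Python sketch, for anyone who wants to plug in other clocks. The function name `cycles_to_ns` is just an illustrative choice; the only inputs are the 41-cycle figure and the NB clocks from lopri's synchronized runs.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in clock cycles to nanoseconds at the given clock."""
    # 1 GHz is one cycle per nanosecond, so time in ns = cycles / GHz.
    return cycles / clock_ghz


# The synchronized core/NB points discussed above.
for nb_ghz in (1.6, 1.8, 2.0, 2.6):
    print(f"41 cycles @ {nb_ghz} GHz = {cycles_to_ns(41, nb_ghz):.2f} ns")
```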

Now take the case where you leave the NB (L3$/uncore/etc) at 1.6GHz and start clocking up the CPU. At 1.6GHz the L3$ latency is 25.63ns and if the CPU is 1.6 GHz (same clockspeed) that 25.63 ns latency is equivalent to the CPU standing around waiting for 41 of its (the 1.6GHz CPU) clock cycles.

Now increase the clockspeed of the CPU but leave the cache as slow as ever (1.6GHz)...at 1.8GHz the CPU will clockstep thru 46.13 cycles (round it to 46 cycles since clock cycles are integers).

25.63ns @ 1.8GHz = 46 cycles

This means your 1.6GHz L3$ operating with a 41 cycle latency, which is 25.63ns, will make a 1.8GHz cpu wait 46 cycles (CPU cycles, not cache cycles) to access the memory. It is still the same time for the access, 25.63 ns, but in terms of the CPU it is waiting even more clock cycles to access the cache.

25.63ns @ 2.0GHz = 51 cycles

25.63ns @ 2.6GHz = 67 cycles

25.63ns @ 3.6GHz = 92 cycles

It is interesting to me that the latency reported by CPU-z does not scale as it should with the monotonic increase in core clockspeed.

For example, increasing the core clockspeed from 1.6GHz to 1.8GHz should increase the L3$ latency from 41 cycles (CPU cycles) to 46 cycles (CPU cycles)...but according to your numbers, CPU-z reports an increase of merely 1 cycle, to 42 cycles.

First off, that isn't physically possible, so we must conclude the PhII contains some prefetcher circuitry that is masking some of the latency increase associated with increasing the core clockspeed. Some cache latency programs are good at knowing when prefetchers are circumventing their efforts to determine intrinsic latency. It would appear from your data that the latency test in CPU-z is getting spoofed by the cache on Phenom II and does not realize/account for the prefetchers at work in the background.

Which then raises the question: is the L3$ latency at 1.6GHz/1.6GHz really 41 cycles? Or is it something much larger, with prefetching (a good thing) making CPU-z think the latency is much lower and thus report it as merely 41 cycles (when it could be 80 cycles or even worse)?
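To see how far the reported numbers drift from the simple scaling argument, here is a rough Python sketch. It assumes, as in the reasoning above, that the L3$ access time is fixed by the 1.6GHz NB clock at 41 NB cycles; the reported values are taken from lopri's second table, and the names (`expected_cpu_cycles`, `reported`) are made up for illustration.

```python
L3_CYCLES_AT_NB = 41  # latency in NB/L3 cycles, from the synchronized runs
NB_MHZ = 1600         # NB clock held fixed in the second table

# CPU clock (MHz) -> L3 latency in CPU cycles as reported by CPU-Z.
reported = {1600: 41, 1800: 42, 2000: 45, 2200: 49, 2400: 51, 2600: 55}


def expected_cpu_cycles(cpu_mhz, nb_mhz=NB_MHZ, l3_cycles=L3_CYCLES_AT_NB):
    """CPU cycles spent waiting if the L3 access time is set by the NB clock."""
    return l3_cycles * cpu_mhz / nb_mhz


for cpu_mhz, seen in reported.items():
    exp = expected_cpu_cycles(cpu_mhz)
    print(f"{cpu_mhz} MHz: expected {exp:.1f}, reported {seen}, gap {exp - seen:.1f}")
```

The gap between the expected and reported figures widens as the core clock goes up, which is the behaviour the prefetcher guess is trying to explain. The sketch only restates the data; it doesn't confirm that explanation.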