AMD 'Bulldozer' gets an Update from Microsoft


toyota

Lifer
Apr 15, 2001
12,957
1
0
if this could keep HT from ever being a hindrance then the i7 sure would look a lot better. I just refused to pay 100 bucks more for the 2600k knowing it was basically no better for gaming and could even cause an issue in rare cases.
 

beginner99

Diamond Member
Jun 2, 2009
5,228
1,603
136
if this could keep HT from ever being a hindrance then the i7 sure would look a lot better. I just refused to pay 100 bucks more for the 2600k knowing it was basically no better for gaming and could even cause an issue in rare cases.

IMHO Windows 7 and HT work very well. I have an i7-870 and you can easily see that it first taxes the real cores and only then the "virtual" ones (e.g. first cores 0, 2, 4, 6 and then the rest).
 

toyota

Lifer
Apr 15, 2001
12,957
1
0
IMHO Windows 7 and HT work very well. I have an i7-870 and you can easily see that it first taxes the real cores and only then the "virtual" ones (e.g. first cores 0, 2, 4, 6 and then the rest).
for gaming it offers little to no help and can even slow down performance a little or cause stuttering. Windows 7 addressed a lot of it but it still happens some. again hopefully this patch can help Intel out a little and at least make it where HT never causes any adverse effects.
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
for gaming it offers little to no help and can even slow down performance a little or cause stuttering. Windows 7 addressed a lot of it but it still happens some. again hopefully this patch can help Intel out a little and at least make it where HT never causes any adverse effects.
A CPU in which HT will never have performance regressions vs. HT off is one in which a single thread cannot use more than 50% of each CPU unit's time and memory. It can be worked around some and the hit reduced, but that's just part of the package.

SMT aims to solve the problem of low utilization of execution units. If that isn't your problem, it might not be a good solution. The best thing they could do, IMO, would be to allow HT to be dynamically toggled.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
The best thing they could do, IMO, would be to allow HT to be dynamically toggled.

Would it at all be possible to do what the graphics driver guys do and have a CPU driver that stores a default and user-settable profile per software application?

For example, my laptop has Optimus and it's pretty slick how I can specify which applications result in the NV discrete GPU being enabled versus which apps are to be run using the integrated GPU on the CPU.

How feasible would it be for Intel to produce a CPU driver, so to speak, which had a bevy of built-in default yes/no hyperthreading lookup tables based on the application which is consuming CPU resources? Dynamically enabling and disabling the logical CPUs on the fly.
 

Lonbjerg

Diamond Member
Dec 6, 2009
4,419
0
0
Would it at all be possible to do what the graphics driver guys do and have a CPU driver that stores a default and user-settable profile per software application?

For example, my laptop has Optimus and it's pretty slick how I can specify which applications result in the NV discrete GPU being enabled versus which apps are to be run using the integrated GPU on the CPU.

How feasible would it be for Intel to produce a CPU driver, so to speak, which had a bevy of built-in default yes/no hyperthreading lookup tables based on the application which is consuming CPU resources? Dynamically enabling and disabling the logical CPUs on the fly.

Affinity?
affinity.jpg
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
Affinity?
affinity.jpg

Exactly.

That's pretty annoying to have to set up for every program you run every time you run it...

You only have to do it with LinX if you want the maximum heat/temperatures/power-consumption possible during your stability testing.

If you aren't interested in generating the max, then you don't have to worry about it.

You can run with HT, or without affinity locking, but it is less rigorous.

So it really just comes down to the end-user asking themselves why they are even bothering to go to the trouble of stress testing in the first place.

If the answer is "I want to know that my rig is stable under the most challenging of environments" then you will want nothing less for yourself than to go to the trouble of setting up LinX to use max-memory, affinity locked to physical cores, and set the thread count to that of the physical core count (4 for 2600K, not 8).

Folks who aren't concerned with really testing their rig for stability under challenging environments will find themselves satisfied by just running Prime95 (blend, large, small...whatever) or loading LinX/IBT and just letting the settings/threads do their default thing and just go with it.

No harm in either path, but the paths are not identical and it merits distinguishing between the two when people are making extraordinary claims of stable OC's at given voltages and so on.

Apples to oranges stability comparisons are problematic as we've all seen.
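
To make the "affinity locked to physical cores" setup above concrete, here is a minimal Python sketch of how the affinity bitmask could be computed. It assumes the numbering described earlier in the thread, where logical CPUs 0/1 share the first core, 2/3 the next, and so on; `physical_core_mask` is a made-up helper name for illustration, not part of LinX or Windows.

```python
def physical_core_mask(logical_cpus: int, threads_per_core: int = 2) -> int:
    """Return a bitmask that selects the first logical CPU of every core.

    Assumption: sibling threads are numbered consecutively, so with
    Hyper-Threading the "real" cores show up as CPUs 0, 2, 4, 6, ...
    """
    mask = 0
    for cpu in range(0, logical_cpus, threads_per_core):
        mask |= 1 << cpu  # set the bit for this core's first logical CPU
    return mask


# A 4-core / 8-thread chip such as the 2600K.
mask = physical_core_mask(8)
print(hex(mask))             # 0x55 -> CPUs 0, 2, 4, 6
print(bin(mask).count("1"))  # 4 -> use this as the LinX thread count
```

The set bits are the same boxes you would tick by hand in Task Manager's affinity dialog, and the number of set bits is the thread count to give LinX.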
 

RobertR1

Golden Member
Oct 22, 2004
1,113
1
81
Wait for it to be rolled out as something more robust than a hotfix. If it is everything and a bag of chips then it will be included in a standard windows update download without you needing to install it manually.

Thanks. That's why I wanted to ask since MS had removed it already. Nothing worse than an unstable system. Hopefully, it'll be part of an official update down the road.


Merry Christmas.
 

bldegle2

Junior Member
Nov 27, 2005
5
0
0
Holiday wish to fix Zambezi...

SMP the Thubans, dual/multi ziffed AMD consumer boards, done...

Unfortunately, it can't be done, the SMP, just a thought...

laterzzzz.....
 

Magic Carpet

Diamond Member
Oct 2, 2011
3,477
233
106
any update on this? I haven't heard anything in the news.

Microsoft pulled the patch. No joy for Zambezi owners :p

"there are actually two updates needed for AMD Bulldozer CPU architecture. Microsoft posted just the first patch and we do not believe users would benefit in any way from it. The patch was originally scheduled for the first quarter 2012 and then the users will see tangible performance benefits when using Windows 7 and Windows Server 2008 R2 operating systems."
 

Diceman2037

Member
Dec 19, 2011
54
0
66
increased my 920's cinebench score as well, usually i had to increase the process priority to get > 6, now i can get 6.20 and > 6.40 if forcing above normal priority.
 

Magic Carpet

Diamond Member
Oct 2, 2011
3,477
233
106
increased my 920's cinebench score as well, usually i had to increase the process priority to get > 6, now i can get 6.20 and > 6.40 if forcing above normal priority.
I wish there was a reasonable technical explanation of the mentioned patch.

No changelog these days, "improved Windows scheduler" doesn't fit my bill, sorry. Having said that, I applied this patch to a couple of Thubans I have at my disposal... well the difference is negligible, around 5%.

MC awaits
-SECOND ROUND-
 

bryanW1995

Lifer
May 22, 2007
11,144
32
91
if this could keep HT from ever being a hindrance then the i7 sure would look a lot better. I just refused to pay 100 bucks more for the 2600k knowing it was basically no better for gaming and could even cause an issue in rare cases.

Ha, I bought the 2500k b/c I was going cheap/value and was planning to put this system into my spare rig. Unfortunately, even as a dirt-cheap budget build, it is so much better than my i7 920 that I was forced to alter my strategai... ;)
 

frostedflakes

Diamond Member
Mar 1, 2005
7,925
1
81
The Bulldozer scheduler tweak shouldn't do zilch for other architectures, seems like it was just designed to make every even core (0, 2, 4, 6) be recognized as a physical core and every odd core (1, 3, 5, 7) be recognized as a logical core. Then the same scheduling logic already used on HyperThreaded CPUs (avoid scheduling threads to logical cores whenever possible and give first priority to physical cores) can be applied to Bulldozer to optimize performance in lightly threaded situations.

If performance was improved on Intel or other CPUs, it was due to something else. It might be that there were some scheduler tweaks for other processor families that MS was going to introduce in Win 8 as well, and they figured if they were going to go through the trouble of testing, validating, and rolling out a scheduler patch for BD CPUs, they might as well throw these other optimizations in as well.
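
The "physical cores first" logic described above can be sketched as a toy placement function. This is only an illustration of the idea, not how the Windows scheduler is actually implemented; `pick_cpu_spread` and the even/odd sibling numbering are assumptions taken from the description in this post.

```python
from typing import Optional, Set


def pick_cpu_spread(busy: Set[int], logical_cpus: int = 8) -> Optional[int]:
    """Pick a logical CPU for a new thread, preferring fully idle cores."""
    # First pass: a core whose two logical CPUs (even, odd) are both idle.
    for even in range(0, logical_cpus, 2):
        if even not in busy and even + 1 not in busy:
            return even
    # Second pass: any idle logical CPU, even if its sibling is busy.
    for cpu in range(logical_cpus):
        if cpu not in busy:
            return cpu
    return None  # every logical CPU is busy


busy: Set[int] = set()
placement = []
for _ in range(6):  # place six threads one after another
    cpu = pick_cpu_spread(busy)
    placement.append(cpu)
    busy.add(cpu)
print(placement)  # [0, 2, 4, 6, 1, 3]
```

The first four threads each get a core (or Bulldozer module) to themselves, and only the fifth and sixth have to share.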
 

Magic Carpet

Diamond Member
Oct 2, 2011
3,477
233
106
frostedflakes,

Yeah, that makes sense. Microsoft has been slow though. Thought this would have been addressed Pre-2012. Intel must have bribed them :D

Does *nix have similar "difficulties"?
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
The Bulldozer scheduler tweak shouldn't do zilch for other architectures, seems like it was just designed to make every even core (0, 2, 4, 6) be recognized as a physical core and every odd core (1, 3, 5, 7) be recognized as a logical core. Then the same scheduling logic already used on HyperThreaded CPUs (avoid scheduling threads to logical cores whenever possible and give first priority to physical cores) can be applied to Bulldozer to optimize performance in lightly threaded situations.

If performance was improved on Intel or other CPUs, it was due to something else. It might be that there were some scheduler tweaks for other processor families that MS was going to introduce in Win 8 as well, and they figured if they were going to go through the trouble of testing, validating, and rolling out a scheduler patch for BD CPUs, they might as well throw these other optimizations in as well.

The problem with HT CPUs is that the scheduler still doesn't use the distinction of logical or physical cores when it schedules threads.

This is entirely self-evident when you attempt to run LinX with 4 threads on a 2600K. If you don't manually thread-lock to physical cores then those 4 threads will bounce around all the logical and physical threads and the power-consumption/temperatures will be lower.

So any change to the scheduler that attempts to actually leverage information regarding the heterogeneous performance topology of non-CMP microarchitectures is a tide that will lift all boats.

BUT...my understanding was that this is NOT how the scheduler patch works for BD. The patch works by ganging threads onto as few modules as possible, not forcing them apart, so that the turboclocks on the modules can be activated and the clockspeed boost alone more than compensate for the CMT performance hit of shared resources within the module.

However, I do not know for certain whether or not this is the case.
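
For illustration only, the "ganging" behaviour described above could look like the toy placement function below. It is a sketch of the idea as stated in this post, which is itself unconfirmed, and not a description of the actual patch; `pick_cpu_pack` and the assumption that logical CPUs 0/1, 2/3, ... form one module each are hypothetical.

```python
from typing import Optional, Set


def pick_cpu_pack(busy: Set[int], logical_cpus: int = 8) -> Optional[int]:
    """Pick a logical CPU for a new thread, filling partly used modules first."""
    # First pass: a module with exactly one idle half, so it gets filled up.
    for even in range(0, logical_cpus, 2):
        idle = [cpu for cpu in (even, even + 1) if cpu not in busy]
        if len(idle) == 1:
            return idle[0]
    # Second pass: nothing is half-used, so open the lowest idle CPU.
    for cpu in range(logical_cpus):
        if cpu not in busy:
            return cpu
    return None  # every logical CPU is busy


busy: Set[int] = set()
placement = []
for _ in range(4):  # place four threads one after another
    cpu = pick_cpu_pack(busy)
    placement.append(cpu)
    busy.add(cpu)
modules_used = {cpu // 2 for cpu in placement}
print(placement)          # [0, 1, 2, 3]
print(len(modules_used))  # 2
```

Four threads end up on two modules, leaving the other two idle, which is the condition under which the turbo clocks mentioned above would be able to kick in.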
 

frostedflakes

Diamond Member
Mar 1, 2005
7,925
1
81
You may be right about it trying to clump threads together instead of spreading them out. It would be nice if we had more documentation, hopefully the lack of it was due to the patch being released prematurely and when it's properly released we'll get more details.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
Did this update even do anything measurable to help BD performance?

No one really got a chance to find out, Microsoft only released half of the update and then pulled that half back. So we all wait for a later date.
 

Diceman2037

Member
Dec 19, 2011
54
0
66
The problem with HT CPUs is that the scheduler still doesn't use the distinction of logical or physical cores when it schedules threads.

This is entirely self-evident when you attempt to run LinX with 4 threads on a 2600K. If you don't manually thread-lock to physical cores then those 4 threads will bounce around all the logical and physical threads and the power-consumption/temperatures will be lower.

The Windows scheduler operates in several distinct manners depending on how you distribute the threads. At idle, Windows will keep as much off the hyperthreads as it can and only pass over things that can be deferred a cycle to one of the logical cores.

LinX works the way it does because it binds each thread to the core mask, which is ignorant of the logical processor / physical processor scheduling, so assigning 4 threads will put load on the first 4 processor masks, being the primary and hyperthread of cores 1 and 2.
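
A small Python sketch of that point, assuming logical CPUs 0/1 belong to the first core, 2/3 to the second, and so on. `cores_covered` is a hypothetical helper, and it counts cores from 0, so "cores 1 and 2" above come out as 0 and 1.

```python
from typing import Set


def cores_covered(mask: int, threads_per_core: int = 2) -> Set[int]:
    """Return the physical cores (0-based) touched by an affinity bitmask."""
    cores = set()
    cpu = 0
    while mask:
        if mask & 1:  # this logical CPU is selected by the mask
            cores.add(cpu // threads_per_core)
        mask >>= 1
        cpu += 1
    return cores


# First four logical CPUs: only two physical cores, each with both threads loaded.
print(sorted(cores_covered(0b00001111)))  # [0, 1]
# Every other logical CPU: all four physical cores, one thread each.
print(sorted(cores_covered(0b01010101)))  # [0, 1, 2, 3]
```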