This confirms Windows 7 was in fact keeping “Bulldozer” from performing at 100% in all prior benchmarks.
Hmmmm could it be a comeback??
Hope for competition's sake that Bulldozer is more a Pentium Pro than a Pentium 4. I disagree that it's terrible to need OS optimizations; Intel was throwing that FUD around with the introduction of x86-64, and AMD was using similar FUD to attack HyperThreading. Regardless of which category it ends up in, Bulldozer is still not a buy for consumers.
AMD and MS would need to work together on it, and to be sure it's correct, MS would need chip revisions guaranteed to give exactly the same performance as final production models. If they had it ready on the day of launch, they would still need to spend time testing it, and likely would not have gotten launch-equivalent CPU samples more than a few weeks ahead of the rest of the world.
Reliability anomalies are somewhat rare, thankfully, but performance anomalies are common, and have been getting patched for a good long while, whenever MS or the vendor gets around to it.
It's more impressive on Intel's part that Nehalem and SB did not need any significant performance patching than it is a black mark on AMD that BD needs it.
It's not FUD... it's terrible because AMD doesn't actually write that optimization, which means few others will.
Who do you think is going to write those optimizations for open source OSes, the type that powers all the servers in the world? Nobody, that's who.
"Kinda disagree there. This doesn't do anything besides spread the threads on 0, 2, 4 and 6 first, and then 1, 3, 5 and 7 second."
...and MS has a magic QA button that allows them to know that changes made to the behavior of core OS functionality are going to remain isolated, and work exactly as expected, in 100% of cases, without doing extensive testing of use cases?
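For anyone trying to picture the "0, 2, 4 and 6 first, then 1, 3, 5 and 7" placement order mentioned here, a minimal sketch follows. It assumes logical CPUs 2k and 2k+1 share a core (or a Bulldozer module), which is a common numbering but not something this thread confirms. The function name `spread_first_order` is made up for illustration; this is not the actual Windows scheduler logic.

```python
def spread_first_order(num_cores, threads_per_core=2):
    """Order in which a 'spread first' policy would fill logical CPUs.

    Assumption: logical CPU id = core * threads_per_core + sibling,
    so ids 0 and 1 share a core, 2 and 3 share the next one, and so on.
    """
    order = []
    # Fill one hardware thread on every core before touching any sibling.
    for sibling in range(threads_per_core):
        for core in range(num_cores):
            order.append(core * threads_per_core + sibling)
    return order


# Four cores/modules with two hardware threads each, as on an 8-thread chip.
print(spread_first_order(4))
```

The point of such an order is that the first four busy threads each get a core's shared resources to themselves; only the fifth thread onward has to share.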
Just so you guys know, I installed the hotfix right when it came out on my Sandy setup, and it's benching higher in everything that uses 8 threads, with a bigger boost with AVX.
I broke 120 GFlops using AVX and 8 threads in IntelBurnTest, when before I could never get over 108, and my Cinebench points increased as much as a 150-200 MHz overclock would add.
This patch helps all threaded apps and helped out my 2600K.
It's crazy how a 1.3 MB file can add so much performance.
PM me if you need it, I have it saved on my computer.
HeavyHemi said: i7 980, 4.3 GHz, 1.35 vcore
Just because I'm a crazy guy...
Pre patch...
Intel(R) LINPACK 64-bit data - LinX 0.6.4
Current date/time: Fri Dec 16 23:50:58 2011
CPU frequency: 4.207 GHz
Number of CPUs: 1
Number of cores: 6
Number of threads: 12
Parameters are set to:
Number of tests : 1
Number of equations to solve (problem size) : 10000
Leading dimension of array : 10008
Number of trials to run : 20
Data alignment value (in Kbytes) : 4
Maximum memory requested that can be used = 800844256, at the size = 10000
============= Timing linear equation system solver =================
Size LDA Align. Time(s) GFlops Residual Residual(norm)
10000 10008 4 10.715 62.2389 9.915883e-011 3.496441e-002
10000 10008 4 10.576 63.0562 9.915883e-011 3.496441e-002
10000 10008 4 10.549 63.2171 9.915883e-011 3.496441e-002
10000 10008 4 10.538 63.2818 9.915883e-011 3.496441e-002
10000 10008 4 10.622 62.7827 9.915883e-011 3.496441e-002
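As a sanity check on how the GFlops column relates to the Time(s) column, here is a small sketch using the standard LINPACK operation count of 2/3·n³ + 2·n² for an n×n system. The helper names are made up for illustration, and since LinX prints the time rounded to three decimals, the result should only agree with the log to within rounding.

```python
def linpack_flops(n):
    """Nominal floating-point operation count for solving a dense n x n system."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


def gflops(n, seconds):
    """Throughput in GFlops for one solve of size n that took `seconds`."""
    return linpack_flops(n) / seconds / 1e9


# First pre-patch row from the log above: size 10000, 10.715 s, reported 62.2389.
print(gflops(10000, 10.715))
```

So for a fixed problem size the GFlops figure is just a constant divided by the run time, which is why the two columns always move in opposite directions.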
Post patch....
Intel(R) LINPACK 64-bit data - LinX 0.6.4
Current date/time: Sat Dec 17 00:24:04 2011
CPU frequency: 4.222 GHz
Number of CPUs: 1
Number of cores: 6
Number of threads: 12
Parameters are set to:
Number of tests : 1
Number of equations to solve (problem size) : 10000
Leading dimension of array : 10008
Number of trials to run : 20
Data alignment value (in Kbytes) : 4
Maximum memory requested that can be used = 800844256, at the size = 10000
============= Timing linear equation system solver =================
Size LDA Align. Time(s) GFlops Residual Residual(norm)
10000 10008 4 9.952 67.0063 9.915883e-011 3.496441e-002
10000 10008 4 9.951 67.0170 9.915883e-011 3.496441e-002
10000 10008 4 9.828 67.8505 9.915883e-011 3.496441e-002
10000 10008 4 9.900 67.3604 9.915883e-011 3.496441e-002
10000 10008 4 9.727 68.5595 9.915883e-011 3.496441e-002
It looks like this patch is not only about SMT (see i7 improvements), but probably contains some temporary thread pinning.
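To put a number on the i7 improvement, here is a rough sketch that averages the five trials shown for each run and also divides out the slightly different reported clocks (4.207 vs 4.222 GHz). Only 5 of the 20 trials appear in the logs, so treat the result as indicative rather than exact; the helper name `percent_gain` is made up for illustration.

```python
from statistics import mean

# GFlops columns copied from the pre-patch and post-patch logs above (HT on, 12 threads).
pre_patch = [62.2389, 63.0562, 63.2171, 63.2818, 62.7827]
post_patch = [67.0063, 67.0170, 67.8505, 67.3604, 68.5595]

# CPU frequency lines from the same two logs, in GHz.
pre_clock, post_clock = 4.207, 4.222


def percent_gain(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after / before - 1.0) * 100.0


raw_gain = percent_gain(mean(pre_patch), mean(post_patch))
# Dividing by the clock removes the small frequency difference between runs.
per_clock_gain = percent_gain(mean(pre_patch) / pre_clock,
                              mean(post_patch) / post_clock)

print(f"raw gain: {raw_gain:.2f}%")
print(f"per-clock gain: {per_clock_gain:.2f}%")
```

Both figures come out at roughly 7%, so the tiny clock difference between the two runs does not explain the gain.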
What is somewhat interesting is that with HT disabled, there is zero difference in results. I just ran this twice with a clean boot.
With the patch.
Intel(R) LINPACK 64-bit data - LinX 0.6.4
Current date/time: Mon Dec 19 01:17:58 2011
CPU frequency: 4.234 GHz
Number of CPUs: 1
Number of cores: 6
Number of threads: 6
Parameters are set to:
Number of tests : 1
Number of equations to solve (problem size) : 10000
Leading dimension of array : 10008
Number of trials to run : 20
Data alignment value (in Kbytes) : 4
Maximum memory requested that can be used = 800844256, at the size = 10000
============= Timing linear equation system solver =================
Size LDA Align. Time(s) GFlops Residual Residual(norm)
10000 10008 4 8.007 83.2852 9.915883e-011 3.496441e-002
10000 10008 4 7.898 84.4365 9.915883e-011 3.496441e-002
10000 10008 4 7.827 85.1980 9.915883e-011 3.496441e-002
10000 10008 4 7.836 85.1014 9.915883e-011 3.496441e-002
10000 10008 4 7.836 85.1068 9.915883e-011 3.496441e-002
And without the patch
Intel(R) LINPACK 64-bit data - LinX 0.6.4
Current date/time: Mon Dec 19 01:27:29 2011
CPU frequency: 4.233 GHz
Number of CPUs: 1
Number of cores: 6
Number of threads: 6
Parameters are set to:
Number of tests : 1
Number of equations to solve (problem size) : 10000
Leading dimension of array : 10008
Number of trials to run : 20
Data alignment value (in Kbytes) : 4
Maximum memory requested that can be used = 800844256, at the size = 10000
============= Timing linear equation system solver =================
Size LDA Align. Time(s) GFlops Residual Residual(norm)
10000 10008 4 7.830 85.1681 9.915883e-011 3.496441e-002
10000 10008 4 7.839 85.0728 9.915883e-011 3.496441e-002
10000 10008 4 7.843 85.0243 9.915883e-011 3.496441e-002
10000 10008 4 7.839 85.0719 9.915883e-011 3.496441e-002
10000 10008 4 7.840 85.0554 9.915883e-011 3.496441e-002
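A quick way to check the "zero difference" reading of the two HT-off runs is to compare medians, which are less sensitive to the slow first trial in the patched run. This is only a sketch over the five trials shown per run.

```python
from statistics import median

# GFlops columns copied from the two HT-disabled logs above (6 threads).
with_patch = [83.2852, 84.4365, 85.1980, 85.1014, 85.1068]
without_patch = [85.1681, 85.0728, 85.0243, 85.0719, 85.0554]

# The median ignores the warm-up outlier (83.2852) that would drag a mean down.
delta_percent = (median(with_patch) / median(without_patch) - 1.0) * 100.0

print(f"median with patch:    {median(with_patch):.4f} GFlops")
print(f"median without patch: {median(without_patch):.4f} GFlops")
print(f"difference: {delta_percent:+.3f}%")
```

The difference is well under a tenth of a percent, which is noise for this kind of benchmark.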
Would this hotfix do anything for people with Thuban or Sandy cpus?
I regularly see 10-15% core activity across all my PIIX6 cores during browsing and listening to music, which keeps the core states active when they should be parked.
"So... in other words... Bulldozer was so bad that they had to adjust Windows for it to run them?"
Windows does run on Bulldozer without this patch. It's just a tribute to the complexity and diversity among CPU architectures. There is no "one OS fits all" or "one compiler fits all" (on actual code generation) anymore.
"More likely that it includes the cache aliasing fix, which improves the condition on HT-enabled processors. The same processor had no improvement with HT disabled."
So why would the cache aliasing fix help Sandy Bridge? The fix improved performance by 1-2% in most cases under Linux (see Phoronix).
Microsoft mentioned the scheduler.
"Cache aliasing occurs when multiple mappings to a physical page of memory have conflicting caching states, such as cached and uncached. Due to these conflicting states, data in that physical page may become corrupted when the processor's cache is flushed. If that page is being used for DMA by a driver, this can lead to hardware stability problems and system lockups."
That means that you can have two copies of the same data in separate parts of the cache without knowing it... and they wouldn't be updated correctly, so you'd get wrong results.
So an overall improvement to aliasing will affect any CPU which has multiple cores, virtual or physical, if they share cache.
"It's not FUD... it's terrible because AMD doesn't actually write that optimization, which means few others will.
Who do you think is going to write those optimizations for open source OSes, the type that powers all the servers in the world? Nobody, that's who."
Nobody? Hmmm, well, somebody did. Linux does have a performance patch. It is for a cache aliasing issue involving shared libraries.
"The cache aliasing tweak benefits Hyperthreaded processors where both threads share the same core/cache."
But it shouldn't, unless they specifically added a method for SB. Better temporal locality would improve any modern CPU's performance, however. Linux's alias fix, for instance, worked around the specifics of BD's I$. More generic methods, like coloring and extra copies, tend to be error-prone and wasteful (in both space and performance) when used where not needed (OSes that use coloring and the like for most or all virtual memory are a different matter). If the hardware can do it reasonably well, it's not worth it.
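For readers unfamiliar with the aliasing and "coloring" being discussed, here is a minimal sketch of the general idea in a virtually indexed cache. The geometry below (64-byte lines, 32 KiB per way, 4 KiB pages) is an assumption chosen for illustration, and the function names are made up; this is not the Linux patch itself, just the arithmetic behind why two mappings of the same code can land in different cache sets and how aligning the mapping avoids that.

```python
LINE_BYTES = 64          # assumed cache line size
WAY_BYTES = 32 * 1024    # assumed bytes per way (e.g. 64 KiB, 2-way)
PAGE_BYTES = 4096        # 4 KiB pages
NUM_COLORS = WAY_BYTES // PAGE_BYTES  # index bits above the page offset -> 8 colors


def set_index(vaddr):
    """Cache set selected by a virtual address when indexing is virtual."""
    return (vaddr % WAY_BYTES) // LINE_BYTES


def color(vaddr):
    """The part of the index that comes from above the page offset (bits 12-14 here)."""
    return (vaddr % WAY_BYTES) // PAGE_BYTES


def align_to_color(vaddr, wanted_color):
    """Lowest page-aligned address >= vaddr whose color equals wanted_color."""
    page = -(-vaddr // PAGE_BYTES)             # round up to a page boundary
    page += (wanted_color - page) % NUM_COLORS  # step forward to the right color
    return page * PAGE_BYTES


# Two hypothetical mappings of the same shared library in different processes.
map_a = 0x7F0000001000
map_b = 0x7F0000204000
offset = 0x40  # same code, same offset inside the library

print(set_index(map_a + offset), set_index(map_b + offset))  # different sets: aliases
map_b_fixed = align_to_color(map_b, color(map_a))
print(set_index(map_a + offset), set_index(map_b_fixed + offset))  # same set
```

The cost hinted at in the post shows up here too: aligning can skip up to `NUM_COLORS - 1` pages of address space per mapping, which is the sort of waste that makes blanket coloring unattractive when the hardware copes well on its own.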
"If the wrong result resides in cache, then the CPU has to go back to L2 or L3 and finally system memory to update this data, which will reduce throughput."
But, which result is correct? The penalty of figuring that out will typically be negligible, but when it's not, it can be damning. In addition, we don't know if, or how much, Windows may be affected. Cache aliasing on BD does not appear to be a major problem. Linux, for instance, patched for a very specific case, and it only mildly affects performance even for that case (sometimes even performing slightly worse when corrected!).
"So an overall improvement to aliasing will affect any CPU which has multiple cores, virtual or physical, if they share cache."
But not all CPUs will need it, ones that need it for one OS may not need it for another (assuming it's a performance, not correctness, problem), and in some cases it may appear to need work, but letting the hardware take care of it may still be faster than a software fix. In the case that it does need work, it will need to be done in a way specific to the OS's use of memory on the specific family of CPUs in question.
"But it shouldn't, unless they specifically added a method for SB. Better temporal locality would improve any modern CPU's performance, however. Linux's alias fix, for instance, worked around the specifics of BD's I$. More generic methods, like coloring and extra copies, tend to be error-prone and wasteful (in both space and performance) when used where not needed (OSes that use coloring and the like for most or all virtual memory are a different matter). If the hardware can do it reasonably well, it's not worth it."
Well, there's also the fact that this patch may be a backport of the Windows 8 scheduler, and Windows 8 shows improvements in certain scenarios as well.
"From the results, it would seem that the hyperthread is gaining additional throughput, and if I recall the numbers correctly, the hyperthread was usually 25% the performance of the first thread."
Maybe. I was addressing the potential for fixing cache aliasing issues across different CPUs. At this point, that sort of problem is very much CPU-dependent. It's not a program behavior, but the specific behavior of one uarch. Another may exhibit cache aliasing as well, but the conditions in which it may present a problem to be fixed will generally be different, as will what is needed to take care of it.
So I have a 2600k running Win 7 Ult 64. Should I run this? Any gaming benefits?