If you noticed the summary in there, from amd.com.......IMHO the "patch" here seems to imply that there's some sort of issue (or bug) with the cache behaviour during virtual address aliasing. However, whether this issue affects other unpatched operating systems (perhaps Windows) is unknown (speculation).
And later, from Linus.....Not sure how disabling this feature also affects other operating systems. :hmm:
OK, I have to go a bit deeper into the matter, any virtual regiments aside..
Patching software just means applying changed code to do something differently than originally implemented. This can be a workaround for a HW bug (maybe this is why you think every patch is related to a bug) or a fix for a SW bug, but it can also be an improvement or even a new feature.
In the case of Barcelona, the published patch was a workaround plus the disabling of a feature, in response to a real HW bug: the well-known TLB bug. It was a bug because it could cause a crash under certain circumstances.
In the case of this BD-related patch, it's about handling something differently in the kernel to avoid situations where performance could be reduced by a couple of percent, as can be clearly seen here:
This patch provides performance tuning for the "Bulldozer" CPU. With its
shared instruction cache there is a chance of generating an excessive
number of cache cross-invalidates when running specific workloads on the
cores of a compute module.
http://www.spinics.net/lists/linux-tip-commits/msg13140.html
Although it's just some "tuning", it affects many more situations than, e.g., optimizations like using ADD/SHL instead of a slower MUL instruction depending on the operands.
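For anyone not familiar with the ADD/SHL trick mentioned above, here is a minimal sketch of what I mean. The function name `mul10_shift_add` is made up for illustration, and compilers usually do this kind of thing on their own where it pays off:

```c
#include <stdint.h>

/* Multiply by 10 without a MUL instruction:
 * x * 10 == x * 8 + x * 2 == (x << 3) + (x << 1).
 * Unsigned arithmetic wraps modulo 2^32, the same as x * 10u would. */
static inline uint32_t mul10_shift_add(uint32_t x)
{
    return (x << 3) + (x << 1);
}
```

That kind of optimization only helps at the few places where such a multiplication happens, while the address alignment tuning touches every process.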
The shared I-cache of a BD module is not a bug but a conceptual detail. And how it handles virtual addresses is also not a bug but an implementation detail.
BTW, I found the post where I first read about it:
http://www.planet3dnow.de/vbulletin/showpost.php?p=4487776&postcount=4655
The OP already quoted the performance impact I used to get to 97%:
> Out of curiosity, what's the performance impact if the workaround is
> not enabled?
Up to 3% for a CPU-intensive style benchmark, and it can vary highly in a microbenchmark depending on workload and compiler.
The 3% number is also clearly mentioned in the patch as of 08/05 here:
Code:
+ align_va_addr= [X86-64]
+ Align virtual addresses by clearing slice [14:12] when
+ allocating a VMA at process creation time. This option
+ gives you up to 3% performance improvement on AMD F15h
+ machines (where it is enabled by default) for a
+ CPU-intensive style benchmark, and it can vary highly in
+ a microbenchmark depending on workload and compiler.
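To make "clearing slice [14:12]" a bit more concrete, here is a rough sketch of the bit operation the documentation text describes. This is not the kernel's actual code. The names `VA_SLICE_MASK`, `clear_slice_14_12` and `same_slice` are my own, and the real patch also has to deal with things like search direction and file offsets, which I'm ignoring here:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 14, 13 and 12 of an address, i.e. 0x7000 (hypothetical name). */
#define VA_SLICE_MASK ((uint64_t)0x7 << 12)

/* Clear bits [14:12] of a candidate virtual address, leaving all
 * other bits (including the page offset in bits [11:0]) untouched. */
static inline uint64_t clear_slice_14_12(uint64_t addr)
{
    return addr & ~VA_SLICE_MASK;
}

/* True if two virtual addresses agree in bits [14:12]. */
static inline bool same_slice(uint64_t a, uint64_t b)
{
    return ((a ^ b) & VA_SLICE_MASK) == 0;
}
```

For example, `clear_slice_14_12(0x7f1234567abc)` gives `0x7f1234560abc`. If every mapping starts at an address with those three bits cleared, the same code mapped into different processes ends up with identical bits [14:12], which as far as I understand it is what the tuning is after.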
It would be interesting to know what the situation is on Windows-based systems. Any OS guys around?