What controls Turbo Core in Xeons?

The Stilt · Jan 13, 2017

Dufus said:
It does need intervention (BIOS Mod usually).

MCE is just a fancy marketing name. Basically, Intel produced Haswell CPUs with a bug / erratum which allows full turbo on all cores. This is fixed by a microcode update. In essence, apply full turbo before the microcode patch takes effect (one of the microcode versions that fixes this particular erratum) and you will probably find your 2699 running 3.6GHz across all cores, provided the extra power draw can be handled.

I can swap the microcodes no problem, but ASRock is using the same microcode as my board does.

Dufus · Jan 13, 2017

I don't know how far you would have to go back. Probably a microcode older than version 0x19. The other option is to go with no microcode until after BIOS POST, but of course you need to hope the CPU doesn't trip up during that time and that the BIOS itself is okay with it.

0 means there's no microcode applied, but really there is by this time.

The Stilt · Jan 13, 2017

It's been a while since I've last been completely speechless...

Flushed the "bad microcodes" out of the bios, changed to the desired configuration and it worked.

I cannot lock it to the maximum ratio (36x), however I can now make all cores run at 3.0GHz instead of 2.3GHz.
It seems that higher ratios are ignored for higher core counts. With 1/2T I can still achieve the maximum turbo, same as before.

Windows automatically loads a newer microcode (0x2E) as it starts, but it is already too late to revert the changes

A VAST improvement.

Dufus · Jan 13, 2017

One caveat: once the microcode has loaded, any further attempt to change settings such as vcore / turbo values usually reverts the cores back to the standard ratio settings. On Haswell core processors, locking the overclocking register MSR 0x194 would prevent this, as well as any further changes to overclocking. Haven't tried it yet on my Xeon.

Edit: Still seems to do the trick.

If you want me to have a look and see if I can spot anything obvious as to why it's not sticking at 36x I have an old dump program that dumps the BIOS and a text file with register values. I wouldn't need the BIOS file, the text file should be enough.

The Stilt · Jan 13, 2017

Could you be a bit more specific regarding the MSR 0x194?
Which bit is set / cleared, at what point?

Dufus · Jan 13, 2017

Sure, set bit 20 of EAX. Sorry if I assume too much, it's a bad habit I have and probably why my explanations aren't very good.

The lock can be applied after the CPU OC settings have been set (voltages and ratios). That lock option may already be an available setting in the BIOS itself, and it is nearly always best to do it in the BIOS firmware. If it is not done in the BIOS, be aware there can be sleep issues, because registers are usually set back to the BIOS values on wake up. The other option in that case is to prohibit sleep.
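To make the "bit 20 of EAX" instruction concrete, here is a minimal Python sketch of the arithmetic only. It does not read or write any MSR; the helper names are hypothetical, and the actual read/write would be done with whatever MSR tool you already use.

```python
# Bit 20 of the low dword (EAX) of MSR 0x194, as described in the post above.
OC_LOCK_BIT = 20


def with_lock_bit(msr_value: int) -> int:
    """Return the 64-bit MSR value with the lock bit set, other bits unchanged."""
    return msr_value | (1 << OC_LOCK_BIT)


def is_locked(msr_value: int) -> bool:
    """True if the lock bit is already set in the given MSR value."""
    return bool(msr_value & (1 << OC_LOCK_BIT))


def split_edx_eax(msr_value: int) -> tuple:
    """Split a 64-bit MSR value into (EDX, EAX), the high and low dwords."""
    return (msr_value >> 32) & 0xFFFFFFFF, msr_value & 0xFFFFFFFF
```

Read the current value first, pass it through with_lock_bit() and write the result back, so that the other bits in the register are preserved.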

Strange you're not hitting 3.6 on all cores. Running into power / current limits? IIRC at 3.15GHz and running the CPU-Z MT bench, the CPU package power draw estimate was ~110W.

The Stilt · Jan 13, 2017

Setting bit 20:20 in 0x194 doesn't seem to make any difference (regardless when it is set). As soon as I try to change any of the settings, the CPU frequency still drops to 800MHz. The only way to release it from there is to reboot the system.

PPB · Jan 13, 2017

So all this means is that you can get one of those low power E5 v3s with a silly high 1 core turbo, and with that particular microcode you can make it MCE to full turbo? Now THAT'S interesting and probably even a Zen contender in MT perf/$ for me.

Sent from my XT1040 using Tapatalk

The Stilt · Jan 14, 2017

Another side effect I discovered: The CPU now seems to have an AVX2 offset. Fully multithreaded workloads run fine at 3.0GHz, however as soon as I start an AVX2 workload the frequencies drop to 2.6GHz. Still faster than it originally was, but pretty weird stuff.

Since I cannot configure the power limits, I had to try something more complicated: I enabled the CPU SVID interface (which supplies telemetry back to the CPU) and scaled down the FIVR input power telemetry by 1/2. The CPU sees the change in power, however the clocks remain exactly the same as before.

Dufus · Jan 14, 2017

Okay, not seen operation below 1.2GHz yet but am aware of the MFM. Not seen any C3 / C6 package states either but that was also the case with the 6800k I had in there. Not been able to run static voltage either for some reason.

A few things locked down in my BIOS I need to look at when I have a bit ( a lot) of spare time. Power is currently locked to standard levels and as such running AVX on all cores results in throttling but no real AVX offset, 5 cores of Linpack will run 3.15GHz a little under the 120W limit. AVX VID is quite large (up to 0.1V more).

I'm using ucode ver 0x38, don't know if that makes a difference. If you need a utility for Windows there is one here. https://labs.vmware.com/flings/vmware-cpu-microcode-update-driver

The Stilt · Jan 14, 2017

Updated the microcode from 0x2E to 0x39 and increasing the power limit no longer locks the frequency to 800MHz. However increasing any of the limits doesn't change the behavior. It runs at 2.6GHz as soon as AVX2 instructions are used, no matter how frequent they are. Scaling the power telemetry makes no difference either, as said before.

TennesseeTony · Jan 15, 2017

I am in way over my head in this thread, and I haven't understood but maybe 30% of what has been said thus far on this second page, so please forgive me for being the REAL Dufus here.

Regarding all this talk of microcode modification....is this something as simple as installing a really old factory BIOS, or are we talking about going in, cracking open the code, and manually programming some values in an existing BIOS?

I have four Xeon e5-2683-v3@2.5GHz (turbo) (28 threads) systems for distributed computing that I'd love to try and get to run at an elevated max turbo. They seem to run (non AVX) at only 95 of the 120 watts TDP, so there is a fair amount of headroom to play with from the get-go, without adjusting power limits, etc.

Sadly, I'm not capable of doing command line/programming type stuff without detailed, step by step, copy/paste, type instructions.

TennesseeTony · Jan 15, 2017

I've followed the link and downloaded the utility, where do I get these microcode updates, and do I put them in the same folder as the utility? A quick Google search didn't help (or I didn't understand what I was looking at).

The Stilt · Jan 16, 2017

After a few days of testing, here's what I've discovered:

The most essential thing is that the CPU is initialized WITHOUT a microcode. Allegedly it is possible to initialize the CPU with an extremely old microcode version, but so far I haven't been able to find such a version (hence allegedly). Microcode version 0x1F (06/03/2014) is already too "new" for this exploit to work. Since each and every motherboard bios is supplied with a microcode present (for obvious reasons), initializing the CPU without a microcode requires that the microcode is completely removed from the bios binary. This naturally involves modifying the bios and updating it, which in some cases can be a little tricky.

After testing all of the different microcodes I could find, I've found out that there are rather large differences between them. The most important thing is that Intel appears to have no direct or indirect means to completely prevent this exploit from working. Technically they can reduce the "yield" (clocks) in certain workloads, but not prevent it completely, as it is too late once the CPU has already been initialized. Newer microcode builds generally contain workarounds for errata, and because of that it is generally recommended to use the newest build available. When using this exploit you'll need to decide if you want the highest possible performance in all workloads, possibly at the expense of reliability, or alternatively slightly lower performance at the best known reliability (i.e. with the most recent microcode update).

Haswell was the first "wide" core from Intel (256-bit FP). In order to conserve power, the Power Management Unit (PMU) power gates the upper 128 bits of the FP unit when 256-bit instructions are not being executed. Somewhere between August and September of 2014 Intel changed the behavior of the Turbo on Haswell. Previously the Turbo behavior was identical regardless of whether the upper 128 bits of the FP unit were in use or not (i.e. same clocks for 128-bit and 256-bit workloads). In the microcode released in September 2014 the Turbo behavior was changed significantly, from static to workload dependent. In this microcode and all the newer ones the Turbo clocks are exactly the same for 128-bit workloads as before, but significantly lower for 256-bit workloads. On my CPU the difference is 400MHz.

The newest microcode version for the Haswell-E/EP/EX/EN production stepping (CPUID 0x306F2) is version 0x39 (10/07/2016). This microcode can be used for this exploit, however it will result in lower yield (clocks) than the earlier ones. This microcode is highly recommended if you are satisfied with a more modest boost, or require maximum reliability (professional use). This microcode also has an additional advantage on systems which lack both the "Power Limit" and "CPU telemetry feature" (SVID) options in the bios. Version 0x39 is one of the few versions which doesn't have the bug I call the "LFM bug". The best way to describe the "LFM bug" is that when you use this exploit, load a newer microcode in flight and then try adjusting any of the CPU parameters (frequency, voltage, power limits, etc.), the CPU will lock to the LFM state (typically 800MHz).

I personally ended up using microcode version 0x27 (08/08/2014), and this is the version which offers the best performance. This version still features the static Turbo behavior (same for 128/256-bit workloads) and has some of the most critical Haswell-Ex errata (such as TSX) already fixed.

Additionally there appear to be some Turbo rules which are core configuration dependent and completely fixed.

These apply on my Haswell-E HCC, but they might be different on other variants:

- <= 10 cores == Maximum Turbo Ratio available
- >= 12 cores == Maximum Turbo Ratio - 100MHz
- >= 14 cores == Maximum Turbo Ratio - 200MHz
- >= 16 cores == Maximum Turbo Ratio - 400MHz
- >= 18 cores == Maximum Turbo Ratio - 500MHz

This means that when 0x27 microcode is used, I can run my 2699 at 3.6GHz (1-10 cores), 3.5GHz (with 12 cores), 3.4GHz (with 14 cores), 3.2GHz (with 16 cores), 3.1GHz (with 18 cores), regardless of the workload.
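The rules above can be written down as a small lookup. This is only a sketch of the figures quoted in this post for one 2699 v3; the function name is hypothetical, and treating the odd core counts (11, 13, ...) as belonging to the next higher tier is an assumption, since only the even counts were reported.

```python
# (highest active core count in the tier, reduction in 100 MHz bins),
# taken from the list above for one 2699 v3. Other CPUs may differ.
OBSERVED_RULES = [(10, 0), (12, 1), (14, 2), (16, 4), (18, 5)]


def all_core_turbo_mhz(active_cores: int, max_turbo_mhz: int = 3600) -> int:
    """Return the observed turbo clock in MHz for a given number of active cores."""
    if active_cores < 1:
        raise ValueError("active_cores must be at least 1")
    for tier_limit, bins in OBSERVED_RULES:
        # Assumption: counts between two listed tiers use the higher tier.
        if active_cores <= tier_limit:
            return max_turbo_mhz - bins * 100
    raise ValueError("no rule recorded for more than 18 active cores")
```

For example, all_core_turbo_mhz(18) gives 3100 and all_core_turbo_mhz(10) gives 3600, matching the clocks listed above.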

Since the microcode can be updated in flight, controlling the microcode version in Windows might be slightly harder.
For Windows 7 - 8.1 (including their server variants), update KB3064209 must be uninstalled if it is found on the system. This is a microcode update which contains microcode version 0x2E for Haswell-Ex.
Windows 10 meanwhile is distributed with microcode version 0x36. To remove it, the file named "mcupdate_GenuineIntel.dll" found in the System32 folder must be renamed so that the system no longer finds it. Note that I haven't tested this procedure personally, since I'm still using Windows 7.
For Linux, using a specific microcode version should be quite well documented elsewhere.

The microcode in Windows can be updated with a driver released by VMWare: https://labs.vmware.com/flings/vmware-cpu-microcode-update-driver
Here are version 0x27 & 0x39 microcodes for Haswell-Ex (0x306F2) in VMWare driver / Linux compatible format: https://1drv.ms/u/s!Ag6oE4SOsCmDhFnET3uw9wHeV4EA
Rename the desired version to microcode.dat, and proceed as instructed by VMWare.
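On Linux, the revision that is actually running can be checked from the "microcode" field of /proc/cpuinfo. A minimal Python sketch of parsing that field follows; the function name and the sample text are made up for illustration.

```python
def microcode_revisions(cpuinfo_text: str) -> set:
    """Collect the distinct microcode revisions reported in /proc/cpuinfo text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Each logical CPU has its own "microcode : 0x.." line.
        if sep and key.strip() == "microcode":
            revisions.add(int(value.strip(), 16))
    return revisions


# Illustrative sample only, not captured from a real system.
SAMPLE = """processor\t: 0
model name\t: Example CPU
microcode\t: 0x27
processor\t: 1
model name\t: Example CPU
microcode\t: 0x27
"""
```

On a real system, pass in the contents of /proc/cpuinfo instead of SAMPLE. More than one value in the result would mean the logical CPUs are not all on the same revision.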

Personally I gained around 28% of performance with this exploit.

lopri · Jan 16, 2017

TennesseeTony said:
I am in way over my head in this thread, and I haven't understood but maybe 30% of what has been said thus far on this second page, so please forgive me for being the REAL Dufus here.

Regarding all this talk of microcode modification....is this something as simple as installing a really old factory BIOS, or are we talking about going in, cracking open the code, and manually programming some values in an existing BIOS?

I have four Xeon e5-2683-v3@2.5GHz (turbo) (28 threads) systems for distributed computing that I'd love to try and get to run at an elevated max turbo. They seem to run (non AVX) at only 95 of the 120 watts TDP, so there is a fair amount of headroom to play with from the get-go, without adjusting power limits, etc.

Sadly, I'm not capable of doing command line/programming type stuff without detailed, step by step, copy/paste, type instructions.

I second this motion. lol.

Ionstream · Jan 16, 2017

I'd like in on all of this too. I have a pair of E5-2686v3 Xeons (which are lower binned 2699s), and I've been struggling to understand why I am unable to hit the designated boost bins, despite some pretty hefty cooling.

3/3/3/3/3/3/3/6/7/8/9/10/11/12/13/14/15/15
For instance, documentation says I can hit a boost clock of 2.9 GHz with 8 cores on non-AVX loads. I'm only seeing 2.68 - 2.76 GHz so far though.
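As a worked example of reading that bin string, here is a short Python sketch. It assumes the bins are listed from all cores active down to one core active, that each bin is 100MHz, and that the base clock is 2.0GHz; the base clock is an assumption chosen so that the 8 core entry gives the 2.9GHz quoted above. The function name is hypothetical.

```python
def turbo_mhz(bins: str, active_cores: int, base_mhz: int) -> int:
    """Return the turbo clock in MHz for a number of active cores.

    Assumes the bin string runs from all cores active (first entry)
    down to a single active core (last entry), 100 MHz per bin.
    """
    steps = [int(b) for b in bins.split("/")]
    total_cores = len(steps)
    if not 1 <= active_cores <= total_cores:
        raise ValueError("active_cores out of range for this bin string")
    return base_mhz + 100 * steps[total_cores - active_cores]


BINS_2686_V3 = "3/3/3/3/3/3/3/6/7/8/9/10/11/12/13/14/15/15"
```

With these assumptions, turbo_mhz(BINS_2686_V3, 8, 2000) returns 2900, which is the 8 core figure from the documentation.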

Based on what's been mentioned previously in this thread, one needs to have:

1. The correct microcode, and a means to flash it
2. A motherboard which supports overclocking (X99?)
3. A Haswell-EP/EX CPU of your choice

The thing I don't understand is, how does one configure the boost multipliers for the CPU? I've seen these options in Intel XTU, but they're sadly blanked-out for me. The only thing I can adjust is the power limit for boost, which doesn't seem to do anything. I'm wondering if this is because I'm on a C612 chipset.

The Stilt · Jan 16, 2017

Inability to hit the boost bins is most likely due to the power limits. If the motherboard lacks the option to disable the SVID telemetry, then your only option is to raise the power limits. This can be done through the standard MSRs or with third party tools, such as ThrottleStop. All Haswell-EP HCC SKUs should be unlocked up to 240W.

Likewise if the boost ratio configuration is not available in the bios, the ratios can be programmed through the MSRs (0x1AD, 0x1AE & 0x1AF).

The Stilt · Jan 17, 2017

Unfortunately it seems that the "AVX2 Offset" isn't completely missing from the old microcode versions either. What differs in the older versions is that the threshold is higher and the frequency reduction is smaller. In an application which lightly utilizes AVX2 the older microcodes apply no offset, but in an application which heavily utilizes AVX2 the offset is present. The latter application is x265, and I verified the behavior by turning AVX2 off. With AVX2 disabled the frequency stays constant even with the newer microcodes. AVX has no effect on the frequencies.

Ionstream · Jan 17, 2017

The Stilt said:
Inability to hit the boost bins is most likely due to the power limits. If the motherboard lacks the option to disable the SVID telemetry, then your only option is to raise the power limits. This can be done through the standard MSRs or with third party tools, such as ThrottleStop. All Haswell-EP HCC SKUs should be unlocked up to 240W.

Likewise if the boost ratio configuration is not available in the bios, the ratios can be programmed through the MSRs (0x1AD, 0x1AE & 0x1AF).

ThrottleStop readings indicate it's not due to power limits. I'm averaging 86W out of 120W with 8 cores fully loaded. Still reading through material to try and figure out what on earth is going on. I'm a total amateur at this; the only overclocking I ever did was to mess with the FSB on a C2D E8400, which bit the dust a year later chugging along at 3.6 GHz.

Ionstream · Jan 17, 2017

You do mention the possible use of newer uCodes, but this exploit requires the removal of uCodes from the BIOS. Care to elaborate a bit more here?

Does this mean a CPU running on a clean BIOS can have a uCode loaded on from Windows?
If so, does this also mean, that every time I power cycle the computer, the CPU will be running on a clean BIOS, only to have the uCode loaded on by Windows again?
Lastly, does this mean boost clocks can be altered while in Windows via third-party applications after Windows loads the uCode, or must this be set in the BIOS directly?

I think I'm starting to get a rough picture of what's going on.

The Stilt · Jan 17, 2017

Ionstream said:
You do mention the possible use of newer uCodes, but this exploit requires the removal of uCodes from the BIOS. Care to elaborate a bit more here?

Does this mean a CPU running on a clean BIOS can have a uCode loaded on from Windows?
If so, does this also mean, that every time I power cycle the computer, the CPU will be running on a clean BIOS, only to have the uCode loaded on by Windows again?
Lastly, does this mean boost clocks can be altered while in Windows via third-party applications after Windows loads the uCode, or must this be set in the BIOS directly?

I think I'm starting to get a rough picture of what's going on.

As said before, the most important thing is that you initialize the CPU without ANY microcode present (i.e. in its factory state). This is achieved by removing the microcode completely from the bios. When the CPU is initialized without any microcode, the erratum which makes this exploit possible is still present. As soon as any microcode which features a workaround is loaded to the µROM, the erratum will be repaired and this no longer works. However, if you make the desired changes before any microcode is loaded, they will stick just fine even after loading any of the microcode versions. The erratum which makes this exploit possible will be repaired the same way it would be normally, however the CPU cannot revert the changes you've made and therefore the changes will stick.

If the bios lacks OC options, such as the "multicore enhancement" (MCE), then this exploit will be harder to use but still not impossible. It can be done in UEFI, for example with tools such as RU. Programming the MSRs is far from rocket science:

The MSRs which need to be programmed are the same regardless of the model used; only the number of bytes you need to change depends on the core count.

The targeted registers are MSRs 0x1AD (Cores 0-7), 0x1AE (Cores 8-15), 0x1AF (Cores 16-17 + Semaphore bit).

Example default configuration for 2698 v3 CPU (16C/32T, 3.6GHz maximum turbo):

MSR 0x1AD: 0x1D1E1F20 (EDX), 0x21222424 (EAX) - 2.9/3.0/3.1/3.2/3.3/3.4/3.6/3.6GHz (Core 7 - 0)
MSR 0x1AE: 0x1C1C1C1C (EDX), 0x1C1C1C1C (EAX) - 2.8GHz (Core 8 - 15)
MSR 0x1AF: 0x00000000 (EDX), 0x00000000 (EAX) - N/A

To:

MSR 0x1AD: 0x24242424 (EDX), 0x24242424 (EAX) - 3.6GHz (Core 7 - 0)
MSR 0x1AE: 0x24242424 (EDX), 0x24242424 (EAX) - 3.6GHz (Core 8 - 15)
MSR 0x1AF: 0x00000000 (EDX), 0x00000000 (EAX) - N/A

Once the multiplier registers have been programmed, the changes must be applied by setting the semaphore bit (MSR 0x1AF, bit 63).
After that a microcode can be loaded and the new CPU frequency settings will stick.
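The register values in the example are simply one ratio per byte, lowest core in the lowest byte. A minimal Python sketch of packing them follows; it only computes the values, it does not touch any MSR, and the helper names are hypothetical.

```python
SEMAPHORE_BIT = 1 << 63  # bit 63 of MSR 0x1AF, as described above


def pack_ratios(ratios: list) -> int:
    """Pack eight ratios into one 64-bit value, index 0 in the lowest byte."""
    if len(ratios) != 8:
        raise ValueError("exactly eight ratios are expected per register")
    value = 0
    for index, ratio in enumerate(ratios):
        if not 0 <= ratio <= 0xFF:
            raise ValueError("each ratio must fit in one byte")
        value |= ratio << (8 * index)
    return value


def split_edx_eax(value: int) -> tuple:
    """Split a 64-bit MSR value into (EDX, EAX), the high and low dwords."""
    return (value >> 32) & 0xFFFFFFFF, value & 0xFFFFFFFF


# Default 0x1AD contents from the 2698 v3 example: 36x, 36x, 34x ... 29x.
default_1ad = pack_ratios([36, 36, 34, 33, 32, 31, 30, 29])
# Target contents: 36x in every byte.
target_1ad = pack_ratios([36] * 8)
```

Here split_edx_eax(default_1ad) gives (0x1D1E1F20, 0x21222424) and split_edx_eax(target_1ad) gives (0x24242424, 0x24242424), the same EDX/EAX pairs as in the example. OR-ing SEMAPHORE_BIT into the 0x1AF value sets the top bit of EDX.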

Ionstream · Jan 19, 2017

Ah that clears things up a lot. Thank you

I'll go give this a shot when I return home.

C-Power/Tw0tch · Jan 22, 2017

Hi guys, first post ever on Anand

Anyway - I am following this thread since I own a 2683 V3, and I would love to get this thing to just run max core/turbo/all core speed all the time :>

Dufus has achieved exactly that, and I was wondering if it's just the motherboard (if so, I'll be switching my Extreme6/3.1 for a Taichi ASAP lol) or if I am missing something.

I tried everything but can't for the life of me get it to run all cores at 30x multi; the best I can get out of it is a max of 25x multi.

Tried loading the bios with no microcode, renamed the Windows 10 mcupdate_GenuineIntel file, tried several different settings in the bios.
I don't seem to have the "MCE" option with this CPU, while it should be in there.

So.. my question is more or less aimed at Dufus, as I am wondering how easy it is to achieve the 30x multi with the Taichi

Does it require a ton of bios hacking/modifying (I am not sure how to change those hex codes), or is it enough to remove the microcode from the bios, and does that MCE option then work out of the box?

Edit,
BTW I am that guy who has the #1 and #2 spots on the CPU-Z validator site, but only on 2 cores unfortunately

http://valid.x86.fr/top-cpu/496e746...0552045352d32363833207633204020322e303047487a

Dufus · Jan 22, 2017

Welcome to the forum.

I'm not sure if you've picked the right person to help you here, I'm not so good at explaining things well.

Note that as The Stilt points out there is AVX2 limiting which can drop that 30x a few bins. It works a little differently than the AVX2 offset used on Broadwell in that it is based on the number of active cores and core voltage. It would be nice to get rid of that effect.

The Taichi also removes the BIOS Setup options for setting ratios when running a v3 Xeon such as the 2683, and is probably not much different from your own board. A BIOS mod is unfortunately a little more involved than just hex editing, and my own modification is still pretty much a work in progress for when I am able to find time.

The next best thing to a BIOS mod: if you can POST without a microcode update, then you should be able to force it easily enough by setting the turbo ratios and possibly the mailbox ratio too (don't remember). With UEFI it's usually easy enough to set the system to load a UEFI driver with the required settings before the OS boots.

Are you familiar with UEFI Shell?

BTW be careful of other software undoing the change, you may need to lock MSR 0x194 (see previous posts). For instance Throttlestop used to reset this type of OC if microcode update was in place, I don't know if that has changed since.

Dufus · Jan 24, 2017

@C-Power/Tw0tch I can write a crude UEFI driver that can automatically be loaded before the Windows OS boots but I need to know how familiar with the UEFI Shell you are?
