Overclocking in Linux: Kernel 3.19 + Kaveri + IOMMU

DrMrLordX

Lifer
Apr 27, 2000
22,688
12,632
136
tl;dr version: Using kernel 3.19 + hardware IOMMU support on a 7700k made non-HSA software perform as if the processor were running @ 3.4 ghz, no matter what clockspeed was set in the UEFI.

Some background:

HSA relies on a lot of driver crap to work. If there is any one thing making it hard to set up and run HSA code on a Kaveri machine today, it has to be the massive amount of software infrastructure that must be laid down on top of the OS in order for HSA to be functional. For example, here's a how-to guide on supporting C++AMP in a Linux environment:

https://bitbucket.org/multicoreware/cppamp-driver-ng/wiki/HSA Support Status

I do not know if that is up-to-date, but it was probably accurate at some point in 2014, if not now. Similar guides intended to help the end-user set up Aparapi for HSA via Java lambdas are slightly less hellacious.

One of the major components of the HSA software stack as it currently exists is the Kernel Fusion Driver (or kfd). Check out slide 24 of this presentation:

http://www.slideshare.net/dibyendu.das/guide-to-heterogeneous-system-architecture-hsa-29621342

In Linux, the kfd is provided in certain versions of the Linux kernel. Up until kernel 3.18, you had to go and get a kernel specially compiled with kfd support, probably from GitHub. In 3.19, the kfd comes standard for those with supported hardware (it does nothing on a non-Kaveri machine). Those of you who have tried 3.19 on a Kaveri machine may notice the OS griping about IOMMU being disabled and failing to initialize the kfd device or whatever. Presumably, even if your motherboard has proper support for IOMMU, you have it disabled (I did). So I enabled it just to stop the noise. I haven't even tried any kind of HSA development yet, but hey, may as well reduce the startup error train.
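For anyone who wants to check whether the kfd actually came up after flipping the IOMMU setting, here is a minimal Python sketch. It assumes the driver exposes its device node at /dev/kfd and that the kernel lists active IOMMU groups under /sys/kernel/iommu_groups. Both paths are assumptions about the running system rather than something verified on this box, and the function name `kfd_status` is made up for illustration.

```python
import os


def kfd_status(dev_path="/dev/kfd", iommu_groups_dir="/sys/kernel/iommu_groups"):
    """Report whether the kfd device node exists and how many IOMMU groups are visible.

    Both default paths are assumptions; pass different ones if your system differs.
    """
    try:
        groups = len(os.listdir(iommu_groups_dir))
    except OSError:
        # Directory missing or unreadable: treat as "no IOMMU groups visible".
        groups = 0
    return {"kfd_node": os.path.exists(dev_path), "iommu_groups": groups}


if __name__ == "__main__":
    print(kfd_status())
```

A missing device node together with zero IOMMU groups would be consistent with the boot-time complaints described above, though the kernel log remains the better place to see the actual error.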

What I did not expect was the weird interaction between a Linux kfd kernel, hardware IOMMU support, and an overclocked CPU.

I noticed odd behavior from y-cruncher and some other programs after enabling IOMMU under kernel 3.19. They became very slow. Here's a breakdown of what I recorded when running y-cruncher 512m:

Code:
                IOMMU(3.19)          No IOMMU(3.19)          IOMMU(3.16)
3.4 ghz         ~330s                ~330s                   ~330s
4.5 ghz         ~330s                ~217s                   ~217s
4.7 ghz         ~330s                ~212s                   ~212s

(all tests were run at NB 2100mhz, 16gb DDR3-2400, 384 shader iGPU 1028mhz on an Asus A88x-Pro. The graphics driver is the generic Radeon driver, since fglrx is broken in 3.19)
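To put the table in relative terms, the following Python sketch compares each clock increase against the speedup actually measured. The times are the approximate figures from the table; the names `TIMES`, `speedup`, and `summarize` are just illustrative.

```python
# Approximate run times in seconds, copied from the table above.
TIMES = {
    "IOMMU(3.19)":    {3.4: 330, 4.5: 330, 4.7: 330},
    "No IOMMU(3.19)": {3.4: 330, 4.5: 217, 4.7: 212},
    "IOMMU(3.16)":    {3.4: 330, 4.5: 217, 4.7: 212},
}


def speedup(baseline_s, run_s):
    """Return how many times faster a run is than the baseline run."""
    return baseline_s / run_s


def summarize(times, base_ghz=3.4):
    """Return (config, ghz, clock ratio, measured speedup) rows, rounded to 2 places."""
    rows = []
    for config, runs in times.items():
        base = runs[base_ghz]
        for ghz, secs in sorted(runs.items()):
            rows.append((config, ghz, round(ghz / base_ghz, 2), round(speedup(base, secs), 2)))
    return rows


if __name__ == "__main__":
    for config, ghz, clock_ratio, gain in summarize(TIMES):
        print(f"{config:15} {ghz} ghz  clock x{clock_ratio}  speed x{gain}")
```

With IOMMU on under 3.19, the measured speedup stays at 1.0 regardless of the clock ratio, which is the regression in a nutshell. The other two columns track the overclock.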

Note that the times here did vary somewhat (I did multiple runs). Runs that are listed as identical above were within +/- 3% of each other. For example, some of the 330s runs were in the range of 336-338s. I was very careful to make sure that all cores were running at the intended clockspeed. Not only did I use cpufreq to lock the CPU at maximum frequency all of the time, but I monitored current clockspeed using a Conky widget, and I used cpufreq-info to look at frequency history for the present boot. At no point did I observe any throttling of the processor.
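The same kind of spot check can be scripted. This is a rough Python sketch that reads the per-core frequency the cpufreq subsystem reports through sysfs. It assumes the usual `scaling_cur_freq` files exist (values in kHz), and it is not the tooling actually used for the runs above. Keep in mind that sysfs shows what the kernel believes the clock is, which may not match a UEFI-level overclock.

```python
import glob
import os

# Assumed sysfs location of the per-core cpufreq readings (values in kHz).
DEFAULT_PATTERN = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq"


def read_core_freqs_mhz(pattern=DEFAULT_PATTERN):
    """Return {core_name: frequency in MHz} for every core matching the pattern."""
    freqs = {}
    for path in sorted(glob.glob(pattern)):
        # .../cpu0/cpufreq/scaling_cur_freq -> "cpu0"
        core = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path) as f:
            freqs[core] = int(f.read().strip()) / 1000.0  # kHz -> MHz
    return freqs


if __name__ == "__main__":
    for core, mhz in read_core_freqs_mhz().items():
        print(f"{core}: {mhz:.0f} MHz")
```

Running it in a loop during a benchmark would show whether any core ever reports less than the locked frequency.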

As a brief test, I booted to Win10 @ 4.5 ghz with the other above settings and IOMMU enabled in the UEFI and ran y-cruncher there, and got in the high 220s which is about what I expect from Win10 (it runs y-cruncher slower than Linux on this box).

The implications of this "issue" are not especially positive. Assuming the HSA software stack is streamlined to the point that it is all integrated into the OS and/or graphics drivers, Kaveri users will have a choice: activate hardware IOMMU support and render CPU overclocking completely inoperable, or disable hardware IOMMU support and break HSA compliance (making HSA-capable software not work). The issue presently seems to be that creating a kfd device requires the CPU to stay within spec or something. It is possible that it's a software issue that can be ironed out in the future . . . if anyone really cares to do so.
 

VirtualLarry

No Lifer
Aug 25, 2001
56,570
10,205
126
Honestly, not that surprising. Overclocking is never guaranteed.

I remember someone (Rubycon?) mentioning that overclocking a 45nm Core2Quad over 3.4Ghz would break SSE4.1 apps.
 

DrMrLordX

Weird, and an interesting tidbit. What's also interesting is that I tested a speed within the turbo envelope for the 7700k (3.7 ghz) and it showed no gains either.
 

DrMrLordX

Okay, another update on this (relatively minor) issue. I still consider it to be of some interest, at least personally, since I am -> <- this close to finally writing some admittedly-primitive HSA software. But I digress.

I set the CPU multiplier to Auto and left on all the power management stuff to allow Turbo to function (if it can function at all), and then on top of that, I overclocked the bclk to 119 mhz. This set my base clock at 4016 mhz (p3 state), though under Linux, it still registers as 3.4 ghz. I can't tell what P0-P2 are right now since the tools to do that under Linux don't seem to work with kernel 3.19 yet. Oh well.
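The relationship between bclk, multiplier, and core clock is plain multiplication, sketched below in Python with hypothetical helper names. Note that 119 mhz at a 34x multiplier works out to 4046 mhz, slightly above the 4016 mhz quoted above. Working backwards, 4016 mhz at 34x implies a bclk of about 118.1 mhz. Whether that gap is a typo or reflects the real bclk the board settled on is not something the post makes clear.

```python
def core_clock_mhz(bclk_mhz, multiplier):
    """Core clock is the reference clock (bclk) times the CPU multiplier."""
    return bclk_mhz * multiplier


def implied_bclk_mhz(core_mhz, multiplier):
    """Work backwards from a reported core clock to the bclk it implies."""
    return core_mhz / multiplier


if __name__ == "__main__":
    print(core_clock_mhz(100, 34))               # stock: 100 mhz x 34
    print(core_clock_mhz(119, 34))               # overclocked bclk, same multiplier
    print(round(implied_bclk_mhz(4016, 34), 1))  # bclk implied by the quoted figure
```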

Anyway, I ran y-cruncher 512m and got a score of ~270s with hardware IOMMU enabled (allowing a functional kfd device). Then I disabled hardware IOMMU and got ~240s running the same benchmark using the same configuration. I am guessing that some kind of turbo setting kicked in, hence the superior performance.

What this tells me is that initializing a kernel fusion device under 3.19 somehow locks the CPU into the base multiplier for the CPU, overriding UEFI and turbo settings. Overclocking the bclk still allows one to OC the chip in these circumstances.

So, basically, if you want to overclock Kaveri while doing HSA anything, it's gotta be the bclk.

edit: I would like to issue a correction. It appears that turbo is *not* happening with 119 bclk and the CPU multiplier set to Auto (resulting in a p3 multi of 34x). It's just that with IOMMU enabled, there's some kind of a performance penalty. I tried 119 bclk and fixed 34x CPU multi (turbo disabled) without IOMMU and got ~240s. With IOMMU enabled, 119 bclk, and fixed 34x multi (again turbo disabled) I got up in the 270s range again. This kind of performance regression is not consistent with any of the other kinds of Kaveri throttling I've seen to date. It's a little weird.
 

DrMrLordX

Gah, I am defeated. For whatever reason, using the bclk to increase CPU performance while a kfd device is operating under Linux 3.19 no longer seems to work. Grumble grumble. At least overclocking the iGPU and NB still works under those circumstances. I'm not sure why it worked one time and won't work now, but the whole thing is a bit odd.
 

monstercameron

Diamond Member
Feb 12, 2013
3,818
1
0
It's probably just broken, apparently a lot of Kaveri is broken.
 

DrMrLordX

C'est la vie. Sometimes it wants to work and sometimes it does not. I'm still struggling to understand how the chip manages to slow down under these conditions. It does not help that Java 8 update 40 seems to have killed Math.round() and Math.sqrt() performance, which had me really confused for a while there, thinking my chip was slowing down just on those specific Math library methods (?!?!).
 

DrMrLordX

Update: It looks like the overclocking + kfd problem is solved. My guess is that the current version of mainline kernel 3.19 (3.19.0-16) has a bugfix in there or something. 4.7 ghz worked wonderfully. Now I can do goofy HSA experiments fully overclocked. Huzzah!