tl;dr version: Using kernel 3.19 + hardware IOMMU support on a 7700k resulted in non-HSA software performance regressing to that of the processor @ 3.4 GHz, no matter what clockspeed was set in the UEFI.
Some background:
HSA relies on a lot of driver crap to work. If there is any one thing making it hard to set up and run HSA code on a Kaveri machine today, it has to be the massive amount of software infrastructure that must be laid down on top of the OS in order for HSA to be functional. For example, here's a how-to guide on supporting C++AMP in a Linux environment:
https://bitbucket.org/multicoreware/cppamp-driver-ng/wiki/HSA Support Status
I do not know if that is up to date, but it was probably accurate at least sometime in 2014, if not now. Similar guides intended to help the end user set up Aparapi for HSA via Java lambdas are slightly less hellacious.
One of the major components of the HSA software stack as it currently exists is the Kernel Fusion Driver (or kfd). Check out slide 24 of this presentation:
http://www.slideshare.net/dibyendu.das/guide-to-heterogeneous-system-architecture-hsa-29621342
In Linux, a kfd is provided in certain versions of the Linux kernel. Up until kernel 3.18, you had to go and get a kernel specially compiled with kfd support, probably from GitHub. In 3.19, the kfd comes standard for those with supported hardware (it does nothing on a non-Kaveri machine). Those of you who have tried 3.19 on a Kaveri machine may notice the OS griping about IOMMU being disabled and failing to initialize the kfd device or whatever. Presumably, if your motherboard has proper support for IOMMU, you have it disabled (I did). So I enabled it just to stop the noise. I haven't even tried any kind of HSA development yet, but hey, may as well reduce the startup error train.
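If you want to see whether your own box is producing that startup noise, here is a rough Python sketch that pulls IOMMU- and kfd-related lines out of the kernel log. I am not quoting the exact messages (I did not save them), so it only matches on the substrings "iommu" and "kfd", which is my own assumption about what the relevant lines contain. The function names are made up for this example, and dmesg may need root on some distros.

```python
import subprocess

# Substrings to look for; an assumption, not the exact kernel messages.
KEYWORDS = ("iommu", "kfd")


def filter_log_lines(log_text, keywords=KEYWORDS):
    """Return the log lines that mention any keyword (case-insensitive)."""
    return [
        line
        for line in log_text.splitlines()
        if any(word in line.lower() for word in keywords)
    ]


def kernel_log():
    """Return the kernel ring buffer as text, or "" if dmesg is unavailable."""
    try:
        result = subprocess.run(
            ["dmesg"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return ""
    return result.stdout
```

Calling filter_log_lines(kernel_log()) once with IOMMU disabled in the UEFI and once with it enabled should make it fairly obvious whether the complaints went away.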
What I did not expect was the weird interaction between a Linux kfd kernel, hardware IOMMU support, and an overclocked CPU.
I noticed odd behavior from y-cruncher and some other programs after enabling IOMMU under kernel 3.19. They became very slow. Here's a breakdown of what I recorded when running y-cruncher 512m:
Code:
CPU clock   IOMMU (3.19)   No IOMMU (3.19)   IOMMU (3.16)
3.4 GHz     ~330s          ~330s             ~330s
4.5 GHz     ~330s          ~217s             ~217s
4.7 GHz     ~330s          ~212s             ~212s
(All tests were run at NB 2100 MHz, 16 GB DDR3-2400, 384-shader iGPU at 1028 MHz on an Asus A88X-Pro. The graphics driver is the generic Radeon driver, since fglrx is broken in 3.19.)
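To put a number on the regression, the table can be boiled down to a slowdown factor per clock setting. This is just a quick Python sketch using the approximate times from the table; the dictionary keys and the function name are mine, not anything from y-cruncher.

```python
# Approximate y-cruncher 512m times in seconds, copied from the table above.
# Keys are the CPU clock in GHz; the column names are made up for this sketch.
TIMES = {
    3.4: {"iommu_3.19": 330, "no_iommu_3.19": 330, "iommu_3.16": 330},
    4.5: {"iommu_3.19": 330, "no_iommu_3.19": 217, "iommu_3.16": 217},
    4.7: {"iommu_3.19": 330, "no_iommu_3.19": 212, "iommu_3.16": 212},
}


def slowdown(ghz):
    """How many times longer the run takes with IOMMU on under 3.19."""
    row = TIMES[ghz]
    return row["iommu_3.19"] / row["no_iommu_3.19"]


if __name__ == "__main__":
    for ghz in sorted(TIMES):
        print(f"{ghz} GHz: {slowdown(ghz):.2f}x")
```

At 3.4 GHz the factor is 1.0 (no difference), while at 4.5 GHz and 4.7 GHz the IOMMU-enabled runs take roughly one and a half times as long.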
Note that the times here did vary somewhat (I did multiple runs). Runs that are listed as identical above were within +/- 3% of each other. For example, some of the 330s runs were in the range of 336-338s. I was very careful to make sure that all cores were running at the intended clockspeed. Not only did I use cpufreq to lock the CPU at maximum frequency at all times, but I monitored current clockspeed using a Conky widget, and I used cpufreq-info to look at frequency history for the present boot. At no point did I observe any throttling of the processor.
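For anyone who wants to repeat the clockspeed check without Conky, here is a hedged Python sketch that reads the per-core frequency the kernel reports through the cpufreq sysfs interface (scaling_cur_freq, in kHz) and flags cores running below a target. The function names and the 3% margin are my own choices for illustration. Whether the sysfs value reflects a UEFI overclock depends on the cpufreq driver, so treat it as a sanity check, not proof.

```python
from pathlib import Path


def read_core_freqs_mhz(sysfs_root="/sys/devices/system/cpu"):
    """Map each core name (e.g. "cpu0") to its reported frequency in MHz."""
    freqs = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_cur_freq"
    for freq_file in sorted(Path(sysfs_root).glob(pattern)):
        core = freq_file.parent.parent.name
        # The kernel reports kHz; convert to MHz.
        freqs[core] = int(freq_file.read_text().strip()) / 1000.0
    return freqs


def cores_below(freqs_mhz, target_mhz, tol=0.03):
    """List cores reporting more than `tol` below the target clock."""
    floor = target_mhz * (1 - tol)
    return sorted(core for core, mhz in freqs_mhz.items() if mhz < floor)


if __name__ == "__main__":
    current = read_core_freqs_mhz()
    print(current)
    print("below target:", cores_below(current, target_mhz=4500))
```

Running this in a loop during a benchmark gives a crude throttling log; an empty "below target" list on every sample is consistent with what I saw.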
As a brief test, I booted to Win10 @ 4.5 GHz with the other above settings and IOMMU enabled in the UEFI and ran y-cruncher there, and got a time in the high 220s, which is about what I expect from Win10 (it runs y-cruncher slower than Linux on this box).
The implications of this "issue" are not especially positive. Assuming the HSA software stack is streamlined to the point that it is all integrated into the OS and/or graphics drivers, Kaveri users will have a choice: activate hardware IOMMU support and render CPU overclocking completely inoperable, or disable hardware IOMMU support and break HSA compliance (making HSA-capable software not work). The issue presently seems to be that creating a kfd device requires the CPU to stay within spec or something. It is possible that it's a software issue that can be ironed out in the future . . . if anyone really cares to do so.