It's additional to the game code. It might be less work in total than DX11, and spread over more threads, but instead of having one core run only the main game code thread, that core now also has to run some of the API/driver work.
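This is a rough way to picture the point above. Assume a simplified model where the CPU side of a frame can't finish before its busiest thread does. In that model, frame time tracks the *maximum* per-thread load, not the *total*. The figures here are made-up milliseconds (hypothetical, not measurements of any real game or driver) chosen to match the scenario described: total work goes down, but the main thread picks up extra API/driver work.

```python
# Hypothetical per-thread CPU cost per frame, in milliseconds.
# Illustrative numbers only; not measured from any real game, API, or driver.

def frame_time_ms(per_thread_ms):
    """Simplified model: the frame can't finish before its busiest
    thread does, so CPU frame time is the maximum per-thread load."""
    return max(per_thread_ms.values())

def total_work_ms(per_thread_ms):
    """Sum of CPU work across all threads."""
    return sum(per_thread_ms.values())

# "Before": the main thread runs only game code; API/driver work sits elsewhere.
before = {"main_game": 12.0, "api_driver": 8.0, "worker": 2.0}

# "After": less API/driver work overall and spread out,
# but part of it now runs on the main game thread.
after = {"main_game_plus_api": 14.0, "worker1": 2.5, "worker2": 2.5}

for name, load in (("before", before), ("after", after)):
    print(f"{name}: total {total_work_ms(load)} ms, frame {frame_time_ms(load)} ms")
```

With these numbers, total work drops from 22 ms to 19 ms, yet the frame goes from 12 ms to 14 ms, because the busiest thread got busier.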
What?
It's less total work and it's more evenly distributed over all the cores, yet it's somehow slower??
What is your technical background?