The problem is dividing a single task so that it can use multiple cores. It is a software design problem. For example, how do you speed up something as small as a single XOR instruction? Our programming languages were all designed around a single thread of execution, with the idea that if we wanted a program to run faster we would just get a faster CPU to run each instruction quicker. Parallel programming, something I worked on a ton in the late 1990s, requires programmers to rethink software design. I can write SMP code in assembly that uses all available cores, but when I move to C or C++ the efficiency drops quite a bit because I no longer have direct control over the instructions that are actually executed. Compilers are getting better and the tools are improving, but we have a long way to go.
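To make the "divide a single task" point concrete, here is a minimal C++ sketch, assuming the work is XOR-ing together a large array of values. A single XOR cannot be split up, but a long run of them can: each thread folds its own slice, and the partial results are combined at the end. The function name `parallel_xor` and the chunking scheme are my own illustration, not anything from a particular library, and this is only one of many ways to do it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: XOR-reduce `data` using `num_threads` worker threads.
// Each worker folds one contiguous chunk; the partial results are combined serially.
std::uint64_t parallel_xor(const std::vector<std::uint64_t>& data, unsigned num_threads) {
    if (num_threads == 0) num_threads = 1;  // hardware_concurrency() may report 0

    const std::size_t n = data.size();
    const std::size_t chunk = (n + num_threads - 1) / num_threads;  // ceiling division

    std::vector<std::uint64_t> partial(num_threads, 0);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, static_cast<std::size_t>(t) * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        assert(begin <= end && end <= n);  // each chunk stays inside the input

        workers.emplace_back([&data, &partial, t, begin, end] {
            std::uint64_t acc = 0;  // accumulate locally, write the shared slot once
            for (std::size_t i = begin; i < end; ++i) acc ^= data[i];
            partial[t] = acc;
        });
    }
    for (auto& w : workers) w.join();

    // XOR is associative and commutative, so combining the partials in any order is safe.
    std::uint64_t result = 0;
    for (std::uint64_t p : partial) result ^= p;
    return result;
}
```

A caller would typically pass `std::thread::hardware_concurrency()` as the thread count. Whether this beats a plain loop depends on the array size, since thread start-up and memory bandwidth can easily outweigh the cost of the XORs themselves, which is part of why the design work falls on the programmer.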