But the basic idea of the OP is incorrect. This stuff cannot be done in hardware until the hardware can understand the semantics of the problem statement that the code solves for, and come up with an alternate, data-parallel solution. Basically, sci-fi stuff at this point.
Ok, so right now it cannot be done. Thing is, pretty soon we are going to reach a point where shrinking nodes is no longer practical. Maybe we get down to 7nm, but anything smaller and faster than that assumes that "we will soon make new discoveries and invent new technologies", which we also assume will let us keep increasing the processing power of a CPU core forever. But that's a "maybe". Or even a "possibly", but not necessarily true.
So when planning ahead, perhaps moving both hardware *and* software towards abstract multithreading is the key to the future.
That means writing code which helps "the hardware .. understand the semantics of the problem", and also having hardware built specifically for that code.
Of course I'm just imagining this, but I don't see any other way we can keep increasing PC performance without resorting to it.
So we'd have to devise code which doesn't exist yet, for a machine which can't work without it.
I mean, come on, this is AT; it shouldn't take you guys more than a month. Do it for science!