CPUs will be relegated to low power, low cost and that the future is really in software and the user experience.
The first part hasn't happened yet. I'm not sure if it will come to pass or not, TBH. Hardware in general won't be relegated that way, but the CPU very well could, as other bottlenecks become more important (even now, "the CPU" encompasses a great deal of networking and storage handling, for instance). The second part has been under way for years, and rightfully so: we've needed it.
The Von Neumann architecture has been exhausted and that more exotic architectures such as neural networks will take its place.
Only if you drink the Kool-Aid that multiple Von Neumann machines strung together are no longer based on a Von Neumann architecture, or that virtual memory breaks the VN architecture, or caches do, etc.
While different from the basic architecture described in the '40s, they are all derived directly from it and share its basic workings. To make use of something that isn't, code written for such a CPU would need some way of being validated by means other than instruction type, instruction order, register allocation order, and memory access location and order. That would be difficult and time-consuming, to the point that nobody wants to do it, except in very low cost/space/power embedded situations, where the code base is small and targeted to a specific application.
If you only count a pure Von Neumann machine as Von Neumann, we've already long since ditched it, rather than it being some future thing. More importantly, forget the "novel CPU" circle jerking, and look at memory. Memory is what makes us slow today, from SRAM caches to DRAM caches to inches-away DIMMs to feet-away disks to milliseconds-away servers with the info you need right now.
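To put a rough number on that, here's a minimal sketch in C++ (the sizes and iteration counts are arbitrary examples I picked, and the exact numbers depend entirely on the machine): chase pointers through a working set that fits in cache, then through one that has to live in DRAM, and compare the per-load latency.

    // Minimal sketch: pointer-chasing load latency for a small (cache-resident)
    // vs. a large (DRAM-resident) working set. Sizes are arbitrary examples.
    #include <chrono>
    #include <cstdio>
    #include <numeric>
    #include <random>
    #include <utility>
    #include <vector>

    static double chase_ns(std::size_t n, std::size_t steps) {
        // Sattolo's algorithm: a random permutation that forms a single cycle,
        // so the loads are one long dependent chain the prefetcher can't help with.
        std::vector<std::size_t> next(n);
        std::iota(next.begin(), next.end(), 0);
        std::mt19937_64 rng{42};
        for (std::size_t k = n - 1; k > 0; --k) {
            std::uniform_int_distribution<std::size_t> pick(0, k - 1);
            std::swap(next[k], next[pick(rng)]);
        }

        std::size_t i = 0;
        auto t0 = std::chrono::steady_clock::now();
        for (std::size_t s = 0; s < steps; ++s)
            i = next[i];                    // each load depends on the previous one
        auto t1 = std::chrono::steady_clock::now();
        volatile std::size_t sink = i;      // keep the loop from being optimized away
        (void)sink;
        return std::chrono::duration<double, std::nano>(t1 - t0).count() / steps;
    }

    int main() {
        std::printf("~32 KB working set:  %.1f ns/load\n", chase_ns(4 * 1024, 10000000));
        std::printf("~256 MB working set: %.1f ns/load\n", chase_ns(32 * 1024 * 1024, 10000000));
    }

On typical hardware the second number comes out roughly an order of magnitude or two worse than the first, and that gap is what the whole cache/DIMM/disk layering exists to hide.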
Other architecture types that truly aren't Von Neumann-based are, and will remain, used for special tasks, where they can be thousands of times faster at a fraction of the cost...but are a PITA to develop for, so they remain niche. A lot of them are cool, and it would probably be much more enjoyable to work on one--and its supporting software stack--than on a general purpose CPU, but they aren't going to become general purpose CPUs.
We are in the dark ages of parallelism and that highly parallel, many core CPUs will come after compiler breakthroughs.
Nope. We are in the Renaissance age of parallelism. Those CPUs exist, though they aren't common, and programming languages are catching up. You can tinker with Erlang, Haskell, OCaml, etc., and see the future, but you have to wait for that future stuff to work with C, C++, C#, etc., before it makes sense to put it into production.
The compiler crap has been a silly myth ever since high-level languages came into being, and it will stay that way.
The work needs to be described to the compiler in a way that facilitates thread-level parallelism. Compilers that can handle the work after that already exist, and have for years (not decades, but maybe up to 15 years, depending on how you define it), but programming languages that you can reasonably use with existing software bases are still works in progress. C and C++, for instance, have so many tricky little rules that you can't just write code that has no logical dependencies and expect a miracle--it takes effort and experience, on top of knowing what you're trying to accomplish, performance-wise, without premature optimization. Java and C# are basically impossible to add any significant parallelization to without being explicit. Magical compilers that will auto-parallelize code written sequentially have been, are, and will remain a fantasy, and common programming languages aren't well suited to writing anything but sequential code. The compiler to cure all serializing ills is like cold fusion--it might be possible, but don't expect to see it in your lifetime.
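To make "being explicit" concrete, a minimal C++11 sketch (the function name and sizes are made up for illustration): the programmer decides how the work splits and has to know the chunks really are independent; no compiler figured any of that out.

    // Minimal sketch of explicit thread-level parallelism: the split into
    // independent chunks is the programmer's job, not the compiler's.
    #include <cstdio>
    #include <future>
    #include <numeric>
    #include <vector>

    double parallel_sum(const std::vector<double>& v, unsigned nthreads) {
        std::vector<std::future<double>> parts;
        const std::size_t chunk = v.size() / nthreads;
        for (unsigned t = 0; t < nthreads; ++t) {
            auto begin = v.begin() + t * chunk;
            auto end   = (t + 1 == nthreads) ? v.end() : begin + chunk;
            // Each task touches only its own slice: no data races, no logical
            // dependencies between tasks, so they can run on separate cores.
            parts.push_back(std::async(std::launch::async,
                [begin, end] { return std::accumulate(begin, end, 0.0); }));
        }
        double total = 0.0;
        for (auto& p : parts) total += p.get();
        return total;
    }

    int main() {
        std::vector<double> v(1 << 24, 1.0);   // arbitrary example size
        std::printf("%.0f\n", parallel_sum(v, 4));
    }

Nothing in the language stops the tasks from quietly sharing mutable state, either; knowing that they don't is entirely on the programmer.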
Also, as long as memory is the bottleneck that it is, and power efficiency isn't improved by leaps and bounds, predication will suck as an option for low-parallelism algorithms, just as it has in the past. However, if new memory technologies can allow many random accesses over just a few wires, then game on (Google "wish joins" for a couple of papers describing an efficient way to add this sort of thing, building on speculative features already common in high-performance processors).
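For anyone who hasn't run into the term: predication means always doing the work and selecting the result with a mask, instead of branching around it. A toy C++ illustration of the trade-off (my own example, nothing to do with the wish-joins papers themselves):

    // Toy illustration of branchy vs. predicated (branchless) code. Which wins
    // depends on branch predictability and on the cost of always doing the work.
    #include <cstdint>
    #include <vector>

    // Branchy: skips the add when the condition is false, but pays a
    // misprediction penalty whenever the branch is hard to predict.
    int64_t sum_over_threshold_branchy(const std::vector<int>& v, int threshold) {
        int64_t sum = 0;
        for (int x : v)
            if (x > threshold) sum += x;
        return sum;
    }

    // Predicated: no branch to mispredict, but every element is processed,
    // so the "work always happens" cost is paid on every iteration.
    int64_t sum_over_threshold_predicated(const std::vector<int>& v, int threshold) {
        int64_t sum = 0;
        for (int x : v) {
            int64_t keep = -static_cast<int64_t>(x > threshold);  // 0 or all-ones mask
            sum += keep & x;
        }
        return sum;
    }

When the work behind the condition includes memory loads, the predicated version drags all of them in whether they were needed or not, which is roughly where the memory-bottleneck complaint above bites.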
Heterogeneous CPU/GPU architecture will take over.
Already happening. Due to the above programming language related issues, it's not happening super fast, but it is happening. The early CPUs/SoCs are already in consumers' hands (in many cases, literally), popular applications use the GPU for more than just 3D, and proper combination of support features is being worked on by various means, including HSA.
Analog computers will make a comeback
Doubtful, but it would be interesting. Now, we're getting things like SDR, showing that simple digital processors are fast enough to gradually replace fixed-function digital/analog combo units (they're still quite specialized, but not to the degree they were, and have allowed for things like firmware updates to draft-spec hardware once the final spec comes out, firmware updates to meet some random country's new regulations, etc.). Processing bit streams can be done plenty fast, if there's never a question of which bits need to go where and in what order.
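A rough illustration of the "fixed data flow" point, with placeholder coefficients rather than anything resembling a real radio filter: every sample is touched in a known order with no data-dependent control flow, which is exactly the kind of loop a modest CPU or DSP (or its SIMD unit) chews through quickly.

    // Rough illustration only: fixed data flow, no data-dependent branching.
    // The taps are placeholders (a toy moving average), not a real filter design.
    #include <array>
    #include <vector>

    std::vector<float> fir_filter(const std::vector<float>& in) {
        const std::array<float, 4> taps = {0.25f, 0.25f, 0.25f, 0.25f};
        std::vector<float> out(in.size(), 0.0f);
        for (std::size_t n = 0; n < in.size(); ++n)
            for (std::size_t k = 0; k < taps.size() && k <= n; ++k)
                out[n] += taps[k] * in[n - k];   // same access pattern every sample
        return out;
    }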
While I realize there currently isn't a need for more performance in conventional computing for most average users
They are insulated from it, but if you get outside the world that marketing has made, it's still there, in a very different form than 10+ years ago. They don't care what went on to make their new iPad/Nexus/Note faster; they just know it is, and that was worth upgrading for. The work of making mobile computers better, and networked infrastructures function well enough for average users, is nowhere near complete, and has been a major paradigm shift. Mobile is becoming conventional. Performance in terms of raw operations per unit time in a single thread, though, has hit a wall due to memory, and exceeds the needs of most users when power consumption is not an issue.
Since computing requirements won't stay constant, what do you think future CPU architectures will be like?
GPGPU and/or GPDSP (such as Hexagon), with programming languages abstracting them, will finally get enough support to see widespread use, leading to better implementations of them and wider adoption of varied competing designs. Along with this, more and more R&D money will go into on-chip and chip-to-chip networking, which is more of an unknown.
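One concrete flavor of that abstraction, as a sketch rather than anything tied to Hexagon or a particular vendor: C++'s later parallel algorithms let the caller state that an operation is parallelizable and leave the mapping to hardware to the implementation (some vendor toolchains can even offload these to a GPU).

    // Sketch of the "language abstracts the parallel hardware" direction, using
    // C++17 parallel algorithms: the caller states the work is parallelizable,
    // and the toolchain decides how to map it onto cores (or, with some vendor
    // compilers, onto a GPU).
    #include <algorithm>
    #include <cmath>
    #include <execution>
    #include <vector>

    void scale_and_offset(std::vector<float>& data, float a, float b) {
        std::transform(std::execution::par_unseq, data.begin(), data.end(),
                       data.begin(),
                       [a, b](float x) { return std::fma(a, x, b); });  // a*x + b per element
    }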
A simple 2D grid like Tilera has should be fine for real-time applications and basic networking tasks, but it's not going to be usable for general-purpose work. OTOH, a 3D mesh like Fujitsu has (6D by their marketing) is going to scale too poorly to be used anywhere but expensive clusters (which is fine for Fujitsu, since that's what it's made for), unless 3D transistor layouts can be made cheaply, allowing layers of connected 2D grids. More flexible topologies, OTOH, like what AMD has allowed with HT, or what proprietary networking vendors have added running over PCIe, require a lot of software effort to use efficiently, so they aren't good general fits, on top of being expensive add-ons. There's a lot to do here: a lot of CPU time is spent waiting to do something, and nobody has it truly figured out yet.
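For a back-of-envelope feel for the topology trade-off (nothing here is specific to Tilera's or Fujitsu's actual interconnects): the worst-case hop count for an N-node k-dimensional mesh is roughly k*(N^(1/k) - 1), while each interior node needs 2k links, so going from 2D to 3D buys shorter paths at the price of more wiring per node.

    // Back-of-envelope: diameter (worst-case hops) and per-node link count for
    // an N-node k-dimensional mesh. Illustrative only; real interconnects differ.
    #include <cmath>
    #include <cstdio>

    int main() {
        const int node_counts[] = {64, 256, 1024, 4096};
        for (int n : node_counts) {
            for (int k : {2, 3}) {
                double side = std::pow(n, 1.0 / k);     // nodes per dimension
                double diameter = k * (side - 1.0);     // worst-case hop count
                std::printf("N=%4d  %dD mesh: ~%4.0f hops worst case, %d links per interior node\n",
                            n, k, diameter, 2 * k);
            }
        }
    }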