Is it possible to design a chip to be cooled from both sides? I know older AMD and Pentium CPUs used to not have any pins directly under the die (although they usually did have some electronics there). Would it be possible to design a chip with no pins directly under the die, cut out that part of the package to expose the die, and cool it with some sort of heatsink that clamps onto both sides of the chip?
Not easily, if at all. The "bottom" of the chip is where its pads are mated to the wires that go to the pins/pads for the socket (or to the solder ball grid, if BGA).
Some simple ICs allow exactly that, but of course they don't have many pins, and you solder directly to the die's pads. To cool a modern chip, you need a good bit of pressure on the CPU, so a cooler would have to act like a vise. Even if you moved the pins out of the way, and assuming the pads on the die are all along the edges (I think they are, but am not 100% sure), that would mean additional strain where those pads connect to the pads for the socket or PCB, which also need some pressure applied in the case of a socket, and either one-way pressure or no pressure in the case of BGA.
But, there's also not really a need to cool from both sides. The majority of non-mobile thermal problems come from the fact that, as chips shrink, power density increases. For instance, a shrink might get you a 50% smaller chip that uses 66% of the power; over 4 such shrinks, that works out to a density increase of about 3x. Having that much power to dissipate over 1-2 in^2 is a problem, when it used to be that, even though more power was consumed, it was spread out across the system. There is no easy solution to this problem if you need high-performance CPUs/GPUs (if you don't, "disaggregation" should be a very good one).
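A minimal back-of-the-envelope sketch of where that ~3x figure comes from, using the hypothetical per-shrink numbers above (50% area, 66% power); the scaling factors are illustrative, not measurements from any real process:

```c
#include <stdio.h>

/* Hypothetical per-shrink scaling, taken from the example above. */
#define AREA_SCALE  0.50  /* each shrink halves the die area   */
#define POWER_SCALE 0.66  /* each shrink cuts power to ~2/3    */

int main(void)
{
    double area = 1.0, power = 1.0;

    for (int shrink = 1; shrink <= 4; shrink++) {
        area  *= AREA_SCALE;
        power *= POWER_SCALE;
        /* Power density = power per unit area, relative to the start. */
        printf("after %d shrink(s): density = %.2fx\n",
               shrink, power / area);
    }
    /* Prints roughly 1.32x, 1.74x, 2.30x, 3.04x -- about 3x after 4 shrinks. */
    return 0;
}
```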
My issue with ARM: none of their consumer solutions even come close to a Core i7 in both single-threaded and multi-threaded performance, let alone when you factor in overclocking.
So, your main problem with ARM is that they and their partners have been unable to do what AMD and IBM also haven't been able to do, and that everyone else has just given up on (such as NEC and Fujitsu)?
Nobody has yet been able to make a CPU that can approach Sandy Bridge single-threaded performance, much less Ivy, Haswell, and beyond.
Nobody. Intel is alone, up at the top.
It has little or nothing to do with ARM vs. x86, and everything to do with money, talent, skill, and demand. Intel has among the best engineers, obviously good management, and boatloads of R&D money. And there's no demand for super-fast ARM CPUs. Faster than last year's, yes, but not for speed like our desktops have. Windows runs on x86, and 64-bit Linux will need a while to get stable and reliable on ARM. Now, if the manifold ARM server thing works out, and 64-bit ARM Linux gets stable and fast, then the door will be wide open for speedy ARM designs, by anyone who can afford to try to make them.
Runs for 10 hours on battery, fits in my pocket.
...
They're not hard to get, but you often need to clean them up first. My Discover couldn't manage that with heavy usage until after I rooted it and disabled all the carrier-added crapware. After doing that, and trying some different mail clients (Aquamail is what I settled on, since I could limit which folders get pushed, and the UI is decent), it has no problem running all day and ending at over 50% battery, except when stuck in low-signal areas (with light usage, >90% battery).
As to the main topic:
Wanted:
1. An ABI for shared-memory heterogeneous computing (in the works already for x86 and ARM).
2. ECC required.
3. NUMA support baked in (i.e., a standard way to query the topology, and to allow software to measure its effects, and/or for hardware to do that and report it); see the sketch after this list.
4. A common DSP IR, guaranteed to be supported by whatever DSPs are actually implemented (including just using the main CPU). The DSP having its own fancy vendor-only features is fine, so long as it can be generically programmed as well. As new features become common, they should be standardized in said IR spec, even if it means forcing the slowpoke designers to add those features (IoW, make standards bodies like Khronos grow some balls and tell the legacy-whiners to modernize).
5. A common GPGPU-like vector IR, as above.
6. Partitioning virtualization support baked in and required.
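Regarding item 3 above, a minimal sketch of the kind of topology query that ought to be a standard part of the platform. It assumes Linux's libnuma (build with -lnuma), which is just one existing interface of this sort, not the baked-in standard being proposed:

```c
#include <numa.h>   /* libnuma: link with -lnuma */
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not available on this system\n");
        return 1;
    }

    int nodes = numa_max_node() + 1;
    printf("%d NUMA node(s)\n", nodes);

    /* Relative access cost between every pair of nodes; 10 means local. */
    for (int from = 0; from < nodes; from++)
        for (int to = 0; to < nodes; to++)
            printf("node %d -> node %d: distance %d\n",
                   from, to, numa_distance(from, to));

    return 0;
}
```

Software could use something like this to place threads and allocations near their data, or to measure how much the distances actually cost on a given box.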
Not wanted:
1. Microkernel OS. Only a handful of good ones have ever existed, and QNX is the only one that has survived well. The OS kernel is an incestuous thing, needing to link different subsystems together in ways that basic messaging makes more difficult and slower than just having it all in the same memory space. Leave messaging to user-space applications, where it belongs.
2. Legacy API specs, and language support requirements. How many years until the OP's list is old? Just let that go as it will. If vendors can't agree on their shit, that's too bad, but it's not a problem for a platform spec to solve, unless that platform spec is from a software middleman, such as Google, Apple, or MS. That kind of thinking was moved away from in the 80s, and that was a very good thing.
Even with that said, Python and HTML5 for the kernel or base system? The kernel needs to be small, fast, scalable, and secure. It also needs to do absolutely zero work where HTML technologies of any kind would be useful. As for replacing the likes of BASH and Perl, Python and PHP have been used for that in the past, and are actually OK at it, but there's no really compelling need; HTML and CSS would be 100% pointless, and Javascript would be in no way helpful (also, systemd is keeping maintainers busy these days, anyway).
3. Storage: software went to SCSI semantics ages ago, both in *n*x and Windows, regardless of what's actually targeted. Any hardware that works should be allowed, because the user needs or wants it.
4. Tiered RAM: there is no magic pixie dust. If somebody could make RAM faster without it being much more expensive and/or higher latency, they would. They do their best as new standards come about. While not including extra bits per chip for ECC makes me go grrr, issues of bandwidth and latency do matter to JEDEC members, and if you do some research into the history and reasoning behind what they choose, you'll find they generally do a good job of it, save for the RDRAM fiasco.
5. File system. Kind of like with the API and language specs. ZFS is now a solid file system, with age and cruft, and many "we could have done that better" feature implementations. It also largely ignores single-disk setups, and is a RAM hog, having been made by Unix server guys for Unix servers. It's good, yes, but it's the present, not the future. The future will have more aggressive CoW, will tend to be logging, and/or will implement log-like transaction stores within extents, now that much of the bad stuff has been figured out for LSFSes and block-level CoW, and should eventually provide some resiliency for single-drive systems.