Intel already tried that with Knights Landing, a big compute CPU made from Atom-class cores. It was both unpopular and not particularly performant.
That was Intel’s kludge to make a CPU behave like a GPU; they gave up on that and made a real GPU. I am talking about the opposite. Knights Landing had super simple cores with big vector FP units. Atom cores are very weak, but you can fit a lot more ARM cores per unit area than big AMD64 cores, so a smaller core is necessary even with the die size advantages from MCMs. Wide vector FP units take a lot of die area, and the interconnect to support them burns a lot of power. A lot of server applications do not really need much floating point performance; a tiny scalar FPU or a very limited vector unit would be sufficient.

I don’t know how well higher levels of threading (like 4 or 8 threads per core) would compare in power and die size to just a bunch of smaller cores. There have been some specialized throughput processors that supported a lot of threads per core, with limited success: Cray MTA, the SPARC T and M series, IBM POWER processors. It is a way of sharing hardware, though. They may want to keep support for the same instruction set, but they could emulate wider FP units with less hardware, or swap the thread to a bigger core when it hits an unsupported instruction. I don’t know if they would essentially try to share an FPU the way Excavator cores did.
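As a rough way to frame the threads-versus-cores question, here is a back-of-envelope sketch in Python. All of the area and per-thread performance numbers are made-up illustrative assumptions, not measured figures for any real core; the point is just how the throughput-per-area comparison works.

```python
# Back-of-envelope throughput-per-area sketch. Every number here is a
# hypothetical assumption for illustration, not data for any real core.

configs = {
    # name: (area_mm2, threads_per_core, relative_perf_per_thread)
    "big core, 2-way SMT": (4.0, 2, 1.00),
    "big core, 8-way SMT": (4.6, 8, 0.45),  # assumes ~15% extra area for thread state
    "small core, no SMT":  (1.0, 1, 0.40),
}

budget_mm2 = 64.0  # hypothetical CCD-sized core-area budget

for name, (area, threads, perf) in configs.items():
    cores = int(budget_mm2 // area)          # how many cores fit in the budget
    total = cores * threads * perf           # aggregate relative throughput
    print(f"{name}: {cores} cores, {cores * threads} threads, "
          f"relative throughput {total:.1f}")
```

With these assumed numbers the heavily threaded big core comes out ahead on aggregate throughput per area, but a different area or per-thread performance guess flips the result, which is exactly why the power and die size comparison is hard to call without real figures.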
There are a lot of different options for supporting throughput-oriented computing, but AMD is almost certainly working on small, low-power versions of their cores for mobile and other applications anyway. It is a requirement to stay competitive. Both AMD and Intel are going to have a hard time competing with Apple in the mobile space. Apple is probably at least a generation ahead on CPU design, although some of their performance comes from their control over the whole system. I have an old Apple laptop, but I don’t want to be pushed into the Apple walled garden for my laptop, so I am hoping to see some much better mobile solutions from AMD. Apple is definitely trying to push more users into the iPhone/iPad space; the locked-down iPad Pro type devices are deliberately much more attractive than the lightweight laptops. I am tempted by the iPad Pro mini-LED display, though.

Other ARM server CPU designs are already close to AMD processors and a bit ahead of Intel. AMD64 makers would be in even more trouble if Apple actually made servers. Apple is supposedly going to make a 40-core Mac Pro, so they could make their own servers easily; they just don’t seem to want to be in that market even though they would probably have a significant performance-per-watt advantage. Although, once you support massive amounts of IO, the core power probably becomes less significant.
An AMD low power version probably isn’t going to have multiple AVX-512 units, although a version with such units might be useful for some applications, like game consoles and perhaps some laptops or mobile gaming devices. It is generally better to use the GPU cores when possible rather than try to make such streaming applications perform well on the CPU. The cores used in heterogeneous compute solutions will certainly be more powerful than Atom cores.

They could make different versions with different big:little core ratios: almost all big cores for HPC, some mix for general use, and a high core count version that is mostly little cores. I don’t know if they would do that via different versions of the base die and/or different stacked die. I could see them eventually stacking two layers of CPU cores, which would allow for a range of products. Perhaps three different die: one all or mostly big cores, one all little cores, and one mixed. You could make a lot of different products if you could stack just two of those. That would also allow them to use different process tech for different cores, like a high-performance process for the big cores and a density/power-optimized process for the small cores. I wonder how many small cores would fit in a CCD-sized area. They are working on cooling solutions to deal with stacked chips, but that is probably very expensive. A layer of ultra-low-power cores might be doable without exotic cooling though, especially for server applications with lower clocks and higher parallelism.
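To make the “just two of those” point concrete, counting the distinct two-die stacks is simple combinatorics. A minimal sketch, where the three die labels are just hypothetical names for the variants described above and I assume order within the stack does not matter:

```python
from itertools import combinations_with_replacement

# Hypothetical base die variants from the discussion above.
die_types = ["mostly_big", "all_little", "mixed"]

# Distinct products from stacking exactly two die, ignoring stack order.
stacks = list(combinations_with_replacement(die_types, 2))
for s in stacks:
    print(s)
print(len(stacks), "distinct two-die products")
```

That gives six distinct stacked products from only three base die, plus the three single-die options if unstacked parts were also sold.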