Baseless speculation time.
N31 is 2 GCDs of around 7,500 shaders each for a 15k total. It also has some MCDs for cache, and there are rumours of an IO die as well, due to the talk of an odd number of chiplets.
How would you fit this together?
Well, first let's make an assumption: the GCDs will go on top because of heat. Pretty simple. MLID said the same thing, and it's pretty obvious when you look at the temps of the 5800X3D despite its lower clocks and voltages.
The next assumption we need is a die size estimate for the GCD. Each GCD has about 50% more shaders than N21 but no real IO or cache, so there is nothing to compare directly, but we can put our fingers in the air and pull numbers out of our bottom, so let's do that. N21 is about 520mm^2, of which about 110mm^2 is IO and probably about the same again is cache, leaving 300mm^2 for shaders and other stuff. 50% more shaders bumps that to 450mm^2, which with N5 we can halve, if AMD's numbers are accurate, to 225mm^2. Let's add 10% to account for improvements to ray tracing, larger low-level caches and other changes, so call it a round 250mm^2 per GCD.
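To make the finger-in-the-air maths easy to re-run with different guesses, here is a minimal Python sketch of the estimate above. Every input is one of the guesses from this post (the N21 split, the 50% shader bump, the halving on N5, the 10% extra), and the function and parameter names are made up purely for illustration.

```python
# Back-of-envelope GCD area estimate. All inputs are guesses from the post,
# not measured or official figures.
N21_TOTAL_MM2 = 520.0   # approximate N21 die size
N21_IO_MM2 = 110.0      # guessed IO area on N21
N21_CACHE_MM2 = 110.0   # guessed cache area on N21


def estimate_gcd_area(shader_scale=1.5, n5_area_scale=0.5, extras=0.10):
    """Return a rough GCD area in mm^2 under the post's assumptions."""
    # What is left of N21 once IO and cache are taken out.
    shader_area_n7 = N21_TOTAL_MM2 - N21_IO_MM2 - N21_CACHE_MM2
    # Scale up for the extra shaders per GCD.
    scaled_n7 = shader_area_n7 * shader_scale
    # Shrink for the move to N5 (assumes area roughly halves).
    area_n5 = scaled_n7 * n5_area_scale
    # Pad for ray tracing improvements, bigger low-level caches and so on.
    return area_n5 * (1.0 + extras)


if __name__ == "__main__":
    print(f"Estimated GCD area: {estimate_gcd_area():.1f} mm^2")
    # A more pessimistic shrink, e.g. only 40% area saving on N5:
    print(f"With a 0.6x shrink: {estimate_gcd_area(n5_area_scale=0.6):.1f} mm^2")
```

With the default guesses this lands at 247.5mm^2, which is where the round 250mm^2 comes from. The shrink factor is the input that moves the result the most.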
This means you have 500mm^2 of silicon on top, so you need around 500mm^2 of silicon under it if you are doing 3D stacking. Going by the N21 estimate, 256-bit GDDR6 + x16 PCIe 4 on N7 is about 110mm^2, so let's just assume the same for N31: some savings from N6 and some losses from PCIe 5, so call those a wash as a guess. That leaves 390mm^2 under the 2 GCDs for cache. We know AMD can fit 64MB in 36mm^2 on N7/N6. (Do we actually know what node that cache is made on, or do we just assume N7? I don't believe there have been any official announcements.) In any case, that means you could easily fit 10 such cache dies and the IO die under the 500mm^2 of real estate the GCDs provide. Another option is that AMD could just use the same 64MB cache dies they use for Milan-X and Zen3D, provided that works for RDNA3. Heck, they could even use them for Zen 4 as well.
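The area budget under the GCDs can be sketched the same way. This is only an area sum: it assumes the 250mm^2 GCD guess, the 110mm^2 IO die guess and 64MB per 36mm^2 cache die, and it ignores die shapes, spacing between dies and anything else that real packaging would care about. The names are hypothetical.

```python
import math


def cache_budget(gcd_mm2=250.0, gcd_count=2, io_mm2=110.0,
                 cache_die_mm2=36.0, cache_die_mb=64):
    """How many cache dies fit under the GCDs by area alone."""
    top_area = gcd_mm2 * gcd_count          # silicon on top
    free_area = top_area - io_mm2           # what is left after the IO die
    max_dies = int(free_area // cache_die_mm2)
    return {
        "free_mm2": free_area,
        "max_dies": max_dies,
        "max_cache_mb": max_dies * cache_die_mb,
    }


def area_for_cache(target_mb, cache_die_mm2=36.0, cache_die_mb=64):
    """Number of cache dies and total area needed for a target capacity."""
    dies = math.ceil(target_mb / cache_die_mb)
    return dies, dies * cache_die_mm2


if __name__ == "__main__":
    budget = cache_budget()
    print(budget)
    dies, area = area_for_cache(512)
    print(f"512MB needs {dies} dies, {area:.0f} mm^2 "
          f"of the {budget['free_mm2']:.0f} mm^2 available")
```

Under these guesses there is room for 10 cache dies (640MB) by area, and 512MB needs 8 dies at 288mm^2, leaving about 100mm^2 of slack for spacing and whatever else has to live down there.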
That makes a 512MB cache part viable from a tech point of view: 512MB is 8 of those 64MB dies, about 288mm^2, so it fits with room to spare. I can also see a case for a 7970XT 8K halo part to top the stack. Give it 32GB of RAM, 512MB of cache and fully unlocked, top-binned dies and you have your $2,000 monster part. I assume AMD have ways to disable cache dies, so I expect that if a bond fails they can disable that individual die and re-use those parts as a 7900XT with 16GB of RAM, 256MB of cache and a full complement of shaders (or maybe not, for a bit more differentiation). There would also be space for a cut-down, actually affordable 7800XT part to capture partially defective GCDs.
So while I have no idea if MLID has legit sources or if his information is credible, from what we do know from more reliable leakers it seems 512MB would physically fit, and there seems to be a viable product opportunity in that space as well.
Another possibility with such a config is a single-GCD design with a 128-bit IO die and less cache. I expect it's not viable in the N33 segment right now due to the lack of 3D stacking throughput and the cost, but it might work as a laptop halo product, or it could happen in the future as a refresh to N33 or a gap-filler part once capacity increases.