Given the vast increase in compute work over almost everything else, maybe raising the SIMD-to-other-resource ratio makes sense. Compute is increasingly butting in on the geometry pipeline for a growing number of titles, you need a lot of it for raytracing, etc. Smooshing more into a compute unit could make that L0 thing more worthwhile as well, allowing for L0 reuse in a compact area. And if AMD wants to raise performance above double a 6900xt, they need better bandwidth usage or a 512bit bus. So overall I can see the argument for it, especially if the instruction overlap and L0 scheme can actually get a decent amount of work out of those extra SIMDs without a vast increase in silicon. Performance per SIMD would probably still go down, but performance per mm² would go up overall, as would perf per watt.
As a guess, based on that patch information (2 memory dies and 6 compute dies? for one chip), here's how the lineup might look:
Compute die: 32 simds. Memory die: 192bit bus.
384bit bus/192 simds/384mb cache/24gb@24gbps ram/$1500-2000
384bit bus/128 simds/192mb cache/12gb@18-20gbps ram/$800-$1200
192bit bus/64 simds/12gb@24gbps ram/$500-600
192bit bus/56 simds/12gb@18-20gbps ram/$359-459
APU(dgpu monolithic or other coming later?):
128bit bus/32 simds/8gb@18-20gbps ram/$249-329 (ram and bus only applicable to DGPU)
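To sanity-check the lineup above, here's a minimal sketch of the die-combination arithmetic. All the figures (32 SIMDs per compute die, a 192bit bus per memory die, and the die counts per SKU) are my guesses from the speculation above, not confirmed specs:

```python
# Speculative chiplet math: these per-die figures are assumptions
# from the guessed lineup, not anything AMD has confirmed.
SIMDS_PER_COMPUTE_DIE = 32
BUS_PER_MEMORY_DIE = 192  # bits

def config(compute_dies, memory_dies, simds_disabled=0):
    """Return (total SIMDs, total bus width in bits) for a combination
    of chiplets, optionally with some SIMDs fused off for salvage."""
    total_simds = compute_dies * SIMDS_PER_COMPUTE_DIE - simds_disabled
    bus_width = memory_dies * BUS_PER_MEMORY_DIE
    return total_simds, bus_width

# Top part: 6 compute + 2 memory dies
print(config(6, 2))        # (192, 384)
# Mid part: 4 compute + 2 memory dies
print(config(4, 2))        # (128, 384)
# Cut-down part: 2 compute + 1 memory die, 8 SIMDs fused off
print(config(2, 1, 8))     # (56, 192)
```

Note the 56-SIMD part falls out of the same 2-compute-die configuration as the 64-SIMD one, just with SIMDs disabled, which is why it wouldn't count as a separate GPU.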
Note the cut-down (56 simd) part isn't considered a "separate/new" gpu in some technical documentation, as it would have the same configuration of chips as the non cut-down one. Because moving to small chiplets would yield so many good dies versus salvaged ones, and because the configurations of non-salvaged dies might play out like the above, I'm not sure I can see many configurations using salvaged dies when a single gap-filling one might do for the highest-volume segment. There's already a dearth of salvaged AMD dies, like the mostly missing rx6800 non-xt, and it's only going to get worse with chiplets.