My speculation:
8K Micro-op/L0i cache
4x ALU-AGU pairs, each with a dual-ported unified scheduler (reduces scheduler and PRF networking; Zen has 6 schedulers and Zen2 has 5, just as examples)
4x 128-bit FMAC + single PRF array (more than 50%, or maybe 40%, smaller area at the same nominal perf and lower power)
TDP-limited versions of EPYC and Ryzen Mobile will potentially see a 1 GHz boost when 4x FMUL + 4x FADD occurs.
With that, a sub-10 W Zen3 Cezanne will potentially have the same or better clocks than a 15 W Zen2 Renoir.
Zen2 => 2x 256b FMAC arrays ported as 2x FMUL + 2x FADD(low 128-bit PRF) and 2x FMUL + 2x FADD(high 128-bit PRF)
Zen3 => 1x 512b FMAC array ported as 4x FMAC(128-bit PRF, tagged to 512-bit)
^-- Not sure on the logistics, but the FPU scheduler might be two-level: a 32?-entry four-ported scheduler that feeds into 4x ??-entry dual-ported schedulers, each tied to a FMUL+Bridge unit and a FADD unit.
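
A quick sanity check on the widths above, as a rough sketch: both layouts add up to the same 512 bits of FMAC width per clock, so the speculated change is about how the lanes are ported and scheduled, not about peak width. The names fp_layouts and total_bits are made up for illustration, and the Zen3 row is the speculated 4x 128-bit arrangement, not a confirmed figure.

```python
# Unit counts and widths come from the Zen2/Zen3 lines above.
# fp_layouts and total_bits are hypothetical names used only for this sketch.
fp_layouts = {
    "Zen2": {"units": 2, "bits_per_unit": 256},
    "Zen3 (speculated)": {"units": 4, "bits_per_unit": 128},
}


def total_bits(layout):
    """Total FMAC width per clock for one layout."""
    return layout["units"] * layout["bits_per_unit"]


totals = {name: total_bits(layout) for name, layout in fp_layouts.items()}

for name, bits in totals.items():
    print(f"{name}: {bits} bits of FMAC width per clock")
```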
Zen2 => 4x16 + 1x28 Schedulers w/ 4x ALU+3x AGU
Zen3 => 4x?? Schedulers that operate on dense fused micro-ops rather than individual micro-ops w/ 4x ALU+4x AGU
^-- Reduces the instruction-to-operation complexity of load-compute-store in Zen. Potentially comes with a bandwidth increase from 96B/clk in Zen2 to 128B/clk in Zen3.
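
The scheduler and bandwidth numbers above work out as follows. This is only back-of-the-envelope arithmetic: the 32 bytes per AGU per clock is an assumption that happens to reproduce the 96B/clk and 128B/clk figures from the 3x and 4x AGU counts, not something stated above, and the variable names are made up for illustration.

```python
# Zen2 scheduler layout from above: 4x 16-entry plus 1x 28-entry.
zen2_sched_entries = 4 * 16 + 1 * 28

# ASSUMPTION (not from the post): each AGU accounts for 32 bytes per clock.
BYTES_PER_AGU = 32

zen2_bw = 3 * BYTES_PER_AGU  # 3x AGU in Zen2
zen3_bw = 4 * BYTES_PER_AGU  # 4x AGU in speculated Zen3

# Relative bandwidth increase going from Zen2 to the speculated Zen3.
gain = zen3_bw / zen2_bw - 1

print(f"Zen2 integer/AGU scheduler entries: {zen2_sched_entries}")
print(f"Zen2: {zen2_bw}B/clk, Zen3 (speculated): {zen3_bw}B/clk")
print(f"Increase: {gain:.1%}")
```

If the per-AGU assumption holds, the jump to 128B/clk is just what a fourth AGU would buy, about a one-third increase.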