It's not a trivial problem. There are two voltage regulators in series, and both can affect voltage rise times. And it's well documented that AMD has for years used a scheme that regulates the core clock based on the actual delivered voltage; that scheme is essential for driving voltage margins down. The AMD LDO is a natural evolutionary step in that regulation scheme: after shaving the voltage margins for all cores, they also shaved them on a per-core basis.
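Just to make that idea concrete, here's a toy sketch of adaptive clocking (not AMD's actual implementation; the V/f curve and all numbers are invented):

```python
# Toy sketch of adaptive clocking: the core clock follows the voltage that
# actually arrives at the core, so the static margin reserved for supply
# droop can be much smaller. The V/f curve and numbers are invented.

def allowed_frequency_ghz(delivered_voltage: float) -> float:
    """Map the measured (delivered) voltage to a safe core clock.
    Assumes a simple linear V/f curve; real silicon uses calibrated
    per-part tables."""
    V_MIN, V_MAX = 0.70, 1.35   # usable voltage window (assumed)
    F_MIN, F_MAX = 1.5, 5.0     # matching clock range in GHz (assumed)
    v = min(max(delivered_voltage, V_MIN), V_MAX)
    return F_MIN + (F_MAX - F_MIN) * (v - V_MIN) / (V_MAX - V_MIN)

print(allowed_frequency_ghz(1.35))  # full speed at full delivered voltage
print(allowed_frequency_ghz(1.20))  # clock backs off during a droop
```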
Finally had time to at least briefly look through the patents posted. It looks like AMD is using a fairly typical dLDO at its core for the individual core voltages, but obviously with some twists/enhancements. Much of my post will probably be a recap of things already discussed, but just wanted to make sure we're on the same page.
As already pointed out, they are not using a single regulated output, for multiple reasons. They have the system-level VR, which is high efficiency, large area, and slow. The system-level VR regulates to the highest voltage needed by any core/block. From there, each individual core/block gets an additional regulated voltage through its dLDO. AMD calls this a distributed regulation scheme. The main purpose/innovation of the patents actually relates to this distributed regulation scheme rather than to the dLDO design itself. The scheme, as outlined, has several benefits, not least of which are area efficiency and the ability to scale extremely well, even to the point of providing effective VR for massive GPUs. The dLDO does suffer in efficiency when a large voltage drop is required. The worst-case scenario here is one core running at full boost, requiring a high voltage, while the other cores sit at idle, only requiring a low voltage.
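A minimal sketch of how I read that scheme, with made-up voltages and a made-up dropout figure:

```python
# Minimal sketch of the distributed regulation scheme as I read the
# patents: the system VR supplies one shared rail at the highest voltage
# any core needs, and each core's dLDO drops that rail to its own target.
# Voltages and the dropout value are illustrative.

core_targets = {"core0": 1.30, "core1": 0.70, "core2": 0.70, "core3": 0.70}

LDO_DROPOUT = 0.05  # assumed minimum headroom the dLDO needs to regulate

# The slow, efficient system VR tracks the most demanding core:
shared_rail = max(core_targets.values()) + LDO_DROPOUT

# Each dLDO then only has to absorb the difference for its own core:
per_core_drop = {c: shared_rail - v for c, v in core_targets.items()}
print(shared_rail)    # 1.35
print(per_core_drop)  # core0 drops ~0.05 V, the idle cores drop ~0.65 V
```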
The dLDOs supplying the idle cores will tank in efficiency because they have to provide a large voltage drop from the system-supplied voltage, which is forced high by the single boosting core. With that said, from a system-level view, the low efficiency of the dLDOs in this situation is not very impactful, because the current draw through those dLDOs is a small fraction of the system power consumption anyway. In other words, yes, you are getting bad efficiency from those dLDOs, but the power consumption of the idle cores/dLDOs is so low that you don't care much, and you gain far more efficiency by giving the idle cores their own low-voltage rails to begin with. The dLDOs supplying high-frequency/high-voltage cores should be very efficient as well, due to the low voltage drop required.
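To put rough numbers on it (idealized: an LDO's efficiency is roughly Vout/Vin since the drop is burned off as heat, and I'm ignoring quiescent current; all figures are invented):

```python
# Rough power accounting for the worst case described above: one core
# boosting, three idle. An ideal LDO's efficiency is ~Vout/Vin, ignoring
# quiescent current. All numbers are illustrative.

shared_rail = 1.35  # V, set by the boosting core

cores = [
    # (name, core voltage in V, core current in A)
    ("boost", 1.30, 30.0),
    ("idle0", 0.70, 0.5),
    ("idle1", 0.70, 0.5),
    ("idle2", 0.70, 0.5),
]

total_in = total_out = 0.0
for name, vout, amps in cores:
    p_in, p_out = shared_rail * amps, vout * amps
    total_in += p_in
    total_out += p_out
    print(f"{name}: eta = {p_out / p_in:.0%}, wasted = {p_in - p_out:.2f} W")

# The idle dLDOs run at ~52% efficiency, but their absolute loss (~0.3 W
# each) is tiny next to the boosting core, so the SOC-level number holds up:
print(f"system: eta = {total_out / total_in:.0%}")  # ~94%
```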
As for the dLDO itself, as I mentioned, it looks like AMD is using a fairly standard dLDO with some tweaks. For a quick recap, a dLDO works much like the analog LDO @PJVol posted a schematic diagram of. However, rather than an error amplifier, you have an ADC and a control signal block, and rather than a PFET used as a common-source amplifier, you have an array of PFET switches. The control signal block determines how many switches are turned on in parallel to produce the proper regulated voltage. AMD is, at least at a high level, using this same topology. They call their ADC a PSM and the control signal block a dLDO controller, but it works the same way.
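Here's a bare-bones sketch of that feedback loop, using simple bang-bang control and a crude resistor-divider plant model; none of this is from the patents, it's just to show the topology:

```python
# Bare-bones sketch of the dLDO feedback loop: the ADC (AMD's "PSM")
# samples the regulated output and the controller adjusts how many
# parallel PFET switches are enabled. Bang-bang control for simplicity;
# all values are invented.

def dldo_step(v_out: float, v_target: float, switches_on: int,
              total_switches: int) -> int:
    """One controller cycle: enable another switch if the output is low
    (less series resistance -> output rises), disable one if it's high."""
    if v_out < v_target and switches_on < total_switches:
        return switches_on + 1
    if v_out > v_target and switches_on > 0:
        return switches_on - 1
    return switches_on

def plant(v_in: float, switches_on: int, load_ohms: float) -> float:
    """Crude model: the enabled PFETs form a parallel resistance feeding
    the load, so more switches on means a higher output voltage."""
    if switches_on == 0:
        return 0.0
    r_switch = 2.0 / switches_on  # parallel PFETs, 2 ohm each (assumed)
    return v_in * load_ohms / (load_ohms + r_switch)

v_in, target = 1.35, 1.00
on, total = 0, 64
for _ in range(20):  # settles into dithering one LSB around the target
    on = dldo_step(plant(v_in, on, 1.0), target, on, total)
print(on, round(plant(v_in, on, 1.0), 3))
```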
There are two enhancements AMD has made to this design that I can see from the patents. The first is that, unlike in a standard dLDO, the PFET switches need not be homogeneous. In other words, they have PFET switches designed to give high, medium, or low resistance, which gives them finer-resolution (more accurate) control over the regulated voltage. The downside is more complex control circuitry, but AMD has evidently decided the additional overhead in the control logic is worth it. The second enhancement is that they are rolling a voltage droop detector into the design (they call it a fast droop detector), which can bypass the normal control feedback loop and take over the control signals if a big enough voltage droop is detected. This lets AMD quickly compensate for large transients in the load (core) current. The cost here is just the circuitry for the droop detector plus a large MUX that switches the control signals from regular control to droop-scenario control when the droop threshold is reached. I don't imagine this circuitry has more than a small negative effect on efficiency, and it is obviously critical for a robust design.
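A sketch of both enhancements together, again with invented weights, codes, and thresholds rather than anything from the patents:

```python
# Sketch of the two enhancements, with invented values. (1) Heterogeneous
# switches: a coarse bank of strong PFETs plus a fine bank of weak ones
# gives much finer resolution than identical switches alone. (2) Fast
# droop detector: a MUX hands control to a "slam everything on" path the
# moment the output sags past a threshold.

COARSE = [16] * 12      # twelve strong switches, thermometer-coded (assumed)
FINE   = [8, 4, 2, 1]   # four weak switches, binary-coded (assumed)

def drive_strength(coarse_on: int, fine_code: int) -> int:
    """Total conductance (arbitrary units) for a control setting."""
    coarse = sum(COARSE[:coarse_on])
    fine = sum(w for bit, w in enumerate(reversed(FINE)) if fine_code >> bit & 1)
    return coarse + fine

DROOP_THRESHOLD = 0.08  # V below target that trips the fast path (assumed)

def select_control(v_out: float, v_target: float, normal_setting: int,
                   droop_setting: int) -> int:
    """The MUX: pass the regular controller's setting through, unless a
    large droop is detected, in which case the droop path takes over."""
    if v_target - v_out > DROOP_THRESHOLD:
        return droop_setting  # recover fast, re-tune afterwards
    return normal_setting

normal = drive_strength(5, 0b1010)  # fine-grained steady-state setting
slam = drive_strength(12, 0b1111)   # everything on
print(select_control(0.99, 1.00, normal, slam))  # small error -> normal (90)
print(select_control(0.88, 1.00, normal, slam))  # big droop -> slam (207)
```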
I think the patents lay out pretty well what AMD is doing, and it's a clever design. Basically, the engineering trade-off is individual-core VR efficiency in exchange for overall system (SOC) efficiency, area, and scalability (I didn't touch much on scalability, but it is actually a key component of the design and is outlined well in the patents).
Edit:
@Ajay, I was typing this out as I had time throughout the morning and didn't see your request before posting. If someone wants to continue to discuss this and wants to make a separate thread, I'll post any replies there instead.