You can think of it this way.
A clocked processor is like an assembly line where you have 1 minute to work on every part, and then you pass it on to the next guy, no matter what. So everyone better finish their work in 1 minute or you are screwed.
An asynchronous processor is like an assembly line, except you only pass your part to the next guy when you are done with it and he is done with his previous part. The advantage is that if some guy is slow at any given time, the line slows down, but the work still gets done. That's why asynchronous processors can run at different speeds depending on voltage and temperature.
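To put numbers on the two assembly lines, here is a rough Python sketch. It is only a toy timing model, not how any real processor is simulated: the function names and the `work[i][j]` table (time stage j needs for part i) are made up for illustration, each stage is assumed to hold exactly one part, and the handshake itself is assumed to cost nothing.

```python
def clocked_pipeline_ok(work, period):
    """Clocked line: every stage must finish every part within one period."""
    return all(t <= period for part in work for t in part)


def handshake_finish_time(work):
    """Handshake line: return the time the last part leaves the last stage.

    work[i][j] is the (hypothetical) time stage j needs for part i.
    """
    n_stages = len(work[0])
    # prev_taken[j] = time stage j accepted the previous part.
    # Index n_stages stands for the sink at the end of the line.
    prev_taken = [0.0] * (n_stages + 1)
    for part in work:
        taken = [0.0] * (n_stages + 1)
        ready = 0.0  # the source always has the next part ready
        for j, t in enumerate(part):
            # Stage j accepts the part only when the part has arrived AND
            # the next stage has already taken stage j's previous part.
            taken[j] = max(ready, prev_taken[j + 1])
            ready = taken[j] + t
        taken[n_stages] = ready  # the sink takes a finished part immediately
        prev_taken = taken
    return prev_taken[n_stages]


# Two parts, two stages; stage 1 is slow on the first part (5 instead of 1).
work = [[1, 5], [1, 1]]
clocked_fails = not clocked_pipeline_ok(work, period=1)
finish = handshake_finish_time(work)
```

With a 1-unit clock the slow stage breaks the clocked line, while the handshake line just finishes later than it would have with no slow stage.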
The disadvantage is that it's two-way communication. The guy before you cannot give you his part until the guy in front takes your part from you, so there is overhead in communicating back and forth. Now imagine that you take two parts from two different guys, put them together, and pass the result on to a third guy, like, say, an adder does in a CPU. Then you have to communicate with the two previous guys and with the guy in front of you. So the more branching that goes on, the higher this overhead becomes.
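The adder case can be sketched as a little Python object. This is just an illustration of the three-way coordination, assuming one-slot inputs and output; the `JoinStage` class and its fields are hypothetical, and `None` stands for "no part here".

```python
class JoinStage:
    """Toy adder stage: two senders behind it, one receiver in front."""

    def __init__(self):
        self.left = None   # part offered by the first previous guy
        self.right = None  # part offered by the second previous guy
        self.out = None    # part waiting for the guy in front

    def step(self):
        """Fire only if both inputs are present and the output slot is free."""
        if self.left is None or self.right is None or self.out is not None:
            return False  # still waiting on at least one of the three neighbors
        self.out = self.left + self.right
        # Clearing the slots plays the role of acknowledging both senders.
        self.left = None
        self.right = None
        return True


stage = JoinStage()
stage.left = 2
fired_with_one_input = stage.step()   # right input is still missing
stage.right = 3
fired_with_both = stage.step()        # both inputs present, output free
stage.left, stage.right = 1, 1
fired_while_blocked = stage.step()    # the guy in front has not taken 5 yet
```

Every extra input or output adds one more condition to that `if`, which is where the overhead from branching comes from.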
Also, instead of writing RTL, which every logic designer knows, you generally have to write handshaking expansions, which almost no one knows. Handshaking expansions define the logic and the communication protocol that the various asynchronous parts use to tell each other when they are ready to accept the next piece of data. This can add a lot of overhead to the circuitry as well. For example, in a synchronous processor you may have a simple flip-flop that fans out to, say, 20 other flip-flops. No problem, you only need one wire going to all of them. In an asynchronous design that is a much bigger issue, because you also need to get an acknowledge back from all those receivers. So what was a simple wire becomes 20 wires coming back, plus you need a huge logic gate to AND all these wires, since you need to know that everyone has received your transmission before you stop transmitting it and move on to the next one. So building something like a shifter in delay-insensitive asynchronous logic is not easy.
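The fan-out problem can be sketched in a few lines of Python. This is a toy model under stated assumptions: `broadcast` and `receiver_delays` are invented names, time advances in whole steps, and each receiver raises its acknowledge after its own delay of at least one step.

```python
def all_acked(acks):
    """The 'huge AND gate': true only when every acknowledge wire is high."""
    return all(acks)


def broadcast(receiver_delays):
    """Return the time step at which the sender may stop driving its data.

    receiver_delays[k] is the (hypothetical) number of steps receiver k
    needs before it raises its acknowledge wire.
    """
    acks = [False] * len(receiver_delays)  # one wire coming back per receiver
    t = 0
    while not all_acked(acks):
        t += 1
        acks = [t >= d for d in receiver_delays]
    return t


# 20 receivers, 19 fast ones and a single slow one.
delays = [1] * 19 + [4]
release_time = broadcast(delays)
ack_wires = len(delays)
```

The sender is held up by the slowest of the 20 receivers, and the number of wires coming back grows with the fan-out.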
There are other approaches like GASP, which is asynchronous clocking for synchronous logic. But basically it's no longer delay-insensitive, since you have race conditions between your asynchronous clock and the data. That makes it a lot harder to design and takes away a lot of the advantages of async, though it is capable of very high speeds.