I also think the article's argument is uncompelling.
The author seems to have a screwed-up conception of instruction-level parallelism. He says, "the Athlon can theoretically decode and execute three instructions per clock cycle, but poor instruction parallelism keeps it closer to two instructions per clock cycle in most instances." If this is the case, why would a more powerful instruction decoder help at all? Poor instruction-level parallelism comes from the code itself (the dependencies between instructions) and from the small number of registers in the x86 architecture, not from the decoder.
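To make that concrete, here is a rough toy model, not a simulation of any real Athlon. It assumes every instruction takes one cycle, that an instruction can issue only after everything it depends on finished in an earlier cycle, and that the machine can issue up to `width` instructions per cycle. The function name, the dependency list, and the instruction stream are all made up for illustration.

```python
def cycles_needed(deps, width):
    """Count cycles to run a toy instruction stream.

    deps[i] lists the (earlier) instructions that instruction i depends on.
    Assumptions: unit latency, at most `width` instructions issued per cycle.
    """
    done_cycle = {}                      # instruction index -> cycle it issued in
    remaining = list(range(len(deps)))
    cycle = 0
    while remaining:
        cycle += 1
        issued = []
        for i in remaining:
            if len(issued) == width:
                break                    # issue slots for this cycle are full
            # Ready only if every dependency finished in an earlier cycle.
            if all(d in done_cycle and done_cycle[d] < cycle for d in deps[i]):
                issued.append(i)
        for i in issued:
            done_cycle[i] = cycle
        remaining = [i for i in remaining if i not in done_cycle]
    return cycle


# Hypothetical stream: two independent dependency chains of four instructions each.
two_chains = [[], [], [0], [1], [2], [3], [4], [5]]

for width in (1, 2, 3, 6):
    cycles = cycles_needed(two_chains, width)
    print(f"width {width}: {cycles} cycles, {len(two_chains) / cycles:.1f} instructions/cycle")
```

With only two independent chains in the code, going from a width of 2 to 3 or even 6 changes nothing in this model: throughput is stuck at two instructions per cycle because the dependencies, not the issue width, are the limit.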
Also, a double-pumped decoder would contribute nothing to higher clock speeds. I think it highly unlikely that AMD would pass up longer pipelines as a way to increase clock speed, when that has been the standard approach with processors ever since pipelining was invented.
I never understood why the Inquirer has such a bad reputation, but now I'm beginning to.