Kartajan and C1 have it as I remember it. All else being equal, parallel is faster: more wires means more data at the same speed.
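To put rough numbers on that, here is a minimal Python sketch. It assumes one bit per wire per transfer and ignores all overhead; the function name and the 100 MT/s rate are made up for illustration, not figures for any real bus.

```python
def raw_throughput_bits_per_s(lines: int, transfers_per_s: float) -> float:
    """Raw throughput of a bus, assuming one bit per line per transfer."""
    return lines * transfers_per_s


# Hypothetical signalling rate, the same for both buses.
rate = 100e6  # 100 million transfers per second on each line

serial = raw_throughput_bits_per_s(1, rate)     # one wire
parallel = raw_throughput_bits_per_s(16, rate)  # 16-bit wide bus

print(f"serial:   {serial / 1e6:.0f} Mbit/s")
print(f"parallel: {parallel / 1e6:.0f} Mbit/s")
print(f"ratio:    {parallel / serial:.0f}x")
```

At the same rate the 16-wire bus moves 16 times as much, which is the whole case for parallel. The catch is keeping that rate up as the bus gets wider.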
The problems came in building these systems: once you start increasing the number of wires (the bus width), the cost in time and materials to get it working climbs quickly. 8 bits was probably easy, and 16 wide was getting tricky. 32 needed a computer to even consider doing it, and 64 was looking silly at the board design level.
The issue is that for fast and reliable communication, each of the bus lines needs to be the same length (tricky to design without computer help), and the spacing between all the lines is best kept uniform (to help at higher speeds). This also led to devices having to share the connection (64 lines from a central device to each individual slot/device was just not viable long term).
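A rough way to see why equal line lengths matter more as speed goes up: a longer trace delivers its bit later, and that delay (skew) eats into the time each bit is valid. The Python sketch below assumes a signal speed of about 150 mm per nanosecond on the board, which is only a ballpark figure, and the 15 mm mismatch and both transfer rates are hypothetical.

```python
# Assumed signal propagation speed on a circuit board trace (ballpark only).
SPEED_MM_PER_NS = 150.0


def skew_ns(length_mismatch_mm: float) -> float:
    """Arrival-time difference between two lines of unequal length."""
    return length_mismatch_mm / SPEED_MM_PER_NS


def skew_fraction_of_bit(length_mismatch_mm: float, transfers_per_s: float) -> float:
    """Skew expressed as a fraction of one bit period."""
    bit_period_ns = 1e9 / transfers_per_s
    return skew_ns(length_mismatch_mm) / bit_period_ns


mismatch_mm = 15.0  # hypothetical difference between longest and shortest line

for rate in (100e6, 1e9):  # hypothetical per-line transfer rates
    frac = skew_fraction_of_bit(mismatch_mm, rate)
    print(f"{rate / 1e6:.0f} MT/s: skew is {frac:.1%} of a bit period")
```

The same mismatch that is negligible at the lower rate takes a tenfold larger share of the bit period at the higher one, and every extra line on the bus is one more trace that has to be matched.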
So going to a single dedicated high-speed serial link had several advantages to start with. Since then a few more features have been added for HDDs (like NCQ etc.), which let the drive reorder queued requests internally. IDE did have a similar one, TCQ, but IIRC only one drive implemented it (early Raptors).
As for serial communication on the motherboard, parallel lines are still present, but each line runs independently of the others, which removes the previous issues and opens up flexibility for board designers (e.g. GPUs can use up to 16 PCI-E lanes, or make do with just 1).
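As a sketch of how lane count scales, the Python below uses first-generation PCI-E figures: 2.5 GT/s per lane with 8b/10b encoding, so 8 payload bits for every 10 bits on the wire. The choice of generation is my assumption, and the numbers are per direction and ignore protocol overhead.

```python
# First-generation PCI-E figures (assumed generation for this example).
LINE_RATE_T_PER_S = 2.5e9      # raw transfers per second per lane
ENCODING_EFFICIENCY = 8 / 10   # 8b/10b: 8 data bits per 10 line bits


def link_bandwidth_mb_per_s(lanes: int) -> float:
    """Payload bandwidth of a link in MB/s, one direction, no protocol overhead."""
    payload_bits_per_s = LINE_RATE_T_PER_S * ENCODING_EFFICIENCY * lanes
    return payload_bits_per_s / 8 / 1e6


for lanes in (1, 4, 8, 16):
    print(f"x{lanes}: {link_bandwidth_mb_per_s(lanes):.0f} MB/s")
```

Because each lane is its own serial link, the board designer can hand a device whatever number of lanes suits it, and the bandwidth scales with the count.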