pre-pentium pro.. before then, processors executed a single op per instruction. starting with the pentium pro, instructions get decoded into micro-ops.. making the core risc-like internally. today, mmx-type instructions are mostly still 1 op per instruction.
new architecture, expanded logic, and larger caches all contribute to a larger number of transistors. as transistors get smaller, more of them fit in the same limited area. expanding the logic can significantly increase the number of transistors a given function needs.. but it lets that function run at a much greater speed. for example, take ripple carry adders.. when you add A and B, each pair of bits is added to get the sum and the carry out (both of which depend on the carry in from the bit below). if the bit-width of A and B is small, this method is good enough.. however, if it's large, the 32nd bit can't be computed until the carry has rippled through all 31 bits below it, so the add takes a long time to settle. carry propagate-generate (carry-lookahead) adders expand the carry logic so that each carry is computed straight from the input bits and the initial carry in, instead of waiting on the stage before it. this increases the number of transistors required, but the addition can be performed at a significantly higher speed.
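here's a rough python sketch of the difference, just to make it concrete. the function names (`ripple_carry_add`, `carry_into`, `lookahead_add`) and the 32-bit default width are my own picks for illustration, not from any real library. it only models the logic.. software still loops one step at a time, so it says nothing about actual gate delays.

```python
def ripple_carry_add(a, b, width=32):
    """Add bit by bit; each stage needs the carry from the stage below it."""
    carry = 0
    result = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        result |= (ai ^ bi ^ carry) << i          # sum bit
        carry = (ai & bi) | (carry & (ai ^ bi))   # carry out, feeds the next stage
    return result, carry


def carry_into(i, g, p, c0):
    """Carry into bit i, written only in terms of generate/propagate bits and c0.

    Expanded form: c_i = g[i-1] | p[i-1]g[i-2] | ... | p[i-1]...p[0]c0
    No other carry is used, which is what lets hardware compute all carries
    in parallel (at the cost of wider gates, i.e. more transistors).
    """
    c = 0
    prod = 1  # AND of the propagate bits above position j
    for j in range(i - 1, -1, -1):
        c |= prod & ((g >> j) & 1)
        prod &= (p >> j) & 1
    return c | (prod & c0)


def lookahead_add(a, b, width=32, c0=0):
    """Add using propagate/generate signals (assumed names, illustrative only)."""
    mask = (1 << width) - 1
    g = a & b & mask      # generate: this bit produces a carry on its own
    p = (a ^ b) & mask    # propagate: this bit passes an incoming carry along
    carries = 0
    for i in range(width + 1):
        carries |= carry_into(i, g, p, c0) << i
    return (p ^ carries) & mask, (carries >> width) & 1
```

both give the same answer for any inputs, e.g. `ripple_carry_add(7, 9)` and `lookahead_add(7, 9)` each return the sum 16 with no carry out. the difference is that in `carry_into` every carry is a flat expression of the inputs, so in hardware none of them has to wait for another.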
i'm tired.. and i got school tomorrow, so i'll finish this up later.