bdver1: 6 directpath instructions in two cycles for both cores? (3 per core)
bdver3: 4 directpath instructions in two cycles per core? (4 per core)
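A quick sanity check of those two readings, as a sketch. It assumes the figures mean "per two-cycle window" and that bdver1's six decodes are split evenly between the two cores of a module. Both are assumptions, since the .md comments are exactly what's in question here. The helper name `per_core_rate` is made up for illustration.

```python
from fractions import Fraction

def per_core_rate(instructions, cycles, cores_sharing):
    """Directpath decodes per cycle seen by one core (hypothetical helper).

    cores_sharing = 2: the quoted figure is split across both cores of a module.
    cores_sharing = 1: the quoted figure already applies to a single core.
    """
    return Fraction(instructions, cycles * cores_sharing)

# bdver1 as read above: 6 directpath instructions in 2 cycles, shared by 2 cores
bdver1 = per_core_rate(6, 2, 2)
# bdver3 as read above: 4 directpath instructions in 2 cycles, per core
bdver3 = per_core_rate(4, 2, 1)

print(f"bdver1: {bdver1} per cycle per core")  # 3/2
print(f"bdver3: {bdver3} per cycle per core")  # 2
```

So if both readings hold, bdver3 would decode 2 directpath instructions per cycle per core against 1.5 for bdver1.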
bdver1 = ffma01, fmal01
bdver3 = ffma01, fpsto
It would appear that the first ffma in bdver3 takes the place of fmal0 from bdver1. fpshuf seems to be new.
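Spelling the FP unit lists out as sets makes the swap easier to see. This assumes "ffma01" is shorthand for ffma0 + ffma1 and "fmal01" for fmal0 + fmal1, and it only uses the units listed above. Add "fpshuf" to `bdver3_fp` if the file actually declares it.

```python
# Shorthand expanded by assumption: "ffma01" = ffma0 + ffma1, "fmal01" = fmal0 + fmal1.
bdver1_fp = {"ffma0", "ffma1", "fmal0", "fmal1"}
bdver3_fp = {"ffma0", "ffma1", "fpsto"}

print("only in bdver1:", sorted(bdver1_fp - bdver3_fp))  # ['fmal0', 'fmal1']
print("only in bdver3:", sorted(bdver3_fp - bdver1_fp))  # ['fpsto']
print("shared:", sorted(bdver1_fp & bdver3_fp))          # ['ffma0', 'ffma1']
```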
http://c-cpp.r3dcode.com/files/gcc/4/6.2/gcc/config/i386/bdver1.md
http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01079/bdver3.md
I'm going to guess that the missing address units are a typo. I believe the fetch happening every two cycles might be because it fetches for each core separately, rather than doing one fetch that serves both cores.
1 x 16B per cycle per core (end result: 2 x 16B per cycle for both cores) - bdver1
1 x 32B per cycle (end result: 1 x 32B per cycle, alternating between the cores) - bdver3
^-- lengthier pipeline maybe...
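Working the fetch numbers through, as a sketch. It assumes bdver1 gives each core its own 16B fetch every cycle and bdver3 does one 32B fetch per cycle that alternates between the two cores. Those are my readings of the .md files, not confirmed hardware behavior, and `avg_fetch_bytes_per_core` is a made-up helper.

```python
def avg_fetch_bytes_per_core(bytes_per_fetch, fetches_per_cycle, cores_served_in_turn):
    """Average fetch bytes per cycle that one core receives (hypothetical helper).

    cores_served_in_turn = 1: each core has its own fetch every cycle.
    cores_served_in_turn = 2: one fetch per cycle alternates between two cores.
    """
    return bytes_per_fetch * fetches_per_cycle / cores_served_in_turn

# bdver1 as read above: 16B per cycle for each core independently
bdver1 = avg_fetch_bytes_per_core(16, 1, 1)
# bdver3 as read above: one 32B fetch per cycle, alternating between the two cores
bdver3 = avg_fetch_bytes_per_core(32, 1, 2)

print(bdver1, bdver3)          # 16.0 16.0
print(bdver1 * 2, bdver3 * 2)  # module totals: 32.0 32.0
```

Under these readings, both give the same 16B per cycle per core on average. The difference is that a bdver3 core would get 32B every other cycle instead of 16B every cycle. That fits the longer pipeline guess but doesn't prove it.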
I think the three-directpath-decoder thing was copy-pasted, because there is a different number of decode units: 3 -> 2. I'm looking at the newest GCC to see if the bdver1 info has been updated. (The 4.8 snapshot has 82,000 items, 10 hours later... why did I put this on the HDD!!!!!!)
Someone should probably update bdver1.md if there are actually four decoders... and bdver3.md too, if there are actually four decoders there...
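Instead of digging through the whole snapshot, a small script can count the declared decode units in one .md file. It relies on the GCC pipeline-description form `(define_cpu_unit "name1,name2,..." "automaton")`. It also assumes the decoders are declared that way with "decode" in their names, so check the real bdver1.md/bdver3.md. The `SAMPLE` text and its unit names are made up for illustration; paste in the actual file contents instead.

```python
import re

# Matches (define_cpu_unit "name1, name2, ..." "automaton"); \s+ also spans newlines.
UNIT_RE = re.compile(r'\(define_cpu_unit\s+"([^"]+)"\s+"([^"]+)"\s*\)')

def cpu_units(md_text):
    """Return every unit name declared with define_cpu_unit in md_text."""
    names = []
    for unit_list, _automaton in UNIT_RE.findall(md_text):
        # The first string is a comma-separated list of unit names.
        names.extend(n.strip() for n in unit_list.split(",") if n.strip())
    return names

def count_units(md_text, keyword):
    """Count declared units whose name contains keyword, e.g. 'decode'."""
    return sum(keyword in name for name in cpu_units(md_text))

# Hypothetical excerpt for illustration only, not taken from the real files.
SAMPLE = '''
(define_automaton "example,example_fp")
(define_cpu_unit "example-decode0,example-decode1,example-decode2" "example")
(define_cpu_unit "example-ffma0,example-ffma1" "example_fp")
'''

print(count_units(SAMPLE, "decode"))  # 3
```

Running it over both files makes a 3-vs-4 decoder mismatch obvious without reading every reservation.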