I don't know, not that long: 18 stages according to AnandTech*. It has 4 execution domains (AMD has 2, int and FP, and Intel has 1) to keep execution complexity in check. It also decodes and retires 8 instructions a clock and has 4 load/store ports (two dedicated load, two shared load/store). That extra store port compared to Intel/AMD is vital for those high SMT modes.
* http://www.anandtech.com/show/10435/assessing-ibms-power8-part-1/3
Now I get why POWER8/9 gains so little from 8 threads (going from 2 to 4 threads is still a good gain). It retires 8 instructions per cycle, like Zen and Skylake, which only run 2 threads. 8 retire slots per cycle can still keep 4 threads fed, but not 8.
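Back-of-the-envelope version of that reasoning. This is only a sketch: it assumes the 8 retire slots per cycle are split evenly between active threads, which is a simplification (the real core doesn't statically partition retirement like this), and `retire_slots_per_thread` is just a made-up helper name:

```python
# Sketch: average per-thread share of an 8-wide retire stage in each SMT mode.
# Assumption (simplification): retire slots are shared evenly among active
# threads; real hardware does not partition retirement this way.

RETIRE_WIDTH = 8  # instructions retired per clock (figure quoted above)


def retire_slots_per_thread(threads: int, width: int = RETIRE_WIDTH) -> float:
    """Average retire slots available to each thread per cycle (hypothetical helper)."""
    if threads <= 0:
        raise ValueError("threads must be positive")
    return width / threads


for smt in (1, 2, 4, 8):
    print(f"SMT{smt}: {retire_slots_per_thread(smt):g} retire slots/thread/cycle")
```

At SMT4 each thread still averages 2 retires per cycle, while at SMT8 it drops to 1, which is the point where the retire width becomes the ceiling.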