NTMBK
Okay, so Skylake already has 168 physical vector registers. If each AVX-512 register takes 2 of them, and there are 32 AVX-512 registers, then the architectural state needs 64 entries and fits into the existing register file with room to spare. That leaves fewer entries for renaming and rescheduling, but with 32 registers you would hope that the compiler has more flexibility to schedule sensibly (and each 512-bit op takes up two ports, so the OoO engine needs to find less parallelism to keep the core busy).
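
To put rough numbers on that, here is a quick back-of-the-envelope sketch in Python. The 168-entry file and the 32 registers are the figures from the post; the "2 physical entries per AVX-512 register" split is the assumption being discussed, not a confirmed design detail, and the function name is just something made up for illustration.

```python
# Back-of-the-envelope register budget, under the assumptions in the post.
PHYSICAL_VECTOR_REGS = 168   # Skylake vector register file size (from the post)
ARCH_ZMM_REGS = 32           # architectural AVX-512 registers
PHYS_PER_ZMM = 2             # ASSUMPTION: one 512-bit register = two physical entries


def rename_headroom(physical: int, arch: int, phys_per_arch: int) -> int:
    """Physical entries left over once the architectural state is held."""
    return physical - arch * phys_per_arch


# Entries pinned by architectural state: 32 * 2 = 64
committed = ARCH_ZMM_REGS * PHYS_PER_ZMM

# Entries left for renaming in-flight results: 168 - 64 = 104
free_entries = rename_headroom(PHYSICAL_VECTOR_REGS, ARCH_ZMM_REGS, PHYS_PER_ZMM)

# Each in-flight 512-bit result would also need two entries: 104 // 2 = 52
in_flight_512 = free_entries // PHYS_PER_ZMM

# For comparison: 16 architectural AVX2 registers, one entry each, leaves 152
avx2_free = rename_headroom(PHYSICAL_VECTOR_REGS, 16, 1)

print(f"committed={committed} free={free_entries} "
      f"in-flight 512-bit results={in_flight_512} (AVX2 baseline free={avx2_free})")
```

So under this split the file fits everything, but the window of in-flight 512-bit results would be roughly a third of what 256-bit code gets today.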