[Techreport] All Skylake-X i7 CPUs have two AVX-512 FMAs

LTC8K6

Lifer
Mar 10, 2004
28,520
1,575
126
I recall this info being posted here long ago when the chips came out?
Pretty sure someone confirmed that at the time?
 

Nothingness

Diamond Member
Jul 3, 2013
3,294
2,362
136
I recall this info being posted here long ago when the chips came out?
Pretty sure someone confirmed that at the time?
Yes, but having it confirmed by Intel means it won't suddenly disappear if a new stepping of the chip is made :)

OTOH what Tech Report calls "official docs" is ark.intel.com which has been proven wrong in the past.
 

CHADBOGA

Platinum Member
Mar 31, 2009
2,135
833
136
AVX-512 adds cool stuff like scatter instructions and vector masking, which makes it much more capable. Far more code can be vectorized than with SSE or AVX/AVX2. Plus it doubles the number of vector registers available.

Does it mean that a future game that would otherwise be CPU limited, will get a significant speed up because of AVX-512, if the game developer took AVX-512 into account when coding the engine?
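
On the masking and scatter point quoted above, here is a rough sketch of the two loop shapes in question. It is plain C, not intrinsics, and the function names (`integrate_alive`, `scatter_store`) are made up for illustration. The assumption is that a compiler targeting AVX-512 may turn the per-element `if` into a mask and the indexed store into a scatter; whether it actually does depends on the compiler and flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: update only the elements flagged as alive.
   The per-element condition is the part that a vector mask can express. */
void integrate_alive(float *restrict pos, const float *restrict vel,
                     const unsigned char *restrict alive, float dt, size_t n)
{
    assert(pos && vel && alive);
    for (size_t i = 0; i < n; ++i) {
        if (alive[i])
            pos[i] += vel[i] * dt;
    }
}

/* Hypothetical example: store each source element at an arbitrary index.
   This indexed write pattern is what a scatter instruction covers. */
void scatter_store(float *restrict dst, const float *restrict src,
                   const int *restrict idx, size_t n)
{
    assert(dst && src && idx);
    for (size_t i = 0; i < n; ++i)
        dst[idx[i]] = src[i];
}
```

Calling `integrate_alive` with `alive = {1, 0, 1, 0}` leaves elements 1 and 3 untouched, which is the behaviour a masked vector add would have to reproduce.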
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
I recall this info being posted here long ago when the chips came out?
Pretty sure someone confirmed that at the time?

Maybe, it was news to me. Checking a few launch day reviews - including Anandtech's - still shows the i7s as having one AVX-512 unit per core.
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
Does it mean that a future game that would otherwise be CPU limited, will get a significant speed up because of AVX-512, if the game developer took AVX-512 into account when coding the engine?

It could, as Intel supports packed vector math, so if the compiler generates an AVX 512 code path, even older AVX instructions can get speed increases due to being able to execute more instructions in parallel.

As far as the AVX 512 instructions themselves, there's not really anything that would be of much benefit to current game technology. I think most of the performance increases will come from packed math.

I'm guessing you're looking at a minimum of 5 years before you see games supporting this, as it's not available in regular consumer CPUs yet.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,730
136
Does it mean that a future game that would otherwise be CPU limited, will get a significant speed up because of AVX-512, if the game developer took AVX-512 into account when coding the engine?
A game being CPU-limited does not necessarily mean that the limitation is due to lack of adequate vectorization. Most of the CPU limitations in game engines arise from a lack of multithreading, which modern game engines are slowly beginning to account for.
 

jpiniero

Lifer
Oct 1, 2010
16,568
7,071
136
AVX-512 adds cool stuff like scatter instructions and vector masking, which makes it much more capable. Far more code can be vectorized than with SSE or AVX/AVX2. Plus it doubles the number of vector registers available.

It comes at a huge power and transistor cost though. Kinda useless outside of HPC right now although the GFNI instructions sound useful for AES.

The wider you get the more you may as well use a GPU.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
It comes at a huge power and transistor cost though. Kinda useless outside of HPC right now although the GFNI instructions sound useful for AES.

The wider you get the more you may as well use a GPU.

It's relatively small in terms of die size.
 

CHADBOGA

Platinum Member
Mar 31, 2009
2,135
833
136
A game being CPU-limited does not necessarily mean that the limitation is due to lack of adequate vectorization. Most of the CPU limitations in game engines arise from a lack of multithreading, which modern game engines are slowly beginning to account for.

Well if that is the case, can't see too much reason to get excited about AVX-512 then.
 

witeken

Diamond Member
Dec 25, 2013
3,899
193
106
It's relatively small in terms of die size.
Anandtech:

Given what we know about the AVX-512 units in Knights Landing, we also know they are LARGE. Intel quoted to us that the AVX-512 register file could probably fit a whole Atom core inside, and from the chip diagrams we have seen, this equates to around 12-15% of a Skylake core minus the L2 cache (or 9-11% with the L2). As seen with Knights Landing, the AVX-512 silicon takes up most of the space.
 

NTMBK

Lifer
Nov 14, 2011
10,423
5,728
136
A game being CPU-limited does not necessarily mean that the limitation is due to lack of adequate vectorization. Most of the CPU limitations in game engines arise from a lack of multithreading, which modern game engines are slowly beginning to account for.

If you've got a single thread bottleneck, then improving the throughput of data on that thread (by e.g. vectorizing a bunch of operations over arrays of data that need processing) will improve overall performance. Multithreading and per-thread performance are complementary!
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,730
136
If you've got a single thread bottleneck, then improving the throughput of data on that thread (by e.g. vectorizing a bunch of operations over arrays of data that need processing) will improve overall performance. Multithreading and per-thread performance are complementary!
But aren't most ST bottlenecks due to trying to process too many draw calls through a single thread? This seems to be the case in DX9 and the Source engine - where you always get slowdowns when trying to render long draw distances. How will increased vectorization help alleviate draw-call bottlenecks?
 

NTMBK

Lifer
Nov 14, 2011
10,423
5,728
136
But aren't most ST bottlenecks due to trying to process too many draw calls through a single thread? This seems to be the case in DX9 and the Source engine - where you always get slowdowns when trying to render long draw distances. How will increased vectorization help alleviate draw-call bottlenecks?

If your thread is batching up a whole bunch of geometry data to dispatch to the GPU, then it's probably crunching through a load of linear algebra (e.g. transforming co-ordinates). Big array of data items that need identical floating point operations applying to them... seems like a great vectorization candidate.
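
For what it's worth, a minimal sketch of the kind of loop meant here. The helper name `transform_points`, the separate x/y/z arrays, and the row-major 3x4 matrix layout are all assumptions for illustration, not taken from any particular engine. It is written as a plain loop so the compiler can auto-vectorize it (for example GCC or Clang with `-O3 -march=skylake-avx512`); whether 512-bit code is actually emitted depends on the compiler and its settings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: apply a 3x4 affine matrix (row-major, with the
   translation in the last column) to n points stored as separate
   x/y/z arrays. Every point gets the same arithmetic, so the loop body
   maps naturally onto SIMD lanes. */
void transform_points(const float m[12],
                      const float *restrict x, const float *restrict y,
                      const float *restrict z,
                      float *restrict ox, float *restrict oy,
                      float *restrict oz, size_t n)
{
    assert(m && x && y && z && ox && oy && oz);
    for (size_t i = 0; i < n; ++i) {
        ox[i] = m[0] * x[i] + m[1] * y[i] + m[2]  * z[i] + m[3];
        oy[i] = m[4] * x[i] + m[5] * y[i] + m[6]  * z[i] + m[7];
        oz[i] = m[8] * x[i] + m[9] * y[i] + m[10] * z[i] + m[11];
    }
}
```

Keeping x, y and z in separate arrays rather than interleaved is a deliberate choice in this sketch: it lets each vector load pull in consecutive values of one coordinate without shuffling.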
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,730
136
If your thread is batching up a whole bunch of geometry data to dispatch to the GPU, then it's probably crunching through a load of linear algebra (e.g. transforming co-ordinates). Big array of data items that need identical floating point operations applying to them... seems like a great vectorization candidate.
Hmm, so here is what I read up on:
https://support.unity3d.com/hc/en-us/articles/207061413-Why-are-my-batches-draw-calls-so-high-What-does-that-mean-
Static batching is the recommended batching technique for objects that do not move, and it renders batched objects very fast. It has a trade-off regarding memory, as the meshes need to be combined into a single larger mesh, which is made of the union of all the smaller individual meshes in the scene that are marked as static and meet the criteria to be batched together. To do static batching, you need your objects to be static, thus mark them as static in the inspector.

Dynamic batching, on the other hand, tries to optimize the way non-static objects are rendered by transforming their vertices on the CPU, grouping many similar vertices together, and drawing them all in one go. It's limited to small meshes, as batching larger meshes dynamically is more expensive than not batching them.
So like you said, vectorizing can indeed help to process dynamic batching in a better way.
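
A rough sketch of what "transforming their vertices on the CPU and grouping them" could look like, as I understand the description. This is not Unity's implementation: the `Vec3` and `SmallMesh` types and the `batch_meshes` function are hypothetical, and the per-object transform is reduced to a simple translation to keep it short.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

/* Hypothetical small mesh: local-space vertices plus a world-space offset. */
typedef struct {
    const Vec3 *verts;
    size_t count;
    Vec3 offset;
} SmallMesh;

/* Append every mesh's vertices, moved into world space, to one output
   buffer so they could be submitted together. Returns the number of
   vertices written; returns 0 if capacity is too small or there is
   nothing to write. */
size_t batch_meshes(const SmallMesh *meshes, size_t mesh_count,
                    Vec3 *out, size_t capacity)
{
    assert(meshes && out);
    size_t total = 0;
    for (size_t m = 0; m < mesh_count; ++m)
        total += meshes[m].count;
    if (total > capacity)
        return 0;

    size_t w = 0;
    for (size_t m = 0; m < mesh_count; ++m) {
        for (size_t i = 0; i < meshes[m].count; ++i) {
            out[w].x = meshes[m].verts[i].x + meshes[m].offset.x;
            out[w].y = meshes[m].verts[i].y + meshes[m].offset.y;
            out[w].z = meshes[m].verts[i].z + meshes[m].offset.z;
            ++w;
        }
    }
    return w;
}
```

The inner loop does the same arithmetic on every vertex, which is the part that wider SIMD could speed up; the copying and bookkeeping around it would not benefit in the same way.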

Come to think of it, the slowdowns in Source are extremely apparent with smoke and other particle effects. I play Insurgency, and the general advice is to disable soft particles on slower CPUs for better frame rates. I also recall an instance where developers of Path of Exile sped up particle-based effects using AVX.
 

dogen1

Senior member
Oct 14, 2014
739
40
91
Hmm, so here is what I read up on:
https://support.unity3d.com/hc/en-u...tches-draw-calls-so-high-What-does-that-mean-

So like you said, vectorizing can indeed help to process dynamic batching in a better way.

Come to think of it, the slowdowns in Source are extremely apparent with smoke and other particle effects. I play Insurgency, and the general advice is to disable soft particles on slower CPUs for better frame rates. I also recall an instance where developers of Path of Exile sped up particle-based effects using AVX.

I think Unity's "dynamic batching" might be specific to that engine. I don't think I've heard of other engines doing the same thing, though I could be wrong.
 

Thunder 57

Diamond Member
Aug 19, 2007
3,838
6,479
136
Well if that is the case, can't see too much reason to get excited about AVX-512 then.

People use their PCs for things other than gaming. Shocking, I know. I could see this helping with transcoding, as that has always seemed to benefit a lot from SIMD. SSE2, AVX, AVX2: they all seem to have helped a good bit.
 

CHADBOGA

Platinum Member
Mar 31, 2009
2,135
833
136
People use their PCs for things other than gaming. Shocking, I know. I could see this helping with transcoding, as that has always seemed to benefit a lot from SIMD. SSE2, AVX, AVX2: they all seem to have helped a good bit.

Yeah, but these things won't motivate me to upgrade. It sounds like another niche dot point that Intel can use for marketing but that will be meaningless to most people.