"Huh? The engineers I've talked to who work on 3D content production on the CPU"
Just let them talk directly here; I really don't like this kind of "I have a friend" discussion. Please ask them to explain to me how AVX2 gather instructions can help with texture sampling when there are no gather instructions for 8-bit or 16-bit elements. Do they work with FP32 textures? Do they know it's plain overkill in most cases?
By the way, when I say "I'm sure other people will report other findings" I really mean it: if they have a working AVX2 path, just let them report their actual results. That will be far more interesting than hearing you overhype something you lack first-hand experience with.
"and the former seems like it could make good use of gather."
If, for example, they apply convolution filters to FP32 textures, yes. Otherwise, I mean if they want something fast, AVX2 gather isn't very practical. But maybe AVX3 gather will be great? Let's skip AVX2?
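To make the FP32 case concrete, here is a minimal sketch of an 8-wide point-sampled texel fetch using the 32-bit gather. It assumes a single-channel FP32 texture stored row by row and GCC or Clang on x86; the names `fetch8_fp32`, `fetch8_avx2` and `fetch8_scalar` are made up for the illustration, not taken from anyone's renderer. It says nothing about speed, it only shows that the gather works on 32-bit elements and nothing narrower.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <immintrin.h>
#define HAVE_AVX2_GATHER_PATH 1
#endif

/* Scalar path: eight independent address computations and loads. */
static void fetch8_scalar(const float *tex, int width,
                          const int32_t *x, const int32_t *y, float *out) {
    for (int i = 0; i < 8; ++i)
        out[i] = tex[(size_t)y[i] * (size_t)width + (size_t)x[i]];
}

#ifdef HAVE_AVX2_GATHER_PATH
/* AVX2 path: compute eight texel indices, then fetch them with one gather. */
__attribute__((target("avx2")))
static void fetch8_avx2(const float *tex, int width,
                        const int32_t *x, const int32_t *y, float *out) {
    __m256i vx  = _mm256_loadu_si256((const __m256i *)x);
    __m256i vy  = _mm256_loadu_si256((const __m256i *)y);
    /* idx = y * width + x, per lane (32-bit arithmetic) */
    __m256i idx = _mm256_add_epi32(
        _mm256_mullo_epi32(vy, _mm256_set1_epi32(width)), vx);
    /* scale = 4 because each texel is a 4-byte float */
    __m256 texels = _mm256_i32gather_ps(tex, idx, 4);
    _mm256_storeu_ps(out, texels);
}
#endif

/* Hypothetical entry point: picks the AVX2 path when the CPU reports it. */
void fetch8_fp32(const float *tex, int width,
                 const int32_t *x, const int32_t *y, float *out) {
    assert(tex && x && y && out);
#ifdef HAVE_AVX2_GATHER_PATH
    if (__builtin_cpu_supports("avx2")) {
        fetch8_avx2(tex, width, x, y, out);
        return;
    }
#endif
    fetch8_scalar(tex, width, x, y, out);
}
```

For 8-bit or 16-bit texel formats the same gather would have to fetch whole 32-bit words and unpack them afterwards, which is the overhead being questioned above.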
"But wouldn't that make scalar loads become more of a bottleneck?"
Good point, so let's say a 9% speedup from hardware gather.
"I'm curious what you need the generic permute for in 3D rendering."
One frequent use case is emulating the MIC VCOMPRESS instruction. One easy-to-grasp example is back-face elimination; ask one of your "engineers who work on 3D production" if you don't get it.
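For anyone who wants to see the idea, here is a rough sketch of a VCOMPRESS-style emulation with the AVX2 generic permute, applied to back-face elimination. The assumptions are mine: `facing[i]` holds a signed facing value per triangle (positive means front-facing), `attr[i]` is some per-triangle value to keep, and the permutation indices come from a 256-entry lookup table built at startup. All names (`init_compress_lut`, `cull_backfaces8`, and so on) are hypothetical, and the code targets GCC or Clang on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <immintrin.h>
#define HAVE_AVX2_PERMUTE_PATH 1
#endif

/* compress_lut[m] lists the set bit positions of m first, in order;
   unused trailing slots stay 0. */
static int32_t compress_lut[256][8];

void init_compress_lut(void) {
    for (unsigned m = 0; m < 256; ++m) {
        int n = 0;
        for (int i = 0; i < 8; ++i)
            if (m & (1u << i)) compress_lut[m][n++] = i;
        for (; n < 8; ++n) compress_lut[m][n] = 0;
    }
}

/* Scalar reference: keep attr[i] where facing[i] > 0, packed to the front. */
static int cull8_scalar(const float *facing, const float *attr, float *out) {
    int n = 0;
    for (int i = 0; i < 8; ++i)
        if (facing[i] > 0.0f) out[n++] = attr[i];
    return n;
}

#ifdef HAVE_AVX2_PERMUTE_PATH
__attribute__((target("avx2")))
static int cull8_avx2(const float *facing, const float *attr, float *out) {
    __m256 f = _mm256_loadu_ps(facing);
    /* One mask bit per lane: set when the triangle is front-facing. */
    int mask = _mm256_movemask_ps(
        _mm256_cmp_ps(f, _mm256_setzero_ps(), _CMP_GT_OQ));
    __m256i perm = _mm256_loadu_si256((const __m256i *)compress_lut[mask]);
    /* Generic cross-lane permute moves the kept lanes to the front. */
    __m256 packed = _mm256_permutevar8x32_ps(_mm256_loadu_ps(attr), perm);
    _mm256_storeu_ps(out, packed);   /* always writes 8 floats */
    return __builtin_popcount((unsigned)mask);
}
#endif

/* Returns the number of surviving lanes; out must have room for 8 floats.
   init_compress_lut() must have been called first. */
int cull_backfaces8(const float *facing, const float *attr, float *out) {
    assert(facing && attr && out);
#ifdef HAVE_AVX2_PERMUTE_PATH
    if (__builtin_cpu_supports("avx2"))
        return cull8_avx2(facing, attr, out);
#endif
    return cull8_scalar(facing, attr, out);
}
```

Only the first `n` entries of `out` are meaningful, where `n` is the return value. In a loop over many triangles the caller would advance the output pointer by `n` after each call, so the next store overwrites the unused tail.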
"GPUs don't support any cross-lane operations as far as I know."
How are GPU limitations relevant to rendering on the CPU? Are you among all those people who think that only GPUs can do graphics?
"And why would that be significant compared to the billions of people who don't have AVX support at all?"
Indeed, why? Where did I say that we don't have a legacy SSE path?