FX FP32 needs 2 clock cycles?

Bopple

Jan 29, 2003
Q: How do you feel the pixel and vertex shader power and the pixel precision of the FX will come into play when the new wave of Direct3D 9 media hits the store shelves?

A: The CineFX engine is really the "soul of the machine" for GeForce FX. It encompasses the VS2.0+ vertex engine, the PS2.0+ pixel shading engine and the studio-quality 128-bit color. The long shader programs and full 128-bit color enable a whole new level of image quality. These features are what make the characters Dawn and Ogre from the GeForce FX demos look so good. We're going to see that level of quality and that level of detail in future games based on DX9. You'll also see games take advantage of GeForce FX's native 64-bit color mode that offers high-precision math, but delivers blazing performance too.
:: From nVidia's Geoff Ballew in an Interview at EliteBastards
This implies that 128-bit mode (FP32) is slower than 64-bit mode (FP16).


The R300 can run Doom in three different modes: ARB (minimum extensions, no
specular highlights, no vertex programs), R200 (full featured, almost always
single pass interaction rendering), ARB2 (floating point fragment shaders,
minor quality improvements, always single pass).

The NV30 can run DOOM in five different modes: ARB, NV10 (full featured, five
rendering passes, no vertex programs), NV20 (full featured, two or three
rendering passes), NV30 (full featured, single pass), and ARB2.

The R200 path has a slight speed advantage over the ARB2 path on the R300, but
only by a small margin, so it defaults to using the ARB2 path for the quality
improvements. The NV30 runs the ARB2 path MUCH slower than the NV30 path.
Half the speed at the moment. This is unfortunate, because when you do an
exact, apples-to-apples comparison using exactly the same API, the R300 looks
twice as fast, but when you use the vendor-specific paths, the NV30 wins.

The reason for this is that ATI does everything at high precision all the
time, while Nvidia internally supports three different precisions with
different performances. To make it even more complicated, the exact
precision that ATI uses is in between the floating point precisions offered by
Nvidia, so when Nvidia runs fragment programs, they are at a higher precision
than ATI's, which is some justification for the slower speed. Nvidia assures
me that there is a lot of room for improving the fragment program performance
with improved driver compiler technology.
:: From the John Carmack Interview
When using floating-point fragment programs (the ARB2 path), the FX runs at only half the speed of the 9700 Pro.


We know that NV30 supports the ability to produce two FP16 (64-bit) shader instructions in the time it takes to do one FP32 (128-bit) instruction, however the number of cycles these take is unclear. At one launch presentation an NVIDIA representative mentioned that 32-bit integer instructions can be done at two per clock, which was twice as fast as a single FP16, which suggests that FP16 instructions operate in one clock cycle, hence FP32 instructions would take two clock cycles. Is this the case?

We have not disclosed these details of our architecture.
:: From GFFX Tech Q&A about HOS & Shaders by Beyond3D
nVidia declined to comment on this.


Summary: It seems to me that FX FP32 needs 2 clock cycles. Am I wrong? I'm no 3D pro. Could anyone correct me, please?
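
For anyone who wants the inference from the Beyond3D question spelled out, here is a rough sketch of the arithmetic in Python. The only inputs are the two claims quoted above (integer instructions at two per clock and twice as fast as FP16, and two FP16 instructions in the time of one FP32). nVidia has not confirmed any of this, so the variable names and the resulting cycle counts are assumptions for illustration, not disclosed hardware figures.

```python
from fractions import Fraction

# Assumption: figure quoted from the launch presentation (not confirmed by nVidia).
int_ops_per_clock = Fraction(2)

# Assumption: integer is "twice as fast as a single FP16",
# so FP16 throughput is half the integer throughput.
fp16_ops_per_clock = int_ops_per_clock / 2

# Assumption: two FP16 instructions fit in the time of one FP32,
# so FP32 throughput is half the FP16 throughput.
fp32_ops_per_clock = fp16_ops_per_clock / 2


def cycles_per_instruction(ops_per_clock):
    """Convert a throughput (instructions per clock) into clocks per instruction."""
    return 1 / ops_per_clock


fp16_cycles = cycles_per_instruction(fp16_ops_per_clock)
fp32_cycles = cycles_per_instruction(fp32_ops_per_clock)

print("FP16 clocks per instruction:", fp16_cycles)
print("FP32 clocks per instruction:", fp32_cycles)
```

If both quoted claims hold, this gives one clock for FP16 and two clocks for FP32, which is the conclusion the question reaches. It says nothing about how the hardware actually schedules instructions.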