AFAIK:
Both the FIFO and the crossbar would sit between the pixel pipes and the ROPs. A FIFO is a first-in, first-out buffer. It makes sense when there are more pipes than ROPs, as the quads (groups of four pipes) just "drop" in their pixels as they finish them, and the buffer maintains a queue of pixels to be rendered by the available ROPs. You'd want a crossbar if there were as many ROPs as pipes, which there are in a 6800. The crossbar isn't a buffer; it's more of a traffic cop, directing pixels to free ROPs. I learned this myself not too long ago here.
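To make the difference concrete, here's a rough toy model of the two arrangements as I understand them. It is only a sketch: the function names, the one-pixel-per-ROP-per-cycle assumption, and the "first free ROP" routing rule are all made up for illustration and aren't taken from any real hardware documentation.

```python
from collections import deque


def fifo_dispatch(pixels, num_rops):
    """Toy FIFO: pipes drop finished pixels into one queue, and each
    cycle every ROP pulls the next pixel off the front (assumed rate:
    one pixel per ROP per cycle)."""
    queue = deque(pixels)
    cycles = []
    while queue:
        # Take up to num_rops pixels this cycle, oldest first.
        take = min(num_rops, len(queue))
        cycles.append([queue.popleft() for _ in range(take)])
    return cycles


def crossbar_dispatch(pipe_pixels, rop_busy):
    """Toy crossbar: no storage, just routing. Each pipe's pixel is
    sent to the first free ROP (hypothetical policy); a pipe with no
    free ROP gets None, meaning it has to wait."""
    free_rops = [i for i, busy in enumerate(rop_busy) if not busy]
    routes = {}
    for pipe, _pixel in enumerate(pipe_pixels):
        routes[pipe] = free_rops.pop(0) if free_rops else None
    return routes
```

With eight pixels and four ROPs, `fifo_dispatch(list(range(8)), 4)` drains the queue in two cycles, in order. `crossbar_dispatch(["a", "b"], [True, True, False])` sends pipe 0 to ROP 2 and leaves pipe 1 waiting, since nothing is buffered.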
Yes, the ROP is last in the pipeline (although the definition of pipeline is http://www.beyond3d.com/forum/...ic.php?p=378404#378404). You can see that in the NV40 (6800) diagrams posted at Anandtech (and everywhere else) in the initial 6800 p/reviews.
As for the X700XT barely beating the 6600GT in "fillrate limited games," there are a number of explanations. One, those games aren't that fillrate limited. Two, it's the drivers, particularly with OGL games. Three, the X700XT doesn't seem to have the bandwidth to really outpace the 6600GT in single-texture situations, where its extra ROPs would be apparent. (I guess this applies to alpha blends, too, but I'm really out of my depth here, more so than in the rest of this post.)
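A back-of-the-envelope way to see the third point: peak pixel fillrate is ROPs times core clock, but each written pixel also costs memory traffic, so the achievable rate is capped by bandwidth. The sketch below assumes 8 bytes per pixel (a 32-bit color write plus a 32-bit Z write, ignoring reads, compression, and caching), and the two chips are made-up examples, not the actual specs of the X700XT or 6600GT.

```python
def peak_fill_mpixels(rops, core_mhz):
    """Theoretical fillrate in Mpixels/s: one pixel per ROP per clock."""
    return rops * core_mhz


def bandwidth_fill_mpixels(bandwidth_gb_s, bytes_per_pixel):
    """Fillrate the memory bus can sustain, in Mpixels/s (1 GB = 1000 MB)."""
    return bandwidth_gb_s * 1000 / bytes_per_pixel


def effective_fill_mpixels(rops, core_mhz, bandwidth_gb_s, bytes_per_pixel=8):
    """Whichever limit is hit first: the ROPs or the memory bandwidth."""
    return min(peak_fill_mpixels(rops, core_mhz),
               bandwidth_fill_mpixels(bandwidth_gb_s, bytes_per_pixel))


# Hypothetical chips: same clock and bandwidth, different ROP counts.
chip_a = effective_fill_mpixels(rops=8, core_mhz=500, bandwidth_gb_s=16)
chip_b = effective_fill_mpixels(rops=4, core_mhz=500, bandwidth_gb_s=16)
```

Under these assumed numbers, chip A has twice the peak fillrate (4000 vs. 2000 Mpixels/s), but both end up at the same bandwidth-limited 2000 Mpixels/s, so the extra ROPs never get a chance to show.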
Edit: As for your first point, the 5800 and 5900 have the same theoretical fillrate simply b/c they had the same number of ROPs. Of course, they had far less pixel shader power than the 6800, which corrected that by both adding more pipes and making the shaders more capable. Someone noted how similar the 6600GT is to the 5800U. It is an interesting comparison, and shows how much nV has learned since then.