what's an ROP?

her34

Senior member
Dec 4, 2004
In reviews, aside from the usual stuff like vertex/pixel pipelines and clock speeds, I see ROP mentioned. What is it, and does it usually limit performance?
 

v8envy

Platinum Member
Sep 7, 2002
A ROP (aka render output unit) is what produces the final pretty pixel written to the frame buffer. All the shading and texture goop gets blended and otherwise combined by this unit into the final pixel value displayed on your screen.

It stands for (I think) Raster Operations Pipeline. These days, with various parts of the pipe being wider than others and decoupled from the ROPs, I'm not sure the "pipeline" part of that name really applies.

Edit: you bet it limits performance. You can only output as many final pixels per clock as you have ROPs. They can be prettier pixels if you have the math power earlier in the pipe, but this one is an absolute limit.
 

Dethfrumbelo

Golden Member
Nov 16, 2004
However, my understanding is that ROPs aren't even remotely a bottleneck, even on the X1900XT, which still has 16 ROPs, the same as the X800s. Pixel shader units and memory bandwidth seem to be the two biggest bottlenecks.

 

her34

Senior member
Dec 4, 2004
So all the ROPs determine is the maximum resolution/refresh rate a GPU will support?
 

v8envy

Platinum Member
Sep 7, 2002
Not refresh rate, but frame rate at a given resolution. The two don't have to be synchronized, you know. I run a 120 Hz refresh at 1280x1024, but my video card is sometimes only good for 2-3 fps (definitely not ROP-limited =).

And 4x the ROPs is a *huge* difference. That's the difference between rendering 800x600 and 1600x1200 at the same frame rate (roughly 120 fps on ~600 MHz cards, unless I typoed data into xcalc). Everything can be a bottleneck at some point, but 4 ROPs is nowhere near enough unless your target resolution is 800x600 or your target frame rate is 40 fps at 1600x1200.
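The scaling argument can be sketched in a few lines. This is a back-of-the-envelope model, not how any particular GPU works: it assumes each ROP emits one final pixel every `clocks_per_pixel` clocks, and the value 10 used below is a hypothetical guess. The absolute fps it prints depends on that guess and on the clock; the point is that 1600x1200 has 4x the pixels of 800x600, so 4x the ROPs at the same clock gives exactly the same ROP-limited frame rate.

```python
def rop_limited_fps(rops: int, clock_mhz: float, width: int, height: int,
                    clocks_per_pixel: float = 1.0) -> float:
    """Upper bound on frames/sec if ROP output were the only limit.

    Assumes each ROP emits one final pixel every `clocks_per_pixel` clocks;
    that figure is a guess, not a measured value.
    """
    pixels_per_sec = rops * clock_mhz * 1e6 / clocks_per_pixel
    return pixels_per_sec / (width * height)


if __name__ == "__main__":
    cpp = 10  # hypothetical clocks spent per final pixel
    low = rop_limited_fps(4, 600, 800, 600, cpp)
    high = rop_limited_fps(16, 600, 1600, 1200, cpp)
    # 4x the pixels with 4x the ROPs: both lines print the same bound
    print(f"4 ROPs @ 800x600:    {low:.0f} fps")
    print(f"16 ROPs @ 1600x1200: {high:.0f} fps")
```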

In summary, if you're targeting 40 frames/sec or a low resolution with very shader-, geometry- and texture-heavy rendering, you won't be ROP-limited with a 4-ROP card like the X1600. If you're trying to run high resolutions or high frame rates on relatively easy-to-compute scenes, you will be.

(P.S. The X1600 may also be bottlenecked on texturing, since it has only 4 texture units as well. The theoretical impact of that on performance is a whole different topic.)


 

Matthias99

Diamond Member
Oct 7, 2003
Originally posted by: v8envy
Everything can be a bottleneck at some point, but 4 ROPs is nowhere near enough unless your target resolution is 800x600 or target frame rate of 40 fps at 1600x1200.

In summary, if your target is 40 frames/sec or low res for very shader, geometry & texture heavy rendering, you won't be ROP limited with a 4 pipe card like the X1600. If you're trying to do high res or high frames of relatively easy to compute scenes, you will be.

Basically: you get, at most, one pixel per ROP per GPU clock cycle (that is, theoretical fillrate is actually ROPs * clock rate). It's meaningless to say that "X ROPs will let you run at resolution Y*Z" unless you also specify the clock frequency. More ROPs at the same speed will let you push higher framerates -- assuming, of course, that you have the memory bandwidth and other processing power (texture units, ALUs, shaders, etc.) to keep up.

800x600@60FPS is ~28MPixels/sec.
1600x1200@60FPS is ~115MPixels/sec.

An X1600 Pro has "only" four ROPs, but it runs at 500 MHz, giving it a theoretical fill rate of ~2.0 gigapixels/sec -- WAY more than enough to do 2048x1536@100FPS if it were rendering something absurdly simple. The limitation with most cards is that they just don't have the texturing capability and memory bandwidth to get anywhere near that kind of output unless you're playing something like Half-Life (1).
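The arithmetic in this post, as a quick sketch: theoretical fill rate is ROPs times clock, compared against the pixel rate a given resolution and frame rate demand. It assumes exactly one final-pixel write per screen pixel per frame (no overdraw, blending passes or AA), so the "headroom" it reports is a best case, not a prediction.

```python
def required_mpix(width: int, height: int, fps: float) -> float:
    """Mpixels/sec needed for one final-pixel write per screen pixel per frame."""
    return width * height * fps / 1e6


def fill_rate_mpix(rops: int, clock_mhz: float) -> float:
    """Theoretical fill rate in Mpixels/sec: at most one pixel per ROP per clock."""
    return rops * clock_mhz


if __name__ == "__main__":
    x1600_pro = fill_rate_mpix(rops=4, clock_mhz=500)  # 2000 Mpixels/sec
    for w, h, fps in [(800, 600, 60), (1600, 1200, 60), (2048, 1536, 100)]:
        need = required_mpix(w, h, fps)
        print(f"{w}x{h}@{fps}: {need:.1f} Mpix/s needed, "
              f"{x1600_pro / need:.1f}x headroom (best case)")
```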
 

v8envy

Platinum Member
Sep 7, 2002
Originally posted by: Matthias99

Basically: you get, at most, one pixel per ROP per GPU clock cycle (that is, theoretical fillrate is actually ROPs * clock rate).

And since the ROPs are also doing stuff like blending and AA, you're not going to get anywhere near the full theoretical pixel fill rate, except, as you said, in the absurd best case. I vaguely remember running across this discussion elsewhere, with people throwing around various figures for clock ticks per rendered pixel, all >>>> 1. I used 10.

Check out the pixel fill rate on the 7600 GT: 4480 Mpixels/sec. That's enough for 4480 * 1000000 / 1600 / 1200 = 2333 frames/sec at 1600x1200 if it took only one clock tick to render a pixel. The NV guys think it'll take around 20 clock ticks per pixel to do the final rendering in modern games with the settings enthusiasts want -- they're not going to simply waste silicon on double the fill rate of an X1600 Pro unless they felt it was needed.

Of course, the 7600 GT also has a 6720 MTexel/sec texture fill rate -- over 3x the X1600 Pro's 2 gigatexels/sec. So while 4 ROPs may not be *the* bottleneck for the X1600, they're very likely *a* bottleneck.
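A sketch of the 7600 GT vs. X1600 Pro comparison, using only the fill-rate figures quoted in this thread (in millions per second). The 20 clocks-per-pixel value is the figure the post attributes to the "NV guys", which is itself an inference, not a published spec.

```python
# Fill rates quoted in the thread, in millions per second: (pixels, texels)
CARDS = {
    "7600 GT": (4480, 6720),
    "X1600 Pro": (2000, 2000),
}


def fps_bound(mpix_per_sec: float, width: int, height: int,
              clocks_per_pixel: float = 1.0) -> float:
    """Frames/sec if pixel fill rate, divided by clocks_per_pixel, were the only limit."""
    return mpix_per_sec * 1e6 / clocks_per_pixel / (width * height)


if __name__ == "__main__":
    pix_gt, tex_gt = CARDS["7600 GT"]
    pix_pro, tex_pro = CARDS["X1600 Pro"]
    print(f"7600 GT, 1 clock/pixel:   {fps_bound(pix_gt, 1600, 1200):.0f} fps")
    print(f"7600 GT, 20 clocks/pixel: {fps_bound(pix_gt, 1600, 1200, 20):.0f} fps")
    print(f"pixel fill ratio vs X1600 Pro: {pix_gt / pix_pro:.2f}x")
    print(f"texel fill ratio vs X1600 Pro: {tex_gt / tex_pro:.2f}x")
```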

Edit: a little less handwaving, a bit more hard data:

http://www.simhq.com/_technology2/technology_075d.html

Using the 3DMark05 pixel/texture fill rate benchmarks, which presumably model modern games, gets us a ballpark of 114 fps at 1024x768 with the 590 MHz X1600 XT. 114 * (500/590) guesstimates 96 fps for the Pro at 1024x768, and 96 * ((1024 * 768) / (800 * 600)) guesstimates 150-ish fps at 800x600 for the X1600 Pro. Not far off from my wild guess of 120. Am I good at picking random numbers or what?
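The two scaling steps above, sketched under the post's implicit assumption that performance is purely fill-rate bound: fps scales linearly with core clock and inversely with pixel count. The 114 fps starting point is the figure quoted from the linked benchmark; without rounding to 96 in between, the result lands a little higher, around 158 fps.

```python
def scale_by_clock(fps: float, from_mhz: float, to_mhz: float) -> float:
    """Assume fps scales linearly with core clock (fill-rate-bound guess)."""
    return fps * to_mhz / from_mhz


def scale_by_resolution(fps: float, from_res: tuple, to_res: tuple) -> float:
    """Assume fps scales inversely with the number of pixels per frame."""
    (fw, fh), (tw, th) = from_res, to_res
    return fps * (fw * fh) / (tw * th)


if __name__ == "__main__":
    # X1600 XT (590 MHz) result scaled down to the X1600 Pro's 500 MHz
    pro_1024 = scale_by_clock(114, from_mhz=590, to_mhz=500)
    pro_800 = scale_by_resolution(pro_1024, (1024, 768), (800, 600))
    print(f"X1600 Pro @ 1024x768: ~{pro_1024:.0f} fps")
    print(f"X1600 Pro @ 800x600:  ~{pro_800:.0f} fps")
```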