OpenGL has better image quality than DX11

Anarchist420

Diamond Member
Feb 13, 2010
One major reason is that OpenGL has 32-bit log z-buffers, while the best that DX has to offer is a 32-bit float reverse z-buffer.
That means OpenGL would actually be better for an Xbox classic emulator than DX11 would. Not only is DX11 unable to match the w-buffer (which was used by many Xbox classic games) 100% of the time, but 32-bit reverse float z-buffers aren't enough to replicate the PS2 100% of the time either. Also, don't forget that pre-DX9 games couldn't look as good on DX11 as they could with an OpenGL wrapper. UT99's (unofficial) DX11 renderer doesn't look as good as an OpenGL renderer could, because UT99's original rasterizers used w-testing (for PowerVR's proprietary API) or w-buffering (for Glide).
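
To make the difference concrete, here is roughly the per-pixel value each scheme stores. This is just an illustrative sketch in HLSL; the helper names and the C constant are made up, not from any particular API.

Code:
// Hypothetical helpers contrasting the depth schemes discussed above.
// fViewZ is view-space distance, fFar the far-plane distance, fC a tuning constant.

// hyperbolic z/w: most of the precision bunches up near the camera
float DepthZBuffer(float4 f4ClipPos)
{
    return f4ClipPos.z / f4ClipPos.w;
}

// w-buffer: linear in view-space distance
float DepthWBuffer(float fViewZ, float fFar)
{
    return fViewZ / fFar;
}

// logarithmic z: relative precision spread roughly evenly over the range
float DepthLogZ(float fViewZ, float fFar, float fC)
{
    return log(fC * fViewZ + 1.0) / log(fC * fFar + 1.0);
}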

Depth precision has been the most neglected aspect of 3D graphics ever since its inception, IMO. Before G80, the only consumer hardware rasterizers that did 32-bit fixed-point z-buffers were Matrox's G400 and its derivatives, the PS2's GS, and the Rage 128 and R200 (though I am not 100% sure about the latter two).

We have seen render-target precision increase to ARGB16 fixed point, ARGB16 float, and ARGB32 float, but DX still doesn't offer anything that is a perfect match for the w-buffer.

Another advantage of 32-bit log z-buffers is that they don't have a stencil buffer attached, which means brand-new shadowing algorithms would have to be used, and that is a good thing. 8-bit stencil buffers have been used for shadows for way too long; fully programmable shadows via shaders would be much nicer.

Ideally, programmable blending and depth would be used, since software is more versatile than fixed function. Software rendering would probably not be as fast or look as good at mid-range settings, but assuming the FPU did double precision, software rendering would definitely be able to do even higher quality than hardware (while being slower), and it would also be capable of doing lower quality than hardware (while being faster). And of course, software rendering is directly to the metal. It would best be done on an ISA better at graphics rendering than x86, with two or even three general-purpose dies plus display logic, and probably texture units integrated into one of those dies, but it would definitely be more versatile. God in heaven help us if we ever see hardware ray tracing make it to store shelves.

Your thoughts?
 

brandonb

Diamond Member
Oct 17, 2006
DX11 allows the developer to create a shader that writes to the depth buffer, which gives developers the ability to put any type of math into the z-buffer calculations, including log z-buffers.

(I've done it)
 

Anarchist420

Diamond Member
Feb 13, 2010
Well, according to dxcapsviewer, DX11 doesn't support D32 fixed point. I am not saying you are wrong, but why can't D32 be enabled in Slave Zero, which used DX6? Also, see here. Is the PCSX2 dev wrong? Thank you for your reply. :)
 

brandonb

Diamond Member
Oct 17, 2006

I don't know all the details.

But I can guess.

It's because the z-buffer is naturally floating point. Fixed point (which I assume means integer based) just isn't needed unless you have a stencil attached, since stencil buffers are integer based. In other words, stencils require integer, so any depth buffer attached to a stencil also needs to be in an integer format. But if you have a 32-bit depth buffer with no stencil, there is no need for a D32 integer format.

But looking at the DirectX documentation, there is a 16-bit integer depth format with no stencil attached, but not a 32-bit integer depth format; 32-bit requires float.

Not sure what the logic is behind that.

But anyway, you can still create a shader with a 32-bit float depth format, or with a 24-bit integer depth + 8-bit stencil format, and add in a logarithm.

I believe the reason most don't is that when you create a shader that writes the depth buffer, the video card's optimizations are disabled, so performance is reduced.
 

Anarchist420

Diamond Member
Feb 13, 2010
Thank you. :) I still don't know how a shader can turn a float z-buffer into a log one, but thanks for explaining anyway. :) I just don't get why OpenGL has 32-bit fixed-point z-buffer support if a 32-bit float z can do everything. I also thought logarithmic z-buffers would require a larger integer range than a 32-bit float has; in other words, I thought that more decimal places wouldn't be as conducive to a logarithmic z-buffer as fixed-point math would. Also, I have read that 32-bit float reverse z-buffers are about the same as 24-bit fixed-point log z-buffers, which makes sense, as Xbox 360 games used a 32-bit float reverse z-buffer while the DX9 PC versions used a 24-bit fixed-point z-buffer format.
 

brandonb

Diamond Member
Oct 17, 2006
Here is some shader logic I have. It's very basic; all it does is calculate depth values (into the z-buffer) for a shadow map:

Code:
// PS_RenderSceneInput / PS_RenderOutputSM are defined elsewhere in the effect;
// O.fColor is the 32-bit float that ends up as the shadow map's depth value.
PS_RenderOutputSM PS_RenderSceneSM(PS_RenderSceneInput I)
{
    PS_RenderOutputSM O;

    O.fColor = I.f4Position.z / I.f4Position.w;   // standard z / w depth

    return O;
}

The line writing O.fColor is the math for the output to the depth buffer, which is a 32-bit float in my case. O.fColor is defined as a 32-bit float because my depth buffer is a 32-bit float.

Normally, a z-buffer depth write just takes z / w as shown above. Video cards are optimized to handle this without a shader and are about twice as fast as doing the shader logic myself, but the results are the same.

Adding in a log would make the shader look like this:

Code:
    O.fColor = log(I.f4Position.z / I.f4Position.w);

(log is a math operation like on a calculator; it could be sin/cos or any other logic I wish to put in there.) It helps smooth out the results in the z-buffer. Naturally, the z/w math ends up with most of the precision near the camera: the first 10% of the camera/view space holds 90% of the z-buffer precision. Log smooths that out, so the first 10% of the camera/view space uses about 10% of the z-buffer precision, and so on down the line.

Now, the log operation is actually more complex than I wrote above, but I wanted to show you how it works in general. The programmer can put whatever math they want in there, and there are pages upon pages on the web with various techniques for distributing z-buffer depth in a more intelligent way. In the end, O.fColor is just a float that takes a number to indicate the depth. How I get there, or what I write as that depth, is up to me.
 

Exophase

Diamond Member
Apr 19, 2012
The most accurate solution for emulating a console is almost always going to require software rendering (which could include doing it in a compute shader). Some OpenGL version may get you closer than a DirectX version or vice versa, but you probably won't get pixel-perfect results. I don't think it's really fair to criticize a 3D API for not being compliant with some console specification that games (probably unwittingly) abused.

You even say you want better precision than the hardware you're emulating. What if that actually caused glitches too? It's not far-fetched; I see things exactly like that in DS emulation.

And then there are some features you just aren't going to get, like real order-independent translucency on Dreamcast. I don't know about classic Xbox features in depth, but emulation has a long way to go before worrying about edge cases in precision between 32-bit float z-buffers and w-buffers.

A floating-point reversed depth buffer format is already effectively logarithmic in its precision steps, even if a 32-bit float isn't as precise as a 32-bit int log for depth. But it's still good enough for almost all applications. And this is assuming the alternative is a true 32-bit integer depth format. What you might actually get is hardware that uses 32-bit floats for depth internally and a trivial transformation to convert that into integer, which is still missing precision.

OpenGL also has its own headaches with its -1 to 1 depth range. There's a good article covering depth buffer precision here:

http://outerra.blogspot.com/2012/11/maximizing-depth-buffer-range-and.html
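
If I remember right, one of the tricks covered there is doing logarithmic depth in the vertex shader. Roughly, adapted to HLSL and D3D's 0-to-1 depth range, it looks like this (my sketch, not code from the article; the names are illustrative):

Code:
// Logarithmic depth computed in the vertex shader. g_fFcoef is
// 1.0 / log2(fFarPlane + 1.0), supplied by the application.
cbuffer CBPerObject : register(b0)
{
    float4x4 g_mWorldViewProj;
    float    g_fFcoef;
};

struct VS_Input  { float3 f3Position : POSITION; };
struct VS_Output { float4 f4Position : SV_POSITION; };

VS_Output VS_LogDepth(VS_Input I)
{
    VS_Output O;
    O.f4Position = mul(float4(I.f3Position, 1.0), g_mWorldViewProj);

    // Replace clip-space z with a logarithmic distribution, then pre-multiply
    // by w so the hardware's later divide by w cancels it back out.
    O.f4Position.z = log2(max(1e-6, O.f4Position.w + 1.0)) * g_fFcoef;
    O.f4Position.z *= O.f4Position.w;

    return O;
}

Done in the vertex shader like this, early-z keeps working, though long triangles close to the camera can show interpolation artifacts.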

Bottom line: it hasn't been done because there isn't a demand.
 

Anarchist420

Diamond Member
Feb 13, 2010
Exophase said:
The most accurate solution for emulating a console is almost always going to require software rendering (which could include doing it in a compute shader).
True.
But in the case of a Dreamcast or Xbox classic emulator, I would want it to be an enhanced software renderer, with better trilinear filtering and proper rotated-grid AA. I was thinking that a GK110 could do most or even all of the graphics part in CUDA, and the rest of the graphics plus everything else could be emulated on a 4790. Nvidia has the documentation on the NV2A, and the Xbox classic used an Intel x86 processor, so I'm thinking it wouldn't be that hard to actually simulate if enough effort were made.
 

Madpacket

Platinum Member
Nov 15, 2005
I'm a complete newb here but trying to follow along. I like what Anarchist is proposing. Given that we use high-speed multi-core systems, would it be possible to dedicate some of the CPU to rendering better or more accurate depth precision, or would this be too computationally expensive? Or can a 3D engine even render some of the objects on the CPU at the same time as the video card handles the lion's share of the work?
 

Exophase

Diamond Member
Apr 19, 2012
Madpacket said:
Given that we use high-speed multi-core systems, would it be possible to dedicate some of the CPU to rendering better or more accurate depth precision, or would this be too computationally expensive?

I guess you could do a depth pre-pass on the CPU instead of the GPU (meaning the CPU has to do rasterization too, which it isn't really great at), but it's not worth it. A GPU would probably still be better at doing this even with shader-controlled 64-bit depth. There's just no driving incentive to use better depth buffers; see the article I linked to. 32-bit floats are more than good enough if massaged properly.

Anarchist420 said:
Nvidia has the documentation on the NV2A, and the Xbox classic used an Intel x86 processor, so I'm thinking it wouldn't be that hard to actually simulate if enough effort were made.

You mean it wouldn't be that hard if nVidia were the ones writing it? That wouldn't happen in a million years, so what difference does it make?