Thanks for the response, Ben, but the ad hominem attacks aren't necessary. Either way, here goes the counter-counter-ad infinitum-counter argument:
The GPU has a texture cache, that is very basic.
Ok, fair enough. I suppose these specs aren't usually published, which is why I wasn't aware of them.
Eight, bilinear takes four.
Right. I forgot it's 4 texture reads for each of the two levels (that's the bilinear part)...I suppose I brainfarted this morning. But either way, if a game uses a mipmapped 1024x1024 (for level 0) texture, how would the hardware sampling algorithm change if it's run on the console vs the PC (at the same resolution)? The point is that it wouldn't, because these algorithms are implemented the same way in silicon. And are you saying that PC developers don't care at all how fine/coarse their textures are? A PC dev would notice the difference in their game's performance between 512^2 and 1024^2 level 0 textures, and they would optimize just the same as a console dev would. So there's nothing different between the two, and therefore nothing to "optimize" in this very specific scenario.
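To make the point concrete, here's a back-of-the-envelope sketch in C (the numbers and function names are mine, not from any actual driver): trilinear is 8 texel reads per sample on any GPU, and the 1024^2 mip chain is roughly 4x the data of the 512^2 one, which is exactly the kind of cost a PC dev would notice too:

```c
#include <stdio.h>

/* Rough illustration (not any particular driver's code): trilinear filtering
 * reads 2x2 texels from each of two adjacent mip levels, i.e. 8 texel reads,
 * regardless of whether it runs on a console GPU or a PC GPU. */
#define BILINEAR_TAPS 4
#define TRILINEAR_TAPS (2 * BILINEAR_TAPS)

/* Total texels in a full mip chain for a square base level of size n x n
 * (n assumed to be a power of two). */
static unsigned long long mip_chain_texels(unsigned int n)
{
    unsigned long long total = 0;
    while (n >= 1) {
        total += (unsigned long long)n * n;
        if (n == 1) break;
        n /= 2;
    }
    return total;
}

int main(void)
{
    printf("texel reads per trilinear sample: %d\n", TRILINEAR_TAPS);
    printf("512^2 chain:  %llu texels\n", mip_chain_texels(512));
    printf("1024^2 chain: %llu texels\n", mip_chain_texels(1024));
    /* The 1024^2 chain is ~4x the data of the 512^2 chain, which is why a PC
     * dev cares about level 0 size just as much as a console dev does. */
    return 0;
}
```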
If you run at a higher resolution the proper calculations for LOD bias are altered hence making higher resolution mip maps proper under those conditions.
Basically, what you're saying is that a bias towards level 0 is effectively applied when sampling the same texture at a higher res setting compared to a lower res one (all else being equal). This follows from the fact that finer samples are needed because the texture is sampled more finely on a higher res screen. Ok, I agree. But this doesn't cause the texture cache to miss more, percentage-wise. Why not? Sure, as you said, lower (hence finer) mipmap levels get loaded into the texture cache, and a lower mipmap level is higher res, so more data has to be pulled into the cache. But this is exactly cancelled out (because the additional bias is based solely on the screen res) by the higher resolution putting screen samples closer together, and therefore the cache hit-to-miss ratio stays the same.
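If it helps, here's the toy math I have in mind (my own simplified model, assuming the usual "one texel per pixel" mip selection rule, not anything from a real spec): doubling the screen resolution shifts the LOD one level finer, but the texels fetched per pixel stay the same, so the cache sees the same hit-to-miss ratio:

```c
#include <math.h>
#include <stdio.h>

/* Toy model (my numbers, not from any spec): a screen-aligned quad that is
 * `screen_px` pixels wide displays a `tex_size` texel-wide texture.
 * The hardware picks the mip level where one texel maps to ~one pixel:
 *   lod = log2(texels per pixel at level 0)
 */
static double lod_for(double tex_size, double screen_px)
{
    return log2(tex_size / screen_px);
}

int main(void)
{
    double tex = 1024.0;

    double lod_low  = lod_for(tex, 640.0);   /* lower screen resolution */
    double lod_high = lod_for(tex, 1280.0);  /* double the resolution   */

    printf("LOD at  640 px wide: %.2f\n", lod_low);   /* coarser level   */
    printf("LOD at 1280 px wide: %.2f\n", lod_high);  /* one level finer */

    /* At the finer level the mip is 2x wider, but there are also 2x as many
     * pixels sampling it, so texels fetched per pixel -- and hence the
     * cache hit-to-miss ratio -- stays roughly constant. */
    double texels_per_px_low  = (tex / pow(2.0, ceil(lod_low)))  / 640.0;
    double texels_per_px_high = (tex / pow(2.0, ceil(lod_high))) / 1280.0;
    printf("texels per pixel: %.2f vs %.2f\n",
           texels_per_px_low, texels_per_px_high);
    return 0;
}
```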
Furthermore, "proper" should really be "samples the correct mipmap level for the screen resolution." Maybe that's what you meant anyway.
No, that tends to be there to reduce the strain on board level memory, not on texture cache level memory.
The "texture detail" I was referring to was exactly that: the onboard video memory (remember, I didn't realize that there's a texture cache...so everything I was referring to was with regard to the onboard memory), not the texture cache memory. I suppose I should take it that you're agreeing with me here.
You going to leave those vertex shaders you had assigned idle so that a fixed amount will be used for those shader ops at all times?
This is where my argument to your argument about my other argument (ok, this is getting confusing...just trace the series of arguments backwards, all the way back to "Some of them will always be configured to be vertex shaders while others will be pixel shaders." and read the sentence right after it) gets taken out of context. My argument basically agrees with your statement that the shader units will be assigned to exactly what's needed...but that means the OS/driver already knows exactly what's needed and will automatically use the correct shader, which means there's nothing left to optimize.
Shader hardware debuted on PCs with the GeForce3- there were no pixel and vertex shaders before then and using some sort of magic with fairy dust somehow games managed to be made....
Ah, I was using "shader" in its generic sense, not the one you're referring to. What you're referring to is a "custom shader," not just a "shader." I consider fixed-functionality a shader, since after all, it does define the final surface property...just in a fixed way, not a custom way like what the GF3 introduced.
As for the interpolated part, sorry, I knew something would go wrong with a large post. I meant to say "and let them get interpolated" but totally didn't type what I meant. I apologize for the misunderstanding (yes, interpolation happens in between the vertex and pixel shaders).
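In case it helps, this is all I meant by "interpolated" (a minimal barycentric sketch in C, ignoring perspective correction; the names are mine):

```c
#include <stdio.h>

/* The vertex shader writes per-vertex attributes, and the rasterizer blends
 * them across the triangle before the pixel shader ever sees them. */
static float interpolate(float a, float b, float c,
                         float wa, float wb, float wc)
{
    /* wa + wb + wc == 1 for a point inside the triangle */
    return a * wa + b * wb + c * wc;
}

int main(void)
{
    /* One attribute channel at the 3 corners, evaluated at the centroid. */
    printf("%.3f\n", interpolate(0.0f, 1.0f, 0.5f,
                                 1.0f/3, 1.0f/3, 1.0f/3));
    return 0;
}
```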
Moreover, if a pixel shader is needed, then no, the end result usually wouldn't just be a texture fetch (then there'd be no need for the pixel shader), but rather a blend of some texture and some custom lighting (like Ward materials or something instead of just plain old Phong).
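Something like this is what I mean, written out as plain C rather than actual shader code (the function names and the Phong term are just stand-ins I picked): the fetched texel feeds the lighting math, it isn't the final answer by itself:

```c
#include <math.h>
#include <stdio.h>

typedef struct { float r, g, b; } color;
typedef struct { float x, y, z; } vec3;

static float dot3(vec3 a, vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

/* Stub for the texture unit; in a real pixel shader this is one fetch. */
static color sample_diffuse_map(float u, float v)
{
    (void)u; (void)v;
    color c = { 0.6f, 0.5f, 0.4f };
    return c;
}

/* Plain old Phong as the "custom lighting" placeholder.
 * n, l and view are assumed normalized. */
static color shade_fragment(float u, float v, vec3 n, vec3 l, vec3 view)
{
    color albedo = sample_diffuse_map(u, v);

    float ndotl = fmaxf(dot3(n, l), 0.0f);
    /* reflect l about n: r = 2*(n.l)*n - l */
    vec3 r = { 2*ndotl*n.x - l.x, 2*ndotl*n.y - l.y, 2*ndotl*n.z - l.z };
    float spec = powf(fmaxf(dot3(r, view), 0.0f), 32.0f);

    color out = { albedo.r * ndotl + spec,
                  albedo.g * ndotl + spec,
                  albedo.b * ndotl + spec };
    return out;
}

int main(void)
{
    vec3 n = { 0, 0, 1 }, l = { 0, 0, 1 }, v = { 0, 0, 1 };
    color c = shade_fragment(0.5f, 0.5f, n, l, v);
    printf("%.2f %.2f %.2f\n", c.r, c.g, c.b);
    return 0;
}
```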
I said poorly optimized branch- again very PC centric thinking.
Yes, poorly optimized branch prediction (more like branch-unfriendly, since the actual prediction is done by the hardware and the dev has no control over that). And? Which PC game uses branch-unfriendly shader code? Most game shaders run at 100 instructions or less...what, you're going to have 100 branch instructions and nothing else? No. The majority, if not all, of the instructions won't be branch-dependent. Most of these shaders are deterministic material calculations that require only lookups and computation, not branching. It's not like games are doing GPGPU or something.
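For example, here's the kind of straight-line material math I'm talking about, as a toy C sketch (my own made-up example, not from any shipping game): selection is done with a saturate/lerp, not a branch:

```c
#include <math.h>
#include <stdio.h>

/* A handful of multiply-add style ops with no data-dependent branches for
 * the hardware to mispredict. */
static float saturate(float x) { return fminf(fmaxf(x, 0.0f), 1.0f); }
static float lerp(float a, float b, float t) { return a + (b - a) * t; }

/* Blend between a "dry" and "wet" diffuse value by a wetness mask -- the
 * kind of thing a shader does with a mix/lerp instruction, not an if. */
static float material_diffuse(float dry, float wet, float wetness, float ndotl)
{
    float albedo = lerp(dry, wet, saturate(wetness));
    return albedo * saturate(ndotl);
}

int main(void)
{
    printf("%f\n", material_diffuse(0.8f, 0.3f, 0.5f, 0.7f));
    return 0;
}
```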
You seem to be completely ignoring the exploitation of proper levels of parallelism in code, for any GPU that is enormously important.
That's the point of hardware rasterization. After the set of fragments representing a triangle is rasterized, the fragments are processed all at once (this is part of the reason why binding an FBO's destination as a texture source is VERY bad). The point is that the exploitation is automatic and requires no intervention whatsoever (not that there's a need for any). So it's impossible for a dev to "exploit" pixel-level parallelism, because it's already fully exploited and there's nothing left over (this is the beauty of graphics...it's the most easily parallelizable discipline within computer science...and these days you don't even need to parallelize it by hand!).
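To spell out what "automatic" means here, a toy model in C (my sketch of the concept, not how any driver is actually written): no fragment depends on any other fragment's output, which is exactly the property the hardware exploits for you:

```c
#include <stdio.h>
#include <stddef.h>

typedef struct { float u, v; } fragment_in;
typedef struct { float r, g, b, a; } fragment_out;

typedef fragment_out (*pixel_shader_fn)(fragment_in);

/* Each iteration is independent, so the hardware is free to run them all at
 * once without the developer doing anything. Binding the render target as a
 * texture source at the same time (the FBO feedback case) is exactly what
 * breaks this independence. */
static void shade_fragments(const fragment_in *in, fragment_out *out,
                            size_t count, pixel_shader_fn shader)
{
    for (size_t i = 0; i < count; ++i)
        out[i] = shader(in[i]);
}

/* Trivial example shader: paint every fragment with its own UVs. */
static fragment_out uv_shader(fragment_in f)
{
    fragment_out o = { f.u, f.v, 0.0f, 1.0f };
    return o;
}

int main(void)
{
    fragment_in in[3] = { {0.0f, 0.0f}, {0.5f, 0.5f}, {1.0f, 1.0f} };
    fragment_out out[3];
    shade_fragments(in, out, 3, uv_shader);
    printf("%.1f %.1f %.1f %.1f\n", out[1].r, out[1].g, out[1].b, out[1].a);
    return 0;
}
```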
The Emotion Engine supports everything in OpenGL 12.9- the Emotion Engine is a general purpose PROCESSOR.
Ohhhhhh. Ok. I see it now. But from the performance specs (I have no access to an actual dev kit), it still seems quite capable. So what's the EE's non-software-emulation mode? Even without aniso, trilinear, mipmapping, T&L, and all that jazz, it still has an impressive throughput, and that throughput is surely being spent somewhere on its graphics output. Of course, without any of those features and with a high throughput rate, devs need to spend time getting used to coding for the PS2. But it's not the same with the 360 (the argument was about the 360, remember), since MS DOES provide the libraries, so there isn't all that much to "get used to" on the 360.
As a general rule of thumb an API works as an abstraction layer and prevents you from hitting peak levels of performance.
Granted, as you said, APIs tend to degrade performance...but by how much? If the API drops normalized efficiency from a peak of 100% down to 50%, then sure, I admit it: once the console makers totally open up their hardware, I can see a 2X boost in performance and I'll retract my original argument. But if we're talking about going from 100% to 95%, then that ~5% potential performance boost is hardly "a lot of room," in which case, yes, we have already hit the GPU's performance limit.
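Just to show the arithmetic I'm doing (made-up efficiency numbers, obviously):

```c
#include <stdio.h>

/* The most you can gain by bypassing the API is peak performance divided by
 * what the API already delivers. */
static double max_speedup(double api_efficiency)
{
    return 1.0 / api_efficiency;
}

int main(void)
{
    printf("API at 50%% of peak -> up to %.2fx from going to the metal\n",
           max_speedup(0.50));
    printf("API at 95%% of peak -> up to %.2fx from going to the metal\n",
           max_speedup(0.95));
    return 0;
}
```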