Are the new consoles CPU limited?


BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
No, they are quite weak. Games will be hard-pressed to keep up with advanced AI and physics on top of the normal CPU stuff. On the other hand, the GPUs are very advanced, disproportionately so.

I suppose it is possible to type something less accurate, but I can't think of it off the top of my head. The X360's CPU may not be the monstrous beast that Cell is, but saying that it is overshadowed on a technical basis by the R500 borders on lunacy. The R500 is where MS saved the most money vs. the original XBox. While they really cheaped out with the first solution, going with a sickly x86 and its lethargic x87, they improved their performance by orders of magnitude with the 360.

That's why a company like Epic (or whoever the Unreal Engine 3.0 creators are) has a leg up on most console developers: they were expecting such hardware to come into spec eventually, so they just have to adapt their engine to run on the 360 hardware.

Epic is actually known for being extremely inept at console engine development, something they have proven every time they have touched code aimed at the consoles.

if it's CPU bound, then decreasing the resolution won't make a difference...also, the texture samples take up the same amount of memory no matter what the screen resolution is

This is a good example of sloppy-coding PC mentality, and where console devs slaughter PC devs. Running at a higher resolution alters your LOD bias, altering the mip selection, which increases your texture cache usage and increases cache misses and stalls. For a sloppy-as-he!l PC game it doesn't matter; on a console, all of those factors are taken into consideration when they start trying to extract peak performance.

Some of them will always be configured to be vertex shaders while others will be pixel shaders.

Again, in a sloppy PC-coded environment you are right; not on the consoles.

The fact remains that ALL of the 48 shaders are used because the instruction dispatches to each of the shader units are automatic.

Poorly optimized branch: lose a couple dozen clock cycles. Dependent loop falling down: lose a couple hundred clock cycles. All standard issue for PC games. These things are fixed in later-life-cycle console titles.

we know that the CPU won't make a lot of difference when we talk about real-time rendering

The PS2's GS is a really fast Voodoo1 in essence, except it doesn't support all of the features the V1 does. Console devs are entirely different from PC devs; they are vastly superior at getting more performance out of less hardware, and they have proven this every generation. I pulled up your quote because the little 300MHz MIPS core in the PS2 is capable of closing the gap between what amounts to a Voodoo1 and a GeForce2, with power levels significantly lower than what is currently sitting idle in the X360's most CPU-intensive titles. You really shouldn't continue to seriously underestimate what devs are going to be able to do with the hardware. Wait until TeamNinja starts showing off their second-gen X360 titles and then come back and talk about how there weren't enormous amounts of performance left to be had.
 

Acanthus

Lifer
Aug 28, 2001
19,915
2
76
ostif.org
Originally posted by: Fadey
Uh, not really; a 7800GTX, the 256 one anyway, has problems running FEAR maxed out with soft shadows etc., or 16x AA, or really even 8x.

Soft shadows + AA doesn't work in FEAR.

And on top of that, at a resolution that low, a GTX would scream at max settings.
 

dunno99

Member
Jul 15, 2005
145
0
0
graphics engines built around the hardware can allow for developers to dedicate far more power to their choice vertex or pixel shading than currently available modern graphics cards

Sure, I don't doubt that at all. But do we know if PGR3 itself has its engine built around the hardware? And is the hardware so different that they need time to get used to it? Remember, there's a tendency toward 80/20 here: 80% of the job done with 20% of the effort. So unless Bizarre only spent 10% of the effort, I think the large majority of the GPU's processing power is already put to use. Sure, second- and third-generation games might improve by 20% (whatever 20% means...I'm not really sure =P), but it's far from the general sense of "vast amount of power to be tapped."

Also, do you not realize that your comparison between the eDRAM on the Xenos and the main memory on the x1800xt is flawed?

Ok, I just checked the numbers again. Apparently it's 22.4GB/s to main memory, ~25GB/s over the FSB, and 256GB/s to the eDRAM. So essentially we can assume that writing to the framebuffer for display is free. Hrmmm, well, I don't really know what % of bandwidth writing to the frame buffer takes, so I guess I will refrain from judgement. (But remember that the eDRAM is only for writing, not reading...so no "free" textures can be stored there.)
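
Just to get a rough sense of scale, here's a back-of-envelope sketch with my own guessed numbers (720p, 32-bit color plus Z, 4x AA, 60fps, some overdraw), nothing official:

```cpp
#include <cstdio>

int main() {
    // All of these are my own guesses -- just to get an order of magnitude.
    const double width = 1280, height = 720;   // 720p
    const double bytes_per_sample = 4 + 4;     // 32-bit color + 32-bit Z
    const double msaa = 4;                     // 4x multisampling
    const double fps = 60;
    const double overdraw = 3;                 // assume each pixel gets touched ~3x

    double bytes_per_frame = width * height * bytes_per_sample * msaa * overdraw;
    printf("Framebuffer writes: %.1f GB/s\n", bytes_per_frame * fps / 1e9);
    // ~5 GB/s of pure writes with these guesses; Z-test and blending
    // read-modify-write traffic would roughly double it. That's a big chunk
    // of a 22.4GB/s bus, which is presumably why the framebuffer lives on
    // the 256GB/s eDRAM instead.
    return 0;
}
```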

I disagree, dunno, there is always much to be learned about the hardware; none of the programmers have ever dealt with a 6-thread, 48-pipe unified architecture. I honestly doubt the rumors are true about PGR3; I've played the game, and there are zero slowdowns at all... it's gorgeous. I look forward to what's to come.

All the "unified" architecture means is that now they have virtually 48 vertex pipes or 48 pixel pipes if they only use one of the functions. But the realflexibility comes from the fact that the dev does not have to worry about balancing the different pipes because the underlying OS/drivers/hardware will automatically take care of that for them. If anything, this would mean that there's less room to optimize (because they get that increase in performance immediately...meaning that first gen 360 games will automatically operate at higher efficiency than other consoles), but they start off in a much better position.

This is a good example of sloppy-coding PC mentality, and where console devs slaughter PC devs. Running at a higher resolution alters your LOD bias, altering the mip selection, which increases your texture cache usage and increases cache misses and stalls. For a sloppy-as-he!l PC game it doesn't matter; on a console, all of those factors are taken into consideration when they start trying to extract peak performance.

Err? There's cache on the GPU? Mipmaps are all stored on the GPU's memory. Every trilinear filtered texture lookup requires at least 2 reads (more if you use anisotropic filtering); one from the next higher level and one from the next lower level and then a weighted average (the weights being: 1 - how "far" the actual sample is from the two levels) is computed on the two samples...there is no "extra" usage. And if you're talking about using the LOD bias to eliminate lower (hence higher res) levels of mipmaps, then one can do exactly the same to PC games AND console games. This is exactly the reason why you see options like "texture detail" in games such as Doom 3. (and I knew someone was going to use that argument...that's why I preemptively said texture samples take up the same amount of memory...I wasn't expecting my own argument to be "countered" with the argument that I was trying to counter in the first place...weird)
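
To put the trilinear part in code form, here's a toy sketch of the level blend I'm describing (my own illustration; real hardware does this in fixed-function silicon, not C++):

```cpp
#include <algorithm>
#include <cmath>

// Toy sketch of the trilinear mip blend (illustration only -- real hardware
// does this in fixed-function silicon). 'lambda' is the level of detail the
// rasterizer computed for this pixel.
struct MipBlend { int level_lo, level_hi; float weight_hi; };

MipBlend pick_mip_levels(float lambda, int num_levels) {
    lambda = std::clamp(lambda, 0.0f, float(num_levels - 1));
    MipBlend b;
    b.level_lo  = int(std::floor(lambda));                  // finer level
    b.level_hi  = std::min(b.level_lo + 1, num_levels - 1); // coarser level
    b.weight_hi = lambda - float(b.level_lo);               // fractional part
    // final color = (1 - weight_hi) * bilinear(level_lo)
    //             +      weight_hi  * bilinear(level_hi)
    return b;
}
```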

Again, in a sloppy PC coded environment you are right, not on the consoles.

What? All of a sudden a 3D game won't be using vertex shaders? Or pixel shaders? Where're the pixel shaders going to get their interpolated pixels if one doesn't transform the vertices, calculate their colors, and interpolate them using the vertex shader? On the other hand, where is the game going to get its pixels on screen if it doesn't use pixel shaders? So, yes, both shaders have to be used, no matter if it's console or PC development.

Poorly optimized branch- lose a couple dozen clock cycles, dependant loop falling down, lose a couple hundred clock cycles. All standard issue for PC games. These things are fixed in later life cycle console titles.

IIRC, branch prediction isn't used all that much at the moment. Sure, SM3.0 allows 65536 instructions per shader (or was it infinite? I don't remember which number applies to the vertex and which to the pixel shaders), but that doesn't mean games these days take advantage of them (and not that they will need to in order to get good looking games...hey, a madd used is a madd used...it doesn't matter if it's for the second part of a short shader or part of a longer shader. And yes, this is another pre-emptive argument, if you can't tell). So, that argument is rather moot, since most, if not all, games these days don't branch all that much. They have simple, straightforward, and deterministic shader code.

The PS2's GS is a really fast Voodoo1 in essence

Voodoo...1? If you're saying Voodoo 5, I can understand. I'm sure the Emotion Engine supports most of the stuff in OGL 1.3 or so (I personally believe this is the first OGL version that is really usable...then again, it could be just me, whose first Red Book was based on 1.3 =P...moreover, OGL finalization is usually late by 1-2 years compared to other APIs). And yes, OGL 1.3 is more than capable of generating the graphics you see on the PS2 today. As a matter of fact, I'm sure that a GeForce3 (which I think should technically be the correct comparison to the EE in terms of performance, but not by date) can produce the same quality games as that on the PS2 given the same screen resolution. Of course, the reason that ALL (except the original XBox...but I never heard them having to slow this or that down in order to get good framerates in their first gen games) the previous systems needed devs to spend time to get used to them is because they're all customized chips. For example, the PS2 had some triple core design where all three cores acted differently...couple that with piss-poor documentation, and of course the devs would need time. However, all three of the current gen designs are based on PC gfx cards, which I'm sure if they had the right drivers, would support all the APIs that their PC counterparts would support.

But if anything, we should commend MS for letting developers onto familiar territory. And I suppose the bottom line is, are we content with next (technically "current", since the 360 just came out) gen graphics looking like PGR3? And that leads to the question: Do we want more rehashes with better graphics or do we want more gameplay with the same graphics?

*edited to make some clarifications*
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
Err? There's cache on the GPU? Mipmaps are all stored on the GPU's memory.

The GPU has a texture cache; that is very basic.

Every trilinear filtered texture lookup requires at least 2 reads

Eight, bilinear takes four.

more if you use anisotropic filtering

At least eight for those angles impacted using bilinear, up to 128 samples per pixel using today's hardware and trilinear.
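
The rough bookkeeping, if it helps (exact probe counts vary by hardware and surface angle, so treat this as an illustration only):

```cpp
#include <cstdio>

int main() {
    // Rough sample-count bookkeeping (illustration only; exact probe counts
    // vary by hardware and surface angle).
    const int bilinear  = 4;             // 2x2 texel footprint
    const int trilinear = 2 * bilinear;  // two mip levels -> 8
    const int aniso_probes = 16;         // 16 probes along the axis of anisotropy
    printf("bilinear %d, trilinear %d, 16x aniso + trilinear %d\n",
           bilinear, trilinear, aniso_probes * trilinear);  // 4, 8, 128
    return 0;
}
```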

one from the next higher level and one from the next lower level and then a weighted average (the weights being: 1 - how "far" the actual sample is from the two levels) is computed on the two samples...there is no "extra" usage.

And if you're talking about using the LOD bias to eliminate lower (hence higher res) levels of mipmaps, then one can do exactly the same to PC games AND console games.

You are talking about modifying LOD bias; I'm trying to fill you in on some basic 3D rasterizing elements. If you run at a higher resolution, the proper calculations for LOD bias are altered, hence making higher-resolution mip maps proper under those conditions.
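
Here's a toy version of the math, since it seems to be getting lost (my own illustration, not the exact hardware formula):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

// Toy mip-level selection (not the exact hardware formula):
// lambda = log2(texels covered per screen pixel). Doubling the screen
// resolution halves the texel footprint of each pixel, so lambda drops by 1
// and the hardware starts pulling from a finer, higher-resolution mip level.
float mip_level(float texels_per_pixel) {
    return std::max(std::log2(texels_per_pixel), 0.0f);
}

int main() {
    const float tex_size = 1024.0f;  // 1024x1024 base level
    // Hypothetical surface that spans the whole texture horizontally.
    const float pixels_covered[] = {256.0f, 512.0f, 1024.0f};
    for (float covered : pixels_covered) {
        float footprint = tex_size / covered;  // texels per screen pixel
        printf("covers %4.0f px -> mip level %.1f\n", covered, mip_level(footprint));
        // 256 px -> level 2, 512 px -> level 1, 1024 px -> level 0
    }
    return 0;
}
```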

This is exactly the reason why you see options like "texture detail" in games such as Doom 3.

No, that tends to be there to reduce the strain on board level memory, not on texture cache level memory. PC developers do not have any way that they can directly access the texture cache on PC GPUs.

What? All of a sudden a 3D game won't be using vertex shaders? Or pixel shaders?

So you are running a scene that needs a total of 3 million vertex shader ops performed per frame and 380 million pixel shader ops. You going to leave those vertex shaders you had assigned idle so that a fixed amount will be used for those shader ops at all times? Do you think anyone is stupid enough to stall usable ALUs so they can keep a constant number?
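
To put numbers on it (the workload split above is hypothetical, obviously; this is just a sketch):

```cpp
#include <cstdio>

int main() {
    // Using the hypothetical workload above.
    const double vertex_ops = 3e6;    // 3 million vertex shader ops per frame
    const double pixel_ops  = 380e6;  // 380 million pixel shader ops per frame
    const int alus = 48;

    // If the scheduler hands out ALUs roughly in proportion to demand:
    double vertex_share = alus * vertex_ops / (vertex_ops + pixel_ops);
    printf("ALUs' worth of vertex work: %.2f of %d\n", vertex_share, alus);
    // ~0.38 of 48 -- locking a fixed block of units to vertex work for the
    // whole frame would leave most of them sitting idle most of the time.
    return 0;
}
```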

Where're the pixel shaders going to get their interpolated pixels if one doesn't transform the vertices, calculate their colors, and interpolate them using the vertex shader? On the other hand, where is the game going to get its pixels on screen if it doesn't use pixel shaders? So, yes, both shaders have to be used, no matter if it's console or PC development.

That is downright painful to read; it doesn't sound like you have the vaguest clue in the world what you are talking about. Shader hardware debuted on PCs with the GeForce3; there were no pixel and vertex shaders before then, and using some sort of magic with fairy dust somehow games managed to be made.... in 3D even. I would try to break down everything you said in that paragraph, but it is utter nonsense. Basic interpolation of pixel color data is handled on the fly in the TMU, transforming vertices can easily be done on the CPU using static T&L or vertex shaders, and pixel data comes from texture maps the overwhelming majority of the time (pixel shaders tend to, at most, modify it).
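
And the "transforming vertices on the CPU" part is nothing exotic; a bare-bones sketch of the kind of loop involved (illustration only, not any particular engine's code):

```cpp
#include <cstddef>

// Bare-bones CPU vertex transform of the kind pre-GeForce3 games did
// (and the kind of inner loop a console dev would hand-tune). Sketch only,
// not any particular engine's code.
struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };  // row-major

Vec4 transform(const Mat4& M, const Vec4& v) {
    return {
        M.m[0][0]*v.x + M.m[0][1]*v.y + M.m[0][2]*v.z + M.m[0][3]*v.w,
        M.m[1][0]*v.x + M.m[1][1]*v.y + M.m[1][2]*v.z + M.m[1][3]*v.w,
        M.m[2][0]*v.x + M.m[2][1]*v.y + M.m[2][2]*v.z + M.m[2][3]*v.w,
        M.m[3][0]*v.x + M.m[3][1]*v.y + M.m[3][2]*v.z + M.m[3][3]*v.w,
    };
}

// Transform a batch of object-space vertices into clip space for the rasterizer.
void transform_batch(const Mat4& mvp, const Vec4* in, Vec4* out, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i)
        out[i] = transform(mvp, in[i]);
}
```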

IIRC, branch prediction isn't used all that much at the moment.

I said poorly optimized branch; again, very PC-centric thinking. Eliminating as much possibility for stalls as possible is a big issue on consoles; on the PC you manage that for the NV4x and you may increase stalls on the R520.
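
A trivial sketch of the kind of branch flattening I mean (plain C++ purely as an illustration; in practice the dev or the shader compiler does the equivalent in shader code):

```cpp
// Sketch of the kind of branch flattening I'm talking about: replace a
// conditional with a predicated blend so the pipeline never has to guess.

// Branchy version: a mispredicted or divergent path can stall the pipeline.
float shade_branchy(float n_dot_l, float lit, float shadowed) {
    if (n_dot_l > 0.0f) return lit;
    return shadowed;
}

// Flattened version: always compute both, blend with an arithmetic mask
// (typically compiles down to a compare + select, no branch).
float shade_flat(float n_dot_l, float lit, float shadowed) {
    float mask = (n_dot_l > 0.0f) ? 1.0f : 0.0f;
    return shadowed + (lit - shadowed) * mask;
}
```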

...hey, a madd used is a madd used...it doesn't matter if it's for the second part of a short shader or part of a longer shader.

Actually, yes it does; it just doesn't for sloppy PC code. If you are utilizing a madd, where it sits in the shader, what your pipeline is doing, and what the next instruction needed is are all important factors for console code. You seem to be completely ignoring the exploitation of proper levels of parallelism in code, which for any GPU is enormously important.

Voodoo...1? If you're saying Voodoo 5, I can understand. I'm sure the Emotion Engine supports most of the stuff in OGL 1.3 or so

The Emotion Engine supports everything in OpenGL 12.9- the Emotion Engine is a general purpose PROCESSOR. The Graphics Synthesizer is the rasterizer in the PS2 (it can't be called a GPU; it isn't close). In full software emulation the EE can handle any graphics feature just as any CPU can, but not at speeds remotely comparable to dedicated hardware.

To your comment: the GS in the PS2 cannot even handle mip mapping without jumping through hoops. Forget trilinear, anisotropic, transform and lighting calcs, or anything resembling pixel shading; the GS has a lower-level feature set than the Voodoo1. Not the V5, the V1.

As a matter of fact, I'm sure that a GeForce3 (which I think should technically be the correct comparison to the EE in terms of performance, but not by date) can produce the same quality games as that on the PS2 given the same screen resolution.

The GeForce3 is vastly superior to the GS; it isn't even close. The advantage the GS has is that it is in a console, where every last ounce of what it is capable of can be squeezed out.

Of course, the reason that ALL (except the original XBox...but I never heard them having to slow this or that down in order to get good framerates in their first gen games) the previous systems needed devs to spend time to get used to them is because they're all customized chips.

The original XBox's launch titles suffered from massive slowdown; single-digit framerates were all too common in Halo, and rarely did the game manage to maintain 30FPS for any period of time. Customized graphics chips are regularly proven to be superior in terms of actual usable performance versus the generalized chips found in PCs.

For example, the PS2 had some triple core design where all three cores acted differently...

The EE, which is the CPU in the PS2, had a single general-purpose core and two vector units (VU0 and VU1).

However, all three of the current gen designs are based on PC gfx cards, which I'm sure if they had the right drivers, would support all the APIs that their PC counterparts would support.

OpenGL was ported to the PS2, GameCube and XBox. As a general rule of thumb an API works as an abstraction layer and prevents you from hitting peak levels of performance.

And I suppose the bottom line is, are we content with next (technically "current", since the 360 just came out) gen graphics looking like PGR3?

RE5 is a much better example of the coming gen's graphics than PGR3.
 

Teetu

Senior member
Feb 11, 2005
226
0
0
No system has ever had a launch game that maxed out the system's capabilities. If you compare first-gen PS2 games to current ones, it doesn't even seem like it's running on the same hardware.

That said, the PS3 is supposedly going to be running at 1080p.

Also, Quake 4 is not running well on the 360. This could very well be the programmers' fault (since CoD2 runs well), but ya never know.
 

PingSpike

Lifer
Feb 25, 2004
21,758
602
126
Most of this is over my head, but I doubt they've squeezed all the juice they can out of the system...even if the architecture is somewhat familiar.
 

dunno99

Member
Jul 15, 2005
145
0
0
Thanks for the response, Ben, but the ad hominem attacks aren't necessary. Either way, here goes the counter-counter-ad-infinitum-counter argument:

The GPU has a texture cache; that is very basic.

Ok, fair enough. I suppose these specs aren't usually specified, which is why I'm not aware of them.

Eight, bilinear takes four.

Right. I forgot it's 4 texture reads for each of the two levels (that's the bilinear part)...I suppose I brainfarted this morning. But either way, if a game uses a mipmapped 1024x1024 (for level 0) texture, how would the hardware sampling algorithm change if it's run on the console vs. the PC (at the same resolution)? The point is that it wouldn't, because these algorithms are implemented the same way in silicon. And are you saying that PC developers don't care at all how fine/coarse their textures are? I think a PC dev would know the difference in their game's performance if they used 512^2 level 0 textures vs. 1024^2 level 0 textures, and they would optimize just the same as a console dev would. So it's not as if there's anything different between the two, and therefore there wouldn't be anything to "optimize" in this very specific scenario.
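
For reference, the raw arithmetic for the 512^2 vs. 1024^2 comparison (my own back-of-envelope, assuming uncompressed RGBA8; compression changes the absolute numbers but not the ratio):

```cpp
#include <cstdio>

// Raw memory footprint of a full mip chain at 4 bytes/texel (uncompressed
// RGBA8). My own back-of-envelope; compression changes the absolute numbers
// but not the ratio.
long long mip_chain_bytes(int base) {
    long long total = 0;
    for (int s = base; s >= 1; s /= 2)
        total += (long long)s * s * 4;
    return total;
}

int main() {
    printf("1024^2 chain: %.2f MB\n", mip_chain_bytes(1024) / 1048576.0); // ~5.33 MB
    printf(" 512^2 chain: %.2f MB\n", mip_chain_bytes(512)  / 1048576.0); // ~1.33 MB
    // The whole chain is only ~4/3 the size of the base level either way.
    return 0;
}
```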

If you run at a higher resolution, the proper calculations for LOD bias are altered, hence making higher-resolution mip maps proper under those conditions.

Basically, what you're saying is that essentially a bias towards level 0 is applied when sampling the same texture at a higher-res setting vs. a lower-res setting (all else being equal). This results from the fact that finer samples are needed because the texture is more finely sampled on a higher-res screen. Ok, I agree. But this doesn't cause the texture cache to miss more, percent-wise. Why not? Sure, as you said, lower-leveled (hence finer) mipmap levels are loaded into the texture cache, and the lower-leveled mipmap is higher res, hence more data needs to be filled into the texture cache. But this is exactly (because the additional bias is based solely on the screen res) cancelled out by the higher res causing screen samples to be closer together, and therefore the cache hit-to-miss ratio remains the same.

Furthermore, "proper" should actually be "samples the correct level mipmap depending on the screen resolution." Maybe that's what you meant anyways.

No, that tends to be there to reduce the strain on board level memory, not on texture cache level memory.

The "texture detail" I was referring to was exactly that: the onboard video memory (remember, I didn't realize that there's a texture cache...so everything I was referring to was with regard to the onboard memory), not the texture cache memory. I suppose I should take it that you're agreeing with me here.

You going to leave those vertex shaders you had assigned idle so that a fixed amount will be used for those shader ops at all times?

This is where my argument to your argument that was arguing about my other argument (ok, this is confusing...but just trace the series of arguments backwards and you'll see what I'm referring to...go all the way back to "Some of them will always be configured to be vertex shaders while others will be pixel shaders." and read the sentence right after) is taken out of context (but my argument basically agrees with your statement that the shader units will be assigned to exactly what's needed...but then that means the OS/driver knows exactly what's needed and will automatically use the correct shader...which means that there's nothing to optimize).

Shader hardware debuted on PCs with the GeForce3; there were no pixel and vertex shaders before then, and using some sort of magic with fairy dust somehow games managed to be made....

Ah, I was using "shader" in its generic definition, not what you're referring to. What you're referring to is a "custom shader", not just a "shader." I consider fixed functionality a shader, since after all, it does define the final surface property...just in a fixed way, not a custom way like what the GF3 introduced.

As for the interpolated part, sorry, I knew something would go wrong with a large post. I meant to say "and let them get interpolated" but totally didn't type what I meant. I apologize for the misunderstanding (yes, interpolation happens in between the vertex and pixel shaders).

Moreover, if a pixel shader is needed, then no, usually the end result wouldn't just be a texture fetch (then there's no need for the pixel shader), but rather a blend of some texture and some custom lighting (like Ward materials or something instead of just plain old Phong).

I said poorly optimized branch; again, very PC-centric thinking.

Yes, poorly optimized branch prediction (more like branch-unfriendly, since the actual prediction part is done by the hardware and the dev has no control over that). And? Which PC game uses branch-unfriendly shader code? Most game shaders run at 100 or fewer instructions...what, you're going to have 100 branch instructions and nothing else? No. The majority, if not all, of the instructions won't be branch dependent. Most of these shaders are deterministic material calculations that require only lookups and computation, not branching. It's not like games are doing GPGPU or something.

You seem to be completely ignoring the exploitation of proper levels of parallelism in code, which for any GPU is enormously important.

That's the point of hardware rasterization. After the set of fragments representing a triangle is rasterized, the fragments are processed all at once (this is part of the reason why attaching the destination to an FBO as a source is VERY bad). The point is that the exploitation is automatic and requires no intervention whatsoever (not that there's a need). So it's impossible for a dev to exploit pixel-level parallelism, because it's already fully optimized and there's nothing left over to exploit (this is the beauty of graphics...it's the most easily parallelizable discipline within computer science...and you don't even need to manually parallelize it these days!).

The Emotion Engine supports everything in OpenGL 12.9- the Emotion Engine is a general purpose PROCESSOR.

Ohhhhhh. Ok. I see it now. But from the performance specs (as I have no access to an actual dev kit), it seems like it is quite capable. So what's the EE's non-software-emulation mode? It seems like even without aniso, trilinear, mipmap, T&L, and all that jazz, it still has an impressive throughput. And that throughput should surely be spent somewhere for its graphics output. Of course, without any of these features and with such a high throughput rate, the devs would need to spend time getting used to coding for the PS2. But it's not the same with the 360 (the argument was about the 360, remember), since MS DOES offer the libraries, so there isn't all that much to "get used to" for the 360.

As a general rule of thumb an API works as an abstraction layer and prevents you from hitting peak levels of performance.

Granted, as you said, APIs tend to degrade performance...but how much degradation? If the API drops normalized efficiency down from a peak of 100% to 50%, then sure, I admit, once the console makers totally open up their hardware, I can see a 2X boost in performance and I will retract my original argument. But if we're talking about going from 100% to 95%, then that ~5% potential performance boost is hardly "a lot of room," in which case, yes, we have hit the GPU's performance limit already.
 

modedepe

Diamond Member
May 11, 2003
3,474
0
0
Originally posted by: CalamitySymphony
CPU limited? What are people smoking anyway? Everything is GPU limited. Oh, not to mention, the CPUs on these beasts beat the crap out of every other CPU currently available on PCs.

Yeah, speaking of smoking, what the hell is in your pipe? Maybe they beat the crap out of every other CPU in raw crunching power vs. price, but that's about it.
 

Falloutboy

Diamond Member
Jan 2, 2003
5,916
0
76
I think, as with most consoles, it will just take a few games for the devs to get used to coding for the 360. Sure, it's using all the power right now to run current games, but I doubt they are all that efficiently programmed, since they were rushing to be launch titles.
 

Fern

Elite Member
Sep 30, 2003
26,907
174
106
Originally posted by: SickBeast
Wouldn't even a 9700Pro be CPU limited in most games today? It's not a very high resolution at all.

I'm hesitant to disagree with you, Beast, but umm, check THIS out. You'll see the Barton 2500+ is not a limit. It's pulling right at 60fps, the cap for D3, on a 6800GT.

I still believe (1) any talk of "CPU limits/bottlenecks" is way overstated, and (2) FPS gaming is pretty much all about the gfx card.

The only thing bottlenecking a 9700pro is itself.

Fern
 

xtknight

Elite Member
Oct 15, 2004
12,974
0
71
I wonder if today's launch releases were single-threaded or multithreaded? My guess is that the vast majority of them were single-threaded (I hate it when they rush it like that...). Anyone care to elaborate?

The only time something is CPU limited (in this practical case meaning the CPU is preventing it from getting a playable rate) is when the physics/AI/etc. code takes over and leaves the driver with no power. Other than that you can keep ratcheting up the graphics quality (putting more stress solely on the GPU in this hypothetical statement) and keep from being CPU limited. Saying something is or is not CPU limited is such a blanket statement, and you can't tell from just a CPU and graphics card being in the equation. You need to know about driver overhead, whether the audio processing is being offloaded to an audio card, whether there is any AI at all and how stressful it is, etc., not to mention whether the game is multithreaded. So I propose a new rule: whenever you say something is CPU limited, specify at the very minimum (see the sketch after this list):

a) what you even mean by limited (30 FPS? 60 FPS? Infinite FPS?);
b) the name of the game and whether it's multithreaded;
c) single player or multiplayer mode;
d) the CPU and number of cores;
e) the graphics card and any applicable mods (clocks, pipelines, otherwise) to it, or "stock";
f) last but not least, the exact graphics settings. AA/AF, resolution, texture detail, you know the drill.
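
If it helps, here's the same checklist as a literal data structure (just a sketch, and the field names are mine):

```cpp
#include <string>

// The same checklist as a literal data structure -- just a sketch, and the
// field names are mine.
struct CpuLimitedClaim {
    double      target_fps;     // a) what "limited" means: 30? 60? uncapped?
    std::string game;           // b) name of the game...
    bool        multithreaded;  //    ...and whether it's multithreaded
    bool        multiplayer;    // c) single player or multiplayer
    std::string cpu;            // d) the CPU...
    int         cores;          //    ...and number of cores
    std::string gpu;            // e) graphics card, "stock" or the mods applied
    std::string settings;       // f) AA/AF, resolution, texture detail, etc.
};
```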

Anyone else concur? I'm not being a forceful Nazi about this, I'm just interested to see how one's opinion would differ about this. Not saying you have to say all this when you say CPU limited, but at least consider it. Everybody is going to interpret the other parameters however they want when you aren't specific. Some of the above isn't applicable for consoles though because they will be unmodifiable. I never knew, but can you adjust graphics settings on a console? I don't recollect being able to on my N64 (last console I had :D).