Bit-tech bench Tomb Raider Legends

Munky

Diamond Member
Feb 5, 2005
9,372
0
76
Originally posted by: gi0rgi0
I'm surprised the game looks better on the X360. Over on the 360 forums, people with both high-end PCs and 360s said the demo didn't look as good as the PC demo - like the dripping water and stuff. Was that just because it was a demo, then?

The article mentioned that the PC demo version had the next-gen content disabled.
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
Like I said before ATi's drivers are far more robust and stand up far better under a wide range of scenarios.

Remember that I have been running ATi for longer than you - their drivers fall over constantly and rarely work properly with newer games without being updated.

nVidia's aggressive application detection and per-app optimization is coming back to bite them in the butt just like I argued it would in the past.

ATi's always have - I guess that is the difference. Their per-app optimizations have always fallen over, from the RageIIc all the way up to the X1900XTX. Now it is becoming an issue for nVidia, which is the only thing that has changed.

My X800XL had problems that could be counted on one hand

We have been over the hundreds of problems their drivers had in that time period - your answer, IIRC, was that you weren't playing most of those games.

Remember that most benchmarks are run under quality mode which can cause visible shimmering in certain situations. Running under high quality generally cures it but can cause performance to tank.

Yes - for both ATi and nVidia, so what is your point? Actually, they both still shimmer pretty badly in their quality modes - was that just a general comment?
 

A554SS1N

Senior member
May 17, 2005
804
0
0
Originally posted by: BenSkywalker
Like I said before ATi's drivers are far more robust and stand up far better under a wide range of scenarios.

Remember that I have been running ATi for longer than you - their drivers fall over constantly and rarely work properly with newer games without being updated.

nVidia's aggressive application detection and per-app optimization is coming back to bite them in the butt just like I argued it would in the past.

ATi's always have - I guess that is the difference. Their per-app optimizations have always fallen over, from the RageIIc all the way up to the X1900XTX. Now it is becoming an issue for nVidia, which is the only thing that has changed.

My X800XL had problems that could be counted on one hand

We have been over the hundreds of problems their drivers had in that time period - your answer, IIRC, was that you weren't playing most of those games.

Remember that most benchmarks are run under quality mode which can cause visible shimmering in certain situations. Running under high quality generally cures it but can cause performance to tank.

Yes - for both ATi and nVidia, so what is your point? Actually, they both still shimmer pretty badly in their quality modes - was that just a general comment?



Hehe, owned!!!
 

jim1976

Platinum Member
Aug 7, 2003
2,704
6
81
Originally posted by: BenSkywalker

In terms of hardware implementation the 7900GTX and x1900xtx have nigh equal shader power, branching would be the only area where ATi may take a big lead(and even then it depends on exactly how it is implemented). ATi's huge edge in shader performance is a myth at the hardware level. Anyone who really looked over the publicly available documentation has known this for quite a while. So I would say at this point it appear that yes, ATi does need the 360 to show its shader supremacy.

In theoretical terms the 7900GTX is nowhere near the R580 in shader terms, but I don't disagree that this is not clearly evident in games yet, nor whether it ever will be. Maybe if we had Crysis here right now we could have seen it. We can only get an idea, because in almost every shader-intensive game ATI is in front.
R580 = 36 FLOPs per clock (8 FLOPs for a 4D MADD and 4 FLOPs for an ADD per ALU, three shader units per SIMD channel) * 16 SIMD channels * 0.65GHz (pixel power only, excluding vertex). Nvidia is nowhere near.
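A quick back-of-the-envelope check of that figure, taking the quoted per-clock FLOP counts at face value (a rough Python sketch, not a measurement):

# Peak pixel-shader rate implied by the numbers quoted above (not verified here):
# 12 FLOPs per shader unit per clock (8 for a 4D MADD + 4 for an ADD), 3 units
# per SIMD channel (48 units / 16 channels), at a 650MHz core clock.
flops_per_unit = 8 + 4
units_per_channel = 48 // 16
channels = 16
clock_ghz = 0.65
print(flops_per_unit * units_per_channel * channels * clock_ghz)  # ~374.4 GFLOPs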
Your theory is very weak. I can invert your "theory" as well and ask you:
Was Nvidia in front in DX when the Xbox was using an Nvidia chip? :) E.g. Halo.
It's not a myth that ATI has been handling DX better than Nvidia since the R3xx.
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
In theoretical terms the 7900GTX is nowhere near the R580, but I don't disagree that this is not clearly evident in games yet, nor whether it ever will be.

The R580 has 48 shader units, each with its own ALU; the 7900 has 24 shader units, each with 2 ALUs. The R580 and the 7900 have 48 ALUs each for fragment shading.

R580 = 36 FLOPs (8 FLOPs for 4D MADD and 4 FLOPs for ADD per ALU) * 16 SIMD channels * 0.65GHz (pixel shaders only)

Compared to the NV40-

- Each pipeline is capable of performing a four-wide, coissue-able multiply-add (MAD) or four-term dot product (DP4), plus a four-wide, coissue-able and dual-issuable multiply instruction per clock in series, as shown in Figure 30-11. In addition, a multifunction unit that performs complex operations can replace the alpha channel MAD operation. Operations are performed at full speed on both fp32 and fp16 data, although storage and bandwidth limitations can favor fp16 performance sometimes. In practice, it is sometimes possible to execute eight math operations and a texture lookup in a single cycle.

- Dedicated fp16 normalization hardware exists, making it possible to normalize a vector at fp16 precision in parallel with the multiplies and MADs just described.

- An independent reciprocal operation can be performed in parallel with the multiply, MAD, and fp16 normalization described previously.

How exactly are you figuring the R580 has an edge in shader performance? Compare the raw numbers, it isn't there.

Was Nvidia in front in DX when the Xbox was using an Nvidia chip? E.g. Halo.

By a huge margin, looking at the comparable-gen products. Compare the NV2x parts to the R2x0 parts and they leave a bloody smear on the side of the road. Splinter Cell, Halo and pretty much every other port that went from console to PC was an absolute annihilation for the GF4 over the R8500. Current-gen titles for the 360 were developed using the R580 platform - the next round will be aimed at the R500 explicitly, which should be interesting, as ATi's R600 part will be around to capitalize on that (once the PC catches up to where consoles were last year), although clearly it is a staggering amount closer this time around than when nVidia had the built-in edge.
 

jim1976

Platinum Member
Aug 7, 2003
2,704
6
81
Originally posted by: BenSkywalker

The R580 has 48 shader units, each with its own ALU; the 7900 has 24 shader units, each with 2 ALUs. The R580 and the 7900 have 48 ALUs each for fragment shading.

R580 = 36 FLOPs (8 FLOPs for 4D MADD and 4 FLOPs for ADD per ALU) * 16 SIMD channels * 0.65GHz (pixel shaders only)

Compared to the NV40-
- Each pipeline is capable of performing a four-wide, coissue-able multiply-add (MAD) or four-term dot product (DP4), plus a four-wide, coissue-able and dual-issuable multiply instruction per clock in series, as shown in Figure 30-11. In addition, a multifunction unit that performs complex operations can replace the alpha channel MAD operation. Operations are performed at full speed on both fp32 and fp16 data, although storage and bandwidth limitations can favor fp16 performance sometimes. In practice, it is sometimes possible to execute eight math operations and a texture lookup in a single cycle.

- Dedicated fp16 normalization hardware exists, making it possible to normalize a vector at fp16 precision in parallel with the multiplies and MADs just described.

- An independent reciprocal operation can be performed in parallel with the multiply, MAD, and fp16 normalization described previously.


How exactly are you figuring the R580 has an edge in shader performance? Compare the raw numbers, it isn't there.

Like this ;)

Taking as given that:
multiply (MUL) = 4 floating point operations (FLOPs)
add (ADD) = 4 FLOPs
multiply + add (MADD) = 8 FLOPs

Also, quad != quad and ALU != ALU.

Don't confuse ALUs with FLOPs - you cannot measure shader performance the way you attempted. Again:

G71: 24 SIMD channels * 16 FLOPs * 0.65GHz = 249.6 GFLOPs
R580: 16 SIMD channels * 36 FLOPs * 0.65GHz = 374.4 GFLOPs
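Putting the two peak-rate formulas above side by side (same caveat: this is only the theoretical issue-rate arithmetic from the per-clock counts quoted in this thread, and says nothing about how well either chip keeps those units busy):

# Theoretical peak pixel-shader GFLOPs from the per-clock figures quoted above.
def peak_gflops(simd_channels, flops_per_clock, clock_ghz):
    return simd_channels * flops_per_clock * clock_ghz

g71 = peak_gflops(24, 16, 0.65)    # 249.6 GFLOPs
r580 = peak_gflops(16, 36, 0.65)   # 374.4 GFLOPs
print(g71, r580, r580 / g71)       # on paper the R580 leads by roughly 1.5x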

FLOPs are a theoretical number for internal shader fillrate. They are indeed measurable in real time, but you need applications like GPUBench to see the difference.
ATI has a massive advantage in internal shader fillrate; what remains to be seen is the application (game) that shows it.
Things are much more complicated if you take into account the fact that the ADDs can be used on the Radeons under certain circumstances, but also that on the GeForces part of that rate is deducted by texture operations. That's why we see major differences between theoretical numbers and "real" games.

By a huge margin, looking at the comparable-gen products. Compare the NV2x parts to the R2x0 parts and they leave a bloody smear on the side of the road. Splinter Cell, Halo and pretty much every other port that went from console to PC was an absolute annihilation for the GF4 over the R8500. Current-gen titles for the 360 were developed using the R580 platform - the next round will be aimed at the R500 explicitly, which should be interesting, as ATi's R600 part will be around to capitalize on that (once the PC catches up to where consoles were last year), although clearly it is a staggering amount closer this time around than when nVidia had the built-in edge.

Ben, this was my quote: "It's not a myth that ATI has been handling DX better than Nvidia since the R3xx."
The vast majority of the games for the Xbox came out when the R3xx was already introduced - an equal if not worse annihilation, this time for the GeForce FX series. From the time DX9 was introduced, Nvidia's comparable high-end single cards have always trailed, by a small or big margin depending on the case.
I'm not a fan of ATI; I'm just saying that your theory sounds way too speculative to my ears, without sufficient evidence to sound real.
Both companies right now have very good products and I'm really glad to see the competition heating up.
For personal reasons I found the R580 solution more suitable for my needs, and I'm glad I did. The R580 is a magnificent piece of hardware and it doesn't need the backup of the SexBox to showcase its power, IMHO.
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
FLOPs are a theoretical number for internal shader fillrate.

No, they aren't. FLOPS are floating-point ops per second. They are not anything remotely resembling internal shader fillrate, wherever you may have come up with that particular phrase. They indicate the ability to work through a given fragment program. The complexity of the fragment program, compared against the peak theoretical ops on a per-function breakdown, can give you the potential maximum fillrate, ROPs allowing. FLOPS are nothing remotely resembling a fillrate metric, nor have they ever been.
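To make the relationship in that last sentence concrete, here is a rough illustration; the 100-FLOP fragment program is a made-up cost for the sake of the example, not a measured one:

# Peak FLOP rate divided by the per-fragment cost of a fragment program bounds
# the shader-limited fill rate (ROPs and bandwidth permitting). The shader cost
# below is hypothetical, purely for illustration.
peak_gflops = 249.6          # the G71 peak figure quoted in this thread
flops_per_fragment = 100.0   # hypothetical fragment program cost
print(peak_gflops / flops_per_fragment)  # ~2.5 Gfragments/s upper bound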

G71: 24 SIMD channels * 16 FLOPs * 0.65GHz = 249.6 GFLOPs
R580: 16 SIMD channels * 36 FLOPs * 0.65GHz = 374.4 GFLOPs

The 'SIMD channels' that you are talking about mean absolutely nothing, you do realize? We'll ignore that for a second though. Since you like to ignore the differences between the hardware - why is the X1900XTX losing in a 3-component MADD test? Since ATi's hardware is so vastly superior, surely it must be in something useful and not just very, very basic ADD ops (the test they run for ADD is non-interdependent - a frighteningly simplistic task). In terms of actual shader functionality, there certainly is no large disparity between the two parts - nVidia comes out ahead in some of the more demanding tasks that fragment shaders frequently require. BTW - nV can do a 'free' normalize also, a test in which they would spank ATi, but one that needs to be set up much like a 3-vec ADD test, quite meaningless in real-world terms.

Ben, this was my quote: "It's not a myth that ATI has been handling DX better than Nvidia since the R3xx."

Given a moment's thought, most people would realize that comparing a part released long after the Xbox against a subsequent part from the company whose chip went into the Xbox is a completely laughable comparison. But then, someone else might point people towards the original Splinter Cell and note that the X1900XTX still cannot enable the highest quality options in that game. Since the XB supposedly didn't have any impact - why is it that ATi still can't handle a basic function that was available six years ago on nV hardware (don't even get me started on ATi's complete and abject failure in terms of W-buffer support)?

Both companies right now have very good products and I'm really glad to see the competition heating up.
For personal reasons I found the R580 solution more suitable for my needs, and I'm glad I did. The R580 is a magnificent piece of hardware and it doesn't need the backup of the SexBox to showcase its power, IMHO.

The last nV part I purchased was a GeForce DDR when it first launched. I've been running ATi in my primary rig for quite a few years now - that doesn't change the fact that the shader hardware of the X1900XTX is nothing remotely like what you are trying to make it out to be.
 

Dethfrumbelo

Golden Member
Nov 16, 2004
1,499
0
0
Originally posted by: tuteja1986
Err, I just got the game and played it with my Xbox 360 USB controller - didn't like the keyboard controls :! Pretty good game, I must say, but I finished it in 7 hrs : )

It's short, but the best Tomb Raider game, I must say : )

$60 / 7 hrs = roughly $8.57 per hour. Pretty expensive for a tech demo.



 

jim1976

Platinum Member
Aug 7, 2003
2,704
6
81
Originally posted by: BenSkywalker
No, they aren't. FLOPS are floating-point ops per second. They are not anything remotely resembling internal shader fillrate, wherever you may have come up with that particular phrase. They indicate the ability to work through a given fragment program. The complexity of the fragment program, compared against the peak theoretical ops on a per-function breakdown, can give you the potential maximum fillrate, ROPs allowing. FLOPS are nothing remotely resembling a fillrate metric, nor have they ever been.

Thank you for the terminology. Do you think I don't know what FLOPS are? I said what they can indicate :) Yeap, FLOPs are nowhere near a theoretical shader fillrate measurement...
Try calculating what I said in GPUBench and you'll see whether what I said is right. It's theoretical, but everything I calculated can be seen in real time...
But let's ignore everything else. If you think that 48 ALUs in the G71 (and by this logic the R580 has 96, lol) can give you the same efficiency as 48 shader processors, then it's OK with me; we have different opinions on what is more crucial.

The 'SIMD channels' that you are talking about mean absolutely nothing, you do realize? We'll ignore that for a second though. Since you like to ignore the differences between the hardware - why is the X1900XTX losing in a 3-component MADD test? Since ATi's hardware is so vastly superior, surely it must be in something useful and not just very, very basic ADD ops (the test they run for ADD is non-interdependent - a frighteningly simplistic task). In terms of actual shader functionality, there certainly is no large disparity between the two parts - nVidia comes out ahead in some of the more demanding tasks that fragment shaders frequently require. BTW - nV can do a 'free' normalize also, a test in which they would spank ATi, but one that needs to be set up much like a 3-vec ADD test, quite meaningless in real-world terms.

Hmm, read more carefully, Ben - SIMD channels by themselves mean nothing, of course, but plugged into the formula they give a significant result. And what does this bench prove? That Nvidia has 4 MADDs + 4 MADDs = 16 FLOPs (multiply + add = 2 FP32 ops per component) while ATI has 4 MADDs + 4 ADDs = 12 FLOPs? :roll: It's nice to use this bench at your convenience. ADDs and MADDs by themselves mean nothing. These benches are static numbers with no substance; put them in the context of the whole architecture's efficiency and see what happens. By your logic and this example the 7600GT is better than the X1800XT.. :Q Counting MADD issue rate is a much poorer gauge of arithmetic performance than multiplying shader pipelines by frequency. But since you brought it up, I suggest you take a look at the Cook-Torrance tests and look at some more meaningful results there ;)
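For what it's worth, the per-pipe, per-clock accounting being argued over works out like this on paper (just the issue-rate arithmetic from the numbers in this thread - exactly the kind of static figure under dispute):

# Per-pipe, per-clock FLOP counting as used in this thread: a 4-wide MADD
# counts as 8 FLOPs (multiply + add per component), a 4-wide ADD as 4 FLOPs.
vec4_madd = 4 * 2
vec4_add = 4 * 1
g71_pipe = vec4_madd + vec4_madd   # two 4-wide MADD issues -> 16 FLOPs/clock
r580_pipe = vec4_madd + vec4_add   # one 4-wide MADD + one 4-wide ADD -> 12 FLOPs/clock
print(g71_pipe, r580_pipe)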

Anyway, we could talk about this for hours, and I don't think this is the appropriate thread for it.
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
Yeap, FLOPs are nowhere near a theoretical shader fillrate measurement...

What on that page gives you the slightest notion that FLOPS equate to fillrate? Nothing remotely close to it is in the page you linked (I read through it again to double-check, but there's no way Wavey would make such an asinine comment).

But let's ignore everything else. If you think that 48 ALUs in the G71 (and by this logic the R580 has 96, lol) can give you the same efficiency as 48 shader processors, then it's OK with me; we have different opinions on what is more crucial.

The R580 could be considered to have 96 if its ALU 1 unit weren't crippled. All it can do is ADD plus a modifier - no MUL and no MAD functionality. It is good for theoretical benches and that's it. As far as what we consider more crucial - analyze some shader code, or simply run in-game benchmarks between the 7900 and the R580 that are shader-limited. There is no big gap. If shaders were that heavily reliant on ADD functionality then maybe you would have a point, but they most certainly are not. In the most frequently used functions nV actually has an edge in performance.

SIMD channels by themselves mean nothing

They mean nothing, period, not just by themselves. They are simply a generic term describing layout. It is akin to talking about what manufacturing process something is built on and how that relates to memory bandwidth. It is an utterly useless metric when talking about performance.

And what does this bench prove? That Nvidia has 4 MADDs + 4 MADDs = 16 FLOPs (multiply + add = 2 FP32 ops per component) while ATI has 4 MADDs + 4 ADDs = 12 FLOPs? It's nice to use this bench at your convenience.

It's nice to have reality knock down the PR line you are trying to spread.

By your logic and this example the 7600GT is better than the X1800XT..

That's using your logic. Looks pretty foolish from the other side, doesn't it? In terms of comparing raw computational capacity, the 7600GT is in fact the superior of the X1800XT (1800XT reference point, 7600GT reference point). In terms of useful computational resources, that is something else entirely - but you don't appear to like that standard.

If you like relying on the "pixel shader fillrate" line, then you need to tell everyone that the 7600GT is superior to the 1800XT - that is using your standards as a definition. When you want to talk reality and compare useful functionality, then we can have a reasonable discussion.