5770 benchmarks at AlienBabelTech


MarcVenice

Moderator Emeritus
Apr 2, 2007
5,664
0
0
Toyota, I said he would be bottlenecked, which he would be, were it not for the fact that he said he would be running insane amounts of AA to create a GPU bottleneck. Nevertheless, I am interested in seeing how two or even three HD 5870s would do with a C2D in, let's say, Stalker: Clear Sky. Then again, most people who can afford such awesome GPU power can afford a Core i7 920 and a decent tri-CF mobo, wouldn't they? So it's a pretty moot discussion.
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
Very nice work on the IQ article BFG, as per usual for you.

Couple different things I thought were a bit odd. They aren't adjusting LOD bias at the driver level when enabling SS? This is a rather poor rookie mistake for them to be making, 3dfx was ripped apart for the blur it caused and had to fix it in their drivers after a relatively short time on the market. I'm assuming that someone at ATi will get their head out of their rectum and fix this in an upcoming driver update(easy fix for them). Any word on game profile support per title coming to CC? I suppose if this was implemented it wouldn't be that big of an issue, set once and be all set.

The AF undersampling is a very big letdown; as you showed in your analysis, in the real world it is worse than the 4xxx series in certain circumstances, and that part wasn't all that great to begin with. I wonder how much additional performance it would take for them to just do it right, hopefully not too much so we can see this fixed in future driver revisions.

Thus the 5xxx is the first board in consumer space to offer truly angle invariant anisotropic filtering.

Come on BFG, you did not honestly write that did you? You can certainly debate the performance levels, but everything from NV10 to NV2A supported truly angle invariant anisotropic filtering, you know that :)

Pretty much it seems that the only real issues that the 5xxx series has can be fixed at the driver level. Not entirely sure how their sampling hardware works so that one may be a bit more complex, but if they can get it fixed it would certainly be a far more tempting part to pick up once they hit mainstream availability.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Thank goodness someone linked to it of their own accord in a post, because I couldn't do it (against the TOS of the forum). :)

I also tested bottlenecking, and while I can't link to it in my posts either, I found the 5770 is actually primarily limited by its core, not by its memory as is commonly accepted.

Actually the two were quite close because it's quite a balanced part, but the core edged it overall. From a 20% underclock on each, the core lost 11.20% performance overall, while the memory lost 8.46%.

Well aren't all video cards limited by the core?

In fact, I can't even imagine a video card existing that isn't completely dependent on the core. The memory bandwidth is merely there to support the core's processing power, right?

I wonder what would happen if you compared the HD5770 and HD4890. Both have the same core processing power, so decreasing bandwidth by a similar amount on both would be interesting (i.e., take 20% memory bandwidth off the HD4890 vs 20% off the HD5770, then measure the respective performance drops).

I am no engineer (obviously) and don't know much about architecture, but something tells me these things involve the engineer trying to balance various trade-offs. Maybe ATI decided that including more processing power on the HD5770 (with less bandwidth) was more bang for the buck?

P.S. Thanks for doing this review. I was always interested in seeing underclocking results on the memory (considering the new GDDR5, or maybe its controller, has some type of error correction process that obscures the results of a memory overclock).
 
Last edited:

cbn

Lifer
Mar 27, 2009
12,968
221
106
We can rule out the memory because it doesn't seem to be the primary limitation based on my findings. There's a bottleneck elsewhere, and I'm convinced it's at the driver level, intentional or not.

I don't think we've seen the full performance of the 5xxx series yet, and I expect something interesting to happen when Fermi launches.

Yes, but isn't a ~8% decrease (in performance) from the memory underclock quite significant when an equivalent decrease in core only yields an 11% decrease in performance?

Or maybe the HD4890 is just overspec'd on memory bandwidth (compared to HD5770) to the point it is less cost effective? (ie, the extra memory bandwidth actually boosts performance but is not as cost effective from an engineering standpoint as increasing core speed/stream processors)
 
Last edited:

cbn

Lifer
Mar 27, 2009
12,968
221
106
I am still not sure about the new ATI cards.

Well Nvidia is doing something similar.

In fact, GT300 has more than double the stream processors (512 vs 240) compared to GTX285...but only ~50% more bandwidth compared to the previous generation.
 

bryanW1995

Lifer
May 22, 2007
11,144
32
91
actually, gt 300 doesn't have sp's at all. they don't have ANYTHING at all right now except spec sheets. once they are actually built they will have "cuda cores" instead of "shader processors".
 

BFG10K

Lifer
Aug 14, 2000
22,709
2,971
126
Very nice work on the IQ article BFG, as per usual for you.
Thanks, your comments and criticisms are always valuable to me in this area. :)

Couple different things I thought were a bit odd. They aren't adjusting LOD bias at the driver level when enabling SS? This is a rather poor rookie mistake for them to be making, 3dfx was ripped apart for the blur it caused and had to fix it in their drivers after a relatively short time on the market. I'm assuming that someone at ATi will get their head out of their rectum and fix this in an upcoming driver update(easy fix for them).
From what I’ve seen, I don’t believe they are adjusting the LOD. Of course they could be doing it behind the scenes in some of the games I tested. If you look at the Doom 3 screenshots for example, even 8xSS had practically zero blurring, but I never touched the LOD.

If they aren’t adjusting the LOD then they might have something smarter in place (it’d be interesting to find out what that is), but it doesn’t look like it works properly in every situation.

Any word on game profile support per title coming to CC? I suppose if this was implemented it wouldn't be that big of an issue, set once and be all set.
They’ve had profiles since CCC went live, but they’re not like nVidia’s. nVidia’s automatically activate when a game is opened, but ATi’s don’t. They also need to add a LOD adjustment into their control panel since ATi Tray Tools isn’t updated often enough for new cards to make it a viable replacement for CCC.

Come on BFG, you did not honestly write that did you? You can certainly debate the performance levels, but everything from NV10 to NV2A supported truly angle invariant anisotropic filtering, you know that :)
I honestly wasn’t sure whether it was perfectly angle-invariant, or just much better than everything except the 5xxx. I looked for sample patterns but I couldn’t find any.

In case it was, I’ve altered the sentence to read:

Thus the 5xxx is the first board in consumer space to offer truly angle invariant anisotropic filtering at a 16x level.
Since those old cards were limited to 2x/8x, that addendum covers me nicely. :)
 

BFG10K

Lifer
Aug 14, 2000
22,709
2,971
126
Well aren't all video cards limited by the core?

In fact, I can't even imagine a video card existing that isn't completely dependent on the core. The memory bandwidth is merely there to support the core's processing power, right?
Yes, almost all cards are limited by the core to some degree, but it’s not always the primary limitation. A GF2 MX for example would have memory as the primary limitation by far.

Yes, but isn't a ~8% decrease (in performance) from the memory underclock quite significant when an equivalent decrease in core only yields an 11% decrease in performance?
Yes, but remember the whole internet was convinced the 5770 was primarily bottlenecked by its memory. If that was the case then reducing it should’ve dropped performance the most, but it didn’t; performance dropped more when the core was underclocked.

So yes, you’re quite right, 11.20% (core) vs 8.46% is very close, but it proves two things:

(1) The core is the primary limitation, not the memory.
(2) The 5770 is actually quite a balanced card and has been equipped with the right amount of bandwidth relative to its processing power.

As a comparison, on the 8800 Ultra I observed a 12.64% drop from the core vs 5.45% from the memory overall. The Ultra’s core/memory disparity is even more striking considering I ran half of its benchmarks without AA, while the 5770 didn’t run anything lower than 2xAA.

If the 5770 was primarily bandwidth limited I would’ve expected similar numbers but reversed, except that didn’t happen. So everyone running around claiming the 5770’s performance is held back by its memory is technically correct, but performance is held back more by the core. I actually think the bottleneck is at the driver level personally, and not related to the hardware.

So yes, video cards will always be bottlenecked somewhere, but there are different degrees within each GPU element. Also if all clocks show similar performance drops then that means the part is balanced, and doesn’t have an obvious limitation in one key area.
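To put a rough number on that balance, here is a quick sketch of the arithmetic (not from the article; the ratio helper is just an illustration using the figures quoted above):

```python
# Rough sketch: compare how much each card leans on its core vs its memory,
# using the performance drops quoted in this thread. Not from the article;
# the helper name is made up.

def core_vs_memory_ratio(core_loss_pct: float, mem_loss_pct: float) -> float:
    """Ratio of performance lost to a core underclock vs a memory underclock.
    ~1.0 means the part is balanced; much greater than 1.0 means core-bound."""
    return core_loss_pct / mem_loss_pct

hd5770     = core_vs_memory_ratio(11.20, 8.46)   # ~1.3 -> fairly balanced, core edges it
ultra_8800 = core_vs_memory_ratio(12.64, 5.45)   # ~2.3 -> clearly core-bound

print(f"HD 5770 core/memory dependence:    {hd5770:.2f}")
print(f"8800 Ultra core/memory dependence: {ultra_8800:.2f}")
```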

P.S. Thanks for doing this review. I was always interested in seeing underclocking results on the memory (considering the new GDDR5 or maybe its the controller has some type of error correction process that obscures results of the memory overclock)
Yes, I’m glad someone gets it! Some people are confused why I underclock the parts instead of overclocking them, but what they don’t realize is that it’s just a clock speed that offers a certain level of performance, and the hardware doesn’t care about the English terms “over” and “under” that we apply to it.

The GDDR5 error correction is but one of many reasons why I believe it’s better to lower clocks than raise them. Other reasons include almost complete freedom with clocks (good luck getting 20%-30% overclocks on any GPU), and the fact that overclocking the GPU can skew the results if it starts exposing limitations in the CPU/platform.

IMO it’s much more reliable to start with a certain level of performance and then see what we lose when we reduce individual areas, because the stock clocks tell us what performance we’re supposed to get.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
If the 5770 was primarily bandwidth limited I would've expected similar numbers but reversed, except that didn't happen. So everyone running around claiming the 5770's performance is held back by its memory is technically correct, but performance is held back more by the core. I actually think the bottleneck is at the driver level personally, and not related to the hardware.

So yes, video cards will always be bottlenecked somewhere, but there are different degrees within each GPU element. Also if all clocks show similar performance drops then that means the part is balanced, and doesn't have an obvious limitation in one key area.

Just wondering if there is any kind of testing that stresses memory bandwidth proportionally more than the core?

You have mentioned AA stressing memory bandwidth....but wouldn't that stress the core equally as well? I am asking this because (for me) minimum frame rates are what I notice the most.

In fact, I would take a card that is more resistant to dips even if it meant losing some maximum frame rate performance.
 
Last edited:

cbn

Lifer
Mar 27, 2009
12,968
221
106
Yes, almost all cards are limited by the core to some degree, but it's not always the primary limitation. A GF2 MX for example would have memory as the primary limitation by far.

You're right.

The HTPC cards are a good example. I'm sure someone could lower the core on those and not see much change in performance (if memory bandwidth was still bottlenecking)
 
Oct 27, 2007
17,010
1
0
So I'm upgrading my video card in the next couple of weeks, and I can't understand what could possibly compel me to buy the 5770. Seems to me like I'd be getting far better value from a 4870 or 4890.
 

BFG10K

Lifer
Aug 14, 2000
22,709
2,971
126
You have mentioned AA stressing memory bandwidth....but wouldn't that stress the core equally as well?
No, AA is usually primarily limited by memory bandwidth, especially edge based schemes like MSAA which save texturing fillrate and shader performance by only evaluating one shader/texture sample regardless of sample count.

AA also hits the ROPs but unless you have a problem with them (e.g. Radeon 2xxx/3xxx series), memory becomes the limitation far quicker.
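If it helps, here is a back-of-the-envelope sketch of why the cost of MSAA lands mostly on memory; the resolution and byte sizes are illustrative, not numbers from the article, and it ignores the colour/Z compression real hardware uses:

```python
# Back-of-the-envelope: MSAA framebuffer storage grows with the sample count,
# while the shader still runs once per pixel, so the extra work is mostly
# memory traffic (and ROPs). Illustrative figures only; ignores compression.

def msaa_framebuffer_mb(width: int, height: int, samples: int,
                        color_bytes: int = 4, depth_bytes: int = 4) -> float:
    """Approximate colour + depth storage for an MSAA render target, in MB."""
    return width * height * samples * (color_bytes + depth_bytes) / (1024 ** 2)

for samples in (1, 2, 4, 8):
    size = msaa_framebuffer_mb(1920, 1200, samples)
    print(f"{samples}xAA at 1920x1200: ~{size:.0f} MB of colour + Z")
# 1x ~18 MB, 4x ~70 MB, 8x ~141 MB -- the per-pixel shader cost is unchanged.
```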
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
No, AA is usually primarily limited by memory bandwidth, especially edge based schemes like MSAA which save texturing fillrate and shader performance by only evaluating one shader/texture sample regardless of sample count.

AA also hits the ROPs but unless you have a problem with them (e.g. Radeon 2xxx/3xxx series), memory becomes the limitation far quicker.

Thanks.

When I see a card with high average frame rates I often wonder whether that's because the card is actually resistant to dips or because its maximum frame rate is high.

Correct me if I am wrong, but couldn't an HD5770 be capable of higher maximum frame rates than an HD4870... but then the HD4870 is probably more resistant to dips with AA on (re: memory bandwidth doesn't become a restriction when graphical demands are at their highest)?

HD5770= 1.35 TFLOPs (faster computational time= higher maximum frame rate possible as long as memory bandwidth doesn't become a restriction)

HD4870= 1.20 TFLOPs (slower computational time but then the memory bandwidth is so much greater)
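For reference, here is a quick back-of-the-envelope using the commonly quoted specs (these are from memory, not from the article, so correct me if they're off):

```python
# Back-of-the-envelope: compute vs bandwidth for the two cards above,
# using the commonly quoted specs (not figures from the article).

def gflops(stream_processors: int, core_mhz: int) -> float:
    # Each stream processor is counted as one multiply-add (2 FLOPs) per clock.
    return stream_processors * 2 * core_mhz / 1000.0

def bandwidth_gb_s(bus_width_bits: int, effective_mem_mhz: int) -> float:
    return (bus_width_bits / 8) * effective_mem_mhz / 1000.0

# card: (stream processors, core MHz, bus width in bits, effective memory MHz)
cards = {
    "HD 5770": (800, 850, 128, 4800),
    "HD 4870": (800, 750, 256, 3600),
}

for name, (sps, core, bus, mem) in cards.items():
    print(f"{name}: {gflops(sps, core):.0f} GFLOPs, {bandwidth_gb_s(bus, mem):.1f} GB/s")
# HD 5770: ~1360 GFLOPs, 76.8 GB/s  -> more compute per GB/s of bandwidth
# HD 4870: ~1200 GFLOPs, 115.2 GB/s -> more bandwidth headroom when AA piles on
```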
 
Last edited:

bryanW1995

Lifer
May 22, 2007
11,144
32
91
So I'm upgrading my video card in the next couple of weeks, and I can't understand what could possibly compel me to buy the 5770. Seems to me like I'd be getting far better value from a 4870 or 4890.

Depends upon what you play, what res, what prices you can get in NZ, etc. If the prices are relatively close then the consensus seems to be to go 5770. BFG and marcvenice, to name two knowledgeable sources, seem to think that 5770 is being held back deliberately by drivers (aka waiting for gt300). I don't necessarily subscribe to that theory, but I think that we can all agree that 5770 is going to improve much more over the next year's driver updates than 4870/90 or gtx 260 will. Add in the incredible power savings of 5770 and dx11 (which at first look appears to do what dx10 should have done and will probably end up being much more useful) and you would probably need to save $30-$40 on a 4870 1gb or gtx 260 to even CONSIDER one of them. If you can get a 4890 or gtx 275 for a few $$ less then that would be appealing, but otherwise I think that 5770 is the way to go.


@just learning: maybe bfg can confirm/deny this, but I think that 5xxx is better with AA than 4xxx was. If that's the case then you would probably have fewer/shallower dips in fps with 5770.



EDIT: by the way, I now have the links to all 3 articles in the OP.
 
Last edited:

BFG10K

Lifer
Aug 14, 2000
22,709
2,971
126
Correct me if I am wrong, but couldn't an HD5770 be capable of higher maximum frame rates than an HD4870... but then the HD4870 is probably more resistant to dips with AA on (re: memory bandwidth doesn't become a restriction when graphical demands are at their highest)?

HD5770= 1.35 TFLOPs (faster computational time= higher maximum frame rate possible as long as memory bandwidth doesn't become a restriction)

HD4870= 1.20 TFLOPs (slower computational time but then the memory bandwidth is so much greater)
Yes, the 4870 should do better than the 5770 with AA because it has more memory bandwidth, but we can’t really tell at the moment because the 5xxx series appears to be underperforming in general.

Also there have been other improvements made to the 5xxx over the 4xxx relating to memory, such as caching. So memory bandwidth isn’t necessarily the end-all to AA performance in this particular situation.
 

BFG10K

Lifer
Aug 14, 2000
22,709
2,971
126
@just learning: maybe bfg can confirm/deny this, but I think that 5xxx is better with AA than 4xxx was. If that's the case then you would probably have fewer/shallower dips in fps with 5770.
Well, the 5770 has less bandwidth, but it has better caches than the 4870. Again, until we see the full performance of the 5xxx it’s tough to actually say how useful that extra bandwidth on the 4870 is.

EDIT: by the way, I now have the links to all 3 articles in the OP.
Thank you. :)
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
BFG and marcvenice, to name two knowledgeable sources, seem to think that 5770 is being held back deliberately by drivers (aka waiting for gt300).

I don't think they are waiting for Fermi, although they could be doing it to allow old inventory to get flushed out of the system. ATi is not going to want to take a write-down on existing 48xx parts; if the 5770 was considerably faster than the top tier 4xxx boards then they would be forced to take a write-down on them, which is not something any smart business wants to do.

Thanks, your comments and criticisms are always valuable to me in this area.

With all my years reading all the various tech sites to come and go with so many different journalists reporting, I think you are the only person as anal about IQ as I am that I have seen :D

From what I’ve seen, I don’t believe they are adjusting the LOD. Of course they could be doing it behind the scenes in some of the games I tested. If you look at the Doom 3 screenshots for example, even 8xSS had practically zero blurring, but I never touched the LOD.

Given some of the oddities we saw from your AF testing, do you think this could be an issue of not having enough room in the viewport to be visible? Have you tried it without AF? Just a curiosity thing, but it may help narrow it down.

They’ve had profiles since CCC went live, but they’re not like nVidia’s. nVidia’s automatically activate when a game is opened, but ATi’s don’t.

Heh, didn't word that well but that's what I meant. The auto profiles are so easy to get used to I hate having to go into the CP for anything these days :)

Since those old cards were limited to 2x/8x, that addendum covers me nicely.

Yes it does, I went back and tried to find some tests with it but I couldn't. Was tempted to dig out the Ti4200 but it dawned on me that I don't have an AGP system up and running at the moment :p
 

HurleyBird

Platinum Member
Apr 22, 2003
2,684
1,268
136
Well, the IQ portion of the article was simply great. It confirmed my feelings upon getting a 5870 that aniso quality took a dive from my previous 8800GTS. I very much hope that IQ issues are fixed in future drivers, as angle dependence is far better (only sometimes noticeable) than uneven filtering transition ranges (pretty much always noticeable). I'm shocked no other professional reviewers were able to do those same tests or even notice that something was obviously wrong.
That aside, the bottleneck investigation was extremely poor. Your reasoning against increasing clocks (error correction, poor scaling) is valid, but there are also valid reasons against testing by decreasing clocks. You're assuming scaling is linear, which it very well might not be in some situations. For example, going from 1100MHz memory to 1200MHz memory clocks might yield a 10% speedup, but that doesn't mean that going from 1200MHz to 1300MHz will behave the same. In the worst case something completely different will bottleneck the card past 1200MHz and you will receive a 0% speedup. That kind of example might be extremely unlikely, but you can still get flawed results.

A purely scientific approach would use neither methodology: Instead you would reduce both core and memory clocks in proportion, say down to half the original values, and THEN you would increase clockspeeds individually in increments. Increments are the key here, as you need several data points to determine a meaningful trend. A single data point yields hardly any information. Most interesting would be the difference between the original value where everything is at stock, and the value where the only thing changed is the core frequency reduced by half. That test would give you the max theoretical (because the CPU might become a bottleneck) speedup of a part with a memory controller twice as wide.
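Something along these lines is what I mean; the clock values are arbitrary and run_benchmark() is just a stand-in (a toy model here) for whatever suite you would actually run:

```python
# Sketch of the methodology described above: drop both clocks in proportion,
# then raise each one back toward stock in increments so you get several data
# points per domain instead of one. Purely illustrative; run_benchmark() is a
# toy model standing in for a real benchmark suite.

STOCK_CORE_MHZ, STOCK_MEM_MHZ = 850, 1200  # arbitrary illustrative stock clocks
STEPS = 4

def run_benchmark(core_mhz: int, mem_mhz: int) -> float:
    """Stand-in for a real run: FPS capped by whichever resource runs out first."""
    return min(core_mhz * 0.07, mem_mhz * 0.05)

half_core, half_mem = STOCK_CORE_MHZ // 2, STOCK_MEM_MHZ // 2

core_sweep, mem_sweep = [], []
for i in range(STEPS + 1):
    core = half_core + i * (STOCK_CORE_MHZ - half_core) // STEPS
    mem = half_mem + i * (STOCK_MEM_MHZ - half_mem) // STEPS
    core_sweep.append((core, run_benchmark(core, half_mem)))  # memory held at half
    mem_sweep.append((mem, run_benchmark(half_core, mem)))    # core held at half

# The comparison called "most interesting" above: everything at stock vs only
# the core halved. The shape of each sweep shows where scaling flattens out,
# which a single underclock per domain can't reveal.
stock_run = run_benchmark(STOCK_CORE_MHZ, STOCK_MEM_MHZ)
core_halved_run = run_benchmark(half_core, STOCK_MEM_MHZ)
print(core_sweep, mem_sweep, stock_run, core_halved_run)
```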

Also, your claim that R8xx is a balanced architecture based on speedown via underclocking is far too simplistic. Whether it is balanced or not depends not only on speedup, but also on the number of transistors (more accurately the area of transistors) devoted to each function. For example, if mem and core clocks are equally important to performance, but the memory controller only takes up 10% of the die while the shader units take up 50%, that is NOT indicative of a balanced architecture. In that example, focusing on the memory controller would take five times fewer resources to achieve a similar speedup as beefing up the shaders would. Pad limiting is of course an excuse, but it doesn't change whether an architecture is balanced or not.
 

BFG10K

Lifer
Aug 14, 2000
22,709
2,971
126
Given some of the oddities we saw from your AF testing, do you think this could be an issue of not having enough room in the viewport to be visible?
I don’t believe so; I tried multiple games (e.g. Call of Duty 1, Doom 3, Quake 4) and SS exhibited practically no blurring. The issue with Call of Duty 4 clearly wasn’t normal.
Have you tried it without AF? Just a curiosity thing, but it may help narrow it down.
No, but I’d never run below 16xAF anyway so it’s probably a moot point.

I get the feeling I might be revisiting the issue in the future anyway. ;)
 

dug777

Lifer
Oct 13, 2004
24,778
4
0
Nice work mate :) Almost tempted to pick two up :eek:

On the balanced cards issue, I always feel that the 4870 was far better balanced than the 4850, would be interesting to see some 4870 512mb benchmarks cf a 4850 with equivalent core.

I've always seemed to score incredibly imbalanced cards, one way or the other, but a 4850 that does 780 core but barely budges on mem is a particularly cruel joke (not that I don't get a substantial boost in FPS in most things, but the wasted potential smashing into that mem bandwidth bottleneck brings tears to my eyes :()
 

BFG10K

Lifer
Aug 14, 2000
22,709
2,971
126
I'm shocked no other professional reviewers were able to do those same tests or even notice that something was obviously wrong.
Heh, like I said earlier, my methods are somewhat unorthodox, but that’s because I try to show things that are a little different to the standard fare.
You're assuming scaling is linear…
Not at all; I’m only looking at what shows the most performance loss when the reduction is constant for both. If for example I had gotten 4% from the core and 2% from the memory I would have reached exactly the same conclusion.

In actual fact it’ll almost never be linear because it’s unlikely one clock will be the sole bottleneck in every situation. What’s important is how each clock compares relative to the others, rather than the actual FPS or % drop.
A purely scientific approach would use neither methodology: Instead you would reduce both core and memory clocks in proportion, say down to half the original values, and THEN you would increase clockspeeds individually in increments. Increments are the key here, as you need several data points to determine a meaningful trend. A single data point yields hardly any information.
That would certainly be more comprehensive, but I believe what I showed is still useful and accurate.

What I did operates on the simple philosophy that you have X amount of performance, derived from components A and B running at stock speed.

If we want to see which component is more important, we reduce one while leaving the other alone, and see which lowers performance the most.

The component that shows the lowest impact has the least to do with performance when everything is at stock.
Also, your claim that R8xx is a balanced architecture based on speedown via underclocking is far too simplistic. Whether it is balanced or not depends not only on speedup, but also on the number of transistors (more accurately the area of transistors) devoted to each function.
Understood, but relative transistor count was beyond the scope of the article, and was never intended to be covered. The article was only about performance so when I commented the 5770 was balanced, I meant that solely from a performance point of view.

This is unlike the 8800 Ultra, for example, which relied on the core more than twice as much as on the memory. That is an unbalanced design from a performance point of view, because either the core is too slow or, if you want to look at it another way, the shader and memory are too fast.