5770 benchmarks at AlienBabelTech


MarcVenice

Moderator Emeritus
Apr 2, 2007
5,664
0
0
Toyota, I said he would be bottlenecked, which he would be, were it not for the fact that he said he would be running insane amounts of AA to create a GPU bottleneck. Nevertheless, I am interested in seeing how two or even three HD 5870s would do with a C2D in, let's say, Stalker: Clear Sky. Then again, most people who can afford such awesome GPU power can afford a Core i7 920 and a decent tri-CF mobo, wouldn't they? So it's a pretty moot discussion.
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
Very nice work on the IQ article BFG, as per usual for you.

Couple different things I thought were a bit odd. They aren't adjusting LOD bias at the driver level when enabling SS? This is a rather poor rookie mistake for them to be making, 3dfx was ripped apart for the blur it caused and had to fix it in their drivers after a relatively short time on the market. I'm assuming that someone at ATi will get their head out of their rectum and fix this in an upcoming driver update(easy fix for them). Any word on game profile support per title coming to CC? I suppose if this was implemented it wouldn't be that big of an issue, set once and be all set.

The AF undersampling is a very big letdown; as you showed in your analysis, in the real world it is worse than the 4xxx series in certain circumstances, and that part wasn't all that great to begin with. I wonder how much additional performance it would take for them to just do it right, hopefully not too much so we can see this fixed in future driver revisions.

Thus the 5xxx is the first board in consumer space to offer truly angle invariant anisotropic filtering.

Come on BFG, you did not honestly write that did you? You can certainly debate the performance levels, but everything from NV10 to NV2A supported truly angle invariant anisotropic filtering, you know that :)

Pretty much it seems that the only real issues that the 5xxx series has can be fixed at the driver level. Not entirely sure how their sampling hardware works so that one may be a bit more complex, but if they can get it fixed it would certainly be a far more tempting part to pick up once they hit mainstream availability.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Thank goodness someone linked to it of their own accord in a post, because I couldn't do it (against the TOS of the forum). :)

I also tested bottlenecking, and while I can't link to it in my posts either, I found the 5770 is actually primarily limited by its core, not by its memory as is commonly accepted.

Actually the two were quite close because it's quite a balanced part, but the core edged it overall. From a 20% underclock on each, the core lost 11.20% performance overall, while the memory lost 8.46%.

Well aren't all video cards limited by the core?

In fact, I can't even imagine a video card existing that isn't completely dependent on the core. The memory bandwidth is merely there to support the core's processing power, right?

I wonder what would happen if you compared the HD5770 and HD4890. Both have the same core processing power, so decreasing bandwidth by a similar amount on both would be interesting (i.e., take 20% memory bandwidth off the HD4890 vs 20% off the HD5770, then measure the respective performance drops).

I am no engineer (obviously) and don't know much about architecture, but something tells me these things involve the engineer trying to balance various trade-offs. Maybe ATI decided that including more processing power on the HD5770 (with less bandwidth) was more bang for the buck?

P.S. Thanks for doing this review. I was always interested in seeing underclocking results on the memory (considering the new GDDR5, or maybe its controller, has some type of error correction process that obscures the results of a memory overclock).
 
Last edited:

cbn

Lifer
Mar 27, 2009
12,968
221
106
We can rule out the memory because it doesn't seem to be the primary limitation based on my findings. There's a bottleneck elsewhere, and I'm convinced it's at the driver level, intentional or not.

I don't think we've seen the full performance of the 5xxx series yet, and I expect something interesting to happen when Fermi launches.

Yes, but isn't a ~8% decrease (in performance) from the memory underclock quite significant when an equivalent decrease in core only yields an 11% decrease in performance?

Or maybe the HD4890 is just overspec'd on memory bandwidth (compared to HD5770) to the point it is less cost effective? (ie, the extra memory bandwidth actually boosts performance but is not as cost effective from an engineering standpoint as increasing core speed/stream processors)
 
Last edited:

cbn

Lifer
Mar 27, 2009
12,968
221
106
I am still not sure about the new ATI cards.

Well Nvidia is doing something similar.

In fact, GT300 has more than double the stream processors (512 vs 240) compared to GTX285...but only ~50% more bandwidth compared to the previous generation.
 

bryanW1995

Lifer
May 22, 2007
11,144
32
91
actually, gt 300 doesn't have sp's at all. they don't have ANYTHING at all right now except spec sheets. once they are actually built they will have "cuda cores" instead of "shader processors".
 

BFG10K

Lifer
Aug 14, 2000
22,709
2,971
126
Very nice work on the IQ article BFG, as per usual for you.
Thanks, your comments and criticisms are always valuable to me in this area. :)

Couple different things I thought were a bit odd. They aren't adjusting LOD bias at the driver level when enabling SS? This is a rather poor rookie mistake for them to be making, 3dfx was ripped apart for the blur it caused and had to fix it in their drivers after a relatively short time on the market. I'm assuming that someone at ATi will get their head out of their rectum and fix this in an upcoming driver update(easy fix for them).
From what I’ve seen, I don’t believe they are adjusting the LOD. Of course they could be doing it behind the scenes in some of the games I tested. If you look at the Doom 3 screenshots for example, even 8xSS had practically zero blurring, but I never touched the LOD.

If they aren’t adjusting the LOD then they might have something smarter in place (it’d be interesting to find out what that is), but it doesn’t look like it works properly in every situation.

Any word on game profile support per title coming to CC? I suppose if this was implemented it wouldn't be that big of an issue, set once and be all set.
They’ve had profiles since CCC went live, but they’re not like nVidia’s. nVidia’s automatically activate when a game is opened, but ATi’s don’t. They also need to add a LOD adjustment into their control panel since ATi Tray Tools isn’t updated often enough for new cards to make it a viable replacement for CCC.

Come on BFG, you did not honestly write that did you? You can certainly debate the performance levels, but everything from NV10 to NV2A supported truly angle invariant anisotropic filtering, you know that :)
I honestly wasn’t sure whether it was perfectly angle-invariant, or just much better than everything except the 5xxx. I looked for sample patterns but I couldn’t find any.

In case it was, I’ve altered the sentence to read:

Thus the 5xxx is the first board in consumer space to offer truly angle invariant anisotropic filtering at a 16x level.
Since those old cards were limited to 2x/8x, that addendum covers me nicely. :)
 

BFG10K

Lifer
Aug 14, 2000
22,709
2,971
126
Well aren't all video cards limited by the core?

In fact, I can't even imagine a video card existing that isn't completely dependent on the core. The memory bandwidth is merely there to support the core's processing power, right?
Yes, almost all cards are limited by the core to some degree, but it’s not always the primary limitation. A GF2 MX for example would have memory as the primary limitation by far.

Yes, but isn't a ~8% decrease (in performance) from the memory underclock quite significant when an equivalent decrease in core only yields an 11% decrease in performance?
Yes, but remember the whole internet was convinced the 5770 was primarily bottlenecked by its memory. If that was the case then reducing it should’ve dropped performance the most, but it didn’t; performance dropped more when the core was underclocked.

So yes, you’re quite right, 11.20% (core) vs 8.46% is very close, but it proves two things:

(1) The core is the primary limitation, not the memory.
(2) The 5770 is actually quite a balanced card and has been equipped with the right amount of bandwidth relative to its processing power.

As a comparison, on the 8800 Ultra I observed a 12.64% drop from the core vs 5.45% from the memory overall. The Ultra’s core/memory disparity is even more striking considering I ran half of its benchmarks without AA, while the 5770 didn’t run anything lower than 2xAA.

If the 5770 was primarily bandwidth limited I would’ve expected similar numbers but reversed, except that didn’t happen. So everyone running around claiming the 5770’s performance is held back by its memory is technically correct, but performance is held back more by the core. I actually think the bottleneck is at the driver level personally, and not related to the hardware.

So yes, video cards will always be bottlenecked somewhere, but there are different degrees within each GPU element. Also if all clocks show similar performance drops then that means the part is balanced, and doesn’t have an obvious limitation in one key area.
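To put a rough number on that balance, here is a quick sketch of the arithmetic (not from the article; the ratio helper is just an illustration using the figures quoted above):

```python
# Rough sketch: compare how much each card leans on its core vs its memory,
# using the performance drops quoted in this thread. Not from the article;
# the helper name is made up.

def core_vs_memory_ratio(core_loss_pct: float, mem_loss_pct: float) -> float:
    """Ratio of performance lost to a core underclock vs a memory underclock.
    ~1.0 means the part is balanced; much greater than 1.0 means core-bound."""
    return core_loss_pct / mem_loss_pct

hd5770     = core_vs_memory_ratio(11.20, 8.46)   # ~1.3 -> fairly balanced, core edges it
ultra_8800 = core_vs_memory_ratio(12.64, 5.45)   # ~2.3 -> clearly core-bound

print(f"HD 5770 core/memory dependence:    {hd5770:.2f}")
print(f"8800 Ultra core/memory dependence: {ultra_8800:.2f}")
```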

P.S. Thanks for doing this review. I was always interested in seeing underclocking results on the memory (considering the new GDDR5 or maybe its the controller has some type of error correction process that obscures results of the memory overclock)
Yes, I’m glad someone gets it! Some people are confused why I underclock the parts instead of overclocking them, but what they don’t realize is that it’s just a clock speed that offers a certain level of performance, and the hardware doesn’t care about the English terms “over” and “under” that we apply to it.

The GDDR5 error correction is but one of many reasons why I believe it’s better to lower clocks than raise them. Other reasons include almost complete freedom with clocks (good luck getting 20%-30% overclocks on any GPU), and the fact that overclocking the GPU can skew the results if it starts exposing limitations in the CPU/platform.

IMO it’s much more reliable to start with a certain level of performance and then see what we lose when we reduce individual areas, because the stock clocks tell us what performance we’re supposed to get.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
If the 5770 was primarily bandwidth limited I would've expected similar numbers but reversed, except that didn't happen. So everyone running around claiming the 5770's performance is held back by its memory is technically correct, but performance is held back more by the core. I actually think the bottleneck is at the driver level personally, and not related to the hardware.

So yes, video cards will always be bottlenecked somewhere, but there are different degrees within each GPU element. Also if all clocks show similar performance drops then that means the part is balanced, and doesn't have an obvious limitation in one key area.

Just wondering if there is any kind of testing that stresses memory bandwidth proportionally more than the core?

You have mentioned AA stressing memory bandwidth....but wouldn't that stress the core equally as well? I am asking this because (for me) minimum frame rates are what I notice the most.

In fact, I would take a card that is more resistant to dips even if it meant losing some maximum frame rate performance.
 
Last edited:

cbn

Lifer
Mar 27, 2009
12,968
221
106
Yes, almost all cards are limited by the core to some degree, but it's not always the primary limitation. A GF2 MX for example would have memory as the primary limitation by far.

You're right.

The HTPC cards are a good example. I'm sure someone could lower the core on those and not see much change in performance (if memory bandwidth was still bottlenecking)
 
Oct 27, 2007
17,010
1
0
So I'm upgrading my video card in the next couple of weeks, and I can't understand what could possibly compel me to buy the 5770. Seems to me like I'd be getting far better value from a 4870 or 4890.
 

BFG10K

Lifer
Aug 14, 2000
22,709
2,971
126
You have mentioned AA stressing memory bandwidth....but wouldn't that stress the core equally as well?
No, AA is usually primarily limited by memory bandwidth, especially edge based schemes like MSAA which save texturing fillrate and shader performance by only evaluating one shader/texture sample regardless of sample count.

AA also hits the ROPs but unless you have a problem with them (e.g. Radeon 2xxx/3xxx series), memory becomes the limitation far quicker.
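If it helps, here is a back-of-the-envelope sketch of why the cost of MSAA lands mostly on memory; the resolution and byte sizes are illustrative, not numbers from the article, and it ignores the colour/Z compression real hardware uses:

```python
# Back-of-the-envelope: MSAA framebuffer storage grows with the sample count,
# while the shader still runs once per pixel, so the extra work is mostly
# memory traffic (and ROPs). Illustrative figures only; ignores compression.

def msaa_framebuffer_mb(width: int, height: int, samples: int,
                        color_bytes: int = 4, depth_bytes: int = 4) -> float:
    """Approximate colour + depth storage for an MSAA render target, in MB."""
    return width * height * samples * (color_bytes + depth_bytes) / (1024 ** 2)

for samples in (1, 2, 4, 8):
    size = msaa_framebuffer_mb(1920, 1200, samples)
    print(f"{samples}xAA at 1920x1200: ~{size:.0f} MB of colour + Z")
# 1x ~18 MB, 4x ~70 MB, 8x ~141 MB -- the per-pixel shader cost is unchanged.
```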
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
No, AA is usually primarily limited by memory bandwidth, especially edge based schemes like MSAA which save texturing fillrate and shader performance by only evaluating one shader/texture sample regardless of sample count.

AA also hits the ROPs but unless you have a problem with them (e.g. Radeon 2xxx/3xxx series), memory becomes the limitation far quicker.

Thanks.

When I see a card with high average frame rates I often wonder whether that's because the card is actually resistant to dips or because its maximum frame rate is high.

Correct me if I am wrong, but couldn't an HD5770 be capable of higher maximum frame rates than an HD4870... but then the HD4870 is probably more resistant to dips with AA on (re: memory bandwidth doesn't become a restriction when graphical demands are at their highest)?

HD5770= 1.35 TFLOPs (faster computational time= higher maximum frame rate possible as long as memory bandwidth doesn't become a restriction)

HD4870= 1.20 TFLOPs (slower computational time but then the memory bandwidth is so much greater)
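For reference, here is a quick back-of-the-envelope using the commonly quoted specs (these are from memory, not from the article, so correct me if they're off):

```python
# Back-of-the-envelope: compute vs bandwidth for the two cards above,
# using the commonly quoted specs (not figures from the article).

def gflops(stream_processors: int, core_mhz: int) -> float:
    # Each stream processor is counted as one multiply-add (2 FLOPs) per clock.
    return stream_processors * 2 * core_mhz / 1000.0

def bandwidth_gb_s(bus_width_bits: int, effective_mem_mhz: int) -> float:
    return (bus_width_bits / 8) * effective_mem_mhz / 1000.0

# card: (stream processors, core MHz, bus width in bits, effective memory MHz)
cards = {
    "HD 5770": (800, 850, 128, 4800),
    "HD 4870": (800, 750, 256, 3600),
}

for name, (sps, core, bus, mem) in cards.items():
    print(f"{name}: {gflops(sps, core):.0f} GFLOPs, {bandwidth_gb_s(bus, mem):.1f} GB/s")
# HD 5770: ~1360 GFLOPs, 76.8 GB/s  -> more compute per GB/s of bandwidth
# HD 4870: ~1200 GFLOPs, 115.2 GB/s -> more bandwidth headroom when AA piles on
```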
 
Last edited:

bryanW1995

Lifer
May 22, 2007
11,144
32
91
So I'm upgrading my video card in the next couple of weeks, and I can't understand what could possibly compel me to buy the 5770. Seems to me like I'd be getting far better value from a 4870 or 4890.

Depends upon what you play, what res, what prices you can get in NZ, etc. If the prices are relatively close then the consensus seems to be to go 5770. BFG and marcvenice, to name two knowledgeable sources, seem to think that 5770 is being held back deliberately by drivers (aka waiting for gt300). I don't necessarily subscribe to that theory, but I think that we can all agree that 5770 is going to improve much more over the next year's driver updates than 4870/90 or gtx 260 will. Add in the incredible power savings of 5770 and dx11 (which at first look appears to do what dx10 should have done and will probably end up being much more useful) and you would probably need to save $30-$40 on a 4870 1gb or gtx 260 to even CONSIDER one of them. If you can get a 4890 or gtx 275 for a few $$ less then that would be appealing, but otherwise I think that 5770 is the way to go.


@just learning: maybe bfg can confirm/deny this, but I think that 5xxx is better with AA than 4xxx was. If that's the case then you would probably have fewer/shallower dips in fps with 5770.



EDIT: by the way, I now have the links to all 3 articles in the OP.
 
Last edited:

BFG10K

Lifer
Aug 14, 2000
22,709
2,971
126
Correct me if I am wrong, but couldn't an HD5770 be capable of higher maximum frame rates than an HD4870... but then the HD4870 is probably more resistant to dips with AA on (re: memory bandwidth doesn't become a restriction when graphical demands are at their highest)?

HD5770= 1.35 TFLOPs (faster computational time= higher maximum frame rate possible as long as memory bandwidth doesn't become a restriction)

HD4870= 1.20 TFLOPs (slower computational time but then the memory bandwidth is so much greater)
Yes, the 4870 should do better than the 5770 with AA because it has more memory bandwidth, but we can’t really tell at the moment because the 5xxx series appears to be underperforming in general.

Also there have been other improvements made to the 5xxx over the 4xxx relating to memory, such as caching. So memory bandwidth isn’t necessarily the end-all to AA performance in this particular situation.
 

BFG10K

Lifer
Aug 14, 2000
22,709
2,971
126
@just learning: maybe bfg can confirm/deny this, but I think that 5xxx is better with AA than 4xxx was. If that's the case then you would probably have fewer/shallower dips in fps with 5770.
Well, the 5770 has less bandwidth, but it has better caches than the 4870. Again, until we see the full performance of the 5xxx it’s tough to actually say how useful that extra bandwidth on the 4870 is.

EDIT: by the way, I now have the links to all 3 articles in the OP.
Thank you. :)
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
BFG and marcvenice, to name two knowledgeable sources, seem to think that 5770 is being held back deliberately by drivers (aka waiting for gt300).

I don't think they are waiting for Fermi, although they could be doing it to allow old inventory to get flushed out of the system. ATi is not going to want to take a write-down on existing 48xx parts; if the 5770 was considerably faster than the top tier 4xxx boards then they would be forced to take a write-down on them, which is not something any smart business wants to do.

Thanks, your comments and criticisms are always valuable to me in this area.

With all my years reading all the various tech sites to come and go with so many different journalists reporting, I think you are the only person as anal about IQ as I am that I have seen :D

From what I’ve seen, I don’t believe they are adjusting the LOD. Of course they could be doing it behind the scenes in some of the games I tested. If you look at the Doom 3 screenshots for example, even 8xSS had practically zero blurring, but I never touched the LOD.

Given some of the oddities we saw from your AF testing, do you think this could be an issue of not having enough room in the viewport to be visible? Have you tried it without AF? Just a curiosity thing, but it may help narrow it down.

They’ve had profiles since CCC went live, but they’re not like nVidia’s. nVidia’s automatically activate when a game is opened, but ATi’s don’t.

Heh, didn't word that well but that's what I meant. The auto profiles are so easy to get used to I hate having to go into the CP for anything these days :)

Since those old cards were limited to 2x/8x, that addendum covers me nicely.

Yes it does, I went back and tried to find some tests with it but I couldn't. Was tempted to dig out the Ti4200 but it dawned on me that I don't have an AGP system up and running at the moment :p
 

HurleyBird

Platinum Member
Apr 22, 2003
2,684
1,268
136
Well, the IQ portion of the article was simply great. It confirmed my feelings upon getting a 5870 that aniso quality took a dive from my previous 8800GTS. I very much hope that IQ issues are fixed in future drivers, as angle dependence is far better (only sometimes noticeable) than uneven filtering transition ranges (pretty much always noticeable). I'm shocked no other professional reviewers were able to do those same tests or even notice that something was obviously wrong.
That aside, the bottleneck investigation was extremely poor. Your reasoning against increasing clocks (error correction, poor scaling) is valid, but there are also valid reasons against testing by decreasing clocks. You're assuming scaling is linear, which it very well might not be in some situations. For example, going from 1100MHz memory to 1200MHz memory clocks might yield a 10% speedup, but that doesn't mean that going from 1200MHz to 1300MHz will behave the same. In the worst case something completely different will bottleneck the card past 1200MHz and you will receive a 0% speedup. That kind of example might be extremely unlikely, but you can still get flawed results.

A purely scientific approach would use neither methodology: Instead you would reduce both core and memory clocks in proportion, say down to half the original values, and THEN you would increase clockspeeds individually in increments. Increments are the key here, as you need several data points to determine a meaningful trend. A single data point yields hardly any information. Most interesting would be the difference between the original value where everything is at stock, and the value where the only thing changed is the core frequency reduced by half. That test would give you the max theoretical (because the CPU might become a bottleneck) speedup of a part with a memory controller twice as wide.
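Something along these lines is what I mean; the clock values are arbitrary and run_benchmark() is just a stand-in (a toy model here) for whatever suite you would actually run:

```python
# Sketch of the methodology described above: drop both clocks in proportion,
# then raise each one back toward stock in increments so you get several data
# points per domain instead of one. Purely illustrative; run_benchmark() is a
# toy model standing in for a real benchmark suite.

STOCK_CORE_MHZ, STOCK_MEM_MHZ = 850, 1200  # arbitrary illustrative stock clocks
STEPS = 4

def run_benchmark(core_mhz: int, mem_mhz: int) -> float:
    """Stand-in for a real run: FPS capped by whichever resource runs out first."""
    return min(core_mhz * 0.07, mem_mhz * 0.05)

half_core, half_mem = STOCK_CORE_MHZ // 2, STOCK_MEM_MHZ // 2

core_sweep, mem_sweep = [], []
for i in range(STEPS + 1):
    core = half_core + i * (STOCK_CORE_MHZ - half_core) // STEPS
    mem = half_mem + i * (STOCK_MEM_MHZ - half_mem) // STEPS
    core_sweep.append((core, run_benchmark(core, half_mem)))  # memory held at half
    mem_sweep.append((mem, run_benchmark(half_core, mem)))    # core held at half

# The comparison called "most interesting" above: everything at stock vs only
# the core halved. The shape of each sweep shows where scaling flattens out,
# which a single underclock per domain can't reveal.
stock_run = run_benchmark(STOCK_CORE_MHZ, STOCK_MEM_MHZ)
core_halved_run = run_benchmark(half_core, STOCK_MEM_MHZ)
print(core_sweep, mem_sweep, stock_run, core_halved_run)
```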

Also, your claim that R8xx is a balanced architecture based on speedown via underclocking is far too simplistic. Whether it is balanced or not depends not only on speedup, but also on the number of transistors (more accurately the area of transistors) devoted to each function. For example, if mem and core clocks are equally important to performance, but the memory controller only takes up 10% of the die while the shader units take up 50%, that is NOT indicative of a balanced architecture. In that example, focusing on the memory controller would take five times fewer resources to achieve a similar speedup as beefing up the shaders would. Pad limiting is of course an excuse, but it doesn't change whether an architecture is balanced or not.
 

BFG10K

Lifer
Aug 14, 2000
22,709
2,971
126
Given some of the oddities we saw from your AF testing, do you think this could be an issue of not having enough room in the viewport to be visible?
I don’t believe so; I tried multiple games (e.g. Call of Duty 1, Doom 3, Quake 4) and SS exhibited practically no blurring. The issue with Call of Duty 4 clearly wasn’t normal.
Have you tried it without AF? Just a curiosity thing, but it may help narrow it down.
No, but I’d never run below 16xAF anyway so it’s probably a moot point.

I get the feeling I might be revisiting the issue in the future anyway. ;)
 

dug777

Lifer
Oct 13, 2004
24,778
4
0
Nice work mate :) Almost tempted to pick two up :eek:

On the balanced cards issue, I always feel that the 4870 was far better balanced than the 4850, would be interesting to see some 4870 512mb benchmarks cf a 4850 with equivalent core.

I've always seemed to score incredibly imbalanced cards, one way or the other, but a 4850 that does 780 core but barely budges on mem is a particularly cruel joke (not that I don't get a substantial boost in FPS in most things, but the wasted potential smashing into that mem bandwidth bottleneck brings tears to my eyes :()
 

BFG10K

Lifer
Aug 14, 2000
22,709
2,971
126
I'm shocked no other professional reviewers were able to do those same tests or even notice that something was obviously wrong.
Heh, like I said earlier, my methods are somewhat unorthodox, but that’s because I try to show things that are a little different to the standard fare.
You're assuming scaling is linear…
Not at all; I’m only looking at what shows the most performance loss when the reduction is constant for both. If for example I had gotten 4% from the core and 2% from the memory I would have reached exactly the same conclusion.

In actual fact it’ll almost never be linear because it’s unlikely one clock will be the sole bottleneck in every situation. What’s important is how each clock compares relative to the others, rather than the actual FPS or % drop.
A purely scientific approach would use neither methodology: Instead you would reduce both core and memory clocks in proportion, say down to half the original values, and THEN you would increase clockspeeds individually in increments. Increments are the key here, as you need several data points to determine a meaningful trend. A single data point yields hardly any information.
That would certainly be more comprehensive, but I believe what I showed is still useful and accurate.

What I did operates on the simple philosophy that you have X amount of performance, derived from components A and B running at stock speed.

If we want to see which component is more important, we reduce one while leaving the other alone, and see which lowers performance the most.

The component that shows the lowest impact has the least to do with performance when everything is at stock.
Also, your claim that R8xx is a balanced architecture based on speedown via underclocking is far too simplistic. Whether it is balanced or not depends not only on speedup, but also on the number of transistors (more accurately the area of transistors) devoted to each function.
Understood, but relative transistor count was beyond the scope of the article, and was never intended to be covered. The article was only about performance so when I commented the 5770 was balanced, I meant that solely from a performance point of view.

This is unlike the 8800 Ultra, for example, which relied on the core more than twice as much as on the memory. That is an unbalanced design from a performance point of view, because either the core is too slow or, if you want to look at it another way, the shader and memory are too fast.