NVIDIA 9800GTX+ Review Thread


ChronoReverse

Platinum Member
Mar 4, 2004
2,562
31
91
Originally posted by: keysplayr2003
No. They are extremely inefficient in most circumstances, if not all circumstances. That is why they needed so many of them. 800 shaders should be a juggernaut Core. Un----touch----able. Hell, even half that (400) would be unstoppable. But that is not the case here. AMD did the best they could in a given amount of time. And with the time they had, their only option was to place an ocean of these same shaders onto a core. Die size and transistor count is only relevant from a cost perspective. I'm talking pure performance here Extelleron. I just want to make that clear. Performance per shader. And when all is said and done, it doesn't really matter as long as they get the job done.
Yes, Nv shaders are different from AMD/ATI's. Vastly different. It takes 1600 AMD/ATI shaders (2x4850's) to stay with a 240 shader GTX280. Vastly different.

Preliminary information right now from the Beyond3d forums has it that all 800 shaders of the RV770 may actually be "fat" and thus efficiency will be _much_ higher than I anticipated.

The real NDA expires tonight so we'll find out for sure but this should be pretty reliable.


This may be the reason why the 4870 turned out to be so fast and competes well with the GTX260.
 

Keysplayr

Elite Member
Jan 16, 2003
21,211
50
91
Originally posted by: ChronoReverse
Originally posted by: keysplayr2003
No. They are extremely inefficient in most circumstances, if not all circumstances. That is why they needed so many of them. 800 shaders should be a juggernaut Core. Un----touch----able. Hell, even half that (400) would be unstoppable. But that is not the case here. AMD did the best they could in a given amount of time. And with the time they had, their only option was to place an ocean of these same shaders onto a core. Die size and transistor count is only relevant from a cost perspective. I'm talking pure performance here Extelleron. I just want to make that clear. Performance per shader. And when all is said and done, it doesn't really matter as long as they get the job done.
Yes, Nv shaders are different from AMD/ATI's. Vastly different. It takes 1600 AMD/ATI shaders (2x4850's) to stay with a 240 shader GTX280. Vastly different.

Preliminary information right now from the Beyond3d forums has it that all 800 shaders of the RV770 may actually be "fat" and thus efficiency will be _much_ higher than I anticipated.

The real NDA expires tonight so we'll find out for sure but this should be pretty reliable.

"fat" as in "No more 1 complex shader and 4 simple shaders out of every 5? Is that one complex shader what you call "fat"?
If that were the case, we would be seeing a helluva lot more performance out of RV770. It would be almost staggering. So, methinks Beyond3d is barking up the wrong tree. AMD may have tweaked the shaders and the architecture a bit, but they are still VEC5 shaders if the performance in any indicator. If anything, the 48xx has 160 "fat" shaders and 640 annorexic shaders as opposed to R6xx 64 "fat" shaders and 256 annorexic shaders. IMHO.
 

ChronoReverse

Platinum Member
Mar 4, 2004
2,562
31
91
Yeah, it's rather unbelievable. However, the RV770 might be just bottlenecked by other things. It already is beating the GTX260 in some benches with a slower CPU after all.


Rys from Beyond3D
You'll only be waiting for 2.5 hours.

Random question answering from the thread and notes on things, since I won't have anything for NDA expiry (it's already 2am here, I'm not done, and I have work in the morning sadly):

  • FP16 filtering is half speed, and the samplers are limited by available interpolators (only 32 texcoords/clk) when processing INT8 at full speed
  • 260mm2, 960M transistors or so
  • Huge focus on area efficiency and perf/watt
  • Chip was pad limited in the beginning***, so the last couple of SIMDs are value adds that weren't originally planned for. Explains the first point a little bit.
  • ROPs are mostly 2x everywhere measured with MSAA and really help the chip go fast. ROP MSAA downfilter this time.
  • Seems to be 64KiB L1 per sampler with huge L1 bandwidth, new design
  • Finding peak rates everywhere on the chip has been easy. I've seen 1Tflop FP32, full bilinear rates and peak INT8 blend and Z-only (256 Zs/clock, yay!)
  • GDDR5 is really what lets the chip kick some real ass
  • GS perf is up compared to RV670, maybe a new (bigger) coalescing stream out cache there, and more threads in flight
  • Colour cache is per ROP, same as R6
  • 16KiB per SIMD shared memory + 16KiB global (not the SMX)
  • All 800 SPs can do integer, fat one only for specials still. 1 FP32 MAD or 1 FP<->INT or 1 integer op/clock for all
  • New caching scheme for the RF I think
  • Orlando-led design but very distributed in terms of teams. Scott Hartog led, and he worked on i740 at Intel
  • Over 5MiB of SRAM on-chip if you count every pool
  • New UVD block with 7.1 over HDMI
  • No ring bus MC, new controller nice and efficient due to new ROP design
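
The "1 Tflop FP32" figure in those notes checks out against the thread's shader counts. A quick sketch, assuming a 625 MHz core clock (the HD 4850's) and counting one FP32 MAD as 2 flops per SP per clock; those two figures are assumptions, not from the notes themselves:

```python
# Sanity check of the quoted "1 Tflop FP32" peak rate.
# Assumptions: 625 MHz core clock (HD 4850), 1 FP32 MAD (= 2 flops) per SP per clock.
sps = 800
flops_per_sp_per_clk = 2          # multiply-add counts as two flops
core_mhz = 625

fp32_gflops = sps * flops_per_sp_per_clk * core_mhz / 1000  # in GFLOPS
print(fp32_gflops)  # 1000.0 -> the "1 Tflop FP32" peak
```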

It's the single most impressive graphics processor (and pairing with a memory technology, nice one Joe!) I've ever seen, when looked at as a whole. I don't say that lightly either, there have been some winning chips over the years.

Deeply impressive and really deserves to get ATI back on the map when it comes to performance 3D graphics. Sorry I won't have anything more filling for arch NDA expiry, go read hardware.fr, Tech Report and Morgoth's pieces if you want more data.

*** :lol:^infinity, that's honestly the best thing ever


Huh, he corrected it, looks like it was a typo. Heart attack over =P


FALSE ALARM
 

MarcVenice

Moderator Emeritus
Apr 2, 2007
5,664
0
0
Originally posted by: bryanW1995
Originally posted by: MarcVenice
You're arguing with Chizow, it's useless, he won't budge, ever, no matter how damning the evidence you give him. The HD4850 might not be impressive from a performance-crown point of view, but it was never produced to be ATI's top performer, let alone beat Nvidia's top performer. The HD4850 is VERY impressive if you look at the performance you get for $200: the same and often better performance than the 9800gtx, which was selling for $300 no more than a week ago. It FORCED nvidia to drop prices considerably, on a videocard that's more expensive to produce than the HD4850. Knowing the HD4850 sometimes even comes close to Nvidia's $400 videocard, the GTX260, is even more impressive.

We will have to wait and see how impressive the HD4870 will actually be, and what it will do with prices of the gtx280 and gtx260. But only in a few months can we really determine a winner, when nvidia comes with 55nm GTX260's and GTX280's. But then again, ATI will have their HD4870X2 out by then. Exciting times, when financial results get released we will know for sure how good ATI is faring against Nvidia. The 8800gt launch was called impressive, but the HD4850 launch beats it fair and square.

I'm stoked about 4850 as much as the next gpu fan, but I think that it's an overstatement to say that it beats the 8800gt launch fair and square. 8800gt, iirc, was called "the only card that matters" by anand during his review. it completely obliterated everything else on the market other than 8800gtx. It beat amd's incredibly successful 3870 (compared to the 2900xt that it replaced/rendered useless) by a good 15-20%, and nothing else even came close. It was close enough to 8800gtx that people were selling their 8800gtx's to get 2 8800gt's in xfire, or just one 8800gt and a boatload of cash. It was just plain stupid.

Amd has kicked some serious butt here, don't get me wrong, but the nvidia of june 2008 isn't asleep at the wheel like amd was from nov 06 to nov 07. nvidia is going to lose this round, but they're going down swingin' at least and we will all benefit because of it.

The 8800gt was nice, but imo it was still a rehash of g80; it didn't beat my 8800gts 320mb by a big enough margin for me to consider upgrading. It didn't make AMD's prices go down, and it took a while for prices of the 8800gtx to lower as well. It trumped the HD3870 for sure, just like the HD4850 is trumping the 9800gtx and even comes close to the gtx260, a much more expensive videocard. If the HD4850 launch isn't more impressive than the 8800gt launch, then it's at least equally impressive. Also, don't forget the 8800gt was a half paper launch; there was no large stock, which led to instant price gouging PAST MSRP.

Did you notice I own an Nvidia card? I'm in no way biased; I'm loving these price wars. I'm not due for a new videocard for at least a few more months, unfortunately, and if Nvidia releases something that can compete with ATI's bang-for-buck cards, then I might even stick with Nvidia.
 

Extelleron

Diamond Member
Dec 26, 2005
3,127
0
71
Originally posted by: keysplayr2003
Originally posted by: Extelleron
Originally posted by: keysplayr2003
Originally posted by: ChronoReverse
Worst-case scenario is 1/5 of that, of course. But I'd hope they'd get at least 37.5% efficiency (so that the 4870 would match the GTX280).

I was kind of hoping they might do a rework of their shaders. Seems like such a humongous waste of transistors. But, if this is what they can do for now, I guess they could have done a lot worse. 4xxx series looks to be very nicely done.

It might seem like that, but efficiency on R600-based GPUs can be very good in certain circumstances and the stream processors in R600/R700 are extremely efficient per die size/transistor count. AMD has packed an extra 480SP + 24 TMU + revamped ROPs in a die ~260mm^2 and with only ~290M more transistors than RV670.

So even if the shaders are inefficient (which they only are under certain circumstances) in how they perform per number of SPs, they are still incredibly efficient in terms of the amount of die space/transistors used to achieve a certain performance level. nVidia's SPs appear to take up much more space, as seen by the huge transistor count of GT200 compared to G80/G92.

No. They are extremely inefficient in most circumstances, if not all circumstances. That is why they needed so many of them. 800 shaders should be a juggernaut Core. Un----touch----able. Hell, even half that (400) would be unstoppable. But that is not the case here. AMD did the best they could in a given amount of time. And with the time they had, their only option was to place an ocean of these same shaders onto a core. Die size and transistor count is only relevant from a cost perspective. I'm talking pure performance here Extelleron. I just want to make that clear. Performance per shader. And when all is said and done, it doesn't really matter as long as they get the job done.
Yes, Nv shaders are different from AMD/ATI's. Vastly different. It takes 1600 AMD/ATI shaders (2x4850's) to stay with a 240 shader GTX280. Vastly different.

The number of stream processors cannot be compared across two architectures. You cannot say that GT200 is more efficient than RV770 because RV770 requires "800SP" to equal the 240SP in GT200. For one, nVidia and AMD count their SPs differently. By the way nVidia counts them, RV770 has 160SPs and RV770XT is able to equal GT200 in shading performance. If you want to count nVidia's SPs the way AMD does, then GT200 has 720SPs. GT200 SPs are also run at a much higher clock than AMD's.

When ATI was designing R600 and R700, they went with this design for a reason. RV770's stream processors ARE more efficient than GT200's in the only way that matters, which is the number of transistors/SP. And if you count the SPs by the same method (RV770 = 160SPs @ 750MHz , GT200 = 240SPs @ 1.3GHz) RV770 is actually more efficient.

nVidia's architecture is great in terms of how much usage you get out of the available resources, but the problem is that the resources take up a lot of space. Personally I can't believe how much AMD was able to stuff into RV770 given only ~300M more transistors than RV670. And I can't believe how big the GT200 die is given it is not 2x G80 in any area. nVidia's stream processors clearly take up a lot of die space and this is not helped by the fact that nVidia has to have dedicated DP units.
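
The counting-convention point can be made concrete with a rough sketch. The clocks and flops-per-unit conventions below are the commonly cited period figures (MAD = 2 flops; GT200's extra MUL counted as a third), not numbers from this post:

```python
# Raw "shader counts" aren't comparable across architectures; peak throughput
# (units x flops/unit/clock x clock) is the like-for-like number.
def peak_tflops(units, flops_per_unit_clk, ghz):
    return units * flops_per_unit_clk * ghz / 1000.0

# AMD counts 800 ALUs, each issuing an FP32 MAD (2 flops) at 750 MHz (HD 4870)
rv770 = peak_tflops(800, 2, 0.750)
# nVidia counts 240 SPs, MAD + MUL (3 flops) at a ~1.296 GHz shader clock (GTX 280)
gt200 = peak_tflops(240, 3, 1.296)

print(round(rv770, 2), round(gt200, 2))  # 1.2 0.93
```

So despite a 240-vs-800 headline count, the peak numbers land in the same ballpark once each vendor's units and clocks are accounted for.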



 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
The only thing worse than a paper launch is a last-minute scramble paper launch, and that's exactly what this is.

I didn't get a chance to log yesterday, are you talking about the GTX+ or the 4870? Either way, they seem like announcements to me. Call them both panic launches if you will, guess we would have to disagree on terminology there.

Too bad the price didn't reflect its midrange status until after the 4850 launch

Think outside the box - imagine you aren't talking to some idiot loyalist ;) The 9800GTX was an absolutely horrible deal until the price drop due to nVidia's own parts; even if ATi never released a card this generation, the same would have held true. Now, nV made the price work as they were willing to allow their own parts to steal marketshare, but they won't so easily give it up to someone else. Besides that, they have the 260 and 280 for margins now.

You forgot to mention that SLI requires a craptastic NV mobo, doesn't always work, and has other issues such as broken vsync, no multi-monitor support, micro-stuttering, etc. If SLI is the magic answer when NV can't compete with a single gpu, then where does that leave the gtx280 which costs 2x as much as 8800gt SLI and is still slower?

I don't quite get the 'craptastic nv mobo' line, care to expand? What is wrong with their chipsets currently? Honestly interested here, as the only thing I am aware of is a shady IHV using crap parts on some mobos that used nV chipsets a few generations ago. SLI isn't any sort of magic answer whatsoever. While it clearly has fewer issues than CF still (particularly with scaling), neither of them is going to properly replicate a single-GPU solution. That said, we are discussing price/performance, and from that aspect it is hard to ignore the fact that the 4850 fails to deliver anything vaguely resembling the knockout die-hard ATi loyalists are trying to portray. Simply put, the 4850 is spanked on that aspect by a last-gen mid-tier part running in SLI. If we want to eliminate SLI as a viable option for the purpose of this discussion, that is fine.

Too bad the gtx280 still gets creamed by the 4850 when it comes to DP computation.

Maybe it's just poor information on my part, but to the best of my knowledge ATi gets 0 FLOPS on IEEE-compliant FP64 ops; they are using two FP32 units to approximate some of the operations, but that leads to issues with word length IIRC. Maybe it's just me, but that has been my understanding. If my understanding is correct it makes their FP64 performance for the high end market equal to the bottle cap I have sitting in front of me (just a regular 20oz Dew one, not even die shrunk ;) ).
 

Keysplayr

Elite Member
Jan 16, 2003
21,211
50
91
Originally posted by: Extelleron
Originally posted by: keysplayr2003
Originally posted by: Extelleron
Originally posted by: keysplayr2003
Originally posted by: ChronoReverse
Worst-case scenario is 1/5 of that, of course. But I'd hope they'd get at least 37.5% efficiency (so that the 4870 would match the GTX280).
I was kind of hoping they might do a rework of their shaders. Seems like such a humongous waste of transistors. But, if this is what they can do for now, I guess they could have done a lot worse. 4xxx series looks to be very nicely done.

It might seem like that, but efficiency on R600-based GPUs can be very good in certain circumstances and the stream processors in R600/R700 are extremely efficient per die size/transistor count. AMD has packed an extra 480SP + 24 TMU + revamped ROPs in a die ~260mm^2 and with only ~290M more transistors than RV670.

So even if the shaders are inefficient (which they only are under certain circumstances) in how they perform per number of SPs, they are still incredibly efficient in terms of the amount of die space/transistors used to achieve a certain performance level. nVidia's SPs appear to take up much more space, as seen by the huge transistor count of GT200 compared to G80/G92.

No. They are extremely inefficient in most circumstances, if not all circumstances. That is why they needed so many of them. 800 shaders should be a juggernaut Core. Un----touch----able. Hell, even half that (400) would be unstoppable. But that is not the case here. AMD did the best they could in a given amount of time. And with the time they had, their only option was to place an ocean of these same shaders onto a core. Die size and transistor count is only relevant from a cost perspective. I'm talking pure performance here Extelleron. I just want to make that clear. Performance per shader. And when all is said and done, it doesn't really matter as long as they get the job done.
Yes, Nv shaders are different from AMD/ATI's. Vastly different. It takes 1600 AMD/ATI shaders (2x4850's) to stay with a 240 shader GTX280. Vastly different.

The number of stream processors cannot be compared across two architectures. You cannot say that GT200 is more efficient than RV770 because RV770 requires "800SP" to equal the 240SP in GT200. For one, nVidia and AMD count their SPs differently. By the way nVidia counts them, RV770 has 160SPs and RV770XT is able to equal GT200 in shading performance. If you want to count nVidia's SPs the way AMD does, then GT200 has 720SPs. GT200 SPs are also run at a much higher clock than AMD's.
------------------------------
Keysplayr posted: You misunderstood me, or I explained it poorly. One of the two.

What I meant was exactly this:

If AMD had all their shaders equivalent to the "fat" or complex shader (which is what I was hoping for) it would be a force to reckon with. Maybe even a force that couldn't be reckoned with. The only way in which I was comparing Nv shaders to AMD shaders was not in number, but in type. Each Nvidia shader can handle any function thrown at it. All of them the same. AMD's 1st shader out of every 5 can do the same. The other 4 are limited. AMD would not require 800 shaders. Granted, AMD would probably end up with a bigger die size if they had say 256 complex shaders.

----------------------------
When ATI was designing R600 and R700, they went with this design for a reason. RV770's stream processors ARE more efficient than GT200's in the only way that matters, which is the number of transistors/SP. And if you count the SPs by the same method (RV770 = 160SPs @ 750MHz , GT200 = 240SPs @ 1.3GHz) RV770 is actually more efficient.
----------------------------
Keysplayr posted: In the only way that matters huh? Ok. I doubt many share that view.
I would not stand there and say, "The only thing that matters is that AMD's die size is smaller than Nvidia's die size." That is just nonsense. The end user cares more about the end result. The performance. AMD has done a great job offering the 4xxx at their price points. They perform very well. But they could perform so much better with a shader rework.
Your calculations are not quite right. It's not like RV770's other 640 simple shaders aren't used.
Just nowhere near as useful as the complex shader. They do what they can.

----------------------------
nVidia's architecture is great in terms of how much usage you get out of the available resources, but the problem is that the resources take up a lot of space. Personally I can't believe how much AMD was able to stuff into RV770 given only ~300M more transistors than RV670. And I can't believe how big the GT200 die is given it is not 2x G80 in any area. nVidia's stream processors clearly take up a lot of die space and this is not helped by the fact that nVidia has to have dedicated DP units.
--------------------------------

Keysplayr posted: I can't help but notice how focused you are on die size and transistor count. But then again, to you, that's all that matters. Why? I have no idea, unless you are allergic. If you are that worried about the GT200's, don't be. They will be shrunk eventually. Just as Nvidia has always done.
---------------------------------
 

Denithor

Diamond Member
Apr 11, 2004
6,298
23
81
I personally think that the best value in GPUs today is 2x9800GTX in SLI for $400. If you look at the multiGPU portion of the 4870 review you will see that this combo completely destroys the GTX 280 (for $250 less), beats the 9800GX2 (at $100 less), and rivals the 2x4870 in most benchmarks (at $200 less).
 

ChronoReverse

Platinum Member
Mar 4, 2004
2,562
31
91
Originally posted by: BenSkywalker
Too bad the gtx280 still gets creamed by the 4850 when it comes to DP computation.

Maybe it's just poor information on my part, but to the best of my knowledge ATi gets 0 FLOPS on IEEE-compliant FP64 ops; they are using two FP32 units to approximate some of the operations, but that leads to issues with word length IIRC. Maybe it's just me, but that has been my understanding. If my understanding is correct it makes their FP64 performance for the high end market equal to the bottle cap I have sitting in front of me (just a regular 20oz Dew one, not even die shrunk ;) ).

I've confirmed a bit on the Beyond3D forums and it seems the GTX280 also gets 0 GFLOPS for IEEE754-compliant DP.

So as long as we are comparing apples to apples, the GTX280 gets about 80GFLOPS peak while the 4870 gets 240GFLOPS peak.
 

Keysplayr

Elite Member
Jan 16, 2003
21,211
50
91
Originally posted by: ChronoReverse
Originally posted by: BenSkywalker
Too bad the gtx280 still gets creamed by the 4850 when it comes to DP computation.

Maybe it's just poor information on my part, but to the best of my knowledge ATi gets 0 FLOPS on IEEE-compliant FP64 ops; they are using two FP32 units to approximate some of the operations, but that leads to issues with word length IIRC. Maybe it's just me, but that has been my understanding. If my understanding is correct it makes their FP64 performance for the high end market equal to the bottle cap I have sitting in front of me (just a regular 20oz Dew one, not even die shrunk ;) ).

I've confirmed a bit on the Beyond3D forums and it seems the GTX280 also gets 0 GFLOPS for IEEE754-compliant DP.

So as long as we are comparing apples to apples, the GTX280 gets about 80GFLOPS peak while the 4870 gets 240GFLOPS peak.

Peak is great. But what are the average numbers for real? What are the chances a GTX280 will see 80GFLOPS? And the 4870 240GFLOPS? Peak is nice on paper. What do we get in real life?
 

ChronoReverse

Platinum Member
Mar 4, 2004
2,562
31
91
Depends on the workload of course.

For simpler smaller batches the GTX280 will get closer to peak while for larger complex loads, the 4870 will get closer to peak.


However, the 4870 would have to perform at only 33% efficiency to match the GTX280 at 100% efficiency.
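
As arithmetic, that break-even point is just the ratio of the two quoted peaks (figures as given earlier in the thread; a sketch, not measured throughput):

```python
# DP peaks quoted in the thread (GFLOPS)
gtx280_dp_peak = 80.0
hd4870_dp_peak = 240.0

# Efficiency the 4870 needs to hit to match a GTX 280 running at 100% of its peak
breakeven = gtx280_dp_peak / hd4870_dp_peak
print(f"{breakeven:.0%}")  # 33%
```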
 

Keysplayr

Elite Member
Jan 16, 2003
21,211
50
91
Originally posted by: ChronoReverse
Depends on the workload of course.

For simpler smaller batches the GTX280 will get closer to peak while for larger complex loads, the 4870 will get closer to peak.


However, the 4870 would have to perform at only 33% efficiency to match the GTX280 at 100% efficiency.

I think you pegged those numbers just now. Good call.
 

bryanW1995

Lifer
May 22, 2007
11,144
32
91
Originally posted by: MarcVenice
Originally posted by: bryanW1995
Originally posted by: MarcVenice
You're arguing with Chizow, it's useless, he won't budge, ever, no matter how damning the evidence you give him. The HD4850 might not be impressive from a performance-crown point of view, but it was never produced to be ATI's top performer, let alone beat Nvidia's top performer. The HD4850 is VERY impressive if you look at the performance you get for $200: the same and often better performance than the 9800gtx, which was selling for $300 no more than a week ago. It FORCED nvidia to drop prices considerably, on a videocard that's more expensive to produce than the HD4850. Knowing the HD4850 sometimes even comes close to Nvidia's $400 videocard, the GTX260, is even more impressive.

We will have to wait and see how impressive the HD4870 will actually be, and what it will do with prices of the gtx280 and gtx260. But only in a few months can we really determine a winner, when nvidia comes with 55nm GTX260's and GTX280's. But then again, ATI will have their HD4870X2 out by then. Exciting times, when financial results get released we will know for sure how good ATI is faring against Nvidia. The 8800gt launch was called impressive, but the HD4850 launch beats it fair and square.

I'm stoked about 4850 as much as the next gpu fan, but I think that it's an overstatement to say that it beats the 8800gt launch fair and square. 8800gt, iirc, was called "the only card that matters" by anand during his review. it completely obliterated everything else on the market other than 8800gtx. It beat amd's incredibly successful 3870 (compared to the 2900xt that it replaced/rendered useless) by a good 15-20%, and nothing else even came close. It was close enough to 8800gtx that people were selling their 8800gtx's to get 2 8800gt's in xfire, or just one 8800gt and a boatload of cash. It was just plain stupid.

Amd has kicked some serious butt here, don't get me wrong, but the nvidia of june 2008 isn't asleep at the wheel like amd was from nov 06 to nov 07. nvidia is going to lose this round, but they're going down swingin' at least and we will all benefit because of it.

The 8800gt was nice, but imo it was still a rehash of g80; it didn't beat my 8800gts 320mb by a big enough margin for me to consider upgrading. It didn't make AMD's prices go down, and it took a while for prices of the 8800gtx to lower as well. It trumped the HD3870 for sure, just like the HD4850 is trumping the 9800gtx and even comes close to the gtx260, a much more expensive videocard. If the HD4850 launch isn't more impressive than the 8800gt launch, then it's at least equally impressive. Also, don't forget the 8800gt was a half paper launch; there was no large stock, which led to instant price gouging PAST MSRP.

Did you notice I own an Nvidia card? I'm in no way biased; I'm loving these price wars. I'm not due for a new videocard for at least a few more months, unfortunately, and if Nvidia releases something that can compete with ATI's bang-for-buck cards, then I might even stick with Nvidia.

marc, I know that you're no fanboy. Neither am I; I just feel that 8800gt was at least equally impressive. 8800gt was priced so high not just because of limited availability, because within a couple months that situation sorted itself out. It was priced at $250+ for a LONG time simply because amd had absolutely nothing to compete with it and it was too close to 8800gtx performance. 48x0 is as important for amd imho as 8800gt was for nvidia; I just feel that it's hard to justify going to 4850 from 8800gt and, unfortunately for amd, the 8800gt pulled a LOT of people with old gen cards into the 8xxx series.

I'm interested to see if nvidia will continue to sell 8800gt at such low prices or if they'll try to extract more performance out of 9800gt instead at 55nm.
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
I've confirmed a bit on the Beyond3D forums and it seems the GTX280 also gets 0 GFLOPS for IEEE754-compliant DP.

So as long as we are comparing apples to apples, the GTX280 gets about 80GFLOPS peak while the 4870 gets 240GFLOPS peak.

100% IEEE compliant? That would eliminate everything besides x87 and SSE IIRC; a lot of general purpose processors can't hit 100% compliance. That is VERY different than trying to double up SP units to try and recreate DP accuracy. You will have to give up... (been a while) 2 decimal points of accuracy versus a DP unit. We aren't talking about some vague portion of the spec; that is a rather large issue.

Also, unless I'm mistaken looking at ATi's architecture they would need to tie up 4 SP units for one DP op, not two. That would drop their rather heavily inaccurate performance from 240 to 120.

ATi's Firestream is supposed to offer 200GFLOPS DP performance, that should be coming Q3-Q4 and is supposed to sell for around $1K.
 

bryanW1995

Lifer
May 22, 2007
11,144
32
91
I didn't get a chance to log yesterday, are you talking about the GTX+ or the 4870? Either way, they seem like announcements to me. Call them both panic launches if you will, guess we would have to disagree on terminology there.

4870 was a weak launch but at least it was available today as planned. The GTX+ won't be even weakly launched for nearly another month.

Think outside the box - imagine you aren't talking to some idiot loyalist. The 9800GTX was an absolutely horrible deal until the price drop due to nVidia's own parts; even if ATi never released a card this generation, the same would have held true. Now, nV made the price work as they were willing to allow their own parts to steal marketshare, but they won't so easily give it up to someone else. Besides that, they have the 260 and 280 for margins now.

I don't think anybody here is calling you an nvidia fanboy, I know that I'm not doing that. However, how is nvidia going to make $$ selling a $400 card with <= $300 card performance and a $650 card with ~ $400 performance? :confused:
 

bryanW1995

Lifer
May 22, 2007
11,144
32
91
Originally posted by: Denithor
I personally think that the best value in GPUs today is 2x9800GTX in SLI for $400. If you look at the multiGPU portion of the 4870 review you will see that this combo completely destroys the GTX 280 (for $250 less), beats the 9800GX2 (at $100 less), and rivals the 2x4870 in most benchmarks (at $200 less).

that's interesting that you didn't bring up crossfire 4850 for ~ $300. taking away that bestbuy deal, there is now a link in hot deals on 4850 for $149 AR shipped (asus) and $154 AR shipped (sapphire). Since 4850 is faster than 9800gtx and it can be had for at least $25 less at this point (recently saw 9800gtx for 179 AR), a crossfire setup is $50 cheaper. Which one is the better deal now?
 

Paratus

Lifer
Jun 4, 2004
17,604
15,765
146
The 4800's use the 4 thin SPs for 1 DP op & the 1 fat one for another.

This gives 160 × 2 × 750 MHz = 240 GFLOPS.

posted via Palm Life Drive
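
Spelling that arithmetic out (per-unit issue rates as described in the post above):

```python
vliw_groups = 160     # each group: 1 "fat" SP + 4 "thin" SPs
dp_per_clock = 2      # the fat SP does one DP op; the four thin SPs combine for another
clock_mhz = 750

dp_gflops = vliw_groups * dp_per_clock * clock_mhz / 1000
print(dp_gflops)  # 240.0
```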
 

ChronoReverse

Platinum Member
Mar 4, 2004
2,562
31
91
Originally posted by: BenSkywalker
100% IEEE compliant? That would eliminate everything besides x87 and SSE IIRC, a lot of general purpose processors can't hit 100% compliant. That is VERY different then trying to double up SP units to try and recreate DP accuracy. You will have to give up...(been a while) 2 decimal points accuracy versus a DP unit. We aren't talking about some vague portion of the spec, that is a rather large issue.
ATI has SPs (the "fat" ones) that can each do a DP op by themselves. The other SPs combine 4 at a time for the same kind of DP op.

Also, unless I'm mistaken looking at ATi's architecture they would need to tie up 4 SP units for one DP op, not two. That would drop their rather heavily inaccurate performance from 240 to 120.
I'm not even sure what your point is, especially since it's not even quite correct.

Where did you think I got the theoretical peaks from?

In any case, Paratus is correct.
 

Denithor

Diamond Member
Apr 11, 2004
6,298
23
81
Originally posted by: bryanW1995
Originally posted by: Denithor
I personally think that the best value in GPUs today is 2x9800GTX in SLI for $400. If you look at the multiGPU portion of the 4870 review you will see that this combo completely destroys the GTX 280 (for $250 less), beats the 9800GX2 (at $100 less), and rivals the 2x4870 in most benchmarks (at $200 less).

that's interesting that you didn't bring up crossfire 4850 for ~ $300. taking away that bestbuy deal, there is now a link in hot deals on 4850 for $149 AR shipped (asus) and $154 AR shipped (sapphire). Since 4850 is faster than 9800gtx and it can be had for at least $25 less at this point (recently saw 9800gtx for 179 AR), a crossfire setup is $50 cheaper. Which one is the better deal now?

The 9800GTX+ SLI (and yes, I'm aware that card isn't even out yet--but an OC'ed 9800GTX will do the same--cheaper) beats/matches the CF 4850 in most benchmarks (wins 4, loses 2, ties 1), most of the time convincingly. It even manages to beat/match the CF 4870 in half the benchies (wins 3, loses 3, ties 1). Now, this may just be SLI scaling better, but if you're looking at multiGPU options the value is pretty good (especially considering it essentially ties the 4870 CF for $200 lower cost).

Now, don't get me wrong here, I'm no nVidia fanboy. I have a 4850 sitting on the desk beside me (BB deal :D) waiting for an Accelero S1 before it replaces the 9600GT currently in my main box (9600GT going into HTPC). I'm not enough of a gamer to warrant spending $300-400 on a card and I'm not into multiGPU solutions personally. But I think there's a lot of value in a pair of $200 9800GTXs.
 

Extelleron

Diamond Member
Dec 26, 2005
3,127
0
71
Originally posted by: Denithor
Originally posted by: bryanW1995
Originally posted by: Denithor
I personally think that the best value in GPUs today is 2x9800GTX in SLI for $400. If you look at the multiGPU portion of the 4870 review you will see that this combo completely destroys the GTX 280 (for $250 less), beats the 9800GX2 (at $100 less), and rivals the 2x4870 in most benchmarks (at $200 less).

that's interesting that you didn't bring up crossfire 4850 for ~ $300. taking away that bestbuy deal, there is now a link in hot deals on 4850 for $149 AR shipped (asus) and $154 AR shipped (sapphire). Since 4850 is faster than 9800gtx and it can be had for at least $25 less at this point (recently saw 9800gtx for 179 AR), a crossfire setup is $50 cheaper. Which one is the better deal now?

The 9800GTX+ SLI (and yes, I'm aware that card isn't even out yet--but an OC'ed 9800GTX will do the same--cheaper) beats/matches the CF 4850 in most benchmarks (wins 4, loses 2, ties 1), most of the time convincingly. It even manages to beat/match the CF 4870 in half the benchies (wins 3, loses 3, ties 1). Now, this may just be SLI scaling better, but if you're looking at multiGPU options the value is pretty good (especially considering it essentially ties the 4870 CF for $200 lower cost).

Now, don't get me wrong here, I'm no nVidia fanboy. I have a 4850 sitting on the desk beside me (BB deal :D) waiting for an Accelero S1 before it replaces the 9600GT currently in my main box (9600GT going into HTPC). I'm not enough of a gamer to warrant spending $300-400 on a card and I'm not into multiGPU solutions personally. But I think there's a lot of value in a pair of $200 9800GTXs.

The problem with comparing SLI & Crossfire solutions is that you cannot consider both at the same time. With multi-GPU setups, you make your choice when you buy your motherboard and you must plan your video card purchases around which motherboard you buy.

nVidia SLI solutions are a great deal in many cases, but they are not an option if you don't have an nVidia motherboard. The vast majority of people right now have Intel setups with Intel motherboards that support either only a single card or, if they have 2 PCI-e x16 slots, AMD Crossfire. I'm pretty sure that the number of people with Crossfire-ready boards greatly exceeds the number of those with SLI-ready boards.

Crossfire scaling is indeed not as good as SLI's, even now, but it is getting better. With the 4870 X2 using a hardware connection through the Crossfire sideport on the RV770, hopefully it will not be as reliant on drivers for scaling as current multi-GPU implementations are.
 

Munky

Diamond Member
Feb 5, 2005
9,372
0
76
Originally posted by: keysplayr2003
"fat" as in "No more 1 complex shader and 4 simple shaders out of every 5? Is that one complex shader what you call "fat"?
If that were the case, we would be seeing a helluva lot more performance out of RV770. It would be almost staggering. So, methinks Beyond3d is barking up the wrong tree. AMD may have tweaked the shaders and the architecture a bit, but they are still VEC5 shaders if the performance in any indicator. If anything, the 48xx has 160 "fat" shaders and 640 annorexic shaders as opposed to R6xx 64 "fat" shaders and 256 annorexic shaders. IMHO.

The fat shader has the capability to do transcendental functions (like SIN, COS, ...), but the "anorexic" shaders are just as capable in all other functions, like ADD, MAD, and bit shift. All 5 ALUs in a single shader SIMD array are capable of executing independent instructions (they are superscalar), but those instructions have to come from the same thread. It's likely that there are cases where it's not possible to schedule 5 simultaneous independent instructions, but that's different than classifying the shaders as vec5 units. If you have a fragment program that works only with scalar data types, it doesn't mean only 1/5 of the units will be active; it actually depends on how much instruction-level parallelism the compiler can extract from the code.
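The packing described above can be sketched in Python. This is a hypothetical, greatly simplified model of how a VLIW compiler might fill 5-wide bundles (4 thin slots plus 1 fat slot for transcendentals) from purely scalar instructions; the instruction names and the greedy strategy are illustrative, not AMD's actual compiler:

```python
# Hypothetical greedy packer: fill bundles of 5 slots with independent
# scalar instructions; at most one transcendental ("fat") op per bundle.
# Each instruction is (name, dest, sources, needs_fat_slot).

def depends(a, b):
    """True if instruction b reads a's destination (RAW hazard)."""
    return a[1] in b[2]

def pack_bundles(prog, width=5, fat_slots=1):
    bundles = []
    remaining = list(prog)
    while remaining:
        bundle, used_fat = [], 0
        for ins in list(remaining):
            if len(bundle) >= width:
                break
            if ins[3] and used_fat >= fat_slots:
                continue  # the fat (transcendental) slot is already taken
            if any(depends(prev, ins) for prev in bundle):
                continue  # can't co-issue with something it reads from
            idx = remaining.index(ins)
            if any(depends(prev, ins) for prev in remaining[:idx]):
                continue  # can't hoist over an earlier dependency
            bundle.append(ins)
            used_fat += ins[3]
        for ins in bundle:
            remaining.remove(ins)
        bundles.append(bundle)
    return bundles

# A purely scalar fragment program: 4 independent MULs and one SIN.
prog = [
    ("mul", "r0", ("a", "b"), False),
    ("mul", "r1", ("c", "d"), False),
    ("mul", "r2", ("e", "f"), False),
    ("mul", "r3", ("g", "h"), False),
    ("sin", "r4", ("r0",), True),   # depends on r0, so it waits a bundle
]
bundles = pack_bundles(prog)
print(len(bundles))  # the 4 MULs co-issue; the SIN goes in a second bundle
```

This is the point about scalar data types: even with no vector operands, the four MULs land in one bundle because they are independent, while the SIN stalls only because of its data dependency, not because the hardware is "vec5."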
 

lopri

Elite Member
Jul 27, 2002
13,312
687
126
Since NV is shifting the G92 line to the 55nm process, will there be a 55nm version of the GX2 as well?
 

ViRGE

Elite Member, Moderator Emeritus
Oct 9, 1999
31,516
167
106
Originally posted by: munky
Originally posted by: keysplayr2003
"fat" as in "No more 1 complex shader and 4 simple shaders out of every 5? Is that one complex shader what you call "fat"?
If that were the case, we would be seeing a helluva lot more performance out of RV770. It would be almost staggering. So, methinks Beyond3d is barking up the wrong tree. AMD may have tweaked the shaders and the architecture a bit, but they are still VEC5 shaders if the performance in any indicator. If anything, the 48xx has 160 "fat" shaders and 640 annorexic shaders as opposed to R6xx 64 "fat" shaders and 256 annorexic shaders. IMHO.

The fat shader has the capability to do transcendental functions (like SIN, COS, ...), but the "anorexic" shaders are just as capable in all other functions, like ADD, MAD, and bit shift. All 5 ALUs in a single shader SIMD array are capable of executing independent instructions (they are superscalar), but those instructions have to come from the same thread. It's likely that there are cases where it's not possible to schedule 5 simultaneous independent instructions, but that's different than classifying the shaders as vec5 units. If you have a fragment program that works only with scalar data types, it doesn't mean only 1/5 of the units will be active; it actually depends on how much instruction-level parallelism the compiler can extract from the code.
It sounds a heck of a lot like the Itanium situation playing out all over again. The hardware needs a great compiler to get full performance out of it. I hope AMD knows what they're doing, because that didn't work out so well for Intel.
 

Cookie Monster

Diamond Member
May 7, 2005
5,161
32
86
Originally posted by: lopri
Since NV is shifting the G92 line to 55nm process, will there be 55nm version of GX2 as well?

GX2 has reached EOL.

@Virge

It's true what you've said. One of the drawbacks of RV670, and ultimately RV770, is that the compiler becomes one of the bottlenecks (plus it forces the driver team to optimize a lot more when the need arises, which is probably frequent, seeing as shader code varies a lot between games) because of the nature of AMD/ATI's architecture. nVIDIA, on the other hand, took the easiest way out of this problem by using scalar ALUs. That being said, there are tradeoffs, e.g. double precision.
 

BFG10K

Lifer
Aug 14, 2000
22,709
3,003
126
It sounds a heck of a lot like the Itanium situation playing out all over again. The hardware needs a great compiler to get full performance out of it.
Yep, that's exactly what it is (VLIW). In general the hardware is less complex but the burden is shifted onto the compiler to extract good performance.

Fortunately for ATi, having 800 shaders is quite useful, and their compiler is probably quite good by now.
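A rough way to see how much the compiler matters here: effective throughput scales with the average number of the 5 slots the compiler manages to fill per unit. A small illustrative model in Python; only the 800-ALU count and 750 MHz clock come from the thread, and the utilization figures are made up purely for illustration:

```python
# Rough model of how VLIW packing efficiency gates the 800-ALU design's
# effective throughput. Peak assumes a MAD (2 flops) in every ALU every
# clock; the slot-utilization values below are illustrative, not measured.

alus = 800
clock_ghz = 0.750
flops_per_mad = 2

peak_gflops = alus * flops_per_mad * clock_ghz  # 1200 GFLOPS theoretical peak

for slots_filled in (5.0, 4.0, 3.0, 1.0):  # avg slots used per 5-wide unit
    effective = peak_gflops * slots_filled / 5
    print(f"{slots_filled}/5 slots -> {effective:.0f} GFLOPS")
```

So a compiler that averages 3 of 5 slots already forfeits 40% of the paper peak, which is exactly the Itanium-style risk raised above.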