Originally posted by: Rollo
I keep saying it's the 2002 feature set because, for the most part, it is.
Rollo, all current NV parts are more-or-less descendants of the Riva TNT2 cores too.
Do you see anyone running around bashing NV based on that line of commentary? No? Perhaps you can learn something from that, then. The line is getting old, and as new features get added, it starts to make you look a little more foolish each time you repeat it.
Originally posted by: Rollo
ATI will have a SM3 card out soon, and all the people duped into buying a high cost X800 as a long term investment in 2005 are going to gnash their teeth as the value of their cards plummet.
Likewise, so are many of the people who purchased current high-end NV4x parts, including those expensive SLI rigs - once they find out that their investment isn't actually all that high-performance when running a game that uses "real" SM3.0 code... well, they're not going to be very happy, I don't think.
But WTF am I doing? Here I am, feeding the troll and helping devolve this thread into an NV/ATI comparison, when in fact we should be focusing on whether or not SM3.0 has any real value, as a usable feature, on today's current-gen cards (which just happen to be NV cards, right now).
I also just wanted to posit some technical reasoning behind this all. First off, the disclaimer - I'm not intimately familiar with NV's NV4x pipeline structure, nor SM3.0 code in particular. But I am very familiar with CPU micro-architecture, on many platforms, so I'm going to generalize my understanding of that, and use it to generate a hypothesis here for discussion. If anyone has any more accurate/direct knowledge of NV4x pipeline architecture or SM3.0 coding, please step up.
I'm going to compare branching vs. non-branching, throughput-oriented, pipelined architectures here.
From a development/programming standpoint, offering the capability for branching can greatly simplify the work/algorithms needed, compared to coding for a dataflow architecture that lacks control-flow operations. So branching is a win in terms of developer ease and productivity here.
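To make that a little more concrete, here's a rough sketch in plain C (not real shader code - the function names and the 0.5 threshold are made up purely for illustration) of the same per-pixel decision written once with control flow, and once the "old" way, as straight arithmetic selection with no branch in it at all:

#include <stdio.h>

/* Hypothetical per-pixel decision: pick between a lit color and an ambient
   color based on a shadow term. Names and threshold are made up. */

/* SM3-style thinking: use a real branch, so only one path executes. */
static float shade_branching(float shadow, float lit, float ambient)
{
    if (shadow > 0.5f)      /* dynamic branch in the pixel pipeline */
        return lit;
    return ambient;
}

/* SM2-style thinking: no control flow available, so turn the comparison into
   a 0/1 mask and blend both results arithmetically (a "select"). */
static float shade_branchless(float shadow, float lit, float ambient)
{
    float mask = (float)(shadow > 0.5f);           /* comparison result as 0 or 1 */
    return mask * lit + (1.0f - mask) * ambient;   /* both paths computed, one kept */
}

int main(void)
{
    /* quick sanity check: both versions give the same answer */
    printf("%f %f\n", shade_branching(0.7f, 1.0f, 0.2f),
                      shade_branchless(0.7f, 1.0f, 0.2f));
    return 0;
}

The branching version is obviously the nicer one to write and extend as the "paths" get more complicated - that's the developer-productivity win. The branchless version burns extra math computing both paths, but it never asks the pipeline to change control flow, which is exactly the trade-off I'm getting at next.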
But in terms of the actual low-level hardware implementation, it can be a nightmare. For longer-pipelined architectures, allowing branching/looping constructs creates the possibility of pipeline stalls/flushes. So if you, as the silicon designer, implement a chip with 8 pipelines rather than 4, that's double (or more) the chip real estate, requiring higher development/validation and mfg costs to make those chips.
Now, if those pipelines are "non-branching", you've just doubled the effective throughput of that chip compared to the prior-gen chip with only half the pipelines.
However, here's the crux of the issue - if you then allow those pipelines to implement control-flow operations, which can cause stalls/flushes - let's hypothetically say that, out of those 8 pipelines in operation, half of them at any one particular point in time may be experiencing stalls or whatnot. (Here's where some accurate low-level knowledge of their actual implementation would come in handy - experts?) Now the effective throughput of your new, more-expensive 8-pipeline chip is back down to the same sort of effective throughput as the cheaper, 4-pipeline chip that didn't implement a branching programming model. Uh-oh. Sure, developer productivity is up, but actual hardware performance is down. Way down. Back to a prior-generation level of performance, even. People will start to wonder why they even decided to spend twice the money to replace their older card with a newer one that promised this new feature, only to find out that it... doesn't do much for them, at least strictly speaking performance-wise. (It may allow the games that they want to play to get released onto the market sooner, but that doesn't help their actual frame rates any.)
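Just to put numbers on that hypothetical (and I stress that the 50% stall figure is purely my own guess for the sake of argument), the back-of-the-envelope math looks like this:

#include <stdio.h>

/* Purely hypothetical numbers: old 4-pipe chip, no branching, roughly fully
   busy, vs. new 8-pipe chip where (by my guess) half the pipes are stalled
   on control flow at any given moment. */
int main(void)
{
    double old_pipes = 4.0, old_util = 1.0;
    double new_pipes = 8.0, new_util = 0.5;

    printf("old effective throughput: %.1f pipes' worth of work\n", old_pipes * old_util);
    printf("new effective throughput: %.1f pipes' worth of work\n", new_pipes * new_util);
    return 0;   /* both come out to 4.0 - twice the silicon, same effective work */
}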
Now do people understand what (may) be at issue here?
Now, if NV is planning, on their next-gen parts, to implement something akin to Intel's HT support on their P4 CPUs (which effectively hides pipeline stalls by letting a secondary thread use the CPU's otherwise-idle functional units, keeping them maximally utilized), then that would be a step in the right direction. It would likely help "fix" the potential problem that might otherwise become apparent in branch-heavy code on chips that don't implement something like that.
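For what it's worth, here's a toy model of what I mean by that kind of stall-hiding - completely hypothetical, and not meant to describe NV's actual design, just the HT-style idea of a second thread soaking up the idle slots:

#include <stdio.h>

/* Toy model: one "pipeline" executing two independent threads over 8 cycles.
   When thread A stalls (branch, texture fetch, whatever), thread B's work is
   issued into the otherwise-idle slot instead of wasting the cycle. */
int main(void)
{
    /* 1 = that thread has work ready this cycle, 0 = it's stalled */
    int thread_a[8] = {1, 1, 0, 0, 1, 0, 1, 1};
    int thread_b[8] = {1, 0, 1, 1, 0, 1, 1, 0};
    int busy = 0;

    for (int cycle = 0; cycle < 8; cycle++) {
        /* if either thread has work, the pipeline does something useful */
        if (thread_a[cycle] || thread_b[cycle])
            busy++;
    }
    printf("utilization with two threads: %d/8 cycles busy\n", busy);
    /* prints 8/8 here; thread A alone would only manage 5/8 */
    return 0;
}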
But current-gen parts don't have that feature, do they?
So, if this is a real issue, in terms of developer support for maximum hardware performance on current-gen parts, it would actually be best to avoid using branching shader code - which is one of the only major differences between SM3.0 and the prior-gen 2.0 stuff. (More or less.)
Higher-level graphical-effects "features" like the much-touted "HDR" are not inherent to the SM2 or SM3 specs; they can be implemented in either. But due to the additional ease that a branching programming model offers, most devs will take the easy way out and code it that way. However, that of course will likely not result in the highest level of actual hardware performance on currently-released parts.
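As a for-instance (again, just a sketch, not anybody's actual shader): the core of a typical tone-mapping step in an "HDR" pipeline is nothing but straight-line math, so there's nothing about the effect itself that demands SM3.0-style control flow:

#include <stdio.h>

/* Hypothetical Reinhard-style tone-map: scale an HDR intensity by exposure
   and compress it into displayable [0, 1) range. Straight-line math, no
   control flow anywhere. */
static float tonemap(float hdr_value, float exposure)
{
    float scaled = hdr_value * exposure;
    return scaled / (1.0f + scaled);
}

int main(void)
{
    /* a bright sample and a dim sample, both compressed into [0, 1) */
    printf("%f %f\n", tonemap(8.0f, 0.5f), tonemap(0.25f, 0.5f));
    return 0;
}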
I hope that I've hit most of the highlights here, and hopefully this thread can get back on track... but I doubt it.
Originally posted by: Rollo
I think their engineers were so proud of the R300 (justifiably so) they bought all the pot and Pink Floyd records in Canada and moved to the Caymans.
LOL. Sounds like they might be partying with some of my cohorts from my game-programming days, after a developer release party.
Originally posted by: Rollo
They ran out of money for coconut shrimps and margaritas sometime late last year, so they stowed away on a fishing trawler, showed up back at the office with notes explaining their absence from Ziggy Marley, and announced the R520.
:roll:
LOL. Hey man, nothing beats Margaritaville.