"1) nvidia has been p1mping their version of HW T&L for what seems like forever now"
Just over a year, which is nearly forever in the computer industry.
"2) HW T&L has been given "high praises" by game developers
3) HW T&L OBVIOUSLY can make a difference, both visually and speed-wise, when implemented "properly"
Of course...
"4) No game developers have done anything with T&L to make me think T&L presently is anything more than a box you can check in drivers"
With a ~1000MHz CPU, perhaps not. You have pointed out to me several times that I need to upgrade my CPU, and I can in complete honesty say that as far as gaming goes, outside of benching, I have absolutely no reason to. The majority of the latest 3D games that list faster CPUs than I have as recommended also support hardware T&L, saving me a $200-$300 CPU upgrade as far as gaming is concerned. It does do me a lot of good, in games, right now. Perhaps if you only pick up an occasional title now and then you might not have noticed, but T&L support, while not pushing what it could, is alive and doing quite well for many.
"5) DX8 wIll support HW T&L, but will be geared toward the "programmable" kind, not the "hardwired" kind that the GeForce and Radeon have"
There seems to be much confusion over this, so here's a bit of an explanation.
There are mainly two things that "programmable" T&L does better than hardwired (well, that it does and hardwired doesn't). One is non-static vertices. This does NOT mean anything that moves, as so many seem to think; it means that the model itself is changing.
For example, think of a driving game where you dent a fender. With hardwired T&L, the vertices that are creating the dent, on that area of that model, need to be offloaded to the CPU. Now, that covers what, perhaps 3% of the on-screen vertices, and that is pushing it, and I don't know how good most people are, but I don't dent anywhere near 60 fenders per second.
The times when that is an advantage are very small at best, minuscule in most situations. For other types of effects, such as those shown off by ATi's vertex morphing, well, the GF and Radeon can both handle them already. I don't say this as speculation; anyone who owns either a GF/GF2 or Radeon can download the MS DX8 SDK and see for themselves (the dancing MS logo is fuggin cool).
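To put the fender example in perspective, here is a rough C++ sketch of the scale involved; the vertex counts and frame rate are numbers I made up purely for illustration, not anything from a real engine:

// Rough sketch: only the dented patch of the car is recomputed on the CPU,
// the rest of the model stays in a static buffer handled by the hardware T&L unit.
// All names and numbers here are hypothetical, just to show the scale of the fallback.
#include <cstdio>

int main()
{
    const int carVerts       = 4000;  // whole car model (assumed)
    const int dentPatchVerts = 120;   // vertices actually displaced by the dent (assumed)
    const int fps            = 60;

    // Per-second vertex work that lands back on the CPU vs. stays on the card
    const int cpuVertsPerSec = dentPatchVerts * fps;
    const int hwVertsPerSec  = (carVerts - dentPatchVerts) * fps;

    printf("CPU fallback: %d verts/s (%.1f%% of the model)\n",
           cpuVertsPerSec, 100.0 * dentPatchVerts / carVerts);
    printf("Hardware T&L: %d verts/s\n", hwVertsPerSec);
    return 0;
}

The exact numbers don't matter; the point is that the dynamic part is a sliver of the total geometry, which is why hardwired T&L loses so little in practice.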
The other is HOS (higher-order surfaces). This should be a rather important factor, but I see it mainly as aiding in saving bandwidth. Oversimplified: you can upload, say, a *fender* as a surface instead of the 200 polys that make it up, greatly reducing the amount of bandwidth used.
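To put numbers on that saving (the vertex and control-point sizes are my own assumptions, just to show the scale):

// Back-of-the-envelope: sending a curved surface as a patch vs. as raw triangles.
// Vertex and control-point sizes are assumptions for illustration only.
#include <cstdio>

int main()
{
    const int bytesPerVertex = 32;   // position + normal + texcoord, roughly (assumed)
    const int trisInFender   = 200;  // the "fender" from the example
    const int vertsIfRaw     = trisInFender * 3;  // worst case, no vertex sharing
    const int controlPoints  = 16;   // e.g. a single bicubic patch (assumed)

    printf("Raw triangles: ~%d bytes\n", vertsIfRaw * bytesPerVertex);
    printf("HOS patch:     ~%d bytes\n", controlPoints * bytesPerVertex);
    return 0;
}

A real fender would likely take more than one patch, but the ratio stays lopsided, which is exactly why HOS matters more for bandwidth than anything else.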
Now, programmable T&L isn't all good; it has several drawbacks. The first is that none of the implementations will be fully programmable, as that is simply too slow. Look at how much time has passed since the launch of the GF1, and it is still significantly faster at T&L than the latest and greatest general-purpose x86 CPUs. The more programmable and flexible you make a T&L unit, the slower it will be; it is a tradeoff. The upcoming T&L engines are *more* flexible than the current offerings, but they are far from the level of flexibility that many are imagining.
"6) nvidia is completely revamping their T&L with the NV20. It won't be anything like the GeForce series. In other words, they're completely abandoning their current T&L unit in favor of a programmable unit."
No, not at all. The new flexible engines will enable more features, but that does absolutely nothing to stop current implementations from being used to their fullest. This is where the API comes in (in this case, DX8). If you look at the various T&L boards that have been available throughout the years on the pro side, they have greatly varying levels of support, yet the oldest of them or the latest will work (drivers allowing) with pretty much any application it can handle (some require a minimum level of support). Any features that are not present won't be used, but all features that are there still get used. The driver and the API make sure all the operations are handled by the proper "unit" for each board.
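For the DX8 case specifically, this is roughly what that routing looks like from the application side. A minimal sketch, assuming the standard D3D8 caps bits, with device setup and error handling left out:

// Minimal sketch of how an app picks the right T&L path under DX8.
// Assumes the usual D3D8 caps bits; setup and error handling omitted.
#include <windows.h>
#include <d3d8.h>

DWORD ChooseVertexProcessing(IDirect3D8* d3d, UINT adapter)
{
    D3DCAPS8 caps;
    d3d->GetDeviceCaps(adapter, D3DDEVTYPE_HAL, &caps);

    if (caps.DevCaps & D3DDEVCAPS_HWTRANSFORMANDLIGHT)
    {
        // Hardware T&L is present -- fixed-function and/or vertex shader work
        // (caps.VertexShaderVersion says how programmable it is) runs on the card.
        return D3DCREATE_HARDWARE_VERTEXPROCESSING;
    }
    // No hardware T&L: the runtime transforms and lights on the CPU instead;
    // the application code above this point doesn't have to change.
    return D3DCREATE_SOFTWARE_VERTEXPROCESSING;
}

The point being, the app asks for capabilities and whatever the board can't do falls back to the runtime on the CPU; nothing about the current fixed-function units gets thrown away.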
"1) Today's cards are memory throughput-limited"
You've seen what HSR can do for the V5? Remember that the NV20 (at the very least) was designed from the ground up for HSR, will have roughly double the memory bandwidth *and* will have FSAA that uses only a fraction of the bandwidth of the current boards. Said another way, we may well see Q3 at 1600x1200, 32bit, UHQ, 4x FSAA in the 100FPS range.
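Some rough numbers on why that is plausible. The overdraw and HSR figures here are assumptions on my part, purely to show the direction things move in:

// Rough framebuffer-bandwidth budget for the "1600x1200 32bit 100FPS" claim.
// Overdraw and HSR cost figures are assumptions, purely for illustration.
#include <cstdio>

int main()
{
    const double pixels     = 1600.0 * 1200.0;
    const double bytesPerPx = 4.0;    // 32-bit colour writes (Z traffic ignored here)
    const double fps        = 100.0;
    const double overdraw   = 3.0;    // typical scene depth complexity (assumed)
    const double hsrFactor  = 1.3;    // overdraw left after hidden surface removal (assumed)

    const double brute = pixels * bytesPerPx * overdraw  * fps / (1024.0 * 1024.0 * 1024.0);
    const double hsr   = pixels * bytesPerPx * hsrFactor * fps / (1024.0 * 1024.0 * 1024.0);

    printf("Brute force: ~%.1f GB/s of colour writes\n", brute);
    printf("With HSR:    ~%.1f GB/s\n", hsr);
    return 0;
}

Cut the pixels you never see and the bandwidth problem shrinks by more than half before you even touch faster memory.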
It is common practice for certain companies to heavily promote the fact that we are near our bandwidth limits, but the fact is that we are also near our monitor limits. Monitor technology advances nearly as slowly as battery technology. With the next gen, we should pretty much all be limited by our monitors and not by any sort of memory bandwidth (though it may be the generation after that). I know that bandwidth requirements are going up, but video card technology has been moving *much* faster. Remember what a treat GLQuake was at 640x480 30FPS? We are now closing in on 1600x1200 100FPS, beyond the refresh rates of all but the highest-end monitors at the highest resolutions most monitors support.
In summary, fillrate numbers are going to become increasingly less relevant, along with bandwidth in many instances. "Fillrate is king" is a line of thought promoted by two companies that have banked their future on tiling implementations; if tiling is no longer needed in a year, then they may well be in serious trouble.
"2) onboard HW T&L is VERY throughput-intensive"
Hell no. Using a whopping 266MB/s of bandwidth I can push ~5 million polys per second. Say we figure for 100FPS; that gives us 50,000 polys per frame, roughly a fivefold increase over Quake3. We already have quadruple that amount of bandwidth available over the AGP bus alone, without using any local bandwidth. That's 200,000 polys per frame, roughly fifteen times more than Quake3, before we need to start worrying too much about bandwidth.
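Spelling that same arithmetic out (the ~53 bytes per triangle simply falls out of the 266MB/s and ~5 million polys per second figures above):

// The bandwidth-per-polygon arithmetic from the paragraph above, spelled out.
// The ~53 bytes/poly figure is derived from the 266MB/s and ~5Mpoly/s numbers.
#include <cstdio>

int main()
{
    const double localBw      = 266e6;              // bytes/s quoted above
    const double polysPerS    = 5e6;                // what that pushes, per the paragraph
    const double bytesPerPoly = localBw / polysPerS; // ~53 bytes per triangle

    const double agpBw = 4.0 * localBw;             // AGP bus, roughly quadruple
    const double fps   = 100.0;

    printf("~%.0f bytes per poly\n", bytesPerPoly);
    printf("Local 266MB/s: %.0f polys/frame at %.0fFPS\n", polysPerS / fps, fps);
    printf("AGP bus alone: %.0f polys/frame at %.0fFPS\n", agpBw / bytesPerPoly / fps, fps);
    return 0;
}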
"3) T&L only shows a speed increase in low-end CPUs or low resolutions"
If the board is fill-limited at the higher resolutions then added T&L power won't do you any good, but how high-end do you need to go? Look at MDK2 with quality lighting enabled: the V5 is handily being bested at all resolutions, and it isn't what I would consider playable with the overwhelming majority of CPUs out today (I've seen it run on a PIII 900MHz, definitely not silky smooth even at 640x480 16bit). This game in particular is a great example: they had no excuse in the world not to bump up the poly counts and go for the whole deal, except saving time. They went through the trouble of adding the improved lighting, and it ran too slowly to be truly usable on a non-T&L board anyway; this could clearly have been a big title that came up short in that aspect (but it is still one he!! of a game).
"Now, obviously the "next generation" of games (i.e. DX8 games) are going to be far more complex than the games today. They're also going to have HW requirements that are MUCH higher than the games of today."
Expect a plethora of seriously kick @ss games that require, and run smoothly on, a ~733MHz CPU, 128MB RAM and a DX8-compliant board. This should remain the norm for system requirements for quite some time; I would say eighteen months on the short end, with twenty-four to thirty months being well within reason (no, I'm not joking at all).
"If T&L only shows a noticeable difference in today's games at low resolutions with low-end CPU's, then what is today's T&L going to do for us tomorrow, especially considering DX8 games will be geared toward programmable T&L units, which are nothing at all like todays' HW T&L units?"
MDK2 shows a bigger boost using T&L on a GHz system than it does on a 500MHz one, and so do Evolva and TD6. No developer has taxed current T&L units yet, and with DX8 this is in fact easier (yes, even with DX7 hardware). The programmable part I already covered above. Tasks are not going to get any simpler for CPUs; the more that can be offloaded the better, and here even current hardware T&L will still outperform any CPU we are likely to see for some time.
"I mean, we have several people who STEADFASTLY will NOT lower resolution to enable FSAA, a feature which helps improve the way things look, especially at lower resolution."
You lose detail dropping the resolution to enable FSAA. I don't think anyone who has looked into the subject with an objective and educated eye will try to refute that. There was a long discussion at B3D about this, and even the biggest FSAA proponents had to admit that that is the case. With increased geometry, on the other hand, you drop the res to *increase* detail.
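The sample counts make it clear; nothing assumed here beyond the resolutions themselves:

// Why dropping from 1600x1200 to 800x600 + 4x FSAA still costs detail:
// the samples get taken, but they are filtered down to far fewer output pixels.
#include <cstdio>

int main()
{
    const int hiResPixels = 1600 * 1200;        // distinct on-screen pixels
    const int loResPixels = 800 * 600;
    const int fsaaSamples = loResPixels * 4;    // 4x FSAA takes this many samples...

    printf("1600x1200       : %d output pixels\n", hiResPixels);
    printf("800x600 4x FSAA : %d samples, but only %d output pixels\n",
           fsaaSamples, loResPixels);
    return 0;
}

You do the same amount of sampling work either way, but in the FSAA case it gets averaged down to a quarter of the distinct pixels.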
"Is a card that is stressed at reasonable settings in today's games, like the Radeon SDR or the MX, going to be able to hang with tomorrows games JUST because it has "a T&L unit"?"
A few of the games that I play, I can play smoothly *because* I have a hardware T&L unit; without it I wouldn't be able to. That is now. None of these games, however, come close to taxing that T&L unit. It still has plenty of headroom before it starts to become an issue; fillrate and memory bandwidth (for fillrate, not T&L) are the limiting factors. With intelligent designs for games, you can come up with significantly lower bandwidth requirements by using flat-shaded textures for most things and letting the lighting take care of the rest (for most things, and this certainly isn't a joke, look at Toy Story).
"Soooo.... what good does today's HW T&L units do for us tomorrow?
Anything? Take up space on our card? Give us a box to put a checkmark in? Give us bragging rights?"
Have you seen Unreal2 running on a GF2? Fuggin incredible, and running at 30FPS before *any* optimisations (faster than the original Unreal runs on my Athlon 550 and GF DDR without patches, by a decent margin too). Yes, current T&L most certainly will do you good. How much depends on how frequently you upgrade. If you are going to pick up a GF2 and a V5 now and keep them for eighteen months, the GF2 will be significantly better than the V5 by the time of your next upgrade. If, however, you upgrade every six months, then it is likely that you also upgrade your CPU frequently and you will not miss it *as* much, not to mention you will have a T&L board by the time all games use the feature.
"Did "HW T&L" have ANYTHING to do with your current card purchase? Since no games out now show much of a difference, you obviously must've been preparing for tomorrow, right?"
In one game, I have a ~400% performance improvement using hardware T&L. That may be minor to you, but moving from ~30FPS to well over 100FPS is definitely noticeable by yours truly.
With that said, yes, hardware T&L was definitely an influence in my purchase, but not mainly for gaming. The fact that it has allowed me to use my CPU as long as I have without worrying about it being outdated is certainly a boost, and has done well to make it seem like a wise choice in my mind. I don't know how frequently people on these boards check out demos, but it is getting harder and harder to find games that *don't* support T&L now, and many don't even bother to state it (such as No One Lives Forever: hardware T&L is built into the core Lithtech 2 engine, but no mention is made, although with the settings I use a PIII 750MHz is recommended and the game still runs silky smooth).
"3Dfx seemed to make a big deal out of their "T-buffer" and "Motion Blur" technology with the Voodoo 5, yet, I don't know of one title that is going to use it. Correct me if I'm wrong."
Funny thing about motion blur: the GeForce boards all support it under DX8. Since it is no longer a proprietary feature, and the DX8 implementation is in fact superior, we may see some games start to use it. This isn't something I would look forward to, though; it isn't all that great.
OFF THREAD TOPIC-
Now, I don't wanna crap in one of Robo's threads (yeah... like that is ever going to stop me), but a few things happening around here are just plain stupid.
Beta drivers are beta drivers no matter who makes them. I have seen a ton of spewing about the horrible stability of nV's drivers, based nearly entirely on beta drivers. I also see 3dfx catching crap from the nV end over their "buggy" HSR support, with the same bashers of nV's beta drivers there to defend it. Hypocrisy is truly a great show of zealotry. Neither company should be given sh!t for their beta drivers not working properly; no company should.
The other point is the 180 that the 3dfx troops have taken on compatibility issues. I can name names and pull up quotes before people start saying "it wasn't me", but the same people that bashed nVidia for having problems with out-of-spec mobos are now backing 3dfx for having very similar problems (although on a technical basis the PIV is to spec, I still lay the blame entirely at Intel's feet). If you want to prove you're not a blind zombie/zealot/idiot, then act like it. Have a set of standards and stick to them. If it is nVidia's fault for underpowered AGP ports (which I don't think it is), then it is 3dfx's fault for building a board not compliant with known specs (which I don't think it is). If you want to bash a company because of problems with beta drivers, then don't defend another using the beta excuse.