Originally posted by: Dhaval00
It depends on how complex the query is. I just got burned on one of my projects last week for using table variables (the same construct that TVFs return) in my stored procedures. There will always be issues that only show up in production because the data just keeps growing, especially if the system is an archival DB. In any case, there are loads of articles out there that tell you to use TVFs, table variables, or CTEs (Common Table Expressions), but hardly any of them mention the practical upper limits of these constructs.
Going back to my example with table variables: my stored procedures get called from Integration Services, and until the DB hit around ~200 million rows, everything was fine. The queries then ran in about 10 minutes, and the work involved hitting 7-8 tables, most of them holding a couple million transactions each, plus the biggest table with the ~200 million rows. All of these have proper indexes (covering and non-covering). Data would be queried every day to generate user reports, and data would also be moved between the transactional DB and a trimmed-down version of a data warehouse. All of a sudden, these tasks, which used to take about 10 minutes, started taking 2-3 hours to finish [this happened overnight]. After loads of debugging and monitoring, I discovered that disk I/O was taking a substantial hit whenever I was working on the data in the table variables.
The logic was something along these lines: it wouldn't be feasible to loop through 200 million rows and the other large tables every night. So I used table variables and a staging DB, which also allowed for incremental updates, to query the tables incrementally. That way I would be dealing with only a few thousand transactions instead of millions. It turned out there is a 'threshold' in SQL Server beyond which the Query Optimizer just gives up on table variables - and when I say 'give up,' I mean the execution plan it produces is garbage. I was updating the table variable after it was populated, which forced SQL Server to start an ambient transaction. That in turn meant logging all of those updates to the log file, which gets huge when millions of records are involved - this is where SQL Server went into a suspended state under the wait type PAGEIOLATCH_SH; simply put, the hard disk couldn't keep up with what SQL Server wanted to do. If you don't know, PAGEIOLATCH_SH indicates resource contention in the disk subsystem. Overall, because our DB crossed this threshold, my query times jumped from a few minutes to a few hours overnight.
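To make the pattern concrete, here's a minimal sketch of the kind of table-variable logic I'm describing, plus the DMV query I used to spot the waits. All the table and column names (`@Batch`, `dbo.Transactions`, `@LastRunDate`, etc.) are made up for illustration - only `sys.dm_os_wait_stats` is a real system view:

```sql
-- Hypothetical names; only the shape of the logic matters here.
DECLARE @LastRunDate DATETIME = '20240101';

DECLARE @Batch TABLE
(
    TxnId     INT PRIMARY KEY,
    Amount    MONEY,
    Processed BIT
);

-- Pull only the incremental slice instead of scanning 200M rows.
INSERT INTO @Batch (TxnId, Amount, Processed)
SELECT TxnId, Amount, 0
FROM dbo.Transactions
WHERE ModifiedDate >= @LastRunDate;

-- This is the kind of post-population UPDATE that gets logged and,
-- past a certain row count, hammers the disk.
UPDATE @Batch SET Processed = 1 WHERE Amount > 0;

-- Check whether PAGEIOLATCH waits are dominating on the server.
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'PAGEIOLATCH%'
ORDER BY wait_time_ms DESC;
```

If `PAGEIOLATCH_SH` sits at the top of that wait-stats output while the job runs, the disk subsystem is the bottleneck, same as in my case.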
So the solution? Temporary tables! Even though some articles advertise the use of table variables and TVFs, they don't take these performance issues into account. In general, temporary tables can have indexes and statistics, which help the Query Optimizer build and cache good execution plans - you don't get that liberty with table variables and TVFs. So my advice is to look at the execution plans. If you have updates/inserts in your logic, you probably want a temporary table (you can create indexes on it, then drop the table at the end). It also depends on the overall size of your tables. And if your queries are complex, make sure you have proper indexes defined [assuming it is a transactional + reporting DB]. One place where you may get hosed is pinpointing the bottleneck in your queries; I use PRINT statements to output the current time at each step of my stored procedures, which lets me see roughly how long each query takes.
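Here's roughly what the temp-table version looks like, including the PRINT-based timing trick. Again, the object names are hypothetical sketches, not my actual schema:

```sql
-- Hypothetical names for illustration.
DECLARE @LastRunDate DATETIME = '20240101';

CREATE TABLE #Batch
(
    TxnId     INT PRIMARY KEY,   -- the PK gives you a clustered index
    Amount    MONEY,
    Processed BIT
);

-- Unlike table variables, you can add more indexes, and the table
-- gets real statistics for the optimizer to work with.
CREATE NONCLUSTERED INDEX IX_Batch_Amount ON #Batch (Amount);

PRINT 'Load start:   ' + CONVERT(VARCHAR(30), GETDATE(), 114);

INSERT INTO #Batch (TxnId, Amount, Processed)
SELECT TxnId, Amount, 0
FROM dbo.Transactions
WHERE ModifiedDate >= @LastRunDate;

PRINT 'Update start: ' + CONVERT(VARCHAR(30), GETDATE(), 114);

UPDATE #Batch SET Processed = 1 WHERE Amount > 0;

PRINT 'Done:         ' + CONVERT(VARCHAR(30), GETDATE(), 114);

DROP TABLE #Batch;  -- clean up explicitly at the end of the proc
```

The timestamps in the Messages tab tell you which step is eating the time, so you know exactly where to look in the execution plan.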
Hope this helps.
Edit: By the way, after switching to temporary tables, my run times have now gone down to 2.5 minutes. LOL.