
Help Optimizing a SQL Query (MS SQL Server)

MGMorden

Diamond Member
Ok, I've got a T-SQL query that I'm looking to clean up. It works as-is, but it just seems like a bass-ackwards way to achieve what I'm after, and since it's referenced by another query that's running too slow, I'm hoping to speed it up.

My goal: I have two databases, YearCurrent and YearFuture. They contain a list of records that in MOST cases are the same. Whenever they're the same, I only care about the current record. If they're different, I need both. The records need to be aggregated into a single list, though, with an indication as to whether the record came from the current year or the future.

Right now I have the following:

SELECT *, 'C' AS C_OR_F
FROM YearCurrent.dbo.ExtractSummary
UNION
SELECT *, 'F' AS C_OR_F
FROM
(
    SELECT * FROM YearFuture.dbo.ExtractSummary
    EXCEPT
    SELECT * FROM YearCurrent.dbo.ExtractSummary
) AS FUTURES_ONLY

That's just requerying the same exact data too many times though.

Anybody have any idea on how to just merge the datasets once and insert the C_OR_F value depending on which database the record came from?

Alternatively I'd love if I could somehow specify a list of fields for comparison on the EXCEPT and UNION operators.
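For reference, EXCEPT and UNION compare exactly the columns each SELECT projects, so a field list can be specified simply by enumerating the columns instead of using `*` (the column names below are made up for illustration):

```sql
-- EXCEPT compares only the projected columns, so listing fields
-- explicitly restricts the comparison to just those fields.
SELECT KeyField, Field1, Field2
FROM YearFuture.dbo.ExtractSummary
EXCEPT
SELECT KeyField, Field1, Field2
FROM YearCurrent.dbo.ExtractSummary;
```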

Thanks.
 
What are the fields? How do you tell if there are duplicate future and current records?

The list of fields is pretty long - it's a summary view produced from joining several other tables.

In the version I have now, I select everything from the current database, then union that with a virtual table that selects everything from the future table EXCEPT the records in the current table. The SQL EXCEPT operator is what does the actual filtering.

Just seems wasteful, as it's doing a full query of the current table (which is 95,000+ records with about 40 fields per record) twice rather than once.
 
Definitely sounds inefficient. Do the records need to be exact duplicates? Can you filter based on a single field or on multiple fields? Can you post some sample current and future records?
 
Assuming you can't have duplicates within the same table, you can union the 2 tables and do something like this... this is written in Oracle, but I assume there is an equivalent way to do it in your database language.

select field1, field2, etc, decode(sum(year),1,'Current',10,'Future','Both')
from
(select field1, field2, etc, 1 year
from current_year
union all
select field1, field2, etc, 10 year
from future_year
)
group by field1, field2, etc

So basically, you're joining the two tables into one big dataset (so you get one full table scan on each table), then grouping to collapse the duplicates into one row. The numeric year field tells you which table each row came from: since you're summing it, any single row that groups alone ends up as either a 1 or a 10 (1 = current year, 10 = future year), and a sum of 11 means the row was in both tables. If the sum ever comes out as anything other than 1, 10, or 11, one table contained the row more than once - this whole thing assumes that situation can't occur. If it can, just stick a DISTINCT in each query you're unioning.

I think that should work, just uses a bit of trickery. Should end up just being a full table scan of each table, and the rest of the data manipulation happens in memory, so you aren't ever fetching the same row more than once.
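A rough T-SQL translation of the Oracle sketch above, using CASE in place of DECODE (same made-up field names). Mapping the "in both tables" sum of 11 to 'C' along with 1 gives exactly the C_OR_F flag the original post asked for, since identical rows should only count as current:

```sql
SELECT field1, field2,
       -- 1 = current only, 10 = future only, 11 = both (treated as current)
       CASE WHEN SUM(src) = 10 THEN 'F' ELSE 'C' END AS C_OR_F
FROM (
    SELECT field1, field2, 1 AS src
    FROM YearCurrent.dbo.ExtractSummary
    UNION ALL
    SELECT field1, field2, 10 AS src
    FROM YearFuture.dbo.ExtractSummary
) AS combined
GROUP BY field1, field2;
```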
 
Oh, and 1 other comment... even if you stick with what you have now, change the UNION to a UNION ALL. UNION filters the resulting dataset and removes duplicates. UNION ALL does not. In your case, you'll never have a duplicate since you are hard-coding a 'C' in the one table and an 'F' in the other. So it's doing all that filtering for nothing. And when you have large sets to join, that can be a significant amount of overhead.
(at least that's how it works in Oracle).
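It works the same way in T-SQL. A tiny illustration of the difference:

```sql
SELECT 1 AS n UNION     SELECT 1;   -- one row: UNION removes duplicates
SELECT 1 AS n UNION ALL SELECT 1;   -- two rows: UNION ALL skips the duplicate check
```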
 
Why don't you just open up Query Analyzer and see where your query and the tables/indexes are spending the most time? If a stored procedure can't do the work more easily, sometimes it's easier to code it on the client (PHP/.NET/whatever).

SQL Server's GUI is to die for - I've spent hours tweaking indexes and queries to eke out every last drop of wasted execution time. I'm no pro, but I certainly wouldn't have the skill to do it in my head.
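For what it's worth, the same timing and I/O numbers are available from a plain script as well, via standard T-SQL SET options (the query shown is just a stand-in for the one under test):

```sql
-- Report per-statement I/O and CPU cost in the Messages tab,
-- then compare logical reads before and after each tweak.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT * FROM YearCurrent.dbo.ExtractSummary;  -- the query under test

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```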
 
People who use SELECT * need to be shot. Having said that, the query optimizer should be automatically rewriting the EXCEPT subquery as an outer join. At minimum, that should be a UNION ALL, since you cannot have duplicates the way it is written. A UNION (without ALL) will sort and run through the result set again looking for duplicates, which doesn't need to be done here.

Honestly, the table needs to be changed to make that query run quickly. You need a single column that keeps a hash of the data in the rest of the row, automatically updated when the data changes. Then a quick hash compare (join) on that column will find you the changed rows. I assume the table doesn't have a last-modified datestamp or anything like that; if it does, the query should be changed to an outer join on that instead.
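A sketch of that idea in T-SQL, assuming a persisted computed column built with HASHBYTES over the fields that can change (KeyField/Field1/Field2 are made-up names; both databases would need the same column):

```sql
-- Add a persisted hash of the changeable fields
-- (repeat in both YearCurrent and YearFuture).
ALTER TABLE dbo.ExtractSummary ADD RowHash AS
    HASHBYTES('SHA1', CONCAT(Field1, '|', Field2)) PERSISTED;

-- Future-only rows: no current row with the same key AND the same hash.
SELECT f.*
FROM YearFuture.dbo.ExtractSummary AS f
LEFT JOIN YearCurrent.dbo.ExtractSummary AS c
    ON c.KeyField = f.KeyField AND c.RowHash = f.RowHash
WHERE c.KeyField IS NULL;
```

Note that CONCAT requires SQL Server 2012 or later; on older versions you'd build the input with explicit CASTs and `+`, and you'd want a delimiter that can't occur in the data.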

Anyway, here:

SELECT c.*, 'C' AS C_OR_F
FROM YearCurrent.dbo.ExtractSummary AS c
UNION ALL
SELECT *, 'F' AS C_OR_F
FROM
(
    SELECT f.*
    FROM YearFuture.dbo.ExtractSummary AS f
    LEFT OUTER JOIN YearCurrent.dbo.ExtractSummary AS c
        ON <every field that can change>
    WHERE c.<PrimaryKeyField> IS NULL
) AS FUTURES_ONLY

To help more, I need the data structure, key in the tables, and the fields that can change.
 