
Selective importing from flat file to SQL table

Homerboy

Lifer
I have a very large flat file that needs to be imported to a SQL table nightly (2.5M+ rows).

Much of this data I don't need.

I'd like to selectively limit which rows are actually imported based on the value of the "TRANS_DATE" column. Basically I don't care about anything prior to 2005... so something like "...WHERE TRANS_DATE >= '2005-01-01'" I know would be my normal filter in a query, but doing this via flat file import in SSIS, I'm not sure how to apply that limitation.

The idea here is to both speed up the import (even if only a little) and speed up queries on the table (this will cut the table down to nearly half its size).


Thanks in advance.
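In plain T-SQL terms, the filter being asked for is just a date cutoff; a sketch, with the table name assumed since the OP never gives it:

```sql
-- Keep only 2005-and-later rows (dbo.Transactions is a hypothetical name)
SELECT *
FROM dbo.Transactions
WHERE TRANS_DATE >= '2005-01-01';
```

The question in the thread is where to hang that WHERE clause when the source is a flat file rather than a table.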
 
You can use the import data wizard in SQL Server Management Studio to set up the import from the flat file, and then somewhere in the advanced tab you'll see an option to edit the query. You can open the query and add a where clause that will do what you want.
 
I would import all the data into a staging table and then use SQL to move only the needed data over to your table. Truncate the staging table before each load.
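In T-SQL, the move step that suggestion describes looks roughly like this; table and column names here are assumptions, not the OP's actual schema:

```sql
-- After the flat file has been bulk-loaded into staging,
-- move only the rows that matter into the real table
INSERT INTO dbo.Transactions (TRANS_ID, TRANS_DATE, AMOUNT)
SELECT TRANS_ID, TRANS_DATE, AMOUNT
FROM dbo.Staging_Transactions
WHERE TRANS_DATE >= '2005-01-01';

-- Empty the staging table so it's ready for the next nightly load
TRUNCATE TABLE dbo.Staging_Transactions;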
 
I would import all the data into a staging table and then use SQL to move only the needed data over to your table. Truncate the staging table before each load.

This was actually my initial train of thought as well. Dump the entire flat file into a temp table, then via a simple query move from temp to real table.

My only hesitation with it was that it would actually increase the import time somewhat, which was part of the reason (though admittedly not the main one) I wanted to limit the import.
 
Trying to come up with a scheme to avoid the staging table could eat up much more time if this is a one-time import.

With a staging table you also have a chance to sanity-check the import before adding it to your existing table, though that doesn't matter too much since you have a good backup scheme and will do a fresh backup right before the import. Right?
 
I guess I should have pointed out this is updated EVERY NIGHT.
I guess I shouldn't use the word "updated" either, as nightly the entire TABLE is truncated and then "rebuilt" from the flat file.

However, there are a couple of indexes on the final table. If I imported the entire flat file into the temp table (no indexes), then moved the data from temp to the final table with the filter applied, I'd think the overall time would be reduced, since the final table would only be indexing about half the number of rows once I filter the import...

Hmmm, decisions. That's what's so awesome about SQL (I am still learning it)... 100 ways to skin the same cat.
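If the temp-table route wins, there is one more lever worth knowing about, sketched here with hypothetical index and table names: since the final table is rebuilt nightly anyway, its nonclustered indexes can be disabled before the filtered insert and rebuilt once afterward, so the engine sorts each index a single time instead of maintaining it row by row:

```sql
-- Hypothetical names throughout. Only disable nonclustered indexes;
-- disabling the clustered index makes the table inaccessible until rebuilt.
ALTER INDEX IX_Transactions_TransDate ON dbo.Transactions DISABLE;

INSERT INTO dbo.Transactions (TRANS_ID, TRANS_DATE, AMOUNT)
SELECT TRANS_ID, TRANS_DATE, AMOUNT
FROM dbo.Staging_Transactions
WHERE TRANS_DATE >= '2005-01-01';

ALTER INDEX IX_Transactions_TransDate ON dbo.Transactions REBUILD;
```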
 
If you are using SQL Server 2008 you can look into using MERGE. Otherwise, I also think you'd be best off using a staging table.

UNLESS, you know for a fact the first 1M rows are all pre-2005, I believe on import you can tell it to skip a certain # of rows.
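For reference, a MERGE from staging would look something like the sketch below; every table, key, and column name here is an assumption. The appeal is that it syncs the final table in place instead of truncating and rebuilding it each night:

```sql
-- Sketch only: dbo.Transactions, dbo.Staging_Transactions, and the
-- TRANS_ID key are hypothetical names, not the OP's schema.
MERGE dbo.Transactions AS tgt
USING (
    SELECT TRANS_ID, TRANS_DATE, AMOUNT
    FROM dbo.Staging_Transactions
    WHERE TRANS_DATE >= '2005-01-01'
) AS src
    ON tgt.TRANS_ID = src.TRANS_ID
WHEN MATCHED THEN
    UPDATE SET tgt.TRANS_DATE = src.TRANS_DATE,
               tgt.AMOUNT = src.AMOUNT
WHEN NOT MATCHED BY TARGET THEN
    INSERT (TRANS_ID, TRANS_DATE, AMOUNT)
    VALUES (src.TRANS_ID, src.TRANS_DATE, src.AMOUNT)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;
```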
 
You can define "skip X rows," but I just won't know how many rows to skip, and technically speaking it will be dynamic too. I think, when it comes down to it, I will be doing something like "...WHERE DATEDIFF(YEAR, GETDATE(), TRANS_DATE) >= -5" to capture only the last 5 years' worth of data.

I will look into the MERGE statement.
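One caveat on that WHERE clause, offered as a general note rather than a required change: wrapping the column in DATEDIFF prevents SQL Server from seeking on an index over TRANS_DATE. A rolling five-year filter that leaves the column bare (table name assumed) would be:

```sql
-- Rows from roughly the last 5 years; boundary semantics differ slightly
-- from DATEDIFF(YEAR, ...), which counts year-boundary crossings.
SELECT *
FROM dbo.Transactions
WHERE TRANS_DATE >= DATEADD(YEAR, -5, CAST(GETDATE() AS date));
```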
 
If you can find that particular column with a regex, you could use either grep (in Linux) or findstr (in Windows) to find rows that do not match "^(some_data)200[01234](optional data to help specify row)" or something. Or maybe search for "200[5-9]|201[0-9]" anywhere in the string.
 
Yeah, but I guess I'm missing how I would do that in an SSIS package itself. I know how I can do it if I am manually importing the flat file via SSMS or something. Maybe I can do it there, then copy the T-SQL statement over to SSIS or something...

Yeah I don't work with repeatable SSIS packages much, but I know that at the end of the manual import wizard you have the option to save an SSIS package, so maybe that's the answer.
 
Yeah I don't work with repeatable SSIS packages much, but I know that at the end of the manual import wizard you have the option to save an SSIS package, so maybe that's the answer.

Yeah. Within the SSIS packages there is an option to provide a specific SQL statement. I think if I bastardize all this together, it may work... on some level.

🙂
 
I haven't used SSIS/DTS heavily for years, but you can put SQL in there, most definitely. It lets you configure the SQL to pull from the source, and then you can make customizations to the transforms per column... you can do just about anything you need to do in there.
 
I would recommend writing a stored procedure to do the bulk insert from the CSV file; in the sp you would create the temp table to write the records into, query only the needed records into your final table, then dump all the staging data, etc. You could also rebuild any indexes you need in this sp. Here is a good article on using bulk insert for this task (http://blog.sqlauthority.com/2008/0...rt-load-comma-delimited-file-into-sql-server/).

You would then schedule this job in SQL Server Agent. You will need admin rights on the SQL Server in order to do the bulk insert and to schedule the job as well.
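A minimal sketch of that stored procedure, with the file path, delimiters, and all table and column names being assumptions:

```sql
CREATE PROCEDURE dbo.usp_NightlyTransactionLoad
AS
BEGIN
    SET NOCOUNT ON;

    -- Fresh staging area for each run
    TRUNCATE TABLE dbo.Staging_Transactions;

    -- Load the whole flat file; path and format are hypothetical
    BULK INSERT dbo.Staging_Transactions
    FROM 'C:\feeds\transactions.txt'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

    -- Rebuild the final table from only the rows that matter
    TRUNCATE TABLE dbo.Transactions;

    INSERT INTO dbo.Transactions (TRANS_ID, TRANS_DATE, AMOUNT)
    SELECT TRANS_ID, TRANS_DATE, AMOUNT
    FROM dbo.Staging_Transactions
    WHERE TRANS_DATE >= '2005-01-01';
END;
```

A nightly SQL Server Agent job step would then just run `EXEC dbo.usp_NightlyTransactionLoad;`.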
 
In SSIS, you could use a Conditional Split and have the data you want go into the SQL table and the rest into space.
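The Conditional Split condition is written in SSIS expression syntax rather than T-SQL; something along these lines (column name assumed) would route the wanted rows to the SQL destination, with the other output left unconnected so those rows are discarded:

```
TRANS_DATE >= (DT_DBDATE)"2005-01-01"
```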
 