
Selective importing from flat file to SQL table

Homerboy

Lifer
I have a very large flat file that needs to be imported to a SQL table nightly (2.5M+ rows).

Much of this data I don't need.

I'd like to selectively limit which rows are actually imported based on the value of the "TRANS_DATE" column. Basically I don't care about anything prior to 2005... so something like "...WHERE TRANS_DATE >= '2005-01-01'" I know would be my normal filter in a query, but doing this via flat file import in SSIS, I'm not sure how to apply that limitation.

The idea here is to both speed up the import (even if only a little) and speed up queries on the table (this will cut the table down to nearly half its size).


Thanks in advance.
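In plain T-SQL terms, the filter being asked for is just a date cutoff; a sketch, with the table name assumed since the OP never gives it:

```sql
-- Keep only 2005-and-later rows (dbo.Transactions is a hypothetical name)
SELECT *
FROM dbo.Transactions
WHERE TRANS_DATE >= '2005-01-01';
```

The question in the thread is where to hang that WHERE clause when the source is a flat file rather than a table.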
 
You can use the import data wizard in SQL Server Management Studio to set up the import from the flat file, and then somewhere in the advanced tab you'll see an option to edit the query. You can open the query and add a where clause that will do what you want.
 
I would import all the data into a staging table and then use SQL to move only the needed data over to your table. Truncate the staging table before each load.
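In T-SQL, the move step that suggestion describes looks roughly like this; table and column names here are assumptions, not the OP's actual schema:

```sql
-- After the flat file has been bulk-loaded into staging,
-- move only the rows that matter into the real table
INSERT INTO dbo.Transactions (TRANS_ID, TRANS_DATE, AMOUNT)
SELECT TRANS_ID, TRANS_DATE, AMOUNT
FROM dbo.Staging_Transactions
WHERE TRANS_DATE >= '2005-01-01';

-- Empty the staging table so it's ready for the next nightly load
TRUNCATE TABLE dbo.Staging_Transactions;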
 
I would import all the data into a staging table and then use SQL to move only the needed data over to your table. Truncate the staging table before each load.

This was actually my initial train of thought as well. Dump the entire flat file into a temp table, then via a simple query move from temp to real table.

My only hesitation with it was that it would actually increase the import time somewhat, which was part of the reason (though admittedly not the main one) I wanted to limit the import.
 
Trying to come up with a scheme to avoid the staging table could eat up much more time if this is a one-time import.

With a staging table you also have a chance to sanity-check the import before adding it to your existing table, though that doesn't matter too much since you have a good backup scheme and will do a fresh backup right before the import. Right?
 
I guess I should have pointed out this is updated EVERY NIGHT.
I guess I shouldn't use the word "updated" either, as nightly the entire TABLE is truncated and then "rebuilt" from the flat file.

However, there are a couple of indexes on the final table. If I imported the entire flat file into the temp table (no indexes), then moved the data from temp to the final table with the filter applied, I'd think the overall time would be reduced, since the final table would only be indexing about half the number of rows once I filter the import...

Hmmm, decisions. That's what's so awesome about SQL (I am still learning it)... 100 ways to skin the same cat.
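If the temp-table route wins, there is one more lever worth knowing about, sketched here with hypothetical index and table names: since the final table is rebuilt nightly anyway, its nonclustered indexes can be disabled before the filtered insert and rebuilt once afterward, so the engine sorts each index a single time instead of maintaining it row by row:

```sql
-- Hypothetical names throughout. Only disable nonclustered indexes;
-- disabling the clustered index makes the table inaccessible until rebuilt.
ALTER INDEX IX_Transactions_TransDate ON dbo.Transactions DISABLE;

INSERT INTO dbo.Transactions (TRANS_ID, TRANS_DATE, AMOUNT)
SELECT TRANS_ID, TRANS_DATE, AMOUNT
FROM dbo.Staging_Transactions
WHERE TRANS_DATE >= '2005-01-01';

ALTER INDEX IX_Transactions_TransDate ON dbo.Transactions REBUILD;
```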
 
If you are using SQL Server 2008 you can look into using MERGE. Otherwise, I also think you'd be best off using a staging table.

UNLESS, you know for a fact the first 1M rows are all pre-2005, I believe on import you can tell it to skip a certain # of rows.
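For reference, a MERGE from staging would look something like the sketch below; every table, key, and column name here is an assumption. The appeal is that it syncs the final table in place instead of truncating and rebuilding it each night:

```sql
-- Sketch only: dbo.Transactions, dbo.Staging_Transactions, and the
-- TRANS_ID key are hypothetical names, not the OP's schema.
MERGE dbo.Transactions AS tgt
USING (
    SELECT TRANS_ID, TRANS_DATE, AMOUNT
    FROM dbo.Staging_Transactions
    WHERE TRANS_DATE >= '2005-01-01'
) AS src
    ON tgt.TRANS_ID = src.TRANS_ID
WHEN MATCHED THEN
    UPDATE SET tgt.TRANS_DATE = src.TRANS_DATE,
               tgt.AMOUNT = src.AMOUNT
WHEN NOT MATCHED BY TARGET THEN
    INSERT (TRANS_ID, TRANS_DATE, AMOUNT)
    VALUES (src.TRANS_ID, src.TRANS_DATE, src.AMOUNT)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;
```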
 
You can define "skip X rows," but I just won't know how many rows to skip, and technically speaking it will be dynamic too. I think, when it comes down to it, I will be doing something like "...WHERE DATEDIFF(YEAR, GETDATE(), TRANS_DATE) >= -5" to capture only the last 5 years' worth of data.

I will look into the MERGE statement.
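One caveat on that WHERE clause, offered as a general note rather than a required change: wrapping the column in DATEDIFF prevents SQL Server from seeking on an index over TRANS_DATE. A rolling five-year filter that leaves the column bare (table name assumed) would be:

```sql
-- Rows from roughly the last 5 years; boundary semantics differ slightly
-- from DATEDIFF(YEAR, ...), which counts year-boundary crossings.
SELECT *
FROM dbo.Transactions
WHERE TRANS_DATE >= DATEADD(YEAR, -5, CAST(GETDATE() AS date));
```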
 
If you can find that particular column with a regex, you could use either grep (in Linux) or findstr (in Windows) to find rows that do not match "^(some_data)200[01234](optional data to help specify row)" or something. Or maybe search for "200[5-9]|201[0-9]" anywhere in the string.
 
Yeah, but I guess I'm missing how I would do that in an SSIS package itself. I know how I can do it if I am manually importing the flat file via SSMS or something. Maybe I can do it there, then copy the T-SQL statement over to SSIS or something...

Yeah I don't work with repeatable SSIS packages much, but I know that at the end of the manual import wizard you have the option to save an SSIS package, so maybe that's the answer.
 
Yeah I don't work with repeatable SSIS packages much, but I know that at the end of the manual import wizard you have the option to save an SSIS package, so maybe that's the answer.

Yeah. Within the SSIS packages there is an option to provide a specific SQL statement. I think if I bastardize all this together, it may work... on some level.

🙂
 
I haven't used SSIS/DTS heavily for years, but you can put SQL in there, most definitely. It lets you configure the SQL to pull from the source, and then you can make customizations to the transforms per column... you can do just about anything you need to do in there.
 
I would recommend writing a stored procedure to do the bulk insert from the CSV file; in the sp you would create the temp table to write the records into, query only the needed records into your final table, then dump all the staging data, etc. You could also rebuild any indexes you need in this sp. Here is a good article on using bulk insert for this task (http://blog.sqlauthority.com/2008/0...rt-load-comma-delimited-file-into-sql-server/).

You would then schedule this job in SQL Server Agent. You will need admin rights on the SQL Server in order to do the bulk insert and to schedule the job as well.
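A minimal sketch of that stored procedure, with the file path, delimiters, and all table and column names being assumptions:

```sql
CREATE PROCEDURE dbo.usp_NightlyTransactionLoad
AS
BEGIN
    SET NOCOUNT ON;

    -- Fresh staging area for each run
    TRUNCATE TABLE dbo.Staging_Transactions;

    -- Load the whole flat file; path and format are hypothetical
    BULK INSERT dbo.Staging_Transactions
    FROM 'C:\feeds\transactions.txt'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

    -- Rebuild the final table from only the rows that matter
    TRUNCATE TABLE dbo.Transactions;

    INSERT INTO dbo.Transactions (TRANS_ID, TRANS_DATE, AMOUNT)
    SELECT TRANS_ID, TRANS_DATE, AMOUNT
    FROM dbo.Staging_Transactions
    WHERE TRANS_DATE >= '2005-01-01';
END;
```

A nightly SQL Server Agent job step would then just run `EXEC dbo.usp_NightlyTransactionLoad;`.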
 
In SSIS, you could use a Conditional Split and have the data you want go into the SQL table and the rest into space.
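The Conditional Split condition is written in SSIS expression syntax rather than T-SQL; something along these lines (column name assumed) would route the wanted rows to the SQL destination, with the other output left unconnected so those rows are discarded:

```
TRANS_DATE >= (DT_DBDATE)"2005-01-01"
```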
 