
SQL Server 2008 Linked Server Questions

brandonbull

Diamond Member
Which method would be the superior way to extract data?

Create several linked servers in SS to Postgres and pull data down via queries

or

Create extract files from Postgres and then load them into SS via SSIS
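For context, the linked-server option boils down to a one-time registration on the SQL Server side. A minimal sketch, assuming a generic ODBC DSN for Postgres (the server alias, DSN name, and credentials below are placeholders, not from this thread):

```sql
-- Register a Postgres linked server through the OLE DB provider for ODBC.
EXEC master.dbo.sp_addlinkedserver
    @server     = N'PGLINK',       -- local alias for the linked server
    @srvproduct = N'PostgreSQL',
    @provider   = N'MSDASQL',      -- OLE DB-over-ODBC provider
    @datasrc    = N'PostgresDSN';  -- ODBC system DSN pointing at the Postgres box

-- Map a SQL Server login to a Postgres account.
EXEC master.dbo.sp_addlinkedsrvlogin
    @rmtsrvname  = N'PGLINK',
    @useself     = 'FALSE',
    @rmtuser     = N'pg_reader',
    @rmtpassword = N'********';

-- Then pull rows; OPENQUERY sends the query text to Postgres as-is.
SELECT * FROM OPENQUERY(PGLINK, 'SELECT id, updated_at FROM orders');
```

A dedicated provider such as the one mentioned later in the thread would replace MSDASQL, but the registration pattern is the same.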
 
I'd only go with extract files if you were prohibited from linking the servers. Extract files have to be more work, and I'm lazy.
 
I think dumping files would be by far the easiest way to go, since it doesn't involve the server configuration at all. Linking servers just to pull data from one to the other seems like overkill to me. I would usually only do that when I have data that spans the servers and is needed for runtime queries, not as a data migration technique.

I don't know postgres, but if you can get the data out into a comma- or tab-delimited file, importing it into SQL Server is absurdly easy and fast.

Of course, now that I think about it, if you have a driver for Postgres then SS can connect directly to that database and pull rows from one table into another.
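The "absurdly easy" import is a single statement. A sketch, assuming a tab-delimited extract and a staging table whose columns already match the file layout (the path and table name are illustrative):

```sql
-- Load a tab-delimited extract file into a SQL Server staging table.
BULK INSERT dbo.StagingOrders
FROM 'D:\extracts\orders_20240101.tsv'
WITH (
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 1,      -- bump to 2 if the file has a header row
    TABLOCK,                  -- allows a minimally logged load where possible
    BATCHSIZE       = 50000   -- commit in chunks rather than one giant transaction
);
```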
 

We have a 3rd party data provider, PNGP, for Postgres.

The concept is to move data over from our Postgres production server to SQL Server as daily deltas. We have some stored procs that are called to query against the production dbs to pull records that have been updated within a certain datetime range.
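The delta pull described above can be sketched as a proc that pushes the datetime filter to the Postgres side. Table and column names here are illustrative, and `PGLINK` is a hypothetical linked-server alias; `EXEC ... AT` requires RPC OUT to be enabled on the linked server, but it avoids the OPENQUERY limitation that the query text must be a literal string:

```sql
-- Pull rows updated since the last run; the WHERE clause executes on Postgres,
-- so only the delta crosses the wire.
CREATE PROCEDURE dbo.PullOrderDeltas
    @since DATETIME
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.StagingOrders (id, customer_id, amount, updated_at)
    EXEC ('SELECT id, customer_id, amount, updated_at
           FROM orders
           WHERE updated_at >= ?', @since) AT PGLINK;
END;
```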
 
Depending on the complexity of your tasks, SSIS may or may not be overkill. The 2008 SSIS pipeline is extremely powerful and very efficient. It will allow you to clean up your data, do super-fast comparisons across databases (upserts), and organize your data (if need be). Of course, there is a BULK INSERT task which is highly configurable and which has been optimized for loading data into SQL Server.

Having said that, there are a few caveats you'll need to keep in mind when using SSIS - you have to be very good with the profiler to monitor the kind of queries SSIS is issuing to your backend databases. For example, you can reference a table directly or use a stored procedure within your Data Flows - either way, SSIS queries the DB twice... first for the metadata, then for the actual data! Without the profiler, you'd never know what was going on - either that, or your job would take three hours to run. Of course, I'm not saying I've experienced any of this 🙂.

If all you need to do is bring in data at regular intervals without having to worry about data consolidation, clean up, etc., just write a script or a .NET console application that bulk inserts data. In this case, do what Mark mentioned earlier - get data out in delimited files and use the BCP utility to bulk insert data.
 

The current process fires SQL Server stored procs at scheduled times. These procs pull the data over from the production db via the linked servers, and the data is then loaded in via merge statements or table rebuilds; some data types have to be transformed due to incompatible data types between Postgres and SQL Server.

I'm trying to gather support for having the production db generate the extracts and then have SSIS load the files into SQL Server. I know that bulk copy is fast but would it be faster than using openquery across a linked server?
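The merge step mentioned above could look something like this once the delta rows are sitting in a staging table (table and key names are illustrative; `MERGE` is available from SQL Server 2008 onward):

```sql
-- Apply staged delta rows to the target table: update changed rows,
-- insert new ones.
MERGE dbo.Orders AS tgt
USING dbo.StagingOrders AS src
    ON tgt.id = src.id
WHEN MATCHED AND src.updated_at > tgt.updated_at THEN
    UPDATE SET tgt.customer_id = src.customer_id,
               tgt.amount      = src.amount,
               tgt.updated_at  = src.updated_at
WHEN NOT MATCHED BY TARGET THEN
    INSERT (id, customer_id, amount, updated_at)
    VALUES (src.id, src.customer_id, src.amount, src.updated_at);
```

Any type transformations (e.g. Postgres types with no SQL Server equivalent) would go in the staging load rather than the MERGE itself.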
 
This is something you'll have to test. I have never been in a similar situation so cannot comment on it. Your requirements make it sound as if SSIS would be a pretty good solution, provided it satisfies your performance requirements. My superficial take on it is that the SSIS pipeline will be the last thing to choke - you'll have to worry about your HD, network bandwidth, and DSN limitations (such as whether large data sets are allowed across the wire). Also, I am not sure about the consequences of using openquery from within SSIS. If you must use openquery, I think it would be faster to get the data out in flat files, if possible, and then use SSIS for your data transformations, merge tasks, and bulk inserts.

One thing I don't know is if openquery will stream and load data at the same time. I know for a fact that SSIS will stream and load data in parallel - you could have [by default] up to five threads within a data flow streaming data and another five threads bulk loading data at the same time. Also, SSIS will allow you to control the batch size of each stream. It all depends on your design and your data dependencies.
 

I was speaking from the standpoint of not knowing right off the top of my head how to do a dump in pgsql.

Also, if this is going to be repeatable, I don't see how dumping files is any easier to automate.
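For reference, a dump from Postgres to a delimited file is a single statement (the table name and path are illustrative):

```sql
-- Server-side dump; runs with the permissions of the Postgres server process.
COPY orders TO '/tmp/orders.csv' WITH CSV HEADER;

-- Or client-side from psql, which writes to the machine running psql:
-- \copy orders TO 'orders.csv' WITH CSV HEADER
```

Either form is easy to put in a cron job, which speaks to the automation question above.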
 

I agree in retrospect. I would probably just set up a job in SSIS and pull the data directly.
 