MS SQL Performance question - delete all rows except about 1% of them

WannaFly

Platinum Member
Jan 14, 2003
I have a job that runs every night that updates a table from a file that is FTP'd to me. The key thing is that all but about 2,000 rows get deleted (rows whose ID starts with a certain letter don't get deleted). I've noticed my transaction log fills up quickly because the job deletes ~500k rows every night. I know the TRUNCATE command uses less of the transaction log.

So my question is:
Would it be more efficient to copy the 2000 rows to a different table, truncate the table, then copy them back?

The job pretty much looks like this:
1. Get file
2. Delete from <table> where id not like 'p%'
3. Load from txt file

I think it might be better to do:
1. Get File
2. Insert into <TempTable> (select <cols> from <sourcetable> where id like 'p%')
3. Truncate <sourcetable>
4. Load from txt file
5. Insert into <sourcetable> (select <cols> from <TempTable> where id like 'p%')

It's pretty situational, but I'm just curious and might have to try it out myself. I know it's bad design and the 2,000 rows that don't get deleted should be in a different table, but it wasn't my decision.
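Roughly what steps 2-5 would look like in T-SQL. All of the table, column, and file names below are made up, and I haven't tried this against the real job yet:

-- 2. Copy the rows to keep into a staging table
INSERT INTO dbo.TempKeep (id, col1, col2)
SELECT id, col1, col2
FROM dbo.SourceTable
WHERE id LIKE 'p%';

-- 3. Truncate the big table (minimally logged; requires that no
--    foreign keys reference the table)
TRUNCATE TABLE dbo.SourceTable;

-- 4. Load the nightly text file (path and delimiters are placeholders)
BULK INSERT dbo.SourceTable
FROM 'C:\ftp\nightly_load.txt'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

-- 5. Put the preserved rows back
INSERT INTO dbo.SourceTable (id, col1, col2)
SELECT id, col1, col2
FROM dbo.TempKeep;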

Thanks for the input!
 

Crusty

Lifer
Sep 30, 2001
If all you are trying to do is reduce the number of entries in the transaction log, then go for it. Just remember that you won't have any record of WHICH rows got deleted.
 

Furor

Golden Member
Mar 31, 2001
What exactly is the problem: the performance of the job, or the transaction log filling up? Obviously, doing a truncate is going to help with the transaction log issue, but as far as performance goes, change your LIKE 'p%' to use some type of indexed integer or datetime field instead. If that isn't an option, maybe you could do WHERE LEFT(id, 1) = 'p' instead of LIKE.
 

KentState

Diamond Member
Oct 19, 2001
Originally posted by: Furor
What exactly is the problem: the performance of the job, or the transaction log filling up? Obviously, doing a truncate is going to help with the transaction log issue, but as far as performance goes, change your LIKE 'p%' to use some type of indexed integer or datetime field instead. If that isn't an option, maybe you could do WHERE LEFT(id, 1) = 'p' instead of LIKE.

If the column he is doing the LIKE against is indexed, it is far better to use LIKE than a substring. SQL Server can still use an index with a LIKE 'p%' clause, but not once the column is wrapped in a substring function.

My question is, what type of database is this? Do you need anything more than the SIMPLE recovery model? For example, with a data warehouse you do not need full transaction logging, since the data is not changed throughout the day.
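For example, assuming an index on id (the table name is just an example):

-- Index seek: the constant prefix 'p' gives the optimizer a range to seek on
SELECT id FROM dbo.SourceTable WHERE id LIKE 'p%';

-- Index scan at best: wrapping the indexed column in a function
-- hides the column from the optimizer
SELECT id FROM dbo.SourceTable WHERE LEFT(id, 1) = 'p';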
 

KB

Diamond Member
Nov 8, 1999
One way to reduce the transaction log usage is to change the recovery model within your copy job:
1. Set the database recovery model to bulk-logged.
2. Write your bulk changes, which will be minimally logged.
3. IMPORTANT: Set the database back to the full recovery model:

ALTER DATABASE <YourDatabase> SET RECOVERY FULL
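The whole wrapper might look something like this (the database name and backup path are placeholders, so adjust for your environment):

-- Switch to bulk-logged before the nightly delete/load
ALTER DATABASE YourDatabase SET RECOVERY BULK_LOGGED;

-- ... run the bulk work here (TRUNCATE, BULK INSERT, SELECT INTO, etc.) ...

-- Switch back to full recovery once the job finishes
ALTER DATABASE YourDatabase SET RECOVERY FULL;

-- Take a log backup so the log chain is usable for point-in-time recovery again
BACKUP LOG YourDatabase TO DISK = 'D:\Backups\YourDatabase_log.trn';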

Bulk-logged
The bulk-logged recovery model protects against failure and offers the best performance. In order to get better performance, the following operations are minimally logged and not fully recoverable: SELECT INTO, bulk-load operations, CREATE INDEX as well as text and image operations. Under the bulk-logged recovery model, a damaged data file can result in having to redo work manually based on the operations that are not fully logged. In addition, the bulk-logged recovery model allows the database to be recovered to the end of a transaction log backup only when the log backup contains bulk changes.

Benefits of bulk-logged
The primary benefit of using the bulk-logged recovery model is that it minimizes the space requirements for the transaction log, because the log does not have to record every individual row touched by a bulk operation. This is helpful if you have large systems where you import large sets of data or create large indexes. The transaction still gets committed and written to your data file, but the space requirements for the transaction log are not as great as with the full recovery model. From testing, you will see that inserting data still requires a lot of transaction log space, but the space needed to create indexes is greatly reduced.
 

WannaFly

Platinum Member
Jan 14, 2003
Thanks all for your input...
Furor/KentState: The problem is the transaction log filling up, and the job takes quite a while to run. The delete is on an indexed column, and the rest of the IDs start with numbers, so the index should be pretty efficient.

KB: That's an interesting idea; I didn't think you could set the recovery model on the fly. It might work, but my worry in doing so is that there are still other operations going on that would need the full recovery model.


 

KentState

Diamond Member
Oct 19, 2001
Originally posted by: WannaFly
Thanks all for your input...
Furor/KentState: The problem is the transaction log filling up, and the job takes quite a while to run. The delete is on an indexed column, and the rest of the IDs start with numbers, so the index should be pretty efficient.

KB: That's an interesting idea; I didn't think you could set the recovery model on the fly. It might work, but my worry in doing so is that there are still other operations going on that would need the full recovery model.

Recovery mode can be changed on the fly, and I would suggest SIMPLE unless you need to log every transaction.
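For example (the database name is a placeholder):

-- Check the current recovery model
SELECT DATABASEPROPERTYEX('YourDatabase', 'Recovery');

-- Switch to simple recovery; the log is truncated automatically at each checkpoint,
-- so it stays small, but you lose point-in-time recovery
ALTER DATABASE YourDatabase SET RECOVERY SIMPLE;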
 

Gunslinger08

Lifer
Nov 18, 2001
Other than switching the recovery model, I'd probably just use some SELECT INTOs and TRUNCATE TABLEs.
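Something along these lines; the names are made up, and it assumes the column lists of the two tables line up:

-- SELECT INTO creates the holding table on the fly and is minimally logged
-- under the simple or bulk-logged recovery models
SELECT *
INTO dbo.KeepRows
FROM dbo.SourceTable
WHERE id LIKE 'p%';

TRUNCATE TABLE dbo.SourceTable;

-- ... load the text file here, then put the kept rows back ...
INSERT INTO dbo.SourceTable
SELECT * FROM dbo.KeepRows;

DROP TABLE dbo.KeepRows;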
 

oog

Golden Member
Feb 14, 2002
Split the rows you want to preserve into a separate table and place a view on top of both tables to serve as a replacement for the original table.
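Something like this, with all of the names made up:

-- The ~2,000 preserved rows live in one table, the nightly load in another
CREATE VIEW dbo.SourceTable_v
AS
SELECT id, col1, col2 FROM dbo.PermanentRows
UNION ALL
SELECT id, col1, col2 FROM dbo.NightlyRows;

The nightly job would then just truncate and reload dbo.NightlyRows and never touch the preserved rows at all.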