Improve data extraction from live system to data warehouse

coder_t2

Member
Nov 6, 2009
92
0
0
Hi guys, I am working on an initiative at my company to improve our process of data extraction. Currently, we have a Oracle database that performs data extracts on a DB2 server on a nightly basis. The data gets imported into a Oracle server for other people to perform analysis on. The DB2 server is a live system used by our sites, so we don't access it during the day so their operations are not affected. The servers are pretty old, 5-6 years I believe, and we may be upgrading them soon. I work at our HQ, so we perform top-down analysis of all our sites.

Currently our issue is that we can't extract all the data we want because of limited time and processing power. So some data we only extract weekly, although we would prefer to have it daily. And other data we would like to have access to during the day, as it happens.

I know IO is probably the most limiting factor here, not processing power. But would there be a way to set up some sort of RAID or virtual server that would allow us to extract data from the DB2 servers during the day without the DB2 servers slowing down. I am thinking something like RAID1, where we only access the 2nd hard drive. But I don't know much about RAID, so not sure how that would work. And I don't have much information on the servers current setup atm, either. So nothing to compare to. Thanks ahead of time.
 

Evadman

Administrator Emeritus<br>Elite Member
Feb 18, 2001
30,990
5
81
If disk is a limiting factor, you need to improve the IO on the DB2 server. If you want real time (or close) set up log shipping or the like and have the oracle box read the committed transaction log from the DB2 box. If the DB2 box was set up correctly, the log and data are on separate physical disks (or arrays), and both disks may not be having problems.
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
Realistically, when you load a machine more, it slows down. The best you can do is to slow it down very slightly, during a time in which nobody notices.

1. Make sure you know what your bottlenecks are. Disk? Network? CPU?
2. How big are your data sets? Size memories appropriately for them.
3. How much utilization do the transaction processing servers (the DB2's) observe during the day? Break in down into disk, CPU, and network. It probably comes in spikes, like most transaction processing does. If so, try extracting data during idle time. Don't ever bring utilization above, say, U' = 0.25. The ideal value of U' for your situation will vary depending on how much perceived load at the start of a utilization spike you think you can tolerate.
4. Without knowing what data you're looking for, and several other particulars, it's hard to know if this last suggesting is applicable or not. Here it is, you decide. :) Not all data in a database is easy to access. E.g., If you're looking for a log of updates to a multi-column table, modify the DB2 insertion protocols to first perform the update, then add to another table a log entry of the update itself. This slows down each DB2 operation, but generates a really handy and compact log that you can snarf up nightly at high speed, instead of grabbing the entire (presumably much larger) main table and running a diff.
 

coder_t2

Member
Nov 6, 2009
92
0
0
Ok thanks. I'll need to some more research for now. But I have some things to think about now.
 

brandonbull

Diamond Member
May 3, 2005
6,365
1,223
126
Can you set up another Oracle DB on separate hardware? Load the daily transactions logs from your production box and then do your extracts from that.