• We’re currently investigating an issue related to the forum theme and styling that is impacting page layout and visual formatting. The problem has been identified, and we are actively working on a resolution. There is no impact to user data or functionality, this is strictly a front-end display issue. We’ll post an update once the fix has been deployed. Thanks for your patience while we get this sorted.

Improve data extraction from live system to data warehouse

coder_t2

Member
Hi guys, I am working on an initiative at my company to improve our process of data extraction. Currently, we have a Oracle database that performs data extracts on a DB2 server on a nightly basis. The data gets imported into a Oracle server for other people to perform analysis on. The DB2 server is a live system used by our sites, so we don't access it during the day so their operations are not affected. The servers are pretty old, 5-6 years I believe, and we may be upgrading them soon. I work at our HQ, so we perform top-down analysis of all our sites.

Currently our issue is that we can't extract all the data we want because of limited time and processing power. So some data we only extract weekly, although we would prefer to have it daily. And other data we would like to have access to during the day, as it happens.

I know IO is probably the most limiting factor here, not processing power. But would there be a way to set up some sort of RAID or virtual server that would allow us to extract data from the DB2 servers during the day without the DB2 servers slowing down. I am thinking something like RAID1, where we only access the 2nd hard drive. But I don't know much about RAID, so not sure how that would work. And I don't have much information on the servers current setup atm, either. So nothing to compare to. Thanks ahead of time.
 
If disk is a limiting factor, you need to improve the IO on the DB2 server. If you want real time (or close) set up log shipping or the like and have the oracle box read the committed transaction log from the DB2 box. If the DB2 box was set up correctly, the log and data are on separate physical disks (or arrays), and both disks may not be having problems.
 
Realistically, when you load a machine more, it slows down. The best you can do is to slow it down very slightly, during a time in which nobody notices.

1. Make sure you know what your bottlenecks are. Disk? Network? CPU?
2. How big are your data sets? Size memories appropriately for them.
3. How much utilization do the transaction processing servers (the DB2's) observe during the day? Break in down into disk, CPU, and network. It probably comes in spikes, like most transaction processing does. If so, try extracting data during idle time. Don't ever bring utilization above, say, U' = 0.25. The ideal value of U' for your situation will vary depending on how much perceived load at the start of a utilization spike you think you can tolerate.
4. Without knowing what data you're looking for, and several other particulars, it's hard to know if this last suggesting is applicable or not. Here it is, you decide. 🙂 Not all data in a database is easy to access. E.g., If you're looking for a log of updates to a multi-column table, modify the DB2 insertion protocols to first perform the update, then add to another table a log entry of the update itself. This slows down each DB2 operation, but generates a really handy and compact log that you can snarf up nightly at high speed, instead of grabbing the entire (presumably much larger) main table and running a diff.
 
Can you set up another Oracle DB on separate hardware? Load the daily transactions logs from your production box and then do your extracts from that.
 
Back
Top