Dealing with a lot of small files

wyvrn

Lifer
Feb 15, 2000
10,074
0
0
I have been given the (ludicrous) task of sorting through 574 text files, which come to 32MB of uncompressed text. This is roughly 7.5 times the size of the King James Bible, Old and New Testament.

Not only do I need to sort through them and categorize them; once they are categorized, I have to put them into a database and write queries to pull out specific information. For example, do an inner join on two tables and pull out the relevant fields from each table. Then I somehow have to follow up on the results (of which there could be thousands).

Is there a batch program that will import these into Access or Excel so that it doesn't take a week?

Yes, I only have Access and Excel to work with (please stop laughing). I am thinking of going to my manager and explaining that this is next to impossible. But first, I wanted to make sure I wasn't overlooking any possibilities.

Any suggestions?

(Yes, I know this is software. Please move if appropriate, but I'll never get an answer in the software forum.)
 

wyvrn

Lifer
Feb 15, 2000
10,074
0
0
Originally posted by: ColemontHD
What is in these files? This might help.


Quite a lot of things. They are text files. Mostly logs, but some other types as well.
 

corkyg

Elite Member | Peripherals
Super Moderator
Mar 4, 2000
27,370
240
106
I don't know of any easy solution. If the first sort were by date, you could break the pile into smaller piles by year, month, etc. Then attack the smaller piles and go by subject or whatever discriminates. If you can do 25 files an hour, you are looking at roughly a 3-4 day project.

It's really hard to estimate without knowing the general subject and content of the files, which is what you would need in order to develop keywords.
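
As a rough illustration of the "sort by date first" idea, here is a minimal Python sketch. It assumes the files sit in one folder and that each file's modification time is a fair stand-in for its date, which may not hold if the files were copied around. The function names are made up for this example. It also checks the time estimate: 574 files at 25 per hour is about 23 hours, or roughly three days if you assume eight-hour working days.

```python
from collections import defaultdict
from datetime import datetime
from pathlib import Path


def group_by_month(folder):
    """Map 'YYYY-MM' -> sorted list of file names.

    Assumption: the file's modification time reflects the date of its
    contents. If it does not, the date would have to be read from the
    file text instead.
    """
    piles = defaultdict(list)
    for path in Path(folder).iterdir():
        if path.is_file():
            stamp = datetime.fromtimestamp(path.stat().st_mtime)
            piles[stamp.strftime("%Y-%m")].append(path.name)
    # Return the piles in chronological order, names sorted within each pile.
    return {month: sorted(names) for month, names in sorted(piles.items())}


def hours_needed(file_count, files_per_hour=25):
    """Hours of manual sorting at a steady pace of files_per_hour."""
    return file_count / files_per_hour
```

Calling group_by_month on the folder gives one pile per month, and hours_needed(574) gives the manual-sorting estimate in hours.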
 

wyvrn

Lifer
Feb 15, 2000
10,074
0
0
OK, I'll give one subset as an example.

Logs of who used the root account in Unix on several servers, including which connection they came in from and when. Then I also have su and sudo logs showing where users switched to root after logging in to their own accounts and which directory they accessed first. There are hundreds of these files. I know how to run the queries in Access; what I am trying to reduce is the time it takes to import all these files.
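
One way to cut the import time, sketched here under assumptions, is to merge the files into a single CSV first so that Access only needs one import instead of hundreds. The Python below is a minimal sketch, not a finished tool: the function name, the "*.txt" pattern, and the three column names are invented for this example, and it assumes a scripting tool like Python is available at all. It keeps each line whole and does not try to parse the log format, since the thread does not show it.

```python
import csv
from pathlib import Path


def combine_logs(folder, out_csv, pattern="*.txt"):
    """Write every non-empty line of every matching file to one CSV.

    Columns (hypothetical names): source_file, line_no, text.
    Returns the number of data rows written.
    """
    rows = 0
    with open(out_csv, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["source_file", "line_no", "text"])
        for path in sorted(Path(folder).glob(pattern)):
            # errors="replace" keeps going if a log contains odd bytes.
            with open(path, encoding="utf-8", errors="replace") as src:
                for line_no, line in enumerate(src, start=1):
                    text = line.rstrip("\r\n")
                    if text:  # skip blank lines
                        writer.writerow([path.name, line_no, text])
                        rows += 1
    return rows
```

The source_file column keeps track of which server or log each row came from, so the categorizing can be done afterwards with queries on the single imported table.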
 

Boyo

Golden Member
Feb 23, 2006
1,406
0
0
Sounds like you have a lot of work on your hands, and I don't see any easy way of sorting this out. Using Excel to do this will take a lot of hours....