Need Sorting Macro or Software For 2.5GB Database

I have a 2.5 GB database which is now split across 10 different txt files, but it's tab-delimited so it will import into spreadsheet formats.

The important information is all in one column. It's a huge list of "things." A lot of these things are duplicated over and over. Imagine something like this:

Chipmunk
Squirrel
Raccoon
Chipmunk
Raccoon
Chipmunk
Squirrel
Chipmunk
Squirrel

What I want to end up with is a single list with the number of occurrences of each.

Chipmunk 4
Squirrel 3
Raccoon 2

Here are my problems.

Will Excel or OpenOffice open a 2.5 GB file? Will it take two hours to do anything to it?

What do I use to distill this incredibly long list down into a list of each item just once, with the number of times that item has occurred?

I'm really stumped, so I'd appreciate any help!
 
You can use the count functions in Excel to find out how many occurrences of each there are in the list.

Whether or not Excel will open a file that big, I'm not sure.
 
COUNT Counts the numbers in the list of arguments, ignoring text entries.
COUNTA Counts the non-empty values in the list of arguments.
COUNTIF Counts the number of cells in a range that meet a specified condition.

I don't think any one of those is going to do the job, as I can't specify the actual values. There have to be hundreds of thousands of different values, each one with as many as a thousand occurrences. This is really complicated!
 
Originally posted by: BubbaBooBoo
COUNT Counts the numbers in the list of arguments, ignoring text entries.
COUNTA Counts the non-empty values in the list of arguments.
COUNTIF Counts the number of cells in a range that meet a specified condition.

I don't think any one of those is going to do the job, as I can't specify the actual values. There have to be hundreds of thousands of different values, each one with as many as a thousand occurrences. This is really complicated!

Ah, I didn't realize there were so many repeating values.
 
Best bet: install MySQL. Then import the data into a table. Finally, a simple SQL query will give you what you want.
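As a sketch of the idea, assuming you've already split out the one column you care about, and using made-up names (a table called things with a single column thing), the import and the query would look roughly like this:

```sql
-- Hypothetical table holding just the column of values:
CREATE TABLE things (thing VARCHAR(255));

-- Tab-delimited import; adjust the path, and repeat per file:
LOAD DATA INFILE '/path/to/file.txt'
INTO TABLE things
FIELDS TERMINATED BY '\t';

-- One row per distinct value with its count, most frequent first:
SELECT thing, COUNT(*) AS occurrences
FROM things
GROUP BY thing
ORDER BY occurrences DESC;
```

The GROUP BY does the deduplication and counting in one pass, which is exactly the "each item once, with its count" output you described.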
 
This sounds like a good task for Perl, if the distinct "things" can fit in memory. In fact, here's some Perl code that appears to do exactly what you want.

Edit: I see how many "things" you estimated. In theory, Perl should be OK up to tens of millions of different values, within 1GB of RAM or so.
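The code itself isn't reproduced here, but a minimal sketch of the hash-counting approach looks like this. The in-memory $sample stands in for the real one-value-per-line file (an assumption about your layout); for the 2.5 GB job you'd open the actual file instead.

```perl
use strict;
use warnings;

# Sample data matching the example in the first post. For the real job,
# replace these two lines with:
#   open my $fh, '<', 'onecolumnfile.txt' or die "Can't open: $!";
my $sample = "Chipmunk\nSquirrel\nRaccoon\nChipmunk\nRaccoon\n"
           . "Chipmunk\nSquirrel\nChipmunk\nSquirrel\n";
open my $fh, '<', \$sample or die "Can't open sample: $!";

my %count;
while (my $line = <$fh>) {
    chomp $line;               # strip the trailing newline
    next unless length $line;  # skip blank lines
    $count{$line}++;           # one hash entry per distinct value
}
close $fh;

# Most frequent first, ties broken alphabetically:
# Chipmunk 4 / Squirrel 3 / Raccoon 2
for my $thing (sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count) {
    print "$thing $count{$thing}\n";
}
```

The hash holds one entry per distinct value, so memory use scales with the number of different "things", not with the 2.5 GB of input.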
 
A combination of sort and uniq can do what you want too, although you'll have to find Windows builds of them.
 
ODBC actually has a text driver. You can treat the text files like a database and issue SQL statements against them. I would add an ODBC DSN for the file and then use MS Access to query the ODBC connection.
 
Originally posted by: Ken g6
This sounds like a good task for Perl, if the distinct "things" can fit in memory. In fact, here's some Perl code that appears to do exactly what you want.

Edit: I see how many "things" you estimated. In theory, Perl should be OK up to tens of millions of different values, within 1GB of RAM or so.

I'm gonna try that and see how it works out and let you know. I have 12 GB RAM so I should be OK. Thanks. As for the SQL, I think I'll wait to see if the Perl does the job first!
 
OK, I'm almost there. I have the file all ready and Perl installed, but it doesn't seem to be working, as I think I have some of the variables wrong.

my $file = '/path/to/file.txt';

is where I'm writing the path to the big huge file that I want sorted out.

Is that right?

And where do I tell Perl to write the final file?

I think that's where I'm going wrong. 🙁
 
OK, first, make sure perl.exe is in your PATH environment variable. Then make sure all the files, including the Perl program, are in one directory.

Now, make sure your file is set up for this program to read. Have you split out just the column with the values you want to count into a separate file? If not, try this at a command prompt:

perl -ne "@list=split(/\t/);print $list[47],chr(13),chr(10);" yourbigfile.txt > onecolumnfile.txt

Where you replace the appropriate file names, and replace 47 with your column. (The first column is numbered 0!) If your column is the last one to the right, get rid of ",chr(13),chr(10)", since the last field keeps the newline from the original line.

Next, yes, for this example, that line should look like:

my $file = 'onecolumnfile.txt';

Finally, run:

perl thatcode.pl > outputfile.txt
 