Need Sorting Macro or Software For 2.5GB Database

BubbaBooBoo

Member
Jul 29, 2008
102
0
0
I have a 2.5 GB database which is now separated in 10 different txt files, but its tab delemetered so it will import into spreadsheet formats.

The important informatoin is all in one column. its a huge list of "things." A lot of these things are duplicated over and over. Imagene something like this

Chipmunk
Squirrel
Raccoon
Chipmunk
Raccoon
Chipmunk
Squirrel
Chipmunk
Squirrel

What I want to end up with is a single list with the numbers of each ocurence.

Chipmunk 4
Squirrel 3
Raccoon 2

Here are my problems.

Will excel or openoffice open a 2.5 GB file? Will it take two hours to do anytihng to it?

What do I use to distell down this incredebly long list into a list with a list of each item just once and the number of times that item has ocured?

Im really stummped so Id apreciate any help!
 

geoffry

Senior member
Sep 3, 2007
599
0
76
You can use the count function in excel to find out how many of each occurance are in the list.

Whether or not Excel will open a file that big I'm not sure.
 

BubbaBooBoo

Member
Jul 29, 2008
102
0
0
COUNT Counts the numbers in the list of arguments, ignoring text entries.
COUNTA Counts the non-empty values in the list of arguments.
COUNTIF Counts the number of cells in a range that meet a specified condition.

I dont think anny one of those is going to do the job, as I cant specefy the actual values. There have to be hundreds of thousands of different values, eah one with as many as a thousand ocurences. This is realy complecated!
 

geoffry

Senior member
Sep 3, 2007
599
0
76
Originally posted by: BubbaBooBoo
COUNT Counts the numbers in the list of arguments, ignoring text entries.
COUNTA Counts the non-empty values in the list of arguments.
COUNTIF Counts the number of cells in a range that meet a specified condition.

I dont think anny one of those is going to do the job, as I cant specefy the actual values. There have to be hundreds of thousands of different values, eah one with as many as a thousand ocurences. This is realy complecated!

Ah, I didn't realize there were so many repeating values.
 

sourceninja

Diamond Member
Mar 8, 2005
8,805
65
91
best bet, install mysql. Then import the spreadsheet into a table. Finally, a simple SQL query will give you what you want.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,836
4,815
75
This sounds like a good task for Perl, if the distinct "things" can fit in memory. In fact, here's some Perl code that appears to do exactly what you want.

Edit: I see how many "things" you estimated. In theory, Perl should be OK up to tens of millions of different values, within 1GB of RAM or so.
 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
A combination of sort and uniq can do what you want too although you'll have to find a copy of them for Windows.
 

KB

Diamond Member
Nov 8, 1999
5,406
389
126
ODBC actually has a text driver. You can treat the text files like a database and issue SQL statements against them. I would add an ODBC dsn to the file and then use MS Access to query the ODBC connection.
 

BubbaBooBoo

Member
Jul 29, 2008
102
0
0
Originally posted by: Ken g6
This sounds like a good task for Perl, if the distinct "things" can fit in memory. In fact, here's some Perl code that appears to do exactly what you want.

Edit: I see how many "things" you estimated. In theory, Perl should be OK up to tens of millions of different values, within 1GB of RAM or so.

Im gonna try that and see how it works out and let you know. I have 12 GB RAM so I should be okey. Thanks. As for the SQL I think Ill wait to see if the perl does the job first!
 

BubbaBooBoo

Member
Jul 29, 2008
102
0
0
OK, Im almost there. i have the file all ready, perl installed, but it doesnt seem to be working as i think i have some of the vareables wrong.

my $file = '/path/to/file.txt';

is where im writing the path to where the big huge file is that i want sorted out.

Is that right?

And where do I tell perl where to write the final file to?

I think thats where im going wrong. :(
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,836
4,815
75
OK, first, make sure perl.exe is in your path environment variable. Then make sure all the files, including the Perl program, are in one directory.

Now, I should make sure your file is set up for this program to read it. Have you split out just the column with the values you want to count into a separate file? If not, try this at a command prompt:

perl -ne "@list=split(/\t/);print $list[47],chr(10),chr(13);" yourbigfile.txt > onecolumnfile.txt

Where you replace the appropriate file names, and replace 47 with your column. (The first column is numbered 0!) If your column is the last one to the right, get rid of ",chr(10),chr(13)".

Next, yes, for this example, that line should look like:

my $file = 'onecolumnfile.txt';

Finally, run:

perl thatcode.pl > outputfile.txt