Need Sorting Macro or Software For 2.5GB Database

I have a 2.5 GB database which is now split across 10 different txt files, but it's tab-delimited so it will import into spreadsheet formats.

The important information is all in one column. It's a huge list of "things." A lot of these things are duplicated over and over. Imagine something like this:

Chipmunk
Squirrel
Raccoon
Chipmunk
Raccoon
Chipmunk
Squirrel
Chipmunk
Squirrel

What I want to end up with is a single list with the number of occurrences of each.

Chipmunk 4
Squirrel 3
Raccoon 2

Here are my problems.

Will Excel or OpenOffice open a 2.5 GB file? Will it take two hours to do anything to it?

What do I use to distill this incredibly long list down into a list of each item just once, with the number of times that item has occurred?

I'm really stumped, so I'd appreciate any help!
 
You can use the count functions in Excel to find out how many occurrences of each there are in the list.

Whether or not Excel will open a file that big, I'm not sure.
 
COUNT Counts the numbers in the list of arguments, ignoring text entries.
COUNTA Counts the non-empty values in the list of arguments.
COUNTIF Counts the number of cells in a range that meet a specified condition.

I don't think any one of those is going to do the job, as I can't specify the actual values. There have to be hundreds of thousands of different values, each one with as many as a thousand occurrences. This is really complicated!
 
Originally posted by: BubbaBooBoo
COUNT Counts the numbers in the list of arguments, ignoring text entries.
COUNTA Counts the non-empty values in the list of arguments.
COUNTIF Counts the number of cells in a range that meet a specified condition.

I don't think any one of those is going to do the job, as I can't specify the actual values. There have to be hundreds of thousands of different values, each one with as many as a thousand occurrences. This is really complicated!

Ah, I didn't realize there were so many repeating values.
 
Best bet: install MySQL. Then import the data into a table. Finally, a simple SQL query will give you what you want.
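As a sketch of the idea, assuming you've already split out the one column you care about, and using made-up names (a table called things with a single column thing), the import and the query would look roughly like this:

```sql
-- Hypothetical table holding just the column of values:
CREATE TABLE things (thing VARCHAR(255));

-- Tab-delimited import; adjust the path, and repeat per file:
LOAD DATA INFILE '/path/to/file.txt'
INTO TABLE things
FIELDS TERMINATED BY '\t';

-- One row per distinct value with its count, most frequent first:
SELECT thing, COUNT(*) AS occurrences
FROM things
GROUP BY thing
ORDER BY occurrences DESC;
```

The GROUP BY does the deduplication and counting in one pass, which is exactly the "each item once, with its count" output you described.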
 
This sounds like a good task for Perl, if the distinct "things" can fit in memory. In fact, here's some Perl code that appears to do exactly what you want.

Edit: I see how many "things" you estimated. In theory, Perl should be OK up to tens of millions of different values, within 1GB of RAM or so.
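The code itself isn't reproduced here, but a minimal sketch of the hash-counting approach looks like this. The in-memory $sample stands in for the real one-value-per-line file (an assumption about your layout); for the 2.5 GB job you'd open the actual file instead.

```perl
use strict;
use warnings;

# Sample data matching the example in the first post. For the real job,
# replace these two lines with:
#   open my $fh, '<', 'onecolumnfile.txt' or die "Can't open: $!";
my $sample = "Chipmunk\nSquirrel\nRaccoon\nChipmunk\nRaccoon\n"
           . "Chipmunk\nSquirrel\nChipmunk\nSquirrel\n";
open my $fh, '<', \$sample or die "Can't open sample: $!";

my %count;
while (my $line = <$fh>) {
    chomp $line;               # strip the trailing newline
    next unless length $line;  # skip blank lines
    $count{$line}++;           # one hash entry per distinct value
}
close $fh;

# Most frequent first, ties broken alphabetically:
# Chipmunk 4 / Squirrel 3 / Raccoon 2
for my $thing (sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count) {
    print "$thing $count{$thing}\n";
}
```

The hash holds one entry per distinct value, so memory use scales with the number of different "things", not with the 2.5 GB of input.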
 
A combination of sort and uniq can do what you want too, although you'll have to find Windows builds of them.
 
ODBC actually has a text driver. You can treat the text files like a database and issue SQL statements against them. I would add an ODBC DSN for the file and then use MS Access to query the ODBC connection.
 
Originally posted by: Ken g6
This sounds like a good task for Perl, if the distinct "things" can fit in memory. In fact, here's some Perl code that appears to do exactly what you want.

Edit: I see how many "things" you estimated. In theory, Perl should be OK up to tens of millions of different values, within 1GB of RAM or so.

I'm gonna try that and see how it works out and let you know. I have 12 GB RAM so I should be OK. Thanks. As for the SQL, I think I'll wait to see if the Perl does the job first!
 
OK, I'm almost there. I have the file all ready and Perl installed, but it doesn't seem to be working, as I think I have some of the variables wrong.

my $file = '/path/to/file.txt';

is where I'm writing the path to the big huge file that I want sorted out.

Is that right?

And where do I tell Perl to write the final file?

I think that's where I'm going wrong. 🙁
 
OK, first, make sure perl.exe is in your PATH environment variable. Then make sure all the files, including the Perl program, are in one directory.

Now, make sure your file is set up for this program to read. Have you split out just the column with the values you want to count into a separate file? If not, try this at a command prompt:

perl -ne "@list=split(/\t/);print $list[47],chr(13),chr(10);" yourbigfile.txt > onecolumnfile.txt

Where you replace the appropriate file names, and replace 47 with your column. (The first column is numbered 0!) If your column is the last one to the right, get rid of ",chr(13),chr(10)", since the last field keeps the newline from the original line.

Next, yes, for this example, that line should look like:

my $file = 'onecolumnfile.txt';

Finally, run:

perl thatcode.pl > outputfile.txt
 