MatLab for Data Crunching!!!!

enwar3

Golden Member
Jun 26, 2005
1,086
0
0
Just thought I would ask here as well.. forgot we had a programming forum.

So..... I have two HUGE excel files with rows of numbers and text. One is complete and one is incomplete, and I have to find the errors in the incomplete file and fill them in with corresponding data from the complete file. Now, I'm totally new to programming, and I'm slowly trudging my way through MatLab. I'm told MatLab can perform this type of repetitive data crunching. And I would much rather spend my time learning MatLab, which could be helpful in the future, than going through and crunching data by hand.

Anyways, my question was whether or not MatLab would be the best program for this. One problem I've already run into is that MatLab isn't all that flexible with text, and my files are composed of both numbers and text. Any other methods you would like to propose?
 

nod218

Member
Nov 18, 2003
165
0
0
Matlab might not be a good choice. There are 2 functions i can think off on the top of my head in matlab, csvread and importdata. Unfortunately csvread cannot be applied to text data, and importadata will not accomplish the job alone. It's better to use java.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,251
3,846
75
If both files contain the same rows, you don't even need anything besides Excel. Just put both sheets in the same workbook, write a formula "=IF([value_from_incomplete]="",[value_from_complete],[value_from_incomplete])", and fill it down an empty column of the incomplete sheet.

Even if the files contain different data, you might be able to do something replacing [value_from_complete] with a vlookup of the complete sheet.
 

enwar3

Golden Member
Jun 26, 2005
1,086
0
0
Ken_g6: If I am understanding you correctly, that means I'd have another column with just the missing values. The problem is, I need to be able to insert a row where I'm missing a value and put the value there. In other words, the column may go ...60, 61, 62, 64.... I will need to create a new row between 62 and 64 for 63.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,251
3,846
75
OK, maybe you could do something with the "complete" sheet, doing a vlookup on the incomplete one? Does the "complete" sheet contain all the necessary rows?

If not, you probably do need something like Perl. Do you have your choice of languages, or are you limited on what you can install?

Edit: The idea is to have one column (or a set of columns) that contain *all* the values. Then you can copy/paste-special that data into (well, probably over in this case) your incomplete data.
 

enwar3

Golden Member
Jun 26, 2005
1,086
0
0
I probably can use any language, but MatLab is installed on my computer. And I don't know any computer languages - at all. This is my first experience with computer programming language.

Here is the problem:
The incomplete spreadsheet has five columns, one of which holds an "id" number. However, the sequence of id numbers is incomplete; some of them are missing. The complete spreadsheet has only one column: the "id" column. The complete spreadsheet's column has the complete sequence of id numbers. What I need to do is transfer these to the incomplete spreadsheet. Now the reason I can't just copy and paste over the incomplete spreadsheet is because each row in the incomplete spreadsheet has information in the other four columns that corresponds to the first "id" column. So whenever I insert a new number into the "id" column, I need the other four columns to move down as well.

Did that make sense? If not, I'll try describing again. Reading back over that, I don't know if I make all that much sense.
 

Gibson486

Lifer
Aug 9, 2000
18,378
1
0
First....you have to change the excel file to a csv file (just change the extension to csv, but do not do this on the origonal b/c you will loose formatting). I have never seen matlab (or any other 3rd party program) read excel files and write to them without doing this.

Second.... While Matlab can do this, it is not the best program to do so. You are better off doing this in basic. There is a basic editor installed in excel, so that will make it much easier. Just tell it to replace a cell as opposed to a whole row like you were describing. Matlab is best for reading and wrting data in relation to graphs.
 

enwar3

Golden Member
Jun 26, 2005
1,086
0
0
Wait! Don't leave!

Are you saying that Excel can compare my two "id" columns and insert a missing id number (along with a brand spanking new column)? Cuz that is just what I need to do.
 

Gibson486

Lifer
Aug 9, 2000
18,378
1
0
Originally posted by: enwar3
Wait! Don't leave!

Are you saying that Excel can compare my two "id" columns and insert a missing id number (along with a brand spanking new column)? Cuz that is just what I need to do.

Yes. You just to have know how to program it. I am no VB expert, but the scriptng tool in excel is pretty good for its purpose.

Although, I would not compare the columns.....I would just compare the cells in the columns. Also, do not delete the cell if you do not have to. Just edit the cell.

your little project is not hard, but it is a very good exercise for people who do not code alot (like me).
 

enwar3

Golden Member
Jun 26, 2005
1,086
0
0
Yea... it better not be hard. I'm glad to hear it's a good exercise for those who don't code.. but how about those who have never coded before?

And yes, I meant compare the cells within a certain column. Then, upon finding an error, I would have to add a row.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,251
3,846
75
...Or, now that I know how your data is set up, you could use vlookup, as follows:

1. Put both sheets in the same workbook. I'll call the incomplete sheet "incomp". Make sure the incomplete sheet is sorted by ID.
2. At cell B1 of the ID sheet (or B2 if there is a header), insert the function "=vlookup(1,incomp!$A$1:incomp!$E$65000,2,FALSE)"
3. Tweaks to that function:
a. If you have a header row on incomp, change $A$1 to $A$2. If there are more header rows, increase the number.
b. If you have fewer than 65000 rows on incomp, you can lower that number to match the last row number.
c. If you don't want unsightly "#N/A"s appearing on rows that don't have matches on the incomp sheet, use the following more complicated function:
=IF(B1=vlookup(1,incomp!$A$1:incomp!$E$65000,1,FALSE),vlookup(1,incomp!$A$1:incomp!$E$65000,2,FALSE),"")
Of course, the same tweaks above apply.
4. Copy that function to cells in columns C, D, and E. Change the number before FALSE from 2 to 3, 4, and 5 respectively.
5. Copy those functions down the ID sheet to get all the values.
 

Cooler

Diamond Member
Mar 31, 2005
3,835
0
0
What type of data crunching? If its heavy vector Math I would suggest fortran.