Faster matrix separation method in Matlab?

jmcoreymv

Diamond Member
Oct 9, 1999
4,264
0
0
I've got an Mx2 matrix where the first column contains a Group ID number (let's say 1-5) and the second column contains some data value.

I need to separate the data column into multiple vectors based on the Group ID number. Right now I'm just using a for loop that checks the Group ID number for each row and then puts the value into a new vector depending on that ID number.

The code works fine, but the for loop takes a long time to execute in Matlab. Is there a faster method to accomplish this?
 

f95toli

Golden Member
Nov 21, 2002
1,547
0
0
Which version of Matlab are you using? Any version newer than 6.1 (? Might be 6.5, don't remember) will use JIT compiling, meaning even for loops are reasonably fast and you do not actually gain much by removing them as long as the compiler can do its job efficiently.
In older versions you should avoid for loops and try to use matrix operations. Unfortunately I can't think of a good way to do that in this case since you need to check each number.

One way would of course be to write this in C and then call the C function from within Matlab.


 

f95toli

That's version 7.1, right? Then I don't think there is much you can do.
You must have a very long list if the code takes a long time to execute. There is a tool you can use to check how well the JIT compiler is working. I don't remember what it is called, but you can find it in one of the menus.
 

A5

Diamond Member
Jun 9, 2000
4,902
5
81
Originally posted by: jmcoreymv
I believe it's actually a later version than 7.1. The list is about 50,000 long.

Geez. Matlab is good with vectors, but even it will take a while with 50,000 entries. :p
 

jmcoreymv

Ha, guess I have no patience.
 

AeroEngy

Senior member
Mar 16, 2006
356
0
0
You could try something like the following. Basically, take the diff of the group ID in the first column, then find where it is non-zero. From this you can determine the start and end index of each group of data. Then just reassign that data to a new vector. You still have a for loop, but it only runs once per group ID rather than once per row of your matrix. I am not sure how much faster it will be; I only tested it on a small scale with the following code. I also coded it so that the data is assigned to a dataVector whose name has the group ID number appended. Hope it helps. P.S. There are probably even more efficient ways to do this.

% Create the matrix: 1st column = group ID, 2nd column = data
a2 = 0:14;
a1 = [1 1 1 2 2 2 3 3 3 3 3 4 4 4 4];
A = [a1' a2'];

% Take the diff of the ID column and find where it is non-zero
% (each non-zero entry marks the last row of a group)
diffA = diff(A(:,1));
indexDiff = find(diffA ~= 0);

% Loop over the groups to create a uniquely named data vector for each
startIndex = 1;
for i = 1:length(indexDiff)
    endIndex = indexDiff(i);
    command = sprintf('dataVector%d = A(startIndex:endIndex,2);', A(startIndex,1));
    eval(command)
    startIndex = indexDiff(i)+1;
end
% ... and one more time for the last group
endIndex = size(A,1);   % number of rows (length(A) returns the largest dimension)
command = sprintf('dataVector%d = A(startIndex:endIndex,2);', A(startIndex,1));
eval(command)
 
 

jmcoreymv

That seems like it might work, but all my data is interleaved so the group id would go 1,2,3,4,1,2,3,4,etc. I could use sortrows probably to order it and then your method, but that might take just as long. I'll try it when I get home. Thanks!
 

f95toli

50 000 rows isn't very much. I use matrices bigger than that in Matlab on a regular basis without any problem. As long as you do not try to create a square matrix of that size, even memory is not an issue. Matlab should be able to handle it without a problem.
 

jmcoreymv

It is very fast when doing computations on the whole matrix such as FFT, filtering, etc, but for separating with the for loop it takes ~2 minutes.
 

pcy

Senior member
Nov 20, 2005
260
0
0
Hi,

I don't speak MatLab, but I just tried it in APL. 100,000 lines took 0.06 secs on a 5 year old PC.


So you might like to try the APL algorithm

1. Sort the matrix by groupid
2. Compare each groupid with its predecessor to identify the start of each Group
3. Partition the values by the start points

No loop, no for statement.
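
The three steps above might translate to Matlab roughly as follows. This is only a sketch, not the APL code itself: the sample matrix A is made up, and the output names groups and groupIds are chosen just for illustration. It uses sort, diff and mat2cell, and collects the groups in a cell array instead of separately named vectors.

```matlab
% Made-up sample: column 1 = group ID (interleaved), column 2 = data value
A = [1 10; 2 20; 3 30; 1 11; 2 21; 3 31; 1 12];

% 1. Sort by group ID (sort is stable, so values keep their original
%    order within each group)
[ids, order] = sort(A(:,1));
vals = A(order, 2);

% 2. Compare each ID with its predecessor to mark the start of each group
isStart = [true; diff(ids) ~= 0];

% 3. Partition the values at the start points
startIdx = find(isStart);
counts   = diff([startIdx; numel(ids) + 1]);   % number of rows in each group
groups   = mat2cell(vals, counts, 1);          % one cell per group
groupIds = ids(isStart);                       % ID belonging to each cell
```

Here groups{k} holds the data values for the ID in groupIds(k), so no eval or per-row loop is needed.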



Peter
 

f95toli

Originally posted by: jmcoreymv
It is very fast when doing computations on the whole matrix such as FFT, filtering, etc, but for separating with the for loop it takes ~2 minutes.


I just tried it using Matlab 6.5. I just used a SWITCH block, 5 vectors and 50 000 rows took about 2.1 s on my (slow) PC.

2 minutes?



 

jmcoreymv

Here's the code I was using to do the separation that took so long:

j = 1;
k = 1;
data1i = zeros(100);
data1q = zeros(100);
% Separate the interleaved channel data
for i=1:length(chan)
    switch chan(i)
        case 0 % Channel 0 Data
            data0i(j) = datai(i);
            data0q(j) = dataq(i);
            j = j+1;
        case 1 % Channel 1 Data
            data1i(k) = datai(i);
            data1q(k) = dataq(i);
            k = k+1;
    end
end

I'll try the other methods and see if they work faster for me.
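
For reference, the same two-channel separation can be sketched with logical masks instead of a loop. This assumes chan, datai and dataq are equal-length vectors, as in the loop above; the sample values below are made up.

```matlab
% Made-up sample data standing in for the real interleaved capture
chan  = [0 1 0 1 0 1];
datai = [10 20 11 21 12 22];
dataq = [-10 -20 -11 -21 -12 -22];

% Logical mask that is true wherever the sample belongs to channel 0
isChan0 = (chan == 0);
data0i = datai(isChan0);
data0q = dataq(isChan0);

% Same again for channel 1
isChan1 = (chan == 1);
data1i = datai(isChan1);
data1q = dataq(isChan1);
```

Note that zeros(100) in the loop version creates a 100-by-100 matrix rather than a 100-element vector. With masks, no preallocation is needed at all.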
 

jmcoreymv

I think I figured out the problem. I guess it was the server that Matlab is running on. It's running much faster today! Sorry guys!
 

AeroEngy

It probably doesn't matter anymore, but if you know the number of channels (or whatever the groups are) and they are sequential,

you can use the find command, which returns the indices of each group, something like this.

% find where group ID is equal to 1 or whatever
index = find(a(:,1) == 1);
data1 = a(index,2);

This would create the data array for channel 1 in only two lines. You would want to put that in a loop over the group number somehow if there are lots of channels or an unknown number of channels.
However, if there are not that many channels and they stay consistent for each matrix, you could just do this:

index1 = find(a(:,1) == 1);
data1 = a(index1,2);
index2 = find(a(:,1) == 2);
data2 = a(index2,2);
index3 = find(a(:,1) == 3);
data3 = a(index3,2);
.......
etc.

You could also combine the two commands in one as follows but it might be confusing to someone who might come after you.

data1 = a(find(a(:,1)==1),2);
data2 = a(find(a(:,1)==2),2);
data3 = a(find(a(:,1)==3),2);
....
etc.
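
If the number of channels is large or not known in advance, the same indexing idea could be wrapped in a loop over the IDs actually present. A rough sketch, assuming an Mx2 matrix a laid out as above; the sample data and the cell array name dataByGroup are made up for illustration.

```matlab
% Made-up interleaved sample: column 1 = group ID, column 2 = data
a = [1 10; 2 20; 3 30; 1 11; 2 21; 3 31];

groupIds = unique(a(:,1));                 % sorted list of the IDs present
dataByGroup = cell(numel(groupIds), 1);    % one cell per group

% The loop runs once per group, not once per row
for g = 1:numel(groupIds)
    % A logical mask works directly as an index, so find is not required
    dataByGroup{g} = a(a(:,1) == groupIds(g), 2);
end
```

Each pass scans the whole ID column, so this suits a handful of groups; with very many groups a sort-based partition would likely scale better.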


 

imported_Ned Flanders

Senior member
May 11, 2005
641
0
0
Sorry to take this off topic, but I have a work related question:

Is it possible to set up MatLab so you can use it remotely (Say - Installed on a super fast server).

Thanks guys.
 

jmcoreymv

Originally posted by: Ned Flanders
Sorry to take this off topic, but I have a work related question:

Is it possible to set up MatLab so you can use it remotely (Say - Installed on a super fast server).

Thanks guys.


That's how it's set up at my work. It runs on a server through a program called Citrix, as do most other programs. Unfortunately, sometimes our server gets bogged down.
 

AeroEngy

Originally posted by: Ned Flanders
Sorry to take this off topic, but I have a work related question:

Is it possible to set up MatLab so you can use it remotely (Say - Installed on a super fast server).

Thanks guys.

Yes, we actually have it installed on a very fast Linux server. We then log in to the server remotely from a Windows machine using Exceed software. In other labs I have seen a similar setup, but logging on through dumb terminals that run everything off of the server.

 

f95toli

Originally posted by: Ned Flanders
Sorry to take this off topic, but I have a work related question:

Is it possible to set up MatLab so you can use it remotely (Say - Installed on a super fast server).

Thanks guys.

Yes, as jmcoreymv and AeroEngy have already pointed out, it is possible.
However, Matlab is not a multi-threaded application and therefore cannot take advantage of multiple processors(*). Hence, there is no such thing as a "super fast server" in this case. Any decent modern PC is likely to be as fast as or faster than a server, since the latter are much more expensive and therefore rarely upgraded. The only exception is if you are using very large matrices or solving large equation systems (the same thing). In that case you need a lot of memory: how fast you can run Matlab often depends more on the amount of RAM than on the processor, so make sure the computer never needs to swap data to the HD. But then again, 2 or even 3 GB is quite common in good gaming rigs nowadays, and that will certainly be sufficient for most applications.


(*) There are add-ons which will allow you to write parallel programs, and if you use the compiler you can even use Matlab functions combined with MPI on e.g. a grid. However, this is not included in the standard version.
 

AeroEngy

My server is definitely way faster and better suited to huge matrix operations than any desktop I have seen. (It is not uncommon to have to do matrix operations and plotting of multiple 200 by 1,000,000 element matrices at once.) You are right, however, that the speed in these instances usually depends mostly on the amount of RAM you have.

Also, with the Distributed Computing Toolbox you can write your m-code to run on multiple processors, multiple machines, or combinations of both. This is how we are currently running our setup at work (a 64-bit Linux Matlab server with 128GB of RAM and AMD Opterons, distributing to a dozen Linux desktops). As f95toli mentioned, this option is not part of the standard package and has to be purchased as a separate toolbox. It also requires m-code to be rewritten into "tasks" that can be assigned to different processors or separate machines.