
Faster matrix separation method in Matlab?

jmcoreymv

Diamond Member
I've got an Mx2 matrix where the first column contains a Group ID number (let's say 1-5) and the second column contains a data value.

I need to separate the data column into multiple vectors based on the Group ID number. Right now I'm using a for loop that checks the Group ID of each row and then puts the value into a new vector depending on that ID.

The code works fine, but the for loop takes a long time to execute in Matlab, and I was wondering if there is a faster way to accomplish this.
 
Which version of Matlab are you using? Any version newer than 6.1 (or maybe 6.5, I don't remember exactly) uses JIT compilation, which means even for-loops are reasonably fast and you don't gain much by removing them, as long as the compiler can do its job efficiently.
In older versions you should avoid for-loops and use matrix operations instead. Unfortunately, I can't think of a good way to do that in this case, since you need to check each number.

One option would of course be to write this in C and then call the C function from within Matlab.


 
That's version 7.1, right? Then I don't think there is much you can do.
You must have a very long list if the code takes that long to execute. There is a tool you can use to check how well the JIT compiler is working; I don't remember what it's called, but you can find it in one of the menus.
 
Originally posted by: jmcoreymv
I believe it's actually a later version than 7.1. The list is about 50,000 entries long.

Geez. Matlab is good with vectors, but even it will take a while with 50,000 entries. 😛
 
Ha, guess I have no patience.
 
You could try something like the following. Take the diff of the group IDs in the first column, then find where it is non-zero. From this you can determine the start and end index of each group of data, and then just reassign that data to a new vector. You still have a for loop, but it only runs once per group ID rather than once per row of your matrix. I'm not sure how much faster it will be, since I only tested it on a small scale with the code below. I also made it assign each group's data to a variable named dataVector followed by the group ID number (dataVector1, dataVector2, ...). Hope it helps. P.S. There are probably even more efficient ways to do this.

% Create matrix: 1st column = ID #, 2nd column = data
a2 = 0:14;
a1 = [1 1 1 2 2 2 3 3 3 3 3 4 4 4 4];
A(:,2) = a2';
A(:,1) = a1'
% Take the diff and find where it is non-zero
diffA = diff(A(:,1));
indexDiff = find(diffA ~= 0);

% Loop to create a uniquely named data vector for each group
startIndex = 1;
for i = 1:length(indexDiff)
    endIndex = indexDiff(i);
    command = sprintf('%s = A((startIndex:endIndex),2)',['dataVector' num2str(A(startIndex,1))]);
    eval(command)
    startIndex = indexDiff(i)+1;
end
% ... and one more time for the last group
endIndex = size(A,1);
command = sprintf('%s = A((startIndex:endIndex),2)',['dataVector' num2str(A(startIndex,1))]);
eval(command)
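The same diff idea can also be sketched without eval by keeping the groups in a cell array. This is a minimal sketch that, like the code above, assumes each group ID occupies one contiguous run of rows; appending the last row index to the list of run ends also removes the special case for the last group. The names `groupIds` and `dataVectors` are illustrative, not from the original post.

```matlab
% Same example matrix: column 1 = group ID (contiguous runs), column 2 = data
a1 = [1 1 1 2 2 2 3 3 3 3 3 4 4 4 4]';
a2 = (0:14)';
A = [a1 a2];

% Rows where the group ID changes mark the last row of each run
edges = find(diff(A(:,1)) ~= 0);
startIdx = [1; edges + 1];          % first row of each run
endIdx   = [edges; size(A,1)];      % last row of each run (includes the final group)

% Store each run in a cell instead of generating dataVector1, dataVector2, ... with eval
groupIds = A(startIdx, 1);
dataVectors = cell(numel(startIdx), 1);
for k = 1:numel(startIdx)
    dataVectors{k} = A(startIdx(k):endIdx(k), 2);
end
```

Here `dataVectors{k}` holds the values for group `groupIds(k)`, which is usually easier to loop over later than a set of generated variable names.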
 

That seems like it might work, but all my data is interleaved, so the group IDs go 1,2,3,4,1,2,3,4, etc. I could probably use sortrows to order it first and then apply your method, but that might take just as long. I'll try it when I get home. Thanks!
 
50,000 rows isn't very much. I use matrices bigger than that in Matlab on a regular basis; as long as you don't try to create a square matrix of that size, even memory is not an issue. Matlab should be able to handle it without a problem.
 
It is very fast when doing computations on the whole matrix, such as FFT, filtering, etc., but separating the data with the for loop takes ~2 minutes.
 
Hi,

I don't speak Matlab, but I just tried it in APL. 100,000 lines took 0.06 s on a 5-year-old PC.


So you might like to try the APL algorithm:

1. Sort the matrix by group ID.
2. Compare each group ID with its predecessor to identify the start of each group.
3. Partition the values by the start points.

No loop, no for statement.



Peter
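A rough Matlab translation of those three steps, written as a sketch with a made-up interleaved matrix `A` (column 1 = group ID, column 2 = value). It assumes `sort` is stable, so that samples within each group keep their original order, and uses `mat2cell` for the partition step; `groupIds` and `groups` are illustrative names.

```matlab
% Made-up interleaved input: column 1 = group ID, column 2 = data value
A = [1 10; 2 20; 3 30; 1 11; 2 21; 3 31; 1 12; 2 22; 3 32];

% Step 1: sort by group ID (assumes a stable sort, so each group keeps its original order)
[sortedIds, order] = sort(A(:,1));
sortedVals = A(order, 2);

% Step 2: a group starts wherever an ID differs from its predecessor
starts = [1; find(diff(sortedIds) ~= 0) + 1];
groupIds = sortedIds(starts);

% Step 3: partition the values at those start points
runLengths = diff([starts; numel(sortedIds) + 1]);
groups = mat2cell(sortedVals, runLengths, 1);
```

Each `groups{k}` holds the values of group `groupIds(k)`. This removes the per-row loop, but whether it beats a JIT-compiled loop in a given Matlab version is something to measure on the real data.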
 
I just tried it using Matlab 6.5 with a SWITCH block: splitting 50,000 rows into 5 vectors took about 2.1 s on my (slow) PC.

2 minutes?
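A rough way to reproduce this kind of measurement is to time a per-row loop against logical indexing with tic/toc. This is a sketch on synthetic data (50,000 rows, group IDs 1 to 5 interleaved, random values), and every name in it is invented for illustration; it indexes by group ID directly instead of using a SWITCH block, and absolute times will depend heavily on the machine and Matlab version.

```matlab
% Synthetic test data: 50,000 rows with group IDs 1..5 interleaved
n = 50000;
ids = mod((0:n-1)', 5) + 1;
vals = rand(n, 1);

% Per-row loop with each output preallocated to its final length
tic;
loopOut = cell(5, 1);
for g = 1:5
    loopOut{g} = zeros(sum(ids == g), 1);
end
nextSlot = zeros(5, 1);            % number of values stored so far per group
for i = 1:n
    g = ids(i);
    nextSlot(g) = nextSlot(g) + 1;
    loopOut{g}(nextSlot(g)) = vals(i);
end
tLoop = toc;

% Logical indexing: one vectorized selection per group
tic;
vecOut = cell(5, 1);
for g = 1:5
    vecOut{g} = vals(ids == g);
end
tVec = toc;

fprintf('loop: %.3f s, logical indexing: %.3f s\n', tLoop, tVec);
```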



 
Here's the code I was using to do the separation that took so long:

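n = length(chan);
j = 1;   % next free slot in the channel 0 vectors
k = 1;   % next free slot in the channel 1 vectors
% Preallocate every output for the worst case (all samples in one channel)
data0i = zeros(n, 1);
data0q = zeros(n, 1);
data1i = zeros(n, 1);
data1q = zeros(n, 1);
% Separate the interleaved channel data
for i = 1:n
    switch chan(i)
        case 0 % Channel 0 data
            data0i(j) = datai(i);
            data0q(j) = dataq(i);
            j = j + 1;
        case 1 % Channel 1 data
            data1i(k) = datai(i);
            data1q(k) = dataq(i);
            k = k + 1;
    end
end
% Trim the unused preallocated entries
data0i = data0i(1:j-1);
data0q = data0q(1:j-1);
data1i = data1i(1:k-1);
data1q = data1q(1:k-1);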

I'll try the other methods and see if they work faster for me.
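For a two-channel case like this, the whole loop can probably be replaced by logical indexing, which selects every sample of a channel in one step and needs no counters or preallocation. A minimal sketch, assuming `chan`, `datai` and `dataq` are vectors of the same length as in the loop above; the short example values are made up for illustration and would be replaced by the real data.

```matlab
% Made-up example inputs: channel number (0 or 1) plus I/Q values per sample
chan  = [0 1 0 1 0 1];
datai = [10 20 11 21 12 22];
dataq = [-10 -20 -11 -21 -12 -22];

% A logical mask per channel selects all of its samples at once
isChan0 = (chan == 0);
isChan1 = (chan == 1);

data0i = datai(isChan0);
data0q = dataq(isChan0);
data1i = datai(isChan1);
data1q = dataq(isChan1);
```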
 
I think I figured out the problem. I guess it was the server that Matlab is running on. It's running much faster today! Sorry guys!
 
It probably doesn't matter anymore, but if you know the number of channels (or whatever the groups are) and they are sequential, you can use the find command, which returns the indices of each group. Something like this:

% Find where the group ID is equal to 1 (or whatever)
index = find(a(:,1) == 1);
data1 = a(index,2);

This creates the data array for channel 1 in only two lines. If there are lots of channels, or an unknown number of them, you would want to put this in a loop over the group numbers.
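One way to write that loop when the set of channels is not known in advance, sketched with a made-up matrix `a`: `unique` lists the IDs that actually occur, and each group goes into a cell array rather than into separately named variables. The names `ids` and `data` are illustrative.

```matlab
% Made-up example: column 1 = group ID (any values, any order), column 2 = data
a = [3 0.5; 1 1.5; 3 2.5; 7 3.5; 1 4.5];

ids = unique(a(:,1));           % sorted list of the IDs that occur
data = cell(numel(ids), 1);     % data{k} holds the values for group ids(k)
for k = 1:numel(ids)
    data{k} = a(a(:,1) == ids(k), 2);
end
```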
However, if there are not that many channels and they stay consistent for each matrix, you could just do this:

index1 = find(a(:,1) == 1);
data1 = a(index1,2);
index2 = find(a(:,1) == 2);
data2 = a(index2,2);
index3 = find(a(:,1) == 3);
data3 = a(index3,2);
% ... etc.

You could also combine the two commands into one, as follows, but it might be confusing to someone who comes after you.

data1 = a(find(a(:,1)==1),2);
data2 = a(find(a(:,1)==2),2);
data3 = a(find(a(:,1)==3),2);
% ... etc.
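A small variant: `find` is optional here. The comparison `a(:,1) == 1` already returns a logical vector, and Matlab accepts a logical vector directly as an index, so the same result can be had without building the intermediate index list. Sketch with a made-up `a`:

```matlab
% Made-up example: column 1 = group ID, column 2 = data
a = [1 10; 2 20; 1 11; 3 30; 2 21];

% The logical mask from the comparison indexes the rows directly
data1 = a(a(:,1) == 1, 2);
data2 = a(a(:,1) == 2, 2);
```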


 
Sorry to take this off topic, but I have a work-related question:

Is it possible to set up Matlab so you can use it remotely (say, installed on a super fast server)?

Thanks guys.
 
That's how it's set up at my work. It runs on a server through a program called Citrix, as do most of our other programs. Unfortunately, our server sometimes gets bogged down.
 
Yes, we actually have it installed on a very fast Linux server. We then log in to the server remotely from a Windows machine using Exceed. In other labs I have seen similar setups where people log on through dumb terminals that run everything off the server.

 
Yes, as jmcoreymv and AeroEngy have already pointed out, it is possible.
However, Matlab is not a multi-threaded application and therefore cannot take advantage of multiple processors(*). Hence, there is no such thing as a "super fast server" in this case. Any decent modern PC is likely to be as fast as or faster than a server, since servers are much more expensive and therefore rarely upgraded. The only exception is if you are using very large matrices or solving large equation systems (the same thing); in that case you need a lot of memory. How fast you can run Matlab often depends more on the amount of RAM than on the processor, so make sure the computer never needs to swap data to the hard disk. But then again, 2 or even 3 GB is quite common in good gaming rigs nowadays, and that will certainly be sufficient for most applications.

(*) There are add-ons that allow you to write parallel programs, and if you use the compiler you can even combine Matlab functions with MPI on, e.g., a grid. However, this is not included in the standard version.
 
My server is definitely much faster and better suited to huge matrix operations than any desktop I have seen. (It is not uncommon to have to do matrix operations and plotting on multiple 200 by 1,000,000 element matrices at once.) You are right, however, that the speed in these cases usually depends mostly on the amount of RAM you have.

Also, with the Distributed Computing Toolbox you can write your M-code to run on multiple processors, multiple machines, or a combination of both. This is how we are currently running our setup at work (a 64-bit Linux Matlab server with 128 GB of RAM and AMD Opterons, distributing to a dozen Linux desktops). As f95toli mentioned, this option is not part of the standard package and has to be purchased as a separate toolbox. It also requires the M-code to be rewritten into "tasks" that can be assigned to different processors or separate machines.
 