Finding largest 'common' element of many arrays (actually, more like 'mixed base' congruence?)?

Monotaur

Senior member
Nov 5, 2001
388
0
0
Ok, so I have a problem that I am trying to solve. I currently have a bunch of vectors (like 200) that each have 1000-5000 elements. I am trying to find the Highest 'common' value among them.

First of all, every element in each vector cannot be greator than the vectors size. Secondly, the LCM of all vector sizes is known.

I think this is best illustrated with an example. Suppose I want to 'combine' the following vectors and find the 'highest' common value:

First Array is: [0] (Max size = 2)
Second Array is: [1] (Max size = 3)
Third array is: [2,3] (Max size = 4)

First, find the size of the combined vector: LCM(2,3,4)=12:
Thus (if I was combining these three vectors), I would then do something like this:

First array changes to: [0,2,4,6,8,10] (ie, all elements mod 2 = 0, as indicated in origianl vector)
Second array changes to [1,4,7,10] (ie, all elements mod 3 = 1, as indicated in origianl vector)
Third array changes to [2,3,6,7,10,11] (ie, all elements mod 4 = 2 or 3, as indicated in origianl vector)

Now, to combine them I simply pull out the common elements into a new (combined) array of size 12: [10].

So, basically, for each array you have a multiplier and element counter. When the multiplier is multiplied by the the array size ratio minus 1 (LCM/Array size -1: for the first array this would be 12/2 - 1 = 5) and the element which is specific by the element counter is added to this value it could be 0...Max size-1 (or, 0..11).

So, first starting out I set up all of the array multipliers:
First array: 12/2-1=5
Second array: 12/3-1=3
Third array: 12/4-1=2

Next the max counter value is found (simply the number of elements in the original vectors):
First array: 0
Second array: 0
Third array: 1

So, to get the highest common value, I multiply the first multiplier by the first array size and add the element counter.. I then check other arrays (by multiplying their multiplier and adding the element counter) to see if they are the same... if they are not then I first adjust the element counter then the multiplier.. I do this until this value is the same across all arrays (I don't care what the multipliers are and array counters are, I just want this highest 'common' value). Example:

TestNum=((5*2)+Array(0))=((5*2)+0)=10 (same as when j=0...)
j=0:
((5*2)+Array(0))=((5*2)+0)=10, which is equal to TestNum, so ++j
j=1:
((3*3)+Array(0))=((3*3)+1)=10, which is equal to TestNum, so ++j
j=2:
((2*4)+Array(1))=((2*4)+3)=11, which is great than TestNum, so start decrementing the element counter and try again:
((2*4)+Array(0))=((2*4)+2)=10, which is equal to TestNum, so ++j

If (j>Number of original vectors), then quit since we have found the highest 'common' value possible.

Since writing this version of the code, I have dropped the practice of decrementing the mltiplier and counter by one and started to divide the difference between testnum[j] and the base testnum and use that to change the multiplier by a value higher than 1 (hopefully). This speeds things up a little but, but I didn't know if anyone else knew of a faster way? I think I had thought of one (multiplying mult of vec(0), adding element(element_count(0)), and then mod'ing this value by all vector sizes... if this value was then found in each array, it would qualify as a common value - this might be faster since we could use binary search...).

Anyway, does this make sense? Does anyone know of a faster way to do this?

Thanks!
 

AFB

Lifer
Jan 10, 2004
10,718
3
0
<-Going from your title
Burte force method:

Pick any array and call it $main
Next , find the highest value and call it $largestForMain
Then, make sure every array has it.
Whenever you find an array doesn't, find the next highes value in $main and repeat.
 

Monotaur

Senior member
Nov 5, 2001
388
0
0
Originally posted by: amdfanboy
<-Going from your title
Burte force method:

Pick any array and call it $main
Next , find the highest value and call it $largestForMain
Then, make sure every array has it.
Whenever you find an array doesn't, find the next highes value in $main and repeat.


Heh, I know my post is pretty long but its not quite finding the simple common element... So far, I've been working on one set of vectors and have been able to find a 'common' element in the first 50. This has taken roughly 3 days on a 3.0 HT P4... I'd like to go to 200 or so fairly easily.
 

notfred

Lifer
Feb 12, 2001
38,241
4
0
I'm confused. What are you trying to find if not the highest common element?

If you're not looknig for a common element, why do you have vectors and not just your multiplier values?

You have a bunch of arrays, each of arbitrary length, containing a sequence of values corresponding to a specific formula determined by your multiplier value, correct?

You want the highest common value among those arrays? The highest common multiplier?

Are your values in each array always in sorted order?
 

Barnaby W. Füi

Elite Member
Aug 14, 2001
12,343
0
0
Your explanation is kinda long and is confusing to me. You're just trying to find the largest common value in each? And like notfred asked, whether or not they're sorted will probably make a big difference.

One thing I can think of is to find the smallest vector and work your way down from that, but the problem is still pretty vague to me.
 

notfred

Lifer
Feb 12, 2001
38,241
4
0
If you ARE trying to find the largest common value between each array, I just wrote a perl script that does it with 200 arrays of 5000 values each in about 20 mintues (worst case: assuming all 200 arrays are identical) on an 800Mhz Pentium 3.

When using randomly generated integers between 0 and 10,000 to fill the arrays, then actual running times are around 30 seconds.
 

notfred

Lifer
Feb 12, 2001
38,241
4
0
Originally posted by: BingBongWongFooey
If they're sorted, perhaps a tree would be faster?

Probably, but mine aren't sorted. I have no idea what he's doing that's taking 4 days to run on a 3ghz machine.
 

Monotaur

Senior member
Nov 5, 2001
388
0
0
Sorry guys, I know this is a bit confusing. Let me try to explain again.

I have a process which outputs arrays of increasing possible size. Initially, they are of max size, 2, 3, 4, 5, 6, ..., n, where n could be up to like 1000 (or more). Each value within each array is always less than the max size of the array (ie, array #1 [max size of 2] could only have values 0 and 1), and all values are always sorted in ascending order.

Initially, I tried to 'combine' all arrays into one giant array so that I could just pull out the value that I needed (the highest one), but this was way too memory intensive. For example, if I wanted to combine the following vectors:

Vec1: [0] (max size=2)
Vec2: [1] (max size=3)
Vec3: [0,1] (max size=4)

I would first know that the max size of the combined vector would be LCM(2,3,4) = 12, so I extend each array out to a max size of 12 (by just concatenating itself time a increasing multiple on to the end of itself), like so:

Vec1 becomes: [0, 2, 4, 6, 8, 10] (max size =12. Noticed all odd numbers were skipped; this array was expanded by multiplying the max size by an incrementing multiple and adding the vector elements to that product: [(0*2)+0, (1*2)+0, (2*2)+0, ...., (5*2)+0])

Vec2 becomes: [1, 4, 7, 10] (max size of 12. Same this as above: [(0*3)+1, (1*3)+1, (2*3)+1, (3*3)+1])

Vec3 becomes: [0, 1, 4, 5, 8, 9] (max size of 12. A little different this time sinze there are 2 elements: [(0*4)+0, (0*4)+1, (1*4)+0, (1*4)+1, (2*4)+0, (2*4)+1])

I then pulled out the common elements, which in this case 4.

This works fine on my machine up until around the max size of 30 comes around. At which point, the combined max size is like 30 billion and thaere are over 3 million (I think) elements. Keep in mind that I'm having to use a LargeInt class since these values quickly climb over the maximum unsigned long int values.

Anyway, so I decided not to combine them in this fashion. Instead, I combined them into smaller 'chunks' that take less memory (maybe only 1000-5000 elements in each). I was then trying to find a simple way to extract the highest value common element (from the total combined vector) - thats where the multipliers and element counters come in.

To make things simple, I'll show a small example/excerpt that searches for the lowest value (it's the same method of searching for the highest value except iniital multipliers and counters are 0, so it's easier to grasp):

//Set all multipliers and element_counters to 0:

for (j=0; j<vecChunks.size(); ++j)
{
multiplier[j]=0;
element_counter[j]=0;
}

//search while we haven't found a value

j=1;
bFound=false
while (!bfound)
{
//get the first possible value
BaseTestNum=(mult[0]*(Max_size_of_vec[0]))+vec[element_count[0]];

//loop through all over arrays to see if they all aer at this value
while ((BaseTestNum=(mult[j]*(Max_size_of_vec[j]))+vec[element_count[j]]) &amp; (j<vec.size()))
++j;

//if we found a value that was common throughout all combined arrays then quit
if (j==vec.size())
{
bFound-true;
} else {
for (k=0;k<=j; ++k)
{
element_counter[k]++;
//if element_Counter is too high, set it to 0 and increase mult
if (element_counter[k]>max_element_counter[k])
{
element_counter[k]=0;
mult[k]++;
}
}
j=1;
}
}

This code isn't quite right (but it's very close, and a lot more simple, espcially compared against the code that finds the highest value), but it does serve the purpose of communicating the algorithm.

Does this make any more sense? I did think of another method (I haev used it in the past on a smaller problem and it seemed to be pretty quick, but I haven't tried it for this one yet). The new one is to just go through every possible ((mult[0]*Max_vec_size[0])+vec[element_counter[0]]) combination and then mod it by the other vectors max_size. Once I had the remainder, I oculd o a binary search to see if that value was contained in the other vector. If it was, then the highest common value could be found.

Does this help at all?

Thanks for the replies so far!
 

Templeton

Senior member
Oct 9, 1999
467
0
0
Have you tried something like:
Grab largest element of largest array.
Is this element in the second largest array?
yes- check next largest array
no - go back to begining, grab next largest element from largest array
repeat over and over until all arrays checked.
To improve efficiency, store a list of the last index checked in every array, that way you won't have to do a full search all over again when an element isn't found, you can instead do your search with the last checked index as an upper bound.
 

DaveSimmons

Elite Member
Aug 12, 2001
40,730
670
126
Do you have a link or scan of the homework assignment?

Your second explanation is a little clearer, but you may be leaving out a part of the assignment that at least hints at an elegant solution.
 

Monotaur

Senior member
Nov 5, 2001
388
0
0
No, there is no homework assignment.. this is something I got myself into. :)

Each input vector is the result of another process, which given a value 'MaxSize' ('MaxSize'=2...n) will return an array with a maximum size of MaxSize and values from 0 to MaxSize-1.

I don't know how else to explain this, except by using another example:

1.

(the following 4 vectors were returned by this other process which was called with parameter 2, then 3, then 4, then 5):

vec1=[1] (MaxSize=2)
vec2=[0] (MaxSize=3)
vec3=[2,3] (MaxSize=4)
vec4=[1,3,4] (MaxSize=5)

I will break down the 'combination' part into more steps:

First, combine vec1 and vec2:
1. The MaxSiez of the combined vector (of vec1 and vec2) will be the LCM of each of the component vectors max sizes. In this case, the maxsize of the combined vector is LCM(2,3)=6.
2. 'Extend' each vector out until it reaches the MaxSize of the combined vector, each time adding the max size of the component vector:

vec1 becomes: [1+0,1+2,1+4]=[1,3,5] (notice how all elements mod 2 = 1, just like original vector)
vec2 becomes: [0+0,0+3,0+6]=[0,3,6] (all elements mod 3 - 0, just like original vector)

3. Now find the common elements and combine them together into one vector. The only common element here is 3, so the combined vector of 1 and 2 (called vec1&amp;2) is [3] (max size is 6).

Now combine vec1&amp;2 and vec3:

New MaxSize is LCM(6,4)=12

vec1&amp;2 becomes: [3+0, 3+6]=[3,9]
vec3 becomes: [2+0, 3+0, 2+4, 3+4, 2+8, 3+8]=[2,3,6,7,10,11]

Common element is [3] (max size is now 12). Call this vector vec1&amp;2&amp;3.

Now combine vec1&amp;2&amp;3 with vec4:

New maxsize = LCM(12,5)=60

vec1&amp;2&amp;3 becomes: [3+0, 3+12, 3+24, 3+36, 3+48]=[3,15,27,39,51]
vec4 becomes: [1+0, 3+0, 4+0, 1+5, 3+5, 4+5, ... , 1+55, 3+55, 4+55]=[1,3,4,6,8,9,...,56,58,59]

Common elements are: [3, 39, 51] (Max size 60)

Does this process make sense now? I am able to combine a small number (like up to MaxSize 30) of vectors together like this, but it becomes impossible due to memory limitations to go much further. So, I started to group smaller numbers of vectors together and then come up with a way to find the common values between these arrays, that way I wouldn't have to store all of the intermediate values. For instance, I might have these arrays:

vec1=[1,453,647,6786,43587,47782, ..., 98459435, 928593459345] (MaxSize=346535834587)
vec2=[5,67,23463,283759,2894579324, ..., 9845677938465] (MaxSize=9845677938499)
...
etc

and I would like to find the HIGHEST value I were able to combine all of these vectors in the same manner I could combine the smaller vectors above. Does this help any?

Please let me know where you are confused.
 

notfred

Lifer
Feb 12, 2001
38,241
4
0
Instead of describing the long, confusing, and slow algorithm that you're not having much luck with, try explaining the actual problem you're trying to solve, and we'll see if we can come up with a better way to solve it. No one has any idea WTF you're talking about right now.
 

rainypickles

Senior member
Dec 7, 2001
724
0
0
quick question: if two of your vectors had max sizes of 9845677938499 and 9845677938498 (just minus 1), does the final vector's maxsize necessarily have to be at least LCM of those two numbers? (not sure if i wrapped my head around that right?) if the answer is yes, isn't that number huge?

since i don't know much about algorithms (though i have heard of the word order), how long would it take to find a LCM of 10 thirteen digit numbers? is this doable in a reasonable amount of time?

in the meanwhile, i'll ponder the question (not that itll probably do any good).
 

DaveSimmons

Elite Member
Aug 12, 2001
40,730
670
126
Originally posted by: notfred
Instead of describing the long, confusing, and slow algorithm that you're not having much luck with, try explaining the actual problem you try to solve, and we'll see if we can come up with a better way to solve it. No one has any idea WTF you're talking about right now.
Exactly -- as stated it sounds either completely pointless, or like a confusingly (perhaps incorrectly) paraphrased homework assignment.
 

Monotaur

Senior member
Nov 5, 2001
388
0
0
Ok, how is this for concise?

Given j arrays, I want to find the coefficient A[j] and element_counter[j] such that:

A[j]*(MaxSize[j])+vec[j][element_counter[j]] is equal to the same value (the highest value under the LCM of all MaxSize[j]) for all j. Actually, I don't really care about A[j] and element_counter[j].. I just want one of the value of A[j]*(MaxSize[j])+vec[j][element_counter[j]] which satisfies the condiation above.

Note:
MaxSize[j] = the max size of the j'th element
vec[j]=the ith element of the jth array
All array values are sorted in ascending order.

Does this make any more sense?

Thanks!