Computer science people - question for you.

blahblah99

Platinum Member
Oct 10, 2000
2,689
0
0
Say I have an array of 100 elements with random numbers, and only want to keep the 10 highest numbers. What's the best way?

I don't want to sort through the whole array then pick the 10 highest, since the array can contain a million elements. Is there a variation of the sort algorithm?

TIA.
 

cchen

Diamond Member
Oct 12, 1999
6,062
0
76
If you don't iterate through the entire array, how would you know if the items you skipped aren't larger than what you think are the largest?
 

rsd

Platinum Member
Dec 30, 2003
2,293
0
76
Originally posted by: blahblah99
Say I have an array of 100 elements with random numbers, and only want to keep the 10 highest numbers. What's the best way?

I don't want to sort through the whole array then pick the 10 highest, since the array can contain a million elements. Is there a variation of the sort algorithm?

TIA.


Umm, I would think you HAVE to sort through the whole array to pick the 10 highest don't you think?

You can't just look at half the array and guarantee you have the 10 highest.

Simply choose your favorite O(n log n) algorithm imho. The act of taking the 10 top after sorting is done is trivial complexity wise.
 

blahblah99

Platinum Member
Oct 10, 2000
2,689
0
0
Originally posted by: cchen
If you don't iterate through the entire array, how would you know if the items you skipped aren't larger than what you think are the largest?

I'm not saying don't iterate through the array... what I don't want to do is sort the entire array, then in the end pick the first 10 entries.... is there a way to iterate through the +1 mill array and just insert into the array with 10 elements?

Kinda like this:

1) From a pool of 100 numbers, pick a random 10 and sort that, insert into new array.
2) Pick another number, compare that to elements in new array and insert and rearrange.
3) Continue until all numbers from pool is exhausted.

That's what I want to do, but am wondering if there's a more optimal way of doing it.
 

chuckywang

Lifer
Jan 12, 2004
20,133
1
0
Picking the ith element out of a set of n number is an O(n) operation. Just pick the 10 biggest elements. Your complexity is 10*O(n) which is the same as O(n).

Another easy algorithm is just to make an 10 element array that keeps track of the 10 biggest elements, and just iterate through the 100 elements, updating the 10 element array when necessary.
 

her209

No Lifer
Oct 11, 2000
56,336
11
0
Have a 2nd array where you keep a tab of the highest 10 as you iterate through the array.
 

blahblah99

Platinum Member
Oct 10, 2000
2,689
0
0
Originally posted by: blahblah99
Originally posted by: cchen
If you don't iterate through the entire array, how would you know if the items you skipped aren't larger than what you think are the largest?

I'm not saying don't iterate through the array... what I don't want to do is sort the entire array, then in the end pick the first 10 entries.... is there a way to iterate through the +1 mill array and just insert into the array with 10 elements?

Kinda like this:

1) From a pool of 100 numbers, pick a random 10 and sort that, insert into new array.
2) Pick another number, compare that to elements in new array and insert and rearrange.
3) Continue until all numbers from pool is exhausted.

That's what I want to do, but am wondering if there's a more optimal way of doing it.

Doh, it just dawned on me that no matter what, I'll end up sorting the entire array...

Time to dig up my old college c++ books - quick sort is the fastest right?
 

beyonddc

Senior member
May 17, 2001
910
0
76
Well, you can use any sorting algorithm. But I will suggest you to use merge sort.
After merge sort is done, just allocate an array of 10, and use a for-loop to copy the first 10 entries from the search result.

That's the only way I can think of. But why do you want to put it into another array size of 10? when you can just iterate the first 10 entries very easily.
 

rsd

Platinum Member
Dec 30, 2003
2,293
0
76
Originally posted by: blahblah99
Originally posted by: cchen
If you don't iterate through the entire array, how would you know if the items you skipped aren't larger than what you think are the largest?

I'm not saying don't iterate through the array... what I don't want to do is sort the entire array, then in the end pick the first 10 entries.... is there a way to iterate through the +1 mill array and just insert into the array with 10 elements?

Kinda like this:

1) From a pool of 100 numbers, pick a random 10 and sort that, insert into new array.
2) Pick another number, compare that to elements in new array and insert and rearrange.
3) Continue until all numbers from pool is exhausted.

That's what I want to do, but am wondering if there's a more optimal way of doing it.


So you are saying you want to do (at worst) 90 * 10 comparisons for a pool of 100 numbers (basically n^2) which is much worse than nlogn.
 

blahblah99

Platinum Member
Oct 10, 2000
2,689
0
0
Originally posted by: rsd
Originally posted by: blahblah99
Originally posted by: cchen
If you don't iterate through the entire array, how would you know if the items you skipped aren't larger than what you think are the largest?

I'm not saying don't iterate through the array... what I don't want to do is sort the entire array, then in the end pick the first 10 entries.... is there a way to iterate through the +1 mill array and just insert into the array with 10 elements?

Kinda like this:

1) From a pool of 100 numbers, pick a random 10 and sort that, insert into new array.
2) Pick another number, compare that to elements in new array and insert and rearrange.
3) Continue until all numbers from pool is exhausted.

That's what I want to do, but am wondering if there's a more optimal way of doing it.


So you are saying you want to do (at worst) 90 * 10 comparisons for a pool of 100 numbers (basically n^2) which is much worse than nlogn.

Yeah, you got that right. Now to figure out which sort to use :)
 

chuckywang

Lifer
Jan 12, 2004
20,133
1
0
Originally posted by: blahblah99
Originally posted by: blahblah99
Originally posted by: cchen
If you don't iterate through the entire array, how would you know if the items you skipped aren't larger than what you think are the largest?

I'm not saying don't iterate through the array... what I don't want to do is sort the entire array, then in the end pick the first 10 entries.... is there a way to iterate through the +1 mill array and just insert into the array with 10 elements?

Kinda like this:

1) From a pool of 100 numbers, pick a random 10 and sort that, insert into new array.
2) Pick another number, compare that to elements in new array and insert and rearrange.
3) Continue until all numbers from pool is exhausted.

That's what I want to do, but am wondering if there's a more optimal way of doing it.

Doh, it just dawned on me that no matter what, I'll end up sorting the entire array...

Time to dig up my old college c++ books - quick sort is the fastest right?


Dude, picking the 10 largest elements in a list of n elements is an O(n) algorithm. Sorting takes O(nlogn) at best (it has been proven that it can't be better than that). If you sort, you're doing more work than necessary.
 

blahblah99

Platinum Member
Oct 10, 2000
2,689
0
0
Originally posted by: chuckywang
Originally posted by: blahblah99
Originally posted by: blahblah99
Originally posted by: cchen
If you don't iterate through the entire array, how would you know if the items you skipped aren't larger than what you think are the largest?

I'm not saying don't iterate through the array... what I don't want to do is sort the entire array, then in the end pick the first 10 entries.... is there a way to iterate through the +1 mill array and just insert into the array with 10 elements?

Kinda like this:

1) From a pool of 100 numbers, pick a random 10 and sort that, insert into new array.
2) Pick another number, compare that to elements in new array and insert and rearrange.
3) Continue until all numbers from pool is exhausted.

That's what I want to do, but am wondering if there's a more optimal way of doing it.

Doh, it just dawned on me that no matter what, I'll end up sorting the entire array...

Time to dig up my old college c++ books - quick sort is the fastest right?


Dude, picking the 10 largest elements in a list of n elements is an O(n) algorithm. Sorting takes O(nlogn) at best (it has been proven that it can't be better than that). If you sort, you're doing more work than necessary.

How do I know which ten elements are the largest?
 

chuckywang

Lifer
Jan 12, 2004
20,133
1
0
Originally posted by: blahblah99
Originally posted by: chuckywang
Originally posted by: blahblah99
Originally posted by: blahblah99
Originally posted by: cchen
If you don't iterate through the entire array, how would you know if the items you skipped aren't larger than what you think are the largest?

I'm not saying don't iterate through the array... what I don't want to do is sort the entire array, then in the end pick the first 10 entries.... is there a way to iterate through the +1 mill array and just insert into the array with 10 elements?

Kinda like this:

1) From a pool of 100 numbers, pick a random 10 and sort that, insert into new array.
2) Pick another number, compare that to elements in new array and insert and rearrange.
3) Continue until all numbers from pool is exhausted.

That's what I want to do, but am wondering if there's a more optimal way of doing it.

Doh, it just dawned on me that no matter what, I'll end up sorting the entire array...

Time to dig up my old college c++ books - quick sort is the fastest right?


Dude, picking the 10 largest elements in a list of n elements is an O(n) algorithm. Sorting takes O(nlogn) at best (it has been proven that it can't be better than that). If you sort, you're doing more work than necessary.

How do I know which ten elements are the largest?

By the linear time selection algorithm.
 

rsd

Platinum Member
Dec 30, 2003
2,293
0
76
Originally posted by: chuckywang
Originally posted by: blahblah99
Originally posted by: blahblah99
Originally posted by: cchen
If you don't iterate through the entire array, how would you know if the items you skipped aren't larger than what you think are the largest?

I'm not saying don't iterate through the array... what I don't want to do is sort the entire array, then in the end pick the first 10 entries.... is there a way to iterate through the +1 mill array and just insert into the array with 10 elements?

Kinda like this:

1) From a pool of 100 numbers, pick a random 10 and sort that, insert into new array.
2) Pick another number, compare that to elements in new array and insert and rearrange.
3) Continue until all numbers from pool is exhausted.

That's what I want to do, but am wondering if there's a more optimal way of doing it.

Doh, it just dawned on me that no matter what, I'll end up sorting the entire array...

Time to dig up my old college c++ books - quick sort is the fastest right?


Dude, picking the 10 largest elements in a list of n elements is an O(n) algorithm. Sorting takes O(nlogn) at best (it has been proven that it can't be better than that). If you sort, you're doing more work than necessary.


How are you going to get the 10 largest elements in an unsorted array in O(n)?

For a given array a[], if I take the ith element a[ I ], how do I know it is one of the 10 largest without comparing it with another array(or whatever) containing the 10 (current) largest elements?

I don't see how you can go through the array once without doing any comparison operations.
 

Templeton

Senior member
Oct 9, 1999
467
0
0
Throw the first 10 elements of the large array into a seperate 10 element array, sort this small array. Now iterate over the large array, for each element, compare to the first element in the sorted 10 element array - if smaller, then insert into this array in the proper position to keep it sorted. Continue iterating.
 

rsd

Platinum Member
Dec 30, 2003
2,293
0
76
Originally posted by: Templeton
Throw the first 10 elements of the large array into a seperate 10 element array, sort this small array. Now iterate over the large array, for each element, compare to the first element in the sorted 10 element array - if smaller, then insert into this array in the proper position to keep it sorted. Continue iterating.


We already stated that doing the above is O(n^2), which is worse than your typical Onlogn alg.
 

notfred

Lifer
Feb 12, 2001
38,241
4
0
Originally posted by: Templeton
Throw the first 10 elements of the large array into a seperate 10 element array, sort this small array. Now iterate over the large array, for each element, compare to the first element in the sorted 10 element array - if smaller, then insert into this array in the proper position to keep it sorted. Continue iterating.

That forces you to iterate across a 10 element array 100 times.
 

Ameesh

Lifer
Apr 3, 2001
23,686
1
0
sorting the array in the best case will be a O(nlogn) you can do this linearly by keeping a second list that holds the best 10 ints, you can do this pretty easily by keeping track of the smallest of the greatest 10 and use that to figure out what to toss and what to keep
 

chuckywang

Lifer
Jan 12, 2004
20,133
1
0
Originally posted by: rsd
Originally posted by: chuckywang
Originally posted by: blahblah99
Originally posted by: blahblah99
Originally posted by: cchen
If you don't iterate through the entire array, how would you know if the items you skipped aren't larger than what you think are the largest?

I'm not saying don't iterate through the array... what I don't want to do is sort the entire array, then in the end pick the first 10 entries.... is there a way to iterate through the +1 mill array and just insert into the array with 10 elements?

Kinda like this:

1) From a pool of 100 numbers, pick a random 10 and sort that, insert into new array.
2) Pick another number, compare that to elements in new array and insert and rearrange.
3) Continue until all numbers from pool is exhausted.

That's what I want to do, but am wondering if there's a more optimal way of doing it.

Doh, it just dawned on me that no matter what, I'll end up sorting the entire array...

Time to dig up my old college c++ books - quick sort is the fastest right?


Dude, picking the 10 largest elements in a list of n elements is an O(n) algorithm. Sorting takes O(nlogn) at best (it has been proven that it can't be better than that). If you sort, you're doing more work than necessary.


How are you going to get the 10 largest elements in an unsorted array in O(n)?

For a given array a[], if I take the ith element a[ I ], how do I know it is one of the 10 largest without comparing it with another array(or whatever) containing the 10 (current) largest elements?

I don't see how you can go through the array once without doing any comparison operations.


Linear time selection algorithm says that selecting ith largest element in an array of n elements takes O(n) time. Just do SELECT(n), then SELECT(n-1), then SELECT(n-2), ...., SELECT(n-9) to get the 10 biggest elements. This is 10O(n) which is the same as O(n).

The algorithm is described here in Section 2: http://www.cs.duke.edu/educati...l03/Notes/lecture7.pdf
 

blahblah99

Platinum Member
Oct 10, 2000
2,689
0
0
Originally posted by: Ameesh
sorting the array in the best case will be a O(nlogn) you can do this linearly by keeping a second list that holds the best 10 ints, you can do this pretty easily by keeping track of the smallest of the greatest 10 and use that to figure out what to toss and what to keep

Yes, but to figure out the smallest of the greatest 10 still requires one to go through the array at least once in order to find out which element is the smallest.
 

her209

No Lifer
Oct 11, 2000
56,336
11
0
Why not sort only half the array. E.g.,

Take the first number, any number higher than that, keep it: O(n)

Take the first number of the new list, and do the same: O(n/2)

Repeat until the list size equals the desired size.
 

Ameesh

Lifer
Apr 3, 2001
23,686
1
0
Originally posted by: blahblah99
Originally posted by: Ameesh
sorting the array in the best case will be a O(nlogn) you can do this linearly by keeping a second list that holds the best 10 ints, you can do this pretty easily by keeping track of the smallest of the greatest 10 and use that to figure out what to toss and what to keep

Yes, but to figure out the smallest of the greatest 10 still requires one to go through the array at least once in order to find out which element is the smallest.

no you do it as you are walking the array, you should only need 1 iteration over the array
 

rsd

Platinum Member
Dec 30, 2003
2,293
0
76
Originally posted by: her209
Why not sort only half the array. E.g.,

Take the first number, any number higher than that, keep it: O(n)

Take the first number of the new list, and do the same: O(n/2)

Repeat until the list size equals the desired size.

Too lazy to show the math, but that is still O(n^2) when you think about it.
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Let's say you need the greatest m values
Copy the first m elements into your max array.
Sort it (O(mlogm)).

Now iterate from m->n
If the element is > the smallest element in the max array, insert it into the max array dropping the smallest member.
You will have to do this insertion at most (n-m) times, and the maximum number of comparisons you need to do to make the insertion is m-1 times, so the maximum number of comparisons you need to do is m*(n-m).

So the overall O for the worst case (original array already sorted and starting from the wrong end) would be O(mlogm) + O(nm-m^2)
For the best case (array already sorted, starting from the right end), you would never do any insertions ... just n-m comparisons.

I suspect my lack of formal compsci may be showing here, but it should be something like that.

 

Hajime

Senior member
Oct 18, 2004
617
0
71
....

Here's a -simple- method to handle that.

Empty 10-element array containing all 0's assuming unsigned. If signed, all containing lowest possible value -32768, what have you.

Iterate through the big array. If the number is -greater- than the first value in the 10-element array, check against the second value in the array. Continue doing this untill you find a value that it is lower then, or reach 10. Insert it and shuffle down numbers. Rinse and repeat untill no more elements exist in the big-arsed array.

You brought up 1 million numbers - compared to a million numbers, the CPU usage is trivial for the array of 10 elements (Would demonstrate, but it should be self-explanatory.). If you were using an array of, say, 20, then there are better methods, true. But millions of numbers? The only situation this would be bad in is if the array was already sorted or partially sorted - and if it's already been pre-sorted, why are you bothering?

Armitage: Actually, it's not showing. There's not really an effective way of handling it any other way. (Edit: I read your post wrong originally ;)) There's only one minor bug in your idea, but that's for < 10 values, and it's also trivial....