Computer science people - question for you.

blahblah99 · Dec 21, 2004

Say I have an array of 100 elements with random numbers, and only want to keep the 10 highest numbers. What's the best way?

I don't want to sort through the whole array then pick the 10 highest, since the array can contain a million elements. Is there a variation of the sort algorithm?

TIA.

cchen · Dec 21, 2004

If you don't iterate through the entire array, how would you know if the items you skipped aren't larger than what you think are the largest?

rsd · Dec 21, 2004

Originally posted by: blahblah99
Say I have an array of 100 elements with random numbers, and only want to keep the 10 highest numbers. What's the best way?

I don't want to sort through the whole array then pick the 10 highest, since the array can contain a million elements. Is there a variation of the sort algorithm?

TIA.

Umm, I would think you HAVE to sort through the whole array to pick the 10 highest don't you think?

You can't just look at half the array and guarantee you have the 10 highest.

Simply choose your favorite O(n log n) algorithm imho. The act of taking the 10 top after sorting is done is trivial complexity wise.

blahblah99 · Dec 21, 2004

Originally posted by: cchen
If you don't iterate through the entire array, how would you know if the items you skipped aren't larger than what you think are the largest?

I'm not saying don't iterate through the array... what I don't want to do is sort the entire array, then in the end pick the first 10 entries.... is there a way to iterate through the +1 mill array and just insert into the array with 10 elements?

Kinda like this:

1) From a pool of 100 numbers, pick a random 10 and sort that, insert into new array.
2) Pick another number, compare that to elements in new array and insert and rearrange.
3) Continue until all numbers from pool is exhausted.

That's what I want to do, but am wondering if there's a more optimal way of doing it.

chuckywang · Dec 21, 2004

Picking the ith element out of a set of n number is an O(n) operation. Just pick the 10 biggest elements. Your complexity is 10*O(n) which is the same as O(n).

Another easy algorithm is just to make an 10 element array that keeps track of the 10 biggest elements, and just iterate through the 100 elements, updating the 10 element array when necessary.

her209 · Dec 21, 2004

Have a 2nd array where you keep a tab of the highest 10 as you iterate through the array.

blahblah99 · Dec 21, 2004

Originally posted by: blahblah99

Originally posted by: cchen
If you don't iterate through the entire array, how would you know if the items you skipped aren't larger than what you think are the largest?

Click to expand...

I'm not saying don't iterate through the array... what I don't want to do is sort the entire array, then in the end pick the first 10 entries.... is there a way to iterate through the +1 mill array and just insert into the array with 10 elements?

Kinda like this:

1) From a pool of 100 numbers, pick a random 10 and sort that, insert into new array.
2) Pick another number, compare that to elements in new array and insert and rearrange.
3) Continue until all numbers from pool is exhausted.

That's what I want to do, but am wondering if there's a more optimal way of doing it.

Doh, it just dawned on me that no matter what, I'll end up sorting the entire array...

Time to dig up my old college c++ books - quick sort is the fastest right?

beyonddc · Dec 21, 2004

Well, you can use any sorting algorithm. But I will suggest you to use merge sort.
After merge sort is done, just allocate an array of 10, and use a for-loop to copy the first 10 entries from the search result.

That's the only way I can think of. But why do you want to put it into another array size of 10? when you can just iterate the first 10 entries very easily.

rsd · Dec 21, 2004

Originally posted by: blahblah99

Originally posted by: cchen
If you don't iterate through the entire array, how would you know if the items you skipped aren't larger than what you think are the largest?

Click to expand...

I'm not saying don't iterate through the array... what I don't want to do is sort the entire array, then in the end pick the first 10 entries.... is there a way to iterate through the +1 mill array and just insert into the array with 10 elements?

Kinda like this:

1) From a pool of 100 numbers, pick a random 10 and sort that, insert into new array.
2) Pick another number, compare that to elements in new array and insert and rearrange.
3) Continue until all numbers from pool is exhausted.

That's what I want to do, but am wondering if there's a more optimal way of doing it.

So you are saying you want to do (at worst) 90 * 10 comparisons for a pool of 100 numbers (basically n^2) which is much worse than nlogn.

blahblah99 · Dec 21, 2004

Originally posted by: rsd

Originally posted by: blahblah99

Originally posted by: cchen
If you don't iterate through the entire array, how would you know if the items you skipped aren't larger than what you think are the largest?

Click to expand...

I'm not saying don't iterate through the array... what I don't want to do is sort the entire array, then in the end pick the first 10 entries.... is there a way to iterate through the +1 mill array and just insert into the array with 10 elements?

Kinda like this:

1) From a pool of 100 numbers, pick a random 10 and sort that, insert into new array.
2) Pick another number, compare that to elements in new array and insert and rearrange.
3) Continue until all numbers from pool is exhausted.

That's what I want to do, but am wondering if there's a more optimal way of doing it.

Click to expand...

So you are saying you want to do (at worst) 90 * 10 comparisons for a pool of 100 numbers (basically n^2) which is much worse than nlogn.

Yeah, you got that right. Now to figure out which sort to use

chuckywang · Dec 21, 2004

Originally posted by: blahblah99

Originally posted by: blahblah99

Originally posted by: cchen
If you don't iterate through the entire array, how would you know if the items you skipped aren't larger than what you think are the largest?

Click to expand...

I'm not saying don't iterate through the array... what I don't want to do is sort the entire array, then in the end pick the first 10 entries.... is there a way to iterate through the +1 mill array and just insert into the array with 10 elements?

Kinda like this:

1) From a pool of 100 numbers, pick a random 10 and sort that, insert into new array.
2) Pick another number, compare that to elements in new array and insert and rearrange.
3) Continue until all numbers from pool is exhausted.

That's what I want to do, but am wondering if there's a more optimal way of doing it.

Click to expand...

Doh, it just dawned on me that no matter what, I'll end up sorting the entire array...

Time to dig up my old college c++ books - quick sort is the fastest right?

Dude, picking the 10 largest elements in a list of n elements is an O(n) algorithm. Sorting takes O(nlogn) at best (it has been proven that it can't be better than that). If you sort, you're doing more work than necessary.

blahblah99 · Dec 21, 2004

Originally posted by: chuckywang

Originally posted by: blahblah99

Originally posted by: blahblah99

Originally posted by: cchen
If you don't iterate through the entire array, how would you know if the items you skipped aren't larger than what you think are the largest?

Click to expand...

I'm not saying don't iterate through the array... what I don't want to do is sort the entire array, then in the end pick the first 10 entries.... is there a way to iterate through the +1 mill array and just insert into the array with 10 elements?

Kinda like this:

1) From a pool of 100 numbers, pick a random 10 and sort that, insert into new array.
2) Pick another number, compare that to elements in new array and insert and rearrange.
3) Continue until all numbers from pool is exhausted.

That's what I want to do, but am wondering if there's a more optimal way of doing it.

Click to expand...

Doh, it just dawned on me that no matter what, I'll end up sorting the entire array...

Time to dig up my old college c++ books - quick sort is the fastest right?

Click to expand...

Dude, picking the 10 largest elements in a list of n elements is an O(n) algorithm. Sorting takes O(nlogn) at best (it has been proven that it can't be better than that). If you sort, you're doing more work than necessary.

How do I know which ten elements are the largest?

chuckywang · Dec 21, 2004

Originally posted by: blahblah99

Originally posted by: chuckywang

Originally posted by: blahblah99

Originally posted by: blahblah99

Originally posted by: cchen
If you don't iterate through the entire array, how would you know if the items you skipped aren't larger than what you think are the largest?

Click to expand...

I'm not saying don't iterate through the array... what I don't want to do is sort the entire array, then in the end pick the first 10 entries.... is there a way to iterate through the +1 mill array and just insert into the array with 10 elements?

Kinda like this:

1) From a pool of 100 numbers, pick a random 10 and sort that, insert into new array.
2) Pick another number, compare that to elements in new array and insert and rearrange.
3) Continue until all numbers from pool is exhausted.

That's what I want to do, but am wondering if there's a more optimal way of doing it.

Click to expand...

Doh, it just dawned on me that no matter what, I'll end up sorting the entire array...

Time to dig up my old college c++ books - quick sort is the fastest right?

Click to expand...

Dude, picking the 10 largest elements in a list of n elements is an O(n) algorithm. Sorting takes O(nlogn) at best (it has been proven that it can't be better than that). If you sort, you're doing more work than necessary.

Click to expand...

How do I know which ten elements are the largest?

By the linear time selection algorithm.

rsd · Dec 21, 2004

Originally posted by: chuckywang

Originally posted by: blahblah99

Originally posted by: blahblah99

Originally posted by: cchen
If you don't iterate through the entire array, how would you know if the items you skipped aren't larger than what you think are the largest?

Click to expand...

I'm not saying don't iterate through the array... what I don't want to do is sort the entire array, then in the end pick the first 10 entries.... is there a way to iterate through the +1 mill array and just insert into the array with 10 elements?

Kinda like this:

1) From a pool of 100 numbers, pick a random 10 and sort that, insert into new array.
2) Pick another number, compare that to elements in new array and insert and rearrange.
3) Continue until all numbers from pool is exhausted.

That's what I want to do, but am wondering if there's a more optimal way of doing it.

Click to expand...

Doh, it just dawned on me that no matter what, I'll end up sorting the entire array...

Time to dig up my old college c++ books - quick sort is the fastest right?

Click to expand...

Dude, picking the 10 largest elements in a list of n elements is an O(n) algorithm. Sorting takes O(nlogn) at best (it has been proven that it can't be better than that). If you sort, you're doing more work than necessary.

How are you going to get the 10 largest elements in an unsorted array in O(n)?

For a given array a[], if I take the ith element a[ I ], how do I know it is one of the 10 largest without comparing it with another array(or whatever) containing the 10 (current) largest elements?

I don't see how you can go through the array once without doing any comparison operations.

Templeton · Dec 21, 2004

Throw the first 10 elements of the large array into a seperate 10 element array, sort this small array. Now iterate over the large array, for each element, compare to the first element in the sorted 10 element array - if smaller, then insert into this array in the proper position to keep it sorted. Continue iterating.

rsd · Dec 21, 2004

Originally posted by: Templeton
Throw the first 10 elements of the large array into a seperate 10 element array, sort this small array. Now iterate over the large array, for each element, compare to the first element in the sorted 10 element array - if smaller, then insert into this array in the proper position to keep it sorted. Continue iterating.

We already stated that doing the above is O(n^2), which is worse than your typical Onlogn alg.

notfred · Dec 21, 2004

Originally posted by: Templeton
Throw the first 10 elements of the large array into a seperate 10 element array, sort this small array. Now iterate over the large array, for each element, compare to the first element in the sorted 10 element array - if smaller, then insert into this array in the proper position to keep it sorted. Continue iterating.

That forces you to iterate across a 10 element array 100 times.

Ameesh · Dec 21, 2004

sorting the array in the best case will be a O(nlogn) you can do this linearly by keeping a second list that holds the best 10 ints, you can do this pretty easily by keeping track of the smallest of the greatest 10 and use that to figure out what to toss and what to keep

chuckywang · Dec 21, 2004

Originally posted by: rsd

Originally posted by: chuckywang

Originally posted by: blahblah99

Originally posted by: blahblah99

Originally posted by: cchen
If you don't iterate through the entire array, how would you know if the items you skipped aren't larger than what you think are the largest?

Click to expand...

I'm not saying don't iterate through the array... what I don't want to do is sort the entire array, then in the end pick the first 10 entries.... is there a way to iterate through the +1 mill array and just insert into the array with 10 elements?

Kinda like this:

1) From a pool of 100 numbers, pick a random 10 and sort that, insert into new array.
2) Pick another number, compare that to elements in new array and insert and rearrange.
3) Continue until all numbers from pool is exhausted.

That's what I want to do, but am wondering if there's a more optimal way of doing it.

Click to expand...

Doh, it just dawned on me that no matter what, I'll end up sorting the entire array...

Time to dig up my old college c++ books - quick sort is the fastest right?

Click to expand...

Dude, picking the 10 largest elements in a list of n elements is an O(n) algorithm. Sorting takes O(nlogn) at best (it has been proven that it can't be better than that). If you sort, you're doing more work than necessary.

Click to expand...

How are you going to get the 10 largest elements in an unsorted array in O(n)?

For a given array a[], if I take the ith element a[ I ], how do I know it is one of the 10 largest without comparing it with another array(or whatever) containing the 10 (current) largest elements?

I don't see how you can go through the array once without doing any comparison operations.

Linear time selection algorithm says that selecting ith largest element in an array of n elements takes O(n) time. Just do SELECT(n), then SELECT(n-1), then SELECT(n-2), ...., SELECT(n-9) to get the 10 biggest elements. This is 10O(n) which is the same as O(n).

The algorithm is described here in Section 2: http://www.cs.duke.edu/educati...l03/Notes/lecture7.pdf

blahblah99 · Dec 21, 2004

Originally posted by: Ameesh
sorting the array in the best case will be a O(nlogn) you can do this linearly by keeping a second list that holds the best 10 ints, you can do this pretty easily by keeping track of the smallest of the greatest 10 and use that to figure out what to toss and what to keep

Yes, but to figure out the smallest of the greatest 10 still requires one to go through the array at least once in order to find out which element is the smallest.

her209 · Dec 21, 2004

Why not sort only half the array. E.g.,

Take the first number, any number higher than that, keep it: O(n)

Take the first number of the new list, and do the same: O(n/2)

Repeat until the list size equals the desired size.

Ameesh · Dec 21, 2004

Originally posted by: blahblah99

Originally posted by: Ameesh
sorting the array in the best case will be a O(nlogn) you can do this linearly by keeping a second list that holds the best 10 ints, you can do this pretty easily by keeping track of the smallest of the greatest 10 and use that to figure out what to toss and what to keep

Click to expand...

Yes, but to figure out the smallest of the greatest 10 still requires one to go through the array at least once in order to find out which element is the smallest.

no you do it as you are walking the array, you should only need 1 iteration over the array

rsd · Dec 21, 2004

Originally posted by: her209
Why not sort only half the array. E.g.,

Take the first number, any number higher than that, keep it: O(n)

Take the first number of the new list, and do the same: O(n/2)

Repeat until the list size equals the desired size.

Too lazy to show the math, but that is still O(n^2) when you think about it.

Armitage · Dec 22, 2004

Let's say you need the greatest m values
Copy the first m elements into your max array.
Sort it (O(mlogm)).

Now iterate from m->n
If the element is > the smallest element in the max array, insert it into the max array dropping the smallest member.
You will have to do this insertion at most (n-m) times, and the maximum number of comparisons you need to do to make the insertion is m-1 times, so the maximum number of comparisons you need to do is m*(n-m).

So the overall O for the worst case (original array already sorted and starting from the wrong end) would be O(mlogm) + O(nm-m^2)
For the best case (array already sorted, starting from the right end), you would never do any insertions ... just n-m comparisons.

I suspect my lack of formal compsci may be showing here, but it should be something like that.

Hajime · Dec 22, 2004

....

Here's a -simple- method to handle that.

Empty 10-element array containing all 0's assuming unsigned. If signed, all containing lowest possible value -32768, what have you.

Iterate through the big array. If the number is -greater- than the first value in the 10-element array, check against the second value in the array. Continue doing this untill you find a value that it is lower then, or reach 10. Insert it and shuffle down numbers. Rinse and repeat untill no more elements exist in the big-arsed array.

You brought up 1 million numbers - compared to a million numbers, the CPU usage is trivial for the array of 10 elements (Would demonstrate, but it should be self-explanatory.). If you were using an array of, say, 20, then there are better methods, true. But millions of numbers? The only situation this would be bad in is if the array was already sorted or partially sorted - and if it's already been pre-sorted, why are you bothering?

Armitage: Actually, it's not showing. There's not really an effective way of handling it any other way. (Edit: I read your post wrong originally

) There's only one minor bug in your idea, but that's for < 10 values, and it's also trivial....

Computer science people - question for you.

Platinum Member

Diamond Member

Platinum Member

Platinum Member

Lifer

No Lifer

Platinum Member

Senior member

Platinum Member

Platinum Member

Lifer

Platinum Member

Lifer

Platinum Member

Senior member

Platinum Member

Lifer

Lifer

Lifer

Platinum Member

No Lifer

Lifer

Platinum Member

Banned

Senior member