• We’re currently investigating an issue related to the forum theme and styling that is impacting page layout and visual formatting. The problem has been identified, and we are actively working on a resolution. There is no impact to user data or functionality, this is strictly a front-end display issue. We’ll post an update once the fix has been deployed. Thanks for your patience while we get this sorted.

Question for you Comp Science folks! (Round 2)

blahblah99

Platinum Member
A while back I posted this question and got some really good responses...however, I just recently came up with an EVEN FASTER way....

The problem is this - you have an array of 1 million elements and want to pick the top 10 largest elements in the array. What's the fastest way to implement the sort?

The 10-iteration bubble sort is slow compared to what I have. 😛

For comparison,

10-iteration bubble sort = ~17 seconds.
my ingenuity sort = <1 second on the same computer, typically 0.5-0.8 seconds.

I dare you to come up with a faster sort algorithm! 😀 And if it's good, some Paypal will be involved!
 
What exactly is the sort you're using? This is a classic selection sort case.
 
Radix Sort. It's incredibly fast, with a much quicker worst-case sort than Quicksort. I'd tell you what it was if I could find my damn textbook.
 
Originally posted by: blahblah99
A while back I posted this question and got some really good responses...however, I just recently came up with an EVEN FASTER way....

The problem is this - you have an array of 1 million elements and want to pick the top 10 largest elements in the array. What's the fastest way to implement the sort?

The 10-iteration bubble sort is slow compared to what I have. 😛

For comparison,

10-iteration bubble sort = ~17 seconds.
my ingenuity sort = <1 second on the same computer, typically 0.5-0.8 seconds.

I dare you to come up with a faster sort algorithm! 😀 And if it's good, some Paypal will be involved!

I can do it in theta(n) time.

 
I thought Linear Time Selection Algorithm was the best in this scenario, bar none? EDIT: And also, did you test best, worst, avg case scenarios etc? For a certain subsection of problems, it could be ridiculously fast, but in some cases, it may take till the end of the universe to finish. If you haven't, take a course where you do the proofs for alogorithms and show us your proof.
 
Originally posted by: blahblah99
Originally posted by: ViRGE
What exactly is the sort you're using? This is a classic selection sort case.

Selection sort isn't much faster than bubblesort in this case...
Well then what sort are you using?😛 Your time information is worthless, I need to know the algorithm if you want something better. And for the record, selection would be far better than bubble in this case; bubble is going to be doing many times the element swaps than selection, which incurs a definite speed penalty.

PS Props toyukichiga on the Radix sort, it may very well be the fastest
 
Originally posted by: BigJ
I thought Linear Time Selection Algorithm was the best in this scenario, bar none? EDIT: And also, did you test best, worst, avg case scenarios etc? For a certain subsection of problems, it could be ridiculously fast, but in some cases, it may take till the end of the universe to finish. If you haven't, take a course where you do the proofs for alogorithms and show us your proof.
Exactly.

Even bubble sort can sort in linear time if you feed it the right data.

In choosing the algorithm for large sets of data, one may be a much better choice than another even if its average performance is 10 times worse.

Consider:
1. Alg-A, average performance O(n lg n), worst case O(n lg n)
2. Alg-B average performance O(n lg n), worst case O(n^2).

Alg-A is probably the better choice for very large data sets even if its average runtime is double that of Alg-B, unless Alg-B's worst case is very rare and you can do something tricky like catch the worst case and switch to Alg-A.
 
This is not a sorting problem at all. I haven't wasted my time reading the other thread. You need to pick the ten largest items from an unsorted million, which means you have to make at least one pass over the entire body of data at least once. Even selection sort is the wrong idea: you're not after sorting the whole run, but rather just picking a measly ten.


initialze your ten "keepers" from the first ten items in the array, and set Y to be lowest of the ten

for each item X examined (starting from #11):

    compare X to the current lowest of the ten (Y)

    if X > Y

        discard Y, keep X, set Y to current lowest from the ten



Total time taken is roughly equal to reading a million array elements, plus a million numeric comparisons. The work of organizing the ten "keepers" vanishes into the distance.
 
Originally posted by: DaveSimmons
Originally posted by: BigJ
I thought Linear Time Selection Algorithm was the best in this scenario, bar none? EDIT: And also, did you test best, worst, avg case scenarios etc? For a certain subsection of problems, it could be ridiculously fast, but in some cases, it may take till the end of the universe to finish. If you haven't, take a course where you do the proofs for alogorithms and show us your proof.
Exactly.

Even bubble sort can sort in linear time if you feed it the right data.

In choosing the algorithm for large sets of data, one may be a much better choice than another even if its average performance is 10 times worse.

Consider:
1. Alg-A, average performance O(n lg n), worst case O(n lg n)
2. Alg-B average performance O(n lg n), worst case O(n^2).

Alg-A is probably the better choice for very large data sets even if its average runtime is double that of Alg-B, unless Alg-B's worst case is very rare and you can do something tricky like catch the worst case and switch to Alg-A.

But if you noticed I said Theta(n), not O(n) or omega(n).
So my algorithm will always run in linear time, no matter what input 🙂
 
Originally posted by: JayHu
Originally posted by: DaveSimmons
Originally posted by: BigJ
I thought Linear Time Selection Algorithm was the best in this scenario, bar none? EDIT: And also, did you test best, worst, avg case scenarios etc? For a certain subsection of problems, it could be ridiculously fast, but in some cases, it may take till the end of the universe to finish. If you haven't, take a course where you do the proofs for alogorithms and show us your proof.
Exactly.

Even bubble sort can sort in linear time if you feed it the right data.

In choosing the algorithm for large sets of data, one may be a much better choice than another even if its average performance is 10 times worse.

Consider:
1. Alg-A, average performance O(n lg n), worst case O(n lg n)
2. Alg-B average performance O(n lg n), worst case O(n^2).

Alg-A is probably the better choice for very large data sets even if its average runtime is double that of Alg-B, unless Alg-B's worst case is very rare and you can do something tricky like catch the worst case and switch to Alg-A.

But if you noticed I said Theta(n), not O(n) or omega(n).
So my algorithm will always run in linear time, no matter what input 🙂

We were talking to the OP
 
Originally posted by: jvarszegi
This is not a sorting problem at all. I haven't wasted my time reading the other thread. You need to pick the ten largest items from an unsorted million, which means you have to make at least one pass over the entire body of data at least once. Even selection sort is the wrong idea: you're not after sorting the whole run, but rather just picking a measly ten.


initialze your ten "keepers" from the first ten items in the array, and set Y to be lowest of the ten

for each item X examined (starting from #11):

compare X to the current lowest of the ten (Y)

if X > Y

discard Y, keep X, set Y to current lowest from the ten



Total time taken is roughly equal to reading a million array elements, plus a million numeric comparisons. The work of organizing the ten "keepers" vanishes into the distance.

Agreed.

Put first 10 elements into an ascending ordered array and test each of the 999,990 remaining elements to see if it's larger than the lowest item in the top10 array. If not, discard. If so, test until it's not higher, insert new number, and shift the rest of the top10 down one. Probably easier with a top10 linked list than with an array.
 
Originally posted by: amoeba
I don't think first 10 elements need to be sorted.

Faster if you do.. otherwise you have to compare each new element to every element in the top10. Keeping the top10 sorted is much quicker.
 
Originally posted by: joshsquall
Originally posted by: jvarszegi
This is not a sorting problem at all. I haven't wasted my time reading the other thread. You need to pick the ten largest items from an unsorted million, which means you have to make at least one pass over the entire body of data at least once. Even selection sort is the wrong idea: you're not after sorting the whole run, but rather just picking a measly ten.


initialze your ten "keepers" from the first ten items in the array, and set Y to be lowest of the ten

for each item X examined (starting from #11):

compare X to the current lowest of the ten (Y)

if X > Y

discard Y, keep X, set Y to current lowest from the ten



Total time taken is roughly equal to reading a million array elements, plus a million numeric comparisons. The work of organizing the ten "keepers" vanishes into the distance.

Agreed.

Put first 10 elements into an ascending ordered array and test each of the 999,990 remaining elements to see if it's larger than the lowest item in the top10 array. If not, discard. If so, test until it's not higher, insert new number, and shift the rest of the top10 down one. Probably easier with a top10 linked list than with an array.

Yep, you're right; I didn't finish thinking it through. A linked list for the keepers might be a nice touch to eliminate array-shift overhead, especially because you could reuse the node from the discarded item for the newly-inserted one.

 
Originally posted by: joshsquall
Originally posted by: amoeba
I don't think first 10 elements need to be sorted.

Faster if you do.. otherwise you have to compare each new element to every element in the top10. Keeping the top10 sorted is much quicker.

Well, I don't know about "much". In the general case, yes, but in this particular case you might be hard-pressed to even measure the difference, because replacements would take place so seldom.
 
what about running a binary compare 10 times?

Is that faster or slower than the way we are talking about?

 
our current method seems to take 5 million comparisons on average + the sorting of the 10 element array which I guess is sort of negligible.

 
Yikes, I come here for trivial entertainment, not for CSI...🙂 Anyway, how about the fastest alogorithm to find the median in an unsorted array of say 1 million elements, without doing a sort or replication of the data( so you could sort the copy)......
 
Originally posted by: amoeba
our current method seems to take 5 million comparisons on average + the sorting of the 10 element array which I guess is sort of negligible.

Yeah.. worst case, 10 comparisons for each number (array is sorted from lowest to highest). This gives 10 mil comparisons, plus whatever operations to create a new node and link it as the highest number.

Best case, array is sorted from highest to lowest. 1 comparison for each number (compared to the lowest in top10) equals 1 mil comparisons.

Average of 5 comparisons for each number (if it's in the middle of the top10 list) giving 5 mil comparisions, plus whatever operations to create a new node and link it in the middle of the top10 list.

5 mil comparisons in raw computer time = ?
 
Originally posted by: Jmman
Yikes, I come here for trivial entertainment, not for CSI...🙂 Anyway, how about the fastest alogorithm to find the median in an unsorted array of say 1 million elements, without doing a sort or replication of the data( so you could sort the copy)......

again, I can do it in theta n time. 🙂
so can anyone since 1973 🙂
 
I still like my answer the best.

(assuming 1 million elements)

take the first number
compare subsequent numbers (putting higher numbers than the first number into a new list)
if the new list > 10 then repeat until 10 elements are left.

So the number of compares C would be in round n:

1: 999999
2: 499999
3: 249999
4: 124999
5: 62499
6: 31249
7: 15624
8: 7812
9: 3906
10: 1953
11: 976
12: 488
13: 244
14: 122
15: 61
16: 30
17: 15

C = 1,999,975

I know this number isn't exact but it certainly is less than:
1: 999999
2: 999998
3: 999997
4: 999996
5: 999995
6: 999994
7: 999993
8: 999992
9: 999991
10: 999990

C = 9,999,945
 
Back
Top