Computer science people - question for you.

Page 2 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

her209

No Lifer
Oct 11, 2000
56,336
11
0
Originally posted by: rsd
Originally posted by: her209
Why not sort only half the array. E.g.,

Take the first number, any number higher than that, keep it: O(n)

Take the first number of the new list, and do the same: O(n/2)

Repeat until the list size equals the desired size.
Too lazy to show the math, but that is still O(n^2) when you think about it.
But that is assuming worst case scenario, ie, the array is in reverse order.

 

cchen

Diamond Member
Oct 12, 1999
6,062
0
76
Originally posted by: her209
Originally posted by: rsd
Originally posted by: her209
Why not sort only half the array. E.g.,

Take the first number, any number higher than that, keep it: O(n)

Take the first number of the new list, and do the same: O(n/2)

Repeat until the list size equals the desired size.
Too lazy to show the math, but that is still O(n^2) when you think about it.
But that is assuming worst case scenario, ie, the array is in reverse order.

you always assume worst case scenario
 

clamum

Lifer
Feb 13, 2003
26,256
406
126
Do an in-place quick sort, if the array to be sorted is random numbers it should run in O(n log n) time. It could be as bad as O(n^2) time if the array happens to be more or less already sorted, but it shouldn't.

Then take the last 10 elements of that array and copy them to a new array. Time: O(n log n + 10).

EDIT: Or use an in-place merge sort, always O(n log n) time.
 

spamsk8r

Golden Member
Jul 11, 2001
1,787
0
76
Using a binary heap to store the data would amortize the necessary time, so you would just maintain a heap as you add elements, then pick off the one from the top of the list, then again, for the first 10. Do a search for heap-sort if you have questions, its one of the more elegant (in my opinion) sorts, and its not difficult to understand.
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Originally posted by: Armitage
Let's say you need the greatest m values
Copy the first m elements into your max array.
Sort it (O(mlogm)).

Now iterate from m->n
If the element is > the smallest element in the max array, insert it into the max array dropping the smallest member.
You will have to do this insertion at most (n-m) times, and the maximum number of comparisons you need to do to make the insertion is m-1 times, so the maximum number of comparisons you need to do is m*(n-m).

So the overall O for the worst case (original array already sorted and starting from the wrong end) would be O(mlogm) + O(nm-m^2)
For the best case (array already sorted, starting from the right end), you would never do any insertions ... just n-m comparisons.

I suspect my lack of formal compsci may be showing here, but it should be something like that.

Heh ... for the case of n=1,000,000, m=10 the second term here works out to 9,999,900 while nlogn comes to 6,000,000
So you're still better off sorting in the worst case, although I suspect in a random case this algorithm would do better.

nlogn does mean base 10 logarithm right?
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Originally posted by: Armitage
Originally posted by: Armitage
Let's say you need the greatest m values
Copy the first m elements into your max array.
Sort it (O(mlogm)).

Now iterate from m->n
If the element is > the smallest element in the max array, insert it into the max array dropping the smallest member.
You will have to do this insertion at most (n-m) times, and the maximum number of comparisons you need to do to make the insertion is m-1 times, so the maximum number of comparisons you need to do is m*(n-m).

So the overall O for the worst case (original array already sorted and starting from the wrong end) would be O(mlogm) + O(nm-m^2)
For the best case (array already sorted, starting from the right end), you would never do any insertions ... just n-m comparisons.

I suspect my lack of formal compsci may be showing here, but it should be something like that.

Heh ... for the case of n=1,000,000, m=10 the second term here works out to 9,999,900 while nlogn comes to 6,000,000
So you're still better off sorting in the worst case, although I suspect in a random case this algorithm would do better.

nlogn does mean base 10 logarithm right?

Here's the relation ... for m <= log(n), the above method is faster, for m > log(n) an nlogn sort is faster
 

oboeguy

Diamond Member
Dec 7, 1999
3,907
0
76
Originally posted by: blahblah99

Doh, it just dawned on me that no matter what, I'll end up sorting the entire array...

To clarify... no. The people telling you to use SELECT are correct, which will give you a method that runs in roughly O(mn) time for m = "10" in your example and n = elements in the array.

 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Originally posted by: oboeguy
Originally posted by: blahblah99

Doh, it just dawned on me that no matter what, I'll end up sorting the entire array...

To clarify... no. The people telling you to use SELECT are correct, which will give you a method that runs in roughly O(mn) time for m = "10" in your example and n = elements in the array.

So, by your own logic, if m > log(n), you should sort
 

blahblah99

Platinum Member
Oct 10, 2000
2,689
0
0
Alrite folks, I just decided to use the insertion sort algorithm. It was the EASIEST to implement algorithm (I'm no cs guy), and it's not an algorithm that is going to be used continously, maybe two or three times a day at most. And that linear time selection algorithm looks interesting, but way to complicated for me to want to bother with.

 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Bah ... I have the algorithm I described above done in python ... but you posted this in the wrong forum. You can't attach code in off topic, and with python using whitespace for syntax, it won't work
 

chuckywang

Lifer
Jan 12, 2004
20,133
1
0
Originally posted by: Armitage
Originally posted by: Armitage
Let's say you need the greatest m values
Copy the first m elements into your max array.
Sort it (O(mlogm)).

Now iterate from m->n
If the element is > the smallest element in the max array, insert it into the max array dropping the smallest member.
You will have to do this insertion at most (n-m) times, and the maximum number of comparisons you need to do to make the insertion is m-1 times, so the maximum number of comparisons you need to do is m*(n-m).

So the overall O for the worst case (original array already sorted and starting from the wrong end) would be O(mlogm) + O(nm-m^2)
For the best case (array already sorted, starting from the right end), you would never do any insertions ... just n-m comparisons.

I suspect my lack of formal compsci may be showing here, but it should be something like that.

Heh ... for the case of n=1,000,000, m=10 the second term here works out to 9,999,900 while nlogn comes to 6,000,000
So you're still better off sorting in the worst case, although I suspect in a random case this algorithm would do better.

nlogn does mean base 10 logarithm right?

log in compsci usually means base 2.

People....Linear Time Selection Algorithm. use it, it's the best.
 

blahblah99

Platinum Member
Oct 10, 2000
2,689
0
0
Originally posted by: chuckywang
Originally posted by: Armitage
Originally posted by: Armitage
Let's say you need the greatest m values
Copy the first m elements into your max array.
Sort it (O(mlogm)).

Now iterate from m->n
If the element is > the smallest element in the max array, insert it into the max array dropping the smallest member.
You will have to do this insertion at most (n-m) times, and the maximum number of comparisons you need to do to make the insertion is m-1 times, so the maximum number of comparisons you need to do is m*(n-m).

So the overall O for the worst case (original array already sorted and starting from the wrong end) would be O(mlogm) + O(nm-m^2)
For the best case (array already sorted, starting from the right end), you would never do any insertions ... just n-m comparisons.

I suspect my lack of formal compsci may be showing here, but it should be something like that.

Heh ... for the case of n=1,000,000, m=10 the second term here works out to 9,999,900 while nlogn comes to 6,000,000
So you're still better off sorting in the worst case, although I suspect in a random case this algorithm would do better.

nlogn does mean base 10 logarithm right?

log in compsci usually means base 2.

People....Linear Time Selection Algorithm. use it, it's the best.

So does this algorithm is 10*O(n), since it only finds the ith largest number and I need to repeat 10 times to get the top 10?

 

chuckywang

Lifer
Jan 12, 2004
20,133
1
0
Originally posted by: blahblah99
Originally posted by: chuckywang
Originally posted by: Armitage
Originally posted by: Armitage
Let's say you need the greatest m values
Copy the first m elements into your max array.
Sort it (O(mlogm)).

Now iterate from m->n
If the element is > the smallest element in the max array, insert it into the max array dropping the smallest member.
You will have to do this insertion at most (n-m) times, and the maximum number of comparisons you need to do to make the insertion is m-1 times, so the maximum number of comparisons you need to do is m*(n-m).

So the overall O for the worst case (original array already sorted and starting from the wrong end) would be O(mlogm) + O(nm-m^2)
For the best case (array already sorted, starting from the right end), you would never do any insertions ... just n-m comparisons.

I suspect my lack of formal compsci may be showing here, but it should be something like that.

Heh ... for the case of n=1,000,000, m=10 the second term here works out to 9,999,900 while nlogn comes to 6,000,000
So you're still better off sorting in the worst case, although I suspect in a random case this algorithm would do better.

nlogn does mean base 10 logarithm right?

log in compsci usually means base 2.

People....Linear Time Selection Algorithm. use it, it's the best.

So does this algorithm is 10*O(n), since it only finds the ith largest number and I need to repeat 10 times to get the top 10?


Yep.

Here's the bottom line:

In general, to find the largest m elements out of an n element set:

Linear Time Selection Algorithm: O(m*n)

Sorting: O(nlogn) where log is base 2.

Therefore, Lnear Time Selection Algorithm is faster if m<log n

You have 100 elements, and log (100) <7, so sorting would be faster in this case. However, Linear Time Selection is going to be better if the number of elements increases.
 
Sep 29, 2004
18,656
68
91
Originally posted by: blahblah99
Say I have an array of 100 elements with random numbers, and only want to keep the 10 highest numbers. What's the best way?

I don't want to sort through the whole array then pick the 10 highest, since the array can contain a million elements. Is there a variation of the sort algorithm?

TIA.

Bubblesort?
 
Sep 29, 2004
18,656
68
91
Originally posted by: blahblah99
Say I have an array of 100 elements with random numbers, and only want to keep the 10 highest numbers. What's the best way?

I don't want to sort through the whole array then pick the 10 highest, since the array can contain a million elements. Is there a variation of the sort algorithm?

TIA.

Find out what hte standard deviation is and all numbers in the lower 10 percentile are the nubmers you want?

OK, totally off the wall and wouldn't work, but thought I'd toss a totally random/different idea out there.
 

chuckywang

Lifer
Jan 12, 2004
20,133
1
0
Originally posted by: IHateMyJob2004
Originally posted by: blahblah99
Say I have an array of 100 elements with random numbers, and only want to keep the 10 highest numbers. What's the best way?

I don't want to sort through the whole array then pick the 10 highest, since the array can contain a million elements. Is there a variation of the sort algorithm?

TIA.

Bubblesort?

OMGWTFDIAF!!!
 
Sep 29, 2004
18,656
68
91
(1)Oh, break set of 100 elements into 10 seperate arrays(memcpy is fastest method) and sort each (bubblesort?).

(2)
Then take arrays 1,2 and find the ten lowest numbers and place them in a third array. Repeat this for 3 through 10.

(3)
now you have 5 arrays

(4)
Use simlar process as in (2) to end up with 3 arrays of size 10.

(5)
Use simlar process as in (4) to end up with 2 arrays of size 10.

(6)
Use simlar process as in (5) to end up with 1 arrays of size 10.

Step (6) will give you the lowest 10 numbers.

 
Sep 29, 2004
18,656
68
91
Originally posted by: her209
Why not sort only half the array. E.g.,

Take the first number, any number higher than that, keep it: O(n)

Take the first number of the new list, and do the same: O(n/2)

Repeat until the list size equals the desired size.

Bubblesort would probably be quicker.
 

sao123

Lifer
May 27, 2002
12,653
205
106
I think this is an easy but fast implementation....


Doh, it just dawned on me that no matter what, I'll end up sorting the entire array...

If you have an array of N numbers, and want the highest X numbers. It will take exactly X trips through the array. You do not have to sort the entire array.


[A][C][D]...[E][F] (B omitted due to bold formatting)
Compare element F to element E. If element F is greater than element E, change places. Continue comparing &amp; moving F as far as it will go. Where E stops, pick up the next value (it was heavier than E) and continue....
Each trip through the array... The heaviest remaining element will always fall to the bottom of the array. So after 10 trips you will have 10 sorted largest numbers in the array, and the rest as being unsorted. I think this is called the
edit
bubble sort because of how the heavier numbers fall toward the bottom &amp; the lighter numbers float up the array.
/edit
 

oboeguy

Diamond Member
Dec 7, 1999
3,907
0
76
Originally posted by: Armitage
Originally posted by: oboeguy
Originally posted by: blahblah99

Doh, it just dawned on me that no matter what, I'll end up sorting the entire array...

To clarify... no. The people telling you to use SELECT are correct, which will give you a method that runs in roughly O(mn) time for m = "10" in your example and n = elements in the array.

So, by your own logic, if m > log(n), you should sort

Naturally. I should have put "not necessarily" instead of "no", but it was clear to you, right? :D
 

mjia

Member
Oct 8, 2004
94
0
0
If you deside to use something like quicksort, you can take a shortcut by throwing away the larger half as long as the small one has more then 10 elements. Actually, if you have some idea of how the numbers will be distributed, you can roughly pick pivots that will minimize the size of the smaller array so that it is close to 10 (probably safer to leave a fair margin so you can cut off most elements on the first go). I'm not really sure what the average time complexity is, but it will be better than O(nlogn).
 

oboeguy

Diamond Member
Dec 7, 1999
3,907
0
76
Originally posted by: mjia
If you deside to use something like quicksort, you can take a shortcut by throwing away the larger half as long as the small one has more then 10 elements. Actually, if you have some idea of how the numbers will be distributed, you can roughly pick pivots that will minimize the size of the smaller array so that it is close to 10 (probably safer to leave a fair margin so you can cut off most elements on the first go). I'm not really sure what the average time complexity is, but it will be better than O(nlogn).

This is going back like 10 years for me, but isn't the "select" method essentially a one-sided quicksort?