Computer science people - question for you.

her209 · Dec 22, 2004

Originally posted by: rsd

Originally posted by: her209
Why not sort only half the array. E.g.,

Take the first number, any number higher than that, keep it: O(n)

Take the first number of the new list, and do the same: O(n/2)

Repeat until the list size equals the desired size.

Click to expand...

Too lazy to show the math, but that is still O(n^2) when you think about it.

But that is assuming worst case scenario, ie, the array is in reverse order.

cchen · Dec 22, 2004

Originally posted by: her209

Originally posted by: rsd

Originally posted by: her209
Why not sort only half the array. E.g.,

Take the first number, any number higher than that, keep it: O(n)

Take the first number of the new list, and do the same: O(n/2)

Repeat until the list size equals the desired size.

Click to expand...

Too lazy to show the math, but that is still O(n^2) when you think about it.

Click to expand...

But that is assuming worst case scenario, ie, the array is in reverse order.

you always assume worst case scenario

clamum · Dec 22, 2004

Do an in-place quick sort, if the array to be sorted is random numbers it should run in O(n log n) time. It could be as bad as O(n^2) time if the array happens to be more or less already sorted, but it shouldn't.

Then take the last 10 elements of that array and copy them to a new array. Time: O(n log n + 10).

EDIT: Or use an in-place merge sort, always O(n log n) time.

spamsk8r · Dec 22, 2004

Using a binary heap to store the data would amortize the necessary time, so you would just maintain a heap as you add elements, then pick off the one from the top of the list, then again, for the first 10. Do a search for heap-sort if you have questions, its one of the more elegant (in my opinion) sorts, and its not difficult to understand.

darktubbly · Dec 22, 2004

Originally posted by: chuckywang

Originally posted by: blahblah99
How do I know which ten elements are the largest?

Click to expand...

By the linear time selection algorithm.

Winnnar. Hands down.

There's a good walkthrough of the algorithm here as well.

Armitage · Dec 22, 2004

Originally posted by: Armitage
Let's say you need the greatest m values
Copy the first m elements into your max array.
Sort it (O(mlogm)).

Now iterate from m->n
If the element is > the smallest element in the max array, insert it into the max array dropping the smallest member.
You will have to do this insertion at most (n-m) times, and the maximum number of comparisons you need to do to make the insertion is m-1 times, so the maximum number of comparisons you need to do is m*(n-m).

So the overall O for the worst case (original array already sorted and starting from the wrong end) would be O(mlogm) + O(nm-m^2)
For the best case (array already sorted, starting from the right end), you would never do any insertions ... just n-m comparisons.

I suspect my lack of formal compsci may be showing here, but it should be something like that.

Heh ... for the case of n=1,000,000, m=10 the second term here works out to 9,999,900 while nlogn comes to 6,000,000
So you're still better off sorting in the worst case, although I suspect in a random case this algorithm would do better.

nlogn does mean base 10 logarithm right?

Armitage · Dec 22, 2004

Originally posted by: Armitage

Originally posted by: Armitage
Let's say you need the greatest m values
Copy the first m elements into your max array.
Sort it (O(mlogm)).

Now iterate from m->n
If the element is > the smallest element in the max array, insert it into the max array dropping the smallest member.
You will have to do this insertion at most (n-m) times, and the maximum number of comparisons you need to do to make the insertion is m-1 times, so the maximum number of comparisons you need to do is m*(n-m).

So the overall O for the worst case (original array already sorted and starting from the wrong end) would be O(mlogm) + O(nm-m^2)
For the best case (array already sorted, starting from the right end), you would never do any insertions ... just n-m comparisons.

I suspect my lack of formal compsci may be showing here, but it should be something like that.

Click to expand...

Heh ... for the case of n=1,000,000, m=10 the second term here works out to 9,999,900 while nlogn comes to 6,000,000
So you're still better off sorting in the worst case, although I suspect in a random case this algorithm would do better.

nlogn does mean base 10 logarithm right?

Here's the relation ... for m <= log(n), the above method is faster, for m > log(n) an nlogn sort is faster

Armitage · Dec 22, 2004

Originally posted by: Hajime
<snip>
There's only one minor bug in your idea, but that's for < 10 values, and it's also trivial....

What's that?

oboeguy · Dec 22, 2004

Originally posted by: blahblah99

Doh, it just dawned on me that no matter what, I'll end up sorting the entire array...

To clarify... no. The people telling you to use SELECT are correct, which will give you a method that runs in roughly O(mn) time for m = "10" in your example and n = elements in the array.

Armitage · Dec 22, 2004

Originally posted by: oboeguy

Originally posted by: blahblah99

Doh, it just dawned on me that no matter what, I'll end up sorting the entire array...

Click to expand...

To clarify... no. The people telling you to use SELECT are correct, which will give you a method that runs in roughly O(mn) time for m = "10" in your example and n = elements in the array.

So, by your own logic, if m > log(n), you should sort

blahblah99 · Dec 22, 2004

Alrite folks, I just decided to use the insertion sort algorithm. It was the EASIEST to implement algorithm (I'm no cs guy), and it's not an algorithm that is going to be used continously, maybe two or three times a day at most. And that linear time selection algorithm looks interesting, but way to complicated for me to want to bother with.

Armitage · Dec 22, 2004

Bah ... I have the algorithm I described above done in python ... but you posted this in the wrong forum. You can't attach code in off topic, and with python using whitespace for syntax, it won't work

chuckywang · Dec 22, 2004

Originally posted by: Armitage

Originally posted by: Armitage
Let's say you need the greatest m values
Copy the first m elements into your max array.
Sort it (O(mlogm)).

Now iterate from m->n
If the element is > the smallest element in the max array, insert it into the max array dropping the smallest member.
You will have to do this insertion at most (n-m) times, and the maximum number of comparisons you need to do to make the insertion is m-1 times, so the maximum number of comparisons you need to do is m*(n-m).

So the overall O for the worst case (original array already sorted and starting from the wrong end) would be O(mlogm) + O(nm-m^2)
For the best case (array already sorted, starting from the right end), you would never do any insertions ... just n-m comparisons.

I suspect my lack of formal compsci may be showing here, but it should be something like that.

Click to expand...

Heh ... for the case of n=1,000,000, m=10 the second term here works out to 9,999,900 while nlogn comes to 6,000,000
So you're still better off sorting in the worst case, although I suspect in a random case this algorithm would do better.

nlogn does mean base 10 logarithm right?

log in compsci usually means base 2.

People....Linear Time Selection Algorithm. use it, it's the best.

blahblah99 · Dec 22, 2004

Originally posted by: chuckywang

Originally posted by: Armitage

Originally posted by: Armitage
Let's say you need the greatest m values
Copy the first m elements into your max array.
Sort it (O(mlogm)).

Now iterate from m->n
If the element is > the smallest element in the max array, insert it into the max array dropping the smallest member.
You will have to do this insertion at most (n-m) times, and the maximum number of comparisons you need to do to make the insertion is m-1 times, so the maximum number of comparisons you need to do is m*(n-m).

So the overall O for the worst case (original array already sorted and starting from the wrong end) would be O(mlogm) + O(nm-m^2)
For the best case (array already sorted, starting from the right end), you would never do any insertions ... just n-m comparisons.

I suspect my lack of formal compsci may be showing here, but it should be something like that.

Click to expand...

Heh ... for the case of n=1,000,000, m=10 the second term here works out to 9,999,900 while nlogn comes to 6,000,000
So you're still better off sorting in the worst case, although I suspect in a random case this algorithm would do better.

nlogn does mean base 10 logarithm right?

Click to expand...

log in compsci usually means base 2.

People....Linear Time Selection Algorithm. use it, it's the best.

So does this algorithm is 10*O(n), since it only finds the ith largest number and I need to repeat 10 times to get the top 10?

EngenZerO · Dec 22, 2004

bwahahah, do your own hw problems.

chuckywang · Dec 22, 2004

Originally posted by: blahblah99

Originally posted by: chuckywang

Originally posted by: Armitage

Originally posted by: Armitage
Let's say you need the greatest m values
Copy the first m elements into your max array.
Sort it (O(mlogm)).

Now iterate from m->n
If the element is > the smallest element in the max array, insert it into the max array dropping the smallest member.
You will have to do this insertion at most (n-m) times, and the maximum number of comparisons you need to do to make the insertion is m-1 times, so the maximum number of comparisons you need to do is m*(n-m).

So the overall O for the worst case (original array already sorted and starting from the wrong end) would be O(mlogm) + O(nm-m^2)
For the best case (array already sorted, starting from the right end), you would never do any insertions ... just n-m comparisons.

I suspect my lack of formal compsci may be showing here, but it should be something like that.

Click to expand...

Heh ... for the case of n=1,000,000, m=10 the second term here works out to 9,999,900 while nlogn comes to 6,000,000
So you're still better off sorting in the worst case, although I suspect in a random case this algorithm would do better.

nlogn does mean base 10 logarithm right?

Click to expand...

log in compsci usually means base 2.

People....Linear Time Selection Algorithm. use it, it's the best.

Click to expand...

So does this algorithm is 10*O(n), since it only finds the ith largest number and I need to repeat 10 times to get the top 10?

Yep.

Here's the bottom line:

In general, to find the largest m elements out of an n element set:

Linear Time Selection Algorithm: O(m*n)

Sorting: O(nlogn) where log is base 2.

Therefore, Lnear Time Selection Algorithm is faster if m<log n

You have 100 elements, and log (100) <7, so sorting would be faster in this case. However, Linear Time Selection is going to be better if the number of elements increases.

IHateMyJob2004 · Dec 22, 2004

Originally posted by: blahblah99
Say I have an array of 100 elements with random numbers, and only want to keep the 10 highest numbers. What's the best way?

I don't want to sort through the whole array then pick the 10 highest, since the array can contain a million elements. Is there a variation of the sort algorithm?

TIA.

Bubblesort?

IHateMyJob2004 · Dec 22, 2004

Originally posted by: blahblah99
Say I have an array of 100 elements with random numbers, and only want to keep the 10 highest numbers. What's the best way?

I don't want to sort through the whole array then pick the 10 highest, since the array can contain a million elements. Is there a variation of the sort algorithm?

TIA.

Find out what hte standard deviation is and all numbers in the lower 10 percentile are the nubmers you want?

OK, totally off the wall and wouldn't work, but thought I'd toss a totally random/different idea out there.

chuckywang · Dec 22, 2004

Originally posted by: IHateMyJob2004

Originally posted by: blahblah99
Say I have an array of 100 elements with random numbers, and only want to keep the 10 highest numbers. What's the best way?

I don't want to sort through the whole array then pick the 10 highest, since the array can contain a million elements. Is there a variation of the sort algorithm?

TIA.

Click to expand...

Bubblesort?

OMGWTFDIAF!!!

IHateMyJob2004 · Dec 22, 2004

(1)Oh, break set of 100 elements into 10 seperate arrays(memcpy is fastest method) and sort each (bubblesort?).

(2)
Then take arrays 1,2 and find the ten lowest numbers and place them in a third array. Repeat this for 3 through 10.

(3)
now you have 5 arrays

(4)
Use simlar process as in (2) to end up with 3 arrays of size 10.

(5)
Use simlar process as in (4) to end up with 2 arrays of size 10.

(6)
Use simlar process as in (5) to end up with 1 arrays of size 10.

Step (6) will give you the lowest 10 numbers.

IHateMyJob2004 · Dec 22, 2004

Originally posted by: her209
Why not sort only half the array. E.g.,

Take the first number, any number higher than that, keep it: O(n)

Take the first number of the new list, and do the same: O(n/2)

Repeat until the list size equals the desired size.

Bubblesort would probably be quicker.

sao123 · Dec 22, 2004

I think this is an easy but fast implementation....

Doh, it just dawned on me that no matter what, I'll end up sorting the entire array...

If you have an array of N numbers, and want the highest X numbers. It will take exactly X trips through the array. You do not have to sort the entire array.

[A][C][D]...[E][F] (B omitted due to bold formatting)
Compare element F to element E. If element F is greater than element E, change places. Continue comparing & moving F as far as it will go. Where E stops, pick up the next value (it was heavier than E) and continue....
Each trip through the array... The heaviest remaining element will always fall to the bottom of the array. So after 10 trips you will have 10 sorted largest numbers in the array, and the rest as being unsorted. I think this is called the
edit
bubble sort because of how the heavier numbers fall toward the bottom & the lighter numbers float up the array.
/edit

oboeguy · Dec 22, 2004

Originally posted by: Armitage

Originally posted by: oboeguy

Originally posted by: blahblah99

Doh, it just dawned on me that no matter what, I'll end up sorting the entire array...

Click to expand...

To clarify... no. The people telling you to use SELECT are correct, which will give you a method that runs in roughly O(mn) time for m = "10" in your example and n = elements in the array.

Click to expand...

So, by your own logic, if m > log(n), you should sort

Naturally. I should have put "not necessarily" instead of "no", but it was clear to you, right?

mjia · Dec 22, 2004

If you deside to use something like quicksort, you can take a shortcut by throwing away the larger half as long as the small one has more then 10 elements. Actually, if you have some idea of how the numbers will be distributed, you can roughly pick pivots that will minimize the size of the smaller array so that it is close to 10 (probably safer to leave a fair margin so you can cut off most elements on the first go). I'm not really sure what the average time complexity is, but it will be better than O(nlogn).

oboeguy · Dec 22, 2004

Originally posted by: mjia
If you deside to use something like quicksort, you can take a shortcut by throwing away the larger half as long as the small one has more then 10 elements. Actually, if you have some idea of how the numbers will be distributed, you can roughly pick pivots that will minimize the size of the smaller array so that it is close to 10 (probably safer to leave a fair margin so you can cut off most elements on the first go). I'm not really sure what the average time complexity is, but it will be better than O(nlogn).

This is going back like 10 years for me, but isn't the "select" method essentially a one-sided quicksort?

Computer science people - question for you.

No Lifer

Diamond Member

Lifer

Golden Member

Senior member

Banned

Banned

Banned

Diamond Member

Banned

Platinum Member

Banned

Lifer

Platinum Member

Diamond Member

Lifer

Lifer

Lifer

Lifer

Lifer

Lifer

Lifer

Diamond Member

Member

Diamond Member