Algorithms, Searches, Problems

WildW

Senior member
Oct 3, 2008
984
20
81
evilpicard.com
Somehow I've managed to work as a programmer for the last 9 years without any "proper" training - i.e. I trained as an electronic engineer rather than doing a comp-sci course. I do alright, but I think I've missed out on some things.

Specifically, I see plenty of threads in here with people asking how best to solve various problems, and folks reply with things like "This is a classic knapsack problem," or "Use a tortoise vs hare algorithm."

I've never learned about any of these, but I would like to. It's embarrassing using bubble sorts all the time. Can anyone recommend a good resource on this kind of thing - book, website, whatever?
 

Cogman

Lifer
Sep 19, 2000
10,286
145
106
Holy crap, you use the bubble sort? Wow, selection sort is just as easy to implement and almost always faster (though it is still an O(n^2) algo.)
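For reference, here's a minimal selection sort sketch in Python (my own illustration, not from any particular book):

```python
def selection_sort(a):
    """In-place selection sort: O(n^2) comparisons, but only O(n) swaps."""
    n = len(a)
    for i in range(n):
        # Find the index of the smallest remaining element.
        smallest = i
        for j in range(i + 1, n):
            if a[j] < a[smallest]:
                smallest = j
        # At most one swap per pass.
        a[i], a[smallest] = a[smallest], a[i]
    return a

print(selection_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

Same idea as bubble sort (scan, compare, swap) but it moves each element at most once per pass, which is why it tends to win in practice.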

Two things you'll want to pick up: an algorithms book and a data-structures book. Both cover key concepts for programming. For algorithms I recommend http://www.amazon.com/Introduction-D.../dp/0201743957 .
For data structures, I never really had a good book, so someone else might have a better recommendation.

The other thing that will be helpful is just reading the following Wikipedia articles (the wiki really isn't all that bad for CS stuff).

Essentials
http://en.wikipedia.org/wiki/Big_o_notation
http://en.wikipedia.org/wiki/Master_theorem
http://en.wikipedia.org/wiki/Time_complexity

Algorithm design techniques
http://en.wikipedia.org/wiki/Divide_and_conquer_algorithm
http://en.wikipedia.org/wiki/Dynamic_programming
http://en.wikipedia.org/wiki/Greedy_algorithm
http://en.wikipedia.org/wiki/Iterative_method
http://en.wikipedia.org/wiki/Brute_force_method // You should know when you are doing it
http://en.wikipedia.org/wiki/Breadth-first_search
http://en.wikipedia.org/wiki/Depth-first_search

And specific algorithms that are basically taught everywhere
http://en.wikipedia.org/wiki/Dijkstra's_algorithm
http://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm
http://en.wikipedia.org/wiki/Heap_sort
http://en.wikipedia.org/wiki/Merge_sort
http://en.wikipedia.org/wiki/Quick_sort
http://en.wikipedia.org/wiki/Insertion_sort
http://en.wikipedia.org/wiki/Selection_sort
http://en.wikipedia.org/wiki/Binary_search

Specific data structures that are important to know about
http://en.wikipedia.org/wiki/Hash_table
http://en.wikipedia.org/wiki/Linked_list
http://en.wikipedia.org/wiki/Stack_(data_structure)
http://en.wikipedia.org/wiki/Queue_(data_structure)
http://en.wikipedia.org/wiki/Graph_(data_structure)
http://en.wikipedia.org/wiki/Tree_(data_structure)
http://en.wikipedia.org/wiki/Binary_tree
http://en.wikipedia.org/wiki/Red_black_tree
http://en.wikipedia.org/wiki/AVL_Tree

And some specific NP-complete (NPC) problems that everyone knows about (OK, halting isn't NPC - it's undecidable)
http://en.wikipedia.org/wiki/Travelling_salesman_problem
http://en.wikipedia.org/wiki/Halting_problem <- Unsolvable, so don't try :p
http://en.wikipedia.org/wiki/Knapsack_problem
http://en.wikipedia.org/wiki/Hamiltonian_path_problem
http://en.wikipedia.org/wiki/Directed_Hamiltonian_circuit_problem
http://en.wikipedia.org/wiki/Steiner_tree
(these are important to know because sometimes you just need to know when it is impossible to solve something directly without "guessing")

These are more or less the basics. Each article has tons of links into other aspects of these theories that you should feel free to browse. However, IMO getting these down is somewhat important.
 

C1

Platinum Member
Feb 21, 2008
2,391
113
106
Throw in :

"Operations Research for Immediate Application - A Quick & Dirty Manual"
by Robert E. D. Woolsey and Huntington Swanson (Harper & Row Publishing)

This book features many standard/classic algorithms and presents the needed math with explanations and tutorials, along with sample usable program code. Topic areas include:

- Production Scheduling (eg, Minimize Total Processing Time, N Jobs on M Machines)
- Inventory Control
- Capital Budgeting
- Markov Chains
- Out-of-Kilter Network
- Economic Analysis
- PERT-CPM
- Odds & Ends (Minimal Spanning Tree, Lumpy Demand, Work Load Balancing, Warehouse Location-Grange's Method, etc.)
- Academic Quick & Dirtys (Max Flow thru a Capacitated Network, Shortest Path from Node 1 to All Others, Max Cut-Min Flow, Slippery Algorithm, etc.)

Looks like there are at least 50-60 major methodologies provided.

Good Luck!

PS: I got my Operations Research MS Degree (IE Department) from University of Wisconsin - Madison, but it was a long time ago.
 

Schadenfroh

Elite Member
Mar 8, 2003
38,416
4
0
Both my regular 400-level undergraduate algorithms class and my 500-level graduate advanced algorithms class used Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein. The latter class was a living hell and caused much anguish and gnashing of teeth. My undergrad algorithms class was taught by a professor with a PhD in CS from UC Irvine and the graduate one was taught by a professor with a PhD in CS from MIT, so I would like to think that they picked the best book possible, but then again textbooks are the pharmaceuticals of academia.

Cogman's wiki pages provide a good foundation if you do not want to read that textbook. You might want to audit an algorithms class at a local university; I had to spend many, many hours speaking with the TA before some of the more advanced topics and more difficult problems presented in the textbook started to make sense.
 

Crusty

Lifer
Sep 30, 2001
12,684
2
81
Both my regular 400-level undergraduate algorithms class and my 500-level graduate advanced algorithms class used Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein. The latter class was a living hell and caused much anguish and gnashing of teeth. My undergrad algorithms class was taught by a professor with a PhD in CS from UC Irvine and the graduate one was taught by a professor with a PhD in CS from MIT, so I would like to think that they picked the best book possible, but then again textbooks are the pharmaceuticals of academia.

Cogman's wiki pages provide a good foundation if you do not want to read that textbook. You might want to audit an algorithms class at a local university; I had to spend many, many hours speaking with the TA before some of the more advanced topics and more difficult problems presented in the textbook started to make sense.

My algorithms class at UT was taught from it, very thorough and useful book.
 

eLiu

Diamond Member
Jun 4, 2001
6,407
1
0
My algorithms class at UT was taught from it, very thorough and useful book.

This is the latest version of that text... at my school we call it "CLRS":
http://www.amazon.com/Introduction-A...0007631&amp;sr=1-1

The second edition is more common & can probably be had for a good bit cheaper. It's missing a new section on multicore/parallel computing... but that section is very, very introductory so I wouldn't worry about it, lol. I think the pseudo-code was substantially improved between the 2nd and 3rd editions too; that might be more of a concern.

That's the book that I learned out of. My only complaint is that the book is soooo wordy. (You can thank Leiserson for that.) But it is clear and generally does a fine job explaining a broad set of introductory algorithms topics. (One exception: the wiki article on Red-Black trees is more clear than CLRS on that topic, haha.)

Also, I would take a look at MIT's "OpenCourseWare" pages. You'll be able to see lecture notes, assignments (usually with solutions), etc. for a number of topics... good learning tool.
http://ocw.mit.edu/courses/electric...ntroduction-to-algorithms-sma-5503-fall-2005/
^^Undergraduate algorithms. A run-of-the-mill undergrad algorithms class.

http://ocw.mit.edu/courses/electric...science/6-854j-advanced-algorithms-fall-2005/
^^Graduate algorithms. Quite a bit more challenging & the topics are more interesting too. And by quite a bit, I mean infinity-billion times more. (Can try http://courses.csail.mit.edu/6.854/current/ too. Only issue is once the next class starts, lots of materials will get taken off the page.)

http://ocw.mit.edu/courses/electric...ience/6-856j-randomized-algorithms-fall-2002/
^^Randomized algorithms. I haven't taken this yet but I want to.

http://courses.csail.mit.edu/6.851/
^Graduate data structures. Has some pretty cool stuff. Some practical, some not so much.

Edit: I should add that using bubblesort isn't embarrassing unless your data sets are larger than 20-30 numbers or smaller than 6-7 :) (Larger and the divide & conquer methods start winning. Smaller and you're better off coding up all the possibilities manually.) Though as cogman said, selection sort is typically faster without being any more complicated--"faster" as measured from a theoretical POV and a practical one.

For an even better sorting algorithm... http://en.wikipedia.org/wiki/Bogosort :D

And if you haven't looked up "tortoise vs hare" yet, (that isn't the actual name of the algorithm... I forgot who it's named after) try to solve this problem & you'll have discovered it for yourself :)

Suppose I give you a linked list (or more generally some representation of a directed graph); say each node contains an integer. You cannot modify the linked list, but you can step through the list nodes & read the integer they contain. The list is potentially extremely large, and I'm not telling you how large it is.

The problem: how quickly can you tell me if the linked list has a cycle? (Cycle = a list node points to a previous node or to itself, so that by always looking at node.next, you'll never reach "the end" of the list; i.e., tell me if the linked list has a last element.) And "quickly" means: how long will it take in terms of the number of nodes you read & how much storage will you require (storing 1 number is 1 unit of space).

Hint: the name "tortoise vs hare" has a lot to do with the answer!
 
Last edited:

WildW

Senior member
Oct 3, 2008
984
20
81
evilpicard.com
Suppose I give you a linked list (or more generally some representation of a directed graph); say each node contains an integer. You cannot modify the linked list, but you can step through the list nodes & read the integer they contain. The list is potentially extremely large, and I'm not telling you how large it is.

The problem: how quickly can you tell me if the linked list has a cycle? (Cycle = a list node points to a previous node or to itself, so that by always looking at node.next, you'll never reach "the end" of the list; i.e., tell me if the linked list has a last element.) And "quickly" means: how long will it take in terms of the number of nodes you read & how much storage will you require (storing 1 number is 1 unit of space).

Hint: the name "tortoise vs hare" has a lot to do with the answer!

Logic tells me that you can never know for sure that you've reached the end. It may appear to cycle a bazillion times and then be different. . . it depends on what level of confidence you want.

Edit: Given some time to work on this, my subconscious has come up with this suggestion. The idiot hare runs off through the list examining values. The tortoise just stores a pointer to the first item we looked at and waits for the address to come around again. Another victory for the This pointer.

So yes, the bubble sort thing. . . I have used them a few times, but not often. Mostly my programming tasks are nothing "clever" that would require a sort at all. More often "filter these 5000 data files" or "make the explosion look prettier and follow this distribution." I've never worked with large data sets which is probably why I've never come across most of this stuff. But I would like to be a better programmer =)
 
Last edited:

Cogman

Lifer
Sep 19, 2000
10,286
145
106
Edit: I should add that using bubblesort isn't embarrassing unless your data sets are larger than 20-30 numbers or smaller than 6-7 :) (Larger and the divide & conquer methods start winning. Smaller and you're better off coding up all the possibilities manually.) Though as cogman said, selection sort is typically faster without being any more complicated--"faster" as measured from a theoretical POV and a practical one.

Selection sort is always faster than bubble sort, theoretically and practically. Selection sorts are very good with the cache and do far fewer swaps. That, and the code is really not much harder to write than that of a bubble sort (I would argue it is easier).

For small datasets (ones that fit in cache), selection sort is pretty much unbeatable. For a really fast sort, a common trick is to use quicksort while the subarrays being sorted are big, then switch to selection sort when they get small. Makes for a hard-to-beat sort.
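That hybrid trick might look something like this rough Python sketch. The cutoff value of 16 is a guess (real libraries tune it, and most of them actually switch to insertion sort rather than selection sort for the small case):

```python
import random

CUTOFF = 16  # below this size, the quadratic sort wins on constant factors (a guess)

def selection_sort_range(a, lo, hi):
    """Selection sort on a[lo..hi], the simple sort for the small case."""
    for i in range(lo, hi + 1):
        m = i
        for j in range(i + 1, hi + 1):
            if a[j] < a[m]:
                m = j
        a[i], a[m] = a[m], a[i]

def hybrid_quicksort(a, lo=0, hi=None):
    """Quicksort on large ranges, switching to selection sort on small ones."""
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= CUTOFF:
        selection_sort_range(a, lo, hi)
        return
    p = a[random.randint(lo, hi)]  # random pivot dodges the sorted-input worst case
    i, j = lo, hi
    while i <= j:  # Hoare-style partition around p
        while a[i] < p:
            i += 1
        while a[j] > p:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    hybrid_quicksort(a, lo, j)
    hybrid_quicksort(a, i, hi)
```

Usage: `hybrid_quicksort(my_list)` sorts the list in place.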

:) The bubble sort is unholy; it is about the slowest REAL sort you can write (bogosorts don't count). When you have an alternative that is faster, and just as simple to write and understand, why not use it all the time?

Just my two bits, and my crusade to eliminate the bubblesort from any sort of application.:awe:
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,695
4,658
75

Cogman

Lifer
Sep 19, 2000
10,286
145
106

eLiu

Diamond Member
Jun 4, 2001
6,407
1
0
Selection sort is always faster than bubble sort, theoretically and practically. Selection sorts are very good with the cache and do far fewer swaps. That, and the code is really not much harder to write than that of a bubble sort (I would argue it is easier).

For small datasets (ones that fit in cache), selection sort is pretty much unbeatable. For a really fast sort, a common trick is to use quicksort while the subarrays being sorted are big, then switch to selection sort when they get small. Makes for a hard-to-beat sort.

:) The bubble sort is unholy; it is about the slowest REAL sort you can write (bogosorts don't count). When you have an alternative that is faster, and just as simple to write and understand, why not use it all the time?

Just my two bits, and my crusade to eliminate the bubblesort from any sort of application.:awe:

There are times when they would be equally slow--e.g., N,1,2,3,4,...,N-1. Though I guess I should have just said selection is always faster; I approve of your crusade against bubble sort :D

WildW: Consider this counter-example. Say my list is:
1->2, 2->2. (Node1.next = Node2. Node2.next = Node2)

So your tortoise pointer will be sitting at Node1. The hare will be running around the Node2->Node2 cycle until the end of time, never seeing the tortoise again. So in this case, your algorithm would actually never terminate.

As for your first remark, it is always possible to deterministically know whether a cycle exists. The answer doesn't depend on how 'confident' you want to be :)
 

WildW

Senior member
Oct 3, 2008
984
20
81
evilpicard.com
WildW: Consider this counter-example. Say my list is:
1->2, 2->2. (Node1.next = Node2. Node2.next = Node2)

So your tortoise pointer will be sitting at Node1. The hare will be running around the Node2->Node2 cycle until the end of time, never seeing the tortoise again. So in this case, your algorithm would actually never terminate.

As for your first remark, it is always possible to deterministically know whether a cycle exists. The answer doesn't depend on how 'confident' you want to be :)

Well yes, if we assume the cycle happens in a "nice" circular way, we can just wait for the first address to come around again. If the list can do mental things like loop back to halfway inside itself, then you'd check each new address against every one seen so far. I was actually expecting someone to say "Hey, you can't assume you're using a language where you can grab pointers to stuff." :p

As far as deterministically deciding whether a loop exists just from looking at the values, I don't follow. Even if you see the whole pattern you've seen so far repeat in its entirety, it might not be a true loop. What are you getting at?
 

eLiu

Diamond Member
Jun 4, 2001
6,407
1
0
Well yes, if we assume the cycle happens in a "nice" circular way, we can just wait for the first address to come around again. If the list can do mental things like loop back to halfway inside itself, then you'd check each new address against every one seen so far. I was actually expecting someone to say "Hey, you can't assume you're using a language where you can grab pointers to stuff." :p

As far as deterministically deciding whether a loop exists just from looking at the values, I don't follow. Even if you see the whole pattern you've seen so far repeat in its entirety, it might not be a true loop. What are you getting at?

So this could require a lot of extra space: in the worst case, you have to store a duplicate copy of the entire linked list. So if there are N nodes, you require O(N) extra storage. Also to "compare with every one seen so far" costs O(N^2) time (= 1 + 2 + 3 + ... + N).

Storing every visited node & comparing against your "history" is most people's first guess on this problem (it's what I thought of first too). But you can do better: less than linear storage, less than quadratic time.


As for your second comment, I was unclear earlier. I meant to say that each node holds a *unique* integer, lol. That is meant to get around any computational model (non-pointer machine) or programming language assumptions about whether you can grab memory addresses.
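(Spoiler for the "you can do better" claim above: the hint points at Floyd's cycle-finding algorithm, the tortoise and hare. A minimal Python sketch, using eLiu's 1->2, 2->2 counter-example; this is an illustration, not code from the thread:)

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

def has_cycle(head):
    """Floyd's tortoise-and-hare: O(n) time, O(1) extra storage."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next           # tortoise: one step
        fast = fast.next.next      # hare: two steps
        if slow is fast:           # they can only meet inside a cycle
            return True
    return False                   # hare fell off the end: no cycle

# The counter-example: 1 -> 2, 2 -> 2
n1, n2 = Node(1), Node(2)
n1.next = n2
n2.next = n2
print(has_cycle(n1))  # True
```

Once inside the cycle, the hare gains one node on the tortoise per step, so they must meet; no history of visited nodes is ever stored.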
 

WildW

Senior member
Oct 3, 2008
984
20
81
evilpicard.com
As for your second comment, I was unclear earlier. I meant to say that each node holds a *unique* integer, lol. That is meant to get around any computational model (non-pointer machine) or programming language assumptions about whether you can grab memory addresses.

Ahh, that makes more sense, now I can see things getting potentially more interesting.

This is the moment I would consider implementing some crazy maths, and then not bother. Unless code is performance critical I tend to err on the side of code readability for some future programmer - which is usually myself 6 months down the line thinking "what is this idiot doing?" and then noticing my name on the code.

But now I shall read and learn at least.