Algorithms, Searches, Problems

WildW

Senior member
Oct 3, 2008
984
20
81
evilpicard.com
Somehow I've managed to work as a programmer for the last 9 years without any "proper" training - i.e. I trained as an electronic engineer rather than doing a comp-sci course. I do alright, but I think I've missed out on some things.

Specifically, I see plenty of threads in here with people asking how best to solve various problems, and folks reply with things like "This is a classic knapsack problem," or "Use a tortoise vs hare algorithm."

I've never learned about any of these, but I would like to. It's embarrassing using bubble sorts all the time. Can anyone recommend a good resource on this kind of thing - book, website, whatever?
 

Cogman

Lifer
Sep 19, 2000
10,286
145
106
Holy crap, you use the bubble sort? Wow, selection sort is just as easy to implement and almost always faster (though it is still an O(n^2) algo.)
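For reference, here's a minimal selection sort sketch in Python (my own illustration, not from any particular book):

```python
def selection_sort(a):
    """In-place selection sort: O(n^2) comparisons, but only O(n) swaps."""
    n = len(a)
    for i in range(n):
        # Find the index of the smallest remaining element.
        smallest = i
        for j in range(i + 1, n):
            if a[j] < a[smallest]:
                smallest = j
        # At most one swap per pass.
        a[i], a[smallest] = a[smallest], a[i]
    return a

print(selection_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

Same idea as bubble sort (scan, compare, swap) but it moves each element at most once per pass, which is why it tends to win in practice.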

Two things you'll want to pick up: an algorithms book and a data-structures book. Both cover key concepts for programming. For algorithms I recommend http://www.amazon.com/Introduction-D.../dp/0201743957 .
For data structures, I never really had a good book, so someone else might have a better recommendation.

The other thing that will be helpful is just reading the following Wikipedia articles (the wiki really isn't all that bad for CS stuff).

Essentials
http://en.wikipedia.org/wiki/Big_o_notation
http://en.wikipedia.org/wiki/Master_theorem
http://en.wikipedia.org/wiki/Time_complexity

Algorithm design techniques
http://en.wikipedia.org/wiki/Divide_and_conquer_algorithm
http://en.wikipedia.org/wiki/Dynamic_programming
http://en.wikipedia.org/wiki/Greedy_algorithm
http://en.wikipedia.org/wiki/Iterative_method
http://en.wikipedia.org/wiki/Brute_force_method // You should know when you are doing it
http://en.wikipedia.org/wiki/Breadth-first_search
http://en.wikipedia.org/wiki/Depth-first_search

And specific algorithms that are basically taught everywhere
http://en.wikipedia.org/wiki/Dijkstra's_algorithm
http://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm
http://en.wikipedia.org/wiki/Heap_sort
http://en.wikipedia.org/wiki/Merge_sort
http://en.wikipedia.org/wiki/Quick_sort
http://en.wikipedia.org/wiki/Insertion_sort
http://en.wikipedia.org/wiki/Selection_sort
http://en.wikipedia.org/wiki/Binary_search

Specific data structures that are important to know about
http://en.wikipedia.org/wiki/Hash_table
http://en.wikipedia.org/wiki/Linked_list
http://en.wikipedia.org/wiki/Stack_(data_structure)
http://en.wikipedia.org/wiki/Queue_(data_structure)
http://en.wikipedia.org/wiki/Graph_(data_structure)
http://en.wikipedia.org/wiki/Tree_(data_structure)
http://en.wikipedia.org/wiki/Binary_tree
http://en.wikipedia.org/wiki/Red_black_tree
http://en.wikipedia.org/wiki/AVL_Tree

And some specific NP-complete (NPC) problems that everyone knows about (OK, halting isn't NPC - it's undecidable)
http://en.wikipedia.org/wiki/Travelling_salesman_problem
http://en.wikipedia.org/wiki/Halting_problem <- Unsolvable, so don't try :p
http://en.wikipedia.org/wiki/Knapsack_problem
http://en.wikipedia.org/wiki/Hamiltonian_path_problem
http://en.wikipedia.org/wiki/Directed_Hamiltonian_circuit_problem
http://en.wikipedia.org/wiki/Steiner_tree
(these are important to know because sometimes you just need to know when it is impossible to solve something directly without "guessing")

These are more or less the basics. Each article has tons of links into other aspects of these theories that you should feel free to browse. However, IMO getting these down is somewhat important.
 

C1

Platinum Member
Feb 21, 2008
2,391
113
106
Throw in :

"Operations Research for Immediate Application - A Quick & Dirty Manual"
by Robert E. D. Woolsey and Huntington Swanson (Harper & Row Publishing)

This book features many standard/classic algorithms and presents the needed math with explanations and tutorials, along with sample usable program code. Topic areas include:

- Production Scheduling (eg, Minimize Total Processing Time, N Jobs on M Machines)
- Inventory Control
- Capital Budgeting
- Markov Chains
- Out-of-Kilter Network
- Economic Analysis
- PERT-CPM
- Odds & Ends (Minimal Spanning Tree, Lumpy Demand, Work Load Balancing, Warehouse Location-Grange's Method, etc.)
- Academic Quick & Dirtys (Max Flow thru a Capacitated Network, Shortest Path from Node 1 to All Others, Max Cut-Min Flow, Slippery Algorithm, etc.)

Looks like there are at least 50-60 major methodologies provided.

Good Luck!

PS: I got my Operations Research MS Degree (IE Department) from University of Wisconsin - Madison, but it was a long time ago.
 

Schadenfroh

Elite Member
Mar 8, 2003
38,416
4
0
Both my regular 400-level undergraduate algorithms class and my 500-level graduate advanced algorithms class used Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein. The latter class was a living hell and caused much anguish and gnashing of teeth. My undergrad algorithms class was taught by a professor with a PhD in CS from UC Irvine and the graduate one was taught by a professor with a PhD in CS from MIT, so I would like to think that they picked the best book possible, but then again textbooks are the pharmaceuticals of academia.

Cogman's wiki pages provide a good foundation if you do not want to read that textbook. You might want to audit an algorithms class at a local university; I had to spend many, many hours speaking with the TA before some of the more advanced topics and more difficult problems presented in the textbook started to make sense.
 

Crusty

Lifer
Sep 30, 2001
12,684
2
81
Both my regular 400-level undergraduate algorithms class and my 500-level graduate advanced algorithms class used Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein. The latter class was a living hell and caused much anguish and gnashing of teeth. My undergrad algorithms class was taught by a professor with a PhD in CS from UC Irvine and the graduate one was taught by a professor with a PhD in CS from MIT, so I would like to think that they picked the best book possible, but then again textbooks are the pharmaceuticals of academia.

Cogman's wiki pages provide a good foundation if you do not want to read that textbook. You might want to audit an algorithms class at a local university; I had to spend many, many hours speaking with the TA before some of the more advanced topics and more difficult problems presented in the textbook started to make sense.

My algorithms class at UT was taught from it, very thorough and useful book.
 

eLiu

Diamond Member
Jun 4, 2001
6,407
1
0
My algorithms class at UT was taught from it, very thorough and useful book.

This is the latest version of that text... at my school we call it "CLRS":
http://www.amazon.com/Introduction-A...0007631&amp;sr=1-1

The second edition is more common & can probably be had for a good bit cheaper. It's missing a new section on multicore/parallel computing... but that section is very, very introductory so I wouldn't worry about it, lol. I think the pseudo-code was substantially improved between the 2nd and 3rd editions too; that might be more of a concern.

That's the book that I learned out of. My only complaint is that the book is soooo wordy. (You can thank Leiserson for that.) But it is clear and generally does a fine job explaining a broad set of introductory algorithms topics. (One exception: the wiki article on Red-Black trees is more clear than CLRS on that topic, haha.)

Also, I would take a look at MIT's "OpenCourseWare" pages. You'll be able to see lecture notes, assignments (usually with solutions), etc. for a number of topics... good learning tool.
http://ocw.mit.edu/courses/electric...ntroduction-to-algorithms-sma-5503-fall-2005/
^^Undergraduate algorithms. A run-of-the-mill undergrad algorithms class.

http://ocw.mit.edu/courses/electric...science/6-854j-advanced-algorithms-fall-2005/
^^Graduate algorithms. Quite a bit more challenging & the topics are more interesting too. And by quite a bit, I mean infinity-billion times more. (Can try http://courses.csail.mit.edu/6.854/current/ too. Only issue is once the next class starts, lots of materials will get taken off the page.)

http://ocw.mit.edu/courses/electric...ience/6-856j-randomized-algorithms-fall-2002/
^^Randomized algorithms. I haven't taken this yet but I want to.

http://courses.csail.mit.edu/6.851/
^Graduate data structures. Has some pretty cool stuff. Some practical, some not so much.

Edit: I should add that using bubblesort isn't embarrassing unless your data sets are larger than 20-30 numbers or smaller than 6-7 :) (Larger and the divide & conquer methods start winning. Smaller and you're better off coding up all the possibilities manually.) Though as cogman said, selection sort is typically faster without being any more complicated--"faster" as measured from a theoretical POV and a practical one.

For an even better sorting algorithm... http://en.wikipedia.org/wiki/Bogosort :D

And if you haven't looked up "tortoise vs hare" yet, (that isn't the actual name of the algorithm... I forgot who it's named after) try to solve this problem & you'll have discovered it for yourself :)

Suppose I give you a linked list (or more generally some representation of a directed graph); say each node contains an integer. You cannot modify the linked list, but you can step through the list nodes & read the integer they contain. The list is potentially extremely large, and I'm not telling you how large it is.

The problem: how quickly can you tell me if the linked list has a cycle? (Cycle = a list node points to a previous node or to itself, so that by always looking at node.next, you'll never reach "the end" of the list; i.e., tell me if the linked list has a last element.) And "quickly" means: how long will it take in terms of the number of nodes you read & how much storage will you require (storing 1 number is 1 unit of space).

Hint: the name "tortoise vs hare" has a lot to do with the answer!
 
Last edited:

WildW

Senior member
Oct 3, 2008
984
20
81
evilpicard.com
Suppose I give you a linked list (or more generally some representation of a directed graph); say each node contains an integer. You cannot modify the linked list, but you can step through the list nodes & read the integer they contain. The list is potentially extremely large, and I'm not telling you how large it is.

The problem: how quickly can you tell me if the linked list has a cycle? (Cycle = a list node points to a previous node or to itself, so that by always looking at node.next, you'll never reach "the end" of the list; i.e., tell me if the linked list has a last element.) And "quickly" means: how long will it take in terms of the number of nodes you read & how much storage will you require (storing 1 number is 1 unit of space).

Hint: the name "tortoise vs hare" has a lot to do with the answer!

Logic tells me that you can never know for sure that you've reached the end. It may appear to cycle a bazillion times and then be different. . . it depends on what level of confidence you want.

Edit: Given some time to work on this, my subconscious has come up with this suggestion. The idiot hare runs off through the list examining values. The tortoise just stores a pointer to the first item we looked at and waits for the address to come around again. Another victory for the This pointer.

So yes, the bubble sort thing. . . I have used them a few times, but not often. Mostly my programming tasks are nothing "clever" that would require a sort at all. More often "filter these 5000 data files" or "make the explosion look prettier and follow this distribution." I've never worked with large data sets which is probably why I've never come across most of this stuff. But I would like to be a better programmer =)
 
Last edited:

Cogman

Lifer
Sep 19, 2000
10,286
145
106
Edit: I should add that using bubblesort isn't embarrassing unless your data sets are larger than 20-30 numbers or smaller than 6-7 :) (Larger and the divide & conquer methods start winning. Smaller and you're better off coding up all the possibilities manually.) Though as cogman said, selection sort is typically faster without being any more complicated--"faster" as measured from a theoretical POV and a practical one.

Selection sort is always faster than bubble sort, theoretically and practically. Selection sorts are very good with the cache and do far fewer swaps. That, and the code is really not much harder to write than that of a bubble sort (I would argue it is easier).

For small datasets (ones that fit in cache), selection sort is pretty much unbeatable. For a really fast sort, a common trick is to use quicksort while the subarrays being sorted are big, then switch to selection sort when they get small. Makes for a hard-to-beat sort.
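That hybrid trick might look something like this rough Python sketch. The cutoff value of 16 is a guess (real libraries tune it, and most of them actually switch to insertion sort rather than selection sort for the small case):

```python
import random

CUTOFF = 16  # below this size, the quadratic sort wins on constant factors (a guess)

def selection_sort_range(a, lo, hi):
    """Selection sort on a[lo..hi], the simple sort for the small case."""
    for i in range(lo, hi + 1):
        m = i
        for j in range(i + 1, hi + 1):
            if a[j] < a[m]:
                m = j
        a[i], a[m] = a[m], a[i]

def hybrid_quicksort(a, lo=0, hi=None):
    """Quicksort on large ranges, switching to selection sort on small ones."""
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= CUTOFF:
        selection_sort_range(a, lo, hi)
        return
    p = a[random.randint(lo, hi)]  # random pivot dodges the sorted-input worst case
    i, j = lo, hi
    while i <= j:  # Hoare-style partition around p
        while a[i] < p:
            i += 1
        while a[j] > p:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    hybrid_quicksort(a, lo, j)
    hybrid_quicksort(a, i, hi)
```

Usage: `hybrid_quicksort(my_list)` sorts the list in place.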

:) The bubble sort is unholy; it is about the slowest REAL sort you can write (bogosorts don't count). When you have an alternative that is faster, and just as simple to write and understand, why not use it all the time?

Just my two bits, and my crusade to eliminate the bubblesort from any sort of application.:awe:
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,695
4,658
75

Cogman

Lifer
Sep 19, 2000
10,286
145
106

eLiu

Diamond Member
Jun 4, 2001
6,407
1
0
Selection sort is always faster than bubble sort, theoretically and practically. Selection sorts are very good with the cache and do far fewer swaps. That, and the code is really not much harder to write than that of a bubble sort (I would argue it is easier).

For small datasets (ones that fit in cache), selection sort is pretty much unbeatable. For a really fast sort, a common trick is to use quicksort while the subarrays being sorted are big, then switch to selection sort when they get small. Makes for a hard-to-beat sort.

:) The bubble sort is unholy; it is about the slowest REAL sort you can write (bogosorts don't count). When you have an alternative that is faster, and just as simple to write and understand, why not use it all the time?

Just my two bits, and my crusade to eliminate the bubblesort from any sort of application.:awe:

There are times when they would be equally slow--e.g., N,1,2,3,4,...,N-1. Though I guess I should have just said selection is always faster; I approve of your crusade against bubble sort :D

WildW: Consider this counter-example. Say my list is:
1->2, 2->2. (Node1.next = Node2. Node2.next = Node2)

So your tortoise pointer will be sitting at Node1. The hare will be running around the Node2->Node2 cycle until the end of time, never seeing the tortoise again. So in this case, your algorithm would actually never terminate.

As for your first remark, it is always possible to deterministically know whether a cycle exists. The answer doesn't depend on how 'confident' you want to be :)
 

WildW

Senior member
Oct 3, 2008
984
20
81
evilpicard.com
WildW: Consider this counter-example. Say my list is:
1->2, 2->2. (Node1.next = Node2. Node2.next = Node2)

So your tortoise pointer will be sitting at Node1. The hare will be running around the Node2->Node2 cycle until the end of time, never seeing the tortoise again. So in this case, your algorithm would actually never terminate.

As for your first remark, it is always possible to deterministically know whether a cycle exists. The answer doesn't depend on how 'confident' you want to be :)

Well yes, if we assume the cycle happens in a "nice" circular way, we can just wait for the first address to come around again. If the list can do mental things like loop back to halfway inside itself, then you'd check each new address against every one seen so far. I was actually expecting someone to say "Hey, you can't assume you're using a language where you can grab pointers to stuff." :p

As far as deterministically deciding whether a loop exists just from looking at the values, I don't follow. Even if you see the whole pattern you've seen so far repeat in its entirety, it might not be a true loop. What are you getting at?
 

eLiu

Diamond Member
Jun 4, 2001
6,407
1
0
Well yes, if we assume the cycle happens in a "nice" circular way, we can just wait for the first address to come around again. If the list can do mental things like loop back to halfway inside itself, then you'd check each new address against every one seen so far. I was actually expecting someone to say "Hey, you can't assume you're using a language where you can grab pointers to stuff." :p

As far as deterministically deciding whether a loop exists just from looking at the values, I don't follow. Even if you see the whole pattern you've seen so far repeat in its entirety, it might not be a true loop. What are you getting at?

So this could require a lot of extra space: in the worst case, you have to store a duplicate copy of the entire linked list. So if there are N nodes, you require O(N) extra storage. Also to "compare with every one seen so far" costs O(N^2) time (= 1 + 2 + 3 + ... + N).

Storing every visited node & comparing against your "history" is most people's first guess on this problem (it's what I thought of first too). But you can do better: less than linear storage, less than quadratic time.


As for your second comment, I was unclear earlier. I meant to say that each node holds a *unique* integer, lol. That is meant to get around any computational model (non-pointer machine) or programming language assumptions about whether you can grab memory addresses.
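(Spoiler for the "you can do better" claim above: the hint points at Floyd's cycle-finding algorithm, the tortoise and hare. A minimal Python sketch, using eLiu's 1->2, 2->2 counter-example; this is an illustration, not code from the thread:)

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

def has_cycle(head):
    """Floyd's tortoise-and-hare: O(n) time, O(1) extra storage."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next           # tortoise: one step
        fast = fast.next.next      # hare: two steps
        if slow is fast:           # they can only meet inside a cycle
            return True
    return False                   # hare fell off the end: no cycle

# The counter-example: 1 -> 2, 2 -> 2
n1, n2 = Node(1), Node(2)
n1.next = n2
n2.next = n2
print(has_cycle(n1))  # True
```

Once inside the cycle, the hare gains one node on the tortoise per step, so they must meet; no history of visited nodes is ever stored.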
 

WildW

Senior member
Oct 3, 2008
984
20
81
evilpicard.com
As for your second comment, I was unclear earlier. I meant to say that each node holds a *unique* integer, lol. That is meant to get around any computational model (non-pointer machine) or programming language assumptions about whether you can grab memory addresses.

Ahh, that makes more sense, now I can see things getting potentially more interesting.

This is the moment I would consider implementing some crazy maths, and then not bother. Unless code is performance critical I tend to err on the side of code readability for some future programmer - which is usually myself 6 months down the line thinking "what is this idiot doing?" and then noticing my name on the code.

But now I shall read and learn at least.