I'm studying some practice problems for a programming interview and wanted to get some guidance from you guys. The question asks to write a method that will print out the path to the deepest leaf node in a binary tree. I've figured out how to do that, but wanted to make sure I'm computing the run-time of my algorithm correctly. From my observation, the runtime of this algorithm is O(n^2). It's n^2 because in the worst case (an unbalanced tree) we will visit every node, and for each node we must revisit (n-1), (n-2), (n-3), ... nodes again, recomputing their heights as we traverse down the tree. Is this correct, or am I missing something here?

Code:

void PrintStringToDeepestNode(Node *pRoot)
{
    if (!pRoot) {
        // Root is NULL, base case to break out of recursion
        return;
    }

    // Print current node value
    printf("%c\n", pRoot->c);

    // Determine which child subtree has the greater height, then recursively
    // traverse down that path. If both heights are equal, the right subtree
    // is automatically chosen
    if (ComputeHeight(pRoot->left) > ComputeHeight(pRoot->right)) {
        PrintStringToDeepestNode(pRoot->left);
    } else {
        PrintStringToDeepestNode(pRoot->right);
    }
}

int ComputeHeight(Node *pRoot)
{
    if (!pRoot) {
        // If root is NULL, return -1 to indicate no height
        return -1;
    } else if (!pRoot->left && !pRoot->right) {
        // If root has no children, return 0
        return 0;
    } else {
        // Determine node height as follows:
        //   height = 1 + height(max_child_subtree)
        return std::max(1 + ComputeHeight(pRoot->left),
                        1 + ComputeHeight(pRoot->right));
    }
}

> for each node we must revisit (n-1), (n-2), (n-3) nodes again and computing their heights as we traverse down the tree.

If you used a global variable, I think you could do this in O(N) by having each leaf node update the "best node found" with itself if it is at the end of a longer path.

I have a more optimized solution to this problem where I cache the node heights, but for now I'm concerned with characterizing the runtime of my unoptimized solution. Thanks.

The worst case for a tree is to be unbalanced to the point that it is effectively a linked list, with every node hanging off the left or off the right. Otherwise, at best, it is balanced and log2(N) high. So for an N-node, linked-list-style tree, ComputeHeight takes N steps and is called once per level, i.e. the height of the tree times. Thus we can conclude this is O(N^2). In the other case of a balanced tree, the height is log2(n). ComputeHeight will take time proportional to that, as will PrintStringToDeepestNode; both have performance determined by the height of the tree. Thus it is O(log2(n)^2). In conclusion: O(N^2) in the worst case, O(log2(n)^2) in the best case.
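The arithmetic-series count for the degenerate case is easy to check empirically. Here's a minimal sketch (the Node struct, the global counter, and the chain-building helper are my own, not from the thread) that counts how many nodes ComputeHeight touches while the OP's descent walks a right-leaning chain of n nodes:

```cpp
#include <algorithm>
#include <cstddef>

// Minimal node type for the experiment (assumed; not the OP's exact struct).
struct Node {
    char c;
    Node *left;
    Node *right;
};

static std::size_t g_visits = 0;  // nodes touched by ComputeHeight so far

int ComputeHeight(Node *pRoot) {
    if (!pRoot) return -1;
    ++g_visits;
    return 1 + std::max(ComputeHeight(pRoot->left), ComputeHeight(pRoot->right));
}

// Total ComputeHeight visits made by the OP's descent on a right-leaning
// chain of n nodes (the worst, linked-list-shaped case).
std::size_t VisitsOnChain(std::size_t n) {
    // Build the degenerate tree (leaked; fine for a throwaway measurement).
    Node *root = nullptr;
    Node **slot = &root;
    for (std::size_t i = 0; i < n; ++i) {
        *slot = new Node{'a', nullptr, nullptr};
        slot = &(*slot)->right;
    }
    g_visits = 0;
    // Mimic PrintStringToDeepestNode's control flow, printing omitted.
    for (Node *p = root; p; ) {
        p = (ComputeHeight(p->left) > ComputeHeight(p->right)) ? p->left
                                                               : p->right;
    }
    return g_visits;  // (n-1) + (n-2) + ... + 1 + 0 = n(n-1)/2
}
```

VisitsOnChain(4) gives 6 and VisitsOnChain(10) gives 45, exactly n(n-1)/2, so the descent really does pay a quadratic number of height-computation visits on the degenerate tree.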

Thanks! Now assuming I optimized my ComputeHeight() function to cache individual node heights in a hashtable as they are computed, would this effectively reduce the runtime down to O(n)?

No. Big-O describes end behavior for sufficiently large N. So if you set up a cache, I'll just set my theoretical N to an even bigger set, and then you're back to whatever your upper bound is. No implementation trick can lower the complexity of an algorithm; if it did, then it's not the same algorithm! The optimal algorithm is in fact O(N). It's quite simple. Something like this:

Code:

int GetTreeHeight(node n) {
    if (n == null) {
        return -1;
    } else {
        int left = GetTreeHeight(n.left) + 1;
        int right = GetTreeHeight(n.right) + 1;
        if (left > right) {
            return left;
        } else {
            return right;
        }
    }
}

You can think about it this way. Assume you get a "best case" traversal the first time, which just happens to be the longest traversal (height) of the tree. So let's say that for a tree of N nodes, the height is n. That leaves N - n nodes in the tree that have not been traversed. The rest of the nodes must be traversed in order to prove the original traversal was in fact the longest. The worst case is just the flip side of this.

Okay, yes. ... What? No. I mean yes, you could write an O(N^2) algorithm, but that is worst case for a non-optimal algorithm. Think about it; log(n)^2 = 2log(n) = O(log(n)). So you're saying you can exhaustively traverse a tree in O(log(n))? No. Not even close. Sorry. edit: woops, messed up my parenthesis, which made this statement completely wrong. My apologies! No. See my previous post. The optimal algorithm is best and worst case N, or Big-Theta(N).

Yes, there's a O(n) algorithm, which is the best you can do in the worst case, OP said that too, but the OP asked about his particular implementation from his first post. And that one is O(n^2). log(n^2) = 2log(n), but log(n)^2 is not equal to 2log(n). I didn't read how he got this bound, though I also have a feeling that even with a balanced tree, you still need O(n), how will you know it's balanced if you only inspected O(log(n)) nodes...

Sorry, you are both right: computing the height of a balanced tree is indeed O(N), which makes this algorithm's complexity O(n log2(n)) in the balanced case; the worst case remains O(n^2). There is no doubt in my mind that a better algorithm can be written to make this O(n) by storing the heights of the nodes on one pass and then utilising those cached heights to traverse. It's the repeated calls to ComputeHeight, redoing work already done, that make this algorithm suboptimal. The best case is actually O(log2(n)), but it requires a tree that knows the number of its children so it can be queried. That's not as ridiculous as it sounds: the derivation of red/black trees came from trying to work out balance based on child counts, and it turned out you don't need as much information as a full count to make balancing work.

edit: ah, sorry. I now read your parenthesis correctly and realized mine were wrong. Yea, I'd agree with that statement, but I don't agree with log(n)^2 as the upper bound. I agree with your following statement that you think it's n. My apologies.

Where are you getting this? No, I must disagree. You're confusing making a single traversal of a tree with proving that the traversal you made is the longest one. This is the problem: not only do you need to make a traversal (which yes, is best case log(N) and worst case N; no idea where you guys are getting N^2), but you must also show it is the global optimum. You must exhaustively traverse every path from the root to a leaf.

For any kind of binary tree, balanced or not, it will take a maximum of N iterations to perform the longest traversal. The algorithm must then backtrack (pop the recursion stack) to each unvisited branch; this takes N - 1 iterations if the root has one child, or N if it has two children, so the worst case is N for that part of the algorithm. So you have N + N = 2N iterations, which is O(N).

So this whole time, I've been talking about the OP's ComputeHeight algorithm. So now to talk about PrintStringToDeepestNode... As it's implemented, it's N^2, but the entire problem, "write a method that will print out the path to the deepest leaf node in a binary tree", can be done in O(N). Add another parameter to GetTreeHeight; let's call it pathStack. It'll represent a stack of strings. Add the deepest node to the stack, then in the calling function pop off the stack to print in order of traversal (the reverse of the insertion order).

Code:

void PrintStringToDeepestNode(node n) {
    Stack pathStack = new Stack();
    GetTreeHeightAndPath(n, pathStack);
    while (pathStack.Count > 0) {
        print(pathStack.pop() + " ");
    }
}

int GetTreeHeightAndPath(node n, Stack pathStack) {
    if (n == null) {
        return -1;
    } else {
        int left = GetTreeHeightAndPath(n.left, pathStack) + 1;
        int right = GetTreeHeightAndPath(n.right, pathStack) + 1;
        if (left > right) {
            pathStack.add(n.left.value);
            return left;
        } else {
            pathStack.add(n.right.value);
            return right;
        }
    }
}

GetTreeHeightAndPath is about 2N = O(N) as I showed in a previous post. Worst case, the traversal path of the height will be N nodes, so we get about 3N = O(N). So the entire algorithm is O(N).

I am listing the order of the algorithm presented, which is not the same algorithm as the one you just described. The OP has an algorithm that computes the height of the tree at every node, and while that height computation gets cheaper as it descends, it still has to traverse the entire remaining tree each time. So how many visits does ComputeHeight cost us overall? At each step it is the size of the remaining subtree, plus the next level's cost, which is one smaller, and so on down to zero.

Graphed, it looks like this (legend from the plot): red is 0.5 * N * N, green is N * N, blue is the actual cost, yellow is N. The actual cost is definitely not O(N). It also doesn't track N^2 directly. We can come up with a tighter amortised bound: ComputeHeight does about 0.5 * N visits on average and we call it N times, so 0.5 * N * N is much closer to what we see. It slightly over- and under-estimates at any given point, but on the whole it tracks the actual cost of visiting the nodes very well. So the algorithm's cost is well approximated by 0.5 * N * N => O(N^2), where we know the constant is actually 0.5.

O(N) is the best you could do on an unbalanced tree, and caching the heights of the nodes would convert the current algorithm to O(N). This is evident because you could calculate every node's height in O(N) with a single post-order traversal carrying a running height count; storing the results in a hashmap would give you amortised O(1) height lookups later, at the cost of an O(N) initial caching pass. You would then visit all the nodes, O(N), to print the path, but ComputeHeight would be amortised O(1) inside that pass because it uses the cached hashmap values.
However, given a tree that is maintained as balanced, with depths cached in each node, it would be possible to get the algorithm down to O(log2(N)). But it requires that the tree be balanced in its construction and carry child counts/height counts in each node, which then serve the dual purpose of feeding the balancing algorithm and of printing the path. It increases the space used by the tree by a word per node (if you use the height, a single byte would actually do, because a balanced tree of height 255 would be a very tall tree indeed, so a byte could easily hold the height of a balanced tree on any of today's modern computers). So I maintain that my analysis is correct: the original algorithm presented is O(N^2), the best algorithm on the existing tree is O(N), and the best case with a balanced tree is O(log2(N)).
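As a rough sketch of that idea (the HNode struct with its cached height field is hypothetical; a real implementation would have the balancing machinery keep height up to date, which is not shown), the descent itself becomes one comparison per level:

```cpp
#include <string>

// Hypothetical height-augmented node: `height` is assumed to be maintained
// by the tree's balancing code, as described above.
struct HNode {
    char c;
    int height;            // height of the subtree rooted here
    HNode *left;
    HNode *right;
};

static int HeightOf(const HNode *n) { return n ? n->height : -1; }

// With cached heights the descent is O(height), i.e. O(log2 N) on a
// balanced tree, with no height recomputation at all.
std::string PathToDeepestLeaf(const HNode *n) {
    std::string path;
    while (n) {
        path += n->c;
        n = (HeightOf(n->left) > HeightOf(n->right)) ? n->left : n->right;
    }
    return path;
}

// Hand-built demo tree with heights filled in manually:
//        a(2)
//       /    \
//     b(1)   e(0)
//     /
//   d(0)
std::string DemoPath() {
    static HNode d{'d', 0, nullptr, nullptr};
    static HNode e{'e', 0, nullptr, nullptr};
    static HNode b{'b', 1, &d, nullptr};
    static HNode a{'a', 2, &b, &e};
    return PathToDeepestLeaf(&a);
}
```

On the demo tree the descent visits only a, b, d: three nodes on a path of height 2, never touching the rest of the tree.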

Okay, we had a bit of a disconnect. The OP's algorithm for printing the path to the deepest node is O(N^2), so I agree there. But the complexity of his ComputeHeight function on its own is O(N) (actual cost roughly 2N), and that is regardless of whether the tree is balanced or not. The OP didn't realize that his path-printing algorithm was basically an unnecessarily nested duplicate of ComputeHeight, so I showed him an alternative that does less work.

No. The cost of traversing a terminal path from the root increases by 1 each time the height of the tree grows. Here's the flaw in your analysis: computing the height is not the same as traversing from the root to one particular leaf. Computing the height of a tree is basically just a specialized post-order traversal, which is O(N) regardless of the tree's balance. All nodes must be visited (once or twice, depending on what you count as a visit, but that doesn't matter since it's just a constant) to validate that the computed path length is the global maximum.

Your O(log(N)) argument is really stretching. That's one heck of a specialized data structure there. I wouldn't consider it a typical tree, which is what the OP is dealing with. But if you want to go into specialized data structures, we can recall the height in O(1), can't we?
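The claim that height computation visits every node exactly once, balanced or not, can be illustrated with a throwaway counter (the names and node struct here are mine, just for the experiment):

```cpp
#include <algorithm>
#include <cstddef>

// Bare-bones node for the experiment (values omitted, not needed here).
struct Node {
    Node *left;
    Node *right;
};

// Specialized post-order traversal, as described above: bumps `visits`
// exactly once per node while computing the height.
int Height(const Node *n, std::size_t &visits) {
    if (!n) return -1;
    ++visits;
    return 1 + std::max(Height(n->left, visits), Height(n->right, visits));
}

std::size_t VisitsOn(const Node *root) {
    std::size_t v = 0;
    Height(root, v);
    return v;
}

// A 3-node chain and a 3-node balanced tree cost the same 3 visits:
// the shape changes the height, not the traversal cost.
std::size_t DemoChainVisits() {
    static Node c{nullptr, nullptr};
    static Node b{nullptr, &c};
    static Node a{nullptr, &b};
    return VisitsOn(&a);
}

std::size_t DemoBalancedVisits() {
    static Node l{nullptr, nullptr};
    static Node r{nullptr, nullptr};
    static Node root{&l, &r};
    return VisitsOn(&root);
}
```

Both demo trees report 3 visits even though one has height 2 and the other height 1, which is the point: a single height computation is linear in the node count, not the height.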

It is actually how we came to red/black trees and the other less heavyweight trees. The initial thought was to cache the height at each node and use it to determine how to rotate and where. In the end it turned out we could get away with a lot less information than the height, so we did, and saved the byte. It's a historic point of interest in the progression of balanced-tree algorithms, which in this particular circumstance is worth knowing about because it provides an elegant solution. Knuth wrote about it in The Art of Computer Programming, so I thought it was worth mentioning. Were I needing to do this millions of times a second, I would probably write my own balanced tree with height as the balancer and as an internally available property. It would make a massive difference in performance on relatively modest sized trees, for what is quite likely a small percentage increase in memory usage. But it's pretty specialised, obviously; you won't find one of those in the libs!

I love you guys, especially you slugg. Made this all clear for me. I was being stupid in my original implementation. I had the optimal solution staring me in the face the entire time.

I see now where the disconnect was for me: I thought the task was to find an algorithm that works on any tree, for which the worst case is O(n) but the best case is also O(n); you need to go through at least N/2 nodes to know that you've found the longest path. But you're actually pairing a special type of tree *with* a specialized algorithm that assumes this special type and will not work on an arbitrary tree. In that case you can get O(log2(n)) without caching anything anywhere: take a left-balanced tree and just print out the path by always choosing the left node.

My optimized solution in two stages:

1.) Pre-compute node heights and store them in a hashtable using a simple depth-first search. Runtime = O(n).
2.) Traverse down the tree using the node heights stored in the hashtable as a guide. The worst-case traversal is a linked-list-like tree. Runtime = O(n).

Total runtime = stage 1 + stage 2 = O(n) + O(n) = O(2n) = O(n) once constant factors are removed.

Code:

int PreComputeHeights(Node *pRoot, std::unordered_map<Node *, int> &heights)
{
    if (!pRoot) {
        // If root is NULL, return -1 to indicate no height
        return -1;
    } else {
        int leftHeight = PreComputeHeights(pRoot->left, heights);
        int rightHeight = PreComputeHeights(pRoot->right, heights);
        int myHeight = 1 + std::max(leftHeight, rightHeight);
        heights.insert(std::pair<Node *, int>(pRoot, myHeight));
        return myHeight;
    }
}

void PrintStringToDeepestLeafNode(Node *pRoot)
{
    std::unordered_map<Node *, int> heights;
    PreComputeHeights(pRoot, heights);

    Node *curr = pRoot;
    while (curr) {
        // Print current node value
        printf("%c\n", curr->c);

        // Retrieve the children's heights, if they exist, from the hashtable
        // and traverse down the larger subtree
        int leftHeight = (curr->left) ? heights[curr->left] : -1;
        int rightHeight = (curr->right) ? heights[curr->right] : -1;
        if (leftHeight > rightHeight) {
            curr = curr->left;
        } else {
            curr = curr->right;
        }
    }
}

I don't often come up with such insights on algorithms; this problem just immediately struck me as a special case with a data structure that could solve it. In theory it would work on any balanced tree, so long as you knew where the furthest-away nodes must be; normally that's always down the right branch. Balanced trees are very common structures, because all the common operations can be implemented in log2(n) instead of n; keeping a tree balanced is actually kind of important for performance, and the expense is of the same order as the adjustment operations to begin with. The key is that you need the guarantee about the depth, either implied by the design of the algorithm and data structure, or carried directly as depth values on the nodes; either would do. So it's not quite as specific as we might think.

I view this special case as similar to sorting integers from a known range. You can sort such integers in O(n) time as long as the range fits within an array's bounds: go through the numbers once and add 1 to the i-th array position, then go through the array from beginning to end and output each index as many times as the count stored in that location. For a certain set of circumstances it's a much faster algorithm than a generic sort. It's not a generally useful sorting algorithm, though: it only works for integers, and only when the range of values is known, the top and bottom are close enough to keep memory usage low, AND the array isn't dramatically larger than the original list. That is a lot of conditions, but if you can craft the circumstances, you can go faster than the usual O(n log(n)).

There are many of these specific solutions that outperform the general case, in theory as well as in practice, but knowing them all takes a lifetime. Today is the second time reading The Art of Computer Programming has helped me with such a case.

I don't want to confuse things by introducing it; I am simply showing that O(n) is not necessarily the lower bound.
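The integer-sorting trick described above is usually called counting sort; here is a minimal sketch, assuming the values are known to lie in [lo, hi] (function name and signature are my own):

```cpp
#include <cstddef>
#include <vector>

// Counting sort for integers known to lie in [lo, hi]: one pass to tally,
// one pass over the range to emit, so O(n + range) time and O(range) space.
std::vector<int> CountingSort(const std::vector<int> &in, int lo, int hi) {
    std::vector<std::size_t> counts(static_cast<std::size_t>(hi - lo) + 1, 0);
    for (int v : in) {
        ++counts[static_cast<std::size_t>(v - lo)];  // "+1 to the ith position"
    }
    std::vector<int> out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < counts.size(); ++i) {
        // Emit each value as many times as it was seen, in ascending order.
        out.insert(out.end(), counts[i], lo + static_cast<int>(i));
    }
    return out;
}
```

Note the cost model matches the caveats above: the `counts` array is sized by the range, not by n, which is exactly why the range has to be small for this to win.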

Woops, yea, you're right. I didn't really pay too much attention to detail since it was pseudo-code, but I see what you mean. You'd have to fix the boundary conditions, or pass a new stack to each "left" and "right" branch and return the one with the greater height; that'll do it. Or whatever, really. Your hashtable, pre-computed method works fine too. Even with my craptacular code, I'm glad you're now comfortable with your algorithm and understand why it's O(n).
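For what it's worth, the "pass a new path container to each branch and keep the taller one" fix can be sketched in C++ like this (type and function names are mine; a string stands in for the stack of values, and std::move keeps the bookkeeping linear):

```cpp
#include <algorithm>
#include <string>
#include <utility>

// Assumed node shape, matching the OP's original struct.
struct Node {
    char c;
    Node *left = nullptr;
    Node *right = nullptr;
};

// One post-order pass: returns the height of n and leaves the deepest
// root-to-leaf path in `path`, deepest node first (root ends up last).
int HeightAndPath(const Node *n, std::string &path) {
    if (!n) return -1;
    std::string leftPath, rightPath;
    int left = HeightAndPath(n->left, leftPath);
    int right = HeightAndPath(n->right, rightPath);
    // Keep whichever child's path was taller; std::move avoids a copy.
    path = std::move(left > right ? leftPath : rightPath);
    path += n->c;
    return 1 + std::max(left, right);
}

// Reverse so the path reads root-first, as the problem asks.
std::string DeepestPath(const Node *root) {
    std::string path;
    HeightAndPath(root, path);
    return std::string(path.rbegin(), path.rend());
}

// Tiny demo tree:    a
//                   / \
//                  b   e
//                 /
//                d        deepest path is a -> b -> d
std::string DemoDeepestPath() {
    static Node d{'d'};
    static Node e{'e'};
    static Node b{'b', &d, nullptr};
    static Node a{'a', &b, &e};
    return DeepestPath(&a);
}
```

Giving each recursive call its own local strings sidesteps both earlier bugs: there is no shared stack collecting values from losing branches, and no child pointer is dereferenced before the recursion has confirmed it exists.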