AVL Tree Big-O

AVL Trees Big-O video (27 minutes) (Spring 2021)

General Idea

The Big-O of BST operations correlates with the height of the BST.

"What is the worst case (tallest) height for n nodes?"

This is a difficult question to answer. But, the inverse question can be asked:

"For some height, what is the smallest number of nodes that a tree of that height can have?"

This will give the worst case, or tallest, height for that number of nodes.

AVL Trees

To be an AVL tree, each node's subtree must have a left and right subtree that differ in height by no more than 1.

With that constraint in mind, we can plot out a few AVL trees, answering the question that we have posed:

"For some height, what is the smallest number of nodes that a tree of that height can have?"

height = 1 An AVL tree of height 1 can be built with a single node.

O

height = 2 An AVL tree of height 2 can be built with 2 nodes.

O
 \
  O

height = 2 An AVL tree of height 2 can be built with 2 nodes.

O
 \
  O

height = 3 An AVL tree of height 3 can be built with 4 nodes.

   O
  / \
 O   O
      \
       O

height = 4 An AVL tree of height 4 can be built with 7 nodes.

    O
   / \
  O   O
 /   / \
O   O   O
         \
          O

Identifying a Trend

"For some height, what is the smallest number of nodes that a tree of that height can have?"

We have started to collect data points to answer this question.

height  | Nodes for this height
------------------------------
    0   |  0
    1   |  1
    2   |  2
    3   |  4
    4   |  7

There is also a pattern to be seen in the graphs of these AVL trees. Look at the trees of height 4, 3, and 2.

The tree of height 4 consists of a root node, and a subtree of height 3, and a subtree of height 2. Both subtrees (of height 2 and 3), are the trees that we have graphed, that have the minimum number of nodes to attain this height.

An AVL tree of height h, consists of a root node, an AVL subtree of height h-1, and an AVL subtree of height h-2

Phrasing this as N(h), for, number of nodes for a given height, this can be termed as

N(h) = 1 + N(h - 1) + N(h - 2)

This is a recurrence relation! And it's a hard one to solve.

Solving it for several values:

h  | N(h)
---------
0  |  0
1  |  1
2  |  2
3  |  4
4  |  7
5  |  12
6  |  20
7  |  33
8  |  54

There are two things to notice about this trend.

  1. It is similar to Fibonacci sequence! Just offset by one. Remember, Fib(n) = Fib(n - 1) + Fib(n - 2).
  2. Looking at the table, there seems to be an exponential relation. As h increases by 1, N(h) "almost" doubles.

Similarly to the Fibonacci recursive function, the number of nodes for a given height grows by 1.6^h.

(The recurrence relation is expanded to prove/demonstrate this at 18:12 in the lecture video.)

So, n = 1.6^h

So, inversely, h = log(n).

The height of an AVL tree having n nodes is O(log(n)).

Thanks

Thanks to Brian Foster for writing up these notes based on the video lectures.