AVL Trees

Rotations
General Form of AVL Rotations

In this section, we examine Adelson-Velsky & Landis trees, commonly known as AVL-trees.¹ AVL-trees are height-balanced binary search trees. Accordingly, they are a subset of balanced binary trees:

Given this fact, we might guess that there are various ways of defining what it means to be "balanced." And in fact, we would guess correctly. The term balanced tree, without more, could mean many things. Do we mean balanced in terms of weight? Height? Depth? Order? Node categories?

For AVL-trees, we define "balanced" in terms of height. And to construct that definition, we need a way of quantifying balance. We do so by defining a height balance factor:

Definition: Height Balance Factor. Let ${n}$ be a node in some tree. The height balance factor of ${n}$ is defined as:
$\textit{\textsf{BF}}(n) = \textit{\textsf{H}}(\textit{\textsf{LST}}(n)) - \textit{\textsf{H}}(\textit{\textsf{RST}}(n))$
where:

${\textit{\textsf{BF}}(n)}$ is the height balance factor of the node ${n,}$

${\textit{\textsf{H}}(\textit{\textsf{LST}}(n))}$ is the height of the left-subtree of the node ${n,}$ and

${\textit{\textsf{H}}(\textit{\textsf{RST}}(n))}$ is the height of the right-subtree of the node ${n.}$

To understand this definition, it's important to recall the definitions for height:

Definition: Height of a Node. Let ${n}$ be a node in some rooted tree. The height of ${n,}$ denoted ${\textit{\textsf{H}}(n),}$ is the number of nodes, including the starting and end nodes, along the longest strictly downward path from ${n}$ to a leaf.

Note that there are two different definitions for the height of a node. An alternative definition is:

Definition: Height of a Node. Let ${x}$ be a node in some tree ${T.}$ The height of ${x,}$ denoted ${\textit{\textsf{H}}(x),}$ is the number of edges in the longest strictly-downward path from ${x}$ to a leaf.

The key difference between this definition and the one we will use is whether a leaf has a height of ${0}$ or ${1.}$ The existence of two differing definitions for node height is similar to the situation of natural numbers in mathematics—does the set of natural numbers include ${0?}$ Some say yes, others no.

In response to this debate, we take the pragmatic approach. In some situations, it's more convenient to define the height of a node as the number of edges along the path, since this implies that a balanced binary tree of height ${H}$ has ${2^H}$ leaves. If we defined it as the number of nodes along the path (number of edges plus ${1}$ ), we would have ${H + 1}$ or ${H-1}$ leaves. For the purposes of AVL-trees, we use the definition presented in the main text.

Definition: Height of a Tree. Let ${\textsf{T}}$ be some rooted tree. The height of ${\textsf{T},}$ denoted ${\textit{\textsf{H}}(\textsf{T}),}$ is the height of its root node.

From the height balance factor ${\textit{\textsf{BF}},}$ we have the definition of an AVL tree:

Definition: avl-tree. Let ${A}$ be a binary tree. ${A}$ is an AVL-tree if, and only if, for every node ${n \in A,}$
$\textit{\textsf{BF}}(n) \in \{ -1, 0, 1 \}$
Or, alternatively:
$\lvert \textit{\textsf{BF}}(n) \rvert \leq 1$

Following the definition above, if the absolute value of the balance factor of some node ${x}$ is greater than or equal to ${1,}$ we say that the node ${x}$ is height-imbalanced. Moreover, to determine if a particular tree ${T}$ is an AVL-tree, we must calculate the height balance factor for each node in ${T.}$ If a single node in ${T}$ is imbalanced, we conclude that ${T}$ is height-imbalanced. If the tree is imbalanced, we can attempt to balance the tree by performing rotations, as we'll see later.

That said, let's consider a few examples. Below are several binary trees, with their heights indicated.

Examining each tree, we have the following tables of balance factors:

Tree	Node	Height of Left-subtree	Height of Right-subtree	Balance Factor
A	a	2	2	0
	b	1	1	0
	c	1	0	1
	d	0	0	0
	e	0	0	0
	f	0	0	0

Tree	Node	Height of Left-subtree	Height of Right-subtree	Balance Factor
B	a	2	2	0
	b	1	0	1
	c	0	1	-1
	d	0	0	0
	g	0	0	0

Tree	Node	Height of Left-subtree	Height of Right-subtree	Balance Factor
C	a	3	1	2
	b	2	0	2
	c	0	0	0
	d	1	0	1
	h	0	0	0

Tree	Node	Height of Left-subtree	Height of Right-subtree	Balance Factor
D	a	4	1	3
	b	3	0	3
	c	0	0	0
	d	0	2	-2
	i	1	0	1
	j	0	0	0

From the tables above, we can see that trees ${\texttt{C}}$ and ${\texttt{D}}$ are imbalanced. The same trees are presented below with their balance factors indicated (red for imbalanced, green for balanced).

Now that we know what it means for a tree to be imbalanced, let's examine rotating—the act of correct height imbalances.

Rotations

If we think carefully about height-imbalanced trees, we find that there are really only four situations where a binary tree is height-imbalanced. Why? Because to have a height-imbalanced binary tree, we need, at a minimum, three nodes. This follows from the definition of a height-balanced binary tree—to determine if a binary tree is height-balanced, we must compare the left- and right-subtrees of the root. And to reach the conclusion that a binary tree is height-imbalanced, the root must have a right- and left-child to begin with; three nodes total.

As we know from the Catalan number formula, given three nodes, there are five possible binary trees:

Of the possibilities above, only ${\texttt{P5}}$ is balanced. Given that all subtrees of a binary tree are binary trees, it follows that there are only four possible cases for height-imbalanced trees:

Because there are only four possible cases, there are only four methods for height-balancing some binary tree. These methods are called rotations: right rotation, left rotation, right-left rotation, and left-right rotation.

Right rotation

Suppose some user is entering a binary tree traversal sequence to construct a tree ${T.}$ After the first two entries, the binary tree appears as:

The user then enters another element:

At this point, the tree is imbalanced. Why? Because ${\texttt{node(a)}}$ has a left-subtree of height ${2,}$ and a right-subtree of height ${0.}$ Thus, the tree has an overall balance factor of ${\textit{\textsf{BF}} = 2 - 0 = 2.}$ We demarcate this situation as ${T:LL_i}$ to convey the proposition that the tree ${T}$ is left-left-imbalanced, which in turn conveys: An imbalance occurred at the left-subtree of the left-subtree of ${T.}$

Now, because we have ${T:LL_i,}$ we perform an right-rotation:

\Large \overset{\texttt{right-rotate}}{\curvearrowright}

The RR-rotation consists of the following steps:

${\texttt{node(b)},}$ the node just before the imbalance-causing node—in this case, ${\texttt{node(c)}}$ —becomes the new root. We call ${\texttt{node(b)}}$ the pivot.
${\texttt{node(a)},}$ formerly the root, becomes the right-child of ${\texttt{node(b)},}$ with ${\texttt{node(b)}'s}$ right-child as its left-child. In the case above, ${\texttt{node(b)}}$ 's right-child is ${\texttt{NULL}.}$
${\texttt{node(a)}}$ remains the left-child of ${\texttt{node(b)}.}$

One way to think about the right-rotation operation is to imagine holding down the pivot, and pulling the node to its right down. (Hence the term pivot). Alternatively, we can think of the right-rotation as it sounds like—the nodes move towards the right, transitioning from the shape ${~~/~~}$ to the shape ${~~\land~~.}$

The right-rotation operation works regardless of how many nodes there are. For example, suppose we have the following tree:

In the diagram above, the nodes without circles indicate that there are further subtrees. Importantly, the fact that the pivot, ${\texttt{node(b)},}$ has a balance factor of ${1}$ indicates that this tree is left-heavy. Accordingly, we have a right-rotation:

Left rotation

Now suppose the user enters the tree:

Here, we have an RR-imbalanced tree. To balance this tree, we must perform a left-rotation:

\Large \overset{\texttt{left-rotate}}{\curvearrowleft}

The procedure:

The pivot ${\texttt{node(b)},}$ becomes the new root. .
${\texttt{node(a)},}$ formerly the root, becomes the
left-child of ${\texttt{node(b)}.}$

${\texttt{node(c)}}$ remains the right-child of ${\texttt{node(b)}.}$

Right-left Rotation

Now we examine the third case. Suppose the user enters:

We say that this tree is right-left imbalanced. To balance this tree, we perform a right-left rotation:

\Large \overset{\texttt{right-rotate}}{\curvearrowright}

\Large \overset{\texttt{left-rotate}}{\curvearrowleft}

Left-right Rotation

Now we examine the final case. Suppose the user enters:

We say that this tree is left-right imbalanced. To balance this tree, we perform a left-right rotation:

\Large \overset{\texttt{left-rotate}}{\curvearrowleft}

\Large \overset{\texttt{right-rotate}}{\curvearrowright}

General Form of AVL Rotations

AVL tree rotations have a general form. Consider the following tree:

The nodes without circles indicate that the child node is the root to some larger subtree. In this case, we see that Node(A) is imbalanced — it has a balance factor of 2. Given that Node(B) has a balance factor of 1, we see that the rotation should be an LL-rotation. If Node(B) had a balance factor of -1, the rotation would be an LR-rotation.

More generally:

Tree	Imbalance	Procedure
	Left-left	Left-left rotation
	Left-right	Left-right rotation
	Right-right	Right-right rotation
	Right-left	Right-left rotation

Returning to our imbalanced tree, how do we balance this tree knowing it potentially has thousands of other nodes in those subtrees? The idea is to just momentarily ignore the fact that those subtrees exist. Performing the LL-rotation, we get:

Now we see that it's balanced. The subtrees are now:

Ok hang on. How did those subtrees move? A helpful way to think about how the subtrees got to their positions is to imagine the links between each node as a thread:

Imagining the links between each node as a string

Notice that by pulling the thread, the subtree slowly goes up, until it arrives at its final location. Knowing this general form, consider a more complicated tree:

This tree is balanced. Suppose we insert a new node with the value 4. Because 4 is smaller than 30, it goes towards the left, landing at the node containg 20. Then, because it's smaller than 20, it goes to the left again, landing on the node containing 10. And because it's smaller than 10, it lands at node 5. And because it's smaller than 5, it's inserted as the left child of 5:

Now that we've inserted node(4), the tree is imbalanced. To balance this tree, we focus on just three nodes:

After node(10), wherever node(4) is inserted, it's left-left imbalanced.

AVL-trees were invented in 1962, and named after their inventors, Soviet mathematicians Georgy Adelson-Velsky (8 January 1922–26 April 2014) and Evgenii Mikhailovich Landis (6 October 1921–12 December 1997).

Born in Samara, Russia, Adelson-Velsky is known for introducing bitboards, a bit array data structure now widely used to represent game positions in computer chess. Adelson-Velsky's chess program participated and won the first chess match between computer programs in 1966. The AVL-tree, in particular, was borne out of Adelson-Velsky's work in artificial intelligence.

His colleague, Evgenii Landis, was equally accomplished. Born in Kharkiv, Ukraine, Landis fought in Finland as a lieutenant in the Red Army, where he was wounded, frost-bitten, and severely shell-shocked. The traumas of war impaired his ability to hear high-pitched tones, to which Landis lamented his inability to hear violin flageolets. Despite these difficulties, Landis made numerous contributions to the field of partial differential equations, alongside his collaboration with Adelson-Velsky in inventing the AVL-tree. ↩