Introduction to Trees
This chapter covers notes on trees.
In this section, we examine the trees. Trees are a fundamental data structure in computer science. They're used as auxiliary data structures for many algorithms (in particular, searching algorithms), as well as a primary data structure for organizing data. As we'll see throughout the materials, there are many kinds of treesβbinary trees, AVL trees, M-way trees, B-trees, and so on. The materials we present in this section are general propositions regarding trees. As such, there is little to no code for this specific section. Instead, much of the content pertains to key properties and ideas that apply to all trees.
Roughly speaking, a tree is a collection of nodes and edges; the nodes being the colored circles, and the edges being the numbered lines. The nodes need not be occupied by number, nor do the edges have to be numbered. More generally, a tree is simply a particular type of graph.
Below is a working definition of a tree. Keep in mind that this is the mathematical definition. In later sections, we'll present a structural definition relevant to programming applications.
tree (graph theoretic). Let be the nonempty graph where is the set of all nodes on and is the set of all edges. Then is a tree if, and only if, the following propositions are true. (1) For any two nodes there exists an edge (2) is acyclic.
This is the definition with respect to graph theory. Set theory, however, presents a slightly stricter definition.
tree (set theoretic). Let be a set of nodes and a partial ordering on Then the poset is a tree if, and only if, the following propositions are true. (1) is a wellfounded partial ordering. (2) For every the set is well-ordered by (3) For all there exists a such that and
Tree Relations
Having given the mathematical definition of a tree, we now define a few relations relevant to trees.
First, if there are nodes, there are edges. Examining the tree above, notice that there are total nodes, and edges. We get edges because the very first node, called the root, has no parents or ancestors.
First, the total number of nodes in a tree is called the tree's order, denoted with It may also be denoted or
For example, the tree below is a tree of order
This tree is of order
And this tree is of order
A root is a node with at least one edge to a node below it, but no edges to nodes above it (i.e., a node with a child, but no parents, or, in some trees, the topmost node).
For all trees other than the empty tree, every node either (1) has a child, or (2) has no children. If a node has a child, then the node is called a parent. A child is a node with at least one edge to a node above it. For example, in the tree below, the node is a child of The nodes and are children of is the parent of three nodes: and And is the parent of and
A node with one child is called a uniparous node. A node with two children is called a biparous node. A node with three children, a triparous node, and so on.
A node with no children is called a leaf. Any node that is not a leafβi.e., a node with childrenβis called a branch node. In the tree diagrams we've seen thus far, branch nodes have been colored beige, and leaves have been colored green.
The number of leaves in a tree is called the foliage count, denoted For example, in the tree below, the
If a path exists from node to node we say that is an ancestor of and is a descendant of In the tree below, is an ancestor of and Similarly, is descendant of
Notice that following these definitions, the root node is the ancestor of all nodes within the tree. To refer to all the descendants of a particular node we use the term strict descendants, denoting such set 1 To refer to all the ancestors of we use the term strict ancestors, and denote the set For example, in the tree below, the strict descendants of are
And the strict ancestors of are
A sibling is a node that shares a parent with another node. For example, in the tree below, the nodes and are siblings, since they share the parent The nodes and are also siblings, since they share the parent The nodes and however, are not siblings, since and have a parent different from
The nodes and however, are cousinsβnodes whose parents are siblings. We also say that the parent of is a pibling of and and that and are niblings of
A node with no siblings is called a monoprogeny, or monoprog. In the tree above, the nodes and are monoprogs, since they are the only children of and respectively. From this definition, it follows that the root is always a monoprog, since it has no siblings.
The degree of a node is the number of the 's children (not descendants), denoted For example, in the tree below, since has three children: and The degree of is since has two children: and Implied by this definition is the proposition that all leaves have a degree of since leaves, by definition, have no children. Thus, and so on.
The degree of a tree, or tree degree, has varying definitions. Some authors will define the tree's degree as the degree of the root, others will define it as the maximum degree of any given node in a tree (i.e., the maximum number of children a node may have). In these materials, we define the tree degree as follows:
definition The degree of a tree denoted is the maximum degree of any given node in the tree.
When we seek to convey the degree of the root, we will use the term root degree. For example, the tree below has a degree since every node has at most children.
This tree has a degree since every node has at most children:
Tree Paths
The shape of a tree gives a natural flow from the root down to any node within the tree. For example, consider the following tree:
A path is essentially a "route" or "road" from some node to another node. For example, a path from the root to the node would be:
A path to the leaf
Notice that these paths are simply lists. Because paths are lists, they have a lengthβthe number of edges in the path. For example, the path has the length The path has the length
For any node within a tree, there is a unique path from the root to The length of this path is called the depth of Because there is no edge connecting the root to the root, we say that the depth of the root is
For example, the tree below presents different depths:
The tree depth is the maximum depth among all nodes in the tree. In the tree above, the tree depth is
Each node in the tree also has a height, the number of edges on the longest path from to a leaf. To illustrate, the tree below marks the various heights for each node. The leaf nodes have a height of because there is no path from the leaf node to a leaf node. The nodes and each have a height of since the longest path from those nodes to a leaf node consists of edge.
The tree height is the maximum height among all nodes in the tree. In the tree above, the tree height is
Binary Trees
The binary tree is a tree of degree-βa tree where every node can have, at most, children. For example, below is a binary tree consisting of fifteen nodes:
The binary tree above is an example of a perfect binary treeβa binary tree where all branch nodes have exactly two children, and every leaf is at the same generation or depth within the tree. Binary trees can be thought of in various shapes and sizes. For example, this is a binary tree:
Remember, the only requirement for a binary tree is that each node has, at most, two children. The tree above is a special kind of binary tree called a degenerate binary tree or a pathological binary tree. When the degenerate tree skews left, we call it a left-skewed tree.
And when the tree skews right, we call it a right-skewed tree:
Tree Counting
Tree counting is the process of counting the number of possible trees given certain constraints. The most common form of tree counting is counting the number of possible trees given nodes. How the counting is performed depends on whether the nodes are labelled or unlabelled. We distinguish these two constraints by presenting the following definitions immediately:
lemma. Given nodes, the number of possible unordered trees, denoted is given by the formula:
where
lemma. Given nodes, the number of possible ordered trees, denoted is given by the formula:
where
Unordered Tree Permutations
Suppose we have three nodes, and With three nodes, we have Suppose further that the nodes are unlabelled nodesβthey do not contain any data or any associated identifiers. Just three empty nodes. How many structurally different binary trees can be generated with these three nodes? Well, we can draw each of them:
We see that there are possible trees. What if we had nodes? Again, each node is just a nodeβno particular data or associated label. Four empty nodes. Once more, we can draw each of them:
Here, we see that there are possible trees. In general, the number of possible trees for unlabelled nodes is given by the Catalan Number Formula:
Catalan Number. A Catalan number is a positive integer that satisfies the following equation:
where
Expanding the Catalan number formula, we get:
We can use this formula to state our lemma:
Lemma: Unordered Tree Permutations. Given nodes, the number of possible unordered trees, denoted is given by the formula:
where
The formula can be verified by checking our earlier work exhausing possible trees. Where we have:
The lemma we've derived allows us to enumerate all the possible unordered trees given nodes without having to draw all of the possibilities. For example, with we have:
possible unordered, structurally different trees. Each of these trees is called an unordered tree permutation of order (-UTP).
Tallest Unordered Binary Tree Permutation
Among all of the possible binary -UTPs, we often want to find the the number of binary -UTPs of maximum height. We denote this number with the variable
definition. Given nodes, the number of unordered binary tree permutations of maximum height is:
To see why this formula holds, let's look at our previous diagrams. Where we have trees of maximum height. Where we have possible trees of maximum height. Laying this data out via table:
number of maximum height trees | |
---|---|
Accordinly, given nodes, the number of trees of maximum height is given by:
Ordered Tree Permutations
When the nodes are labelled, the number of possible treesβdenoted βis different. For example, recall that with unlabelled nodes, we have possible trees:
Once we introduce labels, we add a slight wrinkle to the analysis. For the first possibility, a left-skewed tree, we have possibilities:
Thus, every structurally unique tree carries with it possible arrangements, where is the number of nodes. Accordingly, we want to take the number of structurally different unordered trees, and multiply that number by to account for the different possible orders.
Thus, we have the following lemma:
lemma. Given nodes, the number of possible ordered trees, denoted is given by the formula:
where
This lemma gives us the cardinality of the set of all possible ordered trees given nodes. Each member of the set is called an ordered tree permutation of order |N| (N-OTP).
Height and Node Bounds
The number of nodes in a binary tree, and the tree's height have several relationships:
definition The shortest possible tree that can be achieved with nodes is called an -shrub or minimum height tree of order , denoted If the tree is restricted to a tree degree we write
definition The tallest possible tree that can be achieved with nodes is called an -lumber or maximum height tree of order , denoted If the tree is restricted to a tree degree we write
definition Given a height the minimum order necessary to achieving a tree of height is called the shrub size given height , denoted If the tree's nodes can only have at most children, we write
definition Given a height the maximum order necessary to achieving a tree of height is called the lumber size given height , denoted If the tree's nodes can only have at most children, we write
definition The minimum height achieved with nodes is called the shrub height given nodes, denoted If the tree's nodes can only have at most children, we write
definition The minimum height achieved with nodes is called the lumber height give nodes, denoted If the tree's nodes can only have at most children, we write
We examine each of these properties for binary trees.
Binary Shrub Sizes
A binary shrub size solves the following problem:
Given a height of a binary tree, what is the minimum number of nodes, or the minimum order, necessary to achieving
A good starting point to answering this question is by considering a few easy examples:
From the diagram above, we make the following findings:
Height | Mininum number of nodes | Maximum number of of nodes |
---|---|---|
Reading this table, we see that for a tree of height the smallest possible number of nodes we can use to achieve that height is and the biggest possible number of nodes we can use is
Studying the patterns, we have the following formulas:
Formula Given a binary tree height of the minimum number of nodes needed to achieve is:
Binary Lumber Size
The binary lumber size answers the following problem:
Given a height of a binary tree, what is the maximum number of nodes to achieve
For the maximum number of nodes, the trick is to look at the generations. Consider the tree where At the first generation, there is node. At the second generation, there are nodes. At the third generation, there are nodes. At the fourth generation, there are nodes. This yields:
The notation means, the set of all nodes whose generation is Thus, we're summing the number of nodes at each generation. To maximize the number of possible nodes, we want each node at each generation to have two children, per the constraints for a binary tree.
Examining the computation, we see a sequence, the geometric progression:
The sum of this sequence is the geometric series:
where is the common ratio, is the base is the power, and is the height. Thus, applying this formula to, say, we get:
Thus, we have the formula:
Formula. Given a binary tree height of the maximum number of nodes needed to achieve is:
Lumber Heights
Alternatively, some problems require us to find the tallest (i.e., the greatest possible height) tree given nodes. Knowing the height of this tree provides an upper bound for certain calculations. We call such a tree a lumber.
definition. A lumber is a tree with the greatest possible height given nodes.
For example, given nodes, we have a lumber of With nodes, we have a lumber of With nodes, we have a lumber of
Finding a lumber and its height for a binary tree of nodes is straightforward. Per the constraints of a binary tree, to achieve the maximum height, we just have to ensure that each node has and only child. In other words, arrange the tree linearly. Thus, we have the formula:
Timber Height Formula. Given nodes, the height of a lumber, denoted is given by the formula:
Shrub Heights
In many problems, we often want to find the shortest (i.e., the smallest possible height) tree given nodes. Knowing the height of this tree is a useful asset for analyzing various computations. We call such a tree a shrub.
definition. A shrub is a tree with the least possible height given nodes.
For example, with nodes, we have a shrub of With nodes, we have a shrub of With nodes, we have a shrub of
The formula for computing height of a shrub is simply a rearrangement of the formula for the maximum number of nodes need to achieve a height 2
Shrub Height Formula. Given nodes, the height of a lumber, denoted is given by the formula:
Binary Tree Bounds
From our analyses of shrubs and lumbers, we have the following conclusions:
Property | Quantification |
---|---|
Given a height the least number of nodes needed to achieve | |
Given a height the greatest number of nodes needed to achieve | |
Given nodes, the height of the shortest possible tree (a shrub). | |
Given nodes, the height of the tallest possible tree (a lumber). |
These findings yield two powerful theorems. First, the bounds for the height of a binary tree:
Lemma: Height Bounds of a Binary Tree. Let be a binary tree. The height of denoted is bounded on the interval:
where is the number of nodes in
A useful implication of this theorem is seen when we rewrite the upper and lower bounds in terms of big-O notation:
This yields the corollary:
Corollary: Big-O Height Bounds of a Binary Tree. Let be a binary tree. The height of denoted is bounded on the interval:
where is the number of nodes in
Second, the bounds for the cardinality of the set of all nodes in the binary tree
Theorem: Node Bounds of a Binary Tree. Let be a binary tree. The number of nodes in denoted is bounded on the interval:
where is the height of the binary tree
Branch Nodes v. Leaf Nodes
Recall that a leaf node is a node with no children; i.e., a node with Nodes that do have children are called branch nodes. With binary trees, branch nodes either have or We call a branch node with child a twig, and a branch node with children a bough. Is there a mathematical relationship between leaf nodes and branch nodes? Let's find out.
Consider the following binary trees:
In the trees above, bough nodes are colored brown, twig nodes are colored yellow, and leaf nodes are colored green. Examining each tree, we have the following observations:
Tree | (number of leaves) | (number of twigs) | (number of boughs) | |
---|---|---|---|---|
Examining this table, we see that there is some sort of relationship between leaves (nodes of degree ) and boughs (nodes of degree ): The number of nodes of degree is one plus the number of nodes of degree
lemma. Given a binary tree with leaves and boughs the number of leaves is one more than the number of boughs:
Types of Binary Trees
To understand the costs and benefits of the binary tree data structure, it's helpful to know the different types of binary trees. This allows us to efficiently communicate various propositions.
Rooted Binary Tree
Recall that a binary tree is a tree where each node has at most two children. Notice that this definition imposes no requirement for a root. In most applications, when we use the term "binary tree," we're actually referring to a rooted binary treeβa tree data structure with: (1) a root, and (2) each node in the tree has at most children.
Definition: Rooted Binary Tree. A binary tree with nodes is a rooted binary tree if and only if:
- has root, and
Proper Binary Tree
A proper binary tree is a binary tree where each node has either or children.3 The proper binary tree's definition is simply a modification of the second prong of the rooted binary tree's definition:
Definition: Proper Binary Tree A binary tree with nodes is a proper binary tree if and only if:
- has root, and
Binary trees that are not proper binary trees are called improper binary trees. For example, the green trees below are all proper binary trees, but the red trees are not:
!Proper and improper binary trees
We can think of a proper binary tree as a binary tree where the nodes are "all or nothing"βthe node either has the greatest possible number of children () or has no children at all (). Alternatively, we can think of the property binary tree as any tree where no node has just one child. A helpful way to remember this property is to observe that the definition of a proper binary tree prohibits a node from having just childβnode with a unary, not binary, degree.
Relationship Between Height and Nodes
Proper binary trees can be identified by a special relationship between its height and its nodes. Consider the following table:
Height | Minimum Number of Nodes | Maximum Number of Nodes |
---|---|---|
We have the following formulas establishing the relationship between the height of a proper binary tree and the minimum or maximum number of nodes needed to achieve that height.
Formula -for-. Let be a proper binary tree. Then the minimum number of nodes needed to achieve the height of denoted is given by the :
and the maximum number of nodes:
Formula: -for-. Let be a proper binary tree. Then the maximum number of nodes needed to achieve the height of denoted is given by the formula:
Shrubs
Given nodes, we often want to find the height of the proper shrubβthe shortest possible proper binary tree given nodes. As we saw earlier with the general formulas, we can derive the height of a proper shrub from the -for- formula:
Formula: Shrub Height. Given nodes, the height of shortest possible proper binary tree is given by the formula:
The same reasoning extends to deriving the formula for the height of the proper lumberβthe tallest possible binary tree given nodes:
Hence, we have the formula:
Formula: Lumber Height. Given nodes, the height of tallest possible proper binary tree is given by the formula:
Bounds of Proper Binary Trees
As we saw the general formulas, the lumber and shrub height formulas provide us with upper and lower bounds for the heights of proper binary trees.
Theorem: Proper Binary Tree Bounds. Given a proper binary tree with nodes, the height of denoted satisfies the following expression:
or, alternatively,
Relation: Branch Nodes and Leaves
Is there a relationship between a proper binary tree's branch nodes and leaves? It turns out yes:
Relation: Proper Branch Nodes and Leaves. Given a proper binary tree with branch nodes and leaves, the number of nodes and the number of leaves have the relation:
Complete Binary Trees
A complete binary tree is a binary tree where each generation other than possibly the last is completely filled (i.e., each node has exactly two children). The last generation may be completely or partially filled, but if it is partially filled, it must be filled from left to right.
Definition: Complete Binary Tree. Let be a binary tree. is a complete binary tree iff the following propositions are true:
- If a node has a height then the node has exactly two children.
- For the set of all nodes with a height the only monoprog must be in the right subtree of and the left child of its parent.
For example, the green trees below are all complete binary trees, while the non-green trees are incomplete binary treesβbinary trees that are not complete.
The purple tree is an incomplete binary tree because it has no siblings, but is not the left child. The brown tree is an incomplete binary tree because not every generation is filledβthe node at generation has only one child (i.e., a node with a degree of when it must have a degree of ). The blue tree is not a complete binary tree because it has no root.
Perfect Binary Tree
A perfect binary tree is a binary tree where all interior nodes have two children, and all leaves have the same depth or are on the same generation.
Definition: Perfect Binary Tree. Let be a binary tree. is a perfect binary tree if and only if:
- all branch nodes have two children, and
- all leaves have the same depth.
For example, all of the green trees below are perfect binary trees, while the non-green trees are imperfect binary treesβ trees that are not perfect binary trees:
The purple tree is imperfect because not all branch nodes have two children. The red and blue trees are imperfect because not all leaf nodes have the same depth.
Balanced Binary Tree
Balanced binary trees come in two forms: height-balanced trees and weight-balanced trees. Below are the definitions.
Definition: Height-balanced Binary Tree. Let be a binary tree. is a height-balanced binary tree if and only if:
- For each node the heights of its subtrees differ by at most
For example, the green trees below are all height-balanced binary trees, while the non-green trees are not:
Next, we have the definition of a weight-balanced binary tree:
Definition: Weight-balanced Binary Tree. Let be a binary tree. is a weight-balanced binary tree if and only if:
- For each node the numbers of branch nodes in its left subtree and the number branch nodes in its right subtree differ by at most
There is one unifying property for both height- and weight-balanced binary treesβheight. This leads to the general definition of a balanced binary tree:
Definition: Balanced Binary Tree. Let be a binary tree. is a balanced binary tree if and only if its height is
Summary of Tree Types
Because of the variety of tree types, it can be overwhelming at first keeping track of all the names and conditions. Accordingly, a helpful summary is provided in the list below.
We'll cover -ary trees in the next section, but a -ary tree is simply the general term for a tree with a degree of For example, a binary tree is a -ary tree, since every node can have at most children; a ternary tree is a -ary tree, since every node can have at most children; etc.
- Rooted -ary tree
- There is exactly root.
- All nodes have at most children.
- Perfect -ary tree
- All branch nodes have exactly children.
- All leaves have the same depth.
- Other names: Complete -ary tree
- Complete -ary tree
- All generations other than the last are completely filled.
- The last generation is filled from left to right.
- Other names: almost complete -ary tree, nearly complete -ary tree
- Proper -ary tree
- All nodes have either or children.
- Other names: full -ary tree, plane -ary tree, strict -ary tree
- Pathological tree
- Every node has exactly or child, left or right.
- Other names: degenerate tree
- Left-skewed tree
- Every node has exactly or left child.
- Right-skewed tree
- Every node has exactly or right child.
Additionally, it may be helpful to visualize the different types with a Venn diagram:
k-ary Trees
Binary trees are more generally called k-ary trees. Specifically, they are a 2-ary tree.
definition: A k-ary tree is a rooted tree where each node has at most children.
Thus, for the binary tree, we have This means that each node in the tree can only have or children. These are the only possiblities.
A ternary tree is a 3-ary tree, so Thus, in a ternary tree, each node can only have or children. For example, the green trees below are all ternary trees, but the red tree is not:
That blue tree looks like a binary tree. Indeed, it is. Remember, all that's required for a ternary tree is that each node must have at most nodes. That's a ceiling, not a floor.
Similarly, a 4-ary tree is a tree where each node only has either or children.
Strict k-ary Trees
A strict k-ary tree is a k-ary tree where each node has either (1) zero children or (2) exactly children.
definition. Let be a k-ary tree. is a strict k-ary tree iff every node in has a or children.
Thus, a strict binary tree (what we referred to earlier as a proper binary tree) is a binary tree where every node has either or children. A strict ternary tree is a ternary tree where every node has either or children. A strict 4-ary tree is a 4-ary tree where every node has either or children. And so on.
Strict k-ary Shrub Sizes
The shrub size lemmas we saw earlier apply to strict k-ary trees.
Lemma: Strict -ary Shrub Sizes. Let and The minimum number of nodes, i.e., the minimum size, necessary to achieving a k-ary tree of height is the strict -ary shrub size, denoted where:
Strict k-ary Timber Sizes
A similar lemma is found for timber sizes:
Lemma: Strict -ary Shrub Sizes. Let and The maximum number of nodes, i.e., the maximum size, necessary to achieving a k-ary tree of height is the strict -ary timber size, denoted where:
Strict k-ary Shrub Heights
The strict k-ary lumber size lemma yields the strict k-ary shrub height lemma: The smallest possible height strict k-ary tree given nodes.
Hence, we have the following lemma:
Lemma: Strict -ary Shrub Height. Let Given nodes, the shortest possible height strict k-ary tree, called the -ary shrub, has the height called the -ary shrub height, where:
Strict k-ary Timber Heights
By a similar line of reasoning, we can derive the strict k-ary timber height lemma from the strict k-ary shrub size lemma: The tallest possible height strict k-ary tree given nodes.
Thus, we have the lemma:
Lemma: Strict -ary Timber Height. Let Given nodes, the tallest possible height strict k-ary tree, called the -ary timber, has the height called the -ary timber height, where:
Relationship: Leaves & Branch Nodes
As we saw with binary trees, there's a relationship between the leaves of a k-ary tree and its branch nodes.
lemma. Let where is the number of leaves and is the number of branch nodes in a strict k-ary tree. Then:
or, alternatively,
Footnotes
-
Borrowing from legal terminology, we use the letter to stand for "issues," the legal term for a person's descendants. β©
-
Notice that the lumber height formula is also a rearrangement of the formula for the minimum number of nodes needed to achieve a height β©
-
Proper binary trees are also called full binary trees or plane binary trees. β©