Binary Trees

Binary trees are a core data structure behind much of the software products and services we use today: The most common use case is the binary search tree, used in numerous search applications where data is constantly ebbing and flowing in a given system. Binary space partitionining is used in 3D video games for determining which objects must be rerendered. Binary tries are used in high-bandwidth routers for storing router-tables, large address books that routers use to direct internet traffic. Huffman coding trees are binary trees used in compression algorithms, most notably the .jpeg and .mp3 file formats. The list goes on.

Why are binary trees more commonly used than ternary trees or other ${n}$ -ary trees? Simply because ${n}$ -ary trees are much more complicated. We can do many interesting things with ${n}$ -ary trees, but for most applications, we gain little in terms of time savings. And as we know, speed is everything when it comes to data structures. Of course, ${n}$ -ary trees outperform binary trees in certain situations, but the fact is, binary trees are easier to work with and cover more problem domains.

Definitions

Binary trees are a special kind of tree. They consist of nodes in a parent-child relationship. All binary trees have a single root—what we might think of as the first node—from which all other nodes descend. Moreover, binary trees have a two-child policy—go to any node, and we'll find either zero, one or two children, nothing more. Because of these properties, we say that all binary trees are trees, but not all trees are binary trees.

We also have the usual arboreal terms. The root is the top, or first, node in the tree. A child is a node directly descending from a node. The parent of a node ${n}$ is the node ${n}$ descends from. Siblings are nodes with the same parent. A leaf is a node with no children. A branch node is a node with children. An branch is the connection between a child node and its parent.

Static Binary Trees

As its name implies, an array-implemented binary tree (ABT) is a binary tree implemented, and represented, with an array. First, we have the following binary tree, with an accompanying empty array:

The challenge: How do we store each node in this binary tree in the array such that the relationships between the nodes are preserved? For example, B is the left child of A, and C is the right child of A. How is that information preserved in an array implementation?

Here's an idea: Use the array's indices. Let's start by storing the elements from the top generation to the bottom generation, left to right. At ${L = 1,}$ we have a single node, A. So we store in the array:

At ${L = 2,}$ we have two nodes, B and C, so we store, from left to right, B then C:

Then, we go on to ${L = 3,}$ and see that there are ${4}$ nodes — D, E, F, and G. Again we store the nodes from left to right:

Let's construct a table to see if we can find a pattern. Suppose ${i}$ is an index in the array, ${i_L}$ is the index of the left child, and ${i_R}$ is the index of the right child.

Element	${i}$	${i_L}$	${i_R}$
`A`	${0}$	${1}$	${2}$
`B`	${1}$	${3}$	${4}$
`C`	${2}$	${5}$	${6}$
`D`	${3}$	${\varnothing}$	${\varnothing}$
`E`	${4}$	${\varnothing}$	${\varnothing}$
`F`	${5}$	${\varnothing}$	${\varnothing}$
`G`	${6}$	${\varnothing}$	${\varnothing}$

Examining the table, we see that at the first generation, the element's left child is offset by ${1,}$ and its right child is offset by ${2.}$ For the left node on the second generation, the element's left child is offset by ${2}$ and its right child is offset by ${3.}$ For the right node on the second generation, the element's left child is offset by ${3}$ and its right child is offset by ${4.}$ Examining these mappings we that, to get the left child, we use:

i_L = 2i + 1

and to get the right child, we use:

i_R = 2i + 2

These two formulas allow us to access the left and right children. But what about the parents? Well, let's construct another table, where ${i_P}$ is the index of the given element's parent.

Element	${i}$	${i_P}$
`A`	${0}$	${\varnothing}$
`B`	${1}$	${0}$
`C`	${2}$	${0}$
`D`	${3}$	${1}$
`E`	${4}$	${1}$
`F`	${5}$	${2}$
`G`	${6}$	${2}$

Examining this table, we see that given an element with the index ${i,}$ the index of its parent is:

i_P = \left\lfloor \dfrac{i}{2} \right\rfloor

This is a fairly useful piece of information, so we state it as a lemma:

lemma. Let ${A}$ be an array representing an array-implemented binary tree, and ${i}$ an index in the array. Given a branch node ${A[i],}$ the left child of ${A[i]}$ has the index ${i_L,}$ where
$i_L = 2i + 1$
and the right child of ${A[i]}$ has the index ${i_R,}$ where
$i_R = 2i + 2$

Dynamic Binary Trees

The linked-list-implemented binary tree (LBT) looks similar to a general linked list in implementation. Usually, there's a root pointer, a pointer in the stack whose pointee is the root of the binary tree. For the binary tree, each node has ${2}$ next fields — one for a leftChild pointer and one for the rightChild pointer, and ${1}$ data field. Furthermore, each node in the tree, other than the root node, is the lone pointee of another node.

A low level visualization of a binary tree.

Above, we see that each node is a square comprised of one left compartment and one right compartment, beneath a larger compartment. The larger compartment is called the data field, and stores the substantive data. The left compartment houses the left subtree pointer, and the right compartment houses the right subtree pointer.

For example, suppose we had the following binary tree:

This tree would be implemented as:

If we look at the representation, we see that there are ${7}$ nodes, corresponding to the ${7}$ elements we seek to store. Notice that with ${7}$ nodes, there are ${8}$ null pointers. If there are ${N}$ nodes, then are ${N + 1}$ null pointers.

To begin, we need to implement the basic node data structure, as we saw with linked lists, along with its constructor in pseudocode:

structure ${\df{node}}$ contains
1. ${\df{comparable} ~ \tx{datum}}$
2. ${\df{node}^* ~ \tx{left-child}}$
3. ${\df{node}^* ~ \tx{right-child}}$
end

We'll define the binary tree data structure separately:

structure ${\df{binary-tree}}$ contains
1. ${\df{node}^* ~ root}$
end

Node Insertion

Below is one approach to inserting a new node into a binary tree, using the structural definitions for node and binary-tree provided previously.

Binary Tree Node Insertion

Identifier: insert
Preconditions:
- Structure ${\df{node}}$ is defined.
- Structure ${\df{binary-tree}}$ is defined.
Input:
- ${\df{binary-tree}^*}$ ${t}$ , the insertion's targeted tree.
- ${\df{comparable}}$ ${d}$ , the inserted node's data.
Output: ${\df{binary-tree}^*}$

if ${~ {t} \ix{\tx{root}} = \nil ~}$ then
1. init ${\df{node}^*}$ ${\let{n}{\df{node}(d)}}$
2. ${{t} \ix{\tx{root}} \gets n}$
else
1. init ${\df{node}^*}$ ${p \gets {t} \ix{\tx{root}}}$
2. init ${\df{node}^*}$ ${r \gets \nil}$
3. while ${p \neq \nil}$ do
  1. ${r \gets p}$
  2. if ${{d} \lt p \ix{\tx{datum}}}$ then ${p \gets p \ix{\tx{left-child}}}$
  3. else if ${{d} \gt p \ix{\tx{datum}}}$ then ${p \gets p \ix{\tx{right-child}}}$
  4. else return ${t}$
4. ${p \gets \df{node}({d})}$
5. if ${{d} \lt r \ix{\tx{datum}}}$ then ${r \gets p \ix{\tx{left-child}}}$
6. else ${r \ix{\tx{right-child}} \gets p}$
return ${t}$ ${\blacksquare}$

The procedure here is fairly straightforward. insert takes two arguments, ${t}$ and ${d.}$ ${t}$ is the tree the new node will be inserted into, and ${d}$ is the accompanying data value. By definition, ${t}$ is either null or nonnull. This yields a clear branch in the procedure: One branch handles the null case, the other the nonnull case.

We handle the first case starting at line zero and ending at line two. If the root of ${t}$ is undefined, then we instantiate a new node ${n,}$ initialize its data value to ${d,}$ and assign ${n}$ as the root. Note that in languages like C, line one may be a cause for concern. As is, the procedure above makes no checks for potential memory allocation failures.

The second case is where the tree argument is not null. insert handles this starting at line three. This part of the procedure is a bit more involved. First, we need two nodes ${p}$ and ${r.}$ We bind the ${t}$ 's root to ${p,}$ and ${\nil}$ to ${r.}$ Since we only entered the branch because ${t}$ was nonnull, we can infer that ${p}$ starts with some nonnull value. From there, we enter the while loop.