Binary Trees
Binary trees are a core data structure behind much of the software products and services we use today: The most common use case is the binary search tree, used in numerous search applications where data is constantly ebbing and flowing in a given system. Binary space partitionining is used in 3D video games for determining which objects must be rerendered. Binary tries are used in high-bandwidth routers for storing router-tables, large address books that routers use to direct internet traffic. Huffman coding trees are binary trees used in compression algorithms, most notably the .jpeg and .mp3 file formats. The list goes on.
Why are binary trees more commonly used than ternary trees or other -ary trees? Simply because -ary trees are much more complicated. We can do many interesting things with -ary trees, but for most applications, we gain little in terms of time savings. And as we know, speed is everything when it comes to data structures. Of course, -ary trees outperform binary trees in certain situations, but the fact is, binary trees are easier to work with and cover more problem domains.
Definitions
Binary trees are a special kind of tree. They consist of nodes in a parent-child relationship. All binary trees have a single rootβwhat we might think of as the first nodeβfrom which all other nodes descend. Moreover, binary trees have a two-child policyβgo to any node, and we'll find either zero, one or two children, nothing more. Because of these properties, we say that all binary trees are trees, but not all trees are binary trees.
We also have the usual arboreal terms. The root is the top, or first, node in the tree. A child is a node directly descending from a node. The parent of a node is the node descends from. Siblings are nodes with the same parent. A leaf is a node with no children. A branch node is a node with children. An branch is the connection between a child node and its parent.
Static Binary Trees
As its name implies, an array-implemented binary tree (ABT) is a binary tree implemented, and represented, with an array. First, we have the following binary tree, with an accompanying empty array:
The challenge: How do we store each node in this binary tree in the array
such that the relationships between the nodes are preserved? For example,
B
is the left child of A
, and C
is the right child of A
. How is
that information preserved in an array implementation?
Here's an idea: Use the array's indices. Let's start by storing the
elements from the top generation to the bottom generation, left to right.
At we have a single node, A
. So we store in the array:
At we have two nodes, B
and C
, so we store, from left to
right, B
then C
:
Then, we go on to and see that there are nodes β
D
, E
, F
, and G
. Again we store the nodes from left to right:
Let's construct a table to see if we can find a pattern. Suppose is an index in the array, is the index of the left child, and is the index of the right child.
Element | |||
---|---|---|---|
A | |||
B | |||
C | |||
D | |||
E | |||
F | |||
G |
Examining the table, we see that at the first generation, the element's left child is offset by and its right child is offset by For the left node on the second generation, the element's left child is offset by and its right child is offset by For the right node on the second generation, the element's left child is offset by and its right child is offset by Examining these mappings we that, to get the left child, we use:
and to get the right child, we use:
These two formulas allow us to access the left and right children. But what about the parents? Well, let's construct another table, where is the index of the given element's parent.
Element | ||
---|---|---|
A | ||
B | ||
C | ||
D | ||
E | ||
F | ||
G |
Examining this table, we see that given an element with the index the index of its parent is:
This is a fairly useful piece of information, so we state it as a lemma:
lemma. Let be an array representing an array-implemented binary tree, and an index in the array. Given a branch node the left child of has the index where
and the right child of has the index where
Dynamic Binary Trees
The linked-list-implemented binary tree (LBT) looks similar to a
general linked list in implementation. Usually, there's a root pointer, a
pointer in the stack whose pointee is the root of the binary tree. For
the binary tree, each node has next fields β one for a
leftChild
pointer and one for the rightChild
pointer, and data
field. Furthermore, each node in the tree, other than the root node, is
the lone pointee of another node.
Above, we see that each node is a square comprised of one left compartment and one right compartment, beneath a larger compartment. The larger compartment is called the data field, and stores the substantive data. The left compartment houses the left subtree pointer, and the right compartment houses the right subtree pointer.
For example, suppose we had the following binary tree:
This tree would be implemented as:
If we look at the representation, we see that there are nodes, corresponding to the elements we seek to store. Notice that with nodes, there are null pointers. If there are nodes, then are null pointers.
To begin, we need to implement the basic node data structure, as we saw with linked lists, along with its constructor in pseudocode:
- structure contains
- end
We'll define the binary tree data structure separately:
- structure contains
- end
Node Insertion
Below is one approach to inserting a new node into a binary tree, using the structural definitions for node and binary-tree provided previously.
Binary Tree Node Insertion
- Identifier: insert
- Preconditions:
- Structure is defined.
- Structure is defined.
- Input:
- , the insertion's targeted tree.
- , the inserted node's data.
- Output:
- if then
- init
- else
- init
- init
- while do
- if then
- else if then
- else return
- if then
- else
- return
The procedure here is fairly straightforward. insert takes two arguments, and is the tree the new node will be inserted into, and is the accompanying data value. By definition, is either null or nonnull. This yields a clear branch in the procedure: One branch handles the null case, the other the nonnull case.
We handle the first case starting at line zero and ending at line two. If the root of is undefined, then we instantiate a new node initialize its data value to and assign as the root. Note that in languages like C, line one may be a cause for concern. As is, the procedure above makes no checks for potential memory allocation failures.
The second case is where the tree argument is not null. insert handles this starting at line three. This part of the procedure is a bit more involved. First, we need two nodes and We bind the 's root to and to Since we only entered the branch because was nonnull, we can infer that starts with some nonnull value. From there, we enter the while loop.