Graph Algorithms
The following materials are notes on graph algorithms. They assume knowledge of basic graph theory.
Introduction
Graph algorithms pervade modern life: Maps, web links, circuits, schedules, matching, computer networks, road systems, program structures, company hierarchies, are just a few examples. In fact, trees and linked lists are just graphs. Any connected data structure is the realization of a mathematical graph. The most common graphs in computer science, however, simple graphs.
graph. A graph is a structure The tuple is a pair of two sets. The set is a set of distinct objects called vertices or nodes, and the set is a set of distinct pairs called edges such that, if then For all edges we may denote an edge with a variable and describe it as, "The edge connecting and "
Later, we will include more operations (e.g., operations on vertices). Importantly, a practical API would also include some method of removing parallel edges. We will define these operations more elaborately as we proceed. In analyzing graph algorithms, we assume that β the proportion of vertices to edges for some graph β is bounded above by a some constant For this first section, we'll only be working with sets of edges. An edge is defined as follows.
Implementations
The graph definitions provided previously are abstractions of implementation details. Accordingly, it's at our discretion how the implementation should be handled. For some algorithms, we might want an edge implementation, in which case we will use the following definition.
- structure contains
-
- return
- end
Adjacency Matrices
One approach is to use an adjacency matrix, a nested Boolean array.
0 | 1 | 0 | 1 | 1 | |
0 | 0 | 1 | 0 | 0 | |
0 | 0 | 0 | 1 | 0 | |
0 | 0 | 0 | 0 | 1 | |
0 | 0 | 0 | 0 | 0 | |
0 | 1 | 0 | 0 | 0 |
Here, the leftmost column represents the edge sources, and the topmost row represents destinations. The edge for example, connects vertex (the source) and vertex (the destination), so we mark the corresponding cell as
As is usually the case with nested arrays, there are costs and benefits to this approach. As nested arrays, adjacency matrices are Simple to implement and maintain (and are conducive to linear algebra applications). They're also particularly efficient, compared to other approaches, for dense graphs. Moreover, for weighted graphs, the time complexity of edge weight lookup is On the other hand, instantiating the structure takes time, and if is small enough, memory allocation calls will likely predominate. They also require space, and iteration takes time.
Adacency Lists
An alternative approach is to use an adacency list: Use a connected data structure (typically, a hash table), where each key is a node (the source-nodes of an edge), and each value is an array of nodes (the end-nodes of the edges).
For algorithms involving adjacency lists, we will use the structure below.
structure contains
- variables
- constructor
-
- return
-
- binary operations
- init
-
add a vertex
-
add an edge
-
remove an edge
- if then
- while
- if then
- if then
- while
- if then
- if then
-
remove vertex
- if then
-
- if then
-
- if then
- init
In general: Adjacency lists are particularly space efficient for for space graphs. On the other hand, they become much less space efficient for denser graphs because of their connected nature. In contrast to the adjacency matrix, iterating over is much faster, but because edges are represented as key-array pairs, we have an edge weight lookup of Adjacency lists are also slightly more complicated to implement and maintain, and aren't nearly as conducive to linear algebra applications as adjacency matrices.
Edge Lists
Yet another representation is the edge list. This is little more than a list (or array) of edges:
As we can likely tell, this approach is rarely used in practice. There isn't any notion of ordering, and edge lookup comes at That said, they're often used as auxiliary data structures for certain algorithms, and we'll seem repeatedly.
Common Problem Patterns
Whenever we work with graphs, we want to always ask the following set of questions:
- Is the graph directed or undirected?
- Are the edges weighted?
- Is the graph sparse or dense?
- Should we use an adjacency matrix, adjacency list, edge list, or some other structure?
Additionally, it's helpful to spot some common problem patterns that are squarely in the domain of graph theory.
shortest path problem. Given a graph with vertices and find the shortest path from to
The shortest path problem is addressed by numerous algorithms. For unweighted graphs, we have breadth-first search and depth-first search. For weighted graphs, we have Dijkstra's Algorithm, A*, the list goes on. All of these algorithms have different tradeoffs. Some problems, however, ask us to just find a path.
connectivity problem. Given a graph with vertices and is there a path between and
This problem appears benign on its face, until we're asked to find a path between two nodes among millions of interconnections.
negative cycle problem. Given a weighted graph is there a negative cycle?
A negative cycle occurs when an edge has a negative value. This can lead to wrong computations involving edge weights. This is particularly problematic, given the fact that the results of those computations are often used as intermediary results for other graph algorithms (e.g., a shortest path solution). In other applications, negative cycles are desirable. For example, FOREX trading applications have a particular interest in these cycles, as negative edge weights can be interpreted as measures of risk reduction. To common algorithms for this problem are the Bellman-Ford Algorithm and the Floyd-Warshall Algorithm.
strong-connectivity problem. Let be a directed graph with self-contained cycles with For all self-contained cycles is there directed path from a vertex to every other vertex
The strong-connectivity problem usually arises as an intermediary problem in implementing some of the algorithms mentioned thus far. Popular solutions include Tarjan's Algorithm and Kosaraju's Algorithm.
traveling salesman problem. Given a weighted graph what is the shortest Hamiltonian cycle of
In plain English: We're given a list of addresses. The addresses are separated by varying distances β some are closer to others, others farther. Our task: Visit every address exactly once, then return to the address we started on. What's the shortest route to accomplishing this task correctly? This problem is NP-hard. Some common algorithms include the Held-Karp Algorithm, Branch-and-bound, and many approximation algorithms.
edge-cutting problem. Let where and a graph with connected components. Suppose and that implies What is the value of
vertex-cutting problem. Let where and a graph with connected components. Suppose and that implies What is the value of
When we remove an edge from some graph we potentially end up with two or more connected components of The edge-cutting problem asks which edge, when removed, causes the number of connected components to increase beyond some tolerance maximum Edges that causes this increase are called bridges.
The vertex-cutting problem is similar to the edge-cutting problem, but the focus here is on cutting edges. The question then, is, which vertex, if cut, would cause an increase to the tolerance maximum Vertices that cause these increases are called articulation points. Both the edge- and vertex-cutting problems are particularly important because they reveal bottlenecks and vulnerabilities in a given graph.
minimum spanning tree problem. Let be a weighted graph with a total edge weight Let be a set of graphs such that for all each is a graph with the vertex set and an edge set and a total edge weight Construct the graph such that
In plain English: We're given a weighted graph We want to create a version of that includes all its vertices, but with the smallest possible total weight (the sum of all the edge weights in ). Common algorithms for this problem include Kruskal's Algorithm, Prim's Algorithm, and Boruvka's Algorithm.
Depth First Search (DFS)
Depth First Search (DFS) is an algorithm for traversing a graph's nodes and edges. The algorithm has a time complexity of order The algorithm itself is often used as an auxiliary, so it's often used on its own. Instead, it's often augmented to another algorithm β searching, counting, deleting, etc. The name "depth first" comes from the algorithm plunging into a graph without regard to which edge it traverses next. It's primary driver is a check: Has this node been visited? If the node has been visited, the algorithm backtracks.
For tracking, we'll fill in the node we're currently on green. After a node is visited, we'll fill the visited node grey. First, we pick any node to start. This is where the name depth first comes from: We plunge into a graph without regard of where we might go next. Here, we'll start with the node | |
We mark visited, and plunge. There's only one option, so we visit this node next. | |
Mark visited and plunge. We only have one option again, so visit | |
Now we have two options, and Again β plunge. It doesn't matter which node we visit. We just dive into the deep end. We'll take purely for illustrative purposes. | |
At we have many options, but again, we pick arbitrarily. We'll choose | |
We only have one option, so we go there. | |
This is where things get interesting. We only have one option, But remember, we don't want to "revisit" nodes. So, we backtrack to To denote backtracking, we'll fill the nodes beige and the edges red. Backtracking to 7, we still have edges to choose. | |
Back at 7, we still have edges to choose. We pick | |
We pick another node arbitrarily. We'll go | |
We hit a deadend, so we backtrack to and pick another node. We only have one option, so we visit that node. | |
Now visit | |
We can't visit since we've already visited that node, so we backtrack to Once we're there, the only node we can visit is Visting that node, we've traversed the entire graph. |
Below are two possible implementations of DFS, one recursive, the other iterative.
rdfs
- Input: a graph a vertex and a function operating on
- Output: the graph
- function
- if return
- foreach
- if then
- return
- if then
- return
idfs
- Input: a graph a vertex and a function operating on
- Output: the graph
- init
-
current vertex
- while
- foreach
- if then
- if then
- return