Complexity Analysis

This section provides a general overview of algorithmic analysis in programming terms. We present the motivating question, its answer, and illustrations of the answer applied. The material that follows assumes a basic understanding of functions, limits, sequences, and series (i.e., comfort with ideas from a basic calculus and discrete mathematics course). For a thorough mathematical justification of algorithmic analysis, consult a dedicated text on the analysis of algorithms.

The most pertinent concern in analyzing algorithms is efficiency. One measure of efficiency is the amount of time an algorithm takes to execute. That is, how long a given algorithm takes to solve a particular problem. To compare algorithms meaningfully, however, we need some agreed-upon method for quantifying "speed." We can't rely on timing an algorithm from start to finish if we want to compare algorithms across different machines, since every machine is different. Solving some differential equation on a small mobile phone will be slower than running the same computation on a supercomputer. Although it may be useful to know how fast an algorithm runs on one particular device, that's not the principal concern in the analysis of algorithms. We want generality.

We often also want to know the amount of space an algorithm takes to execute. For example, algorithm A{A} might be faster than algorithm B{B} but require a significant amount of memory. Space complexity can then serve as a tie breaker in choosing one algorithm over another. Perhaps algorithms C{C} and D{D} take the same amount of time, but C{C} consumes less memory.

So how do we determine the amount of time or space an algorithm takes? One way is to count the number of basic steps performed as a function of n,{n,} where n{n} is the input size. These basic steps are listed below:

operation | syntax
declaration | {\textbf{init} ~ a}
assignment | {a \gets b}
comparisons | {a \le b, a \lt b, a \ge b, a \gt b}
addition | {a + b}
subtraction | {a - b}
multiplication | {a \by b}
real division | {\dfrac{a}{b}}
integer division | {a / b}
remainder | {a \rem b}
bitwise-not | {\bnot x}
bitwise-and | {a \band b}
bitwise-or | {a \bor b}
bitwise-xor | {a \bxor b}
bitwise-left-shift | {a \bls b}
bitwise-right-shift | {a \brs b}
goto | {\goto{x}}
return | {\textbf{return}~x}

example. Consider the summation algorithm below, which computes 0i<ni.{\tsum{0 \le i \lt n}{}{i}.}

  1. sum0{\let{sum}{0}}
  2. for (i0{\let{i}{0}}, i<n{i \lt n}, increment i{\df{increment}~i})
    1. sumsum+i{\let{sum}{sum + i}}
  3. return sum{sum}
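
For readers who prefer real code, here is a direct Python translation of the pseudocode above. It's a minimal sketch: the variable name total stands in for sum (which Python reserves for a built-in function), and the for loop plays the role of the init-guard-increment structure.

def summation(n):
    total = 0                 # sum <- 0
    for i in range(n):        # i <- 0; guard i < n; increment i
        total = total + i     # sum <- sum + i
    return total              # return sum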

Counting the number of operations:

operation | count | remark
{\let{sum}{0}} | {1} | Performed exactly once.
{\let{i}{0}} | {1} | Performed exactly once.
{i \lt n} | {n + 1} | Comparison: evaluated before each iteration and once more when the loop exits. It's true {n} times and false once.
{sum + i} | {n} | Addition: performed only if the guard returns true.
{\let{sum}{sum + i}} | {n} | Assignment: performed only if the guard returns true.
{\df{increment}~i} | {n} | Increment: performed only if the guard returns true.
{\textbf{return}~sum} | {1} | Performed exactly once.

Summing the number of executions for each operation, denoted t(n),{t(n),} we get:

t(n)=1+1+(n+1)+n+n+n+1=2+(n+1)+3n+1=3+(n+1)+3n=3+n+1+3n=3+1+n+3n=4+4n=4(1+n)=4(n+1) \eqs{ t(n) &= 1 + 1 + (n+1) + n + n + n + 1 \\ &= 2 + (n+1) + 3n + 1 \\ &= 3 + (n+1) + 3n \\ &= 3 + n + 1 + 3n \\ &= 3 + 1 + n + 3n \\ &= 4 + 4n \\ &= 4(1+n) \\ &= 4(n+1) \\ }

The final expression gives us the algorithm's runtime function as a closed form expression: t(n)=4(n+1).{t(n) = 4(n+1).} This is still just an estimate. If we wanted to be even more accurate, we would count the number of JUMP instructions executed by the loop. And if we wanted to be even more accurate, we would count the number of accesses the CPU made to RAM to retrieve data not found in its registers. And if we wanted to be even more accurate still, we would count the number of steps the CPU took to check its processor registers, the number of steps needed to run through the cache hierarchy, and so on. The point is, counting steps will always be an estimation at best. This doesn't mean that step counting is a useless exercise. In fact, we'll often use step counting to build a base case or to build some intuition. Our final description, however, will be a complexity function.
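
As a sanity check on this closed form, the following Python sketch runs the summation algorithm while tallying the same categories of basic steps counted in the table above; for every n,{n,} the tally should come out to 4(n+1).{4(n+1).}

def summation_step_count(n):
    steps = 0
    steps += 1              # sum <- 0
    total = 0
    steps += 1              # i <- 0
    i = 0
    while True:
        steps += 1          # guard: i < n
        if not (i < n):
            break
        steps += 1          # addition: sum + i
        steps += 1          # assignment: sum <- sum + i
        total = total + i
        steps += 1          # increment i
        i += 1
    steps += 1              # return sum
    return total, steps

for n in (0, 1, 10, 1000):
    total, steps = summation_step_count(n)
    assert total == sum(range(n))       # the algorithm is correct
    assert steps == 4 * (n + 1)         # the step count matches t(n)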

Complexity Functions

To understand complexity analysis notations, we must understand how functions grow. In the preceding examples, we stated no requirements about the functions we used to model efficiency. At this point, we've seen enough to handle formality.

complexity function. Let f{f} be a function from R{\reals} to R{\reals} with xR.{x \in \reals.} Then f{f} is a complexity function if, and only if, the following axioms are true: (1) There exists a number nR+{n \in \reals^+} such that, if xn,{x \ge n,} then f(x)R.{f(x) \in \reals.} (2) There exists a number mR+{m \in \reals^+} such that, if xm,{x \ge m,} then f(x)0.{f(x) \ge 0.}

Complexity analysis — for our purposes — is limited to the class of functions above. We do so for several reasons. First, resources like time and memory are easiest to interpret in terms of positive numbers. Second, we could make things even easier by limiting ourselves to the natural numbers, but then we'd exclude a swath of useful functions — f(n)=lgn,{f(n)=\lg n,} f(n)=n,{f(n)=\sqrt n,} and even the sinusoids f(n)=cos(n),f(n)=sin(n),etc.{f(n)=\cos(n), f(n)=\sin(n), etc.} So, we restrict complexity analysis to real-valued functions.

Requirement (1) in the definition ensures that we won't run into functions that never get defined on the reals when given positive arguments. For example, f(n)=n.{f(n) = \sqrt{-n}.} Requirement (2) is similar in spirit; we don't want to work with functions that never become positive. It's perfectly fine for the function to return negative values (e.g., f(n)=lgn{f(n) = \lg n}), but it's not ok for the function to always return negative values (e.g., f(n)=n2{f(n) = -n^2}).
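
To see the definition in action, here is a quick check of both requirements using the examples just mentioned. For {f(x) = \lg x,} taking {n = 1} satisfies (1), since {\lg x} is a real number for every {x \ge 1,} and taking {m = 1} satisfies (2), since {\lg x \ge 0} for every {x \ge 1.} By contrast, {f(x) = -x^2} fails (2): no matter which {m \in \reals^+} we pick, we have {-x^2 \lt 0} for every {x \ge m.}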

Big O

big o. Given the complexity functions f:RR{f: \reals \to \reals} and g:RR,{g: \reals \to \reals,} if there exists a constant CR+{\C \in \reals^+} and a threshold tR{t \in \reals} such that, for all xt,{x \geq t,} both f{f} and g{g} are defined with f(x)Cg(x),{\abs{f(x)} \le \C \cdot g(x),} then there exists a class of functions called big O of g,{g,} denoted O(g),{\bigO{g},} with fO(g).{f \in \bigO{g}.}

This definition has a few moving parts. But before we examine those parts in detail, let's get a big picture view first. The letter O{{\cal{O}}} comes from the German word Ordnung, meaning "order," as in an order of approximation. This gives us a hint as to what O{\cal{O}}'s purpose is.

Next, it's helpful to recall what a function is: It's a set of tuples. Given that we're working with complexity functions, we're looking at sets of pairs — objects of the form (x,y),{(x,y),} where x{x} and y{y} are real numbers. Say we had two complexity functions, f:RR{f: \reals \to \reals} and g:RR.{g : \reals \to \reals.} Both of these functions have the same domain and the same codomain, and again, both are sets of pairs. Suppose f{f} contains a pair (x0,y){(x_0,y)} and g{g} contains a pair (x0,b).{(x_0,b).} We're using y{y} instead of f(x0){f(x_0)} and b{b} instead of g(x0){g(x_0)} to make things clearer. Because y{y} and b{b} result from the same argument x0{x_0} and both y{y} and b{b} are real numbers, we can compare them. And if we did, the trichotomy law tells us that there are three possible outcomes, exactly one of which is true:

Case 1 | Case 2 | Case 3
{y \lt b} | {y = b} | {b \lt y}

Now, what big O says is this: If we ever see that case 1 is true and stays true for all the x{x}s after x0,{x_0,} or if we ever see that case 2 is true and stays true for all the x{x}s after x0,{x_0,} then we can use g{g} as an approximation of f.{f.} (The constant C{\C} in the definition gives us a little extra slack: we're allowed to scale g{g} by a positive constant before making the comparison.) Why can we treat g{g} as an approximation? Because g{g}'s outputs eventually bound f{f}'s outputs from above.

So why exactly do we write fO(g)?{f \in \bigO{g}?} Because we want to emphasize that g{g} is only an approximation of f.{f.} The notation O(g){\bigO{g}} represents a collection of functions — called a class of functions — whose y{y}s are, at some point, always less than or equal to a constant multiple of g{g}'s b{b}s. Writing fO(g){f \in \bigO{g}} is akin to hedging our bets — we're not saying that f{f} is g,{g,} we're saying that f{f} is just one of the many functions covered by g.{g.} That second half is why we describe this as hedging — there may actually be another function h{h} that's a much better approximation. That is, fO(h){f \in \bigO{h}} and O(h)O(g).{\bigO{h} \subseteq \bigO{g}.} For example, if f(x)=x,{f(x) = x,} then f{f} belongs to both O(x2){\bigO{x^2}} and O(x3),{\bigO{x^3},} but O(x2)O(x3),{\bigO{x^2} \subseteq \bigO{x^3},} so x2{x^2} is the tighter approximation. To illustrate, consider the following function graphs:

(figure: graphs of the functions plotted over {0 \le x \le 6} with {0 \le y \le 80}.)

example. Let f(x)=x2{f(x)=x^2} and g(x)=x3.{g(x)=x^3.} Then fO(x3),{f \in \bigO{x^3},} because if x>2,{x \gt 2,} then x3>x2.{x^3 \gt x^2.}

example. A function's definition alone does not always settle the comparison; the constant C{C} gives us room. Given f(x)=x2+1{f(x) = x^2+1} and g(x)=x2,{g(x)=x^2,} if C=2{C=2} and t=2,{t=2,} then 2x2>x2+1{2x^2 \gt x^2 + 1} for all xt.{x \ge t.} Or, more explicitly, x2+x2>x2+1,{x^2 + x^2 \gt x^2 + 1,} since x2>1{x^2 \gt 1} whenever x2.{x \ge 2.} Thus, we have fO(x2){f \in \bigO{x^2}} when f(x)=x2+1{f(x) = x^2 + 1} and g(x)=x2.{g(x) = x^2.}
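
The witnesses {C = 2} and {t = 2} can also be checked numerically. The following Python sketch samples values of {x} at and beyond the threshold and confirms that {\abs{f(x)} \le C \cdot g(x)} holds at every sampled point. This is a spot check, not a proof; the algebraic argument above is what actually establishes the membership.

# Spot-check the big O witnesses C = 2, t = 2 for f(x) = x^2 + 1 and g(x) = x^2.
def f(x):
    return x**2 + 1

def g(x):
    return x**2

C, t = 2, 2
xs = [t + 0.5 * k for k in range(2000)]   # sample x = 2.0, 2.5, 3.0, ..., 1001.5
assert all(abs(f(x)) <= C * g(x) for x in xs)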

Having seen some examples, it's helpful to put the complexity functions in perspective. We do so with the table below. The times correspond to the approximate amount of time needed to completely execute an algorithm of order T,{T,} assuming the algorithm performs roughly the number of operations indicated by T{T} and the machine sustains the stated number of operations per second (ops/s). Times below about one second are marked "instant."

{T} | {\tx{ops/s}} | {n=10^6} | {n=10^{9}} | {n=10^{12}}
{\bigO{1}} | {10^6} | instant | instant | instant
 | {10^9} | instant | instant | instant
 | {10^{12}} | instant | instant | instant
{\bigO{\lg n}} | {10^6} | instant | instant | instant
 | {10^9} | instant | instant | instant
 | {10^{12}} | instant | instant | instant
{\bigO{\sqrt{n}}} | {10^6} | instant | instant | ~1 second
 | {10^9} | instant | instant | instant
 | {10^{12}} | instant | instant | instant
{\bigO{n}} | {10^6} | ~1 second | ~17 minutes | ~12 days
 | {10^9} | instant | ~1 second | ~17 minutes
 | {10^{12}} | instant | instant | ~1 second
{\bigO{n \lg n}} | {10^6} | ~20 seconds | ~8 hours | ~1.3 years
 | {10^9} | instant | ~30 seconds | ~11 hours
 | {10^{12}} | instant | instant | ~40 seconds
{\bigO{n^2}} | {10^6} | ~12 days | ~32,000 years | ~32 billion years
 | {10^9} | ~17 minutes | ~32 years | ~32 million years
 | {10^{12}} | ~1 second | ~12 days | ~32,000 years
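
The entries above follow from simple arithmetic: divide the approximate number of operations by the machine's operations per second. The Python sketch below reproduces the calculation; the growth-rate formulas and the one-second cutoff for "instant" are the assumptions being made.

import math

def approx_seconds(ops, ops_per_sec):
    """Approximate runtime in seconds for a given operation count."""
    return ops / ops_per_sec

growth_rates = {
    "O(1)":      lambda n: 1,
    "O(lg n)":   lambda n: math.log2(n),
    "O(sqrt n)": lambda n: math.sqrt(n),
    "O(n)":      lambda n: n,
    "O(n lg n)": lambda n: n * math.log2(n),
    "O(n^2)":    lambda n: n**2,
}

for name, ops in growth_rates.items():
    for rate in (10**6, 10**9, 10**12):
        for n in (10**6, 10**9, 10**12):
            secs = approx_seconds(ops(n), rate)
            label = "instant" if secs < 1 else f"{secs:,.0f} seconds"
            print(f"{name:<10} {rate:>14,} ops/s  n={n:<16,} {label}")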