Number Theory for CS

Numeric Algorithms

This article presents some of the key algorithms associated with mathematical computations. These include prime-finders, factorization, numeric approximations, and bit manipulations.

Number Representation

Question: How many symbols do we need to represent any number? There are infinitely many numbers, but we cannot have infinitely many symbols. Thus, we need a finite number of symbols that allows us to represent every conceivable number. For most of the world, that finite number is ten:

0 1 2 3 4 5 6 7 8 9 \huge 0~1~2~3~4~5~6~7~8~9

Number systems that use ten symbols to represent all numbers are called decimal number systems, or base-10 systems. In the Western decimal number system, ten symbols (called the Hindu-Arabic numerals) are used to represent every conceivable number. The number of unique symbols used in a numbering system to represent a number in that system is called the base or radix of the numbering system. For example, with the decimal system, the radix is 10.{10.} For the ancient Mayans, the radix was 20{20} (called a vigesimal system). For the Babylonians, the radix was 60{60} (called a base-60 system).

To understand how the base works, we'll consider the decimal system. First, we have enough symbols to represent the numbers zero through nine:

0 1 2 3 4 5 6 7 8 9 \huge 0~1~2~3~4~5~6~7~8~9

The symbols above, however, hide a critical fact: They all have an infinite amount of "slots" or "places", immediately to the left, occupied by a zero:

010203040506070809 \large \begin{aligned} & \ldots 01 \\ & \ldots 02 \\ & \ldots 03 \\ & \ldots 04 \\ & \ldots 05 \\ & \ldots 06 \\ & \ldots 07 \\ & \ldots 08 \\ & \ldots 09 \\ \end{aligned}

When we reach the number nine, we've run out of unique symbols. So to represent the number ten, we go to the slot on the left (occupied by a zero), and increment it by 1:{1:}

010203040506070809101119 \begin{aligned} & \ldots 01 \\ & \ldots 02 \\ & \ldots 03 \\ & \ldots 04 \\ & \ldots 05 \\ & \ldots 06 \\ & \ldots 07 \\ & \ldots 08 \\ & \ldots 09 \\ & \ldots 10 \\ & \ldots 11 \\ & \vdots \\ & \ldots 19 \end{aligned}

Once we get to nineteen, we increment the slot again:

1011121314151617181920 \begin{aligned} & \ldots 10 \\ & \ldots 11 \\ & \ldots 12 \\ & \ldots 13 \\ & \ldots 14 \\ & \ldots 15 \\ & \ldots 16 \\ & \ldots 17 \\ & \ldots 18 \\ & \ldots 19 \\ & \ldots 20 \end{aligned}

Because there are ten symbols, and for each of these ten we can assign one of the ten symbols immediately to its left, we have 10×10=102=100{10 \times 10 = 10^2 = 100} unique symbols. This allows us to represent numbers from zero (0{0}) to ninety-nine (99{99}).

Once we reach 99,{99,} we've run out of symbols for the second slot, so we turn to the third slot: 100.{100.} Now we have 10×10×10=103=1000{10 \times 10 \times 10 = 10^3 = 1000} unique symbols. This allows us to count from zero to nine hundred and ninety-nine (999{999}). Once we've reached 999,{999,} the process continues. We go to the fourth slot, giving us 104{10^4} unique symbols. Once that's run out we go to the fifth—105{10^5} unique symbols; then the sixth—106{10^6} unique symbols; seventh—107;{10^7;} eighth—108;{10^8;} ad infinitum.

So, we have a way to generate unique symbols. But using this approach, we have a problem: How do we ensure that someone seeing 67{67} doesn't interpret it as 6{6} and 7?{7?} This is accomplished by everyone agreeing to intrepret the second position as indicating how many times we've used up symbols for the first position. For example, in the case of 67,{67,} we interpet the 6{6} as: "We've run through the symbols 0 1 2 3 4 5 6 7 8 9{0~1~2~3~4~5~6~7~8~9} six times in the first position." With the 7{7} included, the number 67{67} translates to "Six of the ten symbols and seven of the symbols":

(6×10)+7 (6 \times 10) + 7

When we see 826,{826,} we think: "Eight of ten of the ten symbols, two of ten of the symbols, and six of the symbols:"

(8×10×10)+(2×10)+6 (8 \times 10 \times 10) + (2 \times 10) + 6

This process continues over and over, embodying the following pattern:

(si×10k)(s3×103)+(s2×102)+(s1×101)+s0 (s_i \times 10^k) \ldots (s_3 \times 10^3) + (s_2 \times 10^2) + (s_1 \times 10^1) + s_0

where k{k} and si{s_i} are positive integers. Each of the terms above is called a place, and they are multiples of ten. For example, s2×102{s_2 \times 10^2} is the tenth's place, s3×103{s_3 \times 10^3} is the hundredth's place, and so on. The term s0{s_0} is often called the one's place, and it is mathematically denoted s0×100:{s_0 \times 10^0:}

(si×10k)(s3×103)+(s2×102)+(s1×101)+(s0×100) (s_i \times 10^k) \ldots (s_3 \times 10^3) + (s_2 \times 10^2) + (s_1 \times 10^1) + (s_0 \times 10^0)

Octal Number System

Although the decimal system allows us to write large numbers with fewer symbols, the more symbols we use, the more computations we must perform and the more likely we are to commit computational errors. Inversely, the fewer symbols we need, the less computations we must perform, and the less likely we are to commit computational errors. The tradeoff, however, is readability. With fewer digits, larger numbers are long and difficult to decipher. Nevertheless, some problems (as we'll later see) do not require humans to directly read the numbers, so we've gone ahead and used number systems with fewer symbols. One such system is the octal number system.

To understand the octal number system, it's worth mentioning a bit of its history. Before the French Revolution, most of Europe used a dizzying array of inconsistent measurement systems. At the turn of the nineteenth century, global trade increased dramatically, and the ancient units proved costly for international transactions. The French merchant would appear in court arguing he received less than the agreed upon amount, 1{1} pied du roi (the length of the King's foot, about 32.48{32.48} cm), with the German merchant responding that he gave the correct amount, 1{1} Fuß (possibly 23.6{23.6} cm; there were competing values of Fuß).

In 1799, the Académie des sciences, a French learned society known for solving scientific and mathematical problems, tasked a panel of scientists and mathematicians to remedy these problems.1 Their solution: a decimal system of measurement relying on ten symbols, using prefixes like deci, centi, milli, deca, hecto, kilo, and several others. As we can guess, we know this system today as the metric system.

Across the Channel, the English had their own system and its proponents. One such defender was the Scottish economist James Anderson. Weary of the costs of changing a centuries-old measurement system, Anderson advocated for an alternative more closely aligned with the English system. Instead of relying on ten symbols, Anderson's system relied on eight:

0 1 2 3 4 5 6 7 \huge 0 ~ 1 ~ 2 ~ 3 ~ 4 ~ 5 ~ 6 ~ 7

Likely inspired by the Latin octo, meaning "eight," Anderson called his alternative the octal system. Knowing the basics of number systems, we see that in the octal number system, the radix is eight—a base-8 number system. And with eight symbols, we can represent the numbers zero through seven. Like what we saw with the decimal system, we run out of symbols once we reach 7.{7.} To represent the number eight, we add a new symbol to the left, and start the symbol to the right back at 0:{0:}

10 \huge 10

Thus, in the octal number system, the number eight is represented as 10.{10.} Nine is 11,{11,} ten is 12,{12,} eleven is 13,{13,} and so on. Once we reach 17,{17,} the number fifteen, we increment the number to the right and start at zero on the left: The number sixteen is 20.{20.}

Similar to what we saw with the decimal system, going from right to left, the first position indicates the one's, and the second position indicates how many times we've gone through the first position. The third position indicates how many times we've gone through the second, the fourth the third, the third the fourth, and so on. The difference, however, is that instead of using a base of ten, we use a base of eight:

(si×8k)(s3×83)+(s2×82)+(s1×81)+(s0×80) (s_i \times 8^k) \ldots (s_3 \times 8^3) + (s_2 \times 8^2) + (s_1 \times 8^1) + (s_0 \times 8^0)

Thus, the octal number 117{117} is actually:

(1×82)+(1×81)+(7×80)=79 (1 \times 8^2) + (1 \times 8^1) + (7 \times 8^0) = 79

Because octal numbers use the same symbols as the decimal numbers, we indicate them with subscripted brackets containing the base. For example, the octal number 117{117} is formally written as:

117[8] \Large 117_{[8]}

Alternatively (and in some computer languages), the number 117{117} would be indicated as:

0o117 \Large \texttt{0o117}

Following standard practice, we will use the former syntax in exposition, but the latter when referring to code. To convert quickly from decimal to octal, it's helpful to know the first ten powers of eight by heart:

81=882=6483=51284=409685=32 76886=262 14487=2 097 15288=16 777 21689=134 217 728810=1 073 741 824 \begin{aligned} &8^1 = 8 \\ &8^2 = 64 \\ &8^3 = 512 \\ &8^4 = 4096 \\ &8^5 = 32~768 \\ &8^6 = 262~144 \\ &8^7 = 2~097~152 \\ &8^8 = 16~777~216 \\ &8^9 = 134~217~728 \\ &8^{10} = 1~073~741~824 \end{aligned}

Primality Tests

Suppose we're given a number n.{n.} We want to answer the following question: Is n{n} a prime number?

Recall the definition for a prime number. n{n} is a prime number only if it satisfies the following propositions:

  1. n{n} is a natural number,
  2. n{n} is greater than 1,{1,} and
  3. n{n} consists of only two divisors: n{n} and 1.{1.}

We denote the set of primes with the symbol P.{\mathbb{P}.} The set below illustrates this, listing the first twenty-four prime numbers, with ellipses to indicate more:

P={23571113171923293137414347535961677173798389} \mathbb{P} = \left\{ \begin{matrix} 2 & 3 & 5 & 7 \\ 11 & 13 & 17 & 19 \\ 23 & 29 & 31 & 37 \\ 41 & 43 & 47 & 53 \\ 59 & 61 & 67 & 71 \\ 73 & 79 & 83 & 89 \\ \vdots & \vdots & \vdots & \vdots \end{matrix} \right\}

Algorithms that determine whether a given number n{n} is a prime number are called primality tests. These algorithms are immensely important in cryptography. Cryptography itself is what ensures sensitive data like passwords, text messages, emails, and credit card numbers stay secure as they are transmitted across networks.

Brute-force Primality Test

There are a variety of ways to implement a primality test, but we start with the simplest algorithm: a brute-force primality test. This algorithm is part of a family of algorithms called brute-force algorithms. These algorithms solve problems in the most simple, direct, or obvious way. For the prime number testing, the brute-force primality test is closely related to proof-by-exhaustion, a method of proof in mathematics. We exhaust all of the possibilities before we exhaust ourselves (or more accurately, our machine's RAM).

To implement the brute-force primality test, we study the mathematical definition:

definition. If a natural number n{n} is prime, then n{n} is greater than 1{1} and is not divisible by any positive integer other than 1{1} and n.{n.}

When implementing algorithms based on mathematical definitions, it's helpful to read the definitions in all directions. In this case, let's read the definition in terms of its contrapositive:

definition. If n{n} is less than or equal to 1{1} or is divisible by a positive integer other than 1{1} and n,{n,} then n{n} is not prime.

The contrapositive essentially lays out the brute-force algorithm. The implementation is simple: Using a for-loop, we go through all of the integers from 1{1} to n1,{n - 1,} verifying no number divides n{n} without a remainder. If there is such a number, that would imply that n{n} has a divisor other than 1{1} and itself. This violates the last requirement above, and n{n} is not a prime number. Otherwise, n{n} is a prime number. Implementing in pseudocode:

fn brute_force_primality_test(int n):
	if (n < 2): return "n is not prime";
	else:
	for (int i = 2; i < n - 1; i++):
		if (n % i == 0): return "n is not prime";
	return "n is prime";

With brute-force primality test, we must iterate through n1{n - 1} elements. As such, the running time function for this algorithm can be expressed as f(n)=n1.{f(n) = n - 1.} Applying asymptotic analysis, the brute-force primality test, in the worst-case scenario, has a running time of order O(n).{O(n).} This is linear time. Can we do better? Let's consider another algorithm.

Trial Division

The second primality test we investigate is trial division. This algorithm relies on the same definition above, but the key insight lies in honing in on what constitutes a composite number—a number that is not prime.

definition. If n{n} is composite, then the following proposition is true: n=ab\large n = a \cdot b where a,bN,{a, b \in \nat,} a2,{a \geq 2,} and bn1{b \leq n - 1}

With the brute-force primality test, we attempted to find the divisor a.{a.} The index i{i} represents the divisor a,{a,} and we iterated up to n1{n-1} checking to see if na=b.{\dfrac{n}{a} = b.} Now, given any two numbers p{p} and q,{q,} or in our case, a{a} and b,{b,} one will be less than or equal to the other. This proposition originates in the trichotomy law in real analysis. Suppose ab.{a \leq b.} This is a perfectly reasonable assumption to start with; we don't lose any generality in doing so.

So, what's next? This is where we get creative. We look at the facts we have so far and try to draw further facts. We assumed that ab.{a \leq b.} And before that, we saw that the composite number n{n} has the form ab.{a \cdot b.} Our inequality looks like ab,{a \leq b,} so what if we tried getting it to look like n=ab?{n = a \cdot b?} To do so, we multiply a{a} to both sides:

aaaba2ab \begin{aligned} a \cdot a &\leq a \cdot b \\ a^2 &\leq a \cdot b \end {aligned}

That's an interesting result. If n=ab{n = a \cdot b} and ab,{a \leq b,} we have:

a2n a^2 \leq n

Now if we took the square root of both sides:

a2nan \begin{aligned} \sqrt{a ^ 2} &\leq \sqrt{n} \\ a &\leq \sqrt{n} \end{aligned}

This is another big insight. If n=ab{n = a \cdot b} and ab,{a \leq b,} then an.{a \leq \sqrt{n}.} Given that a{a} must be less than n,{\sqrt{n,}} we don't have to search up to n1.{n - 1.} Instead, we only have to search up to n.{\sqrt{n}.} The implementation is thus:

trial_division_primality(int n):
	if (n < 2): return "n is not prime";
	else:
		for (int i = 2; i < (sqrt(n)); i++):
			if n % i == 0:
				return "n is not prime";
		return "n is prime";

With trial-division, the algorithm only takes n1{\sqrt{n} - 1} operations to execute. Hence, this algorithm has a time complexity of O(n),{O(\sqrt{n}),} or O(n1/2).{O(n^{1/2}).} This is fractional power time, which is faster than linear time. Note that the time complexity for this algorithm may vary depending on how sqrt(n) is implemented. To avoid these issues, we can instead establish the test condition as i*i < n rather than i < sqrt(n):

trial_division_primality(int n):
	if (n < 2): return "n is not prime";
	else:
		for (int i = 2; i*i < n; i++):
			if n % i == 0:
				return "n is not prime";
		return "n is prime";

Prime Factorization

Closely related to primality tests is prime factorization—expressing a number as a product of their prime factors. For example, the number 12{12} in terms of its prime factors is:

2×2×3=12 2 \times 2 \times 3 = 12

We can visualize prime factorization in terms of a tree. For example, the prime factors of 100.{100.} We start with 2,{2,} the first prime number. Since 100{100} is divisible by 2,{2,} we get 50:{50:}

Then we consider the prime factors of 50.{50.} We start again with 2,{2,} and we see that 50{50} is also divisible by 50:{50:}

Then we look at 25.{25.} We go back to testing 2.{2.} 25{25} is not divisible by 2,{2,} so we consider the next prime number, 3.{3.} 25{25} is not divisible by 3,{3,} so test the next prime number, 5.{5.} Here, 25{25} is divisible by 5:{5:}

Finally, we have 5,{5,} which leads to:

Once we've reached the factor 1,{1,} we know we've found the prime factors.

2×2×5×5=22×52=100 2 \times 2 \times 5 \times 5 = 2^2 \times 5^2 = 100

Implementing the procedure above in pseudocode:

int factors[100];
int exponents[100];
int length = 0;
fn primeFactors(int m) -> void:
	int d = 2;
	while (m > 1):
		int k = 0;
		while (m % d == 0):
			m = m / d;
			k++;
		if (k > 0):
			length++;
			factors[length] = d;
			exponents[length] = k;
		d++;

In the pseudocode above, the factors[] array will hold the prime factors we find, and the exponents[] array will hold the exponents we find. Let's run through the code above with a simple example. Suppose we want to find the prime factors of 24.{24.} First, we call primeFactors(24):

primeFactors(24):
	int m = 24;
	int d = 2;

We then reach while-loop-1. Here, the test condition is (m > 1). In this case, 24 > 1, so we proceed:

primeFactors(24):
	int m = 24;
	int d = 2;
		while (m > 1):
		int k = 0;

We then encounter our while-loop-2. The test condition here is (m % d == 0). In this case, (24 % 2 == 0) is true, so we proceed:

primeFactors(24):
	int m = 24;
	int d = 2;
	while (m > 1):
		int k = 0;
		while (m % d == 0):
				m = m / d; m = 24 / 2 = 12
				k++; k = 1

Given that m = 12, while-loop-2's condition is still true, so we execute again:

primeFactors(24):
	int m = 24;
	int d = 2;
	while (m > 1):
		int k = 0;
		while (m % d == 0):
				m = 12
				k = 1
				m = m / d; m = 12 / 2 = 6
				k++; k = 2

Once more our test condition is true, so we execute again

primeFactors(24):
	int m = 24;
	int d = 2;
	while (m > 1):
		int k = 0;
		while (m % d == 0):
				m = 6
				k = 2
				m = m / d;
				m = 6 / 2 = 3
				k++;
				k = 3

Going back to while-loop-2, the test condition returns false—(3 % 2 != 0). So, we go to the next line in while-loop-1. In this case, it's an if-statement, whose test condition is true:

primeFactors(24):
	int m = 24;
	int d = 2;
	while (m > 1):
		int k = 0;
		while (m % d == 0):
			m = m / d;
			k++;
			k = 3
			if (k > 0):
				factors[length] = d;
				factors[0] = 2
				exponents[length] = k;
				exponents[0] = 3
				length++ factors = 1
			d++; d = 3

Now m = 3, and we go back to while-loop-1's test condition. Here, we see that m > 1 == 3 > 1, which is true. So, we again execute while-loop-1's body. We set int k = 0, and reach while-loop-2. Here, we see that m % d == 0 is true—3 % 3 == 0. So, we proceed executing while-loop-2's body:

primeFactors(24):
	int m = 24;
	int d = 2;
		while (m > 1):
			int k = 0;
			while (m % d == 0):
				m = m / d; m = 3 / 3 = 1
				k++; k = 1

Now, with m = 1, while-loop-2's condition is false, so we exit while-loop-2 and continue with the rest of the statements in while-loop-1. We see that k > 0 ${\equiv}$ 1 > 0, so we execute the if-block:

primeFactors(24):
	int m = 24;
	int d = 2;
		while (m > 1):
			int k = 0;
			while (m % d == 0):
				m = m / d;
				k++;
			k = 1
			if (k > 0):
				length = 1
				factors[length] = d;
				factors[1] = 3;
				exponents[length] = k;
				exponents[1] = 1;
				length++ length = 2
			d++; d = 4

With m = 1, our while-loop-1's condition is false, and we have finished. With the data from these two arrays, we have:

24=23×31{24 = 2^3 \times 3^1}

Which is, in fact, the prime factorization of 24.{24.} Intuitive as this algorithm is, it performs poorly in terms of time complexity. The worst case scenario in this case is when m is a prime number. If we call primeFactors(23), for example, we would have to increment d{d} all the way up to 23.{23.} This leaves us with a time complexity of O(n).{O(n).}

Bitwise Operations

For most developers, the bitwise operations are the least used. They can, however, come in very handy with certain problems. To begin, let's review these operations:

OperationDescription
a & bBitwise-AND. Given two binary digits, returns 0 if any of the two digits are 0, otherwise 1.
a | bBitwise-OR. Given two binary digits, returns 1 if any of the two gits are 1, otherwise 0.
a ^ bBitwise-XOR. Given two binary digits, returns 1 if exactly one of the two digits is 1, otherwise 0.
~aBitwise-NOT. Given a single binary digit, returns 0 if the digit is 1, and 1 if the digit is 0.
a<<bBitwise-left-shift. Moves all of the bits to the left, with the position previously occupied by the least-significant bit filled with a 0.
a>>bBitwise-right-shift. Moves all of the bits to the right, with the position previously occupied by the most-significant bit filled with a 0.

Let's see some examples. For the bitwise-AND:

0000101&   00001110000101\begin{aligned} &0000101 \\ \texttt{\&{}}~~~ &0000111 \\ \hline &0000101 \end{aligned}

Notice that we get 1 if, and only if, we have 1 & 1. Otherwise, it's 0. Furthermore, if we converted the binary numbers above into decimal, we get:

5 & 7 = 5\texttt{5 \&{} 7 = 5} Now let's see the bitwise-OR:

0000101|   00001110000111\begin{aligned} &0000101 \\ \texttt{|{}}~~~ &0000111 \\ \hline &0000111 \end{aligned}

In contrast to the bitwise-AND, for the bitwise-OR we get 0 if, and only if, we have 0 | 0. Otherwise, get 1. Thus, we have:

5 | 7 = 7\texttt{5 | 7 = 7}

Closely related is the bitwise-XOR. This is essentially bitwise-OR, but with one restriction: We can't have two 1s. In other words, we get 1 if, and only if, we have exactly one 1:

0000101ˆ   00001110000010\begin{aligned} &0000101 \\ \texttt{\^{}}~~~ &0000111 \\ \hline &0000010 \end{aligned}

Thus, we have: 5 ˆ 7 = 2\texttt{5 \^{} 7 = 2}

Next, the bitwise-NOT. Unlike the previous operators, bitwise-NOT is a unary operator—it applies to only one operand. Thus:

˜   00001111111000\begin{aligned} \texttt{\~{}}~~~ &0000111 \\ \hline &1111000 \end{aligned}

This yields: ˜ 7 = 120\texttt{\~{} 7 = 120}

Footnotes

  1. Among the panelists were the mathematicians Joseph-Louis Lagrange, Pierre-Simon Laplace, and Nicolas de Condorcet.