The rationals "contain" the integers and the naturals, in the sense that given a rational number, we can get back the integer or natural by "unwrapping" the relevant equivalence class. The rationals are an ordered field — the equivalence classes have an inherent order, alongside operations that, when applied to the rationals, do not disturb their inherent ordering. That is, given x,y,z∈Q, if y<z, then x+y<x+z. The same goes for multiplication by a positive rational: given x>0, if y<z, then xy<xz. This all stems from the fact that the rationals are equivalence classes of pairs of integers, integers are equivalence classes of pairs of naturals, the naturals are inherently ordered because they're defined in terms of a successor set, and no successor set is a subset of its elements. (I.e., given a,b∈N, there are only three cases: a⊂b, a=b, b⊂a, which, by convention, we denote as a<b, a=b, b<a.)
The set of all rationals, however, is insufficient. We've seen this theme repeatedly. The naturals were insufficient, in the sense that there is no natural that satisfies x+1=0. So, we constructed the integers. Then we found that the integers were insufficient, as there is no integer that satisfies 2x=1. We patched that up with the rationals. Now we find that the rationals are insufficient, because they have a hole. Suppose a group of analysts set out to lay all of the rationals on a number line in order, Sisyphean as it may be. One day, the geometers, working on some right triangles, come in and ask for a rational of length x, where x²=2. The analysts look at their number line, and find that no such length exists!
But it must exist, because we see right triangles whose legs are each of length 1; by the Pythagorean theorem, the hypotenuse of such a triangle has length x satisfying x²=2.
This is why we need R, the set of all real numbers.
The set R, however, is unlike any of the other sets we've seen. We have no way of pinpointing where the hole is. So how do we fill in this hole if we have no way of knowing where the hole is? As it turns out, there are two possible approaches: Dedekind cuts (or, as this author likes to call them, King Dedede cuts) and Cauchy sequences.
Cauchy Sequences
So, the notion of a square root doesn't exist yet, but we want to find out what {x∈Q:x²=2} is. We know that 1²=1, and 1<2, so how about we try 1.1². This gives us (11/10)²=121/100=1.21. Still pretty far from 2, so let's try (1.2)². Here, we have (12/10)²=144/100=1.44. Still pretty far. How about we jump to 1.4. This gives us (14/10)²=196/100=1.96. That's close. Let's look at the squares for x1=1.41, x2=1.414, x3=1.4142, and so on:
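These squares are easy to compute exactly. A minimal sketch using Python's `Fraction` type (the truncation values are the ones from the pattern above):

```python
from fractions import Fraction

# Squares of successive decimal truncations, computed exactly.
truncations = [Fraction(14, 10), Fraction(141, 100),
               Fraction(1414, 1000), Fraction(14142, 10000)]
for x in truncations:
    print(float(x), "squared is approximately", float(x ** 2))
```

Every square stays below 2, yet each one lands closer to 2 than the last.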
This keeps getting closer and closer to two. Moreover, the distance between each term, ∣xn−xn−1∣ where n∈Z+, keeps getting smaller and smaller. Visually, we might think of it like:
………∘∘∘∘∘∘∘∘∘∘∘2………
So, there's clearly some sort of sequence. But it's a very peculiar sequence, because the distance between the terms gets smaller and smaller. Augustin-Louis Cauchy (1789–1857) was the French mathematician credited with noticing this pattern and capturing it with a definition. Before we get to that definition, we need a few preliminaries.
absolute value. Let x∈Q. If x>0, then ∣x∣=x. If x=0, then ∣x∣=0. If x<0, then ∣x∣=−x.
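The three-case definition translates directly into code. A sketch with exact rationals (the function name `abs_q` is mine):

```python
from fractions import Fraction

def abs_q(x: Fraction) -> Fraction:
    """Absolute value on Q, following the three-case definition above."""
    if x > 0:
        return x
    if x == 0:
        return Fraction(0)
    return -x  # the remaining case: x < 0

print(abs_q(Fraction(-3, 7)), abs_q(Fraction(0)), abs_q(Fraction(5, 2)))
```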
From these definitions, we now derive some lemmas.
lemma. For all x∈Q, it is true that ∣x∣=0 iff x=0.
proof. Let x∈Q. We have two implications to show: x=0⟹∣x∣=0 and ∣x∣=0⟹x=0. The first holds by definition, so only the second must be addressed. Suppose ∣x∣=0 and x≠0. Since x≠0, by the trichotomy law, either x<0 or x>0. If x<0, then by definition ∣x∣=−x>0, contradicting our assumption that ∣x∣=0. If x>0, then by definition ∣x∣=x>0, again contradicting ∣x∣=0. It follows that ∣x∣=0 and x≠0 can never both be true. Therefore, ∣x∣=0 if and only if x=0.■
lemma. For all x,y∈Q, it is true that ∣x−y∣=0 iff x=y.
proof. Let x,y∈Q. Then by the closure of subtraction on the rationals, x−y∈Q. Now, x−y=0 if and only if x=y. By the previous lemma, ∣x−y∣=0 if and only if x−y=0. It follows that ∣x−y∣=0 if and only if x=y.■
nonnegativity of absolute value. For all x∈Q, it is true that ∣x∣≥0.
proof. Let x∈Q. There are only three cases, x<0, x=0, and x>0, by the trichotomy law. If x<0, then ∣x∣=−x, and since x<0, we have −x>0, so ∣x∣>0. If x=0, then ∣x∣=0. If x>0, then ∣x∣=x>0. This exhausts all three possible cases of x. Therefore, for all x∈Q, it is true that ∣x∣≥0.■
lemma. For all x,y∈Q, it is true that ∣y−x∣=∣x−y∣.
proof. Let x,y∈Q. By the trichotomy law, either x−y=0, x−y>0, or x−y<0. If x−y=0, then x=y, so y−x=0 and ∣x−y∣=0=∣y−x∣. If x−y>0, then y−x<0. It follows that ∣x−y∣=x−y and ∣y−x∣=−(y−x)=x−y. If x−y<0, then y−x>0, so ∣x−y∣=−(x−y)=y−x and ∣y−x∣=y−x. Therefore, for all x,y∈Q, we have ∣y−x∣=∣x−y∣.■
lemma. If −y≤x≤y, then ∣x∣≤y.
proof. The inequality −y≤x≤y gives both x≤y and −x≤y. Since ∣x∣ equals either x or −x, in either case ∣x∣≤y.■
triangle inequality. For all x,y∈Q, it is true that ∣x+y∣≤∣x∣+∣y∣.
proof. Let x,y∈Q. We know that −∣x∣≤x≤∣x∣ and −∣y∣≤y≤∣y∣. Adding these inequalities, we have −(∣x∣+∣y∣)≤x+y≤∣x∣+∣y∣. By the previous lemma, it follows that ∣x+y∣≤∣x∣+∣y∣.■
multiplicativity of absolute value. For all x,y∈Q, it is true that ∣xy∣=∣x∣⋅∣y∣.
proof. We address five cases: (a) x=0 or y=0; (b) x>0 and y>0; (c) x<0 and y<0; (d) x<0 and y>0; and (e) x>0 and y<0. For case (a): xy=0, so ∣xy∣=0=∣x∣⋅∣y∣. For case (b), it follows that xy>0, and ∣xy∣=xy by definition. Given x=∣x∣ and y=∣y∣, we have ∣x∣⋅∣y∣=xy=∣xy∣. For case (c): it follows that xy>0, and ∣xy∣=xy. Given −x=∣x∣ and −y=∣y∣ by definition, we have ∣x∣⋅∣y∣=(−x)(−y)=xy=∣xy∣. For case (d): let x<0 and y>0. Then xy<0 and ∣xy∣=−(xy). By definition, we have −x=∣x∣ and y=∣y∣. Therefore, ∣x∣⋅∣y∣=(−x)y=−(xy)=∣xy∣. By commutativity of multiplication, case (e) is equivalent to case (d). Therefore, we have ∣xy∣=∣x∣⋅∣y∣ for all x,y∈Q.■
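None of this replaces the proofs, but we can spot-check the lemmas over a grid of sample rationals; a quick sketch:

```python
from fractions import Fraction

samples = [Fraction(n, d) for n in range(-4, 5) for d in (1, 2, 3)]

for x in samples:
    assert (abs(x) == 0) == (x == 0)           # |x| = 0 iff x = 0
    assert abs(x) >= 0                         # nonnegativity
    for y in samples:
        assert abs(y - x) == abs(x - y)        # |y - x| = |x - y|
        assert abs(x + y) <= abs(x) + abs(y)   # triangle inequality
        assert abs(x * y) == abs(x) * abs(y)   # multiplicativity

print("all lemmas hold on", len(samples), "samples")
```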
distance. Given x,y∈Q, we define the notation d(x,y):=∣x−y∣.
closeness. Given x,y,ε∈Q and ε>0, we say that x and y are close if and only if d(x,y)≤ε. If x and y are close, we write x≈εy. Otherwise, we write x≉εy.
The notion of closeness leads to a few helpful lemmas.
reflexivity of closeness. For all x,y,ε∈Q with ε>0, if x=y, then x≈εy.
proof. Let x,y,ε∈Q with ε>0. If x=y, then ∣x−y∣=0. Since ε>0, it follows that d(x,y)=0≤ε, which implies that x≈εy.■
symmetry of closeness. For all x,y,ε∈Q with ε>0, if x≈εy, then y≈εx.
proof. Let x,y,ε∈Q with ε>0, and suppose x≈εy. By definition, ∣x−y∣≤ε. From the earlier lemma, ∣y−x∣=∣x−y∣, so ∣y−x∣≤ε as well, which means y≈εx. It follows that if x≈εy, then y≈εx.■
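The closeness relation is easy to model in code. A sketch (the function name `close` is mine):

```python
from fractions import Fraction

def close(x: Fraction, y: Fraction, eps: Fraction) -> bool:
    """x is eps-close to y  iff  d(x, y) = |x - y| <= eps, with eps > 0."""
    assert eps > 0
    return abs(x - y) <= eps

eps = Fraction(1, 100)
a, b = Fraction(141, 100), Fraction(1414, 1000)
print(close(a, b, eps))                  # |1.41 - 1.414| = 0.004 <= 0.01
print(close(a, b, Fraction(1, 1000)))    # but 0.004 > 0.001
```

Note that symmetry of closeness is inherited directly from `abs(x - y) == abs(y - x)`.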
sequence. Let m∈Z. A sequence is a function a:{n∈Z:n≥m}→Q, denoted (an)n=m∞. The members of a are an ordered collection of rationals, which we may denote as (a(m),a(m+1),a(m+2),…). By convention, we write an in lieu of the standard notation a(n), so a sequence beginning at m=1 reads (a1,a2,a3,…).
example. (n)n=1∞=(1,2,3,4,…).
example. (2n)n=1∞=(2,4,6,8,…).
example. ((1ⁿ+(−1)ⁿ)/2)n=1∞=(0,1,0,1,0,1,…).
example. (2n/(n²+1))n=1∞=(2/2,4/5,6/10,8/17,…).
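These examples can be generated programmatically, which is a nice way to internalize "a sequence is a function of its index." A sketch (the dictionary keys are just labels; note that `Fraction` reduces automatically, so 2/2 prints as 1 and 6/10 as 3/5):

```python
from fractions import Fraction

# The four example sequences, each as a function of the index n >= 1.
seqs = {
    "n":                  lambda n: Fraction(n),
    "2n":                 lambda n: Fraction(2 * n),
    "(1^n + (-1)^n) / 2": lambda n: Fraction(1 ** n + (-1) ** n, 2),
    "2n / (n^2 + 1)":     lambda n: Fraction(2 * n, n ** 2 + 1),
}
for name, a in seqs.items():
    print(name, "->", [a(n) for n in range(1, 5)])
```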
From that definition, we have the Cauchy sequence:
cauchy sequence. A sequence (xn)n∈Z+ is a Cauchy sequence if, and only if: For any rational ε>0, there exists an integer N∈Z+, such that, if n,m≥N, then ∣xn−xm∣<ε.
Note what this preliminary definition is trying to capture — the behavior we saw earlier. Cauchy chose the symbol ε to denote "error" — that teeny-tiny bit separating xn and xm. Now, remember that sequences are functions. Cauchy's definition says: There's some positive integer N (some index), where, for function arguments n and m greater than or equal to N (any two indices on or after N), we're going to see this behavior where ∣x(n)−x(m)∣ (the distance between the rationals xn and xm) is always less than any distance we can think of (the distance ε). That is, for any two terms after the index N, no matter what ε someone picks — "they're separated by ε" — we can always go back and tell them, "No, they're actually closer than that." That's all ε is — how separated the terms are.
Now, we might wonder why Cauchy decided to characterize this distance in terms of an "error." Here's some intuition: Suppose we bought a 16-inch pizza. We get the order back, and it's 12-inches. Would we complain? Of course! The pizza's off by 4 inches (which could very well translate to a price difference of over 10 dollars). How about 14 inches? Certainly. 15 inches? Maybe. What about 15.999999999999 inches? Unlikely — it's unclear whether we'd even notice the difference in the first place. That's what's going on with the term "error." If the error ε is small enough, the number xn, for all intents and purposes, is tantamount to xm.
With that intuition, we now state the following definition:
convergent sequence. Let (xn) be a sequence, with n∈Z+. We say that (xn) converges to a if, and only if, for all ε>0, there exists an N∈N, such that, for all n≥N, it is true that ∣xn−a∣<ε. If (xn) converges to a, we say that a is the limit of (xn).
This seems like a complicated definition. Let's compare its logical form to the Cauchy sequence definition's logical form:
cauchy sequence
∀ε>0.
∃N∈N.
∀n,m≥N.
∣xn−xm∣<ε.
convergent sequence
∃a∈Q.
∀ε>0.
∃N∈N.
∀n≥N.
∣xn−a∣<ε.
Do we see how this definition is different? It homes in on that xm term from the Cauchy sequence — the term that "comes after" xn. Why the quotes? We're putting quotes around that phrase because we're now going to add a bit more nuance. Suppose we're not the pizza buyers, but the pizza sellers. Would we be upset if an employee accidentally delivered a 32-inch pizza instead of the 16-inch? Sure. How about an 18-inch pizza? Perhaps. How about 16.000000001? Hopefully not. The same idea applies here. The xn isn't necessarily 15.99999. It could also be 16.00000001. When we say that a sequence converges, we're saying that all of the sequence elements, eventually, are tantamount to a, whether just below or just above a.
All that said, we have to be careful with our language. At no point — never ever ever — is xn=a. It's why we use the phrase "tantamount to" rather than "same" or "indistinguishable." Clearly 15.999999999 and 16 are distinguishable. One has that symbol "5" and the other "6." The word tantamount means "virtually the same as," which is exactly the phrasing we want.
Putting it all together, we get something that looks like the following:
… □(a−ε) ∘∘∘∘∘∘∘ □(a) ∘∘∘∘∘∘ □(a+ε) …
The dots comprise a region called the epsilon neighborhood of a. What the convergence definition tells us is, if a sequence (xn) converges, there are only so many points outside this neighborhood. That is, there are finitely many points outside the neighborhood. Meanwhile, no matter how small we make ε, we're always going to find a term closer to a. An often-cited example of a convergent sequence is (1/n). Examine its values: 1, 1/2, 1/3, 1/4, … — the terms keep shrinking, and yet there will always be something smaller. We should now be able to see the connection between Cauchy sequences and convergent sequences: all convergent sequences are Cauchy sequences, but not all Cauchy sequences of rationals converge to a rational. Why? Because a Cauchy sequence lives entirely among the rationals, and the number a it closes in on may not exist among the rationals. If it did, we wouldn't have this problem of trying to figure out what x is in x²=2.
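The ε–N dance in the convergence definition can be made concrete with (1/n), whose limit is 0: given any ε, a witness index N can be computed explicitly as any integer exceeding 1/ε. A sketch (the function name `witness_N` is mine):

```python
from fractions import Fraction
from math import ceil

def witness_N(eps: Fraction) -> int:
    """For (1/n) -> 0: return an N with |1/n - 0| < eps for all n >= N."""
    return ceil(1 / eps) + 1  # any N > 1/eps works, since then 1/N < eps

eps = Fraction(1, 1000)
N = witness_N(eps)
# Spot-check the definition on a finite stretch of indices n >= N.
assert all(abs(Fraction(1, n) - 0) < eps for n in range(N, N + 500))
print("for eps =", eps, "the witness index N =", N, "suffices")
```

Of course, a finite check is only an illustration; the definition quantifies over all n≥N, which the inequality N>1/ε handles once and for all.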
Convergent Sequences Are Cauchy
Suppose (xn) is a convergent sequence, with some limit a. We'll let ε>0, and define ε′:=ε/2>0. Because we assumed that (xn) is convergent, there exists an N∈N such that, for any index n greater than or equal to N:
∣xn−a∣<ε′.
We need to tie this to the Cauchy sequence's definition of distance (the error between xn and xm). For any indices n,m≥N, we can just write:
∣xn−xm∣=∣xn−a+a−xm∣.
And by the triangle inequality:
∣xn−xm∣≤∣xn−a∣+∣a−xm∣.
Since both n≥N and m≥N, the convergence bound applies to each term on the right:
∣xn−a∣<ε′ and ∣a−xm∣=∣xm−a∣<ε′.
And since we defined ε′:=ε/2>0, we have:
∣xn−xm∣≤∣xn−a∣+∣a−xm∣<ε′+ε′=2ε′=ε.
Every convergent sequence is, in fact, a Cauchy sequence.
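The ε/2 bookkeeping can be illustrated numerically with (1/n), which converges to a=0. A sketch:

```python
from fractions import Fraction
from math import ceil

eps = Fraction(1, 100)
eps_prime = eps / 2          # the proof's eps' := eps/2
N = ceil(1 / eps_prime) + 1  # |1/n - 0| < eps' for all n >= N

# For any n, m >= N, the triangle inequality bounds |x_n - x_m|
# by |x_n - 0| + |0 - x_m| < eps' + eps' = eps.
pairs = [(n, m) for n in range(N, N + 40) for m in range(N, N + 40)]
assert all(abs(Fraction(1, n) - Fraction(1, m)) < eps for n, m in pairs)
print("Cauchy condition holds at eps =", eps, "for", len(pairs), "index pairs")
```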
Dedekind Cuts
remark: For this section, assume Cauchy sequences do not exist. In fact, Dedekind cuts came before Cauchy sequences historically. We addressed Cauchy sequences first because they're generally easier to think about. Dedekind cuts are a bit trickier because the definitions are hefty, but they're quite manageable, as long as we follow the definitions very carefully. Otherwise, we can quickly come unstuck!
Let's think about where x might be, given x²=2. We know that no rational satisfies x²=2, so for any rational x, either x²<2 or x²>2. Let's look at x²<2 first. What is the set of all rationals whose square is less than 2? I.e., what is the set A={x∈Q:x²<2}? If we try some values, we'd find that this is a pretty big subset:
The problem with A is those pesky dots at the ends. We'd like to know where this set starts and ends. More formally, we'd like to know if A has a least upper bound. Let's define some terminology.
upper bound. Let P be a poset (P,≤) with a set S⊆P. If there exists an element a∈P such that for all s∈S, the relation s≤a is true, then a is called an upper bound of S. We write upper-bounds[S] to denote the set of all upper bounds of S. If upper-bounds[S]≠∅, we say that S is bounded above.
example. Let A={1,2,3,4,5} with A⊂Z. 5 is an upper bound of A. So are 6, 7, 8, 9, 10, and so on, because A⊂Z.
lower bound. Let P be a poset (P,≤) with a set S⊆P. If there exists an element a∈P such that, for all s∈S, the relation a≤s is true, then a is called a lower bound of S. We write lower-bounds[S] to denote the set of all lower bounds of S. If lower-bounds[S]≠∅, then we say that S is bounded below.
example. Let A={7,8,9,10} with A⊂Z. 7 is a lower bound of A. So is 6, 5, 4, and so on, because A⊂Z.
bounded set. Given a poset (P,≤) with a set S⊆P, we say that S is bounded above and below if, and only if, lower-bounds[S]≠∅ and upper-bounds[S]≠∅. That is, there exist elements n and N such that n≤x≤N for all x∈S.
least upper bound. Given a poset (P,≤) with a set S⊆P and upper-bounds[S]≠∅. If there exists an element n∈upper-bounds[S], such that, for all u∈upper-bounds[S], the relation n≤u holds, then we say that n is the least upper bound or supremum of S and write supS=n.
The least upper bound (supremum), in short, is the least upper bound among all upper bounds. Notice that this implies that the least upper bound need not be in the subset S. Why? Because any element in the superset of S is a candidate for being an upper bound.
greatest lower bound. Given a poset (P,≤) with a set S⊆P and lower-bounds[S]≠∅. If there exists an element ℓ∈lower-bounds[S], such that, for all L∈lower-bounds[S], the relation ℓ≥L holds, then we say that ℓ is the greatest lower bound or infimum of S and write infS=ℓ.
The diagram below is helpful for parsing the definitions.
We've marked the maximum and minimum of A separately to communicate the fact that they can be distinct from the supremum and infimum, but they don't have to be. If supA turns out to be in A, then it is also maxA. Likewise, if infA turns out to be in A, then it is also minA. But it's not always the case that supA∈A, nor is it always the case that infA∈A.1 Put differently, if minA exists, then minA=infA, and if maxA exists, then maxA=supA. For both these propositions, the hypothesis isn't guaranteed to hold (the minimum or maximum may fail to exist), and neither is the converse (the infimum or supremum may exist without being a minimum or maximum).
The greatest lower bound (infimum), in short, is the greatest lower bound among all lower bounds. Like the supremum, the infimum need not be in the subset S. It depends entirely on how we've defined the superset of S. The supremum is also denoted lubS, and the infimum is alternatively denoted glbS. We use supS and infS because LaTeX provides them natively.
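For finite sets inside a finite universe, all four definitions can be computed by brute force. A sketch (the universe here, the integers 0 through 10, is my choice):

```python
# A brute-force rendering of the definitions on a small finite poset.
P = set(range(0, 11))     # stand-in poset (P, <=): the integers 0..10
S = {1, 2, 3, 4, 5}       # S is a subset of P

upper_bounds = {a for a in P if all(s <= a for s in S)}
lower_bounds = {a for a in P if all(a <= s for s in S)}

sup_S = min(upper_bounds)  # least upper bound; here it lies in S, so it's also max S
inf_S = max(lower_bounds)  # greatest lower bound; likewise min S
print("upper bounds:", sorted(upper_bounds), "| sup =", sup_S)
print("lower bounds:", sorted(lower_bounds), "| inf =", inf_S)
```

Note that 0 is a lower bound of S without belonging to S, which is exactly why the infimum is defined over lower-bounds[S] rather than over S itself.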
So, for our set Φ={x∈Q:x<2}, there are many upper bounds. We know that supΦ=2. Why? Because the subset Φ comprises all the rationals less than 2, and for all rationals x≥2 (the upper bounds of Φ), it is true that 2≤x. Is there a supremum for the set A={x∈Q:x²<2}? Suppose y=p/q∈Q is the least upper bound of A; since 1∈A, we have y≥1>0. We know y²=2 is impossible, so either y²>2 or y²<2. Consider ŷ=(4+3y)/(3+2y). A short computation gives ŷ²−2=(y²−2)/(3+2y)² and y−ŷ=2(y²−2)/(3+2y). Suppose y²>2. Then ŷ²−2>0, so ŷ²>2, making ŷ an upper bound of A. But we also have ŷ<y, so y cannot be the least upper bound. What if y²<2? Then ŷ²−2<0, so ŷ∈A, but now ŷ>y, so y isn't an upper bound at all. Thus, there can be no such y. The set A has no supremum in Q.
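The map ŷ=(4+3y)/(3+2y) from the argument above can be iterated. Starting from any rational with y²>2, each iterate is a strictly smaller rational whose square still exceeds 2, so no upper-bound candidate ever survives as a least one. A sketch:

```python
from fractions import Fraction

def y_hat(y: Fraction) -> Fraction:
    """The map from the argument: y_hat^2 - 2 = (y^2 - 2) / (3 + 2y)^2."""
    return (4 + 3 * y) / (3 + 2 * y)

y = Fraction(2)  # y^2 = 4 > 2, so y is an upper bound of A
for _ in range(5):
    new = y_hat(y)
    # Each iterate is a strictly smaller upper-bound candidate whose
    # square still exceeds 2 -- so no least upper bound exists in Q.
    assert new < y and new ** 2 > 2
    y = new
    print(y, "with square approximately", float(y ** 2))
```

The squares drift down toward 2 without ever reaching it, which is the hole in Q made visible.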
These two examples, however, give us an idea. Because of the trichotomy law, we know that for any two rational numbers, exactly one of the following relations applies: <,=,>. So, for any r∈Q, we'll have many numbers to the left of r (<), r itself (=), and many numbers to the right of r (>). Maybe we could define the real numbers in this way: they're cuts in the ordering of the rationals. That is, a real number r is either the supremum of the set of rationals below it, or the gap left where no rational supremum exists.
……{……………}r{……………}…………{……………}{r……………}……
This is where the name Dedekind cut comes from. We're defining a new set of numbers by making cuts (or cutoffs, splits, etc.) along the ordering of the rationals. We may feel that this isn't quite as "natural" or "intuitive" as what we did for the integers and rationals. For those constructions, the new numbers resulted from mappings defined in terms of established operations. We'll revisit this feeling once we've brought a bit more rigor to this notion of cutting.
cut. A cut α is a subset of Q such that: (1) α contains a rational number, but not all rational numbers. (2) Every rational number in α is smaller than every rational number not in α. (3) α contains no greatest number.
Right off the bat, this definition is packed with a lot of conditions:
definition components: cut
2.0.1 α⊂Q
2.0.2 α≠∅
2.0.3 α≠Q
2.0.4 (p∈α)∧(q∈Q)∧(q<p)⇒q∈α
2.0.5 (p∈α)⇒(∃r.[(r∈α)∧(p<r)])
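We can model a cut as a membership predicate and spot-check conditions 2.0.4 and 2.0.5 on sample rationals. A sketch for the cut that will eventually become √2 (the helper names are mine):

```python
from fractions import Fraction

def in_alpha(x: Fraction) -> bool:
    """Membership in the candidate cut: x < 0, or x^2 < 2."""
    return x < 0 or x * x < 2

samples = [Fraction(n, 10) for n in range(-30, 31)]

# 2.0.4: downward closure -- anything below a member is itself a member.
assert all(in_alpha(q) for p in samples if in_alpha(p)
                       for q in samples if q < p)

# 2.0.5: no greatest element -- every member has a strictly larger member.
for p in filter(in_alpha, samples):
    # For p > 0, reuse the map (4 + 3p)/(3 + 2p); for p <= 0, 1 works.
    r = (4 + 3 * p) / (3 + 2 * p) if p > 0 else Fraction(1)
    assert in_alpha(r) and p < r
print("conditions 2.0.4 and 2.0.5 hold on", len(samples), "samples")
```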
We have to be a little careful with the notion of a cut.
Footnotes
As we'll see later, the key distinction between the four concepts is that minA and maxA aren't guaranteed to exist for a set A⊂R, but supA is guaranteed to exist for any nonempty A⊂R that is bounded above, and infA for any nonempty A⊂R that is bounded below (where R is the set of all real numbers). ↩