Binary Representation
"The synthetic words binit and bigit never flew. Bit did. But acceptance has not been universal. Lancelot Hogben wrote: 'The introduction by Tukey of bits for binary digits has nothing but irresponsible vulgarity to commend it.' Should Tukey byte back?"
Brockway McMillan, Anecdotes, 6 Annals of the History of Computing 155 (1984).
This chapter covers notes on bits, bytes, and binary representation. In the definitions below, we denote the set of binary digits with the symbol $\mathbb{B} = \{0, 1\}$.
Representing Information
On a computer, information is represented with bits: 0 or 1. By arranging the 1s and 0s in certain ways and assigning those arrangements meaning (a process called encoding), we can instruct a computer to perform operations.
This is a key point. A sequence of bits on a computer has no meaning unless we, humans, give those bits meaning. We can represent the sequence 0011 1011 0110 1101 as a number (15213) or as some other value, say a memory address, alphabet, numeral, emoji, and so on. Absent our giving it meaning, the sequence above is just a sequence of electrical pulses.
Why Bits?
Bits are used because they're the easiest way to deal with the noise and uncertainty of electrical pulses running through wires. If we examined the pulses with an oscilloscope, we wouldn't see the true step functions of a mathematics textbook. Instead, we'd see noisy waveforms that only approximate step functions; by distinguishing only "high" from "low," we can tolerate that noise.
Word Size
Every computer has a word size. In terms of hardware, the word size is the width of a single register on the machine. The width of a register is the number of binary cells the register contains. For example, a machine with a word size of 8 bits consists of registers with 8 binary cells. On a machine with a word size of 64 bits, a single register contains 64 binary cells.
In terms of software, the word size is the nominal size of pointer data (a memory address). Put differently, the word size is the maximum number of bits the computer's CPU can process at a single point in time. This is in line with the hardware perspective: if the most a register can hold is $w$ bits, then the CPU can only work with $w$ bits at a time.
Thus, if we have a 4-bit machine, each register has 4 binary cells, each of which can be either 1 or 0. As such, the number of possible bit sequences for 4 binary cells is $2^4 = 16$, covering the values 0 through 15. This leaves 16 possible memory addresses.
Word-oriented Memory Organization
If we have $2^w$ possible numbers for addresses, the next question is how we want to use those numbers to address memory locations, which brings us to byte ordering. There are two predominant approaches to byte ordering:
- Big Endian (used by SPARC, PPC Mac, and the Internet), and
- Little Endian (used by x86, ARM processors for Android, iOS, and Linux, as well as most modern operating systems like Windows and macOS)
With the Big Endian approach, the least significant byte has the highest address. With the Little Endian approach, the least significant byte has the lowest address. Simply put, with Big Endian, the bytes are laid out from most significant to least significant as addresses increase. For example, the 4-byte value 0x01234567 is stored as:

01 23 45 67

And with Little Endian, the bytes are laid out from least significant to most significant:

67 45 23 01
Most of the devices we use daily employ Little Endian ordering. The one exception is the Internet. This is because the Network Working Group (the organization responsible for implementing the Internet in its early years) decided to use Big Endian.^[note_big_endian] It's not entirely clear why they made that decision, but it's what we have today.
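To see which ordering a given machine uses, we can inspect the bytes of a multi-byte value directly. Below is a minimal C++ sketch (the test value 0x01234567 is just illustrative) that prints an `unsigned int`'s bytes in increasing address order; on a Little Endian machine the least significant byte comes out first.

```cpp
#include <iostream>
#include <cstddef>

int main() {
    unsigned int value = 0x01234567;  // arbitrary test value
    const unsigned char *bytes = reinterpret_cast<const unsigned char *>(&value);

    // Print the bytes in increasing address order.
    for (std::size_t i = 0; i < sizeof(value); i++) {
        std::cout << std::hex << static_cast<int>(bytes[i]) << " ";
    }
    std::cout << std::endl;  // Little Endian: 67 45 23 1, Big Endian: 1 23 45 67
    return 0;
}
```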
Encoding Bytes
Because a single bit is so small, when we want to think of the smallest possible unit on a computer, we tend not to think in terms of an individual bit, but rather a group of four bits, colloquially called a nibble. In doing so, we can then think of information in terms of hexadecimal:
Hex | Decimal | Binary |
---|---|---|
0 | 0 | 0000 |
1 | 1 | 0001 |
2 | 2 | 0010 |
3 | 3 | 0011 |
4 | 4 | 0100 |
5 | 5 | 0101 |
6 | 6 | 0110 |
7 | 7 | 0111 |
8 | 8 | 1000 |
9 | 9 | 1001 |
A | 10 | 1010 |
B | 11 | 1011 |
C | 12 | 1100 |
D | 13 | 1101 |
E | 14 | 1110 |
F | 15 | 1111 |
This allows us to write large bit sequences compactly in hexadecimal notation, as in C, where hexadecimal literals are prefixed with `0x`. For example, the 16-bit sequence 0011 1011 0110 1101 can be written as `0x3B6D`.
Data Representations in C
In C, we have the following data representations (in bytes):
Data Type | Typical 32-bit | Typical 64-bit | x86-64 |
---|---|---|---|
char | 1 | 1 | 1 |
short | 2 | 2 | 2 |
int | 4 | 4 | 4 |
long | 4 | 8 | 8 |
float | 4 | 4 | 4 |
double | 8 | 8 | 8 |
pointer | 4 | 8 | 8 |
Notice that the column headers are prefixed with "typical". This is because there is no set size for any of these types. It depends on how the program is compiled. A modern machine can have 64-bit registers but still run a program compiled for 32 bits. Couple that with virtual machines, and there's really no such thing as an "$n$-bit machine". It entirely depends on the source machine, the program, the compiler, the target machine, and many other factors.
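We can always ask the compiler what sizes it actually uses with `sizeof`. The sketch below just prints the sizes in bytes; the numbers depend on the machine, compiler, and target, as discussed above.

```cpp
#include <iostream>

int main() {
    std::cout << "char:    " << sizeof(char)   << "\n";
    std::cout << "short:   " << sizeof(short)  << "\n";
    std::cout << "int:     " << sizeof(int)    << "\n";
    std::cout << "long:    " << sizeof(long)   << "\n";
    std::cout << "float:   " << sizeof(float)  << "\n";
    std::cout << "double:  " << sizeof(double) << "\n";
    std::cout << "pointer: " << sizeof(void *) << "\n";
    return 0;
}
```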
Encoding Integers
Integers can be positive or negative (signed or unsigned). For unsigned integers, the value of a $w$-bit vector $x = x_{w-1} x_{w-2} \cdots x_1 x_0$ can be mathematically represented as:

$$\sum_{i=0}^{w-1} x_i \cdot 2^i$$

And to represent signed integers, we use two's complement. Mathematically:

$$-x_{w-1} \cdot 2^{w-1} + \sum_{i=0}^{w-2} x_i \cdot 2^i$$

The term:

$$-x_{w-1} \cdot 2^{w-1}$$

is the most significant bit's contribution. In two's complement, $x_{w-1}$ is called the sign bit, and it indicates the number's sign: 0 for nonnegative, 1 for negative. For example, consider the number 10 on a machine with a word size of 5. In binary, this number is represented as 01010:
-16 | 8 | 4 | 2 | 1 |
---|---|---|---|---|
0 | 1 | 0 | 1 | 0 |
Notice that the leftmost bit has a heavy negative weight ($-2^4 = -16$), while all the bits to its right have positive weights. To represent the number -10, we start with -16, and add the necessary bits to get to -10:
-16 | 8 | 4 | 2 | 1 |
---|---|---|---|---|
1 | 0 | 1 | 1 | 0 |
This gives us $-16 + 4 + 2 = -10$. Because of two's complement, computers do not perform real-number addition. Instead, they perform modular addition.
Warning on Twos Complement
As a warning, we should never assume that a machine uses two's-complement representation. While there aren't many ones'-complement machines today, when working with low-level systems, we should verify that the relevant platform uses two's complement. There are several ways to do so.
The simplest way is to check whether:

$$-1 == \sim 0$$

If -1 and ~0 are equal, then the machine uses two's-complement representation. Why? Because -1 on a two's-complement machine is always a bit vector of all ones. If they aren't equal, then we're using either ones'-complement hardware or, albeit very rare, a sign-magnitude machine.
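Here's a minimal sketch of that check, assuming nothing beyond the standard library:

```cpp
#include <iostream>

int main() {
    // On a two's-complement machine, -1 and ~0 have the same bit pattern (all ones).
    if (-1 == ~0) {
        std::cout << "two's complement" << std::endl;
    } else {
        std::cout << "ones' complement or sign-magnitude" << std::endl;
    }
    return 0;
}
```

Note that C++20 (and C23) mandate two's complement for signed integers, so on a modern toolchain this check will always report two's complement.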
Numeric Ranges
From the discussion of two's complement, we can see that on a machine with a word size $w$, we have the following bounds (where $UMin$ and $UMax$ bound unsigned numbers, and $TMin$ and $TMax$ bound signed numbers in two's complement):

$UMin$ | $UMax$ | $TMin$ | $TMax$ |
---|---|---|---|
$0$ | $2^w - 1$ | $-2^{w-1}$ | $2^{w-1} - 1$ |
000...0 | 111...1 | 100...0 | 011...1 |

For example, for a machine with a word size of 16, we have:

$UMin$ | $UMax$ | $TMin$ | $TMax$ |
---|---|---|---|
0 | 65535 | -32768 | 32767 |
00 00 | FF FF | 80 00 | 7F FF |
Notice that with two's complement, we can represent one more negative number than we can positive numbers. Why? Because one of the nonnegative bit patterns has to be set aside for 0.
In C, a `short` is 2 bytes long. For example, the values 15213 and -15213 are represented as:
Decimal | Hex | Binary |
---|---|---|
15213 | 3B 6D | 00111011 01101101 |
-15213 | C4 93 | 11000100 10010011 |
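We can also ask the implementation directly for the bounds discussed above. A small sketch using `<limits>` (the exact values depend on the platform's type sizes):

```cpp
#include <iostream>
#include <limits>

int main() {
    std::cout << "unsigned short max: " << std::numeric_limits<unsigned short>::max() << "\n";
    std::cout << "short min:          " << std::numeric_limits<short>::min() << "\n";
    std::cout << "short max:          " << std::numeric_limits<short>::max() << "\n";
    std::cout << "int min:            " << std::numeric_limits<int>::min() << "\n";
    std::cout << "int max:            " << std::numeric_limits<int>::max() << "\n";
    return 0;
}
```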
Relationship Between Signed and Unsigned Values
Following two's complement, we see the following relationship between unsigned and signed numeric values. Suppose the table below is drawn from a machine with a word size of 4.
Bits | Unsigned | Two's Complement |
---|---|---|
0000 | 0 | 0 |
0001 | 1 | 1 |
0010 | 2 | 2 |
0011 | 3 | 3 |
0100 | 4 | 4 |
0101 | 5 | 5 |
0110 | 6 | 6 |
0111 | 7 | 7 |
1000 | 8 | -8 |
1001 | 9 | -7 |
1010 | 10 | -6 |
1011 | 11 | -5 |
1100 | 12 | -4 |
1101 | 13 | -3 |
1110 | 14 | -2 |
1111 | 15 | -1 |
Notice that for nonnegative numbers, both signed and unsigned interpretations have the same encodings. However, once we hit bit patterns with the most significant bit set, the interpretations diverge: for the signed numbers, we add $-2^w$ to the unsigned value (the most significant bit's weight flips from $2^{w-1}$ to $-2^{w-1}$).
This means that we can jump to and from unsigned and signed numbers with the following:

lemma. Given a machine with a word size $w$ and a negative two's-complement number $x$ on the machine, the same bit pattern read as an unsigned number has the value $x + 2^w$.
This lemma leads to some strange real-world behaviors. Negative numbers can become extremely large positive numbers, and large positive numbers can become negative. Because of this, languages like Java do not provide an unsigned integer type; if a programmer writes an integer literal that's too large, Java throws the compile-time error `Integer number too large`.
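The jump between the two interpretations is easy to trigger in C and C++: when a signed value meets an unsigned value, the signed value is converted to unsigned. A small sketch of the resulting surprise (the printed value assumes a typical 32-bit `int`):

```cpp
#include <iostream>

int main() {
    int      neg = -1;
    unsigned one = 1;

    // neg is converted to unsigned before the comparison, so -1 becomes
    // 2^32 - 1 on a 32-bit int and compares as a huge value.
    if (neg > one) {
        std::cout << "-1 > 1u once -1 is reinterpreted as unsigned" << std::endl;
    }

    std::cout << static_cast<unsigned>(neg) << std::endl;  // 4294967295 on a 32-bit int
    return 0;
}
```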
Sign Extension & Truncation
To perform arithmetic operations on binary numbers, the operands must have the same length. For example, to add a binary sequence of four bits to another binary number, that number must also be four bits long. This straightforward idea is complicated by the fact that languages have data types of different sizes, e.g., a `char` is 1 byte (8 bits), while a `short` is 2 bytes (16 bits). In C, when we add a `char` (a smaller data type) to an `int` (a larger data type), the `char` value is promoted to an `int`. In C++, when a `long` value is assigned to a `short` variable, the `long` value is chopped down to fit in the `short`'s available bits. These operations are made possible through sign extension and truncation.
In sign extension, if we add `1010` to `1010 1100`, additional copies of the most significant bit are prepended to the shorter bit sequence. This results in:

1111 1010
In truncation, the extra bits are chopped off: truncating `1010 1100` to four bits, for example, yields `1100`.
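Both behaviors show up in ordinary C++ assignments. The sketch below (with illustrative values, assuming a typical two's-complement machine with an 8-bit `char` and a 16-bit `short`) widens a negative `char`, which sign-extends, and narrows an `int` into a `short`, which truncates:

```cpp
#include <iostream>

int main() {
    signed char c = -84;           // bit pattern 1010 1100
    int widened   = c;             // sign extension: 1111 ... 1010 1100, still -84
    std::cout << widened << std::endl;   // -84

    int big        = 100000;                     // does not fit in a 16-bit short
    short narrowed = static_cast<short>(big);    // truncation: keep only the low 16 bits
    std::cout << narrowed << std::endl;          // -31072 with a 16-bit short
    return 0;
}
```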
Obviously, there's a limit to how many bits a fixed-width value can hold. When a result goes over that limit, we get an overflow bit.
Overflow Bits in Addition
When we perform two's-complement addition, there are three possibilities:
- The result is within the two's-complement representable range,
- The result is too big (a positive overflow), producing a negative number, or
- The result is too small (a negative overflow), producing a positive number.
Mathematically, the $w$-bit sum of $u$ and $v$ is:

$$\begin{cases} u + v - 2^w & \text{(positive overflow)} \\ u + v & \text{(within range)} \\ u + v + 2^w & \text{(negative overflow)} \end{cases}$$
In languages like C, no overflow occurs for unsigned addition in the language's sense. The catch, of course, is that we get unsigned integer wrapping: the result is computed modulo $2^w$.
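A quick sketch of that wrapping behavior (the exact maximum printed depends on the platform's `unsigned int` width):

```cpp
#include <iostream>
#include <limits>

int main() {
    unsigned int u = std::numeric_limits<unsigned int>::max();  // all ones
    std::cout << u     << std::endl;  // e.g., 4294967295 for a 32-bit unsigned int
    std::cout << u + 1 << std::endl;  // wraps around to 0: addition is mod 2^w
    return 0;
}
```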
Overflow Bits in Multiplication
For multiplication, we have a messier situation. When we multiply a $w$-bit number by a $w$-bit number, we may need up to $2w$ bits for the product: if we have two $w$-bit numbers $x$ and $y$, the product $x \cdot y$ can require twice as many bits as either operand. Thus, the product of two unsigned $w$-bit numbers can require anywhere from $w$ to $2w$ bits.
For two's-complement multiplication (signed multiplication), we have two cases. If exactly one of the operands is the most negative value $TMin_w$, then we may need up to $2w - 1$ bits. If both operands are $TMin_w$, we may need up to $2w$ bits.
Again, we can't keep adding bits. As such, like addition, there are two approaches for handling overflows in multiplication:
- For unsigned multiplication:
  - Ignore the overflow bits (truncation), or
  - Implement modular arithmetic (the product is computed modulo $2^w$).
- For signed multiplication:
  - Ignore the overflow bits, or
  - Compute the exact product (resulting in a bit sequence of length $2w$), and then truncate the result to $w$ bits.
Loops & Overflow Bits
One area where overflow bits are potentially dangerous is iteration. A common iterative construct in many languages is the for-loop, and we often want to count down:
```c
int i;
for (i = start; i >= 0; i--) {
  // code
}
```
The problem with this approach is that its correctness depends on the counter being a signed type; the loop only terminates because `i` can actually go below zero. Considering how common fencepost errors are, unless the programmer truly knows the data they're working with, the loop above is risky. Because of this, some programmers try to make the "nonnegative counter" intent explicit:
```c
unsigned i;
for (i = start; i >= 0; i--) {
  // code
}
```
While this declares that `i` can never be negative, it does not give us the loop we want. Because unsigned arithmetic on any machine wraps around, the condition `i >= 0` is always true: when `i` reaches 0 and is decremented, it wraps to the largest unsigned value, and the loop never terminates.
We can prevent this by using the following:
```c
size_t i, start;
for (i = start - 2; i < start; i--) {
  // code
}
```
This ensures that the loop terminates: once `i` wraps below zero, it becomes a very large value, the condition `i < start` fails, and the loop ends. Of course, it also assumes that the value assigned to `start` is not itself negative to begin with.
Floating Point Numbers
For the rational number 3.145, we know that the 1 is in the 1/10ths place, the 4 in the 1/100ths place, the 5 in the 1/1000ths place, and so on. The same idea applies to binary numbers. For the rational binary number 1011.101, the first 1 to the right of the point corresponds to 1/2, the 0 to 1/4, and the next 1 to 1/8. Thus, the number 1011.101 corresponds to:

$$8 + 0 + 2 + 1 + \frac{1}{2} + 0 + \frac{1}{8} = 11.625$$
There are several limitations with this encoding scheme. First, we can only represent exactly the numbers of the form $x / 2^k$. Every other value has a repeating binary representation, e.g.:

$$\frac{1}{3} = 0.0101010101[01]\ldots_2 \qquad \frac{1}{5} = 0.001100110011[0011]\ldots_2$$

Second, the binary point is fixed: the point doesn't actually move; the bits go around it. This means that there are only so many values we can represent.
Because of these limitations, hardware designers came up with all sorts of ways to mitigate them. In response to the increasingly chaotic market, the Institute of Electrical and Electronics Engineers (IEEE), a professional association of electrical and computer engineers, established a uniform standard for floating point arithmetic (IEEE 754). In particular, the standard clarified how rounding, overflows, and underflows of floating point values should be handled, the three most idiosyncratic areas at the time.
Today, the standard is supported by virtually all major CPUs; those that don't support it are rare outliers. That said, the IEEE standard is a remarkably difficult standard to comply with at the hardware level. Much of the standard was defined by numerical analysts (mathematicians) rather than hardware engineers. The IEEE standard, however, is precisely what makes working with floating point numbers today user-friendly (at least compared to what it was like before the standard).
For computers, the numerical form of a floating point number is the following:

$$(-1)^s \times M \times 2^E$$

where $s$ is the sign bit; $M$ is the significand (or mantissa), the significant digits of the number, usually some fractional value in the range $[1.0, 2.0)$; the bits that encode $M$ are called f-bits (fractional bits). And $E$ is the exponent, which weights the value by a power of two; the bits that encode $E$ are called the e-bits (exponent bits).
For example, the number 15213.0, as a float, can be represented as:

$$(-1)^0 \times 1.1101101101101_2 \times 2^{13}$$

Here, $s = 0$ because 15213.0 is positive, $M = 1.1101101101101_2$ (the f-bits are the bits following the binary point), and $E = 13$. Visualizing with bits, this looks like:

s* | e-bits* | f-bits* |
---|---|---|
0 | 10001100 | 11011011011010000000000 |

Above, we've appended asterisks to each of the field labels to indicate that these are encodings of the numbers comprising the numerical form. In other words, they're not exactly the numbers we'd see in the numerical form: as discussed below, the exponent is stored with a bias, and the significand's leading 1 is left implicit.
Single- vs. Double-Precision
Many languages provide the type float
and the type double
for floating
point numbers. These types correspond to single-precision floating point
numbers and double-precision floating point numbers respectively. At the
bit level, these distinctions result in the following:
Single Precision | Double Precision |
---|---|
1 bit for the sign bit | 1 bit for the sign bit |
8 bits for the e-bits | 11 bits for the e-bits |
23 bits for the f-bits | 52 bits for the f-bits |
total 32 bits (4 bytes) | total 64 bits (8 bytes) |
encodes up to 7 decimal digits | encodes up to 16 decimal digits |
There are, of course, other formats: half precision, triple precision, quad precision, etc.
Kinds of Floating Point Numbers
There are three kinds of floating point numbers, depending on how the e-bits are encoded.
- If all of the e-bits are 0, the number is a denormalized floating point number.
- If all of the e-bits are 1, the number is a special floating point number.
- If the e-bits are not all zeroes and not all ones, the number is a normalized floating point number.
The way a floating point number's exponent and significand are encoded depends on which of these three kinds the floating point number is.
Normalized Floating Point Numbers: Exponent
Following the normalized floating point number's definition, if we focus just on the e-bits, there's a finite range of values we can represent, since the e-bits cannot be all 0s or all 1s. Because this range is potentially very small, we want to make the best use of it.
To do so, the value $E$ is encoded as a biased value:

$$E = e - \text{bias}$$

where $e$ is the unsigned number represented by the e-bits, and the bias is:

$$\text{bias} = 2^{k-1} - 1$$

where $k$ is the number of e-bits. Putting it together, the encoding of the exponent is:

$$e = E + \text{bias}$$

Because of this, the encoding of $E$ falls within the ranges:
- For single-precision ($k = 8$, bias $= 127$): $-126 \leq E \leq 127$.
- For double-precision ($k = 11$, bias $= 1023$): $-1022 \leq E \leq 1023$.
Normalized Floating Point Numbers: Significand
For the significand, the encoding starts with an implied 1:

$$M = 1.xxx\ldots x_2$$

This implies that for any given floating point number, the stored bits do not map directly to the binary representation of the number; they map to the IEEE encoding of the number.
Above, the $xxx\ldots x$ part are the f-bits. If all of these bits are 0, then $M = 1.0$. If all of these bits are 1, then $M = 2.0 - \epsilon$. Here, the $\epsilon$ is some tiny amount. This tiny amount comes from the fact that $M$ falls within the range $[1.0, 2.0)$, so it can never quite get to $2.0$.
Denormalized Values
While the encoding scheme for normalized values covers a lot of floating-point use cases, we can get even closer to zero.
For floating point numbers, when the e-bits are all 0s, we have a denormalized value. For denormalized values, we use a different encoding: the exponent is fixed at $E = 1 - \text{bias}$, and the significand is $M = 0.xxx\ldots x_2$, with no implied leading 1.
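To see the three fields in practice, we can pull a `float` apart bit by bit. The sketch below assumes a 32-bit IEEE single-precision `float` (so `sizeof(float) == 4`) and simply copies its bits into an unsigned integer before slicing out the sign, e-bits, and f-bits; the value 15213.0 is the running example from earlier.

```cpp
#include <iostream>
#include <bitset>
#include <cstring>
#include <cstdint>

int main() {
    float f = 15213.0f;
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));   // reinterpret the float's bits

    std::uint32_t sign = bits >> 31;            // 1 sign bit
    std::uint32_t e    = (bits >> 23) & 0xFF;   // 8 e-bits (biased exponent)
    std::uint32_t frac = bits & 0x7FFFFF;       // 23 f-bits

    std::cout << "s:    " << sign << "\n";
    std::cout << "e:    " << std::bitset<8>(e) << " (E = " << static_cast<int>(e) - 127 << ")\n";
    std::cout << "frac: " << std::bitset<23>(frac) << "\n";
    return 0;
}
```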
Bit Manipulation
The following sections examine bit manipulation techniques.
Bitwise AND
The bitwise AND operation is defined as follows:
definition. Let $a, b \in \{0, 1\}$. If both $a = 1$ and $b = 1$, then $a \& b = 1$. Else, $a \& b = 0$.
The possible results with two bits:

$a$ | $b$ | $a \& b$ |
---|---|---|
0 | 0 | 0 |
0 | 1 | 0 |
1 | 0 | 0 |
1 | 1 | 1 |
Bitwise OR
The Bitwise OR operation is defined as follows:
definition. Let $a, b \in \{0, 1\}$. If $a = 0$ and $b = 0$, then $a \mid b = 0$. Else, $a \mid b = 1$.
The possible results with two bits:

$a$ | $b$ | $a \mid b$ |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 1 |

Here's another example. What is the result of `5 | 8`? Well, first convert 5 to binary, which is 0101. Then we convert 8 to binary, which is 1000. The result then is:

$$0101 \mid 1000 = 1101$$
The number 1101, in decimal form, is 13.
Bitwise XOR
Bitwise XOR (the exclusive or) is defined as follows:
definition. Let $a, b \in \{0, 1\}$. If $a$ and $b$ have the same value, then $a \oplus b = 0$. Else, $a \oplus b = 1$.
Laying out the possibilities with two bits:

$a$ | $b$ | $a \oplus b$ |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 0 |

Illustrating with decimals, suppose we wanted to compute $5 \oplus 7$ (`5 ^ 7` in C). The decimal 5 in binary is 0101, and the decimal 7 in binary is 0111. Thus, we get:

$$0101 \oplus 0111 = 0010$$

which is 2 in decimal.
Bitwise NOT
The bitwise NOT operation is a unary operation: it only takes one argument. The operation is defined as follows:
definition. Let $a \in \{0, 1\}$. If $a = 0$, then $\sim a = 1$. If $a = 1$, then $\sim a = 0$.
For example: $\sim 0 = 1$ and $\sim 1 = 0$.
We can think of the bitwise NOT operation as the "bit-flipper." For example, $\sim 00111010$ results in:

$$11000101$$

Importantly, the bitwise NOT operation will also flip the number's MSB (most significant bit). This means that the result could very well be negative. Above, the input 00111010 is 58 in decimal, while the result 11000101 reads as a negative number in two's complement.
An interesting implication of this fact is the result of coding something along these lines in some language:
```
int a = 0;
int b = ~a;
print(b);
```
This will result in -1. Why? Because the number 0 on, say, a 4-bit platform, is:

$$0000$$

Flipping all of these bits, we get:

$$1111$$

Because the MSB is 1, we have a negative number. And because modern computers use two's complement, we find its magnitude by going through the two's-complement process. Two's complement says we invert all the bits:

$$0000$$

Then add 1:

$$0001$$

The number 0001 in binary is the number 1, so the magnitude is 1. Hence the result -1.
Bitwise Left-shift
The bitwise left-shift operation is defined as follows:
definition. Let $n \in \mathbb{Z}$ and $k \in \mathbb{N}$. Then $n \ll k$ shifts all the digits of $n$ to the left by $k$ positions.
Let's look at an example. The number 5 in binary is 0000 0101. When we write:

$$5 \ll 2$$

we get the binary number:

$$0001\,0100$$

Just to make the comparison clearer:

Expression | Binary |
---|---|
$5$ | 0000 0101 |
$5 \ll 2$ | 0001 0100 |

Notice how the digits have shifted left by 2 places. This is interesting. The number 0001 0100 in decimal is 20, and $20 = 5 \times 2^2$. This observation leads to an alternative definition of the binary left-shift operator:

definition. Let $n \in \mathbb{Z}$ and $k \in \mathbb{N}$. Then:

$$n \ll k = n \cdot 2^k$$

where $n$ is the decimal form of the bit sequence. Thus, we have the following:

$$5 \ll 1 = 10 \qquad 5 \ll 2 = 20 \qquad 5 \ll 3 = 40$$
Bitwise Right-shift
The bitwise right-shift operation is defined as follows:

definition. Let $n \in \mathbb{Z}$ and $k \in \mathbb{N}$. Then $n \gg k$ shifts all the digits of $n$ to the right by $k$ positions.
For example, the number 10 in binary is 0000 1010. The computation:

$$10 \gg 2$$

results in:

$$0000\,0010$$

A before and after shot:

Expression | Binary |
---|---|
$10$ | 0000 1010 |
$10 \gg 2$ | 0000 0010 |

Notice how we lose the last two binary digits. The number 0000 0010 in decimal is 2. Like the left-shift operation, the right-shift operation also yields a more interesting definition:

definition. Let $n \in \mathbb{Z}$ and $k \in \mathbb{N}$. Then:

$$n \gg k = \left\lfloor \frac{n}{2^k} \right\rfloor$$

where $n$ is the decimal form of the bit sequence. Thus, we have:

$$10 \gg 1 = 5 \qquad 10 \gg 2 = 2 \qquad 10 \gg 3 = 1$$

Note that this trick will "round the wrong way" when we shift negative numbers. That is, performing $x \gg k$ on a negative $x$ rounds toward negative infinity rather than toward zero. If we want the result to be closer to zero, we have to add some bias to the number before shifting:

$$(x + 2^k - 1) \gg k = \left\lceil \frac{x}{2^k} \right\rceil$$
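A short sketch of both identities, including the bias trick for negative values. This assumes the platform implements `>>` on signed values as an arithmetic shift, which is the common case on modern machines.

```cpp
#include <iostream>

int main() {
    std::cout << (5 << 2)  << std::endl;  // 5 * 2^2 = 20
    std::cout << (10 >> 2) << std::endl;  // floor(10 / 2^2) = 2

    int x = -7, k = 1;
    std::cout << (x >> k) << std::endl;                   // -4: rounds toward negative infinity
    std::cout << ((x + (1 << k) - 1) >> k) << std::endl;  // -3: the biased shift rounds toward zero
    return 0;
}
```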
Applications
Now that we have an understanding of the bitwise operations, let's see some useful applications. Given that we can perform some fairly interesting arithmetic with these operations, there are numerous possible applications. The curation below is just a small sampling.
Bit Manipulation Helpers
To help us examine our outputs, it's useful to have some utility functions involving bits.
Printing Binary Representation
There are a number of ways to print the binary representation of some data. In C++, we can use the `std::bitset` class template from the `<bitset>` header of the standard library:
```cpp
#include <iostream>
#include <bitset>

// Prints the lowest 8 bits of n in binary.
void printBinaryInt(int n) {
  std::cout << std::bitset<8>(n) << std::endl;
}
```
Negating a Number
We can negate a number by performing the following:

$$-x = \sim x + 1$$

For example, say for the number 5 (0101 in 4-bit binary), we can negate by first performing the bitwise-NOT:

$$\sim 0101 = 1010$$

then adding 1 to the result:

$$1010 + 0001 = 1011$$

which is -5 in two's complement. As a further observation, notice that:

$$x + \sim x = 1111$$

which is -1 in two's complement. Thus, we have another observation:

$$\sim x = -x - 1$$
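These identities are easy to sanity-check in code (a tiny sketch with an arbitrary value):

```cpp
#include <iostream>

int main() {
    int x = 12345;                              // arbitrary value
    std::cout << (-x == ~x + 1) << std::endl;   // 1: negation is bitwise NOT plus one
    std::cout << (x + ~x == -1) << std::endl;   // 1: x + ~x is all ones, i.e., -1
    return 0;
}
```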
Representing & Manipulating Sets
Suppose we had the following sets, each drawing its elements from $\{0, 1, \ldots, 7\}$:

$$A = \{0, 3, 5, 6\} \qquad B = \{0, 2, 4, 6\}$$

We can represent these sets in binary. First, we know that the sets each have a size of four, and every element fits in the range 0 through 7. So, we can use two separate bytes, each starting out as all 0s:

$$A = 0000\,0000 \qquad B = 0000\,0000$$

Then, we set a variable `i = 7` and decrement it down to 0 for both sets. If `i` is equal to an element in the set, we set the bit at position `i` to 1. This gives us a binary representation of the sets:

$$A = 0110\,1001 \qquad B = 0101\,0101$$
Once we have these binary representations, the bitwise operators can be used to generate the different results of various set operations.
Set Intersection
For example, the intersection $A \cap B$ is `A & B`:

$$0110\,1001 \;\&\; 0101\,0101$$

which gives us the bit vector:

$$0100\,0001$$

Examining the set, $0100\,0001$ encodes $\{0, 6\}$, which is in fact the set $A \cap B$.
Set Union
The union $A \cup B$ can be drawn from `A | B`:

$$0110\,1001 \mid 0101\,0101$$

This gives us the bit vector:

$$0111\,1101$$

And in fact, $0111\,1101$ encodes $\{0, 2, 3, 4, 5, 6\} = A \cup B$.
Symmetric Difference
The set $A \,\triangle\, B$ (the symmetric difference) is given by `A ^ B`:

$$0110\,1001 \oplus 0101\,0101$$

yielding the bit vector:

$$0011\,1100$$

And again, $0011\,1100$ encodes $\{2, 3, 4, 5\} = A \,\triangle\, B$.
Complement
The complement of the set $B$ is given by `~B`:

$$\sim 0101\,0101$$

yielding the bit vector:

$$1010\,1010$$

We again see that this too is correct: the set $\{1, 3, 5, 7\}$ contains all of the elements not in the set $B$.
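Putting the whole section together, here's a small sketch that encodes the two example sets above as bytes (bit $i$ set means element $i$ is in the set) and prints each operation's bit vector:

```cpp
#include <iostream>
#include <bitset>
#include <cstdint>

int main() {
    std::uint8_t A = 0b01101001;   // {0, 3, 5, 6}
    std::uint8_t B = 0b01010101;   // {0, 2, 4, 6}

    std::cout << "A & B: " << std::bitset<8>(A & B) << "\n";  // 01000001 -> {0, 6}
    std::cout << "A | B: " << std::bitset<8>(A | B) << "\n";  // 01111101 -> {0, 2, 3, 4, 5, 6}
    std::cout << "A ^ B: " << std::bitset<8>(A ^ B) << "\n";  // 00111100 -> {2, 3, 4, 5}
    std::cout << "~B:    " << std::bitset<8>(static_cast<std::uint8_t>(~B)) << "\n";  // 10101010 -> {1, 3, 5, 7}
    return 0;
}
```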
Parity Checking
We can use the bitwise operations to check whether a given number is odd or even, i.e., to check the number's parity. The most common way to check if a given number $n$ is odd or even is with the modulus operator:

$$n \bmod 2$$

(in C, `n % 2`). Mathematically, this is equivalent to asking whether 2 divides $n$. If it does, then $n$ is even; otherwise, $n$ is odd. So how might we do this with the bitwise operators? Well, let's look at the first 10 integers in binary:
Decimal | Binary |
---|---|
1 | 00001 |
2 | 00010 |
3 | 00011 |
4 | 00100 |
5 | 00101 |
6 | 00110 |
7 | 00111 |
8 | 01000 |
9 | 01001 |
10 | 01010 |
Do we see a pattern in these numbers? Let's make it a little clearer:
Decimal | Binary |
---|---|
1 | 0000**1** |
2 | 0001**0** |
3 | 0001**1** |
4 | 0010**0** |
5 | 0010**1** |
6 | 0011**0** |
7 | 0011**1** |
8 | 0100**0** |
9 | 0100**1** |
10 | 0101**0** |
The last bit of an odd number is 1, while the last bit of an even number is 0. This is not surprising, since each digit in a binary number is a power of two, and the last digit corresponds to $2^0$, which is 1.
Thus, we can check if a number is odd or even by checking whether the last bit in the number is a 1 or a 0. To get that last bit, we can use the bitwise-AND operation. More specifically, to check if the number $n$ is odd or even, we perform:

$$n \;\&\; 1$$

If the last binary digit of $n$ is a 1, then $n \;\&\; 1 = 1$. If it's a 0, then $n \;\&\; 1 = 0$. For example, for the number 5 we get:

$$0101 \;\&\; 0001 = 0001 = 1$$

so 5 is odd.
In code, this might look like the following:
```cpp
#include <iostream>

void parityCheck(int n) {
  if (n & 1) {
    std::cout << n << " is odd" << std::endl;
  } else {
    std::cout << n << " is even" << std::endl;
  }
}
```
Bit Handling
Some of the later applications we'll consider will require us to manipulate the individual bits of a number. This is called bit handling, and it consists of three key operations: (1) reading the ith bit, (2) writing the ith bit, and (3) clearing the ith bit, where i is the position of some bit within the number.
For example, the number 5 on an 8-bit platform is represented as 0000 0101. We can visualize the positions as:

Bit | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
---|---|---|---|---|---|---|---|---|
Position | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |

Importantly, notice the position of the bits. The MSB is the left-most bit, at position $n - 1$, where $n$ is the number of bits constituting the number (we subtract 1 because the count starts at 0).
Read the ith Bit
Using the number 5, suppose we want to get the bit at position 2:

$$0000\,0\mathbf{1}01$$

We can't simply compute `n & 1` here, because that would only give us the bit at position 0. What we really want is a bit sequence that sets up the following:

$$0000\,0101 \;\&\; 0000\,0100$$

In other words, a bit sequence consisting entirely of 0s, other than a 1 at position 2. How might we get this number? Well, it's just the following:

$$1 \ll i$$

where 1 is the integer one, and $i$ is the number of positions we want to shift. Once we do so, we can perform a bitwise-AND:

$$n \;\&\; (1 \ll i)$$

Once we have this number, we can compare it against 0. If it's greater than 0, then the bit is 1. Otherwise, it's 0. The result of computing $1 \ll i$ is called a bitmask. Here's an implementation in C++:
```cpp
#include <iostream>

int getIthBit(int n, int i) {
  int mask = 1 << i;
  return (n & mask) > 0 ? 1 : 0;
}

int main() {
  const int n = 5;
  int i;
  std::cin >> i;
  std::cout << "bit @ position " << i << " is " << getIthBit(n, i) << std::endl;
  return 0;
}
```
Clear the ith Bit
Say we have the following bit sequence:

$$0000\,0101$$

When we clear the bit at position 2, we get:

$$0000\,0001$$

Notice what happened. The bit at position 2 became 0, but everything else was kept as is. This is the operation of bit-clearing. Like we saw previously, we can clear the ith bit by first creating a mask:

$$1 \ll i$$

where $i$ is the position we want cleared. Then we flip the result with bitwise-NOT:

$$\sim(1 \ll i)$$

And then simply perform a bitwise-AND:

$$n \;\&\; \sim(1 \ll i)$$

where $n$ is the number comprising the bit sequence.
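Here's a C++ sketch of the clearing operation, mirroring `getIthBit` from earlier:

```cpp
#include <iostream>

int clearIthBit(int n, int i) {
  int mask = ~(1 << i);   // all ones except a 0 at position i
  return n & mask;        // bit i cleared, every other bit unchanged
}

int main() {
  std::cout << clearIthBit(5, 2) << std::endl;  // 0101 -> 0001, prints 1
  return 0;
}
```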