Variables & Types

Programs need data, and that data must be stored somewhere in memory. Before we store any data in Java, we must first tell Java that we intend to store it. To do so, we use variables: named entities that each hold a value of some data type. In Java, every variable has a name and a data type.1

// Declare variable named c of type char
char c;

// Declare variable named num of type int
int num;

// Declare variable named isPresent of type boolean
boolean isPresent;

In the example above, the lines beginning with two forward slashes (//) are comments. The remaining lines are declarations: the left side is the variable's type, and the right side is the variable's name. Simply put, declaring a variable is the act of creating it.

In programming, a comment is text that the compiler ignores. Because comments are ignored, we can use them to annotate our code. Comments are valuable: they tell readers what a particular statement means or does. However, they also take up space, and too many comments can clutter a program and make it hard to read.

Comments should be concise and descriptive. They should not simply restate the code, since that only wastes space.

A brief note on terminology: there are different kinds of variables in Java. Variables that store primitive type values are called primitive variables (or primitive constants, if their values cannot change). Variables that store objects are called object references. An object reference does not hold the object itself; it holds a reference to the object.

Constants

Variables fall into two categories: (1) variables proper, whose stored values can change, and (2) constants, whose stored values cannot. In languages like Java, the value stored in a variable can be mutated. Accordingly, the term variable on its own usually implies that the stored value can be mutated (again, in the context of Java; some languages do not permit such mutation). When we use the term constant, however, we are referring to a variable whose stored value cannot be changed.
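In Java, a variable is usually made constant with the `final` keyword, which this section has not introduced yet. The minimal sketch below contrasts an ordinary variable with two constants; the class, method, and variable names are illustrative assumptions, not taken from the original text.

```java
class ConstantsDemo {
    // A constant: declared final, so it can be assigned exactly once.
    static final int DAYS_IN_WEEK = 7;

    // Illustrative helper: the number of days in a given number of whole weeks.
    static int daysIn(int weeks) {
        return weeks * DAYS_IN_WEEK;
    }

    public static void main(String[] args) {
        int weeks = 2;   // an ordinary variable: its value may change
        weeks = 3;       // allowed
        // DAYS_IN_WEEK = 8;  // would not compile: cannot assign a value to final variable
        final double taxRate = 0.08; // local variables can be constants too
        System.out.println(daysIn(weeks)); // 21
        System.out.println(taxRate);       // 0.08
    }
}
```

Uncommenting the assignment to `DAYS_IN_WEEK` makes the compiler reject the program, which is exactly the guarantee a constant provides.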

Types

Data comes in many forms. Strings like "Call me Ishmael" and "WARNING" are textual data. The number of users visiting this page is numeric data. These data take different forms, and there are things we can and cannot do with each. We can add 4 and 7, but we cannot divide "love" by "children" (at least not logically). Because of this, Java (like many other programming languages) classifies data by type.

While a program runs, or while a file is open, it is stored in a hardware component called RAM (Random Access Memory). When programs are installed and files are saved (but are not running or open), they are stored on other devices: a hard disk drive, a solid state drive, a CD, a USB drive, or any other storage device besides RAM.

Storage devices like hard drives do not need power to retain their data. RAM, however, does. RAM is also much faster than a hard drive because it is a purely electronic device: unlike a spinning hard disk, it has no moving parts. That speed is what lets us open files and execute programs very quickly. It is also why we lose work when our computer suddenly dies midway through writing a Word document or a C program: the data lives in RAM, and without electricity from the computer's power source, it is lost.

At a very high level, RAM can be understood as a grid of cells, each identified by its own memory address:

[Figure: RAM Allocation]

Whenever we declare a variable in Java, we must explicitly state the variable's type. This is an instruction to the compiler that the particular variable will store a value of a particular type. We will see later why Java's designers decided to implement this feature.

Once we declare a variable to be of a particular type, that variable's type can never change. For example, when we write int x;, the variable x will always store some value of type int. That value may change later, but the variable's type cannot be changed from int to double or char. Because of this trait (a variable's type cannot change after declaration), Java is described as a statically-typed language (i.e., the language's types are "static").2

Whenever we compile Java source code, the javac compiler will check all of our source code to ensure we are complying with Java's type-checking rules. Because type checking is done at compile time, we say that Java employs a static type checking system.

With a few exceptions relating to scope, once we have declared a variable name in Java, we cannot redeclare it later in the same scope, even with a different type:

// Declare variable named c of type char
char c;

// Compile-time error: variable c is already defined
char c;

// Also a compile-time error, even though the type is different
int c;

Even if the last example compiled, it would be a stark example of dirty code and poor programming hygiene. How would our future selves, or other programmers, tell the two variables apart?

In Java, there are eight primitive data types. All other data in Java is represented by some combination of these eight primitive types. The types fall into four broad categories:

  1. integers
  2. floating point numbers
  3. text
  4. booleans

A helpful way to think about types is to imagine them as cups: they hold something. At a coffee shop, for example, drinks can be ordered in various sizes. Suppose the sizes are small, short, tall, grande, gigante, and monstruoso. Some sizes are specific to certain drinks. Maybe the monstruoso size is only available for lemonade or water, since it would be too costly to serve high-quality coffee at that size. The same idea extends to types: each type is a container of a fixed size. As we'll see, an int is 32 bits, a long is 64 bits, a char is 16 bits, and so on.

Integer Type

Integers are the whole numbers and their negative counterparts (e.g., 0, 5, -11, 27, etc.). In Java, these numbers can be represented by any one of these four types:

byte
short
int
long

We will later discuss why there are four different types. For now, let's review how integers are represented. As we know, the computer doesn't actually know what 2 or 17 are; it only understands 1s and 0s. As such, integers written in Hindu-Arabic numerals must be translated to binary form for the computer to process.

Two's-Complement

Natural numbers convert easily. The number 2 is $10_{[2]}$ in binary, and 17 is $10001_{[2]}$. But how do we deal with negative integers?

Let's consider the data type byte. As its name implies, a byte holds only 8 bits. With $n$ bits, we can represent $2^n$ unique bit patterns, so a byte has $2^8 = 256$ of them. The non-negative values run from $00000000_{[2]}$ (0 in decimal) through $01111111_{[2]}$ (127 in decimal). Hang on. Why is there a 0 at the front? Shouldn't the largest value be $11111111_{[2]}$ (255 in decimal)? That leading bit is called the sign bit. If the number is non-negative, the sign bit is 0, and if it's negative, the sign bit is 1. This is why a byte holds the integers from -128 to 127 rather than 0 to 255.

The next question, then, is how the computer stores negative integers. For example, if we write -5, how does the computer go from this literal to binary? The answer is two's complement. The idea is fairly straightforward. Let's start with the number 5, stored in a byte, and work our way to -5. First, we represent the number 5 in binary:

$$00000101_{[2]}$$

Next, we take what we call the one's complement: we invert every bit, changing each 0 to a 1 and each 1 to a 0:

$$11111010_{[2]}$$

Then, we add 1 to the one's complement:

$$\begin{array}{r} 11111010_{[2]} \\ +\ 00000001_{[2]} \\ \hline 11111011_{[2]} \end{array}$$

This result, $11111011_{[2]}$, is the two's complement of 5, and it is how -5 is stored in a byte. Notice that the sign bit is now 1, so this is a negative number. It does not read as -5 directly, though: the remaining seven bits, $1111011_{[2]}$, are 123 in decimal. To recover the magnitude 5, we simply take the two's complement again: inverting $11111011_{[2]}$ gives $00000100_{[2]}$, and adding 1 gives $00000101_{[2]}$, which is 5.

As an aside, notice that when we add the binary representation of 5 to the binary representation of -5, we get 0 (the carry out of the leftmost bit is discarded, because a byte has room for only 8 bits):

$$\begin{array}{rl} 00000101_{[2]} & (5_{[10]}) \\ +\ 11111011_{[2]} & (-5_{[10]}) \\ \hline 00000000_{[2]} & (0_{[10]}) \end{array}$$
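The recipe above (invert every bit, then add 1) can be checked directly in Java. This is a minimal sketch: `TwosComplement`, `negate`, and `toBits` are hypothetical names, and the casts to `byte` keep only the low 8 bits, mirroring the discarded carry.

```java
class TwosComplement {
    // Negate a byte using the two's-complement recipe: invert every bit, then add 1.
    static byte negate(byte value) {
        byte onesComplement = (byte) ~value;   // invert all 8 bits
        return (byte) (onesComplement + 1);    // add 1; the cast keeps only the low 8 bits
    }

    // Render a byte as exactly 8 binary digits (masking with 0xFF avoids sign extension).
    static String toBits(byte value) {
        String bits = Integer.toBinaryString(value & 0xFF);
        return String.format("%8s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        byte five = 5;
        byte minusFive = negate(five);
        System.out.println(toBits(five));               // 00000101
        System.out.println(toBits((byte) ~five));       // 11111010 (one's complement)
        System.out.println(toBits(minusFive));          // 11111011
        System.out.println(minusFive);                  // -5
        System.out.println(toBits(negate(minusFive)));  // 00000101: negating again recovers 5
        System.out.println((byte) (five + minusFive));  // 0
    }
}
```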

Floating Point Types

Floating point numbers are numbers with fractional parts, written with decimal points (e.g., 0.0, 12.9, 3.14, -1.29). We can represent these numbers with either of two types, float and double:

float pi = 3.14f; // the f suffix is required: an unsuffixed 3.14 is a double
double e = 2.718;
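The two types differ in size and precision: a float is 32 bits and a double is 64 bits. The sketch below computes one third in each type to show how many digits survive; the class and method names are illustrative, and the printed values assume a standard JDK's `Float.toString` and `Double.toString`.

```java
class FloatVsDouble {
    // float: 32 bits, roughly 7 significant decimal digits.
    static float oneThirdAsFloat() {
        return 1.0f / 3;
    }

    // double: 64 bits, roughly 15 to 16 significant decimal digits.
    static double oneThirdAsDouble() {
        return 1.0 / 3;
    }

    public static void main(String[] args) {
        float pi = 3.14f;   // the f suffix makes this a float literal
        double e = 2.718;   // unsuffixed decimal literals are doubles
        System.out.println(pi + " " + e);        // 3.14 2.718
        System.out.println(oneThirdAsFloat());   // 0.33333334
        System.out.println(oneThirdAsDouble());  // 0.3333333333333333
    }
}
```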

Text

There are two ways to represent textual data in Java: with the primitive type char or with the abstract data type String. We will investigate the differences between primitive types and abstract data types in later sections; for now, it suffices to understand that char is a primitive, but String is not.

The char type represents individual characters: letters, numerals, symbols, and whitespace characters like the space, tab, and newline. There are $2^{16}$ (65,536) different possible char values.

char x = 'x';
String n = "Hello";

The String type represents sequences of characters. In Java, we denote strings by enclosing a sequence of characters in double quotes.
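To make the distinction concrete, here is a small sketch showing a String as a sequence of chars. `CharVsString` and `secondLetter` are illustrative names; `length` and `charAt` are standard `String` methods, and `charAt` indexes from 0.

```java
class CharVsString {
    // A String is a sequence of chars; charAt returns the char at a 0-based index.
    static char secondLetter(String word) {
        return word.charAt(1);
    }

    public static void main(String[] args) {
        char x = 'x';          // single quotes: exactly one char
        String n = "Hello";    // double quotes: a String
        System.out.println(n.length());       // 5
        System.out.println(secondLetter(n));  // e
        System.out.println(x + n);            // xHello (a char joined onto a String)
    }
}
```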

Boolean

Boolean values are the binary values true and false. These are the only two values of the type.

boolean a = true;
boolean b = false;

With just these eight types:

  • byte, short, int, long
  • float, double
  • char
  • boolean

we can represent any data that a computer can work with.

Initialization

Once we have declared a variable, we can assign it data. In doing so, we initialize the variable (we give the variable an initial value):

// declaration
float increment;

// initialization
increment = 0.1f; // f suffix: 0.1 on its own is a double and will not fit in a float

// inline-initialization: declare and initialize in one line
boolean isItSnowing = false;
long weightBlueWhale = 330000;

In the example above, notice that we can declare and initialize variables in one line. This is called inline-initialization.

Note that each of the statements we have seen so far ends with a semicolon (;); Java uses semicolons to terminate statements. Before we can use a local variable in Java, it must be initialized; otherwise the compiler reports an error:

double x;
System.out.println(x);
Line 3: error: variable x might not have been initialized
System.out.println(x);
                   ^
1 error

A literal is the Java source code representation of a data type value: the number or value we explicitly write, for example on the right side of an assignment. In programs, some values change as the program runs (an incrementer or accumulator, for example). Literals are not computed; they are written explicitly. They might be manipulated later, but the value we wrote always appears directly in our source code.

Literals stand in contrast to compound expressions. An expression is a combination of literals, variables, and operations that Java must evaluate to produce a single value. We can also think of an expression as the Java source code representation of a computation. (Strictly speaking, a literal on its own is the simplest possible expression.)

// These are literals
char tic = 'x';
char tac = 'o';

// Note the L suffix: it marks 10000L as a long literal (a 64-bit integer)
long bigNum = 10000L;

boolean isComplete = true;
boolean isIncomplete = false;

// These are expressions, not literals
int foo = 1 + 1;
long biggerNum = bigNum + 10000L;

Modifying Variables

Once a value is assigned to a variable, we can modify it as the program runs:

int changing = 10;

changing = 20; // assign a new value

changing = 20 + 20; // Arithmetic is ok right of =

changing += 1; // Add 1, assign it the result

changing -= 1; // Subtract 1, assign it the result

changing *= 2; // Multiply by 2, assign it the result

changing /= 2; // Divide by 2, assign it the result
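Tracing the assignments above by hand is a good exercise. The sketch below replays them in order with the running value noted on each line; `ModifyingVariables` and `replay` are illustrative names.

```java
class ModifyingVariables {
    // Replays the assignments above and returns the final value of changing.
    static int replay() {
        int changing = 10;   // 10
        changing = 20;       // 20
        changing = 20 + 20;  // 40
        changing += 1;       // 41
        changing -= 1;       // 40
        changing *= 2;       // 80
        changing /= 2;       // 40
        return changing;
    }

    public static void main(String[] args) {
        System.out.println(replay()); // 40
    }
}
```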

We can also modify variables by assigning them the values of other variables. Always read assignments from right to left:

double first = 10.0;
double second = 5.0;
first = second;          // assignment, right to left: first is now 5.0
second = 20.0;           // first is still 5.0
first = second + 10.0;   // first is now 30.0
double third = 2.0;
first = second + third;  // arithmetic with variables on the right is ok: first is now 22.0

int z = 10;
z = z + 1; // z is now 11

The last example reveals a crucial point about assignment, and why we always read assignments from right to left. On the last line, the right side, z + 1, is computed first, and only then is the result assigned to z. When z + 1 is computed, z still holds 10, so the value assigned to z is 11.

Type Safety

Once we declare a variable, that variable must keep its type. We cannot, say, declare a variable num of type int and thereafter assign it a value of type double (a literal like 10.0 is a double):

int num = 10;
num = 10.0; // compile-time error: possible lossy conversion from double to int

Failure to remember this rule can lead to unexpected results:

int num = 35;
num /= 3;
System.out.println(num);
11

Mathematically, the answer should be $11.\overline{6}$ (the overline, called a vinculum, indicates repeating digits). We get 11 because num was declared to store data of type int. Note that no rounding occurs here: Java simply drops the fractional digits.

Note also that string literals in Java must be enclosed in double quotes on both ends; mixing quote styles causes an error:

System.out.println("Hello, world!'); // causes an error
System.out.println("Hello, world!"); // no error
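To see the integer truncation described above in isolation, the sketch below compares integer division with floating-point division. `IntegerDivision`, `intDivide`, and `realDivide` are illustrative names; the cast to double is one common way to request floating-point division.

```java
class IntegerDivision {
    // Dividing two ints yields an int: the fractional part is discarded (truncated toward zero).
    static int intDivide(int a, int b) {
        return a / b;
    }

    // Converting one operand to double gives floating-point division instead.
    static double realDivide(int a, int b) {
        return (double) a / b;
    }

    public static void main(String[] args) {
        System.out.println(intDivide(35, 3));   // 11, not 12: no rounding
        System.out.println(intDivide(-35, 3));  // -11: truncation is toward zero
        System.out.println(intDivide(7, 2));    // 3
        System.out.println(realDivide(7, 2));   // 3.5
    }
}
```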

Why "Primitive" Type?

The eight types above are referred to as "primitive" types because they can all be represented by numbers. The numeric data types byte, short, int, long, float, double can obviously all be represented by numbers. But what about char and boolean? Well, boolean is simply a binary value—we can represent false with 0 and true with 1.

The Special Case of Char

In the early days of computing, the only characters necessary for computation were unaccented English characters (along with constructs like the linefeed, bell, and whitespace). All of these symbols—128 in total—formed the character set ASCII (American Standard Code for Information Interchange).

Because there were 128 ASCII characters, the integers 0 to 127 were used to represent them. Because the highest integer needed is 127 (in binary, $111\,1111_{[2]}$), seven bits were sufficient for representing all 128 characters. However, by the time ASCII was developed, the smallest unit of computer memory a user could reference was a byte (eight bits). Accordingly, ASCII characters (in C and C++, char values) take up 1 byte of memory. Because of this allocation, ASCII's users and developers found themselves with an extra bit. And with eight bits, the integers 128 to 255 were available for mapping: users now had access to 256 possible characters.

The result was a lost-in-translation situation of painful magnitude. Governments, companies, independent developers, and users were coming up with their own ways of using the extra bit (i.e., the other 128 available integers). Documents, code, and data sent from one entity to the next could not be read because of conflicting standards.

Responding to the discord, IBM introduced code pages: systems mapping values to characters in an encoding system. In IBM's code pages, the integers 0 through 127 were always mapped to the ASCII characters, and the integers 128 to 255 (called the extended codes) were mapped to some language variation of the user's choice. For example, with code page 437, the extended codes were mapped to characters specific to IBM computers: diacritics (accented letters), icons, and system-specific symbols. With code page 737, the extended codes mapped to Greek letters, and with code page 857, they mapped to Turkish letters. With multiple code pages, users could simply swap code pages as needed. A mathematician might work predominantly with the Greek code page, but when reading a German paper, she could switch to a code page with German letters. All 256 characters (the 128 original ASCII characters and the 128 additional characters from a code page) constitute an extended ASCII character set.

Because every ASCII character is encoded in exactly 1 byte, we say that ASCII uses a fixed-width encoding system. This is a good point to clarify an important distinction: there is a difference between a character set (the ASCII set) and that set's encoding system (ASCII encoding). The encoding system is the way the characters in the set are represented in memory. In extended ASCII, as we stated earlier, characters are encoded as eight-bit character codes.

As the internet grew, consumers recognized that eight bits (256 characters) were insufficient. And rightly so: the average Chinese user needs about 7,000 characters for expressive use (out of roughly 50,000 possible characters). As exchanging text between systems (rather than entire systems or parts of systems) became prevalent, a paradigm shift occurred in the encoding community. Rather than thinking of a character as a symbol with one specified representation in computer memory, we came to think of a character as a concept that can be represented in multiple ways. In practice, we call the former paradigm a fixed-width encoding system, and the latter a variable-width encoding system.

For example, the letter A in ASCII encoding employs the former paradigm. It is always represented as:

$$\texttt{A} \to \texttt{0100 0001}$$

Under the new paradigm, we map each character to a concept. That concept, called a code point, can then be deciphered by the computer in whatever way it sees fit (using more bits or bytes as necessary):3

$$\texttt{A} \to \texttt{U+0041}$$

The character set employing this new paradigm is called Unicode.4 Above, the symbol U+0041 is a code point. A code point is simply a number associated with a particular idea. That idea could be a letter, a mathematical symbol, a numeral, a space, a tab, or an emoji. How that number is deciphered and stored as bits is up to the computer. As of this writing, Unicode (now at version 14.0) has 144,697 ideas (characters) mapped, out of 1,112,064 available code points.
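Java exposes code points directly, which makes the idea easy to inspect. In the sketch below, `CodePoints` and `codePointOf` are illustrative names, while `codePointAt`, `codePointCount`, and `Character.toChars` are standard library methods. The emoji U+1F60A is built from its number so the source file stays plain ASCII.

```java
class CodePoints {
    // Returns the first character's Unicode code point in the conventional U+XXXX notation.
    static String codePointOf(String s) {
        return String.format("U+%04X", s.codePointAt(0));
    }

    public static void main(String[] args) {
        String smile = new String(Character.toChars(0x1F60A)); // build U+1F60A without typing it
        System.out.println(codePointOf("A"));    // U+0041
        System.out.println(codePointOf(smile));  // U+1F60A
        // Java strings store UTF-16 code units, so one code point can occupy two chars.
        System.out.println(smile.length());                           // 2
        System.out.println(smile.codePointCount(0, smile.length()));  // 1
    }
}
```

The last two lines preview the next point: how many units a code point occupies depends on the encoding, not on the code point itself.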

This entire discussion reveals a critical point when working with strings: there's no such thing as "plain text." Instructing a computer to convert some int value to "plain text" is akin to asking the bureau de change, "Convert these dollars to currency." The only way a computer can tell 1 apart from "1" is if we explicitly specify the encoding system to use. And because there are multiple encoding systems, it's considered best practice to specify the encoding explicitly whenever possible:

  1. In ASCII, characters are encoded as a sequence of 7 bits. This is a fixed-width encoding system, so only 128 characters can be represented. The integers 0 to 31, along with 127 (delete), map to non-printable control characters, while the integers 32 to 126 map to the printable characters often called "plain text."

  2. Like ASCII, Extended ASCII is a fixed-width encoding system, but with mappings for the additional 128 characters (really, all characters beyond the original 128). The name "extended ASCII" is informal. This is just ASCII, but with characters encoded as a sequence of 8 bits and the user providing some additional encodings (whether that's through a personal code page or another system, like Unicode) for additional numbers.

  3. OEM Code Pages or IBM Code Pages are fixed-width encoding systems for the additional 128 characters resulting from the unused eighth bit in ASCII. As such, characters in this encoding system are encoded as a sequence of 8 bits. There are multitudes of code pages, mapping the additional 128 characters to various symbols depending on language, field, country, or computer system.

  4. The ANSI Code Pages are Microsoft's equivalent of the IBM and OEM code pages, so characters here are also encoded as a fixed-width sequence of 8 bits. Contrary to popular belief, these pages were never standardized by ANSI (the American National Standards Institute, a private non-profit aimed at standardization). Microsoft intended to standardize one of its pages through ANSI and prepended "ANSI" to the draft's title, but no such standardization occurred.

  5. UTF-8, UTF-16, and UTF-32 are the most common systems used to convert Unicode code points to bits. Remember, Unicode is a system mapping concepts to code points; this process is distinct from converting the code points to bits.

The number U+1F60A is a code point. The letter U stands for Unicode, and 1F60A is a hexadecimal number. To convert this code point into bits, the computer looks for the encoding system we've specified. In Unicode, these systems are called Unicode Transformation Formats (hence "UTF"). Importantly, the number following UTF (e.g., the 8 in UTF-8) does not specify how many bits a code point is translated into. Instead, it specifies the size of each code unit produced by the translation. Thus, in UTF-8, a code point (such as 1F60A) is stored in memory as a sequence of 8-bit code units. Every code point from 0 to 7F (0 to 127 in decimal, the ASCII characters) is stored in exactly 1 byte; code points beyond that are stored using 2, 3, or 4 bytes. Similarly, UTF-16 produces 16-bit code units, and UTF-32 produces 32-bit code units.
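The variable widths described above can be measured by encoding a few characters and counting bytes. The sketch uses the standard `String.getBytes(Charset)` and `StandardCharsets`; `EncodedLengths` and `byteLength` are illustrative names. UTF-16BE is used instead of plain UTF-16 so that Java does not prepend a 2-byte byte-order mark to the count.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedLengths {
    // Number of bytes a string occupies once encoded with the given charset.
    static int byteLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",                                     // U+0041
            "\u00E9",                                // e with acute accent, U+00E9
            "\u20AC",                                // euro sign, U+20AC
            new String(Character.toChars(0x1F60A))   // smiling face emoji, U+1F60A
        };
        for (String s : samples) {
            // Expected: UTF-8 uses 1, 2, 3, and 4 bytes; UTF-16 uses 2, 2, 2, and 4 bytes.
            System.out.printf("U+%04X  UTF-8: %d byte(s)  UTF-16: %d byte(s)%n",
                    s.codePointAt(0),
                    byteLength(s, StandardCharsets.UTF_8),
                    byteLength(s, StandardCharsets.UTF_16BE)); // BE variant: no byte-order mark
        }
    }
}
```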

Note how we said that UTF-8, UTF-16, and UTF-32 are the most common options. We say this because Unicode can be encoded through a wide variety of encoding systems: UTF-7, UCS, UCS-2 (now obsolete), ASCII, and many others. These other encoding systems continue to exist because other standards require them. For example, the standard for URL encoding is set by RFC 1738, which effectively provides that only a subset of the original ASCII characters can be used: we can't use non-printable characters, and we can't use any of these characters:

"  <  >  #  %  {  }  |  \  ^  ~  [  ]  `  (space)

If any of the characters above are used directly (e.g., our directory name has a space, resulting in a space in the URL), an encoding algorithm is used:

  1. Find the ISO 8859-1 code point for the illegal character.
  2. Convert the code point to two hexadecimal characters.
  3. Prepend a percent sign, %, to the two hex characters.
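The three steps above can be sketched as a small Java method. This is a simplified illustration, not a complete URL encoder: `PercentEncoding` and `percentEncode` are hypothetical names, it assumes every character fits in ISO 8859-1 (code points 0 to 255), and it only encodes the unsafe characters listed above plus control and non-ASCII characters. Java's built-in `java.net.URLEncoder` targets HTML form encoding (it turns a space into `+`), which is why it is not used here.

```java
class PercentEncoding {
    // Characters RFC 1738 lists as unsafe in URLs (the space is handled by the range check below).
    private static final String UNSAFE = "\"<>#%{}|\\^~[]`";

    // Hypothetical helper: percent-encode unsafe characters, assuming ISO 8859-1 input.
    static String percentEncode(String input) {
        StringBuilder out = new StringBuilder();
        for (char c : input.toCharArray()) {
            boolean unsafe = c <= 0x20               // control characters and the space
                    || c >= 0x7F                     // DEL and everything outside plain ASCII
                    || UNSAFE.indexOf(c) >= 0;       // explicitly listed unsafe characters
            if (!unsafe) {
                out.append(c);
            } else if (c > 0xFF) {
                throw new IllegalArgumentException("Not an ISO 8859-1 character: " + c);
            } else {
                // Steps 1-3: take the code point, write it as two hex digits, prefix it with %.
                out.append('%').append(String.format("%02X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(percentEncode("my folder/notes #1.txt"));
        // my%20folder/notes%20%231.txt
    }
}
```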

For example, the space character is illegal under RFC 1738. Applying the algorithm above, the space (code point 32, hexadecimal 20) is replaced with %20. Thus, when we see %20 in a URL, we immediately know that whoever, or whatever, created that URL included a space, inadvertently or otherwise. Similar algorithms exist for characters that cannot be encoded at all; the replacement character � (U+FFFD) is often substituted for them.

In sum, whenever we work with strings, it's important to always keep the encoding system in the back of our minds. This can be particularly helpful when analyzing and designing string algorithms:

| Encoding System | Code Unit | Memory Consumption per Character |
|---|---|---|
| ASCII | 7 bits | Constant: essentially 1 byte |
| "Extended ASCII" | 8 bits | Constant: 1 byte |
| UTF-7 | 7 bits | Variable: ASCII characters take 1 byte; other characters take several |
| IBM/OEM Code Pages | 8 bits | Constant: 1 byte |
| ANSI Code Pages | 8 bits | Constant: 1 byte |
| ISO 8859 | 8 bits | Constant: 1 byte |
| UTF-8 | 8 bits | Variable: 1, 2, 3, or 4 bytes (minimum 1 byte) |
| UTF-16 | 16 bits | Variable: 2 or 4 bytes (minimum 2 bytes) |
| UTF-32 | 32 bits | Constant: 4 bytes |
| UCS-2 (obsolete) | 16 bits | Constant: 2 bytes |
| UCS-4 (obsolete) | 32 bits | Constant: 4 bytes |

To simplify our algorithms, we will be working almost exclusively with ASCII, where every character takes up 1 byte of memory. This will allow us to explore some of the limitations of such algorithms when a different encoding system is used. Because the original ASCII characters are widely used, it's helpful to memorize the following facts:

  1. The uppercase letters A through Z are mapped to the integers in the range [65, 90].

  2. The lowercase letters a through z are mapped to the integers in the range [97, 122].

  3. The uppercase letters come "before" the lowercase letters in terms of their integer equivalents.

  4. The numerals 0 through 9 are mapped to the integers in the range [48, 57].

  5. The ranges of integers [32, 47], [58, 64], [91, 96], and [123, 126] map to special characters like ( ) and /.

  6. The range of integers [0, 31], along with 127 (delete), map to control characters (these are non-printable characters).

  7. The integer 32 is mapped to the space character.

  8. The integer 10 is mapped to the linefeed (i.e., the result of hitting enter on the keyboard; a new line).
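Because a Java char is a number under the hood, the ranges above support simple arithmetic on characters. The sketch below assumes ASCII letters and digits only; `AsciiArithmetic`, `toUpperAscii`, and `digitValue` are illustrative names (the standard library's `Character.toUpperCase` handles far more than ASCII).

```java
class AsciiArithmetic {
    // Uppercase and lowercase letters are 32 apart in ASCII ('a' is 97, 'A' is 65).
    static char toUpperAscii(char c) {
        if (c >= 'a' && c <= 'z') {
            return (char) (c - ('a' - 'A')); // subtract 32
        }
        return c; // leave everything else untouched
    }

    // The numerals '0' through '9' occupy 48 through 57, so subtracting '0' gives the digit's value.
    static int digitValue(char c) {
        if (c < '0' || c > '9') {
            throw new IllegalArgumentException("Not a digit: " + c);
        }
        return c - '0';
    }

    public static void main(String[] args) {
        System.out.println((int) 'A');          // 65
        System.out.println((int) 'a');          // 97
        System.out.println((int) ' ');          // 32
        System.out.println(toUpperAscii('q'));  // Q
        System.out.println(digitValue('7'));    // 7
    }
}
```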

Strongly-typed v. Weakly-typed Languages

In many languages, like Python and JavaScript, we, the programmers, are not required to explicitly state what types our data are. These are often called weakly-typed languages (more precisely, they are dynamically typed: types belong to values and are checked while the program runs). Java and C, however, are strongly-typed languages: we must explicitly provide a datum's type. The benefit of strongly-typed languages is that they force us to specify how much space we need to store data, which means we have no choice but to be efficient. Moreover, they help us catch some of the most common programming errors.

Why are there so many types?

Java provides a variety of types to manage memory and to respond to hardware advances. Each type takes up a fixed amount of memory. Here's a summary:

| Java Primitive Type | Bytes Required | Range | Default (fields) |
|---|---|---|---|
| boolean | 1 byte (typically) | true, false | false |
| byte | 1 byte | [-128..127] | 0 |
| char | 2 bytes | [0..65535] | '\u0000' |
| short | 2 bytes | [-32768..32767] | 0 |
| int | 4 bytes | [-2147483648..2147483647] | 0 |
| float | 4 bytes | approx. $\pm 3.4 \times 10^{38}$ (smallest positive: approx. $1.4 \times 10^{-45}$) | 0.0f |
| long | 8 bytes | [-9223372036854775808..9223372036854775807] | 0L |
| double | 8 bytes | approx. $\pm 1.8 \times 10^{308}$ (smallest positive: approx. $4.9 \times 10^{-324}$) | 0.0d |
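Rather than memorizing these numbers, we can ask Java for them: each wrapper class (`Byte`, `Short`, `Character`, `Integer`, `Long`, `Float`, `Double`) defines `SIZE`, `MIN_VALUE`, and `MAX_VALUE` constants. The sketch below prints one row per type; `PrimitiveRanges` and `row` are illustrative names. Note that `boolean` has no such constants, and that for `float` and `double`, `MIN_VALUE` is the smallest *positive* value, not the most negative one.

```java
class PrimitiveRanges {
    // Format one row: type name, size in bytes, minimum, maximum.
    static String row(String type, int bits, Object min, Object max) {
        return String.format("%-6s %d byte(s)  [%s..%s]", type, bits / 8, min, max);
    }

    public static void main(String[] args) {
        System.out.println(row("byte", Byte.SIZE, Byte.MIN_VALUE, Byte.MAX_VALUE));
        System.out.println(row("short", Short.SIZE, Short.MIN_VALUE, Short.MAX_VALUE));
        System.out.println(row("char", Character.SIZE,
                (int) Character.MIN_VALUE, (int) Character.MAX_VALUE)); // print chars as numbers
        System.out.println(row("int", Integer.SIZE, Integer.MIN_VALUE, Integer.MAX_VALUE));
        System.out.println(row("long", Long.SIZE, Long.MIN_VALUE, Long.MAX_VALUE));
        // For floating point types, MIN_VALUE is the smallest positive value.
        System.out.println(row("float", Float.SIZE, Float.MIN_VALUE, Float.MAX_VALUE));
        System.out.println(row("double", Double.SIZE, Double.MIN_VALUE, Double.MAX_VALUE));
    }
}
```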

But why so many? For example, byte, int, short, and long all represent integers. But why are there four separate types? The answer is a combination of history and economics.

First, all of the computations done by a computer are ultimately done by the computer's CPU. Without the CPU, a computer would just be a metal brick. Now, we might have heard of various terms like "64-bit processors" or "32-bit processors". At the time of writing, mainstream processors are 64-bit processors. Before this, we had 32-bit processors, and before that, 16-bit processors. There were, and are, 12-bit, 8-bit, and 4-bit processors. What do these terms mean?

In our early years, we learned to count with our fingers: 5 for five fingers, and 10 for ten fingers. Computers also have to count, but they don't have fingers. However, recall our discussion on representation. All the computer really needs is two fingers to represent 0 and 1. The word "bit," as used in "64-bit," communicates how many fingers the computer has to count with. With a 4-bit processor, the computer has only four fingers, and it can only count up to the binary number $1111_{[2]}$ (15 in decimal). With a 32-bit processor, the computer has 32 fingers, and it can count up to the binary number

$$11111111111111111111111111111111_{[2]}$$

(4,294,967,295 in decimal, or $2^{32} - 1$).

Now, recall that when we execute programs, we are really sending instructions to the CPU. Those instructions are in 0s and 1s. The CPU, however, has a fundamental constraint: it has a fixed size for how many 0s and 1s it can process at once (more formally, in one cycle). A 64-bit processor can process 64 bits of data in a single cycle, a 32-bit processor 32 bits, a 16-bit processor 16 bits, and so on. This limitation affects how well the computer handles large computations. For example, we can quickly compute $2 + 2 = 4$. With something like $298 + 769$, however, we have to perform carry-overs. The same idea extends to CPUs: with numbers beyond what it can handle in a single cycle, the CPU must perform more than one step to complete the computation.

This limitation extends to another important part of the computer: RAM. Recall the RAM diagram earlier. Each square in the grid has a memory address, and that address is an integer. For example, consider a 3-bit processor. With 3 bits, the computer can only count up to $111_{[2]}$ (7 in decimal). This in turn means that the computer can only generate 8 possible patterns of bits: 000, 001, 010, 011, 100, 101, 110, and 111. In terms of memory, the computer would only be able to understand these 8 addresses. Reference anything beyond them, and the computer won't know what we're talking about. With that limitation, our programs can only be so large and complex; memory is everything.

How does this all relate to Java's types? At the time Java was introduced, mainstream processors were 32-bit processors. Accordingly, Java used 4 bytes to represent integers (8 bits in a byte, 4 bytes yields 32 bits). This also explains why compilers for older languages like C once used 2 bytes to represent integers: at the time, 16-bit processors were the norm. To allow programmers to write programs for older machines (called backwards compatibility), Java provides byte and short. And in response to newer machines using 64-bit processors, Java provides long.

Knowing these limitations is critical when we're working with data types of small value ranges. Failing to recall them can lead to unexpected results:

class Overflow {
	public static void main(String[] args) {
		byte num = 10;
		num += 256;
		System.out.println(num);
	}
}
10

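Why isn't the console displaying 266? Because the type byte is restricted to exactly 1 byte: it can only store integers from -128 to 127. When a result falls outside that range, it wraps around. A byte has only 256 possible values, so adding 256 wraps all the way around and lands back on 10. In programming, this is called overflow. We can guard against overflow by checking results against the type's limits, such as Byte.MIN_VALUE and Byte.MAX_VALUE.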
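One way to apply that advice is to compute the result in a wider type and compare it with the limits before storing it. In the sketch below, `OverflowCheck`, `fitsInByte`, and `incrementMax` are hypothetical names; `Byte.MIN_VALUE`, `Byte.MAX_VALUE`, and `Math.addExact` (which throws `ArithmeticException` on int or long overflow) are standard.

```java
class OverflowCheck {
    // Hypothetical helper: does a + b fit in a byte? The sum is computed as a long,
    // which is wide enough that the check itself cannot overflow.
    static boolean fitsInByte(byte a, int b) {
        long exact = (long) a + b;
        return exact >= Byte.MIN_VALUE && exact <= Byte.MAX_VALUE;
    }

    // Adding 1 to the largest byte wraps around to the smallest one.
    static byte incrementMax() {
        byte max = Byte.MAX_VALUE; // 127
        max++;                     // wraps to -128
        return max;
    }

    public static void main(String[] args) {
        System.out.println(fitsInByte((byte) 10, 256)); // false
        System.out.println(fitsInByte((byte) 10, 100)); // true
        System.out.println(incrementMax());             // -128
        try {
            Math.addExact(Integer.MAX_VALUE, 1); // for int and long, addExact throws on overflow
        } catch (ArithmeticException ex) {
            System.out.println("int overflow detected");
        }
    }
}
```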

Operators

Operators are evaluated in a specific order. Parentheses are always evaluated first. Multiplication (*), division (/), and remainder (%) have higher precedence than addition (+) and subtraction (-), so they are evaluated first. Operators of equal precedence are evaluated from left to right. Below is a reference of the various operators in Java, where a and b are variables or literals.
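A few small expressions make these precedence rules concrete. Each method in the sketch below (names are illustrative) returns one expression, with a comment showing how Java groups it.

```java
class Precedence {
    static int multiplyFirst()  { return 2 + 3 * 4; }   // * before +: 2 + 12
    static int parensFirst()    { return (2 + 3) * 4; } // parentheses first: 5 * 4
    static int leftToRight()    { return 20 / 5 * 2; }  // equal precedence: (20 / 5) * 2
    static int subtractChain()  { return 10 - 4 - 3; }  // (10 - 4) - 3, not 10 - (4 - 3)
    static int remainderFirst() { return 7 + 7 % 3; }   // % before +: 7 + 1

    public static void main(String[] args) {
        System.out.println(multiplyFirst());   // 14
        System.out.println(parensFirst());     // 20
        System.out.println(leftToRight());     // 8
        System.out.println(subtractChain());   // 3
        System.out.println(remainderFirst());  // 8
    }
}
```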

Addition

a+b

The addition operator is straightforward. It computes $a + b$, where a and b are numeric types or char. If either a or b is a String, the operands are concatenated instead.
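Because + both adds and concatenates, the operand types decide which one happens, and left-to-right evaluation matters when the two are mixed. A short sketch with illustrative names:

```java
class PlusOperator {
    // Two chars are promoted to int and added numerically.
    static int addChars(char a, char b) {
        return a + b;
    }

    // Two Strings are concatenated.
    static String join(String a, String b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(addChars('a', 'b'));        // 195 (97 + 98)
        System.out.println(join("Hello, ", "world"));  // Hello, world
        System.out.println("1" + 2 + 3);  // 123: left to right, so "1" + 2 is already a String
        System.out.println(1 + 2 + "3");  // 33: 1 + 2 is computed as ints first
    }
}
```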

Subtraction

a-b

Another basic operation is subtraction: it computes $a - b$, where a and b are numeric types or char.

Multiplication

a * b

Multiplication, $a \times b$, is done with the asterisk (star) symbol. In Java, a and b are numeric types or char.

Division

a / b

Division, $\dfrac{a}{b}$, is done with the forward slash character. a and b are numeric types or char. If both are integer types, the result is an integer: the fractional part is dropped.

Remainder

a % b

The percent sign is the remainder operator. It computes

$$\dfrac{a}{b}$$

and returns the remainder. a and b are numeric types. In Java, a nonzero remainder takes the sign of a; for example, -7 % 3 is -1.

Increment

a++

The increment operator adds 1 to a and stores the result back in a, much like a = a + 1. a is a numeric type.

Decrement

a--

The decrement operator subtracts 1 from a and stores the result back in a, much like a = a - 1. a is a numeric type.
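The sketch below applies both operators to a variable. One subtlety it also shows: used inside a larger expression, a++ evaluates to the value a had *before* the increment. `IncrementDecrement` and `countUpThenDown` are illustrative names.

```java
class IncrementDecrement {
    // Applies a++ three times and a-- once, returning the final value.
    static int countUpThenDown(int start) {
        int a = start;
        a++; // a = a + 1
        a++;
        a++;
        a--; // a = a - 1
        return a;
    }

    public static void main(String[] args) {
        System.out.println(countUpThenDown(5)); // 7

        int a = 5;
        int old = a++;            // the expression a++ evaluates to the old value of a
        System.out.println(old);  // 5
        System.out.println(a);    // 6
    }
}
```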

The Less Than Symbol

a < b

The less than symbol is a relational operator. It returns true if $a < b$, otherwise false. a and b are numeric types.

The Greater Than Symbol

a > b

The greater than symbol is a relational operator. It returns true if $a > b$, otherwise false. a and b are numeric types.

Less than or Equal To

a <= b

Another relational operator; returns true if $a < b$ or $a = b$, otherwise false. $a$ and $b$ are numeric types.

Greater than or Equal To

a >= b

Relational operator; returns true if $a > b$ or $a = b$, otherwise false. $a$ and $b$ are numeric types.

Equality

a == b

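Relational operator; returns true if $a = b$, otherwise false. $a$ and $b$ are numeric types. The == operator also works on boolean values and on object references; for references, it checks whether both refer to the same object, not whether two objects have equal contents (use equals() to compare the contents of strings).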

Non-equality

a != b

Returns true if $a \neq b$, otherwise false. Like ==, it accepts numeric types, boolean values, and object references.
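A short sketch of the relational operators, including the difference between == and equals() for strings. The strings are created with new String(...) on purpose, so that they are guaranteed to be two distinct objects; the class name is arbitrary.

```java
public class ComparisonDemo {
    public static void main(String[] args) {
        int a = 3;
        int b = 5;
        System.out.println(a < b);      // prints true
        System.out.println(a >= b);     // prints false
        System.out.println(a != b);     // prints true
        System.out.println('a' < 'b');  // chars compare by character code: prints true

        String s1 = new String("hi");
        String s2 = new String("hi");
        System.out.println(s1 == s2);      // two different objects: prints false
        System.out.println(s1.equals(s2)); // same characters: prints true
    }
}
```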

Logical AND

a && b

Logical operator AND; returns true if $a$ is true and $b$ is true; otherwise false. $a$ and $b$ are of type boolean. The operator short-circuits: if $a$ is false, $b$ is never evaluated.

Logical OR

a || b

Logical operator OR; returns true if $a$ is true or $b$ is true (or both); otherwise false. $a$ and $b$ are of type boolean. The operator short-circuits: if $a$ is true, $b$ is never evaluated.

Logical NOT

!a

The logical operator NOT returns false if $a$ is true, and returns true if $a$ is false. $a$ is of type boolean.
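The sketch below exercises all three logical operators and shows why short-circuiting matters in practice. The helper dividesEvenly is hypothetical, written for this example: because && skips its right operand when the left one is false, the remainder is never computed when d is 0.

```java
public class LogicalDemo {
    // Hypothetical helper: the right operand runs only when d != 0,
    // so this never divides by zero.
    static boolean dividesEvenly(int n, int d) {
        return d != 0 && n % d == 0;
    }

    public static void main(String[] args) {
        boolean a = true;
        boolean b = false;
        System.out.println(a && b); // prints false
        System.out.println(a || b); // prints true
        System.out.println(!a);     // prints false
        System.out.println(dividesEvenly(10, 5)); // prints true
        System.out.println(dividesEvenly(10, 0)); // prints false, no ArithmeticException
    }
}
```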

Idioms

Many computer science newcomers are unfamiliar with the remainder operator. This operator simply returns the remainder from dividing some number $a$ by $b$.

The remainder operator is particularly useful. For example, if a % 2 evaluates to 0, then we know that $a$ is an even number. If it evaluates to anything else (1 for positive odd numbers, -1 for negative ones, since the result takes the sign of $a$), then we know that $a$ is an odd number. For example:

public class Demo {
	public static void main(String[] args) {
		int x = 4;
		int y = 5;
		boolean xIsEven = (x % 2 == 0); // xIsEven is true
		boolean yIsEven = (y % 2 == 0); // yIsEven is false
		System.out.println(xIsEven);
		System.out.println(yIsEven);
	}
}
true
false

Examining this use of the remainder operator, we can generalize the pattern further: x % 3 == 0 checks whether x is a multiple of 3, x % 4 == 0 whether it is a multiple of 4, x % 11 == 0 whether it is a multiple of 11, and so on.

In computer science, this is an example of an idiom. An idiom is a common, recognizable programming pattern. In this case, the pattern is x % n == 0, where $x$ is some variable and $n$ is a positive integer. Learning and recognizing idioms is a core skill in programming; with it, we can identify and solve smaller problems quickly. And as we'll see with functions, the best way to approach a problem is to break it down into smaller problems.
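One way to capture this idiom is to wrap it in a small method. The name isMultipleOf is hypothetical, chosen for this sketch, and it assumes n is nonzero (x % 0 throws an ArithmeticException for ints).

```java
public class MultipleDemo {
    // The x % n == 0 idiom as a method; assumes n != 0.
    static boolean isMultipleOf(int x, int n) {
        return x % n == 0;
    }

    public static void main(String[] args) {
        System.out.println(isMultipleOf(12, 3));  // prints true
        System.out.println(isMultipleOf(12, 5));  // prints false
        System.out.println(isMultipleOf(-9, 3));  // -9 % 3 is 0: prints true
        System.out.println(isMultipleOf(33, 11)); // prints true
    }
}
```

Comparing against 0 rather than checking for a positive remainder keeps the idiom correct for negative values of x.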

Bitwise Operators

The bitwise operators operate on the individual bits of integer values (byte, short, char, int, and long). Because they map directly onto simple processor instructions, they are very cheap to execute. Below is a summary of the operators. We will present examples separately.

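Operator  Meaning
&         Bitwise AND; corresponds to the logical AND, $\land$
|         Bitwise OR; corresponds to the logical OR, $\lor$
~         Bitwise NOT; corresponds to the logical NOT, $\neg$
^         Bitwise XOR; corresponds to the logical exclusive OR, $\oplus$
>>        Bitwise RIGHT SHIFT; shifts bits right, copying the sign bit into the vacated high bits
>>>       Bitwise UNSIGNED RIGHT SHIFT; shifts bits right, filling the vacated high bits with zeros
<<        Bitwise LEFT SHIFT; shifts bits left, filling the vacated low bits with zeros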

Each bitwise operator applies a truth table to every pair of corresponding bits in its operands. To see the tables, suppose we had two literals, a and b, each represented in binary by just a single bit. Thus, a and b can only be 1 or 0. The bitwise AND, &, behaves as follows:

a  b  a & b
0  0  0
0  1  0
1  0  0
1  1  1

The bitwise OR, |, behaves as follows:

a  b  a | b
0  0  0
0  1  1
1  0  1
1  1  1

The bitwise XOR, ^, behaves as follows:

a  b  a ^ b
0  0  0
0  1  1
1  0  1
1  1  0

And the bitwise NOT, ~, behaves as follows:

a  ~a
0  1
1  0
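A sketch applying the tables above to whole int values, one bit position at a time. The 0b prefix writes a binary literal; the class name and the specific values are arbitrary.

```java
public class BitwiseDemo {
    public static void main(String[] args) {
        int a = 0b1100; // 12
        int b = 0b1010; // 10
        System.out.println(a & b);      // 0b1000: prints 8
        System.out.println(a | b);      // 0b1110: prints 14
        System.out.println(a ^ b);      // 0b0110: prints 6
        System.out.println(~a);         // flips all 32 bits: prints -13
        System.out.println(a << 2);     // 0b110000: prints 48
        System.out.println(a >> 2);     // 0b11: prints 3
        System.out.println(-16 >> 2);   // sign bit copied in: prints -4
        System.out.println(-16 >>> 28); // zeros shifted in: prints 15
    }
}
```

The last two lines show why Java has two right shifts: >> preserves the sign of a negative number, while >>> treats the bits as unsigned.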

Naming

A name, or more formally, an identifier, is a sequence of characters in source code used to label a particular entity (for example, a variable). In Java, a name must begin with a Unicode letter, the dollar sign ($), or the underscore (_); after the first character, it may also contain digits. By convention, however, names should always begin with a letter. Variable and method names in Java follow camelCase (capitalizing the first letter of each word after the first, as in numAttendees, to mark word boundaries), rather than snake_case, also called pothole case (using underscores to mark word boundaries). The exception to this convention is constants: every letter in a constant should be upper case, with word boundaries marked by underscores. Finally, Java places heavy emphasis on descriptive and concise variable names, erring on the side of descriptive.

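// This is good
int age = 22;

// These are good: constants are declared final and named in upper case
final int NUM_ATTENDEES = 87;
final boolean ATTENDED = true;

// This is bad: legal, but names should begin with a letter
float $pi = 3.14f;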

In general, there are three guidelines we should follow when creating variable names:

  1. The variable name should describe the data stored in the variable.

  2. Our code is read more times than it is written (whether by ourselves or others). Our priority should be "easy to understand," not "easy to write."

  3. If there are standard conventions, follow them; otherwise, choose a naming scheme and apply it consistently.

Additionally, like most programming languages, Java has certain words that we cannot use as names. These are called reserved words:

abstract, assert
boolean, break, byte
case, catch, char, class, const, continue
default, do, double
else, enum, extends
false, final, finally, float, for
goto
if, implements, import, instanceof, int, interface
long
native, new, null
package, private, protected, public
return
short, static, strictfp, super, switch, synchronized
this, throw, throws, transient, true, try
void, volatile, while
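The naming rules and the reserved-word list can be checked from code. The sketch below uses javax.lang.model.SourceVersion.isName, a JDK method that rejects malformed names as well as keywords and the literals true, false, and null, plus Character.isJavaIdentifierStart and isJavaIdentifierPart for the character-level rules. Treat it as an illustration of the rules, not something a typical program needs; the class name and candidate names are arbitrary.

```java
import javax.lang.model.SourceVersion;

public class NameCheckDemo {
    public static void main(String[] args) {
        String[] candidates = {"age", "numAttendees", "$pi", "_count", "2fast", "class", "true"};
        for (String name : candidates) {
            // Prints true for the first four names and false for the last three.
            System.out.println(name + " -> " + SourceVersion.isName(name));
        }
        // A digit may not start a name, but may appear after the first character.
        System.out.println(Character.isJavaIdentifierStart('2')); // prints false
        System.out.println(Character.isJavaIdentifierPart('2'));  // prints true
    }
}
```

Note that "$pi" passes the check: the dollar sign is legal, and avoiding it is only a convention.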

Library Methods & APIs

In the examples above, we wrote the following:

System.out.println(/* some expression or value */);

This is a library method: a method provided by Java's standard library. There are numerous library methods in Java, for mathematics, printing, input and output, and more. Because there are so many, we will not list and explain them all at once. Instead, we will introduce and explain them as needed.
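As a small preview, the sketch below calls a few standard library methods from java.lang, which every Java program can use without an import. The selection is arbitrary; it also shows the difference between print and println.

```java
public class LibraryDemo {
    public static void main(String[] args) {
        System.out.println(Math.max(3, 8));             // larger of two values: prints 8
        System.out.println(Math.abs(-5));               // absolute value: prints 5
        System.out.println(Math.sqrt(16.0));            // square root: prints 4.0
        System.out.println(Integer.parseInt("42") + 1); // text to int: prints 43
        System.out.print("no newline after this ");     // print does not end the line
        System.out.println("until println");            // println ends the line
    }
}
```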

Footnotes

  1. A data type is a finite set of values and the operations on those values. For example, the data type int consists of values: the integers from $-2^{31}$ to $2^{31} - 1$. And it consists of operations: addition, subtraction, multiplication, division, comparison, etc.

  2. At the other end of the spectrum are dynamically typed languages like JavaScript; in these languages, a variable can store an integer, then later store a Boolean, then later store a string. Type checking in these languages is also done at runtime, which is called dynamic type checking.

  3. The number 0001 is a hexadecimal number.

  4. The name "Unicode" is a morphological blending of unique, unified, universal, and encoding.