Variables & Types
- Constants
- Types
- Boolean
- Initialization
- Modifying Variables
- Type Safety
- Why "Primitive" Type?
- The Special Case of Char
- Strongly-typed v. Weakly-typed Languages
- Operators
- Addition
- Subtraction
- Multiplication
- Division
- Remainder
- Increment
- Decrement
- The Less Than Symbol
- The Greater Than Symbol
- Less than or Equal To
- Greater than or Equal To
- Equality
- Non-equality
- Logical AND
- Logical OR
- Logical NOT
- Idioms
- Bitwise Operators
- Naming
- Library Methods & APIs
Programs need data. To give our programs that data, it must be stored somewhere in memory. But before we store any data in Java, we must tell Java beforehand that we are going to store data. To do so, we need variables—an entity that holds a data type value. In Java, every variable has a name and a data type.1
// Declare variable named c of type char
char c;
// Declare variable named num of type int
int num;
// Declare variable named isPresent of type boolean
boolean isPresent;
In the example above, there are comments, denoted by two forward slashes (//). Every other statement is called a declaration. The left side is the variable's type, and the right side is the variable's name. Simply put, variable declaration is the act of creating a variable.
In programming, a comment is a statement ignored by the compiler. Because they're ignored by the compiler, we can use them to annotate our code. Comments are valuable in programming. They tell readers what a particular statement means or does. But, they also take up space and too many comments can clutter a program, making it unreadable.
Comments should be concise and descriptive. They should not be simply identical to the actual code (since that would simply be a waste of space).
A brief note on terminology: There are different kinds of variables in Java. Variables that store primitive type values are called primitive variables (or primitive constants). Variables that store objects are called object references. With an object reference, the variable doesn't actually hold the object—it instead holds a reference to the object.
Constants
Variables fall into two categories: (1) variables, and (2) constants. In languages like Java, the values we store in a variable can be mutated. Accordingly, the term variable, on its own, usually implies that the value stored in the variable can be mutated (again, in the context of Java; some languages do not permit such mutation). When we use the term constant, however, we are referring to a variable whose stored value cannot be changed.
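In Java, we mark a variable as a constant with the final keyword. Here's a minimal sketch (the names are our own):
public class ConstantsDemo {
    public static void main(String[] args) {
        // A constant: once initialized, the value cannot be reassigned
        final double TAX_RATE = 0.0825;
        // An ordinary variable: reassignment is allowed
        double subtotal = 100.0;
        subtotal = subtotal + 50.0;
        System.out.println(subtotal * TAX_RATE); // prints 12.375
        // TAX_RATE = 0.09; // uncommenting this line causes a compile-time error
    }
}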
Types
Data comes in many forms. The sentence "Call me Ishmael" and the word "WARNING" are textual data. The number of users visiting this page is numeric data. These data all take different forms, and there are things we can and cannot do with them. We can add 4 and 7, but we cannot divide "love" by "children" (at least not logically). Because of this fact, Java (and many other programming languages) classify data by type.
While a program runs, and while a file is open, it is stored in a hardware
component called the RAM ("Random Access Memory"). When you install
programs and save files (and they are not running or open), they are stored
in a different hardware component, the hard drive
(or hard disk or
solid state disk; or a CD; or a USB; devices that store data, other than
RAM). Devices like hard drives do not require electricity to store data.
RAM, however, does. It is much faster than a hard drive, because it is a
purely electronic device—unlike hard drives, there are no moving parts.
With RAM, we have the ability to very quickly open files and execute
programs, because everything is electric. This is why we lose data when our
computer suddenly dies midway through writing a Word Document or a C
program. The data is stored in RAM, and without electricity coming from the
computer's power source, everything is lost.
At a very high level, RAM can be pictured as a grid of cells, each with its own numbered address.
Whenever we declare a variable in Java, we must explicitly state the variable's type. This is an instruction to the compiler that the particular variable will store a value of a particular type. We will see later why Java's designers decided to implement this feature.
Now, once we declare a variable to be of a particular type, that variable's
type can never change. For example, when we write int x;
the variable x
the variable x
will always store some value of type int
. That value may change later,
but the variable's type int
can never be changed to double
or char
.
Because of this trait—the prohibition of changing a variable's type after
declaration—Java is described as a statically-typed language (i.e., the
language's types are "static").2
Whenever we compile Java source code, the javac
compiler will check all
of our source code to ensure we are complying with Java's type-checking
rules. Because type checking is done at compile time, we say that Java
employs a static type checking system.
With few exceptions relating to scope, in Java, once you have declared a variable name of a particular type, you cannot redeclare it later in the program:
// Declare variable named c of type char
char c;
// Variable is already declared, this will return an error
char c;
// This also won't work, even if it's of different type
int c;
Even if the last example worked, it would be a stark example of dirty code and poor programming hygiene: how would our future selves, or other programmers, tell the two apart?
In Java, there are eight primitive data types. All other data in Java is represented by some combination of these eight primitive types. The types fall into four broad categories:
- integers
- floating point numbers
- text
- booleans
A helpful way to think about types is to imagine them as cups. They hold something. For example, at a coffee shop, drinks can be ordered in various sizes. Suppose the sizes are small, short, tall, grande, gigante, and monstruoso. Some of the drink sizes are specific to certain drinks. Maybe the monstruoso size is only available for lemonade or water; it would be too costly to serve high quality coffee at such a size. The same idea extends to types. As we'll see, an int is 32 bits, a long is 64 bits, a char is 16 bits, and so on.
Integer Type
Integers are the whole numbers and their negative counterparts (e.g., 0, 5, -11, 27, etc.). In Java, these numbers can be represented by any one of these four types:
byte
short
int
long
We will later discuss why there are 4 different types. For now, let's
review some representation. As we know, the computer doesn't actually know
what 2
or 17
are—the computer only understands 1s and 0s. As such,
integers, written in Hindu-Arabic numerals, must be translated to binary
form for the computer to process.
Two's-Complement
With the natural numbers, we can convert easily. The number 2 is 10 in binary, and 17 is 10001. But how do we deal with negative integers?
Let's consider the data type byte. As its name implies, a byte can only hold 8 bits. With 8 bits, we can represent 256 unique bit patterns, starting from 0000 0000 (0 in decimal) through 0111 1111 (127 in decimal). Hang on. Why does the range stop there? Shouldn't 1000 0000 be 128? That leftmost bit is called the sign bit. If the number is positive, the sign bit is 0, and if it's negative, the sign bit is 1.
The next question, however, is how the computer stores negative integers. For example, if we wrote -7, how does the computer go from this literal to binary? The answer is through two's-complement. The idea is fairly straightforward. Let's start with the number 7 stored in a byte, and work our way to negative 7. First, we represent the number in binary:
0000 0111
Next, we take what we call the one's-complement. Essentially, we invert, or change, all of the digits into their opposites:
1111 1000
Then, we add 1 to the one's-complement:
1111 1001
This result, 1111 1001, is called the two's-complement, and it is how -7 is stored. Notice that we now have a 1 as the sign bit. Read as an unsigned pattern this would be 249, but under two's-complement it represents -7. To recover the 7, we simply take the two's-complement of 1111 1001 again: inverting gives 0000 0110, and adding 1 gives 0000 0111, which is 7.
As an aside, notice that when we add the binary representation of 7 to the binary representation of -7, we get 0000 0111 + 1111 1001 = 1 0000 0000. The ninth bit doesn't fit in a byte, so what remains is 0000 0000, which is 0, exactly as arithmetic demands.
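We can watch two's-complement at work from Java itself. A short sketch (our own example) prints the bit patterns of 7 and -7 with the library method Integer.toBinaryString; keep in mind that an int is 32 bits, so the negative pattern shows 32 bits rather than 8:
public class TwosComplement {
    public static void main(String[] args) {
        int positive = 7;
        // 7 in binary: 111
        System.out.println(Integer.toBinaryString(positive));
        // -7 in binary (32 bits): 11111111111111111111111111111001
        System.out.println(Integer.toBinaryString(-positive));
        // Building -7 by hand: invert the bits of 7 (one's-complement), then add 1
        int byHand = ~positive + 1;
        System.out.println(byHand); // prints -7
    }
}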
Floating Point Types
Floating point numbers are those numbers with fractional parts, or decimal points (i.e., 0.0, 12.9, 3.14, -1.29, etc.). We can represent these numbers with either of two types, float or double:
float pi = 3.14f;
double e = 0.577;
Text
There are two ways to represent textual data in Java: with the primitive
type char
or with the abstract data type String
. We will investigate
what the differences between a primitive type and an abstract data type are
in later sections, but for now, it suffices to understand that char
is a
primitive, but String
is not.
The char type represents individual alphanumeric characters and symbols. There are 65,536 different possible char values; this includes letters, numbers, symbols, and whitespace characters like single-space, tab, and newline.
char x = 'x';
String n = "Hello";
The String
type represents sequences of characters. In Java, we denote
strings by enclosing a sequence of characters in double quotes.
Boolean
Boolean values are the binary values true
and false
. These are the only two values of the boolean type.
boolean a = true;
boolean b = false;
With just these eight types:
byte, short, int, long
float, double
char
boolean
we can represent any data that a computer can work with.
Initialization
Once we have declared a variable, we can assign it data. In doing so, we initialize the variable (we give the variable an initial value):
// declaration
float increment;
// initialization
increment = 0.1f;
// inline-initialization: declare and initialize in one line
boolean isItSnowing = false;
long weightBlueWhale = 330000;
In the example above, notice that we can declare and initialize variables in one line. This is called inline-initialization.
Note that every statement in Java ends with a semicolon (;). Java is a semicolon-delimited language. To use variables in Java, they must be initialized:
double x;
System.out.println(x);
Line 3: error: variable x might not have been initialized
System.out.println(x);
^
1 error
A literal is the Java source code representation of a data type value. The number or value we explicitly assign to a variable is called a literal. In programs, there are values that change as the program runs; an incrementer or accumulator, for example. Literals are those values that are not computed; they are explicitly written. They might be manipulated, but the value we explicitly wrote always appears directly in our source code.
The opposite of a literal is an expression. An expression is a combination of literals, variables, and operations that Java must evaluate to produce a single value. We can also think of the expression as the Java source code representation of a computation.
// These are literals
char tic = 'x';
char tac = 'o';
// Note the L suffix; this is how we tell Java that this is a long literal
long bigNum = 10000L;
boolean isComplete = true;
boolean isIncomplete = false;
// These are expressions, not literals
int foo = 1 + 1;
long biggerNum = bigNum + 10000L;
Modifying Variables
Once a value is assigned to a variable, we can modify it as the program runs:
int changing = 10;
changing = 20; // assign a new value
changing = 20 + 20; // Arithmetic is ok right of =
changing += 1; // Add 1, assign it the result
changing -= 1; // Subtract 1, assign it the result
changing *= 2; // Multiply by 2, assign it the result
changing /= 2; // Divide by 2, assign it the result
We can also modify variables by assigning them the values of other variables. Always read assignments from right to left:
double first = 10.0;
double second = 5.0;
first = second; // assignment, right to left
second = 20.0;
first = second + 10.0;
double third = 2.0;
first = second + third; // arithmetic with variables on right is ok
int z = 10;
z = z + 1; // z is now 11
The last example reveals a crucial point about assignment, and why we want
to always read them from right to left. The value assigned to z
at the
very last line is first computed—z + 1
—then assigned. The value stored in
z
when z + 1
is computed is 10
, so the value assigned to z
at the
very last line is 11
.
Type Safety
Once we declare a variable in Java, that variable must maintain its type. We
cannot, say, declare a variable num
of type int
, and thereafter assign
to it data of type float
:
int num = 10;
num = 10.0; // compile-time error: incompatible types
Failure to remember this rule can lead to unexpected results:
int num = 35;
num /= 3;
System.out.println(num);
11
The correct answer to the above should be $11.\overline{6}$ (the overline, called a vinculum, indicates repeating digits). We're getting 11 because num was declared to store data of type int. Note that there is no rounding occurring here; Java is simply dropping the fractional digits. Note also that Strings in Java must be enclosed in double quotes:
System.out.println("Hello, world!'); // causes an error
System.out.println("Hello, world!"); // no error
Why "Primitive" Type?
The eight types above are referred to as "primitive" types because they can
all be represented by numbers. The numeric data types
byte, short, int, long, float, double
can obviously all be represented by
numbers. But what about char
and boolean
? Well, boolean
is simply a
binary value—we can represent false
with 0
and true
with 1
.
The Special Case of Char
In the early days of computing, the only characters necessary for computation were unaccented English characters (along with constructs like the linefeed, bell, and whitespace). All of these symbols—128 in total—formed the character set ASCII (American Standard Code for Information Interchange).
Because there were 128 ASCII characters, the integers from 0 to 127 were
used to represent each character. Because the highest possible integer
representation is 127—in binary, 1111111—seven bits were
sufficient for representing all 128 characters. However, by the time ASCII
was developed, the smallest possible unit in computer memory a user could
reference was a byte (eight bits). Accordingly, ASCII characters—in C and
C++, char
values—take up 1 byte of memory. Because of this allocation,
ASCII's users and developers found themselves with an additional bit. And
with eight bits, the integers 128 to 255 were available for mapping—users
now had access to 256 possible characters.
The result was a lost-in-translation situation of painful magnitude. Governments, companies, independent developers, and users were coming up with their own ways of using the extra bit (i.e., the other 128 available integers). Documents, code, and data sent from one entity to the next could not be read because of conflicting standards.
Responding to the discord, IBM introduced code pages—systems mapping values to characters in an encoding system. In IBM's code pages, the integers 0 through 127 were always mapped to the ASCII characters, and the integers 128 to 255 (called the extended codes) were mapped to some language variation of the user's choice. For example, with code page 437, the extended codes were mapped to characters specific to IBM computers: diacritics (accented letters), icons, and system-specific symbols. For code page 737, the extended codes mapped to Greek letters, and for code page 826, the extended codes mapped to Turkish letters. With multiple code pages, users could simply swap code pages as needed. The mathematician might work predominantly with the Greek letter code page, but when reading a German paper, she could switch to the German code page. All 256 characters (the 128 original ASCII characters and the 128 additional characters from a code page) constitute an extended ASCII character set.
Because every ASCII encoding requires exactly 1 byte, we say that ASCII uses a fixed-width encoding system. This is a good point to clarify an important distinction: There's a difference between the character set (ASCII set) and the character set's encoding system (ASCII encoding). The encoding system is the way the characters in the set are represented in memory. In extended ASCII, characters are encoded as eight-bit character codes, as we stated earlier.
As the internet grew, consumers recognized that eight bits, 256 characters, were insufficient. And rightly so—the average Chinese user demands about 7000 characters for expressive use (from roughly 50000 possible characters). As exchanging text between systems—rather than entire systems or parts of the systems themselves—became prevalent, a paradigm shift in the encoding community occurred. Rather than thinking of a character as a symbol with one specified representation in computer memory, we think of a character as a concept that can be represented in multiple ways. In practice, we call the former paradigm a fixed-width encoding system, and the latter a variable-width encoding system.
For example, the letter A
in ASCII encoding employs the former paradigm.
It is always represented as 0100 0001 (65 in decimal).
Under the new paradigm, we map each character to a concept. That concept, called a code point, can then be deciphered by the computer in whatever way it sees fit (using more bits or bytes as necessary):3
The character set employing this new paradigm is called
Unicode.4 Above, the symbol U+0001
is a code point. The
code point is simply a number associated with a particular idea. That idea
could be a letter, a mathematical symbol, a numeral, whitespace, tab, or an
emoji. How that number is deciphered and stored as bits is up to the
computer. As of the time of this writing, Unicode (now at version 14.0) has
roughly 145,000 ideas mapped, with most of its 1,114,112 code points in reserve.
This entire discussion reveals a critical point when working with strings:
There's no such thing as "plain text." Instructing a computer to change
some int
value to "plain text" is akin to asking the bureau de change,
"Convert these dollars to currency." The only way a computer can separate
1
from "1"
is if we explicitly provide the encoding system to use.
And it's considered best practice to explicitly define encoding whenever
possible because there are multiple encoding systems:
-
In ASCII, the characters are encoded as a sequence of 7 bits. This is a fixed-width encoding system, so only 128 characters can be represented. The characters mapped to the integers 0 to 31 are non-printable characters, while the characters from 32 to 127 comprise the printable characters often called "plain text."
-
Like ASCII, Extended ASCII is a fixed-width encoding system, but with mappings for the additional 128 characters (really, all characters beyond the original 128). The name "extended ASCII" is informal. This is just ASCII, but with characters encoded as a sequence of 8 bits and the user providing some additional encodings (whether that's through a personal code page or another system, like Unicode) for additional numbers.
-
OEM Code Pages or IBM Code Pages are fixed-width encoding systems for the additional 128 characters resulting from the unused eighth bit in ASCII. As such, characters in this encoding system are encoded as a sequence of 8 bits. There are multitudes of code pages, mapping the additional 128 characters to various symbols depending on language, field, country, or computer system.
-
The ANSI Code Pages are Microsoft's equivalent to the IBM and OEM code pages, so characters here are also encoded as a fixed-width sequence of 8 bits. Contrary to popular belief, these pages were never standardized by ANSI (the American National Standards Institute, a private non-profit aimed at standardization). Microsoft intended to standardize one of their pages through ANSI and prepended "ANSI" to the draft's title, but no such standardization occurred.
-
UTF-8, UTF-16, and UTF-32 are the most common systems used to convert Unicode code points to bits. Remember, Unicode is a system mapping concepts to code points; this process is distinct from converting the code points to bits.
The number U+1F60A
is a code point. The letter U
stands for Unicode
and the number 1F60A
is a hexadecimal number. To convert this code point
into bits, the computer system looks for the encoding system we've defined.
In Unicode, these systems are called Unicode Translation Formats (hence
"UTF"). Importantly, the number following UTF (e.g., the 8 in UTF-8) does
not specify how many bits the code point is translated into. Instead, it
specifies the size of each code unit that results from translating the hexadecimal number. Thus, in UTF-8, the Unicode code point (the magic number 1F60A) is stored in memory as a sequence of 8-bit code units. Hence, every code point 0
to
7f
(0 to 127 in decimal—the ASCII characters) is stored in exactly 1
byte. Code points beyond that are stored using 2, 3, or 4 bytes. Similarly,
in UTF-32, the code units resulting from translating the code point are
stored as sequences of 32 bits, and for UTF-16, a sequence of 16 bits.
Note how we said that UTF-8, UTF-16, and UTF-32 are the most common options. We say this because Unicode can be encoded through a wide variety of encoding systems: UTF-7, UCS, UCS-2 (now obsolete), ASCII, and many others. These other encoding systems continue to exist because the operations of other standards necessitate their use. For example, the standard for URL encoding is set by RFC 1738, which effectively provides that only a subset of the original ASCII characters can be used: We can't use non-printable characters and we can't use any of these characters:
"" | < | > | # | { | } | sp (space) |
| | ^ | \ | ~ | [ | ] | ``` |
If any of the characters above are used directly (i.e., maybe our directory name has a space, resulting in a space in the URL), an encoding algorithm is used:
- Find the ISO 8859-1 code point for the illegal character.
- Convert the code point to two hexadecimal characters.
- Append a percentage sign,
%
, to the front of the two hex characters.
For example, the single whitespace character is an illegal character under
RFC 1738. Applying the algorithm above, the whitespace is replaced with a
%20
. Thus, when we see a %20
in a URL, we immediately know that
whoever, or whatever, created that URL included a whitespace, inadvertently
or otherwise. Similar algorithms exist for when we use characters that
cannot be encoded. The replacement character � is often
used to replace characters that cannot be encoded.
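Here is a sketch of the percent-encoding steps above (the helper name percentEncode is our own, and it assumes the character's code point fits in two hex digits, i.e., is at most 255):
public class PercentEncode {
    // Hypothetical helper: percent-encode a single illegal character per the steps above
    static String percentEncode(char illegal) {
        // Steps 1 and 2: take the character's code point and render it as two hex digits
        // Step 3: prepend a percent sign
        return String.format("%%%02X", (int) illegal);
    }

    public static void main(String[] args) {
        System.out.println(percentEncode(' ')); // prints %20
        System.out.println(percentEncode('[')); // prints %5B
    }
}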
In sum, whenever we work with strings, it's important to always keep the encoding system in the back of our minds. This can be particularly helpful when analyzing and designing string algorithms:
Encoding System | Code Unit Size | Memory Consumption |
---|---|---|
ASCII | A sequence of 7 bits. | Constant memory consumption; essentially 1 byte. |
"Extended ASCII" | A sequence of 8 bits. | Constant memory consumption: 1 byte. |
UTF-7 | Each code unit is a sequence of 7 bits. | Variable memory consumption; ASCII characters take 1 byte, other characters take more. |
IBM/OEM Code Maps | A sequence of 8 bits. | Constant memory consumption: 1 byte. |
ANSI Code Maps | A sequence of 8 bits. | Constant memory consumption: 1 byte. |
ISO 8859 | A sequence of 8 bits. | Constant memory consumption: 1 byte. |
UTF-8 | Each code unit is a sequence of 8 bits. | Variable memory consumption; a character can take up 1, 2, 3, or 4 bytes. At a minimum, a character is 1 byte. |
UTF-16 | Each code unit is a sequence of 16 bits. The smallest possible memory consumption is 2 bytes, the largest is 4 bytes. | Variable memory consumption; a character can take up 2, 3, or 4 bytes. At a minimum, a character is 2 bytes. |
UTF-32 | A sequence of 32 bits. | Constant memory consumption: 4 bytes. |
UCS-2 (obsolete) | A sequence of 16 bits. | Constant memory consumption: 2 bytes. |
UCS-4 (obsolete) | A sequence of 32 bits. | Constant memory consumption: 4 bytes. |
To simplify our algorithms, we will be working almost exclusively with ASCII, where every character takes up 1 byte of memory. This will allow us to explore some of the limitations of such algorithms when a different encoding system is used. Because the original ASCII characters are widely used, it's helpful to memorize the following facts:
- The uppercase letters A through Z are mapped to the integers in the range 65 to 90.
- The lowercase letters a through z are mapped to the integers in the range 97 to 122.
- The uppercase letters come "before" the lowercase letters in terms of their integer equivalents.
- The numerals 0 through 9 are mapped to the integers in the range 48 to 57.
- The ranges of integers 32 to 47, 58 to 64, 91 to 96, and 123 to 126 map to special characters like ( ) and /.
- The range of integers 0 to 31 maps to control characters (these are non-printable characters).
- The integer 32 is mapped to whitespace (the single space).
- The integer 10 is mapped to linefeed (i.e., the result of hitting enter on the keyboard; a new line).
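Java lets us check these mappings directly, since a char can be converted to and from its integer code. A quick sketch:
public class AsciiFacts {
    public static void main(String[] args) {
        System.out.println((int) 'A'); // 65
        System.out.println((int) 'Z'); // 90
        System.out.println((int) 'a'); // 97
        System.out.println((int) '0'); // 48
        System.out.println((int) ' '); // 32
        System.out.println((char) 10 == '\n'); // true: 10 is the linefeed
        // Lowercase letters sit exactly 32 above their uppercase counterparts
        System.out.println((char) ('a' - 32)); // A
    }
}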
Strongly-typed v. Weakly-typed Languages
In many languages, like Python and JavaScript, we, the programmers, are not required to explicitly state what types our data are. These are called weakly-typed languages. Java and C, however, are strongly-typed languages—we must explicitly provide a datum's type. The benefit of strongly-typed languages: It forces us to specify how much space we need to store data. This means we have no choice but to be efficient. Moreover, it helps us catch some of the most common programming errors.
Why are there so many types?
Java provides a variety of types to manage memory and to respond to hardware advances. Each of the types takes up a certain amount of memory. Here's an API:
Java Primitive Type | Bytes Required | Range | Default |
---|---|---|---|
boolean | 1 byte | true, false | false |
byte | 1 byte | -128 to 127 | 0 |
char | 2 bytes | \u0000 to \uffff (0 to 65,535) | \u0000 |
short | 2 bytes | -32,768 to 32,767 | 0 |
int | 4 bytes | -2,147,483,648 to 2,147,483,647 | 0 |
float | 4 bytes | approximately ±3.4 × 10^38 | 0.0f |
long | 8 bytes | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | 0L |
double | 8 bytes | approximately ±1.8 × 10^308 | 0.0d |
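We don't need to memorize the numeric bounds; each numeric type's wrapper class exposes them as constants. A quick sketch:
public class Ranges {
    public static void main(String[] args) {
        System.out.println(Byte.MIN_VALUE);    // -128
        System.out.println(Byte.MAX_VALUE);    // 127
        System.out.println(Short.MAX_VALUE);   // 32767
        System.out.println(Integer.MAX_VALUE); // 2147483647
        System.out.println(Long.MAX_VALUE);    // 9223372036854775807
        System.out.println((int) Character.MAX_VALUE); // 65535
    }
}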
But why so many? For example, byte
, int
, short
, and long
all
represent integers. But why are there four separate types? The answer is a
combination of history and economics.
First, all of the computations done by a computer are ultimately done by the computer's CPU. Without the CPU, a computer would just be a metal brick. Now, we might have heard of various terms like "64-bit processors" or "32-bit processors". At the time of writing, mainstream processors are 64-bit processors. Before this, we had 32-bit processors, and before that, 16-bit processors. There were, and are, 12-bit, 8-bit, and 4-bit processors. What do these terms mean?
In our early years, we learned to count with our fingers. 5 for five fingers, and 10 for ten fingers. Computers also have to count, but they don't have fingers. However, recall our discussion on representation. All the computer really needs is two fingers to represent 0 and 1. The word "bit," as used in "64-bit," communicates how many fingers the computer has to count. With a 4-bit processor, the computer only has four fingers, and it can only count up to the binary number 1111 (15 in decimal). With a 32-bit processor, the computer has 32 fingers, and it can count up to the binary number 1111 1111 1111 1111 1111 1111 1111 1111 (4,294,967,295 in decimal).
Now, recall that when we execute programs, we are really sending instructions to the CPU. Those instructions are in 0s and 1s. The CPU, however, has a fundamental constraint: It has a fixed size for how many 0s and 1s it can process at once (or more formally, in one cycle). With a 64-bit processor, the CPU can process 64 bits of data in a single cycle. With a 32-bit processor, 32 bits; with a 16-bit processor, 16 bits; and so on. This limitation impacts how well the computer handles large computations. For example, we can quickly compute 2 + 3 in our heads. However, with something like 48 + 76, we have to perform carry-overs. The same idea extends to CPUs. With numbers beyond what it can handle in a single cycle, the CPU must perform more than 1 step to complete the computation.
This limitation extends to another important part of the computer: RAM. Recall the description of RAM earlier. Each cell in the grid has a memory address, and that address is named as an integer. For example, consider a 3-bit processor. With 3 bits, the computer can only count up to 111 (7 in decimal). This in turn means that the computer can only generate 8 possible patterns of bits: 000, 001, 010, 011, 100, 101, 110, or 111. In terms of memory, the computer would only be able to understand these 8 addresses. Reference anything beyond these addresses, and the computer won't know what we're talking about. And with that limitation, our programs can only be so large and complex—memory is everything.
How does this all relate to Java's types? At the time Java was introduced,
mainstream processors were 32-bit processors. Accordingly, Java used 4
bytes to represent integers (8 bits in a byte, 4 bytes yields 32 bits).
This also explains why compilers for older languages like C use 2 bytes to
represent integers—at the time, 16-bit processors were the norm. To allow
programmers to write programs for older machines—called backwards
compatibility—Java provides byte
and short
. And in response to newer
machines—using 64-bit processors—Java provided long
.
Knowing these limitations is critical when we're working with data types of small value ranges. Failing to recall them can lead to unexpected results:
class Overflow {
    public static void main(String[] args) {
        byte num = 10;
        num += 256;
        System.out.println(num);
    }
}
10
Why isn't the console displaying 266? Because the type byte is restricted to exactly 1 byte: it can only store integers from -128 to 127. The sum 10 + 256 is 266, which does not fit in a byte, so Java keeps only the lowest 8 bits; and since 256 is exactly one full trip around the byte's 256 possible patterns, we land right back on 10. In programming, this is called overflow. We can prevent encountering overflows by testing against the type's MIN_VALUE and MAX_VALUE, as sketched below.
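One way to guard against overflow (a sketch, not the only approach) is to do the arithmetic in a wider type and check the narrow type's bounds before narrowing; for int arithmetic, the library method Math.addExact throws an exception instead of silently wrapping:
public class OverflowGuard {
    public static void main(String[] args) {
        byte num = 10;
        int widened = num + 256; // int arithmetic: no wrap-around here

        // Check the bounds before narrowing back to byte
        if (widened >= Byte.MIN_VALUE && widened <= Byte.MAX_VALUE) {
            num = (byte) widened;
        } else {
            System.out.println("would overflow: " + widened); // prints: would overflow: 266
        }

        // For ints, Math.addExact throws ArithmeticException on overflow
        try {
            Math.addExact(Integer.MAX_VALUE, 1);
        } catch (ArithmeticException e) {
            System.out.println("int overflow detected");
        }
    }
}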
Operators
Operators are evaluated in a specific order. Parentheses are always
evaluated first. Inside the parentheses or otherwise, operators are
evaluated left-to-right. Going from left to right, multiplication (*), division (/), and remainder (%) are evaluated before addition (+) and subtraction (-).
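A quick sketch of these precedence rules:
public class Precedence {
    public static void main(String[] args) {
        // *, /, and % bind tighter than + and -
        System.out.println(2 + 3 * 4);   // 14, not 20
        System.out.println(10 - 9 % 4);  // 9, because 9 % 4 is 1

        // Parentheses are evaluated first
        System.out.println((2 + 3) * 4); // 20
    }
}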
Below is an API of the various operators in Java, where a and b are variables or literals.
Addition
a + b
The addition operator is straightforward. It computes a + b, where a and b are numeric types or char. If a and b are strings, then the strings are concatenated.
Subtraction
a - b
Another basic operation is subtraction: compute a - b, where a and b are numeric types or char.
Multiplication
a * b
Multiplication, the product a × b, is done with the asterisk or star symbol. In Java, a and b are numeric types or char.
Division
a / b
Division, the quotient a ÷ b, is done with the forward slash character. a and b are numeric types or char.
Remainder
a % b
The percentage sign corresponds to the remainder operator. It divides a by b and returns the remainder. a and b are numeric types.
Increment
a++
The increment operator increases a by 1. a is a numeric type.
Decrement
a--
The decrement operator decreases a by 1. a is a numeric type.
The Less Than Symbol
a < b
The less than symbol is a relational operator. It returns true if a < b, otherwise false. a and b are numeric types.
The Greater Than Symbol
a > b
The greater than symbol is a relational operator. It returns true if a > b, otherwise false. a and b are numeric types.
Less than or Equal To
a <= b
Another relational operator; returns true if a < b or a == b, otherwise false. a and b are numeric types.
Greater than or Equal To
a >= b
Relational operator; returns true if a > b or a == b, otherwise false. a and b are numeric types.
Equality
a == b
Relational operator; returns true if a equals b, otherwise false. a and b are numeric types.
Non-equality
a != b
Returns true if a does not equal b, otherwise false. a and b are numeric types.
Logical AND
a && b
Logical operator AND; returns true if a is true and b is true; otherwise false. a and b are of type boolean.
Logical OR
a || b
Logical operator OR; returns true if a is true or b is true; otherwise false. a and b are of type boolean.
Logical NOT
!a
The logical operator NOT returns false if a is true, and returns true if a is false. a is of type boolean.
Idioms
Many computer science newcomers are unfamiliar with the remainder operator. This operator simply returns the remainder from dividing some number x by another number n.
The remainder operator is a particularly useful operation. For example, if x % 2 returns a remainder of 0, then we know that x is an even number. If x % 2 returns a remainder greater than 0, then we know that x is an odd number. For example:
public class Demo {
    public static void main(String[] args) {
        int x = 4;
        int y = 5;
        boolean xIsEven = (x % 2 == 0); // xIsEven is true
        boolean yIsEven = (y % 2 == 0); // yIsEven is false
        System.out.println(xIsEven);
        System.out.println(yIsEven);
    }
}
true
false
Examining this use of the remainder operator, we can see that we can generalize this pattern even more: write x % 3, and we check whether x is a multiple of 3; x % 4, a multiple of 4; x % 11, a multiple of 11; and so on. In computer science, this is an example of an idiom. An idiom is just a programming pattern. In this case, we have some pattern x % n, where x is some variable and n is a positive integer. Learning and recognizing idioms is a core skill in programming; with it, we can identify and solve smaller problems quickly. And as we'll see with functions, the best way to approach a problem is to break it down into smaller problems.
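Here's a brief sketch of the x % n idiom beyond evenness checks (the variable names are our own):
public class RemainderIdiom {
    public static void main(String[] args) {
        int x = 45;
        // Divisibility: x % n == 0 means x is a multiple of n
        System.out.println(x % 3 == 0); // true: 45 is a multiple of 3
        System.out.println(x % 4 == 0); // false
        // Wrap-around: hours on a 12-hour clock
        int hour = 11;
        System.out.println((hour + 3) % 12); // 2, i.e., three hours after 11 o'clock
    }
}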
Bitwise Operators
The bitwise operators operate on individual bits of data. Because they work directly at the bit level, they are among the fastest operators available. Below is an API of the operators. We will present examples separately.
Operator | Meaning |
---|---|
& | Bitwise AND; corresponds to the logical && |
\| | Bitwise OR; corresponds to the logical \|\| |
~ | Bitwise NOT; corresponds to the logical ! |
^ | Bitwise XOR; outputs 1 exactly when the two bits differ |
>> | Bitwise RIGHT SHIFT (sign-extending) |
>>> | Bitwise UNSIGNED RIGHT SHIFT |
<< | Bitwise LEFT SHIFT |
The bitwise operators are analogous to computing truth tables. Let's say we
had two literals, a
and b.
Let's further say that a
and b
are
represented in binary by just a single bit. Thus, a
and b
can only be
1
or 0
. The bitwise AND
&
performs as such:
a | b | a & b |
---|---|---|
0 | 0 | 0 |
0 | 1 | 0 |
1 | 0 | 0 |
1 | 1 | 1 |
The bitwise OR
, |
,
a | b | a \| b |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 1 |
The bitwise XOR
, ^
,
a | b | a ^ b |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 0 |
and the bitwise NOT
, ~
,
a | ~a |
---|---|
0 | 1 |
1 | 0 |
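The same truth tables apply bit by bit when a and b are wider than one bit. A short sketch (our own example) using Java's binary literals:
public class BitwiseDemo {
    public static void main(String[] args) {
        int a = 0b1100; // 12
        int b = 0b1010; // 10

        System.out.println(Integer.toBinaryString(a & b));  // 1000
        System.out.println(Integer.toBinaryString(a | b));  // 1110
        System.out.println(Integer.toBinaryString(a ^ b));  // 110
        System.out.println(Integer.toBinaryString(a << 1)); // 11000
        System.out.println(Integer.toBinaryString(a >> 1)); // 110
    }
}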
Naming
A name, or more formally, an identifier, is a sequence of
characters in source code used to label a particular entity (for example, a
variable). In Java, names may contain Unicode letters, digits, the dollar sign ($), and the underscore (_), but a name must not begin with a digit. By convention, however, names should always begin with a letter. Names in Java also follow camelCasing (capitalizing each word after the first to indicate spacing), rather than pothole casing (using underscores to indicate spacing). The exception to this convention is with constants—every letter in a constant should be in upper case, with spaces indicated by underscores. Finally, Java places heavy emphasis on descriptive and concise variable names, erring on the side of descriptive.
// This is good
int age = 22;
// These are good (constants are written in upper case and declared final)
final int NUM_ATTENDEES = 87;
final boolean ATTENDED = true;
// This is bad
float $pi = 3.14f;
In general, there are three guidelines we should follow when creating variable names:
- The variable name should describe the data stored in the variable.
- Our code is read more times than it is written (whether by ourselves or others). Our priority should be "easy to understand," not "easy to write."
- If there are standard conventions, follow them; else, create a name and be consistent.
Additionally, like any other programming language, there are certain words we cannot use as names in Java. These are called reserved words:
abstract, assert
boolean, break, byte
case, catch, char, class, const, continue
default, do, double
else, extends
false, final, finally, float, for
goto
if, implements, import, instanceof, int, interface
long
native, new, null
package, private, protected, public
return
short, static, strictfp, super, switch, synchronized
this, throw, throws, transient, true, try
void, volatile, while
Library Methods & APIs
In the examples above, we wrote the following:
System.out.println(/* some expression or value */)
This is a library method—a method provided by the Java library. There are numerous library methods in Java; methods for mathematics, printing, input and output, etc. Because of how many library methods there are, we will not list and explain them all at once. Instead, we will use and elaborate on them as needed.
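For instance, the Math class bundles many common numeric library methods. A brief sketch:
public class LibraryMethods {
    public static void main(String[] args) {
        System.out.println(Math.max(3, 7));   // 7
        System.out.println(Math.abs(-4));     // 4
        System.out.println(Math.sqrt(144.0)); // 12.0
    }
}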
Footnotes
-
A data type is a finite set of values and the operations on those values. For example, the data type
int
consists of values: the integers from -2,147,483,648 to 2,147,483,647. And it consists of operations: addition, subtraction, multiplication, division, comparison, etc. ↩ -
At the other end of the spectrum are the dynamically-typed languages like JavaScript; in these languages, a variable can store an integer, then later store a Boolean, then later store a string. Furthermore, type checking is done at runtime—dynamic type checking. ↩
-
The number
0001
is a hexadecimal number. ↩ -
The name "Unicode" is a morphological blending of unique, unified, universal, and encoding. ↩