Collections

Below, we investigate the basic constructs provided by C++ for user-defined data types: enums, classes, structs, and many others. Understanding these constructs is critical, as they are the basic building blocks of object-oriented programming.

Arrays

Arrays are the simplest data structure. Physically, arrays occupy a contiguous region of memory in the computer. We can initialize an array in C++ with the following:

int main() {
	int x[5] = { 1, 2, 3, 4, 5 };
	cout << x << endl;
	return 0;
}
0x7ffeeb1462d0

The line above declares a variable arr, which will store an int array of size 10. In other words, an array that can only take 10 values of type int.

The output we see above is a memory address. Specifically, it's the memory address for where the array begins. This evidences the fact that an array, physically, is a contiguous region in memory.

We can access individual elements in the array with square-bracket syntax:

#include <iostream>
using namespace std;


int main() {
	int x[5] = { 1, 2, 3, 4, 5 };
	cout << x[0] << endl;
	cout << x[1] << endl;
	cout << x[2] << endl;
	cout << x[3] << endl;
	cout << x[4] << endl;
	return 0;
}
1
2
3
4
5

Notice that like most other languages, C++ indexes its arrays starting from 0. This means that if we start a loop at 0 and use i{i} as a counter, then the last element in the array is n1,{n-1,} where n{n} is the size of the array. E.g., the size of the array above is 5, but the last element's index is 4. This is point where many beginning programmers make mistakes, most commonly seen in a fence post problem — going one unit beyond the limit. This can lead to frustrating bugs like an out-of-bounds error:

#include <iostream>
using namespace std;


int main() {
	int x[5] = { 1, 2, 3, 4, 5 };
	cout << x[5] << endl;
	return 0;
}
main.cpp:7:10: warning: array index 5 is past the end of the array (which contains 5 elements) [-Warray-bounds]
		cout << x[5] << endl;
						^ ~
main.cpp:6:2: note: array 'x' declared here
				int x[5] = { 1, 2, 3, 4, 5 };
				^
1 warning generated.

Let's print out all the elements in the array:

#include <iostream>
using namespace std;


int main() {
	int x[5] = { 1, 2, 3, 4, 5 };
	size_t LengthOfx = sizeof x / sizeof x[0];
	for (int i = 0; i < LengthOfx; i++) {
		cout << x[i] << endl;
	}
	return 0;
}
1
2
3
4
5

Above, we used the sizeof operator to obtain the array's length. The sizeof operator is a built-in operator that returns the number of bytes allocated to a particular variable. Thus, when we call sizeof x, we are asking for the number of bytes for the entire array. To obtain the actual length of the array, we need to divide that number by the number of bytes for each element in the array.

Alternatively, starting with C++11, we can use a range-based loop:1

#include <iostream>
using namespace std;

int main() { int x[5] = { 1, 2, 3, 4, 5 }; cout << "[ "; for (auto i : x)
cout << i << ", "; cout << "]" << endl; return 0; }

[ 1, 2, 3, 4, 5, ]

Note that the code above was compiled with the following make file:

CXX=g++
CXXFLAGS=-g -Wall -MMD -std=c++20
clean:
	$(RM) *.o *.d output

The auto keyword is used to infer the type of an array. We could've used int just as well.

Grids; Matrices

A grid, or matrix, is an abstraction of a multidimensional array. A multidimensional array is simply an array whose elements are themselves arrays. For example, a 2-dimensional array is an array whose elements arrays. A 3-dimensional array is an array whose elements are 2-dimensional arrays. And a 4-dimensional array is an array whose elements are 3-dimensional array arrays. We can visualize these multi-dimensional arrays as such:

A multidimensional array, visualized as a cube.

For example, here is a 2-dimensional array:

#include <iostream>
using namespace std;

int main(int argc, char \*argv[]) {
	int a[3][3] = &lcub;{0, 1, 2}, {3, 4, 5}, {6, 7, 8}&rcub;;

	cout << "i a[i][j]" << endl;
	for (int i = 0; i < 3; i++) {
		for (int j = 0; j < 3; j++) {
			cout << i << " " << a[i][j] << endl;
		}
	}

	return 0;
}

i a[i][j]
0 0
0 1
0 2
1 3
1 4
1 5
2 6
2 7
2 8

Notice how we must use a nested for loop to target each element in the nested array.

Pointers & Grids.

Exercises

Because of how important arrays are in computer science, it's critical that we are comfortable with using them. Below are some exercises.

Given the array: [3, 9, 11, 23, 14, 10, 34, 18, 27], compute the sum of all the elements.

Solution Here is one possible solution:

#include <iostream>
using namespace std;


int main() {
	int arr[] = { 3, 9, 11, 23, 14, 10, 34, 18, 27 };
	int sum = 0;
	for(int i:arr)
		sum += i;
	cout << (int)sum << endl;
	return 0;
}
149

-Given the same array above, return the maximum.

Solution

#include <iostream>
using namespace std;


int main() {
	int arr[] = { 3, 9, 11, 23, 14, 10, 34, 18, 27 };
	size_t LengthOfArr = sizeof(arr) / sizeof(arr[0]);
	int max = arr[0];
	for (int i = 0; i < LengthOfArr; i++) {
		if (arr[i] > max) {
			max = arr[i];
		}
	}
	cout << (int)max << endl;
	return 0;
}
34

-Given the same array, return the minimum.

Solution

#include <iostream>
	using namespace std;


	int main() {
		int arr[] = { 3, 9, 11, 23, 14, 10, 34, 18, 27 };
		size_t LengthOfArr = sizeof(arr) / sizeof(arr[0]);
		int max = arr[0];
		for (int i = 0; i < LengthOfArr; i++) {
			if (arr[i] < max) {
				max = arr[i];
			}
		}
		cout << (int)max << endl;
		return 0;
	}
3

Given the array above, determine (i.e., return true, or 1) if the element 67 is in the array; otherwise false (0).

Solution

#include <iostream>
using namespace std;


int main() {
	int arr[] = { 3, 9, 11, 23, 14, 10, 34, 18, 27 };
	size_t LengthOfArr = sizeof(arr) / sizeof(arr[0]);
	int queryElement = 67;
	bool result = false;
	for (int i = 0; i < LengthOfArr; i++)
	{
		if (arr[i] == queryElement) {
			result = true;
		}
	}
	cout << result << endl;
	return 0;
}
0

The last exercise above is an example of a linear search algorithm. We are searching for a particular element. Given very large arrays, we can suspect that this is a grossly inefficient algorithm. Why? Because the computer must check, one by one, whether the value exists. In this case, the number 67 doesn't exist in the array. The computer had to go through all of the elements to make that determination!

Vectors

As we can likely tell, arrays in C++ are restrictive and somewhat dangerous. They are examples of static lists — an indexed list of elements with a fixed size. We cannot delete elements, nor can we insert elements (at least not without the use of pointers). We can resize them, but it's a long and unwieldy process. Perhaps worst of all, C++ allows us to index out of the array's bounds. Contrast this with a language like Python, which provides dynamic lists — lists that can grow and shrink as needed. For most tasks involving an array-based data structure, that's what we want — a dynamic list.

Through the standard library, C++ provides dynamic lists. Where Python simply calls them a list, C++ opts for vectors. To use vectors in our program, we will have to use a preprocessor: #include <vector>. Example:

#include <iostream>
#include <vector>
using namespace std;

int main() {
	vector<int> arrNums {22, 48, 92, 44};
	cout << arrNums[1] << endl;
	arrNums[1] = 14;
	cout << arrNums[1] << endl;
}
48
14

Notice that we can using the usual indexing methods. More importantly, we can mutate the array. Vectors in C++ also come with a variety of member functions. Some of the most common ones:

Member FunctionDescriptionComment
v.push_back(val)Appends the value val{val} to the end of the vector v.{v.}
v.clear()Removes all elements in the vector v.{v.}
v[i]Returns the value at index i{i}Alternative syntax: v.get(i)
v.insert(i, val)Inserts the element val{val} just before index i{i}Operation shifts the subsequent elements to the right.
v.empty()Returns true if v{v} contains no elements; else false.
v.erase(i)Removes and returns the value at index i.{i.}Operation shifts subsequent elements to the left.
v[i] = valReplaces the element at index i{i} with the value val.{val.}Alternative syntax: v.assign(i, val)
v.size()Returns the number of elements in the vector.

Vectors in C++ are implemented via templates.2 Accordingly, we can create vectors with elements of any established type, so long as we specify it in the angle brackets. E.g., vector<double>, or vector<std::string>.

When should we use a vector?

The vector type represents an indexed, linear, and homogenous collection. Accordingly, we want to use a vector whenever we're dealing with a homogenous list. For example, a list of prices, names, an order of true or false values, or even an ordered list of ordered lists (although, we'll see that we might want to use a better data structure for lists of the lists, the grid).

Vectors are useful because they provide several features: (1) We can obtain the list's size; (2) we can safely access the list (i.e., if we go out of bounds, we'll get an error); (3) storage is automatically handled (i.e., it grows and shrinks without us having to do anything); (4) we can insert and remove; and (5) we can produce deep-copies on assignment, as well as pass- and return-by-value.

Enumerations

Enumerations are what we use to represent a range of discrete values. An example most commonly cited is the range of months in a year: January, February, March, April, May, etc. are all discrete months. There are, however, numerous other examples of enumerations: letter grades, traffic lights, the planets, zodiac signs, days of the week, and many more.

Here's an example of an enumeration (an enum) in C++:

enum class LetterGrade {
	A,
	B,
	C,
	D,
	F
}

Writing the above, we've created a new data type called LetterGrade. This means we can use it as we would base types:

enum class LetterGrade {
	A,
	B,
	C,
	D,
	F
}
LetterGrade johnsGrade = {LetterGrade::A}

Notice the syntax for initialization. We explicitly state the type, as we would any other variable. But in initializing the value, we again state the type alongside double colons ::. Each of A, B, C, D, and F are called enumerators.

Notice what happens when we cast a LetterGrade value into an int:

#include <iostream>

enum class LetterGrade {
	A,
	B,
	C,
	D,
	F
};

int main() {
	LetterGrade johnsGrade{LetterGrade::A};
	std::cout << "John's grade: " << static_cast<int>(johnsGrade) << std::endl;
	return 0;
}
John's grade: 0

In C++, enums are actually a range of integers under the hood. By default, the enumerators start at 0 and increment by 1, all the way up to however many enumerators we've specified. These integers are called raw values. We can change the default behavior by setting an initial raw value:

#include <iostream>

enum class LetterGrade {
	A = 1,
	B,
	C,
	D,
	F
};

int main() {
	LetterGrade johnsGrade{LetterGrade::A};
	std::cout << "John's grade: " << static_cast<int>(johnsGrade) << std::endl;
	return 0;
}

With this modification, we see the following:

#include <iostream>

enum class LetterGrade {
	A = 1,
	B,
	C,
	D,
	F
};

int main() {
	LetterGrade johnsGrade{LetterGrade::A};
	std::cout << "John's grade: " << static_cast<int>(johnsGrade) << std::endl;
	return 0;
}
John's grade: 1

We aren't limited to just positive integers. We can change them to negatives as well:

#include <iostream>

enum class LetterGrade {
	A = -1,
	B,
	C,
	D,
	F
};

int main() {
	LetterGrade johnsGrade{LetterGrade::A};
	LetterGrade bobsGrade{LetterGrade::B};
	LetterGrade iansGrade{LetterGrade::C};
	LetterGrade zanesGrade{LetterGrade::D};
	LetterGrade heathersGrade{LetterGrade::F};
	std::cout << "John's grade: " << static_cast<int>(johnsGrade) << std::endl;
	std::cout << "Bob's grade: " << static_cast<int>(bobsGrade) << std::endl;
	std::cout << "Ian's grade: " << static_cast<int>(iansGrade) << std::endl;
	std::cout << "Zane's grade: " << static_cast<int>(zanesGrade) << std::endl;
	std::cout << "Heather's grade: " << static_cast<int>(heathersGrade) << std::endl;
	return 0;
}
John's grade: -1
Bob's grade: 0
Ian's grade: 1
Zane's grade: 2
Heather's grade: 3

Size of an Enum.

Because enumerators are inherently ints, the size of an enumerator will be the size of an int, as determined by the compiler. Thus, if the compiler employs a size of 4 bytes for int, then each enumerator has a size of 4 bytes.

We can, however, change an enumerator's internal representation. In other words, we aren't limited to int:

enum class ErrorMessages: unsigned char {
	0,
	1,
	2,
	3
};

Above, we tell C++ that each of the values above are represented as unsigned char. This means that we are limited to a numeric range of 0 to 255.

Use Cases

Enums are useful for when we have a discrete set of categories for values. One use case we see in the real world is an application's set of possible options. For example:

enum class UserOptions {
	New,
	Open,
	Save,
	SaveAs,
	Close,
	Export
};

Structs

C++ also supports record types, which we can define with the keyword struct. For example:

struct Coordinate_2D {
	double x;
	double y;
};

defines a new record type called Coordinate_2D. Having defined the record type, we can then use it in our source code:

#include <iostream>
using namespace std;

struct Coordinate_2D {
	double x;
	double y;
};

int main() {
	Coordinate_2D point1;
	point1.x = 1.1;
	point1.y = 2.7;
	cout << point1.x << endl;
	cout << point1.y << endl;
	return 0;
}
1.1
2.7

Type Synonyms

Like many other languages, C++ provides the ability to implement type synonyms (other languages might call these type aliases). For those unfamiliar, a type synonym is simply another name for an existing type. Type synonyms are particularly useful for readability:

typedef int wholeNumber

The type definition above creates another name for int, called wholeNumber. Every line of code thereafter, we can use wholeNumber just as we would int.

#include <iostream>
using namespace std;

typedef unsigned int naturalNumber;

int main() {
	naturalNumber x = 7;
	naturalNumber y = 2;
	naturalNumber z = x + y;
	cout << z << endl;
	return 0;
}
9

Footnotes

  1. The range-based loop is also called a for-each loop.

  2. For more information on templates, See Function Templates.