Language Overview: C
These notes provide an overview of the C language. C is a mid-level language (as opposed to high-level languages like Python or JavaScript and low-level languages like x86-64 Assembly) geared toward procedural programming. We can think of it as human-friendly Assembly.
Nodding to tradition, here is our introductory C program:
#include <stdio.h>
int main() {
// Prints hello world
printf("Hello, world\n");
return 0;
}
The program has the following parts:
- `#include <stdio.h>` is a preprocessor directive. This instruction is similar to the import statement found in languages like Python and JavaScript: before compilation, the preprocessor copies the contents of the named file into our program. `stdio.h` is C's standard input/output library, a package for handling input and output to and from the console and files. The `.h` extension indicates that we're including a header file, which is a file containing declarations for various functions or variables (essentially, an API).
- The entire program is the definition of a function called `main()`. The `main()` function is a special function in C: it serves as the program's main entry point, the first place the computer looks when it begins executing our instructions.
- Every function in C has a return type. In this case, `main()` returns an `int`, and the statement `return 0;` hands that value back to the operating system, where 0 conventionally means the program succeeded. All functions in C, with the exception of `void` functions (more on those later), return some value.
Basic Operators
C provides the typical primitive operators we would expect from a programming language: addition, subtraction, multiplication, division, and the remainder operation.
#include <stdio.h>
int main() {
// addition
int sum = 12 + 10;
printf("12 + 10 = %i\n", sum);
// subtraction
int difference = 12 - 10;
printf("12 - 10 = %i\n", difference);
// multiplication
int product = 12 * 12;
printf("12 * 12 = %i\n", product);
// division
int quotient = 20 / 10;
printf("20 / 10 = %i\n", quotient);
// remainder operation
int remainder = 15 % 2;
// %% prints a literal % sign inside a format string
printf("15 %% 2 = %i\n", remainder);
return 0;
}
12 + 10 = 22
12 - 10 = 2
12 * 12 = 144
20 / 10 = 2
15 % 2 = 1
Because C is a relatively old language, its set of built-in operators is fairly small (there is no exponentiation operator, for example). The most important operators are presented in the table below.
| Symbol | Operation |
|---|---|
| `+` | Add |
| `-` | Subtract |
| `*` | Multiply |
| `/` | Divide |
| `%` | Remainder |
| `++` | Increment |
| `--` | Decrement |
| `==` | Equal |
| `!=` | Not equal |
| `>` | Greater than |
| `<` | Less than |
| `>=` | Greater than or equal to |
| `<=` | Less than or equal to |
| `&&` | Logical and |
| `\|\|` | Logical or |
| `!` | Logical not |
| `? :` | Conditional (ternary) |
| `&` | Bitwise and |
| `\|` | Bitwise or |
| `^` | Bitwise xor |
| `~` | Bitwise one's complement |
| `<<` | Bitwise shift left |
| `>>` | Bitwise shift right |
| `sizeof` | Get the size of a type or variable, in bytes |
| `A[i]` | Array subscript, where `A` is an array and `i` is an index |
| `&` | The address of (unary, e.g. `&x`) |
| `*` | The value of, i.e. dereference (unary, e.g. `*p`) |
| `p->x` | Structure dereference, where `p` is a pointer to a structure and `x` is a member |
| `S.x` | Structure reference, where `S` is a structure and `x` is a member |
| `=` | Assign |
| `+=` | Assign plus-equal |
| `-=` | Assign minus-equal |
| `*=` | Assign multiply-equal |
| `/=` | Assign divide-equal |
| `%=` | Assign remainder-equal |
| `<<=` | Assign shift-left-equal |
| `>>=` | Assign shift-right-equal |
| `&=` | Assign and-equal |
| `^=` | Assign xor-equal |
| `\|=` | Assign or-equal |
Variables
In the ancient Greek saga The Twelve Labors, the Greek demigod Hercules travels through the far reaches of the Greco-Roman world, completing various tasks. As remarkable and brave as Hercules is, he does not complete these tasks alone. Throughout the saga, he obtains information and assistance from various persons. In the first task, killing the Nemean lion, a boy provides Hercules some data: if Hercules slew the Nemean lion and returned alive within 30 days, the town would sacrifice a lion to Zeus, but if he did not, the boy would sacrifice himself. In the eleventh task, when Hercules must steal three golden apples from the Garden of the Hesperides, Hercules abducts the Old Man of the Sea and compels him to reveal the location of the garden.
Variables in C are akin to the minor characters of the Herculean saga. They hold information, and that information can be one of two things: (1) a literal value, or (2) an address. The C programmer, however, is much more powerful than Hercules: she can create these lesser characters at her bidding. This is done through variable declaration and assignment, which can both be done simultaneously in a process called variable initialization:
#include <stdio.h>
int main() {
int x; // variable declaration
x = 2; // variable assignment
int y = 3; // variable initialization
return 0;
}
To initialize a variable in C, we employ the following syntax:
t n = val;
where `t` is a data type, `n` is the variable's name, and `val` is the data we assign to `n`.
Data Types
C is a statically and explicitly typed language. In the context of variables, this means we must explicitly state what type of data a particular variable will hold. C has several primitive data types (data types built into the language). We provide an explicit data type so the compiler knows how much memory to allocate for the data. How much memory a data type takes, however, depends on the compiler or, more generally, on the system architecture (e.g., a 32-bit vs. a 64-bit compiler). The C standard guarantees only minimum ranges for each type.
When examining the table below, it's helpful to recall the units of computer memory. One byte (denoted B) is made of eight bits (denoted b), and a single bit is one of two values: $0$ or $1$. An unsigned 8-bit variable can therefore take on values between $0$ and $2^8 - 1 = 255$. A signed 8-bit variable can take on values between $-2^7 = -128$ and $2^7 - 1 = 127$. Thus, when a variable is signed, half of its $2^8 = 256$ possible values lie below zero and the other half lie at or above zero.
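| Type | Stores | Size | Value range |
|---|---|---|---|
| `char` | Single textual characters | 1B | -128 to 127 or 0 to 255 (the compiler decides whether plain `char` is signed) |
| `unsigned char` | Single textual characters | 1B | 0 to 255 |
| `signed char` | Single textual characters | 1B | -128 to 127 |
| `int` | Integers | 2B on a 16-bit compiler, 4B on 32- and 64-bit compilers | 2B: -32,768 to 32,767; 4B: -2,147,483,648 to 2,147,483,647 |
| `unsigned int` | Non-negative integers | Same as `int` | 2B: 0 to 65,535; 4B: 0 to 4,294,967,295 |
| `short` | Short integers | 2B | -32,768 to 32,767 |
| `unsigned short` | Short non-negative integers | 2B | 0 to 65,535 |
| `long` | Long integers | 4B on a 32-bit compiler; 8B on most 64-bit compilers (64-bit Windows keeps 4B) | 4B: -2,147,483,648 to 2,147,483,647; 8B: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
| `unsigned long` | Long non-negative integers | Same as `long` | 4B: 0 to 4,294,967,295; 8B: 0 to 18,446,744,073,709,551,615 |
| `float` | Floating-point numbers (numbers with a decimal point) | 4B | Magnitudes from about 1.2E-38 to 3.4E+38 (6 significant digits) |
| `double` | Double-precision floating-point numbers | 8B | Magnitudes from about 2.2E-308 to 1.8E+308 (15 significant digits) |
| `long double` | Extended-precision floating-point numbers | Platform-dependent: 10B (80 bits) of data on x86, usually padded to 12B or 16B | On x86, magnitudes from about 3.4E-4932 to 1.2E+4932 (18 significant digits) |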
Of note, C historically had no Boolean type (C99 added `_Bool`, and C23 makes `bool`, `true`, and `false` keywords). Instead, any nonzero value is treated as true, and 0 is treated as false.
Because of this approach, the result of applying a relational or logical operator is always an `int` equal to 0 (false) or 1 (true). We can, however, include the header file `<stdbool.h>`, which provides the type `bool` along with the constants `true` and `false`.
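#include <stdbool.h>
#include <stdio.h>
int main() {
bool x = true;
bool y = false;
// bool values print as integers: true is 1, false is 0
printf("x = %d, y = %d\n", x, y);
return 0;
}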
Machine Size Test
We can determine the memory allocations for a given machine by running the code below:
#include <stdio.h>
int main(int argc, char *argv[]) {
// sizeof produces a size_t, whose printf conversion specifier is %zu
printf("a char is %zu bytes\n", sizeof(char));
printf("an int is %zu bytes\n", sizeof(int));
printf("a float is %zu bytes\n", sizeof(float));
printf("a double is %zu bytes\n", sizeof(double));
printf("a short int is %zu bytes\n", sizeof(short int));
printf("a long int is %zu bytes\n", sizeof(long int));
printf("a long double is %zu bytes\n", sizeof(long double));
return 0;
}
a char is 1 bytes
an int is 4 bytes
a float is 4 bytes
a double is 8 bytes
a short int is 2 bytes
a long int is 8 bytes
a long double is 16 bytes
Binary & Hexadecimal
As we know, there are three number systems we use in computing:
- decimal
- binary
- hexadecimal
In decimal, we use the number $10$ as a base, corresponding to ten digits: $0, 1, 2, 3, 4, 5, 6, 7, 8, 9$.
In binary, we use the number $2$ as a base, corresponding to two digits for representation: $0$ and $1$.
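Both systems are positional. Each digit is multiplied by the base raised to the power of its place, counting from zero at the rightmost digit. As a worked example with arbitrarily chosen numbers:

$$
407_{10} = 4 \cdot 10^2 + 0 \cdot 10^1 + 7 \cdot 10^0
$$

$$
1011_2 = 1 \cdot 2^3 + 0 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0 = 8 + 0 + 2 + 1 = 11_{10}
$$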
For computers, each binary place is called a bit. There are $8$ bits in a byte. Binary constants were not part of standard C until C23, but GNU C, Clang, and other popular compilers have long allowed us to denote binary numbers with the `0b` or `0B` prefix:
#include <stdio.h>
int main() {
int x = 0b01;
int y = 0b10;
int z = x + y;
printf("%d + %d = %d \n", x, y, z);
return 0;
}
1 + 2 = 3
Because we're limited to two digits, binary numbers can quickly grow too long for practical use. This is worsened by the fact that the numbers we use to interact with hardware on modern computers (memory addresses, for example) are very large. Accordingly, we have an even more concise way of expressing numbers: hexadecimal. In hex, we use a base of $16$, with the digits $0$ through $9$ followed by the letters A through F (standing for $10$ through $15$).
In terms of computer memory, $2$ hex digits make one byte (8 bits), since $16 = 2^4$. A single hex digit ($4$ bits, or half a byte) is called a nibble.¹ Unlike binary constants, hexadecimal constants are part of standard C. We write them by prepending the prefix `0x` (or `0X`):
#include <stdio.h>
int main() {
int x = 0xA57;
int y = 0xB85;
int z = x + y;
printf("%d + %d = %d \n", x, y, z);
return 0;
}
2647 + 2949 = 5596
In some assembly languages, such as 6502 assembly, hex numbers are indicated by a `$` prefix (e.g., `$FF`).
Casting
Consider the following code:
#include <stdio.h>
int main() {
double n = 3;
double m = 2;
int result = n / m;
printf("result = %d\n", result);
return 0;
}
We would expect the value of `result` to be 1.5. But notice the output:
result = 1
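In C, an arithmetic operation on two operands of the same type produces a result of that type (operands narrower than `int`, such as `char` and `short`, are first promoted to `int`). An `int` times an `int` is an `int`, and a `float` divided by a `float` is a `float`. Here, `n / m` divides two `double` values, so it evaluates to the `double` 1.5. However, if we assign a value of type `double` to a variable of type `int`, the `double` value is implicitly converted to an `int`, which discards the fractional part. This is what we're seeing above: 1.5 becomes 1. To get back 1.5, we must ensure `result` is of type `double`: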
#include <stdio.h>
int main() {
double n = 3;
double m = 2;
double result = n / m;
printf("result = %f\n", result);
return 0;
}
result = 1.500000
Loops
Loops in C look very much like loops in other languages. Here is the for-loop:
#include <stdio.h>
int main() {
int sum = 0;
for (int i = 0; i < 5; i++) {
sum += i;
}
printf("Sum from 0 to 4 = %i\n", sum);
return 0;
}
Sum from 0 to 4 = 10
And here is the while-loop:
#include <stdio.h>
int main() {
int sum = 0;
int count = 0;
while (count < 5) {
sum += count;
count++;
}
printf("Sum from 0 to 4 = %i\n", sum);
return 0;
}
Sum from 0 to 4 = 10
Symbolic Constants
When writing programs, we want to avoid magic numbers: unexplained numeric literals scattered through the code. This is particularly true for C programs, since C doesn't afford us the same level of safety as languages like Java; remember, C sits much closer to the machine. Magic numbers reduce readability, and readability is paramount in C programming. C has far less syntactic sugar and fewer idioms than other languages, so C programs tend to be longer and more verbose, and the longer the program, the more valuable readability becomes.
Suppose we want to sum the ages 12 through 30. We could use the same summing procedure above:
#include <stdio.h>
int main() {
int sum = 0;
for (int i = 12; i <= 30; i++) {
sum += i;
}
printf("age sum = %i\n", sum);
return 0;
}
age sum = 399
The problem, however, is that the code isn't very readable. Why 12? Why 30? To make it more readable, we want to use symbolic constants:
#include <stdio.h>
#define YOUNGEST_AGE 12
#define OLDEST_AGE 30
int main() {
int sum_of_ages = 0;
for (int age = YOUNGEST_AGE; age <= OLDEST_AGE; age++) {
sum_of_ages += age;
}
printf("age sum = %i\n", sum_of_ages);
return 0;
}
age sum = 399
Symbolic constants are essentially names for specified values. Before compilation proper begins, the C preprocessor replaces every occurrence of a symbolic constant with the text we've specified. In this case, `YOUNGEST_AGE` is replaced with `12`, and `OLDEST_AGE` is replaced with `30`.
Streams
C provides constructs for working with streams of data, such as user input and file I/O. We present each in turn.
User Input
Obtaining input from a user means reading data that the user supplies while the program runs, typically typed at the console. For example, suppose we wanted to write a program that averages two grades. We could do so with the following:
#include <stdio.h>
int main() {
int grade1 = 90;
int grade2 = 85;
int average = (grade1 + grade2) / 2;
printf("average: %d\n", average);
return 0;
}
The program above works fine, but it isn't general: it only works for one specific set of inputs. A more general program would work for any set of inputs. One way to implement such a program is to read the data from the user with `scanf()`:
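#include <stdio.h>
int main() {
int grade1;
int grade2;
// &grade1 passes the address of grade1, so scanf can store the value there.
// scanf returns the number of items it successfully read.
if (scanf("%d", &grade1) != 1 || scanf("%d", &grade2) != 1) {
printf("error: expected two integer grades\n");
return 1;
}
int average = (grade1 + grade2) / 2;
printf("average: %d\n", average);
return 0;
}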
Footnotes
1. Also spelled "nybble."