Assemblers
An assembler is a program that translates assembly code into machine code. That is, it goes from this:
@i
M=1
@sum
M=0
@i
@R0
to something like this:
0101010001001001001
0111110011001101001
1101010011001001001
0101010001001011001
1101010011001001001
0101010001001011001
Assemblers are actually fairly simple programs. The basic logic is as follows:
- While the end-of-file (EOF) has not been reached:
  - Read the next assembly language command.
  - Break the command into its component fields.
  - Look up the binary code for each field.
  - Combine the codes into a single machine language command.
  - Output the machine language command.
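The loop above can be sketched in Python roughly as follows. This is only an illustration: the function name `assemble`, the two helper parameters, and the toy mnemonics `INC`/`DEC` with their codes are invented for the example and do not belong to any real instruction set.

```python
import io

def assemble(source, split_fields, lookup):
    """Translate every command read from `source` into a string of bits."""
    output = []
    for line in source:                      # iteration stops at end-of-file
        command = line.strip()
        if not command:
            continue                         # ignore blank lines
        fields = split_fields(command)       # break the command into fields
        codes = [lookup(f) for f in fields]  # look up the code for each field
        output.append("".join(codes))        # combine into one command
    return output                            # one bit string per command

# Hypothetical two-instruction language, purely for demonstration.
toy_codes = {"INC": "01", "DEC": "10", "A": "0", "B": "1"}
program = io.StringIO("INC A\nDEC B\n")
print(assemble(program, str.split, toy_codes.__getitem__))  # ['010', '101']
```

Passing the splitting and lookup steps in as functions keeps the loop itself independent of any particular instruction set.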
Let's illustrate with an example. Suppose we have the assembly language command:
Load R1, 18
This tells the computer to load the value 18 into register 1. If we look at this command closely, it's just a sequence of characters: the letters L, o, a, d, then a space, R, 1, a comma, another space, and the digits 1 and 8.
When this command is sent to the assembler, it reads the command and breaks it down into its component parts, stripping white space and commas in the process. That leaves three parts: Load, R1, and 18.
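One way to perform this splitting step is with a regular expression that treats any run of white space or commas as a separator. The helper name `split_fields` is made up for this sketch; a real assembler may tokenize quite differently.

```python
import re

def split_fields(command):
    """Split a command on runs of white space and/or commas."""
    pieces = re.split(r"[\s,]+", command.strip())
    return [piece for piece in pieces if piece]  # drop any empty pieces

print(split_fields("Load R1, 18"))  # ['Load', 'R1', '18']
```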
Once the assembler has the component parts, it must translate each of them into machine language, that is, into meaningful sequences of ones and zeroes. To do so, it looks up the binary code for each part in the computer's opcode table:
| Field | Binary code |
|---|---|
| Load | 11011 |
| R1 | 01 |
| 18 | 000010010 |
Some of the fields identify registers, in which case their codes might be shorter than others. Other fields are plain numbers, so the direct binary conversion is used. After translating each component, the assembler puts all of the translated pieces together into a single machine language command.
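The lookup and combination for `Load R1, 18` might be sketched like this. The two dictionaries hold only the entries from the table above, and the 9-bit number width is inferred from the table's entry for 18. The names `OPCODES`, `REGISTERS`, `NUMBER_BITS`, and `encode_field` are assumptions made for the example.

```python
OPCODES = {"Load": "11011"}   # instruction codes, from the table above
REGISTERS = {"R1": "01"}      # register codes, from the table above
NUMBER_BITS = 9               # assumed width, matching 000010010 for 18

def encode_field(field):
    """Return the bit string for one field of a command."""
    if field in OPCODES:
        return OPCODES[field]
    if field in REGISTERS:
        return REGISTERS[field]
    if field.isdecimal():
        value = int(field)
        if value >= 2 ** NUMBER_BITS:
            raise ValueError(f"{value} does not fit in {NUMBER_BITS} bits")
        return format(value, f"0{NUMBER_BITS}b")  # direct binary conversion
    raise ValueError(f"unknown field: {field!r}")

fields = ["Load", "R1", "18"]
word = "".join(encode_field(f) for f in fields)
print(word)  # 1101101000010010
```

The three codes (5, 2, and 9 bits) join into a single 16-bit command.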
What happens after the translation stage depends on the computer's architecture. Some computers require the assembler to add padding to the sequence to prevent possible gaps in memory, which keeps memory addressing accurate and safe. For other computers, the assembler's job ends here: once it has translated all the commands, it outputs the machine language to a file that the computer's hardware can take as input.
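As a rough illustration of the padding case, the sketch below fills each translated command with trailing zeros up to a fixed word size. The 32-bit width, the choice of trailing zeros, and the name `pad_to_word` are all assumptions for the example; the actual padding rule depends entirely on the target architecture.

```python
WORD_BITS = 32  # assumed word size; real machines differ

def pad_to_word(bits, word_bits=WORD_BITS):
    """Pad a bit string with trailing zeros so it fills one whole word."""
    if len(bits) > word_bits:
        raise ValueError("command is longer than one word")
    return bits.ljust(word_bits, "0")

padded = pad_to_word("1101101000010010")
print(padded)
print(len(padded))  # 32
```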