Assembler programming on the Raspberry Pi
Talk to your Raspberry Pi in its native assembler language.
Assembler programs run directly on the computer’s hardware, which means they can reach nearly the maximum achievable speed of execution. Because assembler program code is very low level, writing the code is more complicated, but it is still the best choice for some tasks, especially on a computer such as the Raspberry Pi with its limited resources. Before you can start creating programs, however, you need to plumb the depths of the CPU and peripheral architecture.
Machine Code
To begin, it makes sense to clarify some terms. The CPU only understands machine code – zeros and ones or, more precisely, voltage levels that represent zeros and ones. Each command in machine code has a human-readable abbreviation that is easy to remember. These abbreviations are known as mnemonics and act as assembler commands. Assembler code is specific to a CPU architecture, which means that code for a Raspberry Pi (ARM) will not run on a PC (x86).
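To make the relationship between a mnemonic and machine code concrete, the following Python sketch assembles by hand the two commands that make up the first example program later in this article. The function names are made up for this illustration; the bit layout follows the classic 32-bit ARM (A32) encoding for a mov with a small immediate value and for bx, with the condition field set to "always."

```python
# Hand-assemble two A32 (32-bit ARM) instructions into 32-bit machine words.
# encode_mov_imm and encode_bx are illustrative helper names, not a real API.

COND_ALWAYS = 0xE  # condition field: execute unconditionally
LR = 14            # the link register is r14


def encode_mov_imm(rd, imm8):
    """Encode 'mov rd, #imm8' for an immediate that fits in 8 bits."""
    if not 0 <= rd <= 15:
        raise ValueError("register number must be 0..15")
    if not 0 <= imm8 <= 255:
        raise ValueError("this sketch only handles unrotated 8-bit immediates")
    # cond | 00 | I=1 | opcode=1101 (MOV) | S=0 | Rn=0000 | Rd | rotate=0 | imm8
    return (COND_ALWAYS << 28) | (1 << 25) | (0b1101 << 21) | (rd << 12) | imm8


def encode_bx(rm):
    """Encode 'bx rm' (branch to the address held in register rm)."""
    if not 0 <= rm <= 15:
        raise ValueError("register number must be 0..15")
    return (COND_ALWAYS << 28) | 0x012FFF10 | rm


print(f"mov r0, #42 -> {encode_mov_imm(0, 42):#010x}")  # 0xe3a0002a
print(f"bx lr       -> {encode_bx(LR):#010x}")          # 0xe12fff1e
```

The value 42 (0x2a) is visible in the lowest byte of the first word, which is all the CPU ever sees of the mov mnemonic.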
Programming in assembler on the Raspberry Pi can be approached in two ways: First, you can create an image in which you package the code and then boot the small-board computer (SBC) from that image to run the program. In other words, you degrade the Raspberry Pi to a microcontroller. With this method, the Pi runs without an operating system. Although you have full access to everything, you don’t even get a shell.
The second way is to run the assembler program on the Raspberry Pi itself, which gives you the luxury of an operating system with everything that entails; however, you are limited in terms of direct access to the hardware. The second method was used for the example in this article.
Setup
A Raspberry Pi 3 with the current Raspberry Pi OS Lite provides the basis for my experiments. I use Raspberry Pi Imager to prepare the SD card, after which I can boot from the card and get started right away, because all the tools needed for coding in assembler are included in the image. That said, an additional action provides more convenience and flexibility (see the “Activating SSH” box).
Activating SSH
To remove the need for an additional monitor and keyboard, I recommend working on the Raspberry Pi over SSH. Getting the service to run correctly on first boot requires some minor intervention. To begin, create an empty /boot/ssh file on the SD card; the SSH daemon will then launch automatically at boot time.
If needed, redirect the output of the X server over SSH from the Raspberry Pi to the desktop PC with the -X option. This works best if you are also using Linux on the desktop computer. If your router supports local name resolution, use the ssh -X pi@raspberrypi.local command to open the connection. All graphical output from programs then ends up on the desktop computer. If the local DNS does not work, use the IP address of the Raspberry Pi, which you can look up in the list of connected devices on the router.
No Hello World
When you start working with a new programming language, the traditional approach is to create a “Hello World” program; however, it takes a fair amount of assembler code and some understanding of strategies to generate even this simple output. Therefore, the first small assembler program does nothing but set the return code, which indicates the status of a program on exiting. Bash stores this value in the $? variable, which you read with echo $?.
A return code of 0 means that the previously executed command ran without error; a value greater than 0 indicates an error. Listing 1 shows the example program 42.s, so named because the return code value 42 is the result. (Note that the title of this article is 42 in binary-coded decimal (BCD) encoding, which represents each decimal digit from 0 to 9 as 4 bits, i.e., half a byte or a nibble, if you prefer.)
Listing 1: 42.s
.global main        /* Entry point for the program */
main:
    mov r0, #42     /* Move value 42 to register r0 */
    bx  lr          /* Return to calling program */
Assembler comprises relatively simple commands that do nothing more than move bytes back and forth, manipulate them, or react to a status bit, which makes it extremely important to document the code thoroughly and to use mnemonic identifiers where possible.
Labels are used in programming languages to mark points in the source code that serve as jump targets. The assembler and linker exchange each label for a memory address at build time, which is the massive advantage of using labels: You do not need to calculate laboriously where in memory a particular command is located. Without labels, every additional command you insert would shift all the addresses below it, and you would have to correct each jump target by hand.
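How labels turn into addresses can be sketched in a few lines of Python. The resolve_labels function and the small sample program are hypothetical, and the sketch assumes that every command occupies 4 bytes (as in 32-bit ARM code) and that the program starts at address 0; a real assembler and linker do considerably more.

```python
def resolve_labels(lines, base_address=0, instruction_size=4):
    """Map each label to the address of the command that follows it."""
    labels = {}
    address = base_address
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if text.endswith(":"):
            labels[text[:-1]] = address   # a label occupies no memory itself
        else:
            address += instruction_size   # each command takes up 4 bytes
    return labels


program = ["main:", "mov r0, #0", "loop:", "add r0, r0, #1",
           "cmp r0, #10", "bne loop", "bx lr"]

print(resolve_labels(program))                               # {'main': 0, 'loop': 4}
# Insert one extra command after main: the address of loop moves on its own.
print(resolve_labels(program[:1] + ["nop"] + program[1:]))   # {'main': 0, 'loop': 8}
```

The bne loop line never needs to change, although the address it jumps to does.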
As in many other programming languages, you need to specify the starting point for a program in assembler. In Java and C the corresponding function is main(); in assembler you define the global label main. The label must precede the first line of code you want to execute, as shown in the assembler program in Listing 1.
The first line defines main globally so that the linker can find it. That label is then used in the second line, immediately followed by the first command, the mov command (short for move), which is used to move values (e.g., to store constant values in a register or to transfer the content of one register to another). To move values from registers into RAM or load them from RAM, you need the str (store register) and ldr (load register) commands instead. A register is a memory location on the CPU. An overview of the registers on the Raspberry Pi is shown in Table 1.
Table 1: Registers on the Raspberry Pi
Register | Mnemonic | Function
r0-r10 | – | General registers without a special function
r11 | fp | Frame pointer register
r12 | ip | General register without a special function
r13 | sp | Stack pointer
r14 | lr | Link register
r15 | pc | Program counter
The program status register acts as the CPU’s internal control register. The states of its individual bits tell you what results specific CPU operations returned, and the commands for conditional jumps react to these bits. One well-known control bit is the zero bit (bit 30), which indicates that the result of the last flag-setting operation in the arithmetic logic unit (ALU), where the CPU performs all calculations and comparisons, was 0.
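The following Python sketch shows how the four condition flags sit in the top bits of the 32-bit status register. The function name and the example register values are made up for illustration; only the bit positions (N = 31, Z = 30, C = 29, V = 28) come from the ARM architecture.

```python
# Bit positions of the four condition flags in the ARM program status register.
FLAG_BITS = {"N": 31, "Z": 30, "C": 29, "V": 28}


def decode_flags(status):
    """Return which condition flags are set in a 32-bit status register value."""
    return {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}


# Hypothetical value with only the zero bit set.
print(decode_flags(1 << 30))
# Hypothetical flag bits after comparing two equal values with cmp:
# the zero bit and the carry bit are set.
print(decode_flags(0x60000000))
```

A conditional jump such as bne (branch if not equal) looks at nothing but the zero bit: It jumps as long as that bit is 0.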
In the example in Listing 1, the mov command stores a value of 42 in the CPU register r0. When the program ends, the operating system reads the value of register r0 and stores it in shell variable $?.
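The return code is not reserved for the shell; any parent process can read it. The following Python sketch, in which a second Python interpreter stands in for the assembler program, shows the parent's side of this exchange. The function name is made up for this example.

```python
import subprocess
import sys


def run_and_get_return_code(exit_value):
    """Start a child process that exits with exit_value and return its status."""
    # The child is a second Python interpreter standing in for the program.
    child = subprocess.run(
        [sys.executable, "-c", f"import sys; sys.exit({exit_value})"]
    )
    return child.returncode


print(run_and_get_return_code(42))   # 42
print(run_and_get_return_code(0))    # 0, the conventional "no error" status
```

On Linux, only the lowest 8 bits of the value reach the parent, so usable return codes range from 0 to 255.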
The bx command in the last line causes the CPU to continue the program at a different memory address, in this case the address found in register lr. This link register holds the return address, that is, the address in the calling code at which execution continues once main has finished.
The only question that now remains is how to generate an executable program from the assembler code. The workflows required to do this are very similar to the process of compiling C programs:
$ as -o 42.o 42.s
$ gcc 42.o
$ ./a.out
$ echo $?
42
The first line creates an object file from the assembler source code. The second line links this object file with the C runtime and the system libraries to obtain an executable file. Unless you specify otherwise, this file is named a.out. You can execute the a.out program as usual. The final command shows that the program returns the value 42.
Flash
Now that you have seen a simple assembler program, it’s time for a more sophisticated project. The good old flash program is a good candidate; it simply flashes a single LED. The positive contact of the LED is connected through a 1-kilohm series resistor to GPIO21 (BCM notation), which is routed out on header pin 40. This is handy because header pin 39 right next door is a GND pin, to which you connect the negative terminal of the LED.
Of course, the program presents several challenges to those used to programming with high-level languages. The GPIOs of the Raspberry Pi support extremely versatile use, which means it is not easy to address them correctly in an assembler program. Forty-one registers, each with a length of 32 bits, control the 54 GPIO pins of the chip installed on the Pi. (Not all of these GPIOs are accessible from the header; some of them are used internally by the Pi.) However, this still leaves you with a huge number of options for controlling the individual GPIOs. A detailed description of how the GPIOs work can be found in the BCM2835 peripherals data sheet (pages 89-109).
Now that you know how to address the GPIOs, another problem raises its head. The Raspberry Pi’s operating system prevents direct access to hardware resources. If you try to access the hardware directly, you will see a Segmentation Fault error message, which indicates that a memory protection violation occurred but gives you no additional clues as to where exactly the error occurred. Fortunately, the Raspberry Pi OS developers have provided a way to access the GPIOs without directly accessing the corresponding physical memory addresses. The operating system offers a driver for a special character file, /dev/mem, that is a mirror of main memory; the related /dev/gpiomem file, which Listing 2 uses, exposes only the GPIO registers. A good description of this can be found on the Sonoma State University website.
The first block of the program in Listing 2 defines various constants, which offers several advantages: On the one hand, it lets you use meaningful names in the program, and on the other hand, it lets you load the registers with 32-bit values.
Listing 2: flash.s
01 .globl main
02 .equ GPIOVAL, 0x200000 // Register value for the GPIO 21(BCM)
03 .equ GPFSEL2, 0x08 // Offset address for setting the GPIO mode
04 .equ GPIO_OUTPUT,0x08 // Define GPIO as an output
05 .equ GPFSET0, 0x1c // Offset register set
06 .equ GPFCLR0, 0x28 // Offset register clear
07 .equ TIME, 0x8000000 // Wait value
08 main:
09 ldr r0,=gpiomem
10 ldr r1,=0x101002 // Open for reading and writing
11 mov r7, #5
12 svc #0
13 mov r4, r0
14 mov r0, #0
15 mov r1, #4096
16 mov r2, #3
17 mov r3, #1
18 mov r5, #0
19 mov r7, #192
20 svc #0
21 // r0 Contains the base address of the mapped GPIO memory area
22 ldr r1, =GPIO_OUTPUT // GPIO21
23 str r1, [r0,#GPFSEL2] // set as output
24 ldr r2, =TIME // Wait in r2
25 ldr r1, =GPIOVAL // Register value in r1 for GPIO21
26 // Infinite loop
27 loop:
28 str r1, [r0,#GPFSET0] // Switch LED on
29 mov r10, #0 // Set r10 to 0
30 wait_on: // Increment r10 to TIME
31 add r10, r10, #1
32 cmp r10, r2
33 bne wait_on
34 str r1, [r0,#GPFCLR0] // Switch LED off
35 mov r10, #0 // Set r10 to 0
36 wait_off: // Increment r10 to TIME
37 add r10, r10, #1
38 cmp r10, r2
39 bne wait_off
40 b loop
41
42 .data
43
44 gpiomem: .asciz "/dev/gpiomem"
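The mov command can only move values up to a certain size directly to registers, which is why the listing loads larger constants with the ldr command and the = prefix. The next large block of instructions (lines 9 to 20) opens the /dev/gpiomem device, maps the GPIO registers into the program’s address space, and leaves the base address of the mapped memory in register r0.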
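The size limit on mov can be made concrete. In the classic 32-bit ARM encoding, an immediate value is an 8-bit number rotated right by an even number of bits, and the following Python sketch checks the constants from Listing 2 against that rule. The function name is made up for this example, and the sketch ignores other options the assembler may have on newer CPUs, such as mvn or movw.

```python
def fits_mov_immediate(value):
    """True if value is an 8-bit number rotated right by an even bit count."""
    value &= 0xFFFFFFFF
    for rotation in range(0, 32, 2):
        # Rotating left by 'rotation' undoes a rotate right by the same amount.
        undone = ((value << rotation) | (value >> (32 - rotation))) & 0xFFFFFFFF
        if undone < 256:
            return True
    return False


# Constants that appear in Listing 2.
for name, constant in [("GPIOVAL", 0x200000), ("open flags", 0x101002),
                       ("TIME", 0x8000000), ("mmap length", 4096)]:
    print(f"{name:12} {constant:#x} fits: {fits_mov_immediate(constant)}")
```

Under this rule, only the value 0x101002 is out of reach for a single mov. Loading constants with ldr and the = prefix works in every case, because the assembler falls back to fetching the value from memory when it does not fit.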
Supervisor calls are used in this block by means of the svc command; put simply, these are something like calls to existing operating system programs. (The “Supervisor Calls” box provides additional information.) For now, the important point is that you can access the GPIO registers through the address in r0.
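The magic numbers in lines 9 to 20 of Listing 2 are easier to read once they are decoded. The following Python sketch models the register contents of the two supervisor calls as plain data; the numeric constants are the values used by 32-bit ARM Linux, copied into the sketch by hand, and the helper names are made up for this example.

```python
# Values as defined for 32-bit ARM Linux, copied here so that the sketch is
# self-contained; on other platforms the numbers can differ.
SYSCALL_NAMES = {5: "open", 192: "mmap2"}
O_RDWR = 0x2                       # open for reading and writing
O_SYNC = 0x101000                  # synchronous writes
PROT_READ, PROT_WRITE = 0x1, 0x2   # mapped pages may be read / written
MAP_SHARED = 0x1                   # writes go through to the device


def describe_open_flags(flags):
    """Split the flags word passed to open into its named parts."""
    names = []
    if flags & 0x3 == O_RDWR:
        names.append("O_RDWR")
    if flags & O_SYNC == O_SYNC:
        names.append("O_SYNC")
    return names


def describe_mmap_prot(prot):
    """Split the protection word passed to mmap2 into its named parts."""
    names = []
    if prot & PROT_READ:
        names.append("PROT_READ")
    if prot & PROT_WRITE:
        names.append("PROT_WRITE")
    return names


# Register contents as set up in Listing 2 before each svc #0.
print(SYSCALL_NAMES[5], describe_open_flags(0x101002))
print(SYSCALL_NAMES[192], describe_mmap_prot(3), "length", 4096, "flags", MAP_SHARED)
```

Read this way, lines 9 to 12 open /dev/gpiomem for synchronous reading and writing, and lines 14 to 20 map 4096 bytes of that file, starting at offset 0, as shared memory that can be read and written, with the file descriptor saved in r4.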
Supervisor Calls
Supervisor calls (syscalls) are functions provided by the operating system to perform certain tasks. Each syscall has its own number with which it is called. On the Raspberry Pi, the numbers with the matching names can be output on the terminal with the cat /usr/include/arm-linux-gnueabihf/asm/unistd-common.h command. You need the svc #0 command to execute a syscall in assembler, but you can also execute syscalls from high-level languages. In assembler, the number of the syscall must be in the r7 register. Depending on the syscall, the registers r0 to r6 contain the associated parameters. The return value of the call always ends up in register r0. You can access the documentation for the individual syscalls in a terminal window by typing: man 2 <name>
The 2 here indicates that you only want to search section 2 of the documentation.
The flash program starts in line 22 by first setting the mode for GPIO21 to output. It then loads the wait value into register r2 (line 24), and line 25 stores the bit combination for switching GPIO21 on and off in register r1.
The GPIO registers work in a fairly simple way, and each of them has a specific task (set GPIO, clear GPIO, enable pullup, etc.). Each single bit in the registers corresponds to a GPIO pin. If you set the bit corresponding to GPIO21 in register GPFCLR0 (register for clearing GPIOs), the GPIO drops to 0, which is why the program loads the combination for GPIO21 into register r1.
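The constants at the top of Listing 2 can be derived from the pin number. The following Python sketch does this under the register layout described in the BCM2835 data sheet: Each function select register holds ten pins with a 3-bit mode field per pin, and the set and clear registers have one bit per pin. The function names are made up for this example.

```python
def gpfsel_offset(pin):
    """Byte offset of the function select register that holds this pin."""
    return 4 * (pin // 10)            # ten pins per 32-bit register


def gpfsel_output_value(pin):
    """Register value that sets this pin's 3-bit mode field to 001 (output)."""
    return 0b001 << (3 * (pin % 10))


def pin_mask(pin):
    """Bit mask for this pin in the first set and clear registers."""
    if not 0 <= pin <= 31:
        raise ValueError("pins 32 to 53 belong to the second set/clear register")
    return 1 << pin


pin = 21
print(hex(gpfsel_offset(pin)))         # 0x8, the GPFSEL2 constant
print(hex(gpfsel_output_value(pin)))   # 0x8, the GPIO_OUTPUT constant
print(hex(pin_mask(pin)))              # 0x200000, the GPIOVAL constant
```

That GPFSEL2 and GPIO_OUTPUT both come out as 0x08 for GPIO21 is a coincidence; for GPIO22, for example, the offset would stay at 0x08, whereas the output value would become 0x40.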
Now all you need to do is alternately store the r1 register in the GPFSET0 and GPFCLR0 GPIO registers (lines 28 and 34). The str command means: Store the contents of r1 at the memory address that results from adding the contents of r0 to the constant GPFSET0 or GPFCLR0, respectively. The two wait loops increment the value in r10 until it reaches the value of TIME (r2).
This procedure of creating a wait is not very smart, because one CPU core is counting continuously. You can use the top command to look at the CPU usage when the program is running. A CPU time-optimized program would use timers and interrupts, but it would be considerably more complicated in that case.
Finally, line 40 contains an unconditional jump command that jumps back to the loop label, thus running the program for all eternity.
Where To Go Next
Now that you are knee-deep in assembler programming, you might want to look into the subject in detail. I would recommend a tutorial, such as the one you can find on the Think in Geek website. It explains the basics from A to Z, with many useful tips.
You can easily enter simple examples, like the ones from this article, in an editor such as Nano. However, if you are working on more complex projects, you will want a more powerful editor to make your life easier.
Several possible ways to upload files to the Raspberry Pi are at your disposal. I mounted the Raspberry Pi over SSH in the Ubuntu file manager. This simple approach usually works fine on a LAN. With more complex projects it doesn’t make much sense to build all the files manually. Even the old-fashioned make will save you time and overhead.
Before you start to implement a function, always have a look at the list of syscalls. In many cases you will find something suitable. When using syscalls, you can assume that they do not contain any errors, which is worth its weight in gold, especially in assembler programming.
Conclusions
In this article I was only able to scratch the surface of assembler programming, and many details remain open. Getting started with this programming language is not difficult, and the individual commands are not complicated. Once you have looked into the CPU architecture, the meaning of assembler code is fairly easy to understand.
The tricky part begins as soon as you start using assembler to solve problems that are typically tackled in high-level languages. Even a small program will quickly grow to a few hundred commands. The advantages are the minimal code footprint and maximum execution speed, if programmed correctly.