Krzysztof Hrynczenko's Dev Diary

My little place where I write about things that interest me.


32-bit ARM Assembly Basics: Part II - Segments, Memory, Stack, Heap

Posted on March 31, 2021

Introduction

Let's jump right into more details of 32-bit ARM assembly. In this post we will look at how memory is laid out and how it is supposed to be used, what some of the remaining registers are for, and what the stack is and how to use it.

Segments

If you look at the hello world example from part I of this series, you might remember two directives that we used there: .data and .text. You might wonder what they mean.

The .data directive marks the start of the data segment of our program. It is a segment in memory that we are allowed to read from and write to, but not execute.

The .text directive marks the start of the code segment. It is a segment in memory that we are allowed to read from and execute, but not write to.

The division between these two segments exists for security reasons, such as preventing injected data from being executed as code.
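To make the difference concrete, here is a minimal sketch of a program that keeps a variable in the data segment and modifies it from code in the text segment. The label counter and its initial value are made up for this illustration, and comments use @, the GNU assembler's comment character for ARM.

```asm
.data
counter: .word 41            @ hypothetical variable, lives in the writable data segment

.text
.global main
main:
    ldr r1, =counter         @ r1 = address of counter
    ldr r0, [r1]             @ reading from .data is allowed
    add r0, r0, #1           @ r0 = 42
    str r0, [r1]             @ writing to .data is allowed too
    bx  lr                   @ return from main, r0 becomes the exit status
```

A similar str aimed at a label inside .text would normally fault at run time, because that segment is mapped without write permission.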

Memory

When thinking about how memory is laid out in our program, we can think of one contiguous array. Byte after byte, it is filled with the data and instructions that make up our program. We can access this data, but there is a catch: when we load or store a whole word (four bytes), the word should be aligned. This means that it should sit at an address that is divisible by four. Some ARM cores can be configured to tolerate unaligned word accesses, but it is safest to assume that they are not allowed.

0x00 <--- We can access this 
0x01
0x02
0x03
0x04 <--- We can access this 
0x05
0x06
0x07
0x08 <--- We can access this 

We must remember that a word is addressed by the address of its first byte. If the address of that first byte is not divisible by four, the word is not aligned and we should not load or store it as a whole word.
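As a small sketch of how alignment shows up in practice, the .balign directive pads the data segment so that the next item starts at an aligned address. The labels flags and value are invented for this example. It also shows that single bytes, unlike words, can be loaded from any address with ldrb.

```asm
.data
flags:  .byte 1, 2, 3        @ three single bytes, so the next free address is not a multiple of four
        .balign 4            @ pad up to the next address divisible by four
value:  .word 0x12345678     @ this word now starts at an aligned address

.text
.global main
main:
    ldr  r1, =value          @ r1 = address of value (divisible by four)
    ldr  r0, [r1]            @ aligned word load
    ldr  r1, =flags          @ r1 = address of the first byte
    ldrb r0, [r1, #1]        @ byte load from an odd address, r0 = 2
    bx   lr
```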

As mentioned in the section about segments, we already know of two places where data is stored: the data and code segments. But there are other places where we can store data.

The (Call) Stack

The call stack should not be thought of as a data structure. More precisely it is just another part of memory, but it has a special purpose and should be used in a certain way. How it is used is also related to the calling convention which we will discuss in the next post.

So what do we use the call stack for? We use it to record the function call history and to store values, local variables among them, that do not fit into the registers.

The interesting thing is that the stack grows downwards, i.e., towards the zero memory address. So when we push onto the stack we subtract from the original memory address. When we pop, we add to this address.

The essential part of working with the stack is the sp register known as the stack pointer. It should always point to the top of the stack (or the bottom depending on how you look at it as it grows to zero).

So how do we put something on the stack? First, we must allocate memory for new data. We do it by subtracting from the stack pointer.

sub sp, sp, #4

We have just allocated space for four bytes (machine word) on the stack. Now we need to write data on the stack.

str r0, [sp]

That is it. We just allocated new data on the stack. In order to pop from the stack, we do the reverse. We first get the value we want to read.

ldr r0, [sp]

After which we deallocate the memory by adding to the stack pointer.

add sp, sp, #4

That's it, but it is quite cumbersome to add and subtract all by hand, isn't it? Imagine we would like to push several values onto the stack. That is why we have shortcuts that do exactly the same things: the push and pop instructions.

push {r0, r1}
pop {r2, r3}

This is equivalent to the following.

sub sp, sp, #8
str r0, [sp]
str r1, [sp, #4]

ldr r2, [sp]
ldr r3, [sp, #4]
add sp, sp, #8

This example resulted in copying values from registers r0 and r1 to registers r2 and r3 respectively. Of course, this is just to showcase how the push and pop instructions work. Copying data between registers this way is inefficient.
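Because the stack is just memory, we do not have to pop a value to read it. The fragment below is a small sketch of that idea: it pushes two registers, peeks at one of the stored words through an offset from sp, and then drops both words without loading them anywhere.

```asm
    push {r0, r1}            @ [sp] = r0, [sp, #4] = r1
    ldr  r2, [sp, #4]        @ peek at the value that came from r1, sp is unchanged
    add  sp, sp, #8          @ discard both words without reading them
```

This is how values kept on the stack are usually read: as offsets from sp, while sp itself only moves when space is allocated or released.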

An interesting detail is that the order in which we list the registers in push and pop does not matter. Lower-numbered registers always go to lower memory addresses. So both instructions below have identical results, although for me the assembler that gcc uses complains unless the registers are written in ascending order.

push {r0, r1, r2, r3}
push {r3, r2, r1, r0}

That means that when we pop, the lowest-numbered register in the list receives the value that came from the lowest-numbered register pushed before, and so on upwards.

push {r1, r2}
pop {r3, r4}  // r3 = r1, r4 = r2

Stack alignment

You probably remember that words in memory should be aligned to four bytes? Well, when it comes to the stack there is an even stricter restriction: the calling convention requires the stack pointer to be aligned to eight bytes whenever we call another function. So if we push only a single word onto the stack, it becomes misaligned. That is why in many cases when we need to push only a single word, we push it together with the value of some dummy register like ip.

push {r1, ip}

That is the reason why we pushed lr together with ip at the beginning of our main function in the original example.

.text
    .global main
    main:
        push {ip, lr}
    ...

We want to push lr onto the stack because it holds the address we have to return to, and calling another function like printf with bl overwrites lr, so we would lose this value (more on that in the next post). We don't need the ip value on the stack, but we push it anyway just to keep things aligned.
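Put together, a function that calls printf could look like the sketch below. It is not necessarily identical to the program from part I; the label msg and its text are made up here. Popping the saved lr value straight into pc both restores the stack and returns from main.

```asm
.data
msg: .asciz "Hello from the stack example\n"   @ hypothetical message

.text
.global main
main:
    push {ip, lr}            @ two words: sp stays aligned to eight bytes, lr is saved
    ldr  r0, =msg            @ first argument: address of the string
    bl   printf              @ bl overwrites lr with the address of the next instruction
    mov  r0, #0              @ return value of main
    pop  {ip, pc}            @ restore sp and jump to the saved return address
```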

Heap

The heap is another region of memory, also called dynamic memory. We use it by requesting allocation or deallocation at run time. If we are using libc, these requests correspond to the malloc and free functions, which ask the operating system for more memory through system calls when they need to. The malloc function takes the number of bytes to allocate in r0 and returns a pointer to the allocated memory, also in r0. If the allocation fails, the returned pointer is zero.

mov r0, #4
bl malloc
// r0 will now have the pointer to the allocated part of memory of four bytes
// length

The free function does not return anything. It only takes a pointer from the r0 register and deallocates the memory at that address.

// assuming r0 still has the pointer
bl free
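The two fragments above can be combined into one small program. This is only a sketch under a few assumptions: the pointer is kept in r4 because r0 is not guaranteed to survive further function calls (the reasons belong to the calling convention), the stored value 42 is arbitrary, and the label fail is invented for this example.

```asm
.text
.global main
main:
    push {r4, lr}            @ save r4 and lr, two words keep sp aligned to eight bytes
    mov  r0, #4              @ ask for four bytes
    bl   malloc              @ r0 = pointer to the new block, or zero on failure
    cmp  r0, #0
    beq  fail                @ never write through a null pointer
    mov  r4, r0              @ keep the pointer in r4
    mov  r1, #42
    str  r1, [r4]            @ write a word into heap memory
    ldr  r1, [r4]            @ and read it back
    mov  r0, r4              @ free expects the pointer in r0
    bl   free
    mov  r0, #0              @ exit status 0: success
    pop  {r4, pc}
fail:
    mov  r0, #1              @ exit status 1: allocation failed
    pop  {r4, pc}
```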

What's left

Now that we understand basic instructions, the different memory segments, and how to use the stack and the heap, we can dive into functions and the calling convention. It will be the last step needed to fully understand the original hello world example. This will be explained in part III.