Disclaimer: this article is experimental in that it is an LLM-assisted writeup. However, the structure and content are almost all my own, so you don’t need to worry about errors due to AI. (You may worry about errors caused by me instead 😆)


Today we’re going on a short journey through how programs go from your editor to execution. This is going to be a somewhat detailed exploration, so grab a cup of chai and let’s make our mark!

The Illusion of Memory: A Tale of Kernels and Spaces

As of today, ‘memory’ is arguably the most crucial aspect of a computer system, around which effectively everything revolves. As far as the computer hardware is concerned, we refer to it as “physical memory”. But when you as the user do something on your personal computer, the Operating System does its job (i.e., makes everything ‘happen’ on the software side) without directly dealing with that physical memory. Instead, the OS operates under the impression that there is some “virtual memory”: that is where it looks for ‘unused’ memory, where it avoids overwriting protected memory regions, and so forth. It’s the OS kernel, the software sitting directly ‘above’ the hardware, that presents this virtual memory to the rest of the OS like it’s real.

When we talk about “the rest of the OS”, what we’re really referring to is what we call the application software layer, or user space. Here’s where things get interesting: everything in this user space has what we call “unprivileged access”. What does that mean? Well, it means it can’t just ‘talk’ with and directly access the hardware: there’s no way for it to poll or trigger an interrupt by itself (polling and interrupts being the two mechanisms by which you can request hardware to ‘do’ something for you). Sparing you from the technical deets for once, it’s analogous to something like this: the chef is the one who cooks the food, but to get the dish you want, you order it through a server at the restaurant, or through a food delivery service; there’s no walking up to the kitchen and asking the chef yourself.

On the flip side, everything in the kernel space is living the high life with privileged access. This means it can directly access hardware resources whenever it wants. (Quick side note: there is a little more nuance to “privilege” than just this 😅 but it’s a pretty good starting point, and will be enough for this blog.)

But Why This Elaborate Illusion?

You might be wondering why we go through all this trouble with virtual memory through the kernel. Well, it’s pretty clever actually: the kernel knows all about the actual memory layout and alignment and all that good stuff. Plus, it sits right between the application software layer and the hardware, so it can act as this super smart mediator/middleman. This prevents users and software developers from doing anything potentially harmful with the actual memory, whether intentionally or not!

The Epic Journey: From C to Execution

Now, let’s embark on an even more fascinating journey. What happens when we write a C program and hit that magical ‘compile’ button? A multi-step process unfolds before the CPU finally gets to execute any instructions, of course. Oops if you thought it was simple; let’s break it down anyway.

Stage 1: The Preprocessor’s Domain

First up in our compilation saga is the preprocessor, which has the job of resolving preprocessor directives like #define. Let me break down what “resolve” means here: in English, it means to ‘close’ a matter, to finish it up, like how resolving an issue means doing something to get rid of that issue. It has the same meaning here and onwards.

So when the preprocessor resolves a directive like #define Aa BB, it’s basically doing a search-and-replace mission: finding every Aa in the .c file’s text contents, and replacing each with BB. Meaning that after all the preprocessing, we’ll still have a .c file, just with all the preprocessor stuff sorted out. As the name suggests, we may now more meaningfully process the code and end up with some useful work.
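
To make this concrete, here’s a tiny before-and-after sketch (names made up for illustration). Given this source:

#define ANSWER 42

int get_answer(void) {
    return ANSWER;  /* the preprocessor substitutes text right here */
}

the preprocessor’s output is still plain C, just with the directive gone and the substitution done:

int get_answer(void) {
    return 42;      /* every ANSWER became 42 */
}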

Stage 2: The Compiler Takes the Stage

Here’s where things get really interesting. The actual compiler comes into play and generates what I’ll call “generic Assembly code”. This contains Assembly instructions corresponding to your C code and, yes, it includes some optimisations (e.g., a function with a void return type can ‘mysteriously’ not appear in the compiler-generated Assembly at all: if, say, there is no pointer access in that function, the compiler may deem it safe to assume the function plays no role in the program’s execution and drop it).

Yet, here’s the thing: this ‘generic’ Assembly code is actually pretty lazy! Why? Because the compiler’s job is simply to use the ISA (Instruction Set Architecture) as a reference manual of sorts for translating from C to Assembly, and that’s it! This means that while the Assembly it generates will have instructions in the correct Assembly ‘flavour’ for your target ISA, it won’t have any meaningful memory addresses in the instruction operands. Instead, it just puts placeholders at all the places in the Assembly where knowledge of the ISA alone isn’t enough to figure things out.
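
For instance, in the sketch below (function names made up for illustration), the compiler can fully translate the arithmetic using the ISA alone, but it has no clue where log_value will end up living in memory, so the call is emitted with just the symbolic name and the actual address left blank for later:

/* Defined in some other .c file or library; address unknown here. */
extern void log_value(int value);

int scale(int x) {
    int y = x * 3;  /* translatable with nothing but the ISA manual */
    log_value(y);   /* emitted as a call on the symbol log_value,
                       destination address: to-be-determined */
    return y;
}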

Fun fact time! 🎉 The “assembler”, which converts Assembly into binary (machine code), usually gets bundled together with the compiler. That’s why when you compile a C program, you’ll notice an .o file appears in your folder (corresponding to the .c file you tried to compile). That’s what we call an object file, and it contains machine code. By default, you don’t actually get to see the file containing Assembly code, which is why compiler toolchains have a separate option for “disassembly”: to simply convert the machine code back into Assembly.
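
(If you’re on a gcc-style toolchain, you can also ask it to stop after each stage and keep the intermediate files; the flags below are gcc’s, and other toolchains have their own equivalents.)

gcc -E main.c -o main.i    # stop after preprocessing (output is still C text)
gcc -S main.c              # stop after compilation, keep main.s (Assembly)
gcc -c main.c              # stop after assembling, keep main.o (object file)
gcc main.c -o main         # the full pipeline, linking included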

Stage 3: Enter the Linker

Next up is the linker, whose job is to resolve all those placeholder symbols in the object files. Here’s an interesting detail: the linker is completely hidden from the user, so it makes total sense that it deals with machine code directly in the object files. (That might be why the compiler toolchain spits out machine code directly; I suppose there’s no point in keeping the intermediate Assembly code around by default.)

But what exactly are these “symbols” that the linker resolves? They’re things like:

  • Memory addresses
  • Global variables
  • External variables
  • Any other placeholders that were waiting to be filled in

The linker goes on a scavenger hunt through certain directories to find the libraries and other C files you need. It looks within those files to find the locations (in virtual memory) of the functions and variables you included in your program, and then plugs their memory addresses into the placeholders in the object files.
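
Here’s a small illustration of what ‘resolving a symbol’ looks like from the C side (file and function names made up):

/* math_helpers.c : this file provides the definition of cube */
int cube(int x) {
    return x * x * x;
}

/* main.c : this file only references cube */
extern int cube(int x);  /* declaration only; no address known here */

int main(void) {
    return cube(3);  /* compiled into a call on the symbol cube;
                        the linker fills in cube's (virtual) address */
}

Each file compiles to its own object file just fine on its own; it’s only at link time that the reference to cube inside main.o gets wired up to the definition sitting in math_helpers.o.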

Here’s a tip: the linker will only look in directories and files that it’s explicitly told to look at. Yeah, I know, that’s a tautology (((🧠))). Still, what it means is that you can always track down where you told it about each directory/file. For example, the location of your C standard library (e.g., newlib) is probably tucked away in an environment variable on your OS, so that no matter where you cd to in your shell before you invoke the compiler toolchain, the linker will always know where to look for the library. Knowing that this is how things work is just generally useful, hence why I’ve put it in as a tip! 😄
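
(With gcc, for instance, you point the linker at extra locations yourself: -L adds a directory to its library search path, and -l names a library to look for within those paths. The standard library’s own location is baked into the toolchain’s default configuration.)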

The Plot Thickens: Understanding the Output

At this point in our journey, we’ve got actual machine code, usually in an .exe file (if you’re using a modern Microsoft Windows OS) or an .elf file (the format commonly used on Linux and other Unix-like systems) or something similar. There’s actually a ton of detail about what these files contain, but that’s a whole other deep dive for some other day.

Before we move on, there are some SUPER important points we need to clear up:

  1. When I mentioned “address” or “location” anywhere in this post so far, I was actually referring to addresses in virtual memory. This is because the linker (and everything before it) is part of the OS’s user space, which can’t (and doesn’t need to) directly deal with physical memory.
  2. The machine code we get at the end (after the linking step) is actually meant for the OS, not the CPU itself, even though the CPU is what actually executes instructions in reality! As I just mentioned, everything in the machine code (so far) is in terms of virtual memory, which is this made-up thing that only the OS believes in.
  3. The Loader is normally considered the final step of the ‘compilation’ process and is also part of the OS (running in user mode). This is what springs into action when you double-click that blabla.exe executable file (or do .\blabla.exe in the shell): the loader looks at the current state of the OS’s virtual memory, finds free space for the executable you’re trying to run, and then allocates that space for it. (The executable is really just a bunch of instructions encoded in binary, so you can get an idea of how much space is needed by simply counting the bytes.)
  4. Here’s the real kicker: there is essentially NOTHING about physical memory in this entire pipeline… down to the very last detail (the loading step), it’s still all about virtual memory! So there’s actually a hardware unit that translates virtual memory addresses into physical memory addresses at run-time. This is known as the Memory Management Unit (MMU). You might’ve already encountered this if you’ve looked at the microarchitecture of an “application-class processor”, because an MMU is essential in a CPU if you want to load an OS into memory (i.e., “run an OS” on your processor).

System Calls: Where the Magic Happens

Another fascinating aspect: there are functions like write and _exit in some C libraries that are what we call “system calls” (syscalls), which do something requiring direct hardware interaction. Technically, almost every function needs hardware eventually, but there are layers of abstraction in between. Syscalls are the most ‘barebones’ functions in some sense, meant to be free of abstractions so that hardware can be used in a more straightforward manner through these functions. write is a syscall used inside the printf function, and depending on your choice of C standard library, its .c implementation can be messy. Let’s consider more principally what you might need to do to mimic printf’s functionality: you need to compute data on the CPU by executing instructions, and then somehow send the results to the host OS running on the CPU, where they become available for the shell interface to render on screen. This “somehow sending results” might be possible through the UART, which is a common hardware unit for communication with a processor system. In that case, a basic print-output-to-shell function would require a syscall that invokes the kernel for interacting with the UART in hardware.
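
To see the ‘barebones’ nature for yourself, here’s a minimal sketch (assuming a POSIX-style OS such as Linux) that skips printf’s formatting machinery and asks the kernel to output bytes directly:

#include <unistd.h>  /* declares write() on POSIX systems */

int main(void) {
    /* File descriptor 1 is standard output. This call traps into the
       kernel, whose privileged access gets the bytes to your terminal. */
    write(1, "The answer is: 42\n", 18);
    return 0;
}

No formatting, no buffering; just “kernel, please output these 18 bytes”. printf ultimately bottoms out in a call like this.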

These functions, syscalls, are OS-dependent, meaning that a syscall will be implemented differently on a Linux-based OS and on Windows, even if the function’s high-level purpose is the same. (Roughly speaking: when we say “implementation of a syscall”, we’re pretty much just talking about the C code of the function definition.)

The “It Depends” Nature of Everything

So at the end of the day, the whole “C to binary” pipeline has a lot of “depends on…” at almost every step of the way. Let’s break this down:

Library Dependencies

What libraries are you using? There is no single “C standard library”; there are different implementations of the C standard library specification (each generically called a ‘libc’). For example:

  • glibc (the GNU C Library)
  • musl
  • newlib

These are all implementations of a C standard library. Meaning that if you call printf in a C program and you’re using stdio.h from glibc, you are actually using an entirely different function compared to if you used newlib’s implementation of stdio.h! (Which of your installed libraries gets used for a particular compilation can be configured through options provided as part of the compiler toolchain, and people often bundle all their compiler options into a Makefile.)

Operating System Dependencies

What OS are you running? Windows, macOS, and Ubuntu are the most popular examples, but they all differ wildly in terms of:

  • How they implement virtual memory, like what page size they use (i.e., the size of the chunks that virtual memory is divided into, which the page tables keep track of)
  • What executable formats they support (e.g., .exe on Windows like I mentioned) and how data is structured in that executable format

… and more.

Hardware Dependencies

What ISA (i.e., machine architecture) does your CPU implement? How does your MMU wor—… wait, actually, the whole software compilation extravaganza doesn’t care about the MMU at all, because like I said, it’s all in terms of virtual memory. A rare case when something doesn’t matter! 😅

And we aren’t even close to scratching the surface yet. Like, how is your compiler designed? What about the assembler, linker, loader? What about dynamic memory allocation (or anything else that’s “dynamic”, i.e., something that happens at run-time rather than compile-time)? It’s good to live a stress-free life, so you’re better off worrying about it later, heh.

The Bare Metal Case: Life Without an OS

Ultimately, I think the answer to which physical memory locations get used (and how) depends on the MMU design, assuming that we are running an OS on the CPU in the first place.

But what if there is no OS in the first place, like a CPU core without an MMU? Then things are like so:

  1. To start with, it’s obviously necessary for the CPU to implement the ISA accurately
  2. Then, it suffices to generate ‘simple’ RISC-V Assembly (→ machine code), for which the linker must be explicitly told the addresses of physical memory … [the existence of a linker here assumes you compile the program on a different computer and then somehow directly load the machine code into this MMU-less CPU]
  3. So the memory-mapped I/O addresses we define in the C programs in lab are actually the ‘real’ addresses in this core’s physical memory! (See the sketch right below.)
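
Here’s a minimal sketch of what such memory-mapped I/O looks like in C (the address and register name here are hypothetical; the real ones come from your core’s memory map):

/* Hypothetical memory-mapped LED register at physical address 0x10010000.
   'volatile' tells the compiler that every access must really touch memory. */
#define LED_REG (*(volatile unsigned int *)0x10010000)

int main(void) {
    LED_REG = 0x1;  /* writing to this address drives real hardware pins */
    for (;;) { }    /* bare metal: there's no OS to return to, so spin */
}

With no OS and no MMU in sight, 0x10010000 is the address that goes out on the bus, full stop.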

Alternatively, if you have just the CPU core and no access to any compiler toolchain at all, you’ll have to:

  • Manually write machine code
  • Put it in some storage that the CPU can interface with in hardware
  • Load that machine code directly into the instruction memory somehow … [which means that there should be something that happens in the core when it’s powered on… something that sets the Program Counter to read, by default, from the instruction memory starting at a specific address, and so the machine code you put in the instruction memory must be placed beginning at that specific address]

A Note About RISC-V

There is a version of the all-popular gcc compiler toolchain for the RISC-V ISA, which lets you tell the compiler to strictly follow the RISC-V calling conventions, which are rules about what the ‘generic’ Assembly must do in certain categories of situations. (There are multiple such conventions, like ilp32.)

When you call the compiler toolchain in a shell, you can pass in arguments that specify what the machine architecture is (e.g., rv32imc), and also what the machine Application Binary Interface is (for the calling conventions). That more or less determines the ‘flavour’ of the machine code generated at the end. (If you’re curious about more options and details, check out the RISC-V compiler options documentation!)
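
A typical invocation looks something like this; -march and -mabi are the actual gcc flags, while the toolchain’s name prefix depends on how your particular toolchain was built:

riscv32-unknown-elf-gcc -march=rv32imc -mabi=ilp32 main.c -o main.elf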

FIN

Aaand we are done. A lengthy escapade into the fascinating world of how programs actually go from our fingertips to execution. I know we covered A LOT, but hopefully this helped demystify concepts that are sometimes not talked about together in this manner.

P.S. This is all still a fairly simplified view of things, of course. There’s always more to learn and discover when it comes to computer systems, as I myself keep finding out on a daily basis.


Addendum

I know that many of us like to see at least one concrete example to tie everything together, and so here is my attempt at one.

Our Example Program

We’ll use a simple program that demonstrates interaction with the OS, writing to standard output:

#include <stdio.h>
 
int main() {
    int number = 42;
    printf("The answer is: %d\n", number);
    return 0;
}

Seems simple, but there’s so much happening under the hood.

Stage 1: Preprocessing

After the preprocessor runs, our code looks something like this:

// [stdio.h contents get inserted here, including function declarations]
 
int main() {
    int number = 42;
    printf("The answer is: %d\n", number);
    return 0;
}
 
 
Stage 2: Compilation to RISC-V Assembly
 
As this is a popular Assembly language (at least amongst my peers at university), and a simple one, let’s use RISC-V. The compiler transforms the C code into RISC-V Assembly (using the RV32I base integer instruction set for simplicity):
 
.section .rodata
str:
    .string "The answer is: %d\n"

.text
.globl main

main:
    # Prologue
    addi    sp, sp, -16         # Adjust stack pointer (make room on stack)
    sw      ra, 12(sp)          # Save return address
    sw      s0, 8(sp)           # Save frame pointer
    addi    s0, sp, 16          # Set up frame pointer

    # Store 42 in stack frame
    li      a5, 42              # Load immediate value 42
    sw      a5, -12(s0)         # Store in stack frame

    # Prepare printf arguments
    lw      a1, -12(s0)         # Load 42 into a1 (second printf argument)
    lui     a0, %hi(str)        # Upper bits of string address (placeholder)
    addi    a0, a0, %lo(str)    # Lower bits of string address (placeholder)
    call    printf              # Call printf (a symbol for the linker)

    # Return 0
    li      a0, 0               # Load immediate value 0 for return

    # Epilogue
    lw      ra, 12(sp)          # Restore return address
    lw      s0, 8(sp)           # Restore frame pointer
    addi    sp, sp, 16          # Restore stack pointer
    ret                         # Return to caller

It’s totally fine to not understand most of this. All you need to know conceptually is that the C code was converted, using calling conventions, into the equivalent (RISC-V) Assembly, and that it does exactly what you expect it to do since it’s functionally the same as the C code.

Stage 3: Machine Code

The assembler converts this into machine code. Here’s a snippet of what some of these instructions look like in binary, with the fields spaced apart for readability and the resulting hex word annotated alongside:

# addi sp, sp, -16
111111110000 00010 000 00010 0010011     # imm | rs1 | funct3 | rd | opcode = 0xFF010113

# sw ra, 12(sp)
0000000 00001 00010 010 01100 0100011    # 0x00112623

# li a5, 42  (pseudo-instruction for: addi a5, x0, 42)
000000101010 00000 000 01111 0010011     # 0x02A00793

Again: not an issue if this isn’t clear; all you need to know is that we can directly map the Assembly to the corresponding machine code, deterministically. This is thanks to the existence, implementation, and use of the ISA.
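
If you’d like to poke at an encoding yourself, here’s a minimal sketch in C that slices the first instruction word above into its fields (the field positions are fixed by the RISC-V ISA):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t insn = 0xFF010113;             /* addi sp, sp, -16 */

    uint32_t opcode = insn & 0x7F;          /* bits [6:0]   -> 0b0010011 (OP-IMM) */
    uint32_t rd     = (insn >> 7)  & 0x1F;  /* bits [11:7]  -> 2 (sp)   */
    uint32_t funct3 = (insn >> 12) & 0x07;  /* bits [14:12] -> 0 (ADDI) */
    uint32_t rs1    = (insn >> 15) & 0x1F;  /* bits [19:15] -> 2 (sp)   */
    int32_t  imm    = (int32_t)insn >> 20;  /* bits [31:20], sign-extended -> -16 */

    printf("opcode=0x%02X rd=x%u funct3=%u rs1=x%u imm=%d\n",
           (unsigned)opcode, (unsigned)rd, (unsigned)funct3,
           (unsigned)rs1, (int)imm);
    return 0;
}

Running this prints opcode=0x13 rd=x2 funct3=0 rs1=x2 imm=-16, i.e., exactly addi sp, sp, -16.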

What Happens at Runtime?

Here’s where things get more interesting. Let’s see what happens when we run this program:

  1. Loading the Program: the loader finds free space in the OS’s virtual memory, allocates it, and copies the executable’s contents into it.
  2. Starting Execution: the Program Counter is pointed at the program’s entry point, and the CPU starts fetching and executing instructions, eventually reaching main.
  3. Printf: Here’s where we see the OS in action! When printf is called, it formats the output string and ultimately invokes the write syscall, trapping into the kernel so that its privileged access can be used to get the text onto your screen.

The Role of Virtual Memory

Remember how we talked about virtual memory? Here’s how it works in our example:

  1. When our program accesses the stack (like storing 42):

sw a5, -12(s0) # Store 42 in stack frame

  2. This virtual address (here, s0 minus 12) gets translated by the MMU:

Virtual Address:  0x7FFFFFF4
        ↓  [MMU Translation]
Physical Address: 0x04A2CFF4

(example addresses, used for the sake of explanation only; notice that with 4 KiB pages the low 12 bits, i.e. the offset within a page, pass through unchanged, and only the upper bits get remapped)

ISA vs. Microarchitecture

While our assembly code shows the ISA-level instructions, the actual CPU might execute them differently internally:

  1. ISA Level: the instructions exactly as written above (shown in assembly instead of machine code for clarity); this is kind of the last thing we’d get to see, and it’s the contract the hardware promises to honour.
  2. Microarchitecture Level: a more accurate description of what typically happens on a ‘proper’ processor, where instructions get fetched, decoded, and executed across pipeline stages, possibly reordered or otherwise massaged internally, so long as the visible results match the ISA-level contract.

Key Takeaways

The example demonstrates:

  1. How user code transitions to kernel mode (through printf)
  2. Virtual memory translation (stack access)
  3. The separation between machine architecture and its microarchitectural implementation
  4. The role of the OS in managing resources

P.S. For the curious: different OS implementations might handle these steps slightly differently, and different CPU designs might have various differences and optimisations in their microarchitecture. The beauty of abstraction layers is that our program works regardless.