Compilers

A compiler turns source text in one language into source text (or machine code) in another. The classical pipeline — lex, parse, codegen — is a small idea that scales: the same three stages run inside gcc, rustc, and every interpreter you've ever used. The way to learn it is to write one.

This section builds the same compiler — a growing subset of C, targeting x86-64 assembly — repeatedly, in different implementation languages. Same input, same output, different host. The host language changes what feels easy and what feels hard, and the contrast is most of what's worth learning after you've seen the algorithms once.

Implementation tracks

C compiler in Python — The opening track. Python lets the algorithms breathe — string slicing, dictionaries, exceptions, no manual memory. Read this first; the structure stays the same in every other track.
C compiler in C — The self-hosting move. No regex library, no built-in dict, no garbage collection. Forces you to confront the data structures you got for free in Python (symbol tables, AST allocation, error messages with source locations). When this version compiles itself, you have a self-hosting compiler.
C compiler in Lisp — The AST-as-data move. Lisp's homoiconicity makes the parser output literally a Lisp list, and codegen becomes a tree-walk with pattern matching. The whole compiler reads as one long transformation pipeline.
C compiler in OCaml — The type-driven move. Algebraic data types make AST nodes precise and exhaustive pattern matching catches missing cases at compile time. Many compilers in industry are written in OCaml or its descendants for this reason.

Only the Python track is written so far. The other tracks reuse the same chapter list, so once you've worked through Python you can pick any other track and read it as a "how do I express the same algorithm here?" exercise.

What we compile

A growing subset of C, chapter by chapter. The first chapter handles only int main(void) { return INTEGER; }; the last handles structs, pointers, and a multi-function program with a real calling convention. The C subset is the same across all tracks — only the implementation changes.

The target is x86-64 assembly in AT&T syntax — the dialect GNU as assembles by default. We hand the assembly to gcc for assembling and linking against the C runtime, so our main function becomes a real Unix program with a real exit code. Linux and macOS both work; the chapters call out the small Mach-O / ELF differences when they matter.