Understand the CPU (Timing-First Edition)

What This Section Teaches

This section teaches CPU performance engineering through portable experiments and measurement-driven reasoning. You'll learn to understand what the CPU is actually doing by measuring it, not just reading specs.

All experiments run on macOS, Linux, and Windows and rely only on timing measurements. Platform-specific tools (like perf on Linux) are optional enhancements, not requirements.

What is Performance Engineering?

Performance engineering is the practice of understanding and optimizing how code executes on real hardware. Theoretical complexity is only part of the picture: the work is measuring what actually happens and reasoning about why.

The key insight: CPUs are complex machines with caches, pipelines, branch predictors, and more. Understanding these mechanisms lets you write code that works with the hardware, not against it.

What Does "Understanding the CPU" Mean?

Modern CPUs are sophisticated systems with many interacting components. To understand them, we focus on three pillars:

Computation

The execution core: pipelines, superscalar execution, instruction-level parallelism, instruction latency and throughput.

Memory

The memory hierarchy: caches (L1/L2/L3), cache lines, TLB, DRAM bandwidth, locality, prefetching.

Coordination

Branching and concurrency: branch prediction, cache coherence, false sharing, lock contention, synchronization.

Why Timing is the Universal Fallback

Every mainstream platform exposes a high-resolution, monotonic timer, so timing measurements work everywhere:

macOS: mach_absolute_time() or std::chrono::steady_clock
Linux: clock_gettime(CLOCK_MONOTONIC) or std::chrono::steady_clock
Windows: QueryPerformanceCounter() or std::chrono::steady_clock

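By measuring elapsed wall-clock time in controlled experiments, we can infer what the CPU is doing without needing hardware performance counters. Use a monotonic clock for this, because the system's calendar clock can jump when it is adjusted. This keeps the experiments portable and easy to reproduce.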

Optional enhancements: On Linux, perf exposes hardware performance counters. On macOS, Instruments offers profiling. On Windows, Event Tracing for Windows (ETW) can capture system events. These tools add detail to timing measurements; they do not replace them.

How to Use This Section

Each page follows a consistent structure:

What you'll learn — Key takeaways
Mental model — Conceptual explanation
Experiment — Portable code you can run
What to measure — Metrics and how to collect them
Expected shape of results — What you should see
Interpretation — What the CPU is doing
Common pitfalls — Noise sources and compiler tricks
Tooling upgrades (optional) — Platform-specific enhancements
Checklist — Validation steps

Prerequisites

Basic C++ knowledge (we use C++17/20)
A C++ compiler with C++17 support (GCC, Clang, or MSVC)
Ability to run command-line programs
No hardware performance counters required

Getting Started

Start with Measurement Methodology to learn how to write reliable benchmarks. Then proceed through the modules in order, or jump to topics that interest you.

Each experiment is self-contained and includes all the code you need. Run it, measure it, and reason about the results.