Cache Overview

What You'll Learn

Mental Model

CPUs are fast. Memory is slow. The gap between CPU speed and DRAM speed is huge—hundreds of cycles. Caches bridge this gap by keeping frequently-used data close to the CPU.

Think of caches like a desk: you keep frequently-used items (cache) on your desk for quick access, while everything else (DRAM) is in filing cabinets across the room.

The Memory Hierarchy

L1 Cache

Size: Typically 32-64 KB per core
Latency: 1-3 cycles
Location: On the CPU core itself
Purpose: Fastest access, holds most recently used data

L2 Cache

Size: Typically 256 KB - 1 MB per core
Latency: 10-20 cycles
Location: On the CPU core or shared
Purpose: Larger than L1, still fast

L3 Cache

Size: Typically 8-32 MB shared across cores
Latency: 40-75 cycles
Location: Shared between cores
Purpose: Large shared cache, reduces DRAM access

DRAM

Size: GBs to TBs
Latency: 100-300 cycles
Location: Off-chip memory modules
Purpose: Main system memory

Cache Lines

Caches don't store individual bytes. They store cache lines (typically 64 bytes). When you access one byte, the entire 64-byte line is loaded into the cache.

This is why spatial locality matters: accessing nearby memory is fast because it's already in the cache line. Accessing random memory is slow because each access might miss the cache.

Locality

Temporal Locality

If you access data once, you're likely to access it again soon. Caches exploit this by keeping recently-used data.

Example: Loop variables, recently accessed array elements.

Spatial Locality

If you access data at address X, you're likely to access data near X. Caches exploit this by loading entire cache lines.

Example: Sequential array access, struct fields.

Associativity (Basic Intuition)

When data is loaded into cache, where does it go? The cache is organized into sets. Each memory address maps to a specific set. Within that set, there are multiple ways (slots) where the data can be stored.

Higher associativity reduces conflict misses (when two addresses map to the same set and evict each other), but increases complexity and cost.

Cache Miss Types

Why This Matters

Cache misses are expensive. A single DRAM access can take 100-300 cycles. If your code has poor locality, you'll spend most of your time waiting for memory, not computing.

Understanding caches helps you write code that:

Checklist