Cache Overview
What You'll Learn
- The memory hierarchy: L1, L2, L3 caches
- Cache lines and why they matter
- Temporal and spatial locality
- Basic cache associativity concepts
Mental Model
CPUs are fast. Memory is slow. The gap between CPU speed and DRAM speed is huge—hundreds of cycles. Caches bridge this gap by keeping frequently-used data close to the CPU.
Think of caches like a desk: you keep frequently-used items (cache) on your desk for quick access, while everything else (DRAM) is in filing cabinets across the room.
The Memory Hierarchy
L1 Cache
Size: Typically 32-64 KB per core
Latency: 1-3 cycles
Location: On the CPU core itself
Purpose: Fastest access, holds most recently used data
L2 Cache
Size: Typically 256 KB - 1 MB per core
Latency: 10-20 cycles
Location: On the CPU core or shared
Purpose: Larger than L1, still fast
L3 Cache
Size: Typically 8-32 MB shared across cores
Latency: 40-75 cycles
Location: Shared between cores
Purpose: Large shared cache, reduces DRAM access
DRAM
Size: GBs to TBs
Latency: 100-300 cycles
Location: Off-chip memory modules
Purpose: Main system memory
Cache Lines
Caches don't store individual bytes. They store cache lines (typically 64 bytes). When you access one byte, the entire 64-byte line is loaded into the cache.
This is why spatial locality matters: accessing nearby memory is fast because it's already in the cache line. Accessing random memory is slow because each access might miss the cache.
Locality
Temporal Locality
If you access data once, you're likely to access it again soon. Caches exploit this by keeping recently-used data.
Example: Loop variables, recently accessed array elements.
Spatial Locality
If you access data at address X, you're likely to access data near X. Caches exploit this by loading entire cache lines.
Example: Sequential array access, struct fields.
Associativity (Basic Intuition)
When data is loaded into cache, where does it go? The cache is organized into sets. Each memory address maps to a specific set. Within that set, there are multiple ways (slots) where the data can be stored.
- Direct-mapped: 1 way per set (simple, but conflicts are common)
- N-way associative: N ways per set (reduces conflicts)
- Fully associative: Data can go anywhere (ideal but expensive)
Higher associativity reduces conflict misses (when two addresses map to the same set and evict each other), but increases complexity and cost.
Cache Miss Types
- Compulsory (cold) miss: First access to data (unavoidable)
- Capacity miss: Working set too large for cache
- Conflict miss: Multiple addresses map to same cache set
Why This Matters
Cache misses are expensive. A single DRAM access can take 100-300 cycles. If your code has poor locality, you'll spend most of your time waiting for memory, not computing.
Understanding caches helps you write code that:
- Fits in cache (small working sets)
- Accesses memory sequentially (spatial locality)
- Reuses data (temporal locality)
- Avoids conflict patterns
Checklist
- ✓ Understand L1/L2/L3 cache hierarchy
- ✓ Know what cache lines are (64 bytes)
- ✓ Understand temporal vs spatial locality
- ✓ Basic intuition about associativity
- ✓ Ready to measure cache effects