Cache Size Detection (Timing Sweep)

What You'll Learn

Mental Model

As you increase the working set size, you'll see step-like increases in access latency. Each step corresponds to a cache level being exceeded. By measuring when latency jumps, you can detect cache sizes without any hardware counters.

Experiment: Working Set Sweep

This experiment sweeps through different working set sizes and measures access time:

#include <chrono>
#include <iostream>
#include <vector>
#include <iomanip>

using Clock = std::chrono::high_resolution_clock;
using Duration = std::chrono::nanoseconds;

// Pointer-chasing to force memory access
double measure_access_time(size_t working_set_bytes) {
    size_t elements = working_set_bytes / sizeof(size_t);
    std::vector<size_t> data(elements);
    
    // Create a linked list pattern (pointer chase)
    for (size_t i = 0; i < elements - 1; ++i) {
        data[i] = (i + 1) * sizeof(size_t);
    }
    data[elements - 1] = 0;  // Terminate
    
    // Warmup
    size_t ptr = 0;
    for (int i = 0; i < 1000; ++i) {
        ptr = data[ptr / sizeof(size_t)];
    }
    
    // Measure
    const int iterations = 100000;
    volatile size_t sink = 0;
    
    auto start = Clock::now();
    ptr = 0;
    for (int i = 0; i < iterations; ++i) {
        ptr = data[ptr / sizeof(size_t)];
        sink = ptr;  // Prevent optimization
    }
    auto end = Clock::now();
    
    auto elapsed = std::chrono::duration_cast<Duration>(end - start);
    return static_cast<double>(elapsed.count()) / iterations;
}

int main() {
    std::cout << "Working Set (KB)\tAccess Time (ns)\n";
    std::cout << std::fixed << std::setprecision(2);
    
    // Sweep from 1 KB to 256 MB
    for (size_t kb = 1; kb <= 256 * 1024; kb *= 2) {
        size_t bytes = kb * 1024;
        double ns = measure_access_time(bytes);
        std::cout << kb << "\t\t" << ns << "\n";
    }
    
    return 0;
}

Compile: g++ -O3 -o cache_sweep cache_sweep.cpp
Run: ./cache_sweep

Alternative: Stride-Walk Pattern

A simpler pattern that's easier to understand:

// Stride-walk: access every Nth element
double measure_stride(size_t working_set_bytes, size_t stride = 64) {
    size_t elements = working_set_bytes / sizeof(int);
    std::vector<int> data(elements, 1);
    
    volatile int sum = 0;
    const int iterations = 1000000;
    
    auto start = Clock::now();
    for (int i = 0; i < iterations; ++i) {
        size_t idx = (i * stride) % elements;
        sum += data[idx];
    }
    auto end = Clock::now();
    
    auto elapsed = std::chrono::duration_cast<Duration>(end - start);
    return static_cast<double>(elapsed.count()) / iterations;
}

What to Measure

Expected Shape of Results

You should see a step-like curve:

The exact sizes depend on your CPU, but you should see clear steps. Plot the data to visualize the cache hierarchy.

Interpretation

Where the jumps come from: When the working set exceeds a cache level, data must be fetched from the next level down. Each level is slower but larger. The latency jump is the difference between cache levels.

Why it's step-like: Caches have fixed sizes. Once you exceed a cache, all accesses go to the next level, causing a sudden latency increase. The plateau before the jump is when you're still within that cache level.

Results Section

Plot your results here: Create a log-log plot of working set size (x-axis) vs access time (y-axis). You should see clear steps.

Insert your plot here

Expected: Step-like curve with jumps at L1→L2, L2→L3, L3→DRAM boundaries

Common Pitfalls

Tooling Upgrades (Optional)

Linux: perf stat

perf stat -e L1-dcache-loads,L1-dcache-load-misses,LLC-loads,LLC-load-misses ./your_program shows exact cache miss counts. Compare with timing measurements.

macOS: Instruments

Instruments can show cache miss rates on supported hardware. Useful for validating timing measurements.

Windows: ETW

ETW can capture cache-related performance counters. Windows Performance Analyzer provides visualization.

Checklist