Roofline Model
What You'll Learn
- What operational intensity is
- Compute roof vs bandwidth roof
- How to classify kernels from measured data
- How the two roofs bound the performance a kernel can attain
Mental Model
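The roofline model bounds a kernel's attainable performance by whichever of two hardware limits it hits first:
- Compute roof: the maximum rate at which the CPU can execute floating-point operations (peak FLOPS, i.e. operations per second)
- Bandwidth roof: the maximum rate at which data can move to and from memory (bytes per second)
Which limit applies depends on the kernel's operational intensity (operations performed per byte of memory traffic).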
Operational Intensity
Operational intensity = Operations performed / Bytes transferred to and from memory
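Examples (assuming 4-byte floats):
- Copy: 0 ops/byte (just memory movement) → bandwidth-bound
- Dot product: 0.25 ops/byte (1 multiply + 1 add per pair of 4-byte loads, i.e. 2 ops per 8 bytes) → bandwidth-bound
- Matrix multiply (n × n): about n/6 ops/byte (2n³ ops over 3n² elements of 4 bytes each, if data reuse lets each matrix cross the memory bus only once) → compute-bound for large n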
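The intensities of these three kernels can be worked out with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, it assumes 4-byte floats by default, and it counts idealized memory traffic (every operand crosses the memory bus exactly once), which real kernels only approach with good cache blocking.

```python
def copy_intensity(bytes_per_elem: int = 4) -> float:
    """Copy: no arithmetic, one read and one write per element."""
    ops = 0
    bytes_moved = 2 * bytes_per_elem
    return ops / bytes_moved


def dot_intensity(bytes_per_elem: int = 4) -> float:
    """Dot product: 1 multiply + 1 add per pair of loaded elements."""
    ops = 2
    bytes_moved = 2 * bytes_per_elem
    return ops / bytes_moved


def matmul_intensity(n: int, bytes_per_elem: int = 4) -> float:
    """n x n matrix multiply, assuming ideal reuse.

    2*n^3 operations; A and B are read once and C is written once,
    so 3*n^2 elements cross the memory bus.
    """
    ops = 2 * n**3
    bytes_moved = 3 * n**2 * bytes_per_elem
    return ops / bytes_moved


if __name__ == "__main__":
    print("copy  :", copy_intensity(), "ops/byte")
    print("dot   :", dot_intensity(), "ops/byte")
    print("matmul:", matmul_intensity(1024), "ops/byte (n = 1024)")
```

Note that only the matrix multiply result depends on problem size: its intensity grows linearly with n, which is why large matrix multiplies end up compute-bound.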
Compute Roof vs Bandwidth Roof
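Bandwidth roof: Performance = Bandwidth × Operational Intensity (a sloped line: bytes/s × ops/byte = ops/s)
Compute roof: Performance = Peak FLOPS (flat line)
The roofline is the minimum of these two limits. The two lines meet at the ridge point, where Operational Intensity = Peak FLOPS / Bandwidth. Intensity below the ridge point → bandwidth-bound. Intensity above it → compute-bound.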
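The two roofs and their minimum can be written as a short function. This is a minimal sketch: the function names are hypothetical, and the peak and bandwidth figures are made-up round numbers for an imaginary machine, not measurements of any real CPU.

```python
def attainable_flops(intensity: float, peak_flops: float, bandwidth: float) -> float:
    """Roofline: the lower of the compute roof and the bandwidth roof.

    intensity  -- operational intensity in ops/byte
    peak_flops -- compute roof in ops/s
    bandwidth  -- memory bandwidth in bytes/s
    """
    return min(peak_flops, bandwidth * intensity)


def ridge_point(peak_flops: float, bandwidth: float) -> float:
    """Intensity (ops/byte) at which the two roofs intersect."""
    return peak_flops / bandwidth


# Hypothetical machine: 100 GFLOP/s peak, 20 GB/s memory bandwidth.
PEAK = 100e9
BW = 20e9

if __name__ == "__main__":
    print("ridge point:", ridge_point(PEAK, BW), "ops/byte")
    for oi in (0.25, 1.0, 5.0, 50.0):
        gflops = attainable_flops(oi, PEAK, BW) / 1e9
        print(f"intensity {oi:>5} ops/byte -> at most {gflops:.1f} GFLOP/s")
```

On this imaginary machine the ridge point is 5 ops/byte, so a kernel at 0.25 ops/byte can reach at most 5 GFLOP/s no matter how well its arithmetic is optimized.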
How to Classify Kernels
From measured data:
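- Measure the kernel's achieved performance (operations executed / run time, in FLOPS)
- Measure the memory traffic (bytes transferred, or equivalently bandwidth used × run time)
- Calculate operational intensity = operations executed / bytes transferred
- Compare the intensity to the ridge point (Peak FLOPS / Bandwidth), and the achieved performance to the roof at that intensity
- Classify as bandwidth-bound (intensity below the ridge point) or compute-bound (intensity at or above it)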
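The steps above can be sketched as a small classifier. Everything here is an assumption made for illustration: the `Measurement` fields, the `classify` function, and the machine parameters are hypothetical, and the operation and byte counts would in practice come from whatever profiler or counters are available.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    flops: float        # total floating-point operations executed
    bytes_moved: float  # total bytes transferred to and from memory
    seconds: float      # kernel run time


def classify(m: Measurement, peak_flops: float, bandwidth: float) -> dict:
    """Place one measured kernel on the roofline of a given machine."""
    if m.bytes_moved <= 0 or m.seconds <= 0:
        raise ValueError("bytes_moved and seconds must be positive")

    intensity = m.flops / m.bytes_moved          # ops/byte
    achieved = m.flops / m.seconds               # ops/s
    ridge = peak_flops / bandwidth               # ops/byte where roofs meet
    roof = min(peak_flops, bandwidth * intensity)

    return {
        "intensity": intensity,
        "achieved_flops": achieved,
        "roof_flops": roof,
        # Fraction of the applicable roof actually reached.
        "efficiency": achieved / roof if roof > 0 else 0.0,
        "bound": "bandwidth-bound" if intensity < ridge else "compute-bound",
    }


if __name__ == "__main__":
    # Hypothetical numbers: a float32 dot product over 1e8 element pairs
    # on a machine with 100 GFLOP/s peak and 20 GB/s bandwidth.
    dot = Measurement(flops=2e8, bytes_moved=8e8, seconds=0.0625)
    print(classify(dot, peak_flops=100e9, bandwidth=20e9))
```

The efficiency figure separates two questions: the classification says which roof applies, while the ratio of achieved performance to that roof says how much headroom remains under it.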
Checklist
- ✓ Understand operational intensity
- ✓ Know compute roof vs bandwidth roof
- ✓ Can classify kernels from measurements