Memory Bandwidth Basics
What You'll Learn
- The difference between latency and bandwidth
- Why streaming kernels are bandwidth-bound
- Roofline model intuition
- How to measure memory bandwidth
Mental Model
Latency: How long one operation takes (time to first byte)
Bandwidth: How much data can be transferred per unit time (sustained throughput)
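Main memory (DRAM) has high latency, roughly 100-300 CPU cycles per access, but also high bandwidth, typically tens of GB/s. For large sequential operations the startup latency is paid once and then largely hidden by prefetching and overlapping requests, so bandwidth matters more than latency.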
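To make the trade-off concrete, here is a minimal sketch of a simple cost model: transfer time is a fixed latency plus size divided by bandwidth. The function name `transfer_time_s` and the figures (100 ns latency, 20 GB/s bandwidth) are assumptions chosen for illustration, not measurements of any particular machine.

```python
def transfer_time_s(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Simple model: fixed startup latency plus size divided by sustained bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


# Hypothetical figures for illustration only: 100 ns latency, 20 GB/s bandwidth.
LATENCY_S = 100e-9
BANDWIDTH_B_PER_S = 20e9

for size in (64, 4096, 1 << 20, 1 << 30):
    total = transfer_time_s(size, LATENCY_S, BANDWIDTH_B_PER_S)
    latency_share = LATENCY_S / total  # fraction of total time spent waiting on latency
    print(f"{size:>12} bytes: {total * 1e6:14.3f} us, latency share {latency_share:7.2%}")
```

Under this model a single 64-byte access is dominated by latency, while a gigabyte-sized transfer is dominated almost entirely by the bandwidth term.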
Why Streaming Kernels Are Bandwidth-Bound
Operations that stream through data sequentially (copy, add, multiply) do very little arithmetic per byte, so they are limited by how fast data can be moved from memory, not by computation. The CPU can compute faster than memory can supply data.
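One rough way to see this limit is to time a large copy and divide the bytes moved by the elapsed time. The sketch below uses only the Python standard library. The function name `measure_copy_bandwidth`, the 64 MiB buffer size, and the repeat count are assumptions for illustration. The result is an estimate, not a precise figure: it counts one read and one write per byte and ignores any extra traffic the hardware may generate. Dedicated benchmarks such as STREAM are the usual tool for careful measurements.

```python
import time


def measure_copy_bandwidth(size_bytes=64 * 1024 * 1024, repeats=5):
    """Return the best observed copy bandwidth in bytes per second."""
    # Fill the source with a nonzero byte so its pages are actually allocated.
    src = bytearray(b"\x01") * size_bytes
    dst = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # bulk copy between two equal-sized bytearrays
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)  # keep the fastest run to reduce warm-up noise
    # A copy reads size_bytes and writes size_bytes.
    return 2 * size_bytes / best


if __name__ == "__main__":
    bandwidth = measure_copy_bandwidth()
    print(f"Approximate copy bandwidth: {bandwidth / 1e9:.1f} GB/s")
```

The buffer should be much larger than the last-level cache. Otherwise the copy runs out of cache and the number reflects cache bandwidth rather than memory bandwidth.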
Roofline Intuition
The roofline model classifies kernels as:
- Compute-bound: Limited by the CPU's peak arithmetic throughput
- Bandwidth-bound: Limited by how fast memory can deliver data
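Operational intensity (operations performed per byte of memory traffic) determines which limit applies. Low intensity → bandwidth-bound. High intensity → compute-bound. The crossover sits where intensity times memory bandwidth equals peak compute throughput.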
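The roofline bound can be sketched in a few lines: attainable performance is the smaller of peak compute and intensity times bandwidth. The function names and the machine figures below (100 GFLOP/s peak, 20 GB/s bandwidth) are hypothetical values chosen for illustration.

```python
def attainable_gflops(intensity_flops_per_byte, peak_gflops, bandwidth_gb_per_s):
    """Roofline bound: the lower of the compute roof and the bandwidth roof."""
    return min(peak_gflops, intensity_flops_per_byte * bandwidth_gb_per_s)


def classify(intensity_flops_per_byte, peak_gflops, bandwidth_gb_per_s):
    """Label a kernel by comparing its intensity with the crossover intensity."""
    crossover = peak_gflops / bandwidth_gb_per_s  # intensity where the two roofs meet
    if intensity_flops_per_byte < crossover:
        return "bandwidth-bound"
    return "compute-bound"


# Hypothetical machine for illustration only.
PEAK_GFLOPS = 100.0
BANDWIDTH_GB_PER_S = 20.0

# Vector add on 8-byte floats, c[i] = a[i] + b[i]:
# 1 operation per 24 bytes moved (two reads and one write).
vector_add_intensity = 1 / 24

print(classify(vector_add_intensity, PEAK_GFLOPS, BANDWIDTH_GB_PER_S))
print(attainable_gflops(vector_add_intensity, PEAK_GFLOPS, BANDWIDTH_GB_PER_S))
```

On this assumed machine the crossover is 5 operations per byte, so a vector add at about 0.04 operations per byte sits far on the bandwidth-bound side.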
Checklist
- ✓ Understand latency vs bandwidth
- ✓ Know why streaming kernels are bandwidth-bound
- ✓ Basic roofline intuition
- ✓ Ready to measure bandwidth