Memory Bandwidth Basics

What You'll Learn

Mental Model

Latency: How long one operation takes (time to first byte)
Bandwidth: How much data can be transferred per unit time (sustained throughput)

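Main memory has high latency (100-300 cycles) but also high bandwidth (tens of GB/s). For large sequential operations, bandwidth matters more than latency: the latency cost is paid up front and amortized over the whole transfer, so total time is dominated by how fast bytes stream in.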
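To make the trade-off concrete, here is a minimal sketch that models a transfer as a fixed latency plus size divided by bandwidth. The function names and the default figures (100 ns latency, 20 GB/s) are illustrative assumptions, not measurements of any particular machine, and the model ignores caches and prefetching.

```python
def transfer_time_ns(size_bytes, latency_ns=100.0, bandwidth_gb_per_s=20.0):
    """Estimate transfer time as a fixed latency plus size / bandwidth.

    The defaults are assumed, illustrative values.
    """
    # 1 GB/s is 1 byte per nanosecond, so size / bandwidth is already in ns.
    return latency_ns + size_bytes / bandwidth_gb_per_s


def latency_share(size_bytes, latency_ns=100.0, bandwidth_gb_per_s=20.0):
    """Fraction of the total transfer time spent waiting on latency."""
    total = transfer_time_ns(size_bytes, latency_ns, bandwidth_gb_per_s)
    return latency_ns / total


if __name__ == "__main__":
    # Compare one 64-byte cache line against a 1 GiB sequential transfer.
    for size in (64, 2**30):
        print(size, "bytes:", f"{latency_share(size):.6f}", "of time is latency")
```

Under these assumed numbers, a single 64-byte access spends roughly 97% of its time on latency, while for a 1 GiB transfer the latency term is negligible and bandwidth sets the total time.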

Why Streaming Kernels Are Bandwidth-Bound

Operations that process data sequentially (copy, add, multiply) touch each element once and do very little arithmetic per byte, so they are limited by how fast data can be moved from memory, not by computation. The CPU can compute faster than memory can supply data.
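A rough way to observe this is to time a bulk copy and convert the result into bytes moved per second. The sketch below uses only the standard library; `measure_copy_bandwidth` is a hypothetical helper name, and the buffer size and repeat count are arbitrary assumptions. Treat the result as an estimate: it depends on cache sizes, the allocator, and other activity on the machine, and it does not account for extra traffic such as write-allocate.

```python
import time


def measure_copy_bandwidth(size_bytes=64 * 1024 * 1024, repeats=5):
    """Return the best observed copy bandwidth in GB/s for one buffer size."""
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # bulk copy with no per-element Python work
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    best = max(best, 1e-12)  # guard against a zero reading from the timer
    # A copy reads size_bytes and writes size_bytes, so count both directions.
    return 2 * size_bytes / best / 1e9


if __name__ == "__main__":
    print(f"copy bandwidth: {measure_copy_bandwidth():.1f} GB/s (approximate)")
```

Choosing a buffer much larger than the last-level cache matters here: a small buffer would be served from cache and report a figure well above what main memory can sustain.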

Roofline Intuition

The roofline model classifies kernels as either bandwidth-bound or compute-bound.

Operational intensity (operations per byte) determines which limit applies. Low intensity → bandwidth-bound. High intensity → compute-bound.
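The classification can be sketched as a small calculation: attainable performance is the smaller of peak compute and operational intensity times peak bandwidth. The function names and the machine figures used below (100 GFLOP/s peak, 20 GB/s bandwidth) are hypothetical values chosen for illustration.

```python
def attainable_gflops(intensity, peak_gflops, peak_bandwidth_gb_per_s):
    """Roofline bound: min(compute roof, intensity * bandwidth roof).

    intensity is in operations per byte moved to or from memory.
    """
    return min(peak_gflops, intensity * peak_bandwidth_gb_per_s)


def classify(intensity, peak_gflops, peak_bandwidth_gb_per_s):
    """Label a kernel by which roof limits it."""
    # The ridge point is the intensity at which the two roofs meet.
    ridge = peak_gflops / peak_bandwidth_gb_per_s
    return "bandwidth-bound" if intensity < ridge else "compute-bound"


if __name__ == "__main__":
    # Assumed machine: 100 GFLOP/s peak compute, 20 GB/s memory bandwidth.
    peak, bandwidth = 100.0, 20.0
    # Vector add on doubles: 1 operation per 24 bytes (two loads, one store).
    vector_add = 1 / 24
    print(classify(vector_add, peak, bandwidth),
          attainable_gflops(vector_add, peak, bandwidth))
```

On this assumed machine the ridge point is 5 operations per byte, so a vector add at about 0.04 operations per byte sits far to the left of it and can reach less than 1 GFLOP/s regardless of how fast the cores are.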

Checklist