Bandwidth Experiments
What You'll Learn
- How to measure memory bandwidth
- How bandwidth scales with threads
- Why adding cores stops helping
- Understanding saturation
Experiment: Streaming Copy
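#include <chrono>
#include <cstddef>
#include <cstring>   // std::memcpy
#include <iostream>
#include <thread>
#include <vector>

// steady_clock is monotonic, which makes it the safer choice for timing intervals.
using Clock = std::chrono::steady_clock;
using Duration = std::chrono::nanoseconds;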
double measure_bandwidth(size_t size_bytes, int num_threads = 1) {
    // Allocate and initialize both buffers before timing so that
    // first-touch page faults are not counted as copy time.
    std::vector<char> src(size_bytes, 1);
    std::vector<char> dst(size_bytes, 0);

    auto start = Clock::now();
    if (num_threads <= 1) {
        std::memcpy(dst.data(), src.data(), size_bytes);
    } else {
        // Multi-threaded copy: each thread copies one contiguous chunk.
        // Thread creation and join time is included in the measurement.
        std::vector<std::thread> threads;
        threads.reserve(num_threads);
        size_t chunk_size = size_bytes / num_threads;
        for (int t = 0; t < num_threads; ++t) {
            size_t offset = t * chunk_size;
            // The last thread also takes any remainder bytes.
            size_t len = (t == num_threads - 1) ? (size_bytes - offset) : chunk_size;
            threads.emplace_back([&, offset, len]() {
                std::memcpy(dst.data() + offset, src.data() + offset, len);
            });
        }
        for (auto& t : threads) {
            t.join();
        }
    }
    auto end = Clock::now();

    auto elapsed = std::chrono::duration_cast<Duration>(end - start);
    double seconds = elapsed.count() / 1e9;
    // Bytes copied per second, in decimal GB/s (1 GB = 1e9 bytes).
    double gb_per_sec = (size_bytes / 1e9) / seconds;
    return gb_per_sec;
}
int main() {
    // 1 GiB per buffer; src and dst together need about 2 GiB of RAM.
    size_t size = size_t{1024} * 1024 * 1024;
    std::cout << "Threads\tBandwidth (GB/s)\n";
    for (int threads = 1; threads <= 8; threads *= 2) {
        double bw = measure_bandwidth(size, threads);
        std::cout << threads << "\t" << bw << "\n";
    }
    return 0;
}
What to Measure
- GB/s: Copy bandwidth (bytes copied per second) at each thread count
- Saturation: The thread count beyond which bandwidth no longer increases
Expected Shape of Results
You should see:
- Single thread: Lower bandwidth (one core cannot keep enough requests in flight to fully utilize the memory controller)
- Multiple threads: Higher bandwidth (more requests in flight, so better memory controller utilization)
- Saturation: Bandwidth flattens and adding more threads stops helping (memory controller saturated)
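One way to turn the shape described above into a concrete saturation point is to look for the first step where bandwidth grows by less than some threshold. The following is a minimal sketch, not a definitive method: `Sample`, `find_saturation_point`, and the 10% default threshold are all assumptions introduced here for illustration.

```cpp
#include <cstddef>
#include <vector>

// One row of the experiment's output table.
struct Sample {
    int threads;
    double gb_per_sec;
};

// Returns the thread count of the first sample whose successor improves
// bandwidth by less than `min_gain` (relative gain, 0.10 = 10%).
// If every step still helps, returns the last sample's thread count.
// Returns 0 for empty input. Assumes samples are ordered by thread
// count and that all bandwidths are positive.
int find_saturation_point(const std::vector<Sample>& samples,
                          double min_gain = 0.10) {
    if (samples.empty()) {
        return 0;
    }
    for (std::size_t i = 0; i + 1 < samples.size(); ++i) {
        double gain = (samples[i + 1].gb_per_sec - samples[i].gb_per_sec)
                      / samples[i].gb_per_sec;
        if (gain < min_gain) {
            return samples[i].threads;
        }
    }
    return samples.back().threads;
}
```

Feeding it the (threads, GB/s) pairs printed by the experiment gives the thread count at which the curve flattens. Since single runs are noisy, the threshold probably needs tuning for each machine.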
Interpretation
Saturation: The memory controller has a maximum bandwidth. Once you hit that limit, adding more threads doesn't help; they just compete for the same bandwidth.
Why adding cores stops helping: Beyond the saturation point, you're bandwidth-bound, not compute-bound. More compute power doesn't help if you can't feed it data fast enough.
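The interpretation above can be expressed as a toy model: bandwidth grows with thread count until it reaches the memory system's cap, then stays flat. This is a deliberately idealized sketch. The function names and both parameters are hypothetical, and real curves usually bend before reaching the cap rather than scaling perfectly linearly.

```cpp
#include <algorithm>
#include <cmath>

// Toy model: each thread sustains `per_thread_gbps` on its own, and the
// memory controller caps the total at `peak_gbps`.
double modeled_bandwidth(int threads, double per_thread_gbps, double peak_gbps) {
    return std::min(threads * per_thread_gbps, peak_gbps);
}

// Smallest thread count at which the model reaches the cap. Beyond this
// point the model predicts no gain from additional threads.
int modeled_saturation_threads(double per_thread_gbps, double peak_gbps) {
    return static_cast<int>(std::ceil(peak_gbps / per_thread_gbps));
}
```

With made-up inputs of 10 GB/s per thread and a 25 GB/s cap, the model gives 10, 20, and 25 GB/s for 1, 2, and 4 threads, and predicts saturation at 3 threads.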
Checklist
- ✓ Measured bandwidth vs thread count
- ✓ Identified saturation point
- ✓ Understood why adding cores stops helping