“Know how to solve every problem that has been solved.” “What I cannot create, I do not understand.” — Richard Feynman

Lock Contention

What You'll Learn

Mutexes vs spinlocks vs atomic operations
How contention affects throughput
Why heavy contention makes throughput collapse
Measuring synchronization overhead

Experiment: Mutex vs Spin vs Atomic

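```cpp
#include <atomic>
#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

// steady_clock is monotonic, so it is the appropriate clock for timing intervals.
using Clock = std::chrono::steady_clock;
using Duration = std::chrono::nanoseconds;

constexpr int iterations = 1000000;  // increments performed by each thread

// Average wall-clock nanoseconds per increment across all threads.
// Thread creation and joining are included in the measured interval.
double ns_per_op(Clock::time_point start, Clock::time_point end, int num_threads) {
    auto elapsed = std::chrono::duration_cast<Duration>(end - start);
    return static_cast<double>(elapsed.count()) /
           (static_cast<double>(num_threads) * iterations);
}

// Mutex: every increment locks and unlocks a std::mutex.
double benchmark_mutex(int num_threads) {
    std::mutex mtx;
    long long counter = 0;

    auto start = Clock::now();
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&]() {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<std::mutex> lock(mtx);
                ++counter;
            }
        });
    }
    for (auto& th : threads) th.join();
    return ns_per_op(start, Clock::now(), num_threads);
}

// Spin: a test-and-set spinlock; waiting threads busy-loop instead of sleeping.
double benchmark_spin(int num_threads) {
    std::atomic<bool> locked{false};
    long long counter = 0;

    auto start = Clock::now();
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&]() {
            for (int i = 0; i < iterations; ++i) {
                while (locked.exchange(true, std::memory_order_acquire)) {
                    // spin until the current holder stores false
                }
                ++counter;
                locked.store(false, std::memory_order_release);
            }
        });
    }
    for (auto& th : threads) th.join();
    return ns_per_op(start, Clock::now(), num_threads);
}

// Atomic: every increment is a single atomic read-modify-write (fetch_add).
double benchmark_atomic(int num_threads) {
    std::atomic<long long> counter{0};

    auto start = Clock::now();
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&]() {
            for (int i = 0; i < iterations; ++i) {
                counter.fetch_add(1);
            }
        });
    }
    for (auto& th : threads) th.join();
    return ns_per_op(start, Clock::now(), num_threads);
}

int main() {
    std::cout << "Threads\tMutex (ns)\tSpin (ns)\tAtomic (ns)\n";
    for (int threads = 1; threads <= 8; threads *= 2) {
        double mutex_time = benchmark_mutex(threads);
        double spin_time = benchmark_spin(threads);
        double atomic_time = benchmark_atomic(threads);
        std::cout << threads << "\t" << mutex_time << "\t\t" << spin_time << "\t\t"
                  << atomic_time << "\n";
    }
    return 0;
}
```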

What to Measure

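Throughput: operations per second vs thread count (the benchmark prints average nanoseconds per increment; aggregate throughput is 10^9 divided by that figure)
Tail latency: P95/P99 latency of individual operations (optional; the benchmark above reports only averages)
Contention collapse: the thread count at which aggregate throughput stops increasing or starts to fall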

Expected Shape of Results

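You should see roughly the following (exact numbers depend on the CPU, core count, and OS):

Single thread: mutex, spinlock, and atomic are within a small factor of each other, because an uncontended lock costs roughly one atomic operation to acquire and one to release
Multiple threads: atomic is fastest; a contended mutex may put waiting threads to sleep in the kernel (futex on Linux), which adds system calls and context switches
High contention: all three degrade, because every operation must pull the same cache line to the core performing it; the mutex usually degrades more than the atomic, and a spinlock can degrade worst of all once threads outnumber cores, since spinning waiters burn CPU time that a preempted lock holder needs to finish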

Interpretation

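Contention collapse: When many threads compete for the same lock, only one can hold it at a time, so the critical section runs serially and most threads spend their time waiting. Adding threads then adds no useful parallelism, only overhead: the lock's cache line bounces between cores, and a contended mutex may put waiters to sleep and wake them through the kernel. Throughput therefore does not just plateau; it can fall as threads are added, because threads are blocked or handing the lock off instead of computing.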

Checklist

✓ Measured mutex vs atomic performance
✓ Observed contention effects
✓ Understood why contention causes collapse