SIMD / Vectorization Overview

What You'll Learn

Mental Model

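SIMD (single instruction, multiple data) instructions process multiple data elements in parallel using wide registers. Instead of adding one pair of numbers, a single instruction can add eight pairs of 32-bit floats at once in a 256-bit AVX register. For data-parallel operations this can provide speedups of up to 4-8x, though the actual gain depends on element width, memory bandwidth, and how much of the loop vectorizes.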

SIMD Concept

Scalar: Process one element at a time
Vector: Process multiple elements at once (SIMD)

Example: Adding two arrays of 8 floats. Scalar: 8 add instructions. Vector: 1 SIMD add instruction (processes all 8 pairs, assuming 256-bit registers).
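
The array-add example can be sketched in C as follows. This is a minimal illustration, not a tuned implementation: the function names add_scalar and add_simd are hypothetical, and the AVX path is only compiled when the compiler defines __AVX__ (for example with -mavx on GCC or Clang). Without it, add_simd falls back to the same scalar loop, so the code runs on any target.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Scalar: one addition per loop iteration. */
static void add_scalar(const float *a, const float *b, float *out, size_t n) {
    assert(n == 0 || (a && b && out));
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}

/* Vector: eight additions per iteration when AVX is enabled at compile time.
   Unaligned loads/stores are used so the pointers need no special alignment. */
static void add_simd(const float *a, const float *b, float *out, size_t n) {
    assert(n == 0 || (a && b && out));
    size_t i = 0;
#if defined(__AVX__)
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);   /* load 8 floats from a */
        __m256 vb = _mm256_loadu_ps(b + i);   /* load 8 floats from b */
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
#endif
    /* Scalar tail: handles leftover elements, or everything without AVX. */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

For arrays of exactly 8 floats, the AVX loop body runs once and the tail loop does nothing. For other lengths, the tail loop picks up the remaining 0 to 7 elements.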

Alignment

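Some SIMD load and store instructions require data to be aligned to the vector width (e.g., 16 bytes for SSE, 32 bytes for AVX). The aligned-only instructions fault on a misaligned address. The unaligned variants accept any address but can be slower, especially when an access straddles a cache line.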
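
One way to obtain and check aligned storage is sketched below, assuming a C11 standard library that provides aligned_alloc. The helper names is_aligned and alloc_floats_32 are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if p sits on an `alignment`-byte boundary.
   `alignment` must be a nonzero power of two. */
static int is_aligned(const void *p, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Allocates room for n floats on a 32-byte boundary (suitable for AVX).
   Returns NULL on failure; release the memory with free(). */
static float *alloc_floats_32(size_t n) {
    size_t bytes = n * sizeof(float);
    /* C11 expects the size to be a multiple of the alignment, so round up. */
    size_t rounded = (bytes + 31) / 32 * 32;
    if (rounded == 0) {
        rounded = 32;
    }
    return aligned_alloc(32, rounded);
}
```

A pointer returned by alloc_floats_32 is safe to pass to aligned-only loads and stores. A pointer offset by a single float (4 bytes) from it is not, which is the situation where the unaligned variants are needed.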

Why Compilers Refuse to Vectorize

Checklist