SIMD / Vectorization Overview
What You'll Learn
- What SIMD (Single Instruction, Multiple Data) is
- Why vectorization matters for performance
- Alignment requirements
- Why compilers sometimes refuse to vectorize
Mental Model
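SIMD instructions process multiple data elements in parallel using wide registers. Instead of adding two numbers, a single instruction can add eight pairs of 32-bit floats at once when the registers are 256 bits wide (as with AVX on x86). This can provide speedups of roughly 4-8x for compute-bound, data-parallel operations; memory-bound loops usually gain less because memory bandwidth, not arithmetic, limits them.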
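The "eight pairs" figure follows from dividing the register width by the element width. A minimal sketch of that arithmetic, where the helper name `simd_lanes` is hypothetical and chosen only for illustration:

```c
#include <stddef.h>

/* Hypothetical helper: number of elements (lanes) that fit in one
   SIMD register, given the register width in bits and the element
   size in bytes. */
size_t simd_lanes(size_t register_bits, size_t element_bytes) {
    return register_bits / (element_bytes * 8);
}
```

For example, `simd_lanes(256, sizeof(float))` models a 256-bit register holding 4-byte floats, and `simd_lanes(256, sizeof(double))` models the same register holding 8-byte doubles, which is why double-precision code sees half as many lanes.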
SIMD Concept
Scalar: Process one element at a time
Vector: Process multiple elements at once (SIMD)
Example: Adding two arrays of 8 single-precision floats. Scalar: 8 add instructions. Vector: 1 SIMD add instruction on a 256-bit register (processes all 8 pairs).
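The scalar and vector versions of this example can be sketched as follows. This is an illustration under assumptions, not the only way to write SIMD code: it uses the GCC/Clang `vector_size` extension rather than hardware-specific intrinsics, and the names `add_scalar`, `add_8_floats`, and `f32x8` are hypothetical. Whether the vector addition becomes a single instruction depends on the target; without 256-bit support the compiler may split it into narrower operations.

```c
#include <stddef.h>
#include <string.h>

/* Scalar version: one float addition per loop iteration. */
void add_scalar(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}

/* GCC/Clang vector extension: a pack of 8 floats (32 bytes).
   This is a compiler extension, not standard C. */
typedef float f32x8 __attribute__((vector_size(32)));

/* Vector version: adds exactly 8 pairs with one vector-typed addition.
   memcpy is used for the loads and stores so the pointers do not need
   to be 32-byte aligned. */
void add_8_floats(const float *a, const float *b, float *out) {
    f32x8 va, vb, vsum;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vsum = va + vb;              /* element-wise add of all 8 lanes */
    memcpy(out, &vsum, sizeof vsum);
}
```

Both functions produce the same results for 8-element inputs; the difference is in how many add operations the source expresses.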
Alignment
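SIMD loads and stores come in aligned and unaligned forms. The aligned forms require the address to be a multiple of the vector width (e.g., 16 bytes for SSE, 32 bytes for AVX) and fault on a misaligned address. The unaligned forms accept any address but can be slower, particularly when an access straddles a cache line.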
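One way to obtain and check aligned storage in C11 is sketched below. The names `is_aligned`, `alloc_floats_32`, and `aligned_buf` are hypothetical, and the sketch assumes a C11 library that provides `aligned_alloc` (some platforms do not).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* A buffer of 8 floats forced onto a 32-byte boundary
   using the C11 _Alignas specifier. */
_Alignas(32) float aligned_buf[8];

/* Returns 1 if p is a multiple of alignment, 0 otherwise.
   alignment must be a power of two. */
int is_aligned(const void *p, size_t alignment) {
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Allocates room for at least n floats on a 32-byte boundary.
   C11 aligned_alloc expects the size to be a multiple of the
   alignment, so the byte count is rounded up to the next multiple
   of 32. The caller releases the memory with free(). */
float *alloc_floats_32(size_t n) {
    size_t bytes = n * sizeof(float);
    size_t rounded = (bytes + 31) / 32 * 32;
    return aligned_alloc(32, rounded);
}
```

A pointer into the middle of an aligned buffer, such as `aligned_buf + 1`, is generally no longer 32-byte aligned, which is one reason functions that receive arbitrary pointers cannot assume alignment.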
Why Compilers Refuse to Vectorize
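- Dependencies: Loop-carried dependencies (an iteration needs a result from an earlier iteration) prevent iterations from running in parallel; pointers that might overlap force the compiler to assume such a dependency exists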
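- Alignment: Data not known to be aligned to SIMD boundaries forces unaligned accesses or extra peeling code, which can make vectorization look unprofitable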
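- Unknown trip count: If the iteration count cannot be computed before the loop starts (e.g., the exit depends on the data), the compiler cannot split the work into full-width vector steps plus a scalar remainder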
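- Complex control flow: Branches in loops must be converted to masked or blended operations, which is not always possible or profitable
- Function calls: A call the compiler cannot inline or see into may have unknown side effects, which prevents vectorization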
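Three of these cases can be illustrated with small loops. This is a sketch with hypothetical function names; whether a given compiler actually vectorizes each loop depends on the compiler, its version, and the optimization flags. GCC's `-fopt-info-vec-missed` and Clang's `-Rpass-missed=loop-vectorize` report loops that were not vectorized and why.

```c
#include <stddef.h>

/* Independent iterations: out[i] depends only on a[i] and b[i].
   restrict promises the compiler that the arrays do not overlap,
   which removes the possible-aliasing dependency. This is the kind
   of loop auto-vectorizers handle well. */
void scale_add(const float *restrict a, const float *restrict b,
               float *restrict out, size_t n, float k) {
    for (size_t i = 0; i < n; i++) {
        out[i] = k * a[i] + b[i];
    }
}

/* Loop-carried dependency: iteration i reads the value that
   iteration i-1 just wrote, so the iterations cannot simply be
   executed side by side. */
void prefix_sum(float *x, size_t n) {
    for (size_t i = 1; i < n; i++) {
        x[i] += x[i - 1];
    }
}

/* Data-dependent exit: the number of iterations is not known before
   the loop starts, because it depends on the contents of x.
   Returns the index of the first negative element, or n if none. */
size_t find_first_negative(const float *x, size_t n) {
    size_t i = 0;
    while (i < n && x[i] >= 0.0f) {
        i++;
    }
    return i;
}
```

Comparing the compiler's vectorization report for `scale_add` with the reports for the other two functions is a quick way to see these limits on a particular toolchain.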
Checklist
- ✓ Understand SIMD concept
- ✓ Know why vectorization matters
- ✓ Understand alignment requirements
- ✓ Know why compilers sometimes can't vectorize