Householder QR Factorization

Linear Algebra

A complete reference for the Householder QR factorization algorithm (Golub & Van Loan, Matrix Computations, Algorithm 5.2.1). Five parts covering the same algorithm from different angles — definitions, invariants, a question-and-answer walkthrough, a traditional narrative, and a concept map. They are not meant to be read in sequence; they are meant to be used as a reference you return to.

How to use this document

Part 2 (Definitions) and Part 3 (Invariants) are the core: every claim in the document is grounded there. Part 4 (Dialogue) is the recommended first read for a new reader — each question is an independent entry point. Part 5 (Narrative) is the synthesis, with a worked example and code. Part 6 (Concept Map) shows the dependency graph of ideas — which concepts are load-bearing, and which connections are non-obvious.


Part 1 — Context

The Householder QR factorization is Algorithm 5.2.1 in Golub and Van Loan, Matrix Computations (4th ed.). It is the standard method for computing the QR factorization of a dense matrix. The storage scheme it uses — packing Householder vectors into the zeros they create — is the format returned by the LAPACK routine DGEQRF. Anyone reading that output without knowing the storage scheme will be confused by the lower triangle.


Part 2 — Definitions

Every term used in Parts 3–5 is defined here. No term in the invariants requires knowledge from outside this section.

Matrix.
A rectangular array of numbers with rows and columns. Written , size . Entry in row , column is written .
Vector.
A column of numbers. Written . The -th entry is . A vector of length is an matrix.
Length of a vector.
. Also called the Euclidean norm.
Dot product.
. Always a single number.
Outer product.
is an matrix whose entry is .
Identity matrix.
is the square matrix with 1s on the diagonal and 0s elsewhere. for any vector .
Transpose.
has rows and columns of swapped: .
Orthogonal matrix.
A square matrix satisfying , equivalently . Applying to any vector preserves its length: for all .
Upper triangular matrix.
A matrix where whenever . Everything below the main diagonal is zero.
.
The vector . Points along the first coordinate axis.
.
Returns if , returns if .
Submatrix .
The block of formed by rows through and columns through .
Rank-1 update.
An operation of the form , where is a vector and is a scalar. It modifies every column of by subtracting a scalar multiple of . Cost: .

Part 3 — Glossary of Invariants

Every statement below is necessarily true. Each uses only terms from Part 2 or from earlier invariants. Statements are ordered by dependency — no invariant uses a term introduced later.

The object

  1. Every matrix with can be written exactly as , where is an orthogonal matrix and is an upper triangular matrix. This is not an approximation.
  2. being upper triangular means every entry below the main diagonal is zero. The diagonal and above are generally nonzero.
  3. being orthogonal means : applying changes the direction of vectors without changing their length.
  4. Because , the operation — reorienting a vector — costs and destroys no information.
  5. The factorization always exists for any matrix with .
  6. The factorization is not unique: negating the -th column of and the -th row of simultaneously gives a different but equally valid factorization. The signs of the diagonal of are free.

Why it is useful

  1. The system is equivalent to , obtained by substituting and multiplying both sides by .
  2. Because is upper triangular, is solved from the bottom row upward, one unknown at a time. This is back-substitution; it costs operations.
  3. The least squares problem — find minimizing — has the solution obtained by: computing , taking its first entries, then solving by back-substitution. No other steps are required.
  4. Applying to does not change , because is orthogonal. It reorients without distorting it.

The key constraint

  1. Any transformation applied to during the factorization must be orthogonal — it must preserve lengths. Setting entries to zero directly changes the matrix itself, not just its coordinate representation, and is not permitted.
  2. Zeroing all entries below position in a column is equivalent to finding an orthogonal transformation such that . The first entry must be because preserves length and the right-hand side must have the same norm as .

The Householder reflector

  1. For any nonzero vector , the scalar satisfies .
  2. The matrix , where , is called a Householder reflector.
  3. is orthogonal: . Applying to any vector preserves its length.
  4. is symmetric: .
  5. is its own inverse: . Applying twice returns any vector exactly to where it started.
  6. reflects space across the hyperplane — the set of all vectors perpendicular to . The component of any vector along is negated; components perpendicular to are unchanged.
  7. is never stored or formed as an explicit matrix. Applying to a vector uses: . This requires one dot product, one scaling, and one vector subtraction. Cost: .
  8. Applying to an matrix uses: , where is a row vector of numbers. Cost: .

Constructing to zero a column

  1. Given a vector of length , define . The reflector with then satisfies : the first entry is and all other entries are zero.
  2. The vector from invariant 21 points from the target to the current position . The hyperplane perpendicular to bisects and its target. Reflecting across this bisector maps it exactly onto the target.
  3. The zeros in are not set directly. They appear because has been rotated to point along . A vector pointing along has zero in every position except the first by definition.
  4. The sign in invariant 21 is chosen so that is always large in magnitude. The alternative sign gives , which is nearly zero when . Near-zero causes catastrophic cancellation when is computed, amplifying floating point errors by factors of millions. Example: , . Wrong sign: (dangerous). Correct sign: (safe).

The algorithm

  1. The algorithm performs steps. At step (for ), it operates on the submatrix .
  2. At step : extract . Construct from using invariant 21. The reflector maps to .
  3. Apply to the entire submatrix — not just column . Applying it to one column but not the others would leave the matrix inconsistent.
  4. acts only on rows through . It therefore cannot disturb columns through , which are already in their final form. This is what makes the column-by-column approach correct, not merely plausible.
  5. After applying : entry becomes . Entries become exactly zero.
  6. After all steps: the upper triangle of the working matrix (including diagonal) contains . The product is the orthogonal factor. is never computed explicitly.
  7. The total cost of the factorization is floating point operations. For a square matrix this is — twice the cost of LU factorization (). QR is preferred not for square systems but for least squares and rank-deficient problems, where its orthogonality guarantees are essential.

The storage scheme

  1. After step , entries are zero. These slots held no information about the original and will not be needed going forward. They are free to reuse.
  2. By normalizing so that (dividing by its first component after construction), the first entry is always exactly 1. Only — a total of entries — need to be stored.
  3. The number of free slots created at step (from invariant 32) is . The number of entries of that need storing (from invariant 33) is also . These counts are equal. This exact coincidence is what enables the storage trick.
  4. The free slots in the lower triangle of column store through . The diagonal entry stores . The upper triangle stores .
  5. One scalar is stored per step in a separate length- vector . It cannot be recovered from the stored entries alone without recomputation.
  6. The output is: (a) the overwritten array, upper triangle = , strictly lower triangle = essential parts of ; and (b) the vector of length . Together these are a complete lossless encoding of both and .
  7. The lower triangle of the output is not part of , not zero, and not padding. It is the compressed representation of . This is the LAPACK DGEQRF format.

Recovering and applying

  1. To apply to a vector : for (forward), reconstruct by prepending the implied 1 to the stored subdiagonal of column , then compute .
  2. To apply to a vector : run the same loop in reverse, . This is correct because , since each is symmetric by invariant 16.
  3. Forming as an explicit matrix costs . Applying or to a single vector using the stored representation costs . Explicit is almost never needed.

What the algorithm is not doing

  1. It is not setting entries to zero directly. Zeros appear because columns are rotated onto coordinate axes.
  2. It is not building first and then computing . It applies each reflector directly to the working matrix as it goes.
  3. It is not storing Householder vectors in a separate data structure. They live in the zeros they created.
  4. The lower triangle of the output is not a mistake. It is the compressed recipe for .

Part 4 — Dialogue

Each question is an independent entry point. Answers are self-contained.

What are we trying to do?

We have a matrix . We want to write it as , where is upper triangular and is orthogonal.

Why upper triangular?

Because triangular systems are trivial to solve. If , then becomes — and that is just back-substitution.

Why orthogonal ?

Because orthogonal matrices don't distort lengths. Applying to both sides of an equation changes coordinates without changing geometry. It is the safest possible transformation.

How do you make a matrix triangular?

Column by column. Take column 1. Make everything below the diagonal zero. Then column 2, and so on.

But you said we can't distort lengths. Doesn't zeroing entries distort things?

Yes — setting entries to zero directly changes the matrix, not just its representation. The trick is to rotate the column so it naturally has zeros below the diagonal.

What does it mean to rotate a column so it has zeros below the diagonal?

It means finding an orthogonal such that:

The result must be in the first position because preserves length.

So we're not zeroing entries. We're rotating the whole column onto an axis.

Precisely. The zeros appear as a consequence of alignment, not by being set directly.

What kind of transformation achieves this?

A Householder reflector: , .

Why a reflection and not a rotation?

Both work. Rotations (Givens) zero one entry at a time. Reflections (Householder) zero an entire subvector at once. For a dense column, one reflection beats rotations.

How does work geometrically?

It flips space across the hyperplane perpendicular to . The component of any vector along is negated; everything perpendicular to is unchanged.

How do you choose ?

This vector points from the target () to the current position (). The hyperplane perpendicular to bisects and its target. Reflecting across this bisector maps exactly onto the target.

Why that particular sign?

Numerical stability. The wrong sign gives , nearly zero when already points near . Near-zero means is near zero, and explodes. The correct sign makes , always large.

You apply to one column. What about the rest of the matrix?

You apply to all remaining columns simultaneously: .

Why?

Because you're changing the coordinate system for the whole matrix. Transforming one column but not the others would leave them inconsistent.

Is that expensive?

No. is never formed explicitly. The rank-1 update costs per step. Total: .

After applying , where do the zeros go?

The entries below the diagonal of column are now exactly zero — created by the reflection.

So those memory slots are free?

Exactly. And we need to store for later use in applying . So we store it there.

But has one more entry than the number of zeros.

The first entry of is always 1 by convention. It is never stored. The remaining entries fit exactly into the zeros.

What is stored separately?

Just — one scalar per column, a vector of length .

What is in the end?

We never formed any . So how do we use ?

for j = 1 to n:
    reconstruct v_j: prepend 1 to stored subdiagonal of column j
    b[j:] ← b[j:] − τⱼ · vⱼ · (vⱼᵀ b[j:])

That applies . Reverse the loop for .

Why does reversing give and not ?

Because each is symmetric: . So . The forward loop applies them left to right (); the reverse loop applies them right to left ().


Part 5 — Traditional Narrative

What QR factorization is

Given an matrix with , QR factorization produces and such that . is orthogonal (); is upper triangular. The factorization always exists and is unique up to sign choices on the diagonal of .

The system becomes , solved by back-substitution. The least squares minimizer of satisfies the same triangular system. In both cases, is applied without forming explicitly.

The Householder reflector

For any nonzero vector , the matrix

is symmetric and orthogonal. It reflects space across the hyperplane . It is its own inverse: .

The key property: given any vector , choose

Then : first entry , all other entries zero.

The sign convention ensures is large, avoiding catastrophic cancellation. The wrong sign gives when nearly points along , causing to explode.

is never formed explicitly. Applied to matrix :

Cost: rather than .

The algorithm

At step , extract . Construct and . Apply:

After this update: , entries . Earlier columns are untouched because acts only on rows through .

After all steps: has been overwritten with , and .

The compact storage scheme

After step , the entries are zero and free. Normalizing so , only needs storing — exactly entries, matching the free slots. The diagonal stores .

The output contains two things simultaneously: upper triangle = , strictly lower triangle = essential Householder vectors. A separate length- vector completes the encoding. This is the LAPACK DGEQRF format.

Applying without forming it

To apply to :

To apply : same loop, reversed. Cost: . Forming explicitly costs and is almost never needed.

Complete numeric example

Step 1. , .

Apply to all three columns via .

: compute row by row.

So .

After update:

where and are packed into the lower triangle, and .

Step 2. Subvector: , .

Since , :

Apply to the subblock. After this step: , . packed into position .

Step 3. Single entry: is already in upper triangular position. is read directly. No reflection needed.

Final packed array:

Upper triangle: the answer (). Lower triangle: the recipe for . These two objects share one array.

Implementation

import numpy as np

def householder_qr(A):
    A = A.astype(float).copy()
    m, n = A.shape
    tau = np.zeros(n)
    for j in range(n):
        x = A[j:, j]
        v = x.copy()
        # sign convention: avoid cancellation when x[0] ≈ ||x||
        v[0] += (1.0 if v[0] >= 0 else -1.0) * np.linalg.norm(x)
        v /= v[0]                                      # normalize so v[0] = 1
        tau[j] = 2.0 / (v @ v)
        A[j:, j:] -= tau[j] * np.outer(v, v @ A[j:, j:])  # rank-1 update
        A[j+1:, j] = v[1:]                            # pack v into lower triangle
    return A, tau                                      # upper = R, lower = v's, tau separate

Line 7 is the sign convention fix (np.sign(0) = 0 in NumPy; this version defaults to when ). Line 10 is the entire reflector application. Lines 11–12 are the storage trick.

To apply to a vector given the packed output:

def apply_Qt(QR, tau, b):
    b = b.copy().astype(float)
    m, n = QR.shape
    for j in range(n):
        v = np.concatenate([[1.0], QR[j+1:, j]])      # reconstruct v with implied 1
        b[j:] -= tau[j] * v * (v @ b[j:])
    return b

Part 6 — Concept Map

A diagram of the dependency graph of ideas across three clusters — geometry, mechanism, and storage & use — with labelled cross-cluster connections (sign rule → numerical stability; rank-1 update → packed array; column-by-column → zeros are free; the exact coincidence zero count = v entry count). SVG is pending.


Known remaining issues

The following items are identified for completion:

  1. Geometric diagram. A 2D SVG illustration showing: vector as an arrow from origin; target on the horizontal axis; as the vector from target to ; the hyperplane as the perpendicular bisector; landing on the axis. Should make visually obvious that zeros appear from alignment, not force.
  2. Numeric example step 2. The subblock update after constructing should be carried through to produce exact values of , , , and . Currently only is stated.
  3. Concept map SVG. The SVG described in Part 6 needs to be built and embedded.
  4. Invariant ordering audit. Confirm that no invariant uses a term introduced in a later invariant. Current known issue: check that the ordering of 13–24 is strict (definitions before use).