Monday, December 29, 2025

What is projection?

“Projection” is one of those words that sounds abstract but is actually very concrete in math and hardware.

I’ll explain it in layers and keep it grounded.


1. Projection in plain language

A projection means:

Taking a high-dimensional thing and expressing it using fewer, chosen directions.

Think:

  • You have a 3D object

  • You shine a light

  • The shadow on the wall is a projection

You didn’t keep all the information — only what lies along certain directions.


2. Mathematical definition (no ML yet)

Given:

  • A vector x \in \mathbb{R}^d

  • A direction vector w \in \mathbb{R}^d

The (scalar) projection of x onto a unit-length direction w is:

\text{proj}_w(x) = x \cdot w

That single number tells you:

“How much of x points in the direction w?”
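
For a quick concrete check: take x = (3, 4) and the unit direction w = (1, 0). Then

x \cdot w = 3 \cdot 1 + 4 \cdot 0 = 3

so x extends 3 units along w (its horizontal component).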

If you use multiple directions w_1, w_2, \dots, w_s:

\text{Projection}(x) = \begin{bmatrix} x \cdot w_1 \\ x \cdot w_2 \\ \vdots \\ x \cdot w_s \end{bmatrix}

This is exactly a matrix multiply:

y = x W \quad \text{where } W \in \mathbb{R}^{d \times s}
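
Here is a minimal NumPy sketch of that idea (the sizes d, s and the random W are purely illustrative):

```python
import numpy as np

d, s = 8, 3                        # original and projected dimensions (illustrative)
rng = np.random.default_rng(0)

x = rng.normal(size=d)             # a vector in R^d
W = rng.normal(size=(d, s))        # s chosen directions, stored as columns of W

# The projection is one matrix multiply...
y = x @ W                          # shape (s,)

# ...and each entry really is just a dot product with one direction
y_explicit = np.array([x @ W[:, j] for j in range(s)])
assert np.allclose(y, y_explicit)
print(y)
```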

3. Projection in neural networks

In Transformers:

  • Input vector: token embedding x \in \mathbb{R}^d

  • Projection matrix: W \in \mathbb{R}^{d \times s}

Compute:

y = x W

This is called:

  • Linear projection

  • Feature projection

  • Embedding projection

Nothing mystical — just dot products.
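
In PyTorch terms (a minimal sketch, assuming torch is installed; the sizes are illustrative), this is exactly what a bias-free nn.Linear layer computes:

```python
import torch
import torch.nn as nn

d, s = 1024, 64                        # illustrative embedding and projection sizes
proj = nn.Linear(d, s, bias=False)     # learned projection; weight has shape (s, d)

x = torch.randn(2, d)                  # a batch of 2 token embeddings
y = proj(x)                            # shape (2, s)

# Same result as an explicit matrix multiply with the stored weights
assert torch.allclose(y, x @ proj.weight.T)
print(y.shape)                         # torch.Size([2, 64])
```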


4. Projection in self-attention

Queries, Keys, Values are projections

Given token embedding x:

Q = x W_Q \qquad K = x W_K \qquad V = x W_V

Each W:

  • Selects different directions

  • Emphasizes different aspects of the same token

So:

  • Q = “What am I looking for?”

  • K = “What do I contain?”

  • V = “What information do I pass along?”

All three are just projections of x.
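
A small NumPy sketch (random weights standing in for learned ones; sizes are illustrative):

```python
import numpy as np

N, d, d_k = 4, 16, 8                 # 4 tokens, model dim 16, head dim 8 (illustrative)
rng = np.random.default_rng(1)

X = rng.normal(size=(N, d))          # token embeddings, one row per token

W_Q = rng.normal(size=(d, d_k))      # three independent projection matrices
W_K = rng.normal(size=(d, d_k))      # (random here, learned in a real model)
W_V = rng.normal(size=(d, d_k))

Q = X @ W_Q                          # "what am I looking for?"
K = X @ W_K                          # "what do I contain?"
V = X @ W_V                          # "what do I pass along?"

# Scaled dot-product attention built on these three projections
scores = Q @ K.T / np.sqrt(d_k)                      # (N, N)
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)            # softmax over keys
out = weights @ V                                    # (N, d_k)
print(out.shape)
```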


5. Projection in Primal-Attention (your context)

Instead of building a huge matrix QK^T:

  • Choose a small set of directions (rank s)

  • Project Q and K into that space:

Q' = Q W_q \qquad K' = K W_k

Now interactions happen in a low-dimensional space.

This is why:

  • Computation drops from O(N²) → O(N·s)

  • Memory drops from quadratic → linear
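
A rough sketch of why the regrouping pays off (this drops the softmax, which real low-rank/kernel methods such as Primal-Attention handle with feature maps; the projections and the rank s are illustrative):

```python
import numpy as np

N, d_k, s = 1000, 64, 16              # 1000 tokens, head dim 64, rank 16 (illustrative)
rng = np.random.default_rng(2)

Q = rng.normal(size=(N, d_k))
K = rng.normal(size=(N, d_k))
V = rng.normal(size=(N, d_k))

W_q = rng.normal(size=(d_k, s))       # hypothetical learned low-rank projections
W_k = rng.normal(size=(d_k, s))

Qp = Q @ W_q                          # (N, s)
Kp = K @ W_k                          # (N, s)

# Quadratic route: materialize the full (N, N) interaction matrix
out_quadratic = (Qp @ Kp.T) @ V       # O(N^2) time and memory

# Linear route: regroup the same product so the (N, N) matrix never exists
out_linear = Qp @ (Kp.T @ V)          # O(N * s) time, small (s, d_k) intermediate

assert np.allclose(out_quadratic, out_linear)
print(out_linear.shape)
```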


6. Hardware interpretation (important)

On hardware, projection = GEMM:

Input vectors ──► Matrix multiply ──► Smaller vectors

Key properties:

  • ✔ Fully parallel

  • ✔ Cache-friendly

  • ✔ No control flow

  • ✔ Streamable

This is why projections are hardware gold.
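
A small sketch of the "streamable" point: each output row depends only on its own input row, so the same GEMM can be applied chunk by chunk (chunk size and dimensions here are arbitrary):

```python
import numpy as np

N, d, s = 4096, 1024, 64
rng = np.random.default_rng(3)

X = rng.normal(size=(N, d))          # a long stream of token vectors
W = rng.normal(size=(d, s))          # one projection matrix, reused for every token

Y_full = X @ W                       # one big GEMM

# The same projection applied to the stream in independent chunks
chunk = 512
Y_stream = np.vstack([X[i:i + chunk] @ W for i in range(0, N, chunk)])

assert np.allclose(Y_full, Y_stream)
print(Y_stream.shape)                # (4096, 64)
```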


7. Intuition with a real-world analogy

Imagine:

  • Each token is a long list of properties (d = 1024)

  • You only care about:

    • sentiment

    • tense

    • entity type

    • position

    • topic

Each of those is a direction.

Projection asks:

“How much does this token express each of these?”

That’s it.


8. One-sentence takeaway

A projection is a learned change of coordinates that keeps only the directions you care about.

In math: dot products
In ML: linear layers
In hardware: matrix multiplies


Related questions worth exploring next:

  • Projection vs embedding

  • Projection vs PCA

  • Why projections are learned instead of fixed

  • How projection ≈ basis change
