Monday, December 29, 2025

What is projection?

“Projection” is one of those words that sounds abstract but is actually very concrete in math and hardware.

I’ll explain it in layers and keep it grounded.


1. Projection in plain language

A projection means:

Taking a high-dimensional thing and expressing it using fewer, chosen directions.

Think:

  • You have a 3D object

  • You shine a light

  • The shadow on the wall is a projection

You didn’t keep all the information — only what lies along certain directions.


2. Mathematical definition (no ML yet)

Given:

  • A vector x \in \mathbb{R}^d

  • A direction vector w \in \mathbb{R}^d

The (scalar) projection of x onto a unit-length direction w is:

\text{proj}_w(x) = x \cdot w

That single number tells you:

“How much of x points in the direction w?”
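
For a quick concrete check: take x = (3, 4) and the unit direction w = (1, 0). Then

x \cdot w = 3 \cdot 1 + 4 \cdot 0 = 3

so x extends 3 units along w (its horizontal component).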

If you use multiple directions w_1, w_2, \dots, w_s:

\text{Projection}(x) = \begin{bmatrix} x \cdot w_1 \\ x \cdot w_2 \\ \vdots \\ x \cdot w_s \end{bmatrix}

This is exactly a matrix multiply:

y = x W \quad \text{where } W \in \mathbb{R}^{d \times s}
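
Here is a minimal NumPy sketch of that idea (the sizes d, s and the random W are purely illustrative):

```python
import numpy as np

d, s = 8, 3                        # original and projected dimensions (illustrative)
rng = np.random.default_rng(0)

x = rng.normal(size=d)             # a vector in R^d
W = rng.normal(size=(d, s))        # s chosen directions, stored as columns of W

# The projection is one matrix multiply...
y = x @ W                          # shape (s,)

# ...and each entry really is just a dot product with one direction
y_explicit = np.array([x @ W[:, j] for j in range(s)])
assert np.allclose(y, y_explicit)
print(y)
```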

3. Projection in neural networks

In Transformers:

  • Input vector: token embedding x \in \mathbb{R}^d

  • Projection matrix: W \in \mathbb{R}^{d \times s}

Compute:

y = x W

This is called:

  • Linear projection

  • Feature projection

  • Embedding projection

Nothing mystical — just dot products.
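
In PyTorch terms (a minimal sketch, assuming torch is installed; the sizes are illustrative), this is exactly what a bias-free nn.Linear layer computes:

```python
import torch
import torch.nn as nn

d, s = 1024, 64                        # illustrative embedding and projection sizes
proj = nn.Linear(d, s, bias=False)     # learned projection; weight has shape (s, d)

x = torch.randn(2, d)                  # a batch of 2 token embeddings
y = proj(x)                            # shape (2, s)

# Same result as an explicit matrix multiply with the stored weights
assert torch.allclose(y, x @ proj.weight.T)
print(y.shape)                         # torch.Size([2, 64])
```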


4. Projection in self-attention

Queries, Keys, Values are projections

Given token embedding x:

Q = x W_Q \qquad K = x W_K \qquad V = x W_V

Each W:

  • Selects different directions

  • Emphasizes different aspects of the same token

So:

  • Q = “What am I looking for?”

  • K = “What do I contain?”

  • V = “What information do I pass along?”

All three are just projections of x.
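
A small NumPy sketch (random weights standing in for learned ones; sizes are illustrative):

```python
import numpy as np

N, d, d_k = 4, 16, 8                 # 4 tokens, model dim 16, head dim 8 (illustrative)
rng = np.random.default_rng(1)

X = rng.normal(size=(N, d))          # token embeddings, one row per token

W_Q = rng.normal(size=(d, d_k))      # three independent projection matrices
W_K = rng.normal(size=(d, d_k))      # (random here, learned in a real model)
W_V = rng.normal(size=(d, d_k))

Q = X @ W_Q                          # "what am I looking for?"
K = X @ W_K                          # "what do I contain?"
V = X @ W_V                          # "what do I pass along?"

# Scaled dot-product attention built on these three projections
scores = Q @ K.T / np.sqrt(d_k)                      # (N, N)
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)            # softmax over keys
out = weights @ V                                    # (N, d_k)
print(out.shape)
```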


5. Projection in Primal-Attention (your context)

Instead of building a huge matrix QK^T:

  • Choose a small set of directions (rank s)

  • Project Q and K into that space:

Q' = Q W_q \qquad K' = K W_k

Now interactions happen in a low-dimensional space.

This is why:

  • Computation drops from O(N²) → O(N·s)

  • Memory drops from quadratic → linear
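
A rough sketch of why the regrouping pays off (this drops the softmax, which real low-rank/kernel methods such as Primal-Attention handle with feature maps; the projections and the rank s are illustrative):

```python
import numpy as np

N, d_k, s = 1000, 64, 16              # 1000 tokens, head dim 64, rank 16 (illustrative)
rng = np.random.default_rng(2)

Q = rng.normal(size=(N, d_k))
K = rng.normal(size=(N, d_k))
V = rng.normal(size=(N, d_k))

W_q = rng.normal(size=(d_k, s))       # hypothetical learned low-rank projections
W_k = rng.normal(size=(d_k, s))

Qp = Q @ W_q                          # (N, s)
Kp = K @ W_k                          # (N, s)

# Quadratic route: materialize the full (N, N) interaction matrix
out_quadratic = (Qp @ Kp.T) @ V       # O(N^2) time and memory

# Linear route: regroup the same product so the (N, N) matrix never exists
out_linear = Qp @ (Kp.T @ V)          # O(N * s) time, small (s, d_k) intermediate

assert np.allclose(out_quadratic, out_linear)
print(out_linear.shape)
```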


6. Hardware interpretation (important)

On hardware, projection = GEMM:

Input vectors ──► Matrix multiply ──► Smaller vectors

Key properties:

  • ✔ Fully parallel

  • ✔ Cache-friendly

  • ✔ No control flow

  • ✔ Streamable

This is why projections are hardware gold.
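
A small sketch of the "streamable" point: each output row depends only on its own input row, so the same GEMM can be applied chunk by chunk (chunk size and dimensions here are arbitrary):

```python
import numpy as np

N, d, s = 4096, 1024, 64
rng = np.random.default_rng(3)

X = rng.normal(size=(N, d))          # a long stream of token vectors
W = rng.normal(size=(d, s))          # one projection matrix, reused for every token

Y_full = X @ W                       # one big GEMM

# The same projection applied to the stream in independent chunks
chunk = 512
Y_stream = np.vstack([X[i:i + chunk] @ W for i in range(0, N, chunk)])

assert np.allclose(Y_full, Y_stream)
print(Y_stream.shape)                # (4096, 64)
```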


7. Intuition with a real-world analogy

Imagine:

  • Each token is a long list of properties (d = 1024)

  • You only care about:

    • sentiment

    • tense

    • entity type

    • position

    • topic

Each of those is a direction.

Projection asks:

“How much does this token express each of these?”

That’s it.


8. One-sentence takeaway

A projection is a learned change of coordinates that keeps only the directions you care about.

In math: dot products
In ML: linear layers
In hardware: matrix multiplies


Related questions worth exploring next:

  • Projection vs embedding

  • Projection vs PCA

  • Why projections are learned instead of fixed

  • How projection ≈ basis change
