“Projection” is one of those words that sounds abstract but is actually very concrete in math and hardware. I’ll explain it in layers and keep it grounded.
1. Projection in plain language
A projection means:
Taking a high-dimensional thing and expressing it using fewer, chosen directions.
Think:
- You have a 3D object
- You shine a light
- The shadow on the wall is a projection
You didn’t keep all the information — only what lies along certain directions.
2. Mathematical definition (no ML yet)
Given:
- A vector x ∈ ℝ^d
- A direction vector w ∈ ℝ^d

The projection of x onto w is the scalar:

proj_w(x) = wᵀx

That single number tells you:
“How much of x points in the direction w?”

If you use multiple directions w_1, …, w_k, stack them as the rows of a matrix W ∈ ℝ^{k×d}. Then

z = Wx, where z_i = w_iᵀx

This is exactly a matrix multiply.
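A minimal NumPy sketch of both views (the sizes here are arbitrary, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

d = 4                     # dimension of the space (illustrative)
x = rng.normal(size=d)    # the vector being projected
w = rng.normal(size=d)    # one chosen direction

coeff = w @ x             # scalar: "how much of x points along w?"

# Several directions stacked as the rows of W (k x d): k dot products at once.
k = 2
W = rng.normal(size=(k, d))
z = W @ x                 # the same idea, as one matrix-vector multiply

assert z.shape == (k,)
assert np.isclose(z[0], W[0] @ x)  # each entry of z is a single dot product
```

Projecting onto many directions at once is nothing more than stacking the dot products.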
3. Projection in neural networks
In Transformers:
- Input vector: a token embedding x ∈ ℝ^{d_model}
- Projection matrix: a learned W ∈ ℝ^{k×d_model}

Compute:

z = Wx

This is called:
- Linear projection
- Feature projection
- Embedding projection
Nothing mystical — just dot products.
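In practice the same learned W is applied to every token in the sequence, so the whole thing is one matrix multiply. A sketch (the sizes are illustrative, not tied to any particular model):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes, not from any specific model.
seq_len, d_model, d_proj = 8, 1024, 64

X = rng.normal(size=(seq_len, d_model))   # one embedding per token
W = rng.normal(size=(d_model, d_proj))    # the learned projection matrix

Z = X @ W   # every token is projected by the same W, in one matmul

assert Z.shape == (seq_len, d_proj)
```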
4. Projection in self-attention
Queries, Keys, Values are projections
Given a token embedding x:

Q = W_Q x, K = W_K x, V = W_V x

Each W_Q, W_K, W_V:
- Selects different directions
- Emphasizes different aspects of the same token

So:
- Q = “What am I looking for?”
- K = “What do I contain?”
- V = “What information do I pass along?”
All three are just projections of x.
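A toy sketch of this in NumPy (random matrices stand in for learned weights; sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
seq_len, d_model, d_head = 8, 64, 16   # toy sizes

X = rng.normal(size=(seq_len, d_model))

# Three independent learned matrices give three views of the same tokens.
W_q = rng.normal(size=(d_model, d_head))
W_k = rng.normal(size=(d_model, d_head))
W_v = rng.normal(size=(d_model, d_head))

Q, K, V = X @ W_q, X @ W_k, X @ W_v    # all three are plain projections of X

# Attention scores compare the projected views, not the raw embeddings.
scores = Q @ K.T / np.sqrt(d_head)

assert Q.shape == K.shape == V.shape == (seq_len, d_head)
assert scores.shape == (seq_len, seq_len)
```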
5. Projection in Primal-Attention (your context)
Instead of building the huge N×N attention matrix:
- Choose a small set of shared directions (rank s, with s ≪ N)
- Project Q and K into that low-rank space
Now interactions happen in a low-dimensional space.
This is why:
- Computation drops from O(N²) → O(N·s)
- Memory drops from quadratic → linear
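A generic low-rank sketch of the idea (this is not the Primal-Attention algorithm itself: the projection E is random here rather than learned, and softmax normalization is omitted for clarity):

```python
import numpy as np

rng = np.random.default_rng(3)
N, d, s = 512, 64, 8   # sequence length, head dim, small rank with s << N

Q = rng.normal(size=(N, d))
K = rng.normal(size=(N, d))
V = rng.normal(size=(N, d))

# Standard attention would materialize an N x N matrix: O(N^2) work and memory.
# A low-rank scheme instead projects onto s shared directions:
E = rng.normal(size=(d, s))   # stand-in for learned directions

Q_s = Q @ E                   # (N, s)
K_s = K @ E                   # (N, s)

# Computing Q_s @ (K_s.T @ V) instead of (Q_s @ K_s.T) @ V gives the same
# result but never forms the N x N matrix: cost is O(N*s*d), linear in N.
out = Q_s @ (K_s.T @ V)

assert out.shape == (N, d)
```

The re-association of the matrix product is where the savings come from: the intermediate K_s.T @ V is only (s, d), independent of sequence length.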
6. Hardware interpretation (important)
On hardware, a projection is just a GEMM (general matrix–matrix multiply): Z = XW.
Key properties:
- ✔ Fully parallel
- ✔ Cache-friendly
- ✔ No control flow
- ✔ Streamable
This is why projections are hardware gold.
7. Intuition with a real-world analogy
Imagine:
- Each token is a long list of properties (d = 1024)
- You only care about:
  - sentiment
  - tense
  - entity type
  - position
  - topic

Each of those is a direction.
Projection asks:
“How much does this token express each of these?”
That’s it.
8. One-sentence takeaway
A projection is a learned change of coordinates that keeps only the directions you care about.
In math: dot products
In ML: linear layers
In hardware: matrix multiplies
Related topics worth exploring:
- Projection vs embedding
- Projection vs PCA
- Why projections are learned instead of fixed
- How projection ≈ basis change