How to treat models as workloads and equations as execution plans
Machine learning textbooks are often written as if hardware doesn’t exist. They focus on equations, abstractions, and training dynamics, while the machine executing those equations is almost entirely invisible.
For hardware-minded readers, this can make ML texts feel frustrating or irrelevant.
Writing them off is a mistake.
The right way to read ML textbooks is not as theory, but as workload descriptions.
Models Are Execution Plans
An MLP layer isn’t just a function. It’s a matrix multiplication followed by elementwise operations. A Transformer block isn’t an abstraction; it’s a sequence of large batched matmuls, reductions, and memory movement.
ML textbooks describe what computation must happen. Hardware determines how expensive it is.
Once you see this, equations stop being abstract and start looking like execution plans.
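To make that concrete, here is a minimal NumPy sketch of a single MLP layer, written as the workload it is (the shapes are illustrative assumptions):

```python
import numpy as np

batch, d_in, d_out = 256, 1024, 4096  # illustrative sizes

x = np.random.randn(batch, d_in).astype(np.float32)
W = np.random.randn(d_in, d_out).astype(np.float32)
b = np.zeros(d_out, dtype=np.float32)

h = x @ W + b            # the matmul: where almost all the FLOPs live
y = np.maximum(h, 0.0)   # elementwise ReLU: few FLOPs, but a full pass over memory
```

Two lines of math, two very different hardware profiles: the matmul is compute-dense, while the activation is almost pure memory traffic.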
Where Hardware Insight Changes the Reading
Consider attention. A textbook will tell you it scales as O(n²) in sequence length. A hardware-minded reader immediately translates this into:
- memory bandwidth pressure
- cache locality challenges
- limited parallelism during inference
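To see where that O(n²) lands, here is a minimal sketch that materializes the attention score matrix for a single head (the sizes are assumed for illustration):

```python
import numpy as np

n, d = 4096, 64  # sequence length and head dimension (assumed values)

Q = np.random.randn(n, d).astype(np.float32)
K = np.random.randn(n, d).astype(np.float32)

scores = Q @ K.T  # (n, n) score matrix: this is the quadratic term
print(f"{scores.nbytes / 2**20:.0f} MiB for one head's scores")  # 64 MiB at n=4096
```

Double the sequence length and that buffer quadruples, which is why long-context attention stresses memory before it stresses compute.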
That shift in perspective is the difference between understanding why attention works and understanding why it’s hard to run.
When ML Textbooks Become Useful
If you read ML texts before understanding hardware, models feel magical and performance feels arbitrary.
But once you understand:
- memory hierarchies
- matrix multiplication dominance
- kernel fusion
- bandwidth vs compute tradeoffs
ML textbooks suddenly become concrete. You can predict bottlenecks before you run code.
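Here is a back-of-the-envelope version of that prediction, a roofline-style check. The peak numbers are assumptions, not any specific chip:

```python
peak_flops = 100e12  # assumed: 100 TFLOP/s of compute
peak_bw = 1e12       # assumed: 1 TB/s of memory bandwidth

m = n = k = 4096  # square matmul C = A @ B
flops = 2 * m * n * k                      # one multiply-add per output element per k
bytes_moved = 4 * (m * k + k * n + m * n)  # fp32: read A and B, write C

intensity = flops / bytes_moved  # FLOPs per byte of memory traffic
ridge = peak_flops / peak_bw     # machine balance: FLOPs the chip can do per byte

print("compute-bound" if intensity > ridge else "bandwidth-bound")
```

Matmuls sit far above the ridge and elementwise ops far below it; that single distinction explains most of the list above.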
That’s the moment they become valuable.
d2l.ai as a Hardware Companion
Dive into Deep Learning (d2l.ai) is especially useful in this role. It lays out modern workloads clearly — MLPs, CNNs, Transformers — without hiding the underlying operations.
For hardware readers, it’s best used after accelerator fundamentals are in place, as a way to map models to matrix multiplication, memory movement, and kernel behavior.
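One way to use it in that role: take a block the book defines and inventory its matmuls. Below is a minimal sketch for a single Transformer block; the dimensions are illustrative assumptions, and the op list is the standard decomposition rather than code from d2l.ai:

```python
n, d, d_ff, heads = 4096, 1024, 4096, 16  # assumed sequence, model, FFN, head sizes

def matmul_flops(m, k, n_out):
    return 2 * m * k * n_out  # one multiply-add per output element per k

ops = {
    "QKV projection":    matmul_flops(n, d, 3 * d),
    "attention scores":  heads * matmul_flops(n, d // heads, n),
    "scores @ V":        heads * matmul_flops(n, n, d // heads),
    "output projection": matmul_flops(n, d, d),
    "FFN up":            matmul_flops(n, d, d_ff),
    "FFN down":          matmul_flops(n, d_ff, d),
}

for name, f in ops.items():
    print(f"{name:18s} {f / 1e9:8.1f} GFLOPs")
```

Everything the block does that is not on this list (softmax, layernorm, residual adds) is elementwise or a reduction: cheap in FLOPs, expensive in memory movement.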
The Real Payoff
Hardware literacy turns ML from a black box into an engineering discipline. You stop asking “why is this slow?” and start asking “where is the data moving?”
That’s when scaling, optimization, and architecture decisions start to make sense.
Closing Thought
Hardware tells you how fast you can go.
ML textbooks tell you where you’re trying to go.
Understanding both is what makes systems legible.
Footnote: This post was co-generated with the assistance of AI tools (including ChatGPT) as part of an exploratory and educational writing process.