d2l.ai is often read as a machine-learning textbook. But for someone interested in hardware, it’s more useful to treat it as a workload specification.
Below are three ways to use it effectively.
1. d2l Chapters → Hardware Stress Points
Here’s a concrete mapping from d2l topics to what they stress in hardware.
MLPs
- d2l focus: linear layers, activations
- Hardware reality:
  - Dominated by GEMMs
  - Compute-heavy, easy to saturate GPUs
  - Memory reuse is excellent
- Hardware lesson: why dense matmul is the ideal accelerator workload (sketch below)
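A minimal sketch of the point (PyTorch assumed; the shapes are illustrative, not taken from d2l): a single linear layer is one GEMM, and its FLOP count dwarfs the data it touches, which is exactly what accelerators want.

```python
import torch

# One MLP layer as a GEMM: for batch B, in-features K, out-features N,
# the matmul does ~2*B*K*N FLOPs while touching only B*K + K*N + B*N values.
B, K, N = 4096, 4096, 4096
x = torch.randn(B, K)   # activations
w = torch.randn(K, N)   # weights

y = x @ w  # the GEMM that dominates MLP cost

flops = 2 * B * K * N
bytes_moved = 4 * (B * K + K * N + B * N)  # fp32
print(f"arithmetic intensity ~ {flops / bytes_moved:.0f} FLOPs/byte")
```

At roughly 680 FLOPs per byte here, the kernel is solidly compute-bound: every operand fetched from memory is reused thousands of times.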
CNNs
- d2l focus: convolutions, pooling
- Hardware reality:
  - Convs lower to matmul or stencils
  - Spatial reuse is critical
  - Sensitive to memory layout
- Hardware lesson: why tiling and data locality matter (sketch below)
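A sketch of what "lowering" means in practice (PyTorch assumed; shapes are illustrative): `F.unfold` performs im2col, rearranging input patches into columns so the convolution becomes a single GEMM, and the result can be checked against the library convolution.

```python
import torch
import torch.nn.functional as F

# Lower a 3x3 convolution to a matmul via im2col (unfold).
N, C, H, W = 1, 16, 32, 32      # batch, channels, height, width
O, kh, kw = 32, 3, 3            # output channels, kernel size

x = torch.randn(N, C, H, W)
weight = torch.randn(O, C, kh, kw)

cols = F.unfold(x, (kh, kw), padding=1)   # (N, C*kh*kw, H*W) patch matrix
out = weight.view(O, -1) @ cols           # the GEMM
out = out.view(N, O, H, W)

ref = F.conv2d(x, weight, padding=1)      # reference convolution
print(torch.allclose(out, ref, atol=1e-4))
```

The cost of im2col is duplicated data (each pixel lands in up to nine columns), which is why layout and tiling decide whether the GEMM sees cache hits or DRAM traffic.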
RNNs / LSTMs
- d2l focus: recurrence, sequence modeling
- Hardware reality:
  - Sequential dependency kills parallelism
  - Poor GPU utilization
- Hardware lesson: why RNNs were a dead end for scaling (sketch below)
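A sketch of the serialization problem (PyTorch assumed; a bare tanh recurrence rather than d2l's full LSTM): every step needs the previous hidden state, so the time loop cannot be parallelized no matter how many cores are available.

```python
import torch

T, B, H = 128, 32, 512            # time steps, batch, hidden size
x = torch.randn(T, B, H)
W_x = torch.randn(H, H)
W_h = torch.randn(H, H)
h = torch.zeros(B, H)

for t in range(T):                # strictly sequential: h_t needs h_{t-1}
    h = torch.tanh(x[t] @ W_x + h @ W_h)
```

Each iteration is a small (B, H) x (H, H) GEMM, far too little work to fill a modern GPU, and the dependency chain forbids batching the steps together. A transformer processes all T positions in one large matmul instead.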
Transformers
- d2l focus: attention, feed-forward blocks
- Hardware reality:
  - Attention = memory-bound matmul + reductions
  - Training is parallel; inference is sequential
- Hardware lesson: why memory bandwidth dominates modern accelerators (sketch below)
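A sketch of single-head scaled dot-product attention (PyTorch assumed; sizes are illustrative): two matmuls with a softmax reduction between them, where the intermediate (n, n) score matrix is the bandwidth problem.

```python
import torch

n, d = 2048, 64                          # sequence length, head dimension
q, k, v = (torch.randn(n, d) for _ in range(3))

scores = (q @ k.T) / d**0.5              # (n, n): grows quadratically with n
weights = torch.softmax(scores, dim=-1)  # reduction over each row
out = weights @ v                        # second matmul, back to (n, d)
```

Because d is small, each matmul does little compute per byte of the (n, n) intermediate it reads or writes; fused kernels in the FlashAttention style avoid materializing that matrix by streaming tiles through on-chip SRAM.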
Optimization (SGD, Adam)
- d2l focus: update rules
- Hardware reality:
  - Small kernels, lots of memory traffic
  - Often fused to reduce overhead
- Hardware lesson: why optimizer design affects kernel fusion and bandwidth (sketch below)
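A sketch of why updates are bandwidth-bound (PyTorch assumed; a momentum-SGD step rather than full Adam): each update line below is an element-wise pass over every parameter, so memory traffic rather than arithmetic sets the cost.

```python
import torch

p = torch.randn(10_000_000)       # a 10M-parameter tensor
g = torch.randn_like(p)           # its gradient
m = torch.zeros_like(p)           # momentum buffer
lr, beta = 1e-3, 0.9

# Unfused: each line launches separate kernels that re-read the
# tensors from memory, doing only a few FLOPs per element each time.
m = beta * m + (1 - beta) * g
p = p - lr * m
```

Fusing these passes into one kernel (PyTorch exposes this as, e.g., `torch.optim.Adam(..., fused=True)`) cuts the number of trips over the parameters, which is the whole game for optimizer performance.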
2. Which d2l Sections Matter Most for Hardware
If you’re reading selectively:
⭐ High value for hardware understanding
- Linear layers & MLPs
- CNNs (especially convolution lowering)
- Attention & Transformers
- GPU usage chapter (use-gpu.html)
⚠️ Lower priority (hardware-agnostic)
- Statistical learning theory
- Classical ML (k-NN, Naive Bayes)
- High-level ML math not tied to execution
The goal is not mastering ML theory — it’s understanding what patterns hardware must execute efficiently.
3. How to Read ML Textbooks as a Hardware Person
This mindset shift is the key.
When reading d2l.ai, ask:
- Where are the matmuls?
- How big are the tensors?
- Is this compute-bound or memory-bound?
- Can kernels be fused?
- Is execution parallel or sequential?
Example:
“Attention has O(n²) complexity”
becomes
“This will thrash memory and stress bandwidth unless carefully tiled.”
That’s hardware literacy.
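To make that concrete, a quick back-of-envelope calculation (illustrative numbers, not from d2l: fp16, one attention head):

```python
n = 8192                      # sequence length
score_bytes = 2 * n * n       # fp16 (n, n) score matrix
print(f"{score_bytes / 1e6:.0f} MB per head per layer")  # ~134 MB
```

At over a hundred megabytes per head, the naive score matrix cannot live in on-chip SRAM (tens of MB on current GPUs), so untiled attention is forced to stream it through HBM on every pass.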
Why Patterson Ch. 7 and d2l Complement Each Other
- Patterson Ch. 7 teaches machines
- d2l.ai teaches workloads
Neither is sufficient alone.
Together, they let you:
- predict bottlenecks
- understand why TPUs look the way they do
- see why scaling succeeds or fails
Dive into Deep Learning (d2l.ai) is a useful companion for understanding the structure of modern ML workloads. For hardware-minded readers, it’s most valuable once accelerator fundamentals are clear, as model descriptions can then be interpreted in terms of matrix multiplication, memory movement, and kernel behavior rather than abstract math.
One-Sentence Takeaway
Hardware explains how fast you can go; d2l explains what you’re trying to run.
Once you understand both, everything else becomes legible.