Yes — matrix multiplication can be a dataflow.
But only when it’s scheduled and streamed explicitly. Otherwise, it’s just a computation expressed as algebra.
Let’s unpack that cleanly.
1. What “dataflow” actually means
A dataflow model means:
- Computation happens when data arrives (sketched in code below)
- Execution order is implicit, not controlled by instructions
- Data moves through fixed operators
- No global “program counter” deciding what runs next
Think:
- Assembly line
- Circuit
- Systolic array
Not:
- CPU/GPU issuing instructions dynamically
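To make “computation happens when data arrives” concrete, here is a minimal Python sketch of a single data-triggered node. The `MacNode` class is invented for illustration; it is not from any real dataflow framework:

```python
# A node that "fires" only when all of its inputs have arrived --
# the defining behavior of a dataflow model. (Illustrative sketch.)
class MacNode:
    """Multiply-accumulate node: fires when both 'a' and 'b' are present."""
    def __init__(self):
        self.inputs = {}
        self.acc = 0.0

    def receive(self, port, value):
        self.inputs[port] = value
        if "a" in self.inputs and "b" in self.inputs:  # data-triggered, no scheduler
            self.acc += self.inputs.pop("a") * self.inputs.pop("b")

node = MacNode()
node.receive("a", 2.0)   # nothing fires yet: 'b' has not arrived
node.receive("b", 3.0)   # both operands present -> node fires
print(node.acc)          # 6.0
```

There is no loop and no program counter driving this: the arrival of the second operand is what triggers the work.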
2. Matrix multiplication as math vs execution
The math

C = A × B, where C[i][j] = Σₖ A[i][k] × B[k][j]

This equation alone:
- ❌ is not dataflow
- It’s just a declarative relationship
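For reference, here is the same relationship as the textbook triple loop in Python. The algebra defines what C is but says nothing about when or where each multiply runs, which is exactly why the equation by itself is not dataflow:

```python
# Plain reference implementation: the loop order below is an arbitrary
# choice -- any permutation of the three loops computes the same C.
def matmul(A, B):
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                C[i][j] += A[i][p] * B[p][j]
    return C

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]
```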
3. When matrix multiplication becomes a dataflow
Example: systolic array (TPU-style)
In a systolic array:
- A values flow horizontally
- B values flow vertically
- Multiply–accumulate units sit at intersections
- Partial sums flow diagonally or stay local
Here:
- Each MAC fires when inputs arrive
- No instruction fetch
- No cache lookups
- Timing is fixed
👉 That is pure dataflow.
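To see those firing rules in action, here is a small cycle-level simulation in plain Python. It sketches the classic output-stationary scheme with skewed input injection; the function name and structure are mine, not TPU source:

```python
# Cycle-level sketch of an output-stationary systolic array (illustrative).
# A rows stream in from the left, B columns from the top; each cell
# multiply-accumulates whatever it holds, then values shift right/down.
def systolic_matmul(A, B):
    n = len(A)                              # assume square n x n inputs
    acc = [[0.0] * n for _ in range(n)]     # partial sums stay local
    a_reg = [[0.0] * n for _ in range(n)]   # A value held in each cell
    b_reg = [[0.0] * n for _ in range(n)]   # B value held in each cell
    for t in range(3 * n - 2):              # enough cycles to drain the array
        # Values shift one cell right (A) and one cell down (B) per cycle.
        for i in range(n):
            for j in range(n - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
        for j in range(n):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
        # Inject skewed inputs at the edges: row i of A is delayed by i
        # cycles, column j of B by j cycles, so operands meet on time.
        for i in range(n):
            k = t - i
            a_reg[i][0] = A[i][k] if 0 <= k < n else 0.0
        for j in range(n):
            k = t - j
            b_reg[0][j] = B[k][j] if 0 <= k < n else 0.0
        # Every cell fires on whatever data it now holds -- no instructions.
        for i in range(n):
            for j in range(n):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc

print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]
```

Cell (i, j) sees A[i][k] and B[k][j] at cycle i + j + k, so the fixed timing alone guarantees the right operands meet; no address calculation ever happens.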
4. GPU matrix multiplication (usually not dataflow)
On GPUs:
- Threads are scheduled dynamically
- Data is pulled from memory
- Caches hide latency
- Execution order depends on warps and occupancy
Even with Tensor Cores:
- Control is still instruction-driven
- Memory access is demand-based
So:
- ⚠️ Mathematically data-parallel
- ❌ Architecturally not dataflow
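A toy contrast, in the same Python style as the earlier sketches: each GPU “thread” owns one output element and pulls its operands on demand, and the runtime may run threads in any order. The random shuffle stands in for the hardware scheduler; this is a caricature, not CUDA semantics:

```python
# Demand-driven caricature of a GPU: compute asks for data, and the
# scheduler is free to pick any execution order. (Illustrative only.)
import random

def gpu_style_matmul(A, B):
    n, m, k = len(A), len(B[0]), len(B)
    C = [[0.0] * m for _ in range(n)]

    def thread(i, j):                # one logical thread per C[i][j]
        s = 0.0
        for p in range(k):
            s += A[i][p] * B[p][j]   # demand loads: compute pulls data in
        C[i][j] = s

    work = [(i, j) for i in range(n) for j in range(m)]
    random.shuffle(work)             # stand-in for dynamic warp scheduling
    for i, j in work:
        thread(i, j)
    return C

print(gpu_style_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

The result is identical, but timing and order are decided at run time, which is the architectural opposite of the systolic sketch above.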
5. Compiler-defined dataflow (Groq-style)
Here’s the subtle but important case:
- Compiler builds a static graph
- Every multiply, add, and move is scheduled
- Buffers and wires are fixed
- No runtime decisions
Matrix multiply here:
- Is unrolled
- Placed spatially
- Timed explicitly
This is dataflow even without a systolic array.
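Here is a minimal sketch of that idea: a “compiler” fully unrolls a matmul into a fixed schedule ahead of time, and the “hardware” just replays it with no runtime branching or demand decisions. The schedule format is invented for this post and has nothing to do with Groq’s actual ISA:

```python
# Compiler-defined dataflow in miniature (invented format, illustrative).
def compile_matmul(n):
    """Emit one (dst, a_operand, b_operand) MAC per step, fixed ahead of time."""
    return [((i, j), (i, p), (p, j))
            for i in range(n) for j in range(n) for p in range(n)]

def run(schedule, A, B):
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for (i, j), (ai, ap), (bp, bj) in schedule:   # pure replay, no decisions
        C[i][j] += A[ai][ap] * B[bp][bj]          # every move was pre-planned
    return C

schedule = compile_matmul(2)                      # built once, at compile time
print(run(schedule, [[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]
```

Nothing in `run` inspects the data or makes a choice; determinism falls out of the schedule itself.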
6. The litmus test (easy rule)
Ask this:
Does the compute wait for data, or does data wait for compute?
| Model | Who waits? | Dataflow? |
|---|---|---|
| CPU | Data waits | ❌ |
| GPU | Mostly data waits | ❌ |
| TPU systolic | Compute waits | ✅ |
| Groq LPU | Compute waits | ✅ |
7. Why this matters for inference
Inference:
- Has known shapes
- Has static weights
- Has repeatable access patterns
So matrix multiplication during inference:
- Can be compiled into dataflow
- Should be, if you want determinism and efficiency
Training:
- Dynamic shapes
- Backprop
- Irregular access
So:
- Needs instruction-driven hardware
Final answer (tight)
Matrix multiplication is not inherently dataflow — but it becomes dataflow when mapped onto hardware with fixed, data-triggered execution (like systolic arrays or compiler-scheduled accelerators).
That distinction is the fault line between:
- GPUs (general, flexible)
- LPUs / TPUs (specialized, deterministic)