Thursday, December 25, 2025

Is matrix multiplication a dataflow?


Yes — matrix multiplication can be a dataflow.
But only when it’s scheduled and streamed explicitly. Otherwise, it’s just a computation expressed as algebra.

Let’s unpack that cleanly.


1. What “dataflow” actually means

A dataflow model means:

  • Computation happens when data arrives

  • Execution order is implicit, not controlled by instructions

  • Data moves through fixed operators

  • No global “program counter” deciding what runs next

Think:

  • Assembly line

  • Circuit

  • Systolic array

Not:

  • CPU/GPU issuing instructions dynamically
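To make "computation happens when data arrives" concrete, here's a minimal sketch in Python. The `MacNode` class is invented for illustration (it's not from any real framework): it models a single multiply–accumulate unit that fires only when both operands are present, with no program counter deciding when.

```python
class MacNode:
    """A multiply-accumulate operator that fires only when both inputs have arrived."""
    def __init__(self):
        self.a = None
        self.b = None
        self.acc = 0

    def push_a(self, value):
        self.a = value
        self._maybe_fire()

    def push_b(self, value):
        self.b = value
        self._maybe_fire()

    def _maybe_fire(self):
        # No instruction stream: the arrival of the second operand triggers compute.
        if self.a is not None and self.b is not None:
            self.acc += self.a * self.b
            self.a = self.b = None

node = MacNode()
node.push_a(2)    # nothing happens yet: still waiting for b
node.push_b(3)    # both operands present -> fires, acc = 6
print(node.acc)   # 6
```

The caller never tells the node *when* to compute; the data itself does. That's the essence of the model.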


2. Matrix multiplication as math vs execution

The math

C_{i,j} = \sum_k A_{i,k} \times B_{k,j}

This equation alone:

  • ❌ is not dataflow

  • It’s just a declarative relationship
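When you turn that equation into ordinary code, what you get is control flow, not dataflow: the loop nest dictates the execution order, and data sits in memory waiting to be fetched. A plain Python sketch:

```python
def matmul(A, B):
    """C[i][j] = sum over k of A[i][k] * B[k][j].
    The loops, not the data, decide what runs next -- this is control flow."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))   # [[19, 22], [43, 50]]
```

Same math, but the schedule lives in the instructions, not in the data movement.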


3. When matrix multiplication becomes a dataflow

Example: systolic array (TPU-style)

In a systolic array:

  • A values flow horizontally

  • B values flow vertically

  • Multiply–accumulate units sit at intersections

  • Partial sums flow diagonally or stay local

Here:

  • Each MAC fires when inputs arrive

  • No instruction fetch

  • No cache lookups

  • Timing is fixed

👉 That is pure dataflow.
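The mechanics above can be simulated in software. Below is a toy output-stationary systolic array in Python (a sketch of the general technique, not any specific TPU's design): A values shift right each cycle, B values shift down, inputs are skewed so that matching operands meet, and every cell fires in lockstep on whatever it holds.

```python
def systolic_matmul(A, B):
    """Simulate an N x N output-stationary systolic array multiplying A @ B."""
    N = len(A)
    acc = [[0] * N for _ in range(N)]    # partial sums stay local to each cell
    a_reg = [[0] * N for _ in range(N)]  # A value currently held by each cell
    b_reg = [[0] * N for _ in range(N)]  # B value currently held by each cell

    def a_in(i, t):  # skewed injection: A[i][k] enters row i at cycle i + k
        k = t - i
        return A[i][k] if 0 <= k < N else 0

    def b_in(j, t):  # skewed injection: B[k][j] enters column j at cycle j + k
        k = t - j
        return B[k][j] if 0 <= k < N else 0

    for t in range(3 * N - 2):           # fixed cycle count drains the array
        # Shift A right and B down by one hop (far edge first to avoid clobbering).
        for i in range(N):
            for j in reversed(range(N)):
                a_reg[i][j] = a_reg[i][j - 1] if j > 0 else a_in(i, t)
        for j in range(N):
            for i in reversed(range(N)):
                b_reg[i][j] = b_reg[i - 1][j] if i > 0 else b_in(j, t)
        # Every MAC cell fires in lockstep on the operands it now holds.
        for i in range(N):
            for j in range(N):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc

print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Notice there is no "if this thread is ready" logic anywhere: the cycle count is fixed, and correctness comes entirely from how the data is skewed and routed.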


4. GPU matrix multiplication (usually not dataflow)

On GPUs:

  • Threads are scheduled dynamically

  • Data is pulled from memory

  • Caches hide latency

  • Execution order depends on warps and occupancy

Even with Tensor Cores:

  • Control is still instruction-driven

  • Memory access is demand-based

So:

  • ⚠️ Mathematically data-parallel

  • Architecturally not dataflow


5. Compiler-defined dataflow (Groq-style)

Here’s the subtle but important case:

  • Compiler builds a static graph

  • Every multiply, add, and move is scheduled

  • Buffers and wires are fixed

  • No runtime decisions

Matrix multiply here:

  • Is unrolled

  • Placed spatially

  • Timed explicitly

This is dataflow even without a systolic array.
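A toy sketch of that idea in Python (purely illustrative; this is not Groq's actual compiler or ISA): the "compiler" fully unrolls the multiply into a fixed list of MAC operations before any data exists, and the "runtime" just replays that schedule with no branches or decisions.

```python
def compile_matmul(n):
    """Compile time: emit one MAC per (i, j, k) triple -- a fully static schedule."""
    return [(i, j, k) for i in range(n) for j in range(n) for k in range(n)]

def run(schedule, A, B):
    """Run time: replay the schedule verbatim. No runtime scheduling decisions."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for (i, j, k) in schedule:           # deterministic, identical every run
        C[i][j] += A[i][k] * B[k][j]
    return C

schedule = compile_matmul(2)             # 8 MACs, fixed before any data arrives
print(run(schedule, [[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

All the intelligence lives in the compile step; the run step is deliberately dumb. That split is what makes the execution deterministic.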


6. The litmus test (easy rule)

Ask this:

Does the compute wait for data, or does data wait for compute?

| Model | Answer | Dataflow? |
| --- | --- | --- |
| CPU | Data waits | ❌ |
| GPU | Mostly data waits | ❌ |
| TPU systolic | Compute waits | ✅ |
| Groq LPU | Compute waits | ✅ |

7. Why this matters for inference

Inference:

  • Has known shapes

  • Has static weights

  • Has repeatable access patterns

So matrix multiplication during inference:

  • Can be compiled into dataflow

  • Should be, if you want determinism and efficiency

Training:

  • Dynamic shapes

  • Backprop

  • Irregular access

So:

  • Needs instruction-driven hardware


Final answer (tight)

Matrix multiplication is not inherently dataflow — but it becomes dataflow when mapped onto hardware with fixed, data-triggered execution (like systolic arrays or compiler-scheduled accelerators).

That distinction is the fault line between:

  • GPUs (general, flexible)

  • LPUs / TPUs (specialized, deterministic)
