https://timzaman.com/getting-into-ai-infra
https://x.com/cosminnegruseri/status/2012041539137257544 - making LLMs practical
https://x.com/reinerpope/status/2044525525646119419 - MatX, chip design and where silicon designed for LLMs is headed
https://x.com/FabianGloeckle/status/2044791592247066702 Formally verified code
https://aleximas.substack.com/p/how-will-ai-driven-automation-actually
Chipstack uses https://wavedrom.com/images/SNUG2016_WaveDrom.pdf
https://cheekypint.substack.com/p/reiner-pope-of-matx-on-accelerating
  (8:17) Tightly coupling SRAM and HBM on one chip
  (14:03) More MoE FLOPS, smaller KV cache load
  (16:08) Numerics: from 32-bit to 4-bit
  (19:02) Targeting both training and inference
  (22:14) Chip timelines
  (27:15) Logic and memory scarcity
  (29:42) Compute costs
  (32:07) Latency: from 20ms to 1ms as the new table stakes
  (40:50) Programming the chip
  (43:00) Starting MatX
  (47:11) Codesign without seeing the models
  (51:57) Interconnect design
  (55:44) Performance modeling philosophy
  (1:07:02) Prefill vs. decode
  (1:13:47) What's next
https://pdf.isaak.net/thesis - scaling brain emulation
https://x.com/danielhanchen/status/1931468866279932208 - FP8 in H100
https://x.com/amanrsanger/status/1668144627004903424 probably needs update
https://x.com/dwarkesh_sp/status/2032493847666659780 Space GPUs
https://www.cs.cmu.edu/~213/schedule.html