https://x.com/drscotthawley/ - chunk sigreg slices
majority of GPU cycles in Python
https://www.doubleai.com/research/doubleais-warpspeed-surpassing-expert-written-kernels-at-scale - distribution of error in human code different from
https://leodemoura.github.io/blog/2026/02/28/when-ai-writes-the-worlds-software.html
A Stochastic Rounding-Enabled Low-Precision Floating-Point MAC for DNN Training
Formal Verification of an Iterative Low-Power x86 Floating-Point Multiplier with Redundant Feedback
https://x.com/MankyDankyBanky/status/2028923461213798724 - tiled matrix multiplication on GPU