https://huggingface.co/spaces/nanotron/ultrascale-playbook?section=improving_performance_with_kernels - A primer on GPUs
https://www.techpowerup.com/gpu-specs/docs/nvidia-gh100-architecture
gputpu-architectures-sharada-yeluri-p5hwc - TPU hitting MoE wall
https://chokkan.github.io/mlnote/ - in Japanese
A Domain-Specific Architecture for Deep Neural Networks - must read
Excellent GPU vs TPU piece from SemiAnalysis - how TPU might get over the MoE issue
https://blog.character.ai/squinch/ - 6-bit gradient algorithm
https://annals-csis.org/Volume_18/drp/pdf/159.pdf - reference from the chewing chips Substack regarding the details of the LSTM