Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
DeepSeek-V3 101
RMSNorm is computed in FP32, while attention and the feed-forward network use FP8 mixed precision.
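To make the precision split concrete, here is a minimal sketch of the idea, not DeepSeek's actual code. It assumes a toy setup in pure Python: the helper names (`round_to_fp16`, `round_to_fp32`, `rms_norm_fp32`, `low_precision_linear`) are hypothetical, and FP16 is used as a stand-in for FP8 because the Python standard library has no FP8 type. Precision is only simulated by rounding values after each step.

```python
import math
import struct


def round_to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value.

    Assumption: FP16 is only a stand-in for a low-precision format such as
    FP8, which the Python standard library does not provide.
    """
    return struct.unpack("e", struct.pack("e", x))[0]


def round_to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def rms_norm_fp32(x, weight, eps=1e-6):
    """RMSNorm with inputs and results rounded to FP32 (simulated)."""
    xs = [round_to_fp32(v) for v in x]
    mean_sq = round_to_fp32(sum(v * v for v in xs) / len(xs))
    inv_rms = round_to_fp32(1.0 / math.sqrt(mean_sq + eps))
    return [round_to_fp32(v * inv_rms * w) for v, w in zip(xs, weight)]


def low_precision_linear(x, rows):
    """Matrix-vector product with inputs and weights rounded to low precision.

    Products are accumulated in a regular Python float, mimicking the common
    pattern of low-precision operands with a wider accumulator.
    """
    xq = [round_to_fp16(v) for v in x]
    return [sum(round_to_fp16(w) * v for w, v in zip(row, xq)) for row in rows]


hidden = [0.5, -1.25, 2.0, 0.75]
gain = [1.0, 1.0, 1.0, 1.0]

# Normalization stays in the higher-precision path.
normed = rms_norm_fp32(hidden, gain)

# The following linear layer uses the low-precision path.
projected = low_precision_linear(
    normed, [[0.1, 0.2, 0.3, 0.4], [1.0, 0.0, -1.0, 0.0]]
)
```

The usual reason for this split is that normalization involves a reduction and a division that are sensitive to rounding, whereas large matrix multiplications tolerate lower-precision operands much better when the accumulator is kept wide.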