We built a multi-tier-SLO LLM serving system that treats Tensor Parallelism as a runtime knob instead of a fixed setting.
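
To make the "runtime knob" idea concrete, here is a minimal sketch of one way a scheduler could pick a tensor-parallel (TP) degree per request from its SLO tier. It is an illustration under stated assumptions, not the system's actual policy: the `Request` type, the `pick_tp_degree` function, and the latency table are all hypothetical, and the latency numbers are made up.

```python
from dataclasses import dataclass

# Hypothetical per-token decode latency (ms) at each TP degree.
# Illustrative numbers only, not measurements from any real system.
EST_LATENCY_MS = {1: 60.0, 2: 35.0, 4: 22.0, 8: 15.0}


@dataclass(frozen=True)
class Request:
    tier: str           # SLO tier label, e.g. "strict" or "relaxed"
    tpot_slo_ms: float  # per-token latency target for this tier


def pick_tp_degree(req: Request, latency_ms: dict[int, float] = EST_LATENCY_MS) -> int:
    """Return the smallest TP degree whose estimated latency meets the SLO.

    Falls back to the largest available degree if no degree qualifies.
    """
    for degree in sorted(latency_ms):
        if latency_ms[degree] <= req.tpot_slo_ms:
            return degree
    return max(latency_ms)


if __name__ == "__main__":
    for req in (Request("strict", 25.0), Request("relaxed", 100.0)):
        print(req.tier, "-> TP degree", pick_tp_degree(req))
```

Choosing the smallest degree that still meets the target is one plausible design choice: it leaves more GPUs free for other requests, while tighter tiers get more parallelism only when they need it.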