Progressive Mixed-Precision Decoding for Efficient LLM Inference
Published in The Thirteenth International Conference on Learning Representations (ICLR 2025), 2024
Hao Mark Chen, Fuwen Tan, Alexandros Kouris, Royson Lee, Hongxiang Fan, Stylianos I Venieris