Progressive Mixed-Precision Decoding for Efficient LLM Inference

Published in The Thirteenth International Conference on Learning Representations (ICLR), 2025

Abstract

Hao Mark Chen, Fuwen Tan, Alexandros Kouris, Royson Lee, Hongxiang Fan, Stylianos I Venieris