Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference
Published in the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Hao (Mark) Chen, Wayne Luk, Ka Fai Cedric Yiu, Rui Li, Konstantin Mishchenko, Stylianos I. Venieris, Hongxiang Fan

Abstract