Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference

Published in the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Hao (Mark) Chen, Wayne Luk, Ka Fai Cedric Yiu, Rui Li, Konstantin Mishchenko, Stylianos I. Venieris, Hongxiang Fan

Abstract