Parallel Prompt Decoding: A Cost-Effective and Memory-Efficient Approach for Accelerating LLM Inference

Published in arXiv, 2024

Hao Mark Chen, Wayne Luk, Hongxiang Fan, Roberto Bondesan

Abstract