DSDE: Dynamic Speculative Decoding with KLD Stability for Real-World Serving

Authors: Mingyu Yang and 4 other authors

Abstract: Speculative decoding accelerates large language model inference, but its reliance on a fixed speculation length is suboptimal in large-batch serving environments with diverse requests. This paper explores a new direction for dynamic adaptation by investigating a novel class of post-hoc, diagnostic signals. We propose the Dynamic Speculative Decoding Engine (DSDE), a training-free framework built on two primary components: (1) a predictive signal based on the variance of the Kullback-Leibler divergence (KLD), which diagnoses the regional stability of the generation, and (2) an adaptive speculation length cap that mitigates the straggler problem in per-sequence decoding. Experiments demonstrate the potential of KLD-based stability signals for dynamic adaptation: an algorithm guided by these signals achieves end-to-end latency competitive with leading baselines and exhibits superior robustness across diverse workloads. This robustness is particularly valuable in challenging low-acceptance-rate regimes, where the proposed signal retains its diagnostic utility. Collectively, these findings validate post-hoc signals as a valuable component for building more robust and intelligent LLM inference systems, and highlight a promising direction for future research on dynamic speculation length adaptation.
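The abstract's core idea (use the variance of per-token draft-vs-target KLDs as a stability signal, and cap the speculation length accordingly) can be sketched as follows. This is a minimal illustration, not the paper's method: the window size, variance threshold, and the linear mapping from variance to speculation length are all assumed values chosen for clarity, and `kl_divergence` / `choose_speculation_length` are hypothetical helper names.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two token probability distributions,
    e.g. the draft model's and the target model's next-token distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def choose_speculation_length(kld_history, window=8, k_min=1, k_max=8,
                              var_threshold=0.5):
    """Map the variance of recent per-token KLDs to a speculation length.

    Intuition mirroring the abstract: low KLD variance indicates a locally
    'stable' region where longer speculation is likely to be accepted;
    high variance calls for a short, safe speculation length. The threshold
    and the linear interpolation below are illustrative assumptions.
    """
    recent = kld_history[-window:]
    if len(recent) < 2:
        return k_min  # not enough signal yet; stay conservative
    stability = float(np.var(recent))
    if stability >= var_threshold:
        return k_min
    # Interpolate: variance 0 -> k_max, variance == threshold -> k_min.
    frac = 1.0 - stability / var_threshold
    return int(round(k_min + frac * (k_max - k_min)))
```

For example, a run of near-identical small KLDs (a stable region) yields the maximum speculation length, while an oscillating KLD history collapses the cap back to the minimum.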

Submission history

From: Eunjoo Jeon [view email]
[v1]
Mon, 1 Sep 2025 03:13:50 UTC (510 KB)
[v2]
Mon, 8 Sep 2025 03:27:39 UTC (510 KB)
[v3]
Thu, 30 Oct 2025 02:05:44 UTC (442 KB)
