GHSA-83vm-p52w-f9pw: vLLM: extract_hidden_states speculative decoding crashes server on any request with penalty parameters
Summary
In vLLM versions 0.18.0 through 0.19.1, a bug in the `extract_hidden_states` speculative decoding proposer (a component that predicts tokens ahead of time to speed up inference) crashes the server whenever a request includes a sampling penalty parameter such as `repetition_penalty`. After the first decoding step the proposer returns a tensor (multi-dimensional array) of the wrong shape, and the sampler hits a shape mismatch error as soon as it applies penalties. Because these are standard, client-controllable parameters, any request can trigger the crash, making this a remotely triggerable denial of service.
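To make the failure mode concrete, here is a minimal, self-contained sketch. It is a simplified stand-in for the sampler's penalty bookkeeping, not vLLM's actual code; `mark_sampled` and `seen` are illustrative names. The update assumes exactly one sampled token per request, so the `[num_reqs, k]` tensor the buggy proposer returns after the first step triggers a shape mismatch:

```python
import torch

num_reqs, vocab_size = 2, 8
# Per-request bookkeeping the penalty path relies on: which token ids have
# already been produced for each request.
seen = torch.zeros(num_reqs, vocab_size, dtype=torch.bool)

def mark_sampled(seen: torch.Tensor, sampled_token_ids: torch.Tensor) -> None:
    # Assumes exactly one new token per request, i.e. shape [num_reqs, 1].
    rows = torch.arange(seen.shape[0])
    seen[rows, sampled_token_ids.squeeze(1)] = True

ok = torch.zeros(num_reqs, 1, dtype=torch.long)   # the shape the sampler expects
mark_sampled(seen, ok)                            # fine

bad = torch.zeros(num_reqs, 3, dtype=torch.long)  # [num_reqs, k] after the first step
mark_sampled(seen, bad)  # IndexError: shape mismatch; unhandled, the engine dies
```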
Solution / Mitigation
Fixed in vLLM v0.20.0 (PR #38610) by slicing the proposer's return value to `sampled_token_ids[:, :1]`, which restores the expected one-token-per-request shape (see the sketch below). If upgrading is not possible, either avoid `extract_hidden_states` as the speculative decoding method, or strip the penalty parameters (`repetition_penalty`, `frequency_penalty`, `presence_penalty`) from incoming requests at an API gateway before they reach vLLM; a gateway sketch follows the fix example.
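As a rough illustration of the fix (this mirrors the advisory's description, not the verbatim PR #38610 diff), slicing with `[:, :1]` trims the output to one token per request while preserving the two-dimensional shape that downstream code expects:

```python
import torch

# Hypothetical stand-in for the proposer's per-step output: the buggy code
# path handed this [num_reqs, k] tensor straight to the rest of the engine.
sampled_token_ids = torch.tensor([[11, 12, 13],
                                  [21, 22, 23]])

fixed = sampled_token_ids[:, :1]      # slice, keeping the tensor 2-D
print(fixed.shape)                    # torch.Size([2, 1])

# Indexing with [:, 0] instead would drop a dimension and yield shape [2],
# which is why the fix uses a slice rather than plain indexing.
print(sampled_token_ids[:, 0].shape)  # torch.Size([2])
```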
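For the gateway-side mitigation, a minimal sketch follows. `strip_penalty_params` is a hypothetical helper, not part of vLLM or any particular gateway; wire it into whatever proxy handles `POST /v1/completions` or `/v1/chat/completions` before the body is forwarded:

```python
# Penalty parameters that trigger the crash in affected vLLM versions.
PENALTY_PARAMS = frozenset(
    {"repetition_penalty", "frequency_penalty", "presence_penalty"}
)

def strip_penalty_params(payload: dict) -> dict:
    """Return a copy of an OpenAI-style request body without penalty params."""
    return {k: v for k, v in payload.items() if k not in PENALTY_PARAMS}

body = {"model": "m", "prompt": "hello", "repetition_penalty": 1.2}
print(strip_penalty_params(body))  # {'model': 'm', 'prompt': 'hello'}
```

Note that stripping these parameters silently changes sampling behavior for clients that rely on them, so this is a stopgap until the upgrade to v0.20.0.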
Vulnerability Details
EPSS: 0.0%
Classification
Affected Vendors
vLLM
Affected Packages
vllm (PyPI), versions 0.18.0 through 0.19.1
Related Issues
CVE-2022-29200: TensorFlow is an open source platform for machine learning. Prior to versions 2.9.0, 2.8.1, 2.7.2, and 2.6.4, the implementation…
CVE-2021-29541: TensorFlow is an end-to-end open source platform for machine learning. An attacker can trigger a dereference of a null pointer…
Original source: https://github.com/advisories/GHSA-83vm-p52w-f9pw
First tracked: May 6, 2026 at 08:00 PM
Classified by LLM (prompt v3) · confidence: 95%