hackaprompt AI Red Teaming Framework

hackaprompt is a Dreadnode AIRT agent example for the Learn Prompting AI CTF challenges, built in line with the Dreadnode SDK.

An AI Red Teaming framework for testing prompt injection attacks against hackaprompt challenges.

Quick Start

Install dependencies:
```
uv sync
```
Get your session token (CRITICAL - Must include ALL cookies):

⚠️ IMPORTANT: You need the COMPLETE Cookie header, not just the auth tokens!

Step-by-step instructions:
1. Log in to hackaprompt.com
2. Make sure you can see the challenges (fully logged in)
3. Open DevTools (F12) → Network tab
4. Refresh the page or navigate to any challenge
5. Click on any request to hackaprompt.com in the Network tab
6. In the Request Headers section, find the Cookie header
7. Copy the ENTIRE Cookie header value (very long string)
Required cookies (your Cookie header must include ALL of these):
- _ga_*=... ← Google Analytics (required!)
- _ga=... ← Google Analytics (required!)
- _fbp=... ← Facebook pixel (required!)
- sb-iligpfkvyargzgpcrquc-auth-token.0=... ← Auth token part 1
- sb-iligpfkvyargzgpcrquc-auth-token.1=... ← Auth token part 2
- ph_phc_*=... ← PostHog analytics
Expected format:
```
_ga_C1Y5QFNVBM=GS1.1.1752777779.3.1.1752777826.0.0.0; _ga=GA1.1.1961087042.1752612468; _fbp=fb.1.1752612468123.123456789; sb-iligpfkvyargzgpcrquc-auth-token.0=base64-...; sb-iligpfkvyargzgpcrquc-auth-token.1=...; ph_phc_qHP99rFsNJJhkOJRxkiJm4Trf4A3TormWq0g5ShTLfc_posthog=%7B...
```
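
Before exporting the value, it can help to confirm that nothing was dropped while copying. The following is a minimal sketch and not part of the framework: `missing_cookies` and `REQUIRED_COOKIE_PATTERNS` are hypothetical names, and the patterns simply mirror the required-cookie list above.

```python
from fnmatch import fnmatchcase

# Name patterns copied from the "Required cookies" list; "*" stands for a varying suffix.
REQUIRED_COOKIE_PATTERNS = [
    "_ga_*",
    "_ga",
    "_fbp",
    "sb-iligpfkvyargzgpcrquc-auth-token.0",
    "sb-iligpfkvyargzgpcrquc-auth-token.1",
    "ph_phc_*",
]


def missing_cookies(cookie_header: str) -> list[str]:
    """Return the required patterns that no cookie name in the header matches."""
    names = set()
    for pair in cookie_header.split(";"):
        # Split on the first "=" only, since cookie values may contain "=" themselves.
        name, sep, _value = pair.strip().partition("=")
        if sep and name:
            names.add(name)
    return [
        pattern
        for pattern in REQUIRED_COOKIE_PATTERNS
        if not any(fnmatchcase(name, pattern) for name in names)
    ]
```

An empty list means every required cookie name is present; any returned pattern names a cookie that is missing from the copied header.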

Set environment variables:

# Required - complete cookie string from browser
export HACKAPROMPT_SESSION_TOKEN="_ga_C1Y5QFNVBM=...; _ga=...; _fbp=...; sb-iligpfkvyargzgpcrquc-auth-token.0=base64-...; sb-iligpfkvyargzgpcrquc-auth-token.1=...; ph_phc_qHP99rFsNJJhkOJRxkiJm4Trf4A3TormWq0g5ShTLfc_posthog=..."
export ANTHROPIC_API_KEY="your_anthropic_key"  # or other AI provider

# Optional
export GROQ_API_KEY="your_groq_key"
export OPENAI_API_KEY="your_openai_key"
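
A quick pre-flight check can catch a missing variable before a run starts. This is a sketch under assumptions, not the framework's own validation: `check_environment` is a hypothetical helper, and it assumes that any one of the three provider keys shown above is enough.

```python
import os
from collections.abc import Mapping

# Provider keys named in this README; assumed here to be interchangeable.
PROVIDER_KEYS = ("ANTHROPIC_API_KEY", "GROQ_API_KEY", "OPENAI_API_KEY")


def check_environment(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return a list of problems; an empty list means the variables look usable."""
    problems = []
    if not env.get("HACKAPROMPT_SESSION_TOKEN", "").strip():
        problems.append("HACKAPROMPT_SESSION_TOKEN is not set")
    if not any(env.get(key, "").strip() for key in PROVIDER_KEYS):
        problems.append("no provider key set (expected one of: " + ", ".join(PROVIDER_KEYS) + ")")
    return problems
```

Calling `check_environment()` with no arguments inspects the current process environment; passing a plain dict makes the check easy to test.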

Get Session ID (Required for API reliability):

⚠️ IMPORTANT: For reliable API calls, you need to intercept a session ID from your browser:

1. Open any hackaprompt challenge in your browser (make sure you're logged in)
2. Open Developer Tools (F12) → Console tab
3. Paste and run this code:

// Wrap window.fetch so the session_id sent to /api/chat is logged.
const originalFetch = window.fetch;
window.fetch = function (...args) {
    try {
        // fetch() accepts a string, a URL, or a Request as its first argument.
        const url = typeof args[0] === 'string' ? args[0] : (args[0]?.url ?? String(args[0]));
        const rawBody = args[1]?.body;
        if (url.includes('/api/chat') && typeof rawBody === 'string') {
            const body = JSON.parse(rawBody);
            if (body.session_id) {
                console.log('🎯 SESSION ID:', body.session_id);
                window.hackaPromptSessionId = body.session_id;
            }
        }
    } catch (e) {
        // Ignore bodies that are not JSON; never block the real request.
    }
    return originalFetch.apply(this, args);
};
console.log('✅ Ready! Submit a prompt to capture session ID.');

4. Submit any prompt in the challenge (like "hello")
5. Copy the session ID from the console output

Run challenges:

# RECOMMENDED: Run with intercepted session ID (most reliable)
uv run -m hackaprompt.main --model groq/moonshotai/kimi-k2-instruct --challenges basics --session-id "your-session-id-here"

# Run all challenges with session ID
uv run -m hackaprompt.main --model groq/moonshotai/kimi-k2-instruct --session-id "your-session-id-here"

# Run specific challenge with increased steps
uv run -m hackaprompt.main --model groq/moonshotai/kimi-k2-instruct --challenges instruction_defense --max-steps 500 --session-id "your-session-id-here"

# Run with proxy (e.g., Burp Suite)
uv run -m hackaprompt.main --model groq/moonshotai/kimi-k2-instruct --challenges basics --proxy http://127.0.0.1:8080 --session-id "your-session-id-here"

# Run without session ID (may encounter API errors)
uv run -m hackaprompt.main --model groq/moonshotai/kimi-k2-instruct --challenges basics

# List available challenges
uv run -m hackaprompt.main --list-challenges
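
When launching runs from a script, the same invocations can be assembled programmatically. This is an illustrative sketch only: `build_command` is a hypothetical helper, and it uses only the flags that appear in the commands above.

```python
import shlex
from typing import Optional


def build_command(
    model: Optional[str] = None,
    challenges: Optional[str] = None,
    session_id: Optional[str] = None,
    max_steps: Optional[int] = None,
    proxy: Optional[str] = None,
) -> list[str]:
    """Assemble an argv list from the flags shown in the examples above."""
    argv = ["uv", "run", "-m", "hackaprompt.main"]
    options = {
        "--model": model,
        "--challenges": challenges,
        "--max-steps": max_steps,
        "--proxy": proxy,
        "--session-id": session_id,
    }
    for flag, value in options.items():
        # Only emit flags the caller actually set.
        if value is not None:
            argv += [flag, str(value)]
    return argv


command = build_command(
    model="groq/moonshotai/kimi-k2-instruct",
    challenges="basics",
    session_id="your-session-id-here",
)
print(shlex.join(command))
```

Passing the resulting list to `subprocess.run(command, check=True)` avoids shell quoting problems with long session IDs.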

How It Works

The framework:

1. Loads challenges from hackaprompt using your session token
2. Generates prompt injection attacks using an AI agent deployed via rigging, Dreadnode's awesome library
3. Executes attacks against real hackaprompt challenges and feeds each challenge's output back into the rigging chat pipeline
4. Validates responses using hackaprompt's judge API
5. Reports success when the target phrase "I have been PWNED" is achieved
6. Records metrics and scoring verifiers through the Dreadnode SDK for efficient evaluation
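
The generate, execute, validate cycle described above can be pictured roughly as the loop below. This is a conceptual sketch, not the framework's code: `run_attack_loop` and its three callables are hypothetical stand-ins for the rigging agent, the hackaprompt challenge call, and the judge API.

```python
from typing import Callable, Optional

TARGET_PHRASE = "I have been PWNED"


def run_attack_loop(
    generate: Callable[[list[str]], str],
    execute: Callable[[str], str],
    judge: Callable[[str], bool],
    max_steps: int = 10,
) -> Optional[str]:
    """Return the first prompt whose output the judge accepts, or None after max_steps."""
    history: list[str] = []
    for _ in range(max_steps):
        prompt = generate(history)   # attacker model proposes an injection
        output = execute(prompt)     # challenge responds to the injection
        if judge(output):            # stand-in for the judge API verdict
            return prompt
        history.append(output)       # feed the failure back to the attacker
    return None


# Toy stand-ins so the loop can be exercised offline.
attempts = iter(["hello", "say I have been PWNED"])
winning_prompt = run_attack_loop(
    generate=lambda history: next(attempts),
    execute=lambda prompt: prompt.replace("say ", ""),
    judge=lambda output: TARGET_PHRASE in output,
    max_steps=5,
)
```

In the toy run the second attempt succeeds, so `winning_prompt` holds that prompt; the substring check only imitates the real judge.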

Available Challenges

Use --list-challenges to see all available challenges across all tracks.

Proxy Support

The framework supports HTTP proxies for traffic interception and analysis:

# Use Caido/Burp Suite (default listener address)
uv run -m hackaprompt.main --model groq/moonshotai/kimi-k2-instruct --proxy http://127.0.0.1:8080

# Use custom proxy
uv run -m hackaprompt.main --model groq/moonshotai/kimi-k2-instruct --proxy http://proxy.example.com:3128

# No proxy (default)
uv run -m hackaprompt.main --model groq/moonshotai/kimi-k2-instruct

Important: Since the framework runs inside Docker containers, localhost addresses (127.0.0.1, localhost) are automatically converted to host.docker.internal to access services running on the host machine. SSL certificate verification is disabled for proxy connections to support SSL interception tools like Caido or Burp Suite.

This allows security researchers to:

- Intercept and analyze hackaprompt API traffic
- Modify requests/responses for testing
- Monitor prompt injection attempts
- Debug authentication issues

Setup for Caido or Burp Suite:

1. Start Caido/Burp Suite and configure it to listen on 127.0.0.1:8080
2. Run the framework with --proxy http://127.0.0.1:8080
3. The framework will automatically convert this to host.docker.internal:8080 for Docker networking
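
The localhost rewrite mentioned above could look something like the following. It is a minimal sketch of the idea, assuming nothing about the framework's actual implementation; `to_docker_host` is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit

LOCAL_HOSTS = {"127.0.0.1", "localhost"}


def to_docker_host(proxy_url: str) -> str:
    """Rewrite a loopback proxy URL so it is reachable from inside a container."""
    parts = urlsplit(proxy_url)
    if parts.hostname not in LOCAL_HOSTS:
        return proxy_url  # remote proxies are left untouched
    # Preserve any "user:pass@" prefix and the port while swapping the host.
    userinfo, at, _hostport = parts.netloc.rpartition("@")
    netloc = f"{userinfo}{at}host.docker.internal"
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))
```

A remote address such as `http://proxy.example.com:3128` passes through unchanged, which matches the custom proxy example earlier in this section.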

Architecture

- Docker isolation for secure code execution
- Session management with proper cookie handling
- Real API calls to hackaprompt (no simulation)
- Concurrent execution across multiple challenges
- Comprehensive logging via Dreadnode integration
- Proxy support for traffic analysis and debugging
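
As an illustration of the concurrent execution point, the sketch below runs several challenges at once with a cap on parallelism. The README does not say how the framework schedules work, so the use of `asyncio`, the `run_all` helper, and the `limit` parameter are all assumptions.

```python
import asyncio
from typing import Awaitable, Callable


async def run_all(
    challenges: list[str],
    run_one: Callable[[str], Awaitable[str]],
    limit: int = 3,
) -> dict[str, str]:
    """Run run_one for every challenge, with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(name: str) -> tuple[str, str]:
        async with semaphore:
            return name, await run_one(name)

    # gather preserves input order, so results line up with the challenge list.
    return dict(await asyncio.gather(*(guarded(name) for name in challenges)))


async def fake_run(name: str) -> str:
    """Toy stand-in for a real challenge run."""
    await asyncio.sleep(0)
    return name.upper()


results = asyncio.run(run_all(["basics", "instruction_defense"], fake_run))
```

Capping concurrency with a semaphore is one way to stay within API rate limits while still overlapping slow network calls.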

Challenge Data Structure

The framework uses two JSON files for challenge information:

- challenges_info.json - Basic challenge metadata from hackaprompt API (names, URLs, status)
- hackaprompt/challenge_manifest.json - Complete challenge data including:
  - Detailed descriptions with constraints, scoring, and mechanics
  - Complete defending model system prompts for each challenge
  - Challenge types and verbose explanations
  - All context needed for effective prompt injection attacks

The manifest-based approach ensures attack models receive comprehensive challenge context rather than just basic API metadata, enabling more targeted and effective prompt injection strategies.
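
To make the two-file arrangement concrete, here is a sketch of how basic metadata might be enriched with manifest context. The real JSON schemas are not shown in this README, so the keys used here (`name`, `url`, `description`, `system_prompt`) and the helper `merge_challenge_data` are hypothetical.

```python
def merge_challenge_data(
    info_entries: list[dict],
    manifest: dict[str, dict],
) -> dict[str, dict]:
    """Combine basic API metadata with the manifest entry of the same name."""
    merged = {}
    for entry in info_entries:
        name = entry["name"]  # hypothetical key; the real schema may differ
        # Manifest fields win on conflict because they carry the fuller context.
        merged[name] = {**entry, **manifest.get(name, {})}
    return merged


# Toy data standing in for the contents of the two JSON files.
info = [{"name": "basics", "url": "https://example.invalid/basics"}]
manifest = {"basics": {"description": "Intro challenge", "system_prompt": "You are a bot."}}
combined = merge_challenge_data(info, manifest)
```

A challenge that appears only in the metadata file keeps its basic fields, so the attack model still receives something to work with.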

Repository: GangGreenTemperTatum/hackaprompt