Multi-Agent Pentest Framework

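A black-box penetration-testing runtime that, within an authorized scope, automates attack surface mapping based on OWASP Top 10:2025, vulnerability candidate generation, replay validation, evidence storage, and confirmed-only report generation.

The current implementation is not a finished multi-agent product. It is a contract-first runtime baseline that lets team members split up and develop specialists per OWASP category or per vulnerability class.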

Current baseline

Runtime in use: pentest_engine
Default execution mode: deterministic local runner
Supported specialists: xss, idor, business_logic, sqli, authz, auth_session_check, security_misconfig, sensitive_data, exception_check, component_inventory
Default discovery path: AI crawler built on the Claude Agent SDK and an in-process MCP server
AI crawler: explores dynamic surface through Playwright-backed MCP tools, while the Python runtime owns scope, queue, visited state, persistence, and AttackSurface normalization
MVP criterion: OWASP Top 10 Coverage Matrix
Primary goal: within the authorized scope, achieve both broad attack surface/candidate collection and a reproducible validation chain
Default output location: outputs/pentest_engine/<run-id>/
Default report policy: include only confirmed findings in report.md, while distinguishing exploit replay findings from automatic security checks

We do not revert to the previous v1 pipeline or to the temporary agent structure.

Target model

The final product model is a campaign-oriented automated black-box pentest platform.

The current MVP is close to a linear pipeline.

Discovery -> Attack Surface -> Task Dispatch -> Agent Probe -> Replay Validation -> Report

The long-term goal is a structure that manages each scan as part of a campaign rather than as an independent run.

Campaign
  -> Scope / Policy / Auth Profile
  -> Surface Graph
  -> Coverage / Gap Planner
  -> Specialist Agent Pool
  -> Evidence / Candidate Store
  -> Replay Validation
  -> Feedback Loop
  -> Confirmed-only Report

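The campaign runtime finds as much attack surface and as many candidates as possible within the authorized scope, feeds missed coverage gaps back to the planner for re-exploration and re-validation, and puts only findings proven by replay into the report body. In short, the project is currently a pipeline-based MVP, and the final goal is a campaign-based autonomous pentest system.

Specialist extension criteria

Currently xss, idor, business_logic, sqli, authz, auth_session_check, security_misconfig, sensitive_data, exception_check, and component_inventory are treated as the default specialists. New specialists should first reinforce the OWASP Top 10:2025 categories where automatic checks and candidate generation are still weak.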

| Priority | Candidate agent | OWASP category | Goal |
| --- | --- | --- | --- |
| 1 | `auth` / `session_check` | A07 Authentication Failures | Collect authentication-failure candidates in login, session, token, and password reset/change flows |
| 2 | `security_misconfig` | A02 Security Misconfiguration | Implemented so far: HTML security header checker, credentialed reflected CORS checker driven by the `cors` hint, and manual-review candidates for OpenAPI/Swagger docs and debug/admin interfaces |
| 3 | `sensitive_data` / `crypto_check` | A04 Cryptographic Failures | Implemented so far: auth/session cookie attribute checker and manual-review candidates for session material transport driven by the `cleartext_transport` hint. localStorage, plaintext exposure of sensitive data, and broader transport checks are follow-up work |
| 4 | `exception_check` | A10 Mishandling of Exceptional Conditions | Implemented so far: checker for stack trace, framework/debug error, and verbose error exposure in safe GET responses that carry an error/debug hint |
| 5 | `component_inventory` | A03 Software Supply Chain Failures | Currently reports source map/package manifest exposure and missing SRI on external client assets as manual-review candidates. Broader component inventory is follow-up work |

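sqli now ships as a default specialist with a safe probe and a replay oracle, and destructive SQL payloads are excluded by policy. authz has joined the default path as the A01 cross-auth replay baseline, but because object ownership observation overlaps with the IDOR boundary, the confirmed/rejected criteria will continue to be tuned conservatively.

Quick start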

make setup
cp .env.example .env
make lint
make test
make run-local

make run-local runs a fixture-based demo.

python pentest_engine/orchestrator.py \
  --config configs/scan_config.yaml \
  --runner local \
  --attack-surface fixtures/attack_surface_sample.json

After the run, results are generated under outputs/pentest_engine/<run-id>/. Do not commit outputs/, .env, telemetry, session logs, or runtime project logs.

Documents to read first

Before starting development, we recommend reading the following in order.

CONTRIBUTING.md: branch, commit, and PR conventions
docs/05-team-work-items.md: candidate work split per team member
docs/07-repository-structure.md: folder roles and ownership
docs/agent-authoring-guide.md: specialist authoring contract
docs/03-safety-model.md: scope, policy, and evidence safety rules

Team development rules

Keep changes narrowly limited to the specialist or module you own.
Do not change common contracts on your own.
Changes to pentest_engine/models/, pentest_engine/contracts.py, or pentest_engine/validation/ must include tests and documentation updates.
A new specialist must come with registry, config, and dispatch/probe/replay tests.
make lint, make test, and make run-local must pass before merging.
Do not commit real target information, credentials, session logs, or raw evidence.

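Where to make changes

| Task | Primary location | Notes |
| --- | --- | --- |
| Add a new vulnerability specialist | `pentest_engine/specialists/` | `docs/agent-authoring-guide.md` |
| Enable a specialist | `configs/scan_config.yaml` | `agents.enabled` |
| Extend attack surface fixtures | `fixtures/` | No real target data |
| Improve crawler/discovery | `pentest_engine/discovery/` | The current default path is the Claude Agent SDK/MCP-based AI crawler. The agent makes exploration decisions through Playwright-backed MCP tools, and the runtime controls scope, queue, visited state, persistence, and normalization. Authenticated journeys and coverage quality improvements are follow-up work |
| Improve replay/validation | `pentest_engine/validation/` | Likely to involve contract changes |
| Change common models | `pentest_engine/models/`, `pentest_engine/contracts.py` | Requires team agreement |
| Improve report output | `pentest_engine/reporting/` | Keep confirmed-only |
| Add tests | `tests/` | Use `tests/specialist_harness.py` |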

Runtime flow

attack_surface
  -> OWASP Top 10 coverage mapping
  -> specialist tasks
  -> agent candidates
  -> replay validation
  -> validated/rejected findings
  -> confirmed report

Run artifacts:

attack_surface.json
tasks.json
skipped_tasks.json
checker_sampling.json (when lightweight checker sampling occurred)
candidates.json
validated_findings.json
rejected_candidates.json
validation_assistant_notes.json (only when validation.assistant.enabled=true)
reporter_agent_notes.json (only when reporting.reporter_agent.enabled=true)
report.md
failure_summary.md (on failed exit: summary of the direct cause, impact, and next actions)
run_summary.json
manifest.json
scheduler_results.json
discovery_provenance.json
latest_run.json
evidence/http_traces.jsonl (when HTTP traces were generated)
evidence/task_screenshots.json, evidence/task_screenshots/*.png (best-effort visual context when tools.browser=true and output.save_screenshots=true)

needs_manual_review candidates are not included in the final report body and are kept only in rejected_candidates.json.

OWASP Top 10 categories that support automatic PoC/replay are promoted as exploit_replay confirmed findings, while results from lightweight checkers such as A02/A04/A10 are classified as configuration_check or session_control_check confirmed findings. Categories that are hard to confirm automatically in a black-box setting are separated out as needs_manual_review or extension_candidate. The criteria follow docs/09-owasp-top10-coverage.md.
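The confirmed-only split described above can be sketched as follows. This is a minimal illustration, not the project's reporter: the dict layout and the `status` and `kind` field names are assumptions, and only the status/kind values come from this README.

```python
from collections import defaultdict

# Only this status reaches report.md in this sketch.
REPORTABLE_STATUS = "confirmed"


def split_for_report(findings):
    """Group confirmed findings by kind; send everything else to rejected.

    `findings` is a list of dicts with hypothetical "status" and "kind" keys.
    """
    report_sections = defaultdict(list)
    rejected = []
    for finding in findings:
        if finding.get("status") == REPORTABLE_STATUS:
            report_sections[finding.get("kind", "unknown")].append(finding)
        else:
            # needs_manual_review, extension_candidate, rejected, ...
            rejected.append(finding)
    return dict(report_sections), rejected
```

Grouping by kind keeps exploit_replay findings visibly separate from configuration_check and session_control_check findings in the rendered report.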

manifest.json and latest_run.json are the indexes read by the dashboard/status API. discovery_provenance.json is a provenance artifact that lets you check the discovery input mode, endpoint source, and endpoint/flow/auth context counts without opening the raw attack surface. checker_sampling.json records the selected/sampled-out counts and a sampled-out path summary with host and query removed whenever the A02/A03/A04/A10 lightweight checker fan-out was reduced to a sample of representative endpoints. failure_summary.md records the failed phase, direct cause, partial artifacts, and rerun actions of a failed run in human-readable form. evidence/http_traces.jsonl is referenced directly from each candidate's evidence_refs and is also included in the manifest artifact index. Task screenshots are not evidence for a confirmed verdict; they are linked only as visual_context that helps readers quickly locate the issue in the dashboard/report.
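As an illustration of a path summary with host and query removed, the sketch below reduces sampled-out URLs to unique paths. The function name and the output keys are hypothetical and do not reflect the actual checker_sampling.json schema.

```python
from urllib.parse import urlsplit


def summarize_sampling(selected_urls, sampled_out_urls):
    """Build a summary that keeps counts and paths but drops host and query."""
    # urlsplit(...).path is "" for a bare origin, so fall back to "/".
    paths = {urlsplit(url).path or "/" for url in sampled_out_urls}
    return {
        "selected_count": len(selected_urls),
        "sampled_out_count": len(sampled_out_urls),
        "sampled_out_paths": sorted(paths),
    }
```

Dropping the host and query string keeps target identifiers and parameter values out of the artifact while still showing which routes were skipped.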

The default value of agents.failure_policy is continue_on_partial_failure. Even if some specialist tasks fail with an ordinary runner/sub-agent error, as long as at least one task succeeded, the failed tasks are recorded in scheduler_results.json, failed_tasks, and run_summary.metadata.agent_failures, and the run proceeds through validation and reporting. However, budget exhaustion, a contract validation failure, or ordinary failure of every runnable task still fails the run.
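The decision rule above can be written down as a small predicate. This is a sketch under stated assumptions: the function name and arguments are hypothetical, and treating a run with zero runnable tasks as not failed is this example's choice, not something the README specifies.

```python
def should_fail_run(task_succeeded, budget_exhausted=False, contract_violation=False):
    """Apply continue_on_partial_failure to a list of per-task success flags.

    task_succeeded: list of booleans, one per runnable task (True = succeeded).
    """
    # Budget exhaustion and contract violations always fail the run.
    if budget_exhausted or contract_violation:
        return True
    # Fail only when there were runnable tasks and none of them succeeded.
    return bool(task_succeeded) and not any(task_succeeded)
```

For example, `should_fail_run([True, False, False])` is false because one task succeeded, so the failed tasks would be recorded and the run would continue.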

Run the Dashboard API with make status-api. The browser UI is available at http://127.0.0.1:8765/dashboard. The UI is built around a target URL field and a start button, and it fills in mode/runner/scope/agent/authorization with automatic defaults in the scan start request. After submission, it hides the input form and shows a caption for each progress phase along with agent status. After completion, it shows a report.md preview and artifact cards.

Currently, GET /api/status, GET /api/runs/latest, and GET /api/runs/{run_dir} read run status, and POST /api/scans accepts a scan start request that includes explicit authorization confirmation. A running scan request can be cancelled on a best-effort basis with POST /api/scans/{request_id}/cancel. Instead of a full queue worker, the Phase 3.5 demo dashboard allows only one active scan at a time. A scan start request creates a temporary config and log path under outputs/pentest_engine/_scan_requests/ and runs only a fixed orchestrator command as a background process. Stored scan requests can be queried with GET /api/scans and GET /api/scans/{request_id}, and the detail payload includes pid-based process liveness, terminal exit code/status, and the state of the stdout/stderr log files.

Minimal scan start request example:

{
  "confirm_authorized": true,
  "target_url": "http://localhost:3000",
  "mode": "live_discovery",
  "runner": "local",
  "scope_paths": ["/"],
  "out_of_scope_paths": ["/admin/delete"],
  "enabled_agents": ["xss", "idor", "business_logic", "sqli", "authz", "auth_session_check", "security_misconfig", "sensitive_data", "exception_check", "component_inventory"]
}
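A request like the one above could be prepared from Python with the standard library, as in the sketch below. The helper name and the `Content-Type` header are assumptions. The endpoint, port, and payload fields are the ones shown in this README.

```python
import json
from urllib.request import Request

DEFAULT_AGENTS = [
    "xss", "idor", "business_logic", "sqli", "authz", "auth_session_check",
    "security_misconfig", "sensitive_data", "exception_check", "component_inventory",
]


def build_scan_request(target_url, base_url="http://127.0.0.1:8765"):
    """Build (but do not send) a POST /api/scans request for an authorized target."""
    payload = {
        "confirm_authorized": True,  # explicit authorization confirmation
        "target_url": target_url,
        "mode": "live_discovery",
        "runner": "local",
        "scope_paths": ["/"],
        "out_of_scope_paths": ["/admin/delete"],
        "enabled_agents": DEFAULT_AGENTS,
    }
    return Request(
        f"{base_url}/api/scans",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )
```

With the status API running, the returned object can be passed to `urllib.request.urlopen` to submit the scan.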

Contract baseline

The common contract version is recorded as contract_version in run_summary.json and manifest.json. The current version is 2026.05.v3. Runtime observability/state payloads that do not break existing required model fields, such as phase_events.json, budget_ledger.json, and dashboard scan metadata, are treated as additive changes and managed under the same version. If the required fields of AttackSurface, SpecialistTask, FindingCandidate, ValidatedFinding, RunSummary, or rejected candidates change, a version bump is considered.

The orchestrator exits with validation_failed(6) when a contract violation occurs at any of the stages below.

Loading attack_surface
Generating tasks
Agent run results (AgentRunResult)
Aggregating candidates
validated_findings / rejected_candidates

Exit codes:

0 completed
1 unexpected_error
2 config_validation_failed
3 policy_blocked
4 budget_exhausted
5 runner_failed
6 validation_failed
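A wrapper script that launches the orchestrator might map these codes to names as sketched below. The `ExitCode` enum and `describe_exit` helper are hypothetical and not part of pentest_engine. The numeric values are the ones listed above.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    """Exit codes as listed in this README."""
    COMPLETED = 0
    UNEXPECTED_ERROR = 1
    CONFIG_VALIDATION_FAILED = 2
    POLICY_BLOCKED = 3
    BUDGET_EXHAUSTED = 4
    RUNNER_FAILED = 5
    VALIDATION_FAILED = 6


def describe_exit(code):
    """Return the lowercase name for a known exit code, else 'unknown'."""
    try:
        return ExitCode(code).name.lower()
    except ValueError:
        return "unknown"
```

For example, `describe_exit(6)` returns `"validation_failed"`.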

Safety baseline

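This project is for use only in authorized test environments.

Do not expand to hosts or paths outside the target scope.
Destructive actions are never executed automatically.
HTTP requests must pass through the runtime's HttpClient and PolicyGate boundary.
Evidence must be redacted before it is stored.
Reports include only validated findings.
The command/browser/external tool gateway is not yet open as a general execution boundary.

For the detailed policy, docs/03-safety-model.md is authoritative.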
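To make the scope rule concrete, the sketch below checks a URL against a target origin, allowed path prefixes, and excluded path prefixes. It is not the project's PolicyGate; the function names and the prefix-matching semantics are assumptions for illustration.

```python
from urllib.parse import urlsplit


def _under(path, prefix):
    """True if path equals prefix or sits below it on a segment boundary."""
    prefix = prefix.rstrip("/")
    return prefix == "" or path == prefix or path.startswith(prefix + "/")


def in_scope(url, target_url, scope_paths, out_of_scope_paths):
    """Allow only same-origin URLs under scope_paths and outside exclusions."""
    target = urlsplit(target_url)
    candidate = urlsplit(url)
    # A different scheme, host, or port is always out of scope.
    if (candidate.scheme, candidate.netloc) != (target.scheme, target.netloc):
        return False
    path = candidate.path or "/"
    # Exclusions win over inclusions.
    if any(_under(path, prefix) for prefix in out_of_scope_paths):
        return False
    return any(_under(path, prefix) for prefix in scope_paths)
```

Matching on segment boundaries means `/admin/delete/1` is excluded by `/admin/delete`, while `/admin/deleted` is not.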

Runner

Local runner

Use the local runner for day-to-day development and testing.

make run-local

The local runner is a deterministic contract runner that performs an HTTP probe and replay validation for each specialist task.

Claude SDK runner

Use this runner only when checking the Claude SDK integration.

make run-sdk

Prerequisites:

Claude Code CLI installed (the claude command is available)
claude login completed, or CLAUDE_CODE_OAUTH_TOKEN set
CLAUDE_CONFIG_DIR set if needed

The claude_sdk runner is a swappable backend that implements the same AgentRunner contract. On failure it exits without a fallback and records the partial output and the failure cause in run_summary.json.

A real Claude tool-use smoke test of the Specialist MCP runner can be run with the opt-in script below. This smoke test uses a fake HTTP target, but it makes real Claude SDK calls, so it may incur token usage and cost.

.venv/bin/python scripts/mcp_agentic_specialist_smoke.py --run-claude --case sqli

See docs/mcp-agentic-specialist-smoke.md for the detailed criteria.

The real Claude smoke test for Validation Assistant / ReporterAgent is opt-in and uses only fake/redacted artifact input. Neither assistant can change status, severity, evidence, or report inclusion; they only generate explanation drafts.

.venv/bin/python scripts/agentic_assistant_smoke.py --run-claude --case all

See docs/agentic-assistant-smoke.md for the detailed criteria.

Whether the Specialist MCP Runner, Validation Assistant, and ReporterAgent connect correctly within the orchestrator artifact flow is checked with an integration smoke test that runs in fake SDK mode by default. Adding --run-claude makes real Claude SDK calls against the same fake HTTP target.

.venv/bin/python scripts/agentic_pipeline_smoke.py
.venv/bin/python scripts/agentic_pipeline_smoke.py --run-claude

See docs/agentic-pipeline-smoke.md for the detailed criteria.

Run profiles

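run_profile is a bundle of defaults that separates execution stability from how much agentic functionality is exposed. Values written explicitly in the config take precedence over profile defaults.

| Profile | Purpose | Default direction |
| --- | --- | --- |
| `safe_demo` | Stable runs for submission/demo | Keeps the `local` runner and the existing deterministic path |
| `agentic_demo` | Demonstrating the Claude SDK/MCP architecture | Defaults to `claude_sdk` + the `mcp_agentic` specialist runner |
| `lab_auto` | Limited repeated experiments | Uses `mcp_agentic` with lower cost/request/discovery budgets |

A profile adjusts the defaults for budget, Claude SDK max_turns, allowed_tools, and discovery turn/cost. The default scan config and dashboard scan requests use auth.auto_register=true to automate authentication, but credential storage and the LLM registration planner are disabled by default. Promotion to confirmed and inclusion in the report are decided by the replay validator and the confirmed-only reporter, regardless of profile.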
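The precedence rule (explicit config over profile defaults) amounts to a recursive merge, sketched below. The `runner` and `specialist_runner` key names and the contents of `PROFILE_DEFAULTS` are hypothetical placeholders; the real profiles also adjust budgets and tool lists, which are omitted here because their values are not given in this README.

```python
# Hypothetical defaults per profile; key names are assumptions.
PROFILE_DEFAULTS = {
    "safe_demo": {"runner": "local"},
    "agentic_demo": {"runner": "claude_sdk", "specialist_runner": "mcp_agentic"},
    "lab_auto": {"specialist_runner": "mcp_agentic"},
}


def deep_merge(base, override):
    """Return a new dict where values in override win, merging nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_config(explicit):
    """Layer explicit config values on top of the chosen profile's defaults."""
    defaults = PROFILE_DEFAULTS[explicit["run_profile"]]
    return deep_merge(defaults, explicit)
```

For example, a config that sets `run_profile` to `agentic_demo` but also sets `runner` to `local` resolves to the `local` runner while keeping the profile's other defaults.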

When a new scan is started from the dashboard without an explicit profile, agentic_demo and the claude_sdk runner are used by default. If you need stable, deterministic reproduction, specify run_profile=safe_demo&runner=local in the request payload or URL query.

Adding a new specialist: summary

Adding a new specialist typically touches the files below. sqli and authz were already added through this path and are included in the default agent list.

pentest_engine/specialists/<agent_type>.py
pentest_engine/specialists/__init__.py
configs/scan_config.yaml
tests/test_engine_specialists_registry.py
tests/test_engine_dispatch_rules.py
tests/test_engine_<agent_type>_specialist.py or tests/test_engine_specialist_harness.py

Implementation checklist:

Define agent_type, cwe, and remediation
Define the conditions for generating endpoint/flow tasks
Build the task context as structured data
Generate a safe, minimal PoC request in probe(...)
Include hypothesis, preconditions, expected_impact, and replay_steps in the candidate
Use reproduction_plan only when legacy compatibility is required
Implement a conservative oracle in validate_replay(...)
Add dispatch, probe, replay, and contract tests
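The checklist above can be pictured with the toy specialist below. Everything here is an assumption made for illustration: the `Candidate` dataclass, the `send` callable standing in for the runtime HTTP boundary, the method signatures, and the placeholder `cwe` and `remediation` values. The real contract is defined in docs/agent-authoring-guide.md.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical stand-in for the project's FindingCandidate."""
    agent_type: str
    hypothesis: str
    preconditions: list
    expected_impact: str
    replay_steps: list


class ExampleSpecialist:
    agent_type = "example_check"  # hypothetical agent type
    cwe = "CWE-000"               # placeholder, not a real CWE id
    remediation = "Placeholder remediation text."

    def probe(self, task, send):
        """Send one harmless GET with a marker and look for its reflection."""
        marker = "probe-marker-1234"
        url = f"{task['url']}?q={marker}"
        body = send("GET", url)
        if marker not in body:
            return None
        return Candidate(
            agent_type=self.agent_type,
            hypothesis="The q parameter is reflected in the response.",
            preconditions=["endpoint reachable without authentication"],
            expected_impact="Placeholder impact statement.",
            replay_steps=[{"method": "GET", "url": url, "expect": marker}],
        )

    def validate_replay(self, candidate, send):
        """Conservative oracle: confirm only if every replay step reproduces."""
        if not candidate.replay_steps:
            return "rejected"
        for step in candidate.replay_steps:
            if step["expect"] not in send(step["method"], step["url"]):
                return "rejected"
        return "confirmed"
```

Passing `send` in as a parameter lets tests substitute a fake transport, which mirrors the README's rule that specialists do not issue HTTP requests outside the runtime boundary.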

Project structure

multi-agent-pentest-framework/
├── configs/
│   └── scan_config.yaml
├── fixtures/
│   └── attack_surface_sample.json
├── docs/
│   ├── 00-vision-and-goals.md
│   ├── 01-target-architecture.md
│   ├── 02-roadmap.md
│   ├── 03-safety-model.md
│   ├── 04-mentor-feedback-alignment.md
│   ├── 05-team-work-items.md
│   ├── 06-current-limitations.md
│   ├── 07-repository-structure.md
│   ├── 08-team-progress-status.md
│   ├── 09-owasp-top10-coverage.md
│   ├── 10-validation-reporting.md
│   ├── 11-main-agentic-runtime-direction.md
│   ├── README.md
│   ├── discovery-report.md
│   └── agent-authoring-guide.md
├── pentest_engine/
│   ├── orchestrator.py
│   ├── config/
│   ├── dashboard/
│   ├── discovery/
│   ├── evidence/
│   ├── models/
│   ├── reporting/
│   ├── runtime/
│   ├── specialists/
│   └── validation/
└── tests/

Design documents

docs/00-vision-and-goals.md: product direction, reference models, north-star metrics
docs/01-target-architecture.md: target runtime architecture and agent autonomy levels
docs/02-roadmap.md: phased implementation roadmap and team workstreams
docs/03-safety-model.md: authorization, intensity levels, policy, and evidence safety model
docs/04-mentor-feedback-alignment.md: status of mentor feedback incorporation and remaining work
docs/05-team-work-items.md: implementation tasks that can be split among team members right away
docs/06-current-limitations.md: current implementation limitations and follow-up improvement points
docs/07-repository-structure.md: folder roles, ownership, and the future skills/agent_profiles structure
docs/08-team-progress-status.md: current implementation status and next tasks per workstream
docs/09-owasp-top10-coverage.md: MVP coverage against OWASP Top 10:2025 and how categories are handled differently
docs/10-validation-reporting.md: criteria for replay validation, evidence storage, the confirmed-only report, and dashboard results
docs/11-main-agentic-runtime-direction.md: state of the Claude Agent SDK/MCP-based Discovery runtime and the direction for extending the agentic layers
docs/README.md: document index and reading order
docs/discovery-report.md: draft description of the Discovery AI crawler that can be attached to a report
docs/agent-authoring-guide.md: specialist agent authoring guide

Agentic runtime direction

The most agentic layer on main today is Discovery. The default Discovery path is an AI crawler that uses the Claude Agent SDK and an in-process MCP server, in which the agent chooses Playwright-backed MCP tools to explore dynamic web surface. The Python runtime continues to own allowed scope, the pending queue, visited state, persistence, evidence redaction, and final AttackSurface normalization.

The Specialist MCP Runner has been added as the opt-in mcp_agentic path, in which the agent can make limited use of the get_task_context, get_surface_context, run_safe_http_probe, and emit_candidate MCP tools. get_task_context provides per-specialist replay hints and a per-class agent playbook, and run_safe_http_probe allows only HTTP observations that pass through the PolicyGate, request budget accounting, and the redacted EvidenceStore. A specialist agent can still produce only FindingCandidate objects. Validation Assistant and ReporterAgent have been added as opt-in explanation-assist layers. Validation Assistant can draft replay/reason explanations for rejected_candidates but does not decide status. ReporterAgent receives only the confirmed findings among validated_findings as input and can draft per-finding text, but it does not decide report inclusion.

The detailed criteria follow docs/11-main-agentic-runtime-direction.md.
