
Advanced generation of summaries for repositories. Designed for integration and ingestion by LLMs


MrCabss69/RepoGPT


RepoGPT

Abstraction, summarization and code intelligence — built for both humans and LLMs

RepoGPT turns a source-tree into a consultable abstraction layer:
structured, queryable and ready for downstream indexing or RAG pipelines.


[Collector] → [Parser] → [Processor] → [Publisher]
     |           |            |             |
   paths  CodeNode trees  optional   JSON / NDJSON / stdout
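The four stages can be pictured as plain functions chained together. The sketch below is a minimal, hypothetical illustration of that data flow, not RepoGPT's internal API: `CodeNode`, `collect`, `parse`, `process`, `publish` and `run` are names invented here, and the "parser" only counts lines instead of building a real syntax tree.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Iterable, Iterator, TextIO


@dataclass
class CodeNode:
    # Hypothetical node shape, loosely modelled on the JSON output examples.
    type: str
    name: str
    path: str
    lang: str
    metrics: dict = field(default_factory=dict)
    children: list["CodeNode"] = field(default_factory=list)


def collect(paths: Iterable[str], languages: set[str]) -> Iterator[str]:
    """Collector: keep only paths whose extension is in the whitelist."""
    for path in paths:
        ext = path.rsplit(".", 1)[-1].lower() if "." in path else ""
        if ext in languages:
            yield path


def parse(path: str, source: str) -> CodeNode:
    """Parser stand-in: one Module node per file with simple line metrics.

    Assumption: lines_of_code counts non-blank lines here.
    """
    lines = source.splitlines()
    blank = sum(1 for line in lines if not line.strip())
    name = path.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    return CodeNode(
        type="Module",
        name=name,
        path=path,
        lang=path.rsplit(".", 1)[-1].lower(),
        metrics={"lines_of_code": len(lines) - blank, "blank_lines": blank},
    )


def process(node: CodeNode) -> CodeNode:
    """Optional processor stage: the identity function in this sketch."""
    return node


def publish(nodes: Iterable[CodeNode], out: TextIO) -> None:
    """Publisher: write every node into a single JSON list."""
    json.dump([asdict(n) for n in nodes], out, indent=2)


def run(files: dict[str, str], languages: set[str], out: TextIO) -> None:
    """Chain Collector -> Parser -> Processor -> Publisher over in-memory files."""
    nodes = (process(parse(p, files[p])) for p in collect(files, languages))
    publish(nodes, out)
```

Keeping each stage a separate function is what makes the Processor optional: dropping it from the chain changes nothing else.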

  • Languages – Python (.py) & Markdown (.md) out-of-the-box.
    Extendable via plug-in parsers.
  • Outputs – hierarchical or flat, single-file JSON or streaming NDJSON.
  • Logging – powered by structlog; logs go to STDERR, so STDOUT carries only data.
  • Fail-fast – abort immediately on the first parser error if you need strict runs.
  • Ignore rules – .repogptignore (git-wildmatch syntax) plus sensible defaults (.git, node_modules, …).

Installation

git clone https://github.com/MrCabss69/RepoGPT.git
cd RepoGPT
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"           # installs structlog, pathspec, pytest, ruff…

Development and code quality

This project uses pre-commit to enforce code quality automatically:

  • Linting and auto-formatting (black, ruff)
  • Type checking (mypy)
  • Unit tests with coverage ≥80% (pytest --cov)

How do I contribute safely?

  1. Install pre-commit (once):

    pip install pre-commit
    pre-commit install

  2. Before committing, run all checks:

    pre-commit run --all-files


If any check fails, fix the code before you push or open a PR.
The CI pipeline is equally strict.


Quick start

# analyse a codebase and emit a single JSON file
repogpt path-to-project/ -o report.json

# NDJSON, one line per CodeNode (--flatten node), streamed to stdout (great for pipes)
repogpt path-to-project/ --flatten node --format ndjson --stdout | jq 'select(.type=="Class")'
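If jq is not available, the same filter takes a few lines of Python. This is a minimal sketch that assumes only what this README states about the NDJSON stream (one JSON object per line, each with a `type` field); `select_type` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def select_type(lines: Iterable[str], node_type: str) -> Iterator[dict]:
    """Yield NDJSON records whose "type" equals node_type, like jq's select()."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        if record.get("type") == node_type:
            yield record
```

Because each NDJSON line is parsed on its own, passing `sys.stdin` as `lines` processes the stream record by record without loading the whole report into memory.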

CLI reference

| Flag | Default | Description |
|---|---|---|
| --flatten {node,file} | node | node: every CodeNode appears as its own record (can explode to many lines). file: only the root node (with its tree) per file. |
| --format {json,ndjson} | json | Output container. json: a single list written to the output. ndjson: one JSON object per line (a node or a file, depending on --flatten). |
| --stdout | - | Stream to STDOUT instead of a file. Passing -o /dev/stdout has the same effect. |
| -o, --output PATH | analysis.json | Destination file (ignored if --stdout is set). |
| --languages "py,md,ts" | all parsers | Comma-separated, case-insensitive whitelist of extensions. |
| --include-tests | off | Include tests/ and test_*.py files, which are skipped by default. |
| --log-level {INFO,DEBUG} | INFO | Level of the structured logs written to STDERR. |
| --fail-fast | off | Abort on the first parser error (exit code 1). |
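The difference between `--flatten node` and `--flatten file` can be modelled as two ways of emitting a CodeNode tree. This is a hypothetical sketch: the minimal `CodeNode` below is inferred from the output examples, the pre-order traversal is an assumption about emission order, and the `Tokenizer`/`split` names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class CodeNode:
    # Hypothetical minimal node: a type, a name and nested children.
    type: str
    name: str
    children: list["CodeNode"] = field(default_factory=list)


def iter_nodes(root: CodeNode) -> Iterator[CodeNode]:
    """Pre-order walk: the node itself, then each child subtree in turn."""
    yield root
    for child in root.children:
        yield from iter_nodes(child)


def flatten(roots: list[CodeNode], mode: str) -> list[CodeNode]:
    """mode="node": every CodeNode; mode="file": one root node per file."""
    if mode == "node":
        return [node for root in roots for node in iter_nodes(root)]
    if mode == "file":
        return list(roots)
    raise ValueError(f"unknown flatten mode: {mode!r}")


module = CodeNode("Module", "utils", [
    CodeNode("Function", "extract_comments"),
    CodeNode("Class", "Tokenizer", [CodeNode("Method", "split")]),
])
```

For this single module, `node` mode yields four records while `file` mode yields one, which is why node-level output "can explode to many lines".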

Exit codes

| Code | Meaning |
|---|---|
| 0 | All requested files parsed successfully. |
| 1 | Fail-fast triggered, or an unrecoverable CLI error. |
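How fail-fast could map onto these exit codes is sketched below. It is a hypothetical illustration: `parse_all` and the logger name are invented, and this README does not say which exit code a non-strict run returns when some files fail, so the sketch's choice (log, skip, still return 0) is an assumption.

```python
import logging
from typing import Callable, Iterable

log = logging.getLogger("repogpt.sketch")  # hypothetical logger name


def parse_all(paths: Iterable[str], parse: Callable[[str], object], fail_fast: bool) -> int:
    """Parse every path and return a process exit code.

    Returns 1 as soon as a parser error occurs with fail_fast=True.
    Assumption: without fail-fast, failing files are logged, skipped, and 0 is returned.
    """
    for path in paths:
        try:
            parse(path)
        except Exception as exc:  # any parser error for this file
            first_error = f"{type(exc).__name__}: {exc}"
            if fail_fast:
                log.error('aborting (fail-fast) first_error="%s"', first_error)
                return 1
            log.warning("skip path=%s reason=%s", path, first_error)
    return 0
```

Returning early on the first error is what makes strict runs cheap: files after the failing one are never parsed.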

.repogptignore

Use the same glob syntax as .gitignore to exclude paths or files in addition to built-ins such as .git/, node_modules/, __pycache__/, etc.

# ignore generated docs
docs/build/

# ignore big assets
*.png
*.pdf
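The project lists pathspec among its dependencies, which implements the full git-wildmatch rules. The sketch below instead uses only the standard library to show the idea with deliberately simplified semantics (no `!` negation, no `**`, and `*` may cross `/`); `DEFAULT_PATTERNS`, `load_patterns` and `is_ignored` are hypothetical names, not RepoGPT's code.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical built-in defaults, taken from the ones named in this README.
DEFAULT_PATTERNS = [".git/", "node_modules/", "__pycache__/"]


def load_patterns(text: str) -> list[str]:
    """Parse .repogptignore text: drop blank lines and # comments."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Simplified gitignore-style check for a repo-relative POSIX path."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: ignore everything beneath a matching directory.
            dir_pattern = pattern.strip("/")
            if "/" in dir_pattern:
                # Contains a slash, so it is anchored: compare leading prefixes.
                prefixes = ("/".join(parts[:i]) for i in range(1, len(parts)))
                if any(fnmatchcase(p, dir_pattern) for p in prefixes):
                    return True
            elif any(fnmatchcase(part, dir_pattern) for part in parts[:-1]):
                return True
        elif "/" in pattern:
            # Anchored file pattern: match the whole relative path.
            if fnmatchcase(path, pattern.lstrip("/")):
                return True
        elif fnmatchcase(parts[-1], pattern):
            # No slash: match the file name at any depth.
            return True
    return False
```

`fnmatchcase` is used rather than `fnmatch` so matching stays case-sensitive on every platform, as git does by default.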

Output examples

1. JSON (flatten=node)

[
  {
    "id": "…",
    "type": "Module",
    "name": "utils",
    "path": "src/repogpt/utils/text_processing.py",
    "lang": "py",
    "metrics": { "lines_of_code": 180, "blank_lines": 40 },
    …
  },
  { "id": "…", "type": "Function", "name": "extract_comments", … },
  …
]

2. NDJSON (flatten=file)

{"id":"…","type":"Module", ... ,"path":"README.md","lang":"md"}
{"id":"…","type":"Module", ... ,"path":"src/repogpt/__init__.py","lang":"py"}
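The two containers differ only in how records are framed. A minimal sketch of such a writer, with `write_records` as a hypothetical name rather than RepoGPT's publisher: JSON needs the full list before it can be closed, while NDJSON can emit each record as soon as it exists.

```python
import json
from typing import Iterable, TextIO


def write_records(records: Iterable[dict], out: TextIO, fmt: str) -> None:
    """Write records as one JSON list ("json") or one object per line ("ndjson")."""
    if fmt == "json":
        json.dump(list(records), out, ensure_ascii=False, indent=2)
        out.write("\n")
    elif fmt == "ndjson":
        for record in records:
            # json.dumps escapes newlines inside strings, so each record stays on one line.
            out.write(json.dumps(record, ensure_ascii=False, separators=(",", ":")))
            out.write("\n")
    else:
        raise ValueError(f"unknown format: {fmt!r}")
```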

Logging & diagnostics

RepoGPT never mixes data and logs:

  • Data → STDOUT (--stdout) or the output file.
  • Logs → STDERR (via structlog).

Examples:

2025-05-17 18:12:07 [info ] starting run          format=ndjson repo=/path/to/repo
2025-05-17 18:12:07 [debug] skip                  path=tests/foo.py reason=ignored
2025-05-17 18:12:08 [error] aborting — fail-fast  first_error="SyntaxError: invalid syntax"

In tests, capture these logs with pytest’s caplog; in CI, redirect STDERR to a file.
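RepoGPT uses structlog for this; the sketch below uses only the standard library `logging` module to illustrate the same separation principle. `KeyValueFormatter`, `stream_records` and the logger name are hypothetical, and the output format only loosely imitates the log lines above (no timestamps). In real use, `data_out` would be `sys.stdout` and `log_out` `sys.stderr`.

```python
import logging
from typing import TextIO


class KeyValueFormatter(logging.Formatter):
    """Render '[level] event key=value ...' from an optional 'fields' extra."""

    def format(self, record: logging.LogRecord) -> str:
        fields = getattr(record, "fields", {})
        pairs = " ".join(f"{k}={v}" for k, v in fields.items())
        return f"[{record.levelname.lower()}] {record.getMessage()} {pairs}".rstrip()


def stream_records(records: list[str], data_out: TextIO, log_out: TextIO) -> None:
    """Write data lines to data_out and log lines to log_out, never mixing them."""
    logger = logging.getLogger("repogpt.sketch.stream")  # hypothetical logger name
    handler = logging.StreamHandler(log_out)
    handler.setFormatter(KeyValueFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records away from any root handlers
    try:
        logger.info("starting run", extra={"fields": {"format": "ndjson"}})
        for line in records:
            data_out.write(line + "\n")
        logger.info("finished", extra={"fields": {"records": len(records)}})
    finally:
        logger.removeHandler(handler)
```

Because the two streams never share a handle, piping STDOUT into jq keeps working even at DEBUG log level.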


Development

ruff check .
mypy src/
pytest -q

Project layout

src/repogpt/
├── adapters/
│   ├── collector/      # filesystem traversal & ignore logic
│   ├── parser/         # language-specific parsers → CodeNode trees
│   ├── pipeline/       # glue + processors
│   └── publisher/      # JSON/NDJSON writer
├── core/               # service + clean-architecture ports
├── utils/              # file & text helpers
└── app/cli.py          # entry-point

Extending to another language

  1. Create src/repogpt/adapters/parser/<lang>_parser.py implementing parse() → CodeNode.
  2. Register it in adapters/parser/__init__.py.
  3. Add extension to docs and tests.
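The steps above can be sketched as a parser that returns a CodeNode plus a registry keyed by extension. Everything here is hypothetical: the `Parser` protocol, the `register` decorator, `parse_file`, the toy `IniParser` and its "Section" node type are invented for illustration, and RepoGPT's real `parse()` signature and registration in `adapters/parser/__init__.py` may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


@dataclass
class CodeNode:
    # Hypothetical minimal node shape.
    type: str
    name: str
    path: str
    lang: str
    children: list["CodeNode"] = field(default_factory=list)


class Parser(Protocol):
    def parse(self, path: str, source: str) -> CodeNode: ...


# Hypothetical registry: lower-case extension -> parser instance.
PARSERS: dict[str, Parser] = {}


def register(ext: str) -> Callable[[type], type]:
    """Class decorator that instantiates a parser and registers it for ext."""
    def decorator(cls: type) -> type:
        PARSERS[ext.lower()] = cls()
        return cls
    return decorator


@register("ini")
class IniParser:
    """Toy parser: one Module node with a Section child per [section] header."""

    def parse(self, path: str, source: str) -> CodeNode:
        name = path.rsplit("/", 1)[-1].rsplit(".", 1)[0]
        root = CodeNode("Module", name, path, "ini")
        for line in source.splitlines():
            line = line.strip()
            if line.startswith("[") and line.endswith("]"):
                root.children.append(CodeNode("Section", line[1:-1], path, "ini"))
        return root


def parse_file(path: str, source: str) -> CodeNode:
    """Dispatch on the (case-insensitive) extension; fail if no parser exists."""
    ext = path.rsplit(".", 1)[-1].lower()
    if ext not in PARSERS:
        raise ValueError(f"no parser registered for .{ext}")
    return PARSERS[ext].parse(path, source)
```

Lower-casing the extension on both registration and lookup mirrors the case-insensitive `--languages` whitelist described in the CLI reference.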

Tests

pytest                                 # full unit/integration suite
pytest tests/test_phase3.py -q         # logging & fail-fast happy-path

The suite exercises:

  • Collect / ignore rules
  • Markdown & Python parsers (fixtures under tests/data/)
  • NDJSON vs JSON writer
  • Fail-fast & debug logging

Roadmap

  • Phase 4 – caching by file-hash + parallel workers
  • Phase 5 – CI (ruff + mypy + pytest), release to PyPI
  • Phase 6 – plug-in entry-points for custom processors & new languages
  • Phase 7 – optional HTML / graph visualizer

License

MIT

Happy hacking 💻
