SlurpAI is a CLI tool that scrapes documentation websites and compiles them into clean markdown files. Including relevant docs in your AI context helps coding agents make fewer mistakes and hallucinate less.
*Before: React docs website → After: clean markdown file*
- Configurable spider — starts from any URL and follows internal links. See Configuration for filtering and tuning options.
- Content extraction — strips navigation, sidebars, footers, and other noise, keeping only the documentation content.
- Flexible output — compiles pages into a single markdown file or keeps them separate.
- Fast and lightweight — async scraping with configurable concurrency. No external services required.
- No AI used — pure Node.js scraping. SlurpAI is for AI; it doesn't use AI.
```bash
npm install -g slurp-ai
```

Prerequisites: Node.js v20 or later.

Windows: Works natively. Installing via npm automatically generates the `slurp` command wrappers.
```bash
# Scrape documentation from any URL
slurp https://expressjs.com/en/4.18/

# With base path filtering (only follow links under /docs/)
slurp https://example.com/docs/introduction --base-path https://example.com/docs/
```

Under the hood, SlurpAI:

- Starts at the provided URL and discovers internal links
- Scrapes each page, converting HTML to clean markdown
- Removes navigation, headers, footers, and duplicate content
- Compiles everything into a single file in `slurps/` (e.g., `expressjs_docs.md`)
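As a rough illustration of that loop, here is a minimal crawl-and-convert sketch. It assumes the `cheerio` and `turndown` npm packages and is not SlurpAI's actual implementation:

```js
// Illustrative sketch only — not SlurpAI's internals.
// Assumes: npm install cheerio turndown (Node.js v20+ for global fetch).
import * as cheerio from 'cheerio';
import TurndownService from 'turndown';

const turndown = new TurndownService();

async function crawl(startUrl, basePath, maxPages = 100) {
  const queue = [startUrl];
  const seen = new Set(queue);
  const pages = [];

  while (queue.length > 0 && pages.length < maxPages) {
    const url = queue.shift();
    const html = await (await fetch(url)).text();
    const $ = cheerio.load(html);

    // Strip obvious chrome before converting to markdown
    $('nav, header, footer, aside').remove();
    pages.push(turndown.turndown($('body').html() ?? ''));

    // Queue internal links that share the base path
    for (const el of $('a[href]').toArray()) {
      const link = new URL($(el).attr('href'), url).href;
      if (link.startsWith(basePath) && !seen.has(link)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return pages.join('\n\n---\n\n'); // the compiled file's content
}
```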
Customize behavior by modifying `config.js` in the project root (a full example sketch follows the tables below):
| Property | Default | Description |
|---|---|---|
| `inputDir` | `slurps_partials` | Directory for intermediate scraped markdown files |
| `outputDir` | `slurps` | Directory for the final compiled markdown file |
| `basePath` | `<targetUrl>` | Base path used for link filtering (if specified) |
| Property | Default | Description |
|---|---|---|
| `maxPagesPerSite` | `100` | Maximum pages to scrape per site (`0` for unlimited) |
| `concurrency` | `25` | Number of pages to process concurrently |
| `retryCount` | `3` | Number of times to retry failed requests |
| `retryDelay` | `1000` | Delay between retries in milliseconds |
| `useHeadless` | `false` | Use a headless browser for JS-rendered sites |
| `timeout` | `60000` | Request timeout in milliseconds |
| Property | Default | Description |
|---|---|---|
| `enforceBasePath` | `true` | Only follow links starting with the effective `basePath` |
| `preserveQueryParams` | `['version', 'lang', 'theme']` | Query parameters to preserve when normalizing URLs |
| Property | Default | Description |
|---|---|---|
| `preserveMetadata` | `true` | Preserve metadata blocks in markdown |
| `removeNavigation` | `true` | Remove navigation elements from content |
| `removeDuplicates` | `true` | Attempt to remove duplicate content sections |
| `similarityThreshold` | `0.9` | Threshold for considering content sections duplicates |
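Put together, a `config.js` using the defaults above might look like this (the export shape is an assumption; treat the file shipped with the project as the source of truth):

```js
// Sketch of config.js with the documented defaults.
// The export style (ESM vs. CommonJS) is an assumption.
export default {
  inputDir: 'slurps_partials',  // intermediate scraped markdown
  outputDir: 'slurps',          // final compiled markdown
  // basePath defaults to the target URL unless overridden
  maxPagesPerSite: 100,         // 0 = unlimited
  concurrency: 25,
  retryCount: 3,
  retryDelay: 1000,             // ms between retries
  useHeadless: false,           // enable for JS-rendered sites
  timeout: 60000,               // request timeout in ms
  enforceBasePath: true,
  preserveQueryParams: ['version', 'lang', 'theme'],
  preserveMetadata: true,
  removeNavigation: true,
  removeDuplicates: true,
  similarityThreshold: 0.9,
};
```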
The URL argument is the starting point. The `--base-path` flag defines a prefix for filtering which links to follow.

```bash
# Only scrape /docs/ pages, but start from the introduction
slurp https://example.com/docs/introduction --base-path https://example.com/docs/
```

Links like `https://example.com/docs/advanced` are followed; `https://example.com/blog/post` is ignored.
SlurpAI is a lightweight alternative to tools like Context7. Rather than pulling large doc bundles automatically, SlurpAI lets you manually curate the docs you need and include them only when relevant. Less context means fewer mistakes during implementation.
SlurpAI MCP is in testing and included in this release.
Issues and pull requests welcome!
- Report issues: https://github.com/ratacat/slurp-ai/issues
- Repository: https://github.com/ratacat/slurp-ai
ISC