SlurpAI is a CLI tool that scrapes documentation websites and compiles them into clean markdown files. Including relevant docs in your AI context helps coding agents make fewer mistakes and hallucinate less.
*Before: React docs website → After: clean markdown file*
- Configurable spider — starts from any URL and follows internal links. See Configuration for filtering and tuning options.
- Content extraction — strips navigation, sidebars, footers, and other noise, keeping only the documentation content.
- Flexible output — compiles pages into a single markdown file or keeps them separate.
- Fast and lightweight — async scraping with configurable concurrency. No external services required.
- No AI used — pure Node.js scraping. SlurpAI is for AI; it doesn't use AI.
```bash
npm install -g slurp-ai
```

Prerequisites: Node.js v20 or later.

Windows: Works natively. Installing via npm automatically generates the `slurp` command wrappers.
```bash
# Scrape documentation from any URL
slurp https://expressjs.com/en/4.18/

# With base path filtering (only follow links under /docs/)
slurp https://example.com/docs/introduction --base-path https://example.com/docs/
```

Under the hood, SlurpAI:

- Starts at the provided URL and discovers internal links
- Scrapes each page, converting HTML to clean markdown
- Removes navigation, headers, footers, and duplicate content
- Compiles everything into a single file in `slurps/` (e.g., `expressjs_docs.md`)
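As a rough illustration of that loop, here is a minimal crawl-and-convert sketch. It assumes the `cheerio` and `turndown` npm packages and is not SlurpAI's actual implementation:

```js
// Illustrative sketch only — not SlurpAI's internals.
// Assumes: npm install cheerio turndown (Node.js v20+ for global fetch).
import * as cheerio from 'cheerio';
import TurndownService from 'turndown';

const turndown = new TurndownService();

async function crawl(startUrl, basePath, maxPages = 100) {
  const queue = [startUrl];
  const seen = new Set(queue);
  const pages = [];

  while (queue.length > 0 && pages.length < maxPages) {
    const url = queue.shift();
    const html = await (await fetch(url)).text();
    const $ = cheerio.load(html);

    // Strip obvious chrome before converting to markdown
    $('nav, header, footer, aside').remove();
    pages.push(turndown.turndown($('body').html() ?? ''));

    // Queue internal links that share the base path
    for (const el of $('a[href]').toArray()) {
      const link = new URL($(el).attr('href'), url).href;
      if (link.startsWith(basePath) && !seen.has(link)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return pages.join('\n\n---\n\n'); // the compiled file's content
}
```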
Customize behavior by modifying `config.js` in the project root (a full example sketch follows the tables below):
| Property | Default | Description |
|---|---|---|
| `inputDir` | `slurps_partials` | Directory for intermediate scraped markdown files |
| `outputDir` | `slurps` | Directory for the final compiled markdown file |
| `basePath` | `<targetUrl>` | Base path used for link filtering (if specified) |
| Property | Default | Description |
|---|---|---|
| `maxPagesPerSite` | `100` | Maximum pages to scrape per site (`0` for unlimited) |
| `concurrency` | `25` | Number of pages to process concurrently |
| `retryCount` | `3` | Number of times to retry failed requests |
| `retryDelay` | `1000` | Delay between retries in milliseconds |
| `useHeadless` | `false` | Use a headless browser for JS-rendered sites |
| `timeout` | `60000` | Request timeout in milliseconds |
| Property | Default | Description |
|---|---|---|
| `enforceBasePath` | `true` | Only follow links starting with the effective `basePath` |
| `preserveQueryParams` | `['version', 'lang', 'theme']` | Query parameters to preserve when normalizing URLs |
| Property | Default | Description |
|---|---|---|
| `preserveMetadata` | `true` | Preserve metadata blocks in markdown |
| `removeNavigation` | `true` | Remove navigation elements from content |
| `removeDuplicates` | `true` | Attempt to remove duplicate content sections |
| `similarityThreshold` | `0.9` | Threshold for considering content sections duplicates |
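Put together, a `config.js` using the defaults above might look like this (the export shape is an assumption; treat the file shipped with the project as the source of truth):

```js
// Sketch of config.js with the documented defaults.
// The export style (ESM vs. CommonJS) is an assumption.
export default {
  inputDir: 'slurps_partials',  // intermediate scraped markdown
  outputDir: 'slurps',          // final compiled markdown
  // basePath defaults to the target URL unless overridden
  maxPagesPerSite: 100,         // 0 = unlimited
  concurrency: 25,
  retryCount: 3,
  retryDelay: 1000,             // ms between retries
  useHeadless: false,           // enable for JS-rendered sites
  timeout: 60000,               // request timeout in ms
  enforceBasePath: true,
  preserveQueryParams: ['version', 'lang', 'theme'],
  preserveMetadata: true,
  removeNavigation: true,
  removeDuplicates: true,
  similarityThreshold: 0.9,
};
```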
The URL argument is the starting point. The `--base-path` flag defines a prefix for filtering which links to follow.

```bash
# Only scrape /docs/ pages, but start from the introduction
slurp https://example.com/docs/introduction --base-path https://example.com/docs/
```

Links like `https://example.com/docs/advanced` are followed; `https://example.com/blog/post` is ignored.
SlurpAI is a lightweight alternative to tools like Context7. Rather than pulling large doc bundles automatically, SlurpAI lets you manually curate the docs you need and include them only when relevant. Less context means fewer mistakes during implementation.
SlurpAI MCP is in testing and included in this release.
Issues and pull requests welcome!
- Report issues: https://github.com/ratacat/slurp-ai/issues
- Repository: https://github.com/ratacat/slurp-ai
ISC