

Tool for scraping and consolidating documentation websites into a single MD file.


Mattallmighty/slurp-ai

SlurpAI


Convert entire documentation sites into AI-ready markdown

SlurpAI is a CLI tool that scrapes documentation websites and compiles them into clean markdown files. Including the relevant docs in your AI context helps coding agents make fewer mistakes and hallucinate less.

Before: the React documentation website. After: the same content as a clean markdown file.

Convert a Documentation Site to Markdown

  • Configurable spider: starts from any URL and follows internal links. See Configuration (Optional) for filtering and tuning options.
  • Content extraction: strips navigation, sidebars, footers, and other noise, keeping only the documentation content.
  • Flexible output: compiles pages into a single markdown file or keeps them separate.
  • Fast and lightweight: async scraping with configurable concurrency. No external services required.
  • No AI used: pure Node.js scraping. SlurpAI is built for AI but does not use AI itself.

Installation

npm install -g slurp-ai

Prerequisites: Node.js v20 or later

Windows: Works natively. Installing via npm automatically generates the slurp command wrappers.

Usage

# Scrape documentation from any URL
slurp https://expressjs.com/en/4.18/

# With base path filtering (only follow links under /docs/)
slurp https://example.com/docs/introduction --base-path https://example.com/docs/

What Happens

  1. Starts at the provided URL and discovers internal links
  2. Scrapes each page, converting HTML to clean markdown
  3. Removes navigation, headers, footers, and duplicate content
  4. Compiles everything into a single file in slurps/ (e.g., expressjs_docs.md)
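Steps 1 and 2 amount to a breadth-first crawl over internal links. The sketch below illustrates that idea in plain Node.js; it is not SlurpAI's source. The `crawl` function and the in-memory `site` map (URL to outgoing links) are hypothetical stand-ins for real HTTP fetching and HTML link extraction. The `maxPages` argument mirrors the maxPagesPerSite setting, including 0 meaning unlimited.

```javascript
// Minimal sketch of a breadth-first crawl over internal links.
// ASSUMPTION: `site` is a hypothetical in-memory map of URL -> outgoing hrefs,
// standing in for fetching each page and extracting its links.
function crawl(startUrl, site, maxPages = 100) {
  const start = new URL(startUrl);
  start.hash = "";
  const origin = start.origin;
  const visited = new Set();
  const queue = [start.href];
  const order = [];

  while (queue.length > 0) {
    if (maxPages !== 0 && order.length >= maxPages) break; // 0 means unlimited
    const url = queue.shift();
    if (visited.has(url)) continue; // a URL may be queued twice; scrape it once
    visited.add(url);
    order.push(url);

    for (const href of site[url] || []) {
      // Resolve relative hrefs and drop #fragments so each page has one key.
      const next = new URL(href, url);
      next.hash = "";
      if (next.origin === origin && !visited.has(next.href)) {
        queue.push(next.href); // internal links only
      }
    }
  }
  return order;
}

// Example with a hypothetical three-page site plus one external link.
const demoSite = {
  "https://example.com/docs/": ["intro", "api#methods", "https://other.org/x"],
  "https://example.com/docs/intro": ["/docs/api"],
  "https://example.com/docs/api": ["/docs/"],
};
console.log(crawl("https://example.com/docs/", demoSite));
```

In this sketch the external link is skipped, `api#methods` collapses to the same page as `/docs/api`, and each page is visited once, in discovery order.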

Configuration (Optional)

Customize behavior by modifying config.js in the project root:

File System Paths

Property | Default | Description
--- | --- | ---
inputDir | slurps_partials | Directory for intermediate scraped markdown files
outputDir | slurps | Directory for the final compiled markdown file
basePath | <targetUrl> | Base path used for link filtering (if specified)
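The two directories suggest a two-stage flow: per-page partials land in inputDir, and a compile step merges them into one file in outputDir. Here is a minimal sketch of that merge step using only Node's built-in `fs` and `path`. The `compilePartials` name, the sort-by-filename order, and the `---` separator are assumptions made for illustration. The real compile step also applies the Markdown Compilation options listed below, which this sketch omits.

```javascript
// Sketch: concatenate per-page partials (inputDir) into one file (outputDir).
// ASSUMPTIONS: partials are *.md files ordered by filename, sections are joined
// with a horizontal rule, and the output filename is supplied by the caller.
const fs = require("node:fs");
const path = require("node:path");

function compilePartials(inputDir, outputDir, outputName) {
  const files = fs
    .readdirSync(inputDir)
    .filter((f) => f.endsWith(".md")) // ignore non-markdown files
    .sort();
  const sections = files.map((f) =>
    fs.readFileSync(path.join(inputDir, f), "utf8").trim()
  );
  fs.mkdirSync(outputDir, { recursive: true }); // create outputDir if missing
  const outPath = path.join(outputDir, outputName);
  fs.writeFileSync(outPath, sections.join("\n\n---\n\n") + "\n");
  return outPath;
}
```

For example, `compilePartials("slurps_partials", "slurps", "expressjs_docs.md")` would write `slurps/expressjs_docs.md`.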

Web Scraping Settings

Property | Default | Description
--- | --- | ---
maxPagesPerSite | 100 | Maximum pages to scrape per site (0 for unlimited)
concurrency | 25 | Number of pages to process concurrently
retryCount | 3 | Number of times to retry failed requests
retryDelay | 1000 | Delay between retries, in milliseconds
useHeadless | false | Use a headless browser for JS-rendered sites
timeout | 60000 | Request timeout, in milliseconds
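Together, concurrency, retryCount, and retryDelay describe a capped worker pool with retries. The sketch below shows that pattern in plain Node.js. It is not SlurpAI's implementation: `mapConcurrent`, `withRetry`, and `fetchPage` are hypothetical names, and timeout handling is left out to keep it short.

```javascript
// Sketch: run async tasks with a concurrency cap, retrying failures.
// ASSUMPTION: option names mirror the table above; this is an illustration,
// not SlurpAI's source.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry(fn, retryCount, retryDelay) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retryCount) throw err; // retries exhausted
      await sleep(retryDelay); // wait before the next attempt
    }
  }
}

async function mapConcurrent(items, worker, options = {}) {
  const { concurrency = 25, retryCount = 3, retryDelay = 1000 } = options;
  const results = new Array(items.length);
  let next = 0;
  // Each runner claims the next unprocessed index until none are left,
  // so at most `concurrency` workers are in flight at once.
  async function runner() {
    while (next < items.length) {
      const i = next++;
      results[i] = await withRetry(() => worker(items[i]), retryCount, retryDelay);
    }
  }
  const count = Math.min(concurrency, items.length);
  await Promise.all(Array.from({ length: count }, () => runner()));
  return results;
}

// Usage, with a hypothetical fetchPage(url) that returns a Promise:
// mapConcurrent(urls, fetchPage, { concurrency: 25 }).then(console.log);
```

In this sketch, retryCount: 3 means up to four attempts per page: one initial try plus three retries.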

URL Filtering

Property | Default | Description
--- | --- | ---
enforceBasePath | true | Only follow links starting with the effective basePath
preserveQueryParams | ['version', 'lang', 'theme'] | Query parameters to preserve when normalizing URLs
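preserveQueryParams implies that other query parameters are stripped during URL normalization, so tracking parameters do not produce duplicate pages. Below is a minimal sketch of that idea using the WHATWG `URL` and `URLSearchParams` globals built into Node. The `normalizeUrl` name is hypothetical. Dropping the `#fragment` and sorting the kept parameters are also assumptions, so that `?lang=en&version=4` and `?version=4&lang=en` normalize to the same key.

```javascript
// Sketch of URL normalization for de-duplicating crawl targets.
// ASSUMPTIONS: fragments are dropped, only allow-listed query parameters
// survive, and survivors are sorted by name for a stable key.
function normalizeUrl(rawUrl, preserveQueryParams = ["version", "lang", "theme"]) {
  const url = new URL(rawUrl);
  url.hash = "";
  const kept = [...url.searchParams]
    .filter(([key]) => preserveQueryParams.includes(key))
    .sort(([a], [b]) => a.localeCompare(b));
  url.search = new URLSearchParams(kept).toString(); // "" removes the "?"
  return url.href;
}

console.log(normalizeUrl("https://example.com/docs/api?utm_source=x&version=4&lang=en#install"));
```

Here the `utm_source` parameter and the `#install` fragment are removed, while `lang` and `version` are kept.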

Markdown Compilation

Property | Default | Description
--- | --- | ---
preserveMetadata | true | Preserve metadata blocks in markdown
removeNavigation | true | Remove navigation elements from content
removeDuplicates | true | Attempt to remove duplicate content sections
similarityThreshold | 0.9 | Threshold for considering content sections duplicates
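removeDuplicates and similarityThreshold suggest that sections are compared pairwise and dropped when they are similar enough. The README does not say which similarity measure SlurpAI uses. The sketch below swaps in Jaccard similarity over lowercase word sets as one plausible choice. The function names are hypothetical, and splitting on markdown headings is an assumption.

```javascript
// Sketch: drop markdown sections that are near-duplicates of an earlier one.
// ASSUMPTIONS: sections start at ATX headings, similarity is Jaccard overlap of
// lowercase word sets (SlurpAI's actual measure may differ), and heading-like
// lines inside code fences are not special-cased.
function wordSet(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) || []);
}

function jaccard(a, b) {
  if (a.size === 0 && b.size === 0) return 1; // two empty sections match
  let shared = 0;
  for (const w of a) if (b.has(w)) shared++;
  return shared / (a.size + b.size - shared); // |A ∩ B| / |A ∪ B|
}

function removeDuplicateSections(markdown, similarityThreshold = 0.9) {
  // Split before each heading line so every chunk keeps its own heading.
  const sections = markdown.split(/\n(?=#{1,6} )/);
  const kept = [];
  const seen = [];
  for (const section of sections) {
    const words = wordSet(section);
    const isDuplicate = seen.some((prev) => jaccard(words, prev) >= similarityThreshold);
    if (!isDuplicate) {
      kept.push(section);
      seen.push(words);
    }
  }
  return kept.join("\n");
}
```

A lower threshold removes more aggressively. With 0.9, only sections that share nearly all of their words are treated as duplicates, such as a footer repeated on every scraped page.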

Base Path Explained

The URL argument is the starting point. The --base-path flag sets the URL prefix that a link must match to be followed; if it is omitted, the starting URL is used as the base path.

# Only scrape /docs/ pages, but start from the introduction
slurp https://example.com/docs/introduction --base-path https://example.com/docs/

Links like https://example.com/docs/advanced are followed; https://example.com/blog/post is ignored.
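The rule above can be read as a string-prefix test on absolute URLs. The sketch below illustrates that reading in plain Node.js. The `shouldFollow` name is hypothetical, and restricting to same-origin links when enforceBasePath is false is an assumption based on the "internal links" wording earlier in this README.

```javascript
// Sketch of base-path link filtering, assuming a plain prefix check on
// absolute URLs. Not SlurpAI's source; names are illustrative.
function shouldFollow(link, pageUrl, basePath, enforceBasePath = true) {
  const target = new URL(link, pageUrl); // resolve relative hrefs
  target.hash = "";
  const base = new URL(basePath);
  if (target.origin !== base.origin) return false; // internal links only
  return !enforceBasePath || target.href.startsWith(base.href);
}

const page = "https://example.com/docs/introduction";
const base = "https://example.com/docs/";
console.log(shouldFollow("advanced", page, base));   // resolves to /docs/advanced
console.log(shouldFollow("/blog/post", page, base)); // outside /docs/
```

Because the sketch uses a pure prefix check, a base path without a trailing slash, such as https://example.com/docs, would also match https://example.com/docs-old/. Including the trailing slash, as in the example above, avoids that.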

Alternative to Context7

SlurpAI is a lightweight alternative to tools like Context7. Rather than pulling in large documentation bundles automatically, SlurpAI lets you curate the docs you need by hand and include them only when they are relevant. Keeping the context small and focused helps reduce mistakes during implementation.

MCP Server Integration

The SlurpAI MCP server is included in this release but is still in testing.

Contributing

Issues and pull requests welcome!

License

ISC
