

AnyCrawl

Fast, scalable web scraping, site crawling, and SERP extraction, with multi-threading, multi-process execution, and batch tasks.


📖 Overview

AnyCrawl is a high‑performance crawling and scraping toolkit:

  • SERP crawling: multiple search engines, batch‑friendly
  • Web scraping: single‑page content extraction
  • Site crawling: full‑site traversal and collection
  • High performance: multi‑threading / multi‑process
  • Batch tasks: reliable and efficient
  • AI extraction: LLM‑powered structured data (JSON) extraction from pages

LLM‑friendly. Easy to integrate and use.

🚀 Quick Start

📖 See full docs: Docs

Generate an API Key (self-host)

If you enable authentication (ANYCRAWL_API_AUTH_ENABLED=true), generate an API key:

pnpm --filter api key:generate
# optionally name the key
pnpm --filter api key:generate -- default

The command prints the key's uuid, the key itself, and its credits. Use the printed key as a Bearer token in the Authorization header.

Run Inside Docker

If running AnyCrawl via Docker:

  • Docker Compose:
docker compose exec api pnpm --filter api key:generate
docker compose exec api pnpm --filter api key:generate -- default
  • Single container (replace <container_name_or_id>):
docker exec -it <container_name_or_id> pnpm --filter api key:generate
docker exec -it <container_name_or_id> pnpm --filter api key:generate -- default

📚 Usage Examples

💡 Use the Playground to test APIs and generate code in your preferred language.

If self‑hosting, replace https://api.anycrawl.dev with your own server URL.

Web Scraping (Scrape)

Example

curl -X POST https://api.anycrawl.dev/v1/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_ANYCRAWL_API_KEY' \
  -d '{
  "url": "https://example.com",
  "engine": "cheerio"
}'

Parameters

  • url (string, required): The URL to be scraped. Must be a valid URL starting with http:// or https://
  • engine (string): Scraping engine to use. Options: cheerio (static HTML parsing, fastest), playwright (JavaScript rendering with a modern engine), puppeteer (JavaScript rendering with Chrome). Default: cheerio
  • proxy (string): Proxy URL for the request. Supports HTTP and SOCKS proxies. Format: http://[username]:[password]@proxy:port. Default: (none)

More parameters: see Request Parameters.
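
If you call the API from Node.js/TypeScript instead of curl, a minimal sketch using the built-in fetch (Node 18+, run as an ES module) looks like the following. ANYCRAWL_API_KEY is an assumed environment variable holding your key, and the response is simply printed because its exact shape depends on the options you request.

// Minimal sketch: POST /v1/scrape from Node.js 18+ (ES module, built-in fetch).
// ANYCRAWL_API_KEY is an assumed environment variable holding your API key.
const scrapeResponse = await fetch("https://api.anycrawl.dev/v1/scrape", {
    method: "POST",
    headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.ANYCRAWL_API_KEY}`,
    },
    body: JSON.stringify({
        url: "https://example.com",
        engine: "cheerio",
    }),
});
if (!scrapeResponse.ok) {
    throw new Error(`Scrape failed: ${scrapeResponse.status}`);
}
console.log(await scrapeResponse.json());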

LLM Extraction

curl -X POST "https://api.anycrawl.dev/v1/scrape" \
  -H "Authorization: Bearer YOUR_ANYCRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "json_options": {
      "schema": {
        "type": "object",
        "properties": {
          "company_mission": { "type": "string" },
          "is_open_source": { "type": "boolean" },
          "employee_count": { "type": "number" }
        },
        "required": ["company_mission"]
      }
    }
  }'
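
The same schema-based extraction can be requested from TypeScript. This is a sketch assuming the json_options body from the curl example above; the extracted result is left untyped because its fields are defined by your schema.

// Sketch: schema-based extraction via json_options (Node.js 18+, ES module).
const extractionResponse = await fetch("https://api.anycrawl.dev/v1/scrape", {
    method: "POST",
    headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.ANYCRAWL_API_KEY}`, // assumed env var
    },
    body: JSON.stringify({
        url: "https://example.com",
        json_options: {
            schema: {
                type: "object",
                properties: {
                    company_mission: { type: "string" },
                    is_open_source: { type: "boolean" },
                    employee_count: { type: "number" },
                },
                required: ["company_mission"],
            },
        },
    }),
});
console.log(await extractionResponse.json());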

Site Crawling (Crawl)

Example

curl -X POST https://api.anycrawl.dev/v1/crawl \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_ANYCRAWL_API_KEY' \
  -d '{
  "url": "https://example.com",
  "engine": "playwright",
  "max_depth": 2,
  "limit": 10,
  "strategy": "same-domain"
}'

Parameters

  • url (string, required): Starting URL to crawl
  • engine (string): Crawling engine. Options: cheerio, playwright, puppeteer. Default: cheerio
  • max_depth (number): Maximum depth from the start URL. Default: 10
  • limit (number): Maximum number of pages to crawl. Default: 100
  • strategy (enum): Crawl scope. Options: all, same-domain, same-hostname, same-origin. Default: same-domain
  • include_paths (array): Only crawl paths matching these patterns. Default: (none)
  • exclude_paths (array): Skip paths matching these patterns. Default: (none)
  • scrape_options (object): Per-page scrape options (formats, timeout, JSON extraction, etc.), same as the Scrape options. Default: (none)

More parameters and endpoints: see Request Parameters.
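
A crawl can be started from TypeScript with the same fetch pattern. This is a minimal sketch; the response is printed as-is because the job/result payload returned by /v1/crawl is not detailed here.

// Sketch: start a site crawl via POST /v1/crawl (Node.js 18+, ES module).
const crawlResponse = await fetch("https://api.anycrawl.dev/v1/crawl", {
    method: "POST",
    headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.ANYCRAWL_API_KEY}`, // assumed env var
    },
    body: JSON.stringify({
        url: "https://example.com",
        engine: "playwright",
        max_depth: 2,
        limit: 10,
        strategy: "same-domain",
    }),
});
console.log(await crawlResponse.json());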

Search Engine Results (SERP)

Example

curl -X POST https://api.anycrawl.dev/v1/search \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_ANYCRAWL_API_KEY' \
  -d '{
  "query": "AnyCrawl",
  "limit": 10,
  "engine": "google",
  "lang": "all"
}'

Parameters

  • query (string, required): Search query to be executed
  • engine (string): Search engine to use. Options: google. Default: google
  • pages (integer): Number of search result pages to retrieve. Default: 1
  • lang (string): Language code for search results (e.g., 'en', 'zh', 'all'). Default: en-US

Supported search engines

  • Google
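
A search request follows the same pattern as the other endpoints. The sketch below reuses the request body from the curl example and prints whatever result list the API returns.

// Sketch: run a SERP query via POST /v1/search (Node.js 18+, ES module).
const searchResponse = await fetch("https://api.anycrawl.dev/v1/search", {
    method: "POST",
    headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.ANYCRAWL_API_KEY}`, // assumed env var
    },
    body: JSON.stringify({
        query: "AnyCrawl",
        limit: 10,
        engine: "google",
        lang: "all",
    }),
});
console.log(await searchResponse.json());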

❓ FAQ

  1. Can I use proxies? Yes. AnyCrawl ships with a high‑quality default proxy. You can also configure your own: set the proxy request parameter (per request) or ANYCRAWL_PROXY_URL (self‑hosting); a sketch follows this list.
  2. How do I handle JavaScript‑rendered pages? Use the Playwright or Puppeteer engines.
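
For the per-request option, the proxy parameter is passed alongside url in the scrape body. A minimal sketch follows; the proxy URL and credentials are placeholders.

// Sketch: per-request proxy, passed as the `proxy` parameter of a scrape request.
// The proxy URL below is a placeholder; HTTP and SOCKS proxies are supported.
const proxiedScrape = await fetch("https://api.anycrawl.dev/v1/scrape", {
    method: "POST",
    headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.ANYCRAWL_API_KEY}`, // assumed env var
    },
    body: JSON.stringify({
        url: "https://example.com",
        engine: "cheerio",
        proxy: "http://user:pass@proxy.example.com:8080",
    }),
});
console.log(await proxiedScrape.json());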

🤝 Contributing

We welcome contributions! See the Contributing Guide.

📄 License

MIT License — see LICENSE.

🎯 Mission

We build simple, reliable, and scalable tools for the AI ecosystem.


Built with ❤️ by the Any4AI team
