🍳 Thordata Cookbook

End-to-end examples, scripts, and notebooks built on top of the Thordata Python SDK.

These recipes show how to combine:

- Thordata SERP API & Universal Scraper
- Thordata Web Scraper API
- Modern AI tools (OpenAI, LangChain, MCP)
- Data tools (Pandas, BeautifulSoup)

Python 3.11+ is recommended.

🧾 Recipe Index

Built-in Notebooks & Scripts

| Recipe | Type | Path | Description |
| --- | --- | --- | --- |
| Web Q&A Agent | Notebook | `notebooks/ai/web_qa_agent_with_thordata.ipynb` | Ask questions, search SERP, scrape pages, and let an LLM answer with citations. Supports live and offline modes. |
| GitHub Repo Intel | Notebook | `notebooks/devtools/github_repo_intel.ipynb` | Use Web Scraper API spiders to collect GitHub repository metadata (stars, forks, issues) and analyze it with Pandas. |
| OpenAI Research RAG | Notebook | `notebooks/rag/rag_openai_research.ipynb` | Scrape dynamic pages, clean HTML, and export a Markdown knowledge base for RAG systems. |
| RAG Data Pipeline | Script | `scripts/rag_data_pipeline.py` | CLI version of the RAG preparation pipeline (Scrape → Clean → Markdown), with flags for URL, country, and JS rendering. |
| MCP Tools for LLMs | Script | `scripts/mcp_server.py` | Model Context Protocol (MCP) server exposing `search_web`, `search_news`, and `read_website` to Claude Desktop or other LLM clients. |
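
The Clean → Markdown steps named in the RAG recipes can be sketched roughly as follows. This is an illustration only, not the cookbook's implementation: the class `MarkdownExtractor` and the function `html_to_markdown` are hypothetical names, and the sketch uses the standard library's `html.parser` to stay dependency-free, whereas the recipes themselves may rely on BeautifulSoup. The Scrape step is left out because the SDK calls are not shown in this README.

```python
from html.parser import HTMLParser


class MarkdownExtractor(HTMLParser):
    """Collect headings, paragraphs, and list items as Markdown blocks."""

    SKIP = {"script", "style", "noscript"}  # content that is never useful for RAG
    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
    BLOCKS = {"p", "li"} | set(HEADINGS)

    def __init__(self):
        super().__init__()
        self.blocks = []       # finished Markdown blocks
        self._buffer = []      # text fragments of the block being read
        self._prefix = ""      # Markdown prefix for the current block
        self._skip_depth = 0   # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCKS:
            self._flush()
            self._prefix = self.HEADINGS.get(tag, "- " if tag == "li" else "")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCKS:
            self._flush()

    def handle_data(self, data):
        if not self._skip_depth:
            self._buffer.append(data)

    def _flush(self):
        # Collapse runs of whitespace and emit the block if it has any text.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.blocks.append(self._prefix + text)
        self._buffer = []
        self._prefix = ""


def html_to_markdown(html):
    """Return a plain Markdown rendering of the text content in html."""
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()  # emit any trailing text outside a block element
    return "\n\n".join(parser.blocks)
```

Inline tags such as `<b>` are dropped while their text is kept, which is usually what a retrieval index wants; links and tables would need extra handling.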

External Standalone Examples

| Repository | Description |
| --- | --- |
| thordata-web-qa-agent | A standalone CLI version of the Web Q&A Agent, easy to fork and deploy. |
| google-play-reviews-rag | Fetch Google Play reviews via the Web Scraper API, build embeddings, and run RAG QA on user feedback. |
| google-news-scraper | Specialized CLI for Google News scraping with advanced filtering (topic, publication, time). |

📦 Installation

Clone the repository and create a virtual environment:

git clone https://github.com/Thordata/thordata-cookbook.git
cd thordata-cookbook

python -m venv .venv
# Windows:
.venv\Scripts\activate
# macOS / Linux:
# source .venv/bin/activate

python -m pip install --upgrade pip
python -m pip install -r requirements.txt

🔐 Configuration

Copy .env.example to .env and fill in your credentials:

cp .env.example .env

Edit .env:

THORDATA_SCRAPER_TOKEN=your_thordata_scraper_token
THORDATA_PUBLIC_TOKEN=your_thordata_public_token
THORDATA_PUBLIC_KEY=your_thordata_public_key

# Optional: for OpenAI-based recipes
OPENAI_API_KEY=sk-...
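
Before running a recipe, it can help to confirm that the required variables are actually set. The sketch below is one way to do that with the standard library only. The helper names `parse_env_text` and `missing_credentials` are hypothetical and not part of the cookbook or the SDK, and the recipes may load `.env` differently (for example through a dotenv package).

```python
import os

# Variable names taken from the .env listing above.
REQUIRED_VARS = (
    "THORDATA_SCRAPER_TOKEN",
    "THORDATA_PUBLIC_TOKEN",
    "THORDATA_PUBLIC_KEY",
)


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_credentials(env):
    """Return the required variable names that are absent or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    env = dict(os.environ)
    if os.path.exists(".env"):
        with open(".env", encoding="utf-8") as handle:
            # Real environment variables take precedence over .env entries.
            env = {**parse_env_text(handle.read()), **env}
    missing = missing_credentials(env)
    print("Missing:", ", ".join(missing) if missing else "none")
```

`OPENAI_API_KEY` is deliberately not in the required list, since it is only needed for the OpenAI-based recipes.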

🧪 Running Scripts (CLI)

1. RAG Data Pipeline

Fetch, clean, and save page content as Markdown:

python scripts/rag_data_pipeline.py \
  --url "https://openai.com/research" \
  --output "data/openai_research_kb.md" \
  --country "us" \
  --js-render
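
For readers adapting the script, the flags in the command above could be declared with `argparse` along these lines. This is a sketch under assumptions, not the actual contents of `scripts/rag_data_pipeline.py`: only the four flag names come from the command shown, while `build_parser`, the help texts, and the default values are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser for the four flags used in the command above."""
    parser = argparse.ArgumentParser(
        description="Scrape a page, clean it, and save it as Markdown (sketch)."
    )
    parser.add_argument("--url", required=True, help="Page to fetch.")
    # The default output path is an assumption for this sketch.
    parser.add_argument("--output", default="data/output.md",
                        help="Where to write the Markdown file.")
    parser.add_argument("--country", default=None,
                        help="Country code for the request, e.g. 'us'.")
    # A store_true flag is False unless --js-render is passed.
    parser.add_argument("--js-render", action="store_true",
                        help="Render JavaScript before extracting content.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.url, args.output, args.country, args.js_render)
```

Note that `argparse` exposes `--js-render` as the attribute `args.js_render`, with the hyphen converted to an underscore.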

2. MCP Server (Claude Tools)

Expose Thordata tools to an MCP client:

python scripts/mcp_server.py

Or test tools locally without a client:

python -m scripts.test_mcp_tools

📒 Running Notebooks

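Activate the virtual environment created during installation:
```
# Windows (Git Bash):
source .venv/Scripts/activate
# macOS / Linux:
# source .venv/bin/activate
```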
Start Jupyter:
```
jupyter lab
```
Open a notebook in notebooks/:
- ai/web_qa_agent_with_thordata.ipynb
- devtools/github_repo_intel.ipynb
- rag/rag_openai_research.ipynb

Tip: Set USE_LIVE_THORDATA = False in notebooks to use cached data and save credits during development.
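
A live/offline switch of this kind can be sketched as a small cache wrapper. The code below is an assumption about how such a toggle might work, not the notebooks' actual implementation: `load_or_fetch` and its parameters are hypothetical, and `fetch_fn` stands in for whatever call would hit the Thordata API.

```python
import json
from pathlib import Path

USE_LIVE_THORDATA = False  # same flag name as in the tip above


def load_or_fetch(cache_path, fetch_fn, use_live=USE_LIVE_THORDATA):
    """Return cached JSON data, or call fetch_fn and refresh the cache.

    fetch_fn is a hypothetical zero-argument callable that performs the
    live request and returns JSON-serializable data.
    """
    cache_path = Path(cache_path)
    if not use_live:
        if cache_path.exists():
            return json.loads(cache_path.read_text(encoding="utf-8"))
        raise FileNotFoundError(
            f"No cached data at {cache_path}; run once with use_live=True"
        )
    data = fetch_fn()
    # Save the live result so later offline runs can reuse it.
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(
        json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return data
```

Failing loudly when the cache is missing, rather than silently going live, keeps an offline run from spending credits by accident.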

📂 Structure

thordata-cookbook/
├── notebooks/             # Jupyter notebooks by category
│   ├── ai/
│   ├── devtools/
│   └── rag/
├── scripts/               # Standalone Python scripts (CLI / Servers)
│   ├── rag_data_pipeline.py
│   └── mcp_server.py
├── data/                  # Local cache (git-ignored)
├── requirements.txt
├── .env.example
└── README.md

📬 Support

If you have ideas for new recipes, please open an issue or submit a PR!

For SDK-specific questions, visit the thordata-python-sdk repository.
