End-to-end examples, scripts, and notebooks built on top of the Thordata Python SDK.
These recipes show how to combine:
- Thordata SERP API & Universal Scraper
- Thordata Web Scraper API
- Modern AI tools (OpenAI, LangChain, MCP)
- Data tools (Pandas, BeautifulSoup)

Python 3.11+ is recommended.
| Recipe | Type | Description |
|---|---|---|
| [Web Q&A Agent](notebooks/ai/web_qa_agent_with_thordata.ipynb) | Notebook | Ask questions, search SERP, scrape pages, and let an LLM answer with citations. Supports live & offline modes. |
| [GitHub Repo Intel](notebooks/devtools/github_repo_intel.ipynb) | Notebook | Use Web Scraper API spiders to collect GitHub repository metadata (stars, forks, issues) and analyze it with Pandas. |
| [OpenAI Research RAG](notebooks/rag/rag_openai_research.ipynb) | Notebook | Scrape dynamic pages, clean HTML, and export a Markdown knowledge base for RAG systems. |
| [RAG Data Pipeline](scripts/rag_data_pipeline.py) | Script | CLI version of the RAG preparation pipeline (Scrape → Clean → Markdown), with flags for URL, country, and JS rendering. |
| [MCP Tools for LLMs](scripts/mcp_server.py) | Script | Model Context Protocol (MCP) server exposing `search_web`, `search_news`, and `read_website` to Claude Desktop or other LLM clients. |
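The "clean HTML, export Markdown" step that the RAG recipes rely on can be sketched with only the standard library. The notebooks themselves use BeautifulSoup; this minimal extractor is illustrative, not the actual pipeline code:

```python
from html.parser import HTMLParser


class MarkdownExtractor(HTMLParser):
    """Tiny HTML -> Markdown text extractor (a sketch, not production code)."""

    _SKIP_TAGS = {"script", "style", "noscript"}
    _HEADINGS = {"h1": "#", "h2": "##", "h3": "###"}

    def __init__(self) -> None:
        super().__init__()
        self._chunks: list[str] = []
        self._skip_depth = 0  # >0 while inside <script>/<style> blocks

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self._HEADINGS:
            self._chunks.append(f"\n\n{self._HEADINGS[tag]} ")
        elif tag in {"p", "li", "br"}:
            self._chunks.append("\n\n")

    def handle_endtag(self, tag):
        if tag in self._SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep visible text only; drop scripts, styles, and whitespace runs.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return "".join(self._chunks).strip()


def html_to_markdown(html: str) -> str:
    extractor = MarkdownExtractor()
    extractor.feed(html)
    return extractor.text()
```

A real pipeline would also handle inline tags, links, and encoding edge cases; for that, use the notebooks' BeautifulSoup-based cleaning.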
Related standalone repositories:

| Repository | Description |
|---|---|
| thordata-web-qa-agent | A standalone CLI version of the Web Q&A Agent, easy to fork and deploy. |
| google-play-reviews-rag | Fetch Google Play reviews via Web Scraper API, build embeddings, and run RAG QA on user feedback. |
| google-news-scraper | Specialized CLI for Google News scraping with advanced filtering (topic, publication, time). |
Clone the repository and create a virtual environment:

```bash
git clone https://github.com/Thordata/thordata-cookbook.git
cd thordata-cookbook
python -m venv .venv

# Windows:
.venv\Scripts\activate
# macOS / Linux:
# source .venv/bin/activate

python -m pip install --upgrade pip
python -m pip install -r requirements.txt
```

Copy `.env.example` to `.env` and fill in your credentials:
```bash
cp .env.example .env
```

Edit `.env`:

```env
THORDATA_SCRAPER_TOKEN=your_thordata_scraper_token
THORDATA_PUBLIC_TOKEN=your_thordata_public_token
THORDATA_PUBLIC_KEY=your_thordata_public_key

# Optional: for OpenAI-based recipes
OPENAI_API_KEY=sk-...
```

Fetch, clean, and save page content as Markdown:
```bash
python scripts/rag_data_pipeline.py \
  --url "https://openai.com/research" \
  --output "data/openai_research_kb.md" \
  --country "us" \
  --js-render
```

Expose Thordata tools to an MCP client:
```bash
python scripts/mcp_server.py
```

Or test the tools locally without a client:

```bash
python -m scripts.test_mcp_tools
```
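To make the server's tools available in Claude Desktop, register it under `mcpServers` in Claude's configuration file. The path below is illustrative; point `args` at `scripts/mcp_server.py` in your checkout, and note that the exact config file location depends on your OS:

```json
{
  "mcpServers": {
    "thordata": {
      "command": "python",
      "args": ["/absolute/path/to/thordata-cookbook/scripts/mcp_server.py"]
    }
  }
}
```

If the server reads credentials from the environment rather than `.env`, you may also need to pass them via an `env` object in this entry.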
1. Activate the environment:

   ```bash
   # Windows
   .venv\Scripts\activate
   # macOS / Linux
   source .venv/bin/activate
   ```

2. Start Jupyter:

   ```bash
   jupyter lab
   ```

3. Open a notebook in `notebooks/`:

   - `ai/web_qa_agent_with_thordata.ipynb`
   - `devtools/github_repo_intel.ipynb`
   - `rag/rag_openai_research.ipynb`
Tip: set `USE_LIVE_THORDATA = False` in the notebooks to use cached data and save credits during development.
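The live/offline toggle can follow a simple cache-first pattern: serve a cached JSON response when live mode is off, and only hit (and re-cache) the API when it is on. The function names and cache layout below are illustrative, not the notebooks' actual code:

```python
import json
from pathlib import Path

USE_LIVE_THORDATA = False  # flip to True to hit the live APIs


def fetch_serp_live(query: str) -> dict:
    # Placeholder for a real SDK call; name and return shape are illustrative.
    raise RuntimeError("live mode is disabled in this sketch")


def fetch_serp(query: str, cache_dir: Path = Path("data/cache")) -> dict:
    """Return cached SERP results when USE_LIVE_THORDATA is False."""
    cache_file = cache_dir / f"{query.replace(' ', '_')}.json"
    if not USE_LIVE_THORDATA and cache_file.exists():
        return json.loads(cache_file.read_text())
    # Live path: fetch, then refresh the cache for later offline runs.
    result = fetch_serp_live(query)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(result))
    return result
```

Keeping the cache under `data/` matches the repo layout, since that directory is git-ignored.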
```text
thordata-cookbook/
├── notebooks/            # Jupyter notebooks by category
│   ├── ai/
│   ├── devtools/
│   └── rag/
├── scripts/              # Standalone Python scripts (CLI / servers)
│   ├── rag_data_pipeline.py
│   └── mcp_server.py
├── data/                 # Local cache (git-ignored)
├── requirements.txt
├── .env.example
└── README.md
```
If you have ideas for new recipes, please open an issue or submit a PR!
For SDK-specific questions, visit the thordata-python-sdk repository.