Quick RAG Demo
Retrieval-augmented Q&A system powered by LangChain JS + OpenAI (gpt-4o) for answering questions about Pineapple Builder.
⸻
✨ Features
• Crawler → JSON (scripts/web-scraper): scrapes the support.pineapplebuilder.com docs and saves them to data/articles.json
• Vector cache (embedder.ts): embeds once, saves vectors to data/pages_vectors.json
• RAG CLI (cli.ts): ask questions from your terminal
• Mini evaluator (eval.ts): runs gold-set tests and writes pass/fail results to data/eval_results.tsv
⸻
🔧 Requirements
• Node ≥ 20
• yarn
• OpenAI API key (add it to .env)
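A missing or mistyped key is the most common first-run failure, so a fail-fast check can save a confusing stack trace later. The helper below is a minimal sketch and not part of the repo: the name requireOpenAiKey and the error messages are hypothetical, and the sk- prefix check only mirrors the key format shown in .env.example.

```typescript
// Hypothetical helper (not in the repo): validate the key before any API call.
type Env = Record<string, string | undefined>;

function requireOpenAiKey(env: Env): string {
  // Treat an unset variable and a blank value the same way.
  const key = (env["OPENAI_API_KEY"] ?? "").trim();
  if (key === "") {
    throw new Error("OPENAI_API_KEY is not set. Copy .env.example to .env and add your key.");
  }
  // Assumption: keys use the sk- prefix shown in .env.example.
  if (!key.startsWith("sk-")) {
    throw new Error("OPENAI_API_KEY does not look like an OpenAI key (expected an sk- prefix).");
  }
  return key;
}
```

Calling requireOpenAiKey(process.env) once at startup, after the .env file has been loaded, would be enough.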
⸻
🚀 Quick start
yarn # installs deps
cp .env.example .env # then paste OPENAI_API_KEY=sk-...
python3 intercom_help_articles-structure.py # get structure and articles
python3 intercom_help_export.py # get article content, saves data/articles.json
yarn dev # "Ask › " prompt appears
Subsequent runs skip embedding thanks to the vector cache.
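The caching idea can be sketched as a load-or-build function. This is an illustration under assumptions, not the actual embedder.ts: the function name, the CachedVector shape, and the injected embed callback are all hypothetical, and the sketch never invalidates the cache (delete the file to force a re-embed).

```typescript
import * as fs from "node:fs";

// Hypothetical shapes; the real file format in data/pages_vectors.json may differ.
type Doc = { id: string; text: string };
type CachedVector = { id: string; vector: number[] };
type Embed = (texts: string[]) => Promise<number[][]>;

async function loadOrBuildVectors(
  cachePath: string,
  docs: Doc[],
  embed: Embed,
): Promise<CachedVector[]> {
  // Cache hit: reuse the stored vectors and skip the embedding call entirely.
  if (fs.existsSync(cachePath)) {
    return JSON.parse(fs.readFileSync(cachePath, "utf8")) as CachedVector[];
  }
  // Cache miss: embed every document once, then persist the result.
  const vectors = await embed(docs.map((d) => d.text));
  const cached = docs.map((d, i) => ({ id: d.id, vector: vectors[i] ?? [] }));
  fs.writeFileSync(cachePath, JSON.stringify(cached));
  return cached;
}
```

Passing the embedding call in as a parameter keeps the cache logic testable without an API key.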
⸻
📂 Project layout
.
├── scripts/
│   └── web-scraper/
│       ├── get-url-app-future.py               # fetch URLs from app
│       └── intercom_help_articles-structure.py # structure scraper
├── data/
│   ├── articles.json       # raw FAQ docs
│   └── pages_vectors.json  # persisted vectors (generated)
├── src/
│   ├── embedder.ts  # vector store builder/cache
│   ├── qa.ts        # chain factory
│   ├── cli.ts       # interactive Q&A
│   └── eval.ts      # evaluation script
├── .env.example
├── package.json
└── README.md  # this file
⸻
🧪 Evaluation
yarn test # or npm run eval
Runs three default test questions (edit TESTS in eval.ts). Outputs data/eval_results.tsv with columns:
question sim overlap pass answer
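For reference, writing rows in that column order could look like the sketch below. The column names come from this README; the EvalRow type and the helper names are assumptions, and eval.ts may format values differently.

```typescript
// Row shape is an assumption; only the column order is taken from the README.
type EvalRow = { question: string; sim: number; overlap: number; pass: boolean; answer: string };

const COLUMNS = ["question", "sim", "overlap", "pass", "answer"] as const;

// Tabs and newlines inside a cell would break the TSV grid, so collapse them to a space.
function cell(value: string | number | boolean): string {
  return String(value).replace(/[\t\r\n]+/g, " ");
}

function toTsv(rows: EvalRow[]): string {
  const header = COLUMNS.join("\t");
  const lines = rows.map((row) => COLUMNS.map((c) => cell(row[c])).join("\t"));
  return [header, ...lines].join("\n") + "\n";
}
```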
A row passes when similarity ≥ 0.82 and ≥ 50 % of the answer’s words originate from the retrieved context.
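The pass rule above can be sketched as two small metrics plus a threshold check. This README does not say how similarity is computed, so the sketch assumes cosine similarity between two embedding vectors (for example the answer and a gold answer); the tokenizer and all function names are likewise hypothetical.

```typescript
// Cosine similarity of two vectors; returns 0 when either vector has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Assumption: words are lowercase ASCII letter/digit runs.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Fraction of the answer's words that also occur in the retrieved context.
function contextOverlap(answer: string, context: string): number {
  const answerWords = tokenize(answer);
  if (answerWords.length === 0) return 0;
  const contextWords = new Set<string>(tokenize(context));
  const hits = answerWords.filter((w) => contextWords.has(w)).length;
  return hits / answerWords.length;
}

// Thresholds default to the values stated in this README.
function rowPasses(sim: number, overlap: number, minSim = 0.82, minOverlap = 0.5): boolean {
  return sim >= minSim && overlap >= minOverlap;
}
```

Keeping the thresholds as parameters makes it easy to tighten them as the gold set grows.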
⸻
🛡️ Hallucination mitigation
- System prompt: “Answer only from provided context…”
- Retrieval chunks limited to k = 4, temperature 0.
- Post-answer checks in eval.ts:
  • context overlap
  • OpenAI Moderation API (easy to plug in)
- If either guard fails, respond with a safe fallback.
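The guard-then-fallback flow can be sketched as below. Everything here is illustrative: the Guard type, the fallback wording, and the function names are hypothetical, and the moderation guard takes an injected isFlagged callback instead of calling any real API.

```typescript
// A guard returns true when the answer is acceptable.
type Guard = (answer: string, context: string) => Promise<boolean> | boolean;

// Illustrative fallback text, not the repo's actual wording.
const FALLBACK = "Sorry, I could not find that in the docs.";

// Run guards in order; the first failure short-circuits to the fallback.
async function guardedAnswer(answer: string, context: string, guards: Guard[]): Promise<string> {
  for (const guard of guards) {
    if (!(await guard(answer, context))) return FALLBACK;
  }
  return answer;
}

// Guard 1: at least half of the answer's words must appear in the retrieved context.
const overlapGuard: Guard = (answer, context) => {
  const words: string[] = answer.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  if (words.length === 0) return false;
  const known = new Set<string>(context.toLowerCase().match(/[a-z0-9]+/g) ?? []);
  return words.filter((w) => known.has(w)).length / words.length >= 0.5;
};

// Guard 2: wraps any moderation check supplied by the caller (assumed, not a real client).
const makeModerationGuard = (isFlagged: (text: string) => Promise<boolean>): Guard =>
  async (answer) => !(await isFlagged(answer));
```

Because each guard is just a function, adding another check later means appending one more entry to the guards array.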
⸻
🏗 Extending
• UI: swap cli.ts for a small Next.js frontend.
• Scale: replace MemoryVectorStore with Pinecone.
• Docs: add more TESTS and bump thresholds as needed.
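One way to keep the store swappable is to have the chain depend on a small interface instead of a concrete class. The sketch below is a hypothetical interface with an in-memory implementation; it is not LangChain's or Pinecone's API, and the names VectorSearch, InMemorySearch, and Hit are made up for illustration.

```typescript
type Hit = { id: string; score: number };

// Hypothetical contract: any backend (in-memory or hosted) implements these two methods.
interface VectorSearch {
  add(id: string, vector: number[]): Promise<void>;
  search(query: number[], k: number): Promise<Hit[]>;
}

// Cosine similarity; returns 0 when either vector has zero length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Brute-force top-k search: fine for a few hundred docs, too slow at large scale.
class InMemorySearch implements VectorSearch {
  private items: { id: string; vector: number[] }[] = [];

  async add(id: string, vector: number[]): Promise<void> {
    this.items.push({ id, vector });
  }

  async search(query: number[], k: number): Promise<Hit[]> {
    return this.items
      .map((item) => ({ id: item.id, score: cosine(query, item.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k);
  }
}
```

A hosted backend would implement the same two methods, and the caller would keep passing k = 4 as in the current setup.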
⸻
Made with ❤️ in ~2 hours. Ping me if you hit any bumps!