A full-stack application for searching and analyzing NeurIPS 2025 research papers using semantic search and LLM-powered insights.
- Semantic Search: Search through NeurIPS 2025 papers, workshops, tutorials, invited talks, and expo events using natural language queries
- Advanced Filtering: Filter by author, affiliation, session/event type, and conference day/time (AM/PM)
- Full Conference Coverage: Search across the entire NeurIPS 2025 San Diego program (Dec 2-7)
- Multiple Content Types: Papers (5,450 items), workshops, tutorials, invited talks, and expo events (panels, demonstrations, workshops)
- Smart Session Filtering: Find specific event types like "Invited Talk", "Workshop", or "Expo Talk Panel"
- Bookmarks: Save papers and events to bookmarks for easy access later
- Persistent Storage: Bookmarks are saved in browser localStorage and persist across sessions
- Smart Organization: Bookmarked items are sorted by day, time (AM/PM), and poster number for easy conference navigation
- Poster Positions: View poster numbers for each paper to quickly locate them at the conference
- CSV Export: Export your bookmarks to CSV with day, time, poster, title, and session information
- Easy Management: Add/remove bookmarks with a single click, clear all bookmarks at once
- Deep Dive Chat: Add up to 25 papers to a Deep Dive queue and chat about them with OpenAI or Google Gemini
- One-click add: Use the button on each paper card or “Add all to Deep Dive” for the current results
- Smart Pre-upload: Papers are uploaded when you open the chat panel, making your first query instant
- File Caching: Papers are cached per session - no re-uploading on subsequent questions
- Full Paper Access: Gemini reads complete PDFs natively (no truncation)
- Markdown & LaTeX Support: Full rendering of mathematical formulas and formatted text
- Customizable System Prompts: Configure how the AI responds to your questions
- Model Selection: Choose from available OpenAI and Gemini models
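For orientation, semantic search of this kind boils down to embedding the query and running a nearest-neighbour lookup in the vector store. A minimal sketch of that core, assuming the `all-MiniLM-L6-v2` embedding model and a collection named `neurips2025` (both are assumptions, not necessarily the app's actual choices):

```python
# Hedged sketch of the semantic-search core -- not the app's exact code.
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")        # assumed embedding model
client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_collection("neurips2025")      # assumed collection name

def search(query: str, n_results: int = 10):
    """Embed a natural-language query and return the nearest items."""
    embedding = model.encode(query).tolist()
    return collection.query(query_embeddings=[embedding], n_results=n_results)

print(search("efficient attention for long-context transformers"))
```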
```
NeuriScout/
├── backend/          # FastAPI backend server
│   ├── main.py       # API endpoints
│   ├── rag.py        # RAG logic and paper fetching
│   └── ingest.py     # Data ingestion script
├── frontend/         # Next.js frontend
│   └── src/
│       ├── app/      # Pages and components
│       └── lib/      # API client
├── data/             # Data files (CSV, JSON)
├── scripts/          # Utility scripts for data processing
└── chroma_db/        # Vector database (generated)
```
- Python 3.10+
- Node.js 18+
- OpenAI API key and/or Google Gemini API key
- Clone the repository:

  ```bash
  git clone https://github.com/gminneci/NeuriScout.git
  cd NeuriScout
  ```

- Set up the Python environment and install the backend:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  pip install -e .
  ```

  This installs the package in editable mode with all dependencies and creates the `neuriscout-backend` and `neuriscout-ingest` commands.
- Generate the ChromaDB database:

  ```bash
  neuriscout-ingest
  ```

  This processes the data from:

  - `data/papercopilot_neurips2025_merged_openreview.csv` (papers)
  - `data/neurips_2025_enriched_events.csv` (workshops, tutorials, invited talks)
  - `data/neurips_2025_expo_events.csv` (expo events: panels, demonstrations, workshops)

  It creates the vector database in `chroma_db/` with embeddings for ~5,500 unique items (papers + events). This takes a few minutes and produces ~90MB of data. A sketch of what this step does under the hood follows below.

  Note: The ChromaDB database is required for the application to work. Keep it in the `chroma_db/` directory (it is excluded from git).
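  Conceptually, the ingest reads each CSV row, embeds its text, and writes the vector plus metadata into a persistent ChromaDB collection. A minimal sketch, assuming the same model and collection names as the search sketch above and `title`/`abstract` CSV columns (all assumptions, not the actual `ingest.py`):

  ```python
  # Hedged sketch of an ingest loop like this one -- not backend/ingest.py itself.
  import csv
  import chromadb
  from sentence_transformers import SentenceTransformer

  model = SentenceTransformer("all-MiniLM-L6-v2")              # assumed model
  client = chromadb.PersistentClient(path="chroma_db")
  collection = client.get_or_create_collection("neurips2025")  # assumed name

  with open("data/papercopilot_neurips2025_merged_openreview.csv") as f:
      for i, row in enumerate(csv.DictReader(f)):
          text = f"{row['title']}. {row.get('abstract', '')}"  # assumed columns
          collection.add(
              ids=[f"paper-{i}"],
              embeddings=[model.encode(text).tolist()],
              documents=[text],
              metadatas=[{"title": row["title"]}],
          )
  ```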
- Set up the frontend:

  ```bash
  cd frontend
  npm install
  cd ..
  ```

- Configure API keys (optional). You can either set environment variables or enter them in the UI:

  ```bash
  export OPENAI_API_KEY=your_key_here
  export GEMINI_API_KEY=your_key_here
  ```

- Start the backend (in one terminal):

  ```bash
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  neuriscout-backend
  ```

  The backend will run on http://localhost:8000

  Note: The first startup takes 10-30 seconds while the AI models load.
- Start the frontend (in a new terminal):

  ```bash
  cd frontend
  npm run dev
  ```

  The frontend will run on http://localhost:3000
- Open your browser to http://localhost:3000
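Once both servers are up, you can sanity-check the backend directly. The route and parameter names below are hypothetical (check `backend/main.py` for the real ones); this only shows the shape of such a call:

```python
# Hypothetical smoke test -- the endpoint and params are assumptions.
import requests

resp = requests.get(
    "http://localhost:8000/search",            # hypothetical route
    params={"q": "diffusion models", "n": 5},  # hypothetical params
)
print(resp.json())
```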
- Search Papers & Events: Enter keywords or research questions in the search box
- Filter Results:
  - Use the dropdowns to filter by author, affiliation, or session
  - Filter by session to find specific content types:
    - "Invited Talk" - find all 6 invited talks (Rich Sutton, Zeynep Tufekci, Yejin Choi, Melanie Mitchell, Kyunghyun Cho, Andrew Saxe)
    - "Workshop" - browse workshop sessions
    - "Tutorial" - find tutorial sessions
    - "Expo Talk Panel", "Expo Workshop", "Expo Demonstration" - browse expo events
    - Or filter by poster sessions (e.g., "San Diego Poster Session 1")
  - Use day filters (Tue-Sun) and time filters (AM/PM) to browse by conference schedule
  - Combine multiple filters with OR logic (e.g., "MIT" OR "Stanford"); see the sketch below
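  One way such OR filters map onto ChromaDB is a metadata `where` clause with the `$in` operator. A hedged sketch, assuming items carry an `affiliation` metadata field (the real field names may differ) and reusing `collection` and `embedding` from the search sketch above:

  ```python
  # Hedged sketch: semantic query restricted to "MIT" OR "Stanford".
  results = collection.query(
      query_embeddings=[embedding],
      n_results=10,
      where={"affiliation": {"$in": ["MIT", "Stanford"]}},  # assumed field name
  )
  ```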
- Bookmark Items:
  - Click the star icon on any paper or event card to bookmark it
  - View all bookmarks via the "Bookmarks" button in the header
  - Bookmarks are automatically sorted by day, time (AM/PM), and poster number (see the sketch below)
  - Poster numbers are displayed next to the time to help you navigate the conference
  - Export all bookmarks to CSV for offline reference
  - Clear individual bookmarks or all at once
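  The ordering is day first, then AM before PM, then numeric poster position. A sketch of such a sort key in Python (the field names are assumptions; the app implements this in the frontend):

  ```python
  # Hedged sketch of the bookmark ordering -- not the app's actual code.
  DAY_ORDER = {"Tue": 0, "Wed": 1, "Thu": 2, "Fri": 3, "Sat": 4, "Sun": 5}

  def sort_key(item: dict):
      """Order by conference day, then AM/PM, then poster number."""
      return (
          DAY_ORDER.get(item.get("day"), 99),
          0 if item.get("time") == "AM" else 1,
          int(item.get("poster") or 0),
      )

  bookmarks = [
      {"day": "Wed", "time": "PM", "poster": "512", "title": "..."},
      {"day": "Wed", "time": "AM", "poster": "7", "title": "..."},
  ]
  bookmarks.sort(key=sort_key)
  ```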
- Build Your Deep Dive:
  - Click "Add to Deep Dive" on individual paper cards (or "Add all to Deep Dive" for the current results)
  - Track how many slots remain (up to 25 papers can be active at once)
  - Remove papers from the Deep Dive if you want to swap them out
- Deep Dive Chat:
  - Click "Deep Dive (X/25)" to open the chat panel
  - Papers are automatically uploaded in the background (Gemini only); a sketch of this flow follows below
  - Click the settings icon to configure API keys and models
  - Ask questions about the Deep Dive papers and tweak the system prompt anytime
  - Subsequent questions are instant thanks to file caching
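  The upload-and-cache flow for Gemini typically looks like the following. This is a hedged sketch using the `google-generativeai` SDK; the model name, cache structure, and file paths are assumptions, not the app's actual implementation:

  ```python
  # Hedged sketch of per-session PDF upload and caching for Gemini.
  import google.generativeai as genai

  genai.configure(api_key="YOUR_GEMINI_API_KEY")
  _file_cache: dict[str, object] = {}  # session cache: paper id -> uploaded file

  def get_uploaded_pdf(paper_id: str, pdf_path: str):
      """Upload each PDF once per session, then reuse the handle."""
      if paper_id not in _file_cache:
          _file_cache[paper_id] = genai.upload_file(pdf_path)
      return _file_cache[paper_id]

  model = genai.GenerativeModel("gemini-1.5-pro")  # assumed model name
  response = model.generate_content([
      get_uploaded_pdf("abc123", "papers/abc123.pdf"),  # hypothetical paths
      "Summarize the main contribution of this paper.",
  ])
  print(response.text)
  ```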
- Links to Sources:
  - "View on NeurIPS" opens the official NeurIPS virtual site page for the paper/event (for logged-in bookmarks)
  - "Paper" opens the OpenReview page for papers (reviews and discussion)
  - NeurIPS links are standardized and work for all content types (posters, orals, tutorials, workshops, invited talks)
The `scripts/` directory contains utilities for:
- Scraping paper data from various sources
- Merging datasets
- Data validation and debugging
- Create a Render account at render.com
- Create a new Web Service:
  - Connect your GitHub repository
  - Name: `neuriscout-backend`
  - Environment: `Python 3`
  - Build Command: `pip install -e .`
  - Start Command: `neuriscout-backend`
- Add a Persistent Disk (for ChromaDB):
  - Go to your service settings
  - Add Disk: mount path `/opt/render/project/src/chroma_db`, size 1GB
- Set Environment Variables:

  ```bash
  PYTHON_VERSION=3.11.0
  HOST=0.0.0.0
  PORT=8000
  CHROMA_DB_PATH=/opt/render/project/src/chroma_db
  ALLOWED_ORIGINS=https://your-app.vercel.app
  ```

  Optional (or enter in the UI):

  ```bash
  OPENAI_API_KEY=your_key
  GEMINI_API_KEY=your_key
  ```
- Upload ChromaDB data:
  - The ChromaDB database is NOT included in the repository (it's ~90MB)
  - Generate it locally with `neuriscout-ingest` and then upload it
  - Options for uploading:
    - Via Render Shell: open your service's Shell tab and run `neuriscout-ingest`
    - Manual upload: use `scp` or Render's file upload to copy your local `chroma_db/` directory to the persistent disk mount path
  - The database contains ~5,500 unique items (papers, workshops, tutorials, invited talks, expo events) with embeddings and takes a few minutes to generate
- Copy your service URL (e.g., `https://neuriscout-backend.onrender.com`)
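The `ALLOWED_ORIGINS` variable above is how the backend decides which frontends may call it. A plausible wiring of that variable into FastAPI's CORS middleware, as a hedged sketch (see `backend/main.py` for the real code):

```python
# Hedged sketch: comma-separated ALLOWED_ORIGINS -> FastAPI CORS config.
import os

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
origins = os.environ.get("ALLOWED_ORIGINS", "*").split(",")

app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,
    allow_methods=["*"],
    allow_headers=["*"],
)
```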
- Create a Vercel account at vercel.com
- Import your repository:
  - Click "New Project"
  - Import your GitHub repository
  - Root Directory: `frontend`
  - Framework Preset: `Next.js`
- Configure Environment Variables:
  - Add `NEXT_PUBLIC_API_URL` with your Render backend URL
  - Example: `https://neuriscout-backend.onrender.com`
- Update CORS on the Backend:
  - Go back to the Render dashboard
  - Update `ALLOWED_ORIGINS` to include your Vercel URL
  - Example: `https://neuriscout.vercel.app,https://neuriscout-*.vercel.app`
- Deploy!
  - Vercel will deploy automatically
  - Your app will be live at `https://your-app.vercel.app`
- Cold Starts: Render's free tier sleeps after 15 minutes of inactivity. First request takes ~30 seconds to wake up.
- API Keys: You can set API keys as environment variables on Render, or users can enter them in the UI.
- ChromaDB Database:
  - The vector database is NOT in the repository (excluded via `.gitignore`)
  - Generate it locally with `neuriscout-ingest` before deploying
  - Upload it to the persistent disk after deployment (via Render Shell or manual transfer)
  - The database is ~90MB and contains embeddings for ~5,500 items (papers + events + expo)
- Custom Domain: Both Vercel and Render support custom domains for free.
Railway offers $5 of free credit per month and a simpler deployment:
- Create Railway account at railway.app
- Deploy from GitHub:
  - New Project → Deploy from GitHub
  - Select your repository
- Set environment variables:

  ```bash
  HOST=0.0.0.0
  ALLOWED_ORIGINS=*
  CHROMA_DB_PATH=/app/chroma_db
  ```

- Upload ChromaDB:
  - Generate it locally with `neuriscout-ingest`
  - Upload using the Railway CLI, or create a persistent volume and transfer the files
  - The `chroma_db/` directory (~90MB) must be accessible at the path set in `CHROMA_DB_PATH`
Note: Railway's paid tier ($5/month) is recommended for reliable hosting with better resources.
The backend includes two admin endpoints for managing deployments:
`GET /admin/status` - Diagnostic information:
- Base directory and ChromaDB path
- Data file existence and sizes (CSV files)
- Collection status (exists, item count)
- Example:

  ```bash
  curl https://your-app.railway.app/admin/status
  ```
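The exact response schema isn't documented here; based on the fields listed above, it plausibly looks something like this (illustrative only, with made-up values, not the actual schema):

```json
{
  "base_dir": "/app",
  "chroma_db_path": "/app/chroma_db",
  "data_files": {
    "papercopilot_neurips2025_merged_openreview.csv": {"exists": true, "size_mb": 12.3}
  },
  "collection": {"exists": true, "count": 5500}
}
```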
`POST /admin/reingest` - Manual data ingestion:
- Runs the ingest process to populate ChromaDB
- Returns stdout/stderr from the ingest process
- Takes several minutes to complete (~5,500 items)
- Example:

  ```bash
  curl -X POST https://your-app.railway.app/admin/reingest
  ```
Note: The `/admin/reingest` endpoint runs synchronously and blocks the API during execution. For large datasets, use SSH to run the ingest in the background.
If the automatic ingestion fails or you need to re-populate the database:
- Install the Railway CLI (if not already installed):

  ```bash
  npm i -g @railway/cli
  # or
  brew install railway
  ```

- Authenticate:

  ```bash
  railway login
  ```
- Copy the SSH command from the Railway dashboard:
  - Navigate to your project in the Railway dashboard
  - Right-click on your service
  - Select "Copy SSH Command" from the dropdown menu
  - This generates a command like:

    ```bash
    railway ssh --project=<project-id> --environment=<env-id> --service=<service-id>
    ```
- Connect and run the ingest:

  ```bash
  # Connect using the copied SSH command
  railway ssh --project=<project-id> --environment=<env-id> --service=<service-id>

  # Inside the SSH session, run the ingest in the background
  python -m backend.ingest > /app/ingest.log 2>&1 &

  # Exit the SSH session
  exit
  ```
- Monitor progress:

  ```bash
  # Check the collection count via the status endpoint
  curl https://your-app.railway.app/admin/status | grep count

  # Or reconnect via SSH to tail the log
  railway ssh --project=<project-id> --environment=<env-id> --service=<service-id>
  tail -f /app/ingest.log
  ```
The ingest process generates embeddings for ~5,500 unique items and takes several minutes. The collection count should reach ~5,500 when complete.
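If you'd rather not re-run `grep` by hand, a small polling loop can watch the count for you. This sketch assumes the status response exposes the count under a `collection.count` key, which is an assumption about the schema:

```python
# Hedged sketch: poll /admin/status until ingestion looks complete.
import time

import requests

STATUS_URL = "https://your-app.railway.app/admin/status"

while True:
    status = requests.get(STATUS_URL, timeout=30).json()
    count = status.get("collection", {}).get("count", 0)  # assumed schema
    print(f"collection count: {count}")
    if count >= 5400:  # target is ~5,500 items
        break
    time.sleep(30)
```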
Alternative: run a single command without an interactive session:

```bash
railway ssh --project=<project-id> --environment=<env-id> --service=<service-id> -- python -m backend.ingest
```

For more details on Railway SSH, see the Railway CLI SSH documentation.
After merging to main, Railway will auto-deploy. To enable the new "View on NeurIPS" links, re-run the ingest so the ChromaDB contains the `neurips_virtualsite_url` metadata:

- Connect via Railway SSH (copy the command from the dashboard):

  ```bash
  railway ssh --project=<project-id> --environment=<env-id> --service=<service-id>
  ```

- Run the ingest in the container:

  ```bash
  python -m backend.ingest > /app/ingest.log 2>&1 &
  ```

- Monitor progress:

  ```bash
  curl https://your-app.railway.app/admin/status | grep count
  # or
  tail -f /app/ingest.log
  ```

The target count is ~5,500 items. Once complete, the frontend shows both "View on NeurIPS" and "Paper" buttons.
Backend:
- FastAPI
- ChromaDB (vector database)
- Sentence Transformers (embeddings)
- OpenAI API / Google Gemini API
Frontend:
- Next.js 16
- React
- TypeScript
- Tailwind CSS
- React Markdown + KaTeX
MIT License