NeuriScout

A full-stack application for searching and analyzing NeurIPS 2025 research papers using semantic search and LLM-powered insights.

Features

  • Semantic Search: Search through NeurIPS 2025 papers, workshops, tutorials, invited talks, and expo events using natural language queries
  • Advanced Filtering: Filter by author, affiliation, session/event type, and conference day/time (AM/PM)
    • Full Conference Coverage: Search across the entire NeurIPS 2025 San Diego program (Dec 2-7)
    • Multiple Content Types: Papers (5,450 items), workshops, tutorials, invited talks, and expo events (panels, demonstrations, workshops)
    • Smart Session Filtering: Find specific event types like "Invited Talk", "Workshop", or "Expo Talk Panel"
  • Bookmarks: Save papers and events to bookmarks for easy access later
    • Persistent Storage: Bookmarks are saved in browser localStorage and persist across sessions
    • Smart Organization: Bookmarked items are sorted by day, time (AM/PM), and poster number for easy conference navigation
    • Poster Positions: View poster numbers for each paper to quickly locate them at the conference
    • CSV Export: Export your bookmarks to CSV with day, time, poster, title, and session information
    • Easy Management: Add/remove bookmarks with a single click, clear all bookmarks at once
  • Deep Dive Chat: Add up to 25 papers to a Deep Dive queue and chat about them with OpenAI or Google Gemini
    • One-click add: Use the button on each paper card or "Add all to Deep Dive" for the current results
    • Smart Pre-upload: Papers are uploaded when you open the chat panel, making your first query instant
    • File Caching: Papers are cached per session, so there is no re-uploading on subsequent questions
    • Full Paper Access: Gemini reads complete PDFs natively (no truncation)
  • Markdown & LaTeX Support: Full rendering of mathematical formulas and formatted text
  • Customizable System Prompts: Configure how the AI responds to your questions
  • Model Selection: Choose from available OpenAI and Gemini models
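Semantic search ranks items by embedding similarity rather than keyword overlap. The app does this with Sentence Transformers embeddings stored in ChromaDB; the self-contained sketch below illustrates the core idea with made-up 3-dimensional vectors and titles:

```python
import math

# Toy "embeddings" -- the real app embeds titles/abstracts into
# high-dimensional vectors with Sentence Transformers and stores them in ChromaDB.
papers = {
    "Efficient Transformers for Long Contexts": [0.9, 0.1, 0.2],
    "Graph Neural Networks for Molecules":      [0.1, 0.9, 0.3],
    "Scaling Laws for Diffusion Models":        [0.4, 0.2, 0.9],
}

def cosine(a, b):
    # Cosine similarity: dot product normalized by vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(query_vec, k=2):
    # Rank all papers by similarity to the query embedding, return the top k.
    ranked = sorted(papers.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [title for title, _ in ranked[:k]]

print(search([0.8, 0.2, 0.1]))  # the transformers paper ranks first
```

In the real pipeline the query string is embedded with the same model as the documents, and ChromaDB performs this nearest-neighbor ranking at scale.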

Project Structure

NeuriScout/
├── backend/              # FastAPI backend server
│   ├── main.py          # API endpoints
│   ├── rag.py           # RAG logic and paper fetching
│   └── ingest.py        # Data ingestion script
├── frontend/            # Next.js frontend
│   └── src/
│       ├── app/         # Pages and components
│       └── lib/         # API client
├── data/                # Data files (CSV, JSON)
├── scripts/             # Utility scripts for data processing
└── chroma_db/          # Vector database (generated)

Setup

Prerequisites

  • Python 3.10+
  • Node.js 18+
  • OpenAI API key and/or Google Gemini API key

Installation

  1. Clone the repository:
git clone https://github.com/gminneci/NeuriScout.git
cd NeuriScout
  2. Set up Python environment and install backend:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -e .

This installs the package in editable mode with all dependencies and creates the neuriscout-backend and neuriscout-ingest commands.

  3. Generate the ChromaDB database:
neuriscout-ingest

This will process the data from:

  • data/papercopilot_neurips2025_merged_openreview.csv (papers)
  • data/neurips_2025_enriched_events.csv (workshops, tutorials, invited talks)
  • data/neurips_2025_expo_events.csv (expo events: panels, demonstrations, workshops)

This creates the vector database in chroma_db/ with embeddings for ~5,500 unique items (papers + events). Ingestion takes a few minutes and produces ~90MB of data.

Note: The ChromaDB is required for the application to work. Keep it in the chroma_db/ directory (it's excluded from git).

  4. Set up frontend:
cd frontend
npm install
cd ..
  5. Configure API keys (optional):

You can either set environment variables or enter them in the UI:

export OPENAI_API_KEY=your_key_here
export GEMINI_API_KEY=your_key_here

Running the Application

  1. Start the backend (in one terminal):
source venv/bin/activate  # On Windows: venv\Scripts\activate
neuriscout-backend

The backend will run on http://localhost:8000

Note: The first startup takes 10-30 seconds while loading AI models.

  2. Start the frontend (in a new terminal):
cd frontend
npm run dev

The frontend will run on http://localhost:3000

  3. Open your browser to http://localhost:3000

Usage

  1. Search Papers & Events: Enter keywords or research questions in the search box
  2. Filter Results:
    • Use the dropdowns to filter by author, affiliation, or session
    • Filter by session to find specific content types:
      • "Invited Talk" - Find all 6 invited talks (Rich Sutton, Zeynep Tufekci, Yejin Choi, Melanie Mitchell, Kyunghyun Cho, Andrew Saxe)
      • "Workshop" - Browse workshop sessions
      • "Tutorial" - Find tutorial sessions
      • "Expo Talk Panel", "Expo Workshop", "Expo Demonstration" - Browse expo events
      • Or filter by poster sessions (e.g., "San Diego Poster Session 1")
    • Use day filters (Tue-Sun) and time filters (AM/PM) to browse by conference schedule
    • Combine multiple filters with OR logic (e.g., "MIT" OR "Stanford")
  3. Bookmark Items:
    • Click the star icon on any paper or event card to bookmark it
    • View all bookmarks via the "Bookmarks" button in the header
    • Bookmarks are automatically sorted by day, time (AM/PM), and poster number
    • Poster numbers are displayed next to the time to help navigate the conference
    • Export all bookmarks to CSV for offline reference
    • Clear individual bookmarks or all at once
  4. Build Your Deep Dive:
    • Click "Add to Deep Dive" on individual paper cards (or "Add all to Deep Dive" for the current results)
    • Track how many slots remain (up to 25 papers can be active at once)
    • Remove papers via the Deep Dive button if you want to swap them out
  5. Deep Dive Chat:
    • Click "Deep Dive (X/25)" to open the chat panel
    • Papers are automatically uploaded in the background (Gemini only)
    • Click the settings icon to configure API keys and models
    • Ask questions about the Deep Dive papers and tweak the system prompt anytime
    • Subsequent questions are instant thanks to file caching
  6. Links to Sources:
    • "View on NeurIPS" opens the official NeurIPS virtual site page for the paper/event (for logged-in bookmarks)
    • "Paper" opens the OpenReview page for papers (reviews/discussion)
    • NeurIPS links are standardized and work for all content types (posters, orals, tutorials, workshops, invited talks)
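The bookmark ordering and CSV export described above come down to a compound sort key: conference day, then AM before PM, then poster number. A minimal sketch of that logic in Python (the real implementation lives in the frontend, and the record field names below are assumptions for illustration):

```python
import csv
import io

# NeurIPS 2025 runs Tue-Sun; map days to sortable indices.
DAY_ORDER = {d: i for i, d in enumerate(["Tue", "Wed", "Thu", "Fri", "Sat", "Sun"])}

bookmarks = [  # hypothetical bookmark records
    {"day": "Wed", "time": "PM", "poster": 312, "title": "Paper B", "session": "Poster Session 2"},
    {"day": "Tue", "time": "PM", "poster": 45,  "title": "Paper A", "session": "Poster Session 1"},
    {"day": "Wed", "time": "AM", "poster": 7,   "title": "Paper C", "session": "Poster Session 2"},
]

# Sort by day, then AM before PM (False sorts before True), then poster number.
bookmarks.sort(key=lambda b: (DAY_ORDER[b["day"]], b["time"] != "AM", b["poster"]))

# Export in the same column order the app uses for its CSV download.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["day", "time", "poster", "title", "session"])
writer.writeheader()
writer.writerows(bookmarks)
print(buf.getvalue())
```

Sorting AM/PM via a boolean comparison keeps the key tuple simple; with this data the export lists Paper A (Tue PM), then Paper C (Wed AM), then Paper B (Wed PM).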

Data Processing

The scripts/ directory contains utilities for:

  • Scraping paper data from various sources
  • Merging datasets
  • Data validation and debugging

Deployment (Free Hosting)

Deploy to Vercel (Frontend) + Render (Backend)

Backend Deployment (Render)

  1. Create a Render account at render.com

  2. Create a new Web Service:

    • Connect your GitHub repository
    • Name: neuriscout-backend
    • Environment: Python 3
    • Build Command: pip install -e .
    • Start Command: neuriscout-backend
  3. Add a Persistent Disk (for ChromaDB):

    • Go to your service settings
    • Add Disk: Mount path /opt/render/project/src/chroma_db, Size: 1GB
  4. Set Environment Variables:

    PYTHON_VERSION=3.11.0
    HOST=0.0.0.0
    PORT=8000
    CHROMA_DB_PATH=/opt/render/project/src/chroma_db
    ALLOWED_ORIGINS=https://your-app.vercel.app
    

    Optional (or enter in UI):

    OPENAI_API_KEY=your_key
    GEMINI_API_KEY=your_key
    
  5. Upload ChromaDB data:

    • The ChromaDB database is NOT included in the repository (it's ~90MB)
    • You need to generate it locally using neuriscout-ingest and then upload it
    • Options for uploading:
      • Via Render Shell: Access your service's Shell tab and run neuriscout-ingest
      • Manual upload: Use scp or Render's file upload to copy your local chroma_db/ directory to the persistent disk mount path
    • The database contains ~5,500 unique items (papers, workshops, tutorials, invited talks, expo events) with embeddings and takes a few minutes to generate
  6. Copy your service URL (e.g., https://neuriscout-backend.onrender.com)

Frontend Deployment (Vercel)

  1. Create a Vercel account at vercel.com

  2. Import your repository:

    • Click "New Project"
    • Import your GitHub repository
    • Root Directory: frontend
    • Framework Preset: Next.js
  3. Configure Environment Variables:

    • Add NEXT_PUBLIC_API_URL with your Render backend URL
    • Example: https://neuriscout-backend.onrender.com
  4. Update CORS on Backend:

    • Go back to Render dashboard
    • Update ALLOWED_ORIGINS to include your Vercel URL
    • Example: https://neuriscout.vercel.app,https://neuriscout-*.vercel.app
  5. Deploy!

    • Vercel will automatically deploy
    • Your app will be live at https://your-app.vercel.app
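ALLOWED_ORIGINS above is a comma-separated list of origins the backend should accept. A minimal sketch of how such a value is typically parsed before being handed to FastAPI's CORS middleware (the exact handling in backend/main.py may differ; this is an assumption for illustration):

```python
import os

def parse_allowed_origins(raw: str) -> list[str]:
    # Split the comma-separated env value and drop empty entries and whitespace.
    return [o.strip() for o in raw.split(",") if o.strip()]

origins = parse_allowed_origins(
    os.environ.get("ALLOWED_ORIGINS", "http://localhost:3000")
)
print(origins)

# The resulting list would then be passed to CORSMiddleware, e.g.:
# app.add_middleware(CORSMiddleware, allow_origins=origins, ...)
```

Trimming whitespace matters here: a value like "https://a.vercel.app, https://b.vercel.app" would otherwise produce an origin with a leading space that never matches.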

Important Notes:

  • Cold Starts: Render's free tier sleeps after 15 minutes of inactivity. First request takes ~30 seconds to wake up.
  • API Keys: You can set API keys as environment variables on Render, or users can enter them in the UI.
  • ChromaDB Database:
    • The vector database is NOT in the repository (excluded via .gitignore)
    • Generate it locally with neuriscout-ingest before deploying
    • Upload to the persistent disk after deployment (via Render Shell or manual transfer)
    • The database is ~90MB and contains embeddings for ~5,500 items (papers + events + expo)
  • Custom Domain: Both Vercel and Render support custom domains for free.

Alternative: Deploy to Railway

Railway offers $5 free credit per month and simpler deployment:

  1. Create Railway account at railway.app
  2. Deploy from GitHub:
    • New Project → Deploy from GitHub
    • Select your repository
  3. Set environment variables:
    HOST=0.0.0.0
    ALLOWED_ORIGINS=*
    CHROMA_DB_PATH=/app/chroma_db
    
  4. Upload ChromaDB:
    • Generate locally: neuriscout-ingest
    • Upload using Railway CLI or create a persistent volume and transfer files
    • The chroma_db/ directory (~90MB) must be accessible at the path set in CHROMA_DB_PATH

Note: Railway paid tier ($5/month) is recommended for reliable hosting with better resources.

Railway Admin Endpoints

The backend includes two admin endpoints for managing deployments:

GET /admin/status - Diagnostic information:

  • Base directory and ChromaDB path
  • Data file existence and sizes (CSV files)
  • Collection status (exists, item count)
  • Example: curl https://your-app.railway.app/admin/status

POST /admin/reingest - Manual data ingestion:

  • Runs the ingest process to populate ChromaDB
  • Returns stdout/stderr from the ingest process
  • Takes several minutes to complete (~5,500 items)
  • Example: curl -X POST https://your-app.railway.app/admin/reingest

Note: The /admin/reingest endpoint runs synchronously and blocks the API during execution. For large datasets, use SSH to run the ingest in the background.
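Since ingest takes several minutes, a small helper can decide from the /admin/status response whether the collection is fully populated. A sketch of that check (the JSON field names here are assumptions; inspect the endpoint's actual response before relying on them):

```python
def ingest_complete(status: dict, target: int = 5500, tolerance: float = 0.95) -> bool:
    # Hypothetical response shape: {"collection": {"exists": true, "count": 5480}}.
    # Treat ingest as complete once the count is within tolerance of the target.
    collection = status.get("collection", {})
    return bool(collection.get("exists")) and collection.get("count", 0) >= target * tolerance

# To use this against a deployment, fetch the JSON first, e.g. with urllib:
#   import json, urllib.request
#   status = json.load(urllib.request.urlopen("https://your-app.railway.app/admin/status"))
print(ingest_complete({"collection": {"exists": True, "count": 5480}}))  # complete
print(ingest_complete({"collection": {"exists": True, "count": 1200}}))  # still ingesting
```

Polling this in a loop (with a sleep between requests) avoids hammering the API while the synchronous reingest is running.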

Re-running Ingest via Railway SSH

If the automatic ingestion fails or you need to re-populate the database:

  1. Install Railway CLI (if not already installed):

    npm i -g @railway/cli
    # or
    brew install railway
  2. Authenticate:

    railway login
  3. Copy SSH command from Railway Dashboard:

    • Navigate to your project in the Railway dashboard
    • Right-click on your service
    • Select "Copy SSH Command" from the dropdown menu
    • This generates a command like:
      railway ssh --project=<project-id> --environment=<env-id> --service=<service-id>
  4. Connect and run ingest:

    # Connect using the copied SSH command
    railway ssh --project=<project-id> --environment=<env-id> --service=<service-id>
    
    # Inside the SSH session, run ingest in the background
    python -m backend.ingest > /app/ingest.log 2>&1 &
    
    # Exit the SSH session
    exit
  5. Monitor progress:

    # Check collection count via the status endpoint
    curl https://your-app.railway.app/admin/status | grep count
    
    # Or reconnect via SSH to check the log
    railway ssh --project=<project-id> --environment=<env-id> --service=<service-id>
    tail -f /app/ingest.log

The ingest process generates embeddings for ~5,500 unique items and takes several minutes. The collection count should reach ~5,500 when complete.

Alternative: Run a single command without an interactive session:

railway ssh --project=<project-id> --environment=<env-id> --service=<service-id> -- python -m backend.ingest

For more details on Railway SSH, see the Railway CLI SSH documentation.

Post-deploy: Populate NeurIPS Links

After merging to main, Railway will auto-deploy. To enable the new "View on NeurIPS" links, re-run the ingest so the ChromaDB contains the neurips_virtualsite_url metadata:

  1. Connect via Railway SSH (copy command from Dashboard):
railway ssh --project=<project-id> --environment=<env-id> --service=<service-id>
  2. Run ingest in the container:
python -m backend.ingest > /app/ingest.log 2>&1 &
  3. Monitor progress:
curl https://your-app.railway.app/admin/status | grep count
# or
tail -f /app/ingest.log

Target count is ~5,500 items. Once complete, the frontend shows both "View on NeurIPS" and "Paper" buttons.

Technology Stack

Backend:

  • FastAPI
  • ChromaDB (vector database)
  • Sentence Transformers (embeddings)
  • OpenAI API / Google Gemini API

Frontend:

  • Next.js 16
  • React
  • TypeScript
  • Tailwind CSS
  • React Markdown + KaTeX

License

MIT License
