Vector Search Demo

A demonstration of MongoDB Atlas Vector Search, featuring multilingual articles and a live, side-by-side comparison of three search methods.

🚀 Features

🔍 Multiple Search Methods: Compare MongoDB Regex, Atlas Lucene, and Vector Semantic search side-by-side
🌍 Multilingual Support: Search across English, Spanish, and French articles
📊 Admin Dashboard: Real-time statistics and management interface
📰 Article Management: Fetch articles directly from The Guardian API
🧠 Vector Generation: Generate embeddings for semantic search with real-time progress
⚙️ Web-Based Management: Everything managed through a clean, modern web interface

🏗️ Architecture

Backend

Express.js server with TypeScript
MongoDB Atlas for document storage
Vector Search Index for semantic search
Atlas Search Index for full-text search
Local embeddings using @xenova/transformers (Transformers.js): free, with no API keys needed

Frontend

React with TypeScript
Responsive design with modern UI components
Real-time progress tracking for long-running operations
Server-sent events for live updates

📋 Prerequisites

MongoDB Atlas Account (free tier works)
Guardian API Key (free from The Guardian Open Platform)
Node.js 16+ and npm

🛠️ Setup

1. Environment Variables

Create a .env file in the root directory:

# Required
MONGODB_URI=mongodb+srv://username:password@your-cluster.mongodb.net/vector-search

# Optional (for article fetching)
GUARDIAN_API_KEY=your-guardian-api-key-here
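A minimal sketch of how the server might validate these variables at startup, so a missing MONGODB_URI fails immediately instead of surfacing later as a connection error. The loadConfig function and AppConfig type are hypothetical and not taken from the project's source; only the variable names come from the .env example.

```typescript
// Hypothetical config shape; the variable names match the .env example.
interface AppConfig {
  mongodbUri: string;
  guardianApiKey?: string; // only needed for article fetching
}

type Env = Record<string, string | undefined>;

// Validate once at startup so misconfiguration fails fast.
function loadConfig(env: Env): AppConfig {
  const uri = env.MONGODB_URI?.trim();
  if (!uri) {
    throw new Error("MONGODB_URI is required");
  }
  // Accept both SRV (the Atlas default) and standard connection strings.
  if (!uri.startsWith("mongodb+srv://") && !uri.startsWith("mongodb://")) {
    throw new Error("MONGODB_URI must start with mongodb+srv:// or mongodb://");
  }
  const key = env.GUARDIAN_API_KEY?.trim();
  // Treat an empty value the same as an unset one.
  return { mongodbUri: uri, guardianApiKey: key ? key : undefined };
}

// Usage in Node: const config = loadConfig(process.env);
```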

2. Install Dependencies

# Install backend dependencies
npm install

# Install frontend dependencies
npm run install-frontend

3. MongoDB Atlas Setup

Create a new cluster in MongoDB Atlas
Create a database called vector-search
Create a collection called articles
Create two search indexes:

Vector Search Index

Index Name: vector_index
Configuration:

{
  "fields": [
    {
      "numDimensions": 384,
      "path": "contentVector",
      "similarity": "cosine",
      "type": "vector"
    },
    {
      "path": "languageCode",
      "type": "filter"
    },
    {
      "path": "wordCount",
      "type": "filter"
    }
  ]
}
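For reference, here is a sketch of an aggregation pipeline that could query this index with the optional language and word-count filters. It is an assumption about how the server builds its query, not the project's actual code: the option names, the default limit of 10, and the numCandidates ratio of 20 times the limit are illustrative choices. The stage shape ($vectorSearch with index, path, queryVector, numCandidates, limit, and filter) follows the Atlas Vector Search documentation.

```typescript
// Illustrative request options; these field names are not taken from the project's code.
interface VectorSearchOptions {
  queryVector: number[]; // embedding of the query text
  limit?: number; // number of results to return
  languageCode?: string; // e.g. "en", "es", "fr"
  minWordCount?: number;
}

// Must equal numDimensions in the vector_index definition.
const EMBEDDING_DIMS = 384;

function buildVectorSearchPipeline(opts: VectorSearchOptions): Array<Record<string, any>> {
  if (opts.queryVector.length !== EMBEDDING_DIMS) {
    throw new Error(`expected ${EMBEDDING_DIMS} dimensions, got ${opts.queryVector.length}`);
  }
  const limit = opts.limit ?? 10;

  // Pre-filter conditions: only fields indexed with type "filter" may appear here.
  const conditions: Array<Record<string, any>> = [];
  if (opts.languageCode) conditions.push({ languageCode: { $eq: opts.languageCode } });
  if (opts.minWordCount !== undefined) conditions.push({ wordCount: { $gte: opts.minWordCount } });

  const stage: Record<string, any> = {
    index: "vector_index",
    path: "contentVector",
    queryVector: opts.queryVector,
    // Candidate pool for approximate search; larger improves recall at some latency cost.
    numCandidates: limit * 20,
    limit,
  };
  if (conditions.length === 1) stage.filter = conditions[0];
  else if (conditions.length > 1) stage.filter = { $and: conditions };

  return [
    { $vectorSearch: stage },
    // Return readable fields plus the similarity score, omitting the 384-float vector.
    {
      $project: {
        title: 1,
        content: 1,
        languageCode: 1,
        wordCount: 1,
        score: { $meta: "vectorSearchScore" },
      },
    },
  ];
}
```

The resulting array can be passed to collection.aggregate() in the Node.js driver. The query vector must come from the same embedding model used to fill contentVector; vectors from different models are not comparable.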

Atlas Search Index

Index Name: search-index
Configuration:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": {
        "type": "string",
        "analyzer": "lucene.standard"
      },
      "content": {
        "type": "string",
        "analyzer": "lucene.standard"
      },
      "languageCode": {
        "type": "token"
      },
      "wordCount": {
        "type": "number"
      }
    }
  }
}
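A comparable sketch for the Atlas Search index: a compound query that scores a fuzzy text match on title and content and applies the language and word-count options as non-scoring filters. The function and option names are hypothetical, and maxEdits: 1 is an illustrative choice; the text, equals, range, and highlight operators are standard Atlas Search features that fit the field types in this index definition.

```typescript
interface TextSearchOptions {
  query: string;
  limit?: number;
  languageCode?: string;
  minWordCount?: number;
}

function buildAtlasSearchPipeline(opts: TextSearchOptions): Array<Record<string, any>> {
  const paths = ["title", "content"];

  // Non-scoring clauses: narrow the results without changing relevance.
  const filter: Array<Record<string, any>> = [];
  // "token" fields support exact matches with the equals operator.
  if (opts.languageCode) filter.push({ equals: { path: "languageCode", value: opts.languageCode } });
  // "number" fields support range queries.
  if (opts.minWordCount !== undefined) filter.push({ range: { path: "wordCount", gte: opts.minWordCount } });

  const compound: Record<string, any> = {
    // Scored clause: analyzed full-text match allowing one edit per term.
    must: [{ text: { query: opts.query, path: paths, fuzzy: { maxEdits: 1 } } }],
  };
  if (filter.length > 0) compound.filter = filter;

  return [
    { $search: { index: "search-index", compound, highlight: { path: paths } } },
    { $limit: opts.limit ?? 10 },
    {
      $project: {
        title: 1,
        languageCode: 1,
        wordCount: 1,
        score: { $meta: "searchScore" },
        highlights: { $meta: "searchHighlights" },
      },
    },
  ];
}
```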

🚀 Running the Application

Development Mode

npm run dev

This starts both the Express server (port 3001) and the React frontend (port 3000).

Production Mode

# Build the application
npm run build
npm run build-frontend

# Start the server
npm start

📱 Using the Application

🔍 Search Tab

Enter a search term (e.g., "artificial intelligence", "renewable energy")
Select search methods to compare (MongoDB, Atlas, Vector, or all)
Filter by language (optional)
Set minimum word count (optional)
Click Search to see results from all selected methods

⚙️ Admin Tab

📊 Statistics Dashboard

View total articles and embedding progress
See language distribution breakdown
Real-time statistics with refresh button
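The dashboard numbers could be computed in one round trip with a $facet stage, as in this sketch. It assumes embedded articles are the ones that have a contentVector field and that each article's language is stored in languageCode (both names come from the index definitions); the function names and output shape are hypothetical.

```typescript
// One aggregation that returns all dashboard numbers in a single document.
function buildStatsPipeline(): Array<Record<string, any>> {
  return [
    {
      $facet: {
        total: [{ $count: "count" }],
        // Articles that already have an embedding.
        embedded: [{ $match: { contentVector: { $exists: true } } }, { $count: "count" }],
        // Article count per language, largest first.
        languages: [
          { $group: { _id: "$languageCode", count: { $sum: 1 } } },
          { $sort: { count: -1 } },
        ],
      },
    },
  ];
}

// Shape of the single document produced by the $facet stage.
interface StatsFacet {
  total: { count: number }[];
  embedded: { count: number }[];
  languages: { _id: string; count: number }[];
}

function summarizeStats(f: StatsFacet) {
  // $count emits no document for an empty input, so default to 0.
  const total = f.total[0]?.count ?? 0;
  const embedded = f.embedded[0]?.count ?? 0;
  return {
    total,
    embedded,
    percentEmbedded: total === 0 ? 0 : Math.round((embedded / total) * 100),
    languages: f.languages.map((l) => ({ code: l._id, count: l.count })),
  };
}
```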

📰 Fetch Articles

Set your Guardian API key in the .env file (see setup instructions)
Set number of articles to fetch (1-1000)
Optionally specify a topic for focused content
Click "Fetch Articles" to pull fresh content from The Guardian
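Fetching up to 1000 articles takes several requests, because the Guardian Content API returns search results one page at a time. This sketch builds the page URLs for a request. It assumes a maximum page-size of 50 (check the Guardian Open Platform documentation for the current limit) and requests the headline, bodyText, and wordcount fields; the function name and field selection are hypothetical.

```typescript
const GUARDIAN_SEARCH_URL = "https://content.guardianapis.com/search";
// Assumed per-request cap on page-size; verify against the Guardian docs.
const MAX_PAGE_SIZE = 50;

// Build one search URL per page needed to cover `requested` articles (1-1000).
function buildGuardianUrls(apiKey: string, requested: number, topic?: string): string[] {
  if (!Number.isInteger(requested) || requested < 1 || requested > 1000) {
    throw new RangeError("requested must be an integer from 1 to 1000");
  }
  // Keep page-size constant so page N always starts at offset (N - 1) * pageSize.
  const pageSize = Math.min(MAX_PAGE_SIZE, requested);
  const pageCount = Math.ceil(requested / pageSize);
  const urls: string[] = [];
  for (let page = 1; page <= pageCount; page++) {
    const params: Record<string, string> = {
      "api-key": apiKey,
      "page-size": String(pageSize),
      page: String(page),
      // Ask for the article text and word count alongside the headline.
      "show-fields": "headline,bodyText,wordcount",
    };
    if (topic) params.q = topic;
    const query = Object.entries(params)
      .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
      .join("&");
    urls.push(`${GUARDIAN_SEARCH_URL}?${query}`);
  }
  return urls;
}
```

Using the same page-size for every page keeps page numbers aligned with result offsets; the last page may return more articles than needed, so the caller should trim the combined results to the requested count.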

🧠 Generate Embeddings

Click "Generate Embeddings" to create vectors for articles that don't have one yet
Watch real-time progress with percentage and status updates
Statistics refresh automatically when generation completes
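The progress updates are streamed as server-sent events (the /api/generate-embeddings endpoint is marked SSE). The sketch below shows the framing on both ends under assumed payload fields (processed, total, percent, and status are hypothetical). Because the endpoint is a POST and the browser's EventSource only issues GET requests, the frontend would read the stream with fetch() and parse frames itself; this parser assumes each chunk holds complete frames and omits buffering of partial ones.

```typescript
// Hypothetical progress payload; the real endpoint may use different fields.
interface EmbeddingProgress {
  processed: number;
  total: number;
  percent: number;
  status: "running" | "done";
}

function makeProgress(processed: number, total: number): EmbeddingProgress {
  // An empty batch counts as complete.
  const percent = total === 0 ? 100 : Math.floor((processed / total) * 100);
  return { processed, total, percent, status: processed >= total ? "done" : "running" };
}

// Server side: an SSE frame is "field: value" lines ended by a blank line.
// JSON.stringify escapes newlines, so the payload fits on one data line.
function toSseFrame(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Client side: split a chunk of complete frames into { event, data } pairs.
function parseSseFrames(chunk: string): { event: string; data: string }[] {
  return chunk
    .split("\n\n")
    .filter((frame) => frame.trim() !== "")
    .map((frame) => {
      let event = "message"; // default event type when none is given
      const dataLines: string[] = [];
      for (const line of frame.split("\n")) {
        if (line.startsWith("event:")) event = line.slice(6).trim();
        // Per the SSE format, a single space after the colon is dropped.
        else if (line.startsWith("data:")) dataLines.push(line.slice(5).replace(/^ /, ""));
      }
      return { event, data: dataLines.join("\n") };
    });
}
```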

🌟 Search Method Comparison

🔍 MongoDB Regex Search

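Best for: Exact phrase matching
Search method: Case-insensitive regex on the whole search term
Speed: Very fast on small collections; an unanchored, case-insensitive regex cannot use an index efficiently, so queries slow down as the collection grows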
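A sketch of the kind of filter this method implies: the search term is escaped so characters such as + or . match literally (unescaped, a term like "c++" would be an invalid pattern), then matched case-insensitively against title and content. Which fields the project actually searches is an assumption here, and escapeRegex and buildRegexFilter are hypothetical names.

```typescript
// Escape regex metacharacters so the user's term is matched literally.
function escapeRegex(input: string): string {
  return input.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Case-insensitive match of the whole term in title or content (assumed fields),
// plus the optional language and word-count filters.
function buildRegexFilter(term: string, languageCode?: string, minWordCount?: number): Record<string, any> {
  const pattern = escapeRegex(term.trim());
  const filter: Record<string, any> = {
    $or: [
      { title: { $regex: pattern, $options: "i" } },
      { content: { $regex: pattern, $options: "i" } },
    ],
  };
  if (languageCode) filter.languageCode = languageCode;
  if (minWordCount !== undefined) filter.wordCount = { $gte: minWordCount };
  return filter;
}

// Usage with the Node.js driver: collection.find(buildRegexFilter("renewable energy")).limit(10)
```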

📊 Atlas Lucene Search

Best for: Traditional full-text search
Features: Fuzzy matching, flexible word order
Highlighting: Shows matched terms
Speed: Fast
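Atlas Search returns highlights through $meta: "searchHighlights" as a list of passages, each with a path, a score, and texts whose type is either "hit" (a matched term) or "text" (surrounding context). The helper below is a hypothetical sketch, not part of the project, that joins those fragments and wraps hits in markers.

```typescript
// Shape of the entries returned by { $meta: "searchHighlights" }.
interface HighlightText {
  value: string;
  type: "hit" | "text"; // "hit" = matched term, "text" = surrounding context
}
interface Highlight {
  path: string;
  score: number;
  texts: HighlightText[];
}

// Join each passage's fragments, wrapping hits in open/close markers,
// with the highest-scoring passage first.
function renderHighlights(highlights: Highlight[], open = "<mark>", close = "</mark>"): string[] {
  return highlights
    .slice()
    .sort((a, b) => b.score - a.score)
    .map((h) => h.texts.map((t) => (t.type === "hit" ? open + t.value + close : t.value)).join(""));
}
```

If the resulting strings are inserted as HTML, escape each value first; in React it is usually simpler to map fragments to elements directly.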

🧠 Vector Semantic Search

Best for: Understanding context and meaning
Features: Semantic similarity, concept matching
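Cross-language: Can match across languages when the embedding model is multilingual
Speed: Moderate (each query must be embedded before searching, and articles need embeddings generated in advance)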
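Vector search ranks articles by the cosine similarity declared in vector_index. The arithmetic is sketched below for illustration only, since Atlas computes it internally. According to the Atlas Vector Search scoring documentation, cosine results are reported normalized as (1 + cosine) / 2, so scores fall between 0 and 1. Both helper functions are hypothetical.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same dimension");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Define similarity with a zero vector as 0 rather than dividing by zero.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Atlas reports cosine results normalized to the range 0 to 1.
function atlasCosineScore(cosine: number): number {
  return (1 + cosine) / 2;
}
```

Because cosine similarity ignores vector length, two articles that discuss the same concept score highly even if one is much longer; for unit-length embeddings it reduces to the plain dot product.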

🔧 API Endpoints

GET /api/health - Health check
GET /api/stats - Collection statistics
GET /api/languages - Available languages
GET /api/search - Search with multiple methods
POST /api/fetch-articles - Fetch articles from Guardian
POST /api/generate-embeddings - Generate embeddings (SSE)
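A sketch of how a client might call GET /api/search and how the server might validate the requested methods. The parameter names (q, methods, language, minWords) and the method identifiers are assumptions that mirror the Search tab options; the server's route handler defines the real ones.

```typescript
type SearchMethod = "mongodb" | "atlas" | "vector";
const ALL_METHODS: SearchMethod[] = ["mongodb", "atlas", "vector"];

// Hypothetical request shape mirroring the Search tab options.
interface SearchRequest {
  q: string;
  methods: SearchMethod[];
  language?: string;
  minWords?: number;
}

// Client side: build GET /api/search?... with percent-encoded values.
function buildSearchUrl(base: string, req: SearchRequest): string {
  const params: [string, string][] = [
    ["q", req.q],
    ["methods", req.methods.join(",")],
  ];
  if (req.language) params.push(["language", req.language]);
  if (req.minWords !== undefined) params.push(["minWords", String(req.minWords)]);
  const qs = params.map(([k, v]) => `${k}=${encodeURIComponent(v)}`).join("&");
  return `${base}/api/search?${qs}`;
}

// Server side: keep only known methods; default to all three when none are given.
function parseMethods(raw: string | undefined): SearchMethod[] {
  if (!raw) return [...ALL_METHODS];
  const wanted = raw.split(",").map((s) => s.trim().toLowerCase());
  return ALL_METHODS.filter((m) => wanted.includes(m));
}
```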

📦 Deployment

Quick Deploy Options

Vercel/Netlify (Frontend + Serverless Backend)
Railway/Render (Full-stack deployment)
Docker (Container deployment)

Environment Setup for Production

Set NODE_ENV=production
Restrict CORS to your frontend's domain
Use a production MongoDB connection string
Serve the application over HTTPS (SSL/TLS)
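For the CORS item, one option is an explicit origin allowlist. This sketch assumes a hypothetical ALLOWED_ORIGINS variable holding comma-separated origins; it is not part of the documented configuration.

```typescript
// Parse a comma-separated list such as a hypothetical ALLOWED_ORIGINS value.
// Trailing slashes are stripped because browsers send Origin without one.
function parseAllowedOrigins(raw: string | undefined): Set<string> {
  const origins = (raw ?? "")
    .split(",")
    .map((o) => o.trim().replace(/\/+$/, ""))
    .filter((o) => o.length > 0);
  return new Set(origins);
}

// True if the request may proceed. A missing Origin header means the request is
// not a cross-origin browser request (e.g. curl or a health check).
function isOriginAllowed(origin: string | undefined, allowed: Set<string>): boolean {
  if (!origin) return true;
  return allowed.has(origin);
}
```

If the server uses the cors middleware package, its origin option accepts a callback, for example cors({ origin: (origin, cb) => cb(null, isOriginAllowed(origin, allowed)) }).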

🤝 Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request

📄 License

MIT License - see LICENSE file for details

🆘 Support

Issues: Create a GitHub issue
Questions: Check the documentation or create a discussion
Guardian API: Official documentation

🎯 Example Searches to Try

"artificial intelligence" - Compare how each method handles tech terms
"renewable energy" - See semantic understanding across languages
"salud" (Spanish for health) - Test cross-language capabilities
"climate" - Watch vector search find related concepts like "environment"

Ready to explore the future of search? 🚀

Get started with npm run dev and open http://localhost:3000!

amiram/vector-search
