amiram/vector-search

Vector Search Demo

A comprehensive demonstration of MongoDB Atlas Vector Search capabilities, featuring multilingual articles and real-time comparison of different search methods.

🚀 Features

  • 🔍 Multiple Search Methods: Compare MongoDB Regex, Atlas Lucene, and Vector Semantic search side-by-side
  • 🌍 Multilingual Support: Search across English, Spanish, and French articles
  • 📊 Admin Dashboard: Real-time statistics and management interface
  • 📰 Article Management: Fetch articles directly from The Guardian API
  • 🧠 Vector Generation: Generate embeddings for semantic search with real-time progress
  • ⚙️ Web-Based Management: Everything managed through a clean, modern web interface

🏗️ Architecture

Backend

  • Express.js server with TypeScript
  • MongoDB Atlas for document storage
  • Vector Search Index for semantic search
  • Atlas Search Index for full-text search
  • Local embeddings using Xenova/transformers (free, no API keys needed)

Frontend

  • React with TypeScript
  • Responsive design with modern UI components
  • Real-time progress tracking for long-running operations
  • Server-sent events for live updates

📋 Prerequisites

  1. MongoDB Atlas Account (free tier works)
  2. Guardian API Key (free from The Guardian Open Platform)
  3. Node.js 16+ and npm

🛠️ Setup

1. Environment Variables

Create a .env file in the root directory:

# Required
MONGODB_URI=mongodb+srv://username:<password>@<cluster-host>/vector-search

# Optional (for article fetching)
GUARDIAN_API_KEY=your-guardian-api-key-here
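
A minimal sketch of how the server might validate these two variables at startup. The `loadConfig` helper and the `AppConfig` shape are hypothetical and not taken from the repository; only the variable names come from the `.env` file above.

```typescript
// Hypothetical helper: validates the variables named in the .env file.
interface AppConfig {
  mongodbUri: string;
  guardianApiKey: string | null; // null when article fetching is not configured
}

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const mongodbUri = (env.MONGODB_URI ?? "").trim();
  if (mongodbUri === "") {
    throw new Error("MONGODB_URI is required");
  }
  // Both standard and SRV connection strings are accepted.
  if (!/^mongodb(\+srv)?:\/\//.test(mongodbUri)) {
    throw new Error("MONGODB_URI must start with mongodb:// or mongodb+srv://");
  }
  const key = (env.GUARDIAN_API_KEY ?? "").trim();
  return { mongodbUri, guardianApiKey: key === "" ? null : key };
}
```

In a Node.js server this would typically be called once as `loadConfig(process.env)`, after the `.env` file has been loaded. Failing fast on a missing URI gives a clearer error than a connection timeout later.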

2. Install Dependencies

# Install backend dependencies
npm install

# Install frontend dependencies
npm run install-frontend

3. MongoDB Atlas Setup

  1. Create a new cluster in MongoDB Atlas
  2. Create a database called vector-search
  3. Create a collection called articles
  4. Create two search indexes:

Vector Search Index

  • Index Name: vector_index
  • Configuration:
{
  "fields": [
    {
      "numDimensions": 384,
      "path": "contentVector",
      "similarity": "cosine",
      "type": "vector"
    },
    {
      "path": "languageCode",
      "type": "filter"
    },
    {
      "path": "wordCount",
      "type": "filter"
    }
  ]
}
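
For illustration, here is a sketch of an aggregation pipeline that could query this index with the `$vectorSearch` stage. The function name, the `title` projection, and the choice of `numCandidates` (ten times the limit) are assumptions; the index name, vector path, and filter fields come from the configuration above.

```typescript
interface VectorFilters {
  languageCode?: string;
  minWordCount?: number;
}

const VECTOR_DIMENSIONS = 384; // must match numDimensions in the index

// Hypothetical builder: returns stages suitable for collection.aggregate().
function buildVectorSearchPipeline(
  queryVector: number[],
  filters: VectorFilters = {},
  limit = 10,
): Record<string, unknown>[] {
  if (queryVector.length !== VECTOR_DIMENSIONS) {
    throw new Error(`Expected ${VECTOR_DIMENSIONS} dimensions, got ${queryVector.length}`);
  }

  // Only fields indexed with type "filter" can be used here.
  const filter: Record<string, unknown> = {};
  if (filters.languageCode) filter.languageCode = { $eq: filters.languageCode };
  if (filters.minWordCount !== undefined) filter.wordCount = { $gte: filters.minWordCount };

  const vectorSearch: Record<string, unknown> = {
    index: "vector_index",
    path: "contentVector",
    queryVector,
    numCandidates: Math.min(limit * 10, 10000), // assumed oversampling factor
    limit,
  };
  if (Object.keys(filter).length > 0) vectorSearch.filter = filter;

  return [
    { $vectorSearch: vectorSearch },
    // Inclusion projection: keeps the large vector out of the response.
    { $project: { title: 1, languageCode: 1, wordCount: 1, score: { $meta: "vectorSearchScore" } } },
  ];
}
```

The dimension check catches a mismatch between the embedding model and the index before the query reaches Atlas.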

Atlas Search Index

  • Index Name: search-index
  • Configuration:
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": {
        "type": "text",
        "analyzer": "lucene.standard"
      },
      "content": {
        "type": "text",
        "analyzer": "lucene.standard"
      },
      "languageCode": {
        "type": "token"
      },
      "wordCount": {
        "type": "number"
      }
    }
  }
}
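
As a sketch of how this index could be queried, the builder below assembles a `$search` stage with a fuzzy `text` clause, optional filters, and highlighting. The function name, the `maxEdits: 1` setting, and the projected fields are assumptions; the index name and field mappings come from the configuration above.

```typescript
interface TextSearchFilters {
  languageCode?: string;
  minWordCount?: number;
}

// Hypothetical builder: returns stages suitable for collection.aggregate().
function buildAtlasSearchPipeline(
  query: string,
  filters: TextSearchFilters = {},
  limit = 10,
): Record<string, unknown>[] {
  // Filter clauses restrict results without affecting the relevance score.
  const filterClauses: Record<string, unknown>[] = [];
  if (filters.languageCode) {
    // "equals" on a string requires the field to be indexed as "token".
    filterClauses.push({ equals: { path: "languageCode", value: filters.languageCode } });
  }
  if (filters.minWordCount !== undefined) {
    filterClauses.push({ range: { path: "wordCount", gte: filters.minWordCount } });
  }

  const compound: Record<string, unknown> = {
    must: [{ text: { query, path: ["title", "content"], fuzzy: { maxEdits: 1 } } }],
  };
  if (filterClauses.length > 0) compound.filter = filterClauses;

  return [
    { $search: { index: "search-index", compound, highlight: { path: ["title", "content"] } } },
    { $limit: limit },
    {
      $project: {
        title: 1,
        languageCode: 1,
        wordCount: 1,
        score: { $meta: "searchScore" },
        highlights: { $meta: "searchHighlights" },
      },
    },
  ];
}
```

Because `dynamic` is `false`, only the four mapped fields are searchable, so any additional path used in a query would also need to be added to the index definition.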

🚀 Running the Application

Development Mode

npm run dev

This starts both the server (port 3001) and the React frontend (port 3000).

Production Mode

# Build the application
npm run build
npm run build-frontend

# Start the server
npm start

📱 Using the Application

🔍 Search Tab

  1. Enter a search term (e.g., "artificial intelligence", "renewable energy")
  2. Select search methods to compare (MongoDB, Atlas, Vector, or all)
  3. Filter by language (optional)
  4. Set minimum word count (optional)
  5. Click Search to see results from all selected methods

⚙️ Admin Tab

📊 Statistics Dashboard

  • View total articles and embedding progress
  • See language distribution breakdown
  • Real-time statistics with refresh button

📰 Fetch Articles

  1. Set your Guardian API key in the .env file (see setup instructions)
  2. Set number of articles to fetch (1-1000)
  3. Optionally specify a topic for focused content
  4. Click "Fetch Articles" to pull fresh content from The Guardian
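
A rough sketch of how a request for up to 1000 articles could be split into paged calls to the Guardian Content API. The helper name is hypothetical, and the page-size cap of 50 is an assumption about the API's limits; the query parameters (`q`, `page`, `page-size`, `show-fields`, `api-key`) are those of the Guardian search endpoint.

```typescript
const GUARDIAN_SEARCH_URL = "https://content.guardianapis.com/search";
const MAX_PAGE_SIZE = 50; // assumed per-request cap

// Hypothetical helper: one URL per page needed to cover `count` articles.
function buildGuardianUrls(apiKey: string, count: number, topic?: string): string[] {
  if (!Number.isInteger(count) || count < 1 || count > 1000) {
    throw new Error("count must be an integer from 1 to 1000");
  }
  const pageSize = Math.min(count, MAX_PAGE_SIZE);
  const pages = Math.ceil(count / pageSize);

  const urls: string[] = [];
  for (let page = 1; page <= pages; page++) {
    const params: string[] = [];
    if (topic) params.push(`q=${encodeURIComponent(topic)}`);
    params.push(
      `page=${page}`,
      `page-size=${pageSize}`,
      "show-fields=bodyText", // ask for the article body as plain text
      `api-key=${encodeURIComponent(apiKey)}`,
    );
    urls.push(`${GUARDIAN_SEARCH_URL}?${params.join("&")}`);
  }
  return urls;
}
```

The last page may return more items than needed (120 articles means three pages of 50), so the caller would trim the combined results to `count`.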

🧠 Generate Embeddings

  1. Click "Generate Embeddings" for articles without vectors
  2. Watch real-time progress with percentage and status updates
  3. Automatic refresh of statistics when complete
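
The progress reporting described above can be sketched as a loop that embeds one article at a time and emits a percentage after each. Everything here is an illustrative assumption: the `embed` and `save` callbacks stand in for the local embedding model and the database update, and the type names are hypothetical.

```typescript
interface PendingArticle {
  id: string;
  content: string;
}

interface EmbeddingProgress {
  processed: number;
  total: number;
  percent: number;
}

// Injected dependencies keep the loop independent of any specific model or driver.
type Embedder = (text: string) => Promise<number[]>;
type VectorSaver = (id: string, vector: number[]) => Promise<void>;

function toProgress(processed: number, total: number): EmbeddingProgress {
  // An empty queue counts as complete.
  const percent = total === 0 ? 100 : Math.round((processed / total) * 100);
  return { processed, total, percent };
}

async function embedPending(
  articles: PendingArticle[],
  embed: Embedder,
  save: VectorSaver,
  onProgress: (progress: EmbeddingProgress) => void,
): Promise<void> {
  let processed = 0;
  for (const article of articles) {
    const vector = await embed(article.content);
    await save(article.id, vector);
    processed += 1;
    onProgress(toProgress(processed, articles.length));
  }
}
```

Processing articles sequentially keeps memory use predictable with a local model; each `onProgress` call maps naturally to one server-sent event.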

🌟 Search Method Comparison

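🔍 MongoDB Regex Search

  • Best for: Exact phrase matching
  • Search method: Case-insensitive regex on the whole search term
  • Speed: Very fast on small collections; an unanchored, case-insensitive regex cannot use an index efficiently, so it slows down as the collection grows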
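
A minimal sketch of this kind of query, assuming the search runs against the `title` and `content` fields. The helper names are hypothetical. Escaping the user's input matters because characters such as `.` or `(` would otherwise be interpreted as regex syntax.

```typescript
// Escape regex metacharacters so the term is matched literally.
function escapeRegex(term: string): string {
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds a filter document for collection.find().
function buildRegexFilter(term: string, languageCode?: string): Record<string, unknown> {
  const pattern = escapeRegex(term.trim());
  const filter: Record<string, unknown> = {
    $or: [
      { title: { $regex: pattern, $options: "i" } }, // "i" = case-insensitive
      { content: { $regex: pattern, $options: "i" } },
    ],
  };
  if (languageCode) filter.languageCode = languageCode;
  return filter;
}
```

Since the whole term is matched as one phrase, a search for "renewable energy" will not match "energy that is renewable", which is the main contrast with the Atlas Lucene search.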

📊 Atlas Lucene Search

  • Best for: Traditional full-text search
  • Features: Fuzzy matching, flexible word order
  • Highlighting: Shows matched terms
  • Speed: Fast

🧠 Vector Semantic Search

  • Best for: Understanding context and meaning
  • Features: Semantic similarity, concept matching
  • Cross-language: Works across languages
  • Speed: Moderate (the query text must be converted to an embedding before the search runs)
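
To make "semantic similarity" concrete, here is a small sketch of the cosine measure that the vector index is configured to use. Atlas computes this internally; the code is illustrative only. The `toAtlasScore` normalization reflects my understanding of how Atlas maps cosine similarity to a 0..1 score and should be treated as an assumption.

```typescript
// Cosine similarity: 1 = same direction, 0 = unrelated, -1 = opposite.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("Vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Assumed normalization of cosine similarity into the 0..1 range.
function toAtlasScore(cosine: number): number {
  return (1 + cosine) / 2;
}
```

Two articles about the same concept should yield embeddings that point in similar directions, giving a cosine close to 1 even when they share no words.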

🔧 API Endpoints

  • GET /api/health - Health check
  • GET /api/stats - Collection statistics
  • GET /api/languages - Available languages
  • GET /api/search - Search with multiple methods
  • POST /api/fetch-articles - Fetch articles from Guardian
  • POST /api/generate-embeddings - Generate embeddings (SSE)
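
Because the embeddings endpoint is a POST, the browser's `EventSource` (which only issues GET requests) cannot consume it directly; a client would more likely read the streamed response body and split it into events itself. Below is a minimal sketch of that parsing step. The function name is hypothetical, and it assumes `\n` line endings and JSON payloads in `data:` lines.

```typescript
interface ParsedSse {
  messages: string[]; // complete "data" payloads, in arrival order
  rest: string; // trailing partial event to prepend to the next chunk
}

// Splits buffered SSE text into complete events; events end with a blank line.
function parseSseChunk(buffer: string): ParsedSse {
  const frames = buffer.split("\n\n");
  const rest = frames.pop() ?? ""; // last piece may be incomplete
  const messages: string[] = [];
  for (const frame of frames) {
    const data = frame
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, "")) // drop "data:" and one optional space
      .join("\n"); // multiple data lines form one multi-line payload
    if (data !== "") messages.push(data);
  }
  return { messages, rest };
}
```

A caller would keep `rest` between reads, append each decoded chunk to it, and pass each returned message to `JSON.parse` to obtain the progress update.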

📦 Deployment

Quick Deploy Options

  1. Vercel/Netlify (Frontend + Serverless Backend)
  2. Railway/Render (Full-stack deployment)
  3. Docker (Container deployment)

Environment Setup for Production

  • Set NODE_ENV=production
  • Configure CORS for your domain
  • Set up proper MongoDB connection string
  • Consider setting up SSL/TLS

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

📄 License

MIT License - see LICENSE file for details

🆘 Support

  • Issues: Create a GitHub issue
  • Questions: Check the documentation or create a discussion
  • Guardian API: Official documentation

🎯 Example Searches to Try

  • "artificial intelligence" - Compare how each method handles tech terms
  • "renewable energy" - See semantic understanding across languages
  • "salud" (Spanish for health) - Test cross-language capabilities
  • "climate" - Watch vector search find related concepts like "environment"

Ready to explore the future of search? πŸš€

Get started with npm run dev and open http://localhost:3000!
