A comprehensive demonstration of MongoDB Atlas Vector Search capabilities, featuring multilingual articles and real-time comparison of different search methods.
- Multiple Search Methods: Compare MongoDB Regex, Atlas Lucene, and Vector Semantic search side-by-side
- Multilingual Support: Search across English, Spanish, and French articles
- Admin Dashboard: Real-time statistics and management interface
- Article Management: Fetch articles directly from The Guardian API
- Vector Generation: Generate embeddings for semantic search with real-time progress
- Web-Based Management: Everything managed through a clean, modern web interface
- Express.js server with TypeScript
- MongoDB Atlas for document storage
- Vector Search Index for semantic search
- Atlas Search Index for full-text search
- Local embeddings using Xenova/transformers (free, no API keys needed)
- React with TypeScript
- Responsive design with modern UI components
- Real-time progress tracking for long-running operations
- Server-sent events for live updates
- MongoDB Atlas Account (free tier works)
- Guardian API Key (free from The Guardian Open Platform)
- Node.js 16+ and npm
Create a .env file in the root directory:
# Required
MONGODB_URI=mongodb+srv://<username>:<password>@<cluster-host>/vector-search
# Optional (for article fetching)
GUARDIAN_API_KEY=your-guardian-api-key-here
# Install backend dependencies
npm install
# Install frontend dependencies
npm run install-frontend
- Create a new cluster in MongoDB Atlas
- Create a database called vector-search
- Create a collection called articles
- Create two search indexes:
- Index Name: vector_index
- Configuration:
{
"fields": [
{
"numDimensions": 384,
"path": "contentVector",
"similarity": "cosine",
"type": "vector"
},
{
"path": "languageCode",
"type": "filter"
},
{
"path": "wordCount",
"type": "filter"
}
]
}
- Index Name: search-index
- Configuration:
{
"mappings": {
"dynamic": false,
"fields": {
"title": {
"type": "text",
"analyzer": "lucene.standard"
},
"content": {
"type": "text",
"analyzer": "lucene.standard"
},
"languageCode": {
"type": "token"
},
"wordCount": {
"type": "number"
}
}
}
}
npm run dev
This starts both the server (port 3001) and the React frontend (port 3000).
# Build the application
npm run build
npm run build-frontend
# Start the server
npm start
- Enter a search term (e.g., "artificial intelligence", "renewable energy")
- Select search methods to compare (MongoDB, Atlas, Vector, or all)
- Filter by language (optional)
- Set minimum word count (optional)
- Click Search to see results from all selected methods
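The search options above map naturally onto a single request to GET /api/search. The following is a minimal sketch of how a client might build that request; the query parameter names (`q`, `methods`, `language`, `minWords`) are assumptions made for illustration and may not match the server's actual parameters.

```typescript
// Hypothetical parameter names: q, methods, language and minWords are
// assumptions for illustration, not confirmed by this README.
type SearchMethod = "mongodb" | "atlas" | "vector";

interface SearchOptions {
  query: string;
  methods: SearchMethod[];
  language?: string; // e.g. "en", "es", "fr"
  minWords?: number; // minimum word count filter
}

function buildSearchUrl(baseUrl: string, options: SearchOptions): string {
  const params: string[] = [
    `q=${encodeURIComponent(options.query)}`,
    `methods=${encodeURIComponent(options.methods.join(","))}`,
  ];
  // Optional filters are only sent when the user actually set them.
  if (options.language) {
    params.push(`language=${encodeURIComponent(options.language)}`);
  }
  if (options.minWords !== undefined && options.minWords > 0) {
    params.push(`minWords=${options.minWords}`);
  }
  return `${baseUrl}/api/search?${params.join("&")}`;
}
```

Leaving the optional filters out of the URL when they are unset keeps the server-side defaults in one place.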
- View total articles and embedding progress
- See language distribution breakdown
- Real-time statistics with refresh button
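One way such dashboard numbers could be produced is a single aggregation over the articles collection. This is a sketch only: the field names `contentVector` and `languageCode` are taken from the index definitions in this README, while the `$facet` layout and the helper names are assumptions rather than the project's actual code.

```typescript
// Illustrative pipeline: counts all articles, articles that already have an
// embedding, and articles per language in one round trip.
function buildStatsPipeline(): Record<string, unknown>[] {
  return [
    {
      $facet: {
        total: [{ $count: "count" }],
        embedded: [
          { $match: { contentVector: { $exists: true } } },
          { $count: "count" },
        ],
        byLanguage: [
          { $group: { _id: "$languageCode", count: { $sum: 1 } } },
          { $sort: { count: -1 } },
        ],
      },
    },
  ];
}

// Embedding progress as a whole-number percentage; 0 when there are no articles.
function embeddingProgress(embedded: number, total: number): number {
  if (total <= 0) return 0;
  return Math.round((embedded / total) * 100);
}
```

The pipeline would be passed to the collection's `aggregate` method, and the two counts fed into `embeddingProgress` for display.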
- Set your Guardian API key in the .env file (see setup instructions)
- Set the number of articles to fetch (1-1000)
- Optionally specify a topic for focused content
- Click "Fetch Articles" to pull fresh content from The Guardian
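Fetching up to 1000 articles means paging through the Guardian Content API. The sketch below builds the page URLs for such a fetch; it assumes a page size of 50 (check the limits that apply to your key) and requests the `bodyText` and `wordcount` fields, which is a guess at what this project stores. The function name is hypothetical.

```typescript
// Assumptions: pageSize = 50 is an illustrative default, and bodyText/wordcount
// are the fields we guess the app needs; neither is confirmed by this README.
const GUARDIAN_SEARCH_URL = "https://content.guardianapis.com/search";

function buildGuardianPageUrls(
  apiKey: string,
  articleCount: number,
  topic?: string,
  pageSize: number = 50
): string[] {
  if (articleCount < 1 || articleCount > 1000) {
    throw new Error("articleCount must be between 1 and 1000");
  }
  const pages = Math.ceil(articleCount / pageSize);
  const urls: string[] = [];
  for (let page = 1; page <= pages; page++) {
    const params = [
      `api-key=${encodeURIComponent(apiKey)}`,
      `page=${page}`,
      `page-size=${pageSize}`,
      "show-fields=bodyText,wordcount",
    ];
    // The topic is optional; without it the API returns recent content.
    if (topic) params.push(`q=${encodeURIComponent(topic)}`);
    urls.push(`${GUARDIAN_SEARCH_URL}?${params.join("&")}`);
  }
  return urls;
}
```

The last page may return more items than requested in total, so a caller would trim the combined results to `articleCount`.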
- Click "Generate Embeddings" for articles without vectors
- Watch real-time progress with percentage and status updates
- Automatic refresh of statistics when complete
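Progress updates like these are a good fit for server-sent events. As a rough sketch, each update can be serialized into the SSE wire format before being written to the response; the event name `progress` and the payload shape are assumptions, not the project's documented protocol.

```typescript
// Hypothetical payload shape for one progress update.
interface ProgressUpdate {
  processed: number; // articles embedded so far
  total: number;     // articles that need embeddings
  status: string;    // human-readable status text
}

// Serializes one event in the text/event-stream format:
// an "event:" line, a "data:" line, then a blank line.
function formatSseEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

function progressEvent(update: ProgressUpdate): string {
  // With nothing to process, report the job as complete.
  const percent =
    update.total > 0 ? Math.round((update.processed / update.total) * 100) : 100;
  return formatSseEvent("progress", { ...update, percent });
}
```

On the server, the response would need the `Content-Type: text/event-stream` header, and each string returned by `progressEvent` would be written to the open response as embeddings complete.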
- Best for: Exact phrase matching
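- Search method: Case-insensitive regex on the whole search term
- Case insensitive: Yes
- Speed: Very fast on small collections; an unanchored, case-insensitive regex cannot use an index efficiently, so it slows down as the collection grows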
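A regex search of this kind might be expressed as a plain MongoDB filter. The sketch below assumes the regex is applied to the `title` and `content` fields and that the optional language and word-count filters map to `languageCode` and `wordCount`; the helper names are hypothetical.

```typescript
// Escape regex metacharacters so the user's term is matched literally.
function escapeRegex(term: string): string {
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Builds a find() filter: the whole term, case-insensitive, in title or content.
function buildRegexFilter(
  term: string,
  language?: string,
  minWords?: number
): Record<string, unknown> {
  const pattern = escapeRegex(term.trim());
  const filter: Record<string, unknown> = {
    $or: [
      { title: { $regex: pattern, $options: "i" } },
      { content: { $regex: pattern, $options: "i" } },
    ],
  };
  if (language) filter["languageCode"] = language;
  if (minWords !== undefined && minWords > 0) {
    filter["wordCount"] = { $gte: minWords };
  }
  return filter;
}
```

Escaping matters here because a raw search term such as "c++" would otherwise be an invalid pattern.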
- Best for: Traditional full-text search
- Features: Fuzzy matching, flexible word order
- Highlighting: Shows matched terms
- Speed: Fast
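For comparison, a full-text query against the search-index definition from the setup section could look roughly like this. It is a sketch under assumptions: the `maxEdits` value, the result limit, and the projected fields are illustrative choices, and the function name is hypothetical.

```typescript
// Builds an aggregation pipeline using the $search stage with fuzzy text
// matching, optional filters, and highlighting of matched terms.
function buildAtlasSearchPipeline(
  query: string,
  language?: string,
  minWords?: number,
  limit: number = 10
): Record<string, unknown>[] {
  const filters: Record<string, unknown>[] = [];
  if (language) {
    // languageCode is indexed as "token", which supports exact matching.
    filters.push({ equals: { path: "languageCode", value: language } });
  }
  if (minWords !== undefined && minWords > 0) {
    filters.push({ range: { path: "wordCount", gte: minWords } });
  }
  const compound: Record<string, unknown> = {
    must: [{ text: { query, path: ["title", "content"], fuzzy: { maxEdits: 1 } } }],
  };
  // Only attach the filter clause when at least one filter is set.
  if (filters.length > 0) compound["filter"] = filters;
  return [
    {
      $search: {
        index: "search-index",
        compound,
        highlight: { path: ["title", "content"] },
      },
    },
    { $limit: limit },
    {
      $project: {
        title: 1,
        languageCode: 1,
        wordCount: 1,
        score: { $meta: "searchScore" },
        highlights: { $meta: "searchHighlights" },
      },
    },
  ];
}
```

The filter clauses narrow the result set without affecting the relevance score, which is why the language and word-count conditions sit there rather than in `must`.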
- Best for: Understanding context and meaning
- Features: Semantic similarity, concept matching
- Cross-language: Works across languages
- Speed: Moderate (requires embedding generation)
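A semantic query against the vector_index definition from the setup section might be assembled as follows. This is a sketch, not the project's code: the `numCandidates` heuristic (ten times the limit), the projected fields, and the function name are assumptions, and the query vector is expected to come from the same 384-dimension embedding model used for the articles.

```typescript
// Must match "numDimensions" in the vector_index definition.
const EMBEDDING_DIMENSIONS = 384;

function buildVectorSearchPipeline(
  queryVector: number[],
  language?: string,
  minWords?: number,
  limit: number = 10
): Record<string, unknown>[] {
  if (queryVector.length !== EMBEDDING_DIMENSIONS) {
    throw new Error(`queryVector must have ${EMBEDDING_DIMENSIONS} dimensions`);
  }
  // Pre-filters may only use fields indexed with type "filter".
  const filter: Record<string, unknown> = {};
  if (language) filter["languageCode"] = { $eq: language };
  if (minWords !== undefined && minWords > 0) {
    filter["wordCount"] = { $gte: minWords };
  }
  const stage: Record<string, unknown> = {
    index: "vector_index",
    path: "contentVector",
    queryVector,
    numCandidates: limit * 10, // illustrative heuristic, tune for recall vs. speed
    limit,
  };
  if (Object.keys(filter).length > 0) stage["filter"] = filter;
  return [
    { $vectorSearch: stage },
    {
      $project: {
        title: 1,
        languageCode: 1,
        wordCount: 1,
        score: { $meta: "vectorSearchScore" },
      },
    },
  ];
}
```

Omitting the language filter is what would allow a query such as "salud" to surface related English or French articles, provided the embedding model is multilingual.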
- GET /api/health - Health check
- GET /api/stats - Collection statistics
- GET /api/languages - Available languages
- GET /api/search - Search with multiple methods
- POST /api/fetch-articles - Fetch articles from Guardian
- POST /api/generate-embeddings - Generate embeddings (SSE)
- Vercel/Netlify (Frontend + Serverless Backend)
- Railway/Render (Full-stack deployment)
- Docker (Container deployment)
- Set NODE_ENV=production
- Configure CORS for your domain
- Set up proper MongoDB connection string
- Consider setting up SSL/TLS
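The production checklist above could be enforced at startup with a small configuration loader. The sketch below is illustrative: `MONGODB_URI`, `GUARDIAN_API_KEY`, and `NODE_ENV` come from this README, while `CORS_ORIGIN` and the `loadConfig` function are hypothetical names introduced for the example.

```typescript
interface AppConfig {
  mongodbUri: string;
  guardianApiKey: string | undefined; // optional, only needed for article fetching
  isProduction: boolean;
  corsOrigin: string;
}

// CORS_ORIGIN is a hypothetical variable name, not defined by this README.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const mongodbUri = env["MONGODB_URI"];
  if (!mongodbUri) {
    throw new Error("MONGODB_URI is required");
  }
  if (
    mongodbUri.indexOf("mongodb://") !== 0 &&
    mongodbUri.indexOf("mongodb+srv://") !== 0
  ) {
    throw new Error("MONGODB_URI must start with mongodb:// or mongodb+srv://");
  }
  const isProduction = env["NODE_ENV"] === "production";
  // In development, default to the React dev server from this README.
  const corsOrigin =
    env["CORS_ORIGIN"] || (isProduction ? "" : "http://localhost:3000");
  if (isProduction && !corsOrigin) {
    throw new Error("CORS_ORIGIN must be set in production");
  }
  return {
    mongodbUri,
    guardianApiKey: env["GUARDIAN_API_KEY"] || undefined,
    isProduction,
    corsOrigin,
  };
}
```

In a Node.js server this would typically be called once as `loadConfig(process.env)`, so a misconfigured deployment fails immediately rather than on the first request.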
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
MIT License - see LICENSE file for details
- Issues: Create a GitHub issue
- Questions: Check the documentation or create a discussion
- Guardian API: Official documentation
- "artificial intelligence" - Compare how each method handles tech terms
- "renewable energy" - See semantic understanding across languages
- "salud" (Spanish for health) - Test cross-language capabilities
- "climate" - Watch vector search find related concepts like "environment"
Ready to explore the future of search?
Get started with npm run dev and open http://localhost:3000!