EspressoMind - Because research should be as stimulating as your morning coffee. ☕
"The scholar's AI research companion"
AI-Powered Research Assistant with Automated Source Verification
- Hybrid search across web + academic databases (arXiv, PubMed)
- Automatic source quality assessment with confidence scoring
- Dynamic research strategy selection based on query complexity
- APA-style citation generation
- Automatic reference section formatting
- Source metadata extraction (authors, dates, publishers)
- PDF text extraction with PyMuPDF
- Image OCR using Tesseract
- Web page scraping with Playwright
- Async pipeline for concurrent processing
- Intelligent caching of frequent queries
- Progressive result refinement
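
The async pipeline and query caching listed above could be combined roughly as follows. This is a minimal sketch, not EspressoMind's actual code: `search_source`, `research`, and the in-memory `_cache` are hypothetical names, and the lookup is simulated with a short sleep instead of a real web, arXiv, or PubMed request.

```python
import asyncio

# Hypothetical in-memory cache keyed by the normalized query text.
_cache: dict[str, list[str]] = {}


async def search_source(source: str, query: str) -> str:
    """Stand-in for a real lookup (assumed; the project's tools are not shown here)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"{source}: results for {query!r}"


async def research(
    query: str, sources: tuple[str, ...] = ("web", "arxiv", "pubmed")
) -> list[str]:
    """Query all sources concurrently and reuse cached results for repeated queries."""
    key = query.strip().lower()
    if key in _cache:
        return _cache[key]
    # gather() runs the lookups concurrently and keeps results in argument order.
    results = list(await asyncio.gather(*(search_source(s, query) for s in sources)))
    _cache[key] = results
    return results
```

Calling `asyncio.run(research("quantum entanglement"))` twice would hit the simulated sources only on the first call. A production cache would likely also need expiry and a size limit.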
| Component | Purpose |
|---|---|
| FastAPI | REST API backend |
| LangGraph | Agent workflow orchestration |
| Ollama | Local LLM (Mistral/Llama2) |
| Playwright | Browser automation |
| Pydantic v2 | Data validation |
- `arxiv` - Academic paper search
- `pymupdf` - PDF processing
- `pytesseract` - Image OCR
- `requests` - HTTP client
- `python-multipart` - File uploads
- Python 3.10+
- Playwright browsers (`playwright install`)
- Tesseract OCR (`brew install tesseract` on macOS)
# Clone repository
git clone https://github.com/dubeyakshat07/EspressoMind.git
cd EspressoMind
# Create and activate a virtual environment
conda create -n espresso python=3.10
conda activate espresso
# Install dependencies
pip install -r requirements.txt
# Start the API server
uvicorn backend.main:app --reload --port 8000
Basic Text Query:
import requests

response = requests.post(
    "http://localhost:8000/analyze",
    json={
        "query": "Explain quantum entanglement",
        "depth": "balanced",  # quick/balanced/deep
    },
)
With PDF Upload:
import requests

with open('research.pdf', 'rb') as f:
    response = requests.post(
        "http://localhost:8000/analyze",
        files={
            'file': ('research.pdf', f),
            # A (None, value) tuple sends a plain form field, not a file
            'file_type': (None, 'pdf'),
        },
        data={
            'query': 'Summarize this paper',
        },
    )
Example response:
{
  "answer": "Quantum entanglement is... [1][2]",
  "sources": [
    {
      "title": "Experimental observation of quantum entanglement",
      "url": "https://arxiv.org/abs/1234.5678",
      "source_type": "arxiv",
      "authors": ["Einstein, A.", "Podolsky, B."],
      "publish_date": "1935-05-15",
      "confidence": 0.92
    }
  ],
  "related_queries": [
    "Applications of quantum entanglement in computing",
    "Recent breakthroughs in entanglement research"
  ],
  "confidence_score": 0.87
}
EspressoMind/
├── backend/
│   ├── agents/
│   │   ├── research_agent.py   # Main research logic
│   │   └── tools.py            # Search/scraping tools
│   ├── schemas/
│   │   └── models.py           # Pydantic models
│   └── main.py                 # FastAPI app
├── frontend/                   # Streamlit UI
│   └── app.py
├── tests/                      # Test suite
├── docs/                       # Documentation
├── requirements.txt            # Dependencies
└── README.md                   # This file
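
On the client side, the response from `/analyze` can be parsed into typed objects before use. The sketch below assumes the field names from the example response and uses standard-library dataclasses to stay dependency-free. It is not a copy of the Pydantic models in `backend/schemas/models.py`, which may differ, and `AnalyzeResponse`, `Source`, and `trusted_sources` are hypothetical names.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Source:
    # Field names mirror the example response; extra keys would raise TypeError.
    title: str
    url: str
    source_type: str
    authors: list[str]
    publish_date: str
    confidence: float


@dataclass
class AnalyzeResponse:
    answer: str
    confidence_score: float
    sources: list[Source] = field(default_factory=list)
    related_queries: list[str] = field(default_factory=list)

    @classmethod
    def from_json(cls, payload: str) -> "AnalyzeResponse":
        """Build a typed response from the raw JSON body."""
        data = json.loads(payload)
        return cls(
            answer=data["answer"],
            confidence_score=data["confidence_score"],
            sources=[Source(**s) for s in data.get("sources", [])],
            related_queries=data.get("related_queries", []),
        )

    def trusted_sources(self, threshold: float = 0.8) -> list[Source]:
        """Return only sources whose confidence meets the (assumed) threshold."""
        return [s for s in self.sources if s.confidence >= threshold]
```

With the `requests` examples shown earlier, this would be used as `AnalyzeResponse.from_json(response.text)`. The 0.8 default threshold is an arbitrary illustration, not a value taken from the project.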
Edit the `.env` file:
# Search Configuration
SEARXNG_URL=https://search.example.com
MAX_WEB_RESULTS=5
MAX_ACADEMIC_RESULTS=3
# AI Configuration
OLLAMA_MODEL=mistral
LLM_TEMPERATURE=0.3
# Performance
SCRAPE_TIMEOUT=20
PDF_PROCESS_TIMEOUT=30
Run the test suite:
pytest tests/ -v --cov=backend --cov-report=html
Key test coverage:
- API endpoints
- Search tools
- Document processing
- Citation generation
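
As an illustration of the citation-generation area above, a pytest-style test might look like the following sketch. `format_apa` is a hypothetical, simplified APA-style formatter defined inline so the example runs on its own; it is not the project's function, and it assumes author names already arrive as "Last, F." strings like those in the example response.

```python
def format_apa(authors: list[str], publish_date: str, title: str, url: str) -> str:
    """Hypothetical simplified APA-style reference (assumed helper, for illustration)."""
    # Use the year from an ISO date, or "n.d." when no date is known.
    year = publish_date[:4] if publish_date else "n.d."
    if not authors:
        # With no author, the title moves to the front.
        return f"{title}. ({year}). {url}"
    if len(authors) == 1:
        byline = authors[0]
    else:
        # Join with commas and put an ampersand before the final author.
        byline = ", ".join(authors[:-1]) + ", & " + authors[-1]
    return f"{byline} ({year}). {title}. {url}"


def test_two_authors():
    citation = format_apa(
        ["Einstein, A.", "Podolsky, B."],
        "1935-05-15",
        "Experimental observation of quantum entanglement",
        "https://arxiv.org/abs/1234.5678",
    )
    assert citation == (
        "Einstein, A., & Podolsky, B. (1935). "
        "Experimental observation of quantum entanglement. "
        "https://arxiv.org/abs/1234.5678"
    )


def test_missing_authors_and_date():
    citation = format_apa([], "", "Untitled page", "https://example.com")
    assert citation == "Untitled page. (n.d.). https://example.com"
```

Saved under `tests/`, a file like this would be collected by the `pytest` command shown earlier.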
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
Project Maintainer - Akshat Dubey