Lightning-fast regex meets AI-powered semantic search
Find exact patterns and contextually relevant matches with intelligent history tracking and REST API integration.
Railway's free tier means the hosted demo may be unavailable by the time you open that link. In that case, refer to the GitHub README for usage instructions.
DeepGrep follows a modular architecture with clear separation between the web layer, core search engines, and data persistence:
```mermaid
graph TB
    subgraph "Client Layer"
        User[👤 User]
        Browser[🌐 Web Browser]
    end

    subgraph "Web Layer"
        UI[Web UI<br/>HTML/CSS/JS + Tailwind]
        Flask[Flask Application<br/>Rate Limiting + CORS]
    end

    subgraph "Core Search Engines"
        RegexEngine[Custom Regex Engine]
        SemanticEngine[Semantic Search Engine<br/>SpaCy + NLTK]

        subgraph "Regex Components"
            Parser[Pattern Parser]
            Matcher[State-based Matcher<br/>LRU Cache]
        end
    end

    subgraph "Data Layer"
        HistoryDB[(SQLite Database<br/>Search History)]
        SpacyModel[SpaCy Model<br/>en_core_web_md]
        WordNet[NLTK WordNet<br/>Antonym Filtering]
    end

    User --> Browser
    Browser --> UI
    UI --> Flask
    Flask --> RegexEngine
    Flask --> SemanticEngine
    Flask --> HistoryDB
    RegexEngine --> Parser
    Parser --> Matcher
    SemanticEngine --> SpacyModel
    SemanticEngine --> WordNet

    style User fill:#e1f5ff
    style Flask fill:#ffd6e0
    style RegexEngine fill:#fff4cc
    style SemanticEngine fill:#d4f1d4
    style HistoryDB fill:#e8d5f2
```
- Web UI: Interactive interface with dual search modes (regex/semantic), built with Tailwind CSS
- Flask Application: REST API with rate limiting, CORS support, and comprehensive logging
- Custom Regex Engine: From-scratch implementation supporting complex patterns, quantifiers, and capture groups
- Semantic Search Engine: AI-powered similarity matching using word embeddings and POS filtering
- Search History: Persistent SQLite database tracking all searches with timestamps and analytics
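The matcher's LRU cache for compiled patterns, shown in the diagram above, can be sketched in a few lines. This is a minimal illustration using Python's stdlib `re` and `functools.lru_cache`; DeepGrep's engine is a from-scratch implementation, so `compile_pattern` and `search_lines` here are hypothetical names, not the project's API.

```python
import re
from functools import lru_cache

@lru_cache(maxsize=256)
def compile_pattern(pattern: str):
    """Cache compiled patterns so repeated searches skip re-parsing."""
    return re.compile(pattern)

def search_lines(pattern: str, text: str):
    """Line-by-line matching, mirroring the engine's processing model."""
    compiled = compile_pattern(pattern)
    return [(i, line) for i, line in enumerate(text.splitlines(), 1)
            if compiled.search(line)]
```

Caching pays off because the same patterns tend to recur across searches, while line-by-line processing keeps memory flat on large inputs.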
DeepGrep combines a high-performance custom regex engine with AI-powered semantic search, backed by persistent history tracking. Key features include:
- Full Regex Support: Implements a complete regex matcher from scratch, supporting literals, character classes (`\d`, `\w`, `[abc]`), quantifiers (`*`, `+`, `?`, `{n,m}`), alternation (`|`), anchors (`^`, `$`), capture groups, and backreferences.
- Efficient Matching: Uses state-based matching with caching of compiled patterns to ensure fast performance on large texts.
- Line-by-Line Processing: Optimized for searching through multi-line text inputs.
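The pattern classes listed above behave like their standard-regex counterparts. The sketch below demonstrates them with Python's stdlib `re` purely for illustration; DeepGrep's engine is a separate from-scratch implementation of the same syntax.

```python
import re

# Character classes and quantifiers: \d+ matches one or more digits.
digits = re.findall(r"\d+", "room 101, floor 7")

# Alternation and anchors: keep only lines starting with GET or POST.
methods = [l for l in ["GET /", "POST /search", "PUT /x"]
           if re.match(r"^(GET|POST)\b", l)]

# Capture group with a backreference: \1 must repeat group 1 exactly.
repeated = re.search(r"\b(\w+) \1\b", "it was very very quiet")
```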
- AI-Powered Similarity: Leverages SpaCy NLP models to find semantically related words based on vector similarity.
- Antonym Avoidance: Integrates WordNet to exclude antonyms and irrelevant matches.
- POS Filtering: Filters results by part-of-speech (e.g., adjectives, verbs) for more accurate contextual matches.
- Configurable Thresholds: Adjustable similarity thresholds and top-N results for fine-tuned searches.
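The similarity pipeline — vector similarity, a threshold, then top-N selection — can be sketched on toy vectors. The real engine uses spaCy's `en_core_web_md` embeddings; the three-dimensional vectors and the `semantic_matches` helper below are hypothetical, chosen only to make the thresholding logic self-contained.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy embeddings (illustrative only; real vectors come from spaCy).
vectors = {
    "happy":  [0.90, 0.10, 0.00],
    "joyful": [0.85, 0.20, 0.05],
    "table":  [0.00, 0.10, 0.95],
}

def semantic_matches(keyword, candidates, threshold=0.45, top_n=10):
    """Score candidates against the keyword, drop weak matches, keep top N."""
    scored = [(w, cosine(vectors[keyword], vectors[w]))
              for w in candidates if w != keyword]
    scored = [(w, s) for w, s in scored if s >= threshold]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:top_n]
```

With these toy vectors, "joyful" clears the 0.45 threshold against "happy" while "table" does not, which is exactly the filtering behavior the configurable threshold controls.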
- Persistent Logging: SQLite-backed database to log all searches with timestamps, match counts, and file sources.
- History Queries: Retrieve recent searches, top-used patterns, or export/import history to/from JSON.
- Automatic Cleanup: Maintains a maximum history size to prevent database bloat.
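The history store's logging-with-cleanup behavior can be sketched with stdlib `sqlite3`. The schema and function names below are hypothetical, not DeepGrep's actual table layout; the point is the capped-size pattern described above.

```python
import sqlite3
import time

def init_history(db_path=":memory:"):
    """Open (or create) the history database."""
    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS history (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        pattern TEXT, match_count INTEGER, ts REAL)""")
    return conn

def log_search(conn, pattern, match_count, max_history=200):
    """Record a search, then trim the table to the newest max_history rows."""
    conn.execute(
        "INSERT INTO history (pattern, match_count, ts) VALUES (?, ?, ?)",
        (pattern, match_count, time.time()))
    # Automatic cleanup: delete everything older than the newest rows.
    conn.execute("""DELETE FROM history WHERE id NOT IN
        (SELECT id FROM history ORDER BY id DESC LIMIT ?)""", (max_history,))
    conn.commit()
```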
- Flask Web App: Simple HTML/CSS/JS frontend for interactive searches.
- REST API: Endpoints for regex and semantic searches, with JSON responses.
- Rate Limiting: Configurable request limits to prevent abuse.
- CORS Support: Cross-origin requests enabled for integration.
- Logging: Comprehensive logging for debugging and monitoring.
- Docker Support: Dockerfile for easy containerization and deployment.
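Rate limiting of the kind listed above is commonly implemented as a sliding window per client. The class below is a stdlib-only sketch of that idea; the Flask app's actual mechanism may differ, so treat this as illustrative rather than DeepGrep's implementation.

```python
import time
from collections import deque

class RateLimiter:
    """Sliding window: allow at most `limit` requests per `window` seconds."""

    def __init__(self, limit=100, window=60.0):
        self.limit = limit
        self.window = window
        self.hits = {}  # client id -> deque of request timestamps

    def allow(self, client, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits.setdefault(client, deque())
        # Evict timestamps that have aged out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # over the limit: reject (e.g. HTTP 429)
        q.append(now)
        return True
```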
- Environment Configuration: Uses `python-decouple` for secure, environment-based config (e.g., via `.env` files).
- Production Ready: Includes lazy initialization, error handling, and scalable architecture.
- Unit Tests: Test suite in the `tests/` directory for core functionality.
- Code Quality: Integrated with Qodana for static analysis.
- API Testing: Postman collections for endpoint validation.
DeepGrep is optimized for low resource environments while maintaining high throughput:
| Metric | Result | Context |
|---|---|---|
| Regex Throughput | ~85 lines/sec | Complex patterns on random text |
| Memory Footprint | < 1MB | Peak memory during heavy regex matching |
| History DB Write | ~675 ops/sec | SQLite write performance |
Benchmarks run on Python 3.14 on macOS.
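Throughput numbers like those above can be reproduced with a small timing harness. The sketch below uses stdlib `re` as a stand-in for the custom engine; the function name, line count, and pattern are illustrative, not the project's actual benchmark code.

```python
import random
import re
import string
import time

def bench_regex(pattern, n_lines=1000, line_len=80):
    """Time line-by-line matching over random text; return (lines/sec, hits)."""
    rng = random.Random(0)  # fixed seed for repeatable input
    lines = ["".join(rng.choices(string.ascii_letters + string.digits,
                                 k=line_len))
             for _ in range(n_lines)]
    compiled = re.compile(pattern)
    start = time.perf_counter()
    hits = sum(1 for line in lines if compiled.search(line))
    elapsed = time.perf_counter() - start
    return n_lines / elapsed, hits
```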
Clone the repository and install dependencies:

```sh
git clone https://github.com/alwaysvivek/deepgrep.git
cd deepgrep
pip install -r requirements.txt
python -m spacy download en_core_web_md
```
Copy `.env.example` to `.env` and configure as needed.
Run locally:

```sh
python -m deepgrep.web.app
```

Or with Docker:

```sh
docker build -t deepgrep .
docker run -p 8000:8000 deepgrep
```
Open http://localhost:8000 in your browser.
Enter text and patterns for regex search or keywords for semantic search.
Use tools like curl or Postman to interact with the API.
```sh
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{"pattern": "hello.*world", "text": "hello beautiful world"}'
```

```sh
curl -X POST http://localhost:8000/semantic \
  -H "Content-Type: application/json" \
  -d '{"keyword": "happy", "text": "I am joyful and content."}'
```
`GET /` — serves the home page.

`POST /search` — performs a regex search.

Request body:

```json
{
  "pattern": "string",
  "text": "string"
}
```

Response:

```json
{
  "matches": [],
  "history": []
}
```
`POST /semantic` — performs a semantic search.

Request body:

```json
{
  "keyword": "string",
  "text": "string"
}
```

Response:

```json
{
  "matches": [["word", score]]
}
```
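A client consuming the semantic response shape might re-filter the `[word, score]` pairs by score. A minimal sketch, using a hand-written payload in the documented shape (the words and scores are illustrative, not real output):

```python
import json

# Example payload matching the documented /semantic response shape.
raw = '{"matches": [["joyful", 0.82], ["content", 0.61]]}'

def top_matches(response_text, threshold=0.45):
    """Parse a /semantic response and keep pairs at or above the threshold."""
    data = json.loads(response_text)
    return [(word, score) for word, score in data["matches"]
            if score >= threshold]
```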
Configure via environment variables (or a `.env` file):

```ini
PORT=8000
DEBUG=True
HOST=0.0.0.0
RATE_LIMIT_ENABLED=True
RATE_LIMIT_REQUESTS=100
DB_PATH=~/.grepify_history.db
MAX_HISTORY=200
SPACY_MODEL=en_core_web_md
SEMANTIC_THRESHOLD=0.45
SEMANTIC_TOP_N=10
```
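The app reads these values via `python-decouple`; an equivalent stdlib sketch using `os.environ` with the documented defaults (`get_config` is a hypothetical helper, not part of the project's API):

```python
import os

def get_config():
    """Read a subset of DeepGrep's settings, falling back to the defaults."""
    return {
        "port": int(os.environ.get("PORT", "8000")),
        "debug": os.environ.get("DEBUG", "True").lower() == "true",
        "semantic_threshold": float(
            os.environ.get("SEMANTIC_THRESHOLD", "0.45")),
        "semantic_top_n": int(os.environ.get("SEMANTIC_TOP_N", "10")),
    }
```

Reading every value through one function keeps type conversion in one place, which is the same benefit `python-decouple`'s `config()` call provides.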
Contributions are welcome!
- Fork the repo
- Create a feature branch
- Add tests for new features
- Ensure code passes Qodana checks
- Submit a pull request
This project is licensed under the MIT License.
See the LICENSE file for details.