HoarderMCP is a Model Context Protocol (MCP) server designed for web content crawling, processing, and vector storage. It provides tools for ingesting web content, extracting relevant information, and making it searchable through vector similarity search.
- Web Crawling: Crawl websites and sitemaps to extract content
- Advanced Content Processing:
  - Semantic chunking for Markdown with header-based splitting
  - Code-aware chunking for Python and C# with syntax preservation
  - Configurable chunk sizes and overlap for optimal context
  - Token-based size optimization
- Vector Storage: Store and search content using vector embeddings
- API-First: RESTful API for easy integration with other services
- Asynchronous: Built with async/await for high performance
- Extensible: Support for multiple vector stores (Milvus, FAISS, Chroma, etc.)
- Observability: Integrated with Langfuse for tracing and monitoring
To set up a local development environment:

- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/hoardermcp.git
  cd hoardermcp
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install the package in development mode:

  ```bash
  pip install -e .
  ```

- Install development dependencies:

  ```bash
  pip install -r requirements-dev.txt
  ```

- Start Milvus using Docker:

  ```bash
  docker-compose up -d
  ```

- Run the development server:

  ```bash
  python -m hoardermcp.main --reload
  ```
The API will be available at http://localhost:8000.
Once the server is running, you can access:
- OpenAPI docs: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
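As a quick smoke test, you can fetch the docs page with httpx (the same client used in the examples below) and check for a 200 response:

```python
import httpx

# The interactive docs page should return HTTP 200 once the server is up
response = httpx.get("http://localhost:8000/docs")
print(response.status_code)
```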
Configuration can be provided through environment variables or a .env file. See .env.example for available options.
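For illustration only, a .env file might look like the sketch below; every variable name here is an assumption rather than a documented option, so defer to .env.example:

```bash
# Hypothetical .env sketch -- all names are assumptions; see .env.example
HOARDERMCP_HOST=0.0.0.0
HOARDERMCP_PORT=8000
MILVUS_URI=http://localhost:19530
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
```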
To ingest a webpage through the API:

```python
import httpx

# Ingest a webpage
response = httpx.post(
    "http://localhost:8000/ingest",
    json={
        "sources": [
            {
                "url": "https://example.com",
                "content_type": "text/html"
            }
        ]
    }
)
print(response.json())
```
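The feature list also mentions sitemap crawling. Assuming the same /ingest endpoint accepts a sitemap URL as a source, a request might look like the following sketch (the content_type value is a guess, not a documented constant):

```python
# Sketch: ingest a sitemap through the same endpoint.
# Assumption: sitemap URLs are accepted as sources; "application/xml"
# is a guessed content_type, not a documented constant.
response = httpx.post(
    "http://localhost:8000/ingest",
    json={
        "sources": [
            {"url": "https://example.com/sitemap.xml", "content_type": "application/xml"}
        ]
    }
)
print(response.json())
```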
You can also use the chunking pipeline directly from Python:

```python
from hoardermcp.core.chunking import ChunkingFactory, ChunkingConfig, ChunkingStrategy
from hoardermcp.models.document import Document, DocumentMetadata, DocumentType

# Create a markdown document
markdown_content = """# Title\n\n## Section 1\nContent for section 1\n\n## Section 2\nContent for section 2"""
doc = Document(
    id="test",
    content=markdown_content,
    metadata=DocumentMetadata(
        source="example.md",
        content_type=DocumentType.MARKDOWN
    )
)

# Configure chunking
config = ChunkingConfig(
    strategy=ChunkingStrategy.SEMANTIC,
    chunk_size=1000,
    chunk_overlap=200
)

# Get the appropriate chunker and process the document
chunker = ChunkingFactory.get_chunker(
    doc_type=DocumentType.MARKDOWN,
    config=config
)
chunks = chunker.chunk_document(doc)

print(f"Document split into {len(chunks)} chunks")
for i, chunk in enumerate(chunks):
    print(f"Chunk {i + 1} (length: {len(chunk.content)}): {chunk.content[:50]}...")
```
To search ingested content:

```python
import httpx

# Search for similar content
response = httpx.post(
    "http://localhost:8000/search",
    json={
        "query": "What is HoarderMCP?",
        "k": 5  # number of results to return
    }
)
print(response.json())
```

This project uses Black for code formatting, isort for import sorting, and mypy for static type checking.
Run the following commands before committing:
```bash
black .
isort .
mypy .
```

Run tests using pytest:

```bash
pytest
```
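During development it is often handy to run only a subset of the suite; pytest's standard -k filter works as usual (the substring below is just an example):

```bash
# Select tests whose names match a substring (example filter)
pytest -k chunking -v
```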