satish860/vectorless

Vectorless RAG

A Python project for building Retrieval-Augmented Generation (RAG) applications without vector embeddings. It focuses on legal document analysis and uses CUAD (the Contract Understanding Atticus Dataset).
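To make "without vector embeddings" concrete, here is a minimal sketch of one embedding-free retrieval strategy: split a contract into paragraph segments and rank them by plain term overlap with the question. This is an illustration only, not necessarily how this repository retrieves segments. The function names (`split_into_segments`, `score`, `retrieve`) are hypothetical and do not refer to modules in `src/`.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_into_segments(document: str) -> list[str]:
    """Assumption: segments are paragraphs separated by blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]


def score(question: str, segment: str) -> int:
    """Count how often the question's distinct terms occur in the segment."""
    question_terms = set(tokenize(question))
    segment_counts = Counter(tokenize(segment))
    return sum(segment_counts[term] for term in question_terms)


def retrieve(question: str, document: str, top_k: int = 2) -> list[str]:
    """Return the top_k segments ranked by lexical overlap (no embeddings)."""
    segments = split_into_segments(document)
    ranked = sorted(segments, key=lambda s: score(question, s), reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    contract = (
        "This Agreement is governed by the laws of Delaware.\n\n"
        "Either party may terminate with 30 days notice.\n\n"
        "Fees are payable quarterly."
    )
    print(retrieve("Which laws is the agreement governed by?", contract, top_k=1))
```

Lexical overlap is only the simplest stand-in for an embedding-free ranker. The retrieved segments would then be passed to a language model as context for answering the question.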

Project Structure

vectorless/
├── src/                    # Core source code
├── scripts/                # Processing scripts
│   ├── process_contract.py # Main contract processing pipeline
│   └── run_all_41_questions.py # Sample evaluation script
├── docs/                   # Documentation
│   ├── README.md          # Detailed documentation
│   └── GENERALIZED_WORKFLOW.md # Workflow documentation
├── data/                   # Input datasets
├── sample_dataset/         # Sample data for development
├── output/                 # Generated outputs
│   ├── results/           # Processing results
│   └── segmentation_results/ # Cached segmentation data
├── main.py                # Entry point
├── pyproject.toml         # Project configuration
└── CLAUDE.md             # AI assistant instructions

Quick Start

# Install dependencies
uv sync

# Run main application
uv run python main.py

# Process a specific contract
uv run python scripts/process_contract.py --contract-index 0

# Run evaluation on sample data
uv run python scripts/run_all_41_questions.py

Features

  • Document segmentation without vector embeddings
  • Parallel question processing
  • Caching of segmentation results (under output/segmentation_results/) to avoid repeated processing
  • Comprehensive evaluation metrics
  • Generalizable workflow for different document types
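The caching and parallel-processing features above can be sketched roughly as follows. This is a hedged illustration under stated assumptions, not the project's implementation: the helper names (`load_or_segment`, `answer_question`, `answer_all`), the JSON cache layout, and the word-overlap placeholder answerer are all hypothetical.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def load_or_segment(contract_id: str, text: str, cache_dir: Path) -> list[str]:
    """Return cached segments if present; otherwise segment and cache them."""
    cache_file = cache_dir / f"{contract_id}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    # Assumption: a segment is a blank-line-separated paragraph.
    segments = [p.strip() for p in text.split("\n\n") if p.strip()]
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(segments), encoding="utf-8")
    return segments


def answer_question(question: str, segments: list[str]) -> str:
    """Placeholder answerer: return the first segment sharing a word with the question."""
    words = set(question.lower().split())
    for segment in segments:
        if words & set(segment.lower().split()):
            return segment
    return ""


def answer_all(questions: list[str], segments: list[str], max_workers: int = 4) -> list[str]:
    """Answer questions concurrently; results keep the order of the input questions."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda q: answer_question(q, segments), questions))


if __name__ == "__main__":
    cache = Path("output/segmentation_results")
    segs = load_or_segment("contract_0", "Term is five years.\n\nFees are due monthly.", cache)
    print(answer_all(["what fees apply", "how long is the term"], segs))
```

Threads suit this pattern when each question triggers an I/O-bound call, such as a request to a language model. Caching the segmentation means every question after the first reuses the same segments.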

See docs/ for detailed documentation and workflow guides.

About

A novel RAG approach without vector embeddings for legal document analysis
