A hackathon project that parses the California Consumer Privacy Act (CCPA) from a PDF, indexes its sections for semantic retrieval, and actively reasons over business practices using a local LLM (Meta Llama 3 8B).
parse_statute.py: Extracts the 45 legal sections from the raw ccpa_statute.pdf.
ccpa_sections.json: The extracted sections (you don't strictly need to re-run the parser unless this file is deleted).
retrieval.py: Uses sentence-transformers and faiss-cpu to index sections and perform natural language semantic search.
reasoning.py: Uses llama-cpp-python and a local Llama 3 8B model to evaluate business scenarios against the retrieved CCPA sections, outputting strict JSON compliance judgements.
compliance_checker.py: The top-level entry point that uses keyword heuristics to short-circuit obvious violations before falling back to the LLM reasoning engine.
Requires Python 3.9+ (3.11 recommended).
python -m venv venv
source venv/bin/activate # On Windows use: venv\Scripts\activate
pip install -r requirements.txt

The reasoning engine uses a quantized Llama 3 8B model. You need to download the .gguf file to a models/ directory in the project root.
# First, ensure you have the models directory
mkdir -p models
# Download using huggingface-cli (included in requirements.txt)
huggingface-cli download lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF \
Meta-Llama-3-8B-Instruct-Q4_K_M.gguf \
--local-dir models

(Note: This is a ~4.9GB download and may take a few minutes.)
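Before running the scripts, it can help to confirm that the download finished. The following is a minimal sketch, not part of the project: `check_model` and its `min_bytes` threshold are hypothetical, and the 4 GiB lower bound is only an assumption derived from the ~4.9GB figure above, used to flag truncated files.

```python
from pathlib import Path

# Path taken from the download command above: models/ in the project root.
MODEL_PATH = Path("models") / "Meta-Llama-3-8B-Instruct-Q4_K_M.gguf"


def check_model(path: Path = MODEL_PATH, min_bytes: int = 4 * 1024**3) -> str:
    """Return a short status string for the GGUF file at `path`.

    `min_bytes` is an assumed lower bound (4 GiB) used only to flag
    truncated downloads; it is not an exact expected size.
    """
    if not path.is_file():
        return "missing"
    if path.stat().st_size < min_bytes:
        return "incomplete"
    return "ok"


if __name__ == "__main__":
    print(f"{MODEL_PATH}: {check_model()}")
```

Run it from the project root so the relative models/ path resolves correctly.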
Run the three test scripts to confirm everything is working:
Test the Semantic Retriever:
python retrieval.py

(You should see it retrieve sections relating to selling user data.)
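For intuition about what the retriever does, here is a dependency-free sketch of embedding-based ranking. It is not the code in retrieval.py: word counts stand in for sentence-transformers embeddings, a linear scan stands in for the FAISS index, and the section ids and texts are placeholders rather than statute text.

```python
import math
import re
from collections import Counter

# Placeholder snippets for illustration only; NOT actual statute text.
SECTIONS = {
    "sec-a": "consumers may opt out of the sale of personal information",
    "sec-b": "businesses must delete personal information on request",
    "sec-c": "businesses must disclose categories of information collected",
}


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a sentence encoder)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, k: int = 2) -> list[str]:
    """Return the ids of the k sections most similar to the query."""
    q = embed(query)
    ranked = sorted(
        SECTIONS,
        key=lambda sid: cosine(q, embed(SECTIONS[sid])),
        reverse=True,
    )
    return ranked[:k]
```

Unlike a real sentence encoder, this toy version only matches exact words, so a query containing "sell" would not match a section containing "sale". Closing that gap is the reason to use learned embeddings.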
Test the Reasoning Engine:
python reasoning.py

(This will run 6 diverse test scenarios through the local Llama model. It might take 10-20 seconds to load the model into RAM for the first time.)
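Because the reasoning engine is expected to emit strict JSON, downstream code typically validates the model's output before trusting it. The sketch below shows one way to do that with the standard library. The field names in `REQUIRED_FIELDS` and the `parse_judgement` helper are hypothetical; the actual schema used by reasoning.py may differ.

```python
import json

# Hypothetical schema: the real field names in reasoning.py may differ.
REQUIRED_FIELDS = {"violation": bool, "sections": list, "explanation": str}


def parse_judgement(raw: str) -> dict:
    """Parse model output as JSON and check it against REQUIRED_FIELDS.

    Raises ValueError if the text is not a JSON object with exactly the
    expected fields and types.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != set(REQUIRED_FIELDS):
        raise ValueError(f"unexpected fields: {sorted(data)}")
    for name, expected in REQUIRED_FIELDS.items():
        if not isinstance(data[name], expected):
            raise ValueError(f"{name!r} should be {expected.__name__}")
    return data
```

Rejecting malformed output with a single exception type lets the caller decide whether to retry the model or report a failure.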
Test the Compliance Checker (Top-Level):
python compliance_checker.py

(This tests the heuristic short-circuits and LLM fallbacks.)
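The short-circuit pattern that compliance_checker.py is described as using can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the keyword rules are invented examples (and not legal guidance), and `llm` is any callable standing in for the reasoning engine.

```python
from typing import Callable, Optional

# Hypothetical keyword rules; the real heuristics in
# compliance_checker.py may differ.
KEYWORD_RULES = {
    "without consent": "flag: scenario mentions acting without consent",
    "ignore deletion request": "flag: scenario mentions ignoring a deletion request",
}


def heuristic_check(scenario: str) -> Optional[str]:
    """Return a canned finding if a keyword matches, else None."""
    text = scenario.lower()
    for keyword, finding in KEYWORD_RULES.items():
        if keyword in text:
            return finding
    return None


def check(scenario: str, llm: Callable[[str], str]) -> tuple[str, str]:
    """Return (source, finding); call `llm` only if no heuristic matched."""
    finding = heuristic_check(scenario)
    if finding is not None:
        return ("heuristic", finding)
    return ("llm", llm(scenario))
```

The point of this ordering is cost: a substring match takes microseconds, while a local 8B model call can take seconds, so obvious cases never reach the model.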