Shubhamm-02/ccpa-reasoning-system

CCPA Compliance Reasoning System

A hackathon project that parses the California Consumer Privacy Act (CCPA) from a PDF, indexes its sections for semantic retrieval, and actively reasons over business practices using a local LLM (Meta Llama 3 8B).

What's included

  • parse_statute.py: Extracts the 45 legal sections from the raw ccpa_statute.pdf.
  • ccpa_sections.json: The extracted sections (you only need to re-run the parser if this file is deleted).
  • retrieval.py: Uses sentence-transformers and faiss-cpu to index sections and perform natural language semantic search.
  • reasoning.py: Uses llama-cpp-python and a local Llama 3 8B model to evaluate business scenarios against the retrieved CCPA sections, outputting strict JSON compliance judgements.
  • compliance_checker.py: The top-level entry point that uses keyword heuristics to short-circuit obvious violations before falling back to the LLM reasoning engine.
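The short-circuit pattern described for compliance_checker.py can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `KEYWORD_RULES` table, the function names, and the shape of the returned dictionary are all assumptions, and the LLM step is passed in as a plain callable so the sketch runs without a model.

```python
from typing import Callable, Optional

# Hypothetical keyword rules: phrase -> CCPA section it most likely implicates.
# The phrases and the mapping are illustrative, not the project's real rules.
KEYWORD_RULES = {
    "sell personal information without notice": "1798.120",
    "refuse deletion request": "1798.105",
}


def keyword_check(practice: str) -> Optional[dict]:
    """Return a judgement if a keyword rule matches, otherwise None."""
    text = practice.lower()
    for phrase, section in KEYWORD_RULES.items():
        if phrase in text:
            return {"violation": True, "sections": [section], "source": "heuristic"}
    return None


def check_compliance(practice: str, llm_fallback: Callable[[str], dict]) -> dict:
    """Try the cheap heuristic first; call the LLM only when nothing matches."""
    hit = keyword_check(practice)
    if hit is not None:
        return hit
    return llm_fallback(practice)
```

The point of this ordering is that obvious cases never pay the cost of model inference; only ambiguous practices reach the slower reasoning step.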

Setup Instructions

1. Python Environment

Requires Python 3.9+ (3.11 recommended).

python -m venv venv
source venv/bin/activate  # On Windows use: venv\Scripts\activate
pip install -r requirements.txt

2. Download the LLM Model

The reasoning engine uses a quantized Llama 3 8B model. You need to download the .gguf file to a models/ directory in the project root.

# First, ensure you have the models directory
mkdir -p models

# Download using huggingface-cli (included in requirements.txt)
huggingface-cli download lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF \
  Meta-Llama-3-8B-Instruct-Q4_K_M.gguf \
  --local-dir models

(Note: This is a ~4.9 GB download and may take a few minutes.)
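Before running the tests, it can help to confirm the download actually completed. The snippet below is a small sanity check, assuming the file ends up at models/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf as in the command above. The `model_status` helper and the 4 GiB size threshold are hypothetical choices for illustration, not part of the project.

```python
from pathlib import Path

# Path implied by the download command above (--local-dir models).
MODEL_PATH = Path("models") / "Meta-Llama-3-8B-Instruct-Q4_K_M.gguf"


def model_status(path: Path = MODEL_PATH, min_bytes: int = 4 * 1024**3) -> str:
    """Classify the model file as 'missing', 'incomplete', or 'ok'.

    min_bytes is an assumed lower bound chosen to sit below the ~4.9 GB
    download size; adjust it if you use a different quantization.
    """
    if not path.is_file():
        return "missing"
    if path.stat().st_size < min_bytes:
        return "incomplete"
    return "ok"


if __name__ == "__main__":
    print(f"{MODEL_PATH}: {model_status()}")
```

A result of "incomplete" usually means the download was interrupted and should be re-run.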

3. Verify the Installation

Run the three test scripts to confirm everything is working:

Test the Semantic Retriever:

python retrieval.py

(You should see it retrieve sections relating to selling user data.)
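To make the ranking idea concrete without installing anything, here is a toy stand-in that swaps the real components for the standard library: word-count vectors replace the sentence-transformers embeddings, and a sorted scan replaces the FAISS index. The section records, their paraphrased text, and the `id`/`text` keys are assumptions for illustration; the actual layout of ccpa_sections.json may differ.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag of lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query: str, sections: list, k: int = 2) -> list:
    """Return the ids of the k sections most similar to the query."""
    q = embed(query)
    ranked = sorted(sections, key=lambda s: cosine(q, embed(s["text"])), reverse=True)
    return [s["id"] for s in ranked[:k]]


# Hypothetical records with paraphrased text, not quotes from the statute.
SECTIONS = [
    {"id": "1798.105", "text": "Consumers' right to delete personal information"},
    {"id": "1798.120", "text": "Consumers' right to opt out of the sale of personal information"},
    {"id": "1798.150", "text": "Personal information security breaches"},
]

print(search("opt out of sale of data", SECTIONS, k=1))
```

Unlike this word-overlap toy, a dense embedding model can match a query such as "selling user data" to a section that never uses those exact words, which is why the project uses semantic search.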

Test the Reasoning Engine:

python reasoning.py

(This will run 6 diverse test scenarios through the local Llama model. It might take 10-20 seconds to load the model into RAM for the first time.)
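Because the reasoning engine is meant to emit strict JSON, a caller will typically want to reject anything that does not match the expected shape. The sketch below shows one way to do that with the standard library. The schema (a boolean `violation` and a list of section strings under `sections`) is an assumption for illustration, since the exact output format of reasoning.py is not shown here.

```python
import json


def parse_judgement(raw: str) -> dict:
    """Parse model output and enforce a hypothetical two-key schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc

    # Assumed schema: exactly these two keys, nothing extra.
    if not isinstance(data, dict) or set(data) != {"violation", "sections"}:
        raise ValueError("expected exactly the keys 'violation' and 'sections'")
    if not isinstance(data["violation"], bool):
        raise ValueError("'violation' must be true or false")
    sections = data["sections"]
    if not isinstance(sections, list) or not all(isinstance(s, str) for s in sections):
        raise ValueError("'sections' must be a list of strings")
    return data


print(parse_judgement('{"violation": true, "sections": ["1798.120"]}'))
```

Raising on malformed output, instead of guessing at the model's intent, keeps a chatty or truncated response from being mistaken for a compliance judgement.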

Test the Compliance Checker (Top-Level):

python compliance_checker.py

(This tests the heuristic short-circuits and LLM fallbacks.)

About

A containerized legal reasoning engine that analyzes natural-language business practices and determines potential violations of the California Consumer Privacy Act (CCPA), returning strictly structured statutory citations using a hybrid retrieval + constrained LLM architecture.
