
Indian Institute of Information Technology Surat

Mid Semester Report on

Mini Project - (CS 604)

Submitted by

M. Nagavardhan - ui22cs50

Yogesh Nade - ui22cs51

Faculty Supervisor

Dr. Shraddha Patel

Department of Computer Science and Engineering
Indian Institute of Information Technology Surat
Gujarat-394190, India

March - 2025
Acknowledgement

I am deeply grateful to everyone who supported and guided me throughout the journey of
completing this project: Empowering Access to Justice: An AI-Driven Legal Platform for Instant
Assistance, Document Insights, Lawyer Discovery, and Community Support.

I extend my sincere thanks to my faculty supervisor, Dr. Shraddha Patel, whose thoughtful
insights and steady encouragement were instrumental in shaping both the technical foundation and
the broader vision of this project. Their guidance pushed me to think critically and refine my ideas.

I am also truly appreciative of our esteemed director, Dr. Rajeev Shorey, for creating an environment
that nurtures creativity and innovation — a space where ideas grow and bold solutions emerge.

A heartfelt thank you to my friends and peers, whose honest feedback, collaborative mindset, and
engaging discussions added new dimensions to this work. Their support kept me motivated and open
to new perspectives.

Most importantly, I am forever thankful to my family for their unwavering belief in me. Their constant
encouragement gave me the strength to persevere and reminded me of the purpose behind this
project — to build something meaningful and impactful.

This project is not just a technical endeavor but a step toward bridging the gap between technology
and access to justice. I am truly grateful to everyone who played a part in this incredible learning
experience.

Abstract

Access to legal assistance in India remains a significant challenge, with millions facing barriers due to
high costs, lack of awareness, and limited access to legal professionals. Traditional legal support relies
on manual consultations, which are time-consuming, expensive, and geographically restrictive. This
study explores the development of an AI-driven legal platform that combines advanced AI techniques
like Retrieval-Augmented Generation (RAG) and graph-based reasoning to bridge this gap.

The project proposes a comprehensive platform with the following features:

● An AI-powered chatbot using LLaMA2-7B fine-tuned on Indian legal data.

● Hybrid search combining ChromaDB’s semantic retrieval and Neo4j’s
graph-based legal reasoning.
● A user-driven community forum to promote legal awareness and collaboration.
● Real-time legal updates on laws, amendments, and landmark judgments.
● AI-powered document analysis and Q&A on uploaded legal PDFs.

The platform aims to provide a scalable, inclusive, and AI-enhanced legal solution tailored for India’s
unique legal landscape.

Keywords: AI Legal Assistant, Retrieval-Augmented Generation (RAG), ChromaDB, Neo4j, LLaMA2-7B, Indian Legal Data, Explainable AI, Legal Tech.

Table of Contents

1. List of Tables
2. List of Figures
3. Abbreviations
4. Chapter 1: Problem Statement
5. Chapter 2: Literature Survey
6. Chapter 3: Novelty
7. Chapter 4: Methodology
8. Chapter 5: Result Analysis
9. Chapter 6: Conclusion
List of Tables

1. Table 1: Literature Survey
List of Figures

1. Fig 4.1 RAG Architecture
2. Fig 4.2 RAG Query Flow
3. Fig 4.3 LLaMA2-7B Fine-Tuning Pipeline
4. Fig 4.4 Fine-Tuning LLaMA2-7B
5. Fig 4.5 Neo4j Diagram
Abbreviations / Notations

RAG: Retrieval-Augmented Generation

LLaMA2-7B: Large Language Model Meta AI (LLaMA 2, 7-billion-parameter variant)

AI: Artificial Intelligence

NLP: Natural Language Processing

ChromaDB: Vector Database for Semantic Search

Neo4j: Graph-Based Database

BERT: Bidirectional Encoder Representations from Transformers

LEGAL-BERT: BERT fine-tuned on legal data

RoBERTa: Robustly Optimized BERT Pretraining Approach

TF-IDF: Term Frequency-Inverse Document Frequency

BM25: Best Matching 25 (ranking function for information retrieval)

VAE: Variational Autoencoder

SAC: Soft Actor-Critic
Chapter 1: Problem Statement
Empowering Access to Justice: An AI-Driven Legal Platform for Instant Assistance,
Document Insights, Lawyer Discovery, and Community Support

Access to legal assistance in India remains a critical challenge, with millions struggling to
overcome barriers like high legal costs, lack of legal awareness, and limited access to verified
legal professionals. Traditional methods of legal support often involve manual consultations,
which are time-consuming, costly, and restricted by geographical boundaries. Existing online
legal platforms lack AI-powered real-time assistance, reliable legal document insights, and
community collaboration.

Key challenges identified:

1. Limited AI Legal Assistance: Basic guidance that is not backed by trusted, verifiable sources leads to misinformation.
2. Complex Legal Documents: Users struggle to interpret legal texts.
3. Lawyer Discovery: Finding verified legal professionals nearby is inefficient.
4. Lack of Community Engagement: Few platforms support collaborative legal discussions.
5. Delayed Legal Updates: Access to recent laws and landmark judgments is often slow.
Chapter 2: Related Work and Literature Survey

Resource | Year | Title | Algorithm + Concept | Limitation
Resource1 | 2024 | AI-ML-Based Legal Assistant for Contracts | NLP, RAG, Transformer Models | No fine-tuning on legal QA datasets or Neo4j use.
Resource2 | 2024 | Legal AI for Document Retrieval | BERT, Semantic Search | Lacks graph-based case-law reasoning.
Resource3 | 2024 | Custom GPT Legal Model | GPT, OCR, UI integration | No graph-based reasoning, community Q&A, or RAG.
Resource4 | 2024 | Transformer-Based Legal Models | BERT, LEGAL-BERT, RoBERTa, TF-IDF, BM25 | No graph-based reasoning, real-time updates, or generative QA.
Resource5 | 2024 | SAC-VAE for Legal Text Summarization | VAE for dimensionality reduction, SAC for policy learning | Complexity, domain-specific focus, and high computational resource requirements.

Table 1: Literature Survey
Chapter 3: Novelty of the Proposed AI Legal Platform

Round-the-Clock AI Legal Support: 24/7 legal assistance powered by LLaMA2-7B, fine-tuned on
Indian legal data to give context-aware answers grounded in local laws.

Smart Legal Search with RAG + Neo4j: Combines ChromaDB's vector-based semantic search with
Neo4j's graph-based legal reasoning, offering deeper insight into case law and legal precedents.

Effortless Legal Document Analysis: Users upload legal PDFs, extract key points, and ask
questions about them, which makes complex legal jargon accessible to non-specialists.

Community-Powered Legal Q&A: An interactive platform where both legal experts and the public
can ask, answer, and validate legal questions, encouraging collaborative problem-solving.

Live Legal Updates and Insights: Real-time updates on new laws, important court rulings, and
amendments keep users current.

Accessible and Scalable Legal Aid: Built for individuals, small businesses, and
underrepresented communities, making legal help more inclusive and affordable through AI.
Chapter 4: Methodology

Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) combines information retrieval with text generation models,
improving their ability to produce accurate and contextually relevant responses by incorporating
external knowledge sources. Proposed by researchers at Facebook AI, RAG bridges the gap between
static language models and dynamic information retrieval systems.

Fig 4.1: RAG Model Architecture

RAG Workflow
● Query Processing
○ The user's legal query is preprocessed (tokenization, normalization).
○ The query is embedded with the same pre-trained sentence-transformer encoder used to index
the legal documents, so that queries and documents share one vector space.
● Retrieval Step
○ Vector embeddings of legal documents are stored in ChromaDB.
○ Graph-based relationships between case laws, statutes, and
precedents are maintained in Neo4j.
○ Hybrid retrieval combines vector similarity search (cosine
similarity) and graph traversal algorithms.
● Generation Step
○ Retrieved documents and nodes are passed as context to
the LLaMA2-7B model.
○ The model generates human-like, legally grounded responses.
● Post-processing
○ Responses are filtered to remove irrelevant information.
○ Citations are validated against authoritative sources.
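The citation-validation part of post-processing can be illustrated with a minimal sketch. It assumes, purely for illustration, that answers cite statutes in the form "Section 376" and that a citation counts as supported only if the same section number appears in a retrieved passage. The regular expression and function names are hypothetical; validating against authoritative sources in the actual system would need to be more thorough than this.

```python
import re
from typing import Iterable

# Hypothetical citation pattern: matches "Section 376", "Section 498A", "Sec. 302".
SECTION_RE = re.compile(r"\b(?:Section|Sec\.)\s+(\d+[A-Z]?)", re.IGNORECASE)


def cited_sections(text: str) -> list[str]:
    """Return the section numbers cited in `text`, in order of first appearance."""
    seen: list[str] = []
    for match in SECTION_RE.finditer(text):
        number = match.group(1).upper()
        if number not in seen:
            seen.append(number)
    return seen


def unsupported_citations(answer: str, retrieved_passages: Iterable[str]) -> list[str]:
    """Section numbers cited in the answer that no retrieved passage mentions."""
    supported: set[str] = set()
    for passage in retrieved_passages:
        supported.update(cited_sections(passage))
    return [s for s in cited_sections(answer) if s not in supported]


if __name__ == "__main__":
    passages = ["Section 376 IPC prescribes the punishment for rape."]
    answer = "As per IPC Section 376, ... See also Section 999."
    print(unsupported_citations(answer, passages))  # ['999']
```

Flagged citations could then be removed or sent back for regeneration; which of the two the platform does is not specified here.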

Fig 4.2: RAG Query Flow

Fine-Tuning LLaMA2-7B
Fine-tuning involves training the LLaMA2-7B model on domain-specific datasets to
specialize it for legal applications. We used the following datasets:

● LawyerChat: A corpus of legal conversations.

● FALQU: Frequently Asked Legal Questions dataset.
● JEC-QA: Judicial Exam Corpus for legal question-answer pairs.

Fine-Tuning Steps:

1. Preprocessing
○ Tokenizing and formatting the data into instruction-based prompts.
○ Padding/truncating sequences to match model input size.
2. Training
○ Using LoRA (Low-Rank Adaptation), which freezes the pretrained weights and trains
small low-rank adapter matrices in selected layers instead of updating the entire model.
○ Optimizing with the AdamW optimizer and a learning rate scheduler.
3. Evaluation
○ Validating on a hold-out set and calculating metrics such as BLEU,
ROUGE, and perplexity.
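LoRA keeps each pretrained weight matrix W (size d × k) frozen and learns a low-rank update B·A, where B is d × r, A is r × k, and the rank r is much smaller than d and k; the update is scaled by alpha / r. The plain-Python sketch below shows that forward pass and the resulting saving in trainable parameters. It illustrates the technique only. The toy matrices are hypothetical, the rank 8 in the example is an assumed value, and a real LLaMA2-7B run would typically use a library such as Hugging Face PEFT rather than hand-written matrix code.

```python
Matrix = list[list[float]]


def matvec(m: Matrix, v: list[float]) -> list[float]:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: list[float],
                 alpha: float, r: int) -> list[float]:
    """y = W x + (alpha / r) * B (A x); W is frozen, only A and B would be trained."""
    base = matvec(w, x)               # frozen pretrained path, d outputs
    update = matvec(b, matvec(a, x))  # x -> r dims via A, then back to d dims via B
    scale = alpha / r
    return [y + scale * u for y, u in zip(base, update)]


def trainable_params(d: int, k: int, r: int) -> tuple[int, int]:
    """(full fine-tuning count, LoRA count) for one d x k weight matrix."""
    return d * k, r * (d + k)


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # frozen 3 x 2 weight (toy)
    a = [[0.5, -0.5]]                          # 1 x 2, randomly initialized in practice
    b = [[0.0], [0.0], [0.0]]                  # 3 x 1, zero-initialized
    print(lora_forward(w, a, b, [2.0, 4.0], alpha=16, r=1))  # [2.0, 4.0, 6.0]
    # A 4096 x 4096 projection (LLaMA2-7B hidden size) with an assumed rank of 8:
    print(trainable_params(4096, 4096, 8))  # (16777216, 65536)
```

Because B starts at zero, the adapted layer initially behaves exactly like the pretrained one. For a 4096 × 4096 projection, a rank-8 adapter trains 65,536 parameters instead of 16,777,216 (about 0.4%).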

Fig 4.3: LLaMA2-7B Fine-Tuning Pipeline

Graph-Based Retrieval with Neo4j

Neo4j, a graph database, helps model legal data by capturing relationships like precedents,
citations, and references.

Graph Schema:

● Nodes: Case Laws, Statutes, Legal Principles

● Edges: CITES, REFERS_TO, SIMILAR_TO

Traversal Algorithms:

● BFS (Breadth-First Search): For exploring case law references.

● Personalized PageRank: To rank relevant legal statutes.
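The two traversal algorithms can be sketched over an in-memory adjacency list that stands in for the Neo4j graph. In deployment they would run inside Neo4j itself, for example through Cypher or its Graph Data Science library. The node names, the hop limit, the damping factor 0.85, and the fixed 50 iterations are illustrative assumptions.

```python
from collections import deque

# Hypothetical in-memory stand-in for the Neo4j graph: node -> nodes it CITES.
Graph = dict[str, list[str]]


def bfs_references(graph: Graph, start: str, max_hops: int) -> dict[str, int]:
    """Breadth-first search from `start`; returns {node: hops} within max_hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def personalized_pagerank(graph: Graph, seeds: set[str], damping: float = 0.85,
                          iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank whose random jumps return only to the seed nodes."""
    nodes = set(graph) | seeds | {n for targets in graph.values() for n in targets}
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for node in nodes:
            out = graph.get(node, [])
            if out:
                share = damping * rank[node] / len(out)
                for nxt in out:
                    new_rank[nxt] += share
            else:
                # Dangling node (cites nothing): hand its mass back to the seeds.
                for seed in seeds:
                    new_rank[seed] += damping * rank[node] / len(seeds)
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {
        "CaseC": ["CaseA"],
        "CaseA": ["CaseB", "StatuteX"],
        "CaseB": ["StatuteX"],
    }
    print(bfs_references(graph, "CaseC", max_hops=2))
    # {'CaseC': 0, 'CaseA': 1, 'CaseB': 2, 'StatuteX': 2}
    ranks = personalized_pagerank(graph, seeds={"CaseA"})
    print(sorted(ranks, key=ranks.get, reverse=True))
    # ['CaseA', 'StatuteX', 'CaseB', 'CaseC']
```

Seeding PageRank with the cases matched to the user's query biases the ranking toward statutes and precedents close to those cases. Plain PageRank would instead favor whatever is most cited overall.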

Hybrid Retrieval Model

The hybrid model merges vector-based and graph-based retrieval strategies:

Final Score = α × Vector Score + (1 − α) × Graph Score

where α (alpha) is a tunable hyperparameter between 0 and 1.

● Vector Score: Cosine similarity between query and document embeddings.

● Graph Score: Relevance score from Neo4j traversal.

Benefits:

● Improves result accuracy by leveraging both semantic similarity and legal relationships.
● Mitigates the limitations of pure vector search by incorporating domain knowledge.
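A minimal sketch of the scoring formula follows. Cosine similarities and Neo4j traversal scores live on different scales, so this sketch min-max normalizes each score list to [0, 1] before mixing. That normalization step, the treatment of documents found by only one retriever, the document IDs, and α = 0.6 are all assumptions for illustration, not values from the project.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; a list of identical scores maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_scores(vector: dict[str, float], graph: dict[str, float],
                  alpha: float) -> list[tuple[str, float]]:
    """Final = alpha * vector + (1 - alpha) * graph, ranked from high to low.

    A document returned by only one retriever gets 0 for the other component.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    v, g = min_max(vector), min_max(graph)
    docs = set(v) | set(g)
    final = {d: alpha * v.get(d, 0.0) + (1 - alpha) * g.get(d, 0.0) for d in docs}
    return sorted(final.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    vector = {"crpc_437": 0.82, "crpc_439": 0.78, "case_x": 0.40}  # cosine similarities
    graph = {"case_x": 12.0, "crpc_439": 3.0}                      # traversal scores
    print(hybrid_scores(vector, graph, alpha=0.6))
```

Raising α favors semantic matches from ChromaDB, and lowering it favors documents that are well connected in the citation graph. It would normally be tuned on a validation set of legal queries.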

Chapter 5: Result Analysis
Datasets Used

1. Indian Legal Datasets for RAG-Based AI

● A curated collection of official legal documents, case laws, and acts from trusted Indian legal sources.
● Includes statutes, Supreme Court and High Court judgments, IPC, CrPC, and other legal codes.
● Used to generate document embeddings for ChromaDB-based retrieval.

2. Fine-Tuning Datasets for LLaMA2-7B

● LawyerChat: Dataset containing Indian legal Q&A pairs to improve conversational understanding.
● FALQU: A legal dataset covering frequently asked legal queries and their expert responses.
● JEC-QA: Judicial and case-law-based question-answer dataset used to enhance legal reasoning.
● Synthetic Data Generation: AI-generated legal QA pairs for domain-specific adaptation.

3. Knowledge Graph Data (Neo4j Integration)

● Structured case-law data linking precedents, legal entities, and statutory provisions.
● Enables contextual and relational understanding of legal documents.
● Used for graph-based legal retrieval and reasoning.

The core of this legal AI system is Retrieval-Augmented Generation (RAG), which improves
response accuracy by retrieving relevant legal text before generating an answer. Grounding
responses in trusted legal documents reduces the risk of the AI hallucinating information.

How RAG Works in This Project

Step 1: Creating Legal Document Embeddings (ChromaDB)

● All legal texts (IPC, CrPC, Constitution, Supreme Court judgments, etc.)
are converted into vector embeddings using a sentence-transformer model
(a BERT-based encoder).
● These embeddings capture semantic meaning, allowing AI to
retrieve relevant legal information instead of relying on
keyword matches.
● The embeddings are stored in ChromaDB, a high-performance vector database
designed for fast similarity searches.

Step 2: Semantic Search Using User Queries

● When a user asks a legal question (e.g., "What are the bail
provisions under CrPC?"), the system converts the query into an
embedding.
● This embedding is then matched against ChromaDB’s stored legal
document embeddings to retrieve the most relevant legal
sections, case laws, and provisions.
● Unlike traditional legal search engines (which rely on keywords),
this approach allows the system to understand the intent of the
query and fetch the most contextually relevant results.
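Steps 1 and 2 reduce to embedding every document once, embedding each query with the same model, and returning the nearest neighbors by cosine similarity. In the real system ChromaDB stores the vectors and runs this search. The sketch below instead uses a crude hashed bag-of-words `embed` function so the retrieval logic runs with the standard library alone. That function is a hypothetical stand-in that captures word overlap, not meaning, and the document IDs and abridged section texts are illustrative.

```python
import math
import re
import zlib

DIM = 256  # hypothetical size of the toy embedding space


def embed(text: str) -> list[float]:
    """Toy stand-in for a sentence-transformer: a hashed bag of words.

    It only captures word overlap, not meaning; it exists to make the
    nearest-neighbor logic runnable without downloading a model.
    """
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode()) % DIM] += 1.0
    return vec


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def top_k(query: str, index: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Embed the query with the same model and return the k most similar documents."""
    q = embed(query)
    scored = [(doc_id, cosine(q, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


if __name__ == "__main__":
    corpus = {  # hypothetical document IDs with abridged section titles
        "crpc_436": "In what cases bail to be taken (bailable offences).",
        "crpc_437": "When bail may be taken in case of non-bailable offence.",
        "ipc_302": "Punishment for murder.",
    }
    index = {doc_id: embed(text) for doc_id, text in corpus.items()}  # Step 1
    print(top_k("What are the bail provisions under CrPC?", index))  # Step 2
```

Swapping `embed` for a real sentence-transformer is what lets the search match intent rather than shared words. The ranking code does not change.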

Step 3: Passing Retrieved Legal Context to LLaMA2-7B for Response Generation

● The retrieved legal text is then fed into the LLaMA2-7B model, which is fine-tuned on
Indian legal datasets.
● The AI model generates responses based on both:
1. The retrieved legal text (retrieved via ChromaDB)
2. Its own knowledge from fine-tuning
● This reduces hallucination and keeps AI responses grounded in the retrieved
legal text.
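One way to realize Step 3 is to pack the retrieved passages, each tagged with its source ID, into the prompt ahead of the question, and to stop before a context budget is exceeded. The instruction wording, the source tags, and the word-based budget (a rough proxy for tokens) are assumptions. LLaMA2-7B's 4,096-token context window is the hard limit that any such budget must respect.

```python
def build_prompt(question: str, passages: list[tuple[str, str]],
                 max_context_words: int = 1500) -> str:
    """Assemble a grounded prompt from (source_id, text) passages, best first.

    Passages are added in rank order, and packing stops at the first one that
    would exceed the budget, so lower-ranked text never displaces higher-ranked text.
    """
    blocks: list[str] = []
    used = 0
    for source_id, text in passages:
        words = len(text.split())
        if used + words > max_context_words:
            break
        blocks.append(f"[{source_id}] {text}")
        used += words
    context = "\n".join(blocks) if blocks else "(no relevant sources retrieved)"
    return (
        "Answer the legal question using only the sources below. "
        "Cite sources by their [ID]. If the sources are insufficient, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    retrieved = [  # hypothetical IDs and abridged texts
        ("crpc_437", "When bail may be taken in case of non-bailable offence."),
        ("crpc_439", "Special powers of High Court or Court of Session regarding bail."),
    ]
    print(build_prompt("What are the bail provisions under CrPC?", retrieved))
```

Tagging each passage with an ID gives the model something concrete to cite, which also makes the later citation check straightforward.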

Fine-Tuning LLaMA2-7B on Indian Legal Datasets

To enhance accuracy, the AI model is fine-tuned on legal question-answer pairs and structured legal texts.

Fine-Tuning Process

1. Dataset Curation: The model is trained on legal datasets, including:
○ LawyerChat (legal Q&A dataset)
○ FALQU (frequently asked legal questions dataset)
○ JEC-QA (judicial case-law question-answer dataset)
○ Synthetic data (generated from case laws and legal provisions)
2. LoRA-Based Training:
○ Low-Rank Adaptation (LoRA) is used to fine-tune LLaMA2-7B efficiently on
consumer hardware.
○ 8-bit quantization is applied to reduce memory usage while largely
preserving accuracy.
3. Legal Language Adaptation:
○ The model is trained to understand legal terminology, citations, and act references.
○ Generated responses follow a structured legal format (e.g., "As
per IPC Section 376, the punishment for...").
4. Evaluation & Optimization:
○ Model responses are evaluated with BLEU, ROUGE, and BERTScore, which
measure their similarity to reference answers.
○ Human legal experts assess responses for factual accuracy and coherence.
○ The best-performing model is integrated into the RAG system.
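BLEU and ROUGE score overlap with a reference answer, so it helps to see what one of them actually computes. Below is a minimal ROUGE-L F1 (longest common subsequence) sketch with lowercase whitespace tokenization and equal weighting of precision and recall. The report does not state which ROUGE variant or tokenizer it uses, so this is illustrative only. Published evaluations usually rely on an established package such as `rouge-score`.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 between a model answer and a reference answer."""
    c, r = candidate.lower().split(), reference.lower().split()
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(rouge_l_f1("the accused may apply for bail",
                     "the accused can apply for bail"))
    # ≈ 0.833 (LCS of 5 tokens out of 6, so precision = recall = 5/6)
```

The example scores highly even though "may" and "can" differ. That is why overlap metrics are paired with human expert review for factual accuracy.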

Fig 4.4: Fine-Tuning LLaMA2-7B

Enhancing Case-Law Retrieval with Neo4j (Graph Database)
To further improve case-law reasoning, the system integrates Neo4j, a graph-based database
that models legal relationships.

Why Use Neo4j for Legal Data?

Legal cases and statutes have complex interconnections (e.g., one case may cite multiple previous
cases). A graph-based approach helps model:
● Case citations (Which cases refer to which?)
● Act-to-section relationships (Which sections fall under which act?)
● Lawyer and judge connections (Which lawyers have worked on which cases?)

How Neo4j Works in This System

1. Legal Data Structuring

○ Court judgments, legal provisions, and case-law citations are converted into
nodes and relationships.
○ Example: A judgment citing another case is stored as a
CITES relationship in Neo4j.
2. Graph-Based Querying for Better Case Retrieval
○ When a user asks a case-law-related question, the system
queries Neo4j to find relevant cases.
○ This ensures that the AI retrieves relevant precedents,
improving legal reasoning.
3. Integration with RAG for Hybrid Search
○ If a legal question requires a combination of case-law reasoning and
statutory provisions, Neo4j and ChromaDB are queried together, providing
richer legal context.
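The case-retrieval query in step 2 might look like the parameterized Cypher below, built from Python. Only the CITES relationship comes from this report's graph schema. The `CaseLaw` label, the `id` and `title` properties, the hop limit, and the example case ID are assumptions. Cypher does not accept parameters inside variable-length bounds such as `*1..2`, so the sketch validates the hop count and writes it into the query text, while the case ID and result limit stay as query parameters.

```python
def cited_precedents_query(case_id: str, max_hops: int = 2,
                           limit: int = 10) -> tuple[str, dict]:
    """Build a Cypher query for precedents reachable from a case via CITES edges.

    The CaseLaw label and id/title properties are hypothetical schema choices.
    """
    if not isinstance(max_hops, int) or not 1 <= max_hops <= 5:
        raise ValueError("max_hops must be an integer between 1 and 5")
    query = (
        f"MATCH path = (c:CaseLaw {{id: $case_id}})-[:CITES*1..{max_hops}]->(p:CaseLaw) "
        "RETURN p.id AS id, p.title AS title, min(length(path)) AS hops "
        "ORDER BY hops, id LIMIT $limit"
    )
    return query, {"case_id": case_id, "limit": limit}


if __name__ == "__main__":
    query, params = cited_precedents_query("SC-1973-0001")  # hypothetical case ID
    print(query)
    print(params)
```

With the official `neo4j` Python driver (version 5), the pair could be run as `driver.execute_query(query, params)`. Keeping user input in parameters rather than in the query string also avoids Cypher injection.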

Fig 4.5: Neo4j Diagram

Hybrid Query Processing (Combining Keyword & Semantic Search)
Unlike traditional keyword-based legal search engines, this project employs hybrid search, which
combines:

● ChromaDB (Semantic Search) – Finds relevant legal provisions based on meaning.
● Neo4j (Graph Search) – Retrieves case-law citations and structured legal relations.
● BM25 (Keyword Matching) – Ensures exact legal terms are considered.

How Hybrid Search Works in Legal AI

1. User Query Understanding
○ The system classifies the query as statutory (law-based), case-law-related, or mixed.
2. Retrieving Relevant Legal Data
○ If statutory, ChromaDB is queried for legal provisions and acts.
○ If case-law, Neo4j is queried for similar past cases.
○ If mixed, both are queried and their results are combined to generate a comprehensive response.
3. Generating Final Response
○ Retrieved documents are passed to LLaMA2-7B, which synthesizes a legal response.
○ The model adds citations and reasoning, making responses more legally sound.
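The BM25 keyword-matching component of the hybrid search can be sketched as follows. This is the standard Okapi BM25 formula with the commonly used settings k1 = 1.5 and b = 0.75 and the non-negative IDF variant log(1 + (N − n + 0.5) / (n + 0.5)). The tokenizer, parameter values, and sample section texts are assumptions for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; '498A' becomes the single token '498a'."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Okapi BM25 over a small in-memory collection of {doc_id: text}."""

    def __init__(self, docs: dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tf = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
        self.doc_len = {doc_id: sum(c.values()) for doc_id, c in self.tf.items()}
        self.avgdl = sum(self.doc_len.values()) / len(self.doc_len)
        df = Counter(term for counts in self.tf.values() for term in counts)
        n = len(docs)
        # Non-negative IDF: rare terms such as section numbers get large weights.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, doc_id: str) -> float:
        tf, dl = self.tf[doc_id], self.doc_len[doc_id]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            # Length normalization: long documents need more matches to score well.
            norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def rank(self, query: str) -> list[tuple[str, float]]:
        scores = [(doc_id, self.score(query, doc_id)) for doc_id in self.tf]
        return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sections = {  # hypothetical IDs with abridged section titles
        "ipc_498a": "Husband or relative of husband of a woman subjecting her to cruelty.",
        "crpc_437": "When bail may be taken in case of non-bailable offence.",
        "ipc_302": "Punishment for murder.",
    }
    bm25 = BM25(sections)
    print(bm25.rank("cruelty by husband")[0][0])  # ipc_498a
```

Exact-term scoring like this catches queries such as "Section 498A" that an embedding model may blur. Its scores can be normalized and mixed with the vector and graph scores in the same way as the hybrid formula in the Methodology chapter.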

Current Results

● Training Loss: 0.073

● Validation Loss: 0.128
● BLEU Score: 0.81
● ROUGE Score: 0.76
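Perplexity is listed among the evaluation metrics in the Methodology chapter but is not reported. If the losses above are mean per-token cross-entropy in nats (the usual convention for causal language-model training, but an assumption here because the report does not say), they convert directly to perplexity via PPL = exp(loss):

```python
import math


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Perplexity = exp(mean per-token cross-entropy, measured in nats)."""
    return math.exp(mean_cross_entropy_nats)


for name, loss in [("training", 0.073), ("validation", 0.128)]:
    print(f"{name}: loss={loss} -> perplexity={perplexity(loss):.3f}")
# training: loss=0.073 -> perplexity=1.076
# validation: loss=0.128 -> perplexity=1.137
```

Under that assumption, a perplexity this close to 1 means the model assigns very high probability to each reference token on the held-out set.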

Expected Results

We aim for the following:

● Improved accuracy due to fine-tuning LLaMA2-7B.

● Enhanced retrieval precision with hybrid models (combining ChromaDB and Neo4j).
● Greater explainability through graph-based legal relationships.

Chapter 6: Conclusion

This project explores the application of RAG to AI-powered legal assistants.

By integrating LLaMA2-7B, ChromaDB, and Neo4j, we are building a system that grounds legal
responses in retrieved statutes and case law by combining semantic search with graph-based reasoning.
Fine-tuning LLaMA2-7B on domain-specific datasets further improves performance, while
the hybrid retrieval model balances vector and graph search. Future work will focus on
scaling the system with more legal datasets and refining the hybrid retrieval strategy for
better real-world applicability.

Resources
Fig 4.1: https://www.leewayhertz.com/advanced-rag/

Fig 4.4: https://www.researchgate.net/figure/The-approach-to-fine-tuning-the-pretrained-Llama-2-model-for-text-classification_fig2_373451301

Fig 4.5: https://blog.gopenai.com/rag-application-with-neo4j-constructed-knowledge-graphs-and-vector-index-6178c9bb8386
