RAG Analysis

The progress report outlines the development of a RAG-Powered Complaint Analysis System, detailing tasks such as data analysis, embedding, and the creation of an interactive interface. Key accomplishments include implementing session-based chat history, modularizing the project structure, and enhancing code quality through linting. The system aims to improve user experience and operational transparency while providing valuable business insights and analytics features.

Progress Report: RAG-Powered Complaint Analysis System
Author: Frehiwot Haile
Date: August 19, 2025

Existing Features
Task 1: Data Analysis & Preprocessing

● Loaded and explored the dataset, analyzing complaint volume by product and narrative
length.

● Filtered to five target products with non-empty narratives.

● Cleaned text by lowercasing, removing special characters, and stripping boilerplate.
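The cleaning step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name clean_narrative and the single boilerplate phrase are assumptions, since the report does not list the phrases that were stripped.

```python
import re

# Hypothetical boilerplate phrase; the report does not say which phrases were removed.
BOILERPLATE_PATTERNS = [
    r"i am writing to file a complaint",
]


def clean_narrative(text: str) -> str:
    """Lowercase the text, drop boilerplate phrases, and remove special characters."""
    text = text.lower()
    for pattern in BOILERPLATE_PATTERNS:
        text = re.sub(pattern, " ", text)
    # Keep lowercase letters, digits, and whitespace; replace everything else with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Collapse the runs of whitespace left behind by the substitutions.
    return re.sub(r"\s+", " ", text).strip()
```

For example, clean_narrative("I am writing to file a complaint about my Credit Card!!! Fee: $35.") returns "about my credit card fee 35".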

Task 2: Chunking, Embedding & Indexing

● Split long narratives into manageable text chunks.

● Generated embeddings using a SentenceTransformer model.

● Stored embeddings in a persistent ChromaDB vector store, ensuring consistency by
clearing old collections.
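As an illustration of the chunking step, here is a minimal sketch of fixed-size character windows with overlap. The report does not state the chunk size, the overlap, or whether splitting was character-based, so chunk_text and its default values are assumptions. Embedding and storage are not shown; each chunk would then be passed to the embedding model and written to the vector store.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of storing some text twice.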

Task 3: RAG Core Logic

● Connected the ChromaDB retriever with LLaMA3-70B via Groq.

● Built a RAG chain using a system prompt, retriever, and LLM.

● Evaluated the system with test queries, confirming accurate, complaint-based answers
with low latency.
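The shape of such a chain (system prompt, retriever, LLM) can be sketched without any particular framework. In this sketch the prompt wording, build_prompt, and answer are all hypothetical, and retrieve and generate are placeholders standing in for the ChromaDB retriever and the Groq-hosted model.

```python
from typing import Callable

# Hypothetical wording; the report does not quote its system prompt.
SYSTEM_PROMPT = (
    "You are a financial analyst assistant. Answer using only the complaint "
    "excerpts provided. If they do not contain the answer, say so."
)


def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine the system prompt, numbered context chunks, and the question."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return f"{SYSTEM_PROMPT}\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"


def answer(
    question: str,
    retrieve: Callable[[str, int], list[str]],
    generate: Callable[[str], str],
    k: int = 5,
) -> str:
    """Retrieve the top-k chunks, build the prompt, and ask the model."""
    chunks = retrieve(question, k)
    return generate(build_prompt(question, chunks))
```

Passing the retriever and the model in as callables keeps the chain testable with stubs, which is useful when evaluating with fixed test queries.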

Task 4: Interactive Interface

● Developed a Streamlit app with:

○ Input box and “Submit” button for queries.

○ AI-generated responses with source document display.

○ “Clear” button for resetting.

● Packaged in a standalone app.py for easy use by non-technical users.

New Changes
I updated the folder structure to be more modular.

Fig: Folder structure before the change.

Fig: Folder structure after the change (screenshot of the modular project structure).

Task 5: Chat History and Modular Project Structure

Expected Deliverables

● Implement session-based chat history in the Streamlit app to retain user queries and AI
responses.
● Add a toggle sidebar for users to select and review previous chats.
● Restructure the codebase into modular subfolders: utils/, vectorization/, db/,
rag/, and app.py.
● Update imports and references to ensure functionality remains intact.
● Fix the "clear" button to reset both questions and answers.

Accomplishments
● Implemented persistent session-based chat history, enabling users to view prior
conversations for improved continuity.
● Added a toggle sidebar in the Streamlit UI, allowing users to select and review previous
chats.
● Restructured the project into modular subfolders as planned, separating utilities,
vectorization, database management, RAG pipeline logic, and the app interface.
● Updated all imports and references to align with the new folder structure, verifying that
no functionality was broken.
● Fixed the "clear" button so that it resets both questions and answers; previously,
questions persisted after clearing.
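The chat-history and clear-button behaviour described above can be sketched as plain functions over a dict-like state object. This is an assumption-laden illustration: the key names ("chats", "current", "question") and the function names are hypothetical, and in the Streamlit app the state argument would be st.session_state, which supports the same item access.

```python
def init_state(state) -> None:
    """Create the history keys once per session if they are missing."""
    if "chats" not in state:
        state["chats"] = []      # archived conversations, shown in the sidebar
    if "current" not in state:
        state["current"] = []    # turns of the active conversation
    if "question" not in state:
        state["question"] = ""   # contents of the input box


def record_turn(state, question: str, answer: str) -> None:
    """Append one question/answer pair to the active conversation."""
    state["current"].append({"question": question, "answer": answer})


def archive_current(state) -> None:
    """Move the active conversation into the list of previous chats."""
    if state["current"]:
        state["chats"].append(state["current"])
        state["current"] = []


def clear_current(state) -> None:
    """Reset both the pending question and the displayed answers."""
    state["current"] = []
    state["question"] = ""
```

Resetting the question and the answers in the same function is what avoids the earlier behaviour where the question stayed in the input box after clearing.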

Fig: Streamlit UI with chat history and toggle sidebar.

Deviations and Reasons

● No deviations occurred; all planned actions for Task 5 were completed as expected.

Impact

The modular structure enhances maintainability and scalability, while the chat history and fixed
clear button improve the user experience, making the interface intuitive for finance stakeholders.

Task 6: Code Quality and Linting Setup

Expected Deliverables
● Install and configure Black, Flake8, and isort for code formatting, linting, and import
sorting.
● Add configurations in pyproject.toml for consistent style enforcement.
● Fix failing pytest tests related to the vector store.
● Set up pre-commit hooks for automatic linting.
● Ensure compatibility across platforms, particularly for Windows.
● Exclude irrelevant files (e.g., venv/, .ipynb) from linting.

Accomplishments

● Installed Black, Flake8, and isort in the virtual environment.

● Configured pyproject.toml with:
○ Black: Line length set to 88.
○ Flake8: Ignored minor conflicts (E203, W503) and excluded venv/, .ipynb
files.
○ isort: Ensured consistent import ordering.
● Fixed failing pytest tests by including the vector store in the test suite, ensuring it is
properly initialized.
● Set up pre-commit hooks to enforce linting on commits.
● Verified cross-platform compatibility, including Windows, and ensured black . and
flake8 . run without errors.
● Excluded irrelevant files from linting, improving efficiency.
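A pyproject.toml along these lines would match the settings listed above. It is a sketch, not the project's actual file. Two points are assumptions: the isort profile setting, which is a common way to keep isort's output compatible with Black, and the [tool.flake8] table, which Flake8 does not read on its own and which is only picked up when a plugin such as Flake8-pyproject is installed.

```toml
[tool.black]
line-length = 88

[tool.isort]
# Assumption: use isort's Black-compatible profile.
profile = "black"

# Assumption: read only when a plugin such as Flake8-pyproject is installed;
# Flake8 itself does not load pyproject.toml.
[tool.flake8]
max-line-length = 88
extend-ignore = ["E203", "W503"]
exclude = ["venv", "*.ipynb"]
```

E203 and W503 are the two checks that commonly conflict with Black's formatting, which is why they are ignored together.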

Fig: Output of Black and Flake8 linting checks.

Deviations and Reasons

● No deviations occurred; all planned actions for Task 6 were completed as expected.

Impact

The codebase now adheres to Python best practices, improving readability, maintainability, and
reliability. Automated linting and fixed tests reduce the risk of errors, which is critical for
finance applications.

Fig: Sample pytest output showing successful tests.

Task 7 Implementation Report: Transparency and Business Insights

1. AI Transparency Features
● Direct source access embedded in responses with metadata (date, product, relevance).

● Intelligent fallback system when context is insufficient.

● Comprehensive audit logging with conversation history and source references.
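One way to attach sources to a response and fall back when the retrieved context is too weak is sketched below. The relevance threshold, the fallback wording, the metadata keys, and the function name respond_with_sources are all assumptions; the report does not describe how these features were implemented.

```python
# Hypothetical fallback wording.
FALLBACK_MESSAGE = "I could not find enough relevant complaints to answer this question."


def respond_with_sources(answer: str, sources: list[dict], min_relevance: float = 0.5) -> str:
    """Append source metadata to the answer, or return a fallback message."""
    relevant = [s for s in sources if s["relevance"] >= min_relevance]
    if not relevant:
        # Context is insufficient, so do not present an unsupported answer.
        return FALLBACK_MESSAGE
    lines = [answer, "", "Sources:"]
    for i, source in enumerate(relevant, start=1):
        # Missing metadata fields are replaced with a visible placeholder.
        product = source.get("product", "unknown product")
        date = source.get("date", "unknown date")
        lines.append(f"[{i}] {product}, {date} (relevance {source['relevance']:.2f})")
    return "\n".join(lines)
```

The same list of relevant sources could also be written to an audit log entry alongside the question and answer.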

2. Analytics Dashboard
● Data Integration: Real-time access to complaint data with performance optimization.

● Visualizations:

○ Complaint trends (time series)

○ Product category distribution (bar charts)

○ Sentiment breakdown (donut charts)

● Advanced Filters: By date, product, and sentiment with instant updates.

● KPIs: Complaint volume, top categories, dominant sentiment, and average length.
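The four KPIs can be computed from a list of complaint records as in the sketch below. The record keys ("product", "sentiment", "narrative"), the choice of word count as the length measure, and the function name compute_kpis are assumptions made for illustration.

```python
from collections import Counter


def compute_kpis(complaints: list[dict], top_n: int = 3) -> dict:
    """Return complaint volume, top categories, dominant sentiment, and average length."""
    if not complaints:
        return {
            "volume": 0,
            "top_categories": [],
            "dominant_sentiment": None,
            "average_length": 0.0,
        }
    products = Counter(c["product"] for c in complaints)
    sentiments = Counter(c["sentiment"] for c in complaints)
    # Length is measured in whitespace-separated words (an assumption).
    total_words = sum(len(c["narrative"].split()) for c in complaints)
    return {
        "volume": len(complaints),
        "top_categories": [name for name, _ in products.most_common(top_n)],
        "dominant_sentiment": sentiments.most_common(1)[0][0],
        "average_length": total_words / len(complaints),
    }
```

Applying the date, product, and sentiment filters before calling the function yields KPIs for the filtered view.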

3. Key Challenges & Solutions

● Large dataset → optimized sampling for speed.
● Timestamp errors → robust date handling.

● Metadata gaps → fallback logic.

● Performance → caching and efficient pipelines.
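Robust date handling of the kind mentioned above often means trying several formats and returning a missing value instead of raising. The sketch below assumes three input formats and the name parse_date; the report does not say which formats appeared in the data.

```python
from datetime import datetime
from typing import Optional

# Assumed formats; extend the tuple to match the actual data.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y-%m-%dT%H:%M:%S")


def parse_date(value: object) -> Optional[datetime]:
    """Parse a timestamp string, returning None when it cannot be parsed."""
    if not isinstance(value, str) or not value.strip():
        return None
    text = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next known format
    return None
```

Rows whose date comes back as None can then be excluded from the time-series chart while still counting toward the other KPIs.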

4. Quality & Outcomes

● Fast performance: <3s load, <1s filter, <5s AI responses.

● Reliable: <0.1% error rate with full fallback coverage.

● Transparent: All AI answers include sources; clear, interactive visuals.

5. Business Impact
● Operational Transparency: Compliance-ready audit trails, explainable AI.

● Customer Experience: Faster issue detection and proactive service.

● Strategic Advantage: Better insights, optimized resources, early risk detection.

6. Future Enhancements
● Real-time streaming and predictive analytics.

● CRM integration and emotion-level sentiment.

● Automated compliance reporting and multilingual support.
