
RAG Analysis

The progress report outlines the development of a RAG-Powered Complaint Analysis System, detailing tasks such as data analysis, embedding, and the creation of an interactive interface. Key accomplishments include implementing session-based chat history, modularizing the project structure, and enhancing code quality through linting. The system aims to improve user experience and operational transparency while providing valuable business insights and analytics features.


Progress Report: RAG-Powered Complaint Analysis System
Author: Frehiwot Haile
Date: August 19, 2025

Existing Features
Task 1: Data Analysis & Preprocessing

● Loaded and explored the dataset, analyzing complaint volume by product and narrative length.
● Filtered the dataset to the five target products, keeping only complaints with non-empty narratives.
● Cleaned the text by lowercasing, removing special characters, and stripping boilerplate.
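
The cleaning steps above could look roughly like the sketch below. It is an illustration only: the function name clean_narrative and the boilerplate phrase list are assumptions, since the report does not show the cleaning code or say which phrases were stripped.

```python
import re

# Hypothetical boilerplate phrases; the report does not list the real ones.
BOILERPLATE_PHRASES = ("i am writing to file a complaint",)


def clean_narrative(text: str) -> str:
    """Lowercase, strip boilerplate phrases, and drop special characters."""
    text = text.lower()
    for phrase in BOILERPLATE_PHRASES:
        text = text.replace(phrase, " ")
    # Keep only letters, digits, and whitespace.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Collapse the runs of whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()
```

Removing boilerplate before special characters keeps the phrase matching simple, because the phrases can be written in their natural lowercase form.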

Task 2: Chunking, Embedding & Indexing

● Split long narratives into manageable text chunks.
● Generated embeddings using a SentenceTransformer model.
● Stored embeddings in a persistent ChromaDB vector store, ensuring consistency by clearing old collections.
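
As a rough illustration of the chunking step, the sketch below splits a narrative into overlapping word windows. The word-based strategy, the function name chunk_text, and the chunk_size and overlap defaults are all assumptions; the report does not state which splitter or sizes were used.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 100, overlap: int = 20) -> List[str]:
    """Split text into chunks of at most chunk_size words, with overlapping edges."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval.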

Task 3: RAG Core Logic

● Connected the ChromaDB retriever with LLaMA3-70B via Groq.
● Built a RAG chain using a system prompt, retriever, and LLM.
● Evaluated the system with test queries, confirming accurate, complaint-based answers with low latency.
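
The shape of such a chain can be sketched without any specific library. In the example below the retriever and the LLM are passed in as plain callables, so nothing here is the actual ChromaDB or Groq API; the prompt wording and the names build_prompt and answer_question are assumptions made for illustration.

```python
from typing import Callable, Dict, Sequence

# Hypothetical system prompt; the report does not quote the real one.
SYSTEM_PROMPT = (
    "You are a complaint analysis assistant. Answer using only the complaint "
    "excerpts provided. If they do not contain the answer, say so."
)


def build_prompt(question: str, chunks: Sequence[str]) -> str:
    """Combine the system prompt, numbered context chunks, and the question."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return f"{SYSTEM_PROMPT}\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"


def answer_question(
    question: str,
    retrieve: Callable[[str, int], Sequence[str]],  # stand-in for the vector store query
    generate: Callable[[str], str],                 # stand-in for the LLM call
    k: int = 5,
) -> Dict[str, object]:
    """Retrieve the top-k chunks, prompt the model, and return answer plus sources."""
    chunks = list(retrieve(question, k))
    return {"answer": generate(build_prompt(question, chunks)), "sources": chunks}
```

Returning the retrieved chunks alongside the answer is what allows an interface to display source documents next to each response.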

Task 4: Interactive Interface

● Developed a Streamlit app with:
    ○ An input box and “Submit” button for queries.
    ○ AI-generated responses with the source documents displayed.
    ○ A “Clear” button for resetting.
● Packaged the app as a standalone app.py for easy use by non-technical users.

New Changes
I restructured the project into a more modular folder structure.

Fig: Folder structure before.

Fig: Folder structure after (screenshot of the modular project structure).

Task 5: Chat History and Modular Project Structure


Expected Deliverables

● Implement session-based chat history in the Streamlit app to retain user queries and AI responses.
● Add a toggle sidebar for users to select and review previous chats.
● Restructure the codebase into modular subfolders (utils/, vectorization/, db/, rag/) alongside app.py.
● Update imports and references to ensure functionality remains intact.
● Fix the "clear" button so that it resets both questions and answers.

Accomplishments
● Implemented session-based chat history that persists throughout a session, so users can view prior conversations and keep continuity.
● Added a toggle sidebar in the Streamlit UI, allowing users to select and review previous chats.
● Restructured the project into modular subfolders as planned, separating utilities, vectorization, database management, RAG pipeline logic, and the app interface.
● Updated all imports and references to match the new folder structure and verified that no functionality was broken.
● Fixed the "clear" button so that it resets both questions and answers; previously, questions persisted after clearing.
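
The chat history and the corrected clear behavior can be sketched as plain functions over a dict-like state object. This is an assumption-laden illustration, not the app's code: the key names ("chats", "question", "answer") and the function names are hypothetical, and the choice to keep the history when clearing is also an assumption.

```python
from typing import Any, MutableMapping, Optional


def init_state(state: MutableMapping[str, Any]) -> None:
    """Create the keys this sketch relies on if they are missing."""
    if "chats" not in state:
        state["chats"] = []  # list of {"question": ..., "answer": ...} dicts
    if "question" not in state:
        state["question"] = ""
    if "answer" not in state:
        state["answer"] = None


def record_turn(state: MutableMapping[str, Any], question: str, answer: Optional[str]) -> None:
    """Store the latest exchange and append it to the session history."""
    init_state(state)
    state["chats"].append({"question": question, "answer": answer})
    state["question"] = question
    state["answer"] = answer


def clear_current(state: MutableMapping[str, Any]) -> None:
    """Reset BOTH the question and the answer; the history is kept (an assumption)."""
    init_state(state)
    state["question"] = ""
    state["answer"] = None
```

In a Streamlit app, st.session_state is dict-like and could be passed as state. A function such as clear_current would typically be attached to the button through its on_click callback, since state bound to a widget key generally cannot be changed after that widget has been created in the same run.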


Fig: Streamlit UI with chat history and toggle sidebar.

Deviations and Reasons

● No deviations occurred; all planned actions for Task 5 were completed as expected.

Impact

The modular structure enhances maintainability and scalability, while the chat history and fixed clear button improve the user experience, making the interface intuitive for finance stakeholders.

Task 6: Code Quality and Linting Setup


Expected Deliverables
● Install and configure Black, Flake8, and isort for code formatting, linting, and import sorting.
● Add configurations in pyproject.toml for consistent style enforcement.
● Fix failing pytest tests related to the vector store.
● Set up pre-commit hooks for automatic linting.
● Ensure compatibility across platforms, particularly Windows.
● Exclude irrelevant files (e.g., venv/ and .ipynb notebooks) from linting.

Accomplishments

● Installed Black, Flake8, and isort in the virtual environment.
● Configured pyproject.toml with:
    ○ Black: Line length set to 88.
    ○ Flake8: Ignored the rules that conflict with Black (E203, W503) and excluded venv/ and .ipynb files.
    ○ isort: Ensured consistent import ordering.
● Fixed failing pytest tests by including the vector store in the test suite and ensuring it is properly initialized.
● Set up pre-commit hooks to enforce linting on commits.
● Verified cross-platform compatibility, including Windows, and confirmed that "black ." and "flake8 ." run without errors.
● Excluded irrelevant files from linting, improving efficiency.

Fig: Output of Black and Flake8 linting checks.

Deviations and Reasons

● No deviations occurred; all planned actions for Task 6 were completed as expected.

Impact

The codebase now adheres to Python best practices, improving readability, maintainability, and reliability. Automated linting and fixed tests reduce the risk of errors, which is critical for finance applications.

Fig: Sample pytest output showing successful tests.

Task 7 Implementation Report: Transparency and Business Insights

1. AI Transparency Features
● Direct source access embedded in each response, with metadata (date, product, relevance).
● Fallback response when the retrieved context is insufficient to answer.
● Comprehensive audit logging with conversation history and source references.
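
One way these three features could fit together is sketched below. Everything named here is hypothetical: the relevance threshold, the fallback wording, the hit dictionary layout, and the function names are assumptions, since the report does not describe how the fallback or the audit log is implemented.

```python
import json
from datetime import datetime, timezone
from typing import Callable, Dict, List

# Hypothetical fallback wording and threshold.
FALLBACK_MESSAGE = "I could not find enough relevant complaints to answer this question."
MIN_RELEVANCE = 0.3


def format_source(hit: Dict) -> str:
    """Render one source with its metadata (product, date, relevance)."""
    return (
        f"{hit.get('product', 'unknown product')} | "
        f"{hit.get('date', 'unknown date')} | "
        f"relevance {hit.get('relevance', 0.0):.2f}"
    )


def respond(question: str, hits: List[Dict], generate: Callable[[str, List[Dict]], str]) -> Dict:
    """Answer from sufficiently relevant hits, or fall back when none qualify."""
    sources = [h for h in hits if h.get("relevance", 0.0) >= MIN_RELEVANCE]
    answer = generate(question, sources) if sources else FALLBACK_MESSAGE
    return {"question": question, "answer": answer, "sources": sources, "fallback": not sources}


def audit_line(result: Dict) -> str:
    """Serialize one exchange, with a UTC timestamp, as a JSON line for an audit log."""
    record = {"timestamp": datetime.now(timezone.utc).isoformat(), **result}
    return json.dumps(record)
```

Writing one JSON object per line keeps the audit log append-only and easy to filter later by question, source, or fallback status.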

2. Analytics Dashboard
● Data Integration: Real-time access to complaint data with performance optimization.
● Visualizations:
    ○ Complaint trends (time series)
    ○ Product category distribution (bar charts)
    ○ Sentiment breakdown (donut charts)
● Advanced Filters: By date, product, and sentiment with instant updates.
● KPIs: Complaint volume, top categories, dominant sentiment, and average length.
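
The filters and KPIs listed above can be illustrated with a small, dependency-free sketch. The row layout (date, product, sentiment, narrative), the function names, and the choice to measure average length in words are assumptions; the report does not specify them.

```python
from collections import Counter
from datetime import date
from typing import Dict, List, Optional


def filter_complaints(
    rows: List[Dict],
    start: Optional[date] = None,
    end: Optional[date] = None,
    product: Optional[str] = None,
    sentiment: Optional[str] = None,
) -> List[Dict]:
    """Keep rows matching every filter that was supplied."""
    kept = []
    for row in rows:
        if start is not None and row["date"] < start:
            continue
        if end is not None and row["date"] > end:
            continue
        if product is not None and row["product"] != product:
            continue
        if sentiment is not None and row["sentiment"] != sentiment:
            continue
        kept.append(row)
    return kept


def compute_kpis(rows: List[Dict], top_n: int = 3) -> Dict:
    """Compute volume, top product categories, dominant sentiment, and average length."""
    if not rows:
        return {"volume": 0, "top_categories": [], "dominant_sentiment": None, "avg_length": 0.0}
    products = Counter(row["product"] for row in rows)
    sentiments = Counter(row["sentiment"] for row in rows)
    # Average narrative length in words (an assumed unit).
    avg_length = sum(len(row["narrative"].split()) for row in rows) / len(rows)
    return {
        "volume": len(rows),
        "top_categories": products.most_common(top_n),
        "dominant_sentiment": sentiments.most_common(1)[0][0],
        "avg_length": avg_length,
    }
```

Computing the KPIs from the already filtered rows means every figure on the dashboard reflects the same selection.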

3. Key Challenges & Solutions
● Large dataset → optimized sampling for speed.
● Timestamp errors → robust date handling.
● Metadata gaps → fallback logic.
● Performance → caching and efficient pipelines.
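
The report does not say how the timestamp errors were handled. A minimal sketch of one tolerant approach is shown below: try a short list of formats and return None instead of raising. The format list and the function name parse_date are assumptions.

```python
from datetime import datetime
from typing import Optional

# Assumed input formats; the real dataset may use others.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y-%m-%dT%H:%M:%S")


def parse_date(value: object) -> Optional[datetime]:
    """Parse a date string leniently, returning None when nothing matches."""
    if value is None:
        return None
    text = str(value).strip()
    if not text:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next known format
    return None
```

Returning None rather than raising lets downstream code apply its own fallback, for example by excluding undated complaints from time-series charts.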

4. Quality & Outcomes
● Fast: under 3 s page load, under 1 s filter updates, and under 5 s AI responses.
● Reliable: <0.1% error rate with full fallback coverage.
● Transparent: All AI answers include sources; visuals are clear and interactive.

5. Business Impact
● Operational Transparency: Compliance-ready audit trails, explainable AI.
● Customer Experience: Faster issue detection and proactive service.
● Strategic Advantage: Better insights, optimized resources, early risk detection.

6. Future Enhancements
● Real-time streaming and predictive analytics.
● CRM integration and emotion-level sentiment.
● Automated compliance reporting and multilingual support.
