Progress Report: RAG-Powered Complaint Analysis System
Author: Frehiwot Haile
Date: August 19, 2025
Existing Features
Task 1: Data Analysis & Preprocessing
● Loaded and explored the dataset, analyzing complaint volume by product and narrative
length.
● Filtered to five target products with non-empty narratives.
● Cleaned text by lowercasing, removing special characters, and stripping boilerplate.
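The cleaning step can be illustrated with a minimal sketch. The boilerplate phrases below are hypothetical placeholders, since this report does not list the phrases that were actually stripped, and the function name clean_narrative is likewise illustrative.

```python
import re

# Hypothetical boilerplate phrases; the real list used in the project is not given here.
BOILERPLATE_PATTERNS = [
    r"i am writing to file a complaint",
    r"to whom it may concern",
]


def clean_narrative(text: str) -> str:
    """Lowercase, strip boilerplate phrases, drop special characters, collapse whitespace."""
    text = text.lower()
    for pattern in BOILERPLATE_PATTERNS:
        text = re.sub(pattern, " ", text)
    # Keep only letters, digits, and whitespace.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Collapse the runs of spaces left behind by the substitutions above.
    return re.sub(r"\s+", " ", text).strip()


print(clean_narrative("To whom it may concern: I was CHARGED $35 twice!!"))
```

Boilerplate is removed before special characters so that the phrase patterns can still match text containing punctuation.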
Task 2: Chunking, Embedding & Indexing
● Split long narratives into manageable text chunks.
● Generated embeddings using a SentenceTransformer model.
● Stored embeddings in a persistent ChromaDB vector store, ensuring consistency by
clearing old collections.
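The chunking step might look like the sketch below. The chunk size and overlap are assumed values (the report does not state the ones used), and the splitter works on characters for simplicity; the project may split on tokens or sentences instead. Embedding and storage in ChromaDB are not shown.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at a
    boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```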
Task 3: RAG Core Logic
● Connected the ChromaDB retriever with LLaMA3-70B via Groq.
● Built a RAG chain using a system prompt, retriever, and LLM.
● Evaluated the system with test queries, confirming accurate, complaint-based answers
with low latency.
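The flow of the RAG chain (retrieve, build a prompt, generate) can be sketched without any external services. In this sketch, retrieve and generate are hypothetical callables standing in for the ChromaDB retriever and the Groq-hosted LLaMA3-70B model, and the system prompt wording is illustrative rather than the prompt used in the project.

```python
from typing import Callable, List, Tuple

# Illustrative wording only; not the project's actual system prompt.
SYSTEM_PROMPT = (
    "You are an assistant for analyzing customer complaints. "
    "Answer using only the complaint excerpts provided. "
    "If they do not contain the answer, say so."
)


def build_prompt(question: str, chunks: List[str]) -> str:
    """Combine the system prompt, numbered context chunks, and the question."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return f"{SYSTEM_PROMPT}\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"


def answer(
    question: str,
    retrieve: Callable[[str, int], List[str]],  # stand-in for the vector store retriever
    generate: Callable[[str], str],             # stand-in for the LLM call
    k: int = 5,
) -> Tuple[str, List[str]]:
    """Return the generated answer together with the chunks it was grounded on."""
    chunks = retrieve(question, k)
    return generate(build_prompt(question, chunks)), chunks
```

Returning the retrieved chunks alongside the answer is what allows the interface to display source documents next to each response.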
Task 4: Interactive Interface
● Developed a Streamlit app with:
○ Input box and “Submit” button for queries.
○ AI-generated responses with source document display.
○ “Clear” button for resetting.
● Packaged in a standalone app.py for easy use by non-technical users.
New Changes
I restructured the project folders into a more modular layout.
Fig: Folder structure before the change.
Fig: Folder structure after the change (screenshot of the modular project structure).
Task 5: Chat History and Modular Project Structure
Expected Deliverables
● Implement session-based chat history in the Streamlit app to retain user queries and AI
responses.
● Add a toggle sidebar for users to select and review previous chats.
● Restructure the codebase into modular subfolders: utils/, vectorization/, db/,
rag/, and app.py.
● Update imports and references to ensure functionality remains intact.
● Fix the "clear" button to reset both questions and answers.
Accomplishments
● Implemented session-based chat history that persists for the length of a session, enabling
users to view prior conversations for improved continuity.
● Added a toggle sidebar in the Streamlit UI, allowing users to select and review previous
chats.
● Restructured the project into modular subfolders as planned, separating utilities,
vectorization, database management, RAG pipeline logic, and the app interface.
● Updated all imports and references to align with the new folder structure, verifying that
no functionality was broken.
● Fixed the "clear" button so that it resets both questions and answers; previously,
questions persisted after clearing.
Fig: Streamlit UI with chat history and toggle sidebar.
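The chat history and the "clear" fix can be sketched as plain functions over a dict-like state object. In the app this object would be Streamlit's st.session_state; the key names (chats, messages, question) and the choice to archive a conversation when it is cleared are assumptions made for illustration.

```python
from typing import Any, MutableMapping


def init_state(state: MutableMapping[str, Any]) -> None:
    """Create the history keys once per session; existing values are left untouched."""
    state.setdefault("chats", [])     # finished conversations, listed in the sidebar
    state.setdefault("messages", [])  # turns of the conversation currently on screen
    state.setdefault("question", "")  # current contents of the input box


def record_turn(state: MutableMapping[str, Any], question: str, answer: str) -> None:
    """Append one question/answer pair to the current conversation."""
    state["messages"].append({"question": question, "answer": answer})


def clear_chat(state: MutableMapping[str, Any]) -> None:
    """Archive the current conversation, then reset both the question and the answers."""
    if state["messages"]:
        state["chats"].append(state["messages"])
    state["messages"] = []
    state["question"] = ""
```

In Streamlit, a function like clear_chat would typically run as the button's on_click callback, because the value of a keyed input widget can be changed from a callback but not after the widget has been created in the same run.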
Deviations and Reasons
● No deviations occurred; all planned actions for Task 5 were completed as expected.
Impact
The modular structure enhances maintainability and scalability, while the chat history and fixed
clear button improve the user experience, making the interface intuitive for finance stakeholders.
Task 6: Code Quality and Linting Setup
Expected Deliverables
● Install and configure Black, Flake8, and isort for code formatting, linting, and import
sorting.
● Add configurations in pyproject.toml for consistent style enforcement.
● Fix failing pytest tests related to the vector store.
● Set up pre-commit hooks for automatic linting.
● Ensure compatibility across platforms, particularly for Windows.
● Exclude irrelevant files (e.g., venv/, .ipynb) from linting.
Accomplishments
● Installed Black, Flake8, and isort in the virtual environment.
● Configured pyproject.toml with:
○ Black: Line length set to 88.
○ Flake8: Ignored the E203 and W503 checks, which conflict with Black's formatting, and
excluded venv/ and .ipynb files.
○ isort: Ensured consistent import ordering.
● Fixed failing pytest tests by including the vector store in the test suite, so that it is
properly initialized when the tests run.
● Set up pre-commit hooks to enforce linting on commits.
● Verified cross-platform compatibility, including Windows, and ensured black . and
flake8 . run without errors.
● Excluded irrelevant files from linting, improving efficiency.
Fig: Output of Black and Flake8 linting checks.
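A pyproject.toml along the following lines would match the settings listed above. It is a sketch, not a copy of the project's file. One caveat: Flake8 does not read pyproject.toml on its own, so the [tool.flake8] table below assumes a plugin such as Flake8-pyproject is installed. The isort profile and the exact exclude patterns are also assumptions.

```toml
[tool.black]
line-length = 88
# Regex matched against file paths; skips the virtual environment and notebooks.
extend-exclude = 'venv|\.ipynb$'

[tool.isort]
# Assumed setting: the "black" profile keeps isort's output compatible with Black.
profile = "black"
skip = ["venv"]

[tool.flake8]
# Read only when a plugin such as Flake8-pyproject is installed (assumption).
max-line-length = 88
extend-ignore = ["E203", "W503"]
# Flake8 checks only *.py files by default, so notebooks need no explicit entry.
extend-exclude = ["venv"]
```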
Deviations and Reasons
● No deviations occurred; all planned actions for Task 6 were completed as expected.
Impact
The codebase now adheres to Python best practices, improving readability, maintainability, and
reliability. Automated linting and fixed tests reduce the risk of errors, which is critical for
finance applications.
Fig: Sample pytest output showing successful tests.
Task 7: Transparency and Business Insights
1. AI Transparency Features
● Direct source access embedded in responses with metadata (date, product, relevance).
● Intelligent fallback system when context is insufficient.
● Comprehensive audit logging with conversation history and source references.
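A minimal sketch of how source metadata and the fallback could be combined when formatting a response. The relevance cutoff, the fallback wording, and the metadata keys (date, product, relevance) are assumptions for illustration; the report does not specify them.

```python
from typing import Dict, List

# Hypothetical wording and threshold; the project's actual values are not stated.
FALLBACK_MESSAGE = "I could not find enough relevant complaints to answer this question."
MIN_RELEVANCE = 0.5


def format_response(
    answer: str, sources: List[Dict], min_relevance: float = MIN_RELEVANCE
) -> str:
    """Append source metadata to the answer, or fall back if no source is relevant enough."""
    relevant = [s for s in sources if s["relevance"] >= min_relevance]
    if not relevant:
        return FALLBACK_MESSAGE
    lines = [answer, "", "Sources:"]
    ranked = sorted(relevant, key=lambda s: s["relevance"], reverse=True)
    for i, s in enumerate(ranked, start=1):
        # Missing metadata fields are shown as "unknown" instead of raising an error.
        date = s.get("date", "unknown date")
        product = s.get("product", "unknown product")
        lines.append(f"[{i}] {date} | {product} | relevance {s['relevance']:.2f}")
    return "\n".join(lines)
```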
2. Analytics Dashboard
● Data Integration: Real-time access to complaint data with performance optimization.
● Visualizations:
○ Complaint trends (time series)
○ Product category distribution (bar charts)
○ Sentiment breakdown (donut charts)
● Advanced Filters: By date, product, and sentiment with instant updates.
● KPIs: Complaint volume, top categories, dominant sentiment, and average length.
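The four KPIs can be computed from the filtered records in a few lines. This sketch assumes each record is a dict with product, sentiment, and narrative keys and measures length in words; the dashboard's actual schema and length unit are not stated in this report.

```python
from collections import Counter
from statistics import mean
from typing import Any, Dict, List


def compute_kpis(complaints: List[Dict[str, str]], top_n: int = 3) -> Dict[str, Any]:
    """Compute volume, top product categories, dominant sentiment, and average length."""
    if not complaints:
        # An empty filter result still yields a well-formed KPI set.
        return {
            "volume": 0,
            "top_categories": [],
            "dominant_sentiment": None,
            "average_length": 0.0,
        }
    products = Counter(c["product"] for c in complaints)
    sentiments = Counter(c["sentiment"] for c in complaints)
    return {
        "volume": len(complaints),
        "top_categories": products.most_common(top_n),
        "dominant_sentiment": sentiments.most_common(1)[0][0],
        # Length is measured in whitespace-separated words (an assumption).
        "average_length": mean(len(c["narrative"].split()) for c in complaints),
    }
```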
3. Key Challenges & Solutions
● Large dataset → optimized sampling for speed.
● Timestamp errors → robust date handling.
● Metadata gaps → fallback logic.
● Performance → caching and efficient pipelines.
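As one example of the fixes above, robust date handling can mean trying several formats and returning None instead of raising on bad input. The list of formats is an assumption; the formats actually present in the complaint data are not stated in this report.

```python
from datetime import datetime
from typing import Optional

# Assumed formats; extend this tuple to match whatever appears in the data.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y-%m-%dT%H:%M:%S")


def parse_date(raw: object) -> Optional[datetime]:
    """Return a datetime for the first matching format, or None if nothing matches."""
    if raw is None:
        return None
    text = str(raw).strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next format
    return None
```

Rows whose date comes back as None can then be dropped from time-series charts while still counting toward the other KPIs.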
4. Quality & Outcomes
● Fast: under 3 s page load, under 1 s filter updates, and under 5 s AI responses.
● Reliable: <0.1% error rate with full fallback coverage.
● Transparent: All AI answers include sources; clear, interactive visuals.
5. Business Impact
● Operational Transparency: Compliance-ready audit trails, explainable AI.
● Customer Experience: Faster issue detection and proactive service.
● Strategic Advantage: Better insights, optimized resources, early risk detection.
6. Future Enhancements
● Real-time streaming and predictive analytics.
● CRM integration and emotion-level sentiment.
● Automated compliance reporting and multilingual support.