
💻 About Me

I'm interested in learning about new technology, particularly data and AI, and I'm excited about how the two are converging. I also like watching movies, especially mystery and horror thrillers.

🔍 Building end-to-end data pipelines, lakehouse architectures, and AI-powered solutions
🌱 Always learning - currently exploring Airflow, dbt, Spark, and cloud data platforms
📚 Medium and Substack enthusiast for data engineering wisdom
🎯 Mission: Making data work smarter, not harder
📦 Organized person who believes in structure and clarity

🛠️ Tech Stack

Data Engineering Core

█████████░ Python       █████████░ SQL
████████░░ PySpark      ███████░░░ Airflow
███████░░░ dbt          ██████░░░░ Bash/Shell

Cloud Platforms

AWS: S3 • Glue • Redshift • Talend

Azure: ADLS • ADF • Databricks • DevOps

GCP: Dataproc • Dataflow • BigQuery

Visualization & Analytics

Development Tools

Project Management & Collaboration

ServiceNow • Jira • Monday.com

Foundations

Computer Science: Data Structures & Algorithms • Database Management Systems • Operating Systems • Software Engineering • Web Technologies

Data Science: Big Data Analytics • Applied Machine Learning • Data Modeling • Knowledge Management • Principles of Data Science

Languages: Python • SQL • Java • C/C++ • Kotlin • HTML/CSS

💡 Featured Projects

🤖 Retailia: AI-Powered Customer Support Agent

Conversational AI meets retail intelligence

Tech Stack: Python • Llama 3.1 • LangChain • LangGraph • FAISS • MySQL • Streamlit
The Challenge: High query volumes + slow responses = frustrated customers
The Solution: Agentic RAG architecture with dual intelligence

What makes it special:

Smart Routing: Agentic RAG Handler directs queries to the right brain
- SQL Agent: Real-time data from MySQL (orders, products, inventory)
- FAQ Agent: Instant answers from FAISS vector knowledge base
LLM Orchestration: LangChain + LangGraph for seamless agent coordination
Prompt Engineering: Zero-shot and few-shot techniques for contextually aware responses
User Experience: Streamlit interface with secure authentication

Impact: Automated customer service that's actually helpful, respectful, and knows when to pull from structured vs. unstructured data.
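
Here's a minimal sketch of the routing idea, not the project's actual code. It uses plain keyword matching as a stand-in for the LLM-driven Agentic RAG Handler, and the function names (`route`, `sql_agent`, `faq_agent`) and the keyword list are hypothetical. In the real build, LangChain and LangGraph would make the routing decision and the agents would call MySQL and FAISS.

```python
from typing import Callable, Dict

# Hypothetical keyword list: a stand-in for the LLM-based routing decision.
STRUCTURED_KEYWORDS = ("order", "product", "inventory", "stock", "price")


def sql_agent(query: str) -> str:
    """Placeholder for the agent that would query MySQL."""
    return f"[sql_agent] would look up structured data for: {query}"


def faq_agent(query: str) -> str:
    """Placeholder for the agent that would search the FAISS knowledge base."""
    return f"[faq_agent] would search the knowledge base for: {query}"


AGENTS: Dict[str, Callable[[str], str]] = {"sql": sql_agent, "faq": faq_agent}


def route(query: str) -> str:
    """Return the name of the agent that should handle the query."""
    text = query.lower()
    if any(keyword in text for keyword in STRUCTURED_KEYWORDS):
        return "sql"
    return "faq"


def answer(query: str) -> str:
    """Route the query, then delegate to the chosen agent."""
    return AGENTS[route(query)](query)
```

Calling `answer("Where is my order?")` goes to the SQL placeholder, while `answer("What is your return policy?")` falls through to the FAQ placeholder. Keeping the routing decision in one small function makes it easy to swap the keyword check for a model call later.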

🌍 Cloud-Based Crime Data Analytics Pipeline

Big data meets crime pattern detection

Tech Stack: GCP • Dataproc • BigQuery • Hadoop • Hive • Spark • OpenRefine • Tableau
Mission: Turn raw crime data into actionable insights for safer communities

Architecture breakdown:

Data Lake: Google Cloud Storage for scalable raw data management
Processing Power: Cloud Dataproc Hadoop cluster for distributed computing
Dual Query Engine: Compared Hive SQL vs. Spark SQL performance
Data Quality: OpenRefine for cleaning and standardization
Analytics: BigQuery external tables for pattern detection
Insights: Tableau dashboards revealing theft hotspots and temporal trends

Key Learning: Full data lifecycle on GCP - from messy police records to actionable intelligence.
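
To give a feel for the kind of aggregation behind the hotspot and temporal-trend dashboards, here's a small sketch in plain Python. The records, field names (`type`, `district`, `occurred_at`), and function names are made up for illustration; the actual pipeline ran these queries in Hive, Spark SQL, and BigQuery rather than in-memory Python.

```python
from collections import Counter
from datetime import datetime

# Made-up sample records for illustration only; not from the project's dataset.
records = [
    {"type": "THEFT", "district": "A", "occurred_at": "2024-03-01 22:15"},
    {"type": "THEFT", "district": "A", "occurred_at": "2024-03-02 23:40"},
    {"type": "ASSAULT", "district": "B", "occurred_at": "2024-03-02 14:05"},
    {"type": "THEFT", "district": "B", "occurred_at": "2024-03-03 22:50"},
]


def theft_hotspots(rows):
    """Count thefts per district, most affected first."""
    counts = Counter(r["district"] for r in rows if r["type"] == "THEFT")
    return counts.most_common()


def thefts_by_hour(rows):
    """Count thefts per hour of day to expose temporal trends."""
    hours = (
        datetime.strptime(r["occurred_at"], "%Y-%m-%d %H:%M").hour
        for r in rows
        if r["type"] == "THEFT"
    )
    return Counter(hours)
```

The same two groupings (by location and by hour) map directly onto `GROUP BY` queries in any of the SQL engines listed above.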

📦 Retail Data Pipeline with Modern Data Stack

Orchestrating 1M+ transactions with Airflow & dbt

Tech Stack: Airflow • dbt • PostgreSQL • Docker • CI/CD
Scale: 1M+ transactions • 90% faster with incremental loading

The build:

Orchestration: Airflow DAGs managing the workflow
Transformation: dbt models (staging → core) with incremental materialization
Quality: 9 automated tests ensuring data integrity
DevOps: GitHub Actions CI/CD for every commit
Infrastructure: Fully Dockerized for consistency across environments
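
The incremental loading idea can be sketched with a high-water mark: only rows newer than the latest timestamp already in the target get copied. This is an illustration under assumptions, not the project's dbt model. It uses SQLite from the Python standard library in place of PostgreSQL, and the table and column names (`stg_orders`, `core_orders`, `loaded_at`) are hypothetical.

```python
import sqlite3


def incremental_load(conn: sqlite3.Connection) -> int:
    """Copy only rows newer than the target's high-water mark; return rows added."""
    # Highest timestamp already loaded (None when the target is empty).
    (watermark,) = conn.execute("SELECT MAX(loaded_at) FROM core_orders").fetchone()
    cursor = conn.execute(
        """
        INSERT INTO core_orders (order_id, amount, loaded_at)
        SELECT order_id, amount, loaded_at
        FROM stg_orders
        WHERE ? IS NULL OR loaded_at > ?
        """,
        (watermark, watermark),
    )
    conn.commit()
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stg_orders (order_id INTEGER PRIMARY KEY, amount REAL, loaded_at TEXT);
    CREATE TABLE core_orders (order_id INTEGER PRIMARY KEY, amount REAL, loaded_at TEXT);
    INSERT INTO stg_orders VALUES (1, 10.0, '2024-01-01'), (2, 20.0, '2024-01-02');
    """
)
first_run = incremental_load(conn)   # target is empty, so both rows are loaded
conn.execute("INSERT INTO stg_orders VALUES (3, 30.0, '2024-01-03')")
second_run = incremental_load(conn)  # only the row past the watermark is loaded
```

One trade-off worth knowing: a strict `>` comparison skips late-arriving rows that share the watermark timestamp, so real pipelines often add a lookback window or merge on a unique key.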

🎓 Certifications & Training

🎯 Databricks Certified Data Engineer Associate

Production-grade data engineering on Databricks platform

📚 DataExpert.io Bootcamp Graduate

Dimensional data modeling & fact table design
Apache Flink, Kafka, Snowflake, Spark deep-dives
Data quality patterns & pipeline maintenance
KPIs, experimentation, and analytical patterns

🤖 Google Gen AI Intensive (Kaggle)

5-day bootcamp diving into LLM fundamentals
Studied 5 key research papers on generative AI

🐍 GitHub Activity Snake

GitHub contribution grid snake animation

💻 Current Focus

-- The data never lies, but sometimes it tells jokes

SELECT 
    'Suprathika' AS name,
    'Data Engineer' AS role,
    ARRAY[
        'Airflow orchestration patterns',
        'dbt incremental strategies',
        'Spark optimization wizardry',
        'Cloud-native architectures'
    ] AS currently_learning,
    ARRAY[
        'Production-grade systems',
        'ELT pipeline design',
        'Data quality frameworks',
        'Modern data stack'
    ] AS interests,
    'There''s always something to learn 🌟' AS motto
FROM data_engineer_life
WHERE coffee_level > 0
    AND curiosity = 'infinite';

📝 What I'm Reading & Writing

Always hunting for the next "aha!" moment:

📝 Reading: Medium & Substack for best practices, architecture patterns, and technical insights
✍️ Want to Write Someday: Sharing my own journey and learnings
📖 Tech Docs: Because sometimes the manual is the real MVP

There's always something to learn! ✨
