Suprathika-vangari/Suprathika-vangari



💻 About Me

I'm interested in learning about new technology, especially data and AI, and I'm excited about how the two are converging. I also like watching movies, especially mystery and horror thrillers.

  • Building end-to-end data pipelines, lakehouse architectures, and AI-powered solutions
  • 🌱 Always learning - currently exploring Airflow, dbt, Spark, and cloud data platforms
  • 📚 Medium and Substack enthusiast for data engineering wisdom
  • 🎯 Mission: Making data work smarter, not harder
  • 📦 Organized person who believes in structure and clarity

🛠️ Tech Stack

Data Engineering Core

█████████░ Python       █████████░ SQL
████████░░ PySpark      ███████░░░ Airflow
███████░░░ dbt          ██████░░░░ Bash/Shell

Cloud Platforms


S3 • Glue • Redshift • Talend

ADLS • ADF • Databricks • DevOps

Dataproc • Dataflow • BigQuery

Visualization & Analytics

Power BI Tableau Looker QlikView

Development Tools

Spark Kafka Flink Git Docker

Project Management & Collaboration

ServiceNow Jira Monday.com

Foundations

Computer Science: Data Structures & Algorithms • Database Management Systems • Operating Systems • Software Engineering • Web Technologies

Data Science: Big Data Analytics • Applied Machine Learning • Data Modeling • Knowledge Management • Principles of Data Science

Languages: Python • SQL • Java • C/C++ • Kotlin • HTML/CSS


💡 Featured Projects

Conversational AI meets retail intelligence

Tech Stack: Python • Llama 3.1 • LangChain • LangGraph • FAISS • MySQL • Streamlit
The Challenge: High query volumes + slow responses = frustrated customers
The Solution: Agentic RAG architecture with dual intelligence

What makes it special:

  • Smart Routing: Agentic RAG Handler directs queries to the right brain
    • SQL Agent: Real-time data from MySQL (orders, products, inventory)
    • FAQ Agent: Instant answers from FAISS vector knowledge base
  • LLM Orchestration: LangChain + LangGraph for seamless agent coordination
  • Prompt Engineering: Zero-shot and few-shot techniques for contextually aware responses
  • User Experience: Streamlit interface with secure authentication

Impact: Automated customer service that's actually helpful, respectful, and knows when to pull from structured vs. unstructured data.
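
The routing idea above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the keyword list `STRUCTURED_HINTS`, the `Route` type, and both agent functions are hypothetical stand-ins, and a simple keyword check replaces the LLM-driven routing decision that LangChain and LangGraph would handle.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical keyword list: a stand-in for the LLM-based routing decision.
STRUCTURED_HINTS = ("order", "inventory", "stock", "price", "product")


@dataclass
class Route:
    name: str
    handler: Callable[[str], str]


def sql_agent(query: str) -> str:
    # Placeholder: a real agent would generate and run SQL against MySQL.
    return f"[sql] would look up live data for: {query}"


def faq_agent(query: str) -> str:
    # Placeholder: a real agent would search the FAISS index for similar FAQs.
    return f"[faq] would retrieve the closest FAQ entry for: {query}"


def route(query: str) -> Route:
    """Pick an agent: structured-data questions go to SQL, the rest to FAQ."""
    text = query.lower()
    if any(hint in text for hint in STRUCTURED_HINTS):
        return Route("sql", sql_agent)
    return Route("faq", faq_agent)


def answer(query: str) -> str:
    """Route the query and return the chosen agent's response."""
    chosen = route(query)
    return chosen.handler(query)
```

Calling `answer("Where is my order?")` goes through the SQL placeholder, while a policy question falls through to the FAQ placeholder. Keeping the routing decision in one function makes it easy to swap the keyword check for a model call later.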


Big data meets crime pattern detection

Tech Stack: GCP • Dataproc • BigQuery • Hadoop • Hive • Spark • OpenRefine • Tableau
Mission: Turn raw crime data into actionable insights for safer communities

Architecture breakdown:

  • Data Lake: Google Cloud Storage for scalable raw data management
  • Processing Power: Cloud Dataproc Hadoop cluster for distributed computing
  • Dual Query Engine: Compared Hive SQL vs. Spark SQL performance
  • Data Quality: OpenRefine for cleaning and standardization
  • Analytics: BigQuery external tables for pattern detection
  • Insights: Tableau dashboards revealing theft hotspots and temporal trends

Key Learning: Full data lifecycle on GCP - from messy police records to actionable intelligence.
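
A comparison like the Hive SQL vs. Spark SQL one above needs a repeatable way to time the same query on each engine. The sketch below shows one possible harness, assuming each engine is wrapped in a zero-argument callable; `time_query` and `compare` are hypothetical helper names, and nothing here calls Hive or Spark directly.

```python
import statistics
import time
from typing import Callable, Dict


def time_query(run: Callable[[], object], repeats: int = 3) -> float:
    """Return the median wall-clock seconds over `repeats` runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(
    engines: Dict[str, Callable[[], object]], repeats: int = 3
) -> Dict[str, float]:
    """Time each engine's callable and return {engine name: median seconds}."""
    return {name: time_query(run, repeats) for name, run in engines.items()}
```

Using the median rather than the mean keeps one slow run, such as a cold cluster start, from dominating the result.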


Orchestrating 1M+ transactions with Airflow & dbt

Tech Stack: Airflow • dbt • PostgreSQL • Docker • CI/CD
Scale: 1M+ transactions • 90% faster with incremental loading

The build:

  • Orchestration: Airflow DAGs managing the workflow
  • Transformation: dbt models (staging → core) with incremental materialization
  • Quality: 9 automated tests ensuring data integrity
  • DevOps: GitHub Actions CI/CD for every commit
  • Infrastructure: Fully Dockerized for consistency across environments
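
Incremental loading, as mentioned above, usually means copying only the rows that arrived since the last run instead of rebuilding the whole table. Here is a minimal sketch of that high-water-mark pattern, assuming hypothetical `stg_transactions` and `core_transactions` tables with an `updated_at` column; it uses an in-memory SQLite database as a stand-in for PostgreSQL and is not the project's dbt model.

```python
import sqlite3


def incremental_load(conn: sqlite3.Connection) -> int:
    """Copy only source rows newer than the target's high-water mark."""
    # Highest timestamp already loaded; empty string sorts before any ISO date.
    (watermark,) = conn.execute(
        "SELECT COALESCE(MAX(updated_at), '') FROM core_transactions"
    ).fetchone()
    cursor = conn.execute(
        """
        INSERT INTO core_transactions (id, amount, updated_at)
        SELECT id, amount, updated_at
        FROM stg_transactions
        WHERE updated_at > ?
        """,
        (watermark,),
    )
    conn.commit()
    return cursor.rowcount


def demo() -> tuple[int, int]:
    """Run two loads and return how many rows each one copied."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE stg_transactions (id INTEGER, amount REAL, updated_at TEXT);
        CREATE TABLE core_transactions (id INTEGER, amount REAL, updated_at TEXT);
        INSERT INTO stg_transactions VALUES
            (1, 10.0, '2024-01-01'), (2, 20.0, '2024-01-02');
        """
    )
    first = incremental_load(conn)   # target is empty, so both rows load
    conn.execute("INSERT INTO stg_transactions VALUES (3, 5.0, '2024-01-03')")
    second = incremental_load(conn)  # only the row past the watermark loads
    return first, second
```

One caveat of this simple version: a late-arriving row whose timestamp equals the current watermark is skipped, so real pipelines often add a lookback window or merge on a unique key.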

πŸŽ“ Certifications & Training

🎯 Databricks Certified Data Engineer Associate

  • Production-grade data engineering on Databricks platform

📚 DataExpert.io Bootcamp Graduate

  • Dimensional data modeling & fact table design
  • Apache Flink, Kafka, Snowflake, Spark deep-dives
  • Data quality patterns & pipeline maintenance
  • KPIs, experimentation, and analytical patterns

🤖 Google Gen AI Intensive (Kaggle)

  • 5-day bootcamp diving into LLM fundamentals
  • Studied 5 key research papers on generative AI

💻 Current Focus

-- The data never lies, but sometimes it tells jokes

SELECT 
    'Suprathika' AS name,
    'Data Engineer' AS role,
    ARRAY[
        'Airflow orchestration patterns',
        'dbt incremental strategies',
        'Spark optimization wizardry',
        'Cloud-native architectures'
    ] AS currently_learning,
    ARRAY[
        'Production-grade systems',
        'ELT pipeline design',
        'Data quality frameworks',
        'Modern data stack'
    ] AS interests,
    'There''s always something to learn 🌟' AS motto
FROM data_engineer_life
WHERE coffee_level > 0
    AND curiosity = 'infinite';

📝 What I'm Reading & Writing

Always hunting for the next "aha!" moment:

  • 📝 Reading: Medium & Substack for best practices, architecture patterns, and technical insights
  • ✍️ Want to Write Someday: Sharing my own journey and learnings
  • 📖 Tech Docs: Because sometimes the manual is the real MVP

There's always something to learn! ✨

