I'm interested in learning everything new in tech, especially data and AI, and I'm excited about how the two are converging. I also like watching movies, especially mystery and horror thrillers.
- Building end-to-end data pipelines, lakehouse architectures, and AI-powered solutions
- Always learning - currently exploring Airflow, dbt, Spark, and cloud data platforms
- Medium and Substack enthusiast for data engineering wisdom
- Mission: Making data work smarter, not harder
- Organized person who believes in structure and clarity
Python • SQL • PySpark • Airflow • dbt • Bash/Shell
S3 • Glue • Redshift • Talend
ADLS • ADF • Databricks • DevOps
Dataproc • Dataflow • BigQuery
Computer Science: Data Structures & Algorithms • Database Management Systems • Operating Systems • Software Engineering • Web Technologies
Data Science: Big Data Analytics • Applied Machine Learning • Data Modeling • Knowledge Management • Principles of Data Science
Languages: Python • SQL • Java • C/C++ • Kotlin • HTML/CSS
Conversational AI meets retail intelligence
Tech Stack: Python • Llama 3.1 • LangChain • LangGraph • FAISS • MySQL • Streamlit
The Challenge: High query volumes + slow responses = frustrated customers
The Solution: Agentic RAG architecture with dual intelligence
What makes it special:
- Smart Routing: Agentic RAG Handler directs queries to the right brain
- SQL Agent: Real-time data from MySQL (orders, products, inventory)
- FAQ Agent: Instant answers from FAISS vector knowledge base
- LLM Orchestration: LangChain + LangGraph for seamless agent coordination
- Prompt Engineering: Zero-shot and few-shot techniques for contextually aware responses
- User Experience: Streamlit interface with secure authentication
Impact: Automated customer service that's actually helpful, respectful, and knows when to pull from structured vs. unstructured data.
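The routing idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's code. The project routes with an LLM-driven handler over MySQL and FAISS. Here, a hypothetical keyword set stands in for the router, an in-memory SQLite table stands in for MySQL, and plain token overlap stands in for FAISS vector search. All function names, `STRUCTURED_HINTS`, `FAQS`, and the sample rows are hypothetical.

```python
import re
import sqlite3

# Hypothetical routing hints; the real project lets an LLM-driven handler decide.
STRUCTURED_HINTS = {"order", "orders", "product", "products", "stock", "inventory", "price"}

# Hypothetical FAQ entries; the real project keeps its knowledge base in FAISS.
FAQS = {
    "What is your return policy?": "Items can be returned within 30 days.",
    "Do you ship internationally?": "Yes, we ship to most countries.",
}


def tokenize(text):
    """Lowercase word tokens, e.g. "What's up?" -> {"what", "s", "up"}."""
    return set(re.findall(r"[a-z]+", text.lower()))


def route(query):
    """Send questions about live store data to SQL, everything else to the FAQ agent."""
    return "sql" if tokenize(query) & STRUCTURED_HINTS else "faq"


def sql_agent(conn, query):
    """Answer stock questions from structured data (SQLite stands in for MySQL)."""
    q = query.lower()
    for name, stock in conn.execute("SELECT name, stock FROM products"):
        if name.lower() in q:
            return f"{name}: {stock} in stock"
    return "No matching product found."


def faq_agent(query):
    """Return the answer whose FAQ question shares the most tokens (Jaccard overlap)."""
    q = tokenize(query)

    def score(question):
        t = tokenize(question)
        return len(q & t) / len(q | t)

    best = max(FAQS, key=score)
    return FAQS[best] if score(best) > 0 else "Sorry, I don't know that one yet."


def answer(conn, query):
    """Dispatch a customer query to whichever agent the router picks."""
    return sql_agent(conn, query) if route(query) == "sql" else faq_agent(query)


def build_demo_db():
    """Tiny hypothetical product table for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT, stock INTEGER)")
    conn.executemany("INSERT INTO products VALUES (?, ?)", [("Blue Mug", 12), ("Desk Lamp", 0)])
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print(answer(conn, "Is the Blue Mug in stock?"))   # Blue Mug: 12 in stock
    print(answer(conn, "What's your return policy?"))  # Items can be returned within 30 days.
```

To extend this, replace the keyword check with an LLM classification step and the token overlap with embedding search. That is presumably where LangGraph and FAISS fit. The dispatch logic in `answer` would stay the same.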
Big data meets crime pattern detection
Tech Stack: GCP • Dataproc • BigQuery • Hadoop • Hive • Spark • OpenRefine • Tableau
Mission: Turn raw crime data into actionable insights for safer communities
Architecture breakdown:
- Data Lake: Google Cloud Storage for scalable raw data management
- Processing Power: Cloud Dataproc Hadoop cluster for distributed computing
- Dual Query Engine: Compared Hive SQL and Spark SQL performance
- Data Quality: OpenRefine for cleaning and standardization
- Analytics: BigQuery external tables for pattern detection
- Insights: Tableau dashboards revealing theft hotspots and temporal trends
Key Learning: Full data lifecycle on GCP - from messy police records to actionable intelligence.
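The hotspot and temporal-trend analysis above comes down to grouped counts. Below is a minimal sketch that uses Python's built-in `sqlite3` as a local stand-in for Hive, Spark SQL, or BigQuery. The `crimes` table, its columns, and the sample rows are hypothetical, not the project's schema or data. Hour extraction uses SQLite's `strftime('%H', ...)`. Hive and Spark SQL provide `hour()` for the same step, and BigQuery provides `EXTRACT(HOUR FROM ...)`.

```python
import sqlite3

# Hypothetical sample rows: (district, offense, occurred_at).
SAMPLE_ROWS = [
    ("Downtown", "THEFT", "2023-01-05 14:10:00"),
    ("Downtown", "THEFT", "2023-01-06 14:45:00"),
    ("Harbor", "THEFT", "2023-01-06 22:05:00"),
    ("Harbor", "ASSAULT", "2023-01-07 23:30:00"),
    ("Midtown", "THEFT", "2023-01-08 09:15:00"),
]


def load_crimes(conn, rows):
    """Create the hypothetical crimes table and load rows into it."""
    conn.execute("CREATE TABLE crimes (district TEXT, offense TEXT, occurred_at TEXT)")
    conn.executemany("INSERT INTO crimes VALUES (?, ?, ?)", rows)


def theft_hotspots(conn, top_n=3):
    """Districts ranked by theft count, ties broken alphabetically."""
    rows = conn.execute(
        """
        SELECT district, COUNT(*) AS thefts
        FROM crimes
        WHERE offense = 'THEFT'
        GROUP BY district
        ORDER BY thefts DESC, district
        """
    ).fetchall()
    return rows[:top_n]


def thefts_by_hour(conn):
    """Theft counts per hour of day, the input for a temporal-trend chart."""
    return conn.execute(
        """
        SELECT CAST(strftime('%H', occurred_at) AS INTEGER) AS hour, COUNT(*) AS thefts
        FROM crimes
        WHERE offense = 'THEFT'
        GROUP BY hour
        ORDER BY hour
        """
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_crimes(conn, SAMPLE_ROWS)
    print(theft_hotspots(conn))  # [('Downtown', 2), ('Harbor', 1), ('Midtown', 1)]
    print(thefts_by_hour(conn))  # [(9, 1), (14, 2), (22, 1)]
```

A timing comparison between Hive and Spark SQL is most meaningful when the query logic is identical on both engines and only engine-specific functions differ, such as the hour extraction here.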
Orchestrating 1M+ transactions with Airflow & dbt
Tech Stack: Airflow • dbt • PostgreSQL • Docker • CI/CD
Scale: 1M+ transactions • 90% faster with incremental loading
The build:
- Orchestration: Airflow DAGs managing the workflow
- Transformation: dbt models (staging → core) with incremental materialization
- Quality: 9 automated tests ensuring data integrity
- DevOps: GitHub Actions CI/CD for every commit
- Infrastructure: Fully Dockerized for consistency across environments
Databricks Certified Data Engineer Associate
- Production-grade data engineering on the Databricks platform
DataExpert.io Bootcamp Graduate
- Dimensional data modeling & fact table design
- Apache Flink, Kafka, Snowflake, Spark deep-dives
- Data quality patterns & pipeline maintenance
- KPIs, experimentation, and analytical patterns
Google Gen AI Intensive (Kaggle)
- 5-day bootcamp diving into LLM fundamentals
- Studied 5 key research papers on generative AI
```sql
-- The data never lies, but sometimes it tells jokes
-- The CTE below defines the "table" so the query runs as-is in PostgreSQL
WITH data_engineer_life AS (
    SELECT 1 AS coffee_level, 'infinite' AS curiosity
)
SELECT
    'Suprathika' AS name,
    'Data Engineer' AS role,
    ARRAY[
        'Airflow orchestration patterns',
        'dbt incremental strategies',
        'Spark optimization wizardry',
        'Cloud-native architectures'
    ] AS currently_learning,
    ARRAY[
        'Production-grade systems',
        'ELT pipeline design',
        'Data quality frameworks',
        'Modern data stack'
    ] AS interests,
    'There''s always something to learn' AS motto  -- SQL escapes a quote by doubling it
FROM data_engineer_life
WHERE coffee_level > 0
  AND curiosity = 'infinite';
```

Always hunting for the next "aha!" moment:
- Reading: Medium & Substack for best practices, architecture patterns, and technical insights
- Want to Write Someday: Sharing my own journey and learnings
- Tech Docs: Because sometimes the manual is the real MVP
There's always something to learn!