- 🧠 PhD in Experimental Physics turned Data Engineer
- 💻 Building production-ready data pipelines with modern DevOps practices and scientific rigor
- 🚀 Specialized in streaming architectures, cloud data platforms, and pipeline orchestration
- 🔧 I design end-to-end data systems: ingestion, transformation, storage, monitoring, and orchestration
- 🎯 Currently mastering Apache Kafka and real-time streaming for event-driven data architectures
- 📍 Open to opportunities in the Geneva area, Switzerland (CERN, finance, research organizations, tech companies)
- Data Engineering: Python (Advanced), SQL, Apache Kafka, PySpark, dbt, Apache Airflow, ETL/ELT pipelines
- Databases: PostgreSQL, Redshift, DuckDB, data modeling (star schema, normalization)
- DevOps & Cloud: Docker, Kubernetes (basics), Terraform, AWS (S3, Redshift, MWAA), CI/CD (GitHub Actions)
- APIs & Backend: FastAPI, REST APIs, data ingestion endpoints
- Monitoring: Structured logging, metrics design (Grafana concepts), CloudWatch
- Practices: Git workflows, automated testing (pytest), data quality checks, GDPR compliance, clean architecture (data-quality test sketch below)
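
As a small illustration of the testing and data-quality practices listed above, a hedged pytest sketch; the table and column names (`trade_date`, `ticker`, `close`) are assumptions for the example, not code from any of my repositories.

```python
# Illustrative pytest data-quality checks on a pandas DataFrame.
# Column names ("trade_date", "ticker", "close") are assumptions.
import pandas as pd
import pytest


@pytest.fixture
def prices() -> pd.DataFrame:
    # Stand-in for a DataFrame loaded from the warehouse.
    return pd.DataFrame(
        {
            "trade_date": ["2024-01-02", "2024-01-02", "2024-01-03"],
            "ticker": ["AAPL", "MSFT", "AAPL"],
            "close": [185.6, 370.9, 184.2],
        }
    )


def test_no_nulls_in_key_columns(prices):
    assert prices[["trade_date", "ticker", "close"]].notna().all().all()


def test_primary_key_is_unique(prices):
    assert not prices.duplicated(subset=["trade_date", "ticker"]).any()


def test_prices_are_positive(prices):
    assert (prices["close"] > 0).all()
```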
Production-ready Kafka pipeline for financial market indices
- Streaming Architecture: Apache Kafka producers/consumers with confluent-kafka (Python); producer sketch below
- Data Persistence: DuckDB for analytics, with storage patterns designed to port to PostgreSQL
- Monitoring: Metrics collection (throughput, lag, errors), structured logging
- DevOps: Docker containerization, GitHub Actions CI/CD, Kubernetes deployment concepts
- Scale: Real-time ingestion of S&P 500, STOXX 600, Nikkei 225 indices
Tech: Apache Kafka, confluent-kafka, Python, DuckDB, Docker, GitHub Actions
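
A minimal producer-side sketch of the streaming pattern above, assuming a local broker at `localhost:9092` and a hypothetical `market_indices` topic; the payload shape and names are illustrative, not the pipeline's actual code.

```python
# Minimal confluent-kafka producer sketch (illustrative; topic name,
# broker address, and payload shape are assumptions, not project code).
import json
import time

from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})


def delivery_report(err, msg):
    # Called once per message to surface delivery errors for logging/metrics.
    if err is not None:
        print(f"Delivery failed for {msg.key()}: {err}")


def publish_tick(index_name: str, value: float) -> None:
    payload = {"index": index_name, "value": value, "ts": time.time()}
    producer.produce(
        topic="market_indices",            # hypothetical topic
        key=index_name.encode("utf-8"),
        value=json.dumps(payload).encode("utf-8"),
        callback=delivery_report,
    )
    producer.poll(0)  # serve delivery callbacks without blocking


publish_tick("SP500", 5321.18)
producer.flush()  # block until all queued messages are delivered
```

Keying each message by index name keeps all events for a given index on the same partition, which preserves per-index ordering for downstream consumers.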
Cloud-native AWS data platform with enterprise DevOps practices
- Infrastructure as Code: Terraform (VPC, S3, Redshift, MWAA)
- Orchestration: Apache Airflow with dependency management and retry logic (DAG sketch below)
- Data Transformation: dbt Core with 90%+ test coverage, schema validation, incremental models
- Monitoring: CloudWatch logs, alerts, performance metrics
- Scale: 500+ stocks, 11 sectors, daily automated updates
Tech: Python, Terraform, AWS (S3, Redshift, MWAA), dbt, Airflow, CloudWatch
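
A hedged sketch of the orchestration pattern above: an Airflow DAG with a daily schedule, retries, and explicit task dependencies. The DAG id, task names, and callables are placeholders, not the platform's real DAG.

```python
# Illustrative Airflow DAG sketch: daily schedule, retries, and explicit
# task dependencies. Function bodies and names are placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_prices():
    ...  # pull daily stock data into S3 (placeholder)


def load_to_redshift():
    ...  # COPY staged files into Redshift (placeholder)


default_args = {
    "retries": 3,                          # retry transient failures
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="daily_market_pipeline",        # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                     # Airflow 2.4+ syntax
    catchup=False,
    default_args=default_args,
) as dag:
    extract = PythonOperator(task_id="extract_prices", python_callable=extract_prices)
    load = PythonOperator(task_id="load_to_redshift", python_callable=load_to_redshift)

    extract >> load  # load only after extraction succeeds
```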
End-to-end data platform for traffic analysis and staffing optimization
- Architecture: FastAPI ingestion → PySpark transformations → DuckDB datamarts → Streamlit dashboards (ingestion sketch below)
- Orchestration: Airflow DAGs (batch + near-real-time) with fault tolerance
- Data Quality: 95% test coverage (unit + integration tests), automated validation
- Deployment: Docker containerization, CI/CD pipeline
- Impact: Data-driven staffing decisions based on visitor flow patterns
Tech: FastAPI, PySpark, Airflow, Docker, DuckDB, Parquet, Streamlit, GitHub Actions
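
A minimal sketch of the ingestion layer named in the architecture above, assuming a hypothetical `/events` endpoint and `VisitorEvent` schema; it validates an incoming event with Pydantic before anything is handed to downstream processing.

```python
# Illustrative FastAPI ingestion endpoint: validate an incoming event with
# Pydantic, then acknowledge it. Field names and path are assumptions.
from datetime import datetime

from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI()


class VisitorEvent(BaseModel):
    sensor_id: str
    timestamp: datetime
    count: int = Field(ge=0)  # reject negative counts at the boundary


@app.post("/events", status_code=202)
def ingest_event(event: VisitorEvent) -> dict:
    # In the real pipeline this would append to Parquet / a queue for the
    # downstream PySpark transformations; here we just acknowledge receipt.
    return {"status": "accepted", "sensor_id": event.sensor_id}
```

Validating at the API boundary keeps malformed records out of the storage layer before PySpark ever touches them.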
- Data engineering projects (streaming, batch, hybrid architectures)
- Production data pipelines with modern DevOps practices
- Cloud-native data platforms (AWS, GCP)
- Data systems for research organizations or scientific applications (CERN-style environments)
🌐 Portfolio: michaelg-create.github.io/portfolio
💼 LinkedIn: michaelgarcia838383
📧 Email: [email protected]
📝 Medium: Teaching Kafka for Data Engineers
12 years teaching physics in middle and high school developed my ability to explain complex technical systems clearly and support colleagues adopting new tools—skills I now apply when documenting data pipelines and collaborating with stakeholders.
PhD in Experimental Physics investigating microscale swimming dynamics. Built a complete automated data processing pipeline for 5–10 TB of experimental video data:
- Image processing automation (ImageJ macros) → 50% time reduction
- Custom tracking algorithms (IDL) analyzing 500+ videos, 50,000+ trajectories
- Statistical computing pipelines and reproducible analysis workflows
- International collaboration: 6+ researchers across 5 countries (100% English)
Two years as a Retail Banking Advisor gave me a deep understanding of financial operations, data quality requirements, and the business impact of reliable data systems.
This unconventional path combines scientific rigor, modern data engineering practices, and strong communication skills—ideal for data platform roles where technical depth meets user support.


