- 🧠 PhD in Experimental Physics turned Data Engineer
- 💻 Building production-ready data pipelines with modern DevOps practices and scientific rigor
- 🚀 Specialized in streaming architectures, cloud data platforms, and pipeline orchestration
- 🔧 I design end-to-end data systems: ingestion, transformation, storage, monitoring, and orchestration
- 🎯 Currently mastering Apache Kafka and real-time streaming for event-driven data architectures
- 📍 Open to opportunities in the Geneva area, Switzerland (CERN, finance, research organizations, tech companies)
- Data Engineering: Python (Advanced), SQL, Apache Kafka, PySpark, dbt, Apache Airflow, ETL/ELT pipelines
- Databases: PostgreSQL, Redshift, DuckDB, data modeling (star schema, normalization)
- DevOps & Cloud: Docker, Kubernetes (basics), Terraform, AWS (S3, Redshift, MWAA), CI/CD (GitHub Actions)
- APIs & Backend: FastAPI, REST APIs, data ingestion endpoints
- Monitoring: Structured logging, metrics design (Grafana concepts), CloudWatch
- Practices: Git workflows, automated testing (pytest), data quality checks, GDPR compliance, clean architecture (data-quality test sketch below)
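
As a small illustration of the testing and data-quality practices listed above, a hedged pytest sketch; the table and column names (`trade_date`, `ticker`, `close`) are assumptions for the example, not code from any of my repositories.

```python
# Illustrative pytest data-quality checks on a pandas DataFrame.
# Column names ("trade_date", "ticker", "close") are assumptions.
import pandas as pd
import pytest


@pytest.fixture
def prices() -> pd.DataFrame:
    # Stand-in for a DataFrame loaded from the warehouse.
    return pd.DataFrame(
        {
            "trade_date": ["2024-01-02", "2024-01-02", "2024-01-03"],
            "ticker": ["AAPL", "MSFT", "AAPL"],
            "close": [185.6, 370.9, 184.2],
        }
    )


def test_no_nulls_in_key_columns(prices):
    assert prices[["trade_date", "ticker", "close"]].notna().all().all()


def test_primary_key_is_unique(prices):
    assert not prices.duplicated(subset=["trade_date", "ticker"]).any()


def test_prices_are_positive(prices):
    assert (prices["close"] > 0).all()
```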
Production-ready Kafka pipeline for financial market indices
- Streaming Architecture: Apache Kafka producers/consumers with confluent-kafka (Python); producer sketch below
- Data Persistence: DuckDB for analytics, with storage patterns designed to port to PostgreSQL
- Monitoring: Metrics collection (throughput, lag, errors), structured logging
- DevOps: Docker containerization, GitHub Actions CI/CD, Kubernetes deployment concepts
- Scale: Real-time ingestion of S&P 500, STOXX 600, Nikkei 225 indices
Tech: Apache Kafka, confluent-kafka, Python, DuckDB, Docker, GitHub Actions
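
A minimal producer-side sketch of the streaming pattern above, assuming a local broker at `localhost:9092` and a hypothetical `market_indices` topic; the payload shape and names are illustrative, not the pipeline's actual code.

```python
# Minimal confluent-kafka producer sketch (illustrative; topic name,
# broker address, and payload shape are assumptions, not project code).
import json
import time

from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})


def delivery_report(err, msg):
    # Called once per message to surface delivery errors for logging/metrics.
    if err is not None:
        print(f"Delivery failed for {msg.key()}: {err}")


def publish_tick(index_name: str, value: float) -> None:
    payload = {"index": index_name, "value": value, "ts": time.time()}
    producer.produce(
        topic="market_indices",            # hypothetical topic
        key=index_name.encode("utf-8"),
        value=json.dumps(payload).encode("utf-8"),
        callback=delivery_report,
    )
    producer.poll(0)  # serve delivery callbacks without blocking


publish_tick("SP500", 5321.18)
producer.flush()  # block until all queued messages are delivered
```

Keying each message by index name keeps all events for a given index on the same partition, which preserves per-index ordering for downstream consumers.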
Cloud-native AWS data platform with enterprise DevOps practices
- Infrastructure as Code: Terraform (VPC, S3, Redshift, MWAA)
- Orchestration: Apache Airflow with dependency management and retry logic (DAG sketch below)
- Data Transformation: dbt Core with 90%+ test coverage, schema validation, incremental models
- Monitoring: CloudWatch logs, alerts, performance metrics
- Scale: 500+ stocks, 11 sectors, daily automated updates
Tech: Python, Terraform, AWS (S3, Redshift, MWAA), dbt, Airflow, CloudWatch
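
A hedged sketch of the orchestration pattern above: an Airflow DAG with a daily schedule, retries, and explicit task dependencies. The DAG id, task names, and callables are placeholders, not the platform's real DAG.

```python
# Illustrative Airflow DAG sketch: daily schedule, retries, and explicit
# task dependencies. Function bodies and names are placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_prices():
    ...  # pull daily stock data into S3 (placeholder)


def load_to_redshift():
    ...  # COPY staged files into Redshift (placeholder)


default_args = {
    "retries": 3,                          # retry transient failures
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="daily_market_pipeline",        # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                     # Airflow 2.4+ syntax
    catchup=False,
    default_args=default_args,
) as dag:
    extract = PythonOperator(task_id="extract_prices", python_callable=extract_prices)
    load = PythonOperator(task_id="load_to_redshift", python_callable=load_to_redshift)

    extract >> load  # load only after extraction succeeds
```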
End-to-end data platform for traffic analysis and staffing optimization
- Architecture: FastAPI ingestion → PySpark transformations → DuckDB datamarts → Streamlit dashboards (ingestion sketch below)
- Orchestration: Airflow DAGs (batch + near-real-time) with fault tolerance
- Data Quality: 95% test coverage (unit + integration tests), automated validation
- Deployment: Docker containerization, CI/CD pipeline
- Impact: Data-driven staffing decisions based on visitor flow patterns
Tech: FastAPI, PySpark, Airflow, Docker, DuckDB, Parquet, Streamlit, GitHub Actions
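
A minimal sketch of the ingestion layer named in the architecture above, assuming a hypothetical `/events` endpoint and `VisitorEvent` schema; it validates an incoming event with Pydantic before anything is handed to downstream processing.

```python
# Illustrative FastAPI ingestion endpoint: validate an incoming event with
# Pydantic, then acknowledge it. Field names and path are assumptions.
from datetime import datetime

from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI()


class VisitorEvent(BaseModel):
    sensor_id: str
    timestamp: datetime
    count: int = Field(ge=0)  # reject negative counts at the boundary


@app.post("/events", status_code=202)
def ingest_event(event: VisitorEvent) -> dict:
    # In the real pipeline this would append to Parquet / a queue for the
    # downstream PySpark transformations; here we just acknowledge receipt.
    return {"status": "accepted", "sensor_id": event.sensor_id}
```

Validating at the API boundary keeps malformed records out of the storage layer before PySpark ever touches them.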
- Data engineering projects (streaming, batch, hybrid architectures)
- Production data pipelines with modern DevOps practices
- Cloud-native data platforms (AWS, GCP)
- Data systems for research organizations or scientific applications (CERN-style environments)
🌐 Portfolio: michaelg-create.github.io/portfolio
💼 LinkedIn: michaelgarcia838383
📧 Email: [email protected]
📝 Medium: Teaching Kafka for Data Engineers
12 years teaching physics in middle and high school developed my ability to explain complex technical systems clearly and support colleagues adopting new tools—skills I now apply when documenting data pipelines and collaborating with stakeholders.
PhD in Experimental Physics investigating microscale swimming dynamics. Built a complete automated data processing pipeline for 5–10 TB of experimental video data:
- Image processing automation (ImageJ macros) → 50% time reduction
- Custom tracking algorithms (IDL) analyzing 500+ videos, 50,000+ trajectories
- Statistical computing pipelines and reproducible analysis workflows
- International collaboration: 6+ researchers across 5 countries (100% English)
Two years as a Retail Banking Advisor gave me a deep understanding of financial operations, data quality requirements, and the business impact of reliable data systems.
This unconventional path combines scientific rigor, modern data engineering practices, and strong communication skills—ideal for data platform roles where technical depth meets user support.


