jmdu99/README.md

Hi there 👋, I'm Jose

Freelance Data Engineer — Turning messy data into clarity for projects with real impact.
I work with purpose-driven teams to build data systems they can trust.

🛠 What I Do

  • 📊 Centralise scattered data into a single source of truth
  • ⚙️ Automate cleaning & validation for always-ready data
  • 🚀 Design efficient ETL/ELT pipelines (Airflow, dbt, Spark…)
  • 📈 Build solid foundations for BI, ML & GenAI
  • ⏱ Create real-time dataflows when speed matters

💻 Tech Stack

Python SQL Airflow dbt Spark AWS GCP Docker Git PostgreSQL BigQuery Terraform

🎯 About Me

Since 2021, I've worked in data across tech, banking, and large-scale systems (Amazon, Slido/Cisco).
In 2025, I went freelance to focus on projects with real impact — from healthtech and edtech to any sector that values purpose as much as results.
I also donate 10% of my earnings to the GiveWell Top Charities Fund.

🗂 Portfolio & Contact

💼 Portfolio request · LinkedIn
📩 Let’s connect and discuss how to make your data work better.

Pinned

  1. Hybrid-Fitness-Data-Pipeline-Batch-Streaming Public

    This project demonstrates a full hybrid fitness data pipeline combining real-time streaming (Kafka + MongoDB) with scheduled batch enrichment and loading (Prefect + Redshift). Dashboards are built …

    Python

  2. Hybrid-Nutrition-Data-Pipeline-Batch-Streaming Public

    This project simulates a real-time and batch data pipeline for food item enrichment and nutritional analytics. It demonstrates a modern architecture that uses Kafka for streaming ingestion, Cassand…

    Python

  3. dbpedia/DBpedia-Spotlight-Dashboard Public

    An integrated statistical information tool from the Wikipedia dumps and the DBpedia Extraction Framework artifacts

    Python 1

  4. Data-Processes-assignment Public

    COVID-19 survival analysis of a dataset and prediction using Python (sklearn, pandas, numpy, matplotlib, lifelines, mlxtend, joblib)

    Python 1

  5. Spark-Practical-Work Public

    Big Data: Spark Practical Work First Semester 2021/2022

    Scala

  6. Graph-Analysis-Social-Networks Public

    Assignments made during the Graph Analysis and Social Networks course using Tweepy, NetworkX and NLTK.

    Jupyter Notebook 1