CarePulse is an end-to-end data engineering pipeline designed to simulate real-world healthcare analytics. It combines batch and real-time streaming ingestion with slowly changing dimension (SCD) logic, Delta Lake medallion architecture, and KPI generation. This project was built entirely using Databricks Community Edition, Spark, and Power BI, making it fully free and reproducible.
- Apache Spark (Structured Streaming, DataFrame APIs)
- Delta Lake (Bronze, Silver, Gold architecture)
- Databricks Community Edition (or any Spark runtime)
- Kinesis-like simulation via Python scripts
- Power BI (dashboarding)
- Pandas for CSV export
- (Optional) Snowflake for external data warehouse integration
carepulse-healthcare-pipeline/
├── README.md
├── databricks_notebooks/
│ └── carepulse_end_to_end_notebook.ipynb
├── kinesis/
│ ├── kinesis_vitals_producer.py
│ └── kinesis_vitals_consumer.py
├── data/
│ ├── patients.csv
│ └── organizations.csv
├── images/
│ ├── architecture_diagram.png
│ └── powerbi_dashboard.png
├── snowflake_export_code.py
└── requirements.txt
This pipeline is structured using the medallion architecture:
- Bronze Layer: Raw ingestion of batch CSVs and simulated real-time vitals
- Silver Layer: Cleaned data with SCD Type 2 tracking for `dim_patient` and `dim_hospital`
- Gold Layer: Aggregated KPIs per patient
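The SCD Type 2 tracking in the Silver layer can be illustrated with a minimal plain-Python sketch. This is not the notebook's Spark code. The function name `apply_scd2` and the columns `effective_from`, `effective_to`, and `is_current` are assumptions made for illustration. The idea is that a changed row closes the existing version and appends a new one, rather than overwriting it.

```python
from datetime import date


def apply_scd2(history, incoming, key, tracked, as_of):
    """Return a new history list with SCD Type 2 changes applied.

    history  : list of dict rows carrying effective_from, effective_to, is_current
    incoming : list of dict rows from the latest batch snapshot
    key      : name of the business key column
    tracked  : columns whose changes should create a new version
    as_of    : effective date of the incoming snapshot
    """
    result = [dict(row) for row in history]  # copy so the input is not mutated
    current = {row[key]: row for row in result if row["is_current"]}

    for new in incoming:
        old = current.get(new[key])
        if old is not None and all(old[c] == new[c] for c in tracked):
            continue  # nothing changed, keep the current version open
        if old is not None:
            # Close the outgoing version instead of overwriting it.
            old["effective_to"] = as_of
            old["is_current"] = False
        result.append(
            {**new, "effective_from": as_of, "effective_to": None, "is_current": True}
        )
    return result


# Hypothetical sample data: one known patient moves city, one patient is new.
history = [
    {"patient_id": "p1", "city": "Boston", "effective_from": date(2024, 1, 1),
     "effective_to": None, "is_current": True},
]
snapshot = [
    {"patient_id": "p1", "city": "Chicago"},
    {"patient_id": "p2", "city": "Austin"},
]
updated = apply_scd2(history, snapshot, key="patient_id",
                     tracked=["city"], as_of=date(2024, 6, 1))
```

On Delta Lake the same effect is usually achieved with a merge against the dimension table; the sketch only shows the row-level logic.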
- Batch: `patients.csv`, `organizations.csv`
- Streaming: Python-based producer simulating vitals (heart rate, BP, etc.) per second
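A per-second vitals producer could look roughly like the sketch below. It is an assumption-laden illustration, not the contents of `kinesis_vitals_producer.py`. The field names (`event_time`, `systolic_bp`, `diastolic_bp`), the value ranges, and the JSON output are all hypothetical, and writing Parquet files would need an extra library that is not shown here.

```python
import json
import random
import time
from datetime import datetime, timezone


def make_vitals_event(patient_id, rng=random):
    """Build one simulated vitals reading (field names are illustrative)."""
    return {
        "patient_id": patient_id,
        "event_time": datetime.now(timezone.utc).isoformat(),
        "heart_rate": rng.randint(55, 130),
        "systolic_bp": rng.randint(95, 170),
        "diastolic_bp": rng.randint(60, 110),
    }


def produce(patient_ids, n_ticks, interval_s=1.0, rng=random):
    """Yield one JSON record per patient per tick, pausing between ticks."""
    for tick in range(n_ticks):
        for pid in patient_ids:
            yield json.dumps(make_vitals_event(pid, rng))
        if tick < n_ticks - 1:
            time.sleep(interval_s)  # one batch of readings per interval
```

Calling `produce(["p1", "p2"], n_ticks=3)` yields six JSON strings over about two seconds; passing `rng=random.Random(0)` makes the values repeatable.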
- ✅ Batch ingestion of healthcare data (patients, hospitals)
- ✅ Real-time vitals data simulation using Python (Kinesis-style)
- ✅ SCD Type 2 implementation for dimension tables
- ✅ Delta Lake storage and versioning
- ✅ Enriched fact table combining streaming + dimension joins
- ✅ Gold KPIs: heart rate, BP anomalies, hospital visits, high-risk tagging
- ✅ Exported to Power BI for visualization
- ✅ Ready for Snowflake export (optional script provided)
- `avg_heart_rate`
- `bp_abnormal_count`
- `hospitals_visited`
- `city_change_count`
- `is_high_risk_patient`
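As a rough illustration of how these per-patient KPIs might be derived, here is a plain-Python sketch. The thresholds, the input field names, and the high-risk rule are assumptions and are not taken from the notebook. `city_change_count` is assumed to be the number of city versions recorded for a patient minus one.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical thresholds, chosen only for illustration.
BP_SYSTOLIC_MAX = 140
BP_DIASTOLIC_MAX = 90
HIGH_RISK_HEART_RATE = 100
HIGH_RISK_BP_EVENTS = 2


def patient_kpis(vitals, city_history):
    """Aggregate vitals rows into one KPI record per patient."""
    by_patient = defaultdict(list)
    for row in vitals:
        by_patient[row["patient_id"]].append(row)

    kpis = {}
    for pid, rows in by_patient.items():
        avg_hr = mean(r["heart_rate"] for r in rows)
        bp_abnormal = sum(
            1 for r in rows
            if r["systolic_bp"] > BP_SYSTOLIC_MAX or r["diastolic_bp"] > BP_DIASTOLIC_MAX
        )
        cities = city_history.get(pid, [])
        kpis[pid] = {
            "avg_heart_rate": avg_hr,
            "bp_abnormal_count": bp_abnormal,
            "hospitals_visited": len({r["hospital_id"] for r in rows}),
            "city_change_count": max(len(cities) - 1, 0),
            "is_high_risk_patient": (
                avg_hr > HIGH_RISK_HEART_RATE or bp_abnormal >= HIGH_RISK_BP_EVENTS
            ),
        }
    return kpis


# Hypothetical sample rows.
vitals = [
    {"patient_id": "p1", "hospital_id": "h1", "heart_rate": 110, "systolic_bp": 150, "diastolic_bp": 95},
    {"patient_id": "p1", "hospital_id": "h2", "heart_rate": 100, "systolic_bp": 120, "diastolic_bp": 80},
    {"patient_id": "p2", "hospital_id": "h1", "heart_rate": 70, "systolic_bp": 118, "diastolic_bp": 76},
]
city_history = {"p1": ["Boston", "Chicago"], "p2": ["Austin"]}
kpis = patient_kpis(vitals, city_history)
```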
- Import the notebook into Databricks CE
- Upload `patients.csv` and `organizations.csv` into `/FileStore/tables/`
- Run the `kinesis_vitals_producer.py` script locally
- Upload generated `.parquet` vitals files to `/mnt/bronze/vitals`
- Run notebook to process Bronze → Silver → Gold
- Export `gold_patient_kpis` to CSV for Power BI or Snowflake
A snowflake_export_code.py script is provided for pushing gold tables to Snowflake using write_pandas().
Power BI dashboard includes:
- High-risk patients filter
- Avg heart rate bar charts
- BP risk vs hospital visits scatter plot
Madhur Dixit
MIT License

