Healthcare Fraud Detection – CMS, Kaggle & Synthea Datasets

This project analyzes healthcare fraud patterns using three large-scale datasets:

CMS Medicare Data – Public provider billing records with cost and service metrics.
Kaggle Healthcare Fraud Dataset – Real-world claims data with fraud labels.
Synthea Synthetic Data – Comprehensive synthetic EHR data including patients, conditions, and claims.

Python, Pandas, Seaborn, Scikit-learn, Matplotlib
Data preprocessing, outlier handling, feature creation
Visualization and statistical summary
Data prepared for modeling with methods from the textbook An Introduction to Statistical Learning
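
The outlier handling and feature creation steps listed above could look roughly like the sketch below. It is only an illustration, not the code from the project notebooks: the table, the column names (`provider_id`, `total_services`, `total_billed`), and the choice of IQR-based clipping are all assumptions made for the example.

```python
import pandas as pd


def clip_outliers_iqr(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the fence values."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


# Hypothetical provider-level billing table; names and values are illustrative only.
claims = pd.DataFrame({
    "provider_id": ["A", "B", "C", "D", "E"],
    "total_services": [100, 120, 110, 90, 105],
    "total_billed": [10_000.0, 12_500.0, 11_000.0, 9_500.0, 250_000.0],
})

# Outlier handling (assumed approach): cap extreme billed amounts at the IQR fences.
claims["total_billed_clipped"] = clip_outliers_iqr(claims["total_billed"])

# Feature creation (assumed feature): average billed amount per service.
claims["billed_per_service"] = claims["total_billed"] / claims["total_services"]

# Statistical summary of the numeric columns.
summary = claims.describe()
```

Clipping keeps every provider in the table while limiting the influence of extreme values on later models. For fraud detection the raw values may themselves be the signal, so keeping both the raw and the clipped column, as done here, is one reasonable option.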

Course Project – Statistical Learning (Spring 2025)
Team Members: Nhan, Tan, Andre
