nhanizDee/Predicting-Health-Insurance-Fraud-Using-Machine-Learning

Healthcare Fraud Detection – CMS, Kaggle & Synthea Datasets

This project analyzes healthcare fraud patterns using three large-scale datasets:

  1. CMS Medicare Data – Public provider billing records with cost and service metrics.
  2. Kaggle Healthcare Fraud Dataset – Real-world claims data with fraud labels.
  3. Synthea Synthetic Data – Comprehensive synthetic EHR data including patients, conditions, and claims.

📊 Project Objectives:

  • Explore service volume and financial metrics across providers and states.
  • Detect patterns of excessive billing and service anomalies.
  • Prepare datasets for machine learning models focused on fraud detection.
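One way the excessive-billing objective could be approached is sketched below. This is a minimal illustration, not the project's actual method: it flags providers whose payments exceed the upper Tukey fence (Q3 + 1.5 × IQR) within their own state. The column names `provider_id`, `state`, and `total_payment`, and the helper `flag_excessive_billing`, are hypothetical, and the data is made up for the example.

```python
import pandas as pd


def flag_excessive_billing(df, value_col="total_payment", group_col="state", k=1.5):
    """Flag rows whose value exceeds Q3 + k * IQR within their group.

    Column names are hypothetical placeholders, not the real CMS schema.
    """
    grouped = df.groupby(group_col)[value_col]
    # transform broadcasts each group's quartile back to that group's rows
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    out = df.copy()
    out["upper_fence"] = q3 + k * (q3 - q1)
    out["excessive"] = out[value_col] > out["upper_fence"]
    return out


# Toy data, invented purely for illustration
providers = pd.DataFrame({
    "provider_id": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "state": ["TX"] * 5 + ["CA"] * 3,
    "total_payment": [100.0, 110.0, 120.0, 130.0, 1000.0, 200.0, 210.0, 220.0],
})

flagged = flag_excessive_billing(providers)
print(flagged[flagged["excessive"]])
```

Grouping by state keeps a provider in a high-cost state from being flagged only because of regional price differences; a flag here marks a record for review and is not evidence of fraud on its own.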

🔧 Tools & Techniques:

  • Python, Pandas, Seaborn, Scikit-learn, Matplotlib
  • Data preprocessing, outlier handling, feature creation
  • Visualization and statistical summary
  • Datasets prepared for modeling with methods from the textbook An Introduction to Statistical Learning
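The preprocessing steps listed above might fit together roughly as follows. This is a hedged sketch on randomly generated data, not the team's pipeline: the columns `total_payment`, `service_count`, `payment_per_service`, and `is_fraud` are assumptions, the label is synthetic, and logistic regression is used only as one example of a method covered in An Introduction to Statistical Learning.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for provider billing data (hypothetical columns)
rng = np.random.default_rng(0)
n = 200
claims = pd.DataFrame({
    "total_payment": rng.lognormal(mean=8, sigma=1, size=n),
    "service_count": rng.integers(1, 500, size=n).astype(float),
})

# Feature creation: average payment per service
claims["payment_per_service"] = claims["total_payment"] / claims["service_count"]

# Illustrative label only: top 20% of payment_per_service marked as "fraud"
threshold = claims["payment_per_service"].quantile(0.8)
claims["is_fraud"] = (claims["payment_per_service"] > threshold).astype(int)

features = ["total_payment", "service_count", "payment_per_service"]
X_train, X_test, y_train, y_test = train_test_split(
    claims[features], claims["is_fraud"],
    test_size=0.25, stratify=claims["is_fraud"], random_state=0,
)

# Outlier handling: clip to the 1st/99th percentiles learned on the training split
lo, hi = X_train.quantile(0.01), X_train.quantile(0.99)
X_train = X_train.clip(lower=lo, upper=hi, axis=1)
X_test = X_test.clip(lower=lo, upper=hi, axis=1)

# Scale features, then fit a logistic regression classifier
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.2f}")
```

The clipping bounds are computed on the training split and then applied to the test split so that no information from held-out rows leaks into preprocessing. With real fraud labels, which are typically rare, accuracy alone would be a weak metric; precision and recall would be more informative.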

Course Project – Statistical Learning (Spring 2025)
Team Members: Nhan, Tan, Andre

About

Healthcare Fraud Detection Using CMS, Kaggle, and Synthea Datasets: an exploratory data science project that analyzes and detects potential healthcare fraud using three healthcare datasets, two real-world and one synthetic.
