🛡️ Credit Card Fraud Detection — End-to-End Data Science Project

📖 Overview

This project demonstrates how machine learning can be applied to detect fraudulent credit card transactions in a real-world banking scenario.
The data is highly imbalanced: fraudulent transactions make up less than 0.2% of all records.
The challenge is to catch fraud while minimizing false alerts that overwhelm analysts and disrupt legitimate customers.

🛠️ Methodology

Data

284,807 anonymized transactions from Kaggle: Credit Card Fraud Detection.
Features V1...V28 are PCA-transformed for confidentiality.
Time and Amount are the only features that are not PCA-transformed.

Process

Data Wrangling & Preprocessing
- Checked for nulls, validated schema.
- Scaled Amount and Time.
- Stratified train/validation split to preserve fraud ratio.
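
The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the project's notebook code: the arrays are synthetic stand-ins for the Time and Amount columns, and `StandardScaler`, the 75/25 split, and the variable names are assumptions chosen for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical stand-ins for the Time and Amount columns (not the real data).
time_amount = np.column_stack([
    rng.uniform(0, 172_800, size=n),   # seconds, playing the role of Time
    rng.lognormal(3.0, 1.5, size=n),   # skewed values, playing the role of Amount
])

# Mark 0.2% of rows as fraud to mimic the class imbalance.
y = np.zeros(n, dtype=int)
y[rng.choice(n, size=20, replace=False)] = 1

# stratify=y keeps the fraud ratio the same in both partitions.
X_train, X_val, y_train, y_val = train_test_split(
    time_amount, y, test_size=0.25, stratify=y, random_state=0
)

# Fit the scaler on the training rows only, then reuse it for validation
# so no validation statistics leak into preprocessing.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_val_scaled = scaler.transform(X_val)
```

Without `stratify`, a random split of data this imbalanced can leave the validation set with too few fraud cases to measure recall reliably.
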
Modeling
- Logistic Regression (baseline).
- Random Forest with calibrated probabilities.
- Logistic Regression with SMOTE (oversampling minority class).
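
For readers unfamiliar with SMOTE, the core idea is to create synthetic minority rows by interpolating between a minority sample and one of its nearest minority neighbors. The function below is a simplified NumPy sketch of that idea only; `smote_like` is a hypothetical helper written for this example, and a real project would more likely use the SMOTE class from the imbalanced-learn package.

```python
import numpy as np

def smote_like(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by SMOTE-style interpolation."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    k = min(k, len(X_min) - 1)  # cannot have more neighbors than other rows

    # Pairwise distances between minority rows; a row is not its own neighbor.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbors = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, len(X_min), size=n_new)            # seed rows
    picked = neighbors[base, rng.integers(0, k, size=n_new)]  # one neighbor per seed
    gap = rng.random((n_new, 1))                              # position along the segment
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Four minority points on the corners of the unit square.
X_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_like(X_minority, n_new=6, k=2)
```

Oversampling should be applied to the training split only; resampling the validation data would distort the precision and recall estimates.
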
Evaluation
- Metrics: Precision, Recall, F1, PR-AUC, ROC-AUC.
- Focused on precision/recall trade-offs instead of accuracy.
- Threshold tuning to balance fraud detection with false alert reduction.
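
Threshold tuning of this kind can be sketched with scikit-learn's `precision_recall_curve`. The example picks two operating points: the threshold with the best F1, and the most precise threshold that still keeps recall at or above a floor. The helper name `pick_thresholds`, the toy labels and scores, and the 0.80 floor are assumptions for illustration, not the project's code.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def pick_thresholds(y_true, scores, min_recall=0.80):
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    # The final (precision=1, recall=0) point has no threshold; drop it.
    precision, recall = precision[:-1], recall[:-1]

    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    best_f1 = int(np.argmax(f1))

    # Among thresholds that keep recall >= min_recall, take the most precise.
    ok = np.flatnonzero(recall >= min_recall)
    best_prec = int(ok[np.argmax(precision[ok])])

    return {
        "best_f1": (thresholds[best_f1], precision[best_f1], recall[best_f1]),
        "recall_floor": (thresholds[best_prec], precision[best_prec], recall[best_prec]),
    }

# Toy validation labels and predicted fraud probabilities.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1])
scores = np.array([0.05, 0.10, 0.15, 0.20, 0.30, 0.35,
                   0.40, 0.55, 0.60, 0.70, 0.80, 0.90])
picks = pick_thresholds(y_true, scores)
```

On this toy data the two rules select different thresholds, which is why a comparison table can report more than one operating point for the same model.
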
Interpretability
- Logistic Regression coefficients (top predictive components).
- Random Forest feature importances.
- Business-friendly explanations of model signals.
Deployment Awareness
- Saved trained models and scaler with joblib.
- Produced a comparison table for business decision-making.
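
Saving and reloading a fitted scaler and model with joblib might look like the sketch below. The data is synthetic, the file names `scaler.joblib` and `model.joblib` are hypothetical, and sigmoid calibration with 3-fold cross-validation is an assumption, since the README does not say which calibration method was used.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# Small imbalanced synthetic dataset so the example runs quickly.
X, y = make_classification(
    n_samples=2000, n_features=8, weights=[0.95, 0.05], random_state=0
)

scaler = StandardScaler().fit(X)
# Wrap the forest in a calibrator so predict_proba is better calibrated.
model = CalibratedClassifierCV(
    RandomForestClassifier(n_estimators=25, random_state=0),
    method="sigmoid",
    cv=3,
).fit(scaler.transform(X), y)

with tempfile.TemporaryDirectory() as artifacts_dir:
    scaler_path = os.path.join(artifacts_dir, "scaler.joblib")  # hypothetical name
    model_path = os.path.join(artifacts_dir, "model.joblib")    # hypothetical name
    joblib.dump(scaler, scaler_path)
    joblib.dump(model, model_path)

    # Reload both artifacts and confirm they reproduce the same probabilities.
    loaded_scaler = joblib.load(scaler_path)
    loaded_model = joblib.load(model_path)
    same = np.allclose(
        model.predict_proba(scaler.transform(X[:5])),
        loaded_model.predict_proba(loaded_scaler.transform(X[:5])),
    )
```

Persisting the scaler alongside the model matters because new transactions must be transformed with the same statistics the model was trained on.
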

📉 Precision–Recall Trade-off

The Random Forest model achieves a strong balance between recall (fraud caught) and precision (alerts that are correct).
The chart below shows the trade-off: raising the decision threshold generally increases precision and lowers recall.

📊 Results

| Model @ operating point | Threshold | Precision | Recall | F1 | ROC-AUC | PR-AUC |
| --- | --- | --- | --- | --- | --- | --- |
| Logistic Regression | 0.973 | 0.517 | 0.854 | 0.644 | 0.973 | 0.704 |
| Random Forest (best F1) | 0.365 | 0.901 | 0.813 | 0.855 | 0.970 | 0.836 |
| Random Forest (recall ≥ 80%) | 0.406 | 0.908 | 0.805 | 0.853 | 0.970 | 0.836 |
| SMOTE + LogReg | 1.000 | 0.844 | 0.789 | 0.815 | 0.972 | 0.711 |
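
As a quick consistency check, F1 is the harmonic mean of precision and recall, so the F1 column can be recomputed from the other two. The snippet below does this with the rounded values from the table; small differences in the third decimal are expected because the inputs are themselves rounded.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) copied from the results table.
rows = {
    "Logistic Regression": (0.517, 0.854, 0.644),
    "Random Forest (best F1)": (0.901, 0.813, 0.855),
    "Random Forest (recall >= 80%)": (0.908, 0.805, 0.853),
    "SMOTE + LogReg": (0.844, 0.789, 0.815),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: recomputed F1 = {f1_from(p, r):.3f}, reported = {reported:.3f}")

# Largest gap between recomputed and reported F1 across all rows.
max_gap = max(abs(f1_from(p, r) - f1) for p, r, f1 in rows.values())
```
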

Key Insights:

Logistic Regression: high recall (85%) but very noisy (52% precision).
Random Forest: best balance — ~81% recall with ~90% precision.
SMOTE + Logistic: better than baseline LR, but not as strong as Random Forest.

Business Takeaway:
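At the tuned thresholds in the results table, about 90% of Random Forest alerts are real fraud, compared with about 52% for the Logistic Regression baseline, while the model still catches more than 80% of fraud cases.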
This makes fraud alerts more trustworthy for analysts and reduces unnecessary customer disruptions.
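
To relate precision and recall to analyst workload, the sketch below converts them into expected false alerts per 100 actual fraud cases. It uses the rounded precision and recall values from the results table, and comparing Random Forest against the Logistic Regression row is an assumption made here, so the output is approximate.

```python
def false_alerts_per_100_frauds(precision, recall):
    """Expected false positives raised while processing 100 actual frauds."""
    caught = 100 * recall                      # true positives per 100 frauds
    # precision = TP / (TP + FP)  =>  FP = TP * (1 - precision) / precision
    return caught * (1 - precision) / precision

lr = false_alerts_per_100_frauds(precision=0.517, recall=0.854)
rf = false_alerts_per_100_frauds(precision=0.901, recall=0.813)
reduction = 1 - rf / lr

print(f"Logistic Regression: {lr:.1f} false alerts per 100 frauds")
print(f"Random Forest:       {rf:.1f} false alerts per 100 frauds")
print(f"Relative reduction:  {reduction:.1%}")
```
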

⚙️ How to Run

Download dataset from Kaggle.
Place creditcard.csv in your working directory.
Install requirements:
```
pip install -r requirements.txt
```
