This project aims to build robust models to detect fake Instagram profiles using two distinct approaches:
- Updated Baseline Model: Enhanced Deep Neural Network with advanced hyperparameter tuning and regularization techniques.
- New Model: Combines Traditional Machine Learning models (Logistic Regression, Random Forest) with an Enhanced Neural Network architecture.
Techniques Used:
- Optuna Hyperparameter Tuning: Optimizes number of layers, units, dropout rate, learning rate.
- Batch Normalization & Dropout: Stabilizes training, speeds up convergence, and improves generalization.
- EarlyStopping & ReduceLROnPlateau: Stops training before the model overfits and lowers the learning rate when the monitored metric plateaus.
- AdamW Optimizer: Decouples weight decay from the gradient update for better regularization.
- Proper Seeding: Ensures reproducibility of results.
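The seeding step can be sketched as a single helper. This is a minimal illustration, not the code from the project scripts: the name `set_seed` and the default seed `42` are assumptions. NumPy and TensorFlow are seeded only if they are installed, so the snippet also runs in a bare Python environment.

```python
import random


def set_seed(seed: int = 42) -> None:
    """Seed every random number generator available in this environment."""
    # Python's built-in generator (used by random.shuffle, random.choice, ...).
    random.seed(seed)

    # NumPy's global generator, if NumPy is installed.
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass

    # TensorFlow's global generator (weight init, dropout masks), if installed.
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)
    except ImportError:
        pass


# Calling the helper twice with the same seed reproduces the same draws.
set_seed(123)
first_run = [random.random() for _ in range(3)]
set_seed(123)
second_run = [random.random() for _ in range(3)]
print(first_run == second_run)
```

Calling such a helper once at the top of each script, before any data splitting or model construction, keeps shuffles and weight initialization consistent between runs. Some GPU operations can still be non-deterministic even with fixed seeds.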
Impact:
- Improved accuracy, generalization, and stability.
- Reduced overfitting and better model convergence.
Preprocessing:
- Checked for missing values.
- Handled class imbalance by upsampling the minority (fake) class.
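Upsampling can be illustrated with a dependency-free sketch. The function name `upsample_minority`, the list-based data layout, and the label convention (`1` = fake) are assumptions made for this example; the project scripts may well use a pandas or scikit-learn utility instead.

```python
import random
from collections import Counter


def upsample_minority(rows, labels, seed=42):
    """Duplicate randomly chosen minority-class rows until both classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    # The minority class is the label with the fewest samples.
    minority = min(counts, key=counts.get)
    target = max(counts.values())

    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    # Sample with replacement to make up the difference.
    extra_idx = rng.choices(minority_idx, k=target - counts[minority])

    new_rows = list(rows) + [rows[i] for i in extra_idx]
    new_labels = list(labels) + [minority] * len(extra_idx)
    return new_rows, new_labels


# Toy data: 7 real profiles (label 0) and 3 fake profiles (label 1).
rows = [[i] for i in range(10)]
labels = [0] * 7 + [1] * 3
balanced_rows, balanced_labels = upsample_minority(rows, labels)
print(Counter(balanced_labels))
```

Upsampling is usually applied to the training split only, so that duplicated rows do not end up in the data used for evaluation.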
Feature Selection Techniques:
- Low Variance Removal: Eliminated features with near-zero variance.
- High Correlation Removal: Removed features with correlation > 0.9.
- Random Forest Feature Importance: Selected significant features (importance > 0.01).
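The first two filters can be sketched in plain Python. This is an illustration under stated assumptions: the function names, the dict-of-columns layout, and the variance threshold of `1e-3` are hypothetical (the text only says "near-zero"), and the correlation check uses the absolute value, which the text does not specify. The Random Forest importance step is left out because it needs a fitted model.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(columns, var_threshold=1e-3, corr_threshold=0.9):
    """Drop near-constant columns, then drop columns highly correlated with a kept one."""
    # Step 1: low variance removal.
    candidates = [name for name, values in columns.items()
                  if pvariance(values) > var_threshold]

    # Step 2: keep a column only if it is not highly correlated
    # with any column that has already been kept.
    selected = []
    for name in candidates:
        if all(abs(pearson(columns[name], columns[kept])) <= corr_threshold
               for kept in selected):
            selected.append(name)
    return selected


# Toy feature table: "b" nearly duplicates "a", and "c" is constant.
columns = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8.1],
    "c": [5, 5, 5, 5],
    "d": [4, 1, 3, 2],
}
print(select_features(columns))
```

With this greedy order, the first column of a correlated pair is the one that survives, so the result depends on column order.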
Models Implemented:
- Logistic Regression: Provides baseline performance.
- Random Forest Classifier: Handles feature interactions & non-linearity well.
- Enhanced Neural Network:
  - 4 Dense Layers: 256 → 64 → 64 → 32 → Output.
  - Batch Normalization after each Dense layer.
  - Dropout (0.3) to prevent overfitting.
  - AdamW Optimizer with EarlyStopping & ReduceLROnPlateau.
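To show how the two callbacks interact during training, the sketch below replays a list of validation losses through simplified patience rules. It is not the Keras implementation: it ignores options such as `min_delta` and `cooldown`, and the function name and every parameter value (`lr_patience`, `stop_patience`, `factor`, `min_lr`) are assumptions chosen for illustration.

```python
def simulate_callbacks(val_losses, lr=1e-3, lr_patience=2, factor=0.5,
                       stop_patience=4, min_lr=1e-6):
    """Replay validation losses; return the best loss and a per-epoch (epoch, loss, lr) log."""
    best = float("inf")
    since_best = 0  # epochs since the last improvement (early stopping counter)
    lr_wait = 0     # epochs without improvement since the last learning-rate cut
    history = []

    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # Improvement: remember it and reset both counters.
            best = loss
            since_best = 0
            lr_wait = 0
        else:
            since_best += 1
            lr_wait += 1
            if lr_wait >= lr_patience:
                # Plateau: shrink the learning rate, but never below min_lr.
                lr = max(lr * factor, min_lr)
                lr_wait = 0
        history.append((epoch, loss, lr))

        if since_best >= stop_patience:
            # No improvement for stop_patience epochs: stop training.
            break

    return best, history


losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.64, 0.5]
best, history = simulate_callbacks(losses)
for epoch, loss, lr in history:
    print(f"epoch {epoch}: val_loss={loss:.2f} lr={lr:.6f}")
```

In this toy run the learning rate is halved twice while the loss stalls, and training stops before the final value of 0.5 is ever reached. That is the trade-off a patience setting controls: a longer patience wastes epochs on a plateau, while a shorter one can stop before a late improvement.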
Evaluation:
- Precision, Recall, F1-score.
- Confusion Matrix.
- Loss & Accuracy Curves.
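The first two evaluation items are linked: precision, recall, and F1 are all derived from the confusion matrix counts. The sketch below computes them by hand for a binary problem. The function name `binary_metrics` and the convention that `1` marks a fake profile are assumptions for this example; the labels shown are toy values, not project results.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute the confusion matrix, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    return {
        # Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
        "confusion_matrix": [[tn, fp], [fn, tp]],
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Precision answers "of the profiles flagged as fake, how many really were?", while recall answers "of the fake profiles, how many were caught?".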
Install the dependencies:
pip install -r requirements.txt
Place the following files in your project directory:
train.csv
test.csv
# For Updated Baseline Model:
python updated_baseline_model.py
# For New Model:
python NewModels.py
Feature | Details |
---|---|
PEP8 Compliant | Proper indentation, spacing, and clear naming conventions |
Modular Code | Functions for preprocessing, training, evaluation |
Reproducibility | Seeded numpy, random, and tensorflow for consistent results |
Clear Documentation | Comments & docstrings for each block and function |
Visualization | Confusion Matrix, Training Curves with Matplotlib/Seaborn |
Model | Precision (%) | Recall (%) | F1-Score (%) |
---|---|---|---|
Original Baseline Model | 88 | 88 | 88 |
Logistic Regression | 88 | 88 | 87 |
Random Forest | 92 | 92 | 92 |
Enhanced Neural Network | 92 | 92 | 92 |
**Updated Baseline** | 93 | 93 | 92 |
File Name | Description |
---|---|
NewModels.py | New Model pipeline (ML + Enhanced Neural Network) |
updated_baseline_model.py | Enhanced Baseline Deep Neural Network Model |
train.csv, test.csv | Input datasets |
requirements.txt | Required dependencies |
README.md | Project overview & instructions |