Deep learning–based static malware analysis system that classifies executable binaries into 31 malware families using image-based representations and transfer learning.
- Total Classes: 31 Malware Families
- Training Samples: 8,388
- Validation Samples: 1,480
- Test Samples: 3,879
- Input Size: 384 × 384
- Batch Size: 16
- Model: EfficientNetV2-S (Pretrained on ImageNet)
- Optimizer: Adam
- Loss: Categorical Cross-Entropy (with Label Smoothing)
- Test Accuracy: 95%
- Macro F1-Score: 0.96
- Weighted F1-Score: 0.95
Class-wise precision and recall are strong across most malware families.
Traditional signature-based antivirus systems struggle against obfuscated and zero-day malware.
This project implements a static, image-based malware classification pipeline that:
- Detects malicious binaries
- Classifies them into 31 malware families
- Avoids execution of untrusted files
- Uses transfer learning instead of handcrafted features
Executable binaries are converted into grayscale/RGB images by mapping raw byte values (0–255) to pixel intensities.
This preserves structural patterns such as entropy regions, packed sections, and instruction repetition.
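
The byte-to-pixel mapping can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's actual conversion script: the function names, the fixed row width of 256, zero-padding of the last row, and channel replication for RGB are all assumptions made for the example.

```python
import math
import numpy as np

def bytes_to_image(data: bytes, width: int = 256) -> np.ndarray:
    """Map raw bytes to a 2-D grayscale array, one byte per pixel.

    `width` is an illustrative choice; the source does not state the row width.
    """
    buf = np.frombuffer(data, dtype=np.uint8)
    if buf.size == 0:
        raise ValueError("empty input")
    height = math.ceil(buf.size / width)
    padded = np.zeros(height * width, dtype=np.uint8)  # zero-pad the last row
    padded[: buf.size] = buf
    return padded.reshape(height, width)

def to_rgb(gray: np.ndarray) -> np.ndarray:
    """Replicate the grayscale channel three times (assumed RGB scheme)."""
    return np.repeat(gray[:, :, None], 3, axis=2)

# Stand-in for the contents of an executable read with open(path, "rb").read()
sample = bytes(range(256)) * 3 + b"\x90" * 100
img = to_rgb(bytes_to_image(sample, width=256))
print(img.shape, img.dtype)
```

Because the file is only read as bytes and never executed, this step is safe to run on untrusted samples.
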
- Resized to 384 × 384
- RGB format (3 channels)
- Normalized using EfficientNetV2 `preprocess_input`
- Data augmentation applied:
- Random flip
- Random rotation
- Random zoom
- Random contrast
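
The preprocessing and augmentation steps listed above could be expressed with Keras preprocessing layers roughly as follows. The augmentation factors (0.05, 0.1), the horizontal-only flip, and the `prepare` helper are assumptions for illustration; the source does not give the actual values. Note that in Keras the EfficientNetV2 `preprocess_input` is a pass-through, because rescaling is built into the model by default, so inputs stay in the 0–255 range.

```python
import numpy as np
import tensorflow as tf

IMG_SIZE = 384

# Random layers only perturb the input when called with training=True.
augment = tf.keras.Sequential(
    [
        tf.keras.layers.RandomFlip("horizontal"),
        tf.keras.layers.RandomRotation(0.05),   # assumed factor
        tf.keras.layers.RandomZoom(0.1),        # assumed factor
        tf.keras.layers.RandomContrast(0.1),    # assumed factor
    ],
    name="augment",
)

def prepare(images, training=False):
    """Resize to 384 x 384, optionally augment, then apply preprocess_input."""
    x = tf.image.resize(images, (IMG_SIZE, IMG_SIZE))
    if training:
        x = augment(x, training=True)
    return tf.keras.applications.efficientnet_v2.preprocess_input(x)

# Two random RGB "byte images" standing in for real samples
batch = np.random.randint(0, 256, size=(2, 64, 256, 3)).astype("float32")
out = prepare(batch, training=True)
print(out.shape)
```
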
- EfficientNetV2-S (include_top=False)
- Global Average Pooling
- Batch Normalization
- Dense Layer (Swish activation)
- Dropout
- Final Dense (Softmax – 31 classes)
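
A minimal Keras sketch of the architecture listed above is shown below. The dense width (256), the dropout rate (0.3), and the `build_model` helper are assumed values and names, since the source does not specify them.

```python
import tensorflow as tf

NUM_CLASSES = 31
IMG_SIZE = 384

def build_model(weights="imagenet", dense_units=256, dropout=0.3):
    """EfficientNetV2-S backbone with a small classification head.

    dense_units and dropout are illustrative assumptions.
    """
    base = tf.keras.applications.EfficientNetV2S(
        include_top=False,
        weights=weights,
        input_shape=(IMG_SIZE, IMG_SIZE, 3),
    )
    base.trainable = False  # frozen backbone for the warm-up phase

    inputs = tf.keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3))
    x = base(inputs, training=False)  # keep backbone BatchNorm in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Dense(dense_units, activation="swish")(x)
    x = tf.keras.layers.Dropout(dropout)(x)
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs), base

# weights=None skips the ImageNet download for a quick shape check;
# use the default weights="imagenet" for actual transfer learning.
model, base = build_model(weights=None)
print(model.output_shape)
```

Returning `base` alongside the model makes it easy to unfreeze selected layers later for fine-tuning.
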
- Warm-up phase (base frozen)
- Head training
- Fine-tuning selected deeper layers
- Class weights applied for imbalance
- Early stopping + ReduceLR callbacks
- Model saved in `.keras` and `.h5` formats
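
One common way to derive class weights for an imbalanced dataset is the "balanced" heuristic, `n_samples / (n_classes * count)`, which is also what scikit-learn's `compute_class_weight("balanced", ...)` computes. Whether the project uses exactly this formula is an assumption; the helper name below is hypothetical.

```python
import numpy as np

def balanced_class_weights(labels, num_classes):
    """Weight each class inversely to its frequency in the training labels."""
    labels = np.asarray(labels)
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    if np.any(counts == 0):
        raise ValueError("every class needs at least one training sample")
    weights = labels.size / (num_classes * counts)
    return {i: float(w) for i, w in enumerate(weights)}

# Toy label set: class 0 is frequent, class 2 is rare
labels = np.array([0] * 6 + [1] * 3 + [2] * 1)
weights = balanced_class_weights(labels, num_classes=3)
print(weights)
```

The resulting dictionary has the shape expected by the `class_weight` argument of Keras `Model.fit`, so rare families contribute more to the loss per sample.
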
The model shows stable convergence during training without severe overfitting.
The confusion matrix shows strong diagonal dominance, indicating accurate class-wise prediction across malware families.
Precision and recall are balanced across both frequent and minority classes.
The final evaluation was performed on a held-out test set that was not used for training or validation.
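
For reference, the reported metrics (accuracy, macro F1, weighted F1) can all be derived from a confusion matrix. The sketch below uses plain NumPy and a toy three-class example rather than the project's data; the helper names are hypothetical.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def f1_summary(cm):
    """Accuracy plus macro- and support-weighted F1 from a confusion matrix."""
    tp = np.diag(cm).astype(float)
    support = cm.sum(axis=1)      # true samples per class
    predicted = cm.sum(axis=0)    # predictions per class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return {
        "accuracy": float(tp.sum() / cm.sum()),
        "macro_f1": float(f1.mean()),                          # every class counts equally
        "weighted_f1": float(np.average(f1, weights=support)), # weighted by class size
    }

y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
summary = f1_summary(cm)
print(cm)
print(summary)
```

Macro F1 treats each family equally, while weighted F1 follows the class distribution, so the two can differ when class sizes are uneven.
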
- Python
- TensorFlow / Keras
- EfficientNetV2
- NumPy
- Scikit-learn
- Matplotlib
- GPU Acceleration (if available)
- 31 Malware Families
- Directory-based labeling
- Train / Validation / Test split
- Class imbalance handled using class weights
- Resolution: 384 × 384
- RGB format
- EfficientNet preprocessing applied
- Loss: Categorical Cross-Entropy with Label Smoothing
- Optimizer: Adam
- Batch Size: 16
- Transfer Learning Enabled
- Fine-Tuning Applied
- Early Stopping & Learning Rate Reduction
- Class Weights Integrated
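
To illustrate what label smoothing does to the loss, here is a small NumPy sketch of categorical cross-entropy with smoothed targets, `y * (1 - eps) + eps / K`. The smoothing factor of 0.1 is an assumption, as the source does not state the value used. In Keras, the same behaviour is available through `tf.keras.losses.CategoricalCrossentropy(label_smoothing=...)`.

```python
import numpy as np

def smooth_labels(one_hot, epsilon=0.1):
    """Move a fraction epsilon of the target mass uniformly onto all classes."""
    num_classes = one_hot.shape[-1]
    return one_hot * (1.0 - epsilon) + epsilon / num_classes

def cross_entropy(targets, probs, eps=1e-7):
    """Mean categorical cross-entropy over the batch."""
    probs = np.clip(probs, eps, 1.0)
    return float(np.mean(-np.sum(targets * np.log(probs), axis=-1)))

# One sample of class 4 out of 31, predicted with very high confidence
one_hot = np.eye(31)[[4]]
probs = np.full((1, 31), 0.01 / 30)
probs[0, 4] = 0.99

hard = cross_entropy(one_hot, probs)
soft = cross_entropy(smooth_labels(one_hot), probs)
print(hard, soft)
```

With smoothed targets, an extremely confident prediction is penalised more than with hard one-hot targets, which discourages overconfident outputs.
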
`pip install -r requirements.txt`

`python (classification)efficientnet v2s.ipynb`

`python (detection)efficientnet v2 s.ipynb`

`python launch_streamlit_tunnel.py`

- Static analysis (no malware execution)
- Transfer learning for faster convergence
- Handles class imbalance
- Modular deep learning pipeline
- Deployment-ready model formats
- Static analysis only (no behavioral features)
- Performance depends on dataset diversity
- Requires periodic retraining for evolving malware
Complete 34-page documentation is available in `Reports.pdf`.
This project demonstrates that image-based malware representation combined with EfficientNetV2 transfer learning achieves high multi-class classification performance (95% accuracy) while maintaining balanced precision and recall across malware families.
It provides a scalable, reproducible, and research-backed deep learning pipeline for static malware analysis.
Tharun Sridhar




