An optimized Python implementation for network traffic anomaly detection using a Graph Autoencoder (GAE) approach.
- Efficient packet parsing and flow construction from network traffic data
- Graph representation of network flows with optimized memory usage
- Graph autoencoder-based anomaly detection
- Concept drift detection and adaptation capabilities
- Performance optimization with:
  - Memory-mapped file access for fast data loading
  - NumPy vectorized operations for faster calculations
  - Batch processing of packets and flows
  - Multiprocessing support for parallel data processing
  - Sparse matrix representations for large graphs
  - Memory usage optimization for large datasets
  - Efficient graph construction with incremental building
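To illustrate how memory-mapped file access and batch processing can work together, here is a minimal sketch using only the standard library. The function name `iter_line_batches` and the one-record-per-line input layout are assumptions made for this example; the project's actual parser may be organized differently.

```python
import mmap
import os


def iter_line_batches(path, batch_size=100_000):
    """Yield lists of raw lines (as bytes) read through a memory map.

    Hypothetical helper: assumes one record per line in the input file.
    """
    with open(path, "rb") as fh:
        if os.fstat(fh.fileno()).st_size == 0:
            return  # an empty file cannot be memory-mapped
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            batch = []
            # mmap.readline() returns b"" once the end of the map is reached
            for line in iter(mm.readline, b""):
                batch.append(line.rstrip(b"\r\n"))
                if len(batch) >= batch_size:
                    yield batch
                    batch = []
            if batch:
                yield batch  # final, possibly shorter batch
```

Each yielded batch holds at most `batch_size` lines, so peak memory depends on the batch size rather than on the size of the file.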
See requirements.txt for dependencies.

    # Install requirements
    pip install -r requirements.txt

    # Basic usage
    python main.py --data /path/to/data.txt --labels /path/to/labels.txt --output results

    # Advanced usage with optimization options
    python main.py --data /path/to/data.txt --labels /path/to/labels.txt --output results \
        --flow-timeout 60.0 --batch-size 100000 --epochs 100 --multiprocessing --workers 8

Command-line options:

- --data: Path to the data file (required)
- --labels: Path to the labels file (optional)
- --output: Directory to save results (default: 'results')
- --flow-timeout: Flow timeout in seconds (default: 60.0)
- --batch-size: Batch size for processing (default: 100000)
- --epochs: Number of training epochs (default: 100)
- --multiprocessing: Use multiprocessing for data loading
- --workers: Number of worker processes (default: auto)
- --skip-training: Skip model training and load from disk
- --model-path: Path to a saved model for loading
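
For readers who want to reuse the same interface in their own scripts, the documented options could be declared with `argparse` roughly as follows. This is a sketch reconstructed from the option list above, not a copy of main.py; the function name `build_parser` and the use of `None` to stand for "auto" are assumptions.

```python
import argparse


def build_parser():
    """Declare the documented command-line options (illustrative sketch only)."""
    parser = argparse.ArgumentParser(
        description="Graph autoencoder based network traffic anomaly detection"
    )
    parser.add_argument("--data", required=True, help="Path to the data file")
    parser.add_argument("--labels", default=None, help="Path to the labels file")
    parser.add_argument("--output", default="results", help="Directory to save results")
    parser.add_argument("--flow-timeout", type=float, default=60.0,
                        help="Flow timeout in seconds")
    parser.add_argument("--batch-size", type=int, default=100_000,
                        help="Batch size for processing")
    parser.add_argument("--epochs", type=int, default=100,
                        help="Number of training epochs")
    parser.add_argument("--multiprocessing", action="store_true",
                        help="Use multiprocessing for data loading")
    # Assumption: None means "choose the worker count automatically"
    parser.add_argument("--workers", type=int, default=None,
                        help="Number of worker processes (default: auto)")
    parser.add_argument("--skip-training", action="store_true",
                        help="Skip model training and load from disk")
    parser.add_argument("--model-path", default=None,
                        help="Path to a saved model for loading")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Note that `argparse` converts dashes in option names to underscores, so `--flow-timeout` is read back as `args.flow_timeout`.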
This implementation includes several optimizations to handle large-scale network traffic data:
- Memory-mapped I/O: Uses memory mapping for efficient file access
- NumPy Vectorization: Employs NumPy arrays for vectorized operations
- Batched Processing: Processes packets and constructs flows in batches
- Multi-processing: Utilizes multiple CPU cores for data processing
- Memory Management: Includes memory usage tracking and garbage collection
- Sparse Matrices: Uses sparse matrix representations for large graphs
- Efficient Data Structures: Uses optimized data structures and hash tables
- Graph Batch Processing: Builds graphs incrementally with buffer management
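
As an illustration of hash-table-based flow construction with a timeout, the sketch below groups time-ordered packets into unidirectional flows keyed by their 5-tuple. The `Packet` fields, the flow dictionary layout, and the function name `build_flows` are all hypothetical; flow_constructor.py may use different structures and rules.

```python
from collections import namedtuple

# Hypothetical packet record: timestamp, addresses, ports, protocol, size in bytes
Packet = namedtuple("Packet", "ts src dst sport dport proto size")


def build_flows(packets, flow_timeout=60.0):
    """Group time-ordered packets into flows keyed by their 5-tuple.

    A flow is closed when the gap since its last packet exceeds flow_timeout.
    """
    active = {}    # 5-tuple -> flow currently being built
    finished = []  # flows closed by the timeout
    for pkt in packets:
        key = (pkt.src, pkt.dst, pkt.sport, pkt.dport, pkt.proto)
        flow = active.get(key)
        if flow is not None and pkt.ts - flow["last"] > flow_timeout:
            finished.append(flow)  # idle for too long: close the old flow
            flow = None
        if flow is None:
            flow = {"key": key, "start": pkt.ts, "last": pkt.ts,
                    "packets": 0, "bytes": 0}
            active[key] = flow
        flow["last"] = pkt.ts
        flow["packets"] += 1
        flow["bytes"] += pkt.size
    finished.extend(active.values())  # flush flows still open at end of input
    return finished
```

The dictionary lookup keeps the cost per packet roughly constant, which is what makes processing packets in large batches practical.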
The code is organized into the following files:
- packet_parser.py: Parses packet data from input files
- flow_constructor.py: Builds flow objects from packets
- graph_constructor.py: Constructs graph representations from flows
- graph_autoencoder.py: Implements the Graph Autoencoder model
- main.py: Main execution script with command-line interface
The system generates several output files in the specified directory:
- Model weights (gae_model.pt)
- Training loss curve
- ROC and Precision-Recall curves (if labels are provided)
- Evaluation metrics and performance statistics
- Drift detection and adaptation results
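
To make the ROC-based evaluation concrete, the area under the ROC curve can be read as the probability that a randomly chosen anomalous sample receives a higher anomaly score than a randomly chosen normal one. The sketch below computes it directly from that definition. The function name `roc_auc` and the convention that label 1 marks an anomaly are assumptions for illustration, and the project may well rely on a library routine instead.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half).

    Assumes label 1 marks an anomaly and higher scores mean "more anomalous".
    Runs in O(n_pos * n_neg) time, so it is suited to small checks only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute ROC AUC")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```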
This software is provided as-is for research and educational purposes.