liuliucu1/wasfea

Graph Autoencoder for Network Anomaly Detection

An optimized Python implementation for network traffic anomaly detection using a Graph Autoencoder (GAE) approach.

Features

  • Efficient packet parsing and flow construction from network traffic data
  • Graph representation of network flows with optimized memory usage
  • Graph autoencoder-based anomaly detection
  • Concept drift detection and adaptation capabilities
  • Performance optimizations, including:
    • Memory-mapped file access for fast data loading
    • NumPy vectorized operations for faster calculations
    • Batch processing of packets and flows
    • Multiprocessing support for parallel data processing
    • Sparse matrix representations for large graphs
    • Memory usage optimization for large datasets
    • Efficient graph construction with incremental building
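To illustrate the idea behind graph autoencoder-based anomaly detection, here is a minimal sketch of scoring nodes by reconstruction error. It assumes an inner-product decoder (a common choice for graph autoencoders) and already-trained embeddings; the README does not state which decoder or scoring rule this repository uses, and the function and variable names are hypothetical. The dense double loop is for readability only and is not how a large graph would be processed.

```python
import math
from typing import Dict, List, Set, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def node_anomaly_scores(
    embeddings: Dict[str, List[float]],
    edges: Set[Tuple[str, str]],
) -> Dict[str, float]:
    """Mean squared reconstruction error per node (assumed scoring rule).

    The decoder predicts an edge between u and v as sigmoid(z_u . z_v).
    Nodes whose observed edges disagree with that prediction score higher.
    """
    nodes = sorted(embeddings)
    scores: Dict[str, float] = {}
    for u in nodes:
        error = 0.0
        for v in nodes:
            if u == v:
                continue
            dot = sum(a * b for a, b in zip(embeddings[u], embeddings[v]))
            predicted = sigmoid(dot)
            # Treat the graph as undirected for this illustration.
            actual = 1.0 if (u, v) in edges or (v, u) in edges else 0.0
            error += (predicted - actual) ** 2
        scores[u] = error / max(len(nodes) - 1, 1)
    return scores


# Hypothetical embeddings: "a" and "b" are close, "c" is far from both.
embeddings = {"a": [2.0, 0.0], "b": [2.0, 0.0], "c": [-2.0, 0.0]}
# The a-c edge contradicts the embeddings, so it raises the error of "a" and "c".
edges = {("a", "b"), ("a", "c")}
scores = node_anomaly_scores(embeddings, edges)
```

In this toy example the unexpected a-c edge gives "a" and "c" a higher score than "b", which is the behaviour an anomaly detector of this kind relies on.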

Requirements

See requirements.txt for dependencies.

Usage

# Install requirements
pip install -r requirements.txt

# Basic usage
python main.py --data /path/to/data.txt --labels /path/to/labels.txt --output results

# Advanced usage with optimization options
python main.py --data /path/to/data.txt --labels /path/to/labels.txt --output results \
  --flow-timeout 60.0 --batch-size 100000 --epochs 100 --multiprocessing --workers 8

Command Line Arguments

  • --data: Path to the data file (required)
  • --labels: Path to the labels file (optional)
  • --output: Directory to save results (default: 'results')
  • --flow-timeout: Flow timeout in seconds (default: 60.0)
  • --batch-size: Batch size for processing (default: 100000)
  • --epochs: Number of training epochs (default: 100)
  • --multiprocessing: Use multiprocessing for data loading
  • --workers: Number of worker processes (default: auto)
  • --skip-training: Skip model training and load a previously saved model from disk
  • --model-path: Path to the saved model to load
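The arguments above could be declared with argparse roughly as follows. This is a sketch reconstructed from the list, not the actual contents of main.py; in particular, using None to stand for the "auto" worker count is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Graph Autoencoder for network anomaly detection"
    )
    parser.add_argument("--data", required=True, help="Path to the data file")
    parser.add_argument("--labels", default=None, help="Path to the labels file")
    parser.add_argument("--output", default="results", help="Directory to save results")
    parser.add_argument("--flow-timeout", type=float, default=60.0,
                        help="Flow timeout in seconds")
    parser.add_argument("--batch-size", type=int, default=100000,
                        help="Batch size for processing")
    parser.add_argument("--epochs", type=int, default=100,
                        help="Number of training epochs")
    parser.add_argument("--multiprocessing", action="store_true",
                        help="Use multiprocessing for data loading")
    # Assumption: None represents the documented "auto" default.
    parser.add_argument("--workers", type=int, default=None,
                        help="Number of worker processes")
    parser.add_argument("--skip-training", action="store_true",
                        help="Skip model training and load from disk")
    parser.add_argument("--model-path", default=None,
                        help="Path to a saved model for loading")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
example_args = build_parser().parse_args(
    ["--data", "traffic.txt", "--epochs", "10", "--multiprocessing"]
)
```

argparse converts dashes in option names to underscores, so --flow-timeout is read back as example_args.flow_timeout.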

Performance Optimizations

This implementation includes several optimizations to handle large-scale network traffic data:

  1. Memory-mapped I/O: Uses memory mapping for efficient file access
  2. NumPy Vectorization: Employs NumPy arrays for vectorized operations
  3. Batched Processing: Processes packets and constructs flows in batches
  4. Multiprocessing: Uses multiple CPU cores for data processing
  5. Memory Management: Includes memory usage tracking and garbage collection
  6. Sparse Matrices: Uses sparse matrix representations for large graphs
  7. Efficient Data Structures: Uses optimized data structures and hash tables
  8. Graph Batch Processing: Builds graphs incrementally with buffer management
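As an illustration of how memory-mapped I/O and batched processing (items 1 and 3) can work together, the sketch below reads a text file through a read-only memory map and yields its lines in fixed-size batches. It assumes a line-oriented input file; the function name iter_line_batches is hypothetical and is not taken from packet_parser.py.

```python
import mmap
from typing import Iterator, List


def iter_line_batches(path: str, batch_size: int = 100_000) -> Iterator[List[bytes]]:
    """Yield lists of raw lines read through a read-only memory map."""
    with open(path, "rb") as f:
        # An empty file cannot be memory-mapped, so handle it explicitly.
        if f.seek(0, 2) == 0:
            return
        f.seek(0)
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            batch: List[bytes] = []
            # mm.readline() returns b"" once the end of the map is reached.
            for line in iter(mm.readline, b""):
                batch.append(line.rstrip(b"\r\n"))
                if len(batch) >= batch_size:
                    yield batch
                    batch = []
            if batch:
                yield batch
```

Each batch can then be parsed and released before the next one is read, which keeps peak memory bounded by the batch size rather than the file size.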

Implementation Components

  • packet_parser.py: Parses packet data from input files
  • flow_constructor.py: Builds flow objects from packets
  • graph_constructor.py: Constructs graph representations from flows
  • graph_autoencoder.py: Implements the Graph Autoencoder model
  • main.py: Main execution script with command-line interface
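The flow of data through these components (packets, then flows, then a graph) can be sketched as below. Everything here is an assumption made for illustration: the packet tuple layout, the 5-tuple flow key, hosts as graph nodes, and the names build_flows and build_graph are hypothetical and may differ from flow_constructor.py and graph_constructor.py. Only the flow timeout and its 60.0 second default come from the documented options.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Hypothetical packet layout:
# (timestamp, src_ip, dst_ip, src_port, dst_port, protocol, size_in_bytes)
Packet = Tuple[float, str, str, int, int, str, int]
FlowKey = Tuple[str, str, int, int, str]


@dataclass
class Flow:
    key: FlowKey
    start: float
    end: float
    packets: int = 0
    byte_count: int = 0


def build_flows(packets: Iterable[Packet], flow_timeout: float = 60.0) -> List[Flow]:
    """Group packets by 5-tuple; an idle gap longer than flow_timeout starts a new flow."""
    active: Dict[FlowKey, Flow] = {}
    finished: List[Flow] = []
    for ts, src, dst, sport, dport, proto, size in sorted(packets):
        key = (src, dst, sport, dport, proto)
        flow = active.get(key)
        if flow is not None and ts - flow.end > flow_timeout:
            finished.append(flow)  # the previous flow with this key timed out
            flow = None
        if flow is None:
            flow = active[key] = Flow(key, start=ts, end=ts)
        flow.end = ts
        flow.packets += 1
        flow.byte_count += size
    finished.extend(active.values())
    return finished


def build_graph(flows: Iterable[Flow]) -> Dict[str, Dict[str, int]]:
    """Sparse adjacency: only host pairs that exchanged flows are stored."""
    adjacency: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for flow in flows:
        src, dst = flow.key[0], flow.key[1]
        adjacency[src][dst] += 1  # edge weight = number of flows
    return {src: dict(neighbours) for src, neighbours in adjacency.items()}


packets = [
    (0.0, "a", "b", 1, 2, "tcp", 100),
    (10.0, "a", "b", 1, 2, "tcp", 50),
    (100.0, "a", "b", 1, 2, "tcp", 10),  # 90 s gap: starts a second a-b flow
    (5.0, "a", "c", 1, 2, "udp", 20),
]
flows = build_flows(packets, flow_timeout=60.0)
graph = build_graph(flows)
```

The nested dictionary stores only the edges that exist, which is the same idea as the sparse matrix representation mentioned above, in a form that needs no third-party library.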

Output

The system generates several output files in the specified directory:

  • Model weights (gae_model.pt)
  • Training loss curve
  • ROC and Precision-Recall curves (if labels are provided)
  • Evaluation metrics and performance statistics
  • Drift detection and adaptation results
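The README does not describe how drift is detected. One simple approach, shown here purely as an assumption, is to compare the mean anomaly score in a sliding window against a reference set of scores collected after training, and to flag drift when the difference is statistically large. The class name ScoreDriftDetector and its parameters are hypothetical.

```python
import math
from collections import deque
from statistics import mean, pstdev
from typing import Deque, List


class ScoreDriftDetector:
    """Flags drift when the recent mean score departs from a reference mean."""

    def __init__(self, reference: List[float], window: int = 50, threshold: float = 3.0):
        self.ref_mean = mean(reference)
        # Guard against a zero standard deviation in a constant reference set.
        self.ref_std = pstdev(reference) or 1e-12
        self.window: Deque[float] = deque(maxlen=window)
        self.threshold = threshold

    def update(self, score: float) -> bool:
        """Add one anomaly score; return True if drift is flagged."""
        self.window.append(score)
        if len(self.window) < self.window.maxlen:
            return False  # not enough recent scores yet
        # z-score of the window mean under the reference distribution
        std_error = self.ref_std / math.sqrt(len(self.window))
        z = abs(mean(self.window) - self.ref_mean) / std_error
        return z > self.threshold


reference_scores = [0.10, 0.20, 0.15, 0.12, 0.18] * 10
detector = ScoreDriftDetector(reference_scores, window=5, threshold=3.0)
stable = [detector.update(0.15) for _ in range(5)]  # scores near the reference mean
drifted = detector.update(0.9)                      # a sudden jump in score
```

A flagged drift would typically trigger adaptation, for example retraining or fine-tuning the model on recent data, after which the reference scores are refreshed.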

License

This software is provided as-is for research and educational purposes.
