A Universal Benchmarking Framework for PyTorch 2 and TensorFlow 2 Performance.
Evaluate and compare GPU and CPU performance with high accuracy and repeatability using PyTorch 2.9, TensorFlow 2.20, pytest, and detailed Allure reporting. This robust framework offers out-of-the-box support for heterogeneous hardware, including NVIDIA, AMD, Intel, DirectML, and standard CPU-only execution. Generate clear performance metrics and interactive dashboards to quickly identify bottlenecks and optimize model execution across any accelerator.
This project is built on a comprehensive CI/CD pipeline and an automated Kubernetes deployment workflow:
- Testing: Developers run benchmarks locally using Pytest with specific markers (`-m gpu`, `-m cpu`) to validate performance and collect detailed results.
- CI/CD (GitHub Actions / Jenkins): The `ci.yml` workflow in GitHub Actions (or an equivalent Jenkins pipeline) is triggered upon code changes.
  - It executes the benchmark tests against various hardware configurations.
  - It uses Docker to ensure a consistent, reproducible environment for testing.
  - It generates Allure Reports and plots system metrics (`scripts/plot_gpu_metrics.py`).
- Docker Image Creation: Using one of the provided `Dockerfile` variants (`Dockerfile.mini`, `Dockerfile.report`), a Docker image containing the test environment, report server, and dependencies is built.
- Registry Push: The final image is tagged and pushed to Docker Hub (or a private registry).
The deploy_gpu_workflow.py script manages the final deployment to a Kubernetes cluster:
- Cluster Cleanup: It first runs `kubectl delete deployment --all` for a clean state.
- Dynamic GPU Detection: It scans cluster nodes for available extended GPU resources (e.g., `gpu.intel.com/i915`, `nvidia.com/gpu`); see the sketch after this list.
- Resource Allocation: The deployment manifest is dynamically configured to request the detected GPU resource or fall back to standard CPU limits (1 core / 1Gi).
- Deployment & Access: It creates the optimized Kubernetes Deployment and Service. Once the Pod is running, it initiates a blocking `kubectl port-forward` to map the cluster service (port 80) to your local machine (port 8080), allowing instant, interactive access to the Allure Report dashboard via `http://127.0.0.1:8080`.
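For illustration, the node-scanning step could look roughly like the sketch below. This is a minimal sketch, not the actual `deploy_gpu_workflow.py` code; it assumes `kubectl` is on PATH and pointed at the target cluster, and the `amd.com/gpu` key is included here only as an example.

```python
# Illustrative sketch only -- the real logic lives in deploy_gpu_workflow.py.
# Assumes `kubectl` is installed and the current context points at the target cluster.
import json
import subprocess

GPU_RESOURCE_KEYS = ("nvidia.com/gpu", "gpu.intel.com/i915", "amd.com/gpu")

def detect_gpu_resource() -> str | None:
    """Return the first extended GPU resource advertised by any cluster node."""
    out = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    for node in json.loads(out).get("items", []):
        allocatable = node.get("status", {}).get("allocatable", {})
        for key in GPU_RESOURCE_KEYS:
            if int(allocatable.get(key, "0")) > 0:
                return key
    return None  # caller falls back to CPU limits (1 core / 1Gi)

if __name__ == "__main__":
    resource = detect_gpu_resource()
    print(f"Detected GPU resource: {resource or 'none (CPU fallback)'}")
```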
This framework implements GPU/CPU performance benchmarking using PyTorch, TensorFlow, pytest, and pytest-benchmark, and leverages CI/CD with Kubernetes and Docker.
It automatically detects available accelerators, measures inference throughput, GPU/CPU utilization, I/O, memory usage, etc., and produces interactive Allure reports for analysis.
| Component | Technology | Role |
|---|---|---|
| Test Runner | pytest | Executes benchmark and stress tests. |
| Performance Metrics | pytest-benchmark / SystemMetrics | Measures FPS, CPU/GPU utilization, memory usage. |
| GPU Detection | gpu_check.py | Detects NVIDIA CUDA, AMD ROCm, Intel GPU, DirectML, or CPU fallback. |
| Reporting | Allure | Generates professional, interactive HTML dashboards with charts. |
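As a rough illustration of the detection logic, the sketch below walks a CUDA/ROCm → DirectML → Intel XPU → CPU fallback chain. It is not the actual `supports/gpu_check.py` code; `torch_directml` and the `torch.xpu` backend are optional dependencies, and the function name `detect_device` is hypothetical.

```python
# Minimal sketch of an accelerator-detection fallback chain, in the spirit of
# supports/gpu_check.py (the shipped module may differ).
import torch

def detect_device() -> str:
    """Pick the best available backend: CUDA/ROCm -> DirectML -> Intel XPU -> CPU."""
    if torch.cuda.is_available():              # covers NVIDIA CUDA and AMD ROCm builds
        return "cuda"
    try:
        import torch_directml                  # Windows / WSL DirectML plugin, if installed
        if torch_directml.device_count() > 0:
            return "dml"
    except ImportError:
        pass
    if hasattr(torch, "xpu") and torch.xpu.is_available():  # Intel GPUs via oneAPI/XPU
        return "xpu"
    return "cpu"

print(f"Selected device: {detect_device()}")
```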
- Python 3.10+ (recommended)
- Optional: Allure command-line tool for report viewing
- Windows or Linux system with GPU support (optional; CPU-only fallback is supported)
Clone the repository:
git clone https://github.com/luckyjoy/gpu_benchmark.git
cd gpu_benchmark

Run the setup script to create a virtual environment, install dependencies, and detect GPU on localhost.
Usage: python gpu_benchmark.py <Build_Number> [suite]
Example: python gpu_benchmark.py 4 gpu
-> Runs the GPU benchmark suite with build number 4 on localhost and generates an Allure report (no CI/CD, Kubernetes, or Docker).
Usage: python run_kubernestes.py <Build_Number> [Suite_marker or tests] [Dockerfile]
Example: python run_kubernestes.py 1 tests/test_data_preprocessing.py Dockerfile.custom
Example: python run_kubernestes.py 1 -m gpu Dockerfile.custom
Defaults: Dockerfile=Dockerfile.mini, Test=tests/test_data_preprocessing.py
-> Builds Docker images, runs GPU Benchmark [Suite_marker or tests] with build number 1,
-> Generates Allure report, and pushes Docker images to Docker Hub.
Usage: python deploy_gpu_workflow.py <Build_Number>
Example: python deploy_gpu_workflow.py 1
-> Creates the necessary Pod deployment and service, monitors Pod creation & reports scheduling events.
-> Deploys Docker image (build tag number 1) from Docker Hub, assigns a worker to run the Docker image within the assigned Pod.
-> Generates Allure Report.
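A simplified sketch of the "wait for the Pod, then port-forward" step is shown below. It is illustrative only (the real `deploy_gpu_workflow.py` may behave differently) and assumes a Deployment and Service named `gpu-benchmark` in the default namespace.

```python
# Illustrative sketch of the wait-then-port-forward step; names are assumptions,
# not the actual deploy_gpu_workflow.py code.
import subprocess

def wait_for_ready(deployment: str = "gpu-benchmark", timeout: int = 300) -> None:
    """Block until the Deployment reports a successful rollout."""
    subprocess.run(
        ["kubectl", "rollout", "status", f"deployment/{deployment}",
         f"--timeout={timeout}s"],
        check=True,
    )

def port_forward(service: str = "gpu-benchmark", local: int = 8080, remote: int = 80) -> None:
    """Blocking port-forward so the Allure dashboard is reachable at http://127.0.0.1:8080."""
    subprocess.run(
        ["kubectl", "port-forward", f"service/{service}", f"{local}:{remote}"],
        check=True,
    )

if __name__ == "__main__":
    wait_for_ready()
    port_forward()
```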
The setup script will:
- Create `venv310` if missing
- Detect available GPU or fall back to CPU (see the sketch after this list)
- Install all required packages from `requirements.txt`
- Run benchmark tests and store results in `allure-results/`
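Conceptually, that flow condenses into a few lines of Python, as in the sketch below. The marker name and paths are illustrative, and `gpu_benchmark.py` itself does considerably more (GPU detection, build numbering, report generation).

```python
# Condensed, illustrative sketch of the setup flow described above.
import os
import subprocess
import sys
import venv

VENV_DIR = "venv310"

def ensure_venv() -> str:
    """Create venv310 if missing and return the path to its Python interpreter."""
    if not os.path.isdir(VENV_DIR):
        venv.create(VENV_DIR, with_pip=True)
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    return os.path.join(VENV_DIR, bin_dir, "python")

def run_suite(marker: str = "gpu") -> None:
    py = ensure_venv()
    subprocess.run([py, "-m", "pip", "install", "-r", "requirements.txt"], check=True)
    subprocess.run(
        [py, "-m", "pytest", "-m", marker, "--alluredir=allure-results", "-v"],
        check=True,
    )

if __name__ == "__main__":
    run_suite(sys.argv[1] if len(sys.argv) > 1 else "gpu")
```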
Ensure consistent results across systems by running inside Docker.
Image: `gpu-benchmark:latest`, which includes:
- Python 3.10 or 3.11 environment
- Required preinstalled packages
- Allure CLI for reporting
- `/app` as working directory
Script: run_docker.bat (Windows)
Workflow:
| Step | Description |
|---|---|
| 1 | Check Docker | Verifies Docker Desktop is running. |
| 2 | Clean Up | Deletes previous allure-results and .benchmarks. |
| 3 | Build / Pull | Builds or updates Docker image. |
| 4 | Execute Tests | Runs GPU/CPU benchmark suite. |
| 5 | Generate Report | Produces Allure HTML output. |
| 6 | Serve Report | Opens Allure dashboard locally. |
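The same workflow can be driven from Python roughly as in the sketch below, assuming Docker and the Allure CLI are installed locally. The image tag, mount path, and marker are illustrative; `run_docker.py` / `run_docker.bat` may differ in detail.

```python
# Rough sketch of the Docker workflow steps above; not the scripts' actual code.
import os
import subprocess

def run(cmd: list[str]) -> None:
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def docker_benchmark(marker: str = "gpu") -> None:
    results = os.path.abspath("allure-results")
    os.makedirs(results, exist_ok=True)

    run(["docker", "build", "-t", "gpu-benchmark:latest", "."])           # build / update image
    run(["docker", "run", "--rm",
         "-v", f"{results}:/app/allure-results",                          # collect raw results on the host
         "gpu-benchmark:latest",
         "pytest", "-m", marker, "--alluredir=allure-results", "-v"])     # execute the suite
    run(["allure", "generate", "allure-results", "-o", "allure-report", "--clean"])
    run(["allure", "open", "allure-report"])                              # serve the dashboard locally

if __name__ == "__main__":
    docker_benchmark()
```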
Command to execute:
run_docker.bat

gpu_benchmark/
├── Dockerfile                  # Main Docker build file
├── Dockerfile.mini             # Minimal Docker build file
├── Dockerfile.report           # Docker build file for the report server
├── Jenkinsfile                 # Option for Jenkins CI pipelines
├── README.md
├── requirements.txt
├── pytest.ini                  # Pytest configuration
├── g.bat                       # Convenience batch file
├── gpu_benchmark.py            # Main setup & execution script
├── deploy_gpu_workflow.py      # Kubernetes GPU auto-detection and deployment script
├── run_docker.py               # Script to run tests inside Docker
├── run_gpu_benchmark.bat       # Windows batch script to run benchmarks
├── run_kubernestes.py          # Kubernetes execution wrapper
├── configs/                    # Kubernetes job configurations
├── allure-report/              # Dynamic history report files
├── allure-results/             # Pytest-Allure raw results directory
├── images/                     # Documentation image assets
│   ├── allure_report.jpg
│   └── gpu_cpu_utilization.png
├── scripts/                    # Utility scripts for metrics, plotting, and trend analysis
│   ├── __init__.py
│   ├── gpu_utils.py
│   ├── plot_gpu_metrics.py     # Generate charts for Allure
│   ├── system_metrics.py       # Capture CPU/GPU system metrics
│   └── update_trend.py
├── supports/                   # GPU detection and telemetry logic
│   ├── __init__.py
│   ├── categories.json
│   ├── environments.properties
│   ├── executor.json
│   ├── gpu_check.py            # Detects available hardware devices
│   ├── gpu_monitor.py          # Real-time GPU monitoring
│   ├── performance_trend.py
│   ├── telemetry_collector.py  # Gathers performance data
│   ├── telemetry_hook.py
│   ├── telemetry_trend.py
│   ├── telemetry_visualizer.py
│   ├── ubuntu.properties
│   └── windows.properties
├── tests/                      # Benchmark test cases
│   ├── __init__.py
│   ├── conftest.py             # Pytest fixtures and hooks
│   ├── device_utils.py         # Utilities for device handling
│   ├── test_amd_gpu_accelerator.py
│   ├── test_cpu_reference.py   # CPU-only benchmarks
│   ├── test_data_preprocessing.py
│   ├── test_directml_gpu_accelerator.py
│   ├── test_gpu_compute.py
│   ├── test_gpu_convnet.py
│   ├── test_gpu_matrix_mul.py
│   ├── test_gpu_memory.py
│   ├── test_gpu_mixed_precision.py
│   ├── test_gpu_model_inference.py
│   ├── test_gpu_stress.py
│   ├── test_gpu_tensorflow_benchmark.py
│   ├── test_gpu_transformer.py
│   ├── test_idle_baseline.py
│   ├── test_inference_load.py
│   ├── test_intel_gpu_accelerator.py
│   ├── test_io_accelerator.py
│   ├── test_multi_gpu.py
│   ├── test_network_io_accelerator.py
│   ├── test_nvidia_gpu_accelerator.py
│   ├── test_nvidia_real_gpu.py
│   ├── test_nvidia_tensorrt_cudnn.py
│   └── test_parallel_training.py
├── .github/                    # GitHub Actions CI/CD workflows
├── venv310/                    # Optional virtual environment
└── .benchmarks/                # Pytest-benchmark history
The framework uses Pytest Markers (-m) to categorize and select specific test suites for execution.
| Tag | Focus Area | Description |
|---|---|---|
| `gpu` | Core Benchmark | Tests running on any available accelerator (CUDA / ROCm / DirectML / Intel GPU). |
| `cpu` | Fallback / Reference | Tests running on CPU fallback. |
| `nvidia` | NVIDIA-Specific | Tests targeting NVIDIA CUDA features (e.g., CUDA, Tensor Cores). |
| `amd` | AMD-Specific | Tests targeting AMD ROCm features. |
| `intel` | Intel-Specific | Tests targeting Intel oneAPI / i915 features. |
| `directml` | DirectML-Specific | Tests targeting DirectML features (Windows / WSL). |
| `benchmark` | Performance Metric | Measures FPS, utilization, memory, and throughput. |
| `stress` | Endurance / Load | Heavy-load GPU endurance tests. |
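For reference, a hypothetical test combining these markers with the pytest-benchmark fixture might look like the sketch below; it is not one of the tests shipped in `tests/`.

```python
# Hypothetical example of a marker-tagged benchmark test (not part of tests/).
import pytest
import torch

@pytest.mark.gpu
@pytest.mark.benchmark
def test_matmul_throughput(benchmark):
    """Benchmark a 1024x1024 matmul on the best available device."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    a = torch.randn(1024, 1024, device=device)
    b = torch.randn(1024, 1024, device=device)

    def step():
        c = a @ b
        if device == "cuda":
            torch.cuda.synchronize()  # make timings reflect actual GPU work
        return c

    result = benchmark(step)
    assert result.shape == (1024, 1024)
```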
Use the commands below to execute specific test suites and generate Allure data locally.
| Execution Mode | Command |
|---|---|
| Run All GPU Benchmarks | pytest -m gpu --alluredir=allure-results -v |
| Run All CPU Benchmarks | pytest -m cpu --alluredir=allure-results -v |
| Run Specific Tag (e.g., Performance + GPU) | pytest -m "benchmark and gpu" --alluredir=allure-results |
| Run GPU/CPU Combined | pytest -m "benchmark or gpu or cpu" |
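To keep the same commands usable on machines without an accelerator, a `conftest.py` hook can auto-skip `gpu`-marked tests. The sketch below is one possible approach and may differ from the shipped `tests/conftest.py`.

```python
# Sketch of a conftest.py hook that skips gpu-marked tests when no accelerator
# is present; illustrative only.
import pytest
import torch

def _has_accelerator() -> bool:
    return torch.cuda.is_available()  # extend with DirectML / XPU checks as needed

def pytest_collection_modifyitems(config, items):
    if _has_accelerator():
        return
    skip_gpu = pytest.mark.skip(reason="No GPU accelerator detected; CPU-only run")
    for item in items:
        if "gpu" in item.keywords:
            item.add_marker(skip_gpu)
```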
python -m venv venv310
Linux/macOS (Bash/Zsh): source venv310/bin/activate
Windows (Command Prompt): call venv310\Scripts\activate
pytest --alluredir=allure-results
pytest tests/test_gpu_tensorflow_benchmark.py --alluredir=allure-results
pytest -m "gpu or cpu" --alluredir=allure-results
pytest -m gpu --alluredir=allure-results
allure serve allure-results
Allure Report Preview:
Opens an interactive HTML dashboard with detailed execution insights.
pytest --html=reports/report.html --self-contained-html

| System | Description |
|---|---|
| Jenkins / GitHub Actions | Automates test execution and report generation |
| Docker | Guarantees repeatable benchmark environments |
| Allure | Produces professional dashboards for CI/CD pipelines |
- Fork the repository
- Create a feature branch
- Implement new tests, benchmarks, or reporting features
- Run `pytest -v` locally and verify results
- Submit a Pull Request with a clear description
Code Style:
- Follow PEP8 conventions
- Use pytest markers consistently
- Ensure Allure reports generate without errors
- Document new metrics or tests in Allure charts
Released under the MIT License: free to use, modify, and distribute.
Contact: Bang Thien Nguyen, [email protected]
"Measure performance before you optimize; know your hardware before you test your code."


