Software-Only Memory-Safety for GPUs via PTX Guards and Managed Shadow Metadata
sf-nvcc is a drop-in replacement for nvcc that automatically injects spatial
and temporal memory safety checks into CUDA kernels.
It instruments the generated PTX to track and validate global memory accesses
at runtime, catching out-of-bounds and temporal access violations in CUDA code.
When a violation is detected, the kernel is killed and a runtime error is
raised the next time cudaDeviceSynchronize or cudaGetLastError is called.
It wraps some of the main CUDA functions, such as cudaMalloc,
cudaMallocManaged, and cudaFree, and builds a table in memory, with each entry
taking 32 bytes (for alignment reasons), against which bounds checks are
performed at runtime.
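The table-plus-check idea described above can be sketched on the host in plain C++. This is a minimal illustration, not SafeCUDA's actual layout: the `AllocEntry` type, its field names (`base`, `size`, `live`), and the linear `find_entry` scan are all assumptions. Only the 32-byte entry size comes from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical metadata entry; field names and layout are assumptions.
// alignas(32) pads the payload up to the 32-byte entry size.
struct alignas(32) AllocEntry {
    std::uintptr_t base;  // first byte of the allocation
    std::uint64_t  size;  // allocation size in bytes
    std::uint64_t  live;  // 1 while allocated, 0 after free (temporal check)
};
static_assert(sizeof(AllocEntry) == 32, "entry must occupy 32 bytes");

enum class Access { Ok, OutOfBounds, UseAfterFree };

// Linear scan for the entry whose range contains addr; nullptr if none does.
const AllocEntry* find_entry(const std::vector<AllocEntry>& table,
                             std::uintptr_t addr) {
    for (const AllocEntry& e : table) {
        if (addr >= e.base && addr - e.base < e.size) return &e;
    }
    return nullptr;
}

// Validate an access of `width` bytes starting at `addr`.
Access check_access(const std::vector<AllocEntry>& table,
                    std::uintptr_t addr, std::uint64_t width) {
    const AllocEntry* e = find_entry(table, addr);
    if (e == nullptr) return Access::OutOfBounds;
    if (e->live == 0) return Access::UseAfterFree;
    // The whole access must fit inside the allocation.
    if (width > e->size - (addr - e->base)) return Access::OutOfBounds;
    return Access::Ok;
}
```

In this sketch a freed allocation keeps its entry with `live` set to 0, so a later access to the same range is reported as a temporal violation rather than as an unknown address.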
It is split into two parts: a compile-time part, sf-nvcc, which is responsible
for instrumenting the PTX with the bounds_check mechanism, and a runtime part,
libsafecuda.so, which acts as a drop-in solution by letting the user simply
set LD_PRELOAD so that the wrapper CUDA functions are called.
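Interposition through `LD_PRELOAD` typically relies on the dynamic linker resolving a symbol to the preloaded library first, with the wrapper then finding the next definition via `dlsym(RTLD_NEXT, ...)`. The sketch below shows only that lookup pattern. It is an assumption about how a library such as libsafecuda.so might forward calls, not a description of its code. The helper `resolve_next` and the wrapper `counted_strlen` are hypothetical names, and `strlen` stands in for a CUDA runtime function so that the example builds without the CUDA toolkit.

```cpp
#include <cassert>
#include <cstddef>
#include <dlfcn.h>  // dlsym, RTLD_NEXT (glibc; g++ defines _GNU_SOURCE by default)

// Look up the next definition of `name` after the calling object in the
// dynamic linker's search order. Returns nullptr if there is none.
template <typename Fn>
Fn resolve_next(const char* name) {
    return reinterpret_cast<Fn>(dlsym(RTLD_NEXT, name));
}

// Number of calls that went through the wrapper (illustrative bookkeeping).
static std::size_t g_calls = 0;

// Hypothetical wrapper: does its own bookkeeping, then forwards to the
// real function that was resolved once on first use.
std::size_t counted_strlen(const char* s) {
    using StrlenFn = std::size_t (*)(const char*);
    static StrlenFn real_strlen = resolve_next<StrlenFn>("strlen");
    ++g_calls;
    return real_strlen != nullptr ? real_strlen(s) : 0;
}
```

On older glibc versions this needs `-ldl` at link time. A real preloaded library would export the wrapper under the original symbol name, so that calls from the application reach it first.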
Install the following dependencies:
- gcc-14
- g++-14
- cmake
- ninja
- cuda-toolkit-12-9 (https://developer.nvidia.com/cuda-downloads)
- gtest
- gtest-devel
- zlib
Note: Ensure that the CUDA toolkit is installed in /usr/local/cuda or
/opt/cuda/.
Note: Ensure that gcc-14 and g++-14 are symlinked to gcc and g++
respectively in CUDA's bin directory.
sudo ln -s /usr/bin/gcc-14 /usr/local/cuda/bin/gcc
sudo ln -s /usr/bin/g++-14 /usr/local/cuda/bin/g++
Note: If compilation fails due to math function specifications (this is a known issue), apply the patch with (run as root):
sh ./scripts/apply_math_fix.sh
- Clone the repository
- Navigate to the repository directory:
cd SafeCUDA
- Run the build script:
sh ./scripts/build.sh [Debug|Release]
Replace [Debug|Release] with your desired build type (default is Debug).
- If you want, you can test the build using:
sh ./scripts/test.sh
and run a test example using
sh ./scripts/build_and_run_example.sh
Or you can build and run ctest using:
sh ./scripts/build_and_test.sh [Debug|Release]
- After building, you can use sf-nvcc from either of the build directories (examples will use the Release build) to build your CUDA program:
cmake-build-Release/sf-nvcc [SafeCUDA options] <standard nvcc args>
For example:
cmake-build-Release/bin/sf-nvcc -Xcompiler -fPIC --extended-lambda --expt-relaxed-constexpr --generate-code arch=compute_75,code=sm_75 -rdc=true examples/example.cpp examples/out_of_bounds.cu -o example -Wno-deprecated-gpu-targets -sf-keep-dir "cmake-build-Debug/examples" -sf-debug true
- After successful compilation with sf-nvcc, you can run your executable after exporting these variables in your terminal session:
export LD_PRELOAD=cmake-build-Release/libsafecuda.so:$LD_PRELOAD
export LD_LIBRARY_PATH=cmake-build-Release/:$LD_LIBRARY_PATH
<your-executable>
For example (if you followed the previous build example):
export LD_PRELOAD=cmake-build-Release/libsafecuda.so:$LD_PRELOAD
export LD_LIBRARY_PATH=cmake-build-Release/:$LD_LIBRARY_PATH
./example
Note: In the current build, the following CUDA functions are wrapped:
- cudaMalloc
- cudaMallocManaged
- cudaFree
- cudaDeviceSynchronize
- cudaLaunchKernel
- cudaGetLastError
However, the real functions can still be accessed by prefixing their names
with real_, for example real_cudaMalloc.
Here's a cheatsheet for the sf-nvcc-specific arguments; a detailed description
can be obtained from sf-nvcc -sf-help.
| Option | Description |
|---|---|
| -sf-help | Show SafeCUDA help |
| -sf-version | Print version/build info |
| -sf-debug <true \| FALSE> | Enable detailed PTX instrumentation logging |
| -sf-verbose <true \| FALSE> | Show full compile output |
| -sf-fail-fast <TRUE \| false> | Abort on first violation |
| -sf-keep-dir <path> | Keep intermediate build files |
Note: The following nvcc args are stripped out:
- -dryrun
- --keep
- --keep-dir
- -lcudart_static
A test suite with some representative workload kernels is present in the
perf_tests folder. You can use the runner script perf_tests/run_perf.py to
benchmark them and get the results as a CSV file in
perf_tests/results/perf_results_<timestamp>.csv.
| Test | Description |
|---|---|
| perf1 | Large vector ops |
| perf2 | Parallel sum reduction |
| perf3 | Memory copy + scaling |
| perf4 | Realistic HPC / ML compute loop |
| perf5 | Synthetic memory stress (~4 GB traffic) |
| perf6 | Multi-allocation test (1000 buffers, ~1 GB traffic) |
Format the code using clang-format by running:
sh ./scripts/format.sh
Note: Please enable the pre-commit hook, which automatically formats the code and tests changes in a Release build before committing. You can enable it by running:
sh ./scripts/enable_pre_commit_hook.sh
- Use of curand functions will trigger a SIGSEGV on the device due to how nullptrs are used there.