Incorrect exponential calculation on Jetson devices with float32 dtype #61110
Comments
Note: my current workaround is to call
which gives the correct result but involves two type-casts. |
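A minimal sketch of what a double-cast workaround of this kind might look like, assuming the float64 path is unaffected (the exact call is not shown above, so this is a reconstruction):

import torch

t = torch.ones(3, 3, dtype=torch.float32)
# Hypothetical workaround: route exp through float64, then cast back (the two type-casts).
result = t.double().exp().float()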
Triage review: Maybe promote to High Pri? |
@BrettRyland can you please check if the torch.exp problems are reproducible with the official PyTorch cpu-only builds, which can be downloaded from https://pypi.org/project/torch/#files. gcc-7.5 has several known compiler bugs that yield incorrect code for NEON-optimized operations (that is, almost everything using float32). |
Using the cpu-only build torch-1.9.0-cp36-cp36m-manylinux2014_aarch64.whl on the Xavier NX gives the correct results in python:
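(For reference, a minimal sketch of such a check, assuming torch.exp of a ones tensor should be uniformly e ≈ 2.7183; torch.version.cuda is None on cpu-only builds:)

import torch

print(torch.__version__, torch.version.cuda)  # e.g. "1.9.0 None" on the cpu-only wheel
print(torch.ones(3, 3, dtype=torch.float32).exp())  # every entry should be ~2.7183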
and similarly in C++:
However, I need the CUDA version for my project. Note: numpy was not included in the dependencies of the wheel, giving
on my first attempt, and I then ran into this bug (numpy/numpy#18131) when installing the default version (1.19.5). I fixed it by pinning numpy to v1.19.4:
|
I've tried recompiling with gcc 8.4.0 (which is installable through apt on the Xavier NX) using
and it also gives the incorrect values for the exponential with float32 dtype. Build summary:
Also, I noticed this during the configuring stage:
so simply disabling NEON acceleration probably won't help. I've attached the CMakeCache.txt file in case that's useful. |
@BrettRyland I tried the same with GCC 8.4.0 and found the same behavior. Then I cherry-picked #47099 and this did fix it, with your test now passing. Here is the updated Jetson wheel: https://nvidia.box.com/shared/static/h1z9sw4bb1ybi0rm3tu8qdj8hs05ljbm.whl Interestingly, this is not a problem with the PyTorch v1.8 wheel and appears to be a regression in v1.9. |
OK, so cmake does eventually find NEON through testing for ASIMD:
I see the wheel you compiled was built with gcc-7.5.0 (which works, thanks!). However, cherry-picking #47099 and compiling with gcc-8.4.0 still fails: the gcc version check in that commit is for gcc >= 8.4, which gcc-8.4.0 passes, so the workaround is not applied. |
@dusty-nv I'm glad that fix works for you. Please note that disabling NEON acceleration would have a negative effect on the CPU performance of PyTorch. |
The best way to solve the whole issue is to use clang instead of GCC; NEON acceleration then remains enabled. As you can see, the error also occurs on a Raspberry Pi running a 64-bit OS. |
Hi, the exp function with dtype=torch.float32 in libtorch still gives incorrect output when running on a Jetson Xavier. I recompiled libtorch-1.10 with gcc 8.4.0:

git clone -b v1.10.0 --recursive https://github.com/pytorch/pytorch.git
# install dependencies: apt install ...
# export environment variables: export USE_CUDA=ON ...
python3 ./tools/build_libtorch.py

Test code:

#include <torch/torch.h>
#include <iostream>

void check_torch_exp() {
c10::InferenceMode guard(true); // Note: gives the same result without this.
auto t = torch::ones({3, 3}, torch::dtype(torch::kFloat32));
std::cout << "t:\n" << t << "\n";
std::cout << "t.exp():\n" << t.exp() << "\n";
std::cout << "t.hypot:\n" << torch::hypot(t, t) << "\n";
std::cout << "t.sigmoid_:\n" << t.sigmoid_() << "\n";
std::cout << "---------------------------------------------" << std::endl;
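// Repeat the same checks on CUDA; per the output below, the CUDA results come out correct.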
auto t_cuda = torch::ones({3, 3}, torch::dtype(torch::kFloat32)).to(torch::kCUDA);
std::cout << "t_CUDA:\n" << t_cuda << "\n";
std::cout << "t_CUDA.exp():\n" << t_cuda.exp() << "\n";
std::cout << "t_CUDAhypot:\n" << torch::hypot(t_cuda, t_cuda) << "\n";
std::cout << "t_CUDA.sigmoid_:\n" << t_cuda.sigmoid_() << "\n";
std::cout << "---------------------------------------------" << std::endl;
}

CMake output:

-- The C compiler identification is GNU 8.4.0
-- The CXX compiler identification is GNU 8.4.0
-- The CUDA compiler identification is NVIDIA 10.2.300
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working CUDA compiler: /usr/local/cuda-10.2/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda-10.2/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found CUDA: /usr/local/cuda-10.2 (found version "10.2")
-- Caffe2: CUDA detected: 10.2
-- Caffe2: CUDA nvcc is: /usr/local/cuda-10.2/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda-10.2
-- Caffe2: Header version is: 10.2
-- Found CUDNN: /usr/lib/aarch64-linux-gnu/libcudnn.so
-- Found cuDNN: v8.2.1 (include: /usr/include, library: /usr/lib/aarch64-linux-gnu/libcudnn.so)
-- /usr/local/cuda-10.2/lib64/libnvrtc.so shorthash is 7d272a04
-- Autodetected CUDA architecture(s): 7.2
-- Added CUDA NVCC flags for: -gencode;arch=compute_72,code=sm_72
-- Found Torch: /data/zhangmm/opensource/pytorch/pytorch_1.10_GCC8.4/build_libtorch/lib/libtorch.so
-- Configuring done
-- Generating done
-- Build files have been written to: /home/nvidia/zhangmm/kd-distress-processor/test/debug/build

Final output:

t:
1 1 1
1 1 1
1 1 1
[ CPUFloatType{3,3} ]
t.exp():
2.7183 1.0000 1.0000
1.0000 1.0000 1.0000
0.0000 1.0000 2.7183
[ CPUFloatType{3,3} ]
t.hypot:
1.4142 1.4142 1.4142
1.4142 1.4142 1.4142
1.4142 1.4142 1.4142
[ CPUFloatType{3,3} ]
t.sigmoid_:
0.7311 0.7311 0.7311
0.7311 0.7311 0.7311
0.7311 0.7311 0.7311
[ CPUFloatType{3,3} ]
---------------------------------------------
t_CUDA:
1 1 1
1 1 1
1 1 1
[ CUDAFloatType{3,3} ]
t_CUDA.exp():
2.7183 2.7183 2.7183
2.7183 2.7183 2.7183
2.7183 2.7183 2.7183
[ CUDAFloatType{3,3} ]
t_CUDA.hypot:
1.4142 1.4142 1.4142
1.4142 1.4142 1.4142
1.4142 1.4142 1.4142
[ CUDAFloatType{3,3} ]
t_CUDA.sigmoid_:
0.7311 0.7311 0.7311
0.7311 0.7311 0.7311
0.7311 0.7311 0.7311
[ CUDAFloatType{3,3} ] |
These errors are typical when building with a GNU compiler such as gcc 8.4.0. |
🐛 Bug
The exp function in torch and libtorch (https://pytorch.org/docs/stable/generated/torch.exp.html#torch.exp) gives incorrect output when run on Jetson devices (tested on Xavier NX and Nano) with dtype=torch.float32.

To Reproduce
test.cpp:
CMakeLists.txt:
Compile and run:
In python:
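(A minimal sketch of such a check, assuming the same ones-tensor test as the C++ code above:)

import torch

print(torch.ones(3, 3, dtype=torch.float32).exp())  # incorrect on affected Jetson builds
print(torch.ones(3, 3, dtype=torch.float64).exp())  # float64 reference: all entries ~2.7183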
Expected behavior
Similar values as when calculated with float64.
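For instance, one might assert agreement with the float64 result (a hedged sketch, assuming a tolerance appropriate to float32 precision):

import torch

t32 = torch.ones(3, 3, dtype=torch.float32).exp()
t64 = torch.ones(3, 3, dtype=torch.float64).exp()
# On an unaffected build, these agree to float32 precision.
assert torch.allclose(t32.double(), t64, rtol=1e-6)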
Environment
Additional Context
I'd noticed this behaviour on older versions of libtorch as strange results from a custom NMS function, but have only now traced those results to this source, so I'm not sure how old the error is.
cc @malfet @ngimel