🐛 Describe the bug
When installing functorch alongside a different PyTorch wheel (torch 1.12 {cpu, cu102, cu113, cu116}) than the one it was built with, we see either
- missing symbol errors on import functorch, or
- exception handling issues in functorch, where an exception produces unexpected output.
Independently, torchtext exhibits the same exception handling issue.
These seem to stem from the torch (cpu, cu113, cu116) wheels exporting different symbols than the torch (cu102) wheels. Possibly related: pytorch/builder#1028.
We (@malfet and I) are not sure if this is a problem with PyTorch or the way we build extensions. FWIW this did not happen during the last functorch releases (0.1.x).
functorch repro
See pytorch/functorch#916 for original issue.
Case 1: built functorch against the torch 1.12 (cpu) wheels.
- When installing functorch with torch (cu102), on the AWS cluster, import torch; import functorch errors with missing symbol _ZNSt19basic_ostringstreamIcSt11char_traitsIcESaIcEEC1Ev
- When installing functorch with torch (cpu, cu113, cu116), there is no noticeable problem
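To narrow down which shared library defines or merely references the missing symbol from Case 1, the dynamic symbol table of each library can be inspected. The following is a minimal sketch, not part of the original report: the helper names `symbol_status` and `check_library` are hypothetical, and it assumes the binutils `nm` tool is available on the machine.

```python
import subprocess

# Mangled symbol reported as missing in Case 1.
SYMBOL = "_ZNSt19basic_ostringstreamIcSt11char_traitsIcESaIcEEC1Ev"

# nm type letters that mean "referenced here but defined elsewhere".
UNDEFINED_KINDS = {"U", "w", "v"}


def symbol_status(nm_output: str, symbol: str) -> str:
    """Classify `symbol` in `nm -D` output as 'defined', 'undefined', or 'absent'."""
    status = "absent"
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        # The last field is the symbol name; some binutils versions
        # append a version suffix such as @GLIBCXX_x.y.z, so strip it.
        name = fields[-1].split("@", 1)[0]
        if name != symbol:
            continue
        if fields[-2] in UNDEFINED_KINDS:
            status = "undefined"
        else:
            return "defined"
    return status


def check_library(path: str, symbol: str = SYMBOL) -> str:
    """Run `nm -D` on a shared library (hypothetical helper) and classify the symbol."""
    result = subprocess.run(
        ["nm", "-D", path], capture_output=True, text=True, check=True
    )
    return symbol_status(result.stdout, symbol)
```

Running `check_library` on `functorch/_C.so` and on the libraries under `torch/lib/` for each wheel variant would show where the symbol is expected and where it is actually provided.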
Case 2: built functorch against the torch 1.12 (cu102) wheels
- When installing functorch with torch (cu102): repro.py gives the expected output
- When installing functorch with torch (cpu, cu113, cu116): repro.py gives unexpected output
# repro.py
import torch
from functorch import vmap
x = torch.randn(2, 3, 5)
vmap(lambda x: x, out_dims=3)(x)
Expected output
>>> vmap(lambda x: x, out_dims=3)(x)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/private/home/rzou/functorch4/functorch/_src/vmap.py", line 361, in wrapped
return _flat_vmap(
File "/private/home/rzou/functorch4/functorch/_src/vmap.py", line 488, in _flat_vmap
return _unwrap_batched(batched_outputs, out_dims, vmap_level, batch_size, func)
File "/private/home/rzou/functorch4/functorch/_src/vmap.py", line 165, in _unwrap_batched
flat_outputs = [
File "/private/home/rzou/functorch4/functorch/_src/vmap.py", line 166, in <listcomp>
_remove_batch_dim(batched_output, vmap_level, batch_size, out_dim)
IndexError: Dimension out of range (expected to be in range of [-3, 2], but got 3)
Unexpected output
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/private/home/rzou/local/miniconda3/envs/py39/lib/python3.9/site-packages/functorch/_src/vmap.py", line 366, in wrapped
return _unwrap_batched(batched_outputs, out_dims, vmap_level, batch_size, func)
File "/private/home/rzou/local/miniconda3/envs/py39/lib/python3.9/site-packages/functorch/_src/vmap.py", line 165, in _unwrap_batched
flat_outputs = [
File "/private/home/rzou/local/miniconda3/envs/py39/lib/python3.9/site-packages/functorch/_src/vmap.py", line 166, in <listcomp>
_remove_batch_dim(batched_output, vmap_level, batch_size, out_dim)
RuntimeError: Dimension out of range (expected to be in range of [-3, 2], but got 3)
Exception raised from maybe_wrap_dim_slow at ../c10/core/WrapDimMinimal.cpp:29 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f10a018e612 in /private/home/rzou/local/miniconda3/envs/py39/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::maybe_wrap_dim_slow(long, long, bool) + 0x3d3 (0x7f10a017c023 in /private/home/rzou/local/miniconda3/envs/py39/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #2: at::functorch::_remove_batch_dim(at::Tensor const&, long, long, long) + 0x5e8 (0x7f0ff6088678 in /private/home/rzou/local/miniconda3/envs/py39/lib/python3.9/site-packages/functorch/_C.so)
frame #3: <unknown function> + 0x23b502 (0x7f0ff608c502 in /private/home/rzou/local/miniconda3/envs/py39/lib/python3.9/site-packages/functorch/_C.so)
frame #4: <unknown function> + 0x1ff6e2 (0x7f0ff60506e2 in /private/home/rzou/local/miniconda3/envs/py39/lib/python3.9/site-packages/functorch/_C.so)
<omitting python frames>
frame #27: __libc_start_main + 0xf3 (0x7f10f1ae70b3 in /lib/x86_64-linux-gnu/libc.so.6)
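The exception handling appears to be incorrect: the error surfaces as a RuntimeError with a C++ stack trace appended, instead of the expected IndexError.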
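The difference between the two outputs can be checked programmatically. This is a minimal sketch under assumptions: `classify_failure` and `run_repro` are hypothetical helpers, and the check relies only on the two things that differ in the traces above (the exception type and the "Exception raised from" marker in the message).

```python
def classify_failure(exc: BaseException) -> str:
    """Return 'expected' for a plain IndexError, 'unexpected' otherwise."""
    # In the unexpected output the message carries a C++ stack trace,
    # introduced by the line "Exception raised from ...".
    has_cpp_trace = "Exception raised from" in str(exc)
    if isinstance(exc, IndexError) and not has_cpp_trace:
        return "expected"
    return "unexpected"


def run_repro() -> str:
    """Run repro.py's body and classify the result (requires torch and functorch)."""
    import torch
    from functorch import vmap

    x = torch.randn(2, 3, 5)
    try:
        vmap(lambda x: x, out_dims=3)(x)
    except Exception as exc:
        return classify_failure(exc)
    return "no exception"
```

Calling `run_repro()` in each environment gives a one-word result that is easier to compare across wheel combinations than full tracebacks.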
torchtext repro
torchtext is built against torch (cu102).
import torchtext
torchtext._torchtext._build_vocab_from_text_file_using_python_tokenizer("doesnotexist", 10, 10)
When installing torchtext with torch (cpu) and running the above two lines, we get the following error message:
error message
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: Cannot open input file doesnotexist
Exception raised from _infer_lines at /root/project/torchtext/csrc/vocab.cpp:143 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x3e (0x7fbbf0feebbe in /private/home/rzou/local/miniconda3/envs/py310/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5c (0x7fbbf0fc9e38 in /private/home/rzou/local/miniconda3/envs/py310/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: torchtext::_infer_lines(std::string const&) + 0x254 (0x7fbb4e94cd84 in /private/home/rzou/local/miniconda3/envs/py310/lib/python3.10/site-packages/torchtext/lib/libtorchtext.so)
frame #3: <unknown function> + 0x14bcb (0x7fbb4e674bcb in /private/home/rzou/local/miniconda3/envs/py310/lib/python3.10/site-packages/torchtext/_torchtext.so)
frame #4: <unknown function> + 0x34fb1 (0x7fbb4e694fb1 in /private/home/rzou/local/miniconda3/envs/py310/lib/python3.10/site-packages/torchtext/_torchtext.so)
frame #5: <unknown function> + 0x2d7c9 (0x7fbb4e68d7c9 in /private/home/rzou/local/miniconda3/envs/py310/lib/python3.10/site-packages/torchtext/_torchtext.so)
<omitting python frames>
frame #19: __libc_start_main + 0xf3 (0x7fbc0bd660b3 in /lib/x86_64-linux-gnu/libc.so.6)
This exhibits the same behavior as the functorch repro: the error message is not expected to include the C++ stack trace.
Versions
PyTorch 1.12 (latest release)
torchtext 0.13 (latest release)
functorch RC binaries
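Since the behavior depends on which torch wheel variant is installed, it can help to record that alongside the versions above. A minimal sketch, assuming only the public `torch.__version__` and `torch.version.cuda` attributes; the helper name `describe_torch_build` is hypothetical.

```python
def describe_torch_build():
    """Return basic build info for the installed torch, or None if torch is absent."""
    try:
        import torch
    except ImportError:
        return None
    return {
        # Version string of the installed torch package.
        "version": torch.__version__,
        # CUDA version torch was built with; None for CPU-only builds.
        "cuda": torch.version.cuda,
    }


if __name__ == "__main__":
    print(describe_torch_build())
```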