CPU-only C++ extension libraries (functorch, torchtext) built against PyTorch wheels are not fully compatible with PyTorch wheels #80489

@zou3519


🐛 Describe the bug

When functorch is installed alongside a different PyTorch wheel (torch 1.12 {cpu, cu102, cu113, cu116}) than the one it was built against, we see one of the following:

  1. missing-symbol errors on import functorch
  2. exception-handling issues in functorch, where a raised exception produces unexpected output. Independently, torchtext exhibits the same issue.

These seem to stem from different symbols existing in the torch (cpu, cu113, cu116) wheels vs the torch (cu102) wheels. Possibly related: pytorch/builder#1028.

We (@malfet and I) are not sure if this is a problem with PyTorch or the way we build extensions. FWIW this did not happen during the last functorch releases (0.1.x).
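A mismatch between the wheel an extension was built against and the wheel that is installed can be caught before import by comparing build tags. The following is a minimal sketch, assuming the variant is exposed as the local version suffix of the version string (for example "+cu102" or "+cpu"); some builds carry no such suffix, so a missing tag is treated as unknown rather than as a match. The helper names wheel_variant and same_variant, and the version strings, are illustrative and not part of PyTorch.

```python
from typing import Optional


def wheel_variant(version: str) -> Optional[str]:
    """Return the local build tag of a version string.

    '1.12.0+cu102' -> 'cu102'; returns None when there is no '+<tag>' suffix.
    """
    _, sep, local = version.partition("+")
    return local if sep and local else None


def same_variant(built_against: str, installed: str) -> bool:
    """True only when both version strings carry the same, known build tag."""
    built = wheel_variant(built_against)
    found = wheel_variant(installed)
    return built is not None and built == found


# Illustrative version strings (assumed, not taken from the report).
print(same_variant("1.12.0+cu102", "1.12.0+cu102"))  # True
print(same_variant("1.12.0+cu102", "1.12.0+cpu"))    # False
print(same_variant("1.12.0", "1.12.0"))              # False: variant unknown
```

In practice the second argument would be torch.__version__ and the first would have to be recorded by the extension at build time, which this sketch does not cover.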

functorch repro

See pytorch/functorch#916 for original issue.

Case 1: built functorch against the torch 1.12 (cpu) wheels.

  • When installing functorch with torch (cu102) on the AWS cluster, import torch; import functorch fails with the missing symbol _ZNSt19basic_ostringstreamIcSt11char_traitsIcESaIcEEC1Ev
  • When installing functorch with torch (cpu, cu113, cu116), there is no noticeable problem
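For the missing-symbol case, one quick check is whether a shared library on the affected machine exports the symbol at all. The sketch below uses only ctypes from the standard library. library_exports is a hypothetical helper name, and libstdc++.so.6 as the provider is an assumption: the symbol demangles to the default constructor of std::basic_ostringstream<char>, but the report does not say which library was expected to provide it.

```python
import ctypes
from typing import Optional


def library_exports(library: str, symbol: str) -> Optional[bool]:
    """Report whether `library` exports `symbol`.

    Returns None when the library itself cannot be loaded, so that
    "library missing" is not confused with "symbol missing".
    """
    try:
        handle = ctypes.CDLL(library)
    except OSError:
        return None
    # ctypes raises AttributeError for an unknown symbol; hasattr maps that to False.
    return hasattr(handle, symbol)


# Symbol copied from the import error above; the library name is an assumption.
MISSING = "_ZNSt19basic_ostringstreamIcSt11char_traitsIcESaIcEEC1Ev"
print(library_exports("libstdc++.so.6", MISSING))
```

Running this on both a working and a failing machine would show whether the difference lies in the system libraries rather than in the torch wheel itself.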

Case 2: built functorch against the torch 1.12 (cu102) wheels

  • When installing functorch with torch (cu102): repro.py gives the expected output
  • When installing functorch with torch (cpu, cu113, cu116): repro.py gives unexpected output

# repro.py
import torch
from functorch import vmap
x = torch.randn(2, 3, 5)
vmap(lambda x: x, out_dims=3)(x)

Expected output

>>> vmap(lambda x: x, out_dims=3)(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/private/home/rzou/functorch4/functorch/_src/vmap.py", line 361, in wrapped
    return _flat_vmap(
  File "/private/home/rzou/functorch4/functorch/_src/vmap.py", line 488, in _flat_vmap
    return _unwrap_batched(batched_outputs, out_dims, vmap_level, batch_size, func)
  File "/private/home/rzou/functorch4/functorch/_src/vmap.py", line 165, in _unwrap_batched
    flat_outputs = [
  File "/private/home/rzou/functorch4/functorch/_src/vmap.py", line 166, in <listcomp>
    _remove_batch_dim(batched_output, vmap_level, batch_size, out_dim)
IndexError: Dimension out of range (expected to be in range of [-3, 2], but got 3)

Unexpected output

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/private/home/rzou/local/miniconda3/envs/py39/lib/python3.9/site-packages/functorch/_src/vmap.py", line 366, in wrapped
    return _unwrap_batched(batched_outputs, out_dims, vmap_level, batch_size, func)
  File "/private/home/rzou/local/miniconda3/envs/py39/lib/python3.9/site-packages/functorch/_src/vmap.py", line 165, in _unwrap_batched
    flat_outputs = [
  File "/private/home/rzou/local/miniconda3/envs/py39/lib/python3.9/site-packages/functorch/_src/vmap.py", line 166, in <listcomp>
    _remove_batch_dim(batched_output, vmap_level, batch_size, out_dim)
RuntimeError: Dimension out of range (expected to be in range of [-3, 2], but got 3)
Exception raised from maybe_wrap_dim_slow at ../c10/core/WrapDimMinimal.cpp:29 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f10a018e612 in /private/home/rzou/local/miniconda3/envs/py39/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::maybe_wrap_dim_slow(long, long, bool) + 0x3d3 (0x7f10a017c023 in /private/home/rzou/local/miniconda3/envs/py39/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #2: at::functorch::_remove_batch_dim(at::Tensor const&, long, long, long) + 0x5e8 (0x7f0ff6088678 in /private/home/rzou/local/miniconda3/envs/py39/lib/python3.9/site-packages/functorch/_C.so)
frame #3: <unknown function> + 0x23b502 (0x7f0ff608c502 in /private/home/rzou/local/miniconda3/envs/py39/lib/python3.9/site-packages/functorch/_C.so)
frame #4: <unknown function> + 0x1ff6e2 (0x7f0ff60506e2 in /private/home/rzou/local/miniconda3/envs/py39/lib/python3.9/site-packages/functorch/_C.so)
<omitting python frames>
frame #27: __libc_start_main + 0xf3 (0x7f10f1ae70b3 in /lib/x86_64-linux-gnu/libc.so.6)

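The exception handling appears to be incorrect: the error surfaces as a RuntimeError with the C++ stack trace appended, instead of the expected IndexError.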
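The difference between the two outputs can be turned into a check that a test suite could run against any installed combination. This is a minimal sketch, assuming the symptom always looks like the traces above: a plain RuntimeError whose message contains the "Exception raised from ... (most recent call first):" frame dump. The helper name is_untranslated_c10_error is hypothetical, and the exceptions are constructed by hand from the messages in this report rather than raised by functorch.

```python
def is_untranslated_c10_error(exc: BaseException) -> bool:
    """Heuristic: a plain RuntimeError whose message carries a C++ frame dump."""
    text = str(exc)
    return (
        type(exc) is RuntimeError
        and "Exception raised from" in text
        and "(most recent call first):" in text
    )


MESSAGE = "Dimension out of range (expected to be in range of [-3, 2], but got 3)"

# What a matching torch/functorch pair produces, per the expected output.
expected = IndexError(MESSAGE)

# What a mismatched pair produces; the frame list is abbreviated here.
unexpected = RuntimeError(
    MESSAGE + "\n"
    "Exception raised from maybe_wrap_dim_slow at "
    "../c10/core/WrapDimMinimal.cpp:29 (most recent call first):\n"
    "frame #0: ..."
)

print(is_untranslated_c10_error(expected))    # False
print(is_untranslated_c10_error(unexpected))  # True
```

Wrapping the repro.py call in try/except and passing the caught exception to this helper would give a pass/fail signal for each wheel combination.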

torchtext repro

torchtext is built against torch (cu102).

import torchtext
torchtext._torchtext._build_vocab_from_text_file_using_python_tokenizer("doesnotexist", 10, 10)

When installing torchtext with torch (cpu) and running the above two lines, we get the following error message:

error message

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: Cannot open input file doesnotexist
Exception raised from _infer_lines at /root/project/torchtext/csrc/vocab.cpp:143 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x3e (0x7fbbf0feebbe in /private/home/rzou/local/miniconda3/envs/py310/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5c (0x7fbbf0fc9e38 in /private/home/rzou/local/miniconda3/envs/py310/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: torchtext::_infer_lines(std::string const&) + 0x254 (0x7fbb4e94cd84 in /private/home/rzou/local/miniconda3/envs/py310/lib/python3.10/site-packages/torchtext/lib/libtorchtext.so)
frame #3: <unknown function> + 0x14bcb (0x7fbb4e674bcb in /private/home/rzou/local/miniconda3/envs/py310/lib/python3.10/site-packages/torchtext/_torchtext.so)
frame #4: <unknown function> + 0x34fb1 (0x7fbb4e694fb1 in /private/home/rzou/local/miniconda3/envs/py310/lib/python3.10/site-packages/torchtext/_torchtext.so)
frame #5: <unknown function> + 0x2d7c9 (0x7fbb4e68d7c9 in /private/home/rzou/local/miniconda3/envs/py310/lib/python3.10/site-packages/torchtext/_torchtext.so)
<omitting python frames>
frame #19: __libc_start_main + 0xf3 (0x7fbc0bd660b3 in /lib/x86_64-linux-gnu/libc.so.6)

This exhibits the same behavior as the functorch repro: the error message unexpectedly includes the C++ stack trace.
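When comparing such traces across wheel combinations, it can help to list which shared library each frame belongs to, since the exception is thrown in one library (libc10.so) and passes through the extension's own libraries. Below is a minimal sketch that parses the "frame #N: ... (0xADDR in PATH)" lines with a regular expression. The helper name frame_libraries is hypothetical, and the sample paths are shortened placeholders rather than the paths from the report.

```python
import os
import re
from typing import List, Tuple

# Matches lines such as: frame #0: <symbol> + 0x3e (0x7fbb... in /path/lib.so)
FRAME_RE = re.compile(r"^frame #(\d+): (.*) \(0x[0-9a-f]+ in (.+)\)$")


def frame_libraries(trace: str) -> List[Tuple[int, str]]:
    """Return (frame index, shared-library basename) pairs from a frame dump."""
    found = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            found.append((int(match.group(1)), os.path.basename(match.group(3))))
    return found


# Shortened, placeholder paths; symbols and offsets follow the trace above.
SAMPLE = """\
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x3e (0x7fbbf0feebbe in /env/torch/lib/libc10.so)
frame #2: torchtext::_infer_lines(std::string const&) + 0x254 (0x7fbb4e94cd84 in /env/torchtext/lib/libtorchtext.so)
frame #3: <unknown function> + 0x14bcb (0x7fbb4e674bcb in /env/torchtext/_torchtext.so)
<omitting python frames>
"""

for index, library in frame_libraries(SAMPLE):
    print(index, library)
```

Lines that do not follow the frame pattern, such as the "<omitting python frames>" marker, are skipped.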

Versions

PyTorch 1.12 (latest release)
torchtext 0.13 (latest release)
functorch RC binaries

cc @ezyang @gchanan @zou3519 @malfet @seemethere

Labels: high priority, module: build, module: cpp-extensions, topic: binaries, triaged