Add support for GPU to similarity. #999

austintlee · 2024-11-05T02:08:35Z

We want to run the rerank transform on Ray with a GPU, but we hit a couple of issues while testing this with test_rerank.py. The tests in test_rerank.py currently run in ExecMode.LOCAL, and I suspect they mostly run on CPU. If you change the execution mode to RAY and run the tests on a GPU machine, the following issues appear.

Issue 1 - similarity does not properly run on GPU.

Here's the stack trace:

ray.exceptions.RayTaskError(UserCodeException): ray::MapBatches(HuggingFaceTransformersSimilarityScorer)->Map(ray_callable)() (pid=626521, ip=192.168.68.124)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "aryn/sycamore/lib/sycamore/sycamore/transforms/base.py", line 203, in ray_callable
    return BaseMapTransform._process_ray(ray_input, name, lambda d: f(d, *args, **kwargs), enable_auto_metadata)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "aryn/sycamore/lib/sycamore/sycamore/transforms/base.py", line 257, in _process_ray
    outputs = f(docs)
              ^^^^^^^
  File "aryn/sycamore/lib/sycamore/sycamore/transforms/base.py", line 203, in <lambda>
    return BaseMapTransform._process_ray(ray_input, name, lambda d: f(d, *args, **kwargs), enable_auto_metadata)
                                                                    ^^^^^^^^^^^^^^^^^^^^^
  File "aryn/sycamore/lib/sycamore/sycamore/utils/import_utils.py", line 46, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "aryn/sycamore/lib/sycamore/sycamore/transforms/similarity.py", line 153, in __call__
    return self.generate_similarity_scores(doc_batch, query, score_property_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "aryn/sycamore/lib/sycamore/sycamore/transforms/similarity.py", line 88, in generate_similarity_scores
    scores = self.score(input_pairs)
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "aryn/sycamore/lib/sycamore/sycamore/utils/time_trace.py", line 141, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "aryn/sycamore/lib/sycamore/sycamore/transforms/similarity.py", line 164, in score
    self._model = AutoModelForSequenceClassification.from_pretrained(self.model_name).to(self.device)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/transformers/modeling_utils.py", line 2958, in to
    return super().to(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1173, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 804, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1159, in convert
    return t.to(
           ^^^^^
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available

The above exception was the direct cause of the following exception:

ray::MapBatches(HuggingFaceTransformersSimilarityScorer)->Map(ray_callable)() (pid=626520, ip=192.168.68.124)
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/ray/data/_internal/execution/operators/map_operator.py", line 461, in _map_task
    for b_out in map_transformer.apply_transform(iter(blocks), ctx):
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 392, in __call__
    for data in iter:
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 134, in _udf_timed_iter
    output = next(input)
             ^^^^^^^^^^^
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 216, in __call__
    yield from self._row_fn(input, ctx)
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 379, in transform_fn
    for row in rows:
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 269, in __call__
    for block in blocks:
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 392, in __call__
    for data in iter:
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 134, in _udf_timed_iter
    output = next(input)
             ^^^^^^^^^^^
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 236, in __call__
    yield from self._batch_fn(input, ctx)
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 282, in transform_fn
    res = fn(batch)
          ^^^^^^^^^
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 194, in fn
    _handle_debugger_exception(e)
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 210, in _handle_debugger_exception
    raise UserCodeException() from e
ray.exceptions.UserCodeException

../../.cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/ray/data/exceptions.py:87: RayTaskError(UserCodeException)

Following a suggestion from @HenryL27, I mirrored the GPU setup from embed.py in similarity.py, and the above problem went away. The change is in similarity.py.
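
For context, a likely cause is that Ray hides GPUs from tasks that do not request them (it manages CUDA_VISIBLE_DEVICES per worker), so a model moved to "cuda" inside such a task sees no devices. Below is a minimal sketch of the kind of setup this implies, assuming the fix amounts to requesting a GPU for the scorer's Ray stage. The helper name `ray_resource_args` is hypothetical and is not taken from embed.py or similarity.py.

```python
from typing import Optional


def ray_resource_args(device: str, resource_args: Optional[dict] = None) -> dict:
    """Build Ray resource kwargs for a transform (hypothetical helper).

    If the scorer targets a CUDA device and the caller did not already
    specify num_gpus, request one GPU so that Ray exposes a device to the
    worker process. Caller-supplied values are never overwritten.
    """
    args = dict(resource_args or {})
    if device.startswith("cuda"):
        args.setdefault("num_gpus", 1)
    return args


# Example: kwargs that could be forwarded to Ray Data's map_batches(...).
gpu_args = ray_resource_args("cuda")
cpu_args = ray_resource_args("cpu")
```

The resulting dictionary would be forwarded as keyword arguments to the Ray stage that runs the scorer, for example `map_batches(..., **gpu_args)`.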

austintlee · 2024-12-13T17:49:20Z

Issue 2 - pickle deserialization fails when CUDA tensors reach a worker without GPU access.

Once I got past the above issue, I started getting a different stack trace:

ray.exceptions.RayTaskError(UserCodeException): ray::Map(ray_callable)() (pid=545630, ip=192.168.68.124)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "aryn/sycamore/lib/sycamore/sycamore/transforms/sort.py", line 50, in ray_callable
    doc = Document.from_row(input_dict)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "aryn/sycamore/lib/sycamore/sycamore/data/document.py", line 237, in from_row
    return Document.deserialize(row["doc"])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "aryn/sycamore/lib/sycamore/sycamore/data/document.py", line 224, in deserialize
    data = pickle.loads(raw)  # mapped_loads(raw)
           ^^^^^^^^^^^^^^^^^
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/torch/storage.py", line 381, in _load_from_bytes
    return torch.load(io.BytesIO(b))
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/torch/serialization.py", line 1040, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/torch/serialization.py", line 1272, in _legacy_load
    result = unpickler.load()
             ^^^^^^^^^^^^^^^^
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/torch/serialization.py", line 1205, in persistent_load
    obj = restore_location(obj, location)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/torch/serialization.py", line 390, in default_restore_location
    result = fn(storage, location)
             ^^^^^^^^^^^^^^^^^^^^^
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/torch/serialization.py", line 265, in _cuda_deserialize
    device = validate_cuda_device(location)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/torch/serialization.py", line 249, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

The above exception was the direct cause of the following exception:

ray::Map(ray_callable)() (pid=545630, ip=192.168.68.124)
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/ray/data/_internal/execution/operators/map_operator.py", line 461, in _map_task
    for b_out in map_transformer.apply_transform(iter(blocks), ctx):
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 392, in __call__
    for data in iter:
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 134, in _udf_timed_iter
    output = next(input)
             ^^^^^^^^^^^
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 216, in __call__
    yield from self._row_fn(input, ctx)
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 380, in transform_fn
    out_row = fn(row)
              ^^^^^^^
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 194, in fn
    _handle_debugger_exception(e)
  File ".cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 210, in _handle_debugger_exception
    raise UserCodeException() from e
ray.exceptions.UserCodeException

../../.cache/pypoetry/virtualenvs/sycamore-monorepo-QRJsh08E-py3.12/lib/python3.12/site-packages/ray/data/exceptions.py:87: RayTaskError(UserCodeException)

Upon further investigation, I found what I believe to be the same issue, reported in pytorch/pytorch#16797. The problem went away once I applied the workaround suggested in that GitHub issue.
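
The workaround in that issue is, roughly, a custom unpickler that remaps torch storages to the CPU while loading. Below is a minimal sketch under that assumption. `CpuUnpickler` is a hypothetical name, and `mapped_loads` is borrowed from the comment visible in the trace above without knowing its actual implementation.

```python
import io
import pickle


class CpuUnpickler(pickle.Unpickler):
    """Unpickler that loads pickled torch storages onto the CPU."""

    def find_class(self, module, name):
        # Tensors pickled with plain pickle are restored through
        # torch.storage._load_from_bytes, which calls torch.load without
        # map_location. Intercept that lookup and force the CPU instead.
        if module == "torch.storage" and name == "_load_from_bytes":
            import torch  # lazy import: only needed when tensors are present

            # Assumption: newer PyTorch releases may also need
            # weights_only=False here.
            return lambda b: torch.load(io.BytesIO(b), map_location="cpu")
        return super().find_class(module, name)


def mapped_loads(raw: bytes):
    """Drop-in replacement for pickle.loads that maps CUDA storages to CPU."""
    return CpuUnpickler(io.BytesIO(raw)).load()


# Payloads without tensors round-trip exactly as with pickle.loads.
restored = mapped_loads(pickle.dumps({"score": 0.5, "ids": [1, 2]}))
```

An alternative that avoids the problem at the source is to keep tensors out of the serialized document altogether, for example by converting scores to Python floats before storing them.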

bsowell

Looks good. Thanks!

bsowell · 2024-12-16T22:46:59Z

lib/sycamore/sycamore/transforms/similarity.py

    def score(self, inputs: list[tuple[str, str]]) -> list[float]:
        import torch

+        print(f"GPU: {torch.cuda.is_available()}")


minor: Might want to remove this (or change to log).

austin-aryn-ai added 2 commits November 4, 2024 18:06

Add support for GPU to similarity.

41b8a50

Fix ruff/apply black

63049e5

austintlee marked this pull request as ready for review December 13, 2024 17:01

Fix similarity score to return Python floats, not Tensor

6f4791f

bsowell approved these changes Dec 16, 2024

austin-aryn-ai added 4 commits December 16, 2024 14:52

Remove unnecessary print statement

2fe425f

Remove sort integ test as there is already one

a897fe1

Fix failing scorer unit tests

f80295a

Fix lint

ddd4f58

austintlee merged commit 26b14f4 into main Dec 17, 2024
12 of 14 checks passed

austintlee deleted the similarity-gpu-ray branch December 17, 2024 01:40
