
@bdice commented Dec 18, 2025

Description

In some cases, libcudf and/or rapidsmpf call Thrust algorithms with an execution policy whose custom allocator returns device-accessible host memory (for example, pinned memory). This was observed to fail on some systems: trivial_copy_from_device passes cudaMemcpyDeviceToHost, which does not match where the source buffer actually lives, and the copy fails with cudaErrorInvalidValue. If we use cudaMemcpyDefault instead, the CUDA runtime infers the source and destination memory locations from the pointer values, and no error is observed.

Sample error output from @nirandaperera:

[13:01:35.917][140065985130496][CUDA][E] Returning 1 (CUDA_ERROR_INVALID_VALUE) from cuMemcpyDtoHAsync_v2
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  __copy:: D->H: failed: cudaErrorInvalidValue: invalid argument

I added a basic test that calls trivial_copy_(from|to)_device with host pointers. This should cover the failure case, although in practice the failure is triggered by more complex Thrust algorithms in cudf::pack, which I did not think were worth replicating in a unit test. Being able to use (pinned) host memory easily is becoming important for performance, so we'd like to support this behavior.

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@bdice requested a review from a team as a code owner December 18, 2025 01:17
@bdice requested a review from gevtushenko December 18, 2025 01:17
@github-project-automation bot moved this to Todo in CCCL Dec 18, 2025
@cccl-authenticator-app bot moved this from Todo to In Review in CCCL Dec 18, 2025
@bdice added the bug label ("Something isn't working right.") Dec 18, 2025
@bdice commented Dec 18, 2025

@pciolkosz Thanks for helping us analyze this issue. Could this PR be a backport candidate for CCCL 3.2?

@github-actions

😬 CI Workflow Results

🟥 Finished in 2h 28m: Pass: 40%/93 | Total: 2d 19h | Max: 2h 27m | Hits: 79%/47896


@nirandaperera

@bdice Thank you for doing this. Can we temporarily revert the fixes in utils.h to make sure the test case actually catches the error? I saw this specifically when using pinned memory.

@nirandaperera

BTW this may well fix rapidsai/cudf#20886.
