ggml-backend : fix async copy from CPU #8897

Merged · 2 commits into master on Aug 7, 2024

Conversation

@slaren (Member) commented Aug 6, 2024

Fixes #8685

The problem was that some copies from the CPU backend to the CUDA backend were not correctly synchronized, which in some cases allowed the CPU backend to overwrite the data with the next batch before it had been copied to the GPU.
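
For illustration, a minimal CUDA sketch of this kind of race (the function and variable names are hypothetical; this is not the actual ggml-backend code): a pinned host staging buffer is handed to an asynchronous copy and then refilled with the next batch before the copy has finished.

    // Illustrative only: reusing a pinned host buffer before the async H2D copy finishes.
    #include <cuda_runtime.h>
    #include <string.h>

    // Upload one batch through a pinned staging buffer that is shared between batches.
    static void upload_batch(float * staging, const float * batch, size_t n,
                             float * d_dst, cudaStream_t stream, cudaEvent_t copy_done) {
        memcpy(staging, batch, n * sizeof(float));       // CPU writes the batch into the staging buffer
        cudaMemcpyAsync(d_dst, staging, n * sizeof(float),
                        cudaMemcpyHostToDevice, stream); // the copy is only queued here, not finished
        cudaEventRecord(copy_done, stream);              // marks the point at which the copy completes
        // Without waiting on copy_done before the *next* call overwrites `staging`,
        // the GPU may read data from the wrong batch -- the corruption described above.
        // The fix is to wait (e.g. cudaEventSynchronize(copy_done)) before the buffer is reused.
    }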

@slaren (Member, Author) commented Aug 6, 2024

@matteoserva please let me know if this fixes the issue on your system. I already tested this on @JohannesGaessler's machine, so I expect it works there.

@JohannesGaessler (Collaborator) left a comment

Why does the destination backend need to be synchronized in ggml_backend_tensor_copy_async but not in ggml_backend_sched_compute_splits?

@github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Aug 6, 2024
@slaren force-pushed the sl/fix-cpu-async-copy branch from cf49428 to a5eae7a on August 6, 2024 at 22:17
@slaren (Member, Author) commented Aug 6, 2024

The idea is that the scheduler makes multiple copies of every input and synchronizes access to them with events. Instead of having to synchronize the entire backend, it is enough to synchronize with the event. However, there was a missing ggml_backend_event_synchronize in this case; it should be fixed now.

    if (sched->events[split_backend_id][sched->cur_copy] != NULL) {
        ggml_backend_event_synchronize(sched->events[split_backend_id][sched->cur_copy]);
    } else {
        ggml_backend_synchronize(split_backend);
    }

@JohannesGaessler (Collaborator) replied

I think this synchronization call can be optimized out since with a null event the backend has already been synchronized. But if there is no measurable performance difference it may be better to just keep it in to make the code easier to understand.

@slaren (Member, Author) replied

Yes, I left it there for clarity. For backends that don't support events, ggml_backend_synchronize should be a no-op anyway.
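
To make that scheme concrete — a small ring of input copies, each guarded by its own event, so that reusing one slot only waits for that slot instead of draining the whole backend — here is a hedged CUDA-style sketch (names are hypothetical; this is not the scheduler's actual code):

    // Illustrative ring of input copies with one event per slot.
    #include <cuda_runtime.h>
    #include <stddef.h>

    #define N_COPIES 4

    struct input_ring {
        float *     d_buf[N_COPIES];     // one device-side copy per slot
        cudaEvent_t in_use[N_COPIES];    // recorded after the work touching slot i is queued
    };

    // Before overwriting slot i with new input data, wait only on slot i's event;
    // this is the per-copy synchronization used instead of a full backend synchronize.
    static void wait_for_slot(struct input_ring * ring, int i) {
        cudaEventSynchronize(ring->in_use[i]);
    }

    static void queue_slot(struct input_ring * ring, int i,
                           const float * h_src, size_t n, cudaStream_t stream) {
        cudaMemcpyAsync(ring->d_buf[i], h_src, n * sizeof(float),
                        cudaMemcpyHostToDevice, stream);
        cudaEventRecord(ring->in_use[i], stream);  // the event later waited on by wait_for_slot
    }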

@JohannesGaessler (Collaborator) commented
Prior to the latest commit the fix was working on my second machine with 3x P40. I'll review the new changes tomorrow.

@slaren (Member, Author) commented Aug 6, 2024

The changes to ggml_backend_cuda_cpy_tensor_async in the latest commit are not related to this issue, and these cases are never hit in llama.cpp. Nonetheless, I found these issues while looking into this one, so I am fixing them now to avoid problems in the future.

@matteoserva (Contributor) commented
@slaren The patch fixed the issue on my system. Thank you!


    if (backend_src != backend_dst) {

@JohannesGaessler (Collaborator) commented

How is it ensured that there are no race conditions between backend_src and backend_dst for this code branch?

@slaren (Member, Author) replied

What race conditions are you thinking about? It uses an event to synchronize the two streams.
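
For context, a generic CUDA sketch of that pattern — the destination stream waits on an event recorded in the source stream, so the copy is ordered after the source's queued work without blocking the host (an illustration only, not the ggml CUDA backend's implementation). When the two backends are the same, everything is queued on a single stream and this ordering comes for free, which is the case the quoted if (backend_src != backend_dst) branch separates out.

    // Illustrative only: event-based ordering between a source and a destination stream.
    #include <cuda_runtime.h>
    #include <stddef.h>

    static void copy_across_streams(void * dst, const void * src, size_t size,
                                    cudaStream_t stream_src, cudaStream_t stream_dst,
                                    cudaEvent_t event) {
        cudaEventRecord(event, stream_src);        // point after which `src` is ready on stream_src
        cudaStreamWaitEvent(stream_dst, event, 0); // stream_dst will not run past this point any earlier
        cudaMemcpyAsync(dst, src, size, cudaMemcpyDeviceToDevice, stream_dst);
        // work queued on stream_dst after this call is automatically ordered after the copy
    }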

@JohannesGaessler (Collaborator) replied

I think I misinterpreted the code. If my understanding is correct the synchronization happens outside this function.

@slaren (Member, Author) replied

Part of the synchronization is done in this function, but the most complicated parts happen in ggml_backend_sched. Ultimately, the only responsibility of this function is to implement the semantics of the copy_async interface of ggml-backend, as defined in ggml-backend.h:

    // asynchronous copy
    // the copy is performed after all the currently queued operations in backend_src
    // backend_dst will wait for the copy to complete before performing other operations
    // automatic fallback to sync copy if async is not supported
    GGML_API void ggml_backend_tensor_copy_async(ggml_backend_t backend_src, ggml_backend_t backend_dst, struct ggml_tensor * src, struct ggml_tensor * dst);
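
A short usage sketch of that contract (a hedged illustration with hypothetical names; the scheduler itself relies on per-copy events rather than a full backend synchronize):

    #include "ggml-backend.h"

    // Hypothetical helper: backend_cpu/backend_gpu are initialized backends and
    // src/dst are tensors of the same shape allocated in their respective buffers.
    static void send_input(ggml_backend_t backend_cpu, ggml_backend_t backend_gpu,
                           struct ggml_tensor * src, struct ggml_tensor * dst) {
        ggml_backend_tensor_copy_async(backend_cpu, backend_gpu, src, dst);
        // per the comment above, operations queued later on backend_gpu are ordered after the copy;
        // before the host overwrites the data behind src it must still wait for the copy, e.g.:
        ggml_backend_synchronize(backend_gpu);
    }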

@slaren merged commit be55695 into master on Aug 7, 2024
54 checks passed
@slaren deleted the sl/fix-cpu-async-copy branch on August 7, 2024 at 11:29
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Aug 7, 2024
* ggml-backend : fix async copy from CPU

* cuda : more reliable async copy, fix stream used when the devices are the same
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
* ggml-backend : fix async copy from CPU

* cuda : more reliable async copy, fix stream used when the devices are the same
Labels: ggml (changes relating to the ggml tensor library for machine learning)

Successfully merging this pull request may close these issues.

Bug: (CUDA) Corrupted output when offloading to multiple GPUs