feat: implement zero-copy return from workers via shm #75
Signed-off-by: kitsuyaazuma <[email protected]>
Pull Request Overview
This PR adds a zero-copy shared-memory IPC mode by pre-allocating shared-memory buffers for worker results and introducing utilities to move, replace, and reconstruct tensors without pickling.
- Introduce `SHMHandle`, `process_tensors_in_object`, and `reconstruct_from_shared_memory` in utils.
- Refactor `ProcessPoolClientTrainer` to prepare per-client shared-memory buffers and use the new utilities in `local_process` and `worker`.
- Update tests and the FedAvg trainer to exercise the zero-copy return path.
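As a rough illustration of the handle mechanism these utilities implement, here is a minimal stdlib sketch. `SHMHandle` is the PR's real name, but its fields and the helper functions below are assumptions for illustration, not the library's actual API:

```python
# Minimal sketch of the handle idea behind SHMHandle: a small, picklable
# record of where a payload's bytes live in shared memory, so only the
# handle (not the data) is pickled across the process boundary.
# The fields and helper names here are assumptions for illustration.
from dataclasses import dataclass
from multiprocessing import shared_memory

@dataclass
class SHMHandle:
    shm_name: str  # name of the SharedMemory segment
    offset: int    # byte offset of the payload within the segment
    nbytes: int    # payload size in bytes

def move_to_shm(data: bytes, shm: shared_memory.SharedMemory, offset: int) -> SHMHandle:
    """Write raw bytes into a pre-allocated segment and return a handle."""
    shm.buf[offset:offset + len(data)] = data
    return SHMHandle(shm.name, offset, len(data))

def reconstruct(handle: SHMHandle) -> bytes:
    """Reattach to the segment by name and read the payload back."""
    shm = shared_memory.SharedMemory(name=handle.shm_name)
    out = bytes(shm.buf[handle.offset:handle.offset + handle.nbytes])
    shm.close()
    return out

seg = shared_memory.SharedMemory(create=True, size=1024)
handle = move_to_shm(b"model weights", seg, offset=0)
assert reconstruct(handle) == b"model weights"
seg.close()
seg.unlink()
```

The real utilities traverse tensors inside arbitrary result objects; the sketch only shows the round trip for a single byte payload.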
Reviewed Changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tests/test_core/test_client_trainer.py | Add tensor field, import SHMHandle, and implement buffer prep |
| src/blazefl/core/utils.pyi | Define SHMHandle, process_tensors_in_object, and reconstruction |
| src/blazefl/core/utils.py | Implement tensor traversal, replace, and reconstruction utilities |
| src/blazefl/core/client_trainer.pyi | Update worker signature, add prepare_uplink_package_buffer |
| src/blazefl/core/client_trainer.py | Use new utils to move/replace tensors and reconstruct results |
| src/blazefl/core/__init__.py[.pyi] | Update exports to include new utilities |
| src/blazefl/contrib/fedavg.py | Extend FedAvg trainer for shared-memory uplink packages |
Comments suppressed due to low confidence (3)
src/blazefl/core/utils.py:21
- The default `max_depth` is set to 1, but the docstring describes a default of 10. Align the code and documentation by either updating the default to 10 or correcting the docstring.
`obj: T, mode: Literal["move", "replace"], max_depth: int = 1`
src/blazefl/core/utils.py:41
- This line describes a default of 10 for `max_depth`, but the function signature uses 1. Please keep these in sync.
`max_depth: The maximum recursion depth. Defaults to 10.`
src/blazefl/core/utils.py:20
- [nitpick] Consider adding unit tests for `process_tensors_in_object` and `reconstruct_from_shared_memory` to verify correct round-trip behavior, handle nested structures, and test the shared-memory paths.
`def process_tensors_in_object(`
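The `max_depth` comments above concern a recursion-limited traversal of nested result objects. A rough sketch of that pattern (the function name and semantics here are hypothetical, not the library's actual code; leaves stand in for tensors):

```python
# Hypothetical sketch of a depth-limited traversal like the one bounded
# by max_depth in process_tensors_in_object. collect_leaves is a made-up
# name; it gathers non-container leaves (the role tensors play in the
# real utility), descending at most max_depth container levels.
from typing import Any

def collect_leaves(obj: Any, max_depth: int = 10, _depth: int = 0) -> list[Any]:
    if isinstance(obj, dict) and _depth < max_depth:
        return [x for v in obj.values() for x in collect_leaves(v, max_depth, _depth + 1)]
    if isinstance(obj, (list, tuple)) and _depth < max_depth:
        return [x for v in obj for x in collect_leaves(v, max_depth, _depth + 1)]
    if not isinstance(obj, (dict, list, tuple)):
        return [obj]
    return []  # container deeper than max_depth: left untouched

nested = {"a": [1, 2], "b": {"c": 3}}
print(collect_leaves(nested))               # -> [1, 2, 3]
print(collect_leaves(nested, max_depth=1))  # -> [] (nested containers not entered)
```

With a default of 1 rather than 10, only top-level values would be visited, which is why the mismatch Copilot flags could silently skip tensors in nested packages.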
WHAT
This PR introduces a "zero-copy" `shared_memory` IPC mode to significantly improve performance in multi-process training. It refactors the `ProcessPoolClientTrainer` to pre-allocate shared-memory buffers for worker results, eliminating serialization overhead for the return trip from workers to the parent process. A new utility, `process_tensors_in_object`, is introduced to handle both moving tensors to shared memory and creating lightweight "handle" packages.

WHY

Profiling revealed that even with shared memory for the parent-to-worker data path, pickling the `UplinkPackage` for the return trip was a major performance bottleneck. This change addresses that bottleneck directly by avoiding tensor serialization on the return path, which substantially reduces round-trip time and improves the overall throughput and scalability of the federated learning process.
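As a rough stdlib illustration of the pre-allocation idea described above (all names and the sizing logic here are hypothetical, not BlazeFL's actual API):

```python
# Sketch of per-client buffer pre-allocation, assuming the parent can
# size each client's uplink payload in advance. The parent creates one
# named shared-memory segment per client; a worker writes its result
# into its segment, and only a tiny handle crosses the process pool.
import pickle
from multiprocessing import shared_memory

def prepare_buffers(num_clients: int, payload_nbytes: int) -> dict[int, shared_memory.SharedMemory]:
    """Pre-allocate one named shared-memory segment per client."""
    return {
        cid: shared_memory.SharedMemory(create=True, size=payload_nbytes)
        for cid in range(num_clients)
    }

buffers = prepare_buffers(num_clients=3, payload_nbytes=1_000_000)

# The "handle" a worker would return is small and constant-size,
# whereas pickling the payload itself scales with the tensor data.
handle = {"shm_name": buffers[0].name, "nbytes": 1_000_000}
assert len(pickle.dumps(handle)) < 200

for shm in buffers.values():
    shm.close()
    shm.unlink()
```

This is only the mechanics of the buffer lifecycle; the PR's `prepare_uplink_package_buffer` presumably derives sizes from the model's tensors and manages cleanup inside the trainer.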