

@kitsuyaazuma
Collaborator

@kitsuyaazuma kitsuyaazuma commented Jul 12, 2025

WHAT

This PR introduces a "zero-copy" shared_memory IPC mode to significantly improve performance in multi-process training. It refactors the ProcessPoolClientTrainer to pre-allocate shared memory buffers for worker results, eliminating serialization overhead for the return trip from workers to the parent process. A new utility, process_tensors_in_object, is introduced to handle both moving tensors to shared memory and creating lightweight "handle" packages.

WHY

Profiling revealed that even with shared memory for the parent-to-worker data path, pickling the UplinkPackage for the return trip was a major performance bottleneck. This change addresses that bottleneck directly by avoiding tensor serialization on the return path. This leads to a substantial reduction in round-trip time and significantly improves the overall throughput and scalability of the federated learning process.
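The core idea — pre-allocating a shared-memory buffer so only a tiny handle, not the payload, crosses the process boundary — can be sketched with the standard library's `multiprocessing.shared_memory`. This is a minimal illustration of the pattern, not BlazeFL's actual API; the real implementation operates on tensors rather than raw bytes.

```python
from multiprocessing import shared_memory

# Parent side: pre-allocate a shared-memory buffer for a worker's result.
payload = b"worker result bytes"
shm = shared_memory.SharedMemory(create=True, size=len(payload))
try:
    # Worker side: write the result directly into the buffer --
    # the payload itself is never pickled.
    shm.buf[: len(payload)] = payload

    # Only a lightweight handle (name + size) is sent back to the parent.
    handle = (shm.name, len(payload))

    # Parent side: attach by name and read the bytes back without copying
    # them through a pipe or queue.
    view = shared_memory.SharedMemory(name=handle[0])
    result = bytes(view.buf[: handle[1]])
    view.close()
finally:
    shm.close()
    shm.unlink()
```

Because the handle is a few dozen bytes regardless of model size, the return-trip serialization cost becomes constant instead of scaling with the number of parameters.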

@kitsuyaazuma kitsuyaazuma requested a review from Copilot July 12, 2025 02:26
Contributor

Copilot AI left a comment


Pull Request Overview

This PR adds a zero-copy shared-memory IPC mode by pre-allocating shared-memory buffers for worker results and introducing utilities to move, replace, and reconstruct tensors without pickling.

  • Introduce SHMHandle, process_tensors_in_object, and reconstruct_from_shared_memory in utils.
  • Refactor ProcessPoolClientTrainer to prepare per-client shared-memory buffers and use the new utilities in local_process and worker.
  • Update tests and the FedAvg trainer to exercise the zero-copy return path.
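The traversal-and-replace utility described above can be sketched as follows. Everything here is an assumption for illustration: `FakeTensor` stands in for `torch.Tensor`, the `SHMHandle` field names are invented, and the real `process_tensors_in_object` presumably moves tensor storage into shared memory in `"move"` mode rather than leaving it untouched.

```python
from dataclasses import dataclass, fields, is_dataclass, replace
from typing import Any


@dataclass
class FakeTensor:
    """Stand-in for torch.Tensor in this sketch."""
    data: bytes


@dataclass(frozen=True)
class SHMHandle:
    """Lightweight placeholder for a tensor; field names are assumptions."""
    name: str
    size: int


def process_tensors_in_object(
    obj: Any, mode: str, max_depth: int = 10, _depth: int = 0
) -> Any:
    """Recursively walk dataclasses and containers, swapping tensors for handles."""
    if _depth > max_depth:
        return obj
    if isinstance(obj, FakeTensor):
        if mode == "replace":
            return SHMHandle(name=f"shm_{id(obj)}", size=len(obj.data))
        return obj  # "move" would relocate the tensor's storage to shared memory
    if is_dataclass(obj) and not isinstance(obj, type):
        updates = {
            f.name: process_tensors_in_object(
                getattr(obj, f.name), mode, max_depth, _depth + 1
            )
            for f in fields(obj)
        }
        return replace(obj, **updates)
    if isinstance(obj, list):
        return [process_tensors_in_object(x, mode, max_depth, _depth + 1) for x in obj]
    if isinstance(obj, dict):
        return {
            k: process_tensors_in_object(v, mode, max_depth, _depth + 1)
            for k, v in obj.items()
        }
    return obj


@dataclass
class UplinkPackage:
    model: FakeTensor
    metrics: dict


pkg = UplinkPackage(model=FakeTensor(b"\x00" * 8), metrics={"loss": 0.1})
light = process_tensors_in_object(pkg, mode="replace")
```

After `"replace"`, `light.model` is an `SHMHandle` while non-tensor fields such as `metrics` pass through unchanged, which is what makes the package cheap to pickle.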

Reviewed Changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated no comments.

Summary per file:

| File | Description |
| --- | --- |
| `tests/test_core/test_client_trainer.py` | Add tensor field, import `SHMHandle`, and implement buffer prep |
| `src/blazefl/core/utils.pyi` | Define `SHMHandle`, `process_tensors_in_object`, and reconstruction |
| `src/blazefl/core/utils.py` | Implement tensor traversal, replace, and reconstruction utilities |
| `src/blazefl/core/client_trainer.pyi` | Update `worker` signature, add `prepare_uplink_package_buffer` |
| `src/blazefl/core/client_trainer.py` | Use new utils to move/replace tensors and reconstruct results |
| `src/blazefl/core/__init__.py[.pyi]` | Update exports to include new utilities |
| `src/blazefl/contrib/fedavg.py` | Extend FedAvg trainer for shared-memory uplink packages |
Comments suppressed due to low confidence (3)

src/blazefl/core/utils.py:21

  • The default max_depth is set to 1 but the docstring describes a default of 10. Align the code and documentation by either updating the default to 10 or correcting the docstring.
    obj: T, mode: Literal["move", "replace"], max_depth: int = 1

src/blazefl/core/utils.py:41

  • This line describes a default of 10 for max_depth, but the function signature uses 1. Please keep these in sync.
        max_depth: The maximum recursion depth. Defaults to 10.

src/blazefl/core/utils.py:20

  • [nitpick] Consider adding unit tests for process_tensors_in_object and reconstruct_from_shared_memory to verify correct round-trip behavior, handle nested structures, and test the shared-memory paths.
def process_tensors_in_object(

@kitsuyaazuma kitsuyaazuma merged commit b122224 into main Jul 12, 2025
2 checks passed
@kitsuyaazuma kitsuyaazuma deleted the feat-process-pool-zero-copy branch July 12, 2025 03:00