Initial support for multihost pipelines #5358
Conversation
Pull request overview
This PR introduces initial support for multihost pipelines, enabling Feldera to distribute pipeline execution across multiple hosts. The implementation adds coordination mechanisms for managing distributed pipeline processes, including step synchronization, checkpoint coordination, and transaction management across hosts.
Key changes:
- Added `MultihostConfig` and coordination API endpoints for multihost pipeline management (a rough configuration sketch follows this list)
- Implemented coordinator interfaces for controlling distributed pipeline execution
- Extended runtime status enums with a new `Coordination` state for pipeline initialization in multihost mode
- Added infrastructure for inter-host communication and layout configuration
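As a rough sketch of what the new configuration could look like: the `hosts` field is taken from a diff excerpt further down in this conversation, while the derives and everything else shown here are assumptions for illustration, not the PR's exact definition.

```rust
use serde::{Deserialize, Serialize};

/// Illustrative sketch of the multihost configuration added to
/// `PipelineConfig`. Only the `hosts` field is confirmed by the diff
/// excerpts below; everything else is an assumption.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct MultihostConfig {
    /// Number of hosts the pipeline is distributed across.
    ///
    /// The worker threads are evenly divided among the hosts. For
    /// single-host deployments, this should be 1 (the default).
    pub hosts: usize,
}
```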
Reviewed changes
Copilot reviewed 39 out of 42 changed files in this pull request and generated 4 comments.
Summary per file:
| File | Description |
|---|---|
| crates/feldera-types/src/coordination.rs | New module defining coordinator-pipeline interface for multihost operations |
| crates/feldera-types/src/config.rs | Added MultihostConfig struct and multihost field to PipelineConfig |
| crates/feldera-types/src/runtime_status.rs | Added Coordination status variant and snake_case serialization support |
| crates/adapters/src/server.rs | Implemented coordination API endpoints and multihost initialization logic |
| crates/adapters/src/controller.rs | Extended controller with coordination request handling and checkpoint preparation |
| crates/dbsp/src/circuit/dbsp_handle.rs | Modified fingerprint handling and layout support for multihost circuits |
| crates/dbsp/src/operator/dynamic/communication/shard.rs | Added worker range-based sharding for multihost data distribution |
| crates/pipeline-manager/src/runner/local_runner.rs | Added validation to reject multihost deployments in local runner |
| crates/pipeline-manager/src/db/types/program.rs | Integrated multihost configuration generation based on runtime config |
| Cargo.toml | Updated dependencies including tarpc version bump for RPC communication |
Comments suppressed due to low confidence (3)
- crates/pipeline-manager/src/db/types/combined_status.rs:1: The `todo!()` placeholder for `RuntimeStatus::Coordination` will panic at runtime. This needs to be implemented with the appropriate `CombinedStatus` variant or mapping logic before this code path is exercised.
- crates/pipeline-manager/src/db/types/combined_status.rs:1: The `todo!()` placeholder for `RuntimeDesiredStatus::Coordination` will panic at runtime. This needs to be implemented with the appropriate `CombinedDesiredStatus` variant or mapping logic before this code path is exercised.
- crates/pipeline-manager/src/db/types/combined_status.rs:1: The `todo!()` placeholder for `RuntimeDesiredStatus::Coordination` will panic at runtime. This needs to be implemented with the appropriate `CombinedDesiredStatus` variant or mapping logic before this code path is exercised.
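A minimal sketch of the kind of mapping these placeholders need. The enums below are stand-ins with made-up variants, not the actual definitions in combined_status.rs; the point is only that the new `Coordination` status needs an explicit arm instead of `todo!()`.

```rust
// Placeholder enums for illustration; the real variants may differ.
enum RuntimeStatus {
    Initializing,
    Running,
    Coordination,
}

enum CombinedStatus {
    Initializing,
    Running,
}

fn combined_status(status: RuntimeStatus) -> CombinedStatus {
    match status {
        RuntimeStatus::Initializing => CombinedStatus::Initializing,
        RuntimeStatus::Running => CombinedStatus::Running,
        // Previously `todo!()`, which panics the first time this path runs.
        RuntimeStatus::Coordination => CombinedStatus::Initializing,
    }
}
```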
Force-pushed bba6ac5 to 2832f45
Force-pushed c25390d to e0d8714
ryzhyk left a comment
I've only reviewed the preliminary commits so far. Starting on the main commit now.
```rust
        self,
    )
    #[track_caller]
    fn dyn_gather_multihost(&self, factories: &B::Factories, receiver_worker: usize) -> Stream<C, B>
```
Is there any reason dyn_gather_multihost wouldn't work in the single-host case?
It should work there too; it's just more expensive.
ryzhyk left a comment
Haven't read coordinator code yet, but this looks extremely clean!
```rust
pub enum StepAction {
    /// Wait for instructions from the coordinator.
    Idle,
    /// Wait for a triggering event to occur, such as arrival of a sufficient
```
I thought it was the coordinator's job to decide when all the conditions for a step are met.
The idea here is that it's a waste of a round trip for the pipeline to report that it's got data and then for the coordinator to tell it to run a step. Instead, the coordinator says to start a step if data shows up.
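A sketch of how that could look in the enum: `Idle` and the start of the second doc comment come from the diff excerpt above, while the variant encoding "start a step when data shows up" is an illustrative guess, not the PR's actual definition.

```rust
/// Sketch of the coordinator-to-pipeline step protocol. `Idle` is from the
/// diff excerpt above; `StepOnTrigger` is a made-up name for illustration.
pub enum StepAction {
    /// Wait for instructions from the coordinator.
    Idle,
    /// Start a step as soon as a triggering event occurs, such as arrival of
    /// sufficient input data, without a further round trip to the
    /// coordinator.
    StepOnTrigger,
}
```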
```rust
    ///
    /// The worker threads are evenly divided among the hosts. For single-host
    /// deployments, this should be 1 (the default).
    pub hosts: usize,
```
Do we need this here, given that it's already in PipelineConfig?
```rust
    /// Used to notify watchers when a step has been completed.
    ///
    /// This is updated to match `step` whenever it changes.
    step_sender: tokio::sync::watch::Sender<StepStatus>,
```
Nit: import Sender instead of using a fully qualified path.
We've got std::sync::mpsc::Sender imported already. I could import this with a partially qualified path, as watch::Sender, if you like.
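For reference, the partially qualified form being suggested would look roughly like this; the surrounding struct name and the `StepStatus` placeholder are made up for illustration.

```rust
use tokio::sync::watch;

// Placeholder for the pipeline's actual step status type.
struct StepStatus;

struct Coordination {
    /// Partially qualified path: avoids a clash with the already-imported
    /// `std::sync::mpsc::Sender` while staying shorter than the fully
    /// qualified `tokio::sync::watch::Sender`.
    step_sender: watch::Sender<StepStatus>,
}
```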
```rust
    /// - Whether the pipeline is currently `running`.
    /// - Whether a checkpoint has already been requested.
    /// - The current [CoordinationRequest] and the current step.
    ///
```
Is the current step the next step the pipeline hasn't taken yet?
```rust
    });
    Ok(
        HttpResponseBuilder::new(StatusCode::OK).streaming(stream.map(|value| {
            Ok::<_, Infallible>(Bytes::from(serde_json::to_string(&value).unwrap() + "\n"))
```
I hope the HTTP connection doesn't time out if there is no status change for a few seconds.
It doesn't seem to. I think that reqwest shares connections across requests. At any rate, the coordinator reconnects if anything drops, so it's not a big deal if it does.
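A sketch of the coordinator-side reconnect behavior described here, assuming the status endpoint streams newline-delimited JSON as in the diff above. The function and endpoint names are placeholders, and this is not the PR's actual coordinator code.

```rust
use std::time::Duration;

use futures_util::StreamExt;

/// Follow a newline-delimited JSON status stream, reconnecting whenever the
/// connection drops or fails to open. Purely illustrative.
async fn watch_pipeline_status(client: &reqwest::Client, status_url: &str) {
    loop {
        if let Ok(response) = client.get(status_url).send().await {
            let mut stream = response.bytes_stream();
            let mut buffer: Vec<u8> = Vec::new();
            while let Some(Ok(chunk)) = stream.next().await {
                buffer.extend_from_slice(&chunk);
                // Each complete line is one JSON-encoded status update.
                while let Some(pos) = buffer.iter().position(|&b| b == b'\n') {
                    let line: Vec<u8> = buffer.drain(..=pos).collect();
                    if let Ok(value) = serde_json::from_slice::<serde_json::Value>(&line) {
                        println!("status update: {value}");
                    }
                }
            }
        }
        // Connection dropped or failed to open: back off briefly and reconnect.
        tokio::time::sleep(Duration::from_secs(1)).await;
    }
}
```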
Force-pushed e238563 to 7049cd0
```rust
    /// the output connector is detached.
    pub enable_count: Arc<AtomicUsize>,

    /// In a multihost pipeline, this is the range of workers that gathers the
```
Separate question regarding workers: does every host have the same number of workers? That is, does runtime_config.workers mean workers per host?
Ah, I see, so the number of workers gets distributed evenly across the hosts. It seems counterintuitive to me that when I raise the number of hosts, I don't necessarily get more compute unless my existing hosts were overprovisioned.
How do you think that the workers and hosts should be specified?
Hmm, it depends on whether we assume that all hosts are equal.
If yes, workers_per_host? It could also be done backward-compatibly, by requiring hosts * workers_per_host = workers.
If not, maybe a worker_distribution array of arrays, e.g., [[0, 1, 2], [4], [5, 6, 7]]. In any case, with heterogeneous hosts this should be under the user's control, as the entire distributed computation follows the speed of the slowest worker.
I think that we should initially assume that all hosts are equal. Assuming otherwise means a lot of infrastructure, which I do not think exists, for assigning pods to particular machines.
Making users do a math problem seems like overkill too.
I don't see a really good solution, so for now I think I'll stick with this one. It will satisfy our goal of being able to start a pipeline on multiple hosts for backfill and then consolidate it onto a single host once backfill is complete. That's easy: initially set hosts > 1, then later change it to 1. (Of course, initially we won't be able to change hosts after deployment, but that will be a goal later.)
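A small sketch of the even-split behavior being discussed, under the assumption stated above that all hosts are equal: given the total `workers` from the runtime config and the number of `hosts`, each host gets a contiguous range of global worker indexes. This is illustrative arithmetic, not the PR's exact code.

```rust
use std::ops::Range;

/// Illustrative sketch: split `workers` global worker indexes evenly across
/// `hosts`, giving each host a contiguous range. Assumes `workers` is a
/// multiple of `hosts`, as the discussion above suggests for equal hosts.
fn worker_range(workers: usize, hosts: usize, host: usize) -> Range<usize> {
    assert_eq!(workers % hosts, 0, "workers must divide evenly across hosts");
    let per_host = workers / hosts;
    host * per_host..(host + 1) * per_host
}

fn main() {
    // 8 workers over 2 hosts: host 0 runs workers 0..4, host 1 runs 4..8.
    assert_eq!(worker_range(8, 2, 0), 0..4);
    assert_eq!(worker_range(8, 2, 1), 4..8);
}
```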
```rust
            elapsed.as_secs()
        );
    });
    std::thread::sleep(Duration::from_millis(100));
```
Nitpick: the 100-millisecond duration could be converted into a constant.
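For instance, a trivial sketch of the suggested change; the constant name is made up for illustration.

```rust
use std::time::Duration;

/// Hypothetical name for the suggested constant.
const POLL_INTERVAL: Duration = Duration::from_millis(100);

fn wait_for_next_poll() {
    std::thread::sleep(POLL_INTERVAL);
}
```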
Signed-off-by: Ben Pfaff <[email protected]>
The actix_web #[get] macro does something weird with function names that makes it hard to use the same name pretty much anywhere in the same scope, even as local variables. I kept trying to use `status` and it was not working out well. Signed-off-by: Ben Pfaff <[email protected]>
This allows the runner to report errors from OpenSSL. Signed-off-by: Ben Pfaff <[email protected]>
It's easier for me to think about atomics than about mutexes because there is no way for anything to happen while they are "held", so no possibility of contributing to a deadlock, etc. Signed-off-by: Ben Pfaff <[email protected]>
Signed-off-by: Ben Pfaff <[email protected]>
Signed-off-by: Ben Pfaff <[email protected]>
This ensures that the checkpoint can always be read from the checkpoint directory before trying to make it the default checkpoint. Signed-off-by: Ben Pfaff <[email protected]>
Signed-off-by: Ben Pfaff <[email protected]>
Signed-off-by: Ben Pfaff <[email protected]>
Signed-off-by: Ben Pfaff <[email protected]>
@ryzhyk pointed out that this avoids a potential pitfall. Signed-off-by: Ben Pfaff <[email protected]>
Until now, each table scan during an ad-hoc query separately locked the table of snapshots. The table of snapshots could change from one scan to the next. This meant that ad-hoc queries that involved multiple tables, or that scanned a single table multiple times, could work with inconsistent data. This fixes the problem. Signed-off-by: Ben Pfaff <[email protected]>
This enables a multihost coordinator to get a lease on a particular step across all of the hosts, then scan the tables in that step, and finally drop the lease. Signed-off-by: Ben Pfaff <[email protected]>
Before this, all the output streams (that output connectors read) were assigned to host 0. This distributes them evenly across the hosts. Signed-off-by: Ben Pfaff <[email protected]>
In CollectionHandle, the array of partitions only contains entries for the current host, so they should not be biased by the worker offset. Biasing them caused a panic due to an invalid array index. The input handle mailboxes, however, do expect a global worker index, but ZSetStagedBuffers::flush was passing in a local index, which also caused a panic due to an invalid array index. This fixes both issues. Signed-off-by: Ben Pfaff <[email protected]>
Same fix as the previous commit, for additional calls. Signed-off-by: Ben Pfaff <[email protected]>
I think that this is ultimately harmless, but it makes more sense than the other order. Signed-off-by: Ben Pfaff <[email protected]>
This updates a match case to be more like the previous one, which makes the code easier for me to understand. Signed-off-by: Ben Pfaff <[email protected]>
Signed-off-by: Ben Pfaff <[email protected]>
Signed-off-by: Ben Pfaff <[email protected]>
This fixes a panic due to attempting to serialize `dyn Data`, which has `todo!()` for its rkyv implementation. I don't know whether it is a correct fix. Issue: #5426 Signed-off-by: Ben Pfaff <[email protected]>
The await on a notified() object only wakes up for notification since the object was created. Signed-off-by: Ben Pfaff <[email protected]>
Signed-off-by: Ben Pfaff <[email protected]>
An extra space after a comma was causing filter parsing issues, leading to no logs being captured. Signed-off-by: Swanand Mulay <[email protected]>
Thanks, Simon. Signed-off-by: Ben Pfaff <[email protected]>
Signed-off-by: Ben Pfaff <[email protected]>
The host-id specified in args can be used when initializing logging. It helps distinguish logs originating from different hosts. Signed-off-by: Swanand Mulay <[email protected]>
Signed-off-by: feldera-bot <[email protected]>
Signed-off-by: Ben Pfaff <[email protected]>
Signed-off-by: feldera-bot <[email protected]>
Signed-off-by: Ben Pfaff <[email protected]>
No description provided.