[IMPROVEMENT] Implement Network Reconnection for Enhancing Replica Rebuilding Resilience #9626

Description

@derekbit

Is your improvement request related to a feature? Please describe (👍 if you like this request)

Replica rebuilding can fail because of networking issues, such as TCP connections dropped under high CPU load. When this happens, the rebuilding process must restart. Longhorn can skip data blocks that already exist on the destination by comparing the checksums of the source and destination, but the restarted rebuild is still inefficient.

Describe the solution you'd like

To improve efficiency, Longhorn could attempt to reconnect and resume the replica rebuilding process after a connection drop. If the number of reconnection attempts exceeds a maximum, Longhorn can abort the rebuild on the grounds that the node's network is too unstable.
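To make the proposal concrete, here is a minimal, language-agnostic sketch in Python (Longhorn itself is written in Go). It is not Longhorn code: `ConnectionDropped`, `rebuild_with_reconnect`, `send_block`, `max_reconnects`, and `FlakySender` are all hypothetical names invented for illustration. It only shows the control flow described above: resume at the block that failed instead of restarting, and abort once the reconnection count exceeds the limit.

```python
import time


class ConnectionDropped(Exception):
    """Hypothetical error raised by the transport when the connection is lost."""


def rebuild_with_reconnect(source_blocks, send_block, max_reconnects=3,
                           backoff_seconds=0.0):
    """Copy blocks in order, resuming at the failed block after a drop.

    Returns the number of reconnects used. Raises RuntimeError once the
    number of reconnects exceeds max_reconnects.
    """
    reconnects = 0
    next_index = 0  # first block not yet confirmed on the destination
    while next_index < len(source_blocks):
        try:
            send_block(next_index, source_blocks[next_index])
            next_index += 1  # advance only after a successful send
        except ConnectionDropped:
            reconnects += 1
            if reconnects > max_reconnects:
                raise RuntimeError(
                    f"aborting rebuild after {max_reconnects} reconnects; "
                    f"stopped at block {next_index}"
                )
            # Simple linear backoff before retrying the same block.
            time.sleep(backoff_seconds * reconnects)
    return reconnects


class FlakySender:
    """Hypothetical destination that drops the connection on chosen calls."""

    def __init__(self, fail_on_calls):
        self.fail_on_calls = set(fail_on_calls)
        self.calls = 0
        self.received = {}

    def __call__(self, index, data):
        self.calls += 1
        if self.calls in self.fail_on_calls:
            raise ConnectionDropped(f"dropped while sending block {index}")
        self.received[index] = data


if __name__ == "__main__":
    blocks = [b"a", b"b", b"c", b"d"]
    sender = FlakySender(fail_on_calls={2, 4})
    used = rebuild_with_reconnect(blocks, sender, max_reconnects=3)
    print(used, sender.calls)
```

With two simulated drops, the four blocks are delivered in six send calls rather than being re-sent from block 0 after each failure. A real implementation would also need to decide what counts as a confirmed block and whether the reconnect counter resets after a period of stable transfer; neither is specified in this request.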

Describe alternatives you've considered

Additional context

The improvement is inspired by #8745

Metadata

Labels

area/resilience: System or volume resilience
area/v1-data-engine: v1 data engine (iSCSI tgt)
area/v2-data-engine: v2 data engine (SPDK)
area/volume-replica-rebuild: Volume replica rebuilding related
kind/improvement: Request for improvement of existing function
priority/0: Must be implemented or fixed in this release (managed by PO)
require/auto-e2e-test: Require adding/updating auto e2e test cases if they can be automated

Projects

Status
Closed
