Is your improvement request related to a feature? Please describe (👍 if you like this request)
Replica rebuilding can fail because of network problems, such as TCP connections that drop under high CPU load. When this happens, the rebuild must start over. Although Longhorn can skip data blocks that already exist by comparing the checksums of the source and destination, restarting from the beginning is still inefficient.
Describe the solution you'd like
To improve efficiency, Longhorn could try to reconnect after a connection drop and resume the replica rebuild instead of restarting it. If the number of reconnection attempts exceeds a maximum, Longhorn can abort the rebuild, since the node's network is too unstable to complete it.
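The proposed behavior could look roughly like the Go sketch below. It is only an illustration and not Longhorn's actual code: `ErrConnDropped`, `syncFunc`, and `rebuildWithResume` are hypothetical names, and the block-index bookkeeping assumes the transfer step can report how far it got before the connection dropped.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrConnDropped is a hypothetical sentinel error for a dropped connection.
var ErrConnDropped = errors.New("connection dropped")

// syncFunc is a hypothetical transfer step. It copies blocks starting at
// index `from` and returns the index of the next block still to be copied,
// even when it fails partway through.
type syncFunc func(from, total int) (next int, err error)

// rebuildWithResume keeps calling sync until all blocks are copied.
// After a dropped connection it resumes from the last reported block
// instead of restarting at block 0, and it aborts once the number of
// reconnects exceeds maxReconnects.
func rebuildWithResume(total, maxReconnects int, sync syncFunc) (int, error) {
	next, reconnects := 0, 0
	for next < total {
		n, err := sync(next, total)
		next = n // keep the progress made before any failure
		if err == nil {
			continue
		}
		if !errors.Is(err, ErrConnDropped) {
			return next, err // not a connection drop: do not retry
		}
		reconnects++
		if reconnects > maxReconnects {
			return next, fmt.Errorf("aborting rebuild at block %d: reconnect limit %d exceeded: %w",
				next, maxReconnects, err)
		}
	}
	return next, nil
}
```

One open design choice is whether the reconnect counter should reset after a stretch of successful transfer. The sketch counts drops over the whole rebuild, which is the simpler reading of "exceed a maximum number".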
Describe alternatives you've considered
Additional context
The improvement is inspired by #8745