[BUG] Node disconnection test failed #5476

@yangchiu

Description

Describe the bug (🐛 if you encounter this issue)

In Node disconnection test case 1, the desired behavior is:

If data is being written during the disconnection,
the engine process cannot communicate with the other replicas,
so it marks all other replicas as ERROR.

The volume will remain detached, and all replicas will remain in the error state after the node's network connection is restored.

But in v1.4.1-rc1, a network disconnection does not put the replica on the attached node into the error state; it remains healthy. After the node's network connection is restored, the replicas on the other nodes are rebuilt from this healthy replica, and eventually all replicas are healthy, which is not the expected behavior.

We need to confirm whether this behavior change is expected.
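
The two outcomes described above can be told apart with a small check. This is only a sketch: `VolumeSnapshot`, `ReplicaSnapshot`, their fields, and the state strings are hypothetical names chosen for illustration, not the Longhorn API schema or the test suite's actual helpers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReplicaSnapshot:
    # Hypothetical fields for illustration; not the Longhorn API schema.
    node: str   # node that hosts this replica
    state: str  # "healthy" or "error"


@dataclass
class VolumeSnapshot:
    # State observed after the node's network connection is restored.
    state: str          # e.g. "attached" or "detached"
    attached_node: str  # node the volume was attached to before the disconnection
    replicas: List[ReplicaSnapshot]


def matches_desired_behavior(volume: VolumeSnapshot) -> bool:
    """Desired outcome: the volume stays detached and every replica is in error."""
    return volume.state == "detached" and all(
        replica.state == "error" for replica in volume.replicas
    )


def matches_reported_behavior(volume: VolumeSnapshot) -> bool:
    """Reported outcome: the replica on the attached node stayed healthy."""
    return any(
        replica.node == volume.attached_node and replica.state == "healthy"
        for replica in volume.replicas
    )
```

A manual run of the test case could record the volume and replica states once the network is back and pass them to both functions to see which outcome was observed.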

To Reproduce

Manually execute Node disconnection test case 1.

Expected behavior

A clear and concise description of what you expected to happen.

Log or Support bundle

If applicable, add the Longhorn managers' log or support bundle when the issue happens.
You can generate a Support Bundle using the link at the footer of the Longhorn UI.

Environment

  • Longhorn version:
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl):
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version:
    • Number of management node in the cluster:
    • Number of worker node in the cluster:
  • Node config
    • OS type and version:
    • CPU per node:
    • Memory per node:
    • Disk type(e.g. SSD/NVMe):
    • Network bandwidth between the nodes:
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal):
  • Number of Longhorn volumes in the cluster:

Additional context

Add any other context about the problem here.

Metadata

Labels

  • area/resilience: System or volume resilience
  • area/v1-data-engine: v1 data engine (iSCSI tgt)
  • backport/1.4.1
  • component/longhorn-manager: Longhorn manager (control plane)
  • kind/bug
  • priority/0: Must be implemented or fixed in this release (managed by PO)
  • release/behavior-change-note: Note for behavior change
  • reproduce/always: 100% reproducible
  • require/manual-test-plan: Require adding/updating manual test cases if they can't be automated
  • severity/1: Function broken (a critical incident with very high impact, e.g. data corruption, failed upgrade)

Type

No type

Projects

Status

Closed

Milestone

Relationships

None yet

Development

No branches or pull requests
