cephadm: fix rm-cluster when /etc/ceph/ceph.conf is a directory #67621
Fix IsADirectoryError in _rm_cluster() when /etc/ceph/ceph.conf exists
as a directory instead of a file.
The Error:
----------
During cluster cleanup, _rm_cluster() fails with:
    Traceback (most recent call last):
      File "/home/ubuntu/cephtest/cephadm", line 8634, in <module>
        main()
      File "/home/ubuntu/cephtest/cephadm", line 8622, in main
        r = ctx.func(ctx)
      File "/home/ubuntu/cephtest/cephadm", line 6538, in command_rm_cluster
        with open(files[0]) as f:
    IsADirectoryError: [Errno 21] Is a directory: '/etc/ceph/ceph.conf'
This occurs when attempting to remove a cluster where /etc/ceph/ceph.conf
is a directory instead of the expected file.
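The failure can be reproduced outside cephadm with a bare open() on a directory. This is a minimal sketch, assuming a Linux host; read_config() and reproduce() are hypothetical helpers, and a temporary directory stands in for /etc/ceph/ceph.conf.

```python
import errno
import os
import tempfile


def read_config(path: str) -> str:
    # A bare open(), like the `with open(files[0]) as f:` call in the
    # traceback; it silently assumes `path` is a regular file.
    with open(path) as f:
        return f.read()


def reproduce() -> int:
    """Return the errno raised when the config path is a directory (0 if none)."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical stand-in for /etc/ceph/ceph.conf, created as a directory.
        conf = os.path.join(root, 'ceph.conf')
        os.mkdir(conf)
        try:
            read_config(conf)
        except IsADirectoryError as e:
            return e.errno
    return 0


print(reproduce() == errno.EISDIR)  # True on Linux
```

EISDIR is errno 21 on Linux, which matches the "[Errno 21] Is a directory" message in the traceback.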
Root Cause:
-----------
Container services (iSCSI, NFS, NVMe-oF) create bind mounts like:
    mounts[os.path.join(data_dir, 'config')] = '/etc/ceph/ceph.conf:z'
Docker/Podman has a quirk: when bind mounting to a destination that
doesn't exist, the container runtime creates the destination as a
DIRECTORY, not a file.
This occurs in test environments when:
1. The test framework creates the /etc/ceph/ directory
2. A container service starts before bootstrap writes /etc/ceph/ceph.conf
3. The container runtime creates /etc/ceph/ceph.conf as a directory for the
   bind mount
4. Later, _rm_cluster() calls open('/etc/ceph/ceph.conf') assuming
   it's a file, which raises IsADirectoryError
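The ordering in steps 1-3 can be illustrated with a small simulation. This is a sketch, not container runtime code: mimic_bind_target() is a hypothetical stand-in that only models the behavior described above (a missing bind-mount path is created as a directory, an existing file is left alone), and temporary paths stand in for /etc/ceph/ceph.conf.

```python
import os
import tempfile
from typing import Dict


def mimic_bind_target(path: str) -> None:
    """Hypothetical model of the runtime behavior: a missing bind-mount
    path is created as a directory; an existing path is used as-is."""
    if not os.path.exists(path):
        os.makedirs(path)


def kind(path: str) -> str:
    """Classify a path as 'file', 'dir', or 'missing'."""
    if os.path.isfile(path):
        return 'file'
    if os.path.isdir(path):
        return 'dir'
    return 'missing'


def demo() -> Dict[str, str]:
    results = {}
    with tempfile.TemporaryDirectory() as root:
        # Bootstrap writes ceph.conf before the container service starts.
        conf_a = os.path.join(root, 'a', 'etc', 'ceph', 'ceph.conf')
        os.makedirs(os.path.dirname(conf_a))   # step 1: /etc/ceph/ exists
        with open(conf_a, 'w') as f:
            f.write('[global]\n')
        mimic_bind_target(conf_a)
        results['bootstrap_first'] = kind(conf_a)

        # The container service starts first, as in steps 1-3.
        conf_b = os.path.join(root, 'b', 'etc', 'ceph', 'ceph.conf')
        os.makedirs(os.path.dirname(conf_b))   # step 1
        mimic_bind_target(conf_b)              # steps 2-3
        results['container_first'] = kind(conf_b)
    return results


print(demo())  # {'bootstrap_first': 'file', 'container_first': 'dir'}
```

The only difference between the two cases is ordering, which is why the problem shows up as a race rather than on every run.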
Scenarios where /etc/ceph/ceph.conf may not exist as a file:
- Bootstrap with custom --output-config location
- Bootstrap failure or interruption before the config is written
- Race condition: container service starts before bootstrap completes
- Manual deletion of the config file
- Container-only deployment without host-level config
The Fix:
--------
Use early returns and guard clauses to handle different states of
/etc/ceph/ceph.conf:
1. If ceph.conf doesn't exist: return early (nothing to clean up)
2. If ceph.conf is an empty directory: remove it and return (leftover
bind mount point from container)
3. If ceph.conf exists but is not a file: return early (unexpected state)
4. If ceph.conf is a file: validate fsid and remove all config files
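A minimal sketch of the four guard clauses, assuming a layout like the one in the traceback, where files[0] is the ceph.conf path and the remaining entries are other /etc/ceph files to delete. rm_host_config() and its string return values are hypothetical, added only so each branch is observable, and "validate fsid" is assumed here to mean that the fsid string appears in the file's contents. This is an illustration, not the patch itself.

```python
import os
import tempfile
from typing import List


def rm_host_config(fsid: str, files: List[str]) -> str:
    """Hypothetical guard-clause cleanup; files[0] is the ceph.conf path."""
    conf = files[0]

    # 1. Nothing at the path: nothing to clean up.
    if not os.path.exists(conf):
        return 'missing'

    # 2. Empty (non-symlink) directory: leftover bind-mount point, remove it.
    if os.path.isdir(conf) and not os.path.islink(conf) and not os.listdir(conf):
        os.rmdir(conf)
        return 'removed-empty-dir'

    # 3. Exists but is not a regular file (e.g. a non-empty directory): leave it.
    if not os.path.isfile(conf):
        return 'unexpected'

    # 4. Regular file: only delete config files that belong to this cluster.
    with open(conf) as f:
        if fsid not in f.read():
            return 'fsid-mismatch'
    for path in files:
        if os.path.isfile(path):
            os.remove(path)
    return 'removed-files'


def walk_branches() -> List[str]:
    """Exercise each branch in a throwaway directory."""
    statuses = []
    with tempfile.TemporaryDirectory() as root:
        conf = os.path.join(root, 'ceph.conf')
        pub = os.path.join(root, 'ceph.pub')
        files = [conf, pub]
        statuses.append(rm_host_config('fsid-1', files))   # nothing there
        os.mkdir(conf)
        statuses.append(rm_host_config('fsid-1', files))   # empty directory
        os.mkdir(conf)
        stray = os.path.join(conf, 'stray')
        open(stray, 'w').close()
        statuses.append(rm_host_config('fsid-1', files))   # non-empty directory
        os.remove(stray)
        os.rmdir(conf)
        with open(conf, 'w') as f:
            f.write('[global]\nfsid = fsid-2\n')
        statuses.append(rm_host_config('fsid-1', files))   # other cluster's file
        with open(conf, 'w') as f:
            f.write('[global]\nfsid = fsid-1\n')
        open(pub, 'w').close()
        statuses.append(rm_host_config('fsid-1', files))   # this cluster's file
    return statuses


print(walk_branches())
# ['missing', 'removed-empty-dir', 'unexpected', 'fsid-mismatch', 'removed-files']
```

Returning early on each unexpected state keeps the destructive os.remove() calls behind the single branch where the path is known to be a regular file that mentions this cluster's fsid.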
Signed-off-by: Kefu Chai <[email protected]>
Fixes: https://tracker.ceph.com/issues/75275