ceph_check, in short, is a reporting tool for RHCS/Ceph clusters.
-
A distributed storage solution such as Ceph has to be installed according to specific guidelines.
-
This is important for optimal performance and ease of use.
ceph_check intends to find unsupported or suboptimal configurations.
-
ceph_check is mainly intended for Red Hat Ceph Storage (RHCS) installations, but can equally be applied to upstream Ceph installations.
ceph_check can be run from any node that fulfills the following points:
-
The node has to have Ansible installed.
-
The user executing the program has passwordless SSH access to the cluster nodes.
-
The user executing the program has at least read access to the Ceph Admin keyring.
-
ceph_check will detect custom keyring locations and use them appropriately. As a norm, any custom keyrings should be mentioned in /etc/ceph/ceph.conf for the Ceph cluster to work properly.
-
Checks the package versions on all the nodes in the Ceph cluster, and reports any discrepancies.
-
Reports the general status of the cluster.
-
Checks whether a custom cluster name is in use. 'ceph' is the only name supported right now.
-
Checks the number of placement groups in the pools, and suggests a proper value.
-
Reports if a single journal disk is being used for more than 6 OSD disks, since 6 is the suggested limit.
-
Checks for colocated MONs and OSDs.
-
Checks for RHCS Tech-preview features being used.
-
Checks for discrepancies in the CRUSH map.
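Two of the checks listed above are simple enough to sketch. The following is a minimal illustration, not ceph_check's actual code. It assumes the placement group suggestion follows the commonly cited Ceph rule of thumb (about 100 PGs per OSD, divided by the pool's replica count, rounded up to a power of two), and that the journal layout is already available as a mapping of journal device to OSD ids. The function names, parameters, and the mapping format are all hypothetical.

```python
def suggested_pg_count(num_osds, replica_size, target_pgs_per_osd=100):
    """Return (num_osds * target) / replica_size, rounded up to a power of two.

    The default of 100 PGs per OSD is the general Ceph rule of thumb and is
    an assumption here, not a value taken from ceph_check.
    """
    if num_osds < 1 or replica_size < 1:
        raise ValueError("num_osds and replica_size must be positive")
    raw = (num_osds * target_pgs_per_osd) / float(replica_size)
    pg_count = 1
    while pg_count < raw:
        pg_count *= 2
    return pg_count


def overloaded_journals(journal_to_osds, max_osds_per_journal=6):
    """Return, sorted, the journal devices serving more OSDs than the limit.

    journal_to_osds is a hypothetical mapping such as
    {"/dev/sdb": [0, 1, 2]}; the limit of 6 comes from the list above.
    """
    return sorted(
        journal
        for journal, osds in journal_to_osds.items()
        if len(osds) > max_osds_per_journal
    )
```

For example, `suggested_pg_count(10, 3)` returns 512, and a journal device mapped to seven OSDs would be reported by `overloaded_journals`.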
-
ceph_check logs to /var/log/messages via rsyslog.
-
If the leader MON is not available, ceph_check will try to contact it three times, at intervals of 5, 10, and 15 seconds. If it cannot reach the MON within that period, it bails out.
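The retry behaviour described above could look roughly like the sketch below. It is an interpretation rather than the tool's real code: it assumes each attempt is preceded by its wait (5, then 10, then 15 seconds), and `probe` is a hypothetical callable that returns True once the leader MON answers. The `sleep` parameter exists only so the waits can be stubbed out.

```python
import time


def contact_with_retries(probe, delays=(5, 10, 15), sleep=time.sleep):
    """Retry probe() after each delay; return True on the first success.

    Returns False once every delay has been used without success, at which
    point the caller is expected to bail out.
    """
    for delay in delays:
        sleep(delay)      # wait before the next attempt
        if probe():       # probe() is a hypothetical MON reachability check
            return True
    return False
```

With the default delays, a caller waits at most 30 seconds in total before giving up.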
ceph_check needs a few features of the subprocess module shipped with Python 3. Since ceph_check also targets OS versions running Python 2, it uses the subprocess32 module, which backports those features to Python 2.
Refer to https://github.com/google/python-subprocess32
- You'll need to install gcc and python-devel before installing subprocess32.
# yum install gcc python-devel -y
subprocess32 can be installed using pip:
# sudo pip install subprocess32
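Once subprocess32 is installed, a script can pick the right module at import time. The sketch below uses the import pattern suggested by the subprocess32 project. The `run_with_timeout` helper is hypothetical, and the choice of the `timeout` argument as the backported feature of interest is an assumption, since the features ceph_check relies on are not listed here.

```python
import os
import sys

# Use the backport on Python 2 (POSIX only), the standard module otherwise.
if os.name == "posix" and sys.version_info[0] < 3:
    import subprocess32 as subprocess
else:
    import subprocess


def run_with_timeout(cmd, timeout=10):
    """Run cmd (a list of arguments) and return its stdout as text.

    Returns None if the command does not finish within `timeout` seconds.
    A non-zero exit status still raises subprocess.CalledProcessError.
    """
    try:
        output = subprocess.check_output(cmd, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    return output.decode("utf-8", "replace")
```

The same code then runs unchanged on Python 2 with subprocess32 and on Python 3 with the standard library.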
ceph_check logs to rsyslog as of now.
It may move to the logger Ceph uses at a later stage, or may use its own log file as it initially did.
rsyslog writes log entries that span multiple lines as a single line. Although ceph_check logs exceptions to /var/log/messages, they won't be formatted the way Python tracebacks normally are.
For example, a ZeroDivisionError (or any other traceback) would look like this:
Aug 21 19:00:30 rhel7 ceph_check: INFO: ####################
Aug 21 19:00:30 rhel7 ceph_check: INFO: Starting ceph_check
Aug 21 19:00:30 rhel7 ceph_check: INFO: Calling check_ansible()
Aug 21 19:00:30 rhel7 ceph_check: INFO: Trying to load the ansible module
Aug 21 19:00:30 rhel7 ceph_check: INFO: `ansible` module loaded, package installed.
Aug 21 19:00:30 rhel7 ceph_check: INFO: Calling check_keyring()
Aug 21 19:00:30 rhel7 ceph_check: INFO: Reading '/etc/ceph/ceph.conf'
Aug 21 19:00:30 rhel7 ceph_check: INFO: <--BUG--><--Cut here-->
Aug 21 19:00:30 rhel7 ceph_check: ERROR: integer division or modulo by zero#012Traceback (most recent call last):#012 File "ceph_check.py", line 266, in <module>#012 checker.cc_condition()#012 File "ceph_check.py", line 72, in cc_condition#012 self.check_keyring()#012 File "ceph_check.py", line 92, in check_keyring#012 1 / 0#012ZeroDivisionError: integer division or modulo by zero
This is due to rsyslog's behaviour of escaping control characters such as newlines and tabs while logging them.
To fix this, add the following to /etc/rsyslog.conf, and restart rsyslog.
$EscapeControlCharactersOnReceive off
Logging should be as expected after this.
Traceback (most recent call last):
  File "ceph_check.py", line 266, in <module>
    checker.cc_condition()
  File "ceph_check.py", line 72, in cc_condition
    self.check_keyring()
  File "ceph_check.py", line 92, in check_keyring
    1 / 0
ZeroDivisionError: integer division or modulo by zero
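Entries that were logged before the rsyslog setting was changed still contain escapes such as `#012`. As a convenience, they can be decoded after the fact. The helper below is a hypothetical sketch, not part of ceph_check. It assumes rsyslog's default escape format of `#` followed by a three-digit octal code, and only decodes codes in the control character range so that other `#` sequences are left alone.

```python
import re

# '#' followed by a three-digit octal code for a control character
# (000-037, or 177 for DEL), as in rsyslog's default escaping.
_ESCAPE = re.compile(r"#(0[0-3][0-7]|177)")


def unescape_rsyslog(line):
    """Replace rsyslog control character escapes with the real characters."""
    return _ESCAPE.sub(lambda match: chr(int(match.group(1), 8)), line)
```

Applied to the ERROR line shown earlier, this turns each `#012` back into a newline, which restores the multi-line traceback layout.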