[CUSTOMER] Use ntpdate in genesis kernel instead of starting the ntp daemon to reduce discovery time #2307
Attempt to resolve #2327
This pull request fixes a customer-reported issue for CORAL where certain nodes take 15+ minutes to complete hardware discovery.
The following log message points us to the 15 minute time difference inside the Genesis Kernel.
At initial investigation, the `ntpq -c rv offset` check seemed to be the most likely cause of the long loop. However, we were unable to easily re-create it. After further investigation, it turns out that most of the time we start `ntpd` and immediately call the `ntpq` command, which returns an offset of 0. My guess is that the ntp daemon is not quite ready yet, so we skip over the while loop.

If we inject a `sleep 15` after starting `ntpd`, the offset becomes very large and ntp does not sync again for 15 minutes, causing the 15 minute delay that we have seen.

Since genesis is not intended to be long-running, I don't think it's necessary to start `ntpd`. Instead, we can just use `ntpdate` to force a clock sync if the server is configured on the xCAT MN.
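
A minimal sketch of the one-shot approach, for illustration only and not the exact change in this pull request. The function name `sync_clock_once` and the way the server address is passed in are assumptions. The only external command used is `ntpdate <server>`.

```shell
#!/bin/sh
# Hypothetical helper: sync the clock once with ntpdate instead of
# starting ntpd and polling ntpq for the offset.
# $1 is the NTP server address (assumed to come from the xCAT MN config).
sync_clock_once() {
    # Skip the sync entirely when no NTP server is configured.
    if [ -z "$1" ]; then
        echo "no NTP server configured, skipping clock sync"
        return 0
    fi

    # ntpdate queries the server, sets the clock, and exits,
    # so there is no daemon warm-up window to race against.
    if ntpdate "$1" >/dev/null 2>&1; then
        echo "clock synced against $1"
    else
        echo "ntpdate against $1 failed, continuing without sync"
        return 1
    fi
}
```

It would be called once during genesis startup, for example `sync_clock_once "$NTP_SERVER"`, where `NTP_SERVER` is a placeholder variable name. Because the command runs once and exits, discovery no longer depends on when the daemon decides to sync again.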
Testing results:
Force-change the date:
Sync in genesis:
Discovery Testing output
After re-running a small provisioning test case, with ntp running on the MN node, we see the following messages: