Closed
kubernetes/kubernetes
#71522
Description
How to reproduce:
- Create a GKE cluster with version v1.10.9-gke.5. (The NPD shipped with this GKE version is v0.5.0, with log counter support; see "Update NPD config for GCI", kubernetes#65342.)
- Log on to a node and verify that NPD is working.
- Compile NPD at HEAD and build the tarball.
- Copy the newly built NPD tarball to the node.
- Untar NPD.
- Run the following commands:
sudo systemctl stop node-problem-detector.service
sudo cp -f bin/node-problem-detector /home/kubernetes/bin/node-problem-detector
sudo systemctl start node-problem-detector.service
- Verify that NPD is not working. The journal shows the following logs:
$ sudo journalctl -u node-problem-detector.service
...
Nov 26 20:02:44 gke-npd-default-pool-17da7bcb-kb52 systemd[1]: Started Kubernetes node problem detector.
Nov 26 20:02:44 gke-npd-default-pool-17da7bcb-kb52 node-problem-detector[14438]: I1126 20:02:44.918435 14438 log_monitor.go:64] Finish parsing log monitor config file: {WatcherConfig:{Plugin:kmsg PluginConfig:map[] LogPath:/dev/kmsg Lookback:5m} BufferSize:10 Source:kernel-monitor DefaultConditions:[{Type:KernelDeadlock Stat
Nov 26 20:02:44 gke-npd-default-pool-17da7bcb-kb52 node-problem-detector[14438]: I1126 20:02:44.919043 14438 log_watchers.go:40] Use log watcher of plugin "kmsg"
Nov 26 20:02:44 gke-npd-default-pool-17da7bcb-kb52 node-problem-detector[14438]: I1126 20:02:44.919373 14438 log_monitor.go:64] Finish parsing log monitor config file: {WatcherConfig:{Plugin:journald PluginConfig:map[source:docker] LogPath:/var/log/journal Lookback:5m} BufferSize:10 Source:docker-monitor DefaultConditions:[]
Nov 26 20:02:44 gke-npd-default-pool-17da7bcb-kb52 node-problem-detector[14438]: I1126 20:02:44.919558 14438 log_watchers.go:40] Use log watcher of plugin "journald"
Nov 26 20:02:44 gke-npd-default-pool-17da7bcb-kb52 node-problem-detector[14438]: I1126 20:02:44.919812 14438 custom_plugin_monitor.go:67] Finish parsing custom plugin monitor config file: {Plugin:custom PluginGlobalConfig:{InvokeIntervalString:0xc00030b4c0 TimeoutString:0xc00030b4d0 InvokeInterval:5m0s Timeout:1m0s MaxOutput
Nov 26 20:02:44 gke-npd-default-pool-17da7bcb-kb52 node-problem-detector[14438]: panic: invalid configuration: no server found for cluster "local"
Nov 26 20:02:44 gke-npd-default-pool-17da7bcb-kb52 node-problem-detector[14438]: goroutine 1 [running]:
Nov 26 20:02:44 gke-npd-default-pool-17da7bcb-kb52 node-problem-detector[14438]: k8s.io/node-problem-detector/pkg/problemclient.NewClientOrDie(0xc0002f0f80, 0xc0002e69c0, 0xc0000405a0)
Nov 26 20:02:44 gke-npd-default-pool-17da7bcb-kb52 node-problem-detector[14438]: /usr/local/google/home/zhenw/go/src/k8s.io/node-problem-detector/pkg/problemclient/problem_client.go:69 +0x3f7
Nov 26 20:02:44 gke-npd-default-pool-17da7bcb-kb52 node-problem-detector[14438]: main.main()
Nov 26 20:02:44 gke-npd-default-pool-17da7bcb-kb52 node-problem-detector[14438]: /usr/local/google/home/zhenw/go/src/k8s.io/node-problem-detector/cmd/node_problem_detector.go:90 +0x302
Nov 26 20:02:44 gke-npd-default-pool-17da7bcb-kb52 systemd[1]: node-problem-detector.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Nov 26 20:02:44 gke-npd-default-pool-17da7bcb-kb52 systemd[1]: node-problem-detector.service: Unit entered failed state.
Nov 26 20:02:44 gke-npd-default-pool-17da7bcb-kb52 systemd[1]: node-problem-detector.service: Failed with result 'exit-code'.
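The check in the last step can be sketched as a small shell helper that pulls the first Go panic line out of journal output. This is only an illustration: `extract_panic` is a hypothetical name, not part of NPD, and the unit name is the one used in the steps above.

```shell
#!/bin/sh
# Hypothetical helper (not part of NPD): read journal output on stdin and
# print the first Go panic message found. Exits non-zero if there is none.
extract_panic() {
    # -m 1: stop after the first matching line
    # -o:   print only the part of the line that matches
    grep -m 1 -o 'panic: .*'
}

# Intended use on the node (unit name taken from the steps above):
#   sudo journalctl -u node-problem-detector.service --no-pager | extract_panic
```

A non-zero exit status means no panic line was found in the input, so the helper can be used directly in an `if` after swapping the binary.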
I manually tried commits in the history and found that the culprit PR is #187.
/cc @Random-Liu
/cc @andyxning
/cc @jiayingz