NPD HEAD not working on GKE #227

@wangzhen127

How to reproduce:

  1. Create a GKE cluster with version v1.10.9-gke.5. (The NPD shipped with this GKE version is v0.5.0, which has log counter support. See "Update NPD config for GCI", kubernetes#65342.)
  2. Log on to a node and verify that NPD is working fine.
  3. Build NPD at HEAD to produce a tarball.
  4. Copy the newly built NPD tarball to the node.
  5. Untar NPD.
  6. Run the following commands:
sudo systemctl stop node-problem-detector.service
sudo cp -f bin/node-problem-detector /home/kubernetes/bin/node-problem-detector
sudo systemctl start node-problem-detector.service
  7. Verify that NPD is not working. The logs show:
$ sudo journalctl -u node-problem-detector.service
...
Nov 26 20:02:44 gke-npd-default-pool-17da7bcb-kb52 systemd[1]: Started Kubernetes node problem detector.
Nov 26 20:02:44 gke-npd-default-pool-17da7bcb-kb52 node-problem-detector[14438]: I1126 20:02:44.918435   14438 log_monitor.go:64] Finish parsing log monitor config file: {WatcherConfig:{Plugin:kmsg PluginConfig:map[] LogPath:/dev/kmsg Lookback:5m} BufferSize:10 Source:kernel-monitor DefaultConditions:[{Type:KernelDeadlock Stat
Nov 26 20:02:44 gke-npd-default-pool-17da7bcb-kb52 node-problem-detector[14438]: I1126 20:02:44.919043   14438 log_watchers.go:40] Use log watcher of plugin "kmsg"
Nov 26 20:02:44 gke-npd-default-pool-17da7bcb-kb52 node-problem-detector[14438]: I1126 20:02:44.919373   14438 log_monitor.go:64] Finish parsing log monitor config file: {WatcherConfig:{Plugin:journald PluginConfig:map[source:docker] LogPath:/var/log/journal Lookback:5m} BufferSize:10 Source:docker-monitor DefaultConditions:[]
Nov 26 20:02:44 gke-npd-default-pool-17da7bcb-kb52 node-problem-detector[14438]: I1126 20:02:44.919558   14438 log_watchers.go:40] Use log watcher of plugin "journald"
Nov 26 20:02:44 gke-npd-default-pool-17da7bcb-kb52 node-problem-detector[14438]: I1126 20:02:44.919812   14438 custom_plugin_monitor.go:67] Finish parsing custom plugin monitor config file: {Plugin:custom PluginGlobalConfig:{InvokeIntervalString:0xc00030b4c0 TimeoutString:0xc00030b4d0 InvokeInterval:5m0s Timeout:1m0s MaxOutput
Nov 26 20:02:44 gke-npd-default-pool-17da7bcb-kb52 node-problem-detector[14438]: panic: invalid configuration: no server found for cluster "local"
Nov 26 20:02:44 gke-npd-default-pool-17da7bcb-kb52 node-problem-detector[14438]: goroutine 1 [running]:
Nov 26 20:02:44 gke-npd-default-pool-17da7bcb-kb52 node-problem-detector[14438]: k8s.io/node-problem-detector/pkg/problemclient.NewClientOrDie(0xc0002f0f80, 0xc0002e69c0, 0xc0000405a0)
Nov 26 20:02:44 gke-npd-default-pool-17da7bcb-kb52 node-problem-detector[14438]:         /usr/local/google/home/zhenw/go/src/k8s.io/node-problem-detector/pkg/problemclient/problem_client.go:69 +0x3f7
Nov 26 20:02:44 gke-npd-default-pool-17da7bcb-kb52 node-problem-detector[14438]: main.main()
Nov 26 20:02:44 gke-npd-default-pool-17da7bcb-kb52 node-problem-detector[14438]:         /usr/local/google/home/zhenw/go/src/k8s.io/node-problem-detector/cmd/node_problem_detector.go:90 +0x302
Nov 26 20:02:44 gke-npd-default-pool-17da7bcb-kb52 systemd[1]: node-problem-detector.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Nov 26 20:02:44 gke-npd-default-pool-17da7bcb-kb52 systemd[1]: node-problem-detector.service: Unit entered failed state.
Nov 26 20:02:44 gke-npd-default-pool-17da7bcb-kb52 systemd[1]: node-problem-detector.service: Failed with result 'exit-code'.
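
To confirm the failure on another node without reading the whole journal, the panic line can be filtered out of the unit's log. This is a minimal sketch: the function name `npd_panic_lines` is hypothetical, and the pattern assumes the journal line format shown above (process name, PID in brackets, then `panic:`).

```shell
#!/bin/sh
# Hypothetical helper (not part of NPD): reads journal text on stdin and
# prints only the panic lines emitted by the node-problem-detector process.
# The exit status is grep's: 0 if at least one panic line was found, 1 otherwise.
npd_panic_lines() {
  grep -E 'node-problem-detector\[[0-9]+\]: panic:'
}

# Usage on the node (assumes sudo access to the journal):
#   sudo journalctl -u node-problem-detector.service --no-pager | npd_panic_lines
```

With the log above as input, this would print the single `panic: invalid configuration: no server found for cluster "local"` line and exit with status 0.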

I manually tried commits in the history and found that the culprit PR is #187.

/cc @Random-Liu
/cc @andyxning
/cc @jiayingz
