Master cannot see minion after 2018.3.0 upgrade #47905

Closed
ghost opened this issue May 31, 2018 · 14 comments
Labels
  • Bug: broken, incorrect, or confusing behavior
  • fixed-pls-verify: fix is linked, bug author to confirm fix
  • P4: Priority 4
  • severity-high: 2nd top severity, seen by most users, causes major problems

@ghost

ghost commented May 31, 2018

Description of Issue/Question

On MacOS the latest minion 2018.3.0 is not recognised by the master.

We were using Salt 2017.7.2 on both master and minions, on both MacOS and Windows, then upgraded the master to 2018.3.0. All the machines that were already set up are still working fine. We upgraded the Windows machines to 2018.3.0, all good. Mac machines are still on 2017.7.2.
Now we are setting up new MacOS machines and found that the latest minion, 2018.3.0, doesn't connect to the master anymore, while the previous version, 2017.7.2, does work.

Setup

MacOS El capitan

Steps to Reproduce Issue

The way the machine is set up is simply:

  • install the minion with the package
  • run `salt-config -i "machine-name" -m "salt.company.domain.net"`

We have the auto-accept option on master so after that, the machine is normally added automatically to the key list and works.

Now with the latest version, we install the new package (on a clean machine) and run the same salt-config, but the machine is neither added to the master key list nor visible.
If we run salt-call test.ping from the machine, the machine is then accepted on the master and can successfully apply a state. However, calling any command from the master, even salt 'machine' test.ping, will fail.

The master can successfully ping the machine with a normal ping ip.address

It looks like the minion can access the master fine and complete transactions (with the master sending information too), but the master cannot reach the minion on its own.

Versions Report

salt 2018.3.0 (Oxygen)
root@saltstack:/srv/salt# salt --versions-report
Salt Version:
Salt: 2018.3.0

Dependency Versions:
cffi: Not Installed
cherrypy: Not Installed
dateutil: 2.4.2
docker-py: Not Installed
gitdb: 0.6.4
gitpython: 1.0.1
ioflo: Not Installed
Jinja2: 2.8
libgit2: Not Installed
libnacl: Not Installed
M2Crypto: Not Installed
Mako: 1.0.3
msgpack-pure: Not Installed
msgpack-python: 0.4.6
mysql-python: Not Installed
pycparser: Not Installed
pycrypto: 2.6.1
pycryptodome: Not Installed
pygit2: Not Installed
Python: 2.7.12 (default, Dec 4 2017, 14:50:18)
python-gnupg: Not Installed
PyYAML: 3.11
PyZMQ: 15.2.0
RAET: Not Installed
smmap: 0.9.0
timelib: Not Installed
Tornado: 4.2.1
ZMQ: 4.1.4

System Versions:
dist: Ubuntu 16.04 xenial
locale: UTF-8
machine: x86_64
release: 4.4.0-127-generic
system: Linux
version: Ubuntu 16.04 xenial

@Ch3LL
Contributor

Ch3LL commented May 31, 2018

I have not been able to replicate this in our package testing. When you say it's failing, what is occurring? Are you seeing an error?

Also, do you see anything in the debug log on the master or minion side when this occurs?

When the master sends the command, do you see network traffic occurring on the minion, possibly using tcpdump?

Can you paste salt-call --versions-report from your Mac minion?

@Ch3LL Ch3LL added the info-needed waiting for more info label May 31, 2018
@Ch3LL Ch3LL added this to the Blocked milestone May 31, 2018
@ghost
Author

ghost commented May 31, 2018

Sorry, when I say it fails, it returns:

Minion did not return. [Not connected]

@ghost
Author

ghost commented May 31, 2018

The versions report from the Mac minion is:

Salt Version:
Salt: 2018.3.0

Dependency Versions:
cffi: 1.11.2
cherrypy: 13.0.0
dateutil: 2.6.1
docker-py: Not Installed
gitdb: 2.0.3
gitpython: 2.1.7
ioflo: Not Installed
Jinja2: 2.10
libgit2: Not Installed
libnacl: Not Installed
M2Crypto: Not Installed
Mako: 1.0.7
msgpack-pure: Not Installed
msgpack-python: 0.4.8
mysql-python: Not Installed
pycparser: 2.18
pycrypto: 2.6.1
pycryptodome: Not Installed
pygit2: Not Installed
Python: 2.7.14 (default, Mar 28 2018, 14:48:50)
python-gnupg: 0.4.1
PyYAML: 3.12
PyZMQ: 17.0.0
RAET: Not Installed
smmap: 2.0.3
timelib: 0.2.4
Tornado: 4.5.2
ZMQ: 4.1.6

System Versions:
dist:
locale: US-ASCII
machine: x86_64
release: 15.0.0
system: Darwin
version: 10.11 x86_64

@ghost
Author

ghost commented May 31, 2018

The ping with debug output from the master outputs

root@saltstack:/srv/salt# salt -l debug '*03' test.ping
[DEBUG   ] Reading configuration from /etc/salt/master
[DEBUG   ] Using cached minion ID from /etc/salt/minion_id: saltstack
[DEBUG   ] Missing configuration file: /root/.saltrc
[DEBUG   ] Configuration file path: /etc/salt/master
[WARNING ] Insecure logging configuration detected! Sensitive data may be logged.
[DEBUG   ] Reading configuration from /etc/salt/master
[DEBUG   ] Using cached minion ID from /etc/salt/minion_id: saltstack
[DEBUG   ] Missing configuration file: /root/.saltrc
[DEBUG   ] MasterEvent PUB socket URI: /var/run/salt/master/master_event_pub.ipc
[DEBUG   ] MasterEvent PULL socket URI: /var/run/salt/master/master_event_pull.ipc
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for (u'/etc/salt/pki/master', u'saltstack_master', u'tcp://127.0.0.1:4506', u'clear')
[DEBUG   ] Connecting the Minion to the Master URI (for the return server): tcp://127.0.0.1:4506
[DEBUG   ] Trying to connect to: tcp://127.0.0.1:4506
[DEBUG   ] Initializing new IPCClient for path: /var/run/salt/master/master_event_pub.ipc
[DEBUG   ] LazyLoaded local_cache.get_load
[DEBUG   ] Reading minion list from /var/cache/salt/master/jobs/ad/b3ef615bfa50e7950ad7dad33f68f9e90a2beb769ba8fd956ef6f71ee910a7/.minions.p
[DEBUG   ] get_iter_returns for jid 20180601092155897488 sent to set(['automac03']) will timeout at 09:22:00.907969
[DEBUG   ] Checking whether jid 20180601092155897488 is still running
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for (u'/etc/salt/pki/master', u'saltstack_master', u'tcp://127.0.0.1:4506', u'clear')
[DEBUG   ] Connecting the Minion to the Master URI (for the return server): tcp://127.0.0.1:4506
[DEBUG   ] Trying to connect to: tcp://127.0.0.1:4506
[DEBUG   ] Passing on saltutil error. Key 'u'retcode' missing from client return. This may be an error in the client.
[DEBUG   ] return event: {'automac03': {u'failed': True}}
[DEBUG   ] LazyLoaded localfs.init_kwargs
[DEBUG   ] LazyLoaded localfs.init_kwargs
[DEBUG   ] LazyLoaded no_return.output

I am surprised to see 127.0.0.1 as the address, but that's the same output as with a machine which works.

@ghost
Author

ghost commented May 31, 2018

The machine also receives TCP packets from the master, every second or so

11:04:12.370595 IP 10.200.28.226.57515 > salt.domain.company.net.4506: Flags [.], ack 1, win 4117, options [nop,nop,TS val 535908139 ecr 1352131], length 0
11:04:12.370720 IP 10.200.28.226.57515 > salt.domain.company.net.4506: Flags [F.], seq 1, ack 1, win 4117, options [nop,nop,TS val 535908139 ecr 1352131], length 0
11:04:12.371870 ARP, Announcement 10.200.28.79 (Broadcast), length 46
11:04:12.372082 ARP, Announcement 10.200.28.79 (Broadcast), length 46
11:04:12.372364 ARP, Announcement 10.200.28.79 (Broadcast), length 46
11:04:12.375362 IP salt.domain.company.net.4506 > 10.200.28.226.57515: Flags [P.], seq 1:11, ack 1, win 227, options [nop,nop,TS val 1352132 ecr 535908139], length 10
11:04:12.375363 IP salt.domain.company.net.4506 > 10.200.28.226.57515: Flags [F.], seq 11, ack 2, win 227, options [nop,nop,TS val 1352133 ecr 535908139], length 0
11:04:12.375388 IP 10.200.28.226.57515 > salt.domain.company.net.4506: Flags [R], seq 938258983, win 0, length 0
11:04:12.375415 IP 10.200.28.226.57515 > salt.domain.company.net.4506: Flags [R], seq 938258984, win 0, length 0
11:04:12.418184 IP 10.200.28.226.57516 > salt.domain.company.net.4506: Flags [S], seq 1625006516, win 65535, options [mss 1460,nop,wscale 5,nop,nop,TS val 535908185 ecr 0,sackOK,eol], length 0
11:04:12.421322 IP salt.domain.company.net.4506 > 10.200.28.226.57516: Flags [S.], seq 4064199933, ack 1625006517, win 28960, options [mss 1460,sackOK,TS val 1352144 ecr 535908185,nop,wscale 7], length 0
11:04:12.421354 IP 10.200.28.226.57516 > salt.domain.company.net.4506: Flags [.], ack 1, win 4117, options [nop,nop,TS val 535908188 ecr 1352144], length 0
11:04:12.421477 IP 10.200.28.226.57516 > salt.domain.company.net.4506: Flags [F.], seq 1, ack 1, win 4117, options [nop,nop,TS val 535908188 ecr 1352144], length 0
11:04:12.424139 IP salt.domain.company.net.4506 > 10.200.28.226.57516: Flags [P.], seq 1:11, ack 1, win 227, options [nop,nop,TS val 1352145 ecr 535908188], length 10
11:04:12.424141 IP salt.domain.company.net.4506 > 10.200.28.226.57516: Flags [F.], seq 11, ack 2, win 227, options [nop,nop,TS val 1352145 ecr 535908188], length 0
11:04:12.424170 IP 10.200.28.226.57516 > salt.domain.company.net.4506: Flags [R], seq 1625006517, win 0, length 0
11:04:12.424211 IP 10.200.28.226.57516 > salt.domain.company.net.4506: Flags [R], seq 1625006518, win 0, length 0

@ghost
Author

ghost commented May 31, 2018

And this is the output of running a ping FROM the machine, which succeeds and adds the machine to the master key list, although it doesn't fix the problem from the master, i.e. the master still cannot ping the minion.

automac-28-226:~ builder$ sudo salt-call -l debug test.ping
[DEBUG   ] Reading configuration from /etc/salt/minion
[DEBUG   ] Including configuration from '/etc/salt/minion.d/master_id.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/master_id.conf
[DEBUG   ] Including configuration from '/etc/salt/minion.d/minion_id.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/minion_id.conf
[DEBUG   ] Configuration file path: /etc/salt/minion
[WARNING ] Insecure logging configuration detected! Sensitive data may be logged.
[DEBUG   ] Grains refresh requested. Refreshing grains.
[DEBUG   ] Reading configuration from /etc/salt/minion
[DEBUG   ] Including configuration from '/etc/salt/minion.d/master_id.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/master_id.conf
[DEBUG   ] Including configuration from '/etc/salt/minion.d/minion_id.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/minion_id.conf
[DEBUG   ] Please install 'virt-what' to improve results of the 'virtual' grain.
[DEBUG   ] Connecting to master. Attempt 1 of 1
[DEBUG   ] Master URI: tcp://10.200.31.145:4506
[DEBUG   ] Popen(['git', 'version'], cwd=/Users/builder, universal_newlines=False, shell=None)
[DEBUG   ] Popen(['git', 'version'], cwd=/Users/builder, universal_newlines=False, shell=None)
[DEBUG   ] Initializing new AsyncAuth for (u'/etc/salt/pki/minion', u'automac-28-226', u'tcp://10.200.31.145:4506')
[INFO    ] Generating keys: /etc/salt/pki/minion
[DEBUG   ] salt.crypt.get_rsa_key: Loading private key
[DEBUG   ] salt.crypt._get_key_with_evict: Loading private key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG   ] Generated random reconnect delay between '1000ms' and '11000ms' (9081)
[DEBUG   ] Setting zmq_reconnect_ivl to '9081ms'
[DEBUG   ] Setting zmq_reconnect_ivl_max to '11000ms'
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for (u'/etc/salt/pki/minion', u'automac-28-226', u'tcp://10.200.31.145:4506', 'clear')
[DEBUG   ] Connecting the Minion to the Master URI (for the return server): tcp://10.200.31.145:4506
[DEBUG   ] Trying to connect to: tcp://10.200.31.145:4506
[DEBUG   ] salt.crypt.get_rsa_pub_key: Loading public key
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] salt.crypt.get_rsa_key: Loading private key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG   ] salt.crypt.get_rsa_pub_key: Loading public key
[DEBUG   ] Connecting the Minion to the Master publish port, using the URI: tcp://10.200.31.145:4505
[DEBUG   ] salt.crypt.get_rsa_key: Loading private key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG   ] Determining pillar cache
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for (u'/etc/salt/pki/minion', u'automac-28-226', u'tcp://10.200.31.145:4506', u'aes')
[DEBUG   ] Initializing new AsyncAuth for (u'/etc/salt/pki/minion', u'automac-28-226', u'tcp://10.200.31.145:4506')
[DEBUG   ] Connecting the Minion to the Master URI (for the return server): tcp://10.200.31.145:4506
[DEBUG   ] Trying to connect to: tcp://10.200.31.145:4506
[DEBUG   ] salt.crypt.get_rsa_key: Loading private key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG   ] LazyLoaded jinja.render
[DEBUG   ] LazyLoaded yaml.render
[DEBUG   ] LazyLoaded test.ping
[DEBUG   ] test.ping received for minion 'automac-28-226'
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for (u'/etc/salt/pki/minion', u'automac-28-226', u'tcp://10.200.31.145:4506', u'aes')
[DEBUG   ] Initializing new AsyncAuth for (u'/etc/salt/pki/minion', u'automac-28-226', u'tcp://10.200.31.145:4506')
[DEBUG   ] Connecting the Minion to the Master URI (for the return server): tcp://10.200.31.145:4506
[DEBUG   ] Trying to connect to: tcp://10.200.31.145:4506
[DEBUG   ] LazyLoaded nested.output
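
The log above shows the minion reaching the master on port 4506 (return server) and port 4505 (publish port). As a rough way to confirm that direction of connectivity from a minion, here is a minimal Python 3 sketch. The helper names `port_is_reachable` and `check_master_ports` are hypothetical and not part of Salt, and the check only proves that the TCP ports accept connections, not that the salt-minion daemon is running.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_master_ports(master_host, ports=(4505, 4506)):
    """Map each port to whether it accepted a TCP connection.

    The default ports are the two seen in the debug log above
    (4505 publish, 4506 return); adjust them if the master is
    configured differently.
    """
    return {port: port_is_reachable(master_host, port) for port in ports}
```

For example, `check_master_ports("10.200.31.145")` run on the minion would report both ports. If both are reachable and the master still answers "Minion did not return", the next thing to look at is whether the minion daemon itself started cleanly.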

@ghost
Author

ghost commented Jun 1, 2018

OK, I finally got to the bottom of the problem after redirecting the daemon errors to a file: on those clean machines there is no Git installed.
So the daemon fails to launch with

[CRITICAL] Unexpected error while connecting to salt.domain.company.net
Traceback (most recent call last):
  File "/opt/salt/lib/python2.7/site-packages/salt/minion.py", line 991, in _connect_minion
    yield minion.connect_master(failed=failed)
  File "/opt/salt/lib/python2.7/site-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/opt/salt/lib/python2.7/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "/opt/salt/lib/python2.7/site-packages/tornado/gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/salt/lib/python2.7/site-packages/salt/minion.py", line 1181, in connect_master
    master, self.pub_channel = yield self.eval_master(self.opts, self.timeout, self.safe, failed)
  File "/opt/salt/lib/python2.7/site-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/opt/salt/lib/python2.7/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "/opt/salt/lib/python2.7/site-packages/tornado/gen.py", line 307, in wrapper
    yielded = next(result)
  File "/opt/salt/lib/python2.7/site-packages/salt/minion.py", line 691, in eval_master
    pub_channel = salt.transport.client.AsyncPubChannel.factory(self.opts, **factory_kwargs)
  File "/opt/salt/lib/python2.7/site-packages/salt/transport/client.py", line 161, in factory
    import salt.transport.zeromq
  File "/opt/salt/lib/python2.7/site-packages/salt/transport/zeromq.py", line 30, in <module>
    import salt.transport.mixins.auth
  File "/opt/salt/lib/python2.7/site-packages/salt/transport/mixins/auth.py", line 16, in <module>
    import salt.master
  File "/opt/salt/lib/python2.7/site-packages/salt/master.py", line 44, in <module>
    import salt.key
  File "/opt/salt/lib/python2.7/site-packages/salt/key.py", line 21, in <module>
    import salt.daemons.masterapi
  File "/opt/salt/lib/python2.7/site-packages/salt/daemons/masterapi.py", line 36, in <module>
    import salt.utils.gitfs
  File "/opt/salt/lib/python2.7/site-packages/salt/utils/gitfs.py", line 90, in <module>
    import git
  File "/opt/salt/lib/python2.7/site-packages/git/__init__.py", line 82, in <module>
    refresh()
  File "/opt/salt/lib/python2.7/site-packages/git/__init__.py", line 73, in refresh
    if not Git.refresh(path=path):
  File "/opt/salt/lib/python2.7/site-packages/git/cmd.py", line 230, in refresh
    cls().version()
  File "/opt/salt/lib/python2.7/site-packages/git/cmd.py", line 551, in <lambda>
    return lambda *args, **kwargs: self._call_process(name, *args, **kwargs)
  File "/opt/salt/lib/python2.7/site-packages/git/cmd.py", line 1010, in _call_process
    return self.execute(call, **exec_kwargs)
  File "/opt/salt/lib/python2.7/site-packages/git/cmd.py", line 821, in execute
    raise GitCommandError(command, status, stderr_value, stdout_value)
GitCommandError: Cmd('git') failed due to: exit code(1)
  cmdline: git version
  stderr: 'xcode-select: error: no developer tools were found at '/Applications/Xcode.app', and no install could be requested (perhaps no UI is present), please install manually from 'developer.apple.com'.'

So it's not related to the version as such, although the previous version doesn't seem to have this problem.
Now the question is: why does Salt need Git? Does it actually need it? It looks like it's just checking the Git version.
It would be good to remove this dependency if possible.
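
The traceback shows `git version` exiting with code 1 even though a `git` executable exists on the PATH. A minimal Python 3 sketch for checking this on a machine is below; the helper names `command_succeeds` and `git_is_usable` are hypothetical and not taken from Salt or GitPython. It treats a missing executable and a non-zero exit status the same way: git is not usable.

```python
import subprocess
import sys


def command_succeeds(argv, timeout=10):
    """Return True only if the command starts and exits with status 0."""
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers "executable not found" and permission errors.
        return False
    return proc.returncode == 0


def git_is_usable(executable="git"):
    """Check that `git version` really works, not just that `git` exists."""
    return command_succeeds([executable, "version"])


# Stand-ins that run anywhere: a command that works, and one that exits
# with status 1 the way the stub git in the traceback above does.
PYTHON_OK = command_succeeds([sys.executable, "--version"])
STUB_LIKE_OK = command_succeeds([sys.executable, "-c", "raise SystemExit(1)"])
```

On a clean machine like the ones described here, `git_is_usable()` would be expected to return False, which is a quick way to tell whether a minion is exposed to this failure before starting the daemon.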

@ghost
Author

ghost commented Jun 1, 2018

Ok well looking at the code it looks like it just tries to import git and capture an exception but not this one, so it should just be a matter of capturing the GitCommandError too.
https://github.com/saltstack/salt/blob/develop/salt/utils/gitfs.py#L93

@terminalmage
Contributor

terminalmage commented Jun 1, 2018

@maxime-viargues-serato Yeah, the problem with that is that we can't capture that error directly because it's part of an exception class in GitPython itself. @Ch3LL is going to change the exception catching to a general except Exception to get around this.

This is actually being caused by an upstream bug in GitPython. I reported and fixed this last year in gitpython-developers/GitPython#658, but this fix only worked when the git executable could not be found. The reason for the error you are seeing is because MacOS/Xcode/whatever appears to have added a git executable that's not actually git, but instead displays the "no developer tools were found" error. So, GitPython gets past the point where it imports all of its internal components, but fails when it initializes itself (during which it tries to run git version, which doesn't work of course).

I've reported this upstream and opened gitpython-developers/GitPython#763 to fix it. In the meantime, like I said, @Ch3LL is going to work around this by changing the exception catching.

Thanks for reporting!
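
To illustrate the kind of workaround described above, here is a minimal Python 3 sketch of an optional import guarded by a general `except Exception`. It is not the actual Salt patch: `optional_import`, `fake_gitlib` and `FakeCommandError` are hypothetical names, and the throwaway module only simulates a library that raises its own exception type while initializing at import time.

```python
import importlib
import os
import sys
import tempfile


def optional_import(name):
    """Return the imported module, or None if importing fails for any reason."""
    try:
        return importlib.import_module(name)
    except Exception:
        # Deliberately broad: import-time initialization can raise
        # library-specific errors, not only ImportError.
        return None


# Hypothetical stand-in for a library that fails during import with its
# own exception class, as the traceback in this thread shows.
_tmpdir = tempfile.mkdtemp()
with open(os.path.join(_tmpdir, "fake_gitlib.py"), "w") as fh:
    fh.write(
        "class FakeCommandError(Exception):\n"
        "    pass\n"
        "\n"
        "raise FakeCommandError('git version failed')\n"
    )
sys.path.insert(0, _tmpdir)
importlib.invalidate_caches()

HAS_FAKE_GITLIB = optional_import("fake_gitlib") is not None
HAS_JSON = optional_import("json") is not None
```

With only `except ImportError`, the `FakeCommandError` would propagate and abort whatever module performed the import, which mirrors how the minion daemon failed to start here. The broad catch trades precision for robustness, so it is best kept to optional dependencies.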

@Ch3LL Ch3LL added Bug broken, incorrect, or confusing behavior severity-high 2nd top severity, seen by most users, causes major problems P4 Priority 4 team-core and removed info-needed waiting for more info labels Jun 1, 2018
@Ch3LL Ch3LL modified the milestones: Blocked, Approved Jun 1, 2018
@prog-dale

I am seeing the same thing with windows minions that I just upgraded to 2018.3

@terminalmage terminalmage added the fixed-pls-verify fix is linked, bug author to confirm fix label Jun 3, 2018
@ghost
Author

ghost commented Jun 5, 2018

Thanks @terminalmage, I'll wait for the fix and use the 2017 version in the meantime.

@rlwchang

rlwchang commented Oct 1, 2018

@prog-dale I just had a similar issue. Have you tried restarting the salt-minion? For me, it worked after doing that.

Digging through the event logs, I got this:

Service salt-minion received SHUTDOWN control, which will be handled.

for some reason, and after that, the minion has been technically on, able to run salt-call, but it would no longer receive commands from the master. Any ideas why it might have received that command?

@terminalmage
Contributor

2018.3.3 contains @Ch3LL's workaround for this upstream bug, and GitPython 2.1.11 contains my fix for the upstream bug. Given both of these facts, I am going to close this.

@ducnt102

ducnt102 commented Dec 7, 2018

Hi,
I just had a similar issue.
I think it is because the master public key cached on the minion is different from the one the Salt master uses now.

ll /etc/salt/pki/minion/minion_master.pub
-rw-r--r-- 1 root root 450 Aug 18 2017 /etc/salt/pki/minion/minion_master.pub

salt-minion --version

salt-minion 2018.3.3 (Oxygen)

So you need to remove salt-minion, delete the cached master.pub and reinstall.
On the minion:
/etc/init.d/salt-minion stop
apt-get remove salt*
rm -rf /etc/salt/
apt-get install salt-minion
/etc/init.d/salt-minion start
On the master:
salt-key -d <minion-id>
Then accept the minion key again.
