Description
salt-call state.apply gives:
salt.exceptions.SaltClientError: Nonce verification error
[ERROR ] An un-handled exception was caught by Salt's global exception handler:
SaltClientError: Nonce verification error
Traceback (most recent call last):
File "/usr/bin/salt-call", line 11, in <module>
sys.exit(salt_call())
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/scripts.py", line 444, in salt_call
client.run()
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/cli/call.py", line 50, in run
caller.run()
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/cli/caller.py", line 95, in run
ret = self.call()
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/cli/caller.py", line 202, in call
ret["return"] = self.minion.executors[fname](
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 149, in __call__
return self.loader.run(run_func, *args, **kwargs)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1232, in run
return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1247, in _run_as
return _func_or_method(*args, **kwargs)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/executors/direct_call.py", line 10, in execute
return func(*args, **kwargs)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 149, in __call__
return self.loader.run(run_func, *args, **kwargs)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1232, in run
return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1247, in _run_as
return _func_or_method(*args, **kwargs)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/modules/state.py", line 834, in apply_
return highstate(**kwargs)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/modules/state.py", line 1183, in highstate
ret = st_.call_highstate(
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/state.py", line 4756, in call_highstate
high, errors = self.render_highstate(matches)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/state.py", line 4616, in render_highstate
state, errors = self.render_state(
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/state.py", line 4402, in render_state
nstate, err = self.render_state(
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/state.py", line 4402, in render_state
nstate, err = self.render_state(
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/state.py", line 4250, in render_state
state_data = self.client.get_state(sls, saltenv)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/fileclient.py", line 400, in get_state
dest = self.cache_file(path, saltenv, cachedir=cachedir)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/fileclient.py", line 186, in cache_file
return self.get_url(
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/fileclient.py", line 517, in get_url
result = self.get_file(url, dest, makedirs, saltenv, cachedir=cachedir)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/fileclient.py", line 1166, in get_file
hash_server, stat_server = self.hash_and_stat_file(path, saltenv)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/fileclient.py", line 1403, in hash_and_stat_file
hash_result = self.hash_file(path, saltenv)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/fileclient.py", line 1396, in hash_file
return self.__hash_and_stat_file(path, saltenv)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/fileclient.py", line 1388, in __hash_and_stat_file
return self.channel.send(load)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/asynchronous.py", line 125, in wrap
raise exc_info[1].with_traceback(exc_info[2])
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/asynchronous.py", line 131, in _target
result = io_loop.run_sync(lambda: getattr(self.obj, key)(*args, **kwargs))
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/ioloop.py", line 459, in run_sync
return future_cell[0].result()
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/gen.py", line 1064, in run
yielded = self.gen.throw(*exc_info)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 295, in send
ret = yield self._crypted_transfer(load, timeout=timeout, raw=raw)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/gen.py", line 1056, in run
value = future.result()
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/gen.py", line 1064, in run
yielded = self.gen.throw(*exc_info)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 252, in _crypted_transfer
ret = yield _do_transfer()
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/gen.py", line 1056, in run
value = future.result()
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/gen.py", line 1070, in run
yielded = self.gen.send(value)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 242, in _do_transfer
data = self.auth.crypticle.loads(data, raw, nonce=nonce)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/crypt.py", line 1533, in loads
raise SaltClientError("Nonce verification error")
salt.exceptions.SaltClientError: Nonce verification error
This happens occasionally on lower-latency minions and almost always on high-latency minions that are geographically separated from the syndic.
We apply over 700 states from over 30 formulas, with over 400 minions per syndic, so there are a lot of connections, but the syndics' load is consistently low. The issue happens more frequently on minions with higher latency/jitter. If I add -l trace to see which state was being processed before the exception, it is a different state each time, so while the number of states being applied might contribute, I think network latency and jitter are the bigger factor. (Or maybe it is the combination of that and lots of connections?)
I edited crypt.Crypticle.loads() to collect more information, so at the top of that function, it looks like this:
if nonce:
    ret_nonce = data[:32].decode()
    data = data[32:]
    payload = salt.payload.loads(data, raw=raw)
    print(f"match={ret_nonce == nonce} {nonce} {ret_nonce} {payload=}")
    if ret_nonce != nonce:
        pass
        # raise SaltClientError("Nonce verification error")
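For context, the check being bypassed above boils down to roughly the following. This is a minimal standalone sketch, not Salt's code: it skips decryption and serialization entirely, `frame`, `unframe` and `NonceMismatch` are hypothetical names, and generating the nonce with `uuid.uuid4().hex` is an assumption chosen only because it yields 32 hex characters like the values printed by the debug line.

```python
import uuid

NONCE_LEN = 32  # same width as the data[:32] slice in the patched function


class NonceMismatch(Exception):
    """Stand-in for SaltClientError in this sketch."""


def frame(nonce: str, payload: bytes) -> bytes:
    # Hypothetical server side: echo the request nonce in front of the reply.
    return nonce.encode() + payload


def unframe(data: bytes, expected_nonce: str) -> bytes:
    # Client side: split off the echoed nonce and compare it with the one sent.
    ret_nonce = data[:NONCE_LEN].decode()
    if ret_nonce != expected_nonce:
        raise NonceMismatch(f"expected {expected_nonce}, got {ret_nonce}")
    return data[NONCE_LEN:]


# Two independent requests, each with its own nonce (assumed uuid4 hex).
req1 = uuid.uuid4().hex
req2 = uuid.uuid4().hex
reply1 = frame(req1, b"first")

# The reply paired with its own request passes the check.
assert unframe(reply1, req1) == b"first"

# The same reply handed to a different request is rejected.
try:
    unframe(reply1, req2)
except NonceMismatch as exc:
    print("rejected:", exc)
```

The point of the sketch is only that a well-formed reply delivered to the wrong request fails this comparison even though its payload decodes fine.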
All payloads decode properly, so it isn't being mangled in transit, but these two lines hint at what's probably happening:
match=False 54afef3ef7a34d9995f75bee09824a12 0a599f19ee714a6a9adcd1a3a71159ea payload=''
match=False 9a92817586714c8a891ef566cf8110c4 54afef3ef7a34d9995f75bee09824a12 payload=''
Note that the nonce of the first request (54afef3e...) shows up as the ret_nonce of the later request. Maybe something in ext.tornado is mixing up requests when replies are received out of order?
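To check for this pattern across a longer run without eyeballing it, something like the following could be used. It is a small sketch that only parses the debug lines produced by the print() added above; `parse` and `find_late_replies` are hypothetical helper names, and the regular expression assumes the exact `match=... <nonce> <ret_nonce>` layout shown in the two lines quoted earlier.

```python
import re

# Matches the debug output: match=<bool> <sent nonce> <returned nonce> ...
LINE_RE = re.compile(r"match=(True|False) ([0-9a-f]{32}) ([0-9a-f]{32})")


def parse(lines):
    """Return (sent_nonce, returned_nonce) pairs in the order they were logged."""
    pairs = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            pairs.append((m.group(2), m.group(3)))
    return pairs


def find_late_replies(pairs):
    """Return (i, j) index pairs where request j received the nonce sent by earlier request i."""
    sent_at = {}
    hits = []
    for j, (sent, returned) in enumerate(pairs):
        if returned != sent and returned in sent_at:
            hits.append((sent_at[returned], j))
        sent_at[sent] = j
    return hits


# The two lines quoted in this report.
lines = [
    "match=False 54afef3ef7a34d9995f75bee09824a12 0a599f19ee714a6a9adcd1a3a71159ea payload=''",
    "match=False 9a92817586714c8a891ef566cf8110c4 54afef3ef7a34d9995f75bee09824a12 payload=''",
]

late = find_late_replies(parse(lines))
print(late)  # [(0, 1)]
```

For the two quoted lines it reports that the second request received the nonce sent by the first, which is consistent with a reply being handed to the wrong request, though it does not show where the mix-up happens.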
Setup
Onedir setup pinned to the minor version 3006.2 + these modules pip installed:
arrow==1.2.3
PyMySQL==1.0.*
pyroute2==0.7.*
netifaces==0.10.*
netaddr==0.8.*
dnspython==2.0.*
pynetbox==6.5.*
Layout: 1 master of masters -> 2 syndics -> minions, each of which reports to only one syndic (its geographically closest).
Masters/Syndics are on CentOS 7. Minions are a mixed bag of CentOS7 / CloudLinux 7 / AlmaLinux8 / Ubuntu20 / Ubuntu22.
Masters/Syndics are on OpenVZ. Minions are a mixture of VM types and physical machines, but the issue happens on physical minions, so I don't think that's related.
We use gitfs for pillar and states, and TCP for transport.
Steps to Reproduce the behavior
If I sign into a minion that doesn't have the issue, I can cause it to happen by simulating some network delay:
tc qdisc add dev eth0 root netem delay 100ms 20ms distribution normal
salt-call state.test
Expected behavior
Highstate applied without crashing with the nonce exception.
Versions Report
salt --versions-report
Versions on minions/master/syndics are identical. This is collected from an affected minion:
Salt Version:
Salt: 3006.2
Python Version:
Python: 3.10.12 (main, Aug 3 2023, 21:47:10) [GCC 11.2.0]
Dependency Versions:
cffi: 1.14.6
cherrypy: 18.6.1
dateutil: 2.8.1
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 3.1.2
libgit2: 1.3.0
looseversion: 1.0.2
M2Crypto: Not Installed
Mako: Not Installed
msgpack: 1.0.2
msgpack-pure: Not Installed
mysql-python: Not Installed
packaging: 22.0
pycparser: 2.21
pycrypto: Not Installed
pycryptodome: 3.9.8
pygit2: 1.7.0
python-gnupg: 0.4.8
PyYAML: 6.0.1
PyZMQ: 23.2.0
relenv: 0.13.3
smmap: Not Installed
timelib: 0.2.4
Tornado: 4.5.3
ZMQ: 4.3.4
System Versions:
dist: cloudlinux 7.9 Boris Yegorov
locale: utf-8
machine: x86_64
release: 3.10.0-1160.el7.x86_64
system: Linux
version: CloudLinux 7.9 Boris Yegorov