[BUG] Nonce verification error with TCP transport on slower network connections #65114

@terryd-imh

Description

salt-call state.apply gives:

salt.exceptions.SaltClientError: Nonce verification error
[ERROR   ] An un-handled exception was caught by Salt's global exception handler:
SaltClientError: Nonce verification error
Traceback (most recent call last):
  File "/usr/bin/salt-call", line 11, in <module>
    sys.exit(salt_call())
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/scripts.py", line 444, in salt_call
    client.run()
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/cli/call.py", line 50, in run
    caller.run()
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/cli/caller.py", line 95, in run
    ret = self.call()
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/cli/caller.py", line 202, in call
    ret["return"] = self.minion.executors[fname](
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 149, in __call__
    return self.loader.run(run_func, *args, **kwargs)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1232, in run
    return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1247, in _run_as
    return _func_or_method(*args, **kwargs)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/executors/direct_call.py", line 10, in execute
    return func(*args, **kwargs)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 149, in __call__
    return self.loader.run(run_func, *args, **kwargs)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1232, in run
    return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1247, in _run_as
    return _func_or_method(*args, **kwargs)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/modules/state.py", line 834, in apply_
    return highstate(**kwargs)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/modules/state.py", line 1183, in highstate
    ret = st_.call_highstate(
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/state.py", line 4756, in call_highstate
    high, errors = self.render_highstate(matches)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/state.py", line 4616, in render_highstate
    state, errors = self.render_state(
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/state.py", line 4402, in render_state
    nstate, err = self.render_state(
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/state.py", line 4402, in render_state
    nstate, err = self.render_state(
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/state.py", line 4250, in render_state
    state_data = self.client.get_state(sls, saltenv)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/fileclient.py", line 400, in get_state
    dest = self.cache_file(path, saltenv, cachedir=cachedir)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/fileclient.py", line 186, in cache_file
    return self.get_url(
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/fileclient.py", line 517, in get_url
    result = self.get_file(url, dest, makedirs, saltenv, cachedir=cachedir)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/fileclient.py", line 1166, in get_file
    hash_server, stat_server = self.hash_and_stat_file(path, saltenv)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/fileclient.py", line 1403, in hash_and_stat_file
    hash_result = self.hash_file(path, saltenv)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/fileclient.py", line 1396, in hash_file
    return self.__hash_and_stat_file(path, saltenv)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/fileclient.py", line 1388, in __hash_and_stat_file
    return self.channel.send(load)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/asynchronous.py", line 125, in wrap
    raise exc_info[1].with_traceback(exc_info[2])
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/asynchronous.py", line 131, in _target
    result = io_loop.run_sync(lambda: getattr(self.obj, key)(*args, **kwargs))
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/ioloop.py", line 459, in run_sync
    return future_cell[0].result()
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/gen.py", line 1064, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 295, in send
    ret = yield self._crypted_transfer(load, timeout=timeout, raw=raw)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/gen.py", line 1056, in run
    value = future.result()
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/gen.py", line 1064, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 252, in _crypted_transfer
    ret = yield _do_transfer()
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/gen.py", line 1056, in run
    value = future.result()
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/ext/tornado/gen.py", line 1070, in run
    yielded = self.gen.send(value)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 242, in _do_transfer
    data = self.auth.crypticle.loads(data, raw, nonce=nonce)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/crypt.py", line 1533, in loads
    raise SaltClientError("Nonce verification error")
salt.exceptions.SaltClientError: Nonce verification error

This happens occasionally on lower-latency minions and almost always on high-latency minions that are geographically separated from the syndic.

We have over 700 states applied across over 30 formulas and over 400 minions per syndic, so there are a lot of connections, but the syndics' load is consistently low. The issue happens more frequently on minions with higher latency/jitter. If I add -l trace to see which state it was working on before hitting the exception, it's a different state each time. So while the number of states being applied might be a factor, I think network latency and jitter are the bigger one. (Or maybe it's the combination of that plus lots of connections?)

I edited crypt.Crypticle.loads() to collect more information, so at the top of that function, it looks like this:

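        if nonce:
            # The reply's first 32 bytes are the nonce echoed back by the server.
            ret_nonce = data[:32].decode()
            data = data[32:]
            payload = salt.payload.loads(data, raw=raw)
            # Debug: log every comparison instead of raising, so all replies can be inspected.
            print(f"match={ret_nonce == nonce} {nonce} {ret_nonce} {payload=}")
            if ret_nonce != nonce:
                pass
                #raise SaltClientError("Nonce verification error")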
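For context, the check being bypassed above amounts to a simple framing. The reply starts with the 32-character nonce the client sent, and the serialized payload follows it. The code below is a minimal stdlib-only sketch of that idea and is not Salt's code. It makes several substitutions. json stands in for salt.payload (msgpack), and encryption is left out entirely. NonceMismatch is a hypothetical stand-in for SaltClientError. Generating the nonce with uuid4().hex is an assumption that only matches the 32-hex-character values logged below.

```python
import json
import uuid


class NonceMismatch(Exception):
    """Hypothetical stand-in for salt.exceptions.SaltClientError."""


def make_nonce() -> str:
    # Assumption: a 32-character hex nonce, like the values in the debug output.
    return uuid.uuid4().hex


def frame_reply(nonce: str, payload) -> bytes:
    # Server side: prefix the serialized payload with the request's nonce.
    # json replaces salt.payload (msgpack) to keep the sketch self-contained.
    return nonce.encode() + json.dumps(payload).encode()


def unframe_reply(data: bytes, nonce: str):
    # Client side: split off the first 32 bytes and compare them to the nonce we sent.
    ret_nonce = data[:32].decode()
    payload = json.loads(data[32:])
    if ret_nonce != nonce:
        raise NonceMismatch(f"expected {nonce}, got {ret_nonce}")
    return payload


# Reading the reply to an earlier request while waiting on a newer one fails the check.
first, second = make_nonce(), make_nonce()
try:
    unframe_reply(frame_reply(first, ""), second)
except NonceMismatch as exc:
    print(exc)
```

In this model, the payload still decodes cleanly even when the check fails, which matches the observation that the payloads are not mangled.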

All payloads decode properly, so it isn't being mangled in transit, but these two lines hint at what's probably happening:

match=False 54afef3ef7a34d9995f75bee09824a12 0a599f19ee714a6a9adcd1a3a71159ea payload=''
match=False 9a92817586714c8a891ef566cf8110c4 54afef3ef7a34d9995f75bee09824a12 payload=''

Note that the nonce of one request matches the ret_nonce of a later request: here the second request received the reply meant for the first. Maybe something in ext.tornado is mixing up requests when replies are received out of order?
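Below is a toy model of one hypothesis that would produce this off-by-one pattern. Suppose a request times out on a slow link, but its reply still arrives later on the same connection. A client that assumes "the next reply belongs to the current request" then stays shifted by one for every request that follows. Everything here is hypothetical and is not Salt's transport code. That includes FakeConnection, FifoClient, KeyedClient and simulate, and it is not confirmed that Salt behaves this way. The second client illustrates the usual countermeasure of matching replies by ID and discarding stale ones.

```python
import collections


class FakeConnection:
    """Hypothetical model of one TCP connection: replies queue up in send order."""

    def __init__(self):
        self.wire = collections.deque()

    def send(self, nonce):
        # The server always answers, echoing the request's nonce in the reply.
        self.wire.append(nonce)


class FifoClient:
    """Assumes the next reply on the wire belongs to the current request."""

    def __init__(self, conn):
        self.conn = conn

    def request(self, nonce, times_out=False):
        self.conn.send(nonce)
        if times_out:
            # The caller gives up, but the late reply is still queued on the wire.
            return None
        return self.conn.wire.popleft()


class KeyedClient(FifoClient):
    """Matches replies by nonce and discards late replies to abandoned requests."""

    def request(self, nonce, times_out=False):
        self.conn.send(nonce)
        if times_out:
            return None
        while True:
            reply = self.conn.wire.popleft()
            if reply == nonce:
                return reply
            # Otherwise it is a stale reply to a request that already timed out: drop it.


def simulate(client_cls, count=4):
    client = client_cls(FakeConnection())
    matches = []
    for i in range(count):
        nonce = f"{i:032x}"  # deterministic 32-character stand-in for a real nonce
        reply = client.request(nonce, times_out=(i == 0))  # only the first request times out
        if reply is not None:
            matches.append(reply == nonce)
    return matches


print(simulate(FifoClient))   # [False, False, False]
print(simulate(KeyedClient))  # [True, True, True]
```

In this model, a single timeout is enough to make every later FIFO-matched request fail verification. That would fit errors that appear on seemingly random states under latency and jitter.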

Setup

Onedir install pinned to minor version 3006.2, plus these modules installed with pip:

arrow==1.2.3
PyMySQL==1.0.*
pyroute2==0.7.*
netifaces==0.10.*
netaddr==0.8.*
dnspython==2.0.*
pynetbox==6.5.*

Layout: 1 master of masters -> 2 syndics -> minions, each of which reports to only one syndic (its geographically closest).

Masters/Syndics are on CentOS 7. Minions are a mixed bag of CentOS 7 / CloudLinux 7 / AlmaLinux 8 / Ubuntu 20 / Ubuntu 22.

Masters/Syndics are on OpenVZ. Minions are a mixture of VM types and physical machines, but the issue happens on physical minions, so I don't think the virtualization type is related.

We use gitfs for pillar and states, and TCP for transport.

Steps to Reproduce the behavior

If I sign into a minion that doesn't have the issue, I can cause it to happen by simulating some network delay:

tc qdisc add dev eth0 root netem delay 100ms 20ms distribution normal
salt-call state.test

Expected behavior

Highstate applied without crashing with the nonce exception.

Versions Report


Versions on minions/master/syndics are identical. This is collected from an affected minion:

Salt Version:
          Salt: 3006.2

Python Version:
        Python: 3.10.12 (main, Aug  3 2023, 21:47:10) [GCC 11.2.0]

Dependency Versions:
          cffi: 1.14.6
      cherrypy: 18.6.1
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.1.2
       libgit2: 1.3.0
  looseversion: 1.0.2
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     packaging: 22.0
     pycparser: 2.21
      pycrypto: Not Installed
  pycryptodome: 3.9.8
        pygit2: 1.7.0
  python-gnupg: 0.4.8
        PyYAML: 6.0.1
         PyZMQ: 23.2.0
        relenv: 0.13.3
         smmap: Not Installed
       timelib: 0.2.4
       Tornado: 4.5.3
           ZMQ: 4.3.4

System Versions:
          dist: cloudlinux 7.9 Boris Yegorov
        locale: utf-8
       machine: x86_64
       release: 3.10.0-1160.el7.x86_64
        system: Linux
       version: CloudLinux 7.9 Boris Yegorov

Labels: Transport, VMware, bug (broken, incorrect, or confusing behavior)