
LostRemote exceptions while using gevent and zerorpc #62


Closed
grzn opened this issue Jun 4, 2013 · 15 comments

Comments

@grzn
Contributor

grzn commented Jun 4, 2013

We've been getting LostRemote exceptions when using zerorpc together with gevent.subprocess.

Consider the following test:

```python
from unittest import TestCase
from time import time
from gevent import spawn, sleep, event
from gevent.subprocess import Popen
from zerorpc import Server, Client


class Service(object):
    def get_nothing(self):
        pass


class BackgroundJobTestCase(TestCase):
    def setUp(self):
        self.stop_event = event.Event()
        self.background_job = spawn(self._start_job)

    def _start_job(self):
        while not self.stop_event.is_set():
            Popen("sleep 1", shell=True).wait()  # this does not work
            # sleep(1)  # this works

    def tearDown(self):
        self.stop_event.set()
        self.background_job.join()  # join() returns None, so don't reassign it

    def test_rpc_with_background_job_for_longer_periods(self, duration_in_seconds=600):
        server = Server(Service())
        server.bind("tcp://0.0.0.0:7001")
        server._acceptor_task = spawn(server._acceptor)  # do not worry about teardown - the test fails in the middle
        client = Client()
        client.connect("tcp://0.0.0.0:7001")
        start_time = time()
        while abs(time() - start_time) <= duration_in_seconds:
            client.get_nothing()
```

Running this on CentOS 6.4 fails with:

```
bin/nosetests tests/long_tests/zerorpc_and_subprocess.py                   13-06-04 19:50
Traceback (most recent call last):
  File "/root/mainline/eggs/gevent-1.0rc2-py2.7-linux-x86_64.egg/gevent/greenlet.py", line 328, in run
    result = self._run(*self.args, **self.kwargs)
  File "/root/mainline/eggs/zerorpc-0.4.1.2-py2.7.egg/zerorpc/events.py", line 63, in _sender
    self._socket.send(parts[-1])
  File "/root/mainline/eggs/zerorpc-0.4.1.2-py2.7.egg/zerorpc/gevent_zmq.py", line 104, in send
    self._on_state_changed()
  File "/root/mainline/eggs/zerorpc-0.4.1.2-py2.7.egg/zerorpc/gevent_zmq.py", line 72, in _on_state_changed
    events = self.getsockopt(_zmq.EVENTS)
  File "socket.pyx", line 388, in zmq.core.socket.Socket.get (zmq/core/socket.c:3713)
  File "checkrc.pxd", line 21, in zmq.core.checkrc._check_rc (zmq/core/socket.c:5859)
ZMQError: Interrupted system call
<Greenlet at 0x284c5f0: <bound method Sender._sender of <zerorpc.events.Sender object at 0x2864a50>>> failed with ZMQError

E
======================================================================
ERROR: test_rpc_with_background_job_for_longer_periods (zerorpc_and_subprocess.BackgroundJobTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/mainline/tests/long_tests/zerorpc_and_subprocess.py", line 37, in test_rpc_with_background_job_for_longer_periods
    client.get_nothing()
  File "/root/mainline/eggs/zerorpc-0.4.1.2-py2.7.egg/zerorpc/core.py", line 256, in <lambda>
    return lambda *args, **kargs: self(method, *args, **kargs)
  File "/root/mainline/eggs/zerorpc-0.4.1.2-py2.7.egg/zerorpc/core.py", line 241, in __call__
    return self._process_response(request_event, bufchan, timeout)
  File "/root/mainline/eggs/zerorpc-0.4.1.2-py2.7.egg/zerorpc/core.py", line 213, in _process_response
    reply_event = bufchan.recv(timeout)
  File "/root/mainline/eggs/zerorpc-0.4.1.2-py2.7.egg/zerorpc/channel.py", line 262, in recv
    event = self._input_queue.get(timeout=timeout)
  File "/root/mainline/eggs/gevent-1.0rc2-py2.7-linux-x86_64.egg/gevent/queue.py", line 200, in get
    result = waiter.get()
  File "/root/mainline/eggs/gevent-1.0rc2-py2.7-linux-x86_64.egg/gevent/hub.py", line 568, in get
    return self.hub.switch()
  File "/root/mainline/eggs/gevent-1.0rc2-py2.7-linux-x86_64.egg/gevent/hub.py", line 331, in switch
    return greenlet.switch(self)
LostRemote: Lost remote after 10s heartbeat

----------------------------------------------------------------------
Ran 1 test in 102.469s

FAILED (errors=1)
```

This is with gevent-1.0rc2 and zerorpc-0.4.1.

@bombela
Member

bombela commented Jun 10, 2013

Hello,

Instead of:
`server._acceptor_task = spawn(server._acceptor)`

Try:
`spawn(server.run)`

best,
fx

@bombela
Member

bombela commented Jun 10, 2013

OK, so if I run your test long enough, I get the interrupted system call exception, on either the sender or the receiver side of the events object. I'll push a fix soon.

@grzn
Contributor Author

grzn commented Jun 10, 2013

With 0.4.2 the test passes, but it prints a lot to stdout:

```
Traceback (most recent call last):
  File "/root/mainline/eggs/zerorpc-0.4.2-py2.7.egg/zerorpc/gevent_zmq.py", line 73, in _on_state_changed
    events = self.getsockopt(_zmq.EVENTS)
  File "socket.pyx", line 388, in zmq.core.socket.Socket.get (zmq/core/socket.c:3713)
  File "checkrc.pxd", line 21, in zmq.core.checkrc._check_rc (zmq/core/socket.c:5859)
ZMQError: Interrupted system call
<io at 0x3627e60 fd=13 events=READ active callback=<bound method Socket._on_state_changed of <zerorpc.gevent_zmq.Socket object at 0x3621e20>> args=()> failed with ZMQError

/!\ gevent_zeromq BUG /!\ catching up after missing event (RECV) /!\
/!\ gevent_zeromq BUG /!\ catching up after missing event (RECV) /!\
/!\ gevent_zeromq BUG /!\ catching up after missing event (RECV) /!\
/!\ gevent_zeromq BUG /!\ catching up after missing event (RECV) /!\
/!\ gevent_zeromq BUG /!\ catching up after missing event (RECV) /!\
/!\ gevent_zeromq BUG /!\ catching up after missing event (RECV) /!\
/!\ gevent_zeromq BUG /!\ catching up after missing event (RECV) /!\
/!\ gevent_zeromq BUG /!\ catching up after missing event (RECV) /!\
/!\ gevent_zeromq BUG /!\ catching up after missing event (RECV) /!\
/!\ gevent_zeromq BUG /!\ catching up after missing event (RECV) /!\
/!\ gevent_zeromq BUG /!\ catching up after missing event (RECV) /!\
```

These prints to stderr continue until the test finishes running.
Should I report this to pyzmq?

@grzn
Contributor Author

grzn commented Jun 10, 2013

Still fails:

```
/!\ gevent_zeromq BUG /!\ catching up after missing event (SEND) /!\
/!\ gevent_zeromq BUG /!\ catching up after missing event (SEND) /!\
zerorpc.ChannelMultiplexer, unable to route event: _zpc_hb {'response_to': '8b3e0ea0-b0bf-4e25-9563-8cfdb0445c35', 'zmqid': ['\x00?\x96\xa3\xce'], 'message_id': '8b3e0ea2-b0bf-4e25-9563-8cfdb0445c35', 'v': 3} [...]
zerorpc.ChannelMultiplexer, unable to route event: _zpc_hb {'response_to': '8b3e0ea0-b0bf-4e25-9563-8cfdb0445c35', 'zmqid': ['\x00?\x96\xa3\xce'], 'message_id': '8b3e0ea3-b0bf-4e25-9563-8cfdb0445c35', 'v': 3} [...]
E
======================================================================
ERROR: test_rpc_with_background_job_for_longer_periods (zerorpc_and_subprocess.BackgroundJobTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/mainline/tests/long_tests/zerorpc_and_subprocess.py", line 35, in test_rpc_with_background_job_for_longer_periods
    client.get_nothing()
  File "/root/mainline/eggs/zerorpc-0.4.2-py2.7.egg/zerorpc/core.py", line 256, in <lambda>
    return lambda *args, **kargs: self(method, *args, **kargs)
  File "/root/mainline/eggs/zerorpc-0.4.2-py2.7.egg/zerorpc/core.py", line 241, in __call__
    return self._process_response(request_event, bufchan, timeout)
  File "/root/mainline/eggs/zerorpc-0.4.2-py2.7.egg/zerorpc/core.py", line 213, in _process_response
    reply_event = bufchan.recv(timeout)
  File "/root/mainline/eggs/zerorpc-0.4.2-py2.7.egg/zerorpc/channel.py", line 262, in recv
    event = self._input_queue.get(timeout=timeout)
  File "/root/mainline/eggs/gevent-1.0rc2-py2.7-linux-x86_64.egg/gevent/queue.py", line 200, in get
    result = waiter.get()
  File "/root/mainline/eggs/gevent-1.0rc2-py2.7-linux-x86_64.egg/gevent/hub.py", line 568, in get
    return self.hub.switch()
  File "/root/mainline/eggs/gevent-1.0rc2-py2.7-linux-x86_64.egg/gevent/hub.py", line 331, in switch
    return greenlet.switch(self)
LostRemote: Lost remote after 10s heartbeat

----------------------------------------------------------------------
Ran 1 test in 141.623s

FAILED (errors=1)
```

@grzn
Contributor Author

grzn commented Jun 10, 2013

There's no reopen button; should I create a new issue?

@bombela bombela reopened this Jun 10, 2013
@bombela
Member

bombela commented Jun 10, 2013

I cannot reproduce your problem with:
gevent==1.0dev
pyzmq==13.1.0
zerorpc==0.4.2

It spawns a new `sleep 1` every second, and adding print statements shows that the zerorpc service is working fine.

I get this message about once every few minutes: /!\ gevent_zeromq BUG /!\ catching up after missing event (SEND) /!\
It means zerorpc used a workaround to keep handling messages whose events had been missed. Besides that, it's working perfectly well for me.
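My reading of that message is that zerorpc waited for a readiness notification, timed out, then found the socket ready anyway and carried on instead of blocking forever. Below is a minimal, hypothetical sketch of that catch-up pattern, not zerorpc's actual code: `threading.Event` stands in for gevent's event, the `is_ready` callable stands in for checking `getsockopt(zmq.EVENTS)` for `POLLIN`, and `wait_readable` is an illustrative name.

```python
import threading


def wait_readable(readable, is_ready, timeout=10.0, label="RECV"):
    """Wait for a readiness event, recovering if the wakeup was lost.

    readable: Event that an I/O callback is assumed to set.
    is_ready: callable that polls the socket state directly.
    Returns True if we recovered via the fallback check, False if the
    event itself woke us up.
    """
    while not readable.wait(timeout):
        # No wakeup arrived within the timeout. If the socket is ready
        # anyway, the notification was missed: log it and proceed.
        if is_ready():
            print("catching up after missing event (%s)" % label)
            return True
        # Genuinely not ready yet: keep waiting.
    return False


# Usage: simulate a lost wakeup. The event is never set, but the socket
# reports itself ready, so the fallback check lets the caller proceed.
lost = threading.Event()
print(wait_readable(lost, lambda: True, timeout=0.05))
```

The timeout only bounds how long a missed notification can stall the caller; it does not fix whatever dropped the event, which is why the thread treats the message as a symptom.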

@bombela
Member

bombela commented Jun 10, 2013

Ah, after 15 minutes I got one:

```
Traceback (most recent call last):
  File "/home/bombela/zerorpc/zerorpc-python/zerorpc/gevent_zmq.py", line 73, in _on_state_changed
    events = self.getsockopt(_zmq.EVENTS)
  File "socket.pyx", line 388, in zmq.core.socket.Socket.get (zmq/core/socket.c:3700)
  File "checkrc.pxd", line 21, in zmq.core.checkrc._check_rc (zmq/core/socket.c:5838)
ZMQError: Interrupted system call
<io at 0x1f24668 fd=17 events=READ active callback=<bound method Socket._on_state_changed of <zerorpc.gevent_zmq.Socket object at 0x1f1b1f0>> args=()> failed with ZMQError
```

@grzn
Contributor Author

grzn commented Jun 10, 2013

I'm using:
zerorpc-0.4.2
gevent-1.0rc2
pyzmq-13.1.0

Which OS are you running on?

bombela added a commit that referenced this issue Jun 10, 2013
for some reason, zmq throws EINTR on getsockopt calls now...
ref #62
@bombela
Member

bombela commented Jun 10, 2013

Linux. To be precise, I use gevent from git://github.com/surfly/gevent.git@454a77ca561868854760b2d9cbfa3bf3bbd2e062#egg=gevent-dev

I just pushed a change to master; try with that. Seriously, zmq is now throwing EINTR at every possible call...
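The commit message suggests the fix treats EINTR ("Interrupted system call") from calls such as `getsockopt` as retryable rather than fatal. One plausible cause, not confirmed in this thread, is that each finished `sleep 1` child delivers SIGCHLD, and a signal arriving during a blocking call makes it fail with EINTR. A minimal sketch of the general retry pattern follows; it is not zerorpc's actual patch, `retry_on_eintr` and `flaky_getsockopt` are hypothetical names, and it assumes only that the exception carries an `errno` attribute, as `OSError` and pyzmq's `ZMQError` do.

```python
import errno


def retry_on_eintr(func, *args, **kwargs):
    """Call func, retrying for as long as it fails with EINTR.

    Hypothetical helper: relies only on the raised exception exposing
    an ``errno`` attribute.
    """
    while True:
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Anything other than "Interrupted system call" is a real error.
            if getattr(exc, "errno", None) != errno.EINTR:
                raise
            # The call was interrupted by a signal before completing,
            # so simply issue it again.


# Usage: a fake call that is interrupted twice before succeeding.
attempts = []


def flaky_getsockopt():
    attempts.append(1)
    if len(attempts) < 3:
        raise OSError(errno.EINTR, "Interrupted system call")
    return 42


print(retry_on_eintr(flaky_getsockopt))  # prints 42
```

Retrying is only safe when the interrupted call had no partial side effects; a read-only query like `getsockopt` is the easy case, which may be why it was patched first.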

@grzn
Contributor Author

grzn commented Jun 10, 2013

That's gevent-1.0rc2.
I'll try the change you pushed now.


@grzn
Contributor Author

grzn commented Jun 10, 2013

Now, with your latest change, the test finally passes.

@grzn
Contributor Author

grzn commented Jun 10, 2013

I'll run it a few more times to be sure.
Would you release 0.4.3 if it passes?

@bombela
Member

bombela commented Jun 10, 2013

As soon as you confirm that it works for you.

@grzn
Contributor Author

grzn commented Jun 10, 2013

Seems to be working: it passed 3/3.
Please bump the version and release.
Thanks!

@grzn grzn closed this as completed Jun 10, 2013
@bombela
Member

bombela commented Jun 11, 2013

Done.
