Conversation

@jmfee-usgs
Contributor

@jmfee-usgs jmfee-usgs commented Nov 29, 2016

What does this PR do?

Check the return value of socket.recv before buffering it, so that an empty response ends the read loop instead of being buffered forever.

Why was it initiated? Any relevant Issues?

I couldn't find any existing issues for this specific problem.

On a heavily loaded Scientific Linux 7 (RHEL flavor) system, we run many concurrent processes that use the earthworm client to access data. Intermittently (<1%, but every few hours), a process would spike CPU usage and leak memory until the kernel killed it because the system had run out of memory. Logs indicated this happened while fetching data with the obspy earthworm client. Using a copy of the earthworm client with these modifications has thus far eliminated the leak.

From https://docs.python.org/2/howto/sockets.html#using-a-socket

When a recv returns 0 bytes, it means the other side has closed (or is in the process of closing) the connection. You will not receive any more data on this connection. Ever.
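As a rough illustration of the check described above, here is a minimal sketch of a read loop that stops when recv returns an empty result. The function name `recv_exactly` and its parameters are hypothetical and are not taken from the earthworm client; the actual change in this PR may be structured differently.

```python
import socket


def recv_exactly(sock, num_bytes, bufsize=4096):
    """Hypothetical helper: read up to num_bytes, stopping if the peer closes."""
    chunks = []
    remaining = num_bytes
    while remaining > 0:
        chunk = sock.recv(min(remaining, bufsize))
        if not chunk:
            # recv() returned b'': the other side closed the connection.
            # Without this check the loop would keep appending empty chunks.
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

With this check, a connection that closes early yields whatever was received so far rather than a loop that never terminates.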

PR Checklist

  • This PR is not directly related to an existing issue (which has no PR yet).
  • All tests still pass.
  • Any new features or fixed regressions are covered via new tests.
    This is not reproducible under normal conditions.
  • Any new or changed features are fully documented.
  • Significant changes have been added to CHANGELOG.txt.
  • First time contributors have added their name to CONTRIBUTORS.txt.

@QuLogic
Member

QuLogic commented Nov 29, 2016

We need to decide what to do when the socket is closed but fewer than the expected number of bytes have been received.

@megies
Member

megies commented Nov 30, 2016

I don't have much input for this, just wanted to note that there are two more places in our code base with a socket.recv:

  • clients/neic/client.py line 204
  • clients/seedlink/client/seedlinkconnection.py line 1202

Also, this might even be considered a bug fix? (Which would qualify it for maintenance_1.0.x and thus 1.0.3?)

+TESTS:clients.earthworm,clients.seedlink (to make Travis run those network modules' tests as well)

@jmfee-usgs
Contributor Author

@megies: I checked those clients (neic, seedlinkconnection), but it appeared they are using nonblocking sockets, which should generate an exception instead of failing silently.

When I put in the PR, I checked but didn't find the maintenance branch. Let me know if you need me to update the PR.

@megies
Member

megies commented Nov 30, 2016

When I put in the PR, I checked but didn't find the maintenance branch. Let me know if you need me to update the PR.

OK. Also, there should be a tick box for you on the right of this page labeled "Allow edits from maintainers". If you tick it, we should be able to push to your branch if necessary.

@megies: I checked those clients (neic, seedlinkconnection), but it appeared they are using nonblocking sockets, which should generate an exception instead of failing silently.

Alright. Like I said, I'm no expert on sockets so I'll leave this to @QuLogic and @krischer and whoever feels qualified to review this.

@krischer
Member

krischer commented Dec 1, 2016

Looks good to me and I don't think the change can have any negative side effects! Thanks a lot!

I guess the memory leak happened because it kept on looping and adding empty things to the chunks list?
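A small sketch of that failure mode, assuming a local socket pair stands in for the earthworm server (the names here are hypothetical, and the loop is bounded so the example terminates):

```python
import socket

# Hypothetical stand-in for a server connection that goes away mid-request.
left, right = socket.socketpair()
left.close()

chunks = []
for _ in range(1000):  # bounded here; a loop without the check has no exit
    # On a closed connection recv() returns b'' immediately instead of
    # blocking, so every iteration costs CPU and appends another empty chunk.
    chunks.append(right.recv(4096))
right.close()

empty_reads = sum(1 for chunk in chunks if chunk == b"")
```

Every read comes back empty, so a loop that only exits once enough bytes have arrived would spin at full speed while the list keeps growing.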

Can you please make your changes to the maintenance_1.0.x branch and force push? Let us know if you need help with this.

We need to decide what to do when the socket is closed but fewer than the expected number of bytes have been received.

It will probably just raise some kind of parsing error later down the line which is fine IMHO.
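For comparison, one alternative would be to fail explicitly at read time instead of relying on a later parsing error. This is only a sketch of that option, not what this PR does; `ShortReadError` and `recv_or_raise` are hypothetical names.

```python
import socket


class ShortReadError(IOError):
    """Hypothetical error: the connection closed before all data arrived."""


def recv_or_raise(sock, num_bytes):
    """Hypothetical helper: read exactly num_bytes or raise ShortReadError."""
    chunks = []
    received = 0
    while received < num_bytes:
        chunk = sock.recv(num_bytes - received)
        if not chunk:
            # The peer closed the connection before sending everything.
            raise ShortReadError(
                "connection closed after %d of %d bytes"
                % (received, num_bytes))
        chunks.append(chunk)
        received += len(chunk)
    return b"".join(chunks)
```

The trade-off is a clearer error message at the cost of a new exception type that callers would need to handle.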

@megies megies added this to the 1.0.3 milestone Dec 1, 2016
@jmfee-usgs jmfee-usgs force-pushed the socket-recv-handling branch from 7a33c1e to 701bdb6 Compare December 1, 2016 16:12
@jmfee-usgs jmfee-usgs changed the base branch from master to maintenance_1.0.x December 1, 2016 16:12
@jmfee-usgs
Contributor Author

Rebasing ended up being a bad idea... Changes updated to start from maintenance_1.0.x.

@krischer
Member

krischer commented Dec 1, 2016

Yea rebasing does not work here. You could have cherry picked the commits on top of a branch off of maintenance. But whatever you did - looks good now :-)

@krischer
Member

krischer commented Dec 1, 2016

Can be merged depending on what CI says.

@krischer
Member

krischer commented Dec 1, 2016

@megies: Are the seedlink and earthworm tests supposed to be executed everywhere? I don't see them anywhere.

@megies
Member

megies commented Dec 2, 2016

@krischer, no, just Travis picks that up currently (see #1408 (comment)). My Windows-fu wasn't up to adding it to Appveyor too, and the docker scripts are pretty messy already, so I didn't want to add more mess on top.

Now, don't ask me why Travis didn't run this PR.. (maybe it got confused by the rebase)

@krischer
Member

krischer commented Dec 2, 2016

I see. I'm merging in any case - I'm fairly sure this works based on the changes + the socket module documentation. Additionally I tested it locally for Python 2 and 3.

@jmfee-usgs: Thanks for this!

@krischer krischer merged commit 6b2f238 into obspy:maintenance_1.0.x Dec 2, 2016
@megies
Member

megies commented Dec 2, 2016

I ran this PR through our docker images, looking good. 👍 http://tests.obspy.org/?git=701bdb68fd&node=docker-

Thanks @jmfee-usgs!
