
Use static channels for node syncing. #1371


Merged
merged 2 commits into from May 22, 2014


@2fours 2fours commented Dec 13, 2013

Every time a connection is made, socket.io subscribes to 5 additional channels in redis to keep state in sync across multiple nodes (these are unsubscribed on disconnect). This causes problems with high numbers of concurrent connections, e.g. 10,000 concurrent connections = 50,000 redis subscriptions, which causes redis to use orders of magnitude more CPU.

This patch uses 5 static channels to convey the sync information instead of subscribing and unsubscribing with every connection. The pub/sub count in redis now stays at 13 whether 10 users are connected, 10,000, or 100,000.
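
To make the subscription arithmetic above concrete, here is a minimal sketch using a toy in-memory broker. It is not the RedisStore code from this patch: `FakeBroker`, the event names in `SYNC_EVENTS`, and both counting functions are hypothetical, chosen only so that there are five sync channels as described. It models only those five channels, so it does not reproduce the total of 13 mentioned above.

```javascript
// Toy in-memory stand-in for a pub/sub broker (hypothetical, not the redis client).
class FakeBroker {
  constructor() {
    this.subscriptions = new Set();
  }
  subscribe(channel) {
    this.subscriptions.add(channel);
  }
  get count() {
    return this.subscriptions.size;
  }
}

// Hypothetical event names, for illustration only.
const SYNC_EVENTS = ['message', 'heartbeat', 'disconnect', 'dispatch', 'close'];

// Old behaviour as described: 5 extra channels per connection, keyed by connection id.
function perConnectionSubscriptions(connections) {
  const broker = new FakeBroker();
  for (let id = 0; id < connections; id++) {
    for (const event of SYNC_EVENTS) {
      broker.subscribe(`${event}:${id}`);
    }
  }
  return broker.count;
}

// Patched behaviour as described: 5 static channels, subscribed once.
// In this sketch the connection id would travel in the message payload instead.
function staticSubscriptions(connections) {
  const broker = new FakeBroker();
  for (const event of SYNC_EVENTS) {
    broker.subscribe(event);
  }
  return broker.count; // independent of `connections`
}

console.log(perConnectionSubscriptions(10000)); // 50000
console.log(staticSubscriptions(10000)); // 5
```

In this model the per-connection scheme grows linearly with the number of connections, while the static scheme stays constant.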

@2fours
Author

2fours commented Dec 13, 2013

This might help with issues #1133 #1303 #1269 #1064 #862

@cendrizzi

Hi 2fours. Can this be considered a viable replacement for the current Redis store? Have you deployed this in a large-scale app or done any scale testing against it? I'm curious because your solution looks promising and I'd like a little more information about it.

@2fours
Author

2fours commented Dec 19, 2013

I load tested it to 2000 concurrent connections and didn't see issues. I've also been testing it in production for a week and haven't noticed any issues but I don't get that much load (< 100 concurrent) in production yet. I think it has promise but I would definitely do some testing of your own.

@cendrizzi

Sounds very promising. I'll test it on my end as well. Thanks for your work on this.

@2fours
Author

2fours commented Dec 19, 2013

You're welcome! I would be interested in hearing the outcome of your tests.

@machard

machard commented Dec 19, 2013

Could it solve #1097?

@toblerpwn

We are also looking to test this fork (and/or an earlier fork you merged, #1260 ) in production soon (after the holidays).

We have ~2,000 concurrent sockets for ~12 hours each day in prod (500-1,000 concurrent in off-hours), with each client emitting ~1 message to the server per 2-3 seconds (so pretty busy per socket - we do something like 15gb of data transmission in/out from/to our server per day in JUST socket messages/packets).

We have tested ~300 concurrent users in our load testing environment without issue (for technical reasons on our side, we can't currently test many more than that).

I'll let you know how it goes!

(Also looking forward to socket.io 1.0.0, which seems like it's getting closer! It looks like 1.0.0 gets rid of RedisStore entirely and implements its own thing? YAY!)

@toblerpwn

Update/tl;dr: this fork works for us in prod.

After running tests with other, older/more established forks, the RedisStore leak situation improved DRASTICALLY (see my comments in #1260 for more), but some leaks remained.

So, we tried this fork - first in staging, then in prod. The short version is that we have been live in prod for about 24 hours with a peak of ~3,500 concurrent socket connections (sustained for several hours) with NO LEAKS AT ALL.

For a more detailed explanation, I am attaching two images from our experiments in our staging environment.

First, simply, a display of ~36 hours of this fork running. You'll notice that CPU usage is totally flat (other than at the times of our own nightly scripts). From start to finish - no change.

[Image: surespot-channelfix-1]

Second, for those curious about the results of other branches/forks/pull requests, here is the same test run on 3 different forks of Socket.io 0.9.x on the same AWS instance. (Note that this is a staging environment, but it essentially matches what we've seen in prod.)

[Image: surespot-channelfix-2]

Results:

LearnBoost/socket.io#0.9 - we all know RedisStore leaks like crazy here. It seems to be related to connects/disconnects. In our load testing environment, we can see the leak maxes out our server in ~2 hours. We used this period of time as a benchmark by which to compare various fixes.

ifsnow/socket.io#improve_redis_store - good results, but not perfect; there is still a leak. In our load testing environment, we see that our CPUs get pegged (and actually crash in this case) after ~20 hours. So, roughly 10x better at handling connects/disconnects vs. the primary socket.io branch.

surespot/socket.io#channelfix - works perfectly with NO LEAKS the first ~24 hours in prod, and for an entire ~36 hours of constant load in our load testing environment (this would be ~500,000 unique sockets disconnecting and reconnecting in a constant stream). We feel pretty damn good about this pull request, but will report back if any issues arise.

THANK YOU SURESPOT AUTHORS. :)

@2fours
Author

2fours commented Jan 2, 2014

Thanks for reporting back on this, great news! You are welcome! Things are looking pretty stable over here too.

@filipedeschamps

Guys, how are you?

I'm new to GitHub and I'm wondering how you are testing this. Do you pull the master branch from GitHub and apply this patch? Or do you have to make a fork or something?

Best regards.


@cendrizzi

Toblerpwn. Excellent, detailed feedback. Thanks for the update!

filipedeschamps, it only affects three files. I personally just went to the files changed tab, downloaded those files, and replaced my copies with them.

@toblerpwn

@filipedeschamps - Assuming you're using NPM, here is how I manage it:

In your package.json, remove your existing socket.io line (probably something like "socket.io": "~0.9") and replace it with something like this:

"socket.io": "git+ssh://[email protected]:surespot/socket.io.git#channelfix"

You may also use other git+... protocols (e.g. git+https), which all vary slightly in syntax and purpose. More usage details are here:

https://npmjs.org/doc/json.html#Git-URLs-as-Dependencies

Git urls can be of the form:

git://github.com/user/project.git#commit-ish
git+ssh://user@hostname:project.git#commit-ish
git+ssh://user@hostname/project.git#commit-ish
git+http://user@hostname/project/blah.git#commit-ish
git+https://user@hostname/project/blah.git#commit-ish

The commit-ish can be any tag, sha, or branch which can be supplied
as an argument to git checkout. The default is master.
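
As a small illustration of the URL shape quoted above, the following sketch splits a dependency string into its repository URL and commit-ish, falling back to master when no `#commit-ish` is given. `splitGitDependency` is a hypothetical helper written for this example; it is not how npm itself parses these strings.

```javascript
// Hypothetical helper: split "url#commit-ish" into its two parts.
// Falls back to "master", the default stated in the npm docs quoted above.
function splitGitDependency(spec) {
  const hashIndex = spec.indexOf('#');
  if (hashIndex === -1) {
    return { url: spec, commitish: 'master' };
  }
  return {
    url: spec.slice(0, hashIndex),
    // An empty fragment ("...git#") is treated like a missing one in this sketch.
    commitish: spec.slice(hashIndex + 1) || 'master',
  };
}

console.log(splitGitDependency('git://github.com/user/project.git#channelfix'));
console.log(splitGitDependency('git://github.com/user/project.git'));
```

The first call yields the commit-ish `channelfix` (a branch name in this thread); the second falls back to `master`.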

Two notes worth calling out:

  • This should be self-evident, but note that by adopting this fork you are effectively OFF the main socket.io repo - i.e. your npm update commands will only look at surespot's repo, not the main socket.io repo.
  • And related to that, by pointing at surespot's repo directly you are running the risk that they will make breaking changes, etc, and (temporarily) screw your environment.

In other words, be VERY CAREFUL about running npm update when you're pointing to a non-core branch, and seek to re-join the core socket.io repo/branch as soon as possible. (In fact, in most cases, I would not even recommend using this approach, but socket.io is on a weird hiatus at the moment, which is the whole reason we're here and unfortunately a cause for crazy solutions.)

An alternative and perhaps safer solution would be to reproduce this code (as @cendrizzi suggested) in your own fork. I personally chose to point to surespot's branch because I am watching this issue like a hawk every day and living dangerously. 😄

@filipedeschamps

Guys, that's awesome!

I run a Brazilian stock market website that is heavily built on top of socket.io (http://www.insidernews.com.br/). It's basically a realtime chat/news site with coin exchange and XP points.

I'm not a skilled programmer, but socket.io makes things so easy that it enables people like me to build things that would otherwise be impossible (impossible within my range of experience :)

I can't help with socket.io coding currently, but I will try to help by testing new patches at least. I will deploy this patch in production to see the results.

Best regards,


@freeman983

@toblerpwn @2fours

We have two machines in production, with about 4,000 connections in all.
I ran surespot/socket.io#channelfix for 96 hours.
It works much better than the others, but it is not perfect.

Memory usage for a single worker process:

24 hours: 176 MB
36 hours: 265 MB
48 hours: 368 MB
72 hours: 432 MB
96 hours: 509 MB

I noticed that io.sockets.clients().length is still growing:

io.sockets.clients():7983
io.sockets.clients():8042
io.sockets.clients():7947

Link to download the heapdump file:
http://pan.baidu.com/share/link?shareid=2804261227&uk=2113963308#dir/path=%2Fnodeheap

sockets in SocketNamespace 81134
handshaken in Manager 4110
roomClients in Manager 4108
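
The memory samples reported above can be turned into growth rates with a few lines of arithmetic. This is only a sketch over the five figures quoted in the comment; the `samples` array and the helper names `growthRates` and `overallRate` are introduced here for illustration.

```javascript
// Memory samples for a single worker process, as reported in the comment above.
const samples = [
  { hours: 24, mb: 176 },
  { hours: 36, mb: 265 },
  { hours: 48, mb: 368 },
  { hours: 72, mb: 432 },
  { hours: 96, mb: 509 },
];

// Growth rate (MB per hour) between each pair of consecutive samples.
function growthRates(points) {
  const rates = [];
  for (let i = 1; i < points.length; i++) {
    const deltaHours = points[i].hours - points[i - 1].hours;
    const deltaMb = points[i].mb - points[i - 1].mb;
    rates.push({
      from: points[i - 1].hours,
      to: points[i].hours,
      mbPerHour: deltaMb / deltaHours,
    });
  }
  return rates;
}

// Average growth rate between the first and last sample.
function overallRate(points) {
  const first = points[0];
  const last = points[points.length - 1];
  return (last.mb - first.mb) / (last.hours - first.hours);
}

for (const rate of growthRates(samples)) {
  console.log(`${rate.from}h -> ${rate.to}h: ${rate.mbPerHour.toFixed(2)} MB/hour`);
}
console.log(`overall: ${overallRate(samples)} MB/hour`);
```

Across the whole run this works out to (509 - 176) / (96 - 24) = 4.625 MB per hour, with slower growth after the 48-hour mark in these samples.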

@cthomaschase

@2fours and @toblerpwn thanks for your incredible work and sleuthing here! I was about to scrap our hand-rolled implementation as our servers were getting crushed...without even the load you have!

[Image: screen-shot-2014-01-26-at-7 12 33-pm]

@2fours
Author

2fours commented Jan 28, 2014

@cthomaschase excellent news, thanks! @freeman983 I tested your memory leak fix; it doesn't change anything in our case, but we are only using the websocket transport, so maybe there are issues with the other transports.

@cendrizzi

Hi.

Thought I would follow up. I've been using this for some time and the results look very encouraging. This is certainly an important fix. At this point I think we really should get this pulled into the 0.9 series. I know work is now being done in earnest on 1.0, but the reality is that I'm stuck on 0.9 for the time being, as I have some C++ code that uses the 0.9 protocol. So anything that gives 0.9 legs is important for me.

Thanks again 2fours for your work.

rauchg added a commit that referenced this pull request May 22, 2014
Use static channels for node syncing.
@rauchg rauchg merged commit fea676b into socketio:0.9 May 22, 2014
@rauchg
Contributor

rauchg commented May 22, 2014

0.9.17 is out. This is a great performance improvement, thanks!
