Fixing a performance issue in SSE client #221


Merged
merged 8 commits into from Nov 30, 2018

Conversation

@hiranya911 (Contributor) commented Nov 14, 2018

Improves the performance of the streaming RTDB API by checking for the end of a field more efficiently. Specifically, instead of matching a regex on each iteration, we keep track of the last 4 characters seen and perform a simple string comparison.
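As a rough illustration of the last-4-characters idea (the function name, terminator string, and character-stream interface below are hypothetical, not the PR's actual code):

```python
# Hypothetical sketch: detect the end of an SSE field by tracking only
# the last 4 characters seen, instead of re-running a regex over the
# whole accumulated buffer on every iteration.

END_OF_FIELD = '\r\n\r\n'  # assumed 4-character terminator

def read_field(chars):
    """Consume characters until the end-of-field marker appears."""
    buf = []
    tail = ''
    for char in chars:
        buf.append(char)
        tail = (tail + char)[-4:]  # keep only the last 4 characters
        if tail == END_OF_FIELD:   # plain string comparison, no regex
            break
    return ''.join(buf)
```

The comparison against a fixed 4-character window is O(1) per character, whereas re-matching a regex against the whole buffer grows with the buffer's length.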

Here are some preliminary test results to get an idea of the improvement.

| Node Size | Time to process (Before) | Time to process (After) |
|-----------|--------------------------|-------------------------|
| 100K      | 40.83s                   | 1.73s                   |
| 1M        | -                        | 11.36s                  |

Fixes #198

@daniel-ziegler commented

Doesn't work for me at all on Mac OS X -- I simply don't receive any listen events. This definitely needs to be tested with a real HTTP stream.

Also note that while this should hopefully improve the constant factor, it's still O(N^2).

@hiranya911 (Contributor, Author) commented

There seems to be a requests bug here, which I've reported at psf/requests#4876

The proposed solution essentially reduces N by reading data in chunks. With the fix, I've tested payloads (in the unit tests) as large as 10MB, which can be processed in under a second.
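A minimal sketch of the chunked-reading idea on a generic file-like stream (this illustrates the principle only; the `iter_chunks` name is hypothetical and this is not the code from the linked fix):

```python
import io

def iter_chunks(stream, chunk_size=1024):
    """Yield fixed-size chunks from a file-like stream.

    Reading a chunk at a time amortizes the per-iteration overhead:
    the number of Python-level loop iterations drops from O(total
    bytes) to O(total bytes / chunk_size).
    """
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read signals end of stream
            break
        yield chunk
```

For example, a 10MB payload read in 1KB chunks takes roughly ten thousand loop iterations instead of ten million single-byte reads.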

@hiranya911 (Contributor, Author) commented Nov 17, 2018

Attached is another solution I've been tinkering with. It doesn't change how we read content from requests -- it just changes how we handle the incoming data:

  • Using a list to buffer incoming data, as opposed to string concatenation, which is inefficient at large sizes.
  • Getting rid of the regex check on each iteration. The alternative check I've implemented assumes \n\n as the only acceptable terminator, which seems to be correct for Firebase.
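The two points above can be sketched roughly as follows (assuming \n\n as the sole terminator, as stated; the `iter_events` name and the character-stream interface are hypothetical):

```python
# Hypothetical sketch combining both points: buffer characters in a
# list (cheap appends) and join once per event, checking for the
# assumed '\n\n' terminator with a plain comparison instead of a regex.

def iter_events(chars):
    """Yield complete events from an iterable of characters."""
    buf = []
    prev = ''
    for char in chars:
        buf.append(char)
        if prev == '\n' and char == '\n':  # end of event
            yield ''.join(buf)             # single O(n) join per event
            buf = []
            prev = ''
        else:
            prev = char
```

Because each character is appended once and joined once, the whole pass is linear in the input size, unlike repeated string concatenation, which copies the growing buffer on every append.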

With this I was able to read a 1MB node from a production database in under 10 seconds. Feedback welcome.

sse.txt

@daniel-ziegler commented

I like that solution better, since it is indeed linear time. Of course, for maximum performance we'd need to figure out how to read in bigger chunks rather than process each character individually in Python, but I'm happy with the improvement.

@hiranya911 (Contributor, Author) commented

@daniel-ziegler I've cleaned up the second solution and committed it. Can you give it a try? I've added my performance numbers to the PR description.

@daniel-ziegler commented

Seems to work; obviously I haven't tested it very thoroughly.

@hiranya911 hiranya911 assigned hiranya911 and unassigned bklimt Nov 30, 2018