Do not block writing to wake socket #1767

Closed
wants to merge 2 commits into from

dpkp
Owner

@dpkp dpkp commented Mar 29, 2019

Fixes #1760. We do not need to block on writes to the wakeup socket pair. This should avoid the root cause of #1760, which is that the socket buffer fills up and blocks the thread that is responsible for reading / draining it.
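
A minimal sketch of the idea, not the code from this PR. It assumes a wake pair created with `socket.socketpair()`; the helper names `make_wake_pair`, `wakeup`, and `drain` are hypothetical. With both ends non-blocking, a write to a full buffer raises `BlockingIOError` instead of stalling the caller, and that wakeup can be dropped because the unread bytes already guarantee the reader will wake.

```python
import socket


def make_wake_pair():
    """Create a connected socket pair with both ends set non-blocking."""
    reader, writer = socket.socketpair()
    reader.setblocking(False)
    writer.setblocking(False)
    return reader, writer


def wakeup(writer):
    """Try to signal the reader; never block.

    Returns True if a byte was written, False if the buffer was full.
    A full buffer means earlier wakeups are still unread, so the
    reader will wake regardless and this one can be skipped.
    """
    try:
        writer.send(b"x")
        return True
    except (BlockingIOError, InterruptedError):
        return False


def drain(reader):
    """Read and discard all pending wakeup bytes; return how many were read."""
    total = 0
    while True:
        try:
            data = reader.recv(1024)
        except BlockingIOError:
            break  # nothing left to read
        if not data:
            break  # peer closed the socket
        total += len(data)
    return total
```

The trade-off is that a dropped wakeup is only safe when the reader treats any readable byte as "wake up", not when each byte carries its own meaning.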


@braedon
Contributor

braedon commented Mar 29, 2019

I've run a number of medium-term (~15min) tests, and haven't had any KafkaTimeoutErrors, which is promising.

However, I am only getting ~25% of the performance compared to 1.4.5 (with #1765). (That's an extra reduction on top of the performance drop I see going from 1.4.4 -> 1.4.5)

There seem to be frequent pauses, which are suspiciously always (that I've seen) just under 30s. For example:

[2019-03-29 07:49:23,589] root.INFO MainThread Starting waiting for produce futures
[2019-03-29 07:49:52,613] root.INFO MainThread Finished waiting for produce futures  <-- ~30s wait
[2019-03-29 07:49:52,613] root.INFO MainThread Confirmed production of 1 messages
[2019-03-29 07:49:52,616] root.INFO MainThread Confirmed production of 60 messages
[2019-03-29 07:49:52,616] root.INFO MainThread Starting offset commit
[2019-03-29 07:49:52,619] root.INFO MainThread Finished offset commit
[2019-03-29 07:49:52,697] root.INFO MainThread Confirmed production of 565 messages
...
[2019-03-29 07:49:57,618] root.INFO MainThread Confirmed production of 689 messages
[2019-03-29 07:49:57,618] root.INFO MainThread Starting offset commit
[2019-03-29 07:49:57,639] root.INFO MainThread Finished offset commit
[2019-03-29 07:49:58,143] root.INFO MainThread Confirmed production of 800 messages
...
[2019-03-29 07:50:01,084] root.INFO MainThread Confirmed production of 276 messages
[2019-03-29 07:50:01,298] root.INFO MainThread Starting waiting for produce futures
[2019-03-29 07:50:30,262] kafka.conn.INFO kafka-python-producer-1-network-thread <BrokerConnection node_id=3 host=kafka-3:9092 <connecting> [IPv4 ('0.0.0.3', 9092)]>: connecting to kafka-3:9092 [('0.0.0.3', 9092) IPv4]
[2019-03-29 07:50:30,263] kafka.conn.INFO kafka-python-producer-1-network-thread <BrokerConnection node_id=3 host=kafka-3:9092 <connecting> [IPv4 ('0.0.0.3', 9092)]>: Connection complete.
[2019-03-29 07:50:30,282] root.INFO MainThread Finished waiting for produce futures  <-- ~30s wait
[2019-03-29 07:50:30,282] root.INFO MainThread Confirmed production of 1 messages
[2019-03-29 07:50:30,284] root.INFO MainThread Confirmed production of 118 messages
[2019-03-29 07:50:30,285] root.INFO MainThread Starting offset commit
[2019-03-29 07:50:30,289] root.INFO MainThread Finished offset commit
[2019-03-29 07:50:30,396] root.INFO MainThread Confirmed production of 882 messages
...
[2019-03-29 07:50:35,287] root.INFO MainThread Confirmed production of 434 messages
[2019-03-29 07:50:35,288] root.INFO MainThread Starting offset commit
[2019-03-29 07:50:35,297] root.INFO MainThread Finished offset commit
[2019-03-29 07:50:35,970] root.INFO MainThread Confirmed production of 839 messages
...
[2019-03-29 07:50:39,707] root.INFO MainThread Confirmed production of 465 messages
[2019-03-29 07:50:39,908] root.INFO MainThread Starting waiting for produce futures
[2019-03-29 07:50:40,307] root.INFO MainThread Finished waiting for produce futures
[2019-03-29 07:50:40,307] root.INFO MainThread Confirmed production of 1 messages
[2019-03-29 07:50:40,309] root.INFO MainThread Confirmed production of 85 messages
[2019-03-29 07:50:40,309] root.INFO MainThread Starting offset commit
[2019-03-29 07:50:40,313] root.INFO MainThread Finished offset commit
[2019-03-29 07:50:40,403] root.INFO MainThread Starting waiting for produce futures
[2019-03-29 07:50:40,506] root.INFO MainThread Finished waiting for produce futures
[2019-03-29 07:50:40,506] root.INFO MainThread Confirmed production of 85 messages
...
[2019-03-29 07:50:44,705] root.INFO MainThread Confirmed production of 182 messages
[2019-03-29 07:50:44,790] root.INFO MainThread Starting waiting for produce futures
[2019-03-29 07:51:13,912] kafka.conn.INFO kafka-python-producer-1-network-thread <BrokerConnection node_id=0 host=kafka-0:9092 <connecting> [IPv4 ('0.0.0.0', 9092)]>: connecting to kafka-0:9092 [('0.0.0.0', 9092) IPv4]
[2019-03-29 07:51:13,913] kafka.conn.INFO kafka-python-producer-1-network-thread <BrokerConnection node_id=0 host=kafka-0:9092 <connecting> [IPv4 ('0.0.0.0', 9092)]>: Connection complete.
[2019-03-29 07:51:13,927] root.INFO MainThread Finished waiting for produce futures  <-- ~30s wait
[2019-03-29 07:51:13,927] root.INFO MainThread Confirmed production of 1 messages
[2019-03-29 07:51:13,930] root.INFO MainThread Confirmed production of 105 messages
[2019-03-29 07:51:13,930] root.INFO MainThread Starting offset commit
[2019-03-29 07:51:13,936] root.INFO MainThread Finished offset commit
[2019-03-29 07:51:14,059] root.INFO MainThread Confirmed production of 895 messages
[2019-03-29 07:51:14,615] root.INFO MainThread Starting waiting for produce futures
^C[2019-03-29 07:51:44,091] root.INFO MainThread Confirmed production of 1000 messages  <-- ~30s wait (where I killed the process)
[2019-03-29 07:51:44,092] root.INFO MainThread Starting offset commit
[2019-03-29 07:51:44,095] root.INFO MainThread Finished offset commit
[2019-03-29 07:51:44,095] kafka.producer.kafka.INFO MainThread Closing the Kafka producer with 9223372036.0 secs timeout.  <-- another ~30s wait after this
[2019-03-29 07:52:14,091] kafka.conn.INFO kafka-python-producer-1-network-thread <BrokerConnection node_id=5 host=kafka-5:9092 <connected> [IPv4 ('0.0.0.4', 9092)]>: Closing connection. 
...
[2019-03-29 07:52:14,092] kafka.coordinator.INFO MainThread Stopping heartbeat thread
[2019-03-29 07:52:14,092] kafka.coordinator.INFO MainThread Leaving consumer group (test-temp-0).
[2019-03-29 07:52:14,094] kafka.conn.INFO MainThread <BrokerConnection node_id=0 host=kafka-0:9092 <connected> [IPv4 ('0.0.0.0', 9092)]>: Closing connection. 
...
[2019-03-29 07:52:14,096] kafka.producer.kafka.INFO MainThread Kafka producer closed
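
Pauses like the ones marked above can be picked out of a longer log mechanically. The following is a rough sketch, assuming every relevant line carries a `[YYYY-MM-DD HH:MM:SS,mmm]` timestamp as in the excerpt; `find_gaps` and its 10-second default threshold are hypothetical choices, not part of kafka-python or the test project.

```python
import re
from datetime import datetime

# Matches timestamps such as [2019-03-29 07:49:23,589] anywhere in a line.
_TIMESTAMP = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]")


def find_gaps(lines, threshold_s=10.0):
    """Return (seconds, previous_line, line) for each gap >= threshold_s.

    Lines without a recognizable timestamp (for example "...") are skipped.
    """
    gaps = []
    previous = None  # (timestamp, line) of the last timestamped line
    for line in lines:
        match = _TIMESTAMP.search(line)
        if match is None:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if previous is not None:
            delta = (stamp - previous[0]).total_seconds()
            if delta >= threshold_s:
                gaps.append((delta, previous[1], line))
        previous = (stamp, line)
    return gaps
```

Because elided lines ("...") hide activity, a gap reported across an elision in an excerpt is not necessarily a real pause; the check is only meaningful on the full log.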

@braedon
Contributor

braedon commented Mar 29, 2019

Done some more tests to try and isolate the slowdown.

As noted in the original ticket discussion, I'm using this (very WIP, hacky) project to do the testing. Specifically, the pipe command, which pipes messages from one topic to another. It has some funky logic around determining when to wait for produce futures to complete, and when to commit offsets, but the core is pretty straightforward - read a message from the consumer iterator, send it with the producer.
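
For readers without access to the project, the core loop described above might look roughly like the sketch below. This is an illustration, not the project's actual code: `pipe`, `_confirm_and_commit`, `max_pending`, and `timeout` are hypothetical names. The `consumer` is assumed to behave like kafka-python's `KafkaConsumer` (iterable, with `commit()`), and the `producer` like `KafkaProducer` (`send()` returning a future with `get(timeout=...)`).

```python
def pipe(consumer, producer, dest_topic, max_pending=1000, timeout=30):
    """Copy every record from consumer to dest_topic.

    Offsets are committed only after all outstanding produce futures
    have been confirmed, so a crash cannot commit unproduced messages.
    Returns the number of confirmed messages.
    """
    pending = []
    confirmed = 0
    for record in consumer:
        future = producer.send(dest_topic, value=record.value, key=record.key)
        pending.append(future)
        if len(pending) >= max_pending:
            confirmed += _confirm_and_commit(consumer, pending, timeout)
    if pending:
        # Flush whatever is left once the consumer iterator ends.
        confirmed += _confirm_and_commit(consumer, pending, timeout)
    return confirmed


def _confirm_and_commit(consumer, pending, timeout):
    """Wait for every pending future, then commit the consumer's offsets."""
    for future in pending:
        future.get(timeout=timeout)  # raises if the send failed or timed out
    count = len(pending)
    pending.clear()
    consumer.commit()
    return count
```

A larger `max_pending` means fewer explicit waits on produce futures, which is the situation referred to as a large produce buffer in the next paragraph.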

On a particular cluster/topic I'm using for testing, this version pipes messages at a rate of ~500/s, down from ~2k/s on 1.4.5, and ~5k/s on 1.4.4. CPU usage spikes to 100%, but drops to essentially 0% for long periods. Presumably this is related to the pauses observed above. Note that this happens even when the produce buffer is large enough to avoid ever explicitly waiting for a produce future.

Using the consume command to consume to a file (the consumer has the same settings as in pipe), the consumption rate is ~2-2.5k/s, and 100% CPU is used all the time. No pauses are seen.

Using the produce command to produce from a file to the topic (the producer has the same settings as in pipe), the production rate is > 3k/s. Pauses are seen (despite the higher rate).

@dpkp
Owner Author

dpkp commented Mar 29, 2019

Ok -- backpressure clearly needed in your setup.

@dpkp dpkp closed this Mar 29, 2019
@jeffwidman
Contributor

jeffwidman commented Apr 1, 2019

backpressure clearly needed in your setup.

Can you clarify what you mean here?

Perhaps I'm just a bit dense tonight, but I read through the linked PRs and I'm not understanding whether you mean @braedon needs to add backpressure within his test harness or that backpressure needs to be added to kafka-python somehow...

@dpkp
Owner Author

dpkp commented Apr 1, 2019 via email

@dpkp dpkp deleted the nonblocking_wake_sockets branch April 3, 2019 04:37