
KafkaConsumer.subscribe(pattern='x') sometimes picks up topic but not partitions #1237

Closed
Description

@jeffwidman

There appears to be a race condition bug of some kind in KafkaConsumer.subscribe(pattern='some pattern').

Normally the call works fine: the consumer picks up matching topics, assigns partitions to group members, etc.

However, once in a blue moon I've observed that the consumer finds the matching topic, but never successfully assigns the topic partitions to the group members. Once it's in this state, it will call poll() for hours without returning messages because the consumer thinks it has no assigned partitions, and because the consumer's subscription already contains the topic, there's never a change that triggers a rebalance.

I'm embarrassed to say that I've spent 40+ hours over the past two weeks trying to figure this out as we hit it in production, but all I've managed to do is isolate a semi-consistently reproducible example. Unfortunately that requires running a service that has a KafkaConsumer instance plus a bunch of associated docker containers, so I can't make this setup public. The wrapper service does use gevent, which I'm not very familiar with, but I disabled all the service's other greenlets, so I don't think that should affect this at all.

Every time I try to isolate it down to a simple kafka-python script, I cannot reproduce it. But after spending hours stepping through the code, I'm reasonably certain it's a race condition in kafka-python and not the wrapper service.
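
For reference, the standalone script I've been testing with when trying to reproduce this looks roughly like the sketch below; the broker address, group id, and topic pattern are placeholders for my real values.

    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        bootstrap_servers='localhost:9092',  # placeholder for my docker broker
        group_id='repro-group',              # placeholder group id
    )
    consumer.subscribe(pattern='^my-topic-.*$')  # placeholder pattern

    while True:
        records = consumer.poll(timeout_ms=1000)
        # In the failure case this loop spins forever: subscription() shows the
        # matched topic, but assignment() stays empty and poll() returns nothing.
        print('subscription:', consumer.subscription())
        print('assignment:  ', consumer.assignment())
        for tp, messages in records.items():
            for message in messages:
                print(tp, message.offset, message.value)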

Here's what I know:

  1. The issue doesn't show up the first time I run the service. If I kill the service (without calling KafkaConsumer.close()) and then restart it before the group coordinator evicts the consumer from the group, then I trigger the issue. If I then kill it, wait until I know the group coordinator has evicted all consumers, and then re-run it, it will work fine. Unfortunately, I have no idea if this behavior is related to the root cause, or just a trigger that makes the docker kafka container busy enough that it slows down its response times.

  2. In the failure case, calling KafkaConsumer.subscription() returns the expected topic name, but calling KafkaConsumer.assignment() returns an empty set (see the sketch after the traces below).

  3. In the failure case, I can see that the cluster metadata object has both the topic and the list of partitions, so the cluster metadata is getting updated correctly; it's just not making it into the group assignments.

  4. SubscriptionState.change_subscription() has a check that short-circuits the group rebalance if the previous and current topic subscriptions are equal. If I comment out the early return in that check, the group rebalances properly and the problem disappears (a paraphrase of the check is sketched after the traces below).

  5. Tracing the TCP calls in Wireshark, I see the following:

    Success case:

    1. Metadata v1 Request
    2. Metadata v2 Response
    3. GroupCoordinator v0 Request
    4. GroupCoordinator v0 Response
    5. JoinGroup v0 Request - protocol member metadata is all 0's
    6. JoinGroup v0 Response - protocol member metadata is all 0's
    7. SyncGroup v0 Request - member assignment is all 0's
    8. SyncGroup v0 Response - member assignment is all 0's
      (note this is a second generation of the group)
    9. JoinGroup v0 Request - protocol member metadata has data
    10. JoinGroup v0 Response - protocol member metadata has data
    11. SyncGroup v0 Request - member assignment has data
    12. SyncGroup v0 Response - member assignment has data
    13. From here on it's the expected behavior of polling the assigned partitions, with the occasional Metadata Request/Response when the metadata refresh timeout kicks in

    Failure case:

    1. Metadata v1 Request
    2. Metadata v2 Response
    3. GroupCoordinator v0 Request
    4. GroupCoordinator v0 Response
    5. JoinGroup v0 Request - protocol member metadata is all 0's
    6. JoinGroup v0 Response - protocol member metadata is all 0's
    7. SyncGroup v0 Request - member assignment is all 0's
    (Here is the problem: we never trigger a second JoinGroup v0 Request containing the partition data)
    8. From here on there are no requests other than the Metadata Request/Response when the metadata refresh timeout kicks in
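
To make observations 2 and 4 above concrete, here's roughly what the stuck state looks like from the public API, plus a paraphrase of the short circuit in SubscriptionState.change_subscription() that keeps the consumer stuck. The class below is just an illustration of the logic, not the actual kafka-python source.

    import logging

    log = logging.getLogger(__name__)

    # Observation 2: what the stuck consumer looks like from the public API,
    # using the `consumer` from the script above:
    #
    #     consumer.subscription()        -> {'my-topic'}  (topic was picked up)
    #     consumer.assignment()          -> set()         (no partitions assigned)
    #     consumer.poll(timeout_ms=1000) -> {}            (nothing ever returned)

    # Observation 4: rough paraphrase of the short circuit in
    # SubscriptionState.change_subscription() (not a verbatim copy of the source).
    class SubscriptionStateSketch(object):
        def __init__(self):
            self.subscription = set()
            self.needs_partition_assignment = False

        def change_subscription(self, topics):
            if isinstance(topics, str):
                topics = [topics]
            # The pattern-matched topic is already in self.subscription, so we
            # hit this early return, never flag the subscription as changed, and
            # never trigger a rebalance. Commenting out this return is what makes
            # the group rebalance and the problem disappear.
            if self.subscription == set(topics):
                log.warning("subscription unchanged by change_subscription(%s)", topics)
                return
            self.subscription = set(topics)
            self.needs_partition_assignment = True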

Setup:

  • Single Kafka broker, version 0.10.2.1, running on docker.
  • Single instance of the consumer, so it always elects itself as the leader and consumes all partitions for the topic.
  • To keep things simple, my topic has only one partition. However, this race condition may not be specific to my partition count: a consumer could be working perfectly, and then if we expand the number of partitions it might never pick up that the partitions changed.

After spending a lot of time poking through this code, I understand why the consumer is stuck once this happens, but I don't understand how it gets into this state in the first place.
