[πŸ› BUG]: TLS Timeout configuration results in I/O timeouts when connecting to KafkaΒ #2168

@adamsnoah98

Description

No duplicates 🥲.

  • I have searched for a similar issue in our bug tracker and didn't find any solutions.

What happened?

RoadRunner can fail to start when the Kafka driver is used with a configured TLS timeout, because connecting to the Kafka cluster fails with an i/o timeout.

The problem arises when the Kafka driver is used in a jobs pipeline with a TLS timeout duration specified. The docs specify, and the server accepts, a duration such as "10s"; however, kafkajobs/config.go treats tls.timeout as an integer number of seconds rather than a duration:

		if c.TLS.Timeout != 0 {
			netDialer.Timeout = c.TLS.Timeout * time.Second
		}

This can result in an integer overflow that yields a negative timeout, which causes immediate i/o timeouts when connecting. Removing the timeout from the config was sufficient in my case, as it only specified the default value of "10s". Patching this allows the Kafka driver to start for the jobs plugin even with a timeout specified.

Version (rr --version)

rr --version

rr version 2024.3.5 (build time: 2025-02-27T17:24:29+0000, go1.24.0), OS: linux, arch: amd64

How to reproduce the issue?

version: '3'

server:
  relay: pipes
  command: 'php <redacted>'
  env:
    APP_ENV: local
    APP_BASE_PATH: <redacted>
    LARAVEL_OCTANE: '1'

logs:
  mode: development
  level: debug
  encoding: console
  channels:
    http:
      mode: development
      level: debug
      encoding: console
      output: ["stdout"]
      err_output: ["stderr"]
    server:
      mode: development
      level: debug
      encoding: console
      output: ["stdout"]
      err_output: ["stdout"]
    rpc:
      mode: development
      level: debug
      encoding: console
      output: ["stderr"]
      err_output: ["stdout"]

http:
  address: 0.0.0.0:8080
  access_logs: true
  max_request_size: 64
  # Middlewares for the http plugin, order is important. Allowed values is: "headers", "gzip", "static", "sendfile",  [SINCE 2.6] -> "new_relic", [SINCE 2.6] -> "http_metrics", [SINCE 2.7] -> "cache"
  middleware: [ "static", "gzip" ]
  static:
    dir: "<redacted>"
  pool:
    debug: false
    num_workers: 5
    max_jobs: 50
    max_queue_size: 100
    allocate_timeout: 2s
    reset_timeout: 60s
    stream_timeout: 60s
    destroy_timeout: 30s

    dynamic_allocator:
      max_workers: 50
      spawn_rate: 10
      idle_timeout: 60s
    supervisor:
      watch_tick: 1s
      ttl: 0s
      idle_ttl: 1h
      max_worker_memory: 128
      exec_ttl: 60s

endure:
  # How long to wait for stopping.
  grace_period: 30s
  # Logging level. Possible values: "debug", "info", "warn", "error", "panic", "fatal".
  log_level: error

rpc:
  listen: tcp://127.0.0.1:6001

kafka:
  brokers: [ "<redacted>:9092" ]
  ping:
    timeout: "10s"
  tls:
    timeout: "10s" # Buggy field
    root_ca: "/etc/ssl/certs/ca-certificates.crt"
    client_auth_type: require_any_client_cert
  sasl: 
    # <redacted>

jobs:
  num_pollers: 1
  pipeline_size: 100000
  pool:
    num_workers: 6
    max_jobs: 0
    allocate_timeout: 2s
    destroy_timeout: 30s
  pipelines:
    kafka-pipeline:
      driver: kafka
      config:
        producer_options:
          required_acks: AllISRAck
          request_timeout: 5s
          delivery_timeout: 100s
        group_options:
          group_id: some-group-id

Serve RoadRunner as usual (in this case it was hosting an Octane project) and connect to a TLS-capable Kafka cluster:

rr -e -c=/etc/rr/.rr.yml serve

Relevant log output

2025-04-28T23:29:28+0000 DEBUG   rpc             plugin was started      {"address": "tcp://127.0.0.1:6001", "list of the plugins with RPC methods:": ["status", "resetter", "app", "jobs", "informer", "lock"]}
2025-04-28T23:29:28+0000 DEBUG   jobs            initializing driver     {"pipeline": "kafka-pipeline", "driver": "kafka"}
2025-04-28T23:29:28+0000 DEBUG   kafka.kgo       opening connection to broker    {"kgo_driver": "addr", "kgo_driver": "<redacted>:9092", "kgo_driver": "broker", "kgo_driver": "seed_0"}
2025-04-28T23:29:28+0000 WARN    kafka.kgo       unable to open connection to broker     {"kgo_driver": "addr", "kgo_driver": "<redacted>:9092", "kgo_driver": "broker", "kgo_driver": "seed_0", "kgo_driver": "err", "kgo_driver": "dial tcp: lookup <redacted>: i/o timeout"}
2025-04-28T23:29:28+0000 DEBUG   kafka.kgo       opening connection to broker    {"kgo_driver": "addr", "kgo_driver": "<redacted>:9092", "kgo_driver": "broker", "kgo_driver": "seed_0"}
2025-04-28T23:29:28+0000 WARN    kafka.kgo       unable to open connection to broker     {"kgo_driver": "addr", "kgo_driver": "<redacted>:9092", "kgo_driver": "broker", "kgo_driver": "seed_0", "kgo_driver": "err", "kgo_driver": "dial tcp: lookup <redacted>: i/o timeout"}
2025-04-28T23:29:28+0000 ERROR   jobs            failed to initialize driver     {"pipeline": "kafka-pipeline", "driver": "kafka", "error": "kafka_ping: unable to dial: dial tcp: lookup <redacted>: i/o timeout"}
handle_serve_command: Function call error:
serve error from the plugin *jobs.Plugin stopping execution, error: jobs_plugin_serve: kafka_ping: unable to dial: dial tcp: lookup <redacted>: i/o timeout
exited with code 1

Metadata

Labels: bug

Status: ✅ Done