
feat(outputs.opensearch): Implement startup-error-behavior options#18784

Open
Obeyed wants to merge 10 commits into influxdata:master from Obeyed:dont-fail-startup-when-outputs-opensearch-not-reachable


@Obeyed Obeyed commented Apr 23, 2026

Summary

Telegraf fails to start if there is no network path to the OpenSearch service configured for the output. In scenarios where the local device occasionally has no internet connection, this should not abort Telegraf's startup sequence: Telegraf can start collecting and buffering metrics, then send them to OpenSearch once it becomes reachable.

The current implementation works fine if the connection to OpenSearch is available at boot; in that case, metric buffering works as expected.

Checklist

Related issues

resolves #18783

@telegraf-tiger
Contributor

Thanks so much for the pull request!
🤝 ✒️ Just a reminder that the CLA has not yet been signed, and we'll need it before merging. Please sign the CLA when you get a chance, then post a comment here saying !signed-cla

@Obeyed Obeyed changed the title from "Log instead of hard failure when ping fails" to "Log instead of hard failure when ping to opensearch fails on start" Apr 23, 2026
@Obeyed
Author

Obeyed commented Apr 23, 2026

!signed-cla

@Obeyed Obeyed force-pushed the dont-fail-startup-when-outputs-opensearch-not-reachable branch from 85c1716 to e97c635 April 23, 2026 09:15
When telegraf starts locally, the remote opensearch service may not be
reachable. In cases where the network is occasionally unavailable, this
shouldn't prevent telegraf from starting. Instead, allow telegraf to
start collecting and potentially buffering metrics to send when
possible.
@Obeyed Obeyed force-pushed the dont-fail-startup-when-outputs-opensearch-not-reachable branch from e97c635 to 95462bd April 23, 2026 09:24
@Obeyed Obeyed changed the title from "Log instead of hard failure when ping to opensearch fails on start" to "fix: log instead of fail to boot when ping to opensearch fails" Apr 23, 2026
@telegraf-tiger telegraf-tiger Bot added the fix (pr to fix corresponding bug) label Apr 23, 2026
Member

@srebhan srebhan left a comment


Thanks for your contribution @Obeyed!

Instead of unconditionally ignoring the error, I suggest implementing Telegraf's startup-error-behavior spec for the plugin. That is, you need to return a StartupError with the Retry flag set. This allows the user to specify what should happen if the connection cannot be established.

@srebhan srebhan self-assigned this Apr 27, 2026
@Obeyed
Copy link
Copy Markdown
Author

Obeyed commented Apr 28, 2026

Thanks, @srebhan. Appreciate the pointer! Let me know if my latest approach is as expected.

@Obeyed Obeyed requested a review from srebhan April 28, 2026 11:39
@srebhan srebhan changed the title from "fix: log instead of fail to boot when ping to opensearch fails" to "feat(outputs.opensearch): Implement startup-error-behavior options" Apr 28, 2026
@telegraf-tiger telegraf-tiger Bot added feat (Improvement on an existing feature such as adding a new setting/mode to an existing plugin) and plugin/output (1. Request for new output plugins 2. Issues/PRs that are related to out plugins) labels Apr 28, 2026
Member

@srebhan srebhan left a comment


Good job @Obeyed! Just a few minor comments.

Comment thread plugins/outputs/opensearch/opensearch.go Outdated
Comment thread plugins/outputs/opensearch/opensearch_test.go Outdated
Comment thread plugins/outputs/opensearch/opensearch_test.go Outdated
@srebhan srebhan added area/elasticsearch and removed fix (pr to fix corresponding bug) labels Apr 28, 2026
@telegraf-tiger
Contributor

@Obeyed Obeyed requested a review from srebhan April 29, 2026 06:47
Member

@srebhan srebhan left a comment


Thanks @Obeyed!

@srebhan srebhan added the ready for final review (This pull request has been reviewed and/or tested by multiple users and is ready for a final review.) label Apr 29, 2026
@srebhan srebhan assigned skartikey and unassigned srebhan Apr 29, 2026
Copy link
Copy Markdown
Contributor

@skartikey skartikey left a comment


@Obeyed Thanks! Some comments for you to take a look!

Comment thread plugins/outputs/opensearch/opensearch_test.go Outdated
Comment thread plugins/outputs/opensearch/opensearch_test.go
Comment thread plugins/outputs/opensearch/opensearch.go Outdated
Comment thread plugins/outputs/opensearch/opensearch.go Outdated
Comment thread plugins/outputs/opensearch/opensearch_test.go Outdated
Comment thread plugins/outputs/opensearch/opensearch_test.go Outdated
Obeyed added 2 commits May 2, 2026 20:39
"unnecessaryDefer: defer model.Close() is placed just before return"
@Obeyed Obeyed requested a review from skartikey May 2, 2026 19:29
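The `unnecessaryDefer` message quoted in the commit above comes from gocritic's check of that name: a `defer` placed immediately before `return` fires right away anyway, so the call can be made directly. A minimal illustration, where `resource` is a hypothetical stand-in for the test's `model` (only its `Close` method matters):

```go
package main

import "fmt"

// resource is a hypothetical stand-in for `model`; only Close matters here.
type resource struct{ closed bool }

func (r *resource) Close() { r.closed = true }

// flagged is the pattern gocritic reports: the deferred Close runs as soon as
// the very next statement returns, so deferring it buys nothing.
func flagged(r *resource) error {
	defer r.Close()
	return nil
}

// fixed calls Close directly; the observable behavior is the same.
func fixed(r *resource) error {
	r.Close()
	return nil
}

func main() {
	a, b := &resource{}, &resource{}
	_ = flagged(a)
	_ = fixed(b)
	fmt.Println(a.closed, b.closed) // true true
}
```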
@Obeyed
Author

Obeyed commented May 2, 2026

@skartikey, thanks for the guidance! I need some help understanding the failing tests. Do you have any pointers on how to resolve the following?

The test-go-[xx] output is cut off; I'm not sure whether CircleCI is killing the job for taking too long:

Too long with no output (exceeded 10m0s): context deadline exceeded

In the test-integration output, the nats plugin's tests seem to have failed for some reason. The test-integration tests passed at commit 11a4262 but failed on the next commit, d9f9d2a (which didn't make any relevant changes).

FAIL github.com/influxdata/telegraf/plugins/outputs/nats 5.723s
[..]
ok github.com/influxdata/telegraf/plugins/outputs/opensearch 368.849s

o.Log.Errorf("error creating OpenSearch client: %v", err)
}

_, err = o.osClient.Ping()
Contributor


Ping() is called without the context, which is what's hanging CI.

ctx, cancel := context.WithTimeout(context.Background(), time.Duration(o.Timeout))
defer cancel()

...

_, err = o.osClient.Ping()

The ctx with the 5s timeout is never passed to Ping(), so the opensearch-go client performs the ping without any deadline.

The new TestConnectionIssueAtStartup uses an unstarted httptest.Server whose listener accepts TCP connections but never reads them, so the round trip blocks forever.

Reproduced locally:

panic: test timed out after 30s
opensearchapi.PingRequest.Do(...) at opensearch.go:138

That's the same root cause as the four failing test-go-* CI jobs (Too long with no output (exceeded 10m0s)).

Suggested fix:

_, err = o.osClient.Ping(o.osClient.Ping.WithContext(ctx))

Ping.WithContext is provided by opensearchapi/api.ping.go.

This also makes the retry semantics meaningful at runtime. Without a deadline, each retry attempt can hang indefinitely against a half-open peer, which defeats the purpose of this PR.


Labels

area/elasticsearch, feat (Improvement on an existing feature such as adding a new setting/mode to an existing plugin), plugin/output (1. Request for new output plugins 2. Issues/PRs that are related to out plugins), ready for final review (This pull request has been reviewed and/or tested by multiple users and is ready for a final review.)


Development

Successfully merging this pull request may close these issues.

Failed to connect to [outputs.opensearch], retrying in 15s, error was "unable to ping OpenSearch server"

3 participants