21 changes: 19 additions & 2 deletions plugins/outputs/opensearch/README.md
@@ -21,6 +21,23 @@ plugin ordering. See [CONFIGURATION.md][CONFIGURATION.md] for more details.

[CONFIGURATION.md]: ../../../docs/CONFIGURATION.md#plugins

## Startup error behavior options <!-- @/docs/includes/startup_error_behavior.md -->

In addition to the plugin-specific and global configuration settings, the plugin
supports the `startup_error_behavior` setting, which specifies how Telegraf
behaves when the plugin encounters startup errors. Available values are:

- `error`: Telegraf will stop and exit in case of startup errors. This is the
default behavior.
- `ignore`: Telegraf will ignore startup errors for this plugin, disable it,
and continue processing all other plugins.
- `retry`: Telegraf will try to start up the plugin in every gather or write
cycle in case of startup errors. The plugin is disabled until
the startup succeeds.
- `probe`: Telegraf will probe the plugin's function (if possible) and disable
the plugin if probing fails. If the plugin does not support
probing, Telegraf will behave as if `ignore` was set instead.

## Configuration

```toml @sample.conf
@@ -129,9 +146,9 @@ plugin ordering. See [CONFIGURATION.md][CONFIGURATION.md] for more details.

### Required parameters

* `urls`: A list containing the full HTTP URL of one or more nodes from your
- `urls`: A list containing the full HTTP URL of one or more nodes from your
OpenSearch instance.
* `index_name`: The target index for metrics. You can use the date format
- `index_name`: The target index for metrics. You can use the date format

For example: "telegraf-{{.Time.Format \"2006-01-02\"}}" would set it to
"telegraf-2023-07-27". You can also specify metric name (`{{ .Name }}`), tag
6 changes: 5 additions & 1 deletion plugins/outputs/opensearch/opensearch.go
@@ -22,6 +22,7 @@ import (

"github.com/influxdata/telegraf"
"github.com/influxdata/telegraf/config"
"github.com/influxdata/telegraf/internal"
"github.com/influxdata/telegraf/internal/choice"
"github.com/influxdata/telegraf/plugins/common/tls"
"github.com/influxdata/telegraf/plugins/outputs"
@@ -143,7 +144,10 @@ func (o *Opensearch) Connect() error {

_, err = o.osClient.Ping()
Ping() is called without the context, which is what's hanging CI.

ctx, cancel := context.WithTimeout(context.Background(), time.Duration(o.Timeout))
defer cancel()

...

_, err = o.osClient.Ping()

The ctx with the 5s timeout is never passed to Ping(), so the opensearch-go client performs the ping without any deadline.

The new TestConnectionIssueAtStartup uses an unstarted httptest.Server: its listener exists, so the kernel completes the TCP handshake, but nothing ever reads from the connection, so the round trip blocks forever.

Reproduced locally:

panic: test timed out after 30s
opensearchapi.PingRequest.Do(...) at opensearch.go:138

That's the same root cause as the four failing test-go-* CI jobs, which abort with "Too long with no output (exceeded 10m0s)".

Suggested fix:

_, err = o.osClient.Ping(o.osClient.Ping.WithContext(ctx))

Ping.WithContext is provided by opensearchapi/api.ping.go.

This also makes the retry semantics meaningful at runtime. Without a deadline, each retry attempt can hang indefinitely against a half-open peer, which defeats the purpose of this PR.

if err != nil {
return fmt.Errorf("unable to ping OpenSearch server: %w", err)
return &internal.StartupError{
Err: fmt.Errorf("unable to ping server: %w", err),
Retry: true,
}
}

return nil
56 changes: 56 additions & 0 deletions plugins/outputs/opensearch/opensearch_test.go
@@ -11,6 +11,8 @@ import (
"github.com/testcontainers/testcontainers-go/wait"

"github.com/influxdata/telegraf/config"
"github.com/influxdata/telegraf/internal"
"github.com/influxdata/telegraf/models"
"github.com/influxdata/telegraf/testutil"
)

@@ -308,6 +310,60 @@ func TestDisconnectedServerOnConnect(t *testing.T) {
require.Error(t, e.Connect())
}

func TestConnectionIssueAtStartup(t *testing.T) {
// Test case for https://github.com/influxdata/telegraf/issues/18783
if testing.Short() {
t.Skip("Skipping integration test in short mode")
}

ts := httptest.NewServer(http.HandlerFunc(func(_ http.ResponseWriter, _ *http.Request) {}))

urls := []string{"http://" + ts.Listener.Addr().String()}

plugin := &Opensearch{
URLs: urls,
IndexName: `{{.Tag "tag1"}}-{{.Time.Format "2006-01-02"}}`,
Timeout: config.Duration(time.Second * 5),
AuthBearerToken: config.NewSecret([]byte("0123456789abcdef")),
Log: testutil.Logger{},
}
var err error
plugin.indexTmpl, err = template.New("index").Parse(plugin.IndexName)
require.NoError(t, err)

// Close the server before we try to connect
ts.Close()

// Create a model to be able to use the startup retry strategy
model, err := models.NewRunningOutput(
plugin,
&models.OutputConfig{
Name: "opensearch",
StartupErrorBehavior: "retry",
},
1000, 1000,
)
require.NoError(t, err)
require.NoError(t, model.Init())

// With the "retry" strategy, Connect should not fail even though the server is closed
require.NoError(t, model.Connect())

// Writing metrics in this state should fail since the server is closed
metrics := testutil.MockMetrics()
for _, m := range metrics {
model.AddMetric(m)
}
require.ErrorIs(t, model.WriteBatch(), internal.ErrNotConnected)

// Start the server and check that writes succeed
ts.Start()
require.NoError(t, model.WriteBatch())

ts.Close()
model.Close()
}
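One detail in the test above is worth double-checking: `httptest.Server.Start` is meant for servers returned by `NewUnstartedServer`. It panics with "Server already started" when the server came from `NewServer`, and calling `Close` first does not reset that state. A small standalone sketch of both cases follows; the helper names `startLater` and `startPanics` are illustrative only.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// startLater builds the server unstarted, reads its address up front and only
// then starts serving, which is the lifecycle Start is designed for.
func startLater() (int, error) {
	ts := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer ts.Close()

	addr := ts.Listener.Addr().String() // known before Start, but nothing serves it yet
	ts.Start()

	resp, err := http.Get("http://" + addr)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// startPanics reports whether calling Start on a server from NewServer panics,
// optionally after closing it first.
func startPanics(closeFirst bool) (panicked bool) {
	ts := httptest.NewServer(http.NotFoundHandler())
	defer ts.Close()
	if closeFirst {
		ts.Close()
	}
	defer func() { panicked = recover() != nil }()
	ts.Start()
	return false
}

func main() {
	code, err := startLater()
	fmt.Println("unstarted then Start:", code, err)
	fmt.Println("Start after NewServer panics:", startPanics(false))
	fmt.Println("Start after NewServer+Close panics:", startPanics(true))
}
```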

func TestDisconnectedServerOnWrite(t *testing.T) {
ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
switch r.URL.Path {