Make exporter timeout encompass retries/backoffs, add jitter to backoffs, cleanup code a bit #4564

Open · wants to merge 35 commits into base: main

Conversation

@DylanRussell (Contributor) commented Apr 30, 2025

Description

Make timeout encompass retries and backoffs, rather than being applied per HTTP request or gRPC RPC.

Added a +/- 20% jitter to each backoff (both gRPC/HTTP).

Cleaned up the exporter code some. I got rid of a pointless 32-second sleep we would do after the last retry attempt before failing.
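A minimal sketch of the behaviour described above, assuming illustrative names and constants (`export_with_retry`, `_MAX_ATTEMPTS`, `_INITIAL_BACKOFF_S` are not the actual exporter internals): the deadline is computed once from the configured timeout, each exponential backoff gets ±20% jitter, and the loop stops without a trailing sleep once the deadline would be overshot.

```python
import random
import time

# Illustrative sketch only; names and values are assumptions, not the exporter's code.
_MAX_ATTEMPTS = 6          # assumed cap on retry attempts
_INITIAL_BACKOFF_S = 1.0   # first backoff, doubled after each retry


def export_with_retry(send_once, timeout_s):
    """Retry send_once until success, a non-retryable error, or the overall
    deadline is reached. The timeout bounds the whole loop, not each request."""
    deadline = time.monotonic() + timeout_s
    backoff = _INITIAL_BACKOFF_S
    for attempt in range(_MAX_ATTEMPTS):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False  # overall timeout exhausted
        ok, retryable = send_once(timeout=remaining)  # single call also capped
        if ok or not retryable:
            return ok
        # Exponential backoff with +/- 20% jitter.
        sleep_for = backoff * random.uniform(0.8, 1.2)
        # No pointless sleep after the final attempt, and never past the deadline.
        if attempt == _MAX_ATTEMPTS - 1 or sleep_for > deadline - time.monotonic():
            return False
        time.sleep(sleep_for)
        backoff *= 2
    return False
```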

fixes: #3309, #4043, #2663

#4183 is similar to this PR and to what's discussed in #4043, but I implemented it in as minimal a way as I could.

Type of change

Please delete options that are not relevant.

  • [x] New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Lots of unit tests.

Does This PR Require a Contrib Repo Change?

  • [ ] Yes. Link to PR:
  • [x] No.

Checklist:

  • [x] Followed the style guidelines of this project
  • [x] Changelogs have been updated
  • [x] Unit tests have been added
  • [x] Documentation has been updated

@emdneto (Member) left a comment

Running a basic example of sending a span/metric to a nonexistent collector through gRPC:

Before (timeout not respected)
$ OTEL_EXPORTER_OTLP_TIMEOUT=5 uv run repro.py
2025-05-09 01:19:57 INFO [test] Hello world
2025-05-09 01:19:57 WARNING [opentelemetry.exporter.otlp.proto.grpc.exporter] Transient error StatusCode.UNAVAILABLE encountered while exporting metrics to localhost:4317, retrying in 1s.
2025-05-09 01:19:58 WARNING [opentelemetry.exporter.otlp.proto.grpc.exporter] Transient error StatusCode.UNAVAILABLE encountered while exporting metrics to localhost:4317, retrying in 2s.
2025-05-09 01:20:00 WARNING [opentelemetry.exporter.otlp.proto.grpc.exporter] Transient error StatusCode.UNAVAILABLE encountered while exporting metrics to localhost:4317, retrying in 4s.

2025-05-09 01:20:02 WARNING [opentelemetry.exporter.otlp.proto.grpc.exporter] Transient error StatusCode.UNAVAILABLE encountered while exporting traces to localhost:4317, retrying in 1s.
2025-05-09 01:20:03 WARNING [opentelemetry.exporter.otlp.proto.grpc.exporter] Transient error StatusCode.UNAVAILABLE encountered while exporting traces to localhost:4317, retrying in 2s.
2025-05-09 01:20:04 WARNING [opentelemetry.exporter.otlp.proto.grpc.exporter] Transient error StatusCode.UNAVAILABLE encountered while exporting metrics to localhost:4317, retrying in 8s.
2025-05-09 01:20:05 WARNING [opentelemetry.exporter.otlp.proto.grpc.exporter] Transient error StatusCode.UNAVAILABLE encountered while exporting traces to localhost:4317, retrying in 4s.
2025-05-09 01:20:09 WARNING [opentelemetry.exporter.otlp.proto.grpc.exporter] Transient error StatusCode.UNAVAILABLE encountered while exporting traces to localhost:4317, retrying in 8s.
2025-05-09 01:20:12 WARNING [opentelemetry.exporter.otlp.proto.grpc.exporter] Transient error StatusCode.UNAVAILABLE encountered while exporting metrics to localhost:4317, retrying in 16s.
Now (timeout is respected)
$ OTEL_EXPORTER_OTLP_TIMEOUT=5 uv run repro.py
2025-05-09 01:22:43 INFO [test] Hello world
2025-05-09 01:22:48 ERROR [opentelemetry.exporter.otlp.proto.grpc.exporter] Failed to export metrics to localhost:4317, error code: StatusCode.DEADLINE_EXCEEDED
2025-05-09 01:22:53 ERROR [opentelemetry.exporter.otlp.proto.grpc.exporter] Failed to export traces to localhost:4317, error code: StatusCode.DEADLINE_EXCEEDED

When exporting metrics and traces in the same program, I noticed the following. Is this expected?

$ OTEL_EXPORTER_OTLP_TRACES_TIMEOUT=5 uv run repro.py
2025-05-09 15:37:54 INFO [test] Hello world
2025-05-09 15:38:04 ERROR [opentelemetry.exporter.otlp.proto.grpc.exporter] Failed to export traces to localhost:4317, error code: StatusCode.DEADLINE_EXCEEDED
2025-05-09 15:38:04 ERROR [opentelemetry.exporter.otlp.proto.grpc.exporter] Failed to export metrics to localhost:4317, error code: StatusCode.DEADLINE_EXCEEDED
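For context, a script along these lines reproduces the scenario; this is a hypothetical reconstruction, not the actual repro.py. It points the gRPC OTLP exporters at a collector that isn't running, emits one span and one metric, and forces export on shutdown.

```python
# Hypothetical reconstruction of repro.py: gRPC OTLP exporters against a
# collector that isn't running; one span and one metric, then shutdown.
import logging

from opentelemetry import metrics, trace
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

logging.basicConfig(
    level=logging.INFO, format="%(asctime)s %(levelname)s [%(name)s] %(message)s"
)

tracer_provider = TracerProvider()
tracer_provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(tracer_provider)

meter_provider = MeterProvider(
    metric_readers=[PeriodicExportingMetricReader(OTLPMetricExporter())]
)
metrics.set_meter_provider(meter_provider)

logging.getLogger("test").info("Hello world")
with trace.get_tracer(__name__).start_as_current_span("span"):
    metrics.get_meter(__name__).create_counter("counter").add(1)

# Shutdown flushes the exporters, exercising the retry/timeout behaviour under test.
tracer_provider.shutdown()
meter_provider.shutdown()
```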


@aabmass (Member) commented May 14, 2025

I updated the LogExporter and SpanExporter interfaces to include timeout_millis (MetricExporter already has it). This isn't a breaking change because @abstractmethod just checks that implementing classes have a method with a particular name; it doesn't check method parameters at all. It does cause a pylint error, but I think that's a good thing.

I didn't see this in the code; if you updated it, can you also update the description?
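To illustrate the point above about @abstractmethod (this is generic Python behaviour, not code from this PR, and the class names are made up): a subclass satisfies an abstract method by name alone, so adding a timeout_millis parameter to the base signature doesn't break existing subclasses at runtime.

```python
from abc import ABC, abstractmethod


class SpanExporterLike(ABC):  # stand-in for the real SpanExporter interface
    @abstractmethod
    def export(self, spans, timeout_millis: float = 10_000):
        ...


class LegacyExporter(SpanExporterLike):
    # No timeout_millis in the override: Python only checks that a method
    # named `export` exists, so instantiation and calls still work.
    def export(self, spans):
        return "exported"


exporter = LegacyExporter()  # no TypeError about abstract methods
print(exporter.export([]))   # prints "exported"
```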

@DylanRussell (Contributor, Author) commented

Updated the description

@DylanRussell changed the title from "Add a timeout param to all OTLP grpc / http export calls -- fixed merge conflicts" to "Switch gRPC exporters to use official gRPC retry config. Make timeout encompass retries/backoffs" on May 22, 2025
@DylanRussell (Contributor, Author) commented May 29, 2025

Never mind, not going to use the gRPC retry config for now.

@DylanRussell changed the title from "Switch gRPC exporters to use official gRPC retry config. Make timeout encompass retries/backoffs" to "Make exporter timeout encompass retries/backoffs, add jitter to backoffs, cleanup code a bit" on Jun 5, 2025
@DylanRussell (Contributor, Author) commented

@emdneto any additional comments on this since I made changes to remove the retry config? Otherwise I think this can be merged.

…try/exporter/otlp/proto/grpc/exporter.py

Co-authored-by: Emídio Neto <[email protected]>
@emdneto (Member) left a comment

@DylanRussell, running griffe I got:

griffe check opentelemetry-exporter-otlp-proto-http -a main -s exporter
exporter/opentelemetry-exporter-otlp-proto-http/tests/test_proto_span_exporter.py:0: TestOTLPSpanExporter.test_exponential_backoff: Public object was removed
exporter/opentelemetry-exporter-otlp-proto-http/tests/test_proto_log_exporter.py:0: TestOTLPHTTPLogExporter.test_exponential_backoff: Public object was removed
exporter/opentelemetry-exporter-otlp-proto-http/tests/metrics/test_otlp_metrics_exporter.py:0: TestOTLPMetricExporter.test_exponential_backoff: Public object was removed
exporter/opentelemetry-exporter-otlp-proto-http/src/opentelemetry/exporter/otlp/proto/http/metric_exporter/__init__.py:205: OTLPMetricExporter.export(timeout_millis): Parameter default was changed: 10000 -> None

@aabmass (Member) commented Jun 10, 2025

Good to go, @emdneto?

@@ -16,9 +20,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- typecheck: add sdk/resources and drop mypy
([#4578](https://github.com/open-telemetry/opentelemetry-python/pull/4578))
- Refactor `BatchLogRecordProcessor` to simplify code and make the control flow more
A reviewer (Member) commented on the diff above:

can we keep this?

@emdneto self-requested a review on June 10, 2025 at 21:56
Successfully merging this pull request may close these issues.

Exporters shutdown takes longer than a minute when failing to send metrics/traces