Segmentation fault with auto instrumentation in asgi applications #2030

potatochip · 2021-08-10T19:57:52Z

Describe your environment I am experiencing a segmentation fault when using auto instrumentation with any asgi application. I have tested with both starlette and fastapi.

Steps to reproduce
Can be created with the following repo. Steps in README.md. https://github.com/potatochip/opentelemetry-segfault-example

What is the expected behavior?
No segfault

What is the actual behavior?
Segfault

Additional context
Load testing sometimes needs to happen a couple times before a segfault occurs, but usually happens on the first try.

owais · 2021-08-10T20:52:47Z

Thanks for the report and reproducible example. Could you please also attach a traceback of the failure?

potatochip · 2021-08-10T21:01:51Z

The container exits with 139. These are the logs

[2021-08-10 20:59:12 +0000] [8] [INFO] Starting gunicorn 20.1.0
[2021-08-10 20:59:12 +0000] [8] [INFO] Listening at: http://0.0.0.0:8000 (8)
[2021-08-10 20:59:12 +0000] [8] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2021-08-10 20:59:12 +0000] [14] [INFO] Booting worker with pid: 14
Overriding of current TracerProvider is not allowed
[2021-08-10 20:59:12 +0000] [14] [INFO] Started server process [14]
[2021-08-10 20:59:12 +0000] [14] [INFO] Waiting for application startup.
[2021-08-10 20:59:12 +0000] [14] [INFO] Application startup complete.
Queue is full, likely spans will be dropped.
Queue is full, likely spans will be dropped.
scripts/auto: line 2:     8 Segmentation fault      opentelemetry-instrument gunicorn -c gunicorn_conf.py auto_main:app

potatochip · 2021-08-10T21:03:59Z

Immediate difference that I see is that the message "Queue is full, likely spans will be dropped." occurs twice with automatic instrumentation, but only once with manual instrumentation.

owais · 2021-08-10T22:10:24Z

Does this only when auto instrumenting and not with manual setup? From the logs, it does look like that the tracing pipeline is set up twice. Not sure if that is the issue though. Can you please try setting OTEL_BSP_MAX_QUEUE_SIZE to a very large value (it's 2048 by default) and see if that makes the segfault go away or take a longer time to happen? Alternatively, you could set it to a very low number such as 10 and see if it happens almost immediately. It could help us narrow down the issue a bit.

potatochip · 2021-08-11T16:44:44Z

This only happens with auto instrumentation, not manual. I set OTEL_BSP_MAX_QUEUE_SIZE=999999 and segfault still occurs. No change in the amount of time it takes. The only difference is that the queue full message does not appear before the segfault occurs.

owais · 2021-08-11T18:08:53Z

Thanks. That means the queue is not the issue. My initial naive guess is that somehow we are instrumenting things multiple times and that is causing weird memory access issues. Ill try to reproduce with you example and debug later this week.

srikanthccv · 2021-08-12T01:27:16Z

I tried to run this on my machine. I could recreate the issue. Here is the stack trace with more info

srikanthc@FL-LPT-388s-MacBook-Pro opentelemetry-segfault-example % docker run -it -p 8000:8000 --entrypoint scripts/auto otseg
[2021-08-12 01:21:14 +0000] [7] [INFO] Starting gunicorn 20.1.0
[2021-08-12 01:21:14 +0000] [7] [INFO] Listening at: http://0.0.0.0:8000 (7)
[2021-08-12 01:21:14 +0000] [7] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2021-08-12 01:21:14 +0000] [13] [INFO] Booting worker with pid: 13
WARNING:opentelemetry.trace:Overriding of current TracerProvider is not allowed
[2021-08-12 01:21:14 +0000] [13] [INFO] Started server process [13]
[2021-08-12 01:21:14 +0000] [13] [INFO] Waiting for application startup.
[2021-08-12 01:21:14 +0000] [13] [INFO] Application startup complete.
Fatal Python error: Segmentation fault

Thread 0x00007fc5877fe700 (most recent call first):
  File "/usr/local/lib/python3.8/threading.py", line 306 in wait
  File "/usr/local/lib/python3.8/site-packages/opentelemetry/sdk/trace/export/__init__.py", line 230 in worker
  File "/usr/local/lib/python3.8/threading.py", line 870 in run
  File "/usr/local/lib/python3.8/threading.py", line 932 in _bootstrap_inner
  File "/usr/local/lib/python3.8/threading.py", line 890 in _bootstrap

Thread 0x00007fc58f627740 (most recent call first):
  File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 357 in sleep
  File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 209 in run
  File "/usr/local/lib/python3.8/site-packages/gunicorn/app/base.py", line 72 in run
  File "/usr/local/lib/python3.8/site-packages/gunicorn/app/base.py", line 231 in run
  File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 67 in run
  File "/usr/local/bin/gunicorn", line 8 in <module>
scripts/auto: line 2:     7 Segmentation fault      opentelemetry-instrument gunicorn -c gunicorn_conf.py auto_main:app

owais · 2021-08-12T11:04:53Z

opentelemetry-instrument gunicorn -c gunicorn_conf.py auto_main:app

gunicorn_config.py already sets up the tracing pipeline and I see opentelemtry-distro is installed as well meaning the instrument command would definitely setup two pipeline. We should update docs to not recommend using the instrument command with gunicorn but instead enable the instrumentations in gunicorn_config.py. I think that should solve this specific issue.

That said, we still want to support setting up multiple pipelines when a user really wants it so we should figure out if setting up multiple pipelines is actually what causes the issue and fix it.

srikanthccv · 2021-09-03T13:00:08Z

Interestingly I am unable to reproduce this issue if I don't use docker.

rhymes · 2022-12-14T17:49:11Z

I'm experiencing a similar error in production:

Dec 14 05:57:28 PM  Fatal Python error: Segmentation fault
Dec 14 05:57:28 PM
Dec 14 05:57:28 PM  Thread 0x00007f1e0332f700 (most recent call first):
Dec 14 05:57:28 PM    File "/opt/render/project/python/Python-3.9.15/lib/python3.9/threading.py", line 316 in wait
Dec 14 05:57:28 PM    File "/opt/render/project/src/.venv/lib/python3.9/site-packages/opentelemetry/sdk/trace/export/__init__.py", line 254 in worker
Dec 14 05:57:28 PM    File "/opt/render/project/python/Python-3.9.15/lib/python3.9/threading.py", line 917 in run
Dec 14 05:57:28 PM    File "/opt/render/project/src/.venv/lib/python3.9/site-packages/sentry_sdk/integrations/threading.py", line 67 in run
Dec 14 05:57:28 PM    File "/opt/render/project/python/Python-3.9.15/lib/python3.9/threading.py", line 980 in _bootstrap_inner
Dec 14 05:57:28 PM    File "/opt/render/project/python/Python-3.9.15/lib/python3.9/threading.py", line 937 in _bootstrap
Dec 14 05:57:28 PM
Dec 14 05:57:28 PM  Thread 0x00007f1e0e3e6740 (most recent call first):
Dec 14 05:57:28 PM    File "/opt/render/project/src/.venv/lib/python3.9/site-packages/gunicorn/arbiter.py", line 357 in sleep
Dec 14 05:57:28 PM    File "/opt/render/project/src/.venv/lib/python3.9/site-packages/gunicorn/arbiter.py", line 209 in run
Dec 14 05:57:28 PM    File "/opt/render/project/src/.venv/lib/python3.9/site-packages/gunicorn/app/base.py", line 72 in run
Dec 14 05:57:28 PM    File "/opt/render/project/src/.venv/lib/python3.9/site-packages/gunicorn/app/base.py", line 231 in run
Dec 14 05:57:28 PM    File "/opt/render/project/src/.venv/lib/python3.9/site-packages/gunicorn/app/wsgiapp.py", line 67 in run
Dec 14 05:57:28 PM    File "/opt/render/project/src/.venv/bin/gunicorn", line 8 in <module>

I have seen this quite often during the day, though at irregular intervals. We do have a quite complex instrumentation setup that's entirely manual.

Versions:

Python 3.9.15
gunicorn 20.1.0 with default settings (so 1 worker process and 1 thread)

And these are the OpenTelemetry packages we use directly:

opentelemetry-api                        1.7.1
opentelemetry-exporter-otlp-proto-grpc   1.7.1
opentelemetry-instrumentation-botocore   0.26b1
opentelemetry-instrumentation-celery     0.26b1
opentelemetry-instrumentation-flask      0.26b1
opentelemetry-instrumentation-redis      0.26b1
opentelemetry-instrumentation-requests   0.26b1
opentelemetry-instrumentation-sqlalchemy 0.26b1
opentelemetry-sdk                        1.7.1

I checked the changelog for the newer versions of the library but I haven't seen anything related.

Any ideas?

srikanthccv · 2022-12-18T06:31:57Z

There wasn't an exact fix, but there were many related fixes that happened. Can you check if this still happens on 1.15.0 and 0.36b0?

rhymes · 2022-12-19T15:37:31Z

@srikanthccv thanks for the tip! I will attempt to upgrade and report back :-) It'll take a bit as it's not a smooth upgrade

rhymes · 2022-12-21T14:39:18Z

Hi @srikanthccv,

unfortunately it didn't work. It might be due to OpenTelemetry's interaction with Sentry SDK within the Flask app, I've also tried to invert the order of the initialization of the two but nothing gained. I'll try using a CPU and memory profiler to see if I can get to the bottom of this.

Thanks in the meantime!

dimaqq · 2025-05-30T02:55:12Z

I think I have managed to hit same (or very similar)

Thread 0x0000704b0f63f6c0 (most recent call first):
  File "/home/dima/.local/share/uv/python/cpython-3.9.22-linux-x86_64-gnu/lib/python3.9/threading.py", line 316 in wait
  File "/code/operator/.tox/unit/lib/python3.9/site-packages/opentelemetry/sdk/trace/export/__init__.py", line 258 in worker
  File "/home/dima/.local/share/uv/python/cpython-3.9.22-linux-x86_64-gnu/lib/python3.9/threading.py", line 917 in run

Full trace:
https://gist.github.com/dimaqq/c0ab0cae051a2177c28ec4a560dff183

My case is similar to OP, because the unit tests are distributed by pytest-xdist, meaning that worker processes are spawned, similar to gunicorn.

dimaqq · 2025-05-30T06:22:55Z

Given that the issue occurred in older pythons that don't accept bug fixes any more, and could not be reproduced in newer pythons, I recommend that this issue be closed.

potatochip added the bug Something isn't working label Aug 10, 2021

owais added the triaged label Aug 10, 2021

srikanthccv mentioned this issue Nov 2, 2021

Make batch processor fork aware and reinit when needed #2242

Merged

This was referenced May 30, 2025

Segmentation fault while running unit tests (Python 3.8, 3.9; pytest-xdist; opentelemetry-sdk) canonical/operator#1771

Open

How to debug worker 'gwN' crashed while running ... pytest-dev/pytest-xdist#1209

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Segmentation fault with auto instrumentation in asgi applications #2030

Segmentation fault with auto instrumentation in asgi applications #2030

potatochip commented Aug 10, 2021

owais commented Aug 10, 2021

Uh oh!

potatochip commented Aug 10, 2021

Uh oh!

potatochip commented Aug 10, 2021

Uh oh!

owais commented Aug 10, 2021

Uh oh!

potatochip commented Aug 11, 2021 •

edited

Loading

Uh oh!

owais commented Aug 11, 2021

Uh oh!

srikanthccv commented Aug 12, 2021

Uh oh!

owais commented Aug 12, 2021

Uh oh!

srikanthccv commented Sep 3, 2021

Uh oh!

rhymes commented Dec 14, 2022 •

edited

Loading

Uh oh!

srikanthccv commented Dec 18, 2022

Uh oh!

rhymes commented Dec 19, 2022

Uh oh!

rhymes commented Dec 21, 2022

Uh oh!

dimaqq commented May 30, 2025

Uh oh!

dimaqq commented May 30, 2025

Uh oh!

Segmentation fault with auto instrumentation in asgi applications #2030

Segmentation fault with auto instrumentation in asgi applications #2030

Comments

potatochip commented Aug 10, 2021

owais commented Aug 10, 2021

Uh oh!

potatochip commented Aug 10, 2021

Uh oh!

potatochip commented Aug 10, 2021

Uh oh!

owais commented Aug 10, 2021

Uh oh!

potatochip commented Aug 11, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

owais commented Aug 11, 2021

Uh oh!

srikanthccv commented Aug 12, 2021

Uh oh!

owais commented Aug 12, 2021

Uh oh!

srikanthccv commented Sep 3, 2021

Uh oh!

rhymes commented Dec 14, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

srikanthccv commented Dec 18, 2022

Uh oh!

rhymes commented Dec 19, 2022

Uh oh!

rhymes commented Dec 21, 2022

Uh oh!

dimaqq commented May 30, 2025

Uh oh!

dimaqq commented May 30, 2025

Uh oh!

potatochip commented Aug 11, 2021 •

edited

Loading

rhymes commented Dec 14, 2022 •

edited

Loading