

start_http_server: start HTTPServer in main thread before handing off to daemon thread #102


Conversation

@rud commented Sep 27, 2016

This means that if you call start_http_server with an already-used port/addr combination, you can catch the OSError and try a different port. With this, it becomes possible to probe a range of ports to find one that is available.

The idea for this comes from https://github.com/korfuri/django-prometheus/blob/2b6eac500cc9bea402a45f04ca7b63189889785a/django_prometheus/exports.py#L77-L88, which makes it easy to let each uwsgi worker listen on its own port by automatically trying a whole range of ports and picking one that works.
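
For illustration, a minimal sketch of the retry pattern this change enables; the helper name and port range below are invented for the example and are not part of the PR:

```python
from prometheus_client import start_http_server

def start_on_first_free_port(start_port=8000, end_port=8010, addr=''):
    """Probe a range of ports and return the first one that could be bound."""
    for port in range(start_port, end_port):
        try:
            # With this PR the socket is bound in the calling thread, so a
            # port-in-use error surfaces here as OSError instead of being
            # swallowed inside the daemon thread.
            start_http_server(port, addr)
            return port
        except OSError:
            continue  # port already taken, try the next one
    raise RuntimeError('no free port found in range %d-%d' % (start_port, end_port - 1))
```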

@rud (Author) commented Sep 27, 2016

A thing worth discussing: it would be entirely possible to catch the OSError and just return a boolean to indicate whether starting the listener succeeded. Given that the previous behaviour was to silently not serve metrics for the current process (rather than halting with an exception, as this change introduces), a smaller change would be to add such a catch here, and existing setups with duplicate metrics ports would still fail silently.

Thoughts?

@brian-brazil (Contributor) commented:

There have also been requests to be able to stop the HTTP server, so it'd be best to consider that together with this change.

@rud (Author) commented Sep 27, 2016

Hi @brian-brazil,

As I see it, one option would be to return the handle of the daemon thread from start_http_server, or store it somewhere for future reference. Would that be a handy enough API?

Alternatively, the return value could be the thread handle on successful startup and None on failure. What do you think?
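
A hypothetical sketch of that return-value shape, built on Python 3's http.server and the exported MetricsHandler rather than the library's real internals (the function name is invented):

```python
import threading
from http.server import HTTPServer

from prometheus_client import MetricsHandler

def start_http_server_or_none(port, addr=''):
    """Return the daemon thread serving metrics, or None if the bind failed."""
    try:
        httpd = HTTPServer((addr, port), MetricsHandler)  # binds in the caller's thread
    except OSError:
        return None  # port/addr combination already in use
    thread = threading.Thread(target=httpd.serve_forever)
    thread.daemon = True
    thread.start()
    return thread  # a fuller API might also expose httpd so it can be shut down later
```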

@rud (Author) commented Sep 27, 2016

I see, you are probably referring to #76, but as I read that issue, the user ended up finding a different way to start/stop their listener?

@rud (Author) commented Sep 27, 2016

@brian-brazil FWIW, I've written code in my project that reuses prometheus_client.MetricsHandler to set up my own HTTP listener just the way I want it (with automatic probing of a number of ports until a free one is found).
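
Roughly what such a standalone listener might look like (a sketch, not the actual code from rud's project; the function name and port range are illustrative):

```python
import threading
from http.server import HTTPServer

from prometheus_client import MetricsHandler

def serve_metrics_on_free_port(ports=range(9100, 9110), addr=''):
    """Bind the first free port in the range and serve metrics from a daemon thread."""
    for port in ports:
        try:
            httpd = HTTPServer((addr, port), MetricsHandler)
        except OSError:
            continue  # already in use, keep probing
        thread = threading.Thread(target=httpd.serve_forever)
        thread.daemon = True
        thread.start()
        return httpd, port  # httpd.shutdown() can stop the listener later
    raise RuntimeError('no free port available for the metrics listener')
```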

Thank you for making prometheus_client.MetricsHandler reusable externally; I can only imagine how many variations people need for running their services.

Feel free to close this pull request if the design is not something you can use going forward.

@brian-brazil (Contributor) commented:

The PR as-is is a good idea; the question is more around how exceptions are handled. The Pythonic way is to just let them be thrown.

> (with automatic probing of a number of ports until a free one is found)

That smells a bit; your config management should be telling you what port number to use.

> Thank you for making prometheus_client.MetricsHandler reusable externally; I can only imagine how many variations people need for running their services.

That's the idea.

@rud (Author) commented Sep 27, 2016

I concur that the port probing is potentially problematic, but given that a uwsgi master process spawns a number of child processes at various times, it seems to make sense that they each pick a port in a known range where they make metrics available. I'm using the Telegraf daemon to connect to each of these ports and collect the available metrics. There is a simplicity to this that I like, and since, to the best of my knowledge, each uwsgi worker process should not care too much about its individual identity/number in the flock, it also makes sense that it cannot have a distinct static Prometheus listener port.

@rud (Author) commented Sep 27, 2016

So, any specific changes you'd like to see to this code at this time?

@brian-brazil (Contributor) commented:

Sounds like you should be reading #66; that's not a safe way to do multi-process.

@rud (Author) commented Sep 28, 2016

Thank you for your suggestion.

In #66 I see a way of having workers report their metrics up the process tree to the master process, but it is currently a work in progress. I do not see any discussion of safety properties that preclude individual workers from exposing their own metrics directly, but it may very well be too implicit for me to see. Is it related to the point in the startup process where workers are spawned, i.e. that some internal structure may or may not be correctly set up yet?

If this is veering off-topic I do apologise.

@brian-brazil (Contributor) commented:

The issue is that there might be state in a dead worker that you want to preserve past its demise, and that correctly collapsing per-process counters requires data from dead workers too.

@rud (Author) commented Sep 28, 2016

Agreed, that would indeed mean loss of recent state in this design. And since the uwsgi master process will all too happily kill -9 workers that are past their prime, data loss is guaranteed in that case if the collector does not sweep in soon enough.

To me that comes down to the trade-off between sending metrics somewhere external immediately while handling an incoming request (guaranteeing survivability of metrics, but adding latency), or buffering measurements within each worker process for a configurably short time and gathering the values in bulk every X seconds. Any in-memory queueing means the potential for data loss, but at much greater throughput. Adding additional communication complexity to the master/worker hierarchy, as #66 does, seems like it might be a good trade-off, but things do become more difficult to reason about.

I'll close this pull request now, as I think I have a grasp of the trade-offs I'm making, and I think you are right that the solution provided here is not generally applicable.

@rud closed this Sep 28, 2016
@rud deleted the feature/start-listener-in-main-thread-to-allow-retry branch January 14, 2024 20:07