# Next steps
# --------------
#
# Before putting the server into production, we need to solve two issues:
#
# - Requests are served one at a time, which is much slower than a local batch prediction
# - Large numbers of concurrent requests can cause CUDA out-of-memory errors on the GPU
#
# We can queue user requests into batches and schedule the prediction process.
# Follow the `service-streamer tutorial <https://github.com/ShannonAI/service-streamer/wiki/Vision-Recognition-Service-with-Flask-and-service-streamer>`_
# to solve these issues with a few lines of code.
#
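# Under the hood, this kind of batching boils down to a queue plus a worker
# thread that drains queued requests into mini-batches before calling the
# model. Below is a minimal standard-library sketch of that idea; it is not
# service-streamer's implementation, and ``batch_predict`` is a hypothetical
# stand-in for a real batched model call, with ``batch_size`` and
# ``max_latency`` parameters chosen here for illustration.

```python
import queue
import threading


def batch_predict(batch):
    # Hypothetical stand-in for a real model call that
    # processes a whole mini-batch in one forward pass.
    return [x * 2 for x in batch]


class MiniBatchWorker:
    """Collects queued requests into mini-batches before calling the model."""

    def __init__(self, predict_fn, batch_size=8, max_latency=0.1):
        self.predict_fn = predict_fn
        self.batch_size = batch_size
        self.max_latency = max_latency
        self.requests = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def predict(self, item):
        # Called from each web request handler; blocks until the
        # background worker has produced this request's result.
        slot = {"input": item, "done": threading.Event()}
        self.requests.put(slot)
        slot["done"].wait()
        return slot["output"]

    def _loop(self):
        while True:
            # Block for the first request, then wait at most
            # max_latency for more, up to batch_size items.
            batch = [self.requests.get()]
            try:
                while len(batch) < self.batch_size:
                    batch.append(self.requests.get(timeout=self.max_latency))
            except queue.Empty:
                pass
            outputs = self.predict_fn([s["input"] for s in batch])
            for slot, out in zip(batch, outputs):
                slot["output"] = out
                slot["done"].set()


worker = MiniBatchWorker(batch_predict, batch_size=4, max_latency=0.05)
print(worker.predict(21))  # → 42
```

# Concurrent request handlers all call ``worker.predict``; whichever requests
# arrive within the same ``max_latency`` window share one model call, which is
# what improves GPU utilization.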
# .. note ::
#     `service-streamer <https://github.com/ShannonAI/service-streamer>`_ is a middleware for
#     machine learning web services. Queued requests from users are sampled into mini-batches.
#     Service-streamer can significantly enhance the overall performance of the web server by
#     improving GPU utilization.
#
# The server we wrote is quite trivial and may not do everything
# you need for your production application. So, here are some things you
# can do to make it better: