Thanks to visit codestin.com
Credit goes to github.com

Skip to content
This repository was archived by the owner on Jun 3, 2025. It is now read-only.

[server] Update server routes to be compliant with MLServer#1237

Merged
dsikka merged 16 commits into
mainfrom
match_mlserver
Oct 11, 2023
Merged

[server] Update server routes to be compliant with MLServer#1237
dsikka merged 16 commits into
mainfrom
match_mlserver

Conversation

@dsikka

@dsikka dsikka commented Sep 11, 2023

Copy link
Copy Markdown
Contributor

Summary:

Testing

  • Updated all the server tests
  • Tested locally using custom routes as well as routes built for the user given the model name

Sample Config:

num_cores: 2
num_workers: 2
endpoints:
  - task: question_answering
    model: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni
    route: some_route/pruned

This will now produce the following endpoints:
Screenshot 2023-10-10 at 11 59 39 AM

@dsikka dsikka marked this pull request as ready for review September 11, 2023 21:10
Comment thread src/deepsparse/server/server.py Outdated
Comment thread src/deepsparse/server/server.py Outdated

@bfineran bfineran left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall - as discussed offline, agree with both @dsikka and @Satrat it's time we move route creation to a separate file / fleshed out system

Comment thread src/deepsparse/server/server.py
dbogunowicz
dbogunowicz previously approved these changes Sep 13, 2023

@dbogunowicz dbogunowicz left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, clean

bfineran
bfineran previously approved these changes Oct 6, 2023

@bfineran bfineran left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - as mentioned before we need to sync w/ QA before landing

@mgoin mgoin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it would cause breaking changes to any application built using the old endpoint structure. I think if you could make a README or document showing how to transition current examples of usage, that would be helpful for other teams and users to have a summary of the changes.
For instance, how should we update this Digital Ocean getting started guide? https://marketplace.digitalocean.com/apps/deepsparse-inference-runtime

* refactor server for different integrations; additional functionality for chat completion streaming and non streaming

* further refactor server

* add support such that openai can host multiple models

* update all tests

* fix output for n > 1

* add inline comment explaining ProxyPipeline

* [server] Update OpenAI Model Support (#1300)

* update server

* allow users to send requests with new models

* use v1; move around baseroutes

* add openai path

* PR comments

* clean-up output classes to be dataclasses, add docstrings, cleanup generation kwargs
Comment thread src/deepsparse/server/cli.py Outdated
Comment thread src/deepsparse/server/openai_server.py
Comment thread src/deepsparse/server/output.py
@dsikka dsikka requested review from Satrat, bfineran and mgoin October 10, 2023 15:57
bfineran
bfineran previously approved these changes Oct 10, 2023
Satrat
Satrat previously approved these changes Oct 10, 2023
@dsikka dsikka dismissed stale reviews from Satrat and bfineran via c550799 October 10, 2023 19:11
@dsikka dsikka requested a review from bfineran October 10, 2023 19:36
@dsikka dsikka merged commit 639e9e4 into main Oct 11, 2023
@dsikka dsikka deleted the match_mlserver branch October 11, 2023 12:33
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants