apps

Apps examples with GenerationExecutor / LLM API

Python chat

chat.py provides a small examples to play around with your model. Before running, install additional requirements with pip install -r ./requirements.txt. Then you can run it with

python3 ./chat.py --model <model_dir> --tokenizer <tokenizer_path> --tp_size <tp_size>

Please run python3 ./chat.py --help for more information on the arguments.

Note that, the model_dir could accept the following formats:

A path to a built TRT-LLM engine
A path to a local HuggingFace model
The name of a HuggingFace model such as "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

FastAPI server

NOTE: This FastAPI-based server is only an example for demonstrating the usage of TensorRT LLM LLM API. It is not intended for production use. For production, use the trtllm-serve command. The server exposes OpenAI compatible API endpoints.

Install the additional requirements

pip install -r ./requirements.txt

Start the server

Start the server with:

python3 ./fastapi_server.py <model_dir>&

Note that, the model_dir could accept same formats as in the chat example. If you are using an engine build with trtllm-build, remember to pass the tokenizer path:

python3 ./fastapi_server.py <model_dir> --tokenizer <tokenizer_dir>&

To get more information on all the arguments, please run python3 ./fastapi_server.py --help.

Send requests

You can pass request arguments like "max_tokens", "top_p", "top_k" in your JSON dict:

curl http://localhost:8000/generate -d '{"prompt": "In this example,", "max_tokens": 8}'

You can also use the streaming interface with:

curl http://localhost:8000/generate -d '{"prompt": "In this example,", "max_tokens": 8, "streaming": true}'

Name		Name	Last commit message	Last commit date
parent directory ..
README.md		README.md
chat.py		chat.py
fastapi_server.py		fastapi_server.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Apps examples with GenerationExecutor / LLM API

Python chat

FastAPI server

Install the additional requirements

Start the server

Send requests

FilesExpand file tree

apps

Directory actions

More options

Directory actions

More options

Latest commit

History

apps

Folders and files

parent directory

README.md

Apps examples with GenerationExecutor / LLM API

Python chat

FastAPI server

Install the additional requirements

Start the server

Send requests