CUDA & CPU versions of llama-cpp-python with server configuration and multiple model support.
This is a build of llama-cpp-python. The tag indicates the version and the supported processor. For example:

- llama-cpp-python:v0.2.77-cuda is llama-cpp-python version 0.2.77 built with CUDA support (~5GB image)
- llama-cpp-python:v0.2.77-cpu is llama-cpp-python version 0.2.77 built with CPU-only support (~1.8GB image)

The CUDA version can also run on CPUs, but the image is larger, so select the CPU version if you don't have an Nvidia graphics card / GPU.

The default port is 11434. You can change it by adding -e PORT=8000 or by setting the desired port in the server.config file (second example).
Running the CUDA image:
docker run -it -d -p 11434:11434 --gpus=all --cap-add SYS_RESOURCE -e USE_MLOCK=0 -e MODEL=/var/model/<model name> -v <local directory on host>:/var/model <image name>
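Running the CPU image can be sketched similarly; a minimal example, assuming the same defaults as above (the --gpus flag and the mlock override are not needed without a GPU):

```shell
docker run -it -d -p 11434:11434 --cap-add SYS_RESOURCE \
  -e MODEL=/var/model/<model name> \
  -v <local directory on host>:/var/model <image name>
```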
To provide a server config file use the CONFIG_FILE environment variable.
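As a sketch, a run that points CONFIG_FILE at a config file placed in the mounted directory might look like this (the file name server.config is illustrative; when a config file is used, MODEL is not required):

```shell
docker run -it -d -p 11434:11434 --gpus=all --cap-add SYS_RESOURCE \
  -e USE_MLOCK=0 \
  -e CONFIG_FILE=/var/model/server.config \
  -v <local directory on host>:/var/model <image name>
```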
Server Config
Server configuration uses a server.config file. The file name is arbitrary. Put this file in the folder that you mount into the container (i.e. the directory mounted at /var/model in the example above).
For more information on parameters for configuring the server, see the llama-cpp-python server documentation.

A server.config file can also configure/load multiple models.
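A sketch of a server.config that loads two models is shown below. This assumes the JSON config-file format of llama-cpp-python's server (host, port, and a models list); the model file names and aliases are hypothetical, so substitute the GGUF files you actually place in the mounted directory:

```json
{
  "host": "0.0.0.0",
  "port": 11434,
  "models": [
    {
      "model": "/var/model/llama-3-8b-instruct.Q4_K_M.gguf",
      "model_alias": "llama-3-8b",
      "n_gpu_layers": -1,
      "n_ctx": 4096
    },
    {
      "model": "/var/model/mistral-7b-instruct.Q4_K_M.gguf",
      "model_alias": "mistral-7b",
      "n_gpu_layers": 0,
      "n_ctx": 4096
    }
  ]
}
```

Clients then select a model by passing its alias in the model field of an OpenAI-compatible request.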