CUDA & CPU versions of llama-cpp-python with server configuration and multiple model support.
This is a build of llama-cpp-python. The tag indicates the version and the supported processor. For example:

- llama-cpp-python:v0.2.77-cuda is llama-cpp-python version 0.2.77 built with CUDA support (~5GB image)
- llama-cpp-python:v0.2.77-cpu is llama-cpp-python version 0.2.77 built with CPU-only support (~1.8GB image)

The CUDA version can also run on CPUs, but the image is larger, so select the CPU version if you don't have an Nvidia graphics card / GPU.

The default port is 11434. You can change it by adding -e PORT=8000 or by setting the desired port in the server.config file (second example).
Running the CUDA image:
docker run -it -d -p 11434:11434 --gpus=all --cap-add SYS_RESOURCE -e USE_MLOCK=0 -e MODEL=/var/model/<model name> -v <local directory on host>:/var/model <image name>
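Running the CPU image can be sketched similarly; a minimal example, assuming the same defaults as above (the --gpus flag and the mlock override are not needed without a GPU):

```shell
docker run -it -d -p 11434:11434 --cap-add SYS_RESOURCE \
  -e MODEL=/var/model/<model name> \
  -v <local directory on host>:/var/model <image name>
```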
To provide a server config file use the CONFIG_FILE environment variable.
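As a sketch, a run that points CONFIG_FILE at a config file placed in the mounted directory might look like this (the file name server.config is illustrative; when a config file is used, MODEL is not required):

```shell
docker run -it -d -p 11434:11434 --gpus=all --cap-add SYS_RESOURCE \
  -e USE_MLOCK=0 \
  -e CONFIG_FILE=/var/model/server.config \
  -v <local directory on host>:/var/model <image name>
```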
Server Config
Server configuration uses a server.config file. The file name is arbitrary. Put this file in the folder that you mount into the container (i.e. the directory mounted at /var/model in the example above).
For more information on parameters for configuring the server, see the llama-cpp-python server documentation.

A server.config file can also configure/load multiple models.
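A sketch of a server.config that loads two models is shown below. This assumes the JSON config-file format of llama-cpp-python's server (host, port, and a models list); the model file names and aliases are hypothetical, so substitute the GGUF files you actually place in the mounted directory:

```json
{
  "host": "0.0.0.0",
  "port": 11434,
  "models": [
    {
      "model": "/var/model/llama-3-8b-instruct.Q4_K_M.gguf",
      "model_alias": "llama-3-8b",
      "n_gpu_layers": -1,
      "n_ctx": 4096
    },
    {
      "model": "/var/model/mistral-7b-instruct.Q4_K_M.gguf",
      "model_alias": "mistral-7b",
      "n_gpu_layers": 0,
      "n_ctx": 4096
    }
  ]
}
```

Clients then select a model by passing its alias in the model field of an OpenAI-compatible request.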