# Web API and WebSocket for Large Language Models in C++
- Clone the repo and cd into it:

```sh
git clone https://github.com/monatis/llm-api.git && cd llm-api
```

- Install asio for the web API:

```sh
apt install libasio-dev
```

Note: You can also run scripts/install-dev.sh to install asio (and additionally websocat, in order to test the WebSocket endpoint from the terminal; see the sketch just below).
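Once the server is built and running (see the steps below), websocat gives you a quick way to poke the WebSocket endpoint from a second terminal. This is only a sketch: the default port 8080 comes from the options listed further down, but the URL path is an assumption, so adjust it to the route the server actually serves:

```sh
# Assumes the server is already running on the default port 8080.
# The root path is a guess; check the source for the actual WebSocket route.
websocat ws://localhost:8080/
# Type a prompt and press Enter; generated text should stream back.
```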
- Build with cmake and make:

```sh
mkdir build && cd build
cmake -DLLM_NATIVE=ON ..
make -j4
```

Find the executable in ./bin/llm-api.
- Download the gpt4all-j model if you haven't already:

```sh
wget https://gpt4all.io/models/ggml-gpt4all-j.bin -O ./bin/ggml-gpt4all-j.bin
```

- Run the executable:

```sh
./bin/llm-api
```

Note: You can pass the model path with the -m argument if it's located elsewhere. See below for more options.
```
./bin/llm-api -h
usage: ./bin/llm-api [options]

options:
  -h, --help            show this help message and exit
  -v, --verbose         log generation in stdout (default: disabled)
  -s SEED, --seed SEED  RNG seed (default: -1)
  -t N, --threads N     number of threads to use during computation (default: 4)
  --port PORT           port to listen on (default: 8080)
  -p PROMPT, --prompt PROMPT
                        prompt to start generation with (default: random)
  -n N, --n_predict N   number of tokens to predict (default: 200)
  --top_k N             top-k sampling (default: 40)
  --top_p N             top-p sampling (default: 0.9)
  --temp N              temperature (default: 0.9)
  -b N, --batch_size N  batch size for prompt processing (default: 8)
  -m FNAME, --model FNAME
                        model path (default: ggml-gpt4all-j.bin)
```
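For example, to serve a model from a non-default path on another port with your own sampling settings (all of these flags come from the help output above):

```sh
./bin/llm-api -m ./models/ggml-gpt4all-j.bin --port 9000 -t 8 --temp 0.7 -n 256
```

Once the server is listening, you can exercise the HTTP API, e.g. with curl. The endpoint path and JSON body here are assumptions for illustration only; check the source for the actual route and request schema:

```sh
# Hypothetical endpoint and payload -- verify against the routes defined in the source.
# Assumes the server is running on the default port 8080.
curl -X POST http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, world!"}'
```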
## Roadmap

- Improve the multi-user experience.
- Integrate the StableLM model.
- Add an embedding endpoint.
- Provide a chain mechanism.
- Integrate a chat UI.
- Add Docker support.
- Extend the readme and docs.