Documentation - API Reference - Changelog - Bug reports - Discord
⚠️ Cortex.cpp is currently under development. This documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.
Cortex.cpp is a local AI engine for running and customizing LLMs. Cortex can be deployed as a standalone server or integrated into apps like Jan.ai.
Cortex.cpp is multi-engine: it uses llama.cpp as the default engine and also supports ONNX and TensorRT-LLM.
To install Cortex.cpp, download the installer for your operating system from the following options:
Stable (Recommended), Beta Build, and Nightly Build installers are provided for Windows, MacOS, and Linux.
Note: You can also build Cortex.cpp from source by following the steps here.
To run and chat with a model in Cortex.cpp:
```bash
# Start the Cortex.cpp server
cortex

# Start a model
cortex run <model_id>:[engine_name]
```
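For example, with one of the models listed in the tables below, a first session might look like this (download size and chat output will vary):

```bash
# Start the Cortex.cpp server
cortex

# Download and start a 7B model from Cortex Hub, then chat with it
cortex run mistral:7b-gguf
```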
Cortex.cpp supports a list of models available on Cortex Hub.
Here are examples of models that you can use with each supported engine:

llama.cpp (GGUF) models:

Model ID | Variant (Branch) | Model size | CLI command |
---|---|---|---|
codestral | 22b-gguf | 22B | cortex run codestral:22b-gguf |
command-r | 35b-gguf | 35B | cortex run command-r:35b-gguf |
gemma | 7b-gguf | 7B | cortex run gemma:7b-gguf |
llama3 | gguf | 8B | cortex run llama3:gguf |
llama3.1 | gguf | 8B | cortex run llama3.1:gguf |
mistral | 7b-gguf | 7B | cortex run mistral:7b-gguf |
mixtral | 7x8b-gguf | 46.7B | cortex run mixtral:7x8b-gguf |
openhermes-2.5 | 7b-gguf | 7B | cortex run openhermes-2.5:7b-gguf |
phi3 | medium-gguf | 14B - 4k ctx len | cortex run phi3:medium-gguf |
phi3 | mini-gguf | 3.82B - 4k ctx len | cortex run phi3:mini-gguf |
qwen2 | 7b-gguf | 7B | cortex run qwen2:7b-gguf |
tinyllama | 1b-gguf | 1.1B | cortex run tinyllama:1b-gguf |

ONNX models:

Model ID | Variant (Branch) | Model size | CLI command |
---|---|---|---|
gemma | 7b-onnx | 7B | cortex run gemma:7b-onnx |
llama3 | onnx | 8B | cortex run llama3:onnx |
mistral | 7b-onnx | 7B | cortex run mistral:7b-onnx |
openhermes-2.5 | 7b-onnx | 7B | cortex run openhermes-2.5:7b-onnx |
phi3 | mini-onnx | 3.82B - 4k ctx len | cortex run phi3:mini-onnx |
phi3 | medium-onnx | 14B - 4k ctx len | cortex run phi3:medium-onnx |

TensorRT-LLM models:

Model ID | Variant (Branch) | Model size | CLI command |
---|---|---|---|
llama3 | 8b-tensorrt-llm-windows-ampere | 8B | cortex run llama3:8b-tensorrt-llm-windows-ampere |
llama3 | 8b-tensorrt-llm-linux-ampere | 8B | cortex run llama3:8b-tensorrt-llm-linux-ampere |
llama3 | 8b-tensorrt-llm-linux-ada | 8B | cortex run llama3:8b-tensorrt-llm-linux-ada |
llama3 | 8b-tensorrt-llm-windows-ada | 8B | cortex run llama3:8b-tensorrt-llm-windows-ada |
mistral | 7b-tensorrt-llm-linux-ampere | 7B | cortex run mistral:7b-tensorrt-llm-linux-ampere |
mistral | 7b-tensorrt-llm-windows-ampere | 7B | cortex run mistral:7b-tensorrt-llm-windows-ampere |
mistral | 7b-tensorrt-llm-linux-ada | 7B | cortex run mistral:7b-tensorrt-llm-linux-ada |
mistral | 7b-tensorrt-llm-windows-ada | 7B | cortex run mistral:7b-tensorrt-llm-windows-ada |
openhermes-2.5 | 7b-tensorrt-llm-windows-ampere | 7B | cortex run openhermes-2.5:7b-tensorrt-llm-windows-ampere |
openhermes-2.5 | 7b-tensorrt-llm-windows-ada | 7B | cortex run openhermes-2.5:7b-tensorrt-llm-windows-ada |
openhermes-2.5 | 7b-tensorrt-llm-linux-ada | 7B | cortex run openhermes-2.5:7b-tensorrt-llm-linux-ada |
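The TensorRT-LLM variants run on the TensorRT-LLM engine, which may need to be installed before use. A sketch using the engines install command from the CLI section below, assuming the engine is registered under the name tensorrt-llm:

```bash
# Install the TensorRT-LLM engine (engine name assumed here to be "tensorrt-llm")
cortex engines install tensorrt-llm

# Run one of the TensorRT-LLM variants from the table above
cortex run llama3:8b-tensorrt-llm-linux-ampere
```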
Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB for the 14B models, and 32 GB for the larger 30B+ models.
Command Description | Command Example |
---|---|
Start Cortex.cpp Server | cortex |
Chat with a Model | cortex chat [options] [model_id] [message] |
Embeddings | cortex embeddings [options] [model_id] [message] |
Pull a Model | cortex pull <model_id> |
Download and Start a Model | cortex run [options] [model_id]:[engine] |
Get Model Details | cortex models get <model_id> |
List Models | cortex models list [options] |
Delete a Model | cortex models delete <model_id> |
Start a Model | cortex models start [model_id] |
Stop a Model | cortex models stop <model_id> |
Update a Model | cortex models update [options] <model_id> |
Get Engine Details | cortex engines get <engine_name> |
Install an Engine | cortex engines install <engine_name> [options] |
List Engines | cortex engines list [options] |
Uninstall an Engine | cortex engines uninstall <engine_name> [options] |
Show Model Information | cortex ps |
Update Cortex.cpp | cortex update [options] |
Note: For more detailed CLI reference documentation, please see here.
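As a rough illustration of how these commands fit together, the sketch below downloads, runs, inspects, and stops one of the small models from the tables above; the exact console output and prompts will vary.

```bash
# Start the Cortex.cpp server
cortex

# Download and start a small model, then chat with it interactively
cortex run tinyllama:1b-gguf

# In another terminal: show information about running models
cortex ps

# Stop the model when finished
cortex models stop tinyllama:1b-gguf
```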
Cortex.cpp has a REST API that runs at `localhost:3928`.
Pull a model:

```bash
curl --request POST \
  --url http://localhost:3928/v1/models/{model_id}/pull
```
Start a model:

```bash
curl --request POST \
  --url http://localhost:3928/v1/models/{model_id}/start \
  --header 'Content-Type: application/json' \
  --data '{
    "prompt_template": "system\n{system_message}\nuser\n{prompt}\nassistant",
    "stop": [],
    "ngl": 4096,
    "ctx_len": 4096,
    "cpu_threads": 10,
    "n_batch": 2048,
    "caching_enabled": true,
    "grp_attn_n": 1,
    "grp_attn_w": 512,
    "mlock": false,
    "flash_attn": true,
    "cache_type": "f16",
    "use_mmap": true,
    "engine": "llamacpp"
  }'
```
Chat with the model:

```bash
curl http://localhost:3928/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral",
    "messages": [
      {
        "role": "user",
        "content": "Hello"
      }
    ],
    "stream": true,
    "max_tokens": 128,
    "stop": [],
    "frequency_penalty": 1,
    "presence_penalty": 1,
    "temperature": 1,
    "top_p": 1
  }'
```
Stop the model:

```bash
curl --request POST \
  --url http://localhost:3928/v1/models/mistral/stop
```
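Putting these endpoints together, a scripted session might look like the following sketch. It assumes the server is already running on localhost:3928, uses mistral as the example model, and passes only the engine field when starting the model, on the assumption that the remaining settings fall back to the model's defaults.

```bash
#!/usr/bin/env bash
# Sketch: pull, start, query, and stop a model via the Cortex.cpp REST API.
MODEL="mistral"
BASE="http://localhost:3928/v1"

# Download the model
curl --request POST --url "$BASE/models/$MODEL/pull"

# Start the model (only the engine is specified; other settings are assumed to default)
curl --request POST --url "$BASE/models/$MODEL/start" \
  --header 'Content-Type: application/json' \
  --data '{ "engine": "llamacpp" }'

# Send a single chat message
curl "$BASE/chat/completions" \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"$MODEL\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}]}"

# Stop the model
curl --request POST --url "$BASE/models/$MODEL/stop"
```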
Note: Check our API documentation for a full list of available endpoints.
On Windows:

- Clone the Cortex.cpp repository here.
- Navigate to the `engine/vcpkg` folder.
- Configure vcpkg:

```
cd vcpkg
./bootstrap-vcpkg.bat
vcpkg install
```

- Build Cortex.cpp inside the `build` folder:

```
mkdir build
cd build
cmake .. -DBUILD_SHARED_LIBS=OFF -DCMAKE_TOOLCHAIN_FILE=path_to_vcpkg_folder/vcpkg/scripts/buildsystems/vcpkg.cmake -DVCPKG_TARGET_TRIPLET=x64-windows-static
```

- Use Visual Studio with the C++ development kit to build the project using the files generated in the `build` folder.
- Verify that Cortex.cpp is installed correctly by getting help information:

```
# Get the help information
cortex -h
```
On macOS and Linux:

- Clone the Cortex.cpp repository here.
- Navigate to the `engine/vcpkg` folder.
- Configure vcpkg:

```bash
cd vcpkg
./bootstrap-vcpkg.sh
vcpkg install
```

- Build Cortex.cpp inside the `build` folder:

```bash
mkdir build
cd build
cmake .. -DCMAKE_TOOLCHAIN_FILE=path_to_vcpkg_folder/vcpkg/scripts/buildsystems/vcpkg.cmake
make -j4
```

- Verify that Cortex.cpp is installed correctly by getting help information:

```bash
# Get the help information
cortex -h
```
- For support, please file a GitHub ticket.
- For questions, join our Discord here.
- For long-form inquiries, please email [email protected].