Knot is my client-server TUI for running LLMs locally.
- Conversations are automatically saved to a SQLite database (see the sketch after this list).
- Install models to run locally.
- Markdown rendering, tables, syntax highlighting, etc.
- Load .md files into chat context.
- Generate and download summaries of a given conversation.
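Knot's actual schema isn't documented in this README, but as a rough sketch of what saving a conversation turn to SQLite can look like (the DB path, table, and column names below are illustrative assumptions, not Knot's real internals):

```python
import os
import sqlite3

# Illustrative sketch only: the path, table, and column names are
# assumptions, not Knot's actual schema.
os.makedirs("convo", exist_ok=True)
conn = sqlite3.connect("convo/knot.db")  # hypothetical DB location
conn.execute(
    """CREATE TABLE IF NOT EXISTS messages (
        conversation_id TEXT,
        role TEXT,      -- 'user' or 'assistant'
        content TEXT,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )"""
)
conn.execute(
    "INSERT INTO messages (conversation_id, role, content) VALUES (?, ?, ?)",
    ("abc123", "user", "Hello, Knot!"),
)
conn.commit()
conn.close()
```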
See the TODO section at the bottom of the README for known errors and future improvements.
Knot consists of two components that run simultaneously: server.py, the inference server built on llama-cpp-python, and knot.py, the TUI client that renders the streaming response. A rough sketch of the inference loop follows the stack list below.
- Engine: llama-cpp-python (Python bindings for llama.cpp)
- UI: Rich and prompt_toolkit
- Storage: SQLite database
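Neither file is reproduced in this README, but the core inference loop is roughly this shape; a minimal sketch using llama-cpp-python's streaming chat API (the model path is a placeholder, and the server/client IPC layer is omitted):

```python
from llama_cpp import Llama

# Sketch of the inference side only, not Knot's actual server.py.
# n_gpu_layers=-1 offloads all layers to the GPU (Metal on Apple Silicon).
llm = Llama(model_path="models/your-model.gguf", n_gpu_layers=-1, n_ctx=4096)

messages = [{"role": "user", "content": "Hello!"}]
for chunk in llm.create_chat_completion(messages=messages, stream=True):
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        # knot.py would render these incremental tokens with Rich.
        print(delta["content"], end="", flush=True)
```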
To get started, follow the steps below.
Note: Currently optimized for Apple Silicon (M1/M2/M3) with Metal GPU acceleration.
Clone the repository and create your virtual environment:
```bash
# Create the project folder
mkdir knot
cd knot

# Create your venv
python3 -m venv knot
source knot/bin/activate
```

Note: If you don't have it already, this will create a folder titled `convo` in your project root, as well as a SQLite DB inside it for your conversation history.
Compile llama-cpp-python with Metal support to use the Mac's GPU:
```bash
CMAKE_ARGS="-DGGML_METAL=on" pip install llama-cpp-python
```

Note: If you are on Linux/Windows, you should be able to simply omit the `CMAKE_ARGS` part.
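To sanity-check that the Metal build is actually offloading to the GPU, one option is to load any GGUF model from Python with verbose logging and watch for Metal initialization lines (the model path is a placeholder):

```python
from llama_cpp import Llama

# verbose=True prints backend setup; on a working Metal build you should
# see Metal initialization messages and layers being offloaded to the GPU.
llm = Llama(model_path="models/your-model.gguf", n_gpu_layers=-1, verbose=True)
out = llm.create_completion("Say hi:", max_tokens=8)
print(out["choices"][0]["text"])
```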
Install the remaining UI and utility libraries:
```bash
pip install rich prompt_toolkit huggingface_hub
```

Knot is still a WIP, so there is plenty for me to fix, but it can be played around with now.
Note: By default, Phi 3 mini will be ready to download the first time you run Knot. It is under 2.5 GB in size, but you can enter details to load a different model instead.
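Knot drives this download itself; as an illustration of what huggingface_hub does under the hood, fetching a Phi 3 mini GGUF looks something like this (the repo and filename are examples, not necessarily the exact ones Knot uses):

```python
from huggingface_hub import hf_hub_download

# Example repo/quantization; Knot may pull a different one.
path = hf_hub_download(
    repo_id="microsoft/Phi-3-mini-4k-instruct-gguf",
    filename="Phi-3-mini-4k-instruct-q4.gguf",
    local_dir="models",
)
print("Saved to", path)
```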
Run the server in the background and the client in the foreground:
```bash
# In one terminal tab
python3 server.py

# In your second terminal tab
python3 knot.py
```

Note: It's cumbersome to cd into your directory, activate your venv, and run two Python files whenever you want to use Knot. I've created a custom command by adding an alias to my shell config file so that I can run `knot` from anywhere in my terminal to automatically activate the venv and launch the application. Example:
```bash
#!/bin/bash
cd /{YOUR_FILE_PATH}/knot || exit
source {YOUR_VENV_NAME}/bin/activate || exit

# Kill the background server when the client exits
cleanup() {
    kill $SERVER_PID 2>/dev/null
}
trap cleanup EXIT INT TERM

# Start the server in the background, logging output to server.log
python3 server.py > server.log 2>&1 &
SERVER_PID=$!
sleep 1

# Run the client in the foreground
python3 knot.py
```

Make this executable and alias it to `knot` in your shell config.
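For example, if the script is saved as `knot.sh` (the name is up to you), adding `alias knot="/{YOUR_FILE_PATH}/knot.sh"` to your `~/.zshrc` or `~/.bashrc` lets you launch everything with a single `knot` command.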
Type normally to chat, or start a line with `:` to enter a command. Quick overview:
| Command | Action |
|---|---|
| `:new` | Start a new conversation and clear the current context |
| `:history` | List past conversations |
| `:open <id>` | Open a conversation by its partial ID |
| `:delete <id>` | Delete a conversation permanently |
| `:load <file>` | Load a text/md file as context |
| `:summary` | Save a summary of this chat to Downloads |
| `:search <h/d/w> <term>` | Search conversation history (h), device (d), or web URLs (w) |
| `:ask <question>` | Web RAG search |
| `:job <cmd>` | Assign tasks to models (list, set summary, set title, set ask) |
| `:model <cmd>` | Manage active/downloaded models (add, select, list) |
| `:quit` | Exit Knot |
| `:cot <on/off>` | Toggle display of reasoning/thoughts |
| `:help` | View possible commands |
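As a rough illustration of the `:` convention described above (a hypothetical sketch, not Knot's actual source):

```python
# Hypothetical sketch of ':'-prefixed command dispatch; none of these
# handlers are Knot's real internals.
def handle_input(line: str) -> str:
    if not line.startswith(":"):
        return f"(sent to the model) {line}"
    cmd, _, args = line[1:].partition(" ")
    commands = {
        "new": lambda a: "started a new conversation",
        "open": lambda a: f"opening conversation matching id '{a}'",
        "quit": lambda a: "exiting Knot",
    }
    handler = commands.get(cmd)
    return handler(args) if handler else f"unknown command :{cmd} (try :help)"

print(handle_input("hello there"))  # plain text goes to the LLM
print(handle_input(":open 1a2b"))   # ':' lines run commands
```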
To set a model's job using the `:job` command, use `:job set <task> <model_ID>`. Currently, the two tasks a model can be assigned to are summary (i.e. the `:summary` command) and title (i.e. generating a title for the conversation). For example:

- `:job set title 1` ensures all titles are generated using the model with ID `1`.
- `:job set summary 2` ensures all conversations are summarized using the model with ID `2`.
Note: I would currently recommend using a non-CoT model for these jobs (see known errors).
- The `:summary` command sometimes doesn't work well for GPT OSS conversations due to CoT.
- Height gets fixed / standard terminal scrolling gets locked on some long answers. I think this is a limitation of Rich; need to look into it.
- Add ability to "branch" a new conversation from any previous message.
- Need to explore most expedient way to display maths/proofs, etc.
- Explore possibility of web search and/or search over local documents.
- Allow setting paths for models, the DB, summary exports, etc. in-app.
