
🪢 Knot

Knot is my client-server TUI for running LLMs locally.

  • Convos automatically saved to SQLite DB.
  • Install models to run locally.
  • Markdown rendering, tables, syntax highlighting, etc.
  • Load .md files into chat context.
  • Generate and download summaries of a given conversation.

See the TODO section at the bottom of the README for known errors and future improvements.


Details

Knot consists of two components that run simultaneously: server.py, the inference server built on llama-cpp-python, and knot.py, the TUI client that renders the streaming response.
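
For a feel for what the inference side does, here is a minimal, standalone sketch of llama-cpp-python's streaming chat API. The model path and prompt are placeholders, and this is not the actual server.py code:

from llama_cpp import Llama

# Load a GGUF model; n_gpu_layers=-1 offloads all layers to the GPU (Metal on Apple Silicon)
llm = Llama(model_path="models/your-model.gguf", n_gpu_layers=-1)

# Stream the reply chunk by chunk so the client can render tokens as they arrive
for chunk in llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
):
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        print(delta["content"], end="", flush=True)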



Installation

To get started, follow the steps below.

Note: Currently optimized for Apple Silicon (M1/M2/M3) with Metal GPU acceleration.


1. Clone & prepare

Clone the repository and create your virtual environment:

# Create the project folder
mkdir knot
cd knot

# Create your venv
python3 -m venv knot
source knot/bin/activate

Note: If it doesn't exist already, Knot will create a folder titled convo in your project root, along with a SQLite DB inside it for your conversation history.
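
If you ever want to poke at the saved history directly, the DB can be opened with Python's built-in sqlite3 module. A quick sketch, assuming the database file sits inside convo/ with a .db extension (the actual filename and schema are Knot's, so check the folder first):

import glob
import sqlite3

# Locate the conversation database inside the convo folder (exact filename may differ)
db_path = glob.glob("convo/*.db")[0]

with sqlite3.connect(db_path) as conn:
    # List the tables Knot uses for conversation history
    tables = conn.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall()
    print(tables)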


2. Install engine with Metal support

Compile llama-cpp-python with Metal support to use the Mac's GPU:

CMAKE_ARGS="-DGGML_METAL=on" pip install llama-cpp-python

Note: If you are on Linux/Windows, you should be able to omit the CMAKE_ARGS part and run pip install llama-cpp-python directly.


3. Install dependencies

Install the remaining UI and utility libraries:

pip install rich prompt_toolkit huggingface_hub
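
rich is what handles the Markdown rendering, tables, and syntax highlighting in the chat view. A quick standalone taste of what it does (not Knot's own rendering code):

from rich.console import Console
from rich.markdown import Markdown

console = Console()
# Rich renders Markdown (headings, tables, fenced code with syntax highlighting) in the terminal
console.print(Markdown("# Hello\n\n```python\nprint('hi')\n```"))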

4. Use the thing

Still a WIP, so there's lots for me to fix, but it can be played around with now.

Note: By default, Phi 3 mini (under ~2.5 GB) will be ready to download the first time you run Knot, but you can enter details to load a different model.
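
For reference, model downloads of this kind go through huggingface_hub; the repo and filename below are just an illustration of a Phi 3 mini GGUF quant, not necessarily the exact file Knot fetches:

from huggingface_hub import hf_hub_download

# Download a quantized Phi-3 mini GGUF into the local Hugging Face cache and return its path
model_path = hf_hub_download(
    repo_id="microsoft/Phi-3-mini-4k-instruct-gguf",
    filename="Phi-3-mini-4k-instruct-q4.gguf",
)
print(model_path)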

Run the server in the background and the client in the foreground:

# In one terminal tab
python3 server.py

# In your second terminal tab
python3 knot.py

Note: It's cumbersome to cd into your directory, activate your venv, and run two Python files whenever you want to use Knot. I've created a custom command by adding an alias to my shell config file, so that running knot from anywhere in my terminal automatically activates the venv and launches the application. Example:

#!/bin/bash

# Jump to the project and activate the venv
cd /{YOUR_FILE_PATH}/knot || exit
source {YOUR_VENV_NAME}/bin/activate || exit

# Kill the background server when the client exits
cleanup() {
    kill $SERVER_PID 2>/dev/null
}
trap cleanup EXIT INT TERM

# Start the server in the background, then launch the TUI client
python3 server.py > server.log 2>&1 &
SERVER_PID=$!
sleep 1
python3 knot.py

Make this executable and alias it to knot in your shell config.


Command reference

Type normally to chat or start a line with : to enter a command. Quick overview:

Command                   Action
:new                      Start a new conversation and clear the current context
:history                  List past conversations
:open <id>                Open a conversation by its partial ID
:delete <id>              Delete a conversation permanently
:load <file>              Load a text/md file as context
:summary                  Save a summary of this chat to Downloads
:search <h/d/w> <term>    Search conversation history (h), device (d), or web URLs (w)
:ask <question>           Web RAG search
:job <cmd>                Assign tasks to models (list, set summary, set title, set ask)
:model <cmd>              Manage active/downloaded models (add, select, list)
:quit                     Exit Knot
:cot <on/off>             Toggle display of reasoning/thoughts
:help                     View possible commands

To set a model's job using the :job command, use :job set <task> <model_ID>. Currently, the two tasks you can assign a model to are summary (i.e. the :summary command) and title (i.e. generating a title for the conversation). For example:

  • :job set title 1 ensures all titles are generated using the model with ID 1.
  • :job set summary 2 ensures all conversations are summarized using the model with ID 2.

Note: I would currently recommend using a non-CoT model for these jobs (see known errors).


TODO:

Known errors

  • :summary command sometimes doesn't work well for GPT OSS conversations due to CoT.
  • Height gets fixed / standard terminal scrolling gets locked on some long answers. I think this is a limitation of Rich; need to look into it.

Future improvements

  • Add ability to "branch" a new conversation from any previous message.
  • Need to explore the most expedient way to display maths/proofs, etc.
  • Explore possibility of web search and/or search over local documents.
  • Set path for accessing models, DB, summary export, etc. in app.
