Releases · KolosalAI/Kolosal
v0.1.9.1
v0.1.9
What's Changed
- Fixed server concurrency crash
- Reworked the font system
- Added emoji support
- Automatic DPI scaling for fonts
- Font scaling with scroll
- Added text selection
- Multi-model UX fix
- Switched to DirectX 10
Full Changelog: v0.1.8...v0.1.9
v0.1.8
- Improved memory management in the model manager
- Multi-model deployment
- Smart resource approximator for model memory and KV cache allocation (see the estimation sketch after this list)
- Added a completions endpoint to the server (see the example request after this list)
- Fixed URL opening from Markdown
- Added a status bar footer
- Added a system prompt editor
- Fixed UI layout issues
- Added auto-scroll
- Added an easier way to add your own models
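The resource approximator estimates how much memory a model and its KV cache will need before loading. Kolosal's exact heuristic is not documented in these notes; the sketch below is the standard back-of-envelope estimate for a llama-style transformer, and every parameter name in it is illustrative rather than taken from Kolosal's code.

```python
def estimate_kv_cache_bytes(n_layers: int, n_ctx: int, n_kv_heads: int,
                            head_dim: int, bytes_per_elem: int = 2) -> int:
    """Rough KV-cache size for a llama-style transformer.

    Keys and values are each stored per layer, per context position,
    per KV head; bytes_per_elem is 2 for an fp16 cache, 1 for 8-bit.
    """
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem


# Example: a 7B-class model (32 layers, 32 KV heads, head_dim 128)
# at a 4096-token context needs about 2 GiB of fp16 KV cache on top of the weights.
print(estimate_kv_cache_bytes(32, 4096, 32, 128) / 2**30, "GiB")
```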
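A minimal way to try the new completions endpoint from a script. The host, port, route, and payload fields below are assumptions based on the common OpenAI-compatible layout, not confirmed Kolosal defaults; check the server tab for the actual address and request schema.

```python
import requests

# Assumed host, port, and route; adjust to whatever the Kolosal server exposes.
resp = requests.post(
    "http://localhost:8080/v1/completions",
    json={
        "model": "my-model",  # placeholder model id
        "prompt": "Write a haiku about local inference.",
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```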
v0.1.7
What's Changed
- Fixed the installer to perform a clean install, avoiding bugs caused by leftover files
- Fixed application and server crashes on large prompts
- Added control over the maximum number of tokens processed per iteration frame
- Fixed chat names not accepting certain symbols
- Allowed renaming of duplicate chat names
- Fixed a crash when pasting long text into the system prompt
- Acrylic background
- Refactored the AI model config
- Added a downloaded models section
- Sorted the model list alphabetically
- Added search in the model manager
- Gemma 3 support!
Full Changelog: v0.1.6...v0.1.7
v0.1.6
- Introduced Kolosal AI Server, an easily managed server built into the Kolosal AI application
- Added Phi-4 and Phi-4 Mini models
- Added a continuous batching mechanism for decoding (see the scheduling sketch after this list)
- Added a KV cache management mechanism for batch decoding
- Added model loading settings within the server tabs
- Added tab management system
- Added automatic title generation for each chat history
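Continuous batching interleaves decode steps from several active requests instead of finishing one request before admitting the next, so short generations are not stuck behind long ones. The loop below is a generic illustration of the idea, not Kolosal's implementation; `decode_step` and the batch size are hypothetical.

```python
from collections import deque

def continuous_batching_loop(pending: deque, decode_step, max_batch: int = 8):
    """Generic continuous-batching decode loop (illustration only).

    Each iteration decodes one token for every active request, admits new
    requests as slots free up, and retires requests that have finished.
    """
    active = []
    while pending or active:
        # Admit new requests while there is batch capacity.
        while pending and len(active) < max_batch:
            active.append(pending.popleft())

        # One decode step advances every active request by one token;
        # decode_step (hypothetical) returns the requests that just finished.
        finished = decode_step(active)
        active = [r for r in active if r not in finished]
```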
v0.1.5
What's Changed
- Context shifting with StreamingLLM (https://arxiv.org/abs/2309.17453) for unlimited generation (see the sketch below)
- Limited max context to 4096 to make it more memory-efficient and faster
- Added stop generation
- Added regenerate button
- Redesigned the progress bar
- Model loading is now handled asynchronously
- Added unload model button
- Huge refactor
- Fixed code block rendering glitches
- Setting max new tokens to 0 now results in unlimited generation with context shifting
- Fixed an application crash when deleting a chat
Full Changelog: v0.1.4.1...v0.1.5
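StreamingLLM keeps the first few "attention sink" tokens plus a rolling window of the most recent tokens and evicts everything in between, which is what lets generation continue past the context limit. A minimal sketch of that eviction rule follows; the sink and window sizes are illustrative, not Kolosal's defaults.

```python
def shift_context(cached_tokens: list, n_sink: int = 4, n_window: int = 4092):
    """StreamingLLM-style eviction (https://arxiv.org/abs/2309.17453):
    keep the first n_sink attention-sink tokens and the most recent
    n_window tokens, dropping the middle of the cache. Sizes are illustrative."""
    if len(cached_tokens) <= n_sink + n_window:
        return cached_tokens
    return cached_tokens[:n_sink] + cached_tokens[-n_window:]
```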
v0.1.4
- Added DeepSeek R1 support
- Added Markdown rendering
- Added a tokens-per-second (TPS) stat
- Added a cancel download button
- Added a delete model button
- Fixed a model duplication issue
- Fixed an engine memory leak
- Added thinking UI
- Added automatic detection of the number of threads to use
- Fixed the last selected model issue
- Added a fallback when model loading fails
v0.1.3
New feature
- Added a persistent KV cache: the model's KV cache state is saved for each chat history, so re-processing a previous chat is instant (see the sketch below).
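The idea is that each chat's serialized KV-cache state is written to disk and restored when the chat is reopened, so the earlier conversation never has to be prefilled again. A rough sketch of that save/load flow follows; the directory layout and function names are assumptions, not Kolosal's actual format.

```python
import os

CACHE_DIR = "kv_cache"  # hypothetical location for per-chat cache files

def save_chat_cache(chat_id: str, kv_state: bytes) -> None:
    """Persist the serialized KV-cache state for one chat."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(os.path.join(CACHE_DIR, f"{chat_id}.bin"), "wb") as f:
        f.write(kv_state)

def load_chat_cache(chat_id: str) -> bytes | None:
    """Restore a chat's KV-cache state, or None if it was never saved."""
    path = os.path.join(CACHE_DIR, f"{chat_id}.bin")
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return f.read()
```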
Bug fixes
- Fixed model parameters not being passed to the model correctly
- Fixed a crash when deleting a chat
- Fixed switching models causing generation to appear in a different chat
- Fixed AMD GPUs not being detected
- Fixed EOS not being detected on fine-tuned models using the ChatML format
- Fixed the force close in chat feature
- Fixed a GPU performance issue
New models
- Qwen 2.5 Coder 0.5B to 14B
- Qwen 2.5 14B
v0.1.2
What's Changed
- Fixed GPU support: NVIDIA/AMD GPUs in your device are now detected and selected automatically
- Added clear chat and delete chat buttons
- Fixed application shortcut (removed the Fn + Left Arrow shortcut to open Kolosal)
- Added Qwen 2.5 models (0.5B to 7B)
Full Changelog: v0.1.0...v0.1.2
v0.1.1
What's Changed
- Added Windows Installer
- Added Sahabat AI Llama 3 8B
- Added Sahabat AI Gemma 2 9B
- Added Gemma 2 2B
- Added Gemma 2 9B
- Added Llama 3.1 8B
- Added 8bit quantization support
- Updated the quantization selection UI to use radio buttons
Full Changelog: v0.1...v0.1.1