Releases · KolosalAI/Kolosal
v0.1.9.1
v0.1.9
What's Changed
- Fixed server concurrency crash
- Reworked the font system
- Added emoji support
- Automatic DPI scaling for fonts
- Font scaling with scroll
- Added text selection
- Multi-model UX fix
- Switched to DirectX 10
Full Changelog: v0.1.8...v0.1.9
v0.1.8
- Improved memory management in the model manager
- Multi-model deployment
- Smart resource approximator for model memory and KV cache allocation (see the estimation sketch after this list)
- Added a completions endpoint to the server (see the example request after this list)
- Fixed URL opening from Markdown
- Added a status bar footer
- Added a system prompt editor
- Fixed UI layout issues
- Added auto-scroll
- Added an easier way to add your own models
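The resource approximator estimates how much memory a model and its KV cache will need before loading. Kolosal's exact heuristic is not documented in these notes; the sketch below is the standard back-of-envelope estimate for a llama-style transformer, and every parameter name in it is illustrative rather than taken from Kolosal's code.

```python
def estimate_kv_cache_bytes(n_layers: int, n_ctx: int, n_kv_heads: int,
                            head_dim: int, bytes_per_elem: int = 2) -> int:
    """Rough KV-cache size for a llama-style transformer.

    Keys and values are each stored per layer, per context position,
    per KV head; bytes_per_elem is 2 for an fp16 cache, 1 for 8-bit.
    """
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem


# Example: a 7B-class model (32 layers, 32 KV heads, head_dim 128)
# at a 4096-token context needs about 2 GiB of fp16 KV cache on top of the weights.
print(estimate_kv_cache_bytes(32, 4096, 32, 128) / 2**30, "GiB")
```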
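A minimal way to try the new completions endpoint from a script. The host, port, route, and payload fields below are assumptions based on the common OpenAI-compatible layout, not confirmed Kolosal defaults; check the server tab for the actual address and request schema.

```python
import requests

# Assumed host, port, and route; adjust to whatever the Kolosal server exposes.
resp = requests.post(
    "http://localhost:8080/v1/completions",
    json={
        "model": "my-model",  # placeholder model id
        "prompt": "Write a haiku about local inference.",
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```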
v0.1.7
What's Changed
- Fixed the installer to perform a clean install, avoiding bugs caused by leftover files
- Fixed application and server crashes on large prompts
- Added control over the maximum number of tokens processed per iteration frame
- Fixed chat names not accepting certain symbols
- Allowed renaming of duplicate chat names
- Fixed a crash when pasting long text into the system prompt
- Acrylic background
- Refactored the AI model config
- Added a downloaded models section
- Sorted the model list alphabetically
- Added search in the model manager
- Gemma 3 support!
Full Changelog: v0.1.6...v0.1.7
v0.1.6
- Introduced Kolosal AI Server, an easily managed server built into the Kolosal AI application
- Added Phi-4 and Phi-4 Mini models
- Added a continuous batching mechanism for decoding (see the scheduling sketch after this list)
- Added a KV cache management mechanism for batch decoding
- Added model loading settings within the server tabs
- Added tab management system
- Added automatic title generation for each chat history
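Continuous batching interleaves decode steps from several active requests instead of finishing one request before admitting the next, so short generations are not stuck behind long ones. The loop below is a generic illustration of the idea, not Kolosal's implementation; `decode_step` and the batch size are hypothetical.

```python
from collections import deque

def continuous_batching_loop(pending: deque, decode_step, max_batch: int = 8):
    """Generic continuous-batching decode loop (illustration only).

    Each iteration decodes one token for every active request, admits new
    requests as slots free up, and retires requests that have finished.
    """
    active = []
    while pending or active:
        # Admit new requests while there is batch capacity.
        while pending and len(active) < max_batch:
            active.append(pending.popleft())

        # One decode step advances every active request by one token;
        # decode_step (hypothetical) returns the requests that just finished.
        finished = decode_step(active)
        active = [r for r in active if r not in finished]
```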
v0.1.5
What's Changed
- Context shifting with StreamingLLM (https://arxiv.org/abs/2309.17453) for unlimited generation (see the sketch below)
- Limited max context to 4096 to make it more memory-efficient and faster
- Added stop generation
- Added regenerate button
- Redesigned the progress bar
- Model loading is now handled asynchronously
- Added unload model button
- Huge refactor
- Fixed code block rendering glitches
- Setting max new tokens to 0 now results in unlimited generation with context shifting
- Fixed an application crash when deleting a chat
Full Changelog: v0.1.4.1...v0.1.5
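StreamingLLM keeps the first few "attention sink" tokens plus a rolling window of the most recent tokens and evicts everything in between, which is what lets generation continue past the context limit. A minimal sketch of that eviction rule follows; the sink and window sizes are illustrative, not Kolosal's defaults.

```python
def shift_context(cached_tokens: list, n_sink: int = 4, n_window: int = 4092):
    """StreamingLLM-style eviction (https://arxiv.org/abs/2309.17453):
    keep the first n_sink attention-sink tokens and the most recent
    n_window tokens, dropping the middle of the cache. Sizes are illustrative."""
    if len(cached_tokens) <= n_sink + n_window:
        return cached_tokens
    return cached_tokens[:n_sink] + cached_tokens[-n_window:]
```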
v0.1.4
- Added DeepSeek R1 support
- Added Markdown rendering
- Added a tokens-per-second (TPS) stat
- Added a cancel download button
- Added a delete model button
- Fixed a model duplication issue
- Fixed an engine memory leak
- Added thinking UI
- Added automatic detection of the number of threads to use
- Fixed the last selected model issue
- Added a fallback when model loading fails
v0.1.3
New feature
- Added a persistent KV cache: the model's KV cache state is saved for each chat history, so re-processing a previous chat is instant (see the sketch below).
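The idea is that each chat's serialized KV-cache state is written to disk and restored when the chat is reopened, so the earlier conversation never has to be prefilled again. A rough sketch of that save/load flow follows; the directory layout and function names are assumptions, not Kolosal's actual format.

```python
import os

CACHE_DIR = "kv_cache"  # hypothetical location for per-chat cache files

def save_chat_cache(chat_id: str, kv_state: bytes) -> None:
    """Persist the serialized KV-cache state for one chat."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(os.path.join(CACHE_DIR, f"{chat_id}.bin"), "wb") as f:
        f.write(kv_state)

def load_chat_cache(chat_id: str) -> bytes | None:
    """Restore a chat's KV-cache state, or None if it was never saved."""
    path = os.path.join(CACHE_DIR, f"{chat_id}.bin")
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return f.read()
```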
Bug fixes
- Fixed model parameters not being passed to the model correctly
- Fixed a crash when deleting a chat
- Fixed switching models causing generation to appear in a different chat
- Fixed AMD GPUs not being detected
- Fixed EOS not being detected on fine-tuned models using the ChatML format
- Fixed the force close in chat feature
- Fixed a GPU performance issue
New models
- Qwen 2.5 Coder 0.5B to 14B
- Qwen 2.5 14B
v0.1.2
What's Changed
- Fixed GPU support: NVIDIA/AMD GPUs in your device are now detected and selected automatically
- Added clear chat and delete chat buttons
- Fixed application shortcut (removed the Fn + Left Arrow shortcut to open Kolosal)
- Added Qwen 2.5 models (0.5B to 7B)
Full Changelog: v0.1.0...v0.1.2
v0.1.1
What's Changed
- Added Windows Installer
- Added Sahabat AI Llama 3 8B
- Added Sahabat AI Gemma 2 9B
- Added Gemma 2 2B
- Added Gemma 2 9B
- Added Llama 3.1 8B
- Added 8bit quantization support
- Updated the quantization selection UI to use radio buttons
Full Changelog: v0.1...v0.1.1