Releases: thushan/olla
olla-v0.0.20
This release brings back llamacpp integration and adds experimental Anthropic Messages support (disabled by default) at /olla/anthropic, so you can easily point Claude Code and other tools at it.
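For clients that honour the standard Anthropic environment variable, pointing them at the new endpoint can look like the sketch below. The host and port are assumptions about your setup, not part of this release note; adjust them to wherever your Olla instance listens.

```shell
# Point Claude Code (or any Anthropic-compatible client) at Olla's
# experimental /olla/anthropic endpoint. Assumes Olla is listening on
# localhost:40114 -- substitute your own host/port.
export ANTHROPIC_BASE_URL="http://localhost:40114/olla/anthropic"
```

Remember the feature ships disabled by default, so it must be enabled in your Olla configuration first.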
What's Changed
- feat: Backend llamacpp by @thushan in #73
- feat: anthropic / message logger (development only) by @thushan in #77
- feat: Anthropic Message format Support by @thushan in #76
- Bump github.com/pterm/pterm from 0.12.81 to 0.12.82 by @dependabot[bot] in #75
- Bump golang.org/x/time from 0.13.0 to 0.14.0 by @dependabot[bot] in #72
- prepare: v0.0.20 by @thushan in #78
Full Changelog: v0.0.19...v0.0.20
olla-v0.0.19
This release includes several performance fixes (a noticeable uplift on ARM), critical fixes for all architectures, and adds support for sglang and LemonadeSDK.
We encourage everyone to upgrade to this release.
What's Changed
- feat: backend/sglang by @thushan in #69
- feat: backend/lemonade by @thushan in #70
- fixes: October 2025 performance improvements by @thushan in #71
Full Changelog: v0.0.18...v0.0.19
Changelog
- 554b2fa GetHealthyEndpointsForModel could leak targets that no longer exist.
- 4d3e12d adds parser
- dcf3c52 adds the parser and converter
- 267dcd2 atomic catalog store
- 716e57f avoid alloc on response times
- 203ce4a cleanup
- 9cb11c9 constants for linting, will add more later
- c7a7fc9 doc refresh
- 7aeb09f documentation
- 6748a50 documentation updates
- c688fce factory too
- 6ab4a15 fixed warnings and missed sglang reference
- 16fa9d5 handler bits
- ccc8f58 hotpath: reduce allocations
- e2be222 initial SGLang work
- 4d3d3e4 initial configuration based on what's available
- 12a7d14 initial lemonade bits
- 1d65097 note about format
- 1b9ffd6 openai
- c091490 perf: avoid resolvereference call if endpoint URL has no path
- 985d8eb perf: avoid GC pressure and preallocate
- fbaece8 perf: reduce string allocations
- dcb9050 race fix: method instead of module level
- e012a30 reduce hashing and allocations
- 21de3da refactor and slightly different way to infer capabilities
- 3b19336 refactor to use benchmark
- 77a4b8c refeactor test
- 53e83a6 rune fix
- 3fcf132 slightly more complex fix to improve allocations in unified memory registry
- 319f442 update docs and make supported backends a table.
- 35b6cab update readme
- 837dc42 use map rather than MapOf (deprecated)
- f9e8a69 wire up handler too and initial profile
olla-v0.0.18
This is mostly a maintenance release; it consolidates the internal configuration of the Sherpa and Olla proxies.
What's Changed
- chore: Consolidate Converters by @thushan in #58
- September 2025 updates by @thushan in #68
- Bump actions/upload-pages-artifact from 3 to 4 by @dependabot[bot] in #60
- refactor: Proxy Configurations by @thushan in #59
- Bump actions/setup-python from 5 to 6 by @dependabot[bot] in #63
- Bump actions/setup-go from 5 to 6 by @dependabot[bot] in #62
- Bump actions/configure-pages from 4 to 5 by @dependabot[bot] in #55
- Bump actions/checkout from 4 to 5 by @dependabot[bot] in #54
Changelog
- b5be024 Bump actions/checkout from 4 to 5
- 8141cb2 Bump actions/configure-pages from 4 to 5
- 720cc7c Bump actions/setup-go from 5 to 6
- 29afbd9 Bump actions/setup-python from 5 to 6
- 11c06d3 Bump actions/upload-pages-artifact from 3 to 4
- 10efb5a September 2025 updates
- 9c03ec9 cache time
- c4bbb98 fix remaining convertors
- dbf6dee initial consolidation of Proxy Configuration
- 0d27e9b introduce a base converter for conversion to avoid duplication
- d7bec85 update the olla service config and fallback too
- 6951956 update workflows.
- a83c209 use the specific settings and fallback if unavailable
Full Changelog: v0.0.17...v0.0.18
olla-v0.0.17
This release adds support for litellm and the ability to filter generically within the config using include/exclude globs. For now, this lets you exclude profiles you don't want loaded and exclude models from an endpoint.
Learn more about filters.
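As a rough illustration of the include/exclude globs, a filter section might look like the fragment below. The key names and nesting here are hypothetical, invented purely to show the shape of glob-based filtering; consult the filter documentation for the actual schema.

```yaml
# Hypothetical sketch only -- key names are illustrative, not Olla's schema.
filters:
  profiles:
    exclude:
      - "vllm*"        # don't load any vLLM profiles
  models:
    include:
      - "llama*"       # only expose llama-family models
    exclude:
      - "*embed*"      # hide embedding models from this endpoint
```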
What's Changed
- docs: comparisons by @thushan in #53
- feat: backend/litellm by @thushan in #56
- feat: filtering adapter by @thushan in #57
Full Changelog: v0.0.16...v0.0.17
Changelog
- 7b7c96e Comparison docs for Olla from becky & wilson
- b9b7a5d doc updates
- ee3b07f fix default ports for vllm and lmstudio
- 33c2a04 fix links
- 140833f implements checks for filter breakages
- e18f829 initial bits of a global filter config
- 978eb03 initial litellm profile
- edb7a66 model and profile filterinf, tests and refactor glob to be reusable a bit more
- bdb66d7 readme refresher
- 846f37f update docs
- be278a1 update docs
olla-v0.0.16
This release has two big features.
Improved Recovery & Transparent Healing
Health is monitored during every request. If a request routes to an endpoint that has just failed (before the health check has run), Olla transparently moves it to another healthy endpoint that serves the model. The failover is invisible to the caller, but CLI logs and response headers record what happened.
We think this makes olla pretty awesome.
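The failover idea above can be sketched in a few lines of Go. This is a simplified stand-in, not Olla's actual implementation: the `Endpoint` type, `forward`, and `proxyWithFailover` are all illustrative names, and real routing also involves health checks, retries, and load balancing.

```go
package main

import (
	"errors"
	"fmt"
)

// Endpoint is a simplified stand-in for Olla's endpoint type; the names
// and fields here are illustrative, not Olla's actual API.
type Endpoint struct {
	Name    string
	Healthy bool
	Models  map[string]bool
}

// forward simulates proxying a request; an unhealthy backend fails.
func forward(e *Endpoint) error {
	if !e.Healthy {
		return errors.New("endpoint down")
	}
	return nil
}

// proxyWithFailover tries each endpoint that serves the model and moves on
// when one fails mid-flight -- the transparent-healing idea in a nutshell.
func proxyWithFailover(endpoints []*Endpoint, model string) (string, error) {
	for _, e := range endpoints {
		if !e.Models[model] {
			continue
		}
		if err := forward(e); err != nil {
			e.Healthy = false // mark it down before the next health check runs
			continue
		}
		return e.Name, nil
	}
	return "", errors.New("no healthy endpoint serves " + model)
}

func main() {
	eps := []*Endpoint{
		{Name: "gpu-1", Healthy: false, Models: map[string]bool{"llama3": true}},
		{Name: "gpu-2", Healthy: true, Models: map[string]bool{"llama3": true}},
	}
	served, _ := proxyWithFailover(eps, "llama3")
	fmt.Println(served) // request lands on the healthy endpoint
}
```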
Intercepts Stats from Endpoints
We also capture the last packet of the stream/payload and extract common metrics reported by endpoints (tokens per second, etc.), along with other metrics that are (currently) surfaced in the logs.
Later, these will feed a new robusta balancer.
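To make the "last packet" idea concrete: Ollama, for example, includes `eval_count` and `eval_duration` (in nanoseconds) in the final chunk of a streamed response, from which tokens-per-second falls out directly. The sketch below is illustrative only, assuming that Ollama-style tail chunk; it is not Olla's metrics code, and other backends report different fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// finalChunk models the tail stats an Ollama stream appends to a response;
// field coverage varies by backend, so treat this as an illustrative subset.
type finalChunk struct {
	EvalCount    int64 `json:"eval_count"`
	EvalDuration int64 `json:"eval_duration"` // nanoseconds
}

// tokensPerSecond derives TPS from the last packet of a stream, the way
// the release notes describe -- a sketch, not Olla's implementation.
func tokensPerSecond(lastPacket []byte) (float64, bool) {
	var c finalChunk
	if err := json.Unmarshal(lastPacket, &c); err != nil || c.EvalDuration == 0 {
		return 0, false
	}
	return float64(c.EvalCount) / (float64(c.EvalDuration) / 1e9), true
}

func main() {
	last := []byte(`{"done":true,"eval_count":240,"eval_duration":4000000000}`)
	tps, ok := tokensPerSecond(last)
	fmt.Println(tps, ok) // 240 tokens over 4s -> 60 tokens/sec
}
```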
What's Changed
Full Changelog: v0.0.15...v0.0.16
Changelog
- 2b9860c Add documentation for provider metrics feature
- db2bab4 Adds VHS tapes.
- 76bfdd3 Constant'ine.
- 909dfb9 Fix compilation errors and shadow variable warning after merge
- 428d968 alloc changes
- 95d345d avoid blocking healthchecks during recovery, tricky.
- 3ea69ae cleanup factory and add test basic coverage
- c5d2922 cleanup & refactor
- a414331 cleanup constant use and new retry constnats
- 29a03c8 coderabbit feedback, routing strategy fixes
- e49731f coderabbit feedback about routing
- 6f17a2c configuration updates
- c1bec59 constants and reetry logic
- 2beb7f8 doc refresh
- df99db7 doc updates
- 6b11bbc doc updates for trailer
- 14a8653 doc: max-retries still in config overview
- 3e6e272 docs updated
- 7e122f4 documentation for fallback types and routing to fallback_behaviour by default
- e22cee0 documentation updates
- 988b753 doh we miss target url
- ab05356 fix gitignore not to ignore olla but rather olla in the root
- 68234fb fix n+1 logic issue
- 9e40a30 impelemts routing similar to scout
- 14963c3 initial request metrics
- 4308adb lab test fixes for profiles
- 982cdc6 lab: float issues
- e3558a6 make the jsonpath a bit more robust from lab tests
- 696ceb8 new routing strategy for registry
- 832ced6 profiling revealed some performance issues with custom written parser, adopting gjson and expr
- 0728c23 rabbit feedback around discovery issues, but refactored at the same time
- ec0a15b reduce allocations and cleanup constants
- 55ccb0f refactor a bit and move to core/metrics
- 9cb6e06 remove from intro
- 3f0e306 remove integration test no longer used post impl.
- 417a76d removed debug in hotpath and try to compile expressions at compiletime
- 2dde757 reorg docs and add more detail
- 369958c retry logic for post endpoint health changes
- feb17cb separate contexts to avoid failure issues
- 99acb83 test updates
- 9a8b53a test scripts
- 7b6ed53 tweaks
- 09cac43 update docs after changes in profiles
olla-v0.0.15
tldr;
This release adds proper cross-platform Docker images thanks to @ghostdevv, native support for vLLM, and proxy profiles so you can target streaming or buffering proxies. It also finally adds documentation (via mkdocs), plus a few fixes and improvements.
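Proxy profiles can also be selected via the `OLLA_PROXY_PROFILE` environment variable mentioned in this release. The value below is a guess based on the profile names in these notes (the buffered profile was renamed to "standard" during this cycle), so verify the accepted values against your version's documentation.

```shell
# Choose a proxy profile via env var; "auto" lets Olla pick streaming vs
# buffered per request. Valid values here are assumed from the release
# notes -- check the docs for your version.
export OLLA_PROXY_PROFILE=auto
```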
What's Changed
- feat: proxy profiles by @thushan in #42
- chore: constants by @thushan in #43
- feat: backend/vllm by @thushan in #44
- feat: add arm64 docker builds and better cross platform support by @ghostdevv in #46
- feat: docs by @thushan in #48
- feat: security & log consolidation by @thushan in #49
New Contributors
- @ghostdevv made their first contribution in #46
Full Changelog: v0.0.14...v0.0.15
Changelog
- 39d77cf Adds Proxy Profile to configuration
- d2be61a Consolidate logging for proxies
- 9b9fbd1 Revert "rabbit feedback of adding goos for context"
- 23f2634 Update readme.md
- c95b460 add more comprehensive tests and constants for content types
- 1af819c add proxy profile as an env var
- ca3faa5 adds detecting stream type for 'auto' and profiles properly
- 9c822a3 adds global constants properly for content / request bits.
- aa36c90 adds global constants properly for content / request bits.
- 7f66f19 adds goos/version to status handlers
- 2ec111e avoid build validation for docs
- af9941a avoid multiple instances of vllm responses
- 39ad794 avoid non go files
- 75c3282 change scripts and other files for standard behaviour
- c7eb705 claude update
- ff4f405 coderabbit feedback about having a bin, can't bin that feedback can we?
- 23ed305 detect streaming mode from scout
- 86768c3 feat: add arm64 docker builds
- 8f7acd4 findVLLMNativeName naively checks slashes, better to remove that
- 9bdbe14 fix TUI issues for long version numbers
- b7a5bb4 fix: normalize line endings for docs and workflow files
- 05b2fe0 fleshing out things
- e15ef9c forgot we can test arm64 with qemu
- 0e7d631 initial mkdocs-material integration
- 4f4fc05 initial streaming vs buffered tests
- 2711f82 initial test case infra
- cac160a initial updates
- e4ee050 initial vllm implementation
- 40cf1bf just show basic version info
- 1b81eb8 line ending normalisation
- cb61406 lint & allow local dev profiles to use anyhost to avoid breakage
- e533da2 missed type of Trailer header!
- 0a0805f msising configuration
- 2e42b9b mssing config
- 08ceebe rabbit feedback of adding goos for context
- 1670ec2 randomise port for test runs
- 661adba readme update
- c025112 readme update
- 2d8cb97 readme update
- f23e247 remove URLs from being visible in endpoints
- c27e364 renaming buffered to standard
- 3ad6b49 renormalise
- 1087472 revert the fmt issue fix
- 21530ad run test results in test/results
- d11fdd7 show proxy setup in the status
- a988c45 slightly better way to handle status
- e94aefe update CI with builds across all platforms
- 2ae68ee update converters to use constants
- 684185a update default configuration
- bd579d7 update docs for OLLA_PROXY_PROFILE
- 2ac4e87 update readme
- 9fe1904 update readme for native vllm
- 8716a45 update readme for profile
- 6daef17 update remaining constants
- 05870bd update remaining constants
- 8393afd update test scripts for vllm support (mirrors existing).
- 1c261e9 update vllm profile
- 9c27c77 vLLM integration test that uses OLLA_TEST_SERVER_VLLM var for test
olla-v0.0.14
tldr;
This release fixes an issue with streaming tokens from LLM backends (responses were buffered instead of streamed) and adds proper OpenWebUI support to Olla.
What's Changed
Changelog
- 07f1eb1 Update readme.md
- d7a050b add cleanup time and show the uptime after shutdown
- 69122be bugfix: accidentally left :ro for logs folder in example
- 29493fb comment some more in the default config
- 632128a implements clean output logs for triage without ANSI.
- de31335 line endings?
- d086582 test scripts and infra for streaming repro
- 996ddbc tweak: make the PRIORITY -> PRI just to save some horizontal space
- d379795 we need to flush the reponsewriter properly in order to push tokens
Full Changelog: v0.0.13...v0.0.14
olla-v0.0.13
Documentation updates, including examples for using OpenWebUI with Olla.
olla-v0.0.12
This is our first public release.
Amongst the highlights:
- API redesign and consistency across providers
- Model unification and model registry
- Profile based model management
- Proxy configurations and code consolidated
We've also beefed up performance across the board: Olla can now run in under 40 MB of process memory on Linux while serving hundreds of endpoints.
What's Changed
- feat: olla profile by @thushan in #32
- feat: API Redesign by @thushan in #33
- feat: proxy consolidation by @thushan in #34
- Bump golang.org/x/sync from 0.15.0 to 0.16.0 by @dependabot[bot] in #30
- tweaks & bugfixes by @thushan in #35
- feat: doc updates and refresher by @thushan in #36
Full Changelog: v0.0.11...v0.0.12
Changelog
- cb81a10 Bump golang.org/x/sync from 0.15.0 to 0.16.0
- 3bc40e0 Filters out any unhealthy endpoints when querying models to avoid callers from using it - but we hold onto it in case it comes back online
- a86a1ac Fix path issues
- c71c72d OLLA-85: [Unification] Models with different digests fail to unify correctly.
- 033bbd1 Refactored tests with Claude whos split the tests into two buckets, short and stress.
- 41a6e47 Revert "attempt container consolidation"
- f4342dc Revert to the old way of testing concurrency
- 2f7dc75 Validated in testing, we can use our Util.GenerateRequestId now.
- 7eec5e2 add InferenceProfile interface extending PlatformProfile
- 0b9f9fd add response time headers
- c4bf056 add tests for response headers.
- 9133d74 added reservoir sampling for percentiles (thanks to Browny's original implementation)
- 2ce45e4 adds back sherpa compatibility layer
- 58eaffa adds body inspector and extraction of models
- c196332 adds proper headers to requests.
- 0e14221 attempt container consolidation
- 490123e avoid starting if the port is in use
- e5b2425 basic docs
- 50c9983 beef up installer script
- 390c117 clean up + coderabbit review comments
- 00b5952 cleanup configuration issues which is back to being basic
- 1f43dab cleanup configuration, avoid maintaining docker.yaml manually.
- 0103cf8 coderabbit comment feedback.
- 0ab4adb concise claude.md
- 00ccf34 consolidate constants
- 367a9ea context aware logging
- ebc6737 copy and use the new docker file config.
- ed93e9a deprecate old proxy_olla & proxy_sherpa
- 2e75d42 doc updates
- bae025b docs and scripts updates.
- 57b6fc1 documentation update & conslidation for users, inspired by scout & sherpa docs.
- e6930ab eventbus leak fix
- 5802037 extend profiles with model capabilities and resource requirements
- 5815f53 feedback about docs from GregW, adds models.md.
- 5d79680 first attempt at API redesign
- 471fae0 fix configuration for olla in path matching with prefix
- 6ddd573 fix event bus channel closer for GC to handle
- 4a73b21 fix invalid lmstudio models being returned (empty sometimes, othertimes invalid)
- c0ff64a forgot updated readme
- e8bae0e friendlier error messages for discovery too.
- e84df17 hmm fix the race on slower systems
- 8e7c498 ignore the docker container files and update script for other variables
- bddd3e2 implement GetModelsByCapability
- ddb0446 initial documentation
- 0b19fb9 line endings
- 7cf4fcd move model size / quants to configuration files and update tests
- 567f5c0 pass in config file in the CLI args
- e5ea155 refactor and use constants
- 6059bb8 refactor handler to be a bit easier to manage moving forward
- cbf0373 show which config has been loaded
- c2191eb simplification part 2
- 87cc3ee simplify the regstration of routes, part one of two.
- 542df4d single configuration plane for both proxies.
- b7fcfe6 testing across all endpoints
- 724bed1 trying an alternative method of reducing tracking profile compatibility
- a61fcae update ci pipes
- c4f2719 update docs
- 7f12858 update tests
olla-v0.0.11
Changelog
- 15411f1 First pass at implementing model normalisation
- 257be87 Update internal/adapter/unifier/default_unifier.go
- 41328ea Update internal/core/domain/unified_model.go
- e60507d add model builder to make it easier
- 595b6ad adds missing default config
- 8e86fcf dedupe and adjust rules for model sequence
- a7db9f5 extra rules and musing about with current state of models, not very extensible rightn ow.
- f1f1558 fix complexity lint
- 9161bcc fix race
- 0cc1064 fix up tests
- 6048002 forgot updates to unifed converter & model
- 78032bb go back to simplified unification across application types.
- aed687a ignore config-dev
- d73264b lifecycle management
- 2c105aa lint and align
- d29fc8d more lint fixes, more ambitious ones
- 856c4f4 move model rules to models.yml
- afebded remove unused param