zhuangqh pushed a commit to kaito-project/kaito that referenced this pull request Jun 16, 2026
…2s_v3 SKU in E2E (#2104)

## What type of PR is this?

/kind bug

## What this PR does / why we need it

### 1. Replace `Standard_NC12s_v3` with `Standard_NV72ads_A10_v5` in E2E tests

`Standard_NC12s_v3` (V100) is not available in `swedencentral` (and many other regions), causing E2E webhook tests to fail with `VMSizeNotSupported`. `Standard_NV72ads_A10_v5` (2×A10 = 48Gi GPU memory) is used because:

- It matches the "2gpu" semantics of the webhook tests
- It fits phi-4 on a single node (no multi-node Transformers issue)
- It is widely available in the regions where E2E runs

### 2. Fix vLLM inference server crash caused by `prometheus-fastapi-instrumentator` incompatibility

**Root cause of E2E `validateInferenceResource` timeouts:** FastAPI 0.137.0 ([fastapi/fastapi#15745](fastapi/fastapi#15745)) refactored `include_router()` to wrap routers in `_IncludedRouter` objects that lack a `.path` attribute. This breaks `prometheus-fastapi-instrumentator` middleware ([trallnag/prometheus-fastapi-instrumentator#370](trallnag/prometheus-fastapi-instrumentator#370)), causing **every HTTP request** to the vLLM inference server to return 500:

```
AttributeError: '_IncludedRouter' object has no attribute 'path'
```

vLLM starts and loads models successfully, but all health checks and inference requests crash in the Prometheus middleware before reaching the actual handler. This makes the workspace permanently "not ready" regardless of timeout duration.

**Fix:** Pin `fastapi[standard] <0.137.0` in requirements.txt until the upstream instrumentator is fixed.

### 3. Revert timeout bump (30m → 20m)

The previous commit bumped the `validateInferenceResource` timeout from 20m to 30m, but that was never the real issue: the middleware crash means no amount of waiting helps. Reverted to the original 20m.

## Changes

- `test/e2e/webhook_test.go`: replace instanceType in webhook validation tests
- `presets/workspace/dependencies/requirements.txt`: pin `fastapi<0.137.0`
- `test/e2e/preset_test.go`: revert validateInferenceResource timeout to 20m
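The `<0.137.0` ceiling from item 2 could also be asserted at runtime, so a stray transitive upgrade fails loudly instead of surfacing as HTTP 500s. A minimal sketch, not part of the kaito change: `parse_version` and `is_below_ceiling` are hypothetical helpers, and only plain numeric `X.Y.Z` version strings are handled (no pre-release or local suffixes).

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a plain numeric version such as '0.136.1' into a comparable tuple.

    Assumption: the string contains only dot-separated integers. Short
    versions such as '0.137' are padded with zeros to three components.
    """
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def is_below_ceiling(installed: str, ceiling: str = "0.137.0") -> bool:
    """Return True if `installed` satisfies a `<ceiling` pin."""
    return parse_version(installed) < parse_version(ceiling)
```

In a real service the installed version could be read with `importlib.metadata.version("fastapi")` and passed to `is_below_ceiling` at startup.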
georgi-smasint pushed a commit to supermassive-intelligence/scalarlm that referenced this pull request Jun 17, 2026
FastAPI 0.137.0 (fastapi/fastapi#15745) keeps `_IncludedRouter` wrappers in `app.routes`; prometheus-fastapi-instrumentator reads `route.path` unconditionally -> AttributeError -> HTTP 500 on every vLLM endpoint. cray's health check asserts vLLM `/health` == 200, so the stack reports vllm down and the finetune sweep fails with RESTART_FAILED.

fastapi-utils pulls FastAPI transitively, so the ceiling lives in requirements-vllm.txt.

See vllm-project/vllm#45597, trallnag/prometheus-fastapi-instrumentator#370.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
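The failure mode described in this commit message can be illustrated with stand-in objects. This is a sketch only: `PlainRoute`, `WrapperWithoutPath`, and `route_label` are hypothetical names, and the code is neither the instrumentator's implementation nor its upstream fix. It shows why an unconditional `route.path` read fails on an entry without that attribute, and how a defensive read avoids it.

```python
class PlainRoute:
    """Stand-in for a route object that exposes `.path`."""

    def __init__(self, path: str) -> None:
        self.path = path


class WrapperWithoutPath:
    """Stand-in for a wrapper entry that has no `.path` attribute."""


def route_label(route: object, default: str = "<no-path>") -> str:
    """Read `.path` defensively instead of assuming every entry has one."""
    return getattr(route, "path", default)


routes = [PlainRoute("/health"), WrapperWithoutPath()]

# An unconditional `r.path` would raise AttributeError on the second entry;
# the guarded read falls back to a placeholder label instead.
labels = [route_label(r) for r in routes]
```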
Pull Request
♻️ Refactor internals to preserve `APIRouter` and `APIRoute` instances

Supersedes #4794
Unblocks ✨ SO MANY THINGS ✨
Before this, `router.include_router(other_router)` would take each path operation from `other_router` and "clone" it, or recreate it from scratch.

This meant that in the end there was only one top-level router, part of the app.
Here, a few additional classes handle the intermediate metadata for router and route inclusion. That way the information that "router X includes Y and Y includes Z" is stored somewhere, without affecting (recreating or cloning) the final route.
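The idea of storing "router X includes Y" instead of cloning routes can be sketched with a toy model. Everything here is an assumption for illustration: `Route`, `IncludedRouter`, `Router`, and `iter_paths` are stand-in names, not FastAPI's actual classes or attributes. The point is that code consuming such a structure has to recurse through it rather than iterate a flat list.

```python
from dataclasses import dataclass, field
from typing import Iterator, Union


@dataclass
class Route:
    """Toy leaf route with a path."""
    path: str


@dataclass
class IncludedRouter:
    """Toy wrapper recording that one router includes another."""
    prefix: str
    router: "Router"


@dataclass
class Router:
    routes: list[Union[Route, IncludedRouter]] = field(default_factory=list)

    def include_router(self, other: "Router", prefix: str = "") -> None:
        # Keep a reference to the other router instead of cloning its routes.
        self.routes.append(IncludedRouter(prefix=prefix, router=other))


def iter_paths(router: Router, prefix: str = "") -> Iterator[str]:
    """Walk the tree depth-first and yield the full path of every leaf route."""
    for entry in router.routes:
        if isinstance(entry, IncludedRouter):
            yield from iter_paths(entry.router, prefix + entry.prefix)
        else:
            yield prefix + entry.path


main = Router()
sub = Router()
main.include_router(sub, prefix="/api")  # included while `sub` is still empty
sub.routes.append(Route("/items"))       # added afterwards, still reachable
main.routes.append(Route("/health"))

paths = list(iter_paths(main))  # ["/api/items", "/health"]
```

Because the toy `main` holds a reference to `sub`, the route added to `sub` after inclusion is still found by the walk.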
Non Objective
Dependencies for 404
Originally in the other PR I intended to support dependencies that would be executed even for 404, but that would conflict with the fact that a router could not find a match, but the next router did find a match. Executing dependencies in the router that did not find a match would not make sense, they could consume the request, body, etc.
This original idea was discarded.
Breaking Change
Now `router.routes` is no longer a plain list of `APIRoute` objects; it can contain these intermediate objects, which can in turn contain additional routers, forming a tree.

Any logic that depended on iterating over `router.routes` directly is affected: it can no longer expect to extract data from a plain list of routes, because the structure is now a tree.

Additionally, any logic that iterated over `router.routes` to modify the routes will now also see these new objects, and will not see all the routes in the app.

`router.routes` should be considered an internal implementation detail, only passed around to the FastAPI functions that need it.

Features

- Including `subrouter` in `mainrouter` can be done before adding routes (path operations) to `subrouter`, because now the entire object is stored instead of copying the routes.

Alpha Features
This is not documented yet, so it's not officially supported yet and could change in the future.
But, as `APIRoute` and `APIRouter` instances are now preserved, they could be customized.

`APIRouter` has two new methods, `.matches()` and `.handle()`, counterparts to the existing ones in `APIRoute`. With these, a router could customize how it matches and handles requests. For example, it could match only requests that include a specific header, such as one used for API versioning.

Still, for now, consider this very experimental and potentially changing and breaking in the future.
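The header-based matching idea can be sketched as a toy dispatcher. This is not FastAPI's API: the real signatures and return types of `.matches()` and `.handle()` are not shown here, so the `matches` method below (returning a plain `bool`), the `HeaderVersionRouter` class, the `pick_router` helper, and the `x-api-version` header name are all assumptions. The only borrowed convention is the ASGI scope layout, where `headers` is a list of `(name, value)` byte pairs.

```python
from typing import Any, Optional


def get_header(scope: dict[str, Any], name: bytes) -> Optional[bytes]:
    """Look up a header in an ASGI-style scope; names compare lowercased."""
    for key, value in scope.get("headers", []):
        if key.lower() == name:
            return value
    return None


class HeaderVersionRouter:
    """Toy router that only matches requests carrying a given version header."""

    def __init__(self, version: str, header: bytes = b"x-api-version") -> None:
        self.version = version.encode()
        self.header = header

    def matches(self, scope: dict[str, Any]) -> bool:
        # Hypothetical signature: returns True when the header value matches.
        return get_header(scope, self.header) == self.version


def pick_router(
    routers: list[HeaderVersionRouter], scope: dict[str, Any]
) -> Optional[HeaderVersionRouter]:
    """Return the first router whose `matches` accepts the scope, else None."""
    for router in routers:
        if router.matches(scope):
            return router
    return None


v1 = HeaderVersionRouter("1")
v2 = HeaderVersionRouter("2")
chosen = pick_router([v1, v2], {"headers": [(b"x-api-version", b"2")]})
```

Here `chosen` is `v2`; a scope without the header matches neither router, which mirrors the point that a router that does not match leaves the request for the next one.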
Future Features Enabled
- `APIRoute` subclasses (as described above)
- `APIRouter` subclasses (as described above)

Discussion:
Description
AI Disclaimer
Codex with GPT 5.5, through a lot of planning and research iterations, for weeks, then the same for the implementation, with too many iterations to count to clean up the implementation, what's supported and not, docs, types, etc.
All code and tests manually reviewed by hand.
AI transcript
Checklist