
Conversation

@VedantMahabaleshwarkar (Contributor)

What this PR does / why we need it: Local Model Cache Enhancements

Change 1: Optional Storage Spec for LocalModelCache

Proposal

The goal of this change is to simplify the process of caching models that require authentication (e.g., private HuggingFace models or S3 buckets) by adding an optional storage specification directly to the LocalModelCache CRD.

Previously, to cache a model with credentials, users were required to:

  1. Create a ClusterStorageContainer resource with workloadType: localModelDownloadJob (sketched after this list).
  2. Manually embed credentials (e.g., environment variables) into this container specification.
  3. Manage secrets in a dedicated namespace (kserve-localmodel-jobs).
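
For reference, the previous approach required a cluster-level resource roughly like the sketch below. This only illustrates the pattern described above; the image, secret name, and env wiring are hypothetical, not taken from this PR.

  apiVersion: serving.kserve.io/v1alpha1
  kind: ClusterStorageContainer
  metadata:
    name: hf-download-credentials
  spec:
    workloadType: localModelDownloadJob
    container:
      name: storage-initializer
      image: kserve/storage-initializer:latest
      env:
        - name: HF_TOKEN                # credential embedded directly in the container spec
          valueFrom:
            secretKeyRef:
              name: hf-secret           # Secret that must live in kserve-localmodel-jobs
              key: HF_TOKEN
    supportedUriFormats:
      - prefix: hf://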

Pros of the new approach:

  • Eliminates the need for ClusterStorageContainer: Users no longer need to define a separate cluster-level resource just to handle download credentials.
  • Consistency: The configuration now mirrors the InferenceService storage configuration, providing a unified experience for defining storage access.
  • Simplified Operations: Credentials can be managed via ServiceAccountName or direct Storage parameters within the cache definition itself, reducing operational overhead.

Explanation of Changes

  • Updated LocalModelCache CRD: Added serviceAccountName and storage fields to the LocalModelCacheSpec (see the sketch after this list).
    • serviceAccountName: Allows referencing a ServiceAccount (and its attached secrets) for credential lookup.
    • storage: Allows specifying a storage key and parameters (e.g., endpoint, region) to override default configurations.
  • Dynamic Download Job: The model download job now dynamically configures its container based on the storageInitializer configuration found in the inferenceservice-config ConfigMap. It injects the necessary environment variables and volume mounts derived from the specified credentials.
  • Backward Compatibility: Existing ClusterStorageContainer configurations continue to function and take precedence if they exist.
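
A minimal sketch of a LocalModelCache using the new fields, based on the description above (the storage key/parameter names mirror the InferenceService storage spec; all concrete values are placeholders):

  apiVersion: serving.kserve.io/v1alpha1
  kind: LocalModelCache
  metadata:
    name: my-private-model
  spec:
    sourceModelUri: hf://my-org/my-private-model
    modelSize: 10Gi
    nodeGroups:
      - gpu-nodes
    # Option 1: look up credentials from the secrets attached to a ServiceAccount
    serviceAccountName: model-download-sa
    # Option 2: reference a storage key and override parameters directly
    storage:
      key: my-s3-credentials
      parameters:
        region: us-east-1
        endpoint: s3.example.com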

Change 2: LocalModelNamespaceCache CRD

Proposal

The LocalModelCache resource is cluster-scoped, making cached models available globally across the cluster. To support multi-tenancy and better isolation, we propose the LocalModelNamespaceCache CRD.

Key Benefits:

  • Isolation: Allows defining model caches that are restricted to a specific namespace.
  • Security: Ensures that sensitive models cached in a namespace are only usable by InferenceServices within that same namespace.
  • Multi-tenancy: Different teams (namespaces) can manage their own cache policies and models without affecting the cluster-wide state.

Explanation of Changes

  • New CRD LocalModelNamespaceCache: Introduced a new namespace-scoped Custom Resource Definition (see the example after this list).
    • It shares the same specification structure as LocalModelCache (SourceModelUri, ModelSize, NodeGroups, ServiceAccountName, Storage).
  • Controller Implementation: Added a new reconciler to handle the lifecycle of LocalModelNamespaceCache resources. The controller manages the underlying LocalModelNode resources to ensure models are downloaded to the specified node groups.
  • InferenceService Integration: Updated the InferenceService logic to support resolving models from LocalModelNamespaceCache. When an InferenceService requests a model, the system checks for a matching cache in the local namespace.
  • API Types: Added corresponding Go types (LocalModelNamespaceCache, LocalModelNamespaceCacheList) and generated client code.
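
As a rough illustration, a namespace-scoped cache could look like the following (assuming the same API group/version as LocalModelCache; names are placeholders):

  apiVersion: serving.kserve.io/v1alpha1
  kind: LocalModelNamespaceCache
  metadata:
    name: team-a-model
    namespace: team-a                        # namespaced, unlike the cluster-scoped LocalModelCache
  spec:
    sourceModelUri: hf://my-org/team-a-private-model
    modelSize: 5Gi
    nodeGroups:
      - gpu-nodes
    serviceAccountName: team-a-download-sa   # resolved within team-a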

Workflow Differences: Cluster vs. Namespace Cache

The workflow for LocalModelNamespaceCache differs from LocalModelCache in several key ways to support isolation:

  1. Job Execution Context:

    • LocalModelCache: Download jobs run in a centralized namespace (default kserve-localmodel-jobs).
    • LocalModelNamespaceCache: Download jobs run in the same namespace where the CR is defined. This ensures that the job uses the namespace's quota and service accounts.
  2. Resolution Scope:

    • LocalModelCache: Available to any InferenceService in the cluster.
    • LocalModelNamespaceCache: Only available to InferenceServices within the same namespace. The webhook logic prioritizes checking the local namespace cache before falling back to the cluster cache.
  3. Credential Access:

    • LocalModelCache: Credentials (Secrets/ServiceAccounts) must exist in the centralized job namespace.
    • LocalModelNamespaceCache: Credentials must exist in the user's namespace, allowing teams to manage their own access tokens securely (see the sketch after this list).
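
For example, under the namespaced model the credentials referenced by serviceAccountName live alongside the cache in the team's own namespace; a hypothetical setup might look like this:

  apiVersion: v1
  kind: Secret
  metadata:
    name: team-a-hf-token
    namespace: team-a                    # credentials stay in the user's namespace
  type: Opaque
  stringData:
    HF_TOKEN: <token>
  ---
  apiVersion: v1
  kind: ServiceAccount
  metadata:
    name: team-a-download-sa
    namespace: team-a
  secrets:
    - name: team-a-hf-token              # attached Secret used for credential lookup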

LocalModelNode CR and Status Updates

The LocalModelNode CR, which tracks the models present on a specific node, has been updated to handle both cluster-scoped and namespace-scoped models.

  1. Spec Updates:

    • The LocalModelInfo struct in LocalModelNodeSpec now includes a Namespace field.
    • If Namespace is empty, it represents a cluster-scoped LocalModelCache.
    • If Namespace is set, it represents a LocalModelNamespaceCache.
    • Credential fields (ServiceAccountName, Storage) are also passed down to the LocalModelNode to ensure the node agent can launch the download job with correct permissions.
  2. Status Key Changes:

    • To prevent name collisions between models with the same name in different namespaces, the status map keys have been updated:
      • Cluster-scoped models: Key remains modelName.
      • Namespace-scoped models: Key is now namespace/modelName.
    • This allows the node agent to track the status of my-model (cluster) and my-ns/my-model (namespace) independently (see the sketch after this list).
  3. Storage Deduplication & Status:

    • The node agent uses a hash of the SourceModelUri to determine the physical storage folder on the node.
    • Deduplication: If multiple caches (e.g., in different namespaces) point to the exact same SourceModelUri, they will share the same physical folder on the disk.
    • Smart Status Updates: The agent tracks which storage keys have been processed. If a download for a specific URI is already in progress or completed (initiated by another CR), the agent reuses that status for the current model. This prevents redundant download jobs while correctly updating the status of all referencing LocalModelNode entries.
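
A sketch of how a LocalModelNode might track both kinds of models under these changes (field names follow the description above; the exact schema may differ):

  apiVersion: serving.kserve.io/v1alpha1
  kind: LocalModelNode
  metadata:
    name: gpu-node-1
  spec:
    localModels:
      - modelName: my-model                      # cluster-scoped: Namespace is empty
        sourceModelUri: hf://my-org/my-model
      - modelName: my-model
        namespace: my-ns                         # namespace-scoped LocalModelNamespaceCache
        sourceModelUri: hf://my-org/my-model
        serviceAccountName: team-download-sa     # credentials passed down for the download job
  status:
    modelStatus:
      my-model: ModelDownloaded                  # cluster-scoped key: modelName
      my-ns/my-model: ModelDownloaded            # namespace-scoped key: namespace/modelName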

Type of changes

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Feature/Issue validation/testing:

Please describe the tests that you ran to verify your changes and summarize the relevant results. Provide instructions so the tests can be reproduced.
Please also list any relevant details for your test configuration.

  • Test A

  • Test B

  • Logs

Special notes for your reviewer:

  1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Checklist:

  • Have you added unit/e2e tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

Release note:

Add a namespace-scoped LocalModelNamespaceCache CRD for caching models within a single namespace

Re-running failed tests

  • /rerun-all - rerun all failed workflows.
  • /rerun-workflow <workflow name> - rerun a specific failed workflow. Only one workflow name can be specified. Multiple /rerun-workflow commands are allowed per comment.

@terrytangyuan (Member) commented Dec 10, 2025

I wonder if we should use these names instead:

  • ModelCache (namespaced)
  • ClusterModelCache (cluster scoped)

Similar to what we are doing in Argo Workflows, e.g. WorkflowTemplate vs ClusterWorkflowTemplate.

Although the renaming might be a breaking change.

If not, LocalModelNamespacedCache might be better than LocalModelNamespaceCache.

@terrytangyuan (Member)

cc @yuzisun @greenmoon55 @johnugeorge @cjohannsen-cloudera Could you take a look at this proposal?

@yuzisun (Member) commented Dec 12, 2025

@VedantMahabaleshwarkar If the intent is to cache the model on each node in the cluster, how do you control access to the model cache so that only a specific namespace is allowed to use it?

@VedantMahabaleshwarkar (Contributor, Author) commented Dec 12, 2025

@yuzisun

If the intent is to cache the model on each node in the cluster, how do you control access to the model cache so that only a specific namespace is allowed to use it?

The existing LocalModelCache logic checks for a StorageUri match against all existing LocalModelCache CRs (which are cluster-wide) and, if a match is found, adds a ModelCache label (internal.serving.kserve.io/localmodel: my-model), which triggers the logic to inject the cache PV/PVC into the deployment.

For the LocalModelNamespaceCache, the StorageUri matching logic is modified as follows:
The ModelCache labels are added only if a matching StorageUri is found in either:

  • A LocalModelNamespaceCache CR in the same namespace as the ISVC (first precedence: check for a namespaced cache).
    • This adds an additional internal.serving.kserve.io/localmodel-namespace: <namespace> label, which is checked by the LocalModelNamespaceCache reconciler while injecting the PV and PVC.
  • A LocalModelCache CR (second precedence: fall back to the cluster-wide cache if no namespaced cache is found). See the label sketch after this list.
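
For illustration, an ISVC matched against a namespaced cache would end up with labels along these lines (model and namespace names are placeholders):

  apiVersion: serving.kserve.io/v1beta1
  kind: InferenceService
  metadata:
    name: my-isvc
    namespace: my-ns
    labels:
      internal.serving.kserve.io/localmodel: my-model
      # set only when the match came from a LocalModelNamespaceCache in the same namespace
      internal.serving.kserve.io/localmodel-namespace: my-ns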

storageClassName: local-storage
persistentVolumeSpec:
  accessModes:
    - ReadWriteOnce

Contributor:

Shouldn't it be RWX?

@spolti (Contributor) commented Dec 12, 2025

Wouldn't it be better to move the discussion to a GitHub issue and attach the architecture proposal?

@greenmoon55 (Contributor)

Looks great! We thought about something similar. One follow-up could be validating disk quotas to make sure disk space is not used up by some namespaces.

@VedantMahabaleshwarkar marked this pull request as ready for review on January 19, 2026 at 19:02.