Modelcache Improvements #4887
Conversation
Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>
I wonder if we should use these names instead:
Similar to what we are doing in Argo Workflows, e.g. `WorkflowTemplate` vs `ClusterWorkflowTemplate`. Although the renaming might be a breaking change. If not, `LocalModelNamespacedCache` might be better than `LocalModelNamespaceCache`.
cc @yuzisun @greenmoon55 @johnugeorge @cjohannsen-cloudera Could you take a look at this proposal?
@VedantMahabaleshwarkar If the intent is to cache the model on each node in the cluster, how do you control access to the model cache to allow only a specific namespace?
The existing `LocalModelCache` logic checks for a `StorageUri` match against all existing `LocalModelCache` CRs (which are cluster-wide) and adds a ModelCache label. For the `LocalModelNamespaceCache`, the `StorageUri` matching logic is modified so that a match is only considered for InferenceServices in the cache's own namespace.
```yaml
storageClassName: local-storage
persistentVolumeSpec:
  accessModes:
  - ReadWriteOnce
```
Shouldn't it be RWX?
Wouldn't it be better to move the discussion to a GH issue and attach the architecture proposal?
Looks great! We thought about something similar. One follow-up could be validating disk quotas to make sure disk space is not used up by some namespaces.
What this PR does / why we need it:
Local Model Cache Enhancements

Change 1: Optional Storage Spec for LocalModelCache
Proposal
The goal of this change is to simplify the process of caching models that require authentication (e.g., private HuggingFace models or S3 buckets) by adding an optional storage specification directly to the `LocalModelCache` CRD.

Previously, to cache a model with credentials, users were required to:
- Create a `ClusterStorageContainer` CRD with `workloadType: localModelDownloadJob`.
- Provide the download credentials in the jobs namespace (`kserve-localmodel-jobs`).

Pros of the new approach:
- No separate `ClusterStorageContainer`: Users no longer need to define a separate cluster-level resource just to handle download credentials.
- Consistency with the `InferenceService` storage configuration, providing a unified experience for defining storage access.
- Users can specify `ServiceAccountName` or direct `Storage` parameters within the cache definition itself, reducing operational overhead.

Explanation of Changes
- `LocalModelCache` CRD: Added `serviceAccountName` and `storage` fields to the `LocalModelCacheSpec`.
  - `serviceAccountName`: Allows referencing a ServiceAccount (and its attached secrets) for credential lookup.
  - `storage`: Allows specifying a storage `key` and `parameters` (e.g., endpoint, region) to override default configurations.
- Download job creation: The download job reuses the `storageInitializer` configuration found in the `inferenceservice-config` ConfigMap. It injects the necessary environment variables and volume mounts derived from the specified credentials.
- Backward compatibility: Existing `ClusterStorageContainer` configurations continue to function and take precedence if they exist.

Change 2: LocalModelNamespaceCache CRD
Proposal
The `LocalModelCache` resource is cluster-scoped, making cached models available globally across the cluster. To support multi-tenancy and better isolation, we propose the `LocalModelNamespaceCache` CRD.

Key Benefits:
- Isolation: models cached through a `LocalModelNamespaceCache` are resolved only within their own namespace, rather than cluster-wide.
- Multi-tenancy: each namespace can manage its own caches and download credentials without requiring cluster-level resources.
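Based on the fields described in this proposal, a namespace-scoped cache manifest might look roughly like the sketch below. The `apiVersion`, URI, and all concrete values are illustrative assumptions, not taken from the actual CRD:

```yaml
# Hypothetical sketch of a LocalModelNamespaceCache manifest.
# Field names follow this PR description; apiVersion and values are assumptions.
apiVersion: serving.kserve.io/v1alpha1
kind: LocalModelNamespaceCache
metadata:
  name: my-model
  namespace: my-ns            # namespace-scoped: only resolved within my-ns
spec:
  sourceModelUri: hf://example-org/example-model
  modelSize: 10Gi
  nodeGroups:
  - gpu-nodes
  serviceAccountName: model-download-sa   # credentials looked up in my-ns
```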
Explanation of Changes
- `LocalModelNamespaceCache`: Introduced a new namespace-scoped Custom Resource Definition.
- The spec mirrors `LocalModelCache` (`SourceModelUri`, `ModelSize`, `NodeGroups`, `ServiceAccountName`, `Storage`).
- Controller: A new controller reconciles `LocalModelNamespaceCache` resources. The controller manages the underlying `LocalModelNode` resources to ensure models are downloaded to the specified node groups.
- Resolution: `StorageUri` matching is scoped to the namespace of the `LocalModelNamespaceCache`. When an InferenceService requests a model, the system checks for a matching cache in the local namespace.
- API types: Added the new types (`LocalModelNamespaceCache`, `LocalModelNamespaceCacheList`) and generated client code.

Workflow Differences: Cluster vs. Namespace Cache
The workflow for `LocalModelNamespaceCache` differs from `LocalModelCache` in several key ways to support isolation:

Job Execution Context:
- Cluster-scoped `LocalModelCache`: download jobs run in the shared jobs namespace (`kserve-localmodel-jobs`).
- `LocalModelNamespaceCache`: download jobs run with the credentials available in the cache's own namespace.

Resolution Scope:
- Cluster-scoped caches are matched against InferenceServices across the whole cluster; namespace-scoped caches are matched only against InferenceServices in the same namespace.

Credential Access:
- Credential settings (`ServiceAccountName`, `Storage`) are resolved per cache and passed down so the download job runs with the correct permissions.
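To illustrate the credential options on the cluster-scoped side, Change 1 might be used roughly as follows. This is a sketch: the `apiVersion`, the `storage.key`, and the parameter values are assumptions for illustration:

```yaml
# Hypothetical sketch of a cluster-scoped LocalModelCache using the new
# optional credential fields from Change 1. Values are illustrative.
apiVersion: serving.kserve.io/v1alpha1
kind: LocalModelCache
metadata:
  name: shared-model          # cluster-scoped: no namespace
spec:
  sourceModelUri: s3://example-bucket/models/shared-model
  modelSize: 5Gi
  nodeGroups:
  - workers
  storage:
    key: s3-creds             # key referencing the storage credentials
    parameters:
      region: us-east-1       # e.g. endpoint/region overrides
```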
LocalModelNode CR and Status Updates
The `LocalModelNode` CR, which tracks the models present on a specific node, has been updated to handle both cluster-scoped and namespace-scoped models.

Spec Updates:
- The `LocalModelInfo` struct in `LocalModelNodeSpec` now includes a `Namespace` field.
  - If `Namespace` is empty, it represents a cluster-scoped `LocalModelCache`.
  - If `Namespace` is set, it represents a `LocalModelNamespaceCache`.
- Credential settings (`ServiceAccountName`, `Storage`) are also passed down to the `LocalModelNode` to ensure the node agent can launch the download job with correct permissions.

Status Key Changes:
- Status entries were previously keyed by `modelName` alone; namespace-scoped models are now keyed as `namespace/modelName`.
- This allows the node agent to track `my-model` (cluster) and `my-ns/my-model` (namespace) independently.

Storage Deduplication & Status:
- The node agent uses the `SourceModelUri` to determine the physical storage folder on the node.
- If a cluster-scoped cache and a namespace-scoped cache reference the same `SourceModelUri`, they will share the same physical folder on the disk.
- The model is therefore downloaded once and the folder is shared across the corresponding `LocalModelNode` entries.
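Putting the spec and status changes together, a `LocalModelNode` tracking one cluster-scoped and one namespace-scoped copy of the same model might look like this sketch. The status values and exact field names here are assumptions for illustration:

```yaml
# Hypothetical sketch of a LocalModelNode after the changes above.
apiVersion: serving.kserve.io/v1alpha1
kind: LocalModelNode
metadata:
  name: node-1
spec:
  localModels:
  - modelName: my-model
    sourceModelUri: hf://example-org/example-model
    # Namespace empty -> cluster-scoped LocalModelCache
  - modelName: my-model
    namespace: my-ns          # set -> LocalModelNamespaceCache
    sourceModelUri: hf://example-org/example-model
status:
  modelStatus:
    my-model: ModelDownloaded        # cluster-scoped entry
    my-ns/my-model: ModelDownloaded  # namespace entry, keyed namespace/modelName
# Same SourceModelUri -> both entries share one physical folder on disk.
```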