
Red Hat OpenShift AI

Red Hat OpenShift AI (RHOAI) builds on the capabilities of Red Hat OpenShift to provide a single, consistent, enterprise-ready hybrid AI and MLOps platform. It provides tools across the full lifecycle of AI/ML experiments and models including training, serving, monitoring, and managing AI/ML models and AI-enabled applications. This is my personal repository to test and play with some of its most important features.

1. Red Hat Training

RHOAI is a product under continuous improvement, so this repo will become outdated at some point. I recommend referring to the Official documentation to check the latest features, or you can try the official trainings.

Red Hat OpenShift AI (RHOAI) is a platform for data scientists, AI practitioners, developers, machine learning engineers, and operations teams to prototype, build, deploy, and monitor AI models. This wide variety of audiences needs different kinds of training. For that reason, there are several courses that will help you understand RHOAI from all angles:

  • AI262 - Introduction to Red Hat OpenShift AI: About configuring Data Science Projects and Jupyter Notebooks.

  • AI263 - Red Hat OpenShift AI Administration: About installing RHOAI, configuring users and permissions and creating Custom Notebook Images.

  • AI264 - Creating Machine Learning Models with Red Hat OpenShift AI: About training models and enhancing the model training.

  • AI265 - Deploying Machine Learning Models with Red Hat OpenShift AI: About serving models on RHOAI.

  • AI266 - Automating AI/ML Workflows with Red Hat OpenShift AI: About creating Data Science Pipelines, and Elyra and Kubeflow Pipelines.

  • AI267 - Developing and Deploying AI/ML Applications on Red Hat OpenShift AI: All of the previous courses combined.

2. RHOAI Architecture

The following diagram depicts the general architecture of a RHOAI deployment, including the most important components:

RHOAI Architecture
Figure 1. RHOAI Architecture
  • aipipelines: Enables you to build portable machine learning workflows. It is based on Kubeflow Pipelines and does not require the OCP Pipelines operator. (Formerly known as datasciencepipelines.)

  • dashboard: Provides the RHOAI dashboard.

  • feastoperator: Feast is an open-source feature store for machine learning. It helps manage, store, and serve features for training and inference, enabling feature reuse across ML projects. NOTE: Removed by default, must be explicitly enabled.

  • kserve: RHOAI uses KServe to serve large language models that can scale based on demand.

  • kueue: Kueue is a Kubernetes-native job queueing system that provides quota management, resource sharing, and prioritization for batch workloads. It supports configuration of default cluster and local queue names. NOTE: It is installed separately as an OpenShift component, which is why it should be set to Unmanaged.

  • llamastackoperator: Llama Stack is a set of APIs and tools for building AI applications with Meta’s Llama models. This operator helps deploy and manage Llama-based AI applications on OpenShift. NOTE: Removed by default, must be explicitly enabled.

  • modelregistry: Model Registry provides a central repository for data scientists to store, version, and manage machine learning models. It enables model governance, lineage tracking, and sharing across teams. Requires specifying a registriesNamespace for model registry deployments.

  • ray: Component to run the data science code in a distributed manner.

  • trainingoperator: Training Operator provides Kubernetes-native support for distributed training of machine learning models using popular frameworks like PyTorch, TensorFlow, and MPI.

  • trustyai: TrustyAI provides AI explainability and fairness monitoring capabilities. It helps data scientists and ML engineers understand model behavior, detect bias, and ensure AI transparency and accountability.

  • workbenches: Workbenches are containerized and isolated working environments for data scientists to examine data and work with data models. Data scientists can create workbenches from an existing notebook container image to access its resources and properties. Workbenches are associated with container storage to prevent data loss when the workbench container is restarted or deleted.
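
To illustrate how these components are toggled, a trimmed DataScienceCluster manifest might look like the following sketch (illustrative only: the actual manifest in this repo is managed through GitOps, and component names and fields track the installed RHOAI version):

oc apply -f - <<EOF
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc
spec:
  components:
    aipipelines:
      managementState: Managed
    dashboard:
      managementState: Managed
    kserve:
      managementState: Managed
    kueue:
      managementState: Unmanaged    # installed separately as the Red Hat Build of Kueue
    modelregistry:
      managementState: Managed
      registriesNamespace: rhoai-model-registries
    ray:
      managementState: Managed
    trainingoperator:
      managementState: Managed
    trustyai:
      managementState: Managed
    workbenches:
      managementState: Managed
    feastoperator:
      managementState: Removed      # enable explicitly if you need a feature store
    llamastackoperator:
      managementState: Removed      # enable explicitly if you need Llama Stack
EOF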

2.1. Deprecated Components

The following components have been deprecated and are no longer available in the DataScienceCluster configuration:

  • codeflare (Deprecated): Codeflare was an IBM software stack for developing and scaling machine-learning and Python workloads. Its functionality has been integrated into other components like Ray and the Training Operator.

  • datasciencepipelines (Renamed): This component has been renamed to aipipelines. Update your configurations accordingly.

  • modelmeshserving (Deprecated): ModelMesh Serving was used for serving small and medium size models. Its functionality has been consolidated into kserve, which now handles model serving for all model sizes.

3. Installation

Installing RHOAI is not as simple as installing and configuring other operators on OpenShift. This product provides integration with hardware like NVIDIA and Intel GPUs, automation of ML workflows and AI training, and deployment of LLMs. For that reason, I’ve created an auto-install.sh script that will do everything for you:

  1. If the installation is IPI AWS, it creates MachineSets for nodes with NVIDIA GPUs (currently g5.4xlarge).

  2. Installs all the operators that RHOAI depends on:

    • Red Hat OpenShift Leader Worker Set Operator to manage leader worker sets. This is a requirement for the llm-d feature.

    • Red Hat Build of Kueue to manage distributed workloads. This is a requirement for the GPU-as-a-Service (GPUaaS) feature.

    • Node Feature Discovery and NVIDIA GPU Operator to discover and configure nodes with GPUs.

    • Authorino, to enable token authorization for models deployed with RHOAI.

    • Note: Single-Model Serving (ModelMesh Serving) is no longer supported in RHOAI 3.0, so the Service Mesh 2.x and Serverless operators are not installed as prerequisites.

  3. Optionally installs and configures OpenShift Data Foundation (ODF) in Multicloud Object Gateway (MCG) mode. This is a lightweight alternative that lets us use AWS S3 object storage the same way we would later use object storage on bare metal with ODF.

  4. Installs the actual RHOAI operator and configures the installation with some defaults, enabling NVIDIA acceleration and the llm-d feature.

  5. Deploys a new Data Science Project called RHOAI Playground, enabling pipelines and deploying a basic notebook for testing.
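
Once the script finishes, a quick sanity check might look like this (a hedged sketch; exact resource names depend on your cluster):

# RHOAI operator and DataScienceCluster status
oc get csv -n redhat-ods-operator
oc get datasciencecluster

# GPU MachineSet and nodes (IPI AWS installations only)
oc get machinesets -n openshift-machine-api
oc get nodes -l node-role.kubernetes.io/gpu-worker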

3.1. Installation on non-4.20 OCP

Some of the components deployed in this repo are bound to a specific version of OpenShift. If you want to deploy RHOAI on an older version (for example, 4.19), you have to make the following modifications (a sketch of these edits follows the list):

  • Change the image for the Node Feature Discovery container to the one for 4.19:

    • In ./rhoai-dependencies/operator-nfd/nodefeaturediscovery-nfd-instance.yaml, the .spec.operand.image field should have value registry.redhat.io/openshift4/ose-node-feature-discovery-rhel9:v4.19.

  • Change the channel of ODF:

    • In ./ocp-odf/odf-operator/sub-odf-operator.yaml, the value of .spec.channel field should be stable-4.19.
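
A minimal sketch of those two edits, assuming the mikefarah yq v4 CLI is available:

# Point Node Feature Discovery at the 4.19 operand image
yq -i '.spec.operand.image = "registry.redhat.io/openshift4/ose-node-feature-discovery-rhel9:v4.19"' \
  ./rhoai-dependencies/operator-nfd/nodefeaturediscovery-nfd-instance.yaml

# Switch the ODF subscription channel to stable-4.19
yq -i '.spec.channel = "stable-4.19"' \
  ./ocp-odf/odf-operator/sub-odf-operator.yaml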

3.2. Let’s install!!

💡 Tip: The script contains many tasks divided into clear blocks with comments. Use the environment variables, or comment out the blocks you are not interested in.

To automate it all, the script relies on OpenShift GitOps (ArgoCD), so you will need to have it installed before executing it. Check out my automated installation in the alvarolop/ocp-gitops-playground GitHub repository.

Now, log in to the cluster and just execute the script:

./auto-install.sh

4. Things you should know!

4.1. NVIDIA GPU nodes

Most of the activities related to RHOAI require GPU acceleration. For that purpose, we add NVIDIA GPU nodes during the installation process. In this chapter, I collect some information that might be useful for you.

In this automation, we are currently using the AWS g5.2xlarge instance type, which, according to the AWS documentation:

Amazon EC2 G5 instances are designed to accelerate graphics-intensive applications and machine learning inference. They can also be used to train simple to moderately complex machine learning models.

How to know that a node has NVIDIA GPUs using NodeFeatureDiscovery?

The output of the following command will only be visible when you have applied the ArgoCD Application and the Node Feature Discovery operator has scanned the OpenShift nodes:

oc describe node | egrep 'Roles|pci'
Roles:              control-plane,master
Roles:              worker
                    feature.node.kubernetes.io/pci-1d0f.present=true
Roles:              gpu-worker,worker
                    feature.node.kubernetes.io/pci-10de.present=true
                    feature.node.kubernetes.io/pci-1d0f.present=true
Roles:              control-plane,master
Roles:              control-plane,master

pci-10de is the PCI vendor ID that is assigned to NVIDIA.
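
Based on that label, you can list only the nodes with NVIDIA GPUs (a small sketch using the NFD label shown above):

oc get nodes -l feature.node.kubernetes.io/pci-10de.present=true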

The NVIDIA GPU Operator automates the management of all NVIDIA software components needed to provision GPUs. These components include the NVIDIA drivers (to enable CUDA), the Kubernetes device plugin for GPUs, the NVIDIA Container Runtime, automatic node labelling, DCGM-based monitoring, and others.

After configuring the Node Feature Discovery Operator and the NVIDIA GPU Operator using GitOps, you need to confirm that the NVIDIA operator is correctly retrieving the GPU information. You can use the following command to confirm that OpenShift is correctly configured:

oc exec -it -n nvidia-gpu-operator $(oc get pod -o wide -l openshift.driver-toolkit=true -o jsonpath="{.items[0].metadata.name}" -n nvidia-gpu-operator) -- nvidia-smi

The output should look like this:

Sat Oct 26 08:47:06 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A10G                    On  |   00000000:00:1E.0 Off |                    0 |
|  0%   25C    P8             22W /  300W |       1MiB /  23028MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

If, due to a race condition, RHOAI does not detect that GPU worker, you might need to force it to recalculate. You can do so easily with the following command:

oc delete cm migration-gpu-status -n redhat-ods-applications; sleep 3; oc delete pods -l app=rhods-dashboard -n redhat-ods-applications

Wait a few seconds until the dashboard pods start again, and you will see that the NVIDIA GPU accelerator profile is now listed in the RHOAI web console.

4.2. NVIDIA GPU Partitioning

❗ If you want to implement this properly, please don’t miss reading this repo.

Partitioning allows for flexibility in resource management, enabling multiple applications to share a single GPU or dividing a large GPU into smaller, dedicated units for different tasks. For simplicity, and to make the most of the limited resources, I have enabled the time-slicing configuration. You can check the configuration in rhoai-dependencies/operator-nvidia-gpu.
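
For reference, NVIDIA time slicing is usually driven by a device-plugin ConfigMap referenced from the ClusterPolicy. The following is a hedged sketch of that pattern (the ConfigMap name, replica count, and ClusterPolicy name are illustrative; the real configuration used by this repo lives in rhoai-dependencies/operator-nvidia-gpu):

oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: nvidia-gpu-operator
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4
EOF

# Reference the ConfigMap from the ClusterPolicy device plugin
oc patch clusterpolicy gpu-cluster-policy --type merge \
  -p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config", "default": "any"}}}}'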

How to check that the configuration is applied?

oc get node --selector=nvidia.com/gpu.product="NVIDIA-A10G-SHARED" -o json  | jq '.items[0].metadata.labels' | grep nvidia

Also, you can check these two blog entries with an analysis from the RH Performance team about this topic:

4.3. Data Connection Pipelines S3 Bucket Secret

The DataSciencePipelinesApplication requires an S3-compatible storage solution to store artifacts generated by the pipeline. You can use any S3-compatible storage solution for data science pipelines, including AWS S3, OpenShift Data Foundation, or MinIO. The automation currently uses ODF with NooBaa to interact with the AWS S3 interface, so you won’t need to do anything. Nevertheless, if you decide to disable ODF, you will need to create buckets on AWS S3 manually, following this process:

  1. Define the configuration variables for AWS in a file dubbed aws-env-vars. You can use the same structure as in aws-env-vars.example (a hypothetical sketch is shown after this list).

  2. Execute the following command to interact with the AWS API:

    ./prerequisites/s3-bucket/create-aws-s3-bucket.sh
  3. Or execute the following command if you interact with MinIO:

    ./prerequisites/s3-bucket/create-minio-s3-bucket.sh
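
A hypothetical sketch of the aws-env-vars file (the variable names below are illustrative; the authoritative list is in aws-env-vars.example):

# aws-env-vars (hypothetical content)
AWS_ACCESS_KEY_ID=<your-access-key>
AWS_SECRET_ACCESS_KEY=<your-secret-key>
AWS_DEFAULT_REGION=eu-west-1
AWS_S3_BUCKET=rhoai-pipelines-artifacts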

4.4. Managing distributed workloads

You can use the distributed workloads feature to queue, scale, and manage the resources required to run data science workloads across multiple nodes in an OpenShift cluster simultaneously. These three components need to be enabled in the RHOAI installation configuration:

  • CodeFlare: Secures deployed Ray clusters and grants access to their URLs.

  • KubeRay: Manages remote Ray clusters on OpenShift for running distributed compute workloads.

  • Kueue: Manages quotas and how distributed workloads consume them, and manages the queueing of distributed workloads with respect to quotas.

If you want to try this feature, I recommend following the RH documentation, which points to the following Guided Demos.
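
For orientation, the Kueue side of this typically boils down to a ResourceFlavor, a ClusterQueue with quotas, and a LocalQueue in the Data Science Project namespace. The following is a hedged sketch (the queue names, quotas, and the rhoai-playground namespace are illustrative, not this repo's exact manifests):

oc apply -f - <<EOF
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cluster-queue
spec:
  namespaceSelector: {}    # accept workloads from every namespace
  resourceGroups:
    - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
      flavors:
        - name: default-flavor
          resources:
            - name: cpu
              nominalQuota: 8
            - name: memory
              nominalQuota: 32Gi
            - name: nvidia.com/gpu
              nominalQuota: 1
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: local-queue
  namespace: rhoai-playground    # hypothetical Data Science Project namespace
spec:
  clusterQueue: cluster-queue
EOF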

After everything is configured, you can use the model tuning example from the Helm chart to see some stats:

helm template ./rhoai-environment-chart \
    -s templates/modelTunning/cm-training-config.yaml \
    -s templates/modelTunning/cm-twitter-complaints.yaml \
    -s templates/modelTunning/pvc-trained-model.yaml \
    -s templates/modelTunning/pytorchjob-demo.yaml \
    --set modelTunning.enabled=true | oc apply -f -

You can also see some stats from the RHOAI dashboard:

Distributed Workload - Metrics
Distributed Workload - Status

4.5. Model Registry

OpenShift AI now includes the possibility to deploy a model registry to store community and customized AI models. This model registry uses a MySQL database as the backend to store metadata and artifacts from your applications. Once it is deployed, your training pipelines can add an extra step that pushes model metadata to the registry.

Using the RHOAI Model Registry, you have a centralized source of models as well as a simple way to deploy prepared models:

Model Registry - Dashboard

Here you can find examples of REST requests to query model metadata:

MODEL_REGISTRY_NAME=default
MODEL_REGISTRY_HOST=$(oc get route default-https -n rhoai-model-registries -o go-template='https://{{.spec.host}}')
TOKEN=$(oc whoami -t)

# List models
curl -s "$MODEL_REGISTRY_HOST/api/model_registry/v1alpha3/registered_models?pageSize=100&orderBy=ID&sortOrder=DESC" \
  -H "accept: application/json" \
  -H "Authorization: Bearer ${TOKEN}" | jq .

# List all model versions
MODEL_NAME="test"
MODEL_ID="4"

curl -s "$MODEL_REGISTRY_HOST/api/model_registry/v1alpha3/registered_model?name=${MODEL_NAME}&externalId=${MODEL_ID}" \
  -H "accept: application/json" \
  -H "Authorization: Bearer ${TOKEN}" | jq .

curl -s "$MODEL_REGISTRY_HOST/api/model_registry/v1alpha/registered_models/${MODEL_ID}/versions?name=${MODEL_NAME}&pageSize=100&orderBy=ID&sortOrder=DESC" \
  -H "accept: application/json" \
  -H "Authorization: Bearer ${TOKEN}" | jq .

If you want to try this feature, I recommend following the RH documentation.

5. Monitoring, Safety and Evaluation

  • Monitoring: To ensure the transparency, fairness, and reliability of your data science models in OpenShift AI, monitor them for bias and data drift. Configure and set up TrustyAI for your project, and then perform the following checks:

    • Bias: Check for unfair patterns or biases in data and model predictions to ensure your model’s decisions are unbiased.

    • Data drift: Detect changes in input data distributions over time by comparing the latest real-world data to the original training data. Comparing the data identifies shifts or deviations that could impact model performance, ensuring that the model remains accurate and reliable.

  • Safety: To ensure that your machine-learning models are transparent, fair, and reliable, use the TrustyAI Guardrails Orchestrator service, a tool to invoke detections on text generation inputs and outputs, as well as standalone detections.

  • Evaluation: To assess your OpenShift AI models for accuracy, relevance, and consistency, evaluate your AI systems and generate an analysis of your model’s abilities by using the following TrustyAI tools:

    • LM-Eval: Use TrustyAI to monitor your LLM against a range of different evaluation tasks and to ensure the accuracy and quality of its output. Features such as summarization, language toxicity, and question-answering accuracy are assessed to inform and improve your model parameters.

    • RAGAS: Use Retrieval-Augmented Generation Assessment (RAGAS) with TrustyAI to measure and improve the quality of your RAG systems in OpenShift AI. RAGAS provides objective metrics that assess retrieval quality, answer relevance, and factual consistency.

    • Llama Stack: Use Llama Stack components and providers with TrustyAI to evaluate and work with LLMs.
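
As an orientation, enabling TrustyAI bias and drift monitoring for a project usually means creating a TrustyAIService in that namespace. The following is a rough sketch (API version and field values are assumptions drawn from upstream TrustyAI examples, not this repo's manifests):

oc apply -f - <<EOF
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: TrustyAIService
metadata:
  name: trustyai-service
  namespace: rhoai-playground    # hypothetical Data Science Project namespace
spec:
  storage:
    format: PVC
    folder: /inputs
    size: 1Gi
  data:
    filename: data.csv
    format: CSV
  metrics:
    schedule: 5s
EOF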

6. Deploying an Inference Server

As the Model Registry is still Tech Preview, we keep documentation about how to manually sync models using an OCP Job and then serve them with OpenShift AI. You can use one of the following Applications, each of which points to a Helm chart that automates it:

mistral-7b:

oc apply -f application-serve-mistral-7b.yaml
sleep 4
oc create secret generic hf-creds --from-env-file=hf-creds -n mistral-7b

granite-1b-a400m:

oc apply -f application-serve-granite-1b-a400m.yaml
sleep 4
oc create secret generic hf-creds --from-env-file=hf-creds -n granite-1b-a400m

nomic-embed-text-v1:

oc apply -f application-serve-nomic-embed-text-v1.yaml
sleep 4
oc create secret generic hf-creds --from-env-file=hf-creds -n nomic-embed-text-v1

Testing LLM certificates:

# Retrieve certificates
openssl s_client -showcerts -connect mistral-7b.mistral-7b.svc.cluster.local:443 </dev/null

# Check models endpoint
curl --cacert /etc/pki/ca-trust/source/anchors/service-ca.crt https://mistral-7b.mistral-7b.svc.cluster.local:443/v1/models

# Check Completion (It might be /v1/chat/completions)
curl -s -X 'POST' https://mistral-7b.mistral-7b.svc.cluster.local/v1/completions -H 'Accept: application/json' -H 'Content-Type: application/json' -d '{"model": "mistral-7b","prompt": "San Francisco is a"}'

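# Chat completion (a hedged sketch: assumes the runtime exposes an OpenAI-compatible
# /v1/chat/completions endpoint, which depends on how the model is served)
curl -s -X 'POST' https://mistral-7b.mistral-7b.svc.cluster.local/v1/chat/completions -H 'Accept: application/json' -H 'Content-Type: application/json' -d '{"model": "mistral-7b", "messages": [{"role": "user", "content": "What is San Francisco known for?"}]}'
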
Embeddings:

curl -s -X 'POST' \
  "https://nomic-embed-text-v1.nomic-embed-text-v1.svc.cluster.local/v1/embeddings" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "nomic-embed-text-v1",
  "input": ["En un lugar de la Mancha..."]
}'

# API Endpoints:
# * Ollama => https://nomic-embed-text-v1.nomic-embed-text-v1.svc.cluster.local/api/embed
# * OpenAI => https://nomic-embed-text-v1.nomic-embed-text-v1.svc.cluster.local/embeddings

7. Extra Components

7.1. OpenShift Lightspeed

Red Hat OpenShift Lightspeed is a generative AI-powered virtual assistant for OpenShift Container Platform. Lightspeed functionality uses a natural-language interface in the OpenShift web console.

oc apply -f application-ocp-lightspeed.yaml

or you can deploy it manually with the following command:

oc apply -k components/ocp-lightspeed

7.2. LLS Playground

The Llama Stack Playground is an application that allows you to test the Llama Stack LLM and Agent capabilities.

cat application-lls-playground.yaml | \
  CLUSTER_DOMAIN=$(oc get dns.config/cluster -o jsonpath='{.spec.baseDomain}') \
  LLS_ENDPOINT="http://llama-stack-service.intelligent-cd.svc.cluster.local:8321" \
  envsubst | oc apply -f -

or you can deploy it manually with the following command:

helm template components/lls-playground \
  --set global.clusterDomain=$(oc get dns.config/cluster -o jsonpath='{.spec.baseDomain}') \
  --set llamaStack.endpoint="http://llama-stack-service.intelligent-cd.svc.cluster.local:8321" \
  | oc apply -f -

7.3. MinIO

This demo is fully oriented toward the default, production-ready capabilities provided by OpenShift. However, if your current deployment already uses MinIO and you cannot change it, you can optionally deploy a MinIO application in a separate namespace using the following ArgoCD application. This application is included in the auto-install.sh automation:

cat application-minio.yaml | \
    CLUSTER_DOMAIN=$(oc get dns.config/cluster -o jsonpath='{.spec.baseDomain}') \
    MINIO_NAMESPACE="minio" MINIO_SERVICE_NAME="minio" \
    MINIO_ADMIN_USERNAME="minio" MINIO_ADMIN_PASSWORD="minio123" \
    envsubst | oc apply -f -

or you can deploy it manually with the following command:

helm template components/minio \
    --set clusterDomain=$(oc get dns.config/cluster -o jsonpath='{.spec.baseDomain}') \
    --set namespace="minio" --set service.name="minio" \
    --set adminUser.username="minio" --set adminUser.password="minio123" | oc apply -f -

The username and password are minio / minio123.
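
If you need to create buckets on this MinIO instance, the MinIO client (mc) can be used against the exposed route; a small sketch (the route lookup simply picks the first route in the minio namespace, which is an assumption):

MINIO_HOST=$(oc get route -n minio -o jsonpath='{.items[0].spec.host}')
mc alias set rhoai-minio https://${MINIO_HOST} minio minio123
mc mb rhoai-minio/rhoai-pipelines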

7.4. Open WebUI

Open WebUI is an extensible, feature-rich, and user-friendly self-hosted AI platform designed to operate entirely offline. It supports various LLM runners like Ollama and OpenAI-compatible APIs, with a built-in inference engine for RAG, making it a powerful AI deployment solution.

cat application-open-webui.yaml | \
    CLUSTER_DOMAIN=$(oc get dns.config/cluster -o jsonpath='{.spec.baseDomain}') \
    LLM_INFERENCE_SERVICE_URL="https://mistral-7b.mistral-7b.svc.cluster.local/v1" \
    envsubst | oc apply -f -

or you can deploy it manually with the following command:

helm template components/open-webui --namespace="open-webui" \
    --set llmInferenceService.url="https://mistral-7b.mistral-7b.svc.cluster.local/v1" \
    --set clusterDomain=$(oc get dns.config/cluster -o jsonpath='{.spec.baseDomain}') \
    --set rag.enabled="true" | oc apply -f -

7.5. Milvus

Milvus is a vector database built for scalable similarity search. It is "open-source, highly scalable, and blazing fast". Milvus offers robust data modeling capabilities, enabling you to organize your unstructured or multi-modal data into structured collections.

Attu is an efficient open-source management tool for Milvus. It features an intuitive graphical user interface (GUI), allowing you to easily interact with your databases.

cat application-milvus.yaml | \
    CLUSTER_DOMAIN=$(oc get dns.config/cluster -o jsonpath='{.spec.baseDomain}') \
    envsubst | oc apply -f -

or you can deploy it manually with the following command:

helm template components/milvus --namespace="milvus" \
    --set clusterDomain=$(oc get dns.config/cluster -o jsonpath='{.spec.baseDomain}') | oc apply -f -

The username and password for the Attu GUI are root / Milvus.

8. Extra documentation
