Release 10.4
NetBackup™ Deployment Guide for Kubernetes Clusters
Legal Notice
Copyright © 2024 Veritas Technologies LLC. All rights reserved.
Veritas and the Veritas Logo are trademarks or registered trademarks of Veritas Technologies
LLC or its affiliates in the U.S. and other countries. Other names may be trademarks of their
respective owners.
This product may contain third-party software for which Veritas is required to provide attribution
to the third party (“Third-party Programs”). Some of the Third-party Programs are available
under open source or free software licenses. The License Agreement accompanying the
Software does not alter any rights or obligations you may have under those open source or
free software licenses. Refer to the Third-party Legal Notices document accompanying this
Veritas product or available at:
https://www.veritas.com/about/legal/license-agreements
The product described in this document is distributed under licenses restricting its use, copying,
distribution, and decompilation/reverse engineering. No part of this document may be
reproduced in any form by any means without prior written authorization of Veritas Technologies
LLC and its licensors, if any.
The Licensed Software and Documentation are deemed to be commercial computer software
as defined in FAR 12.212 and subject to restricted rights as defined in FAR Section 52.227-19
"Commercial Computer Software - Restricted Rights" and DFARS 227.7202, et seq.
"Commercial Computer Software and Commercial Computer Software Documentation," as
applicable, and any successor regulations, whether delivered by Veritas as on premises or
hosted services. Any use, modification, reproduction, release, performance, display or disclosure
of the Licensed Software and Documentation by the U.S. Government shall be solely in
accordance with the terms of this Agreement.
Veritas Technologies LLC
2625 Augustine Drive.
Santa Clara, CA 95054
http://www.veritas.com
Technical Support
Technical Support maintains support centers globally. Technical Support’s primary
role is to respond to specific queries about product features and functionality. The
Technical Support group also creates content for our online Knowledge Base. The
Technical Support group works collaboratively with the other functional areas within
the company to answer your questions in a timely fashion.
Our support offerings include the following:
■ A range of support options that give you the flexibility to select the right amount
of service for any size organization
■ Telephone and/or Web-based support that provides rapid response and
up-to-the-minute information
■ Upgrade assurance that delivers software upgrades
■ Global support purchased on a regional business hours or 24 hours a day, 7
days a week basis
■ Premium service offerings that include Account Management Services
For information about our support offerings, you can visit our website at the following
URL:
www.veritas.com/support
All support services will be delivered in accordance with your support agreement
and the then-current enterprise technical support policy.
Customer service
Customer service information is available at the following URL:
www.veritas.com/support
Customer Service is available to assist with non-technical questions, such as the
following types of issues:
■ Questions regarding product licensing or serialization
■ Product registration updates, such as address or name changes
■ General product information (features, language availability, local dealers)
■ Latest information about product updates and upgrades
■ Information about upgrade assurance and support contracts
■ Advice about technical support options
■ Nontechnical presales questions
■ Issues that are related to CD-ROMs, DVDs, or manuals
Support agreement resources
If you want to contact us regarding an existing support agreement, please contact
the support agreement administration team for your region as follows:
Japan [email protected]
Contents
■ Required terminology
This guide describes how to deploy, configure, and remove the Cloud Scale
components using the environment operators.
Supported platforms
■ For AWS cloud: Amazon Elastic Kubernetes Service
■ For Azure cloud: Azure Kubernetes Service
The number of pods that get created depends on the number of MSDP Scaleout
engines in a cluster. These pods are controlled by the MSDP operator.
■ 1 or 2 MSDP Scaleout engines: 1 pod
■ 3 or 4 MSDP Scaleout engines: 3 pods
■ 5 or more MSDP Scaleout engines: 5 pods
Required terminology
The following table describes important terms for NetBackup deployment on a
Kubernetes cluster. For more information, see the Kubernetes documentation.
Term Description

Pod
  A Pod is a group of one or more containers, with shared storage and network resources, and a specification for how to run the containers. For more information on Pods, see the Kubernetes Documentation.

Job
  Kubernetes Jobs ensure that one or more pods execute their commands and exit successfully. For more information on Jobs, see the Kubernetes Documentation.

Persistent Volume
  A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using storage classes. For more information on Persistent Volumes, see the Kubernetes Documentation.

Custom Resource
  A Custom Resource (CR) is an extension of the Kubernetes API that is not necessarily available in a default Kubernetes installation. For more information on Custom Resources, see the Kubernetes Documentation.

Custom Resource Definition
  The CustomResourceDefinition (CRD) API resource lets you define custom resources. For more information on CustomResourceDefinitions, see the Kubernetes Documentation.

ServiceAccount
  A service account provides an identity for processes that run in a Pod. For more information on configuring service accounts for Pods, see the Kubernetes Documentation.
■ Appropriate roles and Kubernetes cluster-specific permissions are set for the
cluster at the time of cluster creation.
■ After successful deployment of the primary and media servers, the operator
creates a custom Kubernetes role named ResourceName-admin, where
ResourceName is given in the primary server or media server CR specification.
The following permissions are provided in the respective namespaces:
This role can be assigned to the NetBackup Administrator to view the pods that
were created, and to execute into them. For more information on access
control, see the Kubernetes Access Control Documentation.
Note: Only one role is created if the primary and media servers are in the same
namespace with the same resource name prefix.
■ (AKS-specific only) Your AKS cluster must have RBAC enabled. To view
the permissions set for the AKS cluster, use one of the following methods and
verify that enableRBAC is set to true:
■ Run the following command:
az resource show -g <resource group name> -n <cluster name> --resource-type Microsoft.ContainerService/ManagedClusters --query properties.enableRBAC
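For instance, with placeholder resource group and cluster names (not from this guide), the check and its expected output might look like this:
# Placeholder names; substitute your own resource group and cluster.
az resource show -g my-rg -n my-aks-cluster --resource-type Microsoft.ContainerService/ManagedClusters --query properties.enableRBAC
# Expected output when RBAC is enabled:
# true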
Table 1-2
Resource Name: PersistentVolume
Allowed Operations: Delete, Get, List, Patch, Update, Watch
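For illustration only, the permissions in Table 1-2 correspond to RBAC rules like the following sketch (this fragment is assumed, not taken from the actual generated role):
rules:
- apiGroups: [""]   # core API group
  resources: ["persistentvolumes"]
  verbs: ["delete", "get", "list", "patch", "update", "watch"]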
■ Chapter 2. Prerequisites
■ Chapter 4. Configurations
AKS-specific requirements
Use the following checklist to prepare the AKS for installation.
■ Your Azure Kubernetes cluster must be created with appropriate network and
configuration settings.
For a complete list of supported Kubernetes cluster versions, see the NetBackup
Compatibility List for all Versions.
■ While creating the cluster, assign appropriate roles and permissions.
Refer to the 'Concepts - Access and identity in Azure Kubernetes Services
(AKS)' section in Microsoft Azure Documentation.
■ Use an existing Azure container registry or create a new one. Your Kubernetes
cluster must be able to access this registry to pull the images from the container
Prerequisites 25
Preparing the environment for NetBackup installation on Kubernetes cluster
registry. For more information on the Azure container registry, see 'Azure
Container Registry documentation' section in Microsoft Azure Documentation.
■ You can deploy the Primary server and Media server installations on the same
node pool (node). For optimal performance, it is recommended to create
separate node pools. Select the Scale method as Autoscale. The autoscaling
feature allows your node pool to scale dynamically by provisioning and
de-provisioning nodes automatically as required.
■ A dedicated node pool for Primary server must be created in Azure Kubernetes
cluster.
The following table lists the node configuration for the primary and media servers.
vCPU: 16
RAM: 64 GiB
Number of disks/node: 1
Medium (8 nodes): 8 TB
■ Another dedicated node pool must be created for Snapshot Manager (if it has
to be deployed) with auto scaling enabled.
Following is the minimum configuration required for the Snapshot Manager data
plane node pool:
RAM: 8 GB
Following are the different scenarios for how NetBackup Snapshot Manager
calculates the number of jobs that can run at a given point in time, based on
the above-mentioned formula:
RAM: 8 GB
RAM: 16 GB
■ All the nodes in the node pool must be running the Linux operating system.
The Linux operating system is supported only with default settings.
■ Taints and tolerations allow you to mark (taint) a node so that no pods can
schedule onto it unless a pod explicitly tolerates the taint. Marking nodes instead
of pods (as in node affinity/anti-affinity) is particularly useful when most pods
in the cluster must avoid scheduling onto the node.
Taints are set on the node pool while creating the node pool in the cluster.
Tolerations are set on the pods, as in the sketch below.
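A minimal pod-side toleration sketch, assuming a hypothetical taint key nbpool with value nbnodes (illustrative names, not mandated by this guide):
tolerations:
- key: "nbpool"
  operator: "Equal"
  value: "nbnodes"
  effect: "NoSchedule"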
■ If you want to use static private IPs and fully qualified domain names for the
load balancer service, private IP addresses and FQDNs must be created in AKS
before deployment.
■ If you want to bind the load balancer service IPs to a specific subnet, the subnet
must be created in AKS and its name must be updated in the annotations key
in the networkLoadBalancer section of the custom resource (CR).
For more information on the network configuration for a load balancer service,
refer to the How-to-Guide section of the Microsoft Azure Documentation.
For more information on managing the load balancer service, See “About the
Load Balancer service” on page 170.
■ Create a storage class of the Azure Files storage type with the file.csi.azure.com
provisioner that allows volume expansion. It must be in the LRS category with
Premium SSD. It is recommended that the storage class has the Retain reclaim
policy. Such a storage class can be used for the primary server, because the
primary server supports Azure premium files storage only for the catalog volume.
For more information on Azure premium files, see the 'Azure Files CSI driver'
section of the Microsoft Azure Documentation.
For example:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: {{ custom-storage-class-name }}
provisioner: file.csi.azure.com
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
parameters:
  storageaccounttype: Premium_LRS
  protocol: nfs
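After saving the manifest (the file name below is hypothetical), you can create the storage class and verify that volume expansion is allowed:
kubectl apply -f azure-files-premium-sc.yaml
kubectl get storageclass <custom-storage-class-name> -o jsonpath='{.allowVolumeExpansion}'
# Expected output: true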
EKS-specific requirements
1 Create a Kubernetes cluster with the following guidelines:
■ Use Kubernetes version 1.27 onwards.
■ AWS default CNI is used during cluster creation.
■ Create a nodegroup with only one availability zone. The instance type should
be at least the m5.4xlarge configuration, and the size of the EBS volume
attached to each node should be more than 100 GB (a sketch follows the note
below).
The nodepool uses the AWS manual or autoscaling group feature, which allows
your nodepool to scale by provisioning and de-provisioning nodes automatically
as required.
Note: All the nodes in the node group must be running the Linux operating
system.
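As a hedged sketch, a nodegroup meeting these minimums could be created with eksctl; the cluster name, region, zone, and sizes below are placeholders:
eksctl create nodegroup --cluster my-cluster --region us-east-2 --name nb-nodes --node-type m5.4xlarge --node-volume-size 150 --nodes 2 --nodes-min 1 --nodes-max 4 --node-zones us-east-2a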
2 Use an existing AWS Elastic Container Registry or create a new one and
ensure that the EKS has full access to pull images from the elastic container
registry.
3 It is recommended to create a separate node pool for the Media server
installation, with the autoscaler add-on installed in the cluster. The autoscaling
feature allows your node pool to scale dynamically by provisioning and
de-provisioning nodes automatically as required.
4 A dedicated node pool for Primary server must be created in Amazon Elastic
Kubernetes Services cluster.
The following table lists the node configuration for the primary and media
servers.
vCPU: 16
RAM: 64 GiB
Number of disks/node: 1
Medium (8 nodes): 8 TB
5 Another dedicated node pool must be created for Snapshot Manager (if it has
to be deployed) with auto scaling enabled.
Following is the minimum configuration required for the Snapshot Manager data
plane node pool:
RAM: 8 GB
Following are the different scenarios for how NetBackup Snapshot Manager
calculates the number of jobs that can run at a given point in time, based on
the above-mentioned formula:
■ For DBPaaS Workload
Note: The following configuration is advised because the CPU credit limit was
reached in the T-series workload.
RAM: 32 GB
RAM: 8 GB
RAM: 16 GB
6 Taints and tolerations allow you to mark (taint) a node so that no pods can
schedule onto it unless a pod explicitly tolerates the taint. Marking nodes instead
of pods (as in node affinity/anti-affinity) is particularly useful when most pods
in the cluster must avoid scheduling onto the node.
Taints are set on the node group while creating the node group in the cluster.
Tolerations are set on the pods; a tainting sketch follows this step.
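For illustration only, a taint can also be applied to an existing node with kubectl; the node name, key, and value are placeholders:
kubectl taint nodes <node-name> nbpool=nbnodes:NoSchedule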
7 Deploy the AWS Load Balancer Controller add-on in the cluster.
For more information on installing the add-on, see 'Installing the AWS Load
Balancer Controller add-on' section of the Amazon EKS User Guide.
8 Install cert-manager and trust-manager as follows:
■ Install cert-manager by using the following command:
$ kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.3/cert-manager.yaml
For more information, see the documentation for cert-manager installation.
■ Install trust-manager by using the following commands:
$ helm repo add jetstack https://charts.jetstack.io --force-update
$ kubectl create namespace trust-manager
$ helm upgrade -i -n trust-manager trust-manager jetstack/trust-manager --set app.trust.namespace=netbackup --version v0.7.0 --wait
9 The FQDN that will be provided in primary server CR and media server CR
specifications in networkLoadBalancer section must be DNS resolvable to the
provided IP address.
10 Amazon Elastic File System (Amazon EFS) for shared persistence storage.
To create EFS for primary server, see 'Create your Amazon EFS file system'
section of the Amazon EKS User Guide.
The EFS configuration can be as follows; you can update the Throughput mode
as required:
Performance mode: General Purpose
Throughput mode: Bursting (256 MiB/s)
Availability zone: Regional
Note: Ensure that you install the Amazon EFS CSI driver add-on in the cluster.
For more information on installing the Amazon EFS CSI driver, see the 'Amazon
EFS CSI driver' section of the Amazon EKS User Guide.
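A minimal AWS CLI sketch for creating such a file system (the region and tag values are placeholders):
aws efs create-file-system --performance-mode generalPurpose --throughput-mode bursting --region us-east-2 --tags Key=Name,Value=netbackup-primary-efs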
11 If the NetBackup client is outside the VPC, or if you want to access the Web
UI from outside the VPC, then the NetBackup client CIDR must be added with
all NetBackup ports to the inbound rules of the cluster security group. See
“About the Load Balancer service” on page 170 for more information on
NetBackup ports.
■ To obtain the cluster security group, run the following command:
aws eks describe-cluster --name <my-cluster> --query cluster.resourcesVpcConfig.clusterSecurityGroupId
■ To add an inbound rule to the security group, see the 'Add rules to a security
group' section of the Amazon EKS User Guide. An illustrative CLI sketch follows.
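As an illustrative sketch, one inbound rule can be added from the CLI; the security group ID and CIDR are placeholders, and 1556 is the standard NetBackup PBX port:
aws ec2 authorize-security-group-ingress --group-id <sg-id> --protocol tcp --port 1556 --cidr <client-cidr>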
12 Create a storage class backed by the Amazon EBS CSI driver for the data
and log volumes. For example:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  name: ebs-csi-storage-class
parameters:
  fsType: ext4
  type: gp2
provisioner: ebs.csi.aws.com
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
Note: Ensure that you install the Amazon EBS CSI driver add-on in the cluster.
For more information on installing the Amazon EBS CSI driver, see the
'Managing the Amazon EBS CSI driver as an Amazon EKS add-on' and
'Amazon EBS CSI driver' sections of the Amazon EKS User Guide.
13 The EFS-based PV must be specified for the Primary server catalog volume
with ReclaimPolicy=Retain, as in the sketch below.
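A minimal sketch of such a statically provisioned PV, assuming a placeholder PV name and EFS file system ID:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: primary-catalog-pv   # hypothetical name
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-0123456789abcdef0   # placeholder EFS file system ID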
Host-specific requirements
Use the following checklist to address the prerequisites on the system that you want
to use as a NetBackup host that connects to the AKS/EKS cluster.
AKS-specific
■ Linux operating system: For a complete list of compatible Linux operating
systems, refer to the Software Compatibility List (SCL) at:
NetBackup Compatibility List for all Versions
■ Install Docker on the host to install NetBackup container images through tar,
and start the container service.
Install Docker Engine
■ Prepare the host to manage the AKS cluster.
■ Install Azure CLI.
For more information, see 'Install the Azure CLI on Linux' section of the
Microsoft Azure Documentation.
■ Install Kubernetes CLI
For more information, see 'Install and Set Up kubectl on Linux' section of
the Kubernetes Documentation.
■ Log in to the Azure environment to access the Kubernetes cluster by running
these commands in the Azure CLI:
az login --identity
az account set --subscription <subscriptionID>
az aks get-credentials --resource-group <resource_group_name> --name <cluster_name>
az resource list -n $cluster_name --query [*].identity.principalId --out tsv
az role assignment create --assignee <identity.principalId> --role 'Contributor' --scope /subscriptions/$subscription_id/resourceGroups/NBUX-QA-BiDi-RG/providers/Microsoft.Network/virtualNetworks/NBUX-QA-BiDiNet01/subnets/$subnet
az login --scope https://graph.microsoft.com//.default
EKS-specific
■ Install AWS CLI.
For more information on installing the AWS CLI, see the 'Install or update the
latest version of the AWS CLI' section of the AWS Command Line Interface
User Guide.
■ Install Kubectl CLI.
For more information on installing the Kubectl CLI, see 'Installing kubectl' section
of the Amazon EKS User Guide.
■ Configure Docker to enable pushing the container images to the container
registry.
■ Create the OIDC provider for the AWS EKS cluster.
For more information on creating the OIDC provider, see 'Create an IAM OIDC
provider for your cluster' section of the Amazon EKS User Guide.
■ Create an IAM service account for the AWS EKS cluster.
For more information on creating an IAM service account, see 'Configuring a
Kubernetes service account to assume an IAM role' section of the Amazon EKS
User Guide.
■ If an IAM role needs access to the EKS cluster, run the following command
from a system that already has access to the EKS cluster:
kubectl edit -n kube-system configmap/aws-auth
For more information on creating an IAM role, see 'Enabling IAM user and role
access to your cluster'. A hypothetical mapRoles sketch follows.
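For illustration, granting a role access typically means adding a mapRoles entry to the aws-auth ConfigMap; the role ARN and username below are placeholders:
# Fragment of the aws-auth ConfigMap data section
mapRoles: |
  - rolearn: arn:aws:iam::111122223333:role/example-eks-admin
    username: example-admin
    groups:
      - system:masters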
■ Log in to the AWS environment to access the Kubernetes cluster by running
the following command in the AWS CLI:
aws eks --region <region_name> update-kubeconfig --name <cluster_name>
■ Free space of approximately 13 GB in the location where you copy and extract
the product installation TAR package file. If using Docker locally, approximately
8 GB should be available in the /var/lib/docker location so that the images can
be loaded to the Docker cache before being pushed to the container registry.
■ The AWS EFS CSI driver should be installed for static PV/PVC creation of the
primary server catalog volume.
■ At least one storage class must be backed by the Azure disk CSI storage driver
"disk.csi.azure.com" and allow volume expansion. It must be in the LRS
category with Premium SSD; for example, the built-in storage class
"managed-csi-premium". It is recommended that the storage class has the
"Retain" reclaim policy.
The Docker storage size must be more than 6 GB. The version of kubectl
must be v1.19.x or later. The version of the Azure CLI must meet the AKS
cluster requirements.
■ If AKS is a private cluster, see 'Create a private Azure Kubernetes Service
cluster'.
If internal IPs are used, reserve the internal IPs (avoid IPs that are reserved
by other systems) for Snapshot Manager and add forward and reverse DNS
records for all of them in your DNS configuration.
Static public IPs can be used but are not recommended.
HOST_HAS_NAT_ENDPOINTS = YES
net.ipv4.tcp_keepalive_time = 120
net.core.somaxconn = 1024
Tune the max open files limit to 1048576 if you run concurrent jobs.
Config-Checker utility
How does the Config-Checker utility work
The Config-Checker utility performs checks on the deployment environment to verify
that the environment meets the requirements, before starting the primary server
and media server deployments.
How the Config-Checker works:
■ RetainReclaimPolicy check:
This check verifies that the storage classes used for PVC creation in the CR
have reclaim policy as Retain. The check fails if any of the storage classes do
not have the Retain reclaim policy.
For more information, see the 'Persistent Volumes Reclaiming' section of the
Kubernetes Documentation.
■ MinimumVolumeSize check:
This check verifies that the PVC storage capacity meets the minimum required
volume size for each volume in the CR. The check fails if any of the volume
capacity sizes do not meet the requirements.
Following are the minimum volume size requirements:
■ Primary server:
■ Data volume size: 30Gi
■ Catalog volume size: 100Gi
■ Log volume size: 30Gi
■ Media server:
■ Data volume size: 50Gi
■ Log volume size: 30Gi
■ Provisioner check:
EKS-specific only
■ Primary server: This check verifies that the storage type provided for the
data and log volumes is Amazon Elastic Block Store (Amazon EBS). If any
other driver type is used, the Config-Checker fails.
■ Media server: This check verifies that the storage type provided for the data
and log volumes is Amazon Elastic Block Store (Amazon EBS). The
Config-Checker fails if this requirement is not met for the media server.
AKS-specific only
■ This check verifies that the provisioner type used in defining the storage
class for the Media server volumes (that is, the data and log volumes) is
Azure disk and not Azure files. If not, the Config-Checker fails.
■ AWS Load Balancer Controller check:
(EKS-specific only) This check verifies that the AWS Load Balancer Controller
add-on is installed in the cluster. If this check fails, you must deploy the AWS
Load Balancer Controller add-on.
■ Cluster Autoscaler check:
This autoscaler is required for autoscaling in the cluster. If the autoscaler is not
configured, the Config-Checker displays a warning message and continues
with the deployment of the NetBackup servers.
(EKS-specific only) This check verifies whether the AWS Autoscaler add-on is
installed in the cluster. For more information, refer to the 'Autoscaling' section
of the Amazon EKS User Guide.
■ Volume expansion check:
This check verifies that the storage classes given for the Primary server data
and log volumes and for the Media server data and log volumes have
AllowVolumeExpansion = true. If the Config-Checker fails this check, it gives
a warning message and continues with the deployment of the NetBackup media
servers.
■ Following are the Config-Checker modes that can be specified in the Primary
and Media CR:
■ Default: This mode executes the Config-Checker. If the execution is
successful, the Primary and Media CRs deployment is started.
■ The status of the Config-Checker can be retrieved from the primary server and
media server CRs by using the following command:
kubectl describe <PrimaryServer/MediaServer> <CR name> -n <namespace>
For example: kubectl describe primaryservers environment-sample -n test
■ Apply the CR again. Add the required data that was deleted earlier at the
correct location, save it, and apply the YAML using the kubectl apply -f
<environment.yaml> command.
Note: Migration takes longer depending on the catalog data size.
■ The execution summary of the data migration can be retrieved from the
migration pod logs using the following command:
kubectl logs <migration-pod-name> -n <netbackup-environment-namespace>
This summary can also be retrieved from the operator pod logs using the
following command:
kubectl logs <netbackup-operator-pod-name> -n <netbackup-environment-namespace>
■ The status of the data migration can be retrieved from the primary server CR
by using the following command:
kubectl describe <PrimaryServer> <CR name> -n <netbackup-environment-namespace>
Following are the data migration statuses:
■ Success: Indicates that all necessary conditions for the migration of the Primary
server passed.
■ Failed: Indicates that some or all necessary conditions for the migration of the
Primary server failed.
■ Running: Indicates that the migration is in the running state for the Primary
server.
■ If the Data migration execution status is failed, you can check the migration job
logs using the following command:
kubectl logs <migration-pod-name> -n
<netbackup-environment-namespace>
Review the error codes and error messages pertaining to the failure and update
the primary server CR with the correct configuration details to resolve the errors.
For more information about the error codes, refer to NetBackup™ Status Codes
Reference Guide.
■ If any input value is not in the required form, the webhook displays an error
and prevents the creation of the environment.
■ For primary server deployment, the following webhook validations have been
implemented:
■ Validate RetainReclaimPolicy: This check verifies that the storage classes
used for PVC creation in the CR have the Retain reclaim policy. The check
fails if any of the storage classes do not have the Retain reclaim policy.
■ Validate MinimumVolumeSize: This check verifies that the PVC storage
capacity meets the minimum required volume size for each volume in the
CR. The check fails if any of the volume capacity sizes do not meet the
following requirements for the Primary server:
■ Catalog volume size: 100Gi
■ Log volume size: 30Gi
■ Data volume size: 30Gi
■ Validate CSI driver: This check verifies that the PV created is provisioned
using the efs.csi.aws.com driver, that is, AWS Elastic File System (EFS),
for the catalog volume. If any other driver type is used, the webhook fails.
■ Validate AWS Elastic File System (EFS) controller add-on: Verifies whether
the AWS EFS controller add-on is installed on the cluster. This controller is
required to use EFS as persistent storage for the pods running on the cluster.
The webhook checks that the EFS controller add-on is installed and running
properly; if not, a validation error is displayed.
■ AWS Load Balancer Controller add-on check: Verifies whether the AWS
Load Balancer Controller add-on is installed on the cluster. This controller
is required to use a load balancer in the cluster. The webhook checks that
the load balancer controller add-on is installed and running properly; if not,
a validation error is displayed.
■ The webhook validates each check in sequence. If any one of the validations
fails, a validation error is displayed and the execution stops.
■ The error must be fixed and the environment.yaml file must be applied again
so that the next validation check is performed.
■ The environment is created only after all webhook validations pass.
Note: The use of a private cluster ensures that the network traffic between your
API server and node pools remains on the private network only.
■ For AWS:
The node size in AWS must be selected depending on the ENIs available with
the node type. For more information on changing the value of max pods per
node in AWS, refer to the AWS Documentation.
Note: If the max pods per node are not sufficient, then max jobs per node can
be reduced as mentioned in the 'max_jobs tunable' content in the following
section.
Pool settings
■ NetBackup pool: Used for deployment of NetBackup primary services along
with Snapshot Manager control plane services.
Minimum CPU requirement and Node size RAM: 4 CPU and 16 GB RAM
■ cpdata pool: Used for deployment of Snapshot Manager data plane (dynamically
created) services.
<= 2 TB 8 2
Note: ** If the customer has hosts of distinct sizes to be protected, consider the
larger VMs as the average VM size.
■ Media pool: CPU requirement and Node size RAM: 4 CPU and 16 GB RAM
■ MSDP pool: CPU requirement and Node size RAM: 4 CPU and 16 GB RAM
~$ kubectl describe cm flexsnap-conf
Name:         flexsnap-conf
Namespace:    nbux-002522
Labels:       <none>
Annotations:  <none>

Data
====
flexsnap.conf:
----
[agent]
id = agent.8308b7c831af4b0388fdd7f1d91541e0

[capability_limit]
max_jobs=16
■ Tuning the account rate limit: For BFS performance improvement, the API limits
per AWS account can be updated as per the following formula. For example:
■ The default theoretical speed for the account is 43 TB/day (1000 requests
per sec x 86400 sec in a day x 512 KB block size).
■ Assume a PP schedule frequency of 1 per day and each VM around 1 TB
in size.
■ The theoretical maximum number of full backups per day, if the backup
window is the full day, is then 43 VMs/day.
For AKS
1. Permissions and role assignment: Before plugin configuration, the
Kubernetes cluster requires permissions to be assigned to the System Managed
Identity as follows:
■ Obtain the name of the infrastructure resource group for the Kubernetes
cluster.
■ Enable the System Managed Identity on the identified nodepool (nbupool).
■ Assign the role having the Snapshot Manager permission.
■ Add a taint with the same key and value that are used for the label in the
above step, with the effect NoSchedule (see the Azure CLI sketch below).
For example: key = nbpool, value = nbnodes, effect = NoSchedule
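A hedged sketch of setting this label and taint when adding the node pool with the Azure CLI; the resource group, cluster, and pool names are placeholders:
az aks nodepool add --resource-group my-rg --cluster-name my-aks --name nbupool --labels nbpool=nbnodes --node-taints "nbpool=nbnodes:NoSchedule"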
■ Access to a container registry that the Kubernetes cluster can access, such as
an Amazon Elastic Container Registry.
■ The AWS network load balancer controller add-on must be installed to use
network load balancer capabilities.
■ The AWS EFS CSI driver must be installed for statically provisioning the PV
or PVC in EFS for the primary server.
For more information on installing the load balancer controller add-on and the
EFS CSI driver, See “About the Load Balancer service” on page 170.
Chapter 3
Recommendations and
Limitations
This chapter includes the following topics:
■ Deploy the primary server custom resource and media server custom resource
in the same namespace.
■ If you want to edit a file that has a symbolic link on the primary server or media
server, ensure that you follow the symbolic link and edit the actual persisted
version of the file.
■ Specify a different block storage based volume to obtain good performance
when the nbdeployutil utility does not perform well on volumes based on the
following respective storage types:
(AKS-specific): Azure premium files
(EKS-specific): Amazon elastic files
■ Duplication job configuration recommendation:
While configuring the destination storage unit, manually select media servers
that are always up and running and would never scale in (by the media server
autoscaler). The number of media servers that are always up and running is
the same as the value of the minimumReplicas field in the CR.
When upgrading from an older version to NetBackup 10.3, ensure post-upgrade
that you manually select the media servers mentioned in the minimumReplicas
field in the CR. If the value of minimumReplicas is not specified, it is set to
the default value of 1.
■ Adjust the value of minimumReplicas field based on the backup environment
and requirements.
■ (AKS-specific)
■ Use Azure Premium storage for data volume in media server CR.
■ Use Azure Standard storage for log volume in media server CR.
■ For the primary server catalog volume, use Azure premium files as the
storage type, and for the media server volumes, use managed disk as the
storage type.
■ In case of upgrade and during migration, do not delete the Azure premium
files/Azure disk volume linked to the old PV used in the primary server
CR deployment until the migration completes successfully. Otherwise,
data loss can occur.
■ Do not skip the Config-Checker utility execution during NetBackup upgrade
or data migration.
(EKS-specific)
■ Use AWS Premium storage for the data volume in the media server CR.
■ Use AWS Standard storage for the log volume in the media server CR.
■ For the primary server catalog volume, use Amazon EFS as the storage
type. For the media server and primary server log and data volumes, use
Amazon EBS as the storage type.
■ In case of upgrade and during migration, do not delete the Amazon elastic
files linked to the old PV used in the primary server CR deployment until
the migration completes successfully. Otherwise, data loss can occur.
EKS-specific
■ (Applicable only for media servers) A storage class that has the storage type
EFS is not supported. When the Config-Checker runs the validation for checking
the storage type, the Config-Checker job fails if it detects the storage type as
EFS. If the Config-Checker is skipped, this validation is not run and there can
be issues in the deployment. There is no workaround for this limitation; you
must clean up the PVCs and CRs and reapply the CRs.
■ Initial configurations
■ Configuring NetBackup
Item Description

OCI images in the /images directory
  These are the docker images that are loaded and then copied to the container registry to run in Kubernetes. They include the NetBackup and MSDP Scaleout application images and the operator images.

Sample product (.yaml) files in the /samples directory
  You can use these as templates to define your NetBackup environment.

MSDP kubectl plug-in at /bin/kubectl-msdp
  Used to deploy MSDP Scaleout separately without the NetBackup operator.
  Note: Used for troubleshooting issues only.
Initial configurations
Creating Secrets
Perform the following steps to create Secrets
1 Create a Kubernetes namespace where your new NetBackup environment will
run. Run the command:
kubectl create namespace nb-example
where nb-example is the name of the namespace. The Primary, Media, and
MSDP Scaleout application namespace must be different from the one used
by the operators. It is recommended to use two namespaces: one for the
operators, and a second one for the applications.
2 Create a secret to hold the primary server credentials. Those credentials are
configured in the NetBackup primary server, and other resources in the
NetBackup environment use them to communicate with and configure the
primary server. The secret must include fields for `username` and `password`.
If you are creating the secret by YAML, the type should be Opaque or basic-auth.
For example:
apiVersion: v1
kind: Secret
metadata:
  name: primary-credentials
  namespace: nb-example
type: kubernetes.io/basic-auth
stringData:
  username: nbuser
  password: p@ssw0rd
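You can also create this secret using kubectl from the command line, following the same pattern as the later steps:
$ kubectl create secret generic primary-credentials --namespace nb-example --from-literal=username='nbuser' --from-literal=password='p@ssw0rd'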
3 Create a KMS DB secret to hold the Host Master Key ID (`HMKID`), Host Master
Key passphrase (`HMKpassphrase`), Key Protection Key ID (`KPKID`), and
Key Protection Key passphrase (`KPKpassphrase`) for the NetBackup Key
Management Service. If creating the secret by YAML, the type should be
Opaque. For example:
apiVersion: v1
kind: Secret
metadata:
  name: example-key-secret
  namespace: nb-example
type: Opaque
stringData:
  HMKID: HMKID
  HMKpassphrase: HMKpassphrase
  KPKID: KPKID
  KPKpassphrase: KPKpassphrase
You can also create a secret using kubectl from the command line:
$ kubectl create secret generic example-key-secret --namespace nb-namespace --from-literal=HMKID="HMKID" --from-literal=HMKpassphrase="HMKpassphrase" --from-literal=KPKID="KPKID" --from-literal=KPKpassphrase="KPKpassphrase"
4 Create a secret to hold the MSDP Scaleout credentials for the storage server.
The secret must include fields for `username` and `password` and must be
located in the same namespace as the Environment resource. If creating the
secret by YAML, the type should be Opaque or basic-auth. For example:
apiVersion: v1
kind: Secret
metadata:
  name: msdp-secret1
  namespace: nb-example
type: kubernetes.io/basic-auth
stringData:
  username: nbuser
  password: p@ssw0rd
You can also create a secret using kubectl from the command line:
$ kubectl create secret generic msdp-secret1 --namespace nb-example --from-literal=username='nbuser' --from-literal=password='p@ssw0rd'
Note: You can use the same secret for the primary server credentials (from
step 2) and the MSDP Scaleout credentials, so the following step is optional.
However, to use the primary server secret in an MSDP Scaleout, you must set
the credential.autoDelete property to false. The sample file includes an
example of setting the property. The default value is true, in which case the
secret may be deleted before all parts of the environment have finished using
it.
5 (Optional) Create a secret to hold the KMS key details. Specify KMS Key only
if the KMS Key Group does not already exist and you need to create one.
Note: When reusing storage from previous deployment, the KMS Key Group
and KMS Key may already exist. In this case, provide KMS Key Group only.
If creating the secret by YAML, the type should be Opaque. For example:
apiVersion: v1
kind: Secret
metadata:
  name: example-key-secret
  namespace: nb-example
type: Opaque
stringData:
  username: nbuser
  passphrase: 'test passphrase'
You can also create a secret using kubectl from the command line:
$ kubectl create secret generic example-key-secret --namespace nb-example --from-literal=username="nbuser" --from-literal=passphrase="test passphrase"
You may need this key for future data recovery. After you have successfully
deployed and saved the key details, it is recommended that you delete this
secret and the corresponding key info secret.
6 Create a secret to hold the MSDP S3 root credentials if you need MSDP S3
service. The secret must include accessKey and secretKey, and must be
located in the same namespace as the Environment resource.
■ accessKey must match the regex pattern ^[\w]+$ and have a length in the
range [16, 128].
■ secretKey must match the regex pattern ^[\w+\/]+$ and have a length in
the range [32, 128].
It is recommended that you generate random S3 root credentials. Run the
following command:
$ kubectl msdp generate-s3-secret --namespace nb-example --s3secret s3-secret1
Save the generated S3 root credentials at a secure place for later use.
7 Create the Snapshot Manager server secret using kubectl from the command
line:
kubectl create secret generic cp-creds --namespace netbackup
--from-literal=username="admin"
--from-literal=password="CloudPoint@123"
Parameter Description

namespace: example-ns
  Specify the namespace where all the NetBackup resources are managed. If not specified here, it is the current namespace when you run kubectl apply -f on this file.
Parameter Description

(AKS-specific) containerRegistry: example.azurecr.io
(EKS-specific) containerRegistry: example.dkr.ecr.us-east-2.amazonaws.com/exampleReg
  Specify a container registry that the cluster has access to. NetBackup images are pushed to this registry.

tag: 10.4
  This tag is used for all images in the environment. Specifying a `tag` value on a sub-resource affects the images for that sub-resource only. For example, if you apply an EEB that affects only primary servers, you might set the `primary.tag` to the custom tag of that EEB. The primary server runs with that image, but the media servers and MSDP scaleouts continue to run images tagged `10.4`. Beware that values that look like numbers are treated as numbers in YAML even though this field needs to be a string; quote this value to avoid misinterpretation.

paused: false
  Specify whether the NetBackup operator attempts to reconcile the differences between this YAML specification and the current Kubernetes cluster state. Only set it to true during maintenance.

configCheckMode: default
  This controls whether certain configuration restrictions are checked or enforced during setup. Other allowed values are skip and dryrun.

corePattern: /corefiles/core.%e.%p.%t
  Specify the path to use for storing core files in case of a crash.

(AKS-specific) loadBalancerAnnotations: service.beta.kubernetes.io/azure-load-balancer-internal-subnet: example-subnet
(EKS-specific) loadBalancerAnnotations: service.beta.kubernetes.io/aws-load-balancer-subnets: example-subnet1
  Specify the annotations to be added for the network load balancer service.

emailServerConfigmapName
  Name of the configmap that contains the required details to configure the email server in NetBackup.

dbSecretName
  Specify the name of the secret required for deployment of PostgreSQL as a container. This secret is created as part of the Helm installation of the PostgreSQL container.
Parameter Description

dbSecretProviderClass
  (Optional) Specify the name of the SecretProviderClass required for DBaaS deployment of PostgreSQL.
The following section describes Snapshot Manager related parameters. You may
also deploy without any Snapshot Manager. In that case, remove the cpServer
section entirely from the configuration file.
Parameter Description

containerRegistry
  (Optional) Specify a container registry that the cluster has access to. Snapshot Manager images are pushed to this registry, which overrides the one defined in the common environment parameters table above.

log.storageClassName
  Storage class for the log volume. It must be an EFS based storage class.

proxySettings: vx_http_proxy:
  Address to be used as the proxy for all HTTP connections. For example, "http://proxy.example.com:8080/"

proxySettings: vx_https_proxy:
  Address to be used as the proxy for all HTTPS connections. For example, "http://proxy.example.com:8080/"

proxySettings: vx_no_proxy:
  Addresses that are allowed to bypass the proxy server. You can specify host names, IP addresses, and domain names in this parameter. For example, "localhost,mycompany.com,169.254.169.254"
The following configurations apply to the primary server. The values specified in
the following table can override the values specified in the table above.

Parameter Description

tag: 10.4-special
  To use a different image tag specifically for the primary server, uncomment this value and provide the desired tag. This overrides the tag specified in the common section.

nodeSelector: labelKey: kubernetes.io/os, labelValue: linux
  Specify a key and value that identify nodes where the primary server pod runs.
  Note: This labelKey and labelValue must be the same label key:value pair used during cloud node creation, which would be used as a toleration for the primary server.

credSecretName: primary-credential-secret
  This determines the credentials for the primary server. Media servers use these credentials to register themselves with the primary server.

kmsDBSecret: kms-secret
  Secret name that contains the Host Master Key ID (HMKID), Host Master Key passphrase (HMKpassphrase), Key Protection Key ID (KPKID), and Key Protection Key passphrase (KPKpassphrase) for the NetBackup Key Management Service. The secret should be 'Opaque', and can be created either using a YAML or the following example command:
  kubectl create secret generic kms-secret --namespace nb-namespace --from-literal=HMKID="HMK@ID" --from-literal=HMKpassphrase="HMK@passphrase" --from-literal=KPKID="KPK@ID" --from-literal=KPKpassphrase="KPK@passphrase"

(AKS specific) autoVolumeExpansion
  Enables automatic monitoring of the catalog volume when set to true. For more information, see Reducing catalog storage management.

capacity: 30Gi
The following section describes the media server configurations. If you do not have
a media server, either remove this section from the configuration file entirely, or
define it as an empty list.

Parameter Description

tag: 10.4-special
  To use a different image tag specifically for the media servers, uncomment this value and provide the desired tag. This overrides the tag specified above in the common table.

capacity: 50Gi
  The minimum data size for a media server is 50 Gi.

(AKS-specific) storageClassName: managed-premium-nbux
(AKS-specific) storageClassName: managed-premium-nbux

ipList:
- ipAddr: 4.3.2.2
  fqdn: media1-1.example.com
- ipAddr: 4.3.2.3
  fqdn: media1-2.example.com
For more information on installing the EBS CSI driver, see 'Amazon EBS CSI
driver'. For example, for a gp3 storage class:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp3
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
allowVolumeExpansion: true
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
The following section describes MSDP-related parameters. You may also deploy
without any MSDP scaleouts. In that case, remove the msdpScaleouts section
entirely from the configuration file.
Parameter Description

tag: '20.4'
  This tag overrides the one defined in Table 1-3. It is necessary because the MSDP Scaleout images are shipped with tags different from the NetBackup primary and media images.

serviceIPFQDNs: ipAddr: 1.2.3.4, fqdn: dedupe1-1.example.com
  These are the IP addresses and host names of the MSDP Scaleout servers. The number of entries should match the number of replicas specified above.

kms: keyGroup: example-key-group
  Specifies the initial key group and key secret to be used for KMS encryption. When reusing storage from a previous deployment, the key group and key secret may already exist. In this case, provide the keyGroup only.

keySecret: example-key-secret
  Specify keySecret only if the key group does not already exist and needs to be created. The secret type should be Opaque, and you can create the secret either using a YAML or from the kubectl command line.

(AKS-specific) loadBalancerAnnotations: service.beta.kubernetes.io/azure-load-balancer-internal: true
  For MSDP scaleouts, the default value for this annotation is `false`, which may cause the MSDP Scaleout services in this Environment to be accessible publicly.

credential: secretName: msdp-secret1
  This defines the credentials for the MSDP Scaleout server. It refers to a secret in the same namespace as this environment resource. The secret can be of type 'Basic-auth' or 'Opaque'. You can create secrets using a YAML or by using the following command:
  kubectl create secret generic <msdp-secret1> --namespace <nb-namespace> --from-literal=username=<"devuser"> --from-literal=password=<"Y@123abCdEf">

autoDelete: false
  Optional parameter. The default value is true. When set to true, the MSDP Scaleout operator deletes the MSDP secret after using it. In that case, the MSDP and primary secrets must be distinct. To use the same secret for both MSDP scaleouts and the primary server, set autoDelete to false.

dataVolumes: capacity: 5Gi; (AKS-specific) storageClassName: standard; (EKS-specific) storageClassName: gp2
  This specifies the data storage for this MSDP Scaleout resource. You may increase the size of a volume or add more volumes to the end of the list, but do not remove or re-order volumes. A maximum of 16 volumes is allowed. Appending new data volumes or expanding existing ones causes short downtime of the Engines. The recommended volume size is 5Gi-32Ti.

nodeSelector: labelKey: kubernetes.io/os, labelValue: linux
  Specify a key and value that identify nodes where the MSDP Scaleout pods will run.

secretName: s3-secret1
  Defines the MSDP S3 root credentials for the MSDP Scaleout server. It refers to a secret in the same namespace as this environment resource. If the parameter is not specified, the MSDP S3 feature is unavailable.

ipAddr: 1.2.3.8, fqdn: dedupe1-s3.example.com
  The IP address and host name of the S3 load balancer service. If the parameter is not specified, the MSDP S3 feature is unavailable.
Parameter Description

name
  Specifies the prefix name for the primary, media, and MSDP Scaleout server resources.

(AKS-specific) ipAddr, fqdn, and loadBalancerAnnotations
  The values of ipAddr, fqdn, and loadBalancerAnnotations in the following fields should not be changed after the initial deployment. This is applicable for the primary, media, and MSDP Scaleout servers. For example:
  - The loadBalancerAnnotations:
    loadBalancerAnnotations:
      service.beta.kubernetes.io/azure-load-balancer-internal-subnet: example-subnet
      service.beta.kubernetes.io/azure-load-balancer-internal: "true"
  - The IP and FQDN values defined for Primary, Media, and MSDPScaleout:
    ipList:
    - ipAddr: 4.3.2.1
      fqdn: primary.example.com
    ipList:
    - ipAddr: 4.3.2.2
      fqdn: media1-1.example.com
    - ipAddr: 4.3.2.3
      fqdn: media1-2.example.com
    serviceIPFQDNs:
    - ipAddr: 1.2.3.4
      fqdn: dedupe1-1.example.com
    - ipAddr: 1.2.3.5
      fqdn: dedupe1-2.example.com
    - ipAddr: 1.2.3.6
      fqdn: dedupe1-3.example.com
    - ipAddr: 1.2.3.7
      fqdn: dedupe1-4.example.com

(EKS-specific) ipAddr, fqdn, and loadBalancerAnnotations
  The values of ipAddr, fqdn, and loadBalancerAnnotations in the following fields should not be changed after the initial deployment. This is applicable for the primary, media, and MSDP Scaleout servers. For example:
  - The loadBalancerAnnotations:
    loadBalancerAnnotations:
      service.beta.kubernetes.io/aws-load-balancer-internal-subnet: example-subnet
      service.beta.kubernetes.io/aws-load-balancer-internal: "true"
  - The IP and FQDN values defined for Primary, Media, and MSDPScaleout use the same ipList and serviceIPFQDNs layout as the AKS example above.
Parameter Description

ipList:
- ipAddr: 4.3.2.2
  fqdn: media1-1.example.com
- ipAddr: 4.3.2.3
  fqdn: media1-2.example.com
Run the docker image ls command to confirm that the NetBackup images are
loaded properly into the docker cache.
<version>: Represents the NetBackup product version.
3 Run the following commands to re-tag the images to associate them with your
container registry; keep the image name and version the same as the original:
(AKS-specific): $ REGISTRY=<example.azurecr.io> (Replace with your own container registry name.)
(EKS-specific): $ REGISTRY=<AccountID>.dkr.ecr.<region>.amazonaws.com
$ docker tag netbackup/main:<version> ${REGISTRY}/netbackup/main:<version>
If the repository is not created, then create the repository using the following
command:
aws ecr create-repository --repository-name <image-name> --region <region-name>
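Before pushing, Docker typically needs to authenticate to the registry. For ECR, a standard login sketch (account ID and region are placeholders):
aws ecr get-login-password --region <region-name> | docker login --username AWS --password-stdin <AccountID>.dkr.ecr.<region-name>.amazonaws.com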
5 Run the following commands to push the images to the container registry:
$ docker push ${REGISTRY}/netbackup/main:<version>
$ docker tag veritas/flexsnap-datamover:${SNAPSHOT_MANAGER_VERSION} ${REGISTRY}/veritas/flexsnap-datamover:${SNAPSHOT_MANAGER_VERSION}
$ docker tag veritas/flexsnap-postgresql:${SNAPSHOT_MANAGER_VERSION} ${REGISTRY}/veritas/flexsnap-postgresql:${SNAPSHOT_MANAGER_VERSION}
Note: Ensure that you use the same tag as that of the Snapshot Manager image
version. A custom tag cannot be used.
$ docker push ${REGISTRY}/veritas/flexsnap-fluentd:${SNAPSHOT_MANAGER_VERSION}
$ docker push ${REGISTRY}/veritas/flexsnap-datamover:${SNAPSHOT_MANAGER_VERSION}
$ docker push ${REGISTRY}/veritas/flexsnap-nginx:${SNAPSHOT_MANAGER_VERSION}
$ docker push ${REGISTRY}/veritas/flexsnap-postgresql:${SNAPSHOT_MANAGER_VERSION}
$ docker push ${REGISTRY}/veritas/flexsnap-core:${SNAPSHOT_MANAGER_VERSION}
$ docker push ${REGISTRY}/veritas/flexsnap-deploy:${SNAPSHOT_MANAGER_VERSION}
Note: This section is applicable only when MSDP Scaleout is deployed separately
without the environment operator or Helm charts.
3 Copy the MSDP kubectl plugin to a directory from which you access the AKS
or EKS host. This directory can be configured in the PATH environment variable
so that kubectl can load kubectl-msdp as a plugin automatically.
For example:
cp ./VRTSpddek-*/bin/kubectl-msdp /usr/local/bin/
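To confirm that kubectl discovers the plugin, you can list the installed plugins (kubectl plugin list is a standard subcommand):
kubectl plugin list
# The output should include /usr/local/bin/kubectl-msdp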
4 Push the docker images to the ACR. Keep the image name and version same
as original.
3 Copy MSDP kubectl plugin to a directory from where you access AKS or EKS
host. This directory can be configured in the PATH environment variable so
that kubectl can load kubectl-msdp as a plugin automatically.
For example,
cp ./VRTSpddek-*/bin/kubectl-msdp /usr/local/bin/
--password-stdin \
<aws_account_id>.dkr.ecr.<region>.amazonaws.com
■ Create a repository.
See AWS documentation Creating a private repository
■ Push the docker images to ECR. Keep the image name and version same
as original.
Note: From NetBackup version 10.3 onwards, the Cloud Scale release data
collector on the primary server pod is supported.
itanalyticsportal.<yourdomain>
itanalyticsagent.<yourdomain>
aptareportal.<yourdomain>
aptareagent.<yourdomain>
cd "/mnt/nbdata/"
mkdir analyticscollector
COLLECTOR_KEY_PATH=/mnt/nbdata/analyticscollector/<your-collector-name>.key
HTTP_PROXY_CONF=N
HTTP_PROXY_ADDRESS=
HTTP_PROXY_PORT=
HTTPS_PROXY_ADDRESS=
HTTPS_PROXY_PORT=
PROXY_USERNAME=
PROXY_PASSWORD=
PROXY_EXCLUDE=
■ Run the following command to connect the data collector with the IT Analytics
portal:
./dc_installer.sh -c /usr/openv/analyticscollector/installer/responsefile.sample
10 Check the data collector services status by running the following command
and ensure that the following data collector services are up and running:
/usr/openv/analyticscollector/mbs/bin/aptare_agent status
For more information about IT Analytics data collector policy, see NetBackup IT
Analytics User Guide.
3 Create and copy NetBackup API key from NetBackup web UI.
Configuring the primary server with NetBackup IT Analytics tools is supported only
once from the primary server CR.
For more information about the IT Analytics data collector policy, see 'Add a Veritas
NetBackup Data Collector policy'. For more information about adding NetBackup
Primary Servers within the Data Collector policy, see 'Add/Edit NetBackup Master
Servers within the Data Collector policy'.
3 Restart the sshd service using the systemctl restart sshd command.
Configuring NetBackup
Primary and media server CR
This section provides details of the primary and media server CRs.
For more information on managing the load balancer service, See “About the Load
Balancer service” on page 170.
Note: After deployment, you cannot change the Name in the primary server and
media server CRs.
■ Before the CRs can be deployed, a utility called Config-Checker is executed
that performs checks on the environment to ensure that it meets the basic
deployment requirements. The config check is done according to the
configCheckMode and paused values provided in the custom resource YAML.
See the section called “How does the Config-Checker utility work” on page 40.
■ You can deploy the primary server and media server CRs in the same
namespace.
■ (AKS-specific) Use a storage class that has the storage type Azure premium
files for the catalog volume in the primary server CR, and the storage type
Azure disk for the data and log volumes in the media server CR and primary
server CR.
(EKS-specific) Use a storage class that has the storage type Amazon elastic
files for the catalog volume in the primary server CR. For the data and log
volumes in the media server, use the storage type EBS.
■ During fresh installation of the NetBackup servers, the value for keep logs up
to under the log retention configuration is set based on the log storage capacity
provided in the primary server CR inputs. You may change this value if required.
To update the log retention configuration, refer to the steps mentioned in the NetBackup™
Logging Reference Guide.
■ The NetBackup deployment sets the value as per the following formula:
Keep logs up value = Size of logs PVC/PV * 0.8
By default, the value is set to 24GB (based on the default 30GB log storage).
For example: If the user configures the storage size in the CR as 40GB
(instead of the default 30GB), then the default value for that option becomes
32GB automatically based on the formula.
Note: This value is automatically updated in the bp.conf file
on volume expansion.
■ Deployment details of primary server and media server can be observed from
the operator pod logs using the following command:
kubectl logs <operator-pod-name> -c netbackup-operator -n
<operator-namespace>
Fields Description
ActiveReplicas Indicates the actual number of replicas that must be running to complete
the ongoing operations on the media servers. Default value is 0; it is 0 if
the media server autoscaler is disabled.
Note: If the autoscaler is enabled (and afterwards turned off), the
value is set to the value of minimumReplicas. It remains
minimumReplicas even if the media server autoscaler is disabled.
NextIterationTime Indicates the next iteration time of the media server autoscaler, that is,
the media server autoscaler will run only after NextIterationTime. Default
value is empty.
NextCertificateRenewalTime Next time to scale up all registered media servers for certificate renewal.
Configuration parameters
■ ConfigMap
Parameters Description
Note: Media server autoscaler scales out a single pod at a time but can scale in multiple
pods at a time.
Note: If the scale-in does not happen due to background processes running on
the media server, a notification is sent on the NetBackup Web UI at the regular
time interval configured in the autoscaler ConfigMap. For more details, see
the following section:
See “Troubleshooting AKS and EKS issues” on page 258.
The time taken for media server scaling depends on the value of the
scaling-interval-in-seconds configuration parameter. During this interval, the
jobs are served by the existing media server replicas based on NetBackup
throttling parameters, for example, Maximum concurrent jobs in storage unit,
Number of jobs per client, and so on.
The cluster's native autoscaler takes some time as per the scale-down-unneeded-time
attribute, which decides how long a node should be unneeded before it is eligible
to be scaled down. By default this is 10 minutes. To change this parameter, edit
the cluster-autoscaler’s current deployment settings using the following commands
and then edit the existing value:
■ AKS: az aks update -g $RESOURCE_GROUP_NAME -n $CLUSTER_NAME
--cluster-autoscaler-profile scale-down-unneeded-time=10m
■ EKS: kubectl -n kube-system edit deployment cluster-autoscaler
Note the following:
■ For scaled-in media servers, certain resources and configurations are retained
to avoid reconfiguration during subsequent scale out.
■ Kubernetes services, persistent volume claims, and persistent volumes are
not deleted for scaled-in media servers.
■ Host entries for scaled-in media servers are not removed from the NetBackup
primary server. Hence, scaled-in media server entries are displayed on the
Web UI/API.
■ For scaled-down media servers, the deleted media servers are also displayed
on the Web UI/API during the credential validation for database servers.
■ The configMap data must have entries in a key: value form to configure the
mail server, as shown below for the smtp field:
emailServerConfigmap
apiVersion: v1
kind: ConfigMap
metadata:
name: configmail
namespace: <netbackup-namespace>
data:
smtp: "xyz.abc.domain.com"
smtp-use-starttls: ""
■ If a specific parameter needs to be set without a value, only the key can
be specified, as with the smtp-use-starttls field.
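For example, a minimal sketch of applying and verifying the ConfigMap above (the file name configmail.yaml is an assumption):
kubectl apply -f configmail.yaml
kubectl get configmap configmail -n <netbackup-namespace> -o yaml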
Perform the following to modify the mail server settings:
■ Exec into the primary container using the following command:
kubectl exec -it -n <namespace> <primaryServer-pod-name> -- bash
mail.rc
# mail server configuration
set mail
set mailserver=xyz.abc.domain.com:25
set smtp-use-starttls
Disk-based storage
Azure-disk based storage
■ Zone-redundant storage (ZRS) synchronously replicates an Azure managed
disk across three Azure availability zones in the regions selected. This can be
selected by setting skuname: Premium_ZRS in the yaml file for creating the
storage class.
■ ZRS disks are currently available in: Southeast Asia, Australia East, Brazil South,
North Europe, West Europe, France Central, Japan East, Korea Central, Qatar
Central, UK South, East US, East US 2, South Central US and West US 2.
■ The following yaml file can be used:
aks-disk
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: <storage class name>
provisioner: disk.csi.azure.com
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: Immediate
parameters:
skuname: Premium_ZRS
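For example, a minimal sketch of creating and inspecting the storage class, assuming the YAML above is saved as aks-disk-zrs.yaml (the file name is an assumption):
kubectl apply -f aks-disk-zrs.yaml
kubectl get storageclass <storage class name> -o yaml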
File-based storage
Azure file-based storage
■ Zone-redundant storage (ZRS) replicates the storage account synchronously
across three Azure availability zones in the primary region. This can be selected
Configuration of key parameters in Cloud Scale deployments 100
Enabling client-side deduplication capabilities
by setting skuname: Premium_ZRS in the yaml file for creating the storage
class.
■ ZRS for premium file shares is available in: Southeast Asia, Australia East,
Brazil South, North Europe, West Europe, France Central, Japan East, Korea
Central, Qatar Central, UK South, East US, East US 2, South Central US and
West US 2.
■ The following yaml file can be used:
aks-file
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: <name of storage class>
provisioner: file.csi.azure.com
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: Immediate
parameters:
skuName: Premium_ZRS
protocol: nfs
■ 1 - Client Direct will be used as the data transfer method where possible, with the
server permitting the same. This is the default value set for Cloud Scale
deployment.
■ 2 - Client Direct will always be used as the data transfer method to transfer the
data to the specified host.
To change the value of OST_CLIENT_DIRECT entry through the bpclient CLI, refer
to the NetBackup Commands Reference Guide.
To enable client-side deduplication during restores, add the
OST_CLIENT_DIRECT_RESTORE [=0/1/2] entry to the bp.conf file of the media
server(s). A new entry OLD_VNETD_CALLBACK=YES must be added to the
bp.conf file of all the clients that would use the client-side deduplication feature
during restores, failing which restore jobs would fail. For more information, see
NetBackup Administrator's Guide, Volume I.
The restore mode (for enabling/disabling Client Direct) can also be changed using
the bpclient CLI as listed under client_direct_restore of the NetBackup Commands
Reference Guide.
If Client Direct is enabled/disabled both using the bp.conf file and through the bpclient
CLI, then the configuration in bpclient takes precedence over the bp.conf file
changes.
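For example, a hedged sketch of setting the restore mode through the bpclient CLI (see the NetBackup Commands Reference Guide for the authoritative syntax; the client name is a placeholder):
/usr/openv/netbackup/bin/admincmd/bpclient -client <client-name> \
-update -client_direct_restore 2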
Section 2
Deployment
2 Use the following command to edit the chart values to fit your requirement:
vi operators-values.yaml
Or
If using the OCI registry, use the following command:
helm upgrade --install operators
oci://abcd.veritas.com:5000/helm-charts/operators --version 10.4
-f operators-values.yaml --create-namespace --namespace
netbackup-operator-system
global:
# Toggle for platform-specific features & settings
# Microsoft AKS: "aks"
# Amazon EKS: "eks"
platform: "eks"
# This specifies a container registry that the cluster has access to.
# NetBackup images should be pushed to this registry prior to applying this
# Environment resource.
# Example Azure Container Registry name:
# example.azurecr.io
# Example AWS Elastic Container Registry name:
# 123456789012.dkr.ecr.us-east-1.amazonaws.com
containerRegistry: "364956537575.dkr.ecr.us-east-1.amazonaws.com"
operatorNamespace: "netbackup-operator-system"
storage:
eks:
fileSystemId: fs-0f3cc640eeec507d0
msdp-operator:
image:
name: msdp-operator
# Provide tag value in quotes eg: '17.0'
tag: "20.4"
pullPolicy: Always
namespace:
labels:
control-plane: controller-manager
# This determines the path used for storing core files in the case of a crash.
corePattern: "/core/core.%e.%p.%t"
nodeSelector: {}
logging:
# Enable verbose logging
debug: false
# Maximum age (in days) to retain log files, 1 <= N <= 365
age: 28
# Maximum number of log files to retain, 1 <= N <= 20
num: 20
nb-operator:
image:
name: "netbackup/operator"
tag: "10.4"
pullPolicy: Always
tag: "20.4"
flexsnap-operator:
image:
tag: "10.4.0.0.1016"
namespace:
labels:
nb-control-plane: nb-controller-manager
nodeSelector:
node_selector_key: agentpool
node_selector_value: agentpool
#loglevel:
# "-1" - Debug (not recommended for production)
# "0" - Info
# "1" - Warn
# "2" - Error
loglevel:
value: "0"
flexsnap-operator:
replicas: 1
namespace:
labels: {}
image:
name: "veritas/flexsnap-deploy"
tag: "10.4.0.1004"
pullPolicy: Always
nodeSelector:
node_selector_key: agentpool
node_selector_value: agentpool
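Once the values are set and the helm upgrade command shown earlier has been run, a quick sanity check is to list the operator pods, as in this minimal sketch:
kubectl get pods -n netbackup-operator-system
All the msdp-operator, nb-operator, and flexsnap-operator pods should reach the Running state.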
Chapter 7
Deploying Postgres
This chapter includes the following topics:
■ Deploying Postgres
Deploying Postgres
NetBackup version 10.4 provides support for deploying the Postgres database
using Helm charts.
Note: (Optional) If you want the Postgres pod to be scheduled only on the Primary
node pool, add Kubernetes taints to the Media, MSDP, and
FlexSnap/NetBackup Snapshot Manager node pools.
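For example, a minimal sketch, assuming the media pool nodes carry the label agentpool=<media-nodepool-name> (the taint key and value are assumptions):
kubectl taint nodes -l agentpool=<media-nodepool-name> \
nbu-pool=media:NoSchedule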
Or
If using the OCI registry, use the following command:
helm upgrade --install postgresql
oci://abcd.veritas.com:5000/helm-charts/netbackup-postgresql
--version <version> -f postgres-values.yaml -n netbackup
postgresql:
replicas: 1
# The values in the image (name, tag) are placeholders. These will be set
# when the deploy_nb_cloudscale.sh runs.
image:
name: "netbackup/postgresql"
tag: "14.11.1.0"
pullPolicy: Always
service:
serviceName: nb-postgresql
volume:
volumeClaimName: nb-psql-pvc
volumeDefaultMode: 0640
pvcStorage: 5Gi
# configMapName: nbpsqlconf
storageClassName: nb-disk-premium
mountPathData: /netbackup/postgresqldb
secretMountPath: /netbackup/postgresql/keys/server
# mountConf: /netbackup
timezone: null
securityContext:
runAsUser: 0
createCerts: true
# pgbouncerIniPath: /netbackup/pgbouncer.ini
serverSecretName: postgresql-server-crt
clientSecretName: postgresql-client-crt
dbSecretName: dbsecret
dbPort: 13785
pgbouncerPort: 13787
dbAdminName: postgres
initialDbAdminPassword: postgres
dataDir: /netbackup/postgresqldb
# postgresqlConfFilePath: /netbackup/postgresql.conf
# pgHbaConfFilePath: /netbackup/pg_hba.conf
defaultPostgresqlHostName: nb-postgresql
Method 1
■ Run the following command to exec into the Postgres pod:
kubectl exec -it <postgres-pod-name> -n netbackup -- bash
kubectl cp
netbackup/nb-postgresql-0:/netbackup/postgresqldb/postgresql.conf
psqlconf/postgresql.conf
kubectl cp
netbackup/nb-postgresql-0:/netbackup/postgresqldb/pg_hba.conf
psqlconf/pg_hba.conf
kubectl cp
netbackup/nb-postgresql-0:/home/nbsvcusr/postgresqldb/pgbouncer.ini
psqlconf/pgbouncer.ini
■ Use the following command to save the PostgreSQL chart values to a file:
helm show values postgresql-<version>.tgz >
postgres-values.yaml
postgresql:
replicas: 1
# The values in the image (name, tag) are placeholders. These will be set
# when the deploy_nb_cloudscale.sh runs.
image:
name: "netbackup/postgresql"
tag: "10.4"
pullPolicy: Always
service:
serviceName: nb-postgresql
volume:
volumeClaimName: nb-psql-pvc
volumeDefaultMode: 0640
pvcStorage: 5Gi
configMapName: nbpsqlconf
storageClassName: nb-disk-premium
mountPathData: /netbackup/postgresqldb
secretMountPath: /netbackup/postgresql/keys/server
mountConf: /netbackup
timezone: null
securityContext:
runAsUser: 0
createCerts: true
pgbouncerIniPath: /netbackup/pgbouncer.ini
serverSecretName: postgresql-server-crt
clientSecretName: postgresql-client-crt
dbSecretName: dbsecret
dbPort: 13785
pgbouncerPort: 13787
dbAdminName: postgres
initialDbAdminPassword: postgres
dataDir: /netbackup/postgresqldb
postgresqlConfFilePath: /netbackup/postgresql.conf
pgHbaConfFilePath: /netbackup/pg_hba.conf
defaultPostgresqlHostName: nb-postgresql
log_min_duration_statement = 0
Note: The value is the threshold in milliseconds after which the statement is
logged; a value of 0 logs all statements.
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on
log_temp_files = 0
log_autovacuum_min_duration = 0
log_error_verbosity = default
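After editing these settings, a minimal sketch for reloading the server configuration from inside the pod (the pod name and port follow the chart values above; availability of psql inside the image is an assumption):
kubectl exec -it nb-postgresql-0 -n netbackup -- \
psql -U postgres -p 13785 -d postgres -c "SELECT pg_reload_conf();"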
2. Perform one of the following methods to copy files out of PostgreSQL container:
■ Method 1:
Run the following commands:
kubectl exec nb-postgresql-0 -n netbackup -- cat
/netbackup/postgresqldb/postgresql.conf > /tmp/postgresql.conf
kubectl exec nb-postgresql-0 -n netbackup -- cat
/netbackup/postgresqldb/log/postgresql-Tue.log >
/tmp/postgresql-Tue.log
■ Method 2:
Run the following command:
kubectl cp
netbackup/nb-postgresql-0:/netbackup/postgresqldb/postgresql.conf
/tmp/postgresql.conf
Chapter 8
Deploying Cloud Scale
This chapter includes the following topics:
4 Deploy the operators. For more information on deploying the operators, refer
to the following section:
See “Deploying the operators” on page 103.
5 Deploy the PostgreSQL database. For more information on deploying the
PostgreSQL database, refer to the following section:
See “Deploying Postgres” on page 109.
6 Perform the following steps to deploy the environment.yaml file:
■ Use the following command to save the environment chart values to a file:
helm show values environment-10.4.tgz > environment-values.yaml
For example,
■ For EKS
kubectl msdp init -i <ecr-url>/msdp-operator:<version> -s
<storage-class-name> [-l agentpool=<nodegroup-name>]
Option Description
■ AKS: agentpool=<nodepool-name>
■ EKS: agentpool=<nodegroup-name>
Range: 1-365
Default value: 28
Range: 1-20
Default value: 20
In the STATUS column, if the readiness state for the controller, MDS and
engine pods are all Running, it means that the configuration has completed
successfully.
In the READY column for engines, 2/2 or 3/3 indicates that the engine
configuration has completed successfully.
9 If you specified spec.autoRegisterOST.enabled: true in the CR, when the
MSDP engines are configured, the MSDP operator automatically registers the
storage server, a default disk pool, and a default storage unit in the NetBackup
primary server.
A field ostAutoRegisterStatus in the Status section indicates the registration
status. If ostAutoRegisterStatus.registered is True, it means that the
registration has completed successfully.
You can run the following command to check the status:
kubectl get msdpscaleouts.msdp.veritas.com -n <sample-namespace>
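To read the flag directly, a minimal sketch using jsonpath (the list index and field path follow the field names described above and are assumptions):
kubectl get msdpscaleouts.msdp.veritas.com -n <sample-namespace> \
-o jsonpath='{.items[0].status.ostAutoRegisterStatus.registered}'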
You can find the storage server, the default disk pool, and storage unit on the
Web UI of the NetBackup primary server.
If the command output is true, S3 service is configured and ready for use.
Otherwise, wait for the flag to be true. The flag changes to true automatically
after all MSDP Scaleout resources are ready.
2 Use the following URL to access S3 service in MSDP Scaleout:
https://<MSDP-S3-FQDN>
Limitations:
■ S3 service in MSDP Scaleout only supports NBCA certificates.
You can use the CA certificate in NetBackup primary server to bypass SSL
warnings when accessing S3 service. The CA certificate file path is
/usr/openv/var/webtruststore/cacert.pem.
For example, when using the AWS CLI you can use the --ca-bundle parameter to
specify the CA certificate file path to bypass SSL warnings (see the sketch after
this list).
■ The region name of MSDP S3 service is the LSU name that is used to store S3
data. Set the default region name to PureDiskVolume to use the MSDP local
LSU to store the S3 data.
■ 500 to 800 concurrent requests are recommended, based on your Kubernetes
node's performance.
Ensure that you save the S3 credential in a secure place after it is generated,
for later use.
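For example, a minimal sketch of listing buckets on the MSDP S3 endpoint with the AWS CLI, trusting the NBCA certificate mentioned above (the endpoint and local certificate path are placeholders):
aws s3 ls --endpoint-url https://<MSDP-S3-FQDN> \
--ca-bundle /path/to/cacert.pem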
If MSDP kubectl plug-in is not installed, copy MSDP kubectl plug-in from the
operator TAR folder to a directory from where you access the cluster host. This
directory can be configured in the PATH environment variable so that kubectl
can load MSDP kubectl as a plug-in automatically.
For example,
$ cp ./VRTSk8s-netbackup-<version>-0065/bin/kubectl-msdp
/usr/local/bin/
■ If the MSDP Scaleout is deployed with environment YAML, run the following
command to update the spec.msdpScaleouts[<index>].s3Credential
and spec.msdpScaleouts[<index>].s3Ip fields in the existing CR
resources:
$ kubectl edit environments.netbackup.veritas.com
<environmentCR_name> -n <sample-namespace>
Content format:
msdpScaleouts:
- credential:
autoDelete: true
secretName: msdp-creds
skipPrecheck: false
s3Credential:
secretName: <s3secretName>
s3Ip:
ipAddr: <s3IpAddress>
fqdn: <s3Fqdn>
■ If the MSDP Scaleout is deployed with MSDP Scaleout YAML, run the
following command to update the spec.s3Credential and
spec.s3ServiceIPFQDN fields in the existing CR resources:
$ kubectl edit msdpscaleouts.msdp.veritas.com
<MSDP Scaleout CR name> -n <sample-namespace>
Content format:
spec:
credential:
autoDelete: true
secretName: msdp-creds
skipPrecheck: false
s3Credential:
secretName: <s3secretName>
s3ServiceIPFQDN:
ipAddr: <s3IpAddress>
fqdn: <s3Fqdn>
Content format:
spec:
nbca:
s3TokenSecret: <S3-token-secret-name>
If the command output is true, S3 service is configured and ready for use.
Step 1 Install the docker images and binaries. See “Installing the docker images
and binaries for MSDP Scaleout” on page 81.
Execute the following command to verify if the Cloud Scale deployment is successful:
kubectl get
primaryserver,mediaserver,msdpscaleout,cpserver,environment -n
<netbackup-namespace>
The output should display the name and status of all the CRs. If the value of
STATUS field for each CR is displayed as Success, then it indicates that the
deployment is successful.
The output message for a successful Cloud Scale deployment is as follows:
# kubectl get
primaryserver,mediaserver,msdpscaleout,cpserver,environment -n
<netbackup-namespace>
For further confirmation, verify if the Web UI of the Primary Server is accessible
through https://<Primary Server's FQDN>/webui/login/.
Section 3
Monitoring and
Management
■ Telemetry reporting
Table 11-1 lists the health probe actions and descriptions, the probe names, and
the probe intervals (in seconds) for the primary server and media server.
Health probes are run using the nb-health command. If you want to manually run
the nb-health command, the following options are available:
■ Disable
This option disables the health check, which marks the pod as not ready (0/1).
■ Enable
This option enables the already disabled health check in the pod. This marks
the pod in ready state(1/1) again if all the NetBackup health checks are passed.
■ Deactivate
This option deactivates the health probe functionality in the pod. The pod remains
in the ready state (1/1). This avoids pod restarts due to health probe failures
such as liveness or readiness probe failures. This is a temporary step and is
not recommended for usual use.
■ Activate
This option activates the health probe functionality that has been deactivated
earlier using the deactivate option.
You can manually disable or enable the probes if required. For example, if for any
reason you need to exec into the pod and restart the NetBackup services, the health
probes should be disabled before restarting the services, and then they should be
enabled again after successfully restarting the NetBackup services. If you do not
disable the health probes during this process, the pod may restart due to the failed
health probes.
You can check pod events in case of probe failures to get more details using
the kubectl describe <primary/media-pod-name> -n <namespace>
command.
Telemetry reporting
Telemetry reporting entries for the NetBackup deployment on AKS/EKS are indicated
with the AKS/EKS based deployments text.
■ By default, the telemetry data is saved at the /var/veritas/nbtelemetry/
location. The default location is not persisted across pod restarts.
■ If you want to save telemetry data to a persisted location, exec into the pod
using the kubectl exec -it -n <namespace> <primary/media_server_pod_name>
-- /bin/bash command and execute the telemetry command
/usr/openv/netbackup/bin/nbtelemetry with the --dataset-path=DESIRED_PATH
option.
■ Exec into the primary server pod using the following command:
kubectl exec -it -n <namespace> <primary/media_server_pod_name>
-- /bin/bash
■ NetBackup operator provides different log levels that can be changed before
deployment of NetBackup operator.
The following log levels are provided:
■ -1 - Debug
■ 0 - Info
■ 1 - Warn
■ 2 - Error
By default, the log level is 0.
It is recommended to use 0, 1, or 2 log level depending on your requirement.
To change the log level modify the operators-values.yaml file and upgrade
the operators using the following command:
helm upgrade --install operators operators-10.4.tgz -f
operators-values.yaml -n netbackup-operator-system
■ Config-Checker jobs that run before deployment of the primary server and media
server create pods. The logs for config checker executions can be checked
using the kubectl logs <configchecker-pod-name> -n
<netbackup-operator-namespace> command.
■ Installation logs of NetBackup primary server and media server can be retrieved
using any of the following methods:
■ Execute the following command in the primary server/media server pod and
check the /mnt/nblogs/setup-server.log file:
kubectl exec -it <PrimaryServer/MediaServer-Pod-Name> -n
<PrimaryServer/MediaServer-namespace> -- bash
■ (AKS-specific) Data migration jobs create the pods that run before deployment
of primary server. The logs for data migration execution can be checked using
the following command:
kubectl logs <migration-pod-name> -n
<netbackup-environment-namespace>
■ Execute the following respective commands to check the event logs that show
deployment logs for PrimaryServer and MediaServer:
■ For primary server: kubectl describe PrimaryServer <PrimaryServer
name> -n <PrimaryServer-namespace>
Following table describes the primary server CR and media server CR status fields:
Table 11-2
Section Field / Value Description
Primary Server Details Host Name Name of the primary server that should
be used to access the web UI.
Note: (EKS-specific) Amazon EFS is an elastic file system; it does not enforce any
file system capacity limits. The actual storage capacity value in persistent volumes
and persistent volume claims is not used when creating the file system. However,
because storage capacity is a required field in Kubernetes, you must specify a valid
value. This value does not limit the size of your Amazon EFS file system.
1 Edit the environment custom resource using the kubectl edit Environment
<environmentCR_name> -n <namespace> command.
2 To pause the reconciler of the particular custom resource, change the paused:
false value to paused: true in the primaryServer or mediaServer section and
save the changes. In case of multiple media server objects, change the paused
value to true for the respective media server object only.
3 Edit the StatefulSet of the primary server or particular media server object using
the kubectl edit <statefulset name> -n <namespace> command, change the
replica count to 0, and wait for all pods to terminate for the particular CR object.
4 Update all the persistent volume claims that require capacity resize with the
kubectl edit pvc <pvcName> -n <namespace> command. In case of a
particular media server object, resize the respective PVC with the expected storage
capacity for all its replicas.
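A minimal, non-interactive alternative sketch for step 4 (the size value is a placeholder):
kubectl patch pvc <pvcName> -n <namespace> \
-p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'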
5 Update the respective custom resource section using the kubectl edit
Environment <environmentCR_name> -n <namespace> command with updated
storage capacity for respective volume and change paused: false. Save updated
custom resource.
To update the storage details for respective volume, add storage section with
specific volume and its capacity in respective primaryServer or mediaServer
section in environment CR.
The earlier terminated pods and StatefulSet must be recreated and running
successfully. The pods should be linked to the respective persistent volume claims,
and the data must be persisted.
6 Run the kubectl get pvc -n <namespace> command and check the capacity
column in the result to verify that the persistent volume claim storage capacity is
expanded.
7 (Optional) Update the log retention configuration for NetBackup depending on
the updated storage capacity.
For more information, refer to the NetBackup™ Administrator's Guide,
Volume I
During volume expansion, if the media servers added in the primary server are more
than the value of replicas (that is, the user has reduced the value of the replicas
field), then the volumes of the provided replicas value are expanded. If the value
for replicas is increased again, irrespective of volume expansion, and if the media
server must be scaled up, then all the media server pods are terminated and
re-deployed with all PVCs expanded. This might fail the backup jobs as media
servers may be terminated.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: managed-premium-retain
provisioner: disk.csi.azure.com
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: Immediate
parameters:
storageaccounttype: Premium_LRS
kind: Managed
Example 1: If the user wants to deploy a media server with replica count 3, you
must create a total of 8 disks, 8 PVs, and 8 PVCs.
■ For catalog:
catalog-testprimary-primary-0
■ For data: data-testprimary-primary-0
■ For logs: logs-testprimary-primary-0
Media server
■ For data:
■ data-testmedia-media-0
■ data-testmedia-media-1
■ data-testmedia-media-2
■ For log:
■ logs-testmedia-media-0
■ logs-testmedia-media-1
■ logs-testmedia-media-2
Example 2: If the user wants to deploy a media server with replica count 5, you
must create 12 disks, 12 PVs, and 12 PVCs.
For data:
■ data-testmedia-media-0
■ data-testmedia-media-1
■ data-testmedia-media-2
■ data-testmedia-media-3
■ data-testmedia-media-4
For log:
■ logs-testmedia-media-0
■ logs-testmedia-media-1
■ logs-testmedia-media-2
■ logs-testmedia-media-3
■ logs-testmedia-media-4
3 Create the required number of Azure disks and save the IDs of the newly created disks.
For more information, see Azure Disk - Static
4 Create PVs for each disk and link the PVCs to respective PVs.
To create the PVs, specify the created storage class and diskURI (ID of the
disk received in step 3) in the yaml file. The PV must be created using the
claimRef field and provide PVC name for its corresponding namespace.
For example, if you are creating PV for catalog volume, storage required is
128GB, diskName is primary_catalog_pv and namespace is test. PVC named
catalog-testprimary-primary-0 is linked to this PV when PVC is created in
the namespace test.
apiVersion: v1
kind: PersistentVolume
metadata:
name: catalog
spec:
capacity:
storage: 128Gi
accessModes:
- ReadWriteOnce
azureDisk:
kind: Managed
diskName: primary_catalog_pv
diskURI: /subscriptions/3247febe-4e28-467d-a65c-10ca69bcd74b/resourcegroups/MC_NBU-k8s-network_xxxxxx_eastus/providers/Microsoft.Compute/disks/deepak_s_pv
claimRef:
apiVersion: v1
kind: PersistentVolumeClaim
name: catalog-testprimary-primary-0
namespace: test
5 Create the PVC with the correct PVC name (step 2), storage class, and storage.
For example,
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: catalog-testprimary-primary-0
namespace: test
spec:
storageClassName: "managed-premium-retain"
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 128Gi
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: gp2-reclaim
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: Immediate
parameters:
fsType: ext4
type: gp2
Example 1: If the user wants to deploy a media server with replica count 3, you
must create a total of 8 disks, 8 PVs, and 8 PVCs (6 disks, 6 PVs, and 6 PVCs
for the media server).
Names of the media PVCs, assuming resourceNamePrefix_of_media is testmedia:
Media server
■ For data:
■ data-testmedia-media-0
■ data-testmedia-media-1
■ data-testmedia-media-2
■ For log:
■ logs-testmedia-media-0
■ logs-testmedia-media-1
■ logs-testmedia-media-2
Example 2: If the user wants to deploy a media server with replica count 5, you
must create 12 disks, 12 PVs, and 12 PVCs.
For data:
■ data-testmedia-media-0
■ data-testmedia-media-1
■ data-testmedia-media-2
■ data-testmedia-media-3
■ data-testmedia-media-4
For log:
■ logs-testmedia-media-0
■ logs-testmedia-media-1
■ logs-testmedia-media-2
■ logs-testmedia-media-3
■ logs-testmedia-media-4
3 Create the required number of AWS EBS volumes and save the VolumeIds of
the newly created volumes.
For more information on creating EBS volumes, see EBS volumes.
(For Primary Server volumes) Create the required number of EFS file systems.
You can use a single EFS to mount the catalog of the primary server. For example,
the VolumeHandle in the PersistentVolume spec will be as follows:
<file_system_id>:/catalog
apiVersion: v1
kind: PersistentVolume
metadata:
name: catalog
spec:
accessModes:
- ReadWriteMany
awsElasticBlockStore:
fsType: xfs
volumeID: aws://us-east-2c/vol-xxxxxxxxxxxxxxxxx
capacity:
storage: 128Gi
persistentVolumeReclaimPolicy: Retain
storageClassName: gp2-retain
volumeMode: Filesystem
claimRef:
apiVersion: v1
kind: PersistentVolumeClaim
name: catalog-testprimary-primary-0
namespace: test
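For the EFS-backed catalog volume mentioned above, a minimal sketch of the corresponding PersistentVolume using the EFS CSI driver (the PV name is an assumption, and the claimRef values follow the earlier example):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: catalog-efs
spec:
  capacity:
    storage: 128Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: efs.csi.aws.com
    volumeHandle: <file_system_id>:/catalog
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: catalog-testprimary-primary-0
    namespace: test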
5 Create the PVC with the correct PVC name (step 2), storage class, and storage.
For example,
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: catalog-testprimary-primary-0
namespace: test
spec:
storageClassName: gp2-retain
accessModes:
- ReadWriteMany
resources:
requests:
storage: 128Gi
■ Overview
■ Configuration parameters
Overview
The status of Snapshot Manager deployment can be verified by using the following
command:
kubectl describe cpserver -n $ENVIRONMENT_NAMESPACE
You can find the Snapshot Manager log files under /cloudpoint/logs/ folder.
Configuration parameters
■ Any configuration-related parameter that must be added to the
/cloudpoint/flexsnap.conf file can be added to the flexsnap-conf configmap
by editing it as follows:
kubectl edit configmap flexsnap-conf -n $ENVIRONMENT_NAMESPACE
For example, for changing the log level from info to debug, add the following:
[logging]
level = debug
AKS:
{
"controllers": [
{
"apiVersions": [
"1.0"
],
"name": "msdp-aks-demo-uss-controller",
"nodeName": "aks-nodepool1-25250377-vmss000002",
"productVersion": "15.1-0159",
"pvc": [
{
"pvcName": "msdp-aks-demo-uss-controller-log",
"stats": {
"availableBytes": "10125.98Mi",
"capacityBytes": "10230.00Mi",
"percentageUsed": "1.02%",
"usedBytes": "104.02Mi"
}
}
],
"ready": "True"
}
],
"engines": [
{
"ip": "x.x.x.x",
"name": "msdppods1.westus2.cloudapp.azure.com",
"nodeName": "aks-nodepool1-25250377-vmss000003",
"pvc": [
{
"pvcName": "msdppods1.westus2.cloudapp.azure.com-catalog",
"stats": {
"availableBytes": "20293.80Mi",
"capacityBytes": "20470.00Mi",
"percentageUsed": "0.86%",
"usedBytes": "176.20Mi"
}
},
{
"pvcName": "msdppods1.westus2.cloudapp.azure.com-data-0",
"stats": {
"availableBytes": "30457.65Mi",
"capacityBytes": "30705.00Mi",
"percentageUsed": "0.81%",
"usedBytes": "247.35Mi"
}
}
],
"ready": "True"
},
......
EKS:
"capacityBytes": "9951.27Mi",
"percentageUsed": "0.58%",
"usedBytes": "57.27Mi"
}
}
],
"ready": "True"
}
],
"engines": [
{
"ip": "x.x.x.x",
"name": "ip-x-x-x-x.ec2.internal",
"nodeName": "ip-x-x-x-x.ec2.internal",
"pvc": [
{
"pvcName": "ip-x-x-x-x.ec2.internal-catalog",
"stats": {
"availableBytes": "604539.68Mi",
"capacityBytes": "604629.16Mi",
"percentageUsed": "0.01%",
"usedBytes": "73.48Mi"
}
},
{
"pvcName": "ip-x-x-x-x.ec2.internal-data-0",
"stats": {
"availableBytes": "4160957.62Mi",
"capacityBytes": "4161107.91Mi",
"percentageUsed": "0.00%",
"usedBytes": "134.29Mi"
}
}
],
"ready": "True"
},
name: prometheus-cwagentconfig
namespace: amazon-cloudwatch
---
# create configmap for prometheus scrape config
apiVersion: v1
data:
# prometheus config
prometheus.yaml: |
global:
scrape_interval: 1m
scrape_timeout: 10s
scrape_configs:
- job_name: 'msdpoperator-metrics'
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
target_label: __address__
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: NameSpace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: PodName
kind: ConfigMap
metadata:
name: prometheus-config
namespace: amazon-cloudwatch
Table 13-1 lists the Prometheus metrics that MSDP Scaleout supports.
4 Apply the YAML file.
kubectl apply -f Prometheus-eks.yaml
If multiple MSDP scaleout clusters are deployed in the same EKS cluster, use
the filter to search the results. For example, search the MSDP engines with
the free space size lower than 1GB in the namespace sample-cr-namespace.
Log query:
prometheus-data-collection-settings: |-
[prometheus_data_collection_settings.cluster]
interval = "1m"
fieldpass = ["msdpoperator_reconcile_total",
"msdpoperator_reconcile_failed",
"msdpoperator_operator_run",
"msdpoperator_diskFreeLess5GBEngines_total",
"msdpoperator_diskFreeMiBytesInEngine",
"msdpoperator_diskFreeLess10GBClusters_total",
"msdpoperator_totalDiskFreePercentInCluster",
"msdpoperator_diskFreePercentInEngine",
"msdpoperator_pvcFreePercentInCluster",
"msdpoperator_unhealthyEngines_total",
"msdpoperator_createdPods_total"]
monitor_kubernetes_pods = true
monitor_kubernetes_pods_namespaces =
["msdp-operator-system"]
Table 13-2 lists the Prometheus metrics that MSDP Scaleout supports.
The configuration change takes a few minutes and all omsagent pods in the
cluster restart.
The default namespace of prometheus metrics is prometheus.
5 Add alert rules for the integrated metrics.
Add related log query, add new alert rule for the selected query, and alert
group/action for it.
For example,
If the free space size of the MSDP Scaleout engines is lower than 1 GB in past
5 minutes, alert the users.
Log query:
InsightsMetrics
If multiple MSDP Scaleouts are deployed in the same AKS cluster, use the
filter to search the results. For example, search the MSDP engines with the
free space size lower than 1GB in the namespace sample-cr-namespace
Log query:
InsightsMetrics
| where Name == "msdpoperator_diskFreeMiBytesInEngine"
| where Namespace == "prometheus"
| where TimeGenerated > ago(5m)
| where Val <= 1000000
| where Val > 0
| extend Tags = parse_json(Tags)
| where Tags.msdpscalout_ns == "sample-cr-namespace"
■ Run the following command to find the Kubernetes cluster level resources that
belong to the CR:
kubectl api-resources --verbs=list --namespaced=false -o name |
xargs -n 1 -i bash -c 'kubectl get --show-kind --show-labels
--ignore-not-found {} | grep -E "msdp-operator|<cr-name>"'
Chapter 14
Managing NetBackup
This chapter includes the following topics:
After adding the VxUpdate package to nbrepo, this package is persisted even
after pod restarts.
The following table describes the specs that can be edited for each CR.
Spec Description
(AKS-specific) capacity Catalog, log and data volume storage capacity can be
updated.
If you edit any other fields, the deployment can go into an inconsistent state.
Additional steps
■ Delete the Load Balancer service created for the media server by running the
following commands:
$ kubectl get service --namespace <namespace_name>
$ kubectl delete service <service-name> --namespace <namespace_name>
■ Identify and delete any outstanding persistent volume claims for the media server
by running the following commands:
$ kubectl get pvc --namespace <namespace_name>
$ kubectl delete pvc <pvc-name>
■ Locate and delete any persistent volumes created for the media server by running
the following commands:
$ kubectl get pv
$ kubectl delete pv <pv-name> --grace-period=0 --force
2 Change the node selector labelKey and labelValue to new values for the
primary/media server.
3 Save the environment CR.
This sets the StatefulSet replicas to 0 for the respective NetBackup server, which
terminates the pods. After successful migration, the StatefulSet replicas are
restored to the original value.
Chapter 15
Managing the Load
Balancer service
This chapter includes the following topics:
Static IP addresses and FQDNs, if used, must be created before being used.
Refer to the section below:
■ Pre-allocation of static IP address and FQDN from resource group
In this case, it is required to provide the network resource group in
annotations. This resource group is the resource group of load balancer
public IPs that are in the same resource group as the cluster infrastructure
(node resource group). The static FQDN and IP address must remain valid
in pod failure or upgrade scenarios as well.
If the user wants to use a public load balancer, add type: Public in the
networkLoadBalancer section of the primary and media server sections in the
environment CR.
■ Example: In primary CR,
networkLoadBalancer:
type: Public
annotations:
- service.beta.kubernetes.io/azure-load-balancer-resource-group: <name of network resource-group>
ipList:
- fqdn: primary.eastus.cloudapp.azure.com
ipAddr: 40.123.45.123
networkLoadBalancer:
annotations:
- service.beta.kubernetes.io/azure-load-balancer-resource-group: "<name of network resource-group>"
ipList:
- fqdn: media-1.eastus.cloudapp.azure.com
ipAddr: 40.123.45.123
- fqdn: media-2.eastus.cloudapp.azure.com
ipAddr: 40.123.45.124
■ (EKS-specific)
■ NetBackup supports the network load balancer with AWS Load Balancer
scheme as internet-facing.
■ FQDN must be created before being used. Refer to the sections below for the
different allowed annotations to be used in the CR spec.
■ The user must add the following annotations:
service.beta.kubernetes.io/aws-load-balancer-subnets: <subnet1 name>
In addition to the above annotations, if required user can add more
annotations supported by AWS. For more information, see AWS Load
Balancer Controller Annotations.
For example:
CR spec in primary server,
networkLoadBalancer:
type: Private
annotations:
service.beta.kubernetes.io/aws-load-balancer-subnets: <subnet1 name>
ipList:
"10.244.33.27: abc.vxindia.veritas.com"
networkLoadBalancer:
type: Private
annotations:
service.beta.kubernetes.io/aws-load-balancer-subnets: <subnet1 name>
ipList:
"10.244.33.28: pqr.vxindia.veritas.com"
"10.244.33.29: xyz.vxindia.veritas.com"
Note: The subnet provided here should be the same as the one given in the node
pool used for the primary server and media server.
If the NetBackup client is outside the VPC, or to access the Web UI from outside the
VPC, the client CIDR must be added with all NetBackup ports to the cluster's security
group rules. Run the following command to obtain the cluster security group:
aws eks describe-cluster --name <my-cluster> --query
cluster.resourcesVpcConfig.clusterSecurityGroupId
For more information on cluster security group, see Amazon EKS security group
requirements and considerations.
Add inbound rule to security group. For more information, see Add rules to a security
group.
■ Media server:
■ 1556
Used as bidirectional port. Primary server to/from media servers and primary
server to/from client require this TCP port for communication.
■ 13724
Used as a fall-back option when a legacy service cannot be reached through
PBX and when using the Resilient Network feature.
Note: Add the NFS rule that allows traffic on port 2049 directly to the cluster
security group. The security group attached to EFS must also allow traffic
from port 2049.
Note: Be cautious while performing this step; it may lead to data loss.
■ Before using the DNS and its respective IP address in CR yaml, you can verify
the IP address and its DNS resolution using nslookup.
■ In case of media server scaleout, ensure that the number of IP addresses
mentioned in IPList in networkLoadBalancer section matches the replica count.
■ If nslookup is done for loadbalancer IP inside the container, it returns the DNS
in the form of <svc name>.<namespace_name>.svc.cluster.local. This is
Kubernetes behavior. Outside the pod, the loadbalancer service IP address is
resolved to the configured DNS. The nbbptestconnection command inside
the pods can provide a mismatch in DNS names, which can be ignored.
For example:
■ For primary server load balancer service:
■ Service name starts with Name of primary server like <Name>-primary.
Edit the service with the kubectl edit service <Name>-primary -n
<namespace> command.
Note: The load balancer service name Name used in the primary server and
media server specification must be unique.
3 Add an entry for the new port in the ports array in the specification field of the
service. For example, if the user wants to add port 111, add the following entry
in the ports array in the specification field:
- name: custom-111
  port: 111
  protocol: TCP
  targetPort: 111
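A minimal sketch for verifying that the new port is exposed on the service (names are placeholders):
kubectl get service <Name>-primary -n <namespace> \
-o jsonpath='{.spec.ports[*].port}'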
Azure-specific
1 Launch an Azure CLI pod into the AKS cluster using the following command:
$ kubectl run az-cli --image=mcr.microsoft.com/azure-cli:2.53.0
--command -- sleep infinity
4 (Optional) Create a key vault policy to allow the current user to retrieve the
database credential.
Obtain the name of your resource group, key vault and ID of the current user
by using the following respective commands:
■ Resource group name:
$ RESOURCE_GROUP=<resource_group_name>
8 (Optional) Verify the current password encryption method by using the following
command:
az postgres flexible-server execute -p "$OLD_DBADMINPASSWORD" -u
$DBADMINUSER -n $DBSERVER -d postgres -q "SELECT * from
azure_roles_authtype();" -o table
Or
If you are only trying to re-encrypt the current password without changing it,
use the following command:
az postgres flexible-server execute -p "$OLD_DBADMINPASSWORD" -u
$DBADMINUSER -n $DBSERVER -d postgres -q "ALTER USER \"nbdbadmin\"
WITH PASSWORD '$OLD_DBADMINPASSWORD';"
Note: You can reset the flexible server password by using the following
command. This command does not require az extension and potentially could
be run outside of the az-cli container.
az postgres flexible-server update -g $RESOURCE_GROUP -n $DBSERVER
--admin-password <password>
10 Use the following command to verify if the password is using the correct
encryption method (SCRAM-SHA-256):
az postgres flexible-server execute -p "$OLD_DBADMINPASSWORD" -u
$DBADMINUSER -n $DBSERVER -d postgres -q "SELECT * from
azure_roles_authtype();" -o table
12 (Optional) Delete the key vault access policy created in step 4 above:
$ az keyvault delete-policy -n $KEYVAULT --upn $USER_ID
AWS-specific
1 Use lambda function to change the password.
LAMBDA_ARN is the ARN of the password changing lambda function. This
can be obtained from the lambda function page on AWS console.
NEW_PASSWORD is the new password to be used.
$ aws lambda invoke --function-name $LAMBDA_ARN \
--cli-binary-format raw-in-base64-out \
--payload "{\"password\":\"$NEW_PASSWORD\"}" response_file
Containerized PostgreSQL
1 Exec into primary pod and change database password using the following
command:
$ kubectl exec -it <primary-pod-name> -n netbackup -- bash
# exit
TLS_FILE_NAME='/tmp/tls.crt'
rm -f ${TLS_FILE_NAME}
DB_CERT_URL="https://cacerts.digicert.com/DigiCertGlobalRootCA.crt.pem"
■ EKS-specific:
TLS_FILE_NAME='/tmp/tls.crt'
PROXY_FILE_NAME='/tmp/proxy.pem'
rm -f ${TLS_FILE_NAME} ${PROXY_FILE_NAME}
DB_CERT_URL="https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem"
DB_PROXY_CERT_URL="https://www.amazontrust.com/repository/AmazonRootCA1.pem"
■ Backing up a catalog
■ Restoring a catalog
Backing up a catalog
You can backup a catalog by using one of the following methods:
■ Automatically
■ Manually
3 Once catalog policy is created, configure Recovery Vault storage in the catalog
backup policy. For more information, see NetBackup Deduplication Guide.
4 In the automatically configured catalog backup policy, the DR package path is
set to /mnt/nbdb/usr/openv/drpackage_<storage server name>. If required,
this can be changed by editing the policy from the Web UI.
5 If the email field is included in the DR Secret, then on running a catalog backup
job, the created DR packages are sent through email. This is applicable
only when the email server is configured. See “Configuring email server”
on page 97.
6 Exec into the primary server pod using the following command:
kubectl exec -it -n <namespace> <primaryserver pod name> -- bash
Restoring a catalog
You can restore a catalog. This section describes the procedures for restoring a
catalog when the catalog backup is taken on an external media server or on MSDP-X,
and one of the following is corrupted:
■ Primary server
■ MSDP-X
■ MSDP-X and Primary server
■ Delete the PV linked to primary server PVC using the kubectl delete pv
<pv-name> command.
6 (EKS-specific) Navigate to mounted EFS directory and delete the content from
primary_catalog folder by running the rm -rf /efs/* command.
7 Change the CR spec paused: true to paused: false in the primary server section
and reapply the yaml with the kubectl apply -f environment.yaml -n
<namespace> command.
8 Once the primary pod is in ready state, execute the following command in the
primary server pod:
kubectl exec -it -n <namespace> <primary-pod-name> -- /bin/bash
■ Change ownership of the DRPackages folder to the service user using the chown
nbsvcusr:nbsvcusr /mnt/nblogs/DRPackages command.
■ From Web UI, allow reissue of token from primary server for MSDP only
as follows:
Navigate to Security > Host Mappings for the MSDP storage server and
select Allow Auto reissue Certificate.
■ Run the primary server reconciler as follows:
■ Edit the environment (using the kubectl edit environment -n
<namespace> command), change the primary spec's paused field
to true, and save it.
■ To enable the reconciler to run, the environment must be edited again
and the primary's paused field must be set to false.
The SHA fingerprint is updated in the primary CR's status.
■ Edit the environment using kubectl edit environment -n <namespace>
command and change paused field to false for MSDP.
■ Verify if MSDP installation is successful and default MSDP storage server,
STU and disk pool is created with old names. This takes some time. Hence,
wait before the STU and disk pool display on the Web UI before proceeding
to the next step.
■ Perform from step 2 in the following section:
See “Scenario 2: MSDP Scaleout and its data is lost and the NetBackup
primary server was destroyed and is re-installed” on page 204.
■ Edit environment CR and change paused: false for media server.
■ Perform full catalog recovery using one of the following options:
Trigger a catalog recovery from the Web UI.
Or
Exec into primary pod and run bprecover -wizard command.
■ Once recovery is completed, restart the NetBackup services:
Stop NetBackup services using the
/usr/openv/netbackup/bin/bp.kill_all command.
Start NetBackup services using the
/usr/openv/netbackup/bin/bp.start_all command.
■ Verify that the Primary, Media, MSDP and Snapshot Manager server are
up and running.
MSDP-X corrupted
1 Note the storage server, cloud LSU and cloud bucket name.
2 Edit the environment and remove MSDP server.
3 From NetBackup Web UI allow reissue of token for MSDP server.
4 Deploy MSDP server with same fields using the following command:
kubectl apply -f environment.yaml
3 Delete the corrupted MSDP and Primary server by running the following
command:
kubectl delete -f environment.yaml -n <namespace>
■ Delete primary and MSDP server PVC (catalog, log and data) using the
kubectl delete pvc <pvc-name> -n <namespace> command.
■ Delete the PV linked to primary server PVC using the kubectl delete pv
<pv-name> command.
5 (EKS-specific) Navigate to mounted EFS directory and delete the content from
primary_catalog folder by running the rm -rf /efs/* command.
6 Modify the environment.yaml file by setting the paused: true field in the MSDP
and Media sections, that is, change the CR spec from paused: false to paused:
true for MSDP Scaleout and the media servers. Save it.
Note: Ensure that only primary server is deployed. Now apply the modified
environment.yaml file.
Save the environment.yaml file. Apply the environment.yaml file using the
following command:
kubectl apply -f environment.yaml -n <namespace>
15 Once media server pods are ready, perform full catalog recovery using one of
the following options:
Trigger a catalog recovery from the Web UI.
Or
Exec into primary pod and run bprecover -wizard command.
The MSDP Scaleout services are not interrupted when MSDP engines are added.
Note: Due to some Kubernetes restrictions, the MSDP operator restarts the engine
pods for attaching the existing and new volumes, which can cause a short
downtime of the services.
To expand the data or catalog volumes using the kubectl command directly
◆ Run the following command to increase the requested storage size in the
spec.dataVolumes field or in the spec.catalogVolume field.
kubectl -n <sample-namespace> edit msdpscaleout <your-cr-name>
[-o json | yaml]
Sometimes the Azure disk or Amazon EBS CSI driver may not respond to the volume
expansion request promptly. In this case, the operator retries the request by adding
1 byte to the requested volume size to trigger the volume expansion again. If it is
successful, the actual volume capacity could be slightly larger than the requested
size.
Due to the limitation of the Azure disk or Amazon CSI storage driver, the engine pods
need to be restarted for resizing the existing volumes. This can cause a short
downtime of the services.
MSDP Scaleout does not support the following:
■ Shrinking the volume size.
■ Changing the existing data volumes other than for storage expansion.
■ Expanding the log volume size. You can do it manually. See “Manual storage
expansion” on page 199.
■ Expanding the data volume size for MDS pods. You can do it manually.
See “Manual storage expansion” on page 199.
Note: If you add new MSDP Engines later, the new Engines will respect the CR
specification only. Your manual changes would not be respected by the new Engines.
■ After scaling up, the memory and CPU of the existing node pool may not meet
the performance requirements anymore. In this case, you can add more memory
and CPU by upgrading to the higher instance type to improve the existing node
pool performance or create another node pool with higher instance type and
update the node-selector for the CR accordingly. If you create another node
pool, the new node-selector does not take effect until you manually delete the
pods and deployments from the old node pool, or delete the old node pool
directly to have the pods re-scheduled to the new node pool.
■ Ensure that each AKS or EKS node supports mounting the number of data
volumes plus 5 data disks.
For example, if you have 16 data volumes for each engine, then each of your AKS
or EKS nodes should support mounting at least 21 data disks. The additional 5
data disks are for the potential MDS pod, Controller pod, or MSDP operator pod
to run on the same node with the MSDP engine.
Credentials, bucket name, and sub bucket name must be the same as the
recovered Cloud LSU configuration in the previous MSDP Scaleout deployment.
Configuration file template:
If the LSU cloud alias does not exist, you can use the following command to
add it.
/usr/openv/netbackup/bin/admincmd/csconfig cldinstance -as -in
<instance-name> -sts <storage-server-name> -lsu_name <lsu-name>
Note: For Veritas Alta Recovery Vault Azure storage, cmsCredName is a
credential name and can be any string. Add the recovery vault
credential in the CMS using the NetBackup web UI and provide the credential
name for cmsCredName. For more information, see the About Veritas Alta Recovery
Vault Azure topic in the NetBackup Deduplication Guide.
3 On the first MSDP Engine of MSDP Scaleout, run the following command for
each cloud LSU:
sudo -E -u msdpsvc /usr/openv/pdde/pdcr/bin/cacontrol --catalog
clouddr <LSUNAME>
Option 2: Stop MSDP services in each MSDP engine pod. MSDP service starts
automatically.
kubectl exec <sample-engine-pod> -n <sample-cr-namespace> -c
uss-engine -- /usr/openv/pdde/pdconfigure/pdde stop
Note: After this step, the MSDP storage server status may appear as down on
the NetBackup primary server. The status changes to up automatically after
the MSDP services are restarted in a few minutes.
If the status does not change, run the following command on the primary server
to update MSDP storage server status manually:
/usr/openv/volmgr/bin/tpconfig -update -storage_server
<storage-server-name> -stype PureDisk -sts_user_id
<storage-server-user-name> -password <storage-server-password>
Scenario 2: MSDP Scaleout and its data is lost and the NetBackup primary
server was destroyed and is re-installed
1 Redeploy MSDP Scaleout on a cluster by using the same CR parameters and
new NetBackup token.
2 When MSDP Scaleout is up and running, reuse the cloud LSU on NetBackup
primary server.
/usr/openv/netbackup/bin/admincmd/nbdevconfig -setconfig
-storage_server <STORAGESERVERNAME> -stype PureDisk -configlist
<configuration file>
Credentials, bucket name, and sub bucket name must be the same as the
recovered Cloud LSU configuration in previous MSDP Scaleout deployment.
Configuration file template:
If KMS is enabled, set up the KMS server and import the KMS keys.
If the LSU cloud alias does not exist, you can use the following command to
add it.
/usr/openv/netbackup/bin/admincmd/csconfig cldinstance -as -in
<instance-name> -sts <storage-server-name> -lsu_name <lsu-name>
Note: For Veritas Alta Recovery Vault Azure storage, cmsCredName is a
credential name and can be any string. Add the recovery vault
credential in the CMS using the NetBackup web UI and provide the credential
name for cmsCredName. For more information, see the About Veritas Alta Recovery
Vault Azure topic in the NetBackup Deduplication Guide.
3 On the first MSDP Engine of MSDP Scaleout, run the following command for
each cloud LSU:
sudo -E -u msdpsvc /usr/openv/pdde/pdcr/bin/cacontrol --catalog
clouddr <LSUNAME>
Note: After this step, the MSDP storage server status may appear as down on
the NetBackup primary server. The status changes to up automatically after
the MSDP services are restarted in a few minutes.
If the status does not change, run the following command on the primary server
to update MSDP storage server status manually:
/usr/openv/volmgr/bin/tpconfig -update -storage_server
<storage-server-name> -stype PureDisk -sts_user_id
<storage-server-user-name> -password <storage-server-password>
The command displays the IAM configurations in the cloud LSU and the current IAM configurations.
The following warning appears:
WARNING: This operation overwrites current IAM configurations
with the IAM configurations in cloud LSU.
To overwrite the current IAM configurations, type the following and press Enter.
overwrite-with-<cloud_LSU_name>
5 Get the token from the target domain NetBackup web UI.
Navigate to Security > Tokens. Enter the token name and other required
details. Click Create.
For more information, see the NetBackup Web UI Administrator’s Guide.
6 Add replication targets for the disk pool in the replication source domain.
Open Storage > Disk storage, and then click the Storage servers tab.
On the Disk pools tab, click the disk pool link.
Click Add to add the replication target.
7 In the Add replication targets page:
■ Select the replication target primary server.
■ Provide the target domain token.
■ Select the target volume.
Run MSDP commands with the non-root user msdpsvc after logging in to an engine pod.
For example, sudo -E -u msdpsvc <command>
The MSDP Scaleout services in an engine pod run as the non-root user msdpsvc. If you run the MSDP Scaleout services or commands as the root user, MSDP Scaleout may stop working due to file permission issues.
--set global.containerRegistry="$REGISTRY" \
--set global.storage.eks.fileSystemId=${EFS_ID} \
--set msdp-operator.image.name="$MSDP_OPERATOR_IMAGE_NAME" \
--set msdp-operator.image.tag="$MSDP_OPERATOR_IMAGE_TAG" \
--set msdp-operator.storageClass.name=nb-disk-standardssd \
--set msdp-operator.storageClass.size=5Gi \
--set msdp-operator.logging.debug=false \
--set msdp-operator.logging.age=28 \
--set msdp-operator.logging.num=20 \
--set msdp-operator.nodeSelector."${MSDP_NODE_SELECTOR_KEY//./\\.}"="${MSDP_NODE_SELECTOR_VALUE}" \
--set nb-operator.image.name="$OPERATOR_IMAGE_NAME" \
--set nb-operator.image.tag="$OPERATOR_IMAGE_TAG" \
--set nb-operator.loglevel.value="0" \
--set nb-operator.nodeSelector.node_selector_key="$MEDIA_NODE_SELECTOR_KEY" \
--set nb-operator.nodeSelector.node_selector_value="$MEDIA_NODE_SELECTOR_VALUE" \
--set flexsnap-operator.image.name="$FLEXSNAP_OPERATOR_IMAGE_NAME" \
--set flexsnap-operator.image.tag="$FLEXSNAP_OPERATOR_IMAGE_TAG" \
--set flexsnap-operator.nodeSelector.node_selector_key="$MEDIA_NODE_SELECTOR_KEY" \
--set flexsnap-operator.nodeSelector.node_selector_value="$MEDIA_NODE_SELECTOR_VALUE"
3 If the reclaim policy of the storage class is Retain, run the following command
to restart the existing MSDP Scaleout. MSDP Scaleout starts with the existing
data/metadata.
kubectl apply -f <your-cr-yaml>
Note: All affected pods or other Kubernetes workload objects must be restarted
for the change to take effect.
4 After the CR YAML file update, existing pods are terminated and restarted one
at a time, and the pods are re-scheduled for the new node pool automatically.
Note: Controller pods are temporarily unavailable when the MDS pod restarts.
Do not delete pods manually.
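To follow the rolling restart, you can watch the pod status until all pods are Running again (a generic kubectl invocation, not specific to this procedure):
kubectl get pods -n <sample-cr-namespace> -w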
5 Re-run the following command to update the MSDP Scaleout operator with the new node pool:
# helm upgrade --install operators
6 If the node selector does not match any existing nodes at the time of the change, you see a message on the console.
If node auto scaling is enabled, the issue may resolve automatically as new nodes are made available to the cluster. If an invalid node selector is provided, pods may go into the Pending state after the update. In that case, run the command above again.
Do not delete the pods manually.
Chapter 20
PostgreSQL DBaaS
Maintenance
This chapter includes the following topics:
4 On the Create an alert rule page, under the Condition tab select Add
condition.
5 Select a metric from the list of signals to be alerted on from the Select a signal
page.
6 Configure the alert logic including the Condition (for example, Greater than),
Threshold (for example, 85 percent), Time Aggregation, Period of time the
metric rule must be satisfied before the alert triggers (for example, over the
last 30 minutes), and Frequency.
7 Select Next: Actions >.
8 Under the Actions section, select Create action group to create a new group
to receive notifications on the alert.
9 Fill in the Basics form with a name, display name, subscription, and resource
group. Select Next: Notifications >.
10 Configure an Email/SMS message/Push/Voice action type by providing the
details for all the required fields and then click OK.
11 (Optional) Select Next: Actions > to add actions based on the alerts.
12 Select Review+Create to review the information and create the alert rule.
13 Provide the alert details, and select Next/Review+Create.
If the alert triggers, an email is sent to the email address provided in the notifications section.
For more information, refer to the PostgreSQL section of the Azure documentation.
Note: It is a best practice to create a second, critical alarm for a lower threshold.
For example, set your first alarm for 25 GB, and the second critical alarm to 10 GB.
For more information on creating alarms for other critical metrics, refer to the Amazon
RDS User Guide.
5 Navigate to Amazon RDS console > Event subscriptions > Create event
subscription.
Enter the Name, select the ARN for the SNS topic, and select Instances as
the Source type.
Select specific instances and select your instance.
6 Navigate to Select specific event categories > select Maintenance > Create.
For more information, refer to the RDS maintenance section of the Amazon
Knowledge Center.
Chapter 21
Patching mechanism for
Primary and Media servers
This chapter includes the following topics:
■ Overview
■ Patching of containers
Overview
NetBackup version 10.4 provides support for patching the following containers:
■ Primary (main)
■ Media
■ Poddependency-init
Patching introduces the ability for customers to patch images in a Kubernetes native
way by specifying the image tags for respective containers using the
serviceImageTag field of the environment.
Note: Ensure that the required patch images are pushed to the registry.
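For orientation, a sketch of how the serviceImageTag field might look in the environment spec. This shape is inferred from the patch example later in this section and is an assumption, not a published schema:
primary:
  serviceImageTag:
    # hypothetical container-to-tag mapping
    primary.main: "10.4-patch"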
Patching of containers
This section describes the procedure for patching of the following containers (listed
with examples):
■ Primary (main) containers
For example, netbackup/main:10.4-patch
Patching mechanism for Primary and Media servers 222
Patching of containers
■ Media containers
For example, netbackup/media:10.4-patch
■ Poddependency-init containers
For example, netbackup/operator:10.4-patch
To patch the main, media and poddependency-init container
1 Use the following command to obtain the environment name:
$ kubectl get environments -n <namespace>
Or
If serviceImageTag is not present, then run the following command by changing
the value field. For example, change 10.4-patch to the required image tag:
■ For primary container:
$ kubectl patch environment <env-name> -n <namespace>
--type=json --patch '[{"op": "replace", "path":
"/spec/primary/serviceImageTag", "value": {}},{"op": "replace",
"path": "/spec/primary/serviceImageTag/primary.main", "value":
"10.4-patch"}]'
Note: During the upgrade process, ensure that the cluster nodes are not scaled
down to 0 or restarted.
To upgrade operators
1 Run the following script when upgrading from an earlier release of Cloud Scale
that used a single helm chart or the kustomize deployment method:
scripts/prep_operators_for_upgrade.sh
■ Use the following command to save the operators chart values to a file:
# helm show values operators-<version>.tgz > operators-values.yaml
■ Use the following command to edit the chart values to match your
deployment scenario:
# vi operators-values.yaml
global:
  # Toggle for platform-specific features & settings
  # Microsoft AKS: "aks"
  # Amazon EKS: "eks"
  platform: "eks"
  # This specifies a container registry that the cluster has access to.
  # NetBackup images should be pushed to this registry prior to applying this
  # Environment resource.
  # Example Azure Container Registry name:
  # example.azurecr.io
  # Example AWS Elastic Container Registry name:
  # 123456789012.dkr.ecr.us-east-1.amazonaws.com
  containerRegistry: "364956537575.dkr.ecr.us-east-1.amazonaws.com/ECR Name"
  operatorNamespace: "netbackup-operator-system"
  storage:
    eks:
      fileSystemId: fs-0f3cc640eeec507d0
msdp-operator:
  image:
    name: msdp-operator
    # Provide tag value in quotes eg: '17.0'
    tag: "20.4"
    pullPolicy: Always
  namespace:
    labels:
      control-plane: controller-manager
  # This determines the path used for storing core files in the case of a crash.
  corePattern: "/core/core.%e.%p.%t"
  logging:
    # Enable verbose logging
    debug: false
    # Maximum age (in days) to retain log files, 1 <= N <= 365
    age: 28
    # Maximum number of log files to retain, 1 <= N <= 20
    num: 20
nb-operator:
  image:
    name: "netbackup/operator"
    tag: "10.4"
    pullPolicy: Always
  flexsnap-operator:
    image:
      tag: "10.4.0.0.1016"
  namespace:
    labels:
      nb-control-plane: nb-controller-manager
  nodeSelector:
    node_selector_key: agentpool
    node_selector_value: agentpool
  #loglevel:
  # "-1" - Debug (not recommended for production)
  # "0" - Info
  # "1" - Warn
  # "2" - Error
  loglevel:
    value: "0"
flexsnap-operator:
  replicas: 1
  namespace:
    labels: {}
  image:
    name: "veritas/flexsnap-deploy"
    tag: "10.4.0.1004"
    pullPolicy: Always
  nodeSelector:
    node_selector_key: agentpool
    node_selector_value: agentpool
■ Use the following command to save the PostgreSQL chart values to a file:
# helm show values postgresql-<version>.tgz > postgres-values.yaml
postgresql:
  replicas: 1
  # The values in the image (name, tag) are placeholders. These will be set
  # when the deploy_nb_cloudscale.sh runs.
  image:
    name: "netbackup/postgresql"
    tag: "14.11.1.0"
    pullPolicy: Always
  service:
    serviceName: nb-postgresql
  volume:
    volumeClaimName: nb-psql-pvc
    volumeDefaultMode: 0640
    pvcStorage: 5Gi
    # configMapName: nbpsqlconf
    storageClassName: nb-disk-premium
    mountPathData: /netbackup/postgresqldb
  secretMountPath: /netbackup/postgresql/keys/server
  # mountConf: /netbackup
  timezone: null
  securityContext:
    runAsUser: 0
  createCerts: true
  # pgbouncerIniPath: /netbackup/pgbouncer.ini
  serverSecretName: postgresql-server-crt
  clientSecretName: postgresql-client-crt
  dbSecretName: dbsecret
  dbPort: 13785
  pgbouncerPort: 13787
  dbAdminName: postgres
  initialDbAdminPassword: postgres
  dataDir: /netbackup/postgresqldb
  # postgresqlConfFilePath: /netbackup/postgresql.conf
  # pgHbaConfFilePath: /netbackup/pg_hba.conf
  defaultPostgresqlHostName: nb-postgresql
To save costs, you can set storageClassName to nb-disk-standardssd for non-production environments.
Note: To ensure that the Postgres pod is scheduled only on the primary node pool, add Kubernetes taints to the Media, MSDP, and flexsnap/Snapshot Manager node pools.
If the primary node pool has taints applied, manually add tolerations to the PostgreSQL StatefulSet as follows:
■ To verify that node pools use taints, run the following command:
kubectl get nodes -o=custom-columns=NodeName:.metadata.name,TaintKey:.spec.taints[*].key,TaintValue:.spec.taints[*].value,TaintEffect:.spec.taints[*].effect
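For reference, a taint of the shape used in this example can be applied to a node with kubectl; the nbupool=agentpool key/value pair below is a placeholder that mirrors the toleration in the StatefulSet example that follows:
kubectl taint nodes <node-name> nbupool=agentpool:NoSchedule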
apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    meta.helm.sh/release-name: postgresql
    meta.helm.sh/release-namespace: netbackup
  creationTimestamp: "2024-03-25T15:11:59Z"
  generation: 1
  labels:
    app: nb-postgresql
    app.kubernetes.io/managed-by: Helm
  name: nb-postgresql
  ...
spec:
  template:
    spec:
      containers:
      ...
      nodeSelector:
        nbupool: agentpool
      tolerations:
      - effect: NoSchedule
        key: nbupool
        operator: Equal
        value: agentpool
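One way to add the tolerations shown above is to edit the StatefulSet directly (assuming the StatefulSet name nb-postgresql and the netbackup namespace from this example):
kubectl edit statefulset nb-postgresql -n netbackup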
■ For AWS:
TLS_FILE_NAME='/tmp/tls.crt'
PROXY_FILE_NAME='/tmp/proxy.pem'
rm -f ${TLS_FILE_NAME} ${PROXY_FILE_NAME}
DB_CERT_URL="https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem"
DB_PROXY_CERT_URL="https://www.amazontrust.com/repository/AmazonRootCA1.pem"
■ For Azure:
TLS_FILE_NAME='/tmp/tls.crt'
rm -f ${TLS_FILE_NAME}
DB_CERT_URL="https://cacerts.digicert.com/DigiCertGlobalRootCA.crt.pem"
After installing the db-cert bundle, ensure that the db-cert configMap is present in the netbackup namespace with size 1.
Note: If the configMap is showing the size as 0, then delete it and ensure that
the trust-manager recreates it before proceeding further.
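A quick way to check is to list the configMap and inspect its DATA column, which should show 1 (assuming the configMap is named db-cert, as described above):
kubectl get configmap db-cert -n netbackup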
■ Navigate to the directory containing the patch file and upgrade the Cloud Scale deployment as follows:
$ cd scripts/
$ kubectl patch environment <env-name> --type json -n $ENVIRONMENT_NAMESPACE --patch-file cloudscale_patch.json
[
  {
    "op": "replace",
    "path": "/spec/dbSecretName",
    "value": "dbsecret"
  },
  {
    "op": "replace",
    "path": "/spec/tag",
    "value": "10.4"
  },
  {
    "op": "replace",
    "path": "/spec/msdpScaleouts/0/tag",
    "value": "20.4"
  },
  {
    "op": "replace",
    "path": "/spec/cpServer/0/tag",
    "value": "10.4.0.1074"
  }
]
[
  {
    "op": "replace",
    "path": "/spec/dbSecretName",
    "value": "dbsecret"
  },
  {
    "op": "replace",
    "path": "/spec/primary/tag",
    "value": "10.4"
  },
  {
    "op": "replace",
    "path": "/spec/mediaServers/0/tag",
    "value": "10.4"
  },
  {
    "op": "replace",
    "path": "/spec/msdpScaleouts/0/tag",
    "value": "20.4"
  },
  {
    "op": "replace",
    "path": "/spec/cpServer/0/tag",
    "value": "10.4.0.1074"
  }
]
■ For DBAAS_cloudscale_patch.json:
Note: This patch file is to be used only during DBaaS to DBaaS migration.
[
  {
    "op": "replace",
    "path": "/spec/dbSecretProviderClass",
    "value": "dbsecret-spc"
  },
  {
    "op": "replace",
    "path": "/spec/tag",
    "value": "10.4"
  },
  {
    "op": "replace",
    "path": "/spec/msdpScaleouts/0/tag",
    "value": "20.4"
  },
  {
    "op": "replace",
    "path": "/spec/cpServer/0/tag",
    "value": "10.4.0.1101"
  }
]
8 Wait until the Environment CR displays the status as ready. During this time, pods are expected to restart and any new services to start.
Run the following command to check the environment CR status:
kubectl get environment -n <namespace>
Note: Ensure that you go through this section carefully before starting with the
upgrade procedure.
5 Preserve the environment CR object and the operator directory that was used to deploy the NetBackup operator, using the following command:
kubectl -n <namespace> get environment.netbackup.veritas.com <environment name> -o yaml > environment.yaml
■ Use the following command to edit the chart values to match your
deployment scenario:
# vi operators-values.yaml
global:
  # Toggle for platform-specific features & settings
  # Microsoft AKS: "aks"
  # Amazon EKS: "eks"
  platform: "eks"
  # This specifies a container registry that the cluster has access to.
  # NetBackup images should be pushed to this registry prior to applying this
  # Environment resource.
  # Example Azure Container Registry name:
  # example.azurecr.io
  # Example AWS Elastic Container Registry name:
  # 123456789012.dkr.ecr.us-east-1.amazonaws.com
  containerRegistry: "364956537575.dkr.ecr.us-east-1.amazonaws.com"
  operatorNamespace: "netbackup-operator-system"
  storage:
    eks:
      fileSystemId: fs-0f3cc640eeec507d0
msdp-operator:
  image:
    name: msdp-operator
    # Provide tag value in quotes eg: '17.0'
    tag: "20.4"
    pullPolicy: Always
  namespace:
    labels:
      control-plane: controller-manager
  # This determines the path used for storing core files in the case of a crash.
  corePattern: "/core/core.%e.%p.%t"
  logging:
    # Enable verbose logging
    debug: false
    # Maximum age (in days) to retain log files, 1 <= N <= 365
    age: 28
    # Maximum number of log files to retain, 1 <= N <= 20
    num: 20
nb-operator:
  image:
    name: "netbackup/operator"
    tag: "10.4"
    pullPolicy: Always
  flexsnap-operator:
    image:
      tag: "10.4.0.0.1016"
  namespace:
    labels:
      nb-control-plane: nb-controller-manager
  nodeSelector:
    node_selector_key: agentpool
    node_selector_value: agentpool
  #loglevel:
  # "-1" - Debug (not recommended for production)
  # "0" - Info
  # "1" - Warn
  # "2" - Error
  loglevel:
    value: "0"
flexsnap-operator:
  replicas: 1
  namespace:
    labels: {}
  image:
    name: "veritas/flexsnap-deploy"
    tag: "10.4.0.1004"
    pullPolicy: Always
  nodeSelector:
    node_selector_key: agentpool
    node_selector_value: agentpool
Update the storageClassName for the catalog volume of the primary server in the Storage subsection of the primary section in the environment.yaml file.
(AKS-specific) Update the details of the data volume for the primary server in the environment.yaml file. The storage class must be of the Azure managed disk storage type.
Note: Upgrade the PrimaryServer first, and then change the tag for the MediaServer by re-editing the environment to upgrade the MediaServer as well. If this sequence is not followed, the deployment may go into an inconsistent state.
Perform the following if the upgrade fails midway for the primary server or media server
1 Check the installation logs using the following command:
kubectl logs <PrimaryServer-pod-name/MediaServer-pod-name> -n
<PrimaryServer/MediaServer-CR-namespace>
2 If required, check the NetBackup logs by performing exec into the pod using
the following command:
kubectl exec -it -n <PrimaryServer/MediaServer-CR-namespace>
<PrimaryServer/MediaServer-pod-name> -- bash
3 Fix the issue and restart the pod by deleting the respective pod with the following command:
kubectl delete pod <PrimaryServer/MediaServer-pod-name> -n <PrimaryServer/MediaServer-CR-namespace>
Upgrading 244
Upgrading NetBackup individual components
4 A new pod is created and the upgrade process restarts for the respective NetBackup server.
5 Data migration jobs create pods that run before the primary server is deployed. A data migration pod persists for one hour after the migration only if the data migration job failed. The logs of the data migration run can be checked using the following command:
kubectl logs <migration-pod-name> -n <netbackup-environment-namespace>
You can copy the logs to retain them even after the job pod is deleted, using the following command:
kubectl logs <migration-pod-name> -n <netbackup-environment-namespace> > jobpod.log
Note: Downgrading NetBackup servers is not supported. Doing so may leave the NetBackup deployment in an inconsistent state.
2 Change the replica count of the primary server statefulset to 0 using the following command:
kubectl scale statefulsets <stateful-set-name> --replicas=0 -n <namespace>
3 Change the replica count of the media server statefulset to 0 using the following command:
kubectl scale statefulsets <stateful-set-name> --replicas=0 -n <namespace>
5 Restart the AKS cluster, or scale the nodes to 0 and rescale them.
6 Upgrade to the latest versions of NetBackup and MSDP operator.
7 Unpause the primary and media server specs by changing the value to paused: false in the primaryServer and mediaServer sections in the environment CR using the following command:
kubectl edit environments -n <namespace>
8 Change the imageTag to the latest imageTag for the primary and media servers together and save it.
EKS-specific
Ensure that all the steps mentioned for data migration in the following section are performed before upgrading to or freshly installing the latest NetBackup:
See “Preparing the environment for NetBackup installation on Kubernetes cluster” on page 24.
■ You must have deployed NetBackup on AWS with EBS as its storage class. While upgrading to the latest NetBackup, the existing catalog data of the primary server is migrated (copied) from EBS to Amazon Elastic File System (EFS).
■ Fresh NetBackup deployment: If you are deploying NetBackup for the first time, Amazon EFS is used for the primary server's catalog volume for any backup and restore operations.
Upgrading 246
Upgrading NetBackup individual components
2 Upgrade MSDP with the new build and image tag. Run the following command for MSDP:
./kubectl-msdp init --image <Image name:Tag> --storageclass
<Storage Class Name> --namespace <Namespace>
3 Edit the sample/environment.yaml file from the new build and perform the following changes:
■ Add the tag: <new_tag_of_upgrade_image> entry separately under the primary section.
■ Provide the EFS ID for storageClassName of the catalog volume in the primary section.
■ Use the following command to retrieve the previously used EFS ID from
PV and PVC:
kubectl get pvc -n <namespace>
From the output, copy the name of catalog PVC which is of the following
format:
catalog-<resource name prefix>-primary-0
■ Edit the environment.yaml file and update the image tag for Media
Server in mediaServer section.
■ Apply environment.yaml file using the following command and ensure
that the Media Server is deployed successfully:
kubectl apply -f environment.yaml
Note: The rollback procedure in this section can be performed only if a catalog backup was taken before the upgrade.
All options except the -i option must be the same as when the operator was initially deployed.
To upgrade MSDP Scaleout CR resources
1 If you use the environment operator for the MSDP Scaleout deployment, run
the following command to change the spec.msdpScaleouts[<index>].tag
field in the existing CR resources:
kubectl edit environment <environmentCR_name> -n <cr-namespace>
2 If MSDP S3 service is enabled and you want to upgrade from NetBackup 10.2,
you must provide the pre-allocated MSDP S3 FQDN and IP address in the
spec.msdpScaleouts[<index>].s3Ip field.
See “Enabling MSDP S3 service after MSDP Scaleout is deployed” on page 125.
3 If you use the MSDP operator for the MSDP Scaleout deployment, run the
following command to change the spec.version in the existing CR resources.
kubectl edit msdpscaleout <cr-name> -n <cr-namespace>
4 If MSDP S3 service is enabled and you want to upgrade from NetBackup 10.2,
you must provide the pre-allocated MSDP S3 FQDN and IP address in the
spec.s3ServiceIPFQDN field.
Wait for a few minutes. MSDP operator upgrades all the pods and other MSDP
Scaleout resources automatically.
5 The upgrade process restarts the pods. NetBackup jobs are interrupted during the process.
Upgrading 249
Upgrading NetBackup individual components
4 To upgrade the operator, apply the new image changes using the following command:
kubectl apply -k <operator folder name>
After applying the changes, a new Snapshot Manager operator pod starts in the operator namespace and runs successfully.
NB_VERSION=10.4.0
OPERATOR_NAMESPACE="netbackup-operator-system"
ENVIRONMENT_NAMESPACE="ns-155"
NB_DIR=/home/azureuser/VRTSk8s-netbackup-${NB_VERSION}/
Post-migration tasks
After migration, if the name is changed to Snapshot Manager, perform the following steps to renew the Linux and Windows on-host agents, and then perform the plugin-level discovery:
For Linux:
■ Edit the /etc/flexsnap.conf file for migrated Snapshot Manager.
For example,
[agent]
id = agent.c2ec74c967e043aaae5818e50a939556
Upgrading 251
Upgrading NetBackup individual components
■ Perform the Linux on-host agent renewal using the following command:
/opt/VRTScloudpoint/bin/flexsnap-agent --renew --token <auth_token>
For Windows:
■ Edit the \etc\flexsnap.conf file for migrated Snapshot Manager.
For example,
[global]
target = nbuxqa-alphaqa-10-250-172-172.vxindia.veritas.com
hostid = azure-vm-427a67a0-6f91-4a35-abb0-635e099fe9ad
[agent]
id = agent.3e2de0bf17d54ed0b54d4b33530594d8
■ Perform the Windows on-host agent renewal using the following command:
"c:\Program Files\Veritas\CloudPoint\flexsnap-agent.exe" --renew --token <auth_token>
Chapter 23
Uninstalling
This chapter includes the following topics:
Note: Replace the environment custom resource names as per your configuration
in the steps below.
2 Wait for all the pods, services, and resources to be terminated. To confirm, run:
$ kubectl get --namespace <namespace_name> all,environments,primaryservers,mediaservers,msdpscaleouts,cpservers
You should get a message that no resources were found in the nb-example namespace.
3 To identify and delete any outstanding persistent volume claims, run the
following:
$ kubectl get pvc --namespace <namespace_name>
To delete all PVCs under the same namespace, run the following command:
kubectl delete pvc -n <namespace> --all
4 To locate and delete any persistent volumes created by the deployment, run:
$ kubectl get pv
Note: Certain storage drivers may cause persistent volumes to get stuck in the Terminating state. To resolve this issue, remove the finalizer using the command: $ kubectl patch pv <pv-name> -p '{"metadata":{"finalizers":null}}'
Note: (EKS-specific) Navigate to the mounted EFS directory and delete the content of the primary_catalog folder by running the rm -rf command.
For more information on uninstalling the Snapshot Manager, refer to the following
section:
See “Uninstalling Snapshot Manager from Kubernetes cluster” on page 254.
2. The following commands can be used to remove and disable the Snapshot Manager from NetBackup:
kubectl apply -f environment.yaml -n $ENVIRONMENT_NAMESPACE
sleep 10s
When an MSDP Scaleout CR is deleted, the critical MSDP data and metadata are not deleted. You must delete them manually. If you delete the CR without cleaning up the data and metadata, you can re-apply the same CR YAML file to restart MSDP Scaleout again by reusing the existing data.
2 If your storage class uses the Retain policy, note down the PVs associated with the CR PVCs so that you can delete them later at the Kubernetes cluster level.
kubectl get pod,svc,deploy,rs,ds,pvc,secrets,certificates,issuers,cm,sa,role,rolebinding -n <sample-namespace> -o wide
4 If your storage class uses the Retain policy, you must delete the Azure disks using the Azure portal or delete the EBS volumes using the Amazon console. You can also use the Azure or AWS CLI.
AKS: az disk delete -g $RESOURCE_GROUP --name $AZURE_DISK --yes
EKS: aws ec2 delete-volume --volume-id <value>
See “Deploying MSDP Scaleout” on page 127.
See “Reinstalling MSDP Scaleout operator” on page 212.
■ -k: Delete all resources of MSDP Scaleout operator except the namespace.
3 If your storage class uses the Retain policy, you must delete the Azure disks using the Azure portal or delete the EBS volumes using the Amazon console. You can also use the Azure or AWS CLI.
AKS: az disk delete -g $RESOURCE_GROUP --name $AZURE_DISK --yes
EKS: aws ec2 delete-volume --volume-id <value>
See “Deploying MSDP Scaleout” on page 127.
See “Reinstalling MSDP Scaleout operator” on page 212.
Chapter 24
Troubleshooting
This chapter includes the following topics:
NAME                                                      READY   STATUS    RESTARTS   AGE
pod/flexsnap-operator-7d45568767-n9g27                    1/1     Running   0          18h
pod/msdp-operator-controller-manager-0                    2/2     Running   0          43m
pod/msdp-operator-controller-manager-1                    2/2     Running   0          44m
pod/netbackup-operator-controller-manager-6cbf85694f-p97sw   2/2  Running   0          42m

NAME                                                        TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/msdp-operator-controller-manager-metrics-service    ClusterIP   10.96.144.99   <none>        8443/TCP   3h6m
service/msdp-operator-webhook-service                       ClusterIP   10.96.74.75    <none>        443/TCP    3h6m
service/netbackup-operator-controller-manager-metrics-service   ClusterIP   10.96.104.94   <none>    8443/TCP   93m
service/netbackup-operator-webhook-service                  ClusterIP   10.96.210.26   <none>        443/TCP    93m

NAME                                                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/msdp-operator-controller-manager        1/1     1            1           3h6m
deployment.apps/netbackup-operator-controller-manager   1/1     1            1           93m

NAME                                                               DESIRED   CURRENT   READY   AGE
replicaset.apps/msdp-operator-controller-manager-65d8fd7c4d        1         1         1       3h6m
replicaset.apps/netbackup-operator-controller-manager-55d6bf59c8   1         1         1       93m
Verify that the operator pods display Running in the Status column and 2/2 in the Ready column, and that both deployments display 1/1 in the Ready column.
pod/dedupe1-uss-controller-79d554f8cc-598pr   1/1   Running   0   68m
pod/dedupe1-uss-mds-1                         1/1   Running   0   75m
pod/dedupe1-uss-mds-2                         1/1   Running   0   74m
pod/dedupe1-uss-mds-3                         1/1   Running   0   71m
pod/media1-media-0                            1/1   Running   0   53m
pod/environment-sample-primary-0              1/1   Running   0   86m
pod/x10-240-0-12.veritas.internal             1/1   Running   0   68m
pod/x10-240-0-13.veritas.internal             2/2   Running   0   64m
pod/x10-240-0-14.veritas.internal             2/2   Running   0   61m
pod/x10-240-0-15.veritas.internal             2/2   Running   0   59m
1556:30248/TCP   54m
service/environment-sample-primary      LoadBalancer   10.4.xxx.xxx   13781:30246/TCP,13782:30498/TCP,1556:31872/TCP,443:30049/TCP,8443:32032/TCP,22:31511/TCP   87m
service/x10-240-0-12-veritas-internal   LoadBalancer   10.4.xxx.xxx   10082:31199/TCP                   68m
service/x10-240-0-13-veritas-internal   LoadBalancer   10.4.xxx.xxx   10082:32439/TCP,10102:30284/TCP   68m
service/x10-240-0-14-veritas-internal   LoadBalancer   10.4.xxx.xxx   10082:31810/TCP,10102:31755/TCP   68m
service/x10-240-0-15-veritas-internal   LoadBalancer   10.4.xxx.xxx   10082:31664/TCP,10102:31811/TCP   68m
Once in the primary server shell prompt, to see the list of logs, run:
ls /usr/openv/logs/
To resolve this issue, update the sysctl.conf values for NetBackup servers
deployed on the Kubernetes cluster.
NetBackup image sets following values in sysctl.conf during Kubernetes
deployment:
■ net.ipv4.tcp_keepalive_time = 180
■ net.ipv4.tcp_keepalive_intvl = 10
■ net.ipv4.tcp_keepalive_probes = 20
■ net.ipv4.ip_local_port_range = 14000 65535
These settings are persisted at the location /mnt/nbdata/etc/sysctl.conf.
Modify the values in /mnt/nbdata/etc/sysctl.conf and restart the pod. The new
values are reflected after the pod restart.
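For example, a minimal sequence to apply the change, assuming the pod is managed by a StatefulSet that re-creates it automatically after deletion:
kubectl exec -it <primary-pod-name> -n <namespace> -- vi /mnt/nbdata/etc/sysctl.conf
kubectl delete pod <primary-pod-name> -n <namespace>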
If external media servers are used, perform the steps in the following order:
1. Add the following in /usr/openv/netbackup/bp.conf:
HOST_HAS_NAT_ENDPOINTS = YES
2. Add the following sysctl configuration values in /etc/sysctl.conf on the external media servers to avoid any socket connection issues:
■ net.ipv4.tcp_keepalive_time = 180
■ net.ipv4.tcp_keepalive_intvl = 10
■ net.ipv4.tcp_keepalive_probes = 20
■ net.ipv4.ip_local_port_range = 14000 65535
■ net.core.somaxconn = 4096
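After editing /etc/sysctl.conf on an external media server, the standard way to load the new values without a reboot is:
sysctl -p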
3 Depending on the output of the command and the reason for the issue, perform
the required steps and update the environment CR to resolve the issue.
Resolving the issue where the NetBackup server pod is not scheduled
for long time
The NetBackup server (primary server and media server) pods are stuck in Pending
state. The issue can be because of one of the following reasons:
■ Insufficient resource allocation.
■ Persistent volume claims are not bound to persistent volume.
If nodes are not available, the pod remains in the Pending state; if auto scaling is configured in the cluster, the event logs indicate that nodes are scaling up.
To resolve the issue where the NetBackup server pod is not scheduled for
long time
1 Check the pod event details for more information about the error using kubectl
describe <PrimaryServer/MediaServer_Pod_Name> -n <namespace>
command.
2 Depending on the output of the command and the reason for the issue, perform
the required steps and update the environment CR to resolve the issue.
Error: ERROR Storage class with the <storageClassName> name does not exist.
After fixing this error, the primary server or media server CR does not require any changes. In this case, the NetBackup operator reconciler loop is invoked every 10 hours. If you want the changes to take effect and invoke the NetBackup operator reconciler loop immediately, pause the reconciler of the custom resource by changing the paused: false value to paused: true in the primaryServer or mediaServer section by using the following command:
kubectl edit Environment <environment-CR-name> -n <namespace>
Then change the value back to paused: false (unpause) in the primaryServer or mediaServer section by using the following command:
kubectl edit Environment <environment-CR-name> -n <namespace>
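As a sketch, the same pause and unpause can also be done non-interactively with a JSON patch, assuming the field lives at /spec/primary/paused (adjust the path for the mediaServers section); this mirrors the patch style used elsewhere in this guide:
kubectl patch environment <environment-CR-name> -n <namespace> --type json --patch '[{"op": "replace", "path": "/spec/primary/paused", "value": true}]'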
You can copy the logs to retain them even after the job pod is deleted, using the following command:
kubectl logs <migration-pod-name> -n <netbackup-environment-namespace> > jobpod.log
2 Check pod events for obtaining more details for probe failure using the following
command:
kubectl describe pod/<podname> -n <namespace>
Kubernetes automatically tries to resolve the issue by restarting the pod after the liveness probe times out.
3 Depending on the error in the pod logs, perform the required steps or contact
technical support.
The NetBackup media server and primary server were in the running state, and then the media server persistent volume claim or the media server pod was deleted. In this case, reinstallation of the respective media server can cause the issue.
7 Delete data and logs PVC for respective media server only using the kubectl
delete pvc <pvc-name> -n <namespace> command.
8 Unpause the media server reconciler by changing the value to paused: false in the mediaServer section in the environment CR using the following command:
kubectl edit environment <environment-name> -n <namespace>
To resolve this issue, execute the following command in the primary server pod:
kubectl exec -it -n <namespace> <primary-server-pod-name> -- /bin/bash
Refer to the NetBackup Security and Encryption Guide to configure KMS manually.
For other troubleshooting issues related to KMS, refer to the NetBackup Troubleshooting Guide.
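Once inside the pod, a standard NetBackup utility such as nbkmsutil can help verify the KMS configuration, for example by listing the key groups (consult the guides above for the exact workflow):
/usr/openv/netbackup/bin/admincmd/nbkmsutil -listkgs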
pod/netbackup-operator-controller-manager-5df6f58b9b-6ftt9   1/2   ImagePullBackOff   0   13s
4 Run the kubectl get pv command and verify that the status of the PVs is Available.
5 For the PV to be claimed by specific PVC, add the claimref spec field with
PVC name and namespace using the kubectl patch pv <pv-name> -p
'{"spec":{"claimRef": {"apiVersion": "v1", "kind":
"PersistentVolumeClaim", "name": "<Name of claim i.e. PVC name>",
"namespace": "<namespace of pvc>"}}}' command.
For example,
kubectl patch pv <pv-name> -p '{"spec":{"claimRef": {"apiVersion":
"v1", "kind": "PersistentVolumeClaim", "name":
"data-testmedia-media-0", "namespace": "test"}}}'
While adding claimRef, specify the correct PVC name and namespace for the respective PV. The mapping should be the same as it was before the deletion of the namespace or the PVC.
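After the patch, you can confirm that each PVC binds to the intended PV; the STATUS column should show Bound:
$ kubectl get pvc -n <namespace>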
6 Deploy environment CR that deploys the primary server and media server CR
internally.
If the output shows STATUS as Failed as in the example above, check the primary
pod log for errors with the command:
$ kubectl logs pod/environment-sample-primary-0 -n <namespace>
pod/netbackup-operator-controller-manager-6c9dc8d87f-pq8mr   0/2   Pending   0   15s
1 Run:
$ docker load -i images/pdk8soptr-20.4.tar.gz
Sample output:
"sha256:353d2bd50105cbc3c61540e10cf32a152432d5173bb6318b8e"
2 Run:
$ docker image ls | grep msdp-operator
Sample output:
(AKS-specific):
(EKS-specific):
20.4: digest: sha256:d294f260813599562eb5ace9e0acd91d61b7dbc53c3 size: 2622
Sample output:
(AKS-specific):
[
    "testregistry.azurecr.io/msdp-operator@sha256:d294f260813599562eb5ace9e0acd91d61b7dbc53c3"
]
(EKS-specific):
[
    "testregistry.<account id>.dkr.ecr.<region>.amazonaws.com/<registry>:<tag>.io/msdp-operator@sha256:d294f260813599562eb5ace9e0acd91d61b7dbc53c3"
]
Sample output:
(AKS-specific):
[
"msdp-operator",
]
(EKS-specific):
"repositories": [
{
"repositoryArn": "arn:aws:ecr:us-east-2:046777922665:
repository/veritas/main_test1",
"registryId": "046777922665",
"repositoryName": "veritas/main_test1",
"repositoryUri": "046777922665.dkr.ecr.us-east-2.
amazonaws.com/veritas/main_test1",
"createdAt": "2022-04-13T07:27:52+00:00",
"imageTagMutability": "MUTABLE",
"imageScanningConfiguration": {
"scanOnPush": false
},
"encryptionConfiguration": {
"encryptionType": "AES256"
}
}
]
Sample output:
(AKS-specific):
{
"changeableAttributes": {
"deleteEnabled": true,
"listEnabled": true,
"readEnabled": true,
"writeEnabled": true
},
"createdTime": "2022-02-01T13:43:26.6809388Z",
"digest": "sha256:d294f260813599562eb5ace9e0acd91d61b7dbc53c3",
"lastUpdateTime": "2022-02-01T13:43:26.6809388Z",
"name": "20.4",
"signed": false
}
(EKS-specific):
"imageDetails": [
{
"registryId": "046777922665",
"repositoryName": "veritas/main_test1",
"imageDigest":
"sha256:d0095074286a50c6bca3daeddbaf264cf4006a92fa3a074daa4739cc995b36f8",
"imageTags": [
"latestTest5"
],
"imageSizeInBytes": 38995046,
"imagePushedAt": "2022-04-13T15:56:07+00:00",
"imageManifestMediaType": "application/vnd.docker.
distribution.manifest.v2+json",
"artifactMediaType": "application/vnd.docker.container.image.v1+json"
}
]
The third copy is located on a Kubernetes node running the container after it is
pulled from the registry. To check this copy, perform the following:
1 Run:
$ kubectl get nodes -o wide
(AKS-specific):
(EKS-specific):
3 You can interact with the node session from the privileged container:
chroot /host
Sample output:
(AKS-specific):
(EKS-specific):
Sample output
"sha256:353d2bd50105cbc3c61540e10cf32a152432d5173bb6318b8e"
null
Sample output
(AKS-specific):
[
    "testregistry.azurecr.io/msdp-operator@sha256:d294f260813599562eb5ace9e0acd91d61b7dbc53c3"
]
null
(EKS-specific):
[
    "<account id>.dkr.ecr.<region>.amazonaws.com/msdp-operator@sha256:d294f260813599562eb5ace9e0acd91d61b7dbc53c3"
]
null
How to make sure that you are running the correct image
Use the steps given above to identify image ID and Digest and compare with values
obtained from the registry and the Kubernetes node running the container.
Sample output:
Alternatively, if the nbbuilder script is not available, you can view the installed
EEBs by executing the following command:
$ docker run --rm <image_name>:<image_tag> cat
/usr/openv/pack/pack.summary
Sample output:
EEB_NetBackup_10.1Beta6_PET3980928_SET3992004_EEB1
EEB_NetBackup_10.3Beta6_PET3980928_SET3992021_EEB1
EEB_NetBackup_10.3Beta6_PET3980928_SET3992022_EEB1
EEB_NetBackup_10.3Beta6_PET3980928_SET3992023_EEB1
EEB_NetBackup_10.3Beta6_PET3992020_SET3992019_EEB2
EEB_NetBackup_10.3Beta6_PET3980928_SET3992009_EEB2
EEB_NetBackup_10.3Beta6_PET3980928_SET3992016_EEB1
EEB_NetBackup_10.3Beta6_PET3980928_SET3992017_EEB1
Note: The pack directory may be located in different locations in the uss-* containers. For example: /uss-controller/pack, /uss-mds/pack, /uss-proxy/pack.
ERROR controller-runtime.manager.controller.environment
Error defining desired resource {"reconciler group": "netbackup.veritas.com",
"reconciler kind": "Environment", "name": "test-delete", "namespace":
"netbackup-environment",
"Type": "MSDPScaleout", "Resource": "dedupe1", "error": "Unable to get primary host
UUID:
Get \"https://nbux-10-244-33-24.vxindia.veritas.com:1556/netbackup/config/hosts\":
x509: certificate signed by unknown authority (possibly because of \"crypto/rsa:
verification error\" while trying to verify candidate authority certificate \"nbatd\")"}
To resolve this issue, restart the NetBackup operator by deleting the NetBackup operator pod using the following command:
kubectl delete pod <netbackup-operator-pod-name> -n <namespace>
If the primary server pod gets restarted, you must perform the same steps above to increase the values of total_time and sleep_duration, as these values do not persist after a pod restart.
[NBDEPLOYUTIL_INCREMENTAL]
PARENTDIR=/mnt/nbdb/<FOLDER_NAME>
Primary and media servers are referred to by multiple IPs inside the pod (pod IP/load balancer IP). With reverse name lookup of an IP enabled, NetBackup treats the local connection as a remote insecure connection.
To resolve the audit events issue, disable reverse name lookup of the primary and media load balancer IPs.
2 If it is allocated to the same node, create a new node with the same node selector given in the CR for the primary server.
3 Delete the primary pod that is in the Pending state.
The newly created primary pod should no longer be in the Pending state.
nbdevconfig -setconfig
For example,
/usr/openv/netbackup/bin/admincmd/nbdevconfig -getconfig -stype
PureDisk -storage_server [storage server] >
/tmp/tmp_pd_config_file
/usr/openv/netbackup/bin/admincmd/nbdevconfig -setconfig
-storage_server [storage server] -stype PureDisk -configlist
/tmp/tmp_pd_config_file
/usr/openv/netbackup/bin/nbwmc start
1. Obtain the pending pod's toleration and affinity status using the following command:
kubectl get pods <pod name>
If all the above fields are correct and matching and the control pool pod is still in the Pending state, the issue may be that all the nodes in the node pool are running at maximum capacity and cannot accommodate new pods. In that case, the node pool must be scaled properly.
■ The flexsnap operator is running and is already processing the event (Update,
Upgrade, Create, Delete).
■ To check logs of running operator, use the following command:
kubectl logs -f $(kubectl get pods -n $OPERATOR_NAMESPACE |
grep flexsnap-operator | awk '{printf $1" " }')
■ If you still want to go ahead with the new action, you can stop the processing of the current event so that the new events are processed. To do so, delete the flexsnap operator pod using the following command:
kubectl delete pod $(kubectl get pods -n $OPERATOR_NAMESPACE | grep flexsnap-operator | awk '{printf $1" " }')
This re-creates the flexsnap-operator pod, which is then ready to serve new events.
Note: The newly created pod might have missed the event that was performed before the pod was re-created. In this case, you may have to reapply environment.yaml.
2023-03-01T08:14:56.470Z INFO
controller-runtime.manager.controller.mediaserver Running
jobs 0: on Media Server nbux-10-244-33-77.vxindia.veritas.com.
{"reconciler group": "netbackup.veritas.com",
"reconciler kind": "MediaServer", "name": "media1", "namespace":
"netbackup-environment", "Media Server":
"nbux-10-244-33-77.vxindia.veritas.com"}
2023-03-01T08:14:56.646Z INFO
controller-runtime.manager.controller.mediaserver bpps
processes running status. false: on Media Server
nbux-10-244-33-77.vxindia.veritas.com. {"reconciler group":
"netbackup.veritas.com", "reconciler kind": "MediaServer",
"name": "media1", "namespace": "netbackup-environment", "Media
Server": "nbux-10-244-33-77.vxindia.veritas.com"}
Perform the following to identify which bpps processes are running and preventing the media server pod from scaling in:
■ Log in to the NetBackup Web UI portal.
■ Check the notifications tab for any notifications of the Media server elasticity event category. The notification has the list of additional processes running on the specific media server. Wait until the processes listed under additional processes running exit.
Alternatively, you can also see the list of processes in the NetBackup operator logs as follows:
2023-07-11T13:33:44.142Z INFO
controller-runtime.manager.controller.mediaserver
Following processes are still running : bpbkar test1, bpbkar
test2 {"reconciler group": "netbackup.veritas.com",
"reconciler kind": "MediaServer", "name": "test-media-server",
"namespace": "netbackup-environment"}
■ Identify and delete any outstanding persistent volume claims for the media
server by running the following commands:
$ kubectl get pvc --namespace <namespace_name>
$ kubectl delete pvc <pvc-name>
■ Locate and delete any persistent volumes created for the media server by
running the following commands:
$ kubectl get pv
$ kubectl delete pv <pv-name> --grace-period=0 --force
Workaround:
Manually register Snapshot Manager with NetBackup by performing the following
steps:
■ Navigate to NetBackup UI > Workload > Cloud > Snapshot Manager and click Add.
■ Enter the values for FQDN of Snapshot Manager and the port (Default: 443).
■ Click Save.
Note: Even after Snapshot Manager is registered with NetBackup manually, the status of the cpServer CRD is displayed as failed. This status does not affect the working of Snapshot Manager.
flexsnap-rabbitmq
flexsnap-postgres
2. Execute the following commands to ensure that the correct certificate is referred:
■ Get the ca.crt of postgresql-server-crt:
kubectl -n <namespace> get secret postgresql-server-crt -o
"jsonpath={.data['ca\.crt']}" | base64 -d > server_ca.crt
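By analogy with the command above, you can extract the client-side CA and compare the two files; the diff step is a suggested check, assuming both secrets are expected to carry the same CA:
kubectl -n <namespace> get secret postgresql-client-crt -o "jsonpath={.data['ca\.crt']}" | base64 -d > client_ca.crt
diff server_ca.crt client_ca.crt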
Note: If the reconciler is invoked while the migration PVC exists, the invocation fails. Wait for the completion of the migration job if one is running; you can also monitor the migration job pods to check for any issues with the migration job. To resolve problems encountered with an existing migration job pod, you may delete the migration job pod manually. If the migration job pod does not exist, you may delete the migration PVC.
To resolve this issue, delete the corrupted database and correct symlink as follows:
1. Exec into the primary pod by running the following command:
kubectl exec -it <primary_pod_name> -n <namespace> -- bash
# /opt/veritas/vxapp-manage/nb-health disable
# bp.kill_all
# mv -f /mnt/nbdata/usr/openv/netbackup/db/rb.db /mnt/nbdb/usr/openv/netbackup/db/rb.db
# ln -sf /mnt/nbdb/usr/openv/netbackup/db/rb.db /mnt/nbdata/usr/openv/netbackup/db/rb.db
# chown -h nbsvcusr:nbsvcusr /mnt/nbdata/usr/openv/netbackup/db/rb.db
# bp.start_all
# /opt/veritas/vxapp-manage/nb-health enable
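To confirm the repair, you can verify that the path is now a symlink owned by nbsvcusr (a simple check, not part of the original procedure):
# ls -l /mnt/nbdata/usr/openv/netbackup/db/rb.db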
"Get \"https://abc.xyz.com:*/netbackup/security/cacert\":
■ From the output, copy the name of catalog PVC which is of the following
format:
catalog-<resource name prefix>-primary-0
Note down the value of the VolumeHandle field from the output. This is the EFS ID that was used previously.
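One way to read the EFS ID directly is a generic describe-and-filter invocation (the PV name comes from the PVC described above):
kubectl describe pv <pv-name> | grep VolumeHandle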
2 Depending on the following appropriate scenario, fix the error from the output under the Events section:
■ If the event log has an error related to an incorrect EFS ID or an incorrect format, update the environment.yaml file with the correct EFS ID and perform the steps below.
Or
■ If the event log has an error other than one related to an incorrect EFS ID, analyze and fix the error and perform the steps below.
3 After fixing the error, clean the environment using the following command:
kubectl delete -k operator/
4 Delete the PV and PVC created for the primary server only by using the following command:
kubectl delete environment <environmentCR-name> -n <namespace>
Describe the PVC for the primary server, which has the following format, and obtain the corresponding PV name:
catalog-<resource name prefix>-primary-0
Delete the PVC and PV by using the following commands:
For PVC: kubectl delete pvc <pvc name> -n <namespace>
For PV: kubectl delete pv <pv name>
5 Deploy NetBackup operator again and then apply the environment.yaml file.
This issue can be resolved by creating the PV and applying the environment.yaml file again.
Appendix A
CR template
This appendix includes the following topics:
■ Secret
■ MSDP Scaleout CR
Secret
The Secret is the Kubernetes security component that stores the MSDP credentials
that are required by the CR YAML.
stringData:
# Please follow MSDP guide for the credential characters and length.
# https://www.veritas.com/content/support/en_US/article.100048511
# The pattern is "^[\\w!$+\\-,.:;=?@[\\]`{}\\|~]{1,62}$"
username: xxxx
password: xxxxxx
MSDP Scaleout CR
■ The CR name must be fewer than 40 characters.
■ The MSDP credentials stored in the Secret must match MSDP credential rules.
See Deduplication Engine credentials for NetBackup
■ MSDP CR cannot be deployed in the namespace of MSDP operator. It must be
in a separate namespace.
■ You cannot reorder the IP/FQDN list. You can update the list by appending the
information.
■ You cannot change the storage class name.
The storage class must be backed with:
■ AKS: Azure disk CSI storage driver "disk.csi.azure.com"
■ EKS: Amazon EBS CSI driver "ebs.csi.aws.com"
■ You cannot change the data volume list other than for storage expansion. It is
append-only and storage expansion only. Up to 16 data volumes are supported.
■ Like the data volumes, the catalog volume can be changed for storage expansion
only.
■ You cannot change or expand the size of the log volume by changing the MSDP
CR.
■ You cannot enable NBCA after the configuration.
■ Once the KMS and OST registration parameters are set, you cannot change them.
■ You cannot change the core pattern.
fqdn: "sample-fqdn1"
- ipAddr: "sample-ip2"
fqdn: "sample-fqdn2"
- ipAddr: "sample-ip3"
fqdn: "sample-fqdn3"
- ipAddr: "sample-ip4"
fqdn: "sample-fqdn4"
#
# # s3ServiceIPFQDN is the IP and FQDN pair to expose the S3 service from the MSDP instance.
# # The IP and FQDN in one pair should match each other correctly.
# # It must be pre-allocated.
# # It is not allowed to be changed after deployment.
# s3ServiceIPFQDN:
#   # The pattern is IPv4 or IPv6 format
#   ipAddr: "sample-s3-ip"
#   # The pattern is FQDN format.
#   fqdn: "sample-s3-fqdn"
#
# Optional annotations to be added in the LoadBalancer services for the Engine IPs.
# In case we run the Engines on private IPs, we need to add some customized annotations
# to the LoadBalancer services.
# See https://docs.microsoft.com/en-us/azure/aks/internal-lb
# It's optional. It's not needed in most cases if we're with public IPs.
# loadBalancerAnnotations:
#   service.beta.kubernetes.io/azure-load-balancer-internal: "true"
#
# SecretName is the name of the secret which stores the MSDP credential.
# AutoDelete, when true, will automatically delete the secret specified by SecretName
# after the initial configuration. If unspecified, AutoDelete defaults to true.
# When true, SkipPrecheck will skip webhook validation of the MSDP credential. It is only
# used in data re-use scenario (delete CR and re-apply with pre-existing data) as the
# secret will not take effect in this scenario. It can't be used in other scenarios.
# If unspecified, SkipPrecheck defaults to false.
credential:
  # The secret should be pre-created in the same namespace which has the MSDP credential stored.
  # The secret should have a "username" and a "password" key-pairs
# s3Credential:
#   secretName: s3-secret
#   # Optional
#   # Default is true
#   autoDelete: true
#   # Optional
#   # Default is false.
#   skipPrecheck: false
# Paused is used for maintenance only. In most cases you don't need to specify it.
# When it's specified, MSDP operator stops reconciling the corresponding MSDP-X (aka the CR).
# Optional.
# Default is false
# paused: false
#
# The storage classes for logVolume, catalogVolume and dataVolumes should be:
# - Backed with Azure disk CSI driver "disk.csi.azure.com" with the managed disks, and
#   allow volume expansion.
# - The Azure in-tree storage driver "kubernetes.io/azure-disk" is not supported. You need
#   to explicitly enable the Azure disk CSI driver when configuring your AKS cluster, or
#   use k8s version v1.21.x which has the Azure disk CSI driver built-in.
# - In LRS category.
# - At least Standard SSD for dev/test, and Premium SSD or Ultra Disk for production.
# - The same storage class can be used for all the volumes.
#
# LogVolume is the volume specification which is used to provision a volume of an MDS
# or Controller Pod to store the log files and core dump files.
# It's not allowed to be changed.
# In most cases, 5-10 GiB capacity should be big enough for one MDS or Controller Pod to use.
logVolume:
  storageClassName: sample-azure-disk-sc1
  resources:
    requests:
      storage: 5Gi
#
# CatalogVolume is the volume specification which is used to provision a volume of an MDS
# or Engine Pod to store the catalog and metadata. It's not allowed to be changed unless
# for capacity expansion.
# Expanding the existing catalog volumes expects short downtime of the Engines.
# Please note the MDS Pods don't respect the storage request in CatalogVolume, instead
# they provision the volumes with the minimal capacity request of 500MiB.
catalogVolume:
  storageClassName: sample-azure-disk-sc2
  resources:
    requests:
      storage: 600Gi
#
# DataVolumes is a list of volume specifications which are used to provision the volumes
# of an Engine Pod to store the MSDP data.
# The items are not allowed to be changed or re-ordered unless for capacity expansion.
# New items can be appended for adding more data volumes to each Engine Pod.
# Appending new data volumes or expanding the existing data volumes expects short
# downtime of the Engines.
# The allowed item number is in range 1-16. To allow the other MSDP-X Pods (e.g. Controller,
# MDS) running on the same node, the item number should be no more than "<the maximum
# allowed volumes on the node> - 5". The additional 5 data disks are for the potential one
# MDS Pod, one Controller Pod or one MSDP operator Pod to run on the same node with one
# MSDP Engine.
dataVolumes:
- storageClassName: sample-azure-disk-sc3
  resources:
    requests:
      storage: 8Ti
- storageClassName: sample-azure-disk-sc3
  resources:
    requests:
      storage: 8Ti
#
# NodeSelector is used to schedule the MSDPScaleout Pods on the specified nodes.
# Optional.
# Default is empty (aka all available nodes)
nodeSelector:
  # e.g.
  # agentpool: nodepool2
  sample-node-label1: sample-label-value1
  sample-node-label2: sample-label-value2
#
# NBCA is the specification for MSDP-X to enable NBCA SecComm for the Engines.
# Optional.
nbca:
  # The master server name
  # The allowed length is in range 1-255
  masterServer: sample-master-server-name
  # The CA SHA256 fingerprint
  # The allowed length is 95
  cafp: sample-ca-fp
  # The NBCA authentication/reissue token
  # The allowed length is 16
  # For security consideration, a token with maximum 1 user allowed and valid for 1 day
  # should be sufficient.
  token: sample-auth-token
# tcpKeepAliveTime: 120
#
# TCPIdleTimeout is used to change the default value for Azure Load Balancer rules
# and Inbound NAT rules.
# It's in minutes.
# The minimal allowed value is 4 and the maximum allowed value is 30.
# A default value 30 minutes is used if not specified. Set it to 0 to disable the option.
# It's not allowed to change unless in maintenance mode (Paused=true), and the change
# will not apply until the Engine Pods and the LoadBalancer services get recreated.
# For AKS deployment in P release, please leave it unspecified or specify it with a
# value larger than 4.
# tcpIdleTimeout: 30
version: "sample-version-string"
#
# Size defines the number of Engine instances in the MSDP-X cluster.
# The allowed size is between 1-16
size: 4
#
# The IP and FQDN pairs are used by the Engine Pods to expose the MSDP services.
# The IP and FQDN in one pair should match each other correctly.
# They must be pre-allocated.
# The item number should match the number of Engine instances.
# They are not allowed to be changed or re-ordered. New items can be appended for
# scaling out.
# The first FQDN is used to configure the storage server in NetBackup, automatically
# if autoRegisterOST is enabled, or manually by the user if not.
serviceIPFQDNs:
# The pattern is IPv4 or IPv6 format
- ipAddr: "sample-ip1"
  # The pattern is FQDN format.
  fqdn: "sample-fqdn1"
- ipAddr: "sample-ip2"
  fqdn: "sample-fqdn2"
- ipAddr: "sample-ip3"
  fqdn: "sample-fqdn3"
- ipAddr: "sample-ip4"
  fqdn: "sample-fqdn4"
#
# # s3ServiceIPFQDN is the IP and FQDN pair to expose the S3 service from the MSDP instance.
# # The IP and FQDN in one pair should match each other correctly.
# # It must be pre-allocated.
# # It is not allowed to be changed after deployment.
# s3ServiceIPFQDN:
# # The pattern is IPv4 or IPv6 format
# ipAddr: "sample-s3-ip"
# # The pattern is FQDN format.
# fqdn: "sample-s3-fqdn"
# Optional annotations to be added in the LoadBalancer services for
# the Engine IPs.
# If the Engines run on private IPs, customized annotations must be
# added to the LoadBalancer services.
# loadBalancerAnnotations:
# # If it's an EKS environment, specify the following annotation
# Default is false.
# Should be specified only in data re-use scenario (aka delete and
# re-apply CR with pre-existing data)
skipPrecheck: false
#
# s3Credential:
# # The secret should be pre-created in the same namespace that the
# # MSDP cluster is deployed in.
# # The secret should have "accessKey" and "secretKey" key-value pairs
# # with the corresponding values.
# secretName: s3-secret
# # Optional
# # Default is true
# autoDelete: true
# # Optional
# # Default is false.
# # Should be specified only in data re-use scenario (aka delete and
# # re-apply CR with pre-existing data)
# skipPrecheck: false
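#
# e.g. A sketch of pre-creating the referenced secret (names are
# samples):
# kubectl create secret generic s3-secret -n <msdp-namespace> \
#   --from-literal=accessKey=<access-key> --from-literal=secretKey=<secret-key>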
# Paused is used for maintenance only. In most cases you do not need
# to specify it.
#
# When it is specified, the MSDP operator stops reconciling the
# corresponding MSDP-X cluster (aka the CR).
# Optional.
# Default is false
# paused: false
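#
# e.g. Reconciliation could later be resumed with (the resource name
# "msdpscaleout" is an assumption):
# kubectl patch msdpscaleout <cr-name> -n <namespace> --type merge \
#   -p '{"spec":{"paused":false}}'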
#
# The storage classes for logVolume, catalogVolume and dataVolumes
# should be:
# # S3TokenSecret is the secret name that holds the NBCA
# # authentication/reissue token for MSDP S3.
# # It is used to request the NBCA certificate for the S3 service.
# # It must be set if the MSDP S3 service is enabled.
# # The allowed length is in range 1-255
# # For security, a token that allows a maximum of 1 user and is valid
# # for 1 day should be sufficient.
# s3TokenSecret: sample-auth-token-secret-for-s3
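#
# e.g. A sketch of pre-creating the token secret (the data key "token"
# is an assumption; confirm against your deployment):
# kubectl create secret generic sample-auth-token-secret-for-s3 \
#   -n <msdp-namespace> --from-literal=token=<nbca-reissue-token>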
#
# KMS includes the parameters to enable KMS for the Engines.
# Enabling KMS is supported in init or post configuration.
# Changing the parameters once they have been set is not supported.
# Optional.
kms:
# Either the NetBackup KMS or an external KMS (EKMS) is configured or
# registered on the NetBackup master server and then used by MSDP
# through the NetBackup API, so kmsServer is the NetBackup master
# server name.
kmsServer: sample-master-server-name
keyGroup: sample-key-group-name
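#
# e.g. If NetBackup KMS is used, the key group could be pre-created on
# the master server (a sketch; verify against your NetBackup version):
# /usr/openv/netbackup/bin/admincmd/nbkmsutil -createkg -kgname sample-key-group-name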
#
# autoRegisterOST includes the parameter to enable or disable the
# automatic registration of the storage server, the default disk pool
# and storage unit when MSDP-X configuration finishes.
# Changing autoRegisterOST is not supported.
autoRegisterOST:
# If it is true, and NBCA is enabled, the operator registers the
# storage server, disk pool and storage unit on the NetBackup primary
# server when the MSDP CR is deployed.
# The first Engine FQDN is the storage server name.
# The default disk pool is in format "default_dp_<firstEngineFQDN>".
# The default storage unit is in format "default_stu_<firstEngineFQDN>".
# The default maximum number of concurrent jobs for the STU is 240.
# In the CR status, field "ostAutoRegisterStatus.registered" with
# value True, False or Unknown indicates the registration state.
# It is false by default.
enabled: true
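#
# e.g. The registration state could be read from the CR status (the
# resource name "msdpscaleout" is an assumption):
# kubectl get msdpscaleout <cr-name> -n <namespace> \
#   -o jsonpath='{.status.ostAutoRegisterStatus.registered}'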
#
# CorePattern is the core pattern of the nodes where the MSDPScaleout
# Pods are running.
# It is path-based. A default core path "/core/core.%e.%p.%t" will be
# used if not specified.
# In most cases, you do not need to specify it.
# It is not allowed to be changed.
# Optional.
# corePattern: /sample/core/pattern/path
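#
# e.g. The effective pattern on a node can be verified with:
# cat /proc/sys/kernel/core_pattern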
#
# tcpKeepAliveTime sets the namespaced sysctl parameter
# net.ipv4.tcp_keepalive_time in Engine Pods.
# It is in seconds.
# The minimal allowed value is 60 and the maximum allowed value is 1800.
# A default value of 120 is used if not specified. Set it to 0 to
# disable the option.
# It is not allowed to change unless in maintenance mode (paused=true),
# and the change will not apply until the Engine Pods get restarted.
# For EKS deployment in the 10.1 release, please leave it unspecified
# or specify it with a value smaller than 240.
# tcpKeepAliveTime: 120
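#
# e.g. A sketch of verifying the value inside an Engine Pod (assuming
# sysctl is available in the container):
# kubectl exec <engine-pod> -n <namespace> -- sysctl net.ipv4.tcp_keepalive_time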
#
# TCPIdleTimeout is used to change the default value for AWS Load
# Balancer rules and Inbound NAT rules.
# It is in minutes.
# The minimal allowed value is 4 and the maximum allowed value is 30.
# A default value of 30 minutes is used if not specified. Set it to 0
# to disable the option.
# It is not allowed to change unless in maintenance mode (paused=true),
# and the change will not apply until the Engine Pods and the
# LoadBalancer services get recreated.
# For EKS deployment in the 10.1 release, please leave it unspecified
# or specify it with a value larger than 4.
# tcpIdleTimeout: 30