In this mode, deployment's scaling is divided into stages depending on number of replicas. A user can provide an array of intervals x1 to x2 and corresponding vpaWeights between and including 0 and 1. x1 and x2 are the number of replicas where that stage begins and ends respectively. In a particular stage, HVPA controller will consider VPA's recommendations according to the weights provided, and scale the deployment accordingly. The weights given to the HPA's recommendation will be 1 - vpaWeight.
According to VPA, new requests will be currentRequest + (targetRequest - currentRequest) * vpaWeight
According to HPA, new number of replicas will be:
- ceil of
currentReplicas + (desiredReplicas - currentReplicas) * (1 - vpaWeight)if desired replicas is more than current replicas - floor of
currentReplicas + (desiredReplicas - currentReplicas) * (1 - vpaWeight)if desired replicas is less than current replicas
vpaWeight = 1 will result in pure vertical scaling
vpaWeight = 0 will result in pure horizontal scaling
Resource request vs number of replicas curve could typically look like:
Here First stage Vpa weight = 0, second stage Vpa weight is a fraction, third stage Vpa weight = 0
resource
request
^
|
| ____________
| /|
| /
| / |
| /
| ___________/ |
|
|-------------|----|-----------|----->
min x1 x2 max #Replicas
Example of vpaWeight vs number of replicas.
Here first stage Vpa weight = 0, second stage Vpa weight = 0.4, third stage Vpa weight = 0
vpaWeight
^
|
0.4| |-------|
| | |
0| ________| |____________
|------------------------------------->
x1 x2 #Replicas
- Works even if HPA and VPA act on different metrices
- Gives better control to user on scaling of apps
- Need to define behavior when:
- current number of replicas is
x2, the deployment has not yet scaled up tomaxAllowed, and user has chosen to scale only horizontally afterx2. In this case, the deployment cannot scale up vertically anymore if the load increases. - current number of replicas is
x1, the deployment has not yet scaled down tominAllowed, and user has chosen to scale only horizontally for lower values thanx1. In this case, the deployment cannot scale down vertically anymore if the load decreases.
- current number of replicas is
- Will need both VPA and HPA to be deployed in a recommendation-only mode.
x1andx2will be optional, If either or both of them are not provided, vertical scaling will be done untilminAllowedand/ormaxAllowedcorrespondingly.- Once replica count reaches
maxReplicas, and VPA still recommends higher resource requirements, then vertical scaling will be done unconditionally
Here is the spec for HVPA controller
Example 1:
apiVersion: autoscaling.k8s.io/v1alpha1
kind: Hvpa
metadata:
name: hvpa-sample
spec:
scaleUpStabilizationWindow: "2m"
scaleDownStabilizationWindow : "3m"
minCpuChange:
value: "200m"
percentage: 70
minMemChange:
value: "200M"
percentage: 80
hpaTemplate:
maxReplicas: 10
minReplicas: 1
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 50
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 50
vpaTemplate:
resourcePolicy:
containerPolicies:
- MinAllowed:
memory: 400Mi
containerName: resource-consumer
maxAllowed:
memory: 3000Mi
weightBasedScalingIntervals:
- vpaWeight: 0
startReplicaCount: 1
lastReplicaCount: 2
- vpaWeight: 0.6
startReplicaCount: 3
lastReplicaCount: 7
- vpaWeight: 0
startReplicaCount: 7
lastReplicaCount: 10
targetRef:
apiVersion: apps/v1
kind: Deployment
name: resource-consumerresource
request
^
|
| _________
| /|
| /
| / |
| /
| ___________/ |
|
|-------------|----|--------|----->
1 3 7 10 #Replicas
Example 2:
apiVersion: autoscaling.k8s.io/v1alpha1
kind: Hvpa
metadata:
name: hvpa-sample
spec:
weightBasedScalingIntervals:
- vpaWeight: 1
startReplicaCount: 1
lastReplicaCount: 1
- vpaWeight: 0.6
startReplicaCount: 2
lastReplicaCount: 4
- vpaWeight: 0
startReplicaCount: 5
lastReplicaCount: 10
hpaTemplate:
.
.
.resource
request
^
| ______
| /
| /
| /
| /
| |
| |
| |
|
|--|----|-----|------->
1 4 10 #Replicas
Example 3: After maxReplicas, even if weight is not mentioned, there is only vertical scaling because HPA is limited by maxReplicas
apiVersion: autoscaling.k8s.io/v1alpha1
kind: Hvpa
metadata:
name: hvpa-sample
spec:
weightBasedScalingIntervals:
- vpaWeight: 0
startReplicaCount: 1
lastReplicaCount: 3
- vpaWeight: 0.6
startReplicaCount: 4
lastReplicaCount: 10
hpaTemplate:
minReplicas: 1
maxReplicas: 10
.
.
.resource
request
^
|
| |
| |
| .
| /
| /
| /
| /
| ____/
|
|--|---|----|------>
1 3 10 #Replicas