Tutorials
Basics
• Kubernetes Basics is an in-depth interactive tutorial that helps you understand the
Kubernetes system and try out some basic Kubernetes features.
• Introduction to Kubernetes (edX)
• Hello Minikube
Configuration
• Example: Configuring a Java Microservice
• Configuring Redis Using a ConfigMap
Stateless Applications
• Exposing an External IP Address to Access an Application in a Cluster
• Example: Deploying PHP Guestbook application with Redis
Stateful Applications
• StatefulSet Basics
• Example: WordPress and MySQL with Persistent Volumes
• Example: Deploying Cassandra with Stateful Sets
• Running ZooKeeper, A CP Distributed System
Services
• Connecting Applications with Services
• Using Source IP
Security
• Apply Pod Security Standards at Cluster level
• Apply Pod Security Standards at Namespace level
• AppArmor
• Seccomp
What's next
If you would like to write a tutorial, see Content Page Types for information about the tutorial
page type.
Hello Minikube
This tutorial shows you how to run a sample app on Kubernetes using minikube. The tutorial
provides a container image that uses NGINX to echo back all the requests.
Objectives
• Deploy a sample application to minikube.
• Run the app.
• View application logs.
Note:
Only execute the instructions in Step 1, Installation. The rest is covered on this page.
You also need to install kubectl. See Install tools for installation instructions.
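If you haven't already started a cluster, do so now; a sketch of the command:

minikube start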
• Launch a browser
• URL copy and paste
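To open the dashboard from the "Launch a browser" tab, a sketch:

minikube dashboard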
Now, switch back to the terminal where you ran minikube start.
Note:
The dashboard command enables the dashboard add-on and opens the proxy in the default web
browser. You can create Kubernetes resources on the dashboard such as Deployment and
Service.
To find out how to avoid directly invoking the browser from the terminal and get a URL for the
web dashboard, see the "URL copy and paste" tab.
By default, the dashboard is only accessible from within the internal Kubernetes virtual
network. The dashboard command creates a temporary proxy to make the dashboard accessible
from outside the Kubernetes virtual network.
To stop the proxy, run Ctrl+C to exit the process. After the command exits, the dashboard
remains running in the Kubernetes cluster. You can run the dashboard command again to create
another proxy to access the dashboard.
If you don't want minikube to open a web browser for you, run the dashboard subcommand
with the --url flag. minikube outputs a URL that you can open in the browser you prefer.
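For example:

minikube dashboard --url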
Now, you can use this URL and switch back to the terminal where you ran minikube start.
Create a Deployment
A Kubernetes Pod is a group of one or more Containers, tied together for the purposes of
administration and networking. The Pod in this tutorial has only one Container. A Kubernetes
Deployment checks on the health of your Pod and restarts the Pod's Container if it terminates.
Deployments are the recommended way to manage the creation and scaling of Pods.
1. Use the kubectl create command to create a Deployment that manages a Pod. The Pod
runs a Container based on the provided Docker image.
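The command might look like the following; the test image and tag shown here are assumptions, so check the current tutorial for the exact reference:

# Run a test container image that includes a webserver on port 8080
kubectl create deployment hello-node --image=registry.k8s.io/e2e-test-images/agnhost:2.39 -- /agnhost netexec --http-port=8080

# View the Deployment and the Pod it created
kubectl get deployments
kubectl get pods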
(It may take some time for the pod to become available. If you see "0/1", try again in a few
seconds.)
Note:
For more information about kubectl commands, see the kubectl overview.
Create a Service
By default, the Pod is only accessible by its internal IP address within the Kubernetes cluster. To
make the hello-node Container accessible from outside the Kubernetes virtual network, you
have to expose the Pod as a Kubernetes Service.
1. Expose the Pod to the public internet using the kubectl expose command:
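A sketch of the expose command, matching the port described below:

kubectl expose deployment hello-node --type=LoadBalancer --port=8080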
The --type=LoadBalancer flag indicates that you want to expose your Service outside of
the cluster.
The application code inside the test image only listens on TCP port 8080. If you used
kubectl expose to expose a different port, clients could not connect to that other port.
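To view the Service you created and open it, a sketch (on minikube, the LoadBalancer type is made reachable through the minikube service command):

kubectl get services
minikube service hello-node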
This opens up a browser window that serves your app and shows the app's response.
Enable addons
The minikube tool includes a set of built-in addons that can be enabled, disabled and opened in
the local Kubernetes environment.
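To list the currently supported addons (your output resembles the list below):

minikube addons list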
addon-manager: enabled
dashboard: enabled
default-storageclass: enabled
efk: disabled
freshpod: disabled
gvisor: disabled
helm-tiller: disabled
ingress: disabled
ingress-dns: disabled
logviewer: disabled
metrics-server: disabled
nvidia-driver-installer: disabled
nvidia-gpu-device-plugin: disabled
registry: disabled
registry-creds: disabled
storage-provisioner: enabled
storage-provisioner-gluster: disabled
3. View the Pod and Service you created by installing that addon:
5. Disable metrics-server:
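A sketch of the commands these steps refer to, assuming the metrics-server addon:

minikube addons enable metrics-server
kubectl get pod,svc -n kube-system
minikube addons disable metrics-server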
Clean up
Now you can clean up the resources you created in your cluster:
minikube stop
# Optional
minikube delete
If you want to use minikube again to learn more about Kubernetes, you don't need to delete it.
Conclusion
This page covered the basic aspects to get a minikube cluster up and running. You are now
ready to deploy applications.
What's next
• Tutorial to deploy your first app on Kubernetes with kubectl.
• Learn more about Deployment objects.
• Learn more about Deploying applications.
• Learn more about Service objects.
Kubernetes Basics
This tutorial provides a walkthrough of the basics of the Kubernetes cluster orchestration
system. Each module contains some background information on major Kubernetes features and
concepts, and a tutorial for you to follow along.
Create a Cluster
Learn about Kubernetes clusters and create a simple cluster using Minikube.
Learn what a Kubernetes cluster is. Learn what Minikube is. Start a Kubernetes cluster.
Objectives
Kubernetes Clusters
Summary:
• Kubernetes cluster
• Minikube
Cluster Diagram
The Control Plane is responsible for managing the cluster. The Control Plane coordinates
all activities in your cluster, such as scheduling applications, maintaining applications' desired
state, scaling applications, and rolling out new updates.
Control Planes manage the cluster and the nodes that are used to host the running applications.
When you deploy applications on Kubernetes, you tell the control plane to start the application
containers. The control plane schedules the containers to run on the cluster's nodes. Node-
level components, such as the kubelet, communicate with the control plane using the
Kubernetes API, which the control plane exposes. End users can also use the Kubernetes API
directly to interact with the cluster.
A Kubernetes cluster can be deployed on either physical or virtual machines. To get started
with Kubernetes development, you can use Minikube. Minikube is a lightweight Kubernetes
implementation that creates a VM on your local machine and deploys a simple cluster
containing only one node. Minikube is available for Linux, macOS, and Windows systems. The
Minikube CLI provides basic bootstrapping operations for working with your cluster, including
start, stop, status, and delete.
Now that you know more about what Kubernetes is, visit Hello Minikube to try this out on
your computer.
Deploy an App
Learn about application Deployments. Deploy your first app on Kubernetes with kubectl.
Using kubectl to Create a Deployment
Objectives
Kubernetes Deployments
Note:
This tutorial uses a container that requires the AMD64 architecture. If you are using minikube
on a computer with a different CPU architecture, you could try using minikube with a driver
that can emulate AMD64. For example, the Docker Desktop driver can do this.
Once you have a running Kubernetes cluster, you can deploy your containerized applications
on top of it. To do so, you create a Kubernetes Deployment. The Deployment instructs
Kubernetes how to create and update instances of your application. Once you've created a
Deployment, the Kubernetes control plane schedules the application instances included in that
Deployment to run on individual Nodes in the cluster.
Once the application instances are created, a Kubernetes Deployment controller continuously
monitors those instances. If the Node hosting an instance goes down or is deleted, the
Deployment controller replaces the instance with an instance on another Node in the cluster.
This provides a self-healing mechanism to address machine failure or maintenance.
In a pre-orchestration world, installation scripts would often be used to start applications, but
they did not allow recovery from machine failure. By both creating your application instances
and keeping them running across Nodes, Kubernetes Deployments provide a fundamentally
different approach to application management.
Summary:
• Deployments
• Kubectl
You can create and manage a Deployment by using the Kubernetes command line interface,
kubectl. Kubectl uses the Kubernetes API to interact with the cluster. In this module, you'll
learn the most common kubectl commands needed to create Deployments that run your
applications on a Kubernetes cluster.
When you create a Deployment, you'll need to specify the container image for your application
and the number of replicas that you want to run. You can change that information later by
updating your Deployment; Modules 5 and 6 of the bootcamp discuss how you can scale and
update your Deployments.
Applications need to be packaged into one of the supported container formats in order to be
deployed on Kubernetes.
For your first Deployment, you'll use a hello-node application packaged in a Docker container
that uses NGINX to echo back all the requests. (If you didn't already try creating a hello-node
application and deploying it using a container, you can do that first by following the
instructions from the Hello Minikube tutorial).
You will need to have installed kubectl as well. If you need to install it, visit install tools.
Now that you know what Deployments are, let's deploy our first app!
kubectl basics
This performs the specified action (like create, describe or delete) on the specified resource (like
node or deployment). You can use --help after the subcommand to get additional info about
possible parameters (for example: kubectl get nodes --help).
Check that kubectl is configured to talk to your cluster, by running the kubectl version
command.
Check that kubectl is installed and you can see both the client and the server versions.
To view the nodes in the cluster, run the kubectl get nodes command.
You see the available nodes. Later, Kubernetes will choose where to deploy our application
based on Node available resources.
Deploy an app
Let’s deploy our first app on Kubernetes with the kubectl create deployment command. We
need to provide the deployment name and app image location (include the full repository url
for images hosted outside Docker Hub).
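The command might look like this; the bootcamp image shown here is an assumption, so substitute your own image if needed:

kubectl create deployment kubernetes-bootcamp --image=gcr.io/google-samples/kubernetes-bootcamp:v1

Running this performed a few actions for you: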
• searched for a suitable node where an instance of the application could be run (we have
only 1 available node)
• scheduled the application to run on that Node
• configured the cluster to reschedule the instance on a new Node when needed
We see that there is 1 deployment running a single instance of your app. The instance is
running inside a container on your node.
Pods that are running inside Kubernetes are running on a private, isolated network. By default
they are visible from other pods and services within the same Kubernetes cluster, but not
outside that network. When we use kubectl, we're interacting through an API endpoint to
communicate with our application.
We will cover other options on how to expose your application outside the Kubernetes cluster
later, in Module 4. As this is a basic tutorial, we don't explain Pods in any detail here; they are
covered in later topics.
The kubectl proxy command can create a proxy that will forward communications into the
cluster-wide, private network. The proxy can be terminated by pressing control-C and won't
show any output while it's running.
kubectl proxy
We now have a connection between our host (the terminal) and the Kubernetes cluster. The
proxy enables direct access to the API from these terminals.
You can see all those APIs hosted through the proxy endpoint. For example, we can query the
version directly through the API using the curl command:
curl http://localhost:8001/version
Note: If port 8001 is not accessible, ensure that the kubectl proxy that you started above is
running in the second terminal.
The API server will automatically create an endpoint for each pod, based on the pod name, that
is also accessible through the proxy.
First we need to get the Pod name, and we'll store it in the environment variable POD_NAME:
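A sketch of how to capture the name of the single Pod:

export POD_NAME=$(kubectl get pods -o go-template --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}')
echo Name of the Pod: $POD_NAME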
You can access the Pod through the proxied API, by running:
curl http://localhost:8001/api/v1/namespaces/default/pods/$POD_NAME:8080/proxy/
In order for the new Deployment to be accessible without using the proxy, a Service is required
which will be explained in Module 4.
Learn how to troubleshoot Kubernetes applications using kubectl get, kubectl describe, kubectl
logs and kubectl exec.
Objectives
Kubernetes Pods
When you created a Deployment in Module 2, Kubernetes created a Pod to host your
application instance. A Pod is a Kubernetes abstraction that represents a group of one or more
application containers (such as Docker), and some shared resources for those containers. Those
resources include:
A Pod models an application-specific "logical host" and can contain different application
containers which are relatively tightly coupled. For example, a Pod might include both the
container with your Node.js app as well as a different container that feeds the data to be
published by the Node.js webserver. The containers in a Pod share an IP Address and port
space, are always co-located and co-scheduled, and run in a shared context on the same Node.
Pods are the atomic unit on the Kubernetes platform. When we create a Deployment on
Kubernetes, that Deployment creates Pods with containers inside them (as opposed to creating
containers directly). Each Pod is tied to the Node where it is scheduled, and remains there until
termination (according to restart policy) or deletion. In case of a Node failure, identical Pods are
scheduled on other available Nodes in the cluster.
Summary:
• Pods
• Nodes
• Kubectl main commands
A Pod is a group of one or more application containers (such as Docker) and includes shared
storage (volumes), IP address and information about how to run them.
Pods overview
Nodes
A Pod always runs on a Node. A Node is a worker machine in Kubernetes and may be either a
virtual or a physical machine, depending on the cluster. Each Node is managed by the control
plane. A Node can have multiple pods, and the Kubernetes control plane automatically handles
scheduling the pods across the Nodes in the cluster. The control plane's automatic scheduling
takes into account the available resources on each Node.
• Kubelet, a process responsible for communication between the Kubernetes control plane
and the Node; it manages the Pods and the containers running on a machine.
• A container runtime (like Docker) responsible for pulling the container image from a
registry, unpacking the container, and running the application.
Containers should only be scheduled together in a single Pod if they are tightly coupled and need to
share resources such as disk.
Node overview
You can use these commands to see when applications were deployed, what their current
statuses are, where they are running and what their configurations are.
Now that we know more about our cluster components and the command line, let's explore our
application.
Let's verify that the application we deployed in the previous scenario is running. We'll use the
kubectl get command and look for existing Pods:
If no pods are running, please wait a couple of seconds and list the Pods again. You can
continue once you see one Pod running.
Next, to view what containers are inside that Pod and what images are used to build those
containers we run the kubectl describe pods command:
We see here details about the Pod’s container: IP address, the ports used and a list of events
related to the lifecycle of the Pod.
The output of the describe subcommand is extensive and covers some concepts that we didn’t
explain yet, but don’t worry, they will become familiar by the end of this bootcamp.
Note: the describe subcommand can be used to get detailed information about most of the
Kubernetes primitives, including Nodes, Pods, and Deployments. The describe output is designed to
be human readable, not to be scripted against.
Recall that Pods are running in an isolated, private network - so we need to proxy access to
them so we can debug and interact with them. To do this, we'll use the kubectl proxy command
to run a proxy in a second terminal. Open a new terminal window, and in that new terminal,
run:
kubectl proxy
Now again, we'll get the Pod name and query that pod directly through the proxy. To get the
Pod name and store it in the POD_NAME environment variable:
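For example:

export POD_NAME=$(kubectl get pods -o go-template --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}')
echo Name of the Pod: $POD_NAME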
curl http://localhost:8001/api/v1/namespaces/default/pods/$POD_NAME:8080/proxy/
Anything that the application would normally send to standard output becomes logs for the
container within the Pod. We can retrieve these logs using the kubectl logs command:
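For example:

kubectl logs "$POD_NAME"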
Note: We don't need to specify the container name, because we only have one container inside the
pod.
We can execute commands directly on the container once the Pod is up and running. For this,
we use the exec subcommand and use the name of the Pod as a parameter. Let’s list the
environment variables:
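For example:

kubectl exec "$POD_NAME" -- env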
Again, it's worth mentioning that the name of the container itself can be omitted since we only
have a single container in the Pod.
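Next, start a bash session in the Pod's container (a sketch; this assumes the image includes bash):

kubectl exec -ti $POD_NAME -- bash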
We now have an open console on the container where we run our NodeJS application. The
source code of the app is in the server.js file:
cat server.js
curl http://localhost:8080
Note: here we used localhost because we executed the command inside the NodeJS Pod. If you
cannot connect to localhost:8080, check to make sure you have run the kubectl exec command and
are launching the command from within the Pod
Learn about a Service in Kubernetes. Understand how labels and selectors relate to a Service.
Expose an application outside a Kubernetes cluster.
Objectives
Kubernetes Pods are mortal. Pods have a lifecycle. When a worker node dies, the Pods running
on the Node are also lost. A ReplicaSet might then dynamically drive the cluster back to the
desired state via the creation of new Pods to keep your application running. As another
example, consider an image-processing backend with 3 replicas. Those replicas are
exchangeable; the front-end system should not care about backend replicas or even if a Pod is
lost and recreated. That said, each Pod in a Kubernetes cluster has a unique IP address, even
Pods on the same Node, so there needs to be a way of automatically reconciling changes among
Pods so that your applications continue to function.
A Service in Kubernetes is an abstraction which defines a logical set of Pods and a policy by
which to access them. Services enable a loose coupling between dependent Pods. A Service is
defined using YAML or JSON, like all Kubernetes object manifests. The set of Pods targeted by a
Service is usually determined by a label selector (see below for why you might want a Service
without including a selector in the spec).
Although each Pod has a unique IP address, those IPs are not exposed outside the cluster
without a Service. Services allow your applications to receive traffic. Services can be exposed in
different ways by specifying a type in the spec of the Service:
• ClusterIP (default) - Exposes the Service on an internal IP in the cluster. This type makes
the Service only reachable from within the cluster.
• NodePort - Exposes the Service on the same port of each selected Node in the cluster
using NAT. Makes a Service accessible from outside the cluster using
<NodeIP>:<NodePort>. Superset of ClusterIP.
• LoadBalancer - Creates an external load balancer in the current cloud (if supported) and
assigns a fixed, external IP to the Service. Superset of NodePort.
• ExternalName - Maps the Service to the contents of the externalName field (e.g.
foo.bar.example.com), by returning a CNAME record with its value. No proxying of any
kind is set up. This type requires v1.7 or higher of kube-dns, or CoreDNS version 0.0.8 or
higher.
More information about the different types of Services can be found in the Using Source IP
tutorial. Also see Connecting Applications with Services.
Additionally, note that there are some use cases with Services that involve not defining a
selector in the spec. A Service created without selector will also not create the corresponding
Endpoints object. This allows users to manually map a Service to specific endpoints. Another
possibility why there may be no selector is you are strictly using type: ExternalName.
Summary
A Kubernetes Service is an abstraction layer which defines a logical set of Pods and enables
external traffic exposure, load balancing and service discovery for those Pods.
A Service routes traffic across a set of Pods. Services are the abstraction that allows pods to die
and replicate in Kubernetes without impacting your application. Discovery and routing among
dependent Pods (such as the frontend and backend components in an application) are handled
by Kubernetes Services.
Services match a set of Pods using labels and selectors, a grouping primitive that allows logical
operation on objects in Kubernetes. Labels are key/value pairs attached to objects and can be
used in any number of ways:
Labels can be attached to objects at creation time or later on. They can be modified at any time.
Let's expose our application now using a Service and apply some labels.
Let’s verify that our application is running. We’ll use the kubectl get command and look for
existing Pods:
If no Pods are running then it means the objects from the previous tutorials were cleaned up. In
this case, go back and recreate the deployment from the Using kubectl to create a Deployment
tutorial. Please wait a couple of seconds and list the Pods again. You can continue once you see
the one Pod running.
Next, let’s list the current Services from our cluster:
We have a Service called kubernetes that is created by default when minikube starts the cluster.
To create a new service and expose it to external traffic we'll use the expose command with
NodePort as parameter.
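A sketch of that command, followed by listing the Services again:

kubectl expose deployments/kubernetes-bootcamp --type="NodePort" --port 8080
kubectl get services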
We now have a running Service called kubernetes-bootcamp. Here we see that the Service
received a unique cluster-IP, an internal port, and an external-IP (the IP of the Node).
To find out what port was opened externally (for the type: NodePort Service) we’ll run the
describe service subcommand:
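For example:

kubectl describe services/kubernetes-bootcamp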
Create an environment variable called NODE_PORT that has the value of the Node port
assigned:
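A sketch, using a go-template to read the nodePort from the Service:

export NODE_PORT="$(kubectl get services/kubernetes-bootcamp -o go-template='{{(index .spec.ports 0).nodePort}}')"
echo "NODE_PORT=$NODE_PORT"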
Now we can test that the app is exposed outside of the cluster using curl, the IP address of the
Node and the externally exposed port:
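For example:

curl http://"$(minikube ip):$NODE_PORT"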
Note:
If you're running minikube with Docker Desktop as the container driver, a minikube tunnel is
needed. This is because containers inside Docker Desktop are isolated from your host computer.
http://127.0.0.1:51082
! Because you are using a Docker driver on darwin, the terminal needs to be open to
run it.
The Deployment created automatically a label for our Pod. With the describe deployment
subcommand you can see the name (the key) of that label:
Let’s use this label to query our list of Pods. We’ll use the kubectl get pods command with -l as
a parameter, followed by the label values:
Get the name of the Pod and store it in the POD_NAME environment variable:
To apply a new label we use the label subcommand followed by the object type, object name
and the new label:
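For example (the version label value is an arbitrary choice here):

kubectl label pods "$POD_NAME" version=v1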
This will apply a new label to our Pod (we pinned the application version to the Pod), and we
can check it with the describe pod command:
We see here that the label is attached now to our Pod. And we can query now the list of pods
using the new label:
To delete Services you can use the delete service subcommand. Labels can be used also here:
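For example:

kubectl delete service -l app=kubernetes-bootcamp
kubectl get services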
This confirms that our Service was removed. To confirm that route is not exposed anymore you
can curl the previously exposed IP and port:
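A sketch of both checks; the external curl should now fail, while a curl from inside the Pod confirms the app itself is still running:

curl http://"$(minikube ip):$NODE_PORT"
kubectl exec -ti $POD_NAME -- curl http://localhost:8080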
We see here that the application is up. This is because the Deployment is managing the
application. To shut down the application, you would need to delete the Deployment as well.
Objectives
Scaling an application
Previously we created a Deployment, and then exposed it publicly via a Service. The
Deployment created only one Pod for running our application. When traffic increases, we will
need to scale the application to keep up with user demand.
If you haven't worked through the earlier sections, start from Using minikube to create a
cluster.
Summary:
• Scaling a Deployment
You can create from the start a Deployment with multiple instances using the --replicas parameter
for the kubectl create deployment command
Note:
If you are trying this after the previous section, you may have deleted the Service you created,
or may have created a Service of type: NodePort. In this section, it is assumed that a Service
with type: LoadBalancer exists for the kubernetes-bootcamp Deployment.
If you have not deleted the Service created in the previous section, first delete that Service and
then run the following command to create a new Service with its type set to LoadBalancer:
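A sketch of those two commands:

kubectl delete service kubernetes-bootcamp
kubectl expose deployment/kubernetes-bootcamp --type="LoadBalancer" --port 8080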
Scaling overview
Scaling out a Deployment will ensure new Pods are created and scheduled to Nodes with
available resources. Scaling will increase the number of Pods to the new desired state.
Kubernetes also supports autoscaling of Pods, but it is outside of the scope of this tutorial.
Scaling to zero is also possible, and it will terminate all Pods of the specified Deployment.
Running multiple instances of an application requires a way to distribute the traffic to all of
them. Services have an integrated load-balancer that distributes network traffic to all Pods
of an exposed Deployment. Services continuously monitor the running Pods using
endpoints, to ensure the traffic is sent only to available Pods.
Once you have multiple instances of an application running, you would be able to do Rolling
updates without downtime. We'll cover that in the next section of the tutorial. Now, let's go to
the terminal and scale our application.
Scaling a Deployment
We should have 1 Pod. If not, run the command again. This shows:
• DESIRED displays the desired number of replicas of the application, which you define
when you create the Deployment. This is the desired state.
• CURRENT displays how many replicas are currently running.
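A sketch of the commands this refers to (the DESIRED and CURRENT columns are shown by the ReplicaSet listing):

kubectl get deployments
kubectl get rs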
Next, let’s scale the Deployment to 4 replicas. We’ll use the kubectl scale command, followed by
the Deployment type, name and desired number of instances:
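For example:

kubectl scale deployments/kubernetes-bootcamp --replicas=4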
The change was applied, and we have 4 instances of the application available. Next, let’s check
if the number of Pods changed:
There are 4 Pods now, with different IP addresses. The change was registered in the Deployment
events log. To check that, use the describe subcommand:
You can also view in the output of this command that there are 4 replicas now.
Load Balancing
Let's check that the Service is load-balancing the traffic. To find out the exposed IP and Port we
can use the describe service as we learned in the previous part of the tutorial:
Create an environment variable called NODE_PORT that has the value of the Node port assigned, then print it:
export NODE_PORT="$(kubectl get services/kubernetes-bootcamp -o go-template='{{(index .spec.ports 0).nodePort}}')"
echo "NODE_PORT=$NODE_PORT"
Next, we’ll do a curl to the exposed IP address and port. Execute the command multiple times:
We hit a different Pod with every request. This demonstrates that the load-balancing is
working.
Note:
If you're running minikube with Docker Desktop as the container driver, a minikube tunnel is
needed. This is because containers inside Docker Desktop are isolated from your host computer.
http://127.0.0.1:51082
! Because you are using a Docker driver on darwin, the terminal needs to be open to
run it.
Scale Down
To scale down the Deployment to 2 replicas, run again the scale subcommand:
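For example:

kubectl scale deployments/kubernetes-bootcamp --replicas=2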
List the Deployments to check if the change was applied with the get deployments
subcommand:
The number of replicas decreased to 2. List the number of Pods, with get pods:
Updating an application
Users expect applications to be available all the time, and developers are expected to deploy
new versions of them several times a day. In Kubernetes this is done with rolling updates. A
rolling update allows a Deployment update to take place with zero downtime. It does this by
incrementally replacing the current Pods with new ones. The new Pods are scheduled on Nodes
with available resources, and Kubernetes waits for those new Pods to start before removing the
old Pods.
In the previous module we scaled our application to run multiple instances. This is a
requirement for performing updates without affecting application availability. By default, the
maximum number of Pods that can be unavailable during the update and the maximum number
of new Pods that can be created, is one. Both options can be configured to either numbers or
percentages (of Pods). In Kubernetes, updates are versioned and any Deployment update can be
reverted to a previous (stable) version.
Summary:
• Updating an app
Rolling updates allow a Deployment's update to take place with zero downtime by incrementally
replacing Pod instances with new ones.
Similar to application Scaling, if a Deployment is exposed publicly, the Service will load-balance
the traffic only to available Pods during the update. An available Pod is an instance that is
available to the users of the application.
• Promote an application from one environment to another (via container image updates)
• Rollback to previous versions
• Continuous Integration and Continuous Delivery of applications with zero downtime
If a Deployment is exposed publicly, the Service will load-balance the traffic only to available Pods
during the update.
In the following interactive tutorial, we'll update our application to a new version, and also
perform a rollback.
To list your Deployments, run the get deployments subcommand: kubectl get deployments
To view the current image version of the app, run the describe pods subcommand and look for
the Image field:
To update the image of the application to version 2, use the set image subcommand, followed
by the deployment name and the new image version:
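A sketch of the command; the v2 image repository shown here is an assumption based on the bootcamp image:

kubectl set image deployments/kubernetes-bootcamp kubernetes-bootcamp=docker.io/jocatalin/kubernetes-bootcamp:v2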
The command notified the Deployment to use a different image for your app and initiated a
rolling update. Check the status of the new Pods, and view the old one terminating with the
get pods subcommand:
Verify an update
First, check that the Service is running; you might have deleted it in a previous tutorial step.
Run kubectl describe services/kubernetes-bootcamp. If it's missing, you can create it again with:
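kubectl expose deployment/kubernetes-bootcamp --type="NodePort" --port 8080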
Create an environment variable called NODE_PORT that has the value of the Node port
assigned:
Every time you run the curl command, you will hit a different Pod. Notice that all Pods are now
running the latest version (v2).
You can also confirm the update by running the rollout status subcommand:
kubectl rollout status deployments/kubernetes-bootcamp
To view the current image version of the app, run the describe pods subcommand:
In the Image field of the output, verify that you are running the latest image version (v2).
Let’s perform another update, and try to deploy an image tagged with v10:
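For example:

kubectl set image deployments/kubernetes-bootcamp kubernetes-bootcamp=gcr.io/google-samples/kubernetes-bootcamp:v10
kubectl get deployments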
Notice that the output doesn't list the desired number of available Pods. Run the get pods
subcommand to list all Pods:
To get more insight into the problem, run the describe pods subcommand:
In the Events section of the output for the affected Pods, notice that the v10 image version did
not exist in the repository.
To roll back the deployment to your last working version, use the rollout undo subcommand:
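For example:

kubectl rollout undo deployments/kubernetes-bootcamp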
The rollout undo command reverts the deployment to the previous known state (v2 of the
image). Updates are versioned and you can revert to any previously known state of a
Deployment.
Four Pods are running. To check the image deployed on these Pods, use the describe pods
subcommand:
The Deployment is once again using a stable version of the app (v2). The rollback was
successful.
Configuration
There are several ways to set environment variables for a Docker container in Kubernetes,
including: Dockerfile, kubernetes.yml, Kubernetes ConfigMaps, and Kubernetes Secrets. In the
tutorial, you will learn how to use the latter two for setting your environment variables whose
values will be injected into your microservices. One of the benefits for using ConfigMaps and
Secrets is that they can be re-used across multiple containers, including being assigned to
different environment variables for the different containers.
ConfigMaps are API Objects that store non-confidential key-value pairs. In the Interactive
Tutorial you will learn how to use a ConfigMap to store the application's name. For more
information regarding ConfigMaps, you can find the documentation here.
Although Secrets are also used to store key-value pairs, they differ from ConfigMaps in that
they're intended for confidential/sensitive information and are stored using Base64 encoding.
This makes secrets the appropriate choice for storing such things as credentials, keys, and
tokens, the former of which you'll do in the Interactive Tutorial. For more information on
Secrets, you can find the documentation here.
Externalizing Config from Code
Many open source frameworks and runtimes implement and support MicroProfile Config.
Throughout the interactive tutorial, you'll be using Open Liberty, a flexible open-source Java
runtime for building and running cloud-native apps and microservices. However, any
MicroProfile compatible runtime could be used instead.
Objectives
• Create a Kubernetes ConfigMap and Secret
• Inject microservice configuration using MicroProfile Config
• Killercoda
• Play with Kubernetes
You need to have the curl command-line tool for making HTTP requests from the terminal or
command prompt. If you do not have curl available, you can install it. Check the documentation
for your local operating system.
Objectives
• Update configuration via a ConfigMap mounted as a Volume
• Update environment variables of a Pod via a ConfigMap
• Update configuration via a ConfigMap in a multi-container Pod
• Update configuration via a ConfigMap in a Pod possessing a Sidecar Container
Below is an example of a Deployment manifest with the ConfigMap sport mounted as a volume
into the Pod's only container.
deployments/deployment-with-configmap-as-volume.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: configmap-volume
  labels:
    app.kubernetes.io/name: configmap-volume
spec:
  replicas: 3
  selector:
    matchLabels:
      app.kubernetes.io/name: configmap-volume
  template:
    metadata:
      labels:
        app.kubernetes.io/name: configmap-volume
    spec:
      containers:
        - name: alpine
          image: alpine:3
          command:
            - /bin/sh
            - -c
            - while true; do echo "$(date) My preferred sport is $(cat /etc/config/sport)";
              sleep 10; done;
          ports:
            - containerPort: 80
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
      volumes:
        - name: config-volume
          configMap:
            name: sport
Check the pods for this Deployment to ensure they are ready (matching by selector):
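If you are starting from scratch, the ConfigMap and Deployment need to exist first; a sketch, assuming the initial value football that the later edit refers to:

kubectl create configmap sport --from-literal=sport=football
kubectl apply -f https://k8s.io/examples/deployments/deployment-with-configmap-as-volume.yaml
kubectl get pods --selector=app.kubernetes.io/name=configmap-volume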
On each node where one of these Pods is running, the kubelet fetches the data for that
ConfigMap and translates it to files in a local volume. The kubelet then mounts that volume into
the container, as specified in the Pod template. The code running in that container loads the
information from the file and uses it to print a report to stdout. You can check this report by
viewing the logs for one of the Pods in that Deployment:
# Pick one Pod that belongs to the Deployment, and view its logs
kubectl logs deployments/configmap-volume
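Then edit the ConfigMap:

kubectl edit configmap sport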
In the editor that appears, change the value of key sport from football to cricket. Save your
changes. The kubectl tool updates the ConfigMap accordingly (if you see an error, try again).
Here's an example of how that manifest could look after you edit it:
apiVersion: v1
data:
  sport: cricket
kind: ConfigMap
# You can leave the existing metadata as they are.
# The values you'll see won't exactly match these.
metadata:
  creationTimestamp: "2024-01-04T14:05:06Z"
  name: sport
  namespace: default
  resourceVersion: "1743935"
  uid: 024ee001-fe72-487e-872e-34d6464a8a23
configmap/sport edited
Tail (follow the latest entries in) the logs of one of the pods that belongs to this Deployment:
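For example:

kubectl logs deployments/configmap-volume --follow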
After a few seconds, you should see the log output change as follows:
When you have a ConfigMap that is mapped into a running Pod using either a configMap
volume or a projected volume, and you update that ConfigMap, the running Pod sees the
update almost immediately.
However, your application only sees the change if it is written to either poll for changes, or
watch for file updates.
An application that loads its configuration once at startup will not notice a change.
Note:
The total delay from the moment when the ConfigMap is updated to the moment when new
keys are projected to the Pod can be as long as kubelet sync period.
Also check Mounted ConfigMaps are updated automatically.
Below is an example of a Deployment manifest with an environment variable configured via the
ConfigMap fruits.
deployments/deployment-with-configmap-as-envvar.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: configmap-env-var
  labels:
    app.kubernetes.io/name: configmap-env-var
spec:
  replicas: 3
  selector:
    matchLabels:
      app.kubernetes.io/name: configmap-env-var
  template:
    metadata:
      labels:
        app.kubernetes.io/name: configmap-env-var
    spec:
      containers:
        - name: alpine
          image: alpine:3
          env:
            - name: FRUITS
              valueFrom:
                configMapKeyRef:
                  key: fruits
                  name: fruits
          command:
            - /bin/sh
            - -c
            - while true; do echo "$(date) The basket is full of $FRUITS";
              sleep 10; done;
          ports:
            - containerPort: 80
Check the pods for this Deployment to ensure they are ready (matching by selector):
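A sketch of creating the ConfigMap and Deployment first, then checking the Pods (the initial value apples matches the edit described below):

kubectl create configmap fruits --from-literal=fruits=apples
kubectl apply -f https://k8s.io/examples/deployments/deployment-with-configmap-as-envvar.yaml
kubectl get pods --selector=app.kubernetes.io/name=configmap-env-var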
The key-value pair in the ConfigMap is configured as an environment variable in the container
of the Pod. Check this by viewing the logs of one Pod that belongs to the Deployment.
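A sketch of both steps (view the logs, then open the ConfigMap for editing):

kubectl logs deployment/configmap-env-var
kubectl edit configmap fruits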
In the editor that appears, change the value of key fruits from apples to mangoes. Save your
changes. The kubectl tool updates the ConfigMap accordingly (if you see an error, try again).
Here's an example of how that manifest could look after you edit it:
apiVersion: v1
data:
  fruits: mangoes
kind: ConfigMap
# You can leave the existing metadata as they are.
# The values you'll see won't exactly match these.
metadata:
  creationTimestamp: "2024-01-04T16:04:19Z"
  name: fruits
  namespace: default
  resourceVersion: "1749472"
configmap/fruits edited
Tail the logs of the Deployment and observe the output for a few seconds:
Notice that the output remains unchanged, even though you edited the ConfigMap:
Note:
Although the value of the key inside the ConfigMap has changed, the environment variable in
the Pod still shows the earlier value. This is because environment variables for a process
running inside a Pod are not updated when the source data changes; if you wanted to force an
update, you would need to have Kubernetes replace your existing Pods. The new Pods would
then run with the updated information.
You can trigger that replacement. Perform a rollout for the Deployment, using kubectl rollout:
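For example:

kubectl rollout restart deployment configmap-env-var
kubectl rollout status deployment configmap-env-var --timeout=90s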
The rollout causes Kubernetes to make a new ReplicaSet for the Deployment; that means the
existing Pods eventually terminate, and new ones are created. After a few seconds, you should
see an output similar to:
Note:
Please wait for the older Pods to fully terminate before proceeding with the next steps.
# Pick one Pod that belongs to the Deployment, and view its logs
kubectl logs deployment/configmap-env-var
This demonstrates the scenario of updating environment variables in a Pod that are derived
from a ConfigMap. Changes to the ConfigMap values are applied to the Pod during the
subsequent rollout. If Pods get created for another reason, such as scaling up the Deployment,
then the new Pods also use the latest configuration values; if you don't trigger a rollout, then
you might find that your app is running with a mix of old and new environment variable
values.
Below is an example manifest for a Deployment that manages a set of Pods, each with two
containers. The two containers share an emptyDir volume that they use to communicate. The
first container runs a web server (nginx). The mount path for the shared volume in the web
server container is /usr/share/nginx/html. The second helper container is based on alpine, and
for this container the emptyDir volume is mounted at /pod-data. The helper container writes a
file in HTML that has its content based on a ConfigMap. The web server container serves the
HTML via HTTP.
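This scenario assumes a ConfigMap named color; a sketch of creating it from a literal, using the initial value red referenced later:

kubectl create configmap color --from-literal=color=red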
deployments/deployment-with-configmap-two-containers.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: configmap-two-containers
  labels:
    app.kubernetes.io/name: configmap-two-containers
spec:
  replicas: 3
  selector:
    matchLabels:
      app.kubernetes.io/name: configmap-two-containers
  template:
    metadata:
      labels:
        app.kubernetes.io/name: configmap-two-containers
    spec:
      volumes:
        - name: shared-data
          emptyDir: {}
        - name: config-volume
          configMap:
            name: color
      containers:
        - name: nginx
          image: nginx
          volumeMounts:
            - name: shared-data
              mountPath: /usr/share/nginx/html
        - name: alpine
          image: alpine:3
          volumeMounts:
            - name: shared-data
              mountPath: /pod-data
            - name: config-volume
              mountPath: /etc/config
          command:
            - /bin/sh
            - -c
            - while true; do echo "$(date) My preferred color is $(cat /etc/config/color)" > /pod-data/index.html;
              sleep 10; done;
Expose the Deployment (the kubectl tool creates a Service for you):
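A sketch of exposing the Deployment and then port-forwarding to it so the curl below works; the Service name is an assumption:

kubectl expose deployment configmap-two-containers --name=configmap-service --port=8080 --target-port=80
# in a separate terminal
kubectl port-forward service/configmap-service 8080:8080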
curl http://localhost:8080
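Then edit the ConfigMap:

kubectl edit configmap color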
In the editor that appears, change the value of key color from red to blue. Save your changes.
The kubectl tool updates the ConfigMap accordingly (if you see an error, try again).
Here's an example of how that manifest could look after you edit it:
apiVersion: v1
data:
  color: blue
kind: ConfigMap
# You can leave the existing metadata as they are.
# The values you'll see won't exactly match these.
metadata:
  creationTimestamp: "2024-01-05T08:12:05Z"
  name: color
  namespace: configmap
  resourceVersion: "1801272"
  uid: 80d33e4a-cbb4-4bc9-ba8c-544c68e425d6
If you are continuing from the previous scenario, you can reuse the ConfigMap named color for
this scenario.
If you are executing this scenario independently, use the kubectl create configmap command to
create a ConfigMap from literal values:
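For example (blue matches the edit described later in this scenario):

kubectl create configmap color --from-literal=color=blue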
Below is an example manifest for a Deployment that manages a set of Pods, each with a main
container and a sidecar container. The two containers share an emptyDir volume that they use
to communicate. The main container runs a web server (NGINX). The mount path for the
shared volume in the web server container is /usr/share/nginx/html. The second container is a
Sidecar Container based on Alpine Linux which acts as a helper container. For this container
the emptyDir volume is mounted at /pod-data. The Sidecar Container writes a file in HTML
that has its content based on a ConfigMap. The web server container serves the HTML via
HTTP.
deployments/deployment-with-configmap-and-sidecar-container.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: configmap-sidecar-container
  labels:
    app.kubernetes.io/name: configmap-sidecar-container
spec:
  replicas: 3
  selector:
    matchLabels:
      app.kubernetes.io/name: configmap-sidecar-container
  template:
    metadata:
      labels:
        app.kubernetes.io/name: configmap-sidecar-container
    spec:
      volumes:
        - name: shared-data
          emptyDir: {}
        - name: config-volume
          configMap:
            name: color
      containers:
        - name: nginx
          image: nginx
          volumeMounts:
            - name: shared-data
              mountPath: /usr/share/nginx/html
      initContainers:
        - name: alpine
          image: alpine:3
          restartPolicy: Always
          volumeMounts:
            - name: shared-data
              mountPath: /pod-data
            - name: config-volume
              mountPath: /etc/config
          command:
            - /bin/sh
            - -c
            - while true; do echo "$(date) My preferred color is $(cat /etc/config/color)" > /pod-data/index.html;
              sleep 10; done;
Check the pods for this Deployment to ensure they are ready (matching by selector):
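A sketch of applying the manifest and checking its Pods by selector:

kubectl apply -f https://k8s.io/examples/deployments/deployment-with-configmap-and-sidecar-container.yaml
kubectl get pods --selector=app.kubernetes.io/name=configmap-sidecar-container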
Expose the Deployment (the kubectl tool creates a Service for you):
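A sketch, using a different local port than the earlier scenario; the Service name is an assumption:

kubectl expose deployment configmap-sidecar-container --name=configmap-sidecar-service --port=8081 --target-port=80
# in a separate terminal
kubectl port-forward service/configmap-sidecar-service 8081:8081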
curl http://localhost:8081
In the editor that appears, change the value of key color from blue to green. Save your changes.
The kubectl tool updates the ConfigMap accordingly (if you see an error, try again).
Here's an example of how that manifest could look after you edit it:
apiVersion: v1
data:
  color: green
kind: ConfigMap
# You can leave the existing metadata as they are.
# The values you'll see won't exactly match these.
metadata:
  creationTimestamp: "2024-02-17T12:20:30Z"
  name: color
  namespace: default
  resourceVersion: "1054"
  uid: e40bb34c-58df-4280-8bea-6ed16edccfaa
Immutable ConfigMaps are especially useful for configuration that is constant and is not
expected to change over time. Marking a ConfigMap as immutable allows a performance
improvement where the kubelet does not watch for changes.
• change the name of the ConfigMap, and switch to running Pods that reference the new
name
• replace all the nodes in your cluster that have previously run a Pod that used the old
value
• restart the kubelet on any node where the kubelet previously loaded the old ConfigMap
configmap/immutable-configmap.yaml
apiVersion: v1
data:
  company_name: "ACME, Inc." # existing fictional company name
kind: ConfigMap
immutable: true
metadata:
  name: company-name-20150801
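Apply it (a sketch, using the upstream example URL):

kubectl apply -f https://k8s.io/examples/configmap/immutable-configmap.yaml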
deployments/deployment-with-immutable-configmap-as-volume.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: immutable-configmap-volume
  labels:
    app.kubernetes.io/name: immutable-configmap-volume
spec:
  replicas: 3
  selector:
    matchLabels:
      app.kubernetes.io/name: immutable-configmap-volume
  template:
    metadata:
      labels:
        app.kubernetes.io/name: immutable-configmap-volume
    spec:
      containers:
        - name: alpine
          image: alpine:3
          command:
            - /bin/sh
            - -c
            - while true; do echo "$(date) The name of the company is $(cat /etc/config/company_name)";
              sleep 10; done;
          ports:
            - containerPort: 80
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
      volumes:
        - name: config-volume
          configMap:
            name: company-name-20150801
Check the pods for this Deployment to ensure they are ready (matching by selector):
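A sketch of applying the manifest and checking its Pods by selector:

kubectl apply -f https://k8s.io/examples/deployments/deployment-with-immutable-configmap-as-volume.yaml
kubectl get pods --selector=app.kubernetes.io/name=immutable-configmap-volume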
The Pod's container refers to the data defined in the ConfigMap and uses it to print a report to
stdout. You can check this report by viewing the logs for one of the Pods in that Deployment:
# Pick one Pod that belongs to the Deployment, and view its logs
kubectl logs deployments/immutable-configmap-volume
Note:
Once a ConfigMap is marked as immutable, it is not possible to revert this change nor to mutate
the contents of the data or the binaryData field.
In order to modify the behavior of the Pods that use this configuration, you will create a new
immutable ConfigMap and edit the Deployment to define a slightly different pod template,
referencing the new ConfigMap.
configmap/new-immutable-configmap.yaml
apiVersion: v1
data:
  company_name: "Fiktivesunternehmen GmbH" # new fictional company name
kind: ConfigMap
immutable: true
metadata:
  name: company-name-20240312
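The confirmation below comes from applying this manifest; a sketch:

kubectl apply -f https://k8s.io/examples/configmap/new-immutable-configmap.yaml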
configmap/company-name-20240312 created
You should see an output displaying both the old and new ConfigMaps:
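A sketch of listing the ConfigMaps and then opening the Deployment for editing:

kubectl get configmap
kubectl edit deployment immutable-configmap-volume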
In the editor that appears, update the existing volume definition to use the new ConfigMap.
volumes:
  - configMap:
      defaultMode: 420
      name: company-name-20240312 # Update this field
    name: config-volume
deployment.apps/immutable-configmap-volume edited
This will trigger a rollout. Wait for all the previous Pods to terminate and the new Pods to be in
a ready state.
# Pick one Pod that belongs to the Deployment, and view its logs
kubectl logs deployment/immutable-configmap-volume
Once all the deployments have migrated to use the new immutable ConfigMap, it is advised to
delete the old one.
Summary
Changes to a ConfigMap mounted as a Volume on a Pod are available seamlessly after the
subsequent kubelet sync.
Changes to a ConfigMap that configures environment variables for a Pod are available after the
subsequent rollout for the Pod.
Once a ConfigMap is marked as immutable, it is not possible to revert this change (you cannot
make an immutable ConfigMap mutable), and you also cannot make any change to the contents
of the data or the binaryData field. You can delete and recreate the ConfigMap, or you can make
a new different ConfigMap. When you delete a ConfigMap, running containers and their Pods
maintain a mount point to any volume that referenced that existing ConfigMap.
Cleaning up
Terminate the kubectl port-forward commands in case they are running.
kubectl delete configmap company-name-20150801 # In case it was not handled during the task
execution
Objectives
• Create a ConfigMap with Redis configuration values
• Create a Redis Pod that mounts and uses the created ConfigMap
• Verify that the configuration was correctly applied.
• Killercoda
• Play with Kubernetes
• The example shown on this page works with kubectl 1.14 and above.
• Understand Configure a Pod to Use a ConfigMap.
Examine the contents of the Redis pod manifest and note the following:
This has the net effect of exposing the data in data.redis-config from the example-redis-config
ConfigMap above as /redis-master/redis.conf inside the Pod.
pods/config/redis-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: redis
spec:
  containers:
  - name: redis
    image: redis:5.0.4
    command:
      - redis-server
      - "/redis-master/redis.conf"
    env:
    - name: MASTER
      value: "true"
    ports:
    - containerPort: 6379
    resources:
      limits:
        cpu: "0.1"
    volumeMounts:
    - mountPath: /redis-master-data
      name: data
    - mountPath: /redis-master
      name: config
  volumes:
    - name: data
      emptyDir: {}
    - name: config
      configMap:
        name: example-redis-config
        items:
        - key: redis-config
          path: redis.conf
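A sketch of creating the initial ConfigMap (with an empty redis-config key, so Redis starts with its defaults) and then the Pod itself:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-redis-config
data:
  redis-config: ""
EOF

kubectl apply -f https://raw.githubusercontent.com/kubernetes/website/main/content/en/examples/pods/config/redis-pod.yaml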
Examine the created objects:
Name: example-redis-config
Namespace: default
Labels: <none>
Annotations: <none>
Data
====
redis-config:
Use kubectl exec to enter the pod and run the redis-cli tool to check the current configuration:
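For example:

kubectl exec -it redis -- redis-cli

At the redis-cli prompt, run CONFIG GET maxmemory and CONFIG GET maxmemory-policy to produce the outputs shown below.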
Check maxmemory:
1) "maxmemory"
2) "0"
1) "maxmemory-policy"
2) "noeviction"
pods/config/example-redis-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-redis-config
data:
  redis-config: |
    maxmemory 2mb
    maxmemory-policy allkeys-lru
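Apply the updated ConfigMap and describe it again (a sketch, using the upstream example URL):

kubectl apply -f https://raw.githubusercontent.com/kubernetes/website/main/content/en/examples/pods/config/example-redis-config.yaml
kubectl describe configmap/example-redis-config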
Name: example-redis-config
Namespace: default
Labels: <none>
Annotations: <none>
Data
====
redis-config:
----
maxmemory 2mb
maxmemory-policy allkeys-lru
Check the Redis Pod again using redis-cli via kubectl exec to see if the configuration was
applied:
Check maxmemory:
1) "maxmemory"
2) "0"
Returns:
1) "maxmemory-policy"
2) "noeviction"
The configuration values have not changed because the Pod needs to be restarted to grab
updated values from associated ConfigMaps. Let's delete and recreate the Pod:
kubectl delete pod redis
kubectl apply -f https://raw.githubusercontent.com/kubernetes/website/main/content/en/
examples/pods/config/redis-pod.yaml
Check maxmemory:
1) "maxmemory"
2) "2097152"
1) "maxmemory-policy"
2) "allkeys-lru"
What's next
• Learn more about ConfigMaps.
• Follow an example of Updating configuration via a ConfigMap.
Security
Security is an important concern for most organizations and people who run Kubernetes
clusters. You can find a basic security checklist elsewhere in the Kubernetes documentation.
To learn how to deploy and manage security aspects of Kubernetes, you can follow the tutorials
in this section.
Pod Security is an admission controller that carries out checks against the Kubernetes Pod
Security Standards when new Pods are created. The feature reached general availability (GA) in v1.25. This tutorial
shows you how to enforce the baseline Pod Security Standard at the cluster level which applies
a standard configuration to all namespaces in a cluster.
To apply Pod Security Standards to specific namespaces, refer to Apply Pod Security Standards
at the namespace level.
If you are running a version of Kubernetes other than v1.30, check the documentation for that
version.
• kind
• kubectl
This tutorial demonstrates what you can configure for a Kubernetes cluster that you fully
control. If you are learning how to configure Pod Security Admission for a managed cluster
where you are not able to configure the control plane, read Apply Pod Security Standards at the
namespace level.
To gather information that helps you to choose the Pod Security Standards that are most
appropriate for your configuration, do the following:
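To create a cluster without any Pod Security Standards applied yet, a sketch (the cluster name is an assumption, matching the Pod names in the warnings below):

kind create cluster --name psa-wo-cluster-pss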
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
kubectl get ns
4. Use --dry-run=server to understand what happens when different Pod Security Standards
are applied:
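A sketch of the dry-run label command; substitute baseline or restricted for privileged to produce the other outputs:

kubectl label --dry-run=server --overwrite ns --all pod-security.kubernetes.io/enforce=privileged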
1. Privileged
namespace/default labeled
namespace/kube-node-lease labeled
namespace/kube-public labeled
namespace/kube-system labeled
namespace/local-path-storage labeled
2. Baseline
namespace/default labeled
namespace/kube-node-lease labeled
namespace/kube-public labeled
Warning: existing pods in namespace "kube-system" violate the new PodSecurity
enforce level "baseline:latest"
Warning: etcd-psa-wo-cluster-pss-control-plane (and 3 other pods): host
namespaces, hostPath volumes
Warning: kindnet-vzj42: non-default capabilities, host namespaces, hostPath
volumes
Warning: kube-proxy-m6hwf: host namespaces, hostPath volumes, privileged
namespace/kube-system labeled
namespace/local-path-storage labeled
3. Restricted
namespace/default labeled
namespace/kube-node-lease labeled
namespace/kube-public labeled
Warning: existing pods in namespace "kube-system" violate the new PodSecurity
enforce level "restricted:latest"
Warning: coredns-7bb9c7b568-hsptc (and 1 other pod): unrestricted capabilities,
runAsNonRoot != true, seccompProfile
Warning: etcd-psa-wo-cluster-pss-control-plane (and 3 other pods): host
namespaces, hostPath volumes, allowPrivilegeEscalation != false, unrestricted
capabilities, restricted volume types, runAsNonRoot != true
Warning: kindnet-vzj42: non-default capabilities, host namespaces, hostPath
volumes, allowPrivilegeEscalation != false, unrestricted capabilities, restricted
volume types, runAsNonRoot != true, seccompProfile
Warning: kube-proxy-m6hwf: host namespaces, hostPath volumes, privileged,
allowPrivilegeEscalation != false, unrestricted capabilities, restricted volume types,
runAsNonRoot != true, seccompProfile
namespace/kube-system labeled
Warning: existing pods in namespace "local-path-storage" violate the new
PodSecurity enforce level "restricted:latest"
Warning: local-path-provisioner-d6d9f7ffc-lw9lh: allowPrivilegeEscalation != false,
unrestricted capabilities, runAsNonRoot != true, seccompProfile
namespace/local-path-storage labeled
From the previous output, you'll notice that applying the privileged Pod Security Standard
shows no warnings for any namespaces. However, baseline and restricted standards both have
warnings, specifically in the kube-system namespace.
Set modes, versions and standards
In this section, you apply the following Pod Security Standards to the latest version:
The baseline Pod Security Standard provides a convenient middle ground that allows keeping
the exemption list short and prevents known privilege escalations.
Additionally, to prevent pods from failing in kube-system, you'll exempt the namespace from
having Pod Security Standards applied.
When you implement Pod Security Admission in your own environment, consider the
following:
1. Based on the risk posture applied to a cluster, a stricter Pod Security Standard like
restricted might be a better choice.
3. Create a configuration file that can be consumed by the Pod Security Admission
Controller to implement these Pod Security Standards:
mkdir -p /tmp/pss
cat <<EOF > /tmp/pss/cluster-level-pss.yaml
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: PodSecurity
  configuration:
    apiVersion: pod-security.admission.config.k8s.io/v1
    kind: PodSecurityConfiguration
    defaults:
      enforce: "baseline"
      enforce-version: "latest"
      audit: "restricted"
      audit-version: "latest"
      warn: "restricted"
      warn-version: "latest"
    exemptions:
      usernames: []
      runtimeClasses: []
      namespaces: [kube-system]
EOF
Note:
If you use Docker Desktop with kind on macOS, you can add /tmp as a Shared Directory
under the menu item Preferences > Resources > File Sharing.
5. Create a cluster that uses Pod Security Admission to apply these Pod Security Standards:
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
security/example-baseline-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
    - image: nginx
      name: nginx
      ports:
        - containerPort: 80
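Apply it in the cluster (a sketch, using the upstream example URL):

kubectl apply -f https://k8s.io/examples/security/example-baseline-pod.yaml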
Clean up
Now delete the clusters which you created above by running the following command:
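A sketch, assuming the cluster names used in this tutorial:

kind delete cluster --name psa-with-cluster-pss
kind delete cluster --name psa-wo-cluster-pss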
Pod Security Admission is an admission controller that applies Pod Security Standards when
Pods are created. The feature reached general availability (GA) in v1.25. In this tutorial, you will enforce the baseline Pod
Security Standard, one namespace at a time.
You can also apply Pod Security Standards to multiple namespaces at once at the cluster level.
For instructions, refer to Apply Pod Security Standards at the cluster level.
• kind
• kubectl
Create cluster
1. Create a kind cluster as follows:
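The command itself is not reproduced above. A sketch, assuming psa-ns-level as an example cluster name:
kind create cluster --name psa-ns-level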
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
Create a namespace
Create a new namespace called example:
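The command itself is not shown above; for example:
kubectl create ns example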
namespace/example created
2. You can configure multiple Pod Security Standard checks on any namespace, using labels. The following command enforces the baseline Pod Security Standard, but warns and audits for the restricted Pod Security Standard as per the latest version (the default value):
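The labeling command is not reproduced above. A sketch using the standard Pod Security Admission labels:
kubectl label --overwrite ns example \
  pod-security.kubernetes.io/enforce=baseline \
  pod-security.kubernetes.io/enforce-version=latest \
  pod-security.kubernetes.io/warn=restricted \
  pod-security.kubernetes.io/warn-version=latest \
  pod-security.kubernetes.io/audit=restricted \
  pod-security.kubernetes.io/audit-version=latest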
The Pod does start OK; the output includes a warning. For example:
pod/nginx created
The Pod Security Standards enforcement and warning settings were applied only to the
example namespace. You could create the same Pod in the default namespace with no warnings.
Clean up
Now delete the cluster which you created above by running the following command:
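A sketch, assuming the example cluster name psa-ns-level:
kind delete cluster --name psa-ns-level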
What's next
• Run a shell script to perform all of the preceding steps at once.
This page shows you how to load AppArmor profiles on your nodes and enforce those profiles
in Pods. To learn more about how Kubernetes can confine Pods using AppArmor, see Linux
kernel security constraints for Pods and containers.
Objectives
• See an example of how to load a profile on a Node
• Learn how to enforce the profile on a Pod
• Learn how to check that the profile is loaded
• See what happens when a profile is violated
• See what happens when a profile cannot be loaded
1. AppArmor kernel module is enabled -- For the Linux kernel to enforce an AppArmor
profile, the AppArmor kernel module must be installed and enabled. Several distributions
enable the module by default, such as Ubuntu and SUSE, and many others provide
optional support. To check whether the module is enabled, check the /sys/module/
apparmor/parameters/enabled file:
cat /sys/module/apparmor/parameters/enabled
Y
The kubelet verifies that AppArmor is enabled on the host before admitting a pod with
AppArmor explicitly configured.
apparmor-test-deny-write (enforce)
apparmor-test-audit-write (enforce)
docker-default (enforce)
k8s-nginx (enforce)
For more details on loading profiles on nodes, see Setting up nodes with profiles.
Securing a Pod
Note:
Prior to Kubernetes v1.30, AppArmor was specified through annotations. Use the
documentation version selector to view the documentation with this deprecated API.
AppArmor profiles can be specified at the pod level or container level. The container AppArmor
profile takes precedence over the pod profile.
securityContext:
appArmorProfile:
type: <profile_type>
See the API Reference for the full details on the AppArmor profile API.
To verify that the profile was applied, you can check that the container's root process is
running with the correct profile by examining its proc attr:
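The verification command is not reproduced above. A sketch, with <pod_name> standing in for your Pod:
kubectl exec <pod_name> -- cat /proc/1/attr/current
The output should show the profile name and mode, for example: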
cri-containerd.apparmor.d (enforce)
Example
This example assumes you have already set up a cluster with AppArmor support.
First, load the profile you want to use onto your Nodes. This profile blocks all file write
operations:
#include <tunables/global>

profile k8s-apparmor-example-deny-write flags=(attach_disconnected) {
  #include <abstractions/base>
  file,
  # Deny all file writes.
  deny /** w,
}

Because you don't know in advance where the Pod will be scheduled, load the profile on all of your nodes. This example assumes that node names match host names and are reachable via SSH; for instance, you can loop over NODES=($(kubectl get nodes -o name)) and pipe the profile to sudo apparmor_parser -q on each node.
Next, run a simple "Hello AppArmor" Pod with the deny-write profile:
pods/security/hello-apparmor.yaml
apiVersion: v1
kind: Pod
metadata:
name: hello-apparmor
spec:
securityContext:
appArmorProfile:
type: Localhost
localhostProfile: k8s-apparmor-example-deny-write
containers:
- name: hello
image: busybox:1.28
command: [ "sh", "-c", "echo 'Hello AppArmor!' && sleep 1h" ]
You can verify that the container is actually running with that profile by checking /proc/1/attr/
current:
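For example:
kubectl exec hello-apparmor -- cat /proc/1/attr/current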
k8s-apparmor-example-deny-write (enforce)
Finally, you can see what happens if you violate the profile by writing to a file:
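A sketch of such a check; the profile denies the write, so you should see output similar to:
kubectl exec hello-apparmor -- touch /tmp/test
touch: /tmp/test: Permission denied
command terminated with exit code 1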
To wrap up, see what happens if you try to specify a profile that hasn't been loaded:
pod/hello-apparmor-2 created
Although the Pod was created successfully, further examination will show that it is stuck in
pending:
Name: hello-apparmor-2
Namespace: default
Node: gke-test-default-pool-239f5d02-x1kf/10.128.0.27
Start Time: Tue, 30 Aug 2016 17:58:56 -0700
Labels: <none>
Annotations: container.apparmor.security.beta.kubernetes.io/hello=localhost/k8s-apparmor-
example-allow-write
Status: Pending
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 10s default-scheduler Successfully assigned default/hello-
apparmor to gke-test-default-pool-239f5d02-x1kf
Normal Pulled 8s kubelet Successfully pulled image "busybox:1.28" in
370.157088ms (370.172701ms including waiting)
Normal Pulling 7s (x2 over 9s) kubelet Pulling image "busybox:1.28"
Warning Failed 7s (x2 over 8s) kubelet Error: failed to get container spec opts: failed
to generate apparmor spec opts: apparmor profile not found k8s-apparmor-example-allow-write
Normal Pulled 7s kubelet Successfully pulled image "busybox:1.28" in
90.980331ms (91.005869ms including waiting)
An Event provides the error message and the reason; the specific wording is runtime-dependent:
Warning Failed 7s (x2 over 8s) kubelet Error: failed to get container spec opts: failed
to generate apparmor spec opts: apparmor profile not found
Administration
Setting up Nodes with profiles
Kubernetes 1.30 does not provide any built-in mechanisms for loading AppArmor profiles onto
Nodes. Profiles can be loaded through custom infrastructure or tools like the Kubernetes
Security Profiles Operator.
The scheduler is not aware of which profiles are loaded onto which Node, so the full set of
profiles must be loaded onto every Node. An alternative approach is to add a Node label for
each profile (or class of profiles) on the Node, and use a node selector to ensure the Pod is run
on a Node with the required profile.
Authoring Profiles
Getting AppArmor profiles specified correctly can be a tricky business. Fortunately there are
some tools to help with that:
To debug problems with AppArmor, you can check the system logs to see what, specifically,
was denied. AppArmor logs verbose messages to dmesg, and errors can usually be found in the
system logs or through journalctl. More information is provided in AppArmor failures.
Prior to Kubernetes v1.30, AppArmor was specified through annotations. Use the
documentation version selector to view the documentation with this deprecated API.
type (required) - indicates which kind of AppArmor profile will be applied. Valid options are:
Localhost
a profile pre-loaded on the node (specified by localhostProfile).
RuntimeDefault
the container runtime's default profile.
Unconfined
no AppArmor enforcement.
localhostProfile - The name of a profile loaded on the node that should be used. The profile must
be preconfigured on the node to work. This option must be provided if and only if the type is
Localhost.
What's next
Additional resources:
Seccomp stands for secure computing mode and has been a feature of the Linux kernel since
version 2.6.12. It can be used to sandbox the privileges of a process, restricting the calls it is able
to make from userspace into the kernel. Kubernetes lets you automatically apply seccomp
profiles loaded onto a node to your Pods and containers.
Identifying the privileges required for your workloads can be difficult. In this tutorial, you will
go through how to load seccomp profiles into a local Kubernetes cluster, how to apply them to a
Pod, and how you can begin to craft profiles that give only the necessary privileges to your
container processes.
Objectives
• Learn how to load seccomp profiles on a node
• Learn how to apply a seccomp profile to a container
• Observe auditing of syscalls made by a container process
• Observe behavior when a missing profile is specified
• Observe a violation of a seccomp profile
• Learn how to create fine-grained seccomp profiles
• Learn how to apply a container runtime default seccomp profile
The commands used in the tutorial assume that you are using Docker as your container runtime. (The cluster that kind creates may use a different container runtime internally.) You could also use Podman, but in that case you would have to follow specific instructions in order to complete the tasks successfully.
This tutorial shows some examples that are still beta (since v1.25) and others that use only
generally available seccomp functionality. You should make sure that your cluster is configured
correctly for the version you are using.
The tutorial also uses the curl tool for downloading examples to your computer. You can adapt
the steps to use a different tool if you prefer.
Note:
It is not possible to apply a seccomp profile to a container running with privileged: true set in
the container's securityContext. Privileged containers always run as Unconfined.
• audit.json
• violation.json
• fine-grained.json
pods/security/seccomp/profiles/audit.json
{
"defaultAction": "SCMP_ACT_LOG"
}
pods/security/seccomp/profiles/violation.json
{
"defaultAction": "SCMP_ACT_ERRNO"
}
pods/security/seccomp/profiles/fine-grained.json
{
"defaultAction": "SCMP_ACT_ERRNO",
"architectures": [
"SCMP_ARCH_X86_64",
"SCMP_ARCH_X86",
"SCMP_ARCH_X32"
],
"syscalls": [
{
"names": [
"accept4",
"epoll_wait",
"pselect6",
"futex",
"madvise",
"epoll_ctl",
"getsockname",
"setsockopt",
"vfork",
"mmap",
"read",
"write",
"close",
"arch_prctl",
"sched_getaffinity",
"munmap",
"brk",
"rt_sigaction",
"rt_sigprocmask",
"sigaltstack",
"gettid",
"clone",
"bind",
"socket",
"openat",
"readlinkat",
"exit_group",
"epoll_create1",
"listen",
"rt_sigreturn",
"sched_yield",
"clock_gettime",
"connect",
"dup2",
"epoll_pwait",
"execve",
"exit",
"fcntl",
"getpid",
"getuid",
"ioctl",
"mprotect",
"nanosleep",
"open",
"poll",
"recvfrom",
"sendto",
"set_tid_address",
"setitimer",
"writev",
"fstatfs",
"getdents64",
"pipe2",
"getrlimit"
],
"action": "SCMP_ACT_ALLOW"
}
]
}
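The download commands for these profiles are not reproduced above. A sketch, assuming the same k8s.io/examples URL pattern used elsewhere in this tutorial and a local profiles/ directory:
mkdir ./profiles
curl -L -o profiles/audit.json https://k8s.io/examples/pods/security/seccomp/profiles/audit.json
curl -L -o profiles/violation.json https://k8s.io/examples/pods/security/seccomp/profiles/violation.json
curl -L -o profiles/fine-grained.json https://k8s.io/examples/pods/security/seccomp/profiles/fine-grained.json
ls profiles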
You should see three profiles listed at the end of the final step:
pods/security/seccomp/kind.yaml
apiVersion: kind.x-k8s.io/v1alpha4
kind: Cluster
nodes:
- role: control-plane
extraMounts:
- hostPath: "./profiles"
containerPath: "/var/lib/kubelet/seccomp/profiles"
Download that example kind configuration, and save it to a file named kind.yaml:
curl -L -O https://k8s.io/examples/pods/security/seccomp/kind.yaml
You can set a specific Kubernetes version by setting the node's container image. See Nodes
within the kind documentation about configuration for more details on this. This tutorial
assumes you are using Kubernetes v1.30.
As a beta feature, you can configure Kubernetes to use the profile that the container runtime
prefers by default, rather than falling back to Unconfined. If you want to try that, see enable the
use of RuntimeDefault as the default seccomp profile for all workloads before you continue.
Once you have a kind configuration in place, create the kind cluster with that configuration:
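For example:
kind create cluster --config=kind.yaml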
After the new Kubernetes cluster is ready, identify the Docker container running as the single
node cluster:
docker ps
You should see output indicating that a container is running with name kind-control-plane. The
output is similar to:
If you examine the filesystem of that container, you should see that the profiles/ directory has been successfully loaded into the default seccomp path of the kubelet. Use docker exec to run a command in that container:
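A sketch, assuming the node container is named kind-control-plane as shown by docker ps:
docker exec -it kind-control-plane ls /var/lib/kubelet/seccomp/profiles
audit.json  fine-grained.json  violation.json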
You have verified that these seccomp profiles are available to the kubelet running within kind.
Note:
If you have the seccompDefault configuration enabled, then Pods use the RuntimeDefault
seccomp profile whenever no other seccomp profile is specified. Otherwise, the default is
Unconfined.
Here's a manifest for a Pod that requests the RuntimeDefault seccomp profile for all its
containers:
pods/security/seccomp/ga/default-pod.yaml
apiVersion: v1
kind: Pod
metadata:
name: default-pod
labels:
app: default-pod
spec:
securityContext:
seccompProfile:
type: RuntimeDefault
containers:
- name: test-container
image: hashicorp/http-echo:1.0
args:
- "-text=just made some more syscalls!"
securityContext:
allowPrivilegeEscalation: false
pods/security/seccomp/ga/audit-pod.yaml
apiVersion: v1
kind: Pod
metadata:
name: audit-pod
labels:
app: audit-pod
spec:
securityContext:
seccompProfile:
type: Localhost
localhostProfile: profiles/audit.json
containers:
- name: test-container
image: hashicorp/http-echo:1.0
args:
- "-text=just made some syscalls!"
securityContext:
allowPrivilegeEscalation: false
Note:
Older versions of Kubernetes allowed you to configure seccomp behavior using annotations.
Kubernetes 1.30 only supports using fields within .spec.securityContext to configure seccomp,
and this tutorial explains that approach.
This profile does not restrict any syscalls, so the Pod should start successfully.
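The commands to create and check this Pod are not reproduced above. A sketch, assuming the manifest is applied from the examples site:
kubectl apply -f https://k8s.io/examples/pods/security/seccomp/ga/audit-pod.yaml
kubectl get pod audit-pod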
To interact with the endpoint exposed by this container, create a NodePort Service that allows access to the endpoint from inside the kind control plane container.
Check what port the Service has been assigned on the node.
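A sketch of both steps; the --port value assumes the default hashicorp/http-echo listen port of 5678:
kubectl expose pod audit-pod --type NodePort --port 5678
kubectl get service audit-pod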
Now you can use curl to access that endpoint from inside the kind control plane container, at
the port exposed by this Service. Use docker exec to run the curl command within the container
belonging to that control plane container:
# Change 6a96207fed4b to the control plane container ID and 32373 to the port number you saw
from "docker ps"
docker exec -it 6a96207fed4b curl localhost:32373
You can see that the process is running, but what syscalls did it actually make? Because this
Pod is running in a local cluster, you should be able to see those in /var/log/syslog on your local
system. Open up a new terminal window and tail the output for calls from http-echo:
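A sketch, assuming your system writes kernel audit messages to /var/log/syslog:
tail -f /var/log/syslog | grep 'http-echo'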
You should already see some logs of syscalls made by http-echo, and if you run curl again inside
the control plane container you will see more output written to the log.
For example:
You can begin to understand the syscalls required by the http-echo process by looking at the
syscall= entry on each line. While these are unlikely to encompass all syscalls it uses, it can
serve as a basis for a seccomp profile for this container.
Delete the Service and the Pod before moving to the next section:
pods/security/seccomp/ga/violation-pod.yaml
apiVersion: v1
kind: Pod
metadata:
name: violation-pod
labels:
app: violation-pod
spec:
securityContext:
seccompProfile:
type: Localhost
localhostProfile: profiles/violation.json
containers:
- name: test-container
image: hashicorp/http-echo:1.0
args:
- "-text=just made some syscalls!"
securityContext:
allowPrivilegeEscalation: false
Attempt to create the Pod in the cluster:
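A sketch, assuming the manifest is applied from the examples site:
kubectl apply -f https://k8s.io/examples/pods/security/seccomp/ga/violation-pod.yaml
kubectl get pod violation-pod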
The Pod is created, but there is an issue. If you check the status of the Pod, you should see that it failed to start.
As seen in the previous example, the http-echo process requires quite a few syscalls. Here
seccomp has been instructed to error on any syscall by setting "defaultAction":
"SCMP_ACT_ERRNO". This is extremely secure, but removes the ability to do anything
meaningful. What you really want is to give workloads only the privileges they need.
pods/security/seccomp/ga/fine-pod.yaml
apiVersion: v1
kind: Pod
metadata:
name: fine-pod
labels:
app: fine-pod
spec:
securityContext:
seccompProfile:
type: Localhost
localhostProfile: profiles/fine-grained.json
containers:
- name: test-container
image: hashicorp/http-echo:1.0
args:
- "-text=just made some syscalls!"
securityContext:
allowPrivilegeEscalation: false
Create the Pod in your cluster:
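A sketch, assuming the manifest is applied from the examples site:
kubectl apply -f https://k8s.io/examples/pods/security/seccomp/ga/fine-pod.yaml
kubectl get pod fine-pod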
Open up a new terminal window and use tail to monitor for log entries that mention calls from
http-echo:
Check what port the Service has been assigned on the node:
Use curl to access that endpoint from inside the kind control plane container:
# Change 6a96207fed4b to the control plane container ID and 32373 to the port number you saw
from "docker ps"
docker exec -it 6a96207fed4b curl localhost:32373
You should see no output in the syslog. This is because the profile allowed all necessary syscalls and specified that an error should occur if one outside of the list is invoked. This is an ideal situation from a security perspective, but it required some effort to analyze the program. It would be nice if there were a simple way to get closer to this security without requiring as much effort.
Delete the Service and the Pod before moving to the next section:
If enabled, the kubelet will use the RuntimeDefault seccomp profile by default, which is defined
by the container runtime, instead of using the Unconfined (seccomp disabled) mode. The default
profiles aim to provide a strong set of security defaults while preserving the functionality of the
workload. It is possible that the default profiles differ between container runtimes and their
release versions, for example when comparing those from CRI-O and containerd.
Note:
Enabling the feature will neither change the Kubernetes securityContext.seccompProfile API
field nor add the deprecated annotations of the workload. This provides users the possibility to
rollback anytime without actually changing the workload configuration. Tools like crictl inspect
can be used to verify which seccomp profile is being used by a container.
Some workloads may require fewer syscall restrictions than others, which means they can fail at runtime even with the RuntimeDefault profile. To mitigate such a failure, you can:
If you are introducing this feature into a production-like cluster, the Kubernetes project recommends that you enable this feature gate on a subset of your nodes and then test workload execution before rolling the change out cluster-wide.
You can find more detailed information about a possible upgrade and downgrade strategy in the
related Kubernetes Enhancement Proposal (KEP): Enable seccomp by default.
Kubernetes 1.30 lets you configure the seccomp profile that applies when the spec for a Pod
doesn't define a specific seccomp profile. However, you still need to enable this defaulting for
each node where you would like to use it.
If you are running a Kubernetes 1.30 cluster and want to enable the feature, either run the
kubelet with the --seccomp-default command line flag, or enable it through the kubelet
configuration file. To enable the feature gate in kind, ensure that kind provides the minimum
required Kubernetes version and enables the SeccompDefault feature in the kind configuration:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  image: kindest/node:v1.28.0@sha256:9f3ff58f19dcf1a0611d11e8ac989fdb30a28f40f236f59f0bea31fb956ccf5c
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        seccomp-default: "true"
- role: worker
  image: kindest/node:v1.28.0@sha256:9f3ff58f19dcf1a0611d11e8ac989fdb30a28f40f236f59f0bea31fb956ccf5c
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        seccomp-default: "true"
The new Pod should now have the default seccomp profile attached. You can verify this by using docker exec to run crictl inspect for the container on the kind worker:
{
"defaultAction": "SCMP_ACT_ERRNO",
"architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32"],
"syscalls": [
{
"names": ["..."]
}
]
}
What's next
You can learn more about Linux seccomp:
• A seccomp Overview
• Seccomp Security Profiles for Docker
Stateless Applications
Objectives
• Run five instances of a Hello World application.
• Create a Service object that exposes an external IP address.
• Use the Service object to access the running application.
service/load-balancer-example.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app.kubernetes.io/name: load-balancer-example
name: hello-world
spec:
replicas: 5
selector:
matchLabels:
app.kubernetes.io/name: load-balancer-example
template:
metadata:
labels:
app.kubernetes.io/name: load-balancer-example
spec:
containers:
- image: gcr.io/google-samples/node-hello:1.0
name: hello-world
ports:
- containerPort: 8080
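The commands for the intervening steps are not reproduced above. A sketch, assuming the manifest is applied from the examples site and the Service is named my-service as in the output below:
kubectl apply -f https://k8s.io/examples/service/load-balancer-example.yaml
kubectl expose deployment hello-world --type=LoadBalancer --name=my-service
kubectl get services my-service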
Note:
If the external IP address is shown as <pending>, wait for a minute and enter the same
command again.
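The command that produces the following output is, for example:
kubectl describe services my-service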
Name: my-service
Namespace: default
Labels: app.kubernetes.io/name=load-balancer-example
Annotations: <none>
Selector: app.kubernetes.io/name=load-balancer-example
Type: LoadBalancer
IP: 10.3.245.137
LoadBalancer Ingress: 104.198.205.71
Port: <unset> 8080/TCP
NodePort: <unset> 32377/TCP
Endpoints: 10.0.0.6:8080,10.0.1.6:8080,10.0.1.7:8080 + 2 more...
Session Affinity: None
Events: <none>
Make a note of the external IP address (LoadBalancer Ingress) exposed by your service. In
this example, the external IP address is 104.198.205.71. Also note the value of Port and
NodePort. In this example, the Port is 8080 and the NodePort is 32377.
7. In the preceding output, you can see that the service has several endpoints:
10.0.0.6:8080,10.0.1.6:8080,10.0.1.7:8080 + 2 more. These are internal addresses of the pods
that are running the Hello World application. To verify these are pod addresses, enter this
command:
8. Use the external IP address (LoadBalancer Ingress) to access the Hello World application:
curl http://<external-ip>:<port>
Hello Kubernetes!
Cleaning up
To delete the Service, enter this command:
To delete the Deployment, the ReplicaSet, and the Pods that are running the Hello World
application, enter this command:
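The two cleanup commands are not reproduced above; for example:
kubectl delete services my-service
kubectl delete deployment hello-world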
What's next
Learn more about connecting applications with services.
• Killercoda
• Play with Kubernetes
Your Kubernetes server must be at or later than version v1.14. To check the version, enter
kubectl version.
The manifest file, included below, specifies a Deployment controller that runs a single replica
Redis Pod.
application/guestbook/redis-leader-deployment.yaml
# SOURCE: https://cloud.google.com/kubernetes-engine/docs/tutorials/guestbook
apiVersion: apps/v1
kind: Deployment
metadata:
name: redis-leader
labels:
app: redis
role: leader
tier: backend
spec:
replicas: 1
selector:
matchLabels:
app: redis
template:
metadata:
labels:
app: redis
role: leader
tier: backend
spec:
containers:
- name: leader
image: "docker.io/redis:6.0.5"
resources:
requests:
cpu: 100m
memory: 100Mi
ports:
- containerPort: 6379
1. Launch a terminal window in the directory where you downloaded the manifest files.
3. Query the list of Pods to verify that the Redis Pod is running:
4. Run the following command to view the logs from the Redis leader Pod:
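The commands for these steps are not reproduced above. A sketch, assuming the manifests are applied from the examples site:
kubectl apply -f https://k8s.io/examples/application/guestbook/redis-leader-deployment.yaml
kubectl get pods
kubectl logs -f deployment/redis-leader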
The guestbook application needs to communicate with Redis to write its data. You need to apply a Service to proxy the traffic to the Redis Pod. A Service defines a policy to access the Pods.
application/guestbook/redis-leader-service.yaml
# SOURCE: https://cloud.google.com/kubernetes-engine/docs/tutorials/guestbook
apiVersion: v1
kind: Service
metadata:
name: redis-leader
labels:
app: redis
role: leader
tier: backend
spec:
ports:
- port: 6379
targetPort: 6379
selector:
app: redis
role: leader
tier: backend
2. Query the list of Services to verify that the Redis Service is running:
Note:
This manifest file creates a Service named redis-leader with a set of labels that match the labels
previously defined, so the Service routes network traffic to the Redis Pod.
Although the Redis leader is a single Pod, you can make it highly available and meet traffic
demands by adding a few Redis followers, or replicas.
application/guestbook/redis-follower-deployment.yaml
# SOURCE: https://cloud.google.com/kubernetes-engine/docs/tutorials/guestbook
apiVersion: apps/v1
kind: Deployment
metadata:
name: redis-follower
labels:
app: redis
role: follower
tier: backend
spec:
replicas: 2
selector:
matchLabels:
app: redis
template:
metadata:
labels:
app: redis
role: follower
tier: backend
spec:
containers:
- name: follower
image: us-docker.pkg.dev/google-samples/containers/gke/gb-redis-follower:v2
resources:
requests:
cpu: 100m
memory: 100Mi
ports:
- containerPort: 6379
2. Verify that the two Redis follower replicas are running by querying the list of Pods:
The guestbook application needs to communicate with the Redis followers to read data. To
make the Redis followers discoverable, you must set up another Service.
application/guestbook/redis-follower-service.yaml
# SOURCE: https://cloud.google.com/kubernetes-engine/docs/tutorials/guestbook
apiVersion: v1
kind: Service
metadata:
name: redis-follower
labels:
app: redis
role: follower
tier: backend
spec:
ports:
# the port that this service should serve on
- port: 6379
selector:
app: redis
role: follower
tier: backend
Note:
This manifest file creates a Service named redis-follower with a set of labels that match the
labels previously defined, so the Service routes network traffic to the Redis Pod.
The guestbook app uses a PHP frontend. It is configured to communicate with either the Redis
follower or leader Services, depending on whether the request is a read or a write. The frontend
exposes a JSON interface, and serves a jQuery-Ajax-based UX.
application/guestbook/frontend-deployment.yaml
# SOURCE: https://cloud.google.com/kubernetes-engine/docs/tutorials/guestbook
apiVersion: apps/v1
kind: Deployment
metadata:
name: frontend
spec:
replicas: 3
selector:
matchLabels:
app: guestbook
tier: frontend
template:
metadata:
labels:
app: guestbook
tier: frontend
spec:
containers:
- name: php-redis
image: us-docker.pkg.dev/google-samples/containers/gke/gb-frontend:v5
env:
- name: GET_HOSTS_FROM
value: "dns"
resources:
requests:
cpu: 100m
memory: 100Mi
ports:
- containerPort: 80
2. Query the list of Pods to verify that the three frontend replicas are running:
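A sketch, assuming the manifest is applied from the examples site:
kubectl apply -f https://k8s.io/examples/application/guestbook/frontend-deployment.yaml
kubectl get pods -l app=guestbook -l tier=frontend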
The Redis Services you applied are only accessible within the Kubernetes cluster because the default type for a Service is ClusterIP. ClusterIP provides a single IP address for the set of Pods the Service is pointing to. This IP address is accessible only within the cluster.
If you want guests to be able to access your guestbook, you must configure the frontend Service
to be externally visible, so a client can request the Service from outside the Kubernetes cluster.
However, a Kubernetes user can use kubectl port-forward to access the Service even though it uses a ClusterIP.
Note:
Some cloud providers, like Google Compute Engine or Google Kubernetes Engine, support
external load balancers. If your cloud provider supports load balancers and you want to use it,
uncomment type: LoadBalancer.
application/guestbook/frontend-service.yaml
# SOURCE: https://cloud.google.com/kubernetes-engine/docs/tutorials/guestbook
apiVersion: v1
kind: Service
metadata:
name: frontend
labels:
app: guestbook
tier: frontend
spec:
# if your cluster supports it, uncomment the following to automatically create
# an external load-balanced IP for the frontend service.
# type: LoadBalancer
#type: LoadBalancer
ports:
# the port that this service should serve on
- port: 80
selector:
app: guestbook
tier: frontend
2. Query the list of Services to verify that the frontend Service is running:
1. Run the following command to forward port 8080 on your local machine to port 80 on the
service.
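For example:
kubectl port-forward svc/frontend 8080:80
Then load http://localhost:8080 in your browser to view the guestbook.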
If you deployed the frontend-service.yaml manifest with type: LoadBalancer you need to find
the IP address to view your Guestbook.
1. Run the following command to get the IP address for the frontend Service.
2. Copy the external IP address, and load the page in your browser to view your guestbook.
Note:
Try adding some guestbook entries by typing in a message, and clicking Submit. The message
you typed appears in the frontend. This message indicates that data is successfully added to
Redis through the Services you created earlier.
2. Query the list of Pods to verify the number of frontend Pods running:
3. Run the following command to scale down the number of frontend Pods:
4. Query the list of Pods to verify the number of frontend Pods running:
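The scale commands for these steps are not reproduced above. A sketch, assuming you scale the frontend Deployment up to 5 replicas and then back down to 2:
kubectl scale deployment frontend --replicas=5
kubectl get pods
kubectl scale deployment frontend --replicas=2
kubectl get pods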
1. Run the following commands to delete all Pods, Deployments, and Services.
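A sketch of those commands, using the labels and names defined in the manifests above:
kubectl delete deployment -l app=redis
kubectl delete service -l app=redis
kubectl delete deployment frontend
kubectl delete service frontend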
What's next
• Complete the Kubernetes Basics Interactive Tutorials
• Use Kubernetes to create a blog using Persistent Volumes for MySQL and Wordpress
• Read more about connecting applications with services
• Read more about using labels effectively
Stateful Applications
StatefulSet Basics
StatefulSet Basics
This tutorial provides an introduction to managing applications with StatefulSets. It
demonstrates how to create, delete, scale, and update the Pods of StatefulSets.
Before you begin
Before you begin this tutorial, you should familiarize yourself with the following Kubernetes
concepts:
• Pods
• Cluster DNS
• Headless Services
• PersistentVolumes
• PersistentVolume Provisioning
• The kubectl command line tool
You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured
to communicate with your cluster. It is recommended to run this tutorial on a cluster with at
least two nodes that are not acting as control plane hosts. If you do not already have a cluster,
you can create one by using minikube or you can use one of these Kubernetes playgrounds:
• Killercoda
• Play with Kubernetes
You should configure kubectl to use a context that uses the default namespace. If you are using
an existing cluster, make sure that it's OK to use that cluster's default namespace to practice.
Ideally, practice in a cluster that doesn't run any real workloads.
Note:
Objectives
StatefulSets are intended to be used with stateful applications and distributed systems.
However, the administration of stateful applications and distributed systems on Kubernetes is a
broad, complex topic. In order to demonstrate the basic features of a StatefulSet, and not to
conflate the former topic with the latter, you will deploy a simple web application using a
StatefulSet.
application/web/web.yaml
apiVersion: v1
kind: Service
metadata:
name: nginx
labels:
app: nginx
spec:
ports:
- port: 80
name: web
clusterIP: None
selector:
app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: web
spec:
serviceName: "nginx"
replicas: 2
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: registry.k8s.io/nginx-slim:0.8
ports:
- containerPort: 80
name: web
volumeMounts:
- name: www
mountPath: /usr/share/nginx/html
volumeClaimTemplates:
- metadata:
name: www
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 1Gi
You will need to use at least two terminal windows. In the first terminal, use kubectl get to
watch the creation of the StatefulSet's Pods.
In the second terminal, use kubectl apply to create the headless Service and StatefulSet:
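A sketch, assuming you saved the manifest above as web.yaml (or apply it directly from https://k8s.io/examples/application/web/web.yaml):
kubectl apply -f web.yaml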
service/nginx created
statefulset.apps/web created
The command above creates two Pods, each running an NGINX webserver. Get the nginx
Service...
...then get the web StatefulSet, to verify that both were created successfully:
For a StatefulSet with n replicas, when Pods are being deployed, they are created sequentially,
ordered from {0..n-1}. Examine the output of the kubectl get command in the first terminal.
Eventually, the output will look like the example below.
Note:
To configure the integer ordinal assigned to each Pod in a StatefulSet, see Start ordinal.
Pods in a StatefulSet
Pods in a StatefulSet have a unique ordinal index and a stable network identity.
As mentioned in the StatefulSets concept, the Pods in a StatefulSet have a sticky, unique
identity. This identity is based on a unique ordinal index that is assigned to each Pod by the
StatefulSet controller.
The Pods' names take the form <statefulset name>-<ordinal index>. Since the web StatefulSet
has two replicas, it creates two Pods, web-0 and web-1.
Each Pod has a stable hostname based on its ordinal index. Use kubectl exec to execute the
hostname command in each Pod:
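For example:
for i in 0 1; do kubectl exec "web-$i" -- sh -c 'hostname'; done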
web-0
web-1
Use kubectl run to execute a container that provides the nslookup command from the dnsutils
package. Using nslookup on the Pods' hostnames, you can examine their in-cluster DNS
addresses:
Name: web-0.nginx
Address 1: 10.244.1.6
nslookup web-1.nginx
Server: 10.0.0.10
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local
Name: web-1.nginx
Address 1: 10.244.2.6
The CNAME of the headless service points to SRV records (one for each Pod that is Running
and Ready). The SRV records point to A record entries that contain the Pods' IP addresses.
In a second terminal, use kubectl delete to delete all the Pods in the StatefulSet:
Wait for the StatefulSet to restart them, and for both Pods to transition to Running and Ready:
Use kubectl exec and kubectl run to view the Pods' hostnames and in-cluster DNS entries. First,
view the Pods' hostnames:
web-0
web-1
then, run:
kubectl run -i --tty --image busybox:1.28 dns-test --restart=Never --rm
Server: 10.0.0.10
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local
Name: web-0.nginx
Address 1: 10.244.1.7
nslookup web-1.nginx
Server: 10.0.0.10
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local
Name: web-1.nginx
Address 1: 10.244.2.8
The Pods' ordinals, hostnames, SRV records, and A record names have not changed, but the IP
addresses associated with the Pods may have changed. In the cluster used for this tutorial, they
have. This is why it is important not to configure other applications to connect to Pods in a
StatefulSet by the IP address of a particular Pod (it is OK to connect to Pods by resolving their
hostname).
If you need to find and connect to the active members of a StatefulSet, you should query the
CNAME of the headless Service (nginx.default.svc.cluster.local). The SRV records associated
with the CNAME will contain only the Pods in the StatefulSet that are Running and Ready.
If your application already implements connection logic that tests for liveness and readiness,
you can use the SRV records of the Pods ( web-0.nginx.default.svc.cluster.local,
web-1.nginx.default.svc.cluster.local), as they are stable, and your application will be able to
discover the Pods' addresses when they transition to Running and Ready.
If your application wants to find any healthy Pod in a StatefulSet, and therefore does not need
to track each specific Pod, you could also connect to the IP address of a type: ClusterIP Service,
backed by the Pods in that StatefulSet. You can use the same Service that tracks the StatefulSet
(specified in the serviceName of the StatefulSet) or a separate Service that selects the right set
of Pods.
The StatefulSet controller created two PersistentVolumeClaims that are bound to two
PersistentVolumes.
Write the Pods' hostnames to their index.html files and verify that the NGINX webservers serve
the hostnames:
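A sketch of both steps:
for i in 0 1; do kubectl exec "web-$i" -- sh -c 'echo "$(hostname)" > /usr/share/nginx/html/index.html'; done
for i in 0 1; do kubectl exec -i -t "web-$i" -- curl http://localhost/; done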
web-0
web-1
Note:
If you instead see 403 Forbidden responses for the above curl command, you will need to fix
the permissions of the directory mounted by the volumeMounts (due to a bug when using
hostPath volumes), by running:
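A sketch of that fix:
for i in 0 1; do kubectl exec web-$i -- chmod 755 /usr/share/nginx/html; done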
# End this watch when you've reached the end of the section.
# At the start of "Scaling a StatefulSet" you'll start a new watch.
kubectl get pod --watch -l app=nginx
Examine the output of the kubectl get command in the first terminal, and wait for all of the
Pods to transition to Running and Ready.
# This should already be running
kubectl get pod --watch -l app=nginx
web-0
web-1
Even though web-0 and web-1 were rescheduled, they continue to serve their hostnames
because the PersistentVolumes associated with their PersistentVolumeClaims are remounted to
their volumeMounts. No matter what node web-0 and web-1 are scheduled on, their
PersistentVolumes will be mounted to the appropriate mount points.
Scaling a StatefulSet
Scaling a StatefulSet refers to increasing or decreasing the number of replicas (horizontal
scaling). This is accomplished by updating the replicas field. You can use either kubectl scale or
kubectl patch to scale a StatefulSet.
Scaling up
Scaling up means adding more replicas. Provided that your app is able to distribute work across
the StatefulSet, the new larger set of Pods can perform more of that work.
# If you already have a watch running, you can continue using that.
# Otherwise, start one.
# End this watch when there are 5 healthy Pods for the StatefulSet
kubectl get pods --watch -l app=nginx
In another terminal window, use kubectl scale to scale the number of replicas to 5:
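For example:
kubectl scale sts web --replicas=5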
statefulset.apps/web scaled
Examine the output of the kubectl get command in the first terminal, and wait for the three
additional Pods to transition to Running and Ready.
The StatefulSet controller scaled the number of replicas. As with StatefulSet creation, the
StatefulSet controller created each Pod sequentially with respect to its ordinal index, and it
waited for each Pod's predecessor to be Running and Ready before launching the subsequent
Pod.
Scaling down
Scaling down means reducing the number of replicas. For example, you might do this because
the level of traffic to a service has decreased, and at the current scale there are idle resources.
# End this watch when there are only 3 Pods for the StatefulSet
kubectl get pod --watch -l app=nginx
In another terminal, use kubectl patch to scale the StatefulSet back down to three replicas:
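For example:
kubectl patch sts web -p '{"spec":{"replicas":3}}'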
statefulset.apps/web patched
The control plane deleted one Pod at a time, in reverse order with respect to its ordinal index,
and it waited for each Pod to be completely shut down before deleting the next one.
There are still five PersistentVolumeClaims and five PersistentVolumes. When exploring a Pod's
stable storage, you saw that the PersistentVolumes mounted to the Pods of a StatefulSet are not
deleted when the StatefulSet's Pods are deleted. This is still true when Pod deletion is caused by
scaling the StatefulSet down.
Updating StatefulSets
The StatefulSet controller supports automated updates. The strategy used is determined by the
spec.updateStrategy field of the StatefulSet API object. This feature can be used to upgrade the
container images, resource requests and/or limits, labels, and annotations of the Pods in a
StatefulSet.
There are two valid update strategies, RollingUpdate (the default) and OnDelete.
RollingUpdate
The RollingUpdate update strategy will update all Pods in a StatefulSet, in reverse ordinal order,
while respecting the StatefulSet guarantees.
You can split updates to a StatefulSet that uses the RollingUpdate strategy into partitions, by
specifying .spec.updateStrategy.rollingUpdate.partition. You'll practice that later in this tutorial.
In one terminal window, patch the web StatefulSet to change the container image again:
statefulset.apps/web patched
The Pods in the StatefulSet are updated in reverse ordinal order. The StatefulSet controller
terminates each Pod, and waits for it to transition to Running and Ready prior to updating the
next Pod. Note that, even though the StatefulSet controller will not proceed to update the next
Pod until its ordinal successor is Running and Ready, it will restore any Pod that fails during the
update to that Pod's existing version.
Pods that have already received the update will be restored to the updated version, and Pods
that have not yet received the update will be restored to the previous version. In this way, the
controller attempts to continue to keep the application healthy and the update consistent in the
presence of intermittent failures.
registry.k8s.io/nginx-slim:0.8
registry.k8s.io/nginx-slim:0.8
registry.k8s.io/nginx-slim:0.8
All the Pods in the StatefulSet are now running the previous container image.
Note:
You can also use kubectl rollout status sts/<name> to view the status of a rolling update to a
StatefulSet
Staging an update
You can split updates to a StatefulSet that uses the RollingUpdate strategy into partitions, by
specifying .spec.updateStrategy.rollingUpdate.partition.
For more context, you can read Partitioned rolling updates in the StatefulSet concept page.
First, patch the web StatefulSet to add a partition to the updateStrategy field:
statefulset.apps/web patched
Patch the StatefulSet again to change the container image that this StatefulSet uses:
statefulset.apps/web patched
registry.k8s.io/nginx-slim:0.8
Notice that, even though the update strategy is RollingUpdate the StatefulSet restored the Pod
with the original container image. This is because the ordinal of the Pod is less than the
partition specified by the updateStrategy.
You can roll out a canary (to test the modified template) by decrementing the partition you
specified above.
# The value of "partition" should match the highest existing ordinal for
# the StatefulSet
kubectl patch statefulset web -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":2}}}}'
statefulset.apps/web patched
The control plane triggers replacement for web-2 (implemented by a graceful delete followed
by creating a new Pod once the deletion is complete). Wait for the new web-2 Pod to be
Running and Ready.
registry.k8s.io/nginx-slim:0.7
When you changed the partition, the StatefulSet controller automatically updated the web-2
Pod because the Pod's ordinal was greater than or equal to the partition.
Delete the web-1 Pod:
registry.k8s.io/nginx-slim:0.8
web-1 was restored to its original configuration because the Pod's ordinal was less than the
partition. When a partition is specified, all Pods with an ordinal that is greater than or equal to
the partition will be updated when the StatefulSet's .spec.template is updated. If a Pod that has
an ordinal less than the partition is deleted or otherwise terminated, it will be restored to its
original configuration.
You can perform a phased roll out (e.g. a linear, geometric, or exponential roll out) using a
partitioned rolling update in a similar manner to how you rolled out a canary. To perform a
phased roll out, set the partition to the ordinal at which you want the controller to pause the
update.
statefulset.apps/web patched
Wait for all of the Pods in the StatefulSet to become Running and Ready.
# This should already be running
kubectl get pod -l app=nginx --watch
Get the container image details for the Pods in the StatefulSet:
registry.k8s.io/nginx-slim:0.7
registry.k8s.io/nginx-slim:0.7
registry.k8s.io/nginx-slim:0.7
By moving the partition to 0, you allowed the StatefulSet to continue the update process.
OnDelete
statefulset.apps/web patched
When you select this update strategy, the StatefulSet controller does not automatically update
Pods when a modification is made to the StatefulSet's .spec.template field. You need to manage
the rollout yourself - either manually, or using separate automation.
Deleting StatefulSets
StatefulSet supports both non-cascading and cascading deletion. In a non-cascading delete, the
StatefulSet's Pods are not deleted when the StatefulSet is deleted. In a cascading delete, both
the StatefulSet and its Pods are deleted.
Read Use Cascading Deletion in a Cluster to learn about cascading deletion generally.
Non-cascading delete
# End this watch when there are no Pods for the StatefulSet
kubectl get pods --watch -l app=nginx
Use kubectl delete to delete the StatefulSet. Make sure to supply the --cascade=orphan
parameter to the command. This parameter tells Kubernetes to only delete the StatefulSet, and
to not delete any of its Pods.
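For example:
kubectl delete statefulset web --cascade=orphan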
Even though web has been deleted, all of the Pods are still Running and Ready. Delete web-0:
As the web StatefulSet has been deleted, web-0 has not been relaunched.
# Leave this watch running until the next time you start a watch
kubectl get pods --watch -l app=nginx
In a second terminal, recreate the StatefulSet. Note that, unless you deleted the nginx Service
(which you should not have), you will see an error indicating that the Service already exists.
statefulset.apps/web created
service/nginx unchanged
Ignore the error. It only indicates that an attempt was made to create the nginx headless Service
even though that Service already exists.
Examine the output of the kubectl get command running in the first terminal.
When the web StatefulSet was recreated, it first relaunched web-0. Since web-1 was already Running and Ready, when web-0 transitioned to Running and Ready, the StatefulSet adopted web-1. Since you recreated the StatefulSet with replicas equal to 2, once web-0 had been recreated, and once web-1 was determined to already be Running and Ready, web-2 was terminated.
Now take another look at the contents of the index.html file served by the Pods' webservers:
web-0
web-1
Even though you deleted both the StatefulSet and the web-0 Pod, it still serves the hostname
originally entered into its index.html file. This is because the StatefulSet never deletes the
PersistentVolumes associated with a Pod. When you recreated the StatefulSet and it relaunched
web-0, its original PersistentVolume was remounted.
Cascading delete
In another terminal, delete the StatefulSet again. This time, omit the --cascade=orphan
parameter.
Examine the output of the kubectl get command running in the first terminal, and wait for all of
the Pods to transition to Terminating.
# This should already be running
kubectl get pods --watch -l app=nginx
As you saw in the Scaling Down section, the Pods are terminated one at a time, with respect to
the reverse order of their ordinal indices. Before terminating a Pod, the StatefulSet controller
waits for the Pod's successor to be completely terminated.
Note:
Although a cascading delete removes a StatefulSet together with its Pods, the cascade does not
delete the headless Service associated with the StatefulSet. You must delete the nginx Service
manually.
service/nginx created
statefulset.apps/web created
When all of the StatefulSet's Pods transition to Running and Ready, retrieve the contents of
their index.html files:
web-0
web-1
Even though you completely deleted the StatefulSet, and all of its Pods, the Pods are recreated
with their PersistentVolumes mounted, and web-0 and web-1 continue to serve their
hostnames.
You can specify a Pod management policy to avoid this strict ordering; either OrderedReady
(the default), or Parallel.
OrderedReady pod management is the default for StatefulSets. It tells the StatefulSet controller
to respect the ordering guarantees demonstrated above.
Use this when your application requires or expects that changes, such as rolling out a new
version of your application, happen in the strict order of the ordinal (pod number) that the
StatefulSet provides. In other words, if you have Pods app-0, app-1 and app-2, Kubernetes will
update app-0 first and check it. Once the checks are good, Kubernetes updates app-1 and finally
app-2.
If you added two more Pods, Kubernetes would set up app-3 and wait for that to become
healthy before deploying app-4.
Because this is the default setting, you've already practiced using it.
The alternative, Parallel pod management, tells the StatefulSet controller to launch or terminate
all Pods in parallel, and not to wait for Pods to become Running and Ready or completely
terminated prior to launching or terminating another Pod.
The Parallel pod management option only affects the behavior for scaling operations. Updates
are not affected; Kubernetes still rolls out changes in order. For this tutorial, the application is
very simple: a webserver that tells you its hostname (because this is a StatefulSet, the hostname
for each Pod is different and predictable).
application/web/web-parallel.yaml
apiVersion: v1
kind: Service
metadata:
name: nginx
labels:
app: nginx
spec:
ports:
- port: 80
name: web
clusterIP: None
selector:
app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: web
spec:
serviceName: "nginx"
podManagementPolicy: "Parallel"
replicas: 2
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: registry.k8s.io/nginx-slim:0.8
ports:
- containerPort: 80
name: web
volumeMounts:
- name: www
mountPath: /usr/share/nginx/html
volumeClaimTemplates:
- metadata:
name: www
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 1Gi
This manifest is identical to the one you downloaded above except that the
.spec.podManagementPolicy of the web StatefulSet is set to Parallel.
service/nginx updated
statefulset.apps/web updated
Keep the terminal open where you're running the watch. In another terminal window, scale the
StatefulSet:
statefulset.apps/web scaled
Examine the output of the terminal where the kubectl get command is running. It may look
something like
The StatefulSet launched three new Pods, and it did not wait for the first to become Running
and Ready prior to launching the second and third Pods.
This approach is useful if your workload has a stateful element, or needs Pods to be able to
identify each other with predictable naming, and especially if you sometimes need to provide a
lot more capacity quickly. If this simple web service for the tutorial suddenly got an extra
1,000,000 requests per minute then you would want to run some more Pods - but you also
would not want to wait for each new Pod to launch. Starting the extra Pods in parallel cuts the
time between requesting the extra capacity and having it available for use.
Cleaning up
You should have two terminals open, ready for you to run kubectl commands as part of
cleanup.
You can watch kubectl get to see those Pods being deleted.
During deletion, a StatefulSet removes all Pods concurrently; it does not wait for a Pod's ordinal
successor to terminate prior to deleting that Pod.
Close the terminal where the kubectl get command is running and delete the nginx Service:
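For example:
kubectl delete service nginx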
Delete the persistent storage media for the PersistentVolumes used in this tutorial.
kubectl get pv
Note:
You also need to delete the persistent storage media for the PersistentVolumes used in this
tutorial. Follow the necessary steps, based on your environment, storage configuration, and
provisioning method, to ensure that all storage is reclaimed.
A PersistentVolume (PV) is a piece of storage in the cluster that has been manually provisioned
by an administrator, or dynamically provisioned by Kubernetes using a StorageClass. A
PersistentVolumeClaim (PVC) is a request for storage by a user that can be fulfilled by a PV.
PersistentVolumes and PersistentVolumeClaims are independent from Pod lifecycles and
preserve data through restarting, rescheduling, and even deleting Pods.
Warning:
This deployment is not suitable for production use cases, as it uses single instance WordPress
and MySQL Pods. Consider using WordPress Helm Chart to deploy WordPress in production.
Note:
The files provided in this tutorial use GA Deployment APIs and are specific to Kubernetes version 1.9 and later. If you wish to use this tutorial with an earlier version of Kubernetes, please update the API version appropriately, or refer to earlier versions of this tutorial.
Objectives
• Create PersistentVolumeClaims and PersistentVolumes
• Create a kustomization.yaml with
◦ a Secret generator
◦ MySQL resource configs
◦ WordPress resource configs
• Apply the kustomization directory by kubectl apply -k ./
• Clean up
• Killercoda
• Play with Kubernetes
The example shown on this page works with kubectl 1.27 and above.
1. mysql-deployment.yaml
2. wordpress-deployment.yaml
Many cluster environments have a default StorageClass installed. When a StorageClass is not
specified in the PersistentVolumeClaim, the cluster's default StorageClass is used instead.
Warning:
In local clusters, the default StorageClass uses the hostPath provisioner. hostPath volumes are
only suitable for development and testing. With hostPath volumes, your data lives in /tmp on
the node the Pod is scheduled onto and does not move between nodes. If a Pod dies and gets
scheduled to another node in the cluster, or the node is rebooted, the data is lost.
Note:
If you are bringing up a cluster that needs to use the hostPath provisioner, the --enable-
hostpath-provisioner flag must be set in the controller-manager component.
Note:
If you have a Kubernetes cluster running on Google Kubernetes Engine, please follow this
guide.
Create a kustomization.yaml
Add a Secret generator
A Secret is an object that stores a piece of sensitive data like a password or key. Since 1.14,
kubectl supports the management of Kubernetes objects using a kustomization file. You can
create a Secret by generators in kustomization.yaml.
Add a Secret generator to kustomization.yaml using the following command. You will need to replace YOUR_PASSWORD with the password you want to use.
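A sketch of that command; mysql-pass is the Secret name referenced by the Deployment manifests below:
cat <<EOF >./kustomization.yaml
secretGenerator:
- name: mysql-pass
  literals:
  - password=YOUR_PASSWORD
EOF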
application/wordpress/mysql-deployment.yaml
apiVersion: v1
kind: Service
metadata:
name: wordpress-mysql
labels:
app: wordpress
spec:
ports:
- port: 3306
selector:
app: wordpress
tier: mysql
clusterIP: None
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: mysql-pv-claim
labels:
app: wordpress
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 20Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: wordpress-mysql
labels:
app: wordpress
spec:
selector:
matchLabels:
app: wordpress
tier: mysql
strategy:
type: Recreate
template:
metadata:
labels:
app: wordpress
tier: mysql
spec:
containers:
- image: mysql:8.0
name: mysql
env:
- name: MYSQL_ROOT_PASSWORD
valueFrom:
secretKeyRef:
name: mysql-pass
key: password
- name: MYSQL_DATABASE
value: wordpress
- name: MYSQL_USER
value: wordpress
- name: MYSQL_PASSWORD
valueFrom:
secretKeyRef:
name: mysql-pass
key: password
ports:
- containerPort: 3306
name: mysql
volumeMounts:
- name: mysql-persistent-storage
mountPath: /var/lib/mysql
volumes:
- name: mysql-persistent-storage
persistentVolumeClaim:
claimName: mysql-pv-claim
application/wordpress/wordpress-deployment.yaml
apiVersion: v1
kind: Service
metadata:
name: wordpress
labels:
app: wordpress
spec:
ports:
- port: 80
selector:
app: wordpress
tier: frontend
type: LoadBalancer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: wp-pv-claim
labels:
app: wordpress
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 20Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: wordpress
labels:
app: wordpress
spec:
selector:
matchLabels:
app: wordpress
tier: frontend
strategy:
type: Recreate
template:
metadata:
labels:
app: wordpress
tier: frontend
spec:
containers:
- image: wordpress:6.2.1-apache
name: wordpress
env:
- name: WORDPRESS_DB_HOST
value: wordpress-mysql
- name: WORDPRESS_DB_PASSWORD
valueFrom:
secretKeyRef:
name: mysql-pass
key: password
- name: WORDPRESS_DB_USER
value: wordpress
ports:
- containerPort: 80
name: wordpress
volumeMounts:
- name: wordpress-persistent-storage
mountPath: /var/www/html
volumes:
- name: wordpress-persistent-storage
persistentVolumeClaim:
claimName: wp-pv-claim
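The download and kustomization steps are not reproduced above. A sketch, assuming the manifests are fetched from the examples site and added as resources to the kustomization.yaml created earlier:
curl -LO https://k8s.io/examples/application/wordpress/mysql-deployment.yaml
curl -LO https://k8s.io/examples/application/wordpress/wordpress-deployment.yaml
cat <<EOF >>./kustomization.yaml
resources:
  - mysql-deployment.yaml
  - wordpress-deployment.yaml
EOF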
kubectl apply -k ./
Note:
It can take up to a few minutes for the PVs to be provisioned and bound.
Note:
Minikube can only expose Services through NodePort. The EXTERNAL-IP is always
pending.
5. Run the following command to get the IP Address for the WordPress Service:
http://1.2.3.4:32406
Copy the IP address, and load the page in your browser to view your site.
6. You should see the WordPress set up page similar to the following screenshot.
Warning:
Do not leave your WordPress installation on this page. If another user finds it, they can
set up a website on your instance and use it to serve malicious content.
Either install WordPress by creating a username and password or delete your instance.
Cleaning up
1. Run the following command to delete your Secret, Deployments, Services and
PersistentVolumeClaims:
kubectl delete -k ./
What's next
• Learn more about Introspection and Debugging
• Learn more about Jobs
• Learn more about Port Forwarding
• Learn how to Get a Shell to a Container
StatefulSets make it easier to deploy stateful applications into your Kubernetes cluster. For more
information on the features used in this tutorial, see StatefulSet.
Note:
Cassandra and Kubernetes both use the term node to mean a member of a cluster. In this
tutorial, the Pods that belong to the StatefulSet are Cassandra nodes and are members of the
Cassandra cluster (called a ring). When those Pods run in your Kubernetes cluster, the
Kubernetes control plane schedules those Pods onto Kubernetes Nodes.
When a Cassandra node starts, it uses a seed list to bootstrap discovery of other nodes in the
ring. This tutorial deploys a custom Cassandra seed provider that lets the database discover new
Cassandra Pods as they appear inside your Kubernetes cluster.
Objectives
• Create and validate a Cassandra headless Service.
• Use a StatefulSet to create a Cassandra ring.
• Validate the StatefulSet.
• Modify the StatefulSet.
• Delete the StatefulSet and its Pods.
• Killercoda
• Play with Kubernetes
To complete this tutorial, you should already have a basic familiarity with Pods, Services, and
StatefulSets.
Caution:
Minikube defaults to 2048 MB of memory and 2 CPUs. Running Minikube with the default resource configuration results in insufficient resource errors during this tutorial. To avoid these errors, start Minikube with the following settings:
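For example (the exact values are an assumption; adjust them to your machine):
minikube start --memory 5120 --cpus=4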
The following Service is used for DNS lookups between Cassandra Pods and clients within your
cluster:
application/cassandra/cassandra-service.yaml
apiVersion: v1
kind: Service
metadata:
labels:
app: cassandra
name: cassandra
spec:
clusterIP: None
ports:
- port: 9042
selector:
app: cassandra
Create a Service to track all Cassandra StatefulSet members from the cassandra-service.yaml
file:
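A sketch of this step and its validation, assuming the standard k8s.io/examples URL for the file referenced above:
kubectl apply -f https://k8s.io/examples/application/cassandra/cassandra-service.yaml
kubectl get svc cassandra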
The response shows a Service named cassandra with TYPE ClusterIP, CLUSTER-IP None (it is headless), and port 9042/TCP.
If you don't see a Service named cassandra, that means creation failed. Read Debug Services for
help troubleshooting common issues.
Note:
This example uses the default provisioner for Minikube. Please update the following StatefulSet
for the cloud you are working with.
application/cassandra/cassandra-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: cassandra
labels:
app: cassandra
spec:
serviceName: cassandra
replicas: 3
selector:
matchLabels:
app: cassandra
template:
metadata:
labels:
app: cassandra
spec:
terminationGracePeriodSeconds: 1800
containers:
- name: cassandra
image: gcr.io/google-samples/cassandra:v13
imagePullPolicy: Always
ports:
- containerPort: 7000
name: intra-node
- containerPort: 7001
name: tls-intra-node
- containerPort: 7199
name: jmx
- containerPort: 9042
name: cql
resources:
limits:
cpu: "500m"
memory: 1Gi
requests:
cpu: "500m"
memory: 1Gi
securityContext:
capabilities:
add:
- IPC_LOCK
lifecycle:
preStop:
exec:
command:
- /bin/sh
- -c
- nodetool drain
env:
- name: MAX_HEAP_SIZE
value: 512M
- name: HEAP_NEWSIZE
value: 100M
- name: CASSANDRA_SEEDS
value: "cassandra-0.cassandra.default.svc.cluster.local"
- name: CASSANDRA_CLUSTER_NAME
value: "K8Demo"
- name: CASSANDRA_DC
value: "DC1-K8Demo"
- name: CASSANDRA_RACK
value: "Rack1-K8Demo"
- name: POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
readinessProbe:
exec:
command:
- /bin/bash
- -c
- /ready-probe.sh
initialDelaySeconds: 15
timeoutSeconds: 5
# These volume mounts are persistent. They are like inline claims,
# but not exactly because the names need to match exactly one of
# the stateful pod volumes.
volumeMounts:
- name: cassandra-data
mountPath: /cassandra_data
# These are converted to volume claims by the controller
# and mounted at the paths mentioned above.
# do not use these in production until ssd GCEPersistentDisk or other ssd pd
volumeClaimTemplates:
- metadata:
name: cassandra-data
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: fast
resources:
requests:
storage: 1Gi
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: fast
provisioner: k8s.io/minikube-hostpath
parameters:
type: pd-ssd
3. Run the Cassandra nodetool inside the first Pod, to display the status of the ring.
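The command is not shown above; it is likely of the following form:
kubectl exec -it cassandra-0 -- nodetool status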
Datacenter: DC1-K8Demo
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack
UN  172.17.0.5  83.57 KiB   32      74.0%             e2dd09e6-d9d3-477e-96c5-45094c08db0f  Rack1-K8Demo
UN  172.17.0.4  101.04 KiB  32      58.8%             f89d6835-3a42-4419-92b3-0e62cae1479c  Rack1-K8Demo
UN  172.17.0.6  84.74 KiB   32      67.1%             a6a1e8c2-3dc5-4417-b1a0-26507af2aaad  Rack1-K8Demo
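The editing step referred to next likely uses kubectl edit to change the number of replicas:
kubectl edit statefulset/cassandra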
This command opens an editor in your terminal. The line you need to change is the
replicas field. The following sample is an excerpt of the StatefulSet file:
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: apps/v1
kind: StatefulSet
metadata:
creationTimestamp: 2016-08-13T18:40:58Z
generation: 1
labels:
app: cassandra
name: cassandra
namespace: default
resourceVersion: "323"
uid: 7a219483-6185-11e6-a910-42010a8a0fc0
spec:
replicas: 3
Cleaning up
Deleting or scaling a StatefulSet down does not delete the volumes associated with the
StatefulSet. This setting is for your safety because your data is more valuable than automatically
purging all related StatefulSet resources.
Warning:
Depending on the storage class and reclaim policy, deleting the PersistentVolumeClaims may
cause the associated volumes to also be deleted. Never assume you'll be able to access data if its
volume claims are deleted.
1. Run the following commands (chained together into a single command) to delete
everything in the Cassandra StatefulSet:
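The chained command is not shown above; it likely resembles the following, which waits out the termination grace period before deleting the PersistentVolumeClaims:
grace=$(kubectl get pod cassandra-0 -o=jsonpath='{.spec.terminationGracePeriodSeconds}') \
  && kubectl delete statefulset -l app=cassandra \
  && echo "Sleeping ${grace} seconds" 1>&2 \
  && sleep $grace \
  && kubectl delete persistentvolumeclaim -l app=cassandra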
2. Run the following command to delete the Service you set up for Cassandra:
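A likely form of this command, selecting the Service by its app=cassandra label:
kubectl delete service -l app=cassandra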
This image includes a standard Cassandra installation from the Apache Debian repo. By using
environment variables you can change values that are inserted into cassandra.yaml.
What's next
• Learn how to Scale a StatefulSet.
• Learn more about the KubernetesSeedProvider
• See more custom Seed Provider Configurations
• Pods
• Cluster DNS
• Headless Services
• PersistentVolumes
• PersistentVolume Provisioning
• StatefulSets
• PodDisruptionBudgets
• PodAntiAffinity
• kubectl CLI
You must have a cluster with at least four nodes, and each node requires at least 2 CPUs and 4
GiB of memory. In this tutorial you will cordon and drain the cluster's nodes. This means that
the cluster will terminate and evict all Pods on its nodes, and the nodes will
temporarily become unschedulable. You should use a dedicated cluster for this tutorial, or
you should ensure that the disruption you cause will not interfere with other tenants.
This tutorial assumes that you have configured your cluster to dynamically provision
PersistentVolumes. If your cluster is not configured to do so, you will have to manually
provision three 20 GiB volumes before starting this tutorial.
Objectives
After this tutorial, you will know the following.
ZooKeeper
The ensemble uses the Zab protocol to elect a leader, and the ensemble cannot write data until
that election is complete. Once complete, the ensemble uses Zab to ensure that it replicates all
writes to a quorum before it acknowledges and makes them visible to clients. Without respect
to weighted quorums, a quorum is a majority component of the ensemble containing the
current leader. For instance, if the ensemble has three servers, a component that contains the
leader and one other server constitutes a quorum. If the ensemble can not achieve a quorum,
the ensemble cannot write data.
ZooKeeper servers keep their entire state machine in memory, and write every mutation to a
durable WAL (Write Ahead Log) on storage media. When a server crashes, it can recover its
previous state by replaying the WAL. To prevent the WAL from growing without bound,
ZooKeeper servers periodically snapshot their in-memory state to storage media. These
snapshots can be loaded directly into memory, and all WAL entries that preceded the snapshot
may be discarded.
application/zookeeper/zookeeper.yaml
apiVersion: v1
kind: Service
metadata:
name: zk-hs
labels:
app: zk
spec:
ports:
- port: 2888
name: server
- port: 3888
name: leader-election
clusterIP: None
selector:
app: zk
---
apiVersion: v1
kind: Service
metadata:
name: zk-cs
labels:
app: zk
spec:
ports:
- port: 2181
name: client
selector:
app: zk
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: zk-pdb
spec:
selector:
matchLabels:
app: zk
maxUnavailable: 1
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: zk
spec:
selector:
matchLabels:
app: zk
serviceName: zk-hs
replicas: 3
updateStrategy:
type: RollingUpdate
podManagementPolicy: OrderedReady
template:
metadata:
labels:
app: zk
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: "app"
operator: In
values:
- zk
topologyKey: "kubernetes.io/hostname"
containers:
- name: kubernetes-zookeeper
imagePullPolicy: Always
image: "registry.k8s.io/kubernetes-zookeeper:1.0-3.4.10"
resources:
requests:
memory: "1Gi"
cpu: "0.5"
ports:
- containerPort: 2181
name: client
- containerPort: 2888
name: server
- containerPort: 3888
name: leader-election
command:
- sh
- -c
- "start-zookeeper \
--servers=3 \
--data_dir=/var/lib/zookeeper/data \
--data_log_dir=/var/lib/zookeeper/data/log \
--conf_dir=/opt/zookeeper/conf \
--client_port=2181 \
--election_port=3888 \
--server_port=2888 \
--tick_time=2000 \
--init_limit=10 \
--sync_limit=5 \
--heap=512M \
--max_client_cnxns=60 \
--snap_retain_count=3 \
--purge_interval=12 \
--max_session_timeout=40000 \
--min_session_timeout=4000 \
--log_level=INFO"
readinessProbe:
exec:
command:
- sh
- -c
- "zookeeper-ready 2181"
initialDelaySeconds: 10
timeoutSeconds: 5
livenessProbe:
exec:
command:
- sh
- -c
- "zookeeper-ready 2181"
initialDelaySeconds: 10
timeoutSeconds: 5
volumeMounts:
- name: datadir
mountPath: /var/lib/zookeeper
securityContext:
runAsUser: 1000
fsGroup: 1000
volumeClaimTemplates:
- metadata:
name: datadir
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 10Gi
Open a terminal, and use the kubectl apply command to create the manifest.
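A sketch of the apply command, assuming the standard k8s.io/examples URL for the manifest shown above:
kubectl apply -f https://k8s.io/examples/application/zookeeper/zookeeper.yaml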
This creates the zk-hs Headless Service, the zk-cs Service, the zk-pdb PodDisruptionBudget, and
the zk StatefulSet.
service/zk-hs created
service/zk-cs created
poddisruptionbudget.policy/zk-pdb created
statefulset.apps/zk created
Use kubectl get to watch the StatefulSet controller create the StatefulSet's Pods.
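The watch command is not shown above; it is likely:
kubectl get pods -w -l app=zk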
Once the zk-2 Pod is Running and Ready, use CTRL-C to terminate kubectl.
The StatefulSet controller creates three Pods, and each Pod has a container with a ZooKeeper
server.
Because there is no terminating algorithm for electing a leader in an anonymous network, Zab
requires explicit membership configuration to perform leader election. Each server in the
ensemble needs to have a unique identifier, all servers need to know the global set of identifiers,
and each identifier needs to be associated with a network address.
Use kubectl exec to get the hostnames of the Pods in the zk StatefulSet.
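A likely form of this command:
for i in 0 1 2; do kubectl exec zk-$i -- hostname; done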
The StatefulSet controller provides each Pod with a unique hostname based on its ordinal index.
The hostnames take the form of <statefulset name>-<ordinal index>. Because the replicas field
of the zk StatefulSet is set to 3, the Set's controller creates three Pods with their hostnames set
to zk-0, zk-1, and zk-2.
zk-0
zk-1
zk-2
The servers in a ZooKeeper ensemble use natural numbers as unique identifiers, and store each
server's identifier in a file called myid in the server's data directory.
To examine the contents of the myid file for each server use the following command.
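A likely form of this command, assuming the data directory used by the manifest above:
for i in 0 1 2; do echo "myid zk-$i"; kubectl exec zk-$i -- cat /var/lib/zookeeper/data/myid; done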
Because the identifiers are natural numbers and the ordinal indices are non-negative integers,
you can generate an identifier by adding 1 to the ordinal.
myid zk-0
1
myid zk-1
2
myid zk-2
3
To get the Fully Qualified Domain Name (FQDN) of each Pod in the zk StatefulSet use the
following command.
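A likely form of this command:
for i in 0 1 2; do kubectl exec zk-$i -- hostname -f; done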
The zk-hs Service creates a domain for all of the Pods, zk-hs.default.svc.cluster.local.
zk-0.zk-hs.default.svc.cluster.local
zk-1.zk-hs.default.svc.cluster.local
zk-2.zk-hs.default.svc.cluster.local
The A records in Kubernetes DNS resolve the FQDNs to the Pods' IP addresses. If Kubernetes
reschedules the Pods, it will update the A records with the Pods' new IP addresses, but the A
record names will not change.
ZooKeeper stores its application configuration in a file named zoo.cfg. Use kubectl exec to view
the contents of the zoo.cfg file in the zk-0 Pod.
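A likely form of this command, assuming the configuration directory set by the start-zookeeper script above:
kubectl exec zk-0 -- cat /opt/zookeeper/conf/zoo.cfg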
In the server.1, server.2, and server.3 properties at the bottom of the file, the 1, 2, and 3
correspond to the identifiers in the ZooKeeper servers' myid files. They are set to the FQDNs for
the Pods in the zk StatefulSet.
clientPort=2181
dataDir=/var/lib/zookeeper/data
dataLogDir=/var/lib/zookeeper/log
tickTime=2000
initLimit=10
syncLimit=2000
maxClientCnxns=60
minSessionTimeout= 4000
maxSessionTimeout= 40000
autopurge.snapRetainCount=3
autopurge.purgeInterval=0
server.1=zk-0.zk-hs.default.svc.cluster.local:2888:3888
server.2=zk-1.zk-hs.default.svc.cluster.local:2888:3888
server.3=zk-2.zk-hs.default.svc.cluster.local:2888:3888
Achieving consensus
Consensus protocols require that the identifiers of each participant be unique. No two
participants in the Zab protocol should claim the same unique identifier. This is necessary to
allow the processes in the system to agree on which processes have committed which data. If
two Pods are launched with the same ordinal, two ZooKeeper servers would both identify
themselves as the same server.
The A records for each Pod are entered when the Pod becomes Ready. Therefore, the FQDNs of
the ZooKeeper servers will resolve to a single endpoint, and that endpoint will be the unique
ZooKeeper server claiming the identity configured in its myid file.
zk-0.zk-hs.default.svc.cluster.local
zk-1.zk-hs.default.svc.cluster.local
zk-2.zk-hs.default.svc.cluster.local
This ensures that the server properties in the ZooKeeper servers' zoo.cfg files represent a
correctly configured ensemble.
server.1=zk-0.zk-hs.default.svc.cluster.local:2888:3888
server.2=zk-1.zk-hs.default.svc.cluster.local:2888:3888
server.3=zk-2.zk-hs.default.svc.cluster.local:2888:3888
When the servers use the Zab protocol to attempt to commit a value, they will either achieve
consensus and commit the value (if leader election has succeeded and at least two of the Pods
are Running and Ready), or they will fail to do so (if either of the conditions are not met). No
state will arise where one server acknowledges a write on behalf of another.
The most basic sanity test is to write data to one ZooKeeper server and to read the data from
another.
The command below executes the zkCli.sh script to write world to the path /hello on the zk-0
Pod in the ensemble.
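The command is not shown above; it likely resembles:
kubectl exec zk-0 -- zkCli.sh create /hello world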
WATCHER::
To get the data from the zk-1 Pod use the following command.
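A likely form of this command:
kubectl exec zk-1 -- zkCli.sh get /hello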
The data that you created on zk-0 is available on all the servers in the ensemble.
WATCHER::
As mentioned in the ZooKeeper Basics section, ZooKeeper commits all entries to a durable
WAL, and periodically writes snapshots of its in-memory state to storage media. Using WALs to
provide durability is a common technique for applications that use consensus protocols to
achieve a replicated state machine.
Use the kubectl delete command to delete the zk StatefulSet.
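A sketch of this step; because the next sentence describes recreating the StatefulSet, the delete is most likely followed by reapplying the manifest once the Pods have terminated:
kubectl delete statefulset zk
kubectl apply -f https://k8s.io/examples/application/zookeeper/zookeeper.yaml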
This creates the zk StatefulSet object, but the other API objects in the manifest are not modified
because they already exist.
Once the zk-2 Pod is Running and Ready, use CTRL-C to terminate kubectl.
Even though you terminated and recreated all of the Pods in the zk StatefulSet, the ensemble
still serves the original value.
WATCHER::
volumeClaimTemplates:
- metadata:
name: datadir
annotations:
volume.alpha.kubernetes.io/storage-class: anything
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 20Gi
The StatefulSet controller generates a PersistentVolumeClaim for each Pod in the StatefulSet.
When the StatefulSet recreates its Pods, it remounts the Pods' PersistentVolumes.
When a Pod in the zk StatefulSet is (re)scheduled, it will always have the same
PersistentVolume mounted to the ZooKeeper server's data directory. Even when the Pods are
rescheduled, all the writes made to the ZooKeeper servers' WALs, and all their snapshots,
remain durable.
...
command:
- sh
- -c
- "start-zookeeper \
--servers=3 \
--data_dir=/var/lib/zookeeper/data \
--data_log_dir=/var/lib/zookeeper/data/log \
--conf_dir=/opt/zookeeper/conf \
--client_port=2181 \
--election_port=3888 \
--server_port=2888 \
--tick_time=2000 \
--init_limit=10 \
--sync_limit=5 \
--heap=512M \
--max_client_cnxns=60 \
--snap_retain_count=3 \
--purge_interval=12 \
--max_session_timeout=40000 \
--min_session_timeout=4000 \
--log_level=INFO"
...
The command used to start the ZooKeeper servers passed the configuration as command line
parameters. You can also use environment variables to pass configuration to the ensemble.
Configuring logging
One of the files generated by the zkGenConfig.sh script controls ZooKeeper's logging.
ZooKeeper uses Log4j, and, by default, it uses a time and size based rolling file appender for its
logging configuration.
Use the command below to get the logging configuration from one of Pods in the zk StatefulSet.
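A likely form of this command; the path to the Log4j properties file inside this image is an assumption:
kubectl exec zk-0 -- cat /usr/etc/zookeeper/log4j.properties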
The logging configuration below will cause the ZooKeeper process to write all of its logs to the
standard output file stream.
zookeeper.root.logger=CONSOLE
zookeeper.console.threshold=INFO
log4j.rootLogger=${zookeeper.root.logger}
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.Threshold=${zookeeper.console.threshold}
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} [myid:%X{myid}] - %-5p [%t:
%C{1}@%L] - %m%n
This is the simplest possible way to safely log inside the container. Because the applications
write logs to standard out, Kubernetes will handle log rotation for you. Kubernetes also
implements a sane retention policy that ensures application logs written to standard out and
standard error do not exhaust local storage media.
Use kubectl logs to retrieve the last 20 log lines from one of the Pods.
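A likely form of this command:
kubectl logs zk-0 --tail 20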
You can view application logs written to standard out or standard error using kubectl logs and
from the Kubernetes Dashboard.
Kubernetes integrates with many logging solutions. You can choose a logging solution that best
fits your cluster and applications. For cluster-level logging and aggregation, consider deploying
a sidecar container to rotate and ship your logs.
The best practices to allow an application to run as a privileged user inside of a container are a
matter of debate. If your organization requires that applications run as a non-privileged user
you can use a SecurityContext to control the user that the entry point runs as.
securityContext:
runAsUser: 1000
fsGroup: 1000
In the Pods' containers, UID 1000 corresponds to the zookeeper user and GID 1000 corresponds
to the zookeeper group.
As the runAsUser field of the securityContext object is set to 1000, instead of running as root,
the ZooKeeper process runs as the zookeeper user.
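You can confirm this by listing the processes in the Pod; a likely form of the command that produced the output below:
kubectl exec zk-0 -- ps -elf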
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
4 S zookeep+ 1 0 0 80 0 - 1127 - 20:46 ? 00:00:00 sh -c zkGenConfig.sh &&
zkServer.sh start-foreground
0 S zookeep+ 27 1 0 80 0 - 1155556 - 20:46 ? 00:00:19 /usr/lib/jvm/java-8-openjdk-
amd64/bin/java -Dzookeeper.log.dir=/var/log/zookeeper -
Dzookeeper.root.logger=INFO,CONSOLE -cp /usr/bin/../build/classes:/usr/bin/../build/lib/*.jar:/
usr/bin/../share/zookeeper/zookeeper-3.4.9.jar:/usr/bin/../share/zookeeper/slf4j-
log4j12-1.6.1.jar:/usr/bin/../share/zookeeper/slf4j-api-1.6.1.jar:/usr/bin/../share/zookeeper/
netty-3.10.5.Final.jar:/usr/bin/../share/zookeeper/log4j-1.2.16.jar:/usr/bin/../share/zookeeper/
jline-0.9.94.jar:/usr/bin/../src/java/lib/*.jar:/usr/bin/../etc/zookeeper: -Xmx2G -Xms2G -
Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false
org.apache.zookeeper.server.quorum.QuorumPeerMain /usr/bin/../etc/zookeeper/zoo.cfg
By default, when the Pod's PersistentVolume is mounted to the ZooKeeper server's data
directory, it is only accessible by the root user. This configuration prevents the ZooKeeper
process from writing to its WAL and storing its snapshots.
Use the command below to get the file permissions of the ZooKeeper data directory on the zk-0
Pod.
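A likely form of this command, using the data directory from the manifest above:
kubectl exec -ti zk-0 -- ls -ld /var/lib/zookeeper/data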
Because the fsGroup field of the securityContext object is set to 1000, the ownership of the
Pods' PersistentVolumes is set to the zookeeper group, and the ZooKeeper process is able to
read and write its data.
You can use kubectl patch to update the number of cpus allocated to the servers.
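The patch itself is not shown above; a sketch of a JSON patch that changes the CPU request (the value 0.3 is illustrative):
kubectl patch sts zk --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value":"0.3"}]'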
statefulset.apps/zk patched
This terminates the Pods, one at a time, in reverse ordinal order, and recreates them with the
new configuration. This ensures that quorum is maintained during a rolling update.
Use the kubectl rollout history command to view a history of previous configurations.
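A likely form of this command:
kubectl rollout history sts/zk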
statefulsets "zk"
REVISION
1
2
Use the kubectl rollout undo command to roll back the modification.
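A likely form of this command:
kubectl rollout undo sts/zk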
Restart Policies control how Kubernetes handles process failures for the entry point of the
container in a Pod. For Pods in a StatefulSet, the only appropriate RestartPolicy is Always, and
this is the default value. For stateful applications you should never override the default policy.
Use the following command to examine the process tree for the ZooKeeper server running in
the zk-0 Pod.
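A likely form of this command:
kubectl exec zk-0 -- ps -ef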
The command used as the container's entry point has PID 1, and the ZooKeeper process, a child
of the entry point, has PID 27.
In another terminal watch the Pods in the zk StatefulSet with the following command.
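A sketch of this step; the fault is then likely injected from a separate terminal by killing the ZooKeeper java process:
kubectl get pod -w -l app=zk
# in a separate terminal, kill the ZooKeeper process to observe the restart:
kubectl exec zk-0 --container kubernetes-zookeeper -- pkill java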
The termination of the ZooKeeper process caused its parent process to terminate. Because the
RestartPolicy of the container is Always, it restarted the parent process.
If your application uses a script (such as zkServer.sh) to launch the process that implements the
application's business logic, the script must terminate with the child process. This ensures that
Kubernetes will restart the application's container when the process implementing the
application's business logic fails.
Configuring your application to restart failed processes is not enough to keep a distributed
system healthy. There are scenarios where a system's processes can be both alive and
unresponsive, or otherwise unhealthy. You should use liveness probes to notify Kubernetes that
your application's processes are unhealthy and it should restart them.
livenessProbe:
exec:
command:
- sh
- -c
- "zookeeper-ready 2181"
initialDelaySeconds: 15
timeoutSeconds: 5
The probe calls a bash script that uses the ZooKeeper ruok four letter word to test the server's
health.
In one terminal window, use the following command to watch the Pods in the zk StatefulSet.
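A sketch of this step; the probe failure is then likely induced by deleting the readiness script the probes call:
kubectl get pod -w -l app=zk
# in another terminal, remove the script so the probes fail:
kubectl exec zk-0 -- rm /opt/zookeeper/bin/zookeeper-ready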
When the liveness probe for the ZooKeeper process fails, Kubernetes will automatically restart
the process for you, ensuring that unhealthy processes in the ensemble are restarted.
Readiness is not the same as liveness. If a process is alive, it is scheduled and healthy. If a
process is ready, it is able to process input. Liveness is a necessary, but not sufficient, condition
for readiness. There are cases, particularly during initialization and termination, when a process
can be alive but not ready.
If you specify a readiness probe, Kubernetes will ensure that your application's processes will
not receive network traffic until their readiness checks pass.
For a ZooKeeper server, liveness implies readiness. Therefore, the readiness probe from the
zookeeper.yaml manifest is identical to the liveness probe.
readinessProbe:
exec:
command:
- sh
- -c
- "zookeeper-ready 2181"
initialDelaySeconds: 15
timeoutSeconds: 5
Even though the liveness and readiness probes are identical, it is important to specify both. This
ensures that only healthy servers in the ZooKeeper ensemble receive network traffic.
You should always provision additional capacity to allow the processes of critical systems to be
rescheduled in the event of node failures. If you do so, then the outage will only last until the
Kubernetes scheduler reschedules one of the ZooKeeper servers. However, if you want your
service to tolerate node failures with no downtime, you should set podAntiAffinity.
Use the command below to get the nodes for Pods in the zk StatefulSet.
for i in 0 1 2; do kubectl get pod zk-$i --template {{.spec.nodeName}}; echo ""; done
kubernetes-node-cxpk
kubernetes-node-a5aq
kubernetes-node-2g2d
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: "app"
operator: In
values:
- zk
topologyKey: "kubernetes.io/hostname"
Surviving maintenance
In this section you will cordon and drain nodes. If you are using this tutorial on a shared
cluster, be sure that this will not adversely affect other tenants.
The previous section showed you how to spread your Pods across nodes to survive unplanned
node failures, but you also need to plan for temporary node failures that occur due to planned
maintenance.
This tutorial assumes a cluster with at least four nodes. If the cluster has more than four, use
kubectl cordon to cordon all but four nodes. Constraining to four nodes will ensure Kubernetes
encounters affinity and PodDisruptionBudget constraints when scheduling zookeeper Pods in
the following maintenance simulation.
The maxUnavailable field indicates to Kubernetes that at most one Pod from the zk StatefulSet can
be unavailable at any time.
In one terminal, use this command to watch the Pods in the zk StatefulSet.
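A likely form of the watch command:
kubectl get pods -w -l app=zk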
In another terminal, use this command to get the nodes that the Pods are currently scheduled
on.
for i in 0 1 2; do kubectl get pod zk-$i --template {{.spec.nodeName}}; echo ""; done
kubernetes-node-pb41
kubernetes-node-ixsl
kubernetes-node-i4c4
Use kubectl drain to cordon and drain the node on which the zk-0 Pod is scheduled.
kubectl drain $(kubectl get pod zk-0 --template {{.spec.nodeName}}) --ignore-daemonsets --force
--delete-emptydir-data
As there are four nodes in your cluster, kubectl drain succeeds and the zk-0 Pod is rescheduled to
another node.
Keep watching the StatefulSet's Pods in the first terminal and drain the node on which zk-1 is
scheduled.
kubectl drain $(kubectl get pod zk-1 --template {{.spec.nodeName}}) --ignore-daemonsets --force
--delete-emptydir-data
"kubernetes-node-ixsl" cordoned
WARNING: Deleting pods not managed by ReplicationController, ReplicaSet, Job, or
DaemonSet: fluentd-cloud-logging-kubernetes-node-ixsl, kube-proxy-kubernetes-node-ixsl;
Ignoring DaemonSet-managed pods: node-problem-detector-v0.1-voc74
pod "zk-1" deleted
node "kubernetes-node-ixsl" drained
The zk-1 Pod cannot be scheduled because the zk StatefulSet contains a PodAntiAffinity rule
preventing co-location of the Pods, and as only two nodes are schedulable, the Pod will remain
in a Pending state.
Continue to watch the Pods of the StatefulSet, and drain the node on which zk-2 is scheduled.
kubectl drain $(kubectl get pod zk-2 --template {{.spec.nodeName}}) --ignore-daemonsets --force
--delete-emptydir-data
You cannot drain the third node because evicting zk-2 would violate the zk-pdb
PodDisruptionBudget. However, the node will remain cordoned.
Use zkCli.sh to retrieve the value you entered during the sanity test from zk-0.
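A sketch of this step; the value is still there, and the tutorial then uncordons the first node (kubernetes-node-pb41 in the output above) so that zk-1 can be rescheduled:
kubectl exec zk-0 -- zkCli.sh get /hello
kubectl uncordon kubernetes-node-pb41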
zk-1 is rescheduled on this node. Wait until zk-1 is Running and Ready.
kubectl drain $(kubectl get pod zk-2 --template {{.spec.nodeName}}) --ignore-daemonsets --force
--delete-emptydir-data
You can use kubectl drain in conjunction with PodDisruptionBudgets to ensure that your
services remain available during maintenance. If drain is used to cordon nodes and evict pods
prior to taking the node offline for maintenance, services that express a disruption budget will
have that budget respected. You should always allocate additional capacity for critical services
so that their Pods can be immediately rescheduled.
Cleaning up
• Use kubectl uncordon to uncordon all the nodes in your cluster.
• You must delete the persistent storage media for the PersistentVolumes used in this
tutorial. Follow the necessary steps, based on your environment, storage configuration,
and provisioning method, to ensure that all storage is reclaimed.
Services
Connecting Applications with Services
Kubernetes assumes that pods can communicate with other pods, regardless of which host they
land on. Kubernetes gives every pod its own cluster-private IP address, so you do not need to
explicitly create links between pods or map container ports to host ports. This means that
containers within a Pod can all reach each other's ports on localhost, and all pods in a cluster
can see each other without NAT. The rest of this document elaborates on how you can run
reliable services on such a networking model.
This tutorial uses a simple nginx web server to demonstrate the concept.
service/networking/run-my-nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-nginx
spec:
selector:
matchLabels:
run: my-nginx
replicas: 2
template:
metadata:
labels:
run: my-nginx
spec:
containers:
- name: my-nginx
image: nginx
ports:
- containerPort: 80
This makes it accessible from any node in your cluster. Check the nodes the Pod is running on:
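A sketch of the commands for this step, assuming the standard k8s.io/examples URL for the manifest above:
kubectl apply -f https://k8s.io/examples/service/networking/run-my-nginx.yaml
kubectl get pods -l run=my-nginx -o wide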
You should be able to ssh into any node in your cluster and use a tool such as curl to make
queries against both IPs. Note that the containers are not using port 80 on the node, nor are
there any special NAT rules to route traffic to the pod. This means you can run multiple nginx
pods on the same node all using the same containerPort, and access them from any other pod
or node in your cluster using the assigned IP address for the pod. If you want to arrange for a
specific port on the host Node to be forwarded to backing Pods, you can - but the networking
model should mean that you do not need to do so.
You can read more about the Kubernetes Networking Model if you're curious.
Creating a Service
So we have pods running nginx in a flat, cluster wide, address space. In theory, you could talk
to these pods directly, but what happens when a node dies? The pods die with it, and the
ReplicaSet inside the Deployment will create new ones, with different IPs. This is the problem a
Service solves.
A Kubernetes Service is an abstraction which defines a logical set of Pods running somewhere
in your cluster, that all provide the same functionality. When created, each Service is assigned a
unique IP address (also called clusterIP). This address is tied to the lifespan of the Service, and
will not change while the Service is alive. Pods can be configured to talk to the Service, and
know that communication to the Service will be automatically load-balanced out to some pod
that is a member of the Service.
You can create a Service for your 2 nginx replicas with kubectl expose:
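A likely form of this command:
kubectl expose deployment/my-nginx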
service/my-nginx exposed
service/networking/nginx-svc.yaml
apiVersion: v1
kind: Service
metadata:
name: my-nginx
labels:
run: my-nginx
spec:
ports:
- port: 80
protocol: TCP
selector:
run: my-nginx
This specification will create a Service which targets TCP port 80 on any Pod with the run: my-
nginx label, and expose it on an abstracted Service port (targetPort: is the port the container
accepts traffic on, port: is the abstracted Service port, which can be any port other pods use to
access the Service). View Service API object to see the list of supported fields in service
definition. Check your Service:
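A likely form of this command:
kubectl get svc my-nginx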
As mentioned previously, a Service is backed by a group of Pods. These Pods are exposed
through EndpointSlices. The Service's selector will be evaluated continuously and the results
will be POSTed to an EndpointSlice that is connected to the Service using labels. When a Pod
dies, it is automatically removed from the EndpointSlices that contain it as an endpoint. New
Pods that match the Service's selector will automatically get added to an EndpointSlice for that
Service. Check the endpoints, and note that the IPs are the same as the Pods created in the first
step:
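The commands are not shown above; the output below comes from describing the Service, and the EndpointSlices can be listed by their service-name label:
kubectl describe svc my-nginx
kubectl get endpointslices -l kubernetes.io/service-name=my-nginx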
Name: my-nginx
Namespace: default
Labels: run=my-nginx
Annotations: <none>
Selector: run=my-nginx
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.0.162.149
IPs: 10.0.162.149
Port: <unset> 80/TCP
TargetPort: 80/TCP
Endpoints: 10.244.2.5:80,10.244.3.4:80
Session Affinity: None
Events: <none>
You should now be able to curl the nginx Service on <CLUSTER-IP>:<PORT> from any node in
your cluster. Note that the Service IP is completely virtual, it never hits the wire. If you're
curious about how this works you can read more about the service proxy.
Note:
If the service environment variables are not desired (because they may clash with expected
program variables, there are too many variables to process, you are only using DNS, and so on),
you can disable this mode by setting the enableServiceLinks flag to false on the pod spec.
Environment Variables
When a Pod runs on a Node, the kubelet adds a set of environment variables for each active
Service. This introduces an ordering problem. To see why, inspect the environment of your
running nginx Pods (your Pod name will be different):
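A likely form of this command; substitute the name of one of your nginx Pods:
kubectl exec <pod-name> -- printenv | grep SERVICE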
KUBERNETES_SERVICE_HOST=10.0.0.1
KUBERNETES_SERVICE_PORT=443
KUBERNETES_SERVICE_PORT_HTTPS=443
Note there's no mention of your Service. This is because you created the replicas before the
Service. Another disadvantage of doing this is that the scheduler might put both Pods on the
same machine, which will take your entire Service down if it dies. We can do this the right way
by killing the 2 Pods and waiting for the Deployment to recreate them. This time the Service
exists before the replicas. This will give you scheduler-level Service spreading of your Pods
(provided all your nodes have equal capacity), as well as the right environment variables:
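A sketch of this step; the Pod name in the last command is a placeholder:
kubectl scale deployment my-nginx --replicas=0; kubectl scale deployment my-nginx --replicas=2;
kubectl get pods -l run=my-nginx -o wide
kubectl exec <pod-name> -- printenv | grep SERVICE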
You may notice that the pods have different names, since they are killed and recreated.
KUBERNETES_SERVICE_PORT=443
MY_NGINX_SERVICE_HOST=10.0.162.149
KUBERNETES_SERVICE_HOST=10.0.0.1
MY_NGINX_SERVICE_PORT=80
KUBERNETES_SERVICE_PORT_HTTPS=443
DNS
Kubernetes offers a DNS cluster addon Service that automatically assigns DNS names to other
Services. You can check if it's running on your cluster:
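A likely form of this check:
kubectl get services kube-dns --namespace=kube-system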
The rest of this section will assume you have a Service with a long lived IP (my-nginx), and a
DNS server that has assigned a name to that IP. Here we use the CoreDNS cluster addon
(application name kube-dns), so you can talk to the Service from any pod in your cluster using
standard methods (e.g. gethostbyname()). If CoreDNS isn't running, you can enable it referring
to the CoreDNS README or Installing CoreDNS. Let's run another curl application to test this:
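A likely form of this command, reusing the curl-capable busybox image that appears later on this page:
kubectl run curl --image=radial/busyboxplus:curl -i --tty --rm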
Waiting for pod default/curl-131556218-9fnch to be running, status is Pending, pod ready: false
Hit enter for command prompt
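Then, at the prompt inside the pod, the lookup that produced the output below is likely:
nslookup my-nginx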
Name: my-nginx
Address 1: 10.0.162.149
Securing the Service
Till now we have only accessed the nginx server from within the cluster. Before exposing the
Service to the internet, you want to make sure the communication channel is secure. For this,
you will need:
• Self signed certificates for https (unless you already have an identity certificate)
• An nginx server configured to use the certificates
• A secret that makes the certificates accessible to pods
You can acquire all these from the nginx https example. This requires having go and make tools
installed. If you don't want to install those, then follow the manual steps later. In short:
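A sketch of the "in short" steps, assuming the make target and key paths used by that example:
make keys KEY=/tmp/nginx.key CERT=/tmp/nginx.crt
kubectl create secret tls nginxsecret --key /tmp/nginx.key --cert /tmp/nginx.crt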
secret/nginxsecret created
You can find an example for default.conf in the Kubernetes examples project repo.
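The ConfigMap confirmed below was likely created from that file:
kubectl create configmap nginxconfigmap --from-file=default.conf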
configmap/nginxconfigmap created
You can view the details of the nginxconfigmap ConfigMap using the following command:
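A likely form of this command:
kubectl describe configmap nginxconfigmap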
Name: nginxconfigmap
Namespace: default
Labels: <none>
Annotations: <none>
Data
====
default.conf:
----
server {
listen 80 default_server;
listen [::]:80 default_server ipv6only=on;
listen 443 ssl;
root /usr/share/nginx/html;
index index.html;
server_name localhost;
ssl_certificate /etc/nginx/ssl/tls.crt;
ssl_certificate_key /etc/nginx/ssl/tls.key;
location / {
try_files $uri $uri/ =404;
}
}
BinaryData
====
Events: <none>
Following are the manual steps to follow in case you run into problems running make (on
Windows, for example):
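A sketch of those manual steps; the certificate subject and file paths are illustrative:
# create a self-signed certificate and private key
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /tmp/nginx.key -out /tmp/nginx.crt -subj "/CN=my-nginx/O=my-nginx"
# base64 encode the certificate and key for the Secret manifest below
cat /tmp/nginx.crt | base64
cat /tmp/nginx.key | base64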
Use the output from the previous commands to create a yaml file as follows. The base64
encoded value should all be on a single line.
apiVersion: "v1"
kind: "Secret"
metadata:
name: "nginxsecret"
namespace: "default"
type: kubernetes.io/tls
data:
tls.crt: "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURIekNDQWdlZ0F3SUJBZ0lKQUp5
M3lQK0pzMlpJTUEwR0NTcUdTSWIzRFFFQkJRVUFNQ1l4RVRBUEJnTlYKQkFNVENHNW5hV
zU0YzNaak1SRXdEd1lEVlFRS0V3aHVaMmx1ZUhOMll6QWVGdzB4TnpFd01qWXdOekEzTVRK
YQpGdzB4T0RFd01qWXdOekEzTVRKYU1DWXhFVEFQQmdOVkJBTVRDRzVuYVc1NGMzWm
pNUkV3RHdZRFZRUUtFd2h1CloybHVlSE4yWXpDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFB
RGdnRVBBRENDQVFvQ2dnRUJBSjFxSU1SOVdWM0IKMlZIQlRMRmtobDRONXljMEJxYUhIQ
ktMSnJMcy8vdzZhU3hRS29GbHlJSU94NGUrMlN5ajBFcndCLzlYTnBwbQppeW1CL3JkRldkOXg
5UWhBQUxCZkVaTmNiV3NsTVFVcnhBZW50VWt1dk1vLzgvMHRpbGhjc3paenJEYVJ4NEo5C
i82UVRtVVI3a0ZTWUpOWTVQZkR3cGc3dlVvaDZmZ1Voam92VG42eHNVR0M2QURVODBp
NXFlZWhNeVI1N2lmU2YKNHZpaXdIY3hnL3lZR1JBRS9mRTRqakxCdmdONjc2SU90S01rZXV3
R0ljNDFhd05tNnNTSzRqYUNGeGpYSnZaZQp2by9kTlEybHhHWCtKT2l3SEhXbXNhdGp4WTR
aNVk3R1ZoK0QrWnYvcW1mMFgvbVY0Rmo1NzV3ajFMWVBocWtsCmdhSXZYRyt4U1FVQ0F3
RUFBYU5RTUU0d0hRWURWUjBPQkJZRUZPNG9OWkI3YXc1OUlsYkROMzhIYkduYnhFVjcKT
UI4R0ExVWRJd1FZTUJhQUZPNG9OWkI3YXc1OUlsYkROMzhIYkduYnhFVjdNQXdHQTFVZE
V3UUZNQU1CQWY4dwpEUVlKS29aSWh2Y05BUUVGQlFBRGdnRUJBRVhTMW9FU0lFaXdyM
DhWcVA0K2NwTHI3TW5FMTducDBvMm14alFvCjRGb0RvRjdRZnZqeE04Tzd2TjB0clcxb2pGS
W0vWDE4ZnZaL3k4ZzVaWG40Vm8zc3hKVmRBcStNZC9jTStzUGEKNmJjTkNUekZqeFpUV0U
rKzE5NS9zb2dmOUZ3VDVDK3U2Q3B5N0M3MTZvUXRUakViV05VdEt4cXI0Nk1OZWNCMAp
wRFhWZmdWQTRadkR4NFo3S2RiZDY5eXM3OVFHYmg5ZW1PZ05NZFlsSUswSGt0ejF5WU4v
bVpmK3FqTkJqbWZjCkNnMnlwbGQ0Wi8rUUNQZjl3SkoybFIrY2FnT0R4elBWcGxNSEcybzgvT
HFDdnh6elZPUDUxeXdLZEtxaUMwSVEKQ0I5T2wwWW5scE9UNEh1b2hSUzBPOStlMm9KdF
ZsNUIyczRpbDlhZ3RTVXFxUlU9Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K"
tls.key: "LS0tLS1CRUdJTiBQUklWQVRFIEtFWS0tLS0tCk1JSUV2UUlCQURBTkJna3Foa2lHOXc
wQkFRRUZBQVNDQktjd2dnU2pBZ0VBQW9JQkFRQ2RhaURFZlZsZHdkbFIKd1V5eFpJWmVE
ZWNuTkFhbWh4d1NpeWF5N1AvOE9ta3NVQ3FCWmNpQ0RzZUh2dGtzbzlCSzhBZi9WemFh
Wm9zcApnZjYzUlZuZmNmVUlRQUN3WHhHVFhHMXJKVEVGSzhRSHA3VkpMcnpLUC9QO
UxZcFlYTE0yYzZ3MmtjZUNmZitrCkU1bEVlNUJVbUNUV09UM3c4S1lPNzFLSWVuNEZJWTZ
MMDUrc2JGQmd1Z0ExUE5JdWFubm9UTWtlZTRuMG4rTDQKb3NCM01ZUDhtQmtRQlAzeE9
JNHl3YjREZXUraURyU2pKSHJzQmlIT05Xc0RadXJFaXVJMmdoY1kxeWIyWHI2UAozVFVOcG
NSbC9pVG9zQngxcHJHclk4V09HZVdPeGxZZmcvbWIvNnBuOUYvNWxlQlkrZStjSTlTMkQ0YX
BKWUdpCkwxeHZzVWtGQWdNQkFBRUNnZ0VBZFhCK0xkbk8ySElOTGo5bWRsb25IUGlHW
WVzZ294RGQwci9hQ1Zkank4dlEKTjIwL3FQWkUxek1yall6Ry9kVGhTMmMwc0QxaTBXSjdw
R1lGb0xtdXlWTjltY0FXUTM5SjM0VHZaU2FFSWZWNgo5TE1jUHhNTmFsNjRLMFRVbUFQZy
tGam9QSFlhUUxLOERLOUtnNXNrSE5pOWNzMlY5ckd6VWlVZWtBL0RBUlBTClI3L2ZjUFBac
DRuRWVBZmI3WTk1R1llb1p5V21SU3VKdlNyblBESGtUdW1vVlVWdkxMRHRzaG9reUxiTWV
tN3oKMmJzVmpwSW1GTHJqbGtmQXlpNHg0WjJrV3YyMFRrdWtsZU1jaVlMbjk4QWxiRi9DS
mRLM3QraTRoMTVlR2ZQegpoTnh3bk9QdlVTaDR2Q0o3c2Q5TmtEUGJvS2JneVVHOXBYamZ
hRGR2UVFLQmdRRFFLM01nUkhkQ1pKNVFqZWFKClFGdXF4cHdnNzhZTjQyL1NwenlUYmtG
cVFoQWtyczJxWGx1MDZBRzhrZzIzQkswaHkzaE9zSGgxcXRVK3NHZVAKOWRERHBsUWV0
ODZsY2FlR3hoc0V0L1R6cEdtNGFKSm5oNzVVaTVGZk9QTDhPTm1FZ3MxMVRhUldhNzZxelR
yMgphRlpjQ2pWV1g0YnRSTHVwSkgrMjZnY0FhUUtCZ1FEQmxVSUUzTnNVOFBBZEYvL25sQ
VB5VWs1T3lDdWc3dmVyClUycXlrdXFzYnBkSi9hODViT1JhM05IVmpVM25uRGpHVHBWaE9J
eXg5TEFrc2RwZEFjVmxvcG9HODhXYk9lMTAKMUdqbnkySmdDK3JVWUZiRGtpUGx1K09IYn
RnOXFYcGJMSHBzUVpsMGhucDBYSFNYVm9CMUliQndnMGEyOFVadApCbFBtWmc2d1BRS0
JnRHVIUVV2SDZHYTNDVUsxNFdmOFhIcFFnMU16M2VvWTBPQm5iSDRvZUZKZmcraEppS
XlnCm9RN3hqWldVR3BIc3AyblRtcHErQWlSNzdyRVhsdlhtOElVU2FsbkNiRGlKY01Pc29RdFBZ
NS9NczJMRm5LQTQKaENmL0pWb2FtZm1nZEN0ZGtFMXNINE9MR2lJVHdEbTRpb0dWZGIw
MllnbzFyb2htNUpLMUI3MkpBb0dBUW01UQpHNDhXOTVhL0w1eSt5dCsyZ3YvUHM2VnBvMj
ZlTzRNQ3lJazJVem9ZWE9IYnNkODJkaC8xT2sybGdHZlI2K3VuCnc1YytZUXRSTHlhQmd3MUt
pbGhFZDBKTWU3cGpUSVpnQWJ0LzVPbnlDak9OVXN2aDJjS2lrQ1Z2dTZsZlBjNkQKckliT2ZI
aHhxV0RZK2Q1TGN1YSt2NzJ0RkxhenJsSlBsRzlOZHhrQ2dZRUF5elIzT3UyMDNRVVV6bUlCR
kwzZAp4Wm5XZ0JLSEo3TnNxcGFWb2RjL0d5aGVycjFDZzE2MmJaSjJDV2RsZkI0VEdtUjZZdm
xTZEFOOFRwUWhFbUtKCnFBLzVzdHdxNWd0WGVLOVJmMWxXK29xNThRNTBxMmk1NVd
UTThoSDZhTjlaMTltZ0FGdE5VdGNqQUx2dFYxdEYKWSs4WFJkSHJaRnBIWll2NWkwVW1Vb
Gc9Ci0tLS0tRU5EIFBSSVZBVEUgS0VZLS0tLS0K"
Now modify your nginx replicas to start an https server using the certificate in the secret, and
the Service, to expose both ports (80 and 443):
service/networking/nginx-secure-app.yaml
apiVersion: v1
kind: Service
metadata:
name: my-nginx
labels:
run: my-nginx
spec:
type: NodePort
ports:
- port: 8080
targetPort: 80
protocol: TCP
name: http
- port: 443
protocol: TCP
name: https
selector:
run: my-nginx
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-nginx
spec:
selector:
matchLabels:
run: my-nginx
replicas: 1
template:
metadata:
labels:
run: my-nginx
spec:
volumes:
- name: secret-volume
secret:
secretName: nginxsecret
- name: configmap-volume
configMap:
name: nginxconfigmap
containers:
- name: nginxhttps
image: bprashanth/nginxhttps:1.0
ports:
- containerPort: 443
- containerPort: 80
volumeMounts:
- mountPath: /etc/nginx/ssl
name: secret-volume
- mountPath: /etc/nginx/conf.d
name: configmap-volume
Noteworthy points about the nginx-secure-app manifest:
• It contains both the Deployment and the Service specification in the same file.
• The nginx server serves HTTP traffic on port 80 and HTTPS traffic on port 443, and the
Service exposes both ports.
• Each container has access to the keys through a volume mounted at /etc/nginx/ssl, which is
set up before the nginx server is started.
At this point you can reach the nginx server from any node.
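A sketch of how this is applied and tested; the Pod IP in the curl call is a placeholder taken from the kubectl output:
kubectl delete deployments,svc my-nginx
kubectl create -f ./nginx-secure-app.yaml
kubectl get pods -l run=my-nginx -o custom-columns=POD_IP:.status.podIPs
# from a node, substitute one of the reported Pod IPs:
curl -k https://<pod-ip>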
Note how we supplied the -k parameter to curl in the last step; this is because we don't know
anything about the pods running nginx at certificate generation time, so we have to tell curl to
ignore the CName mismatch. By creating a Service we linked the CName used in the certificate
with the actual DNS name used by pods during Service lookup. Let's test this from a pod (the
same secret is being reused for simplicity, the pod only needs nginx.crt to access the Service):
service/networking/curlpod.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: curl-deployment
spec:
selector:
matchLabels:
app: curlpod
replicas: 1
template:
metadata:
labels:
app: curlpod
spec:
volumes:
- name: secret-volume
secret:
secretName: nginxsecret
containers:
- name: curlpod
command:
- sh
- -c
- while true; do sleep 1; done
image: radial/busyboxplus:curl
volumeMounts:
- mountPath: /etc/nginx/ssl
name: secret-volume
$ curl https://<EXTERNAL-IP>:<NODE-PORT> -k
...
<h1>Welcome to nginx!</h1>
Let's now recreate the Service to use a cloud load balancer. Change the Type of my-nginx
Service from NodePort to LoadBalancer:
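A likely form of this step:
kubectl edit svc my-nginx
kubectl get svc my-nginx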
curl https://<EXTERNAL-IP> -k
...
<title>Welcome to nginx!</title>
The IP address in the EXTERNAL-IP column is the one that is available on the public internet.
The CLUSTER-IP is only available inside your cluster/private cloud network.
Note that on AWS, type LoadBalancer creates an ELB, which uses a (long) hostname, not an IP.
It's too long to fit in the standard kubectl get svc output, in fact, so you'll need to do kubectl
describe service my-nginx to see it. You'll see something like this:
What's next
• Learn more about Using a Service to Access an Application in a Cluster
• Learn more about Connecting a Front End to a Back End Using a Service
• Learn more about Creating an External Load Balancer
Using Source IP
Applications running in a Kubernetes cluster find and communicate with each other, and the
outside world, through the Service abstraction. This document explains what happens to the
source IP of packets sent to different types of Services, and how you can toggle this behavior
according to your needs.
NAT
Network address translation
Source NAT
Replacing the source IP on a packet; in this page, that usually means replacing with the IP
address of a node.
Destination NAT
Replacing the destination IP on a packet; in this page, that usually means replacing with
the IP address of a Pod
VIP
A virtual IP address, such as the one assigned to every Service in Kubernetes
kube-proxy
A network daemon that orchestrates Service VIP management on every node
Prerequisites
You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured
to communicate with your cluster. It is recommended to run this tutorial on a cluster with at
least two nodes that are not acting as control plane hosts. If you do not already have a cluster,
you can create one by using minikube or you can use one of these Kubernetes playgrounds:
• Killercoda
• Play with Kubernetes
The examples use a small nginx webserver that echoes back the source IP of requests it receives
through an HTTP header. You can create it as follows:
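A likely form of this command, assuming the echoserver sample image:
kubectl create deployment source-ip-app --image=registry.k8s.io/echoserver:1.4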
Note:
deployment.apps/source-ip-app created
Objectives
• Expose a simple application through various types of Services
• Understand how each Service type handles source IP NAT
• Understand the tradeoffs involved in preserving source IP
Get the proxy mode on one of the nodes (kube-proxy listens on port 10249):
# Run this in a shell on the node you want to query.
curl http://localhost:10249/proxyMode
iptables
You can test source IP preservation by creating a Service over the source IP app:
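A likely form of this command; the echoserver image listens on port 8080:
kubectl expose deployment source-ip-app --name=clusterip --port=80 --target-port=8080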
service/clusterip exposed
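The ClusterIP is then likely queried from a temporary Pod in the same cluster, which produces the prompt shown below:
kubectl run busybox -it --image=busybox:1.28 --restart=Never --rm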
Waiting for pod default/busybox to be running, status is Pending, pod ready: false
If you don't see a command prompt, try pressing enter.
# Replace "10.0.170.92" with the IPv4 address of the Service named "clusterip"
wget -qO - 10.0.170.92
CLIENT VALUES:
client_address=10.244.3.8
command=GET
...
The client_address is always the client pod's IP address, whether the client pod and server pod
are in the same node or in different nodes.
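The NodePort Service whose creation is confirmed below was likely exposed as follows:
kubectl expose deployment source-ip-app --name=nodeport --port=80 --target-port=8080 --type=NodePort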
service/nodeport exposed
If you're running on a cloud provider, you may need to open up a firewall-rule for the
nodes:nodeport reported above. Now you can try reaching the Service from outside the cluster
through the node port allocated above.
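A sketch of that test; it assumes your nodes have ExternalIP addresses:
NODEPORT=$(kubectl get -o jsonpath="{.spec.ports[0].nodePort}" services nodeport)
NODES=$(kubectl get nodes -o jsonpath='{ $.items[*].status.addresses[?(@.type=="ExternalIP")].address }')
for node in $NODES; do curl -s $node:$NODEPORT | grep -i client_address; done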
client_address=10.180.1.1
client_address=10.240.0.5
client_address=10.240.0.3
Note that these are not the correct client IPs, they're cluster internal IPs. This is what happens:
1. The client sends the packet to node2:nodePort.
2. node2 replaces the source IP address (SNAT) on the packet with its own IP address.
3. node2 replaces the destination IP on the packet with the Pod IP.
4. The packet is routed to node 1, and then to the endpoint.
5. The Pod's reply is routed back to node2.
6. The Pod's reply is sent back to the client.
To avoid this, Kubernetes has a feature to preserve the client source IP. If you set
service.spec.externalTrafficPolicy to the value Local, kube-proxy only proxies requests to local
endpoints, and does not forward traffic to other nodes. This approach preserves the original
source IP address. If there are no local endpoints, packets sent to the node are dropped, so you
can rely on the correct source IP in any packet processing rules you apply to packets that make
it through to the endpoint.
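The policy change that produced the confirmation below is likely:
kubectl patch svc nodeport -p '{"spec":{"externalTrafficPolicy":"Local"}}'
# then re-run the curl loop above against each node's NodePort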
service/nodeport patched
client_address=198.51.100.79
Note that you only got one reply, with the right client IP, from the one node on which the
endpoint pod is running.
You can test this by exposing the source-ip-app through a load balancer:
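A likely form of this command:
kubectl expose deployment source-ip-app --name=loadbalancer --port=80 --target-port=8080 --type=LoadBalancer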
service/loadbalancer exposed
Print out the IP addresses of the Service:
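A likely form of this command:
kubectl get svc loadbalancer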
curl 203.0.113.140
CLIENT VALUES:
client_address=10.240.0.5
...
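The Local policy and the health-check port shown below are likely set and inspected as follows:
kubectl patch svc loadbalancer -p '{"spec":{"externalTrafficPolicy":"Local"}}'
kubectl get svc loadbalancer -o yaml | grep -i healthCheckNodePort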
healthCheckNodePort: 32122
The service.spec.healthCheckNodePort field points to a port on every node serving the health
check at /healthz. You can test this:
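A sketch of that test, using the port value shown above; run the curl from a node:
kubectl get pod -o wide -l app=source-ip-app   # see which node hosts the endpoint
# on each node:
curl localhost:32122/healthz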
A controller running on the control plane is responsible for allocating the cloud load balancer.
The same controller also allocates HTTP health checks pointing to this port/path on each node.
Wait about 10 seconds for the 2 nodes without endpoints to fail health checks, then use curl to
query the IPv4 address of the load balancer:
curl 203.0.113.140
CLIENT VALUES:
client_address=198.51.100.79
...
Cross-platform support
Only some cloud providers offer support for source IP preservation through Services with
Type=LoadBalancer. The cloud provider you're running on might fulfill the request for a
loadbalancer in a few different ways:
1. With a proxy that terminates the client connection and opens a new connection to your
nodes/endpoints. In such cases the source IP will always be that of the cloud LB, not that
of the client.
2. With a packet forwarder, such that requests from the client sent to the loadbalancer VIP
end up at the node with the source IP of the client, not an intermediate proxy.
Load balancers in the first category must use an agreed upon protocol between the loadbalancer
and backend to communicate the true client IP such as the HTTP Forwarded or X-
FORWARDED-FOR headers, or the proxy protocol. Load balancers in the second category can
leverage the feature described above by creating an HTTP health check pointing at the port
stored in the service.spec.healthCheckNodePort field on the Service.
Cleaning up
Delete the Services:
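A likely form of the cleanup:
kubectl delete svc -l app=source-ip-app
kubectl delete deployment source-ip-app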
This tutorial explains the flow of Pod termination in connection with the corresponding
endpoint state and removal by using a simple nginx web server to demonstrate the concept.
Let's say you have a Deployment containing a single nginx replica (just for demonstration
purposes) and a Service:
service/pod-with-graceful-termination.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-deployment
labels:
app: nginx
spec:
replicas: 1
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
terminationGracePeriodSeconds: 120 # extra long grace period
containers:
- name: nginx
image: nginx:latest
ports:
- containerPort: 80
lifecycle:
preStop:
exec:
# Real life termination may take any time up to terminationGracePeriodSeconds.
# In this example - just hang around for at least the duration of terminationGracePeriodSeconds,
# at 120 seconds container will be forcibly terminated.
# Note, all this time nginx will keep processing requests.
command: [
"/bin/sh", "-c", "sleep 180"
]
service/explore-graceful-termination-nginx.yaml
apiVersion: v1
kind: Service
metadata:
name: nginx-service
spec:
selector:
app: nginx
ports:
- protocol: TCP
port: 80
targetPort: 80
Now create the Deployment Pod and Service using the above files:
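A sketch of this step, assuming the standard k8s.io/examples URLs for the two manifests above:
kubectl apply -f https://k8s.io/examples/service/pod-with-graceful-termination.yaml
kubectl apply -f https://k8s.io/examples/service/explore-graceful-termination-nginx.yaml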
Once the Pod and Service are running, you can get the name of any associated EndpointSlices:
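A likely form of this command:
kubectl get endpointslices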
You can see its status, and validate that there is one endpoint registered:
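A likely form of this command, selecting the EndpointSlice by its service-name label:
kubectl get endpointslices -o json -l kubernetes.io/service-name=nginx-service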
{
"addressType": "IPv4",
"apiVersion": "discovery.k8s.io/v1",
"endpoints": [
{
"addresses": [
"10.12.1.201"
],
"conditions": {
"ready": true,
"serving": true,
"terminating": false
Now let's terminate the Pod and validate that the Pod is being terminated respecting the
graceful termination period configuration:
All pods:
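A sketch of this step; the Pod name is the one that appears in the EndpointSlice output below:
kubectl delete pod nginx-deployment-7768647bf9-b4b9s
kubectl get pods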
While the new endpoint is being created for the new Pod, the old endpoint is still around in the
terminating state:
{
"addressType": "IPv4",
"apiVersion": "discovery.k8s.io/v1",
"endpoints": [
{
"addresses": [
"10.12.1.201"
],
"conditions": {
"ready": false,
"serving": true,
"terminating": true
},
"nodeName": "gke-main-default-pool-dca1511c-d17b",
"targetRef": {
"kind": "Pod",
"name": "nginx-deployment-7768647bf9-b4b9s",
"namespace": "default",
"uid": "66fa831c-7eb2-407f-bd2c-f96dfe841478"
},
"zone": "us-central1-c"
},
{
"addresses": [
"10.12.1.202"
],
"conditions": {
"ready": true,
"serving": true,
"terminating": false
},
"nodeName": "gke-main-default-pool-dca1511c-d17b",
"targetRef": {
"kind": "Pod",
"name": "nginx-deployment-7768647bf9-rkxlw",
"namespace": "default",
"uid": "722b1cbe-dcd7-4ed4-8928-4a4d0e2bbe35"
},
"zone": "us-central1-c"
This allows applications to communicate their state during termination and clients (such as load
balancers) to implement connection draining functionality. These clients may detect
terminating endpoints and implement special logic for them.
In Kubernetes, endpoints that are terminating always have their ready status set to false. This
needs to happen for backward compatibility, so existing load balancers will not use them for
regular traffic. If traffic draining on a terminating Pod is needed, the actual readiness can be
checked as the serving condition.
What's next
• Learn how to Connect Applications with Services
• Learn more about Using a Service to Access an Application in a Cluster
• Learn more about Connecting a Front End to a Back End Using a Service
• Learn more about Creating an External Load Balancer