
VMMIG Module05 Optimize Phase

The Optimize phase focuses on updating migrated workloads to fully leverage cloud capabilities like managed instance groups, autoscaling, and cost optimization. Key activities include implementing image and configuration management strategies, enabling high availability and disaster recovery, consolidating networking and security, adopting managed services, and optimizing costs.


The Optimize Phase

● Assess/Discover your application landscape
● Plan/Foundation: create a landing zone
● Migrate! Pick a path to the cloud and get started
● Optimize your operations and save on costs

Cloud migration is the journey: the end-to-end lifecycle whereby workloads move from other locations (on-prem, other clouds) into the cloud. GCP is the destination to which these workloads migrate, and where they are often modernized/optimized afterwards.
Learn how to...
Leverage image and configuration management solutions

Enable autoscaling and rolling updates

Provide high-availability and disaster recovery solutions

Consolidate and simplify network and security settings

Select managed services to replace migrated workloads

Optimize costs

Migrate VMs directly into containers with Migrate for Anthos

Agenda
Introduction
Image strategies and configuration management
Managed Instance Groups
Availability and disaster recovery
Networking and security consolidation
Managed services
Cost optimization
Migrate for Anthos (VMs to containers)
The Optimize phase is where it gets cloudy

Having moved your workloads, you can now update them to fully exploit the cloud.

Input:
● Migrated VMs
● Business objectives
● Relationships with app owners

Activities:
● Update and prioritize backlog of optimizations
● Design and test strategy for specific optimizations
● Implement optimizations

Output:
● Updated workloads
Agenda
Introduction
Image strategies and configuration management
Managed Instance Groups
Availability and disaster recovery
Networking and security consolidation
Managed services
Cost optimization
Migrate for Anthos (VMs to containers)
Basic image management scheme

● Base OS install / GCE public image
● Period 1: Hardened OS image
● Period 2: Platform image
● Period 3: App image

https://cloud.google.com/solutions/image-management-best-practices

Start with a base OS installation, or if building images for GCP, start with a public boot image.

Periodically, take the base image and harden it by removing services, changing settings, installing security components, etc. Build the hardened image every 90 days, or whatever frequency makes sense for the organization. This becomes the basis of subsequent builds.

More frequently, build platform-specific images. One image for web servers, one for
application servers, one for databases, etc. Build this image maybe every 30 days.

As frequently as you build an app, create new VM images for the new versions of the
application. You might create new application images on a daily basis.
Goals for image management

(Diagram: post-migration, four bespoke servers (Bespoke 1-4), each a uniquely configured Compute Engine instance. These become consistent, image-based servers (Server 1-4) built from versioned boot images (v1.0, v1.1), enabling Managed Instance Groups, autoscaling, and rolling updates.)

After migration, you have servers with independent configurations. They may, or may
not, be managed with a configuration management solution. However, each is
managed as a unique asset.

By updating the servers to all use a consistent base image, you ensure uniform
configuration across multiple instances. You also make it possible to combine like
servers into managed instance groups. This provides benefits such as:
- Health checks
- Ability to resize the cluster easily
- Autoscaling (for workloads that will scale horizontally)
- A cloud-native approach to VM updates - that is, use of immutable images.
This, combined with the rolling update feature of MIGs, makes rolling out new
versions easy.
Organizational maturity

(Spectrum of maturity: a robust image factory → a core image library, e.g. a web image and a DB image → "What's an image?")

Customers may or may not have well-developed practices in place for creating and managing VM images for on-prem or AWS deployments.

Robust systems will be run much like a standard DevOps pipeline. Commits to a code
base will trigger build jobs, which will create/test/deploy images. The image building
tool can leverage configuration management systems to automate the configuration of
the image.

Many customers will have some version of the second option, with a set of images
that may be built manually or with partial automation. They don't get built as often,
and certainly not daily.

Some customers will have hand-crafted servers, and have no existing process in
place for creating/baking images.
GCP Images

● Public image → baked image
● On-prem image definition → baked image
● Migrated VM disk → baked image

https://cloud.google.com/solutions/image-management-best-practices

There are three main approaches to creating GCP boot images that can be used for
managed instance groups.

Best: Build an image from the ground up, starting with a public image. Develop a
clean CI/CD pipeline for generating these images, using tools like Packer and
Chef/Puppet/Ansible.

Good: Use existing image-generation pipelines and produce output images for GCP.
Tools like Packer and Vagrant that are being used to produce VMware images can
also output these images for use with GCP.

Not-so-good (some would say bad): Take the migrated VM's disk and create an image. Then manually prune and tailor the image.
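For illustration, baking an image from a migrated VM's disk might look like the following sketch (disk, zone, and family names are hypothetical):

# Create a reusable boot image from a migrated VM's disk
gcloud compute images create app-image-v1 \
    --source-disk migrated-vm-disk \
    --source-disk-zone us-central1-a \
    --family app-images

Adding the image to a family lets instance templates reference the family and automatically pick up the newest image.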
How much configuration is baked in?

(Spectrum: base image with configuration on boot → major components in image → everything in image)

https://cloud.google.com/solutions/image-management-best-practices

There are many variables that go into deciding how much you bake into an image:

- How mature the organization is when it comes to building images frequently and efficiently (more mature -> bake in more)
- How long it takes to install the necessary components so your app is functional (longer install times -> bake in more)
- To what extent you want to move away from in-place upgrades to immutable images and machine replacement (away from in-place -> bake in more)
Demo: Image factory with Packer and GCP

https://cloud.google.com/community/tutorials/create-cloud-build-image-factory-using-packer

Another example can be found here:


https://cloud.google.com/solutions/automated-build-images-with-jenkins-kubernetes

Packer can't SSH in successfully on instances where OS-Login is enabled. Make sure
the metadata to enable this feature is not set on the project where you are demoing.
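As a rough sketch, the build step of such an image factory might invoke Packer like this (the template file and variable names are hypothetical):

# Build a GCE image from a Packer template
packer build \
    -var project_id=my-project \
    -var zone=us-central1-a \
    packer.json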
Optimizing for configuration management
(Diagram: two options. In the first, CM servers run both on premises and on Compute Engine, each managing its local inventory of VMs. In the second, the CM server stays on premises and manages the migrated VMs across the interconnect, subject to routes, firewall rules, and bandwidth. In both cases, disable network, authorization, and firewall management in playbooks for cloud VMs.)
https://cloud.google.com/solutions/configuration-management/

As noted in the module on the Plan phase, companies should really have
configuration management for their on-prem assets in place prior to migrating VMs
into the cloud.

When extending infrastructure into the cloud, one common approach is to place CM
servers in the cloud as well. You configure the on-prem servers to manage the
on-prem inventory, and the cloud servers to manage the cloud inventory. You then
have either separate playbooks for the different environments, or adaptable playbooks
that use environment-specific variables or context to perform slightly different
configuration depending on whether the VM is in the cloud or on-prem.

An alternative approach is to leave the CM infrastructure on-prem, and have configuration management orchestration happen across the interconnect. This obviously is affected by latency, available bandwidth, and network access.

For VMs migrated into GCP, you'll want to remove the normal CM commands that
configure network, firewall, and authorization settings as they will be managed
differently in GCP.
Agenda
Introduction
Image strategies and configuration management
Managed Instance Groups
Availability and disaster recovery
Networking and security consolidation
Managed services
Cost optimization
Migrate for Anthos (VMs to containers)
Optimizing for scaling and release management

● Managed Instance Groups offer…


○ Health checks
○ Autoscaling
○ Rolling updates and restarts
○ A/B testing, canary releases

● Stateless apps lend themselves to horizontal scaling


● Some stateful apps are not too difficult to refactor (move state off
server)
● Apps with licensing restrictions, MAC address hard coding, complex
state aren't good candidates for autoscaling

https://cloud.google.com/compute/docs/instance-groups/rolling-out-updates-to-managed-instance-groups

GCE Managed Instance Groups provide a mechanism for updating instances by replacing running VMs with new VMs built from a new image. With this scheme, significant changes (and perhaps all changes) are accomplished not by using configuration management processes to update the software on the server, but by replacing the server entirely. The Rolling Update feature allows zero-downtime upgrades to instance groups, and can support A/B testing, canary releases, and rollback.

In addition to the ability to accommodate scaling out horizontally, you need to consider scaling back, or deleting instances. Applications that have long-lived sessions, or long-running processes, might not scale down as expected. There can be other reasons that applications don't tolerate removal of instances well.
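A minimal sketch of starting a rolling update, assuming a zonal MIG named app-mig and a new instance template app-template-v2:

# Replace VMs in the group with instances built from the new template,
# surging up to 3 extra VMs so capacity never drops
gcloud compute instance-groups managed rolling-action start-update app-mig \
    --zone us-central1-a \
    --version template=app-template-v2 \
    --max-surge 3 --max-unavailable 0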
Agenda
Introduction
Image strategies and configuration management
Managed Instance Groups
Availability and disaster recovery
Networking and security consolidation
Managed services
Cost optimization
Migrate for Anthos (VMs to containers)
Optimizing for high availability

● Distribute workloads across zones
○ Regional MIGs
○ Load balancing
● Potentially distribute across regions
● Use resilient data stores
○ GCS is inherently HA
○ GCS offers multi-regional buckets
○ Managed services are often HA

https://cloud.google.com/docs/enterprise/best-practices-for-enterprise-organizations#high-availability
https://cloud.google.com/docs/geography-and-regions

Regional instance groups will distribute instances created from your template across
zones. If a zone goes down, the instances in other zones remain available. If a zone
goes down, the regional MIG will not automatically create additional instances in the
remaining zones (unless autoscaling is enabled). An alternative is to use multiple
zonal managed instance groups.
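For example, creating a regional MIG with autoscaling enabled might look like this sketch (group, template, and region names are hypothetical):

# Distribute instances from a template across zones in a region
gcloud compute instance-groups managed create app-mig \
    --region us-central1 \
    --template app-template-v1 \
    --size 3

# Scale between 3 and 9 instances based on CPU utilization
gcloud compute instance-groups managed set-autoscaling app-mig \
    --region us-central1 \
    --min-num-replicas 3 --max-num-replicas 9 \
    --target-cpu-utilization 0.6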

Google typically recommends single-region deployments as being sufficient for achieving high availability. Multi-region deployments do increase availability, but can significantly increase costs due to network fees, and introduce other challenges based on your application design. More often, deploying across regions is motivated by a desire to place the app near consumers to reduce latency and improve performance. It is also a strategy for disaster recovery situations.

Google's database managed services all offer high-availability options.

Not mentioned on slide, but also important, is ensuring you have a high-availability
interconnect between GCP and your on-premises networks. This should have been
handled during the Plan phase.
Optimizing for disaster recovery

● Your strategy depends on your


○ Recovery time objective
○ Recovery point objective

● The lower the tolerance for loss, the higher the cost and complexity
● Options include…
○ Cold: rebuild app in another region
○ Warm: unused app in another region
○ Hot: app runs across regions

https://cloud.google.com/solutions/dr-scenarios-planning-guide
DR: Cold pattern

(Diagram: Cloud Load Balancing and Cloud DNS front the app in Region 1, which runs an app MIG on Compute Engine and an app DB on Cloud SQL. Deployment Manager can recreate the same stack in Region 2, and DB backups are stored in a multi-regional Cloud Storage bucket.)

https://cloud.google.com/solutions/dr-scenarios-planning-guide

The original environment is deployed using Infrastructure as Code (IaC). The app is
implemented using a managed instance group and instance templates. The database
is backed up periodically to a multiregional bucket.

If a region fails, the application can be redeployed fairly quickly into a new region, the database can be restored from the latest backup, and the load balancer can be reconfigured with a new backend service.

RTO is typically bounded by the time required to restore the DB. RPO is bounded by how frequently you perform database backups.
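The periodic backup in this pattern could be as simple as a scheduled job along these lines (instance, bucket, and database names are hypothetical):

# On-demand Cloud SQL backup
gcloud sql backups create --instance app-db

# Or export to a multi-regional bucket for cross-region restore
gcloud sql export sql app-db gs://app-db-backups/backup.sql --database appdb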
DR: Warm pattern

(Diagram: Cloud Load Balancing and Cloud DNS front the app in Region 1. Region 2 holds a smaller, idle app MIG and a Cloud SQL replica receiving replication traffic from the primary.)

https://cloud.google.com/solutions/dr-scenarios-planning-guide

App deployments are made into multiple regions, but failover regions have smaller
application MIGs which don't serve traffic. A DB replica is created in the failover
region, and this receives replication traffic from the DB master, keeping it nearly
entirely up-to-date.

In the case of failure, update the load balancer to include the Region 2 instance group as a backend, increase the size of the instance group, and point the app to the replica (this could be done via DNS changes, or by placing a load balancer in front of the DB and changing the load balancing configuration).

This design reduces the RTO and RPO significantly. However, it does introduce
cross-regional replication costs.
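Creating the cross-region replica for this pattern might look like the following sketch (instance names and regions are hypothetical):

# Create a read replica in the failover region, fed by the primary
gcloud sql instances create app-db-replica \
    --master-instance-name app-db \
    --region us-east1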
DR: Hot pattern

(Diagram: Cloud Load Balancing and Cloud DNS front app MIGs in both Region 1 and Region 2, all backed by a single multi-region Cloud Spanner database.)

https://cloud.google.com/solutions/dr-scenarios-planning-guide

App deployment occurs in multiple regions. The load balancer does geo-aware
routing of requests to the nearest region. The backing database service, Spanner,
handles replication across regions.

If a region goes down, the application continues to operate without interruption.


Agenda
Introduction
Image strategies and configuration management
Managed Instance Groups
Availability and disaster recovery
Networking and security consolidation
Managed services
Cost optimization
Migrate for Anthos (VMs to containers)
Optimizing routes and firewall rules

● After many sprints, there is often an untidy collection of routes and firewall rules
○ Best practice is to consolidate and simplify
○ Use Security Command Center to discover routes and rules
○ Consider Forseti for firewall rule scanning
● Google recommends service account-based firewall rules

https://www.youtube.com/watch?v=1ibeCQjjpBw&autoplay=1
https://forsetisecurity.org/about/
https://cloud.google.com/vpc/docs/firewalls#service-accounts-vs-tags
https://cloud.google.com/blog/products/gcp/simplify-cloud-vpc-firewall-management-with-service-accounts
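A sketch of a service account-based rule, assuming hypothetical web and db service accounts on a VPC named prod-vpc:

# Allow only web-tier VMs to reach db-tier VMs on the MySQL port
gcloud compute firewall-rules create allow-web-to-db \
    --network prod-vpc \
    --allow tcp:3306 \
    --source-service-accounts web-sa@my-project.iam.gserviceaccount.com \
    --target-service-accounts db-sa@my-project.iam.gserviceaccount.com

Unlike tags, service accounts are IAM-controlled, so only authorized users can launch a VM that matches the rule.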
Optimizing load balancing

● GCP load balancers offer high performance and high availability
● Proxy-based load balancers offer cross-regional routing
● Hybrid load balancing can be achieved with round-robin and/or weighted DNS

(Diagram: an HTTPS load balancer fronts the app in GCP, while weighted DNS splits traffic between Cloud Load Balancing and an on-premises load balancer.)
https://cloud.google.com/load-balancing/
Optimizing security at the edge

● Google's proxy-based load balancers protect against DDoS
○ SYN floods
○ IP fragment floods
○ Port exhaustion
○ Etc.

● Cloud Armor provides additional controls to secure HTTP(S) load balancing
○ IP deny/allow list
○ Geo-based access control (alpha)
○ L3-L7 parameter-based rules (alpha)

https://cloud.google.com/files/GCPDDoSprotection-04122016.pdf
https://cloud.google.com/armor/
Optimizing secret management
● Cloud-native secrets management solutions make managing secrets for GCP instances easier
● VM service accounts can be used by apps when calling Google services
● Cloud KMS provides a means for IAM-based encryption/decryption of secrets
● Cloud HSM does the same

(Diagram: one migrated VM calls Google APIs with its service account; another asks Cloud KMS to decrypt a GCS-stored secret before calling a 3rd-party service.)

https://cloud.google.com/compute/docs/access/create-enable-service-accounts-for-instances
https://cloud.google.com/kms/
https://cloud.google.com/hsm/
https://cloud.google.com/kms/docs/encrypt-decrypt

Apps running on an instance can use the VM-assigned service account by using the Google Cloud client libraries; credentials are passed to the application via metadata.

Apps can leverage Cloud KMS to decrypt secrets that are either stored in app
configuration or in GCS. The application operates within the context of a service
account. That account has permissions to use a given key. That key is used by Cloud
KMS to encrypt/decrypt secrets. The diagram shows a secret being stored in GCS,
and the app asks KMS to read and decrypt the file.
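A minimal sketch of this flow, assuming a key ring, key, and bucket that you have already created (all names are hypothetical):

# Encrypt a secret with Cloud KMS and stage it in GCS
gcloud kms encrypt --location global \
    --keyring app-keyring --key app-key \
    --plaintext-file db-password.txt --ciphertext-file db-password.enc
gsutil cp db-password.enc gs://app-secrets/

# At runtime, the app (via its service account) reverses the process
gsutil cp gs://app-secrets/db-password.enc .
gcloud kms decrypt --location global \
    --keyring app-keyring --key app-key \
    --ciphertext-file db-password.enc --plaintext-file db-password.txt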
Optimizing IAM configurations

● After many sprints, there is often an untidy collection of IAM role assignments
● APIs make it possible to write tools that will extract role assignments and definitions for analysis
○ Possible to place in BigQuery for resultant-set-of-policy reporting
● New Policy Intelligence tools are in beta
○ Recommender uses machine learning to identify over-permissioning
○ Troubleshooter allows admins to visualize policies and see why access requests are denied
○ Validator allows teams to create permissions rules and monitor actual IAM configuration for violations

https://cloud.google.com/policy-intelligence/
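For example, a quick way to extract role assignments for analysis (project name is hypothetical):

# List every role/member binding in a project as a flat table
gcloud projects get-iam-policy my-project \
    --flatten="bindings[].members" \
    --format="table(bindings.role, bindings.members)"

The same output can be emitted as JSON (--format=json) and loaded into BigQuery for resultant-set-of-policy reporting.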
Logging and monitoring for security

● Google services write logging and metrics data into Stackdriver
○ VMs with agents installed write guest OS and application data as well
○ This data can flow through to your logging/monitoring tools of choice
● Enable logging and create dashboards and alerts for new, cloud-native signals
○ Changes to IAM role definitions
○ IAM role assignments
○ Firewall rule logging
○ VPC flow logs
○ Etc.
● If you are using non-Stackdriver logging/monitoring, you'll likely need cloud-based aggregators

https://cloud.google.com/logging/docs/export/
https://cloud.google.com/solutions/exporting-stackdriver-logging-for-splunk
https://www.splunk.com/blog/2016/03/23/announcing-splunk-add-on-for-google-cloud-platform-gcp-at-gcpnext16.html
https://resources.netskope.com/cloud-security-collateral-2/netskope-for-google-cloud-platform
https://help.sumologic.com/03Send-Data/Sources/02Sources-for-Hosted-Collectors/Google-Cloud-Platform-Source
https://cloud.google.com/logging/docs/audit/
https://cloud.google.com/vpc/docs/using-flow-logs
https://cloud.google.com/vpc/docs/firewall-rules-logging
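As one hedged example, routing audit logs to BigQuery for analysis might look like this (sink and dataset names are hypothetical):

# Export Cloud Audit Logs entries to a BigQuery dataset
gcloud logging sinks create audit-to-bq \
    bigquery.googleapis.com/projects/my-project/datasets/audit_logs \
    --log-filter='logName:"cloudaudit.googleapis.com"'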
Agenda
Introduction
Image strategies and configuration management
Managed Instance Groups
Availability and disaster recovery
Networking and security consolidation
Managed services
Cost optimization
Migrate for Anthos (VMs to containers)
Optimizing with managed services

● One common way to optimize your applications is to replace VM-based parts of the architecture with managed services
○ MySQL -> Cloud SQL, Spanner
○ HBase -> Bigtable
○ Kafka -> Cloud Pub/Sub
○ Hadoop/Spark -> Dataproc
● Leveraging managed services has multiple benefits
○ Much reduced administrative overhead
○ Potentially better availability and scalability
○ Increased functionality
Managed services behave differently

GCP Service    VM-based solution   Things to know...

Cloud SQL      MySQL               Cost; not on VPC; see differences and issues docs

Pub/Sub        Kafka               Messaging only; no ordering; at-least-once delivery;
                                   different latencies; general architecture; pay by volume

Dataproc       Hadoop/Spark        Cost for large persistent clusters; GCS performance
                                   characteristics; workflows; configuration mechanisms

Memorystore    Redis               Failover period not configurable; no persistence;
                                   no support for user modules

https://cloud.google.com/sql/docs/mysql/features#differences
https://cloud.google.com/sql/faq

Cloud SQL costs roughly 2x the cost of un-managed MySQL running on a VM. Cloud
SQL VMs are not on a VPC in the project; they are accessed via peering or public IP.

https://cloud.google.com/pubsub/architecture
https://cloud.google.com/pubsub/docs/faq
https://cloud.google.com/pubsub/docs/ordering
https://cloud.google.com/pubsub/pricing

https://cloud.google.com/dataproc/pricing
https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage
https://cloud.google.com/dataproc/docs/resources/faq

Dataproc's $0.01/vcpu/hr. charge adds up on very large clusters that are long lived.

In general, the online documentation does a good job of detailing key issues. Review the concepts section, the known issues section, and the pricing. Also, Googling <gcp product> vs. <other product> often yields good initial results.
Agenda
Introduction
Image strategies and configuration management
Managed Instance Groups
Availability and disaster recovery
Networking and security consolidation
Managed services
Cost optimization
Migrate for Anthos (VMs to containers)
Make sure you tailor instance sizes to real needs

● You chose VM sizes during initial planning and migration
● After the workloads have been in production for a while, you should review and evaluate the real usage
○ Stackdriver is your friend
● Have at least one phase where you go through an additional right-sizing pass
○ Note that resizing a VM does entail downtime
○ With Managed Instance Groups, you can do this with rolling updates
● GCP offers sizing recommendations
○ Based on Stackdriver metrics over 8 days
○ Stackdriver Monitoring agent improves recommendations

API access to recommendations is coming soon!

Sizing recommendations are currently not available for: VM instances created using App Engine flexible environment, Cloud Dataflow, or Google Kubernetes Engine; or VM instances with ephemeral disks, GPUs, or TPUs.

The sizing recommendation algorithm is suited to workloads that follow weekly patterns, workloads that grow or shrink over weeks of time, workloads that persistently underutilize their resources, or workloads that are persistently throttled by insufficient resources. In such cases, 8 days of historical data is enough to predict how a change in the size of the machine can improve resource utilization.

The sizing recommendation algorithm is less suited to workloads that spike less frequently (for example, monthly spikes) because 8 days of data is not enough to capture or predict the processing fluctuations.

https://cloud.google.com/compute/docs/instances/apply-sizing-recommendations-for-instances
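To illustrate the downtime note above, applying a recommendation to a standalone VM means stopping it first (VM name, zone, and machine type are hypothetical):

# Resizing a standalone VM entails downtime
gcloud compute instances stop app-vm --zone us-central1-a
gcloud compute instances set-machine-type app-vm \
    --zone us-central1-a --machine-type n1-standard-2
gcloud compute instances start app-vm --zone us-central1-a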
There's more to TCO than VM costs

● Persistent Disk
● Network egress
● Intra-VPC traffic costs
● Load balancer costs

BigQuery and billing exports are your friends

https://cloud.google.com/billing/docs/how-to/export-data-bigquery

BigQuery is hugely useful for analyzing billing data. It can be used to find large,
and potentially unexpected, sources of cost - which can then be optimized.
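For instance, a query along these lines surfaces the largest sources of cost (the table name is hypothetical; yours is derived from your billing account ID):

# Top ten services by total cost, from the billing export table
bq query --use_legacy_sql=false '
SELECT service.description AS service, ROUND(SUM(cost), 2) AS total_cost
FROM `my-project.billing.gcp_billing_export_v1_XXXXXX`
GROUP BY service
ORDER BY total_cost DESC
LIMIT 10'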
Watch network costs

● Look for intra-VPC network costs
○ Consider moving VMs into the same zone or region
● Avoid VPC egress
○ Place VMs that exchange traffic on the same VPC, or use peering
● Consider Standard Tier networking
○ Consider latency and reliability tradeoffs

Remember that traffic transferred within a VPC, but across zones or regions, incurs costs (in addition to the more obvious VPC egress).

https://cloud.google.com/vpc/docs/vpc-peering
https://cloud.google.com/vpc/docs/shared-vpc
https://cloud.google.com/network-tiers/
Working with budgets

● GCP has a mechanism for setting budgets
○ Can provide alerts when reaching % of budget
● GCP also offers programmatic budget notifications
○ Sends messages via Pub/Sub
○ Cloud Functions are an easy way to receive and act on messages
○ Theoretically, you could disable runaway services
● BigQuery and App Engine offer cost controls
○ Can stop use of product after $X spend

https://cloud.google.com/billing/docs/how-to/budgets
https://cloud.google.com/billing/docs/how-to/notify
https://cloud.google.com/bigquery/docs/custom-quotas
https://cloud.google.com/appengine/pricing#spending_limit
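A sketch of creating a budget with alert thresholds (the billing account ID and display name are hypothetical; the budgets command has been in beta):

# Alert at 50%, 90%, and 100% of a $1000 monthly budget
gcloud beta billing budgets create \
    --billing-account 000000-AAAAAA-BBBBBB \
    --display-name "prod-monthly-budget" \
    --budget-amount 1000USD \
    --threshold-rule percent=0.5 \
    --threshold-rule percent=0.9 \
    --threshold-rule percent=1.0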
Lab 13
Defining an optimization strategy
Agenda
Introduction
Image strategies and configuration management
Managed Instance Groups
Availability and disaster recovery
Networking and security consolidation
Managed services
Cost optimization
Migrate for Anthos (VMs to containers)
Moving VMs into containers

Why Kubernetes/GKE?

Secure kernel

Density

Resiliency

Modernization

Experience with desired end state

GKE is simply the best K8s experience

Google offers automatic updates, which keeps the kernel on the machines running your apps secure.

You can run more apps on a given host for better resource utilization.

If a node goes down, its workload is quickly rescheduled to another node.

Istio, for example, makes service discovery, traffic splitting, authorization, circuit-breaker patterns, and other features easy to implement without having to rewrite apps.

Teams can get experience using GKE and K8s without having to totally re-engineer
their apps.
Migrate for Anthos

https://cloud.google.com/migrate/anthos/
Migrate for Anthos components

● Migrate for Compute Engine
○ Migrate for Anthos works on the Migrate for Compute Engine infrastructure
○ Migrate for Compute Engine should be deployed prior to Migrate for Anthos
● Google Kubernetes Engine cluster
○ VMs will be migrated into StatefulSets on the target GKE cluster
○ The cluster must be deployed prior to installing Migrate for Anthos
● Migrate for Anthos application
○ Installed via Marketplace onto the GKE cluster
● Firewall rules
○ Two additional firewall rules are required to allow migrated workloads to speak to the Migrate Manager and Cloud Extension
Creating the GKE Cluster

gcloud container clusters create gke-cluster-name \
    --project project-name \
    --zone us-central1-a \
    --username "admin" \
    --machine-type "machine-type" \
    --image-type "UBUNTU" \
    --num-nodes number-of-nodes \
    --enable-stackdriver-kubernetes

https://cloud.google.com/migrate/anthos/docs/configuring-a-cluster
Deploying Migrate for Anthos

https://cloud.google.com/migrate/anthos/docs/creating-migrate-anthos-configuration
Migrate for Anthos architecture

(Diagram: the GKE cluster runs the Migrate for Anthos CSI driver as a controller component plus a per-node component, with a storageclass for provisioning. The per-node component streams disk blocks from Edge nodes A and B, which talk to the Backend; the Backend reads the source VMDK and caches data in GCS. The workload is deployed with "kubectl apply -f pvc.yaml" and "kubectl apply -f app.yaml".)

The installation of Migrate for Anthos automatically deploys a variety of resources to your GKE cluster, including:

1. Container Storage Interface (CSI) driver (https://kubernetes-csi.github.io/docs/)
   a. Comprised of a controller component and a per-node component (https://kubernetes-csi.github.io/docs/deploying.html)
   b. Provides the "glue" that allows containers to talk to Migrate's streaming storage
2. Storage class (https://kubernetes.io/docs/concepts/storage/storage-classes/)
   a. Used to create PVCs pointing at streaming-backed storage

To migrate an app:

1. Deploy the PVC. This is the storage resource that will be used by the pod. The blocks it accesses are provided via the driver, which talks to the Edge node, which talks to the backend. The streaming mechanism from the backend to the Cloud Extension is what you've seen before. The new part is the conduit from the edge node, through the CSI driver, to the PVC, to the pod.
2. Deploy the app, with the pod definition including a volume that uses the PVC.
3. Deploy any service that sits in front of your app.

You can migrate from AWS and Azure as well:
https://cloud.google.com/migrate/anthos/docs/migrate-aws-to-gke
PVC Configuration
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: [PVC_NAME]
  annotations:
    # Replace vm-id with a unique identifier. See the prerequisites
    # earlier in this topic.
    anthos-migrate.gcr.io/vm-id: [VM_ID]
    anthos-migrate.gcr.io/vm-data-access-mode: "FullyCached"
    anthos-migrate.gcr.io/run-mode: "TestClone"
spec:
  accessModes:
    - ReadWriteOnce
  # Replace with your Storage Class name defined when adding Migrate for
  # Anthos to your cluster
  storageClassName: [STORAGE_CLASS_NAME]
  resources:
    requests:
      storage: 1Gi

This is an example PVC configuration. The administrator will need to populate the
values in the square brackets.
- The PVC name can be anything
- The VM_ID is the VMware-specific ID for a given VM. For details on how to
get the id, see
https://cloud.google.com/migrate/anthos/docs/migrate-vmware-to-gke
- The storage class name is typically specified during Migrate for Anthos
installation

There are also some options that can be set differently than in the example.
- vm-data-access-mode can be streaming or fully cached
- run-mode can be normal or testclone
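Deploying and checking the claim is standard kubectl (the file name is hypothetical):

# Create the streaming-backed PVC and confirm it binds
kubectl apply -f pvc.yaml
kubectl get pvc [PVC_NAME]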
Application Configuration
kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: [APPLICATION_NAME]
  namespace: default
spec:
  serviceName: [SERVICE_NAME]
  replicas: 1
  selector:
    matchLabels:
      app: [APPLICATION_NAME]
  template:
    metadata:
      labels:
        app: [APPLICATION_NAME]
      annotations:
        anthos-migrate.gcr.io/action: run
        anthos-migrate.gcr.io/source-type: streaming-disk
        # source-pvc needs to match the name of the PVC declared above.
        anthos-migrate.gcr.io/source-pvc: [PVC_NAME]
    spec:
      containers:
        - name: [APPLICATION_NAME]
          # The image for the Migrate for Anthos system container.
          image: anthos-migrate.gcr.io/v2k-run:v1.0.1

This is an example application configuration. The administrator will need to populate the values in the square brackets.

The source-type annotation dictates whether the CSI driver and streaming is used, or
whether it uses an exported PVC (more on this to come).
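Deploying the workload is likewise plain kubectl (the file name is hypothetical):

# Create the StatefulSet and watch the migrated workload come up
kubectl apply -f app.yaml
kubectl get statefulset [APPLICATION_NAME]
kubectl get pods -l app=[APPLICATION_NAME]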
How does it work?

Migrate for Anthos uses a wrapper image to create containers from your VMs. This image:
● Replaces the VM's operating system kernel
with one supported by GKE.
● Configures the container to use GKE
services.
● Mounts a PV from the source VM using the
Migrate for Anthos streaming CSI driver.
● Runs the applications and services from the
VM's user-space

https://cloud.google.com/migrate/anthos/docs/architecture

● Configures the container's network interfaces, DNS, console output, logging, and health status to use GKE services.
● Examines the volumes attached to the VM, locates the root partition, and parses fstab for the file-system layout. Migrate for Anthos then mounts a PV from the source VM using the Migrate for Anthos streaming CSI driver.
● Runs the applications and services from the VM's user-space (for example, those launched by SysV-style or systemd init scripts) within the container.
Export storage is like storage migration

● In order for your workload to run independently of the source VM disks, you'll need to export its storage to a standalone Persistent Volume (PV).
● To export storage…
○ Estimate the disk size required by your workload
○ Create a configuration file with definitions for the following objects: StorageClass, PVC, ConfigMap, and Job
○ Stop the production workload pod and delete the StatefulSet
○ Deploy the configuration file
● To restart the app…
○ Redefine the app configuration to use the new PVC and redeploy

https://cloud.google.com/migrate/anthos/docs/export-storage
Export configuration (part 1)
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: [STORAGE_CLASS_NAME]
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
  replication-type: none
Export configuration (part 2)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  # Replace this with the name of your application
  name: [TARGET_PVC_NAME]
spec:
  storageClassName: [STORAGE_CLASS_NAME]
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      # Replace this with the quantity you'll need in the target volume, such as
      # 20G. You can use the included script to make this calculation (see the
      # section earlier in this topic).
      storage: [TARGET_STORAGE]
Export configuration (part 3)
apiVersion: v1
kind: ConfigMap
metadata:
  name: [CONFIGMAP_NAME]
data:
  config: |-
    appSpec:
      dataFilter:
        - "- *.swp"
        - "- /etc/fstab"
        - "- /boot/"
        - "- /tmp/*"
Export configuration (part 4)
apiVersion: batch/v1
kind: Job
metadata:
  name: [JOB_NAME]
spec:
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"
        anthos-migrate.gcr.io/action: export
        anthos-migrate.gcr.io/source-type: streaming-disk
        anthos-migrate.gcr.io/source-pvc: [SOURCE_PVC_NAME]
        anthos-migrate.gcr.io/target-pvc: [TARGET_PVC_NAME]
        anthos-migrate.gcr.io/config: [CONFIGMAP_NAME]
    spec:
      restartPolicy: OnFailure
      containers:
        - name: exporter-sample
          image: anthos-migrate.gcr.io/v2k-export:v1.0.1
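Running the export is a matter of applying the file and waiting for the Job to finish (the file name is hypothetical):

# Kick off the export and block until it completes
kubectl apply -f export.yaml
kubectl wait --for=condition=complete job/[JOB_NAME] --timeout=600s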
Modified app config using exported storage
kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: [STATEFULSET_NAME]
spec:
  serviceName: "[SERVICE_NAME]"
  replicas: 1
  selector:
    matchLabels:
      app: [STATEFULSET_NAME]
  template:
    metadata:
      labels:
        app: [STATEFULSET_NAME]
      annotations:
        anthos-migrate.gcr.io/action: run
        anthos-migrate.gcr.io/source-type: exported
        anthos-migrate.gcr.io/source-pvc: [TARGET_PVC_NAME]
    spec:
      containers:
        - name: [STATEFULSET_NAME]
          image: anthos-migrate.gcr.io/v2k-run:v1.0.1
But wait, there's more (coming soon)

(Diagram: three stages of modernization. A Virtual Machine bundles application(s), services, networking, logging, OS kernel + drivers, and virtual hardware (net, disks, …). A StatefulSet with vertical scaling moves job(s), application(s), and services onto Kubernetes, with Persistent Volumes, ConfigMaps, networking, and logging provided by the platform. A StatefulSet with scaling splits Application A and Application B into container images + ConfigMaps in Container Registry, backed by Persistent Volumes.)
Lab 14
Migrating VMs to containers with Migrate for Anthos

How does it work?

● Starting with the user-space part of the VM (/sbin/init and up)
● The process tree is wrapped with nested namespaces

(Diagram: on the left, the Virtual Machine stack: services (sshd / apache / crond), init / systemd, OS kernel, BIOS, hypervisor + net/block devices. On the right, a Pod sandbox: an initContainer performs image adaptation, and the container runs init / systemd and the services (sshd / apache / crond), with a storage aggregator and v2k wrappers + nested namespaces underneath.)
Basic concepts
● User-space part of VM (services *) runs within container
● Networking (NIC and DNS) provided by GKE
● Storage streaming happens in the background, abstracted by k8s PV
init/systemd vs. typical app
● Assumes it runs as PID 1
○ Also deals with process reaping (app containers typically do not do that)
● Does not react the same way to SIGTERM or SIGBREAK
○ sysv expects SIGPWR
○ systemd typically expects SIGRTMIN+3
○ SIGBREAK will cause a reboot (privileged)
○ SIGKILL is usually blocked
● Typically runs multiple sub-processes under different user contexts
● Does not produce console output in the same way
○ Typically works with terminal devices
● Most importantly, does a lot of setup (devices, cgroups, mounts, networking, …)
○ Disable some network configuration
○ Fix signals
○ Take care of devices
○ Disable some services (iptables, firewall, etc.)
