Week 5 Preparing For PCA Module 4
Professional Cloud Architect Journey
Module 4: Analyzing and Optimizing Technical and Business Processes
Week 5 agenda
● Optimizing Cymbal Direct’s technical and business processes and procedures
● Data services (Filestore, Firestore, Memorystore, Spanner, BigQuery, Bigtable)
● Diagnostic Questions for exam guide Section 4: Analyzing and optimizing technical and business processes
● Cymbal Direct’s management wants to make sure that they can easily scale to handle additional
demand when needed, so they can feel comfortable with expanding to more test markets.
● Streamline development for application modernization and new features/products.
● Ensure that developers spend as much time on core business functionality as possible, and not
have to worry about scalability wherever possible.
● Allow partners to order directly via API.
● Get a production version of the social media highlighting service up and running, and ensure no
inappropriate content
Technical Requirements
Vulnerability Found
Key Management Service
“If you plan to evaluate the security of your Cloud Platform infrastructure
with penetration testing, you are not required to contact us.”
Filestore
Proprietary + Confidential
Managed NFS, NOT a database
Workloads | Max performance (throughput | IOPS)
File sharing, software dev, and web hosting | 1.2 GiB/s | 60k
HPC, financial modeling, pharma, and analytics | 26 GiB/s | 920k
SAP, GKE, and "lift & shift" apps | 1.2 GiB/s | 120k
Firestore is ideal for applications that rely on highly available structured data at scale.
● Product catalogs that provide real-time inventory and product details for a retailer.
● User profiles that deliver a customized experience based on the user’s past activities and preferences.
● Transactions based on ACID properties
Firestore is a weaker fit in these cases:
● OLTP relational database with full SQL support. Consider: Cloud SQL
● Data isn’t highly structured or there is no need for ACID transactions. Consider: Cloud Bigtable
● Interactive querying in an online analytical processing (OLAP) system. Consider: BigQuery
● Unstructured data such as images or movies. Consider: Cloud Storage
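The "Consider" rules above can be read as a simple lookup. The following is a minimal sketch only: the requirement labels are hypothetical names invented for this example, and the mapping restates the bullets rather than any official selection logic.

```python
# Rules of thumb restated from the bullets above.
# The dictionary keys are hypothetical labels, not Google Cloud terminology.
ALTERNATIVES = {
    "relational_oltp_full_sql": "Cloud SQL",
    "loosely_structured_no_acid": "Cloud Bigtable",
    "interactive_olap": "BigQuery",
    "unstructured_blobs": "Cloud Storage",
}


def suggest_service(requirement: str) -> str:
    """Return the alternative named in the list, or Firestore when none applies."""
    return ALTERNATIVES.get(requirement, "Firestore")


if __name__ == "__main__":
    for need in ("interactive_olap", "unstructured_blobs", "user_profiles"):
        print(f"{need}: {suggest_service(need)}")
```

Real designs usually weigh several requirements at once, so treat this as a memory aid for the exam rather than a selection tool.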
Firestore: Datastore mode vs Firestore (native) mode
Feature | Both | Native mode (only) | Datastore mode (only)
Real-time updates | | Yes |
… vs Firebase
Exam Tip: Firestore is a NoSQL Database, but Firebase
is a development platform with a ton of additional
features that uses Firestore. Make sure to differentiate
between them!
Firebase
*** Platform, NOT a database ***
Firebase is Google’s complete app development platform
Complete = it provides different products to:
● Build apps
● Test apps
● Implement authentication (Firebase Authentication can be a part of the PCA exam at a very high level!)
● Run apps
● Run analytics
● Personalize apps
● And more…
[Diagram: Firebase product areas Develop, Run, and Engage, with labels Backend compute, Data + Authentication, Release management, Testing, Crash reporting, Analytics, Messaging, Experimentation, Personalization]
Exam Tip: Firestore is usually a part of a Firebase-based app (for storing and syncing data)
Memorystore
Spanner
What workloads fit Cloud Spanner best?
01 Sharded RDBMS: Manually sharding is difficult. People do it to achieve scale. Cloud Spanner gives you relational data and scale.
02 Scalable relational data: Scalable relational database. Instead of moving to NoSQL, move from one relational database to a more scalable relational database.
03 Manageability/HA: Highly automated. Online schema changes and patching. No planned downtime and comes with up to a 99.999% availability SLA.
04 Multi-region: Write once and automatically replicate your data to multiple regions. Most customers use regional instances, but multi-region is there if you need it.
When Cloud Spanner fits less well
TIP
Migrating from a different RDBMS to Cloud Spanner is NOT straightforward.
Be familiar with the challenges at a high level.
BigQuery - Data Transfer Service
Mostly useful for regular data transfers to BigQuery
Exam Tip: There is additional cost for streaming (both inserts and reads) in BigQuery.
BigQuery: Sharing Datasets with others
AllAuthenticatedUsers
● Dataset expiration
○ = “default table expiration time” for a dataset
● Table expiration
○ If Dataset expiration is set, each table inherits this setting by default
● Partition expiration:
○ The setting applies to all partitions in the table, but is calculated
independently for each partition based on the partition time.
○ At any point after a table is created, you can update the table's
partition expiration
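To make "calculated independently for each partition" concrete, here is a minimal sketch. It assumes daily partitions and whole-day arithmetic (real partition times are timestamps), and the function names are invented for this example rather than taken from any BigQuery API.

```python
from datetime import date, timedelta


def partition_expiry(partition_day: date, expiration_days: int) -> date:
    """Expiry is derived from the partition's own date, not the table's creation date."""
    return partition_day + timedelta(days=expiration_days)


def expired_partitions(partition_days, expiration_days, today):
    """Return the partitions whose individual expiry has been reached."""
    return [
        d for d in partition_days
        if partition_expiry(d, expiration_days) <= today
    ]


if __name__ == "__main__":
    days = [date(2018, 1, n) for n in range(1, 6)]  # 2018-01-01 .. 2018-01-05
    for d in expired_partitions(days, expiration_days=3, today=date(2018, 1, 6)):
        print("expired:", d.isoformat())
```

With one table-wide setting of three days, the older partitions expire first while the newer ones remain, which is the behavior the bullet describes.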
BigQuery: Table Partitioning
You can partition BigQuery tables by:
[Figure: example table with columns such as c1, c2, c3, userId, and eventDate, divided into daily partitions from 2018-01-01 to 2018-01-05]
Storage pricing is the cost to store data that you load into BigQuery. You pay for active storage and
long-term storage.
● Active storage includes any table or table partition that has been modified in the last 90 days.
● Long-term storage includes any table or table partition that has not been modified for 90
consecutive days. The price of storage for that table automatically drops by approximately
50%. There is no difference in performance, durability, or availability between active and long-term
storage.
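The active versus long-term rule above can be sketched as a small calculation. The per-GiB price below is a hypothetical placeholder, not a published rate, and the discount is fixed at exactly 50% even though the text says "approximately".

```python
# Hypothetical active-storage price per GiB per month; check current pricing before use.
ACTIVE_PRICE_PER_GIB = 0.02
LONG_TERM_DISCOUNT = 0.5      # "approximately 50%" per the text above
LONG_TERM_AFTER_DAYS = 90     # consecutive days without modification


def monthly_storage_cost(size_gib: float, days_since_modified: int,
                         active_price: float = ACTIVE_PRICE_PER_GIB) -> float:
    """Estimate the monthly cost of one table or partition."""
    is_long_term = days_since_modified >= LONG_TERM_AFTER_DAYS
    rate = active_price * (1 - LONG_TERM_DISCOUNT) if is_long_term else active_price
    return size_gib * rate


if __name__ == "__main__":
    print("active:   ", monthly_storage_cost(100, days_since_modified=10))
    print("long-term:", monthly_storage_cost(100, days_since_modified=120))
```

Because the rule applies per table or per partition, a partitioned table can have old partitions billed at the long-term rate while recent partitions stay at the active rate.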
Bigtable
Bigtable is a common migration target for key-value,
wide-column and time-series databases
● Petabyte-scale
● scales seamlessly
10MB
● Financial data, such as transaction histories, stock prices, and currency exchange rates.
● Internet of Things data, such as usage reports from energy meters and home appliances.
● Graph data, such as information about how users are connected to one another.
Exam Tip: types of apps where you’d consider using Bigtable: recommendation engines, personalizing user experience, Internet of Things, real-time analytics, fraud detection, migrating from HBase or Cassandra, Fintech, gaming, high-throughput data streaming for creating / improving ML models.
Bigtable for analytics… ?
Bigtable vs BigQuery
 | Bigtable | BigQuery
Optimized for | Point read/write | Ad-hoc analysis and reporting
Typical target | User/entity level | Cohort/population level
[Diagram: HBase-based stack before and after moving to Cloud Bigtable and Dataproc]
“Before”: Database: HBase. Scripting & Querying: HIVE, Impala, Pig, Mahout. Distributed Processing: Spark, MapReduce.
“After”: Database: Cloud Bigtable. Compute: Dataproc. Scripting & Querying: HIVE, Impala, Pig, Mahout. Distributed Processing: Spark, MapReduce.
SQL vs NoSQL
SQL (aka ‘Relational’) | NoSQL (aka ‘Non-relational’)
“traditional” table-based RDBMSes | key-value, wide column, document
Strongly typed, fixed schemas | Dynamic schemas
Almost all ACID-compliant | Mostly BASE
Considerable percentage of logic can be done in database layer | Most of logic needs to be offloaded to application
Default choice for most monoliths | Suitable for some microservices
Performance capped at some point (vertical scaling only, plus sharding, offloading read-only etc) | Processing nodes often separate from storage nodes (if network is fast enough)
In GCP: Cloud SQL, Cloud Spanner | In GCP: Firestore, Bigtable
Outside of GCP: MySQL, Oracle, PostgreSQL, Microsoft SQL Server | Outside of GCP: MongoDB, Redis, Cassandra, HBase, CouchDB
OLTP vs OLAP
OLTP (Transactional) | OLAP (Analytical)
For processing data in transaction-oriented apps | Multi-dimensional, analytical queries used in BI, reporting, data mining etc
Large amounts of transactions | Large volume of data
A mix of Inserts, Updates, Deletes on individual records | Loading data from source + selects. Optimized for high throughput reads on large number of records
Tables are normalized | Tables are not normalized
ACID & (mostly) SQL | SQL (sometimes NoSQL)
Cloud SQL, Cloud Spanner | BigQuery
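The difference in access pattern can be seen in a small sketch. It uses Python's built-in sqlite3 module purely as a stand-in for any SQL engine, and the orders table and its rows are invented for illustration.

```python
import sqlite3


def build_orders_db() -> sqlite3.Connection:
    """Create a tiny in-memory table; schema and rows are invented for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (region, amount) VALUES (?, ?)",
        [("EU", 10.0), ("EU", 15.0), ("US", 20.0)],
    )
    conn.commit()
    return conn


def update_order_amount(conn: sqlite3.Connection, order_id: int, amount: float) -> None:
    """OLTP-style work: a short transaction that touches one record."""
    with conn:  # commits on success, rolls back on error
        conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (amount, order_id))


def revenue_by_region(conn: sqlite3.Connection) -> dict:
    """OLAP-style work: a read that scans and aggregates many records."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    db = build_orders_db()
    update_order_amount(db, order_id=1, amount=12.5)
    print(revenue_by_region(db))
```

On Google Cloud the first pattern would typically run on Cloud SQL or Cloud Spanner and the second on BigQuery, as the last row of the table indicates.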
Exam Tip: Here you’ll find a GREAT Decision tree for database choices on AWS, Microsoft Azure,
Google Cloud Platform, and cloud-agnostic
Cloud Storage
Good for | Such as
Web/mobile apps, gaming | Game state, user sessions
Web frameworks | CMS, eCommerce
RDBMS+scale, HA, HTAP | User metadata, Ad/Fin/MarTech
Hierarchical, mobile, web | User profiles, Game State
Heavy read + write, events | AdTech, financial, IoT
Binary or object data | Images, media serving, backups
Enterprise data warehouse | Analytics, dashboards
TIP
Try to read from the bottom up. (What’s the most appropriate storage for analytics
workloads? What’s good for a global, horizontally scalable RDBMS?)
Comparing storage options: Technical details
Complex queries: Yes, No, No, Yes, Yes, Yes
Unit size: Determined by DB engine; 10,240 MiB/row; 1 MB/entity; ~10 MB/cell, ~100 MB/row; 5 TB/object; 10 MB/row
GCP: storage service decision tree
GCP: storage service decision tree (version #2)
[Decision tree figure with NO/YES branches. Decision nodes: Start; Is your data structured?; Do you need mobile SDKs?; Is your workload analytics?; Do you need updates or low latency?; Is your data relational?; Do you need horizontal scalability?; Do you need mobile SDKs? Outcomes and labels: Cloud Storage; Cloud Storage for Firebase; Cloud Bigtable; BigQuery; High throughput; Data warehouse; Tabular data]
C. Configure the quotas for resources in the regions not being used
to 0.
Optional materials 1
[ READING ]
● Make sure you know the differences between BigQuery and Bigtable.
● Be aware of how BigQuery table partitioning works.
[ VIDEOS ]
● Cloud Networking 104 (Load Balancers): Cloud OnAir: Networking 104 - Everything You Need to Know About Load
Balancers on GCP
● Querying external data with BigQuery
● BigQuery: What is BigQuery?
● [IMPORTANT TO KNOW] Sharing BigQuery data with others: Protect data with authorized views
● BigTable: What is Cloud Bigtable?
● Data Studio introduction: Data Studio in a minute
● BigTable: What can you do with Bigtable?
● Cloud Spanner [5 min]: What is Cloud Spanner | Cloud Spanner Explained | Cloud Native Relational Database
● Cloud Spanner [2x5min]: How to set up a Cloud Spanner instance & Cloud Spanner: Database deep dive
● Introduction to Firestore: Introduction to Firestore | NoSQL Document Database
● What is Dataprep? (do not confuse with Dataproc, Dataflow and other Data<service>) No code data wrangling with
Dataprep #GCPSketchnote
● Decision tree to migrate Apache Hadoop workloads to Dataproc: Decision tree to migrate Apache Hadoop
workloads to Dataproc #GCPSketchnote
Optional materials 2
● Creating a large Dataproc Cluster with preemptible VMs: Creating a large Dataproc Cluster with preemptible VMs
● What is Cloud Build?: What is Cloud Build? #GCPSketchnote
● Three ways to improve CI/CD in your serverless app
● How to protect secrets with Secret Manager: Level Up - Secret Manager
● What is Cloud Armor?: What is Cloud Armor? #GCPSketchnote
[ PODCASTS ]
● BigQuery Admin Reference Guides
● Firebase (not to be mixed up with Firestore!)
● Cloud Functions
● Cloud BigTable
[ DEEP DIVES ]
● BigQuery and Cloud Spanner deep dive: Under the hood of Google Cloud data technologies: BigQuery and Cloud
Spanner
● (~20 mins) BigQuery lab that will familiarize you with basics and show interesting insights at the same time.
Diagnostic Questions for Exam Guide Section 4: Analyzing and optimizing technical and business processes
PCA Exam Guide Section 4:
Analyzing and optimizing technical and business processes
Considerations include:
● Software development life cycle (SDLC)
● Continuous integration / continuous deployment
● Troubleshooting / root cause analysis best practices
● Testing and validation of software and infrastructure
● Service catalog and provisioning
● Business continuity and disaster recovery
4.1 Diagnostic Question 01 Discussion
You are asked to implement a lift and shift operation for Cymbal Direct’s Social Media Highlighting service. You compose a Terraform configuration file to build all the necessary Google Cloud resources. What is the next step in the Terraform workflow for this effort?
A. Commit the configuration file to your software repository.
B. Run terraform plan to verify the contents of the Terraform configuration file.
C. Run terraform apply to deploy the resources described in the configuration file.
D. Run terraform init to download the necessary provider modules.
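As a memory aid for the ordering behind this question, the sketch below encodes the usual Terraform sequence: write the configuration, then init, then plan, then apply. The list and helper are illustrative only and do not invoke Terraform.

```python
from typing import Optional

# Usual order of the core Terraform workflow, written out as plain strings.
TERRAFORM_WORKFLOW = [
    "write configuration",
    "terraform init",   # downloads providers and prepares the working directory
    "terraform plan",   # previews the changes that would be made
    "terraform apply",  # creates or updates the resources
]


def next_step(completed: str) -> Optional[str]:
    """Return the step that follows `completed`, or None after the last step."""
    index = TERRAFORM_WORKFLOW.index(completed)
    if index + 1 < len(TERRAFORM_WORKFLOW):
        return TERRAFORM_WORKFLOW[index + 1]
    return None


if __name__ == "__main__":
    print(next_step("write configuration"))
```

Committing the configuration to a repository is good practice at any point, but it is not a step that Terraform itself requires before planning or applying.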
4.1 Diagnostic Question 02 Discussion
You have implemented a manual CI/CD process for the container services required for the next implementation of Cymbal Direct’s Drone Delivery project. You want to automate the process. What should you do?
A. Implement and reference a source repository in your Cloud Build configuration file.
B. Implement a build trigger that applies your build configuration when a new software update is committed to Cloud Source Repositories.
C. Specify the name of your Container Registry in your Cloud Build configuration.
D. Configure and push a manifest file into an environment repository in Cloud Source Repositories.
4.1 Diagnostic Question 03 Discussion
You have an application implemented on Compute Engine. You want to increase the durability of your application.
A. Implement a scheduled snapshot on your Compute Engine instances.
B. Implement a regional managed instance group.
C. Monitor your application’s usage metrics and implement autoscaling.
D. Perform health checks on your Compute Engine instances.
Developers on your team frequently write new versions of the code for one of your applications. You want to automate the build process when updates are pushed to Cloud Source Repositories.
A. Implement a Cloud Build configuration file with build steps.
B. Implement a build trigger that references your repository and branch.
C. Set proper permissions for Cloud Build to access deployment resources.
D. Upload application updates and Cloud Build configuration files to Cloud Source Repositories.
Your development team used Cloud Source Repositories, Cloud Build, and Artifact Registry to successfully implement the build portion of an application's CI/CD process. However, the deployment process is erroring out. Initial troubleshooting shows that the runtime environment does not have access to the build images. You need to advise the team on how to resolve the issue.
A. The runtime environment does not have permissions to the Artifact Registry in your current project.
B. The runtime environment does not have permissions to Cloud Source Repositories in your current project.
C. The Artifact Registry might be in a different project.
D. You need to specify the Artifact Registry image by name.
You are implementing a disaster recovery plan for the cloud version of your drone solution. Sending videos to the pilots is crucial from an operational perspective.
A. Hot with a low recovery time objective (RTO)
B. Warm with a high recovery time objective (RTO)
C. Cold with a low recovery time objective (RTO)
D. Hot with a high recovery time objective (RTO)
The pilot subsystem in your Delivery by Drone service is critical to your service. You want to ensure that connections to the pilots can survive a VM outage without affecting connectivity.
A. Configure proper startup scripts for your VMs.
B. Deploy a load balancer to distribute traffic across multiple machines.
C. Create persistent disk snapshots.
D. Implement a managed instance group and load balancer.
Cymbal Direct wants to improve its drone pilot interface. You want to collect feedback on proposed changes from the community of pilots before rolling out updates systemwide.
A. You should implement canary testing.
B. You should implement A/B testing.
C. You should implement a blue/green deployment.
D. You should implement an in-place release.
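Canary and A/B testing both expose only part of the user base to a new version. The sketch below shows one common way to do that: a deterministic, hash-based split. The function name, user IDs, and percentage are hypothetical, and in practice the split is usually configured in the serving platform or load balancer rather than in application code.

```python
import hashlib


def assigned_version(user_id: str, new_version_percent: int) -> str:
    """Place a user in a stable bucket from 0 to 99 and compare it to the rollout share."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return "new" if bucket < new_version_percent else "current"


if __name__ == "__main__":
    pilots = [f"pilot-{n}" for n in range(1000)]  # made-up user IDs
    exposed = sum(assigned_version(p, 10) == "new" for p in pilots)
    print(f"{exposed} of {len(pilots)} pilots see the new interface")
```

Hashing the user ID keeps each pilot on the same version across sessions, which matters when the goal is to gather consistent feedback from the exposed group.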
Securing the software development lifecycle with Cloud Build and SLSA
CI/CD with Google Cloud
Site Reliability Engineering
DevOps tech: Continuous testing | Google Cloud
Application deployment and testing strategies | Cloud Architecture Center
Chapter 17 - Testing for Reliability
Service Catalog documentation | Google Cloud
What is Disaster Recovery? | Google Cloud
API design guide
4.2 Analyzing and defining business processes
Considerations include:
● Stakeholder management (e.g. influencing and facilitation)
● Change management
● Team assessment / skills readiness
● Decision-making processes
● Customer success management
● Cost optimization / resource optimization (capex / opex)
4.3 Developing procedures to ensure reliability of solutions in production
● Chaos engineering
● Penetration testing
4.3 Diagnostic Question 10 Discussion
You want to establish procedures for testing the resilience of the delivery-by-drone solution.
A. Block access to storage assets in one of your zones.
B. Inject a bad health check for one or more of your resources.
C. Load test your application to see how it responds.
D. Block access to all resources in a zone.
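Several of the options above are forms of fault injection. As a rough illustration of the idea behind injecting a bad health check, the sketch below uses a toy in-memory model of a backend pool. The class and function names are invented for this example and do not correspond to any Google Cloud API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Backend:
    """A toy stand-in for a VM behind a load balancer."""
    name: str
    healthy: bool = True


def inject_bad_health_check(pool: List[Backend], name: str) -> None:
    """Simulate a failing health check on one backend."""
    for backend in pool:
        if backend.name == name:
            backend.healthy = False


def healthy_backends(pool: List[Backend]) -> List[str]:
    """Names of the backends that would still receive traffic."""
    return [backend.name for backend in pool if backend.healthy]


def survives(pool: List[Backend], min_healthy: int = 1) -> bool:
    """The service survives if enough healthy backends remain."""
    return len(healthy_backends(pool)) >= min_healthy


if __name__ == "__main__":
    pool = [Backend("vm-a"), Backend("vm-b"), Backend("vm-c")]
    inject_bad_health_check(pool, "vm-b")
    print(healthy_backends(pool), "survives:", survives(pool))
```

In a real experiment the fault would be injected into a running environment and the observed behavior compared against a stated hypothesis, such as traffic shifting to the remaining instances.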