4.4 - Managed Services

Managed Services

In the last module, we discussed how to automate the creation of infrastructure. As an
alternative to infrastructure automation, you can eliminate the need to create
infrastructure by leveraging a managed service.

Managed services are partial or complete solutions offered as a service. They exist on
a continuum between platform as a service and software as a service, depending on
how much of the internal methods and controls are exposed. Using a managed
service allows you to outsource a lot of the administrative and maintenance overhead
to Google, if your application requirements fit within the service offering.
Agenda
BigQuery

Cloud Dataflow

Cloud Dataprep

Cloud Dataproc

Demo

In this module, we give you an overview of BigQuery, Cloud Dataflow, Cloud Dataprep
by Trifacta, and Cloud Dataproc. Now all of these services are for data analytics
purposes, and since that’s not the focus of this course series, there won’t be any labs
in this module. Instead, we’ll have a quick demo to illustrate how easy it is to use
managed services.

Let’s start by talking about BigQuery.


BigQuery is GCP’s serverless, highly scalable, and
cost-effective cloud data warehouse

● Fully managed

● Petabyte scale

● SQL interface

● Very fast

● Free usage tier

BigQuery is GCP’s serverless, highly scalable, and cost-effective cloud data
warehouse.

It is a petabyte-scale data warehouse that allows for super-fast queries using the
processing power of Google's infrastructure. Because there is no infrastructure for
you to manage, you can focus on uncovering meaningful insights using familiar SQL
without the need for a database administrator.

BigQuery is used by all types of organizations, and there is a free usage tier to help
you get started. For more information, see the links section of this video
[https://cloud.google.com/free/].
Query example

-- Legacy SQL syntax (note the [project:dataset.table] reference and GROUP EACH BY)
SELECT language, SUM(views) AS views
FROM (
  SELECT title, language, MAX(views) AS views
  FROM [bigquery-samples:wikipedia_benchmark.Wiki100B]
  WHERE REGEXP_MATCH(title, "G.*o.*")
  GROUP EACH BY title, language
)
GROUP EACH BY language
ORDER BY views DESC

Query 100 billion rows in less than 1 minute!

You can access BigQuery by using the GCP Console, by using a command-line tool,
or by making calls to the BigQuery REST API with a variety of client libraries, such as
Java, .NET, or Python. There are also several third-party tools that you can use to
interact with BigQuery, for example to visualize or load data.

Here is an example of a query on a table with over 100 billion rows. This query
processes over 4.1 TB of data but takes less than a minute to execute. The same query
would take hours, if not days, with serial execution.
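
To make that concrete, here is a minimal sketch, not from the course itself, of running
the query above through the BigQuery Python client library. It assumes Application
Default Credentials are configured and rewrites the slide’s legacy SQL into standard
SQL; the dataset name is the one shown on the slide.

from google.cloud import bigquery

client = bigquery.Client()  # uses your Application Default Credentials

# Standard SQL equivalent of the legacy SQL query shown above.
sql = """
    SELECT language, SUM(views) AS views
    FROM (
      SELECT title, language, MAX(views) AS views
      FROM `bigquery-samples.wikipedia_benchmark.Wiki100B`
      WHERE REGEXP_CONTAINS(title, r"G.*o.*")
      GROUP BY title, language
    )
    GROUP BY language
    ORDER BY views DESC
"""

query_job = client.query(sql)       # starts the query job
for row in query_job.result():      # waits for completion and iterates the rows
    print(row.language, row.views)

Running this locally only requires installing the google-cloud-bigquery package and
having access to a project with the BigQuery API enabled.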
Agenda
BigQuery

Cloud Dataflow

Cloud Dataprep

Cloud Dataproc

Demo

Let’s learn a little bit about Dataflow.


Use Cloud Dataflow to execute a wide variety of
data processing patterns

● Serverless, fully managed data processing

● Batch and stream processing with autoscale

● Open source programming using Apache Beam

● Intelligently scale to millions of QPS
Dataflow is a managed service for executing a wide variety of data processing
patterns. It’s essentially a fully managed service for transforming and enriching data in
stream and batch modes with equal reliability and expressiveness. With Dataflow, a
lot of the complexity of infrastructure setup and maintenance is handled for you. It’s
built on Google Cloud infrastructure and autoscales to meet the demands of your data
pipelines, allowing it to intelligently scale to millions of queries per second.

Dataflow supports fast, simplified pipeline development via expressive SQL, Java,
and Python APIs in the Apache Beam SDK, which provides a rich set of windowing
and session analysis primitives as well as an ecosystem of source and sink
connectors.
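
As a hedged illustration of what such a pipeline looks like, here is a minimal batch
word-count sketch in the Beam Python SDK; the bucket paths are placeholders, not
values from the course, and the same code runs on Cloud Dataflow when you pass the
DataflowRunner options.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder options; pass --runner=DataflowRunner, --project, --region,
# and --temp_location=gs://YOUR_BUCKET/tmp to execute on Cloud Dataflow.
options = PipelineOptions()

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "Read"  >> beam.io.ReadFromText("gs://YOUR_BUCKET/input/*.txt")
        | "Split" >> beam.FlatMap(lambda line: line.split())
        | "Pair"  >> beam.Map(lambda word: (word, 1))
        | "Count" >> beam.CombinePerKey(sum)
        | "Write" >> beam.io.WriteToText("gs://YOUR_BUCKET/output/wordcount")
    )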
Stackdriver is now Google Cloud’s operations suite.

Dataflow is also tightly coupled with other Google Cloud services like Google Cloud’s
operations suite, so you can set up priority alerts and notifications to monitor your
pipeline and the quality of data coming in and out.
Data transformation with Cloud Dataflow


This diagram shows some example use cases of Dataflow. As I just mentioned,
Dataflow processes stream and batch data. This data could come from other Google
Cloud services like Datastore or Pub/Sub, which is Google’s messaging and
publishing service. The data could also be ingested from external sources such as
Apache Avro files or Apache Kafka.

After you transform the data with Dataflow, you can analyze it in BigQuery, AI
Platform, or even Cloud Bigtable. Using Data Studio, you can even build real-time
dashboards for IoT devices.
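
For example, a streaming pipeline that reads messages from Pub/Sub and writes them
to BigQuery could be sketched as follows; this is a minimal illustration, and the
project, topic, dataset, and table names are placeholders rather than values from the
course.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# streaming=True is required for the unbounded Pub/Sub source.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
              topic="projects/YOUR_PROJECT/topics/YOUR_TOPIC")
        | "Parse" >> beam.Map(lambda msg: {"payload": msg.decode("utf-8")})
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
              "YOUR_PROJECT:YOUR_DATASET.events",
              schema="payload:STRING",
              write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
    )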
Agenda
BigQuery

Cloud Dataflow

Cloud Dataprep

Cloud Dataproc

Demo

Let’s learn a little bit about Cloud Dataprep.


Use Cloud Dataprep to visually explore, clean, and
prepare data for analysis and machine learning

● Serverless, works at any scale

● Suggests ideal data transformation

● Focus on data analysis

● Integrated partner service operated by Trifacta

Cloud Dataprep is an intelligent data service for visually exploring, cleaning, and
preparing structured and unstructured data for analysis, reporting, and machine
learning.

Because Cloud Dataprep is serverless and works at any scale, there is no
infrastructure to deploy or manage. Your next ideal data transformation is suggested
and predicted with each UI input, so you don’t have to write code.

With automatic schema, datatype, possible joins, and anomaly detection, you can
skip time-consuming data profiling and focus on data analysis.

Cloud Dataprep is an integrated partner service operated by Trifacta and based on
their industry-leading data preparation solution, Trifacta Wrangler. Google works
closely with Trifacta to provide a seamless user experience that removes the need for
up-front software installation, separate licensing costs, or ongoing operational
overhead. Cloud Dataprep is fully managed and scales on demand to meet your
growing data preparation needs, so you can stay focused on analysis.
Cloud Dataprep architecture


Here’s an example of a Cloud Dataprep architecture. As you can see, Cloud Dataprep
can be leveraged to prepare raw data from BigQuery, Cloud Storage, or a file upload
before ingesting it into a transformation pipeline like Cloud Dataflow. The refined data
can then be exported to BigQuery or Cloud Storage for analysis and machine
learning.
Agenda
BigQuery

Cloud Dataflow

Cloud Dataprep

Cloud Dataproc

Demo

Let’s learn a little bit about Cloud Dataproc.


Cloud Dataproc is a service for running Apache
Spark and Apache Hadoop clusters

● Low cost (per-second, preemptible)

● Super fast to start, scale, and shut down

● Integrated with GCP

● Managed service

● Simple and familiar

Cloud Dataproc is a fast, easy-to-use, fully managed cloud service for running
Apache Spark and Apache Hadoop clusters. You only pay for the
resources you use with per-second billing. If you leverage preemptible instances in
your cluster, you can reduce your costs even further.

Without using Cloud Dataproc, it can take from five to 30 minutes to create Spark and
Hadoop clusters on-premises or through other Infrastructure-as-a-Service providers.
Cloud Dataproc clusters are quick to start, scale, and shut down, with each of these
operations taking 90 seconds or less, on average. This means you can spend less
time waiting for clusters and more hands-on time working with your data.

Cloud Dataproc has built-in integration with other GCP services, such as BigQuery,
Cloud Storage, Cloud Bigtable, Stackdriver Logging, and Stackdriver Monitoring. This
provides you with a complete data platform rather than just a Spark or Hadoop
cluster.

As a managed service, you can create clusters quickly, manage them easily, and save
money by turning clusters off when you don't need them. With less time and money
spent on administration, you can focus on your jobs and your data.

If you’re already using Spark, Hadoop, Pig, or Hive, you don’t even need to learn new
tools or APIs to use Cloud Dataproc. This makes it easy to move existing projects into
Cloud Dataproc without redevelopment.
Cloud Dataflow vs. Cloud Dataproc

Dependencies on specific tools/packages in the Apache Hadoop/Spark ecosystem?
  Yes → Cloud Dataproc
  No  → Do you favor a hands-on/DevOps approach to operations,
        or a hands-off/serverless one?
          DevOps     → Cloud Dataproc
          Serverless → Cloud Dataflow

Now, Cloud Dataproc and Cloud Dataflow can both be used for data processing, and
there’s overlap in their batch and streaming capabilities. So, how do you decide which
product is a better fit for your environment?

Well, first, ask yourself whether you have dependencies on specific tools or packages
in the Apache Hadoop or Spark ecosystem. If that’s the case, you’ll obviously want to
use Cloud Dataproc.

If not, ask yourself whether you prefer a hands-on or DevOps approach to operations,
or a hands-off or serverless approach. If you opt for the DevOps approach, you want
to use Cloud Dataproc; otherwise, use Cloud Dataflow.
Demo
Cloud Dataproc

Philipp Maier

Let me show you how to create a Cloud Dataproc cluster, modify the number of
workers in the cluster, and submit a simple Apache Spark job.

[Demo]

That’s how easy it is to create a Cloud Dataproc cluster and submit a job to that
cluster.
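
For reference, the same steps can also be scripted with the Dataproc Python client
library. The following is only a minimal sketch under assumed project, region, and
cluster names, not the exact commands used in the demo.

from google.cloud import dataproc_v1

project_id = "YOUR_PROJECT"      # placeholder values
region = "us-central1"
cluster_name = "example-cluster"
endpoint = {"api_endpoint": f"{region}-dataproc.googleapis.com:443"}

# Create a small cluster with one master and two workers.
cluster_client = dataproc_v1.ClusterControllerClient(client_options=endpoint)
cluster = {
    "project_id": project_id,
    "cluster_name": cluster_name,
    "config": {
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-2"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-2"},
    },
}
cluster_client.create_cluster(
    request={"project_id": project_id, "region": region, "cluster": cluster}
).result()

# Submit the SparkPi example job that ships with the Dataproc image.
job_client = dataproc_v1.JobControllerClient(client_options=endpoint)
job = {
    "placement": {"cluster_name": cluster_name},
    "spark_job": {
        "main_class": "org.apache.spark.examples.SparkPi",
        "jar_file_uris": ["file:///usr/lib/spark/examples/jars/spark-examples.jar"],
        "args": ["1000"],
    },
}
job_client.submit_job_as_operation(
    request={"project_id": project_id, "region": region, "job": job}
).result()

Modifying the number of workers afterwards can be done through the same client’s
update_cluster method or directly in the console.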
Review
Managed Services

In this module, we provided you with an overview of managed services for data
processing in Google Cloud, namely BigQuery, Dataflow, Dataprep, and Dataproc.

Managed services allow you to outsource a lot of the administrative and maintenance
overhead to Google, so you can focus on your workloads, instead of the
infrastructure. Speaking of infrastructure, most of the services that we covered are
serverless. Now, this doesn’t mean that there aren’t any actual servers processing
your data. Serverless means that the servers, in this case Compute Engine instances,
are abstracted away so that you don’t have to worry about the infrastructure.

Dataproc isn’t a serverless service, because you are able to view and manage the
underlying master and worker instances.
Review
Architecting with Google
Compute Engine

Thank you for taking the “Architecting with Google Compute Engine” course series!

I hope you have a better understanding of the comprehensive and flexible
infrastructure and platform services provided by GCP. I also hope that the demos and
labs made you feel more comfortable with using the different GCP services that we
covered.

Now it’s your turn. Go ahead and apply what you have learned by architecting your
own infrastructure in GCP.

See you next time!
