Google Data Analytics Coursera
(8 courses)
Data Analysis process:
1. Ask: business challenge, objective, or question
2. Prepare: data generation, collection, storage, and data management
3. Process: data cleaning and data integrity
4. Analyze: data exploration, visualization, and analysis
5. Share: communicating and interpreting results
6. Act: putting insights to work to solve the problem
COURSE 1: FOUNDATIONS: DATA, DATA, EVERYWHERE
- What you will learn:
Real-life roles and responsibilities of a junior data analyst
How businesses transform data into actionable insights
Spreadsheet basics
Database and query basics
Data visualization basics
- Skill sets you will build:
Using data in everyday life
Thinking analytically
Applying tools from the data analytics toolkit
Showing trends and patterns with data visualizations
Ensuring your data analysis is fair
Module 1: Introducing Data Analytics and Analytical Thinking
Glossary Terms (terms & definitions for Course 1, module 1):
Analytical skills: Qualities and characteristics associated with using facts to solve
problems
Analytical thinking: The process of identifying and defining a problem, then solving it by
using data in an organized, step-by-step manner
Context: The condition in which something exists or happens
Data: A collection of facts
Data analysis: The collection, transformation, and organization of data in order to draw
conclusions, make predictions, and drive informed decision-making
Data analyst: Someone who collects, transforms, and organizes data in order to draw
conclusions, make predictions, and drive informed decision-making
Data analytics: The science of data
Data design: How information is organized
Data-driven decision-making: Using facts to guide business strategy
Data ecosystem: The various elements that interact with one another in order to produce,
manage, store, organize, analyze, and share data
Data science: A field of study that uses raw data to create new ways of modeling and
understanding the unknown
Data strategy: The management of the people, processes, and tools used in data analysis
Data visualization: The graphical representation of data
Dataset: A collection of data that can be manipulated or analyzed as one unit
Gap analysis: A method for examining and evaluating the current state of a process in
order to identify opportunities for improvement in the future
Root cause: The reason why a problem occurs
Technical mindset: The ability to break things down into smaller steps or pieces and work
with them in an orderly and logical way
Visualization: (Refer to data visualization)
Module 2: The Wonderful World of Data
Spreadsheets
Data analysts rely on spreadsheets to collect and organize data. Two popular spreadsheet
applications you will probably use a lot in your future role as a data analyst are Microsoft
Excel and Google Sheets.
Spreadsheets structure data in a meaningful way by letting you
Collect, store, organize, and sort information
Identify patterns and piece the data together in a way that works for each specific
data project
Create excellent data visualizations, like graphs and charts
Databases and query languages
A database is a collection of structured data stored in a computer system. Some popular
Structured Query Language (SQL) programs include MySQL, Microsoft SQL Server, and
BigQuery.
Query languages
Allow analysts to isolate specific information from a database(s)
Make it easier for you to learn and understand the requests made to databases
Allow analysts to select, create, add, or download data from a database for analysis
Visualization tools
Data analysts use a number of visualization tools, like graphs, maps, tables, charts, and
more. Two popular visualization tools are Tableau and Looker.
These tools
Turn complex numbers into a story that people can understand
Help stakeholders come up with conclusions that lead to informed decisions and
effective business strategies
Have multiple features
- Tableau's simple drag-and-drop feature lets users create interactive graphs in
dashboards and worksheets
- Looker communicates directly with a database, allowing you to connect your data
right to the visual tool you choose
Module 3: Set Up Your Data Analytics Toolbox
Example of SQL query (with multiple columns and multiple fields):
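A minimal sketch of a query that returns several columns and filters on several conditions. It runs through Python's built-in sqlite3 module so it can be tried without a database account. The customers table, its columns, and its rows are hypothetical and invented for illustration.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers "
    "(customer_id INTEGER, first_name TEXT, zip_code TEXT, membership TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1111, "Ana", "81020", "gold"),
        (1234, "Ben", "81020", "silver"),
        (1301, "Cal", "81022", "gold"),
        (1402, "Dee", "81020", "gold"),
    ],
)

# SELECT lists the columns to return, FROM names the table,
# and WHERE combines several conditions with AND.
query = """
SELECT customer_id, first_name
FROM customers
WHERE customer_id > 1200
  AND zip_code = '81020'
  AND membership = 'gold'
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Only rows that satisfy all three conditions are returned; replacing AND with OR would return rows that satisfy any one of them.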
Resources to learn more:
- SQL Tutorial: https://www.w3schools.com/sql/default.asp
- SQL Cheat Sheet: https://www.sqltutorial.org/sql-cheat-sheet/
- Tableau Tutorial: https://public.tableau.com/app/learn/how-to-videos
- RStudio Learning: https://posit.co/
- RStudio Cheat Sheets: https://posit.co/resources/cheatsheets/
- RStudio: https://posit.cloud/learn/recipes
- Excel Video Training: https://support.microsoft.com/en-us/office/excel-video-training-9bc05390-e94c-46af-a5b3-d7c22f6990bb
Module 4: Become a Fair and Impactful Data Professional
Decoding the job description
The data analyst role is one of many job titles that contain the word “analyst.”
To name a few others that sound similar but may not be the same role:
Business analyst—analyzes data to help businesses improve processes, products, or
services
Data analytics consultant—analyzes the systems and models for using data
Data engineer—prepares and integrates data from different sources for analytical use
Data scientist—uses expert skills in technology and social science to find trends
through data analysis
Data specialist—organizes or converts data for use in databases or software systems
Operations analyst—analyzes data to assess the performance of business operations
and workflows
Data analysts, data scientists, and data specialists sound very similar but focus on different
tasks. As you start to browse job listings online, you might notice that companies’ job
descriptions seem to combine these roles or look for candidates who may have overlapping
skills. The fact that companies often blur the lines between them means that you should
take special care when reading the job descriptions and the skills required.
The table below illustrates some of the overlap and distinctions between them:
Job specializations by industry
We learned that the data specialist role concentrates on in-depth knowledge of databases.
In similar fashion, other specialist roles for data analysts can focus on in-depth knowledge
of specific industries. For example, in a job as a business analyst you might wear some
different hats than in a more general position as a data analyst. As a business analyst, you
would likely collaborate with managers, share your data findings, and maybe explain how a
small change in the company’s project management system could save the company 3%
each quarter. Although you would still be working with data all the time, you would focus on
using the data to improve business operations, efficiencies, or the bottom line.
Other industry-specific specialist positions that you might come across in your data analyst
job search include:
Marketing analyst—analyzes market conditions to assess the potential sales of
products and services
HR/payroll analyst—analyzes payroll data for inefficiencies and errors
Financial analyst—analyzes financial status by collecting, monitoring, and reviewing
data
Risk analyst—analyzes financial documents, economic conditions, and client data to
help companies determine the level of risk involved in making a particular business
decision
Healthcare analyst—analyzes medical data to improve the business aspect of
hospitals and medical facilities
COURSE 2: ASK QUESTIONS TO MAKE DATA-DRIVEN DECISIONS
- What you will learn:
How data analysts solve problems with data
The use of analytics for making data-driven decisions
Spreadsheet formulas and functions
Dashboard basics, including an introduction to Tableau
Data reporting basics
- Skill sets you will build:
Asking SMART and effective questions
Structuring how you think
Summarizing data
Putting things into context
Managing team and stakeholder expectations
Problem-solving and conflict-resolution
Module 1: Ask Effective Questions
Glossary terms (terms and definitions for Course 2, Module 1)
Action-oriented question: A question whose answers lead to change
Cloud: A place to keep data online, rather than on a computer hard drive
Data analysis process: The six phases of ask, prepare, process, analyze, share, and act
whose purpose is to gain insights that drive informed decision-making
Data life cycle: The sequence of stages that data experiences, which include plan,
capture, manage, analyze, archive, and destroy
Leading question: A question that steers people toward a certain response
Measurable question: A question whose answers can be quantified and assessed
Problem types: The various problems that data analysts encounter, including categorizing
things, discovering connections, finding patterns, identifying themes, making predictions,
and spotting something unusual
Relevant question: A question that has significance to the problem to be solved
SMART methodology: A tool for determining a question’s effectiveness based on whether
it is specific, measurable, action-oriented, relevant, and time-bound
Specific question: A question that is simple, significant, and focused on a single topic or a
few closely related ideas
Structured thinking: The process of recognizing the current problem or situation,
organizing available information, revealing gaps and opportunities, and identifying options
Time-bound question: A question that specifies a timeframe to be studied
Unfair question: A question that makes assumptions or is difficult to answer honestly
Module 2: Make Data-driven Decisions
Module 3: Spreadsheets Magic
Glossary terms (terms and definitions for Course 2, Module 3)
AVERAGE: A spreadsheet function that returns an average of the values from a selected
range
Borders: Lines that can be added around two or more cells on a spreadsheet
Cell reference: A cell or a range of cells in a worksheet typically used in formulas and
functions
COUNT: A spreadsheet function that counts the number of cells in a range that contain
numeric values (COUNTIF counts the cells that meet a specific criterion)
Equation: A calculation that involves addition, subtraction, multiplication, or division (also
called a math expression)
Fill handle: A box in the lower-right-hand corner of a selected spreadsheet cell that can be
dragged through neighboring cells in order to continue an instruction
Filtering: The process of showing only the data that meets specified criteria while hiding
the rest
Header: The first row in a spreadsheet that labels the type of data in each column
Math expression: A calculation that involves addition, subtraction, multiplication, or
division (also called an equation)
Math function: A function that is used as part of a mathematical formula
MAX: A spreadsheet function that returns the largest numeric value from a range of cells
MIN: A spreadsheet function that returns the smallest numeric value from a range of cells
Open data: Data that is available to the public
Operator: A symbol that names the operation or calculation to be performed
Order of operations: Using parentheses to group together spreadsheet values in order to
clarify the order in which operations should be performed
Problem domain: The area of analysis that encompasses every activity affecting or
affected by a problem
Range: A collection of two or more cells in a spreadsheet
Report: A static collection of data periodically given to stakeholders
Return on investment (ROI): A formula that uses the metrics of investment and profit to
evaluate the success of an investment
Revenue: The total amount of income generated by the sale of goods or services
Scope of work (SOW): An agreed-upon outline of the tasks to be performed during a
project
Sorting: The process of arranging data into a meaningful order to make it easier to
understand, analyze, and visualize
SUM: A spreadsheet function that adds the values of a selected range of cells
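As a rough illustration of the spreadsheet functions defined above, the sketch below reproduces them in Python on a small made-up range. The values and the B2:B6 references are assumptions for the example, not data from the course.

```python
from statistics import mean

# Hypothetical values standing in for a spreadsheet range such as B2:B6.
cells = [120, 80, "n/a", 200, 100]

# SUM, AVERAGE, MAX, MIN, and COUNT skip text cells in a range,
# so keep only the numeric values.
numbers = [c for c in cells if isinstance(c, (int, float))]

total = sum(numbers)      # like =SUM(B2:B6)
average = mean(numbers)   # like =AVERAGE(B2:B6)
largest = max(numbers)    # like =MAX(B2:B6)
smallest = min(numbers)   # like =MIN(B2:B6)
count = len(numbers)      # like =COUNT(B2:B6)

# Sorting arranges the data in a meaningful order;
# filtering shows only the values that meet a condition.
ascending = sorted(numbers)
at_least_100 = [n for n in numbers if n >= 100]

print(total, average, largest, smallest, count)
```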
Module 4: Always Remember the Stakeholders
Course 3: Prepare Data for Exploration
- What you will learn:
How data is generated
Features of different data types, fields, and values
Database structures
The function of metadata in data analytics
Structured Query Language (SQL) functions
- Skill sets you will build:
Ensuring ethical data analysis practices
Addressing issues of bias and credibility
Accessing databases and importing data
Writing simple queries
Organizing and protecting data
Connecting with the data community (optional)
Module 1: Data Types and Structures
Module 2: Data Responsibility
Module 3: Database Essentials
Module 4: Organize and Protect Data
Module 5: Engage in Data Community
Course 4: Process Data from Dirty to Clean
- What you will learn:
Data integrity and the importance of clean data
The tools and processes used by data analysts to clean data
Data-cleaning verification and reports
Statistics, hypothesis testing, and margin of error
Resume building and interpretation of job postings (optional)
- Skill sets you will build:
Connecting business objectives to data analysis
Identifying clean and dirty data
Cleaning small datasets using spreadsheet tools
Cleaning large datasets by writing SQL queries
Documenting data-cleaning processes
Course 5: Analyze Data to Answer Questions
- What you will learn:
Steps data analysts take to organize data
How to combine data from multiple sources
Spreadsheet calculations and pivot tables
SQL calculations
Temporary tables
Data validation
- Skill sets you will build:
Sorting data in spreadsheets and by writing SQL queries
Filtering data in spreadsheets and by writing SQL queries
Converting data
Formatting data
Substantiating data analysis processes
Seeking feedback and support from others during data analysis
Objective
The objective of this query is to aggregate the data into a table containing each
warehouse's ID, state and alias, and number of orders; as well as the grand total of orders
for all warehouses combined; and finally a column that classifies each warehouse by the
percentage of grand total orders that it fulfilled: 0–20%, 21–60%, or > 60%.
Note: This activity breaks out the steps into manageable chunks. The final query is only
intended to be run at the end. If you try to run the query before reaching the end of this
guide you will likely get an error.
Example: Combine and alias the tables
As a refresher, aliasing is when you temporarily name a table or column in your query to
make it easier to read and write. To alias the warehouse and orders tables and join the
tables, follow these steps. Remember, these statements require that you enter your unique
individual project name or else they won't run. Be sure to substitute your project name in
the code wherever you encounter your-project written. If you haven't explicitly assigned a
project name, BigQuery generates one for you automatically. It typically looks like two
words and a number, each separated by a hyphen, for example august-west-100777.
Begin with the FROM statement a few rows down. Later, you'll return to the top of the
query to fill it in.
1. In row 14, enter FROM your-project.warehouse_orders.warehouse AS Warehouse
2. In row 15, enter LEFT JOIN your-project.warehouse_orders.orders AS Orders
3. In row 16, enter ON Orders.warehouse_id = Warehouse.warehouse_id
These statements will combine the two tables (warehouse and orders) using
warehouse_id as the common key (the column shared by both tables).
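The join can be tried locally before moving on. This is a minimal sketch that assumes Python's built-in sqlite3 module in place of BigQuery and a few invented rows; the table names drop the your-project.warehouse_orders prefix, and the SELECT list is a placeholder chosen only to make the join visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented sample data: warehouse 3 has no orders yet.
conn.executescript("""
CREATE TABLE warehouse (warehouse_id INTEGER, state TEXT, warehouse_alias TEXT);
CREATE TABLE orders (order_id INTEGER, warehouse_id INTEGER);
INSERT INTO warehouse VALUES (1, 'CA', 'North'), (2, 'TX', 'South'), (3, 'NY', 'East');
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# Alias both tables, then LEFT JOIN on the shared warehouse_id key.
query = """
SELECT Warehouse.warehouse_id, Orders.order_id
FROM warehouse AS Warehouse
LEFT JOIN orders AS Orders
  ON Orders.warehouse_id = Warehouse.warehouse_id
ORDER BY Warehouse.warehouse_id, Orders.order_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Because this is a LEFT JOIN, every warehouse appears in the result; a warehouse with no matching orders gets a NULL order_id (None in Python).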
Example: Organize your new table
Use the GROUP BY clause in SQL to group rows that have the same values in specified
columns into aggregated data, such as sum, count, average, maximum, or minimum, based
on the values in another column. This operation is particularly useful in databases where
there is a need to analyze data based on certain criteria.
1. In row 17, enter GROUP BY
2. In row 18, enter Warehouse.warehouse_id,
3. In row 19, enter warehouse_name
Here, the combined table is grouped first by the warehouse ID and then by its name.
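A minimal sketch of the grouping step, again assuming Python's built-in sqlite3 module and invented rows rather than the BigQuery dataset. SQLite's || operator stands in for CONCAT here, since CONCAT is only available in newer SQLite releases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented sample data: warehouse 3 has no orders yet.
conn.executescript("""
CREATE TABLE warehouse (warehouse_id INTEGER, state TEXT, warehouse_alias TEXT);
CREATE TABLE orders (order_id INTEGER, warehouse_id INTEGER);
INSERT INTO warehouse VALUES (1, 'CA', 'North'), (2, 'TX', 'South'), (3, 'NY', 'East');
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# One output row per (warehouse_id, warehouse_name) group;
# COUNT aggregates the joined order rows inside each group.
query = """
SELECT
  Warehouse.warehouse_id,
  Warehouse.state || ': ' || Warehouse.warehouse_alias AS warehouse_name,
  COUNT(Orders.order_id) AS number_of_orders
FROM warehouse AS Warehouse
LEFT JOIN orders AS Orders
  ON Orders.warehouse_id = Warehouse.warehouse_id
GROUP BY
  Warehouse.warehouse_id,
  warehouse_name
ORDER BY Warehouse.warehouse_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

COUNT(Orders.order_id) skips NULL values, so the warehouse without orders is counted as 0 rather than 1.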
Example: Build subquery logic
Now that you have the FROM statement and JOIN, go back up to the first lines and define
the columns to select and the operations to perform on them. From the objective, you know you
want to return five columns:
1. warehouse_id: each warehouse's ID
2. warehouse_name: the warehouse's state and alias, combined into a single column
3. number_of_orders: the number of orders for that warehouse
4. total_orders: the grand total of orders for all warehouses combined
5. fulfillment_summary: a label that classifies each warehouse by the percentage of grand
total orders that it fulfilled: 0–20%, 21–60%, or > 60%
Above everything you've written so far, write:
1. In row 1, enter SELECT
2. In row 2, enter Warehouse.warehouse_id, # (This is the first column.)
3. In row 3, enter CONCAT(Warehouse.state, ': ', Warehouse.warehouse_alias) AS warehouse_name, # (This is the second column. Notice you're concatenating two existing columns into a new one.)
4. In row 4, enter COUNT(Orders.order_id) AS number_of_orders, # (This is the third column.)
5. In row 5, enter (SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) AS total_orders, # (This is the fourth column.)
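The fourth column relies on a subquery: a complete SELECT nested inside the outer SELECT list. The sketch below is a minimal illustration, assuming Python's built-in sqlite3 module and invented rows in place of the BigQuery tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented sample data: three orders in total, none for warehouse 3.
conn.executescript("""
CREATE TABLE warehouse (warehouse_id INTEGER, state TEXT, warehouse_alias TEXT);
CREATE TABLE orders (order_id INTEGER, warehouse_id INTEGER);
INSERT INTO warehouse VALUES (1, 'CA', 'North'), (2, 'TX', 'South'), (3, 'NY', 'East');
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# The subquery does not depend on the outer row, so it returns the same
# grand total next to every warehouse's own order count.
query = """
SELECT
  Warehouse.warehouse_id,
  COUNT(Orders.order_id) AS number_of_orders,
  (SELECT COUNT(*) FROM orders) AS total_orders
FROM warehouse AS Warehouse
LEFT JOIN orders AS Orders
  ON Orders.warehouse_id = Warehouse.warehouse_id
GROUP BY Warehouse.warehouse_id
ORDER BY Warehouse.warehouse_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```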
To create the final column, you'll need to use a special keyword.
Example: Create categories using CASE
Use the CASE keyword in SQL to create categories or group data based on specific
conditions. This is valuable when dealing with numerical or textual data that needs to be
segmented into different groups or categories for analysis, reporting, or visualization
purposes.
For the final column, you'll use CASE to define which label to apply to each warehouse's
fulfillment percentage (the percentage of the grand total of orders that it fulfilled). There
will be three conditions, and thus three possible labels: 'Fulfilled 0-20% of Orders',
'Fulfilled 21-60% of Orders', or 'Fulfilled more than 60% of Orders'.
1. In row 6, enter CASE
2. In row 7, enter WHEN COUNT(Orders.order_id)/(SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) <= 0.20 # (This defines the first possible condition.)
3. In row 8, enter THEN 'Fulfilled 0-20% of Orders' # (THEN defines the label to apply when the first condition is true.)
4. In row 9, enter WHEN COUNT(Orders.order_id)/(SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) > 0.20 # (This is the first part of the second condition.)
5. In row 10, enter AND COUNT(Orders.order_id)/(SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) <= 0.60 # (This is the second part of the second condition.)
6. In row 11, enter THEN 'Fulfilled 21-60% of Orders' # (This defines the label to apply when the second condition is true.)
7. In row 12, enter ELSE 'Fulfilled more than 60% of Orders' # (This defines the label to apply when neither of the first two conditions is true.)
8. In row 13, enter END AS fulfillment_summary # (The END keyword terminates the CASE declaration. Then the AS keyword indicates what the resulting column should be named.)
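The CASE logic can be checked on a small dataset. This is a sketch under stated assumptions: Python's built-in sqlite3 module instead of BigQuery, and ten invented orders split 7, 3, and 0 across three warehouses. SQLite divides two integers as integers, so the counts are multiplied by 1.0 first; BigQuery's / operator returns a floating-point result and does not need this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE warehouse (warehouse_id INTEGER, state TEXT, warehouse_alias TEXT);
CREATE TABLE orders (order_id INTEGER, warehouse_id INTEGER);
INSERT INTO warehouse VALUES (1, 'CA', 'North'), (2, 'TX', 'South'), (3, 'NY', 'East');
""")
# Invented data: warehouse 1 gets 7 of 10 orders, warehouse 2 gets 3, warehouse 3 gets none.
sample_orders = [(i, 1) for i in range(1, 8)] + [(i, 2) for i in range(8, 11)]
conn.executemany("INSERT INTO orders VALUES (?, ?)", sample_orders)

# Each WHEN tests the warehouse's share of all orders; ELSE catches the rest.
query = """
SELECT
  Warehouse.warehouse_id,
  CASE
    WHEN COUNT(Orders.order_id) * 1.0 / (SELECT COUNT(*) FROM orders) <= 0.20
      THEN 'Fulfilled 0-20% of Orders'
    WHEN COUNT(Orders.order_id) * 1.0 / (SELECT COUNT(*) FROM orders) > 0.20
      AND COUNT(Orders.order_id) * 1.0 / (SELECT COUNT(*) FROM orders) <= 0.60
      THEN 'Fulfilled 21-60% of Orders'
    ELSE 'Fulfilled more than 60% of Orders'
  END AS fulfillment_summary
FROM warehouse AS Warehouse
LEFT JOIN orders AS Orders
  ON Orders.warehouse_id = Warehouse.warehouse_id
GROUP BY Warehouse.warehouse_id
ORDER BY Warehouse.warehouse_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

WHEN branches are evaluated in order and the first true one wins, so the > 0.20 check in the second branch is redundant but harmless; it is kept here to mirror the steps above.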
Example: Filter using HAVING
Use the HAVING clause in SQL in combination with the GROUP BY clause to filter the
results of aggregate functions in a query. While the WHERE clause filters individual rows
before they are grouped, the HAVING clause filters groups of rows after they have been
grouped. To filter out the warehouses that are currently being built (and therefore have no
orders), enter the following lines below everything you've written so far:
1. In row 20, enter HAVING
2. In row 21, enter COUNT(Orders.order_id) > 0
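A minimal sketch of the HAVING filter, assuming Python's built-in sqlite3 module and invented rows in which warehouse 3 has no orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented sample data: warehouse 3 is still being built and has no orders.
conn.executescript("""
CREATE TABLE warehouse (warehouse_id INTEGER, state TEXT, warehouse_alias TEXT);
CREATE TABLE orders (order_id INTEGER, warehouse_id INTEGER);
INSERT INTO warehouse VALUES (1, 'CA', 'North'), (2, 'TX', 'South'), (3, 'NY', 'East');
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# HAVING runs after GROUP BY, so it can test the aggregated order count.
query = """
SELECT
  Warehouse.warehouse_id,
  COUNT(Orders.order_id) AS number_of_orders
FROM warehouse AS Warehouse
LEFT JOIN orders AS Orders
  ON Orders.warehouse_id = Warehouse.warehouse_id
GROUP BY Warehouse.warehouse_id
HAVING COUNT(Orders.order_id) > 0
ORDER BY Warehouse.warehouse_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The same condition cannot go in a WHERE clause, because WHERE is applied to individual rows before any counts exist.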
Here is the final query:
SELECT
  Warehouse.warehouse_id,
  CONCAT(Warehouse.state, ': ', Warehouse.warehouse_alias) AS warehouse_name,
  COUNT(Orders.order_id) AS number_of_orders,
  (SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) AS total_orders,
  CASE
    WHEN COUNT(Orders.order_id)/(SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) <= 0.20
    THEN 'Fulfilled 0-20% of Orders'
    WHEN COUNT(Orders.order_id)/(SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) > 0.20
    AND COUNT(Orders.order_id)/(SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) <= 0.60
    THEN 'Fulfilled 21-60% of Orders'
    ELSE 'Fulfilled more than 60% of Orders'
  END AS fulfillment_summary
FROM your-project.warehouse_orders.warehouse AS Warehouse
LEFT JOIN your-project.warehouse_orders.orders AS Orders
  ON Orders.warehouse_id = Warehouse.warehouse_id
GROUP BY
  Warehouse.warehouse_id,
  warehouse_name
HAVING
  COUNT(Orders.order_id) > 0
Course 6: Share Data Through the Art of Visualization
- What you will learn:
Design thinking
How data analysts use visualizations to communicate about data
The benefits of Tableau for presenting data analysis findings
Data-driven storytelling
Dashboards and dashboard filters
Strategies for creating an effective data presentation
- Skill sets you will build:
Creating visualizations and dashboards in Tableau
Addressing accessibility issues when communicating about data
Understanding the purpose of different business communication tools
Telling a data-driven story
Presenting to others about data
Answering questions about data
Course 7: Data Analysis with R Programming
- What you will learn:
Programming languages and environments
R packages
R functions, variables, data types, pipes, and vectors
R data frames
Bias and credibility in R
R visualization tools
R Markdown for documentation, creating structure, and emphasis
- Skill sets you will build:
Coding in R
Writing functions in R
Accessing data in R
Cleaning data in R
Generating data visualizations in R
Reporting on data analysis to stakeholders
Course 8: Data Analytics Capstone
- What you will learn:
How a data analytics portfolio distinguishes you from other candidates
Practical, real-world problem-solving
Strategies for extracting insights from data
Clear presentation of data findings
Motivation and ability to take initiative
- Skill sets you will build:
Building a portfolio
Increasing your employability
Showcasing your data analytics knowledge, skill, and technical expertise
Sharing your work during an interview
Communicating your unique value proposition to a potential employer