Analyze Data To Answer Questions Mod 3

google data

Uploaded by

studynandu17

Analyze Data to Answer Questions > Module 3

VLOOKUP Core Concepts


Spreadsheet functions can be used to quickly find information and perform calculations
using specific values. VLOOKUP, or Vertical Lookup, is one such function that vertically
searches for a certain value in a column to return a corresponding piece of information. In
this reading, you’ll examine the intricacies of this extremely useful function so you
understand how it works when you use it to analyze data.

VLOOKUP functionality

VLOOKUP searches for a search term, called a search_key, in one column of a spreadsheet.
When the search_key is found, the function returns the data from another column in the row
where it was found. VLOOKUP returns only the value that corresponds to the first item
it matches. So, if there are multiple matching values, the spreadsheet will return only data
about the first one.

VLOOKUP use cases

Here are two common reasons why you might use VLOOKUP:

• Populating data in a spreadsheet. Perhaps a store manager is tracking incoming
shipments before a busy holiday. They could use VLOOKUP to look up product ID codes
in a product spreadsheet and retrieve the corresponding product information from
another spreadsheet. This would help the manager know how many stock clerks they
need to schedule to work when the shipments arrive.

• Merging data from one spreadsheet with data in another. If a teacher keeps
one spreadsheet for student grades and another for attendance, they could use
VLOOKUP to combine the spreadsheets. That way, they could search for a particular
student in the attendance sheet, and VLOOKUP would pull the corresponding
attendance record into the grades spreadsheet.

VLOOKUP syntax

VLOOKUP is available in both Microsoft Excel and Google Sheets. Here, you’ll explore its
syntax in Google Sheets. Refer to the resources at the end of this reading for more
information about VLOOKUP in Microsoft Excel.

VLOOKUP’s syntax is:

=VLOOKUP(search_key, range, index, is_sorted)

The following sections explain each of the four parts of the syntax.

search_key

This is the value the function will search for. It can be a number, text string, or cell
reference.

range

This is the range of cells over which the function will search and return information. The first
column in the range is searched. When the search_key is found, the value in the index column
of that row is returned.

For example, if you search for the search_key in column B and return the data from column
D, the range would need to include columns B through D, such as the range B2:D10. If you
specified a range of A2:D10, the function would search for the search term in column A.

The column that contains the search_key must be to the left of the information you want the
function to return. This may require you to move columns around before you use VLOOKUP.
For example, if you plan to search for the search_key in column D, but the information you
want the function to return is in column A, you must rearrange your columns before using
VLOOKUP.

index

This is the position of the column that contains the data to be returned. The first column in
the range is column number 1, and each column is numbered sequentially to the right.

For example, if the range is B2:D10 and you want to return a value from column D, the index
number would be 3. If the index is not between 1 and the number of columns in range, the
error message #VALUE! will be returned.

is_sorted

This indicates whether to return an approximate or exact match. With an exact match, only a
value equal to the search_key counts. For example, if you’re searching for Google, then
Googol would not count as a match.

• To return an exact match, set is_sorted to FALSE. This is recommended.

• To return an approximate match, set is_sorted to TRUE. The nearest match (less than
or equal to the search_key) is returned. To obtain accurate results with this option,
you must sort the first column of the range in ascending order. If the data isn’t sorted,
the function may still return a value, but it might not be the one you want.

• If neither TRUE nor FALSE is specified, the function defaults to TRUE.

The #N/A error

#N/A indicates that a matching value can't be returned because no matches were found.
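
To make the roles of search_key, range, index, and is_sorted concrete, here is a rough Python sketch that imitates the behavior described above. It is only an illustration, not how a spreadsheet implements VLOOKUP: the vlookup function, the NA and VALUE_ERR strings, and the products data are all made up for this example, and the sketch assumes the table has at least one row.

```python
from bisect import bisect_right

NA = "#N/A"            # stand-in for the "no match found" error
VALUE_ERR = "#VALUE!"  # stand-in for an index outside the range


def vlookup(search_key, table, index, is_sorted=True):
    """Imitate VLOOKUP over a list of rows; the first item of each row is searched."""
    if not 1 <= index <= len(table[0]):
        return VALUE_ERR
    if not is_sorted:
        # Exact match: scan from the top and stop at the first matching row.
        for row in table:
            if row[0] == search_key:
                return row[index - 1]
        return NA
    # Approximate match: assumes the first column is sorted in ascending order
    # and picks the last key that is less than or equal to search_key.
    first_column = [row[0] for row in table]
    position = bisect_right(first_column, search_key)
    if position == 0:
        return NA
    return table[position - 1][index - 1]


# Hypothetical product table: ID, name, price.
products = [
    [101, "Vanilla", 4.50],
    [205, "Mint", 5.25],
    [205, "Mint (duplicate row)", 5.75],
    [310, "Mango", 6.00],
]

print(vlookup(205, products, 2, is_sorted=False))  # the first matching row wins
print(vlookup(150, products, 2, is_sorted=True))   # nearest key at or below 150
print(vlookup(999, products, 2, is_sorted=False))  # no match
print(vlookup(101, products, 5, is_sorted=False))  # index is outside the range
```

Notice that the exact-match branch never looks past the first row it matches, which is why duplicate keys return only the first row's data.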

Key takeaways

Use VLOOKUP to search for a value in a column and return a corresponding piece of
information. It’s a very useful tool for data professionals, as it enables them to combine data
from multiple sources and find information quickly. Keep in mind that the column that
matches the search_key in a VLOOKUP formula should be on the left side of the data. The
range must include both the column being searched and the column that contains the
information being returned. TRUE means an approximate match, and FALSE means an exact
match on the search_key.

VLOOKUP resources for Microsoft Excel

VLOOKUP may differ slightly in Microsoft Excel, but the overall concepts still apply.
Refer to the following resources if you’re working with Excel:

• How to use VLOOKUP in Excel: This tutorial includes a video to help you get a general
understanding of how the VLOOKUP function works in Excel, as well as practical examples
to look through.

• VLOOKUP in Excel tutorial: Follow along in this video lesson and learn how to write a
VLOOKUP formula in Excel and master useful time-saving tips and tricks.

• 23 things you should know about VLOOKUP in Excel: Explore this list of VLOOKUP facts,
common challenges, and their solutions.

• How to use Excel's VLOOKUP function: This article shares a specific example about
applying VLOOKUP in your searches.

• VLOOKUP in Excel vs Google Sheets: This guide offers a VLOOKUP comparison of Excel
and Google Sheets.

Secret identities: The importance of aliases


In this reading, you will learn about using aliasing to simplify your SQL queries. Aliases are
used in SQL queries to create temporary names for a column or table. They make
referencing tables and columns much simpler when the table or column names are too
long or complex to use comfortably in queries. Imagine a table name
like special_projects_customer_negotiation_mileages. That would be difficult to re-enter
every time you use that table. With an alias, you can give the table a meaningful nickname
of your choosing for your analysis. In this case,
special_projects_customer_negotiation_mileages can be aliased to simply mileage.

Basic syntax for aliasing

Aliasing is the process of using aliases. In SQL queries, aliases are implemented with
the AS keyword. The basic syntax for aliasing a table is:

FROM table_name AS alias_name

Notice that AS is preceded by the table name and followed by the new nickname. It is a
similar approach to aliasing a column:

SELECT column_name AS alias_name

In both cases, you now have a new name that you can use to refer to the column or table
that was aliased.

Alternate syntax for aliases

If using AS results in an error when running a query because the SQL database you are
working with doesn't support it, you can leave it out. In the previous examples, the alternate
syntax for aliasing a table or column would be:

• FROM table_name alias_name

• SELECT column_name alias_name


The key takeaway is that queries can run with or without using AS for aliasing, but using AS
has the benefit of making queries more readable. It helps to make aliases stand out more
clearly.
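
As a quick check that both forms behave the same, the sketch below runs one query with AS and one without it. It uses Python's built-in sqlite3 module as a stand-in database, which is an assumption made for this example; the column names and rows are also made up. Only the long table name and the mileage alias come from this reading.

```python
import sqlite3

# In-memory SQLite database used as a stand-in; all data here is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE special_projects_customer_negotiation_mileages "
    "(customer_name TEXT, miles_driven INTEGER)"
)
conn.executemany(
    "INSERT INTO special_projects_customer_negotiation_mileages VALUES (?, ?)",
    [("Acme", 120), ("Globex", 45)],
)

# Table and column aliases written with AS.
with_as = conn.execute(
    "SELECT mileage.miles_driven AS miles "
    "FROM special_projects_customer_negotiation_mileages AS mileage "
    "ORDER BY miles"
)
with_as_columns = [column[0] for column in with_as.description]
with_as_rows = with_as.fetchall()

# The same aliases written without AS.
without_as = conn.execute(
    "SELECT mileage.miles_driven miles "
    "FROM special_projects_customer_negotiation_mileages mileage "
    "ORDER BY miles"
)
without_as_columns = [column[0] for column in without_as.description]
without_as_rows = without_as.fetchall()

print(with_as_columns, with_as_rows)
print(without_as_columns, without_as_rows)
```

Both queries return a column named miles with the same rows, so the choice between them comes down to readability.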

Aliasing in action

Let’s check out an example of a SQL query that uses aliasing. Let’s say that you are working
with two tables: one of them has employee data and the other one has department data.
The FROM statement to alias those tables could be:

These aliases still let you know exactly what is in these tables, but now you don’t have to
manually input those long table names. Aliases can be really helpful for long, complicated
queries. It is easier to read and write your queries when you have aliases that tell you what
is included within your tables.

For more information

If you are interested in learning more about aliasing, here are some resources to help you
get started:

• SQL Aliases: This tutorial on aliasing is a really useful resource to have when you start
practicing writing queries and aliasing tables on your own. It also demonstrates how
aliasing works with real tables.

• SQL Alias: This detailed introduction to aliasing includes multiple examples. This is
another great resource to reference if you need more examples.

• Using Column Aliasing: This is a guide that focuses on column aliasing specifically.
Generally, you will be aliasing entire tables, but if you find yourself needing to alias just a
column, this is a great resource to have bookmarked.

Use JOINs effectively


In this reading, you will review how JOINs are used and will be introduced to some resources
that you can use to learn more about them. A JOIN combines tables by using a primary or
foreign key to align the information coming from both tables in the combination process.
JOINs use these keys to identify relationships and corresponding values across tables.

If you need a refresher on primary and foreign keys, refer to the glossary for this course, or
go back to Databases in data analytics.

The general JOIN syntax


As you can see from the syntax, the JOIN statement is part of the FROM clause of the query.
JOIN in SQL indicates that you are going to combine data from two tables. ON in SQL
identifies how the tables are to be matched for the correct information to be combined from
both.

Types of JOINs

There are four general ways in which to conduct JOINs in SQL queries: INNER, LEFT, RIGHT,
and FULL OUTER.

The circles represent left and right tables, and where they are joined is highlighted in blue.

Here is what these different JOIN queries do.

INNER JOIN

INNER is optional in this SQL query because it is the default as well as the most commonly
used JOIN operation. You may see this as JOIN only. INNER JOIN returns records if the data
lives in both tables. For example, if you use INNER JOIN for the customers and orders tables
and match the data using the customer_id key, you would combine the data for each
customer_id that exists in both tables. If a customer_id exists in the customers table but not
the orders table, data for that customer_id isn’t joined or returned by the query.

The results from the query might look like the following, where customer_name is from the
customers table and product_id and ship_date are from the orders table:

customer_name            product_id   ship_date
Martin's Ice Cream       043998       2021-02-23
Beachside Treats         872012       2021-02-25
Mona's Natural Flavors   724956       2021-02-28
...etc.                  ...etc.      ...etc.

The data from both tables was joined together by matching the customer_id common to
both tables. Notice that customer_id doesn’t show up in the query results. It is simply used
to establish the relationship between the data in the two tables so the data can be joined
and returned.
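
To see this behavior on a small scale, here is a minimal sketch that runs an INNER JOIN with Python's built-in sqlite3 module. SQLite is used only as a convenient stand-in database, and the customer_id values and the customer without an order are assumptions added for illustration.

```python
import sqlite3

# In-memory stand-in database; customer_id values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, customer_name TEXT)")
conn.execute(
    "CREATE TABLE orders (customer_id INTEGER, product_id TEXT, ship_date TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Martin's Ice Cream"), (2, "Beachside Treats"), (3, "The Daily Scoop")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "043998", "2021-02-23"), (2, "872012", "2021-02-25")],
)

# JOIN names the second table; ON states how rows in the two tables match.
inner_rows = conn.execute(
    """
    SELECT customers.customer_name, orders.product_id, orders.ship_date
    FROM customers
    INNER JOIN orders
      ON customers.customer_id = orders.customer_id
    ORDER BY customers.customer_id
    """
).fetchall()

for row in inner_rows:
    print(row)
```

The customer with customer_id 3 has no row in the orders table, so it does not appear in the results.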

LEFT JOIN

You may see this as LEFT OUTER JOIN, but most users prefer LEFT JOIN. Both are correct
syntax. LEFT JOIN returns all the records from the left table and only the matching records
from the right table. Use LEFT JOIN whenever you need the data from the entire first table
and values from the second table, if they exist. For example, in the query below, LEFT JOIN
will return customer_name with the corresponding sales_rep, if it is available. If there is a
customer who did not interact with a sales representative, that customer would still show up
in the query results but with a NULL value for sales_rep.

The results from the query might look like the following, where customer_name is from the
customers table and sales_rep is from the sales table. Again, the data from both tables was
joined together by matching the customer_id common to both tables even though
customer_id wasn't returned in the query results.

customer_name            sales_rep
Martin's Ice Cream       Luis Reyes
Beachside Treats         NULL
Mona's Natural Flavors   Geri Hall
...etc.                  ...etc.
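
The following sketch reproduces a result like this one with a LEFT JOIN, again using Python's built-in sqlite3 module as a stand-in database. The customer_id values are assumptions made for the example. SQLite returns a NULL value to Python as None.

```python
import sqlite3

# In-memory stand-in database; customer_id values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, customer_name TEXT)")
conn.execute("CREATE TABLE sales (customer_id INTEGER, sales_rep TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [
        (1, "Martin's Ice Cream"),
        (2, "Beachside Treats"),
        (3, "Mona's Natural Flavors"),
    ],
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, "Luis Reyes"), (3, "Geri Hall")],
)

# Every row of the left table (customers) is kept, matched or not.
left_rows = conn.execute(
    """
    SELECT customers.customer_name, sales.sales_rep
    FROM customers
    LEFT JOIN sales
      ON customers.customer_id = sales.customer_id
    ORDER BY customers.customer_id
    """
).fetchall()

for row in left_rows:
    print(row)
```

Beachside Treats has no matching row in the sales table, so its sales_rep comes back as NULL instead of the customer being dropped.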

RIGHT JOIN

You may see this as RIGHT OUTER JOIN or RIGHT JOIN. RIGHT JOIN returns all records from
the right table and the corresponding records from the left table. Practically speaking, RIGHT
JOIN is rarely used. Most people simply switch the tables and stick with LEFT JOIN. But using
the previous example for LEFT JOIN, the query using RIGHT JOIN would look like the
following:

The query results are the same as the previous LEFT JOIN example.

customer_name            sales_rep
Martin's Ice Cream       Luis Reyes
Beachside Treats         NULL
Mona's Natural Flavors   Geri Hall
...etc.                  ...etc.

FULL OUTER JOIN

You may sometimes see this as FULL JOIN. FULL OUTER JOIN returns all records from the
specified tables. You can combine tables this way, but remember that this can potentially be
a large data pull as a result. FULL OUTER JOIN returns all records from both tables even if
data isn’t populated in one of the tables. For example, in the query below, you will get all
customers and their products’ shipping dates. Because you are using a FULL OUTER JOIN,
you may get customers returned without corresponding shipping dates or shipping dates
without corresponding customers. A NULL value is returned if corresponding data doesn’t
exist in either table.

The results from the query might look like the following.

customer_name            ship_date
Martin's Ice Cream       2021-02-23
Beachside Treats         2021-02-25
NULL                     2021-02-25
The Daily Scoop          NULL
Mountain Ice Cream       NULL
Mona's Natural Flavors   2021-02-28
...etc.                  ...etc.

For more information


JOINs are going to be useful for working with relational databases and SQL, and you will
have plenty of opportunities to practice them on your own. Here are a few other resources
that can give you more information about JOINs and how to use them:

• SQL JOINs: This is a good basic explanation of JOINs with examples. If you need a quick
reminder of what the different JOINs do, this is a great resource to bookmark and come
back to later.

• Database JOINs - Introduction to JOIN Types and Concepts: This is a really thorough
introduction to JOINs. Not only does this article explain what JOINs are and how to use
them, but it also explains in more detail when and why you would use the different JOINs.
This is a great resource if you are interested in learning more about the logic behind
JOINing.

• SQL JOIN Types Explained in Visuals: This resource has a visual representation of the
different JOINs. This is a really useful way to think about JOINs if you are a visual learner,
and it can be a really useful way to remember the different JOINs.

• SQL JOINs: Bringing Data Together One Join at a Time: Not only does this resource have a
detailed explanation of JOINs with examples, but it also provides example data that you
can use to follow along with their step-by-step guide. This is a useful way to practice JOINs
with some real data.

• SQL JOIN: This is another resource that provides a clear explanation of JOINs and uses
examples to demonstrate how they work. The examples also combine JOINs with aliasing.
This is a great opportunity to see how JOINs can be combined with other SQL concepts
that you have been learning about in this course.

Step-by-Step: Queries within queries


This reading outlines the steps the instructor performs in the next video, Queries within
queries. In the video, the instructor introduces subqueries, another type of SQL query, and
demonstrates how to use them to build more complex queries.

Keep this step-by-step guide open as you watch the video. It can serve as a helpful reference
tool if you need additional context or clarification while following the video steps. This is not
a graded activity, but you can complete these steps to practice the skills demonstrated in
the video.

What you will need

In order to follow along with the instructor, you will need to log in to your BigQuery account
and access the public database called new_york. To find this dataset, scroll through the
datasets in the bigquery-public-data project. From this database, you will use the tables
called citibike_stations and citibike_trips.

Important!
BigQuery has two different databases that contain very similar information: new_york is one
database and new_york_citibike is another. Both of these databases contain tables called
citibike_stations and citibike_trips. However, these tables are not exactly the same between
both databases. This step-by-step uses the new_york database. You will need to scroll to find
this dataset under the bigquery-public-data project in the Explorer pane; it does not come up
in a search.

Further, as with many of the public databases in BigQuery, these tables are regularly
updated, so if you find that your results do not exactly match the results in the video, this is
one probable explanation why.

Example 1: Use a subquery in a SELECT statement

In this query, you will compare the number of bikes available at a particular station to the
overall average number of bikes available at all stations.

Type or copy and paste the following query into a new BigQuery query window.

In this example, the subquery was used in a SELECT statement. The outer SELECT statement
(beginning on line 1) lists column names to be retrieved from the citibike_stations table. The
inner SELECT statement (beginning on line 4) is the subquery, which is used to make a new
column that is not already available in the table.

Notice that in the video the SELECT statement in lines 4–6 was written first. This is a
subquery to calculate the average of the num_bikes_available column and alias the results
as a new column in the results called avg_num_bikes_available. The subquery is enclosed in
parentheses, which mark it as a subquery.

Then, the whole subquery is incorporated into an outer query. The subquery is indented to
the same level as station_id and num_bikes_available, which are the other columns to be
returned in the query results.

So, the final query should return a table containing columns for station_id,
num_bikes_available, and avg_num_bikes_available. Here is an example output, but
remember, your results might differ due to table updates.
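
If you want to try the same pattern without a BigQuery account, the sketch below runs a subquery in a SELECT statement against a tiny made-up table using Python's built-in sqlite3 module. The station names and bike counts are invented for the example, so the output will not match the public dataset.

```python
import sqlite3

# In-memory stand-in for the citibike_stations table; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE citibike_stations "
    "(station_id TEXT, name TEXT, num_bikes_available INTEGER)"
)
conn.executemany(
    "INSERT INTO citibike_stations VALUES (?, ?, ?)",
    [("72", "Harbor St", 10), ("79", "Elm Ave", 20), ("82", "Pine Rd", 30)],
)

# The parenthesized inner SELECT is the subquery. It produces one value,
# the overall average, which is repeated on every row of the outer query.
select_rows = conn.execute(
    """
    SELECT
      station_id,
      num_bikes_available,
      (SELECT AVG(num_bikes_available)
       FROM citibike_stations) AS avg_num_bikes_available
    FROM citibike_stations
    ORDER BY station_id
    """
).fetchall()

for row in select_rows:
    print(row)
```

Each row pairs one station's bike count with the same overall average, which makes the comparison easy to read.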

Example 2: Use a subquery in a FROM statement

In this query, you will use the citibike_trips table to calculate the total number of rides that
started at each station and return this as a column called
number_of_rides_starting_at_station along with the columns station_id and name from the
citibike_stations table.

Type or copy and paste the following query into a new BigQuery query window.

Note: The lines tagged with #** differ from the code in the accompanying video. This is due
to changes to the data tables that resulted in a mismatched data type (Int64 & STRING)
between the start_station_id column and the station_id column in the respective tables. To
make them the same datatype, the start_station_id column is converted to STRING using the
CAST keyword.

Here's what's happening in this example. Lines 1–5 are the outer query. They begin with a
SELECT statement followed by the names of the columns you want returned in the final
query results table: station_id, name, and number_of_rides_starting_at_station.

The problem is that the number_of_rides_starting_at_station column doesn't exist in either
table. It must be created. Adding further complication is the fact that the station_id and
name columns exist in the citibike_stations table, while the information needed to create the
number_of_rides_starting_at_station is in the citibike_trips table.

Lines 6–19 solve this problem. First, notice the subquery from lines 6–14. This subquery is
taking the citibike_trips table (line 11) and grouping it by start_station_id (converted to
STRING, lines 12–13).

From that grouped data, it's selecting (line 7) the start_station_id column (converted to
string and aliased as start_station_id_str, line 8) and the COUNT of all rows that begin with
each start_station_id. The count is aliased as a new column called number_of_rides (line 9).
The entire subquery is enclosed in parentheses (lines 6 and 14) and the resulting table is
aliased as station_num_trips (line 15).

station_num_trips is a helper table. By itself, it contains two columns: start_station_id_str and
number_of_rides. There is one ID for each unique station and the corresponding number of
rides from that station.

All the data in that subquery came from the citibike_trips table. You still need to connect it to
the citibike_stations table. Lines 16–19 make the connection. You INNER JOIN (line 16) the
station_num_trips helper table with the citibike_stations table (line 17) using the station_id
column in the citibike_stations table and the start_station_id_str column in the
station_num_trips helper table as common keys (lines 18–19).

This results in a big table that contains all the columns in the citibike_stations table as well
as the start_station_id_str and number_of_rides columns from the station_num_trips helper
table. However, you don't need all these columns. You only need three: station_id, name,
and number_of_rides_starting_at_station. These are the columns that are selected in lines 1–4.
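
The helper-table pattern described above can be sketched on a small scale with Python's built-in sqlite3 module. This is a stand-in for BigQuery, so two things differ and are assumptions of this example: the tables hold a few made-up rows, and the cast uses TEXT because that is SQLite's text type, where the BigQuery query uses STRING.

```python
import sqlite3

# In-memory stand-ins for the two tables; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE citibike_stations (station_id TEXT, name TEXT)")
conn.execute("CREATE TABLE citibike_trips (start_station_id INTEGER)")
conn.executemany(
    "INSERT INTO citibike_stations VALUES (?, ?)",
    [("72", "Harbor St"), ("79", "Elm Ave"), ("82", "Pine Rd")],
)
conn.executemany(
    "INSERT INTO citibike_trips VALUES (?)",
    [(72,), (72,), (72,), (79,), (82,), (82,)],
)

# The subquery in FROM builds the station_num_trips helper table:
# one row per start station with its ride count.
from_rows = conn.execute(
    """
    SELECT
      station_id,
      name,
      number_of_rides AS number_of_rides_starting_at_station
    FROM
      (
        SELECT
          CAST(start_station_id AS TEXT) AS start_station_id_str,
          COUNT(*) AS number_of_rides
        FROM citibike_trips
        GROUP BY CAST(start_station_id AS TEXT)
      ) AS station_num_trips
    INNER JOIN citibike_stations
      ON station_id = start_station_id_str
    ORDER BY number_of_rides DESC
    """
).fetchall()

for row in from_rows:
    print(row)
```

The cast matters because the join compares a text station_id with what was originally an integer start_station_id.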

The final query results should contain these three columns, with rows in descending order by
number of rides. Here is an example output, but remember, your results might differ due to
table updates.

Example 3: Use a subquery in a WHERE statement

Finally, you will write a query that returns a table containing two columns: the station_id and
name (from the citibike_stations table) of only those stations that were used by people
classified as subscribers, which is information found in the citibike_trips table.

Type or copy and paste the following query into a new BigQuery query window.

Note: Line 10 (tagged with #**) differs from the code in the accompanying video. This is
due to changes to the data tables that resulted in a mismatched data type (Int64 & STRING)
between the start_station_id column and the station_id column in the respective tables. To
make them the same datatype, the start_station_id column is converted to STRING using the
CAST keyword.

To understand this query, break it into three sections.

Section 1:

The first section is in lines 8–15. This is the subquery, which is indicated by the
parentheses in lines 8 and 15. This segment takes the citibike_trips table (lines 11–12) and
filters it using the WHERE clause (line 13) so it only contains rows where the usertype
column contains Subscriber as a value (line 14).

From this filtered table, you select the start_station_id, which is converted to string and
aliased as start_station_id_str (lines 9–10).

At this point, you have an intermediary table with just a single column,
start_station_id_str, containing the IDs of every row that had Subscriber in the usertype
column of the original table.

Section 2:

The second section of the query is in lines 4–7. This part uses the information in the
intermediary table from section 1 to filter the citibike_stations table. It begins with the full
citibike_stations table (line 5). Then, it filters this table using a WHERE clause (line 6) so it
only contains rows where the values in its station_id column also are found in the list of
start_station_id_strs that resulted from section 1.

At this point, you now have an intermediary table containing all the columns of the
citibike_stations table, but only the rows of stations that were the starting station of a
subscriber.

Section 3:

The last part is the simplest. You just need to select the relevant columns from the
intermediary table from section 2. This happens in lines 1–4, where you select the station_id
and name columns and add the FROM clause in line 5. Everything you're selecting from
follows, which was explained in sections 1 and 2.

The final query results should contain two columns: station_id and name. Here is an example
output, but remember, your specific results might differ due to table updates.
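
Here is a small-scale sketch of the same three-part structure, run with Python's built-in sqlite3 module instead of BigQuery. The rows are made up for the example, and the cast uses TEXT (SQLite's text type) where the BigQuery query uses STRING.

```python
import sqlite3

# In-memory stand-ins for the two tables; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE citibike_stations (station_id TEXT, name TEXT)")
conn.execute(
    "CREATE TABLE citibike_trips (start_station_id INTEGER, usertype TEXT)"
)
conn.executemany(
    "INSERT INTO citibike_stations VALUES (?, ?)",
    [("72", "Harbor St"), ("79", "Elm Ave"), ("82", "Pine Rd")],
)
conn.executemany(
    "INSERT INTO citibike_trips VALUES (?, ?)",
    [(72, "Subscriber"), (72, "Customer"), (79, "Customer"), (82, "Subscriber")],
)

# The subquery returns the start stations of subscriber trips;
# the outer WHERE keeps only stations whose ID is in that list.
where_rows = conn.execute(
    """
    SELECT
      station_id,
      name
    FROM citibike_stations
    WHERE station_id IN
      (
        SELECT CAST(start_station_id AS TEXT) AS start_station_id_str
        FROM citibike_trips
        WHERE usertype = 'Subscriber'
      )
    ORDER BY station_id
    """
).fetchall()

for row in where_rows:
    print(row)
```

Station 79 is left out because its only trip in this sample data was taken by a user whose usertype is Customer.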

Step-by-Step: Use subqueries to aggregate data


This reading provides you with the steps the instructor performs in the following video, Use
subqueries to aggregate data. The video teaches you how to aggregate or combine data
using subqueries in SQL.

Keep this step-by-step guide open as you watch the video. It can serve as a helpful reference
tool if you need additional context or clarification while following the video steps. This is not
a graded activity, but you can complete these steps to practice the skills demonstrated in
the video.

What you’ll need


In order to follow along with the instructor, you will need the warehouse_orders dataset
uploaded into your project space. From that dataset, you'll need the warehouse and orders
tables. If you haven’t already uploaded this data, follow the instructions in the Upload the
warehouse dataset to BigQuery reading.

Objective

The objective of this query is to aggregate the data into a table containing each warehouse's
ID, state and alias, and number of orders; as well as the grand total of orders for all
warehouses combined; and finally a column that classifies each warehouse by the
percentage of grand total orders that it fulfilled: 0–20%, 21-60%, or > 60%.

Note: This activity breaks out the steps into manageable chunks. The final query is only
intended to be run at the end. If you try to run the query before reaching the end of this
guide you will likely get an error.

Example: Combine and alias the tables

As a refresher, aliasing is when you temporarily name a table or column in your query to
make it easier to read and write. To alias the warehouse and orders tables and join the
tables, follow these steps. Remember, these statements require that you enter your unique
individual project name or else they won't run. Be sure to substitute your project name in
the code wherever you encounter your-project written. If you haven't explicitly assigned a
project name, BigQuery generates one for you automatically. It typically looks like two words
and a number, each separated by a hyphen, for example august-west-100777.

Begin with the FROM statement a few rows down. Later, you'll return to the top of the query
to fill it in.

1. In row 3, enter FROM your-project.warehouse_orders.warehouse AS Warehouse

2. In row 4, enter LEFT JOIN your-project.warehouse_orders.orders AS Orders

3. In row 5, enter ON Orders.warehouse_id = Warehouse.warehouse_id

These statements will combine the two tables (warehouse and orders) using warehouse_id
as the common key (the column shared by both tables).

Example: Organize your new table

Use the GROUP BY clause in SQL to group rows that have the same values in specified
columns into aggregated data, such as sum, count, average, maximum, or minimum, based
on the values in another column. This operation is particularly useful in databases where
there is a need to analyze data based on certain criteria.

1. In row 6, enter GROUP BY

2. In row 7, enter Warehouse.warehouse_id,

3. In row 8, enter warehouse_name

Here, the combined table is grouped first by the warehouse ID and then by its name.

Example: Build subquery logic

Now that you have the FROM statement and JOIN, go back up to the first lines and define the
columns to select and the operations to perform on them. From the objective, you know you
want to return five columns: each warehouse's ID (warehouse_id, column 1), state and alias
(this info will be combined into a single column: warehouse_name, column 2), and number of
orders (number_of_orders, column 3); as well as the grand total of orders for all warehouses
combined (total_orders, column 4); and finally a column that classifies each warehouse by
the percentage of grand total orders that it fulfilled: 0–20%, 21–60%, or > 60%
(fulfillment_summary, column 5).

Above everything you've written so far, write:

1. In row 1, enter SELECT

2. In row 2, enter Warehouse.warehouse_id, # (This is the first column.)

3. In row 3, enter CONCAT(Warehouse.state, ': ', Warehouse.warehouse_alias) AS
warehouse_name, # (This is the second column. Notice you're concatenating two
existing columns into a new one.)

4. In row 4, enter COUNT(Orders.order_id) AS number_of_orders, # (This is the third
column.)

5. In row 5, enter (SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS
Orders) AS total_orders, # (This is the fourth column.)

To create the final column, you'll need to use a special keyword.

Example: Create categories using CASE

Use the CASE keyword in SQL to create categories or group data based on specific
conditions. This is valuable when dealing with numerical or textual data that needs to be
segmented into different groups or categories for analysis, reporting, or visualization
purposes.

For the final column, you'll use CASE to define which label to apply to each warehouse's
fulfillment percentage (the percentage of the grand total of orders that it fulfilled). There will
be three conditions, and thus three possible labels: "Fulfilled 0-20% of Orders", "Fulfilled
21-60% of Orders", or "Fulfilled more than 60% of Orders".

1. In row 6, enter CASE

2. In row 7, enter WHEN COUNT(Orders.order_id)/(SELECT COUNT(*) FROM
your-project.warehouse_orders.orders AS Orders) <= 0.20 # (This defines the first possible
condition.)

3. In row 8, enter THEN 'Fulfilled 0-20% of Orders' # (THEN defines the label to apply
when the first condition is true.)

4. In row 9, enter WHEN COUNT(Orders.order_id)/(SELECT COUNT(*) FROM
your-project.warehouse_orders.orders AS Orders) > 0.20 # (This is the first part of the
second condition.)

5. In row 10, enter AND COUNT(Orders.order_id)/(SELECT COUNT(*) FROM
your-project.warehouse_orders.orders AS Orders) <= 0.60 # (This is the second part of the
second condition.)

6. In row 11, enter THEN 'Fulfilled 21-60% of Orders' # (This defines the label to apply
when the second condition is true.)

7. In row 12, enter ELSE 'Fulfilled more than 60% of Orders' # (This defines the label to
apply when neither of the first two conditions is true.)

8. In row 13, enter END AS fulfillment_summary # (The END keyword terminates the
CASE declaration. Then the AS keyword indicates what the resulting column should be
named.)

Example: Filter using HAVING

Use the HAVING clause in SQL in combination with the GROUP BY clause to filter the results
of aggregate functions in a query. While the WHERE clause filters individual rows before they
are grouped, the HAVING clause filters groups of rows after they have been grouped. To filter
out the warehouses that are currently being built (and therefore have no orders), enter the
following lines below everything you've written so far:

1. In row 20, enter HAVING

2. In row 21, enter COUNT(Orders.order_id) > 0

Here is the final query:

SELECT
  Warehouse.warehouse_id,
  CONCAT(Warehouse.state, ': ', Warehouse.warehouse_alias) AS warehouse_name,
  COUNT(Orders.order_id) AS number_of_orders,
  (SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) AS total_orders,
  CASE
    WHEN COUNT(Orders.order_id)/(SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) <= 0.20
    THEN 'Fulfilled 0-20% of Orders'
    WHEN COUNT(Orders.order_id)/(SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) > 0.20
    AND COUNT(Orders.order_id)/(SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) <= 0.60
    THEN 'Fulfilled 21-60% of Orders'
    ELSE 'Fulfilled more than 60% of Orders'
  END AS fulfillment_summary
FROM your-project.warehouse_orders.warehouse AS Warehouse
LEFT JOIN your-project.warehouse_orders.orders AS Orders
  ON Orders.warehouse_id = Warehouse.warehouse_id
GROUP BY
  Warehouse.warehouse_id,
  warehouse_name
HAVING
  COUNT(Orders.order_id) > 0

Example: Run the new query

It’s important to run the new queries that you write to test that they are working correctly.

1. Select the Run button.

2. You can now identify what percentage of the company's total orders each warehouse
fulfills.

The query results should be:


SQL functions and subqueries: A functional friendship

As you’ve been learning, SQL functions are tools built into SQL to facilitate performing
calculations. For example, you could use the AVG() function to calculate the average salary
of employees in a table so management knows what to budget for next year. Another
example might be using the COUNT() function to count the number of orders in a table to
track daily order inventory.

A subquery, also called an inner or nested query, is a SQL query that is nested inside a
larger query. Going back to the previous example, you could add a subquery to your average
calculation to identify the names of employees who earn more or less than the average
salary to include that information in performance reviews. Subqueries allow more complex
questions to be answered in a single query, making data retrieval more efficient. In this
reading, you will learn about SQL functions and how they might be used with subqueries.

How do SQL functions function?

SQL functions help make data aggregation possible. As a refresher, data aggregation is the
process of gathering data from multiple sources in order to combine it into a single,
summarized collection. Take a moment to review some of these functions to better
understand how to run these queries:

 HAVING: The HAVING clause filters the results of a SQL query based on conditions
applied after the grouping. Check out W3Schools' HAVING overview for a tutorial on this
clause.

 CASE: CASE provides conditional logic in SQL queries, similar to an 'if-else' structure in
programming languages. W3Schools' CASE overview explores the use of the CASE statement
and how it works.

 IF: IF performs a simple conditional test and returns a value depending on the outcome.
Review W3Schools' IF overview for a tutorial on the IF function and examples that you can
practice with.

 COUNT: COUNT returns the number of rows that match a specified criterion. Though it
seems simple, the COUNT function is just as important as all the rest. W3Schools' COUNT
overview provides a tutorial and examples.
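
COUNT behaves slightly differently depending on what it is given, which matters when a column contains missing values. The sketch below is an illustration only: it assumes Python's built-in sqlite3 module and a hypothetical orders table with made-up rows, one of which has no warehouse assigned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER, warehouse_id INTEGER);
    -- Hypothetical rows: order 104 has no warehouse assigned (NULL).
    INSERT INTO orders VALUES (101, 1), (102, 1), (103, 2), (104, NULL);
    """
)

# COUNT(*) counts rows; COUNT(column) skips NULLs; COUNT(DISTINCT column)
# counts each non-NULL value once.
counts = conn.execute(
    """
    SELECT
      COUNT(*) AS all_rows,
      COUNT(warehouse_id) AS rows_with_warehouse,
      COUNT(DISTINCT warehouse_id) AS distinct_warehouses
    FROM orders
    """
).fetchone()

print(counts)  # (4, 3, 2)
```

This is also why the warehouse query in this reading counts Orders.order_id rather than using COUNT(*): after a LEFT JOIN, a warehouse with no orders has a NULL order_id and is counted as 0.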

Subqueries

Subqueries can make projects easier and more efficient by allowing complex operations to
be performed in a single query, reducing the need for multiple trips to the database.
Subqueries can also make your code more readable and maintainable. Take the employee
salary example mentioned before. The original query was used to find the average employee
salary. By adding a subquery, you can learn this and also identify employees who earn more
than the average, all in a single query.

Usually, you will find subqueries nested in the SELECT, FROM, and/or WHERE clauses. There
is no general syntax for subqueries, but the syntax for a basic subquery follows a similar
pattern:

Basically, there’s another SELECT clause inside the first SELECT clause. The second SELECT
clause marks the start of the subquery in this statement. There are many different ways in
which you can use subqueries, but there are a few rules to follow:

 Subqueries must be enclosed within parentheses.

 A subquery can have one or more columns specified in the SELECT clause.

 Subqueries that return more than one row can only be used with multiple value
operators, such as the IN operator which allows you to specify multiple values in a
WHERE clause.

 A subquery can’t be nested in a SET command. The SET command is used with
UPDATE to specify which columns (and values) are to be updated in a table.
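
The employee salary example and the rules above can be sketched as follows. This is an illustration under stated assumptions, not part of the course material: it uses Python's built-in sqlite3 module, a hypothetical employees table, and made-up names and salaries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (name TEXT, department TEXT, salary REAL);
    -- Hypothetical rows; the average salary here is 60000.
    INSERT INTO employees VALUES
      ('Ana', 'Sales', 50000), ('Ben', 'Sales', 70000),
      ('Chloe', 'Support', 40000), ('Dev', 'Engineering', 80000);
    """
)

# A subquery in parentheses that returns a single value can be compared with >.
above_average = conn.execute(
    """
    SELECT name, salary
    FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY name
    """
).fetchall()

# A subquery that returns several rows needs a multiple-value operator such as IN.
in_high_paying_departments = conn.execute(
    """
    SELECT name
    FROM employees
    WHERE department IN (
      SELECT department
      FROM employees
      WHERE salary > (SELECT AVG(salary) FROM employees)
    )
    ORDER BY name
    """
).fetchall()

print(above_average)
print(in_high_paying_departments)
```

The second query also shows that subqueries can be nested inside one another: the innermost query computes the average, the middle one lists departments with an above-average earner, and the outer one returns everyone in those departments.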

Additional resources

The following resources offer more guidance into subqueries and their usage:

 SQL subqueries: This detailed introduction includes the definition of a subquery, its
purpose in SQL, when and how to use it, and what the results will be.

 Writing subqueries in SQL: Explore the basics of subqueries in this interactive tutorial,
including examples and practice problems that you can work through.

As you continue to learn more about using SQL, functions, and subqueries, you will realize
how much time you can save by keeping these tips and tricks in mind.

Glossary terms from module 3

Terms and definitions for Course 5, Module 3

Absolute reference: A reference within a function that is locked so that rows and columns
won’t change if the function is copied

Aggregation: The process of collecting or gathering many separate pieces into a whole

Aliasing: Temporarily naming a table or column in a query to make it easier to read and
write

COUNT DISTINCT: A SQL function that returns the number of distinct values in a specified range

Data aggregation: The process of gathering data from multiple sources and combining it
into a single, summarized collection

INNER JOIN: A SQL function that returns records with matching values in both tables

JOIN: A SQL function that is used to combine rows from two or more tables based on a
related column

LEFT JOIN: A SQL function that will return all the records from the left table and only the
matching records from the right table

LIMIT: A SQL clause that specifies the maximum number of records returned in a query

MATCH: A spreadsheet function used to locate the position of a specific lookup value

OUTER JOIN: A SQL function that combines RIGHT and LEFT JOIN to return all records from
both tables, matching them where possible

RIGHT JOIN: A SQL function that will return all records from the right table and only the
matching records from the left table

Subquery: A SQL query that is nested inside a larger query

VALUE: A spreadsheet function that converts a text string that represents a number to a
numeric value
