Dsc354-Dwbi Lab Manual Sp24 v2.0
CUI
List of Labs
Lab 01  Set up the Working Environment
Lab 02  Data Warehouse Schemas: Star, Snowflake, Fact Constellation
Lab 03  Creating Views and Indexes on a Data Warehouse
Lab 04  Conversion of Entity Relationship Diagram (ERD) to Dimensional Model (DM): Working with Sample Database Sakila
Lab 05  Demonstration of ETL Tool: SQL Server Integration Services (SSIS) - Extraction and Loading
Lab 06  Demonstration of ETL Tool: SQL Server Integration Services (SSIS) - Transformation
Lab 07  Creating ROLAP in SQL Server Analysis Services (SSAS)
Lab 08  Getting Started with Power BI Desktop
Lab 09  Mid Term Exam
Lab 10  Preparing Data in Power BI Desktop
Lab 11  Transformation Using Power BI
Lab 12  Data Modeling in Power BI Desktop
Lab 13  Using DAX in Power BI Desktop
Lab 14  Designing a Report in Power BI Desktop
Lab 15  Creating a Power BI Dashboard and Data Analysis
Lab 16  Getting Started with Tableau Desktop - Part 1
Lab 17  Working with Tableau - Part 2
Final Term Exam
Lab 01
Set up the Working Environment
Objective:
The objective of this lab is to set up the development environment for creating a data warehouse.
Activity Outcomes:
The activities provide hands-on practice with the following topics:
• Install SQL Server 2019 Enterprise Edition
• Install SQL Server Management Studio (SSMS)
• Install and integrate SQL Server Data Tools (SSDT)
Instructor Note:
As a pre-lab activity, see the SQL Server installation guide on Microsoft Docs:
https://docs.microsoft.com/en-us/sql/database-engine/install-windows/install-sql-server?view=sql-server-ver15
1) Useful information
Microsoft SQL Server
Microsoft SQL Server is a relational database management system (RDBMS) that supports a wide variety
of transaction processing, business intelligence and analytics applications in corporate IT environments.
Microsoft SQL Server is one of the three market-leading database technologies, along with Oracle Database
and IBM's DB2.
SQL Server services, tools and editions
Microsoft also bundles a variety of data management, business intelligence (BI) and analytics tools with
SQL Server. In addition to the R Services and now Machine Learning Services technology that first
appeared in SQL Server 2016, the data analysis offerings include SQL Server Analysis Services, an
analytical engine that processes data for use in BI and data visualization applications, and SQL Server
Reporting Services, which supports the creation and delivery of BI reports.
On the data management side, Microsoft SQL Server includes SQL Server Integration Services, SQL Server
Data Quality Services and SQL Server Master Data Services. Also bundled with the DBMS are two sets of
tools for DBAs and developers: SQL Server Data Tools, for use in developing databases, and SQL Server
Management Studio, for use in deploying, monitoring and managing databases.
Activity 2: 1 hour (may vary due to system and internet speed), Low, CLO-5
Activity 3: 30 minutes (may vary due to system and internet speed), Low, CLO-5
Activity 1:
This activity demonstrates the steps to install SQL Server, SSMS, and SSDT on the system.
Solution:
1. Download the SQL Server installer from https://www.microsoft.com/en-us/sql-server/sql-server-downloads.
Once the download is complete, go to the destination folder (i.e. the Downloads folder on your
computer). The installation file will look something like this:
2. Choose the Media Location path. Note the minimum free space and download size and press
Install.
3. Once the SQL Server Installation Center launches, choose the Installation tab (second from the top in the left pane).
4. In most cases you will want to run a New SQL Server stand-alone installation, but other options are available; for example, if you have a previous version of SQL Server installed, you have the option to upgrade.
5. On the Product Key page make sure that the selected Edition is “Developer” click Next.
6. On the License Terms page, check the box next to “I accept the license terms” and click Next.
7. Setup will check for, and if needed install, Setup Support Files. Click Next when complete.
Feature installation:
• In addition to what is listed above, review the feature descriptions to see which features you might need for advanced topics in the term project.
• Instance root Directory and Shared Features Directory: note the paths where SQL Server will install the components (the default is the Program Files folder on the C drive).
Instance Configuration
2. Generally, you can leave the Default Instance and the default Instance ID. The Named
instances would be used if you want to create multiple instances of SQL Server on the same
machine. Click Next when complete.
Server Configuration
Figure 1.6: Server Configuration
Figure 1.7: Server Configuration
Server Configuration:
• Authentication mode:
• Windows authentication: will only use your Windows account privileges to connect to SQL Server.
• Mixed mode: adds a local SQL system administrator (SA) account. IMPORTANT: We highly recommend using Mixed Mode so that, in addition to your built-in Windows account, there is an SA account with a separate user name and password in case you have issues logging in.
• IMPORTANT: Make sure to add users (such as your account) to SQL Server Administrators (click on Add Current User) if they are not already there.
• These accounts will allow you to log into SQL Server.
• Note that the server itself does not need these accounts; it runs as a service under the account you specified in the previous step.
Figure 1.8: Database Engine Configuration
Error Reporting, Installation Configuration Rules, & Ready to Install
It is required that you install the Management Tools Complete for all courses.
Activity 2:
This activity demonstrates the steps to install SQL Server Management Studio (SSMS) on the system.
Solution:
You will be brought to a web page to download the latest release of SQL Server Management Studio. Click
on the link to download the latest release and save the file to a location you can remember.
Figure 1.12: SSMS Installation step one
Click the “Install” button to begin. A progress screen will appear similar to the following.
Let it progress through until completion, then you will see a screen indicating successful setup, click close.
Congratulations! SSMS is now installed.
assignments.
IMPORTANT: If during setup you selected for SQL Server to start manually, then you will need to start the SQL Server services. Click on Search at the bottom of the Windows screen and type Services in the search box.
• Scroll down the list until you see the SQL Server services.
Figure 1.15: SSMS Installation step five
Notes:
• When you are no longer using SQL Server, you can shut the service down to save on system resources.
• You can also change the startup type to be automatic while the course is running to save you the step
of turning this on and off.
• You may want to add a shortcut to Services on your desktop for quick access.
• You may want to add a shortcut to SQL Server Management Studio on your desktop or pin it to the Windows taskbar for quicker access.
Figure 1.18: Connecting SQL server
You have just connected to your database through SQL Server Management Studio!
Activity 3:
This activity demonstrates the steps to install SQL Server Data Tools (SSDT) on the system.
Solution:
Figure 1.19: SSDT Installation step one
2. In the installer, select the edition of Visual Studio that you want to add SSDT to, and then choose Modify.
3. Select SQL Server Data Tools under Data storage and processing in the list of workloads.
3) Graded Lab Tasks
Note: The instructor can design graded lab activities according to the level of difficulty and complexity of the solved lab activities. The lab tasks assigned by the instructor should be evaluated in the same lab.
Lab Task 1
Students are required to install the required development environment before starting the lab activities.
Lab 02
Data Warehouse Schemas: Star and Snowflake schema
Objective:
The objective of this lab is to demonstrate various Data Warehouse schemas, including Star and Snowflake.
Activity Outcomes:
The activities provide hands-on practice with the following topics:
• Install the sample data.
• Query and test the sample data (AdventureWorks).
• Understand and design dimension tables.
Instructor Note:
As pre-lab activity, read chapter 1 from the textbook “Data Mining and Data Warehousing:
Principles and Practical Techniques, Parteek Bhatia, Cambridge University Press, 2019”.
1) Useful Concepts
Adventure Works Cycles, the fictitious company on which the AdventureWorks sample database is
based, is a large multinational production company. The company produces bicycles made of metal
and composite materials. The products are exported to North America, Europe and Asia. The
company is headquartered in Bothell, Washington, has 290 employees, and has multiple regional
sales teams active around the world.
In 2000, Adventure Works Cycles purchased Importadores Neptuno, a small production plant in
Mexico. Importadores Neptuno produces a variety of key sub-components for Adventure Works
Cycles products. These sub-assemblies are shipped to Bothell for final product assembly. In 2001,
Importadores Neptuno transformed into a manufacturer and seller focusing on touring mountain
bike products.
After achieving a successful financial year, Adventure Works Cycles hopes to expand its market
share by focusing on providing products to high-end customers, expanding its product sales channels
through external websites, and cutting its sales costs by reducing production costs.
Activity 1:
Finding and installing the sample data
Load up some sample data. The sample data used is the AdventureWorks Database, specifically, the Data
Warehousing version of the AdventureWorks Database.
Go to https://docs.microsoft.com/en-us/sql/samples/adventureworks-install-configure?view=sql-server-ver15&tabs=ssms
Select the appropriate version; in your case, choose AdventureWorksDW (any version).
To restore your database in SQL Server Management Studio, follow these steps:
1. Download the appropriate .bak file from one of the links provided in the Download backup files section shown in the picture above.
shown in above picture.
2. Move the .bak file to your SQL Server backup location. This varies depending on your installation
location, instance name and version of SQL Server. For example, the default location for a default
instance of SQL Server 2019 is:
3. Open SQL Server Management Studio (SSMS) and connect to your SQL Server.
4. Right-click Databases in Object Explorer > Restore Database... to launch the Restore
Database wizard.
5. Select Device and then select the ellipses (...) to choose a device.
6. Select Add and then choose the .bak file you recently moved to the backup location. If you moved your
file to this location but you're not able to see it in the wizard, this typically indicates a permissions issue
- SQL Server or the user signed into SQL Server does not have permission to this file in this folder.
7. Select OK to confirm your database backup selection and close the Select backup devices window.
8. Check the Files tab to confirm the Restore as location and file names match your intended location
and file names in the Restore Database wizard.
9. Select OK to restore your database.
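The wizard steps above can also be performed in T-SQL. Below is a minimal sketch, assuming the default backup folder for a default SQL Server 2019 instance and the AdventureWorksDW2019 backup file; both the path and the file name are assumptions, so adjust them to match your system.

```sql
-- Restore the sample data warehouse from its .bak file.
-- The path and file name below are assumptions; change them to match
-- where you placed the backup on your machine.
RESTORE DATABASE AdventureWorksDW2019
FROM DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\Backup\AdventureWorksDW2019.bak'
WITH RECOVERY;
```

This is equivalent to running the Restore Database wizard with default file locations; if SQL Server cannot read the file, check the same permissions issue described in step 6.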
Activity 2:
Techniques for Modeling dimension tables
There are two primary techniques: star and snowflake.
Star Schema
Look at the AdventureWorksDW 2012 database and expand Database Diagrams. Locate the Finance diagram; in the middle of the screen you will see one fact table, called FactFinance, surrounded by five dimension tables.
This is a common design: multiple dimensions referenced by the same fact table. In particular, these dimension tables are not related to one another or to other dimension tables. They are very simple; all of the information about a dimension is contained in one table. This is called the star design.
Figure 2.3: Example of Star Schema
Snowflake Schema
The other diagram is Internet Sales. Here the FactInternetSales table is in the middle. Off to the right are connections to other dimension tables, and those dimension tables have relationships to one another. This allows you to break down a dimension into more detail and gives more options for filtering, sorting, and searching.
However, this is a more complex design. It forces you to write bigger, more involved queries to get the same data out of the database. It can also be a performance hit: it creates a lot more joins, and joining two large tables can be a very expensive process. Most of our dimension tables probably shouldn't be too large, but you do get into some scenarios with large dimension tables. If we look at the Reseller Sales diagram, we again see relationships between the different dimension tables, similar to those in the Internet Sales diagram.
Figure 2.4: Example of SnowFlake Schema
The way schema is laid out, it could be argued that this looks like the branches of a snowflake. The
dimension tables are branching off into various directions, but then those branches come back and connect
to one another, and that looks a little bit like a snowflake. Therefore, this is called the snowflake technique.
So, we have two primary techniques for structuring our dimension tables: the star technique, which is very simple, where each dimension is stored in its entirety in one table, and, in contrast, the snowflake technique.
In the snowflake technique, a dimension is split up amongst multiple tables. Each approach has advantages and disadvantages. The star is the simpler way to go, will typically give better performance, and is easier to write queries against. The snowflake is a more complex design, makes queries more difficult to write, and can give slower performance, but it allows for more robust dimensions.
Realistically, most data warehouses have some of both.
It's very rare to see a data warehouse that's 100% star or 100% snowflake. Typically, some of your
dimensions can easily be captured in one table, and you can go ahead and use the star method there, and
other dimensions just logically require multiple tables and you can use the snowflake technique there. So,
you can mix these two techniques, and that is very common.
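The difference between the two techniques shows up directly in the queries. The sketch below uses table and column names from the AdventureWorksDW sample (verify them against your copy of the database): the date dimension behaves as a pure star (one join), while customer geography is snowflaked (two joins to reach the same level of detail).

```sql
-- Star: the dimension is one join away from the fact table.
SELECT d.CalendarYear, SUM(f.SalesAmount) AS Sales
FROM dbo.FactInternetSales AS f
JOIN dbo.DimDate AS d ON f.OrderDateKey = d.DateKey
GROUP BY d.CalendarYear;

-- Snowflake: the customer dimension is split, so reaching country
-- requires a second join through DimGeography.
SELECT g.EnglishCountryRegionName, SUM(f.SalesAmount) AS Sales
FROM dbo.FactInternetSales AS f
JOIN dbo.DimCustomer AS c ON f.CustomerKey = c.CustomerKey
JOIN dbo.DimGeography AS g ON c.GeographyKey = g.GeographyKey
GROUP BY g.EnglishCountryRegionName;
```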
Lab Task 1
Students are required to explore the sample data (AdventureWorks Data Warehouse), which includes the tables, schemas, and data available in the sample data warehouse.
Lab Task 2
Design following SQL Queries, Run them and show output.
a. For every customer with a 'Main Office' in Dallas show AddressLine1 of the 'Main Office' and
AddressLine1 of the 'Shipping' address - if there is no shipping address leave it blank. Use one row per
customer.
b. For each order show the SalesOrderID and SubTotal calculated three ways:
A) From the SalesOrderHeader
B) Sum of OrderQty*UnitPrice
C) Sum of OrderQty*ListPrice
c. Show the best selling item by value.
d. Show how many orders are in the following ranges (in $):
e. Identify the three most important cities. Show the break down of top level product category against city.
Note: students are required to submit the queries along with their results.
Lab 03
Creating views and indexes on Data warehouse
Objective:
The objective of this lab is to create views for improving the implementation of the data warehouse. This lab will help you create different views on the data warehouse.
Activity Outcomes:
The activities provide hands-on practice with the following topics:
• Create views
• Create indexes on views
Instructor Note:
As pre-lab activity, read chapter 1 from the textbook “Data Mining and Data Warehousing: Principles and
Practical Techniques, Parteek Bhatia, Cambridge University Press, 2019”.
1) Useful Concepts
Views:
A view is created by combining data from different tables; hence, a view does not store data of its own. A materialized view, on the other hand, commonly used in data warehousing, does store data. This data helps in decision making, performing calculations, etc.; it is computed beforehand using queries and stored.
When an ordinary view is created, the data is not stored in the database; it is produced when a query is fired on the view. The data of a materialized view, in contrast, is stored.
2) Solved Lab Activites (Allocated Time 1 Hr.)
Activity 1:
Figure 3.1: Data retrieval from geography dimension table
You may have noticed the geography dimension contains information about countries, cities, and states. Scroll all the way to the right and you will see it is linked by key to the sales territory.
Right-click the sales territory table and look at the top rows; observe that it contains the Northwest United States, Northeast United States, Central United States, and so on. These are different sales territories, but this doesn't tell us which cities are in which territory.
In order to know which cities are in which territory, we need to look at both the sales territory dimension and the geography dimension. We can create a view to make that easier by using the following code:
This will create a view that references the most interesting fields for both tables, the city, state, country
from the geography table and also the salesterritory group and sales territory region from the territory table.
The connection between the two is very simple. It is the sales territory key field, which exists in both tables.
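The code for the view is not reproduced in this copy of the manual; below is a sketch of what it looks like, with column names taken from AdventureWorksDW (check them against your database before running).

```sql
CREATE VIEW dbo.Total_DimTerritoryAndGeography
AS
SELECT g.City,                        -- from the geography dimension
       g.StateProvinceName,
       g.EnglishCountryRegionName,
       t.SalesTerritoryGroup,         -- from the sales territory dimension
       t.SalesTerritoryRegion
FROM dbo.DimGeography AS g
JOIN dbo.DimSalesTerritory AS t
  ON g.SalesTerritoryKey = t.SalesTerritoryKey;  -- the key shared by both tables
```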
Run this; it should say the command completed successfully. Now check for the view under the list of views (refresh the Views folder if it does not appear). Right-click the newly created Total_DimTerritoryAndGeography view and select the top rows.
Figure 3.2: Result of query on view created
Now we see city, state, territory group, and territory region all on one line. This should make it easier and more convenient for developers to look at both of these dimensions at the same time.
The other scenario where views can improve the implementation of our data warehouse comes with
aggregating data. So, information like sales, profit, revenue, expense. Usually we don't want to look at
those things line by line, we want to look at totals.
And maybe it's a total for a week, or a total for a month, or a total for a year but there's probably going to
be some sort of grouping. So, creating a view with a group by clause can help us get started in that
aggregation.
So, again, below is some code that will create such a view.
It sums four different fields, all of them dealing with currency: discount amount, product standard cost, total product cost, and sales amount, grouped by three different fields: the order date, the customer key, and the currency key. That will allow us to run reports on a certain time frame, and/or on a customer or group of customers, and/or in certain currencies.
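The staged code is not shown in this copy of the manual; based on the description, a sketch of the view follows (column names are from AdventureWorksDW and should be verified against your database).

```sql
CREATE VIEW dbo.Total_FactInternetSales
AS
SELECT SUM(DiscountAmount)      AS Total_DiscountAmount,
       SUM(ProductStandardCost) AS Total_ProductStandardCost,
       SUM(TotalProductCost)    AS Total_TotalProductCost,
       SUM(SalesAmount)         AS Total_SalesAmount,
       OrderDate,       -- the three GROUP BY fields
       CustomerKey,
       CurrencyKey
FROM dbo.FactInternetSales
GROUP BY OrderDate, CustomerKey, CurrencyKey;
```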
Let's run and create this view by executing the code. Select top rows from it and now we can see the data
that was returned. This is now a convenience for developers, they no longer have to manually set up the
group by fields, they can just pull off of this view.
This implementation, however, would not provide a performance increase. If we want a performance increase, we need to attach an index to the view. The way the view is now, with no indexes, it does not create a new copy of the data: every time we select from the view, it goes to the tables, pulls the data, performs all the necessary math, and then displays the data to the user. If we add an index to the view, that forces the engine to create a secondary copy of the data that is already aggregated.
So, now when someone wants to look at this view, it doesn't go back to the tables and do all the math again.
It will just look directly at the view with the pre-aggregated totals. So, that would be possibly a significant
performance increase on many of our queries but, do be aware it could be a performance decrease when we
have to load new data. Because every time we load new data into this data warehouse, we are going to have
to update the view. The machine will handle that automatically but it will increase the time it takes to load
data.
ALTER VIEW [dbo].[Total_FactInternetSales]
WITH SCHEMABINDING              -- required before the view can be indexed
AS
SELECT SUM(DiscountAmount)      AS Total_DiscountAmount,
       SUM(ProductStandardCost) AS Total_ProductStandardCost,
       SUM(TotalProductCost)    AS Total_TotalProductCost,
       SUM(SalesAmount)         AS Total_SalesAmount,
       OrderDate,
       CustomerKey,
       CurrencyKey,
       COUNT_BIG(*) AS RecordCount  -- required for an indexed view with GROUP BY
FROM dbo.FactInternetSales
GROUP BY OrderDate, CustomerKey, CurrencyKey
Two changes are added to the view:
• SCHEMABINDING, which creates a relationship between the view and the underlying tables, preventing a change to one without changing the other.
• COUNT_BIG(*), which counts the number of records; this is a requirement for adding an index to any view that has a GROUP BY.
Execute this and you should see that the command completed successfully. Now add an index to the view using the following query:
CREATE UNIQUE CLUSTERED INDEX [IX_Total_FactInternetSales]
ON [dbo].[Total_FactInternetSales] (OrderDate, CustomerKey, CurrencyKey)
This creates a unique clustered index, because the first index on a view must be a unique clustered index; after that you can create non-clustered indexes. Here the view is indexed on order date, customer key, and currency key. Execute the query, then expand the view in Object Explorer and look at its Indexes folder; you will see the new index listed under the view.
The index is now added to the view. In the background the engine has made a secondary copy of the data. So, we still have the original data in the original tables, and we also have a secondary, pre-aggregated copy of that data in the view. When we make a call to this view, the engine doesn't have to do all of the math for the group by; it just reads the data it has already aggregated. That can be a significant performance increase for us.
3) Graded Lab Tasks (Allocated Time 1 Hr.)
Note: The instructor can design graded lab activities according to the level of difficulty and complexity of the solved lab activities. The lab tasks assigned by the instructor should be evaluated in the same lab.
Lab Task 1
Create a view on sales amounts greater than 25000, along with the initials, name, hire date, and title of the employee handling the sale, and the day of the month, using FactSalesQuota, DimEmployee, and DimDate. Also create an index on the view.
You need to submit the report with screenshots of the results and the queries for the views and indexes.
Lab 04
Conversion of Entity Relationship Diagram (ERD) to Dimensional Model (DM): Working with Sample Database Sakila
Objective:
The objective of this lab is to help students learn about creating a Dimensional Model (DM) from an Entity Relationship Diagram (ERD) or OLTP database (using a sample database).
Activity Outcomes:
The activities provide hands-on practice with the following topics:
• Identify business processes from OLTP systems.
• Create a DM from an ERD.
Instructor Note:
As pre-lab activity, read chapter 1 from the textbook “The Data Warehouse Toolkit: The Definitive Guide
to Dimensional Modeling, Ralph Kimball & Margy Ross, Wiley, 2013”.
1) Useful Concepts
ERD
Figure 4.1 : ERD of DVD Rental database
• actor — contains actors data including first name and last name.
• film — contains films data such as title, release year, length, rating, etc.
• film_actor — contains the relationships between films and actors.
• category — contains film’s categories data.
• film_category — containing the relationships between films and categories.
• store — contains the store data including manager staff and address.
• inventory — stores inventory data.
• rental — stores rental data.
• payment — stores customer’s payments.
• staff — stores staff data.
• customer — stores customer’s data.
• address — stores address data for staff and customers
• city — stores the city names.
• country — stores the country names.
Sample Queries
1. Actor with most films (ignoring ties)
SELECT first_name, last_name, count(*) films
FROM actor AS a
JOIN film_actor AS fa USING (actor_id)
GROUP BY actor_id, first_name, last_name
ORDER BY films DESC
LIMIT 1;
Result:
first_name last_name films
--------------------------------
GINA DEGENERES 42
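The query that produced the next result is not shown in this copy of the manual. The output is consistent with a per-day payment total plus a running sum, which could be written as follows in PostgreSQL (the dialect used for the Sakila/DVD Rental examples here); treat this as a reconstruction, not the original query.

```sql
-- Daily payment totals with a cumulative running sum.
SELECT payment_date::date AS payment_date,
       SUM(amount)        AS amount,
       SUM(SUM(amount)) OVER (ORDER BY payment_date::date) AS sum
FROM payment
GROUP BY payment_date::date
ORDER BY payment_date::date;
```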
Result:
payment_date amount sum
-------------------------------------
2005-05-24 29.92 29.92
2005-05-25 573.63 603.55
2005-05-26 754.26 1357.81
2005-05-27 685.33 2043.14
2005-05-28 804.04 2847.18
2005-05-29 648.46 3495.64
2005-05-30 628.42 4124.06
2005-05-31 700.37 4824.43
2005-06-14 57.84 4882.27
2005-06-15 1376.52 6258.79
2005-06-16 1349.76 7608.55
2005-06-17 1332.75 8941.30
Analysis Queries:
1. What is the number of rentals per month for each store?
SELECT s.store_id,
       EXTRACT(ISOYEAR FROM r.rental_date) AS rental_year,
       EXTRACT(MONTH FROM r.rental_date) AS rental_month,
       COUNT(r.rental_id) AS count_rentals
FROM rental r
JOIN staff USING (staff_id)
JOIN store s USING (store_id)
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3;
The plot shows a comparison between the two stores for each month. As we can see, there is no significant difference between the two stores.
The first step in designing a data warehouse for the sakila DVD rental database is to determine what
questions the Business would like to answer. For example, the Business may wish to know answers to the
following questions.
• Which store has the most rentals?
• Which district has the most rentals?
• Which week of the month has the highest volume of rentals?
• Which week of the year has the highest volume of rentals?
• Is the rental business growing, month over month? and year over year?
• What time of the day is the most active for DVD returns?
• Do rentals decrease when the film duration exceeds a certain time?
• What staff member has the most rentals? the least rentals?
• What movie category type (e.g. genre) results in the most rentals? Note that this is not addressed in
the sample data warehouse schema that was designed for this project.
Once the business needs are identified, the design of the data warehouse can proceed. The next step is to define what will be the facts (i.e. measures) and what will be the dimensions (i.e. aspects of the business process) to be tracked.
The facts for a business are typically sales, cost, inventory, and units sold. Dimensions will typically answer
the questions: what, where, who, and when. For a DVD rental business, the dimensions would typically be
product (i.e., the what), store (i.e., the where), customer (i.e., the who), sales person (again, the who), date
and time (i.e. the when).
For our data warehouse, the following is defined for the fact and dimensions:
The data warehouse schema that will be utilized to answer the Business questions is shown in the figure
below:
Figure 4.3 : Dimension Model (Star Schema) of DVD Rental data warehouse
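As an illustration of the star schema in Figure 4.3, the central fact table could be sketched as below. The table and column names are hypothetical, chosen only to match the fact and dimensions discussed above; the actual schema in the figure may differ.

```sql
-- Hypothetical fact table for the DVD rental star schema.
CREATE TABLE fact_rental (
    rental_key    INT PRIMARY KEY,       -- surrogate key
    date_key      INT NOT NULL,          -- references dim_date (the "when")
    film_key      INT NOT NULL,          -- references dim_film (the "what")
    store_key     INT NOT NULL,          -- references dim_store (the "where")
    customer_key  INT NOT NULL,          -- references dim_customer (the "who")
    staff_key     INT NOT NULL,          -- references dim_staff (also the "who")
    rental_amount DECIMAL(5,2) NOT NULL  -- the measure
);
```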
Lab Task 1
You are required to design a data warehouse for an existing database. Consider the following sample database provided by MySQL. You can download this sample database (tables and data) from the link: Download MySQL Sample Database.
• OrderDetails: stores sales order line items for each sales order.
• Payments: stores payments made by customers based on their accounts.
• Employees: stores all employee information as well as the organization structure such as who
reports to whom.
• Offices: stores sales office data.
Note: Try to choose a different sample database. If your sample database is the same as another student's, then your business rules should differ from theirs.
Lab 05
Demonstration of ETL Tool: SQL Server Integration
services-SSIS (Extraction and Loading)
Objective:
The objective of this lab is to help students work with SSIS for successful data transformation and loading into a data warehouse from various types of data sources. This lab will help you load data from a CSV file into a database/data warehouse table.
Activity Outcomes:
The activities provide hands-on practice with the following topics:
• Install SSIS
• Load data from CSV file to database table.
Instructor Note:
As pre-lab activity, read chapter 3 from the text book “The Data Warehouse ETL Toolkit: Practical
Techniques for Extracting, Cleaning, Conforming, and Delivering Data, Ralph Kimball & Joe Caserta,
Wiley, 2004”.
1) Useful Concepts
What is SSIS:
SQL Server Integration Services (SSIS) is a component of the Microsoft SQL Server database software that can be used to conduct a wide range of data integration tasks. SSIS is a fast and flexible data warehousing tool used for data extraction, loading, and transformation tasks such as cleaning, aggregating, and merging data.
It makes it easy to move data from one database to another. SSIS can extract data from a wide variety of sources like SQL Server databases, Excel files, Oracle and DB2 databases, etc.
SSIS also includes graphical tools and wizards for performing workflow functions such as sending email messages and FTP operations, and for configuring data sources and destinations.
• SSIS tool helps you to merge data from various data stores
• Automates Administrative Functions and Data Loading
• Populates Data Marts & Data Warehouses
• Helps you to clean and standardize data
• Building BI into a Data Transformation Process
• Automating Administrative Functions and Data Loading
• SSIS contains a GUI that helps users to transform data easily rather than writing large programs
• It can load millions of rows from one data source to another in very few minutes
• Identifying, capturing, and processing data changes
• Coordinating data maintenance, processing, or analysis
• SSIS eliminates the need for hardcore programmers
• SSIS offers robust error and event handling
Activity 1:
When Visual Studio is opened, we click on "Continue without code" to add the necessary
extension:
Figure 5.1: SSIS Installation Step one
In the search bar of the opened window, we type "Integration Services" to easily locate the extension.
From the appeared list we choose "SQL Server Integration Services Projects" and press "Download":
Figure 5.3: SSIS Installation Step three
The installation of the extension begins. Now, we will follow some simple steps. In the next window we
click "OK":
After that, we click "Next" to continue:
If you receive the following message, you probably have SQL Server Management Studio opened:
Figure 5.8: SSIS Installation progress bar
Now, we are ready to create Integration Services projects. In Visual Studio, we choose "Create a new
project":
Figure 5.10: Integration service project creation step one
In the next window, we type "integration" to find "Integration Services Project" and click on it:
Figure 5.12: Integration service project creation ( adding project name)
Hence, it is ready! We opened the interface where we can design and develop SSIS 2019 packages:
Activity 2:
First, you need to prepare the environment by creating the SQL Server table and the CSV file.
Run the script below in SQL Server to create the SQL table either on a new database or an existing one.
For this example, I used my ‘TrainingDB’ database.
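The script itself is not reproduced in this copy of the manual. A sketch of what it would look like follows: the table name StudentsDetail appears in the later steps of this activity, but the column list below is an assumption; adjust it to match the headings in your Students.csv file.

```sql
-- Hypothetical column list; match it to your Students.csv headings.
CREATE TABLE dbo.StudentsDetail (
    StudentID INT          NOT NULL PRIMARY KEY,
    FirstName VARCHAR(50)  NOT NULL,
    LastName  VARCHAR(50)  NOT NULL,
    Email     VARCHAR(100) NULL
);
```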
Now create a CSV file with the data below. Open a Notepad file, add the headings separated by commas, and add each record on a new line (with commas between the values in a row). Save the Notepad file with a .csv extension.
After launching Microsoft Visual Studio, navigate to File - New - Project, as shown below.
Figure 5.14: Create new project
Under the Business Intelligence group, select Integration Services and Integration Services Project. Enter a name for the project and a name for the solution, for example “Load CSV”. You can check the “Create a directory for solution” box if you want to create a solution directory.
Figure 5.15: Select integration services project
Click OK
On the right side of the displayed screen, in the “Solution Explorer” window, change the name of the default
package to “Load CSV File into Table”
Figure 5.16: Rename SSIS Package
On the left side of the screen, in the SSIS Toolbox, drag the “Data Flow Task” to the “Control Flow” window
and rename the task to “Load CSV File”
Next, you need to set up the connection managers for both the CSV file and the SQL Server table, also
known as the source and destination respectively. At the bottom of the screen, under Connection
Managers, right-click and select “New Flat File Connection”, then configure the Flat File Connection
Manager as shown below.
Figure 5.17: Flat file connection manager
Enter a suitable Connection manager name and specify the filepath for the Students.csv file. Click OK.
For the table’s connection manager, do a right click again in the Connection Managers window
and click on “New OLE DB Connection”. Click on New and specify the Server name and database
name that contains the StudentsDetail table.
Figure 5.18: Establish OLEDB connection
You can test the connection by clicking “Test Connection” then click OK and OK again. You should now
have the 2 Connection Managers at the bottom of the screen.
Drag the “Flat File Source” from the SSIS Toolbox into the “Data Flow” window and rename it as “CSV
File”.
Figure 5.19: Loading flat file
Double click on this source and select the “Student CSV File” connection manager. Click on Columns on
the left side of the screen to review the columns in the file. Click OK.
Then drag the “OLE DB Destination” from the SSIS Toolbox to the “Data Flow” window and rename it as
“SQL Table”. Drag the blue arrow from the source to the destination.
Figure 5.20: Connecting database table
Click on Mappings on the left side of the screen and ensure all fields are mapped correctly from source to
destination.
Figure 5.21: Mapping columns of source flat file to destination database table column
Click OK. Your screen should look like the image below.
Figure 5.22: Successful loading of data from flat file to database table
Run the package by clicking on Start. When the package finishes executing, you can check the table to view
the data from the CSV file.
You are required to design the Student Exam database (the ERD is given below) and load data from
different CSV and HTML files into each table.
Lab 06
Demonstration of ETL Tool: SQL Server Integration
Services-SSIS (Transformation)
Objective:
The objective of this lab is to help students work with SSIS for successful data transformation and loading
into a data warehouse from various types of data sources. This lab will help in understanding the different
types of transformations that can be applied to data before loading it into a data warehouse.
Activity Outcomes:
The activities provide hands - on practice with the following topics
• Work with different data transformations
• Create an SSIS package for data sampling in the SSIS package.
Instructor Note:
As pre-lab activity, read chapter 3 from the text book “The Data Warehouse ETL Toolkit: Practical
Techniques for Extracting, Cleaning, Conforming, and Delivering Data, Ralph Kimball & Joe Caserta,
Wiley, 2004”.
1) Useful Concepts
SSIS Transformation:
The SSIS transformations are the data flow components that are used to perform aggregations, sorting,
merging, modifying, joining, data cleansing, and distributing the data.
Apart from these, there is an important and powerful transformation in SSIS called the Lookup
transformation, used to perform lookup operations. In this lab, we list the available SSIS transformations
and explain how they work.
Business Intelligence Transformations in SSIS:
The following list of SSIS transformations will perform Business Intelligence operations such as Data
Mining, Correcting, and cleaning the data.
Row Transformation in SSIS:
The below list of SSIS transformations is useful to update the existing column values and to create new
columns.
Rowset Transformations:
The following transformations create new rowsets. The rowset can include aggregate and sorted values,
sample rowsets, or pivoted and unpivoted rowsets.
Split and Join Transformations:
The following transformations distribute rows to different outputs, create copies of the transformation
inputs, join multiple inputs into one output, and perform lookup operations.
Auditing Transformations:
Integration Services includes the following transformations to add audit information and count rows.
2) Solved Lab Activites
Let’s create an SSIS package for data sampling in the SSIS package.
SELECT [CustomerID]
,[PersonID]
,[StoreID]
,[TerritoryID]
,[AccountNumber]
,[rowguid]
,[ModifiedDate]
FROM [adventureworks2014].[Sales].[Customer]
In the SSIS Control flow window, add a data flow task and rename it to Data Sampling Transformation
in SSIS.
Right-click on this data flow task and Edit. It takes you to data flow page. In this page, you can see that we
are in this particular data flow task.
Figure 6.2: Data sampling transformation step two
Add an OLE DB source and rename the task as appropriate. This task should point to your SQL instance and
the Sales.Customer table in the AdventureWorks database.
Right-click on a blank area in the data flow task and click Add Annotation. An annotation is similar to a
text box; it does not execute, and we use it to display notes that help readers understand the SSIS package.
We have prepared the base of the SSIS package in this step. Let’s move forward with data sampling
transformations in SSIS.
We use Row sampling transformation to retrieve a specified random number of data rows from the source
data table. It gives random data every time we execute the SSIS package. You get two outputs from this
transformation.
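Conceptually, the two outputs of Row Sampling can be sketched in plain Python (a simplified stand-in for the SSIS behavior; the account numbers below are made up):

```python
import random

def row_sampling(rows, n, seed=None):
    """Mimic SSIS Row Sampling: pick n random rows (the matched output);
    every other row goes to the unselected output."""
    rng = random.Random(seed)
    picked = set(rng.sample(range(len(rows)), n))
    matched = [row for i, row in enumerate(rows) if i in picked]
    excluded = [row for i, row in enumerate(rows) if i not in picked]
    return matched, excluded

# Made-up stand-ins for the Sales.Customer account numbers.
customers = [f"AW{i:05d}" for i in range(1, 20001)]
match, rest = row_sampling(customers, 1000)
print(len(match), len(rest))  # 1000 19000
```

Every row lands in exactly one of the two outputs, which is why the unselected output always holds the total row count minus the sample size.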
Add a Row Sampling transformation from the SSIS toolbox and drag the blue arrow from source to
transformation, as shown below.
Figure 6.5: Row sampling transformation step one
Double click on Row Sampling, and it opens the row sampling transformation editor.
• Number of rows: the number of random rows we want from the transformation. The default value
is 10. Let’s change it to 1000 rows
• Sample output name: the name of the output that holds the rows selected by the Number of rows
parameter. The default name is Sampling Selected Output. Let’s change it to Row Sampling Match
output
• Unselected output name: the name of the output that contains the data excluded from the sample –
the total number of rows in the table minus the number of rows specified. Let’s change the name to
Excluded data
Figure 6.7: Row sampling transformation editor (changing name of Unselected output)
Let’s skip the option ‘Use the following random seed’ for now. We will cover it in the later part of the
lab.
Now, add two SSIS Multicast transformations and rename them as follows.
• Multicast – Matched
• Multicast – Unmatched
Join the output from Row Sampling to Multicast – Matched, and it opens the input-output selection window.
In the output column, select the output – Row Sampling Match output.
Similarly, take the second output from Row Sampling and join to Multicast – unmatched transformation. It
will automatically take another available output, as shown below.
We added SSIS Multicast operator here to display the data. If you want to insert data into SQL table,
configure OLE DB destination as well.
Figure 6.11: Splitting matched and unmatched data
Right-click on the arrow between Row Sampling and Multicast- Matched and enable data viewer.
Figure 6.12: Data viewer symbol on the Row Sampling Match output
Press F5 to execute the SSIS package. The data viewer opens, and you can see the 1000 rows in the output.
Close the data viewer and let the package execution complete. In the output, we can see that:
Figure 6.14: Viewing the number of matched and unmatched rows
• Number of rows: 10
• Use single-column AccountNumber
First execution:
Figure 6.14: Row sampling output data viewer after first execution
Second execution:
Figure 6.15: Row sampling output data viewer after second execution
Compare the output of the two executions: each returns a random set of account numbers, and the sets
differ between runs. Because the pick is random, certain account numbers may appear again in the second
execution.
Suppose we want to get similar records on each execution. It should give us the output as per specified
record count, but records should not be random.
In the configured SSIS package, open the properties of the Row Sampling transformation again and set the
random seed value to 1. This is recommended for testing purposes only.
Figure 6.16: Setting new random seed value
First execution
Figure 6.17: Row sampling output data viewer after first execution with new seed value
Second execution
Figure 6.18: Row sampling output data viewer after second execution with new seed value
You get the same data in both executions: the transformation picks the random rows once and does not
change them on the next execution.
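The effect of a fixed random seed can be illustrated with a small Python sketch (an analogy for the SSIS behavior, not SSIS itself):

```python
import random

def sample_with_seed(rows, n, seed):
    # A fixed seed makes the "random" pick reproducible, like setting the
    # random seed in the Row Sampling editor (for testing purposes only).
    return random.Random(seed).sample(rows, n)

data = list(range(1, 297))  # stand-in for the 296 source rows
first_run = sample_with_seed(data, 10, seed=1)
second_run = sample_with_seed(data, 10, seed=1)
print(first_run == second_run)  # True: the same rows on every execution
```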
Activity 2:
In the previous section, we discussed the Row Sampling Transformation in SSIS. Percentage sampling
configuration is similar to row sampling.
In row sampling, we specify the number of rows that we want in the output, such as 500 or 1000 rows.
In percentage sampling, we specify a percentage of rows. For example, if there are 1000 incoming rows
and we specify a 10% sample, we get approximately 100 rows in the matched output. The remaining
rows go to the unmatched output.
Like row sampling, percentage sampling picks random sample data, so you might get a completely
different result set on each execution. We can specify a random seed value to get the same data on each
execution.
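The behavior can be approximated in Python as follows (a sketch: each row is independently decided into the sample, which is why the matched count is approximate rather than exact):

```python
import random

def percentage_sampling(rows, percent, seed=None):
    """Mimic SSIS Percentage Sampling: roughly percent% of the rows go to
    the selected output, the remainder to the unselected output."""
    rng = random.Random(seed)
    selected, unselected = [], []
    for row in rows:
        # Each row is independently chosen with probability percent/100,
        # so the selected count is approximate rather than exact.
        (selected if rng.random() < percent / 100 else unselected).append(row)
    return selected, unselected

rows = list(range(1000))
sel, unsel = percentage_sampling(rows, 10, seed=1)
print(len(sel), len(unsel))  # roughly 100 and 900
```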
• Drag a Percentage Sampling transformation from the SSIS toolbox and connect the arrow from the
source data to the percentage sampling
• In the percentage sampling editor, specify the percentage of rows and the output column names
Figure 6.20: Setting up percentage sampling transformation
Execute the SSIS package. In the following screenshot, we can see that percentage sampling
transformation in SSIS does the following tasks
Let’s specify the random seed value 1 in the percentage sampling transformation and execute the
SSIS package.
Figure 6.22: Setting new random seed value for percentage sampling transformation
First execution
Figure 6.23: Percentage sampling output data viewer after first execution with new seed value
Second execution
Figure 6.24: Percentage sampling output data viewer after second execution with new seed value
Conclusion:
In this lab, we explored two data sampling techniques – the Row Sampling and Percentage Sampling
transformations in an SSIS package. You can use these transformations to test a package against different
sets of data and analyze the results.
1. When loading data into SQL Server, you have the option of using SQL Server Integration Services
to handle more complex loading and data transforms than a straight load. One problem you may
face is that data is given to you in multiple files, such as sales and sales orders, but the loading
process requires you to join these flat files during the load instead of doing a preload and merging
the data later. Using the Merge Join SSIS transformation, merge multiple data sources and load the
data into a table. You can use the employee table for data loading from any two source files (CSV
or HTML).
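The join performed by Merge Join can be sketched in plain Python (the file contents and column names below are hypothetical; in the task they would come from your two employee source files):

```python
import csv
import io

# Hypothetical flat-file contents standing in for the two sources.
sales_csv = "SalesOrderID,EmployeeID\n1,101\n2,102\n"
detail_csv = "SalesOrderID,Amount\n1,500\n2,750\n"

sales = list(csv.DictReader(io.StringIO(sales_csv)))
details = list(csv.DictReader(io.StringIO(detail_csv)))

# Inner join on SalesOrderID, like the SSIS Merge Join transformation
# (note that SSIS additionally requires both inputs sorted on the join key).
by_id = {d["SalesOrderID"]: d for d in details}
joined = [{**s, **by_id[s["SalesOrderID"]]}
          for s in sales if s["SalesOrderID"] in by_id]
print(joined[0]["Amount"])  # 500
```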
2. Using the Derived Column transformation, add a new column to the data before loading it into the
table. The new column should calculate the annual salary of each employee from the monthly salary
(consider the employee table for this transformation).
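The derived-column expression for task 2 is just a multiplication; sketched in Python with made-up names and salaries:

```python
# Sketch of the Derived Column idea: compute AnnualSalary from
# MonthlySalary before the row reaches the destination table.
# Names and salaries are made-up sample values.
employees = [
    {"Name": "Ahmed", "MonthlySalary": 50000},
    {"Name": "Fatima", "MonthlySalary": 65000},
]

for emp in employees:
    emp["AnnualSalary"] = emp["MonthlySalary"] * 12  # the derived column

print(employees[0]["AnnualSalary"])  # 600000
```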
Lab 07
Creating ROLAP Cube in SQL Server Analysis
Services (SSAS)
Objective:
The objective of this lab is to learn to design ROLAP Cube using SSAS (SQL Server Analysis Services).
Activity Outcomes:
The activities provide hands - on practice with the following topics
• Create ROLAP cube in data warehouse.
Instructor Note:
As pre-lab activity, read chapter 14 from the text book “Data Mining and Data Warehousing: Principles
and Practical Techniques, Parteek Bhatia, Cambridge University Press, 2019”.
1) Useful Concepts
Introduction to OLAP Cube
• An OLAP cube is a technology that stores data in an optimized way to provide a quick response to
various types of complex queries by using dimensions and measures.
• Most cubes store pre-aggregates of the measures with its special storage structure to provide quick
response to queries.
• SSRS reports and Excel Power Pivot are used as front ends for reporting and data analysis with the SSAS
(SQL Server Analysis Services) OLAP cube.
• SSAS (SQL Server Analysis Services) is Microsoft BI Tool for creating Online Analytical Processing
and data mining functionality.
What is SSAS?
SSAS stands for SQL Server Analysis Services and is a data analysis tool by Microsoft. Using SSAS,
you can analyze data coming from various sources and produce summaries of useful information.
Using SSAS you can create two types of models;
Tabular Model – a kind of database, somewhat more advanced than a normal relational database. Tabular
models are in-memory databases that use tables and other relational components such as relationships,
joins, rows, and columns.
Multidimensional Model – a data model that supports very large amounts of data, similar to (but not
exactly) big data. In addition to tabular data, a multidimensional model supports dimensions, measures,
perspectives, and multiple data sources.
2) Solved Lab activities
Activity 1:
Installing SQL Server 2019 Analysis Services (SSAS)
Some Prerequisites
SQL Server installed as well as SQL Server Management Studio.
Setup SSAS
We would follow the steps below to setup SSAS
Step 1 – Run the SQL Server setup. You will come to the screen below:
Step 2 – Click on the first link ‘New SQL Server stand-alone installation or add features to an existing
installation’.
Step 3 – Follow the Wizard steps. And when it comes to the Feature Selection page, select Analysis
Services as shown below:
Figure 7.2: Choose Analysis services
Step 4 – When you get to the Analysis Services Configuration, choose the server mode as shown below
(note: the ROLAP cube built later in this lab requires Multidimensional and Data Mining Mode; Tabular
Mode does not support cubes)
You also need to install SQL Server Data Tools (SSDT). According to Microsoft, this tool has been
integrated into Visual Studio, so the steps below will take you to where you can download and install
Visual Studio 2019.
Step 1 – Follow the same process but select SQL Server Data Tools. See figure below:
Figure 7.4: Installing SQL Server Data Tools
Step 2 – Click on Install SQL Server Data Tools. You’ll be taken to the download page for Visual Studio.
Install it (We already installed it in Lab 1).
Activity 2:
OLAP Cube Creation
Create a database called “Testing” and run the following query to create tables:
create table Dim_Customer(
customerid int primary key,
name varchar(50)
)
create table Dim_Product(
productid int primary key,
name varchar(50),
price int
)
create table Fact_Sales(
salesid int primary key,
customerid int foreign key references Dim_Customer(customerid),
productid int foreign key references Dim_Product(productid),
salesTotal int
)
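Before building the cube, you can sanity-check the star schema with a few rows. The sketch below uses Python's sqlite3 with made-up sample data (the lab itself uses SQL Server); the final query shows the kind of star-join aggregation a cube pre-computes:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table Dim_Customer(customerid int primary key, name varchar(50));
create table Dim_Product(productid int primary key, name varchar(50), price int);
create table Fact_Sales(
    salesid int primary key,
    customerid int references Dim_Customer(customerid),
    productid int references Dim_Product(productid),
    salesTotal int);
""")

# Made-up sample rows, just to exercise the schema.
con.executemany("insert into Dim_Customer values (?,?)",
                [(1, "Ali"), (2, "Sara")])
con.executemany("insert into Dim_Product values (?,?,?)",
                [(1, "Bike", 300), (2, "Helmet", 40)])
con.executemany("insert into Fact_Sales values (?,?,?,?)",
                [(1, 1, 1, 300), (2, 2, 2, 40), (3, 1, 2, 80)])

# A star-join aggregation: total sales per customer across the fact table.
rows = con.execute("""
    select c.name, sum(f.salesTotal)
    from Fact_Sales f
    join Dim_Customer c on c.customerid = f.customerid
    group by c.name order by c.name
""").fetchall()
print(rows)  # [('Ali', 380), ('Sara', 40)]
```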
Create a new Analysis Services project in Microsoft Visual Studio:
Note: Use your windows username and password.
Create a new cube by right-clicking Cubes in Solution Explorer:
Deploying:
Go to project properties and enter your server name from SSMS and database name as “Testing”.
Once deployment is successful, right-click the sales cube and click Process to process the cube.
3) Graded Lab Tasks
Note: The instructor can design graded lab activities according to the level of difficulty and complexity of
the solved lab activities. The lab tasks assigned by the instructor should be evaluated in the same lab.
Lab Task 1
Star schema for sales DWH is given below. You are required to create this Data warehouse. Queries for
table creation and data loading can be found on the link:
https://www.codeproject.com/Articles/652108/Create-First-Data-WareHouse
Lab 08
Get Started with Power BI Desktop
Activity Outcomes:
The activities provide hands - on practice with the following topics
• Get familiar with PowerBI basic ribbon and operations.
• Load data from CSV files
• Remove NULLs from Data
• Create basic visualization
Instructor Note:
As pre-lab activity, read chapter 14 from the textbook “Business Intelligence Guidebook: From Data
Integration to Analytics, Rick Sherman, Morgan Kaufmann Press, 2014”.
1) Useful Concepts
Introducing Power BI
Power BI is a suite of business analytics tools which connects to different data sources to analyze data and share
insights throughout your organization.
Parts of Power BI
Power BI Desktop: a Windows desktop application (report authoring tool) that lets you build queries,
models, and reports that visualize data.
Power BI Service: a cloud-based Software as a Service (SaaS) application that allows us to create
dashboards, set up scheduled data refreshes, and share reports securely within the organization.
Power BI Mobile: It is an application (App) on mobile devices which allows you to interact with the reports and
dashboard from Power BI Service.
It doesn’t always happen that way, and that’s okay, but we’ll use that flow to help you learn the various
parts of Power BI, and how they complement one another.
Power BI Desktop:
Power BI Desktop is a report authoring tool that allows you to create reports and queries, extract, transform,
and load data from data sources, and model the queries.
Power BI Desktop Interface: The Report has five main areas:
1. Ribbon: The Ribbon displays common tasks associated with reports and visualizations;
2. Pages: The Pages tab area along the bottom allows you to select or add a report page;
3. Visualizations: The Visualizations pane allows you to change visualizations, customize colors or axes,
apply filters, drag fields, and more;
4. Fields: The Fields pane, allows you to drag and drop query elements and filters onto the Report view, or
drag to the Filters area of the Visualizations pane;
5. Views Pane: There are three types of views in the views pane
▪ Reports View – allows you to create any number of report pages with visualizations.
▪ Data View – allows you to inspect, explore, and understand data in your Power BI Desktop model.
▪ Relationship or Model view – allows you to show all of the tables, columns, and relationships in your
model.
Activity 1:
Querying Data from CSV
Solution:
Query Editor
You can import and clean data from a variety of sources while working in Power BI.
Query Editor allows you to connect to one or many data sources, shape and transform the data to meet your
business needs, and then load the queries into the Power BI Desktop model.
The steps below provide an overview of working with data, from connecting to data sources to shaping the
data in Query Editor.
2. Click on the drop down of the Edit Queries on the bottom right corner, click on Edit Queries
Note: With no data connections, Query Editor appears as a blank pane, ready for data.
4. Navigate to the Strategic Plan and Dashboard Folder and Choose
PowerBITraining_StrategicPlanDashboard_Input_Template Excel File
6. Select Input sheet from the available list
2. Left Pane
4. Query Settings
The Query Ribbon
The Ribbon in Query Editor consists of four tabs
▪ Home
▪ Transform
▪ Add Column
▪ View
Home Tab: The Home tab contains the common query tasks, including the first step in any query, which is Get
Data.
Transform: The Transform tab provides access to common data transformation tasks, such as adding or
removing columns, changing data types, splitting columns, and other data-driven tasks.
Add Column: The Add Column tab provides additional tasks associated with adding a column, formatting
column data, and adding custom columns. The following image shows the Add Column tab.
View Tab: The View tab on the ribbon is used to toggle whether certain panes or windows are displayed. It’s also
used to display the Advanced Editor. The following image shows the View tab.
The left pane displays the number of active queries, as well as the name of the query. When you select a
query from the left pane, its data is displayed in the center pane, where you can shape and transform the
data to meet your needs.
The center (data) pane:
In the Center pane, or Data pane, data from the selected query is displayed. This is where much of the work of
the Query view is accomplished.
Activity 2:
Solution:
Removing the unwanted rows in the query.
8. Home Tab > Reduce Rows section > Remove Rows > Remove Blank Rows
Notice that the null records are eliminated, and a new step is added, in the Query Settings pane of the
selected query, for the transformation you applied.
Note: Each step you perform in the Query Editor is recorded in the Applied Steps list of the Query Settings pane.
Note: After Close & Apply, the query is added to the model for report development.
Activity 3:
Creating Simple Reports & Visualizations
Solution:
Creating your first visualization (Completion % of All Goals) Gauge Chart
1. Click on Visualizations Pane and Click on Gauge Chart
Note: Make sure the Visualization is selected before dropping the fields.
2. Expand Input Query, Drag Overall Completion% to the Value section of the Fields pane of the gauge Visual
With Report Themes you can apply design changes to your entire report, such as using corporate colors,
changing icon sets, or applying new default visual formatting. When you apply a Report Theme, all visuals in
your report use the colors and formatting from your selected theme.
3. From the Home ribbon of the Report view, click on the drop-down of Switch Theme under the Themes
section and select Import from file.
A window appears that lets you browse to the location of the JSON theme file
4. Navigate to the Strategic Plan and Dashboard folder on the Desktop and select the Power BI Color Theme.Json file
5. Click on Open ( ) at the bottom of the screen
You will get a success message once the theme is imported successfully.
6. Select the Gauge Chart and Click on the Format of the Gauge Chart, Expand Data Colors properties,
click on the drop down of Fill property and select light blue color
After changing the color, the gauge chart looks like the one below.
7. Click on the drop down of Target property and select Black color.
Changing the Title of the Gauge Chart.
8. Expand the title property of the Gauge chart, Change the title text to “Completion% of All 4 Goals”.
We are done with our first visualization. We will create few more visualizations.
Exercise 9: Creating the Stacked Column Chart.
9. Click anywhere on the canvas other than the visuals, select Stacked Column Chart, and bring the visual next to
the Donut Chart.
10. Expand Input, Drag Overall Completion% to the Value section, Goal Detail to the Legend, Goal to the
Axis of the Fields pane of the Stacked Column Visual.
Notice that the goals are not in the right order.
11. Click on the ellipses ( More Options) of the Stacked Column Visual, Select Sort Ascending,
Hover on Sort by and Select Goal Detail.
12. Click on the format icon ( ) for the visual, Expand Title and edit the title to “Goal Completion% by Goal”
15. Click anywhere on the Canvas other than the visuals, select Stacked Column Chart and bring the visual below
the Donut Chart.
16. Expand Input, Drag Overall Completion% to the Value section, Performance Measure/Milestone Detail to
the Axis, Champion to the tool tip of the Fields pane of the Stacked Column Visual.
Filters in Power BI
Filters allow a Power BI visual to narrow down to the desired result. Here we are filtering the
visual to show just the data for a single goal.
17. Expand the Filters pane, drag Goal to the “Add data fields here” section under the Filters on this visual
section, and select Goal 1
18. Click on the format icon ( ) for the Stacked Column Chart visual, expand Title and edit the title to Goal 1
Completion%
19. Turn on the Data Labels Property, Expand Y axis Property and in the End box Type 1
20. Click on the Stacked Column Chart visual and copy & paste it, Adjust the position on the Report page
21. Click on the format icon ( ) for the Stacked Column chart visual, expand Title and edit the title to Goal 2
Completion %
22. Expand the filters pane, click on the drop down of the Goal Filter on Filters Pane and select Goal 2
Notice that the Stacked Column Chart visual automatically changes to reflect the data for Goal 2.
23. Click on the format icon ( ) for the Stacked Column chart visual, expand Data colors property, Change
the color to reflect the color for Goal 2 on the Goal Completion % by Goal.
24. Click on the Stacked Column Chart visual and copy & paste it, Adjust the position on the Report page
25. Click on the format icon ( ) for the Stacked Column chart visual, expand Title and edit the title to Goal 3
Completion %
26. Expand the filters pane, click on the drop down of the Goal Filter on Filters Pane and select Goal 3
27. Click on the format icon ( ) for the Stacked Column chart visual, expand the Data colors property, and change
the color to reflect the color for Goal 3 on the Goal Completion % by Goal chart.
28. Click on the Stacked Column Chart visual and copy & paste it, Adjust the position on the Report page
29. Click on the format icon ( ) for the Stacked Column chart visual, expand Title and edit the title to Goal 4
Completion %
30. Expand the filters pane, click on the drop down of the Goal Filter on Filters Pane and select Goal 4
31. From the Home ribbon, click on Text Box, type in “Strategic Plan Dashboard”, and increase the font
size to 21.
5) Graded Lab Tasks
Note: The instructor can design graded lab activities according to the level of difficulty and complexity
of the solved lab activities. The lab tasks assigned by the instructor should be evaluated in the same
lab.
Lab Task 1
Download and install Power BI Desktop. Explore the Power BI Desktop interface. Import a dataset (from
any online site, e.g., Kaggle, data.world, etc.) into Power BI Desktop. Create different visualizations using
the imported data.
Lab 10
Preparing Data in Power BI Desktop
Objective:
The objective of this lab is to practice connecting to source data, previewing the data, and using data preview
techniques to understand the characteristics and quality of the source data.
Instructor Note:
As pre-lab activity, read chapter 14 from the text book “Business Intelligence Guidebook: From Data
Integration to Analytics, Rick Sherman, Morgan Kaufmann Press, 2014”.
1) Useful Concepts
Activity 1:
Prepare Data
Solution:
In this exercise, you will create eight Power BI Desktop queries. Six queries will source data from SQL Server,
and two from CSV files.
5. Click Save.
Tip: You can also save the file by clicking the Save icon located at the top-right.
3. In the Options window, at the left, in the Current File group, select Data Load.
The Data Load settings for the current file allow setting options that determine default behaviors when
modeling.
4. In the Relationships group, uncheck the two options that are checked.
While these two options can be helpful when developing a data model, they have been disabled to support
the lab experience. When you create relationships in Lab 03A, you will learn why you are adding each one.
5. Click OK.
In this task, you will create queries based on SQL Server tables.
1. On the Home ribbon tab, from inside the Data group, click SQL Server.
2. In the SQL Server Database window, in the Server box, enter localhost.
In these labs, you will connect to the SQL Server database by using localhost. This isn’t a recommended
practice when creating your own solutions, however, because gateway data sources cannot resolve
localhost.
3. Click OK.
5. Click Connect.
The preview allows you to determine the columns and a sample of rows.
o DimEmployee
o DimEmployeeSalesTerritory
o DimProduct
o DimReseller
o DimSalesTerritory
o FactResellerSales
11. To apply transformations to the data of the selected tables, click Transform Data.
You won’t be transforming the data in this lab. The objectives of this lab are to explore and profile the
data in the Power Query Editor window.
In this task, you will preview the data of the SQL Server queries. First, you will learn relevant information about
the data. You will also use column quality, column distribution, and column profile tools to understand the data,
and assess data quality.
1. In the Power Query Editor window, at the left, notice the Queries pane.
The Queries pane contains one query for each selected table.
The DimEmployee table stores one row for each employee. A subset of the rows represent the
salespeople, which will be relevant to the model you’ll develop.
3. At the bottom left, in the status bar, notice the table statistics—the table has 33 columns, and 296 rows.
5. Notice that the last five columns contain Table or Value links.
These five columns represent relationships to other tables in the database. They can be used to join
tables together.
6. To assess column quality, on the View ribbon tab, from inside the Data Preview group, check Column
Quality.
Column quality allows you to easily determine the percentage of valid, error, or empty values.
7. For the Position column (sixth last column), notice that 94% of rows are empty (null).
8. To assess column distribution, on the View ribbon tab, from inside the Data Preview group, check
Column Distribution.
9. Review the Position column again, and notice that there are four distinct values, and one unique value.
10. Review the column distribution for the EmployeeKey (first) column—there are 296 distinct values, and
296 unique values.
When the distinct and unique counts are the same, it means the column contains unique values. When
modeling, it’s important that some tables contain unique columns.
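The two statistics can be reproduced by hand. In this Python sketch (with made-up Position values), distinct counts the different values and unique counts the values that occur exactly once; Power Query's exact null handling may differ slightly:

```python
# Made-up sample of a Position column with one empty (null) value.
position = ["Salesperson", "Manager", "Salesperson", None, "CEO"]

non_null = [v for v in position if v is not None]
distinct = len(set(non_null))                  # how many different values
unique = sum(1 for v in set(non_null)
             if non_null.count(v) == 1)        # values that appear once
empty_pct = round(100 * position.count(None) / len(position))

print(distinct, unique, empty_pct)  # 3 2 20
```

When distinct equals unique for a column (as with EmployeeKey above), every value appears exactly once, which is what makes the column usable as a key.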
The DimEmployeeSalesTerritory table stores one row for each employee and the sales territory regions
they manage. The table supports relating many regions to a single employee. Some employees manage
one, two, or possibly more regions. When you model this data, you will need to define a many-to-many
relationship, which you will do in Lab 05A.
The DimProduct table contains one row per product sold by the company.
When you add transformations to this query in the next lab, you’ll use the
DimProductSubcategory column to join tables.
The DimReseller table contains one row per reseller. Resellers sell, distribute, or value add Adventure
Works’ products.
16. To view column values, on the View ribbon tab, from inside the Data Preview group, check Column
Profile.
17. Select the BusinessType column header.
18. Notice that a new pane opens beneath the data preview pane.
20. Notice the data quality issue: there are two labels for warehouse (Warehouse, and the misspelled Ware
House).
21. Hover the cursor over the Ware House bar, and notice that there are five rows with this value.
In the next lab, you will apply a transformation to relabel these five rows.
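That relabeling transformation amounts to a conditional replace. A Python sketch with sample BusinessType values:

```python
# Sample BusinessType values, including the misspelled "Ware House".
business_types = ["Warehouse", "Ware House", "Value Added Reseller",
                  "Ware House", "Specialty Bike Shop"]

# Replace the misspelled label, leaving every other value untouched.
cleaned = ["Warehouse" if v == "Ware House" else v for v in business_types]
print(cleaned.count("Warehouse"))  # 3
```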
The DimSalesTerritory table contains one row per sales region, including Corporate HQ
(headquarters). Regions are assigned to a country, and countries are assigned to groups. In Lab 04A,
you will create a hierarchy to support analysis at the region, country, or group level.
The FactResellerSales table contains one row per sales order line—a sales order contains one or more
line items.
24. Review the column quality for the TotalProductCost column, and notice that 8% of the rows are empty.
Missing TotalProductCost column values are a data quality issue. To address the issue, in the next lab
you will apply transformations to fill in the missing values by using the product standard cost, which is
stored in the DimProduct table.
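That fill-in logic can be sketched as a lookup against DimProduct (the product keys, quantities, and costs below are made-up sample values):

```python
# DimProduct lookup: ProductKey -> StandardCost (made-up values).
standard_cost = {210: 413, 345: 12}

fact_rows = [
    {"ProductKey": 210, "OrderQuantity": 2, "TotalProductCost": 826},
    {"ProductKey": 345, "OrderQuantity": 3, "TotalProductCost": None},
]

# Where TotalProductCost is missing, substitute standard cost x quantity.
for row in fact_rows:
    if row["TotalProductCost"] is None:
        row["TotalProductCost"] = (standard_cost[row["ProductKey"]]
                                   * row["OrderQuantity"])

print(fact_rows[1]["TotalProductCost"])  # 36
```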
2. In the Open window, navigate to the D:\DA100\Data folder, and select the ResellerSalesTargets.csv
file.
3. Click Open.
4. In the ResellerSalesTargets.csv window, notice the data preview.
5. Click OK.
The ResellerSalesTargets CSV file contains one row per salesperson, per year. Each row records 12
monthly sales targets (expressed in thousands). The business year for the Adventure Works company
commences on July 1.
When there isn’t a monthly sales target, a hyphen character is stored instead.
8. Review the icons in each column header, to the left of the column name.
The icons represent the column data type. 123 is whole number, and ABC is text.
In the next lab, you’ll apply many transformations to achieve a different shaped result consisting of only
three columns: Date, EmployeeKey, and TargetAmount.
The ColorFormats CSV file contains one row per product color. Each row records the HEX codes to
format background and font colors. In the next lab, you will integrate this data with the DimProduct
query data.
Finish up
1. On the View ribbon tab, from inside the Data Preview group, uncheck the three data preview options:
2. To save the Power BI Desktop file, on the File backstage view, select Save.
3. When prompted to apply the queries, click Apply Later.
Applying the queries will load their data to the data model. You’re not ready to do that, as there are
many transformations that must be applied first.
Lab 11
Transformation using Power BI
Objective:
The objective of this lab is to apply transformations to each of the queries created in the previous lab.
Activity Outcomes:
The activities provide hands-on practice with the following topics
• Apply various transformations
• Apply queries to load them to the data model
Instructor Note:
As pre-lab activity, read Chapter xx from the textbook “”.
1) Useful Concepts
Activity 1:
Transformations on the queries.
Solution:
Load Data
In this exercise, you will apply transformations to each of the queries created in the previous lab.
1. In the Power Query Editor window, in the Queries pane, select the DimEmployee query.
2. To rename the query, in the Query Settings pane (located at the right), in the Name box, replace the
text with Salesperson, and then press Enter.
The query name will determine the model table name. It’s recommended to define concise, yet friendly,
names.
3. In the Queries pane, verify that the query name has updated.
You will now filter the query rows to retrieve only employees who are salespeople.
4. To locate a specific column, on the Home ribbon tab, from inside the Manage Columns group, click
the Choose Columns down-arrow, and then select Go to Column.
Tip: This technique is useful when a query contains many columns. Usually, you can simply horizontally
scroll to locate the column.
5. In the Go to Column window, to order the list by column name, click the AZ sort button, and then
select Name.
7. To filter the query, in the SalesPersonFlag column header, click the down-arrow, and then uncheck
FALSE.
8. Click OK.
9. In the Query Settings pane, in the Applied Steps list, notice the addition of the Filtered Rows step.
Each transformation you create results in additional step logic. It’s possible to edit or delete steps. It’s
also possible to select a step to preview the query results at that stage of transformation.
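For reference, the Filtered Rows step records Power Query M logic similar to the following sketch. The name of the previous step (#"Renamed Columns") is illustrative only and will differ in your query.

```m
// Keep only the rows where the salesperson flag is TRUE
= Table.SelectRows(#"Renamed Columns", each [SalesPersonFlag] = true)
```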
10. To remove columns, on the Home ribbon tab, from inside the Manage Columns group, click the
Choose Columns icon.
11. In the Choose Columns window, to uncheck all columns, uncheck the (Select All Columns) item.
12. Check only the following columns:
o EmployeeKey
o EmployeeNationalIDAlternateKey
o FirstName
o LastName
o Title
o EmailAddress
13. Click OK.
14. In the Applied Steps list, notice the addition of another query step.
15. To create a single name column, first select the FirstName column header.
16. While pressing the Ctrl key, select the LastName column.
17. Right-click either of the selected column headers, and then in the context menu, select Merge Columns.
Many common transformations can be applied by right-clicking the column header, and then choosing
them from the context menu. Note, however, that all transformations—and more—are available in the
ribbon.
18. In the Merge Columns window, in the Separator dropdown list, select Space.
19. In the New Column Name box, replace the text with Salesperson.
22. Replace the text with EmployeeID, and then press Enter.
When instructed to rename columns, it’s important that you rename them exactly as described.
23. Use the previous steps to rename the EmailAddress column to UPN.
24. At the bottom-left, in the status bar, verify that the query has five columns and 18 rows.
It’s important that you do not proceed if your query does not produce the correct result, as it won’t be
possible to complete later labs. If it doesn’t, refer back to the steps in this task to fix any problems.
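For reference, the Merge Columns transformation applied in this activity generates M code along these lines. The previous step name is illustrative and will differ in your query.

```m
// Combine FirstName and LastName, separated by a space, into a Salesperson column
= Table.CombineColumns(
    #"Removed Other Columns",
    {"FirstName", "LastName"},
    Combiner.CombineTextByDelimiter(" ", QuoteStyle.None),
    "Salesperson")
```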
Activity 2:
Configure the SalespersonRegion query
Solution:
In this task, you will configure the SalespersonRegion query.
3. To remove the last two columns, first select the DimEmployee column header.
4. While pressing the Ctrl key, select the DimSalesTerritory column header.
5. Right-click either of the selected column headers, and then in the context menu, select Remove Columns.
6. In the status bar, verify that the query has two columns and 39 rows.
Activity 3:
Configure the Product query
Solution:
In this task, you will configure the Product query.
Because detailed instructions were provided in earlier labs, the lab steps will now be more concise. If
you need detailed instructions, you can refer back to earlier tasks.
3. Locate the FinishedGoodsFlag column, and then filter the column to retrieve products that are finished
goods (i.e. TRUE).
4. Remove all columns, except the following:
o ProductKey
o EnglishProductName
o StandardCost
o Color
o DimProductSubcategory
5. Notice that the DimProductSubcategory column represents a related table (it contains Value links).
6. In the DimProductSubcategory column header, at the right of the column name, click the expand
button.
By selecting these two columns, a transformation will be applied to join to the
DimProductSubcategory table, and then include these columns. The
DimProductCategory column is, in fact, another related table.
Query column names must always be unique. When checked, this checkbox would prefix each column
with the expanded column name (in this case DimProductSubcategory). Because it’s known that the
selected columns don’t collide with columns in the Product query, the option is deselected.
11. Notice that the transformation resulted in two columns, and that the DimProductCategory column has
been removed.
12. Expand the DimProductCategory column, and then include only the EnglishProductCategoryName
column.
13. Rename the following two columns:
o EnglishProductSubcategoryName to Subcategory
o EnglishProductCategoryName to Category
14. In the status bar, verify that the query has six columns and 397 rows.
Activity 4:
Configure the Reseller query
Solution:
In this task, you will configure the Reseller query.
3. Remove all columns, except the following:
o ResellerKey
o BusinessType
o ResellerName
o DimGeography
4. Expand the DimGeography column, to include only the following three columns:
o City
o StateProvinceName
o EnglishCountryRegionName
5. In the BusinessType column header, click the down-arrow, review the items, and notice the incorrect
spelling of warehouse.
6. Right-click the BusinessType column header, and then select Replace Values.
7. In the Replace Values window, configure the values so that Ware House is replaced with Warehouse.
8. Click OK.
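The Replace Values step corresponds to M logic similar to the following sketch. The previous step name is illustrative and will differ in your query.

```m
// Replace the misspelled "Ware House" with "Warehouse" in the BusinessType column
= Table.ReplaceValue(#"Expanded DimGeography", "Ware House", "Warehouse", Replacer.ReplaceText, {"BusinessType"})
```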
9. Rename the following four columns:
o BusinessType to Business Type (include a space)
o ResellerName to Reseller
o StateProvinceName to State-Province
o EnglishCountryRegionName to Country-Region
10. In the status bar, verify that the query has six columns and 701 rows.
Activity 5:
Configure the Region query
Solution:
In this task, you will configure the Region query.
4. Remove all columns, except the following:
o SalesTerritoryKey
o SalesTerritoryRegion
o SalesTerritoryCountry
o SalesTerritoryGroup
Next, configure the Sales query (based on the FactResellerSales table). Remove all columns, except the
following:
o SalesOrderNumber
o OrderDate
o ProductKey
o ResellerKey
o EmployeeKey
o SalesTerritoryKey
o OrderQuantity
o UnitPrice
o TotalProductCost
o SalesAmount
o DimProduct
11. Expand the DimProduct column, and then include the StandardCost column.
12. To create a custom column, on the Add Column ribbon tab, from inside the General group, click
Custom Column.
13. In the Custom Column window, in the New Column Name box, replace the text with Cost.
14. In the Custom Column Formula box, enter the following Power Query expression (after the equals
symbol):
15. For your convenience, you can copy the expression from the D:\DA100\Lab03A\Assets\Snippets.txt
file.
This expression tests whether the TotalProductCost value is missing. If it is, it produces a value by
multiplying the OrderQuantity value by the StandardCost value; otherwise, it uses the existing
TotalProductCost value.
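Based on that description, the expression follows this pattern (this is a sketch; use the exact snippet from the Snippets.txt file):

```m
// Use the stored cost when present; otherwise derive it from quantity and standard cost
if [TotalProductCost] = null
then [OrderQuantity] * [StandardCost]
else [TotalProductCost]
```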
Remove the following two columns:
o TotalProductCost
o StandardCost
11. Rename the following three columns:
o OrderQuantity to Quantity
o UnitPrice to Unit Price (include a space)
o SalesAmount to Sales
12. To modify the column data type, in the Quantity column header, at the left of the column name, click
the 1.2 icon, and then select Whole Number.
Configuring the correct data type is important. When a column contains numeric values, it’s also
important to choose the correct type if you expect to perform mathematical calculations.
13. Modify the following three column data types to Fixed Decimal Number.
o Unit Price
o Sales
o Cost
The fixed decimal number data type stores values with full precision, and so requires more storage
space than the decimal number type. It’s important to use the fixed decimal number type for financial
values, or rates (like exchange rates).
14. In the status bar, verify that the query has 10 columns and 999+ rows.
A maximum of 1000 rows will be loaded as preview data for each query.
2. Rename the query to Targets.
3. To unpivot the 12 month columns (M01-M12), first multi-select the Year and EmployeeID column
headers.
4. Right-click either of the selected column headers, and then in the context menu, select Unpivot Other
Columns.
5. Notice that the column names now appear in the Attribute column, and the values appear in the Value
column.
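For reference, the Unpivot Other Columns transformation corresponds to M logic like the following sketch. The previous step name is illustrative and will differ in your query.

```m
// Turn the M01-M12 month columns into Attribute/Value pairs, keeping Year and EmployeeID as-is
= Table.UnpivotOtherColumns(#"Renamed Columns", {"Year", "EmployeeID"}, "Attribute", "Value")
```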
Rename the following two columns:
o Attribute to MonthNumber (no space between the two words; it will be removed later)
o Value to Target
You will now apply transformations to produce a date column. The date will be derived from the Year
and MonthNumber columns. You will create the column by using the Columns From Examples feature.
8. To prepare the MonthNumber column values, right-click the MonthNumber column header, and then
select Replace Values.
12. On the Add Column ribbon tab, from inside the General group, click the Column From Examples
icon.
13. Notice that the first row is for year 2017 and month number 7.
14. In the Column1 column, in the first grid cell, start entering 7/1/2017, and then press Enter.
The virtual machine uses US regional settings, so this date is in fact July 1, 2017.
15. Notice that the grid cells update with predicted values.
The feature has accurately predicted that you are combining values from two columns.
16. Notice also the formula presented above the query grid.
17. To rename the new column, double-click the Merged column header.
20. Remove the following columns:
o Year
o MonthNumber
22. To multiply the Target values by 1000, select the Target column header, and then on the Transform
ribbon tab, from inside the Number Column group, click Standard, and then select Multiply.
25. In the status bar, verify that the query has three columns and 809 rows.
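The Multiply transformation in step 22 generates M similar to the following sketch. The previous step name is illustrative and will differ in your query.

```m
// Target values are stored in thousands; scale them to actual amounts
= Table.TransformColumns(#"Removed Columns", {{"Target", each _ * 1000, type number}})
```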
Configure the ColorFormats query
3. On the Home ribbon tab, from inside the Transform group, click Use First Row as Headers.
4. In the status bar, verify that the query has three columns and 10 rows.
Activity 6:
Update the Product query
Solution:
In this task, you will update the Product query by merging the ColorFormats query.
1. Select the Product query.
2. To merge the ColorFormats query, on the Home ribbon tab, from inside the Combine group, click
Merge Queries.
Merging queries allows you to integrate data, in this case from different data sources (SQL Server and
a CSV file).
3. In the Merge window, in the Product query grid, select the Color column header.
4. Beneath the Product query grid, in the dropdown list, select the ColorFormats query.
6. When the Privacy Levels window opens, for each of the two data sources, in the corresponding
dropdown list, select Organizational.
Privacy levels can be configured for each data source to determine whether data can be shared between
sources. Setting each data source to Organizational allows them to share data, if necessary. Note that
Private data sources can never be shared with other data sources. It doesn’t mean that Private data
cannot be shared; it means that the Power Query engine cannot share data between the sources.
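The Merge Queries operation is recorded as a join in M, roughly as follows. The previous step name is illustrative and will differ in your query.

```m
// Left outer join from the Product query to ColorFormats on the Color column
= Table.NestedJoin(#"Renamed Columns", {"Color"}, ColorFormats, {"Color"}, "ColorFormats", JoinKind.LeftOuter)
```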
7. Click Save.
10. In the status bar, verify that the query now has eight columns and 397 rows.
In this task, you will update the ColorFormats query to disable its load.
3. In the Query Properties window, uncheck the Enable Load To Report checkbox.
Disabling the load means it will not load as a table to the data model. This is done because the query
was merged with the Product query, which is enabled to load to the data model.
4. Click OK.
Finish up
3. In the Fields pane (located at the right), notice the seven tables loaded to the data model.
4. Save the Power BI Desktop file.
Lab 12
Data Modeling in Power BI Desktop
Objective:
The objective of this lab is to develop the data model. It will involve creating relationships between tables, and
then configuring table and column properties to improve the friendliness and usability of the data model. You
will also create hierarchies and create quick measures.
Activity Outcomes:
The activities provide hands-on practice with the following topics
• Create model relationships
• Configure table and column properties
• Create hierarchies
• Create quick measures
Instructor Note:
As pre-lab activity, read Chapter xx from the textbook “Business Intelligence Guidebook: From Data
Integration to Analytics, Rick Sherman, Morgan Kaufmann Press, 2014”.
1) Useful Concepts
Activity 1:
Create Model Relationships
Solution:
In this exercise, you will create model relationships.
2. If you do not see all seven tables, scroll horizontally to the right, and then drag and arrange the tables
more closely together so they can all be seen at the same time.
In Model view, it’s possible to view each table and relationships (connectors between tables). Presently,
there are no relationships because you disabled the data load relationship options.
3. To return to Report view, at the left, click the Report view icon.
4. To view all table fields, in the Fields pane, right-click an empty area, and then select Expand All.
5. To create a table visual, in the Fields pane, from inside the Product table, check the Category field.
From now on, the labs will use a shorthand notation to reference a field. It will look like this: Product
| Category.
6. To add a column to the table, in the Fields pane, check the Sales | Sales field.
7. Notice that the table visual lists four product categories, and that the sales value is the same for each,
and the same for the total.
The issue is that the table is based on fields from different tables. The expectation is that each product
category displays the sales for that category. However, because there isn’t a model relationship between
these tables, the Sales table is not filtered. You will now add a relationship to propagate filters between
the tables.
8. On the Modeling ribbon tab, from inside the Relationships group, click Manage Relationships.
9. In the Manage Relationships window, notice that no relationships are yet defined.
11. In the Create Relationship window, in the first dropdown list, select the Product table.
12. In the second dropdown list (beneath the Product table grid), select the Sales table.
13. Notice the ProductKey columns in each table have been selected.
The columns were automatically selected because they share the same name.
14. In the Cardinality dropdown list, notice that One To Many is selected.
The cardinality was automatically detected, because Power BI understands that the ProductKey column
from the Product table contains unique values. One-to-many relationships are the most common
cardinality, and all relationships you create in this lab will be this type.
15. In the Cross Filter Direction dropdown list, notice that Single is selected.
Single filter direction means that filters propagate from the “one side” to the “many side”. In this
case, it means filters applied to the Product table will propagate to the Sales table, but not in the
other direction. Notice that the Mark This Relationship Active checkbox is checked.
Active relationships will propagate filters. It’s possible to mark a relationship as inactive so filters don’t
propagate. Inactive relationships can exist when there are multiple relationship paths between tables,
in which case model calculations can use special functions to activate them.
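One such function is USERELATIONSHIP. As a hypothetical example (the measure name and columns here are for illustration only), a DAX measure could activate an inactive relationship for the duration of its calculation:

```dax
// Activates the named inactive relationship while this measure evaluates
Salesperson Sales =
CALCULATE(
    SUM(Sales[Sales]),
    USERELATIONSHIP(Salesperson[EmployeeKey], Sales[EmployeeKey])
)
```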
16. In the Manage Relationships window, notice that the new relationship is listed, and then click Close.
17. In the report, notice that the table visual has updated to display different values for each product
category.
18. Filters applied to the Product table now propagate to the Sales table.
19. Switch to Model view, and then notice there is now a connector between the two tables.
20. In the diagram, notice that you can interpret the cardinality, which is represented by the 1 and *
indicators.
Filter direction is represented by the arrow head. And, a solid line represents an active relationship; a
dashed line represents an inactive relationship.
21. Hover the cursor over the relationship to reveal the related columns.
There’s an easier way to create a relationship. In the model diagram, you can drag and drop columns
to create a new relationship.
22. To create a new relationship, from the Reseller table, drag the ResellerKey column on to the
ResellerKey column of the Sales table.
Tip: Sometimes a column doesn’t want to be dragged. If this situation arises, select a different column,
then select the column you intend to drag, and try again.
In this lab, the SalespersonRegion and Targets tables will remain disconnected. There’s a many-to-
many relationship between salespeople and regions; you will work through this advanced scenario in
the next lab.
24. In the diagram, place the tables with the Sales table in the center, and arrange the related tables
around it.
Activity 2:
Configure Tables
Solution:
In this exercise, you will configure each table by creating hierarchies, and hiding, formatting, and categorizing
columns.
1. In Model view, in the Fields pane, if necessary, expand the Product table.
2. To create a hierarchy, in the Fields pane, right-click the Category column, and then select Create
Hierarchy.
3. In the Properties pane (to the left of the Fields pane), in the Name box, replace the text with Products.
4. To add the second level to the hierarchy, in the Hierarchy dropdown list, select Subcategory.
5. To add the third level to the hierarchy, in the Hierarchy dropdown list, select Product.
Tip: Don’t forget to click Apply Level Changes—it’s a common mistake to overlook this step.
9. To organize columns into a display folder, in the Fields pane, first select the Background Color
Format column.
10. While pressing the Ctrl key, select the Font Color Format column.
11. In the Properties pane, in the Display Folder box, enter Formatting.
12. In the Fields pane, notice that the two columns are now inside a folder.
Display folders are a great way to declutter tables—especially those that contain lots of fields.
1. In the Region table, create a hierarchy named Regions, with the following three levels:
o Group
o Country
o Region
2. Select the Country column (not the Country level).
3. In the Properties pane, expand the Advanced section, and then in the Data Category dropdown list,
select Country/Region.
Data categorization can provide hints to the report designer. In this case, categorizing the column as
country or region provides more accurate information when rendering a map visualization.
1. In the Reseller table, create a hierarchy named Resellers, with the following two levels:
o Business Type
o Reseller
2. Create a second hierarchy named Geography, with the following four levels:
o Country-Region
o State-Province
o City
o Reseller
2. In the Properties pane, in the Description box, enter: Based on standard cost
Descriptions can be applied to tables, columns, hierarchies, or measures. In the Fields pane,
description text is revealed in a tooltip when a report author hovers their cursor over the field.
6. In the Properties pane, from inside the Formatting section, slide the Decimal Places property to 2.
7. In the Advanced group (you may need to scroll down to locate it), in the Summarize By dropdown list,
select Average.
By default, numeric columns will summarize by summing values together. This default behavior is not
suitable for a column like Unit Price, which represents a rate. Setting the default summarization to
average will produce a useful and accurate result.
1. While pressing the Ctrl key, select the following 13 columns (spanning multiple tables):
The columns were hidden because they are either used by relationships or will be used in row-level
security configuration or calculation logic.
You will define row-level security in the next lab using the UPN column. You will use the
SalesOrderNumber in a calculation in Lab 06A.
o Product | Standard Cost
o Sales | Cost
o Sales | Sales
4. In the Properties pane, from inside the Formatting section, set the Decimal Places property to 0
(zero).
Activity 3:
Review the Model Interface
Solution:
In this exercise, you will switch to Report view, and review the model interface.
In this task, you will switch to Report view, and review the model interface.
o Columns, hierarchies and their levels are fields, which can be used to configure report visuals
o Only fields relevant to report authoring are visible
o The SalespersonRegion table is not visible, because all of its fields are hidden
o Spatial fields in the Region and Reseller table are adorned with a spatial icon
o Fields adorned with the sigma symbol (Ʃ) will summarize, by default
o A tooltip appears when hovering the cursor over the Sales | Cost field
3. Expand the Sales | OrderDate field, and then notice that it reveals a date hierarchy.
The Targets | TargetMonth presents the same hierarchy. These hierarchies were not created by you.
They are created automatically. There is a problem, however. The Adventure Works financial year
commences on July 1 of each year. But, the date hierarchy year commences on January 1 of each year.
You will now turn this automatic behavior off. In Lab 06A, you will use DAX to create a date table, and
configure it to define the Adventure Works’ calendar.
4. To turn off auto date/time, click the File ribbon tab to open the backstage view.
5. At the left, select Options and Settings, and then select Options.
6. In the Options window, at the left, in the Current File group, select Data Load.
8. Click OK.
9. In the Fields pane, notice that the date hierarchies are no longer available.
Activity 4:
Create Quick Measures
Solution:
In this exercise, you will create two quick measures.
In this task, you will create two quick measures to calculate profit and profit margin.
1. In the Fields pane, right-click the Sales table, and then select New Quick Measure.
2. In the Quick Measures window, in the Calculation dropdown list, from inside the Mathematical
Operations group, select Subtraction.
3. In the Fields pane, expand the Sales table.
6. Click OK.
A quick measure creates the calculation for you. Quick measures are easy and fast to create for simple
and common calculations. In the Fields pane, inside the Sales table, notice the new measure.
Tip: To rename a field, you can also double-click it, or select it and press F2.
9. In the Sales table, add a second quick measure, based on the following requirements:
10. Ensure the Profit Margin measure is selected, and then on the Measure Tools contextual ribbon, set
the format to Percentage, with two decimal places.
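For reference, the DAX produced by these two quick measures is equivalent to the following sketch (the actual generated code uses variables and may be formatted differently):

```dax
// Profit: subtraction quick measure over the Sales and Cost columns
Profit = SUM(Sales[Sales]) - SUM(Sales[Cost])

// Profit Margin: division quick measure; DIVIDE handles division by zero safely
Profit Margin = DIVIDE([Profit], SUM(Sales[Sales]))
```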
11. To test the two measures, first select the table visual on the report page.
13. Click and drag the right guide to widen the table visual.
14. Verify that the measures produce reasonable results that are correctly formatted.
Activity 5:
Advanced Data Modeling in Power BI Desktop
Solution:
In this activity, you will create a many-to-many relationship between the Salesperson table and the Sales table.
You will also enforce row-level security to ensure that a salesperson can only analyze sales data for their
assigned region(s).
In this task, you will create a many-to-many relationship between the Salesperson table and the Sales table.
1. In Power BI Desktop, in Report view, in the Fields pane, check the following two fields to create a table
visual:
The table displays sales made by each salesperson. However, there is another relationship between
salespeople and sales. Some salespeople belong to one, two, or possibly more sales regions. In addition,
sales regions can have multiple salespeople assigned to them.
4. Use the drag-and-drop technique to create the following two model relationships:
o Salesperson | EmployeeKey to SalespersonRegion | EmployeeKey
o Region | SalesTerritoryKey to SalespersonRegion | SalesTerritoryKey
5. Switch to Report view, and then notice that the visual has not updated—the sales result for Michael
Blythe has not changed.
6. Switch back to Model view, and then follow the relationship filter directions (arrowhead) from the
Salesperson table.
Consider this: the Salesperson table filters the Sales table. It also filters the
SalespersonRegion table, but it does not continue to propagate to the Region table (the
arrowhead is pointing the wrong way).
7. To edit the relationship between the Region and SalespersonRegion tables, double-click the
relationship.
8. In the Edit Relationship window, in the Cross Filter Direction dropdown list, select Both.
This setting will ensure that bi-directional filtering is applied when row-level security is being enforced.
You will configure a security role in the next exercise.
10. Click OK.
12. Switch to Report view, and then notice that the sales values have not changed.
The issue now relates to the fact that there are two possible filter propagation paths between the
Salesperson and Sales tables. This ambiguity is internally resolved, based on a “least number of tables”
assessment. To be clear, you should not design models with this type of ambiguity—it will be addressed
in part in this lab, and by the next lab.
14. To force filter propagation via the bridging table, double-click the relationship between the Salesperson
and Sales tables.
15. In the Edit Relationship window, uncheck the Make This Relationship Active checkbox.
The filter propagation is now forced to take the only active path.
17. In the diagram, notice that the inactive relationship is represented by a dashed line.
18. Switch to Report view, and then notice that the sales for Michael Blythe is now nearly $22 million.
19. Notice also, that the sales for each salesperson—if added—would exceed the total.
While the many-to-many relationship is now working, it’s now not possible to analyze sales made by a
salesperson (the relationship is inactive). In the next lab, you’ll introduce a calculated table that will
represent salesperson for performance analysis (of their regions).
20. Switch to Model view, and then in the diagram, select the Salesperson table.
21. In the Properties pane, in the Name box, replace the text with Salesperson (Performance).
The renamed table now reflects its purpose: it is used to report and analyze the performance of
salespeople based on the sales of their assigned sales regions.
Task 2: Relate the Targets table
1. Create a relationship between the Salesperson (Performance) | EmployeeID column and the Targets |
EmployeeID column.
2. In Report view, add the Targets | Target field to the table visual.
3. It’s now possible to visualize sales and targets, but take care, for two reasons. First, there is no filter
on a time period, and so targets also include future target values. Second, targets are not additive, and
so the total should not be displayed. They can either be disabled by using a visual formatting property
or removed by using calculation logic. You’ll write a target measure that will return BLANK when more
than one salesperson is filtered.
In this task, you will enforce row-level security to ensure a salesperson can only ever see sales made in their
assigned region(s).
2. In the Fields pane, select the Salesperson (Performance) table.
3. Review the data, noticing that Michael Blythe (EmployeeKey 281) has been assigned your Power BI
account (UPN column).
Recall that Michael Blythe is assigned to three sales regions: US Northeast, US Central, and US
Southeast.
5. On the Modeling ribbon tab, from inside the Security group, click Manage Roles.
7. In the box, replace the selected text with the name of the role: Salespeople, and then press Enter.
8. To assign a filter, for the Salesperson (Performance) table, click the ellipsis (…) character, and then
select Add Filter | [UPN].
9. In the Table Filter DAX Expression box, modify the expression by replacing “Value” with
USERNAME().
USERNAME() is a Data Analysis Expressions (DAX) function that retrieves the authenticated user. This
means that the Salesperson (Performance) table will filter by the User Principal Name (UPN) of the
user querying the model.
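The completed table filter is a single Boolean DAX expression:

```dax
// Each salesperson sees only rows where their own UPN matches
[UPN] = USERNAME()
```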
11. To test the security role, on the Modeling ribbon tab, from inside the Security group, click View As.
12. In the View as Roles window, check the Other User item, and then in the corresponding box, enter
your account name.
Tip: You can copy it from the MySettings.txt file.
This configuration results in using the Salespeople role and impersonating the user with your account
name.
15. Notice the yellow banner above the report page, describing the test security context.
16. In the table visual, notice that only the salesperson Michael Blythe is listed.
17. To stop testing, at the right of the yellow banner, click Stop Viewing.
7) Graded Lab Tasks
Note: The instructor can design graded lab activities according to the level of difficulty and complexity
of the solved lab activities. The lab tasks assigned by the instructor should be evaluated in the same
lab.
Lab Task 1
Import a dataset from Excel into Power BI Desktop. Cleanse the data by removing duplicates and null values.
Create relationships between different tables in the dataset. Add calculated columns using Power BI's query
editor. Load the prepared data into Power BI Desktop's data model.
Lab 13
Using DAX in Power BI Desktop
Objective:
The objective of this lab is to learn how to create calculated tables, calculated columns, and simple measures
using Data Analysis Expressions (DAX).
Activity Outcomes:
The activities provide hands-on practice with the following topics
• Create calculated tables
• Create calculated columns
• Create measures
Instructor Note:
As pre-lab activity, read Chapter xx from the textbook “Business Intelligence Guidebook: From Data
Integration to Analytics, Rick Sherman, Morgan Kaufmann Press, 2014”.
1) Useful Concepts
What is DAX?
DAX or Data Analysis Expressions drive all the calculations you can perform in Power BI. DAX formulas are
versatile, dynamic, and very powerful – they allow you to create new fields and even new tables in your model.
While DAX is most commonly associated with Power BI, you can also find DAX formulas in Power Pivot in
Excel and SQL Server Analysis Services (SSAS).
DAX formulas are made up of three core components, and this section covers each of them:
• Syntax – Proper DAX syntax is made up of a variety of elements, some of which are common to all
formulas.
• Functions – DAX functions are predefined formulas that take some parameters and perform a specific
calculation.
• Context – DAX uses context to determine which rows should be used to perform a calculation.
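As a minimal illustration of these three components, consider a one-line measure definition (the Total Sales name is illustrative; Sales[Sales] is a column used later in this lab). The name, equals symbol, and table[column] reference are syntax; SUM() is a function; and which rows are aggregated is determined by the filter context of the visual the measure is placed in:
DAX
Total Sales = SUM(Sales[Sales])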
Activity # Marks Complexity CLO
1 20 Low CLO-6
2 20 Low CLO-6
3 20 Medium CLO-6
4 20 Medium CLO-6
5 20 Medium
Activity 1:
Create Calculated Tables
Solution:
In this exercise, you will create two calculated tables. The first will be the Salesperson table, to allow a direct
relationship between it and the Sales table. The second will be the Date table.
In this task, you will create the Salesperson table (direct relationship to Sales).
1. In Power BI Desktop, in Report view, on the Modeling ribbon, from inside the Calculations
group, click New Table.
2. In the formula bar (which opens directly beneath the ribbon when creating or editing
calculations), type Salesperson =, press Shift+Enter, type 'Salesperson (Performance)', and
then press Enter.
For your convenience, all DAX definitions in this lab can be copied from the
D:\DA100\Lab06A\Assets\Snippets.txt file.
A calculated table is created by first entering the table name, followed by the equals symbol (=),
followed by a DAX formula that returns a table. The table name cannot already exist in the data model.
The formula bar supports entering a valid DAX formula. It includes features like autocomplete,
Intellisense and color-coding, enabling you to quickly and accurately enter the formula.
This table definition creates a copy of the Salesperson (Performance) table. It copies the data only;
properties like visibility and formatting are not copied.
Tip: You are encouraged to enter “white space” (i.e. carriage returns and tabs) to lay out formulas in
an intuitive and easy-to-read format, especially when formulas are long and complex. To enter a
carriage return, press Shift+Enter. “White space” is optional.
3. In the Fields pane, notice that the table icon is a shade of blue (denoting a calculated table).
Calculated tables are defined by using a DAX formula which returns a table. It is important to
understand that calculated tables increase the size of the data model because they materialize and store
values. They are recomputed whenever formula dependencies are refreshed, as will be the case in this
data model when new (future) date values are loaded into tables.
Unlike Power Query-sourced tables, calculated tables cannot be used to load data from external data
sources. They can only transform data based on what has already been loaded into the data model.
5. Notice that the Salesperson table is available (take care, it might be hidden from view—scroll
horizontally to locate it).
6. Create a relationship from the Salesperson | EmployeeKey column to the Sales | EmployeeKey
column.
7. Right-click the inactive relationship between the Salesperson (Performance) and Sales tables, and then
select Delete.
9. In the Salesperson table, multi-select the following columns, and then hide them:
o EmployeeID
o EmployeeKey
o UPN
11. In the Properties pane, in the Description box, enter: Salesperson related to sale(s)
Recall that descriptions appear as tooltips in the Fields pane when the user hovers their cursor over a
table or field.
12. For the Salesperson (Performance) table, set the description to: Salesperson related to region(s)
The data model now provides two alternatives when analyzing salespeople. The Salesperson
table allows analyzing sales made by a salesperson, while the Salesperson (Performance)
table allows analyzing sales made in the sales region(s) assigned to the salesperson.
2. On the Home ribbon tab, from inside the Calculations group, click New Table.
DAX
Date =
CALENDARAUTO(6)
The CALENDARAUTO() function returns a single-column table consisting of date values. The “auto”
behavior scans all data model date columns to determine the earliest and latest date values stored in
the data model. It then creates one row for each date within this range, extending the range in either
direction to ensure full years of data are stored.
This function can take a single optional argument which is the last month number of a year. When
omitted, the value is 12, meaning that December is the last month of the year. In this case 6 is entered,
meaning that June is the last month of the year.
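If explicit control over the date range is needed instead of the “auto” behavior, the related CALENDAR() function accepts a start date and an end date. A sketch (the date range shown here is illustrative only, not part of this lab):
DAX
Date = CALENDAR(DATE(2016, 7, 1), DATE(2021, 6, 30))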
If the column does not appear, in the Fields pane, select a different table, and then select the Date table.
The dates shown are formatted using US regional settings (i.e. mm/dd/yyyy).
5. At the bottom-left corner, in the status bar, notice the table statistics, confirming that 1826 rows of data
have been generated, which represents five full years’ data.
In this task, you will add additional columns to enable filtering and grouping by different time periods. You will
also create a calculated column to control the sort order of other columns.
1. On the Table Tools contextual ribbon, from inside the Calculations group, click New Column.
2. In the formula bar, type the following, and then press Enter:
DAX
Year = "FY" & YEAR('Date'[Date]) + IF(MONTH('Date'[Date]) > 6, 1, 0)
A calculated column is created by first entering the column name, followed by the equals symbol (=),
followed by a DAX formula that returns a single-value result. The column name cannot already exist in
the table.
The formula uses the date’s year value but adds one to the year value when the month is after June. This
is how fiscal years at Adventure Works are calculated.
4. Use the snippets file definitions to create the following two calculated columns for the Date table:
o Quarter
o Month
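The snippet formulas themselves live in the snippets file. As a hedged sketch only, a Month column matching the “yyyy MMM” labels seen later in this lab (e.g. 2017 Jul) could be defined with the FORMAT() function; the actual snippet definition may differ:
DAX
Month = FORMAT('Date'[Date], "yyyy MMM")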
5. To validate the calculations, switch to Report view.
6. To create a new report page, at the bottom-left, click the plus icon.
7. To add a matrix visual to the new report page, in the Visualizations pane, select the matrix visual type.
Tip: You can hover the cursor over each icon to reveal a tooltip describing the visual type.
8. In the Fields pane, from inside the Date table, drag the Year field into the Rows well.
9. Drag the Month field into the Rows well, directly beneath the Year field.
10. At the top-right of the matrix visual, click the forked-double arrow icon (which will expand all years
down one level).
11. Notice that the years expand to months, and that the months are sorted alphabetically rather than
chronologically.
By default, text values sort alphabetically, numbers sort from smallest to largest, and dates sort from
earliest to latest.
12. To customize the Month field sort order, switch to Data view.
DAX
MonthKey = (YEAR('Date'[Date]) * 100) + MONTH('Date'[Date])
14. In Data view, verify that the new column contains numeric values (e.g. 201707 for July 2017, etc.).
15. Switch back to Report view.
16. In the Fields pane, ensure that the Month field is selected (when selected, it will have a dark gray
background).
17. On the Column Tools contextual ribbon, from inside the Sort group, click Sort by Column, and then
select MonthKey.
18. In the matrix visual, notice that the months are now chronologically sorted.
Task 4: Complete the Date table
In this task, you will complete the design of the Date table by hiding a column and creating a hierarchy. You
will then create relationships to the Sales and Targets tables.
3. In the Date table, create a hierarchy named Fiscal, with the following three levels:
o Year
o Quarter
o Month
o Sales | OrderDate
o Targets | TargetMonth
In this task, you will mark the Date table as a date table.
3. On the Table Tools contextual ribbon, from inside the Calendars group, click Mark as Date Table,
and then select Mark as Date Table.
4. In the Mark as Date Table window, in the Date Column dropdown list, select Date.
5. Click OK.
Power BI Desktop now understands that this table defines date (time). This is important when relying
on time intelligence calculations.
Note that this design approach for a date table is suitable when you don’t have a date table in your data
source. If you have access to a data warehouse, it would be appropriate to load date data from its date
dimension table rather than “redefining” date logic in your data model.
Activity 2:
Create Measures
Solution:
In this exercise, you will create and format several measures.
In this task, you will create simple measures. Simple measures aggregate a single column or table.
1. In Report view, on Page 2, in the Fields pane, drag the Sales | Unit Price field into the matrix visual.
Recall that in previous lab, you set the Unit Price column to summarize by Average. The result you see
in the matrix visual is the monthly average unit price.
2. In the visual fields pane (located beneath the Visualizations pane), in the Values well, notice that Unit
Price is listed.
3. Click the down-arrow for Unit Price, and then notice the available menu options.
Visible numeric columns allow report authors to decide at report design time how a column will
summarize (or not). This can result in inappropriate reporting. Some data modelers do not like leaving
things to chance, however, and choose to hide these columns and instead expose aggregation logic
defined by measures. This is the approach you will now take in this lab.
4. To create a measure, in the Fields pane, right-click the Sales table, and then select New Measure.
5. In the formula bar, add the following measure definition:
DAX
Avg Price = AVERAGE(Sales[Unit Price])
9. Use the snippets file definitions to create the following five measures for the Sales table:
o Median Price
o Min Price
o Max Price
o Orders
o Order Lines
The DISTINCTCOUNT() function used in the Orders measure will count orders only once (ignoring
duplicates). The COUNTROWS() function used in the Order Lines measure operates over a table. In
this case, the number of orders is calculated by counting the distinct SalesOrderNumber column
values, while the number of order lines is simply the number of table rows (each row is a line of an
order).
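Based on that description, the two count measures could be sketched as follows; the snippets file remains the authoritative source for the definitions:
DAX
Orders = DISTINCTCOUNT(Sales[SalesOrderNumber])
Order Lines = COUNTROWS(Sales)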
10. Switch to Model view, and then multi-select the four price measures: Avg Price, Max Price, Median
Price, and Min Price.
The Unit Price column is now no longer available to report authors; they must use the measures you’ve
added to the model. This design approach ensures that report authors won’t inappropriately aggregate
prices, for example, by summing them.
13. Multi-select the Orders and Order Lines measures, and configure the following requirements:
14. In Report view, in the Values well of the matrix visual, for the Unit Price field, click X to remove it.
15. Increase the size of the matrix visual to fill the page width and height.
16. Add the following five new measures to the matrix visual:
o Median Price
o Min Price
o Max Price
o Orders
o Order Lines
17. Verify that the results look sensible and are correctly formatted.
Task 2: Create additional measures
In this task, you will create additional measures that use more complex expressions.
2. Review the table visual, noticing the total for the Target column.
Summing the target values together doesn’t make sense because salespeople targets are set for
each salesperson based on their sales region assignment(s). A target value should only be shown
when a single salesperson is filtered. You will implement a measure now to do just that.
You’re about to create a measure named Target. It’s not possible to have a column and measure
in the same table, with the same name.
DAX
Target =
IF(
HASONEVALUE('Salesperson (Performance)'[Salesperson]),
SUM(Targets[TargetAmount])
)
The HASONEVALUE() function tests whether a single value in the Salesperson column is filtered.
When true, the expression returns the sum of target amounts (for just that salesperson). When false,
BLANK is returned.
9. Notice that the Target column total is now BLANK.
10. Use the snippets file definitions to create the following two measures for the Targets table:
o Variance
o Variance Margin
12. Format the Variance Margin measure as percentage with two decimal places.
13. Add the Variance and Variance Margin measures to the table visual.
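The snippet definitions for these two measures are not reproduced in this lab. A plausible sketch, assuming the variance is sales minus target and the margin is the variance relative to target (verify against the snippets file):
DAX
Variance = SUM(Sales[Sales]) - [Target]
Variance Margin = DIVIDE([Variance], [Target])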
While it appears all salespeople are not meeting target, remember that the measures aren’t yet
filtered by a specific time period. You’ll produce sales performance reports that filter by a user-
selected time period in Lab 07A.
15. At the top-right corner of the Fields pane, collapse and then expand open the pane.
16. Notice that the Targets table now appears at the top of the list.
Tables that comprise only visible measures are automatically listed at the top of the list.
Activity 3:
Create measures with DAX expressions involving filter context manipulation.
Solution:
• Use the CALCULATE() function to manipulate filter context
• Use Time Intelligence functions
Work with Filter Context
In this exercise, you will create measures with DAX expressions involving filter context manipulation.
In this task, you will create a matrix visual to support testing your new measures.
6. To expand the entire hierarchy, at the top-right of the matrix visual, click the forked-double arrow
icon twice.
Recall that the Regions hierarchy has the levels Group, Country, and Region.
7. To format the visual, beneath the Visualizations pane, select the Format pane.
10. Verify that the matrix visual has four column headers.
At Adventure Works, the sales regions are organized into groups, countries, and regions. All
countries—except the United States—have just one region, which is named after the country. As
the United States is such a large sales territory, it is divided into five regions.
You’ll create several measures in this exercise, and then test them by adding them to the matrix
visual.
In this task, you will create several measures with DAX expressions that use the CALCULATE() function
to manipulate filter context.
DAX
Sales All Region =
CALCULATE(
SUM(Sales[Sales]),
REMOVEFILTERS(Region)
)
The CALCULATE() function is a powerful function used to manipulate the filter context. The first
argument takes an expression or a measure (a measure is just a named expression). Subsequent
arguments allow modifying the filter context.
The REMOVEFILTERS() function removes active filters. It can take either no arguments, or a
table, a column, or multiple columns as its argument.
In this formula, the measure evaluates the sum of the Sales column in a modified filter context,
which removes any filters applied to the Region table.
3. Notice that the Sales All Region measure computes the total of all region sales for each region,
country (subtotal) and group (subtotal).
This measure is yet to deliver a useful result. When the sales for a group, country, or region is
divided by this value it produces a useful ratio known as “percent of grand total”.
4. In the Fields pane, ensure that the Sales All Region measure is selected, and then in the formula
bar, replace the measure name and formula with the following formula:
Tip: To replace the existing formula, first copy the snippet. Then, click inside the formula bar and
press Ctrl+A to select all text. Then, press Ctrl+V to paste the snippet to overwrite the selected
text. Then press Enter.
DAX
Sales % All Region =
DIVIDE(
SUM(Sales[Sales]),
CALCULATE(
SUM(Sales[Sales]),
REMOVEFILTERS(Region)
)
)
The measure has been renamed to accurately reflect the updated formula. The DIVIDE()
function divides the Sales measure (not modified by filter context) by the Sales measure in a
modified context which removes any filters applied to the Region table.
5. In the matrix visual, notice that the measure has been renamed and that different values now
appear for each group, country, and region.
6. Format the Sales % All Region measure as a percentage with two decimal places.
7. In the matrix visual, review the Sales % All Region measure values.
8. Add another measure to the Sales table, based on the following expression, and format as a
percentage:
DAX
Sales % Country =
DIVIDE(
SUM(Sales[Sales]),
CALCULATE(
SUM(Sales[Sales]),
REMOVEFILTERS(Region[Region])
)
)
9. Notice that the Sales % Country measure formula differs slightly from the Sales % All Region
measure formula.
The difference is that the denominator modifies the filter context by removing filters on the Region
column of the Region table, not all columns of the Region table. It means that any filters applied
to the group or country columns are preserved. It achieves a result which represents the sales
as a percentage of country.
11. Notice that only the United States’ regions produce a value which is not 100%.
Recall that only the United States has multiple regions. All other countries have a single region
which explains why they are all 100%.
12. To improve the readability of this measure in the visual, overwrite the Sales % Country measure with
this improved formula:
DAX
Sales % Country =
IF(
ISINSCOPE(Region[Region]),
DIVIDE(
SUM(Sales[Sales]),
CALCULATE(
SUM(Sales[Sales]),
REMOVEFILTERS(Region[Region])
)
)
)
Embedded within the IF() function, the ISINSCOPE() function is used to test whether the region
column is the level in a hierarchy of levels. When true, the DIVIDE() function is evaluated. The
absence of a false part means that blank is returned when the region column is not in scope.
13. Notice that the Sales % Country measure now only returns a value when a region is in scope.
14. Add another measure to the Sales table, based on the following expression, and format as a
percentage:
DAX
Sales % Group =
DIVIDE(
SUM(Sales[Sales]),
CALCULATE(
SUM(Sales[Sales]),
REMOVEFILTERS(
Region[Region],
Region[Country]
)
)
)
To achieve sales as a percentage of group, two filters can be applied to effectively
remove the filters on two columns.
16. To improve the readability of this measure in the visual, overwrite the Sales % Group
measure with this improved formula:
DAX
Sales % Group =
IF(
ISINSCOPE(Region[Region])
|| ISINSCOPE(Region[Country]),
DIVIDE(
SUM(Sales[Sales]),
CALCULATE(
SUM(Sales[Sales]),
REMOVEFILTERS(
Region[Region],
Region[Country]
)
)
)
)
17. Notice that the Sales % Group measure now only returns a value when a region or country is in
scope.
18. In Model view, place the three new measures into a display folder named Ratios.
The measures added to the Sales table have modified filter context to achieve hierarchical
navigation. Notice that the pattern to achieve the calculation of a subtotal requires removing some
columns from the filter context, and to arrive at a grand total, all columns must be removed.
Activity 4:
Work with Time Intelligence
Solution
In this exercise, you will create a sales year-to-date (YTD) measure and sales year-over-year (YoY) growth
measure.
1. In Report view, on Page 2, notice the matrix visual which displays various measures with years
and months grouped on the rows.
2. Add a measure to the Sales table, based on the following expression, and formatted to zero decimal
places:
DAX
Sales YTD =
TOTALYTD(SUM(Sales[Sales]), 'Date'[Date], "6-30")
The TOTALYTD() function evaluates an expression—in this case the sum of the Sales column—
over a given date column. The date column must belong to a date table marked as a date table, as
you did in Lab 06A. The function can also take a third optional argument representing the last date
of a year. The absence of this date means that December 31 is the last date of the year. For
Adventure Works, June is the last month of their year, and so “6-30” is used.
3. Add the Sales field and the Sales YTD measure to the matrix visual.
The TOTALYTD() function performs filter manipulation, specifically time filter manipulation. For
example, to compute YTD sales for September 2017 (the third month of the fiscal year), all filters
on the Date table are removed and replaced with a new filter of dates commencing at the beginning
of the year (July 1, 2017) and extending through to the last date of the in-context date period
(September 30, 2017).
Note that many Time Intelligence functions are available in DAX to support common time filter
manipulations.
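For example, TOTALYTD() is shorthand for a CALCULATE() expression over the DATESYTD() function, which returns the year-to-date set of dates used as a filter. The Sales YTD measure above could equivalently be written as:
DAX
Sales YTD =
CALCULATE(
SUM(Sales[Sales]),
DATESYTD('Date'[Date], "6-30")
)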
1. Add an additional measure to the Sales table, based on the following expression:
DAX
Sales YoY Growth =
VAR SalesPriorYear =
CALCULATE(
SUM(Sales[Sales]),
PARALLELPERIOD(
'Date'[Date],
-12,
MONTH
)
)
RETURN
SalesPriorYear
The Sales YoY Growth measure formula declares a variable. Variables can be useful for
simplifying the formula logic, and more efficient when an expression needs to be evaluated multiple
times within the formula (which will be the case for the YoY growth logic). Variables are declared
by a unique name, and the measure expression must then be output after the RETURN keyword.
The SalesPriorYear variable is assigned an expression which calculates the sum of the Sales
column in a modified context that uses the PARALLELPERIOD() function to shift 12 months back
from each date in filter context.
3. Notice that the new measure returns blank for the first 12 months (there were no sales recorded
before fiscal year 2017).
4. Notice that the Sales YoY Growth measure value for 2017 Jul is the Sales value for 2016 Jul.
Now that the “difficult part” of the formula has been tested, you can overwrite the measure with
the final formula which computes the growth result.
5. To complete the measure, overwrite the Sales YoY Growth measure with this formula, formatting
it as a percentage with two decimal places:
DAX
Sales YoY Growth =
VAR SalesPriorYear =
CALCULATE(
SUM(Sales[Sales]),
PARALLELPERIOD(
'Date'[Date],
-12,
MONTH
)
)
RETURN
DIVIDE(
(SUM(Sales[Sales]) - SalesPriorYear),
SalesPriorYear
)
6. In the formula, in the RETURN clause, notice that the variable is referenced twice.
This means that July 2018 sales ($2,411,559) represent nearly 400% growth over the sales achieved
for the prior year ($489,328).
8. In Model view, place the two new measures into a display folder named Time Intelligence.
DAX includes many Time Intelligence functions to make it easy to implement time filter
manipulations for common business scenarios.
This exercise completes the data model development. In the next exercise, you will publish the
Power BI Desktop file to your workspace, ready for creating a report in the next lab.
In this task, you will publish the Power BI Desktop file to Power BI.
1. Save the Power BI Desktop file.
If you’re not confident you completed this lab successfully, you should publish the Power BI
Desktop file found in the D:\DA100\Lab06B\Solution folder. In this case, close your current Power
BI Desktop file, and then open the solution file. First, perform a data refresh (using the Refresh
command on the ribbon), and then continue with the instructions in this task.
2. To publish the file, on the Home ribbon tab, from inside the Share group, click Publish.
It’s important that you publish it to the workspace you created in Lab 01A, and not “My
workspace”.
4. Click Select.
5. When the file has been successfully published, click Got It.
7. In Microsoft Edge, in the Power BI service, in the Navigation pane (located at the left), review the
contents of your Sales Analysis workspace.
The publication has added a report and a dataset. If you don’t see them, press F5 to reload the
browser, and then expand the workspace again.
The data model has been published to become a dataset. The report—used to test your model
calculations—has been added as a report. This report is not required, so you will now delete it.
8. Hover the cursor over the Sales Analysis report, click the vertical ellipsis (…), and then select
Remove.
In the next lab, you will create a report based on the published dataset.
Lab 14
Designing a Report in Power BI Desktop
Objective:
The objective of this lab is to create a three-page report named Sales Report. You will then publish it to
Power BI, whereupon you will open and interact with the report.
Activity Outcomes:
The activities provide hands-on practice with the following topics:
• Use Power BI Desktop to create a live connection
• Design a report
• Configure visual fields and format properties
Instructor Note:
As pre-lab activity, read Chapter xx from the textbook “”.
1) Useful Concepts
Power BI reports are comprehensive and detailed pages that provide in-depth analysis and insights. They
offer more advanced functionalities compared to dashboards.
Reports enable users to dive deep into data, perform ad-hoc analysis, and explore multiple dimensions.
They provide a comprehensive view of data, allowing users to answer complex business questions. This
is made possible by offering interactive features such as drill-through, filtering, and highlighting. Users can
explore data further by interacting with the visualizations, uncovering deeper insights.
Since so many visuals and other elements are incorporated into reports, it is best to split them
across multiple pages or tabs, each containing different visualizations, tables, and interactive elements.
Users can navigate between pages to explore specific aspects of the data, by using buttons and bookmarks.
Another technique to declutter your reports is to use progressive disclosure: show the bare
necessities as a starting point, but let users uncover more details by clicking buttons that
reveal more detailed visuals. This gives the user a more app-like experience.
Because of their complexities, reports support advanced data modeling techniques. Users can create custom
measures, calculated columns, and use DAX (Data Analysis Expressions) to perform complex calculations,
allowing for performant and efficient displaying of visualizations.
Activity 1:
Create a Report
Solution:
In this exercise, you will create a three-page report named Sales Report.
In this task, you will create a live connection to the Sales Analysis dataset.
1. To open the Power BI Desktop, on the taskbar, click the Microsoft Power BI Desktop shortcut.
3. Click the File ribbon tab to open the backstage view, and then select Save.
6. Click Save.
In this task, you will create a live connection to the Sales Analysis dataset.
1. To create a live connection, on the Home ribbon tab, from inside the Data group, click Get Data,
down-arrow, and then select Power BI Datasets.
2. In the Select a Dataset to Create a Report window, select the Sales Analysis dataset.
3. Click Create.
4. At the bottom-right corner, in the status bar, notice that the live connection has been established.
5. In the Fields pane, notice that the data model tables are listed.
Power BI Desktop can no longer be used to develop the data model; in live connection mode, it’s
only a report authoring tool. It is possible, however, to create measures, but they are measures
that are only available within the report. You won’t add any report-scoped measures in this lab.
Recall that you added the Salespeople role to the model in Lab 05A. Because you’re the owner of
the Power BI dataset, the roles are not enforced. This explains why, in this lab, you can see all
data.
In this task, you will design the first report page. When you’ve completed the design, the page will look
like the following:
1. To rename the page, at the bottom-left, right-click Page 1, and then select Rename.
3. To add an image, on the Insert ribbon tab, from inside the Elements group, click Image.
5. Select the AdventureWorksLogo.jpg file, and then click Open.
6. Drag the image to reposition it at the top-left corner, and also drag the guide markers to resize it.
7. To add a slicer, first de-select the image by clicking an empty area of the report page.
8. In the Fields pane, select the Date | Year field (not the Year level of the hierarchy).
9. Notice that a table of year values has been added to the report page.
10. To convert the visual from a table to a slicer, in the Visualizations pane, select the Slicer.
11. To convert the slicer from a list to a dropdown, at the top-right of the slicer, click the down-arrow,
and then select Dropdown.
12. Resize and reposition the slicer so it sits beneath the image, and so it is the same width as the image.
13. In the Year slicer, select FY2020, and then collapse the dropdown list.
14. De-select the slicer by clicking an empty area of the report page.
15. Create a second slicer, based on the Region | Region field (not the Region level of the hierarchy).
16. Leave the slicer as a list, and then resize and reposition the slicer beneath the Year slicer.
17. To format the slicer, beneath the Visualizations pane, open the Format pane.
20. In the Region slicer, notice that the first item is now Select All.
When selected, this item either selects all, or de-selects all items. It makes it easier for report users
to set the right filters.
21. De-select the slicer by clicking an empty area of the report page.
22. To add a chart to the page, in the Visualizations pane, click the Line and Stacked Column Chart
visual type.
23. Resize and reposition the visual so it sits to the right of the logo, and so it fills the width of the
report page.
24. Drag the following fields into the visual:
o Date | Month
o Sales | Sales
25. In the visual fields pane (not the Fields pane—the visual fields pane is located beneath the
Visualizations pane), notice that the fields are assigned to the Shared Axis and Column Values
wells.
When dragging fields into a visual, they will be added to default wells. For precision, you can drag
fields directly into the wells, as you will do now.
26. From the Fields pane, drag the Sales | Profit Margin field into the Line Values well.
27. Notice that the visual has 11 months only.
The last month of the year, 2020 June, does not have any sales (yet). By default, the visual has
eliminated months with BLANK sales. You will now configure the visual to show all months.
28. In the visual fields pane, in the Shared Axis well, for the Month field, click the down-arrow, and
then select Show Items With No Data.
30. De-select the chart by clicking an empty area of the report page.
31. To add a chart to the page, in the Visualizations pane, click the Map visual type.
32. Resize and reposition the visual so it sits beneath the column/line chart, and so it fills half the width
of the report page.
34. De-select the chart by clicking an empty area of the report page.
35. To add a chart to the page, in the Visualizations pane, click the Stacked Bar Chart visual type.
36. Resize and reposition the visual so it fills the remaining report page space.
39. Expand the Data Colors group, and then set the Default Color property to a suitable color (in
contrast to the column/line chart).
The design of the first page is now complete.
In this task, you will design the second report page. When you’ve completed the design, the page will look
like the following:
When detailed instructions have already been provided in the labs, the lab steps will now provide more
concise instructions. If you need the detailed instructions, you can refer back to other tasks.
3. Add a slicer based on the Region | Region field.
4. Use the Format pane to enable the “Select All” option (in the Selection Controls group).
5. Resize and reposition the slicer so it sits at the left side of the report page, and so it is about half the
page height.
6. Add a matrix visual, and resize and reposition it so it fills the remaining space of the report page
8. Add the following five Sales table fields to the Values well:
o Orders (from the Counts folder)
o Sales
o Cost
o Profit
o Profit Margin
9. In the Filters pane (located at the left of the Visualizations pane), notice the Filter On This Page
well (you may need to scroll down).
10. From the Fields pane, drag the Product | Category field into the Filter On This Page well.
11. Inside the filter card, at the top-right, click the arrow to collapse the card.
Fields added to the Filters pane can achieve the same result as a slicer. One difference is they
don’t take up space on the report page. Another difference is that they can be configured for more
advanced filtering requirements.
12. Add each of the following Product table fields to the Filter On This Page well, collapsing each,
directly beneath the Category card:
o Subcategory
o Product
o Color
13. To collapse the Filters pane, at the top-right of the pane, click the arrow.
In this task, you will design the third and final report page. When you’ve completed the design, the page
will look like the following:
Recall that row-level security was configured to ensure users only ever see data for their sales
regions and targets. When this report is distributed to salespeople, they will only ever see their
sales performance results.
2. To simulate the row-level security filters during report design and testing, add the Salesperson
(Performance) | Salesperson field to the Filters pane, inside the Filters On This Page well.
3. In the filter card, scroll down the list of salespeople, and then check Michael Blythe.
You will be instructed to delete this filter before you distribute the report in an app in Lab 12A.
4. Add a dropdown slicer based on the Date | Year field, and then resize and reposition it so it sits at
the top-left corner of the page.
6. Add a Multi-row Card visual, and then resize and reposition it so it sits to the right of the slicer
and fills the remaining width of the page.
7. Add the following four fields to the visual:
o In the Data Labels group, increase the Text Size property to 28pt
o In the Background group, set the Color to a light gray color
9. Add a Clustered Bar Chart visual, and then resize and reposition it so it sits beneath the multi-
row card visual and fills the remaining height of the page, and half the width of the multi-row card
visual.
10. Add the following fields to the visual wells:
o Axis: Date | Month
o Value: Sales | Sales and Targets | Target
11. To create a copy of the visual, press Ctrl+C, and then press Ctrl+V.
12. Position the copied visual to the right of the original visual.
13. To modify the visualization type, in the Visualizations pane, select Clustered Column Chart.
It’s now possible to see the same data expressed by two different visualization types. This isn’t a
good use of the page layout, but you will improve it in Lab 09A by superimposing the visuals. By
adding buttons to the page, you will allow the report user to determine which of the two visuals is
visible.
3. On the Home ribbon tab, from inside the Share group, click Publish.
4. Publish the report to your Sales Analysis workspace.
In the next exercise, you will explore the report in the Power BI service.
Activity 2:
Explore the Report
Solution:
In this exercise, you will explore the Sales Report in the Power BI service.
In this task, you will explore the Sales Report in the Power BI service.
1. In Edge, in the Power BI service, in the Navigation pane, review the contents of your Sales
Analysis workspace, and then click the Sales Report report.
The report publication has added a report to your workspace. If you don’t see it, press F5 to
reload the browser, and then expand the workspace again.
2. In the Regions slicer, while pressing the Ctrl key, select multiple regions.
3. In the column/line chart, select any month column to cross filter the page.
5. Notice that the bar chart is filtered and highlighted, with the bold portion of the bars representing
the filtered months.
6. Hover the cursor over the visual, and then at the top-right, click the filter icon.
The filter icon allows you to understand all filters that are applied to the visual, including slicers
and cross filters from other visuals.
7. Hover the cursor over a bar, and then notice the tooltip information.
8. To undo the cross filter, in the column/line chart, click an empty area of the visual.
9. Hover the cursor over the map visual, and then at the top-right, click the In Focus icon.
10. Hover the cursor over different segments of the pie charts to reveal tooltips.
11. To return to the report page, at the top-left, click Back to Report.
12. Hover the cursor over the map visual again, and then click the ellipsis (…), and notice the menu
options.
13. Try out each of the options.
14. At the left, in the Pages pane, select the Profit page.
15. Notice that the Region slicer has a different selection to the Region slicer on the Overview page.
The slicers are not synchronized. In the next lab, you will modify the report design to ensure they
sync between pages.
16. In the Filters pane (located at the right), expand a filter card, and apply some filters. The Filters
pane allows you to define more filters than could possibly fit on a page as slicers.
17. In the matrix visual, use the plus (+) button to expand into the Fiscal hierarchy.
18. Select the My Performance page.
19. At the top-right on the menu bar, click View, and then select Full Screen.
20. Interact with the page by modifying the slicer, and cross filtering the page.
21. At the bottom-left, notice the commands to change page, navigate backwards or forwards between
pages, or to exit full screen mode.
23. To return to the workspace, in the breadcrumb trail, click your workspace name.
Activity 3:
Configure Sync Slicers
Solution:
In this exercise, you will sync the report page slicers.
In this task, you will sync the Year and Region slicers.
You will continue the development of the report that you commenced designing in Lab 08A.
1. In Power BI Desktop, in the Sales Report file, on the Overview page, set the Year slicer to
FY2018.
2. Go to the My Performance page, and then notice that the Year slicer is a different value.
When slicers aren’t synced, the mismatch can misrepresent data and frustrate report users. You’ll
now sync the report slicers.
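Conceptually, syncing slicers means the checked pages share one selection; a toy model (the page names are from this lab, but the class itself is hypothetical):

```python
class SlicerGroup:
    """Toy model of Sync Slicers: pages checked in the sync column
    share a single selection; pages outside the group don't see it."""
    def __init__(self, synced_pages):
        self.synced_pages = set(synced_pages)
        self.selection = None

    def select(self, page, value):
        # Selecting on any synced page updates the shared value
        if page in self.synced_pages:
            self.selection = value

    def value_on(self, page):
        return self.selection if page in self.synced_pages else None
```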
4. On the View ribbon tab, from inside the Show Panes group, click Sync Slicers.
5. In the Sync Slicers pane (at the left of the Visualizations pane), in the second column (which
represents syncing), check the checkboxes for the Overview and My Performance pages.
8. Test the sync slicers by selecting different filter options, and then verifying that the synced slicers
filter by the same options.
9. To close the Sync Slicer page, click the X located at the top-right of the pane.
Task 1: Create a drill through page
In this task, you will create a new page and configure it as a drill through page.
2. Right-click the Product Details page tab, and then select Hide Page.
Report users won’t be able to go to the drill through page directly. They’ll need to access it from
visuals on other pages. You’ll learn how to drill through to the page in the final exercise of this
lab.
3. Beneath the Visualizations pane, in the Drill Through section, add the Product | Category field
to the Add Drill-Through Fields Here box.
4. To test the drill through page, in the drill through filter card, select Bikes.
The button was added automatically. It allows report users to navigate back to the page from which
they drilled through.
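Conceptually, drilling through carries the selected value to the hidden page as a filter; a minimal sketch with illustrative field names:

```python
def drill_through(rows, field, value):
    # Drill through: the value selected on the source visual
    # (e.g. Category = "Bikes") filters the target page's rows.
    return [row for row in rows if row[field] == value]
```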
6. Add a Card visual to the page, and then resize and reposition it so it sits to the right of the button
and fills the remaining width of the page.
7. Drag the Product | Category field into the card visual.
8. Configure the format options for the visual, and then turn the Category Label property to Off.
10. Add a Table visual to the page, and then resize and reposition it so it sits beneath the card visual
and fills the remaining space on the page.
11. Add the following fields to the visual:
o Product | Subcategory
o Product | Color
o Sales | Quantity
o Sales | Sales
o Sales | Profit Margin
12. Configure the format options for the visual, and in the Grid section, set the Text Size property to
20pt.
The design of the drill through page is almost complete. In the next exercise, you’ll define
conditional formatting.
Activity 4:
Add Conditional Formatting
Solution:
In this exercise, you will enhance the drill through page with conditional formatting. When you’ve
completed the design, the page will look like the following:
Task 1: Add conditional formatting
In this task, you will enhance the drill through page with conditional formatting.
2. In the visual fields pane, for the Profit Margin field, click the down-arrow, and then select
Conditional Formatting | Icons.
3. In the Icons – Profit Margin window, in the Icon Layout dropdown list, select Right of Data.
4. To delete the middle rule, at the left of the yellow triangle, click X.
5. Configure the first rule (red diamond) as follows:
The rules are as follows: display a red diamond if the profit margin value is less than zero;
otherwise, if the value is greater than or equal to zero, display a green circle.
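The rule logic can be expressed as a tiny function; this is only an illustration of the two rules, not how Power BI evaluates them internally:

```python
def margin_icon(profit_margin):
    # Rule 1: red diamond when the margin is below zero
    # Rule 2: green circle when it is zero or above
    return "red diamond" if profit_margin < 0 else "green circle"
```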
7. Click OK.
8. In the table visual, verify that the correct icons are displayed.
9. Configure background color conditional formatting for the Color field.
10. In the Background Color – Color window, in the Format By dropdown list, select Field Value.
11. In the Based on Field dropdown list, select Product | Formatting | Background Color Format.
12. Click OK.
13. Repeat the previous steps to configure font color conditional formatting for the Color field, using
the Product | Formatting | Font Color Format field
Task 1: Add bookmarks
In this task, you will add two bookmarks, one to display each of the monthly sales/targets visuals.
2. On the View ribbon tab, from inside the Show Panes group, click Bookmarks.
3. On the View ribbon tab, from inside the Show Panes group, click Selection.
4. In the Selection pane, beside one of the Sales and Target by Month items, to hide the visual, click
the eye icon.
5. In the Bookmarks pane, click Add.
7. If the visible chart is the bar chart, rename the bookmark as Bar Chart ON, otherwise rename the
bookmark as Column Chart ON.
8. In the Selection pane, toggle the visibility of the two Sales and Target by Month items.
In other words, make the visible visual hidden, and make the hidden visual visible.
9. Create a second bookmark, and name it appropriately (either Column Chart ON or Bar Chart
ON).
10. In the Selection pane, to make both visuals visible, simply show the hidden visual.
11. Resize and reposition both visuals so they fill the page beneath the multi-card visual, and
completely overlap one another.
Tip: To select the visual that is covered up, select it in the Selection pane.
12. In the Bookmarks pane, select each of the bookmarks, and notice that only one of the visuals is
visible.
The next stage of design is to add two buttons to the page, which will allow the report user to select
the bookmarks.
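A bookmark is essentially a saved visibility state that a button can re-apply; a minimal sketch (the dictionary keys are illustrative):

```python
def apply_bookmark(current_visibility, bookmark):
    # Applying a bookmark overrides the current visibility state
    # with the state the bookmark captured.
    state = dict(current_visibility)
    state.update(bookmark)
    return state

# The two bookmarks created in this task, as captured states:
bar_chart_on = {"bar chart": True, "column chart": False}
column_chart_on = {"bar chart": False, "column chart": True}
```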
In this task, you will add two buttons, and assign bookmark actions to each.
1. On the Insert ribbon, from inside the Elements group, click Button, and then select Blank.
2. Reposition the button directly beneath the Year slicer.
3. Select the button, and then in the Visualizations pane, turn the Button Text property to On.
4. Expand the Button Text section, and then in the Button Text box, enter Bar Chart.
6. Turn the Action property to On (located near the bottom of the list).
7. Expand the Action section, and then set the Type dropdown list to Bookmark.
8. In the Bookmark dropdown list, select Bar Chart ON.
9. Create a copy of the button by using copy and paste, and then configure the new button as follows:
o Set the Button Text property to Column Chart
o In the Action section, set the Bookmark dropdown list to Column Chart ON
In the next exercise, you will explore the report in the Power BI service.
Explore the Report
In this exercise, you will explore the Sales Report in the Power BI service.
In this task, you will explore the Sales Report in the Power BI service.
1. In Edge, in the Power BI service, open the Sales Report report.
2. To test the drill through report, in the Quantity by Category visual, right-click the Clothing bar,
and then select Drill Through | Product Details.
4. To return to the source page, at the top-left corner, click the arrow button.
6. Click each of the buttons, and then notice that a different visual is displayed.
Finish up
1. To return to the workspace, in the breadcrumb trail, click your workspace name.
3. In Power BI Desktop, go to the My Performance page, and in the Fields pane, remove the
Salesperson filter card.
5. Save the Power BI Desktop file, and then republish to the Sales Analysis workspace.
Lab 15
Creating a Power BI Dashboard and Data Analysis
Objective:
The objective of this lab is to create dashboards in Power BI.
Activity Outcomes:
The activities provide hands-on practice with the following topics
Instructor Note:
As pre-lab activity, read Chapter xx from the textbook “Business Intelligence Guidebook: From Data
Integration to Analytics, Rick Sherman, Morgan Kaufmann Press, 2014”.
1) Useful Concepts
Dashboards, in the context of Power BI, are visual displays that provide a consolidated view of data. They
allow users to monitor key metrics, track performance, and gain high-level insights at a glance.
In general, dashboards are designed to display data in real-time or near-real-time. They can connect to
various data sources, including databases, cloud services, and streaming data, providing up-to-date
information.
While different visualizations can be used in dashboards, they focus mainly on charts, graphs, gauges, and
cards. These visual elements represent key performance indicators (KPIs) and provide a quick overview
of business metrics. Therefore, dashboards are typically limited to a single page, allowing users to see
multiple visualizations at once. This simplicity helps users quickly grasp the overall performance of their
business.
Publishing a dashboard to Power BI Service even allows developers to fix certain filters or visual elements
for a set of end users.
Activity 1:
Create a Dashboard
Solution:
In this exercise, you will create the Sales Monitoring dashboard. The completed dashboard will look like
the following:
Task 1: Create a dashboard
2. To create a dashboard and pin the logo image, hover the cursor over the Adventure Works logo.
4. In the Pin to Dashboard window, in the Dashboard Name box, enter Sales Monitoring.
5. Click Pin.
When pinning visuals to a dashboard, they will use the current filter context. Once pinned, the filter
context cannot be changed. For time-based filters, it’s a better idea to use a relative date slicer
(or Q&A with a relative time-based question).
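A relative date slicer keeps a pinned tile current because the filter is a rolling window rather than a set of fixed dates; a minimal sketch:

```python
from datetime import date, timedelta

def relative_date_window(today, last_n_days):
    # A relative date filter selects a window that ends "today",
    # so it moves forward as time passes instead of staying fixed.
    return today - timedelta(days=last_n_days), today
```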
8. Pin the Sales and Profit Margin by Month (column/line) visual to the Sales Monitoring
dashboard.
9. Open the Navigation pane, and then open the Sales Monitoring dashboard.
10. Notice that the dashboard has two tiles.
11. To resize the logo tile, drag the bottom-right corner, and resize the tile to become one unit wide,
and two units high.
Tile sizes are constrained into a rectangular shape. It’s only possible to resize into multiples of the
rectangular shape.
12. To add a tile based on a question, at the top-left of the dashboard, click Ask a Question About
Your Data.
13. You can use the Q&A feature to ask a question, and Power BI will respond with a visual.
14. Click any one of the suggested questions beneath the Q&A box, in gray boxes.
Recall that you added the Sales YTD measure in Lab 06B. This measure is a time intelligence
expression, and it requires a filter on the Date table to produce a result.
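The effect of a year-to-date measure under a Date-table filter can be sketched in Python; this mirrors the idea behind DAX's TOTALYTD, not its implementation:

```python
from datetime import date

def sales_ytd(rows, as_of):
    # Year-to-date: sum sales from 1 January of as_of's year
    # through as_of. Without a date filter there is no as_of,
    # hence no result.
    return sum(r["sales"] for r in rows
               if r["date"].year == as_of.year and r["date"] <= as_of)
```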
21. To pin the response to the dashboard, at the top-right corner, click Pin Visual.
22. When prompted to pin the tile to the dashboard, click Pin.
There’s a possible bug that will only allow you to pin to a new dashboard. It’s because your Power
BI session has reverted to your “My Workspace”. If this happens, do not pin to a new dashboard.
Return to your Sales Analysis workspace, open the dashboard again, and recreate the Q&A
question.
23. To return to the dashboard, at the top-left corner, click Exit Q&A.
1. Hover the cursor over the Sales YTD tile, and then at the top-right of the tile, click the ellipsis, and
then select Edit Details.
2. In the Tile Details pane (located at the right), in the Subtitle box, enter FY2020.
4. Notice that the Sales YTD tile displays a subtitle.
5. Edit the tile details for the Sales, Profit Margin tile.
6. In the Tile Details pane, in the Functionality section, check Display Last Refresh Time.
7. Click Apply.
8. Notice that the tile describes the last refresh time (which you did when refreshing the data model
in Power BI Desktop).
Later in this lab, you’ll simulate a data refresh, and notice that the refresh time updates.
1. Hover the cursor over the Sales YTD tile, click the ellipsis, and then select Manage Alerts.
2. In the Manage Alerts pane (located at the right), click Add Alert Rule.
3. In the Threshold box, replace the value with 35000000 (35 million).
This configuration will ensure you’re notified whenever the tile updates to a value above 35 million.
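The alert rule itself is a simple threshold comparison; a sketch:

```python
def tile_alert_triggered(tile_value, threshold=35_000_000):
    # The alert fires when a data refresh pushes the tile's value
    # above the configured threshold.
    return tile_value > threshold
```

With the $37M value seen later in this lab, `tile_alert_triggered(37_000_000)` returns True.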
In the next exercise, you’ll refresh the dataset. Typically, this should be done by using scheduled
refresh, and Power BI could use a gateway to connect to the SQL Server database. However, due
to constraints in the classroom setup, there is no gateway. So, you’ll open Power BI Desktop,
perform a manual data refresh, and then upload the file.
In this task, you will run a PowerShell script to update data in the
AdventureWorksDW2020 database.
1. In File Explorer, inside the D:\DA100\Setup folder, right-click the UpdateDatabase2-
AddSales.ps1 file, and then select Run with PowerShell.
2. When prompted to press any key to continue, press Enter again.
3. When prompted, enter the account name of your classroom partner, and then press Enter.
You only need to enter their account name (all characters before the @ symbol). Choose somebody
sitting near you—you will work together in pairs to complete Lab 12A, which covers sharing Power
BI content.
Their account name is added so you can test the row-level security. Your partner is now Pamela
Ansam-Wolfe, whose sales performance is measured by the sales of two sales territory regions: US
Northwest and US Southwest.
In this task, you will open the Sales Analysis Power BI Desktop file, perform a data refresh, and then upload
the file to your Sales Analysis workspace.
1. Open your Sales Analysis Power BI Desktop file, stored in the D:\DA100\MySolution folder.
When the file was published in Lab 06B, if you weren’t confident you completed the lab successfully
you were advised to upload the solution file instead. If you uploaded the solution file, be sure
to open it now. It’s located in the D:\DA100\Lab06B\Solution folder.
2. On the Home ribbon, from inside the Queries group, click Refresh.
The dataset in the Power BI service now has June 2020 sales data.
7. In Edge, in the Power BI service, in your Sales Analysis workspace, notice that the Sales Analysis
report was also published.
This report was used to test the model as you developed it in Lab 05A and Lab 06A.
In this task, you will review the dashboard to notice updated sales, and that the alert was triggered.
2. In the Sales, Profit Margin tile, in the subtitle, notice that the data was refreshed just now.
The alert on the Sales YTD tile should have triggered also. After a short while, the alert should
notify you that sales now exceed the configured threshold value.
4. Notice that the Sales YTD tile has updated to $37M.
5. Verify that the Sales YTD tile displays an alert notification icon.
If you don’t see the notification, you might need to press F5 to reload the browser. If you still don’t
see the notification, wait some minutes longer.
Alert notifications appear on the dashboard tile, and can be delivered by email and as push
notifications to mobile apps, including the Apple Watch.
Activity 2:
Data Analysis in Power BI Desktop
Solution:
Tip: Use the Get Data command on the Home ribbon tab, and then select Power BI Datasets.
You’ll now create four report pages, and on each page you’ll work with a different visual to
analyze and explore data.
In this task, you will create a scatter chart that can be animated.
2. Add a Scatter Chart visual to the report page, and then reposition and resize it so it fills the entire
page.
3. Add the following fields to the visual wells:
o Legend: Reseller | Business Type
o X Axis: Sales | Sales
o Y Axis: Sales | Profit Margin
o Size: Sales | Quantity
o Play Axis: Date | Quarter
The chart can be animated when a field is added to the Play Axis well.
4. In the Filters pane, add the Product | Category field to the Filters On This Page well.
6. To animate the chart, at the bottom left corner, click Play.
The scatter chart lets you understand several measure values simultaneously: in this case, order
quantity, sales revenue, and profit margin.
Each bubble represents a reseller business type. Changes in bubble size reflect increased or
decreased order quantities, horizontal movements represent increases or decreases in sales
revenue, and vertical movements represent increases or decreases in profitability.
8. When the animation stops, click one of the bubbles to reveal its tracking over time.
9. Hover the cursor over any bubble to reveal a tooltip describing the measure values for the reseller
type at that point in time.
10. In the Filters pane, filter by Clothing only, and notice that it produces a very different result.
Activity 3:
Create a Forecast
Solution:
In this exercise, you will create a forecast to determine possible future sales revenue.
In this task, you will create a forecast to determine possible future sales revenue.
2. Add a Line Chart visual to the report page, and then reposition and resize it so it fills the entire
page.
4. In the Filters pane, add the Date | Year field to the Filters On This Page well.
When forecasting over a timeline, you will need at least two cycles (years) of data to produce an
accurate and stable forecast.
6. Add also the Product | Category field to the Filters On This Page well, and filter by Bikes.
7. To add a forecast, beneath the Visualizations pane, select the Analytics pane.
If the Forecast section is not available, it’s probably because the visual hasn’t been correctly
configured. Forecasting is only available when two conditions are met: the axis has a single field
of type date, and there’s only one value field.
9. Click Add.
10. Configure the following forecast properties:
12. In the line visual, notice that the forecast has extended one month beyond the history data.
The gray area represents the confidence interval. The wider the interval, the less stable, and
therefore the less accurate, the forecast is likely to be.
When you know the length of the cycle, in this case annual, you should enter the seasonality in
points. For other data it could be weekly (7) or monthly (30).
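Power BI's built-in forecast uses an exponential-smoothing model, but a seasonal-naive sketch is enough to show why the seasonality (in points) matters: each forecast point repeats the value from one full season earlier.

```python
def seasonal_naive_forecast(history, season_length, horizon):
    # Repeat the value observed one season earlier. With monthly
    # data and an annual cycle, season_length would be 12.
    return [history[-season_length + (i % season_length)]
            for i in range(horizon)]
```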
13. In the Filters pane, filter by Clothing only, and notice that it produces a different result.
Work with a Decomposition Tree
In this exercise, you will create a decomposition tree to explore the relationships between reseller geography
and profit margin.
In this task, you will create a decomposition tree to explore the relationships between reseller geography
and profit margin.
1. Add a new page, and then rename the page to Decomposition Tree.
2. On the Insert ribbon, from inside the AI Visuals group, click Decomposition Tree.
Tip: The AI visuals are also available in the Visualizations pane.
5. In the Filters pane, add the Date | Year field to the Filters On This Page well, and set the filter to
FY2020.
6. In the decomposition tree visual, notice the root of the tree: Profit Margin at -0.94%
7. Click the plus icon, and in the context menu, select High Value.
8. Notice that the decomposition tree presents resellers, ordered from highest to lowest profit margin.
9. To remove the level, at the top of visual, beside the Reseller label, click X.
10. Click the plus icon again, and then expand to the Country-Region level.
12. Use the down-arrow located at the bottom of the visual for State-Province, and then scroll to the
less profitable states.
13. Notice that New York state has negative profitability.
The United States is not producing profit in FY2020. New York is one state not achieving positive
profit, due to four resellers paying less than standard cost for their goods.
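One level of a decomposition tree is just a grouped aggregation of the measure; a sketch with illustrative row and field names:

```python
from collections import defaultdict

def profit_margin_by(rows, key):
    # One tree level: aggregate sales and profit per group,
    # then compute each group's profit margin.
    sales = defaultdict(float)
    profit = defaultdict(float)
    for row in rows:
        sales[row[key]] += row["sales"]
        profit[row[key]] += row["profit"]
    return {group: profit[group] / sales[group] for group in sales}
```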
1. Add a new page, and then rename the page to Key Influencers.
2. On the Insert ribbon, from inside the AI Visuals group, click Key Influencers.
o Explain By: Reseller | Business Type and Reseller | Geography (the entire hierarchy)
o Expand By: Sales | Quantity
5. At the top-left of the visual, notice that Key Influencers is in focus, and the analysis is
set to understand what influences Profit Margin to increase.
6. Review the result, which identifies the city of Bothell as a key influencer of increased profit margin.
10. Notice that the target is now to determine segments when profit margin is likely to be high.
11. When the visual displays the segments (as circles), click one of them to reveal information about
it.
Lab 16
Get Started with Tableau Desktop -part1
Objective:
The objective of this lab is to get introduced to the Tableau environment and to perform a few tasks
to gain insights from a dataset.
Activity Outcomes:
The activities provide hands-on practice with the following topics
• How to connect to the data
• How to generate basic charts
• How to add filters to the view
• How to add colours to the view
Instructor Note:
As pre-lab activity, read Chapter xx from the textbook “Business Intelligence Guidebook: From Data
Integration to Analytics, Rick Sherman, Morgan Kaufmann Press, 2014”.
1) Useful Concepts
Start Page
The start page in Tableau Desktop is a central location from which you can do the following:
• Connect to your data
• Open your most recently used workbooks, and
• Discover and explore content produced by the Tableau community.
The start page consists of three panes: Connect, Open, and Discover.
Connect
On the Connect pane, you can do the following:
• Connect to data: Under To a File, connect to data stored in Microsoft Excel files, text files,
Access files, Tableau extract files, and statistical files, such as SAS, SPSS, and R. Under To a
Server, connect to data stored in databases like Microsoft SQL Server or Oracle. The server
names listed in this section change based on which servers you connect to and how often.
• Open saved data sources: Quickly open data sources that you have previously saved to your
My Tableau Repository directory. Also, Tableau provides sample saved data sources that you
can use to explore Tableau Desktop functionality. To follow along with examples in the
Tableau Desktop documentation, you'll usually use the Sample – Superstore data source.
Open
Open recent workbooks, pin workbooks to the start page, and explore accelerator workbooks.
On the Open pane, you can do the following:
• Open recently opened workbooks: When you open Tableau Desktop for the first time, this pane is
empty. As you create and save new workbooks, the most recently opened workbooks appear here. Click
the workbook thumbnail to open a workbook, or if you don't see a workbook thumbnail, click the Open
a Workbook link to find other workbooks that are saved to your computer.
• Pin workbooks: You can pin workbooks to the start page by clicking the pin icon that appears in the
top-left corner of the workbook thumbnail. Pinned workbooks always appear on the start page, even if
they weren't opened recently. To remove a recently opened or pinned workbook, hover over the
workbook thumbnail, and then click the "x" that appears. The workbook thumbnail is removed
immediately but will show again with your most recently used workbooks the next time you open
Tableau Desktop.
• Explore accelerators: Open and explore accelerator workbooks to see what you can do with
Tableau. Prior to 2022.2, these were called sample workbooks.
Discover
See popular views in Tableau Public, read blog posts and news about Tableau, and find training videos and
tutorials to help you get started.
2) Solved Lab Activites
Sr.No Allocated Time Level of Complexity CLO Mapping
1 10 Low CLO-6
2 10 Low CLO-6
3 20 Medium CLO-6
4 10 Low CLO-6
Activity 1:
Data connection and generating basic graphs.
Solution:
The first thing you see after you open Tableau Desktop is the Start page. Here, you select the connector (how
you will connect to your data) that you want to use.
1. Tableau icon. Click in the upper left corner of any page to toggle between the start page and the
authoring workspace.
• Connect to data that is stored in a file, such as Microsoft Excel, PDF, Spatial files, and more.
• Connect to data that is stored on Tableau Server, Microsoft SQL Server, Google Analytics, or another
server.
• Connect to a data source that you’ve connected to before.
Tableau supports the ability to connect to a wide variety of data stored in a wide variety of places. The Connect
pane lists the most common places that you might want to connect to, or click the More links to see more options.
3. Under Accelerators, view accelerator workbooks that come with Tableau Desktop. Prior to 2022.2, these
were called sample workbooks.
4. Under Open, you can open workbooks that you've already created.
5. Under Discover, find additional resources like video tutorials, forums, or the “Viz of the week” to get ideas
about what you can build.
In the Connect pane, under Saved Data Sources, click Sample - Superstore to connect to the sample data set.
After you select Sample - Superstore, your screen will look something like this:
The Sample - Superstore data set comes with Tableau. It contains information about products, sales, profits, and
so on that you can use to identify key areas for improvement within this fictitious company.
Visualization in Tableau is possible through dragging and dropping Measures and Dimensions onto these
different Shelves.
Rows and Columns: Represent the x and y axes of your graphs/charts.
Filter: Filters help you view a filtered subset of your data. For example, instead of seeing the
combined Sales of all the Categories, you can look at a specific one, such as just Furniture.
Pages: Pages work on the same principle as Filters, with the difference that you can actually see the
changes as you shift between the paged values. Remember that Rosling chart? You can easily make one
of your own using Pages.
Marks: The Marks property is used to control the mark types of your data. You may choose to
represent your data using different shapes, sizes or text.
When you drag and drop fields onto the visualization area, Tableau makes default graphs for you, as we shall see soon, but you can
change these by referring to the Show Me option.
Note: Not every graph can be made with any combination of Dimensions or Measures. Each graph has its own conditions for the
number and types of fields that can be used, which we shall discuss next.
Create a view
You set out to identify key areas for improvement, but where to start? With four years' worth of data, you decide
to drill into the overall sales data to see what you find. Start by creating a simple chart.
1. From the Data pane, drag Order Date to the Columns shelf.
Note: When you drag Order Date to the Columns shelf, Tableau creates a column for each year in
your data set. Under each column is an Abc indicator. This indicates that you can drag text or numerical
data here, like what you might see in an Excel spreadsheet. If you were to drag Sales to this area, Tableau
creates a crosstab (like a spreadsheet) and displays the sales totals for each year.
Tableau generates the following chart with sales rolled up as a sum (aggregated). You can see total
aggregated sales for each year by order date.
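The roll-up Tableau performs here is an ordinary group-and-sum; a Python sketch of the same aggregation (field names are illustrative):

```python
from collections import defaultdict
from datetime import date

def sales_by_year(orders):
    # YEAR(Order Date) roll-up: one aggregated value per year,
    # which Tableau draws as one mark per year on the chart.
    totals = defaultdict(float)
    for order in orders:
        totals[order["order_date"].year] += order["sales"]
    return dict(totals)
```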
When you first create a view that includes time (in this case Order Date), Tableau automatically
generates a line chart.
This line chart shows that sales look pretty good and seem to be increasing over time. This is good information,
but it doesn't really tell you much about which products have the strongest sales and if there are some products
that might be performing better than others. Since you just got started, you decide to explore further and see
what else you can find out.
Refine your view
To gain more insight into which products drive overall sales, try adding more data. Start by adding the
product categories to look at sales totals in a different way.
1. From the Data pane, drag Category to the Columns shelf and place it to the right of
YEAR(Order Date).
Your view updates to a bar chart. By adding a second discrete dimension to the view you can
categorize your data into discrete chunks instead of looking at your data continuously over time.
This creates a bar chart and shows you overall sales for each product category by year.
Your view is doing a great job showing sales by category—furniture, office supplies, and
technology. An interesting insight is revealed!
From this view, you can see that sales for furniture are growing faster than sales for office supplies, even though Office Supplies had a really good year in 2021. Perhaps you can recommend that your company focus its sales efforts on furniture instead of office supplies? Your company sells a lot of different products in those categories, so you'll need more information before you can make a recommendation.
To help answer that question, you decide to look at products by sub-category to see which items
are the big sellers. For example, for the Furniture category, you want to see details about
bookcases, chairs, furnishings, and tables. Looking at this data might help you gain insights
into sales and later on, overall profitability, so add sub-categories to your bar chart.
Note: You can drag and drop or double-click a field to add it to your view, but be careful.
Tableau makes assumptions about where to add that data, and it might not be placed where you
expect. You can always click Undo to remove the field, or drag it off the area where Tableau
placed it to start over.
Sub-Category is another discrete field. It creates another header at the bottom of the view, and
shows a bar for each sub-category (68 marks) broken down by category and year.
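The 68 marks are just the cross of the discrete headers in the view. The Superstore sample has 17 sub-categories (nested under the three categories) and four years of orders, so, assuming every combination occurs in the data:

```python
# One bar is drawn per (year, sub-category) combination under each
# category header, so the mark count is simply the product:
years = 4
sub_categories = 17
marks = years * sub_categories
print(marks)  # 68
```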
Now you are getting somewhere, but this is a lot of data to visually sort through. In the next section,
you will learn how you can add color, filters, and more to focus on specific results.
Step summary
This step was all about getting to know your data and starting to ask questions about it to gain insights. You learned how to drag fields to the Rows and Columns shelves to build a view, let Tableau roll measures up as sums, and add dimensions to break your data down into finer detail.
Now you're ready to begin focusing on your results to identify more specific areas of concern. In the
next section, you will learn how to use filters and colors to help you explore your data visually.
You've created a view of product sales broken down by category and sub-category. You are starting to
get somewhere, but that is a lot of data to sort through. You need to easily find the interesting data
points and focus on specific results. Well, Tableau has some great options for that!
Filters and colors are ways you can add more focus to the details that interest you. After you add focus
to your data, you can begin to use other Tableau Desktop features to interact with that data.
Activity 2:
Add filters to your view
Solution:
You can use filters to include or exclude values in your view. In this example, you decide to add two
simple filters to your worksheet to make it easier to look at product sales by sub-category for a specific
year.
1. In the Data pane, right-click Order Date and select Show Filter.
2. In the Data pane, right-click Sub-Category and select Show Filter.
The filters are added to the right side of your view in the order that you selected them. Filters are card types and can be moved around on the canvas by clicking a filter and dragging it to another location in the view. As you drag the filter, a line appears that shows you where you can drop the filter.
Note: The Get Started tutorial uses the default position of the filter cards.
Activity 3:
Add color to your view
Solution:
Adding filters helps you to sort through all of this data—but wow, that’s a lot of blue! It's time to do something
about that.
Currently, you are looking at sales totals for your various products. You can see that some products have consistently low sales, and some product lines might be good candidates for reduced sales efforts. But what does overall profitability look like for your different products? Drag Profit to color to see what happens.
1. From the Data pane, drag Profit to Color on the Marks card.
By dragging profit to color, you now see that you have negative profit in Tables, Bookcases, and even Machines.
Another insight is revealed!
Note: Tableau automatically added a color legend and assigned a diverging color palette because your data
includes both negative and positive values.
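The diverging-palette behavior can be pictured as mapping the sign of a value to one of two color ramps and its magnitude to an intensity. A purely illustrative sketch (the thresholds and color names below are our own, not Tableau's):

```python
def diverging_color(value, max_abs):
    """Map a profit value to a crude orange-blue diverging scale.
    Negative values shade orange, positive values shade blue, with
    intensity proportional to magnitude (illustrative only)."""
    intensity = min(abs(value) / max_abs, 1.0) if max_abs else 0.0
    hue = "blue" if value >= 0 else "orange"
    if intensity > 0.66:
        return f"dark {hue}"
    elif intensity > 0.33:
        return f"medium {hue}"
    return f"light {hue}"

# Invented sub-category profits, not the real Superstore numbers.
profits = {"Chairs": 9000, "Tables": -4500, "Binders": -900}
max_abs = max(abs(v) for v in profits.values())
colors = {k: diverging_color(v, max_abs) for k, v in profits.items()}
print(colors)
```

A sequential (single-hue) palette would be the right choice if all values shared one sign; the diverging palette exists precisely so the zero crossing is visible.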
Step summary
In this step you used filters and color to make working with your data a bit easier. You also learned about a few
fun features that Tableau offers to help you answer key questions about your data. You learned how to:
• Apply filters and color to make it easier to focus on the areas of your data that interest you the most.
• Interact with your chart using the tools that Tableau provides.
• Duplicate worksheets and save your changes to continue exploring your data in different ways without
losing your work.
Download any dataset from Kaggle, generate basic graphs, and add filters and colors to your graphs.
Lab 17
Working with Tableau - Part 2
Objective:
The objective of this lab is to continue exploring the Tableau environment and to perform a few tasks to gain insights from a dataset.
Activity Outcomes:
The activities provide hands-on practice with the following topics:
• Explore your data geographically
• Create a Top N Filter
• Building a dashboard
Instructor Note:
1. As pre-lab activity, read Chapter xx from the textbook “Business Intelligence Guidebook:
From Data Integration to Analytics, Rick Sherman, Morgan Kaufmann Press, 2014”.
1) Useful Concepts
Activity 1:
Explore your data geographically
Solution:
You've built a great view that allows you to review sales and profits by product over several years. And after
looking at product sales and profitability in the South, you decide to look for trends or patterns in that region.
Because you're looking at geographic data (the Region field), you have the option to build a map view. Map
views are great for displaying and analyzing this kind of information. Plus, they're just cool!
For this example, Tableau has already assigned the proper geographic roles to the Country, State, City, and
Postal Code fields. That's because it recognized that each of those fields contained geographic data. You can get
to work creating your map view right away.
1. In the toolbar, click the New Worksheet icon.
Tableau keeps your previous worksheet and creates a new one so that you can continue exploring your data without losing your work.
2. In the Data pane, double-click State to add it to Detail on the Marks card.
Because Tableau already knows that state names are geographic data and because the State dimension
is assigned the State/Province geographic role, Tableau automatically creates a map view.
There is a mark for each of the 48 contiguous states in your data source. (Sadly, Alaska and Hawaii
aren't included in your data source, so they are not mapped.)
Notice that the Country field is also added to the view. This happens because the geographic fields in
Sample - Superstore are part of a hierarchy. Each level in the hierarchy is added as a level of detail.
Additionally, Latitude and Longitude fields are added to the Columns and Rows shelves. You can think
of these as X and Y fields. They're essential any time you want to create a map view, because each
location in your data is assigned a latitudinal and longitudinal value. Sometimes the Latitude and
Longitude fields are generated by Tableau. Other times, you might have to manually include them in
your data. You can find resources to learn more about this in the Learning Library.
Now, having a cool map focused on 48 states is one thing, but you wanted to see what was happening
in the South, remember?
3. Drag Region to the Filters shelf, and then filter down to the South only. The map view zooms in to the
South region, and there is a mark for each state (11 total).
Now you want to see more detailed data for this region, so you start to drag other fields to the Marks card:
4. Drag Sales to Color on the Marks card.
The view automatically updates to a filled map and colors each state based on its total sales. Because you're exploring product sales, you want your sales to appear in USD: click the SUM(Sales) field on the Marks card, select Format, and for Numbers choose Currency.
Any time you add a continuous measure that contains positive numbers (like Sales) to Color on the
Marks card, your filled map is colored blue. Negative values are assigned orange.
Sometimes you might not want your map to be blue. Maybe you prefer green, or your data isn’t
something that should be represented with the color blue, like wildfires or traffic jams. That would just
be confusing!
No need to worry, you can change the color palette just like you did before.
For this example, you want to see which states are doing well and which are doing poorly in sales.
5. On the Marks card, click Color, and then click Edit Colors.
6. In the Palette drop-down list, select Red-Green Diverging and click OK. This allows you to quickly see the low performers and the high performers.
Your view updates to look like this:
The data is accurate, and technically you can compare low performers with high performers, but is that
really the whole story?
Are sales in some of those states really that terrible, or are there just more people in Florida who want
to buy your products? Maybe you have smaller or fewer stores in the states that appear red. Or maybe
there’s a higher population density in the states that appear green, so there are just more people to buy
your stuff.
Either way, there’s no way you want to show this view to your boss because you aren't confident the
data is telling a useful story.
7. Click the Undo icon in the toolbar to return to that nice, blue view.
At first glance, it appears that Florida is performing the best. Hovering over its mark reveals a total of
89,474 USD in sales, as compared to South Carolina, for example, which has only 8,482 USD in sales.
However, have any of the states in the South been profitable?
8. Drag Profit to Color on the Marks card to see if you can answer this question.
Now that’s better! Because profit often consists of both positive and negative values, Tableau
automatically selects the Orange-Blue Diverging color palette to quickly show the states with negative
profit and the states with positive profit.
It’s now clear that Tennessee, North Carolina, and Florida have negative profit, even though it appeared they
were doing okay—even great—in Sales. But why? You'll answer that in the next step.
Step 5: Drill down into the details
In the last step you discovered that Tennessee, North Carolina, and Florida have negative profit. To
find out why, you decide to drill down even further and focus on what's happening in those three states
alone.
2. Right-click Profit Map at the bottom of the workspace and select Duplicate. Name the new sheet Negative Profit Bar Chart.
3. In the Negative Profit Bar Chart worksheet, click Show Me, and then select horizontal bars.
Show Me highlights different chart types based on the data you've added to your view.
Note: At any time, you can click Show Me again to collapse it.
You now have a bar chart again—just like that.
4. To select multiple bars on the left, click and drag your cursor across the bars
between Tennessee, North Carolina, and Florida. On the tooltip that appears, select Keep
Only to focus on those three states.
Note: You can also right-click one of the highlighted bars, and select Keep Only.
Notice that an Inclusions field for State is added to the Filters shelf to indicate that certain
states are filtered from the view. The icon with two circles on the field indicates that this field
is a set. You can edit this field by right-clicking the field on the Filters shelf and selecting Edit Filter.
Now you want to look at the data for the cities in these states.
5. On the Rows shelf, click the plus icon on the State field to drill down to the City level of detail.
There’s almost too much information here, so you decide to filter the view down to the cities with the
most negative profit by using a Top N Filter.
Activity 2:
Create a Top N Filter
Solution:
You can use a Top N Filter in Tableau Desktop to limit the number of marks displayed in your view. In this case, you want to use the Top N Filter to home in on the poorest performers.
1. On the Rows shelf, right-click City and select Filter.
2. In the Filter dialog box, select the Top tab, and then do the following:
a. Click By field.
b. Click the Top drop-down and select Bottom to reveal the poorest performers.
c. Type 5 in the text box to show the bottom 5 performers in your data set.
Tableau Desktop has already selected a field (Profit) and aggregation (Sum) for the Top N Filter
based on the fields in your view. These settings ensure that your view will display only the five
poorest performing cities by sum of profit.
d. Click OK.
What happened to the bar chart, and why is it blank? That's a great question, and a great opportunity to
introduce the Tableau Order of Operations.
The Tableau Order of Operations, also known as the query pipeline, is the order in which Tableau performs various actions, such as applying your filters to the view:
a. Extract Filters
b. Data Source Filters
c. Context Filters
d. Top N Filters
e. Dimension Filters
f. Measure Filters
The order that you create filters in, or arrange them on the Filters shelf, doesn't change the order in
which Tableau applies those filters to your view.
The good news is you can tell Tableau to change this order when you notice something strange happening with the filters in your view. In this example, the Top N Filter selects the five poorest performing cities by sum of profit across the whole data set, but none of those cities are in the South, so the chart is blank.
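The blank chart is easy to reproduce outside Tableau. The sketch below uses invented city/profit rows (not the real Superstore data): ranking the bottom five cities over all regions and then intersecting with the South yields nothing, whereas filtering to the South first, which is effectively what Add to Context forces, leaves cities to rank.

```python
# Hypothetical (city, region, profit) rows -- invented, not the real data.
rows = [("Philadelphia", "East", -13000), ("Houston", "Central", -10000),
        ("San Antonio", "Central", -7000), ("Lancaster", "East", -7600),
        ("Chicago", "Central", -6600), ("Burlington", "South", -4000),
        ("Knoxville", "South", -1500), ("Memphis", "South", -1200)]

def bottom_n(data, n=5):
    """Return the n cities with the lowest summed profit."""
    return {city for city, _, profit in sorted(data, key=lambda r: r[2])[:n]}

# Default order: the Top N filter runs BEFORE the region dimension filter,
# so the bottom five are chosen over ALL regions.
worst_overall = bottom_n(rows)
blank_chart = [r for r in rows if r[1] == "South" and r[0] in worst_overall]

# With the filter added to context, the region restriction runs FIRST.
south_only = [r for r in rows if r[1] == "South"]
fixed_chart = [r for r in south_only if r[0] in bottom_n(south_only)]

print(len(blank_chart), len(fixed_chart))  # 0 3
```

The first result is empty (the blank chart); the second keeps the Southern cities, which is the behavior you get after Add to Context.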
To fix the chart, add a filter to context. This tells Tableau to filter that field first, regardless of where it
falls on the order of operations.
But which field do you add to context? There are three fields on the Filters shelf: Region (a dimension filter), City (a top N filter), and Inclusions (Country, State) (a set).
If you look at the order of operations again, you know that the set and the top N filter are being applied
before the dimension filter. But do you know if the top N filter or the set filter is being applied first?
Let's find out.
3. On the Filters shelf, right-click the City field and select Add to Context.
The City field turns gray and moves to the top of the Filters shelf, but nothing changes in the view. So
even though you're forcing Tableau to filter City first, the issue isn't resolved.
4. Click Undo.
5. On the Filters shelf, right-click the Inclusions (Country, State) set and select Add to Context.
The Inclusions (Country, State) set turns gray and moves to the top of the Filters shelf.
And bars have returned to your view!
You're on to something! But there are six cities in the view, including Jacksonville, North Carolina,
which has a positive profit. Why would a city with a positive profit show up in the view when you
created a filter that was supposed to filter out profitable cities?
Jacksonville, North Carolina is included because City is the lowest level of detail shown in the view.
For Tableau Desktop to know the difference between Jacksonville, North Carolina, and Jacksonville,
Florida, you need to drill down to the next level of detail in the location hierarchy, which, in this case,
is Postal Code. After you add Postal Code, you can exclude Jacksonville in North Carolina without also
excluding Jacksonville in Florida.
6. On the Rows shelf, click the plus icon on City to drill down to the Postal Code level of detail.
7. Right-click the postal code for Jacksonville, North Carolina, 28540, and then select Exclude.
Postal Code is added to the Filters shelf to indicate that certain members in the Postal Code field have
been filtered from the view. Even when you remove the Postal Code field from the view, the filter
remains.
Now that you've focused your view to the least profitable cities, you can investigate further to
identify the products responsible.
1. Drag Sub-Category to the Rows shelf, and place it to the right of City.
2. Drag Profit to Color on the Marks card to make it easier to see which products have negative
profit.
3. In the Data pane, right-click Order Date and select Show Filter.
You can now explore negative profits for each year if you want, and quickly spot the products
that are losing money.
Machines, tables, and binders don’t seem to be doing well. So what if you stop selling those
items in Jacksonville, Concord, Burlington, Knoxville, and Memphis?
Verify your findings
Will eliminating binders, machines, and tables improve profits in Florida, North Carolina, and
Tennessee? To find out, you can filter out the problem products to see what happens.
1. Go back to your map view by clicking the Profit Map sheet tab.
2. In the Data pane, right-click Sub-Category and select Show Filter.
A filter card for all of the products you offer appears next to the map view. You'll use this filter later.
3. From the Data pane, drag Profit and Profit Ratio to Label on the Marks card. To format the
Profit Ratio as a percentage, right-click Profit Ratio, and select Format. Then, for Default
Numbers, choose Percentage and set the number of decimal places you want displayed on the
map. For this map, we'll choose zero decimal places.
Now you can see the exact profit of each state without having to hover your cursor over them.
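Under the hood, Profit Ratio in the Superstore workbook is a calculated field along the lines of SUM(Profit)/SUM(Sales); the "Percentage, zero decimal places" choice is purely a display format on top of that number. Illustratively, with invented figures for one state:

```python
# Invented figures for one state -- not the real Superstore numbers.
profit, sales = 8_399.0, 89_474.0

# The ratio itself stays a plain number; the percent sign and rounding
# come only from the display format.
profit_ratio = profit / sales
label = f"{profit_ratio:.0%}"
print(label)  # 9%
```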
4. In the Data pane, right-click Order Date and select Show Filter to provide some context for
the view.
A filter card for YEAR(Order Date) appears in the view. You can now view profit for all years
or for a combination of years. This might be useful for your presentation.
5. Clear Binders, Machines, and Tables from the list on the Sub-Category filter card in the view.
Recall that adding filters to your view lets you include and exclude values to highlight certain
parts of your data.
As you clear each member, the profit for Tennessee, North Carolina, and Florida improves, until finally each has a positive profit.
Binders, machines, and tables are definitely responsible for the losses in Tennessee, North
Carolina, and Florida, but not for the rest of the South. Do you notice how profit actually
decreases for some of the other states as you clear items from the filter card? For example, if
you toggle Binders on the Sub-Category filter card, profit drops by four percent in Arkansas.
You can deduce that Binders are actually profitable in Arkansas.
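The "profit drops when you remove Binders" effect is simple set arithmetic: a state's total is the sum over the sub-categories left on the filter card, so removing a profitable member lowers it. A sketch with invented Arkansas numbers (chosen so the drop matches the four percent described above; not the real data):

```python
# Invented sub-category profits for Arkansas -- not the real data.
arkansas = {"Binders": 48.0, "Paper": 900.0, "Machines": -120.0, "Tables": 372.0}

def state_profit(profits, excluded=()):
    """Total state profit over the sub-categories left on the filter card."""
    return sum(v for k, v in profits.items() if k not in excluded)

with_binders = state_profit(arkansas)
without_binders = state_profit(arkansas, excluded={"Binders"})

drop = (with_binders - without_binders) / with_binders
print(f"{drop:.0%}")  # 4%
```

A positive drop means the excluded member was contributing positive profit, which is exactly the deduction made about Binders in Arkansas.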
You want to share this discovery with the team by walking them through the same steps you
took.
6. Select (All) on the Sub-Category filter card to include all products again.
Now you know that machines, tables, and binders are problematic products for your company. In
focusing on the South, you see that these products have varying impacts on profit. This might be a
worthwhile conversation to have with your boss.
Next, you'll assemble the work you've done so far in a dashboard so that you can clearly present your
findings.
Activity 3:
Build a dashboard to show your insights:
Solution:
You’ve created four worksheets, and they're communicating important information that your boss needs to
know. Now you need a way to show the negative profits in Tennessee, North Carolina, and Florida and explain
some of the reasons why profits are low.
To do this, you can use dashboards to display multiple worksheets at once, and—if you want—make them
interact with one another.
You want to emphasize that certain items sold in certain places are doing poorly. Your bar graph view
of profit and your map view demonstrate this point nicely.
1. Click the New Dashboard tab at the bottom of the workbook.
2. In the Dashboard pane on the left, you'll see the sheets that you created. Drag Sales in the South to your empty dashboard.
3. Drag Profit Map to your dashboard, and drop it on top of the Sales in the South view.
Now you can see both views at once!
But sadly, the bar chart is a bit squished, which isn’t helping your boss understand your data.
Resolving these issues will give you more room to communicate the information you need.
1. On Sales in the South, right-click in the column area under the Region column header, and
clear Show header.
2. Repeat this process for the Category row header.
You've now hidden unnecessary columns and rows from your dashboard while preserving the breakdown of your data. The extra space makes it easier to see data on your dashboard, but let's freshen things up even more.
3. Right-click the Profit Map title and select Hide Title.
The title Profit Map is hidden from the dashboard and even more space is created.
4. Repeat this step for the Sales in the South view title.
5. Select the first Sub-Category filter card on the right side of your view, and at the top of the card, click the Remove icon.
6. Repeat this step for the second Sub-Category filter card and one of the Year of Order
Date filter cards.
7. Click on the Profit color legend and drag it from the right to below Sales in the South.
8. Finally, select the remaining Year of Order Date filter, click its drop-down arrow, and then
select Floating. Move it to the white space in the map view. In this example, it is placed just
off the East Coast, in the Atlantic Ocean.
Try selecting different years on the Year of Order Date filter. Your data is quickly filtered to
show that state performance varies year by year. That's nice, but it could be made even easier
to compare.
9. Click the drop-down arrow at the top of the Year of Order Date filter, and select Single Value (Slider).
Your dashboard is looking really good! Now you can easily compare profit and sales by year. But that's not so different from a couple of pictures in a presentation—and you're using Tableau! Let's make your dashboard more engaging.
Add interactivity
Wouldn't it be great if you could view which sub-categories are profitable in specific states?
1. Select Profit Map in the dashboard, and click the Use as filter icon in the upper right corner.
2. On the map, select a state.
The Sales in the South bar chart automatically updates to show just the sub-category sales in the selected state. You can quickly see which sub-categories are profitable.
3. Click an area of the map other than the colored Southern states to clear your selection.
You also want viewers to be able to see the change in profits based on the order date.
4. Select the Year of Order Date filter, click its drop-down arrow, and select Apply to
Worksheets > Selected Worksheets.
5. In the Apply Filter to Worksheets dialog box, select All in dashboard, and then click OK.
This option tells Tableau to apply the filter to all worksheets in the dashboard that use this
same data source.
Here, we filter Sales in the South to only items sold in North Carolina, and then explore year by year
profit.
Rename and go
You show your boss your dashboard, and she loves it. She's named it "Regional Sales and Profit," and
you do the same by double-clicking the Dashboard 1 tab and typing Regional Sales and Profit.
In her investigations, your boss also finds that the decision to introduce machines in the North Carolina
market in 2021 was a bad idea.
Your boss is glad she has this dashboard to explore, but she also wants you to present a clear action
plan to the larger team. She asks you to create a presentation with your findings.
Instead of having to guess which key insights your team is interested in and including them in a
presentation, you decide to create a story in Tableau. This way, you can walk viewers through your
data discovery process, and you have the option to interactively explore your data to answer any
questions that come up during your presentation.
1. Click the New Story tab at the bottom of the workbook.
You're presented with a blank workspace that reads, "Drag a sheet here." This is where you'll create your first story point.
Blank stories look a lot like blank dashboards. And like a dashboard, you can drag worksheets
over to present them. You can also drag dashboards over to present them in your story.
2. From the Story pane on the left, drag the Sales in the South worksheet onto your view.
3. Add a caption—maybe "Sales and profit by year"—by editing the text in the gray box above
the worksheet.
This story point is a useful way to acquaint viewers with your data.
But you want to tell a story about selling machines in North Carolina, so let's focus on that data.
1. In the Story pane, click Duplicate to continue working where you left off; your first story point will remain exactly as you left it.
2. Since you know you're telling a story about machines, on the Sub-Category filter, clear the selection for (All), then select Machines.
Now your viewers can quickly identify the sales and profit of machines by year.
3. Add a caption to underscore what your viewers see, for example, "Machine sales and profit by
year."
You've successfully shifted the focus to machines, but you realize that something seems odd:
in this view, you can't single out which state is contributing to the loss.
You'll address this in your next story point by introducing your map.
Make your point
The bottom line is that machines in North Carolina lose money for your company. You discovered that
in the dashboard you created. Looking at overall sales and profit by year doesn't demonstrate this point
alone, but regional profit can.
1. In the Story pane, select Blank. Then, drag your dashboard Regional Sales and Profit onto
the canvas.
This gives viewers a new perspective on your data: Negative profit catches the eye.
2. Add a caption like, "Underperforming items in the South."
To narrow your results to just North Carolina, start with a duplicate story point.
1. Select Duplicate to create another story point with your Regional Profit dashboard.
2. Select North Carolina on the map and notice that the bar chart automatically updates.
Now you can walk viewers through profit changes by year in North Carolina. To do this, you will
create four story points:
1. Select Duplicate to begin with your Regional Profit dashboard focused on North Carolina.
2. On the Year of Order Date filter, click the right arrow button so that 2018 appears.
3. Add a caption, for example, "Profit in North Carolina, 2018," and then click Duplicate.
4. Repeat the previous two steps for 2019, 2020, and 2021, updating the filter and the caption each time.
Now viewers will have an idea of which products were introduced to the North Carolina market when, and how poorly they performed.
Finishing touches
On this story point that focuses on data from 2021, you want to describe your findings. Let's add more
detail than just a caption.
1. In the left pane, select Drag to add text and drag it onto your view.
2. Enter a description for your dashboard that emphasizes the poor performance of machines in
North Carolina, for example, "Introducing machines to the North Carolina market in 2021
resulted in losing a significant amount of money."
For dramatic effect, you can hover over Machines on the Sales in the South bar chart while
presenting to show a useful tooltip: the loss of nearly $4,000.
And now, for the final slide, you drill down into the details.
3. In the Story pane, click Blank to create a new story point.
4. From the Story pane, drag Negative Profit Bar Chart to the view.
5. In the Year of Order Date filter card, narrow the view down to 2021 only.
You can now easily see that the loss of machine profits was solely from Burlington, North
Carolina.
6. In the view, right-click the Burlington mark (the bar) and select Annotate > Mark.
7. In the Edit Annotation dialog box that appears, delete the filler text and type: "Machines in Burlington
lost nearly $4,000 in 2021."
8. Click OK.
9. In the view, click the annotation and drag it to adjust where it appears.
10. Give this story point the caption: "Where are we losing machine profits in North Carolina?"
11. Double-click the Story 1 tab and rename your story to "Improve Profits in the South".
Now it's time to share your story. Choose the option that applies to you:
• If you or your company does not use Tableau Server, or if you want to learn about a free, alternative sharing option, jump to Use Tableau Public.
• If you or your company uses Tableau Server, and you are familiar with what permissions are
assigned to you, jump to Use Tableau Server.
Note: When you publish to Tableau Public, as the name suggests, these views are publicly accessible. This means that you share your views as well as your underlying data with anyone with access to the internet. When sharing confidential information, consider Tableau Server or Tableau Cloud instead.
Use Tableau Public
1. Select Server > Tableau Public > Save to Tableau Public.
2. If you don't have a Tableau Public profile, click Create one now for free and follow the prompts.
3. If a dialog box appears about the connection type, open the Data Source page. Then in the top-right corner, change the Connection type from Live to Extract.
4. For the second (and last) time, select Server > Tableau Public > Save to Tableau Public.
5. When your browser opens, review your embedded story.
6. Click Edit Details to update the title of your viz, add a description, and more.
7. Click Save.
8. To share with colleagues, click Share at the bottom of your viz.
a. Embed on your website: Copy the Embed Code and paste it in your web page HTML.
b. Send a link: Copy the Link and send the link to your colleagues.
c. Send an email using your default email client by clicking the email icon.
Use Tableau Server
1. Select Server > Publish Workbook.
2. Enter the name of the server (or IP address) that you want to connect to in the dialog box and click Connect.
3. Enter your user name and password, and then click Sign In.
4. If you want, enter a description for reference, for example "Take a look at the story I built in
Tableau Desktop!"
5. Under Sheets, click Edit, and then clear all sheets except Improve Profits in the South.
6. Click Publish.
Tableau Server opens in your internet browser. If prompted, enter your server credentials.
The Publishing Complete dialog box lets you know that your story is ready to view.
Great work! You've successfully published your story using Tableau Server.
Send a link to your work
Let's share your work with your teammates so that they can interact with your story online.
1. In Tableau Server, navigate to the Improve Profits in the South story that you published.
If you had published additional sheets from your workbook, they would be listed alongside
Improve Profits in the South.
Awesome! This is your interactive, embedded story. Click Share to display the sharing options:
a. Embed on your website by copying the Embed Code and pasting it in your web page
HTML.
b. Send a link by copying the Link and sending the link to your colleagues.
c. Send an email by using your default email client: click the email icon.