Model Data With PowerBI

Data Modelling With DAX

Data Analysis Expressions (DAX) includes a family of functions known as iterator functions. Iterator functions enumerate all rows of a given table and evaluate a given expression for each row. They provide you with flexibility and control over how your model calculations summarize data.

By now, you're familiar with single-column summarization functions, including SUM, COUNT, MIN, MAX, and others. Each of these functions has an equivalent iterator function that's identified by the "X" suffix, such as SUMX, COUNTX, MINX, MAXX, and others. Additionally, specialized iterator functions exist that perform filtering, ranking, semi-additive calculations over time, and more.

As is characteristic of all iterator functions, you must pass in a table and an expression.
The table can be a model table reference or an expression that returns a table object.
The expression must evaluate to a scalar value.

Single-column summarization functions, like SUM, are shorthand functions. Internally, Microsoft Power BI converts the SUM function to SUMX. As a result, the following two measure definitions will produce the same result with the same performance.

DAX
Revenue = SUM(Sales[Sales Amount])
DAX
Revenue =
SUMX(
Sales,
Sales[Sales Amount]
)

It's important to understand how context works with iterator functions. Because iterator functions enumerate over table rows, the expression is evaluated for each row in row context, similar to calculated column formulas. The table is evaluated in filter context. Using the previous Revenue measure definition as an example, if a report visual is filtered by fiscal year FY2020, then the Sales table would contain only the sales rows that were ordered in that year. Filter context is described in the filter context module.

Important

When you're using iterator functions, avoid pairing large tables (in terms of row counts) with expressions that use expensive DAX functions. Some functions, like the SEARCH DAX function, which scans a text value looking for specific characters or text, can result in slow performance. Also, the LOOKUPVALUE DAX function might result in a slow, row-by-row retrieval of values. In this second case, use the RELATED DAX function instead, whenever possible.
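As a hedged sketch of this guidance (the ProductKey columns are assumptions about the model's relationship keys, and the measure names are illustrative; neither is part of the exercises), both of the following measures total quantity at list price, but the first forces LOOKUPVALUE into a row-by-row retrieval, while the second follows the existing model relationship with RELATED:

DAX
-- Slower sketch: row-by-row value retrieval (ProductKey columns assumed)
List Revenue (LOOKUPVALUE) =
SUMX(
    Sales,
    Sales[Order Quantity]
        * LOOKUPVALUE(
            'Product'[List Price],
            'Product'[ProductKey], Sales[ProductKey]
        )
)
DAX
-- Faster sketch: follows the model relationship instead
List Revenue (RELATED) =
SUMX(
    Sales,
    Sales[Order Quantity] * RELATED('Product'[List Price])
)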

Use aggregation iterator functions


Each single-column summarization function has its equivalent iterator function. The
following sections consider two aggregation scenarios when iterator functions are
useful: complex summarization and higher grain summarization.

Complex summarization
In this section, you will create your first measure that uses an iterator function. First,
download and open the Adventure Works DW 2020 M05.pbix file. Next, add the
following measure definition:

DAX
Revenue =
SUMX(
Sales,
Sales[Order Quantity] * Sales[Unit Price] * (1 - Sales[Unit Price Discount Pct])
)

Format the Revenue measure as currency with two decimal places, and then add it
to the table visual that's found on Page 1 of the report.

By using an iterator function, the Revenue measure formula aggregates more than
the values of a single column. For each row, it uses the row context values of three
columns to produce the revenue amount.

Now, add another measure:



DAX
Discount =
SUMX(
Sales,
Sales[Order Quantity]
* (
RELATED('Product'[List Price]) - Sales[Unit Price]
)
)

Format the Discount measure as currency with two decimal places, and then add it
to the table visual.

Notice that the formula uses the RELATED function. Remember, row context does not extend beyond the table. If your formula needs to reference columns in other tables, and model relationships exist between the tables, use the RELATED function to retrieve values from the one side of a relationship, or the RELATEDTABLE function to retrieve rows from the many side.
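For a sketch of the many-side direction (illustrative only; this calculated column isn't part of the exercise), a calculated column on the Product table could use RELATEDTABLE to retrieve the related Sales rows and count them:

DAX
-- Illustrative calculated column on the Product table
Sales Line Count =
COUNTROWS(RELATEDTABLE(Sales))

Because the row context here is on the one-side Product table, RELATEDTABLE returns the many-side Sales rows that relate to each product.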

Higher grain summarization


The following example considers a requirement to report on average revenue. Add
the following measure:

DAX
Revenue Avg =
AVERAGEX(
Sales,
Sales[Order Quantity] * Sales[Unit Price] * (1 - Sales[Unit Price Discount Pct])
)

Format the Revenue Avg measure as currency with two decimal places, and then
add it to the table visual.

Consider that average means the sum of values divided by the count of values. However, that definition raises a question: What does the count of values represent? In this case, the count of values is the number of expressions that didn't evaluate to BLANK. Also, because the iterator function enumerates the Sales table rows, average would mean revenue per row. Taking this logic one step further, because each row in the Sales table records a sales order line, it can be more precisely described as revenue per order line.

Accordingly, you should rename the Revenue Avg measure to Revenue Avg Order Line so that it's clear to report users what's being used as the average base.

The following example uses an iterator function to create a new measure that raises
the granularity to the sales order level (a sales order consists of one or more order
lines). Add the following measure:

DAX
Revenue Avg Order =
AVERAGEX(
VALUES('Sales Order'[Sales Order]),
[Revenue]
)

Format the Revenue Avg Order measure as currency with two decimal places, and
then add it to the table visual.

As expected, the average revenue for an order is always higher than the average
revenue for a single order line.

Notice that the formula uses the VALUES DAX function. This function lets your
formulas determine what values are in filter context. In this case,
the AVERAGEX function iterates over each sales order in filter context. In other words, it
iterates over each sales order for the month. Filter context and the VALUES function
are introduced in the filter context module.

Calculate ranks

The RANKX DAX function is a special iterator function you can use to calculate ranks.
Its syntax is as follows:

DAX
RANKX(<table>, <expression>[, <value>[, <order>[, <ties>]]])

Similar to all iterator functions, you must pass in a table and an expression.
Optionally, you can pass in a rank value, set the order direction, or determine how to
handle ranks when values are tied.

Order direction

Order direction is either ascending or descending. When ranking something favorable, like revenue values, you're likely to use descending order so that the highest revenue is ranked first. When ranking something unfavorable, like customer complaints, you might use ascending order so that the lowest number of complaints is ranked first. When you don't pass in an order argument, the function defaults to 0 (zero), which means descending order.
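As an illustrative sketch (the Customer table and the Complaint Count measure are hypothetical, not part of the sample model), you could make the direction explicit by passing ASC as the order argument while leaving the optional value argument blank:

DAX
-- Hypothetical example: lowest complaint count ranked first
Complaints Rank =
RANKX(
    ALL(Customer[Customer]),
    [Complaint Count],
    ,
    ASC
)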

Handle ties

You can handle ties by skipping rank values or using dense ranking, which uses the
next rank value after a tie. When you don't pass in a ties argument, the function will
use Skip. You'll have an opportunity to work with an example of each tie argument
later in this unit.

Create ranking measures

Add the following measure to the Product table:

DAX
Product Quantity Rank =
RANKX(
ALL('Product'[Product]),
[Quantity]
)

Add the Product Quantity Rank measure to the table visual that is found on Page 2 of the report. The table visual groups bike products and displays their quantity, ordered by descending quantity.

The RANKX function iterates over a table that is returned by the ALL DAX function.
The ALL function is used to return all rows in a model table or values in one or more
columns, and it ignores all filters. Therefore, in this case, it returns a table that
consists of all Product column values in the Product table. The RANKX function must
use the ALL function because the table visual will group by product (which is a filter
on the Product table).

In the table visual, notice that two products tie for tenth place and that the next product's rank is 12. This visual is an example of using the Skip ties argument.

Your next task is to enter the following logic to modify the Product Quantity Rank measure definition to use dense ranking:

DAX
Product Quantity Rank =
RANKX(
ALL('Product'[Product]),
[Quantity],
,
,
DENSE
)

In the table visual, notice that a skipped ranking no longer exists. After the two
products that tie for tenth place, the next ranking is 11.

Notice that the table visual total for the Product Quantity Rank is one (1). The reason is that the total for all products is ranked.

It's not appropriate to rank total products, so you will now use the following logic to
modify the measure definition to return BLANK, unless a single product is filtered:

DAX
Product Quantity Rank =
IF(
HASONEVALUE('Product'[Product]),
RANKX(
ALL('Product'[Product]),
[Quantity],
,
,
DENSE
)
)

Notice that the total Product Quantity Rank is now BLANK, which was achieved by using the HASONEVALUE DAX function to test whether the Product column in the Product table has a single value in filter context. That's the case for each product group, but not for the total, which represents all products.

Filter context and the HASONEVALUE function will be introduced in the filter context
module.

1. An iterator function always includes at least two arguments. What are they? - Table and expression
2. Which statement about DAX iterator functions is true? - Iterator functions iterate over tables and evaluate an expression for each table row.
3. You're developing a Power BI Desktop model. You need to create a measure formula by using the RANKX DAX function to rank students by their test results. The lowest rank should be assigned to the highest test result. Also, if multiple students achieve the same rank, the next student rank should follow on from the tied rank number. Which order and ties arguments should you pass to the RANKX function? - Order Descending | Ties Dense

Add Measures to Power BI Desktop Models


Measures in Microsoft Power BI models are either implicit or explicit. Implicit measures are automatic behaviors that allow visuals to summarize model column data. Explicit measures, also known simply as measures, are calculations that you can add to your model. This unit focuses on how report authors can use implicit measures.

In the Fields pane, a column that's shown with the sigma symbol (∑) indicates two
facts:

 It's a numeric column.
 It will summarize column values when it is used in a visual (when added to a field well that supports summarization).

In the following image, notice that the Sales table includes only fields that can be
summarized, including the Profit Amount calculated column.

As a data modeler, you can control if and how the column summarizes by setting
the Summarization property to Don't summarize or to a specific aggregation
function. When you set the Summarization property to Don't summarize, the
sigma symbol will no longer show next to the column in the Fields pane.

To observe how report authors can use implicit measures, you can first download
and open the Adventure Works DW 2020 M04.pbix file.

In the report, from the Sales table, add the Sales Amount field to the matrix visual
that groups fiscal year and month on its rows.

To determine how the column is summarized, in the visual fields pane, for
the Sales Amount field, select the arrow and then review the context menu options.

Notice that the Sum aggregation function has a check mark next to it. This check
mark indicates that the column is summarized by summing column values together.
It's also possible to change the aggregation function by selecting any of the other
options like average, minimum, and so on.

Next, add the Unit Price field to the matrix visual.

The default summarization is now set to Average (the modeler knows that it's inappropriate to sum unit price values together because they're rates, which are non-additive).

Implicit measures allow report authors to start with a default summarization technique and let them modify it to suit their visual requirements.

Numeric columns support the greatest range of aggregation functions:

 Sum
 Average
 Minimum
 Maximum
 Count (Distinct)
 Count
 Standard deviation
 Variance
 Median

Summarize non-numeric columns


Non-numeric columns can be summarized. However, the sigma symbol does not
show next to non-numeric columns in the Fields pane because they don't summarize
by default.

Text columns allow the following aggregations:

 First (alphabetically)
 Last (alphabetically)
 Count (Distinct)
 Count

Date columns allow the following aggregations:

 Earliest
 Latest
 Count (Distinct)
 Count

Boolean columns allow the following aggregations:

 Count (Distinct)
 Count

Benefits of implicit measures


Several benefits are associated with implicit measures. Implicit measures are simple to learn and use, and they provide flexibility in the way that report authors visualize model data. Additionally, they mean less work for you as a data modeler because you don't have to create explicit calculations.

Limitations of implicit measures


Implicit measures do have limitations. Even when you set an appropriate summarization method, report authors can choose to aggregate a column in unsuitable ways. For example, in the matrix visual, you could modify the aggregate function of Unit Price to Sum.

The report visual obeys your setup, but it has now produced
a Sum of Unit Price column, which presents misleading data.

The most significant limitation of implicit measures is that they only work for simple scenarios, meaning that they can only summarize column values by using a specific aggregation function. Therefore, in situations when you need to calculate the ratio of each month's sales amount over the yearly sales amount, you'll need to produce an explicit measure by writing a Data Analysis Expressions (DAX) formula to achieve that more sophisticated requirement.
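As a hedged sketch of such an explicit measure (it assumes a Revenue measure like the one created later in this module, and a Month column in the Date table; both names are assumptions), the monthly ratio could remove the month filter from the filter context while keeping the year filter:

DAX
-- Sketch: each month's revenue as a share of its year's revenue
Revenue % Year =
DIVIDE(
    [Revenue],
    CALCULATE(
        [Revenue],
        REMOVEFILTERS('Date'[Month])
    )
)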

Implicit measures don't work when the model is queried by using Multidimensional
Expressions (MDX). This language expects explicit measures and can't summarize
column data. It's used when a Power BI semantic model is queried by using Analyze
in Excel or when a Power BI paginated report uses a query that is generated by the
MDX graphical query designer.

Create simple measures


You can write a DAX formula to add a measure to any table in your model. A
measure formula must return a scalar or single value.

Note

In tabular modeling, no such concept as a calculated measure exists. The word calculated is used to describe calculated tables and calculated columns. It distinguishes them from tables and columns that originate from Power Query, which doesn't have the concept of an explicit measure.

Measures don't store values in the model. Instead, they're used at query time to
return summarizations of model data. Additionally, measures can't reference a table
or column directly; they must pass the table or column into a function to produce a
summarization.

A simple measure is one that aggregates the values of a single column; it does what
implicit measures do automatically.

In the next example, you will add a measure to the Sales table. In the Fields pane,
select the Sales table. To create a measure, in the Table Tools contextual ribbon,
from inside the Calculations group, select New measure.

In the formula bar, enter the following measure definition and then press Enter.

DAX
Revenue =
SUM(Sales[Sales Amount])

The measure definition adds the Revenue measure to the Sales table. It uses
the SUM DAX function to sum the values of the Sales Amount column.

On the Measure tools contextual ribbon, inside the Formatting group, set the
decimal places to 2.

Tip

Immediately after you create a measure, set the formatting options to ensure well-
presented and consistent values in all report visuals.

Now, add the Revenue measure to the matrix visual. Notice that it produces the
same result as the Sales Amount implicit measure.

In the matrix visual, remove Sales Amount and Sum of Unit Price.

Next, you will create more measures. Create the Cost measure by using the following
measure definition, and then set the format with two decimal places.

DAX
Cost =
SUM(Sales[Total Product Cost])

Create the Profit measure, and then set the format with two decimal places.

DAX
Profit =
SUM(Sales[Profit Amount])

Notice that the Profit Amount column is a calculated column. This topic will be
discussed later in this module.

Next, create the Quantity measure and format it as a whole number with the
thousands separator.

DAX
Quantity =
SUM(Sales[Order Quantity])

Create three unit price measures and then set the format of each with two decimal
places. Notice the different DAX aggregation functions that are used: MIN, MAX,
and AVERAGE.

DAX
Minimum Price =
MIN(Sales[Unit Price])
DAX
Maximum Price =
MAX(Sales[Unit Price])
DAX
Average Price =
AVERAGE(Sales[Unit Price])

Now, hide the Unit Price column, which results in report authors losing their ability
to summarize the column except by using your measures.

Tip

Adding measures and hiding columns is how you, the data modeler, can limit
summarization options.

Next, create the following two measures, which count the number of orders and
order lines. Format both measures with zero decimal places.

DAX
Order Line Count =
COUNT(Sales[SalesOrderLineKey])
DAX
Order Count =
DISTINCTCOUNT('Sales Order'[Sales Order])

The COUNT DAX function counts the number of non-BLANK values in a column, while
the DISTINCTCOUNT DAX function counts the number of distinct values in a column.
Because an order can have one or more order lines, the Sales Order column will have duplicate values. A distinct count of values in this column will correctly count the number of orders.

Alternatively, there's a better way to write the Order Line Count measure. Instead of counting values in a column, it's semantically clearer to use the COUNTROWS DAX function. Unlike the previously introduced aggregation functions, which aggregate column values, the COUNTROWS function counts the number of rows of a table.

Modify the Order Line Count measure formula you created above to the following definition:

DAX
Order Line Count =
COUNTROWS(Sales)

Add each of the measures to the matrix visual.

All measures that you've created are considered simple measures because they
aggregate a single column or single table.

Create compound measures


When a measure references one or more measures, it's known as a compound
measure.

For this example, you will modify the Profit measure by using the following measure
definition. Format the measure with two decimal places.

DAX
Profit =
[Revenue] - [Cost]

Next, add the Profit measure to the matrix visual.

Now that your model provides a way to summarize profit, you can delete
the Profit Amount calculated column.

By removing this calculated column, you've optimized the semantic model. Removing this column results in a decreased semantic model size and shorter data refresh times. The Profit Amount calculated column wasn't required because the Profit measure can directly produce the required result.

Create quick measures


Microsoft Power BI Desktop includes a feature named Quick Measures. This feature
helps you to quickly perform common, powerful calculations by generating the DAX
expression for you.

Many categories of calculations and ways to modify each calculation are available to
fit your needs. Moreover, you are able to see the DAX that's generated by the quick
measure and use it to jumpstart or expand your DAX knowledge.

In this next example, you'll create another compound measure to calculate profit
margin. However, this time, you'll create it as a quick measure.

In the Fields pane, select the Sales table. On the Table tools contextual ribbon, from
inside the Calculations group, select Quick measure.

In the Quick measures window, in the Calculation drop-down list, locate the Mathematical operations group (you might need to scroll down the list) and then select Division.

From the Fields list (in the Quick measures window), expand the Sales table and
then drag the Profit measure into the Numerator box. Then, drag
the Revenue measure into the Denominator box.

Select Add. In the Fields pane, notice the addition of the new compound measure. In
the formula bar, review the measure definition.

DAX
Profit divided by Revenue =
DIVIDE([Profit], [Revenue])
Note

After the quick measure has been created, you must apply any changes in the
formula bar.

Rename the measure as Profit Margin, and then set the format to a percentage with
two decimal places.

Add the Profit Margin measure to the matrix visual.

Compare calculated columns with measures

DAX beginners often experience a degree of confusion about calculated columns and measures. The following section reviews the similarities and differences between both.

Regarding similarities between calculated columns and measures, both are:

 Calculations that you can add to your semantic model.
 Defined by using a DAX formula.
 Referenced in DAX formulas by enclosing their names within square brackets.

The areas where calculated columns and measures differ include:

 Purpose - Calculated columns extend a table with a new column, while measures define how to summarize model data.
 Evaluation - Calculated columns are evaluated by using row context at data refresh time, while measures are evaluated by using filter context at query time. Filter context is introduced in a later module; it's an important topic to understand and master so that you can achieve complex summarizations.
 Storage - Calculated columns (in Import storage mode tables) store a value for each row in the table, but a measure never stores values in the model.

 Visual use - Calculated columns (like any column) can be used to filter, group,
or summarize (as an implicit measure), whereas measures are designed to
summarize.

1. Which statement about measures is correct? - Measures can reference other measures. A measure that references other measures is known as a compound measure.
2. Which DAX function can summarize a table? - The COUNTROWS function summarizes a table by returning the number of rows.
3. Which of the following statements describing the similarity of measures and calculated columns in an Import model is true? - They can achieve summarization of model data.

You can write a Data Analysis Expressions (DAX) formula to add a calculated table to
your model. The formula can duplicate or transform existing model data to produce
a new table.

Note

A calculated table can't connect to external data; you must use Power Query to
accomplish that task.

A calculated table formula must return a table object. The simplest formula can
duplicate an existing model table.

Calculated tables have a cost: They increase the model storage size and they can prolong the data refresh time. The reason is that calculated tables recalculate when they have formula dependencies on refreshed tables.

Duplicate a table
The following section describes a common design challenge that can be solved by
creating a calculated table. First, you should download and open
the Adventure Works DW 2020 M03.pbix file and then switch to the model
diagram.

In the model diagram, notice that the Sales table has three relationships to
the Date table.

The model diagram shows three relationships because the Sales table stores sales
data by order date, ship date, and due date. If you examine
the OrderDateKey, ShipDateKey, and DueDateKey columns, notice that one
relationship is represented by a solid line, which is the active relationship. The other
relationships, which are represented by dashed lines, are inactive relationships.

Note

Only one active relationship can exist between any two model tables.

In the diagram, hover the cursor over the active relationship to highlight the related
columns, which is how you would interact with the model diagram to learn about
related columns. In this case, the active relationship filters the OrderDateKey column
in the Sales table. Thus, filters that are applied to the Date table will propagate to
the Sales table to filter by order date; they'll never filter by ship date or due date.

The next step is to delete the two inactive relationships between the Date table and
the Sales table. To delete a relationship, right-click it and then select Delete in the
context menu. Make sure that you delete both inactive relationships.

Next, add a new table to allow report users to filter sales by ship date. Switch to
Report view and then, on the Modeling ribbon tab, from inside
the Calculations group, select New table.

In the formula bar (located beneath the ribbon), enter the following calculated table
definition and then press Enter.

DAX
Ship Date = 'Date'

The calculated table definition duplicates the Date table data to produce a new table
named Ship Date. The Ship Date table has exactly the same columns and rows as
the Date table. When the Date table data refreshes, the Ship Date table recalculates,
so they'll always be in sync.

Switch to the model diagram, and then notice the addition of the Ship Date table.

Next, create a relationship between the DateKey column in the Ship Date table and
the ShipDateKey column in the Sales table. You can create the relationship by
dragging the DateKey column in the Ship Date table onto the ShipDateKey column
in the Sales table.

A calculated table only duplicates data; it doesn't duplicate any model properties or
objects like column visibility or hierarchies. You'll need to set them up for the new
table, if required.

Tip

It's possible to rename columns of a calculated table. In this example, it's a good idea
to rename columns so that they better describe their purpose. For example,
the Fiscal Year column in the Ship Date table can be renamed as Ship Fiscal Year.
Accordingly, when fields from the Ship Date table are used in visuals, their names
are automatically included in captions like the visual title or axis labels.

To complete the design of the Ship Date table, you can:

 Rename the following columns:
o Date as Ship Date
o Fiscal Year as Ship Fiscal Year
o Fiscal Quarter as Ship Fiscal Quarter
o Month as Ship Month
o Full Date as Ship Full Date
 Sort the Ship Full Date column by the Ship Date column.
 Sort the Ship Month column by the MonthKey column.
 Hide the MonthKey column.
 Create a hierarchy named Fiscal with the following levels:
o Ship Fiscal Year
o Ship Fiscal Quarter
o Ship Month
o Ship Full Date
 Mark the Ship Date table as a date table by using the Ship Date column.

Calculated tables are useful in scenarios where multiple relationships exist between two tables, as previously described. They can also be used to add a date table to your model. Date tables are required to apply special time filters known as time intelligence.

Create a date table


In the next example, a second calculated table will be created, this time by using
the CALENDARAUTO DAX function.

Create the Due Date calculated table by using the following definition.

DAX
Due Date = CALENDARAUTO(6)

The CALENDARAUTO DAX function takes a single optional argument, which is the last
month number of the year, and returns a single-column table. If you don't pass in a
month number, it's assumed to be 12 (for December). For example, at Adventure Works, their financial year ends on June 30 of each year, so the value 6 (for June) is passed in.

The function scans all date and date/time columns in your model to determine the
earliest and latest stored date values. It then produces a complete set of dates that
span all dates in your model, ensuring that full years of dates are loaded. For
example, if the earliest date that is stored in your model is October 15, 2021, then the
first date that is returned by the CALENDARAUTO function would be July 1, 2021. If the
latest date that is stored in the model is June 15, 2022, then the last date that is
returned by the CALENDARAUTO function would be June 30, 2022.

Effectively, the CALENDARAUTO function guarantees that the following requirements to mark a date table are met:

 The table must include a column of data type Date.
 The column must contain complete years.
 The column must not have missing dates.
Tip

You can also create a date table by using the CALENDAR DAX function and passing in
two date values, which represent the date range. The function generates one row for
each date within the range. You can pass in static date values or pass in expressions
that retrieve the earliest/latest dates from specific columns in your model.
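For example, here's a sketch of the dynamic approach (the Sales[Order Date] column is an illustrative assumption; you could equally pass static values such as DATE(2020, 7, 1) and DATE(2023, 6, 30)):

DAX
-- Sketch: date range derived from an assumed Sales[Order Date] column
Date =
CALENDAR(
    MIN(Sales[Order Date]),
    MAX(Sales[Order Date])
)

Note that, unlike CALENDARAUTO, this approach doesn't automatically round out to complete years, so you'd still need to ensure the range covers full years before marking the result as a date table.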

Next, switch to data view, and then in the Fields pane, select the Due Date table.
Now, review the column of dates. You might want to order them to see the earliest
date in the first row by selecting the arrow inside the Date column header and then
sorting in ascending order.

Note

Ordering or filtering columns doesn't change how the values are stored. These operations simply help you explore and understand the data.

Now that the Date column is selected, review the message in the status bar (located in the lower-left corner). It describes how many rows the table stores and how many distinct values are found in the selected column.

When the table rows and distinct values are the same, it means that the column
contains unique values. That factor is important for two reasons: It satisfies the
requirements to mark a date table, and it allows this column to be used in a model
relationship as the one-side.

The Due Date calculated table will recalculate each time a table that contains a date
column refreshes. In other words, when a row is loaded into the Sales table with an
order date of July 1, 2022, the Due Date table will automatically extend to include
dates through to the end of the next year: June 30, 2023.

The Due Date table requires additional columns to support the known filtering and
grouping requirements, specifically by year, quarter, and month.

Create calculated columns

You can write a DAX formula to add a calculated column to any table in your model.
A calculated column formula must return a scalar or single value.

Calculated columns in import models have a cost: They increase the model storage size and they can prolong the data refresh time. The reason is that calculated columns recalculate when they have formula dependencies on refreshed tables.

In data view, in the Fields pane, ensure that the Due Date table is selected. Before
you create a calculated column, first rename the Date column to Due Date.

Now, you can add a calculated column to the Due Date table. To create a calculated
column, in the Table tools contextual ribbon, from inside the Calculations group,
select New column.

In the formula bar, enter the following calculated column definition and then
press Enter.

DAX
Due Fiscal Year =
"FY"
    & YEAR('Due Date'[Due Date])
        + IF(
            MONTH('Due Date'[Due Date]) > 6,
            1
        )

The calculated column definition adds the Due Fiscal Year column to
the Due Date table. The following steps describe how Microsoft Power BI evaluates
the calculated column formula:

1. The addition operator (+) is evaluated before the text concatenation operator (&).
2. The YEAR DAX function returns the whole number value of the due date year.
3. The IF DAX function returns the value when the due date month number is 7-12 (July
to December); otherwise, it returns BLANK. (For example, because the Adventure
Works financial year is July-June, the last six months of the calendar year will use the
next calendar year as their financial year.)
4. The year value is added to the value that is returned by the IF function, which is the
value 1 or BLANK. If the value is BLANK, it's implicitly converted to zero (0) to allow
the addition to produce the fiscal year value.
5. The literal text value "FY" is concatenated with the fiscal year value, which is implicitly
converted to text.
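
The same steps can be expressed with an explicit variable, which some authors find easier to read. The following equivalent formulation is a sketch, not part of the module's solution; for a due date of August 15, 2022, both versions return FY2023:

DAX
Due Fiscal Year (Alternate) =
VAR FiscalYearOffset =
    IF(MONTH('Due Date'[Due Date]) > 6, 1, 0)
RETURN
    "FY" & (YEAR('Due Date'[Due Date]) + FiscalYearOffset)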

Add a second calculated column by using the following definition:

DAX
Due Fiscal Quarter =
'Due Date'[Due Fiscal Year] & " Q"
    & IF(
        MONTH('Due Date'[Due Date]) <= 3,
        3,
        IF(
            MONTH('Due Date'[Due Date]) <= 6,
            4,
            IF(
                MONTH('Due Date'[Due Date]) <= 9,
                1,
                2
            )
        )
    )

The calculated column definition adds the Due Fiscal Quarter column to
the Due Date table. The IF function returns the quarter number (Quarter 1 is July-
September), and the result is concatenated to the Due Fiscal Year column value and
the literal text Q.
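
The nested IF logic can also be written arithmetically. The following equivalent sketch, which isn't part of the module's solution, shifts the month number so that July maps to quarter 1, by using the MOD and QUOTIENT DAX functions:

DAX
Due Fiscal Quarter (Alternate) =
'Due Date'[Due Fiscal Year] & " Q"
    & (QUOTIENT(MOD(MONTH('Due Date'[Due Date]) + 5, 12), 3) + 1)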

Add a third calculated column by using the following definition:

DAX
Due Month =
FORMAT('Due Date'[Due Date], "yyyy mmm")

The calculated column definition adds the Due Month column to
the Due Date table. The FORMAT DAX function converts the Due Date column value
to text by using a format string. In this case, the format string produces a label that
describes the year and abbreviated month name.

Note

Many user-defined date/time formats exist. For more information, see Custom date
and time formats for the FORMAT function.

Add a fourth calculated column by using the following definition:

DAX
Due Full Date =
FORMAT('Due Date'[Due Date], "yyyy mmm, dd")

Add a fifth calculated column by using the following definition:

DAX
MonthKey =
(YEAR('Due Date'[Due Date]) * 100) + MONTH('Due Date'[Due Date])

The MonthKey calculated column multiplies the due date year by the value 100 and
then adds the month number of the due date. It produces a numeric value that can
be used to sort the Due Month text values in chronological order.
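
An equivalent way to compute the same key, offered here only as a sketch, uses the FORMAT and VALUE DAX functions to build the year-month number directly; for a due date in July 2022, both versions return 202207:

DAX
MonthKey (Alternate) =
VALUE(FORMAT('Due Date'[Due Date], "yyyymm"))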

Verify that the Due Date table has six columns. The first column was added when the
calculated table was created, and the other five columns were added as calculated
columns.

To complete the design of the Due Date table, you can:

 Sort the Due Full Date column by the Due Date column.
 Sort the Due Month column by the MonthKey column.
 Hide the MonthKey column.
 Create a hierarchy named Fiscal with the following levels:
o Due Fiscal Year
o Due Fiscal Quarter
o Due Month
o Due Full Date
 Mark the Due Date table as a date table by using the Due Date column.

Learn about row context


Now that you've created calculated columns, you can learn how their formulas are
evaluated.

The formula for a calculated column is evaluated for each table row. Furthermore, it's
evaluated within row context, which means the current row. Consider
the Due Fiscal Year calculated column definition:

DAX
Due Fiscal Year =
"FY"
    & YEAR('Due Date'[Due Date])
        + IF(
            MONTH('Due Date'[Due Date]) > 6,
            1
        )

When the formula is evaluated for each row, the 'Due Date'[Due Date] column
reference returns the column value for that row. You might be familiar with this
concept from working with formulas in Excel tables.

However, row context doesn't extend beyond the table. If your formula needs to
reference columns in other tables, you have two options:

 If the tables are related, directly or indirectly, you can use
the RELATED or RELATEDTABLE DAX function. The RELATED function retrieves the
value at the one-side of the relationship, while the RELATEDTABLE function retrieves
rows from the many-side and returns a table object.
 When the tables aren't related, you can use the LOOKUPVALUE DAX function.

Generally, try to use the RELATED function whenever possible. It will usually perform
better than the LOOKUPVALUE function due to the ways that relationship and column
data is stored and indexed.
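
To illustrate the difference, the following sketch retrieves the same list price by using LOOKUPVALUE instead of RELATED. The key column names are assumptions based on a typical ProductKey relationship; prefer the RELATED version in practice:

DAX
Product List Price (Lookup) =
LOOKUPVALUE(
    'Product'[List Price],
    'Product'[ProductKey], Sales[ProductKey]
)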

Now, add the following calculated column definition to the Sales table:

DAX
Discount Amount =
(
    Sales[Order Quantity]
        * RELATED('Product'[List Price])
) - Sales[Sales Amount]

The calculated column definition adds the Discount Amount column to


the Sales table. Power BI evaluates the calculated column formula for each row of
the Sales table. The values for the Order Quantity and Sales Amount columns are
retrieved within row context. However, because the List Price column belongs to
the Product table, the RELATED function is required to retrieve the list price value for
the sale product.

Row context is used when calculated column formulas are evaluated. It's also used
when a class of functions, known as iterator functions, are used. Iterator functions
provide you with flexibility to create sophisticated summarizations. Iterator functions
are described in a later module.

1. Which statement about calculated tables is true? - Calculated tables
increase the size of the semantic model.
2. Which statement about calculated columns is true? - Calculated
column formulas are evaluated by using row context.

You can use the DAX parent-child functions to naturalize the recursive (employee-
manager) relationship into columns.

Filter context describes the filters that are applied during the evaluation of a measure
or measure expression. Filters can be applied directly to columns, like a filter on
the Fiscal Year column in the Date table for the value FY2020. Additionally, filters
can be applied indirectly, which happens when model relationships propagate filters
to other tables. For example, the Sales table receives a filter through its relationship
with the Date table, filtering the Sales table rows to those with
an OrderDateKey column value in FY2020.

Note

Calculated tables and calculated columns aren't evaluated within filter context.
Calculated columns are evaluated in row context, though the formula can transition
the row context to filter context, if it needs to summarize model data. Context
transition is described in Unit 5.

At report design time, filters are applied in the Filters pane or to report visuals. The
slicer visual is an example of a visual whose only purpose is to filter the report page
(and other pages when it's configured as a synced slicer). Report visuals, which
perform grouping, also apply filters. They're implied filters; the difference is that the
filter result is visible in the visual. For example, a stacked column chart visual can filter
by fiscal year FY2020, group by month, and summarize sales amount. The fiscal year
filter isn't visible in the visual result, yet the grouping, which results in a column for
each month, behaves as a filter.

Not all filters are applied at report design time. Filters can be added when a report
user interacts with the report. They can modify filter settings in the Filters pane, and
they can cross-filter or cross-highlight visuals by selecting visual elements like
columns, bars, or pie chart segments. These interactions apply additional filters to
report page visuals (unless interactions have been disabled).

It's important to understand how filter context works. It guides you in defining the
correct formula for your calculations. As you write more complex formulas, you'll
identify times when you need to add, modify, or remove filters to achieve the desired
result.

Consider an example that requires your formula to modify the filter context. Your
objective is to produce a report visual that shows each sales region together with its
revenue and revenue as a percentage of total revenue.

The Revenue % Total Region result is achieved by defining a measure expression
that's the ratio of revenue divided by revenue for all regions. Therefore, for Australia,
the ratio is 10,655,335.96 dollars divided by 109,809,274.20 dollars, which is 9.7
percent.

The numerator expression doesn't need to modify filter context; it should use the
current filter context (a visual that groups by region applies a filter for that region).
The denominator expression, however, needs to remove any region filters to achieve
the result for all regions.

Tip

The key to writing complex measures is mastering these concepts:

 Understanding how filter context works.
 Understanding when and how to modify or remove filters to achieve a required
result.
 Composing a formula to accurately and efficiently modify filter context.

Mastering these concepts takes practice and time. Rarely will students understand
the concepts from the beginning of training. Therefore, be patient and persevere
with the theory and activities. We recommend that you repeat this module at a later
time to help reinforce key lessons.

The next unit introduces the CALCULATE DAX function. It's one of the most powerful
DAX functions, allowing you to modify filter context when your formulas are
evaluated.

Modify filter context


You can use the CALCULATE DAX function to modify filter context in your formulas.
The syntax for the CALCULATE function is as follows:

DAX
CALCULATE(<expression>[, <filter1>[, <filter2>[, …]]])

The function requires passing in an expression that returns a scalar value and as
many filters as you need. The expression can be a measure (which is a named
expression) or any expression that can be evaluated in filter context.

Filters can be Boolean expressions or table expressions. It's also possible to pass in
filter modification functions that provide additional control when you're modifying
filter context.

When you have multiple filters, they're evaluated by using the AND logical operator,
which means that all conditions must be TRUE at the same time.
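
For example, the following sketch, which uses columns that are introduced elsewhere in this module, combines two Boolean filters. A row contributes to the result only when the product is red and the region is Australia:

DAX
Revenue Red Australia =
CALCULATE(
    [Revenue],
    'Product'[Color] = "Red",
    'Sales Territory'[Region] = "Australia"
)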

Note

The CALCULATETABLE DAX function provides exactly the same functionality as
the CALCULATE function, except that it modifies the filter context that's applied to an
expression that returns a table object. In this module, the explanations and examples
use the CALCULATE function, but keep in mind that these scenarios could also apply to
the CALCULATETABLE function.

Apply Boolean expression filters


A Boolean expression filter is an expression that evaluates to TRUE or FALSE. Boolean
filters must abide by the following rules:

 They can reference only a single column.
 They cannot reference measures.
 They cannot use functions that scan or return a table, including aggregation
functions like SUM.

In this example, you will create a measure. First, download and open
the Adventure Works DW 2020 M06.pbix file. Then, add the following measure to
the Sales table, which filters the Revenue measure by using a Boolean expression
filter for red products.

DAX
Revenue Red = CALCULATE([Revenue], 'Product'[Color] = "Red")

Add the Revenue Red measure to the table visual that is found on Page 1 of the
report.

In this next example, the following measure filters the Revenue measure by multiple
colors. Notice the use of the IN operator followed by a list of color values.

DAX
Revenue Red or Blue = CALCULATE([Revenue], 'Product'[Color] IN {"Red", "Blue"})

The following measure filters the Revenue measure by expensive products.
Expensive products are those with a list price greater than USD 1000.

DAX
Revenue Expensive Products = CALCULATE([Revenue], 'Product'[List Price] > 1000)

Apply table expression filters


A table expression filter applies a table object as a filter. It could be a reference to a
model table; however, it's likely a DAX function that returns a table object.

Commonly, you'll use the FILTER DAX function to apply complex filter conditions,
including those that can't be defined by a Boolean filter expression.
The FILTER function is classed as an iterator function, and so you would pass in a
table, or table expression, and an expression to evaluate for each row of that table.

The FILTER function returns a table object with exactly the same structure as the
table that was passed in. Its rows are a subset of the rows that were passed in,
meaning the rows where the expression evaluated as TRUE.

The following example shows a table filter expression that uses the FILTER function:

DAX
Revenue High Margin Products =
CALCULATE(
    [Revenue],
    FILTER(
        'Product',
        'Product'[List Price] > 'Product'[Standard Cost] * 2
    )
)

In this example, the FILTER function filters all rows of the Product table that are in
filter context. Each product whose list price exceeds double its standard cost is
returned as a row of the filtered table. Therefore, the Revenue measure is
evaluated for all products that are returned by the FILTER function.

All filter expressions that are passed in to the CALCULATE function are table filter
expressions. A Boolean filter expression is a shorthand notation to improve the
writing and reading experience. Internally, Microsoft Power BI translates Boolean
filter expressions to table filter expressions, which is how it translates
your Revenue Red measure definition.

DAX
Revenue Red =
CALCULATE(
    [Revenue],
    FILTER(
        'Product',
        'Product'[Color] = "Red"
    )
)

Filter behavior
Two possible standard outcomes occur when you add filter expressions to
the CALCULATE function:

 If the columns (or tables) aren't in filter context, then new filters will be added
to the filter context to evaluate the CALCULATE expression.
 If the columns (or tables) are already in filter context, the existing filters will be
overwritten by the new filters to evaluate the CALCULATE expression.

The following examples show how adding filter expressions to the CALCULATE function
works.

Note

In each of the examples, no filters are applied to the table visual.

As in the previous activity, the Revenue Red measure was added to a table visual
that groups by region and displays revenue.

Because no filter is applied on the Color column in the Product table, the evaluation
of the measure adds a new filter to filter context. In the first row, the value of
$2,681,324.79 is for red products that were sold in the Australian region.

Switching the first column of the table visual from Region to Color will produce a
different result because the Color column in the Product table is now in filter
context.

The Revenue Red measure formula evaluates the Revenue measure by adding a
filter on the Color column (to red) in the Product table. Consequently, in this visual
that groups by color, the measure formula overwrites the filter context with a new
filter.

This result might or might not be what you want. The next unit introduces
the KEEPFILTERS DAX function, which is a filter modification function that you can
use to preserve filters rather than overwrite them.

Use filter modifier functions

When using the CALCULATE function, you can pass in filter modification functions,
which allow you to accomplish more than adding filters alone.

Remove filters
Use the REMOVEFILTERS DAX function as a CALCULATE filter expression to remove filters
from filter context. It can remove filters from one or more columns or from all
columns of a single table.

Note

The REMOVEFILTERS function is relatively new. In previous versions of DAX, you
removed filters by using the ALL DAX function or its variants, including
the ALLEXCEPT and ALLNOBLANKROW DAX functions. These functions behave as
both filter modifiers and as functions that return table objects of distinct values.
They're mentioned here because you're likely to find documentation and
formula examples that remove filters by using them.
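
As a sketch of the older style only, a measure that removes all filters from the Sales Territory table could be written with the ALL function like this; the REMOVEFILTERS version used in the rest of this module is preferred:

DAX
Revenue Total Region (ALL) =
CALCULATE([Revenue], ALL('Sales Territory'))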

In the following example, you will add a new measure to the Sales table that
evaluates the Revenue measure but does so by removing filters from
the Sales Territory table. Format the measure as currency with two decimal places.

DAX
Revenue Total Region = CALCULATE([Revenue], REMOVEFILTERS('Sales Territory'))

Now, add the Revenue Total Region measure to the matrix visual that is found
on Page 2 of the report. The matrix visual will group by three columns from the
Sales Territory table on the rows: Group, Country, and Region.

Notice that each Revenue Total Region value is the same. It's the value of total
revenue.

While this result on its own isn't useful, when it's used as a denominator in a ratio, it
calculates a percent of grand total. Therefore, you will now overwrite
the Revenue Total Region measure definition with the following definition. (This
new definition changes the measure name and declares two variables. Be sure to
format the measure as a percentage with two decimal places.)

DAX
Revenue % Total Region =
VAR CurrentRegionRevenue = [Revenue]
VAR TotalRegionRevenue =
    CALCULATE(
        [Revenue],
        REMOVEFILTERS('Sales Territory')
    )
RETURN
    DIVIDE(
        CurrentRegionRevenue,
        TotalRegionRevenue
    )

Verify that the matrix visual now displays the Revenue % Total Region values.

You'll now create another measure, but this time, you will calculate the ratio of
revenue for a region divided by its country's or region's revenue.

Before you complete this task, notice that the Revenue % Total Region value for the
Southwest region is 22.95 percent. Investigate the filter context for this cell. Switch to
data view and then, in the Fields pane, select the Sales Territory table.

Apply the following column filters:

 Group - North America
 Country - United States
 Region - Southwest

Notice that the filters reduce the table to only one row. Now, while thinking about
your new objective to create a ratio of the region revenue over its country's revenue,
clear the filter from the Region column.

Notice that five rows now exist, each row belonging to the country United States.
Accordingly, when you clear the Region column filters, while preserving filters on
the Country and Group columns, you will have a new filter context that's for the
region's country.

In the following measure definition, notice how you can clear or remove a filter from
a column. In DAX logic, it's a small and subtle change that's made to
the Revenue % Total Region measure formula: The REMOVEFILTERS function now
removes filters from the Region column instead of all columns of
the Sales Territory table.

DAX
Revenue % Total Country =
VAR CurrentRegionRevenue = [Revenue]
VAR TotalCountryRevenue =
    CALCULATE(
        [Revenue],
        REMOVEFILTERS('Sales Territory'[Region])
    )
RETURN
    DIVIDE(
        CurrentRegionRevenue,
        TotalCountryRevenue
    )

Add the Revenue % Total Country measure to the Sales table and then format it as
a percentage with two decimal places. Add the new measure to the matrix visual.

Notice that all values, except those values for United States regions, are 100 percent.
The reason is that, at the Adventure Works company, the United States has
regions, while all other countries/regions do not.

Note

Tabular models don't support ragged hierarchies, which are hierarchies with variable
depths. Therefore, it's a common design approach to repeat parent (or other
ancestor) values at lower levels of the hierarchy. For example, Australia doesn't have
a region, so the country/region value is repeated as the region name. It's always
better to store a meaningful value instead of BLANK.

The next example is the last measure that you will create. Add
the Revenue % Total Group measure, and then format it as a percentage with two
decimal places. Then, add the new measure to the matrix visual.

DAX
Revenue % Total Group =
VAR CurrentRegionRevenue = [Revenue]
VAR TotalGroupRevenue =
    CALCULATE(
        [Revenue],
        REMOVEFILTERS(
            'Sales Territory'[Region],
            'Sales Territory'[Country]
        )
    )
RETURN
    DIVIDE(
        CurrentRegionRevenue,
        TotalGroupRevenue
    )

When you remove filters from the Region and Country columns in
the Sales Territory table, the measure will calculate the region revenue as a ratio of
its group's revenue.

Preserve filters
You can use the KEEPFILTERS DAX function as a filter expression in
the CALCULATE function to preserve filters.

To observe how to accomplish this task, switch to Page 1 of the report. Then, modify
the Revenue Red measure definition to use the KEEPFILTERS function.

DAX
Revenue Red =
CALCULATE(
    [Revenue],
    KEEPFILTERS('Product'[Color] = "Red")
)

In the table visual, notice that only one Revenue Red value exists. The reason is
that the Boolean filter expression preserves existing filters on the Color column
in the Product table, so the filter context and the filter expression are combined.
Colors other than red are BLANK because, for those rows, the visual's color filter
intersects with the red filter, and both can't be TRUE at the same time, so the
expression is filtered by no product rows. Only for the red row can both filters
be TRUE at the same time, which explains why the one Revenue Red value is
shown.

Use inactive relationships


An inactive model relationship can only propagate filters when
the USERELATIONSHIP DAX function is passed as a filter expression to
the CALCULATE function. When you use this function to engage an inactive
relationship, the active relationship will automatically become inactive.

Review an example of a measure definition that uses an inactive relationship to
calculate the Revenue measure by shipped dates:

DAX
Revenue Shipped =
CALCULATE(
    [Revenue],
    USERELATIONSHIP('Date'[DateKey], Sales[ShipDateKey])
)

Modify relationship behavior


You can modify the model relationship behavior when an expression is evaluated by
passing the CROSSFILTER DAX function as a filter expression to the CALCULATE function.
It's an advanced capability.

The CROSSFILTER function can modify filter directions (from both to single or from
single to both) and even disable a relationship.
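
As a sketch, the following measure evaluates the Revenue measure with the relationship's cross-filter direction set to both for the duration of the evaluation. The key column names are assumptions based on a typical Sales-to-Product relationship:

DAX
Revenue Bidirectional Filter =
CALCULATE(
    [Revenue],
    CROSSFILTER(Sales[ProductKey], 'Product'[ProductKey], BOTH)
)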

Examine filter context

The VALUES DAX function lets your formulas determine what values are in filter
context.

The VALUES function syntax is as follows:

DAX
VALUES(<TableNameOrColumnName>)

The function requires passing in a table reference or a column reference. When you
pass in a table reference, it returns a table object with the same columns that contain
rows for what's in filter context. When you pass in a column reference, it returns a
single-column table of unique values that are in filter context.

The function always returns a table object and it's possible for a table to contain
multiple rows. Therefore, to test whether a specific value is in filter context, your
formula must first test that the VALUES function returns a single row. Two functions
can help you accomplish this task: the HASONEVALUE and the SELECTEDVALUE DAX
functions.

The HASONEVALUE function returns TRUE when a given column reference has been
filtered down to a single value.

The SELECTEDVALUE function simplifies the task of determining what a single value
could be. When the function is passed a column reference, it'll return a single value,
or when more than one value is in filter context, it'll return BLANK (or an alternate
value that you pass to the function).
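
For example, the following sketch, which isn't part of the module's solution, returns the single country in filter context, or the alternate text when zero or many countries are in filter context:

DAX
Selected Country =
SELECTEDVALUE('Sales Territory'[Country], "Multiple countries")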

In the following example, you will use the HASONEVALUE function. Add the following
measure, which calculates sales commission, to the Sales table. Note that, at
Adventure Works, the commission rate is 10 percent of revenue for all
countries/regions except the United States. In the United States, salespeople earn 15
percent commission. Format the measure as currency with two decimal places, and
then add it to the table that is found on Page 3 of the report.

DAX
Sales Commission =
[Revenue]
    * IF(
        HASONEVALUE('Sales Territory'[Country]),
        IF(
            VALUES('Sales Territory'[Country]) = "United States",
            0.15,
            0.1
        )
    )

Notice that the total Sales Commission result is BLANK. The reason is that
multiple values are in filter context for the Country column in
the Sales Territory table. In this case, the HASONEVALUE function returns FALSE, which
results in the Revenue measure being multiplied by BLANK (a value multiplied by
BLANK is BLANK). To produce a total, you will need to use an iterator function, which
is explained later in this module.

Three other functions that you can use to test filter state are:

 ISFILTERED - Returns TRUE when a passed-in column reference is directly filtered.
 ISCROSSFILTERED - Returns TRUE when a passed-in column reference
is indirectly filtered. A column is cross-filtered when a filter that's applied to
another column in the same table, or in a related table, affects the referenced
column by filtering it.
 ISINSCOPE - Returns TRUE when a passed-in column reference is a level in a
hierarchy of levels.
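
The following sketch, which isn't part of the module's solution, shows how the first two functions could be combined into a diagnostic measure that reports the filter state of the Region column:

DAX
Region Filter State =
IF(
    ISFILTERED('Sales Territory'[Region]),
    "Directly filtered",
    IF(
        ISCROSSFILTERED('Sales Territory'[Region]),
        "Cross-filtered",
        "Not filtered"
    )
)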

Return to Page 2 of the report, and then modify
the Revenue % Total Country measure definition to test that the Region column in
the Sales Territory table is in scope. If it's not in scope, the measure result should be
BLANK.

DAX
Revenue % Total Country =
VAR CurrentRegionRevenue = [Revenue]
VAR TotalCountryRevenue =
    CALCULATE(
        [Revenue],
        REMOVEFILTERS('Sales Territory'[Region])
    )
RETURN
    IF(
        ISINSCOPE('Sales Territory'[Region]),
        DIVIDE(
            CurrentRegionRevenue,
            TotalCountryRevenue
        )
    )

In the matrix visual, notice that Revenue % Total Country values are now only
displayed when a region is in scope.

Perform context transition


What happens when a measure or measure expression is evaluated within row
context? This scenario can happen in a calculated column formula or when an
expression in an iterator function is evaluated.

In the following example, you will add a calculated column to the Customer table to
classify customers into a loyalty class. The scenario is simple: When the revenue that
is produced by the customer is less than $2500, the customer is classified as Low;
otherwise they're classified as High.

DAX
Customer Segment =
VAR CustomerRevenue = SUM(Sales[Sales Amount])
RETURN
IF(CustomerRevenue < 2500, "Low", "High")

On Page 4 of the report, add the Customer Segment column as the legend of the
pie chart.

Notice that only one Customer Segment value exists. The reason is that the
calculated column formula produces an incorrect result: Each customer is assigned
the value High because the expression SUM(Sales[Sales Amount]) isn't evaluated in
a filter context for that customer. Consequently, each customer is assessed on the
sum of every Sales Amount column value in the Sales table.

To force the evaluation of the SUM(Sales[Sales Amount]) expression for each customer,
a context transition must take place that applies the row context column values to
filter context. You can accomplish this transition by using the CALCULATE function
without passing in filter expressions.

Modify the calculated column definition so that it produces the correct result.

DAX
Customer Segment =
VAR CustomerRevenue = CALCULATE(SUM(Sales[Sales Amount]))
RETURN
IF(CustomerRevenue < 2500, "Low", "High")

In the pie chart visual, add the new calculated column to the Legend well, and verify
that two pie segments now display.

In this case, the CALCULATE function applies the row context values as filters, a
process known as context transition. To be precise, the process works slightly
differently when the table has a unique column: Power BI only needs to apply a
filter on that column to make the transition happen. In this case, Power BI
applies a filter on the CustomerKey column for the value in row context.

If you reference measures in an expression that's evaluated in row context, context
transition is automatic. Thus, you don't need to pass measure references to
the CALCULATE function.

Modify the calculated column definition, which references the Revenue measure,
and notice that it continues to produce the correct result.

DAX
Customer Segment =
VAR CustomerRevenue = [Revenue]
RETURN
IF(CustomerRevenue < 2500, "Low", "High")

Now, you can complete the Sales Commission measure formula. To produce a total,
you need to use an iterator function to iterate over all regions in filter context. The
iterator function expression must use the CALCULATE function to transition the row
context to the filter context. Notice that it no longer needs to test whether a
single Country column value in the Sales Territory table is in filter context because
it's known to be filtering by a single country/region (because it's iterating over the
regions in filter context and a region belongs to only one country/region).

Switch to Page 3 of the report, and then modify the Sales Commission measure
definition to use the SUMX iterator function:

DAX
Sales Commission =
SUMX(
    VALUES('Sales Territory'[Region]),
    CALCULATE(
        [Revenue]
            * IF(
                VALUES('Sales Territory'[Country]) = "United States",
                0.15,
                0.1
            )
    )
)

The table visual now displays a sales commission total for all regions.

1.Which type of model object is evaluated within a filter context? - Measures (or
measure expressions) are always evaluated in filter context.

2.Which one of the following DAX functions allows you to use an inactive
relationship when evaluating a measure expression? - The USERELATIONSHIP
function is a filter modifier function that can be passed in to the CALCULATE function.
Its purpose is to engage an inactive relationship.

3. Which one of the following statements about the CALCULATE function is
true? - The CALCULATE function modifies filter context by adding or removing filters
or by modifying standard filter behavior.

Introduction
Time intelligence relates to calculations over time. Specifically, it relates to
calculations over dates, months, quarters, or years, and possibly time. Rarely would
you need to calculate over time in the sense of hours, minutes, or seconds.

In Data Analysis Expressions (DAX) calculations, time intelligence means modifying
the filter context for date filters.

For example, at the Adventure Works company, their financial year begins on July 1
and ends on June 30 of the following year. They produce a table visual that displays
monthly revenue and year-to-date (YTD) revenue.

The filter context for 2017 August contains each of the 31 dates of August, which
are stored in the Date table. However, the calculated year-to-date revenue for 2017
August applies a different filter context. It's the first date of the year through to the
last date in filter context. In this example, that's July 1, 2017 through to August 31,
2017.

Time intelligence calculations modify date filter contexts. They can help you answer
these time-related questions:

 What's the accumulation of revenue for the year, quarter, or month?
 What revenue was produced for the same period last year?
 What growth in revenue has been achieved over the same period last year?
 How many new customers made their first order in each month?
 What's the inventory stock on-hand value for the company's products?

This module describes how to create time intelligence measures to answer these
questions.

Use DAX time intelligence functions

DAX includes several time intelligence functions to simplify the task of modifying
date filter context. You could write many of these intelligence formulas by using
a CALCULATE function that modifies date filters, but that would create more work.

Note

Many DAX time intelligence functions are concerned with standard date periods,
specifically years, quarters, and months. If you have irregular time periods (for
example, financial months that begin mid-way through the calendar month), or you
need to work with weeks or time periods (hours, minutes, and so on), the DAX time
intelligence functions won't be helpful. Instead, you'll need to use
the CALCULATE function and pass in hand-crafted date or time filters.

Date table requirement

To work with time intelligence DAX functions, you need to meet the prerequisite
model requirement of having at least one date table in your model. A date table is a
table that meets the following requirements:

 It must have a column of data type Date (or date/time), known as the date column.
 The date column must contain unique values.
 The date column must not contain BLANKs.
 The date column must not have any missing dates.
 The date column must span full years. A year isn't necessarily a calendar year (January-
December).
 The date table must be indicated as a date table.

For more information, see Create date tables in Power BI Desktop.

Summarizations over time

One group of DAX time intelligence functions is concerned with summarizations over
time:

 DATESYTD - Returns a single-column table that contains dates for the year-to-date
(YTD) in the current filter context. This group also includes
the DATESMTD and DATESQTD DAX functions for month-to-date (MTD) and
quarter-to-date (QTD). You can pass these functions as filters into
the CALCULATE DAX function.
 TOTALYTD - Evaluates an expression for YTD in the current filter context. The
equivalent QTD and MTD DAX functions of TOTALQTD and TOTALMTD are also
included.
 DATESBETWEEN - Returns a table that contains a column of dates that begins with a
given start date and continues until a given end date.
 DATESINPERIOD - Returns a table that contains a column of dates that begins with a
given start date and continues for the specified number of intervals.
Note

While the TOTALYTD function is simple to use, you are limited to passing in one filter
expression. If you need to apply multiple filter expressions, use the CALCULATE function
and then pass the DATESYTD function in as one of the filter expressions.
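As a sketch of the CALCULATE alternative that this note describes, the following hypothetical measure combines DATESYTD with an additional filter expression. The measure name and the Channel filter are illustrative only (the 'Sales Order'[Channel] column is used later in this module), and the "6-30" argument reflects the June 30 financial year end described earlier:

DAX
Revenue YTD Internet =
CALCULATE(
[Revenue],
DATESYTD('Date'[Date], "6-30"),
'Sales Order'[Channel] = "Internet"
)

Because CALCULATE accepts multiple filter expressions, you can add further filters alongside DATESYTD, which TOTALYTD doesn't allow.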

In the following example, you will create your first time intelligence calculation that
will use the TOTALYTD function. The syntax is as follows:

DAX
TOTALYTD(<expression>, <dates>[, <filter>][, <year_end_date>])

The function requires an expression and, as is common to all time intelligence functions, a reference to the date column of a marked date table. Optionally, a single filter expression or the year-end date can be passed in (required only when the year doesn't finish on December 31).

Download and open the Adventure Works DW 2020 M07.pbix file. Then, add the
following measure definition to the Sales table that calculates YTD revenue. Format
the measure as currency with two decimal places.

DAX
Revenue YTD =
TOTALYTD([Revenue], 'Date'[Date], "6-30")

The year-end date value of "6-30" represents June 30.

On Page 1 of the report, add the Revenue YTD measure to the matrix visual. Notice
that it produces a summarization of the revenue amounts from the beginning of the
year through to the filtered month.
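The MTD and QTD summarizations follow the same pattern. As a sketch, hypothetical companion measures could be defined as follows (note that TOTALMTD and TOTALQTD take no year-end date argument, because months and quarters aren't affected by where the financial year ends):

DAX
Revenue MTD =
TOTALMTD([Revenue], 'Date'[Date])

Revenue QTD =
TOTALQTD([Revenue], 'Date'[Date])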

Comparisons over time

Another group of DAX time intelligence functions is concerned with shifting time
periods:

 DATEADD - Returns a table that contains a column of dates, shifted either forward or
backward in time by the specified number of intervals from the dates in the current
filter context.
 PARALLELPERIOD - Returns a table that contains a column of dates that represents a
period that is parallel to the dates in the specified dates column, in the current filter
context, with the dates shifted a number of intervals either forward in time or back in
time.
 SAMEPERIODLASTYEAR - Returns a table that contains a column of dates that are
shifted one year back in time from the dates in the specified dates column, in the
current filter context.
 Many helper DAX functions for navigating backward or forward by specific time periods, all of which return a table of dates. These helper functions include NEXTDAY, NEXTMONTH, NEXTQUARTER, and NEXTYEAR, and PREVIOUSDAY, PREVIOUSMONTH, PREVIOUSQUARTER, and PREVIOUSYEAR.
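As an illustration of the first function in this group, a hypothetical prior-month measure could be sketched with DATEADD, which shifts every date in filter context back one month (the PREVIOUSMONTH helper function achieves a similar outcome when a whole month is filtered):

DAX
Revenue PM =
CALCULATE(
[Revenue],
DATEADD('Date'[Date], -1, MONTH)
)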

Now, you will add a measure to the Sales table that calculates revenue for the prior
year by using the SAMEPERIODLASTYEAR function. Format the measure as currency with
two decimal places.

DAX
Revenue PY =
VAR RevenuePriorYear = CALCULATE([Revenue], SAMEPERIODLASTYEAR('Date'[Date]))
RETURN
RevenuePriorYear

Add the Revenue PY measure to the matrix visual. Notice that it produces results
that are similar to the previous year's revenue amounts.

Next, you will modify the measure by renaming it to Revenue YoY % and then updating the RETURN clause to calculate the change ratio. Be sure to change the format to a percentage with two decimal places.

DAX
Revenue YoY % =
VAR RevenuePriorYear = CALCULATE([Revenue], SAMEPERIODLASTYEAR('Date'[Date]))
RETURN
DIVIDE(
[Revenue] - RevenuePriorYear,
RevenuePriorYear
)

Notice that the Revenue YoY % measure produces a change ratio over the previous year's monthly revenue. For example, July 2018 represents a 106.53 percent increase over the previous year's monthly revenue, and November 2018 represents a 24.22 percent decrease over the previous year's monthly revenue.

Note

The Revenue YoY % measure demonstrates a good use of DAX variables. The variable improves the readability of the formula and allows you to unit test part of the measure logic (by returning the RevenuePriorYear variable value). Additionally, the measure is an optimal formula because it doesn't need to retrieve the prior year's revenue value twice: having stored it once in a variable, the RETURN clause uses the variable value twice.

Additional time intelligence calculations

Other DAX time intelligence functions exist that are concerned with returning a
single date. You'll learn about these functions by applying them in two different
scenarios.

The FIRSTDATE and the LASTDATE DAX functions return the first and last date in the
current filter context for the specified column of dates.
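As a quick sketch, these functions can be used directly as measure expressions. The measure names are hypothetical; each function returns a single-date table that DAX treats as a scalar value:

DAX
First Date In Context = FIRSTDATE('Date'[Date])
Last Date In Context = LASTDATE('Date'[Date])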

Calculate new occurrences


Another use of time intelligence functions is to count new occurrences. The following
example shows how you can calculate the number of new customers for a time
period. A new customer is counted in the time period in which they made their first
purchase.

Your first task is to add the following measure to the Sales table that counts the
number of distinct customers life-to-date (LTD). Life-to-date means from the
beginning of time until the last date in filter context. Format the measure as a whole
number by using the thousands separator.

DAX
Customers LTD =
VAR CustomersLTD =
CALCULATE(
DISTINCTCOUNT(Sales[CustomerKey]),
DATESBETWEEN(
'Date'[Date],
BLANK(),
MAX('Date'[Date])
),
'Sales Order'[Channel] = "Internet"
)
RETURN
CustomersLTD

Add the Customers LTD measure to the matrix visual. Notice that it produces a
result of distinct customers LTD until the end of each month.

The DATESBETWEEN function returns a table that contains a column of dates that begins
with a given start date and continues until a given end date. When the start date is
BLANK, it will use the first date in the date column. (Conversely, when the end date is
BLANK, it will use the last date in the date column.) In this case, the end date is
determined by the MAX function, which returns the last date in filter context.
Therefore, if the month of August 2017 is in filter context, then the MAX function will
return August 31, 2017 and the DATESBETWEEN function will return all dates through to
August 31, 2017.

Next, you will modify the measure by renaming it to New Customers and by adding
a second variable to store the count of distinct customers before the time period in
filter context. The RETURN clause now subtracts this value from LTD customers to
produce a result, which is the number of new customers in the time period.

DAX
New Customers =
VAR CustomersLTD =
CALCULATE(
DISTINCTCOUNT(Sales[CustomerKey]),
DATESBETWEEN(
'Date'[Date],
BLANK(),
MAX('Date'[Date])
),
'Sales Order'[Channel] = "Internet"
)
VAR CustomersPrior =
CALCULATE(
DISTINCTCOUNT(Sales[CustomerKey]),
DATESBETWEEN(
'Date'[Date],
BLANK(),
MIN('Date'[Date]) - 1
),
'Sales Order'[Channel] = "Internet"
)
RETURN
CustomersLTD - CustomersPrior

For the CustomersPrior variable, notice that the DATESBETWEEN function includes dates
until the first date in filter context minus one. Because Microsoft Power BI internally
stores dates as numbers, you can add or subtract numbers to shift a date.

Snapshot calculations
Occasionally, fact data is stored as snapshots in time. Common examples include
inventory stock levels or account balances. A snapshot of values is loaded into the
table on a periodic basis.

When summarizing snapshot values (like inventory stock levels), you can summarize
values across any dimension except date. Adding stock level counts across product
categories produces a meaningful summary, but adding stock level counts across
dates does not. Adding yesterday's stock level to today's stock level isn't a useful
operation to perform (unless you want to average that result).

When you are summarizing snapshot tables, measure formulas can rely on DAX time
intelligence functions to enforce a single date filter.

In the following example, you will explore a scenario for the Adventure Works
company. Switch to model view and select the Inventory model diagram.

Notice that the diagram shows three tables: Product, Date, and Inventory.
The Inventory table stores snapshots of unit balances for each date and product.
Importantly, the table contains no missing dates and no duplicate entries for any
product on the same date. Also, the last snapshot record is stored for the date of
June 15, 2020.

Now, switch to report view and select Page 2 of the report. Add
the UnitsBalance column of the Inventory table to the matrix visual. Its default
summarization is set to sum values.

This visual configuration is an example of how not to summarize a snapshot value. Adding daily snapshot balances together doesn't produce a meaningful result. Therefore, remove the UnitsBalance field from the matrix visual.

Now, you'll add a measure to the Inventory table that sums the UnitsBalance value for a single date. The date will be the last date of each time period. It's achieved by using the LASTDATE function. Format the measure as a whole number with the thousands separator.

DAX
Stock on Hand =
CALCULATE(
SUM(Inventory[UnitsBalance]),
LASTDATE('Date'[Date])
)
Note

Notice that the measure formula uses the SUM function. An aggregate function must
be used (measures don't allow direct references to columns), but given that only one
row exists for each product for each date, the SUM function will only operate over a
single row.

Add the Stock on Hand measure to the matrix visual. The value for each product is
now based on the last recorded units balance for each month.

The measure returns BLANKs for June 2020 because no record exists for the last date
in June. According to the data, it hasn't happened yet.

Filtering by the last date in filter context has inherent problems: A recorded date
might not exist because it hasn't yet happened, or perhaps because stock balances
aren't recorded on weekends.

Your next step is to adjust the measure formula to determine the last date that has a
non-BLANK result and then filter by that date. You can achieve this task by using
the LASTNONBLANK DAX function.

Use the following measure definition to modify the Stock on Hand measure.

DAX
Stock on Hand =
CALCULATE(
SUM(Inventory[UnitsBalance]),
LASTNONBLANK(
'Date'[Date],
CALCULATE(SUM(Inventory[UnitsBalance]))
)
)

In the matrix visual, notice the values for June 2020 and the total (representing the
entire year).

The LASTNONBLANK function is an iterator function. It returns the last date that produces a non-BLANK result. It achieves this result by iterating through all dates in filter context in descending chronological order. (Conversely, the FIRSTNONBLANK function iterates in ascending chronological order.) For each date, it evaluates the passed-in expression. When it encounters a non-BLANK result, the function returns the date. That date is then used to filter the CALCULATE function.

Note

The LASTNONBLANK function evaluates its expression in row context. The CALCULATE function must be used to transition the row context to filter context to correctly evaluate the expression.
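The same pattern works in reverse. As a sketch, a hypothetical measure that returns the earliest recorded balance in the filter context could use FIRSTNONBLANK instead:

DAX
Opening Stock =
CALCULATE(
SUM(Inventory[UnitsBalance]),
FIRSTNONBLANK(
'Date'[Date],
CALCULATE(SUM(Inventory[UnitsBalance]))
)
)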

You should now hide the Inventory table UnitsBalance column. It will prevent
report authors from inappropriately summarizing snapshot unit balances.

1. In the context of semantic model calculations, which statement best describes time intelligence? - Time intelligence calculations modify date filter contexts.

2. You're developing a semantic model in Power BI Desktop. You've just added a date table by using the CALENDARAUTO function. You've extended it with calculated columns, and you've related it to other model tables. What else should you do to ensure that DAX time intelligence calculations work correctly? - You must mark the date table so that Power BI can correctly filter its dates.

3. You have a table that stores account balance snapshots for each date, excluding
weekends. You need to ensure that your measure formula only filters by a
single date. Also, if no record is on the last date of a time period, it should use
the latest account balance. Which DAX time intelligence function should you
use? - The LASTNONBLANK function will return the last date in the filter context
where a snapshot record exists. This option will help you achieve the objective.

Introduction to performance
optimization
Performance optimization, also known as performance tuning, involves making
changes to the current state of the semantic model so that it runs more efficiently.
Essentially, when your semantic model is optimized, it performs better.

You might find that your report runs well in test and development environments, but
when deployed to production for broader consumption, performance issues arise.
From a report user's perspective, poor performance is characterized by report pages
that take longer to load and visuals taking more time to update. This poor
performance results in a negative user experience.

As a data analyst, you will spend approximately 90 percent of your time working with
your data, and nine times out of ten, poor performance is a direct result of a bad
semantic model, bad Data Analysis Expressions (DAX), or the mix of the two. The
process of designing a semantic model for performance can be tedious, and it is
often underestimated. However, if you address performance issues during
development, you will have a robust Power BI semantic model that will return better
reporting performance and a more positive user experience. Ultimately, you will also
be able to maintain optimized performance. As your organization grows, the size of
its data grows, and its semantic model becomes more complex. By optimizing your
semantic model early, you can mitigate the negative impact that this growth might
have on the performance of your semantic model.

A smaller semantic model uses fewer resources (memory) and achieves faster data refresh, calculations, and rendering of visuals in reports. Therefore, the performance optimization process involves minimizing the size of the semantic model and making the most efficient use of the data in the model, which includes:

 Ensuring that the correct data types are used.
 Deleting unnecessary columns and rows.
 Avoiding repeated values.
 Replacing numeric columns with measures.
 Reducing cardinalities.
 Analyzing model metadata.
 Summarizing data where possible.
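To illustrate the technique of replacing numeric columns with measures: rather than importing a precomputed profit column, you could define a measure over columns that the model already stores. This is a sketch only; Sales[Total Product Cost] is a hypothetical column name used for illustration:

DAX
Profit =
SUMX(
Sales,
Sales[Sales Amount] - Sales[Total Product Cost] // hypothetical cost column
)

Because a measure is evaluated at query time, it adds no storage to the model, whereas an imported column would.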

In this module, you will be introduced to the steps, processes, and concepts that are necessary to optimize a semantic model for enterprise-level performance. However, keep in mind that, while the basic performance and best practices guidance in Power BI will lead you a long way, to optimize a semantic model for query performance you will likely have to partner with a data engineer to drive optimization in the underlying data sources.

For example, assume that you work as a Microsoft Power BI developer for Tailwind
Traders. You have been given a task to review a semantic model that was built a few
years ago by another developer, a person who has since left the organization.

The semantic model produces a report that has received negative feedback from
users. The users are happy with the results that they see in the report, but they are
not satisfied with the report performance. Loading the pages in the report is taking
too long, and tables are not refreshing quickly enough when certain selections are
made. In addition to this feedback, the IT team has highlighted that the file size of
this particular semantic model is too large, and it is putting a strain on the
organization's resources.

You need to review the semantic model to identify the root cause of the performance
issues and make changes to optimize performance.

By the end of this module, you'll be able to:

 Review the performance of measures, relationships, and visuals.
 Use variables to improve performance and troubleshooting.
 Improve performance by reducing cardinality levels.
 Optimize DirectQuery models with table level storage.
 Create and manage aggregations.

Review performance of measures, relationships, and visuals

If your semantic model has multiple tables, complex relationships, intricate calculations, multiple visuals, or redundant data, a potential exists for poor report performance. The poor performance of a report leads to a negative user experience.

To optimize performance, you must first identify where the problem is coming from;
in other words, find out which elements of your report and semantic model are
causing the performance issues. Afterward, you can take action to resolve those
issues and, therefore, improve performance.

Identify report performance bottlenecks


To achieve optimal performance in your reports, you need to create an efficient
semantic model that has fast running queries and measures. When you have a good
foundation, you can improve the model further by analyzing the query plans and
dependencies and then making changes to further optimize performance.

You should review the measures and queries in your semantic model to ensure that
you are using the most efficient way to get the results that you want. Your starting
point should be to identify bottlenecks that exist in the code. When you identify the
slowest query in the semantic model, you can focus on the biggest bottleneck first
and establish a priority list to work through the other issues.

Analyze performance

You can use Performance analyzer in Power BI Desktop to help you find out how
each of your report elements is performing when users interact with them. For
example, you can determine how long it takes for a particular visual to refresh when
it is initiated by a user interaction. Performance analyzer will help you identify the
elements that are contributing to your performance issues, which can be useful
during troubleshooting.

Before you run Performance analyzer, to ensure you get the most accurate results
in your analysis (test), make sure that you start with a clear visual cache and a clear
data engine cache.

 Visual cache - When you load a visual, you can't clear this visual cache without
closing Power BI Desktop and opening it again. To avoid any caching in play,
you need to start your analysis with a clean visual cache.

To ensure that you have a clear visual cache, add a blank page to your Power BI
Desktop (.pbix) file and then, with that page selected, save and close the file.
Reopen the Power BI Desktop (.pbix) file that you want to analyze. It will open
on the blank page.

 Data engine cache - When a query is run, the results are cached, so the results
of your analysis will be misleading. You need to clear the data cache before
rerunning the visual.

To clear the data cache, you can either restart Power BI Desktop or connect
DAX Studio to the semantic model and then call Clear Cache.

When you have cleared the caches and opened the Power BI Desktop file on the
blank page, go to the View tab and select the Performance analyzer option.

To begin the analysis process, select Start recording, select the page of the report
that you want to analyze, and interact with the elements of the report that you want
to measure. You will see the results of your interactions display in the Performance
analyzer pane as you work. When you are finished, select the Stop button.

For more detailed information, see Use Performance Analyzer to examine report
element performance.

Review results

You can review the results of your performance test in the Performance
analyzer pane. To review the tasks in order of duration, longest to shortest, right-
click the Sort icon next to the Duration (ms) column header, and then select Total
time in Descending order.

The log information for each visual shows how much time it took (duration) to
complete the following categories of tasks:

 DAX query - The time it took for the visual to send the query, along with the
time it took Analysis Services to return the results.
 Visual display - The time it took for the visual to render on the screen,
including the time required to retrieve web images or geocoding.
 Other - The time it took the visual to prepare queries, wait for other visuals to
complete, or perform other background processing tasks. If this category
displays a long duration, the only real way to reduce this duration is to optimize
DAX queries for other visuals, or reduce the number of visuals in the report.

The results of the analysis test help you to understand the behavior of your semantic
model and identify the elements that you need to optimize. You can compare the
duration of each element in the report and identify the elements that have a long
duration. You should focus on those elements and investigate why it takes them so
long to load on the report page.

To analyze your queries in more detail, you can use DAX Studio, which is a free, open-source third-party tool.

Resolve issues and optimize performance


The results of your analysis will identify areas for improvement and opportunities for
performance optimization. You might find that you need to carry out improvements
to the visuals, the DAX query, or other elements in your semantic model. The
following information provides guidance on what to look for and the changes that
you can make.

Visuals

If you identify visuals as the bottleneck leading to poor performance, you should find
a way to improve performance with minimal impact to user experience.

Consider the number of visuals on the report page; fewer visuals means better
performance. Ask yourself if a visual is really necessary and if it adds value to the end
user. If the answer is no, you should remove that visual. Rather than using multiple
visuals on the page, consider other ways to provide additional details, such as drill-
through pages and report page tooltips.

Examine the number of fields in each visual. The more visuals you have on the report, the higher the chance for performance issues. In addition, the more visuals, the more the report can appear crowded and lose clarity. The upper limit for visuals is 100 fields (measures or columns), so a visual with more than 100 fields will be slow to load. Ask yourself if you really need all of this data in a visual. You might find that you can reduce the number of fields that you currently use.

DAX query

When you examine the results in the Performance analyzer pane, you can see how
long it took the Power BI Desktop engine to evaluate each query (in milliseconds). A
good starting point is any DAX query that is taking longer than 120 milliseconds. In
this example, you identify one particular query that has a large duration time.

Performance analyzer highlights potential issues but does not tell you what needs
to be done to improve them. You might want to conduct further investigation into
why this measure takes so long to process. You can use DAX Studio to investigate
your queries in more detail.

For example, select Copy Query to copy the calculation formula onto the clipboard, then paste it into DAX Studio. You can then review the calculation step in more detail. In this example, you are trying to count the total number of products with order quantities greater than or equal to five.

DAX
Count Products =
CALCULATE (
DISTINCTCOUNT ( Order[ProductID] ),
FILTER ( Order, Order[OrderQty] >= 5 )
)

After analyzing the query, you can use your own knowledge and experience to
identify where the performance issues are. You can also try using different DAX
functions to see if they improve performance. In the following example, the FILTER function was replaced with the KEEPFILTERS function. When the test was run again in Performance analyzer, the duration was shorter as a result of the KEEPFILTERS function.

DAX
Count Products =
CALCULATE (
DISTINCTCOUNT ( Order[ProductID] ),
KEEPFILTERS ( Order[OrderQty] >= 5 )
)

In this case, you can replace the FILTER function with the KEEPFILTERS function to significantly reduce the evaluation duration time for this query. When you make this change, to check whether the duration time has improved or not, clear the data cache and then rerun the Performance analyzer process.

Semantic model

If the measures and visuals are displaying low duration values (in other words, they have a short duration time), they are not the reason for the performance issues. Instead, if the DAX query is displaying a high duration value, it is likely that a measure is written poorly or an issue has occurred with the semantic model. The issue might be caused by the relationships, columns, or metadata in your model, or it could be the status of the Auto date/time option, as explained in the following section.

Relationships

You should review the relationships between your tables to ensure that you have
established the correct relationships. Check that relationship cardinality properties
are correctly configured. For example, a one-side column that contains unique values
might be incorrectly configured as a many-side column. You will learn more about
how cardinality affects performance later in this module.

Columns

It is best practice to not import columns of data that you do not need. To avoid
deleting columns in Power Query Editor, you should try to deal with them at the
source when loading data into Power BI Desktop. However, if it is impossible to
remove redundant columns from the source query or the data has already been
imported in its raw state, you can always use Power Query Editor to examine each
column. Ask yourself if you really need each column and try to identify the benefit
that each one adds to your semantic model. If you find that a column adds no value,
you should remove it from your semantic model. For example, suppose that you
have an ID column with thousands of unique rows. You know that you won't use this
particular column in a relationship, so it will not be used in a report. Therefore, you
should consider this column as unnecessary and admit that it is wasting space in
your semantic model.

When you remove an unnecessary column, you will reduce the size of the semantic
model which, in turn, results in a smaller file size and faster refresh time. Also,
because the semantic model contains only relevant data, the overall report
performance will be improved.

For more information, see Data reduction techniques for Import modeling.

Metadata

Metadata is information about other data. Power BI metadata contains information on your semantic model, such as the name, data type and format of each of the columns, the schema of the database, the report design, when the file was last modified, the data refresh rates, and much more.

When you load data into Power BI Desktop, it is good practice to analyze the
corresponding metadata so you can identify any inconsistences with your semantic
model and normalize the data before you start to build reports. Running analysis on
your metadata will improve semantic model performance because, while analyzing
your metadata, you will identify unnecessary columns, errors within your data,
incorrect data types, the volume of data being loaded (large semantic models,
including transactional or historic data, will take longer to load), and much more.

You can use Power Query Editor in Power BI Desktop to examine the columns, rows,
and values of the raw data. You can then use the available tools, such as those
highlighted in the following screenshot, to make the necessary changes.

The Power Query options include:

 Unnecessary columns - Evaluates the need for each column. If one or more
columns will not be used in the report and are therefore unnecessary, you
should remove them by using the Remove Columns option on the Home tab.
 Unnecessary rows - Checks the first few rows in the semantic model to see if
they are empty or if they contain data that you do not need in your reports; if
so, it removes those rows by using the Remove Rows option on the Home tab.
 Data type - Evaluates the column data types to ensure that each one is correct.
If you identify a data type that is incorrect, change it by selecting the column,
selecting Data Type on the Transform tab, and then selecting the correct data
type from the list.
 Query names - Examines the query (table) names in the Queries pane. Just like
you did for column header names, you should change uncommon or unhelpful
query names to names that are more obvious or names that the user is more
familiar with. You can rename a query by right-clicking that query,
selecting Rename, editing the name as required, and then pressing Enter.
 Column details - Power Query Editor has the following three data preview
options that you can use to analyze the metadata that is associated with your
columns. You can find these options on the View tab, as illustrated in the
following screenshot.
o Column quality - Determines what percentage of items in the column are
valid, have errors, or are empty. If the Valid percentage is not 100, you
should investigate the reason, correct the errors, and populate empty values.
o Column distribution - Displays frequency and distribution of the values in
each of the columns. You will investigate this further later in this module.
o Column profile - Shows column statistics chart and a column distribution
chart.

Note

If you are reviewing a large semantic model with more than 1,000 rows, and you
want to analyze that whole semantic model, you need to change the default option
at the bottom of the window. Select Column profiling based on top 1000
rows > Column profiling based on entire data set.

Other metadata that you should consider is the information about the semantic
model as a whole, such as the file size and data refresh rates. You can find this
metadata in the associated Power BI Desktop (.pbix) file. The data that you load into Power BI Desktop is compressed and stored to disk by the VertiPaq storage engine. The size of your semantic model has a direct impact on its performance; a smaller semantic model uses fewer resources (memory) and achieves faster data refresh, calculations, and rendering of visuals in reports.

Auto date/time feature

Another item to consider when optimizing performance is the Auto date/time option in Power BI Desktop. By default, this feature is enabled globally, which means that Power BI Desktop automatically creates a hidden calculated table for each date column, provided that certain conditions are met. The new, hidden tables are in addition to the tables that you already have in your semantic model.

The Auto date/time option allows you to work with time intelligence when filtering, grouping, and drilling down through calendar time periods. We recommend that you keep the Auto date/time option enabled only when you work with calendar time periods and when you have simple model requirements in relation to time.

If your data source already defines a date dimension table, that table should be used
to consistently define time within your organization, and you should disable the
global Auto date/time option. Disabling this option can lower the size of your
semantic model and reduce the refresh time.

You can enable/disable this Auto date/time option globally so that it applies to all
of your Power BI Desktop files, or you can enable/disable the option for the current
file so that it applies to an individual file only.

To enable or disable the Auto date/time option, go to File > Options and settings > Options, and then select either the Global or Current File page. On either page, select Data Load and then, in the Time Intelligence section, select or clear the check box as required.

For an overview and general introduction to the Auto date/time feature, see Apply
auto date/time in Power BI Desktop.
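If you disable the Auto date/time option, a common alternative is to define your own date table in DAX. The following calculated table is a minimal sketch; the table name and the added columns are illustrative rather than part of this module:

DAX
Date =
ADDCOLUMNS (
    CALENDARAUTO (),
    "Year", YEAR ( [Date] ),
    "Month Number", MONTH ( [Date] ),
    "Month", FORMAT ( [Date], "MMM YYYY" )
)

After creating a table like this, mark it as a date table (Table tools > Mark as date table) so that time intelligence calculations filter it correctly.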

Use variables to improve performance and troubleshooting
You can use variables in your DAX formulas to help you write less complex and more
efficient calculations. Variables are underused by developers who are starting out in
Power BI Desktop, but they are effective and you should use them by default when
you are creating measures.

Some expressions involve the use of many nested functions and the reuse of
expression logic. These expressions take a longer time to process and are difficult to
read and, therefore, troubleshoot. If you use variables, you can save query processing
time. This change is a step in the right direction toward optimizing the performance
of a semantic model.

The use of variables in your semantic model provides the following advantages:

- Improved performance - Variables can make measures more efficient because they remove the need for Power BI to evaluate the same expression multiple times. You can achieve the same results in a query in about half the original processing time.
- Improved readability - Variables have short, self-describing names and are used in place of an ambiguous, multi-worded expression. You might find it easier to read and understand formulas when variables are used.
- Simplified debugging - You can use variables to debug a formula and test expressions, which can be helpful during troubleshooting.
- Reduced complexity - Variables remove the need for the EARLIER and EARLIEST DAX functions, which are difficult to understand. These functions were required before variables were introduced and were written in complex expressions that introduced new filter contexts. Now that you can use variables instead of those functions, you can write less complex formulas.
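To illustrate the last point, the following two calculated column sketches rank rows by price; they assume a hypothetical Product table with a ListPrice column, which is not part of this module. The first version uses EARLIER to refer back to the current row's price; the second captures that price in a variable instead:

DAX
Price Rank =
COUNTROWS (
    FILTER ( Product, Product[ListPrice] > EARLIER ( Product[ListPrice] ) )
) + 1

DAX
Price Rank =
VAR CurrentPrice = Product[ListPrice]
RETURN
    COUNTROWS (
        FILTER ( Product, Product[ListPrice] > CurrentPrice )
    ) + 1

Because CurrentPrice is evaluated in the row context of the calculated column before FILTER introduces a new row context, the inner expression no longer needs EARLIER.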

Use variables to improve performance
To illustrate how you can use a variable to make a measure more efficient, the following table displays a measure definition written in two different ways. Both versions calculate the "same period last year" expression: the first uses the normal DAX calculation method, repeating the expression, and the second uses a variable in the calculation.

The second row of the table shows the improved measure definition. This definition
uses the VAR keyword to introduce a variable named SalesPriorYear, and it uses an
expression to assign the "same period last year" result to that new variable. It then
uses the variable twice in the DIVIDE expression.

Without variable

DAX
Sales YoY Growth =
DIVIDE (
    ( [Sales] - CALCULATE ( [Sales], PARALLELPERIOD ( 'Date'[Date], -12, MONTH ) ) ),
    CALCULATE ( [Sales], PARALLELPERIOD ( 'Date'[Date], -12, MONTH ) )
)

With variable

DAX
Sales YoY Growth =
VAR SalesPriorYear =
    CALCULATE ( [Sales], PARALLELPERIOD ( 'Date'[Date], -12, MONTH ) )
VAR SalesVariance =
    DIVIDE ( ( [Sales] - SalesPriorYear ), SalesPriorYear )
RETURN
    SalesVariance

In the first measure definition in the table, the formula is inefficient because it
requires Power BI to evaluate the same expression twice. The second definition is
more efficient because, due to the variable, Power BI only needs to evaluate the
PARALLELPERIOD expression once.

If your semantic model has multiple queries with multiple measures, the use of
variables could cut the overall query processing time in half and improve the overall
performance of the semantic model. Furthermore, this solution is a simple one;
imagine the savings as the formulas get more complicated, for instance, when you
are dealing with percentages and running totals.

Use variables to improve readability
In addition to improved performance, you might notice how the use of variables
makes the code simpler to read.

When using variables, it is best practice to use descriptive names for the variables. In
the previous example, the variable is called SalesPriorYear, which clearly states what
the variable is calculating. Consider the outcome of using a variable that was
called X, temp or variable1; the purpose of the variable would not be clear at all.

Using clear, concise, meaningful names will help make it easier for you to understand
what you are trying to calculate, and it will be much simpler for other developers to
maintain the report in the future.

Use variables to troubleshoot multiple steps
You can use variables to help you debug a formula and identify what the issue is.
Variables help simplify the task of troubleshooting your DAX calculation by
evaluating each variable separately and by recalling them after the RETURN
expression.

In the following example, you test an expression that is assigned to a variable. To debug, you temporarily rewrite the RETURN expression to return that variable. The measure definition returns only the SalesPriorYear variable because that is what comes after the RETURN keyword.

DAX
Sales YoY Growth % =
VAR SalesPriorYear =
    CALCULATE ( [Sales], PARALLELPERIOD ( 'Date'[Date], -12, MONTH ) )
VAR SalesVariance =
    DIVIDE ( ( [Sales] - SalesPriorYear ), SalesPriorYear )
RETURN
    SalesPriorYear

The RETURN expression will display the SalesPriorYear value only. This technique allows you to revert the expression when you have completed the debugging. It also makes calculations simpler to understand because of the reduced complexity of the DAX code.

Reduce cardinality

Cardinality is a term that is used to describe the uniqueness of the values in a column. Cardinality is also used in the context of the relationships between two tables, where it describes the type of the relationship (how the values in one table's column map to the values in the other's).

Identify cardinality levels in columns
Previously, when you used Power Query Editor to analyze the metadata, the Column
distribution option on the View tab displayed statistics on how many distinct and
unique items were in each column in the data.

- Distinct values count - The total number of different values found in a given column.
- Unique values count - The total number of values that only appear once in a given column.

A column that has a lot of repeated values in its range (unique count is low) will have
a low level of cardinality. Conversely, a column that has a lot of unique values in its
range (unique count is high) will have a high level of cardinality.

Lower cardinality leads to more optimized performance, so you might need to reduce the number of high-cardinality columns in your semantic model.
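One quick way to gauge a column's cardinality from within the model is a simple measure; the table and column names here are illustrative, not part of this module:

DAX
Order Date Cardinality = DISTINCTCOUNT ( Sales[OrderDate] )

Comparing this result with COUNTROWS ( Sales ) shows how much repetition the column has; a distinct count close to the row count signals high cardinality.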

Reduce relationship cardinality
When you import multiple tables, it is possible that you'll do some analysis by using
data from all those tables. Relationships between those tables are necessary to
accurately calculate results and display the correct information in your reports. Power
BI Desktop helps make creating those relationships easier. In fact, in most cases, you won't have to do anything; the autodetect feature does it for you. However, you
might occasionally have to create relationships or need to make changes to a
relationship. Regardless, it's important to understand relationships in Power BI
Desktop and how to create and edit them.

When you create or edit a relationship, you can configure additional options. By
default, Power BI Desktop automatically configures additional options based on its
best guess, which can be different for each relationship based on the data in the
columns.

Relationships can have different cardinality. Cardinality describes how the values in the related columns map to each other, and each model relationship must be defined with a cardinality type. The cardinality options in Power BI are:

- Many-to-one (*:1) - This relationship is the most common, default type. It means that the column in one table can have more than one instance of a value, and the other related table, often known as the lookup table, has only one instance of a value.
- One-to-one (1:1) - In this relationship type, the column in one table has only one instance of a particular value, and the other related table has only one instance of a particular value.
- One-to-many (1:*) - In this relationship type, the column in one table has only one instance of a particular value, and the other related table can have more than one instance of a value.
- Many-to-many (*:*) - With composite models, you can establish a many-to-many relationship between tables, which removes requirements for unique values in tables. It also removes previous workarounds, such as introducing new tables only to establish relationships.

During development, you create and edit relationships in your model. When you build new relationships, regardless of the cardinality you have chosen, always ensure that both of the columns participating in the relationship share the same data type. Your model will never work if you try to build a relationship between two columns where one column has a text data type and the other has an integer data type.

In the following example, the ProductID field has the Whole number data type in both the Product and Sales tables. Columns with an integer data type perform better than columns with a text data type.

Improve performance by reducing cardinality levels
Power BI Desktop offers different techniques that you can use to help reduce the data that is loaded into semantic models, such as summarization. Reducing the data that is loaded into your model reduces the cardinality of its columns, which improves performance. For this reason, it is important that you strive to minimize the data that will be loaded into your models. This is especially true for large models, or models that you anticipate will grow large over time.

Perhaps the most effective technique to reduce a model size is to use a summary
table from the data source. Where a detail table might contain every transaction, a
summary table would contain one record per day, per week, or per month. It might
be an average of all of the transactions per day, for instance.

For example, a source sales fact table stores one row for each order line. Significant
data reduction could be achieved by summarizing all sales metrics if you group by
date, customer, and product, and individual transaction detail is not needed.

Consider that an even more significant data reduction could be achieved by grouping by date at the month level. It could achieve a possible 99 percent reduction in model size; however, reporting at the day level or the individual order level would no longer be possible. Deciding to summarize fact-type data always involves a tradeoff with the detail of your data. A disadvantage is that you might lose the ability to drill into data because the detail no longer exists. This tradeoff could be mitigated by using a mixed model design.

In Power BI Desktop, a Mixed mode design produces a composite model. Essentially, it allows you to determine a storage mode for each table. Therefore, each table can have its Storage Mode property set as Import or DirectQuery.

An effective technique to reduce the model size is to set the Storage Mode property
for larger fact-type tables to DirectQuery. This design approach can work well in
conjunction with techniques that are used to summarize your data. For example, the
summarized sales data could be used to achieve high performance "summary"
reporting. A drill-through page could be created to display granular sales for specific
(and narrow) filter context, displaying all in-context sales orders. The drill-through
page would include visuals based on a DirectQuery table to retrieve the sales order
data (sales order details).

For more information, see Data reduction techniques for Import modeling.

Optimize DirectQuery models with table-level storage

DirectQuery is one way to get data into Power BI Desktop. The DirectQuery method
involves connecting directly to data in its source repository from within Power BI
Desktop. It is an alternative to importing data into Power BI Desktop.

When you use the DirectQuery method, the overall user experience depends heavily
on the performance of the underlying data source. Slow query response times will
lead to a negative user experience and, in the worst-case scenarios, queries might
time out. Also, the number of users who are opening the reports at any one time will affect the load that is placed on the data source. For example, if your report has 20 visuals in it and 10 people are using the report, 200 or more queries will be sent to the data source because each visual issues one or more queries.

Unfortunately, the performance of your Power BI model will not only be impacted by
the performance of the underlying data source, but also by other uncontrollable
factors, such as:

- Network latency; faster networks return data quicker.
- The performance of the data source's server and how many other workloads are on that server. For example, consider the implications of a server refresh taking place while hundreds of people are using the same server for different reasons.

Therefore, using DirectQuery poses a risk to the quality of your model's performance.
To optimize performance in this situation, you need to have control over, or access
to, the source database.

For more detailed information, see DirectQuery model guidance in Power BI Desktop.

Implications of using DirectQuery



It is best practice to import data into Power BI Desktop, but your organization might
need to use the DirectQuery data connectivity mode because of one of the following
reasons (benefits of DirectQuery):

- It is suitable in cases where data changes frequently and near real-time reporting is required.
- It can handle large data without the need to pre-aggregate.
- It applies data sovereignty restrictions to comply with legal requirements.
- It can be used with a multidimensional data source that contains measures, such as SAP Business Warehouse (BW).

If your organization needs to use DirectQuery, you should clearly understand its
behavior within Power BI Desktop and be aware of its limitations. You will then be in
a good position to take action to optimize the DirectQuery model as much as
possible.

Behavior of DirectQuery connections

When you use DirectQuery to connect to data in Power BI Desktop, that connection
behaves in the following way:

- When you initially use the Get Data feature in Power BI Desktop, you will select the source. If you connect to a relational source, you can select a set of tables, and each one will define a query that logically returns a set of data. If you select a multidimensional source, such as SAP BW, you can only select the source.
- When you load the data, no data is imported into Power BI Desktop; only the schema is loaded. When you build a visual within Power BI Desktop, queries are sent to the underlying source to retrieve the necessary data. The time it takes to refresh the visual depends on the performance of the underlying data source.
- If changes are made to the underlying data, they won't be immediately reflected in the existing visuals in Power BI due to caching. You need to carry out a refresh to see those changes. The necessary queries are resent for each visual, and the visuals are updated accordingly.
- When you publish the report to the Power BI service, it will result in a semantic model in the Power BI service, the same as for import. However, no data is included with that semantic model.
- When you open an existing report in the Power BI service, or build a new one, the underlying source is again queried to retrieve the necessary data. Depending on the location of the original source, you might have to configure an on-premises data gateway.
- You can pin visuals, or entire report pages, as dashboard tiles. The tiles are automatically refreshed on a schedule, for example, every hour. You can control the frequency of this refresh to meet your requirements. When you open a dashboard, the tiles reflect the data at the time of the last refresh and might not include the latest changes that are made to the underlying data source. You can always refresh an open dashboard to ensure that it's up to date.

Limitations of DirectQuery connections

The use of DirectQuery can have negative implications. The limitations vary,
depending on the specific data source that is being used. You should take the
following points into consideration:

- Performance - As previously discussed, your overall user experience depends heavily on the performance of the underlying data source.
- Security - If you use multiple data sources in a DirectQuery model, it is important to understand how data moves between the underlying data sources and the associated security implications. You should also identify whether security rules are applicable to the data in your underlying source because, in Power BI, every user can see that data.
- Data transformation - Compared to imported data, data that is sourced from DirectQuery has limitations when it comes to applying data transformation techniques within Power Query Editor. For example, if you connect to an OLAP source, such as SAP BW, you can't make any transformations at all; the entire external model is taken from the data source. If you want to make any transformations to the data, you will need to do so in the underlying data source.
- Modeling - Some of the modeling capabilities that you have with imported data aren't available, or are limited, when you use DirectQuery.
- Reporting - Almost all the reporting capabilities that you have with imported data are also supported for DirectQuery models, provided that the underlying source offers a suitable level of performance. However, when the report is published in the Power BI service, the Quick Insights and Q&A features are not supported. Also, the use of the Explore feature in Excel will likely result in poorer performance.

For more detailed information on the limitations of using DirectQuery, see Implications of using DirectQuery.

Now that you have a brief understanding of how DirectQuery works and the
limitations that it poses, you can take action to improve the performance.

Optimize performance

Continuing with the Tailwind Traders scenario, during your review of the semantic
model, you discover that the query used DirectQuery to connect Power BI Desktop to
the source data. This use of DirectQuery is the reason why users are experiencing
poor report performance. It's taking too long to load the pages in the report, and
tables are not refreshing quickly enough when certain selections are made. You need
to take action to optimize the performance of the DirectQuery model.

You can examine the queries that are being sent to the underlying source and try to
identify the reason for the poor query performance. You can then make changes in
Power BI Desktop and the underlying data source to optimize overall performance.

Optimize data in Power BI Desktop

When you have optimized the data source as much as possible, you can take further
action within Power BI Desktop by using Performance analyzer, where you can
isolate queries to validate query plans.

You can analyze the duration of the queries that are being sent to the underlying
source to identify the queries that are taking a long time to load. In other words, you
can identify where the bottlenecks exist.

You don't need to use a special approach when optimizing a DirectQuery model; you
can apply the same optimization techniques that you used on the imported data to
tune the data from the DirectQuery source. For example, you can reduce the number
of visuals on the report page or reduce the number of fields that are used in a visual.
You can also remove unnecessary columns and rows.

For more detailed guidance on how to optimize a DirectQuery model, see DirectQuery model guidance in Power BI Desktop and Guidance for using DirectQuery successfully.

Optimize the underlying data source (connected database)

Your first stop is the data source. You need to tune the source database as much as
possible because anything you do to improve the performance of that source
database will in turn improve Power BI DirectQuery. The actions that you take in the
database will do the most good.

Consider the use of the following standard database practices that apply to most
situations:

- Avoid the use of complex calculated columns, because the calculation expression will be embedded into the source queries. It is more efficient to materialize the calculation in the source, which avoids pushing the expression down with every query. You could also consider adding surrogate key columns to dimension-type tables.
- Review the indexes and verify that the current indexing is correct. If you need to create new indexes, ensure that they are appropriate.

Refer to the guidance documents of your data source and implement their
performance recommendations.

Customize the Query reduction options

Power BI Desktop gives you the option to send fewer queries and to disable certain
interactions that will result in a poor experience if the resulting queries take a long
time to run. Applying these options prevents queries from continuously hitting the
data source, which should improve performance.

In this example, you edit the default settings to apply the available data reduction
options to your model. You access the settings by selecting File > Options and
settings > Options, scrolling down the page, and then selecting the Query
reduction option.

The following query reduction options are available:

- Reduce number of queries sent by - By default, every visual interacts with every other visual. Selecting this check box disables that default interaction. You can then optionally choose which visuals interact with each other by using the Edit interactions feature.
- Slicers - By default, the Instantly apply slicer changes option is selected. To force the report users to manually apply slicer changes, select the Add an apply button to each slicer to apply changes when you're ready option.
- Filters - By default, the Instantly apply basic filter changes option is selected. To force the report users to manually apply filter changes, select one of the alternative options:
  - Add an apply button to all basic filters to apply changes when you're ready
  - Add a single apply button to the filter pane to apply changes at once (preview)

Create and manage aggregations



When aggregating data, you summarize that data and present it at a higher grain (level). For example, you can summarize all sales data and group it by date, customer, product, and so on. The aggregation process reduces the table sizes in the semantic model, allowing you to focus on important data and helping to improve query performance.

Your organization might decide to use aggregations in their semantic models for the
following reasons:

- If you are dealing with a large amount of data (big data), aggregations will provide better query performance and help you analyze and reveal the insights of this large data. Aggregated data is cached and, therefore, uses a fraction of the resources that are required for detailed data.
- If you are experiencing a slow refresh, aggregations will help you speed up the refresh process. The smaller cache size reduces the refresh time, so data gets to users faster. Instead of refreshing what could be millions of rows, you would refresh a smaller amount of data.
- If you have a large semantic model, aggregations can help you reduce and maintain its size.
- If you anticipate your semantic model growing in size in the future, you can use aggregations as a proactive step toward future-proofing it by lessening the potential for performance, refresh, and overall query problems.

Continuing with the Tailwind Traders scenario, you have taken several steps to
optimize the performance of the semantic model, but the IT team has informed you
that the file size is still too large. The file size is currently 1 gigabyte (GB), so you
need to reduce it to around 50 megabytes (MB). During your performance review,
you identified that the previous developer did not use aggregations in the semantic
model, so you now want to create some aggregations for the sales data to reduce
the file size and further optimize the performance.

Create aggregations
Before you start creating aggregations, you should decide on the grain (level) on
which you want to create them. In this example, you want to aggregate the sales data
at the day level.

When you decide on the grain, the next step is to decide on how you want to create
the aggregations. You can create aggregations in different ways and each method
will yield the same results, for example:

- If you have access to the database, you could create a table with the aggregation and then import that table into Power BI Desktop.
- If you have access to the database, you could create a view for the aggregation and then import that view into Power BI Desktop.
- In Power BI Desktop, you can use Power Query Editor to create the aggregations step by step.

In this example, you open a query in Power Query Editor and notice that the data has not been aggregated; it has over 999 rows, as illustrated in the following screenshot.

You want to aggregate the data by the OrderDate column and view
the OrderQuantity and SalesAmount columns. Start by selecting Choose
Columns on the Home tab. On the window that displays, select the columns that
you want in the aggregation and then select OK.

When the selected columns display on the page, select the Group By option on
the Home tab. On the window that displays, select the column that you want to
group by (OrderDate) and enter a name for the new column (OnlineOrdersCount).

Select the Advanced option and then select the Add aggregation button to display
another column row. Enter a name for the aggregation column, select the operation
of the column, and then select the column to which you want to link the aggregation.
Repeat these steps until you have added all the aggregations and then select OK.
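As an alternative sketch to the Power Query Group By steps above, a similar aggregation can be expressed as a DAX calculated table; the names below follow the example's columns and are illustrative. Note that, unlike aggregating in the source or replacing the query, a calculated table keeps the detail rows in the model, so it does not reduce model size on its own:

DAX
Sales by Order Date =
SUMMARIZECOLUMNS (
    Sales[OrderDate],
    "OnlineOrdersCount", COUNTROWS ( Sales ),
    "Sum of OrderQuantity", SUM ( Sales[OrderQuantity] ),
    "Sum of SalesAmount", SUM ( Sales[SalesAmount] )
)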

It might take a few minutes for your aggregation to display, but when it does, you'll
see how the data has been transformed. The data will be aggregated into each date,
and you will be able to see the values for the orders count and the respective sum of
the sales amount and order quantity.

Select the Close and Apply button to close Power Query Editor and apply the
changes to your semantic model. Return to the Power BI Desktop page and then
select the Refresh button to see the results. Observe the screen because a brief
message will display the number of rows that your semantic model now has. This
number of rows should be significantly less than the number that you started with.
You can also see this number when you open Power Query Editor again, as illustrated
in the following screenshot. In this example, the number of rows was reduced to 30.

Remember, you started with over 999 rows. Using aggregation has significantly
reduced the number of rows in your semantic model, which means that Power BI has
less data to refresh and your model should perform better.

Manage aggregations
When you have created aggregations, you can manage those aggregations in Power
BI Desktop and make changes to their behavior, if required.

You can open the Manage Aggregations window from any view in Power BI
Desktop. In the Fields pane, right-click the table and then select Manage
aggregations.

For each aggregation column, you can select an option from the Summarization drop-down list and make changes to the selected detail table and column. When you are finished managing the aggregations, select Apply All.

For more detailed information on how to create and manage aggregations, see Use
aggregations in Power BI Desktop.

1. What benefit do you get from analyzing the metadata? - The benefit of analyzing the metadata is that you can clearly identify data inconsistencies within your semantic model.
2. What can be achieved by removing unnecessary rows and columns? - Deleting unnecessary rows and columns will reduce the semantic model size, and it's good practice to load only necessary data into your semantic model.
3. Which of the following statements about relationships in Power BI Desktop is true? - Relationships can be created between tables that contain different types of data.