Model Data With PowerBI
As with all iterator functions, you must pass in a table and an expression.
The table can be a model table reference or an expression that returns a table object.
The expression must evaluate to a scalar value.
DAX
Revenue = SUM(Sales[Sales Amount])
DAX
Revenue =
SUMX(
Sales,
Sales[Sales Amount]
)
It's important to understand how context works with iterator functions. Because
iterator functions enumerate over table rows, the expression is evaluated for each
row in row context, similar to calculated column formulas. The table is evaluated in
filter context. For example, if you're using the previous Revenue measure definition
and a report visual is filtered by fiscal year FY2020, the Sales table will contain only
the sales rows that were ordered in that year. Filter context is described in the filter
context module.
Important
When you're using iterator functions, make sure that you avoid using large tables (of
rows) with expressions that use expansive DAX functions. Some functions, like
the SEARCH DAX function, which scans a text value that looks for specific characters
or text, can result in slow performance. Also, the LOOKUPVALUE DAX function might
result in a slow, row-by-row retrieval of values. In this second case, use
the RELATED DAX function instead, whenever possible.
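For instance, an iterator expression like the following sketch forces a character scan
on every row and can be slow on large tables (the measure name and the Product
column are assumptions, not part of the sample model description):

DAX
Mountain Bike Orders =
SUMX(
    Sales,
    IF(
        -- SEARCH scans the related product name on every Sales row
        SEARCH("Mountain", RELATED('Product'[Product]), 1, 0) > 0,
        Sales[Order Quantity]
    )
)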
Data Modelling With DAX
Complex summarization
In this section, you will create your first measure that uses an iterator function. First,
download and open the Adventure Works DW 2020 M05.pbix file. Next, add the
following measure definition:
DAX
Revenue =
SUMX(
Sales,
Sales[Order Quantity] * Sales[Unit Price] * (1 - Sales[Unit Price Discount Pct])
)
Format the Revenue measure as currency with two decimal places, and then add it
to the table visual that's found on Page 1 of the report.
By using an iterator function, the Revenue measure formula aggregates more than
the values of a single column. For each row, it uses the row context values of three
columns to produce the revenue amount.
DAX
Discount =
SUMX(
Sales,
Sales[Order Quantity]
* (
RELATED('Product'[List Price]) - Sales[Unit Price]
)
)
Format the Discount measure as currency with two decimal places, and then add it
to the table visual.
Notice that the formula uses the RELATED function. Remember, row context doesn't
extend beyond the table. If your formula needs to reference columns in other tables,
and model relationships exist between the tables, use the RELATED function to
retrieve a value from the one-side of the relationship, or the RELATEDTABLE function
to retrieve rows from the many-side.
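As a sketch of the many-side case (assuming the existing relationship between the
Product and Sales tables), a calculated column on the Product table could count each
product's sales rows:

DAX
Sales Row Count =
-- RELATEDTABLE returns the Sales rows related to the current Product row
COUNTROWS(RELATEDTABLE(Sales))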
DAX
Revenue Avg =
AVERAGEX(
Sales,
Sales[Order Quantity] * Sales[Unit Price] * (1 - Sales[Unit Price Discount Pct])
)
Format the Revenue Avg measure as currency with two decimal places, and then
add it to the table visual.
Consider that an average is the sum of values divided by the count of values.
However, that definition raises a question: What does the count of values represent? In
this case, the count of values is the number of expressions that didn't evaluate to
BLANK. Also, because the iterator function enumerates the Sales table rows, average
would mean revenue per row. Taking this logic one step further, because each row in
the Sales table records a sales order line, it can be more precisely described
as revenue per order line.
The following example uses an iterator function to create a new measure that raises
the granularity to the sales order level (a sales order consists of one or more order
lines). Add the following measure:
DAX
Revenue Avg Order =
AVERAGEX(
VALUES('Sales Order'[Sales Order]),
[Revenue]
)
Format the Revenue Avg Order measure as currency with two decimal places, and
then add it to the table visual.
As expected, the average revenue for an order is always higher than the average
revenue for a single order line.
Notice that the formula uses the VALUES DAX function. This function lets your
formulas determine what values are in filter context. In this case,
the AVERAGEX function iterates over each sales order in filter context. In other words, it
iterates over each sales order for the month. Filter context and the VALUES function
are introduced in the filter context module.
Calculate ranks
The RANKX DAX function is a special iterator function you can use to calculate ranks.
Its syntax is as follows:
DAX
RANKX(<table>, <expression>[, <value>[, <order>[, <ties>]]])
Similar to all iterator functions, you must pass in a table and an expression.
Optionally, you can pass in a rank value, set the order direction, or determine how to
handle ranks when values are tied.
Order direction
When the order argument is 0 (zero) or FALSE, values are ranked in descending
order, so the highest value is ranked first. When you don't pass in an order
argument, the function will use 0 (zero) (for descending order).
Handle ties
You can handle ties by skipping rank values or using dense ranking, which uses the
next rank value after a tie. When you don't pass in a ties argument, the function will
use Skip. You'll have an opportunity to work with an example of each tie argument
later in this unit.
DAX
Product Quantity Rank =
RANKX(
ALL('Product'[Product]),
[Quantity]
)
Add the Product Quantity Rank measure to the table visual that is found
on Page 2 of the report. The table visual groups bike products and displays their
quantity, ordered by descending quantity.
The RANKX function iterates over a table that is returned by the ALL DAX function.
The ALL function is used to return all rows in a model table or values in one or more
columns, and it ignores all filters. Therefore, in this case, it returns a table that
consists of all Product column values in the Product table. The RANKX function must
use the ALL function because the table visual will group by product (which is a filter
on the Product table).
In the table visual, notice that two products tie for tenth place and that the next
product's rank is 12. This visual is an example of the default Skip ties argument.
Now, modify the measure definition to use the DENSE ties argument:
DAX
Product Quantity Rank =
RANKX(
ALL('Product'[Product]),
[Quantity],
,
,
DENSE
)
In the table visual, notice that a skipped ranking no longer exists. After the two
products that tie for tenth place, the next ranking is 11.
Notice that the table visual total for the Product Quantity Rank is one (1). The
reason is that the total for all products is ranked.
It's not appropriate to rank total products, so you will now use the following logic to
modify the measure definition to return BLANK, unless a single product is filtered:
DAX
Product Quantity Rank =
IF(
HASONEVALUE('Product'[Product]),
RANKX(
ALL('Product'[Product]),
[Quantity],
,
,
DENSE
)
)
Notice that the total Product Quantity Rank is now BLANK, which was achieved by
using the HASONEVALUE DAX function to test whether the Product column in
the Product table has a single value in filter context. That's the case for each product
group, but not for the total, which represents all products.
Filter context and the HASONEVALUE function will be introduced in the filter context
module.
In the Fields pane, a column that's shown with the sigma symbol (∑) indicates two
facts: the column contains numeric values, and report authors can summarize its
values (as an implicit measure).
In the following image, notice that the Sales table includes only fields that can be
summarized, including the Profit Amount calculated column.
As a data modeler, you can control if and how the column summarizes by setting
the Summarization property to Don't summarize or to a specific aggregation
function. When you set the Summarization property to Don't summarize, the
sigma symbol will no longer show next to the column in the Fields pane.
To observe how report authors can use implicit measures, you can first download
and open the Adventure Works DW 2020 M04.pbix file.
In the report, from the Sales table, add the Sales Amount field to the matrix visual
that groups fiscal year and month on its rows.
To determine how the column is summarized, in the visual fields pane, for
the Sales Amount field, select the arrow and then review the context menu options.
Notice that the Sum aggregation function has a check mark next to it. This check
mark indicates that the column is summarized by summing column values together.
It's also possible to change the aggregation function by selecting any of the other
options like average, minimum, and so on.
The default summarization is now set to Average (the modeler knows that it's
inappropriate to sum unit price values together because they're rates, which are
non-additive).
Implicit measures let report authors start with a default summarization technique
and then modify it to suit their visual requirements.
The available summarization options depend on the column's data type:
Numeric columns: Sum, Average, Minimum, Maximum, Count (Distinct), Count,
Standard deviation, Variance, Median
Text columns: First (alphabetically), Last (alphabetically), Count (Distinct), Count
Date columns: Earliest, Latest, Count (Distinct), Count
Boolean columns: Count (Distinct), Count
The report visual obeys your setup, but it has now produced
a Sum of Unit Price column, which presents misleading data.
The most significant limitation of implicit measures is that they only work for simple
scenarios, meaning that they can only apply a single aggregation function to a
column's values. Therefore, in situations when you need to calculate the ratio of
each month's sales amount over the yearly sales amount, you'll need to produce an
explicit measure by writing a Data Analysis Expressions (DAX) formula to achieve that
more sophisticated requirement.
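Such a requirement could be sketched along the following lines. This is only an
illustration, using the CALCULATE and ALL functions that are covered in later units,
and it assumes a Month column exists in the Date table:

DAX
Sales % Year =
DIVIDE(
    SUM(Sales[Sales Amount]),
    CALCULATE(
        SUM(Sales[Sales Amount]),
        -- remove the month filter so the denominator covers the whole year
        ALL('Date'[Month])
    )
)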
Implicit measures don't work when the model is queried by using Multidimensional
Expressions (MDX). This language expects explicit measures and can't summarize
column data. It's used when a Power BI semantic model is queried by using Analyze
in Excel or when a Power BI paginated report uses a query that is generated by the
MDX graphical query designer.
Note
Measures don't store values in the model. Instead, they're used at query time to
return summarizations of model data. Additionally, measures can't reference a table
or column directly; they must pass the table or column into a function to produce a
summarization.
A simple measure is one that aggregates the values of a single column; it does what
implicit measures do automatically.
In the next example, you will add a measure to the Sales table. In the Fields pane,
select the Sales table. To create a measure, in the Table Tools contextual ribbon,
from inside the Calculations group, select New measure.
In the formula bar, enter the following measure definition and then press Enter.
DAX
Revenue =
SUM(Sales[Sales Amount])
The measure definition adds the Revenue measure to the Sales table. It uses
the SUM DAX function to sum the values of the Sales Amount column.
On the Measure tools contextual ribbon, inside the Formatting group, set the
decimal places to 2.
Tip
Immediately after you create a measure, set the formatting options to ensure well-
presented and consistent values in all report visuals.
Now, add the Revenue measure to the matrix visual. Notice that it produces the
same result as the Sales Amount implicit measure.
In the matrix visual, remove Sales Amount and Sum of Unit Price.
Next, you will create more measures. Create the Cost measure by using the following
measure definition, and then set the format with two decimal places.
DAX
Cost =
SUM(Sales[Total Product Cost])
Create the Profit measure, and then set the format with two decimal places.
DAX
Profit =
SUM(Sales[Profit Amount])
Notice that the Profit Amount column is a calculated column. This topic will be
discussed later in this module.
Next, create the Quantity measure and format it as a whole number with the
thousands separator.
DAX
Quantity =
SUM(Sales[Order Quantity])
Create three unit price measures and then set the format of each with two decimal
places. Notice the different DAX aggregation functions that are used: MIN, MAX,
and AVERAGE.
DAX
Minimum Price =
MIN(Sales[Unit Price])
DAX
Maximum Price =
MAX(Sales[Unit Price])
DAX
Average Price =
AVERAGE(Sales[Unit Price])
Now, hide the Unit Price column, which results in report authors losing their ability
to summarize the column except by using your measures.
Tip
Adding measures and hiding columns is how you, the data modeler, can limit
summarization options.
Next, create the following two measures, which count the number of orders and
order lines. Format both measures with zero decimal places.
DAX
Order Line Count =
COUNT(Sales[SalesOrderLineKey])
DAX
Order Count =
DISTINCTCOUNT('Sales Order'[Sales Order])
The COUNT DAX function counts the number of non-BLANK values in a column, while
the DISTINCTCOUNT DAX function counts the number of distinct values in a column.
Because an order can have one or more order lines, the Sales Order column will
have duplicate values. A distinct count of values in this column will correctly count
the number of orders.
Alternatively, there's a better way to write the Order Line Count measure.
Instead of counting values in a column, it's semantically clearer to use
the COUNTROWS DAX function. Unlike the previously introduced aggregation
functions, which aggregate column values, the COUNTROWS function counts the number
of rows for a table.
Modify the Order Line Count measure formula that you created earlier to the
following definition:
DAX
Order Line Count =
COUNTROWS(Sales)
All measures that you've created are considered simple measures because they
aggregate a single column or single table.
For this example, you will modify the Profit measure by using the following measure
definition. Format the measure with two decimal places.
DAX
Profit =
[Revenue] - [Cost]
Now that your model provides a way to summarize profit, you can delete
the Profit Amount calculated column.
Many categories of calculations and ways to modify each calculation are available to
fit your needs. Moreover, you are able to see the DAX that's generated by the quick
measure and use it to jumpstart or expand your DAX knowledge.
In this next example, you'll create another compound measure to calculate profit
margin. However, this time, you'll create it as a quick measure.
In the Fields pane, select the Sales table. On the Table tools contextual ribbon, from
inside the Calculations group, select Quick measure.
From the Fields list (in the Quick measures window), expand the Sales table and
then drag the Profit measure into the Numerator box. Then, drag
the Revenue measure into the Denominator box.
Select Add. In the Fields pane, notice the addition of the new compound measure. In
the formula bar, review the measure definition.
DAX
Profit divided by Revenue =
DIVIDE([Profit], [Revenue])
Note
After the quick measure has been created, you must apply any changes in the
formula bar.
Rename the measure as Profit Margin, and then set the format to a percentage with
two decimal places.
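Notice that the quick measure uses the DIVIDE function rather than the division
operator (/). DIVIDE returns BLANK when the denominator is zero or BLANK, which
avoids errors, and it accepts an optional third argument as an alternate result. For
example:

DAX
Profit Margin =
-- the optional third argument substitutes 0 when [Revenue] is zero or BLANK
DIVIDE([Profit], [Revenue], 0)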
Visual use - Calculated columns (like any column) can be used to filter, group,
or summarize (as an implicit measure), whereas measures are designed to
summarize.
You can write a Data Analysis Expressions (DAX) formula to add a calculated table to
your model. The formula can duplicate or transform existing model data to produce
a new table.
Note
A calculated table can't connect to external data; you must use Power Query to
accomplish that task.
A calculated table formula must return a table object. The simplest formula can
duplicate an existing model table.
Calculated tables have a cost: They increase the model storage size, and they can
prolong the data refresh time. The reason is that calculated tables recalculate when
they have formula dependencies on refreshed tables.
Duplicate a table
The following section describes a common design challenge that can be solved by
creating a calculated table. First, you should download and open
the Adventure Works DW 2020 M03.pbix file and then switch to the model
diagram.
In the model diagram, notice that the Sales table has three relationships to
the Date table.
The model diagram shows three relationships because the Sales table stores sales
data by order date, ship date, and due date. If you examine
the OrderDateKey, ShipDateKey, and DueDateKey columns, notice that one
relationship is represented by a solid line, which is the active relationship. The other
relationships, which are represented by dashed lines, are inactive relationships.
Note
Only one active relationship can exist between any two model tables.
In the diagram, hover the cursor over the active relationship to highlight the related
columns, which is how you would interact with the model diagram to learn about
related columns. In this case, the active relationship filters the OrderDateKey column
in the Sales table. Thus, filters that are applied to the Date table will propagate to
the Sales table to filter by order date; they'll never filter by ship date or due date.
The next step is to delete the two inactive relationships between the Date table and
the Sales table. To delete a relationship, right-click it and then select Delete in the
context menu. Make sure that you delete both inactive relationships.
Next, add a new table to allow report users to filter sales by ship date. Switch to
Report view and then, on the Modeling ribbon tab, from inside
the Calculations group, select New table.
In the formula bar (located beneath the ribbon), enter the following calculated table
definition and then press Enter.
DAX
Ship Date = 'Date'
The calculated table definition duplicates the Date table data to produce a new table
named Ship Date. The Ship Date table has exactly the same columns and rows as
the Date table. When the Date table data refreshes, the Ship Date table recalculates,
so they'll always be in sync.
Switch to the model diagram, and then notice the addition of the Ship Date table.
Next, create a relationship between the DateKey column in the Ship Date table and
the ShipDateKey column in the Sales table. You can create the relationship by
dragging the DateKey column in the Ship Date table onto the ShipDateKey column
in the Sales table.
A calculated table only duplicates data; it doesn't duplicate any model properties or
objects like column visibility or hierarchies. You'll need to set them up for the new
table, if required.
Tip
It's possible to rename columns of a calculated table. In this example, it's a good idea
to rename columns so that they better describe their purpose. For example,
the Fiscal Year column in the Ship Date table can be renamed as Ship Fiscal Year.
Accordingly, when fields from the Ship Date table are used in visuals, their names
are automatically included in captions like the visual title or axis labels.
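Alternatively, a calculated table definition can rename columns as it creates them.
The following sketch uses the SELECTCOLUMNS function (the column names are
assumptions); note that only the columns you list are carried over:

DAX
Ship Date =
SELECTCOLUMNS(
    'Date',
    -- each pair is: new column name, source expression
    "ShipDateKey", 'Date'[DateKey],
    "Ship Fiscal Year", 'Date'[Fiscal Year]
)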
Calculated tables are useful in scenarios where multiple relationships exist between
two tables, as previously described. They can also be used to add a date table to
your model. Date tables are required to apply special time filters known as time
intelligence.
Create the Due Date calculated table by using the following definition.
DAX
Due Date = CALENDARAUTO(6)
The CALENDARAUTO DAX function takes a single optional argument, which is the last
month number of the year, and returns a single-column table. If you don't pass in a
month number, it's assumed to be 12 (for December). For example, the Adventure
Works financial year ends on June 30 of each year, so the value 6 (for June) is
passed in.
The function scans all date and date/time columns in your model to determine the
earliest and latest stored date values. It then produces a complete set of dates that
span all dates in your model, ensuring that full years of dates are loaded. For
example, if the earliest date that is stored in your model is October 15, 2021, then the
first date that is returned by the CALENDARAUTO function would be July 1, 2021. If the
latest date that is stored in the model is June 15, 2022, then the last date that is
returned by the CALENDARAUTO function would be June 30, 2022.
You can also create a date table by using the CALENDAR DAX function and passing in
two date values, which represent the date range. The function generates one row for
each date within the range. You can pass in static date values or pass in expressions
that retrieve the earliest/latest dates from specific columns in your model.
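For example, a sketch of both approaches (the Sales date column names are
assumptions):

DAX
-- Static date range
Custom Dates = CALENDAR(DATE(2020, 7, 1), DATE(2023, 6, 30))

-- Dynamic range derived from model data
Custom Dates = CALENDAR(MIN(Sales[Order Date]), MAX(Sales[Due Date]))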
Next, switch to data view, and then in the Fields pane, select the Due Date table.
Now, review the column of dates. You might want to order them to see the earliest
date in the first row by selecting the arrow inside the Date column header and then
sorting in ascending order.
Note
Ordering or filtering columns doesn't change how the values are stored. These
functions help you explore and understand the data.
Now that the Date column is selected, review the message in the status bar (located
in the lower-left corner). It describes how many rows the table stores and how
many distinct values are found in the selected column.
When the table rows and distinct values are the same, it means that the column
contains unique values. That factor is important for two reasons: It satisfies the
requirements to mark a date table, and it allows this column to be used in a model
relationship as the one-side.
The Due Date calculated table will recalculate each time a table that contains a date
column refreshes. In other words, when a row is loaded into the Sales table with an
order date of July 1, 2022, the Due Date table will automatically extend to include
dates through to the end of the next year: June 30, 2023.
The Due Date table requires additional columns to support the known filtering and
grouping requirements, specifically by year, quarter, and month.
You can write a DAX formula to add a calculated column to any table in your model.
A calculated column formula must return a scalar or single value.
Calculated columns in import models have a cost: They increase the model storage
size, and they can prolong the data refresh time. The reason is that calculated
columns recalculate when they have formula dependencies on refreshed tables.
In data view, in the Fields pane, ensure that the Due Date table is selected. Before
you create a calculated column, first rename the Date column to Due Date.
Now, you can add a calculated column to the Due Date table. To create a calculated
column, in the Table tools contextual ribbon, from inside the Calculations group,
select New column.
In the formula bar, enter the following calculated column definition and then
press Enter.
DAX
Due Fiscal Year =
"FY"
& YEAR('Due Date'[Due Date])
+ IF(
MONTH('Due Date'[Due Date]) > 6,
1
)
The calculated column definition adds the Due Fiscal Year column to
the Due Date table. The following steps describe how Microsoft Power BI evaluates
the calculated column formula:
1. The addition operator (+) is evaluated before the text concatenation operator (&).
2. The YEAR DAX function returns the whole number value of the due date year.
3. The IF DAX function returns the value when the due date month number is 7-12 (July
to December); otherwise, it returns BLANK. (For example, because the Adventure
Works financial year is July-June, the last six months of the calendar year will use the
next calendar year as their financial year.)
4. The year value is added to the value that is returned by the IF function, which is
either 1 or BLANK. If the value is BLANK, it's implicitly converted to zero (0), allowing
the addition to produce the fiscal year value.
5. The literal text value "FY" is concatenated with the fiscal year value, which is
implicitly converted to text.
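The steps above can be traced with a hypothetical due date of October 15, 2021:

DAX
-- YEAR returns 2021; MONTH returns 10, which is greater than 6, so IF returns 1
-- 2021 + 1 = 2022; "FY" & 2022 produces the text value "FY2022"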
DAX
Due Fiscal Quarter =
'Due Date'[Due Fiscal Year] & " Q"
& IF(
MONTH('Due Date'[Due Date]) <= 3,
3,
IF(
MONTH('Due Date'[Due Date]) <= 6,
4,
IF(
MONTH('Due Date'[Due Date]) <= 9,
1,
2
)
)
)
The calculated column definition adds the Due Fiscal Quarter column to
the Due Date table. The IF function returns the quarter number (Quarter 1 is July-
September), and the result is concatenated to the Due Fiscal Year column value and
the literal text Q.
DAX
Due Month =
FORMAT('Due Date'[Due Date], "yyyy mmm")
Note
Many user-defined date/time formats exist. For more information, see Custom date
and time formats for the FORMAT function.
DAX
Due Full Date =
FORMAT('Due Date'[Due Date], "yyyy mmm, dd")
DAX
MonthKey =
(YEAR('Due Date'[Due Date]) * 100) + MONTH('Due Date'[Due Date])
The MonthKey calculated column multiplies the due date year by the value 100 and
then adds the month number of the due date. It produces a numeric value that can
be used to sort the Due Month text values in chronological order.
Verify that the Due Date table has six columns. The first column was added when the
calculated table was created, and the other five columns were added as calculated
columns.
Sort the Due Full Date column by the Due Date column.
Sort the Due Month column by the MonthKey column.
Hide the MonthKey column.
Create a hierarchy named Fiscal with the following levels:
o Due Fiscal Year
o Due Fiscal Quarter
o Due Month
o Due Full Date
Mark the Due Date table as a date table by using the Due Date column.
The formula for a calculated column is evaluated for each table row. Furthermore, it's
evaluated within row context, which means the current row. Consider
the Due Fiscal Year calculated column definition:
DAX
Due Fiscal Year =
"FY"
& YEAR('Due Date'[Due Date])
+ IF(
MONTH('Due Date'[Due Date]) > 6,
1
)
When the formula is evaluated for each row, the 'Due Date'[Due Date] column
reference returns the column value for that row. You might be familiar with this
concept from working with formulas in Excel tables.
However, row context doesn't extend beyond the table. If your formula needs to
reference columns in other tables, you have two options: when a model relationship
exists between the tables, use the RELATED function; otherwise, use the
LOOKUPVALUE function.
Generally, try to use the RELATED function whenever possible. It will usually perform
better than the LOOKUPVALUE function because of the way that relationship and
column data is stored and indexed.
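For reference, LOOKUPVALUE takes the result column followed by pairs of search
column and search value. A sketch of retrieving a product's list price from a Sales
row without using the relationship (the key column names are assumptions):

DAX
List Price Lookup =
LOOKUPVALUE(
    'Product'[List Price],        -- result column
    'Product'[ProductKey],        -- search column
    Sales[ProductKey]             -- search value from the current row
)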
Now, add the following calculated column definition to the Sales table:
DAX
Discount Amount =
(
Sales[Order Quantity]
* RELATED('Product'[List Price])
) - Sales[Sales Amount]
Row context is used when calculated column formulas are evaluated. It's also used
when a class of functions, known as iterator functions, are used. Iterator functions
provide you with flexibility to create sophisticated summarizations. Iterator functions
are described in a later module.
You can use the DAX parent-child functions to naturalize the recursive (employee-
manager) relationship into columns.
Filter context describes the filters that are applied during the evaluation of a measure
or measure expression. Filters can be applied directly to columns, like a filter on
the Fiscal Year column in the Date table for the value FY2020. Additionally, filters
can be applied indirectly, which happens when model relationships propagate filters
to other tables. For example, the Sales table receives a filter through its relationship
with the Date table, filtering the Sales table rows to those with
an OrderDateKey column value in FY2020.
Note
Calculated tables and calculated columns aren't evaluated within filter context.
Calculated columns are evaluated in row context, though the formula can transition
the row context to filter context, if it needs to summarize model data. Context
transition is described in Unit 5.
At report design time, filters are applied in the Filters pane or to report visuals. The
slicer visual is an example of a visual whose only purpose is to filter the report page
(and other pages when it's configured as a synced slicer). Report visuals, which
perform grouping, also apply filters. They're implied filters; the difference is that the
filter result is visible in the visual. For example, a stacked column chart visual can filter
by fiscal year FY2020, group by month, and summarize sales amount. The fiscal year
filter isn't visible in the visual result, yet the grouping, which results in a column for
each month, behaves as a filter.
Not all filters are applied at report design time. Filters can be added when a report
user interacts with the report. They can modify filter settings in the Filters pane, and
they can cross-filter or cross-highlight visuals by selecting visual elements like
columns, bars, or pie chart segments. These interactions apply additional filters to
report page visuals (unless interactions have been disabled).
It's important to understand how filter context works. It guides you in defining the
correct formula for your calculations. As you write more complex formulas, you'll
identify times when you need to add, modify, or remove filters to achieve the desired
result.
Consider an example that requires your formula to modify the filter context. Your
objective is to produce a report visual that shows each sales region together with its
revenue and revenue as a percentage of total revenue.
The numerator expression doesn't need to modify filter context; it should use the
current filter context (a visual that groups by region applies a filter for that region).
The denominator expression, however, needs to remove any region filters to achieve
the result for all regions.
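One way to sketch such a measure, assuming a Region column in a Sales Territory
table, combines the two expressions with the CALCULATE function that the next unit
introduces:

DAX
Revenue % Total Region =
DIVIDE(
    [Revenue],
    CALCULATE(
        [Revenue],
        -- remove any region filters so the denominator covers all regions
        REMOVEFILTERS('Sales Territory'[Region])
    )
)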
Tip
Mastering these concepts takes practice and time. Rarely will students understand
the concepts from the beginning of training. Therefore, be patient and persevere
with the theory and activities. We recommend that you repeat this module at a later
time to help reinforce key lessons.
The next unit introduces the CALCULATE DAX function. It's one of the most powerful
DAX functions, allowing you to modify filter context when your formulas are
evaluated.
DAX
CALCULATE(<expression>[, <filter1>[, <filter2>[, …]]])
The function requires an expression that returns a scalar value, and it accepts as
many filters as you need. The expression can be a measure (which is a named
expression) or any expression that can be evaluated in filter context.
Filters can be Boolean expressions or table expressions. It's also possible to pass in
filter modification functions that provide additional control when you're modifying
filter context.
When you have multiple filters, they're evaluated by using the AND logical operator,
which means that all conditions must be TRUE at the same time.
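For example, a sketch with two Boolean filter expressions, both of which must hold
(the Fiscal Year column and its value format are assumptions):

DAX
Revenue Red FY2020 =
CALCULATE(
    [Revenue],
    'Product'[Color] = "Red",
    'Date'[Fiscal Year] = "FY2020"
)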
In this example, you will create a measure. First, download and open
the Adventure Works DW 2020 M06.pbix file. Then add the following measure to
the Sales table that filters the Revenue measure by using a Boolean expression filter
for red products.
DAX
Revenue Red = CALCULATE([Revenue], 'Product'[Color] = "Red")
Add the Revenue Red measure to the table visual that is found on Page 1 of the
report.
In this next example, the following measure filters the Revenue measure by multiple
colors. Notice the use of the IN operator followed by a list of color values.
DAX
Revenue Red or Blue = CALCULATE([Revenue], 'Product'[Color] IN {"Red", "Blue"})
DAX
Revenue Expensive Products = CALCULATE([Revenue], 'Product'[List Price] > 1000)
Commonly, you'll use the FILTER DAX function to apply complex filter conditions,
including those that can't be defined by a Boolean filter expression.
The FILTER function is classed as an iterator function, and so you would pass in a
table, or table expression, and an expression to evaluate for each row of that table.
The FILTER function returns a table object with exactly the same structure as the
table that was passed in. Its rows are a subset of the input rows: those for which the
expression evaluated to TRUE.
The following example shows a table filter expression that uses the FILTER function:
DAX
Revenue High Margin Products =
CALCULATE(
[Revenue],
FILTER(
'Product',
'Product'[List Price] > 'Product'[Standard Cost] * 2
)
)
In this example, the FILTER function filters all rows of the Product table that are in
filter context. Each row for a product where its list price exceeds double its standard
cost is displayed as a row of the filtered table. Therefore, the Revenue measure is
evaluated for all products that are returned by the FILTER function.
All filter expressions that are passed in to the CALCULATE function are table filter
expressions. A Boolean filter expression is a shorthand notation to improve the
writing and reading experience. Internally, Microsoft Power BI translates Boolean
filter expressions to table filter expressions, which is how it translates
your Revenue Red measure definition.
DAX
Revenue Red =
CALCULATE(
[Revenue],
FILTER(
'Product',
'Product'[Color] = "Red"
)
)
Filter behavior
Two possible standard outcomes occur when you add filter expressions to
the CALCULATE function:
If the columns (or tables) aren't in filter context, then new filters will be added
to the filter context to evaluate the CALCULATE expression.
If the columns (or tables) are already in filter context, the existing filters will be
overwritten by the new filters to evaluate the CALCULATE expression.
The following examples show how adding filter expressions to the CALCULATE function
works.
Note
As in the previous activity, the Revenue Red measure was added to a table visual
that groups by region and displays revenue.
Because no filter is applied on the Color column in the Product table, the evaluation
of the measure adds a new filter to filter context. In the first row, the value of
$2,681,324.79 is for red products that were sold in the Australian region.
Switching the first column of the table visual from Region to Color will produce a
different result because the Color column in the Product table is now in filter
context.
The Revenue Red measure formula evaluates the Revenue measure by adding a
filter on the Color column (to red) in the Product table. Consequently, in this visual
that groups by color, the measure formula overwrites the filter context with a new
filter.
This result might or might not be what you want. The next unit introduces
the KEEPFILTERS DAX function, which is a filter modification function that you can
use to preserve filters rather than overwrite them.
When using the CALCULATE function, you can pass in filter modification functions,
which allow you to accomplish more than adding filters alone.
Remove filters
Use the REMOVEFILTERS DAX function as a CALCULATE filter expression to remove filters
from filter context. It can remove filters from one or more columns or from all
columns of a single table.
Note
Before the REMOVEFILTERS function was added to DAX, formulas removed filters by using the ALL DAX function, or its variants ALLEXCEPT and ALLNOBLANKROW, which serve as both filter modifiers and as functions that return table objects of distinct values. These functions are mentioned now because you're likely to find documentation and formula examples that remove filters by using them.
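As an illustration only (not a measure you need for this report), removing Product table filters with ALL looks like the following; as a CALCULATE filter argument, it behaves the same as REMOVEFILTERS('Product'):
DAX
Revenue All Products = CALCULATE([Revenue], ALL('Product'))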
In the following example, you will add a new measure to the Sales table that
evaluates the Revenue measure but does so by removing filters from
the Sales Territory table. Format the measure as currency with two decimal places.
DAX
Revenue Total Region = CALCULATE([Revenue], REMOVEFILTERS('Sales Territory'))
Now, add the Revenue Total Region measure to the matrix visual that is found
on Page 2 of the report. The matrix visual will group by three columns from the
Sales Territory table on the rows: Group, Country, and Region.
Notice that each Revenue Total Region value is the same. It's the value of total
revenue.
While this result on its own isn't useful, when it's used as a denominator in a ratio, it
calculates a percent of grand total. Therefore, you will now overwrite
the Revenue Total Region measure definition with the following definition. (This
new definition changes the measure name and declares two variables. Be sure to
format the measure as a percentage with two decimal places.)
DAX
Revenue % Total Region =
VAR CurrentRegionRevenue = [Revenue]
VAR TotalRegionRevenue =
CALCULATE(
[Revenue],
REMOVEFILTERS('Sales Territory')
)
RETURN
DIVIDE(
CurrentRegionRevenue,
TotalRegionRevenue
)
Verify that the matrix visual now displays the Revenue % Total Region values.
You'll now create another measure, but this time, you will calculate the ratio of
revenue for a region divided by its country's or region's revenue.
Before you complete this task, notice that the Revenue % Total Region value for the
Southwest region is 22.95 percent. Investigate the filter context for this cell. Switch to
data view and then, in the Fields pane, select the Sales Territory table.
Notice that the filters reduce the table to only one row. Now, while thinking about
your new objective to create a ratio of the region revenue over its country's revenue,
clear the filter from the Region column.
Notice that five rows now exist, each row belonging to the country United States.
Accordingly, when you clear the Region column filters, while preserving filters on the Country and Group columns, you will have a new filter context that's for the region's country.
In the following measure definition, notice how you can clear or remove a filter from
a column. In DAX logic, it's a small and subtle change that's made to
the Revenue % Total Region measure formula: The REMOVEFILTERS function now
removes filters from the Region column instead of all columns of
the Sales Territory table.
DAX
Revenue % Total Country =
VAR CurrentRegionRevenue = [Revenue]
VAR TotalCountryRevenue =
CALCULATE(
[Revenue],
REMOVEFILTERS('Sales Territory'[Region])
)
RETURN
DIVIDE(
CurrentRegionRevenue,
TotalCountryRevenue
)
Add the Revenue % Total Country measure to the Sales table and then format it as
a percentage with two decimal places. Add the new measure to the matrix visual.
Notice that all values, except those values for United States regions, are 100 percent. That's because, at the Adventure Works company, the United States has regions, while all other countries/regions do not.
Note
Tabular models don't support ragged hierarchies, which are hierarchies with variable
depths. Therefore, it's a common design approach to repeat parent (or other
ancestor) values at lower levels of the hierarchy. For example, Australia doesn't have
a region, so the country/region value is repeated as the region name. It's always
better to store a meaningful value instead of BLANK.
The next example is the last measure that you will create. Add
the Revenue % Total Group measure, and then format it as a percentage with two
decimal places. Then, add the new measure to the matrix visual.
DAX
Revenue % Total Group =
VAR CurrentRegionRevenue = [Revenue]
VAR TotalGroupRevenue =
CALCULATE(
[Revenue],
REMOVEFILTERS(
'Sales Territory'[Region],
'Sales Territory'[Country]
)
)
RETURN
DIVIDE(
CurrentRegionRevenue,
TotalGroupRevenue
)
When you remove filters from the Region and Country columns in
the Sales Territory table, the measure will calculate the region revenue as a ratio of
its group's revenue.
Preserve filters
You can use the KEEPFILTERS DAX function as a filter expression in
the CALCULATE function to preserve filters.
To observe how to accomplish this task, switch to Page 1 of the report. Then, modify
the Revenue Red measure definition to use the KEEPFILTERS function.
DAX
Revenue Red =
CALCULATE(
[Revenue],
KEEPFILTERS('Product'[Color] = "Red")
)
In the table visual, notice that only one Revenue Red value exists. That's because the Boolean filter expression now preserves the existing filters on the Color column in the Product table. Colors other than red show BLANK because the filters in filter context and the filter expression are intersected. For the black row, for example, the filters for black and red can't both be TRUE at the same time, so the expression is evaluated over no product rows. Only for the red row can both filters be TRUE at the same time, which explains why the one Revenue Red value is shown.
You can pass the USERELATIONSHIP DAX function as a filter expression to the CALCULATE function. When you use this function to engage an inactive relationship, the active relationship will automatically become inactive.
DAX
Revenue Shipped =
CALCULATE (
[Revenue],
USERELATIONSHIP('Date'[DateKey], Sales[ShipDateKey])
)
The CROSSFILTER function can modify filter directions (from both to single or from
single to both) and even disable a relationship.
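For example, a sketch of disabling the relationship between the Sales and Product tables for one evaluation might look like the following. (The ProductKey column names are assumptions about the model's relationship columns, and the measure name is illustrative.)
DAX
Revenue Ignoring Product Filters =
CALCULATE(
    [Revenue],
    CROSSFILTER(Sales[ProductKey], 'Product'[ProductKey], None)
)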
The VALUES DAX function lets your formulas determine what values are in filter
context.
DAX
VALUES(<TableNameOrColumnName>)
The function requires passing in a table reference or a column reference. When you
pass in a table reference, it returns a table object with the same columns that contain
rows for what's in filter context. When you pass in a column reference, it returns a
single-column table of unique values that are in filter context.
The function always returns a table object and it's possible for a table to contain
multiple rows. Therefore, to test whether a specific value is in filter context, your
formula must first test that the VALUES function returns a single row. Two functions
can help you accomplish this task: the HASONEVALUE and the SELECTEDVALUE DAX
functions.
The HASONEVALUE function returns TRUE when a given column reference has been
filtered down to a single value.
The SELECTEDVALUE function simplifies the task of determining what a single value
could be. When the function is passed a column reference, it'll return a single value,
or when more than one value is in filter context, it'll return BLANK (or an alternate
value that you pass to the function).
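For example, a hypothetical measure that reports the filtered country could be sketched as:
DAX
Selected Country =
SELECTEDVALUE('Sales Territory'[Country], "Multiple countries")
When exactly one country is in filter context, the function returns its name; otherwise, it returns the alternate value "Multiple countries".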
In the following example, you will use the HASONEVALUE function. Add the following
measure, which calculates sales commission, to the Sales table. Note that, at
Adventure Works, the commission rate is 10 percent of revenue for all
countries/regions except the United States. In the United States, salespeople earn 15
percent commission. Format the measure as currency with two decimal places, and
then add it to the table that is found on Page 3 of the report.
DAX
Sales Commission =
[Revenue]
* IF(
HASONEVALUE('Sales Territory'[Country]),
IF(
VALUES('Sales Territory'[Country]) = "United States",
0.15,
0.1
)
)
Notice that the total Sales Commission result is BLANK. That's because multiple values are in filter context for the Country column in the Sales Territory table. In this case, the HASONEVALUE function returns FALSE, which results in the Revenue measure being multiplied by BLANK (a value multiplied by BLANK is BLANK). To produce a total, you will need to use an iterator function, which is explained later in this module.
Three other functions that you can use to test filter state are the ISFILTERED, ISCROSSFILTERED, and ISINSCOPE DAX functions. The following example uses the ISINSCOPE function so that the Revenue % Total Country measure returns a value only when a region is in scope:
DAX
Revenue % Total Country =
VAR CurrentRegionRevenue = [Revenue]
VAR TotalCountryRevenue =
CALCULATE(
[Revenue],
REMOVEFILTERS('Sales Territory'[Region])
)
RETURN
IF(
ISINSCOPE('Sales Territory'[Region]),
DIVIDE(
CurrentRegionRevenue,
TotalCountryRevenue
)
)
In the matrix visual, notice that Revenue % Total Country values are now only
displayed when a region is in scope.
In the following example, you will add a calculated column to the Customer table to
classify customers into a loyalty class. The scenario is simple: When the revenue that
is produced by the customer is less than $2500, the customer is classified as Low;
otherwise they're classified as High.
DAX
Customer Segment =
VAR CustomerRevenue = SUM(Sales[Sales Amount])
RETURN
IF(CustomerRevenue < 2500, "Low", "High")
On Page 4 of the report, add the Customer Segment column as the legend of the
pie chart.
Notice that only one Customer Segment value exists. That's because the calculated column formula produces an incorrect result: Each customer is assigned the value of High because the expression SUM(Sales[Sales Amount]) isn't evaluated in filter context. Consequently, each customer is assessed on the sum of every Sales Amount column value in the Sales table.
To force the evaluation of the SUM(Sales[Sales Amount]) expression for each customer,
a context transition must take place that applies the row context column values to
filter context. You can accomplish this transition by using the CALCULATE function
without passing in filter expressions.
Modify the calculated column definition so that it produces the correct result.
DAX
Customer Segment =
VAR CustomerRevenue = CALCULATE(SUM(Sales[Sales Amount]))
RETURN
IF(CustomerRevenue < 2500, "Low", "High")
In the pie chart visual, add the new calculated column to the Legend well, and verify that two pie segments now display.
In this case, the CALCULATE function applies the row context values as filters, a process known as context transition. To be accurate, when the table has a unique column, Power BI only needs to apply a filter on that unique column to make the transition happen. In this case, it applies a filter on the CustomerKey column for the value in row context.
Modify the calculated column definition, which references the Revenue measure,
and notice that it continues to produce the correct result.
DAX
Customer Segment =
VAR CustomerRevenue = [Revenue]
RETURN
IF(CustomerRevenue < 2500, "Low", "High")
Now, you can complete the Sales Commission measure formula. To produce a total,
you need to use an iterator function to iterate over all regions in filter context. The
iterator function expression must use the CALCULATE function to transition the row
context to the filter context. Notice that it no longer needs to test whether a
single Country column value in the Sales Territory table is in filter context because
it's known to be filtering by a single country/region (because it's iterating over the
regions in filter context and a region belongs to only one country/region).
Switch to Page 3 of the report, and then modify the Sales Commission measure
definition to use the SUMX iterator function:
DAX
Sales Commission =
SUMX(
VALUES('Sales Territory'[Region]),
CALCULATE(
[Revenue]
* IF(
VALUES('Sales Territory'[Country]) = "United States",
0.15,
0.1
)
)
)
The table visual now displays a sales commission total for all regions.
1. Which type of model object is evaluated within a filter context? - Measures (or measure expressions) are always evaluated in filter context.
2. Which one of the following DAX functions allows you to use an inactive relationship when evaluating a measure expression? - The USERELATIONSHIP function is a filter modifier function that can be passed in to the CALCULATE function. Its purpose is to engage an inactive relationship.
Introduction
Time intelligence relates to calculations over time. Specifically, it relates to
calculations over dates, months, quarters, or years, and possibly time. Rarely would
you need to calculate over time in the sense of hours, minutes, or seconds.
For example, at the Adventure Works company, their financial year begins on July 1
and ends on June 30 of the following year. They produce a table visual that displays
monthly revenue and year-to-date (YTD) revenue.
The filter context for 2017 August contains each of the 31 dates of August, which
are stored in the Date table. However, the calculated year-to-date revenue for 2017
August applies a different filter context. It's the first date of the year through to the
last date in filter context. In this example, that's July 1, 2017 through to August 31,
2017.
Time intelligence calculations modify date filter contexts. They can help you answer
these time-related questions:
What growth in revenue has been achieved over the same period last year?
How many new customers made their first order in each month?
What's the inventory stock on-hand value for the company's products?
This module describes how to create time intelligence measures to answer these
questions.
DAX includes several time intelligence functions to simplify the task of modifying
date filter context. You could write many of these intelligence formulas by using
a CALCULATE function that modifies date filters, but that would create more work.
Note
Many DAX time intelligence functions are concerned with standard date periods,
specifically years, quarters, and months. If you have irregular time periods (for
example, financial months that begin mid-way through the calendar month), or you
need to work with weeks or time periods (hours, minutes, and so on), the DAX time
intelligence functions won't be helpful. Instead, you'll need to use
the CALCULATE function and pass in hand-crafted date or time filters.
To work with time intelligence DAX functions, you need to meet the prerequisite
model requirement of having at least one date table in your model. A date table is a
table that meets the following requirements:
It must have a column of data type Date (or date/time), known as the date column.
The date column must contain unique values.
The date column must not contain BLANKs.
The date column must not have any missing dates.
The date column must span full years. A year isn't necessarily a calendar year (January-
December).
The date table must be indicated as a date table.
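If a model lacks such a table, one common approach (a sketch, not part of the Adventure Works files used in this module) is a calculated table that uses the CALENDARAUTO DAX function; passing 6 sets a June fiscal year-end:
DAX
Date =
ADDCOLUMNS(
    CALENDARAUTO(6),
    "Year", YEAR([Date]),
    "Month Number", MONTH([Date])
)
You would still need to mark the result as a date table.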
One group of DAX time intelligence functions is concerned with summarizations over
time:
DATESYTD - Returns a single-column table that contains dates for the year-to-date
(YTD) in the current filter context. This group also includes
the DATESMTD and DATESQTD DAX functions for month-to-date (MTD) and
quarter-to-date (QTD). You can pass these functions as filters into
the CALCULATE DAX function.
TOTALYTD - Evaluates an expression for YTD in the current filter context. The
equivalent QTD and MTD DAX functions of TOTALQTD and TOTALMTD are also
included.
DATESBETWEEN - Returns a table that contains a column of dates that begins with a
given start date and continues until a given end date.
DATESINPERIOD - Returns a table that contains a column of dates that begins with a
given start date and continues for the specified number of intervals.
Note
While the TOTALYTD function is simple to use, you are limited to passing in one filter
expression. If you need to apply multiple filter expressions, use the CALCULATE function
and then pass the DATESYTD function in as one of the filter expressions.
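For example, a YTD measure with an additional channel filter could be sketched as follows (the measure name is illustrative; the "6-30" year-end matches the Adventure Works fiscal year):
DAX
Revenue YTD Internet =
CALCULATE(
    [Revenue],
    DATESYTD('Date'[Date], "6-30"),
    'Sales Order'[Channel] = "Internet"
)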
In the following example, you will create your first time intelligence calculation that
will use the TOTALYTD function. The syntax is as follows:
DAX
TOTALYTD(<expression>, <dates>[, <filter>][, <year_end_date>])
Download and open the Adventure Works DW 2020 M07.pbix file. Then, add the
following measure definition to the Sales table that calculates YTD revenue. Format
the measure as currency with two decimal places.
DAX
Revenue YTD =
TOTALYTD([Revenue], 'Date'[Date], "6-30")
On Page 1 of the report, add the Revenue YTD measure to the matrix visual. Notice
that it produces a summarization of the revenue amounts from the beginning of the
year through to the filtered month.
Another group of DAX time intelligence functions is concerned with shifting time
periods:
DATEADD - Returns a table that contains a column of dates, shifted either forward or
backward in time by the specified number of intervals from the dates in the current
filter context.
PARALLELPERIOD - Returns a table that contains a column of dates that represents a
period that is parallel to the dates in the specified dates column, in the current filter
context, with the dates shifted a number of intervals either forward in time or back in
time.
SAMEPERIODLASTYEAR - Returns a table that contains a column of dates that are
shifted one year back in time from the dates in the specified dates column, in the
current filter context.
Many helper DAX functions exist for navigating backward or forward by specific time periods, all of which return a table of dates. These helper functions include NEXTDAY, NEXTMONTH, NEXTQUARTER, and NEXTYEAR, and PREVIOUSDAY, PREVIOUSMONTH, PREVIOUSQUARTER, and PREVIOUSYEAR.
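As an illustration, a prior-month revenue measure (the Revenue PM name is illustrative) could use the DATEADD function like this:
DAX
Revenue PM =
CALCULATE([Revenue], DATEADD('Date'[Date], -1, MONTH))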
Now, you will add a measure to the Sales table that calculates revenue for the prior
year by using the SAMEPERIODLASTYEAR function. Format the measure as currency with
two decimal places.
DAX
Revenue PY =
VAR RevenuePriorYear = CALCULATE([Revenue], SAMEPERIODLASTYEAR('Date'[Date]))
RETURN
RevenuePriorYear
Add the Revenue PY measure to the matrix visual. Notice that it produces results
that are similar to the previous year's revenue amounts.
Next, you will modify the measure by renaming it to Revenue YoY % and then
updating the RETURN clause to calculate the change ratio. Be sure to change the
format to a percentage with two decimals places.
DAX
Revenue YoY % =
VAR RevenuePriorYear = CALCULATE([Revenue], SAMEPERIODLASTYEAR('Date'[Date]))
RETURN
DIVIDE(
[Revenue] - RevenuePriorYear,
RevenuePriorYear
)
Notice that the Revenue YoY % measure produces a ratio of change factor over the
previous year's monthly revenue. For example, July 2018 represents a 106.53
percent increase over the previous year's monthly revenue, and November 2018
represents a 24.22 percent decrease over the previous year's monthly revenue.
Note
The Revenue YoY % measure demonstrates a good use of DAX variables. The variable improves the readability of the formula and allows you to unit test part of the measure logic (by returning the RevenuePriorYear variable value). Additionally, the measure is an optimal formula because it doesn't need to retrieve the prior year's revenue value twice: having stored it once in a variable, the RETURN clause uses the variable value twice.
Other DAX time intelligence functions exist that are concerned with returning a
single date. You'll learn about these functions by applying them in two different
scenarios.
The FIRSTDATE and the LASTDATE DAX functions return the first and last date in the
current filter context for the specified column of dates.
Your first task is to add the following measure to the Sales table that counts the
number of distinct customers life-to-date (LTD). Life-to-date means from the
beginning of time until the last date in filter context. Format the measure as a whole
number by using the thousands separator.
DAX
Customers LTD =
VAR CustomersLTD =
CALCULATE(
DISTINCTCOUNT(Sales[CustomerKey]),
DATESBETWEEN(
'Date'[Date],
BLANK(),
MAX('Date'[Date])
),
'Sales Order'[Channel] = "Internet"
)
RETURN
CustomersLTD
Add the Customers LTD measure to the matrix visual. Notice that it produces a
result of distinct customers LTD until the end of each month.
The DATESBETWEEN function returns a table that contains a column of dates that begins
with a given start date and continues until a given end date. When the start date is
BLANK, it will use the first date in the date column. (Conversely, when the end date is
BLANK, it will use the last date in the date column.) In this case, the end date is
determined by the MAX function, which returns the last date in filter context.
Therefore, if the month of August 2017 is in filter context, then the MAX function will
return August 31, 2017 and the DATESBETWEEN function will return all dates through to
August 31, 2017.
Next, you will modify the measure by renaming it to New Customers and by adding
a second variable to store the count of distinct customers before the time period in
filter context. The RETURN clause now subtracts this value from LTD customers to
produce a result, which is the number of new customers in the time period.
DAX
New Customers =
VAR CustomersLTD =
CALCULATE(
DISTINCTCOUNT(Sales[CustomerKey]),
DATESBETWEEN(
'Date'[Date],
BLANK(),
MAX('Date'[Date])
),
'Sales Order'[Channel] = "Internet"
)
VAR CustomersPrior =
CALCULATE(
DISTINCTCOUNT(Sales[CustomerKey]),
DATESBETWEEN(
'Date'[Date],
BLANK(),
MIN('Date'[Date]) - 1
),
'Sales Order'[Channel] = "Internet"
)
RETURN
CustomersLTD - CustomersPrior
For the CustomersPrior variable, notice that the DATESBETWEEN function includes dates
until the first date in filter context minus one. Because Microsoft Power BI internally
stores dates as numbers, you can add or subtract numbers to shift a date.
Snapshot calculations
Occasionally, fact data is stored as snapshots in time. Common examples include
inventory stock levels or account balances. A snapshot of values is loaded into the
table on a periodic basis.
When summarizing snapshot values (like inventory stock levels), you can summarize
values across any dimension except date. Adding stock level counts across product
categories produces a meaningful summary, but adding stock level counts across
dates does not. Adding yesterday's stock level to today's stock level isn't a useful
operation to perform (unless you want to average that result).
When you are summarizing snapshot tables, measure formulas can rely on DAX time
intelligence functions to enforce a single date filter.
In the following example, you will explore a scenario for the Adventure Works
company. Switch to model view and select the Inventory model diagram.
Notice that the diagram shows three tables: Product, Date, and Inventory.
The Inventory table stores snapshots of unit balances for each date and product.
Importantly, the table contains no missing dates and no duplicate entries for any
product on the same date. Also, the last snapshot record is stored for the date of
June 15, 2020.
Now, switch to report view and select Page 2 of the report. Add the UnitsBalance column of the Inventory table to the matrix visual. Its default summarization is set to sum values, which, as described earlier, isn't meaningful across dates. Instead, add the following measure to the Inventory table, which filters by the last date in filter context:
DAX
Stock on Hand =
CALCULATE(
SUM(Inventory[UnitsBalance]),
LASTDATE('Date'[Date])
)
Note
Notice that the measure formula uses the SUM function. An aggregate function must
be used (measures don't allow direct references to columns), but given that only one
row exists for each product for each date, the SUM function will only operate over a
single row.
Add the Stock on Hand measure to the matrix visual. The value for each product is
now based on the last recorded units balance for each month.
The measure returns BLANKs for June 2020 because no record exists for the last date
in June. According to the data, it hasn't happened yet.
Filtering by the last date in filter context has inherent problems: A recorded date
might not exist because it hasn't yet happened, or perhaps because stock balances
aren't recorded on weekends.
Your next step is to adjust the measure formula to determine the last date that has a
non-BLANK result and then filter by that date. You can achieve this task by using
the LASTNONBLANK DAX function.
Use the following measure definition to modify the Stock on Hand measure.
DAX
Stock on Hand =
CALCULATE(
SUM(Inventory[UnitsBalance]),
LASTNONBLANK(
'Date'[Date],
CALCULATE(SUM(Inventory[UnitsBalance]))
)
)
In the matrix visual, notice the values for June 2020 and the total (representing the
entire year).
The LASTNONBLANK function is an iterator function. It returns the last date that produces
a non-BLANK result. It achieves this result by iterating through all dates in filter
context in descending chronological order. (Conversely, the FIRSTNONBLANK function iterates in ascending chronological order.) For each date, it evaluates the passed-in expression.
When it encounters a non-BLANK result, the function returns the date. That date is
then used to filter the CALCULATE function.
Note
You should now hide the UnitsBalance column of the Inventory table. Doing so will prevent report authors from inappropriately summarizing snapshot unit balances.
1. In the context of semantic model calculations, which statement best describes time intelligence? - Time intelligence calculations modify date filter contexts.
3. You have a table that stores account balance snapshots for each date, excluding weekends. You need to ensure that your measure formula only filters by a single date. Also, if no record is on the last date of a time period, it should use the latest account balance. Which DAX time intelligence function should you use? - The LASTNONBLANK function will return the last date in the filter context where a snapshot record exists. This option will help you achieve the objective.
Introduction to performance optimization
Performance optimization, also known as performance tuning, involves making
changes to the current state of the semantic model so that it runs more efficiently.
Essentially, when your semantic model is optimized, it performs better.
You might find that your report runs well in test and development environments, but
when deployed to production for broader consumption, performance issues arise.
From a report user's perspective, poor performance is characterized by report pages
that take longer to load and visuals taking more time to update. This poor
performance results in a negative user experience.
As a data analyst, you will spend approximately 90 percent of your time working with
your data, and nine times out of ten, poor performance is a direct result of a bad
semantic model, bad Data Analysis Expressions (DAX), or the mix of the two. The
process of designing a semantic model for performance can be tedious, and it is
often underestimated. However, if you address performance issues during
development, you will have a robust Power BI semantic model that will return better
reporting performance and a more positive user experience. Ultimately, you will also
be able to maintain optimized performance. As your organization grows, the size of
its data grows, and its semantic model becomes more complex. By optimizing your
semantic model early, you can mitigate the negative impact that this growth might
have on the performance of your semantic model.
A smaller sized semantic model uses fewer resources (memory) and achieves faster data refresh, calculations, and rendering of visuals in reports. Therefore, the performance optimization process involves minimizing the size of the semantic model and making the most efficient use of the data in the model.
In this module, you will be introduced to the steps, processes, and concepts that are
necessary to optimize a semantic model for enterprise-level performance. However,
keep in mind that, while the basic performance and best practices guidance in Power
BI will lead you a long way, to optimize a semantic model for query performance, you
will likely have to partner with a data engineer to drive semantic model optimizing in
the source data sources.
For example, assume that you work as a Microsoft Power BI developer for Tailwind
Traders. You have been given a task to review a semantic model that was built a few
years ago by another developer, a person who has since left the organization.
The semantic model produces a report that has received negative feedback from
users. The users are happy with the results that they see in the report, but they are
not satisfied with the report performance. Loading the pages in the report is taking
too long, and tables are not refreshing quickly enough when certain selections are
made. In addition to this feedback, the IT team has highlighted that the file size of
this particular semantic model is too large, and it is putting a strain on the
organization's resources.
You need to review the semantic model to identify the root cause of the performance
issues and make changes to optimize performance.
To optimize performance, you must first identify where the problem is coming from;
in other words, find out which elements of your report and semantic model are
causing the performance issues. Afterward, you can take action to resolve those
issues and, therefore, improve performance.
You should review the measures and queries in your semantic model to ensure that
you are using the most efficient way to get the results that you want. Your starting
point should be to identify bottlenecks that exist in the code. When you identify the
slowest query in the semantic model, you can focus on the biggest bottleneck first
and establish a priority list to work through the other issues.
Analyze performance
You can use Performance analyzer in Power BI Desktop to help you find out how
each of your report elements is performing when users interact with them. For
example, you can determine how long it takes for a particular visual to refresh when
it is initiated by a user interaction. Performance analyzer will help you identify the
elements that are contributing to your performance issues, which can be useful
during troubleshooting.
Before you run Performance analyzer, to ensure you get the most accurate results
in your analysis (test), make sure that you start with a clear visual cache and a clear
data engine cache.
Visual cache - When you load a visual, you can't clear this visual cache without
closing Power BI Desktop and opening it again. To avoid any caching in play,
you need to start your analysis with a clean visual cache.
To ensure that you have a clear visual cache, add a blank page to your Power BI
Desktop (.pbix) file and then, with that page selected, save and close the file.
Reopen the Power BI Desktop (.pbix) file that you want to analyze. It will open
on the blank page.
Data engine cache - When a query is run, the results are cached, so the results
of your analysis will be misleading. You need to clear the data cache before
rerunning the visual.
To clear the data cache, you can either restart Power BI Desktop or connect
DAX Studio to the semantic model and then call Clear Cache.
When you have cleared the caches and opened the Power BI Desktop file on the
blank page, go to the View tab and select the Performance analyzer option.
To begin the analysis process, select Start recording, select the page of the report
that you want to analyze, and interact with the elements of the report that you want
to measure. You will see the results of your interactions display in the Performance
analyzer pane as you work. When you are finished, select the Stop button.
For more detailed information, see Use Performance Analyzer to examine report
element performance.
Review results
You can review the results of your performance test in the Performance
analyzer pane. To review the tasks in order of duration, longest to shortest, right-
click the Sort icon next to the Duration (ms) column header, and then select Total
time in Descending order.
The log information for each visual shows how much time it took (duration) to
complete the following categories of tasks:
DAX query - The time it took for the visual to send the query, along with the
time it took Analysis Services to return the results.
Visual display - The time it took for the visual to render on the screen,
including the time required to retrieve web images or geocoding.
Other - The time it took the visual to prepare queries, wait for other visuals to
complete, or perform other background processing tasks. If this category
displays a long duration, the only real way to reduce this duration is to optimize
DAX queries for other visuals, or reduce the number of visuals in the report.
The results of the analysis test help you to understand the behavior of your semantic
model and identify the elements that you need to optimize. You can compare the
duration of each element in the report and identify the elements that have a long
duration. You should focus on those elements and investigate why it takes them so
long to load on the report page.
To analyze your queries in more detail, you can use DAX Studio, which is a free,
open-source, third-party tool.
Visuals
If you identify visuals as the bottleneck leading to poor performance, you should find
a way to improve performance with minimal impact to user experience.
Consider the number of visuals on the report page; fewer visuals means better
performance. Ask yourself if a visual is really necessary and if it adds value to the end
user. If the answer is no, you should remove that visual. Rather than using multiple
visuals on the page, consider other ways to provide additional details, such as drill-
through pages and report page tooltips.
Examine the number of fields in each visual. The more fields that a visual uses, the
higher the chance of performance issues. In addition, the more fields you display, the
more crowded the visual can appear and the more clarity it can lose. The upper limit
for a visual is 100 fields (measures or columns), so a visual with more than 100 fields
will be slow to load. Ask yourself if you really need all of this data in a visual. You
might find that you can reduce the number of fields that you currently use.
DAX query
When you examine the results in the Performance analyzer pane, you can see how
long it took the Power BI Desktop engine to evaluate each query (in milliseconds). A
good starting point is any DAX query that is taking longer than 120 milliseconds. In
this example, you identify one particular query that has a large duration time.
Performance analyzer highlights potential issues but does not tell you what needs
to be done to improve them. You might want to conduct further investigation into
why this measure takes so long to process. You can use DAX Studio to investigate
your queries in more detail.
For example, select Copy Query to copy the calculation formula onto the clipboard,
then paste it into DAX Studio, where you can review the calculation in more detail.
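In DAX Studio, you can also wrap a measure in a query, run it with Server Timings enabled, and inspect where the time is spent. The following query is only a sketch; the [Revenue] measure and the 'Date'[Fiscal Year] column are assumed names from your own model, not part of this walkthrough:
DAX
EVALUATE
SUMMARIZECOLUMNS (
    'Date'[Fiscal Year],
    // [Revenue] is an assumed measure name from your own model
    "Total Revenue", [Revenue]
)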
In this example, you are trying to count the total number of products with order
quantities greater than or equal to five.
DAX
Count Products =
CALCULATE (
    DISTINCTCOUNT ( 'Order'[ProductID] ),
    FILTER ( 'Order', 'Order'[OrderQty] >= 5 )
)
After analyzing the query, you can use your own knowledge and experience to
identify where the performance issues are. You can also try using different DAX
functions to see if they improve performance. In the following example, the FILTER
function was replaced with the KEEPFILTERS function. When the test was run again
in Performance analyzer, the duration was shorter as a result of the KEEPFILTERS
function.
DAX
Count Products =
CALCULATE (
    DISTINCTCOUNT ( 'Order'[ProductID] ),
    KEEPFILTERS ( 'Order'[OrderQty] >= 5 )
)
In this case, replacing the FILTER function with the KEEPFILTERS function
significantly reduces the evaluation duration of the query. After you make this
change, clear the data cache and then rerun the Performance analyzer process to
check whether the duration time has improved.
Semantic model
If measures and visuals display low duration values (in other words, they have short
duration times), they are not the cause of the performance issues. Instead, if a DAX
query displays a high duration value, it is likely that a measure is poorly written or
that an issue has occurred with the semantic model. The
issue might be caused by the relationships, columns, or metadata in your model, or it
could be the status of the Auto date/time option, as explained in the following
section.
Relationships
You should review the relationships between your tables to ensure that you have
established the correct relationships. Check that relationship cardinality properties
are correctly configured. For example, a one-side column that contains unique values
might be incorrectly configured as a many-side column. You will learn more about
how cardinality affects performance later in this module.
Columns
It is best practice to not import columns of data that you do not need. To avoid
deleting columns in Power Query Editor, you should try to deal with them at the
source when loading data into Power BI Desktop. However, if it is impossible to
remove redundant columns from the source query or the data has already been
imported in its raw state, you can always use Power Query Editor to examine each
column. Ask yourself if you really need each column and try to identify the benefit
that each one adds to your semantic model. If you find that a column adds no value,
you should remove it from your semantic model. For example, suppose that you
have an ID column with thousands of unique rows. You know that you won't use this
particular column in a relationship, nor will it be used in a report. Therefore, you
should consider this column unnecessary and accept that it is wasting space in your
semantic model.
When you remove an unnecessary column, you will reduce the size of the semantic
model which, in turn, results in a smaller file size and faster refresh time. Also,
because the semantic model contains only relevant data, the overall report
performance will be improved.
For more information, see Data reduction techniques for Import modeling.
Metadata
When you load data into Power BI Desktop, it is good practice to analyze the
corresponding metadata so you can identify any inconsistences with your semantic
model and normalize the data before you start to build reports. Running analysis on
your metadata will improve semantic model performance because, while analyzing
your metadata, you will identify unnecessary columns, errors within your data,
incorrect data types, the volume of data being loaded (large semantic models,
including transactional or historic data, will take longer to load), and much more.
You can use Power Query Editor in Power BI Desktop to examine the columns, rows,
and values of the raw data. You can then use the available tools, such as those
highlighted in the following screenshot, to make the necessary changes.
Unnecessary columns - Evaluates the need for each column. If one or more
columns will not be used in the report and are therefore unnecessary, you
should remove them by using the Remove Columns option on the Home tab.
Unnecessary rows - Checks the first few rows in the semantic model to see if
they are empty or if they contain data that you do not need in your reports; if
so, remove those rows by using the Remove Rows option on the Home tab.
Data type - Evaluates the column data types to ensure that each one is correct.
If you identify a data type that is incorrect, change it by selecting the column,
selecting Data Type on the Transform tab, and then selecting the correct data
type from the list.
Query names - Examines the query (table) names in the Queries pane. Just like
you did for column header names, you should change uncommon or unhelpful
query names to names that are more obvious or names that the user is more
familiar with. You can rename a query by right-clicking that query,
selecting Rename, editing the name as required, and then pressing Enter.
Column details - Power Query Editor has the following three data preview
options that you can use to analyze the metadata that is associated with your
columns. You can find these options on the View tab, as illustrated in the
following screenshot.
o Column quality - Determines what percentage of items in the column are
valid, have errors, or are empty. If the Valid percentage is not 100, you
should investigate the reason, correct the errors, and populate empty values.
o Column distribution - Displays frequency and distribution of the values in
each of the columns. You will investigate this further later in this module.
o Column profile - Shows column statistics chart and a column distribution
chart.
Note
If you are reviewing a large semantic model with more than 1,000 rows, and you
want to analyze that whole semantic model, you need to change the default option
at the bottom of the window. Select Column profiling based on top 1000
rows > Column profiling based on entire data set.
Other metadata that you should consider is the information about the semantic
model as a whole, such as the file size and data refresh rates. You can find this
metadata in the associated Power BI Desktop (.pbix) file. The data that you load into
Power BI Desktop is compressed and stored to the disk by the VertiPaq storage
engine. The size of your semantic model has a direct impact on its performance; a
smaller sized semantic model uses less resources (memory) and achieves faster data
refresh, calculations, and rendering of visuals in reports.
The Auto date/time option allows you to work with time intelligence when filtering,
grouping, and drilling down through calendar time periods. We recommend that you
keep the Auto date/time option enabled only when you work with calendar time
periods and when you have simplistic model requirements in relation to time.
If your data source already defines a date dimension table, that table should be used
to consistently define time within your organization, and you should disable the
global Auto date/time option. Disabling this option can lower the size of your
semantic model and reduce the refresh time.
You can enable/disable this Auto date/time option globally so that it applies to all
of your Power BI Desktop files, or you can enable/disable the option for the current
file so that it applies to an individual file only.
For an overview and general introduction to the Auto date/time feature, see Apply
auto date/time in Power BI Desktop.
You can use variables in your DAX formulas to help you write less complex and more
efficient calculations. Variables are underused by developers who are starting out in
Power BI Desktop, but they are effective and you should use them by default when
you are creating measures.
Some expressions involve the use of many nested functions and the reuse of
expression logic. These expressions take a longer time to process and are difficult to
read and, therefore, troubleshoot. If you use variables, you can save query processing
time. This change is a step in the right direction toward optimizing the performance
of a semantic model.
Using variables in your semantic model provides advantages such as improved
performance, improved readability, simplified debugging, and reduced complexity.
The second measure definition below shows the improved version. This definition
uses the VAR keyword to introduce a variable named SalesPriorYear, and it uses an
expression to assign the "same period last year" result to that new variable. It then
uses the variable twice in the DIVIDE expression.
Without variable
DAX
Sales YoY Growth =
DIVIDE (
    [Sales] - CALCULATE ( [Sales], PARALLELPERIOD ( 'Date'[Date], -12, MONTH ) ),
    CALCULATE ( [Sales], PARALLELPERIOD ( 'Date'[Date], -12, MONTH ) )
)
With variable
DAX
Sales YoY Growth =
VAR SalesPriorYear =
    CALCULATE ( [Sales], PARALLELPERIOD ( 'Date'[Date], -12, MONTH ) )
VAR SalesVariance =
    DIVIDE ( [Sales] - SalesPriorYear, SalesPriorYear )
RETURN
    SalesVariance
In the first measure definition, the formula is inefficient because it
requires Power BI to evaluate the same expression twice. The second definition is
more efficient because, due to the variable, Power BI only needs to evaluate the
PARALLELPERIOD expression once.
If your semantic model has multiple queries with multiple measures, the use of
variables could cut the overall query processing time in half and improve the overall
performance of the semantic model. Furthermore, this solution is a simple one;
imagine the savings as the formulas get more complicated, for instance, when you
are dealing with percentages and running totals.
When using variables, it is best practice to use descriptive names for the variables. In
the previous example, the variable is called SalesPriorYear, which clearly states what
the variable is calculating. Consider the outcome of using a variable that was
called X, temp or variable1; the purpose of the variable would not be clear at all.
Using clear, concise, meaningful names will help make it easier for you to understand
what you are trying to calculate, and it will be much simpler for other developers to
maintain the report in the future.
DAX
Sales YoY Growth % =
VAR SalesPriorYear =
    CALCULATE ( [Sales], PARALLELPERIOD ( 'Date'[Date], -12, MONTH ) )
VAR SalesVariance =
    DIVIDE ( [Sales] - SalesPriorYear, SalesPriorYear )
RETURN
    SalesVariance
The RETURN expression displays the SalesVariance value only. While debugging, you
can temporarily return a different variable, such as SalesPriorYear, to check an
intermediate result, and then revert the expression when you have completed the
debugging. This technique also makes calculations simpler to understand because it
reduces the complexity of the DAX code.
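As a sketch of this debugging technique, the RETURN clause can temporarily point at an intermediate variable instead of the final result:
DAX
Sales YoY Growth % =
VAR SalesPriorYear =
    CALCULATE ( [Sales], PARALLELPERIOD ( 'Date'[Date], -12, MONTH ) )
VAR SalesVariance =
    DIVIDE ( [Sales] - SalesPriorYear, SalesVariance )
RETURN
    // Temporarily inspect the intermediate result; revert to SalesVariance when done
    SalesPriorYear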
Reduce cardinality
Cardinality is a term that describes the uniqueness of the values in a column. The
column distribution details in Power Query Editor report two related counts:
Distinct values count - The total number of different values found in a given
column.
Unique values count - The total number of values that only appear once in a
given column.
A column that has a lot of repeated values in its range (unique count is low) will have
a low level of cardinality. Conversely, a column that has a lot of unique values in its
range (unique count is high) will have a high level of cardinality.
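To make the two counts concrete, they can be sketched as DAX measures. The 'Product' table and its Color column here are assumed names for illustration, not part of this module:
DAX
// Number of different color values in the column
Distinct Colors = DISTINCTCOUNT ( 'Product'[Color] )

// Number of color values that appear exactly once in the table
Unique Colors =
COUNTROWS (
    FILTER (
        VALUES ( 'Product'[Color] ),
        CALCULATE ( COUNTROWS ( 'Product' ) ) = 1
    )
)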
When you create or edit a relationship, you can configure additional options. By
default, Power BI Desktop automatically configures additional options based on its
best guess, which can be different for each relationship based on the data in the
columns.
Relationships can have different cardinality. Cardinality describes how the values in
the two related columns correspond, in other words, how many instances of a value
can appear on each side, and each model relationship must be defined with a
cardinality type. The cardinality options in Power BI are:
One-to-one (1:1) - In this relationship type, the column in one table has only
one instance of a particular value, and the other related table has only one
instance of a particular value.
One-to-many (1:*) - In this relationship type, the column in one table has only
one instance of a particular value, and the other related table can have more
than one instance of a value.
Many-to-many (*:*) - With composite models, you can establish a many-to-
many relationship between tables, which removes requirements for unique
values in tables. It also removes previous workarounds, such as introducing new
tables only to establish relationships.
As you create and edit relationships in your model, regardless of the cardinality that
you have chosen, always ensure that both of the columns that participate in a
relationship share the same data type. A relationship between two columns with
different data types, such as one text column and one integer column, will not work.
In the following example, the ProductID field has the data type Whole number in
the Product and Sales tables. The columns with data type Integer perform better
than columns with data type Text.
Perhaps the most effective technique to reduce a model size is to use a summary
table from the data source. Where a detail table might contain every transaction, a
summary table would contain one record per day, per week, or per month. It might
be an average of all of the transactions per day, for instance.
For example, a source sales fact table stores one row for each order line. Significant
data reduction could be achieved by summarizing all sales metrics if you group by
date, customer, and product, and individual transaction detail is not needed.
Consider then that an even more significant data reduction could be achieved by
grouping by date at month level. It could achieve a possible 99 percent reduction in
model size, but reporting at day level or at individual order level would no longer be
possible. Deciding to summarize fact-type data will always involve a tradeoff with the
detail of your data. A disadvantage is that you might lose the ability to drill into data
because the detail no longer exists. This tradeoff could be mitigated by using a
mixed model design.
An effective technique to reduce the model size is to set the Storage Mode property
for larger fact-type tables to DirectQuery. This design approach can work well in
conjunction with techniques that are used to summarize your data. For example, the
summarized sales data could be used to achieve high performance "summary"
reporting. A drill-through page could be created to display granular sales for specific
(and narrow) filter context, displaying all in-context sales orders. The drill-through
page would include visuals based on a DirectQuery table to retrieve the sales order
data (sales order details).
For more information, see Data reduction techniques for Import modeling.
DirectQuery is one way to get data into Power BI Desktop. The DirectQuery method
involves connecting directly to data in its source repository from within Power BI
Desktop. It is an alternative to importing data into Power BI Desktop.
When you use the DirectQuery method, the overall user experience depends heavily
on the performance of the underlying data source. Slow query response times will
lead to a negative user experience and, in the worst-case scenarios, queries might
time out. Also, the number of users who are opening the reports at any one time will
impact the load that is placed on the data source. For example, if your report has 20
visuals in it and 10 people are using the report, at least 200 queries will be sent to the
data source because each visual will issue one or more queries.
Unfortunately, the performance of your Power BI model will not only be impacted by
the performance of the underlying data source, but also by other factors that you
can't control, such as network latency and the load that other workloads place on the
source server.
Therefore, using DirectQuery poses a risk to the quality of your model's performance.
To optimize performance in this situation, you need to have control over, or access
to, the source database.
For more detailed information, see DirectQuery model guidance in Power BI Desktop.
It is best practice to import data into Power BI Desktop, but your organization might
need to use the DirectQuery data connectivity mode because of the benefits that it
provides, such as the ability to work with data volumes that are too large to import
and the need for near real-time results.
If your organization needs to use DirectQuery, you should clearly understand its
behavior within Power BI Desktop and be aware of its limitations. You will then be in
a good position to take action to optimize the DirectQuery model as much as
possible.
When you use DirectQuery to connect to data in Power BI Desktop, that connection
behaves in the following way:
When you initially use the Get Data feature in Power BI Desktop, you will select
the source. If you connect to a relational source, you can select a set of tables
and each one will define a query that logically returns a set of data. If you select
a multidimensional source, such as SAP BW, you can only select the source.
When you load the data, no data is imported into the Power BI Desktop, only
the schema is loaded. When you build a visual within Power BI Desktop, queries
are sent to the underlying source to retrieve the necessary data. The time it
takes to refresh the visual depends on the performance of the underlying data
source.
If changes are made to the underlying data, they won't be immediately
reflected in the existing visuals in Power BI due to caching. You need to carry
out a refresh to see those changes; the necessary queries are then resent for
each visual, and the visuals are updated accordingly.
When you publish the report to the Power BI service, it will result in a semantic
model in Power BI service, the same as for import. However, no data is included
with that semantic model.
When you open an existing report in Power BI service, or build a new one, the
underlying source is again queried to retrieve the necessary data. Depending
on the location of the original source, you might have to configure an on-
premises data gateway.
You can pin visuals, or entire report pages, as dashboard tiles. The tiles are
automatically refreshed on a schedule, for example, every hour. You can control
the frequency of this refresh to meet your requirements. When you open a
dashboard, the tiles reflect the data at the time of the last refresh and might not
include the latest changes that are made to the underlying data source. You can
always refresh an open dashboard to ensure that it's up-to-date.
The use of DirectQuery can have negative implications, and the limitations vary
depending on the specific data source that is being used. For example, some data
transformations and DAX functions are restricted or perform poorly when the
resulting queries must be translated and sent to the source.
Now that you have a brief understanding of how DirectQuery works and the
limitations that it poses, you can take action to improve the performance.
Optimize performance
Continuing with the Tailwind Traders scenario, during your review of the semantic
model, you discover that the query used DirectQuery to connect Power BI Desktop to
the source data. This use of DirectQuery is the reason why users are experiencing
poor report performance. It's taking too long to load the pages in the report, and
tables are not refreshing quickly enough when certain selections are made. You need
to take action to optimize the performance of the DirectQuery model.
You can examine the queries that are being sent to the underlying source and try to
identify the reason for the poor query performance. You can then make changes in
Power BI Desktop and the underlying data source to optimize overall performance.
When you have optimized the data source as much as possible, you can take further
action within Power BI Desktop by using Performance analyzer, where you can
isolate queries to validate query plans.
You can analyze the duration of the queries that are being sent to the underlying
source to identify the queries that are taking a long time to load. In other words, you
can identify where the bottlenecks exist.
You don't need to use a special approach when optimizing a DirectQuery model; you
can apply the same optimization techniques that you used on the imported data to
tune the data from the DirectQuery source. For example, you can reduce the number
of visuals on the report page or reduce the number of fields that are used in a visual.
You can also remove unnecessary columns and rows.
Your first stop is the data source. You need to tune the source database as much as
possible because anything you do to improve the performance of that source
database will in turn improve Power BI DirectQuery. The actions that you take in the
database will do the most good.
Consider the use of the following standard database practices that apply to most
situations:
Where possible, push calculation expressions back to the source, because doing
so avoids the repeated push down of those expressions at query time. You
could also consider adding surrogate key columns to dimension-type tables.
Review the indexes and verify that the current indexing is correct. If you need to
create new indexes, ensure that they are appropriate.
Refer to the guidance documents of your data source and implement their
performance recommendations.
Power BI Desktop gives you the option to send fewer queries and to disable certain
interactions that will result in a poor experience if the resulting queries take a long
time to run. Applying these options prevents queries from continuously hitting the
data source, which should improve performance.
In this example, you edit the default settings to apply the available data reduction
options to your model. You access the settings by selecting File > Options and
settings > Options, scrolling down the page, and then selecting the Query
reduction option.
When you aggregate data, you summarize it and present it at a higher grain
(level). For example, you can summarize all sales data and group it by date, customer,
product, and so on. The aggregation process reduces the table sizes in the semantic
model, allowing you to focus on important data and helping to improve the query
performance.
Your organization might decide to use aggregations in their semantic models for the
following reasons:
If you are dealing with a large amount of data (big data), aggregations will
provide better query performance and help you analyze and reveal the insights
of this large data. Aggregated data is cached and, therefore, uses a fraction of
the resources that are required for detailed data.
If you are experiencing a slow refresh, aggregations will help you speed up the
refresh process. The smaller cache size reduces the refresh time, so data gets to
users faster. Instead of refreshing what could be millions of rows, you would
refresh a smaller amount of data instead.
If you have a large semantic model, aggregations can help you reduce and
maintain the size of your model.
If you anticipate your semantic model growing in size in the future, you can use
aggregations as a proactive step toward future proofing your semantic model
by lessening the potential for performance and refresh issues and overall query
problems.
Continuing with the Tailwind Traders scenario, you have taken several steps to
optimize the performance of the semantic model, but the IT team has informed you
that the file size is still too large. The file size is currently 1 gigabyte (GB), so you
need to reduce it to around 50 megabytes (MB). During your performance review,
you identified that the previous developer did not use aggregations in the semantic
model, so you now want to create some aggregations for the sales data to reduce
the file size and further optimize the performance.
Create aggregations
Before you start creating aggregations, you should decide on the grain (level) on
which you want to create them. In this example, you want to aggregate the sales data
at the day level.
When you decide on the grain, the next step is to decide on how you want to create
the aggregations. You can create aggregations in different ways and each method
will yield the same results, for example:
If you have access to the database, you could create a table with the
aggregation and then import that table into Power BI Desktop.
If you have access to the database, you could create a view for the aggregation
and then import that view into Power BI Desktop.
In Power BI Desktop, you can use Power Query Editor to create the
aggregations step-by-step.
In this example, you open a query in Power Query Editor and notice that the data has
not been aggregated; it has over 999 rows, as illustrated in the following screenshot.
You want to aggregate the data by the OrderDate column and view
the OrderQuantity and SalesAmount columns. Start by selecting Choose
Columns on the Home tab. On the window that displays, select the columns that
you want in the aggregation and then select OK.
When the selected columns display on the page, select the Group By option on
the Home tab. On the window that displays, select the column that you want to
group by (OrderDate) and enter a name for the new column (OnlineOrdersCount).
Select the Advanced option and then select the Add aggregation button to display
another column row. Enter a name for the aggregation column, select the operation
of the column, and then select the column to which you want to link the aggregation.
Repeat these steps until you have added all the aggregations and then select OK.
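The same grouping can also be sketched as a DAX calculated table. The Sales table name here is an assumption based on the columns mentioned in the walkthrough (OrderDate, OrderQuantity, and SalesAmount), and creating the aggregation in Power Query or at the source remains preferable because a calculated table does not reduce the imported data:
DAX
Sales Summary =
SUMMARIZECOLUMNS (
    Sales[OrderDate],
    // Count of source rows per date, matching the OnlineOrdersCount column
    "OnlineOrdersCount", COUNTROWS ( Sales ),
    "OrderQuantity", SUM ( Sales[OrderQuantity] ),
    "SalesAmount", SUM ( Sales[SalesAmount] )
)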
It might take a few minutes for your aggregation to display, but when it does, you'll
see how the data has been transformed. The data will be aggregated into each date,
and you will be able to see the values for the orders count and the respective sum of
the sales amount and order quantity.
Select the Close and Apply button to close Power Query Editor and apply the
changes to your semantic model. Return to the Power BI Desktop page and then
select the Refresh button to see the results. Observe the screen because a brief
message will display the number of rows that your semantic model now has. This
number of rows should be significantly less than the number that you started with.
You can also see this number when you open Power Query Editor again, as illustrated
in the following screenshot. In this example, the number of rows was reduced to 30.
Remember, you started with over 999 rows. Using aggregation has significantly
reduced the number of rows in your semantic model, which means that Power BI has
less data to refresh and your model should perform better.
Manage aggregations
When you have created aggregations, you can manage those aggregations in Power
BI Desktop and make changes to their behavior, if required.
You can open the Manage Aggregations window from any view in Power BI
Desktop. In the Fields pane, right-click the table and then select Manage
aggregations.
For more detailed information on how to create and manage aggregations, see Use
aggregations in Power BI Desktop.