Notes: The Definitive Guide to DAX

This page originated as a place to share my chapter notes from The Definitive Guide to DAX by Marco Russo and Alberto Ferrari. It has evolved into a digital diary of topics as I learn. I’ll share notes and screencaps from public content by SQLBI and Jeffrey Wang, and over time add my own hands-on demos and document my learning experience.

Please also check out all of the individual topic links provided on this page that provide original article or video references.

If this page is helpful for you, please use www.daxanatomy.com as a shortcut to return to this post.

Contact me at: mark@storybi.com with any questions or suggestions.

Chapter 1 – What is DAX?

Chapter 2 – Introducing DAX

Chapter 3 – Using Basic Table Functions

Chapter 4 – Understanding Evaluation Contexts

Chapter 5 – Understanding CALCULATE and CALCULATETABLE

Chapter 6 – Variables

Chapter 7 – Working with Iterators and with CALCULATE

Chapter 8 – Time Intelligence Calculations

Chapter 9 – Calculation Groups

Chapter 10 – Working with the Filter Context

Chapter 11 – Handling Hierarchies

Chapter 12 – Working with Tables

Chapter 13 – Authoring Queries

Chapter 14 – Advanced DAX Concepts

Chapter 15 – Advanced Relationships

Chapter 16 – Advanced Calculations in DAX

Chapter 17 – The DAX Engines

Chapter 18 – Optimizing VertiPaq

Chapter 19 – Analyzing DAX Query Plans

Chapter 20 – Optimizing DAX

Chapter 1 – What is DAX?

🤿 Beyond Clicks to Concepts: This guide isn’t about just the ‘how-tos’. It’s a deep dive into optimizing DAX and understanding the underlying concepts. Perfect for those interested in mastering DAX.

⚡️ Model & Relationships: The backbone of report performance is a well-structured model. Grasping the types and directions of relationships between tables is a crucial start to our journey.

🧙‍♂️ DAX vs. Excel: Although similar to Excel’s expression language at first, DAX demands a solid grasp of its theoretical aspects. Without this foundation, the more complex functions remain hidden in mystery.

📈 Beyond Calculated Columns: Coming from Excel, I’ve realized the power of iterators and optimization strategies in DAX. Iterations are a game-changer for model flexibility and report performance.

💪🏼 Patience, Persistence, and Understanding: DAX isn’t just about writing functions and getting results. It’s about building confidence through patience, persistence, and a deep understanding of evaluation context. Revisiting topics as we progress will reveal new insights especially in areas we may have initially glossed over.

📈 I’ll share more as I progress. Let’s discuss and grow together with DAX!

Chapter 2 – Introducing DAX

⚡️Understanding the Context:
Calculated columns are static and calculate in model tables row-by-row, ideal for pre-computing complex evaluations at refresh time and establishing segmented group labels in dimension tables (i.e., defining customer size, age group, etc.).

⚡️Measures on the other hand are dynamic, re-compute at query time based on report user interaction, and well suited when working with aggregated columns.
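
As a minimal sketch of the contrast (table and column names here are hypothetical, not from the book), the same kind of logic looks quite different as a calculated column versus a measure:

```dax
-- Calculated column on the Customer dimension: computed row-by-row at
-- refresh time and stored in the model; ideal for segment labels
Customer Size = IF ( Customer[Annual Revenue] >= 1000000, "Large", "Small" )

-- Measure: recomputed at query time under the current filter context
Total Sales = SUM ( Sales[Amount] )
```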

⚡️Performance Considerations:
Calculated columns must be considered carefully. On one hand they are an easy way to add new data to your model, especially when coming from Excel. However, calculated columns consume memory and will negatively impact performance over large models. Best practice is to avoid them in Power BI and push new columns as close to the original data source as possible.

⚡️Measures generally offer better performance in large datasets. Measures calculate results on the fly at query time, utilizing the in-memory engine of the model. There are many reasons to favor using measures over calculated columns.

⚡️Using Variables:
Variables in DAX offer enhanced readability and maintainability of code, along with performance improvements. By storing a value or expression result once and reusing it multiple times in your calculations, variables reduce the number of times the DAX engine needs to compute the same expression, leading to faster and more efficient data models.
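
A short sketch of the pattern (measure and column names are hypothetical):

```dax
Margin % =
VAR TotalSales = SUM ( Sales[Amount] )    -- evaluated once, reused twice below
VAR TotalCost  = SUM ( Sales[Cost] )
RETURN
    DIVIDE ( TotalSales - TotalCost, TotalSales )
```

Without the variables, TotalSales would be written (and potentially computed) twice in the RETURN expression.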

⚡️Iterators:
At this point the book focuses more on measures that simply aggregate a single column, however measures can also compute an expression prior to aggregating the final column. This is useful when needing a row-by-row evaluation (such as quantity x unit price) without having to create a new physical column in your table. This topic is a sneak peek to discovering more flexibility in DAX.
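
The classic example of this idea, sketched with hypothetical column names, is the quantity-times-price pattern:

```dax
-- SUMX evaluates the expression row-by-row over Sales (row context),
-- then sums the results; no physical column is added to the model
Sales Amount = SUMX ( Sales, Sales[Quantity] * Sales[Unit Price] )
```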

⚡️Optimizing Error Checking:
Using error handling functions can be a graceful way to intercept and communicate errors, however there can be significant performance cost, as the DAX engine can’t use optimization paths when errors exist.
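
Division is the common case: wrapping a division in IFERROR forces the engine down the slower error-handling path, whereas DIVIDE handles the divide-by-zero case internally. Measure names below are hypothetical:

```dax
-- Slower sketch: error handling blocks engine optimization paths
Margin % Slow = IFERROR ( [Margin] / [Total Sales], BLANK () )

-- Preferred: DIVIDE returns BLANK() (or an alternate result) on division by zero
Margin % Fast = DIVIDE ( [Margin], [Total Sales] )
```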

Chapter 3 – Using Basic Table Functions

I’m discovering insights that are transforming the way I use basic table functions in Power BI. 🚀

🔍 Mastering FILTER:
Discovering that FILTER can act as both a table function and an iterator. When you layer filters, the order of nesting matters and has an impact on performance!

🔗 Unlocking the Power of ALL:
The name says it all! ALL is the key to bypassing filter contexts and getting to the unfiltered truth of data. Whether it’s the entire table or distinct column values, ALL has got you covered.

ALLEXCEPT is your friend when columns may be added to a table in the future, since it removes filters from every column except the ones you list. However, the EXCEPT in this function does not mean it explicitly restores a filter. In fact, ALL* functions only ignore or remove filters. The EXCEPT portion only preserves filters that already exist on the listed columns in the report visual. If the visual does not filter those columns, ALLEXCEPT acts like ALL or the REMOVEFILTERS CALCULATE modifier.
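
A sketch of the behavior (table and column names are hypothetical):

```dax
-- Removes every filter on Customer except any existing filter on
-- Customer[Country]; if the visual has no Country filter, this behaves
-- the same as ALL ( Customer ) / REMOVEFILTERS ( Customer )
Country Sales =
CALCULATE ( [Total Sales], ALLEXCEPT ( Customer, Customer[Country] ) )
```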

One nuance of ALL is that it can be used either as a table argument or as a CALCULATE modifier. As a table argument (CALCULATETABLE(ALL(Products), …)), ALL does not remove filter context; it ignores it.

If we explicitly add a CALCULATETABLE filter argument over Product to restore a filter, even though it’s a valid filter that would normally respect filter context, the ALL table function will ignore the filter.

However, when ALL is used as a CALCULATE modifier, the behavior is to modify the filter context and act as REMOVEFILTERS. This time we can add an explicit CALCULATE filter over Product (i.e., using VALUES) and restore the filter.
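
The two roles can be sketched side by side (names are hypothetical):

```dax
-- ALL as a table argument: the result ignores filter context entirely,
-- including the Color filter that CALCULATETABLE itself applies
EVALUATE
CALCULATETABLE ( ALL ( Product ), Product[Color] = "Red" )
```

```dax
-- ALL as a CALCULATE modifier: removes Product filters (like REMOVEFILTERS),
-- after which the VALUES filter argument restores the Category selection
Category Sales =
CALCULATE (
    [Total Sales],
    ALL ( Product ),
    VALUES ( Product[Category] )
)
```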

🛠️ Tackling Blank Rows:
Blank rows in your sales visual? No worries. While VALUES & DISTINCT both respect filter context, DISTINCT ignores missing dimension values that cause blank rows in your report. Use VALUES to reveal and rectify those pesky missing dimensions sooner rather than later.
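
A quick way to see the difference, sketched with a hypothetical key column:

```dax
-- VALUES includes the blank row DAX adds for invalid relationships
Customers (VALUES) = COUNTROWS ( VALUES ( Customer[CustomerKey] ) )

-- DISTINCT ignores that blank row, hiding the data-quality issue
Customers (DISTINCT) = COUNTROWS ( DISTINCT ( Customer[CustomerKey] ) )
```

If the two measures differ by one, a fact row is pointing at a missing dimension member.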

It is also worth noting here that full table references (i.e., a table used in FILTER as a filter argument of CALCULATE) also ignore the blank row, or the potential blank row, in a dimension table. This becomes important when running into circular dependency issues after creating a relationship involving a calculated column that depends on the source dimension table.


📊 Scalar Values Simplified:
When your data needs to be scalar, VALUES can still be an option as a scalar expression if used with care. Whether you’re counting rows, verifying single values, or creatively merging multiple values into one with CONCATENATEX, there are multiple ways to test for a scalar value ensuring that VALUES returns a single measure result.
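
One sketch of that defensive pattern (column names are hypothetical):

```dax
Selected Color =
IF (
    HASONEVALUE ( Product[Color] ),
    VALUES ( Product[Color] ),    -- safe as a scalar: exactly one value survives
    CONCATENATEX ( VALUES ( Product[Color] ), Product[Color], ", " )
)
```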

✨ The magic of ALLSELECTED:
ALLSELECTED is nothing short of wizardry in function form, seemingly aware of both report visuals and slicer selections to maintain the integrity of your ratios. It’s the ace up your sleeve when report users want to see percentage breakdowns specific to their external report selections.
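
The classic percentage-of-selection pattern, sketched with a hypothetical base measure:

```dax
% of Selection =
DIVIDE (
    [Total Sales],    -- numerator: the current visual row
    CALCULATE ( [Total Sales], ALLSELECTED () )    -- denominator: respects
)                                                  -- slicers, ignores the row
```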

Chapter 4 – Understanding Evaluation Contexts

📊 This chapter prompted me to dust off this blog post I wrote to introduce students to this essential topic. 🚀 Whether you’re a beginner or years into your journey, grasping Evaluation Context is the key to confident data modeling.

⚡️Power BI works best when tables relate to each other using a star schema data model. A star schema in its basic form has one transaction table (fact table) surrounded by multiple dimension tables that allow us to isolate and describe different aspects of our report… the who (Customer), what (Product), when (Date characteristics), and where (Branch locations) related to sales. Dimensions give us the sales location, product vendor, month and year of sale etc., that define total sales for each and every data-point on our report.

⚡️DAX works with evaluation context over the model to calculate results on the fly for our report users. DAX is more than just a simple calculator, it also helps bring flexibility and features to our report users, helping them better engage and discover meaningful insight.

⚡️Evaluation Context involves two widely used environments within Power BI called Filter and Row Context. Both serve different purposes and are often combined in the same evaluation to produce accurate results.

In more advanced examples that include an iterator function in our measure, context transition may replace the outer CALCULATE filters with nested inner CALCULATE filters. Since nested inner CALCULATE filters prevail over competing outer filters, we need to be aware of how this evaluation works in the presence of competing CALCULATE filters. See the arbitrary shaped set example: SUMX(Month… (See Ch. 10 – Working with the Filter Context).

⚡️Filter Context ‘eliminates’ rows: When we select a single value on our report visual, this creates an automatic query that returns a table of rows that meet the filter criteria… rows that don’t meet our filter criteria are eliminated from this queried table. The table result serves as a table filter over our dimension table (the one side of the relationship), followed by a corresponding filter over the fact table (the many side of the relationship).

Those filters reduce the entire model to only keep table rows necessary to satisfy that part/point of the visual.

Finally, DAX evaluates the expression over surviving rows returning results back to our visual!

It is impressive to think that this exercise occurs on every click of a new filter selection (Slicer or visual value) on the reporting canvas.

Seeing is Believing!

Filter Context is an amazing feature for report driven query and results; it just happens automatically! However Filter Context also represents an opportunity for developer driven results that allow us to build in extreme flexibility and features into our report.

Before going step by step with CALCULATE, let’s walk through the implicit side of CALCULATE and its direct relationship with creating filter context:

In order to take full advantage of the developer side of filter context, we need to understand how CALCULATE works to have confidence that our business requirements combine with our model to bring accurate results back to our report readers.

Let’s take a more detailed look at how filter context works by using CALCULATE:

⚡️Filter context is great for column filters and aggregations, but what if we need more than just a simple filter and vertical single column aggregation? What if we need to first multiply sales quantity times unit price horizontally within a table row? We have two options, add a physical calculated column, or use an iterator function within our measure. 

⚡️Row Context ‘evaluates’ rows: Anytime we add a column to a physical table, the default evaluation context is row context. We learned that filter context filters tables in our model. Row context splits apart a table, creating individual isolated header-row combinations for each row in our table. This allows DAX to evaluate each single row horizontally between column references.

I think of calculated columns as the ‘Excel solution’… It’s easy, familiar, and we can see it working with our own eyes 👀. Excel knows by default to iterate our math function one row at a time… this is row context. In Power BI, just adding a new column using DAX seems like an easy solution, and it is; however, this approach is not best practice and comes with a cost to model size and performance 🛑.

⚡️The preferred solution to our row-by-row mystery is to use an iterator function like SUMX. SUMX combines two steps into one measure.

1. First we evaluate columns horizontally row-by-row… think quantity times price just like in Excel, except the new column is virtual in memory.

2. Next, SUMX takes that new virtual column and aggregates a vertical total.

SUMX is the best of both worlds, it evaluates our records row-by-row and aggregates the result. The best part of SUMX is that we avoid the need to add unnecessary columns to our model!

A good learning example is using SUMX with a filtered table followed by a simple SUM aggregation over a column. You may be surprised that the result sums the entire table column (not just the visible rows in the filtered table). This is because SUM does not respect row context, it respects filter context.

What about our filtered table? It does iterate over just the filtered rows, however the SUM aggregation does not respect row context, nor does it create a filter context whatsoever; thus it aggregates (for each row) the entire ‘unfiltered’ column.

How do we fix this? Remove SUM and just use a table column reference that does respect row context, or create a filter context by wrapping a CALCULATE around the SUM expression to filter over each iterated row triggering a context transition.
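
The surprise and both fixes can be sketched as follows (table and column names are hypothetical):

```dax
-- Surprising: SUM ignores row context, so every iterated row sums the
-- whole column as filtered by the report, not the current row
Wrong Total =
SUMX ( FILTER ( Sales, Sales[Quantity] > 1 ), SUM ( Sales[Amount] ) )

-- Fix 1: a plain column reference respects the current row
Fixed Total 1 =
SUMX ( FILTER ( Sales, Sales[Quantity] > 1 ), Sales[Amount] )

-- Fix 2: CALCULATE triggers context transition, turning each iterated
-- row into a filter before SUM evaluates
Fixed Total 2 =
SUMX ( FILTER ( Sales, Sales[Quantity] > 1 ), CALCULATE ( SUM ( Sales[Amount] ) ) )
```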

When there are multiple row contexts involved, the outer row context is hidden from the inner. This relates to the use of variables and the EARLIER function that was used prior to variables. I view the inner row context as independent from the outer row context. Variables and EARLIER, allow you to bring in the external row context, unaffected by the current inner row context.
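
A ranking calculated column is the usual illustration. Here the outer row context (the current customer) is captured before FILTER opens its inner row context; the Customer[Total Sales] column is a hypothetical example column:

```dax
-- Calculated column on Customer: VAR captures the outer row's value
-- before the inner FILTER row context begins
Sales Rank =
VAR CurrentSales = Customer[Total Sales]
RETURN
    COUNTROWS ( FILTER ( Customer, Customer[Total Sales] > CurrentSales ) ) + 1

-- Legacy equivalent: EARLIER reaches back to the outer row context
Sales Rank (EARLIER) =
COUNTROWS (
    FILTER ( Customer, Customer[Total Sales] > EARLIER ( Customer[Total Sales] ) )
) + 1
```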

⚡️Need to filter a table evaluating two columns? Say filtering female customers living in Oregon? The FILTER function is also a row-by-row iterator that results in a filtered table rather than a total aggregation.
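
The two-column example from above, sketched with hypothetical column values:

```dax
Oregon Female Sales =
CALCULATE (
    [Total Sales],
    FILTER (
        Customer,
        Customer[Gender] = "F" && Customer[State] = "Oregon"    -- row-by-row test
    )
)
```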

This is a basic introduction; the book goes far beyond this to help us really understand how the model interacts with our measures to produce visuals.


Chapter 5 – Understanding CALCULATE and CALCULATETABLE

🚀 This chapter is the heart of DAX.

⚡️I view CALCULATE as a container that holds a collection of filters, both original and new, that filter over our model. Understanding how CALCULATE operates over a multi-step sequence including both seen and unseen filter elements is the key to progressing in DAX.

Filter Context Terms:

1. Original (Initial) Filter Context

  • Definition: A set of implicit filters applied automatically through labels visible on the report visual, as well as user selections (slicers, cross-filtering from other visuals, etc.).
  • Key Point: You don’t typically write DAX code to create the original filter context—it is established automatically for you based on the report design and user interactions.
  • Effect: DAX queries use these filters behind the scenes (unseen) to return relevant data for the current visual on the report canvas.

How does CALCULATE create implicit unseen filter context?

  1. Values shown on our report canvas require two DAX attributes: an aggregation (scalar value) and filter context. There are no exceptions.
  2. If we drag a column reference onto our report canvas, an unseen measure aggregation is assumed by DAX behind the scenes, i.e., the column ‘SalesTable[Sales]’ brought onto the canvas becomes SUM(SalesTable[Sales]). This is an assumed implicit measure.
  3. All measures, explicit, implicit, or referenced, include an automatic CALCULATE wrapped around the outside. Our implicit measure from above, SUM(SalesTable[Sales]), now looks like this: CALCULATE(SUM(SalesTable[Sales])).
  4. CALCULATE automatically pulls in all currently visible model values as seen in the current context. On the reporting canvas (the filter context environment), all report canvas rows, columns, and selected outside slicer/filter values turn into table filters that combine to filter those same columns in our model tables.
  5. If in a model table (Row Context) CALCULATE similarly pulls in all values visible from each isolated row, including row values from expanded tables. The end result is a reduced filtered model used to aggregate a result for each specific visual cell or table row. I think of these filters as sub-queries filtered over the model.
  6. Measures filter and aggregate over our report canvas (or over each row in a table). We know that the unseen and automatic CALCULATE is directly responsible for generating filter context for each value on our visual. CALCULATE is filter context!

2. Current Filter Context

  • Definition: The “current” filter context starts with the original filter context; then we add the developer’s explicit CALCULATE modifiers, explicit added filters (Level 4 filters), and filter modifiers such as KEEPFILTERS, all through the use of the CALCULATE function.
  • CALCULATE:
    • Often called the developer’s “Swiss Army Knife” of DAX.
    • Lets you alter, add, and protect the original filter context in a logical sequence. The final result is the new filter context.
    • Example: CALCULATE(SUM(Sales[Amount]), FILTER(...)) would add or alter the original filters during the evaluation of the measure.
  • Effect: Any calculations or aggregations inside CALCULATE are performed under this current filter context.

3. New Filter Context

Key Point: This is the net result of all filter layers from above, including: 1. report filters, 2. table row filters, 3. developer-altered filters, 4. developer-added filters, and finally 5. protected filter modifiers.

Definition: The final set of filters after the original context plus all developer explicit filters introduced within CALCULATE (or other DAX expressions that modify filters).

Returning to the Visual: Once the measure filters the model, the aggregation (Sum, Min, Max,…) result is handed back to the report canvas. However, the result is returned to the original filter context structure, all original labels still remain even though the resulting values may be different.

CALCULATE and Filter Context:

What makes CALCULATE tricky is the need to be aware of its presence and behavior even when it’s not visible in our DAX or on the reporting canvas. As we explore CALCULATE we need to remember that each type of explicit filter level we add will interact with our original unseen filters in a logical order. To fully understand and progress in DAX, we need to be familiar with this logic.

“The key to unlocking the full power of the language is all in mastering evaluation contexts.” – Excerpt from The Definitive Guide to DAX

Here are the five levels of CALCULATE and some notes I use to troubleshoot and correct unexpected results in measures:

Five Levels of CALCULATE

Pre-Evaluation:

  • Before CALCULATE starts evaluating, a snapshot of the original filter context is taken from the report canvas.
  • All explicitly written CALCULATE filters are evaluated against the original filter context. These explicit filters will be the final CALCULATE step to remove and replace / modify original filters unless KEEPFILTERS is used.
  • This pre-evaluation ensures an efficient and accurate evaluation in which each filter level prevails appropriately over the prior levels in creating the new filter context.

Level 1: Attracts Original Report Filters

  • The original filter context is made up of row and column label filters as seen on the report itself, including user slicer selections, and other report level filters.

Level 2: Attracts Original Table Row Filters

  • Next, CALCULATE performs context transition in the presence of row context (i.e., SUMX). Context transition converts each current table row (including expanded table rows) into an isolated individual row to iterate over. This happens automatically and is called equivalent filter context.
  • In the case of the table used in row context (i.e., SUMX(Sales, …)), this table is filtered by the original filter context; explicit CALCULATE filters do not filter this SUMX table directly, but instead choose rows to iterate over based on the filters defined in CALCULATE.
  • CALCULATE then grabs all visible values (including expanded columns from related tables) and turns them into implicit (unseen) filters, which filter over the table rows themselves. This transformation is essential for row context to be converted into something CALCULATE can use to filter the table in the model.
  • Since context transition happens after the evaluation of the original filter context, the filters created in this step will remove and replace original column filters. This can lead to issues when dealing with an arbitrary shaped set (i.e., Month Name and Year fields) in grand totals, when a single iterated month name is filtered with multiple years.

Whenever CALCULATE or a measure reference is present, I envision CALCULATE attracting or drawing in every visible Level 1-2 filtered value in its current evaluation.

From the report environment, CALCULATE draws in every column, row, and slicer label selection in order to combine values as Level 1 filters over the model. This behavior is automatic and can be altered or added to later in our steps, by the developer using CALCULATE in our explicitly written measure!

This also means that row context draws in visible isolated row values visible for each current row as Level 2 filters.

Level 3: Alter Original Filters (CALCULATE Modifiers)

  • CALCULATE modifiers such as REMOVEFILTERS, ALL, USERELATIONSHIP, and CROSSFILTER are now applied. These alter the existing filter context, which includes both the original context from Level 1 and any new context later generated by context transition in Level 2.
  • It is important to note that relationships and rows depend on the tables themselves… think of the FROM clause in SQL where relationships connect tables. USERELATIONSHIP needs a table reference to work; it does not operate on a scalar function like RELATED.
  • RELATED is a scalar expression, but it depends on table rows with a regular relationship. If used as a scalar expression in CALCULATE, you will also need a model table reference for it to work.
  • Modifiers can remove or adjust filters, overriding earlier filters if necessary. For example, REMOVEFILTERS(Product) would remove any existing filters on the Product table, including those that originated in the original filter context, or new filters generated by context transition.

Level 4: Add New Explicit Filters

  • Explicit filters (evaluated during pre-evaluation) are now applied to prevail over the original filters, context transition, and any CALCULATE modifiers that may have altered the original filter context. It is good to note that multiple simple filters will combine to form a crossjoin of all possible combinations. This is not so when dealing with ‘auto-exist’ or valid arbitrary shaped sets.
  • For example, if REMOVEFILTERS in Level 3 cleared filters on all columns of the Product table, and an explicit filter like VALUES(Product[Color]) was also written in Level 4 of CALCULATE, this explicit filter will be applied after the evaluation of modifiers and prevail over the columns previously modified by REMOVEFILTERS.
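
The Level 3 / Level 4 interaction can be sketched as a single measure (table and measure names are hypothetical):

```dax
Color Sales =
CALCULATE (
    [Total Sales],
    REMOVEFILTERS ( Product ),      -- Level 3: clears all Product filters
    VALUES ( Product[Color] )       -- Level 4: restores the original Color
)                                   --          selection afterwards
```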

Level 5: Protect Filters (KEEPFILTERS)

KEEPFILTERS is a filter modifier that applies at the very end to ensure that the filter arguments it wraps (Levels 2-4) do not override the original filters.

Under KEEPFILTERS, the default behavior of a simple column filter (which normally removes and replaces the original filter through an unseen ALL over the column) simply does not happen.

Think KEEP(these original)FILTERS; the original and new explicit filters are combined together in an AND condition, rather than the new filter replacing the original.

KEEPFILTERS is a barrier that prevents the explicit CALCULATE filter’s unseen ALL (or SUMX table values) from escaping to remove and replace the original existing filters in the model.
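
The difference is easiest to see on a visual that slices by color (names below are hypothetical):

```dax
-- Without KEEPFILTERS: the Red filter replaces any existing Color
-- selection, so even a "Blue" visual row shows Red sales
Red Sales = CALCULATE ( [Total Sales], Product[Color] = "Red" )

-- With KEEPFILTERS: "Red" is ANDed with the existing Color selection,
-- so a "Blue" visual row returns blank instead of Red sales
Red Sales Kept =
CALCULATE ( [Total Sales], KEEPFILTERS ( Product[Color] = "Red" ) )
```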

CALCULATETABLE()

CALCULATETABLE is similar to CALCULATE in that it allows you to add/alter filter context prior to evaluating an expression. The difference, as the name implies, is that the result is a table rather than a scalar value.
A helpful pattern combines CALCULATETABLE, ADDCOLUMNS, and SUMMARIZE over the Time Table to create a table to iterate over and define the MAXX sales month.

ALLSELECTED is used to open our filter context back up to the full range of months defined in the original filter context.

The result defines the max sales month. By changing MAXX to MINX we get the low sales month for our visual.

High Sales =
VAR __table =
    CALCULATETABLE (
        ADDCOLUMNS (
            SUMMARIZE ( TIMETABLE, TIMETABLE[Mo Year], TIMETABLE[MoSeqNum] ),
            "@Sales", [Total Sales]
        ),
        ALLSELECTED ()
    )
VAR __Sales = [Total Sales]
VAR __high = MAXX ( __table, [@Sales] )
VAR __result = IF ( __Sales = __high, __Sales, BLANK () )
RETURN
    __result

The Many Nuances of DAX

Learning the core fundamentals of DAX requires learning from specific examples that contain complexities. As I watch SQLBI videos, I start zeroing in on common recurring fundamentals that help me better understand what it means to think like DAX.

Common topics include:

  1. Circular Dependency – There is a lot packed in this video: circular dependency, expanded tables, and blank rows. At the core is learning how table functions, filters, context transition, and expanded tables all play a role in the potential for a circular dependency error in DAX (see the bulleted topic below, including the SQLBI video below).
  2. KEEPFILTERS & Arbitrary Shaped Sets – In this video we appreciate the side effects of mixing an implicit unseen arbitrary shaped filter set (from a slicer) with an explicit table filter using VALUES(). We quickly discover that the competing, replacing behavior of the VALUES() table breaks the original implicit filter set, leading to unexpected results!
  3. Auto-Exist – This video deals with an optimization feature called auto-exist that leads to an unexpected result when combined with REMOVEFILTERS. It helps us better understand how filters from multiple columns behave when they exist in the same table vs. coming from multiple tables.
  4. TREATAS as an alternative to USERELATIONSHIP – This approach requires REMOVEFILTERS over the original filter context.
  5. Context Transition – This video also uses the arbitrary shaped filter set example, and is packed with important DAX concepts that help solidify how DAX really works seen or unseen.

The common themes from above have led me to ask three questions when writing DAX:

  1. Evaluation Context: Do I have or need filters (CALCULATE/Measure reference), an iterator, or both? Original filter context comes from report values; Equivalent filter context comes from values in each isolated row. Both automatically create unseen implicit filters.
  2. Explicit Filters: Have we added explicit tables to filter (VALUES/ALL), or set column filters (ALL) building on unseen implicit Original or Equivalent filter context? Do they combine, compete (replace), or return unexpected results?
  3. Expanded Tables: Do whole table filters or context transition call on unnecessary unseen filters that slow performance, return unexpected results, or raise an error (i.e., a circular blank row dependency)?

CALCULATE and Row Context:

Since CALCULATE automatically pulls in user selections and report labels in creating the original filter context, we can use this logic to explain a similar behavior in calculated columns created in model tables.

Even in a row context environment, CALCULATE kicks in automatically and implicitly to look for visible table values (including expanded columns) and converts them into table filters. On the reporting canvas we called this Original Filter Context. In the row context environment we call this Context Transition, which creates an Equivalent Filter Context.

In this video Marco discusses complications from context transition over rows that includes expanded table columns. Marco also provided in the Q&A a nice way to envision how equivalent filter context might look spelled out in a DAX filter:

The only difference is that on the report canvas, CALCULATE uses column values visible over all model tables in creating filters, propagating these filters over the entire model.

In a table row context CALCULATE limits its filters to column values visible from current table rows, including unseen expanded table columns. The result is context transition that creates an ‘equivalent filter context’ over the current table.

A good basic example to solidify our understanding of context transition is to use CALCULATE with COUNTROWS in a calculated column in our table. What happens if you have duplicate rows? You should notice the row count is doubled up!

This is by design, as CALCULATE has transitioned a unique set of row values into a filter set applied over the table. If two rows meet that full filter set (all column values match), then we get multiple rows to count. If there are two identical rows, then each row ends up filtering itself and its corresponding match.
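
The duplicate-row behavior above can be reproduced with a one-line calculated column (assuming a hypothetical Sales table):

```dax
-- Calculated column on Sales: context transition turns the current row's
-- column values into a filter set applied over the whole table
Row Count = CALCULATE ( COUNTROWS ( Sales ) )
-- A unique row returns 1; two identical rows each match both rows
```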

New Filter Context:

  • Finally, after all five prior steps are complete, the new filter context is finalized. This new context includes:
    • The original filters that survived,
    • Any changes made through context transition, CALCULATE modifiers, added filters, and filter modifiers.
    • The re-applied explicit filters from CALCULATE.
  • This final filter context is used to evaluate the measure or expression inside CALCULATE (e.g., SUMX(Sales, Sales[Qty] * Sales[Price])), and the result is returned to the visual under the original frozen filter context, ensuring consistency.

Although our new filter context may evaluate total sales over ALL customer names, that result will be returned to our original filter context containing a row for each individual customer name.

For example, if our visual includes the original filters of Bob, Sara, Jane, and John… regardless of the name displayed, our new filter context will prevail and return total customer sales for each name listed on the report visual. In this case, a table visual.

Unseen Filters and Outer CALCULATE: Behind the scenes, DAX operates as if there is always an implicit CALCULATE wrapped around every measure. This implicit CALCULATE dynamically applies filters from both the original and current filter contexts to evaluate the final result of the measure. Filters become a TREATAS over a table constructor that uses column references/values as a {virtual table} to apply filters across all relevant table columns in the model.

In addition to forming a CALCULATE with unseen filters, we also have the option of writing our own second visible ‘explicit’ CALCULATE. This second CALCULATE is nested as an inner function and allows us to further modify (ALL*, CROSSFILTER, USERELATIONSHIP) or add additional filters as table expressions.

When we see CALCULATE or a [Measure Reference], we should know that visible values are being collected as model filters, along with filters from context transition (equivalent filter context), and also not forget that implicit values related through expanded tables (when referencing whole tables) are all combined to form a complete set of model filters.

Avoid filtering whole tables, especially your (Sales) fact table! Developers must be aware of the potential of unexpected results and slower performance over large models that may include more columns than necessary.

⚡️ Circular Dependencies occur when DAX does not have a clear path for the order of evaluation. For example, if you used a measure to evaluate Margin Percentage (margin dollars / sales), then referenced that same measure to evaluate margin dollars (Margin Percentage x Sales), the two measures would reference each other and return a circular error, as neither can be evaluated first.

Another less obvious trap of circular reference is adding two calculated columns that both use CALCULATE. Within row context, CALCULATE pulls in the seen row values in the row as well as all unseen row values through the expanded table. The circular reference occurs when both columns using CALCULATE pull in the other as a ‘visible row value’ in building the filter context. Because both columns rely on each other, neither column can be evaluated first, creating a circular dependency that the data model engine cannot resolve.

There are a few ways to avoid circular errors:

1. Identifying a column as a unique key (the one side of a one-to-many relationship) signals to DAX that full context transition is not necessary to establish unique rows. The unique key column alone is sufficient to establish the filter over the table.

2. Use REMOVEFILTERS over the first CALCULATE to modify the filter over just that first CALCULATE.

3. If you know you have a unique column, use ALLEXCEPT to remove all filters over all columns except the filter over the unique column(s) in the table.
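Fix 3 might look like the following sketch, assuming a fact table Sales with a hypothetical unique key column Sales[OrderKey]:

```dax
-- Without a fix, two calculated columns that both use CALCULATE each pull
-- the other in as a 'visible row value' during context transition.
-- ALLEXCEPT keeps only the unique key filter, breaking the circle:
Row Share =
CALCULATE (
    SUM ( Sales[Amount] ),
    ALLEXCEPT ( Sales, Sales[OrderKey] )   -- the key alone identifies the row
)
```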

Check out this video by Marco Russo from SQLBI on complexities and solutions to circular dependencies. The issue is caused by the relationship between a fact table calculated column and the related dim table.

We run into circular errors when the filters on the fact side search for the correct dim value, but those filters by definition also trigger the dim side to search the fact table for the existence of a blank dim value. Correcting this involves eliminating any filter, seen or unseen, that by definition searches for and could produce a blank dim value.

The issue stems from context transition including expanded table columns, and the circular reference in validating the potential of also including a blank row in the set of expanded filters.

Functions that include blanks can cause a circular reference: VALUES, ALL, and simple column filters (predicate filters = an unseen ALL).

However DISTINCT, ALLNOBLANKROW, and the use of ‘full table references’ by definition ignore blank rows.

Note that even though the below solution contains unseen ALL in the predicate filters, the use of the whole table reference (level 3 filter element) supersedes or overwrites the ALL remove filter behavior!

We could also just address the full context transition that include dim columns that may also include a blank row. REMOVEFILTERS can be used to ignore these expanded columns.

Both approaches eliminate the need for the dim side to check the fact side for the blank row.

⚡️I’d recommend The Definitive Guide to DAX to everyone looking to fully understand and leverage the power of DAX!

Chapter 6 – Variables

⚡️Enhanced Readability: Variables break down complex expressions into understandable parts, making our formulas easier to read and understand.

⚡️Stepping Stones: I love the fact that I can start with basic elements or aggregations and build up to a more complex result using variables. Checking results along the way and debugging unexpected results is a game changer for me!

⚡️Performance: Variables calculate expressions once, no matter how many times they’re used, leading to potentially better performance. But also remember that variables evaluate where they are defined, not where they are used (⚠️ Beware of bringing outside variables into CALCULATE!)
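A small sketch of the "evaluate where defined" rule, with assumed Sales and Product tables:

```dax
Pct of All Colors :=
VAR CurrentSales = SUM ( Sales[Amount] )   -- evaluated with the visual's filters
VAR AllColorSales =
    CALCULATE ( SUM ( Sales[Amount] ), REMOVEFILTERS ( Product[Color] ) )
RETURN
    DIVIDE ( CurrentSales, AllColorSales )
-- Even if CurrentSales were referenced inside the CALCULATE above, it would
-- keep the outer value: variables evaluate where they are defined.
```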

⚡️Scope: it was interesting for me to learn more about the scope, or boundaries, of using variables in a single measure. A (VAR/RETURN) block surrounded by a closing parenthesis closes off access to the earlier VAR. However, without the closing parenthesis, the earlier VAR is accessible after the RETURN, as it's considered a continuation of a single scalar expression.

Other permissible ways to access a prior VAR after the RETURN include:

1. Using open parentheses to delimit two overlapping RETURN expressions:

First VAR
RETURN (
    First VAR - New VAR
    RETURN (
        New VAR   <- First VAR is still accessible here, since neither parenthesis has been closed.

2. VAR blocks nested through open parentheses:

Measure = SUMX ( table1,
    Outer VAR
    RETURN
        SUMX ( table2,
            Inner VAR
            Combo VAR = Inner VAR * Outer VAR
            RETURN Combo VAR
        ) <- Inner/Combo closed off
) <- Outer VAR closed off


⚡️Improved Efficiency: Streamlining complex calculations into variables saves time in both writing and processing expressions, making our analytics faster and more efficient.

Chapter 7 – Working with Iterators and with CALCULATE

Let’s peel back a few DAX layers to gain new perspective and insights🔎

⚡️We go beyond 'syntax sugar'✨ and learn that when we use the SUM function in a DAX measure, DAX really implements a SUMX function behind the scenes.

⚡️SUMX is a two-part iterator function that first creates a new row context over our table argument, followed by an evaluation of our expression row-by-row, resulting in a virtual column. Next, the SUM part of the expression kicks in to aggregate the result over our newly iterated column.

⚡️Iterators come with a cardinality, which corresponds to the number of rows being iterated over a table. We increase iterator cardinality by nesting multiple SUMX tables. For each row of the outer SUMX table, we iterate each related row contained in the table of the inner SUMX. If the two tables are not related, the overall cardinality can be much higher, up to the full Cartesian product of all possible row combinations from the two tables.

From a performance standpoint, only the inner iterator is optimized (created by the storage engine) with a more efficient query plan. The presence of the outer iterator creates temporary tables in memory. This results in higher memory cost and much slower performance when the outer iterator is large.

⚡️We can reduce the iterator cardinality by using a virtual DAX table to group totals into fewer rows, resulting in a table with a coarser (lower) granularity. This often brings better overall performance to the measure.
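One way to sketch this pre-grouping technique (table and column names are assumptions):

```dax
-- Iterate a grouped virtual table instead of the whole Sales fact table:
Product Margin :=
SUMX (
    ADDCOLUMNS (
        SUMMARIZE ( Sales, 'Product'[ProductKey] ),   -- far fewer rows than Sales
        "@Amount", CALCULATE ( SUM ( Sales[Amount] ) ),
        "@Cost",   CALCULATE ( SUM ( Sales[Cost] ) )
    ),
    [@Amount] - [@Cost]
)
```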

⚡️Whenever using iterators, we must be aware of the hidden complexities of high-cardinality outer iterators (such as the entire sales fact table), as well as hidden iterators (measure references): calling a measure in the expression carries a performance cost due to the context transition triggered by its hidden CALCULATE.

⚡️CONCATENATEX gives us a convenient way of displaying values included in the filter context on the face of a matrix or table visual.
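For example, assuming a Product[Color] column:

```dax
-- Show the colors visible in the current filter context, comma separated:
Selected Colors :=
CONCATENATEX (
    VALUES ( Product[Color] ),    -- one row per visible color
    Product[Color],
    ", ",
    Product[Color], ASC           -- optional sort for a stable display
)
```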

⚡️Iterators that return tables merge a table with an expression evaluated in row context. This chapter highlights the use of both ADDCOLUMNS and SELECTCOLUMNS. FILTER is also a common table iterator, frequently used to generate an intermediate table result.

⚡️As a note, other iterator table functions include SUMMARIZE, GENERATE, GENERATEALL, UNION, INTERSECT, EXCEPT, CROSSJOIN, NATURALINNERJOIN, NATURALLEFTOUTERJOIN, and ADDMISSINGITEMS.


📕 The Definitive Guide to DAX is a wonderful resource. It’s not the quick how-to guide by any means but learning the nuances of DAX together has been great for my progress 🎉 I encourage you to read along with us and comment on what you’re learning. Next week Ch. 8 – Time Intelligence!

Chapter 8 – Time Intelligence Calculations


⚡️Auto Calendar? Many options exist to create a custom and efficient date dimension table. Using the hidden auto calendar generated by Power BI is *bad practice*. Disabling this option reduces file size and points us towards modeling best practices.

⚡️Standard time functions include current period and prior period functions. Standard time functions represent a convenient and compact way to alter filtered time periods.

⏰ Here are some commonly used time intelligence functions:

DATESYTD = CALCULATE ( [Measure], DATESYTD ( Date[Date] ) ) – Returns a list of dates from the start of the year through the last date in the current filter context. DATESMTD and DATESQTD work the same way at the month and quarter level.

TOTALYTD = TOTALYTD ( [Measure], Date[Date] ) – Depends on an unseen CALCULATE. It is best practice to write the explicit CALCULATE form instead, so we stay aware of its presence and of the resulting context transition.

SAMEPERIODLASTYEAR = CALCULATE ( [Measure], SAMEPERIODLASTYEAR ( Date[Date] ) ) – This function is a specialized version of DATEADD (see below) that looks back to the same dates in the current filter context, shifted back one year. Unlike other time intelligence functions, it does not depend on a start-and-end date range, but rather on just those dates existing in the current filter context, shifted back one year.

DATEADD = CALCULATE ( [Measure], DATEADD ( Date[Date], -1, YEAR ) ) – Similar to SAMEPERIODLASTYEAR but more flexible, as it can also look back to the previous quarter, month, or day! The shift applies to the dates in the filter context rather than to the dates with sales, so if the current filter is a full year whose sales only run through June, DATEADD still returns the full prior year. The function also preserves filters at lower-granularity time periods (i.e. Month).

PARALLELPERIOD = CALCULATE ( [Measure], PARALLELPERIOD ( Date[Date], -1, YEAR ) ) – Similar to DATEADD; however, this function removes filters over lower-granularity time periods (i.e. Month) and shows the full shifted result on each row of the visual.

PREVIOUSMONTH = CALCULATE ( [Measure], PREVIOUSMONTH ( Date[Date] ) ) – Similar to PARALLELPERIOD; however, it only returns the previous month relative to the first month filtered in the current selection. If we have Qtr. 1 of 2024 selected, the first month in the filter is January 2024, and the previous month is December 2023. With the full quarter selected, we still only get a result for that single prior month.

⏰ Custom Time Intelligence

If you need more flexibility in your time intelligence calculations, you are able to write your own by combining a well defined time table along with filters that modify the current filters to establish a date range that meets our end goal.

A common pattern is: Prior Month Sales = CALCULATE ( [Sales], FILTER ( ALL ( TimeTable ), TimeTable[Month Sequence] = <current month sequence> - 1 ) ), where the current month sequence is typically captured first in a variable, e.g. VAR CurMonth = MAX ( TimeTable[Month Sequence] ).

The important highlight here is that custom time intelligence functions depend on first clearing filters placed on the time table. We need to establish new filters over prior or future time frames.

We clear filters over the whole table (ALL) to ensure that any and all time-related columns used in the report visual or slicer are also ignored in evaluating our measure, not just the [Month Sequence] column we are filtering in the example above.

It is also important to note that time intelligence functions that establish filters on the date column, when that date column is also the key on the one side of the relationship to our fact table, will automatically include an unseen ALL over our Time Table. However, if our measure clears a different date column, we are left to add this behavior ourselves.

Also, if we happen to have a relationship based on another column that is not a date (e.g. an integer date key), we can mark the entire table as a date table to trigger this same automatic ALL-table behavior.

A tip for solving for the first and last dates of a given report month shifted back one year is to solve for the current last day shifted back one year as: EOMONTH( MAX(Date[Date]) , -12), and first day as: EOMONTH( MAX(Date[Date]) , -13)+1; as there is no SOMONTH function. The start and end dates can be used with DATESBETWEEN 📅
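Putting the tip together as a measure (a [Sales] base measure and a Date table are assumed):

```dax
Sales Same Month LY :=
VAR LastDayLY  = EOMONTH ( MAX ( 'Date'[Date] ), -12 )      -- last day, 1 year back
VAR FirstDayLY = EOMONTH ( MAX ( 'Date'[Date] ), -13 ) + 1  -- first day, 1 year back
RETURN
    CALCULATE ( [Sales], DATESBETWEEN ( 'Date'[Date], FirstDayLY, LastDayLY ) )
```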

⚡️Comparing amounts by order date vs. ship date? USERELATIONSHIP is a powerful CALCULATE modifier that helps us effectively switch relationships, avoid duplicating our time dimension, and leverage one date field to return different results.
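A minimal sketch, assuming the active relationship uses Sales[OrderDate] and an inactive relationship exists on Sales[ShipDate]:

```dax
Sales by Ship Date :=
CALCULATE (
    [Sales],
    USERELATIONSHIP ( Sales[ShipDate], 'Date'[Date] )  -- activate for this measure only
)
```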

⚡️ Automatic ALL! When two tables are related on date, an automatic ALL kicks in ✨ and helps us shift time to return results based on the desired time frame. If your time dimension isn’t related on date, there are options to help maintain this automatic behavior.

⚡️DATEADD is a powerful and flexible time function that allows us to shift time over different timeframes using sophisticated logic.

⚡️Semi-additive measures provide a single ending balance at the customer level, while also providing a sum of all ending balances at the grand total level. LASTDATE and LASTNONBLANK are two options when creating semi-additive measures. LASTDATE returns values on the last date of the filtered period (including no values), where LASTNONBLANK iterates dates in search of the last date with a value. I had to read this a few times to fully appreciate the differences and how to return logical totals.
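The two options can be sketched over an assumed Balances fact table:

```dax
-- Returns the balance on the period's last calendar date (may be blank):
Last Balance :=
CALCULATE ( SUM ( Balances[Amount] ), LASTDATE ( 'Date'[Date] ) )

-- Iterates dates and returns the balance on the last date that has a value:
Last Balance NonBlank :=
CALCULATE (
    SUM ( Balances[Amount] ),
    LASTNONBLANK ( 'Date'[Date], CALCULATE ( SUM ( Balances[Amount] ) ) )
)
```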

⚡️Last but not least, if you choose to learn how to write full versions of time intelligence (CALCULATE, FILTER, and ALL) you will not only enjoy better control over the full time dimension table, you will also be able to write functions that work at the week level.

DAX definitely requires peeling back some layers to fully understand key concepts. Being able to learn with our SML group is a great way to work through the DAX bible 📚 I encourage you to read with us and share your thoughts in the comments.

Chapter 9 – Calculation Groups

🚀Calculation Groups to the rescue!! 🛟 Here are some notes I took along the way.

🛑Eliminate Measure Madness:
Calculation Groups allow us to define different CALCULATE patterns as Calculation Items in our model. Need to calculate prior year Sales, Cost, and Margin? 🧑🏻‍💻 Calculation groups define this ‘prior year’ pattern once for all measures rather than requiring us to write this multiple times over separate measures. When applying a common pattern to multiple base measures, Calculation Groups are here to help!

⚡️Power to report users!
The chapter also shows us how to add a second Calculation Group listing base measure references; allowing report users to toggle between single measures in the visual. Want to select and display multiple base measures at once? Field parameters (with measures) now allow report users to select different combinations of measures to be displayed on the same visual!

⚡️Enhanced Performance:
By centralizing common calculations, Calculation Groups can significantly improve our model’s performance. Instead of duplicating patterns across multiple measures, we can leverage it once and apply it everywhere, reducing complexity.

⚡️Streamlined Development Process:
For developers, Calculation Groups mean less time spent on repetitive tasks and more time focusing on delivering value. Using Calculation Groups ensures that our data model remains efficient and scalable. Adding Calculation Groups to our skillset can dramatically elevate our modeling and reporting capabilities.

⚡️Sticking with best practice:
A healthy appreciation for what lies just beneath the surface of DAX is critical to success. Respecting best practices can help us avoid wrong results. 1. Stick with single measure references in calculation items, and 2. Invest in learning about recursion and the power vs. the risk of introducing this in your report.

Calculation Groups will enable me to build more dynamic, efficient, and sustainable models. I’m thankful for the opportunity to learn and share together. If you’re working through this book and leveling up in DAX, I’d love to hear your thoughts in the comments! Next week chapter 10!

Chapter 10 – Working with the Filter Context

🚀 This is a great level-up chapter 📈. It fills in gaps for me with practical ways to test, modify, and appreciate filter context.

⚡️ IF/HASONEVALUE / SELECTEDVALUE:
If you need anything to evaluate differently based on the specific matrix granularity, regardless of whether it's a change in aggregation or just leaving a row level blank, IF/HASONEVALUE and the newer SELECTEDVALUE both do the trick 🪄
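Both patterns, with an assumed 'Date'[Month] column:

```dax
-- Classic pattern: only show a result at single-month granularity
Monthly Sales Only :=
IF ( HASONEVALUE ( 'Date'[Month] ), [Sales] )

-- SELECTEDVALUE combines the test and the retrieval, with an optional default
Selected Month :=
SELECTEDVALUE ( 'Date'[Month], "Multiple months" )
```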

⚡️ISFILTERED / ISCROSSFILTERED:
When one column is filtered, all neighboring columns are cross-filtered. Be careful not to fully depend on the TRUE/FALSE result… If there is a filter condition applied to the column, it may not result in actual table rows being filtered! 🤷🏼‍♂️ A better option is suggested involving row counts.

⚡️ VALUES vs. FILTERS:
VALUES produces unique values visible in the current filter context; FILTERS on the other hand returns values being filtered. Sounds similar… may produce different results.

⚡️ALLEXCEPT vs. ALL/VALUES:
Both techniques remove filters while keeping specific filters intact; however, there is an interesting nuance: ALLEXCEPT depends on the 'except' column being filtered directly in the original filter context (the filters from the visual).

From a model perspective, ALL* only works to remove, or keep just selected (EXCEPT), direct column filters within the original filter context. Although adjacent table columns might remain in the original filter context and cross-filter the column used in EXCEPT, that cross-filter is ignored, because the EXCEPT column is not directly filtered in the original filter context. This leaves just an ALL modifier removing all table columns.

Think of ALLEXCEPT as REMOVE-ORIGINAL-FILTER + KEEP-ORIGINAL-FILTER: like a KEEPFILTERS used to protect the chosen column filters from the table-wide removal.

The EXCEPT column escapes the removal, but ALLEXCEPT will not create or restore a filter that does not already exist in the original filter context.

To restore column filters not present in the original filter context, use ALL + VALUES as a more durable option. Unlike EXCEPT, VALUES returns the column's surviving values from the original filter context, including values narrowed by adjacent cross-filters.

ALL/VALUES gives us the best of both worlds. Because VALUES creates or restores filters, this method works with or without the filter existing on the visual. ALL/VALUES is favored and considered a more durable ‘filter restorer’ vs. ALLEXCEPT. If multiple columns are needed for VALUES, use SUMMARIZE(Table, Columns…).
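The ALL + VALUES pattern might be sketched like this, with an assumed Customer table:

```dax
Region Sales :=
CALCULATE (
    [Sales],
    ALL ( Customer ),            -- remove every filter over Customer...
    VALUES ( Customer[Region] )  -- ...then restore the regions visible in the
)                                -- original filter context (cross-filters included)
```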

⚡️ALL avoids context transition?
Ch. 5 taught us that context transition evaluates before the application of CALCULATE modifiers (i.e., ALL*). However, in this chapter we learn that using ALL in the presence of an iteration can preempt the need for evaluating context transition. A good example is shared using a calculated column in a product dimension.

⚡️ ISEMPTY:
Helpful in identifying dimensions (customers or products) that don’t have related fact sales. This is also efficient and preferable to scanning tables and counting rows that evaluate 0.
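For example (Customer and Sales tables assumed, related one-to-many):

```dax
-- Count customers with no related sales, without scanning for zero amounts:
Customers Without Sales :=
COUNTROWS (
    FILTER ( Customer, ISEMPTY ( RELATEDTABLE ( Sales ) ) )
)
```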

⚡️ Lineage:
I relate lineage to a home 🏠address for each model column. The values populated in the column also belong to this address. It’s the filtered column values, tied to the column, both tied to lineage, that allow filters to flow through our model.

⚡️TREATAS seems like magic: 🪄✨
TREATAS updates lineage of the first table expression with another column (or columns) in our model. If a physical relationship isn’t possible, TREATAS is a nice option. I like to say that we TREAT the table expression AS the filter over our table column(s).
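A common sketch: filtering the model from a disconnected table (SelColors is a hypothetical disconnected slicer table):

```dax
Sales Selected Colors :=
CALCULATE (
    [Sales],
    TREATAS ( VALUES ( SelColors[Color] ), Product[Color] )  -- transfer lineage
)
```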

⚡️Column filters are a list of values for one column only. Color IN { "Red", "Blue", "Green" } is an example of a column list of filtered colors.

A simple (predicate) filter is a set of column filters: Color IN { "Red", "Blue", "Green" }, Year IN { 2022, 2023, 2024 } is a set of column filters. Simple filters can also be combined in TREATAS as a set of tuples: TREATAS ( { ("Red", 2022), ("Red", 2023),… to satisfy all possible filter combinations.

To keep this simple, it is preferred to provide just two column filters rather than all combinations in a table constructor.

When using filters as table filters in CALCULATE, there are two types of filters,

Simple (predicate) filter: for column = value or column IN {values} conditions.

Table filter: Any condition that relies on a direct table expression i.e. VALUES(ColName) without specifying a predicate (=,>,<,IN…). See below when we discuss ‘Virtual Relationships’ using TREATAS, INTERSECT, and CONTAINS.

In general, if there are two competing simple column filters, one from the report, and another we explicitly add in CALCULATE, we know the explicit filter will overwrite the report filter by calling the FILTER(ALL…) table functions to remove and replace existing filters.

We’ve also learned that everything on the report canvas has CALCULATE wrapped around it. This includes measures in which we already write our own CALCULATE.

Two CALCULATES? Why would we need another outer CALCULATE? In my mind two CALCULATES provide separation between the original ‘outer’ and the ‘inner’ filters in my measure.

The outer filter is expressed in the form as a TREATAS table (Fig 10-23) and is made up from all the current filters coming from the report visual and slicer selections. The inner CALCULATE combines with the outer CALCULATE to form a new filter context.

For example, an ‘inner’ CALCULATE filter that specifies Color = “Red”, competing with an ‘outer’ filter Color = “Blue” will result in a new filter context containing Color = “Red”. Even in the presence of other colors on the visual… the report label may say Blue, or Green, but will always show sales of Red product.

Our inner Color = “Red” filter is modifying filters by first removing the column filter over product color, then next filtering the open color column back to Red. This happens for each cell or point in our report visual.

⚡️KEEPFILTERS is a filter modifier, and there is a best practice for when to favor wrapping it around column filters that would otherwise remove the outer filters and overwrite them with an alternate result.

With our example from above, we learned that Color = “Red” as a simple column filter in an inner CALCULATE will overwrite any like-filter coming from the outer (report) filter context. In the case of filtering over one simple column filter, we want to keep the inner overwrite behavior in our report as we want to compare red sales to all other colors regardless.

However, what if we are filtering sales, by brand, to identify only trendy color sales? If trendy sales included red, blue, and green; and we give our report user a slicer where they only choose the color green, we want our trendy sales to represent only green in trendy sales and no more.

In this case our inner filter is trendy red, blue, and green. Our user chooses the outer filter of green only. We don’t want the inner filter to overwrite, we want the logical ‘and’ of red, blue, and green to add to the selection of green. The only surviving color in this case is green. In this case the best practice would be to use KEEPFILTERS when filtering multiple values from a simple column filter.

A similar example can be found in filtering a range of sales price. Even though this is a range, it still should be considered a filter of multiple column values. Similarly, if we create a sales measure that returns all sales over $500 in price, then filter by brand, then further give our users a slicer to narrow price to a specific range (Say between $300 and $700), we’d expect that sales compared to ‘Sales over $500’ would be relevant and comparative to the current filtered sales transactions. This means, yet again, we should use KEEPFILTERS to retain both filters in our report.

The best practice is this: if you are using CALCULATE to filter one single value, such as the value of all red product, you likely want the modified result to flow through into your report. However, if you are filtering multiple values in a single column, or filtering over multiple columns, you will likely benefit from using KEEPFILTERS to retain the outer filters.
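The multi-value case above could be sketched like this (the color values are assumptions):

```dax
Trendy Sales :=
CALCULATE (
    [Sales],
    KEEPFILTERS ( Product[Color] IN { "Red", "Blue", "Green" } )
)
-- With a slicer set to Green, only Green survives the intersection;
-- without KEEPFILTERS, all three trendy colors would come back.
```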

⚡️Arbitrary Shaped Filters:
It’s worth learning how simple filter sets (over one or more columns) can represent an arbitrarily shaped filter. An arbitrarily shaped filter is a valid type of filter that depends on a hierarchy of two columns (often structured as a drill-down in a slicer); the hierarchy ensures each sub-selection only filters the rows within the range of its top-level filter. In DAX, this ensures that not all combinations of month/year are filtered. For example, if you filter December 2024 and January 2025, you want a filter over just those two months. This works in a hierarchy slicer; however, when the filter is rebuilt within a DAX function, you can lose the shape of the hierarchy and end up filtering all combinations (a crossjoin) of month and year.

The first complication comes from a matrix of year and month amounts. If we use CALCULATE to add an explicit (inner) filter of Year = 2023, the year portion of every implicit (outer) report filter is overwritten with Year = 2023! Our outer implicit report filter on month is no longer tied to its respective year; the year is overwritten, and the hierarchy filter set is broken.

In this case we should use a slicer that supports a hierarchy or establish a single column filter that includes both month and year.

Taken a step further, we could use an iterator measure over the month column, then select Oct-Dec 2023 and Jan-Mar 2024 in our slicer. Again, the inner filter wins as the month in the table iterator overwrites the outer report level month filter, again detaching and breaking the arbitrary filter set.

In this case we should wrap KEEPFILTERS around the VALUES(Month… table expression in the iterator to protect the outer filter (the slicer selection) from being overwritten by the inner CALCULATE/iterator and the resulting context transition. Note that the iterator's table argument is what becomes a CALCULATE filter during context transition; this is why we place KEEPFILTERS around the table expression.

In the case of an arbitrary shaped set, without KEEPFILTERS, context transition behavior is to remove the filter set over month, and apply a crossjoin between the overwritten month and all years selected (and unaltered) by the iterating VALUES(Month) table expression. This month captured in context transition, combined with all filtered years (crossjoined) breaks the original filter set, and may provide unexpected results at the total line.

The fix is to combine the filter set into a single month-year column field or add KEEPFILTERS to ensure we add the iterated month to the existing filter set, and don’t overwrite one of the columns that depends on the arbitrary shaped set.

⚡️Virtual Relationships:

Let’s review the two main types of filters in DAX:

Simple (predicate) filter: for column = value or column IN {values} conditions.

Table filter: Any condition that relies on a direct table expression i.e. VALUES(ColName) without specifying a predicate (=,>,<,IN…).

When using a direct table filter, it is important to know that a table filter from a dim table will not filter a matching or related column in another table unless the lineage is transferred using TREATAS (table), INTERSECT (table), or CONTAINS (scalar).

The best performance over the model comes from regular many-to-one relationships. However, there are times we need measure results that depend on new relationships that would introduce ambiguity into the model. In these cases we can avoid adding physical relationships and instead add virtual ones.

I think virtual relationships are very instructive from a model perspective and also help me better understand how Power BI queries the model.

TREATAS:

TREATAS is widely used under the hood in the DAX query providing filter context over the model. TREATAS can also be used in CALCULATE to transfer lineage between two separate table columns that wouldn’t normally share the same lineage.

I use TREATAS to compute ratios over only months where both the numerator (Purchase) and denominator (Total Production) exist. This ensures that the current ratio average on the Total line is accurate including only months with both entries present.

In the example below, I show in ‘Customer Wrong Total‘ how incomplete months gross up and mis-state the ratio on the total line. This error is due to missing production in the third month. However, when we use TREATAS in ‘Customer Correct Total‘, we limit our total average to only the months with both Sales and Production entered.

In the next example we use both INTERSECT and CONTAINS to filter a disconnected dim table that would otherwise cause ambiguity in the data model if connected with a physical relationship.

Both are options for creating a virtual relationship, but neither is as performant as TREATAS over large data sets.

INTERSECT:

INTERSECT returns a table. It reminds me of how the expanded-table concept illustrates the intersect between dim and fact tables. Click here to learn more about INTERSECT on DAX.GUIDE

CONTAINS:

CONTAINS returns a scalar value: TRUE when matches are found. Click here to learn more about CONTAINS on DAX.GUIDE
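Both options might be sketched over a hypothetical disconnected table SelCust filtering Customer (all names are assumptions):

```dax
-- INTERSECT keeps the lineage of its first argument, so the result
-- filters Customer[CustomerKey]:
Sales Intersect :=
CALCULATE (
    [Sales],
    INTERSECT ( ALL ( Customer[CustomerKey] ), VALUES ( SelCust[CustomerKey] ) )
)

-- CONTAINS returns TRUE per row, so it is used inside FILTER:
Sales Contains :=
CALCULATE (
    [Sales],
    FILTER (
        ALL ( Customer[CustomerKey] ),
        CONTAINS ( VALUES ( SelCust[CustomerKey] ), SelCust[CustomerKey], Customer[CustomerKey] )
    )
)
```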

Result of both INTERSECT and CONTAINS:


If you’re also reading The Definitive Guide to DAX, let me know in the comments how this book is helping you level up. 📚 Next week Ch11!

Chapter 11 – Handling Hierarchies
 
We learn right away that DAX does not support true parent/child hierarchies. This means we must perform steps that help flatten our P/C relationships using series of columns, measures, and new (to me) functions. In the process we build a structure that allows for proper aggregation over each level of our hierarchy in a matrix visual.
 
Let’s journey down this path together! (Pun intended 🙂)
 
⚡️Creating the PATH:
Our first step is to calculate a new column that identifies a combination of all relevant parent/child keys related to each person in our table. The book illustrates a simple yet powerful PATH function that helps us establish a base ‘Full Path’ column. By doing so, we are creating the foundation for a regular hierarchy that works in Power BI.
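A sketch of this first step and the next, assuming a Persons table with key and parent-key columns:

```dax
-- Calculated column: delimited list of keys from the root down to this row
Full Path = PATH ( Persons[PersonKey], Persons[ParentKey] )

-- Calculated column: resolve the name at level 1 (repeat for each level)
Level1 =
LOOKUPVALUE (
    Persons[Name],
    Persons[PersonKey], PATHITEM ( Persons[Full Path], 1, INTEGER )
)
```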
 
⚡️Defining Levels:
Next task is to look up each name found in our earlier Full Path column (PATHITEM paired with LOOKUPVALUE does the job). This establishes a different column for each level in our hierarchy. This layered approach gives us a first look at aggregated values in our matrix.
 
⚡️Filling in blanks:
Next, we modify our series of level columns to replace any blank levels with valid names. This involves finding the depth of our Full Path using PATHLENGTH and returning appropriate names by level. If we reach the end of our levels early, we simply fallback to the person occupying the last valid column level. This ensures we don’t show blank levels in our matrix visual.
 
⚡️Creating relevant aggregations:
At this point, our matrix may show repeating sales amounts based on repeated names as defined in our level columns from above. The fix involves using PATHLENGTH over our earlier Full Path column to establish a base ‘Node Depth’. Next, we create a measure using ISINSCOPE to help us determine our browse depth. Both are used in a condition that defines at what point we stop aggregating unnecessary levels.
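These steps might be sketched as follows (column names follow the hypothetical Persons table):

```dax
-- Calculated column: how deep this row's path goes
Node Depth = PATHLENGTH ( Persons[Full Path] )

-- Measure: how deep the user has drilled in the matrix
Browse Depth :=
IF ( ISINSCOPE ( Persons[Level3] ), 3,
IF ( ISINSCOPE ( Persons[Level2] ), 2,
IF ( ISINSCOPE ( Persons[Level1] ), 1 ) ) )

-- Measure: blank out rows drilled deeper than the node itself
Hierarchy Sales :=
IF ( [Browse Depth] > MAX ( Persons[Node Depth] ), BLANK (), [Sales] )
```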
 
⚡️Are we there yet??
Not quite 😂 We now discover that although the aggregation works at the parent level, we may want to include itemized amounts that also include the parent’s own contribution to sales. The parent’s own sales, combined with sales from children, properly itemize the expected grand total under each parent. To accomplish this, we need a final column that evaluates whether we are at a leaf node (a person with no children) or not.
 
⚡️Final act:
Our grand finale is to create a single measure that steps through a series of IF-then logic in determining exactly when to include and omit aggregations in our hierarchy.
 
I’d highly recommend this chapter as a step-by-step guide. I appreciate the book’s very small sample dataset and clear instructions that helped me mirror each step and result. Let me know in the comments how hierarchies help you create new insights. Also, I’m very thankful for our SML community group allowing us to learn together. Next week Ch 12, Working with Tables!

Chapter 12 – Working with Tables

⚡️ CALCULATE vs. FILTER:
CALCULATE applies filter context through table filters before evaluating the expression, whereas FILTER iterates over its table argument before the expression. They function differently; however, both commonly combine row and filter context to achieve the desired outcome.

Although we can add a measure or measure reference to any table function, combining context transition to produce an aggregation, there are specific table functions and patterns that are recommended over others. The explanations below point out each table function’s specific 💪strengths and ⚠️weaknesses.

⚡️ ADDCOLUMNS: A convenient iterator that enhances an existing table. It’s often used in conjunction with CALCULATE or a measure reference to facilitate context transition during evaluation. ⚠️Note that this table function produces a complete list of column values… If adding an aggregation, the column values with no aggregated results will not filter out but still show in the resulting table.
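A sketch of the pattern, assuming a hypothetical [Total Sales] measure — note that every color survives, even those with a blank result:

```dax
EVALUATE
ADDCOLUMNS (
    VALUES ( 'Product'[Color] ),   -- one row per visible color
    "@Sales", [Total Sales]        -- context transition evaluates the measure per color
)
-- Colors with no sales remain in the result with a blank @Sales value.
```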

⚡️ SUMMARIZE: Functions similarly to a pivot table in DAX, and is preferred for simply grouping by unique combinations of column values in a table expression. SUMMARIZE uses a base table, often the Sales fact table, creating row context over the expanded fact table and giving us the option to create filter context over multiple columns and model tables. While this ability exists and can produce a measure result, it is not recommended.

🛠️The work-around (pre SUMMARIZECOLUMNS with measures) was to use SUMMARIZE with ADDCOLUMNS, as adding a DAX expression directly within SUMMARIZE is strongly discouraged.
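The recommended pattern looks like this (hypothetical table and measure names): SUMMARIZE only groups; ADDCOLUMNS computes the measure afterward.

```dax
EVALUATE
ADDCOLUMNS (
    SUMMARIZE ( Sales, 'Product'[Category], 'Date'[Year] ),  -- grouping only, no expressions
    "@Sales", [Total Sales]                                  -- aggregation kept outside SUMMARIZE
)
```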

More Limitations of SUMMARIZE:

⚠️SUMMARIZE Was Not Originally Intended for Measures

SUMMARIZE was designed primarily to group columns or tables, not to compute measures. While the syntax does allow you to add expressions, doing so can cause ambiguous or unexpected contexts during context transition when measures reference table columns that are not explicitly in the group-by list.

⚠️ Unpredictable Filter Context for Measures

One of the biggest pitfalls, highlighted repeatedly in SQLBI materials, is that SUMMARIZE does not clearly define how the filter context is applied for the measure. Jeffrey Wang has noted that SUMMARIZE can create unexpected (row-to-filter) context transitions, especially when the measure depends on columns or relationships not part of the group-by set.

⚠️ Row Context vs. Filter Context Pitfalls

Because SUMMARIZE was introduced before the DAX language matured to handle more sophisticated patterns, it does not provide a straightforward method to control how row context transitions into filter context. Marco Russo and Alberto Ferrari frequently point out that functions like ADDCOLUMNS and SUMMARIZECOLUMNS give more explicit control over these contexts, reducing ambiguity.

⚠️ Only one root table

Because SUMMARIZE requires a root table, it works well with columns from dimensions that expand into one fact table; however, if you want to group by columns from two fact tables, you will run into limitations.

Performance Considerations

🚀From a performance standpoint, SUMMARIZE can cause DAX to evaluate measure expressions multiple times 🚩under multiple contexts, especially if the grouping is done in a way the DAX engine does not optimize well. SQLBI often shows that SUMMARIZECOLUMNS is more optimized for measure aggregation scenarios, and ADDCOLUMNS is more optimized for row-by-row computations, leading to better performance when used correctly.

⚡️SUMMARIZECOLUMNS: I view this as similar to SELECT in SQL. It is widely used to query a table result, including any expressions you may want to add. A February 2023 update allowed us to use this function in measures, as it now supports context transition to create filter context. As of June 2024, this function supports transition as well as external filters present when the measure is evaluated.

🏆SUMMARIZECOLUMNS as the Preferred Choice

SQLBI emphasizes that SUMMARIZECOLUMNS was introduced as a table query function (not for measures) to address many of the shortcomings of SUMMARIZE. As a query function, SUMMARIZECOLUMNS natively relies only on filter context, avoiding row context (a limitation for use in measures until 2024), and expects you to specify each grouping or filter column explicitly. This design makes measure evaluation more predictable.

This function has since evolved into use with measures. It applies non-empty semantics, an advantage over ADDCOLUMNS, filtering out any group-by value that produces no result in the added expression. Previously, replicating this elegant behavior with a measure required combining ADDCOLUMNS and FILTER over the unique VALUES where NOT ISBLANK ( [Measure] ), and then, after defining that table, adding the [Measure] reference again. ⚠️Although this works, it is less than ideal, as it requires two redundant evaluations of [Measure].
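A sketch of the two approaches side by side (hypothetical names). The first evaluates [Total Sales] once per group; the replication evaluates it twice — once inside FILTER and once to produce the column:

```dax
-- Non-empty semantics built in: one logical evaluation per group
EVALUATE
SUMMARIZECOLUMNS ( 'Product'[Color], "@Sales", [Total Sales] )

-- Pre-2023 replication: two redundant evaluations of the measure
EVALUATE
ADDCOLUMNS (
    FILTER ( VALUES ( 'Product'[Color] ), NOT ISBLANK ( [Total Sales] ) ),
    "@Sales", [Total Sales]
)
```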

Screen images from Jeffrey Wang – watch the full video here.

Advantage: In the two examples above, Test Measure 1 uses a traditional table filter that requires two evaluations of [Total Sales] as a model measure, whereas Test Measure 2 potentially has a performance advantage, calling a single reference to the model measure [Total Sales] via the local measure extension “@Total Sales”. This advantage comes from the ability of SUMMARIZECOLUMNS to now use row context with context transition, something not possible before, as SUMMARIZECOLUMNS was limited to filter context only.

Caution – Self-Value Semantics: Filters over the group-by column are limited to only those from the first table. The group-by behavior does not extend to columns in other related tables. Here, instead of “non-empty semantics”, we see cross-join-type behavior.

In the example below, Test 1 includes a measure expression that naturally includes non-empty semantics and indirectly filters the Product table, even though our group-by column comes from a separate Category dim table. Since Test 2 does not have a measure reference, its results are not filtered through the group-by values coming from the Category dim table. This limitation of group-by filters to just their own table is Self-Value Semantics.

Suggestion: Only use SUMMARIZECOLUMNS together with measures. If you just want to group columns (without a measure expression), use the SUMMARIZE table function.

Caution – Coalesce and Auto-Exist: The group-by column and Filter 1 are merged together (potentially creating an unnecessary performance expense) and coalesced (joined on non-blank values) with Filter 2 from the same Product table. This coalesce eliminates non-existing combinations through auto-exist when the filters come from the same table.

Filter 3 is an independent filter from the Territory table.

Fixing Auto-Exist: In this example we turn on the new Independent Value Filter Behavior, turning each filter (solid arrows) into its own independent filter that, together with the group-by column from the Product table, creates filter context over the Product table (dotted arrows). This new behavior is more intuitive and in line with users’ expected filter behavior for the SUMMARIZECOLUMNS table function.

The result of this measure under the old coalesce behavior returns only White, as the Mountain Bike Socks product only comes in a white option.

For more information on recent changes to filter behavior surrounding Auto-exist, see chapter 13 below.

⚡️ CROSSJOIN: Creates a Cartesian product, representing all possible combinations of column values. It’s particularly useful for generating a table of value combinations across different tables in our model. Whenever you need to filter two columns coming from different tables, you can use the table function CROSSJOIN to generate all possible combinations. Used as a CALCULATE table filter, we can iterate and FILTER to our desired result.

CALCULATE (
    [Expression],
    FILTER (
        CROSSJOIN ( ALL ( Customer[HasKids] ), ALL ( Product[Type] ) ),
        Customer[HasKids] = "Yes" && Product[Type] = "Toys"
    )
)

⚡️ROW is a table function designed to return just a single row. It requires pairs of column “names” and expressions to generate column names and measure values.

⚡️ However, the table constructor syntax is now used more often than ROW. Curly braces define a table object; parentheses encapsulate the columns of a single row; commas between the parenthesized sets separate rows.
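For example, these two expressions return equivalent single-row tables, and the last constructor adds a second row:

```dax
ROW ( "Index", 1, "Color", "Red" )     -- one row, two columns via ROW
{ ( 1, "Red" ) }                       -- the same row via the table constructor
{ ( 1, "Red" ), ( 2, "Blue" ) }        -- two rows, separated by a comma
```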

I particularly enjoyed the example in the book where two variations of an OR condition are created using SUMMARIZE and CROSSJOIN. The choice between the two depends on the table size and effort to scan. With a large fact table, SUMMARIZE might be slower than CROSSJOIN, which accesses dimension columns directly.

These insights are just a few highlights from this chapter. Other table functions like UNION, INTERSECT, EXCEPT, and SELECTEDCOLUMNS, with practical examples, show how versatile and efficient using tables can be in our model.

If you’re also learning from The Definitive Guide to DAX, share your thoughts in the comments! Next week we tackle Chapter 13 on Authoring Queries!

Chapter 13 – Authoring Queries

💡Most of the query functions in this chapter can be leveraged in measures.

⚡️ SUMMARIZE with ADDCOLUMNS and SUMMARIZECOLUMNS:
Two combinations that are invaluable for grouping and aggregating data, allowing for dynamic filtered measures. See the Feb 2023 update on SUMMARIZECOLUMNS available on DAX Guide. Also, beware of adding expressions within SUMMARIZE due to the complexity of mixing row and filter context inside this table function.

Per DAX.GUIDE: Until February 2023, SUMMARIZECOLUMNS did not support evaluation within a context transition at all. In products released before that month, this limitation made SUMMARIZECOLUMNS not useful in most measures – it was not possible to call a measure containing SUMMARIZECOLUMNS in any case of context transition, including from other SUMMARIZECOLUMNS statements.

🥇 TOPN:
Rather than returning all table columns with TOPN, I found it very practical to combine SUMMARIZE with TOPN, reducing the query (measure filter) to just those columns relevant to our filter result.
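A sketch of the combination as a measure filter (hypothetical names): SUMMARIZE narrows the TOPN result to a single column rather than the whole expanded table.

```dax
Top 3 Products Sales =
CALCULATE (
    [Total Sales],
    TOPN (
        3,
        SUMMARIZE ( Sales, 'Product'[Product Name] ),  -- just the column we need
        [Total Sales], DESC                            -- ranked by the measure
    )
)
```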

⚡️ GENERATE:
Helpful when needing to iterate over a row context while also respecting a higher-level grouping such as calendar year. The chapter walks through a nice example of solving for the top three selling products for each calendar year filtered.
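The shape of that pattern (hypothetical names): CALCULATETABLE turns the row context on each year into filter context, so the inner TOPN is evaluated per year.

```dax
EVALUATE
GENERATE (
    VALUES ( 'Date'[Calendar Year] ),               -- outer grouping: one row per year
    CALCULATETABLE (                                -- context transition: year filters the inner table
        TOPN ( 3, VALUES ( 'Product'[Product Name] ), [Total Sales] )
    )
)
```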

⚡️ Auto-Exist:
Filtering two columns from the same table results in both columns combining (binding) into a single multi-column filter. This is called auto-exist, and it is, by design, an optimization feature of SUMMARIZECOLUMNS used to pre-query the original filters prior to any downstream modification of filters. For example, it doesn’t make sense to filter the combination of Country = Europe and State = Washington when both columns come from the same table, as that combination does not exist there. This is why auto-exist exists: to eliminate unnecessary combinations that don’t exist in the same table.

🚩The conflict comes when this (coalesce) behavior limits the original SUMMARIZECOLUMNS query right out of the gate, and we then ask the query to modify (ignore) one filter but not the other. With auto-exist, the original query, based on the model table, is limited to only the combinations that exist under the original filters.

Because of auto-exist, if we remove filters on one half of the bound values, we may inadvertently wipe out both values from the original filtered SUMMARIZECOLUMNS table result (see the example report below).

However, with the (Independent) cross-join query behavior, all possible combinations are included in the original SUMMARIZECOLUMNS query. Both filters, regardless of being respected or ignored, exist in the original independent cross-join query and can effectively return expected results.

If the two columns come from two different tables, all is good, and SUMMARIZECOLUMNS naturally produces a cross join, including all possible combinations (i.e., Europe and Washington). The cross join makes all possible combinations available to filter, including expected results when CALCULATE modifiers apply after the fact to ignore one un-bound column filter.

From a performance standpoint, we said earlier that the auto-exist (coalesce) behavior was a performance-optimizing technique. However, Jeffrey Wang noted in the video below that it depends on your model. He shares that a query can be optimized under the new Independent value filter behavior, as there is simplicity in providing each column as an independent filter rather than going through the auto-exist join exercise that reduces the filters to only the combinations existing in the model.

Here is the SQLBI article

Here is the video with Jeffrey Wang

With auto-exist, the OR behavior we expect becomes an AND (similar to CALCULATE table filters), producing unexpected results.

Click on this sample Auto-Exist report as an example:

Both examples below share the same model, please clear slicers before trying the other side of the report.

Another way to see the unexpected results is to write a CALCULATETABLE function adding two filters from the same table as two filter conditions. Each filter evaluates independently, yet together they will not produce all desired table rows, as one filter may limit the surviving column values.
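A sketch with hypothetical Geography columns — the two conditions are intersected, so the result is empty rather than the OR we might expect:

```dax
EVALUATE
CALCULATETABLE (
    VALUES ( Geography[City] ),
    Geography[Country] = "Europe",     -- filters from the same table...
    Geography[State] = "Washington"    -- ...are ANDed together, not unioned
)
-- No city satisfies both conditions, so the table comes back empty.
```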

Understanding its behavior helps prevent potential pitfalls in certain filtering conditions.

A new preview feature released in October 2024 allows us to change the properties of the semantic model to protect it from the unintended consequences of auto-exist. It is called Value Filter Behavior. It forces the behavior away from a coalesced filter (combined, bound) to treating each column as an independent filter. Turn this on from the model view of your semantic model:

More notes on SUMMARIZECOLUMNS

SUMMARIZECOLUMNS is a table function that has seen much improvement in functionality over recent years. Historically this table function used only filter context. Lacking row context and context transition limited this function from being used within measures as most measures involve row context and context transition in CALCULATE.

SUMMARIZECOLUMNS without a measure or aggregation reference produces a simple cross join of filter values. With a measure aggregation, SUMMARIZECOLUMNS evaluates the expression first and filters out all filter combinations that return no aggregated result (filters out blanks).

The filters in this function are limited by the auto-exist behavior when column filters come from a single table. Recent changes, introduced in preview, switch from the original coalesce treatment of same-table filters to a new “Independent” value filter behavior, which does not pre-query common table columns into only existing combinations but instead keeps each value filter independent.

As for performance comparisons between the coalesce and independent value filters, of course it always depends. Jeffrey Wang shared in a video that the new independent value filter has the performance advantage of not having to join or merge existing combinations of two columns from the same table, as is done to produce the existing/default auto-exist behavior.

Chapter 14 – Advanced DAX Concepts

⚡️Expanded tables:
Filter context works by propagating table filters from the one side of a relationship to the many side through key fields. This relationship is efficient at transferring data lineage from the primary key to the foreign key, ensuring an apples-to-apples comparison. However, we can also think of relationships in terms of expanded tables.

Expanded tables conceptually expand columns on the many side to also include all related columns from the one side of the relationship.

With expanded tables there is no need to consider filters placed on the one side… If all values expand into the many side, then we can ignore the one side, knowing that filters placed on a one-side dim table are mirrored over the expanded columns on the many side.

Note that expanded table is a concept that illustrates that filters between both tables are forming an intersect through a regular many-to-one relationship.

For example, when we write a CALCULATE function to count Customer[Number] rows with sales orders in the Sales table (see _Wrong below), we could try to use the values listed in the Sales table column ‘Cust#’.

This seems logical, as the list of customer numbers should filter the values in the related column in the Customer table. Yet this fails; although the column values are the same, the lineage (which makes the column and its values unique) is different.

However, if we use the entire Sales table as a filter (see _Right above), we do get the expected result!

Since all of the columns from the Customer table (one side) expand into the Sales table (many side) through a regular relationship, the expanded Sales table acts as a filter (with the same lineage) back over the Customer table! Expanded table is a concept that illustrates how applicable rows on the many side relate back to the rows on the one side of the relationship.
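A sketch of the two variables described above (hypothetical column names):

```dax
Customers With Sales =
VAR _Wrong =
    CALCULATE (
        DISTINCTCOUNT ( Customer[Number] ),
        VALUES ( Sales[Cust#] )  -- same values, different lineage: does not filter Customer
    )
VAR _Right =
    CALCULATE (
        DISTINCTCOUNT ( Customer[Number] ),
        Sales                    -- expanded Sales table filters Customer through the relationship
    )
RETURN
    _Right
```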

This lineage-based filter propagation is what enables DAX to filter and aggregate data across model tables. Simply put, DAX is a language of tables filtering tables. Once you grasp this concept, you should experience a confidence boost in your knowledge of DAX!

⚡️Table vs. Column Filters:
Full table filter references include columns from the entire expanded table! If we use FILTER ( ALL ( Sales ), … ), we are asking the engine to include all native table columns as well as expanded columns from other related tables in our model. Expensive 💵 in terms of performance and risky 🫣 in terms of report accuracy. A strong rule of thumb is to keep filter arguments limited to just the columns necessary to achieve our result.
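For example (hypothetical names), both measures return the same number, but the first iterates the full expanded Sales table while the second touches a single column:

```dax
Sales Multi Qty (expensive) =
CALCULATE (
    [Total Sales],
    FILTER ( ALL ( Sales ), Sales[Quantity] >= 2 )  -- scans every native and expanded column
)

Sales Multi Qty (better) =
CALCULATE (
    [Total Sales],
    Sales[Quantity] >= 2  -- shorthand for FILTER ( ALL ( Sales[Quantity] ), ... )
)
```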

⚡️ALLSELECTED and Shadow Filter Context:

When used as a CALCULATE modifier, I think of ALLSELECTED as an ALL followed by VALUES restoring only the filters “selected” outside of the report visual.

This is a simplistic view. The reality of ALLSELECTED is very complex! ⚠️ please read the book or this SQLBI article for a complete definition. Here is my very high-level takeaway:

ALLSELECTED is a complex DAX function that can return a table of values or adjust filters based on the last “shadow filter context.” Shadow filters are hidden filter contexts that iterators (functions like SUMX) create. These filters don’t affect the result unless explicitly triggered by functions like ALLSELECTED.

  1. Shadow Filters: When an iterator runs (like SUMX), it generates a temporary filter for the rows it processes. This is the “shadow filter.” ALLSELECTED can “wake up” this shadow filter and use it in calculations, replacing the current filters.
  2. Different Uses:
    • As a CALCULATE modifier, ALLSELECTED can restore the filter context from a previous query, showing only the values relevant to that context.
    • As a table function, it returns all values that were visible under the last shadow filter context.
  3. Why it’s Complex:
    • ALLSELECTED behaves differently depending on whether it’s used with a table, a column, or no parameters.
    • Iterators create shadow filters, but ALLSELECTED can change the result by activating those shadow filters, causing it to show values that would otherwise be filtered out.
  4. When to Use:
    • Use ALLSELECTED to get the filter context applied to visuals (like tables and pivot tables) but be cautious when using it inside functions that iterate over rows. Iterations combined with ALLSELECTED can lead to unexpected and confusing results.
  5. Golden Rule: Avoid using ALLSELECTED in measures with multiple iterators. Instead, use variables to simplify and control calculations.

In short, ALLSELECTED can be tricky but powerful if you understand how it interacts with the filter contexts and iterations in DAX.
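A common, safe use directly in a visual (hypothetical measure names): the denominator restores the outer slicer selections, so each row shows its share of the visual’s grand total.

```dax
% of Selected =
DIVIDE (
    [Total Sales],                                     -- current row's filter context
    CALCULATE ( [Total Sales], ALLSELECTED ( 'Product' ) )  -- visual-level total
)
```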

⚡️Shadow Filter Query Steps:

  1. CALCULATETABLE may create an outer filter context querying outer slicer values; however, these filters do not play a direct role or ensure that filters are applied through ALLSELECTED as a table function.
  2. VALUES is a table function that respects filter context and will produce a shadow filter context based on outer CALCULATETABLE filters.
  3. ALL is also a table function; however, it does not respect filter context and will ignore those same filters. In terms of shadow filter context, VALUES and ALL will evaluate different results. When ALLSELECTED is used with an ALL table, the shadow filter will include all values regardless of the filter context.
  4. ADDCOLUMNS and other iterator functions are needed to create the shadow filter. Shadow filters represent all values visible under the filter context prior to starting the iteration.
  5. Iterators: Normally the iterator function takes the table argument and creates a new row context, isolating rows so each row is evaluated one at a time.
  6. Restoring shadow filter context: However, when ALLSELECTED is called, it restores the last shadow filter context produced by the iterator’s table argument (VALUES, ALL, …). The shadow filter context reveals all the rows visible under the original filter context.
  7. Replacing iterated rows: All shadow values become visible as explicit filters and replace the current iterated row.

When thinking about ALLSELECTED, think ALL-ITERATED-SHADOW (AIS). An iterator and a shadow filter table are required; ALLSELECTED will not use the filter context of the model directly.

I also see ALLSELECTED as using filters somewhere between ALL (over ALL the model) and the current (VALUES) filter context in the visual itself… it’s the Shadow Filter Context, and represents the filters necessary to evaluate the grand total in the filtered visual!

ALLSELECTED can be viewed as similar to the filter-context trigger of CALCULATE; however, ALLSELECTED catches the shadow filter context instead.

A few rules to ensure ALLSELECTED works:

  1. Shadow filters need an iterator. Without an iterator, ALLSELECTED does not work.
  2. Beware of multiple iterators prior to executing ALLSELECTED, as you may encounter unexpected results.
  3. If the columns passed to ALLSELECTED are not filtered by the shadow FC, then nothing happens.

🏆The golden rule is to only use ALLSELECTED in a measure used directly on the visual. Do NOT reference a measure using ALLSELECTED within an iteration.

When report users make slicer selections, those selections become a table/list of outer filters wrapped around our measure. Using ALL() as a CALCULATE modifier would normally ignore these outer filter selections in our result. However, ALLSELECTED, through an iterator, and a table function that respects filter context, indirectly respects the outer ‘selected’ filters producing the last shadow filter in our measure. If our measure already iterates over ALL table values, then that full list of values becomes our last shadow filter. There is much more to appreciate with this function, including a best practice to limit our use of ALLSELECTED to measures directly applied to a visual.

⚡️ALLEXCEPT:

We also see a different use of ALLEXCEPT that includes two table arguments. With expanded tables, it is permissible to use the base table name followed by another table name that expands into the base table. The result is that all of the columns in the second (expanded) table reference are exempt from the ALL removal of the other base and expanded column filters.
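Sketched with hypothetical tables: because Customer expands into Sales, the second argument exempts every Customer column from the filter removal.

```dax
Sales Keep Customer Filters =
CALCULATE (
    [Total Sales],
    ALLEXCEPT ( Sales, Customer )  -- remove filters on the expanded Sales table,
                                   -- except those on Customer columns
)
```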

⚡️Data Lineage:
I think of data lineage as a unique home address 🏠 given to all columns and values in our model. Filter context works by filtering columns values that share the same lineage. There are certain rules and considerations that determine if lineage is kept or broken, and the book does a great job outlining those areas.
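One such rule: an anonymous table (like a table constructor) has no lineage, but TREATAS can assign it. A sketch with hypothetical names:

```dax
Red & Blue Sales =
CALCULATE (
    [Total Sales],
    TREATAS ( { "Red", "Blue" }, 'Product'[Color] )  -- anonymous values gain Product[Color] lineage
)
```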

⚡️Window Functions:

I’m parking this here as I’m sure the third edition will include these functions. Window functions were released in December 2022 and are the first functions that allow us to sort a table and navigate the sorted result. To accomplish this, there is a complex concept called apply semantics.

I view apply semantics somewhat like expanding a fact table from related filtered dim values. In the case of apply semantics, though, the source table looks for matching column values in the current evaluation context (row or filter) to match against rows in the source table.

The goal is to filter the source table and identify the current row(s). Finally, the Window function iterates over the current source table row(s) to return a result.

Window functions operate over relative and absolute row positions rather than sequential values like traditional time intelligence functions. For example, prior year in a traditional DAX function depends on the current year minus 1, whereas a window function evaluates the prior row, regardless of the number sequence.

The job of a Window function is to identify the current row(s) in the current row or filter context. Evaluating using both row or filter context is a unique feature of applied semantics.

In terms of isolating a partition in the source table, the visual table isolates a single row in the source table allowing for the isolation of the full partition based on the current row.

Window functions are relatively new to DAX and were established to facilitate what was later released as visual calculation.

There are three functions that share similar syntax, but allow for different table expression results:

1. INDEX provides for an absolute single-row result (1 first, 2 second, -1 last). However, PARTITIONBY provides higher grouping levels, then returns the single row for each group through apply semantics.

Index Meas = INDEX( 1,  -- the first row. -1 Gives you the last row.
                    TableName, 
                    ORDERBY( TableName[Product Name], ASC), -- Or an explicit DESC 
                    PARTITIONBY(TableName[Vendor Name]),
                    KEEP -- Keeps blank rows, optional.
                  )
-- This function returns the first row of the sorted table, in ASC order by default.
-- Requires each row to be unique; if not, window functions will add columns to sort.
-- However, TOPN will return multiple (tied) rows.
-- Sorts tables on the fly. Not sure if added to OFFSET and WINDOW...
Calculate Table Index = 
    CALCULATE (
        [TotalSales],
        INDEX (
            1,
            ALLSELECTED ( Date[Year] ), -- Default :/ respects outer visual filters on vendor.
                                        -- Don't skip the table; explicitly define it here.
            ORDERBY ( Date[Year] )
        )
    )
Measure to repeat first month sales each year = 
VAR __table =
    INDEX (
        1,
        ORDERBY ( TimeTable[Mo Seq Num], ASC, TimeTable[Mo Year], ASC ),
        PARTITIONBY ( TimeTable[Cal Year] )
    )
RETURN
    CALCULATE (
        [Sales (SumX)],
        REMOVEFILTERS ( TimeTable ),
        __table
    )
-- REMOVEFILTERS clears original implicit filters from the visual.
-- With multiple ORDERBY columns, each needs an explicit ASC/DESC.

2. OFFSET provides for a relative row result (-1 prior, 0 current, 1 next). OFFSET allows us to return a relative row in a sorted table. A faster alternative to custom time intelligence.

Prior Year Sales Meas = 
            CALCULATE([Total Sales Meas], 
               OFFSET( -1,   -- Evaluates current filter year - 1
               ALL(TimeTable[Cal Year]),  -- Default ALLSELECTED will limit visibility 
               ORDERBY(TimeTable[Cal Year], ASC) -- Order: 2019, 2020... 
                     )
            )
/* Remember that window functions when table argument is ALLSELECTED by default, will only see table values as filtered by the visual. ALL on the other hand will access other values in the model table, even though they are not visible in the report visual */
                              
                              

3. WINDOW provides for a range of rows returned, including a relative range for a moving average (REL, REL), or a mix of absolute and relative row references used for a running total (ABS, REL).

6 Mo Average Meas = 
          AVERAGEX(
             WINDOW( -5, REL,
                      0, REL,
               ALLSELECTED(TimeTable[MoYearNum],TimeTable[MoYr Label]), -- Default table
               ORDERBY(TimeTable[MoYearNum], ASC, TimeTable[MoYr Label], ASC)
                    ),
                         [TotalSales] -- Expression argument
                )
-- Window functions are faster because they use a sorted lookup table. 
-- This derives the -5, 0 positions based on the visual due to the use of ALLSELECTED().
-- WINDOW filters the ALLSELECTED (shadow filter context),
-- ignoring the inner 'current' filter of [MoYr Label] applied in the visual.
-- Running YTD Total table query:
EVALUATE
SUMMARIZECOLUMNS (
    TimeTable[Mo Seq Num],
    TimeTable[Mo Year],
    "@Sales", [Total Sales Meas],
    "@SalesRT",
        CALCULATE (
            [Total Sales Meas],
            WINDOW (
                1,ABS,
                0,REL,
                ORDERBY ( TimeTable[Mo Seq Num], ASC, TimeTable[Mo Year], ASC ),
                PARTITIONBY( TimeTable[Cal Year])
            )
        )
)
ORDER BY TimeTable[Mo Seq Num]
-- Need explicit ASC/DESC when multiple ORDERBY arguments are used!
-- The default table argument uses ALLSELECTED with ORDERBY and PARTITIONBY columns.
Running YTD Total Meas =
VAR __sourcetable =
    WINDOW (
        1,ABS,
        0,REL,
        ORDERBY ( TimeTable[Mo Seq Num], ASC, TimeTable[Mo Year], ASC ),
        PARTITIONBY(TimeTable[Cal Year])
    )
VAR __math =
    CALCULATE ( [Total Sales Meas], __sourcetable )
RETURN
    __math
-- Don't forget w/ multiple ORDERBY you need explicit DESC/ASC!

The general syntax is the function name followed by the position number(s), including a relative or absolute reference if necessary. The next argument is the table reference.

This argument is optional; if not defined, the function internally calls ALLSELECTED (all queried values, respecting outside filters) and includes the columns defined in ORDERBY and PARTITIONBY (38:00) in search of the current row(s).

Note that it is best practice to define the table argument explicitly. ALLSELECTED will define and access the shadow filter context, removing the current inner filtered visual value and replacing it with the full outer original filter context. This means that an OFFSET looking back one year, will return a blank for the first year in the table expression, as the prior year is not visible from the original outer filter selection that defines the shadow filter context. Watch the video here.

Notes on Apply Semantics

Apply semantics dictate how DAX processes table data within window functions. They match rows across both row and filter contexts to identify the current row(s) and compute results in complex analytical scenarios.

Key Terms:

  • Original Query: The starting point referenced in the DAX expression, reflecting row and/or filter contexts present in the original query. Serves as the input for creating the matching table that filters the source table to identify the current row(s).
  • Source Table: A table explicitly defined or derived by functions like WINDOW and OFFSET. Represents the dataset the window calculation iterates over. Often includes additional data through ADDCOLUMNS or tables extended using ALLSELECTED. ALLSELECTED is used to define an initial source table over the values in ORDERBY and PARTITIONBY (38:00) if the table is not explicitly defined.
  • Columns:

Bound Model Columns: Source model columns with a lineage back to the data model, essential for maintaining relationships. Columns are bound when they exist in the original row or filter context and are also visible in the source table.

Unbound Model Columns: Source model columns that are not present in the original row or filter context query.

Local Columns: Created within the DAX query (e.g., computed columns via ADDCOLUMNS) and used for extended calculations.

Process of Apply Semantics:

** Step 1 Matching Process: Each model column in the ‘source’ table is evaluated for its visible values.

Row Context Match: If the column exists in a row context, it uses that value.

Filter Context Match: If the column is not in a row context, its values are derived using VALUES, which retrieves a table of unique visible values in the current filter context. This process produces an initial matching table containing all possible combinations of bound column values derived through a CROSSJOIN (when dealing with multiple VALUES columns) of all VALUES.

Columns defined during the matching process include:

  • Bound Model Columns: Model columns present in both the source and original query, aligned via data lineage.
  • Unbound Model Columns: Model columns in the source table that do not match any columns in the original query.
  • Local Columns: Computed columns added during the query.

Step 2 – Matching Table: The result of the matching process, containing one or more rows of bound and unbound column combinations under the current filter context.

Represents all potential matches between the source table and the original query.

At this point the Matching Table may include rows that do not exist in the actual source table.

Step 3 – Apply Process: The matching table is used to filter the source table, retaining only the rows that actually exist in the source table.

1. DAX filters the source table by matching it against the matching table’s combinations.

2. Only rows that exist in the source table are retained, resulting in the apply table.

Step 4 – Apply Table: The apply process yields the current row(s) relevant to the window evaluation, given its current row or filter context.

The window function evaluates by iterating over this table. If there is more than one valid row, the rows evaluated by the window function are unioned together in the result.

Context in DAX Calculations

Row Context: Defines the current row in an iteration. Created by functions like SUMX, FILTER, or ADDCOLUMNS. Critical for iterating over the apply table during the calculation.

Filter Context: Determines the set of visible rows based on filters or slicers. Modified dynamically during the matching process to include bound column values from the matching table.

Example:

Compute a running total of sales over year 2023.

1. Original Table: Filtered table containing only dates from 2023.

2. Source Table: Includes all transactions across all dates and sales amounts.

Columns include:

  • Model Columns: Date (to maintain alignment).
  • Local Columns: SalesAmount (added via ADDCOLUMNS).
  • Unbound Model Columns: Potentially include fields like CustomerID if needed for the calculation.

3. Matching Process: The Date column is aligned between the source and original tables.

A matching table is created, containing combinations of visible dates and their associated sales.

4. Apply Process: The matching table filters the source table, producing the apply table.

Rows include bound columns (e.g., Date) and unbound columns (e.g., SalesAmount).

5. Window Function: Iterates over the apply table using WINDOW to sum sales amounts for dates within the specified range (e.g., all dates ≤ the current date).

6. Result: A running total for each date in 2023.
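The running-total steps above might be sketched as a measure like this (a hedged sketch; [Sales Amount] and 'Date'[Date] are assumed names, not from the book):

```dax
-- Hedged sketch of step 5: a running total that iterates the apply
-- table from the first visible date through the current date.
Sales RT :=
CALCULATE (
    [Sales Amount],
    WINDOW (
        1, ABS,                           -- from the first row...
        0, REL,                           -- ...through the current row
        ALLSELECTED ( 'Date'[Date] ),     -- explicit source table
        ORDERBY ( 'Date'[Date], ASC )
    )
)
```

With 2023 filtered in the visual, each date's result is the union of all apply-table rows up to and including that date, producing the running total.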

Key Take-Aways

DAX apply semantics align data: Matching and binding columns ensure accurate relationships between tables.

Context Differentiation: Effectively managing row and filter contexts for precise calculations.

Efficient Processing: Iterating over apply tables optimally using window functions.

Flexibility: Supporting unbound columns (both model and local) for enriched analytical insights.

🙏🏼Thank you for reading. I appreciate the opportunity to learn and share together with our SML group. Please leave a comment on what insights you gained from this informative chapter. Next week, Chapter 15 – Advanced Relationships!

Chapter 15 – Advanced Relationships

🚀 Working through the The Definitive Guide to DAX, this chapter helps me appreciate relationships and how they work to improve performance and flexibility in my model.

⚡️Static Segmentation assigns grouping labels to slice by in my report. One option creates a calculated column using a nested IF of all segment values found in my configuration table. A more dynamic solution filters the current row value (i.e., net price) against filtered rows in our configuration table. If creating a physical relationship, knowing how VALUES and DISTINCT differ will help avoid a circular reference error caused by a blank row dependency.
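The dynamic filtering option might look something like this (a hedged sketch; the Segments configuration table and its MinPrice/MaxPrice/Segment columns are assumed names):

```dax
-- Hedged sketch: a calculated column on Sales that returns the one
-- Segments row whose price band contains the current net price.
Price Segment =
CALCULATE (
    VALUES ( Segments[Segment] ),
    FILTER (
        Segments,
        Sales[Net Price] >= Segments[MinPrice]
            && Sales[Net Price] < Segments[MaxPrice]
    )
)
```

Because the filter argument is evaluated in the row context of Sales, each row picks up exactly one matching segment label (assuming the configuration bands do not overlap).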

⚡️Dynamic segmentation using virtual relationships can lead to additive and non-additive results in our report matrix. The book does a great job of walking through examples highlighting the flexibility and trade-offs that come with trying to force additive results in our report.

⚡️Model relationships are great at moving filters from one table to the other. However, when this is not available, a virtual relationship can save the day! TREATAS is a performant option with the ability to assume the lineage of model columns we wish to filter. Combine with KEEPFILTERS to protect our result from the inherent overwrite behavior of CALCULATE filters.
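A hedged sketch of that combination (Target, Customer, and [Sales Amount] are assumed names; the point is the TREATAS lineage plus KEEPFILTERS protection):

```dax
-- Hedged sketch: move a filter without a physical relationship.
-- TREATAS gives the Target[Country] values the lineage of
-- Customer[Country]; KEEPFILTERS intersects with, rather than
-- overwrites, any existing filter on that column.
Sales From Target Countries :=
CALCULATE (
    [Sales Amount],
    KEEPFILTERS (
        TREATAS ( VALUES ( Target[Country] ), Customer[Country] )
    )
)
```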

⚡️Many-to-many (M2M) relationships arise when two tables cannot be related through a one-to-many relationship. A budget example represents an M2M ‘cardinality’ challenge; the chapter includes a DAX option and a model solution, either through a snowflake dimension or a direct M2M-cardinality relationship with small dimensions.

The video below relates to the M2M illustrations that follow. It is 1:20 long and is included in the SQLBI+ subscription.

⚡️M2M Bank Accounts: The bank account example uses a source M2M table as a bridge table. Solutions include a DAX option, or bidirectional filters, ideally applied per-measure with CROSSFILTER to isolate the result.

DAX Solutions:
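A hedged sketch of the CROSSFILTER-style DAX option (table and column names such as AccountsCustomers are my assumptions; it assumes single-direction relationships already exist through the bridge table):

```dax
-- Hedged sketch: enable bidirectional filtering only for this measure,
-- so a Customer selection reaches Accounts through the bridge.
Account Amt :=
CALCULATE (
    SUM ( Transactions[Amount] ),
    CROSSFILTER (
        AccountsCustomers[AccountKey],   -- many side (bridge)
        Accounts[AccountKey],            -- one side (dimension)
        BOTH
    )
)
```

Scoping the bidirectional behavior to one measure avoids turning on model-wide bidirectional filters, which can introduce ambiguity elsewhere.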

Relationship Solution:

⚡️M2M Budget: Another M2M example deals specifically with relationship cardinality set at a lower granularity than the primary key. The example in the chapter looks at relating Product and Customer dimensions to a Budget fact, and different model options analyzing both Sales and Budget together.

DAX Solutions:

Relationship Solution:

⚡️Relationships at a lower granularity than the table key call for special attention. We should consider hiding values (returning blanks) or maybe a logical allocation over levels that go below our supported level.

⚡️Ambiguity occurs when there are multiple relationship paths linking two tables. Ambiguity can also occur by referencing a valid measure from inside another measure that also uses CALCULATE modifiers such as USERELATIONSHIP and CROSSFILTER.

🎉 As I work through the advanced chapters I realize there is so much more to DAX than just function definitions. I have a greater appreciation for the choices I make and how that impacts performance and results. If you are also reading the Definitive Guide to DAX, please comment with your thoughts and insights you’ve gained. Thanks for reading. Next week Ch.16 Advanced Calculations in DAX!

Chapter 16 – Advanced Calculations in DAX

⚡️ The chapter shares many great Advanced DAX examples. However, its purpose is less about DAX patterns and more about learning the art of “thinking in DAX” 🤔. At this point, DAX is more than basic measures; it’s about how different variations of measures bring different levels of flexibility and performance.

⚡️ Wondering if adding measure complexity is worth it? We discover that it “depends” 💵. The chapter encourages creating test queries that compare data size upfront as an indicator of potential performance gain. Ultimately, testing your specific scenario is necessary to point you in the right direction.

⚡️ Nothing improves performance like pre-computed values. One example involves pushing reference values back into a physical table. While this table may add a little size to the model, the benefit of a LOOKUP table could dramatically improve performance by shifting the burden of a dynamic query back to a static table.

⚡️ In a later example, we learn why creating a dynamic table reference is “one of the most powerful and useful techniques in DAX.”

⚡️ Another pre-computed technique involves pushing complex time intelligence logic back to a calculated column in the date table. Rather than asking our measure to dynamically recompute comparable prior year dates to filter, the book offers a useful way to shift the burden back to a column in the model.
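A hedged sketch of the precomputed approach (all names are my assumptions, and the simple DATE shift shown here ignores leap-day handling, which a production version would need):

```dax
-- Hedged sketch, part 1: a calculated column in the Date table,
-- computed once at refresh time.
PY Date =
DATE ( YEAR ( 'Date'[Date] ) - 1, MONTH ( 'Date'[Date] ), DAY ( 'Date'[Date] ) )

-- Hedged sketch, part 2: the measure just remaps the visible dates to
-- their precomputed prior-year equivalents via TREATAS.
Sales PY :=
CALCULATE (
    [Sales Amount],
    TREATAS ( VALUES ( 'Date'[PY Date] ), 'Date'[Date] )
)
```

The measure no longer recomputes comparable dates dynamically; it simply reads a column the model already stores.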

⚡️ It’s tempting to write a measure that iterates an IF condition over each date row in our table argument. While this may initially seem like a direct and logical approach, we should be aware of the added expense of context transition and consider if row-by-row evaluation over each date is necessary. The example shares how an improved version avoids a complex iteration and uses CALCULATE to apply filter context over the model to produce the correct amounts for dates filtered in the visual.
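The before/after idea might be sketched like this (hedged; the measure names, the DATE cutoff, and [Sales Amount] are my assumptions):

```dax
-- Hedged sketch: row-by-row version, with a context transition for
-- every date in the iteration.
Sales Since Launch (slow) :=
VAR LaunchDate = DATE ( 2024, 1, 1 )   -- hypothetical cutoff
RETURN
    SUMX (
        VALUES ( 'Date'[Date] ),
        IF ( 'Date'[Date] >= LaunchDate, [Sales Amount] )
    )

-- Hedged sketch: improved version, a single filter applied through
-- CALCULATE instead of an IF test per row.
Sales Since Launch (fast) :=
VAR LaunchDate = DATE ( 2024, 1, 1 )
RETURN
    CALCULATE (
        [Sales Amount],
        KEEPFILTERS ( 'Date'[Date] >= LaunchDate )
    )
```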

🎉 I encourage anyone interested in leveling up in DAX to work through the Definitive Guide to DAX. This chapter leads well into Chapter 17, The DAX Engines. Both chapters are excellent introductions to learning more about DAX optimization. If you are also reading this book, I’d love to hear what has helped you think in DAX. Thanks for reading!


Chapter 17 – The DAX Engines

⚡️Tabular is a balance of two engines, the Formula Engine (FE), and the Storage Engine (SE). The formula engine handles the full complexity of DAX and manages query plans and requests made to the storage engine.

⚡️The FE is powerful yet limited by having to work with uncompressed data through a single thread.

⚡️The SE processes query requests. Its strength comes from being multi-threaded, working quickly over compressed data and serving table results known as datacaches back to the FE.

⚡️I think of the FE as an event planner managing an event, and the SE as a delivery team that works for the planner. The planner communicates what is needed to a delivery team, who is skilled at packing and transporting supplies (datacaches). When the event planner receives the supplies, they are unloaded from the truck (uncompressed) but may (or may not) require further preparation by the planner for use in the event. The planner manages the details but is just one person, whereas the delivery team moves volume efficiently.

⚡️When data remains in its original data source (often SQL), this is generically referred to as DirectQuery. The formula engine can write more complex SQL statements, based on SQL’s ability to receive and process more complex queries.

💻Vertipaq is an in-memory columnar database. Methods of compression include:
⚡️Value Encoding: A mathematical technique that reduces the memory cost of a numeric column. Usually applied to primary key values and values within a defined range.
⚡️Hash Encoding: Builds a dictionary of unique (non-integer) column values and assigns each an integer index, reducing the memory cost of a column of strings. The primary factor determining column size is the cardinality of the column, not its data type.
⚡️Run Length Encoding: Reduces the size of the dataset by storing each run of repeated values only once, along with its length, instead of duplicating values.

⚡️It is important to note that DAX calculated columns are not part of the initial Vertipaq sort that determines the best overall compression. Columns pushed back to the source generally lead to better compression.

⚡️Materialization is the uncompressed datacache sent by the SE to the FE. The cardinality of the final FE query defines the optimal materialization desired from the SE.

⚡️Late materialization occurs when the size of the datacache received matches the final query result (the optimized case). When the datacache received is larger than the final query result, this is referred to as early materialization.

🎉These are just some of the topics covered in the DAX bible. 🙌 Also note that there is a second book, Optimizing DAX, to further your knowledge from our Italian friends.

I’m thankful to share here and with SML. If this book is helping you please leave a comment below. Thanks for reading!

Chapter 18 – Optimizing VertiPaq

⚡ Reducing Cardinality:
One of the main goals is to reduce cardinality, or reduce the number of unique values in a column. By doing so, we improve compression and increase Vertipaq performance. This chapter offers several common scenarios to consider.

⚡ Date and Time Granularity:
A significant opportunity to improve cardinality involves lowering the granularity of time related data. There are several tips in the chapter for storing date-time effectively.

⚡ Cost of Relationships:
Seemingly simple filters over low-cardinality columns (such as customer gender) may result in a costly relationship. When we have an extremely large dimension table (over 1M rows) and the filter produces a large query result (i.e., half the table's records) spanning all unique keys, we should consider denormalizing that column into the fact table to improve performance.

⚡ Star Schema Efficiency:
Although relationships between tables come at a cost, a star schema model is still the best option for overall modeling and performance. However, when dimensions exceed 100K and approach 1M unique records, there are denormalization techniques suggested in the chapter that help reduce cardinality when needed.

⚡ Calculated Columns:
Calculated Columns are appropriate in very limited circumstances. The chapter does a great job highlighting the costs and benefits of adding calculated columns as an optimization or reporting technique.

⚡ Simplifying Measures:
One valid option for calculated columns is to help simplify a complex measure filter into a simple boolean calculated column. By storing and evaluating a low cardinality column at refresh time, we could avoid the cost to dynamically evaluate a complex measure filter at query time.
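A hedged sketch of that trade (column, table, and threshold are my assumptions):

```dax
-- Hedged sketch, part 1: a Boolean calculated column on Sales,
-- evaluated once at refresh time (cardinality of just TRUE/FALSE).
IsLargeSale = Sales[Quantity] * Sales[Net Price] >= 1000

-- Hedged sketch, part 2: the query-time measure filters the
-- precomputed column instead of re-evaluating the expression per row.
Large Sales Amount :=
CALCULATE (
    [Sales Amount],
    Sales[IsLargeSale] = TRUE
)
```

The complex condition is paid for once during refresh; at query time the SE only has to filter a two-value column.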

⚡ Excluding High Cardinality Columns:
The chapter lists various column types and encourages excluding high cardinality columns that don’t add analytical value to our report.

⚡️Fusion:

DAX Fusion is an optimization technique in the DAX engine that satisfies a query with a single filtered scan instead of going back to materialize a second filtered table. As we know, there are two engines: the Storage Engine, which has abilities similar to SQL queries and is good at quickly providing a query result, and the Formula Engine, which can perform more complex evaluations to fulfill the query plan.

The storage engine (SE) queries tables given the filter context requested by the formula engine (FE). When possible, the FE can request a single SE scan over a single filter context to satisfy either multiple expressions (SELECT) or multiple filters (WHERE). Multiple expressions need to share the same filter context, and multiple filters need to be common to the expression(s), for fusion to work. The moment we introduce a new filter context for one expression but not another, fusion breaks and the engine must split the query to satisfy the two different filter contexts. Read about fusion below, along with a creative method shared by Phil Seamark in this video: query a single larger table result in order to satisfy two different filter (WHERE) requirements. The trick is to start with a larger query request, then use the FE to behave like filter context by filtering out results that return 0.

Vertical Fusion: Multiple SELECT expressions, single WHERE filter. E.g., SELECT SUM(Gross Sales) and SUM(Net Sales) with the same FROM and WHERE conditions. This query optimization leverages a single data cache over a single SE scan.

Horizontal Fusion: Single SELECT expression, multiple WHERE filters. E.g., Vendor IN {“KARPARTS”, “TOPCLEAN”}. This query optimization likewise leverages a single data cache over a single SE scan.

Horizontal fusion is the newer of the two types of fusion. I view horizontal fusion as creating a common row set but separating the different filter results into individual columns, i.e., a KARPARTS Total column separate from a TOPCLEAN Total column.

We can break horizontal fusion by adding a different column (WHERE) filter type. Say that we add a filter on Product Class. By doing this, we create an additional scan of the storage engine.

However, there is a way to modify our DAX measure using the table iterator function SUMX.

By using SUMX over ALL values of the filtered columns, we can cross join all potential filter combinations and push them up into one common table (SE scan) expression to iterate over.

In the expression argument of SUMX we multiply the aggregation by the occurrences of column variables IN the current filter context.

Finally, we create filter-like context by replacing what used to be separate sets of WHERE (FE) filters with a single filter on non-zero results only.

With this ALL-IN pattern, we: 1. expand the table (SE) to include all filter combinations; 2. transfer the filter-like behavior down into the FE expression; and 3. ignore (filter out) 0 results without using an ELSE branch. By doing so, we create a Cartesian set that makes fusion from the SE possible once again.
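The three steps above might be sketched like this. This is my own hedged reconstruction of the pattern from Phil Seamark's video, and every table, column, and value name is an assumption:

```dax
-- Hedged sketch of the ALL-IN pattern.
Filtered Sales ALL-IN :=
SUMX (
    FILTER (
        ADDCOLUMNS (
            -- 1. expand: one SE scan over all filter combinations
            CROSSJOIN ( ALL ( Sales[Vendor] ), ALL ( 'Product'[Class] ) ),
            "@Amt",
                -- 2. move the WHERE logic into the expression; the
                -- Boolean tests coerce to 1 or 0 in the multiplication
                CALCULATE ( SUM ( Sales[Amount] ) )
                    * ( Sales[Vendor] IN { "KARPARTS", "TOPCLEAN" } )
                    * ( 'Product'[Class] = "Premium" )
        ),
        [@Amt] <> 0                        -- 3. drop zeros, no ELSE branch
    ),
    [@Amt]
)
```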

⭐️Understanding and applying these concepts can significantly improve the performance of your data models. I encourage anyone wanting to improve their DAX knowledge to read through The Definitive Guide to DAX. If you are already reading, I’d love to hear your thoughts in the comments 📝 Thanks for reading! 🚀

Chapter 19 – Analyzing DAX Query Plans

⚡️ Behind the scenes with DAX Studio:
The DAX we write in Desktop translates into a DAX query explaining ‘what’ data is required and how it should be aggregated. The DAX engine produces query plans that outline the ‘steps’ to fulfill it. DAX Studio gives us visibility into the process and its performance. This chapter does a great job pointing us to common optimization issues as well as solutions.

⚡️Query Plans:
The high-level logical query plan lists a tree of logical operators and resembles our original DAX request. The engine then translates this into a physical query plan defining physical operators.

⚡️DAX Query Processing Layers:
The Formula Engine (FE) orchestrates the physical query plan and steps in when processing complex operations and calculations. The FE communicates with the SE through xmSQL; however, it is limited to processing uncompressed query results over a single thread.

The SE handles querying data, generating data caches, and performing bulk scans and aggregations. The SE processes queries over multiple threads to efficiently serve requests back to the FE.

⚡️ Optimizing:
Analyzing SE queries serves as a logical first step in optimization. The SE shows the level of materialization of data caches. We also benefit from knowing what the SE left for the FE to complete. The SE is a great starting point in finding a good balance between SE and FE processing.

⚡️Compression:
The SE works over compressed columns, and memory footprint matters for efficient scanning. Memory footprint depends on the number of unique values, their distribution across rows, sort order, and the number of rows in a table.

⚡️Row-Level Security:
The FE is responsible for sending role specific (RLS) queries to the SE. The Vertipaq cache is a global resource serving multiple roles through different FE requests.

⚡️CallbackDataID Implications:
The presence of CallbackDataID indicates row iterations where calculations are too complex for the VertiPaq SE (e.g., IF conditions). The chapter walks through a great division example, exploring a measure rewritten to avoid callbacks. Another interesting note: the SE does not save data caches when an xmSQL query involves a CallbackDataID.

Thank you for joining me on this DAX journey through the DAX bible 🙏🏼 . Next week we ⭐️finish the book with Optimizing DAX!! 🎉 Please leave a comment below. Happy optimizing!

Chapter 20 – Optimizing DAX

🚀 Everything comes together in this final chapter 2️⃣0️⃣ where we Optimize DAX in “The Definitive Guide to DAX”!!

Optimizing DAX is not just one best practice, but a process of iterating through different improvement strategies. This chapter points us to core fundamentals and sound approaches to improve performance.

⚡️Analyzing:
We start by analyzing server timings, as it displays information about the entire query. We are concerned with larger materialization, longer scan times, and the need to refer back to the formula engine through CallbackDataIDs (Callbacks).

⚡️ Less is more!
A key fundamental is to write measures that filter columns rather than entire tables. Not only do we reduce the need to materialize and scan unnecessary columns, but fewer columns often mean fewer rows, generating a smaller table of unique value combinations. This can be a huge gain when iterations and context transitions are involved.

⚡️Avoiding full table example:
When calling non-additive expressions (i.e., DISTINCTCOUNT), the SE query might materialize a row for each row in the table, i.e., every sales transaction. The improvement again is to reduce the table filter down to column filters (i.e., just Qty and Price) to avoid the excessively sized datacache of processing each sales row.

⚡️CallbackDataID:
CallbackDataIDs occur when using IF (in an iterator), DIVIDE, or scalar functions used in an iteration that don’t aggregate (e.g., DATE, VALUE, rounding, etc.). There is a processing cost and a missed opportunity to save the datacaches.

⚡️Avoiding Callbacks with CALCULATE!
⭐️IF conditions involving iterators require the FE through callbacks. A CALCULATE filter is generally preferred over a row-by-row IF conditional test.
⭐️DIVIDE, and using IF to test for non-zero denominator values, also use callbacks. However, an alternate approach uses CALCULATE with a filter that keeps only non-zero denominator values, avoiding the callback!
⭐️Optimizing IF-then iterator expressions by adding two CALCULATEs together, each with an opposing filter argument, simulates a dual-result IF condition. A brilliant way to avoid costly callbacks.

🛟 Once again it’s CALCULATE to the rescue avoiding callbacks and keeping filters in the SE as a WHERE filter.
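The dual-CALCULATE idea above might be sketched like this (hedged; the Sales columns, the quantity threshold, and the 1.1 uplift are my assumptions standing in for any IF-then-else branch logic):

```dax
-- Hedged sketch: two CALCULATEs with opposing filters replace a
-- row-by-row IF, keeping each condition in the SE as a WHERE filter.
Adjusted Sales :=
CALCULATE (
    SUMX ( Sales, Sales[Quantity] * Sales[Net Price] ),        -- "then" branch
    KEEPFILTERS ( Sales[Quantity] > 1 )
)
    + CALCULATE (
        SUMX ( Sales, Sales[Quantity] * Sales[Net Price] * 1.1 ),  -- "else" branch
        KEEPFILTERS ( Sales[Quantity] <= 1 )
    )
```

Because the two filters are mutually exclusive, every row is priced by exactly one branch, with no IF and no callback.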

⚡️Nested Iterators can’t be merged into a single storage engine query. Options shared include:
⭐️Lower the granularity of the outer iterator table using VALUES over a column.
⭐️Eliminate the nested iterator by leveraging SUMMARIZE to combine two table iterators into one.
⭐️Iterating the expanded Sales table and using RELATED in our expression avoids the expense of a measure reference that includes a hidden nested iterator and context transition!
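Hedged sketches of the first and third options above (all names are assumptions):

```dax
-- Hedged sketch: lower the outer iterator's granularity by iterating
-- distinct dates via VALUES instead of every Sales row.
Avg Daily Sales :=
AVERAGEX ( VALUES ( 'Date'[Date] ), [Sales Amount] )

-- Hedged sketch: iterate the expanded Sales table with RELATED,
-- avoiding a measure reference that would hide a nested iterator
-- and a context transition per row.
Sales at Unit Cost :=
SUMX ( Sales, Sales[Quantity] * RELATED ( 'Product'[Unit Cost] ) )
```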

I’m thankful for those that have read my posts this year, especially those in SML that have given me time for DAX demos. I am no expert, but I do feel more confident in DAX. Huge thanks to Marco Russo and Alberto Ferrari! The Definitive Guide to DAX is indeed the DAX bible! The “brick” (Optimizing DAX 2nd Edition) is next!
