A relationship is a join between two tables. When you define a table relationship with one-to-many cardinality, you’re telling Power BI that there’s a logical one-to-many relationship between a row in the lookup (dimension) table and the corresponding rows in the fact table. For example, the relationship between the Reseller and ResellerSales tables in Figure 7.11 means that each reseller in the Reseller table can have many corresponding rows in the ResellerSales table. Indeed, Progressive Sports (ResellerKey=1) recorded a sale on August 1st, 2006 for $100 and another sale on July 4th, 2007 for $120. In this case, the ResellerKey column in the Reseller table is the primary key in the lookup (dimension) table. The ResellerKey column in the ResellerSales table fulfills the role of a foreign key in the fact table.
Figure 7.11 There’s a logical one-to-many relationship between the Reseller table and the ResellerSales table because each reseller can have multiple sales recorded.
Understanding relationship rules
A relationship can be created under the following circumstances:
The two tables have matching columns, such as a ResellerKey column in the Reseller lookup table and a ResellerKey column in the ResellerSales table. The column names don’t have to be the same, but the columns must have matching values. For example, you can’t relate the two tables if the ResellerKey column in the ResellerSales table has reseller codes, such as PRO for Progressive Sports.
The key column in the lookup (dimension) table must have unique values, similar to a primary key in a relational database. In the case of the Reseller table, the ResellerKey column fulfills this requirement because its values are unique across all the rows in the table. However, this doesn’t mean that all fact tables must join the lookup table on the same primary key. As long as the column is unique, it can serve as a primary key. And some fact tables can use one column while others can use another column. If you attempt to establish a join to a column that doesn’t contain unique values in a lookup table, you’ll get the following error:
The relationship cannot be created because each column contains duplicate values. Select at least one column that contains only unique values.
Interestingly, Power BI doesn’t require the two columns to have matching data types. For example, the ResellerKey column in the Reseller table can be of a Text data type while its counterpart in the fact table could be defined as the Whole Number data type. Behind the scenes, Power BI resolves the join by converting the values in the latter column to the Text data type. However, to improve performance and to reduce storage space, use numeric data types whenever possible.
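Before creating a relationship, it can help to confirm that the lookup column really contains unique values. The following DAX query is a minimal sketch of such a check, assuming the Reseller table and ResellerKey column from the example above. It is meant to be run in a DAX query tool rather than entered as a measure.

```dax
-- Compare the total row count with the number of distinct key values.
-- Equal numbers mean ResellerKey is unique and can sit on the "one" side.
EVALUATE
ROW (
    "Rows", COUNTROWS ( Reseller ),
    "Distinct keys", DISTINCTCOUNT ( Reseller[ResellerKey] )
)
```

If the two numbers differ, the column contains duplicates, and attempting the relationship would produce the error quoted above.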
Understanding relationship limitations
Relationships have several limitations. To start, only one column can be used on each side of the relationship. If you need a combination of two or more columns (so the key column can have unique values), you can add a custom column in the query or a calculated column that uses a DAX expression to concatenate the values, such as =[ResellerKey] & "|" & [SourceID]. I use the pipe delimiter here because plain concatenation can map different combinations to the same value. For example, without a delimiter, a ResellerKey of 1 with a SourceID of 10 and a ResellerKey of 11 with a SourceID of 0 both result in “110”, whereas with the delimiter they become “1|10” and “11|0”. Once you construct a primary key column, you can use this column for the relationship.
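As a calculated column, the composite key could look like the sketch below. It assumes that both ResellerKey and SourceID exist in the Reseller table, as in the example above. The column name ResellerSourceKey is made up for illustration.

```dax
-- Calculated column on the Reseller table (ResellerSourceKey is a hypothetical name).
-- The & operator converts numeric values to text before concatenating them.
-- The pipe keeps 1 + 10 ("1|10") distinct from 11 + 0 ("11|0").
ResellerSourceKey = Reseller[ResellerKey] & "|" & Reseller[SourceID]
```

The fact table needs a column built with the same expression and the same delimiter so that the values on both sides of the relationship match.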
Moving down the list, you can’t create relationships forming a closed loop (also called a diamond shape). For example, given the relationships Table1 → Table2 and Table2 → Table3, you can’t set an active relationship Table1 → Table3. Such a relationship probably isn’t needed anyway, because you’ll be able to analyze the data in Table3 by Table1 with only the first two relationships in place. Power BI will actually let you create the Table1 → Table3 relationship, but it will mark it as inactive. This brings us to the subject of role-playing relationships and inactive relationships.
As it stands, Power BI doesn’t support role-playing relationships. A role-playing lookup table is a table that joins the same fact table multiple times, and thus plays multiple roles. For example, the InternetSales table has the OrderDateKey, ShipDateKey, and DueDateKey columns because a sales order has an order date, ship date, and due date. Suppose you want to analyze sales by these three dates. One approach is to import the Date table three times with different names and to create relationships to each date table. This approach gives you more control because you now have three separate Date tables and their data doesn’t have to match. For example, you might want the ShipDate table to include different columns than the OrderDate table. On the downside, you increase your maintenance effort because you now have to maintain three tables.
Figure 7.12 Power BI supports only one active relationship between two tables and marks the other relationships as inactive.
REAL WORLD On the subject of date tables, AdventureWorksDW uses a “smart” integer primary key for the Date table in the format YYYYMMDD. This is a common practice for data warehousing, but I personally prefer to use a date field (Date data type). Not only is it more compact (3 bytes vs. 4 bytes for Integer) but it’s also easier to work with. For example, a business user who imports ResellerSales can filter more easily on a Date data type, such as to import data for the current year, than by parsing integer fields. And DAX time calculations work if a date column is used as a primary key. That’s why in the practice exercises that follow, you’ll recreate the relationships to the date table.
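If a fact table arrives with only the integer key, one way to obtain a true date column is a calculated column that splits the YYYYMMDD value into its parts. This is only a sketch. It assumes that InternetSales[OrderDateKey] stores whole numbers in the YYYYMMDD format, and the column name OrderDate is hypothetical. The same conversion could also be done in the query before the data is loaded.

```dax
-- Calculated column on InternetSales (OrderDate is a hypothetical name).
-- Example: 20060801 -> year 2006, month 8, day 1.
OrderDate =
VAR DateKey = InternetSales[OrderDateKey]
VAR YearPart = INT ( DateKey / 10000 )               -- leading four digits
VAR MonthPart = INT ( MOD ( DateKey, 10000 ) / 100 ) -- middle two digits
VAR DayPart = MOD ( DateKey, 100 )                   -- last two digits
RETURN
    DATE ( YearPart, MonthPart, DayPart )
```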
Understanding active and inactive relationships
Another approach is to join the three date columns in InternetSales to the Date table. This approach allows you to reuse the same date table three times. However, Power BI supports only one active role-playing relationship. An active relationship is a relationship that Power BI follows to automatically aggregate the data between two tables. A solid line in the Relationships View indicates an active relationship, while a dotted line indicates an inactive relationship (see Figure 7.12). You can also open the Manage Relationships window (click the Manage Relationships button in the ribbon’s Home or Modeling tab) and inspect the Active flag.
When Power BI Desktop imports the relationships from the database, it defaults the first one to active and marks the rest as inactive. In our case, the InternetSales[DueDateKey] → DimDate[DateKey] relationship is active because this happens to be the first of the three relationships between the DimDate and FactInternetSales tables that you imported. Consequently, when you create a report that slices Internet sales by Date, Power BI automatically aggregates the sales by the due date.
NOTE I’ll use the TableName[ColumnName] notation as a shortcut when I refer to a table column. For example, InternetSales[DueDateKey] means the DueDateKey column in the InternetSales table. This notation will help you later on with DAX formulas because DAX follows the same syntax. When referencing relationships, I’ll use a right arrow (→) to denote a relationship from a fact table to a lookup table. For example, InternetSales[OrderDateKey] → DimDate[DateKey] means a relationship from the OrderDateKey column in the InternetSales table to the DateKey column in the DimDate table.
If you want the default aggregation to happen by the order date, you must set InternetSales[OrderDateKey] → DimDate[DateKey] as an active relationship. To do so, first select the currently active relationship (InternetSales[DueDateKey] → DimDate[DateKey] in our case), and then click Edit. In the Edit Relationship dialog box, uncheck the Active checkbox, and then click OK. Finally, edit the InternetSales[OrderDateKey] → DimDate[DateKey] relationship, and then check the Active checkbox.
What if you want to be able to aggregate data by other dates without importing the Date table multiple times? You can create DAX calculated measures, such as ShippedSalesAmount and DueSalesAmount, that force Power BI to use a given inactive relationship by using the DAX USERELATIONSHIP function. For example, the following formula calculates ShippedSalesAmount using the InternetSales[ShipDateKey] → DimDate[DateKey] relationship:
ShippedSalesAmount = CALCULATE(SUM(InternetSales[SalesAmount]), USERELATIONSHIP(InternetSales[ShipDateKey], 'Date'[DateKey]))
Cross filtering limitations
In Chapter 5, I mentioned that a relationship can be set to cross-filter in both directions. This is a great out-of-the-box feature that allows you to address more advanced scenarios that previously required custom calculations with Power Pivot, such as many-to-many relationships. However, bi-directional filtering doesn’t make sense and should be avoided in the following cases:
When you have two fact tables sharing some common dimension tables – In fact, to avoid ambiguous join paths, Power BI Desktop won’t let you turn on bi-directional filtering from multiple fact tables to the same lookup table. Therefore, if you start from a single fact table but anticipate additional fact tables down the road, you may also consider a uni-directional model (Cross Filtering set to Single) to keep the experience consistent for users, and then turn on bi-directional filtering only if you need it.
NOTE To understand this limitation better, let’s say you have a Product lookup table that has bi-directional relations to ResellerSales and InternetSales tables. If you define a DAX measure on the Product table, such as Count of Products, but have a filter on a Date table, Power BI won’t know how to resolve the join: count of products through ResellerSales on that date, or count of products through InternetSales on that date.
Relationships toward the date table – Relationships to date tables should be one-directional so that DAX time calculations continue to work.
Closed-loop relationships – As I just mentioned, Power BI Desktop will automatically inactivate one of the relationships when it detects a closed loop, although you can still use DAX calculations to navigate inactive relationships. In this case, bi-directional relationships would produce meaningless results.
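When a model stays uni-directional for the reasons listed above, a single measure can still opt in to bi-directional filtering through the DAX CROSSFILTER function. The measure below is a minimal sketch of that idea, not part of the practice model. It assumes that the Product and ResellerSales tables are related on a ProductKey column, and both the column name and the measure name are assumptions.

```dax
-- Counts the products that have reseller sales in the current filter context,
-- for example for the dates selected on the report.
-- CROSSFILTER enables both-directions filtering for this calculation only;
-- the relationship itself stays uni-directional in the model.
Products Sold by Resellers =
CALCULATE (
    COUNTROWS ( 'Product' ),
    CROSSFILTER ( ResellerSales[ProductKey], 'Product'[ProductKey], BOTH )
)
```

Because the measure names the relationship explicitly, there is no ambiguity about whether the count goes through ResellerSales or InternetSales.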