Shaping and Cleansing Data

Excerpt from Applied Microsoft Power BI: Bring Your Data to Life (pages 203-208)

Suppose that the Adventure Works Finance department periodically (say, every month or year) gives Martin an Excel file that details the accessories and parts that Adventure Works purchases from its vendors. Martin needs to analyze spending by vendor. The problem is that the source data is formatted in Excel tables that the Finance department prepared for its own analysis. This makes it very difficult for Martin to load the data and relate it to other tables that he might have in the model. Fortunately, the Query Editor component of Power BI Desktop allows Martin to transform and extract the data he needs. For the purposes of this exercise, you’ll use the Vendor Parts.xlsx file that’s located in the \Source\ch06 folder.

6.2.1 Applying Basic Transformations

Figure 6.8 shows the first two report sections in the Vendor Parts file. This format is not suitable for analysis and requires some preprocessing before the data can be analyzed. Specifically, the data is divided into sections, and each section is designed as an Excel table. However, you need the data as a single Excel table, similar to the Resellers Excel file that you imported in the previous chapter. Another issue is that the data is presented as crosstab reports, making it impossible to join the vendor data to a Date table in the data model.

Figure 6.8 The Vendor Parts Excel file includes crosstab reports, which present a challenge for relating this data to other tables in the model.

Exploring source data

Let’s follow familiar steps to connect to the Excel file. However, this time you’ll launch the Query Editor before you import the data.

1. If the Vendor Parts file is open in Excel, close it so that Excel doesn’t lock the file and prevent importing.

2. Open a new instance of Power BI Desktop. Save the file as Query Examples.

3. Expand the Get Data menu, and click Excel because you’ll import an Excel file.

4. Navigate to the \Source\Ch06 folder and select the Vendor Parts.xlsx file. Then click Open.

5. In the Navigator window, check Sheet1 to select it and preview its data. The preview shows how the data would be imported if you don’t apply any transformations. As you can see, there are many issues with the data, including mixed column content, pivoted data by month, and null values.

6. Click the Edit button to open the Query Editor.
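For reference, the connection steps above correspond roughly to the following query in the Query Editor's formula language (M). This is a minimal sketch, not the exact code Power BI Desktop generates: the drive letter in the path is a hypothetical placeholder, and the sheet name Sheet1 is taken from the Navigator step.

```powerquery
let
    // Hypothetical absolute path; point it at your own copy of the \Source\ch06 folder.
    Source = Excel.Workbook(File.Contents("C:\Source\ch06\Vendor Parts.xlsx"), null, true),
    // Pick the worksheet that was checked in the Navigator window.
    Sheet1 = Source{[Item = "Sheet1", Kind = "Sheet"]}[Data]
in
    Sheet1
```

You can view the query behind any set of applied steps by opening the Advanced Editor from the Query Editor ribbon.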

Removing rows

First, let’s remove the unnecessary rows:

1. Right-click the first cell in the Column1 column, and then click Text Filters ⇒ “Does Not Equal” to exclude the first row.

2. Locate the null value in the first cell of Column3, and apply the same filter (Text Filters ⇒ “Does Not Equal”) to exclude all the rows that have null in Column3.

3. Promote the first row as headers so that each column has a descriptive column name. To do so, in the ribbon’s Transform tab, click the “Use First Row as Headers” button. Alternatively, you can expand the table icon (in the top-left corner of the preview window), and then click “Use First Row as Headers”. Compare your results with Figure 6.9.

Figure 6.9 The source data after filtering unwanted rows.

4. Note that the first column (Category) has many null values. These empty values will present an issue when relating the table to other tables, such as Product. Click the first cell in the Category column (that says Wheels), and then click the ribbon’s Transform tab. Click the Fill ⇒ Down button. This fills the null values with the actual categories.

5. Let’s remove rows that represent report subtotals. To do so, right-click a cell in the Category column that contains the word Category. Click Text Filters ⇒ “Does Not Equal” to remove all the rows that have “Category” in the first column.

6. Next, you will need to filter all the rows that contain the word “Total”. Expand the column dropdown in the column header of the Category column. Click Text Filters ⇒ “Does Not Contain”. In the Filter Rows dialog box, type “Total”, and then click OK.

7. Hold the Ctrl key and select the last two columns, Column15 and 2014 Total. Right-click the selection, and then click Remove Columns.
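The row and column cleanup above can be sketched in M as follows. The inline table is an invented stand-in for Sheet1 (the title text, vendor names, and numbers are assumptions made up for illustration, and the real sheet has many more columns), so treat this as an outline of the technique rather than the query the Query Editor generates.

```powerquery
let
    // Invented sample shaped like a sectioned crosstab report.
    Source = #table(
        {"Column1", "Column2", "Column3", "Column4", "Column5"},
        {
            {"Vendor Parts - 2014", null, null, null, null},
            {"Category", "Manufacturer", "Jan", "Feb", "2014 Total"},
            {"Wheels", "Vendor A", 10, 20, 30},
            {null, "Vendor B", 5, 7, 12},
            {"Wheels Total", null, 15, 27, 42},
            {"Category", "Manufacturer", "Jan", "Feb", "2014 Total"},
            {"Brakes", "Vendor A", 3, 4, 7}
        }),
    // Steps 1-2: drop the title row, then any row with null in Column3.
    RemovedTitle = Table.SelectRows(Source, each [Column1] <> "Vendor Parts - 2014"),
    RemovedNulls = Table.SelectRows(RemovedTitle, each [Column3] <> null),
    // Step 3: use the first remaining row as column headers.
    Promoted = Table.PromoteHeaders(RemovedNulls),
    // Step 4: fill null categories from the row above.
    FilledDown = Table.FillDown(Promoted, {"Category"}),
    // Steps 5-6: drop rows that say Category, then rows whose category contains Total.
    RemovedHeaders = Table.SelectRows(FilledDown, each [Category] <> "Category"),
    RemovedTotals = Table.SelectRows(RemovedHeaders, each not Text.Contains([Category], "Total")),
    // Step 7: remove the total column.
    RemovedColumns = Table.RemoveColumns(RemovedTotals, {"2014 Total"})
in
    RemovedColumns
```

With this sample, only the three detail rows (two for Wheels, one for Brakes) remain, each with Category, Manufacturer, Jan, and Feb columns.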

Un-pivoting columns

Now that you’ve cleansed most of the data, there’s one remaining task. Note how the months appear on columns. This makes it impossible to join the table to a Date table because you can’t join on multiple columns. To make things worse, as new periods are added, the number of columns might increase. To solve this problem, you need to un-pivot the months from columns to rows. Fortunately, this is very easy to do with the Query Editor!

1. Hold the Shift key and select all the month columns, from Jan to Dec.

2. Right-click the selection, and then click Unpivot Columns. The Query Editor un-pivots the data by creating new columns called Attribute and Value, as shown in Figure 6.10.

3. Double-click the column header of the Attribute column, and rename it to Month. Rename the Value column to Units.

Figure 6.10 The un-pivoted dataset includes Attribute and Value columns.
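As a minimal sketch of what un-pivoting does, the following M query turns two month columns into Month and Units rows. The sample table is an assumption (only two months and invented values). The Query Editor may emit Table.UnpivotOtherColumns instead of Table.Unpivot, depending on which menu command you choose.

```powerquery
let
    // Invented sample: one row per category and vendor, one column per month.
    Source = #table(
        {"Category", "Manufacturer", "Jan", "Feb"},
        {
            {"Wheels", "Vendor A", 10, 20},
            {"Brakes", "Vendor B", 3, 4}
        }),
    // Month columns become rows: the column name goes to Attribute, the cell to Value.
    Unpivoted = Table.Unpivot(Source, {"Jan", "Feb"}, "Attribute", "Value"),
    // Give the generated columns descriptive names.
    Renamed = Table.RenameColumns(Unpivoted, {{"Attribute", "Month"}, {"Value", "Units"}})
in
    Renamed
```

The result has four rows (two months for each of the two source rows) and the columns Category, Manufacturer, Month, and Units, which is a shape that can be related to a Date table once Month is converted to a date.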

Adding custom columns

If you have a Date table in the model, you might want to join this data to it. As I mentioned, a Date table is at a day granularity. So that you can join to it, you need to convert the Month column to a date. Click the ribbon’s Add Column tab, and then click “Add Custom Column”.

1. In the “Add Custom Column” dialog box, enter FirstDayOfMonth as the column name. Then enter the following formula (see Figure 6.11):

=Date.FromText([Month] & " 1," & "2008")

Figure 6.11 Create a custom column that converts a month to a date.

This formula converts the month value to the first day of the month in year 2008. For example, if the Month value is Jan, the resulting value will be 1/1/2008. The formula hardcodes the year to 2008 because the source data doesn’t have the actual year. If you need to default to the current year, you can use the Date.Year(DateTime.LocalNow()) formula instead; because it returns a number, wrap it in Text.From before concatenating it with the month text.

NOTE Unfortunately, as it stands the “Add Custom Column” window doesn’t have IntelliSense or show you the formula syntax, making it difficult to work with formulas and forcing a “trial and error” approach. If you’ve made a mistake, the custom column will display “Error” in every row. You can click the Error link to get more information about the error. Then in the Applied Steps pane, click the Settings icon next to the step to get back to the formula, and try again.

2. Assuming you need the month end date instead of the first date of the month, select the FirstDayOfMonth column. In the ribbon’s Transform tab, expand the Date dropdown button, and then select Month ⇒ “End of Month”.

3. Rename the new column to Date.
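The three steps above can be sketched as one M query. The two-row sample table is an assumption, and parsing text such as "Jan 1,2008" depends on the locale of your Power BI Desktop session (an English locale is assumed here).

```powerquery
let
    // Invented stand-in for the un-pivoted table; only the columns this sketch needs.
    Source = #table({"Month", "Units"}, {{"Jan", 10}, {"Feb", 20}}),
    // Step 1: the custom column formula; "Jan" & " 1," & "2008" gives "Jan 1,2008".
    AddedFirstDay = Table.AddColumn(Source, "FirstDayOfMonth",
        each Date.FromText([Month] & " 1," & "2008"), type date),
    // Step 2: move each date to the last day of its month.
    EndOfMonth = Table.TransformColumns(AddedFirstDay,
        {{"FirstDayOfMonth", Date.EndOfMonth, type date}}),
    // Step 3: rename the column to Date.
    Renamed = Table.RenameColumns(EndOfMonth, {{"FirstDayOfMonth", "Date"}})
in
    Renamed
```

To default to the current year instead of 2008, one option is to replace the "2008" literal with Text.From(Date.Year(DateTime.LocalNow())).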

6.2.2 Loading Transformed Data

Now that you’ve cleaned and transformed the raw data, you can load the query results to the model. As I explained, you have three options for doing so, all available by expanding the Close & Apply button. Close & Apply closes the Query Editor and applies the changes to the model. Close closes the Query Editor without applying the changes, which leaves the model and queries out of sync. Finally, Apply applies the changes without closing the Query Editor so that you can continue working with it.

Renaming steps and queries

Before loading the data, consider renaming the query to apply the same name to the new table. You can also rename transformation steps to make them more descriptive. Let’s rename the query and a step:

1. In the Query Settings pane, rename the query to VendorParts. This will become the name of the table in the model.

2. In the Applied Steps pane, right-click the last step and click Rename. Change the step name to “Renamed Column to Date”, and press Enter.
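Step names in the Applied Steps pane are the variable names of the underlying M query, so a renamed step shows up in the Advanced Editor as a quoted identifier. A minimal sketch, using a one-row sample table that is an assumption for illustration:

```powerquery
let
    // Invented one-row stand-in for the previous step's output.
    Source = #table({"FirstDayOfMonth"}, {{#date(2008, 1, 31)}}),
    // A step name that contains spaces is written as #"...".
    #"Renamed Column to Date" = Table.RenameColumns(Source, {{"FirstDayOfMonth", "Date"}})
in
    #"Renamed Column to Date"
```

Descriptive step names make a longer query easier to read, both in the Applied Steps pane and in the formula text.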

Loading transformed data

Let’s load the transformed data into a new table:

1. Click the Close & Apply button in the ribbon. Power BI Desktop imports the data, closes the Query Editor, and adds the VendorParts table to the Fields pane.

2. In the ribbon’s Home tab, click the Edit Queries button. This brings you to the Query Editor in case you want to apply additional transformation steps.

3. (Optional) You can disable loading query results. This could be useful if another query uses the results from the VendorParts query and it doesn’t make sense to create unnecessary tables. To demonstrate this, right-click the VendorParts query in the Queries pane and then uncheck “Enable Load”. Accept the warning that follows, which says that disabling the query load will delete the table from the model and break existing reports. Click Close & Apply and notice that the VendorParts table is removed from the Fields list.
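To illustrate the case where another query uses the results of VendorParts, here is a hedged sketch of such a dependent query. It is not self-contained: it assumes the VendorParts query from this section exists in the same file with Date and Units columns, and the name UnitsByDate is hypothetical.

```powerquery
let
    // Refer to the VendorParts query by name; this works whether or not its load is enabled.
    Source = VendorParts,
    // Hypothetical summary: total units per month-end date.
    UnitsByDate = Table.Group(Source, {"Date"},
        {{"Units", each List.Sum([Units]), type number}})
in
    UnitsByDate
```

In a setup like this, only the summary query would need to be loaded into the model, while VendorParts serves as an intermediate staging query.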
