7.1 Understanding Tables and Columns
7.1.3 Understanding the Column Data Types
A table column has a data type associated with it. In Chapter 5, I mentioned that a query
column already has a data type. When a query connects to the data source, it attempts to infer the column data type from the data provider and then maps it to one of the data types it supports. Although it seems redundant to have data types in two places (query and
storage), it gives you more flexibility. For example, you can keep the inferred data type in the query but change it in the table.
Currently, there isn’t a one-to-one mapping between query and storage data types.
Instead, Power BI Desktop maps the query column types to the ones that the xVelocity storage engine supports. Table 7.1 shows these mappings. Queries support two more data types (Date/Time/Timezone and Duration) than table columns.
Table 7.1 This table shows how query data types map to column data types.
Query Data Type | Storage Data Type | Description
Text | String | A Unicode character string with a max length of 268,435,456 characters
Decimal Number | Decimal Number | A 64 bit (eight-bytes) real number with decimal places
Fixed Decimal Number | Fixed Decimal Number | A decimal number with four decimal places of fixed precision useful for storing currencies
Whole Number | Whole Number | A 64 bit (eight-bytes) integer with no decimal places
Date/Time | Date/Time | Dates and times after March 1st, 1900
Date | Date | Just the date portion of a date
Time | Time | Just the time portion of a date
Date/Time/Timezone | Date | Universal date and time
Duration | Text | Time duration, such as 5:30 for five minutes and 30 seconds
TRUE/FALSE | Boolean | True or False value
Binary | Binary data type | An image or blob
How data types get assigned
The storage data type takes precedence over the query data type. For example, the query might declare a column data type as Decimal Number, and this type might get carried over to storage. However, you can overwrite the column data type in the Data View to Whole Number. Unless you change the data type in the query and apply the changes, the column data type remains Whole Number.
The storage engine tries to use the most compact data type, depending on the column values. For example, the query might have assigned a Fixed Decimal Number data type to a column that has only whole numbers. Don’t be surprised if the Data View shows the column data type as Whole Number after you import the data. Power BI might also perform a widening data conversion on import if it doesn’t support certain numeric data types. For example, if the underlying SQL Server data type is tinyint (one byte), Power BI will map it to Whole Number because that’s the only data type that it supports for whole numbers.
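To picture the two behaviors described above, here is a minimal Python sketch. It is only an illustration and not how the storage engine is implemented. The names `compact_type` and `widen_sql_type` are hypothetical, and the list of SQL Server integer types beyond tinyint is my assumption based on Whole Number being the only whole-number type.

```python
from decimal import Decimal

# Assumption: every SQL Server integer type widens the same way tinyint does,
# because Whole Number (64-bit) is the only whole-number type available.
SQL_INTEGER_TYPES = {"tinyint", "smallint", "int", "bigint"}

NUMERIC_TYPES = {"Decimal Number", "Fixed Decimal Number", "Whole Number"}


def widen_sql_type(sql_type):
    """Return 'Whole Number' for SQL integer types, or None if this sketch
    does not cover the type (hypothetical helper)."""
    if sql_type.lower() in SQL_INTEGER_TYPES:
        return "Whole Number"
    return None


def compact_type(declared_type, values):
    """Return 'Whole Number' when a numeric column holds only whole values,
    otherwise keep the type the query declared (hypothetical helper)."""
    if declared_type not in NUMERIC_TYPES:
        return declared_type
    present = [Decimal(str(v)) for v in values if v is not None]
    if present and all(d == d.to_integral_value() for d in present):
        return "Whole Number"
    return declared_type
```

For instance, `compact_type("Fixed Decimal Number", [10, 25.0, 300])` yields Whole Number, while a column that contains 19.99 keeps the declared Fixed Decimal Number type.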
Power BI doesn’t import columns whose data types it doesn’t recognize. For example, Power BI won’t import a SQL Server column of a geography data type that stores geodetic spatial data. The Binary data type can be used to import images. Importing images is useful for creating banded reports, such as a report that lets you click a product image to filter data for that product.
If the data source doesn’t provide schema information, Power BI imports data as text and uses the Text data type for all the columns. In such cases, you should overwrite the data types after import when it makes sense.
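The same pattern is easy to see outside Power BI. The following Python sketch reads a schema-less text extract, where every cell arrives as text, and then overwrites the types of selected columns. The sample data and the helper names `load_as_text` and `overwrite_types` are hypothetical and only illustrate the idea.

```python
import csv
import io


def load_as_text(raw):
    """Read delimited text. With no schema, every cell comes back as a str."""
    return list(csv.DictReader(io.StringIO(raw)))


def overwrite_types(rows, converters):
    """Apply a converter per column; columns not listed stay as text."""
    return [
        {col: converters.get(col, str)(val) for col, val in row.items()}
        for row in rows
    ]


# Hypothetical sample extract.
raw = "Product,Quantity,SalesAmount\nBike,2,1999.98\nHelmet,5,174.95\n"

rows = load_as_text(raw)  # all values are strings at this point
typed = overwrite_types(rows, {"Quantity": int, "SalesAmount": float})
```

After the call, `Quantity` and `SalesAmount` hold numbers that can be summed or averaged, while `Product` remains text.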
Changing the column data type
As I mentioned, the Formatting group in the ribbon’s Modeling tab and the Transform group in the Query Editor indicate the data type of the selected column. You should review and change the column type when needed, for the following reasons:
Data aggregation – You can sum or average only numeric columns.
Data validation – Suppose you’re given a text file with a SalesAmount column that’s supposed to store decimal data. What happens if an ‘NA’ value sneaks into one or more cells? The query will detect it and might change the column type to Text. You can examine the data type after import and detect such issues. As I mentioned in the previous chapter, I recommend you react to such issues in the Query Editor because it has the capabilities to remove errors or replace values. Of course, it’s best to fix such issues at the data source, but you probably won’t have write access to the source data.
NOTE What happens if all is well with the initial import, but a data type violation occurs the next month when you are given a new extract? What really happens in the case of a data type mismatch depends on the underlying data provider.
The text data provider (Microsoft ACE OLE DB provider) replaces the mismatched data values with blank values, and the blank values will be imported in the model. On the query side of things, if data mismatch occurs, you’ll see “Error”
in the corresponding cell to notify you about dirty data, but no error will be triggered on refresh.
Better performance – Smaller data types have more efficient storage and query performance. For example, a whole number is more efficient than text because it occupies only eight bytes irrespective of the number of digits.
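The data validation scenario and the note above describe two ways dirty values can surface: silently replaced with blanks, or flagged so you can act on them. The Python sketch below mimics both outcomes for a SalesAmount column. It is a simplified illustration under my own assumptions, not the behavior of any specific data provider, and the function names are hypothetical.

```python
def to_decimal_or_blank(cells):
    """Convert text cells to numbers, replacing mismatches with a blank (None)."""
    converted = []
    for cell in cells:
        try:
            converted.append(float(cell))
        except ValueError:
            converted.append(None)  # the dirty value is silently lost
    return converted


def find_mismatches(cells):
    """Return (row_index, value) pairs for cells that are not valid numbers."""
    mismatches = []
    for index, cell in enumerate(cells):
        try:
            float(cell)
        except ValueError:
            mismatches.append((index, cell))
    return mismatches


# Hypothetical extract where an 'NA' value sneaked into the second row.
sales_amount = ["10.5", "NA", "7"]
```

Running `find_mismatches(sales_amount)` before loading points you to the offending row, which is usually preferable to discovering unexplained blanks later.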
Sometimes, you might want to overwrite the column data type in the Data View. You can do so by expanding the Data Type drop-down list in the Formatting ribbon group and then selecting another type. Power BI Desktop only shows the list of the data types that are applicable for conversion. For example, if the original data type is Currency, you can convert the data type to Text, Decimal Number, or Whole Number. If the column is of a Text data type, the Data Type drop-down list would show all the data types. However, you’ll get a type mismatch error if the conversion fails, such as when trying to convert a non-numeric text value to a number.
Understanding column formatting
Each column in the Data View has a default format based on its data type and Windows regional settings. For example, my default format for Date columns is MM/dd/yyyy hh:mm:ss tt (such as 12/24/2011 01:55:20 PM) because my computer is configured for English US regional settings. This might present an issue for international users. However, they can override the language from the Power BI Desktop File > Options menu and change the formatting. Changing column formatting has no effect on how data is stored because the column format is for visualization purposes only. Use the Formatting group in the ribbon’s Modeling tab to change the column format settings, as shown in Figure 7.4.
You can use the format buttons in the Formatting ribbon group to apply changes interactively, such as to add a thousand separator or to increase the number of decimal places. Formatting changes apply automatically to reports the next time you switch to the Reports View. If the column width is too narrow to show the formatted values in Data View, you can increase the column width by dragging the right column border. Changing the column width in Data View has no effect on reports.
Figure 7.4 Use the Formatting ribbon group to change the column format.
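The point that formatting is for visualization only can be illustrated with a short Python sketch. The stored value stays the same while different format strings produce different text. The strftime pattern is my approximation of MM/dd/yyyy hh:mm:ss tt, and the AM/PM marker assumes Python's default locale. The sample amount is hypothetical.

```python
from datetime import datetime

# The stored value: one point in time, independent of any display format.
stored = datetime(2011, 12, 24, 13, 55, 20)

# Approximation of MM/dd/yyyy hh:mm:ss tt (12-hour clock with AM/PM).
us_text = stored.strftime("%m/%d/%Y %I:%M:%S %p")

# A 24-hour format that an international user might prefer.
iso_text = stored.strftime("%Y-%m-%d %H:%M:%S")

# Numeric formatting: thousands separator and two decimal places.
amount = 1234567.891
with_separator = f"{amount:,.2f}"

print(us_text)         # display text only
print(iso_text)        # same stored value, different display
print(with_separator)  # amount itself is unchanged
```

Both date strings come from the same `stored` value, and `amount` still holds all three decimal places after it is formatted with two.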