Dimension Columns    Change Type
Country              Historical attribute
Home Phone           Changing attribute
Work Phone           Changing attribute
Mobile Phone         Changing attribute
When you’ve configured all the columns, the wizard screen should look as shown in Figure 10-20. Click Next to move on.
16. In the Fixed and Changing Attribute Options screen, uncheck the “Fail the transformation if changes are detected in a fixed attribute” option, because we will make changes to the Email field to test how it behaves. In a production environment, you need to consider both of the options on this page in light of your requirements. Click Next to open the Historical Attribute Options screen.
Figure 10-20 Setting change types on the dimension columns
17. You have two options for recording a Historical attribute. If you select the single-column option, you can then select a column and the value pair that indicates the Current/Expired or True/False values. Alternatively, you can select start and end dates to identify current and expired records. Select the radio button next to the “Use start and end dates to identify current and expired records” option. Select StartDate in the Start Date column field and EndDate in the End Date column field. Then select System::StartTime in the “Variable to set date values” field, as shown in Figure 10-21. This will update the values of the StartDate and EndDate fields using the start time of the package provided by the specified system variable. Click Next.
18. By default, inferred member support is enabled in the SCD transformation, and you can select either of the two options to identify the inferred member record. With the first option, you specify null as the column value for all columns with an inferred member change type; with the second option, you specify an indicator column. Select the radio button for “Use a Boolean column to indicate whether the current record is an inferred member.” Select the IsInferred column from the drop-down list in the Inferred Member Indicator field, as shown in Figure 10-22. Click Next to review the outputs summary of the SCD Wizard.
Figure 10-21 Setting Historical attribute options in the SCD Wizard
19. Click Finish after reviewing the summary information. The SCD Wizard will take a while to build the data flow for you, but when it’s done, you will have a lot to review. After moving some of the components around, your data flow should look like the one shown in Figure 10-23.
The SCD Wizard has added a data flow for four of the six available outputs. Let’s do a quick review.
Inferred Member Updates Output
Double-click the OLE DB Command transformation attached to this output and go to the Component Properties tab. Check the SQL statement in the SqlCommand field to find that it will update DimCustomer for the Inferred member record (IsInferred = 1) and also reset its status while updating (IsInferred = 0). Click Cancel.
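To make the shape of that statement concrete, here is a rough sketch of what a wizard-generated inferred member update typically looks like; the abbreviated column list is an assumption, and the OLE DB Command maps the ? parameter markers to the input columns:

-- Sketch only; the actual SqlCommand generated by the SCD Wizard lists every column.
UPDATE [dbo].[DimCustomer]
SET [FirstName] = ?, [LastName] = ?, [Country] = ?,
    [Home Phone] = ?, [Work Phone] = ?, [Mobile Phone] = ?,
    [IsInferred] = 0
WHERE [CustomerID] = ? AND [IsInferred] = 1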
Changing Attribute Updates Output
Double-click the OLE DB Command attached to this output and go to the Component Properties tab. Check the SQL statement in the SqlCommand field to find that it will update the Changing attributes, i.e., the phone numbers, for the member records. Click Cancel to exit.
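The changing attribute update follows the same pattern; the sketch below is an assumption of its general form, and the exact WHERE clause depends on the options you chose in the wizard:

-- Sketch only; updates just the Changing attribute columns for the matching member.
UPDATE [dbo].[DimCustomer]
SET [Home Phone] = ?, [Work Phone] = ?, [Mobile Phone] = ?
WHERE [CustomerID] = ?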
Figure 10-22 Inferred member support in the SCD Wizard
Historical Attribute Inserts Output
Double-click the Derived Column component attached to this output and note that this transformation creates a new column, EndDate, and populates it with the package start time. Click Cancel. Double-click the OLE DB Command attached to the output of the Derived Column transformation and go to the Component Properties tab. Check the SQL statement in the SqlCommand field to find that it will update the member record of the DimCustomer table with the EndDate derived in the previous component, hence expiring the active record. Click Cancel. After the OLE DB Command, the output is then combined with the data flow of New Output using the Union All component. Double-click the Union All component to open the editor and note that the EndDate field created earlier has not been passed on to the New Output data flow. This leaves the EndDate column value null, indicating that the added record is the active member. So, in the Historical Attribute Inserts Output data flow, the current member is first expired and then a new row is inserted for the same member with the new attributes as an active member.
Figure 10-23 Data flow created by the SCD Wizard
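The expiring statement itself is short; as a sketch, assuming the column names used in this exercise, it looks something like this:

-- Sketch only; expires the currently active row by stamping its EndDate.
UPDATE [dbo].[DimCustomer]
SET [EndDate] = ?
WHERE [CustomerID] = ? AND [EndDate] IS NULL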
New Output
This output contains the new records, which are combined with the “to be inserted” records from the Historical Attribute Inserts output, as stated in the preceding item. After that, the output is connected to a Derived Column component. Double-click the Derived Column component and note that it adds a new StartDate column and populates it with the package start time using a system variable. Finally, it connects to an OLE DB Destination to insert these new records into the DimCustomer table.
Two other outputs, Fixed Attribute Output and Unchanged Output, are available at the Loading DimCustomer SCD transformation for which the SCD Wizard has not created any data flow. However, if you want to capture the rows from those outputs, you can create a downstream data flow for them.
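Conceptually, the OLE DB Destination at the end of the New Output path performs the equivalent of the insert sketched below. This is only an illustration with an abbreviated, assumed column list; the destination actually uses a fast-load interface rather than a literal statement:

-- Conceptual sketch: StartDate comes from the Derived Column (package start time),
-- and EndDate is left NULL to mark the row as the active member.
INSERT INTO [dbo].[DimCustomer]
    ([CustomerID], [FirstName], [LastName], [Country], [Home Phone], [StartDate], [EndDate])
VALUES (?, ?, ?, ?, ?, ?, NULL)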
Exercise (Execute Loading Slowly Changing Dimension Package)
In this final part, you will execute the package to see how the SCD transformation deals with different types of updates.
20. Double-click the New Output data flow path to open the Data Flow Path Editor. Go to the Data Viewers page and then click Add. Click OK to add a grid-type data viewer in the Configure Data Viewer dialog box. Click OK again to close the editor. Similarly, add data viewers to the other three outputs.
21. You are now ready to run this package. First, though, let’s look at the data that is going to be played and replayed so you can see how the SCD transformation responds to different changes. Open SQL Server Management Studio and connect to the database engine. Open a new query pane and run the following query to see the records in the Customer table:
Select * from [Campaign].[dbo].[Customer]
Note that the Customer table contains 14 records and that the CustomerID of the last four records is NULL. As per our package design, the Conditional Split transformation should filter out these records. Switch to BIDS, right-click the Loading Slowly Changing Dimension.dtsx package in the Solution Explorer, and choose Execute Package from the context menu.
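If you want to preview what the Conditional Split will pass on to the SCD transformation, you can run a quick ad hoc check such as the one below; it is not part of the package and simply returns the ten rows that have a CustomerID:

Select * from [Campaign].[dbo].[Customer] where CustomerID is not null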
As the package executes, the data viewer that gets data first will pop up, which is the one on New Output in our case. If you move things around on the screen, as shown in Figure 10-24, you will note that the Customers Filter component diverts four rows to the Excel destination and sends the other ten rows to the Loading DimCustomer SCD transformation. Further, the SCD transformation identifies that all these records are in fact new records and diverts them to New Output. No record flows through any other output. Click the Detach button on the New Output data viewer to let the data flow to the destination and be inserted into the DimCustomer table. Click the Detach button on the other data viewers and close them. Press shift-f5 to return to design mode.
Figure 10-24 Executing Loading Slowly Changing Dimension Package
22. Let’s make changes to the Customer data and see how this transformation handles them. Run CustomerChanges.sql from the C:\SSIS\RawFiles folder in the query pane of SQL Server Management Studio. This script makes changes to the first three records in the Customer table as follows:
CustomerID     Column                New Value
103            [Address Line 1]      15 Abercrombie Avenue
103            [Address Line 2]      Wooburn Green
It then adds an Inferred member record to the DimCustomer table with minimal information, as shown here:
[CustomerID]     [IsInferred]     [StartDate]
111              1                getdate()
It then inserts the additional information for the inferred member into the Customer table, as shown next, to simulate a real-life scenario:
[CustomerID]        111
[FirstName]         Mark
[LastName]          Morris
[Address Line 1]    Flat 22, Crescent Flats
[Address Line 2]    The Ridgeway
[Address Line 3]    Sketty
[City]              Chertsey
[State]             West Midlands
[Postcode]          PE7 3RQ
[Country]           United Kingdom
[Email]             mmark@AffordingIT.co.uk
[Home Phone]
[Work Phone]
[Mobile Phone]      079576516756
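Pulling those pieces together, a minimal sketch of statements covering the changes listed above is shown below. This is an illustration only, not the contents of CustomerChanges.sql: the real script also changes a home phone number and the Email field, which are not shown above, and it may populate columns not listed here.

-- Sketch only; the actual CustomerChanges.sql in C:\SSIS\RawFiles differs.
UPDATE [Campaign].[dbo].[Customer]
SET [Address Line 1] = '15 Abercrombie Avenue',
    [Address Line 2] = 'Wooburn Green'
WHERE [CustomerID] = 103;

-- Minimal inferred member shell row in the dimension.
INSERT INTO [Campaign].[dbo].[DimCustomer] ([CustomerID], [IsInferred], [StartDate])
VALUES (111, 1, getdate());

-- Full details for the inferred member arriving later in the source table.
INSERT INTO [Campaign].[dbo].[Customer]
    ([CustomerID], [FirstName], [LastName], [Address Line 1], [Address Line 2],
     [Address Line 3], [City], [State], [Postcode], [Country], [Email], [Mobile Phone])
VALUES
    (111, 'Mark', 'Morris', 'Flat 22, Crescent Flats', 'The Ridgeway', 'Sketty',
     'Chertsey', 'West Midlands', 'PE7 3RQ', 'United Kingdom',
     'mmark@AffordingIT.co.uk', '079576516756');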
23. After running the CustomerChanges.sql script successfully, switch to BIDS and execute the package again. As the package executes, this time three data viewers will pop up. First, you will see a record in the data viewer attached to the Changing Attribute Updates output. Note that this is the record for which you changed the home phone number, which is allowed, and no history is to be kept. Click Detach to let it process. Then look at the data viewer attached to the Historical Attribute Inserts Output; it will display a record with CustomerID = 103. Note that this is the record that changed the address, for which a history is to be kept. This record will first update the existing record with an EndDate and will then be combined with the New Output using the Union All transformation so as to be inserted as a new record. Click Detach to let it process. Next, look at the data viewer attached to the Inferred Member Updates Output, which shows a record with CustomerID = 111. This record will be updated in the DimCustomer table. Click Detach to let it process. The New Output path will also be processed and the package will complete.
24. Press shift-f5 to switch back to design mode. Press ctrl-shift-s to save all the files and close the project.
25. Switch to SQL Server Management Studio and check the DimCustomer table to see that it has been loaded with the expected values.
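A query along the following lines (the column list is an assumption based on this exercise) makes the results easy to read: the member whose address changed should now have two rows, the older one expired with an EndDate and the newer one active with a NULL EndDate, while the inferred member row for CustomerID 111 should now carry full details with IsInferred reset to 0.

Select CustomerID, [Address Line 1], [Home Phone], StartDate, EndDate, IsInferred
from [Campaign].[dbo].[DimCustomer]
order by CustomerID, StartDate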
Review
This exercise has explained loading a dimension, which is a daily chore in a DBA’s life. Loading a data warehouse dimension has been made easy with the Slowly Changing Dimension transformation, and configuring this transformation has been made easier still by the SCD Wizard. The SCD Wizard does so much work for you that you will probably use it every time you need to configure an SCD transformation. With its ease of use and coverage of most dimension-loading scenarios, you may think the SCD transformation has nailed the problem; however, it has a shortcoming: its performance when loading a large dimension. The use of OLE DB Command components to update each member in the dimension one row at a time slows the SCD down to almost unacceptable levels of performance. You will need to do some work here to improve performance. One solution could be to create indexes on the input and destination columns; another could be to replace the OLE DB Command with a staging OLE DB destination and an Execute SQL task that applies the updates in bulk, virtually converting a row-based operation into a set-based operation. I’ve seen other customized solutions where developers have derived different flags, first using a Script task and then loading the data set straightaway, converting the row-based decision-making logic at run time into a set-based operation and gaining performance. However, whichever method you use, testing for performance is the last, must-do step.
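As an illustration of the staging approach, a minimal sketch is shown below; the staging table name and the join on CustomerID are assumptions, not part of this exercise. The data flow bulk-loads the changed rows into the staging table, and a single Execute SQL task then applies them in one set-based statement.

-- Set-based replacement for the row-by-row OLE DB Command updates.
UPDATE d
SET d.[Home Phone]   = s.[Home Phone],
    d.[Work Phone]   = s.[Work Phone],
    d.[Mobile Phone] = s.[Mobile Phone]
FROM [Campaign].[dbo].[DimCustomer] AS d
JOIN [Campaign].[dbo].[DimCustomer_Staging] AS s
  ON s.[CustomerID] = d.[CustomerID];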
Data Mining Query Transformation
You can use a Data Mining Query transformation whenever you want to perform prediction queries against data mining models. This transformation has one input and one output and no error output. As you can envision, executing your Data Mining Extensions (DMX) query requires a connection to the data mining model, so the user interface of this transformation provides two tabs, Mining Model and Query, for you to configure. In the Mining Model tab, you specify the Analysis Services Connection Manager to connect to an Analysis Services project or server. You select the mining structure from the drop-down list in the Mining Structure field. Once you’ve selected a mining structure, the Mining Models field lists the mining models associated with that mining structure.
In the Query tab, you can either type in your query or use the graphical query designer by clicking Build New Query. The graphical query designer helps you build DMX queries. If you type directly in query mode and later switch to design mode, you will get a prompt alerting you that your changes may be lost. The designer in the Query Builder allows you to select mining models and data flow input columns from two list boxes. You can drag and drop fields from these boxes onto the designer cells and build custom DMX queries for evaluating data flow input data against an existing mining model. You can create more than one prediction query using multiple mining models; however, the mining models you select must belong to the same mining structure.
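For reference, a typical DMX prediction query of the kind the designer builds looks like the sketch below. The mining model, connection, and column names here are hypothetical; in the Data Mining Query transformation, the data flow input columns take the place of the rowset that a standalone query would supply through OPENQUERY.

-- Hypothetical DMX prediction query; model and column names are illustrative only.
SELECT
    t.[CustomerID],
    Predict([CustomerProfiles].[Bike Buyer]) AS PredictedBikeBuyer
FROM
    [CustomerProfiles]
PREDICTION JOIN
    OPENQUERY([Campaign Source], 'SELECT CustomerID, Age, Country FROM Customer') AS t
ON
    [CustomerProfiles].[Age] = t.[Age] AND
    [CustomerProfiles].[Country] = t.[Country]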
Term Lookup Transformation
Using the Term Lookup transformation, you can count the number of times a text term occurs in an input data row and create custom word lists and word frequency statistics. This transformation reads the terms from a lookup table, looks for matches in an input column, and then, by default, adds two columns named Term and Frequency to the output containing the term and the count for the term. The transformation supports one input and one output. The reference data can be in SQL Server 2000 and later or in an Access database, and this transformation can perform a lookup only on a column that is either the DT_WSTR or the DT_NTEXT data type.
When you open the editor, you will see the following three tabs:
Reference Table
Specify an OLE DB Connection Manager and a reference table name from which the transformation can read lookup terms (see the sketch that follows this list).
Term Lookup
Map an Input column to a Reference column to indicate that the lookup terms in the Reference column are to be counted in the Input column. You also select the columns that you want to pass through the data flow. Based on your selections, an advanced property of the Input column, InputColumnType, is set; this property is available in the Advanced Editor. Selecting a column only for pass-through sets the InputColumnType property to 0, mapping a column for lookup only sets it to 1, and selecting a column for both pass-through and lookup operations sets it to 2.
Advanced
Select whether to use a case-sensitive term lookup, in which uppercase words are treated separately from lowercase words. However, if a word is the first word in a sentence and its first letter is capitalized, it can still match a lowercase equivalent; for example, the word travel will match the word Travel in the sentence “Travel in style and comfort.”
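As a concrete example of the reference table mentioned above, here is a minimal sketch; the table and column names are assumptions rather than required names, and nvarchar is used so the column arrives as DT_WSTR, one of the two data types the transformation can look up on.

-- Hypothetical lookup-terms table for the Term Lookup transformation.
CREATE TABLE dbo.LookupTerms
(
    Term nvarchar(100) NOT NULL
);

INSERT INTO dbo.LookupTerms (Term)
VALUES (N'Travel'), (N'Holiday'), (N'Flight');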
It is relatively straightforward to configure this transformation, and the mechanics are also simple. The transformation loads the terms it is to look up from the reference table into its private memory. It works in a fully precached mode only, so it loads all the values before processing lookups against the input column. However, to get accurate results you need to understand how this transformation behaves for different types of matches. The Term Lookup transformation extracts the required term from the input column by breaking the text into sentences, breaking sentences into words, and then normalizing the words. To extract the matching term, this transformation observes the following rules:
If you specify the singular form of the word or phrase in the reference table, the transformation will match both singular and plural forms of the word or phrase.
If you use a plural form of the noun or noun phrase in the reference table, the transformation matches it only with a plural form of the noun or noun phrase in the input data.
If you want to match nouns and noun phrases that contain special characters such as %, @, &, $, #, *, :, ;, ., ,, !, ?, <, >, +, =, ^, ~, |, \, /, (, ), [, ], {, }, “, and ‘, you can do so by including these special characters in the nouns and noun phrases in the reference table.
The Term Lookup transformation returns only one result for any lookup input column in which multiple overlapping terms are involved.
While normalizing the input column words, the Term Lookup transformation normalizes only the last word in a lemmatized noun phrase.
Term Extraction Transformation
Using a Term Extraction transformation, you can extract terms from the text of an input column and thus build a list of terms used repeatedly in the input column for text mining and data analysis. This transformation, however, is limited in that it can extract only nouns, noun phrases, or a combination of both, and in English text only. It is aware of linguistic information about English and comes with its own English dictionary. At run time, the Term Extraction transformation reads the specified input