Chapter 6
Turning Data into Information
After completing this chapter, you will be able to:
■ Return a value that aggregates data from a table column
■ Add a column that aggregates data from a table, or from its parent or child table
■ Build an index-based view of a table
■ Generate a new table based on a projected view of the original table
After you have joined DataTable instances together in a DataSet, ADO.NET enables a few more features that let you use those table relationships to analyze and select data. These features build upon some of the single-table functions covered in earlier chapters.
This chapter introduces the data-aggregation features included in the ADO.NET Framework, expressions that summarize data across multiple table rows. Although not as powerful as the aggregation features found in relational database systems, the DataTable variations still provide quick access to multirow data summaries. The chapter ends with an introduction to the DataView class, which lets you establish row selection, filtering, and sorting standards for a DataTable.
Note The exercises in this chapter all use the same sample project, a tool that demonstrates aggregate and data view features. Although you will be able to run the application after each exercise, the expected results for the full application might not appear until you complete all exercises in the chapter.
Aggregating Data
An aggregation function returns a single calculated value from a set of related values. Averages are one type of data aggregation; they calculate a single averaged value from an input of multiple source values. ADO.NET includes seven aggregation functions for use in expression columns and other DataTable features.
■ Sum Calculates the total of a set of column values. The column being summed must be numeric, either integral or decimal.
■ Avg Returns the average for a set of numbers in a column. This function also requires a numeric column.
■ Min Returns the smallest value from the available column values. Numbers, strings, dates, and other types of data that can be placed in order are all valid for the target column.
■ Max Like Min, but returns the largest value from the available column values. As with the Min function, most column types will work.
■ Count Counts the number of rows included in the aggregation. You can pass any type of column to this function. As long as a row includes a non-NULL value in that column, it will be counted as 1.
■ StDev Determines the statistical standard deviation for a set of values, a common measure of variability within such a set. The indicated column must be numeric.
■ Var Calculates the statistical variance for a set of numbers, another measurement related to the standard deviation. Only numeric columns are supported.
These seven data aggregation features appear as functions within ADO.NET expressions. Expressions were introduced in the “Using Expression Columns” section of Chapter 4, “Accessing the Right Data Values.” String expressions form the basis of custom expression columns and are also used in selecting subsets of DataTable rows. To aggregate data, use one of the following function formats as the expression string:
■ Sum(column-name)
■ Avg(column-name)
■ Min(column-name)
■ Max(column-name)
■ Count(column-name)
■ StDev(column-name)
■ Var(column-name)
In ADO.NET, aggregates always summarize a single DataTable column. Each aggregate function considers only non-NULL column values. Rows that contain NULL values in the specified column are excluded from the aggregation. For example, if you take the average of a table column with 10 rows, but 3 of those rows contain NULL values in the column being averaged, the function will average only the 7 non-NULL values. This is especially useful with the Count function; it counts only the number of rows that have a non-NULL value for the passed column name. If all the column values are NULL, or if there are no rows to apply to the aggregation function, the result is NULL (System.DBNull).
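The NULL-handling rules above can be tried out with a small console sketch. It uses the DataTable.Compute method, which the next section describes. The Reading table, its Score and Bonus columns, and all sample values are assumptions made for illustration; they are not part of the chapter's sample project.

```csharp
using System;
using System.Data;

public static class NullAggregateDemo
{
    // Builds a hypothetical four-row table. One Score value is NULL,
    // and the Bonus column is NULL in every row.
    public static DataTable BuildReadings()
    {
        DataTable readings = new DataTable("Reading");
        readings.Columns.Add("ID", typeof(int));
        readings.Columns.Add("Score", typeof(decimal));
        readings.Columns.Add("Bonus", typeof(decimal));
        readings.Rows.Add(new object[] {1, 10m, DBNull.Value});
        readings.Rows.Add(new object[] {2, 20m, DBNull.Value});
        readings.Rows.Add(new object[] {3, DBNull.Value, DBNull.Value});
        readings.Rows.Add(new object[] {4, 30m, DBNull.Value});
        return readings;
    }

    public static void Main()
    {
        DataTable readings = BuildReadings();

        // Apply each of the seven aggregate functions to the Score column.
        string[] functions = {"Sum", "Avg", "Min", "Max", "Count", "StDev", "Var"};
        foreach (string functionName in functions)
        {
            object result = readings.Compute(functionName + "(Score)", "");
            Console.WriteLine(functionName + "(Score) = " + result);
        }

        // Count(ID) sees four non-NULL values; Count(Score) sees only three.
        Console.WriteLine("Count(ID) = " + readings.Compute("Count(ID)", ""));

        // Every Bonus value is NULL, so the average is DBNull.
        object bonusAverage = readings.Compute("Avg(Bonus)", "");
        Console.WriteLine("Avg(Bonus) = " +
            (DBNull.Value.Equals(bonusAverage) ? "NULL" : bonusAverage.ToString()));
    }
}
```

With this sample data, Avg(Score) works out to 20 because only the three non-NULL scores (10, 20, and 30) take part in the calculation.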
Generating a Single Aggregate
To calculate the aggregate of a single table column, use the DataTable object’s Compute method. Pass it an expression string that contains an aggregate function with a column-name argument.
C#
DataTable employees = new DataTable("Employee");
employees.Columns.Add("ID", typeof(int));
employees.Columns.Add("Gender", typeof(string));
employees.Columns.Add("FullName", typeof(string));
employees.Columns.Add("Salary", typeof(decimal));
// - Add employee data to table, then
decimal averageSalary = (decimal)employees.Compute("Avg(Salary)", "");
Visual Basic
Dim employees As New DataTable("Employee")
employees.Columns.Add("ID", GetType(Integer))
employees.Columns.Add("Gender", GetType(String))
employees.Columns.Add("FullName", GetType(String))
employees.Columns.Add("Salary", GetType(Decimal))
' - Add employee data to table, then
Dim averageSalary As Decimal = CDec(employees.Compute("Avg(Salary)", ""))
In the preceding code, the Compute method calculates the average of the values in the Salary column. The second argument to Compute is a filter that limits the rows included in the calculation. It accepts a Boolean criteria expression similar to those used in the DataTable.Select method call.
C#
int femalesInCompany = (int)employees.Compute("Count(ID)",
"Gender = 'F'");
Visual Basic
Dim femalesInCompany As Integer = CInt(employees.Compute("Count(ID)",
"Gender = 'F'"))
Computing an Aggregate Value: C#
1. Open the “Chapter 6 CSharp” project from the installed samples folder. The project includes three Windows.Forms classes: Switchboard, Aggregates, and DataViews.
2. Open the source code view for the Aggregates form. Locate the ActCompute_Click function. This routine computes an aggregate value for a single table column.
3. Just after the “Build the expression” comment, add the following statement:
expression = ComputeFunction.SelectedItem.ToString() + "(" +
columnName + ")";
This code builds an expression string that combines one of the seven aggregate functions and a column name from the sample table.
4. Just after the “Process the expression” comment, add the following code:
try
{
result = whichTable.Compute(expression, "");
}
catch (Exception ex)
{
MessageBox.Show("Could not compute the column: " + ex.Message);
return;
}
The code performs the calculation in a try block because the code that built the expression didn’t bother to verify things such as allowing only numeric columns to be used with the Sum aggregate function. The catch block will capture such problems at runtime.
5. Just after the “Display the results” comment, add the following statements:
if (DBNull.Value.Equals(result))
MessageBox.Show("NULL");
else
MessageBox.Show(result.ToString());
Some aggregates may return a NULL result depending on the contents of the column. This code makes that distinction.
6. Run the program. When the Switchboard form appears, click Aggregate Functions. When the Aggregates form appears, use the fields to the right of the Compute label to generate the aggregate. For example, select Sum from the Aggregate Function field (the one just to the right of the Compute label), and choose Child.Population2009 from the Column Name field (the one in parentheses). Then click Compute. The response of “307006550” comes from adding up all values in the child table’s Population2009 column.
Note The “Child.” prefix shown in the Column Name field is stripped out before the column name is inserted into the expression. The Compute method does not support the Parent and Child prefixes before column names.
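The exercise relies on a try block to catch invalid combinations and strips the “Child.” prefix before calling Compute. As a rough sketch of how both concerns could be handled up front, the following console program validates the column before building the expression. The helper names (StripRelationPrefix, IsNumeric, BuildExpression), the State table, and its values are all hypothetical; this is not the sample project's actual code.

```csharp
using System;
using System.Data;

public static class ExpressionBuilderDemo
{
    // Hypothetical helper: removes a leading "Child." or "Parent." prefix,
    // because Compute accepts only plain column names.
    public static string StripRelationPrefix(string columnName)
    {
        string[] prefixes = {"Child.", "Parent."};
        foreach (string prefix in prefixes)
        {
            if (columnName.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
                return columnName.Substring(prefix.Length);
        }
        return columnName;
    }

    // Hypothetical check that covers only the common numeric column types.
    public static bool IsNumeric(DataColumn column)
    {
        Type type = column.DataType;
        return type == typeof(byte) || type == typeof(short) ||
               type == typeof(int) || type == typeof(long) ||
               type == typeof(float) || type == typeof(double) ||
               type == typeof(decimal);
    }

    // Returns the expression string, or null when the column is missing or
    // when the function needs a numeric column and the column is not numeric.
    public static string BuildExpression(DataTable table, string functionName,
        string rawColumnName)
    {
        string columnName = StripRelationPrefix(rawColumnName);
        if (!table.Columns.Contains(columnName)) return null;

        bool anyTypeAllowed = functionName == "Min" || functionName == "Max" ||
            functionName == "Count";
        if (!anyTypeAllowed && !IsNumeric(table.Columns[columnName])) return null;

        return functionName + "(" + columnName + ")";
    }

    public static void Main()
    {
        // Invented sample data; not the chapter's sample table.
        DataTable states = new DataTable("State");
        states.Columns.Add("Name", typeof(string));
        states.Columns.Add("Population2009", typeof(decimal));
        states.Rows.Add(new object[] {"Sample A", 100m});
        states.Rows.Add(new object[] {"Sample B", 250m});

        string expression = BuildExpression(states, "Sum", "Child.Population2009");
        Console.WriteLine(expression);                       // Sum(Population2009)
        Console.WriteLine(states.Compute(expression, ""));   // 350

        // Sum on a string column is rejected before Compute is ever called.
        Console.WriteLine(BuildExpression(states, "Sum", "Name") == null);  // True
    }
}
```

Checking the column type first lets the program show a specific message for a bad function and column pairing, while the try block in the exercise remains a reasonable safety net for anything the check misses.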
Computing an Aggregate Value: Visual Basic
1. Open the “Chapter 6 VB” project from the installed samples folder. The project includes three Windows.Forms classes: Switchboard, Aggregates, and DataViews.
2. Open the source code view for the Aggregates form. Locate the ActCompute_Click function. This routine computes an aggregate value for a single table column.
3. Just after the “Build the expression” comment, add the following statement:
expression = ComputeFunction.SelectedItem.ToString() & "(" &
columnName & ")"
This code builds an expression string that combines one of the seven aggregate functions and a column name from the sample table.
4. Just after the “Process the expression” comment, add the following code:
Try
result = whichTable.Compute(expression, "")
Catch ex As Exception
MessageBox.Show("Could not compute the column: " & ex.Message)
Return
End Try
The code performs the calculation in a Try block because the code that built the expression didn’t bother to verify things such as allowing only numeric columns to be used with the Sum aggregate function. The Catch block will capture such problems at runtime.
5. Just after the “Display the results” comment, add the following statements:
If (IsDBNull(result) = True) Then
MessageBox.Show("NULL")
Else
MessageBox.Show(result.ToString())
End If
Some aggregates may return a NULL result depending on the contents of the column. This code makes that distinction.
6. Run the program. When the Switchboard form appears, click Aggregate Functions. When the Aggregates form appears, use the fields to the right of the Compute label to generate the aggregate. For example, select Sum from the Aggregate Function field (the one just to the right of the Compute label), and choose Child.Population2009 from the Column Name field (the one in parentheses). Then click Compute. The response of “307006550” comes from adding up all values in the child table’s Population2009 column.
Note In the example, the “Child.” prefix shown in the Column Name field is stripped out before the column name is inserted into the expression. The Compute method does not support the Parent and Child prefixes before column names.
Adding an Aggregate Column
Expression columns typically compute a value based on other columns in the same row. You can also add an expression column to a table that generates an aggregate value. In the absence of a filtering expression, aggregates always compute their totals using all rows in a table. This is also true of aggregate expression columns. When you add such a column to a table, that column will contain the same value in every row, and that value will reflect the aggregation of all rows in the table.
C#
DataTable sports = new DataTable("Sports");
sports.Columns.Add("SportName", typeof(string));
sports.Columns.Add("TeamPlayers", typeof(decimal));
sports.Columns.Add("AveragePlayers", typeof(decimal),
"Avg(TeamPlayers)");
sports.Rows.Add(new Object[] {"Baseball", 9});
sports.Rows.Add(new Object[] {"Basketball", 5});
sports.Rows.Add(new Object[] {"Cricket", 11});
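// The row indexer returns a boxed decimal, so call ToString() rather than casting to string.
MessageBox.Show(sports.Rows[0]["AveragePlayers"].ToString()); // Displays 8.333... (25 / 3)
MessageBox.Show(sports.Rows[1]["AveragePlayers"].ToString()); // Same value in every row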
Visual Basic
Dim sports As New DataTable("Sports")
sports.Columns.Add("SportName", GetType(String))
sports.Columns.Add("TeamPlayers", GetType(Decimal))
sports.Columns.Add("AveragePlayers", GetType(Decimal),
"Avg(TeamPlayers)")
sports.Rows.Add({"Baseball", 9})
sports.Rows.Add({"Basketball", 5})
sports.Rows.Add({"Cricket", 11})
MessageBox.Show(CStr(sports.Rows(0)!AveragePlayers)) ' Displays 8.333... (25 / 3)
MessageBox.Show(CStr(sports.Rows(1)!AveragePlayers)) ' Same value in every row
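Because the aggregate column reflects all rows in the table, its value changes in every row whenever the set of rows changes. The following console sketch reuses the Sports table from the listing above and adds one more row to show the effect; the extra "Soccer" row and the class and method names are assumptions added for illustration.

```csharp
using System;
using System.Data;

public static class AggregateColumnDemo
{
    // Rebuilds the Sports table from the chapter listing.
    public static DataTable BuildSports()
    {
        DataTable sports = new DataTable("Sports");
        sports.Columns.Add("SportName", typeof(string));
        sports.Columns.Add("TeamPlayers", typeof(decimal));
        sports.Columns.Add("AveragePlayers", typeof(decimal), "Avg(TeamPlayers)");
        sports.Rows.Add(new object[] {"Baseball", 9});
        sports.Rows.Add(new object[] {"Basketball", 5});
        sports.Rows.Add(new object[] {"Cricket", 11});
        return sports;
    }

    public static void Main()
    {
        DataTable sports = BuildSports();

        // With three rows, the average is 25 / 3.
        Console.WriteLine(sports.Rows[0]["AveragePlayers"]);

        // Hypothetical fourth row; the average becomes 36 / 4 = 9.
        sports.Rows.Add(new object[] {"Soccer", 11});

        // Every row, including the older ones, now reports the new average.
        foreach (DataRow row in sports.Rows)
        {
            Console.WriteLine(row["SportName"] + ": " + row["AveragePlayers"]);
        }
    }
}
```

Keep in mind that this recalculation happens for the whole table, so on large tables an aggregate column is a heavier feature than a per-row expression.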
Aggregating Data Across Related Tables
Adding aggregate functions to an expression column certainly gives you more data options, but as a calculation method it doesn’t provide any benefit beyond the DataTable.Compute method. The real power of aggregate expression columns appears when working with related tables. By adding an aggregate function to a parent table that references the child table, you can generate summaries that are grouped by each parent row. This functionality is similar in purpose to the GROUP BY clause found in the SQL language.
To apply an aggregate to a table relationship, you first add both tables to a DataSet and then add the relevant DataRelation between the linked fields. After the tables are linked, you include the Child keyword with the aggregate function’s column name reference.
function-name(Child.column-name)
As with single-table aggregation, the expression can reference any valid column in the child table, including other expression columns. Consider the following code, which calculates each customer’s total orders and stores the result in an expression column in the customer (parent) table:
C#
// - Build the parent table and add some data.
DataTable customers = new DataTable("Customer");
customers.Columns.Add("ID", typeof(int));
customers.Columns.Add("Name", typeof(string));
customers.Rows.Add(new Object[] {1, "Coho Winery"});
customers.Rows.Add(new Object[] {2, "Fourth Coffee"});
// - Build the child table and add some data. The "Total"
// expression column adds sales tax to the subtotal.
DataTable orders = new DataTable("Order");
orders.Columns.Add("ID", typeof(int));
orders.Columns.Add("Customer", typeof(int));
orders.Columns.Add("Subtotal", typeof(decimal));
orders.Columns.Add("TaxRate", typeof(decimal));
orders.Columns.Add("Total", typeof(decimal), "Subtotal * (1 + TaxRate)");
// - Two sample orders for customer 1, 1 for customer 2.
orders.Rows.Add(new Object[] {1, 1, 35.24, 0.0875}); // Total = $38.32
orders.Rows.Add(new Object[] {2, 1, 56.21, 0.0875}); // Total = $61.13
orders.Rows.Add(new Object[] {3, 2, 14.94, 0.0925}); // Total = $16.32
// - Link the tables within a DataSet.
DataSet business = new DataSet();
business.Tables.Add(customers);
business.Tables.Add(orders);
business.Relations.Add(customers.Columns["ID"], orders.Columns["Customer"]);
// - Here is the aggregate expression column.
customers.Columns.Add("OrderTotals", typeof(decimal), "Sum(Child.Total)");
// - Display each customer's order total.
foreach (DataRow scanCustomer in customers.Rows)
{
Console.WriteLine((string)scanCustomer["Name"] + ": " +
string.Format("{0:c}", (decimal)scanCustomer["OrderTotals"]));
}
Visual Basic
' - Build the parent table and add some data.
Dim customers As New DataTable("Customer")
customers.Columns.Add("ID", GetType(Integer))
customers.Columns.Add("Name", GetType(String))
customers.Rows.Add({1, "Coho Winery"})
customers.Rows.Add({2, "Fourth Coffee"})
' - Build the child table and add some data. The "Total"
' expression column adds sales tax to the subtotal.
Dim orders As New DataTable("Order")
orders.Columns.Add("ID", GetType(Integer))
orders.Columns.Add("Customer", GetType(Integer))
orders.Columns.Add("Subtotal", GetType(Decimal))
orders.Columns.Add("TaxRate", GetType(Decimal))
orders.Columns.Add("Total", GetType(Decimal), "Subtotal * (1 + TaxRate)")
' - Two sample orders for customer 1, 1 for customer 2.
orders.Rows.Add({1, 1, 35.24, 0.0875}) ' Total = $38.32
orders.Rows.Add({2, 1, 56.21, 0.0875}) ' Total = $61.13
orders.Rows.Add({3, 2, 14.94, 0.0925}) ' Total = $16.32
' - Link the tables within a DataSet.
Dim business As New DataSet
business.Tables.Add(customers)
business.Tables.Add(orders)
business.Relations.Add(customers.Columns!ID, orders.Columns!Customer)
' - Here is the aggregate expression column.
customers.Columns.Add("OrderTotals", GetType(Decimal), "Sum(Child.Total)")
' - Display each customer's order total.
For Each scanCustomer As DataRow In customers.Rows
Console.WriteLine(CStr(scanCustomer!Name) & ": " &
Format(scanCustomer!OrderTotals, "Currency"))
Next scanCustomer
This code generates the following output, correctly calculating the per-customer total of all child-record orders:
Coho Winery: $99.45
Fourth Coffee: $16.32
The code calculated these totals by adding up the Child.Total column values for only those child rows that were associated with the parent row through the defined DataRelation. Because the aggregate functions work only with a single named column, a more complex request such as Sum(Child.Subtotal * (1 + Child.TaxRate)) would fail. The only way to generate totals from multiple child columns (or even multiple columns within the same table) is to first add an expression column to the child table and then apply the aggregate function to that new column.
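Several child aggregates can sit side by side in the parent table, much like multiple summary columns in a GROUP BY query. The sketch below extends the customer and order tables from the listing above with an order count and a third customer that has no orders. The third customer, the OrderCount column, and the class and method names are assumptions added for illustration.

```csharp
using System;
using System.Data;
using System.Globalization;

public static class ChildAggregateDemo
{
    public static DataTable BuildCustomers()
    {
        DataTable customers = new DataTable("Customer");
        customers.Columns.Add("ID", typeof(int));
        customers.Columns.Add("Name", typeof(string));
        customers.Rows.Add(new object[] {1, "Coho Winery"});
        customers.Rows.Add(new object[] {2, "Fourth Coffee"});
        customers.Rows.Add(new object[] {3, "Sample Customer"});  // hypothetical, no orders

        // The Total expression column combines two columns per row so that
        // the parent can aggregate a single named child column.
        DataTable orders = new DataTable("Order");
        orders.Columns.Add("ID", typeof(int));
        orders.Columns.Add("Customer", typeof(int));
        orders.Columns.Add("Subtotal", typeof(decimal));
        orders.Columns.Add("TaxRate", typeof(decimal));
        orders.Columns.Add("Total", typeof(decimal), "Subtotal * (1 + TaxRate)");
        orders.Rows.Add(new object[] {1, 1, 35.24m, 0.0875m});
        orders.Rows.Add(new object[] {2, 1, 56.21m, 0.0875m});
        orders.Rows.Add(new object[] {3, 2, 14.94m, 0.0925m});

        DataSet business = new DataSet();
        business.Tables.Add(customers);
        business.Tables.Add(orders);
        business.Relations.Add(customers.Columns["ID"], orders.Columns["Customer"]);

        // Two per-customer summaries, each over one child column.
        customers.Columns.Add("OrderCount", typeof(int), "Count(Child.ID)");
        customers.Columns.Add("OrderTotals", typeof(decimal), "Sum(Child.Total)");
        return customers;
    }

    public static void Main()
    {
        foreach (DataRow customer in BuildCustomers().Rows)
        {
            // Sum has nothing to add up for a customer without orders,
            // so test for DBNull before casting.
            object totals = customer["OrderTotals"];
            string totalText = DBNull.Value.Equals(totals)
                ? "(no orders)"
                : ((decimal)totals).ToString("0.00", CultureInfo.InvariantCulture);
            Console.WriteLine(customer["Name"] + ": " + customer["OrderCount"] +
                " order(s), total " + totalText);
        }
    }
}
```

The DBNull test matters here: a parent row with no matching child rows yields a NULL sum, and casting that value directly to decimal would throw an exception.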
Referencing Parent Fields in Expressions
Although ADO.NET query expressions support a “Parent” keyword, it can’t be used with the aggregation functions. Instead, you use it to add an expression column to a child table that references column data from the parent table. For instance, if you had Customer (parent) and Order (child) tables linked by a customer ID, and the parent table included the address for the customer, you could include the city name in the child table using an expression column.
C#
orders.Columns.Add("CustomerCity", typeof(string), "Parent.City");
Visual Basic
orders.Columns.Add("CustomerCity", GetType(String), "Parent.City")
All standard expression operators that work with the local table’s column data will also work with parent columns.
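The one-line statements above assume that the tables and relation already exist. As a fuller sketch, the following console program builds both tables, links them, and then adds two parent-based expression columns, one of which combines parent columns with the + string operator. The City values, the CustomerLabel column, and the class and method names are assumptions made for illustration.

```csharp
using System;
using System.Data;

public static class ParentReferenceDemo
{
    public static DataTable BuildOrders()
    {
        // Parent table with invented city names.
        DataTable customers = new DataTable("Customer");
        customers.Columns.Add("ID", typeof(int));
        customers.Columns.Add("Name", typeof(string));
        customers.Columns.Add("City", typeof(string));
        customers.Rows.Add(new object[] {1, "Coho Winery", "Sample City A"});
        customers.Rows.Add(new object[] {2, "Fourth Coffee", "Sample City B"});

        // Child table; the Customer column holds the parent ID.
        DataTable orders = new DataTable("Order");
        orders.Columns.Add("ID", typeof(int));
        orders.Columns.Add("Customer", typeof(int));
        orders.Rows.Add(new object[] {1, 1});
        orders.Rows.Add(new object[] {2, 2});

        // The Parent keyword needs a DataRelation between the tables.
        DataSet business = new DataSet();
        business.Tables.Add(customers);
        business.Tables.Add(orders);
        business.Relations.Add(customers.Columns["ID"], orders.Columns["Customer"]);

        // Pull a parent value into each child row.
        orders.Columns.Add("CustomerCity", typeof(string), "Parent.City");

        // Parent columns can take part in ordinary expression operators.
        orders.Columns.Add("CustomerLabel", typeof(string),
            "Parent.Name + ' (' + Parent.City + ')'");
        return orders;
    }

    public static void Main()
    {
        foreach (DataRow order in BuildOrders().Rows)
        {
            Console.WriteLine("Order " + order["ID"] + ": " + order["CustomerLabel"]);
        }
    }
}
```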
Setting Up Indexed Views
The DataTable.Select method lets you apply a selection query to a table, returning a subset of the available rows in the DataTable. It’s convenient, but if you will run the same query against the table repeatedly, it’s not the most efficient use of computing resources. Also, because it returns an array of DataRow instances instead of a new DataTable, some tools that expect a full table construct won’t work with the returned results.
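The limitations just described are easy to see in a short console sketch: Select hands back a plain DataRow array that is a snapshot of the query, so the query has to be run again to pick up later changes. The Employee rows and the class and method names here are invented for illustration.

```csharp
using System;
using System.Data;

public static class SelectDemo
{
    // Hypothetical employee data, loosely based on the earlier Employee table.
    public static DataTable BuildEmployees()
    {
        DataTable employees = new DataTable("Employee");
        employees.Columns.Add("ID", typeof(int));
        employees.Columns.Add("FullName", typeof(string));
        employees.Columns.Add("Salary", typeof(decimal));
        employees.Rows.Add(new object[] {1, "Sample Employee A", 50000m});
        employees.Rows.Add(new object[] {2, "Sample Employee B", 40000m});
        employees.Rows.Add(new object[] {3, "Sample Employee C", 60000m});
        return employees;
    }

    public static void Main()
    {
        DataTable employees = BuildEmployees();

        // Select returns an array of DataRow objects, not a DataTable.
        DataRow[] wellPaid = employees.Select("Salary >= 50000", "FullName ASC");
        Console.WriteLine(wellPaid.Length);   // 2

        // A new matching row does not appear in the existing array.
        employees.Rows.Add(new object[] {4, "Sample Employee D", 70000m});
        Console.WriteLine(wellPaid.Length);   // still 2

        // The query must be evaluated again to see the new row.
        Console.WriteLine(employees.Select("Salary >= 50000").Length);   // 3
    }
}
```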