The third format of the COUNT function allows you to use the DISTINCT
keyword in addition to a column name. Here’s an example:
SELECT
COUNT (DISTINCT FeeType) AS 'Number of Fee Types'
FROM Fees
This statement counts the number of distinct values in the FeeType
column. The result is:
Number of Fee Types
3
This means that there are three different values found in the FeeType column.
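To see how the DISTINCT form differs from the other forms of COUNT, the following is a minimal sketch rather than part of the original example. It assumes the other two forms are COUNT(*) and COUNT(column), runs against SQLite through Python's standard sqlite3 module, and uses a hypothetical Fees table whose columns and rows are invented for illustration.

```python
import sqlite3

# Hypothetical Fees table: column names and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Fees (Student TEXT, FeeType TEXT, Fee REAL)")
conn.executemany(
    "INSERT INTO Fees VALUES (?, ?, ?)",
    [
        ("George", "Gym", 30),
        ("George", "Lunch", 10),
        ("Carol", "Lunch", 10),
        ("Carol", "Trip", 8),
        ("Susan", "Gym", None),  # fee amount not yet known
    ],
)

total_rows, fees_present, fee_types = conn.execute(
    """
    SELECT
        COUNT(*),                -- every row, NULLs included
        COUNT(Fee),              -- only rows where Fee is not NULL
        COUNT(DISTINCT FeeType)  -- number of different FeeType values
    FROM Fees
    """
).fetchone()

print(total_rows, fees_present, fee_types)
```

With these sample rows, the three counts differ because one row has a NULL fee and the five rows share only three fee types.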
Grouping Data
The previous examples of aggregation functions are interesting, but of somewhat
limited value. The real power of the aggregation functions will become evident
after we introduce the concept of grouping data.
The GROUP BY keyword is used to separate data returned from a SELECT
statement into any number of groups. For example, when looking at the previous
Grades table, you may be interested in analyzing test scores based on the grade
type. In other words, you want to divide the data into two groups,
quizzes and homework. The value of the GradeType column can be used to
determine which group each row belongs to.
Once data has been separated into groups, aggregation functions can be
used to calculate and compare summary statistics for each of the
groups.
Let’s proceed with an example that introduces the GROUP BY keyword:
SELECT
GradeType AS 'Grade Type',
AVG (Grade) AS 'Average Grade'
FROM Grades
GROUP BY GradeType
ORDER BY GradeType
The result is:
Grade Type Average Grade
In this example, the GROUP BY keyword specifies that groups are to be created based on the value of the GradeType column. The two columns in the SELECT
columnlist are GradeType and a calculated field that uses the AVG function. The
GradeType column was included in the columnlist because, when creating a
group, it’s usually a good idea to include the column on which the groups are based. The "Average Grade" calculated field aggregates values based on all rows
in each group.
Notice that the average homework grade has been computed as 86. Even though there is one row with a NULL value for the Homework type, SQL is smart enough to ignore rows with NULL values when computing an average. If you want the NULL value to be counted as a 0, then the ISNULL function can be used to convert the NULL to a 0, as follows:
AVG (ISNULL (Grade, 0)) AS 'Average Grade'
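The effect of NULL values on a grouped average can be checked with a small sketch. It runs against SQLite through Python's standard sqlite3 module, which is an assumption made so the example is self-contained. SQLite has no two-argument ISNULL function, so the standard COALESCE function stands in for it here. The sample rows are hypothetical (only Kathy's NULL homework grade comes from the text) and were chosen so that the homework average is 86.

```python
import sqlite3

# Hypothetical Grades rows; only Kathy's NULL homework grade is taken from the text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Grades (Student TEXT, GradeType TEXT, Grade INTEGER)")
conn.executemany(
    "INSERT INTO Grades VALUES (?, ?, ?)",
    [
        ("Susan", "Quiz", 92), ("Susan", "Quiz", 95), ("Susan", "Homework", 84),
        ("Kathy", "Quiz", 62), ("Kathy", "Quiz", 81), ("Kathy", "Homework", None),
        ("Alec", "Quiz", 58), ("Alec", "Quiz", 74), ("Alec", "Homework", 88),
    ],
)

rows = conn.execute(
    """
    SELECT
        GradeType,
        AVG(Grade),              -- NULL grades are skipped
        AVG(COALESCE(Grade, 0))  -- NULL grades are counted as 0
    FROM Grades
    GROUP BY GradeType
    ORDER BY GradeType
    """
).fetchall()

for grade_type, avg_ignoring_null, avg_null_as_zero in rows:
    print(grade_type, avg_ignoring_null, avg_null_as_zero)
```

For the Homework group, the first average divides by the two non-NULL grades, while the second divides by all three rows, so it comes out lower. The Quiz group has no NULL grades, so both averages agree.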
It’s important to note that when using a GROUP BY keyword, all columns in the
columnlist must either be listed as columns in the GROUP BY clause or else be used in an aggregation function. Nothing else would make any sense. For example, the following SELECT would produce an error:
SELECT
GradeType AS 'Grade Type',
AVG (Grade) AS 'Average Grade',
Student AS 'Student'
FROM Grades
GROUP BY GradeType
ORDER BY GradeType
The problem with this statement is that the Student column is not in the GROUP
BY clause, nor is it aggregated in any way. Since everything is being presented in groups, SQL doesn’t know what to do with the Student column.
Chapter 10 ■ Summarizing Data
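DATABASE DIFFERENCES: MySQL
Unlike Microsoft SQL Server and Oracle, MySQL does not necessarily reject the previous statement. When the ONLY_FULL_GROUP_BY SQL mode is disabled (it is enabled by default as of MySQL 5.7.5), the statement runs without an error,
but the Student value returned for each group is arbitrary, so the results are misleading.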
Multiple Columns and Sorting
The concept of groups can be extended so the groups are based on more than one
column. Let’s go back to the last SELECT and add the Student column to the
GROUP BY clause and also to the columnlist. It now looks like:
SELECT
GradeType AS 'Grade Type',
Student AS 'Student',
AVG (Grade) AS 'Average Grade'
FROM Grades
GROUP BY GradeType, Student
ORDER BY GradeType, Student
The resulting data is:
Grade Type Student Average Grade
You now see a breakdown not only of grade types, but also of students. The
average grades are computed on each group. Note that the Homework row for
Kathy shows a NULL value, since she only has one homework row, and that row
has a value of NULL for the grade.
The order in which columns are listed in the GROUP BY clause has no
significance. The results would be the same if the clause were:
GROUP BY Student, GradeType
However, as always, the order in which columns are listed in the ORDER BY clause is meaningful. If you switch the ORDER BY clause to:
ORDER BY Student, GradeType
then the results are:
Grade Type Student Average Grade
This still looks a bit strange, since it’s difficult to tell at a glance that the data is really sorted by Student and then by Grade Type. As a general rule of thumb, it often helps if columns are listed in the same order in which they are sorted.
A more understandable SELECT statement would be:
SELECT
Student AS 'Student',
GradeType AS 'Grade Type',
AVG (Grade) AS 'Average Grade'
FROM Grades
GROUP BY GradeType, Student
ORDER BY Student, GradeType
The data now looks like:
Student Grade Type Average Grade
This is more comprehensible, since the column order corresponds to the sort
order.
There’s sometimes a certain confusion as to the difference between the GROUP
BY and ORDER BY clauses. Just remember that the GROUP BY clause merely creates the
groups. You still need to use the ORDER BY clause to present your data in the correct
sequence.
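The division of labor between the two clauses can be illustrated with a short sketch, run against SQLite through Python's standard sqlite3 module (an assumption made so the example is self-contained). The Grades rows are hypothetical, apart from Kathy's NULL homework grade. The sketch checks that swapping the columns in GROUP BY leaves the result unchanged, whereas swapping them in ORDER BY changes the sequence of the rows.

```python
import sqlite3

# Hypothetical Grades rows; only Kathy's NULL homework grade is taken from the text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Grades (Student TEXT, GradeType TEXT, Grade INTEGER)")
conn.executemany(
    "INSERT INTO Grades VALUES (?, ?, ?)",
    [
        ("Susan", "Quiz", 92), ("Susan", "Quiz", 95), ("Susan", "Homework", 84),
        ("Kathy", "Quiz", 62), ("Kathy", "Quiz", 81), ("Kathy", "Homework", None),
        ("Alec", "Quiz", 58), ("Alec", "Quiz", 74), ("Alec", "Homework", 88),
    ],
)

def grouped(group_cols, order_cols):
    """Average grade per group for the given GROUP BY and ORDER BY column lists."""
    sql = (
        "SELECT Student, GradeType, AVG(Grade) FROM Grades "
        f"GROUP BY {group_cols} ORDER BY {order_cols}"
    )
    return conn.execute(sql).fetchall()

# Same ORDER BY, different GROUP BY column order: identical results.
by_type_student = grouped("GradeType, Student", "Student, GradeType")
by_student_type = grouped("Student, GradeType", "Student, GradeType")

# Same GROUP BY, different ORDER BY column order: same rows, different sequence.
sorted_by_type_first = grouped("GradeType, Student", "GradeType, Student")

print(by_type_student == by_student_type)       # same groups, same sequence
print(by_type_student == sorted_by_type_first)  # same groups, different sequence
```

Both calls that sort by Student and then GradeType return six rows starting with Alec's homework average, while the call that sorts by GradeType first lists all of the Homework rows before any Quiz rows.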
Selection Criteria on Aggregates
One more topic needs to be added to our discussion of summarizing data. Once
groups are created, selection criteria become a bit more complex. When
applying any kind of selection criteria to a SELECT with a GROUP BY, one has to ask
whether the selection criteria apply to the individual rows or to the entire
group.
In essence, the WHERE clause handles selection criteria for individual rows. SQL
provides a keyword named HAVING, which allows for selection criteria at the
group level.
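The contrast between row-level and group-level criteria can be sketched as follows. This is an illustrative example rather than the chapter's own: it runs against SQLite through Python's standard sqlite3 module, the Grades rows are hypothetical, and the threshold of 70 on the quiz average is an assumed criterion.

```python
import sqlite3

# Hypothetical Grades rows; only Kathy's NULL homework grade is taken from the text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Grades (Student TEXT, GradeType TEXT, Grade INTEGER)")
conn.executemany(
    "INSERT INTO Grades VALUES (?, ?, ?)",
    [
        ("Susan", "Quiz", 92), ("Susan", "Quiz", 95), ("Susan", "Homework", 84),
        ("Kathy", "Quiz", 62), ("Kathy", "Quiz", 81), ("Kathy", "Homework", None),
        ("Alec", "Quiz", 58), ("Alec", "Quiz", 74), ("Alec", "Homework", 88),
    ],
)

# Row level: WHERE keeps or discards individual rows before any grouping.
row_level = conn.execute(
    """
    SELECT Student, Grade
    FROM Grades
    WHERE GradeType = 'Quiz'
      AND Grade >= 70
    ORDER BY Student, Grade
    """
).fetchall()

# Group level: HAVING keeps or discards whole groups after aggregation.
group_level = conn.execute(
    """
    SELECT Student, AVG(Grade)
    FROM Grades
    WHERE GradeType = 'Quiz'
    GROUP BY Student
    HAVING AVG(Grade) >= 70
    ORDER BY Student
    """
).fetchall()

print(row_level)
print(group_level)
```

With these rows, Alec's quiz grade of 74 passes the row-level test, yet Alec is absent from the group-level result because his quiz average falls below 70.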
Returning to the Grades table, let’s say you want to only look at grades on quizzes
that are 70 or higher. The grades you’d like to look at are individual grades, so
you can use the WHERE clause, as normal. Such a SELECT might look like:
SELECT
Student AS 'Student',
GradeType AS 'Grade Type',
Grade AS 'Grade'
FROM Grades
WHERE GradeType = 'Quiz'
AND Grade >= 70
ORDER BY Student, Grade
The resulting data is:
Student Grade Type Grade