The result is:
FirstName LastName
Matching by Sound
Let’s turn from matching letters and characters to matching sounds. SQL
provides two functions that give you some interesting ways to compare the sounds
of words or phrases. The two functions are SOUNDEX and DIFFERENCE.
Let’s first look at an example that utilizes the SOUNDEX function:
SELECT
SOUNDEX ('Smith') AS 'Sound of Smith',
SOUNDEX ('Smythe') AS 'Sound of Smythe'
The result is:
Sound of Smith    Sound of Smythe
S530              S530
The SOUNDEX function always returns a four-character response, which is a sort
of code for the sound of the phrase. The first character is always the first letter of
the phrase. In this case, the first character is S because both Smith and Smythe
begin with an S.
The remaining three characters are calculated from an analysis of the sound of
the rest of the phrase. Internally, the function first removes all vowels and the
letter Y. So, the function takes the MITH from SMITH and converts it to MTH.
Likewise, it takes the MYTHE from SMYTHE and converts it to MTH. It then
assigns a number to represent the sound of the phrase. In this example, that
number turns out to be 530.
Since SOUNDEX returns a value of S530 for both Smith and Smythe, you can
conclude that they probably have very similar sounds.
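To see that SOUNDEX keys on consonant sounds rather than on spelling, it can help to try it on another pair of names. The following is a minimal sketch for Microsoft SQL Server; the names Robert and Rupert are illustrative only and are not taken from any table used in this chapter.

```sql
-- Illustrative names only; they are not rows in the Actors table.
SELECT
SOUNDEX ('Robert') AS 'Sound of Robert',
SOUNDEX ('Rupert') AS 'Sound of Rupert'
```

Under the standard Soundex coding, both names should come back as R163, because B and P share the same code and the R and T that follow are coded identically in both names.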
Microsoft SQL Server provides one additional function, called DIFFERENCE,
which works in conjunction with the SOUNDEX function.
DATABASE DIFFERENCES: MySQL and Oracle
The DIFFERENCE function isn’t available in MySQL or Oracle.
Here’s an example, using the same words:
SELECT
DIFFERENCE ('Smith', 'Smythe') AS 'The Difference'
The result is:
The Difference
4
The DIFFERENCE function always requires two arguments. Internally, the function first retrieves the SOUNDEX values for each of the arguments and then compares those values. If it returns a value of 4, as in the previous example, that means that all four characters in the SOUNDEX value are identical. A value of 0 means that none of the characters is identical. Therefore, a DIFFERENCE value of 4 indicates the highest possible match, and a value of 0 is the lowest possible match.
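Because DIFFERENCE is built on top of SOUNDEX, one way to check its result is to display the two SOUNDEX codes next to it. Here is a small sketch for Microsoft SQL Server, reusing the same two words:

```sql
-- Show the two SOUNDEX codes that DIFFERENCE compares, alongside its result.
SELECT
SOUNDEX ('Smith') AS 'Sound of Smith',
SOUNDEX ('Smythe') AS 'Sound of Smythe',
DIFFERENCE ('Smith', 'Smythe') AS 'The Difference'
```

Both codes are S530, so the DIFFERENCE column shows 4.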
With this in mind, here’s an example of how the DIFFERENCE function can be used to retrieve values that are very similar in sound to a specific phrase. Working from the Actors table, you’re going to attempt to find rows with a first name that sounds like John. The SELECT statement is:
SELECT
FirstName,
LastName
FROM Actors
WHERE DIFFERENCE (FirstName, 'John') = 4
The results are:
FirstName    LastName
John         Wayne
Jon          Voight
Chapter 9 ■ Inexact Matches
The DIFFERENCE function determined that both John and Jon have a difference
value of 4 when compared with the specified value of John.
If you want to analyze exactly why these two rows were selected, you can alter
your SELECT to show both the SOUNDEX and DIFFERENCE values for all rows
in the table:
SELECT
FirstName,
LastName,
DIFFERENCE (FirstName, 'John') AS 'Difference Value',
SOUNDEX (FirstName) AS 'Soundex Value'
FROM Actors
This returns:
FirstName LastName Difference Value Soundex Value
Notice that both Jon Voight and John Wayne have a SOUNDEX value of J500 and
a DIFFERENCE value of 4 for their first names. This explains why they were
initially selected. Also notice that Julie Andrews has a DIFFERENCE value of 3. If
you had specified a WHERE clause where the DIFFERENCE value equaled 3 or 4,
that actor would have been selected as well.
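The looser match just described can be written with a range comparison. The first statement below is a sketch for Microsoft SQL Server against the same Actors table. The second is an assumed alternative for MySQL or Oracle, where DIFFERENCE isn't available but SOUNDEX is; it covers only the strictest case, in which the two codes are identical.

```sql
-- Microsoft SQL Server: accept a DIFFERENCE value of 3 or 4.
SELECT
FirstName,
LastName
FROM Actors
WHERE DIFFERENCE (FirstName, 'John') >= 3;

-- MySQL or Oracle: no DIFFERENCE function, so compare SOUNDEX codes directly.
-- This matches only first names whose code is identical to that of John.
SELECT
FirstName,
LastName
FROM Actors
WHERE SOUNDEX(FirstName) = SOUNDEX('John');
```

The first statement would add Julie Andrews to the rows returned earlier. The second gives up the graded 0-to-4 scale, so there is no direct way to ask for a near match of 3.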
Looking Ahead
This concludes our study of matching phrases by pattern or sound. Matching by
patterns is an important and widely used function of SQL. Any time you enter a
word in a search box and attempt to retrieve all entities containing that word,
you are utilizing pattern matching. Efforts to match by sound are much less
common. The technology exists, but there is an inherent difficulty in translating
words to sounds. The English language, or any language for that matter, contains
too many quirks and exceptions for such a match to be reliable.
In our next chapter, “Summarizing Data,” we’re going to turn our attention to ways to separate data into groups and summarize the values in those groups with various statistics. Back in Chapter 4, we talked about scalar functions. The next
chapter will introduce another type of function, called aggregate functions. These
aggregate functions will allow you to summarize your data in many useful ways. For example, you’ll be able to look at any group of orders and determine the number of orders, the total dollar amount of the orders, and the average order size. With these techniques, you’ll be able to move beyond the presentation of detailed data and begin to truly add value for your users as you deliver summarized information.
Summarizing Data
Up until now, we’ve been presenting data basically as it exists in a database. Sure, we’ve used some functions to move things around and have created some additional calculations, but the rows we’ve retrieved have corresponded to rows in the underlying database. We now want to turn to various methods to summarize our data.
The computer term usually associated with this type of endeavor is aggregation,
which means “to combine into groups.” The ability to aggregate and summarize your data is key to being able to move beyond a mere display of data to something approaching real information. There’s a bit of magic involved when users view summarized data in a report. They understand and appreciate that you’ve been able to extract some real meaning from the mass of data in a database, in order to present a clearer picture of what it all means.
Eliminating Duplicates
Although it doesn’t provide a true aggregation, the most elementary way
to summarize data is to eliminate duplicates. SQL has a keyword named
DISTINCT, which provides an easy way to remove duplicate rows from your output.