The other solution to the multiple-column search problem consists of adding an additional
column to hold all the text to be searched and duplicating the data from the original columns to a
FullTextSearch column within an after trigger or using a persisted computed column. This solution
is not smooth either. It duplicates data and costs performance time during inserts and updates. The crux
of the decision regarding how to solve the multiple-column search is the conflict between fast reads and
fast writes: OLAP versus OLTP.
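As a minimal sketch of the computed-column variant, the following assumes that the Fable table has character columns named Title, Moral, and FableText (varchar or nvarchar rather than text), and that FullTextSearch is a free column name. None of these details is confirmed by the text above.

```sql
-- Hypothetical: the column names and data types are assumptions about the sample table.
-- ISNULL keeps a single NULL column from turning the whole concatenation into NULL.
ALTER TABLE dbo.Fable
  ADD FullTextSearch AS
      (ISNULL(Title, '') + ' ' + ISNULL(Moral, '') + ' ' + ISNULL(FableText, ''))
      PERSISTED;
```

The new column would still have to be included in the table's full-text index before a query could search it, and every insert or update pays for storing the extra copy.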
Searches with wildcards
Because the full-text search engine has its roots in Windows Index and was not a SQL Server–developed
component, its wildcards use the standard DOS conventions (asterisk for a multi-character wildcard, and
double quotes) instead of SQL-style wildcards and SQL single quotes.
The other thing to keep in mind about full-text wildcards is that they work only at the end of a word,
not at the beginning. Indexes search from the beginning of strings, as shown here:
SELECT Title
  FROM Fable
  WHERE CONTAINS (*, '"Hunt*"');
Result:
Title
--------------------------------
The Hunter and the Woodman
The Ass in the Lion’s Skin
The Bald Knight
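To make the prefix-only behavior concrete, here is a small sketch against the same Fable table. The second query is a contrast, not something the text above prescribes: because full-text wildcards do not match in the middle or at the end of a word, a LIKE pattern is the usual fallback. Searching the Title column is an arbitrary choice for illustration.

```sql
-- Prefix term: matches Hunt, Hunter, Hunting, and so on at the start of any word.
SELECT Title
  FROM Fable
  WHERE CONTAINS (*, '"Hunt*"');

-- Mid-word or suffix matching is outside what full-text wildcards do.
-- LIKE handles it, but a leading % forces a scan of the column.
SELECT Title
  FROM Fable
  WHERE Title LIKE '%unter%';
```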
Phrase searches
Full-text search can attempt to locate full phrases if those phrases are surrounded by double quotes.
For example, to search for the fable about the boy who cried wolf, searching for “Wolf! Wolf!” does
the trick:
SELECT Title
  FROM Fable
  WHERE CONTAINS (*, '"Wolf! Wolf!"');
Result:
Title
--------------------------------
The Shepherd’s Boy and the Wolf
Word-proximity searches
When searching large documents, it’s nice to be able to specify the proximity of the search words.
Full-text search implements a proximity switch by means of the NEAR option. The relative distance between
the words is calculated, and, if the words are close enough (within about 30 words, depending on the
size of the text), then full-text search returns a true for the row.
The story of Androcles, the slave who pulls the thorn from the lion’s paw, is one of the longer fables
in the sample database, so it’s a good test sample.
The following query attempts to locate the fable “Androcles” based on the proximity of the words
“pardoned” and “forest” in the fable’s text:
SELECT Title
  FROM Fable
  WHERE CONTAINS (*, 'pardoned NEAR forest');
Result:
Title
--------------------------------
Androcles
The proximity switch can handle multiple words. The following query tests the proximity of the words
“lion,” “paw,” and “bleeding”:
SELECT Title
  FROM Fable
  WHERE CONTAINS (*, 'lion NEAR paw NEAR bleeding');
Result:
Title
--------------------------------
Androcles
The proximity feature can be used with CONTAINSTABLE; the RANK indicates relative proximity.
The following query ranks the fables that mention the word “life” near the word “death” in order of
proximity:
SELECT Fable.Title, FTS.Rank
  FROM Fable
    INNER JOIN CONTAINSTABLE (Fable, *, 'life NEAR death') AS FTS
      ON Fable.FableID = FTS.[KEY]
  ORDER BY FTS.Rank DESC;
Result:
Title                            Rank
-------------------------------- ----
The Serpent and the Eagle        7
The Eagle and the Arrow          1
The Woodman and the Serpent      1
Word-inflection searches
The full-text search engine can actually perform linguistic analysis and base a search for different words
on a common root word. This enables you to search for words without worrying about number or
tense. For example, the inflection feature makes possible a search for the word “flying” that finds a
row containing the word “flew.” The language you specify for the table is critical in a case like this.
Something else to keep in mind is that the word base will not cross parts of speech, meaning that
a search for a noun won’t locate a verb form of the same root. The following query demonstrates
inflection by locating the fable with the word “flew” in “The Crow and the Pitcher”:
SELECT Title
  FROM Fable
  WHERE CONTAINS (*, 'FORMSOF(INFLECTIONAL, fly)');
Result:
Title
--------------------------------
The Crow and the Pitcher
The Bald Knight
Thesaurus searches
The full-text search engine has the capability to perform thesaurus lookups for word replacements as
well as synonyms. To configure your own thesaurus options, edit the thesaurus file. The location of the
thesaurus file depends on your language and server.
The thesaurus file for your language will follow the naming convention TSXXX.xml, where XXX is your
language code (e.g., ENU for U.S. English, ENG for U.K. English, and so on). You need to remove the
comment lines from your thesaurus file. If you open this file in a text editor, you will see two sections,
or nodes: an expansion node and a replacement node. The expansion node is used to
expand your search argument from one term to another argument. For example, in the thesaurus file,
you will find the following expansion:
<expansion>
<sub>Internet Explorer</sub>
<sub>IE</sub>
<sub>IE5</sub>
</expansion>
This will convert any searches on “IE” to search on “IE” or “IE5” or “Internet Explorer.”
The replacement node is used to replace a search argument with another argument. For example, if you
want the search argument sex interpreted as gender, you could use the replacement node to do that:
<replacement>
<pat>sex</pat>
<sub>gender</sub>
</replacement>
The pat element (sex) indicates the pattern you want substituted by the sub element (gender).
A FREETEXT query will automatically use the thesaurus file for the language type. Here is an example
of a generational query using the Thesaurus option:
SELECT * FROM TableName WHERE CONTAINS (*, 'FORMSOF(Thesaurus, "IE")');
This returns matches to rows containing IE, IE5, and Internet Explorer.
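Edits to the thesaurus file generally have to be loaded before queries pick them up. The following is a minimal sketch, assuming SQL Server 2008 and U.S. English (LCID 1033). TableName is the same placeholder used above, not a real table.

```sql
-- Reload the thesaurus file for LCID 1033 (U.S. English); the LCID is an assumption.
EXEC sys.sp_fulltext_load_thesaurus_file 1033;

-- Explicit thesaurus lookup through CONTAINS, using the placeholder table name.
SELECT *
  FROM TableName
  WHERE CONTAINS (*, 'FORMSOF(THESAURUS, "IE")');
```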
Variable-word-weight searches
In a search for multiple words, relative weight may be assigned, making one word critical to the search
and another word much less important. The weights are set on a scale of 0.0 to 1.0.
The ISABOUT option enables weighting, and any hit on the given word allows the rows to be returned,
so it functions as an implied Boolean OR operator.
The following two queries use the weight option with CONTAINSTABLE to highlight the differences
among the words “lion,” “brave,” and “eagle” as the weighting changes. The query will examine only the
FableText column to prevent the results from being skewed by the shorter lengths of the text found
in the title and moral columns. The first query weights the three words evenly:
SELECT Fable.Title, FTS.Rank
  FROM Fable
    INNER JOIN CONTAINSTABLE
      (Fable, FableText,
       'ISABOUT (Lion weight (.5),
                 Brave weight (.5), Eagle weight (.5))') AS FTS
      ON Fable.FableID = FTS.[KEY]
  ORDER BY Rank DESC;
Result:
Title                            Rank
-------------------------------- ----
The Eagle and the Fox            85
The Hunter and the Woodman       50
The Serpent and the Eagle        50
The Eagle and the Arrow          21
The Ass in the Lion’s Skin       16
When the relative importance of the word “eagle” is elevated, it’s a different story:
SELECT Fable.Title, FTS.Rank
  FROM Fable
    INNER JOIN CONTAINSTABLE
      (Fable, FableText,
       'ISABOUT (Lion weight (.2), Brave weight (.2),
                 Eagle weight (.8))') AS FTS
      ON Fable.FableID = FTS.[KEY]
  ORDER BY Rank DESC;
Result:
Title                            Rank
-------------------------------- ----
The Eagle and the Fox            102
The Serpent and the Eagle        59
The Eagle and the Arrow          25
The Hunter and the Woodman       14
The Ass in the Lion’s Skin       4
When all the columns participate in the full-text search, the small size of the moral and the title make
the target words seem relatively more important within the text. The next query uses the same weighting
as the previous query but includes all columns (*):
SELECT Fable.Title, FTS.Rank
  FROM Fable
    INNER JOIN CONTAINSTABLE
      (Fable, *,
       'ISABOUT (Lion weight (.2), Brave weight (.2),
                 Eagle weight (.8))') AS FTS
      ON Fable.FableID = FTS.[KEY]
  ORDER BY Rank DESC;
Result:
Title                            Rank
-------------------------------- ----
The Hunter and the Woodman       408
The Eagle and the Fox            102
The Eagle and the Arrow          80
The Serpent and the Eagle        80
The Ass in the Lion’s Skin       23
The ranking is relative, and is based on word frequency, word proximity, and the relative importance of
a given word within the text. “The Wolf and the Kid” does not contain an eagle or a lion, but two
factors favor bravado. First, “brave” is a rarer word than “lion” or “eagle” in both the column and the table.
Second, the word “brave” appears in the moral as one of only 10 words. So even though “brave” was
weighted less, it rises to the top of the list. It’s all based on word frequencies and statistics (and
sometimes, I think, the phase of the moon!).
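Because rank values are meaningful only relative to one another within a single query, one common pattern is to ask for the top few matches rather than filter on an absolute rank. The sketch below uses the optional top_n_by_rank argument of CONTAINSTABLE; the limit of 3 is an arbitrary value chosen for illustration.

```sql
SELECT Fable.Title, FTS.Rank
  FROM Fable
    INNER JOIN CONTAINSTABLE
      (Fable, FableText,
       'ISABOUT (Lion WEIGHT (.2), Brave WEIGHT (.2), Eagle WEIGHT (.8))',
       3) AS FTS   -- illustrative limit: only the 3 highest-ranked rows come back
      ON Fable.FableID = FTS.[KEY]
  ORDER BY FTS.Rank DESC;
```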
Fuzzy Searches
While the CONTAINS predicate and CONTAINSTABLE-derived table perform exact word searches, the
FREETEXT predicate expands on the CONTAINS functionality to include fuzzy, or approximate, full-text
searches from free-form text.
Instead of searching for two or three words and adding the options for inflection and weighting, the
fuzzy search handles the complexity of building searches that make use of all the full-text search engine
options, and tries to solve the problem for you. Internally, the free-form text is broken down into
multiple words and phrases, and the full-text search with inflections and weighting is then performed on
the result.
Freetext
FREETEXT works within a WHERE clause just like CONTAINS, but without all the options. The
following query uses a fuzzy search to find the fable about the big race:
SELECT Title
  FROM Fable
  WHERE FREETEXT
    (*, 'The tortoise beat the hare in the big race');
Result:
Title
--------------------------------
The Hare and the Tortoise
FreetextTable
Fuzzy searches benefit from the FREETEXT-derived table that returns the ranking in the same way that
CONTAINSTABLE does. The two queries shown in this section demonstrate a fuzzy full-text search using
the FREETEXT-derived table. Here is the first query:
SELECT Fable.Title, FTS.Rank
  FROM Fable
    INNER JOIN FREETEXTTABLE
      (Fable, *, 'The brave hunter kills the lion') AS FTS
      ON Fable.FableID = FTS.[KEY]
  ORDER BY Rank DESC;
Result:
Title                            Rank
-------------------------------- ----
The Hunter and the Woodman       257
The Ass in the Lion’s Skin       202
The Dogs and the Fox             100
The Goose With the Golden Eggs   72
The Shepherd’s Boy and the Wolf  72
Here is the second query:
SELECT Fable.Title, FTS.Rank
  FROM Fable
    INNER JOIN FREETEXTTABLE
      (Fable, *, 'The eagle was shot by an arrow') AS FTS
      ON Fable.FableID = FTS.[KEY]
  ORDER BY Rank DESC;
Result:
Title                            Rank
-------------------------------- ----
The Eagle and the Arrow          288
The Eagle and the Fox            135
The Serpent and the Eagle        112
The Hunter and the Woodman       102
The Father and His Two Daughters 72
Performance
SQL Server 2008’s full-text search engine performance is several orders of magnitude faster than
previous versions of SQL Server. However, you still might want to tune your system for optimal
performance.
■ iFTS benefits from a very fast disk subsystem. Place your catalog on its own controller, preferably
its own RAID 10 array. A sweet spot exists for SQL iFTS on eight-way servers.
■ After a full or incremental population, force a master merge, which will consolidate all the
shadow indexes into a single master index, by issuing the following command:
ALTER FULLTEXT CATALOG catalog_name REORGANIZE;
■ You can also increase the maximum number of ranges that the gathering process can use. To
do so, issue the following command, and then run RECONFIGURE so that the new value takes effect:
EXEC sp_configure 'max full-text crawl range', 32;
RECONFIGURE;
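The crawl-range setting is an advanced option, so on a default configuration it is not visible to sp_configure until advanced options are exposed. The sketch below shows the full sequence and then one way to check on a master merge afterward. FableCatalog is a hypothetical catalog name, and the value 32 simply repeats the example above.

```sql
-- Expose advanced options, then change the crawl-range setting.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max full-text crawl range', 32;
RECONFIGURE;

-- Check whether a master merge is still running (1 = in progress, 0 = not).
-- 'FableCatalog' is a hypothetical catalog name.
SELECT FULLTEXTCATALOGPROPERTY('FableCatalog', 'MergeStatus') AS MergeStatus,
       FULLTEXTCATALOGPROPERTY('FableCatalog', 'ItemCount')   AS ItemCount;
```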
Summary
SQL Server indexes are not designed for searching for words in the middle of a column. If the database
project requires flexible word searches, then Integrated Full-Text Search (iFTS) is the perfect solution,
even though it requires additional development and administrative work.
■ iFTS requires a full-text index, assigned to a catalog, on each table to be searched.
■ iFTS catalogs are not populated synchronously within the SQL Server transaction. They are
populated asynchronously following the transaction. The recommended method is using
Change Tracking, which can automatically push changes as they occur.
■ CONTAINS is used within the WHERE clause and performs simple word searches, but it can
also perform inflectional, proximity, and thesaurus searches.
■ CONTAINSTABLE functions like CONTAINS but it returns a data set that can be referenced in
a FROM clause.
■ FREETEXT and FREETEXTTABLE essentially turn on every advanced feature of iFTS and
perform a fuzzy word search.
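As a minimal sketch of the setup summarized in the first two points, the following creates a catalog and a full-text index with automatic change tracking. The catalog name FableCatalog, the key index name PK_Fable, and the column list are all assumptions about the sample database.

```sql
-- Hypothetical names: FableCatalog, PK_Fable, and the column list are assumptions.
CREATE FULLTEXT CATALOG FableCatalog;

CREATE FULLTEXT INDEX ON dbo.Fable (Title, Moral, FableText)
  KEY INDEX PK_Fable          -- must be a unique, single-column, non-nullable index
  ON FableCatalog
  WITH CHANGE_TRACKING AUTO;  -- changes are propagated asynchronously after commit
```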
As you read through this “Beyond Relational” part of the book, I hope you’re getting a sense of the
breadth of data SQL Server can manage. The next chapter concludes this part with Filestream, a new
way to store large BLOBs with SQL Server.
Developing with SQL Server
IN THIS PART
Chapter 20
Creating the Physical Database Schema
Chapter 21
Programming with T-SQL
Chapter 22
Kill the Cursor!
Chapter 23
T-SQL Error Handling
Chapter 24
Developing Stored Procedures
Chapter 25
Building User-Defined Functions
Chapter 26
Creating DML Triggers
Chapter 27
Creating DDL Triggers
Chapter 28
Building the Data Abstraction Layer
Chapter 29
Part II of this book was all about writing set-based queries. Part III
extended the select command to data types beyond relational.
This part continues to expand on select to provide programmable
flow of control to develop server-side solutions; and SQL Server has a large
variety of technologies to choose from to develop server-side code, from
the mature T-SQL language to .NET assemblies hosted within SQL Server.
This part opens with DDL commands (create, alter, and drop), and
progresses through 10 chapters of Transact-SQL that build on one another
into a crescendo with the data abstraction layer and dynamic SQL. The final
chapter fits CLR programming into the picture.
So, unleash the programmer within and have fun. There’s a whole world of
developer possibilities with SQL Server 2005.
If SQL Server is the box, then Part IV is all about thinking inside the box,
and moving the processing as close to the data as possible.