Searching for LIKE 'word%' is fast, but LIKE '%word%' is terribly slow. Searching for strings
within a string can't use the b-tree structure of an index to perform a fast index seek, so it must perform
a table scan instead, as demonstrated in Figure 19-1. It's like looking for all the instances of "Paul" in
the telephone book. The phone book isn't indexed by first name, so each page must be scanned.
FIGURE 19-1
Filtering by a WHERE clause value that begins with a wildcard is not "sargable" (that is, not a
searchable argument), so it forces the Query Optimizer to use a full scan.
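The difference can be sketched with two queries against the chapter's Aesop sample database. This is a minimal illustration, assuming that Title is a short character column that can be indexed; the index name IX_Fable_Title is hypothetical and not part of the sample scripts.

```sql
USE Aesop;

-- Assumption: a nonclustered index on Title (hypothetical name).
CREATE INDEX IX_Fable_Title ON dbo.Fable (Title);

-- Sargable: the fixed prefix lets the optimizer seek into the index.
SELECT Title
FROM dbo.Fable
WHERE Title LIKE 'The Hare%';

-- Not sargable: the leading wildcard means every row must be examined.
SELECT Title
FROM dbo.Fable
WHERE Title LIKE '%Hare%';
```

Comparing the two execution plans in Management Studio should show a seek for the first query and a scan for the second, although on a table this small the optimizer may choose a scan in both cases.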
Basically, Integrated Full-Text Search (iFTS) extends SQL Server beyond the traditional relational data
searches by building an index of every significant word and phrase. In addition, the full-text search
engine adds advanced features such as the following:
■ Searching for one word near another word
■ Searching with wildcards
■ Searching for inflectional variations of a word (such as run, ran, running)
■ Weighting one word or phrase as more important to the search than another word or phrase
■ Performing fuzzy word/phrase searches
■ Searching character data with embedded binary objects stored with SQL Server
■ Using Full-Text Search in the WHERE clause or as a data source like a subquery
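As a rough preview of how several of these features look in a query, the following sketch uses the CONTAINS and CONTAINSTABLE syntax against the chapter's Aesop sample database. It assumes that dbo.Fable already has a full-text index covering FableText; the search words are arbitrary illustrations.

```sql
USE Aesop;

-- One word near another word
SELECT Title FROM dbo.Fable
WHERE CONTAINS(FableText, 'Lion NEAR Mouse');

-- Prefix wildcard (the term must be enclosed in double quotes)
SELECT Title FROM dbo.Fable
WHERE CONTAINS(FableText, '"Hunt*"');

-- Inflectional variations of a word (run, ran, running)
SELECT Title FROM dbo.Fable
WHERE CONTAINS(FableText, 'FORMSOF(INFLECTIONAL, run)');

-- Weighting one word as more important than another, used as a data source
SELECT F.Title, FTS.Rank
FROM dbo.Fable AS F
INNER JOIN CONTAINSTABLE(dbo.Fable, FableText,
    'ISABOUT (Lion WEIGHT (0.9), Fox WEIGHT (0.2))') AS FTS
  ON F.FableID = FTS.[KEY]
ORDER BY FTS.Rank DESC;
```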
Full-Text Search must be installed with the instance of SQL Server. If it's not installed on your
instance, it may be added later using the SQL Server Installation Center (see Chapter 4, "Installing SQL
Server 2008").
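One way to check whether the component is present on an instance is the built-in FULLTEXTSERVICEPROPERTY function. The query below is a small sketch rather than a required setup step.

```sql
-- Returns 1 if full-text search is installed on this instance, 0 if it is not.
SELECT FULLTEXTSERVICEPROPERTY('IsFullTextInstalled') AS IsFullTextInstalled;
```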
What’s New with Full-Text Search?
The history of Full-Text Search began in late 1998 when Microsoft reengineered one of its search engines
(Site Server Search, designed for websites) to provide search services for SQL Server 7. The engine was
called MSSearch, and it also provided search services to Exchange Content Indexing and SharePoint Portal
Server 2001. I liked Full-Text Search when it was first introduced back in SQL Server 7, and I'm glad that it's
still here and Microsoft is continuing to invest in it.
Microsoft continued to improve iFTS's performance and scalability with SQL Server 2000 and SQL Server
2005. Also, in case you didn't follow the evolution of Full-Text Search back in SQL Server 2005, Microsoft
worked on bringing Full-Text Search closer to industry standards:
■ The list of noise words was renamed to the industry standard term of stoplist.
■ The many set-up stored procedures were simplified into normal DDL CREATE, ALTER,
and DROP commands.
With SQL Server 2008, the old stored procedure methods of setting up Full-Text Search are deprecated,
meaning they will be removed in a future version.
SQL 2008 Integrated Full-Text Search (iFTS) is the fourth-generation search component for SQL Server,
and this new version is by far the most scalable and feature-rich. SQL 2008 iFTS ships in the Workgroup,
Standard, and Enterprise editions of SQL Server.
With SQL Server 2008, SQL Server is no longer dependent on the indexing service of Windows. Instead, it
is now fully integrated within SQL Server, which means that the SQL Server development team can advance
Full-Text Search features without depending on a release cycle.
The integration of FTS in the SQL engine should also result in better performance because the Query
Optimizer can make an informed decision whether to invoke the full-text engine before or after applying
non-FTS filters.
Minor enhancements include the following:
■ A number of new DMVs expose the workings of iFTS
■ Forty new languages
■ Noise words management with T-SQL using CREATE FULLTEXT STOPLIST
■ Thesaurus stored in system table and instance-scoped
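Two of these enhancements can be sketched in a few statements. The stoplist name AesopStoplist and the added word are hypothetical; the example also assumes the Aesop database is at compatibility level 100 and that dbo.Fable already has a full-text index for the DMV to read.

```sql
USE Aesop;

-- Create a stoplist that starts as a copy of the system stoplist.
CREATE FULLTEXT STOPLIST AesopStoplist FROM SYSTEM STOPLIST;

-- Add a word to ignore for English (LCID 1033).
ALTER FULLTEXT STOPLIST AesopStoplist ADD 'fable' LANGUAGE 1033;

-- List the stoplists defined in the current database.
SELECT stoplist_id, name
FROM sys.fulltext_stoplists;

-- One of the new DMVs: the keywords stored in a table's full-text index.
SELECT display_term, column_id, document_count
FROM sys.dm_fts_index_keywords(DB_ID('Aesop'), OBJECT_ID('dbo.Fable'));
```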
All the code samples in this chapter use the Aesop's Fables sample database. The Aesop_Create.sql script will create the database and populate it with 25 of Aesop's fables. The database create script as well as the chapter code can be downloaded from
www.sqlserverbible.com.
Integrated Full-Text Search is not installed by default by SQL Server 2008 Setup. The option is under the Database Engine Services node. To add Integrated Full-Text Search
to an existing instance, use the Programs and Features application in the Control Panel to launch SQL
Server Setup in a mode that allows changing the SQL Server components.
Microsoft is less than consistent with the naming of Integrated Full-Text Search. Management Studio and Books Online sometimes call it only Full-Text Search or Full-Text Indexing.
In this chapter I use Integrated Full-Text Search or iFTS. If I use a different term, it's only because the
sentence is referring to a specific UI command in Management Studio.
Configuring Full-Text Search Catalogs
A full-text search catalog is a collection of full-text indexes for a single SQL Server database. Each
catalog may store multiple full-text indexes for multiple tables, but each table is limited to only one
catalog. Typically, a single catalog will handle all the full-text searches for a database, although dedicating a
single catalog to a very large table (one with over one million rows) will improve performance.
Catalogs may index only user tables (not views, temporary tables, table variables, or system tables).
Creating a catalog with the wizard
Although creating and configuring a full-text search catalog with code is easy, the task is usually done
once and then forgotten. Unless the repeatability of a script is important for redeploying the project, the
Full-Text Indexing Wizard is sufficient for configuring full-text search.
The wizard may be launched from within Management Studio's Object Explorer. With a table selected,
use the context menu and select Full-Text Index ➪ Define Full-Text Index.
The Full-Text Indexing Wizard starts from a selected database and table and works through multiple
steps to configure the full-text catalog, as follows:
1. Select a unique index that full-text can use to identify the rows indexed with full-text. The
primary key is typically the best choice for this index; however, any non-nullable, unique,
single-column index is sufficient. If the table uses composite primary keys, another unique
index must be created to use full-text search.
2. Choose the columns to be full-text indexed, as shown in Figure 19-2. Valid column data types
are character data types (char, nchar, varchar, nvarchar, text, ntext, and xml) and
binary data types (binary, varbinary, varbinary(max), and the deprecated image).
(Indexing binary images is an advanced topic covered later in this chapter.) You may need
to specify the language used for parsing the words, although the computer default will likely
handle this automatically.
Full-text search can also read documents stored in binary, varbinary, varbinary(max),
and image columns. Using full-text search with embedded BLOBs (binary large objects) is
covered later in this chapter.
FIGURE 19-2
Any valid columns are listed by the Full-Text Indexing Wizard and may be selected for indexing.
3. Enable change tracking if desired. This will automatically update the catalog when the data
changes. The Automatic option means that Change Tracking is enabled and automatically
updated. The Manual option means that updates are manual but change tracking is still
enabled. Change Tracking can also be completely disabled.
4. Select a catalog or opt to create a new catalog. The stoplist may also be selected.
5. Skip creating a population schedule; there's a better way to keep the catalog up-to-date. (The
strategies for maintaining a full-text index are discussed later in the chapter.)
6. Click Finish.
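The key-index requirement in step 1 can be sketched in T-SQL for a table with a composite primary key. The table dbo.FableComment, its columns, and the index name are all hypothetical and are not part of the Aesop sample database; the last statement also assumes that a full-text catalog named AesopFT already exists.

```sql
USE Aesop;

-- Hypothetical table whose primary key spans two columns,
-- so the primary key cannot serve as the full-text key index.
CREATE TABLE dbo.FableComment (
    FableID     INT           NOT NULL,
    CommentNo   INT           NOT NULL,
    CommentText NVARCHAR(MAX) NOT NULL,
    CommentKey  INT IDENTITY(1,1) NOT NULL,  -- surrogate single-column key
    CONSTRAINT PK_FableComment PRIMARY KEY (FableID, CommentNo)
);

-- A unique, non-nullable, single-column index satisfies the requirement.
CREATE UNIQUE INDEX UX_FableComment_CommentKey
    ON dbo.FableComment (CommentKey);

-- Assumes the catalog AesopFT has already been created.
CREATE FULLTEXT INDEX ON dbo.FableComment (CommentText)
    KEY INDEX UX_FableComment_CommentKey ON AesopFT;
```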
When the wizard is finished, if Change Tracking was selected in step 3, then the Start full population
check box was also automatically selected, so Full-Text Search should begin a population immediately
and iFTS will be set to go as soon as all the data is indexed. Depending on the amount of data in the
indexed columns, the population may take a few seconds, a few minutes, or a few hours to complete.
If Change Tracking was disabled, then the iFTS indexes are empty and need to be populated. To
initially populate the catalog, right-click on the table and select Full-Text Index Table ➪ Enable Full-Text
Index, and then Full-Text Index Table ➪ Start Full Population from the context menu. This directs SQL
Server to begin passing data to Full-Text Search for indexing.
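Because a population can run for a while, it can help to check on its progress. The following sketch uses the built-in OBJECTPROPERTYEX and FULLTEXTCATALOGPROPERTY functions; it assumes the table is dbo.Fable and the catalog is named AesopFT, the names used in the T-SQL setup in this chapter.

```sql
USE Aesop;

SELECT
    -- 0 = idle; a nonzero value means a population is running or paused
    OBJECTPROPERTYEX(OBJECT_ID('dbo.Fable'), 'TableFulltextPopulateStatus')
        AS TablePopulateStatus,
    -- number of rows indexed so far
    OBJECTPROPERTYEX(OBJECT_ID('dbo.Fable'), 'TableFulltextItemCount')
        AS RowsIndexed,
    -- 0 = idle for the catalog as a whole
    FULLTEXTCATALOGPROPERTY('AesopFT', 'PopulateStatus')
        AS CatalogPopulateStatus;
```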
Creating a catalog with T-SQL code
To implement full-text search using a method that can be easily replicated on other servers, your best
option is to create a SQL script. Creating a catalog with code means following the same steps as the
Full-Text Indexing Wizard. Creating full-text catalogs and indexes uses normal DDL CREATE statements.
The following code configures a full-text search catalog for the Aesop's Fables sample database:
USE Aesop;
-- Create the catalog, then index three columns keyed on the unique index FablePK
CREATE FULLTEXT CATALOG AesopFT;
CREATE FULLTEXT INDEX ON dbo.Fable(Title, Moral, FableText)
KEY INDEX FablePK ON AesopFT
WITH CHANGE_TRACKING AUTO;
Use the ALTER FULLTEXT INDEX command to switch the full-text index to manual population.
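A minimal sketch of that change, assuming the dbo.Fable full-text index from the chapter's sample database:

```sql
USE Aesop;

-- Keep tracking changes, but stop pushing them to the index automatically.
ALTER FULLTEXT INDEX ON dbo.Fable SET CHANGE_TRACKING MANUAL;

-- Later, push the tracked changes to the index on demand.
ALTER FULLTEXT INDEX ON dbo.Fable START UPDATE POPULATION;
```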
Pushing data to the full-text index
Full-text indexes are different from data engine clustered and non-clustered indexes that are updated
as part of the ACID transaction (see Chapter 66, "Managing Transactions, Locking, and Blocking," for
details on ACID transactions). Full-text indexes are updated only when the Database Engine passes
new data to the full-text engine. That's both a benefit and a drawback. On the one hand, it means that
updating the full-text index doesn't slow down large-text updates. On the other hand, the full-text index
is not real-time in the way SQL Server data is. If a user enters a résumé and then searches for it using
full-text search before the full-text index has been updated, then the résumé won't be found.
Every full-text index begins empty, and if data already exists in the SQL Server tables, then it must be
pushed to the full-text index by means of a full population. A full population re-initializes the index and
passes data for all rows to the full-text index. A full population may be performed with Management
Studio or T-SQL code. Because the data push is driven by SQL Server, data is sent from one table at
a time regardless of how many tables might be full-text indexed in a catalog. If the full-text index is
created for an empty SQL Server table, then a full population is not required.
Two primary methods of pushing ongoing changes to a full-text index exist:
■ Incremental populations: An incremental population uses a timestamp to pass any rows
that have changed since the last population. This method can be performed manually from
Management Studio or by means of T-SQL code or scheduled as a SQL Server Agent job
(typically, each evening). Incremental population requires a rowversion (timestamp) column in
the table.
Incremental populations present two problems. First, a built-in delay occurs between the time
the data is entered and the time the user can find the data using full-text search. Second,
incremental populations consolidate all the changes into a single process that consumes a
significant amount of CPU time during the incremental change. In a heavily used database,
the choice is between performing incremental populations each evening and forcing a one-day
delay each time or performing incremental populations at scheduled times throughout the
day and suffering performance hits at those times.
■ Change tracking and background population (default): SQL Server can watch for data
changes in columns that are full-text indexed and then send what is effectively a single-row
incremental population every time a row changes. While this method seems costly in terms
of performance, in practice, the effect is not noticeable. The full-text update isn't fired by a
trigger, so the update transaction doesn't need to wait for the data to be pushed to the full-text
index. Instead, the full-text update occurs in the background slightly behind the SQL DML
transaction. The effect is a balanced CPU load and a full-text index that appears to be near
real-time.
Change tracking can also be configured to require manual pushes of only the changed data.
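The two methods can be sketched as follows for the dbo.Fable table. The column name RowVer is hypothetical; the example assumes the table may not yet have a rowversion column and adds one only if it is missing.

```sql
USE Aesop;

-- Incremental population needs a rowversion (timestamp) column.
IF OBJECTPROPERTY(OBJECT_ID('dbo.Fable'), 'TableHasTimestamp') = 0
    ALTER TABLE dbo.Fable ADD RowVer rowversion;  -- hypothetical column name

-- Method 1: push only the rows changed since the last population.
ALTER FULLTEXT INDEX ON dbo.Fable START INCREMENTAL POPULATION;

-- Method 2: let SQL Server track changes and update the index in the background.
ALTER FULLTEXT INDEX ON dbo.Fable SET CHANGE_TRACKING AUTO;
```

The two methods are alternatives. In practice a table would normally use one or the other, not both.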
Best Practice
If the database project incorporates searching for words within columns, use full-text search with change
tracking and background population. It's the best overall way to balance search performance with update
performance.
Maintaining a catalog with Management Studio
Within Management Studio, the iFTS catalogs are maintained with the right-click menu for each table.
The menu offers the following maintenance options under Full-Text Index Table:
■ Define Full-Text Indexing on Table: Launches the Full-Text Indexing Wizard to create a
new catalog as described earlier in the chapter.
■ Enable/Disable Full-Text Index: Turns iFTS on or off for the catalog.
■ Delete Full-Text Index: Drops the selected table from its catalog.
■ Start Full Population: Initiates a data push of all rows from the selected SQL Server table to
its full-text index catalog.
■ Start Incremental Population: Initiates a data push of rows that have changed since the last population in the selected table from SQL Server to the full-text index.
■ Stop Population: Halts any currently running full-text population push.
■ Track Changes Manually: Enables Change Tracking but does not push any data to the index.
■ Track Changes Automatically: Performs a full or incremental population and then turns on change tracking so that SQL Server can update the index.
■ Disable Change Tracking: Temporarily turns off change tracking.
■ Apply Tracked Changes: Pushes updates of rows that have been flagged by change tracking to the full-text index as the changes occur.
■ Update Index: Pushes an update of all rows that change tracking has flagged to the full-text index.
■ Properties: Launches the Full-Text Search Property page, which can be used to modify the catalog for the selected table.
Maintaining a catalog in T-SQL code
Each of the previous Management Studio iFTS maintenance commands can be executed from T-SQL
code. The following examples demonstrate iFTS catalog-maintenance commands applied to the Aesop's
Fables sample database:
■ Full population:
ALTER FULLTEXT INDEX ON dbo.Fable START FULL POPULATION;
■ Incremental population:
ALTER FULLTEXT INDEX ON dbo.Fable START INCREMENTAL POPULATION;
■ Remove a full-text catalog:
DROP FULLTEXT INDEX ON dbo.Fable;
DROP FULLTEXT CATALOG AesopFT;
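Several of the other menu options have direct T-SQL counterparts as well. The statements below are a sketch of those equivalents, again assuming the dbo.Fable index and the AesopFT catalog. They are independent commands, not a script meant to be run top to bottom.

```sql
USE Aesop;

-- Stop Population: halt a population that is currently running.
ALTER FULLTEXT INDEX ON dbo.Fable STOP POPULATION;

-- Enable/Disable Full-Text Index: turn the index off and back on.
ALTER FULLTEXT INDEX ON dbo.Fable DISABLE;
ALTER FULLTEXT INDEX ON dbo.Fable ENABLE;

-- Disable Change Tracking.
ALTER FULLTEXT INDEX ON dbo.Fable SET CHANGE_TRACKING OFF;

-- Catalog-level housekeeping: merge the index fragments in the catalog.
ALTER FULLTEXT CATALOG AesopFT REORGANIZE;
```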
Word Searches
Once the catalog is created, iFTS is ready for word and phrase queries. Word searches are performed
with the CONTAINS keyword. The effect of CONTAINS is to pass the word search to the iFTS
component with SQL Server and await the reply. Word searches can be used within a query in one of two
ways, CONTAINS or CONTAINSTABLE.
The Contains function
CONTAINS operates within the WHERE clause, much like a WHERE IN (subquery). The parameters
within the parentheses are passed to the iFTS engine, which returns an "include" or "omit" status for
each row.
The first parameter passed to the iFTS engine is the column name to be searched, or an asterisk for a
search of all columns from one table. If the FROM clause includes multiple tables, then the table must
be specified in the CONTAINS parameter. The following basic iFTS query searches all indexed columns for the
word "Lion":
USE Aesop;
SELECT Title
FROM Fable
WHERE CONTAINS (Fable.*, 'Lion');
The following fables contain the word "Lion" in either the fable title, moral, or text:
Title
-----
The Dogs and the Fox
The Hunter and the Woodman
The Ass in the Lion's Skin
Androcles
Integrated Full-Text Search is not case sensitive. Even if the server is configured for a
case-sensitive collation, iFTS will accept column names regardless of the case.
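The first CONTAINS parameter does not have to be an asterisk. As a brief sketch against the same sample table, a search can be limited to one indexed column or to a parenthesized list of indexed columns:

```sql
USE Aesop;

-- Search only the Title column.
SELECT Title
FROM Fable
WHERE CONTAINS (Fable.Title, 'Lion');

-- Search a list of columns (the list must be enclosed in parentheses).
SELECT Title
FROM Fable
WHERE CONTAINS ((Title, Moral), 'Lion');
```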
The ContainsTable function
Not only will iFTS work within the WHERE clause, but the CONTAINSTABLE function operates as a
table or subquery and returns the result set from the full-text search engine. This SQL Server feature
opens up the possibility of powerful searches.
CONTAINSTABLE returns a result set with two columns. The first column, Key, identifies the row using
the unique index that was defined when the catalog was configured.
The second column, Rank, reports the ranking of the rows using values from 0 (low) to 1000 (high).
There is no high/median/low range or fixed range to the rank value; the rank compares the row with
other rows only with regard to the following factors:
■ The frequency/uniqueness of the word in the table
■ The frequency/uniqueness of the word in the column
Therefore, a rare word will be ranked as statistically more important than a common word.
Because rank is only a relative ranking, it's useful for sorting the results, but never assume that a certain
rank value indicates significance, and never filter by a rank value in the WHERE clause.
The same parameters that define the iFTS for CONTAINS also define the search for CONTAINSTABLE.
The following query returns the raw data from the iFTS engine:
SELECT *
FROM CONTAINSTABLE (Fable, *, 'Lion');
The key by itself is useless to a human, but joining the CONTAINSTABLE results with the Fable table,
as if CONTAINSTABLE were a derived table, enables the query to return the Rank and the fable's
Title, as follows:
SELECT Fable.Title, FTS.Rank
FROM Fable
INNER JOIN CONTAINSTABLE (Fable, *, 'Lion') AS FTS
ON Fable.FableID = FTS.[KEY]
ORDER BY FTS.Rank DESC;
Result:
Title                         Rank
----------------------------  ----
The Ass in the Lion's Skin    80
The Hunter and the Woodman    48
The Dogs and the Fox          32
A fourth CONTAINSTABLE parameter, top n limit, reduces the result set from the full-text search engine
much as the SQL SELECT TOP predicate does. The limit is applied assuming that the result set is sorted
descending by rank so that only the highest ranked results are returned. The following query
demonstrates the top n limit throttle:
SELECT Fable.Title, FTS.Rank
FROM Fable
INNER JOIN CONTAINSTABLE (Fable, *, 'Lion', 2) AS FTS
ON Fable.FableID = FTS.[KEY]
ORDER BY FTS.Rank DESC;
Result:
The Ass in the Lion’s Skin 80
The advantage of using the top n limit option is that the full-text search engine can pass less data back
to the query. It's more efficient than returning the full result set and then performing a SQL TOP in the
SELECT statement. It illustrates the principle of performing the data work at the server instead of the
client. In this case, the full-text search engine is the server process and SQL Server is the client process.
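For comparison, the less efficient alternative mentioned above might look like the following sketch. Here the full-text engine returns every match, and SQL Server discards all but the top two rows afterward.

```sql
USE Aesop;

-- The full result set comes back from the full-text engine;
-- TOP then trims it inside SQL Server.
SELECT TOP (2) Fable.Title, FTS.Rank
FROM Fable
INNER JOIN CONTAINSTABLE (Fable, *, 'Lion') AS FTS
  ON Fable.FableID = FTS.[KEY]
ORDER BY FTS.Rank DESC;
```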
Advanced Search Options
Full-text search is powerful, and you can add plenty of options to the search string. The options
described in this section work with CONTAINS and CONTAINSTABLE.
Multiple-word searches
Multiple words may be included in the search by means of the OR and AND conjunctions. The following
query finds any fables containing both the word "Tortoise" and the word "Hare" in the text of the fable:
SELECT Title
FROM Fable
WHERE CONTAINS (FableText, 'Tortoise AND Hare');
Result:
Title
-----
The Hare and the Tortoise
One significant issue pertaining to the search for multiple words is that while full-text search can easily
search across multiple columns for a single word, it searches for multiple words only if those words are
in the same column. For example, the fable "The Ants and the Grasshopper" includes the word "thrifty"
in the moral and the word "supperless" in the text of the fable itself. But searching for "thrifty and
supperless" across all columns yields no results, as shown here:
SELECT Title
FROM Fable
WHERE CONTAINS (*, 'Thrifty AND supperless');
Result:
(0 row(s) affected)
Two solutions exist, and neither one is pretty. The query can be reconfigured so that the AND
conjunction is at the WHERE-clause level, rather than within the CONTAINS parameter. The problem with this
solution is performance. The following query requires two remote scans to the full-text search engine:
SELECT Title
FROM Fable
WHERE CONTAINS (*, 'Thrifty')
AND CONTAINS (*, 'supperless');
Result:
Title
-----
The Ants and the Grasshopper