Microsoft SQL Server 2008 R2 Unleashed


CHAPTER 50 SQL Server Full-Text Search

Here are some examples using FREETEXT and FREETEXTTABLE:

USE AdventureWorks;
SELECT * FROM Person.Contact WHERE FREETEXT(*, 'Barack Obama');


USE AdventureWorks;

SELECT *
FROM Sales.Individual AS s
JOIN (SELECT [key], [rank] FROM FREETEXTTABLE(Person.Contact, *, 'jon', 100)) AS k
    ON k.[key] = s.ContactID
ORDER BY [rank] DESC;

Notice that when the search string is wrapped in double quotation marks, a FREETEXTTABLE query does the functional equivalent of a CONTAINSTABLE query.
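The effect of the double quotation marks can be sketched with a pair of queries. This is a minimal illustration, assuming that Person.Contact in AdventureWorks has a full-text index; the column list is chosen only for readability.

```sql
USE AdventureWorks;

-- Double quotation marks inside the search string request a phrase match,
-- so FREETEXT does not apply stemming or thesaurus expansion to the phrase.
SELECT ContactID, FirstName, LastName
FROM Person.Contact
WHERE FREETEXT(*, '"Barack Obama"');

-- The comparable CONTAINS query searches for the same exact phrase.
SELECT ContactID, FirstName, LastName
FROM Person.Contact
WHERE CONTAINS(*, '"Barack Obama"');
```

Without the inner double quotation marks, FREETEXT would break the string into individual words and match their stemmed and thesaurus forms as well.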

Stop Lists

Stop lists are used when you want to hide words in searches or to keep out of the index those words that would otherwise bloat your full-text index and might cause performance problems. Stop lists (also known as noise word lists or stop word lists) are a legacy component from decades ago, when disk space was very expensive. Back then, using stop lists could save considerable disk space. However, with disk space now relatively cheap, the use of stop lists is no longer as critical as it once was.

You can create your own stop word list by expanding your database in SSMS, right-clicking the Full-Text Stoplists node, and selecting New Full-Text Stoplist. You can create your own stop list, base it on the system stop list, create an empty one, or base it on a stop list in a different database. Each full-text index can have its own stop list, which is a frequently demanded feature because some search consumers want to prevent some words from being indexed in one table but want those words indexed in a different table. After you create a stop word list, you can maintain it by right-clicking it in the Full-Text Stoplists node and selecting Properties. Figure 50.5 illustrates this option.

The options are to add a stop word, delete a stop word, delete all stop words, and clear the stop list. After selecting the option you want, you can enter a stop word and the language in which you want that stop word to be applied.

Keep in mind that stop lists are applied at query time (while searching) and at index time (while indexing). Changes made to a stop list are reflected in real time in searches but are applied only to newly indexed words. Stop words that are already indexed remain in the catalog until you rebuild the catalog. It is a best practice to rebuild your catalog as soon as you have made changes to your stop word list. To rebuild your full-text catalog, right-click the catalog in SSMS and select Rebuild.
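The same stop list tasks can also be scripted in T-SQL. The following is a minimal sketch rather than a prescribed procedure; MyStoplist, dbo.MyTable, MyCatalog, and the stop word itself are hypothetical names chosen for illustration, and the database is assumed to run at compatibility level 100.

```sql
-- Hypothetical names: MyStoplist, dbo.MyTable, MyCatalog.

-- Create a stop list that starts as a copy of the system stop list.
CREATE FULLTEXT STOPLIST MyStoplist FROM SYSTEM STOPLIST;

-- Add a stop word for English (LCID 1033).
ALTER FULLTEXT STOPLIST MyStoplist ADD 'unleashed' LANGUAGE 1033;

-- Review the stop words now held in the list.
SELECT w.stopword, w.language
FROM sys.fulltext_stopwords AS w
JOIN sys.fulltext_stoplists AS l
    ON l.stoplist_id = w.stoplist_id
WHERE l.name = 'MyStoplist';

-- Associate the stop list with the full-text index on a table.
ALTER FULLTEXT INDEX ON dbo.MyTable SET STOPLIST = MyStoplist;

-- Rebuild the catalog so words that were indexed earlier are removed.
ALTER FULLTEXT CATALOG MyCatalog REBUILD;
```

A stop word is removed with ALTER FULLTEXT STOPLIST ... DROP, which mirrors the delete options in the Properties dialog.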

Full-Text Search Maintenance

After you create full-text catalogs and indexes that you can query, you have to maintain them. The catalogs and indexes largely maintain themselves, but you need to focus on backing up and restoring them as well as tuning your search solution for optimal performance. In SQL Server 2008, the full-text catalogs get fragmented from time to time, especially if you are using the Automatic (Track Changes Automatically) setting. You can check the level of fragmentation by using the following command:

SELECT table_id, status FROM sys.fulltext_index_fragments WHERE status=4 OR status=6;


FIGURE 50.5 Maintaining a full-text stop list

If you notice that your tables are highly fragmented, you should optimize your full-text indexes. Here is the command you would use to do this:

ALTER FULLTEXT CATALOG AdventureWorks2008 REORGANIZE;
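To decide whether a reorganize is worthwhile, it can help to summarize the fragments per table instead of listing them one by one. The query below is one possible sketch; the column aliases are illustrative, and what counts as "highly fragmented" is left to your own judgment.

```sql
-- Count the queryable fragments (status 4 or 6) for each full-text indexed table.
SELECT OBJECT_NAME(table_id) AS table_name,
       COUNT(*)              AS queryable_fragments,
       SUM(row_count)        AS fragment_rows,
       SUM(data_size)        AS fragment_bytes
FROM sys.fulltext_index_fragments
WHERE status IN (4, 6)
GROUP BY table_id
ORDER BY queryable_fragments DESC;
```

Tables that show many fragments are the ones most likely to benefit from the REORGANIZE command shown above.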

Full-Text Search Performance

SQL Server FTS performance is most sensitive to the number of rows in the result set and the number of search terms in the query. You should limit your result set to a practical number; most searchers are conditioned to look only at the first page of results for what they are looking for, and if they don’t see what they need there, they refine the search and search again. A good practical limit for the number of rows to return is 200. You should try, if at all possible, to use simple queries because they perform better than more complex ones. As a rule, you should use CONTAINS rather than FREETEXT because it offers better performance, and you should use CONTAINSTABLE rather than FREETEXTTABLE for the same reason.
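The 200-row guideline maps directly onto the optional top_n_by_rank argument of CONTAINSTABLE. The following is a small sketch against the AdventureWorks Person.Contact table used earlier in the chapter; the search term is only an example.

```sql
USE AdventureWorks;

-- Ask the full-text engine for only the 200 highest-ranked matches,
-- then join back to the base table to fetch the columns to display.
SELECT c.ContactID, c.FirstName, c.LastName, k.[RANK]
FROM CONTAINSTABLE(Person.Contact, *, '"jon"', 200) AS k
JOIN Person.Contact AS c
    ON c.ContactID = k.[KEY]
ORDER BY k.[RANK] DESC;
```

Limiting the rows inside CONTAINSTABLE, rather than with TOP on the outer query, keeps the join to the base table small.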

Several factors are involved in delivering an optimal Full-Text Search solution. Consider the following:

Avoid indexing binary content. Convert it to text, if possible. Most IFilters do not perform as well as the text IFilter.

Use an integer column on the base table as the column that makes up your unique index.

Partition large tables into smaller tables. There seems to be a sweet spot around 50 million rows, but your results may vary. Ensure that for large tables, each table has its own catalog. Place this catalog on a RAID 10 array, preferably on its own controller.

SQL Full-Text Search benefits from multiple processors, preferably four or more. A sweet spot exists on eight-way machines or better. You will find that 64-bit hardware also offers substantial performance benefits over 32-bit.

Dedicate at least 512MB to 1GB of RAM to MSFTESQL by setting the maximum server memory to 1GB less than the installed memory. Set resource usage to run at 5 to give a performance boost to the indexing process (that is, sp_fulltext_service 'resource_usage', 5), set ft crawl bandwidth (max) and ft notify bandwidth (max) to 0, and set max full-text crawl range to the number of CPUs on your system. Use sp_configure to make these changes.
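The configuration changes in the last item could be scripted roughly as follows. This is a sketch under stated assumptions: the server is assumed to have 8GB of installed memory and four CPUs, so both numeric values are placeholders to adjust for your hardware.

```sql
-- Setting named in the text. Later Microsoft documentation describes
-- resource_usage as ignored on SQL Server 2008 and later, so treat it as optional.
EXEC sp_fulltext_service 'resource_usage', 5;

-- The remaining options are advanced options and must be made visible first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Assumption: 8GB installed, so leave 1GB free by capping SQL Server at 7GB.
EXEC sp_configure 'max server memory (MB)', 7168;

EXEC sp_configure 'ft crawl bandwidth (max)', 0;
EXEC sp_configure 'ft notify bandwidth (max)', 0;

-- Assumption: the machine has four CPUs.
EXEC sp_configure 'max full-text crawl range', 4;
RECONFIGURE;
```

Running sp_configure with no arguments afterward lists the configured and running values so you can confirm the changes took effect.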

Full-Text Search Troubleshooting

The first question you should ask yourself when you have a problem with SQL Full-Text Search is this: “Is the problem with searching or with indexing?” To help you make this determination, Microsoft has included three DMVs in SQL Server 2008:

sys.dm_fts_index_keywords

sys.dm_fts_index_keywords_by_document

sys.dm_fts_parser

The first two DMVs display the contents of your full-text index. The first DMV returns the following columns:

Keyword: Each keyword in varbinary form.

Display_term: The keyword as indexed; all the accents are removed from the word.

Column_ID: The column ID where the word exists.

Document_Count: The number of documents (rows) in which the word exists in that column.

The second DMV breaks down the keywords by document. Like the first DMV, it contains the Keyword, Display_term, and Column_ID columns, but in addition it contains the following two columns:

Document_ID: The row in which the keyword occurs.

Occurrence_count: The number of times the word occurs in the cell (a cell is also known as a tuple; it is a row-column combination, for example, the contents of the third column in the fifth row).

The first DMV, sys.dm_fts_index_keywords, is used primarily to determine candidate noise words; it can also be used to diagnose indexing problems. The second DMV, sys.dm_fts_index_keywords_by_document, is used to determine what is stored in your index for a particular cell.

Here are some examples of their usage:

SELECT * FROM sys.dm_fts_index_keywords(DB_ID(), OBJECT_ID('MyTable'))

SELECT * FROM sys.dm_fts_index_keywords_by_document(DB_ID(), OBJECT_ID('MyTable'))
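For the noise word use case, sorting the output by document count brings the most widespread terms to the top. The query below is a minimal sketch that reuses the MyTable placeholder from the examples above; the cutoff of 20 rows is arbitrary.

```sql
-- List the 20 terms that appear in the most rows; terms that occur in nearly
-- every row carry little search value and are candidates for a stop list.
SELECT TOP (20) display_term, column_id, document_count
FROM sys.dm_fts_index_keywords(DB_ID(), OBJECT_ID('MyTable'))
ORDER BY document_count DESC;
```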

These two DMVs are used to determine what occurs at index time. The third DMV, sys.dm_fts_parser, is used primarily to determine what happens at search time; in other words, how SQL Server Full-Text Search interprets your search phrase. Here is an example of its usage:

SELECT * FROM sys.dm_fts_parser(@QueryString, @LCID, @StopListID, @AccentSensitive)

@QueryString is your search word or phrase, @LCID is the locale ID for your language (determinable by querying sys.fulltext_languages), @StopListID is the ID of your stop list (determinable by querying sys.fulltext_stoplists), and @AccentSensitive allows you to set accent sensitivity (0 is not sensitive, 1 is sensitive to accents). Here is an example of how this works:

SELECT * FROM sys.dm_fts_parser('café', 1033, 0, 1)

SELECT * FROM sys.dm_fts_parser('café', 1033, 0, 0)

In the second example, you will notice that the Display_term is cafe and not café. These queries return the following columns:

Keyword: This is a varbinary representation of your keyword.

Group_id: The query parser builds a parse tree of the search phrase. If you have any Boolean searches, it assigns a different group ID to each part of the search term. For example, in the search phrase '"Hillary Clinton" OR "Barack Obama"', Hillary and Clinton belong to group ID 1, and Barack and Obama belong to group ID 2.

Phrase_id: Some words are indexed in multiple forms; for example, data-base is indexed as data, base, and database. In this case, data and base have the same phrase ID, and database has another phrase ID.

Occurrence: This is the order in which the term appears in the parsing result, that is, its position in the search string.

Special_term: This column refers to any delimiters that the parser finds in the search phrase. Possible values are Exact Match, End of Sentence, End of Paragraph, and End of Chapter.

Display_term: This is how the term would be stored in the index.

Expansion_type: This is the type of expansion, whether it is a thesaurus expansion (4), an inflectional expansion (2), or not expanded (0). For example, the following query shows the stemmed variants of the word run:

SELECT * FROM sys.dm_fts_parser('FORMSOF( INFLECTIONAL, run)', 1033, 0, 0)

Source_term: This is the source term as it appears in your query.
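The pieces above can be put together in one short session: look up the language and stop list IDs, then parse a Boolean phrase and inspect the grouping columns. This is only a sketch; it assumes the English word breaker (LCID 1033) and the system stop list (ID 0), the same values used in the earlier examples.

```sql
-- Find the LCID to pass as the second argument.
SELECT lcid, name
FROM sys.fulltext_languages
WHERE name = 'English';

-- Find the IDs of any user-defined stop lists in the current database.
SELECT stoplist_id, name
FROM sys.fulltext_stoplists;

-- Parse a Boolean search phrase and show how its terms are grouped.
SELECT group_id, phrase_id, occurrence, special_term,
       display_term, expansion_type, source_term
FROM sys.dm_fts_parser('"Hillary Clinton" OR "Barack Obama"', 1033, 0, 0);
```

Comparing the group_id values for the two quoted phrases shows how the parser split the OR expression.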

When troubleshooting indexing problems, you should consult the full-text error log, which can be found in C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\LOG. The log file name starts with the prefix SQLFT, followed by the database ID (padded with leading zeros), the catalog ID (query sys.fulltext_catalogs for this value), and then the extension .log. You may find many versions of the log, each with a numerical extension, such as SQLFT0001800005.LOG.4; this is the fourth version of this log. These full-text indexing logs can be read with any text editor.

You might find entries in this log that indicate documents were retried or documents failed indexing, in addition to error messages returned from the iFilters.
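The expected log file name for each catalog in the current database can be worked out with a query such as the following. It is a sketch that assumes both IDs are padded to five digits, which is inferred from the SQLFT0001800005.LOG example rather than taken from a documented rule.

```sql
-- Build the base log file name: SQLFT + database ID + catalog ID + .LOG
-- Assumption: each ID is left-padded with zeros to five digits.
SELECT name AS catalog_name,
       'SQLFT'
       + RIGHT('00000' + CAST(DB_ID() AS varchar(10)), 5)
       + RIGHT('00000' + CAST(fulltext_catalog_id AS varchar(10)), 5)
       + '.LOG' AS expected_log_file
FROM sys.fulltext_catalogs;
```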

Summary

SQL Server 2008 Full-Text Search offers extremely fast and powerful querying of textual content stored in tables. In SQL Server 2008, the full-text index creation statements are highly symmetrical with the table index creation statements, making SQL Server FTS much more intuitive to use than in previous versions of SQL Server. Also new is the tremendous increase in indexing and querying speeds. These features make SQL Server Full-Text Search a very attractive component of SQL Server 2008.


CHAPTER 51 SQL Server 2008 Analysis Services

What’s New in SSAS

Understanding SSAS and OLAP

Understanding the SSAS Environment Wizards

An Analytics Design Methodology

An OLAP Requirements Example: CompSales International

SQL Server 2008 Analysis Services (SSAS) continues to expand with numerous data warehousing, data mining, and online analytical processing (OLAP)-rich tools and technologies. Microsoft continues to attack the data warehousing/business intelligence (BI) market by pouring millions and millions of dollars into this area. Microsoft knows that the world is hungry for analytics and is betting the farm on it. As a part of its internal project named “Madison,” Microsoft has been acquiring other complementary BI technologies to accelerate its plans (such as acquiring the MPP data warehousing appliance company DATAllegro and rolling it under its BI offering). Other more traditional (and much more expensive) OLAP and BI platforms such as Cognos, Hyperion, Business Objects, and MicroStrategy are being challenged, if not completely replaced, by this new version of SSAS.

A chief data architect from a prominent Silicon Valley

company said recently, “I can now build [using SSAS]

sound, extremely usable, highly scalable, OLAP cubes

myself, faster and smarter than the entire data warehouse

team could do only a few years ago.” This is what Microsoft

has been trying to bring to the forefront for years—“BI for

the masses.”

What’s New in SSAS

SQL Server 2005 was the big jump into completely redeploying Analysis Services, from the architecture, to the development environment, to the multidimensional languages supported, and even to the wizard-driven deployments. SQL Server 2008 R2 raises this core work up a few more notches with enhancements at almost every part of SSAS and with the addition of major scale-out capabilities. Following are some of the top new features and enhancements:

Microsoft has improved and streamlined the Cube Designer.

Several subtle enhancements have been made around the Dimension and Aggregation Designers.

You can now create attribute relationships with the new Attribute Relationship Designer.

You can use subspace computations to optimize performance for your Multidimensional Expressions (MDX) queries.

Multidimensional OLAP (MOLAP) enables write-back capabilities that support high-performance “what if” scenarios.

A shared read-only Analysis Services database between several Analysis Services servers enables you to “scale out” easily and efficiently.

You are able to use localized analytical data in native languages, including translation capabilities and automatic currency conversions.

A highly compressed and optimized data cache is maintained automatically.

Backup performance is optimized.

SQL Server PowerPivot for Excel is a new feature.

The master data hub in SQL Server 2008 R2 helps manage your master data services more efficiently.

And, last but not least, SQL Server 2008 R2 Parallel Data Warehouse is a highly scalable, appliance-based, massively parallel processing (MPP) data warehouse solution that knows no bounds.

Understanding SSAS and OLAP

Because OLAP is at the heart of SSAS, you need to understand what it is and how it solves the requirements of decision makers in a business. As you might already know, data warehousing requirements typically include all the capability needed to report on a business’s transactional history, such as sales history. This transactional history is often organized into subject areas and tiers of aggregated information that can support some online querying and usually much more batch reporting. Data warehouses and data marts typically extract data from online transaction processing (OLTP) systems and serve data up to these business users and reporting systems. In general, these are all called decision support systems (DSS), or BI systems, and the latency of this data is determined by the business requirements it must support. Typically, this latency is daily or weekly, depending on the business needs, but more and more, we are seeing real-time (or near-real-time) reporting requirements.

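[Figure 51.1 diagram: an OLAP cube whose axes are the TIME, PRODUCT, and geography dimensions. Hierarchy levels shown include All Product, Product Type, and Product; All Geo, Country, Region, and Customer; All Time, Years, and Month. Monthly columns run Jan01 through Apr01, with sample Sales Units values of 450, 333, and 1203. The highlighted Sales Units cell for (France), (2010), (Feb 01), (IBM Laptop Model 451D) holds 996.]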

FIGURE 51.1 Multidimensional representation of business facts

OLAP falls squarely into the realm of BI. The purpose of OLAP is to provide a mostly online reporting environment that can support various end user reporting requirements. Typically, OLAP representations are OLAP cubes. A cube is a multidimensional representation of basic business facts that can be accessed easily and quickly to provide you with the specific information you need to make a critical decision. It is useful to note that a cube can be composed of from 1 to N dimensions. However, remember that the business facts represented in a cube must exist for all the dimensions being defined for the fact. In other words, all dimensional values (that is, intersections) have to be present for a fact value to be stored in the cube.

Figure 51.1 illustrates the Sales_Units historical business fact, which is the intersection of time, product, and geography dimensional data. For a particular point in time (February 2010), for a particular product (IBM laptop model 451D), and in a particular country (France), the sales units were 996 units. With an OLAP cube, you can easily see how many of these laptop computers were sold in France in February 2010.

Basically, cubes enable you to look at business facts via well-defined and organized dimensions (time, product, and geography dimensions, in this example). Note that each of these dimensions is further organized into hierarchical representations that correspond to the way data is looked at from the business point of view. This provides the capability to drill down into the next level from a higher, broader level (like drilling down into a specific country’s data within a geographic region, such as France’s data within the European geographic region).
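The idea of a fact as an intersection of dimensions, and of drilling down through a hierarchy, can be approximated in plain T-SQL. The following is an illustrative sketch only: the table variable and its rows are hypothetical, and only the 996-unit value for France in February 2010 comes from the text. A real cube would precompute these results instead of scanning rows.

```sql
-- Hypothetical fact rows; only the France / February 2010 value (996) is from the text.
DECLARE @SalesFact TABLE (
    SalesMonth  char(7)     NOT NULL,  -- time dimension (YYYY-MM)
    ProductName varchar(50) NOT NULL,  -- product dimension
    Region      varchar(20) NOT NULL,  -- geography hierarchy: broader level
    Country     varchar(20) NOT NULL,  -- geography hierarchy: detail level
    SalesUnits  int         NOT NULL   -- the measure (business fact)
);

INSERT INTO @SalesFact (SalesMonth, ProductName, Region, Country, SalesUnits)
VALUES ('2010-02', 'IBM Laptop Model 451D', 'Europe', 'France',  996),
       ('2010-02', 'IBM Laptop Model 451D', 'Europe', 'Germany', 450),
       ('2010-03', 'IBM Laptop Model 451D', 'Europe', 'France',  333);

-- One cell of the cube: the intersection of time, product, and geography.
-- With the rows above, this returns 996.
SELECT SalesUnits
FROM @SalesFact
WHERE SalesMonth  = '2010-02'
  AND ProductName = 'IBM Laptop Model 451D'
  AND Country     = 'France';

-- The higher, broader level: sales units by region for the month.
SELECT Region, SUM(SalesUnits) AS SalesUnits
FROM @SalesFact
WHERE SalesMonth = '2010-02'
GROUP BY Region;

-- Drill down one level: the same month broken out by country within region.
SELECT Region, Country, SUM(SalesUnits) AS SalesUnits
FROM @SalesFact
WHERE SalesMonth = '2010-02'
GROUP BY Region, Country;
```

The last two queries differ only in their GROUP BY columns, which is the relational analogue of moving between levels of the geography hierarchy.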


SSAS directly supports this and other data warehousing capabilities. In addition, SSAS allows a designer to implement OLAP cubes using a variety of physical storage techniques that are directly tied to data aggregation requirements and other performance considerations. You can easily access any OLAP cube built with SSAS via the Pivot Table Service; you can write custom client applications by using MDX with OLE DB for OLAP or ActiveX Data Objects Multidimensional (ADO MD); and you can use a number of third-party “OLE DB for OLAP” compliant tools.

Microsoft utilizes something called the Unified Dimensional Model (UDM) to conceptualize all multidimensional representations in SSAS. It is also worth noting that many of the leading OLAP and statistical analysis software vendors have joined the Microsoft Data Warehousing Alliance and are building front-end analysis and presentation tools for SSAS. The data mining capabilities that are part of SSAS provide a new avenue for organized data discovery. This includes using SQL Server DMX.

This chapter takes you through the major components of SSAS, discusses a mini-methodology for OLAP cube design, and leads you through creating and managing robust OLAP cubes that can easily be used to meet a company’s BI needs.

Understanding the SSAS Environment Wizards

Welcome to the “land of wizards.” This implementation of SSAS, as with older versions of SSAS, is heavily wizard oriented. SSAS has a Cube Wizard, a Dimension Wizard, a Partition Wizard, a Storage Design Wizard, a Usage Analysis Wizard, a Usage-Based Optimization Wizard, an Aggregation Wizard, a Calculated Cells Wizard, a Mining Model Wizard, and a few other wizards. All of them are useful, and many of their capabilities are also available through editors and designers. Using a wizard is helpful for those who need to have a little structure in the definition process and who want to rely on defaults for much of what they need. The wizards are also plug-and-play oriented and have been made available in all SQL Server and .NET development environments. In other words, you can access these wizards from wherever you need to, when you need to. All the wizard-based capabilities can also be coded in MDX, DMX, and ASSL.

Figure 51.2 shows how SSAS fits into the overall scheme of the SQL Server 2008 environment. SSAS has become completely integrated into the SQL Server platform. Utilizing many different mechanisms, such as SSIS and direct data source access capabilities, a vast amount of data can be funneled into the SSAS environment. Most of the cubes you build will likely be read-only because they will be for BI. However, a write-enabled capability (WriteBack) is available in SSAS for situations that meet certain data updatability requirements.

As you can also see in Figure 51.2, the basic components in SSAS are all focused on building and managing data cubes. SSAS consists of the analysis server, processing services, integration services, and a number of data providers. SSAS has both server-based and client-/local-based capabilities. This essentially provides a complete platform for OLAP.

You create cubes by preprocessing aggregations (that is, precalculated summary data) that reflect the desired levels within dimensions and support the type of querying that will be done. These aggregations provide the mechanism for rapid and uniform response times to queries. You create them before the user uses the cube. All queries utilize either these aggregations, the cube’s source data, a copy of this data in a client cube, data in cache, or a combination of these sources. A single Analysis Server can manage many cubes. You can have multiple SSAS instances on a single machine.

[Figure 51.2 diagram: OLTP databases and a multidimensional data warehouse feed SSAS, in part through SSIS packages. Inside SSAS, the processing engine and the Unified Dimensional Model (UDM: measures, dimensions, hierarchies, partitions, perspectives) sit alongside the proactive cache (MOLAP cache), OLAP cubes, and mining models. Clients (Win32/64 applications, COM-based applications, .NET applications, or any application for OLAP or DM) connect through OLE DB for OLAP and ADO MD, the local cube engine (msmdlocal.exe), or the IIS COM data pump, using XMLA (SOAP over TCP/IP or SOAP over HTTP).]

FIGURE 51.2 SSAS as part of the overall SQL Server 2008 environment

By orienting around UDM, SSAS allows for the definition of a cube that contains data measures and dimensions. Each cube dimension can contain a hierarchy of levels to specify the natural categorical breakdown that users need to drill down into for more details. Look back at Figure 51.1, and you can see a product hierarchy, time hierarchy, and geography hierarchy representation.

The data values within a cube are represented by measures (the facts). Each measure of data might utilize different aggregation options, depending on the type of data. Unit data might require the SUM (summarization) function, Date of Receipt data might require the MAX function, and so on. Members of a dimension are the actual level values, such as the particular product number, the particular month, and the particular country. Microsoft has removed most of the limits within SSAS. SSAS addresses up to 2,147,483,647 of almost anything within its environment (for example, dimensions in a database, attributes in a dimension, databases in an instance, levels in a hierarchy, cubes in a database, measures in a cube). In reality, you will probably not have more than a handful of dimensions. Remember that dimensions are the paths to the interesting facts. Dimension members should be textual and are used as criteria for queries and as row and column headers in query results.
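The point about different measures needing different aggregation functions can be seen in a small relational example. This is a sketch with made-up data; the table variable, product numbers, and dates are hypothetical and stand in for what a cube would aggregate per dimension member.

```sql
-- Hypothetical receipt rows keyed by a product dimension member.
DECLARE @Receipts TABLE (
    ProductNumber varchar(20) NOT NULL,  -- dimension member
    Units         int         NOT NULL,  -- additive measure
    DateOfReceipt datetime    NOT NULL   -- measure where the latest value matters
);

INSERT INTO @Receipts (ProductNumber, Units, DateOfReceipt)
VALUES ('P-100', 10, '20100105'),
       ('P-100', 25, '20100212'),
       ('P-200',  7, '20100120');

-- Units are summed, while Date of Receipt is aggregated with MAX,
-- giving one row per product number.
SELECT ProductNumber,
       SUM(Units)         AS TotalUnits,
       MAX(DateOfReceipt) AS LastDateOfReceipt
FROM @Receipts
GROUP BY ProductNumber;
```

Summing the dates would be meaningless, which is why each measure in a cube carries its own aggregation setting.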
