That's it; you are done with your tour of Foundation site search administration. Clearly, there are a lot of positives here, but keep reading. The next section covers SharePoint Server Search and Search Server. As you drool over those features, don't forget that the Express version of Search Server is free, and you can bolt it right on top of Foundation with ease. Wow — a free solution and a more awesome Search.
SHAREPOINT SERVER AND SEARCH SERVER
This section covers the following products:
SharePoint Server 2010 Standard
Search Server versus SharePoint Server
A very common question that first pops up in this conversation is "If I have SharePoint Server, what do I get by adding Search Server?" The answer is simple: nothing at all. Search Server is only a subset of the functionality available in SharePoint Server and cannot be installed on an existing SharePoint Server installation.
An example of a key difference is that SharePoint Server can index Active Directory information about your users after you configure and do a profile import, which is covered in Chapter 17. While Search Server can index SharePoint sites, it does not have a mechanism for doing the profile import from Active Directory, so it is unable to index user information. We will note similar limitations on Search Server throughout the chapter; otherwise, assume Search Server can perform the covered feature.
The follow-up question is "What is the difference between Search Server and Search Server Express (SSX)?" Again the answer is simple: scale. SSX can only be deployed on one server in the farm. You cannot add more servers to make Search highly available. Search Server can be scaled in the same fashion as SharePoint Server, providing high availability for search and the capability to scale to somewhere in the ballpark of 100 million items. Yikes! Of course, that power comes at a price. Express is free, whereas regular Search Server is not.
Configuration and Scale
In Chapter 3 you took a good look at farm topologies and scale points. Noticeably absent from that chapter was a detailed discussion of Search. That wasn't author laziness; the Search team at Microsoft chose to build their own tools for configuration of their service application. To access this tool, go into Central Administration ➪ Manage service applications and click on your Search service application. At the bottom of the administration window you will see the screen shown in Figure 14-6.
Admin
In the Admin section of this screen you will find the Administration component. This is the boss of Search. It tells all of the other components and servers what to do by managing the topology. This component cannot be made redundant, but that is okay; if this server is offline, then the rest of the servers will continue serving their role. No changes to the Search topology can be made while this server is offline. This server is responsible for such items as starting crawls, reassigning crawl tasks if it finds a crawler unavailable, and similar tasks.
To store all of this information, this component uses the administration database. This database has all of the search configuration information, so when you learn how to create a new crawl rule, this is where you will find it.
A final note about the Admin component: It cannot be readily moved to a different server, so it will live forever on whatever server you first provision it on. This might affect your planning if you are very particular about what is hosted on which server.
Crawl
The Crawl component keeps track of what it needs to crawl and what has been crawled in the crawl database, along with the crawl schedule and other details necessary for crawl operations. And the exciting part: You can have multiple crawlers assigned to the same crawl database. For you MOSS 2007 fans, this means no more relying on only one index server to build your index; now the sky is the limit regarding how much hardware you can throw at creating the index. Another benefit of the crawler having a dedicated database is that it does not add load to the property database while crawling.
By default, if you have more than one crawl database associated with a service application, the load is spread between the databases by host name. Using host distribution rules, it's possible to specify that a certain host (think content source like http://portal or \\server\share) is specifically tied to a crawl database. And because you assign Crawl components to specific crawl databases, you can now ensure that you have your most powerful crawlers working on that database. You may even choose to have that crawl database on a dedicated SQL Server.
If you have multiple databases and you want to find out what hosts are in what database, you can do that in the crawl log. Details about this cool capability follow later in the chapter.
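To make the distribution model concrete, here is a minimal Python sketch of the behavior described above. It is an illustration only, using hypothetical database names and a stand-in balancing function rather than SharePoint's actual algorithm: hosts named in a host distribution rule go to their dedicated crawl database, and every other host is spread across the shared databases by host name.

```python
# Conceptual sketch (not SharePoint's real implementation) of crawl database
# distribution: dedicated databases take only the hosts a rule assigns to
# them; everything else is balanced across the shared databases by host name.

# Hypothetical host distribution rules: host name -> dedicated crawl database
host_distribution_rules = {
    "FileServer": "SearchCrawlDB2",
}

# Crawl databases that are not dedicated share the remaining load
shared_databases = ["CrawlStoreDB_initial", "SearchCrawlDB1"]

def crawl_database_for_host(host: str) -> str:
    """Return the crawl database that should store content from this host."""
    # A dedicated database wins if a host distribution rule names this host
    if host in host_distribution_rules:
        return host_distribution_rules[host]
    # Otherwise spread hosts across the shared databases deterministically,
    # so every item from one host always lands in the same database
    index = sum(ord(c) for c in host.lower()) % len(shared_databases)
    return shared_databases[index]

print(crawl_database_for_host("FileServer"))  # pinned by the rule
print(crawl_database_for_host("portal"))      # balanced across shared DBs
```

The key property the sketch shows is that a host lives entirely in one crawl database, which is why perfect balance between databases is hard to achieve.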
Index Partition
You just learned about crawlers, and how they create an index but don't store the actual index. The storage is actually done by the Query component. The Query component is responsible for responding to search queries. When a user on a SharePoint site types "Cow" in the search box and hits Search, the web server hands that off to the Query component server, more often than not just called the query server. The query server then digs through the index and property database to come up with a list of items for the search. Security trimming then takes place, and finally the web server renders those results back to the user.
If you want to add scale, you can actually divide the index into multiple partitions, or pieces (as described later in this chapter). That way, you can assign each partition to a query server. For example, if you have one million items in your index prior to partitioning, it might take one second to find your search results. If you divide that into two partitions and put each partition on its own query server, your index still has one million items in it, but each query server has only 500,000 items in its partition to look through. Now your query results can be aggregated and returned to your browser in 0.5 seconds. That is how you scale the query servers for faster results.
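The scaling arithmetic can be sketched as a toy model. It assumes, purely for illustration, that search time grows linearly with the number of items a query server must scan and that partitions are searched in parallel, so aggregation adds nothing:

```python
# Toy latency model (an assumption for illustration, not measured behavior):
# each query server's search time is proportional to the items in its
# partition, and partitions are queried in parallel.

SECONDS_PER_MILLION_ITEMS = 1.0  # assumed rate taken from the example above

def query_latency(total_items: int, partitions: int) -> float:
    """Estimated query time when the index is split evenly across partitions."""
    items_per_partition = total_items / partitions
    return items_per_partition / 1_000_000 * SECONDS_PER_MILLION_ITEMS

print(query_latency(1_000_000, 1))  # one server scans all 1M items -> 1.0
print(query_latency(1_000_000, 2))  # each server scans only 500K -> 0.5
```

Real latency will not halve perfectly, of course, but the model captures why adding a partition per query server speeds up results.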
An important threshold for an index partition is 10 million items, the maximum number supported in a partition. Also, remember that each time you want to introduce a new partition you need to introduce a new query server. Very little is gained, and more than likely you will actually decrease performance, if you have only one query server and you try to break your index up into two partitions with both living on the same query server. Unlike the crawl databases, which are divided up by hosts, the index partitions try to maintain a very close balance. So each item is sent to an index partition based on a hash of its document ID. This method provides better scale with query partitions.
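A minimal sketch of hash-based placement follows. The actual hash SharePoint uses is internal; MD5 stands in for it here simply to show why hashing the document ID keeps partitions closely balanced regardless of which host the items came from:

```python
# Sketch of hash-based item placement. MD5 is only a stand-in for whatever
# internal hash the product uses; the point is that any stable, well-mixed
# hash of the document ID spreads items almost evenly across partitions.
import hashlib
from collections import Counter

def partition_for_item(doc_id: str, num_partitions: int) -> int:
    """Map a document ID to an index partition number."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

# 10,000 hypothetical items spread across two partitions
counts = Counter(partition_for_item(f"doc-{i}", 2) for i in range(10_000))
print(counts)  # roughly 5,000 items land in each partition
```

Contrast this with the crawl databases, where a whole host goes to one database: hashing individual items gives the near-perfect balance the query side needs.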
Now you have two query servers, but each one has half the index (its own partition). Next you need to configure redundancy. Partitions can also have mirrors. The mirror partition can be configured to respond to queries only if the primary partition is unavailable, or it can be a fully functional mirror that responds to queries. The balancing of query traffic is handled by the Search Admin component and is automatic. Typically, your index partition will be served by only one Query component, and configured with a failover mirror.
The final piece here is the property database. This database stores all of the metadata associated with the index partition(s) to which it is connected. An index partition is associated with only one property database, but a property database can be connected to multiple index partitions. This SQL Server database can become a bottleneck over time as it grows. If that is the case, you can either move the database to a bigger, badder SQL Server or reduce the number of partitions associated with it.
Adding a Server to the Search Topology
Consider a scenario in which the server farm is fully configured with everything, including SQL Server, running on one machine. Another server, ServerRC, has been purchased, has the same version of SharePoint Server 2010 Enterprise installed, and is added to the farm. The initial configuration wizard has been run on the new server. This started the appropriate services on this server. To add the second server to your Search topology, follow these steps:
1. Open Central Administration ➪ Application Management ➪ Manage service applications.
2. Find your search service application and open the Manage interface. Remember that Search topology is defined per Search service application, if for some reason you have more than one.
3. Scroll down the page and click Modify (refer to Figure 14-3).
4. Click New, and from the drop-down select Crawl Component.
5. For Server, select your new server's name. For this example, it is ServerRC.
6. For Associated Crawl Database, select the Crawl Database from which you want this crawler to work.
7. If necessary, change the Temporary Location of Index. This location will only be used for creating the index updates before pushing them out, and it should remain relatively small. It will not increase in size as your index grows. Check out Figure 14-7 for an example and then click OK.
FIGURE 14-7
8. You are returned to the Manage Search Topology screen, where you will see Pending creation next to your new component. Click the Apply Topology Changes button at the bottom of the screen, unless you plan to also add the Query component in the next set of steps. If so, skip this step. A processing screen will appear and process for a few minutes. Once it is complete, you are all set.
You now have configured the two servers to share the load of the one crawl database. The next logical step is to configure your new server to also be a query server. With the second Query component, you will get a second index partition, so you will want to define a mirror for each of your two partitions:
1. Return to the Search administration screen and click the Modify Search Application Topology button.
2. Click New. From the drop-down, select Index Partition and Query Component.
3. For Server, select your new server.
4. For Associated Property Database, choose the database you want this query component to use. You haven't created any additional ones, so there should only be one item in the list.
5. Location of Index is an important consideration. This is where the physical index files will be stored on the server. Ensure that you have enough storage capacity in your chosen location. If at all possible, this should be on its own dedicated drive.
6. Leave the Set this query component as failover-only checkbox at its default setting of unchecked, as illustrated in Figure 14-8.
FIGURE 14-8
7. After you confirm your settings, click OK. This will automatically create Query component 2.
8. Now that you have the two partitions, you need to set up the mirrors. Hover over Query component 1, click the drop-down, and select Add Mirror.
9. For Server, choose the server that is currently not hosting this partition.
10. Confirm that your Index location is correct. (Remember that the C: drive is a bad place.)
11. Check the box for Set the query component as failover-only.
12. Click OK.
13. Repeat steps 8–12 for Query component 2.
14. You are returned to the Manage Search Topology screen. You will see Pending creation next to your new component. Click the Apply Topology Changes button at the bottom of the screen. A processing screen will appear and process for a few minutes. Once it is complete, you are all set.
Now both servers are participating in serving Search queries and helping to crawl all of the content. You also have solid redundancy. In most environments the preceding actions will be sufficient. You have the capacity to crawl a lot of content in a reasonable amount of time, and your Search components are highly available. Note that this does not include SQL Server. It is up to you to implement a high-availability solution for the databases, whether that is SQL Server clustering, taking advantage of the database mirroring support, or some third-party solution.
Scaling Up with Crawl Databases
Fast forward a little bit, and your SharePoint deployment demands have increased again. You now want to add the crawling of your very large file server. Because of the size and nature of the data, you expect the crawling burden to be very high, so you choose to add another crawl database running on a dedicated SQL Server. You will also make this a dedicated database:
1. Return to the Search administration screen and click the Modify Search Application Topology button.
2. Click New and select Crawl Database.
3. For Database Server, enter the SQL Server you want to host this database. It can be the same SQL Server the rest of your farm uses, or if you're trying to add scale because of performance constraints on your current SQL Server, it may be a dedicated SQL Server.
4. Set Database Name to anything you would like.
5. Enable the checkbox for Dedicate this crawl store to hosts as specified in Host Distribution Rules, as shown in Figure 14-9.
6. Leave the other fields as is and click OK.
FIGURE 14-9
At the bottom of the page you selected the option to Dedicate this crawl store to hosts as specified in Host Distribution Rules. This rule tells the database not to store anything that is not specifically added by a host distribution rule, which you will create in the next section. If you do not make this crawl database a dedicated database, then Search will automatically balance the load in this database with the other crawl database. Don't forget to click Apply Topology Changes once you are done making updates to your topology.
If you were to now go straight into adding a host distribution rule, you would not see your new crawl database listed. That's because you have not associated your new crawl database with a crawl component, making it useless. To fix this, you need to follow the previous steps for creating a new crawl component, but this time select the new crawl database you created. Do this on Server1 and ServerRC.
Adding a Content Source and Host Distribution Rule
In these steps you will add a file share content source and then add it to the crawl database you specified earlier:
1. Go to the Search Administration page.
2. On the left side of the page, click Content Sources.
3. Click New Content Source.
4. Specify a Name.
5. For Content Source Type, choose File Shares.
6. For Start Addresses, enter the UNC path to the share(s) you want to crawl — for example, \\FileServer\Share. Note that the search crawl account needs to have read access to the share(s) being crawled.
7. For Crawl Settings, the default is normally correct: Crawl the whole share, not just the root folder.
8. For now, leave the crawl schedule set to None. (Crawl schedules are covered later in the chapter.)
9. Content Source Priority gives you the opportunity to mark a content source as high priority. This way, if overlapping content source crawls are taking place, you can specify which should have priority.
10. Skip over Start Full Crawl. You will do that the old-fashioned way in a moment.
11. Click OK. Figure 14-10 shows a sample configuration.
FIGURE 14-10
Creating a Host Distribution Rule
Now your file share content source is created. Before you start that full crawl, you need to set up your host distribution rule:
1. On the left side of the screen, click Host Distribution Rules.
2. Click the button for Add Distribution Rule.
3. For Hostname, enter FileServer. (Do not use slashes, just the actual host name. For example, if you had a content source of http://portal.contoso.com, your hostname would be portal.contoso.com. FileServer is used as the hostname here to keep up with the previous file share configured for \\FileServer\Share.)
4. From the Distribution Configuration, select the crawl database that you created in the earlier section.
5. Click OK.
6. Click Apply Changes. This will check to determine whether any content must be moved from one crawl database to another to comply with your new rule. If so, you are warned that this takes time and that any active/pending crawls will be paused for the duration of the move. Click the Redistribute Now button when you are ready to commit to the changes.
Starting a Crawl
With all of that done, you are now ready to do a crawl of your content sources and watch them split up across the databases:
1. Click Content Sources on the left side of the screen.
2. Hover over File Share (your content source), click the drop-down, and select Start Full Crawl.
3. Click Search Administration on the top left.
4. Now you can get a nice can of Mountain Dew, and sit back and watch the crawler go.
Perfect! Now you have your entire file share in one dedicated crawl database with two dedicated crawlers. Keep in mind that your dedicated crawlers are still on the same crawl server as the other crawlers. If you needed more scale, you could introduce more servers into the farm, create new crawl components on those servers, and then assign those crawlers to this crawl database and remove the current two. Scaling up is as flexible as Silly Putty.
Matching Crawl Databases to Hosts
For the final trick when it comes to playing with crawl databases, you need to look at the crawl logs:
1. On the left side of the Search Administration page, click Crawl Log.
2. From the top menu bar, click Host Name.
Behold! All of your crawl databases are listed, and each one shows what hosts are included in the database.
Take a gander at Figure 14-11. It doesn't reflect the preceding steps, but rather includes some interesting things to test your knowledge.
FIGURE 14-11
There are three crawl databases. Search_Service_Application_CrawlStoreDB_e2375287809744a28811d81f75273870 is the original crawl database that was created using the Initial Farm Configuration Wizard. The "Initial" in its name is a good reminder of its limitations. SearchCrawlDB1 and SearchCrawlDB2 were manually created using the Modify Topology button. SearchCrawlDB2 was configured to Dedicate this crawl store to hosts as specified in Host Distribution Rules.
Looking at the hosts, you can see content distribution at work. There are six content sources. Server3 has a host distribution rule to force it into SearchCrawlDB2. The remaining five were spread across the remaining two databases. Three of the content sources begin with sp911rc, but because they are separate sources, based on the port, they are divided accordingly.
At the top of the page there is also a link that says "If you would like the system to analyze your current distribution and make recommendations for redistribution, click here." Clicking that button on this server produces the report shown in Figure 14-12.
That’s rather impressive Search looked at how your hosts were currently distributed versus the amount
of content in each and suggested changes to better balance the databases Keeping perfect balance
is very difficult, as each host has to reside in only one crawl database; but in an environment with many hosts, this can go a long way At the bottom is a Redistribute Now button if you want to have the changes implemented for you If you click this button, SharePoint will automatically configure new Host Distribution rules for you and update the crawl databases as necessary Don’t forget that all crawls are paused while this process runs
FIGURE 14-12
Once the rules are created, you will be brought back to the Host Distribution Rules page. Here you will see a Redistribution status across the top of the page, with a percentage complete. The page will automatically refresh every 10 seconds while the distribution runs.
After everything is done, you can return to the Auto Host Distribution page and let it check again. You will see something similar to Figure 14-13.
FIGURE 14-13
Adding a Property Database
Now imagine that after looking at your query performance you find that your property database has become the bottleneck. Your overabundance of metadata and SQL disk I/O have combined to slow things down. Time to add a new database:
1. Open Search Administration.
2. Scroll down the page and click the Modify button under Search Application Topology.
3. From the toolbar, click New and select Property Database.
4. The defaults here are typically good, but if you want to give the database a new name or have it hosted on a different SQL Server, make those changes now. Once you are done, click OK.
Now the database is created, but it is still not in use. You have to first associate it with a Query component:
1. Click Query Component 1, and from the drop-down select Edit Properties.
2. For Associated Property Database, click the drop-down and select the new database you created.
3. Click OK.
Now you are still in an awkward position. When you change a Query component to be associated with a new property database, a new index partition is created as a by-product. That's because the index partition is associated with a specific property database and cannot be changed. This means that you now need to reevaluate your index partitions. For example, the partition you just created doesn't have a mirror; you need to add a mirror to it. And the old partition is gone, but the mirror of that partition is still floating out there, associated with the wrong property database. Once you get everything straightened out, be sure to apply your changes.
The Search UI
After you put so much work into configuring your topology and then working through the administration interfaces, it's easy to assume you are done. Don't clock out quite yet. While the UI is a wonderful thing that will "just work," there is so much more you can get out of it with a little understanding and tweaking. Even more exciting is the fact that you can delegate this work to a site collection administrator. The following sections describe some of the ways you can tweak the UI.
The Search Box
Everyone knows how to use the Search box: You enter your search query, hit Search, and then get the results. Pretty straightforward — but as noted in the SharePoint Foundation section, you can do a handful of cool things in this box:
➤ Wildcard searches — Wildcards enable you to broaden your search by using symbols to represent characters. For example, you can simply type Sh* to search for all words that begin with the letters Sh. Note that the wildcard search works only for the end of the word. You cannot search for *point, only share*. Also, keep in mind that while wildcard search can help you find more good results, it is also going to return more bad results. Relevancy is greatly reduced when searching with wildcards.
➤ Boolean searches — This searching method enables you to narrow or broaden your search using terms such as AND, OR, and NOT. It is important that you capitalize the Boolean terms properly. Also worth noting is the use of " " around phrases. For example, you could do a search such as ("Accounting Policy" OR "Accounting Procedures") AND Termination. This would return all search results that have either Accounting Policy and Termination or Accounting Procedures and Termination.
➤ Range refinements — You can do range refinements using the =, >, <, <=, and >= operators. The previous version of SharePoint accepted these operators to help you refine property restrictions; it just didn't do it very well. Who knew those could be used for something more than making emoticons?
➤ Property searches — For years we have had a property search capability, but it was apparently a secret. In the search box, you can type title:"Vacation policy" or author:Shane and do a search on specific properties. Any of the Managed Metadata properties can be used. They are discussed later in the chapter.
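To show how property restrictions like author:Shane differ from plain keywords, here is a small sketch of a query splitter. It is a hypothetical illustration of the syntax, not SharePoint's actual query parser:

```python
# Hypothetical sketch of splitting a search-box query into plain keywords
# and property restrictions (name:value tokens). Not the real parser; it
# only illustrates the syntax described above.
import shlex

def parse_query(query: str):
    """Split a query into (keywords, property restrictions)."""
    keywords, properties = [], {}
    # shlex keeps quoted phrases like title:"Vacation policy" together
    for token in shlex.split(query):
        if ":" in token:
            name, _, value = token.partition(":")
            properties[name] = value
        else:
            keywords.append(token)
    return keywords, properties

print(parse_query('title:"Vacation policy" author:Shane vacation'))
# keywords: ['vacation']; properties: title -> Vacation policy, author -> Shane
```

The quoted phrase stays intact as a single property value, which is exactly why the " " syntax matters for phrases.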
Relevancy Improvements
Every iteration of a good search engine improves the magic that drives search results, and SharePoint is no exception. Although most of the updates are closely guarded secrets, there are a couple that can be shared.
Phrase matching support has been added. For example, when you search for sales presentation, results with sales and presentation together will be ranked higher than results with sales and presentation in the document but not together.
Clickthroughs count. A clickthrough is the way the search page captures your activity. When you do a search and get back results, Search continues to monitor your activity by noting which links you click. For example, if you search for policy, and after reviewing the list of files you click on the third document, SharePoint makes a note of that. Over time, if people searching for policy continue to click on the third document, SharePoint will adjust that document and return it higher in the results. This is a pretty powerful feature, driving better search results as your users simply do their normal activities.
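The clickthrough idea can be sketched as follows. The boost value and scoring formula are assumptions for illustration only, not SharePoint's actual relevance algorithm:

```python
# Sketch of clickthrough-weighted ranking: documents that searchers for a
# given query keep clicking accumulate a boost that lifts them in results.
# The 0.1 boost per click is an arbitrary assumed value.
from collections import defaultdict

click_counts = defaultdict(int)  # (query, document) -> recorded clicks

def record_click(query, doc):
    click_counts[(query, doc)] += 1

def rank(query, base_scores):
    """Order documents by base relevance plus a per-click boost."""
    return sorted(
        base_scores,
        key=lambda doc: base_scores[doc] + 0.1 * click_counts[(query, doc)],
        reverse=True,
    )

scores = {"policy_a.docx": 1.0, "policy_b.docx": 0.8, "policy_c.docx": 0.7}
print(rank("policy", scores))  # base relevance order: a, b, c

for _ in range(5):             # users keep clicking the third result
    record_click("policy", "policy_c.docx")
print(rank("policy", scores))  # the clicked document now ranks first
```

The sketch shows the feedback loop: user behavior gradually overrides the initial relevance ordering.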
In Chapters 16 and 17 you learned about different ways of adding metadata to documents. One of the features was social tagging. Whether it is on pages, documents, or entire sites, tags are a help to Search. Search looks at these social tags and gives increased weight to tags, especially if the same content is tagged repeatedly with the same tag. Once again, Search knows your users matter, and it updates its indexes to reflect their activities.
Refiners
When you do a search, notice the list of properties on the left-hand side of the page, as shown in Figure 14-14. These are called refiners. For example, you can click on Word under Result Type and