Contents
Overview
Components of a SharePoint Portal Server Search
Lab A: Adding External Content to a Workspace
Information in this document is subject to change without notice. The names of companies, products, people, characters, and/or data mentioned herein are fictitious and are in no way intended to represent any real individual, company, product, or event, unless otherwise noted. Complying with all applicable copyright laws is the responsibility of the user. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without the express written permission of Microsoft Corporation. If, however, your only means of access is electronic, permission to print one copy is hereby granted.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.
© 2001 Microsoft Corporation. All rights reserved.
Microsoft, Active Directory, ActiveX, FrontPage, JScript, MS-DOS, NetMeeting, Outlook, PowerPoint, SharePoint, Windows, Windows NT, Visio, Visual Basic, Visual SourceSafe, Visual Studio, and Win32 are either registered trademarks or trademarks of Microsoft Corporation in the U.S.A. and/or other countries.
Other product and company names mentioned herein may be the trademarks of their respective owners.
Instructor Notes
This module provides students with the information necessary to add and manage a Microsoft® SharePoint™ Portal Server content source.
After completing this module, students will be able to:
Describe the components that are used in the searching and indexing features of SharePoint Portal Server
Define content source and describe the types of content that are supported, how a content source is used, and how to add a content source
Manage a content source by setting schedules, scope, and rules, and describe additional functions that apply to content sources
Materials and Preparation
This section provides the materials and preparation tasks that you need to teach this module.
Required Materials
To teach this module, you need the Microsoft PowerPoint® file 2095a_6.ppt.
Preparation Tasks
To prepare for this module, you should:
Read all of the materials for this module
Complete the lab
Instructor Setup for a Lab
This section provides setup instructions that are required to prepare the instructor computer or classroom configuration for a lab.
Lab A: Adding External Content to a Workspace
To prepare for the lab
• Classroom configured according to the Classroom Setup Guide for Course 2095A
Presentation: 60 Minutes
Lab: 30 Minutes
Module Strategy
Use the following strategy to present this module:
Components of a SharePoint Portal Server Search. Describe the five components of a SharePoint Portal Server search, which include the Gatherer, IFilters, word breakers and noise words, plug-ins, and indexing databases. Describe the function of each of these components, and then briefly explain how each component works.
Adding Content Sources. Explain that SharePoint Portal Server provides access to content that is stored outside the workspace and that this content is referred to as a content source. Describe the basic features of content sources, and then explain how to add various content sources to the Content Sources folder.
Managing Content Sources. Explain that once a content source has been added, it must be managed to ensure that it is used effectively during searches. Discuss how to manage a content source by configuring crawl settings, search scopes, index updates, rules, gatherer log files, and discussion settings, as well as other management functions.
Customization Information
This section identifies the lab setup requirements for a module and the configuration changes that occur on student computers during the labs. This information is provided to assist you in replicating or customizing Training and Certification courseware.
The lab in this module is also dependent on the classroom configuration that is specified in the Customization Information section in the Classroom Setup Guide for Course 2095A, Implementing Microsoft® SharePoint™ Portal Server 2001.
Overview
Components of a SharePoint Portal Server Search
Adding Content Sources
Managing Content Sources
*****************************ILLEGAL FOR NON-TRAINER USE*****************************
Microsoft® SharePoint™ Portal Server 2001 works with content that is stored both inside and outside the workspace. A content source is used to specify a set of content that is stored outside the workspace. The Microsoft Search (MSSearch) service is a full-text indexing and search engine that crawls and retrieves this content and creates and updates indexes for it. This module discusses this process and examines the use of content sources for accessing content that is external to the SharePoint Portal Server computer.
After completing this module, you will be able to:
Describe the components that are used in the searching and indexing features of SharePoint Portal Server
Define content source and describe the types of content that are supported, how a content source is used, and how to add a content source
Manage a content source by setting schedules, scope, and rules, and describe additional functions that apply to content sources
In this module, you will learn about adding and managing content with SharePoint Portal Server.
Components of a SharePoint Portal Server Search
This topic provides an overview of the technology that is used in the searching and indexing features of SharePoint Portal Server. These components are used to create and manage content sources.
The Gatherer
[Slide diagram: Accessing, Filtering, Indexing]
Filter Daemon Process
Core Component of MSSearch
Manages How Content Is Accessed, Filtered, and Indexed
Includes Native and Registered Protocol Handlers
The Gatherer is the core component of MSSearch. It manages the way that content is accessed, filtered, and indexed. Its activity can be monitored through the Microsoft Gatherer performance object: as SharePoint Portal Server processes transactions on your system, it generates performance data that Windows 2000 can track and log. This data is described as a performance object and is typically named for the component that generates the data.
How the Gatherer Works
The Gatherer runs inside MSSearch and interacts with a separate filter daemon process (mssdmn.exe) that performs data access and content filtering. The following steps describe how the Gatherer works:
1. The filter daemon uses protocol handlers and IFilters to extract data. These components are specific to a protocol or data type, and SharePoint Portal Server uses them to communicate with and filter the documents in the content source.
2. The Gatherer runs the data through a series of plug-ins to process and filter the data. Plug-ins are used to interpret the data and properties as they are pulled from the documents in a content source.
3. After the data passes through the plug-ins, the index is created and the document properties are saved to an index database (a Microsoft Jet property store).
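The three steps above can be modeled as a small pipeline. The following Python sketch is illustrative only: access, filter_text, gather, and indexing_plugin are hypothetical names standing in for a protocol handler, an IFilter, the Gatherer's plug-in loop, and the Indexing plug-in. They are not MSSearch interfaces, and the in-memory store replaces a real data store.

```python
from typing import Callable, Dict, List, Set, Tuple

# A filtered document: (body text, emitted properties).
Chunk = Tuple[str, Dict[str, str]]
Plugin = Callable[[str, Chunk], None]


def access(url: str, store: Dict[str, str]) -> str:
    """Stand-in for a protocol handler: fetch the raw content for a URL."""
    return store[url]


def filter_text(raw: str) -> Chunk:
    """Stand-in for an IFilter: the first line is emitted as the title property."""
    title, _, body = raw.partition("\n")
    return body, {"title": title}


def gather(urls: List[str], store: Dict[str, str], plugins: List[Plugin]) -> None:
    """Access and filter each document, then pass it through every plug-in in order."""
    for url in urls:
        chunk = filter_text(access(url, store))
        for plugin in plugins:
            plugin(url, chunk)


# Stand-in for the Indexing plug-in: an inverted index (word -> URLs) plus a
# separate property store, loosely mirroring the Jet property store.
index: Dict[str, Set[str]] = {}
properties: Dict[str, Dict[str, str]] = {}


def indexing_plugin(url: str, chunk: Chunk) -> None:
    body, props = chunk
    for word in body.lower().split():
        index.setdefault(word, set()).add(url)
    properties[url] = props


store = {
    "file://server/share/a.txt": "Budget\nquarterly budget report",
    "file://server/share/b.txt": "Memo\nproject report",
}
gather(list(store), store, [indexing_plugin])
print(sorted(index["report"]))  # ['file://server/share/a.txt', 'file://server/share/b.txt']
print(properties["file://server/share/b.txt"])  # {'title': 'Memo'}
```

Because every plug-in receives the same filtered chunk, another plug-in can be added to the list without changing the access or filtering stages.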
Note
A Jet property store is separate from the Microsoft Web Storage System used by SharePoint Portal Server.
Topic Objective
To explain the function of the Gatherer.
Lead-in
In this topic, we will examine the Gatherer, a core component of SharePoint Portal Server MSSearch.
Using Protocol Handlers to Access Data Store Content
The Gatherer accesses documents in a data store by using the appropriate protocol through a protocol handler interface. The protocol handler, which is unrelated to network protocols, is an interface between the index and SharePoint Portal Server. When the Gatherer processes a Uniform Resource Locator (URL) during indexing, the filter daemon determines which protocol handler to use based on the URL prefix, loads the associated dynamic-link library (DLL), and passes the URL and security credentials to the protocol handler.
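Prefix-based selection of a protocol handler can be sketched as a lookup on the URL scheme. This is a minimal Python sketch assuming a hypothetical table: the handler names are placeholders, not the DLLs or ProgIDs that SharePoint Portal Server actually registers, and dispatch only models what is handed over rather than loading anything.

```python
from urllib.parse import urlsplit

# Hypothetical prefix -> handler table. The handler names are placeholders,
# not the DLLs or ProgIDs that SharePoint Portal Server registers.
PROTOCOL_HANDLERS = {
    "http": "HttpHandler",
    "https": "HttpHandler",
    "file": "FileHandler",
    "exch": "ExchangeHandler",
}


def pick_handler(url: str) -> str:
    """Return the handler registered for the URL prefix (its scheme)."""
    scheme = urlsplit(url).scheme.lower()
    if scheme not in PROTOCOL_HANDLERS:
        raise ValueError(f"no protocol handler registered for {scheme!r}")
    return PROTOCOL_HANDLERS[scheme]


def dispatch(url: str, credentials: tuple) -> dict:
    """Model of what the filter daemon passes on: the URL plus credentials."""
    return {"handler": pick_handler(url), "url": url, "credentials": credentials}


print(pick_handler("http://server/workspace/"))               # HttpHandler
print(pick_handler("exch://ExchangeServer/Public Folders/"))  # ExchangeHandler
```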
Native Protocol Handlers
SharePoint Portal Server includes native protocol handlers, or handlers that ship with the product, for Hypertext Transfer Protocol (HTTP), file, Microsoft Exchange 5.5, Microsoft Exchange 2000 Server, and Lotus Notes.
Exchange 2000 and SharePoint Portal Server share the Web Storage System technology and the same protocol handler. This protocol handler accesses a local Web Storage System by using Microsoft OLE DB Provider for Exchange 2000 Server (EXOLEDB) and uses Web Distributed Authoring and Versioning (WebDAV) to access the Web Storage System on a remote Exchange or SharePoint Portal Server computer.
Registered Protocol Handlers
The following table lists the registered protocol handlers that are included with SharePoint Portal Server.
HTTP Mssph.dl MSSearch.HttpHandler.1
Gatherer Project
A search application, such as SharePoint Portal Server, can have one or more Gatherer Projects, whose activity is exposed through the Microsoft Gatherer Projects performance object. SharePoint Portal Server has one Gatherer Project for each internal or external workspace. Each workspace has its own settings, such as indexing schedules, and the Search service uses Gatherer Projects to keep the workspaces separate so that each can have its own schedule.
A SharePoint Portal Server workspace is a Gatherer Project with its own index. Each Gatherer Project contains its own set of build parameters, crawl restrictions, and plug-ins. Each Gatherer Project also contains its own run-time transaction log, which lists all URLs to be crawled, and maintains its own statistics.
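The per-project state described above can be pictured as a small record type. The following Python dataclass is a hypothetical model, not the MSSearch data structure: the field names and the free-form schedule string are assumptions chosen to mirror the prose. It shows why two workspaces never share a crawl queue or statistics.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GathererProject:
    """Hypothetical model of one Gatherer Project (one per workspace)."""
    workspace: str
    schedule: str  # free-form description such as "daily 02:00" (illustrative)
    build_parameters: Dict[str, str] = field(default_factory=dict)
    crawl_restrictions: List[str] = field(default_factory=list)
    plugins: List[str] = field(default_factory=list)
    transaction_log: List[str] = field(default_factory=list)  # URLs waiting to be crawled
    stats: Dict[str, int] = field(default_factory=lambda: {"crawled": 0})

    def enqueue(self, url: str) -> None:
        self.transaction_log.append(url)

    def crawl_next(self) -> str:
        """Take the oldest queued URL and count it as crawled."""
        url = self.transaction_log.pop(0)
        self.stats["crawled"] += 1
        return url


sales = GathererProject("Sales", "daily 02:00")
hr = GathererProject("HR", "weekly, Sunday 03:00")
sales.enqueue("http://intranet/sales/")
sales.crawl_next()
print(sales.stats, hr.stats)  # {'crawled': 1} {'crawled': 0}
```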
IFilters
Extract Content and Properties from Documents
Open Data Streams and Expose the Data as Indexable Chunks
SharePoint Portal Server Provides IFilters for:
Office (offfilt.dll), HTML (nlhtml.dll), Text (query.dll)
MIME (mimefilt.dll), TIFF (mspfilt.dll), Null Filter (tquery.dll)
IFilters are the components of MSSearch that extract a document’s content and its properties.
How IFilters Work
During the filter daemon process, IFilters open data streams and expose the data so that it can be indexed. For example, the Hypertext Markup Language (HTML) filter strips all HTML tags from a document, emits HTML elements such as the author or title as properties, and also emits the body text. Each file type, indicated by its file extension, has an IFilter associated with it.
SharePoint Portal Server provides IFilters for HTML, Microsoft Office, text, Multipurpose Internet Mail Extensions (MIME), and Tagged Image File Format (TIFF) documents.
Note
Convert documents created with Office applications earlier than Office 95 to Office 95 or later format. The Office IFilter does not expose the document properties of older Office documents.
Topic Objective
To explain the function of IFilters.
Lead-in
In this topic, we will examine how filters extract content and properties from documents for indexing.
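To make the HTML filter's behavior concrete, here is a minimal Python sketch built on the standard html.parser module. It is an assumption-laden stand-in: real IFilters are COM components, not Python classes, and HtmlFilterSketch, filter_html, FILTERS, and filter_document are hypothetical names. The sketch drops tags, emits the title and a META author as properties, emits the body text, and picks the filter by file extension.

```python
import os
from html.parser import HTMLParser
from typing import Callable, Dict, List, Tuple

Chunk = Tuple[str, Dict[str, str]]


class HtmlFilterSketch(HTMLParser):
    """Toy HTML filter: drops tags, emits title/author as properties plus body text."""

    def __init__(self) -> None:
        super().__init__()
        self.properties: Dict[str, str] = {}
        self._text: List[str] = []
        self._in_title = False
        self._skip = 0  # nesting depth inside <script>/<style>, which is not content

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1
        elif tag == "meta" and (attributes.get("name") or "").lower() == "author":
            self.properties["author"] = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.properties["title"] = text
        elif not self._skip:
            self._text.append(text)

    @property
    def body(self) -> str:
        return " ".join(self._text)


def filter_html(document: str) -> Chunk:
    parser = HtmlFilterSketch()
    parser.feed(document)
    parser.close()
    return parser.body, parser.properties


# One filter per file extension; only HTML is sketched here.
FILTERS: Dict[str, Callable[[str], Chunk]] = {".htm": filter_html, ".html": filter_html}


def filter_document(file_name: str, content: str) -> Chunk:
    extension = os.path.splitext(file_name)[1].lower()
    return FILTERS[extension](content)


page = ('<html><head><title>Q3 Report</title>'
        '<meta name="author" content="Kim"></head>'
        '<body><h1>Sales</h1><p>Up 5%</p><script>var x = 1;</script></body></html>')
print(filter_document("q3.htm", page))
# ('Sales Up 5%', {'title': 'Q3 Report', 'author': 'Kim'})
```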
Word Breakers and Noise Words
Word Breakers
Break words apart
Remove punctuation and symbols
Follow language-specific rules
Follow special case rules
Noise Words
Words that do not add value to a query (“and”, “the”)
MSSearch filters out noise words
Word breakers and noise words are used to facilitate indexing.
Word Breakers
To correctly crawl a document and add it to an index, SharePoint Portal Server must use word breakers. A word breaker determines where the word boundaries are in the stream of characters in a query or in a document being crawled. The word breaker that is used during indexing is determined by the language that the IFilter identifies and emits.
Function of Word Breakers
Common functions of word breakers include:
Breaking words apart at white spaces and at line and paragraph separators
Removing most punctuation and symbols
Following language-specific rules to handle such things as URLs, e-mail addresses, currency, hyphenation, and time and date. For example, the e-mail address username@domain.com is broken at the @ and at the period.
Following special-case rules. For example, SharePoint Portal Server word breakers leave the string C++ intact, because if the ++ were deleted, the resulting “C” would be discarded as a noise word.
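The functions listed above can be approximated with a few lines of Python. This is a toy word breaker under stated assumptions, not the SharePoint Portal Server implementation: break_words, SPECIAL_CASES, and the trailing-punctuation set are hypothetical, and only the C++ special case from this module is included. It splits on white space, drops punctuation, and breaks an e-mail address at the @ and the period.

```python
import re
from typing import List

# Special-case tokens that must survive intact (the module's example is C++).
SPECIAL_CASES = {"c++"}
TRAILING_PUNCTUATION = ".,;:!?"


def break_words(text: str) -> List[str]:
    """Toy word breaker; not the SharePoint Portal Server implementation."""
    words: List[str] = []
    for raw in text.split():  # white space, line and paragraph breaks
        candidate = raw.rstrip(TRAILING_PUNCTUATION)
        if candidate.lower() in SPECIAL_CASES:
            words.append(candidate)  # keep "C++" so "C" is not lost as a noise word
            continue
        # Any run of non-word characters is a boundary. This drops punctuation
        # and breaks an e-mail address at "@" and at each period.
        words.extend(piece for piece in re.split(r"\W+", raw) if piece)
    return words


print(break_words("Mail username@domain.com about C++, today."))
# ['Mail', 'username', 'domain', 'com', 'about', 'C++', 'today']
```

A real word breaker applies many more language-specific rules (currency, dates, hyphenation); the special-case set is where such exceptions would be added in this sketch.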
Topic Objective
To explain the function of word breakers and noise words.
Lead-in
In this topic, we will examine how word breakers and noise words are used to facilitate indexing.
Using Word Breakers in Indexing
The content index uses the word breaker component in the following two situations:
When an index is created or updated. The word breaker splits all text that is referenced by the content index. The index is updated continuously as documents are modified and closed.
At query time. A word breaker is used to break query strings into words and phrases.
Note
For more information about word breaking at query time, see Module 7, “Searching for Content,” in Course 2095A, Implementing Microsoft® SharePoint™ Portal Server 2001.
Using SharePoint Portal Server and Operating System Word Breakers
The word breakers included in SharePoint Portal Server override existing operating system word breakers. SharePoint Portal Server calls the operating system word breaker only if a word breaker specific to SharePoint Portal Server does not exist for the language. If neither Windows 2000 nor SharePoint Portal Server has a word breaker for the language, the neutral word breaker is used. The neutral word breaker (query.dll), provided by the operating system, breaks at white spaces and several other breaking characters.
Noise Words
MSSearch uses both noise words and noise word lists.
Using Noise Words
Noise words are words that do not add value to a query, such as “and”, “the”, and single letters. MSSearch filters out noise words to save index space and increase performance.
Using Noise Word Lists
Noise word lists are customizable, language-specific text files that are stored in the %systemroot%\program files\SharePoint Portal Server\data\ftdata\SharePoint Portal Server\config folder. There is one noise word list for each supported language; for example, the noise word list for U.S. English is noiseenu.txt. Each file contains a list of words, with one word per line. If you change a noise word list, you must perform a full update of the index to incorporate the changes.
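Noise word filtering can be sketched as set membership against a list in the one-word-per-line layout of noiseenu.txt. The functions below are hypothetical. The hard-coded single-letter rule is an assumption made for the sketch (a real list would simply contain those letters as entries), and any encoding you would pass to open() for a real list file must match that file. Because the filtering happens at index time, words already dropped stay out of the index until a full update, which is why changing the list requires one.

```python
from typing import Iterable, List, Set


def load_noise_words(lines: Iterable[str]) -> Set[str]:
    """Parse a noise word list in the one-word-per-line layout of noiseenu.txt."""
    return {line.strip().lower() for line in lines if line.strip()}


def remove_noise(tokens: Iterable[str], noise: Set[str]) -> List[str]:
    """Drop listed noise words and single letters before they reach the index."""
    kept = []
    for token in tokens:
        if token.lower() in noise:
            continue
        if len(token) == 1 and token.isalpha():  # single letters (assumed rule)
            continue
        kept.append(token)
    return kept


# With a real list you could pass an open file object instead of a Python list.
noise = load_noise_words(["and", "the", ""])
print(remove_noise(["The", "budget", "and", "a", "C", "report", "5"], noise))
# ['budget', 'report', '5']
```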
PQS plug-in
Indexing plug-in
Gatherer plug-in
A plug-in is a component that resides in the Gatherer data pipeline and processes the data that is emitted by the content filters. The Gatherer Project uses plug-ins to process the text and properties of collected content.
Plug-in Categories
The Gatherer includes the following two categories of plug-ins:
Consumer plug-in. This plug-in uses only the text and properties that are emitted and does not affect the pipeline.
Active plug-in. This plug-in can affect the pipeline by adding, modifying, or deleting properties.
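The difference between the two categories is whether a plug-in may change what flows down the pipeline. In this Python sketch, which is illustrative only, WordCounter plays a consumer plug-in and DepartmentTagger an active one. Both class names, the Plugin protocol, and the "invoice means Finance" rule are invented for the example and are not part of SharePoint Portal Server.

```python
from typing import Dict, List, Protocol

Props = Dict[str, str]


class Plugin(Protocol):
    def process(self, text: str, props: Props) -> Props: ...


class WordCounter:
    """Consumer plug-in: reads the stream but returns the properties unchanged."""

    def __init__(self) -> None:
        self.words_seen = 0

    def process(self, text: str, props: Props) -> Props:
        self.words_seen += len(text.split())
        return props


class DepartmentTagger:
    """Active plug-in: adds a property (the rule itself is invented for illustration)."""

    def process(self, text: str, props: Props) -> Props:
        updated = dict(props)
        if "invoice" in text.lower():
            updated["department"] = "Finance"
        return updated


def run_plugins(text: str, props: Props, plugins: List[Plugin]) -> Props:
    """Pass one document's text and properties through each plug-in in order."""
    for plugin in plugins:
        props = plugin.process(text, props)
    return props


counter = WordCounter()
result = run_plugins("Invoice for March", {"title": "INV-7"}, [counter, DepartmentTagger()])
print(result, counter.words_seen)
# {'title': 'INV-7', 'department': 'Finance'} 3
```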
Default Plug-ins
The Gatherer contains four default plug-ins: the Auto-categorization Module plug-in, the Persistent Query Server (PQS) plug-in, the Indexing plug-in, and the Gatherer plug-in.
Auto-Categorization Module Plug-In
The Auto-categorization (AutoCat) Module plug-in is a consumer plug-in that processes the data being streamed and uses statistical information to automatically associate documents with SharePoint Portal Server categories.
PQS Plug-In
The PQS plug-in is used for the SharePoint Portal Server Subscriptions feature. This active plug-in checks the data in the stream against subscription rules and, if needed, notifies the subscription engine to generate notifications.
Topic Objective
To explain the function of plug-ins.
Lead-in
In this topic, we will describe how the Gatherer uses plug-ins.
Gatherer Plug-In
The Gatherer plug-in can be thought of as the crawl manager. It receives the call to start a crawl, checks for crawl restrictions, and maintains the crawl queue and history. It is present in every Gatherer Project, regardless of the configuration.
One or more master indexes
The indexing database is a collection of word lists, shadow indexes, and one or more master indexes. Each data structure contains the same type of information and is optimized for a different stage in the life cycle of the index.
Word List
Word lists are in-memory structures, so they can be created quickly and the data in them can be accessed quickly. As a result, writing a word list does not hold up the crawl for long, and the crawl can move quickly from document to document.
Shadow Index
Because word lists exist only in memory and take up too much space to be used for long-term storage, the MSSearch service automatically transfers the data in word lists to a shadow index. A shadow index is a disk-based structure that is created when a specified number of word lists exists. Because data in a shadow index is compressed, access time is slower than for a word list, and creating a shadow index is also much slower than creating a word list. After a shadow index is created, it cannot be modified. If MSSearch determines that there are too many shadow indexes, it merges them into a new shadow index, building on the existing shadow indexes and word lists.
Because shadow indexes cannot be modified, the number of shadow indexes in the content index grows over time as new word lists are converted to shadow indexes.
Topic Objective
To explain the function of an indexing database and its collection of four indexes.
Lead-in
In this topic, we will examine how SharePoint Portal Server provides a consistent structure for the components of the indexing database.
Master Index
Because each shadow index adds an access cost that is almost constant regardless of its size, content index performance decreases as more shadow indexes are created. Therefore, it is advantageous to merge shadow indexes into a master index. In SharePoint Portal Server, this process is called a master merge. By default, a master merge occurs every night at midnight; it also occurs after a specified number of documents have been indexed or when disk space runs low. You cannot manually initiate the creation of a master index. The master index, which is the final repository for all indexing information, is by far the largest index. The optimal content index is a master index with no word lists or shadow indexes: after a master merge, the content of the word lists and shadow indexes exists only in the master index.
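The life cycle of word lists, shadow indexes, and the master index can be simulated with plain dictionaries. This Python sketch is a simplification under stated assumptions: ContentIndexSketch is a hypothetical class, one word list per document and the two thresholds are illustrative values (MSSearch chooses its own), and nothing is compressed or written to disk. It shows why lookups slow down as shadow indexes accumulate and why a master merge leaves a single structure to search.

```python
from typing import Dict, List, Set

Postings = Dict[str, Set[str]]  # word -> document IDs


class ContentIndexSketch:
    """Toy model of word lists -> shadow indexes -> master index."""

    def __init__(self, lists_per_shadow: int = 2, max_shadows: int = 3) -> None:
        self.word_lists: List[Postings] = []  # in memory, fast to build
        self.shadows: List[Postings] = []     # never modified once created
        self.master: Postings = {}
        self.lists_per_shadow = lists_per_shadow  # illustrative threshold
        self.max_shadows = max_shadows            # illustrative threshold

    @staticmethod
    def _merge(parts: List[Postings]) -> Postings:
        merged: Postings = {}
        for part in parts:
            for word, docs in part.items():
                merged.setdefault(word, set()).update(docs)
        return merged

    def add_document(self, doc_id: str, words: List[str]) -> None:
        self.word_lists.append({w: {doc_id} for w in words})
        if len(self.word_lists) >= self.lists_per_shadow:
            # Enough word lists: flush them into a new, immutable shadow index.
            self.shadows.append(self._merge(self.word_lists))
            self.word_lists = []
        if len(self.shadows) > self.max_shadows:
            # Too many shadow indexes: merge them into one new shadow index.
            self.shadows = [self._merge(self.shadows)]

    def master_merge(self) -> None:
        """Fold every structure into the master index (nightly by default in SPS)."""
        self.master = self._merge([self.master, *self.shadows, *self.word_lists])
        self.shadows, self.word_lists = [], []

    def lookup(self, word: str) -> Set[str]:
        # A query must consult every structure, so each extra shadow index adds cost.
        result: Set[str] = set()
        for part in [self.master, *self.shadows, *self.word_lists]:
            result |= part.get(word, set())
        return result


idx = ContentIndexSketch()
for i in range(5):
    idx.add_document(f"doc{i}", ["report", f"w{i}"])
print(len(idx.shadows), len(idx.word_lists))  # 2 1
idx.master_merge()
print(len(idx.shadows), len(idx.word_lists), sorted(idx.lookup("report")))
# 0 0 ['doc0', 'doc1', 'doc2', 'doc3', 'doc4']
```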
Adding Content Sources
Adding a Content Source
Adding a Web Content Source
Adding an Exchange 5.5 Content Source
Adding an Exchange 2000 Content Source
Adding a Lotus Notes Content Source
In addition to storing content in standard and enhanced folders in the workspace, SharePoint Portal Server provides access, by means of content sources, to content that is stored outside the workspace. SharePoint Portal Server provides read access to, and searching within, content sources, but content sources cannot be edited, checked in, or checked out. This section describes some of the basic features of content sources and how to add them to your Content Sources folder.
Topic Objective
To outline this topic.
Lead-in
In this section, you will learn about the basic procedure for adding a content source.
Adding a Content Source
[Slide diagram: Content, Management, Content Sources, Users, Index]
A content source represents an external location, indicated by a URL, where content is stored and accessed for indexing. You create and store links to this content in the Content Sources folder, which is located in the Management folder. Content can be located on the same server, on a server on your intranet, or on a server on the Internet.
Defining a Content Source
A content source is defined by:
The type of data store that is accessed, such as a network file server, a Web server, an Exchange server, or a Lotus Notes database
The address, a URL containing the host name and a path, that is required to locate the content
Additional parameters that control how the index of the content is created
Topic Objective
To describe the function of a content source as well as how to add a content source.
Lead-in
In this topic, you will learn how to prepare for adding a content source.
Types of Content Sources
When you add a content source to the Content Sources folder, you must provide an address or URL for that content. The following table lists the types of information that you can add to the workspace as a content source.
Lotus Notes database: Before you can create this content source, the Lotus Notes client must be properly installed on the SharePoint Portal Server computer, and the computer must be properly configured with the NotesSetup utility. Provide the name of the database and the address of the database server, such as:
//noteserver
Other SharePoint Portal Server workspaces: http://server/workspace/folder/
Creating and Updating an Index of the Content
On a regular basis, SharePoint Portal Server creates and updates an index of the content that is made available through content sources. After SharePoint Portal Server includes a content source in the workspace index, users with appropriate permissions can search for and view its content on the dashboard site. However, users cannot check out or edit content sources or the documents that are accessed through them.
SharePoint Portal Server supports indexing of content that is stored on Web sites, network file shares, Lotus Notes version 4.6a and R5 databases, Exchange 5.5 servers, Exchange 2000 servers, and other SharePoint Portal Server workspaces. You can also write custom protocol handlers that gather content from additional stores.
File Formats
SharePoint Portal Server supports only certain document file formats.
File Formats Supported by SharePoint Portal Server
SharePoint Portal Server supports the following document file formats: Microsoft Office, TIFF, MIME, HTML, and Lotus Notes. Add-on filters (IFilters) for Adobe PDF files and Corel WordPerfect files are available from the vendors’ Web sites.
File Formats Not Supported by SharePoint Portal Server
The current version of SharePoint Portal Server does not support some document file formats. For example, Microsoft Visio® and Microsoft Project files are not supported. Remember this when you crawl content or create an index.
Adding a Content Source to the Workspace
To add a content source, you use the Add Content Source Wizard in the Content Sources folder under the Management folder. Before you can add a content source to your workspace, you must have read access to the source, know where the content source files are stored, and know how the files will be searched.
Important
Before you can add a content source to the workspace, the workspace administrator must specify a default content access account.
If the administrator has not configured a default account for SharePoint Portal Server to use when crawling, the wizard prompts for one. This account is used to connect to the content source. SharePoint Portal Server also allows you to create indexes immediately, or you can choose to do so later.
To add a content source to your SharePoint Portal Server workspace:
1. Identify the location of the external content that you want to add to the workspace.
You can add any one of five types of content sources by using the Add Content Source Wizard. You must choose content that is external to the current workspace.
2. Open the Management folder, and then open the Content Sources folder.
3. Double-click Add Content Source.
4. The Add Content Source Wizard opens. In the wizard:
a. Define the content type by selecting the content source type that you want to incorporate into the index.
b. Provide a path that directs SharePoint Portal Server to the linked content: an address or URL for Web content, or the database address and name for a Lotus Notes database.
The new content source is placed in the Content Sources folder. The information available from the source is included in the workspace index and is available for users to search for and view on the dashboard site.
Note
For information about content access accounts, see Module 9, “Managing SharePoint Portal Server,” in Course 2095A, Implementing Microsoft® SharePoint™ Portal Server 2001.
Adding a Web Content Source
To Add a Web Content Source:
Run the Add Content Source Wizard
Select Web Site, File Share, or SharePoint Portal Server as the content type
Enter a valid URL or UNC path to the content, and specify the desired crawl depth
Assign the content source a unique display name
On the Finish page, you can choose to start the full build immediately or you can initiate it later
Adding a Web content source for a Web server, a network file share, or a remote SharePoint Portal Server workspace requires only a URL or a Universal Naming Convention (UNC) path.
To add a Web content source:
1. Run the Add Content Source Wizard.
2. Select Web Site, File Share, or SharePoint Portal Server as the content type.
3. Enter a valid URL or UNC path to the content, and specify the desired crawl depth.
4. Assign a unique display name to the content source.
5. On the Finish page, choose to start the full build immediately, or initiate it later.
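The crawl depth entered in step 3 limits how many links away from the start address the crawl goes. The following breadth-first crawl over an in-memory link graph is a minimal Python sketch of that idea. The crawl function and the site dictionary are hypothetical, and counting the start page as depth 0 is an assumption that may not match how the wizard numbers levels.

```python
from collections import deque
from typing import Dict, List, Set

LinkGraph = Dict[str, List[str]]  # page URL -> URLs it links to


def crawl(start: str, links: LinkGraph, max_depth: int) -> List[str]:
    """Breadth-first crawl that stops following links beyond max_depth.

    Depth 0 is the start page itself (an assumption for this sketch).
    """
    seen: Set[str] = {start}
    order: List[str] = []
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # crawl this page but do not follow its links
        for target in links.get(url, []):
            if target not in seen:  # never queue a page twice, even via cycles
                seen.add(target)
                queue.append((target, depth + 1))
    return order


site = {
    "http://intranet/": ["http://intranet/a", "http://intranet/b"],
    "http://intranet/a": ["http://intranet/a/1", "http://intranet/"],
    "http://intranet/a/1": ["http://intranet/a/1/x"],
}
print(crawl("http://intranet/", site, 1))
# ['http://intranet/', 'http://intranet/a', 'http://intranet/b']
```

Raising max_depth by one can multiply the number of pages fetched, which is the practical reason to size the depth to the site before starting a full build.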
For network file shares, you can specify any standard shared folder on a Windows file system. MSSearch can also crawl mounted network file shares on other operating systems that support the server message block (SMB) protocol, for example, IBM OS/2, Novell NetWare, and UNIX running an SMB service such as Samba.
Important
In Microsoft Site Server 3.0, users can map custom properties stored in HTML META tags to Office properties by using the text files schema.txt and gathererprm.txt, so that the metadata is indexed. SharePoint Portal Server version 1 does not support schema mapping with these files. Custom properties in META tags are included in the index only if they match properties in the SharePoint Portal Server schema.
Topic Objective
To describe how to add a Web content source.
Lead-in
In this topic, we will explore how to add a Web content source.
Connecting to a Secure Site
When you connect to a secure site, you must specify an account that has the appropriate type of access and authentication credentials. MSSearch runs under the local system account and must impersonate an access account by using the credentials that you provide. You must specify a default content access account during Setup. You can change the account at any time by using the Accounts tab on the Properties page of the server in SharePoint Portal Server Administration. A coordinator can also specify an account other than the default by creating a site path rule for the URL or UNC path.
Using HTTP Protocol and Authentication Methods
When the Gatherer connects to a SharePoint Portal Server or Web content source, it uses the HTTP protocol and HTTP authentication methods. To validate the content access account, it can use the Basic, Anonymous, or Integrated Windows authentication method. By default, content sources use the Integrated Windows authentication method; to configure a content source to use the Basic authentication method, you must create a site path rule. Because the Basic authentication method sends credentials over the network unencrypted, an administrator must ensure that this does not pose a security risk. To secure portal connections, you can enable Secure Sockets Layer (SSL) on the workspace virtual directory in Microsoft Internet Information Services (IIS).
When the Gatherer connects to a file content source, it uses the SMB protocol and Integrated Windows authentication. When accessing file systems other than Windows, such as UNIX or NetWare, you must use the Basic authentication method.
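How a site path rule overrides the defaults can be sketched as prefix matching. In this Python sketch, SitePathRule, choose_auth, and sends_cleartext_password are hypothetical names, and the "longest matching prefix wins" policy is an assumption, not documented SharePoint Portal Server behavior. Without a matching rule, the sketch falls back to Integrated Windows authentication and the default content access account, and it flags Basic authentication over plain HTTP because the credentials travel unencrypted.

```python
from typing import List, NamedTuple, Optional


class SitePathRule(NamedTuple):
    """Hypothetical model of a site path rule: a URL prefix plus crawl credentials."""
    prefix: str
    auth_method: str               # "Basic", "Anonymous", or "Integrated Windows"
    account: Optional[str] = None  # overrides the default content access account


def choose_auth(url: str, rules: List[SitePathRule], default_account: str) -> SitePathRule:
    """Longest matching prefix wins; otherwise Integrated Windows with the default account."""
    matches = [r for r in rules if url.lower().startswith(r.prefix.lower())]
    if not matches:
        return SitePathRule(url, "Integrated Windows", default_account)
    best = max(matches, key=lambda r: len(r.prefix))
    return best._replace(account=best.account or default_account)


def sends_cleartext_password(url: str, rule: SitePathRule) -> bool:
    """Basic authentication without SSL sends credentials unencrypted."""
    return rule.auth_method == "Basic" and not url.lower().startswith("https://")


rules = [SitePathRule("http://partner.example.com/", "Basic", r"PARTNER\crawler")]
rule = choose_auth("http://partner.example.com/docs/", rules, r"CORP\spscrawl")
print(rule.auth_method, rule.account, sends_cleartext_password("http://partner.example.com/docs/", rule))
# Basic PARTNER\crawler True
print(choose_auth("http://intranet/", rules, r"CORP\spscrawl").auth_method)
# Integrated Windows
```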
Important
When crawling content in a non-trusted domain, you must use the Basic authentication method, which you can set by using a site path rule. Also, the default content access account cannot reside in a non-trusted domain.
Warning
Be careful when you set the crawl settings. If you configure a site to follow all links, make sure that you are aware of the depth and size of the site; crawling a large site can use excessive bandwidth and more disk space than you have available.
Adding an Exchange 5.5 Content Source
Required:
The Outlook 2000 client must be installed
The Exchange server name
The Outlook Web Access server name
The Exchange site the server belongs to
The Exchange organization the server belongs to
An access account
To Add an Exchange 5.5 Content Source:
Provide the path to the public folders
Before you can add an Exchange 5.5 content source, you must enable this feature in SharePoint Portal Server. Because MSSearch requires Messaging Application Programming Interface (MAPI) files and Windows 2000 does not contain these files, you must install Microsoft Outlook® 2000, including the Collaboration Data Objects (CDO) component, on the SharePoint Portal Server computer before you can crawl Exchange 5.5 content. You do not need to configure a MAPI profile. If these conditions are not met, the content source will not be created. The following information is also required:
The name of the Exchange server.
The name of the Microsoft Outlook Web Access server. If no Outlook Web Access server is specified, Outlook Web Access is assumed to be installed on the Exchange server that is being indexed. You do not need to use Outlook Web Access, but if you do not, SharePoint Portal Server requires additional configuration to crawl the public folders.
The Exchange site that the server belongs to.
The Exchange organization that the server belongs to.
An access account with Administrator privileges on the Organization container.
Tip
… Implementation,” in Course 2095A, Implementing Microsoft® SharePoint™ Portal Server 2001.
Lead-in
In this topic, we will explore how to add an Exchange 5.5 content source.
Using the Exchange Service Account
Although you can use the Exchange service account to crawl content, any account that has Administrator rights on the Organization container can be used. It is not necessary to grant permissions on the Site, Site Configuration, or Server containers. Exchange Administrator privileges are required because:
Exchange 5.5 does not use Windows access control lists (ACLs) to secure content, so MSSearch must communicate with the Exchange 5.5 directory (dir.edb) at query time to filter out any results that the user does not have access to.
Crawling Exchange 5.5 uses MAPI calls that require Administrator privileges.
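Filtering results at query time, rather than at crawl time, can be sketched as a final pass over the hits. In this Python sketch, trim_results is a hypothetical helper. The readers dictionary stands in for the permissions that MSSearch must look up in the Exchange 5.5 directory, and treating a folder missing from it as unreadable is an assumption made for safety.

```python
from typing import Dict, List, Set


def trim_results(hits: List[str], folder_of: Dict[str, str],
                 readers: Dict[str, Set[str]], user: str) -> List[str]:
    """Keep only hits whose public folder the querying user can read.

    `readers` stands in for the permissions that MSSearch must look up in the
    Exchange 5.5 directory at query time; a folder missing from it is treated
    as unreadable.
    """
    return [hit for hit in hits if user in readers.get(folder_of[hit], set())]


hits = ["msg1", "msg2", "msg3"]
folder_of = {"msg1": "Company News", "msg2": "HR Private", "msg3": "Company News"}
readers = {"Company News": {"alice", "bob"}, "HR Private": {"carol"}}
print(trim_results(hits, folder_of, readers, "alice"))  # ['msg1', 'msg3']
print(trim_results(hits, folder_of, readers, "carol"))  # ['msg2']
```

Because the index itself holds every crawled item, the access account must be able to read all of them, which is one reason the crawl account needs Administrator rights.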
Providing a Public Folder Path
When you add a content source, you simply provide the path to the public folder. The path format reflects the hierarchy of the public folders and starts with exch://. Each folder name is separated by a slash mark (/).
For example, to crawl a folder called Company News, use the start address exch://ExchangeServer/Public Folders/All Public Folders/Company News, where ExchangeServer is the name of the Exchange server that is configured for Search and All Public Folders is the name of the public folder tree. To crawl all public folders, the path must end with All Public Folders/ (note the trailing slash mark).
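The start-address format above is regular enough to build programmatically. exchange_start_address is a hypothetical Python helper. The fixed "Public Folders" segment and the default tree name come from this module's example, and whether every server uses exactly that hierarchy is an assumption. Omitting the folder list yields the trailing-slash form that crawls all public folders.

```python
from typing import Sequence


def exchange_start_address(server: str, folders: Sequence[str] = (),
                           tree: str = "All Public Folders") -> str:
    """Build an exch:// start address for an Exchange 5.5 public folder.

    With no folders, the address ends with the tree name and a trailing slash,
    which crawls all public folders.
    """
    for name in folders:
        if "/" in name:
            raise ValueError(f"folder name {name!r} contains the '/' separator")
    base = f"exch://{server}/Public Folders/{tree}/"
    return base + "/".join(folders)


print(exchange_start_address("ExchangeServer", ["Company News"]))
# exch://ExchangeServer/Public Folders/All Public Folders/Company News
print(exchange_start_address("ExchangeServer"))
# exch://ExchangeServer/Public Folders/All Public Folders/
```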
For Your Information
Site Server 3.0 Search setup for crawling Exchange 5.5 was very similar to setting up SharePoint Portal Server to crawl an Exchange 5.5 content source. However, Site Server required MSSearch to run in the context of the Exchange Administrator account. With SharePoint Portal Server, the service runs as the local system account and impersonates the Exchange account only when crawling and when performing security validations on search results.
Adding an Exchange 2000 Content Source
SharePoint Portal Server can crawl content in both Exchange 5.5 (no service pack is required) and Exchange 2000 Server. For Exchange 5.5, SharePoint Portal Server crawls only public folder items. For Exchange 2000, it crawls any items that the access account can read. This means you can search for content in private mailboxes in Exchange 2000, such as a shared departmental mailbox.
Accessing Exchange Content
To access Exchange content that is returned in search results on the dashboard site, click the Web link, which retrieves and displays the content by using Outlook Web Access.
Indexing Office Attachments
When SharePoint Portal Server crawls Exchange 2000 messages, it includes the following attachments in the index:
Office attachments. The metadata of an attachment is included in the index.
Custom properties of Office attachments. Unlike Site Server, SharePoint Portal Server includes the custom properties of an Office attachment in the index if they match SharePoint Portal Server properties, just as with documents inside a SharePoint Portal Server Web folder.
Attachments that the Gatherer usually filters. For example, an .htm file is included in the index. However, the search results for an attachment display the subject and author of the message.
For more information about installing and accessing Outlook Web Access, see the Exchange Server documentation.
Crawling an Exchange 2000 Server
Crawling an Exchange 2000 server is essentially the same as crawling an external SharePoint Portal Server computer, because both servers store their content in the Web Storage System. The crawler uses HTTP and WebDAV to gather the content.
Exchange 2000 Content Source Features
Unlike Exchange 5.5 content sources, Exchange 2000 content sources allow you to:
Specify any domain user account for the content access account. Because Windows 2000 ACLs are used in Exchange 2000, administrative access to Exchange is not required. However, the account must have permission to read the content that will be crawled.
Crawl multiple Exchange 2000 servers
Crawl content outside public folders, such as user mailboxes in the private information store, if the content access account has the correct permissions
Avoid specifying an administrator account in Microsoft Management Console (MMC) or Outlook 2000. MSSearch does not need to impersonate an administrator account to verify permissions to view search results, because Windows 2000 ACLs are used.
Crawl documents according to their content class. If a document’s content class matches a content class (document profile) in the SharePoint Portal Server schema, its properties are included in the index according to the SharePoint Portal Server schema definition, not the Exchange schema. If the content classes do not match, MSSearch uses the Base Document profile and crawls accordingly. Because MSSearch does not use the Exchange 2000 schema, custom Exchange properties in public folder items are not included in the index.
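The content class matching described above amounts to a lookup with a fallback. In this Python sketch, the profile names and property lists in SPS_PROFILES are hypothetical examples, not the actual SharePoint Portal Server schema. The point is that only properties defined by the chosen profile reach the index, so a custom Exchange property (ProjectCode here) is dropped.

```python
from typing import Dict, List

# Hypothetical SharePoint Portal Server document profiles and their properties.
SPS_PROFILES: Dict[str, List[str]] = {
    "Memo": ["Title", "Author", "Keywords"],
    "Base Document": ["Title", "Author"],
}


def properties_to_index(content_class: str, item: Dict[str, str]) -> Dict[str, str]:
    # Use the profile whose name matches the item's content class;
    # otherwise fall back to the Base Document profile.
    profile = SPS_PROFILES.get(content_class, SPS_PROFILES["Base Document"])
    # Keep only properties the profile defines. Custom Exchange properties
    # are not part of the SharePoint schema, so they never reach the index.
    return {name: item[name] for name in profile if name in item}


item = {"Title": "Budget", "Author": "Kim", "Keywords": "finance", "ProjectCode": "P7"}
print(properties_to_index("Memo", item))
# {'Title': 'Budget', 'Author': 'Kim', 'Keywords': 'finance'}
print(properties_to_index("Invoice", item))
# {'Title': 'Budget', 'Author': 'Kim'}
```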
Determining a Public Folder Path
A typical path that you specify for Exchange 2000 public folder content in the Content Source Wizard uses the following URL format:
http://exchange_server_name/public/public_folder_tree_name/folder_name
The default content access account is used to access Exchange 2000 servers. If you want to specify different access accounts for separate Exchange 2000 content sources, use site path rules to configure them. To access private information store content, such as a user’s mailbox, use the following URL format:
http://exchange_server_name/exchange/mailbox_alias
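Both Exchange 2000 URL formats can be built from their placeholders, as the following Python sketch shows. The function names and the sample server, tree, folder, and alias values are hypothetical. The sketch inserts the values as-is and does not escape spaces or other special characters.

```python
def public_folder_url(server: str, tree: str, folder: str) -> str:
    # Format: http://exchange_server_name/public/public_folder_tree_name/folder_name
    return f"http://{server}/public/{tree}/{folder}"


def mailbox_url(server: str, alias: str) -> str:
    # Format: http://exchange_server_name/exchange/mailbox_alias
    return f"http://{server}/exchange/{alias}"


print(public_folder_url("exch2k", "Marketing", "Campaigns"))
# http://exch2k/public/Marketing/Campaigns
print(mailbox_url("exch2k", "salesdept"))
# http://exch2k/exchange/salesdept
```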
If you do specify a separate account in a site path rule, you must create a separate rule for any redirected URLs for folders that are replicated to another server For example, if you are crawling http://serverA/public/folderA and it is redirected to http://serverB/public, you must create an additional site path rule for http://serverB
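A small Python model of rule lookup shows why the redirect needs its own rule. Everything here is hypothetical. Rules are modeled as (URL prefix, account) pairs that are checked in order, and the first match wins. This is a simplifying assumption, not a description of how SharePoint Portal Server actually evaluates site path rules.

```python
from typing import List, Tuple

Rule = Tuple[str, str]  # (URL prefix, content access account)


def account_for(url: str, rules: List[Rule], default_account: str) -> str:
    """Return the account of the first rule whose prefix covers the URL."""
    for prefix, account in rules:
        base = prefix.rstrip("/")
        # Match the prefix itself or anything beneath it, but not a longer
        # host or folder name such as http://serverBackup.
        if url == base or url.startswith(base + "/"):
            return account
    return default_account


rules: List[Rule] = [("http://serverA/public/folderA", r"CONTOSO\CrawlA")]

print(account_for("http://serverA/public/folderA/item1", rules, "default"))  # CONTOSO\CrawlA
# The redirected location is not covered, so the default account is used.
print(account_for("http://serverB/public/folderA", rules, "default"))  # default

# Adding a rule for the redirect target closes the gap.
rules.append(("http://serverB", r"CONTOSO\CrawlA"))
print(account_for("http://serverB/public/folderA", rules, "default"))  # CONTOSO\CrawlA
```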
Adding a Lotus Notes Content Source
Adding a Lotus Notes content source requires special planning and configuration prior to running the Add Content Source Wizard You must perform the following tasks before you add a Lotus Notes content source:
Install an R5 or later Lotus Notes client on the SharePoint Portal Server computer (You can, however, crawl either 4.6a or R5 servers.)
Configure the Lotus Notes client with a Lotus Notes ID that has reader access to the databases that you wish to crawl, and ensure that you can connect to the server from the SharePoint Portal Server computer that is functioning as a Lotus Notes client
Configure Lotus Notes to Microsoft Windows NT® user mapping along with a special Lotus Notes view if you want to provide secure access to Lotus Notes databases
Run the Lotus Notes Indexing Setup Wizard after it is installed on your SharePoint Portal Server computer
Create a document profile to map properties if needed
Displaying Property Types
The protocol handler provides Number, Date, and Text property types and resolves numeric and string types. When the user creates a content source for Lotus Notes and maps SharePoint Portal Server properties to Lotus Notes properties, the property type for each Lotus Notes property is displayed, but the property type for SharePoint Portal Server properties is not.
If the user maps a number to a string or vice versa, the user interface gives no indication that an error has occurred.
SharePoint Portal Server crawls the Lotus Notes database and indexes the content according to the Lotus Notes property types. Queries use the SharePoint Portal Server property types, so if the types are mismatched, no results are returned.
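The silent failure described above can be reproduced with a toy model in Python, in which the index stores a value as text and the query supplies a number. The property name PageCount, the sample document, and the query function are hypothetical. Real SharePoint Portal Server queries are not issued this way, but the mismatch behaves the same in principle: the comparison never succeeds, so nothing is returned.

```python
from typing import Dict, List, Union

Value = Union[int, str]

# Toy index built with Lotus Notes property types: PageCount was crawled as Text.
index: List[Dict[str, Value]] = [
    {"Title": "Project plan", "PageCount": "12"},
]


def query(prop: str, value: Value) -> List[str]:
    """Return titles of indexed documents whose property equals value."""
    return [str(doc["Title"]) for doc in index if doc.get(prop) == value]


# The mapped SharePoint property is typed Number, so the query supplies an int.
print(query("PageCount", 12))    # []
# Only a query whose type matches the indexed type finds the document.
print(query("PageCount", "12"))  # ['Project plan']
```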
Topic Objective
To describe how to add a Lotus Notes content source.
Lead-in
In this topic, we will explore how to add a Lotus Notes content source.