
Module 6: Adding and Managing External Content


DOCUMENT INFORMATION

Basic information

Title: Adding and Managing External Content
Institution: Microsoft Corporation
Subject: Information Technology
Type: Courseware
Year of publication: 2001
Pages: 54
File size: 1.34 MB


Contents

Overview

Components of a SharePoint Portal Server Search

Lab A: Adding External Content to a Workspace

Information in this document is subject to change without notice. The names of companies, products, people, characters, and/or data mentioned herein are fictitious and are in no way intended to represent any real individual, company, product, or event, unless otherwise noted. Complying with all applicable copyright laws is the responsibility of the user. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without the express written permission of Microsoft Corporation. If, however, your only means of access is electronic, permission to print one copy is hereby granted.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

© 2001 Microsoft Corporation. All rights reserved.

Microsoft, Active Directory, ActiveX, FrontPage, JScript, MS-DOS, NetMeeting, Outlook, PowerPoint, SharePoint, Windows, Windows NT, Visio, Visual Basic, Visual SourceSafe, Visual Studio, and Win32 are either registered trademarks or trademarks of Microsoft Corporation in the U.S.A. and/or other countries.

Other product and company names mentioned herein may be the trademarks of their respective owners.

Instructor Notes

This module provides students with the information necessary to add and manage a Microsoft® SharePoint™ Portal Server content source.

After completing this module, students will be able to:

• Describe the components that are used in the searching and indexing features of SharePoint Portal Server.

• Define content source and describe the types of content that are supported, how a content source is used, and how to add a content source.

• Manage a content source by setting schedules, scope, and rules, and describe additional functions that apply to content sources.

Materials and Preparation

This section provides the materials and preparation tasks that you need to teach this module.

Required Materials

To teach this module, you need the Microsoft PowerPoint® file 2095a_6.ppt.

Preparation Tasks

To prepare for this module, you should:

• Read all of the materials for this module.

• Complete the lab.

Instructor Setup for a Lab

This section provides setup instructions that are required to prepare the instructor computer or classroom configuration for a lab.

Lab A: Adding External Content to a Workspace

To prepare for the lab:

• Configure the classroom according to the setup guide for Course 2095A.

Presentation: 60 Minutes

Lab: 30 Minutes

Module Strategy

Use the following strategy to present this module:

• Components of a SharePoint Portal Server Search. Describe the five components of a SharePoint Portal Server search, which include the Gatherer, IFilters, word breakers and noise words, plug-ins, and indexing databases. Describe the function of each of these components and then briefly explain how each component works.

• Adding Content Sources. Explain that SharePoint Portal Server provides access to content that is stored outside the workspace and that this content is referred to as a content source. Describe the basic features of content sources and then explain how to add various content sources to a Content Sources folder.

• Managing Content Sources. Explain that once a content source has been added, it must be managed to ensure that it is used effectively during searches. Discuss how to manage a content source by configuring crawl settings, search scopes, index updates, rules, gatherer log files, and discussion settings, as well as other management functions.

Customization Information

This section identifies the lab setup requirements for a module and the configuration changes that occur on student computers during the labs. This information is provided to assist you in replicating or customizing Training and Certification courseware.

The lab in this module is also dependent on the classroom configuration that is specified in the Customization Information section in the Classroom Setup Guide for Course 2095A, Implementing Microsoft® SharePoint™ Portal Server 2001.

Overview

• Components of a SharePoint Portal Server Search

• Adding Content Sources

• Managing Content Sources

*****************************ILLEGAL FOR NON-TRAINER USE*****************************

Microsoft® SharePoint™ Portal Server 2001 stores content that is both internal and external to the workspace. A content source is used to specify a set of content that is stored outside the workspace. The Microsoft Search (MSSearch) service is a full-text indexing and search engine that is used to crawl, retrieve, create, and update indexes for this content. This module discusses this process and examines the use of content sources for accessing content that is external to the SharePoint Portal Server computer.

After completing this module, you will be able to:

• Describe the components that are used in the searching and indexing features of SharePoint Portal Server.

• Define content source and describe the types of content that are supported, how a content source is used, and how to add a content source.

• Manage a content source by setting schedules, scope, and rules, and describe additional functions that apply to content sources.

In this module, you will learn about adding and managing content with SharePoint Portal Server.

Components of a SharePoint Portal Server Search

This topic provides an overview of the technology that is used in the searching and indexing features of SharePoint Portal Server. These components are used to create and manage content sources.

The Gatherer

[Slide diagram: the Gatherer and the filter daemon process, with the stages accessing, filtering, and indexing]

• Core Component of MSSearch

• Manages How Content Is Accessed, Filtered, and Indexed

• Includes Native and Registered Protocol Handlers

The Microsoft Gatherer performance object is the core component of MSSearch. As SharePoint Portal Server processes transactions on your system, it generates performance data that Windows 2000 can track and log. This data is described as a performance object and is typically named for the component generating the data. The Gatherer manages the way that content is accessed, filtered, and indexed.

How the Gatherer Works

The Gatherer runs inside MSSearch and interacts with a separate filter daemon process (mssdmn.exe) that performs data access and content filtering. The following steps describe how the Gatherer works:

1. The filter daemon uses protocol handlers and IFilters to extract data. These filters are data type–specific components that SharePoint Portal Server uses to communicate with and filter the documents in the content source.

2. The Gatherer runs the data through a series of plug-ins to process and filter the data. Plug-ins are used to interpret the data and properties as they are pulled from the documents in a content source.

3. The data passes through the plug-ins before the index is created and the document properties are saved to an index database (Microsoft Jet property store).

Note: A Jet property store is separate from the Microsoft Web Storage System used by SharePoint Portal Server.
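The three steps above can be sketched roughly in Python. This is an illustrative model only, not SharePoint Portal Server code: `filter_document`, `run_pipeline`, `add_length`, and the dictionaries that stand in for the index and the property store are all hypothetical names invented for this sketch.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical shape of filtered data: (text, properties).
Filtered = Tuple[str, Dict[str, str]]
PlugIn = Callable[[str, Filtered], Filtered]


def filter_document(url: str, raw: str) -> Filtered:
    """Step 1 (stand-in for the filter daemon): extract text and properties."""
    return raw.strip(), {"url": url}


def run_pipeline(url: str, raw: str, plug_ins: List[PlugIn],
                 property_store: Dict[str, Dict[str, str]],
                 index: Dict[str, List[str]]) -> None:
    text, props = filter_document(url, raw)
    # Step 2: pass the data through each plug-in in order.
    for plug_in in plug_ins:
        text, props = plug_in(url, (text, props))
    # Step 3: only after the plug-ins have run, index the words
    # and save the document properties.
    for word in text.lower().split():
        docs = index.setdefault(word, [])
        if url not in docs:
            docs.append(url)
    property_store[url] = props


def add_length(url: str, data: Filtered) -> Filtered:
    """Example plug-in (hypothetical): adds a 'length' property."""
    text, props = data
    return text, {**props, "length": str(len(text))}
```

In this sketch a plug-in is just a function that receives the filtered data and returns it, possibly changed, which keeps the ordering of the three steps easy to see.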

Topic Objective: To explain the function of the Gatherer.

Lead-in: In this topic we will examine the Gatherer, a core component of SharePoint Portal Server MSSearch.

Using Protocol Handlers to Access Data Store Content

The Gatherer accesses documents in a data store by using the appropriate protocol by way of a protocol handler interface. The protocol handler, which has no relation to a network protocol, is an interface between the index and SharePoint Portal Server. When the Gatherer processes a Uniform Resource Locator (URL) during indexing, the filter daemon determines which protocol handler to use based on the URL prefix, loads the associated dynamic-link library (DLL), and passes the URL and security credentials to the protocol handler.
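A minimal Python sketch of prefix-based dispatch may make this concrete. The handler functions, the `HANDLERS` table, and `dispatch` are hypothetical; the real filter daemon loads a DLL per prefix, which this sketch replaces with plain functions.

```python
from typing import Callable, Dict, Tuple

# Hypothetical handler type: takes a URL and (user, password), returns a description.
Handler = Callable[[str, Tuple[str, str]], str]


def http_handler(url: str, creds: Tuple[str, str]) -> str:
    return f"HTTP fetch of {url} as {creds[0]}"


def file_handler(url: str, creds: Tuple[str, str]) -> str:
    return f"file read of {url} as {creds[0]}"


# Illustrative prefix-to-handler table; the real registration is not shown in this module.
HANDLERS: Dict[str, Handler] = {
    "http": http_handler,
    "file": file_handler,
}


def dispatch(url: str, creds: Tuple[str, str]) -> str:
    """Pick a handler from the URL prefix, then pass it the URL and credentials."""
    prefix, sep, _ = url.partition("://")
    if not sep or prefix.lower() not in HANDLERS:
        raise ValueError(f"no protocol handler registered for {url!r}")
    return HANDLERS[prefix.lower()](url, creds)
```

A URL whose prefix has no entry in the table is rejected, which mirrors the idea that content can be crawled only when a matching protocol handler is available.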

Native Protocol Handlers

SharePoint Portal Server includes native protocol handlers, or handlers that ship with the product, for Hypertext Transfer Protocol (HTTP), file, Microsoft Exchange 5.5, Microsoft Exchange 2000 Server, and Lotus Notes.

Exchange 2000 and SharePoint Portal Server share the Web Storage System technology and the same protocol handler. This protocol handler accesses a local Web Storage System by using Microsoft OLE DB Provider for Exchange 2000 Server (EXOLEDB) and uses Web Distributed Authoring and Versioning (WebDAV) to access the Web Storage System on a remote Exchange or SharePoint Portal Server computer.

Registered Protocol Handlers

The following table lists the registered protocol handlers that are included with SharePoint Portal Server.

Protocol    DLL          ProgID
HTTP        Mssph.dll    MSSearch.HttpHandler.1

Gatherer Project

A search application can have one or more Microsoft Gatherer Projects performance objects. Gatherer Projects are located inside a search application, such as SharePoint Portal Server. SharePoint Portal Server has one Gatherer Project for each internal or external workspace. These workspaces have their own settings, such as indexing schedules. The Search service uses Gatherer Projects to keep each workspace separate so that it can have its own schedule.

A SharePoint Portal Server workspace is a Gatherer Project with its own index.

Each Gatherer Project contains its own set of build parameters, crawl restrictions, and plug-ins. Each Gatherer Project also contains its own run-time transaction log, which lists all URLs to be crawled, and maintains its own statistics.

IFilters

• Extract Content and Properties from Documents

• Open Data Streams and Expose the Data as Indexable Chunks

• SharePoint Portal Server Provides IFilters for:

  • Office (offfilt.dll)
  • HTML (nlhtml.dll)
  • Text (query.dll)
  • MIME (mimefilt.dll)
  • TIFF (mspfilt.dll)
  • Null Filter (tquery.dll)

IFilters are the components of MSSearch that extract a document’s content and its properties.

How IFilters Work

During the filter daemon process, IFilters open data streams and expose the data so that it can be indexed. In particular, the Hypertext Markup Language (HTML) filter strips a document of all HTML tags and emits various HTML syntactic elements as properties, such as author or title, and also emits the body text. Each file type, indicated by its file extension, has an IFilter associated with it.

SharePoint Portal Server provides IFilters for HTML, Microsoft Office, text, Multipurpose Internet Mail Extensions (MIME), and Tagged Image File Format (TIFF).
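To illustrate what the HTML filter is described as doing, the following Python sketch strips tags, emits the title and author as properties, and returns the remaining body text. It uses the standard library `html.parser` module and is only an analogy: `HtmlFilterSketch` and `filter_html` are hypothetical names, and the real IFilter is a COM component with a different interface.

```python
from html.parser import HTMLParser
from typing import Dict, List, Tuple


class HtmlFilterSketch(HTMLParser):
    """Strips tags, emits <title> and <meta name="author"> as properties, keeps body text."""

    def __init__(self) -> None:
        super().__init__()
        self.properties: Dict[str, str] = {}
        self.chunks: List[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            meta = dict(attrs)
            if (meta.get("name") or "").lower() == "author":
                self.properties["author"] = meta.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.properties["title"] = text   # emitted as a property
        else:
            self.chunks.append(text)          # emitted as indexable text


def filter_html(html: str) -> Tuple[Dict[str, str], str]:
    parser = HtmlFilterSketch()
    parser.feed(html)
    parser.close()
    return parser.properties, " ".join(parser.chunks)
```

The split between `properties` and `chunks` is the point of the sketch: properties are stored separately from the text that is later broken into words and indexed.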

Note: You should convert documents that were created in older Office applications to Office 95 or later format. The Office IFilter does not expose the document properties of older Office documents.

Topic Objective: To explain the function of IFilters.

Lead-in: In this topic we will examine how filters extract content and properties from documents for indexing.

Word Breakers and Noise Words

• Word Breakers

  • Break words apart
  • Remove punctuation and symbols
  • Follow language-specific rules
  • Follow special case rules

• Noise Words

  • Words that do not add value to a query (“and”, “the”)
  • MSSearch filters out noise words

Word breakers and noise words are used to facilitate indexing.

Word Breakers

To correctly crawl a document to add it to an index, SharePoint Portal Server must use word breakers. A word breaker determines where the word boundaries are in the stream of characters in the query or in a document being crawled. The word breaker that is used during indexing is determined by the language that is identified and emitted by the IFilter.

Function of Word Breakers

Common functions of word breakers include:

• Breaking words apart at white spaces and at line and paragraph separators.

• Removing most punctuation and symbols.

• Following language-specific rules to handle such things as URLs, e-mail addresses, currency, hyphenation, and time/date. For example, the e-mail address username@domain.com is broken at the @ and the period.

• Following special case rules. For example, SharePoint Portal Server word breakers leave the string C++ intact, because if the ++ were deleted, the resulting “C” would be discarded as a noise word.
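The word-breaker functions listed above can be approximated in a few lines of Python. This is a deliberately simplified, ASCII-only sketch and not the algorithm that MSSearch uses; `SPECIAL_CASES` and `break_words` are hypothetical names, and the only special case modeled is the C++ example from the text.

```python
import re
from typing import List

# Strings kept intact as special cases (the module's own example is "C++").
SPECIAL_CASES = ("C++",)

# Match a special case first; otherwise match a run of letters and digits.
# Everything else (white space, punctuation, symbols) acts as a break.
_TOKEN = re.compile(
    "|".join(re.escape(s) for s in SPECIAL_CASES) + r"|[A-Za-z0-9]+"
)


def break_words(text: str) -> List[str]:
    """Split text into words, dropping punctuation but keeping special cases."""
    return _TOKEN.findall(text)
```

For example, `break_words("Mail username@domain.com about C++, today!")` splits the e-mail address at the @ and the period and keeps C++ as a single word. Real word breakers apply language-specific rules that this sketch ignores.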

Topic Objective: To explain the function of word breakers and noise words.

Lead-in: In this topic, we will examine how word breakers and noise words are used to facilitate indexing.

Using Word Breakers in Indexing

The content index uses the word breaker component in the following two situations:

• When an index is created or updated. The word breaker splits all text that is referenced by the content index. The index is updated continuously as documents are modified and closed.

• At query time. A word breaker is used to break query strings into words and phrases.

Note: For more information about word breaking at query time, see Module 7, “Searching for Content,” in Course 2095A, Implementing Microsoft® SharePoint™ Portal Server 2001.

Using SharePoint Portal Server and Operating System Word Breakers

The word breakers included in SharePoint Portal Server override existing operating system word breakers. SharePoint Portal Server calls the operating system word breaker if a special one for SharePoint Portal Server does not exist. If neither Windows 2000 nor SharePoint Portal Server has a special language word breaker, the neutral word breaker is used. The neutral word breaker (query.dll) provided by the operating system breaks at white spaces and several other breaking characters.

Noise Words

Both noise words and noise word lists are used by MSSearch.

Using Noise Words

Noise words are words that do not add value to a query, such as “and”, “the”, and single letters. MSSearch filters out noise words to save index space and increase performance.

Using Noise Word Lists

Noise word lists are customizable language-specific text files that are stored in the %systemroot%\program files\SharePoint Portal Server\data\ftdata\SharePoint Portal Server\config folder. There is one noise word list for each language that is supported. For example, the noise word list for U.S. English is noiseenu.txt. Each file contains a list of words, with one word per line. If you change the noise word list, you must perform a full update of the index to incorporate the changes.
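As a rough illustration of the one-word-per-line format and of noise word filtering, consider the following Python sketch. The function names are hypothetical, the sample words are taken from the examples in the text, and the single-letter rule is modeled as a separate check because the text mentions it alongside the list.

```python
from typing import Iterable, List, Set


def parse_noise_list(lines: Iterable[str]) -> Set[str]:
    """Read a noise word list: one word per line, blank lines ignored."""
    return {line.strip().lower() for line in lines if line.strip()}


def drop_noise(words: Iterable[str], noise: Set[str]) -> List[str]:
    """Remove listed noise words and single letters before indexing."""
    return [
        w for w in words
        if w.lower() not in noise and not (len(w) == 1 and w.isalpha())
    ]
```

Because `parse_noise_list` accepts any iterable of lines, it could be given an open text file; the encoding of the real noise word files is not stated in this module, so none is assumed here.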

Plug-ins

• Auto-categorization Module plug-in

• PQS plug-in

• Indexing plug-in

• Gatherer plug-in

A plug-in is a component that resides in the Gatherer data pipeline and processes the data that is emitted by the content filters. The Gatherer Project uses plug-ins to process the text and properties of collected content.

Plug-in Categories

The Gatherer includes the following two categories of plug-ins:

• Consumer plug-in. This plug-in uses only the text and properties that are emitted and does not affect the pipeline.

• Active plug-in. This plug-in can affect the pipeline by adding, modifying, or deleting properties.
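The difference between the two categories can be shown with a small Python sketch. All class and function names here are hypothetical; the sketch only models the idea that a consumer plug-in reads the stream while an active plug-in changes what later plug-ins see.

```python
from typing import Dict, List, Protocol


class PlugIn(Protocol):
    def process(self, text: str, props: Dict[str, str]) -> Dict[str, str]: ...


class WordCounter:
    """Consumer-style plug-in: reads the data and returns the properties unchanged."""

    def __init__(self) -> None:
        self.total_words = 0

    def process(self, text: str, props: Dict[str, str]) -> Dict[str, str]:
        self.total_words += len(text.split())
        return props


class LanguageTagger:
    """Active-style plug-in: adds a property that later plug-ins can see."""

    def process(self, text: str, props: Dict[str, str]) -> Dict[str, str]:
        return {**props, "language": "en"}  # fixed value, for illustration only


def run_plug_ins(text: str, props: Dict[str, str],
                 plug_ins: List[PlugIn]) -> Dict[str, str]:
    for plug_in in plug_ins:
        props = plug_in.process(text, props)
    return props
```

Running `run_plug_ins("two words", {"title": "t"}, [WordCounter(), LanguageTagger()])` leaves the title untouched and adds the language property, while the counter keeps its tally outside the pipeline.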

Default Plug-ins

The Gatherer contains four default plug-ins: the Auto-categorization Module plug-in, the Persistent Query Server (PQS) plug-in, the Indexing plug-in, and the Gatherer plug-in.

Auto-Categorization Module Plug-In

The Auto-categorization (AutoCat) Module plug-in is a consumer plug-in that processes the data being streamed and uses statistical information to automatically associate certain SharePoint Portal Server categories with documents.

PQS Plug-In

The PQS plug-in is used for the SharePoint Portal Server Subscriptions feature. The active PQS plug-in checks the data in the stream against subscription rules and notifies the subscription engine to generate notifications if needed.

Topic Objective: To explain the function of plug-ins.

Lead-in: In this topic, we will describe how the Gatherer uses plug-ins.

Gatherer Plug-In

The Gatherer plug-in can be thought of as the crawl manager. It receives the call to start a crawl, checks for crawl restrictions, and maintains the crawl queue and history. It is present in every Gatherer Project, regardless of the configuration.

Indexing Database

The indexing database is a collection of word lists, shadow indexes, and one or more master indexes. Each data structure contains the same type of information and is optimized for a different stage in the life cycle of the index.

Word List

Word lists can be created quickly because they are held in memory. This also means that a document is accessed quickly. The crawl is not held up for long while the word list is being written, so the crawl can move quickly from document to document.

Shadow Index

Because word lists exist only in memory and take up too much space to be used for long-term storage, the MSSearch service automatically transfers data in word lists to a shadow index. A shadow index is a disk-based structure that is created when a specified number of word lists exists. Because data in a shadow index is compressed, access time is slower than for a word list. Creating a shadow index is also much slower than creating a word list. After a shadow index is created, it cannot be modified. Further, if MSSearch determines that there are too many shadow indexes, it merges them to create new shadow indexes, building on existing shadow indexes and word lists.

Because shadow indexes cannot be modified, the number of shadow indexes in the content index grows over time as new word lists are converted to shadow indexes.

Topic Objective: To explain the function of an indexing database and its collection of four indexes.

Lead-in: In this topic, we will examine how SharePoint Portal Server provides a consistent structure for the components of the indexing database.

Master Index

Because the access time for a shadow index is almost constant regardless of size, content index performance will decrease as more shadow indexes are created. Therefore, it is advantageous to merge shadow indexes into a master index. In SharePoint Portal Server, this process is called a master merge, and it happens by default every night at midnight, after a specific number of documents have been indexed, or if disk space gets too low. You cannot manually initiate the creation of a master index. The master index, which is the final repository for all indexing information, is by far the largest index. The optimal content index is a master index, with no word lists or shadow indexes. The content of the word lists and shadow indexes now exists only in the master index.
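The life cycle described above (word lists, then shadow indexes, then a master merge) can be modeled with a short Python sketch. Everything here is a simplification under stated assumptions: one word list per document, an arbitrary threshold of two word lists per shadow index, and in-memory dictionaries in place of compressed on-disk structures. `IndexDatabaseSketch` and `merge` are hypothetical names.

```python
from typing import Dict, List, Set

Index = Dict[str, Set[str]]  # word -> URLs of documents that contain it


def merge(indexes: List[Index]) -> Index:
    """Combine several indexes into one new index."""
    merged: Index = {}
    for index in indexes:
        for word, docs in index.items():
            merged.setdefault(word, set()).update(docs)
    return merged


class IndexDatabaseSketch:
    def __init__(self, word_lists_per_shadow: int = 2) -> None:
        self.word_lists: List[Index] = []      # in memory, quick to create
        self.shadow_indexes: List[Index] = []  # never modified once created
        self.master: Index = {}
        self._limit = word_lists_per_shadow    # assumed threshold, not a documented value

    def add_document(self, url: str, words: List[str]) -> None:
        self.word_lists.append({w: {url} for w in words})
        if len(self.word_lists) >= self._limit:
            # Fold the word lists into a new shadow index; older ones stay untouched.
            self.shadow_indexes.append(merge(self.word_lists))
            self.word_lists = []

    def master_merge(self) -> None:
        """Fold everything into the master index, leaving no word lists or shadow indexes."""
        self.master = merge([self.master] + self.shadow_indexes + self.word_lists)
        self.shadow_indexes, self.word_lists = [], []

    def lookup(self, word: str) -> Set[str]:
        # A query must consult every structure, which is why many shadow indexes slow it down.
        return merge([self.master] + self.shadow_indexes + self.word_lists).get(word, set())
```

In the sketch, `lookup` has to visit every structure, so the cost grows with the number of shadow indexes; after `master_merge` only the master index remains.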

Adding Content Sources

• Adding a Content Source

• Adding a Web Content Source

• Adding an Exchange 5.5 Content Source

• Adding an Exchange 2000 Content Source

• Adding a Lotus Notes Content Source

In addition to storing content in standard and enhanced folders in the workspace, SharePoint Portal Server provides access to content that is stored outside the workspace, by means of content sources. SharePoint Portal Server provides read access to, and searching within, content sources, but content sources cannot be edited, checked in, or checked out. This section describes some of the basic features of content sources and how to add them to your Content Sources folder.

Topic Objective: To outline this topic.

Lead-in: In this section, you will learn about the basic procedure for adding a content source.

Adding a Content Source

[Slide diagram: the Management folder, the Content Sources folder, content, the index, and users]

A content source represents an external location, indicated by a URL, where the content is stored and accessed for indexing. You create and store links to this content in the Content Sources folder that is located in the Management folder. Content can be located on the same server, a server on your intranet, or a server on the Internet.

Defining a Content Source

A content source is defined by:

• The type of data store that is accessed, such as a network file server, a Web server, an Exchange server, or a Lotus Notes database.

• The address, a URL containing the host name and a path, that is required to locate the content.

• Additional parameters that control how the index of the content is created.
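The three parts of this definition map naturally onto a small record type. The following Python sketch is illustrative only: the `ContentSource` class, the `STORE_TYPES` labels, and the `schedule` parameter in the usage note are hypothetical and do not correspond to any SharePoint Portal Server object model.

```python
from dataclasses import dataclass, field
from typing import Dict
from urllib.parse import urlsplit

# Illustrative labels for the kinds of data store named in the text.
STORE_TYPES = {"file share", "web site", "exchange", "lotus notes"}


@dataclass(frozen=True)
class ContentSource:
    store_type: str   # type of data store that is accessed
    address: str      # URL containing a host name and a path
    parameters: Dict[str, str] = field(default_factory=dict)  # index-building options

    def __post_init__(self) -> None:
        if self.store_type not in STORE_TYPES:
            raise ValueError(f"unknown store type: {self.store_type!r}")
        if not urlsplit(self.address).netloc:
            raise ValueError(f"address has no host name: {self.address!r}")
```

For instance, `ContentSource("web site", "http://server/workspace/folder/", {"schedule": "nightly"})` is accepted, whereas an address without a host name is rejected when the object is created.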

Topic Objective: To describe the function of a content source as well as how to add a content source.

Lead-in: In this topic, you will learn how to prepare for adding a content source.

Types of Content Sources

When you add a content source to the Content Sources folder, you must provide an address or URL for that content. The following table lists the types of information that you can add to the workspace as a content source.

• Lotus Notes database. Before you can create this content source, the Lotus Notes client must be properly installed on the SharePoint Portal Server computer, and the computer must be properly configured with the NotesSetup utility. Provide the name of the database and the address of the database server, such as: //noteserver

• Other SharePoint Portal Server workspaces. http://server/workspace/folder/

Creating and Updating an Index of the Content

On a regular basis, SharePoint Portal Server creates and updates an index of the content that is made available through content sources. After SharePoint Portal Server includes a content source in the workspace index, users with appropriate permissions can search for and view its content on the dashboard site. However, users cannot check out and edit content sources or the documents that are accessed through the content sources.

SharePoint Portal Server supports indexing of content that is stored on Web sites, network file shares, Lotus Notes version 4.6a / R5 databases, Exchange 5.5 servers, Exchange 2000 servers, and other SharePoint Portal Server workspaces. You can also write custom protocol handlers that gather content from additional stores.

File Formats

SharePoint Portal Server supports only certain document file formats.

File Formats Supported by SharePoint Portal Server

SharePoint Portal Server supports any of the following document file formats: Microsoft Office Suite, TIFF, MIME, HTML, and Lotus Notes. Plug-ins are available from the vendors’ Web sites for Adobe PDF files and Corel WordPerfect files.

File Formats Not Supported by SharePoint Portal Server

The current version of SharePoint Portal Server does not support some document file formats. For example, Microsoft Visio® and Microsoft Project are not supported file types. This information is important to remember when you crawl content or create an index.

Adding a Content Source to the Workspace

To add a content source, you use the Add Content Source Wizard in the Content Sources folder under the Management folder. Before you can add a content source to your workspace, you must have read access to the source, know where the content source files are stored, and know how the files will be searched. In addition, the workspace administrator must specify a default content access account.

If the administrator has not configured a default account for SharePoint Portal Server to use for crawling, the wizard prompts for one. This account is used to connect to the content source. SharePoint Portal Server also allows you to create indexes immediately, or you can choose to do so later.

To add a content source to your SharePoint Portal Server workspace:

1. Specify the location of the external content that you want to add to the workspace.

   You can add any one of five types of content sources by using the Add Content Source Wizard.

   Important: You must choose content that is external to the current workspace.

2. Open the Management folder, and then open the Content Sources folder.

3. Double-click Add Content Source.

4. The Add Content Source Wizard opens.

   a. Define the content type by selecting the content source type that you want to incorporate into the index.

   b. Provide a path that directs SharePoint Portal Server to the linked content by providing an address or URL for Web content or by providing the database address and name for a Lotus Notes database.

The new content source is placed in the Content Sources folder. The information available from the source is included in the workspace index and is available for users to search for and view on the dashboard site.

Note: For information about content access accounts, see Module 9, “Managing SharePoint Portal Server,” in Course 2095A, Implementing Microsoft® SharePoint™ Portal Server 2001.

Adding a Web Content Source


Adding a Web content source for a Web server, network file share, or remote SharePoint Portal Server workspace requires only a URL or Uniform Naming Convention (UNC) file path.

To add a Web content source:

1. Run the Add Content Source Wizard.

2. Select Web Site, File Share, or SharePoint Portal Server as the content type.

3. Enter a valid URL or UNC path to the content, and specify the desired crawl depth.

4. Assign a unique display name to the content source.

5. On the Finish page, you can choose to start the full build immediately, or you can initiate it later.

For network file shares, you can specify any standard shared folder on a Windows file system. MSSearch is also able to crawl mounted network file shares on other operating systems that support the server message block (SMB) protocol, for example, IBM OS/2, Novell NetWare, and UNIX running an SMB service such as Samba.

Important: In Microsoft Site Server 3.0, users can map custom properties stored in HTML META tags to Office properties by using the text files schema.txt and gathererprm.txt so that the metadata will be indexed. SharePoint Portal Server version 1 does not support schema mapping using these files. Custom properties in META tags will not be included in the index if they match properties in the SharePoint Portal Server schema.

Topic Objective: To describe how to add a Web content source.

Lead-in: In this topic, we will explore how to add a Web content source.

Connecting to a Secure Site

When you are connecting to a secure site, you must specify an account that has the appropriate type of access and authentication credentials. MSSearch runs as a local system account and must impersonate an access account by using the credentials that you provide. You must specify a default content access account during Setup. You can change the account at any time by using the Accounts tab on the Properties page of the server in SharePoint Portal Server Administration. A coordinator can also specify an account other than the default by creating a site path rule for the URL or UNC path.

Using HTTP Protocol and Authentication Methods

When the Gatherer connects to a SharePoint Portal Server or Web content source, it uses the HTTP protocol and HTTP authentication methods. To validate the content access account, it can use the Basic, Anonymous, or Integrated Windows authentication method. By default, content sources use the Integrated Windows authentication method. To configure the content source to use the Basic authentication method, you must create a site path rule. Because the Basic authentication method sends credentials over the network unencrypted, an administrator must ensure that this does not pose a security risk. To secure portal connections, you can enable Secure Sockets Layer (SSL) on the workspace virtual directory in Microsoft Internet Information Services (IIS).

When the Gatherer connects to a file content source, it uses the SMB protocol and Integrated Windows authentication. When accessing file systems other than Windows, such as UNIX or NetWare, you must use the Basic authentication method.

Important: When crawling content in a non-trusted domain, you must use the Basic authentication method, which you can set by using a site path rule. You also cannot set a default content access account that resides in a non-trusted domain.

Warning: Be careful when you set the crawl settings. If you configure a site to follow all links, make sure that you are aware of the depth and size of the site. You might use excessive bandwidth and not have enough disk space to crawl large sites.
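The authentication rules in this topic can be summarized as a small decision function. The Python sketch below is a reading aid under stated assumptions: `choose_auth_method` and its parameters are hypothetical, and it covers only the cases the text names (the default, non-trusted domains, and non-Windows file systems), not Anonymous access.

```python
def choose_auth_method(source_kind: str, trusted_domain: bool = True,
                       windows_file_system: bool = True) -> str:
    """Return the authentication method that the rules in this topic call for.

    source_kind is "web" (Web site or SharePoint Portal Server) or "file".
    """
    if source_kind not in ("web", "file"):
        raise ValueError(f"unsupported source kind: {source_kind!r}")
    if not trusted_domain:
        return "basic"       # non-trusted domain: Basic, set through a site path rule
    if source_kind == "file" and not windows_file_system:
        return "basic"       # file systems such as UNIX or NetWare
    return "integrated"      # the default for content sources
```

Whenever the function returns "basic", the earlier caution applies: credentials travel unencrypted, so the administrator has to decide whether that is acceptable on the network in question.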

Adding an Exchange 5.5 Content Source

Required:

• The Outlook 2000 client must be installed
• The Exchange server name
• The Outlook Web Access server name
• The Exchange site the server belongs to
• The Exchange organization the server belongs to
• An access account

To Add an Exchange 5.5 Content Source:

• Provide the path to the public folders

Before you can add an Exchange 5.5 content source, you must enable this feature in SharePoint Portal Server. Because MSSearch requires Messaging Application Programming Interface (MAPI) files and Windows 2000 does not contain these files, you must install Microsoft Outlook® 2000, including the Collaboration Data Objects (CDO) component, on the SharePoint Portal Server computer before you can crawl Exchange 5.5 content. You do not need to configure a MAPI profile. If these conditions are not met, the content source will not be created. The following information is also required:

• The name of the Exchange server.

• The name of the Microsoft Outlook Web Access server. If the name of the server is not specified, it is assumed that it is installed on the Exchange server that is being indexed. You do not need to use Outlook Web Access, but if you do not, SharePoint Portal Server requires additional configuration to crawl the public folders.

• The Exchange site that the server belongs to.

• The Exchange organization that the server belongs to.

• An access account with Administrator privilege on the Organization

Implementation,” in Course 2095A, Implementing Microsoft® SharePoint

Lead-in: In this topic, we will explore how to add an Exchange 5.5 content source.

Using the Exchange Service Account

Although you can use the Exchange service account to crawl content, any account that has Administrator rights on the Organization container can be used It is not necessary to grant permissions on the Site, Site Configuration, or Server containers Exchange Administrator privileges are required because:

 Exchange 5.5 does not use Windows access control lists (ACLs) to secure content, so MSSearch must communicate with the Exchange 5.5 directory (dir.edb) at query time to filter out any results to which the user does not have access

 Crawling Exchange 5.5 uses MAPI calls that require Administrator privileges

Providing a Public Folder Path

When you add a content source, you are simply providing the path to the public folder. The path format reflects the hierarchy of the public folders and starts with exch://. Each folder name is separated by a slash mark (/).

For example, to crawl a folder called Company News, use the start address exch://ExchangeServer/Public Folders/All Public Folders/Company News, where ExchangeServer is the name of the Exchange server that is configured for Search and the name of the public folder tree is All Public Folders. To crawl all public folders, the path must end with All Public Folders/ (note the trailing slash mark).
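The path rules above can be sketched as a small helper. This is only an illustration of the exch:// format described in this topic; the function name exch_start_address and its parameters are hypothetical and are not part of SharePoint Portal Server.

```python
def exch_start_address(server, folders=(), tree="All Public Folders"):
    """Build an exch:// start address for an Exchange 5.5 public folder.

    server  -- name of the Exchange server that is configured for Search
    folders -- folder names below the public folder tree, outermost first
    tree    -- name of the public folder tree

    Hypothetical helper: it only assembles the string format described in
    the text and does not contact any server.
    """
    for name in folders:
        if "/" in name:
            # The slash mark is the separator, so it cannot appear in a name here.
            raise ValueError(f"folder name contains a slash mark: {name!r}")
    base = f"exch://{server}/Public Folders/{tree}"
    if not folders:
        # Crawling all public folders requires the trailing slash mark.
        return base + "/"
    return base + "/" + "/".join(folders)


company_news = exch_start_address("ExchangeServer", ["Company News"])
all_folders = exch_start_address("ExchangeServer")
```

Here company_news reproduces the Company News example from the text, and all_folders is the form that ends with All Public Folders/.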

For Your Information

Setting up Site Server 3.0 Search to crawl Exchange 5.5 was very similar to setting up SharePoint Portal Server to crawl an Exchange 5.5 content source. However, Site Server required MSSearch to run in the context of the Exchange Administrator account. With SharePoint Portal Server, the service runs as the local system account and impersonates the Exchange account only when crawling and performing security validations on search results.

Page 25

Adding an Exchange 2000 Content Source

Index

Exchange Public Folders

 SharePoint Portal Server Indexes Any Items That Can be Read by the Access Account Provided in Exchange 2000


SharePoint Portal Server can crawl the content of both Exchange 5.5 (no service pack is required) and Exchange 2000 Server. For Exchange 5.5, SharePoint Portal Server crawls only public folder items. For Exchange 2000, it crawls any items that can be read by the access account provided. This means you can search for content in private mailboxes in Exchange 2000, such as a shared departmental mailbox.

Accessing Exchange Content

To access Exchange content that is returned in search results on the dashboard site, click the Web link, which retrieves and displays the content by using Outlook Web Access.

Indexing Office Attachments

On crawled messages, Exchange 2000 creates indexes of the following attachments:

 Office attachments. The metadata of an attachment is included in the index.

 Custom properties of Office attachments. Unlike in Site Server, the custom properties of an Office attachment are included in the index if they match SharePoint Portal Server properties, just as with documents inside a SharePoint Portal Server Web folder.

 Attachments that the Gatherer usually filters. For example, an htm file is included in the index. However, the search results for an attachment display the subject and author of the message.

For more information about installing and accessing Outlook Web Access, see the Exchange Server documentation.

Page 26

Crawling an Exchange 2000 Server

Crawling an Exchange 2000 server is essentially the same as crawling an external SharePoint Portal Server computer because the content for both servers is stored in the Web Storage System. The crawler uses HTTP and WebDAV to gather the content.
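To give a feel for what a WebDAV request over HTTP looks like, the following sketch builds (but does not send) a standard WebDAV PROPFIND request that asks a folder for the properties of its immediate children. The exact requests that the SharePoint Portal Server crawler issues are not described in this module, so treat this as an assumption-laden illustration of the protocol only; the server name exchange-server and the function name build_propfind are hypothetical.

```python
from urllib.request import Request

# Standard WebDAV request body asking for all properties of each resource.
PROPFIND_BODY = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<D:propfind xmlns:D="DAV:"><D:allprop/></D:propfind>'
).encode("utf-8")


def build_propfind(url, depth="1"):
    """Build a WebDAV PROPFIND request object without sending it.

    Depth "1" means the folder itself plus its immediate children.
    Illustration only: this is not the crawler's actual implementation.
    """
    return Request(
        url,
        data=PROPFIND_BODY,
        method="PROPFIND",
        headers={"Depth": depth, "Content-Type": 'text/xml; charset="utf-8"'},
    )


# Hypothetical server name; nothing is sent over the network here.
request = build_propfind("http://exchange-server/public/")
```

Sending such a request would additionally require credentials for an account that is allowed to read the content.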

Exchange 2000 Content Source Features

Unlike Exchange 5.5 content sources, Exchange 2000 content sources allow you to:

 Specify any domain user account for the content access account. Because Windows 2000 ACLs are used in Exchange 2000, administrative access to Exchange is not required. However, the account must have permission to read the content that will be crawled.

 Crawl multiple Exchange 2000 servers

 Crawl content outside public folders, such as user mailboxes in the private information store, if the content access account has the correct permissions

 Avoid specifying an administrator account in Microsoft Management Console (MMC) or Outlook 2000. MSSearch does not need to impersonate an administrator account to verify permissions to view search results because Windows 2000 ACLs are used.

 Crawl documents according to their content class. If the document’s content class matches a content class (document profile) in the SharePoint Portal Server schema, properties will be included in the index according to the SharePoint Portal Server schema definition, not the Exchange schema. If the content classes do not match, MSSearch uses the Base Document profile and crawls accordingly. Because MSSearch does not use the Exchange 2000 schema, custom Exchange properties in public folder items are not included in the index.
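The content class rule in the last item can be modeled roughly as a lookup with a fallback. The sketch below is a simplified model, not MSSearch code: the schema is represented as a plain dictionary, and every profile and property name other than Base Document is a made-up example.

```python
BASE_PROFILE = "Base Document"

# Hypothetical SharePoint Portal Server schema: profile name -> property names.
sample_schema = {
    "Base Document": {"Title", "Author"},
    "Expense Report": {"Title", "Author", "Amount"},
}


def select_profile(content_class, schema):
    """Use the matching document profile, or fall back to Base Document."""
    return content_class if content_class in schema else BASE_PROFILE


def indexed_properties(item_properties, content_class, schema):
    """Keep only the properties that the selected profile defines.

    Properties that exist only in the Exchange schema (custom Exchange
    properties) are dropped, mirroring the behavior described above.
    """
    allowed = schema[select_profile(content_class, schema)]
    return {k: v for k, v in item_properties.items() if k in allowed}


# An item whose content class has no matching profile in the schema.
sample_item = {"Title": "Q3 plan", "Author": "Kim", "ExchCustomField": "x"}
sample_result = indexed_properties(sample_item, "Unknown Class", sample_schema)
```

In this model the unmatched item falls back to Base Document, so only Title and Author reach the index.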

Determining a Public Folder Path

A typical path that you specify for Exchange 2000 public folder content in the Content Source Wizard uses the following URL format:

http://exchange_server_name/public/public_folder_tree_name/folder name

The default content access account is used to access Exchange 2000 servers. If you want to specify different access accounts for separate Exchange 2000 content sources, use site path rules to configure them. To access private information store content, such as a user’s mailbox, use the following URL format:

http://exchange_server_name/exchange/mailbox_alias

If you do specify a separate account in a site path rule, you must create a separate rule for any redirected URLs for folders that are replicated to another server. For example, if you are crawling http://serverA/public/folderA and it is redirected to http://serverB/public, you must create an additional site path rule for http://serverB.
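The two URL formats and the redirect rule can be sketched as follows. The helper names are hypothetical, the functions only assemble strings in the formats shown in this topic, and names are inserted exactly as given (whether the Content Source Wizard expects spaces to be encoded is not stated here, so no encoding is applied).

```python
from urllib.parse import urlsplit


def public_folder_url(server, tree, *folders):
    """URL for Exchange 2000 public folder content, in the format above."""
    return "/".join([f"http://{server}/public", tree, *folders])


def mailbox_url(server, mailbox_alias):
    """URL for private information store content such as a user's mailbox."""
    return f"http://{server}/exchange/{mailbox_alias}"


def site_path_rule_roots(start_url, redirected_urls):
    """List the server roots that each need their own site path rule.

    The start address needs a rule, and so does every server that a
    replicated folder redirects to. Duplicates are removed, order is kept.
    """
    roots = []
    for url in (start_url, *redirected_urls):
        root = f"http://{urlsplit(url).netloc}"
        if root not in roots:
            roots.append(root)
    return roots


rule_roots = site_path_rule_roots(
    "http://serverA/public/folderA", ["http://serverB/public"]
)
```

For the example in the text, rule_roots contains http://serverA and http://serverB, which matches the requirement for an additional rule for serverB.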

Page 27

Adding a Lotus Notes Content Source

Special Planning and Configuration

 Install an R5 or later Lotus Notes client

 Configure the Lotus Notes client with a Lotus Notes ID

 Configure Lotus Notes to Windows NT user mapping

 Run the Lotus Notes Indexing Setup Wizard

 Create a document profile to map properties

Index Lotus Notes


Adding a Lotus Notes content source requires special planning and configuration prior to running the Add Content Source Wizard. You must perform the following tasks before you add a Lotus Notes content source:

 Install an R5 or later Lotus Notes client on the SharePoint Portal Server computer. (You can, however, crawl either 4.6a or R5 servers.)

 Configure the Lotus Notes client with a Lotus Notes ID that has reader access to the databases that you wish to crawl, and ensure that you can connect to the server from the SharePoint Portal Server computer that is functioning as a Lotus Notes client

 Configure Lotus Notes to Microsoft Windows NT® user mapping along with a special Lotus Notes view if you want to provide secure access to Lotus Notes databases

 Run the Lotus Notes Indexing Setup Wizard after it is installed on your SharePoint Portal Server computer

 Create a document profile to map properties if needed

Displaying Property Types

The protocol handler provides Number, Date, and Text property types and resolves numeric and string types. When the user creates a content source for Lotus Notes and maps SharePoint Portal Server properties to Lotus Notes properties, the property type for each Lotus Notes property is displayed, while the property type for SharePoint Portal Server properties is not displayed.

If the user maps a number to a string or vice versa, the user does not receive any feedback from the user interface that an error has occurred.

SharePoint Portal Server crawls the Lotus Notes database and creates an index of the content according to the Lotus Notes property types. Queries use the SharePoint Portal Server property types, and if the type has been mismatched, no results are returned.
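Because the user interface gives no warning, it can help to review a planned mapping by hand before you create the content source. The following sketch shows one way to do that check outside the product. It is a hypothetical helper, not a SharePoint Portal Server feature, and the property names in the sample are invented for illustration.

```python
def find_type_mismatches(mappings):
    """Return the property pairs whose types differ.

    mappings -- iterable of tuples:
        (sps_property, sps_type, notes_property, notes_type)
    where each type is one of "Number", "Date", or "Text".
    """
    mismatches = []
    for sps_name, sps_type, notes_name, notes_type in mappings:
        if sps_type != notes_type:
            mismatches.append((sps_name, notes_name))
    return mismatches


# Hypothetical mapping plan: the second row maps a Number to a Text property.
sample_mappings = [
    ("Title", "Text", "Subject", "Text"),
    ("Cost", "Number", "CostText", "Text"),
    ("Created", "Date", "DateComposed", "Date"),
]
sample_mismatches = find_type_mismatches(sample_mappings)
```

Any pair reported in sample_mismatches is a mapping for which queries on that property would be expected to return no results.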

Topic Objective

To describe how to add a Lotus Notes content source.

Lead-in

In this topic, we will explore how to add a Lotus Notes content source.


Date posted: 10/12/2013, 16:15
