
Module 7: Server Cluster Maintenance and Troubleshooting (ppt document)


DOCUMENT INFORMATION

Basic information

Title: Server Cluster Maintenance and Troubleshooting
Authors: April Andrien, Priscilla Johnston, Diana Jahrling, Jack Creasey, Jeff Johnson, James Cochran, Lorrin Smith-Bates, Andrea Heuston, Lynette Skinner, Elizabeth Reese, Bill Jones, Miracle Davis, Julie Challenger, Irene Barnett, Eric Wagoner, Eric R. Myers, Robertson Lee, David Mahlmann, Scott Serna, Rick Terek, John Williams, Laura King, Kathy Hershey, Bo Galford, Sid Benavente, Ken Rosen, David Bramble, Julie Truax, Dean Murray, Robert Stewart
Advisors: Don Thompson, Greg Bulette
Institution: Microsoft Corporation
Year published: 2000
City: Redmond
Pages: 36
File size: 887.24 KB


Contents

Overview

Troubleshooting Cluster Service

Review

Module 7: Server Cluster Maintenance and Troubleshooting


to represent any real individual, company, product, or event, unless otherwise noted. Complying with all applicable copyright laws is the responsibility of the user. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without the express written permission of Microsoft Corporation. If, however, your only means of access is electronic, permission to print one copy is hereby granted.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

© 2000 Microsoft Corporation. All rights reserved.

Microsoft, Active Directory, BackOffice, JScript, PowerPoint, Visual Basic, Visual Studio, Win32, Windows, and Windows NT are either registered trademarks or trademarks of Microsoft Corporation in the U.S.A. and/or other countries.

Other product and company names mentioned herein may be the trademarks of their respective owners.

Program Manager: Don Thompson

Product Manager: Greg Bulette

Instructional Designers: April Andrien, Priscilla Johnston, Diana Jahrling

Subject Matter Experts: Jack Creasey, Jeff Johnson

Technical Contributor: James Cochran

Classroom Automation: Lorrin Smith-Bates

Graphic Designer: Andrea Heuston (Artitudes Layout & Design)

Editing Manager: Lynette Skinner

Editor: Elizabeth Reese

Copy Editor: Bill Jones (S&T Consulting)

Production Manager: Miracle Davis

Build Manager: Julie Challenger

Print Production: Irene Barnett (S&T Consulting)

CD Production: Eric Wagoner

Test Manager: Eric R. Myers

Test Lead: Robertson Lee (Volt Technical)

Creative Director: David Mahlmann

Media Consultation: Scott Serna

Illustration: Andrea Heuston (Artitudes Layout & Design)

Localization Manager: Rick Terek

Operations Coordinator: John Williams

Manufacturing Support: Laura King; Kathy Hershey

Lead Product Manager, Release Management: Bo Galford

Lead Technology Manager: Sid Benavente

Lead Product Manager, Content Development: Ken Rosen

Group Manager, Courseware Infrastructure: David Bramble

Group Product Manager, Content Development: Julie Truax

Director, Training & Certification Courseware Development: Dean Murray

General Manager: Robert Stewart


Instructor Notes

This module is intended to prepare students to successfully back up and restore a server cluster. Students need to know how to use the tools that are available for troubleshooting server cluster problems. The module covers common Cluster service problems and possible resolutions.

After completing this module, you will be able to:

• Perform the steps to successfully back up a server cluster.

• Perform the steps to successfully restore a server cluster.

• Evict a node from a server cluster.

• Identify the tools that are necessary to troubleshoot a cluster failure.

• Interpret the entries in the cluster log.

• Identify and troubleshoot common server cluster failures: network communications, small computer system interface (SCSI) configuration problems, and group, resource, and quorum failures.

Materials and Preparation

This section provides the materials and preparation tasks that you need to teach this module.

Required Materials

To teach this module, you need the Microsoft® PowerPoint® file 2087A_02.ppt.

Preparation Tasks

To prepare for this module, you should:

• Read the materials for this module and anticipate questions that students may ask.

• Read Q224075, Q257892, Q248998, Q172951, Q266274, Q234767, Q193890, Q245762, and “Interpreting MSCS Cluster Log,” on the Student compact disk.

• Be familiar with the Resource Kit utilities.

• Practice the labs.

• Study the review questions and prepare alternative answers for discussion.

Presentation: 45 minutes

Lab: 15 minutes


Module Strategy

Use the following strategy to present this module:

Because backing up the cluster is a key maintenance task, the first section begins with information on how to back up the cluster configuration files. The following pages cover the complete procedure for restoring an entire cluster in case of catastrophic failure. You can also use each of the topics as a separate procedure for performing a specific task.

The troubleshooting section lists the tools that are available for troubleshooting Cluster service and gives common problems and suggested resolutions.

Cluster Maintenance: Cluster service is self-tuning and requires no maintenance other than daily backups.

• Backup: Backing up the system state backs up the cluster configuration files; however, you also need to back up each node’s data and operating system, as well as the cluster disks.

• Restoring the First Node: The overall procedure for restoring a cluster is outlined on this page. The first step, restoring the operating system on the first node, is also covered. The remaining steps are covered in detail on the following pages.

• Restoring Cluster Disks: Cluster service uses the disk signature to identify the cluster disk. To replace this disk, you must write the disk signature of the old disk onto the new disk.

• Restoring the Second Node: Restoring the remaining nodes of the cluster is similar to restoring the first node, except that after a node is restored, you need to test the failover capabilities of the cluster before putting the cluster back into the production environment.

• Evicting a Node: Evicting a node is a manual process that you perform through Cluster Administrator. As always, it is important to have a good backup of the server before you evict the node.


Troubleshooting Cluster Service: The key point of this section is to give the students tools and techniques that reduce the time it takes to find the root cause of common Cluster service problems.

• Troubleshooting Tools: The tools that are used to help troubleshoot a problem with Cluster service are the same tools that are used to help troubleshoot a server running Microsoft Windows® 2000.

• Examining the Cluster Log: Cluster service logs every configuration change and problem to the cluster log. It is important for the students to become familiar with the syntax of the log.

• Troubleshooting Network Communications: Students need to know that there are different troubleshooting paths to follow, depending on whether the network problem is a node-to-node or a client-to-node problem.

• SCSI Configuration Problems: SCSI is less reliable than Fibre Channel. Problems can occur with the SCSI controller, SCSI termination, and SCSI cabling.

• Group and Resource Failures: Remind students to keep dependency trees vertical so that, if a resource fails, it is easier to determine which resource is causing the group to fail.

• Quorum Log Corruption: If Cluster service cannot write information to the quorum log, it will not start. You can attempt to reset the quorum log, or you can delete the quorum log and let Cluster service create a new log.


Instructor Setup for a Lab

Lab Strategy

This lab is designed to prepare the students to use Backup and Clusrest.exe to perform the proper backup and restore procedures. Students will uninstall Cluster service in preparation for the Network Load Balancing (NLB) portion of the course, because NLB and Cluster service cannot run on the same computer.

Lab A: Cluster Maintenance

To conduct this lab:

• Read through the lab carefully, paying close attention to the instructions and details.

• Students will need the Clusrest utility from c:\moc\2087\labfiles\mscs

• Students work in teams of two, grouped together by their shared bus.

• Help the students determine whether they are Node A or Node B. In these exercises, each node performs a specific task in the backup and restoration procedures. Both nodes will uninstall Cluster service.


Overview

***************************** ILLEGAL FOR NON - TRAINER USE ******************************

Server cluster maintenance and troubleshooting are considered two separate disciplines. Maintenance is continuous, whereas troubleshooting has a beginning, when the problem is discovered, and an end, when the problem is resolved. The two disciplines are complementary, however: when every troubleshooting procedure that you follow fails, you will need to rebuild the cluster from a backup tape that was generated during a maintenance procedure. After completing this module, you will be able to:

• Perform the steps to successfully back up a server cluster.

• Perform the steps to successfully restore a server cluster.

• Evict a node from a server cluster.

• Identify the tools that are necessary to troubleshoot a cluster failure.

• Interpret the entries in the cluster log.

• Identify and troubleshoot common server cluster failures: network communications, small computer system interface (SCSI) configuration problems, and group, resource, and quorum failures.

In this module, we will cover cluster maintenance, in the form of backing up and restoring a cluster, and troubleshooting Cluster service.


Cluster Maintenance


Cluster service uses the self-tuning features of Microsoft® Windows® 2000 and requires very little maintenance. The only day-to-day maintenance operation that you need to perform is backing up the cluster.

Under special circumstances, a node in the cluster may need to be replaced, for example, when your organization decides to perform a hardware upgrade. In this situation, you need to evict the node from the cluster and then add the upgraded node to the cluster.

Topic Objective

To introduce the fundamental tasks for maintaining a server cluster.


Backup


Backing up the cluster is no different from backing up Microsoft Windows 2000 Advanced Server. It is recommended that you perform regular backups by using the Windows 2000 Backup program (NTBackup) or another compatible backup program. You still need additional backup agents to back up applications running on the cluster, such as Microsoft SQL Server™ and Microsoft Exchange.

A cluster-aware backup program should be able to perform the same backup operations as NTBackup, especially with regard to backing up the System State and the cluster configuration database.

Backing Up the System State

The configuration information for the cluster is located in the registry on each node (HKEY_LOCAL_MACHINE\Cluster). The Backup tool that is included with Windows 2000 backs up the cluster database when you back up each node’s system state.

NTBackup backs up the system state on each node. The system state includes:

• The quorum log

• The local registry

• The Cluster registry hive

Topic Objective

To describe how to back up the system state, node, and cluster disks.

Lead-in

A backup of the cluster includes the system state, the node, and the cluster disk.


Backing Up the Local Disk

Follow standard computer backup procedures to back up the operating system and the data on the local drives. You must also back up key cluster files on the local disks.

• On each node, back up the cluster database files:

Backing Up the Cluster Disks

It is critical to back up the cluster files on the quorum disk and the data on the cluster disks, because Cluster service writes information to files in the \MSCS directory on the quorum disk, and cluster-aware applications will likely place data on the cluster disks. Because either node of the cluster could own the cluster disk resource at any time, each node could back up the data on the drive. However, having each node back up data would require you to install backup hardware and software on each cluster node, which is not the best solution.

One possibility is to identify a nonclustered server running Windows 2000 Server and schedule it to back up data remotely through a network connection to the cluster disk’s administrative share or to a hidden share that you create. For example, you might create FBackup$, GBackup$, HBackup$, and WBackup$ file share resources on the virtual server for the roots of drives F, G, H, and W, where F, G, and H are cluster disks with data and W is the drive letter for the quorum disk. Hidden shares do not appear in a browse list, and you can configure them to allow access only to members of the Backup Operators group.
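The naming scheme for the hidden backup shares can be sketched as follows. This is a minimal illustration, not a Microsoft tool. The virtual server name CLUSTERVS is hypothetical, and the functions only build share names and UNC paths. They do not create shares or run a backup.

```python
# Minimal sketch of the hidden backup share naming scheme described above.
# Hypothetical values: the virtual server name "CLUSTERVS" is made up; the
# drive letters follow the text (F, G, H = data disks, W = quorum disk).

def normalize_drive(drive: str) -> str:
    """Turn 'f', 'F:' or ' F ' into 'F'; reject anything that is not one letter."""
    letter = drive.strip().rstrip(":").upper()
    if len(letter) != 1 or not letter.isalpha():
        raise ValueError(f"not a drive letter: {drive!r}")
    return letter

def backup_share_name(drive: str) -> str:
    """'F' -> 'FBackup$'. The trailing $ keeps the share out of browse lists."""
    return normalize_drive(drive) + "Backup$"

def backup_targets(virtual_server: str, drives: list[str]) -> dict[str, str]:
    """Map each drive letter to the UNC path the nonclustered backup server would read."""
    return {
        normalize_drive(d): "\\\\" + virtual_server + "\\" + backup_share_name(d)
        for d in drives
    }

if __name__ == "__main__":
    for letter, unc in backup_targets("CLUSTERVS", ["F", "G", "H", "W"]).items():
        print(letter, unc)
```

The sketch addresses the shares through the virtual server name rather than a physical node name, so the backup job's paths stay valid whichever node currently owns the disks.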


Restoring the First Node

Steps For Restoring a Server Cluster:


The following sections describe the procedure for restoring a server cluster in the event that both nodes and the cluster disk fail. Any one of the components in the cluster could also fail independently; in that case, you follow the same procedure for restoring that specific component.

Performing a complete restore of a server cluster is a straightforward process:

1. Restore a node of the cluster.

2. Restore the cluster disks on the restored first node.

3. Restore the remaining node of the cluster.

4. Perform node testing.
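The four steps, together with the rule that a single failed component is restored with its own step of the same procedure, can be sketched as a small planning function. The component keys are hypothetical labels for this illustration. Running node testing after a partial restore is an assumption, not something the procedure states.

```python
# Minimal sketch: order restore steps for whichever components failed.
# The component keys ("first_node", "cluster_disks", "second_node") are
# hypothetical labels; the step text follows the four-step procedure above.
RESTORE_ORDER = [
    ("first_node", "Restore a node of the cluster"),
    ("cluster_disks", "Restore the cluster disks on the restored first node"),
    ("second_node", "Restore the remaining node of the cluster"),
]
FINAL_STEP = "Perform node testing"

def restore_plan(failed: set[str]) -> list[str]:
    """Return the restore steps for the failed components, in procedure order."""
    known = {key for key, _ in RESTORE_ORDER}
    unknown = failed - known
    if unknown:
        raise ValueError(f"unknown component(s): {sorted(unknown)}")
    steps = [step for key, step in RESTORE_ORDER if key in failed]
    if steps:
        # Assumption: test failover/failback after any restore, not only a full one.
        steps.append(FINAL_STEP)
    return steps
```

For example, if only the cluster disks failed, `restore_plan({"cluster_disks"})` returns the disk step followed by node testing. A complete failure returns all four steps in order.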

Topic Objective

To list the steps for restoring a server cluster and describe how to restore the first node.

Lead-in

In the event of a complete cluster failure, you first restore a node.

Delivery Tip

This page lists the four steps that are involved in restoring a complete cluster and covers the first step, Restoring a Node. Details about the other three steps follow on the next pages.


Restoring a Node of the Cluster

To restore a node in a server cluster, you follow the same procedure that you would use to restore a Windows 2000 operating system:

1. Install a fresh copy of Windows 2000 Advanced Server on the node to be restored.

2. Log on as Administrator and restore the system and boot partitions, the system state, and associated volumes from the backup. In the backup program, make sure that you select the option to restore the system state to its original location.

3. Restart the node.

4. Perform the steps for restoring the cluster disks, which are described in the next section.

Note

The difference between the time of the backup and the time of the restoration to the new computer may affect the computer account on the domain controller. You may have to join a workgroup and then rejoin the domain.


Restoring Cluster Disks


After you have restored a node in the cluster, you must restore the cluster disks. Restoring the cluster disks involves restoring the disk signature that the cluster uses to identify each disk. You may also need to replace a cluster disk if you are running out of disk space or if a disk is about to fail.

Mistakes made while replacing a cluster disk can be costly: the consequence can be the irrecoverable loss of all of the data on that disk. If the disk is the quorum disk, the server cluster's configuration data is also at risk.

Before restoring the cluster disks, stop Cluster service on all of the nodes of the cluster. Stopping Cluster service ensures that it does not attempt to start and place a lock on the disks.

Restoring Disk Signature Files

Because Cluster service relies on disk signatures to identify and mount volumes, if a disk is replaced or the bus is re-enumerated, Cluster service will not find the disk signatures that it expects and will not function. You can run Dumpcfg.exe to extract the disk signature from the registry and write it to the new disk. Cluster service will then recognize the new disk and successfully start the resource.

Dumpcfg.exe is a Resource Kit utility that writes an old disk signature to a new disk.
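The failure described above is a mismatch between the signatures the cluster expects and the signatures actually present on the disks. A minimal sketch of that comparison follows. The resource names and signature values are hypothetical sample data, and the sketch does not read the registry or any disk. In practice the expected values come from the cluster configuration and the present values from the disks themselves.

```python
# Minimal sketch: find cluster disk resources whose expected signature is not
# on any disk, and disks whose signature the cluster does not know about.
# All names and signature values used with it are hypothetical sample data.

def signature_mismatches(
    expected: dict[str, int],  # cluster disk resource name -> expected signature
    on_disk: dict[int, int],   # physical disk number -> signature currently on it
) -> tuple[list[str], list[int]]:
    present = set(on_disk.values())
    known = set(expected.values())
    # Resources that Cluster service would fail to bring online.
    missing = sorted(name for name, sig in expected.items() if sig not in present)
    # Disks (for example, a replacement disk) carrying a signature the cluster does not expect.
    unknown = sorted(disk for disk, sig in on_disk.items() if sig not in known)
    return missing, unknown

expected = {"Disk F:": 0x1A2B3C4D, "Disk W:": 0x5E6F7A8B}
on_disk = {1: 0x1A2B3C4D, 2: 0x0BADF00D}  # disk 2 was replaced
print(signature_mismatches(expected, on_disk))
```

Here the replaced disk shows up as unknown and its resource as missing. That is the situation in which the text uses Dumpcfg.exe to write the old signature onto the new disk. The sketch only detects the mismatch. It does not repair it.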

If the disk that you are replacing is the quorum disk, use Cluster Administrator to move the quorum to a different disk, and then replace the disk. After the new disk is brought online, you can move the quorum back to it.

Topic Objective

To describe how to restore the cluster disk by restoring signature files, data, and cluster configuration files.

Lead-in

Restoring a cluster disk involves restoring the disk

Cluster,” found on the

Student compact disk

Note


Restoring the Data on the Cluster Disk

Restoring the data on the cluster disk is the same as restoring a local disk. Before restoring the data, make sure that you have assigned each cluster disk the same drive letter that it had before the disaster or failure. When restoring, make sure that you restore the data to its original location, and verify its integrity after the restore is complete.

Restoring the Cluster Configuration Files

The cluster configuration files include the cluster database and the quorum log. The cluster database contains the configuration data (cluster objects and their settings) for the cluster. This database is the product of the cluster registry key checkpoint and the changes that are recorded in the quorum log. Each node of the cluster maintains a local copy of this database in its local registry, in the Cluster hive.

After you have restored the disk signatures and data, you can start the server cluster. If the cluster files were not restored, or were corrupted, the following procedure can restore the cluster database from the registry of the restored node. Identify the node on which you will restore the database (in a disaster restore, this is the first node that you restored). Restore the cluster database on the selected node by restoring the system state. Restoring the system state creates a temporary folder called Cluster_backup under the %Systemroot%\Cluster folder.

You use NTBackup to restore the cluster configuration files, which places them on the node. You then restore the cluster database to the node’s registry by using the Clusrest.exe tool. Clusrest.exe restores both the quorum log file (Quorum.log) and the cluster database (Clusdb).

Note

The Clusrest.exe tool is available in the Windows 2000 Resource Kit. This tool is a free download from www.microsoft.com.


Restoring the Second Node


After you complete the process of restoring a node of a cluster, and Cluster service has started successfully on the newly restored node, you can start the restore process on the other node of the cluster.

Restoring the Remaining Node(s) of the Cluster

Restoring the second node of a cluster follows the same procedure as restoring the first node, except that you do not have to restore the cluster disks.

Performing Node Testing

Testing the failover and failback policies is recommended before putting the cluster back into production:

1. Verify that the disk and cluster resources are available on the correct node.

2. Fail over each group and resource to verify that they can start successfully on the other node of the cluster.

3. Test the failback policy of each resource by allowing the resource to fail back to its preferred owner after the node has come back online.
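To make the tests above concrete, it helps to write down the owner each group should have after failover and after failback, and then compare that with what Cluster Administrator shows. The following sketch assumes a two-node cluster. The group names, node names, and "observed" values are hypothetical and would be typed in by hand. No cluster API is called.

```python
# Minimal sketch, assuming a two-node cluster: compute the expected owner of
# each group after a failover test and after a failback test, then compare
# with owners observed in Cluster Administrator (entered by hand).
from dataclasses import dataclass

@dataclass(frozen=True)
class GroupPolicy:
    name: str
    preferred_owner: str
    allow_failback: bool

def expected_after_failover(groups: list[GroupPolicy], surviving_node: str) -> dict[str, str]:
    # With the other node down, every group must be online on the surviving node.
    return {g.name: surviving_node for g in groups}

def expected_after_failback(groups: list[GroupPolicy], returning_node: str,
                            surviving_node: str) -> dict[str, str]:
    # Groups that prefer the returning node and allow failback move back to it;
    # every other group stays where failover put it.
    return {
        g.name: returning_node if (g.allow_failback and g.preferred_owner == returning_node)
        else surviving_node
        for g in groups
    }

def mismatches(observed: dict[str, str], expected: dict[str, str]) -> list[str]:
    """Names of groups whose observed owner differs from the expected owner."""
    return sorted(name for name, node in expected.items() if observed.get(name) != node)
```

A group with failback disabled is expected to stay on the surviving node after the original node returns. Any group listed by `mismatches` points to a policy setting worth checking before the cluster returns to production.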

Topic Objective

To describe how to restore the second or remaining nodes of a cluster and test the failover and failback policies.

Lead-in

The last step in restoring the cluster is to restore the second node and then test the components of the cluster.


Evicting a Node

Steps for Evicting a Node


If you need to change a node of a cluster, for example, to add a more powerful server, you need to logically remove the node before physically removing it from the cluster. After you have configured the new server with the shared bus and the public and private networks, you can run the Cluster Installation Wizard.

To remove a node from a cluster, right-click the node in Cluster Administrator to open the menu that contains the Stop Cluster and Evict Node options.

To evict a node:

1. Back up both nodes.

2. Verify the backup.

3. Move all of the groups to the remaining node.

4. Stop Cluster service on the node that is to be removed.

5. Evict the node.

6. Unplug the server from the shared bus. (If the shared bus is a SCSI bus, be careful about termination.)

If a new server is to join the cluster later, run the Cluster Installation Wizard and select Join a Cluster.
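Steps 1 through 4 above are preconditions for step 5, so they can be expressed as a checklist that must be empty before you click Evict Node. The following is a minimal sketch. `ClusterState` is a hypothetical record that you fill in by hand from your backup logs and Cluster Administrator. No real cluster API is used.

```python
# Minimal sketch: list what still blocks evicting a node, following steps 1-4
# of the eviction procedure. ClusterState is a hypothetical, hand-filled record.
from dataclasses import dataclass, field

@dataclass
class ClusterState:
    backed_up_nodes: set[str] = field(default_factory=set)        # step 1
    backup_verified: bool = False                                   # step 2
    group_owners: dict[str, str] = field(default_factory=dict)     # group -> owning node (step 3)
    service_running: dict[str, bool] = field(default_factory=dict) # node -> Cluster service running (step 4)

def eviction_blockers(state: ClusterState, node: str, all_nodes: list[str]) -> list[str]:
    blockers = []
    missing = sorted(n for n in all_nodes if n not in state.backed_up_nodes)
    if missing:
        blockers.append("back up node(s): " + ", ".join(missing))
    if not state.backup_verified:
        blockers.append("verify the backup")
    still_owned = sorted(g for g, owner in state.group_owners.items() if owner == node)
    if still_owned:
        blockers.append(f"move group(s) off {node}: " + ", ".join(still_owned))
    # An unknown service state counts as running, to be on the safe side.
    if state.service_running.get(node, True):
        blockers.append(f"stop Cluster service on {node}")
    return blockers
```

An empty list means steps 1 through 4 are complete and the node can be evicted. Treating an unknown service state as "running" is a deliberately conservative choice in this sketch.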

Topic Objective

To describe how to evict a node from a cluster.

Lead-in

You must first evict a node from the cluster to add a new node to the cluster.


Troubleshooting Cluster Service


Troubleshooting a problem with Cluster service can be more complex than troubleshooting a single server because of the virtual servers and the need for intracluster communication. Virtual servers change ownership from one node to another, which may cause network connectivity problems. Applications running on the cluster are difficult to troubleshoot because they run on a virtual server instead of a physical server. You could also have a node-to-node communication problem, because servers usually work independently of each other rather than together. You might also experience hardware problems with the shared bus and the cluster disk resources.

The most common failures are due to improper configuration of groups and resources. Cluster service will fail if the quorum log becomes corrupt, so it is important to know how to repair the quorum log to restart the cluster.

You use the same tools to identify problems on the cluster as you would use to identify problems on a physical server. The best resource for troubleshooting is the cluster log, because Cluster service records the activity of each node in it. This log can help you identify problems on the node or in the cluster.

This section provides an overview of the tools that are available for

Cluster Service Startup Issues” on the Student compact disk


When troubleshooting Cluster service, you can use the same tools and methodologies that you would use when troubleshooting Windows 2000 Advanced Server.

Cluster service writes logging information to the system log of every node in the cluster. Cluster service also writes a more detailed log of cluster activity to the cluster log on each node. Use these two sources to gather information when you begin troubleshooting a problem. They will help you determine whether the problem is related to the network, to services or applications, or to physical components in the cluster.

Use Event Viewer to filter the system log on the event source ClusSvc. You can view general events, such as those reporting that Microsoft Cluster service failed to join the cluster on this node or that Microsoft Cluster service successfully created a cluster on this node.
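The Event Viewer filter described above keeps only records whose source is ClusSvc. Because every node writes its own system log, the results must also be read per node. The sketch below assumes the records have already been exported from each node into a simple hypothetical `EventRecord` structure. It does not call any Windows event log API.

```python
# Minimal sketch of the ClusSvc filter, assuming system log records from each
# node were exported into the hypothetical EventRecord structure below.
from dataclasses import dataclass

@dataclass(frozen=True)
class EventRecord:
    node: str        # computer whose system log the record came from
    source: str      # event source, e.g. "ClusSvc"
    event_type: str  # e.g. "Error", "Warning", "Information"
    message: str

def cluster_events(records, source="ClusSvc", event_types=None):
    """Keep records from `source` (case-insensitive), optionally only the given event types."""
    wanted = source.lower()
    return [
        r for r in records
        if r.source.lower() == wanted
        and (event_types is None or r.event_type in event_types)
    ]

def by_node(records):
    """Group records per node, because each node's system log is checked separately."""
    grouped = {}
    for r in records:
        grouped.setdefault(r.node, []).append(r)
    return grouped
```

For example, `by_node(cluster_events(records, event_types={"Error"}))` gives the ClusSvc errors for each node. This is usually a good first view before you open the more detailed cluster log.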

After you have determined the type of problem, you can use the following tools to search for its source. You must check each node individually when using any of these tools.

• Disk Manager: Check Disk Manager to find out the health of the cluster disks. You can check whether the operating system recognizes the disks and whether the cluster disks are basic or dynamic. You also need to verify that the drive letters of the cluster disks are the same on both nodes.

• Task Manager: You can verify that Cluster service is running in Microsoft Windows 2000 Task Manager. You can also use Task Manager as a performance monitor, but it does not give you the level of detail that a dedicated performance monitoring tool provides. In Task Manager, you can check the CPU utilization percentage and the memory resources on the node.
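The three Disk Manager checks (disk recognized, basic or dynamic, same drive letter on every node) can be compared across nodes as sketched below. The per-node views are hypothetical data that you would copy from Disk Manager on each node, and the disk identifiers are made up. The sketch assumes that shared cluster disks must be basic disks. That is the case for the Windows 2000 Cluster service without third-party volume management software.

```python
# Minimal sketch: compare what Disk Manager shows on each node.
# views: node -> {disk id -> (drive letter, "basic" or "dynamic")}; hand-entered data.
# Assumption: shared cluster disks must be basic disks.

def disk_problems(views: dict[str, dict[str, tuple[str, str]]]) -> list[str]:
    problems = []
    nodes = sorted(views)
    all_disks = sorted({disk for view in views.values() for disk in view})
    for disk in all_disks:
        seen = {n: views[n][disk] for n in nodes if disk in views[n]}
        # Check 1: every node's operating system should recognize the disk.
        for n in nodes:
            if n not in seen:
                problems.append(f"{disk}: not recognized on {n}")
        # Check 2: the drive letter must be the same on all nodes.
        letters = {letter for letter, _ in seen.values()}
        if len(letters) > 1:
            problems.append(f"{disk}: drive letters differ ({', '.join(sorted(letters))})")
        # Check 3: flag dynamic disks.
        if any(kind.lower() == "dynamic" for _, kind in seen.values()):
            problems.append(f"{disk}: dynamic disk (cluster disks should be basic)")
    return problems
```

An empty result means the views agree on every disk. Each reported line names the disk and the check that failed, so you know which node to examine.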

Topic Objective

To describe the tools that are used for troubleshooting Cluster service problems.

Lead-in

The tools that you use for troubleshooting a cluster are the same tools that you use to troubleshoot a server.


Date posted: 18/01/2014, 05:20
