Novell Press: Novell Cluster Services for Linux and NetWare, May 2006, ISBN 0672328453

Master installing and managing Novell Cluster Services with the tutorial not available from anyone else, Novell Cluster Services for Linux and NetWare... We cover Novell Cluster Services


By Rob Bastiaansen, Sander van Vugt

Publisher: Novell Press Pub Date: May 09, 2006 Print ISBN-10: 0-672-32845-3 Print ISBN-13: 978-0-672-32845-9 Pages: 312

By Rob Bastiaansen and Sander van Vugt, two Novell Certified Instructors with day-to-day experience consulting on the topics covered in this book. Master installing and managing Novell Cluster Services with the tutorial not available from anyone else, Novell Cluster Services for Linux and NetWare.


Novell Cluster Services™ for Linux® and NetWare®

electronic, mechanical, photocopying, recording, or otherwise, without written permission from the publisher. No patent liability is assumed with respect to the use of the information contained herein. Although every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions. Nor is any liability assumed for damages resulting from the use of the information contained herein.

Novell is a registered trademark; Novell Press and the Novell Press logo are trademarks of Novell, Inc. in the United States and other countries. All brand names and product names used in this book are trade names, service marks, trademarks, or registered trademarks of their respective owners.

Warning and Disclaimer

Every effort has been made to make this book as complete and as accurate as possible, but no warranty or fitness is implied. The information provided is on an "as is" basis.

Special and Bulk Sales

Pearson offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales. For more

international@pearsoned.com

Novell Press is the exclusive publisher of trade computer technology books that have been authorized by Novell, Inc. Novell Press books are written and reviewed by the world's leading authorities on Novell and related technologies, and are edited, produced, and distributed by the Que/Sams Publishing group of Pearson Education, the worldwide leader in integrated education and computer technology publishing. For more information on Novell Press and Novell Press books, please go to www.novellpress.com.

Associate Publisher Mark Taber

Program Manager, Novell, Inc Darrin Vandenbos

Marketing Manager Doug Ingersoll

wrote and published his first book, Rob's Guide to Using VMware, which is in print in its second edition. In 2005, Rob published The NetWare Toolbox. He is a Master Certified Novell Instructor and holds all the major Novell certifications, including Certified Linux Engineer and Certified Linux Professional.

Sander van Vugt is an independent trainer and consultant, with a strong focus on Linux in combination with any Novell product available. As a technical trainer, Sander does a lot of work for Novell in the EMEA region where he lives. He specializes in the workings of Linux file systems and file access protocols and clustering services. Sander is also a technical author; he has written 26 books in the Dutch language, and this is his second book in English. Sander is also a Master Certified Novell Instructor and holds all major Novell and Linux certifications.

This book could not have been written without the help of many people at Novell. In particular we'd like to thank Robert Wipfel for the time he spent with us thinking about this book. Also we would like to thank Bhogilal Hirani and Brad Rupp for their input related to Chapter 12. Further, we would like to thank Kent Boogert and Richard Jones for their support in writing this book.

In this book we provide you with an overview of Novell's high-availability solutions. We cover Novell Cluster Services (NCS) for Open Enterprise Server for both NetWare and Linux. We focus on the latest release of this software, but most information from this book is applicable to older versions of NCS as well. Our goal with this book is to give you a full overview of the process of designing and implementing a cluster. We provide insight into additional technologies such as iSCSI, Linux Heartbeat clustering, and Business Continuity Clustering. We also describe how to install popular applications into your cluster environment.

In this book we have tried to go beyond the information provided in the online documentation. We not only describe how to configure NCS clusters, but also pay a good deal of attention to the "why" part of the configuration of a cluster because we would like you to understand what you are doing as well.

This book is written for two different audiences. Administrators new to administering NCS, or clustering at all, will find enough information to get themselves going with Novell Cluster Services. And administrators who do already have experience configuring Novell Cluster Services will find that lots of details are included, especially on how to configure NCS on Linux servers.

In Chapter 1, "Introduction to Clustering and High Availability," clustering is introduced. This chapter is aimed in particular at novice cluster administrators, although even experienced administrators may find it helpful. Read this chapter if you want to know more about when you would use a cluster solution and how such a solution should be set up. Chapter 2, "Examining Novell Cluster Services Architecture," goes into the working of Novell Cluster Services; in this chapter we describe how the different components of an NCS network interact to make a working cluster. Also some clustering fundamentals such as partitions are covered. This chapter is aimed mainly at those who want to know what's happening behind the scenes. If you are in a hurry and want to start as soon as possible, you can safely skip these two chapters.

Chapter 3, "Clustering Design," is an important one that everyone should read; it is about the design of a cluster. We've put all our field experience and some of the most important field experience of Novell technicians together in this chapter. We consider this one of the most important chapters in the book, and we do recommend that everyone read it, particularly because a proper understanding of how to design a cluster can prevent you from making many errors later.

Chapter 4, "Installation and Configuration," discusses installation of NCS. Read this chapter if you want to see more details on how to install NCS on NetWare, on Linux, or even in a VMware virtual environment. From this chapter, you also will learn how to add and remove nodes from the cluster. Chapter 5, "Creating Clustered Resources," gives an introduction to the configuration of clustered resources. Like Chapter 4, this chapter is particularly useful for people who have never worked with NCS before. Then in Chapter 6, "Cluster Management," you will find an overview of all the ways to administer an NCS

affordable solution to do just that. From the information in this chapter, you will learn how to configure NetWare as an iSCSI target and connect the iSCSI initiator software on Linux to that iSCSI target. We have chosen not to include any other information on shared storage solutions, the main reason being

"Cluster-Enabled Applications," can be considered the core of this book. In this chapter, which is one of the longest in the book, you can find information on how to configure our selection of applications in an NCS cluster. We discuss configuration of some of our favorites, such as iFolder and GroupWise, but some classics from the Linux environment are also covered in this chapter. Next, Chapter 9, "Advanced Clustering Topics, Maintenance, and Troubleshooting," discusses advanced technologies that can be applied to NCS environments. In this chapter, for example, you can read how to re-create the SBD partition when it fails and how to make a proper backup of your cluster. This chapter typically is meant for an administrator who already knows about the nuts and bolts of Novell Cluster Services and wants to upgrade his tuning and troubleshooting skills.

The last three chapters are optional in many cases. Chapter 10, "Upgrading Clusters," discusses how to upgrade an existing cluster to NCS. Chapter 11, "Using SUSE Linux Enterprise Server Clustering Options," tells you how to use the Heartbeat cluster solution in a SUSE Linux Enterprise Server environment. We can recommend this chapter because in the near future the NCS core will be upgraded to the open-source software from the Heartbeat project.

Finally, in Chapter 12, "Introduction to Business Continuity Clustering," you can read how to create a cluster of clusters using Novell's Business Continuity Services.

We put a lot of effort into writing this book and really hope you enjoy reading it! If you ever have any questions or remarks, you can reach us at the following email addresses:

Rob Bastiaansen: mail@robbastiaansen.nl

Sander van Vugt: mail@sandervanvugt.nl


Chapter 1 Introduction to Clustering and High Availability

In this first chapter we will introduce the general concept of clustering. The Novell Cluster Services (NCS) product will be put into perspective with regard to other high-availability solutions. The terminology around high availability and Novell Cluster Services will also be explained in detail. We will provide guidelines here about when to use cluster products and when not to use them. The hardware and software requirements for Novell Cluster Services will be covered at the end of this chapter.

This section introduces Novell's clustering software. If you are new to the product, you must certainly read on because if you miss the basics, it will be very hard to understand the chapters on design and installation that follow.

With Novell Cluster Services, 2 to 32 servers can be combined into one system, where every server runs its own tasks and can take over services from another server when that server fails. The simplest illustration of this is an example of two servers providing access to a data volume on an external storage device, such as a storage area network (SAN). Server one is providing access to the volume on a specific Internet Protocol (IP) address. If that server fails, for example because of a power failure, server two will detect that failure, mount the volume, and pick up the IP address that was used to access the service. Because the addressing remains the same, client computers can continue to work with their data. It depends on the end users' application whether it will have any problems with this failover. But in most cases everything just keeps working.
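The two-server example above can be sketched as a small simulation. This is only an illustrative model, not NCS code: the `Node` class, the `fail_over` function, the volume name `DATA`, and the address `192.0.2.10` are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of a cluster node (not an NCS data structure)."""
    name: str
    alive: bool = True
    mounted_volumes: set = field(default_factory=set)
    bound_ips: set = field(default_factory=set)


def fail_over(failed: Node, survivor: Node) -> None:
    """Move every volume and service IP address from a failed node to a survivor."""
    if failed.alive:
        raise ValueError("source node is still alive; nothing to fail over")
    # The survivor mounts the volumes and binds the same service addresses,
    # so clients keep using the address they already know.
    survivor.mounted_volumes |= failed.mounted_volumes
    survivor.bound_ips |= failed.bound_ips
    failed.mounted_volumes.clear()
    failed.bound_ips.clear()


server_one = Node("server1", mounted_volumes={"DATA"}, bound_ips={"192.0.2.10"})
server_two = Node("server2")

server_one.alive = False  # simulate the power failure
fail_over(server_one, server_two)
print(server_two.mounted_volumes, server_two.bound_ips)
```

The point of the sketch is that the service address travels with the volume, which is why clients can keep working after the move.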

In this scenario all servers in the cluster are actively running their own tasks; therefore, we call this type of cluster an Active-Active cluster, in comparison with an Active-Passive solution, in which one of the servers is not running any tasks but just waits for another server to fail. An example of the latter solution was Novell's Standby Server. It is also possible to look at the Active-Active terminology from a service perspective.

In the architecture and design chapters you will be able to read more about how this actually works, but we would like to give you a brief overview of Cluster Services components and the terminology of Cluster Services.

The most important component for your cluster is the disk that all your servers will share. That external medium contains the data that your applications work with, and it contains a small area that the servers will use for housekeeping. The shared disk can be anything from a simple shared disk on a Small Computer Systems Interface (SCSI) bus to a SAN or an iSCSI connection to a server that acts as a disk target. Each of these solutions is described in more detail later. We will first take a look at what the shared disk is used for. Figure 1.1 shows a typical cluster of OES (Open Enterprise Server) servers.

Figure 1.1 A typical OES cluster setup.


First of all, the disk will contain a small partition that we call the Split Brain Detector (SBD) partition. The SBD process is explained later in this section. In short, it is the referee that can

versions of the clustering software, the smallest entity to assign to a service and to fail over was a volume. But starting with NetWare 6, all Novell Storage Services (NSS) volumes became logical volumes that reside somewhere inside an NSS Pool. Therefore, you will always perform what we call a pool-based failover. An NSS Pool can be active on only one server at a time. Accessing the data and especially writing to it from two servers at the same time would corrupt the data in the pool.

Secondary IP Addresses

component you should know about. If your services were running on the primary IP address of a server and the server failed, the clients would lose access to that IP address and therefore also to the service. The other server would, of course, have its own private primary IP address.

To solve this issue, we use secondary IP addresses on the cluster. Every service that we want to fail over to another node will get its own IP address. The address must be one that fits into the subnet of the primary address of the server. And it must also fit into the subnet of the destination server. From this behavior follows the requirement that cluster nodes must always be inside the same subnet.
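The subnet requirement can be checked with a few lines of Python. This is a minimal sketch using the standard `ipaddress` module; the function name `fits_all_nodes` and the example addresses are assumptions made for illustration, not part of NCS.

```python
import ipaddress
from typing import Iterable


def fits_all_nodes(service_ip: str, node_interfaces: Iterable[str]) -> bool:
    """Return True if service_ip lies inside the subnet of every node's primary address."""
    address = ipaddress.ip_address(service_ip)
    return all(
        address in ipaddress.ip_interface(interface).network
        for interface in node_interfaces
    )


# Primary addresses of two hypothetical cluster nodes, in address/prefix notation.
nodes = ["192.0.2.11/24", "192.0.2.12/24"]

print(fits_all_nodes("192.0.2.50", nodes))     # True: usable as a secondary address
print(fits_all_nodes("198.51.100.50", nodes))  # False: outside the nodes' subnet
```

A secondary address that passes this check for every node can be bound on any of them, which is what makes the failover transparent to clients.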

Using these additional IP addresses also means that you should shift your focus from working with servers to working with services. You no longer access data on the volume of a server but on the volume of a virtual cluster server, either via its eDirectory object or via its IP address or domain name services (DNS) name.

Clustering Terminology

Throughout the book we will use the terminology also used by Novell in its documentation to discuss the clustering components. All these terms are defined here. The listing will also give you an overview of the building blocks for a cluster.

Node

This is what you normally would call a server in your environment. But as soon as it becomes part of a cluster, we use this term to identify a server. A cluster resource can be active on one node and it will fail over to other nodes when the

Master and Slave Nodes

In the entire cluster there will always be one node that is the master node. The other nodes are called slaves. You can identify the master node in the management interface because it has a yellow dot in its icon. The only significance of the master node is that it is the one that holds the IP address that is general for the cluster. Besides, in a split brain scenario with an even tie, which is described later in this chapter, the side where the master node resides wins. From a technical perspective this node is the one that sends out the cluster heartbeat packets to the other nodes as a broadcast, whereas the slave nodes respond to that heartbeat with a unicast response. The master node is the first node to start the Cluster Services software in a cluster. If that server fails, another node will become the master and will also keep the role of master node until it fails or leaves the cluster.

Heartbeat

All nodes in the cluster will communicate via the network to check whether all nodes are still alive. They do this by means of heartbeat packets. When a node stops responding to the heartbeat packets, there is a timeout to wait for the server to respond. If it does not respond after that period, the server will be sent a poison pill.
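The timeout logic described above can be sketched as follows. This is a simplified model rather than the NCS implementation: the function name `nodes_to_poison`, the timestamps, and the 8-second tolerance are illustrative assumptions.

```python
def nodes_to_poison(last_seen: dict, now: float, tolerance: float) -> list:
    """Return the nodes whose last heartbeat reply is older than the tolerance."""
    return sorted(node for node, seen in last_seen.items() if now - seen > tolerance)


# Time (in seconds) at which each node last answered a heartbeat packet.
last_reply = {"node1": 100.0, "node2": 99.5, "node3": 91.0}

# node3 has been silent for 9 seconds, longer than the assumed 8-second tolerance.
print(nodes_to_poison(last_reply, now=100.0, tolerance=8.0))
```

A node that merely answers late but within the tolerance is left alone; only a node that stays silent past the timeout is selected.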

Master IP Address

As soon as the first node comes up in the cluster, it will become the master node, and it will bind this IP address to the network

Cluster Container

All objects that belong to the cluster will be placed into this container. Also, the object itself will contain general configuration options for the cluster. Every node in the cluster must have access to this container in a local replica, so it is therefore common practice to split off this container as a partition and store a replica on each node.

Cluster Resource

This is an eDirectory object that defines an IP address, a load script, and an unload script for a service that runs in the cluster. It is possible that it also activates a cluster pool and its volumes to be used by the application. Because most applications require access to data, this is the most commonly used configuration. When an application uses data from volumes on the cluster-enabled pool, that application must also be loaded and unloaded from the scripts in this object and not from its own resource object. In Figure 1.2 you can see a general overview of a typical cluster with two nodes and a shared disk pool.

Figure 1.2 An overview of cluster objects for a basic cluster.

Every cluster resource and cluster volume contains a load script and an unload script. In these scripts you can use commands that you could also use on the server's command line or in NetWare Command File (NCF) files. The scripts will contain some default lines for the clustering software, but for the rest it is up to you to define the commands needed to start and stop the application. The cluster load script syntax is covered in depth in Chapter 5, "Creating Clustered Resources."

Cluster Virtual Server

For every pool that you enable in the cluster, you must specify a unique IP address that can be used to access the data but also a virtual server name. The CVSBIND ADD command in the load

unload script with the CVSBIND DEL command

Cluster Template

The resources described previously can be created via a template. These templates contain an example of how a cluster resource can be configured. The cluster container holds several default templates that can be used to create network services such as a DHCP server or an iFolder server.

Split Brain Detector

We have discussed before that a server works with a shared disk where a small partition is created for the Split Brain Detector process. This process is needed to prevent a scenario in which servers in a cluster can no longer see each other via the network and therefore would decide that they each are the only surviving node after a failure. This scenario would result in two clusters being alive with the same name and configuration trying to access the same data. If servers no longer see each other via the network, they will use the shared disk to

Failover

This process forms the heart of the entire clustering solution. If a failure on a node is detected, it is this process that will run the load script to activate the disk on another node and load the application. The process will take into consideration what the preferences are for the failover and will decide on what other node the application will be run.

Migration

The process of a failover can also be executed manually. This can be done, for example, if you want an application to run on another server to better balance the load of your environment. Or, for example, if you want to move all applications to other nodes because you want to upgrade a particular node. The difference from the failover process is that the cluster software will first run the unload script on the current node and then execute the load script on the new server.
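The difference between a failover and a migration comes down to which scripts run where. The sketch below models that ordering; the function `move_resource`, the resource name `GW_POOL`, and the node names are hypothetical and serve only to illustrate the description above.

```python
def move_resource(resource: str, source: str, target: str, source_failed: bool) -> list:
    """Return the ordered (node, action) script runs needed to move a resource."""
    steps = []
    if not source_failed:
        # Migration: the source node is healthy, so the resource is unloaded cleanly first.
        steps.append((source, f"unload script of {resource}"))
    # Failover and migration both end by running the load script on the target node.
    steps.append((target, f"load script of {resource}"))
    return steps


failover_steps = move_resource("GW_POOL", "node1", "node2", source_failed=True)
migration_steps = move_resource("GW_POOL", "node1", "node2", source_failed=False)
print(failover_steps)
print(migration_steps)
```

In a failover the source node is gone, so no unload script can run there; in a migration the unload script gives the application a chance to shut down cleanly before it starts elsewhere.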

Failback

After a failover, the resource can also be moved back to where it came from. In that case the unload script will execute and the load script will run on the destination node. This process can be activated automatically or manually. For the manual mode the difference with a manual migration is that using the failback feature will mark the resource as failed, and it will show that administrator intervention is needed to move the resource back to its original node. You might want to do that after business hours, or at least after investigating why a failover occurred. It

it could fail again, and thus it could start going back and forth between two nodes.

Epoch

When you're looking at your Cluster Services management software, this number shows you how many times the status of the cluster has changed. When the cluster starts, the epoch is zero, and every time a node joins or leaves the cluster, the epoch is increased by one.

Lives

This number also shows up in your management application. Every cluster resource or cluster volume starts with one life, and every time the resource is brought online or migrated to another node or does a failover to another node, the lives counter is increased by one.
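The epoch and lives counters can be modeled with a few lines of code. This sketch follows the description above literally (epoch starts at zero, each resource starts with one life); the `ClusterCounters` class and the resource name are assumptions for illustration and do not reflect how NCS stores these values.

```python
class ClusterCounters:
    """Hypothetical bookkeeping for the epoch and lives numbers."""

    def __init__(self):
        self.epoch = 0   # the epoch is zero when the cluster starts
        self.lives = {}  # lives counter per resource name

    def membership_changed(self):
        """Call whenever a node joins or leaves the cluster."""
        self.epoch += 1

    def add_resource(self, name):
        """Every resource starts with one life."""
        self.lives[name] = 1

    def resource_event(self, name):
        """Call when a resource is brought online, migrated, or failed over."""
        self.lives[name] += 1


counters = ClusterCounters()
for _ in range(3):             # three nodes join
    counters.membership_changed()
counters.membership_changed()  # one node leaves

counters.add_resource("DATA_POOL")
counters.resource_event("DATA_POOL")  # brought online
counters.resource_event("DATA_POOL")  # migrated to another node
print(counters.epoch, counters.lives["DATA_POOL"])
```

An unexpectedly high value for either counter is a hint that the cluster membership or a resource has been less stable than expected.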

Membership Quorum and Timeout

Starting the cluster software when the entire cluster has been down will start with one node, followed by the other nodes. It would not be a good idea if that first node would start all the cluster resources on its own; the server would probably become too busy. That is the reason why there is a quorum defined in the cluster container object. When the number of nodes defined there is online, the nodes will start loading cluster resources. There is also a timeout value that defines after what period of time the server will start loading resources anyhow. This setting can be overruled on every single resource if you want that.
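The quorum and timeout rule can be expressed as a single condition. This is a minimal sketch; the function name `may_load_resources` and the example values (a quorum of 3 nodes and a 60-second timeout) are illustrative assumptions.

```python
def may_load_resources(nodes_online: int, quorum: int,
                       seconds_waited: float, timeout: float) -> bool:
    """Resources load once the quorum is reached or the quorum timeout has expired."""
    return nodes_online >= quorum or seconds_waited >= timeout


# One node up after 10 seconds: keep waiting for more nodes.
print(may_load_resources(1, quorum=3, seconds_waited=10, timeout=60))
# Three nodes up: quorum reached, resources may load.
print(may_load_resources(3, quorum=3, seconds_waited=10, timeout=60))
# Still one node, but the timeout has expired: load anyhow.
print(may_load_resources(1, quorum=3, seconds_waited=60, timeout=60))
```

The timeout exists so that a cluster in which some nodes never come back still starts its services eventually.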


In the preceding section we introduced you to Novell Cluster Services. While talking about that product, we have not covered the entire high-availability portfolio of Novell. We will cover the available solutions in this chapter and we will also give you a sneak peek into the future of high-availability products.

Before we go into the details of all these products, we would like to explain the difference between the types of clusters that are available. This one term, "cluster," is used for different types of clusters and they are not all the same. The cluster that we talk about in this book is what is generally called a high-availability cluster, or a failover cluster. If one node fails, another node will take over its job. And in a healthy cluster environment all nodes are running their own individual services.

The other type of cluster comes in different names: high-performance clusters, load-balancing clusters, computational clusters, datacenter clusters, and so forth. What is different for these clusters compared to the ones we write about in this book is that all servers in the cluster work together to deliver the same service and with that share the load of running the service. The entire collection of machines works together to run the application. If a node fails, there is no such thing as a failover. The service remains running on the nodes that are still alive, and there is absolutely no interruption for the application. An example of such a cluster is the PolyServe Matrix Server, which we will discuss later in this section.

So if these clusters are so different and there is no failover or interruption to the end user, why do we not run all our applications on such a cluster? The answer comes in two parts. Part one is that the software we generally run on Open Enterprise Server does not run on the operating systems that are used to build these types of clusters. They do not run GroupWise Post Office NetWare Loadable Modules (NLMs), for

administrators

Having said all that, let's take a look at the clustering solutions available today.

Novell Cluster Services

This solution has already been discussed earlier in this chapter, but we would like to look at a few of the benefits of this solution

when NCS was first shipped in 1999. A SAN really was a complex and expensive piece of equipment, and that has changed throughout the years. Nowadays, iSCSI or Distributed Replicated Block Devices can be used as cheap shared storage solutions.

eDirectory Enabled

Servers that run a clustering solution without this extra layer of management are hard to configure and to keep in sync with

Running a Cluster with Mixed Operating Systems

With Open Enterprise Server it is possible to run cluster nodes in a mixed environment with both NetWare and Linux. This feature has not been added to run this as a permanent solution, but it will help you migrate from a running NetWare cluster to a Linux cluster. You can add new Linux nodes and migrate your

We will discuss the architecture in this section to the level that you can understand what BCC is. Chapter 12, "Introduction to Business Continuity Clustering," contains a full technical and functional description of what BCC is and how to set up a solution for testing and development.

Figure 1.3 With BCC a complete cluster can fail over to another site.

clusters, with a maximum of four. Each cluster runs its own services and provides active services to its own clients. Of course, you can also build the second or third cluster with only failover in mind and not use it in any way. That depends on your design and your demand for high availability.

With the two clusters set up, the next component that is part of BCC is Novell Identity Manager. With this DirXML-based product the configuration of your cluster resources is continuously synchronized between the two geographical sites. When a failover has to occur, the cluster at the remote site will have the entire configuration available to run the services from the original cluster. All the configuration information that has to be changed will be modified during the synchronization cycle. An example of this is the IP addresses for your services. When you do a failover to another site, the IP addresses will need to be in another subnet, and the Identity Manager driver takes care of that.

The third component that you see in Figure 1.3 is the SAN. The data you store in the cluster must also be available on the remote site. Therefore the data must be synchronized in some way. There are three general ways of doing this:

SAN-based mirroring

Host operating system-based mirroring

Snapshot-based solutions

Your choice of a data transport solution is limited by factors like available bandwidth and cost. But there are also technical factors to take into consideration. With the mirroring solutions the loss of data is almost zero. Only the data that is still in

The fourth and last component in Figure 1.3 is the clients. For those there are two things you should look into. The first one is technical. What will have to be done with the IP addressing, for example? If your clients are still on the original subnet, how do they access the services that have failed over to the remote site? But the most important question is this: Where are your clients? Are they still there? If you have to do a failover because you had to evacuate the area, where do you continue your

Server that can run applications like Oracle databases or web servers. The applications can run on several nodes in the cluster simultaneously. Figure 1.4 shows a typical PolyServe cluster with NFS exports. With that you can balance the load of an application, and when a node fails the application remains available.

With PolyServe Matrix Server the load of applications can be balanced across servers. With this solution consolidated servers can run more applications than would normally be possible on individual servers. The overall performance of any number of servers will improve significantly when they are combined into one cluster. Another benefit of this consolidation is that storage is also consolidated, which in turn increases storage efficiency.

The product can be bought from Novell. Besides the basic offering, there are also two solution packs that add functionality for specific deployments to your environment. These are the Database Solution Pack and the File Serving Solution Pack.

Server by default. It provides a high-availability solution for Linux applications that run on SUSE Linux. The product originates from the Linux High Availability project. The software provides functionality very similar to that of NCS. It provides failover clustering for applications. This product is discussed in more detail in Chapter 11, "Using SUSE Linux Enterprise Server Clustering Options."

What is this? A section in a clustering book that tells you when not to cluster applications? Do you need that? Yes, you do. Because after reading the previous sections about what clustering is and what the benefits are, you might be tempted to start clustering just for the fun of it.

We are serious here. We have seen it many times. Starting with NetWare 6, a two-node cluster is included with the basic license. That means that it is relatively inexpensive to build a high-availability solution. Several vendors have sold, and some still do sell, servers with a shared SCSI solution for two servers. The cost is not much higher than for two individual servers. There is nothing wrong per se with using such a cluster, but you should not do it to replace two servers in your environment. In Chapter 3, we will go into more detail about this topic, but for clusters of up to six nodes you should always calculate one extra server compared to what you would use in real life.

In a scenario in which you already have two servers that run several services and you build a two-node cluster that runs all those services, what happens if one of the nodes fails? One of the two servers will then have to run the full load of all applications. That might be too much and it could cause your applications to fail, for example, because of a lack of memory.
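The sizing argument can be made concrete with a small calculation. The sketch below is a deliberate simplification that treats load as a single number per node (for example, gigabytes of memory in use); the function name `survives_single_failure` and the figures are assumptions for illustration.

```python
def survives_single_failure(node_capacity: float, node_loads: list) -> bool:
    """Check whether the remaining nodes can carry the total load when one node is down."""
    total_load = sum(node_loads)
    remaining_capacity = (len(node_loads) - 1) * node_capacity
    return total_load <= remaining_capacity


# Two nodes of capacity 100, each running a load of 60:
# the survivor would have to carry 120, which does not fit.
print(survives_single_failure(100, [60, 60]))

# The same total load spread over three nodes (one extra server):
# two survivors offer a capacity of 200, which is enough for 120.
print(survives_single_failure(100, [40, 40, 40]))
```

This is the reasoning behind planning for one extra server: the cluster must be sized for the situation after a failure, not for the healthy state.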

Planning for a clustering solution is really something you should do with high availability in mind for the applications that are important for your users or for which you have agreed on high availability in a Service Level Agreement. If you do not focus on this and just cluster-enable everything you have running today, you may find yourself with an overloaded server one day, losing valuable uptime for your system.

management and policy management. For both there are no services that run on the server. All information comes from eDirectory and the file system. And it is absolutely possible to point to a cluster virtual server to deploy software packages and policies. The problem lies in the back-end services. The question is this: Do they need to run on a cluster? We cannot answer that question for you. You should think about the need to cluster-enable them yourself. How important is it that your inventory database has a continuous availability? Do you want to run the Automatic Workstation Import service with high availability? If you have a separate server for these services, they might as well have good availability anyhow, and they are not mission critical. An already imported workstation works fine if it cannot contact the import server at boot time. The only thing that would not happen is the update of workstation information. So you could miss the change in IP address in case you want to remote-control the machine.

This was really just an example of how to look at your services: with a clear mind toward high availability. It is okay with us if you cluster-enable all services in your environment up to the MONITOR.NLM as long as you make sure that you dimension your cluster servers with all these services in mind.

System administrators and end users do not look at high availability from the same perspective. A server that the system administrator sees as available does not necessarily mean that the end user has an available service. This is the most common misconception we encounter when we talk to system administrators and IT managers who are looking into introducing high-availability solutions into their environment. Installing a cluster does not provide instant high availability. There is more to the definition that is important to look at.

Let us first define availability. If you look it up in the dictionary, you will find that available means present or ready for immediate use. And this is really the heart of everything we will be talking about in this section: services being available. We will use the end user's perspective here because that is all that matters. The users and the services they need are the sole reason the IT infrastructure exists. There is no company or organization that runs servers just for the fun of running servers. For the end user, availability means this: I can use the service that I am allowed to have access to. This can be something as simple as a word processing application or a spreadsheet program. Or it can be a more complex service such as email and document management, or accounting software.

So determining whether a service is available is really simple from the perspective of the end user: Can I access my email? Answering this question with yes means availability; answering with no means nonavailability.

To the end users, it does not matter if a server was down, if a network switch was broken, or if the post office was closed. They cannot access their email and therefore there is no availability.

If you look at your IT infrastructure from this perspective, you will see that there is more to look into than just clustering your servers. Of course, that is the topic of this book. Later in this chapter we will discuss what else you need to look into to improve the level of availability of your IT infrastructure.

Now that we have looked into what availability is, we can explore the world of high availability. The use of the word high implies that we get more than normal availability, but clearly not availability without any disruption. Otherwise, we would have used a term such as nonstop availability or continuous availability. No, instead we call it high availability. How high depends on the solution you implement.

In general, vendors of high-availability solutions will talk about the level they offer in terms of a percentage of availability. It is really impossible to just look at the numbers and decide which solution is the best. An example is a vendor that claims 100% availability. The offering includes redundant hardware with up to three main boards, multiple network interface cards, and so on. You might ask yourself: What happens when I need to reboot the server? Does that not lower the percentage of availability of the system? According to this vendor it does not, because the claim only means that your system will be available during business hours, and you will have to keep a maintenance window open for such operations.

So if you are looking into the percentage of availability you need, you will first have to do the math yourself and gather all the details.

Are there solutions available that will give me that 100% availability 365 days per year, 24 hours per day? Yes, there are. But they are really a completely different solution than the Novell clustering offering we will be looking into in this book. Examples of servers that have an extremely high level of availability are the ones being used by financial institutions that run payment services for ATMs and payment consoles in shops and restaurants. A downtime of 30 minutes on a busy Saturday afternoon will be guaranteed to make it into the evening news.

mostly they are also clustered in a way that will never let the services stop.

As you could read in the preceding section, there are several types of clusters. The solutions described in the preceding paragraph are not based on failover clustering but on clusters of servers that keep one or more tasks running, in which no single machine has a critical role. That type of clustering allows for the highest availability possible. Of course, the price tag for such a solution is also the highest possible. If the IT manager or the users ever complain about a maintenance window from midnight to 6 a.m. or about a downtime during business hours of an hour per year, you can do the math for them to determine what that last hour of availability would cost compared to the overall cost when you choose to have a slightly lower availability.

Let us now do the math to see what level of availability we get if we choose a solution with a specific number of nines behind the decimal point. In Table 1.1 we have calculated the

99.9999%    31.5 seconds

The table tells us how much downtime a specific number of nines allows in one year. But is this realistic? If we buy a solution for Novell Cluster Services that promises 99.999% uptime per year, is that realistic? If we need to upgrade the Cluster Services software, can we do that in 5.25 minutes?
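The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration, not something taken from the Novell documentation: the function name `allowed_downtime` and its parameters are our own, and we assume a 365-day year, which is what the 5.25-minute and 31.5-second figures imply.

```python
from datetime import timedelta


def allowed_downtime(availability_pct: float,
                     hours_per_day: float = 24.0,
                     days_per_year: int = 365) -> timedelta:
    """Return the downtime per year that an availability percentage permits.

    hours_per_day is the length of the daily service window
    (24 means around-the-clock service).
    """
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    service_seconds = days_per_year * hours_per_day * 3600
    downtime_fraction = (100.0 - availability_pct) / 100.0
    return timedelta(seconds=service_seconds * downtime_fraction)


if __name__ == "__main__":
    # Print the yearly downtime budget for an increasing number of nines.
    for pct in (99.0, 99.9, 99.99, 99.999, 99.9999):
        print(f"{pct}%  ->  {allowed_downtime(pct)}")
```

For 99.999% the function returns roughly 5 minutes and 15 seconds, and for 99.9999% roughly 31.5 seconds, in line with the numbers quoted above. Passing a smaller `hours_per_day` shows how the budget shrinks when only business hours count.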

If you need realistic numbers, you should not calculate the availability based on one full year. You should look at your business hours. Do your users need 24-hour access, 7 days a week? Or do you have a maintenance window each night that you can use to update software or to test new configurations?

Table 1.2 gives a few examples of an uptime window of only 18 hours per day. This makes a great difference compared to a 24-hour window.

amount of downtime you will get with a Novell cluster solution. The reason that these numbers are important is the Service Level Agreement that you have with your customers (your users in most cases). You need to be able to set up an agreement in which you define how much downtime is allowed for your environment.

The downtime you might get for a system is important, but if you, for example, set up an SLA with your customers with a maximum downtime based on 99.9%, based on an 18-hour working day, including the weekends, you must be able to calculate whether that is realistic. This percentage will give you a time slot of at most 6 hours and 34 minutes per year to repair the system. Anyone who ever had to wait for a new hard disk to arrive knows that you cannot have this happen more than once or twice per year. Of course, we will look into more high availability than just clusters and hard disks, but we will do that later. First we will look into how to calculate what you can expect in terms of downtime when a component fails.
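To check whether an SLA such as the 99.9% example is realistic, you can compare the yearly downtime budget against the outages you expect. The following Python sketch does that under stated assumptions: the function names and the outage durations are hypothetical, a year is taken as 365 days, and only outages that fall inside the 18-hour service window are counted.

```python
def sla_budget_hours(availability_pct: float,
                     hours_per_day: float,
                     days_per_year: int = 365) -> float:
    """Yearly downtime budget in hours for a given SLA percentage."""
    service_hours = days_per_year * hours_per_day
    return service_hours * (100.0 - availability_pct) / 100.0


def within_sla(outage_hours: list[float],
               availability_pct: float,
               hours_per_day: float) -> tuple[bool, float]:
    """Return (SLA met?, remaining budget in hours) for a list of outages."""
    budget = sla_budget_hours(availability_pct, hours_per_day)
    remaining = budget - sum(outage_hours)
    return remaining >= 0, remaining


if __name__ == "__main__":
    budget = sla_budget_hours(99.9, hours_per_day=18)
    hours, minutes = divmod(round(budget * 60), 60)
    print(f"Yearly budget: {hours} hours and {minutes} minutes")

    # Hypothetical year: two disk replacements of 4 hours each.
    met, remaining = within_sla([4.0, 4.0], 99.9, hours_per_day=18)
    print(f"SLA met: {met}, remaining budget: {remaining:.2f} hours")
```

With these inputs the budget works out to 6 hours and 34 minutes, and two four-hour repairs already exceed it, which is the point made above about waiting for a replacement hard disk.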
