Figure 10-1. Nagios service detail screen for the system localhost
If you're following along with the book in an environment of your own, you'll notice a problem: there isn't a check_https command definition in the sample Nagios configuration, so we had to create one.
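A command definition along these lines does the job (this is a sketch rather than the book's exact definition; it assumes the standard $USER1$ plug-in path macro):

    define command{
            command_name    check_https
            command_line    $USER1$/check_http -I $HOSTADDRESS$ --ssl
            }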
This new command object definition calls the check_http plug-in with the appropriate arguments to test an HTTPS-enabled web site. Once this was copied to our Nagios server, the check cleared in Nagios.
Nagios is now in a fully functional state in our environment, but we don't find it very useful to monitor only a single machine. Next, we'll take steps to monitor the rest of the hosts at our site. The first step will be to deploy a local monitoring agent called NRPE to all our systems.
NRPE
NRPE is the Nagios Remote Plug-in Executor. It is used in place of agents and protocols such as SNMP for remotely monitoring hosts. It grants access to remote hosts to execute plug-ins such as those in the Nagios plug-ins distribution. NRPE has two components: a daemon called nrpe and a plug-in to the Nagios daemon called check_nrpe.
The NRPE documentation points out that there are other ways to accomplish remote plug-in execution, such as the Nagios check_by_ssh plug-in. Although running plug-ins over SSH on a remote host seems attractive for security reasons, it imposes more overhead on remote hosts than the NRPE program does. In addition, a site's security policy may expressly forbid that sort of remote SSH access. NRPE, by contrast, is lightweight, flexible, and fast.
Step 15: Building NRPE
The NRPE source distribution does not include an installation facility. Once it is built, the resulting binaries have to be copied into place by hand. We copied check_nrpe to the preexisting nagios-plugins directory for the debian.i686 architecture and copied the nrpe program itself into the single shared PROD/repl/root/usr/pkg/nrpe-2.12-bin directory. We repeated the build on Red Hat, except that we copied the plug-ins to the nagios-plugins directory for the redhat.i686 architecture and the nrpe binary to nrpe-2.12-bin/nrpe-redhat.i686.
We copied the sample configuration file from the NRPE source distribution (sample-config/nrpe.cfg) to the cfengine master at PROD/repl/root/usr/pkg/nrpe-conf/nrpe.cfg, then edited the nrpe.cfg file to use the /usr/pkg/nagios-plugins/libexec directory for all the paths and allow access from our etchlamp system, as shown:
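The edited portions of an nrpe.cfg along those lines look roughly like this sketch (the allowed_hosts address for etchlamp and the exact command definitions are assumptions; the directive names are standard NRPE settings):

    pid_file=/var/run/nrpe.pid
    server_port=5666
    nrpe_user=nagios
    nrpe_group=nagios
    # allow the Nagios server (etchlamp) to connect; the address is an assumption
    allowed_hosts=127.0.0.1,192.168.1.13
    # plug-in paths point at the shared nagios-plugins libexec directory
    command[check_load]=/usr/pkg/nagios-plugins/libexec/check_load -w 15,10,5 -c 30,25,20
    command[check_disk]=/usr/pkg/nagios-plugins/libexec/check_disk -w 20% -c 10% -p /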
At this point, we have the NRPE programs built and ready for distribution from the cfengine master, along with a configuration file. The last thing we need to prepare for NRPE is a start-up script.
Step 17: Creating an NRPE Start-up Script
We created a simple init script for NRPE at PROD/repl/root/etc/init.d/nrpe on the cfengine master with these contents:
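A minimal script in that spirit might look like the following (a sketch; the paths to the nrpe binary and configuration file are assumptions based on the directories used earlier):

    #!/bin/sh
    # Simple start/stop script for the NRPE daemon
    NRPE=/usr/pkg/nrpe/bin/nrpe
    CONF=/usr/pkg/nrpe-conf/nrpe.cfg

    case "$1" in
      start)
        $NRPE -c $CONF -d
        ;;
      stop)
        # kill by process name; see the note about the PID file below
        pkill -x nrpe
        ;;
      restart)
        $0 stop
        sleep 1
        $0 start
        ;;
      *)
        echo "Usage: $0 {start|stop|restart}"
        exit 1
        ;;
    esac
    exit 0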
This is a very simple init script, but it suffices because NRPE is a very simple daemon. We kill the daemon with the pkill command because, in writing this chapter, we found that occasionally the PID of the nrpe process wasn't properly stored in the nrpe.pid file. Daemons occasionally have bugs such as this, so we simply work around them with some extra measures, in this case killing the daemon with the pkill command.
Step 18: Copying NRPE Using cfengine
We now have everything we need to deploy NRPE at our site. To distribute NRPE with cfengine, we created a task to distribute the configuration file, init script, and binaries in a file named PROD/inputs/tasks/app/nagios/cf.nrpe_sync. Here's the file, which we will describe only briefly after showing the contents, because we're not introducing any new cfengine functionality in this task:
    processes:
        "nrpe" restart "/etc/init.d/nrpe start" inform=true umask=0
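That processes stanza starts NRPE if it isn't running. The copy and links portions of such a task would look roughly like the following sketch (the variable names, file modes, destination layout, and runlevel link names here are assumptions, not the book's exact entries):

    control:
        any::
            nrpe_ver = ( nrpe-2.12 )

    copy:
        debian_i686::
            $(master)/repl/root/usr/pkg/nrpe-2.12-bin/nrpe-debian.i686
                dest=/usr/pkg/$(nrpe_ver)/bin/nrpe
                mode=755 type=checksum server=$(fileserver)
        any::
            $(master)/repl/root/usr/pkg/nrpe-conf/nrpe.cfg
                dest=/usr/pkg/nrpe-conf/nrpe.cfg
                mode=644 type=checksum server=$(fileserver)
            $(master)/repl/root/etc/init.d/nrpe
                dest=/etc/init.d/nrpe
                mode=755 type=checksum server=$(fileserver)

    links:
        any::
            /usr/pkg/nrpe -> /usr/pkg/$(nrpe_ver)
        debian|redhat::
            /etc/rc2.d/S95nrpe -> /etc/init.d/nrpe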
When linking the /etc/init.d/nrpe start-up script into the runlevel-specific directories in the preceding links section, we avoid creating a link in /etc/rc3.d on Solaris hosts; Solaris runs the scripts from both rc2.d and rc3.d when booting to its default run level, so the script would execute twice. No damage would result, but we don't want to be sloppy. The directories rc4.d, rc5.d, and rc6.d don't exist on Solaris, so we won't attempt to create symlinks in them.
Note that we make it easy to move to a newer version of NRPE later on, using version numbers and a symlink at /usr/pkg/nrpe to point to the current version. The use of a variable means only the single entry in this task will need to change once a new NRPE version is built and placed in the appropriate directories on the cfengine master.
To activate this new task, we placed the following line in PROD/inputs/hostgroups/cf.any:

    tasks/app/nagios/cf.nrpe_sync
Step 19: Configuring the Red Hat Local Firewall to Allow NRPE
The next-to-last step we had to take was to allow NRPE connections through the Red Hat firewall. To do so, we added rules directly to the /etc/sysconfig/iptables file on the system rhlamp and restarted iptables with service iptables restart. Here are the complete contents of the iptables file, with the newly added line in bold:
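The added rule needs to accept TCP connections on NRPE's port, 5666. In the iptables-save format used by /etc/sysconfig/iptables, it looks something like this (a sketch; RH-Firewall-1-INPUT is the Red Hat default chain name and an assumption here, and you may want to restrict the source address to the monitoring host):

    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 5666 -j ACCEPT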
You can always use the Red Hat command system-config-securitylevel to make changes and then feed the resulting /etc/sysconfig/iptables changes back into the copy that we distribute with cfengine. This is just another example of how manual changes are often needed to determine how to automate something. It's always OK as long as we feed the resulting changes and steps back into cfengine for long-term enforcement.
We placed the updated iptables file on our cfengine master at PROD/repl/root/etc/sysconfig/iptables and placed a task with these contents at the location PROD/inputs/tasks/os/cf.iptables_sync:
    control:
        any::
            addinstallable = ( restartiptables )
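    # The copy and shellcommands stanzas below are a sketch, not the book's exact
    # task: the $(master_etc) and $(fileserver) variables, the file mode, and the
    # timeout are assumptions.
    copy:
        any::
            $(master_etc)/sysconfig/iptables
                dest=/etc/sysconfig/iptables
                mode=600
                type=checksum
                server=$(fileserver)
                define=restartiptables

    shellcommands:
        restartiptables::
            "/sbin/service iptables restart" timeout=60 inform=true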
It might seem strange to use the any class in the cf.redhat hostgroup file, but if you think about it, the task doesn't apply to all hosts on our network, only to the hosts that import this hostgroup file. That means that this any:: class will actually apply to only Red Hat systems.
Now, sit back and let NRPE go out to your network. If you encounter any issues while building NRPE, refer to the NRPE.pdf file included in the docs directory of the NRPE source distribution.
Monitoring Remote Systems
So far, we're simply using the example configuration included with Nagios to monitor only the system that is actually running Nagios. To make Nagios generally useful, we need to monitor remote systems.
Step 20: Configuring Nagios to Monitor All Hosts at Our Example Site
Templates are used in Nagios to avoid repeating the same values for every service and host object. These objects have many required entries, but Nagios allows the use of templates so that we don't have to repeat every required value in the objects that we define. Template definitions are very similar to the host or service definitions that they are meant for, but templates contain the line register 0 to keep Nagios from loading them as real objects. Any or all values can be overridden in the objects that inherit from a template.
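As an illustration, a service template looks like this (a sketch; the template name generic-service and the specific values shown are assumptions, not the book's exact template):

    define service{
            name                    generic-service
            active_checks_enabled   1
            check_period            24x7
            max_check_attempts      3
            normal_check_interval   5
            retry_check_interval    1
            contact_groups          admins
            notification_interval   60
            notification_period     24x7
            notification_options    w,u,c,r
            register                0
            }

Real service definitions then name this template with the use directive and override whatever they need.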
Note: Be aware that escalation settings override the contact_groups setting in service definitions. We have no escalation settings and won't configure any in this chapter, but keep them in mind for your own configurations.
Now that we have a template that suits our needs, we can inherit from it in our service definitions and specify only important values or those that we wish to override from the template's values.
In the directory PROD/repl/root/usr/pkg/nagios-conf/objects/servers, we have four files to define the objects to monitor on our network:

hosts.cfg
hostgroups.cfg
system_checks.cfg
web_checks.cfg
We define the hosts at our site in the file hosts.cfg:
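Each host gets an entry like this one (a sketch; the template name, alias, and address are assumptions):

    define host{
            use         generic-host
            host_name   etchlamp
            alias       etchlamp - Debian monitoring and web host
            address     192.168.1.13
            }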
Now that we have host definitions for all the hosts that we want to monitor at our site, we will set up groups in the file hostgroups.cfg:
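A hostgroup entry is short (again a sketch; the group and member names are assumptions):

    define hostgroup{
            hostgroup_name  debian-servers
            alias           Debian servers
            members         etchlamp,goldmaster
            }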
This way, we can simply add a new host to an existing hostgroup and immediately have the proper checks performed against it.
Next, we set up some system-level monitoring using NRPE, configured in the file system_checks.cfg:
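A service definition that checks load over NRPE might look like this (a sketch; the hostgroup name and the template are assumptions):

    define service{
            use                  generic-service
            hostgroup_name       debian-servers
            service_description  Load Check Over NRPE
            check_command        check_nrpe!check_load
            }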
This entry means that the check_nrpe command is passed the argument check_load for the Load Check Over NRPE service. Looking back at the command definition for check_nrpe, you can now see that what is run on the monitoring host is:
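That boils down to an invocation along these lines (a sketch; the plug-in path follows the libexec directory used earlier, and the target address is an assumption):

    /usr/pkg/nagios-plugins/libexec/check_nrpe -H 192.168.1.15 -c check_load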
We already set up the check_https check earlier to test the web server on localhost, so here we simply set it up for a remote host, and it works properly.
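In web_checks.cfg, such a service definition might look like this (a sketch; the host name, service description, and template are assumptions):

    define service{
            use                  generic-service
            host_name            rhlamp
            service_description  HTTPS
            check_command        check_https
            }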
Each time we update the Nagios configuration files, cfengine gets the files to the correct host (in this case, etchlamp) and restarts the Nagios daemon. If the etchlamp system fails due to hardware issues, we will simply need to reimage the host, and without any manual intervention cfengine will once again configure it as our central monitoring host.
At this point, we have the four components of Nagios deployed as planned. Nagios is able to run plug-ins that we define, either locally on systems via NRPE or across the network to test client/server applications. Over time, we will add checks and perhaps new plug-ins. Our monitoring infrastructure choice really shines in the easy addition of new plug-ins; it should be able to support us for quite a while without any core modifications.
What Nagios Alerts Really Mean
When an alert arrives from a monitoring system, what does it really mean? It might indicate a genuine failure, a problem with the monitoring program, or a network issue; a notification may even fire for a service that is reachable by all systems except the monitoring host. Don't jump to the conclusion that a notification means that a service or host has failed. You need to understand exactly what each service definition is checking and validate that the service is really failing with some checks of your own before undertaking any remediation steps.
Ganglia
Ganglia is a distributed monitoring system that uses graphs to display the data it collects. Nagios will let us know if an application or host is failing a check, but Ganglia is there to show how our systems behave over time. You can also feed site-specific metrics into Ganglia, though we don't demonstrate doing so in this book. If a host intermittently triggers a load alarm in Nagios with no clear cause immediately visible, looking at graphs of the system's load over time can be useful in helping you see when the load increase began. Armed with this information, we can check if the alarm correlates to a system change or application update. Ganglia is extremely useful in such situations. It scales incredibly well, and adding new custom metrics to the Ganglia graphs is extremely easy.
The core functionality of Ganglia is provided by two main daemons, along with a web front end:

gmond: This multithreaded daemon runs on each host you want to monitor. gmond keeps track of state on the system, relays the state changes on to other systems via TCP or multicast UDP, listens for and gathers the state of other gmond daemons in the local cluster, and answers requests for all the collected information. The gmond configuration will cause hosts to join a cluster group. A site might contain many different clusters, depending on how the administrator wants to group systems for display in the Ganglia web interface.
gmetad: This daemon is used to aggregate Ganglia data and can even be used to aggregate information from multiple Ganglia clusters. gmetad polls one or many gmond (or gmetad) data sources and serves the collected state over sockets to clients.

Web interface: The PHP web front end queries the gmetad daemon to receive the collected data and displays metrics clusterwide, or for a single host, over periods of time such as the last hour, day, week, or month. The web interface uses graphs built from gmetad's collected data to display historical information.
Ganglia's gmond daemon can communicate using TCP with explicit connections to other hosts that aggregate a cluster's state, or it can use multicast UDP to broadcast the state. We designate specific hosts to aggregate the cluster's state and then poll those hosts explicitly with gmetad. The gmond configuration file still has UDP port configuration settings, but they won't be used at our example site.
Building and Distributing the Ganglia Programs
Ganglia builds with the usual series of commands. Note that a C++ compiler will need to be present on the system, as well as development libraries for RRDtool and libpng12-0. Without the RRDtool libraries, the build will seem successful, but the gmetad program will fail to be built.
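The usual sequence is something along these lines (a sketch; the installation prefix is an assumption, and --with-gmetad is needed to build gmetad at all):

    ./configure --prefix=/usr/pkg/ganglia --with-gmetad
    make
    make install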
We generated a default configuration file using gmond's built-in option to emit one (redirecting the output to gmond.conf), edited it as appropriate for our site, and then placed the gmond.conf file on the cfengine master. The beautiful thing about this option is that it even emits comments describing each configuration section! Ganglia was clearly written by system administrators.
We configured goldmaster and etchlamp to be the cluster data aggregators via the udp_send_channel sections, and we configured gmetad to poll the cluster state from these two hosts. The tcp_accept_channel section allows our host running gmetad (etchlamp) to poll state over TCP from any host running gmond. The rest of the configuration file is unchanged.
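In gmond.conf terms, the relevant sections look roughly like this (a sketch; 8649 is the Ganglia default port, and the absence of access controls here is an assumption):

    udp_send_channel {
      host = goldmaster
      port = 8649
    }
    udp_send_channel {
      host = etchlamp
      port = 8649
    }
    udp_recv_channel {
      port = 8649
    }
    tcp_accept_channel {
      port = 8649
    }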
We copied the sample gmetad.conf file from the Ganglia source distribution (at the location gmetad/gmetad.conf), edited it for our site, and placed both files (gmond.conf and gmetad.conf) into the directory PROD/repl/root/usr/pkg/ganglia-conf on the cfengine master.
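The key gmetad.conf setting is the data_source line, which would look something like this (a sketch; the cluster name is an assumption):

    data_source "campin.net servers" goldmaster etchlamp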
We also added a ganglia user and group to the central [passwd|shadow|group] files with these entries:
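The passwd, group, and shadow entries, respectively, would look something like this (a sketch; the UID and GID of 106 are arbitrary assumptions and will differ at your site):

    ganglia:x:106:106:Ganglia Monitoring:/var/lib/ganglia:/bin/false
    ganglia:x:106:
    ganglia:*:13987:0:99999:7:::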
Next, add this line to PROD/inputs/hostgroups/cf.any so that all of our hosts get the Ganglia programs copied over:
Configuring the Ganglia Web Interface
Our central Ganglia machine will run the web interface for displaying graphs, as well as the gmetad program that collects the information from the gmond daemons on our network. Ganglia's web interface is written in PHP and distributed in the source package. Copy the PHP files from the Ganglia source package's web directory to this location on the cfengine master:
This task causes the gmetad daemon to be started on the ganglia_web host if it isn't already running (we define the ganglia_web class in the next section). Our configuration for the gmetad daemon is simple enough that the comments in the file itself serve as sufficient documentation to get most users going with a working configuration.
We added the required web packages to our FAI WEB package class and also installed them directly with apt-get on etchlamp in this case, so that we didn't have to reimage the host just to add two packages.
Next, we created a new hostgroup file for our new ganglia_web role on the cfengine master at the location PROD/inputs/hostgroups/cf.ganglia_web, with these contents:
Once cfengine on etchlamp copies the PHP content and Apache configuration files, we can visit https://ganglia.campin.net/ in our web browser and view graphs for all the hosts at our site, individually or as a whole. If you haven't previously used a similar host-graphing system, you'll be surprised at how often you refer to the graphs during troubleshooting or for capacity planning.
Now You Can Rest Easy
At this point, our monitoring is in place and will grow and scale along with our new infrastructure.
As your site requires more and more monitoring, you might benefit from the distributed features of Nagios. You could set up a test instance of distributed Nagios in order to determine if the additional load sharing and redundancy is a good fit for your site. Many sites simply purchase more powerful hardware for the monitoring host as their needs grow, but at some point this may no longer be feasible.
Ganglia will scale extremely well to large numbers of systems, and most of the follow-on configuration will be around breaking up hosts into separate groups and clusters. You can designate several hosts to aggregate the cluster's state and simply configure gmetad to poll the cluster state from a list of several hosts running gmond. This allows one or more gmond aggregators to fail without interrupting data collection, and you can add many more as the total number of systems at your site increases.
Infrastructure Enhancement
At this point, we have a fully functional infrastructure. We have automated all of the changes to the hosts at our site from the point at which the initial imaging hosts and cfengine server were set up.
We're running a rather large risk, however, because if we make errors in our cfengine configuration files, we won't have an easy way to revert the changes. We run an even greater risk if our cfengine server were to suffer hardware failure: we would have no way of restoring the cfengine masterfiles tree. The other hosts on our network will continue running cfengine, and they will apply the last copied policies and configuration files, but no updates will be possible until we restore our central host.
Subversion can help us out with both issues. Using version control, we can easily track the changes to all the files hosted in our cfengine masterfiles tree, and by making backups of the Subversion repository, we can restore our cfengine server in the event of system failure or even total site failure.
Cfengine Version Control with Subversion
With only a small network in place, we already have over 2,800 lines of configuration code in over 55 files under the PROD/inputs directory. We need to start tracking the different versions of those files as time goes on, as well as tracking any additional files that are added. The workplace of one of this book's authors has over 30,000 lines of cfengine configuration in 971 files. Without version control, it is difficult to maintain any semblance of control over your cfengine configuration files, as well as the files being copied by cfengine.
We covered basic Subversion usage in Chapter 8 and included instructions on how to set up a Subversion server with an Apache front end. We'll utilize that infrastructure to host version control for our cfengine master repository.
Importing the masterfiles Directory Tree
In order to import our cfengine masterfiles directory into Subversion, we need to create the repository on etchlamp, our Subversion host. Conveniently, we already created the