A Universal Neural Network–Based Infrasound Event Classifier
Fredric M. Ham and Ranjan Acharyya
CONTENTS
3.1 Overview of Infrasound and Why Classify Infrasound Events?
3.2 Neural Networks for Infrasound Classification
3.3 Details of the Approach
3.3.1 Infrasound Data Collected for Training and Testing
3.3.2 Radial Basis Function Neural Networks
3.4 Data Preprocessing
3.4.1 Noise Filtering
3.4.2 Feature Extraction Process
3.4.3 Useful Definitions
3.4.4 Selection Process for the Optimal Number of Feature Vector Components
3.4.5 Optimal Output Threshold Values and 3-D ROC Curves
3.5 Simulation Results
3.6 Conclusions
Acknowledgments
References
3.1 Overview of Infrasound and Why Classify Infrasound Events?
Infrasound is a longitudinal pressure wave [1–4]. The characteristics of these waves are similar to audible acoustic waves, but the frequency range is far below what the human ear can detect; the typical frequency range is from 0.01 to 10 Hz (Figure 3.1). Nature is an incredible creator of infrasonic signals that can emanate from sources such as volcanic eruptions, earthquakes, severe weather, tsunamis, meteors (bolides), gravity waves, microbaroms (infrasound radiated from ocean waves), surf, mountain ranges (mountain-associated waves), avalanches, and auroral waves, to name a few. Infrasound can also result from man-made events such as mining blasts, the space shuttle, high-speed aircraft, artillery fire, rockets, vehicles, and nuclear events. Because of relatively low atmospheric absorption at low frequencies, infrasound waves can travel long distances in the Earth's atmosphere and can be detected with sensitive ground-based sensors.
An integral part of the Comprehensive Nuclear-Test-Ban Treaty (CTBT) International Monitoring System (IMS) is an infrasound network [3]. The goal is to have 60 infrasound arrays operational worldwide over the next several years. The main objectives of the infrasound monitoring system are the detection and verification, localization, and classification of nuclear explosions as well as other infrasonic signals of interest (SOIs). Detection refers to the problem of detecting an SOI in the presence of all other unwanted sources and noises, localization deals with finding the origin of a source, and classification deals with the discrimination of different infrasound events of interest. This chapter concentrates on the classification part only.
3.2 Neural Networks for Infrasound Classification
Humans excel at the task of classifying patterns, and we all perform this task on a daily basis. Do we wear the checkered or the striped shirt today? We will probably select from a group of checkered shirts versus a group of striped shirts. The grouping process is carried out (probably at a near-subconscious level) by our ability to discriminate among all the shirts in our closet: we group the striped ones in the striped class and the checkered ones in the checkered class (that is, without physically moving them around in the closet, only in our minds). However, if the closet is dimly lit, this creates a potential problem and diminishes our ability to make the right selection (that is, we are working in a "noisy" environment). The same problem with noise exists when an artificial neural network is used for classification of patterns (or various "events"). Noise is everywhere.
In general, a common problem associated with event classification (or detection and localization, for that matter) is environmental noise. In the infrasound problem, the distance between the source and the sensors is often relatively large (as opposed to regional infrasonic phenomena). Increases in the distance between sources and sensors heighten the environmental dependence of the signals. For example, the signal of an infrasonic event that takes place near an ocean may have significantly different characteristics compared to the same event occurring in a desert. A major contributor of noise for the signal near an ocean is microbaroms. As mentioned above, microbaroms are generated in the air by large ocean waves. One important characteristic of neural networks is their noise-rejection capability [5]. This, among several other attributes, makes them highly desirable for use as classifiers.
FIGURE 3.1
Infrasound spectrum. (Frequency axis in Hz; labeled sources include gravity waves, mountain-associated waves, microbaroms, volcano events, bolides, and impulsive events at 1-kiloton and 1-megaton yields.)
3.3 Details of the Approach
Our approach to classifying infrasound events is based on a parallel bank neural network structure [6–10]. The basic architecture is shown in Figure 3.2. There are several reasons for using such an architecture; one very important advantage of dedicating one module to the classification of one event class is that the architecture is fault tolerant (i.e., if one module fails, the rest of the individual classifiers will continue to function). Moreover, the overall performance of the classifier is enhanced when the parallel bank neural network classifier (PBNNC) architecture is used. Individual banks (or modules) within the classifier architecture are radial basis function neural networks (RBF NNs) [5]. Each classifier also has its own dedicated preprocessor. Customized feature vectors are computed optimally for each classifier and are based on cepstral coefficients and a subset of their associated derivatives (differences) [11]; this will be explained in detail later. Each neural module is trained to classify one and only one class; however, the module responsible for a given class is also trained not to recognize any of the other classes (negative reinforcement). During the training process, the output is set to a "1" for the correct class and a "0" for all signals associated with all other classes. When the training process is complete, the final output threshold of each neural module is set to an optimal value based on a three-dimensional receiver operating characteristic (3-D ROC) curve (see Figure 3.2).
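The one-module-per-class decision logic described above can be sketched as follows. This is an illustrative skeleton, not the authors' trained networks: the stand-in "modules" and the threshold values are placeholders for trained RBF NNs and their ROC-derived thresholds.

```python
# Sketch of the PBNNC decision rule: each class has a dedicated module;
# a class is declared when its module's output clears its own threshold.

def classify(feature_vectors, modules, thresholds):
    """Run each class-dedicated module on its own customized feature
    vector and report every class whose score exceeds its threshold."""
    hits = []
    for name, module in modules.items():
        score = module(feature_vectors[name])  # trained toward 1 or 0
        if score > thresholds[name]:
            hits.append(name)
    return hits

# Toy stand-in modules (a real PBNNC uses trained RBF networks):
modules = {
    "vehicle": lambda x: 0.9 if x == "vehicle-like" else 0.1,
    "missile": lambda x: 0.9 if x == "missile-like" else 0.1,
}
thresholds = {"vehicle": 0.5, "missile": 0.5}
features = {"vehicle": "vehicle-like", "missile": "other"}
print(classify(features, modules, thresholds))  # ['vehicle']
```

Because each module decides independently, losing one module degrades only that class, which is the fault-tolerance property noted above.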
FIGURE 3.2
Basic parallel bank neural network classifier (PBNNC) architecture. (Each of the six infrasound classes has a dedicated preprocessor and neural network module; each module's 0/1 output is gated by an optimum threshold set by its ROC curve.)
3.3.1 Infrasound Data Collected for Training and Testing
The data used for training and testing the individual networks are obtained from multiple infrasound arrays located in different geographical regions with different geometries. The six infrasound classes used in this study are shown in Table 3.1, and the various array geometries are shown in Figure 3.3(a) through Figure 3.3(e) [12,13]. Table 3.2 shows the various classes, along with the array numbers where the data were collected, and the associated sampling frequencies.
3.3.2 Radial Basis Function Neural Networks
As previously mentioned, each of the neural network modules in Figure 3.2 is an RBF NN. A brief overview of RBF NNs is given here; this is not meant to be an exhaustive discourse on the subject, but only an introduction. More details can be found in Refs. [5,14].
Earlier work on the RBF NN was carried out for handling multivariate interpolation problems [15,16]. More recently, RBF NNs have been used for probability density estimation [17–19] and approximation of smooth multivariate functions [20]. In principle, the RBF NN adjusts its weights so that the error between the actual and the desired responses is minimized relative to an optimization criterion through a defined learning algorithm [5]. Once trained, the network performs interpolation in the output vector space; hence the generalization property.
Radial basis functions are one type of positive-definite kernel extensively used for multivariate interpolation and approximation. Radial basis functions can be used for problems of any dimension, and the smoothness of the interpolants can be achieved to any desirable extent. Moreover, the structures of the interpolants are very simple. However, several challenges accompany these attributes. For example, an ill-conditioned linear system must often be solved, and both the time and space complexity grow with the number of interpolation points. These types of problems can, however, be overcome.
The interpolation problem may be formulated as follows. Assume M distinct data points X = {x_1, ..., x_M}. Also assume the data set is bounded in a region Ω (for a specific class). Each observed data point x ∈ R^u (u corresponds to the dimension of the input space) may correspond to some function of x. Mathematically, the interpolation problem may be stated as follows: given a set of M points, i.e., {x_i ∈ R^u | i = 1, 2, ..., M}, and a corresponding set of M real numbers {d_i ∈ R | i = 1, 2, ..., M} (the desired outputs, or targets), find a function F: R^u → R that satisfies the interpolation condition

F(x_i) = d_i,  i = 1, 2, ..., M  (3.1)
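The interpolation condition above can be made concrete with a small sketch. With one Gaussian basis function centered at each data point, enforcing F(x_i) = d_i reduces to solving the linear system Φw = d for the output weights (the same linear system the text notes may be ill-conditioned). This is a minimal illustration, not the chapter's implementation; the width σ is an arbitrary illustrative value.

```python
# Exact RBF interpolation: one Gaussian center per data point,
# weights obtained by solving Phi @ w = d.
import numpy as np

def rbf_interpolate(X, d, sigma=1.0):
    """Return a callable F satisfying F(x_i) = d_i at every data point."""
    X, d = np.asarray(X, float), np.asarray(d, float)
    # Pairwise squared distances between the M data points (the centers).
    r2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    Phi = np.exp(-r2 / sigma**2)      # Gaussian kernel matrix
    w = np.linalg.solve(Phi, d)       # may be ill-conditioned for large M
    def F(x):
        r2x = ((np.asarray(x, float) - X) ** 2).sum(-1)
        return np.exp(-r2x / sigma**2) @ w
    return F

X = [[0.0], [1.0], [2.0]]
d = [1.0, 0.0, 1.0]
F = rbf_interpolate(X, d)
```

Evaluating F at any x_i reproduces d_i, i.e., the interpolating surface passes through all the points, which is exactly the requirement stated in Equation 3.1.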
TABLE 3.1
Infrasound Classes Used for Training and Testing (number of SOIs used for training: n = 351; number of SOIs used for testing: n = 223).
FIGURE 3.3
Infrasound array geometries (sensor coordinate layouts for arrays including BP2, K8202, and K8203).
Thus, all the points must pass through the interpolating surface. A radial basis function may be a special interpolating function of the form

F(x) = Σ_{i=1}^{M} w_i φ_i(||x − x_i||_2)  (3.2)

where φ(·) is known as the radial basis function and ||·||_2 denotes the Euclidean norm. In general, the data points x_i are the centers of the radial basis functions and are frequently written as c_i.
One of the problems encountered when attempting to fit a function to data points is over-fitting of the data, that is, the value of M is too large. Generally speaking, however, this is less of a problem with the RBF NN than it is with, for example, a multi-layer perceptron trained by backpropagation [5]. The RBF NN attempts to construct the hypersurface for a particular problem when given a limited number of data points.
Let us take another point of view concerning how an RBF NN performs its construction of a hypersurface. Regularization theory [5,14] is applied to the construction of the hypersurface; a geometrical explanation follows.
Consider a set of input data obtained from several events from a single class. The input data may be temporal signals or features obtained from these signals using an appropriate transformation. The input data are transformed by a nonlinear function in the hidden layer of the RBF NN, and each event then corresponds to a point in the feature space. Figure 3.4 depicts a two-dimensional (2-D) feature set, that is, the dimension of the output of the hidden layer in the RBF NN is two. In Figure 3.4, "(a)", "(b)", and "(c)" correspond to three separate events. The purpose here is to construct a surface (shown by the dotted line in Figure 3.4) such that the dotted region encompasses events of the same class. If the RBF network is to classify four different classes, there must be four different regions (four dotted contours), one for each class. Ideally, each of these regions should be separate with no overlap. However, because there is always a limited amount of observed data, perfect reconstruction of the hyperspace is not possible and it is inevitable that overlap will occur.
To overcome this problem it is necessary to incorporate global information from Ω (i.e., the class space) in approximating the unknown hyperspace. One choice is to introduce a smoothness constraint on the targets. Mathematical details will not be given here; for an in-depth development see Refs. [5,14].
Let us now turn our attention to the actual RBF NN architecture and how the network is trained. In its basic form, the RBF NN has three layers: an input layer, one hidden layer, and one output layer. Referring to Figure 3.5, the source nodes (or input components) make up the input layer. The hidden layer performs a nonlinear transformation of the input to the network (i.e., the radial basis functions residing in the hidden layer perform this transformation) and is generally of a higher dimension than the input. This nonlinear transformation of the input in the hidden layer may be viewed as a basis for the construction of the input in the transformed space; thus the term radial basis function.
In Figure 3.5, the output of the RBF NN (i.e., at the output layer) is calculated according to

F(x) = Σ_{k=1}^{N} w_k φ_k(||x − c_k||_2)  (3.3)

where the w_k are the weights in the output layer, N is the number of neurons in the hidden layer, and c_k ∈ R^{u×1} are the RBF centers, which are selected based on the input vector space. The Euclidean distance between the center of each neuron in the hidden layer and the input to the network is computed. The output of a neuron in the hidden layer is a nonlinear function of this distance, and the output of the network is computed as a weighted sum of the hidden layer outputs.

FIGURE 3.4
Example of a two-dimensional feature set. (Axes: feature 1 and feature 2; "(a)", "(b)", and "(c)" mark three separate events.)
The functional form of the radial basis function, φ_k(·), can be any of the following:

• Linear function: φ(x) = x
• Cubic approximation: φ(x) = x^3
• Thin-plate-spline function: φ(x) = x^2 ln(x)
• Gaussian function: φ(x) = exp(−x^2/σ^2)
• Multi-quadratic function: φ(x) = (x^2 + c^2)^{1/2}

The centers, c_k, of the Gaussian functions are points used to perform a sampling of the input vector space. In general, the centers form a subset of the input data.
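The basis functions listed above can be written out as plain Python callables. This is only a sketch; the shape parameters σ and c are illustrative values, not ones taken from the chapter.

```python
# The candidate radial basis functions, as functions of the (nonnegative)
# distance x between an input and a center.
import math

def linear(x):            return x
def cubic(x):             return x ** 3
def thin_plate_spline(x): return x ** 2 * math.log(x)        # defined for x > 0
def gaussian(x, sigma=1.0):   return math.exp(-x ** 2 / sigma ** 2)
def multiquadratic(x, c=1.0): return math.sqrt(x ** 2 + c ** 2)

# The Gaussian responds maximally at zero distance: a hidden neuron "fires"
# hardest when the input lies exactly on its center.
print(gaussian(0.0))  # 1.0
```

Note the qualitative difference: the Gaussian decays with distance (localized response around each center), while the linear, cubic, and multi-quadratic forms grow with distance.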
3.4 Data Preprocessing
3.4.1 Noise Filtering
Microbaroms, as previously defined, are a persistently present source of noise that resides in most collected infrasound signals [21–23]. Microbaroms are a class of infrasonic signals characterized by narrow-band, nearly sinusoidal waveforms, with a period between 6 and 8 sec. These signals can be generated by marine storms through a nonlinear interaction of surface waves [24]. The frequency content of the microbaroms often coincides with that of small-yield nuclear explosions. This could be bothersome in many applications; however, simple band-pass filtering can alleviate the problem in many cases. Therefore, a band-pass filter with a pass band between 1 and 49 Hz (for signals sampled at 100 Hz) is used here to eliminate the effects of the microbaroms. Figure 3.6 shows how band-pass filtering can be used to eliminate the microbaroms problem.
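The band-pass idea can be sketched as follows. The chapter does not specify the filter design, so this simple frequency-domain mask (zeroing FFT bins outside the 1–49 Hz pass band at a 100 Hz sampling rate) is an illustrative stand-in, not the authors' filter.

```python
# Frequency-domain band-pass sketch for a signal sampled at 100 Hz.
import numpy as np

def bandpass(signal, fs=100.0, lo=1.0, hi=49.0):
    """Keep only spectral content with lo <= f <= hi (in Hz)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

# A 0.15 Hz "microbarom-like" tone (period ~6.7 s) plus an in-band 5 Hz
# tone: filtering removes the low-frequency component and keeps the other.
t = np.arange(0, 60.0, 1.0 / 100.0)
x = np.sin(2 * np.pi * 0.15 * t) + np.sin(2 * np.pi * 5.0 * t)
y = bandpass(x)
```

A practical system would more likely use a proper filter (e.g., a Butterworth band-pass) to control ringing and phase; the mask above is just the simplest way to show the microbarom energy being removed.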
3.4.2 Feature Extraction Process
Depicted in each of the six graphs in Figure 3.7 is a collection of eight signals from each class, that is, y_ij(t) for i = 1, 2, ..., 6 (classes) and j = 1, 2, ..., 8 (number of signals) (see Table 3.1 for the total number of signals in each class). A feature extraction process is desired that will capture the salient features of the signals in each class and at the same time be invariant to the array geometry, the geographical location of the array, the sampling frequency, and the length of the time window. The overall performance of the classifier is contingent on the data used to train the neural network in each of the six modules shown in Figure 3.2. Moreover, the neural network's ability to distinguish between the various events (presented to the neural networks as feature vectors) depends on the distinctiveness of the features between the classes. Within each class, however, it is desirable to have the feature vectors as similar to each other as possible.
There are two major questions to be answered: (1) What will cause the signals in one class to have markedly different characteristics? (2) What can be done to minimize these differences and achieve uniformity within a class and distinctively different feature vector characteristics between classes?
The answer to the first question is quite simple: noise. This can be noise associated with the sensors, the data acquisition equipment, or other unwanted signals that are not of interest. The answer to the second question is also quite simple (once you know the answer): use a feature extraction process based on computed cepstral coefficients and a subset of their associated derivatives (differences) [10,11,25–28].
As mentioned in Section 3.3, each classifier has its own dedicated preprocessor (see Figure 3.2). Customized feature vectors are computed optimally for each classifier (or neural module) and are based on the aforementioned cepstral coefficients and a subset of their associated derivatives (or differences). The preprocessing procedure is as follows. Each time-domain signal is first normalized, and then its mean value is computed and removed. Next, the power spectral density (PSD) is calculated for each signal, which is a mixture of the desired component and possibly other unwanted signals and noise. Therefore, when the PSDs are computed for a set of signals in a defined class, there will be spectral components associated with noise and other unwanted signals that need to be suppressed. This can be systematically accomplished by first computing the average PSD (i.e., PSD_avg) over the suite of PSDs for a particular class. The spectral components of PSD_avg are defined as m_i for i = 1, 2, .... The maximum spectral component, m_max, of PSD_avg is then determined. This is considered the dominant spectral component within a particular class, and its value is used to suppress selected components in the resident PSDs for any particular class according to the following:

if m_i > ε_1 · m_max (typically ε_1 = 0.001) then m_i ← m_i
else m_i ← ε_2 (typically ε_2 = 0.00001)
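The suppression rule above can be sketched in a few lines. The text is slightly ambiguous about whether the threshold test is applied to the average PSD or to each individual PSD; in this illustrative sketch the keep/floor mask comes from PSD_avg and is applied to each resident PSD.

```python
# Sketch of the dominant-component suppression rule (illustrative values).

def suppress(psd, psd_avg, eps1=0.001, eps2=0.00001):
    """Floor to eps2 every component whose average-PSD counterpart falls
    below the fraction eps1 of the dominant component m_max."""
    m_max = max(psd_avg)
    return [m if m_avg > eps1 * m_max else eps2
            for m, m_avg in zip(psd, psd_avg)]

psd_avg = [10.0, 0.5, 0.0001, 3.0]      # class-average PSD; m_max = 10.0
psd = [9.0, 0.6, 0.2, 2.5]              # one signal's PSD in that class
print(suppress(psd, psd_avg))  # [9.0, 0.6, 1e-05, 2.5]
```

Only the third component sits below ε_1·m_max = 0.01 in the class average, so it alone is floored to ε_2 in the individual PSD.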
FIGURE 3.6
Effect of band-pass filtering on the microbaroms problem (signal spectra before and after filtering; frequency axes in Hz).
To some extent, this will minimize the effects of any unwanted components that may reside in the signals and at the same time minimize the effects of noise. However, another step can be taken to further minimize the effects of any unwanted signals and noise that may reside in the data. This step is based on a minimum variance criterion applied to the spectral components of the PSDs in a particular class after the previously described step is completed. It is carried out by taking the first 90% of the spectral components that are rank-ordered according to the smallest variance. The rest of the components in the power spectral densities within a particular class are set to a small value, that is, ε_3 (typically 0.00001). Therefore, the number of spectral components greater than ε_3 will dictate the number of components in the cepstral domain (i.e., the number of cepstral coefficients and associated differences). Depending on the class, the number of coefficients and differences will vary. For example, in the simulations that were run, the largest number of components was 2401 (artillery class) and the smallest number was 543 (vehicle class). Next, the mel-frequency scaling step is carried out with defined values for a and b [10]; then the inverse discrete cosine transform is taken and the derivatives (differences) are computed.

FIGURE 3.7 (See color insert following page 178.)
Infrasound signals for six classes (raw time-domain signals, eight per class, including vehicle, missile, artillery, jet, and shuttle; time axes in sec).
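The minimum-variance selection step can be sketched as follows: rank spectral component indices by their variance across a class's suite of PSDs, keep the lowest-variance 90%, and set the rest to ε_3. This is illustrative code, not the authors' implementation.

```python
# Sketch of the minimum-variance component selection step.

def min_variance_select(psds, keep_frac=0.90, eps3=0.00001):
    """Keep the keep_frac lowest-variance components across a class's
    PSDs; floor the remaining (high-variance) components to eps3."""
    n = len(psds[0])
    def var(idx):
        vals = [p[idx] for p in psds]
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals) / len(vals)
    order = sorted(range(n), key=var)            # smallest variance first
    keep = set(order[: int(keep_frac * n)])
    return [[p[i] if i in keep else eps3 for i in range(n)] for p in psds]

# Two toy PSDs with ten components: only component 3 varies across the
# class, so it is the one floored when the top 10% are discarded.
p1 = [1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
p2 = [1.0, 1.0, 1.0, 9.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
out = min_variance_select([p1, p2])
```

Components that fluctuate wildly across signals of the same class carry little class-consistent information, which is why the criterion discards the highest-variance ones.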
From this set of computed cepstral coefficients and differences, it is desired to select those components that will constitute a feature vector that is consistent within a particular class, that is, one with minimal variation among similar components across the suite of feature vectors. The approach taken here is therefore to think in terms of the minimum variance of these similar components within the feature set.
Recall that the time-domain infrasound signals are assumed to be band-pass filtered to remove any effects of microbaroms, as described previously. For each discrete-time infrasound signal y(k), where k is the discrete-time index (an integer), the specific preprocessing steps are (dropping the time dependence k):

(1) Normalize the signal: divide each sample of y(k) by the absolute value of the maximum amplitude, |y_max|, and by the square root of the computed variance of the signal, σ_y, and then remove the mean.

(2) Compute the power spectral density, S_yy(k_ω), of the normalized signal, where R_yy(·) is the autocorrelation of the infrasound signal y.

(3) Find the average of the entire set of PSDs in the class, i.e., PSD_avg.

(4) Retain only those spectral components whose contributions will maximize the overall performance of the global classifier:

if m_i > ε_1 · m_max (typically ε_1 = 0.001) then m_i ← m_i
else m_i ← ε_2 (typically ε_2 = 0.00001)

(5) Compute the variances of the components selected in Step (4). Then take the first 90% of the spectral components that are rank-ordered according to the smallest variance. Set the remaining components to a small value, i.e., ε_3 (typically 0.00001).

(6) Apply mel-frequency scaling to S_yy(k_ω):

S_mel(k_ω) = a log_e[b S_yy(k_ω)]  (3.7)

where a = 11.25 and b = 0.03.
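The mel-scaling step (Equation 3.7) and the cepstral step the text describes next (inverse discrete cosine transform of the scaled spectrum, followed by first differences) can be sketched as below. The values of a and b are the chapter's; using an unnormalized DCT-III as the inverse transform and plain first differences is an assumption about the pipeline, not the authors' exact formulation.

```python
# Sketch: mel-frequency scaling (Eq. 3.7), inverse DCT, and differences.
import math

A, B = 11.25, 0.03   # the chapter's values for a and b

def mel_scale(psd):
    """S_mel(k) = a * log_e[b * S_yy(k)], applied componentwise."""
    return [A * math.log(B * s) for s in psd]

def idct(x):
    """Inverse DCT (unnormalized DCT-III) of a sequence."""
    n = len(x)
    return [x[0] / 2.0 + sum(x[k] * math.cos(math.pi * k * (i + 0.5) / n)
                             for k in range(1, n))
            for i in range(n)]

def differences(c):
    """First differences of the cepstral coefficients."""
    return [c[i + 1] - c[i] for i in range(len(c) - 1)]

psd = [4.0, 2.0, 1.0, 0.5]                 # toy suppressed/selected PSD
cepstrum = idct(mel_scale(psd))            # cepstral coefficients
feature_vector = cepstrum + differences(cepstrum)
```

The feature vector handed to a class's RBF module is then built from these coefficients and a subset of their differences, with the per-class component count (e.g., 2401 for artillery, 543 for vehicle) set by the earlier selection steps.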