
Issues in the use of neural networks in information retrieval.


DOCUMENT INFORMATION

Basic information

Title: Issues in the use of neural networks in information retrieval
Author: Iuliana F. Iatan
Series Editor: Janusz Kacprzyk
Institution: Technical University of Civil Engineering Bucharest
Field: Computational Intelligence
Type: Book chapter
Year of publication: 2017
City: Cham
Pages: 213
Size: 6.3 MB

Contents

  • 1.1 Information Retrieval Models (22)
  • 1.2 Mathematical Background (26)
    • 1.2.1 Discrete Cosine Transformation (26)
    • 1.2.2 Algorithm for Image Compression Using Discrete Cosine Transformation (27)
    • 1.2.3 Multilayer Nonlinear Perceptron (31)
    • 1.2.4 Fuzzy Neural Perceptron (34)
  • 1.3 A New Approach of a Possibility Function Based Neural Network (36)
  • 1.4 Architecture of the PFBNN (37)
  • 1.5 Training Algorithm of the PBFNN (39)
  • 1.6 Neural Networks-Based IR (45)
    • 1.6.1 Keyword Recognition Approach Based on the Fuzzy Nonlinear Perceptron (47)
    • 1.6.2 Text Document Retrieval Approach on the Base of a Spreading Activation Neural Network (50)
  • 2.1 Introduction (55)
  • 2.2 Related Work (58)
  • 2.3 Background (59)
    • 2.3.1 Measure of the Class Similarities (59)
    • 2.3.2 Similarity-Based Classification (60)
    • 2.3.3 Using Kohonen Algorithm in Vector Quantization (64)
    • 2.3.4 Fourier Descriptors (65)
    • 2.3.5 Fuzzy Neurons (69)
  • 2.4 Fuzzy Kwan – Cai Neural Network (71)
    • 2.4.1 Architecture of FKCNN (72)
    • 2.4.2 Training Algorithm of FKCNN (75)
    • 2.4.3 Analysis of FKCNN (76)
  • 2.5 Experimental Evaluation (77)
    • 2.5.1 Data Sets (77)
    • 2.5.2 Evaluation Criteria (78)
    • 2.5.3 Experimental Results (80)
  • 2.6 Face Recognition (84)
    • 2.6.1 Applying the Fuzzy Kwan – Cai Neural Network (87)
    • 2.6.2 Applying Kohonen Maps for Feature Selection (93)
  • 3.1 Classifying Personality Traits (98)
  • 3.2 Related Work (101)
    • 3.2.1 Personality and Word Use (101)
    • 3.2.2 Neural Methods (101)
  • 3.3 A Neural Network for Predicting Personality (102)
    • 3.3.1 Regression Using Neural Networks (103)
    • 3.3.2 Fuzzy Gaussian Neural Network (105)
    • 3.3.3 Architecture (106)
    • 3.3.4 Basic Equations (108)
    • 3.3.5 On-Line Weight Initialization (110)
    • 3.3.6 Training Algorithm (111)
  • 3.4 Experimental Evaluation (115)
    • 3.4.1 Task, Data Set, Data Processing and Evaluation (115)
    • 3.4.2 Baselines (117)
    • 3.4.3 Experimental Setup (118)
    • 3.4.4 Experimental Results and Analysis (119)
  • 4.1 Mathematical Background (123)
    • 4.1.1 Discrete Fourier Transform (124)
    • 4.1.2 Numerical Methods for Function Approximation (126)
  • 4.2 Fourier Series Neural Network (FSNN) (127)
  • 4.3 A New Neural Network for Function Approximation (130)
  • 4.4 Experimental Evaluation (133)
  • 5.1 Introduction (138)
    • 5.1.1 Basics of Clifford Algebras (139)
    • 5.1.2 Generation of Clifford Algebras (140)
  • 5.2 Background (143)
    • 5.2.1 Fuzzy Gaussian Neural Network (FGNN) (143)
    • 5.2.2 Using the Clifford Algebras in Neural Computing (143)
  • 5.3 Fuzzy Clifford Gaussian Neural Network (FCGNN) (145)
    • 5.3.1 Basic Equations (145)
    • 5.3.2 On-Line Weight Initialization (146)
    • 5.3.3 Training Algorithm (149)
  • 5.4 Experimental Evaluation (151)
    • 5.4.3 Data Sets (153)
    • 5.4.4 Experimental Results (156)
  • 6.1 Baselines (158)
    • 6.1.1 Principal Component Analysis (158)
  • 6.2 Face Recognition Using the Stage of the Feature (161)
  • 6.3 ECG Classification in the Case of the Feature (165)
  • 6.4 Concurrent Fuzzy Nonlinear Perceptron Modules (177)
  • 6.5 Concurrent Fuzzy Gaussian Neural Network Modules (180)
  • 6.6 Experimental Results (182)
  • 7.1 The Representation of the Fuzzy Numbers (185)
    • 7.1.1 Representing the Fuzzy Numbers by a Finite Number (188)
    • 7.1.2 Representing the Fuzzy Numbers by a Finite Number (190)
  • 7.2 A New Fuzzy Nonlinear Perceptron Based on Alpha (193)
    • 7.2.1 Network Architecture (193)
    • 7.2.2 The Training Algorithm of FNPALS (197)
  • 8.1 Introduction (0)
    • 8.1.1 Wavelet Neural Networks (0)
    • 8.1.2 Z-Transform (0)
    • 8.1.3 Application of Genetic Algorithms (0)
  • 8.2 RNFN Architecture (0)
  • 8.3 Learning Algorithm of RNFN (0)

Content

Issues in the use of neural networks in information retrieval

Information Retrieval Models

Web classification has been studied using many different technologies. The crucial models used in IRSs (Information Retrieval Systems) and on the Web are [6]:

The information retrieval models based on set theory are:

The Boolean model is the simplest form of an information retrieval (IR) model, built on the AND, OR, and NOT operators and often referred to as an exact-match model. It is easy to implement and computationally efficient, offering fast query processing. However, its drawbacks include difficulty in constructing complex queries, lack of ranked results, and no partial matching, yielding a binary decision of either a match or no match.

Fuzzy-set-based modeling is applied in IRSs to overcome the limitations of traditional Boolean approaches that rely on strict binary associations. By assigning information entities varying degrees of membership, the fuzzy framework introduces a gradual notion of membership or association rather than a binary decision. This better represents uncertainty and partial relevance, enabling more flexible query evaluation and improved retrieval effectiveness.

The extended Boolean model adds value to the simpler approach by enabling weight assignment and the use of positional information, allowing more precise relevance assessment. Weights assigned to data objects drive the ranked output, improving retrieval effectiveness. As a synthesis of vector model characteristics and Boolean algebra, the extended Boolean model combines the strengths of weighted scoring with exact matching to deliver more accurate search results.

Besides the IR models based on logical reasoning, there are others based on algebraic calculus, such as:

Vector Space Model (VSM) represents information as vectors in a multidimensional space, with each dimension corresponding to a potential feature such as a term in a document. A distance function applied to these information vectors provides match scores and ranking information essential for retrieval. VSM-based information retrieval is a strong mathematical framework for processing large information sources, offering partial matching capabilities and the delivery of ranked results (a small cosine-similarity sketch follows this list).

4 Reshadat, V., and Feizi-Derakhshi, M.R., Neural Network-Based Methods in Information Retrieval, American Journal of Scientific Research, 2012, 58, 33–43.


Fig 1.1 Vector space model (VSM with Euclidean distance measure; VSM with angular (cosine) distance measure)

The VSM lacks the query control of the Boolean model, and has no means to handle semantic or syntactical information.

• Latent Semantic Analysis (LSA) based model converts the large information matrix (term-document) to a lower dimensional space using the singular value decomposition (SVD) technique.

• Neural Networks use the weighted and interconnected representation of information.
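To make the VSM concrete, here is a minimal sketch (not from the book; the toy corpus, raw term-frequency weighting, and function names are illustrative assumptions) that builds term vectors and ranks documents by cosine similarity, the angular distance measure mentioned above.

```python
import math
from collections import Counter

def term_vector(text, vocabulary):
    """Raw term-frequency vector of a text over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[t] for t in vocabulary]

def cosine_similarity(x, y):
    """Angular (cosine) match score between two term vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0

# Toy document collection and query (illustrative only).
docs = ["neural networks for information retrieval",
        "boolean model of information retrieval",
        "image compression with the discrete cosine transform"]
vocabulary = sorted({w for d in docs for w in d.lower().split()})

doc_vectors = [term_vector(d, vocabulary) for d in docs]
query_vec = term_vector("neural information retrieval", vocabulary)

# Ranked output: a higher cosine score means a better partial match.
ranking = sorted(enumerate(doc_vectors),
                 key=lambda iv: cosine_similarity(query_vec, iv[1]),
                 reverse=True)
print([i for i, _ in ranking])
```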

The models based on probabilistic inferences include:

• inference network (represented in Fig.1.2) consists of a document network and a query network.

• belief network IR model is a generalized form of inference network model having a clearly defined sample space.

The knowledge-based IR models use formalized linguistic information, structural and domain knowledge to discover semantic relevance between objects.

The structural information retrieval models combine the content and structural characteristics to achieve greater retrieval efficiencies in many applications.

Data mining and knowledge discovery from databases rely on a wide range of algorithms and techniques These include classification, clustering, regression, artificial intelligence, neural networks, association rules, decision trees, genetic algorithms, and the nearest-neighbor method, among others By leveraging these approaches, organizations can uncover hidden patterns, relationships, and insights from large datasets, enabling better decision making and predictive analytics.

With the growing volume of information, the deficiencies of traditional algorithms for fast information retrieval become more evident [5]. When large data volumes must be processed, neural networks (NNs), as an artificial intelligence technique, provide a suitable approach to increasing information retrieval (IR) speed. Neural networks can model complex data patterns to accelerate search operations, improve ranking, and enhance overall IR performance.

Fig 1.2 Document inference network

Neural approaches can reduce "the dimension of the document search space with preserving the highest retrieval accuracy." 5

Neural networks are a fundamental component of artificial intelligence (AI), studied for many years to achieve human‑like performance across diverse tasks, including classification, clustering, and pattern recognition, as well as speech and image recognition and information retrieval by modeling the human neural system.

A NN represents “an oversimplified representation of the neuron interconnections in the human brain,” 6 in the sense that [8]:

• the nodes constitute the processing units;

• the arcs (edges) mean the synaptic connections;

• the strength of a propagation signal can be simulated by a weight, which is associated with each edge;

• the state of a neuron is defined by its activation function;

• the axon can be modeled by the output signal which is issued by a neuron depending on its activation function.

Neural network architectures used in the modeling of the nervous systems can be divided into three categories, each of them having a different philosophy:

5 Reshadat, V., and Feizi-Derakhshi, M.R., Neural Network-Based Methods in Information Retrieval, American Journal of Scientific Research, 2012, 58, 33–43.

6 Bashiri, H., Neural Networks for Information Retrieval, http://www.powershow.com/view1/1a f079-ZDc1Z/Neural_Networks_for_Information_Retrieval_powerpoint_ppt_presentation, 2005.


(1) feedforward neural networks, for which the transformation of the input vectors into the output vectors is determined by the refining of the system parameters;

(2) feedback neural networks, also called recurrent neural networks (RNNs), which use the input information to establish the initial activity state of the feedback system; as processing proceeds through intermediate states, the system's dynamics evolve until an asymptotic final state is reached, which serves as the result of the computation;

(3) self-organizing maps (introduced by Kohonen), within which the neighboring cells communicate with each other.

Neural networks excel at information retrieval (IR) across large-scale text collections and multimedia databases, and have been widely applied in IR and text mining They enable key tasks such as text classification, text clustering, and collaborative filtering, making them a powerful tool for processing and understanding vast textual and multimedia data.

With the rapid growth of the World Wide Web and the Internet in recent years, these algorithms have been increasingly applied to web-related tasks They are used for web searching, web page clustering, and web mining, among other applications, enabling more effective information retrieval and data analysis online.

The capacity for tolerant and intuitive processing of the NNs offers new perspectives in IR [4,5,9,10].

Using NNs in IR has [9] the following advantages:

• when the requested data (keywords) are not in the document collection, the NNs can be used to retrieve information in the proximity of the required information;

• the information can be classified according to common patterns.

Model of the IRS with NNs “comes from the model based on statistical, linguistic and knowledge-based approach, which expresses document content and document relevance.” 7

Various neural networks such as Kohonen’s Self-Organizing Map, Hopfield net, etc., have been applied [5] to IR models.

In the paper, we demonstrated the potential of neural networks for information retrieval (IR) and highlighted the advantages of using two neural network models to simplify the complex architecture of an IR system by substituting the interactions between subsystems with neural components.

In this work, we reduced the text documents based on the Discrete Cosine Transformation (DCT), by which the set of keywords is reduced to a much smaller feature set.

IR is a branch of Computing Science that aims at storing and allowing fast access to a large amount of information.

Technically, IR studies the acquisition, organization, storage, retrieval, and distri- bution of information.

Information retrieval (IR) and SQL-based data retrieval serve different purposes: in databases, data are highly structured and stored in relational tables, which SQL queries navigate to fetch precise records, while information in natural text is largely unstructured, requiring IR techniques to index, parse, and rank documents based on relevance rather than rigid schemas.

There is no structured query language like SQL for text retrieval.

7 Mokriš, I., and Skovajsová, L., Neural Network Model of System for Information Retrieval from Text Documents in Slovak Language, Acta Electrotechnica et Informatica, 2005, 3(5), 1–6.

Mathematical Background

Discrete Cosine Transformation

For large document collections, the high dimension of the vector space matrix F causes problems in text document set representation and high computing complexity in IR.

The methods of text document space dimension reduction most often applied in IR are the Singular Value Decomposition (SVD) and Principal Component Analysis (PCA).

To reduce the text documents in this work, we use the discrete cosine transform (DCT), which condenses the keyword set into a much smaller feature set The resulting model provides a latent semantic representation of the documents, enabling efficient text analysis and retrieval.

The Discrete Cosine Transformation (DCT) [12] is an orthogonal transformation like the PCA, and the elements of the transformation matrix are computed using the following formula:

$$t_{mi}=\sqrt{\frac{2-\delta_{m1}}{n}}\;\cos\frac{\pi (m-1)(2i-1)}{2n},\quad \forall\, i,m=\overline{1,n},\qquad(1.1)$$

n being the size of the transformation and

$$\delta_{m1}=\begin{cases}1, & \text{if } m=1,\\ 0, & \text{otherwise}.\end{cases}$$

The DCT requires the transformation of the vectors $X_p$, $p=\overline{1,N}$ (N represents the number of vectors that must be transformed), of dimension n, into the vectors

$$Y_p = T\, X_p,\quad p=\overline{1,N},$$

$T=\{t_{mi}\}_{i,m=\overline{1,n}}$ being the transformation matrix.

From the transformed vectors Y_p, p = 1,...,N, we select only m components by using a mean-square criterion: compute the mean-square value for each component position, sort these values in descending order, and retain the components whose positions correspond to the top m mean-square values. The remaining n − m components are cancelled (set to zero).

If the vector with index p, p = 1,...,N, obtained through the formula (1.3) is

$$Y_p=\left(y_{p1},\,y_{p2},\,\ldots,\,y_{pn}\right)^t,$$

then the mean square of the transformed vectors is

$$\overline{y_i^2}=\frac{1}{N}\sum_{p=1}^{N} y_{pi}^2,\quad i=\overline{1,n}.$$


The DCT application involves determining the vectors Ŷ_p, p = 1,...,N, that contain those m components of the vectors Y_p, p = 1,...,N, that are not cancelled.
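The following sketch (an illustration under the assumptions above, not the book's code; array names and the toy dimensions are hypothetical) builds the n × n DCT matrix from relation (1.1), transforms a set of document vectors, and retains only the m components with the largest mean-square values, cancelling the rest.

```python
import numpy as np

def dct_matrix(n):
    """Orthogonal DCT transformation matrix T = {t_mi} from relation (1.1)."""
    m = np.arange(n).reshape(-1, 1)      # corresponds to m - 1 = 0..n-1
    i = np.arange(n).reshape(1, -1)      # corresponds to i - 1 = 0..n-1
    T = np.sqrt(2.0 / n) * np.cos(np.pi * m * (2 * i + 1) / (2 * n))
    T[0, :] = np.sqrt(1.0 / n)           # delta_{m1} correction for the first row
    return T

def reduce_documents(X, m_keep):
    """Transform the column vectors of X (n x N) and keep the m components
    with the largest mean-square value across the collection."""
    n, N = X.shape
    T = dct_matrix(n)
    Y = T @ X                            # Y_p = T * X_p for every document p
    mean_square = (Y ** 2).mean(axis=1)  # mean-square value of each position
    keep = np.argsort(mean_square)[::-1][:m_keep]
    Y_hat = np.zeros_like(Y)
    Y_hat[keep, :] = Y[keep, :]          # cancel the remaining n - m components
    return Y_hat, keep

# Example: 64 keywords, 10 documents, keep 8 DCT features per document.
rng = np.random.default_rng(0)
X = rng.random((64, 10))
Y_hat, kept = reduce_documents(X, 8)
print(kept)
```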

Algorithm for Image Compression Using Discrete Cosine Transformation

Digital image processing involves [13–15] a succession of hardware and software processing steps and implementation of the theoretical methods.

The first stage of this process is image acquisition, which requires an image sensor. A video camera (such as the pinhole camera model, one of the simplest camera models) can perform this role and contribute to producing a two-dimensional image.

An analog video camera output is continuous in both time and amplitude and must be converted into a digital signal to be processed by a computer. This analog-to-digital conversion entails three essential stages: sampling the continuous signal to obtain discrete-time values, quantizing these samples to a finite set of amplitude levels, and encoding the quantized levels into a binary digital representation for storage and processing, as noted in [14].

Step 1 (Spatial sampling) This step aims to perform the spatial sampling of the continuous light distribution. The spatial sampling of an image means the conversion of the continuous signal to its discrete representation and depends on the geometry of the sensor elements of the acquisition device.

Step 2 (Temporal sampling) During this step, the resulting discrete function has to be sampled in the time domain to create a single image. The temporal sampling is performed by measuring at regular intervals the amount of light incident on each individual sensor element.

Step 3 (Quantization of pixel values) The purpose of the present step is to quantize the resulting image values to a finite set of numeric values in order to store and process the image on the computer.

Definition 1.1 ([14]) A digital image I is a two-dimensional function of natural coordinates (u, v) ∈ N × N, which maps to a range of possible image (pixel) values.

The pixel values are binary words of length k (which is called the depth of the image), so that a pixel can represent any of 2^k different values.

Fig 1.3 Transformation of a continuous intensity function F ( x , y ) to a discrete digital image

Fig 1.4 Image coordinates (see footnote 8)

For example, the pixels of the grayscale images:

• are represented using k = 8 bits (1 byte) per pixel;

• have the intensity values in the set {0, 1, ..., 255}, where the value 0 represents the minimum brightness (black) and 255 the maximum brightness (white).

The results of Steps 1 through 3 are presented as a description of the image in the form of a two-dimensional, ordered matrix of integers (see Fig 1.3) Figure 1.4 illustrates the image-processing coordinate system, which is flipped vertically so that the origin (u = 0, v = 0) sits at the upper-left corner, aligning with standard digital imaging conventions.

8 Burger, W., and Burge, M.J., Principles of Digital Image Processing: Fundamental Techniques, Springer-Verlag London, 2009.


The coordinates u and v represent the columns and the rows of the image, respectively.

In the case of an image with dimensions M × N, the maximum column number is u_max = M − 1 and the maximum row number is v_max = N − 1.

After obtaining the digital image, it must be preprocessed in order to improve it; some examples of preprocessing techniques for images are:

1. image enhancement, which involves transforming the images to highlight hidden or obscure details, features of interest, etc.;

2. image compression, performed to reduce the amount of data required to represent a given amount of information;

3. image restoration, which corrects the errors that appear during image capture.

Although several methods exist for image compression, the Discrete Cosine Transform (DCT) provides a favorable balance between compression efficiency and computational complexity A key advantage of DCT-based image compression is its data-independent nature, meaning the transform's performance does not depend on the specific input image content, enabling reliable and scalable compression across diverse images.

The DCT algorithm, used for the compression of a 256 × 256 image represented by the matrix of integers X = (x_ij), i, j = 1,...,256, where x_ij ∈ {0, 1, ..., 255} are the original pixel values, consists of the following steps [16]:

Step 1 Split the original image into 8×8 pixel blocks (1024 image blocks).

Step 2 Process each block by applying the DCT, on the basis of the relation (1.3).

Step 3 Retain, in a zigzag fashion, the first nine coefficients of each transformed block and cancel the remaining 64 − 9 coefficients (namely, set them to 0), as illustrated in Fig. 1.5.

Step 4 Apply the inverse DCT to each of the 1024 blocks resulting from the previous step.

Step 5 Achieve the compressed image represented by the matrix X̂ = (x̂_ij), i, j = 1,...,256, where x̂_ij denote the encoded pixel values, and convert them into integer values.

Step 6 Evaluate the performances of the DCT compression algorithm in terms of the Peak Signal-to-Noise Ratio (PSNR), given by [16,17]:

$$\mathrm{PSNR}=10\,\log_{10}\frac{255^2}{\mathrm{MSE}},$$

Fig 1.5 Zigzag fashion to retain the first nine coefficients

where the Mean Squared Error (MSE) is defined as follows [16,17]:

$$\mathrm{MSE}=\frac{1}{N\times N}\sum_{i=1}^{N}\sum_{j=1}^{N}\left(x_{ij}-\hat{x}_{ij}\right)^2,$$

N × N being the total number of pixels in the image (in our case N = 256).
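A compact sketch of Steps 1-6 follows (an illustration only, not the book's MATLAB implementation; the zigzag ordering, the use of NumPy, and all function names are assumptions, and the DCT matrix is the standard orthogonal one from relation (1.1)).

```python
import numpy as np

def dct_matrix(n):
    """Orthogonal DCT matrix built from relation (1.1)."""
    m = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    T = np.sqrt(2.0 / n) * np.cos(np.pi * m * (2 * i + 1) / (2 * n))
    T[0, :] = np.sqrt(1.0 / n)
    return T

T8 = dct_matrix(8)

# Zigzag scan order for an 8 x 8 block (used to pick the first 9 coefficients).
zigzag = sorted(((r, c) for r in range(8) for c in range(8)),
                key=lambda rc: (rc[0] + rc[1],
                                rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def compress_block(block, keep=9):
    """DCT an 8 x 8 block, retain the first `keep` zigzag coefficients, invert."""
    coeffs = T8 @ block @ T8.T                 # Step 2: 2-D DCT of the block
    mask = np.zeros((8, 8))
    for r, c in zigzag[:keep]:                 # Step 3: keep 9, cancel 64 - 9
        mask[r, c] = 1.0
    return T8.T @ (coeffs * mask) @ T8         # Step 4: inverse DCT

def compress_image(img, keep=9):
    """Steps 1-5 for a 256 x 256 grayscale image split into 8 x 8 blocks."""
    out = np.empty_like(img, dtype=float)
    for r in range(0, 256, 8):
        for c in range(0, 256, 8):
            out[r:r+8, c:c+8] = compress_block(img[r:r+8, c:c+8].astype(float), keep)
    return np.clip(np.rint(out), 0, 255)       # Step 5: back to integer pixel values

def psnr(original, compressed):
    """Step 6: PSNR = 10 log10(255^2 / MSE)."""
    mse = np.mean((original.astype(float) - compressed.astype(float)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```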

We have experimented with the compression algorithm using the DCT on the Lena.bmp image [16], which has 256 × 256 pixels and 256 levels of gray; it is represented in Fig. 1.6.

Table 1.1 and Fig. 1.7 present the experimental results achieved by implementing the DCT compression algorithm in MATLAB.

Fig 1.6 Image Lena.bmp (see http://www.cosy.sbg.ac.at/~pmeerw/Watermarking/lena.html)

Table 1.1 Experimental results achieved by implementing the DCT compression algorithm: number of retained coefficients versus the performances of the DCT compression algorithm

Fig 1.7 Visual evaluation of the performances corresponding to the DCT compression algorithm


Multilayer Nonlinear Perceptron

The Nonlinear Perceptron (NP) is the simplest, the most used and also the oldest neural network. The name "perceptron" derives from perception.

The classical structure of an NP (it is [18] a feedforward neural network, which contains three layers of neurons) is represented in Fig. 1.8.

An NP with more than one hidden layer is called a Multilayer Nonlinear Perceptron (MNP).

In a multilayer perceptron (MLP), the first layer consists of virtual neurons whose role is multiplexing rather than signal processing The actual data processing occurs in the hidden layers where learning happens, and the final outputs are produced in the output layer.

The Equations of the Neurons from the Hidden Layer

Fig. 1.9 shows the connections of a neuron from the hidden layer, where:

• X_p = (x_{p1}, ..., x_{pn}) signifies a vector, which is applied to the NP input;

• {W_{ji}^h}, j = 1,...,L, i = 1,...,n, represents the set of weights corresponding to the hidden layer;

• L is the number of neurons belonging to the hidden layer;

• n represents the number of neurons of the input layer.

Fig 1.8 The classical structure of a NP with three layers of neurons

Fig 1.9 The connections of the neuron j ( j = 1 , L ) from the hidden layer

The upper index "h", which occurs in the notation of the weights corresponding to this layer, comes from "hidden".

There are two processing stages:

During this stage, we shall define the activation of the neuron j from the hidden layer, when the vector X_p is applied at the network input:

$$net_{pj}^h=\sum_{i=1}^{n}W_{ji}^h\cdot x_{pi},\quad \forall\, j=\overline{1,L},\qquad(1.7)$$

where net_{pj}^h means the activation of the neuron j.

If we denote W_j^h = (W_{j1}^h, ..., W_{jn}^h)^t, then

$$net_{pj}^h=\langle W_j^h, X_p\rangle=(W_j^h)^t\cdot X_p,\quad \forall\, j=\overline{1,L},\qquad(1.8)$$

where ⟨ , ⟩ means the scalar product and t signifies the transpose operation.

This stage involves the calculation of the output of the hidden neuron j, according to the formula:

$$i_{pj}=f_j^h(net_{pj}^h),\quad \forall\, j=\overline{1,L},\qquad(1.9)$$

where f_j^h is a nonlinear function of the type f : \mathbb{R} \to \mathbb{R},

$$f(x)=\frac{1}{1+e^{-\lambda x}}\quad(\lambda>0\ \text{being a constant});$$

for example, f can be the nonlinear sigmoid (see Fig. 1.10) or f(x) = \tanh x (the nonlinear hyperbolic tangent function), see Fig. 1.11.

The Equations of the Neurons from the Output Layer

Fig. 1.12 represents the connections of a neuron from the output layer, where:

• M constitutes the number of neurons belonging to the output layer;

• {W_{kj}^o}, k = 1,...,M, j = 1,...,L, represents the set of weights corresponding to the output layer (the index "o" comes from "output").


Fig 1.10 The nonlinear sigmoid function

Fig 1.11 The nonlinear hyperbolic tangent function

Fig 1.12 The connections of the neuron k ( k = 1 , M ) from the output layer

There are two processing stages:

By applying the vector X_p at the network input, the activation of the neuron k from the output layer is given by the formula:

$$net_{pk}^o=\sum_{j=1}^{L}W_{kj}^o\cdot i_{pj},\quad \forall\, k=\overline{1,M},\qquad(1.10)$$

where net_{pk}^o is the activation of the neuron k.

The output of neuron k will be determined as a function of the activation net_{pk}^o and is expressed by the relation:

$$o_{pk}=f_k^o(net_{pk}^o),\quad \forall\, k=\overline{1,M},\qquad(1.11)$$

where f_k^o can be a sigmoid or a hyperbolic tangent function.

The training algorithm of the NP is supervised, of the backpropagation type, and aims to minimize the error on the training lot,

$$E=\sum_{p=1}^{K}E_p,\qquad(1.12)$$

K being the number of vectors from the training lot, and

$$E_p=\frac{1}{2}\sum_{k=1}^{M}\left(y_{pk}-o_{pk}\right)^2\qquad(1.13)$$

constitutes the error determined by the training vector with index p, where:

• Y_p = (y_{p1}, ..., y_{pM}) represents the ideal vector;

• O_p = (o_{p1}, ..., o_{pM}) signifies the real output vector.
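A minimal sketch of the forward pass (1.7)-(1.11) and the training error (1.12)-(1.13) follows; the NumPy weight shapes, the sigmoid choice, and the toy dimensions are assumptions consistent with the text, not the book's code.

```python
import numpy as np

def sigmoid(x, lam=1.0):
    """Nonlinear activation f(x) = 1 / (1 + exp(-lambda * x))."""
    return 1.0 / (1.0 + np.exp(-lam * x))

def forward(X_p, W_h, W_o):
    """Forward pass of a three-layer nonlinear perceptron.
    X_p: input vector of size n, W_h: L x n hidden weights, W_o: M x L output weights."""
    net_h = W_h @ X_p            # (1.7)-(1.8): hidden-layer activations
    i_p = sigmoid(net_h)         # (1.9): hidden-layer outputs
    net_o = W_o @ i_p            # (1.10): output-layer activations
    o_p = sigmoid(net_o)         # (1.11): network outputs
    return i_p, o_p

def training_error(outputs, targets):
    """(1.13)-style squared error for one training vector."""
    return 0.5 * np.sum((np.asarray(targets) - np.asarray(outputs)) ** 2)

# Toy dimensions: n = 4 inputs, L = 3 hidden neurons, M = 2 outputs.
rng = np.random.default_rng(1)
W_h, W_o = rng.standard_normal((3, 4)), rng.standard_normal((2, 3))
_, o_p = forward(rng.random(4), W_h, W_o)
print(training_error(o_p, np.array([1.0, 0.0])))
```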

Fuzzy Neural Perceptron

We shall build two variants of the multilayer perceptron that will be used for an IR model; they are denoted FNP (Fuzzy Nonlinear Perceptron) followed by a digit [19], which indicates the version of the fuzzy perceptron:

Employing the first fuzzy variant, FNP1, expands the input vector to a dimension of 5n, derived from the original size-n vectors fed into a classical perceptron, where each component is represented by five membership values corresponding to the linguistic terms unimportant, rather unimportant, moderately important, rather important, and very important These membership values can be expressed as fuzzy numbers (see Fig 1.13) The outputs of FNP1, despite the fuzzy input representation, are non-fuzzy (crisp).

The membership functions of the fuzzy terms unimportant, rather unimportant, moderately important, rather important, and very important, respectively, are defined [20] in the following relations:


Fig 1.13 The membership functions of importance

An alternative formulation of the fuzzy perceptron treats the ideal outputs as fuzzy themselves For every input vector applied to the network, the corresponding ideal output vector is defined in a fuzzy form, with each component representing a degree of membership rather than a binary value This ideal output vector is determined by the chosen fuzzy logic framework, reflecting partial truth values for each output dimension Such a fuzzy-target approach can better handle uncertainty in data and improve learning when labels are noisy or imprecise.

each component being represented [21] by the nonlinear function:

$$\mu(y_{ik})=\frac{1}{1+\left(\frac{z_{ik}}{F_d}\right)^{F_e}},\quad i=\overline{1,M},\ k=\overline{1,K},\qquad(1.19)$$

where:

• K is the number of vectors from the training lot and from the test lot, respectively;

• M means the number of classes;

• z_ik is the weighted distance between each input vector X_k = (x_{k1}, x_{k2}, ..., x_{kn}), k = 1,...,K, and the mean vector of each class Med_i = (med_{i1}, med_{i2}, ..., med_{in}), i = 1,...,M, namely:

$$z_{ik}=\left[\sum_{h=1}^{n}\left(x_{kh}-med_{ih}\right)^{2}\right]^{1/2},\quad \forall\, k=\overline{1,K},\ i=\overline{1,M},\qquad(1.20)$$

n being the dimension of the vectors;

• F_d and F_e are parameters that control the amount of fuzziness in the class membership.
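A small sketch of this class-membership computation follows; it is one reading of (1.19)-(1.20) under the assumptions stated above (Euclidean distance to the class mean, membership of the form 1/(1 + (z/F_d)^{F_e})), with illustrative parameter values, and is not the authors' implementation.

```python
import numpy as np

def class_memberships(X_k, class_means, F_d=1.0, F_e=2.0):
    """Fuzzy membership of sample X_k in each of the M classes.
    z_ik: distance from X_k to the mean vector Med_i of class i (assumed Euclidean);
    F_d, F_e: parameters controlling the amount of fuzziness."""
    z = np.linalg.norm(class_means - X_k, axis=1)   # z_ik, i = 1..M
    return 1.0 / (1.0 + (z / F_d) ** F_e)           # assumed form of (1.19)

# Toy example: 2 classes in a 3-dimensional feature space.
means = np.array([[0.0, 0.0, 0.0],
                  [1.0, 1.0, 1.0]])
print(class_memberships(np.array([0.2, 0.1, 0.0]), means))
```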

A New Approach of a Possibility Function Based Neural Network

Probabilistic Neural Network (PNN) is a type of network derived from radial basis function (RBF) networks, grounded in Bayesian minimum risk criteria as its theoretical basis In pattern classification, the PNN’s main advantage is substituting a nonlinear learning algorithm with a linear learning algorithm, enabling efficient, fast, and probabilistic decision-making in classification tasks.

In recent years, probabilistic neural networks have also been used in the field of IR [23] and face recognition [22], due to their structure, good approximation, fast training speed, and good real-time performance.

This section introduces a new type of fuzzy neural network called the Possibility Function-based Neural Network (PFBNN). The PFBNN combines the strengths of traditional neural networks with the ability to process a group of possibility functions as input, enabling more flexible reasoning under uncertainty. Unlike standard neural networks, the PFBNN can accept multiple possibility functions at the input layer, expanding its modeling capacity for imprecise information. The key advantage of the PFBNN is its dual capability: it performs like a conventional neural network for regular data while integrating possibility-based inputs to handle uncertainty and ambiguity.

The PFBNN discussed [24] in this section has novel structures, consisting of two stages:


1. the first stage of the network is fuzzy based and it has two parts: a Parameter Computing Network (PCN), followed by a Converting Layer (CL);

2. the second stage of the network is a standard backpropagation-based neural network (BPNN).

Within a possibility function-based network, the PCN can be used to predict functions, while the CL converts a possibility function into a numeric value, providing the essential layer for data classification The network can operate as a classifier using either the PCN in conjunction with the CL or the CL alone Additionally, using only the PCN enables a transformation from one group of possibility functions to another, supporting flexible functional mappings across representations.

Architecture of the PFBNN

From Fig. 1.14, which shows the block diagram of the proposed fuzzy neural network, we can notice that there are two stages:

(1) the first stage of the network is fuzzy based and it has two parts: a Parameter Computing Network (PCN), followed by a Converting Layer (CL);

(2) the second stage of the network is a standard backpropagation-based neural network (BPNN).

A segmented PBFNN remains capable of performing useful functions, preserving its utility even when split into parts The network can operate as a classifier using either the PCN together with the CL or the CL alone Moreover, using only the PCN enables a transformation from one group of possibility functions to another, highlighting its versatile functionality.

Figure 1.14 shows three interconnecting networks that use three types of weight variables to define their connections. The first type, called a λ-weight, specifies the connection weights between nodes within the PCN, while the second type, called an r-weight, specifies the connection weights from the PCN's output to the CL.

Fig 1.14 The framework of a possibility-based fuzzy neural network

The third type, called a w-weight, is used for the connection weights between the neurons in a standard BPNN.

As the w-weights are adjusted according to standard backpropagation algorithms, we shall discuss only the setting and adjustment of the λ- and r-weights.

The PCN accepts as input a vector representing a group of possibility functions and generates a group of possibility functions as output.

In the PCN, the weights associated with the neurons corresponding to the λ layers are λ_{ij}^{(k)}, where:

• k (k = 1,...,t) represents the order of the PCN layer,

• i is the index of the neuron from the (k−1)-th layer,

• j means the index of the neuron from the k-th layer,

• L_k is the number of neurons of the k-th layer, L_0 = n, with λ_{ij}^{(k)} : [0,1] → [−1,1]; the λ-weights are always positive or always negative.

Each λ_{ij}^{(k)} is represented as a binary tuple (ρ_{ij}^{(k)}, ω_{ij}^{(k)}), where:

• ρ_{ij}^{(k)} is a transformation function from [0,1] to [0,1];

• ω_{ij}^{(k)} is a constant real number in [−1,1].

One can use a fuzzy normal distribution function

$$f(x)=e^{-\frac{(x-\mu)^2}{2\sigma^2}}\qquad(1.21)$$

for each element x of the crisp input vector X to obtain the fuzzified input data of the PCN.

We shall compute the outputs of the neurons from the k-th layer of the PCN using the relation:

$$y_j^{(k)}(u)=\sum_{i=1}^{L_{k-1}}\omega_{ij}^{(k)}\left(y_i^{(k-1)}\circ\left(\rho_{ij}^{(k)}\right)^{-1}\right)(u),\quad j=\overline{1,L_k},\qquad(1.22)$$

where:

• Y^{(k)} = (y_1^{(k)}, ..., y_{L_k}^{(k)}) is the output vector of the k-th layer of the PCN,

• Y^{(k−1)} = (y_1^{(k−1)}, ..., y_{L_{k−1}}^{(k−1)}) constitutes the input vector of the k-th layer (namely, the output vector of the (k−1)-th layer),

• "◦" means the composition of two functions,

• (ρ_{ij}^{(k)})^{−1} is the inverse function of ρ_{ij}^{(k)}.

The CL accepts as input a possibility function (representing a possibility vector) generated by the PCN and transforms it into a real-number vector.

Each weight of this layer is a function r_{ij} : [0,1] → [−1,1], i = 1,...,L_t, j = 1,...,M, where L_t is the number of neurons from layer t of the PCN and M is the number of output neurons of the CL.

Similar to lambda in the PCN, r_ij is always positive or always negative The weights of the CL can also be represented as a binary tuple r_ij = (γ_ij, τ_ij), with i = 1, , L_t and j = 1, , M, where γ_ij: [0,1] → [0,1] is a possibility function that differs from ρ, the transformation function in the PCN, and τ_ij is a constant real number in [−1,1].

The output Z = (z_1, ..., z_M) of the CL is a vector having real numbers as components, given by the following formula:

$$z_j=\sum_{i=1}^{L_t}\tau_{ij}\cdot\max_{u\in[0,1]}\min\left(y_i^{(t)}(u),\,\gamma_{ij}(u)\right),\quad j=\overline{1,M},\qquad(1.24)$$

Y^{(t)} = (y_1^{(t)}, ..., y_{L_t}^{(t)}) being the fuzzy input vector of the CL, which constitutes the output vector of the PCN.

We shall use y_i^{(t)}(u)·γ_{ij}(u) instead of min(y_i^{(t)}(u), γ_{ij}(u)) in order to compute the outputs of the CL more easily using (1.24).
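As an illustration only (not the authors' implementation): possibility functions can be represented numerically by sampling u on a grid, after which the converting layer (1.24) becomes a max-min over that grid. Fuzzification of a crisp input via the fuzzy normal distribution (1.21) is included; the grid resolution, all parameter values, and array shapes are assumptions.

```python
import numpy as np

U = np.linspace(0.0, 1.0, 101)          # discretized domain for u in [0, 1]

def fuzzify(x, sigma=0.1):
    """(1.21): fuzzy normal distribution centred at the crisp value x."""
    return np.exp(-(U - x) ** 2 / (2.0 * sigma ** 2))

def converting_layer(Y_t, gamma, tau):
    """(1.24): z_j = sum_i tau_ij * max_u min(y_i(u), gamma_ij(u)).
    Y_t: L_t x |U| possibility functions from the PCN output,
    gamma: L_t x M x |U| weight functions, tau: L_t x M constants."""
    L_t, M = tau.shape
    z = np.zeros(M)
    for j in range(M):
        for i in range(L_t):
            z[j] += tau[i, j] * np.max(np.minimum(Y_t[i], gamma[i, j]))
    return z

# Toy sizes: 3 PCN output neurons, 2 CL output neurons.
rng = np.random.default_rng(2)
Y_t = np.vstack([fuzzify(x) for x in (0.2, 0.5, 0.8)])
gamma = np.exp(-(U[None, None, :] - rng.random((3, 2, 1))) ** 2 / 2.0)  # (1.26)-style
tau = rng.uniform(-1.0, 1.0, size=(3, 2))
print(converting_layer(Y_t, gamma, tau))
```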

Training Algorithm of the PBFNN

We shall build the training algorithm of the PBFNN under the hypothesis that the PCN has three layers (namely, t = 3):

1. the input layer, which contains L_0 = n neurons;

2. a hidden layer, which contains L_1 neurons;

3. an output layer with L_2 neurons.


Step 1 Initialize the weights of the PCN and CL in the following way:

(a) choose a linear function as the initial weight function for each ρ:

$$\rho_{ij}^{(k)}(u_i)=v_j,\quad i=\overline{1,L_t},\ j=\overline{1,M},\qquad(1.25)$$

and design a genetic algorithm to search for optimal ω's;

(b) let each weight function γ be a possibility function:

$$\gamma_{ij}(u)=e^{-\frac{(u-u_0)^2}{2\sigma^2}},\quad u_0\in[0,1],\ i=\overline{1,L_t},\ j=\overline{1,M},\qquad(1.26)$$

usually assigning σ = 1, and design a genetic algorithm to search for optimal τ's.

Let Y^{(0)} = (y_1^{(0)}, ..., y_{L_0}^{(0)}) be the fuzzy input vector of the PCN corresponding to the training vector with index p.

Step 2 Compute the fuzzy output vector Y^{(1)} = (y_1^{(1)}, ..., y_{L_1}^{(1)}) of the hidden layer of the PCN using the relation:

$$y_j^{(1)}=\sum_{i=1}^{L_0}\omega_{ij}^{(1)}\left(y_i^{(0)}\circ\left(\rho_{ij}^{(1)}\right)^{-1}\right),\quad j=\overline{1,L_1}.\qquad(1.27)$$

Step 3 Compute the fuzzy output vector Y^{(2)} = (y_1^{(2)}, ..., y_{L_2}^{(2)}) of the output layer of the PCN using the relation:

$$y_k^{(2)}=\sum_{j=1}^{L_1}\omega_{jk}^{(2)}\left(y_j^{(1)}\circ\left(\rho_{jk}^{(2)}\right)^{-1}\right),\quad k=\overline{1,L_2}.\qquad(1.28)$$

Step 4 Apply to the input of the CL the fuzzy vector Y^{(2)} = (y_1^{(2)}, ..., y_{L_2}^{(2)}), which is obtained at the output of the PCN.

Step 5 Determine the output vector Z = (z_1, ..., z_M) of the CL, each component being a real number:

$$z_j=\sum_{i=1}^{L_2}\tau_{ij}\cdot\max_{u\in[0,1]}\min\left(y_i^{(2)}(u),\,\gamma_{ij}(u)\right),\quad j=\overline{1,M},\qquad(1.29)$$

where M is the number of output neurons of the CL.

Step 6 Adjust the weights of the output layer of the PCN:

$$\begin{cases}\rho_{jk}^{(2)}(u_j)\leftarrow\rho_{jk}^{(2)}(u_j)+\mu_\rho\cdot\dfrac{\partial E}{\partial\rho_{jk}^{(2)}(u_j)}\\[2mm]\omega_{jk}^{(2)}\leftarrow\omega_{jk}^{(2)}+\mu_\omega\cdot\dfrac{\partial E}{\partial\omega_{jk}^{(2)}},\end{cases}\qquad(1.30)$$

j = 1,...,L_1, k = 1,...,L_2, μ_ρ and μ_ω being two constants with the meaning of learning rates, and

$$E=\sum_{p=1}^{|S_T|}E_p\qquad(1.31)$$

defines the performance of the system, where:

• |S_T| represents the number of vectors from the training lot,

• E_p is the output error of the PCN for the p-th training sample, defined by (1.32),

T = (T_1, ..., T_{L_2}) being the ideal output vector (the target vector) of the input vector with index p applied to the PCN.

Substituting (1.34) and (1.35) into (1.30) results in the update rule for ρ_{jk}^{(2)}(u_j).

Substituting (1.38) and (1.39) into (1.30), we obtain the update rule for ω_{jk}^{(2)}.

Step 7 Adjust the weights of the hidden layer of the PCN:

$$\begin{cases}\rho_{ij}^{(1)}(u_i)\leftarrow\rho_{ij}^{(1)}(u_i)+\mu_\rho\cdot\dfrac{\partial E}{\partial\rho_{ij}^{(1)}(u_i)}\\[2mm]\omega_{ij}^{(1)}\leftarrow\omega_{ij}^{(1)}+\mu_\omega\cdot\dfrac{\partial E}{\partial\omega_{ij}^{(1)}},\end{cases}\qquad(1.41)$$

namely

$$y_k^{(2)}(v_k)=\sum_{j=1}^{L_1}\omega_{jk}^{(2)}\cdot\left(y_j^{(1)}\circ\left(\rho_{jk}^{(2)}\right)^{-1}\right)(v_k).$$

Substituting (1.43), (1.46), and (1.47) into the first formula from relation (1.41), we achieve the update rule for ρ_{ij}^{(1)}(u_i).

Since

$$\frac{\partial y_k^{(2)}(v_k)}{\partial\omega_{ij}^{(1)}}=\omega_{jk}^{(2)}\cdot y_i^{(0)}\!\left(\left(\rho_{ij}^{(1)}\right)^{-1}\!\left(\left(\rho_{jk}^{(2)}\right)^{-1}(v_k)\right)\right),$$

substituting (1.50) and (1.51) into the second formula from relation (1.41), we achieve the update rule for ω_{ij}^{(1)}.

Step 8 Adjust the weights of the CL:

$$\begin{cases}\gamma_{ij}(u)\leftarrow\gamma_{ij}(u)+\mu_\gamma\cdot\dfrac{\partial E}{\partial\gamma_{ij}(u)}\\[2mm]\tau_{ij}\leftarrow\tau_{ij}+\mu_\tau\cdot\dfrac{\partial E}{\partial\tau_{ij}},\end{cases}\qquad(1.53)$$

u ∈ [0,1], i = 1,...,L_2, j = 1,...,M; μ_γ and μ_τ are constants with the meaning of learning rates, E denotes the performance of the system defined in (1.31), and E_p represents the output error of the CL for the p-th training sample, defined by (1.54),

U = (U_1, ..., U_M) being the ideal output vector (the target vector) of the input vector with index p applied to the CL.

Let u_max be the point for which y_i^{(2)}(u)γ_{ij}(u) has maximum value.

Introducing the relations (1.56)-(1.59) into the first formula from (1.53), one obtains:

$$\gamma_{ij}(u_{max})\leftarrow\gamma_{ij}(u_{max})-\mu_\gamma\cdot\tau_{ij}\cdot(U_j-z_j)\cdot y_i^{(2)}(u_{max}).\qquad(1.60)$$

Substituting (1.61) and (1.62) into the second formula from (1.53), we achieve:

$$\tau_{ij}\leftarrow\tau_{ij}-\mu_\tau\cdot\tau_{ij}\cdot(U_j-z_j)\cdot y_i^{(2)}(u_{max})\cdot\gamma_{ij}(u_{max}).\qquad(1.63)$$

Step 9 Compute the PCN error due to the p-th training vector with (1.32).

Step 10 Compute the CL error due to the p-th training vector, using (1.54).

Step 11 If the training algorithm has not been applied to all the training vectors, then go to the next vector.

Otherwise, test the stop condition. For example, we can stop the algorithm after a fixed number of training epochs.

Neural Networks-Based IR

Keyword Recognition Approach Based on the Fuzzy Nonlinear Perceptron

The first neural network (see Fig. 1.16) of the cascade neural model, used between the query subsystem and the indexing subsystem, contains three neuron layers:

(a) at the input of the first layer, the query formulated by the user is applied; the input layer consists of n neurons, each neuron representing a single character of the query;

(b) the hidden layer contains L neurons and expresses the internal representation of the query;

(c) the third layer (the output layer) contains M neurons, each of them symbolizing a keyword.

The user poses a query at the input and the neuron at the output represents the recognized keyword. A query q can be represented in the same way as a document in the document collection, or in a different way.

11 Mokriš, I., and Skovajsová, L., Neural Network Model of System for Information Retrieval from Text Documents in Slovak Language, Acta Electrotechnica et Informatica, 2005, 3(5), 1–6.

Fig 1.16 Architecture of FNP2 for determination of keywords

In the case of FNP2, each query q_j, j = 1,...,K, is represented by a term vector q_j = (u_{1j}, u_{2j}, ..., u_{nj}), where each weight u_{ij} ∈ [0,1] can be computed in the following way:

$$u_{ij}=\frac{f_{ij}}{e_{max}},\quad \forall\, i=\overline{1,n},\ j=\overline{1,K},\qquad(1.64)$$

where:

• e_max represents the maximum number of characters of the query;

• f_ij means the frequency of a character (namely, its number of appearances) in a query.
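A tiny sketch of the term-vector encoding (1.64), normalizing character frequencies by e_max; the alphabet, the sample query, and the value of e_max are illustrative assumptions.

```python
from collections import Counter

def encode_query(query, alphabet, e_max):
    """u_ij = f_ij / e_max: frequency of each character, scaled to [0, 1]."""
    freq = Counter(query.lower())
    return [freq[ch] / e_max for ch in alphabet]

alphabet = "abcdefghijklmnopqrstuvwxyz"
e_max = 30                                # assumed maximum query length in characters
print(encode_query("neural retrieval", alphabet, e_max))
```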

In the FNP1 input stage, each query component is represented by five membership values corresponding to five linguistic terms: unimportant, rather unimportant, moderately important, rather important, and very important, as illustrated in Figure 1.13.

The five values of each component of a query will be obtained by replacing x with u_ij in each of the relations (1.14)-(1.18).

The FNP2 (or FNP1) training is based on the use of a backpropagation algorithm consisting of the following steps [18,19]:

Step 1 Set p = 1. Apply the vector X_p = (x_{p1}, ..., x_{pn}), representing a query, at the FNP2 input. The known ideal output vector Y_p = (y_{p1}, ..., y_{pM}) has its components defined in (1.19). Then, randomly initialize the weights of the hidden layer, {W_{ji}^h}, j = 1,...,L, i = 1,...,n, and the weights of the output layer, {W_{kj}^o}, k = 1,...,M, j = 1,...,L.

Step 2 Calculate the activations of neurons in the hidden layer, using the formula:

$$net_{pj}^h=\sum_{i=1}^{n}W_{ji}^h\cdot x_{pi},\quad \forall\, j=\overline{1,L}.\qquad(1.65)$$

The hidden layer is used for the query representation based on the formula (1.65).

Step 3 Compute the neuron outputs in the hidden layer:

$$i_{pj}=f_j^h(net_{pj}^h)=\frac{1}{1+e^{-net_{pj}^h}},\quad \forall\, j=\overline{1,L},\qquad(1.66)$$

f_j^h being the activation function in the hidden layer.

Step 4 Calculate the activations of neurons in the output layer, using the relation:

$$net_{pk}^o=\sum_{j=1}^{L}W_{kj}^o\cdot i_{pj},\quad \forall\, k=\overline{1,M}.\qquad(1.67)$$

Step 5 Compute the neuron outputs in the output layer:

$$O_{pk}=f_k^o(net_{pk}^o)=\frac{1}{1+e^{-net_{pk}^o}},\quad \forall\, k=\overline{1,M},\qquad(1.68)$$

f_k^o being the activation function in the output layer.

Step 6 Refine the weights corresponding to the output layer based on the relation:

$$W_{kj}^o(t+1)=W_{kj}^o(t)+\eta\,(y_{pk}-O_{pk})\,O_{pk}(1-O_{pk})\,i_{pj},\quad \forall\, k=\overline{1,M},\ j=\overline{1,L},\qquad(1.69)$$

η being the learning rate.

Step 7 Adjust the weights of the hidden layer using the formula:

$$W_{ji}^h(t+1)=W_{ji}^h(t)+\eta\, i_{pj}(1-i_{pj})\left[\sum_{k=1}^{M}(y_{pk}-O_{pk})\,O_{pk}(1-O_{pk})\,W_{kj}^o(t)\right]x_{pi},\quad \forall\, j=\overline{1,L},\ i=\overline{1,n}.\qquad(1.70)$$

Step 8 Compute the error due to the p-th training vector, expressed by the formula:

$$E_p=\frac{1}{2}\sum_{k=1}^{M}(y_{pk}-O_{pk})^2.\qquad(1.71)$$

Step 9 If p < K, then set p = p + 1, namely apply a new vector to the network input. Otherwise, compute the error corresponding to the respective epoch (an epoch means going through the whole lot of vectors) using the formula:

Fig 1.17 Architecture of SANN for determination of relevant documents

$$E=\sum_{p=1}^{K}E_p\qquad(1.72)$$

and start a new training epoch.

The training algorithm finishes after a certain (fixed) number of epochs.
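A sketch of one training epoch implementing Steps 2-9 with the sigmoid activations (1.66), (1.68) and the update rules (1.69)-(1.70) as reconstructed above follows; the NumPy vectorization, learning rate, and toy data shapes are assumptions rather than the book's MATLAB implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_epoch(X, Y, W_h, W_o, eta=0.5):
    """One epoch of the FNP2 backpropagation training (Steps 2-9).
    X: K x n query vectors, Y: K x M ideal outputs,
    W_h: L x n hidden weights, W_o: M x L output weights (updated in place)."""
    epoch_error = 0.0
    for x_p, y_p in zip(X, Y):
        i_p = sigmoid(W_h @ x_p)                         # Steps 2-3: (1.65)-(1.66)
        o_p = sigmoid(W_o @ i_p)                         # Steps 4-5: (1.67)-(1.68)
        delta_o = (y_p - o_p) * o_p * (1.0 - o_p)        # output-layer error term
        delta_h = (W_o.T @ delta_o) * i_p * (1.0 - i_p)  # back-propagated hidden error
        W_o += eta * np.outer(delta_o, i_p)              # Step 6: update rule (1.69)
        W_h += eta * np.outer(delta_h, x_p)              # Step 7: update rule (1.70)
        epoch_error += 0.5 * np.sum((y_p - o_p) ** 2)    # Steps 8-9: per-vector error
    return epoch_error                                   # epoch error, as in (1.72)

# Toy run: K = 20 queries of n = 26 characters, L = 10 hidden, M = 5 keywords.
rng = np.random.default_rng(3)
X, Y = rng.random((20, 26)), rng.integers(0, 2, (20, 5)).astype(float)
W_h, W_o = rng.standard_normal((10, 26)) * 0.1, rng.standard_normal((5, 10)) * 0.1
for epoch in range(50):
    E = train_epoch(X, Y, W_h, W_o)
print(E)
```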

Text Document Retrieval Approach on the Base of a Spreading Activation Neural Network

Spreading Activation Neural Network (SANN) is used for text document retrieval In this framework, a collection of P documents and M terms (keywords) is represented by a Vector Space Model (VSM) matrix The VSM matrix A, capturing the term frequencies across the document set, has size M × P Each column represents a document, and each row represents a term; hence A(k, j) denotes the frequency of term k in document j.

When handling large document collections, we apply the Discrete Cosine Transform (DCT) to every row of the Vector Space Model (VSM) matrix, reducing the dimensionality of the document space that encodes the recognition features and yielding a latent semantic model.

Figure 1.17 depicts the SANN; it has two neuron layers:

(a) the first layer contains M neurons, each of them corresponding to a neuron in the output layer of FNP2 (or FNP1); the keyword vector applied to the network input represents the query;

12 Liu, B., Web Data Mining, Springer-Verlag Berlin Heidelberg, 2008.


Fig 1.18 Architecture of cascade neural network model of IR system

(b) the second layer (the output layer) consists of P neurons, P being the number of documents in the database; the output vector encodes the similarity scores between the user's query and each document in the collection, providing a direct measure of relevance for document retrieval and ranking.

Within the SANN, the neurons in the input layer are identical to the neurons in the output layer of the first neural network, defined by relation (1.68). The weights that connect the input-layer neurons to the output-layer neurons correspond exactly to the elements of the matrix A, i.e., the weight matrix W equals A.

SANN hasn’t a training algorithm as its weights are determined on the basis of relative frequency matrix The neuron outputs in the output layer will be: net pj =f j (net pj ) M k =1

On the basis of the FNP2 (or FNP1) and the SANN, the following cascade neural network can be built (see Fig. 1.18).

“Many university, corporate, and public libraries now use IR systems to provide access to books, journals, and other documents.” 13

Dictionary and encyclopedia databases are now widely available on personal computers, enabling quick access to structured reference material Information retrieval (IR) has proven useful across diverse fields, from office automation to software engineering, demonstrating how powerful search and indexing techniques can streamline workflows Indeed, any discipline that relies on documents to perform its work could benefit from robust IR tools, improving efficiency, accuracy, and decision-making.

13 Mihăescu, C., Algorithms for Information Retrieval: Introduction, 2013, http://software.ucv.ro/cmihaescu/ro/teaching/AIR/docs/Lab1-Algorithms%20for%20Information%20Retrieval.%20Introduction.pdf.

Fig 1.19 Importance of using the IR system (see footnote 16)

Many fields could "potentially use and benefit from IR." 14 It is used today in many applications.

Historically, information retrieval involved searching for documents, their content, and metadata stored in traditional relational databases or on the web, with the aim of making access to information quicker and easier Modern search relies on user queries and retrieved documents to deliver relevant results, with search engines such as Google, Yahoo, and Microsoft Live Search handling the process Many information retrieval challenges can be framed as prediction problems: estimating ranking scores or relevance ratings for web pages, documents, and even media like music, while also learning users’ information needs and interests to improve personalization and search performance.

“Managing the vast amount of online information and classifying it into what could be relevant to our needs is an important step toward being able to use this information.” 15

Figure 1.19 shows how popular the application of Web Classification is, responding "not only to the academic needs for continuous knowledge growth, but also to the" industry needs for fast and efficient information gathering and analysis.

14 Mih˘aescu, C., Algorithms for Information Retrieval Introduction, 2013, http://software.ucv. ro/~cmihaescu/ro/teaching/AIR/docs/Lab1-Algorithms%20for%20Information%20Retrieval.% 20Introduction.pdf.

15 Xhemali, D., Hinde, C.J., and Stone, R.G., Naïve Bayes vs. Decision Trees vs. Neural Networks in the Classification of Training Web Pages, International Journal of Computer Science Issues, 2009, 4(1), 16–23.

Maintaining up-to-date information is critical to business success, and IR systems must deliver relevant results quickly in such fast-changing environments.

1 L Chen, D H Cooley, and J Zhang Possibility-based fuzzy neural networks and their appli- cation to image processing IEEE Transactions on Systems, Man, and Cybernetics, 29(1):119–

2 B Hammer and T Villmann Mathematical aspects of neural networks In 11th European Symposium on Artificial Neural Networks (ESANN’ 2003), 2003.

3 T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag Berlin Heidelberg, 2009.

4 M.A Razi and K Athappilly A comparative predictive analysis of neural networks (NNs), nonlinear regression and classification and regression tree (cart) models Expert Systems with Applications, 29:65–74, 2005.

5 V Reshadat and M.R Feizi-Derakhshi Neural network-based methods in information retrieval.

American Journal of Scientific Research, 58:33–43, 2012.

6 B Zaka Theory and applications of similarity detection techniques http://www.iicm.tugraz. at/thesis/bilal_dissertation.pdf, 2009.

7 B.M Ramageri Data mining techniques and applications Indian Journal of Computer Science and Engineering, 1(4):301–305, 2010.

8 H Bashiri Neural networks for information retrieval http://www.powershow.com/view1/ 1af079-ZDc1Z/Neural_Networks_for_Information_Retrieval_powerpoint_ppt_presentation, 2005.

9 J Mehrad and S Koleini Using som neural network in text information retrieval Iranian Journal of information Science and Technology, 5(1):53–64, 2007.

10 K.A Olkiewicz and U Markowska-Kaczmar Emotion-based image retrieval an artificial neural network approach In Proceedings of the International Multiconference on Computer Science and Information Technology, pages 89–96, 2010.

11 I Iatan and M de Rijke Mathematical aspects of using neural approaches for information retrieval Complex and Intelligent Systems (Reviewers Assigned), 2016.

12 A.N Netravali and B.G Haskell Digital Pictures: Representation and Compression Springer,

13 R.C Gonzales and A Woods Digital Image Processing Prentice Hall, second edition, 2002.

14 W. Burger and M.J. Burge. Principles of Digital Image Processing: Fundamental Techniques. Springer-Verlag London, 2009.

15 A Vlaicu Digital Image Processing (in Romanian) MicroInformatica Group, Cluj-Napoca,

16 V.E Neagoe Pattern recognition and artificial intelligence (in Romanian), lecture notes, Faculty of Electronics, Telecommunications and Information Technology, University Politehnica of Bucharest 2000.

17 M Ettaouil, Y Ghanou, K El Moutaouakil, and M Lazaar Image medical compression by a new architecture optimization model for the Kohonen networks International Journal of Computer Theory and Engineering, 3(2):204–210, 2011.

18 V.E. Neagoe and O. Stănăşilă. Pattern Recognition and Neural Networks (in Romanian). Ed.

19 I Iatan Neuro-Fuzzy Systems for Pattern Recognition (in Romanian) PhD thesis, Faculty of Electronics, Telecommunications and Information Technology-University Politehnica of Bucharest, PhD supervisor: Prof dr Victor Neagoe, 2003.

16 Liu, T Y., Learning to Rank for Information Retrieval, 2011, Springer-Verlag Berlin Heidelberg.

20 L.T Huang, L.F Lai, and C.C Wu A fuzzy query method based on human-readable rules for predicting protein stability changes The Open Structural Biology Journal, 3:143–148, 2009.

21 A Ganivada and S.K Pal A novel fuzzy rough granular neural network for classification.

International Journal of Computational Intelligence Systems, 4(5):1042–1051, 2011.

22 Q Ni, C Guo, and J Yang Research of face image recognition based on probabilistic neural networks In IEEE Control and Decision Conference, 2012.

23 Y Sun, X Lin, and Q Jia Information retrieval for probabilistic pattern matching based on neural network In International Conference on Systems and Informatics, ICSAI2012, 2012.

24 G.A Anastassiou and I Iatan A new approach of a possibility function based neural network In

Intelligent Mathematics II: Applied Mathematics and Approximation Theory, pages 139–150.

25 L Skovajsová Text document retrieval by feed-forward neural networks Information Sciences and Technologies Bulletin of the ACM Slovakia, 2(2):70–78, 2010.

26 I Mokriš and L Skovajsová Neural network model of system for information retrieval from text documents in slovak language Acta Electrotechnica et Informatica, 3(5):1–6, 2005.

27 T.N Yap Automatic text archiving and retrieval systems using self-organizing kohonen map.

In Natural Language Processing Research Symposium, pages 20–24, 2004.

28 B Liu Web Data Mining Springer-Verlag Berlin Heidelberg, 2008.

A Fuzzy Kwan–Cai Neural Network for Determining Image Similarity and for the Face Recognition

Introduction

Similarity is a central issue in image retrieval, affecting both unsupervised clustering and supervised classification In this work, we propose an effective method for learning image similarity by converting the Fuzzy Kwan–Cai Neural Network (FKCNN) into a supervised framework Unlike the classical unsupervised FKCNN, where a class is represented by a single output neuron, the supervised FKCNN employs multiple output neurons, each designating a distinct class This multi-output design improves performance over the unsupervised version and embodies the idea of replacing a binary membership decision with a continuous membership degree between 0 and 1.

In order to evaluate the performance of our proposed neural network, it is compared with two baseline methods: Self-organizing Kohonen maps (SOKM) and k-Nearest Neighbors (k-NN).

The feasibility of the presented methods for similarity learning has been successfully evaluated on the Visual Object Classes (VOC) database [9], which consists of 20 object classes.

Similarity is a fundamental concept across nearly all scientific fields and has deep roots in philosophy and psychology [10,11]. In this work, we focus on measuring similarity within computer science, specifically in the retrieval of multimedia information: mainly images and, to a lesser extent, video and audio.

“Measuring image similarity is an important task for various multimedia appli- cations.” 1

1 Perkiö, J., Tuominen, A., and Myllymäki, P., Image Similarity: From Syntax to Weak Semantics using Multimodal Features with Application to Multimedia Retrieval, Multimedia Information Networking and Security, 2009, 1, 213–219.

© Springer International Publishing Switzerland 2017

I.F Iatan, Issues in the Use of Neural Networks in Information Retrieval,

Studies in Computational Intelligence 661, DOI 10.1007/978-3-319-43871-9_2


Use of adequate measures improves [10] the accuracy of information selection.

As there are a lot of ways to compute similarity or dissimilarity among various object representations, they need to be categorized [10] as:

(1) distance-based similarity measures: core numerical distance metrics include Minkowski distance, Manhattan (City Block) distance, Euclidean distance, Mahalanobis distance, and Chebyshev distance, while Jaccard distance and Dice's coefficient capture similarity for sets. For text and sequence data, cosine similarity, Hamming distance, Levenshtein distance, and Soundex distance provide robust options to assess likeness (a short code sketch following this list illustrates a few of these measures and the contrast model).

(2) feature-based similarity measures (contrast model)

Proposed by Tversky in 1977, this approach offers an alternative to distance-based similarity measures by defining the similarity between entities A and B in terms of their shared and distinctive features The similarity is given by s(A,B) = α g(A∩B) − β g(A−B) − γ g(B−A), where α, β, and γ are weights that determine the importance of overlap and differences, g(A∩B) represents the common features of A and B, g(A−B) the features unique to A, and g(B−A) the features unique to B In essence, the more features A and B share, the higher the similarity; more distinctive features of either entity reduce the similarity, providing a flexible, feature-based measure that complements traditional distance metrics.

(3) probabilistic similarity measures
In order to calculate relevance among complex data types, probabilistic similarity measures are required, such as maximum likelihood estimation and maximum a posteriori estimation.

(4) extended/additional measures: similarity measures based on fuzzy set theory [12], similarity measures based on graph theory, similarity-based weighted nearest neighbors [13,14], similarity-based neural networks [11].
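To make categories (1) and (2) concrete, the Python sketch below is our own minimal illustration (the function names and example data are not taken from [10]); it computes a few of the listed distance metrics together with a Tversky-style contrast-model similarity in which the salience function g is simply the set cardinality.

```python
import numpy as np

def minkowski(x, y, p=2):
    """Minkowski distance of order p (p=1: Manhattan, p=2: Euclidean)."""
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

def cosine_similarity(x, y):
    """Cosine of the angle between two feature vectors."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def jaccard(a, b):
    """Jaccard similarity between two feature sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def tversky(a, b, alpha=1.0, beta=0.5, gamma=0.5):
    """Tversky contrast model with g = set cardinality."""
    a, b = set(a), set(b)
    return alpha * len(a & b) - beta * len(a - b) - gamma * len(b - a)

if __name__ == "__main__":
    x, y = np.array([1.0, 0.0, 2.0]), np.array([0.0, 1.0, 2.0])
    print(minkowski(x, y, p=1), minkowski(x, y, p=2), cosine_similarity(x, y))
    print(jaccard({"round", "red"}, {"round", "green"}),
          tversky({"round", "red"}, {"round", "green"}))
```

For p = 1 and p = 2 the Minkowski function reproduces the Manhattan and Euclidean distances listed under category (1).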

Beyond the definition of similarity measures [15], “in the last few decades, the perception of similarity received a growing attention from psychological researchers.” 2

As the investigation of similarity is crucial for many classification methods, learning similarity measures [16] has more recently also attracted attention in the machine learning community [17,18].

Classifying samples solely by their pairwise similarities can be broken down into two subproblems: first, measuring how similar two samples are, and second, using those pairwise similarity scores to perform the actual classification.

2 Melacci, S., and Sarti, L., and Maggini, M., and Bianchini, M., A Neural Network Approach to Similarity Learning, ANNPR 2008, LNAI 5064, 133–136.

3 Cazzanti, L., Generative Models for Similarity-based Classification, 2007, http://www.mayagupta.org/publications/cazzanti_dissertation.pdf

Similarity-based classifiers predict the class of a new test sample by measuring its resemblance to labeled training instances, incorporating both the similarities between the test example and each labeled sample and the pairwise similarities among the training samples themselves.

Similarity-based classification is useful for problems in Multimedia Analysis [19, 20], Computer Vision, Bioinformatics, Information Retrieval [21–23], and a broad range of other fields. Among the various traditional approaches to pattern recognition, statistical methods have been the most extensively studied and widely used in practice, and more recently artificial neural networks have attracted significant attention. Designing a robust recognition system requires careful consideration of several core issues, including the definition of pattern classes and the sensing environment, pattern representation, feature extraction and selection, clustering analysis, classifier design and learning, the selection of training and test samples, and comprehensive performance evaluation.

Neural network (NN) methods hold great promise for defining similarity.

This chapter presents two similarity learning approaches based on Artificial Neural Networks (ANNs), showing how neural networks can compute similarities more effectively than traditional statistical techniques; this performance edge, together with the broad applicability of neural networks to similarity estimation, is the main motivation for this work. The main contributions of the chapter are the following:

• we introduce a neural method for learning image similarity, by converting the classical unsupervised Fuzzy Kwan–Cai Neural Network (FKCNN) into a supervised framework that achieves superior performance over its unsupervised counterpart;

• we build a novel algorithm based on evaluation criteria to compare the performances of the three presented methods;

• we test the resulting similarity functions on the PASCAL Visual Object Classes (VOC) data set, which consists of 20 object classes;

• we conduct a comparative study of the proposed similarity learning method against two baselines, Self-organizing Kohonen Maps (SOKM) and the k-Nearest Neighbor rule (k-NN), assessing their ability to learn similarity. SOKM is an unsupervised neural network, k-NN is a non-neural method, and the improved FKCNN is a supervised neural network obtained by enhancing the unsupervised FKCNN. These algorithms are chosen as benchmarks because all of them represent extended measures for computing similarity or dissimilarity among diverse object representations, enabling a comprehensive evaluation of the proposed method;

• we highlight the overall performance of FKCNN, which is better for our task than that of SOKM and k-NN.

4 Chen, Y., and Garcia, E.K., and Gupta, M.R., and Rahimi, A., and Cazzanti, L., Similarity-based Classification: Concepts and Algorithms, Journal of Machine Learning Research, 2009, 10, 747–776.

5 Basu, J.K., and Bhattacharyya, D., and Kim, T.H., Use of Artificial Neural Network in Pattern Recognition, International Journal of Software Engineering and Its Applications, 2010, 4(2), 22–34.


Artificial neural networks (ANNs) are configured for a specific application, such as pattern recognition or data classification, through a learning process. The advantages of using an ANN in similarity matching are twofold: first, we can combine multiple features extracted with different methods; second, the combination of these features can be nonlinear.
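As a toy illustration of these two advantages (a sketch under our own assumptions, not a network used elsewhere in this chapter), the snippet below combines two kinds of features, extracted by different methods from a pair of images, and maps them through a nonlinear hidden layer to a similarity degree in (0, 1); in practice the parameters would be learned from labelled image pairs.

```python
import numpy as np

def similarity_score(feats_a, feats_b, W1, b1, w2, b2):
    """Tiny nonlinear similarity network over multiple feature types.

    feats_a, feats_b : lists of feature vectors of the two images, one vector per
                       extraction method (e.g. colour histogram, texture descriptor)
    W1, b1, w2, b2   : parameters (random here; in practice learned from labelled pairs)
    """
    # combine the per-method feature differences into one vector
    diffs = np.concatenate([np.abs(a - b) for a, b in zip(feats_a, feats_b)])
    h = np.tanh(W1 @ diffs + b1)                  # nonlinear combination of the features
    return 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))   # similarity degree in (0, 1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img_a = [rng.random(8), rng.random(4)]        # colour-histogram + texture features of image A
    img_b = [rng.random(8), rng.random(4)]        # the same two feature types for image B
    W1, b1 = rng.normal(size=(6, 12)), np.zeros(6)
    w2, b2 = rng.normal(size=6), 0.0
    print(similarity_score(img_a, img_b, W1, b1, w2, b2))
```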

Related Work

Over the past few years, researchers have developed methods to estimate similarity more efficiently and accurately. Yu et al. (2012) proposed a novel semi-supervised multiview distance metric learning method (SSM-DML) that learns multiview distance metrics from multiple feature sets and from the labels of unlabeled cartoon characters within a graph-based semi-supervised learning framework. The effectiveness of SSM-DML has been demonstrated in cartoon applications.

In 2012, Yu et al. introduced a novel transductive image classification method that learns distances via a hypergraph. Hypergraph learning addresses the limitations of traditional graph-based approaches, which can only model simple pairwise image relationships and are sensitive to the choice of similarity parameters. The proposed approach constructs hyperedges by linking each image to its nearest neighbors, enabling a richer representation of relationships among images. Moreover, it can automatically adjust the influence of different hyperedges by jointly learning the labels of unlabeled images and the weights assigned to hyperedges, leading to improved classification performance.

Yu et al. (2014) introduced a novel multiview stochastic distance metric learning method that effectively exploits the complementary information present across multiple views. Unlike existing approaches that rely on pairwise distances, their method uses a high-order distance derived from a hypergraph to estimate the probability matrix of the data distribution, thereby capturing more complex relationships in multiview data.

Kwan and Cai (1997) introduced a four-layer unsupervised fuzzy neural network for pattern recognition, illustrating the advantage of combining fuzzy logic with neural networks. The third layer comprises fuzzy neurons that compute the similarity between an input pattern and all learned patterns. In the classical unsupervised FKCNN, each class is represented by a single output neuron.

Hariri et al. (2008) restructured the unsupervised fuzzy neural network originally developed by Kwan and Cai, creating an improved five-layer feedforward Supervised Fuzzy Neural Network (SFN). They employed the SFN for classification and identification of shifted and distorted training patterns. To demonstrate its identification capability, they used fingerprint patterns and showed that the SFN's testing results were more significant than those of the early FKCNN.

6 Chen, F., Similarity Analysis of Video Sequences Using an Artificial Neural Network, University of Georgia, 2003, http://athenaeum.libs.uga.edu/bitstream/handle/10724/6674/chen_feng_200305_ms.pdf?sequence=1.

We started this work with the paper [30], which presents a neuro-fuzzy approach to face recognition using an improved version of the Kwan and Cai fuzzy neural network.

We have transformed the previously described fuzzy net from an unsupervised model into a supervised framework, the supervised Fuzzy Kwan–Cai Neural Network (FKCNN), and applied it to the task of face recognition. The supervised FKCNN expands its output layer with multiple neurons, each representing a distinct class. Classification of an input image with an unknown class is performed by assigning the input pattern to the class associated with the neuron in the fourth layer whose output equals 1.

Background

Measure of the Class Similarities

We begin by clarifying the notion of similarity and then outline methods for computing it. The discussion relies on the concept of class similarity, which will be needed later in the algorithm based on evaluation criteria.

Let there be M classes ω_i (i = 1, ..., M), each characterized by a mean feature vector μ_i and an inner variance S_i; μ_i typically serves as the representative feature of the class ω_i, and D_ij denotes the distance between the feature representations of the classes ω_i and ω_j. A class similarity measure R(S_i, S_j, D_ij) quantifies the similarity between ω_i and ω_j for every pair i, j ∈ {1, ..., M}, combining the inner variances S_i and S_j with the inter-class distance D_ij. Such a measure should satisfy the following properties:

(1) R(S_i, S_j, D_ij) ≥ 0 (the similarity measure among the classes is greater than or equal to zero);

(2) R(S_i, S_j, D_ij) = R(S_j, S_i, D_ji), namely the similarity measure between two pattern classes is symmetric;

(3) R(S_i, S_j, D_ij) = 0 if and only if S_i = S_j = 0 (the similarity measure is null if and only if the inner variances are null);

(4) If S_j = S_k and D_ij < D_ik, then R(S_i, S_j, D_ij) > R(S_i, S_k, D_ik), i.e., for equal inner variances the similarity measure decreases as the distance between the class feature vectors increases;

(5) If D_ij = D_ik and S_j > S_k, then R(S_i, S_j, D_ij) > R(S_i, S_k, D_ik), i.e., the similarity measure increases when the distances among the classes are equal and the inner variances increase.


Table 2.1 The important notations of this subsection

The used notation       Its significance
R(S_i, S_j, D_ij)       A measure of class similarities
μ_i                     Mean of the class ω_i
S_i                     Inner variance of the samples belonging to the class ω_i
D_ij                    Minkowski distance of order p between μ_i and μ_j; for p = 2 it is the Euclidean distance

An example of a similarity measure which we shall use throughout this paper is [31]:

Here D_ij is the Minkowski distance of order p between μ_i and μ_j; for p = 2 it reduces to the Euclidean distance. For q = 2, S_i represents the inner variance of the samples belonging to the class ω_i about the mean μ_i of that class. If p = q = 2 we obtain the similarity measure introduced by Fisher [31].
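The sketch below illustrates one measure of this family under our own assumption about the exact expression, namely R = (S_i + S_j)/D_ij; it satisfies properties (1)–(5) above, with D_ij the Minkowski distance of order p between the class means and S_i the inner variance of order q. The formula actually used in this chapter should be taken from [31].

```python
import numpy as np

def inner_variance(samples, mu, q=2):
    """Assumed inner variance S_i of one class about its mean mu (order q)."""
    return np.mean(np.sum(np.abs(samples - mu) ** q, axis=1)) ** (1.0 / q)

def class_similarity(samples_i, samples_j, p=2, q=2):
    """Assumed form R = (S_i + S_j) / D_ij; see [31] for the exact formula."""
    mu_i, mu_j = samples_i.mean(axis=0), samples_j.mean(axis=0)
    d_ij = np.sum(np.abs(mu_i - mu_j) ** p) ** (1.0 / p)   # Minkowski distance of order p
    s_i = inner_variance(samples_i, mu_i, q)
    s_j = inner_variance(samples_j, mu_j, q)
    return (s_i + s_j) / d_ij

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    class_a = rng.normal(loc=0.0, scale=1.0, size=(50, 4))
    class_b = rng.normal(loc=3.0, scale=1.0, size=(50, 4))
    print(class_similarity(class_a, class_b))   # lower values mean better separated classes
```

With equal inner variances, R decreases as the distance between the class means grows (property 4); with equal distances, R grows with the inner variances (property 5).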

Table 2.1 lists the important notations of this subsection.

Similarity-Based Classification

“Similarity-based classifiers are defined as those classifiers that require only a pairwise similarity—a description of the samples themselves is not needed.” 7

7 Mellouk, A., and Chebira, A., Machine Learning, 2009, InTech.

A simple similarity-based classifier is the k-Nearest Neighbor classification rule (k-NN), which classifies an image into the class of the most similar images in the database.

The probability of error of the k-NN rule is lower than that of the standard nearest-neighbor (NN) rule. Although the NN rule is suboptimal and does not converge to the Bayes optimal classifier, its error probability is asymptotically no more than twice the Bayes error rate; hence it is bounded below by the Bayes rate and above by twice the Bayes rate [32,33].
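As a minimal sketch of how the k-NN rule can operate directly on pairwise similarities (our own illustration, not the implementation evaluated later in this chapter), the classifier below receives the similarities of a test image to all labelled training images and takes a majority vote among the k most similar ones.

```python
import numpy as np
from collections import Counter

def knn_from_similarities(sim_to_train, train_labels, k=3):
    """Classify a test sample from its similarities to the labelled training set.

    sim_to_train : 1-D array with the similarity of the test sample to each training sample
    train_labels : labels of the training samples, in the same order as sim_to_train
    """
    top_k = np.argsort(sim_to_train)[::-1][:k]        # indices of the k most similar samples
    votes = Counter(train_labels[i] for i in top_k)   # majority vote among the neighbours
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    sims = np.array([0.91, 0.40, 0.87, 0.12, 0.78])
    labels = np.array(["cat", "dog", "cat", "dog", "dog"])
    print(knn_from_similarities(sims, labels, k=3))   # -> "cat"
```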

Self-organizing Kohonen maps are not designed for similarity-based classification. However, this neural network has proven to be very useful across a wide range of problems and is especially well suited for clustering large, high-dimensional data sets such as images and documents. It operates in an unsupervised manner, requiring no labeled information during training. After training, the resulting map is organized so that similar data from the input space are mapped to the same unit or to neighboring units on the map.

The Kohonen neural network is a network whose learning process is based on [31]:

• the principle of competition: for a vector applied to the network input, one determines the winning neuron of the network, as the neuron with the best match;

• the neighborhood principle: one refines both the weight vector associated with the winning neuron and the weights associated with the surrounding neurons.

Figure 2.1 shows how the SOKM transforms similarities among input vectors into neighborhood relationships among neurons: three similar input vectors from class 1 are mapped to three adjacent neurons, illustrating the topology-preserving nature of the self-organizing Kohonen map.

The SOKM is characterized by the fact that the neighboring neurons:

• turn into some specific detectors of different classes of patterns;

• characterize the vectors applied to the network input.

Fig 2.1 Similarities among the input vectors

44 2 A Fuzzy Kwan–Cai Neural Network …

Fig 2.2 Structure of a rectangular Kohonen network with M output neurons and n input neurons

The Kohonen network structure has two layers of neurons (see Fig. 2.2).

The number of neurons in the input layer equals the dimension n of the input vectors, while the output layer contains M neurons arranged on a regular (typically two-dimensional) grid, with M greater than or equal to the number of clusters. Every input neuron is connected to every output neuron, so the network is fully connected between the two layers.

The network training is competitive, unsupervised, and self-organizing. The training algorithm corresponding to the SOKM can be summarized as follows:

Step 1 Set t = 0. Initialize the learning rate η(0) and the neighborhood of the neuron j within the network, denoted V_j, and establish the rule regarding their variation in time.

The weight values of the self-organizing Kohonen map (SOKM) are initialized with random values. Each neuron j of the SOKM is represented by a weight vector W_j = (w_{0j}, w_{1j}, ..., w_{n−1,j})^T ∈ R^n, where j = 0, ..., M−1, M denotes the total number of neurons of the SOKM and n is the dimension of the input vectors.

Step 2 Apply the vector X_p = (x_{p0}, x_{p1}, ..., x_{p,n−1})^T ∈ R^n, p = 1, ..., N (N being the number of training vectors), at the network input.

Step 3 Compute the Euclidean distances between X_p and the weight vectors associated with the network neurons, at the time moment t, based on the formula: d_j = ||X_p(t) − W_j(t)||^2 = Σ_{i=0}^{n−1} (x_{pi}(t) − w_{ij}(t))^2.

Step 4 Find the winning neuron j* of the SOKM, by computing the minimum distance between the vector X_p and the weight vectors W_j of the network, according to the relation [31]: d_{j*} = min_{0 ≤ j ≤ M−1} d_j, i.e., j* = arg min_j d_j.

Step 5 Once the winning neuron is known, one refines the weights of the winning neuron and of its neighbors, using the relation [31]:

W_k(t+1) = W_k(t) + η_{kj*}(t) [X_p(t) − W_k(t)], for every neuron k ∈ V_{j*}(t),

where

η_{kj*}(t) = η_0(t) exp(−||r_{j*} − r_k||^2 / (2σ^2(t)))   (2.9)

is the learning rate and V_{j*} represents the neighborhood of the neuron j*.

In the relation (2.9) we have:

• η_0 = η_0(t) and σ = σ(t) are time-dependent parameters: η_0(t) is the learning rate at the center of the neighborhood and σ(t) controls how fast the learning rate decays with the distance from that center;

• r_{j*} and r_k are the position vectors, within the network, of the central neuron of the neighborhood V_{j*} and of the neuron k whose weights are updated, respectively.

The neighborhood V_{j*} can vary in time: at the beginning of the refinement it is chosen to cover the whole network, after which its radius decreases monotonically in time.

Step 6 Set p = p + 1. If the whole set of training vectors has been presented, check whether the stopping condition of the training process is satisfied, namely a fixed number of epochs has been reached or the weights associated with the network neurons no longer change significantly, according to [31]: |w_{ij}(t+1) − w_{ij}(t)| < ε for all i and j, where ε is a small positive threshold. Otherwise, return to Step 2.
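The whole procedure of Steps 1–6 can be condensed into the short sketch below; it is a minimal illustration with our own exponentially decaying schedules for the learning rate and the neighborhood radius, and it stops after a fixed number of epochs rather than testing the weight change of Step 6 (the exact schedules and stopping test should follow [31]).

```python
import numpy as np

def train_sokm(X, map_shape=(8, 8), epochs=50, eta0=0.5, sigma0=3.0, seed=0):
    """Train a rectangular Kohonen map on the rows of X (N x n)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    M = map_shape[0] * map_shape[1]
    W = rng.random((M, n))                                    # Step 1: random weight vectors W_j
    grid = np.array([(r, c) for r in range(map_shape[0])      # position vector r_j of each neuron
                             for c in range(map_shape[1])], dtype=float)
    for t in range(epochs):
        eta = eta0 * np.exp(-t / epochs)                      # learning rate decays in time
        sigma = sigma0 * np.exp(-t / epochs)                  # neighbourhood radius shrinks in time
        for x in X:                                           # Step 2: present each training vector
            d = np.sum((x - W) ** 2, axis=1)                  # Step 3: squared Euclidean distances
            j_star = int(np.argmin(d))                        # Step 4: winning neuron
            g = np.exp(-np.sum((grid - grid[j_star]) ** 2, axis=1) / (2 * sigma ** 2))
            W += eta * g[:, None] * (x - W)                   # Step 5: refine winner and neighbours
    return W

if __name__ == "__main__":
    data = np.random.default_rng(1).random((200, 3))          # e.g. 200 three-dimensional vectors
    weights = train_sokm(data)
    print(weights.shape)                                      # (64, 3)
```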
