
Procedia Computer Science, Volume 101, 2016, Pages 187–196. YSC 2016: 5th International Young Scientist Conference on Computational Science

Peer-review under responsibility of the organizing committee of the scientific committee of the 5th International Young Scientist Conference on Computational Science

On the applicability of spiking neural network models to solve the task of recognizing gender hidden in texts

Alexander Sboev1,2,3,4,5, Tatiana Litvinova2, Danila Vlasov1,4, Alexey Serenko2, and Ivan Moloshnikov2,4

1 MEPhI National Research Nuclear University, Moscow, Russia

2 National Research Center Kurchatov Institute, Moscow, Russia

3 Plekhanov Russian University of Economics, Moscow, Russia

4 JSC “Concern ‘Systemprom’ ”, Moscow, Russia

5 Moscow Technological University (MIREA), Moscow, Russia

Sboev AG@nrcki.ru

Abstract

Two approaches to utilizing spiking neural networks, which are suitable for implementation in neuromorphic hardware with ultra-low power consumption, in the task of recognizing the gender of a text's author are analyzed. The first one is to obtain synaptic weights for the spiking network by training a formal network. We show the results obtained with this approach. The second one is the creation of a supervised learning algorithm for spiking networks that would be based on biologically plausible plasticity rules. We discuss possible ways to construct such algorithms.

Keywords: supervised learning, spike-timing-dependent plasticity, artificial neural networks, spiking neural networks

Introduction

Over the last few years, interest in spiking neural networks has grown considerably as a result of the appearance of neuromorphic hardware capable of running such networks. This, in turn, makes it necessary to develop approaches that can be implemented on such hardware for solving practical tasks. Since hardware with ultra-low power consumption makes it possible to solve these tasks on autonomous devices, the problem of learning in spiking neural networks becomes particularly relevant. The task of predicting the gender of a text's author on the basis of linguistic parameters, which could be implemented on these devices, is important, in particular, for security or conversational purposes.

There are generally two approaches to using spiking networks in a classification task. Since learning algorithms for artificial networks are more developed than those for spiking networks, the direct approach is to convert a trained formal network into a spiking one. In [1] each formal neuron is replaced with several spiking ones. They, along with the encoding and decoding machinery, reproduce its activation function. Furthermore, one can simply transfer synaptic weights from a trained formal network to a spiking network of the same topology [2]. We show in Section 1.2 that after such a transfer the spiking network achieves higher accuracy than the formal one in the Fisher’s Iris classification task, and in Section 1.3 we apply this approach to the gender recognition task.

Figure 1: Algorithm steps

Another approach is to implement learning in spiking neural networks by biologically inspired learning rules. A number of synaptic plasticity models suitable for supervised learning have been published [3, 4, 5], but so far none has been based only on the current knowledge of the operating rules of biological neural systems, namely, on the Hebb principle. As the biologically plausible long-term plasticity model we consider spike-timing-dependent plasticity (STDP) [6]. It was shown in [7] to be suitable for unsupervised learning, but a supervised learning protocol based on it has not yet been developed. In Section 2.2 we demonstrate STDP parameters that make it possible to obtain several different synaptic weight distributions. In Section 2.3 we show that any desired weight values can be reached given a proper value of correlation between input and output spike sequences. Based on this fact, in Section 2.4 we suggest a supervised learning algorithm suitable for classification of rate-coded binary vectors.

1 ANN to SNN mapping approach

Following [2], we used a combined learning algorithm involving artificial (ANN) and spiking (SNN) neural networks. It consists of the following steps (Fig. 1):

1. Training the artificial neural network using backpropagation. The neurons’ activation function was ReLU for hidden layers and Softmax for the output layer. Neuron biases were set to zero. Input data was normalized so that the L2 norm of each vector was 1.

2. Transferring the synaptic weights to the spiking neural network. The integrate-and-fire neuron model was chosen, in which the membrane potential V obeys $\frac{dV}{dt} = \sum_i \sum_{s \in S_i} w_i \delta(t - s)$, where S_i is the sequence of spikes (spike train) on the i-th input synapse, and w_i is the synaptic weight. Whenever the potential exceeds the threshold Θ, it is reset to zero and the neuron fires a spike.

3. Encoding input data into spike trains. An input vector component x was encoded by a Poisson spike train with mean frequency x · νmax.

4. Optimizing the spiking network parameters. Besides νmax and Θ, the simulation time T and the simulation step Δt were adjusted. According to [2],


• the simulation time should be set long enough to eliminate the probabilistic influence of the spike trains;

• correct classification is impossible if it requires a neuron to fire several spikes in one simulation step. So, the total input a neuron receives during one simulation step must not exceed the threshold. This condition is reliably fulfilled if

$$\nu_{\max} \cdot \Delta t \cdot \sum_i w_i \le \Theta. \qquad (1)$$

To fulfill (1), all spiking neural network weights are divided by the normalization factor M, the same for all neurons in a layer but unique for each layer,

$$M = \frac{1}{\Theta} \max_j \Bigl( \sum_i w_{ij} \Bigr), \qquad (2)$$

where w_ij is the weight of the i-th synapse of the j-th neuron in the current layer. The conditions above are necessary but not sufficient, so achieving maximal classification accuracy still requires adjusting νmax and Θ.
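The encoding step and the normalization (2) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy weight matrix and the parameter values are assumptions chosen for illustration, and the weights are taken non-negative so that M is positive.

```python
import numpy as np

def normalize_layer(W, theta=1.0):
    """Divide a layer's weights by M = max_j(sum_i w_ij) / theta, as in Eq. (2).

    W has shape (n_inputs, n_neurons): W[i, j] is synapse i of neuron j.
    """
    M = W.sum(axis=0).max() / theta
    return W / M, M

def poisson_encode(x, nu_max, T, dt, rng):
    """Encode each component x_i in [0, 1] as a Poisson train of rate x_i * nu_max.

    Returns a boolean array of shape (n_steps, n_inputs); True marks a spike.
    """
    n_steps = int(round(T / dt))
    p_spike = np.clip(x * nu_max * dt, 0.0, 1.0)  # spike probability per step
    return rng.random((n_steps, x.size)) < p_spike

rng = np.random.default_rng(0)
W = rng.uniform(0.0, 1.0, size=(4, 3))            # hypothetical non-negative weights
W_norm, M = normalize_layer(W, theta=1.0)
x = np.array([0.1, 0.5, 0.5, 0.7])                # hypothetical input vector
spikes = poisson_encode(x, nu_max=100.0, T=1.0, dt=0.001, rng=rng)
```

After normalization the largest summed input weight over the neurons of the layer equals Θ, so with νmax · Δt ≤ 1 the input received in one step cannot exceed the threshold.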

1.2 Fisher’s Iris classification

To test the algorithm described above, the popular toy task of Fisher’s Iris classification was solved. The network had 4 neurons in the input layer, 4 neurons in the single hidden layer, and 3 neurons in the output layer. Spiking network weights were normalized according to (2). Each input vector was presented for 10 s. The classification result was determined by the output neuron that fired the most spikes during the simulation.
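A sketch of how such a network could be simulated and decoded is given below. It assumes that a spike propagates through all layers within the same simulation step, uses random stand-in weights instead of weights transferred from a trained ReLU network, and the name `run_if_network` is hypothetical.

```python
import numpy as np

def run_if_network(spikes_in, weights, theta):
    """Simulate feed-forward integrate-and-fire layers; return output spike counts.

    spikes_in: bool array (n_steps, n_inputs); weights: list of (n_in, n_out) arrays.
    """
    potentials = [np.zeros(W.shape[1]) for W in weights]
    counts = np.zeros(weights[-1].shape[1], dtype=int)
    for step_spikes in spikes_in:
        layer_in = step_spikes.astype(float)
        for V, W in zip(potentials, weights):
            V += layer_in @ W            # each input spike adds its synaptic weight
            fired = V > theta            # the potential exceeds the threshold
            V[fired] = 0.0               # reset to zero after a spike
            layer_in = fired.astype(float)
        counts += fired                  # spikes of the output layer
    return counts

rng = np.random.default_rng(1)
x = np.array([0.8, 0.4, 0.3, 0.1])       # hypothetical normalized sample
nu_max, dt, T = 200.0, 0.001, 10.0
spikes_in = rng.random((int(round(T / dt)), x.size)) < x * nu_max * dt
weights = [rng.uniform(0, 0.5, (4, 4)), rng.uniform(0, 0.5, (4, 3))]  # stand-in weights
counts = run_if_network(spikes_in, weights, theta=1.0)
predicted_class = int(np.argmax(counts))  # the neuron that fired the most spikes
```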

1.2.1 Results

The mean classification error (the ratio of wrongly classified input samples to the total number of samples) of the ReLU network was 0.04 ± 0.01 on the training set and 0.06 ± 0.04 on the test set, averaged over 20 realizations of splitting into training and testing sets. The spiking network can achieve higher accuracy than the ReLU one (Fig. 2), with adjusted Θ and νmax reducing the error down to 0.04 ± 0.01. The higher Θ is, the higher the accuracy is, because the neuron has to integrate more input spikes before it fires an output spike.

1.3 Gender recognition task

RusPersonality [8] is the first corpus of Russian-language texts labeled with data on their authors. This free-to-use corpus contains over 1,850 documents, 230 words per document on average, from 1,145 respondents and is currently expanding. A unique aspect of our corpus is the breadth of the metadata (gender, age, personality, neuropsychological testing data, education level, etc.). Another advantage is that, in contrast to the common approach of retrieving texts from social networks, all our samples were written especially for this corpus. Therefore they do not contain any borrowings or citations. All respondents were given a few themes to write about, the same for male and female participants. This, along with the large number of participants, makes it possible to focus on the peculiarities caused by demographic characteristics of the authors (gender in the case of the current paper) rather than by their individual styles.


Figure 2: Fisher’s Iris classification error of the spiking network on the test set, divided by the error of the ReLU network, as a function of the maximum input frequency νmax for different neuron thresholds Θ. The error is averaged over 20 realizations of splitting into training and testing sets, and then over 5 independent realizations of input spike trains. For clarity, deviation bars are not shown for every point.

As the input data for gender prediction, the following set of context-independent features was used to describe a text:

• Morphological features: the number of nouns, numerals, adjectives, prepositions, verbs, pronouns, interjections, articles, conjunctions, participles, infinitives, and the number of finite verbs.

• Syntactic parameters: syntactic relations of different types.

• Derivative coefficients, which are different ratios of parts of speech (Trager index, dynamics coefficient, etc.).

• The number of exclamation marks, question marks, dots, and emoticons.

• The number of words pertaining to a particular “Emotion” group, e.g., “Anxiety”, “Discontent”, 37 categories in total.
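Of the feature groups above, the punctuation counts are the simplest to reproduce. The following Python sketch shows one way to compute them; the emoticon pattern is a deliberately small assumption of ours, and the feature names are hypothetical, since the paper does not specify how these counts were extracted.

```python
import re

# Hypothetical, deliberately small emoticon pattern: eyes, optional nose, mouth.
EMOTICON = re.compile(r"[:;=]-?[()DPp]")

def punctuation_features(text):
    """Count exclamation marks, question marks, dots and emoticons in a text."""
    return {
        "exclamation_marks": text.count("!"),
        "question_marks": text.count("?"),
        "dots": text.count("."),
        "emoticons": len(EMOTICON.findall(text)),
    }

features = punctuation_features("Hi! How are you? Fine... :)")
```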

The highest gender classification accuracy obtained on our corpus is 0.86 ± 0.05 [9], employing a sophisticated combination of learning algorithms. However, we are currently interested in the difference in accuracy between the spiking network and the ReLU network rather than in the absolute accuracy values.

The training set contained 364 texts, and the testing set 187. Network topology: 141 input neurons, 81 neurons in the first hidden layer, 19 neurons in the second hidden layer and 2 neurons in the output layer. Weight mapping was performed both with and without normalization (2).


Figure 3: The dependence of the gender recognition error on the maximum input frequency νmax for different neuron thresholds Θ, and also with weight normalization, in which case Θ was equal to 1.

1.3.1 Results

The classification error of the ReLU neural network was 0.22 on the testing set. The mean classification errors of the spiking neural network on the test set for different Θ and νmax, without normalization (2) and with normalization, are shown in Fig. 3. Again, as in the Iris classification task, the best accuracy was obtained at high input frequencies and thresholds. The lowest error is 0.22, indicating that no losses took place during mapping.

2 The principal possibility of applying Spike-Timing-Dependent Plasticity to the gender recognition task

In the Spike-Timing-Dependent Plasticity model, each synapse’s strength is described by a weight 0 ≤ w ≤ wmax, whose change depends on the exact moments tpre of presynaptic spikes and tpost of postsynaptic spikes:

$$
\Delta w =
\begin{cases}
-W_- \cdot \left( \dfrac{w}{w_{\max}} \right)^{\mu_-} \cdot \exp\left( -\dfrac{t_{\mathrm{pre}} - t_{\mathrm{post}}}{\tau_-} \right), & \text{if } t_{\mathrm{pre}} - t_{\mathrm{post}} > 0; \\[2ex]
W_+ \cdot \left( 1 - \dfrac{w}{w_{\max}} \right)^{\mu_+} \cdot \exp\left( -\dfrac{t_{\mathrm{post}} - t_{\mathrm{pre}}}{\tau_+} \right), & \text{if } t_{\mathrm{pre}} - t_{\mathrm{post}} < 0,
\end{cases}
\qquad (3)
$$

where W+ = 0.03, W− = 1.035 · W+, τ+ = τ− = τcorr = 20 ms. The rule with μ+ = μ− = 0 is called additive STDP, the rule with μ+ = μ− = 1 is called multiplicative, and intermediate values 0 ≤ μ ≤ 1 are also possible.

Figure 4: The restricted symmetric spike pairing scheme. Ticks denote spikes, and a gray line means that the pair of spikes is taken into account in the STDP weight change rule: potentiation in the pre-before-post case and depression in the post-before-pre case.

In the case of additive STDP, an additional constraint is needed to prevent the weight from falling below zero or exceeding the maximum value wmax = 1:

if w + Δw > wmax, then Δw = wmax − w; if w + Δw < 0, then Δw = −w.
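For a single pair of spikes, rule (3) together with the clipping constraint can be written as a short Python function. This is a sketch under our own naming; the default values follow the parameters quoted in the text, times are in milliseconds, and the treatment of exactly simultaneous spikes as "no change" is an assumption, since rule (3) does not cover that case.

```python
import math

def stdp_delta_w(w, t_pre, t_post, w_plus=0.03, w_minus=1.035 * 0.03,
                 mu_plus=0.0, mu_minus=0.0, tau_plus=20.0, tau_minus=20.0,
                 w_max=1.0):
    """Weight change of rule (3) for one spike pair; times are in ms."""
    dt = t_pre - t_post
    if dt > 0:    # post before pre: depression
        dw = -w_minus * (w / w_max) ** mu_minus * math.exp(-dt / tau_minus)
    elif dt < 0:  # pre before post: potentiation
        dw = w_plus * (1.0 - w / w_max) ** mu_plus * math.exp(dt / tau_plus)
    else:         # simultaneous spikes: assumed to leave the weight unchanged
        dw = 0.0
    # Clipping constraint: keep w + dw inside [0, w_max]
    return min(max(w + dw, 0.0), w_max) - w

dw_potentiation = stdp_delta_w(0.5, t_pre=0.0, t_post=10.0)   # additive, pre before post
dw_clipped = stdp_delta_w(0.99, t_pre=0.0, t_post=1.0)        # limited by w_max
```

Setting `mu_plus` and `mu_minus` to values between 0 and 1 gives the non-additive variants of the rule.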

An important part of the STDP rule is the scheme of pairing pre- and postsynaptic spikes when evaluating the weight change according to rule (3). Besides the all-to-all scheme, there exist several nearest-neighbour ones [6]. We used the restricted symmetric scheme (Fig. 4), in which a presynaptic spike is paired with the last preceding postsynaptic spike, and vice versa, but a spike can participate neither in two depression pairs nor in two potentiation pairs.
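One possible reading of the restricted symmetric scheme is sketched below in Python. The function name and the event-driven bookkeeping are our assumptions; the sketch only illustrates the constraint that each spike enters at most one potentiation pair and at most one depression pair.

```python
def restricted_symmetric_pairs(pre_times, post_times):
    """Return (potentiation_pairs, depression_pairs) as lists of (t_pre, t_post)."""
    events = sorted([(t, "pre") for t in pre_times] +
                    [(t, "post") for t in post_times])
    potentiation, depression = [], []
    last_pre = last_post = None
    pre_used = post_used = True   # no preceding spike is available yet
    for t, kind in events:
        if kind == "pre":
            # Pair with the last preceding post spike unless that spike
            # has already taken part in a depression pair.
            if not post_used and t > last_post:
                depression.append((t, last_post))
                post_used = True
            last_pre, pre_used = t, False
        else:
            # Pair with the last preceding pre spike unless that spike
            # has already taken part in a potentiation pair.
            if not pre_used and t > last_pre:
                potentiation.append((last_pre, t))
                pre_used = True
            last_post, post_used = t, False
    return potentiation, depression

pot, dep = restricted_symmetric_pairs([10, 30, 35], [20, 22, 40])
```

In this example the postsynaptic spike at 22 is not paired with the presynaptic spike at 10, because that spike has already been used for potentiation with the spike at 20, and the presynaptic spike at 35 causes no depression, because the postsynaptic spike at 22 has already been used with the spike at 30.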

As the neuron model we used Leaky Integrate-and-Fire, in which the membrane potential dynamics is

$$\frac{dV}{dt} = -\frac{V(t) - V_{\mathrm{resting}}}{\tau_m} + \frac{I_{\mathrm{syn}}(t)}{C_m} + \frac{I_{\mathrm{ext}}}{C_m};$$

when V ≥ Vth = −54 mV, V → Vresting = −70 mV, and during the refractory period τref = 3 ms the neuron is insensitive to the synaptic input. The membrane capacitance is Cm = 300 pF, and the membrane leakage time constant is τm = 10 ms. The postsynaptic current is of exponential form: a presynaptic spike from synapse i at time tsp adds

$$w_i(t_{sp}) \, \frac{q_{\mathrm{syn}}}{\tau_{\mathrm{syn}}} \, e^{-\frac{t - t_{sp}}{\tau_{\mathrm{syn}}}} \, \Theta(t - t_{sp})$$

to Isyn, where qsyn = 0.75 nC, τsyn = 5 ms, wi is the synaptic weight and Θ(t) is the Heaviside step function.
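The neuron model can be integrated with a simple forward-Euler scheme, as in the following Python sketch. The integration step, the function name and the choice to freeze the potential during the refractory period while the synaptic current keeps evolving are our assumptions; parameter defaults are the values printed above, converted to SI units.

```python
import math

def simulate_lif(n_steps, dt=1e-4, i_ext=0.0, syn_drive=None,
                 v_rest=-70e-3, v_th=-54e-3, tau_m=10e-3, c_m=300e-12,
                 tau_ref=3e-3, q_syn=0.75e-9, tau_syn=5e-3):
    """Forward-Euler LIF in SI units; returns the list of spike times in seconds.

    syn_drive[k] is the summed weight of presynaptic spikes arriving at step k.
    """
    v, i_syn = v_rest, 0.0
    refractory_until = -math.inf
    decay = math.exp(-dt / tau_syn)
    spike_times = []
    for k in range(n_steps):
        t = k * dt
        if syn_drive is not None:
            i_syn += syn_drive[k] * q_syn / tau_syn   # jump of w * q_syn / tau_syn per spike
        if t >= refractory_until:                     # potential is frozen while refractory
            v += dt * (-(v - v_rest) / tau_m + (i_syn + i_ext) / c_m)
            if v >= v_th:
                spike_times.append(t)
                v = v_rest                            # reset to the resting potential
                refractory_until = t + tau_ref
        i_syn *= decay                                # exponential decay of the current
    return spike_times

silent = simulate_lif(10000)                          # 1 s without any input
driven = simulate_lif(10000, i_ext=5e-9)              # 1 s with a strong constant current
```

With a constant external current the firing rate is bounded by 1/τref, which is the property used later in the learning protocol.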

2.2 The possibility of reaching non-bimodal weight distributions by non-additive STDP

In the case of additive STDP, only 0 and wmax = 1 are stable weight values. Using non-additive STDP makes it possible to reach a wider range of weight distributions.

To investigate the ability of weights to converge to the target, we used the protocol of [10]:

1. Preliminarily, the output train of the neuron with the target weights and without STDP is recorded. It is then considered as the desired output.

2. Then the neuron, now with STDP turned on, receives the same input trains, and is forced to fire spikes at the desired moments by stimulating it with current impulses.

Fig. 5 shows two examples of target weight distributions that can be reached during learning with the parameters that we found, μ+ = 0.06 and μ− = 0.01.

Figure 5: Target synaptic weights and weights reached after applying the protocol described in Section 2.2. In the left plot the target weights are all equal to 0.5, and in the right plot the target weights are distributed uniformly between 0 and 1.

2.3.1 The correlation measure

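The direction of the average weight change is determined by the amount of correlation between input and output spike trains. Defining the normed cross-correlation function as

$$\Gamma(\Delta t) = \frac{\sum_k S_{\mathrm{pre}}(k \cdot t_{\mathrm{bin}}) \, S_{\mathrm{post}}(k \cdot t_{\mathrm{bin}} + \Delta t)}{\sum_k S_{\mathrm{pre}}(k \cdot t_{\mathrm{bin}}) \cdot \sum_k S_{\mathrm{post}}(k \cdot t_{\mathrm{bin}})},$$

where Spre/post(t) indicates a pre/postsynaptic spike respectively at time t, and tbin is the simulation step,

$$I = \sum_{\Delta t = 0}^{\tau_{\mathrm{corr}}} \Gamma(\Delta t)$$

can be used as a rough correlation indicator, where τcorr is the STDP time window constant.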
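A small Python sketch of such a correlation indicator for binned spike trains follows. It assumes that the cross-correlation is normalized by the product of the pre- and postsynaptic spike counts and that the lag is measured in whole bins; both the normalization and the function names are our assumptions.

```python
import numpy as np

def cross_correlation(s_pre, s_post, lag):
    """Gamma at a lag of `lag` bins for two binary spike arrays of equal length."""
    n = len(s_pre) - lag
    coincidences = np.sum(s_pre[:n] * s_post[lag:lag + n])
    norm = np.sum(s_pre) * np.sum(s_post)
    return coincidences / norm if norm > 0 else 0.0

def correlation_indicator(s_pre, s_post, tau_corr_bins):
    """I: the sum of Gamma over lags 0 .. tau_corr_bins (inclusive)."""
    return sum(cross_correlation(s_pre, s_post, lag)
               for lag in range(tau_corr_bins + 1))

pre = np.array([1, 0, 0, 1, 0, 0])
post = np.array([0, 1, 0, 0, 1, 0])      # every pre spike is followed by a post spike
indicator = correlation_indicator(pre, post, tau_corr_bins=2)
```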

2.3.2 Results

Here we artificially generated input and output spike trains with different values of correlation. When STDP is applied to these trains, the weights reach an equilibrium state (Fig. 6A). The obtained weights, in turn, reproduce an output signal with the same level of correlation with the input as the initial artificial signal (see Fig. 6B). STDP was non-additive, with the parameters as in Section 2.2.

So, any desired weight value can be reached by making the neuron generate output with the proper amount of correlation with the corresponding input. Based on this fact, we suggest the following protocol of supervised learning.


Figure 6: Results of applying STDP to artificially generated input and output spike trains. A: the dynamics of the Euclidean norm of the weight vector during weight convergence. B: the dependence of the correlation indicator I for artificially constructed input and output trains on the weights obtained as the result of STDP learning on the basis of these trains (“artificial output”), and the correlation of the output that the neuron produces with the established weights (“neuron output”).

As the input data we used 10-dimensional binary vectors, with half of the components equal to 0 and the other half equal to 1. Each vector component of 1 was encoded by 10 synapses of the neuron receiving independent Poisson trains with a mean frequency of 30 Hz, and a component of 0 by 10 independent 2-Hz trains. Let each vector belong to one of two classes: C+, in response to which a high output frequency is expected as proper classification, and C−, vectors from which should produce a low mean output frequency.

Our model consisted of a single neuron with 100 incoming synapses, all excitatory. STDP was additive with W+ = 0.01, W− = 1.035 · W+. Initially all weights were set to 0.4.
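The rate coding of a binary vector onto the 100 synapses can be sketched in Python as follows. The time step of 1 ms, the Bernoulli approximation of a Poisson process, the ordering of synapses (10 consecutive synapses per component) and the function name are assumptions made for illustration.

```python
import numpy as np

def encode_binary_vector(vec, duration, dt=1e-3, high_hz=30.0, low_hz=2.0,
                         synapses_per_component=10, rng=None):
    """Return a bool array (n_steps, n_synapses) of independent Poisson-like trains."""
    rng = np.random.default_rng() if rng is None else rng
    per_synapse = np.repeat(np.asarray(vec), synapses_per_component)
    rates = np.where(per_synapse == 1, high_hz, low_hz)   # 30 Hz for 1, 2 Hz for 0
    n_steps = int(round(duration / dt))
    return rng.random((n_steps, rates.size)) < rates * dt

vec = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])
trains = encode_binary_vector(vec, duration=5.0, rng=np.random.default_rng(0))
is_high = np.repeat(vec, 10) == 1
mean_high_hz = trains[:, is_high].sum() / (is_high.sum() * 5.0)
mean_low_hz = trains[:, ~is_high].sum() / ((~is_high).sum() * 5.0)
```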

2.4.1 The learning protocol

Input vectors are presented to the neuron in an alternating manner: a vector from C+ for 5 s, then a vector from C− for 1.5 s. During the presentation of a vector from C− the neuron is stimulated with a constant current, high enough to make the mean output rate close to the highest possible value, 1/τref.

2.4.2 Results

While the neuron is receiving an input vector from the C+ class, a synapse receiving high-frequency input contributes more to the neuron’s output, therefore its weight is rewarded more by STDP. With the parameters we have chosen, weights of synapses receiving vector components of 1 increase in 66% of cases, and weights of synapses receiving components of 0 decrease in 66% of cases. When a vector from the C− class is presented, the neuron output is caused by the stimulating current and is poorly correlated with the input. So, all weights decrease (to keep them from falling to zero, the duration of a vector from C− is 1.5 s in contrast to 5 s for a vector from C+), but weights of high-frequency inputs decrease more due to the higher number of post-before-pre events.

Figure 7: Deviation β between actual and target weights during learning.

Figure 8: Mean firing rate of the neuron in response to the input vectors after learning. The first three vectors belong to the C+ class, and the second three to the C− class. The firing rate was averaged over 5 trials, each having independent 30-s input spike trains.

We took six binary vectors:

S1= (1 1 1 1 0 0 0 0 1 0),

S2= (0 1 0 0 1 1 1 0 1 0),

S3= (1 0 0 1 1 1 1 0 0 1),

S4= (1 1 0 0 0 1 0 1 0 1),

S5= (0 1 0 1 1 0 0 1 0 1),

S6= (0 1 1 0 0 1 0 1 0 0);

three of which are linearly separable from the other three. The target weights which separate them are known:

(1 0 1 0 1 0 1 0 1 0),

so we tracked the deviation

$$\beta(t) = \frac{\sum_{i=1}^{100} \left| w_i(t) - w_i^{\mathrm{target}} \right|}{\sum_{i=1}^{100} w_i^{\mathrm{target}}}$$

between actual and target weights during learning (Fig. 7). After 6,045 s of learning (310 cycles of presenting the whole set of vectors) the neuron clearly distinguishes the classes by its mean firing rate, as shown in Fig. 8.
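The separability of the six vectors, the deviation β and the learning time can be checked with a few lines of Python. The sketch assumes that each of the 10 target values is repeated for the 10 synapses of its component and that one cycle consists of three alternations of a C+ vector (5 s) and a C− vector (1.5 s); the variable names are ours.

```python
import numpy as np

vectors = np.array([
    [1, 1, 1, 1, 0, 0, 0, 0, 1, 0],  # S1
    [0, 1, 0, 0, 1, 1, 1, 0, 1, 0],  # S2
    [1, 0, 0, 1, 1, 1, 1, 0, 0, 1],  # S3
    [1, 1, 0, 0, 0, 1, 0, 1, 0, 1],  # S4
    [0, 1, 0, 1, 1, 0, 0, 1, 0, 1],  # S5
    [0, 1, 1, 0, 0, 1, 0, 1, 0, 0],  # S6
])
target = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0])

# Projection on the target weights: 3 for S1..S3 and 1 for S4..S6
scores = vectors @ target

def deviation(w, w_target):
    """beta: summed absolute deviation relative to the summed target weights."""
    return np.abs(w - w_target).sum() / w_target.sum()

w_target = np.repeat(target, 10).astype(float)    # assumed synapse ordering
beta_initial = deviation(np.full(100, 0.4), w_target)   # all weights start at 0.4

cycle_seconds = 3 * (5.0 + 1.5)                   # three C+/C- alternations
total_seconds = 310 * cycle_seconds               # 6045.0
```

With all weights initialized to 0.4 the deviation starts at β = 1, so values of β below 1 in Fig. 7 indicate that the weights have moved towards the target.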

3 Conclusion

There is a straightforward way to obtain a spiking network that solves the task of recognizing gender hidden in texts: train a well-studied artificial network and then use the resulting weights in the spiking network implemented on hardware with low energy consumption. The results of mapping ANN to SNN demonstrate the same classification error of 0.22 for both ANN and SNN, indicating lossless mapping.

It is also possible to implement supervised learning in a spiking network with spike-timing-dependent plasticity, based on controlling the correlation between input and output spike trains. The proposed technique opens the way for its use in practical tasks, such as gender identification. This is a subject of further research.

Acknowledgements

This work was supported by RSF, project 16-18-10050 “Identifying the Gender and Age of Online Chatters Using Formal Parameters of their Texts”. Simulations were carried out using high-performance computing resources of the federal center for collective usage at NRC “Kurchatov Institute”, http://computing.kiae.ru

References

[1] Chris Eliasmith. How to build a brain: A neural architecture for biological cognition. Oxford University Press, 2013.

[2] Peter U. Diehl, Daniel Neil, Jonathan Binas, Matthew Cook, Shih-Chii Liu, and Michael Pfeiffer. Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing. In IEEE International Joint Conference on Neural Networks (IJCNN), 2015.

[3] R. Gütig and H. Sompolinsky. The tempotron: a neuron that learns spike timing-based decisions. Nat. Neurosci., 9(3):420–428, 2006.

[4] Joseph M. Brader, Walter Senn, and Stefano Fusi. Learning real-world stimuli in a neural network with spike-driven synaptic dynamics. Neural Computation, 19(11):2881–2912, 2007.

[5] Jan-Moritz P. Franosch, Sebastian Urban, and J. Leo van Hemmen. Supervised spike-timing-dependent plasticity: A spatiotemporal neuronal learning rule for function approximation and decisions. Neural Computation, 25(12):3113–3130, 2013.

[6] A. Morrison, M. Diesmann, and W. Gerstner. Phenomenological models of synaptic plasticity based on spike timing. Biol. Cybern., 98:459–478, 2008.

[7] Peter U. Diehl and Matthew Cook. Unsupervised learning of digit recognition using spike-timing-dependent plasticity. Frontiers in Computational Neuroscience, 2015.

[8] O. V. Zagorovskaya, T. A. Litvinova, and O. A. Litvinova. Elektronnyy korpus studencheskikh esse na russkom yazyke i ego vozmozhnosti dlya sovremennykh gumanitarnykh issledovaniy [Electronic corpus of student essays and its applications in modern humanity studies]. Mir nauki, kultury i obrazovaniya [World of Science, Culture and Education], 3(34):387–9, 2012.

[9] A. Sboev, T. Litvinova, D. Gudovskikh, R. Rybka, and I. Moloshnikov. Machine learning models of text categorization by author gender using topic-independent features (in review).

[10] R. Legenstein, C. Naeger, and W. Maass. What can a neuron learn with spike-timing-dependent plasticity? Neural Computation, 17:2337–2382, 2005.
