on Multilayer Perceptron
Thi-Kien Dao 1, Shi-Jie Jiang 1, Truong-Giang Ngo 2 (B), Thi-Thanh-Tan Nguyen 3, and Trong-The Nguyen 1,4
1 Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fujian University of Technology, Fuzhou 350014, China
vnthe@hpu.edu.vn
2 Thuyloi University, 175 Tay Son, Dong Da, Hanoi, Vietnam
giangnt@tlu.edu.vn
3 Information Technology Faculty, Electric Power University, Hanoi, Vietnam
4 Haiphong University of Manage and Technology, Haiphong 180000, Vietnam
Abstract. Node positioning accuracy and the environmental impact on devices in wireless sensor networks (WSN) have recently received much attention from scholars. This paper proposes a prediction method for sensor node positioning based on a multilayer perceptron neural network. The node locations, together with their signal strength characteristics, are captured to form a dataset. The signal strength features of the nodes are extracted from a large number of noisy signal strength samples and are measured by nearest neighbor estimation as inputs to the scheme. Experimental results show that the proposed scheme provides higher positioning accuracy and a lower average error than the other methods in the literature.
Keywords: Wireless sensor networks · Multilayer perceptron · Indoor positioning node · Predictive positioning
Thanks to advances in computer technology and smartphones, smart sensor devices have become common in daily life, e.g., in health care with positioning services and in environment monitoring [1–3]. Node localization determines the position of a device by an estimation technique for the indoor environment [4, 5]. Several factors can influence node localization accuracy, e.g., the complex indoor radio transmission environment, the indoor building layout, and personnel mobility [6]. Because the indoor signal fading model cannot be established accurately, indoor positioning lags far behind outdoor positioning technology [7]. The global positioning system (GPS) and cellular base stations are widely deployed in outdoor positioning.
Low-cost, high-precision indoor node positioning solutions have therefore attracted increasing attention from scholars. Wireless communication technologies, e.g., WiFi, Zigbee, cellular, and Bluetooth, can be used indoors to work around the blocking of GPS signals [8]. However, localization accuracy is affected by obstacles, non-line-of-sight propagation, and noise in the complex indoor environment [9]. Traditional methods, such as continuous positioning, cannot achieve high accuracy because of the complex environment or the overfitting problem. It is therefore of practical significance to study indoor positioning algorithms for smart node devices.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd 2021
J.-S Pan et al (eds.), Advances in Intelligent Information Hiding and Multimedia
Signal Processing, Smart Innovation, Systems and Technologies 212,
This paper considers an indoor positioning method for node devices based on multilayer perceptron learning, in which the hidden structural features of the data are extracted by direct learning. The generalization ability of the network is used to avoid the overfitting problem. A deep learning model with input, hidden, and output layers is trained so that its output approximately equals its input; the network weights learned in this way define the encoding model, which is then applied to continuous prediction positioning. The locations of the nodes are estimated from the captured signal strengths with the nearest neighbor algorithm. The signal strength data produced by the stacked coding scheme are used to build a position database for fitting and generalization.
A multilayer perceptron (MLP) is a multilayer neural network that usually contains multiple hidden layers, which improve the expressive ability of the network for prediction. It is similar to the three-layer structure of the traditional neural network, which includes an input layer, a hidden layer, and an output layer. The learning gradient is transmitted effectively through layer-by-layer training methods.
MLP is a feedforward network and a typical deep learning model. It consists of multiple layers of nodes, where each layer is fully connected to the next layer and each node in a hidden layer applies a nonlinear activation function. Several activation functions are available, e.g., the Sigmoid, Tanh, and ReLU functions.
The Sigmoid function is an S-shaped function used in neural networks that compresses a real number into the interval [0, 1] and is easy to interpret. However, when the output of a neuron approaches 0 or 1, saturation occurs, which leads to gradient dispersion (vanishing gradients). Therefore, the weights should be initialized carefully.

$$f(x) = \frac{1}{1 + e^{-x}} \tag{1}$$

The Tanh function has good data control ability and maps real numbers to the interval [−1, 1], but it still has a saturation problem.

$$f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{2}$$

The ReLU function is a rectified linear unit: its output is 0 when x < 0 and x when x > 0, so its gradient is 0 or 1. ReLU converges faster, but it is also more fragile. A large gradient flow may lead to the permanent failure of neurons, which can be avoided by selecting an appropriate learning rate or by inter-layer batch normalization. The formula is as follows:

$$f(x) = \max(0, x) \tag{3}$$
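The three activation functions can be sketched in a few lines of Python. This is only an illustration using the standard library; the function names are chosen here and do not come from the paper. The helper `sigmoid_grad` is added to show the saturation effect mentioned above: the derivative of the Sigmoid function becomes very small when its output is close to 0 or 1.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, Eq. (1): squashes a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x: float) -> float:
    """Hyperbolic tangent, Eq. (2): squashes a real number into (-1, 1)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


def relu(x: float) -> float:
    """Rectified linear unit: 0 for negative inputs, x otherwise."""
    return max(0.0, x)


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid; it shrinks toward 0 as the output saturates."""
    s = sigmoid(x)
    return s * (1.0 - s)


if __name__ == "__main__":
    for x in (-10.0, 0.0, 10.0):
        print(x, sigmoid(x), tanh(x), relu(x), sigmoid_grad(x))
```

Comparing `sigmoid_grad(0.0)` with `sigmoid_grad(10.0)` makes the saturation problem visible: the gradient at the centre is several orders of magnitude larger than the gradient in the saturated region.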
The network model is trained by back-propagation. The training sample set is $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$, where m is the number of samples, and this set is used to train the neural network. The loss function in the experiment is expressed as follows.

$$J(W, b; x, y) = \frac{1}{2} \left\| h_{W,b}(x) - y \right\|^{2} \tag{4}$$

The critical step of the gradient descent method is to calculate the partial derivatives. The iterative formulas are given as follows.

$$W_{ij}^{(1)} = W_{ij}^{(1)} - \alpha \frac{\partial}{\partial W_{ij}^{(1)}} J(W, b) \tag{5}$$

$$b_{i}^{(1)} = b_{i}^{(1)} - \alpha \frac{\partial}{\partial b_{i}^{(1)}} J(W, b) \tag{6}$$

where W and b are the weights and the bias terms of the network, respectively, and α is the learning rate.
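A minimal sketch of this update rule, under simplifying assumptions that are not taken from the paper: a single sigmoid neuron instead of a full multilayer network, per-sample updates, a toy logical-OR dataset, and a learning rate of 0.5. All names (`loss`, `gradient_step`, `train`) are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, y):
    """Squared-error loss J = 0.5 * (h(x) - y)^2 for one sample, as in Eq. (4)."""
    h = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return 0.5 * (h - y) ** 2


def gradient_step(w, b, x, y, alpha):
    """One gradient-descent update of (w, b) on a single sample."""
    h = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    delta = (h - y) * h * (1.0 - h)  # dJ/dz by the chain rule
    new_w = [wi - alpha * delta * xi for wi, xi in zip(w, x)]  # dJ/dw_i = delta * x_i
    new_b = b - alpha * delta  # dJ/db = delta
    return new_w, new_b


def train(samples, alpha=0.5, epochs=2000):
    """Repeat the update over all samples; weights start at zero (an assumption)."""
    w, b = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        for x, y in samples:
            w, b = gradient_step(w, b, x, y, alpha)
    return w, b


if __name__ == "__main__":
    data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
    w, b = train(data)
    print(sum(loss(w, b, x, y) for x, y in data))
```

In a multilayer network the same rule is applied to every weight and bias, with the partial derivatives obtained by propagating `delta` backwards through the layers.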
A deep learning regression prediction model with an indoor location scheme can predict and estimate discrete points. For more accurate continuous location prediction, a regression prediction model is used to build a dataset through meaningful learning. The linear regression model can be expressed as follows.

$$f(x) = w^{T} x + b \tag{7}$$

where x represents the input, w represents the weight, and b represents the bias. The parameters w and b are trained by minimizing the objective function. The model first processes the input data and then performs pre-training. When the output layer is reached, the model propagates the error back. The algorithm stops when it converges.
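The linear model and its training loop can be illustrated as follows. This is a sketch under stated assumptions: the objective is taken to be the mean squared error, it is minimized by batch gradient descent, and "convergence" is tested as a change in the objective below a tolerance. None of these choices, nor the names `predict` and `fit_linear`, are specified in the paper.

```python
def predict(w, b, x):
    """Linear model f(x) = w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def fit_linear(xs, ys, alpha=0.05, max_iter=10000, tol=1e-12):
    """Minimize the mean squared error by batch gradient descent.

    Iteration stops when the objective changes by less than `tol`
    or after `max_iter` passes, whichever comes first.
    """
    n, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    prev = float("inf")
    for _ in range(max_iter):
        errs = [predict(w, b, x) - y for x, y in zip(xs, ys)]
        obj = sum(e * e for e in errs) / (2 * n)
        if abs(prev - obj) < tol:
            break
        prev = obj
        grad_w = [sum(e * x[k] for e, x in zip(errs, xs)) / n for k in range(d)]
        grad_b = sum(errs) / n
        w = [wk - alpha * gk for wk, gk in zip(w, grad_w)]
        b -= alpha * grad_b
    return w, b


if __name__ == "__main__":
    # Toy data generated from y = 2x + 1 (illustrative only).
    w, b = fit_linear([[0.0], [1.0], [2.0], [3.0]], [1.0, 3.0, 5.0, 7.0])
    print(w, b)
```

For positioning, one such regressor per coordinate (x and y) would map a feature vector to a continuous location rather than to one of a fixed set of discrete points.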
The location of the sensor nodes in the deployed network environment is estimated by the deep learning indoor location algorithm from the signal strengths of the various signal sources and sensor devices. The dataset of position points is established on the principle of extracting features and reducing noise, and it is used to train and test the multilayer perceptron. The signal strength features are matched against the node position dataset, and the nearest neighbor algorithm estimates the location of each measured point by choosing the best matching position.
3.1 The Positioning Algorithm
The relevant features can be extracted from the high-dimensional collected data, which also reduces the data dimension. The input layer, the hidden layer, and the output layer are calculated as follows. Let v be the input layer and h be the hidden layer:

$$h = \frac{1}{1 + \exp(-w v - b)} \tag{8}$$

The reconstructed output layer u′ is then calculated from h as follows:

$$u' = \frac{1}{1 + \exp(-w' h - b')} \tag{9}$$

where w and w′ are, respectively, the connection weights between the input layer and the hidden layer and between the hidden layer and the reconstructed output layer. The weight matrix w′ is constrained to be the transpose of the weight matrix w, that is, $w' = w^{T}$. Here b and b′ are the bias units of the hidden layer and the reconstructed output layer, respectively, and h is the hidden layer unit data. Training the autoencoder minimizes the reconstruction error between the input layer v and the reconstructed output u′. The smaller the error, the closer the reconstructed output layer is to the input layer, and the better the hidden layer expresses the information of the input layer, which achieves the purpose of feature extraction.
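The forward pass of such a tied-weight autoencoder, Eqs. (8) and (9), can be sketched as below. The layer sizes, the weight values, and the squared-error form of the reconstruction error are assumptions made for illustration, and the function names are hypothetical. The inputs are assumed to be RSS values already scaled into [0, 1].

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def encode(w, b, v):
    """Hidden layer h = sigmoid(w v + b), Eq. (8); w has one row per hidden unit."""
    return [sigmoid(sum(wij * vj for wij, vj in zip(row, v)) + bi)
            for row, bi in zip(w, b)]


def decode(w, b_prime, h):
    """Reconstruction u' = sigmoid(w^T h + b'), Eq. (9), with tied weights w' = w^T."""
    n_in = len(w[0])
    return [sigmoid(sum(w[i][j] * h[i] for i in range(len(w))) + b_prime[j])
            for j in range(n_in)]


def reconstruction_error(v, u_prime):
    """Squared error between the input v and its reconstruction u'."""
    return 0.5 * sum((vi - ui) ** 2 for vi, ui in zip(v, u_prime))


if __name__ == "__main__":
    w = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]  # 2 hidden units, 3 inputs
    b, b_prime = [0.0, 0.0], [0.0, 0.0, 0.0]
    v = [0.2, 0.7, 0.4]
    h = encode(w, b, v)
    print(h, reconstruction_error(v, decode(w, b_prime, h)))
```

After training drives the reconstruction error down, only `encode` is needed: the hidden vector `h` is the lower-dimensional feature that replaces the raw signal strength vector.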
The input layer is the K-dimensional vector $v = \{v_i \mid i = 1, 2, \ldots, K\}$, and the network has N hidden layers whose neurons hold the encoded values. In the experiment, the offline data $v_{\text{offline}} = \{v_{ji} \mid j = 1, 2, \ldots, J;\ i = 1, 2, \ldots, K\}$ are used as the input to train the stacked autoencoder, where J is the number of data records collected in the offline phase and each dimension of each record corresponds to the RSS of a fixed AP or iBeacon. Training yields the newly encoded data

$$\mathrm{DATA}_{\text{offline}} = \{h^{3}_{ji,\text{offline}} \mid j = 1, 2, \ldots, J;\ i = 1, 2, \ldots, n\} = \{\mathrm{data}_{ji} \mid j = 1, 2, \ldots, J;\ i = 1, 2, \ldots, n\},$$

where n represents the dimension of the data.
3.2 Nearest Neighbor Technique
In the online phase, the data $v_{\text{online}} = \{v_i \mid i = 1, 2, \ldots, K\}$ are fed to the input layer and propagated forward through the network, where the parameters w and b are those trained in the offline phase. The result,

$$\mathrm{DATA}_{\text{online}} = \{h^{3}_{i,\text{online}} \mid i = 1, 2, \ldots, n\} = \{\mathrm{DATA}_i \mid i = 1, 2, \ldots, n\},$$

is the input of the nearest neighbor classifier. The AP or iBeacon corresponding to the RSS [10] of each dimension is the same in the original fingerprint database and in the online data $v_{\text{online}}$, so each dimension of the new fingerprint database also corresponds to the same dimension of the online DATA. The nearest neighbor method then calculates the Euclidean distance between the online data and the j-th record in the new fingerprint database:

$$d_j = \sqrt{\sum_{i=1}^{n} \left( \mathrm{DATA}_i - \mathrm{data}_{ji} \right)^{2}} \tag{10}$$

where $\mathrm{data}_{ji}$ represents the i-th dimension of the j-th record in the new fingerprint database, $\mathrm{DATA}_i$ represents the i-th dimension of the online data, and n represents the dimension of the data processed by the stacked autoencoder. Finally, the distances $d_j$ are sorted from small to large (the shorter the distance, the higher the similarity of the two records), and the coordinate of the sampling point with the smallest distance is the positioning result.
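The matching step can be sketched as follows. The fingerprint layout, a list of (features, coordinate) pairs, and the function names are assumptions for illustration; the paper does not describe its data structures.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two n-dimensional feature vectors, Eq. (10)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def nearest_position(online, fingerprints):
    """Return the coordinate whose stored features are closest to `online`.

    fingerprints: list of (features, (x, y)) pairs built in the offline phase.
    """
    ranked = sorted(fingerprints, key=lambda rec: euclidean(online, rec[0]))
    return ranked[0][1]


if __name__ == "__main__":
    # Hypothetical encoded fingerprints and their sampling-point coordinates.
    database = [
        ([0.1, 0.9], (0.0, 0.0)),
        ([0.8, 0.2], (3.0, 4.0)),
        ([0.5, 0.5], (1.5, 2.0)),
    ]
    print(nearest_position([0.75, 0.25], database))
```

Sorting the whole database mirrors the description above; when only the best match is needed, `min` with the same key avoids the full sort.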
4 Experimental Results
A deployed network area is used for node device localization to verify the effectiveness of the proposed scheme. The deployed coverage area includes corridors and offices equipped with desks, chairs, bookcases, and other office items. At each location, signal strength data are collected in groups at a constant time interval.
The multilayer neural network is simulated and tested with a set of 100 samples. The number of hidden layers in the multilayer neural network classifier is set to L (L = 3, 5, 10, 20, 50), ReLU is adopted as the activation function of the hidden layers, and the weights are initialized. Regression fitting is carried out on the test set to predict the coordinates of the points.
Table 1 compares the prediction results of multiple hidden layers with those of a single hidden layer. The positioning error with multiple hidden layers is clearly lower than with a single layer, but it is still substantial.
Table 1 Comparison of results of multi-hidden layers with the single hidden layer
Hidden layer setting                                    A single hidden layer   Multi-hidden layers
Mean positioning error /m                               0.336                   0.268
Registration points with error less than 0.25 m /%      19.5                    51.8
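The two metrics reported in the table can be computed as in the sketch below. It assumes that the positioning error of a test point is the Euclidean distance, in metres, between the predicted and the true 2-D coordinates; the paper does not state the definition explicitly, and the function names are hypothetical.

```python
import math


def positioning_errors(predicted, actual):
    """Euclidean error in metres for each (x, y) prediction."""
    return [math.dist(p, a) for p, a in zip(predicted, actual)]


def mean_error(errors):
    """Mean positioning error over all test points."""
    return sum(errors) / len(errors)


def share_below(errors, threshold=0.25):
    """Percentage of test points whose error is below `threshold` metres."""
    return 100.0 * sum(e < threshold for e in errors) / len(errors)


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    predicted = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.3), (3.0, 4.0)]
    actual = [(0.0, 0.1), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
    errs = positioning_errors(predicted, actual)
    print(mean_error(errs), share_below(errs))
```

Reporting both values is useful because the mean can be dominated by a few large errors, while the share below 0.25 m shows how many points are located accurately.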
Table 2 shows the results for the selected hidden-layer activation functions, i.e., ReLU, Sigmoid, and Tanh. The ReLU function gives adequate positioning accuracy and performs well in the classification task. However, the Sigmoid function is less practical when the data distribution is uneven because of its weak ability to control data.
Table 2 Comparison of positioning accuracy of selected activation functions for the hidden layers
The activation function                                 ReLU      Sigmoid   Tanh
Mean positioning error /m                               0.2716    0.2158    0.1796
Registration points with error less than 0.25 m /%      37        54        71
The experimental results of the proposed method are compared with those of other techniques, e.g., the grey wolf optimizer (GWO) [4], the firefly algorithm (FA) [5], pigeon-inspired optimization (PIO) [9], and ion motion optimization (IMO) [11], for constructing the relationship between positioning features and positioning coordinates under the same experimental conditions. Figure 1 compares the values obtained by the proposed method with those of several node positioning methods, e.g., the GWO, FA, IMO, and PIO algorithms. Subfigure (a) is the average positioning error comparison, and subfigure (b) is the cumulative probability distribution of the average positioning error. The results show that the proposed method provides smaller errors in the device positioning problem.
Table 3 compares the time consumption (ms) of the proposed method with that of the GWO, FA, IMO, and PIO approaches to the node positioning problem for different numbers of nodes in the deployed network. It can be seen that the running time of the proposed method is shorter than that of GWO and FA in all cases.
Table 3 Comparison of the time consumption (ms) of the proposed method with the GWO, FA, IMO, and PIO approaches for the node positioning problem with different numbers of nodes in the deployed networks
Algorithms        N = 20     N = 50     N = 80     N = 110    N = 130    N = 160
Proposed method   297.476    350.721    398.991    451.081    502.766    552.004
GWO               301.350    357.109    408.329    459.231    513.221    565.171
FA                302.701    353.281    405.421    458.341    508.217    557.124
IMO               6.213      102.334    166.210    217.662    275.329    327.371
PIO               60.001     101.219    159.296    241.002    266.737    319.534
Generally, the comparison results of location accuracy and calculation time show that the proposed algorithm can achieve better performance than the competitors
In this study, we proposed a prediction scheme for sensor node positioning based on a multilayer perceptron neural network. Nearest neighbor estimation was used to provide the inputs of the scheme. The signal strengths at the node locations were used as the input parameters of the classification system. The signal strength features of the nodes were extracted from a large number of signal strength samples, even though these included noise. The experimental results, compared with other approaches in the literature, e.g., the GWO, FA, IMO, and PIO methods, show that the proposed scheme provides higher positioning accuracy and a lower average error than the competitors.
Fig. 1 Comparison of the obtained values of the proposed method with several node positioning methods, e.g., the GWO, FA, IMO, and PIO algorithms. Subfigure (a) is the average positioning error comparison, and subfigure (b) is the cumulative probability distribution of the average positioning error
References
1. Clemensen, J., Larsen, S.B., Kyng, M., Kirkevold, M.: Participatory design in health sciences: using cooperative experimental methods in developing health services and computer technology. Qual. Health Res. 17, 122–130 (2007)
2. Dao, T., Nguyen, T., Pan, J., Qiao, Y., Lai, Q.: Identification failure data for cluster heads aggregation in WSN based on improving classification of SVM. IEEE Access 8, 61070–61084 (2020). https://doi.org/10.1109/ACCESS.2020.2983219
3. Nguyen, T.-T., Qiao, Y., Pan, J.-S., Chu, S.-C., Chang, K.-C., Xue, X., Dao, T.-K.: A hybridized parallel bats algorithm for combinatorial problem of traveling salesman. J. Intell. Fuzzy Syst. Preprint, 1–10 (2020). https://doi.org/10.3233/JIFS-179668
4. Nguyen, T.-T., Thom, H.T.H., Dao, T.-K.: Estimation localization in wireless sensor network based on multi-objective grey wolf optimizer. In: Akagi, M., Nguyen, T.-T., Vu, D.-T., Phung, T.-N., Huynh, V.-N. (eds.) Adv. Inf. Commun. Technol. Proc. Int. Conf. ICTA 2016, pp. 228–237. Springer International Publishing, Cham (2017). https://doi.org/10.1007/978-3-319-49073-1_25
5. Nguyen, T.-T., Pan, J.-S., Chu, S.-C., Roddick, J.F., Dao, T.-K.: Optimization localization in wireless sensor network based on multi-objective firefly algorithm. J. Netw. Intell. 1, 130–138 (2016)
6. Nguyen, T.T., Pan, J.S., Dao, T.K.: An improved flower pollination algorithm for optimizing layouts of nodes in wireless sensor network. IEEE Access 7, 75985–75998 (2019). https://doi.org/10.1109/ACCESS.2019.2921721
7. Pei, L., Chen, R., Chen, Y., Leppäkoski, H., Perttula, A.: Indoor/outdoor seamless positioning technologies integrated on smart phone. In: 2009 First Int. Conf. Adv. Satell. Sp. Commun., pp. 141–145. IEEE (2009)
8. Yeh, S.-C., Hsu, W.-H., Su, M.-Y., Chen, C.-H., Liu, K.-H.: A study on outdoor positioning technology using GPS and WiFi networks. In: 2009 Int. Conf. Networking, Sens. Control, pp. 597–601. IEEE (2009)
9. Nguyen, T.-T., Pan, J.-S., Dao, T.-K., Sung, T.-W., Ngo, T.-G.: Pigeon-inspired optimization for node location in wireless sensor network. In: Sattler, K.-U., Nguyen, D.C., Vu, N.P., Tien Long, B., Puta, H. (eds.) Advances in Engineering Research and Application, pp. 589–598. Springer International Publishing, Cham (2020)
10. Vaghefi, R.M., Gholami, M.R., Ström, E.G.: RSS-based sensor localization with unknown transmit power. In: 2011 IEEE Int. Conf. Acoust. Speech Signal Process., pp. 2480–2483
11. Pan, J.-S., Nguyen, T.-T., Chu, S.-C., Dao, T.-K., Ngo, T.-G.: Diversity enhanced ion motion optimization for localization in wireless sensor network. J. Inf. Hiding Multimed. Signal Process. 10, 221–229 (2019)
and Feature Scoring Method
Tserenpurev Chuluunsaikhan 1, Kwan-Hee Yoo 1, HyungChul Rah 2, and Aziz Nasridinov 1 (B)
1 Department of Computer Science, Chungbuk National University, Cheongju, South Korea
{teo,khyoo,aziz}@chungbuk.ac.kr
2 Department of Management Information System, Chungbuk National University, Cheongju, South Korea
hrah@chungbuk.ac.kr
Abstract. A large amount of text data may hide a numeric connection to some other subject, for example, price. In this paper, we aim to predict pork prices based on topic modeling and a word scoring method. The study consists of four steps: feature extraction, word scoring, feature selection, and prediction. Any prediction model has inputs (features) and an output. We extracted our features from online news data using a topic modeling technique (LDA), and we selected the daily pork price as the output. After that, we created a word scoring corpus using the result of LDA and the price movements. Because our features and output are numeric values, we applied Pearson's correlation for feature selection. To check our word scoring method, we built a pork price prediction model using LSTM. We evaluated the model without feature selection and with feature selection, using RMSE, MAE, and MAPE to measure its accuracy. The results show that our model can be used to predict the price of pork and other agricultural commodities.
Keywords: Price prediction · Topic modeling · Word scoring · LSTM
Agriculture is an important sector that has always accompanied human development. It is the process of producing food by cultivating grain and raising domesticated animals (livestock). Everyone takes part in this process, for example as governors, farmers, or consumers. What connects them is the price of agricultural commodities. High prices benefit farmers but not consumers, who always want low prices. In this situation, governors attempt to keep the price at a proper level. Predicting the price will help governors make decisions in the future. If farmers know the price in advance, they can also regulate the production of agricultural commodities, which helps them avoid economic losses.
Many research works attempt to predict agricultural commodity prices [1–3]. Notably, online text data (online news, Twitter, and others) make a significant contribution to predicting agricultural commodity prices [2, 3], because online text data can include the reasons (words) for price changes. So, we need to extract good keywords to get good results in agricultural price prediction.
Pork is one of the most consumed agricultural commodities in Korea. As with any other product, market supply and demand determine the price of pork. Unexpected and unplanned events can change supply and demand. For example, African Swine Fever is a hot topic in the Korean pork market. During an outbreak, people reduce their consumption of pork, and the pork price falls as a result. When demand decreases continuously, the governors begin to take action to support consumption. Consumers usually learn this kind of information from the news and adjust their consumption. That is why we believe that news can affect the price of pork.
Price prediction is the process of trying to calculate the future price using inputs (features) that can affect the price. The inputs can be different values depending on the sector and the goals. In this paper, we propose a pork price prediction model using topic modeling and a word scoring method. First, we applied a topic modeling technique to obtain the most relevant words from online news data. Then, we compared each day's price with the price of the previous day to determine whether it was rising or falling. Finally, we scored each word from the topic model using the price changes. We evaluated our model using the LSTM algorithm. To increase the accuracy of the model, we also applied a feature selection method. We measured accuracy using root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE).
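The three accuracy measures named above have standard definitions, sketched here with the Python standard library. The function names and the sample prices are illustrative and not taken from the paper; MAPE is expressed in percent and assumes that no actual price is zero.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Made-up prices, for illustration only.
    actual = [100.0, 200.0, 400.0]
    predicted = [110.0, 190.0, 400.0]
    print(rmse(actual, predicted), mae(actual, predicted), mape(actual, predicted))
```

RMSE penalizes large errors more heavily than MAE, while MAPE is scale-free, which makes it convenient for comparing commodities with different price levels.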
We explain our proposed method in this section, which consists of feature extraction (2.1), feature scoring (2.2), feature selection (2.3), and price prediction (2.4). Figure 1 shows an overview of our proposed method. The proposed method mainly consists of feature extraction, feature scoring, and feature selection. Additionally, we applied some other data preprocessing techniques to increase the accuracy of our model. We explain them in detail in the following subsections.
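The abstract names Pearson's correlation as the feature selection criterion. A minimal sketch of that idea is given below; the threshold of 0.3, the column names, and the function names are hypothetical and are not taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0.0 or sy == 0.0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)


def select_features(columns, target, min_abs_r=0.3):
    """Keep the names of feature columns with |r| >= min_abs_r against the target."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= min_abs_r]


if __name__ == "__main__":
    # Made-up word scores and prices, for illustration only.
    price = [1.0, 2.0, 3.0, 4.0, 5.0]
    scores = {
        "up": [2.0, 4.0, 6.0, 8.0, 10.0],
        "down": [5.0, 4.0, 3.0, 2.0, 1.0],
        "flat": [1.0, 1.0, 1.0, 1.0, 1.0],
    }
    print(select_features(scores, price))
```

Using the absolute value keeps features that move against the price as well as those that move with it, since both carry predictive information.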
2.1 Feature Extraction
Feature extraction is the initial step of the methodology. We collected online news data from PigTimes [4], a web portal that publishes news about pigs. Since the portal publishes only pig-related news, we can be confident that our dataset relates to pigs, pork, pork farms, and the pork market. Topic modeling is an important technique in natural language processing (NLP) that extracts relevant topics from a large amount of text data. There are many approaches to obtaining topics from text data, and LDA is one of the most popular. We extracted the input features of our prediction model using the LDA technique.
Our output feature is the daily price of pork. We collected the retail price of pork from KAMIS [5], which provides various information related to the distribution of agricultural