MINISTRY OF EDUCATION AND TRAINING MINISTRY OF SCIENCE AND TECHNOLOGY
NATIONAL CENTER FOR TECHNOLOGICAL PROGRESS
VU TRUNG KIEN
RESEARCH AND DEVELOPMENT FOR WI-FI BASED
INDOOR POSITIONING TECHNIQUE
SUMMARY OF DOCTORAL THESIS
Field of study: Electronics Engineering
Code: 9520203
HA NOI - 2019
The thesis is completed at:
National Center for Technological Progress
Supervisor: Prof., Dr Le Hung Lan
Reviewer 1: Assoc Prof., Dr Thai Quang Vinh
Reviewer 2: Assoc Prof., Dr Ha Hai Nam
Reviewer 3: Assoc Prof., Dr Hoang Van Phuc
The thesis shall be defended in front of the Thesis Committee at
Academy Level at National Center for Technological Progress
At hour date month year 2019
The thesis can be found at: The Library of National Center for Technological Progress; The National Library
LIST OF PUBLISHED WORKS RELATED TO THE THESIS
[CT1] Hoang Manh Kha, Duong Thi Hang, Vu Trung Kien, Trinh Anh Vu (2017), "Enhancing WiFi based Indoor Positioning by Modeling measurement Data with GMM", IEEE International Conference on Advanced Technologies for Communications, IEEE, Quy Nhon, Vietnam, pp 325-328
[CT2] Vu, T.K., Hoang, M.K., and Le, H.L (2018), "WLAN Fingerprinting based Indoor Positioning in the Presence of Dropped Mixture Data", Journal of Military Science and Technology 57A(3), pp 25-34
https://drive.google.com/file/d/1jv2U3tmJq1vUEez6nt6Cq8DzJWEWZu6-/view
[CT3] Vu, Trung Kien and Le, Hung Lan (2018), "Gaussian Mixture
Modeling for Wi-Fi Fingerprinting based Indoor Positioning in the Presence of Censored Data", Vietnam Journal of Science, Technology and Engineering 61(1), pp 3-8,
DOI: https://doi.org/10.31276/VJSTE.61(1).03-08
[CT4](ISI-Q2) Vu, Trung Kien, Hoang, Manh Kha, and Le, Hung Lan
(2019), "An EM algorithm for GMM parameter estimation in the presence of censored and dropped data with potential application for indoor positioning", ICT Express, 5(2), pp 120-123,
DOI: 10.1016/j.icte.2018.08.001
Accepted paper:
[CT5](ISI-Q3) Vu, Trung Kien, Hoang, Manh Kha, and Le, Hung Lan
(2019), "Performance Enhancement of Wi-Fi Fingerprinting based IPS by Accurate Parameter Estimation of Censored and Dropped Data", Radioengineering, ISSN: 1805-9600. Submission: 06/04/2019, Reviews Opened: 27/05/2019, Accepted: 03/09/2019
INTRODUCTION
1 The necessity of the thesis
Satellite based positioning systems such as the GPS (Global Positioning System) can accurately locate objects in outdoor environments. However, in indoor environments, because satellite signals are not transmitted directly to the positioning device, the accuracy of these systems is greatly reduced. On the other hand, the need for indoor navigation keeps growing, for example positioning smartphone users moving through terminals, airports, and commercial centers; locating goods in stock; and positioning cars in parking lots. For these reasons, the IPS (Indoor Positioning System) has attracted considerable research and development interest in recent years.
Among the current indoor positioning technologies, Wi-Fi based positioning in the WLAN (Wireless Local Area Network) is the most commonly used, because Wi-Fi is available in most areas and popular mobile devices such as phones and computers are equipped with Wi-Fi transceivers.
For the above reasons, the author has chosen the topic "Research and development for Wi-Fi based indoor positioning techniques", which focuses on the RSSIF-IPT (Received Signal Strength Indication Fingerprinting based Indoor Positioning Technique).
2 Scope of the study
The thesis studies techniques for positioning static objects in dimensional space in indoor environments. The positioning technique studied is RSSIF-IPT. The issues studied include: the characteristics of Wi-Fi RSSI; modelling the distribution of Wi-Fi RSSI; algorithms to estimate and optimize the parameters of the model used for the distribution of Wi-Fi RSSI; and the online positioning algorithm.
3 Research objectives of the topic
Researching and developing the Wi-Fi RSSI fingerprinting based indoor positioning technique in order to minimize positioning errors and optimize positioning time. The detailed research objectives are as follows:
+ Developing algorithms to estimate the parameters and the number of Gaussian components in a GMM (Gaussian Mixture Model) in the presence of unobservable data;
+ Developing a positioning algorithm that minimizes positioning errors and optimizes positioning time.
4 Methods
Statistical methods for characterizing the collected data (Wi-Fi RSSI); analytical methods for developing the parameter estimation algorithms and the positioning algorithms; the Monte Carlo method for evaluating the proposed algorithms; and empirical methods on both simulated and real data to verify the effectiveness of the proposals when applied to an IPS.
5 New findings of the doctoral dissertation
- The parameter estimation algorithm for GMM in the presence of censored and dropped mixture data [CT2-CT4];
- The model selection algorithm for GMM from incomplete data [CT5];
- The positioning procedure in the presence of unobservable data [CT5]
6 Organization of dissertation
The thesis is divided into 4 chapters. Chapter 1: Overview of Wi-Fi based IPS. Chapter 2: GMM parameter estimation in the presence of censored and dropped data. Chapter 3: GMM model selection in the presence of censored and dropped data. Chapter 4: Positioning algorithm and experimental results.
CHAPTER 1 OVERVIEW OF WI-FI BASED IPS
1.1 Wi-Fi based indoor positioning techniques
Wi-Fi based indoor positioning techniques (IPT) can be divided into two main groups:
- Time and Space Attributes of Received Signal (TSARS) based IPTs. TSARS can be the Time of Arrival (ToA), the Time Difference of Arrival (TDoA) or the Angle of Arrival (AoA).
- RSSI based IPTs. This group includes the proximity positioning technique, the Path Loss Model (PLM) based positioning technique and RSSIF-IPT.
RSSIF-IPT consists of two phases: the offline training phase and the online positioning phase. During the training phase, RSSIs are collected at the reference points (RP) to build the database. In the online positioning phase, the RSSIs collected by the object (OB) are compared with the database, and the position of the OB is estimated from the location of one or several RPs. Among the positioning techniques, RSSIF-IPT has the most advantages.
RSSIF-IPT can use a deterministic method (D-RSSIF-IPT) or a probabilistic method (P-RSSIF-IPT). Compared with D-RSSIF-IPT, P-RSSIF-IPT has lower positioning error because its database can capture the variation of RSSI. P-RSSIF-IPT can use a non-parametric model (e.g. histogram) or a parametric model (e.g. Gaussian process, GMM) to model the distribution of Wi-Fi RSSIs. Compared with P-RSSIF-IPT using a non-parametric model, P-RSSIF-IPT using a parametric model has lower positioning errors and its database has to store fewer parameters.
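The online phase of a probabilistic, parametric fingerprinting scheme can be sketched as below. This is a simplified illustration only, not the positioning algorithm of the thesis: it assumes one Gaussian per (RP, AP) pair rather than a GMM, assumes the APs are independent given the RP, and the names `database`, `locate` and the AP labels are hypothetical.

```python
import math

def gaussian_logpdf(x, mu, sigma):
    """Log-density of a univariate Gaussian N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def locate(database, online_rssi):
    """Return the RP whose stored per-AP Gaussians best explain the online RSSIs."""
    best_rp, best_ll = None, -math.inf
    for rp, ap_models in database.items():
        # Sum the log-likelihoods over APs (independence assumption).
        ll = sum(gaussian_logpdf(online_rssi[ap], mu, sigma)
                 for ap, (mu, sigma) in ap_models.items())
        if ll > best_ll:
            best_rp, best_ll = rp, ll
    return best_rp

# Hypothetical offline database: RP coordinates -> {AP: (mean dBm, std dBm)}.
database = {
    (0.0, 0.0): {"ap1": (-50.0, 3.0), "ap2": (-80.0, 4.0)},
    (5.0, 0.0): {"ap1": (-65.0, 3.0), "ap2": (-65.0, 4.0)},
    (10.0, 0.0): {"ap1": (-80.0, 3.0), "ap2": (-50.0, 4.0)},
}
estimate = locate(database, {"ap1": -63.0, "ap2": -67.0})
print(estimate)
```

Only two numbers per (RP, AP) pair are stored here, which illustrates why a parametric database is smaller than a histogram-based one.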
1.2 Theoretical studies about the available RSSIF-IPT
The distribution of Wi-Fi RSSIs can be fitted by the Gaussian process, or by the GMM if the data was collected under changing conditions (e.g. doors opening or closing, commuters moving). Therefore, compared to the Gaussian process, the GMM can model the Wi-Fi RSSI distribution more accurately.
However, some data samples may not be observable for either of the following reasons:
- Censoring, i.e., clipping. Sensors are unable to measure RSSI values below some threshold, such as −100 dBm.
- Dropping. Occasionally, RSSI measurements of access points are not available although their value is clearly above the censoring threshold.
While censoring occurs due to the limited sensitivity of Wi-Fi sensors on portable devices, dropping comes from the limitations of sensor drivers and the operation of the WLAN system.
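The two effects can be illustrated with a small simulation. This is a sketch under stated assumptions: unobservable samples are represented as `None`, dropping is modeled as an independent event with a fixed probability, and the function name `observe` and the numeric values are hypothetical.

```python
import random

def observe(y, c=-100.0, drop_prob=0.1, rng=None):
    """Turn true RSSIs y into recorded values; None marks an unobservable sample."""
    if rng is None:
        rng = random.Random(0)
    x = []
    for value in y:
        censored = value < c                 # below the sensor sensitivity threshold
        dropped = rng.random() < drop_prob   # lost although possibly above c
        x.append(None if (censored or dropped) else value)
    return x

rng = random.Random(42)
c = -100.0
y = [rng.gauss(-95.0, 4.0) for _ in range(1000)]   # hypothetical true RSSIs in dBm
x = observe(y, c=c, drop_prob=0.1, rng=rng)

n_censored = sum(t < c for t in y)
n_missing = sum(v is None for v in x)
print(n_censored, n_missing)
```

From the recorded data alone, a missing sample does not reveal whether it was censored or dropped, which is why both phenomena have to be handled jointly in the estimation.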
According to our data investigation, the data set (Wi-Fi RSSIs) collected at an RP from an AP has characteristics corresponding to one of the following eight cases:
(1) The distribution of data can be drawn from one Gaussian component, and the whole data set is observable;
(2) The distribution of data can be drawn from one Gaussian component, and a part of the data set is unobservable due to the censoring problem;
(3) The distribution of data can be drawn from one Gaussian component, and a part of the data set is unobservable due to the dropping problem;
(4) The distribution of data can be drawn from one Gaussian component, and a part of the data set is unobservable due to the censoring and dropping problems;
(5) The distribution of data can be drawn from more than one Gaussian component, and the whole data set is observable;
(6) The distribution of data can be drawn from more than one Gaussian component, and a part of the data set is unobservable due to the censoring problem (figure 1.10a);
(7) The distribution of data can be drawn from more than one Gaussian component, and a part of the data set is unobservable due to the dropping problem (figure 1.10b);
(8) The distribution of data can be drawn from more than one Gaussian component, and a part of the data set is unobservable due to the censoring and dropping problems (figure 1.10c).
Figure 1.10 Histogram of Wi-Fi RSSIs
Published articles have addressed data sets with characteristics such as cases (1) - (5). However, no studies have been able to handle data sets with the characteristics of cases (6) - (8). For this reason, the thesis focuses on researching and proposing solutions to develop RSSIF-IPT so that it simultaneously handles the censoring, dropping and multi-component problems (cases (6) - (8)).
1.3 Conclusion of chapter 1
In this chapter, the thesis presents the available Wi-Fi based indoor positioning techniques. Chapter 1 also summarizes and analyzes related works on RSSIF-IPT. Based on the related works and the issues that remain unsolved for RSSIF-IPT, the thesis proposes its scientific research goals.
CHAPTER 2 GMM PARAMETER ESTIMATION IN THE PRESENCE OF CENSORED AND DROPPED DATA
2.1 Motivation
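In an indoor environment, the data set (Wi-Fi RSSIs) collected at an RP from an AP can be modeled by a GMM with $J$ Gaussian components ($J$ is a finite number). Let $y_n$ be the RSSI value gathered at the $n$th time ($1 \le n \le N$), where $N$ is the number of measurements. The $y_n$ are independent and identically distributed random variables. In a GMM, the PDF (Probability Density Function) of an observation $y_n$ is:
$$p(y_n;\Theta)=\sum_{j=1}^{J} w_j\,\mathcal{N}\!\left(y_n;\mu_j,\sigma_j^2\right),\qquad \sum_{j=1}^{J} w_j = 1,$$
where $w_j$, $\mu_j$ and $\sigma_j$ are the weight, mean and standard deviation of the $j$th Gaussian component and $\Theta$ is the set of all parameters.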
If data are unobservable owing to the censoring and dropping problems then:
2.2 Introduction to the EM algorithm
The EM (Expectation Maximization) algorithm is an iterative method for ML (Maximum Likelihood) estimation of the parameters of statistical models in the presence of hidden variables. This method can be used to estimate the parameters of a GMM and consists of two steps:
- E-step: creates a function for the expectation of the log-likelihood, evaluated using the current estimate of the parameters.
- M-step: computes the parameters maximizing the expected log-likelihood found in the E-step.
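The two steps can be sketched for the simplest setting, a univariate GMM whose samples are all observable. This is the textbook EM iteration, shown only as a baseline; it does not handle censored or dropped samples, and the function names and the synthetic data are hypothetical.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_gmm(data, w, mu, sigma, n_iter=50):
    """Plain EM for a univariate GMM on fully observable data."""
    w, mu, sigma = list(w), list(mu), list(sigma)  # do not modify the caller's lists
    J, N = len(w), len(data)
    for _ in range(n_iter):
        # E-step: posterior probability that sample n belongs to component j.
        resp = []
        for y in data:
            p = [w[j] * normal_pdf(y, mu[j], sigma[j]) for j in range(J)]
            total = max(sum(p), 1e-300)  # guard against underflow
            resp.append([pj / total for pj in p])
        # M-step: closed-form maximizers of the expected log-likelihood.
        for j in range(J):
            nj = sum(r[j] for r in resp)
            w[j] = nj / N
            mu[j] = sum(r[j] * y for r, y in zip(resp, data)) / nj
            var = sum(r[j] * (y - mu[j]) ** 2 for r, y in zip(resp, data)) / nj
            sigma[j] = math.sqrt(max(var, 1e-6))
    return w, mu, sigma

# Synthetic RSSIs (dBm) from a two-component mixture; values are illustrative.
rng = random.Random(1)
data = [rng.gauss(-85.0, 2.0) if rng.random() < 0.6 else rng.gauss(-70.0, 3.0)
        for _ in range(1000)]
w, mu, sigma = em_gmm(data, [0.5, 0.5], [-90.0, -65.0], [5.0, 5.0])
print(w, mu, sigma)
```

The estimates should land close to the generating values (weights 0.6 and 0.4, means −85 and −70 dBm), up to sampling error.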
2.3 GMM parameter estimation in the presence of censored data
The EM algorithm for GMM parameter estimation in the presence of censored data (EM-C-GMM) [CT3] is developed as follows:
Let $\Delta_{nj}$ ($n = 1 \ldots N$, $j = 1 \ldots J$) be the latent variables: $\Delta_{nj} = 1$ if $y_n$ belongs to the $j$th Gaussian component and $\Delta_{nj} = 0$ if $y_n$ does not belong to the $j$th Gaussian component. The expectation of the log-likelihood function (LLF) of $y$ given the observations $x$ and the old estimated parameters $\Theta^{(k)}$ is calculated:
The function $Q(\Theta;\Theta^{(k)})$ was calculated for two cases, including $x_n = y_n$ and the censored case.
[Equations, ending with (2.27)]
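The idea of an EM iteration for left-censored mixture data can be sketched as follows. This is not a transcription of the thesis equations: it is a minimal sketch that uses the standard first and second moments of a truncated Gaussian for the censored samples, assumes that only the number of censored samples is known, and uses hypothetical names (`em_censored_gmm`, `n_censored`) and synthetic data.

```python
import math
import random

SQRT2 = math.sqrt(2.0)

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal CDF, floored to avoid division by zero."""
    return max(0.5 * math.erfc(-z / SQRT2), 1e-300)

def em_censored_gmm(observed, n_censored, c, w, mu, sigma, n_iter=150):
    """EM for a univariate GMM when samples at or below c are censored."""
    w, mu, sigma = list(w), list(mu), list(sigma)
    J, N = len(w), len(observed) + n_censored
    for _ in range(n_iter):
        # E-step, observed samples: ordinary responsibilities.
        resp = []
        for y in observed:
            p = [w[j] * phi((y - mu[j]) / sigma[j]) / sigma[j] for j in range(J)]
            total = max(sum(p), 1e-300)
            resp.append([pj / total for pj in p])
        # E-step, censored samples: all share the same responsibilities,
        # proportional to the probability mass of each component below c.
        alpha = [(c - mu[j]) / sigma[j] for j in range(J)]
        pc = [w[j] * Phi(alpha[j]) for j in range(J)]
        resp_c = [p / sum(pc) for p in pc]
        # M-step: censored samples enter through truncated-Gaussian moments.
        for j in range(J):
            lam = phi(alpha[j]) / Phi(alpha[j])
            m1 = mu[j] - sigma[j] * lam                                     # E[y | y <= c]
            m2 = mu[j] ** 2 + sigma[j] ** 2 - sigma[j] * lam * (mu[j] + c)  # E[y^2 | y <= c]
            nc_j = n_censored * resp_c[j]
            nj = sum(r[j] for r in resp) + nc_j
            new_mu = (sum(r[j] * y for r, y in zip(resp, observed)) + nc_j * m1) / nj
            ss = sum(r[j] * (y - new_mu) ** 2 for r, y in zip(resp, observed))
            ss += nc_j * (m2 - 2.0 * new_mu * m1 + new_mu ** 2)
            w[j], mu[j] = nj / N, new_mu
            sigma[j] = math.sqrt(max(ss / nj, 1e-6))
    return w, mu, sigma

# Synthetic example: the weaker component is heavily censored at c = -96 dBm.
rng = random.Random(0)
c = -96.0
data = [rng.gauss(-95.0, 3.0) if rng.random() < 0.5 else rng.gauss(-80.0, 3.0)
        for _ in range(2000)]
observed = [y for y in data if y > c]
w, mu, sigma = em_censored_gmm(observed, len(data) - len(observed), c,
                               [0.5, 0.5], [-92.0, -78.0], [4.0, 4.0])
print(w, mu, sigma)
```

Ignoring the censored samples would bias the mean of the weaker component upwards, because only its upper tail is observed; including them through the truncated moments is what removes that bias.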
2.4 GMM parameter estimation in the presence of dropped data
The EM algorithm for GMM parameter estimation in the presence of dropped data (EM-D-GMM) [CT2] is developed as follows:
E-step:
[Equations, including (2.31)]
E-step:
[Equations, including (2.54) and (2.56)]
As can be seen in equations (2.53) - (2.56), the collected data, including observable, censored and dropped samples, all contribute to the estimates simultaneously. This means the proposed EM algorithm can deal with all the mentioned phenomena present in the collected data.
2.6 Evaluation of the EM-CD-GMM
In this section, the proposed EM-CD-GMM is evaluated and compared with other EM algorithms using the Kullback-Leibler Divergence (KLD). After 1000 experiments, the mean of the KLD is shown in table 2.1 and the standard deviation of the KLD is shown in table 2.2 (when c = −90 dBm).
Table 2.1 KLD of the EM algorithms after 1000 experiments
As can be seen in table 2.1 and table 2.2:
- When the dropping rate is 0 and c = −96 dBm, the data are almost fully observable. The EM-GMM and the EM-CD-GMM produce the same results. The EM-CD-G has a larger error because this algorithm models the distribution of the data by the Gaussian process.
- For the other cases, the mean and the standard deviation of the KLD of the EM-CD-GMM are always the smallest. Hence, EM-CD-GMM is the most effective algorithm for GMM parameter estimation in the presence of censored and dropped data.
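The KLD between a true GMM and an estimated GMM has no closed form in general, so it is usually approximated numerically. The sketch below uses midpoint-rule integration; it assumes the direction D(true || estimated) and integration bounds wide enough to cover RSSI values, neither of which is stated in the text, and the names `gmm_pdf` and `kld` are hypothetical.

```python
import math

def gmm_pdf(x, w, mu, sigma):
    """Density of a univariate GMM at x."""
    return sum(wj * math.exp(-0.5 * ((x - mj) / sj) ** 2) / (sj * math.sqrt(2 * math.pi))
               for wj, mj, sj in zip(w, mu, sigma))

def kld(true, est, lo=-130.0, hi=-30.0, steps=20000):
    """Approximate D(true || est) by midpoint-rule integration on [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        p = gmm_pdf(x, *true)
        q = gmm_pdf(x, *est)
        if p > 0.0 and q > 0.0:  # skip points where a density underflows to zero
            total += p * math.log(p / q) * dx
    return total

# Sanity check against a case with a known closed form: two Gaussians with the
# same sigma have KLD = (mu1 - mu2)^2 / (2 sigma^2) = 4 / 32 = 0.125.
true_model = ([1.0], [-80.0], [4.0])
est_model = ([1.0], [-78.0], [4.0])
d = kld(true_model, est_model)
print(d)
```

In a Monte Carlo evaluation, this quantity would be computed once per experiment and then averaged, with its standard deviation reported alongside the mean.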
CHAPTER 3 GMM MODEL SELECTION IN THE PRESENCE OF CENSORED AND DROPPED DATA
3.1 Motivation
In complex indoor environments, the collected Wi-Fi RSSIs can be drawn from one or more Gaussian components. If a GMM with J Gaussian components is used, the number of parameters of the GMM is NPs = 3J-1 (J means, J standard deviations and J-1 independent weights). This means that the number of parameters to store in the database and the computational cost of the positioning algorithms grow linearly with the number of Gaussian components used to describe the distribution of Wi-Fi RSSIs. Therefore, a solution is needed to estimate the number of Gaussian components in the GMM, in order to optimize the database and reduce the computational complexity of the positioning algorithm of the IPS.
3.2 Methods for GMM model selection
3.2.1 Penalty Function (PF) based methods
Let $x$ be the mixture and observable data set; $N$ is the number of samples in $x$; $\hat{\Theta}_J$ is the set of parameters of the GMM with $J$ Gaussian components; NPs is the number of parameters of the GMM; $\mathcal{L}(\hat{\Theta}_J \mid x)$ is the likelihood function. The PFs of the Akaike Information Criterion (AIC), AIC3 and the Bayesian Information Criterion (BIC) are defined as follows:
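For reference, the commonly used forms of these criteria are AIC = −2 ln L + 2 NPs, AIC3 = −2 ln L + 3 NPs and BIC = −2 ln L + NPs ln N, and the selected model is the one with the smallest criterion value. The sketch below applies them with NPs = 3J − 1; the log-likelihood values are hypothetical numbers chosen for illustration, not results from the thesis.

```python
import math

def penalized_criteria(log_likelihood, n_components, n_samples):
    """Standard AIC, AIC3 and BIC for a univariate GMM with NPs = 3J - 1."""
    n_params = 3 * n_components - 1
    return {
        "AIC": -2.0 * log_likelihood + 2.0 * n_params,
        "AIC3": -2.0 * log_likelihood + 3.0 * n_params,
        "BIC": -2.0 * log_likelihood + n_params * math.log(n_samples),
    }

# Hypothetical maximized log-likelihoods for J = 1, 2, 3 on N = 500 samples.
fits = {1: -1510.0, 2: -1450.0, 3: -1448.5}
N = 500

best = {}
for name in ("AIC", "AIC3", "BIC"):
    # Pick the number of components with the smallest criterion value.
    best[name] = min(fits, key=lambda J: penalized_criteria(fits[J], J, N)[name])
print(best)
```

In this example the small likelihood gain from J = 2 to J = 3 does not pay for the three extra parameters, so all three criteria prefer J = 2.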
3.2.2 Characteristic Function (CF) based methods
The CF based method uses the convergence of the Sum of Weighted Real parts of all Log-Characteristic Functions (SWRLCF) to determine the number of Gaussian components, as follows: