
Research and development of localization and identification techniques combining visual and WiFi information



MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

THUY PHAM THI THANH

NGHIÊN CỨU VÀ PHÁT TRIỂN CÁC KỸ THUẬT ĐỊNH VỊ VÀ ĐỊNH DANH KẾT HỢP THÔNG TIN HÌNH ẢNH VÀ WIFI

PERSON LOCALIZATION AND IDENTIFICATION BY FUSION OF VISION AND WIFI

DOCTORAL THESIS OF COMPUTER SCIENCE

Hanoi − 2017


CONTENTS

1.1 WiFi-based localization
1.2 Vision-based person localization
1.2.1 Human detection
1.2.1.1 Motion-based detection
1.2.1.2 Classifier-based detection
1.2.2 Human tracking
1.2.3 Human localization
1.3 Person localization based on fusion of WiFi and visual properties
1.4 Vision-based person re-identification
1.5 Conclusion

2 WIFI-BASED PERSON LOCALIZATION
2.1 Framework
2.2 Probabilistic propagation model
2.2.1 Parameter estimation
2.2.2 Reduction of algorithm complexity
2.3 Fingerprinting database and KNN matching
2.4 Experimental results
2.4.1 Testing environment and data collection
2.4.2 Experiments for propagation model
2.4.3 Localization experiments
2.4.3.1 Evaluation metrics
2.4.3.2 Experimental results
2.5 Conclusion

3 VISION-BASED PERSON LOCALIZATION
3.1 Introduction
3.2 Experimental datasets
3.3 Shadow removal
3.3.1 Chromaticity-based feature extraction and shadow-matching score calculation
3.3.2 Shadow-matching score utilizing physical properties
3.3.3 Density-based score fusion scheme
3.3.4 Experimental evaluation
3.4 Human detection
3.4.1 Fusion of background subtraction and HOG-SVM
3.4.2 Experimental evaluation
3.4.2.1 Dataset and evaluation metrics
3.4.2.2 Experimental results
3.5 Person tracking and localization
3.5.1 Kalman filter
3.5.2 Person tracking and data association
3.5.3 Person localization and linking trajectories in camera network
3.5.3.1 Person localization
3.5.3.2 Linking person's trajectories in camera network
3.5.4 Experimental evaluation
3.5.4.1 Initial values
3.5.4.2 Evaluation metrics for person tracking in one camera FOV
3.5.4.3 Experimental results
3.6 Conclusion

4 PERSON IDENTIFICATION AND RE-IDENTIFICATION IN A CAMERA NETWORK
4.1 Face recognition system
4.1.1 Framework
4.1.2 Experimental evaluation
4.1.2.1 Testing scenarios
4.1.2.2 Measurements
4.1.2.3 Testing data and results
4.2 Appearance-based person re-identification
4.2.1 Framework
4.2.2 Improved kernel descriptor for human appearance
4.2.3 Experimental results
4.2.3.1 Testing datasets
4.2.3.2 Results and discussion
4.3 Conclusion

5 FUSION OF WIFI AND CAMERA FOR PERSON LOCALIZATION AND IDENTIFICATION
5.1 Fusion framework and algorithm
5.1.1 Framework
5.1.2 Fusion method
5.1.2.1 Kalman filter
5.1.2.2 Optimal assignment
5.2 Dataset and evaluation
5.2.1 Testing dataset
5.2.2 Experimental results
5.2.2.1 Experimental results on script 1 data
5.2.2.2 Experimental results on script 2 data
5.3 Conclusion


LIST OF ABBREVIATIONS

Abbreviation  Meaning
AHPE  Asymmetry based Histogram Plus Epitome
ANN  Artificial Neural Network
JPDAF  Joint Probability Data Association Filtering
GLOH  Gradient Location and Orientation Histogram
GLONASS  Global Navigation Satellite System
GMOTA  Global Multiple Object Tracking Accuracy
GNSS  Global Navigation Satellite Systems
GPS  Global Positioning System
HOG  Histogram of Oriented Gradient
LBP  Local Binary Pattern
LBPH  Local Binary Pattern Histogram
LDA  Linear Discriminant Analysis
LMNR  Large Margin Nearest Neighbor
MHT  Multiple Hypothesis Tracking
MOTA  Multiple Object Tracking Accuracy
MOTP  Multiple Object Tracking Precision
MSCR  Maximally Stable Colour Regions
NLoS  Non-Line-of-Sight
PCA  Principal Component Analysis
PDF  Probability Distribution Function
PLS  Partial Least Squares
PNG  Portable Network Graphics
PPM  Probabilistic Propagation Model
RBF  Radial Basis Function
RDC  Relative Distance Comparison
Re-ID  Re-Identification
RFID  Radio Frequency Identification
RSS  Received Signal Strength
RSSI  Received Signal Strength Indication
SDALF  Symmetry Driven Accumulation of Local Features
SIFT  Scale Invariant Feature Transform
SKMGM  Spatial Kinetic Mixture of Gaussian Model
SLAM  Simultaneous Localization and Mapping
SMP  Stable Marriage Problem
SPOT  Structure Preserving Object Tracker
STGMM  Spatio Temporal Gaussian Mixture Model
STL  Standard Template Library
SURF  Speeded Up Robust Features
SVR  Support Vector Regression
TAPPMOG  Time Adaptive Per Pixel Mixtures of Gaussians
TDoA  Time Difference of Arrival
TLGMM  Two-Layer Gaussian Mixture Model
WLAN  Wireless Local Area Network
WPS  WiFi Positioning System


LIST OF TABLES

Table 2.1 Genetic algorithm configuration
Table 2.2 Optimized system parameters for the first and the second scenarios of testing environments
Table 2.3 Evaluations for the first scenario with distance and RSSI features
Table 2.4 Localization results for the first scenario using different features of distance and RSSI, without using coefficient λ
Table 2.5 Evaluations for the second scenario with distance and RSSI features
Table 3.1 Performance of human detectors with the HOG-SVM method and the combination of HOG-SVM and Adaptive GMM, with and without shadow removal (SR), on the MICA2 dataset
Table 3.2 Reference points on image and floor plan coordinate systems
Table 3.3 Evaluations for homography transformation
Table 3.4 Testing results of person tracking and localization in Cam1's FOV
Table 3.5 Testing results of person tracking and localization in Cam2's FOV
Table 3.6 Testing results of person tracking and localization in Cam4's FOV
Table 4.1 Comparative face recognition results and time consumption for the gallery and probe sets
Table 4.2 Datasets for person re-identification testing. In the last column, the number of √ signs shows the ranking of intra-class variation of the datasets
Table 4.3 Comparative evaluations of person Re-ID on the HDA dataset
Table 4.4 Testing results on the iLIDS-VID dataset for the proposed method, the original KDES and the method in [157]
Table 4.5 Comparative evaluations at Rank 1 (%) for person Re-ID with different methods and datasets (the sign "×" indicates no information available; for the iLIDS dataset, there are two data settings as described in [19] and in [10])
Table 5.1 Comparative results of the proposed fusion algorithm against the evaluations in Chapter 4 with testing data of script 1
Table 5.2 Experimental results for person tracking by identification and person Re-ID with the second dataset
Table A.1 Technical information of the WiFi-based localization system
Table A.2 Technical information of the vision-based localization system
Table A.3 Technical information of the fusion-based localization system

LIST OF FIGURES

Figure 1 Person surveillance context in indoor environment
Figure 2 Multimodal localization system fusing WiFi signals and images
Figure 3 Surveillance region with WiFi range covering disjoint camera FOVs
Figure 4 Framework for person localization and identification by fusion of WiFi and camera
Figure 1.1 Flowchart of WiFi-based person localization
Figure 1.2 The angle-based positioning technique using AoA
Figure 1.3 The position of a mobile client determined by (a) the intersection of three circles (circular trilateration), with the radius of each being the distance d_i, and (b) the intersection of two hyperbolas (hyperbolic lateration)
Figure 1.4 Framework of person localization system using fixed cameras
Figure 1.5 Human detection in an image [149]
Figure 1.6 Human detection results with shadows (left column) and without shadows (right column) [134]
Figure 1.7 Human tracking and Re-ID results in two different cameras [22]
Figure 1.8 Camera-based localization system in [154]: (a) original frame, (b) foreground segmentation by MOG, (c) extraction of foot region from (b), (d) Gaussian kernel of (c) mapped on the floor plan
Figure 2.1 Diagram of the proposed WiFi-based object localization system
Figure 2.2 An example of a radio map with a set of p_i RPs and the distance values d_i(L) from each RP to L APs
Figure 2.3 WiFi signal attenuation through walls/floors
Figure 2.4 Optimization of system parameters using GAs
Figure 2.5 Weights of different values of θ based on dissimilarity
Figure 2.6 Weights of different values of λ based on dissimilarity
Figure 2.7 Distribution of APs in the first scenario of the testing environment
Figure 2.8 Ground plan of the second floor in the second testing scenario
Figure 2.9 Radio map (a) with (b) 2000 fingerprint locations collected on the 8th floor in the first testing scenario
Figure 2.10 Radio map (a) with (b) 1200 fingerprint locations collected on the 2nd floor in the second testing scenario
Figure 2.11 Deterministic propagation model compared to measurements in the first scenario
Figure 2.12 Deterministic propagation model compared to measurements in the second scenario
Figure 2.13 Probabilistic propagation model for the first scenario
Figure 2.14 Probabilistic propagation model for the second scenario
Figure 2.15 Localization results for the first scenario, with distance and RSSI features
Figure 2.16 Distribution of localization error for distance and RSSI features in the first scenario
Figure 2.17 Localization reliability for distance and RSSI features in the first scenario
Figure 2.18 Localization results for distance and RSSI features, without using coefficient λ
Figure 2.19 Distribution of localization error for distance and RSSI features, without using coefficient λ
Figure 2.20 Localization reliability for distance and RSSI features, without using coefficient λ
Figure 2.21 Localization results for the second scenario, with distance and RSSI features
Figure 2.22 Distribution of localization error for distance and RSSI features in the second scenario
Figure 2.23 Localization reliability for distance and RSSI features in the second scenario
Figure 3.1 Framework of person localization in camera networks
Figure 3.2 Examples of tracking lines formed by linking trajectories of corresponding FootPoint positions
Figure 3.3 Testing environment
Figure 3.4 Examples in the MICA1 dataset. The images on the top are captured from the camera at the check-in region and used for the training phase. The images at the bottom are the testing images acquired from 4 other cameras (Cam1, Cam2, Cam3, Cam4) in the surveillance region
Figure 3.5 Examples of manually-extracted human ROIs from Cam2
Figure 3.6 Examples of manually-cropped human ROIs from Cam1 and Cam4
Figure 3.7 Framework of the proposed shadow removal method
Figure 3.8 Extracting shadow pixels: (a) original frame, (b) foreground mask obtained with adaptive GMM, (c) frame superimposed on the foreground mask, (d) and (e) object and shadow pixels labeled manually from (c), respectively
Figure 3.9 Example using chromaticity-based features for shadow removal: background image at (a) H, (b) S and (c) V channels; (d) ∆(H) and (e) ∆(S) of shadow pixels. This example uses the same frame as in Fig. 3.8
Figure 3.10 Results of the GMM fitting algorithm with K = 3 on the distribution of the chromaticity-based features: (a) original distribution of feature vectors x; (b) isocurves of each mixture component k; (c) density of mixture components
Figure 3.11 Results of the shadow score calculated on an example image: (a) original image; (b) background subtraction result; (c) shadow matching score s1 calculated using the GMM fitting results in Fig. 3.10 with chromaticity features; (d) shadow matching score s2 calculated with physical features
Figure 3.12 Physical shadow model [76], with shadow pixels falling into the gray area. The physics-based feature vector for a shadow pixel p contains two components: α(p) (length of vector SD) and θ(p)
Figure 3.13 (a) Physical shadow model [76], and examples of (b) original image, (c) log-scale of the θ(p) property, (d) log-scale of the α(p) property
Figure 3.14 Illustration of log-likelihood calculated from (a) learning-based score, (b) physics-based score, and (c) score fusion of shadow and non-shadow pixels. (d) Visualization of s = (s1, s2) for shadow pixels (blue dots) and non-shadow pixels (red dots)
Figure 3.15 Illustration of shadow removal results using the score fusion scheme. The first row contains original frames. The second row shows the foreground masks with background subtraction (BGS). The foreground masks with shadow removal are presented in the third row. The fourth row indicates the shadow pixels detected by the proposed method, and the final row is ground truth data
Figure 3.16 Evaluations of shadow removal for the proposed method and other methods in [134]
Figure 3.17 Examples of shadow removal results: original frames in the first row; second row: chromaticity-based method; third row: physical method; fourth row: geometry-based method; fifth and sixth rows: texture-based method; the last row: our method
Figure 3.18 Examples of training data for the HOG descriptor. Positive and negative training images captured by (a) Camera 1 (Cam1), (b) Camera 2 (Cam2) and (c) Camera 3 (Cam3)
Figure 3.19 Examples of (a) false positive (FP) and (b) false negative (FN) in the HOG-SVM detector
Figure 3.20 The Kalman recursive model
Figure 3.21 Example of (a) grid map and (b) a threshold region bounded by a contour line
Figure 3.22 Examples of noise in detection: (a) equal numbers of real targets and detections, but not all detections are true; (b) the number of real targets is larger than the number of detections; (c) the number of real targets is smaller than the number of detections
Figure 3.23 Example of the Hungarian algorithm
Figure 3.24 Camera pinhole model
Figure 3.25 Examples of original frames (top row) and frames with corrected distortion (bottom row)
Figure 3.26 Flowchart for warping camera FOVs
Figure 3.27 Four marked points and the detected points on the floor plan resulting from the inverse transformation of matrix H (top). The bottom images are bird's-eye-view projections of Cam1 and Cam3
Figure 3.28 The matching points on the floor plan between two images captured from Cam1 and Cam3
Figure 3.29 Flowchart of linking user trajectories
Figure 3.30 Image points marked on the frames captured from (a) camera 1 (Cam1), (b) camera 2 (Cam2) and (c) camera 4 (Cam4). These points are used for calculating the camera extrinsic parameters
Figure 3.31 Floor map with a 2D coordinate system
Figure 3.32 Examples of frame sequences in the MICA2 dataset with (a) hallway scenario captured from Cam1, (b) lobby scenario captured from Cam2 and (c) showroom scenario captured from Cam4, used for evaluations of person tracking and localization
Figure 3.33 Examples of person tracking results in Cam1 FOV (hallway scene)
Figure 3.34 Examples of person tracking results in Cam2 FOV (lobby scene)
Figure 3.35 Examples of person tracking results in Cam4 FOV (showroom scene)
Figure 4.1 Framework of human face recognition
Figure 4.2 Face detection result represented by a rectangular region
Figure 4.3 Example of LBP computation [75]
Figure 4.4 LBP images with different gray-scale transformations
Figure 4.5 Face description with LBPH
Figure 4.6 Examples in the training database with (a) face images of 20 subjects, (b) images of one subject
Figure 4.7 Diagram of the vision-based person Re-ID system
Figure 4.8 The basic idea of representation based on kernel methods
Figure 4.9 Illustration of size-adaptive patches (a, c) and size-fixed patches (a, b), as mentioned in [25]
Figure 4.10 Image-level feature vector concatenated from the feature vectors of blocks in the pyramid layers
Figure 4.11 Examples of testing images detected automatically in the MICA2 dataset. The first and second rows contain the human ROIs with and without shadow removal, respectively
Figure 4.12 Results of the proposed method against AHPE [19] and KDES [25] on the CAVIAR4REID dataset
Figure 4.13 Results of the proposed method against AHPE [19], SDALF [54] and KDES [25] on the iLIDS dataset
Figure 4.14 Comparative results with the methods reported in [10] on the iLIDS dataset
Figure 4.15 Testing results on our MICA1 dataset
Figure 4.16 Person Re-ID evaluation of the proposed KDES descriptor against the original KDES and PLS [138] on the ETHZ dataset with (a) Sequence 1, (b) Sequence 2 and (c) Sequence 3
Figure 4.17 Person Re-ID evaluations of the proposed KDES descriptor compared with the original KDES method on the WARD dataset
Figure 4.18 Person Re-ID evaluations of the proposed KDES descriptor compared with the original KDES method on the RAiD dataset
Figure 4.19 Recognition rates of the proposed KDES on the HDA dataset
Figure 4.20 Recognition rates of the proposed KDES on the MICA2 dataset with manually-cropped human ROIs and automatically-detected human ROIs with and without shadow removal
Figure 4.21 Examples of person Re-ID results on the MICA2 dataset. The first column shows frames captured from Cam1 and Cam4. The second column contains human ROIs extracted manually from these frames. The human ROIs detected automatically with and without shadow removal are shown in the third and fourth columns, respectively. The ID labels are put on top of these human ROIs and, at the bottom, the filled circles and squares indicate correct and incorrect person Re-ID results, respectively
Figure 5.1 Different types of sensor combination for person localization: (a) late fusion, (b) early fusion, (c) trigger fusion
Figure 5.2 Framework for person localization and Re-ID using the combined system of WiFi and camera
Figure 5.3 Flowchart of the fusion algorithm
Figure 5.4 A 2D floor map of the testing environment in Figure 5.5, with the routing path of moving people in the testing scenarios
Figure 5.5 Testing environment
Figure 5.6 Visual examples in script 2. The first row contains frames for the scenario of one moving person. The scenarios for two, three and five moving people are shown in the second, third and fourth rows
Figure 5.7 Training examples of manually-extracted human ROIs from Cam2 for person 1 (left) and person 2 (right)
Figure 5.8 Testing examples of manually-extracted human ROIs from Cam1 (left column) and Cam4 (right column) for (a) person 1 and (b) person 2
Figure 5.9 Person Re-ID evaluations on testing data of two moving people


Motivation

Modern technology is changing human life in many ways, notably in the way people interact with technological products. Human-computer interaction is becoming more and more natural and convenient, and this makes our lives more enjoyable and comfortable. A new concept was formed for this revolutionary change: Ambient Intelligence (AmI).

Ambient Intelligence has become an active area with many related projects, research programs and potential applications in different domains, such as home, office, education, entertainment, health care, emergency services, logistics, transportation, security and surveillance [135], [116], [164], [67], [62], [43], [123]. A common viewpoint shared by many authors [3], [61], [14], [40] is that AmI refers to digital environments in which environmental information (temperature, humidity, etc.) and the human presence are perceived automatically by sensors and devices interconnected through networks. Three main features of Perceiving, Modeling and Acting are required for an AmI system [40]. Perceiving is also considered as the problem of context awareness, in which humans and their attributes are the center of perception. Modeling relates to feature extraction and building a discriminative descriptor for each object. Finally, Acting specifies the response of the environment to the people inhabiting it by providing adaptive and user-transparent services.

Although the vision of AmI was introduced more than ten years ago and its research has strengthened and expanded, the development and implementation of real-life applications are still in their infancy. There are many practical challenges that need to be addressed in each of the contributing technological areas or particular applications [2].

In this research, the information of person position and identity is considered in indoor environments. These are two of the most crucial attributes for ambient environments. In order to determine position (where a person is) and identity (who a person is) in indoor environments, the two problems of person localization and identification need to be solved. A wide range of sensors can be used to handle these problems, such as Ultra-Wideband (UWB), ultrasound, Radio-Frequency Identification (RFID), camera, WiFi, etc. [101]. UWB is especially useful for indoor environments where multipath is severe, but it is not widely used because of the requirement for dedicated transmitter and receiver infrastructure. Ultrasound-based systems are able to locate objects within a few centimeters, but remain prohibitively expensive and rely on a large amount of fixed infrastructure. Such infrastructure is not only labor-intensive to install, but also expensive to maintain. RFID allows identifying and wirelessly transmitting the identity of a person or an object via radio waves by a unique RFID tag. The performance of RFID-based localization outperforms other technologies, but the deployment is expensive and the positioning range is limited. The camera has been becoming a dominant technology for person localization and identification thanks to the improvement and miniaturization of actuators (e.g., lasers) and particularly advancement in the technology of detectors (e.g., CCD sensors). However, deployment is limited because of the exorbitant cost of the solution (both in terms of licensing and processing requirements) and the effectiveness of the image processing algorithms themselves in solving real-world dynamic situations. A WiFi positioning system (WPS) is a suitable alternative to GPS and GLONASS in indoor environments, where satellite technology is inadequate due to various causes, including multi-path and signal blockage indoors. Moreover, WiFi positioning takes advantage of the rapid growth of wireless access points in building areas and of wireless-enabled smart mobile devices. However, the positioning accuracy of WiFi-signal-based systems is lower than that of vision-based positioning systems. Indoor environments are particularly challenging for WiFi-based positioning for several reasons: multi-path propagation, Non-Line-of-Sight (NLoS) conditions, high attenuation and signal scattering due to the high density of obstacles, etc.
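Chapter 2 of the thesis builds a probabilistic propagation model for WiFi RSSI. As a minimal, generic illustration of why RSSI can stand in for distance, the standard log-distance path-loss model can be inverted as follows; the reference power and path-loss exponent here are illustrative defaults, not the parameters estimated in the thesis:

```python
def rssi_to_distance(rssi_dbm, rssi_at_1m=-40.0, n=3.0):
    """Invert the log-distance path-loss model
        RSSI(d) = RSSI(1 m) - 10 * n * log10(d),
    where n is the path-loss exponent (roughly 2 in free space,
    larger indoors due to walls and obstacles). Both defaults
    are illustrative values, not those from the thesis."""
    return 10.0 ** ((rssi_at_1m - rssi_dbm) / (10.0 * n))

# A weaker signal maps to a larger estimated distance:
d_ref = rssi_to_distance(-40.0)   # 1.0 m at the reference power
d_far = rssi_to_distance(-70.0)   # 10.0 m: 30 dB weaker with n = 3
```

In practice the thesis combines such a propagation model with fingerprinting and KNN matching rather than relying on a single inversion, precisely because indoor multipath and NLoS conditions make raw RSSI unstable.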

It has by now become apparent that no overall solution based on a single technology is perfect for all applications. Therefore, besides developing optimal algorithms for each technology, fusion of their data is a new trend in solving the problem of person localization and identification in indoor environments [121], [12], [98], [104]. The main purpose of the fusion is to retain the benefits of each individual sensor technology, whilst at the same time mitigating their weaknesses. Motivated by this, our research focuses on person localization and identification by combining WiFi-based and vision-based technologies. This combination offers the following benefits in comparison with each single method:

• A coarse-to-fine localization system can be set up. The coarse level of positioning is established by the WiFi system and, based on this, the fine positioning processes are done at the cameras which are in the range of WiFi-based localization. The coarse-to-fine localization system allows continuously localizing people with a sparse camera network, and offers lower cost for system deployment and computation, with cameras deployed only in the regions which require high positioning accuracy.

• Easy scalability of the coverage area by simply deploying more APs (Access Points) in the environment.

• Richer information for person identification and re-identification (Re-ID). One object can be identified by both WiFi and camera systems.
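The coarse-to-fine idea can be sketched as a simple dispatch loop: WiFi supplies a rough position everywhere, and camera-based refinement runs only when that position falls inside some camera's FOV. All names and the rectangular FOV model below are illustrative assumptions, not the thesis implementation:

```python
from dataclasses import dataclass

@dataclass
class CameraZone:
    """Axis-aligned rectangle approximating one camera's FOV on the floor plan."""
    name: str
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, x, y):
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max

def localize(coarse_fix, zones, refine):
    """coarse_fix: (x, y) from coarse WiFi positioning.
    refine: callable(zone, x, y) -> fine (x, y) from that camera.
    Outside every FOV, the coarse WiFi fix is the best available."""
    x, y = coarse_fix
    for zone in zones:
        if zone.contains(x, y):
            return refine(zone, x, y), zone.name
    return (x, y), None

# Hallway-sized FOV (9 x 5.1 m, matching the thesis constraints).
zones = [CameraZone("Cam1", 0.0, 9.0, 0.0, 5.1)]
pos, cam = localize((4.0, 2.0), zones, lambda z, x, y: (x, y))
# The fix lies in Cam1's FOV, so the camera refinement is used.
```

A fix outside every zone simply keeps the WiFi estimate, which is how a sparse camera network can still provide continuous coverage.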

Objective

The thesis focuses on researching and developing solutions for person localization and identification, considered under the context of automatic person surveillance in indoor environments using WiFi and camera systems. The concrete objectives are:

• Constructing an improved method for WiFi-based localization. The method allows the use of popular WiFi-enabled devices, such as smartphones or tablets, for localization. These kinds of devices are not originally produced for localization, but the RSSI values they scan from nearby APs are popularly used for this purpose. The proposed method can overcome some of the unstable characteristics of RSSI values used for localization in indoor environments. It also grants coarse-level localization in the combined system of WiFi and camera. The performance criterion set up in this thesis for the WiFi-based localization system is under 4 m of error at a reliability of 90 %.

• Building efficient methods for vision-based person localization, including solutions for human detection, tracking and linking a person's trajectories in camera networks. The performance target for human localization is that the positioning error for all matched pairs of person and tracker hypothesis on all frames is under 50 cm.

• Constructing an efficient solution for person Re-ID in camera networks.

• Developing a method for person localization and Re-ID by combining WiFi and camera systems. The method can leverage the advantages of each single technology, such as the high localization accuracy of vision-based systems and the low computational cost and more reliable identity (ID) of targets in WiFi-based localization systems. In the combined localization system, the camera-based positioning performance is preserved, while the performance of person identification and Re-ID in the camera network is improved.

• Setting up a combined system of WiFi and camera under indoor scenarios of an automatic person surveillance system. The proposed methods for person localization and identification are evaluated in this system.

• Building datasets for experimental evaluation of the proposed solutions. To the best of our knowledge, public multi-modal datasets for evaluating combined localization and identification systems of WiFi and camera do not exist.
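The performance criteria above (under 4 m at 90 % reliability for WiFi, under 50 cm for vision) are statements about the error distribution, not its mean. A hedged sketch of how such a criterion can be checked, using synthetic error values:

```python
import math

def error_at_reliability(errors, reliability=0.90):
    """Smallest bound e such that at least `reliability` of the
    per-fix localization errors are <= e (empirical quantile)."""
    ranked = sorted(errors)
    k = math.ceil(reliability * len(ranked)) - 1
    return ranked[k]

# Synthetic per-fix localization errors in metres (illustrative only).
errs = [0.8, 1.2, 1.5, 2.0, 2.3, 2.9, 3.1, 3.4, 3.8, 7.5]
bound = error_at_reliability(errs)   # 3.8: 9 of the 10 fixes are within 3.8 m
meets_wifi_target = bound <= 4.0     # True for this sample
```

Note that the single large outlier (7.5 m) does not break the criterion, which is why a reliability-based target suits noisy RSSI positioning better than a maximum-error bound.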

Context, constraints, and challenges

Context

The combined system of WiFi and camera for person localization and identification is deployed in real-world scenarios for an automated person surveillance system in building environments. In almost all buildings, entrance and exit gates are set up in order to control who comes into or goes out of a building. This context is also considered in our system. Figure 1 shows the context, in which the testing environment is divided into two areas: check-in/check-out and surveillance. The proposed system is implemented in these areas with two main functions. The first function is learning ID cues, which is executed for each individual in the check-in/check-out area. The second function is person localization and Re-ID, which is processed in the surveillance area (see Figure 2).


Figure 1 Person surveillance context in indoor environment


Figure 2 Multimodal localization system fusing WiFi signals and images

Each person is required to hold a WiFi-integrated device, and people come in one by one at the entrance of the check-in/check-out area. At the entrance gate of this region, the person's ID is learned individually from the images captured by cameras and from the MAC address of the WiFi-enabled equipment held by each person. One camera, mounted at the front door of the check-in gate, captures the human face for face recognition. In this case, face recognition is processed in a closed-set manner, which means the faces in the probe set are included in the gallery set. Another camera acquires human images at different poses, and the learning phase of the appearance-based ID is done for each person. In short, in the first region, we get three types of signatures for each person N_i: the face-based identity ID_i^F, the WiFi-based identity ID_i^W and the appearance-based identity ID_i^A. Based on ID_i^F, we know which corresponding ID_i^W is already inside the surveillance region. The corresponding ID_i^A is also assigned to this person. Depending on different circumstances, these ID cues can be used for the purpose of person localization and Re-ID in the surveillance region.

Each person ends his/her route at the exit gate, where he/she is checked out by another camera which captures the human face for face recognition. The checked-out person is then removed from the processing system.

In summary, with the above-mentioned scenarios, in the check-in/check-out area we can:

• Monitor the changes in each individual's appearance (changes in clothing) each time he/she comes into the surveillance regions. This makes appearance-based person descriptors more feasible for person Re-ID.

• Decrease the computing cost of the system and narrow the ID-matching space by eliminating checked-out people from the processing system.

• Map between different ID cues for the same person.
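The mapping between ID cues collected at check-in can be sketched as a simple registry. The class and field names below are illustrative only, not taken from the thesis:

```python
from dataclasses import dataclass

@dataclass
class PersonRecord:
    """Signatures learned for one person at the check-in gate."""
    face_id: str        # ID_i^F from face recognition
    wifi_id: str        # ID_i^W: MAC address of the carried device
    appearance_id: str  # ID_i^A from the appearance-based descriptor

class CheckInRegistry:
    """Keeps only people currently inside the surveillance region."""
    def __init__(self):
        self._by_face = {}

    def check_in(self, face_id, wifi_id, appearance_id):
        self._by_face[face_id] = PersonRecord(face_id, wifi_id, appearance_id)

    def wifi_id_of(self, face_id):
        # Face recognition at the gate tells us which MAC is inside.
        return self._by_face[face_id].wifi_id

    def check_out(self, face_id):
        # Checked-out people are removed, narrowing the matching space.
        self._by_face.pop(face_id, None)

registry = CheckInRegistry()
registry.check_in("face_01", "aa:bb:cc:dd:ee:01", "app_01")
print(registry.wifi_id_of("face_01"))  # aa:bb:cc:dd:ee:01
registry.check_out("face_01")
```

Removing records at check-out is what keeps the ID-matching space small, as described above.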

In the surveillance area, the two problems of person localization and Re-ID are solved simultaneously by combining visual and WiFi information. The surveillance region is set up so that the WiFi range, formed by deploying wireless Access Points (APs), covers all visual ranges (the cameras' FOVs). Figure 3 demonstrates this setting, with two camera ranges covered by the WiFi range.

Constraints

With the above-mentioned context, some constraints are taken into account for a person localization and identification system fusing WiFi and cameras:

• Environment:

– Indoor environment with space constraints:

* A single floor, including scenarios in a hallway, a lobby and a room; the hallway area is 9×5.1 m, the lobby 7.5×1.8 m and the showroom 5.4×5.1 m.

Figure 3 Surveillance region with the WiFi range covering disjoint camera FOVs

* A space in which people are continuously located by the WiFi range and by at least two non-overlapping cameras.

– Furniture and other objects are distributed statically in an office building.

• Illumination conditions: Both natural and artificial lighting sources are considered:

– The natural lighting source changes within a day (morning, noon, afternoon).

– Artificial lighting sources are stable in a room.

• Sensors:

– Vision:

* Stationary RGB cameras capture frames at a normal frame rate (15 to 30 fps) with an image resolution of 640×480 pixels.

* Cameras are deployed in the environment with non-overlapping FOVs.

* Cameras are time-synchronized with Internet time.

– WiFi:

* Wireless APs are deployed so that their wireless ranges cover the whole surveillance region.

* WiFi-enabled devices, such as smartphones or tablets, have their own ID (the MAC address of the WiFi adapter). Each person holding such a device is uniquely assigned the ID of the device.

* WiFi-enabled devices are time-synchronized with Internet time.

• Pedestrian:

– At the same time, there may be more than one pedestrian involved.

– Each person is required to hold a WiFi-enabled device and moves at normal speed (1-1.3 m/s) in the monitoring areas.

Challenges

Person localization and identification in indoor environments by fusion of WiFi and camera systems are very challenging. First, challenges come from the vision-based system, including:

• Illumination conditions: Light variations which occur suddenly can strongly affect the performance of the human detector. For person Re-ID, this issue is critical, especially in the case of non-overlapping cameras. The same person observed by two different cameras under distinct illumination conditions may have different appearances. This decreases the performance of person Re-ID, because most of the proposed methods for person Re-ID rely on human appearance.

• Shadows and reflections: Depending on the illumination conditions, lighting angle and floor/wall smoothness, shadows and reflections can appear in various forms, and they are troublesome for human detection, tracking and localization. Shadows and reflections are difficult to handle. Depending on the features (motion, shape or background) used for human detection or tracking, a shadow on the ground or reflected by walls or windows may behave and appear like the person who casts it. Localization errors may grow if people detection and tracking results are degraded by the shadow phenomenon.

• Occlusions: Occlusions appear when people move close to each other or are hidden by obstacles in the environment. This phenomenon can cause track loss and errors in position-ID assignment. For multi-target tracking, inter-person occlusion is still a challenging problem.

• Person appearance variation: The appearance of a person can be highly influenced by the color of the clothing he/she wears and by the distinct view angles of one camera or of different cameras. The variation in human appearance is challenging for human detection, tracking and Re-ID.

• Crowded scene: The number of persons in the scene is a critical parameter. For human tracking, a high number of persons in the scene has two negative effects: first, the probability of occlusion increases with the number of persons; second, a high number of persons increases the risk of ID permutations and tracking errors, due to the high probability of having close models of tracked persons. For person Re-ID, this issue is also important. A high number of persons in the scene increases the number of matching candidates for each Re-ID query and thereby increases the probability of Re-ID error. A high number of people also increases the probability of having similar visual signatures.

• Multiple cameras: Person localization and identification in multi-camera scenarios are much more challenging than with a single camera. In camera networks, the problem of person Re-ID and of linking trajectories when people move from one camera FOV to another is still an open issue.

• Computational cost and real-time performance: Computational costs include the time and memory used when building and testing a system. Many image-based localization applications require real-time performance, so it is worth studying how these systems address the time-performance issue.

Second, in comparison with vision-based person localization systems, the deployment of WiFi-based systems is easier and the wireless chips are much cheaper than cameras. Their power and computing resource consumption is also significantly lower than that of vision-based localization systems. However, WiFi-based localization techniques come with their own set of challenges, mainly originating from the influence of obstacles on the propagation of radio waves in indoor environments:

• Unpredictability of WiFi signal propagation through indoor environments: The data distribution may vary because of changes in temperature and humidity, as well as the positions of moving obstacles, such as people walking throughout the building. This uncertainty makes it difficult to generate accurate estimates of the signal strength measurements from which positions are calculated.

• Non-Line-of-Sight (NLOS): This refers to a radio frequency (RF) propagation path that is obscured (partially or completely) by obstacles, which makes it difficult for the radio signal to pass through.

Additionally, the quality of WiFi data for localization depends highly on the type, position, orientation, quantity and distribution of the wireless transceivers (APs, mobile phones, tablets, etc.).

Third, some challenges arise from the combination of WiFi and vision-based systems for person localization and identification in indoor environments:

• Different nature of WiFi and visual signals: Combining data collected from sensors as distinct as WiFi and visual sensors is challenging.

• Signal synchronization between the different WiFi and camera sensors: This is a necessary step before testing and evaluating any fusion solution.
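Since WiFi scans and video frames arrive at different rates, a minimal synchronization step is to pair each frame with the WiFi sample closest in time. The function below is an illustrative sketch, not the thesis's implementation; it assumes both clocks are already aligned to Internet time, as the constraints require:

```python
import bisect

def nearest_wifi_sample(frame_ts, wifi_ts):
    """Return the index of the WiFi timestamp closest to a frame timestamp.

    wifi_ts must be sorted in ascending order.
    """
    i = bisect.bisect_left(wifi_ts, frame_ts)
    if i == 0:
        return 0
    if i == len(wifi_ts):
        return len(wifi_ts) - 1
    # Pick whichever neighbor is closer in time.
    return i if wifi_ts[i] - frame_ts < frame_ts - wifi_ts[i - 1] else i - 1

wifi_ts = [0.0, 1.0, 2.0, 3.0]            # WiFi scans at roughly 1 Hz
print(nearest_wifi_sample(1.4, wifi_ts))  # 1  (closest to t=1.0)
print(nearest_wifi_sample(1.6, wifi_ts))  # 2  (closest to t=2.0)
```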

Contributions

In order to achieve the above-mentioned objectives of the research, several contributions are made in this thesis:

• Contribution 1: Proposing an improvement for WiFi-based person localization. In this proposal, an efficient path-loss model is defined with consideration of obstacle constraints in indoor environments. Based on this, we can effectively model the relationship between RSSI and the distance between a mobile user and the APs. A well-known fingerprinting method with a new radio map is defined to make the fingerprint data stable and reliable for localization. To match a query sample against the fingerprint data, the KNN method is used with an additional coefficient reflecting the chronological changes of the fingerprinting data in the environment. The WiFi-based localization results activate the vision-based localization processes at the cameras that are in range of the position returned by the WiFi system.

• Contribution 2: Improving vision-based person localization by proposing efficient shadow removal and human detection methods. For shadow removal, a combination of chromaticity-based and physical features is proposed, and a density-based score fusion scheme is built to integrate the shadow-matching scores achieved by each independent feature. This is a preprocessing step for better human detection, which is based on the fusion of HOG-SVM and adaptive GMM background subtraction techniques. This fusion takes advantage of the high-speed computation of adaptive GMM and the accuracy of HOG-SVM for human detection. Additionally, for the HOG-SVM detector, we build HOG descriptors and train the SVM on our database and the standard INRIA dataset. This helps to improve the performance of human detection by HOG-SVM in the considered environments.

• Contribution 3: An efficient appearance-based human descriptor is proposed for person Re-ID in camera networks. The descriptor is built on each human ROI returned by the human detector. Three different features of gradient, color and shape are extracted from a human ROI at three levels (pixel, patch and whole human ROI), and three match kernel functions are built from these. Fusing these match kernel functions results in a descriptor invariant to the scale and rotation of human images captured from different camera view angles and distances. This is especially helpful for multi-camera surveillance scenarios, which exhibit high intra-class variation.

• Contribution 4: A new fusion method is proposed in a combined WiFi-camera person localization and identification system. By using the state prediction and correction steps of a Kalman filter, together with an optimal assignment, the proposed fusion method maintains the high accuracy of vision-based person localization. In addition, with this fusion, tracking by identification based on ID cues from the WiFi adapter also offers a better solution for person tracking and Re-ID in camera networks.

Apart from the main contributions mentioned above, this thesis also proposes a useful method for linking person trajectories in camera networks. Based on the observation that the cameras view a common floor plane on which the people move, each pair of cameras forms a stereo vision on a single floor plane. By using camera calibration for this stereo vision, the person's trajectories in the images captured by different cameras can be transformed to the corresponding real-world locations on a unique floor map. In addition, a fully-automated person surveillance system for indoor environments is proposed. The system reflects the real surveillance scenarios in most buildings. Towards building such a surveillance system, some experiments are done to show the performance of the other reported methods for human face recognition, person localization, identification and Re-ID in a camera network.

General framework and thesis outline

A combined WiFi-camera system for person localization and identification in the above-mentioned context of an indoor person surveillance system is presented in Figure 4. In each camera FOV, person localization is done through three phases of human detection, tracking and localization, which output the person identity (ID_C) and the corresponding positions (P_C). Because the WiFi range covers the camera FOVs, in each camera FOV the vision-based positioning results are combined with the WiFi-based localization results by a fusion algorithm in order to make effective decisions about the position and identity of each person in the environment. When people switch from one camera FOV to another, they are re-identified to update the ID of each individual trajectory. The trajectories through the cameras are also linked to show the entire route in the environment.

[Figure 4: WiFi signals → WiFi-based Localization, fused with the vision-based results (P_C, ID_C)]

In this thesis, the algorithms for person localization and identification are developed and evaluated in the combined system of WiFi and camera. This thesis is divided into five chapters, with the introduction at the beginning and the conclusion, with future research directions, at the end:

• Introduction: The motivation and objectives of the thesis; the considered context, constraints and challenges that arise when dealing with the problems in the thesis. Additionally, an overview framework, the thesis outline and the contributions of the thesis are also presented in this chapter.

• Chapter 1: The related works on person localization based on WiFi systems, camera systems and their fusion are discussed. In addition, the related issues of person Re-ID in camera networks are also surveyed in this chapter.

• Chapter 2: The details of the proposed algorithm and the experimental evaluations for WiFi-based localization are presented.

• Chapter 3: A vision-based person localization system is proposed with three main phases of human detection, tracking and 3D localization. Improvements are given in each phase in order to enhance the system performance.

• Chapter 4: In real-time scenarios of a multi-camera surveillance system, the problems of person identification based on the human face and of appearance-based person Re-ID are addressed. An efficient human descriptor based on human appearance is applied, with evaluations on person Re-ID in a camera network.

• Chapter 5: A fusion of WiFi and visual signals for person localization, identification, and Re-ID is proposed in this chapter.

• Conclusion and future works: The major findings of the thesis are recapitulated and future directions are proposed for further research and development.


CHAPTER 1

LITERATURE REVIEW

Person localization refers to the process of determining a person's position in the environment. For multi-person localization, the identity (ID) at each person's position needs to be shown correspondingly so that we can separate the trajectory of each individual. This means that in multi-person localization, the two problems of positioning and identification must be solved concurrently.

Depending on the target environment, a positioning system can be categorized as indoor (inside buildings), outdoor (outside buildings) or mixed. Global navigation satellite systems (GNSS), such as GPS, are well-known outdoor localization systems and have been widely used in real life. Unfortunately, GPS is ineffective in indoor environments because of the strong degradation of satellite signals caused by different obstacles inside buildings. Meanwhile, the growing interest in location-based applications and services in indoor environments, such as building automation and control systems, guidance, asset localization, key personnel tracking and LBS (location-based services), has led to increased research on indoor human localization. Several technologies have been proposed for indoor person localization, such as camera, infrared, sound, WLAN/WiFi, RFID, ultra-wideband, inertial navigation, etc. [101]. A number of corresponding solutions for each type of technology have been proposed. In general, they can be categorized into three groups:

• Solutions based on geometry calculations or pattern matching, such as proximity, centroid, scene analysis, angulation, lateration, min-max, hyperbolic, etc. [47]

• Optimization solutions or error-minimization algorithms, such as the method of steepest descent, the Levenberg-Marquardt method and Newton methods, etc. [52], [95]

• State estimation solutions, such as the Kalman filter or particle filter [59], [102], [110]

In the first group, geometry measurements are used to locate people. The measurement errors are not taken into account in these algorithms. In contrast, the second group can solve this problem by minimizing the overall error between the collected data and the location estimate. The final group utilizes a set of states (the current state, or both current and past states). These methods operate by iteratively combining the previous state estimate with the observed measurements. Basically, the current states are predicted by using both a dynamic model of the state process and the previous states.

In choosing the best technology to design an indoor person localization system, a large number of performance-related parameters should be considered, such as accuracy, cost of system deployment and processing, robustness, coverage and scalability. In fact, no single solution fulfills all of these parameters or works well in every scenario [101]. This has led to a new research trend of combining multiple sensors for person localization in indoor environments. In this way, multi-modal positioning systems are proposed to best fit the applications and user requirements.

In this thesis, WiFi and vision-based systems are combined for indoor person localization and identification. This means that both WiFi signal properties and optical features are exploited to locate and identify people in indoor environments. Therefore, in the following sections, the literature review focuses on each single system of WiFi and camera and on their fusion for indoor person localization. Additionally, the problem of person identification and re-identification (Re-ID) in camera networks is also discussed.

1.1 WiFi-based localization

WiFi-based localization is the process of determining a person's physical coordinates using radio signal observations transmitted from Access Points (APs) to a mobile device held by the person. An AP is also called a beacon/base station. An AP periodically transmits a set of beacons that contain transmission information, such as the time stamp, path loss, or supported data rate. A mobile client moving in the propagation range of an AP receives the beacon signals transmitted from this AP. Owing to this characteristic, in WiFi-based localization an AP is also called a transmitter and a mobile client a receiver.

Figure 1.1 Flowchart of WiFi-based person localization

Figure 1.1 shows the flowchart of WiFi-based localization. First, the beacon signal features, such as AoA (Angle of Arrival), ToA (Time of Arrival), TDoA (Time Difference of Arrival) and RSSI (Received Signal Strength Indicator), are used to determine the distances or angles from a mobile device to the surrounding APs. The angle calculation is done with AoA, while ToA, TDoA and RSSI are used in the distance calculation. The distances or angles are then used to calculate the position of the mobile device by two popular techniques: (1) geometry with distances (Lateration/Trilateration/Multilateration) or angles (Angulation/Triangulation) and (2) fingerprinting.

Figure 1.2 The angle-based positioning technique using AoA

Figure 1.2 illustrates an angle-based technique using AoA. A mobile client is located by calculating the angles of incidence of the signals transmitted from neighboring APs to the mobile client. The intersection of two LoBs (Lines of Bearing), each formed by a radial line to an AP, is the estimated position of the mobile client. At least two APs are required for location estimation, but three or more APs can give better positioning accuracy. A well-known implementation of AoA is the VOR aircraft navigation system [114].
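The intersection of two lines of bearing reduces to solving a 2×2 linear system. A numpy sketch, with made-up AP positions and bearings:

```python
import numpy as np

def aoa_fix(p1, theta1, p2, theta2):
    """Intersect two lines of bearing (angles in radians, from the +x axis).

    Each AP at position p_k observes the client along direction
    (cos theta_k, sin theta_k); the fix is where the two rays meet.
    """
    d1 = np.array([np.cos(theta1), np.sin(theta1)])
    d2 = np.array([np.cos(theta2), np.sin(theta2)])
    # Solve p1 + t1*d1 = p2 + t2*d2 for (t1, t2).
    A = np.column_stack([d1, -d2])
    t = np.linalg.solve(A, np.asarray(p2, float) - np.asarray(p1, float))
    return np.asarray(p1, float) + t[0] * d1

# Client at (2, 2): AP1 at the origin sees it at 45 deg, AP2 at (4, 0) at 135 deg.
print(aoa_fix([0, 0], np.pi / 4, [4, 0], 3 * np.pi / 4))  # ≈ [2. 2.]
```

With noisy bearings from three or more APs, the same idea becomes an overdetermined least-squares problem, which is why extra APs improve accuracy.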

The distance-based techniques are illustrated in Figure 1.3. ToA-based localization uses precise measurement of the arrival time of a signal transmitted from a mobile client to the APs. Given the signal velocity, the distances (d_i) between the mobile client and the APs can be calculated. In the TDoA method, the differences in arrival time of the signal traveling from a mobile client to the surrounding APs are used for the distance calculation. The PinPoint system [165] is an example of ToA-based localization. It gives an average accuracy of four to six feet in different environments, allowing PinPoint to support accurate, rapidly deployable localization scenarios in both indoor and outdoor environments. TDoA-based localization is reported with the Cricket location-support system [125] for in-building, mobile, location-dependent applications. The distances between a mobile client and neighboring APs can also be determined by RSSI, which measures the signal strength scanned at the receiver. Based on these distances, the mobile device's location is calculated by circular or hyperbolic lateration/trilateration/multilateration (see examples in Figures 1.3-a and 1.3-b, respectively) [78], [122].
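Circular lateration from range estimates can be linearized by subtracting one circle equation from the others, giving a least-squares problem. A numpy sketch with illustrative AP coordinates:

```python
import numpy as np

def trilaterate(ap_xy, dists):
    """Least-squares position from >= 3 AP positions and range estimates.

    Subtracting the first circle equation from the rest turns
    ||x - a_i||^2 = d_i^2 into a linear system A x = b.
    """
    ap = np.asarray(ap_xy, float)
    d = np.asarray(dists, float)
    A = 2.0 * (ap[1:] - ap[0])
    b = (d[0] ** 2 - d[1:] ** 2
         + np.sum(ap[1:] ** 2, axis=1) - np.sum(ap[0] ** 2))
    return np.linalg.lstsq(A, b, rcond=None)[0]

aps = [(0, 0), (10, 0), (0, 10)]        # hypothetical AP layout (meters)
true_pos = np.array([3.0, 4.0])
d = [np.linalg.norm(true_pos - np.array(a)) for a in aps]
print(trilaterate(aps, d))              # ≈ [3. 4.]
```

With noisy RSSI-derived distances the residuals no longer vanish, and the least-squares solution gives the best fit in the squared-error sense.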

In general, AoA-based localization requires specific antennas, so the hardware cost increases. The ToA method needs exact time synchronization between reference stations and mobile clients, while this is unnecessary for TDoA methods except for the synchronization between reference stations. However, exact synchronization is hard to achieve for WiFi signals in indoor environments. RSSI is used for selecting the most suitable AP for signal transmission. With RSSI-based localization, no additional hardware needs to be deployed and it is easy to execute on mobile devices. Thus, RSSI is the most popular choice for WiFi-based localization. In an ideal environment, the RSSI from a transmitter decreases in inverse proportion to the square of the distance between transmitter and receiver. This means the RSSI value decreases as the distance between a transmitter and a receiver increases. The dependence of RSSI on distance can be used to localize the mobile device efficiently. However, it is not easy to model exactly the relationship between RSSI and distance. Two main approaches have been proposed for modeling the RSSI-distance correlation:

• Path-loss/Radio propagation model: This is a function of the propagation distance and other parameters, such as the terrain profile, carrier frequency or antenna height. The signal attenuation between a transmitting and a receiving antenna is then estimated based on these functions. The positioning accuracy of this approach depends heavily on the path-loss model, so many efforts have been devoted to it. The log-normal model reported in [65] is the simplest path-loss model. It describes the relation between the RSSI measurement z_j and the location x as follows:

z_j = z_j^R − 10 h_j log10(‖x − x_j‖ / d_0) + v_j  (1.1)

where z_j^R is the RSSI level measured at AP_j from a reference point x_j at distance d_0 from AP_j; h_j is the path-loss coefficient and v_j is the measurement noise related to AP_j. This model presents a simple mapping of RSSI to distance and therefore does not fully reflect the complex nature of the indoor environment. Many improved models have been reported since. In [129], an improved model called SGMF+BPWL (Single Gradient Multi-Floor, Building Partitioned) (1.2) is proposed, in which a dynamic wall breakpoint and an exterior wall penetration loss are added to the original SGMF model.

In probabilistic approaches, the path-loss model is adopted with additional assumptions: the measurement noise v_j is Gaussian, and the path-loss model (1.1) yields that the RSSI measurement z_j, conditioned on the path-loss coefficient h_j and the location x, is Gaussian distributed such that:

p(z_j | x, h_j) ∼ N( z_j^R − 10 h_j log10(‖x − x_j‖ / d_0), σ_v² )  (1.3)

where p(z_j | x, h_j) is the probability density function of z_j conditioned on the path-loss coefficient h_j and the location x, and N(µ, σ) denotes a Gaussian distribution with mean µ and variance σ. In addition, the z_j are assumed to be independent conditioned on the location x. The maximum likelihood estimate of the unknown location x is then obtained as follows:

x̂ = argmax_x l(z_1, …, z_N | x, h_1, …, h_N)  (1.4)

with

l(z_1, …, z_N | x, h_1, …, h_N) = − (1 / (2σ_v²)) ∑_{j=1}^{N} ( z_j − z_j^R + 10 h_j log10(‖x − x_j‖ / d_0) )²  (1.5)

In short, mobile user localization based on path-loss models is still an open problem. It is not easy to model the relationship between RSSI and relative distance in indoor environments. Many challenges remain in exploiting this signal feature. The nature of the media space, including the number of APs, the obstacles, and the motion and direction of mobile devices (target nodes), causes not only multi-path and non-line-of-sight (NLOS) signal propagation but also high signal attenuation and scattering. These factors change over time, hence both spatial and temporal challenges must be considered for indoor object localization based on RSSI. In addition, different types of smartphones show different RSSI values at the same measurement position, and the values are not even consistent for a single device at different signal-scanning times [93].
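Under the log-normal model (1.1) with Gaussian noise, the maximum likelihood estimate (1.4) can be approximated by a coarse grid search minimizing the squared residuals of (1.5). The AP positions and parameters below are made up for illustration:

```python
import numpy as np

def predicted_rssi(x, ap, z_ref, h, d0=1.0):
    """Log-normal path-loss model: z = z_ref - 10*h*log10(||x - ap|| / d0)."""
    d = max(np.linalg.norm(np.asarray(x, float) - np.asarray(ap, float)), 1e-6)
    return z_ref - 10.0 * h * np.log10(d / d0)

def ml_locate(z, aps, z_ref, h, extent=10.0, step=0.25):
    """Grid-search the location maximizing the Gaussian likelihood,
    i.e. minimizing the sum of squared RSSI residuals."""
    best, best_cost = (0.0, 0.0), np.inf
    for gx in np.arange(0.0, extent, step):
        for gy in np.arange(0.0, extent, step):
            cost = sum((zj - predicted_rssi((gx, gy), ap, z_ref, h)) ** 2
                       for zj, ap in zip(z, aps))
            if cost < best_cost:
                best, best_cost = (gx, gy), cost
    return (float(best[0]), float(best[1]))

aps = [(0, 0), (10, 0), (0, 10), (10, 10)]
z_ref, h = -40.0, 2.0                 # dBm at d0 = 1 m, path-loss exponent
truth = (3.0, 4.0)
z = [predicted_rssi(truth, ap, z_ref, h) for ap in aps]
print(ml_locate(z, aps, z_ref, h))    # (3.0, 4.0)
```

With noiseless synthetic measurements the grid point at the true position has zero residual; in practice the residual surface is noisy and the grid (or a gradient-based refinement) only approximates the ML solution.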

• Fingerprinting method: This includes two phases: training (building the fingerprint database) and testing. The radio map R (1.6), (1.7) is the core component of the fingerprint database. It is a set of reference points F(p_i) with their known positions p_i and RSSI levels r_i scanned from L nearby APs. A testing sample is mapped to the fingerprinting database to return the corresponding location of the mobile device. A number of matching techniques have been developed, such as probabilistic methods [130], K-nearest neighbor (KNN) [16], artificial neural networks (ANN) [6][152], support vector regression (SVR) [29], etc. Among these, probabilistic methods are the most advanced, as the uncertainty of RSSI is taken into account [84][105][34].

R ≜ {(p_i, F(p_i)) | i = 1, …, N}  (1.6)

with

F(p_i) ≜ [r_i(1), …, r_i(n)] and r_i(t) ≜ [r_i^1(t), …, r_i^L(t)]  (1.7)

Fingerprinting is usually time-consuming and labor-intensive because of the collection of training data. In addition, the nature of the indoor environment, with its furniture distribution, room occupancy and AP locations, can cause RSSI levels to vary at the same scanning point. Adjusting the fingerprinting data to adapt to changes in the environment is also costly. Some works have been reported to alleviate these issues, such as building radio maps in a semi-fingerprinting fashion [173], using ray tracing instead of fingerprinting [88] or applying machine learning algorithms [53]. Generally, fingerprinting methods have higher positioning accuracy than geometry solutions [16][108].

1.2 Vision-based person localization

Vision-based person localization systems are under active development because they do not require special markers to be embedded in the environment. Moreover, vision provides an immensely rich source of data from which positions can be estimated. Vision-based person localization has proven highly successful and popular in computer vision applications, especially for indoor surveillance systems. In these systems, stationary monocular cameras are commonly used, and person localization is done through three main phases: human detection, tracking and localization in the real-world coordinate system (see Figure 1.4).

[Figure 1.4: Human Detection → Human Tracking → Human Localization]

Despite the progress made in recent years, detecting partially occluded people or detecting against dynamic or cluttered backgrounds remains a weakness of current approaches [49].

Figure 1.5 Human detection in an image [149]

There are two main approaches for human detection. The first one detects moving objects and considers them as people; it is called motion-based detection. In the second one, people are detected by applying human classification.

1.2.1.1 Motion-based detection

In the scenarios of human detection with stationary cameras, background subtraction techniques are the most popular choices for motion-based human detection. The advantage of this method is its ability to rapidly detect people in the image [148], [150], which makes it suitable for real-time applications. Some challenges remain for background subtraction. Firstly, the challenge arises from how to build a background model robust to background changes caused by:

• Natural oscillations which appear briefly in the light intensity of image pixels

• Illumination variations

• Backgrounds which contain moving elements (e.g., swaying tree branches)

• Changes in the locations of static objects in the scene

In order to tackle these problems, a number of solutions have been proposed. They can be categorized into Basic Background Modeling [89], [169], Statistical Background Modeling [51] and Fuzzy Background Modeling [50], [140]. All of these approaches follow the steps of background modeling, background initialization, background maintenance and foreground detection, with a choice of feature size (a pixel, a block or a cluster) and feature type (color, edge, stereo, motion or texture features). The most used background model is the pixel-wise MOG (Mixture of Gaussians) [143] and its improvements: GMM (Gaussian Mixture Model), TLGMM (Two-Layer Gaussian Mixture Model), STGMM (Spatio-Temporal Gaussian Mixture Model), SKMGM (Spatial Kinetic Mixture of Gaussians Model) and TAPPMOG (Time-Adaptive, Per-Pixel Mixtures of Gaussians) [28]. These models take spatial or temporal constraints into account when modeling the background, which makes them robust to dynamic background changes.
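The per-pixel adaptive idea behind these models can be illustrated with a single Gaussian per pixel; this is a deliberate simplification of the mixture models cited above, with made-up parameter values:

```python
import numpy as np

class SingleGaussianBackground:
    """Per-pixel running Gaussian background model (one mode, not a full MOG).

    A pixel is foreground when it deviates from the background mean by
    more than k standard deviations; matched pixels update the model.
    """
    def __init__(self, first_frame, alpha=0.05, k=2.5):
        self.mu = first_frame.astype(float)
        self.var = np.full(first_frame.shape, 25.0)  # initial variance guess
        self.alpha, self.k = alpha, k

    def apply(self, frame):
        frame = frame.astype(float)
        diff = frame - self.mu
        fg = diff ** 2 > (self.k ** 2) * self.var
        # Update only background pixels (selective running average).
        a = np.where(fg, 0.0, self.alpha)
        self.mu += a * diff
        self.var = (1 - a) * self.var + a * diff ** 2
        return fg

# Static 4x4 gray background with one bright "person" pixel appearing.
bg = np.full((4, 4), 100, dtype=np.uint8)
model = SingleGaussianBackground(bg)
frame = bg.copy(); frame[1, 2] = 200
mask = model.apply(frame)
print(mask.sum())   # 1: only the changed pixel is foreground
```

A real MOG/GMM keeps several weighted Gaussians per pixel, which is what lets it absorb bimodal backgrounds such as swaying branches.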

Figure 1.6 Human detection results with shadows (images in the left column) and without shadows (images in the right column) [134]

Secondly, the challenge comes from the shadow phenomenon, in which the foreground segmented from the background contains not only the object of interest but also its shadow (as demonstrated in Figure 1.6-a, where the human shadow is included in the foreground, and Figure 1.6-b, without the human shadow). In visual surveillance scenarios, shadow detection and removal in video streams captured from stationary cameras are very active research topics in the fields of computer vision, autonomous vehicles and visual surveillance. A shadow is formed when direct light from a light source is obstructed by an opaque object. Shadows can be regarded as noise for vision-based tasks such as pedestrian detection, tracking and localization. Despite the good performance achieved by some techniques, shadow detection is still an open problem. Variations in object appearance, illumination conditions, occlusion and high computing time all bring challenges.

A large number of algorithms for shadow detection have been proposed in recent years; they can be categorized into feature-based and learning-based methods. For the first category, a recent survey [134] evaluates the performance of techniques which utilize spectral features extracted from RGB images, such as intensity, chromaticity and physical properties, and spatial features consisting of geometry and textures. In this survey, the authors showed that the performance of traditional, simple and fast shadow detection methods is relatively poor (e.g., below 60% for the geometry-based technique). They also suggest that extra features combined from the existing methods are helpful for shadow removal. For the second category, statistical learning-based approaches have been developed to learn and remove shadows [96], [97], [92]. The shadow models are learned to adapt to environment changes, and these approaches have improved performance compared to the feature-based techniques.
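A chromaticity-based shadow test of the kind surveyed in [134] can be sketched as follows: a shadow pixel keeps roughly the background's chromaticity while its brightness drops. The thresholds below are illustrative, not taken from any particular method:

```python
import numpy as np

def shadow_mask(frame, background, beta_low=0.4, beta_high=0.9, tau_c=0.05):
    """Flag pixels whose brightness ratio to the background falls in
    [beta_low, beta_high] while their chromaticity barely changes."""
    f = frame.astype(float) + 1e-6
    b = background.astype(float) + 1e-6
    ratio = f.sum(axis=-1) / b.sum(axis=-1)       # brightness attenuation
    f_chr = f / f.sum(axis=-1, keepdims=True)     # normalized rgb
    b_chr = b / b.sum(axis=-1, keepdims=True)
    chroma_diff = np.abs(f_chr - b_chr).sum(axis=-1)
    return (ratio >= beta_low) & (ratio <= beta_high) & (chroma_diff <= tau_c)

bg = np.full((2, 2, 3), [120, 100, 80], dtype=np.uint8)
frame = bg.copy()
frame[0, 0] = (np.array([120, 100, 80]) * 0.6).astype(np.uint8)  # shadowed
frame[1, 1] = [30, 200, 40]                                      # real object
m = shadow_mask(frame, bg)
print(m[0, 0], m[1, 1])   # True False
```

The shadowed pixel is darker but keeps the background's color ratios, while the real object changes chromaticity and is therefore kept as foreground.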

• Classification: Popular classifiers for human detection include SVM, AdaBoost, MPLBoost, linear SVM and RBF-kernel SVM. Selecting image features together with an effective classifier is crucial. For example, Haar features with AdaBoost, or Haar-like features with a linear SVM, can give better human detection performance. In general, an RBF-kernel SVM used with HOG and Shape Context gives the best performance for human detection [136].

In general, human detection by applying classifiers requires training the models by which human classification is done. This is quite time-consuming, and the performance of the detectors depends strongly on the trained models. Moreover, a sliding-window technique is applied in most of the above methods to find the best match. This results in high processing cost and makes them unsuitable for real-time applications.

or jointly. In the first scenario, with the help of a human detection algorithm, possible human regions in every frame are obtained, and human correspondence across frames is established by human trackers, such as the KLT (Kanade-Lucas-Tomasi) feature tracker [15], SPOT (Structure Preserving Object Tracker) [166], the mean-shift based moving object tracker [38], etc. In the latter scenario, information obtained from previous frames helps to find the target region, and the correspondence is estimated jointly by iteratively updating the object region and its location. Statistical methods like the Kalman filter [66] or the particle filter [137] are widely applied in this scenario. Data association is especially necessary for multi-object tracking because it helps to resolve situations in which objects are close to each other. Several techniques are reported for data association, such as nearest neighbor, LAP (Linear Assignment Problem) [31], SMP (Stable Marriage Problem) [68], and statistical techniques such as JPDAF (Joint Probability Data Association Filtering) [91], [146] and Multiple Hypothesis Tracking (MHT) [127], [24].
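Of the data-association techniques listed above, nearest neighbor is the simplest. A greedy sketch, assuming tracks and detections are 2-D positions and using a gating distance to reject implausible matches:

```python
import math

def nearest_neighbor_associate(tracks, detections, gate=50.0):
    """Greedy nearest-neighbor data association.

    tracks and detections are lists of (x, y) positions. Track/detection
    pairs are processed globally by increasing distance; each track gets
    its closest still-unassigned detection within the gating distance.
    Returns {track_index: detection_index}. LAP or JPDAF handle crowded,
    ambiguous scenes better than this greedy rule.
    """
    assignment = {}
    used = set()
    pairs = sorted(
        (math.dist(t, d), ti, di)
        for ti, t in enumerate(tracks)
        for di, d in enumerate(detections)
    )
    for dist, ti, di in pairs:
        if dist > gate:
            break  # all remaining pairs are farther than the gate
        if ti not in assignment and di not in used:
            assignment[ti] = di
            used.add(di)
    return assignment
```

When two objects come close to each other, several pairs have nearly equal distances and this greedy rule can swap identities; that failure mode is exactly what JPDAF and MHT are designed to mitigate.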

Although many approaches have been proposed in the literature for human tracking, it is still a challenging problem because of the non-rigid structure of moving people, scene illumination, person-to-person or person-to-scene occlusions, real-time processing requirements, etc.

1.2.3 Human localization

Person localization in a real-world coordinate system is the process of finding a person's position in the real-world environment. A large number of surveys have been conducted on vision-based person localization, and different classification criteria are presented. In [27], vision-based localization techniques are categorized into map-based and mapless solutions. SLAM (Simultaneous Localization and Mapping) [32], [77] is the dominant technique for map-building positioning systems. In this approach, a map formed by a sequence of landmarks is used during localization: the observed landmarks are matched with the data in the map to find the corresponding object positions. In contrast, mapless localization is achieved without any prior description of the environment; the localization process is performed based on elements observed in the environment and does not require the creation of a map. Optical flow and feature tracking are popular techniques for this approach [115], [63].
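The map-based matching step can be sketched as a nearest-descriptor lookup. The descriptors and the distance threshold below are illustrative assumptions, not the features of any particular SLAM system:

```python
import math

def localize_from_landmarks(observed, landmark_map, max_dist=0.5):
    """Map-based localization sketch: match each observed landmark
    descriptor to the nearest descriptor stored in the map and return
    the known world positions of the matched landmarks.

    landmark_map is a list of (descriptor, (x, y)) pairs; descriptors
    are plain feature vectors standing in for real visual features.
    Matches farther than max_dist in descriptor space are discarded.
    """
    positions = []
    for desc in observed:
        best = min(landmark_map, key=lambda lm: math.dist(desc, lm[0]))
        if math.dist(desc, best[0]) <= max_dist:
            positions.append(best[1])
    return positions
```

The returned landmark positions would then feed a pose estimator; a real system also maintains and extends the map online, which is the "mapping" half of SLAM.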

Following the approach of [63], most visual localization systems are based on a monocular camera [37], [77], [151] or on binocular (stereo) cameras [77], [144]. With a monocular camera, motion parameters are provided only up to a scale factor, which results from the camera's 3D-to-2D transformation. Indeed, a 2D point in the image plane is the projection of an infinite number of 3D world points. By using a camera calibration method [153] and a homography transform [5], the 3D world points can be calculated from the corresponding 2D image points. With binocular cameras, the 3D coordinates of the world points are obtained using the triangulation technique [70]. In comparison with a monocular camera, stereo cameras have the disadvantage of higher cost, mainly due to the additional software and hardware. In addition, in large-scale environments the targets may be far from the camera when captured, so processing these images does not allow recovering the depth values unless the stereo camera baseline is of a few meters [73].
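The 3D-2D relations above can be made concrete with two small helpers: mapping an image point through a calibrated 3x3 homography, and the triangulation depth formula for a rectified stereo pair. Both are generic textbook formulas, not the specific methods of [153] or [70]:

```python
def apply_homography(H, x, y):
    """Map an image point (x, y) to floor-plane coordinates with a
    3x3 homography H (row-major nested lists), as obtained from
    camera calibration. The division by w is the projective
    normalization that removes the unknown scale factor."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    u = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    v = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return u, v

def stereo_depth(focal_px, baseline_m, disparity_px):
    """Triangulation for a rectified stereo pair: depth = f * B / d.
    Depth precision degrades as disparity approaches zero, which is
    why distant targets require a large baseline."""
    return focal_px * baseline_m / disparity_px
```

With a 700-pixel focal length and a 10 cm baseline, a 35-pixel disparity corresponds to a depth of about 2 m; at long range the disparity shrinks toward the pixel noise level, matching the baseline limitation noted in [73].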

In addition to the above classification criteria, vision-based localization systems can be distinguished by their use of moving or stationary cameras. The first case concerns a camera mounted on a moving object, whose position can be determined from the images captured by this camera [81]. The second case refers to positioning moving objects with fixed cameras [139]. Recently, with the wide use of omnidirectional cameras in many applications, vision-based localization systems can also be sorted by the camera field of view (FOV), giving localization systems based on narrow-FOV or wide-FOV cameras. A majority of the existing vision-based localization systems use 2D images or 3D RGB-D data captured by regular cameras with a narrow FOV. However, additional information captured by omnidirectional cameras can improve the positioning results obtained with narrow-FOV cameras [55].

1.3 Person localization based on fusion of WiFi and visual properties

There have been several attempts to combine camera and WiFi systems for indoor person localization. A multi-modal localization system is reported in [154], using WiFi-based localization and tracking by stationary cameras. The combined system focuses on improving the positioning accuracy and confidence at room level. According to the authors' assessments, camera-based localization achieves higher positioning accuracy than the WiFi-based system. However, blind spots, occlusions and person identification are very challenging for camera systems. WiFi systems give clearer identity information because each mobile device has a unique MAC ID, but the considered targets are required to carry mobile devices during tracking. In this work, the RSSI property and the fingerprinting method are used in the WiFi system to locate mobile targets. In the camera-based system, foreground segmentation is done by the MOG (Mixture of Gaussians) method. The region which contains the person's feet is then extracted from the foreground and projected on the floor plan; Gaussian kernels are used to model the foot region (see Figure 1.8).

Figure 1.8 Camera-based localization system in [154]: (a) original frame, (b) foreground segmentation by MOG, (c) extraction of the foot region from (b), (d) Gaussian kernel of (c) mapped on the floor plan.

Each single localization module is executed depending on the availability of the corresponding sensor information. When both are available, a combined Bayes model with the
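A Bayesian combination of two independent position estimates, as used when both sensors are available, can be sketched with the standard product-of-Gaussians rule. This is a generic 1-D illustration, not the exact model of [154]:

```python
def fuse_gaussian_estimates(mu_cam, var_cam, mu_wifi, var_wifi):
    """Fuse two independent 1-D Gaussian position estimates (one per
    modality) with the product-of-Gaussians rule.

    The fused variance is never larger than either input variance, so
    combining the precise camera estimate with the coarse but
    identity-bearing WiFi estimate can only tighten the posterior.
    Returns (fused_mean, fused_variance).
    """
    var = 1.0 / (1.0 / var_cam + 1.0 / var_wifi)
    mu = var * (mu_cam / var_cam + mu_wifi / var_wifi)
    return mu, var
```

With equal variances the fused mean is the simple average and the variance halves; when one modality is much more certain, the fused estimate is pulled toward it.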

Posted: 08/02/2017, 10:44
