MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY
NGUYEN VAN GIAP
VISION-BASED LOCALIZATION
SOCIALIST REPUBLIC OF VIETNAM
Independence - Freedom - Happiness

CONFIRMATION OF MASTER'S THESIS REVISIONS

Full name of the thesis author: Nguyễn Văn Giáp
Thesis title: Localization using image information (Định vị sử dụng thông tin hình ảnh)
Major: Computer Science
Student ID: CB140975
The author, the scientific supervisor, and the Thesis Examination Committee confirm that the author has revised and supplemented the thesis according to the minutes of the Committee meeting of October 21, 2016, with the following contents:
- Added citations for the figures and added units
- Corrected typographical errors throughout the thesis
- Added a list of symbols with their meanings and units
- Added missing information in the references
- Clarified the reuse of code, specifying which parts and where

Hanoi, November ..., 2016
Supervisor: Dr. Vũ Hải
Thesis author: Nguyễn Văn Giáp
Committee Chair: Prof. Dr. Eric Castelli
Where I have consulted the published work of others, this is always clearly attributed.
Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.
I have acknowledged all main sources of help.
Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.
Signed:
Date:
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY
International Research Institute MICA Computer Vision Department
Master of Science
Vision-Based Localization
by Nguyen Van Giap

Abstract
Nowadays, vision-based localization systems are widely used in public and crowded places. Positioning information extracted from the image streams of surveillance cameras can support monitoring services in different ways: for instance, detecting people who are not allowed to enter a certain place, or linking human trajectories in order to identify abnormal behaviors. These services always require positioning information about the subjects of interest. Meanwhile, other localization techniques are still limited in range and accuracy (e.g., Wi-Fi, RFID) or in environment setup and usability (e.g., Bluetooth, GPS). The vision-based localization technique, particularly in indoor environments, has many advantages: it is scalable, highly accurate, and requires no additional equipment or devices attached to the subjects.
Motivated by the above advantages, this thesis aims to study and propose a high-accuracy vision-based localization system for indoor environments. We also address detailed implementation and development, as well as testing of the proposed techniques. Note that the thesis focuses on humans moving in indoor environments monitored by a surveillance camera network. To achieve a high-accuracy positioning system, the thesis deals with the critical issues of vision-based localization. We observe that no human detector or tracker is perfect, so we use a regression model to eliminate outlier detections. The system thereby improves the detection and tracking results.
Throughout the thesis, we first give a brief overview of vision-based localization. We then present the proposed framework, which includes the following steps: background subtraction for detecting moving subjects; shadow removal techniques for improving the detection result; a linear regression method for eliminating outliers; and finally tracking the object with a Kalman filter. The most important result of the thesis is a demonstration of high accuracy and real-time computation for human positioning in indoor environments. These evaluations were carried out in several indoor environments.
ACKNOWLEDGEMENTS
I am honored to be here for the second time, at one of the finest universities in Vietnam, to write these grateful words to the people who have supported and guided me from the very first moment, when I was an undergraduate student, until now, as I write my master's thesis.

I am grateful to my supervisor, Dr. Vu Hai, whose expertise, understanding, generous guidance, and support made it possible for me to work on research topics that were of great interest to me. It has been a pleasure to work with him.

I would like to give special thanks to Dr. Le Thi Lan, Dr. Tran Thi Thanh Hai, and all the members of the Computer Vision Department, MICA Institute, for their sharp comments and guidance, which helped me greatly in learning how to study and do research the right way, and for the valuable advice and encouragement they gave me during my thesis. In particular, I would like to express my appreciation to Ph.D. student Pham Thi Thanh Thuy, who allowed me to use a valuable database of human tracking in a surveillance camera network. Without her permission, I could not have made extensive evaluations of the proposed method.

Finally, I would especially like to thank my family and friends for the continued love and support they have given me throughout my life, helping me through all the frustration, struggle, and confusion. Thanks for everything that helped me get to this day.
Hanoi, 10/2016
Nguyen Van Giap
CONTENTS
ACKNOWLEDGEMENTS
LIST OF FIGURES
LIST OF TABLES
ABBREVIATIONS
Chapter 1: INTRODUCTION
1.1 Context and Motivation
1.2 Objectives
1.3 Vision-based localization and main contributions
1.4 Scope and limitations of the research
1.5 Thesis Outline
Chapter 2: RELATED WORK
2.1 A brief survey on localization techniques
2.2 A brief survey on vision-based localization systems
Chapter 3: PROPOSED FRAMEWORK
3.1 Formulating the vision-based localization
3.2 Background subtraction
3.3 Post-processing procedure
3.4 Shadow removal techniques
3.4.1 Chromaticity-based feature extraction
3.4.2 Shadow-matching score utilizing physical properties
3.4.3 Existing issues after applying shadow removal
3.5 Localization estimation using regression
3.5.1 Linear regression
3.5.2 Definition of Gaussian processes
3.5.3 Regression model
3.5.4 Estimating height with the regression model
3.5.5 Outlier removal
3.6 Object tracking
Chapter 4: EXPERIMENTAL EVALUATIONS
4.1 Experimental setup
4.2 Evaluation results of the BGS and shadow removal
4.3 Evaluation results of the Gaussian process regression
4.3.1 Evaluation results with GP
4.3.2 Evaluation of suitable methods
4.3.3 Discussion
4.4 The final evaluation results of the proposed system
Chapter 5: CONCLUSION AND FUTURE WORK
5.1 Conclusion
5.2 Future work
PUBLICATION
BIBLIOGRAPHY
APPENDIX
LIST OF FIGURES
Figure 1.1: Indoor localization techniques
Figure 1.2: Surveillance camera network
Figure 1.3: Positioning a human from a video stream
Figure 1.4: Casting shadow problem: (a) original image; (b)-(c) cast shadows; (d) mask of the object; (e) shadow pixels
Figure 1.5: Some examples of the experimental environments: (a) hallway; (b) lobby; and (c) in a room
Figure 2.1: The flow chart of a common vision-based localization technique
Figure 2.2: Kinds of object shadows ([9])
Figure 2.3: An illustration of the human tracking results in [9]
Figure 2.4: Different tracking approaches
Figure 3.1: Foot-point definition
Figure 3.2: Transformation of a 2-D point to a 3-D point in the real world
Figure 3.3: Calibration procedure. Top row: the original images; bottom row: the corner points detected for calculating the homographic matrix
Figure 3.4: A wrong tracked-point detection
Figure 3.5: The general flow chart of the proposed method
Figure 3.6: Result of BGS with an adaptive Gaussian mixture
Figure 3.7: Widespread object shadow: (a) original situation; (b) mask situation
Figure 3.8: An illustration of wrong object detection results due to shadow appearances
Figure 3.9: Noise caused by illumination changes
Figure 3.10: Results of preprocessing
Figure 3.11: Example using chromaticity-based features for shadow removal
Figure 3.12: (a) Physical shadow model and examples of (b) the original image, (c) log-scale of the _(p) property, (d) log-log-scale of the _(p) property
Figure 3.13: Problems of shadow removal
Figure 3.14: Graphical model of the Gaussian regression
Figure 3.15: Position and height of the object
Figure 3.16: Position and height of the object
Figure 3.17: Ground-truth dataset to train the GP model
Figure 3.18: Detected height of the object (H_det)
Figure 3.19: Estimated height of the object (H_est)
Figure 3.20: Object height tracking results with the GP model, scenario 1
Figure 3.21: Object height tracking results, scenario 2
Figure 3.22: Outliers of the tracked object and the outlier removal result
Figure 3.23: Consensus between H_det and H_est
Figure 3.24: Result of Kalman filter using
Figure 3.25: Tracking results with and without processing
Figure 4.1: Testing environment
Figure 4.2: Processing dataset
Figure 4.3: Some frames of scenario 1
Figure 4.4: Some frames of scenario 2
Figure 4.5: Low-quality tracking results
Figure 4.6: Mapping moving-object results with BGS and shadow removal
Figure 4.7: Comparing BGS tracking H_det with H_gt
Figure 4.9: Comparing BGS and shadow removal tracking H_det with H_gt
Figure 4.10: Results with BGS-shadow removal and GP
Figure 4.11: Results with application t
Figure 4.12: Calculation in scenario 2
Figure 4.13: Result with scenario 2
Figure 4.14: Result with scenario 1
Figure 4.15: Values of scenario 1
LIST OF TABLES
Table 2.1: Some localization techniques (adapted from [1])
Table 4.2: Evaluation of tracking results with BGS-shadow removal
Table 4.3: Evaluation of tracking results with the BGS-shadow-GP system
Table 4.4: Testing correlation between position and height of the moving object
Table 4.5: Relation between position and height of the moving object
Table 4.6: ANOVA analysis of the ground-truth dataset
Table 4.7: Coefficients of independent and dependent valuation
Table 4.8: Comparison of t values and gain-loss valuation of positioning tracking
Table 4.9: Evaluation of scenario 1 without outlier removal
Table 4.10: Evaluation of scenario 1 with outlier removal
Table 4.11: Comparison of final results and evaluation
ABBREVIATIONS

RFID Radio Frequency Identification
HSI Hue, Saturation, and Intensity
H_est Estimated height of object
H_det Detected height of object
H_gt Ground truth height of object
Chapter 1: INTRODUCTION

1.1 Context and Motivation
Computer vision is the field of science and technology in which machines gain high-level understanding (that is, the meaning of what they see) using vision/camera sensors.

As a scientific discipline, computer vision is concerned with the theory for building artificial systems that obtain information from a single image or a sequence of images. By analyzing such image data, a wide variety of computer vision (or machine vision) applications have been deployed: navigation for mobile robots and autonomous vehicles, surveillance systems in both public and private environments, and diagnostic assistance using medical imaging, for instance. Deploying a vision machine may involve many related fields, such as artificial intelligence, computer graphics, optimization, and environment modeling. In an intelligent surveillance system, the views from single or multiple surveillance cameras usually underlie a series of vision and pattern recognition techniques: locating humans (detection), extracting motion trajectories (tracking), and human identification (re-identification). Consequently, there are many relevant applications, such as homeland security, crime prevention, traffic control, accident prediction and detection, and monitoring patients, the elderly, and children at home. HCI applications also require position information, for example to determine the occupancy of an area in order to turn off the lighting, or to localize a moving object and trigger a camera to record subsequent events. These applications always require extracting positioning information from the video streams collected by surveillance cameras.
A vision-based localization system (particularly in indoor environments) is one of the important solutions among localization technologies. There are many techniques that could be deployed for indoor localization, such as Bluetooth, Wi-Fi, LIDAR, GPS, and vision/camera techniques, as shown in Figure 1.1. Compared with these, a vision-based localization system is more natural for human beings and provides many extracted informative features at a given time; vision-based localization therefore shows significant advantages. In addition, vision sensors are becoming cheaper and cheaper, while security systems in public areas are becoming more and more important. These trends open more opportunities for developing intelligent vision-based localization technology.
Figure 1.1: Indoor localization techniques
This theme has been a wide and active research topic in the field of computer vision, and many vision-based localization techniques have been studied extensively. In intelligent surveillance systems, vision-based localization consists of many components, each of which critically impacts the whole system's performance. Among these components, detecting and tracking moving humans is the main target of these systems. In this thesis, we propose a vision-based indoor localization service for intelligent buildings to meet the above requirements. Vision-based localization can be performed in two different ways: with fixed camera systems or with mobile camera systems. While mobile camera systems are often carried by mobile robots (or used in navigation services supporting visually impaired people), this thesis focuses on developing localization services for fixed camera systems, as shown in Figure 1.2.
Figure 1.2: Surveillance camera network
The system extracts the human from the video stream and then transforms the 2-D position into 3-D real-world coordinates, as shown in Figure 1.3. However, to extract a correct 2-D position from the video stream, vision-based techniques must contend with many factors, such as lighting conditions, shadows, object occlusion, complicated backgrounds, and so on. To this end, the thesis deals with some critical issues, namely detecting and tracking moving objects with a common surveillance camera, with the objectives described below.
Figure 1.3: Positioning a human from video stream
1.2 Objectives
The thesis aims at researching and developing solutions for person localization in real-time camera-based surveillance systems in indoor environments. Using vision-based techniques for detecting and tracking moving humans still involves several problems: complex backgrounds, noise, occlusions, lighting conditions, cast shadows, and the quality of the image/video. Figure 1.4 shows some of these problems. Most of them degrade our detection and tracking results, since there is no perfect human detector. Therefore, we pay much attention to improving the quality of person localization in indoor environments.
Figure 1.4: Casting shadow problem: (a) original image; (b)-(c) cast shadows; (d) mask of the object; (e) shadow pixels
To this end, our objectives are detailed as below:
- Researching and utilizing techniques for human detection and tracking, such as background subtraction, shadow removal, Kalman filtering, and Gaussian processes, and proposing a suitable framework.
- Training and developing a prediction function with the Gaussian process method. This sub-task develops solutions for common human detectors, which always face issues such as object shadows or outliers among the tracked/detected points.
- Converting 2-D image points to 3-D world points through a calibration procedure.

The proposed techniques are developed in C++ using the OpenCV 2.4.9 library.
1.3 Vision-based localization and main contributions
As shown in Figure 1.3, given an image sequence or a video stream collected from a surveillance camera, we formulate vision-based localization as follows: a foot point p(x, y) extracted from the human on the 2-D image is transformed to a point P(x, y, z) on the 3-D floor in the environment. In this application we set z = 0, based on the assumption that humans stand on the floor and walk on a planar floor. While the transformation from 2-D to 3-D is implemented by a calibration procedure, extracting the 2-D human foot point is the more complicated step. We have faced several problems when localizing objects in practice:
 Shadows of objects
 Noise caused by illumination changes
 Occlusion by other objects
 Background noise from the environment (lighting, shaking branches, etc.)
 Quality of the image/video
To obtain the 2-D point from the image sequence captured by a surveillance camera, we first perform background subtraction to separate the foreground from the background. The foreground results may contain artifacts, so we prune the background subtraction results in order to obtain a precise 2-D position. After applying the post-processing procedure, we feed the 2-D point into a tracking module using a Kalman filter. To eliminate outliers, we additionally apply a learning procedure that estimates the corresponding human height from the detected position; based on the constraint between these two observations, we can eliminate outliers. The main contribution of the thesis is therefore a processing framework that improves the detection and tracking of a moving person in an indoor environment. The experimental results report that the accuracy of the localization increases by nearly 30% compared with results from common approaches.
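The consensus check between the detected height and the regressed height can be sketched as follows. This is a minimal illustration of the idea only: the function names and the relative tolerance are assumptions for the sketch, not the thesis implementation.

```cpp
#include <cmath>
#include <vector>

// Reject a detection when the height measured from the bounding box (hDet)
// disagrees with the height predicted by the regression model at that floor
// position (hEst) by more than a relative tolerance.
bool isOutlier(double hDet, double hEst, double relTol = 0.2) {
    if (hEst <= 0.0) return true;   // no valid prediction: distrust the detection
    return std::fabs(hDet - hEst) / hEst > relTol;
}

// Keep only the indices of detections whose two height observations agree.
std::vector<int> filterDetections(const std::vector<double>& hDet,
                                  const std::vector<double>& hEst) {
    std::vector<int> kept;
    for (std::size_t i = 0; i < hDet.size(); ++i)
        if (!isOutlier(hDet[i], hEst[i])) kept.push_back(static_cast<int>(i));
    return kept;
}
```

A detection whose box is stretched by an attached shadow yields an hDet far above the expected human height at that position and is dropped before the tracker sees it.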
1.4 Scope and limitations on the research
We utilize a fixed camera network: stationary CCD cameras that capture frames at a normal frame rate (from 15 to 30 fps) with an image resolution of 640 × 480 pixels. Because the proposed technique can be deployed similarly for every camera in the network, in most figures and demonstrations we show results from only one fixed camera.
The environments for implementing and evaluating the proposed techniques are limited as below:
 It is an indoor environment with space constraints: a single floor with scenarios in a hallway, a lobby, and a room. The space is one in which a person is continuously observed by a camera.
Furniture and other objects are in an office building
Illumination conditions: Both natural and artificial lighting sources are considered
o Natural lighting source changing within a day (in the morning, noon, afternoon)
o Artificial lighting sources are stable in a room
We show some images collected in the lobby, hallway, and room areas in Figure 1.5.
Figure 1.5: Some examples of the experimental environments: a) hallway; b) lobby; and c) in a room
1.5 Thesis Outline

Chapter 2: Related work
In this chapter, we present state-of-the-art research on object localization based on computer vision techniques. In addition, we report some relevant works that will be deployed and extended in our work.
Chapter 3: Proposed framework
In this chapter, we propose a framework for localization. The framework consists of relevant and improved techniques to achieve high localization accuracy for moving objects in images/video.
Chapter 4: Experimental Evaluations
After proposing the framework, we develop it and evaluate its performance. The experimental results are shown in Chapter 4, where they are also compared with previous results.
Chapter 5: Conclusion and future work
Following the experimental results, we conclude this work and discuss some limitations. We also plan the next research steps to improve the quality of localization with computer vision technology.
Chapter 2: RELATED WORK
Localization systems can be based on non-visual or visual information and need to obtain data from the navigated environment through sensors. In this chapter, we first introduce some non-visual sensor-based localization methods that might be used alongside, or compared with, vision-based solutions. We then focus on vision-based systems using a surveillance camera. Because vision-based localization spreads over several topics of computer vision, we divide it into smaller related research topics: background subtraction, shadow removal, and object tracking.
2.1 A brief survey on localization techniques
There are several methods for implementing an indoor positioning system. The most common include vision, infrared, ultrasound, Wireless Local Area Network (WLAN), RFID, and Bluetooth. Table 2.1 shows the comparative performance of these techniques [1]:
Table 2.1: Some localization techniques (Adapted from [1])
                          Wi-Fi         Bluetooth       UWB             Hybrid S./I. GPS  RFID/NFC   Scanning/QR
Scale                     according to enclosure
Necessary infrastructure  WLAN router   Specific trans. Specific trans. Specific trans.   NFC tag    QR tag
Energy spending           Medium        Medium          Medium          High              Low        Low
Cost                      High          High            High            High              Low        Low
RFID is a mature technique for indoor positioning systems; however, its anti-interference ability is poor, and the technique is more suitable for positioning goods. Bluetooth techniques are less stable in complex environments and easily disturbed by signal noise. Ultrasonic technology uses ultrasonic waves to measure the distance between a fixed station and the mobile target to be localized. These methods need equipment to be set up in the monitored environment, such as the techniques proposed in [2] and [3], which require multiple nodes for locating. Compared with these techniques, although the computational cost of vision-based methods is higher, the popularity of monitoring systems using surveillance cameras opens new opportunities for localization services; localization can be considered an added-value service in the surveillance camera network. Moreover, computational issues have recently been overcome thanks to high-power computing systems.
2.2 A brief survey on vision-based localization systems
For a vision-based system, the related works share a common framework, as shown in Figure 2.1. The framework consists of the following main components: moving object detection, pruning of the results, object tracking, and projecting the object into real-world coordinates.
Figure 2.1 The flow chart of a common vision-based localization technique
Different approaches yield results of different effectiveness. Almost all vision-based localization systems deal with moving-object localization in the first step, which is the main target. To do this, moving objects are extracted using basic algorithms such as background subtraction [4], motion extraction based on frame differencing, optical flow, and so on. However, because the inherent ambiguity of ego-motion (camera motion) makes it difficult to estimate from vision data, and because the scene structure (e.g., depth variations) can be discarded, the quality of the target detection is not guaranteed.
The second step in the common framework aims at pruning the detection results. Target detection suffers from many artifacts, such as gradual and sudden illumination changes (e.g., clouds), camera oscillations, high-frequency background objects such as tree branches or sea waves, blending of moving objects, and noise coming from the camera (especially thermal cameras). Therefore, the detection results must be processed further to handle the noise issues stated above.
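One common (assumed) form of such pruning is removing small connected components from the binary foreground mask, so that isolated noise blobs do not reach the tracker. The helper below is an illustrative sketch over a row-major mask; the names and the area threshold are not from the thesis code.

```cpp
#include <vector>

// Drop connected foreground components (4-connectivity) whose pixel count is
// below minArea. mask holds 0/1 values for a w x h image, row-major.
void pruneSmallBlobs(std::vector<int>& mask, int w, int h, int minArea) {
    std::vector<int> visited(mask.size(), 0);
    for (int start = 0; start < w * h; ++start) {
        if (mask[start] != 1 || visited[start]) continue;
        // Flood fill one component, collecting its pixel indices.
        std::vector<int> stack{start}, comp;
        visited[start] = 1;
        while (!stack.empty()) {
            int p = stack.back(); stack.pop_back();
            comp.push_back(p);
            int x = p % w, y = p / w;
            const int nx[4] = {x - 1, x + 1, x, x};
            const int ny[4] = {y, y, y - 1, y + 1};
            for (int k = 0; k < 4; ++k) {
                if (nx[k] < 0 || nx[k] >= w || ny[k] < 0 || ny[k] >= h) continue;
                int q = ny[k] * w + nx[k];
                if (mask[q] == 1 && !visited[q]) { visited[q] = 1; stack.push_back(q); }
            }
        }
        if (static_cast<int>(comp.size()) < minArea)
            for (int p : comp) mask[p] = 0;   // erase the small blob
    }
}
```

In practice this plays the same role as a morphological opening: a person-sized blob survives, a few flickering pixels do not.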
The third part of vision-based localization is tracking the moving object. Object tracking has a wide variety of applications, such as smart video surveillance, traffic video monitoring, accident prediction and detection, motion-based recognition, human-computer interaction, and human behavior understanding. Object tracking in general is a very challenging task because of complex object shapes, cluttered backgrounds, loss of information caused by projecting the real 3-D world onto a 2-D scene, noise induced by the capturing device, illumination variations, occlusions, shadows, etc. Furthermore, when tracking in a cluttered environment, the moving-object blob may be occluded, so features cannot be extracted from it. In such cases, prediction tools such as the Kalman filter or the particle filter are used to estimate the object location. A lot of work has been carried out on object tracking, and these approaches can be classified as region-based, feature-based, model-based, and hybrid [6].
In this thesis, we focus on studying and improving vision-based localization with a stationary camera. The main targets of vision-based localization are accurately detecting the object and tracking it over time. First, to extract the target object, we utilize background subtraction; then, to prune the detection results, we utilize a shadow removal technique. For tracking the human, we utilize a Kalman filter. The related works for these techniques are presented in the subsections below.
Background Subtraction Techniques:
Background subtraction is a widely used approach for detecting moving objects in videos from static cameras. Many background subtraction techniques have been proposed, such as the running Gaussian average, the temporal median filter, mixtures of Gaussians, and eigenbackgrounds. The rationale of the approach is to detect moving objects from the difference between the current frame and a reference frame, often called the "background image" or "background model". Basically, the background image must be a representation of the scene with no moving objects and must be kept regularly updated so as to adapt to varying lighting conditions and geometry settings. More complex models have extended the concept of "background subtraction" beyond its literal meaning. Several methods for performing background subtraction have been proposed in the recent literature; all of them try to effectively estimate the background model from the temporal sequence of frames. All approaches aim at real-time performance, however, because background subtraction is often the first step of a series of subsequent techniques deployed in different applications.
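The "background image" idea above can be sketched with a per-pixel running average; note this is a deliberate simplification (the thesis itself uses an adaptive Gaussian mixture), and the learning rate and threshold below are illustrative values.

```cpp
#include <cstdlib>
#include <vector>

// Minimal per-pixel running-average background model over grayscale frames
// (pixel values 0..255), stored as flat vectors.
class RunningAverageBG {
public:
    RunningAverageBG(std::size_t nPixels, double alpha = 0.05, int threshold = 30)
        : bg_(nPixels, 0.0), alpha_(alpha), threshold_(threshold), init_(false) {}

    // Returns a foreground mask (1 = moving pixel) and updates the model.
    std::vector<int> apply(const std::vector<int>& frame) {
        std::vector<int> mask(frame.size(), 0);
        if (!init_) {                       // the first frame initializes the model
            for (std::size_t i = 0; i < frame.size(); ++i) bg_[i] = frame[i];
            init_ = true;
            return mask;
        }
        for (std::size_t i = 0; i < frame.size(); ++i) {
            if (std::abs(frame[i] - static_cast<int>(bg_[i])) > threshold_)
                mask[i] = 1;                // differs from the background: foreground
            // Slowly adapt the model to illumination changes.
            bg_[i] = (1.0 - alpha_) * bg_[i] + alpha_ * frame[i];
        }
        return mask;
    }
private:
    std::vector<double> bg_;
    double alpha_;
    int threshold_;
    bool init_;
};
```

The small alpha keeps the model stable against moving people while still absorbing gradual lighting drift, which is exactly the update requirement stated above.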
Human detection technique:
Given the input frames, human detection is executed to determine the image regions that contain the targets. Although the performance of human detectors has improved tremendously in recent years, detecting partially occluded people, or detecting in dynamic or cluttered backgrounds, remains a weakness of current approaches. There are two main approaches to human detection. The first detects moving objects and considers them to be people; it is called motion-based detection. In the second, people are detected by applying a human classifier. When detecting humans with a fixed camera network, background subtraction techniques are the most popular choice for motion-based detection. For the classifier-based approach, the extracted features are modeled through statistical learning; some popular human models are built on image features such as Haar wavelets [8], Haar-like features [9], HOG, and shapelets. However, using a single feature for human detection is not as effective as fusing several of them. For classification, popular classifiers for human detection include SVM, AdaBoost, MPLBoost, linear SVM, and RBF-kernel SVM. Selecting image features together with an effective classifier is crucial; in general, HOG features with AdaBoost or a linear SVM give better human detection performance.
Shadow removal technique:
A shadow is created when direct light from a source of illumination is obstructed, either partially or totally, by an object. An area on which less light energy falls appears as a shadow region, whereas an area receiving more light energy is a non-shadow region [11]. There are two types of shadow, self shadow and cast shadow, as shown in Figure 2.2: a self shadow lies on the object itself, while a cast shadow is projected onto other surfaces [9]. Cast and self shadows have different brightness values. The brightness of all shadows in an image depends on the reflectivity of the surface upon which they are cast, as well as on the illumination from secondary light sources. Self shadows usually have a higher brightness than cast shadows, since they receive more secondary lighting from surrounding illuminated objects.
Figure 2.2 Kinds of object shadows ([9])
As described in [11], shadow removal techniques can be categorized as follows:
- Model-based techniques: These have limited applicability and are applied only to specific problems (e.g., aerial images) and simple objects. They depend on prior information about the illumination conditions and scene geometry, as well as about the object, which turns out to be a major drawback.
- Image-based techniques: These use certain shadow properties in the image, such as color/intensity, shadow structure, and boundaries. If any such information is available, it can be used to improve detection performance.
- Color/spectrum-based shadow detection: The color/spectrum model attempts to describe the color change of a shaded pixel and to find a color feature that is illumination invariant. Shadows are then discriminated from foreground objects by using empirical thresholds in the HSV color space.
- Texture-based shadow detection: The principle behind the textural model is that the texture of foreground objects differs from that of the background, while the texture of a shaded area remains the same as that of the background. Several such techniques have been developed to detect moving cast shadows in normal indoor environments.
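The color/spectrum category above can be illustrated with a classic HSV test: a foreground pixel is labeled shadow when its hue and saturation stay close to the background's while its brightness drops by a bounded ratio. The threshold values here are assumptions for the sketch, not tuned ones.

```cpp
#include <cmath>

struct Hsv { double h, s, v; };   // h in [0, 360), s and v in [0, 1]

// Shadow test on one foreground pixel against the background model pixel.
bool isShadowPixel(const Hsv& fg, const Hsv& bg,
                   double betaLow = 0.4, double betaHigh = 0.9,
                   double tauS = 0.15, double tauH = 50.0) {
    if (bg.v <= 0.0) return false;
    double ratio = fg.v / bg.v;                    // brightness attenuation
    double dh = std::fabs(fg.h - bg.h);
    if (dh > 180.0) dh = 360.0 - dh;               // hue is circular
    return ratio >= betaLow && ratio <= betaHigh   // darker, but not too dark
        && (fg.s - bg.s) <= tauS                   // saturation not increased
        && dh <= tauH;                             // hue roughly preserved
}
```

The lower bound betaLow matters: a genuinely dark object attenuates brightness more than a cast shadow does, so it fails the ratio test and is kept as foreground.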
Human tracking:
Tracking is an important task in computer vision, with many applications in surveillance, scene monitoring, navigation, sports scene analysis, and video database management. The objective of object tracking is to link the tracked object across consecutive frames, as shown in the example in Figure 2.3. Although it has been studied for many years, object tracking remains an open research problem today. The linking can be very difficult when subjects move fast relative to the video frame rate. In addition, tracking can be complicated for several reasons: interference in the image; full or partial occlusion of the object; changing lighting conditions; and complex object shapes. Another situation that increases the complexity of the problem is a person who constantly changes direction. For these cases, tracking systems often use a dynamic model that describes how the object can move, in order to account for the object's various movements.
Figure 2.3: An illustration of the human tracking results in [9]
In the vision community, there are various tracking approaches, as shown in Fig. 2.4.
Figure 2.4: Different tracking approaches
- Point tracking (Fig.2.4a): The detected objects are represented by points, and the tracking of these points is based on the previous object states which can include object positions and motion
- Appearance tracking (Fig. 2.4b): The object appearance can be, for example, a rectangular template or an elliptical shape with an associated RGB color histogram. Objects are tracked by considering the coherence of their appearances in consecutive frames. This motion is usually in the form of a parametric transformation such as a translation, a rotation, or an affinity.
- Silhouette tracking (Fig. 2.4c-d): The tracking is performed by estimating the object region in each frame. Silhouette tracking methods use the information encoded inside the object region. This information can be in the form of appearance density and shape models, which are usually in the form of edge maps. Given the object models, silhouettes are tracked by either shape matching or contour evolution.
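The point-tracking scheme used later in the thesis relies on a Kalman filter; its predict/correct cycle can be sketched with a fixed-gain (alpha-beta) filter, a simplification in which the gains are held constant instead of being derived from covariances. The structure and gain values below are illustrative assumptions.

```cpp
// Constant-velocity point track: predict the point from its last position and
// velocity, then correct with the new detection.
struct Track {
    double x, y;    // position
    double vx, vy;  // velocity (per frame)
};

// Prediction step: advance the state by one frame.
void predict(Track& t) {
    t.x += t.vx;
    t.y += t.vy;
}

// Correction step: blend the measured point (mx, my) into the state.
void correct(Track& t, double mx, double my,
             double alpha = 0.85, double beta = 0.5) {
    double rx = mx - t.x, ry = my - t.y;   // innovation (residual)
    t.x += alpha * rx;  t.y += alpha * ry;
    t.vx += beta * rx;  t.vy += beta * ry;
}
```

When a detection is missing for a frame (e.g., occlusion), only predict() runs, which is exactly how such filters bridge short gaps in the measurements.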
Chapter 3: PROPOSED FRAMEWORK

3.1 Formulating the vision-based localization
We assume that a moving subject is separated from the image sequence/video stream of a surveillance camera. Based on the detection results, the Region-Of-Interest (RoI) and a related point (e.g., FootPoints, human centers) are extracted in each frame, as shown in Fig 3.1. For example, a FootPoint is a 2-D point (x, y) in image coordinates where the human foot touches the floor plane.
Figure 3.1: Foot-point definition
Given a FootPoint P(x, y), the corresponding position in 3-D world coordinates is calculated by a transformation T, as defined below:
Figure 3.2: Transformation of a 2-D image point to a 3-D point in the real world
To identify the homography matrix H, a calibration procedure is set up. The matrix H maps a homogeneous image point x to its corresponding point x' on the world plane:

x' = Hx,  H ∈ R^(3×3)
In this thesis, we do not describe the calibration procedure in detail. The procedure is set up by collecting chessboard images, as shown in Fig 3.3. Because a fixed camera is utilized, the parameters of the transformation can be reused at different times.
Figure 3.3: Calibration procedure. Top row: the original images; Bottom row: the corner points detected for calculating the homography matrix
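For illustration, applying the homography x' = Hx to a FootPoint can be sketched in plain Python as below; the matrix values here are hypothetical (a pure pixel-to-metre scaling), not the calibrated ones:

```python
def apply_homography(H, point):
    """Map an image FootPoint (x, y) to floor-plane coordinates via
    x' = H x in homogeneous form, then normalize by the third component."""
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    wh = H[2][0] * x + H[2][1] * y + H[2][2]
    return (xh / wh, yh / wh)

# Hypothetical homography: a pure scaling from pixels to metres.
H = [[0.01, 0.0, 0.0],
     [0.0, 0.01, 0.0],
     [0.0,  0.0, 1.0]]
print(apply_homography(H, (320, 240)))  # approximately (3.2, 2.4)
```

In practice H would be estimated from the chessboard correspondences (e.g., with a routine such as OpenCV's `findHomography`), and the normalization by the third homogeneous component handles perspective, not just scaling.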
The most important factor affecting the accuracy of the localization service is the point detected from the image sequence. The better the extracted RoIs and points, the more accurate the tracking and localization phases are. An example illustrating the effect of the tracked-point detections is shown in Fig 3.4 below:
Figure 3.4: A wrong tracked-point detection
In this example, the detected FootPoint (based on the red rectangle) is far from the correct one (marked by the yellow box). As a consequence, the corresponding 3-D position in real-world coordinates is wrongly estimated. We know that most detectors are not always perfect. The main reason is that common techniques such as background subtraction, human detection, and tracking always suffer from environmental artifacts (e.g., shadows), object occlusions, or lighting conditions. Therefore, to achieve highly accurate localization results, a detector needs to handle the following problems:
- Shadows of objects
- Noise caused by illumination changes
- Occlusion by other objects
- Background noise from the environment (light, shaking branches, etc.)
Therefore, in this work, we focus on increasing the quality of the foot-point detection through several pruning procedures. The general flow chart of the proposed system is shown in Figure 3.5 below:
Figure 3.5: The general flow chart of the proposed method
To solve this, the proposed technique has one major difference from Fig 3.5: the input of the object tracking procedure is pruned by a regression procedure. We eliminate outlier detection results based on a correlation evaluation, which infers how different the detection results are from the estimation results. A correlation that is too low means the detection results are not confident. Otherwise, a detection with high correlation is preserved, because such a position is a consensus of both observations: one from the detection results and one from the estimation results. The inliers are passed to an object tracking module. Because only inliers are utilized, the object tracking module can be a simple procedure. We show the effectiveness of the proposed techniques in the experimental results.
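A minimal sketch of this pruning idea, using a constant-velocity extrapolation as a stand-in for the regression/correlation evaluation (the actual procedure and threshold in this thesis may differ):

```python
def prune_outliers(track, detection, max_residual=30.0):
    """Accept a new FootPoint detection only if it agrees with a
    constant-velocity estimate extrapolated from the last two track points.

    track: list of (x, y) positions from previous frames (len >= 2).
    Returns True (inlier) or False (outlier to be discarded).
    """
    (x1, y1), (x2, y2) = track[-2], track[-1]
    pred = (2 * x2 - x1, 2 * y2 - y1)          # linear extrapolation
    dx = detection[0] - pred[0]
    dy = detection[1] - pred[1]
    return (dx * dx + dy * dy) ** 0.5 <= max_residual

track = [(100, 100), (110, 102)]
print(prune_outliers(track, (121, 104)))   # True: close to the prediction (120, 104)
print(prune_outliers(track, (300, 250)))   # False: likely a shadow/noise detection
```

Only detections that pass this check are fed to the tracking module, which is why a simple tracker suffices downstream.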
3.2 Background subtraction
In the project environment, the camera network is usually installed permanently, and environmental monitoring is limited to the building lobby and hallway. These are fixed environments, which minimizes environmental change. Some situations could still change, such as a door opening or objects changing position (pots, fire extinguishers, etc.). Therefore, our approach to detecting moving objects is based on background subtraction. The solution is to model the background and then compare this model with the current frame in order to extract the moving foreground. Each incoming video frame is compared with a reference background pattern (or patterns); the pixels in the current frame that deviate significantly from the background are considered to belong to moving objects. Several different background subtraction algorithms have been studied [10]. To obtain a trade-off between computational time and performance, we utilize the Adaptive Gaussian Mixture Model (GMM) algorithm for the background subtraction procedure. This technique was proposed by Stauffer and Grimson in 1999. The key idea is the observation that a single background model cannot handle continuous frames over a long time, due to lighting changes, repeated motions, and clutter in the actual scene. Instead, a mixture of Gaussians is used to model each pixel. Following that argument, we implement and integrate this approach into the surveillance system. Figure 3.6 illustrates some background subtraction results using the Adaptive GMM.
(a) Result with Cam1
(b) Result with Cam2
Figure 3.6: Results of BGS with the adaptive Gaussian Mixture Model
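To illustrate the per-pixel modeling idea, the sketch below uses a single Gaussian per pixel instead of the full adaptive mixture of Stauffer and Grimson; it is a simplified toy, not the implementation used in this work (which relies on the adaptive GMM):

```python
def update_background(mean, var, frame, alpha=0.05, k=2.5):
    """Simplified per-pixel background model in the spirit of the
    adaptive GMM (one Gaussian per pixel instead of a mixture).

    mean, var, frame: equal-length lists of grayscale values.
    Returns (new_mean, new_var, mask) where mask[i] = 1 marks foreground.
    """
    new_mean, new_var, mask = [], [], []
    for m, v, x in zip(mean, var, frame):
        if abs(x - m) <= k * (v ** 0.5):        # pixel matches the model
            mask.append(0)
            new_mean.append((1 - alpha) * m + alpha * x)
            new_var.append((1 - alpha) * v + alpha * (x - m) ** 2)
        else:                                   # foreground pixel
            mask.append(1)
            new_mean.append(m)                  # keep the background unchanged
            new_var.append(v)
    return new_mean, new_var, mask

mean = [100.0, 100.0, 100.0]
var = [16.0, 16.0, 16.0]
frame = [102.0, 99.0, 200.0]                    # third pixel: moving object
_, _, mask = update_background(mean, var, frame)
print(mask)  # [0, 0, 1]
```

The full GMM keeps several weighted Gaussians per pixel so that repetitive background motion (e.g., shaking branches) is also absorbed into the model; in practice one would use a library routine such as OpenCV's `BackgroundSubtractorMOG2`.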
Background subtraction is a basic and simple method to detect moving objects. Consequently, there are some critical issues with the BGS results: object shadows remain, and there is a lot of small noise. Figure 3.7 and Figure 3.8 show some examples. In particular, shadows of moving objects, which spread over surfaces where the light is obscured (corners, the junction of two surfaces), are often larger than the object itself. These problems directly affect the quality of moving-object detection and tracking. Therefore, we suggest solutions to these problems in the following steps: removing noise and removing the shadows of the objects.
Figure 3.7: Widespread object shadow: a) Original situation; b) Mask situation
a) Appearance of a large shadow b) Wrong object detection result
Figure 3.8: An illustration of the wrong object detection results due to shadow appearances
3.3 Post-processing procedure
Illumination change is the cause of many noise artifacts. We apply a series of morphological operators and filtering techniques to remove them. The proposed procedures are explained below:
a1) Original image (t) b1) Original image (t+i)
Figure 3.9: Noise caused by illumination changes
Rescaling and thresholding to remove the small noise/artifacts:
o Downscale and upscale;
o Selection of a suitable threshold.
Applying median filters:
o Applying filters and rescaling to eliminate noise on the original mask image.
Finding the largest blob.
Results of the post-processing are shown in Fig 3.10.
Figure 3.10: Results of the post-processing procedure
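The final step above, keeping only the largest blob, can be sketched as follows (a toy pure-Python version on a binary mask; a real implementation would use a library routine such as connected-components labeling):

```python
from collections import deque

def largest_blob(mask):
    """Keep only the largest 4-connected blob of 1-pixels in a binary
    mask (list of lists), mimicking the last post-processing step."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    best_label, best_size, next_label = 0, 0, 1
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] == 1 and labels[sy][sx] == 0:
                # Flood-fill this blob and measure its size.
                size, q = 0, deque([(sy, sx)])
                labels[sy][sx] = next_label
                while q:
                    y, x = q.popleft()
                    size += 1
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and \
                           mask[ny][nx] == 1 and labels[ny][nx] == 0:
                            labels[ny][nx] = next_label
                            q.append((ny, nx))
                if size > best_size:
                    best_size, best_label = size, next_label
                next_label += 1
    return [[1 if best_label and labels[y][x] == best_label else 0
             for x in range(w)] for y in range(h)]

mask = [[1, 0, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 1]]
print(largest_blob(mask))
# The isolated top-left pixel (noise) is removed; the 2x2 blob remains.
```

Keeping the single largest component is a reasonable heuristic here because each camera view is assumed to contain one dominant moving subject.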
However, after applying the post-processing procedure, some problems remain: object shadows still exist, and the object blobs do not fit the objects tightly, giving low-accuracy detection. In the next step, we therefore prune the detection result by applying a shadow removal technique.
3.4 Shadow removal techniques
In this work, we utilize a density-based score fusion scheme with a feature-based approach to remove shadow regions. This technique was proposed in a related work [26]. To make the thesis more self-contained, we explain the shadow removal technique as follows. In [26], two different types of features are extracted from the examined shadow region: chromaticity-based and physical features. Two likelihoods, or shadow-matching scores, are calculated from the corresponding features, and a likelihood ratio (shadow score per non-shadow score) is computed. The probabilities of shadow and non-shadow are estimated by approximating the distributions of the shadow-matching scores using GMMs.
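A simplified sketch of this score-fusion idea: here each class-conditional density is a single Gaussian with hypothetical parameters, whereas [26] fits full GMMs to the shadow-matching scores:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def shadow_likelihood_ratio(chrom_score, phys_score, shadow_params, nonshadow_params):
    """Likelihood ratio L = p(scores | shadow) / p(scores | non-shadow),
    with each class-conditional density approximated here by a single
    Gaussian per score (simplification of the GMMs fitted in [26]).
    A region is labelled shadow when L > 1."""
    p_sh = (gaussian_pdf(chrom_score, *shadow_params[0]) *
            gaussian_pdf(phys_score, *shadow_params[1]))
    p_ns = (gaussian_pdf(chrom_score, *nonshadow_params[0]) *
            gaussian_pdf(phys_score, *nonshadow_params[1]))
    return p_sh / p_ns

# Hypothetical (mu, sigma) pairs for the two scores under each class.
shadow_params = [(0.8, 0.1), (0.7, 0.15)]
nonshadow_params = [(0.3, 0.2), (0.2, 0.2)]
L = shadow_likelihood_ratio(0.75, 0.65, shadow_params, nonshadow_params)
print(L > 1)  # scores near the shadow means -> labelled shadow
```

The two scores are treated as independent in this sketch; the actual densities and the decision threshold would come from training data as in [26].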
3.4.1 Chromaticity-based feature extraction
Chromaticity features have been chosen in various shadow detection techniques, such as [20], because shadow regions tend to have lower light intensity than their neighboring areas. Therefore, we first convert the input RGB image into the HSV color space, which separates the chromaticity and luminosity channels. In the HSV color space, a shadow cast on the background does not change its hue (H), and shadow pixels often have lower saturation (S). Following this observation, we then calculate the difference of hue and saturation between the foreground (F) and background (B) at a shadow pixel p, as below: