MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

THI HUONG GIANG DOAN

DYNAMIC HAND GESTURE RECOGNITION USING RGB-D IMAGES FOR HUMAN-MACHINE INTERACTION

DOCTORAL THESIS OF CONTROL ENGINEERING AND AUTOMATION

Hanoi − 2017
MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

THI HUONG GIANG DOAN

DYNAMIC HAND GESTURE RECOGNITION USING RGB-D IMAGES FOR HUMAN-MACHINE INTERACTION

Specialty: Control Engineering and Automation
Specialty Code: 62520216

DOCTORAL THESIS OF CONTROL ENGINEERING AND AUTOMATION

SUPERVISORS:
1. Dr. Hai Vu
2. Dr. Thanh Hai Tran

Hanoi − 2017
DECLARATION OF AUTHORSHIP

I, Thi Huong Giang Doan, declare that the thesis titled "Dynamic Hand Gesture Recognition Using RGB-D Images for Human-Machine Interaction" and the work presented in it are my own. I confirm that:

- This work was done wholly or mainly while in candidature for a Ph.D. research degree at Hanoi University of Science and Technology.
- Where any part of this thesis has previously been submitted for a degree or any other qualification at Hanoi University of Science and Technology or any other institution, this has been clearly stated.
- Where I have consulted the published work of others, this is always clearly attributed.
- Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.
- I have acknowledged all main sources of help.
- Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.

Hanoi, December 2017

PhD STUDENT
Thi Huong Giang DOAN

SUPERVISORS
ACKNOWLEDGEMENTS

This thesis was written during my doctoral study at the International Research Institute Multimedia, Information, Communication and Applications (MICA), Hanoi University of Science and Technology (HUST). It is my great pleasure to thank all the people who supported me in completing this work.

First, I would like to express my sincere gratitude to my advisors Dr. Hai Vu and Dr. Thi Thanh Hai Tran for their continuous support of my Ph.D. study and related research, and for their patience, motivation, and immense knowledge. Their guidance helped me throughout the research and writing of this thesis. I could not have imagined having better advisors and mentors for my Ph.D. study.

Besides my advisors, I would like to thank the scientists and authors of the published works cited in this thesis, whose works provided valuable information resources for my research. Attending scientific conferences has always been a great experience for me and a source of many useful comments.

In the process of implementing and completing my research, I have received much support from the board of MICA directors. My sincere thanks go to Prof. Yen Ngoc Pham, Prof. Eric Castelli, and Dr. Son Viet Nguyen, who provided me with the opportunity to join research work at the MICA institute and who gave me access to the laboratory and research facilities. Without their precious support, it would have been impossible to conduct this research.

As a Ph.D. student of the 911 programme, I would like to thank the 911 programme for its financial support during my Ph.D. course. I also gratefully acknowledge the financial support for publishing papers and conference fees from research projects T2014-100, T2016-PC-189, and T2016-LN-27. I would like to thank my colleagues at the Computer Vision Department and Multi-Lab of the MICA institute over the years, both at work and outside of work.

Special thanks to my family. Words cannot express how grateful I am to my mother and father for all of the sacrifices they have made on my behalf. I would also like to thank my beloved husband. Thank you for supporting me in everything.

Hanoi, December 2017
Ph.D. Student
Thi Huong Giang DOAN
TABLE OF CONTENTS

1.1 Completed hand gesture recognition systems for controlling home appliances 8
1.1.1 GUI device dependent systems 8
1.1.2 GUI device independent systems 14
1.2 Hand detection and segmentation 18
1.2.1 Color 19
1.2.2 Shape 20
1.2.3 Motion 21
1.2.4 Depth 21
1.2.5 Discussions 23
1.3 Hand gesture spotting system 24
1.3.1 Model-based approaches 25
1.3.2 Feature-based approaches 27
1.3.3 Discussions 29
1.4 Dynamic hand gesture recognition 29
1.4.1 HMM-based approach 30
1.4.2 DTW-based approach 31
1.4.3 SVM-based approach 33
1.4.4 Deep learning-based approach 34
1.4.5 Conclusion 35
1.5 Discussion and Conclusion 35
2.1 Defining dynamic hand gestures 37
2.2 The existing dynamic hand gesture datasets 38
2.2.1 The published dynamic hand gesture datasets 38
2.2.1.1 The RGB hand gesture datasets 38
2.2.1.2 The Depth hand gesture datasets 40
2.2.1.3 The RGB and Depth hand gesture datasets 41
2.2.2 The non-published hand gesture datasets 44
2.2.3 Conclusion 46
2.3 Definition of the closed-form pattern of gestures and phasing issues 47
2.3.1 Conducting commands of a dynamic hand gesture set 47
2.3.2 Definition of the closed-form pattern of gestures and phasing issues 48
2.3.3 Characteristics of the dynamic hand gesture set 50
2.4 Data collection 51
2.4.1 MICA1 dataset 51
2.4.2 MICA2 dataset 52
2.4.3 MICA3 dataset 53
2.4.4 MICA4 dataset 54
2.5 Discussion and Conclusion 55
3 HAND DETECTION AND GESTURE SPOTTING WITH USER-GUIDE SCHEME 56
3.1 Introduction 56
3.2 Heuristic user-guide scheme 58
3.2.1 Assumptions 58
3.2.2 Proposed framework 58
3.2.3 Estimating heuristic parameters 60
3.2.3.1 Estimating parameters of background model for body detection 60
3.2.3.2 Estimating the distance from hand to the Kinect sensor for extracting hand candidates 62
3.2.3.3 Estimating skin color parameters for pruning hand regions 63
3.2.4 Hand detection phase using heuristic parameters 65
3.2.4.1 Hand detection 65
3.2.4.2 Hand posture recognition 66
3.3 Dynamic hand gesture spotting 66
3.3.1 Catching buffer 66
3.3.2 Spotting dynamic hand gesture 67
3.4 Experimental results 71
3.4.1 The required learning time for end-users 71
3.4.2 The computational time for hand segmentation and recognition 73
3.4.3 Performance of the hand region segmentations 75
3.4.3.1 Evaluate the hand segmentation 75
3.4.3.2 Compare the hand posture recognition results 75
3.4.4 Performance of the gesture spotting algorithm 76
3.5 Discussion and Conclusion 78
3.5.1 Discussions 78
3.5.2 Conclusions 78
4 DYNAMIC HAND GESTURE REPRESENTATION AND RECOGNITION USING SPATIAL-TEMPORAL FEATURES 79
4.1 Introduction 79
4.2 Proposed framework 80
4.2.1 Hand representation from spatial and temporal features 81
4.2.1.1 Temporal features extraction 81
4.2.1.2 Spatial features extraction using linear reduction space 83
4.2.1.3 Spatial features extraction using non-linear reduction space 84
4.2.2 DTW-based phase synchronization and KNN-based classification 86
4.2.2.1 Dynamic Time Warping for phase synchronization 86
4.2.2.2 Dynamic hand gesture recognition using K-NN method 88
4.2.3 Interpolation-based synchronization and SVM classification 89
4.2.3.1 Dynamic hand gesture representation 89
4.2.3.2 Quasi-periodic dynamic hand gesture pattern 91
4.2.3.3 Phase synchronization using hand posture interpolation 94
4.2.3.4 Dynamic hand gesture recognition using different classifiers 96
4.3 Experimental results 97
4.3.1 Influence of temporal resolution on recognition accuracy 97
4.3.2 Tuning kernel scale parameters of the RBF-SVM classifier 98
4.3.3 Performance evaluation of the proposed method 99
4.3.4 Impacts of the phase normalization 100
4.3.5 Further evaluations on public datasets 101
4.4 Discussion and Conclusion 103
4.4.1 Discussion 103
4.4.2 Conclusion 103
5 CONTROLLING HOME APPLIANCES USING DYNAMIC HAND GESTURES 105
5.1 Introduction 105
5.2 Deployment of control systems using hand gestures 105
5.2.1 Assignment of hand gestures to commands 105
5.2.2 Different modes of operations carried out by hand gestures 107
5.2.2.1 Different states of lamp and their transitions 107
5.2.2.2 Different states of fan and their transitions 108
5.2.3 Implementation of the control system 108
5.2.3.1 Main components of the control system using hand gestures 108
5.2.3.2 Integration of hand gesture recognition modules 109
5.3 Experiments of control systems using hand gestures 115
5.3.1 Environment and material setup 115
5.3.2 Pre-built script 116
5.3.3 Experimental results 117
5.3.3.1 Evaluation of hand gesture recognition 118
5.3.3.2 Evaluation of time costs 119
5.3.4 Evaluation of usability 120
5.4 Discussion and Conclusion 121
5.4.1 Discussions 121
5.4.2 Conclusion 122
LIST OF ABBREVIATIONS

No. Abbreviation Meaning
1 ANN Artificial Neural Network
10 CNN Convolutional Neural Network
11 CPU Central Processing Unit
12 CRFs Conditional Random Fields
13 CSI Channel State Information
15 DDNN Deep Dynamic Neural Networks
18 DTM Dense Trajectories Motion
29 GUI Graphical User Interface
31 HCRFs Hidden Conditional Random Fields
32 HNN Hopfield Neural Network
34 HOG Histogram of Oriented Gradients
45 LLE Locally Linear Embedding
48 MFC Microsoft Foundation Classes
49 MSC Mean Shift Clustering
53 PCA Principal Component Analysis
54 PDF Probability Distribution Function
55 PNG Portable Network Graphics
56 QCIF Quarter Common Intermediate Format
59 RBF Radial Basis Function
62 RGB-D Red Green Blue Depth
63 RMSE Root Mean Square Error
66 SIFT Scale Invariant Feature Transform
69 STF Spatial Temporal Feature
LIST OF TABLES

Table 1.1 Soft remote control system and commands assignment 12
Table 1.2 Omron TV command assignment 15
Table 1.3 Hand gestures utilized for different devices using Wisee technique 16
Table 1.4 Hand gestures utilized for different devices using MR technique 17
Table 1.5 The existing in-air gesture-based systems 18
Table 1.6 The existing vision-based dynamic hand gesture methods 36
Table 2.1 The existing hand gesture datasets 46
Table 2.2 The main commands of some smart home electrical appliances 48
Table 2.3 Notations used in this research 50
Table 2.4 Characteristics of the defined databases 55
Table 3.1 The required time to learn parameters of the background model 72
Table 3.2 The required time to learn parameters of the hand-skin color model 73
Table 3.3 The required time to learn the hand-to-Kinect distance 73
Table 3.4 The required time for hand segmentation 74
Table 3.5 The required time for hand posture recognition 74
Table 3.6 Results of the JI indexes without/with learning scheme 75
Table 4.1 Recall rate (%) of the proposed method on our datasets with different classifiers 100
Table 4.2 Performance of the proposed method on three different datasets 103
Table 5.1 Assignment of hand gestures to commands for controlling lamp and fan 107
Table 5.2 Confusion matrix of dynamic hand gesture recognition 118
Table 5.3 Accuracy rate (%) of dynamic hand gesture commands 118
Table 5.4 Assessment of end-users on the defined dataset 120
LIST OF FIGURES

Figure 1 Home appliances in smart homes 3
Figure 2 Controlling home appliances using dynamic hand gestures in a smart house 3
Figure 3 The proposed framework of the dynamic hand gesture recognition for controlling home appliances 6
Figure 1.1 Mitsubishi hand gesture-based TV [46] 9
Figure 1.2 Samsung-Smart-TV using hand gestures 10
Figure 1.3 Dynamic hand gestures used for Samsung-Smart-TV 10
Figure 1.4 Hand gesture commands in Soft Remote Control System [39] 11
Figure 1.5 General framework of the Soft Remote Control System [39] 11
Figure 1.6 Hand gesture-based home appliances system [143] 12
Figure 1.7 TV controlling with GUI of dynamic gesture recognition [151] 13
Figure 1.8 Commands of GUI of dynamic gesture recognition [103] 13
Figure 1.9 Features of the Omron dataset [3] 14
Figure 1.10 Wi-Fi signals to control home appliances using hand gestures [119] 15
Figure 1.11 Seven hand gestures for wireless-based interaction [9] (Wisee dataset) 16
Figure 1.12 Simulation of using MR to control some home appliances [62] 17
Figure 1.13 AirTouch-based control uses depth cue [33] 18
Figure 1.14 Depth threshold cues and face skin [97] 22
Figure 1.15 Depth threshold and skeleton [60] 23
Figure 1.16 The process of detecting hand region [69] 23
Figure 1.17 Spotting dynamic hand gestures system using HMM model [71] 25
Figure 1.18 Threshold using HMM model for different gestures [71] 26
Figure 1.19 CRFs-based spotting method using threshold [142] 26
Figure 1.20 Designed gesture in proposed method [13] 28
Figure 1.21 Two gesture boundaries are spotted [65] 29
Figure 1.22 Gesture recognition using HMM [42] 31
Figure 1.23 Gesture features extraction [8] 33
Figure 2.1 Periodic image sequences appear in many common actions 38
Figure 2.2 Four hand gestures of [83] 39
Figure 2.3 Cambridge hand gesture dataset of [67] 39
Figure 2.4 Five hand gestures of [82] 40
Figure 2.5 Twelve dynamic hand gestures of the MSRGesture3D dataset [1] 41
Figure 2.6 Dynamic hand gestures of [88] 42
Figure 2.7 Gestures of NATOPS dataset [140] 43
Figure 2.8 Dynamic hand gestures of SKIG dataset [76] 44
Figure 2.9 Gestures in ChaLearn dataset 44
Figure 2.10 Dynamic hand gestures of [93] 44
Figure 2.11 Dynamic hand gestures of the NVIDIA dataset [87] 45
Figure 2.12 Dynamic hand gestures of PowerGesture dataset [71] 45
Figure 2.13 Hand shape variations and hand trajectories (lower panel) of the proposed gesture set (5 gestures) 48
Figure 2.14 In each row, changes of the hand shape during a gesture performing. From left to right, hand shapes of the completed gesture change in a cyclical pattern (closed-opened-closed) 49
Figure 2.15 Comparing the similarity between the closed-form gestures and a simple sinusoidal signal 51
Figure 2.16 Close cyclical hand gesture pattern and cycle signal 51
Figure 2.17 The environment setup for the MICA1 dataset 52
Figure 2.18 The environment setup for the MICA2 dataset 52
Figure 2.19 The environment setup for the MICA3 dataset 53
Figure 2.20 The environment setup for the MICA4 dataset 54
Figure 3.1 Diagram of the proposed hand gesture spotting system 57
Figure 3.2 Diagram of the proposed hand detection and segmentation system 59
Figure 3.3 The Venn diagram representing the relationship between the pixel sets I, D, Bd, Hd, S and H∗ 60
Figure 3.4 Results of hand region detection 61
Figure 3.5 Result of the learning distance parameter. (a-c) Three consecutive frames; (d) Result of subtracting the first two frames; (e) Result of subtracting the next two frames; (f) Binary thresholding operator; (g) A range of hand (left) and of body (right) on the depth histogram 63
Figure 3.6 The training skin color model 63
Figure 3.7 Result of the training skin color model 64
Figure 3.8 Results of the hand segmentation. (a) A candidate of hand; (b) Mahalanobis distance; (c) Refining the segmentation results using RGB features 66
Figure 3.9 Catching buffer to store continuous hand frames 67
Figure 3.10 The area cues of the hand regions 68
Figure 3.11 The velocity cues of the hand regions 68
Figure 3.12 The combination of area and velocity signals of the hands 69
Figure 3.13 The finding of local peaks from the original area signal of the hands 70
Figure 3.14 Log activities of an evaluator who follows stages of the user-guide scheme and represents seven hand postures for preparing the posture dataset 72
Figure 3.15 Seven types of postures recognized in the proposed system. (a) The first row: original images with results of the hand detections (in red boxes); (b) The second row: zoom-in version of the hand regions without segmentation; (c) The third row: the corresponding segmented hands 73
Figure 3.16 Results of the kernel-based descriptors for hand posture recognition without/with segmentation 76
Figure 3.17 Performances of the dynamic gesture spotting on two datasets MICA1 and MICA2 77
Figure 3.18 An illustration of the gesture spotting errors 77
Figure 4.1 The comparison framework of hand gesture recognition 81
Figure 4.2 Optical flow and trajectory of the go-right hand gesture 83
Figure 4.3 An illustration of the Go-left hand gesture before and after projecting into the constructed PCA space 84
Figure 4.4 3D manifold of hand postures belonging to five gesture classes 86
Figure 4.5 An illustration of the DTW results of two hand gestures (T, P). (a)-(b) Alignments between postures in T and P in the image space and the spatial-temporal space; (c)-(d) The refined alignments after removing repetitive ones 87
Figure 4.6 Distribution of dynamic hand gestures in the low-dimension space 89
Figure 4.7 Five dynamic hand gestures in the 3D dimension 90
Figure 4.8 Definition of a quasi-periodic image sequence 91
Figure 4.9 Illustrations of the phase variations 92
Figure 4.10 Definition of a quasi-periodic image sequence in the phase domain 92
Figure 4.11 Manifold representation of the cyclical Next hand gesture 93
Figure 4.12 Phase synchronization 94
Figure 4.13 Whole-length sequence synchronized with the most different phase 95
Figure 4.14 Whole-length sequence synchronized with the most similar phase 95
Figure 4.15 (a, c) Original hand gestures; (b, d) corresponding interpolated hand gestures 96
Figure 4.16 ROC curves of hand gesture recognition results with SVM classifier 97
Figure 4.17 The dynamic hand gesture recognition results with different kernel scales of the SVM 98
Figure 4.18 The comparison of combined characteristics (KLT and ISOMAP) of dynamic hand gestures 99
Figure 4.19 Performance comparisons with different techniques 101
Figure 4.20 Comparison results between the proposed method and others at thirteen positions 101
Figure 4.21 Dynamic hand gestures in the sub-NVIDIA dataset 102
Figure 4.22 Confusion matrices with MSRGesture3D and Sub-NVIDIA datasets 103
Figure 5.1 Illustration of light controlling using dynamic hand gestures with different levels of intensity of the lamp 106
Figure 5.2 Illustration of ten modes of fan controlled by dynamic hand gestures 106
Figure 5.3 The state diagram of the proposed lighting control system 107
Figure 5.4 The state diagram of the proposed fan control system 108
Figure 5.5 A schematic representation of basic components in hand gesture-based control system 109
Figure 5.6 Integration of hand gesture recognition modules 109
Figure 5.7 The proposed framework for training phase 110
Figure 5.8 The proposed flow chart for the online dynamic hand gesture recognition 111
Figure 5.9 The proposed flow chart for controlling lamp 113
Figure 5.10 The proposed flow chart for controlling fan 114
Figure 5.11 Setup for evaluating the control systems 115
Figure 5.12 Illustration of environment and material setup 117
Figure 5.13 The time-line of the proposed evaluation system 119
Figure 5.14 The time cost for the proposed dynamic hand gesture recognition system 120
Figure 5.15 Usability evaluation of the proposed system 120
INTRODUCTION

Motivation

Home-automation products have been widely used in smart homes (or smart spaces) thanks to recent advances in intelligent computing, smart devices, and new communication protocols. In terms of automating ability, most advanced technologies focus on either saving energy or facilitating control via a user interface (e.g., remote controllers [92], mobile phones [7], tablets [52], voice recognition [11]). To maximize usability, a human-computer interaction method must allow end-users to use it easily and to perform the conventional operations naturally. Motivated by such advantages, this thesis pursues a unified solution to deploy a complete hand gesture-based control system for home appliances. A natural and friendly interaction is deployed in order to replace the conventional remote controller.
A complete gesture-based controlling application requires both robustness and low computational time. However, these requirements face many technical challenges, such as a huge computational cost and the complexity of hand movements, and previous solutions focus on only one of the problems in this field. To solve these issues, two trends in the literature are investigated: one common trend is based on aiding devices, and the other focuses on improving the relevant algorithms and paradigms. The first group addresses the critical issues by using supportive devices such as a data-glove [85, 75], hand markers [111], or contact sensors mounted on the hand or palm of end-users when they control home appliances. Obviously, these solutions are expensive or inconvenient for the end-users. For the second trend, hand gesture recognition has been widely attempted by researchers in the communities of computer vision, robotics, and automation control; however, how to achieve robustness and low computational time still remains an open question. In this thesis, the main motivation is to pursue a set of "suggestive" hand gestures, with the argument that the characteristics of hand gestures are important cues in the context of deploying a complete hand gesture-based system.
On the other hand, new and low-cost depth sensors have recently been widely applied in the fields of robotics and automation control. These devices open new opportunities for addressing the critical issues of gesture recognition schemes. This work attempts to benefit from the Kinect sensor [2], which provides both RGB and depth features; utilizing such valuable features offers an efficient and robust solution for addressing the challenges.
The thesis aims to achieve a robust, real-time hand gesture recognition system. As a feasible solution, the proposed method should be natural and friendly for end-users. A real application is deployed for automatically controlling a fan and/or a bulb/lamp using hand gestures, these being common electrical home appliances. Without loss of generality, the proposed technique can be extended from this specific case to general home automation control systems. To this end, the concrete objectives are:
- Defining a unique set of dynamic hand gestures. This gesture set conveys commands that are available in common home electronic appliances such as televisions, fans, lamps, doors, air-conditioners, and so on. Moreover, the proposed gesture set is designed with unique characteristics. These characteristics are important cues and offer promising solutions to address the challenges of a dynamic hand gesture recognition system.
- Real-time spotting of dynamic hand gestures from the input video stream. The proposed spotting technique consists of relevant solutions for hand detection and hand segmentation from consecutive RGB-D images. In the view of a complete system, the spotting technique is considered a preprocessing procedure.
- Performance of a dynamic hand gesture recognition method depends on the gesture's representation and matching phases. This work aims to extract and represent both spatial and temporal features of the gestures. Moreover, the thesis intends to match phases of the gallery and probe sequences using a phase synchronization scheme. The proposed phase synchronization aims to handle variations of gesture speeds and acquisition frame rates. In the experiments, the proposed method is evaluated with various positions, directions, and distances from the human to the Kinect sensor.
- A proposed framework to control home appliances (such as lamp/fan) is deployed. A full hand gesture-based system is built in an indoor scenario (a smart-room). The prototypes of the proposed system for controlling fans and lamps are shown in Fig. 5.1 and Fig. 5.2, respectively. Evaluations of usability with the proposed datasets and experimental evaluations are reported. Datasets are also shared with the community for further evaluations.
Context, constraints, and challenges

Figure 2 shows the context in which an end-user controls home electronic appliances in a living room environment. Nowadays, there are many methods to control home appliances (as illustrated in Fig. 1 (a)). The main difference from existing ones is that the proposed hand gesture recognition system aims to convey the commands of home appliance equipment naturally and conveniently, without any requirement of a remote control, as illustrated in Fig. 1 (b).

Figure 1 Home appliances in smart homes
Figure 2 Controlling home appliances using dynamic hand gestures in a smart house
The proposed system operates with a Kinect sensor. This device is mounted at a fixed position to obtain good system performance as well as to make end-users feel comfortable. To deploy a real application of home appliance control using dynamic hand gestures, the thesis adopts the following constraints for studying dynamic hand gesture recognition (a minimal acquisition sketch follows this list):

- The Kinect sensor:
  – The Kinect sensor is immobile while end-users perform interactions.
  – The Kinect sensor captures RGB and Depth images at a normal frame rate (from 10 to 30 fps) with an image resolution of 640×480 pixels for both image types.
  – The visible area is the area in front of the Kinect sensor in which every object can be viewed by the sensor (limited not only by the distance from the objects to the camera, from 0.8 m to 4 m, but also covered by an angle of 30° around the center axis of the Kinect sensor).
- Furniture and other objects are distributed uniformly in a square room.
- At any instant, it is assumed that only one end-user controls a home appliance using dynamic hand gestures of his/her right hand. If there is more than one subject in the room, the person nearest to the Kinect sensor is considered.
- When an end-user wants to control an electronic appliance, he/she should stand in front of the Kinect sensor, within its visible area, raise one hand toward the Kinect sensor, and perform gestures that have been designed previously.
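As a concrete illustration of the acquisition constraints above, the following sketch (not from the thesis; it assumes a driver that delivers the Kinect depth map as a 16-bit image in millimetres) gates a depth frame to the sensor's reliable working range and localizes the nearest subject:

    import numpy as np

    # Hypothetical 480x640 depth frame in millimetres, e.g. as delivered
    # by an OpenNI-based Kinect v1 driver; 0 marks an invalid reading.
    depth_mm = np.zeros((480, 640), dtype=np.uint16)

    # Keep only pixels inside the reliable working range stated above
    # (0.8 m to 4 m in front of the sensor).
    in_range = (depth_mm >= 800) & (depth_mm <= 4000)

    # Per the single-user constraint, the person nearest to the sensor
    # is considered: the closest valid reading localizes that subject.
    if in_range.any():
        subject_depth_mm = int(depth_mm[in_range].min())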
The above-mentioned scenarios and constraints are intended to cope with the following issues:

- Changing illumination: while the natural lighting source changes within a day (morning, noon, afternoon), artificial lighting sources in a smart-room condition could also vary.
- Complex background: in a practical environment, the background of the scene is complex, with many types of furniture. The background could contain objects whose color appearance is similar to human skin color. Moreover, some objects may appear at the same distance to the Kinect sensor as the hand. Therefore, the task of clearly detecting and segmenting the hand from the background meets challenges.
- Computational time: this consists of the costs of training the end-users and of processing the relevant procedures of a complete system. The proposed gesture-based application requires real-time performance, so it is worth studying and proposing reasonable solutions to address this issue.
- Representing dynamic hand gestures: the gestures consist of non-rigid hand shapes in continuous image sequences. Therefore, to obtain good recognition performance, the gesture's representation should adapt to the variation of hand shape along the temporal dimension.
- Variations of gestures: the end-users (subjects) perform dynamic hand gestures with artifacts such as different speeds/velocities, captured frame rate changes, and various lengths of hand trajectories. Therefore, the proposed dynamic hand gesture system must be designed to adapt to such variations. The thesis mainly addresses these issues with a new phase synchronization technique.
Main contributions

- Contribution 1: A new set of dynamic hand gestures is proposed with specific characteristics that are useful and supportive for deploying a robust hand gesture recognition system. A number of datasets are captured with a large number of end-users. The datasets consist of both RGB and Depth images and are published for the research community studying dynamic hand gestures. In addition, these datasets are used to evaluate the performance of the proposed algorithms.
- Contribution 2: An efficient user-guide scheme is proposed to learn the heuristic parameters, with a trade-off between a real-time system and a user-independent system. This scheme helps to obtain both real-time hand detection and good hand segmentation performance. Then, an efficient gesture spotting method is proposed that utilizes the features extracted from continuously segmented hand regions.
- Contribution 3: An efficient representation for dynamic hand gestures is proposed which combines spatial-temporal features. By using the most significant dimensions of the non-linear reduced space (ISOMAP technique), the spatial features are extracted for dynamic hand gesture representation. The trajectories of hand movements are extracted using the KLT technique. This representation is especially helpful for discriminating the different types of gestures. In addition, to resolve the gesture variation issues, a new phase synchronization is proposed: using a proposed interpolation method in the spatial-temporal space, a new sequence with a pre-determined length is created (see the sketch after this list).
- Contribution 4: A complete system is deployed to control light and fan in a smart-room environment. The system utilizes the proposed algorithms and achieves both high accuracy and real-time performance. In addition, it has undergone evaluations by a large number of end-users in different contexts, such as Techmart Exhibitions (Sept. 2015 and 2016) and technical demonstration sessions (Celebration of HUST's 60th Anniversary, Oct. 2016).
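To make the phase-synchronization idea of Contribution 3 concrete, the sketch below resamples a variable-length gesture sequence to a pre-determined length by linear interpolation. This is a simplified stand-in: the thesis interpolates hand postures in the reduced spatial-temporal (ISOMAP) space, and the target length here is an arbitrary illustrative choice, but the fixed-length resampling principle is the same.

    import numpy as np

    def resample_to_fixed_length(seq, target_len=32):
        """Linearly resample a (T, D) feature sequence to target_len
        frames, so gestures performed at different speeds or captured
        at different frame rates become directly comparable."""
        seq = np.asarray(seq, dtype=float)
        src = np.linspace(0.0, 1.0, num=seq.shape[0])
        dst = np.linspace(0.0, 1.0, num=target_len)
        # Interpolate each feature dimension over a common phase axis.
        return np.stack([np.interp(dst, src, seq[:, d])
                         for d in range(seq.shape[1])], axis=1)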
General framework and thesis outline
[Figure 3 block labels: 5 types of controls; defining dataset; setup of hand gestures DB; hand detection & segmentation; spotting dynamic hand gesture (real-time dynamic hand gesture spotting); spotted dynamic hand gesture; dynamic hand gesture representation; phase synchronization; hand gesture classifier (robust dynamic hand gesture recognition); control of home appliances in a natural way and real environment (application system)]
Figure 3 The proposed framework of the dynamic hand gesture recognition for controlling home appliances
This thesis proposes a unified solution for dynamic hand gesture recognition. The proposed framework consists of three main phases, as illustrated in Fig. 3: (1) hand detection and segmentation from a video stream; (2) spotting dynamic hand gestures; and (3) the recognition schemes.
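A minimal skeleton of this three-phase pipeline is sketched below; the stage names are hypothetical placeholders for the modules developed in Chapters 3 and 4, not an implementation taken from the thesis.

    def recognize_stream(frames, detect_hand, spot_gestures, classify):
        """Compose the three phases of Fig. 3 over an RGB-D frame stream.

        detect_hand(frame)   -> segmented hand region or None   (phase 1)
        spot_gestures(hands) -> iterator of gesture sequences   (phase 2)
        classify(sequence)   -> command label                   (phase 3)
        """
        hands = (detect_hand(f) for f in frames)
        for gesture in spot_gestures(h for h in hands if h is not None):
            yield classify(gesture)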
Utilizing this framework, a real application is also deployed. The application is evaluated in different contexts such as lab-based environments and demonstrations at exhibitions and Tech-mart events. In particular, the research works in this thesis are divided into five chapters as follows:
- Introduction: This chapter describes the main motivations and objectives of the study. It also presents the research context, constraints, and challenges; these factors arise when addressing the relevant problems in the thesis. Additionally, the general proposed framework and the main contributions are presented.
- Chapter 1: This chapter mainly surveys existing complete hand gesture-based control systems. In particular, the related techniques for home appliance electronic equipment are discussed. A series of related techniques consisting of hand detection, segmentation, and recognition is surveyed in this chapter.
- Chapter 2: In this chapter, existing datasets of dynamic hand gestures are first described. Then, the common commands of home appliances are examined. Based on these studies, a new set of hand gestures is proposed. This new set consists of gestures with cyclical hand patterns. Instances of the proposed gesture set are collected in different settings, such as exhibitions and lab-based environments, for further work.
- Chapter 3: This chapter proposes a learning scheme to learn heuristic parameters for hand detection and segmentation. Utilizing the results of the learning scheme, hand detection and segmentation obtain not only robust, real-time operation but also good performance. Given the segmented hands of the continuous sequence, a proposed method for spotting gestures is also presented.
- Chapter 4: This chapter describes the proposed algorithms and experimental evaluations for the dynamic hand gesture recognition system. An efficient representation of the hand gestures based on spatial-temporal features is proposed. To solve critical issues of gesture variations, a phase synchronization enhancing the system's performance is presented. The proposed algorithms are evaluated on several datasets (both collected and public datasets).
- Chapter 5: By utilizing the proposed framework, a complete system is deployed to control lamps/bulbs and fans in indoor environments. A number of volunteers/end-users are invited to interact with the proposed system. The computational costs and end-users' feedback are reported. The application shows the feasibility of the proposed method for deploying a real application.
- Conclusion and Future Works: Conclusions of the work and relevant discussions on the limitations of the proposed method are given in this chapter. Further research directions are proposed for future work.
CHAPTER 1
LITERATURE REVIEW

This chapter presents surveys on the related works of hand gesture-based systems and dynamic hand gesture recognition methods. First, the relevant hand gesture-based applications are presented in Sec. 1.1. Vision-based hand gesture recognition techniques generally consist of three main steps: hand detection and segmentation, dynamic hand gesture spotting, and dynamic hand gesture recognition. Therefore, state-of-the-art works related to these problems are presented in Sec. 1.2, Sec. 1.3, and Sec. 1.4, respectively.
1.1 Completed hand gesture recognition systems for controlling home appliances

Nowadays, dynamic hand gesture-based controlling systems remain an active topic in the field of computer vision because of their wide range of practical applications: sign language, lie detection, games, e-learning, human-robot interaction, and so on. From the viewpoint of specific applications in smart homes which utilize computer vision-based techniques, readers can refer to the surveys [46, 4, 3, 36, 106, 53, 109]. Hand gesture-based home automation has been applied to many types of equipment such as TVs, air-conditioners, fans, lights, and doors. In this section, completed systems that control home appliances through hand gesture commands are briefly surveyed, since such devices are representative recent multimedia devices applying hand motion techniques. Hand gesture-based controlling systems are divided into two categories: GUI device dependent systems and GUI device independent systems. The first category is presented in Sec. 1.1.1, and the latter in Sec. 1.1.2.
1.1.1 GUI device dependent systems
Derived from the actual requirements of hand gesture-based controlling systems, hand recognition systems face many challenges such as high accuracy rates and/or computation time. Many real applications deploy a GUI system with which the end-user can easily interact. In these systems, hands are first detected under some constraints (such as putting the hand in a defined region) and then tracked like mouse pointers on the screen [46, 39, 151, 103].

Mitsubishi Electric TV: [46] proposes a hand gesture-based system to control a TV with four commands, as illustrated in Fig. 1.1. A computer monitor is the intermediate equipment showing the GUI to the end-user. An end-user has to use one open hand to trigger the gesture mode; a hand icon then appears, following the hand. The end-user moves the hand to adjust various graphical controls with the hand icon, and the controlling mode is ended by closing the hand. The normalized correlation is calculated for every offset position of the hand movement in the image. This simple algorithm constrains the angle of the hand movement vector, e.g., 25° for the trigger gesture and 15° during tracking of the hand; these constraints are quite disadvantageous for the end-user. In addition, the interaction time cost includes up to a half-second delay before recognition of the trigger gesture, hand tracking at only about 5 times a second, and a 2-second delay before resuming the search for the Turn On command.
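The normalized-correlation search can be illustrated with OpenCV's template matching. This is only a modern sketch of the idea (the 1995 system predates OpenCV), and the inputs and trigger threshold are assumed placeholders:

    import cv2
    import numpy as np

    frame = np.zeros((480, 640), dtype=np.uint8)    # current grayscale frame
    template = np.zeros((64, 64), dtype=np.uint8)   # stored open-hand template

    # Normalized cross-correlation score at every offset of the template.
    scores = cv2.matchTemplate(frame, template, cv2.TM_CCORR_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(scores)
    if max_val > 0.9:        # assumed trigger threshold
        hand_xy = max_loc    # top-left corner of the best match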
[Figure 1.1 diagram components: graphics display, HP 735 computer monitor, Plex-cam video camera, television, electronically controlled remote control]
Figure 1.1 Mitsubishi hand gesture-based TV [46]
Samsung smart TV: A Samsung smart TV can be controlled by moving one or two hands, as illustrated in Fig. 1.2. One can activate the system by waving a hand in front of the TV for two seconds. When the system is activated, an end-user can select the hand gesture-based controlling mode in order to replace the conventional method. In this case, a pointer appears on the TV and a gesture control command bar appears at the bottom of the TV, so end-users can see the available commands on the device.

Figure 1.2 Samsung-Smart-TV using hand gestures.

A CCD/CMOS camera is placed above or below the TV to capture images. The camera does not capture the entire body of the end-user; it just tracks the locations of the end-user's head and hands. The system requires a distance between the end-user and the device of at least 1.5 m. The cursor on the TV screen is controlled by the location of the hand. The TV uses thirteen gestures to convey commands, as illustrated in Fig. 1.3. This system works well in an ideal living-room lighting situation, but its accuracy is largely affected by the lighting condition as well as by a complex background.
[Figure 1.3 gestures: waving a hand, flip to right, flip to left, grab, long grab, thumb up, CCW rotation, move, grab & move, waving two hands, grab & widening, grab & narrowing, grab & rotating]
Figure 1.3 Dynamic hand gestures used for Samsung-Smart-TV
Even though this gesture-based TV supports up to thirteen dynamic hand gestures conveying many TV commands, and these commands are quite natural and memorable, the system requires a GUI to interface between the end-user and the TV; the detected hand then acts like a computer's mouse pointer. Because the application depends on the GUI interface, it is difficult to use for screen-less home appliances such as a lamp, a fan, and so on. Moreover, the application uses only RGB images, so it is strongly affected by illumination. In addition, end-users have to hold the hand at a distance of about 1.5 m, which is not good for human health.
Soft Remote Control System: Authors in [39] propose a feasible gesture-based system to naturally control various home appliances such as a TV, curtain, and light. Hand gesture commands are designed as a combination of hand postures and hand motions, as illustrated in Fig. 1.4.

[Figure 1.4 panels: hand postures; hand motions]
Figure 1.4 Hand gesture commands in Soft Remote Control System [39]
In order to recognize the end-user's hand pointing gestures in a large space, three RGB cameras (320×240 pixels) are set up as illustrated in Fig. 1.5. This system is quite sensitive to illumination and requires a management screen for end-users to choose the control modes, as described in Table 1.1. The end-user's face is first detected to obtain the skin color; then the hand regions are extracted. Based on the segmented hands in the three RGB images, the 3-D position of the hand is located using conventional stereo-matching techniques. To determine the instant at which to activate the system, the 3-D hand position, hand speed, and face direction are combined. An HMM model is used to recognize the gestures. All features are separated into two categories: static features and dynamic features. Dynamic features, such as a sequence of directional codewords and the pattern of cumulative angle, have temporal variance at each sample point; static features, such as length and the angle similarity between the measured line vector and reference vectors, do not. As reported, the system obtains quite low performance, with an accuracy rate of 80.7%.

Figure 1.5 General framework of the Soft Remote Control System [39]
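The gesture-classification stage of such HMM-based systems is commonly realized as one model per gesture class, with the label chosen by log-likelihood. The sketch below uses the hmmlearn library on synthetic sequences; the feature dimensionality, state count, and data are illustrative assumptions, not values from [39]:

    import numpy as np
    from hmmlearn import hmm

    # Synthetic training data: random 4-D feature sequences per class,
    # standing in for direction-codeword / cumulative-angle features.
    rng = np.random.default_rng(0)
    train = {g: [rng.normal(loc=i, size=(20, 4)) for _ in range(5)]
             for i, g in enumerate(["up", "down", "left", "right"])}

    models = {}
    for label, seqs in train.items():
        X = np.concatenate(seqs)            # stacked observations
        lengths = [len(s) for s in seqs]    # per-sequence lengths
        m = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=20)
        models[label] = m.fit(X, lengths)

    def classify(seq):
        # Pick the gesture model with the highest log-likelihood.
        return max(models, key=lambda g: models[g].score(seq))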
As an improvement over [39], the authors in [143] use two USB pan-tilt cameras to deploy the home appliance controlling system. The system strongly depends on the assumption that a gesture is performed only when the hand moves at high speed. Features extracted from hand motion are used to spot gestures. Differently from the above system, the relationship between face and hand is calculated to determine the controlled appliance, as illustrated in Fig. 1.6: the appliance nearest to the pointed position is taken as the controlled equipment, and the direction and distance from the pointed position to that appliance are calculated. This method is often wrong, because using 2D images to estimate the relation to a point in 3D is not accurate. The system provides feedback that tells the end-user which appliance is nearest to the pointed 3D position, together with the direction and distance used to select it; if the feedback is wrong, the end-user must change the position and direction of the hand. The accuracy rate reaches only 86%.

Table 1.1 Soft remote control system and commands assignment
Menu rotation (clockwise) | Channel up | Close | Off
Menu rotation (counterclockwise) | Channel down
[Figure 1.6 labels: VCR candidate, audio candidate, light candidate, VCR selected]
Figure 1.6 Hand gesture-based home appliances system [143]
Zhengmao Zou et al. [151, 103]: A GUI system is created to control a TV, as illustrated in Fig. 1.7, with OpenCV used for the image processing tasks. A moment-invariants algorithm is proposed to recognize ten dynamic gestures. A simple background is set up so that the hand region can be segmented using skin color features. Then, the moment-invariants algorithm extracts a descriptive feature; the first four moments are adequate to represent a gesture uniquely and hence result in a simple feature vector. Finally, an SVM classifier is applied to recognize the dynamic hand gesture. In addition, a dynamic hand gesture involves four static hand gestures: start, meaning command 1, meaning command 2, and stop, as illustrated in Fig. 1.8. Using four static hand gestures is a tight constraint for both the end-user and the system.
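A compact version of this moments-plus-SVM pipeline can be sketched with OpenCV and scikit-learn. The use of the first four (Hu) moment invariants follows the description above; the log-scaling convention, masks, and labels are hypothetical stand-ins:

    import cv2
    import numpy as np
    from sklearn.svm import SVC

    def moment_features(hand_mask):
        """First four Hu moment invariants of a binary hand mask,
        log-scaled to compress their large dynamic range."""
        hu = cv2.HuMoments(cv2.moments(hand_mask)).flatten()[:4]
        return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)

    # Hypothetical usage on segmented hand masks and gesture labels:
    # X = np.array([moment_features(m) for m in masks])
    # clf = SVC(kernel="rbf").fit(X, labels)
    # prediction = clf.predict([moment_features(new_mask)])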
Figure 1.7 TV controlling with GUI of dynamic gesture recognition [151]

Dynamic gesture recognition also minimizes the problems associated with hand movements in static hand gesture recognition systems. The authors improved on their first attempt, which used a static hand gesture system. The system is capable of rejecting unintentional gestures more easily than a static system because of the start and stop routines associated with dynamic gestures, as illustrated in Fig. 1.8. Although the hand gesture dataset is diverse, with twenty gestures, their method is quite simple and the accuracy rate obtained is only 74.4%.

(a) Start, switch down, stop; (b) Start, volume up, stop
Figure 1.8 Commands of GUI of dynamic gesture recognition [103]
Ramamoorthy et al. [106]: The authors propose a gesture-based digital TV using ultrasonic array sensors. This method uses a GUI interface: when the hand moves in front of the TV to perform basic graphical controls, the hand is detected based on pattern matching; then, hand motion is detected and tracked. Six dynamic hand gestures are recognized to convey commands, consisting of turning on, changing channels, and increasing or decreasing the volume.
Shiguo Lian et al. [73]: The authors propose a dynamic gesture-based TV control with automatic detection of the end-user's state. Three types of sensors are utilized (an ultrasonic distance sensor, an RGB camera, and a depth camera), while only six dynamic hand gestures convey commands to control the TV: swipe left, right, up, down, back, and forward, which is not effective for deploying a real application. Moreover, the system detects the end-user's gaze and tracks his/her face based on viewing direction. The hand's motion is used as an important cue to recognize the gestures, which is not robust; their algorithms to detect and segment the hand, based on the face detection result in RGB cues or on depth cues, also have some limitations.
1.1.2 GUI device independent systems
(a) A trigger motion; (b) Shapes: V-sign / 1-finger; (c) Motion: up, down, left, right
Figure 1.9 Features of the Omron dataset [3]

Omron hand gesture dataset: OMRON Corporation has proposed a hand gesture recognition technology that obtains the position, shape, and motion of the hand or fingers [3]. First, OMRON's core facial image sensing technology is used. Then, the gesture is recognized automatically based on the interrelation between the position or direction of the face and the hand shape or position. A trigger motion starts the recognition process through an initial movement between the face and hand region, as in Fig. 1.9 (a). Simple instructions used in human interfaces are defined, such as up, down, left, and right motions (Fig. 1.9 (b)) and V-sign and/or 1-finger signs (Fig. 1.9 (c)). The distance between the end-user and the TV ranges from around 10 cm to several meters; such distances are quite near and not good for human health. Moreover, after the hand gesture controlling mode is triggered by calculating the distance between face and hand, the dynamic hand gesture is recognized by motion features only, which is a weakness of the recognition system.
Regarding end-users' behavior, the authors in [119] surveyed how end-users interact with a TV by selecting hand shapes and hand motions through interaction with a hand gesture selection system. This hand gesture dataset consists of eight gestures, as illustrated in Table 1.2. Using both static and dynamic hand gestures may face some challenges when deployed in a real application, such as time cost and the combination of algorithms, even if the number of gestures is not too large.

Table 1.2 Omron TV command assignment
Open hand shape | Turn on/off | Power Button
Letter motions | Jump channel directly | Channel Buttons
Right motion | Turn channel | Channel Up
Down motion | Turn up volume | Volume Up
Up motion | Turn down volume | Volume Down
Close hand shape | Mute volume | Mute Button
V-hand shape | Start Gesture | Start Gesture
Figure 1.10 Wi-Fi signals to control home appliances using hand gestures [119]

Wisee-based hand gestures: [9] uses Wi-Fi signals (Fig. 1.10). When the hand waves, the receiver measures the changes in the Wi-Fi signal and interprets commands. A novel segmentation method based on wavelet analysis and STE is proposed. The algorithm intercepts CSI segments and analyzes the variations in CSI caused by hand motions, revealing the unique patterns of gestures. Then, the DTW technique is applied to classify gestures. Seven hand gestures are designed to control a TV, laptop, and air-conditioner (Fig. 1.11), as mapped in Table 1.3. Despite high accuracy (92.1%), the emission source in this system is set up with two WB-2400D300 antennas to increase the power of the Wi-Fi signal. Using a high-power Wi-Fi signal may be harmful to human health, so it is not feasible to apply to real home appliances.
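The DTW matching step is the classic dynamic-programming alignment. A minimal version over 1-D segments (such as CSI amplitude traces) is sketched below, with classification by nearest template; this is the textbook algorithm, not the exact implementation of [9]:

    import numpy as np

    def dtw_distance(a, b):
        """O(len(a)*len(b)) dynamic time warping cost between two 1-D signals."""
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                # Extend the cheapest of the three admissible alignments.
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

    def classify(segment, templates):
        # templates: {gesture_label: reference_signal}
        return min(templates, key=lambda g: dtw_distance(segment, templates[g]))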
[Figure 1.11 gestures: pointing, scroll up, scroll down, swipe leftward, swipe rightward, flick, grab]
Figure 1.11 Seven hand gestures for wireless-based interaction [9] (Wisee dataset)

Table 1.3 Hand gestures utilized for different devices using Wisee technique
Lei Jing et al. [62]: An MR (magic ring) is worn on the thumb to acquire hand gesture trajectories, as illustrated in Fig. 1.12 (b). The proposed system determines the starting and ending points of the hand motion; then a hierarchical classifier based on HMM classifies the dynamic hand gesture motions. The hand trajectories correspond to commands controlling different home appliances such as a lamp, CD radio, and television. Six gesture trajectories are defined, as illustrated in Fig. 1.12 (a), to convey the commands detailed in Table 1.4. However, wearing a ring to control home appliances makes end-users feel uncomfortable, unnatural, and unfriendly, while the accuracy rate reaches only 85%.
trajec-Bretzner et al [23]: Hand postures are represented in terms of hierarchies of scale color image features at different scales, with qualitative inter-relations in terms
multi-of scale, position and orientation Detection multi-of multi-scale color features is performed.Hand states are then simultaneously detected and tracked using particle filtering, with
an extension of layered sampling referred to as hierarchical layered sampling These
Trang 35(a) The six gestures dataset (b) Control home appliances
Figure 1.12 Simulation of using MR to control some home appliances [62].Table 1.4 Hand gestures utilized for different devices using MR technique
Right Rotate Volume up Swipe files rightward Volume up
Max Up Appliance list(+) Appliance list(+) Appliance list(+)Max Down Appliance list(-) Appliance list(-) Appliance list(-)
components have been integrated into a real-time prototype system, applied to a testproblem of controlling consumer electronics using hand gestures
PointGrab company [33]: At the Consumer Electronics Show in 2014, PointGrab showed technology for controlling devices and home appliances by pointing the finger. Depth cues are used to create a "transparent space" in front of the end-user, offering touch-like operation albeit from a distance. End-users can control home appliances such as a TV, air-conditioner, lamp, door, and so on. An embedded chip processes the gesture algorithms and calibrates the position of the end-user's eyes with the position and direction of the finger; for example, a finger pointing forward at a lamp turns the dimmer up or down. The end-user can control from up to 4.5 m away and under all lighting conditions. However, this system supports only two commands, pointing the finger to turn a home appliance on or off. In addition, using depth cues only may face some algorithmic limitations, and the company did not publish its method.
There are many systems that use hand gestures to control home appliances, as shown in Table 1.5; they can be separated into the following two groups: dependent and independent of a GUI on the device. The first category requires a GUI to interface between the end-user and the equipment; these systems therefore need a screen.

Figure 1.13 AirTouch-based control uses depth cue [33].
Table 1.5 The existing in-air gesture-based systems

ID | Reference | Equipment | Vision-based | Other in-air technique
1 | Mitsubishi, 1995 [46] | TV | x |
2 | Jun-hyeong et al., 2002 [39] | TV, Lamp, Curtain | x |
3 | Lars Bretzner et al., 2002 [23] | TV | x |
4 | Yang et al., 2006 [143] | TV, Fan, Lamp, Music | x |
5 | Asanterabi Malima et al., 2006 [81] | Robot | x |
6 | Lee-Ferng et al., 2009 [72] | Robot | x |
7 | Xu Zhang et al., 2009 [149] | Game controller | x |
8 | Sigalas et al., 2010 [122] | Robot | x |
9 | Solanki et al., 2011 [126] | TV, Stereo in car | x |
10 | NATOPS, 2011 [140] | Air-plane | x |
11 | Omron, 2012 [3] | TV | x |
12 | Jing et al., 2012 [62] | Lamp, CD Radio, and TV | | Magic-ring
13 | Samsung, 2013 [4] | TV | x |
14 | PointGrab company, 2013 [33] | TV, Lamp, Air-Conditioner, Fan | x |
15 | Microsoft Xbox-Kinect, 2013 [2] | Game controller | x |
16 | Lian [73] | TV | x |
17 | Ramamoorthy et al., 2015 [106] | TV | x |
18 | Prajapat et al., 2015 [117] | Robot | x |
19 | NVIDIA, 2016 [87] | Car | x |
20 | Aqaness et al., 2016 [9] | TV, Laptop/Smart-device, Air-conditioner | | Wisee
21 | Grif et al., 2016 [51] | Laptop | x |
22 | Biju et al., 2016 [18] | TV, Stereo | x |
23 | Azilen company, 2017 [102] | Lamp, Air-conditioner, Security, Washer, Ovens or Refrigerator, TV | x | Wisee
Such a screen is not suitable for most non-screen home appliances such as fans, lamps, and so on. For the second category, some proposed methods use other techniques, such as WiFi-based sensing [9], or limit the number of hand gestures, e.g., to turn-on/off commands [39].
1.2 Hand detection and segmentation

This step aims to determine the hand region, then to cut, prune, and keep only the hand region in the images. Because the human hand has a large number of DoFs, a huge variety of 2D appearances arises depending on the camera viewpoint, scale, resolution, illumination, or even the characteristics of the sensor type (depth and/or color images). Therefore, accurate detection of hands in images or video remains a challenging problem. To deploy a real system, this step requires not only accuracy but also low time cost. There have been countless proposed methods for hand detection and segmentation, using the following main features: color [27, 115, 111, 132], shape [57, 110, 91], motion [10, 57], and depth [111, 101].
1.2.1 Color
Skin color-based segmentation has been applied in several approaches. Some of them use the values of individual pixels, often detecting hand pixels based on skin color; other approaches utilize the relationship between pixels or regions [151, 103, 73]. In [23], the authors propose an approach to detect features based on color: first, significant blob and ridge features are extracted from an image of a hand; then, based on color cues, the remaining part of the image is extracted as a hand region candidate. Yeong et al. [39] detect a face in order to create a skin model. A 2D histogram of the pixels of the detected face area is computed on the U and V channels of the YUV color space. This histogram is used to convert moving-region pixels in the current input image to a corresponding probability-of-skin-color image, so this work is less affected by different lighting conditions.
To overcome the limitation of RGB due to its sensitivity to lighting conditions, Yang et al. [143] convert the RGB image to YCrCb and discard the Y component, which contains the luminance information. After splitting into Cr (red chrominance) and Cb (blue chrominance), they detect skin blobs using a thresholding technique. During the procedure, a closing operation is adopted to remove fragments of the blobs. However, this method is not effective if the background color is the same as the skin color.
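The Cr/Cb thresholding step can be written in a few OpenCV lines. The chrominance bounds below are commonly quoted skin ranges, not values specified for [143]:

    import cv2
    import numpy as np

    frame = np.zeros((480, 640, 3), dtype=np.uint8)   # placeholder BGR frame

    ycrcb = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)
    # Discard Y (luminance) by thresholding only the Cr/Cb chrominance planes.
    skin = cv2.inRange(ycrcb,
                       np.array([0, 133, 77], dtype=np.uint8),     # Y, Cr, Cb lower
                       np.array([255, 173, 127], dtype=np.uint8))  # Y, Cr, Cb upper

    # Morphological closing removes small fragments inside the skin blobs.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    skin = cv2.morphologyEx(skin, cv2.MORPH_CLOSE, kernel)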
In [145], the RGB color model is affected by both color and brightness cues, which are influenced by the background illumination and the environment; therefore, the authors convert the RGB image to the HSV color space. [132] proposes a method to detect skin regions using a color segmentation algorithm that is robust to the lighting condition. The proposed algorithm uses a neural network as a color normalization step; however, this step takes much time because each pixel is passed through the neural network. Moreover, using color to detect the hand requires some constraints, such as: (1) the background does not contain objects with color similar to skin; (2) illumination changes are limited; (3) the background is simple. For a complex background, skin segmentation is combined with a sliding-window technique to reduce the search space of hand posture recognition, which requires a complex algorithm and high computation time and is not effective if the background color is similar to skin color.

Hand segmentation based on skin color is still quite challenging in real applications. It requires some constraints to obtain high performance, such as a simple background, invariant illumination, and a difference in color between hand and background. To improve the accuracy of hand detection, other cues or methods are combined with skin color, or multiple features are combined, such as topography, shape, edge, the motion of the hand, and so on.
ap-1.2.2 Shape
The shape feature has been utilized to detect the hand in images. The shape feature is often obtained by extracting contours and edges. Normally, hand shape-based methods first extract all contours in the image; then, based on the characteristics of the hand shape, hand regions are detected [10, 37, 23, 127].

To achieve high performance, many methods require sophisticated algorithms. For instance, [99] constructs multiple layers of the texture, color, and shape of the hand in order to extract hand contours precisely. This method obtains high accuracy but requires up to 2.65 s to process one image of resolution 160×120.

In [35], after detecting the face region using a Viola-Jones detector, the hand region is searched using skin detection and a hand contour comparison algorithm based on hand posture templates. In [10], the authors first use three complementary detectors to propose hand candidates. Each candidate is then scored by the three detectors independently; next, a classifier computes a final confidence score for the proposals using these features. Although this method obtains higher performance, it requires up to 2 minutes for one image.

In [16], a descriptor is computed to measure the similarity between shapes. The shape context descriptor characterizes a particular point location on the shape: it is the histogram of the relative polar coordinates of all other points. Detection is based on the assumption that corresponding points on two different shapes will ideally have similar shape contexts.
In [32], a corona-effect smoothing and border extraction algorithm is used to find the contour of the hand. Then, the FD is used to describe the hand shapes. The FD is defined as the power spectrum of the discrete Fourier transform of the boundary points, which are represented by complex numbers. The Hausdorff distance measure is used to track shape-variant hand motion.
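A minimal Fourier-descriptor computation in this spirit is sketched below: boundary points become complex numbers, and the magnitudes of the low-order DFT coefficients give a start-point- and rotation-invariant shape signature. The normalization details are a common convention, not necessarily those of [32]:

    import numpy as np

    def fourier_descriptor(contour, k=16):
        """Fourier descriptor of a closed contour given as an (N, 2)
        array of boundary points: keep the magnitudes of the first k
        non-DC coefficients, normalized for scale."""
        z = contour[:, 0] + 1j * contour[:, 1]   # points as complex numbers
        coeffs = np.fft.fft(z)
        mags = np.abs(coeffs[1:k + 1])           # drop the DC (translation) term
        return mags / (mags[0] + 1e-12)          # divide out scale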
In [127], the blobs of hands are extracted by thresholding the luminance of the captured images. Then, chain-code representations of the blob contours are used to compute the fingertip positions with a linear-time algorithm. Finally, the 3D positions of the fingertips are reconstructed by triangulation and subsequently smoothed using Kalman filters. This system aims to reduce processing complexity by limiting sophisticated operations to geometric primitives only. The time cost of this segmentation method accounts for more than 90% of the system.

[31] supposes that a hand-forearm region (including a hand and part of a forearm) has different brightness from other skin-colored regions; therefore, the hand-forearm region is segmented by the brightness difference, and the hand region is extracted from it by detecting a feature point on the wrist. This method is difficult to deploy in a real application because it cannot operate when the hand-forearm region is not detected.

The hand is a non-rigid object with variations of hand shape, scale, and color, while shape-based hand detection methods rely on the contours of the hand. Therefore, using hand shapes for detection is quite challenging.
1.2.3 Motion
In [57], shape-variant hand motion is tracked using the Hausdorff distance. Next, hand figures are characterized by the scale- and rotation-invariant Fourier descriptor. Then, a 3D modified HNN using graph matching recognizes dynamic hand gestures. Fifteen hand gestures are recognized, achieving above a 91% recognition rate. However, this method requires a high time cost (up to 10 s), enlarges the library of stored models, and depends on thresholding parameters.
In [84], a supervisor selects and activates visual processes. Within a process, a confidence factor is provided to dynamically reconfigure the system in response to events in the scene. Hand tracking is performed using image differencing and normalized histogram matching. The result of hand detection is used by a recursive estimator (Kalman filter) to provide an estimate of the position and size of the hand. Moreover, the throughput of this system is only 5 fps.
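A constant-velocity Kalman filter over the hand's position and size, mirroring the recursive estimator described above, can be set up with OpenCV as follows; the state layout and noise levels are illustrative choices, not values from [84]:

    import cv2
    import numpy as np

    # State (x, y, w, vx, vy, vw): hand position, size, and their velocities;
    # the measurement (x, y, w) comes from the hand detector.
    kf = cv2.KalmanFilter(6, 3)
    kf.transitionMatrix = np.eye(6, dtype=np.float32)
    for i in range(3):
        kf.transitionMatrix[i, i + 3] = 1.0      # x += vx per frame, etc.
    kf.measurementMatrix = np.eye(3, 6, dtype=np.float32)
    kf.processNoiseCov = 1e-3 * np.eye(6, dtype=np.float32)
    kf.measurementNoiseCov = 1e-1 * np.eye(3, dtype=np.float32)

    predicted = kf.predict()                     # a-priori position/size estimate
    kf.correct(np.array([[120.0], [200.0], [40.0]], dtype=np.float32))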
con-1.2.4 Depth
To speed up detection as well as to avoid illumination and skin color changes, [111] used depth cues captured by a Kinect sensor with two main assumptions: the hand is the front-most object facing the sensor, and the end-user must wear a black belt on the gesturing hand's wrist. The hands are detected by thresholding the depth map to determine hand regions. Then, RANSAC is applied to locate the position of the black belt so as to precisely localize and segment the hand region. Following the same idea, [141] proposes to detect the hand from the human skeleton; they also refine the hand region by a threshold on the depth data and black wristband detection. Obviously, the current approaches favor the use of depth data and use color only as a complement.
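In its simplest form, the front-most-object assumption of [111] reduces to keeping a thin depth band behind the nearest valid pixel; a sketch with an assumed band width follows:

    import numpy as np

    def frontmost_mask(depth_mm, band_mm=150):
        """Binary mask of the front-most object in a depth frame (mm),
        assuming, as in [111], that the gesturing hand is the object
        closest to the sensor; band_mm is an assumed hand thickness."""
        valid = depth_mm > 0                     # 0 encodes missing depth
        nearest = depth_mm[valid].min()
        return valid & (depth_mm <= nearest + band_mm)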
[17] extracts the skeleton of hands and applies an HMM for classifying various hand gestures. [40] introduces a method to estimate the 3D hand pose and hand configuration; then, a template matching technique is used for the recognition procedure. [42] proposes a method which uses RGB and Depth images for hand detection.
[101] uses a threshold to cut the hand region from depth images, but depth thresholding does not actually produce a perfect segmentation. [26] proposes a method relying on depth information obtained from a stereoscopic video system. The head of the human is first detected from the stereoscopic data and its depth is obtained; from the depth of the head, a threshold is used to cut the hand region. This method runs in real time, but the accuracy of the hand region is not good. Moreover, the accuracy of hand detection is affected by the result of head detection, and it is difficult to define the threshold exactly.
Figure 1.14 Depth threshold cues and face skin [97]
Park et al. [97] propose a method to detect the hand as illustrated in Fig. 1.14. A depth threshold is used to detect the hand region. Then, the face is detected to obtain a skin model. Finally, the hand is segmented based on the skin model. [97] improves the quality of the segmented hand region, but the hand detection result depends on the result of face detection. Moreover, the skin color of the face differs from the hand's skin because the face contains unwanted objects such as eyes, eyebrows, mouth, and so on.
In [148], hand detection is based on a skin color model and background subtraction from depth images. A threshold on the skin color model is selected and combined with background subtraction. Stereo camera calibration and disparity mapping are used to extract the hand from the depth information. In [60], the hand region is cropped on the basis