MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

THI HUONG GIANG DOAN

DYNAMIC HAND GESTURE RECOGNITION USING RGB-D IMAGES FOR HUMAN-MACHINE INTERACTION

DOCTORAL THESIS OF CONTROL ENGINEERING AND AUTOMATION

Hanoi − 2017
MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

THI HUONG GIANG DOAN

DYNAMIC HAND GESTURE RECOGNITION USING RGB-D IMAGES FOR HUMAN-MACHINE INTERACTION

Specialty: Control Engineering and Automation
Specialty Code: 62520216

DOCTORAL THESIS OF CONTROL ENGINEERING AND AUTOMATION

SUPERVISORS:
1. Dr. Hai Vu
2. Dr. Thanh Hai Tran

Hanoi − 2017
DECLARATION OF AUTHORSHIP

I, Thi Huong Giang Doan, declare that the thesis titled "Dynamic Hand Gesture Recognition Using RGB-D Images for Human-Machine Interaction" and the work presented in it are my own. I confirm that:

- This work was done wholly or mainly while in candidature for a Ph.D. research degree at Hanoi University of Science and Technology.
- Where any part of this thesis has previously been submitted for a degree or any other qualification at Hanoi University of Science and Technology or any other institution, this has been clearly stated.
- Where I have consulted the published work of others, this is always clearly attributed.
- Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.
- I have acknowledged all main sources of help.
- Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.

Hanoi, December 2017

PhD STUDENT
Thi Huong Giang DOAN

SUPERVISORS
ACKNOWLEDGEMENTS

This thesis was written during my doctoral study at the International Research Institute Multimedia, Information, Communication and Applications (MICA), Hanoi University of Science and Technology (HUST). It is my great pleasure to thank all the people who supported me in completing this work.

First, I would like to express my sincere gratitude to my advisors Dr. Hai Vu and Dr. Thi Thanh Hai Tran for their continuous support of my Ph.D. study and related research, and for their patience, motivation, and immense knowledge. Their guidance helped me throughout the research and writing of this thesis. I could not have imagined having better advisors and mentors for my Ph.D. study.

Besides my advisors, I would like to thank the scientists and authors of the published works cited in this thesis, whose works provided valuable information resources for my research. Attending scientific conferences has always been a great experience for me and a source of many useful comments.

In the process of implementing and completing my research, I have received much support from the board of MICA directors. My sincere thanks go to Prof. Yen Ngoc Pham, Prof. Eric Castelli, and Dr. Son Viet Nguyen, who provided me with the opportunity to join research work at the MICA institute and who gave me access to the laboratory and research facilities. Without their precious support, it would have been impossible to conduct this research.

As a Ph.D. student of the 911 programme, I would like to thank the 911 programme for its financial support during my Ph.D. course. I also gratefully acknowledge the financial support for publishing papers and conference fees from research projects T2014-100, T2016-PC-189, and T2016-LN-27. I would like to thank my colleagues at the Computer Vision Department and Multi-Lab of the MICA institute over the years, both at work and outside of work.

Special thanks to my family. Words cannot express how grateful I am to my mother and father for all of the sacrifices they have made on my behalf. I would also like to thank my beloved husband. Thank you for supporting me in everything.

Hanoi, December 2017
Ph.D. Student
Thi Huong Giang DOAN
TABLE OF CONTENTS

1.1 Completed hand gesture recognition systems for controlling home appliances 8
1.1.1 GUI device dependent systems 8
1.1.2 GUI device independent systems 14
1.2 Hand detection and segmentation 18
1.2.1 Color 19
1.2.2 Shape 20
1.2.3 Motion 21
1.2.4 Depth 21
1.2.5 Discussions 23
1.3 Hand gesture spotting system 24
1.3.1 Model-based approaches 25
1.3.2 Feature-based approaches 27
1.3.3 Discussions 29
1.4 Dynamic hand gesture recognition 29
1.4.1 HMM-based approach 30
1.4.2 DTW-based approach 31
1.4.3 SVM-based approach 33
1.4.4 Deep learning-based approach 34
1.4.5 Conclusion 35
1.5 Discussion and Conclusion 35
2.1 Defining dynamic hand gestures 37
2.2 The existing dynamic hand gesture datasets 38
2.2.1 The published dynamic hand gesture datasets 38
2.2.1.1 The RGB hand gesture datasets 38
2.2.1.2 The Depth hand gesture datasets 40
2.2.1.3 The RGB and Depth hand gesture datasets 41
2.2.2 The non-published hand gesture datasets 44
2.2.3 Conclusion 46
2.3 Definition of the closed-form pattern of gestures and phasing issues 47
2.3.1 Conducting commands of a dynamic hand gesture set 47
2.3.2 Definition of the closed-form pattern of gestures and phasing issues 48
2.3.3 Characteristics of the dynamic hand gesture set 50
2.4 Data collection 51
2.4.1 MICA1 dataset 51
2.4.2 MICA2 dataset 52
2.4.3 MICA3 dataset 53
2.4.4 MICA4 dataset 54
2.5 Discussion and Conclusion 55
3 HAND DETECTION AND GESTURE SPOTTING WITH USER-GUIDE SCHEME 56
3.1 Introduction 56
3.2 Heuristic user-guide scheme 58
3.2.1 Assumptions 58
3.2.2 Proposed framework 58
3.2.3 Estimating heuristic parameters 60
3.2.3.1 Estimating parameters of background model for body detection 60
3.2.3.2 Estimating the distance from hand to the Kinect sensor for extracting hand candidates 62
3.2.3.3 Estimating skin color parameters for pruning hand regions 63
3.2.4 Hand detection phase using heuristic parameters 65
3.2.4.1 Hand detection 65
3.2.4.2 Hand posture recognition 66
3.3 Dynamic hand gesture spotting 66
3.3.1 Catching buffer 66
3.3.2 Spotting dynamic hand gesture 67
3.4 Experimental results 71
3.4.1 The required learning time for end-users 71
3.4.2 The computational time for hand segmentation and recognition 73
3.4.3 Performance of the hand region segmentations 75
3.4.3.1 Evaluate the hand segmentation 75
3.4.3.2 Compare the hand posture recognition results 75
3.4.4 Performance of the gesture spotting algorithm 76
3.5 Discussion and Conclusion 78
3.5.1 Discussions 78
3.5.2 Conclusions 78
4 DYNAMIC HAND GESTURE REPRESENTATION AND RECOGNITION USING SPATIAL-TEMPORAL FEATURES 79
4.1 Introduction 79
4.2 Proposed framework 80
4.2.1 Hand representation from spatial and temporal features 81
4.2.1.1 Temporal features extraction 81
4.2.1.2 Spatial features extraction using linear reduction space 83
4.2.1.3 Spatial features extraction using non-linear reduction space 84
4.2.2 DTW-based phase synchronization and KNN-based classification 86
4.2.2.1 Dynamic Time Warping for phase synchronization 86
4.2.2.2 Dynamic hand gesture recognition using K-NN method 88
4.2.3 Interpolation-based synchronization and SVM classification 89
4.2.3.1 Dynamic hand gesture representation 89
4.2.3.2 Quasi-periodic dynamic hand gesture pattern 91
4.2.3.3 Phase synchronization using hand posture interpolation 94
4.2.3.4 Dynamic hand gesture recognition using different classifiers 96
4.3 Experimental results 97
4.3.1 Influence of temporal resolution on recognition accuracy 97
4.3.2 Tuning kernel scale parameters of the RBF-SVM classifier 98
4.3.3 Performance evaluation of the proposed method 99
4.3.4 Impacts of the phase normalization 100
4.3.5 Further evaluations on public datasets 101
4.4 Discussion and Conclusion 103
4.4.1 Discussion 103
4.4.2 Conclusion 103
5 CONTROLLING HOME APPLIANCES USING DYNAMIC HAND GESTURES 105
5.1 Introduction 105
5.2 Deployment of control systems using hand gestures 105
5.2.1 Assignment of hand gestures to commands 105
5.2.2 Different modes of operations carried out by hand gestures 107
5.2.2.1 Different states of lamp and their transitions 107
5.2.2.2 Different states of fan and their transitions 108
5.2.3 Implementation of the control system 108
5.2.3.1 Main components of the control system using hand gestures 108
5.2.3.2 Integration of hand gesture recognition modules 109
5.3 Experiments of control systems using hand gestures 115
5.3.1 Environment and material setup 115
5.3.2 Pre-built script 116
5.3.3 Experimental results 117
5.3.3.1 Evaluation of hand gesture recognition 118
5.3.3.2 Evaluation of time costs 119
5.3.4 Evaluation of usability 120
5.4 Discussion and Conclusion 121
5.4.1 Discussions 121
5.4.2 Conclusion 122
LIST OF ABBREVIATIONS

No. Abbreviation Meaning
1 ANN Artificial Neural Network
10 CNN Convolutional Neural Network
11 CPU Central Processing Unit
12 CRFs Conditional Random Fields
13 CSI Channel State Information
15 DDNN Deep Dynamic Neural Networks
18 DTM Dense Trajectories Motion
29 GUI Graphical User Interface
31 HCRFs Hidden Conditional Random Fields
32 HNN Hopfield Neural Network
34 HOG Histogram of Oriented Gradients
45 LLE Locally Linear Embedding
48 MFC Microsoft Foundation Classes
49 MSC Mean Shift Clustering
53 PCA Principal Component Analysis
54 PDF Probability Distribution Function
55 PNG Portable Network Graphics
56 QCIF Quarter Common Intermediate Format
59 RBF Radial Basis Function
62 RGB-D Red Green Blue Depth
63 RMSE Root Mean Square Error
66 SIFT Scale Invariant Feature Transform
69 STF Spatial Temporal Feature
LIST OF TABLES

Table 1.1 Soft remote control system and commands assignment 12
Table 1.2 Omron TV command assignment 15
Table 1.3 Hand gestures utilized for different devices using Wisee technique 16
Table 1.4 Hand gestures utilized for different devices using MR technique 17
Table 1.5 The existing in-air gesture-based systems 18
Table 1.6 The existing vision-based dynamic hand gesture methods 36
Table 2.1 The existing hand gesture datasets 46
Table 2.2 The main commands of some smart home electrical appliances 48
Table 2.3 Notations used in this research 50
Table 2.4 Characteristics of the defined databases 55
Table 3.1 The required time to learn parameters of the background model 72
Table 3.2 The required time to learn parameters of the hand-skin color model 73
Table 3.3 The required time to learn the hand-to-Kinect distance 73
Table 3.4 The required time for hand segmentation 74
Table 3.5 The required time for hand posture recognition 74
Table 3.6 Results of the JI indexes without/with learning scheme 75
Table 4.1 Recall rate (%) of the proposed method on our datasets with different classifiers 100
Table 4.2 Performance of the proposed method on three different datasets 103
Table 5.1 Assignment of hand gestures to commands for controlling lamp and fan 107
Table 5.2 Confusion matrix of dynamic hand gesture recognition 118
Table 5.3 Accuracy rate (%) of dynamic hand gesture commands 118
Table 5.4 Assessment of end-users on the defined dataset 120
LIST OF FIGURES

Figure 1 Home appliances in smart homes 3
Figure 2 Controlling home appliances using dynamic hand gestures in a smart house 3
Figure 3 The proposed framework of the dynamic hand gesture recognition for controlling home appliances 6
Figure 1.1 Mitsubishi hand gesture-based TV [46] 9
Figure 1.2 Samsung-Smart-TV using hand gestures 10
Figure 1.3 Dynamic hand gestures used for Samsung-Smart-TV 10
Figure 1.4 Hand gesture commands in Soft Remote Control System [39] 11
Figure 1.5 General framework of the Soft Remote Control System [39] 11
Figure 1.6 Hand gesture-based home appliances system [143] 12
Figure 1.7 TV controlling with GUI of dynamic gesture recognition [151] 13
Figure 1.8 Commands of GUI of dynamic gesture recognition [103] 13
Figure 1.9 Features of the Omron dataset [3] 14
Figure 1.10 Wi-Fi signals to control home appliances using hand gestures [119] 15
Figure 1.11 Seven hand gestures for wireless-based interaction [9] (Wisee dataset) 16
Figure 1.12 Simulation of using MR to control some home appliances [62] 17
Figure 1.13 AirTouch-based control uses depth cue [33] 18
Figure 1.14 Depth threshold cues and face skin [97] 22
Figure 1.15 Depth threshold and skeleton [60] 23
Figure 1.16 The process of detecting hand region [69] 23
Figure 1.17 Spotting dynamic hand gestures system using HMM model [71] 25
Figure 1.18 Threshold using HMM model for different gestures [71] 26
Figure 1.19 CRFs-based spotting method using threshold [142] 26
Figure 1.20 Designed gesture in proposed method [13] 28
Figure 1.21 Two gesture boundaries are spotted [65] 29
Figure 1.22 Gesture recognition using HMM [42] 31
Figure 1.23 Gesture features extraction [8] 33
Figure 2.1 Periodic image sequences appear in many common actions 38
Figure 2.2 Four hand gestures of [83] 39
Figure 2.3 Cambridge hand gesture dataset of [67] 39
Figure 2.4 Five hand gestures of [82] 40
Figure 2.5 Twelve dynamic hand gestures of the MSRGesture3D dataset [1] 41
Figure 2.6 Dynamic hand gestures of [88] 42
Figure 2.7 Gestures of NATOPS dataset [140] 43
Figure 2.8 Dynamic hand gestures of SKIG dataset [76] 44
Figure 2.9 Gestures in ChaLearn dataset 44
Figure 2.10 Dynamic hand gestures of [93] 44
Figure 2.11 Dynamic hand gestures of the NVIDIA dataset [87] 45
Figure 2.12 Dynamic hand gestures of PowerGesture dataset [71] 45
Figure 2.13 Hand shape variations and hand trajectories (lower panel) of the proposed gesture set (5 gestures) 48
Figure 2.14 In each row, changes of the hand shape during a gesture performing. From left to right, hand shapes of the completed gesture change in a cyclical pattern (closed-opened-closed) 49
Figure 2.15 Comparing the similarity between the closed-form gestures and a simple sinusoidal signal 51
Figure 2.16 Close cyclical hand gesture pattern and cycle signal 51
Figure 2.17 The environment setup for the MICA1 dataset 52
Figure 2.18 The environment setup for the MICA2 dataset 52
Figure 2.19 The environment setup for the MICA3 dataset 53
Figure 2.20 The environment setup for the MICA4 dataset 54
Figure 3.1 Diagram of the proposed hand gesture spotting system 57
Figure 3.2 Diagram of the proposed hand detection and segmentation system 59
Figure 3.3 The Venn diagram representing the relationship between the pixel sets I, D, Bd, Hd, S and H∗ 60
Figure 3.4 Results of hand region detection 61
Figure 3.5 Result of the learning distance parameter. (a-c) Three consecutive frames; (d) Result of subtracting the first two frames; (e) Result of subtracting the next two frames; (f) Binary thresholding operator; (g) A range of hand (left) and of body (right) on the depth histogram 63
Figure 3.6 The training skin color model 63
Figure 3.7 Result of the training skin color model 64
Figure 3.8 Results of the hand segmentation. (a) A candidate of hand; (b) Mahalanobis distance; (c) Refining the segmentation results using RGB features 66
Figure 3.9 Catching buffer to store continuous hand frames 67
Figure 3.10 The area cues of the hand regions 68
Figure 3.11 The velocity cues of the hand regions 68
Figure 3.12 The combination of area and velocity signals of the hands 69
Figure 3.13 The finding of local peaks from the original area signal of the hands 70
Figure 3.14 Log activities of an evaluator who follows stages of the user-guide scheme and represents seven hand postures for preparing the posture dataset 72
Figure 3.15 Seven types of postures recognized in the proposed system. (a) The first row: original images with results of the hand detections (in red boxes); (b) The second row: zoom-in version of the hand regions without segmentation; (c) The third row: the corresponding segmented hands 73
Figure 3.16 Results of the kernel-based descriptors for hand posture recognition without/with segmentation 76
Figure 3.17 Performances of the dynamic gesture spotting on two datasets MICA1 and MICA2 77
Figure 3.18 An illustration of the gesture spotting errors 77
Figure 4.1 The comparison framework of hand gesture recognition 81
Figure 4.2 Optical flow and trajectory of the go-right hand gesture 83
Figure 4.3 An illustration of the Go-left hand gesture before and after projecting into the constructed PCA space 84
Figure 4.4 3D manifold of hand postures belonging to five gesture classes 86
Figure 4.5 An illustration of the DTW results of two hand gestures (T, P). (a)-(b) Alignments between postures in T and P in the image space and the spatial-temporal space; (c)-(d) The refined alignments after removing repetitive ones 87
Figure 4.6 Distribution of dynamic hand gestures in the low-dimension space 89
Figure 4.7 Five dynamic hand gestures in the 3D dimension 90
Figure 4.8 Definition of a quasi-periodic image sequence 91
Figure 4.9 Illustrations of the phase variations 92
Figure 4.10 Definition of a quasi-periodic image sequence in the phase domain 92
Figure 4.11 Manifold representation of the cyclical Next hand gesture 93
Figure 4.12 Phase synchronization 94
Figure 4.13 Whole-length sequence synchronized with the most different phase 95
Figure 4.14 Whole-length sequence synchronized with the most similar phase 95
Figure 4.15 (a, c) Original hand gestures; (b, d) corresponding interpolated hand gestures 96
Figure 4.16 ROC curves of hand gesture recognition results with SVM classifier 97
Figure 4.17 The dynamic hand gesture recognition results with different kernel scales of the SVM 98
Figure 4.18 The comparison of combined characteristics (KLT and ISOMAP) of dynamic hand gestures 99
Figure 4.19 Performance comparisons with different techniques 101
Figure 4.20 Comparison results between the proposed method and others at thirteen positions 101
Figure 4.21 Dynamic hand gestures in the sub-NVIDIA dataset 102
Figure 4.22 Confusion matrices with MSRGesture3D and Sub-NVIDIA datasets 103
Figure 5.1 Illustration of light controlling using dynamic hand gestures with different levels of intensity of the lamp 106
Figure 5.2 Illustration of ten modes of fan controlled by dynamic hand gestures 106
Figure 5.3 The state diagram of the proposed lighting control system 107
Figure 5.4 The state diagram of the proposed fan control system 108
Figure 5.5 A schematic representation of basic components in hand gesture-based control system 109
Figure 5.6 Integration of hand gesture recognition modules 109
Figure 5.7 The proposed framework for training phase 110
Figure 5.8 The proposed flow chart for the online dynamic hand gesture recognition 111
Figure 5.9 The proposed flow chart for controlling lamp 113
Figure 5.10 The proposed flow chart for controlling fan 114
Figure 5.11 Setup for evaluating the control systems 115
Figure 5.12 Illustration of environment and material setup 117
Figure 5.13 The time-line of the proposed evaluation system 119
Figure 5.14 The time cost for the proposed dynamic hand gesture recognition system 120
Figure 5.15 Usability evaluation of the proposed system 120
INTRODUCTION

Motivation

Home-automation products have been widely used in smart homes (or smart spaces) thanks to recent advances in intelligent computing, smart devices, and new communication protocols. In terms of automating ability, most advanced technologies focus on either saving energy or facilitating control via a user interface (e.g., remote controllers [92], mobile phones [7], tablets [52], voice recognition [11]). To maximize usability, a human-computer interaction method must allow end-users to use it easily and to perform the conventional operations naturally. Motivated by such advantages, this thesis pursues a unified solution to deploy a complete hand gesture-based control system for home appliances. A natural and friendly interaction is deployed in order to replace the conventional remote controller.
A complete gesture-based controlling application requires both robustness and low computational time. However, these requirements face many technical challenges, such as a huge computational cost and the complexity of hand movements, and previous solutions focus on only one of the problems in this field. To solve these issues, two trends in the literature are investigated: one common trend is based on aiding devices, and the other focuses on improving the relevant algorithms and paradigms. The first group addresses the critical issues by using supportive devices such as a data-glove [85, 75], hand markers [111], or contact sensors mounted on the hand or palm of end-users when they control home appliances. Obviously, these solutions are expensive or inconvenient for the end-users. For the second trend, hand gesture recognition has been widely attempted by researchers in the communities of computer vision, robotics, and automation control; however, how to achieve robustness and low computational time still remains an open question. In this thesis, the main motivation is to pursue a set of "suggestive" hand gestures, with the argument that the characteristics of hand gestures are important cues in the context of deploying a complete hand gesture-based system.
On the other hand, new and low-cost depth sensors have recently been widely applied in the fields of robotics and automation control. These devices open new opportunities for addressing the critical issues of gesture recognition schemes. This work attempts to benefit from the Kinect sensor [2], which provides both RGB and depth features; utilizing such valuable features offers an efficient and robust solution for addressing the challenges.
The thesis aims to achieve a robust, real-time hand gesture recognition system. As a feasible solution, the proposed method should be natural and friendly for end-users. A real application is deployed for automatically controlling a fan and/or a bulb/lamp using hand gestures, these being common electrical home appliances. Without loss of generality, the proposed technique can be extended from this specific case to general home automation control systems. To this end, the concrete objectives are:
- Defining a unique set of dynamic hand gestures. This gesture set conveys commands that are available in common home electronic appliances such as televisions, fans, lamps, doors, air-conditioners, and so on. Moreover, the proposed gesture set is designed with unique characteristics. These characteristics are important cues and offer promising solutions to address the challenges of a dynamic hand gesture recognition system.
- Real-time spotting of dynamic hand gestures from the input video stream. The proposed spotting technique consists of relevant solutions for hand detection and hand segmentation from consecutive RGB-D images. In the view of a complete system, the spotting technique is considered a preprocessing procedure.
- Performance of a dynamic hand gesture recognition method depends on the gesture's representation and matching phases. This work aims to extract and represent both spatial and temporal features of the gestures. Moreover, the thesis intends to match phases of the gallery and probe sequences using a phase synchronization scheme. The proposed phase synchronization aims to handle variations of gesture speeds and acquisition frame rates. In the experiments, the proposed method is evaluated with various positions, directions, and distances from the human to the Kinect sensor.
- A proposed framework to control home appliances (such as lamp/fan) is deployed. A full hand gesture-based system is built in an indoor scenario (a smart-room). The prototypes of the proposed system for controlling fans and lamps are shown in Fig. 5.1 and Fig. 5.2, respectively. Evaluations of usability with the proposed datasets and experimental evaluations are reported. Datasets are also shared with the community for further evaluations.
Context, constraints, and challenges

Figure 2 shows the context in which an end-user controls home electronic appliances in a living room environment. Nowadays, there are many methods to control home appliances (as illustrated in Fig. 1 (a)). The main difference from existing ones is that the proposed hand gesture recognition system aims to convey the commands of home appliance equipment naturally and conveniently, without any requirement of a remote control, as illustrated in Fig. 1 (b).

Figure 1 Home appliances in smart homes
Figure 2 Controlling home appliances using dynamic hand gestures in a smart house
The proposed system operates with a Kinect sensor. This device is mounted at a fixed position to obtain good system performance as well as to make end-users feel comfortable. To deploy a real application of home appliance control using dynamic hand gestures, the thesis adopts the following constraints for studying dynamic hand gesture recognition (a minimal acquisition sketch follows this list):

- The Kinect sensor:
  – The Kinect sensor is immobile while end-users perform interactions.
  – The Kinect sensor captures RGB and Depth images at a normal frame rate (from 10 to 30 fps) with an image resolution of 640×480 pixels for both image types.
  – The visible area is the area in front of the Kinect sensor in which every object can be viewed by the sensor (limited not only by the distance from the objects to the camera, from 0.8 m to 4 m, but also covered by an angle of 30° around the center axis of the Kinect sensor).
- Furniture and other objects are distributed uniformly in a square room.
- At any instant, it is assumed that only one end-user controls a home appliance using dynamic hand gestures of his/her right hand. If there is more than one subject in the room, the person nearest to the Kinect sensor is considered.
- When an end-user wants to control an electronic appliance, he/she should stand in front of the Kinect sensor, within its visible area, raise one hand toward the Kinect sensor, and perform gestures that have been designed previously.
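As a concrete illustration of the acquisition constraints above, the following sketch (not from the thesis; it assumes a driver that delivers the Kinect depth map as a 16-bit image in millimetres) gates a depth frame to the sensor's reliable working range and localizes the nearest subject:

    import numpy as np

    # Hypothetical 480x640 depth frame in millimetres, e.g. as delivered
    # by an OpenNI-based Kinect v1 driver; 0 marks an invalid reading.
    depth_mm = np.zeros((480, 640), dtype=np.uint16)

    # Keep only pixels inside the reliable working range stated above
    # (0.8 m to 4 m in front of the sensor).
    in_range = (depth_mm >= 800) & (depth_mm <= 4000)

    # Per the single-user constraint, the person nearest to the sensor
    # is considered: the closest valid reading localizes that subject.
    if in_range.any():
        subject_depth_mm = int(depth_mm[in_range].min())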
The above-mentioned scenarios and constraints are intended to cope with the following issues:

- Changing illumination: while the natural lighting source changes within a day (morning, noon, afternoon), artificial lighting sources in a smart-room condition could also vary.
- Complex background: in a practical environment, the background of the scene is complex, with many types of furniture. The background could contain objects whose color appearance is similar to human skin color. Moreover, some objects may appear at the same distance to the Kinect sensor as the hand. Therefore, the task of clearly detecting and segmenting the hand from the background meets challenges.
- Computational time: this consists of the costs of training the end-users and of processing the relevant procedures of a complete system. The proposed gesture-based application requires real-time performance, so it is worth studying and proposing reasonable solutions to address this issue.
- Representing dynamic hand gestures: the gestures consist of non-rigid hand shapes in continuous image sequences. Therefore, to obtain good recognition performance, the gesture's representation should adapt to the variation of hand shape along the temporal dimension.
- Variations of gestures: the end-users (subjects) perform dynamic hand gestures with artifacts such as different speeds/velocities, captured frame rate changes, and various lengths of hand trajectories. Therefore, the proposed dynamic hand gesture system must be designed to adapt to such variations. The thesis mainly addresses these issues with a new phase synchronization technique.
Main contributions

- Contribution 1: A new set of dynamic hand gestures is proposed with specific characteristics that are useful and supportive for deploying a robust hand gesture recognition system. A number of datasets are captured with a large number of end-users. The datasets consist of both RGB and Depth images and are published for the research community studying dynamic hand gestures. In addition, these datasets are used to evaluate the performance of the proposed algorithms.
- Contribution 2: An efficient user-guide scheme is proposed to learn the heuristic parameters, with a trade-off between a real-time system and a user-independent system. This scheme helps to obtain both real-time hand detection and good hand segmentation performance. Then, an efficient gesture spotting method is proposed that utilizes the features extracted from continuously segmented hand regions.
- Contribution 3: An efficient representation for dynamic hand gestures is proposed which combines spatial-temporal features. By using the most significant dimensions of the non-linear reduced space (ISOMAP technique), the spatial features are extracted for dynamic hand gesture representation. The trajectories of hand movements are extracted using the KLT technique. This representation is especially helpful for discriminating the different types of gestures. In addition, to resolve the gesture variation issues, a new phase synchronization is proposed: using a proposed interpolation method in the spatial-temporal space, a new sequence with a pre-determined length is created (see the sketch after this list).
- Contribution 4: A complete system is deployed to control light and fan in a smart-room environment. The system utilizes the proposed algorithms and achieves both high accuracy and real-time performance. In addition, it has undergone evaluations by a large number of end-users in different contexts, such as Techmart Exhibitions (Sept. 2015 and 2016) and technical demonstration sessions (Celebration of HUST's 60th Anniversary, Oct. 2016).
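To make the phase-synchronization idea of Contribution 3 concrete, the sketch below resamples a variable-length gesture sequence to a pre-determined length by linear interpolation. This is a simplified stand-in: the thesis interpolates hand postures in the reduced spatial-temporal (ISOMAP) space, and the target length here is an arbitrary illustrative choice, but the fixed-length resampling principle is the same.

    import numpy as np

    def resample_to_fixed_length(seq, target_len=32):
        """Linearly resample a (T, D) feature sequence to target_len
        frames, so gestures performed at different speeds or captured
        at different frame rates become directly comparable."""
        seq = np.asarray(seq, dtype=float)
        src = np.linspace(0.0, 1.0, num=seq.shape[0])
        dst = np.linspace(0.0, 1.0, num=target_len)
        # Interpolate each feature dimension over a common phase axis.
        return np.stack([np.interp(dst, src, seq[:, d])
                         for d in range(seq.shape[1])], axis=1)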
General framework and thesis outline
[Figure 3 block labels: 5 types of controls; defining dataset; setup of hand gestures DB; hand detection & segmentation; spotting dynamic hand gesture (real-time dynamic hand gesture spotting); spotted dynamic hand gesture; dynamic hand gesture representation; phase synchronization; hand gesture classifier (robust dynamic hand gesture recognition); control of home appliances in a natural way and real environment (application system)]
Figure 3 The proposed framework of the dynamic hand gesture recognition for controlling home appliances
This thesis proposes a unified solution for dynamic hand gesture recognition. The proposed framework consists of three main phases, as illustrated in Fig. 3: (1) hand detection and segmentation from a video stream; (2) spotting dynamic hand gestures; and (3) the recognition schemes.
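A minimal skeleton of this three-phase pipeline is sketched below; the stage names are hypothetical placeholders for the modules developed in Chapters 3 and 4, not an implementation taken from the thesis.

    def recognize_stream(frames, detect_hand, spot_gestures, classify):
        """Compose the three phases of Fig. 3 over an RGB-D frame stream.

        detect_hand(frame)   -> segmented hand region or None   (phase 1)
        spot_gestures(hands) -> iterator of gesture sequences   (phase 2)
        classify(sequence)   -> command label                   (phase 3)
        """
        hands = (detect_hand(f) for f in frames)
        for gesture in spot_gestures(h for h in hands if h is not None):
            yield classify(gesture)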
Utilizing this framework, a real application is also deployed. The application is evaluated in different contexts such as lab-based environments and demonstrations at exhibitions and Tech-mart events. In particular, the research works in this thesis are divided into five chapters as follows:
- Introduction: This chapter describes the main motivations and objectives of the study. It also presents the research context, constraints, and challenges; these factors arise when addressing the relevant problems in the thesis. Additionally, the general proposed framework and the main contributions are presented.
- Chapter 1: This chapter mainly surveys existing complete hand gesture-based control systems. In particular, the related techniques for home appliance electronic equipment are discussed. A series of related techniques consisting of hand detection, segmentation, and recognition is surveyed in this chapter.
- Chapter 2: In this chapter, existing datasets of dynamic hand gestures are first described. Then, the common commands of home appliances are examined. Based on these studies, a new set of hand gestures is proposed. This new set consists of gestures with cyclical hand patterns. Instances of the proposed gesture set are collected in different settings, such as exhibitions and lab-based environments, for further work.
- Chapter 3: This chapter proposes a learning scheme to learn heuristic parameters for hand detection and segmentation. Utilizing the results of the learning scheme, hand detection and segmentation obtain not only robust, real-time operation but also good performance. Given the segmented hands of the continuous sequence, a proposed method for spotting gestures is also presented.
- Chapter 4: This chapter describes the proposed algorithms and experimental evaluations for the dynamic hand gesture recognition system. An efficient representation of the hand gestures based on spatial-temporal features is proposed. To solve critical issues of gesture variations, a phase synchronization enhancing the system's performance is presented. The proposed algorithms are evaluated on several datasets (both collected and public datasets).
- Chapter 5: By utilizing the proposed framework, a complete system is deployed to control lamps/bulbs and fans in indoor environments. A number of volunteers/end-users are invited to interact with the proposed system. The computational costs and end-users' feedback are reported. The application shows the feasibility of the proposed method for deploying a real application.
- Conclusion and Future Works: Conclusions of the work and relevant discussions on the limitations of the proposed method are given in this chapter. Further research directions are proposed for future work.
CHAPTER 1
LITERATURE REVIEW

This chapter presents surveys on the related works of hand gesture-based systems and dynamic hand gesture recognition methods. First, the relevant hand gesture-based applications are presented in Sec. 1.1. Vision-based hand gesture recognition techniques generally consist of three main steps: hand detection and segmentation, dynamic hand gesture spotting, and dynamic hand gesture recognition. Therefore, state-of-the-art works related to these problems are presented in Sec. 1.2, Sec. 1.3, and Sec. 1.4, respectively.
1.1 Completed hand gesture recognition systems for controlling home appliances

Nowadays, dynamic hand gesture-based controlling systems remain an active topic in the field of computer vision because of their wide range of practical applications: sign language, lie detection, games, e-learning, human-robot interaction, and so on. From the viewpoint of specific applications in smart homes which utilize computer vision-based techniques, readers can refer to the surveys [46, 4, 3, 36, 106, 53, 109]. Hand gesture-based home automation has been applied to many types of equipment such as TVs, air-conditioners, fans, lights, and doors. In this section, completed systems that control home appliances through hand gesture commands are briefly surveyed, since such devices are representative recent multimedia devices applying hand motion techniques. Hand gesture-based controlling systems are divided into two categories: GUI device dependent systems and GUI device independent systems. The first category is presented in Sec. 1.1.1, and the latter in Sec. 1.1.2.
1.1.1 GUI device dependent systems
Derived from the actual requirements of hand gesture-based controlling systems, hand recognition systems face many challenges such as high accuracy rates and/or computation time. Many real applications deploy a GUI system with which the end-user can easily interact. In these systems, hands are first detected under some constraints (such as putting the hand in a defined region) and then tracked like mouse pointers on the screen [46, 39, 151, 103].

Mitsubishi Electric TV: [46] proposes a hand gesture-based system to control a TV with four commands, as illustrated in Fig. 1.1. A computer monitor is the intermediate equipment showing the GUI to the end-user. An end-user has to use one open hand to trigger the gesture mode; a hand icon then appears, following the hand. The end-user moves the hand to adjust various graphical controls with the hand icon, and the controlling mode is ended by closing the hand. The normalized correlation is calculated for every offset position of the hand movement in the image. This simple algorithm constrains the angle of the hand movement vector, e.g., 25° for the trigger gesture and 15° during tracking of the hand; these constraints are quite disadvantageous for the end-user. In addition, the interaction time cost includes up to a half-second delay before recognition of the trigger gesture, hand tracking at only about 5 times a second, and a 2-second delay before resuming the search for the Turn On command.
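The normalized-correlation search can be illustrated with OpenCV's template matching. This is only a modern sketch of the idea (the 1995 system predates OpenCV), and the inputs and trigger threshold are assumed placeholders:

    import cv2
    import numpy as np

    frame = np.zeros((480, 640), dtype=np.uint8)    # current grayscale frame
    template = np.zeros((64, 64), dtype=np.uint8)   # stored open-hand template

    # Normalized cross-correlation score at every offset of the template.
    scores = cv2.matchTemplate(frame, template, cv2.TM_CCORR_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(scores)
    if max_val > 0.9:        # assumed trigger threshold
        hand_xy = max_loc    # top-left corner of the best match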
[Figure 1.1 diagram components: graphics display, HP 735 computer monitor, Plex-cam video camera, television, electronically controlled remote control]
Figure 1.1 Mitsubishi hand gesture-based TV [46]
Samsung smart TV: A Samsung smart TV can be controlled by moving one or two hands, as illustrated in Fig. 1.2. One can activate the system by waving a hand in front of the TV for two seconds. When the system is activated, an end-user can select the hand gesture-based controlling mode in order to replace the conventional method. In this case, a pointer appears on the TV and a gesture control command bar appears at the bottom of the TV, so end-users can see the available commands on the device.

Figure 1.2 Samsung-Smart-TV using hand gestures.

A CCD/CMOS camera is placed above or below the TV to capture images. The camera does not capture the entire body of the end-user; it just tracks the locations of the end-user's head and hands. The system requires a distance between the end-user and the device of at least 1.5 m. The cursor on the TV screen is controlled by the location of the hand. The TV uses thirteen gestures to convey commands, as illustrated in Fig. 1.3. This system works well in an ideal living-room lighting situation, but its accuracy is largely affected by the lighting condition as well as by a complex background.
[Figure 1.3 gestures: waving a hand, flip to right, flip to left, grab, long grab, thumb up, CCW rotation, move, grab & move, waving two hands, grab & widening, grab & narrowing, grab & rotating]
Figure 1.3 Dynamic hand gestures used for Samsung-Smart-TV
Even though this gesture-based TV supports up to thirteen dynamic hand gestures conveying many TV commands, and these commands are quite natural and memorable, the system requires a GUI to interface between the end-user and the TV; the detected hand then acts like a computer's mouse pointer. Because the application depends on the GUI interface, it is difficult to use for screen-less home appliances such as a lamp, a fan, and so on. Moreover, the application uses only RGB images, so it is strongly affected by illumination. In addition, end-users have to hold the hand at a distance of about 1.5 m, which is not good for human health.
Soft Remote Control System: Authors in [39] propose a feasible gesture-based system to naturally control various home appliances such as a TV, curtain, and light. Hand gesture commands are designed as a combination of hand postures and hand motions, as illustrated in Fig. 1.4.

[Figure 1.4 panels: hand postures; hand motions]
Figure 1.4 Hand gesture commands in Soft Remote Control System [39]
In order to recognize the end-user's hand pointing gestures in a large space, three RGB cameras (320×240 pixels) are set up as illustrated in Fig. 1.5. This system is quite sensitive to illumination and requires a management screen for end-users to choose the control modes, as described in Table 1.1. The end-user's face is first detected to obtain the skin color; then the hand regions are extracted. Based on the segmented hands in the three RGB images, the 3-D position of the hand is located using conventional stereo-matching techniques. To determine the instant at which to activate the system, the 3-D hand position, hand speed, and face direction are combined. An HMM model is used to recognize the gestures. All features are separated into two categories: static features and dynamic features. Dynamic features, such as a sequence of directional codewords and the pattern of cumulative angle, have temporal variance at each sample point; static features, such as length and the angle similarity between the measured line vector and reference vectors, do not. As reported, the system obtains quite low performance, with an accuracy rate of 80.7%.

Figure 1.5 General framework of the Soft Remote Control System [39]
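The gesture-classification stage of such HMM-based systems is commonly realized as one model per gesture class, with the label chosen by log-likelihood. The sketch below uses the hmmlearn library on synthetic sequences; the feature dimensionality, state count, and data are illustrative assumptions, not values from [39]:

    import numpy as np
    from hmmlearn import hmm

    # Synthetic training data: random 4-D feature sequences per class,
    # standing in for direction-codeword / cumulative-angle features.
    rng = np.random.default_rng(0)
    train = {g: [rng.normal(loc=i, size=(20, 4)) for _ in range(5)]
             for i, g in enumerate(["up", "down", "left", "right"])}

    models = {}
    for label, seqs in train.items():
        X = np.concatenate(seqs)            # stacked observations
        lengths = [len(s) for s in seqs]    # per-sequence lengths
        m = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=20)
        models[label] = m.fit(X, lengths)

    def classify(seq):
        # Pick the gesture model with the highest log-likelihood.
        return max(models, key=lambda g: models[g].score(seq))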
As an improvement over [39], the authors in [143] use two USB pan-tilt cameras to deploy the home appliance controlling system. The system strongly depends on the assumption that a gesture is performed only when the hand moves at high speed. Features extracted from hand motion are used to spot gestures. Differently from the above system, the relationship between face and hand is calculated to determine the controlled appliance, as illustrated in Fig. 1.6: the appliance nearest to the pointed position is taken as the controlled equipment, and the direction and distance from the pointed position to that appliance are calculated. This method is often wrong, because using 2D images to estimate the relation to a point in 3D is not accurate. The system provides feedback that tells the end-user which appliance is nearest to the pointed 3D position, together with the direction and distance used to select it; if the feedback is wrong, the end-user must change the position and direction of the hand. The accuracy rate reaches only 86%.

Table 1.1 Soft remote control system and commands assignment
Menu rotation (clockwise) | Channel up | Close | Off
Menu rotation (counterclockwise) | Channel down
[Figure 1.6 labels: VCR candidate, audio candidate, light candidate, VCR selected]
Figure 1.6 Hand gesture-based home appliances system [143]
Zhengmao Zou et al. [151, 103]: A GUI system is created to control a TV, as illustrated in Fig. 1.7, with OpenCV used for the image processing tasks. A moment-invariants algorithm is proposed to recognize ten dynamic gestures. A simple background is set up so that the hand region can be segmented using skin color features. Then, the moment-invariants algorithm extracts a descriptive feature; the first four moments are adequate to represent a gesture uniquely and hence result in a simple feature vector. Finally, an SVM classifier is applied to recognize the dynamic hand gesture. In addition, a dynamic hand gesture involves four static hand gestures: start, meaning command 1, meaning command 2, and stop, as illustrated in Fig. 1.8. Using four static hand gestures is a tight constraint for both the end-user and the system.
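A compact version of this moments-plus-SVM pipeline can be sketched with OpenCV and scikit-learn. The use of the first four (Hu) moment invariants follows the description above; the log-scaling convention, masks, and labels are hypothetical stand-ins:

    import cv2
    import numpy as np
    from sklearn.svm import SVC

    def moment_features(hand_mask):
        """First four Hu moment invariants of a binary hand mask,
        log-scaled to compress their large dynamic range."""
        hu = cv2.HuMoments(cv2.moments(hand_mask)).flatten()[:4]
        return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)

    # Hypothetical usage on segmented hand masks and gesture labels:
    # X = np.array([moment_features(m) for m in masks])
    # clf = SVC(kernel="rbf").fit(X, labels)
    # prediction = clf.predict([moment_features(new_mask)])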
Figure 1.7 TV controlling with GUI of dynamic gesture recognition [151]

Dynamic gesture recognition also minimizes the problems associated with hand movements in static hand gesture recognition systems. The authors improved on their first attempt, which used a static hand gesture system. The system is capable of rejecting unintentional gestures more easily than a static system because of the start and stop routines associated with dynamic gestures, as illustrated in Fig. 1.8. Although the hand gesture dataset is diverse, with twenty gestures, their method is quite simple and the accuracy rate obtained is only 74.4%.

(a) Start, switch down, stop; (b) Start, volume up, stop
Figure 1.8 Commands of GUI of dynamic gesture recognition [103]
Ramamoorthy et al. [106]: The authors propose a gesture-based digital TV using ultrasonic array sensors. This method uses a GUI interface: when the hand moves in front of the TV to perform basic graphical controls, the hand is detected based on pattern matching; then, hand motion is detected and tracked. Six dynamic hand gestures are recognized to convey commands, consisting of turning on, changing channels, and increasing or decreasing the volume.
Shiguo Lian et al. [73]: The authors propose a dynamic gesture-based TV control with automatic detection of the end-user's state. Three types of sensors are utilized (an ultrasonic distance sensor, an RGB camera, and a depth camera), while only six dynamic hand gestures convey commands to control the TV: swipe left, right, up, down, back, and forward, which is not effective for deploying a real application. Moreover, the system detects the end-user's gaze and tracks his/her face based on viewing direction. The hand's motion is used as an important cue to recognize the gestures, which is not robust; their algorithms to detect and segment the hand, based on the face detection result in RGB cues or on depth cues, also have some limitations.
1.1.2 GUI device independent systems
(a) A trigger motion; (b) Shapes: V-sign / 1-finger; (c) Motion: up, down, left, right
Figure 1.9 Features of the Omron dataset [3]

Omron hand gesture dataset: OMRON Corporation has proposed a hand gesture recognition technology that obtains the position, shape, and motion of the hand or fingers [3]. First, OMRON's core facial image sensing technology is used. Then, the gesture is recognized automatically based on the interrelation between the position or direction of the face and the hand shape or position. A trigger motion starts the recognition process through an initial movement between the face and hand region, as in Fig. 1.9 (a). Simple instructions used in human interfaces are defined, such as up, down, left, and right motions (Fig. 1.9 (b)) and V-sign and/or 1-finger signs (Fig. 1.9 (c)). The distance between the end-user and the TV ranges from around 10 cm to several meters; such distances are quite near and not good for human health. Moreover, after the hand gesture controlling mode is triggered by calculating the distance between face and hand, the dynamic hand gesture is recognized by motion features only, which is a weakness of the recognition system.
Regarding end-users' behavior, the authors in [119] surveyed how end-users interact with a TV by selecting hand shapes and hand motions through interaction with a hand gesture selection system. This hand gesture dataset consists of eight gestures, as illustrated in Table 1.2. Using both static and dynamic hand gestures may face some challenges when deployed in a real application, such as time cost and the combination of algorithms, even if the number of gestures is not too large.

Table 1.2 Omron TV command assignment
Open hand shape | Turn on/off | Power Button
Letter motions | Jump channel directly | Channel Buttons
Right motion | Turn channel | Channel Up
Down motion | Turn up volume | Volume Up
Up motion | Turn down volume | Volume Down
Close hand shape | Mute volume | Mute Button
V-hand shape | Start Gesture | Start Gesture
Figure 1.10 Wi-Fi signals to control home appliances using hand gestures [119]

Wisee-based hand gestures: [9] uses Wi-Fi signals (Fig. 1.10). When the hand waves, the receiver measures the changes in the Wi-Fi signal and interprets commands. A novel segmentation method based on wavelet analysis and STE is proposed. The algorithm intercepts CSI segments and analyzes the variations in CSI caused by hand motions, revealing the unique patterns of gestures. Then, the DTW technique is applied to classify gestures. Seven hand gestures are designed to control a TV, laptop, and air-conditioner (Fig. 1.11), as mapped in Table 1.3. Despite high accuracy (92.1%), the emission source in this system is set up with two WB-2400D300 antennas to increase the power of the Wi-Fi signal. Using a high-power Wi-Fi signal may be harmful to human health, so it is not feasible to apply to real home appliances.
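The DTW matching step is the classic dynamic-programming alignment. A minimal version over 1-D segments (such as CSI amplitude traces) is sketched below, with classification by nearest template; this is the textbook algorithm, not the exact implementation of [9]:

    import numpy as np

    def dtw_distance(a, b):
        """O(len(a)*len(b)) dynamic time warping cost between two 1-D signals."""
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                # Extend the cheapest of the three admissible alignments.
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

    def classify(segment, templates):
        # templates: {gesture_label: reference_signal}
        return min(templates, key=lambda g: dtw_distance(segment, templates[g]))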
[Figure 1.11 gestures: pointing, scroll up, scroll down, swipe leftward, swipe rightward, flick, grab]
Figure 1.11 Seven hand gestures for wireless-based interaction [9] (Wisee dataset)

Table 1.3 Hand gestures utilized for different devices using Wisee technique
Lei Jing et al. [62]: An MR (magic ring) is worn on the thumb to acquire hand gesture trajectories, as illustrated in Fig. 1.12 (b). The proposed system determines the starting and ending points of the hand motion; then a hierarchical classifier based on HMM classifies the dynamic hand gesture motions. The hand trajectories correspond to commands controlling different home appliances such as a lamp, CD radio, and television. Six gesture trajectories are defined, as illustrated in Fig. 1.12 (a), to convey the commands detailed in Table 1.4. However, wearing a ring to control home appliances makes end-users feel uncomfortable, unnatural, and unfriendly, while the accuracy rate reaches only 85%.
trajec-Bretzner et al [23]: Hand postures are represented in terms of hierarchies of scale color image features at different scales, with qualitative inter-relations in terms
multi-of scale, position and orientation Detection multi-of multi-scale color features is performed.Hand states are then simultaneously detected and tracked using particle filtering, with
an extension of layered sampling referred to as hierarchical layered sampling These
Trang 35(a) The six gestures dataset (b) Control home appliances
Figure 1.12 Simulation of using MR to control some home appliances [62].Table 1.4 Hand gestures utilized for different devices using MR technique
Right Rotate Volume up Swipe files rightward Volume up
Max Up Appliance list(+) Appliance list(+) Appliance list(+)Max Down Appliance list(-) Appliance list(-) Appliance list(-)
components have been integrated into a real-time prototype system, applied to a testproblem of controlling consumer electronics using hand gestures
PointGrab company [33]: At the Consumer Electronics Show in 2014, PointGrab showed technology for controlling devices and home appliances by pointing the finger. Depth cues are used to create a "transparent space" in front of the end-user, offering touch-like operation albeit from a distance. End-users can control home appliances such as a TV, air-conditioner, lamp, door, and so on. An embedded chip processes the gesture algorithms and calibrates the position of the end-user's eyes with the position and direction of the finger; for example, a finger pointing forward at a lamp turns the dimmer up or down. The end-user can control from up to 4.5 m away and under all lighting conditions. However, this system supports only two commands, pointing the finger to turn a home appliance on or off. In addition, using depth cues only may face some algorithmic limitations, and the company did not publish its method.
There are many systems that use hand gestures to control home appliances, as shown in Table 1.5; they can be separated into the following two groups: dependent and independent of a GUI on the device. The first category requires a GUI to interface between the end-user and the equipment; these systems therefore need a screen.

Figure 1.13 AirTouch-based control uses depth cue [33].
Table 1.5 The existing in-air gesture-based systems

ID | Reference | Equipment | Vision-based | Other in-air technique
1 | Mitsubishi, 1995 [46] | TV | x |
2 | Jun-hyeong et al., 2002 [39] | TV, Lamp, Curtain | x |
3 | Lars Bretzner et al., 2002 [23] | TV | x |
4 | Yang et al., 2006 [143] | TV, Fan, Lamp, Music | x |
5 | Asanterabi Malima et al., 2006 [81] | Robot | x |
6 | Lee-Ferng et al., 2009 [72] | Robot | x |
7 | Xu Zhang et al., 2009 [149] | Game controller | x |
8 | Sigalas et al., 2010 [122] | Robot | x |
9 | Solanki et al., 2011 [126] | TV, Stereo in car | x |
10 | NATOPS, 2011 [140] | Air-plane | x |
11 | Omron, 2012 [3] | TV | x |
12 | Jing et al., 2012 [62] | Lamp, CD Radio, and TV | | Magic-ring
13 | Samsung, 2013 [4] | TV | x |
14 | PointGrab company, 2013 [33] | TV, Lamp, Air-Conditioner, Fan | x |
15 | Microsoft Xbox-Kinect, 2013 [2] | Game controller | x |
16 | Lian [73] | TV | x |
17 | Ramamoorthy et al., 2015 [106] | TV | x |
18 | Prajapat et al., 2015 [117] | Robot | x |
19 | NVIDIA, 2016 [87] | Car | x |
20 | Aqaness et al., 2016 [9] | TV, Laptop/Smart-device, Air-conditioner | | Wisee
21 | Grif et al., 2016 [51] | Laptop | x |
22 | Biju et al., 2016 [18] | TV, Stereo | x |
23 | Azilen company, 2017 [102] | Lamp, Air-conditioner, Security, Washer, Ovens or Refrigerator, TV | x | Wisee
Such a screen is not suitable for most non-screen home appliances such as fans, lamps, and so on. For the second category, some proposed methods use other techniques, such as WiFi-based sensing [9], or limit the number of hand gestures, e.g., to turn-on/off commands [39].
1.2 Hand detection and segmentation

This step aims to determine the hand region, then to cut, prune, and keep only the hand region in the images. Because the human hand has a large number of DoFs, a huge variety of 2D appearances arises depending on the camera viewpoint, scale, resolution, illumination, or even the characteristics of the sensor type (depth and/or color images). Therefore, accurate detection of hands in images or video remains a challenging problem. To deploy a real system, this step requires not only accuracy but also low time cost. There have been countless proposed methods for hand detection and segmentation, using the following main features: color [27, 115, 111, 132], shape [57, 110, 91], motion [10, 57], and depth [111, 101].
1.2.1 Color
Skin color-based segmentation has been applied in several approaches. Some of them use the values of individual pixels, often detecting hand pixels based on skin color; other approaches utilize the relationship between pixels or regions [151, 103, 73]. In [23], the authors propose an approach to detect features based on color: first, significant blob and ridge features are extracted from an image of a hand; then, based on color cues, the remaining part of the image is extracted as a hand region candidate. Yeong et al. [39] detect a face in order to create a skin model. A 2D histogram of the pixels of the detected face area is computed on the U and V channels of the YUV color space. This histogram is used to convert moving-region pixels in the current input image to a corresponding probability-of-skin-color image, so this work is less affected by different lighting conditions.
To overcome the limitation of RGB due to its sensitivity to lighting conditions, Yang et al. [143] convert the RGB image to YCrCb and discard the Y component, which contains the luminance information. After splitting into Cr (red chrominance) and Cb (blue chrominance), they detect skin blobs using a thresholding technique. During the procedure, a closing operation is adopted to remove fragments of the blobs. However, this method is not effective if the background color is the same as the skin color.
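The Cr/Cb thresholding step can be written in a few OpenCV lines. The chrominance bounds below are commonly quoted skin ranges, not values specified for [143]:

    import cv2
    import numpy as np

    frame = np.zeros((480, 640, 3), dtype=np.uint8)   # placeholder BGR frame

    ycrcb = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)
    # Discard Y (luminance) by thresholding only the Cr/Cb chrominance planes.
    skin = cv2.inRange(ycrcb,
                       np.array([0, 133, 77], dtype=np.uint8),     # Y, Cr, Cb lower
                       np.array([255, 173, 127], dtype=np.uint8))  # Y, Cr, Cb upper

    # Morphological closing removes small fragments inside the skin blobs.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    skin = cv2.morphologyEx(skin, cv2.MORPH_CLOSE, kernel)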
In [145], the RGB color model is affected by both color and brightness cues, which are influenced by the background illumination and the environment; therefore, the authors convert the RGB image to the HSV color space. [132] proposes a method to detect skin regions using a color segmentation algorithm that is robust to the lighting condition. The proposed algorithm uses a neural network as a color normalization step; however, this step takes much time because each pixel is passed through the neural network. Moreover, using color to detect the hand requires some constraints, such as: (1) the background does not contain objects with color similar to skin; (2) illumination changes are limited; (3) the background is simple. For a complex background, skin segmentation is combined with a sliding-window technique to reduce the search space of hand posture recognition, which requires a complex algorithm and high computation time and is not effective if the background color is similar to skin color.

Hand segmentation based on skin color is still quite challenging in real applications. It requires some constraints to obtain high performance, such as a simple background, invariant illumination, and a difference in color between hand and background. To improve the accuracy of hand detection, other cues or methods are combined with skin color, or multiple features are combined, such as topography, shape, edge, the motion of the hand, and so on.
ap-1.2.2 Shape
The shape feature has been utilized to detect the hand in images. The shape feature is often obtained by extracting contours and edges. Normally, hand shape-based methods first extract all contours in the image; then, based on the characteristics of the hand shape, hand regions are detected [10, 37, 23, 127].

To achieve high performance, many methods require sophisticated algorithms. For instance, [99] constructs multiple layers of the texture, color, and shape of the hand in order to extract hand contours precisely. This method obtains high accuracy but requires up to 2.65 s to process one image of resolution 160×120.

In [35], after detecting the face region using a Viola-Jones detector, the hand region is searched using skin detection and a hand contour comparison algorithm based on hand posture templates. In [10], the authors first use three complementary detectors to propose hand candidates. Each candidate is then scored by the three detectors independently; next, a classifier computes a final confidence score for the proposals using these features. Although this method obtains higher performance, it requires up to 2 minutes for one image.

In [16], a descriptor is computed to measure the similarity between shapes. The shape context descriptor characterizes a particular point location on the shape: it is the histogram of the relative polar coordinates of all other points. Detection is based on the assumption that corresponding points on two different shapes will ideally have similar shape contexts.
In [32], a corona-effect smoothing and border extraction algorithm is used to find the contour of the hand. Then, the FD is used to describe the hand shapes. The FD is defined as the power spectrum of the discrete Fourier transform of the boundary points, which are represented by complex numbers. The Hausdorff distance measure is used to track shape-variant hand motion.
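A minimal Fourier-descriptor computation in this spirit is sketched below: boundary points become complex numbers, and the magnitudes of the low-order DFT coefficients give a start-point- and rotation-invariant shape signature. The normalization details are a common convention, not necessarily those of [32]:

    import numpy as np

    def fourier_descriptor(contour, k=16):
        """Fourier descriptor of a closed contour given as an (N, 2)
        array of boundary points: keep the magnitudes of the first k
        non-DC coefficients, normalized for scale."""
        z = contour[:, 0] + 1j * contour[:, 1]   # points as complex numbers
        coeffs = np.fft.fft(z)
        mags = np.abs(coeffs[1:k + 1])           # drop the DC (translation) term
        return mags / (mags[0] + 1e-12)          # divide out scale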
In [127], the blobs of hands are extracted by thresholding the luminance of the captured images. Then, chain-code representations of the blob contours are used to compute the fingertip positions with a linear-time algorithm. Finally, the 3D positions of the fingertips are reconstructed by triangulation and subsequently smoothed using Kalman filters. This system aims to reduce processing complexity by limiting sophisticated operations to geometric primitives only. The time cost of this segmentation method accounts for more than 90% of the system.

[31] supposes that a hand-forearm region (including a hand and part of a forearm) has different brightness from other skin-colored regions; therefore, the hand-forearm region is segmented by the brightness difference, and the hand region is extracted from it by detecting a feature point on the wrist. This method is difficult to deploy in a real application because it cannot operate when the hand-forearm region is not detected.

The hand is a non-rigid object with variations of hand shape, scale, and color, while shape-based hand detection methods rely on the contours of the hand. Therefore, using hand shapes for detection is quite challenging.
1.2.3 Motion
In [57], shape-variant hand motion is tracked using the Hausdorff distance. Next, hand figures are characterized by the scale- and rotation-invariant Fourier descriptor. Then, a 3D modified HNN using graph matching recognizes dynamic hand gestures. Fifteen hand gestures are recognized, achieving above a 91% recognition rate. However, this method requires a high time cost (up to 10 s), enlarges the library of stored models, and depends on thresholding parameters.
In [84], a supervisor selects and activates visual processes. Within a process, a confidence factor is provided to dynamically reconfigure the system in response to events in the scene. Hand tracking is performed using image differencing and normalized histogram matching. The result of hand detection is used by a recursive estimator (Kalman filter) to provide an estimate of the position and size of the hand. Moreover, the throughput of this system is only 5 fps.
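A constant-velocity Kalman filter over the hand's position and size, mirroring the recursive estimator described above, can be set up with OpenCV as follows; the state layout and noise levels are illustrative choices, not values from [84]:

    import cv2
    import numpy as np

    # State (x, y, w, vx, vy, vw): hand position, size, and their velocities;
    # the measurement (x, y, w) comes from the hand detector.
    kf = cv2.KalmanFilter(6, 3)
    kf.transitionMatrix = np.eye(6, dtype=np.float32)
    for i in range(3):
        kf.transitionMatrix[i, i + 3] = 1.0      # x += vx per frame, etc.
    kf.measurementMatrix = np.eye(3, 6, dtype=np.float32)
    kf.processNoiseCov = 1e-3 * np.eye(6, dtype=np.float32)
    kf.measurementNoiseCov = 1e-1 * np.eye(3, dtype=np.float32)

    predicted = kf.predict()                     # a-priori position/size estimate
    kf.correct(np.array([[120.0], [200.0], [40.0]], dtype=np.float32))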
con-1.2.4 Depth
To speed up detection as well as to avoid illumination and skin color changes, [111] used depth cues captured by a Kinect sensor with two main assumptions: the hand is the front-most object facing the sensor, and the end-user must wear a black belt on the gesturing hand's wrist. The hands are detected by thresholding the depth map to determine hand regions. Then, RANSAC is applied to locate the position of the black belt so as to precisely localize and segment the hand region. Following the same idea, [141] proposes to detect the hand from the human skeleton; they also refine the hand region by a threshold on the depth data and black wristband detection. Obviously, the current approaches favor the use of depth data and use color only as a complement.
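In its simplest form, the front-most-object assumption of [111] reduces to keeping a thin depth band behind the nearest valid pixel; a sketch with an assumed band width follows:

    import numpy as np

    def frontmost_mask(depth_mm, band_mm=150):
        """Binary mask of the front-most object in a depth frame (mm),
        assuming, as in [111], that the gesturing hand is the object
        closest to the sensor; band_mm is an assumed hand thickness."""
        valid = depth_mm > 0                     # 0 encodes missing depth
        nearest = depth_mm[valid].min()
        return valid & (depth_mm <= nearest + band_mm)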
[17] extracts the skeleton of hands and applies an HMM for classifying various hand gestures. [40] introduces a method to estimate the 3D hand pose and hand configuration; then, a template matching technique is used for the recognition procedure. [42] proposes a method which uses RGB and Depth images for hand detection.
[101] uses a threshold to cut the hand region from depth images, but depth thresholding does not actually produce a perfect segmentation. [26] proposes a method relying on depth information obtained from a stereoscopic video system. The head of the human is first detected from the stereoscopic data and its depth is obtained; from the depth of the head, a threshold is used to cut the hand region. This method runs in real time, but the accuracy of the hand region is not good. Moreover, the accuracy of hand detection is affected by the result of head detection, and it is difficult to define the threshold exactly.
Figure 1.14 Depth threshold cues and face skin [97]
Park et al. [97] propose a method to detect the hand as illustrated in Fig. 1.14. A depth threshold is used to detect the hand region. Then, the face is detected to obtain a skin model. Finally, the hand is segmented based on the skin model. [97] improves the quality of the segmented hand region, but the hand detection result depends on the result of face detection. Moreover, the skin color of the face differs from the hand's skin because the face contains unwanted objects such as eyes, eyebrows, mouth, and so on.
In [148], hand detection is based on a skin color model and background subtraction from depth images. A threshold on the skin color model is selected and combined with background subtraction. Stereo camera calibration and disparity mapping are used to extract the hand from the depth information. In [60], the hand region is cropped on the basis