DIFFICULT SITUATIONS RECOGNITION SYSTEM FOR
VISUALLY-IMPAIRED AID USING A MOBILE KINECT

MASTER OF SCIENCE THESIS IN COMPUTER SCIENCE
Ha Noi – 2016
MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

Hoang Van Nam
DIFFICULT SITUATIONS RECOGNITION SYSTEM FOR
VISUALLY-IMPAIRED AID USING A MOBILE KINECT
Department: COMPUTER SCIENCE
Independence – Freedom – Happiness

CONFIRMATION OF MASTER THESIS REVISION

Full name of the thesis author: ………
Thesis title: ………
Major: ………
Student ID: ………

The author, the scientific supervisor and the thesis examination committee confirm that the author has revised and supplemented the thesis according to the minutes of the committee meeting dated ……… with the following contents:
………

CHAIRMAN OF THE EXAMINATION COMMITTEE
Declaration of Authorship

I, Hoang Van Nam, declare that this thesis titled, 'Difficult situations recognition for visually-impaired aid using a mobile Kinect', and the work presented in it are my own. I confirm that:

• This work was done wholly or mainly while in candidature for a research degree at this University.
• Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.
• Where I have consulted the published work of others, this is always clearly attributed.
• Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.
• I have acknowledged all main sources of help.
• Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.

Signed:

Date:
International Research Institute MICA
Computer Vision Department

Master of Science

Difficult situations recognition for visually-impaired aid using a mobile Kinect

by Hoang Van Nam

Abstract
By 2014, according to figures from some organizations, there are more than one million people in Vietnam living with sight loss, about 1.3% of the Vietnamese population. Despite the big impact of visual impairment on daily living, especially on the ability to move, read and communicate with others, only a small percentage of blind or visually impaired people live with an assistive device or a guide animal such as a guide dog. Motivated by the significant changes in technology that have taken place in the last decade, especially the introduction of various types of sensors as well as the development of the field of computer vision, I present in this thesis a difficult situations recognition system for visually-impaired aid using a mobile Kinect. The system is based on data captured from the Kinect and uses computer vision techniques to detect obstacles. In the current prototype, I only focus on detecting obstacles in indoor environments such as public buildings, and two types of obstacle are exploited: general obstacles in the moving way, and staircases, which pose a great danger to visually impaired people. 3D imaging techniques, including plane segmentation and 3D point clustering, are used to detect general obstacles, and a mixed strategy between the depth and color images is used to detect staircases based on detecting the stair edges and their structure. The system is very reliable, with a detection rate of about 82.9% and a processing time of 493 ms per frame.
Acknowledgements

I am so honored to be here for the second time, in one of the finest universities in Vietnam, to write these grateful words to the people who have been supporting and guiding me from the very first moment when I was a university student until now, when I am writing my master thesis.

I am grateful to my supervisor, Dr. Le Thi Lan, whose expertise, understanding, generous guidance and support made it possible for me to work on a topic that was of great interest to me. It was a pleasure to work with her.

Special thanks to Dr. Tran Thi Thanh Hai, Dr. Vu Hai and Dr. Nguyen Thi Thuy (VNUA) and all of the members of the Computer Vision Department, MICA Institute, for their sharp comments and guidance on my work, which helped me a lot in learning how to study and do research in the right way, and also for the valuable advice and encouragement that they gave me during my thesis.

I would like to express my gratitude to Prof. Veelaert Peter, Dr. Luong Quang Hiep and Mr. Michiel Vlaminck at Ghent University, Belgium, for their support. It has been a great honor to cooperate and work with them.

Finally, I would especially like to thank my family and friends for the continuous love and support they have given me throughout my life, helping me pass through all the frustration, struggle and confusion. Thanks for everything that helped me get to this day.

Hanoi, 19/02/2016
Hoang Van Nam
Contents

Declaration of Authorship

1 Introduction
1.1 Motivation
1.2 Definition
1.2.1 Assistive systems for visually impaired people
1.2.2 Difficult situations
1.2.3 Mobile Kinect
1.2.4 Environment Context
1.3 Difficult Situations Recognition System
1.4 Thesis Contributions

2 Related Works
2.1 Assistive systems for visually impaired people
2.2 RGB-D based assistive systems for visually impaired people
2.3 Stair Detection

3 Obstacle Detection
3.1 Overview
3.2 Data Acquisition
3.3 Point Cloud Registration
3.4 Plane Segmentation
3.5 Ground & Wall Plane Detection
3.6 Obstacle Detection
3.7 Stair Detection
3.7.1 Stair definition
3.7.2 Color-based stair detection
3.7.3 Depth-based stair detection
3.7.4 Result fusion
3.8 Obstacle information representation

4 Experiments
4.1 Dataset
4.2 Difficult situation recognition evaluation
4.2.1 Obstacle detection evaluation
4.2.2 Stair detection evaluation

5 Conclusions and Future Works
5.1 Conclusions
5.2 Future Works
List of Figures

1.1 A Comprehensive Assistive Technology (CAT) Model provided by [12]
1.2 A model for activities attribute and mobility provided by [12]
1.3 Distribution of frequencies of head-level accidents for blind people [18]
1.4 Distribution of frequencies of tripping resulting in a fall [18]
1.5 A typical example of depth image. (A) Raw depth image, (B) depth image visualized by jet color map, with the colorbar showing the real distance for each color value, (C) reconstructed 3D scene
1.6 A stereo image pair taken from the OpenCV library and the calculated depth image. (A) Left image, (B) right image, (C) depth image (disparity map)
1.7 Some existing stereo cameras. From left to right: Kodak stereo camera, View-Master Personal stereo camera, ZED, Duo 3D Sensor
1.8 Time of flight systems from [3]
1.9 Some ToF cameras. From left to right: DepthSense, Fotonic, Microsoft Kinect v2
1.10 Structured light cameras. From left to right: PrimeSense, Microsoft Kinect v1
1.11 Structured light systems from [3]
1.12 Figure from [16]. (A) Raw IR image with pattern, (B) depth image
1.13 Figure from [16]. (A) Errors for structured light cameras, (B) quantization errors at different distances of a door: 1 m, 3 m, 5 m
1.14 Prototype of system using mobile Kinect. (A) Kinect with battery and belt, (B) backpack with laptop, (C) mobile Kinect mounted on human body
1.15 Two different environments tested. (A) Our office building, (B) Nguyen Dinh Chieu secondary school
1.16 Prototype of our obstacle detection and warning system
2.1 Robot-assisted navigation from [17]. (A) RFID tag, (B) robot, (C) navigation
2.2 NXT Robot system from [6]. (A) The system's block diagram, (B) NXT robot
2.3 Mobile robot from [22], [21]
2.4 BrainPort vision substitution device [32]
2.5 Obstacle detection process from [30]
2.6 Stair detection from [26]. (A) Input image, (B)(C) frequency as an output of the Gabor filter, (D) stair detection result
2.7 A near-approach for stair detection in [13]. (A) Input image with detected stair region, (B) texture energy, (C) input image with detected lines as stair candidates, (D) optical flow maps; there is a significant change on the lines at the edge of the stair
2.8 Example of segmentation and classification in [24]
2.9 Stair modeling (left) and features in each plane [24]
2.10 Stair detection algorithm proposed in [29]. (A) Detected lines in the edge image (using color information), (B) depth profiles on each line (red line: pedestrian crosswalk, blue: downstairs, green: upstairs)
3.1 Obstacle detection flowchart
3.2 Kinect mounted on body
3.3 Coordinate transformation process
3.4 Kinect coordinate
3.5 Point cloud rotation using the normal vector of the ground plane (white arrow): left: before rotating, right: after rotating
3.6 Normal vector estimation algorithms [15]. (a) The normal vector of the center point can be calculated by a cross product of two vectors of four neighbor points (red), (b) normal vector estimation in a scene
3.7 Plane segmentation result using the algorithm proposed in [15]. Each plane is represented by a distinctive color
3.8 Detected ground and wall planes (ground: blue, wall: red)
3.9 Human segmentation data by Microsoft Kinect SDK. (a) Color image, (b) human mask
3.10 Detected obstacles. (a) Color image, (b) detected obstacles
3.11 Model of stair
3.12 Coordinate transformation models from [7]
3.13 Projective chirping: a) a real world object that generates a projection with "chirping" - "periodicity-in-perspective", b) center raster of image, c) best fit projective chirp
3.14 A pin-hole camera model with stair
3.15 A vertical Gabor filter kernel
3.16 Gabor filter applied on a color image. (a) Original, (b) filtered image
3.17 Thresholding the grayscale image. (a) Original, (b) thresholded image
3.18 Example of image thinning using morphological operations
3.19 Thresholding the grayscale image. (a) Original, (b) thresholded image
3.20 Six points voting for a line make an intersection in Hough space; this intersection has higher intensity than neighboring pixels
3.21 Hough space. (a) Line in the original space, (b) three curves voting for this line in Hough space
3.22 Hough space on a stair image. (a) Original image, (b) Hough space
3.23 Chirp pattern detection. (a) Hough space, (b) original image with detected chirp pattern
3.24 Point cloud of stair. (a) Original color image, (b) point cloud data created from color and depth images
3.25 Detected steps
3.26 Detected planes
3.27 Detected stair on point cloud
3.28 Obstacle position quantization for sending warning messages to visually impaired people
4.1 Depth image encoding. (A) Original, (B) visualized image, (C) encoded image
4.2 Detection time of each step in our proposed method
4.3 Example stair images for evaluation. (A) Positive sample from MICA dataset, (B) negative sample from MICA dataset, (C) positive sample from MONASH dataset, (D) negative sample from MONASH dataset
4.4 Detected stair in Tian's based method (A-F) and detected stair in my proposed method (G-I). (A) Color image, (B) depth image, (C) edges, (D) line segments, (E) detected concurrent lines, (F) depth values on detected lines, (G) detected stair with blue lines as false stair edges and green lines as stair edges, (H) edge image, (I) detected peaks in Hough map corresponding to lines in Figure G
4.5 Missed detection in Tian's based method because of missing depth on stair (A-F) and detected stair in my proposed method (G-I)
4.6 Missed detection in Tian's based method because of missing depth on stair (A-F) and detected stair in my proposed method (G-I)
List of Tables

2.1 Comparison between assistive robot and wearable device
4.1 Database specifications
4.2 Pixel level evaluation result (TP, FP, FN: million pixels)
4.3 Object level evaluation result (TP, FP, FN: objects)
4.4 Stair dataset for evaluation
4.5 Stair detection result of the proposed method on different datasets
4.6 Comparison of the proposed method and the method of Tian et al. [29] on MICA dataset
Abbreviations

PCL Point Cloud Library
CAT Comprehensive Assistive Technology
TDU Tongue Display Unit
IR Infrared
OpenCV Open Source Computer Vision Library
RGB Red Green Blue
RGB-D Red Green Blue and Depth
ToF Time of Flight
Introduction

1.1 Motivation

According to the official statistics of the National Eye Hospital in 2002, Vietnam has about 900,000 blind people, including about 400,000 who are totally blind. By 2014, according to figures from some organizations, the number of blind people in Vietnam is about 1.2 to 1.4 million people, still a large number in comparison with other countries. Worldwide, the visually impaired population is estimated to number in excess of 285 million according to the investigation of the World Health Organization (August 2014)1. About 90% of them live in developing countries with low-income settings. Visual impairment has a big impact on their daily living. In particular, they cannot read documents, and the ability to move and to communicate with other people is compromised because information is received primarily through vision. All of the above has led blindness to become a public health problem all over the world.

Nowadays, with the significant developments in technology, lots of assistive devices have been released in order to help visually impaired people in daily life. But although many researchers and companies are concerned with making better and cheaper devices to improve the comfort of visually impaired people, research in this field still leaves many issues unsolved and, in general, those devices still cannot replace traditional methods such as the white cane or the guide dog.

Motivated by the significant changes in technology that have taken place in the last decade, especially the introduction of various types of sensors as well as the development of the field of computer vision, my thesis aims to build a prototype system to help visually impaired people avoid obstacles in the environment using a Kinect sensor.

1 http://www.who.int/mediacentre/factsheets/fs282/en/
With the Kinect, the benefit is that we can make a reliable system by using depth and color information to detect obstacles at an affordable price. In my thesis, due to the lack of time, I only focus on the indoor environment, more specifically on public buildings such as apartments or offices, in order to detect general objects encountered on the moving way and stairs, which may cause danger to visually impaired people.

My thesis is organized as follows:

First, I shall give some definitions in the context of my work and the contributions of this thesis.

In chapter 2, I shall briefly review some other works related to my system, such as existing assistive devices and obstacle detection algorithms/systems, with their advantages and disadvantages.

In chapter 3, a framework for obstacle detection will be developed and I shall present the details of each module as well as the entire system, analyzing and assessing them.

In the next chapter, I shall give some experimental results of my system, including how the dataset was prepared, how the evaluation was made and the final results.

In the final chapter, I end this work by giving some conclusions and future works to make the system more complete and effective.
1.2 Definition
1.2.1 Assistive systems for visually impaired people
According to [12], assistive systems for visually impaired people can be understood as equipment, devices or systems which can be used to overcome the gap between what a disabled person wants to do and what society allows them to do. In short, such a system must be able to help visually impaired people to do the things that normal people can do. This system can be modeled by the Comprehensive Assistive Technology (CAT) model as shown in Fig 1.1. The top level of this model consists of four components that can be used to define all assistive technology systems:

• Context (in which the assistive technology will be used)
• Person (what kind of user can use this system)
• Activities (what activities the assistive system can help the visually impaired people with, seen more clearly in Fig 1.2)
• Assistive Technology (technology that will be used to make a system)

Most of the existing systems are aimed at solving one specific aspect of each branch in the model: they work in a bounded, defined context, with certain types of users, to help them in specific activities in daily life. In the framework of my master thesis, to simplify the system, I just focused on certain aspects of this model, which I will explain in detail in the next sections. In short, I applied my system to the local settings of context, in a small public building such as an office or department, and the users are the visually impaired students at the Nguyen Dinh Chieu Secondary school, to help them avoid obstacles in the moving way.
Figure 1.1: A Comprehensive Assistive Technology (CAT) Model provided by [12]
1.2.2 Difficult situations
Fig 1.2 shows detailed information of the activities branch in the CAT model (see Fig 1.1). As shown in the figure, there are a lot of services that can be covered by assistive systems for visually impaired people, such as mobility, daily living, cognitive activities, education and employment, recreational activities, communication and access to information. But most existing works focus on the mobility component of the activities model because of its important role in visually impaired people's daily life.
Figure 1.2: A model for activities attribute and mobility provided by [12]
According to the survey of R. Manduchi [18] in 2011 with 300 respondents who are legally blind or blind, half of the respondents said that they had a head-level accident at least once a week, and about 30% of respondents fell down at least once a month (see Fig 1.3 and Fig 1.4). Therefore, helping visually impaired people in the moving process has always been an interesting topic for researchers, social organizations and companies. In fact, many products have been released, some with particular success, like the systems proposed in [11], [10], [1] and [4].
Figure 1.3: Distribution of frequencies of head-level accidents for blind people [18]
Figure 1.4: Distribution of frequencies of tripping resulting in a fall [18]
In the context of my thesis, I aim to develop a system which can detect the obstacles in visually impaired people's moving way, which are the main cause of the accidents mentioned above. The scenario in this project is that visually impaired people want to move along the hallway inside a public building, so they need to avoid obstacles, including moving or static objects, and to go up/down stairs. An obstacle in my case can be defined as an object lying on the ground or in front of the visually impaired person that he/she could be harmed by while moving if it is encountered. Although the obstacle's class is very important for visually impaired people to distinguish which is more dangerous and which is not, in my work I just try to detect obstacles in the scene without naming them (making a classification). Within the framework of this thesis, I also focus on detecting another special object that often appears in buildings and is very dangerous for visually impaired people: the stair. Moreover, the proposed system will only give a warning to the blind people using a Tongue Display Unit (TDU), which was already developed by Thanh-Huong Nguyen in 2013 [23]. In brief, my proposed system aims to solve two aspects of the mobility component of the activities model (see Fig 1.2): obstacle avoidance and movement on ramps, slopes, stairs & hills; for the second aspect, the current system just stops at the level of giving the warning distance of stairs to the visually impaired people in order to assist them in going up/down stairs.
1.2.3 Mobile Kinect
1. Introduction

To assist visually impaired persons in those difficult situations, in my thesis, I propose using a Kinect sensor to capture the information of the environment in order to detect obstacles if they appear. There are a lot of advantages to using the Kinect in this system since it is a popular RGB-D camera with a cheap price. But firstly, I will give some brief information about depth cameras, of which the Kinect is a typical example.
A depth camera is actually a sensor which has the capacity to provide depth information (a depth image or depth map). A depth map is an image that contains information relating to the distance of the surfaces of scene objects from a viewpoint, as in the example in Fig 1.5. The intensity value of each pixel in a depth map represents the distance from a point on the object to the camera. Therefore, 3D information of the scene can be reconstructed from the depth image (as shown in Fig 1.5-C). A benefit of the depth image is that it is not affected by lighting conditions.
Figure 1.5: A typical example of depth image. (A) Raw depth image, (B) depth image visualized by jet color map, with the colorbar showing the real distance for each color value, (C) reconstructed 3D scene
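For concreteness, under a standard pinhole camera model (an assumption stated here for illustration; the intrinsic parameters are not given at this point in the thesis), the depth value Z at pixel (u, v) is back-projected to a 3D point using focal lengths f_x, f_y and principal point (c_x, c_y):

\[ X = \frac{(u - c_x)\,Z}{f_x}, \qquad Y = \frac{(v - c_y)\,Z}{f_y}, \]

which is how a reconstruction such as Fig 1.5-C is obtained from the depth map.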
In recent years, with the development of technology, especially in the sensor fabrication industry, a lot of cameras capable of capturing depth information have been placed on the market. These devices can be separated into several groups by the technology used: stereo cameras such as the ZED, Time-of-Flight (ToF) cameras like the ZCam, structured light cameras like the Kinect, and long range 3D cameras. Each device has its own advantages and disadvantages and is only suitable for a particular use case.
2. Stereo Camera

The stereo camera is a kind of camera that has been used in robotics since its early days. Taking the idea of human binocular vision, it contains two or more cameras with precisely known relative offsets. Depth information can be calculated by matching similar points in the overlapped region between images. Hence, the 3D distance to matching points can be determined using triangulation, as illustrated in Fig 1.6. However, the camera used in this case is still a color camera. As a result, it is still affected by changing lighting conditions. On the other hand, the depth image is calculated by matching algorithms, so it works very poorly when the scene is texture-less, for example images of walls or buildings. There are many stereo cameras available on the market due to the ease of making them, such as the Kodak stereo camera, View-Master Personal stereo camera, ZED2, Duo 3D Sensor3, as illustrated in Fig 1.7.
Figure 1.6: A stereo image pair taken from the OpenCV library and the calculated depth image. (A) Left image, (B) right image, (C) depth image (disparity map)

Figure 1.7: Some existing stereo cameras. From left to right: Kodak stereo camera, View-Master Personal stereo camera, ZED, Duo 3D Sensor
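As a minimal sketch of this principle (not the pipeline used in this thesis), a disparity map like the one in Fig 1.6 can be computed with OpenCV's block matcher, and depth recovered from disparity d via Z = f·B/d; the file names and calibration values below are placeholders:

```python
import cv2

# Load a rectified stereo pair (placeholder file names).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Block matching: compare small windows along the same scan line.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype("float32") / 16.0  # fixed-point output

# Triangulation: Z = f * B / d (assumed calibration values).
f_px = 700.0        # focal length in pixels (placeholder)
baseline_m = 0.12   # camera separation in meters (placeholder)
valid = disparity > 0
depth_m = f_px * baseline_m / disparity[valid]
```

Block matching is only one of several matching strategies; semi-global matching (cv2.StereoSGBM) trades speed for robustness on weakly textured scenes, which relates to the texture-less limitation noted above.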
3. Time of Flight (ToF) camera

Time-of-Flight (ToF) cameras use the same principle as laser radar, except that instead of transmitting a single beam, short pulses of infrared (IR) light are sent. The camera gets the return time at pixels across its field of view, and the distance is measured by comparing the phase of the modulated return pulses with those emitted by the laser (Fig 1.8). But ToF cameras also suffer from similar limitations as time-of-flight sensors in general, including ambiguity of measurements, multiple reflections, sensitivity to material reflectance and background lighting, and they do not operate well outdoors in strong sunlight. Some popular ToF cameras are DepthSense4, Fotonic5, and Microsoft Kinect v2 (see Fig 1.9).
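For continuous-wave ToF cameras of this kind, the distance follows from the measured phase shift φ between the emitted and returned modulated signals, a standard relation given here for completeness rather than taken from [3]:

\[ d = \frac{c}{2 f_m}\cdot\frac{\varphi}{2\pi}, \qquad d_{\max} = \frac{c}{2 f_m}, \]

where c is the speed of light and f_m the modulation frequency; distances beyond the unambiguous range d_max wrap around, which is the measurement ambiguity mentioned above.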
Figure 1.8: Time of flight systems from [3]

Figure 1.9: Some ToF cameras. From left to right: DepthSense, Fotonic, Microsoft Kinect v2

4. Structured light camera

The structured light camera is another approach to measuring depth information by using "structured light", which is a pattern of light such as an array of lines. The scene is viewed at an angle, as illustrated in Fig 1.11. If the pattern is projected onto a flat wall, the camera will see straight lines, but if the scene is more complex then it will see a more complex profile. By analyzing this profile across the field of view, depth information can be calculated. In the traditional method, the structured light consists of grids or arrays of lines, but it is affected by noise. Therefore, in some newer devices such as the PrimeSense or Microsoft Kinect v1 (see Fig 1.10), a code is added into the light so that the pattern seen by the camera has almost zero repetition across the scene. The Kinect v1 uses a randomly distributed speckle pattern, and each speckle looks different at different distances, due to a special lens, as can be seen in Fig 1.12. But this kind of depth sensor also has some limitations: the errors grow with the square of the distance to objects, there are strong quantization effects (see Fig 1.13), and it shares some limitations with ToF systems, like sensitivity to material reflectance and not operating well in strong sunlight.
Figure 1.10: Structured light cameras. From left to right: PrimeSense, Microsoft Kinect v1

Figure 1.11: Structured light systems from [3]

Figure 1.12: Figure from [16], (A) raw IR image with pattern, (B) depth image

Figure 1.13: Figure from [16]. (A) Errors for structured light cameras, (B) quantization errors at different distances of a door: 1 m, 3 m, 5 m
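The quadratic error growth follows from the triangulation geometry of such sensors: a roughly constant disparity noise σ_d maps into depth noise that scales with the square of the distance. A commonly used model (my addition, consistent in spirit with the analysis in [16]) is

\[ \sigma_Z \approx \frac{Z^2}{f\,b}\,\sigma_d, \]

where f is the focal length and b the baseline between the IR projector and the camera.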
5. Mobile Kinect

In my thesis, I used the Microsoft Kinect v1 as the capture device of the system due to its usability and availability. To make the Kinect more flexible, I added some components, making it a "Mobile Kinect": a Kinect with an external battery, so it can be moved anywhere without worrying about electrical sources (outlets, cables), and this external battery is easy to replace. To attach it to the human body, the Kinect has been mounted on a single belt, so that it can be fixed easily on the body. Another important part of the mobile Kinect is a laptop, which plays the role of the main processor. It contains the data acquisition and obstacle detection modules. The reason for choosing a laptop is that the Kinect is a commercial device developed for video game purposes, so it cannot operate without a PC, and because of the restriction on the Kinect data cable length, the computer must be placed near the Kinect (the whole system can be seen in Fig 1.14).

Figure 1.14: Prototype of system using mobile Kinect. (A) Kinect with battery and belt, (B) backpack with laptop, (C) mobile Kinect mounted on human body

Officially, the Kinect runs on a 12 V source provided by the adapter that comes with it by default. In our experiments, it can operate until the voltage drops down to 8.5 V, with a running current of about 0.3-0.4 A. So I designed a battery pack with 8 AAA cells, which provides 8 × 1.5 V = 12 V. The time to drop from 12 V to 8.5 V is about 1.5-2 hours in our experiments, which means the mobile Kinect can run for 1.5-2 hours on the battery.
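As a rough sanity check on this figure (assuming a typical AAA alkaline capacity of about 1000 mAh, a value I am supplying, not one stated in the thesis), the ideal runtime at the measured draw would be

\[ t \approx \frac{C}{I} = \frac{1000\ \text{mAh}}{350\ \text{mA}} \approx 2.9\ \text{h}, \]

and the observed 1.5-2 hours is plausibly shorter because the usable window ends once the pack voltage sags below 8.5 V, before the cells are fully discharged.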
1.2.4 Environment Context
The environment for developing and testing the system is a public building, as mentioned before. More specifically, I focus on a specific use case: walking along the corridors of a building. There are two major types of corridor in our context: one with walls on both sides, and a half-open one with glass windows or an opening on one side and a solid wall on the other. In our experiments, I aim to develop a system for the half-open corridor because it is a very popular type in public buildings such as schools, offices and some apartments. In the context of my thesis, I tested the system in two different environments. One is our office building at B1, Hanoi University of Science and Technology; the other is the Nguyen Dinh Chieu secondary school for blind pupils (see Fig 1.15). Because depth data is strongly affected by sunlight, the requirement for the environment is that it must not be lit too strongly (on a shady day or in a building with walls on both sides, where sunlight cannot reach).

The use case is that the user (a visually impaired person) wants to move to another place in the building; to do that, he/she must go along the corridor, walk up/down the stairs, then move to the destination point. Ignoring the path-finding problem, in my work I only aim at obstacle avoidance. Obstacles in both cases are the objects that block the moving way, such as fire extinguishers, trash bins, columns, walls and humans in front of the user (as can be seen in Fig 1.15).
Figure 1.15: Two different environments tested. (A) Our office building, (B) Nguyen Dinh Chieu secondary school
1.3 Difficult Situations Recognition System
In conclusion, my entire system to recognize the difficult situations can be demonstrated as in Fig 1.16. In this case, the prototype system is fixed on the visually impaired person. To interact with the user, a tactile visual substitution module from [23] is used to give warnings about obstacles in front of him/her. The mobile Kinect is mounted on the human hip to capture depth and color information. This information is processed by a recognition module in the laptop behind the user. After an obstacle has been detected, the laptop sends a corresponding command to the tactile visual substitution module in order to give a warning message. The message representation has been integrated into this module and is presented in [23]. So my main work is how to send a correct command to the feedback module.
1.4 Thesis Contributions

There are two main contributions in my thesis:

• The first is making a prototype of a difficult situations recognition system that can work in a flexible, stable manner in many different environments.

• The second is a proposed method to detect obstacles using color and depth information together with 3D point cloud techniques, especially for the problem of early detection of up/down stairs and handling noise in depth data.
Related Works
2.1 Assistive systems for visually impaired people
A lot of devices and systems have been developed to help visually impaired people in daily life. This section presents some research based on vision technology for visually impaired people that is related to my work. Each technology/device aims to cover one or more fields in the Mobility component as shown in Fig 1.2 in Chapter 1. From the point of view of obstacle avoidance system technology, two different technologies are widely used: assistive robots to help visually impaired people with moving activities, and wearable devices. The advantages and disadvantages of each technology are discussed in the following part. Table 2.1 shows the comparison between the two technologies.
Table 2.1: Comparison between assistive robot and wearable device

Assistive Robot
• Typical examples: guided dog robots [22][21][6][17]
• Advantages: can integrate different technologies & devices; long operating time
• Disadvantages: expensive; inconvenient, hard to get on well with society; limited environment (almost works with flat planes only)

Wearable Device
• Typical examples: glasses [32][20], mobile phone, white cane, sensory substitution [5][14]
• Advantages: cheaper than an assistive robot; flexible; convenient while using
• Disadvantages: limited operating time (battery problem); limited in technologies; may harm other human senses
For assistive robots for visually impaired people, in [17] (2004) the authors developed a simple robot-assisted system for indoor navigation, as shown in Fig 2.1. In this work, a lot of passive RFID tags were attached to the environment and the objects to support the navigation (or obstacle avoidance) task. To interact with the visually impaired users, speech recognition with single, simple words and wearable keyboards are used; when the robot is passing an object, it can speak its name through a speech synthesis module. But this system is still under development and was only tested in a laboratory environment, and since the RFID tags play the most important role in this system, it is hard to apply it in real, large environments like offices and residences.

Figure 2.1: Robot-Assisted Navigation from [17]. (A) RFID tag, (B) Robot, (C) Navigation
In [6], Dalal et al. proposed a mobile application along with a robot (named NXT Robot) to help visually impaired people avoid obstacles (Fig 2.2) in an outdoor environment. The robot and mobile phone are connected to each other by Bluetooth, and the interaction between human and robot uses speech communication techniques. On the robot, the authors attached an ultrasonic sensor in order to detect obstacles, and combined this information with the location sent from the mobile phone using GPS. When the user asks to go to a destination, the mobile phone finds the route with the embedded Google Map and gives voice instructions while the user is going to the destination. The advantage of this system is the use of robust navigation applications like Google Map together with the ultrasonic sensor. But its limitation is that it depends on the GPS signal and an internet connection, so it cannot work offline or in environments with a weak GPS signal. Regarding obstacle detection, the system only reports the obstacles in front of the robot, and in complex environments it may give unreasonable instructions.
Recently, Nguyen et al. introduced a mobile robot [22], [21] for indoor navigation using computer vision techniques (see Fig 2.3). This system consists of a PCbot-914, which is a PC fitted with actuators and sensors (wheels, ultrasonic sensors, a normal webcam), and a mobile phone. The visually impaired user takes the mobile phone, chooses the destination location through the touch screen and follows the robot to the destination he/she wants. To communicate with the visually impaired user, vibration patterns on the mobile phone are used. Firstly, the environment must be modeled in an offline phase to build a map with images and static object positions at each location. In the online phase, the image captured from the camera is matched against the database in the learned model to give the location of the robot in the building for the navigation task; another module in this system detects obstacles in the image in order to give warnings to the users. However, this system only works in a limited environment, the offline model must be built before use, and visually impaired people must be trained carefully to use the system (how to recognize a vibration pattern, or the destination position on the touch screen). Also, due to the limitations of the monocular webcam, a lot of unexpected factors such as environment and lighting changes can affect the system results.
For wearable devices helping visually impaired people in daily life, there are some existing products based on different technologies, as presented next. The vOICe system [5] is a head-mounted camera which is a form of sensory substitution; it receives visual information and converts it into a sound signal that the blind person can hear through stereo headphones. The main idea is that the scene or captured image can be represented as a sound pattern where time corresponds to the horizontal axis, pitch corresponds to the vertical axis of the visual pattern, and loudness stands for brightness. Consequently, for each image, with a capture rate of about one image per second, blind people hear a sound sweeping from left to right that represents each position by a pitch and loudness level. In order to know which object/scene corresponds to which sound, the blind person is trained before using the system. But this device also has some limitations. Firstly, auditory signals are very important for blind people, but if they use this device, their hearing is blocked and they can be distracted from the natural environment. Moreover, because the image is represented by pitch and loudness levels, the system creates a very noisy sound map when moving through a complex environment, and it is complicated for the blind user to understand the scene.
A product in development called BrainPort (from Wicab Inc.), recently approved for sale in Europe [32], uses a camera mounted on a pair of sunglasses as its input device. After image processing, images are displayed on the tongue using an electrotactile display of 49 electrodes to provide directional cues to the blind users. This is shown as a small electrical "image" on the tongue, in a "lollipop"-like display, as illustrated in Fig 2.4. The limitations of this system are that it requires use of the mouth, which reduces the abilities of blind people, especially in speaking and eating, which are very important and frequent activities. Another problem is that the resolution of both the electrotactile display and tongue sensitivity is still far from that of the visual system, so representing the image directly on the human tongue is not an effective way to represent data. In my work, I use a similar device, but the output on the electrotactile display is just a defined encoded signal, which is easier for the blind person to recognize as instructions.
Very recently in Vietnam, Dr. Nguyen Ba Hai successfully developed a vision-based device named "A haptic device for blind people" [20]. This is a very low-cost, portable pair of glasses which can help visually impaired people detect an obstacle using a single laser transmitter and receiver. When the glasses detect an obstacle, they trigger a small vibrator on the forehead so that the visually impaired person can feel the obstacle. However, this device is very simple: it cannot detect potential obstacles coming from the two sides of the visually impaired person, nor does it take object information into account.

Figure 2.4: BrainPort vision substitution device [32]
2.2 RGB-D based assistive systems for visually impaired people
Nowadays, with the development of depth sensors, there are some works dedicated to RGB-D based assistive systems for visually impaired people, as follows:
NAVI [33] (Navigational Aid for Visually Impaired) is a system similar to my proposal. It also uses a Kinect with a battery, fixed on a helmet. There are two main functions in this system, called "Micro-Navigation" and "Macro-Navigation": Micro-Navigation means obstacle avoidance and Macro-Navigation stands for path finding. For the purpose of giving information about obstacles to blind people, vibrotactile output is provided by a waist belt that contains three pairs of Arduino LilyPad vibe boards. For obstacle detection, the system detects the closest obstacles in the left, right and center regions of the Kinect's field of view using a depth histogram and triggers the vibe board in the corresponding direction. For Macro-Navigation, the authors used fixed markers to annotate locations, detected them via the Kinect's RGB camera, and used the depth image to calculate the person's distance to the marker in order to give navigation instructions. The output of this function is synthesized voice as feedback to the blind person about the instructions to move, for example "Open the door".
In [8], the authors present a low-cost system using the Kinect sensor for obstacle avoidance for visually impaired people. The Kinect sensor is mounted on the body using a belt. It is connected to a battery pack and to a computing device (smart phone) which provides audio feedback to the user. The system performs 5 main tasks: i) read the data from the Kinect device and express it as a 3D point cloud; ii) detect the floor and register the data in the reference system centered at the user's feet; iii) detect the occupancy of the volume in front of the user; at this step, the space is subdivided into a certain number of sectors, each one corresponding to a vertical volume spanning a range of possible directions; iv) analyze the output of the accelerometer to determine if the user is walking and how fast; v) provide the feedback to the user.
Tang et al. also presented an RGB-D sensor based computer vision device to improve the performance of visual prostheses (retinal prosthesis or tongue stimulator) [27]. Firstly, patch-based stereo vision is applied to create RGB-D data that is already segmented, based on color segmentation, feature point matching, and plane fitting and merging using RANSAC. Then they apply a technique called "smart sampling" to highlight the important information. This step includes background removal, parallax simulation, object highlighting and path directions. In the final step, the information is represented using the BrainPort device for line orientation and navigation.
In addition, my thesis gets motivation from and inherits the work of Michiel et al. [30]. In this work, the authors proposed an obstacle detection algorithm based on point clouds to run with the Kinect. Obstacles are defined as obstacles on the floor plane, doors and stairs. However, the authors only tested stair and door detection independently alongside obstacle detection. Fig 2.5 shows the main process of the obstacle detection algorithm, which is partially similar to my proposal in 3.1. Firstly, the point cloud is down-sampled and filtered to reduce the processing time. Then, using RANSAC, the ground plane and wall planes are removed from the point cloud, and clustering techniques are used to detect obstacles in the remaining point cloud data. For stair detection, a process similar to ground plane detection is used, and with some pre-defined parameters such as step height and step number, stairs are detected from the point cloud. For door detection, taking the observation that a door is always on a wall plane, the authors use color-based segmentation on the wall plane to find a door region. The results obtained when applying these algorithms to the authors' database are very promising. However, because this algorithm relies almost entirely on plane segmentation and ground plane detection, when applying it to our dataset (the MICA dataset, presented in 4.1), where the point cloud data is not good on the floor plane because of motion and lighting conditions, the achieved results are still low. And with the pre-defined parameters such as step height, the stair detection algorithm is not robust when applied in our environment, where these specifications can change.
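The core of this pipeline (ground removal with RANSAC, then clustering what remains) can be sketched in a few lines. This is my illustrative reconstruction using the Open3D library, not the code of [30], and the thresholds are placeholder values:

```python
import open3d as o3d
import numpy as np

# Load a point cloud (placeholder file name).
pcd = o3d.io.read_point_cloud("scene.pcd")
pcd = pcd.voxel_down_sample(voxel_size=0.03)  # down-sample to speed things up

# RANSAC plane fit: the dominant plane is assumed to be the ground.
plane_model, inliers = pcd.segment_plane(distance_threshold=0.03,
                                         ransac_n=3,
                                         num_iterations=200)
obstacles = pcd.select_by_index(inliers, invert=True)  # drop ground points

# Euclidean-style clustering of the remaining points: each cluster
# is a candidate obstacle.
labels = np.array(obstacles.cluster_dbscan(eps=0.08, min_points=30))
print(f"detected {labels.max() + 1} obstacle clusters")
```

In [30] the wall planes are removed in the same RANSAC spirit before clustering; the library calls here simply make the sequence of steps explicit.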
2.3 Stair Detection
Stair detection in an image is a classical problem in computer vision, since stairs are familiar objects in daily life. The most prominent characteristic of a stair is its rigid form with a repetitive structure of stair planes. As a result, a lot of lines and edges appear in a stair image. Therefore, stair detection is an interesting topic for researchers using traditional computer vision techniques like line detection, edge detection, frequency analysis and image segmentation.
In [26] (2000), before Hough line detection was widely used for this problem, the authors proposed a method to detect stairs in gray-scale images using some basic image processing operators combined with constraints to find the stairs in the image. The stair images are outdoor scenes with good lighting conditions, where the edges of the stair planes are quite clear. The authors first use Canny detection to extract the edges, and Gabor filters in the horizontal and vertical directions to focus on the two main directions, then find concurrent lines (hence finding the vanishing point) as hypotheses for staircases. However, since this algorithm is based on simple operators and rules, it is very sensitive to parameters (of the Canny detector, Gabor filter, and line length), and detection is not good when there is another object with concurrent lines in the image. Fig 2.6 illustrates the result of this system.
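As an illustration of that edge-extraction step (my own sketch with OpenCV; the parameter values are arbitrary, not taken from [26]), a Gabor kernel oriented toward horizontal stripes responds strongly to the repetitive pattern of stair edges:

```python
import cv2
import numpy as np

gray = cv2.imread("stairs.png", cv2.IMREAD_GRAYSCALE)  # placeholder image

# Edge map (Canny thresholds are arbitrary here).
edges = cv2.Canny(gray, 50, 150)

# Gabor kernel oriented to respond to horizontal stripes such as
# stair edges (theta is the orientation of the stripe normal).
kernel = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0,
                            theta=np.pi / 2, lambd=10.0,
                            gamma=0.5, psi=0)
response = cv2.filter2D(gray.astype(np.float32), -1, kernel)

# Keep strong responses only; these pixels concentrate on the
# repetitive stair-edge structure.
mask = response > response.mean() + 2 * response.std()
```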
Recently, with the development of sensors with the ability to capture depth information, stair detection can be done efficiently in 3D, and it remains a good research topic around plane detection and plane segmentation in 3D. Among stair detection systems using RGB or RGB-D data, some works aim to develop a system which can detect stairs early in order to give moving instructions to blind people or to control a robot automatically; these are presented in the following part.

Figure 2.6: Stair detection from [26]. (A) Input image, (B)(C) frequency as an output of the Gabor filter, (D) stair detection result
RGB or RGB-D data, there are some works aim to develop a system which can early
detect the stair in order to give moving instructions to the blind people or control the
robot automatically and will be presented in the following part
In [13], the authors proposed an approach to detect descending stair to control the
autonomous tracked vehicle using the only monocular gray-scale camera Firstly, the
stair will be coarsely detected if it’s far from the robot This is called “Far-approach”
which uses optical flow and texture information to predict the stair By taking the
observation that if the stair is descending, so the region above it is normally the wall
with low texture region Another observation is that when moving, if there is a stair
appears, the optical flow in the is changed very fast from the ground region to the wall
region because there’s a significant changing in the depth of the image This step can
be illustrated in the Fig 2.7below
With the near-approach, detected lines will be applied some constraints to the line using
a median flow of the region above and below the line, the length of line to look for stair
edge
Trang 35(a) (b)
Figure 2.7: A near-approach for stair detection in [13] (A) Input image with detected
stair region, (B) Texture energy, (C)Input image with detected lines are stair
candi-dates, (D)Optical flow maps in this image, there is a significant changing in the line in
the edge of stair
In [24], the authors proposed a method to detect and model a stair using a depth sensor. The depth image is used to create point cloud data. Firstly, plane segmentation and classification are applied to find the planes in a scene. This step includes normal estimation, region growing, a planarity test, plane extension, cluster extraction and classification (see Fig 2.8).

After the planes have been segmented, the stair is modeled as a group of parallel planes. Each plane must satisfy conditions on size, height and orientation, as shown in Fig 2.9.

The advantages of this method are the robustness of depth information and the fact that the stair is modeled very explicitly using a lot of conditions applied to the stair planes. But these constraints also require the stair planes to be clearly visible in the images, while in the real environment, due to occlusion and the camera viewpoint, these conditions may not be satisfied.

Figure 2.8: Example of segmentation and classification in [24]

Figure 2.9: Stair modeling (left) and features in each plane [24]
Yingli Tian [29] proposed a mixed approach using both RGB and depth images to find stairs and crosswalks. In the first step, parallel lines are detected in the RGB image to find a group of concurrent lines, using the Hough transform and line fitting with geometric constraints. This step is illustrated in Algorithm 1.

Algorithm 1 Parallel line detection from [29]
1: Detect edge maps from the RGB image by edge detection
2: Compute the Hough transform of the RGB image to obtain the directions of the lines
3: Calculate the peaks in the Hough transform matrix
4: Extract line segments and their directions in the RGB image
5: Group line fragments into the same line if the gap is less than a threshold
6: Detect a group of parallel lines based on constraints such as the length and the total number of detected lines of stairs and pedestrian crosswalks
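Steps 1-4 of Algorithm 1 map directly onto standard OpenCV calls; the following is my illustrative sketch of those steps (the thresholds are placeholders, not the values used in [29]):

```python
import cv2
import numpy as np

img = cv2.imread("stairs.png")  # placeholder image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Step 1: edge map.
edges = cv2.Canny(gray, 50, 150)

# Steps 2-4: the probabilistic Hough transform votes in (rho, theta)
# space and returns line segments directly.
segments = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                           threshold=60, minLineLength=40, maxLineGap=10)

# Direction of each segment, used later for grouping parallel lines
# (steps 5-6 of Algorithm 1).
angles = [np.arctan2(y2 - y1, x2 - x1)
          for x1, y1, x2, y2 in segments[:, 0]]
```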
Then, they extract the depth information along each line and feed it into a support vector machine (SVM) classifier to detect stairs (both upstairs and downstairs) or pedestrian crosswalks (as seen in Fig 2.10). Finally, they estimate the distance between the camera and the stairs to give warning messages to the blind user. In this paper, the authors presented a robust stair detection algorithm that takes advantage of both color and depth information. However, both the color and depth images must be clear to get a correct stair detection, where the color information gives stair candidates and the depth information is used to confirm whether it is a stair or not. In my case, due to the limited measurable range of the Kinect depth sensor, the depth image is not always available on the stair surface, and the edge detection algorithm is still sensitive to parameters.

Figure 2.10: Stair detection algorithm proposed in [29]. (A) Detected lines in the edge image (using color information), (B) depth profiles on each line (red line: pedestrian crosswalk, blue: downstairs, green: upstairs)
Obstacle Detection
3.1 Overview
The entire system flowchart can be seen in Fig 3.1. There are 3 main parts in this flowchart: the Kinect side, the laptop side and the user interface (or feedback module) side, where the laptop side contains almost all processing modules.

Firstly, the laptop captures data from the Kinect. By default, the Kinect provides many data types such as images, sound, skeleton and accelerometer data, but in my work, only the depth image, color image and accelerometer data are used. Then humans are detected in the depth image. The Kinect SDK also provides a human index for each pixel, encoded in the depth image. This information is calculated from the depth image, so it is very robust. After a human has been detected, the system stores that information as the first detected obstacle.
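For reference, in the Kinect v1 SDK the per-pixel player index is packed into the low bits of each 16-bit depth value; a hedged sketch of unpacking it follows (the bit layout is as I recall it from the SDK documentation and is worth verifying against the exact SDK version):

```python
import numpy as np

def unpack_depth_frame(raw):
    """Split a packed Kinect v1 depth frame into depth and player index.

    `raw` is a uint16 array; the lower 3 bits hold the player index
    (0 = no player) and the upper 13 bits hold depth in millimeters.
    """
    player_index = raw & 0x7        # 0..6, 0 means background
    depth_mm = raw >> 3             # depth in millimeters
    human_mask = player_index > 0   # pixels belonging to detected people
    return depth_mm, human_mask
```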
The next module is obstacle detection in both the color and depth images using point cloud techniques. All the captured information is used to build a point cloud, a collection of 3D points, in order to reconstruct the environment around the blind person. Then, from the point cloud, several techniques are applied to find obstacles: plane segmentation, ground and wall plane detection, and then obstacle detection. The output of this module is the detected obstacles, including normal obstacles and stairs.
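Building the point cloud from the depth image is a direct application of the back-projection formula from Section 1.2.3; here is a minimal vectorized sketch (the intrinsics below are typical Kinect v1 values used as placeholders, not calibration results from this thesis):

```python
import numpy as np

def depth_to_point_cloud(depth_mm, fx=594.2, fy=591.0, cx=320.0, cy=240.0):
    """Back-project a depth image (millimeters) into an Nx3 point cloud."""
    h, w = depth_mm.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_mm.astype(np.float32) / 1000.0   # meters
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]            # drop invalid (zero-depth) pixels
```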
Another individual module is stair detection based on the color image. This module detects stairs directly in the color image by using line extraction techniques and the geometric relationships between lines.
In the obstacle fusion module, all the obstacles are checked again for their validity in the scene, and the most important (or most dangerous) obstacle to the blind person is returned in order to send commands to the user interface module through the obstacle warning module.