MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY
Hoang Van Nam
DIFFICULT SITUATIONS RECOGNITION SYSTEM FOR VISUALLY-IMPAIRED AID USING A MOBILE KINECT
Department: COMPUTER SCIENCE
Independence – Freedom – Happiness

CONFIRMATION OF REVISIONS TO THE MASTER'S THESIS

Full name of the thesis author: ……… ………
Thesis title: ……… ……… ….
Major: ……… ……… …
Student ID: ……… ……… …

The author, the scientific supervisor, and the Thesis Examination Committee confirm that the author has corrected and supplemented the thesis according to the minutes of the Committee meeting on ……………, with the following contents: ……… …………
……… ………
……… ………
……… ………
……… ………

Day …… Month …… Year ……
CHAIR OF THE COMMITTEE
Declaration of Authorship

I, Hoang Van Nam, declare that this thesis, titled 'Difficult situations recognition for visually-impaired aid using mobile Kinect', and the work presented in it are my own. I confirm that:

• This work was done wholly or mainly while in candidature for a research degree at this University.
• Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.
• Where I have consulted the published work of others, this is always clearly attributed.
• Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.
• I have acknowledged all main sources of help.
• Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.
Signed:
Date:
International Research Institute MICA
Computer Vision Department

Master of Science

Difficult situations recognition for visually-impaired aid using mobile Kinect

by Hoang Van Nam

Abstract
By 2014, according to figures from several organizations, there were more than one million people in Vietnam living with sight loss, about 1.3% of the Vietnamese population. Despite the big impact on daily living, especially on the ability to move, read, and communicate with others, only a small percentage of blind or visually impaired people live with an assistive device or animal such as a guide dog. Motivated by the significant changes in technology that have taken place in the last decade, especially the introduction of various types of sensors as well as the development of the field of computer vision, I present in this thesis a difficult situations recognition system for visually impaired aid using a mobile Kinect. The system is based on data captured from the Kinect and uses computer vision techniques to detect obstacles. In the current prototype, I focus only on detecting obstacles in indoor environments such as public buildings, and two types of obstacle are exploited: general obstacles in the moving path, and staircases, which pose a big danger to visually impaired people. 3D imaging techniques, including plane segmentation and 3D point clustering, were used to detect general obstacles, and a mixed strategy between the depth and color images is used to detect staircases, based on detecting the stair edges and their structure. The system is very reliable, with a detection rate of about 82.9% and a processing time of 493 ms per frame.
Acknowledgements

I am so honored to be here for the second time, in one of the finest universities in Vietnam, to write these grateful words to the people who have been supporting and guiding me from the very first moment when I was a university student until now, when I am writing my master thesis.
I am grateful to my supervisor, Dr. Le Thi Lan, whose expertise, understanding, generous guidance, and support made it possible for me to work on a topic that was of great interest to me. It was a pleasure to work with her.

Special thanks to Dr. Tran Thi Thanh Hai, Dr. Vu Hai, and Dr. Nguyen Thi Thuy (VNUA), and to all the members of the Computer Vision Department, MICA Institute, for their sharp comments and guidance on my work, which helped me a lot in learning how to study and do research in the right way, and also for the valuable advice and encouragement they gave me during my thesis.
I would like to express my gratitude to Prof. Peter Veelaert, Dr. Luong Quang Hiep, and Mr. Michiel Vlaminck at Ghent University, Belgium, for their support. It has been a great honor to cooperate and work with them.
Finally, I would especially like to thank my family and friends for the continuous love and support they have given me throughout my life, helping me pass through all the frustration, struggle, and confusion. Thanks for everything that helped me get to this day.
Hanoi, 19/02/2016
Hoang Van Nam
Contents

Declaration of Authorship

1 Introduction
  1.1 Motivation
  1.2 Definition
    1.2.1 Assistive systems for visually impaired people
    1.2.2 Difficult situations
    1.2.3 Mobile Kinect
    1.2.4 Environment Context
  1.3 Difficult Situations Recognition System
  1.4 Thesis Contributions

2 Related Works
  2.1 Assistive systems for visually impaired people
  2.2 RGB-D based assistive systems for visually impaired people
  2.3 Stair Detection

3 Obstacle Detection
  3.1 Overview
  3.2 Data Acquisition
  3.3 Point Cloud Registration
  3.4 Plane Segmentation
  3.5 Ground & Wall Plane Detection
  3.6 Obstacle Detection
  3.7 Stair Detection
    3.7.1 Stair definition
    3.7.2 Color-based stair detection
    3.7.3 Depth-based stair detection
    3.7.4 Result fusion
  3.8 Obstacle information representation

4 Experiments
  4.1 Dataset
  4.2 Difficult situation recognition evaluation
    4.2.1 Obstacle detection evaluation
    4.2.2 Stair detection evaluation

5 Conclusions and Future Works
  5.1 Conclusions
  5.2 Future Works
List of Figures

1.1 A Comprehensive Assistive Technology (CAT) Model provided by [12]
1.2 A model for activities attribute and mobility provided by [12]
1.3 Distribution of frequencies of head-level accidents for blind people [18]
1.4 Distribution of frequencies of tripping resulting in a fall [18]
1.5 A typical example of a depth image: (A) raw depth image, (B) depth image visualized by a jet color map, where the colorbar shows the real distance for each color value, (C) reconstructed 3D scene
1.6 A stereo image pair taken from the OpenCV library and the calculated depth image: (A) left image, (B) right image, (C) depth image (disparity map)
1.7 Some existing stereo cameras. From left to right: Kodak stereo camera, View-Master Personal stereo camera, ZED, Duo 3D Sensor
1.8 Time of flight systems from [3]
1.9 Some ToF cameras. From left to right: DepthSense, Fotonic, Microsoft Kinect v2
1.10 Structured light cameras. From left to right: PrimeSense, Microsoft Kinect v1
1.11 Structured light systems from [3]
1.12 Figure from [16]: (A) raw IR image with pattern, (B) depth image
1.13 Figure from [16]: (A) errors for structured light cameras, (B) quantization errors at different distances of a door: 1 m, 3 m, 5 m
1.14 Prototype of the system using mobile Kinect: (A) Kinect with battery and belt, (B) backpack with laptop, (C) mobile Kinect mounted on the human body
1.15 Two different environments tested: (A) our office building, (B) Nguyen Dinh Chieu secondary school
1.16 Prototype of our obstacle detection and warning system
2.1 Robot-assisted navigation from [17]: (A) RFID tag, (B) robot, (C) navigation
2.2 NXT Robot system from [6]: (A) the system's block diagram, (B) NXT Robot
2.3 Mobile robot from [22], [21]
2.4 BrainPort vision substitution device [32]
2.5 Obstacle detection process from [30]
2.6 Stair detection from [26]: (A) input image, (B)(C) frequency as output of the Gabor filter, (D) stair detection result
2.7 Near-approach stair detection in [13]: (A) input image with detected stair region, (B) texture energy, (C) input image with detected lines as stair candidates, (D) optical flow maps; there is a significant change along the line at the stair edge
2.8 Example of segmentation and classification in [24]
2.9 Stair modeling (left) and features in each plane [24]
2.10 Stair detection algorithm proposed in [29]: (A) detected lines in the edge image (using color information), (B) depth profiles along each line (red: pedestrian crosswalk, blue: down stair, green: up stair)
3.1 Obstacle detection flowchart
3.2 Kinect mounted on the body
3.3 Coordinate transformation process
3.4 Kinect coordinate system
3.5 Point cloud rotation using the normal vector of the ground plane (white arrow): left: before rotating, right: after rotating
3.6 Normal vector estimation algorithms [15]: (a) the normal vector of the center point can be calculated as the cross product of two vectors through four neighboring points (red), (b) normal vector estimation in a scene
3.7 Plane segmentation result using the algorithm proposed in [15]; each plane is represented by a distinctive color
3.8 Detected ground and wall planes (ground: blue, wall: red)
3.9 Human segmentation data by the Microsoft Kinect SDK: (a) color image, (b) human mask
3.10 Detected obstacles: (a) color image, (b) detected obstacles
3.11 Model of a stair
3.12 Coordinate transformation models from [7]
3.13 Projective chirping: (a) a real-world object that generates a projection with "chirping", i.e., "periodicity-in-perspective", (b) center raster of the image, (c) best-fit projective chirp
3.14 A pinhole camera model with a stair
3.15 A vertical Gabor filter kernel
3.16 Gabor filter applied on a color image: (a) original, (b) filtered image
3.17 Thresholding the grayscale image: (a) original, (b) thresholded image
3.18 Example of thinning an image using morphological operations
3.19 Thresholding the grayscale image: (a) original, (b) thresholded image
3.20 Six points voting for a line make an intersection in Hough space; this intersection has higher intensity than neighboring pixels
3.21 Hough space: (a) a line in the original space, (b) three curves voting for this line in Hough space
3.22 Hough space of a stair image: (a) original image, (b) Hough space
3.23 Chirp pattern detection: (a) Hough space, (b) original image with the detected chirp pattern
3.24 Point cloud of a stair: (a) original color image, (b) point cloud data created from the color and depth images
3.25 Detected steps
3.26 Detected planes
3.27 Detected stair on the point cloud
3.28 Obstacle position quantization for sending warning messages to visually impaired people
4.1 Depth image encoding: (A) original, (B) visualized image, (C) encoded image
4.2 Detection time of each step in our proposed method
4.3 Example stair images for evaluation: (A) positive sample from the MICA dataset, (B) negative sample from the MICA dataset, (C) positive sample from the MONASH dataset, (D) negative sample from the MONASH dataset
4.4 Detected stair in Tian's method (A-F) and in my proposed method (G-I): (A) color image, (B) depth image, (C) edges, (D) line segments, (E) detected concurrent lines, (F) depth values on detected lines, (G) detected stair, where blue lines are false stair edges and green lines are stair edges, (H) edge image, (I) detected peaks in the Hough map corresponding to the lines in (G)
4.5 Missed detection in Tian's method because of missing depth on the stair (A-F) and detected stair in my proposed method (G-I)
4.6 Missed detection in Tian's method because of missing depth on the stair (A-F) and detected stair in my proposed method (G-I)
List of Tables

2.1 Comparison between assistive robot and wearable device
4.1 Database specifications
4.2 Pixel-level evaluation result (TP, FP, FN: million pixels)
4.3 Object-level evaluation result (TP, FP, FN: objects)
4.4 Stair dataset for evaluation
4.5 Stair detection results of the proposed method on different datasets
4.6 Comparison of the proposed method and the method of Tian et al. [29] on the MICA dataset
Abbreviations

PCL     Point Cloud Library
CAT     Comprehensive Assistive Technology
TDU     Tongue Display Unit
IR      Infrared
OpenCV  Open Source Computer Vision (library)
RGB     Red Green Blue
RGB-D   Red Green Blue and Depth
ToF     Time of Flight
1 Introduction

1.1 Motivation

According to official statistics of the National Eye Hospital in 2002, Vietnam had about 900,000 blind people, including about 400,000 who are totally blind. By 2014, according to figures from several organizations, the number of blind people in Vietnam was about 1.2 to 1.4 million, still a large number in comparison with other countries. Worldwide, the visually impaired population is estimated at more than 285 million according to an investigation of the World Health Organization (August 2014)¹. About 90% of them live in developing countries in low-income settings. Visual impairment has a big impact on daily living: affected people cannot read documents, and their ability to move and to communicate with other people is compromised, because information is received primarily through vision. All of these factors have made blindness a public health problem all over the world.
Nowadays, with significant developments in technology, many assistive devices have been released to help visually impaired people in daily life. Although many researchers and companies are concerned with making better and cheaper devices to improve the comfort of visually impaired people, research in this field still contains many unsolved issues, and in general those devices still cannot replace traditional aids such as the white cane or the guide dog.

Motivated by the significant changes in technology that have taken place in the last decade, especially the introduction of various types of sensors as well as the development of the field of computer vision, my thesis aims to build a prototype system to help visually impaired people avoid obstacles in the environment using a Kinect sensor.
¹ http://www.who.int/mediacentre/factsheets/fs282/en/
With the Kinect, the benefit is that we can build a reliable system at an affordable price by using depth and color information to detect obstacles. In my thesis, due to the lack of time, I focus only on indoor environments, more specifically public buildings such as apartments or offices, in order to detect general objects encountered on the moving path as well as stairs, which may endanger visually impaired people.
My thesis is organized as follows:

First, I shall give some definitions for the context of my work and state the contributions of this thesis.

In Chapter 2, I shall briefly review other works related to my system, such as existing assistive devices and obstacle detection algorithms/systems, with their advantages and disadvantages.

In Chapter 3, a framework for obstacle detection will be developed; I shall present the details of each module as well as the entire system, analyzing and assessing them.

In the next chapter, I shall give the experimental results of my system, including how the dataset was prepared, how the evaluation was performed, and the final results.

In the final chapter, I end this work by giving some conclusions and future work to make the system more complete and effective.
1.2 Definition

1.2.1 Assistive systems for visually impaired people
According to [12], assistive systems for visually impaired people can be understood as equipment, devices, or systems which can be used to overcome the gap between what a disabled person wants to do and what society allows them to do. In short, such a system must be able to help visually impaired people do the things that sighted people can do. Such a system can be modeled by the Comprehensive Assistive Technology (CAT) Model, as shown in Fig. 1.1. The top level of this model consists of four components that can be used to define all assistive technology systems:
• Context (in which the assistive technology will be used)
• Person (what kind of user can use this system)
• Activities (what activities the assistive system can help visually impaired people with; see Fig. 1.2 for details)
• Assistive Technology (the technology used to build the system)
Most existing systems aim at solving one specific aspect of each branch in the model: they work in a bounded, well-defined context, with certain types of users, to help them with specific activities in daily life. In the framework of my master thesis, to simplify the system, I focus only on certain aspects of this model, which I will explain in detail in the next sections. In short, I apply my system in a local context, a small public building such as an office or a department, and the users are visually impaired students at the Nguyen Dinh Chieu secondary school, who are helped to avoid obstacles along their moving path.
Figure 1.1: A Comprehensive Assistive Technology (CAT) Model provided by [12]
1.2.2 Difficult situations
Fig. 1.2 shows detailed information on the activities branch of the CAT model (see Fig. 1.1). As shown in the figure, there are many services that assistive systems for visually impaired people can provide, such as mobility, daily living, cognitive activities, education and employment, recreational activities, and communication and access to information. But most existing works focus on the mobility component of the activities model because of its important role in the daily life of visually impaired people.
Figure 1.2: A model for activities attribute and mobility provided by [12]
According to the 2011 survey of R. Manduchi [18], with 300 respondents who are blind or legally blind, half of the respondents said that they had a head-level accident at least once a week, and about 30% of respondents fell down at least once a month (see Fig. 1.3 and Fig. 1.4). Therefore, helping visually impaired people move safely has always been a topic of interest for researchers, social organizations, and companies. In fact, many products have been released, some with particular success, like the systems proposed in [11], [10], [1], and [4].
Figure 1.3: Distribution of frequencies of head-level accidents for blind people [18]
Figure 1.4: Distribution of frequencies of tripping resulting in a fall [18]
In the context of my thesis, I aim to develop a system which can detect the obstacles in a visually impaired person's moving path, which are the main cause of the accidents mentioned above. The scenario in this project is that a visually impaired person wants to move along the hallway inside a public building, so he/she needs to avoid obstacles, both moving and static objects, and to go up/down stairs. An obstacle in my case is defined as an object lying on the ground or in front of the visually impaired person that could harm him/her if encountered while moving. Although the obstacle's class is very important for visually impaired people in order to distinguish which objects are more dangerous, in my work I only try to detect obstacles in the scene without naming them (i.e., without classification). Within the framework of this thesis, I also focus on detecting another special object that often appears in buildings and is very dangerous for visually impaired people: the stair. Moreover, the proposed system only gives warnings to the blind user through the Tongue Display Unit (TDU), which was developed by Thanh-Huong Nguyen in 2013 [23]. In brief, my proposed system aims to solve two aspects of the mobility component of the activities model (see Fig. 1.2): obstacle avoidance, and movement on ramps, slopes, stairs & hills; for the second aspect, the current system stops at the level of giving a warning about the distance to stairs, in order to assist visually impaired people in going up/down them.
1.2.3 Mobile Kinect
1. Introduction

To assist visually impaired persons in those difficult situations, in my thesis I propose using a Kinect sensor to capture information about the environment in order to detect obstacles if they appear. There are many advantages to using the Kinect in this system, since it is a popular RGB-D camera with a cheap price. But first, I shall give some brief information about depth cameras, of which the Kinect is a typical example.
A depth camera is a sensor which has the capacity to provide depth information (a depth image or depth map). A depth map is an image that contains information relating to the distance of the surfaces of scene objects from a viewpoint; see the example in Fig. 1.5. The intensity value of each pixel in a depth map represents the distance from a point on the object to the camera. Therefore, the 3D information of the scene can be reconstructed using the depth image (as shown in Fig. 1.5-C). A further benefit of the depth image is that it is not affected by lighting conditions.
Figure 1.5: A typical example of a depth image. (A) Raw depth image; (B) depth image visualized by a jet color map, where the colorbar shows the real distance for each color value; (C) reconstructed 3D scene
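To make the reconstruction in Fig. 1.5-C concrete: under the pinhole model, a pixel (u, v) with depth Z back-projects to X = (u − cx)·Z/fx and Y = (v − cy)·Z/fy. The following is a minimal sketch in Python with NumPy; the intrinsic values fx, fy, cx, cy are generic Kinect-v1-style placeholders, not the calibration of my device.

```python
import numpy as np

# Hypothetical Kinect-v1-like intrinsics (placeholders, not calibrated values).
FX, FY = 525.0, 525.0   # focal lengths in pixels
CX, CY = 319.5, 239.5   # principal point

def depth_to_point_cloud(depth_mm):
    """Back-project a depth image (millimeters, HxW) to an Nx3 point cloud in meters."""
    h, w = depth_mm.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_mm.astype(np.float32) / 1000.0      # mm -> m
    valid = z > 0                                 # zero means "no measurement"
    x = (u - CX) * z / FX
    y = (v - CY) * z / FY
    return np.stack([x[valid], y[valid], z[valid]], axis=1)
```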
In recent years, with the development of technology, especially in the sensor fabrication industry, many cameras capable of capturing depth information have been placed on the market. Those devices can be separated into several groups by the technology used: stereo cameras (the ZED, for example), Time-of-Flight (ToF) cameras like the ZCam, structured light cameras like the Kinect, and long-range 3D cameras. Each device has its own advantages and disadvantages and is only suitable for particular use cases.
2. Stereo Camera
A stereo camera is a kind of camera that has been used in robotics since its early days. Taking the idea of human binocular vision, it contains two or more cameras with precisely known relative offsets. Depth information can be calculated by matching similar points in the overlapped region between the images; the 3D distance to matched points can then be determined using triangulation, as illustrated in Fig. 1.6. However, the cameras used in this case are still color cameras; as a result, the system is still affected by changing lighting conditions. Moreover, since the depth image is calculated by matching algorithms, it works very poorly when the scene is texture-less, for example images of walls or buildings. Many stereo cameras are available on the market due to the ease of manufacturing, such as the Kodak stereo camera, the View-Master Personal stereo camera, the ZED, and the Duo 3D Sensor, as illustrated in Fig. 1.7.
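The triangulation mentioned above reduces, for a rectified pair, to Z = f·B/d, where f is the focal length in pixels, B the baseline between the cameras, and d the disparity of a matched point. Below is a minimal sketch using OpenCV's semi-global block matcher; the focal length and baseline are assumed example values, not those of any specific camera.

```python
import cv2
import numpy as np

# Hypothetical calibration values for illustration.
FOCAL_PX = 700.0      # focal length in pixels
BASELINE_M = 0.12     # distance between the two cameras in meters

# left/right must be rectified grayscale images of the same size.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=9)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # fixed-point -> pixels

# Depth by triangulation: Z = f * B / d (valid only where disparity > 0).
depth_m = np.where(disparity > 0, FOCAL_PX * BASELINE_M / disparity, 0.0)
```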
Figure 1.6: A stereo image pair taken from the OpenCV library and the calculated depth image. (A) Left image; (B) right image; (C) depth image (disparity map)
Figure 1.7: Some existing stereo cameras. From left to right: Kodak stereo camera, View-Master Personal stereo camera, ZED, Duo 3D Sensor
3. Time of Flight (ToF) camera
Time of Flight (ToF) cameras use the same principle as laser radar, except that instead of transmitting a single beam, short pulses of infrared (IR) light are sent. The camera measures the return time at pixels across its field of view, and the distance is measured by comparing the phase of the modulated return pulses with those emitted by the laser (Fig. 1.8). But ToF cameras also suffer from limitations similar to other time-of-flight sensors, including ambiguity of measurements, multiple reflections, sensitivity to material reflectance and background lighting, and poor operation outdoors in strong sunlight. Some popular ToF cameras are the DepthSense, the Fotonic, and the Microsoft Kinect v2 (see Fig. 1.9).
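To make the principle concrete: for pulsed ToF the distance follows from the round-trip time Δt, and for continuous-wave ToF from the phase shift φ of a signal modulated at frequency f_m,

\[ d = \frac{c\,\Delta t}{2}, \qquad d = \frac{c\,\varphi}{4\pi f_m}. \]

The second relation also explains the measurement ambiguity noted above: the phase wraps, so distances are only unambiguous up to c/(2 f_m).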
4. Structured light camera
A structured light camera is another approach to measuring depth information, using "structured light", i.e., a pattern of light such as an array of lines. The scene is viewed at an angle, as illustrated in Fig. 1.11. If the pattern is projected onto a flat wall, the camera will see straight lines, but if the scene is more complex it will see a more complex profile. By analyzing these profiles across the field of view, depth information can be calculated. With traditional methods, the structured light consists of grids or arrays of lines, but these are affected by noise. Therefore, in some newer devices such as the PrimeSense or the Microsoft Kinect v1 (see Fig. 1.10), codes are added to the light so that the camera sees almost zero repetition across the scene. The Kinect v1 uses a randomly distributed speckle pattern, and each speckle looks different at different distances, due to a special lens, as can be seen in Fig. 1.12. But this kind of depth sensor also has some limitations: the errors grow with the square of the distance to objects, there are strong quantization effects (see Fig. 1.13), and it shares some limitations with ToF systems, such as sensitivity to material reflectance and poor operation in strong sunlight.

Figure 1.8: Time of flight systems from [3]

Figure 1.9: Some ToF cameras. From left to right: DepthSense, Fotonic, Microsoft Kinect v2
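The quadratic error growth mentioned above follows from the triangulation geometry: for a sensor with focal length f, baseline b, and disparity measurement error σ_d, the depth error is approximately

\[ \sigma_Z \approx \frac{Z^2}{f\,b}\,\sigma_d, \]

so doubling the distance to an object roughly quadruples the depth uncertainty. This is the standard analysis for triangulation-based sensors, stated here as background rather than taken from [16].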
Figure 1.10: Structured light cameras. From left to right: PrimeSense, Microsoft Kinect v1
Figure 1.11: Structured light systems from [3]
Figure 1.12: Figure from [16], (A) raw IR image with pattern, (B) depth image
Figure 1.13: Figure from [16]. (A) Errors for structured light cameras; (B) quantization errors at different distances of a door: 1 m, 3 m, 5 m
5. Mobile Kinect
In my thesis, I used the Microsoft Kinect v1 as the capture device of the system due to its usability and availability. To make the Kinect more flexible, I added some components to turn it into a "Mobile Kinect": a Kinect with an external battery, so it can be moved anywhere without worrying about electrical sources (outlets, cables), and the external battery is easy to replace. To attach it to the human body, the Kinect is mounted on a single belt so that it can be fixed easily on the body. Another important part of the mobile Kinect is a laptop, which plays the role of the main processor; it contains the data acquisition and obstacle detection modules. The reason for choosing a laptop is that the Kinect is a commercial device developed for video game purposes, so it cannot operate without a PC, and because of the restriction on the Kinect data cable length, the computer must be placed near the Kinect (the whole system can be seen in Fig. 1.14).
1.2.4 Environment Context
The environment for developing and testing the system is a public building, as mentioned before. More specifically, I focus on one specific use case: walking along the corridors of a building. There are two major types of corridor in our context: corridors with walls on both sides, and half-open corridors with glass windows or an open side on one side and a solid wall on the other. In our experiments, I aim at the half-open corridor because it is a very popular type in public buildings such as schools, offices, and some apartments. In the context of my thesis, I tested the system in two different environments: one is our office building, B1 at Hanoi University of Science and Technology, and the other is the Nguyen Dinh Chieu secondary school for blind pupils (see Fig. 1.15). Because depth data is strongly affected by sunlight, the requirement on the environment is that it must not be lit too strongly (on shady days, or in buildings with walls on both sides, where sunlight cannot reach).
The use case is that the user (a visually impaired person) wants to move to another place in the building; to do that, he/she must go along the corridor, walk up/down stairs, and then move to the destination point. Ignoring the path-finding problem, my work aims only at obstacle avoidance. Obstacles in both cases are the objects that block the moving path, such as fire extinguishers, trash bins, columns, walls, and people in front of the user (as can be seen in Fig. 1.15).
Figure 1.15: Two different environments tested. (A) Our office building; (B) Nguyen Dinh Chieu secondary school
1.3 Difficult Situations Recognition System

In conclusion, my entire system for recognizing difficult situations is demonstrated in Fig. 1.16. The prototype system is fixed on the visually impaired person's body. To interact with the user, a tactile visual substitution module from [23] is used to give warnings about obstacles in front of him/her. The mobile Kinect is mounted on the human hip to capture depth and color information. This information is processed by a recognition module in the laptop carried behind the user. After an obstacle has been detected, the laptop sends a corresponding command to the tactile visual substitution module in order to deliver the warning message. The message representation is integrated into this module and presented in [23], so my main work is how to send a correct command to the feedback module.
1.4 Thesis Contributions

There are two main contributions in this thesis:
• The first is a prototype of a difficult situations recognition system that can work flexibly and stably in many different environments.
• The second is a proposed method to detect obstacles using both color and depth information together with 3D point cloud techniques, especially for the problem of early detection of up/down stairs and of handling noise in the depth data.
2 Related Works
Many devices and systems have been developed to help visually impaired people in daily life. This chapter presents some vision-technology-based research for visually impaired people that is related to my work. Each technology/device aims to cover one or more fields of the Mobility component, as shown in Fig. 1.2 in Chapter 1.
2.1 Assistive systems for visually impaired people

From the point of view of obstacle avoidance system technology, two different technologies are widely used: assistive robots that help visually impaired people with moving activities, and wearable devices. The advantages and disadvantages of each technology are discussed in the following part.
Table 2.1 shows a comparison between the two technologies.
Table 2.1: Comparison between assistive robot and wearable device

Typical examples
- Assistive robot: guided dog robot [22][21][6][17]
- Wearable device: glasses [32][20], mobile phone, white cane, sensory substitution [5][14]

Advantages
- Assistive robot: can integrate different technologies and devices; long operating time
- Wearable device: cheaper than an assistive robot; flexible; convenient to use

Disadvantages
- Assistive robot: expensive; inconvenient and hard to fit into social settings; limited environments (works almost only on flat floors)
- Wearable device: limited operating time (battery problem); limited in the technologies it can carry; may interfere with other human senses
Regarding assistive robots for visually impaired people, in [17] (2004) the authors developed a simple robot-assisted indoor navigation system, as shown in Fig. 2.1. In this work, many passive RFID tags were attached to the environment and to objects to help the navigation (or obstacle avoidance) task. To interact with the visually impaired users, speech recognition with single, simple words and wearable keyboards are used; when the robot passes an object, it can speak the object's name through a speech synthesis module. But this system is still under development and was only tested in a laboratory environment, and since the RFID tags play the most important role in the system, it is hard to apply it in real, large environments like offices and residences.
Figure 2.1: Robot-assisted navigation from [17]. (A) RFID tag; (B) robot; (C) navigation
In [6], Dalal et al. proposed a mobile application along with a robot (named NXT Robot) to help visually impaired people avoid obstacles (Fig. 2.2) in an outdoor environment. The robot and the mobile phone are connected by Bluetooth, and the interaction between human and robot uses speech communication. On the robot, the authors attached an ultrasonic sensor to detect obstacles, and they combined this information with the location sent from the mobile phone using GPS. When the user asks to go to a destination, the mobile phone finds the route with the embedded Google Maps and gives voice instructions while the user is traveling to the destination. The advantage of this system is that it uses robust navigation applications like Google Maps together with an ultrasonic sensor. But the limitation is that the system depends on GPS signal and an internet connection, so it cannot work offline or in environments with weak GPS signal. As for obstacle detection, the system only reports obstacles in front of the robot, and in complex environments it may give unreasonable instructions.
Recently, Nguyen et al. introduced a mobile robot [22], [21] for indoor navigation using computer vision techniques (see Fig. 2.3). This system consists of a PCbot-914, which is a PC attached to actuators and sensors such as wheels, ultrasonic sensors, and a normal webcam, plus a mobile phone. The visually impaired user takes the mobile phone, chooses the destination location through the touch screen, and follows the robot to the desired destination. Communication with the visually impaired user is through vibration patterns on the mobile phone. First, the environment must be modeled in an off-line phase to build a map with images and static object positions at each location. In the on-line phase, the image captured from the camera is matched against the database of the learned model to localize the robot in the building for the navigation task, while another module detects obstacles in the image in order to warn the user. However, this system only works in a limited environment, the off-line model must be built before use, and visually impaired people must be trained carefully to use the system (how to recognize a vibration pattern, the destination position on the touch screen). Also, due to the limitations of the monocular webcam, many unexpected factors such as environment and lighting changes can affect the system's results.
As for wearable devices helping visually impaired people in daily life, some existing products based on different technologies are presented next. The vOICe system [5] is a head-mounted camera which is a form of sensory substitution: it receives visual information and converts it into a sound signal that blind people can hear through stereo headphones. The main idea is that the scene or captured image can be represented as a sound pattern, where time corresponds to the horizontal position, pitch corresponds to the vertical axis of the visual pattern, and loudness stands for brightness. Consequently, for each image, with a capture rate of about one image per second, the blind person hears a sound sweeping from left to right that represents each position by a pitch and loudness level. In order to know which object/scene corresponds to which sound, the blind person must be trained before using the system. But this device also has some limitations. Firstly, auditory signals are very important for blind people; if they use this device, their hearing is blocked and they can be distracted from the natural environment.
Moreover, because the image is represented by pitch and loudness levels, the system creates a very noisy sound map when moving in a complex environment, and it is complicated for the blind user to understand the scene.
A product in development called BrainPort (from Wicab Inc.), recently approved for sale in Europe [32], uses a camera mounted on a pair of sunglasses as its input device. After image processing, images are displayed on the tongue using an electrotactile display of 49 electrodes to provide directional cues to the blind user. This is shown as a small electrical "image" on the tongue via a "lollipop"-like display, as illustrated in Fig. 2.4. The limitations of this system are that it requires use of the mouth, which reduces the abilities of blind people, especially in speaking and eating, which are very important activities that occur frequently. Another problem is that the resolution of both the electrotactile display and tongue sensitivity is still far below that of the visual system, so representing the image directly on the human tongue is not an effective way to represent the data. In my work, I use a similar device, but the output on the electrotactile display is just a defined, encoded signal, which is easier for blind people to recognize as instructions.
Very recently in Vietnam, Dr. Nguyen Ba Hai successfully developed a vision-based device named "A haptic device for blind people" [20]. This is a very low-cost, portable pair of glasses which can help visually impaired people detect an obstacle using a single laser transmitter and receiver. When the glasses detect an obstacle, they trigger a small vibrator on the forehead so that the visually impaired person can feel the obstacle. However, this device is very simple: it cannot detect potential obstacles coming from the two sides of the visually impaired person, nor can it take object information into account.
Figure 2.4: BrainPort vision substitution device [32]

2.2 RGB-D based assistive systems for visually impaired people
Nowadays, with the development of depth sensors, there are some works dedicated to RGB-D based assistive systems for visually impaired people, as follows.
NAVI [33] (Navigational Aid for Visually Impaired) is a system similar to my proposal. This system also uses a Kinect with a battery, fixed on a helmet. There are two main functions in this system, called "Micro-Navigation" and "Macro-Navigation": Micro-Navigation means obstacle avoidance, and Macro-Navigation stands for path finding. To give information about obstacles to blind people, vibrotactile output is provided by a waist belt that contains three pairs of Arduino LilyPad vibe boards. For obstacle detection, the system detects the closest obstacles in the left, right, and center regions of the Kinect's field of view using a depth histogram and triggers the vibe board in the corresponding direction. For "Macro-Navigation", the authors used fixed markers to annotate locations, detected them via the Kinect's RGB camera, and used the depth image to calculate the person's distance to the marker in order to give navigation instructions. The output of this function is a synthesized voice giving the blind user movement instructions, for example "Open the door".
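To illustrate the depth-based idea behind Micro-Navigation, the sketch below splits a depth frame into left/center/right regions and flags the nearest return in each; this is a minimal reconstruction of the idea, not NAVI's code, and the 2 m alarm threshold and the 5th-percentile robustness trick are my assumptions.

```python
import numpy as np

def nearest_obstacle_per_region(depth_mm, alarm_mm=2000):
    """Return (left, center, right) flags: True if something is closer than alarm_mm."""
    h, w = depth_mm.shape
    flags = []
    for region in np.array_split(np.arange(w), 3):    # left / center / right columns
        d = depth_mm[:, region]
        d = d[d > 0]                                  # drop invalid (zero) readings
        # Use a low percentile instead of min() so isolated noisy pixels don't trigger.
        nearest = np.percentile(d, 5) if d.size else np.inf
        flags.append(nearest < alarm_mm)
    return tuple(flags)
```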
In [8], the authors present a low-cost system using the Kinect sensor for obstacle avoidance for visually impaired people. The Kinect sensor is mounted on the body using a belt; it is connected to a battery pack and to a computing device (a smartphone) which provides audio feedback to the user. The system performs five main tasks: i) read the data from the Kinect device and express it as a 3D point cloud; ii) detect the floor and register the data in a reference system centered at the user's feet; iii) detect the occupancy of the volume in front of the user; at this step the space is subdivided into a number of cells, each corresponding to a vertical volume spanning a range of possible directions; iv) analyze the output of the accelerometer to determine whether the user is walking and how fast; v) provide the feedback to the user.
Tang et al. also presented an RGB-D sensor based computer vision device to improve the performance of visual prostheses (retinal prosthesis or tongue stimulator) [27]. First, patch-based stereo vision is applied to create RGB-D data which is already segmented, based on color segmentation, feature point matching, and plane fitting and merging using RANSAC. Then a technique called "smart sampling" highlights the important information; this step includes background removal, parallax simulation, object highlighting, and path directions. In the final step, the information is represented using the BrainPort device for line orientation and navigation.
In addition, my thesis is motivated by and inherits from the work of Michiel et al. [30]. In this work, the authors proposed an obstacle detection algorithm based on point clouds, running with a Kinect. An obstacle is defined as an obstacle on the floor plane, a door, or a stair; however, the authors only tested stair and door detection independently, alongside obstacle detection. Fig. 2.5 shows the main process of the obstacle detection algorithm, which is partially similar to my proposal in Section 3.1. First, the point cloud is down-sampled and filtered to reduce processing time. Then, using RANSAC, the ground plane and wall planes are removed from the point cloud, and clustering techniques detect obstacles in the remaining point cloud data. For stair detection, a process similar to ground plane detection is used, and with some pre-defined parameters such as step height and step number, stairs are detected from the point cloud. For door detection, based on the observation that a door is always on a wall plane, the authors segment the wall plane based on color to find a door region. The results obtained when applying this algorithm to the authors' database are very promising. However, because this algorithm relies almost entirely on plane segmentation and ground plane detection, when applied to our dataset (the MICA dataset, presented in Section 4.1), where the point cloud data on the floor plane is poor because of motion and lighting conditions, the achieved results are still low. And with pre-defined parameters such as step height, the stair detection algorithm is not robust when applied in our environment, where the stair specifications can change.
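For reference, the RANSAC plane-removal step used in [30] (and later in my pipeline) can be sketched generically as follows; this is a plain NumPy RANSAC plane fit with an assumed 2 cm inlier threshold, not the implementation of [30].

```python
import numpy as np

def ransac_plane(points, n_iters=200, threshold=0.02, seed=0):
    """Fit a plane n.x + d = 0 to an Nx3 cloud; return (n, d) and the inlier mask."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_plane = None
    for _ in range(n_iters):
        p0, p1, p2 = points[rng.choice(len(points), size=3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:                      # degenerate (collinear) sample
            continue
        n = n / norm
        d = -n.dot(p0)
        inliers = np.abs(points @ n + d) < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, (n, d)
    return best_plane, best_inliers

# Usage: remove the dominant (ground) plane, keep the rest for obstacle clustering.
# plane, mask = ransac_plane(cloud); remaining = cloud[~mask]
```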
Figure 2.5: Obstacle detection process from [30]

2.3 Stair Detection

Stair detection in an image is a classical problem in computer vision, since stairs are familiar objects in daily life. The most prominent characteristic of a stair is its rigid form with the repetitive structure of the stair planes; consequently, many lines and edges appear in a stair image. Therefore, stair detection is an interesting topic for researchers using traditional computer vision techniques like line detection, edge detection, frequency analysis, and image segmentation.
In [26] (2000), before the Hough line detection algorithm was widely used for this problem, the authors proposed a method to detect stairs in a gray-scale image using basic image processing operators together with constraints to find the stairs in the image. The stair images are outdoor scenes with good lighting conditions, where the edges on the stair planes are quite clear. The authors first use Canny detection to extract the edges, and Gabor filters in the horizontal and vertical directions to focus on the two main directions; they then find concurrent lines (hence finding the vanishing point) as hypotheses for staircases. However, since this algorithm is based on simple operators and rules, it is very sensitive to parameters (of the Canny detector, the Gabor filter, and the line lengths), and detection is poor when another object with concurrent lines appears in the image. Fig. 2.6 illustrates the results of this system.
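A minimal sketch of that edge-extraction stage is shown below, combining OpenCV's Canny detector with one oriented Gabor kernel; all parameter values here are illustrative assumptions, not those of [26].

```python
import cv2
import numpy as np

gray = cv2.imread("stair.png", cv2.IMREAD_GRAYSCALE)

# Edges from the Canny detector (thresholds are illustrative).
edges = cv2.Canny(gray, 50, 150)

# One oriented Gabor kernel; [26] uses both horizontal and vertical orientations.
kernel = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=np.pi / 2,
                            lambd=10.0, gamma=0.5, psi=0)
response = cv2.filter2D(gray.astype(np.float32), -1, kernel)

# Keep edge pixels that also have a strong response at this orientation.
mask = (np.abs(response) > np.abs(response).mean()) & (edges > 0)
```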
Recently, with the development of sensors able to capture depth information, stair detection can be done efficiently in 3D, and it remains a good research topic regarding plane detection and plane segmentation in 3D. Among stair detection systems using RGB or RGB-D data, some works aim to develop systems which can detect stairs early in order to give moving instructions to blind people or to control a robot automatically; they are presented in the following part.

Figure 2.6: Stair detection from [26]. (A) Input image; (B)(C) frequency as output of the Gabor filter; (D) stair detection result
In [13], the authors proposed an approach to detect descending stairs to control an autonomous tracked vehicle using only a monocular gray-scale camera. First, the stair is coarsely detected while it is still far from the robot. This is called the "far approach", which uses optical flow and texture information to predict the stair. It relies on the observation that if the stair is descending, the region above it is normally a wall with low texture. Another observation is that, when moving, if a stair appears, the optical flow changes very quickly from the ground region to the wall region, because there is a significant change in the depth of the image. This step is illustrated in Fig. 2.7 below.

In the near approach, constraints are applied to the detected lines, using the median flow of the regions above and below each line and the line's length, to look for the stair edge.
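The optical-flow cue could be prototyped as below with OpenCV's Farneback dense flow; comparing the median flow magnitude just above and below a candidate edge row is my illustrative reading of the cue, not the exact test in [13].

```python
import cv2
import numpy as np

prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                    pyr_scale=0.5, levels=3, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
magnitude = np.linalg.norm(flow, axis=2)

def flow_contrast(row, band=10):
    """Median flow just above vs. just below a candidate stair-edge row."""
    above = np.median(magnitude[max(0, row - band):row])
    below = np.median(magnitude[row:row + band])
    return below - above   # a large value suggests a depth discontinuity at this row
```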
Figure 2.7: Near-approach stair detection in [13]. (A) Input image with detected stair region; (B) texture energy; (C) input image with detected lines as stair candidates; (D) optical flow maps; there is a significant change along the line at the stair edge
In [24], the authors proposed a method to detect and model a stair using a depth sensor. The depth image is used to create point cloud data. First, plane segmentation and classification are applied to find planes in the scene; this step includes normal estimation, region growing, a planarity test, plane extension, cluster extraction, and classification (see Fig. 2.8).

After the planes have been segmented, the stair is modeled as a group of parallel planes. Each plane must satisfy some conditions on size, height, and orientation, as shown in Fig. 2.9.

The advantages of this method are the robustness of depth information and that the stair is modeled very explicitly using many conditions applied to the stair planes. But these constraints also require the stair planes to be clearly visible in the images, whereas in a real environment, due to occlusion and camera viewpoint, these conditions may not be satisfied.
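A much-simplified version of such plane constraints is sketched below: near-horizontal planes are kept and stacked into steps when consecutive heights differ by a plausible riser. The up direction and the 0.12-0.20 m riser range are assumed examples, not the thresholds of [24].

```python
import numpy as np

def group_step_planes(planes, up=np.array([0.0, 1.0, 0.0]),
                      min_rise=0.12, max_rise=0.20, max_tilt_deg=10.0):
    """planes: list of (unit_normal, centroid) pairs; return planes stacked like steps."""
    horizontal = [(n, c) for n, c in planes
                  if np.degrees(np.arccos(np.clip(abs(n @ up), 0.0, 1.0))) < max_tilt_deg]
    horizontal.sort(key=lambda p: p[1] @ up)          # order planes by height
    steps = []
    for i in range(1, len(horizontal)):
        rise = (horizontal[i][1] - horizontal[i - 1][1]) @ up
        if min_rise <= rise <= max_rise:              # plausible riser height
            steps.append(horizontal[i])
    return steps
```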
Figure 2.8: Example of segmentation and classification in [24]

Figure 2.9: Stair modeling (left) and features in each plane [24]
Yingli Tian [29] proposed a mixed approach using both the RGB and depth images to find stairs and crosswalks. In the first step, parallel lines are detected in the RGB image to find a group of concurrent lines, using the Hough transform and line fitting with geometric constraints. This step is illustrated in Algorithm 1.
Algorithm 1 Parallel line detection from [29]

1: Detect edge maps from the RGB image by edge detection
2: Compute the Hough transform of the RGB image to obtain the directions of the lines
3: Calculate the peaks in the Hough transform matrix
4: Extract line segments and their directions in the RGB image
5: Group line fragments into the same line if the gap is less than a threshold
6: Detect a group of parallel lines based on constraints such as the length and the total number of detected lines of stairs and pedestrian crosswalks
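The sketch below follows the spirit of Algorithm 1 with standard OpenCV primitives (Canny edges, probabilistic Hough segments, grouping by orientation); the thresholds are illustrative, and the geometric constraints of [29] are reduced here to simple angle binning and a minimum line count.

```python
import cv2
import numpy as np

gray = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(gray, 50, 150)                                   # step 1: edge map

segments = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=60,
                           minLineLength=40, maxLineGap=10)        # steps 2-5

# Step 6 (simplified): bin segments by orientation (5-degree bins) and keep the
# largest near-parallel group, requiring a minimum count as stair patterns do.
groups = {}
for x1, y1, x2, y2 in (segments.reshape(-1, 4) if segments is not None else []):
    angle = np.degrees(np.arctan2(y2 - y1, x2 - x1)) % 180
    groups.setdefault(round(angle / 5) * 5, []).append((x1, y1, x2, y2))

parallel_group = max(groups.values(), key=len, default=[])
is_stair_candidate = len(parallel_group) >= 5
```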
Then they extract the depth information along each line and feed it into a support vector machine (SVM) classifier to distinguish stairs (both upstairs and downstairs) from pedestrian crosswalks (as seen in Fig. 2.10). Finally, they estimate the distance between the camera and the stairs in order to give warning messages to the blind user. In this paper, the authors presented a robust stair detection algorithm that takes advantage of both color and depth information. However, both the color and depth images must be clear to get a correct stair detection, where the color information gives stair candidates and the depth information is used to confirm whether each candidate is a stair or not. In my case, given the limited measurable range of the Kinect depth sensor, the depth image is not always available on the stair surface, and the edge detection algorithm is still sensitive to parameters.
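For completeness, the classification step could be prototyped as below with scikit-learn, treating the depth values sampled along each candidate line as a fixed-length feature vector; the file names, feature layout, and label encoding are assumptions for illustration, not the exact features of [29].

```python
import numpy as np
from sklearn.svm import SVC

# Each row: depth profile sampled at 32 points along one detected line;
# label 0 = crosswalk, 1 = upstair, 2 = downstair (assumed encoding).
X_train = np.load("depth_profiles.npy")      # shape (n_samples, 32), hypothetical file
y_train = np.load("labels.npy")

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

def classify_line(profile_mm):
    """Classify one depth profile (length-32 vector of millimeters)."""
    return clf.predict(profile_mm.reshape(1, -1))[0]
```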
3 Obstacle Detection

3.1 Overview
The entire system flowchart can be seen in Fig. 3.1. There are three main parts in this flowchart: the Kinect side, the laptop side, and the user interface (or feedback module) side; the laptop side contains almost all of the processing modules.
First, the laptop captures data from the Kinect. By default, the Kinect provides many data types, such as images, sound, skeleton, and accelerometer data, but in my work only the depth image, the color image, and the accelerometer data are used. Then humans are detected in the depth image: the Kinect SDK provides a human index encoded in each pixel of the depth image. This information is calculated from the depth image, so it is very robust. After a human has been detected, the system stores that information as the first detected obstacle.
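As an illustration of how that per-pixel human index can be read: in the Kinect SDK v1 depth stream, each 16-bit value packs the player index in the three low bits and the depth in millimeters in the upper bits, so the two can be separated with bit operations (a sketch assuming the raw packed format is exposed to the application):

```python
import numpy as np

def unpack_kinect_v1_depth(packed):
    """Split a raw Kinect v1 depth frame (uint16 HxW) into depth and player index.

    Bits 0-2 hold the player index (0 = no person, 1-6 = tracked people);
    bits 3-15 hold the depth in millimeters.
    """
    packed = packed.astype(np.uint16)
    player_index = packed & 0x7
    depth_mm = packed >> 3
    human_mask = player_index > 0      # first "obstacle": pixels belonging to a person
    return depth_mm, human_mask
```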
The next module is obstacle detection in both the color and depth images using point cloud techniques. All the captured information is used to build a point cloud, a collection of 3D points, in order to reconstruct the environment around the blind user. Then several techniques are applied to the point cloud to find obstacles: plane segmentation, ground and wall plane detection, and then obstacle detection. The output of this module is the detected obstacles, including normal obstacles and stairs.
Another individual module is stair detection based on the color image. This module detects stairs directly in the color image using line extraction techniques and the geometric relationships between lines.
In the obstacle fusion module, all the obstacles are checked again for their presence in the scene, and the most important (or most dangerous) obstacle is returned to the blind user by sending commands to the user interface module through the obstacle warning module.