MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY
Hoang Van Nam
DIFFICULT SITUATIONS RECOGNITION SYSTEM FOR VISUALLY-IMPAIRED AID USING A MOBILE KINECT
Department: COMPUTER SCIENCE
Independence – Freedom – Happiness

CONFIRMATION OF REVISIONS TO THE MASTER'S THESIS

Full name of the thesis author: ……… ………
Thesis title: ……… ……… ….
Major: ……… ……… …
Student ID: ……… ……… …

The author, the scientific supervisor, and the Thesis Examination Committee confirm that the author has corrected and supplemented the thesis according to the minutes of the Committee meeting on ……………, with the following contents: ……… …………
……… ………
……… ………
……… ………
……… ………

Day …… Month …… Year ……
CHAIR OF THE COMMITTEE
Declaration of Authorship

I, Hoang Van Nam, declare that this thesis, titled 'Difficult situations recognition for visually-impaired aid using mobile Kinect', and the work presented in it are my own. I confirm that:

• This work was done wholly or mainly while in candidature for a research degree at this University.
• Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.
• Where I have consulted the published work of others, this is always clearly attributed.
• Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.
• I have acknowledged all main sources of help.
• Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.
Signed:
Date:
International Research Institute MICA
Computer Vision Department

Master of Science

Difficult situations recognition for visually-impaired aid using mobile Kinect

by Hoang Van Nam

Abstract
By 2014, according to figures from several organizations, there were more than one million people in Vietnam living with sight loss, about 1.3% of the Vietnamese population. Despite the big impact on daily living, especially on the ability to move, read, and communicate with others, only a small percentage of blind or visually impaired people live with an assistive device or animal such as a guide dog. Motivated by the significant changes in technology that have taken place in the last decade, especially the introduction of various types of sensors as well as the development of the field of computer vision, I present in this thesis a difficult situations recognition system for visually impaired aid using a mobile Kinect. The system is based on data captured from the Kinect and uses computer vision techniques to detect obstacles. In the current prototype, I focus only on detecting obstacles in indoor environments such as public buildings, and two types of obstacle are exploited: general obstacles in the moving path, and staircases, which pose a big danger to visually impaired people. 3D imaging techniques, including plane segmentation and 3D point clustering, were used to detect general obstacles, and a mixed strategy between the depth and color images is used to detect staircases, based on detecting the stair edges and their structure. The system is very reliable, with a detection rate of about 82.9% and a processing time of 493 ms per frame.
Acknowledgements

I am so honored to be here for the second time, in one of the finest universities in Vietnam, to write these grateful words to the people who have been supporting and guiding me from the very first moment when I was a university student until now, when I am writing my master thesis.
I am grateful to my supervisor, Dr. Le Thi Lan, whose expertise, understanding, generous guidance, and support made it possible for me to work on a topic that was of great interest to me. It was a pleasure to work with her.

Special thanks to Dr. Tran Thi Thanh Hai, Dr. Vu Hai, and Dr. Nguyen Thi Thuy (VNUA), and to all the members of the Computer Vision Department, MICA Institute, for their sharp comments and guidance on my work, which helped me a lot in learning how to study and do research in the right way, and also for the valuable advice and encouragement they gave me during my thesis.
I would like to express my gratitude to Prof. Peter Veelaert, Dr. Luong Quang Hiep, and Mr. Michiel Vlaminck at Ghent University, Belgium, for their support. It has been a great honor to cooperate and work with them.
Finally, I would especially like to thank my family and friends for the continuous love and support they have given me throughout my life, helping me pass through all the frustration, struggle, and confusion. Thanks for everything that helped me get to this day.
Hanoi, 19/02/2016
Hoang Van Nam
Contents

Declaration of Authorship

1 Introduction
  1.1 Motivation
  1.2 Definition
    1.2.1 Assistive systems for visually impaired people
    1.2.2 Difficult situations
    1.2.3 Mobile Kinect
    1.2.4 Environment Context
  1.3 Difficult Situations Recognition System
  1.4 Thesis Contributions

2 Related Works
  2.1 Assistive systems for visually impaired people
  2.2 RGB-D based assistive systems for visually impaired people
  2.3 Stair Detection

3 Obstacle Detection
  3.1 Overview
  3.2 Data Acquisition
  3.3 Point Cloud Registration
  3.4 Plane Segmentation
  3.5 Ground & Wall Plane Detection
  3.6 Obstacle Detection
  3.7 Stair Detection
    3.7.1 Stair definition
    3.7.2 Color-based stair detection
    3.7.3 Depth-based stair detection
    3.7.4 Result fusion
  3.8 Obstacle information representation

4 Experiments
  4.1 Dataset
  4.2 Difficult situation recognition evaluation
    4.2.1 Obstacle detection evaluation
    4.2.2 Stair detection evaluation

5 Conclusions and Future Works
  5.1 Conclusions
  5.2 Future Works
List of Figures

1.1 A Comprehensive Assistive Technology (CAT) Model provided by [12]
1.2 A model for activities attribute and mobility provided by [12]
1.3 Distribution of frequencies of head-level accidents for blind people [18]
1.4 Distribution of frequencies of tripping resulting in a fall [18]
1.5 A typical example of a depth image: (A) raw depth image, (B) depth image visualized by a jet color map, where the colorbar shows the real distance for each color value, (C) reconstructed 3D scene
1.6 A stereo image pair taken from the OpenCV library and the calculated depth image: (A) left image, (B) right image, (C) depth image (disparity map)
1.7 Some existing stereo cameras. From left to right: Kodak stereo camera, View-Master Personal stereo camera, ZED, Duo 3D Sensor
1.8 Time of flight systems from [3]
1.9 Some ToF cameras. From left to right: DepthSense, Fotonic, Microsoft Kinect v2
1.10 Structured light cameras. From left to right: PrimeSense, Microsoft Kinect v1
1.11 Structured light systems from [3]
1.12 Figure from [16]: (A) raw IR image with pattern, (B) depth image
1.13 Figure from [16]: (A) errors for structured light cameras, (B) quantization errors at different distances of a door: 1 m, 3 m, 5 m
1.14 Prototype of the system using mobile Kinect: (A) Kinect with battery and belt, (B) backpack with laptop, (C) mobile Kinect mounted on the human body
1.15 Two different environments tested: (A) our office building, (B) Nguyen Dinh Chieu secondary school
1.16 Prototype of our obstacle detection and warning system
2.1 Robot-assisted navigation from [17]: (A) RFID tag, (B) robot, (C) navigation
2.2 NXT Robot system from [6]: (A) the system's block diagram, (B) NXT Robot
2.3 Mobile robot from [22], [21]
2.4 BrainPort vision substitution device [32]
2.5 Obstacle detection process from [30]
2.6 Stair detection from [26]: (A) input image, (B)(C) frequency as output of the Gabor filter, (D) stair detection result
2.7 Near-approach stair detection in [13]: (A) input image with detected stair region, (B) texture energy, (C) input image with detected lines as stair candidates, (D) optical flow maps; there is a significant change along the line at the stair edge
2.8 Example of segmentation and classification in [24]
2.9 Stair modeling (left) and features in each plane [24]
2.10 Stair detection algorithm proposed in [29]: (A) detected lines in the edge image (using color information), (B) depth profiles along each line (red: pedestrian crosswalk, blue: down stair, green: up stair)
3.1 Obstacle detection flowchart
3.2 Kinect mounted on the body
3.3 Coordinate transformation process
3.4 Kinect coordinate system
3.5 Point cloud rotation using the normal vector of the ground plane (white arrow): left: before rotating, right: after rotating
3.6 Normal vector estimation algorithms [15]: (a) the normal vector of the center point can be calculated as the cross product of two vectors through four neighboring points (red), (b) normal vector estimation in a scene
3.7 Plane segmentation result using the algorithm proposed in [15]; each plane is represented by a distinctive color
3.8 Detected ground and wall planes (ground: blue, wall: red)
3.9 Human segmentation data by the Microsoft Kinect SDK: (a) color image, (b) human mask
3.10 Detected obstacles: (a) color image, (b) detected obstacles
3.11 Model of a stair
3.12 Coordinate transformation models from [7]
3.13 Projective chirping: (a) a real-world object that generates a projection with "chirping", i.e., "periodicity-in-perspective", (b) center raster of the image, (c) best-fit projective chirp
3.14 A pinhole camera model with a stair
3.15 A vertical Gabor filter kernel
3.16 Gabor filter applied on a color image: (a) original, (b) filtered image
3.17 Thresholding the grayscale image: (a) original, (b) thresholded image
3.18 Example of thinning an image using morphological operations
3.19 Thresholding the grayscale image: (a) original, (b) thresholded image
3.20 Six points voting for a line make an intersection in Hough space; this intersection has higher intensity than neighboring pixels
3.21 Hough space: (a) a line in the original space, (b) three curves voting for this line in Hough space
3.22 Hough space of a stair image: (a) original image, (b) Hough space
3.23 Chirp pattern detection: (a) Hough space, (b) original image with the detected chirp pattern
3.24 Point cloud of a stair: (a) original color image, (b) point cloud data created from the color and depth images
3.25 Detected steps
3.26 Detected planes
3.27 Detected stair on the point cloud
3.28 Obstacle position quantization for sending warning messages to visually impaired people
4.1 Depth image encoding: (A) original, (B) visualized image, (C) encoded image
4.2 Detection time of each step in our proposed method
4.3 Example stair images for evaluation: (A) positive sample from the MICA dataset, (B) negative sample from the MICA dataset, (C) positive sample from the MONASH dataset, (D) negative sample from the MONASH dataset
4.4 Detected stair in Tian's method (A-F) and in my proposed method (G-I): (A) color image, (B) depth image, (C) edges, (D) line segments, (E) detected concurrent lines, (F) depth values on detected lines, (G) detected stair, where blue lines are false stair edges and green lines are stair edges, (H) edge image, (I) detected peaks in the Hough map corresponding to the lines in (G)
4.5 Missed detection in Tian's method because of missing depth on the stair (A-F) and detected stair in my proposed method (G-I)
4.6 Missed detection in Tian's method because of missing depth on the stair (A-F) and detected stair in my proposed method (G-I)
List of Tables

2.1 Comparison between assistive robot and wearable device
4.1 Database specifications
4.2 Pixel-level evaluation result (TP, FP, FN: million pixels)
4.3 Object-level evaluation result (TP, FP, FN: objects)
4.4 Stair dataset for evaluation
4.5 Stair detection results of the proposed method on different datasets
4.6 Comparison of the proposed method and the method of Tian et al. [29] on the MICA dataset
Abbreviations

PCL     Point Cloud Library
CAT     Comprehensive Assistive Technology
TDU     Tongue Display Unit
IR      Infrared
OpenCV  Open Source Computer Vision (library)
RGB     Red Green Blue
RGB-D   Red Green Blue and Depth
ToF     Time of Flight
1 Introduction

1.1 Motivation

According to official statistics of the National Eye Hospital in 2002, Vietnam had about 900,000 blind people, including about 400,000 who are totally blind. By 2014, according to figures from several organizations, the number of blind people in Vietnam was about 1.2 to 1.4 million, still a large number in comparison with other countries. Worldwide, the visually impaired population is estimated at more than 285 million according to an investigation of the World Health Organization (August 2014)¹. About 90% of them live in developing countries in low-income settings. Visual impairment has a big impact on daily living: affected people cannot read documents, and their ability to move and to communicate with other people is compromised, because information is received primarily through vision. All of these factors have made blindness a public health problem all over the world.
Nowadays, with significant developments in technology, many assistive devices have been released to help visually impaired people in daily life. Although many researchers and companies are concerned with making better and cheaper devices to improve the comfort of visually impaired people, research in this field still contains many unsolved issues, and in general those devices still cannot replace traditional aids such as the white cane or the guide dog.

Motivated by the significant changes in technology that have taken place in the last decade, especially the introduction of various types of sensors as well as the development of the field of computer vision, my thesis aims to build a prototype system to help visually impaired people avoid obstacles in the environment using a Kinect sensor.
¹ http://www.who.int/mediacentre/factsheets/fs282/en/
With the Kinect, the benefit is that we can build a reliable system at an affordable price by using depth and color information to detect obstacles. In my thesis, due to the lack of time, I focus only on indoor environments, more specifically public buildings such as apartments or offices, in order to detect general objects encountered on the moving path as well as stairs, which may endanger visually impaired people.
My thesis is organized as follows:

First, I shall give some definitions for the context of my work and state the contributions of this thesis.

In Chapter 2, I shall briefly review other works related to my system, such as existing assistive devices and obstacle detection algorithms/systems, with their advantages and disadvantages.

In Chapter 3, a framework for obstacle detection will be developed; I shall present the details of each module as well as the entire system, analyzing and assessing them.

In the next chapter, I shall give the experimental results of my system, including how the dataset was prepared, how the evaluation was performed, and the final results.

In the final chapter, I end this work by giving some conclusions and future work to make the system more complete and effective.
1.2 Definition

1.2.1 Assistive systems for visually impaired people
According to [12], assistive systems for visually impaired people can be understood as equipment, devices, or systems which can be used to overcome the gap between what a disabled person wants to do and what society allows them to do. In short, such a system must be able to help visually impaired people do the things that sighted people can do. Such a system can be modeled by the Comprehensive Assistive Technology (CAT) Model, as shown in Fig. 1.1. The top level of this model consists of four components that can be used to define all assistive technology systems:
• Context (in which the assistive technology will be used)
• Person (what kind of user can use this system)
• Activities (what activities the assistive system can help visually impaired people with; see Fig. 1.2 for details)
• Assistive Technology (the technology used to build the system)
Most existing systems aim at solving one specific aspect of each branch in the model: they work in a bounded, well-defined context, with certain types of users, to help them with specific activities in daily life. In the framework of my master thesis, to simplify the system, I focus only on certain aspects of this model, which I will explain in detail in the next sections. In short, I apply my system in a local context, a small public building such as an office or a department, and the users are visually impaired students at the Nguyen Dinh Chieu secondary school, who are helped to avoid obstacles along their moving path.
Figure 1.1: A Comprehensive Assistive Technology (CAT) Model provided by [12]
1.2.2 Difficult situations
Fig. 1.2 shows detailed information on the activities branch of the CAT model (see Fig. 1.1). As shown in the figure, there are many services that assistive systems for visually impaired people can provide, such as mobility, daily living, cognitive activities, education and employment, recreational activities, and communication and access to information. But most existing works focus on the mobility component of the activities model because of its important role in the daily life of visually impaired people.
Figure 1.2: A model for activities attribute and mobility provided by [12]
According to the 2011 survey of R. Manduchi [18], with 300 respondents who are blind or legally blind, half of the respondents said that they had a head-level accident at least once a week, and about 30% of respondents fell down at least once a month (see Fig. 1.3 and Fig. 1.4). Therefore, helping visually impaired people move safely has always been a topic of interest for researchers, social organizations, and companies. In fact, many products have been released, some with particular success, like the systems proposed in [11], [10], [1], and [4].
Figure 1.3: Distribution of frequencies of head-level accidents for blind people [18]
Figure 1.4: Distribution of frequencies of tripping resulting in a fall [18]
In the context of my thesis, I aim to develop a system which can detect the obstacles in a visually impaired person's moving path, which are the main cause of the accidents mentioned above. The scenario in this project is that a visually impaired person wants to move along the hallway inside a public building, so he/she needs to avoid obstacles, both moving and static objects, and to go up/down stairs. An obstacle in my case is defined as an object lying on the ground or in front of the visually impaired person that could harm him/her if encountered while moving. Although the obstacle's class is very important for visually impaired people in order to distinguish which objects are more dangerous, in my work I only try to detect obstacles in the scene without naming them (i.e., without classification). Within the framework of this thesis, I also focus on detecting another special object that often appears in buildings and is very dangerous for visually impaired people: the stair. Moreover, the proposed system only gives warnings to the blind user through the Tongue Display Unit (TDU), which was developed by Thanh-Huong Nguyen in 2013 [23]. In brief, my proposed system aims to solve two aspects of the mobility component of the activities model (see Fig. 1.2): obstacle avoidance, and movement on ramps, slopes, stairs & hills; for the second aspect, the current system stops at the level of giving a warning about the distance to stairs, in order to assist visually impaired people in going up/down them.
1.2.3 Mobile Kinect
1. Introduction

To assist visually impaired persons in those difficult situations, in my thesis I propose using a Kinect sensor to capture information about the environment in order to detect obstacles if they appear. There are many advantages to using the Kinect in this system, since it is a popular RGB-D camera with a cheap price. But first, I shall give some brief information about depth cameras, of which the Kinect is a typical example.
A depth camera is a sensor which has the capacity to provide depth information (a depth image or depth map). A depth map is an image that contains information relating to the distance of the surfaces of scene objects from a viewpoint; see the example in Fig. 1.5. The intensity value of each pixel in a depth map represents the distance from a point on the object to the camera. Therefore, the 3D information of the scene can be reconstructed using the depth image (as shown in Fig. 1.5-C). A further benefit of the depth image is that it is not affected by lighting conditions.
Figure 1.5: A typical example of a depth image. (A) Raw depth image; (B) depth image visualized by a jet color map, where the colorbar shows the real distance for each color value; (C) reconstructed 3D scene
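To make the reconstruction in Fig. 1.5-C concrete: under the pinhole model, a pixel (u, v) with depth Z back-projects to X = (u − cx)·Z/fx and Y = (v − cy)·Z/fy. The following is a minimal sketch in Python with NumPy; the intrinsic values fx, fy, cx, cy are generic Kinect-v1-style placeholders, not the calibration of my device.

```python
import numpy as np

# Hypothetical Kinect-v1-like intrinsics (placeholders, not calibrated values).
FX, FY = 525.0, 525.0   # focal lengths in pixels
CX, CY = 319.5, 239.5   # principal point

def depth_to_point_cloud(depth_mm):
    """Back-project a depth image (millimeters, HxW) to an Nx3 point cloud in meters."""
    h, w = depth_mm.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_mm.astype(np.float32) / 1000.0      # mm -> m
    valid = z > 0                                 # zero means "no measurement"
    x = (u - CX) * z / FX
    y = (v - CY) * z / FY
    return np.stack([x[valid], y[valid], z[valid]], axis=1)
```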
In recent years, with the development of technology, especially in the sensor fabrication industry, many cameras capable of capturing depth information have been placed on the market. Those devices can be separated into several groups by the technology used: stereo cameras (the ZED, for example), Time-of-Flight (ToF) cameras like the ZCam, structured light cameras like the Kinect, and long-range 3D cameras. Each device has its own advantages and disadvantages and is only suitable for particular use cases.
2. Stereo Camera
A stereo camera is a kind of camera that has been used in robotics since its early days. Taking the idea of human binocular vision, it contains two or more cameras with precisely known relative offsets. Depth information can be calculated by matching similar points in the overlapped region between the images; the 3D distance to matched points can then be determined using triangulation, as illustrated in Fig. 1.6. However, the cameras used in this case are still color cameras; as a result, the system is still affected by changing lighting conditions. Moreover, since the depth image is calculated by matching algorithms, it works very poorly when the scene is texture-less, for example images of walls or buildings. Many stereo cameras are available on the market due to the ease of manufacturing, such as the Kodak stereo camera, the View-Master Personal stereo camera, the ZED, and the Duo 3D Sensor, as illustrated in Fig. 1.7.
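The triangulation mentioned above reduces, for a rectified pair, to Z = f·B/d, where f is the focal length in pixels, B the baseline between the cameras, and d the disparity of a matched point. Below is a minimal sketch using OpenCV's semi-global block matcher; the focal length and baseline are assumed example values, not those of any specific camera.

```python
import cv2
import numpy as np

# Hypothetical calibration values for illustration.
FOCAL_PX = 700.0      # focal length in pixels
BASELINE_M = 0.12     # distance between the two cameras in meters

# left/right must be rectified grayscale images of the same size.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=9)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # fixed-point -> pixels

# Depth by triangulation: Z = f * B / d (valid only where disparity > 0).
depth_m = np.where(disparity > 0, FOCAL_PX * BASELINE_M / disparity, 0.0)
```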
Figure 1.6: A stereo image pair taken from the OpenCV library and the calculated depth image. (A) Left image; (B) right image; (C) depth image (disparity map)
Figure 1.7: Some existing stereo cameras. From left to right: Kodak stereo camera, View-Master Personal stereo camera, ZED, Duo 3D Sensor
3. Time of Flight (ToF) camera
Time of Flight (ToF) cameras use the same principle as laser radar, except that instead of transmitting a single beam, short pulses of infrared (IR) light are sent. The camera measures the return time at pixels across its field of view, and the distance is measured by comparing the phase of the modulated return pulses with those emitted by the laser (Fig. 1.8). But ToF cameras also suffer from limitations similar to other time-of-flight sensors, including ambiguity of measurements, multiple reflections, sensitivity to material reflectance and background lighting, and poor operation outdoors in strong sunlight. Some popular ToF cameras are the DepthSense, the Fotonic, and the Microsoft Kinect v2 (see Fig. 1.9).
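To make the principle concrete: for pulsed ToF the distance follows from the round-trip time Δt, and for continuous-wave ToF from the phase shift φ of a signal modulated at frequency f_m,

\[ d = \frac{c\,\Delta t}{2}, \qquad d = \frac{c\,\varphi}{4\pi f_m}. \]

The second relation also explains the measurement ambiguity noted above: the phase wraps, so distances are only unambiguous up to c/(2 f_m).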
4. Structured light camera
A structured light camera is another approach to measuring depth information, using "structured light", i.e., a pattern of light such as an array of lines. The scene is viewed at an angle, as illustrated in Fig. 1.11. If the pattern is projected onto a flat wall, the camera will see straight lines, but if the scene is more complex it will see a more complex profile. By analyzing these profiles across the field of view, depth information can be calculated. With traditional methods, the structured light consists of grids or arrays of lines, but these are affected by noise. Therefore, in some newer devices such as the PrimeSense or the Microsoft Kinect v1 (see Fig. 1.10), codes are added to the light so that the camera sees almost zero repetition across the scene. The Kinect v1 uses a randomly distributed speckle pattern, and each speckle looks different at different distances, due to a special lens, as can be seen in Fig. 1.12. But this kind of depth sensor also has some limitations: the errors grow with the square of the distance to objects, there are strong quantization effects (see Fig. 1.13), and it shares some limitations with ToF systems, such as sensitivity to material reflectance and poor operation in strong sunlight.

Figure 1.8: Time of flight systems from [3]

Figure 1.9: Some ToF cameras. From left to right: DepthSense, Fotonic, Microsoft Kinect v2
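The quadratic error growth mentioned above follows from the triangulation geometry: for a sensor with focal length f, baseline b, and disparity measurement error σ_d, the depth error is approximately

\[ \sigma_Z \approx \frac{Z^2}{f\,b}\,\sigma_d, \]

so doubling the distance to an object roughly quadruples the depth uncertainty. This is the standard analysis for triangulation-based sensors, stated here as background rather than taken from [16].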
Figure 1.10: Structured light cameras. From left to right: PrimeSense, Microsoft Kinect v1
Figure 1.11: Structured light systems from [3]
Figure 1.12: Figure from [16], (A) raw IR image with pattern, (B) depth image
Figure 1.13: Figure from [16]. (A) Errors for structured light cameras; (B) quantization errors at different distances of a door: 1 m, 3 m, 5 m
5. Mobile Kinect
In my thesis, I used the Microsoft Kinect v1 as the capture device of the system due to its usability and availability. To make the Kinect more flexible, I added some components to turn it into a "Mobile Kinect": a Kinect with an external battery, so it can be moved anywhere without worrying about electrical sources (outlets, cables), and the external battery is easy to replace. To attach it to the human body, the Kinect is mounted on a single belt so that it can be fixed easily on the body. Another important part of the mobile Kinect is a laptop, which plays the role of the main processor; it contains the data acquisition and obstacle detection modules. The reason for choosing a laptop is that the Kinect is a commercial device developed for video game purposes, so it cannot operate without a PC, and because of the restriction on the Kinect data cable length, the computer must be placed near the Kinect (the whole system can be seen in Fig. 1.14).
1.2.4 Environment Context
The environment for developing and testing the system is a public building, as mentioned before. More specifically, I focus on one specific use case: walking along the corridors of a building. There are two major types of corridor in our context: corridors with walls on both sides, and half-open corridors with glass windows or an open side on one side and a solid wall on the other. In our experiments, I aim at the half-open corridor because it is a very popular type in public buildings such as schools, offices, and some apartments. In the context of my thesis, I tested the system in two different environments: one is our office building, B1 at Hanoi University of Science and Technology, and the other is the Nguyen Dinh Chieu secondary school for blind pupils (see Fig. 1.15). Because depth data is strongly affected by sunlight, the requirement on the environment is that it must not be lit too strongly (on shady days, or in buildings with walls on both sides, where sunlight cannot reach).
The use case is that the user (a visually impaired person) wants to move to another place in the building; to do that, he/she must go along the corridor, walk up/down stairs, and then move to the destination point. Ignoring the path-finding problem, my work aims only at obstacle avoidance. Obstacles in both cases are the objects that block the moving path, such as fire extinguishers, trash bins, columns, walls, and people in front of the user (as can be seen in Fig. 1.15).
Figure 1.15: Two different environments tested. (A) Our office building; (B) Nguyen Dinh Chieu secondary school
1.3 Difficult Situations Recognition System

In conclusion, my entire system for recognizing difficult situations is demonstrated in Fig. 1.16. The prototype system is fixed on the visually impaired person's body. To interact with the user, a tactile visual substitution module from [23] is used to give warnings about obstacles in front of him/her. The mobile Kinect is mounted on the human hip to capture depth and color information. This information is processed by a recognition module in the laptop carried behind the user. After an obstacle has been detected, the laptop sends a corresponding command to the tactile visual substitution module in order to deliver the warning message. The message representation is integrated into this module and presented in [23], so my main work is how to send a correct command to the feedback module.
1.4 Thesis Contributions

There are two main contributions in this thesis:
• The first is a prototype of a difficult situations recognition system that can work flexibly and stably in many different environments.
• The second is a proposed method to detect obstacles using both color and depth information together with 3D point cloud techniques, especially for the problem of early detection of up/down stairs and of handling noise in the depth data.
2 Related Works
Many devices and systems have been developed to help visually impaired people in daily life. This chapter presents some vision-technology-based research for visually impaired people that is related to my work. Each technology/device aims to cover one or more fields of the Mobility component, as shown in Fig. 1.2 in Chapter 1.
2.1 Assistive systems for visually impaired people

From the point of view of obstacle avoidance system technology, two different technologies are widely used: assistive robots that help visually impaired people with moving activities, and wearable devices. The advantages and disadvantages of each technology are discussed in the following part.
Table 2.1 shows a comparison between the two technologies.
Table 2.1: Comparison between assistive robot and wearable device

Typical examples
- Assistive robot: guided dog robot [22][21][6][17]
- Wearable device: glasses [32][20], mobile phone, white cane, sensory substitution [5][14]

Advantages
- Assistive robot: can integrate different technologies and devices; long operating time
- Wearable device: cheaper than an assistive robot; flexible; convenient to use

Disadvantages
- Assistive robot: expensive; inconvenient and hard to fit into social settings; limited environments (works almost only on flat floors)
- Wearable device: limited operating time (battery problem); limited in the technologies it can carry; may interfere with other human senses
Regarding assistive robots for visually impaired people, in [17] (2004) the authors developed a simple robot-assisted indoor navigation system, as shown in Fig. 2.1. In this work, many passive RFID tags were attached to the environment and to objects to help the navigation (or obstacle avoidance) task. To interact with the visually impaired users, speech recognition with single, simple words and wearable keyboards are used; when the robot passes an object, it can speak the object's name through a speech synthesis module. But this system is still under development and was only tested in a laboratory environment, and since the RFID tags play the most important role in the system, it is hard to apply it in real, large environments like offices and residences.
Figure 2.1: Robot-assisted navigation from [17]. (A) RFID tag; (B) robot; (C) navigation
In [6], Dalal et al. proposed a mobile application along with a robot (named NXT Robot) to help visually impaired people avoid obstacles (Fig. 2.2) in an outdoor environment. The robot and the mobile phone are connected by Bluetooth, and the interaction between human and robot uses speech communication. On the robot, the authors attached an ultrasonic sensor to detect obstacles, and they combined this information with the location sent from the mobile phone using GPS. When the user asks to go to a destination, the mobile phone finds the route with the embedded Google Maps and gives voice instructions while the user is traveling to the destination. The advantage of this system is that it uses robust navigation applications like Google Maps together with an ultrasonic sensor. But the limitation is that the system depends on GPS signal and an internet connection, so it cannot work offline or in environments with weak GPS signal. As for obstacle detection, the system only reports obstacles in front of the robot, and in complex environments it may give unreasonable instructions.
Recently, Nguyen et al. introduced a mobile robot [22], [21] for indoor navigation using computer vision techniques (see Fig. 2.3). This system consists of a PCbot-914, which is a PC attached to actuators and sensors such as wheels, ultrasonic sensors, and a normal webcam, plus a mobile phone. The visually impaired user takes the mobile phone, chooses the destination location through the touch screen, and follows the robot to the desired destination. Communication with the visually impaired user is through vibration patterns on the mobile phone. First, the environment must be modeled in an off-line phase to build a map with images and static object positions at each location. In the on-line phase, the image captured from the camera is matched against the database of the learned model to localize the robot in the building for the navigation task, while another module detects obstacles in the image in order to warn the user. However, this system only works in a limited environment, the off-line model must be built before use, and visually impaired people must be trained carefully to use the system (how to recognize a vibration pattern, the destination position on the touch screen). Also, due to the limitations of the monocular webcam, many unexpected factors such as environment and lighting changes can affect the system's results.
As for wearable devices helping visually impaired people in daily life, some existing products based on different technologies are presented next. The vOICe system [5] is a head-mounted camera which is a form of sensory substitution: it receives visual information and converts it into a sound signal that blind people can hear through stereo headphones. The main idea is that the scene or captured image can be represented as a sound pattern, where time corresponds to the horizontal position, pitch corresponds to the vertical axis of the visual pattern, and loudness stands for brightness. Consequently, for each image, with a capture rate of about one image per second, the blind person hears a sound sweeping from left to right that represents each position by a pitch and loudness level. In order to know which object/scene corresponds to which sound, the blind person must be trained before using the system. But this device also has some limitations. Firstly, auditory signals are very important for blind people; if they use this device, their hearing is blocked and they can be distracted from the natural environment.
Moreover, because the image is represented by pitch and loudness levels, the system creates a very noisy sound map when moving in a complex environment, and it is complicated for the blind user to understand the scene.
A product in development called BrainPort (from Wicab Inc.), recently approved for sale in Europe [32], uses a camera mounted on a pair of sunglasses as its input device. After image processing, images are displayed on the tongue using an electrotactile display of 49 electrodes to provide directional cues to the blind user. This is shown as a small electrical "image" on the tongue via a "lollipop"-like display, as illustrated in Fig. 2.4. The limitations of this system are that it requires use of the mouth, which reduces the abilities of blind people, especially in speaking and eating, which are very important activities that occur frequently. Another problem is that the resolution of both the electrotactile display and tongue sensitivity is still far below that of the visual system, so representing the image directly on the human tongue is not an effective way to represent the data. In my work, I use a similar device, but the output on the electrotactile display is just a defined, encoded signal, which is easier for blind people to recognize as instructions.
Very recently in Vietnam, Dr. Nguyen Ba Hai successfully developed a vision-based device named "A haptic device for blind people" [20]. This is a very low-cost, portable pair of glasses which can help visually impaired people detect an obstacle using a single laser transmitter and receiver. When the glasses detect an obstacle, they trigger a small vibrator on the forehead so that the visually impaired person can feel the obstacle. However, this device is very simple: it cannot detect potential obstacles coming from the two sides of the visually impaired person, nor can it take object information into account.
Figure 2.4: BrainPort vision substitution device [32]

2.2 RGB-D based assistive systems for visually impaired people
Nowadays, with the development of depth sensors, there are some works dedicated to RGB-D based assistive systems for visually impaired people, as follows.
NAVI [33] (Navigational Aid for Visually Impaired) is a system similar to my proposal. This system also uses a Kinect with a battery, fixed on a helmet. There are two main functions in this system, called "Micro-Navigation" and "Macro-Navigation": Micro-Navigation means obstacle avoidance, and Macro-Navigation stands for path finding. To give information about obstacles to blind people, vibrotactile output is provided by a waist belt that contains three pairs of Arduino LilyPad vibe boards. For obstacle detection, the system detects the closest obstacles in the left, right, and center regions of the Kinect's field of view using a depth histogram and triggers the vibe board in the corresponding direction. For "Macro-Navigation", the authors used fixed markers to annotate locations, detected them via the Kinect's RGB camera, and used the depth image to calculate the person's distance to the marker in order to give navigation instructions. The output of this function is a synthesized voice giving the blind user movement instructions, for example "Open the door".
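To illustrate the depth-based idea behind Micro-Navigation, the sketch below splits a depth frame into left/center/right regions and flags the nearest return in each; this is a minimal reconstruction of the idea, not NAVI's code, and the 2 m alarm threshold and the 5th-percentile robustness trick are my assumptions.

```python
import numpy as np

def nearest_obstacle_per_region(depth_mm, alarm_mm=2000):
    """Return (left, center, right) flags: True if something is closer than alarm_mm."""
    h, w = depth_mm.shape
    flags = []
    for region in np.array_split(np.arange(w), 3):    # left / center / right columns
        d = depth_mm[:, region]
        d = d[d > 0]                                  # drop invalid (zero) readings
        # Use a low percentile instead of min() so isolated noisy pixels don't trigger.
        nearest = np.percentile(d, 5) if d.size else np.inf
        flags.append(nearest < alarm_mm)
    return tuple(flags)
```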
In [8], the authors present a low-cost system using the Kinect sensor for obstacle avoidance for visually impaired people. The Kinect sensor is mounted on the body using a belt; it is connected to a battery pack and to a computing device (a smartphone) which provides audio feedback to the user. The system performs five main tasks: i) read the data from the Kinect device and express it as a 3D point cloud; ii) detect the floor and register the data in a reference system centered at the user's feet; iii) detect the occupancy of the volume in front of the user; at this step the space is subdivided into a number of cells, each corresponding to a vertical volume spanning a range of possible directions; iv) analyze the output of the accelerometer to determine whether the user is walking and how fast; v) provide the feedback to the user.
Tang et al. also presented an RGB-D sensor based computer vision device to improve the performance of visual prostheses (retinal prosthesis or tongue stimulator) [27]. First, patch-based stereo vision is applied to create RGB-D data which is already segmented, based on color segmentation, feature point matching, and plane fitting and merging using RANSAC. Then a technique called "smart sampling" highlights the important information; this step includes background removal, parallax simulation, object highlighting, and path directions. In the final step, the information is represented using the BrainPort device for line orientation and navigation.
In addition, my thesis is motivated by and inherits from the work of Michiel et al. [30]. In this work, the authors proposed an obstacle detection algorithm based on point clouds, running with a Kinect. An obstacle is defined as an obstacle on the floor plane, a door, or a stair; however, the authors only tested stair and door detection independently, alongside obstacle detection. Fig. 2.5 shows the main process of the obstacle detection algorithm, which is partially similar to my proposal in Section 3.1. First, the point cloud is down-sampled and filtered to reduce processing time. Then, using RANSAC, the ground plane and wall planes are removed from the point cloud, and clustering techniques detect obstacles in the remaining point cloud data. For stair detection, a process similar to ground plane detection is used, and with some pre-defined parameters such as step height and step number, stairs are detected from the point cloud. For door detection, based on the observation that a door is always on a wall plane, the authors segment the wall plane based on color to find a door region. The results obtained when applying this algorithm to the authors' database are very promising. However, because this algorithm relies almost entirely on plane segmentation and ground plane detection, when applied to our dataset (the MICA dataset, presented in Section 4.1), where the point cloud data on the floor plane is poor because of motion and lighting conditions, the achieved results are still low. And with pre-defined parameters such as step height, the stair detection algorithm is not robust when applied in our environment, where the stair specifications can change.
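For reference, the RANSAC plane-removal step used in [30] (and later in my pipeline) can be sketched generically as follows; this is a plain NumPy RANSAC plane fit with an assumed 2 cm inlier threshold, not the implementation of [30].

```python
import numpy as np

def ransac_plane(points, n_iters=200, threshold=0.02, seed=0):
    """Fit a plane n.x + d = 0 to an Nx3 cloud; return (n, d) and the inlier mask."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_plane = None
    for _ in range(n_iters):
        p0, p1, p2 = points[rng.choice(len(points), size=3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:                      # degenerate (collinear) sample
            continue
        n = n / norm
        d = -n.dot(p0)
        inliers = np.abs(points @ n + d) < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, (n, d)
    return best_plane, best_inliers

# Usage: remove the dominant (ground) plane, keep the rest for obstacle clustering.
# plane, mask = ransac_plane(cloud); remaining = cloud[~mask]
```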
Figure 2.5: Obstacle detection process from [30]

2.3 Stair Detection

Stair detection in an image is a classical problem in computer vision, since stairs are familiar objects in daily life. The most prominent characteristic of a stair is its rigid form with the repetitive structure of the stair planes; consequently, many lines and edges appear in a stair image. Therefore, stair detection is an interesting topic for researchers using traditional computer vision techniques like line detection, edge detection, frequency analysis, and image segmentation.
In [26] (2000), before the Hough line detection algorithm was widely used for this problem, the authors proposed a method to detect stairs in a gray-scale image using basic image processing operators together with constraints to find the stairs in the image. The stair images are outdoor scenes with good lighting conditions, where the edges on the stair planes are quite clear. The authors first use Canny detection to extract the edges, and Gabor filters in the horizontal and vertical directions to focus on the two main directions; they then find concurrent lines (hence finding the vanishing point) as hypotheses for staircases. However, since this algorithm is based on simple operators and rules, it is very sensitive to parameters (of the Canny detector, the Gabor filter, and the line lengths), and detection is poor when another object with concurrent lines appears in the image. Fig. 2.6 illustrates the results of this system.
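A minimal sketch of that edge-extraction stage is shown below, combining OpenCV's Canny detector with one oriented Gabor kernel; all parameter values here are illustrative assumptions, not those of [26].

```python
import cv2
import numpy as np

gray = cv2.imread("stair.png", cv2.IMREAD_GRAYSCALE)

# Edges from the Canny detector (thresholds are illustrative).
edges = cv2.Canny(gray, 50, 150)

# One oriented Gabor kernel; [26] uses both horizontal and vertical orientations.
kernel = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=np.pi / 2,
                            lambd=10.0, gamma=0.5, psi=0)
response = cv2.filter2D(gray.astype(np.float32), -1, kernel)

# Keep edge pixels that also have a strong response at this orientation.
mask = (np.abs(response) > np.abs(response).mean()) & (edges > 0)
```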
Recently, with the development of sensors able to capture depth information, stair detection can be done efficiently in 3D, and it remains a good research topic regarding plane detection and plane segmentation in 3D. Among stair detection systems using RGB or RGB-D data, some works aim to develop systems which can detect stairs early in order to give moving instructions to blind people or to control a robot automatically; they are presented in the following part.

Figure 2.6: Stair detection from [26]. (A) Input image; (B)(C) frequency as output of the Gabor filter; (D) stair detection result
In [13], the authors proposed an approach to detect descending stairs to control an autonomous tracked vehicle using only a monocular gray-scale camera. First, the stair is coarsely detected while it is still far from the robot. This is called the "far approach", which uses optical flow and texture information to predict the stair. It relies on the observation that if the stair is descending, the region above it is normally a wall with low texture. Another observation is that, when moving, if a stair appears, the optical flow changes very quickly from the ground region to the wall region, because there is a significant change in the depth of the image. This step is illustrated in Fig. 2.7 below.

In the near approach, constraints are applied to the detected lines, using the median flow of the regions above and below each line and the line's length, to look for the stair edge.
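The optical-flow cue could be prototyped as below with OpenCV's Farneback dense flow; comparing the median flow magnitude just above and below a candidate edge row is my illustrative reading of the cue, not the exact test in [13].

```python
import cv2
import numpy as np

prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                    pyr_scale=0.5, levels=3, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
magnitude = np.linalg.norm(flow, axis=2)

def flow_contrast(row, band=10):
    """Median flow just above vs. just below a candidate stair-edge row."""
    above = np.median(magnitude[max(0, row - band):row])
    below = np.median(magnitude[row:row + band])
    return below - above   # a large value suggests a depth discontinuity at this row
```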
Figure 2.7: Near-approach stair detection in [13]. (A) Input image with detected stair region; (B) texture energy; (C) input image with detected lines as stair candidates; (D) optical flow maps; there is a significant change along the line at the stair edge
In [24], the authors proposed a method to detect and model a stair using a depth sensor. The depth image is used to create point cloud data. First, plane segmentation and classification are applied to find planes in the scene; this step includes normal estimation, region growing, a planarity test, plane extension, cluster extraction, and classification (see Fig. 2.8).

After the planes have been segmented, the stair is modeled as a group of parallel planes. Each plane must satisfy some conditions on size, height, and orientation, as shown in Fig. 2.9.

The advantages of this method are the robustness of depth information and that the stair is modeled very explicitly using many conditions applied to the stair planes. But these constraints also require the stair planes to be clearly visible in the images, whereas in a real environment, due to occlusion and camera viewpoint, these conditions may not be satisfied.
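A much-simplified version of such plane constraints is sketched below: near-horizontal planes are kept and stacked into steps when consecutive heights differ by a plausible riser. The up direction and the 0.12-0.20 m riser range are assumed examples, not the thresholds of [24].

```python
import numpy as np

def group_step_planes(planes, up=np.array([0.0, 1.0, 0.0]),
                      min_rise=0.12, max_rise=0.20, max_tilt_deg=10.0):
    """planes: list of (unit_normal, centroid) pairs; return planes stacked like steps."""
    horizontal = [(n, c) for n, c in planes
                  if np.degrees(np.arccos(np.clip(abs(n @ up), 0.0, 1.0))) < max_tilt_deg]
    horizontal.sort(key=lambda p: p[1] @ up)          # order planes by height
    steps = []
    for i in range(1, len(horizontal)):
        rise = (horizontal[i][1] - horizontal[i - 1][1]) @ up
        if min_rise <= rise <= max_rise:              # plausible riser height
            steps.append(horizontal[i])
    return steps
```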
Figure 2.8: Example of segmentation and classification in [24]

Figure 2.9: Stair modeling (left) and features in each plane [24]
Yingli Tian [29] proposed a mixed approach using both the RGB and depth images to find stairs and crosswalks. In the first step, parallel lines are detected in the RGB image to find a group of concurrent lines, using the Hough transform and line fitting with geometric constraints. This step is illustrated in Algorithm 1.
Algorithm 1 Parallel line detection from [29]

1: Detect edge maps from the RGB image by edge detection
2: Compute the Hough transform of the RGB image to obtain the directions of the lines
3: Calculate the peaks in the Hough transform matrix
4: Extract line segments and their directions in the RGB image
5: Group line fragments into the same line if the gap is less than a threshold
6: Detect a group of parallel lines based on constraints such as the length and the total number of detected lines of stairs and pedestrian crosswalks
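The sketch below follows the spirit of Algorithm 1 with standard OpenCV primitives (Canny edges, probabilistic Hough segments, grouping by orientation); the thresholds are illustrative, and the geometric constraints of [29] are reduced here to simple angle binning and a minimum line count.

```python
import cv2
import numpy as np

gray = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(gray, 50, 150)                                   # step 1: edge map

segments = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=60,
                           minLineLength=40, maxLineGap=10)        # steps 2-5

# Step 6 (simplified): bin segments by orientation (5-degree bins) and keep the
# largest near-parallel group, requiring a minimum count as stair patterns do.
groups = {}
for x1, y1, x2, y2 in (segments.reshape(-1, 4) if segments is not None else []):
    angle = np.degrees(np.arctan2(y2 - y1, x2 - x1)) % 180
    groups.setdefault(round(angle / 5) * 5, []).append((x1, y1, x2, y2))

parallel_group = max(groups.values(), key=len, default=[])
is_stair_candidate = len(parallel_group) >= 5
```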
Then they extract the depth information along each line and feed it into a support vector machine (SVM) classifier to distinguish stairs (both upstairs and downstairs) from pedestrian crosswalks (as seen in Fig. 2.10). Finally, they estimate the distance between the camera and the stairs in order to give warning messages to the blind user. In this paper, the authors presented a robust stair detection algorithm that takes advantage of both color and depth information. However, both the color and depth images must be clear to get a correct stair detection, where the color information gives stair candidates and the depth information is used to confirm whether each candidate is a stair or not. In my case, given the limited measurable range of the Kinect depth sensor, the depth image is not always available on the stair surface, and the edge detection algorithm is still sensitive to parameters.
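For completeness, the classification step could be prototyped as below with scikit-learn, treating the depth values sampled along each candidate line as a fixed-length feature vector; the file names, feature layout, and label encoding are assumptions for illustration, not the exact features of [29].

```python
import numpy as np
from sklearn.svm import SVC

# Each row: depth profile sampled at 32 points along one detected line;
# label 0 = crosswalk, 1 = upstair, 2 = downstair (assumed encoding).
X_train = np.load("depth_profiles.npy")      # shape (n_samples, 32), hypothetical file
y_train = np.load("labels.npy")

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

def classify_line(profile_mm):
    """Classify one depth profile (length-32 vector of millimeters)."""
    return clf.predict(profile_mm.reshape(1, -1))[0]
```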
3 Obstacle Detection

3.1 Overview
The entire system flowchart can be seen in Fig. 3.1. There are three main parts in this flowchart: the Kinect side, the laptop side, and the user interface (or feedback module) side; the laptop side contains almost all of the processing modules.
First, the laptop captures data from the Kinect. By default, the Kinect provides many data types, such as images, sound, skeleton, and accelerometer data, but in my work only the depth image, the color image, and the accelerometer data are used. Then humans are detected in the depth image: the Kinect SDK provides a human index encoded in each pixel of the depth image. This information is calculated from the depth image, so it is very robust. After a human has been detected, the system stores that information as the first detected obstacle.
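As an illustration of how that per-pixel human index can be read: in the Kinect SDK v1 depth stream, each 16-bit value packs the player index in the three low bits and the depth in millimeters in the upper bits, so the two can be separated with bit operations (a sketch assuming the raw packed format is exposed to the application):

```python
import numpy as np

def unpack_kinect_v1_depth(packed):
    """Split a raw Kinect v1 depth frame (uint16 HxW) into depth and player index.

    Bits 0-2 hold the player index (0 = no person, 1-6 = tracked people);
    bits 3-15 hold the depth in millimeters.
    """
    packed = packed.astype(np.uint16)
    player_index = packed & 0x7
    depth_mm = packed >> 3
    human_mask = player_index > 0      # first "obstacle": pixels belonging to a person
    return depth_mm, human_mask
```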
The next module is obstacle detection in both the color and depth images using point cloud techniques. All the captured information is used to build a point cloud, a collection of 3D points, in order to reconstruct the environment around the blind user. Then several techniques are applied to the point cloud to find obstacles: plane segmentation, ground and wall plane detection, and then obstacle detection. The output of this module is the detected obstacles, including normal obstacles and stairs.
Another individual module is stair detection based on the color image. This module detects stairs directly in the color image using line extraction techniques and the geometric relationships between lines.
In the obstacle fusion module, all the obstacles are checked again for their presence in the scene, and the most important (or most dangerous) obstacle is returned to the blind user by sending commands to the user interface module through the obstacle warning module.