DIFFICULT SITUATIONS RECOGNITION SYSTEM FOR
VISUALLY-IMPAIRED AID USING A MOBILE KINECT

MASTER OF SCIENCE THESIS IN COMPUTER SCIENCE
Ha Noi – 2016
MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

Hoang Van Nam
DIFFICULT SITUATIONS RECOGNITION SYSTEM FOR
VISUALLY-IMPAIRED AID USING A MOBILE KINECT
Department: COMPUTER SCIENCE
Independence – Freedom – Happiness

CONFIRMATION OF MASTER THESIS REVISION

Full name of the thesis author: ………
Thesis title: ………
Major: ………
Student ID: ………

The author, the scientific supervisor and the thesis examination committee confirm that the author has revised and supplemented the thesis according to the minutes of the committee meeting dated ……… with the following contents:
………

CHAIRMAN OF THE EXAMINATION COMMITTEE
Declaration of Authorship

I, Hoang Van Nam, declare that this thesis titled, 'Difficult situations recognition for visually-impaired aid using a mobile Kinect', and the work presented in it are my own. I confirm that:

• This work was done wholly or mainly while in candidature for a research degree at this University.
• Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.
• Where I have consulted the published work of others, this is always clearly attributed.
• Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.
• I have acknowledged all main sources of help.
• Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.

Signed:

Date:
International Research Institute MICA
Computer Vision Department

Master of Science

Difficult situations recognition for visually-impaired aid using a mobile Kinect

by Hoang Van Nam

Abstract
By 2014, according to figures from some organizations, there are more than one million people in Vietnam living with sight loss, about 1.3% of the Vietnamese population. Despite the big impact of visual impairment on daily living, especially on the ability to move, read and communicate with others, only a small percentage of blind or visually impaired people live with an assistive device or a guide animal such as a guide dog. Motivated by the significant changes in technology that have taken place in the last decade, especially the introduction of various types of sensors as well as the development of the field of computer vision, I present in this thesis a difficult situations recognition system for visually-impaired aid using a mobile Kinect. The system is based on data captured from the Kinect and uses computer vision techniques to detect obstacles. In the current prototype, I only focus on detecting obstacles in indoor environments such as public buildings, and two types of obstacle are exploited: general obstacles in the moving way, and staircases, which pose a great danger to visually impaired people. 3D imaging techniques, including plane segmentation and 3D point clustering, are used to detect general obstacles, and a mixed strategy between the depth and color images is used to detect staircases based on detecting the stair edges and their structure. The system is very reliable, with a detection rate of about 82.9% and a processing time of 493 ms per frame.
Acknowledgements

I am so honored to be here for the second time, in one of the finest universities in Vietnam, to write these grateful words to the people who have been supporting and guiding me from the very first moment when I was a university student until now, when I am writing my master thesis.

I am grateful to my supervisor, Dr. Le Thi Lan, whose expertise, understanding, generous guidance and support made it possible for me to work on a topic that was of great interest to me. It was a pleasure to work with her.

Special thanks to Dr. Tran Thi Thanh Hai, Dr. Vu Hai and Dr. Nguyen Thi Thuy (VNUA) and all of the members of the Computer Vision Department, MICA Institute, for their sharp comments and guidance on my work, which helped me a lot in learning how to study and do research in the right way, and also for the valuable advice and encouragement that they gave me during my thesis.

I would like to express my gratitude to Prof. Veelaert Peter, Dr. Luong Quang Hiep and Mr. Michiel Vlaminck at Ghent University, Belgium, for their support. It has been a great honor to cooperate and work with them.

Finally, I would especially like to thank my family and friends for the continuous love and support they have given me throughout my life, helping me pass through all the frustration, struggle and confusion. Thanks for everything that helped me get to this day.

Hanoi, 19/02/2016
Hoang Van Nam
Contents

Declaration of Authorship

1 Introduction
1.1 Motivation
1.2 Definition
1.2.1 Assistive systems for visually impaired people
1.2.2 Difficult situations
1.2.3 Mobile Kinect
1.2.4 Environment Context
1.3 Difficult Situations Recognition System
1.4 Thesis Contributions

2 Related Works
2.1 Assistive systems for visually impaired people
2.2 RGB-D based assistive systems for visually impaired people
2.3 Stair Detection

3 Obstacle Detection
3.1 Overview
3.2 Data Acquisition
3.3 Point Cloud Registration
3.4 Plane Segmentation
3.5 Ground & Wall Plane Detection
3.6 Obstacle Detection
3.7 Stair Detection
3.7.1 Stair definition
3.7.2 Color-based stair detection
3.7.3 Depth-based stair detection
3.7.4 Result fusion
3.8 Obstacle information representation

4 Experiments
4.1 Dataset
4.2 Difficult situation recognition evaluation
4.2.1 Obstacle detection evaluation
4.2.2 Stair detection evaluation

5 Conclusions and Future Works
5.1 Conclusions
5.2 Future Works
List of Figures

1.1 A Comprehensive Assistive Technology (CAT) Model provided by [12]
1.2 A model for activities attribute and mobility provided by [12]
1.3 Distribution of frequencies of head-level accidents for blind people [18]
1.4 Distribution of frequencies of tripping resulting in a fall [18]
1.5 A typical example of depth image. (A) Raw depth image, (B) depth image visualized by jet color map, with the colorbar showing the real distance for each color value, (C) reconstructed 3D scene
1.6 A stereo image pair taken from the OpenCV library and the calculated depth image. (A) Left image, (B) right image, (C) depth image (disparity map)
1.7 Some existing stereo cameras. From left to right: Kodak stereo camera, View-Master Personal stereo camera, ZED, Duo 3D Sensor
1.8 Time of flight systems from [3]
1.9 Some ToF cameras. From left to right: DepthSense, Fotonic, Microsoft Kinect v2
1.10 Structured light cameras. From left to right: PrimeSense, Microsoft Kinect v1
1.11 Structured light systems from [3]
1.12 Figure from [16]. (A) Raw IR image with pattern, (B) depth image
1.13 Figure from [16]. (A) Errors for structured light cameras, (B) quantization errors at different distances of a door: 1 m, 3 m, 5 m
1.14 Prototype of system using mobile Kinect. (A) Kinect with battery and belt, (B) backpack with laptop, (C) mobile Kinect mounted on human body
1.15 Two different environments tested. (A) Our office building, (B) Nguyen Dinh Chieu secondary school
1.16 Prototype of our obstacle detection and warning system
2.1 Robot-assisted navigation from [17]. (A) RFID tag, (B) robot, (C) navigation
2.2 NXT Robot system from [6]. (A) The system's block diagram, (B) NXT robot
2.3 Mobile robot from [22], [21]
2.4 BrainPort vision substitution device [32]
2.5 Obstacle detection process from [30]
2.6 Stair detection from [26]. (A) Input image, (B)(C) frequency as an output of the Gabor filter, (D) stair detection result
2.7 A near-approach for stair detection in [13]. (A) Input image with detected stair region, (B) texture energy, (C) input image with detected lines as stair candidates, (D) optical flow maps; there is a significant change on the lines at the edge of the stair
2.8 Example of segmentation and classification in [24]
2.9 Stair modeling (left) and features in each plane [24]
2.10 Stair detection algorithm proposed in [29]. (A) Detected lines in the edge image (using color information), (B) depth profiles on each line (red line: pedestrian crosswalk, blue: downstairs, green: upstairs)
3.1 Obstacle detection flowchart
3.2 Kinect mounted on body
3.3 Coordinate transformation process
3.4 Kinect coordinate
3.5 Point cloud rotation using the normal vector of the ground plane (white arrow): left: before rotating, right: after rotating
3.6 Normal vector estimation algorithms [15]. (a) The normal vector of the center point can be calculated by a cross product of two vectors of four neighbor points (red), (b) normal vector estimation in a scene
3.7 Plane segmentation result using the algorithm proposed in [15]. Each plane is represented by a distinctive color
3.8 Detected ground and wall planes (ground: blue, wall: red)
3.9 Human segmentation data by Microsoft Kinect SDK. (a) Color image, (b) human mask
3.10 Detected obstacles. (a) Color image, (b) detected obstacles
3.11 Model of stair
3.12 Coordinate transformation models from [7]
3.13 Projective chirping: a) a real world object that generates a projection with "chirping" - "periodicity-in-perspective", b) center raster of image, c) best fit projective chirp
3.14 A pin-hole camera model with stair
3.15 A vertical Gabor filter kernel
3.16 Gabor filter applied on a color image. (a) Original, (b) filtered image
3.17 Thresholding the grayscale image. (a) Original, (b) thresholded image
3.18 Example of image thinning using morphological operations
3.19 Thresholding the grayscale image. (a) Original, (b) thresholded image
3.20 Six points voting for a line make an intersection in Hough space; this intersection has higher intensity than neighboring pixels
3.21 Hough space. (a) Line in the original space, (b) three curves voting for this line in Hough space
3.22 Hough space on a stair image. (a) Original image, (b) Hough space
3.23 Chirp pattern detection. (a) Hough space, (b) original image with detected chirp pattern
3.24 Point cloud of stair. (a) Original color image, (b) point cloud data created from color and depth images
3.25 Detected steps
3.26 Detected planes
3.27 Detected stair on point cloud
3.28 Obstacle position quantization for sending warning messages to visually impaired people
4.1 Depth image encoding. (A) Original, (B) visualized image, (C) encoded image
4.2 Detection time of each step in our proposed method
4.3 Example stair images for evaluation. (A) Positive sample from MICA dataset, (B) negative sample from MICA dataset, (C) positive sample from MONASH dataset, (D) negative sample from MONASH dataset
4.4 Detected stair in Tian's based method (A-F) and detected stair in my proposed method (G-I). (A) Color image, (B) depth image, (C) edges, (D) line segments, (E) detected concurrent lines, (F) depth values on detected lines, (G) detected stair with blue lines as false stair edges and green lines as stair edges, (H) edge image, (I) detected peaks in Hough map corresponding to lines in Figure G
4.5 Missed detection in Tian's based method because of missing depth on stair (A-F) and detected stair in my proposed method (G-I)
4.6 Missed detection in Tian's based method because of missing depth on stair (A-F) and detected stair in my proposed method (G-I)
List of Tables

2.1 Comparison between assistive robot and wearable device
4.1 Database specifications
4.2 Pixel level evaluation result (TP, FP, FN: million pixels)
4.3 Object level evaluation result (TP, FP, FN: objects)
4.4 Stair dataset for evaluation
4.5 Stair detection result of the proposed method on different datasets
4.6 Comparison of the proposed method and the method of Tian et al. [29] on MICA dataset
Abbreviations

PCL Point Cloud Library
CAT Comprehensive Assistive Technology
TDU Tongue Display Unit
IR Infrared
OpenCV Open Source Computer Vision Library
RGB Red Green Blue
RGB-D Red Green Blue and Depth
ToF Time of Flight
Introduction

1.1 Motivation

According to the official statistics of the National Eye Hospital in 2002, Vietnam has about 900,000 blind people, including about 400,000 who are totally blind. By 2014, according to figures from some organizations, the number of blind people in Vietnam is about 1.2 to 1.4 million people, still a large number in comparison with other countries. Worldwide, the visually impaired population is estimated to number in excess of 285 million according to the investigation of the World Health Organization (August 2014)1. About 90% of them live in developing countries with low-income settings. Visual impairment has a big impact on their daily living. In particular, they cannot read documents, and the ability to move and to communicate with other people is compromised because information is received primarily through vision. All of the above has led blindness to become a public health problem all over the world.

Nowadays, with the significant developments in technology, lots of assistive devices have been released in order to help visually impaired people in daily life. But although many researchers and companies are concerned with making better and cheaper devices to improve the comfort of visually impaired people, research in this field still leaves many issues unsolved and, in general, those devices still cannot replace traditional methods such as the white cane or the guide dog.

Motivated by the significant changes in technology that have taken place in the last decade, especially the introduction of various types of sensors as well as the development of the field of computer vision, my thesis aims to build a prototype system to help visually impaired people avoid obstacles in the environment using a Kinect sensor.

1 http://www.who.int/mediacentre/factsheets/fs282/en/
With the Kinect, the benefit is that we can make a reliable system by using depth and color information to detect obstacles at an affordable price. In my thesis, due to the lack of time, I only focus on the indoor environment, more specifically on public buildings such as apartments or offices, in order to detect general objects encountered on the moving way and stairs, which may cause danger to visually impaired people.

My thesis is organized as follows:

First, I shall give some definitions in the context of my work and the contributions of this thesis.

In chapter 2, I shall briefly review some other works related to my system, such as existing assistive devices and obstacle detection algorithms/systems, with their advantages and disadvantages.

In chapter 3, a framework for obstacle detection will be developed and I shall present the details of each module as well as the entire system, analyzing and assessing them.

In the next chapter, I shall give some experimental results of my system, including how the dataset was prepared, how the evaluation was made and the final results.

In the final chapter, I end this work by giving some conclusions and future works to make the system more complete and effective.
1.2 Definition
1.2.1 Assistive systems for visually impaired people
According to [12], assistive systems for visually impaired people can be understood as equipment, devices or systems which can be used to overcome the gap between what a disabled person wants to do and what society allows them to do. In short, such a system must be able to help visually impaired people to do the things that normal people can do. This system can be modeled by the Comprehensive Assistive Technology (CAT) model as shown in Fig 1.1. The top level of this model consists of four components that can be used to define all assistive technology systems:

• Context (in which the assistive technology will be used)
• Person (what kind of user can use this system)
• Activities (what activities the assistive system can help the visually impaired people with, seen more clearly in Fig 1.2)
• Assistive Technology (technology that will be used to make a system)

Most of the existing systems are aimed at solving one specific aspect of each branch in the model: they work in a bounded, defined context, with certain types of users, to help them in specific activities in daily life. In the framework of my master thesis, to simplify the system, I just focused on certain aspects of this model, which I will explain in detail in the next sections. In short, I applied my system to the local settings of context, in a small public building such as an office or department, and the users are the visually impaired students at the Nguyen Dinh Chieu Secondary school, to help them avoid obstacles in the moving way.
Figure 1.1: A Comprehensive Assistive Technology (CAT) Model provided by [12]
1.2.2 Difficult situations
Fig 1.2 shows detailed information of the activities branch in the CAT model (see Fig 1.1). As shown in the figure, there are a lot of services that can be covered by assistive systems for visually impaired people, such as mobility, daily living, cognitive activities, education and employment, recreational activities, communication and access to information. But most existing works focus on the mobility component of the activities model because of its important role in visually impaired people's daily life.
Figure 1.2: A model for activities attribute and mobility provided by [12]
According to the survey of R. Manduchi [18] in 2011 with 300 respondents who are legally blind or blind, half of the respondents said that they had a head-level accident at least once a week, and about 30% of respondents fell down at least once a month (see Fig 1.3 and Fig 1.4). Therefore, helping visually impaired people in the moving process has always been an interesting topic for researchers, social organizations and companies. In fact, many products have been released, some with particular success, like the systems proposed in [11], [10], [1] and [4].
Figure 1.3: Distribution of frequencies of head-level accidents for blind people [18]
Figure 1.4: Distribution of frequencies of tripping resulting in a fall [18]
In the context of my thesis, I aim to develop a system which can detect the obstacles in visually impaired people's moving way, which are the main cause of the accidents mentioned above. The scenario in this project is that visually impaired people want to move along the hallway inside a public building, so they need to avoid obstacles, including moving or static objects, and to go up/down stairs. An obstacle in my case can be defined as an object lying on the ground or in front of the visually impaired person that he/she could be harmed by while moving if it is encountered. Although the obstacle's class is very important for visually impaired people to distinguish which is more dangerous and which is not, in my work I just try to detect obstacles in the scene without naming them (making a classification). Within the framework of this thesis, I also focus on detecting another special object that often appears in buildings and is very dangerous for visually impaired people: the stair. Moreover, the proposed system will only give a warning to the blind people using a Tongue Display Unit (TDU), which was already developed by Thanh-Huong Nguyen in 2013 [23]. In brief, my proposed system aims to solve two aspects of the mobility component of the activities model (see Fig 1.2): obstacle avoidance and movement on ramps, slopes, stairs & hills; for the second aspect, the current system just stops at the level of giving the warning distance of stairs to the visually impaired people in order to assist them in going up/down stairs.
1.2.3 Mobile Kinect
1. Introduction

To assist visually impaired persons in those difficult situations, in my thesis, I propose using a Kinect sensor to capture the information of the environment in order to detect obstacles if they appear. There are a lot of advantages to using the Kinect in this system since it is a popular RGB-D camera with a cheap price. But firstly, I will give some brief information about depth cameras, of which the Kinect is a typical example.
A depth camera is actually a sensor which has the capacity to provide depth information (a depth image or depth map). A depth map is an image that contains information relating to the distance of the surfaces of scene objects from a viewpoint, as in the example in Fig 1.5. The intensity value of each pixel in a depth map represents the distance from a point on the object to the camera. Therefore, 3D information of the scene can be reconstructed from the depth image (as shown in Fig 1.5-C). A benefit of the depth image is that it is not affected by lighting conditions.
Figure 1.5: A typical example of depth image. (A) Raw depth image, (B) depth image visualized by jet color map, with the colorbar showing the real distance for each color value, (C) reconstructed 3D scene
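For concreteness, under a standard pinhole camera model (an assumption stated here for illustration; the intrinsic parameters are not given at this point in the thesis), the depth value Z at pixel (u, v) is back-projected to a 3D point using focal lengths f_x, f_y and principal point (c_x, c_y):

\[ X = \frac{(u - c_x)\,Z}{f_x}, \qquad Y = \frac{(v - c_y)\,Z}{f_y}, \]

which is how a reconstruction such as Fig 1.5-C is obtained from the depth map.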
In recent years, with the development of technology, especially in the sensor fabrication industry, a lot of cameras capable of capturing depth information have been placed on the market. These devices can be separated into several groups by the technology used: stereo cameras such as the ZED, Time-of-Flight (ToF) cameras like the ZCam, structured light cameras like the Kinect, and long range 3D cameras. Each device has its own advantages and disadvantages and is only suitable for a particular use case.
2. Stereo Camera

The stereo camera is a kind of camera that has been used in robotics since its early days. Taking the idea of human binocular vision, it contains two or more cameras with precisely known relative offsets. Depth information can be calculated by matching similar points in the overlapped region between images. Hence, the 3D distance to matching points can be determined using triangulation, as illustrated in Fig 1.6. However, the camera used in this case is still a color camera. As a result, it is still affected by changing lighting conditions. On the other hand, the depth image is calculated by matching algorithms, so it works very poorly when the scene is texture-less, for example images of walls or buildings. There are many stereo cameras available on the market due to the ease of making them, such as the Kodak stereo camera, View-Master Personal stereo camera, ZED2, Duo 3D Sensor3, as illustrated in Fig 1.7.
Figure 1.6: A stereo image pair taken from the OpenCV library and the calculated depth image. (A) Left image, (B) right image, (C) depth image (disparity map)

Figure 1.7: Some existing stereo cameras. From left to right: Kodak stereo camera, View-Master Personal stereo camera, ZED, Duo 3D Sensor
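As a minimal sketch of this principle (not the pipeline used in this thesis), a disparity map like the one in Fig 1.6 can be computed with OpenCV's block matcher, and depth recovered from disparity d via Z = f·B/d; the file names and calibration values below are placeholders:

```python
import cv2

# Load a rectified stereo pair (placeholder file names).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Block matching: compare small windows along the same scan line.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype("float32") / 16.0  # fixed-point output

# Triangulation: Z = f * B / d (assumed calibration values).
f_px = 700.0        # focal length in pixels (placeholder)
baseline_m = 0.12   # camera separation in meters (placeholder)
valid = disparity > 0
depth_m = f_px * baseline_m / disparity[valid]
```

Block matching is only one of several matching strategies; semi-global matching (cv2.StereoSGBM) trades speed for robustness on weakly textured scenes, which relates to the texture-less limitation noted above.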
3. Time of Flight (ToF) camera

Time-of-Flight (ToF) cameras use the same principle as laser radar, except that instead of transmitting a single beam, short pulses of infrared (IR) light are sent. The camera gets the return time at pixels across its field of view, and the distance is measured by comparing the phase of the modulated return pulses with those emitted by the laser (Fig 1.8). But ToF cameras also suffer from similar limitations as time-of-flight sensors in general, including ambiguity of measurements, multiple reflections, sensitivity to material reflectance and background lighting, and they do not operate well outdoors in strong sunlight. Some popular ToF cameras are DepthSense4, Fotonic5, and Microsoft Kinect v2 (see Fig 1.9).
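For continuous-wave ToF cameras of this kind, the distance follows from the measured phase shift φ between the emitted and returned modulated signals, a standard relation given here for completeness rather than taken from [3]:

\[ d = \frac{c}{2 f_m}\cdot\frac{\varphi}{2\pi}, \qquad d_{\max} = \frac{c}{2 f_m}, \]

where c is the speed of light and f_m the modulation frequency; distances beyond the unambiguous range d_max wrap around, which is the measurement ambiguity mentioned above.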
Figure 1.8: Time of flight systems from [3]

Figure 1.9: Some ToF cameras. From left to right: DepthSense, Fotonic, Microsoft Kinect v2

4. Structured light camera

The structured light camera is another approach to measuring depth information by using "structured light", which is a pattern of light such as an array of lines. The scene is viewed at an angle, as illustrated in Fig 1.11. If the pattern is projected onto a flat wall, the camera will see straight lines, but if the scene is more complex then it will see a more complex profile. By analyzing this profile across the field of view, depth information can be calculated. In the traditional method, the structured light consists of grids or arrays of lines, but it is affected by noise. Therefore, in some newer devices such as the PrimeSense or Microsoft Kinect v1 (see Fig 1.10), a code is added into the light so that the pattern seen by the camera has almost zero repetition across the scene. The Kinect v1 uses a randomly distributed speckle pattern, and each speckle looks different at different distances, due to a special lens, as can be seen in Fig 1.12. But this kind of depth sensor also has some limitations: the errors grow with the square of the distance to objects, there are strong quantization effects (see Fig 1.13), and it shares some limitations with ToF systems, like sensitivity to material reflectance and not operating well in strong sunlight.
Figure 1.10: Structured light cameras. From left to right: PrimeSense, Microsoft Kinect v1

Figure 1.11: Structured light systems from [3]

Figure 1.12: Figure from [16], (A) raw IR image with pattern, (B) depth image

Figure 1.13: Figure from [16]. (A) Errors for structured light cameras, (B) quantization errors at different distances of a door: 1 m, 3 m, 5 m
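The quadratic error growth follows from the triangulation geometry of such sensors: a roughly constant disparity noise σ_d maps into depth noise that scales with the square of the distance. A commonly used model (my addition, consistent in spirit with the analysis in [16]) is

\[ \sigma_Z \approx \frac{Z^2}{f\,b}\,\sigma_d, \]

where f is the focal length and b the baseline between the IR projector and the camera.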
5. Mobile Kinect

In my thesis, I used the Microsoft Kinect v1 as the capture device of the system due to its usability and availability. To make the Kinect more flexible, I added some components, making it a "Mobile Kinect": a Kinect with an external battery, so it can be moved anywhere without worrying about electrical sources (outlets, cables), and this external battery is easy to replace. To attach it to the human body, the Kinect has been mounted on a single belt, so that it can be fixed easily on the body. Another important part of the mobile Kinect is a laptop, which plays the role of the main processor. It contains the data acquisition and obstacle detection modules. The reason for choosing a laptop is that the Kinect is a commercial device developed for video game purposes, so it cannot operate without a PC, and because of the restriction on the Kinect data cable length, the computer must be placed near the Kinect (the whole system can be seen in Fig 1.14).

Figure 1.14: Prototype of system using mobile Kinect. (A) Kinect with battery and belt, (B) backpack with laptop, (C) mobile Kinect mounted on human body

Officially, the Kinect runs on a 12 V source provided by the adapter that comes with it by default. In our experiments, it can operate until the voltage drops down to 8.5 V, with a running current of about 0.3-0.4 A. So I designed a battery pack with 8 AAA cells, which provides 8 × 1.5 V = 12 V. The time to drop from 12 V to 8.5 V is about 1.5-2 hours in our experiments, which means the mobile Kinect can run for 1.5-2 hours on the battery.
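As a rough sanity check on this figure (assuming a typical AAA alkaline capacity of about 1000 mAh, a value I am supplying, not one stated in the thesis), the ideal runtime at the measured draw would be

\[ t \approx \frac{C}{I} = \frac{1000\ \text{mAh}}{350\ \text{mA}} \approx 2.9\ \text{h}, \]

and the observed 1.5-2 hours is plausibly shorter because the usable window ends once the pack voltage sags below 8.5 V, before the cells are fully discharged.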
1.2.4 Environment Context
The environment for developing and testing the system is a public building, as mentioned before. More specifically, I focus on a specific use case: walking along the corridors of a building. There are two major types of corridor in our context: one with walls on both sides, and a half-open one with glass windows or an opening on one side and a solid wall on the other. In our experiments, I aim to develop a system for the half-open corridor because it is a very popular type in public buildings such as schools, offices and some apartments. In the context of my thesis, I tested the system in two different environments. One is our office building at B1, Hanoi University of Science and Technology; the other is the Nguyen Dinh Chieu secondary school for blind pupils (see Fig 1.15). Because depth data is strongly affected by sunlight, the requirement for the environment is that it must not be lit too strongly (on a shady day or in a building with walls on both sides, where sunlight cannot reach).

The use case is that the user (a visually impaired person) wants to move to another place in the building; to do that, he/she must go along the corridor, walk up/down the stairs, then move to the destination point. Ignoring the path-finding problem, in my work I only aim at obstacle avoidance. Obstacles in both cases are the objects that block the moving way, such as fire extinguishers, trash bins, columns, walls and humans in front of the user (as can be seen in Fig 1.15).
Figure 1.15: Two different environments tested. (A) Our office building, (B) Nguyen Dinh Chieu secondary school
1.3 Difficult Situations Recognition System
In conclusion, my entire system to recognize the difficult situations can be demonstrated as in Fig 1.16. In this case, the prototype system is fixed on the visually impaired person. To interact with the user, a tactile visual substitution module from [23] is used to give warnings about obstacles in front of him/her. The mobile Kinect is mounted on the human hip to capture depth and color information. This information is processed by a recognition module in the laptop behind the user. After an obstacle has been detected, the laptop sends a corresponding command to the tactile visual substitution module in order to give a warning message. The message representation has been integrated into this module and is presented in [23]. So my main work is how to send a correct command to the feedback module.
1.4 Thesis Contributions

There are two main contributions in my thesis:

• The first is making a prototype of a difficult situations recognition system that can work in a flexible, stable manner in many different environments.

• The second is a proposed method to detect obstacles using color and depth information together with 3D point cloud techniques, especially for the problem of early detection of up/down stairs and handling noise in depth data.
Related Works
2.1 Assistive systems for visually impaired people
A lot of devices and systems have been developed to help visually impaired people in daily life. This section presents some research based on vision technology for visually impaired people that is related to my work. Each technology/device aims to cover one or more fields in the Mobility component as shown in Fig 1.2 in Chapter 1. From the point of view of obstacle avoidance system technology, two different technologies are widely used: assistive robots to help visually impaired people with moving activities, and wearable devices. The advantages and disadvantages of each technology are discussed in the following part. Table 2.1 shows the comparison between the two technologies.
Table 2.1: Comparison between assistive robot and wearable device

Assistive Robot
• Typical examples: guided dog robots [22][21][6][17]
• Advantages: can integrate different technologies & devices; long operating time
• Disadvantages: expensive; inconvenient, hard to get on well with society; limited environment (almost works with flat planes only)

Wearable Device
• Typical examples: glasses [32][20], mobile phone, white cane, sensory substitution [5][14]
• Advantages: cheaper than an assistive robot; flexible; convenient while using
• Disadvantages: limited operating time (battery problem); limited in technologies; may harm other human senses
For assistive robots for visually impaired people, in [17] (2004) the authors developed a simple robot-assisted system for indoor navigation, as shown in Fig 2.1. In this work, a lot of passive RFID tags were attached to the environment and the objects to support the navigation (or obstacle avoidance) task. To interact with the visually impaired users, speech recognition with single, simple words and wearable keyboards are used; when the robot is passing an object, it can speak its name through a speech synthesis module. But this system is still under development and was only tested in a laboratory environment, and since the RFID tags play the most important role in this system, it is hard to apply it in real, large environments like offices and residences.

Figure 2.1: Robot-Assisted Navigation from [17]. (A) RFID tag, (B) Robot, (C) Navigation
In [6], Dalal et al. proposed a mobile application along with a robot (named NXT Robot) to help visually impaired people avoid obstacles (Fig 2.2) in an outdoor environment. The robot and mobile phone are connected to each other by Bluetooth, and the interaction between human and robot uses speech communication techniques. On the robot, the authors attached an ultrasonic sensor in order to detect obstacles, and combined this information with the location sent from the mobile phone using GPS. When the user asks to go to a destination, the mobile phone finds the route with the embedded Google Map and gives voice instructions while the user is going to the destination. The advantage of this system is the use of robust navigation applications like Google Map together with the ultrasonic sensor. But its limitation is that it depends on the GPS signal and an internet connection, so it cannot work offline or in environments with a weak GPS signal. Regarding obstacle detection, the system only reports the obstacles in front of the robot, and in complex environments it may give unreasonable instructions.
Recently, Nguyen et al. introduced a mobile robot [22], [21] for indoor navigation using computer vision techniques (see Fig 2.3). This system consists of a PCbot-914, which is a PC fitted with actuators and sensors (wheels, ultrasonic sensors, a normal webcam), and a mobile phone. The visually impaired user takes the mobile phone, chooses the destination location through the touch screen and follows the robot to the destination he/she wants. To communicate with the visually impaired user, vibration patterns on the mobile phone are used. Firstly, the environment must be modeled in an offline phase to build a map with images and static object positions at each location. In the online phase, the image captured from the camera is matched against the database in the learned model to give the location of the robot in the building for the navigation task; another module in this system detects obstacles in the image in order to give warnings to the users. However, this system only works in a limited environment, the offline model must be built before use, and visually impaired people must be trained carefully to use the system (how to recognize a vibration pattern, or the destination position on the touch screen). Also, due to the limitations of the monocular webcam, a lot of unexpected factors such as environment and lighting changes can affect the system results.
For wearable devices helping visually impaired people in daily life, there are some existing products based on different technologies, as presented next. The vOICe system [5] is a head-mounted camera which is a form of sensory substitution; it receives visual information and converts it into a sound signal that the blind person can hear through stereo headphones. The main idea is that the scene or captured image can be represented as a sound pattern where time corresponds to the horizontal axis, pitch corresponds to the vertical axis of the visual pattern, and loudness stands for brightness. Consequently, for each image, with a capture rate of about one image per second, blind people hear a sound sweeping from left to right that represents each position by a pitch and loudness level. In order to know which object/scene corresponds to which sound, the blind person is trained before using the system. But this device also has some limitations. Firstly, auditory signals are very important for blind people, but if they use this device, their hearing is blocked and they can be distracted from the natural environment. Moreover, because the image is represented by pitch and loudness levels, the system creates a very noisy sound map when moving through a complex environment, and it is complicated for the blind user to understand the scene.
A product in development called BrainPort (from Wicab Inc.), recently approved for sale in Europe [32], uses a camera mounted on a pair of sunglasses as its input device. After image processing, images are displayed on the tongue using an electrotactile display of 49 electrodes to provide directional cues to the blind users. This is shown as a small electrical "image" on the tongue, in a "lollipop"-like display, as illustrated in Fig 2.4. The limitations of this system are that it requires use of the mouth, which reduces the abilities of blind people, especially in speaking and eating, which are very important and frequent activities. Another problem is that the resolution of both the electrotactile display and tongue sensitivity is still far from that of the visual system, so representing the image directly on the human tongue is not an effective way to represent data. In my work, I use a similar device, but the output on the electrotactile display is just a defined encoded signal, which is easier for the blind person to recognize as instructions.
Very recently in Vietnam, Dr. Nguyen Ba Hai successfully developed a vision-based device named "A haptic device for blind people" [20]. This is a very low-cost, portable pair of glasses which can help visually impaired people detect an obstacle using a single laser transmitter and receiver. When the glasses detect an obstacle, they trigger a small vibrator on the forehead so that the visually impaired person can feel the obstacle. However, this device is very simple: it cannot detect potential obstacles coming from the two sides of the visually impaired person, nor does it take object information into account.

Figure 2.4: BrainPort vision substitution device [32]
2.2 RGB-D based assistive systems for visually impaired people
Nowadays, with the development of depth sensors, there are some works dedicated to RGB-D based assistive systems for visually impaired people, as follows:
NAVI [33] (Navigational Aid for Visually Impaired) is a system similar to my proposal. It also uses a Kinect with a battery, fixed on a helmet. There are two main functions in this system, called "Micro-Navigation" and "Macro-Navigation": Micro-Navigation means obstacle avoidance and Macro-Navigation stands for path finding. For the purpose of giving information about obstacles to blind people, vibrotactile output is provided by a waist belt that contains three pairs of Arduino LilyPad vibe boards. For obstacle detection, the system detects the closest obstacles in the left, right and center regions of the Kinect's field of view using a depth histogram and triggers the vibe board in the corresponding direction. For Macro-Navigation, the authors used fixed markers to annotate locations, detected them via the Kinect's RGB camera, and used the depth image to calculate the person's distance to the marker in order to give navigation instructions. The output of this function is synthesized voice as feedback to the blind person about the instructions to move, for example "Open the door".
In [8], the authors present a low-cost system using the Kinect sensor for obstacle avoidance for visually impaired people. The Kinect sensor is mounted on the body using a belt. It is connected to a battery pack and to a computing device (smart phone) which provides audio feedback to the user. The system performs 5 main tasks: i) read the data from the Kinect device and express it as a 3D point cloud; ii) detect the floor and register the data in the reference system centered at the user's feet; iii) detect the occupancy of the volume in front of the user; at this step, the space is subdivided into a certain number of sectors, each one corresponding to a vertical volume spanning a range of possible directions; iv) analyze the output of the accelerometer to determine if the user is walking and how fast; v) provide the feedback to the user.
Tang et al. also presented an RGB-D sensor based computer vision device to improve the performance of visual prostheses (retinal prosthesis or tongue stimulator) [27]. Firstly, patch-based stereo vision is applied to create RGB-D data that is already segmented, based on color segmentation, feature point matching, and plane fitting and merging using RANSAC. Then they apply a technique called "smart sampling" to highlight the important information. This step includes background removal, parallax simulation, object highlighting and path directions. In the final step, the information is represented using the BrainPort device for line orientation and navigation.
In addition, my thesis gets motivation from and inherits the work of Michiel et al. [30]. In this work, the authors proposed an obstacle detection algorithm based on point clouds to run with the Kinect. Obstacles are defined as obstacles on the floor plane, doors and stairs. However, the authors only tested stair and door detection independently alongside obstacle detection. Fig 2.5 shows the main process of the obstacle detection algorithm, which is partially similar to my proposal in 3.1. Firstly, the point cloud is down-sampled and filtered to reduce the processing time. Then, using RANSAC, the ground plane and wall planes are removed from the point cloud, and clustering techniques are used to detect obstacles in the remaining point cloud data. For stair detection, a process similar to ground plane detection is used, and with some pre-defined parameters such as step height and step number, stairs are detected from the point cloud. For door detection, taking the observation that a door is always on a wall plane, the authors use color-based segmentation on the wall plane to find a door region. The results obtained when applying these algorithms to the authors' database are very promising. However, because this algorithm relies almost entirely on plane segmentation and ground plane detection, when applying it to our dataset (the MICA dataset, presented in 4.1), where the point cloud data is not good on the floor plane because of motion and lighting conditions, the achieved results are still low. And with the pre-defined parameters such as step height, the stair detection algorithm is not robust when applied in our environment, where these specifications can change.
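The core of this pipeline (ground removal with RANSAC, then clustering what remains) can be sketched in a few lines. This is my illustrative reconstruction using the Open3D library, not the code of [30], and the thresholds are placeholder values:

```python
import open3d as o3d
import numpy as np

# Load a point cloud (placeholder file name).
pcd = o3d.io.read_point_cloud("scene.pcd")
pcd = pcd.voxel_down_sample(voxel_size=0.03)  # down-sample to speed things up

# RANSAC plane fit: the dominant plane is assumed to be the ground.
plane_model, inliers = pcd.segment_plane(distance_threshold=0.03,
                                         ransac_n=3,
                                         num_iterations=200)
obstacles = pcd.select_by_index(inliers, invert=True)  # drop ground points

# Euclidean-style clustering of the remaining points: each cluster
# is a candidate obstacle.
labels = np.array(obstacles.cluster_dbscan(eps=0.08, min_points=30))
print(f"detected {labels.max() + 1} obstacle clusters")
```

In [30] the wall planes are removed in the same RANSAC spirit before clustering; the library calls here simply make the sequence of steps explicit.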
2.3 Stair Detection
Stair detection in an image is a classical problem in computer vision, since stairs are familiar objects in daily life. The most prominent characteristic of a stair is its rigid form with a repetitive structure of stair planes. As a result, a lot of lines and edges appear in a stair image. Therefore, stair detection is an interesting topic for researchers using traditional computer vision techniques like line detection, edge detection, frequency analysis and image segmentation.
In [26] (2000), before Hough line detection was widely used for this problem, the authors proposed a method to detect stairs in gray-scale images using some basic image processing operators combined with constraints to find the stairs in the image. The stair images are outdoor scenes with good lighting conditions, where the edges of the stair planes are quite clear. The authors first use Canny detection to extract the edges, and Gabor filters in the horizontal and vertical directions to focus on the two main directions, then find concurrent lines (hence finding the vanishing point) as hypotheses for staircases. However, since this algorithm is based on simple operators and rules, it is very sensitive to parameters (of the Canny detector, Gabor filter, and line length), and detection is not good when there is another object with concurrent lines in the image. Fig 2.6 illustrates the result of this system.
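As an illustration of that edge-extraction step (my own sketch with OpenCV; the parameter values are arbitrary, not taken from [26]), a Gabor kernel oriented toward horizontal stripes responds strongly to the repetitive pattern of stair edges:

```python
import cv2
import numpy as np

gray = cv2.imread("stairs.png", cv2.IMREAD_GRAYSCALE)  # placeholder image

# Edge map (Canny thresholds are arbitrary here).
edges = cv2.Canny(gray, 50, 150)

# Gabor kernel oriented to respond to horizontal stripes such as
# stair edges (theta is the orientation of the stripe normal).
kernel = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0,
                            theta=np.pi / 2, lambd=10.0,
                            gamma=0.5, psi=0)
response = cv2.filter2D(gray.astype(np.float32), -1, kernel)

# Keep strong responses only; these pixels concentrate on the
# repetitive stair-edge structure.
mask = response > response.mean() + 2 * response.std()
```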
Recently, with the development of sensors with the ability to capture depth information, stair detection can be done efficiently in 3D, and it remains a good research topic around plane detection and plane segmentation in 3D. Among stair detection systems using RGB or RGB-D data, some works aim to develop a system which can detect stairs early in order to give moving instructions to blind people or to control a robot automatically; these are presented in the following part.

Figure 2.6: Stair detection from [26]. (A) Input image, (B)(C) frequency as an output of the Gabor filter, (D) stair detection result
RGB or RGB-D data, there are some works aim to develop a system which can early
detect the stair in order to give moving instructions to the blind people or control the
robot automatically and will be presented in the following part
In [13], the authors proposed an approach to detect descending stair to control the
autonomous tracked vehicle using the only monocular gray-scale camera Firstly, the
stair will be coarsely detected if it’s far from the robot This is called “Far-approach”
which uses optical flow and texture information to predict the stair By taking the
observation that if the stair is descending, so the region above it is normally the wall
with low texture region Another observation is that when moving, if there is a stair
appears, the optical flow in the is changed very fast from the ground region to the wall
region because there’s a significant changing in the depth of the image This step can
be illustrated in the Fig 2.7below
With the near-approach, detected lines will be applied some constraints to the line using
a median flow of the region above and below the line, the length of line to look for stair
edge
Trang 35(a) (b)
Figure 2.7: A near-approach for stair detection in [13] (A) Input image with detected
stair region, (B) Texture energy, (C)Input image with detected lines are stair
candi-dates, (D)Optical flow maps in this image, there is a significant changing in the line in
the edge of stair
In [24], the authors proposed a method to detect and model a stair using a depth sensor. The depth image is used to create point cloud data. Firstly, plane segmentation and classification are applied to find the planes in a scene. This step includes normal estimation, region growing, a planarity test, plane extension, cluster extraction and classification (see Fig 2.8).

After the planes have been segmented, the stair is modeled as a group of parallel planes. Each plane must satisfy conditions on size, height and orientation, as shown in Fig 2.9.

The advantages of this method are the robustness of depth information and the fact that the stair is modeled very explicitly using a lot of conditions applied to the stair planes. But these constraints also require the stair planes to be clearly visible in the images, while in the real environment, due to occlusion and the camera viewpoint, these conditions may not be satisfied.

Figure 2.8: Example of segmentation and classification in [24]

Figure 2.9: Stair modeling (left) and features in each plane [24]
Yingli Tian [29] proposed a mixed approach using both RGB and depth images to find stairs and crosswalks. In the first step, parallel lines are detected in the RGB image to find a group of concurrent lines, using the Hough transform and line fitting with geometric constraints. This step is illustrated in Algorithm 1.

Algorithm 1 Parallel line detection from [29]
1: Detect edge maps from the RGB image by edge detection
2: Compute the Hough transform of the RGB image to obtain the directions of the lines
3: Calculate the peaks in the Hough transform matrix
4: Extract line segments and their directions in the RGB image
5: Group line fragments into the same line if the gap is less than a threshold
6: Detect a group of parallel lines based on constraints such as the length and the total number of detected lines of stairs and pedestrian crosswalks
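Steps 1-4 of Algorithm 1 map directly onto standard OpenCV calls; the following is my illustrative sketch of those steps (the thresholds are placeholders, not the values used in [29]):

```python
import cv2
import numpy as np

img = cv2.imread("stairs.png")  # placeholder image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Step 1: edge map.
edges = cv2.Canny(gray, 50, 150)

# Steps 2-4: the probabilistic Hough transform votes in (rho, theta)
# space and returns line segments directly.
segments = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                           threshold=60, minLineLength=40, maxLineGap=10)

# Direction of each segment, used later for grouping parallel lines
# (steps 5-6 of Algorithm 1).
angles = [np.arctan2(y2 - y1, x2 - x1)
          for x1, y1, x2, y2 in segments[:, 0]]
```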
Then, they extract the depth information along each line and feed it into a support vector machine (SVM) classifier to detect stairs (both upstairs and downstairs) or pedestrian crosswalks (as seen in Fig 2.10). Finally, they estimate the distance between the camera and the stairs to give warning messages to the blind user. In this paper, the authors presented a robust stair detection algorithm that takes advantage of both color and depth information. However, both the color and depth images must be clear to get a correct stair detection, where the color information gives stair candidates and the depth information is used to confirm whether it is a stair or not. In my case, due to the limited measurable range of the Kinect depth sensor, the depth image is not always available on the stair surface, and the edge detection algorithm is still sensitive to parameters.

Figure 2.10: Stair detection algorithm proposed in [29]. (A) Detected lines in the edge image (using color information), (B) depth profiles on each line (red line: pedestrian crosswalk, blue: downstairs, green: upstairs)
Obstacle Detection
3.1 Overview
The entire system flowchart can be seen in Fig 3.1. There are 3 main parts in this flowchart: the Kinect side, the laptop side and the user interface (or feedback module) side, where the laptop side contains almost all processing modules.

Firstly, the laptop captures data from the Kinect. By default, the Kinect provides many data types such as images, sound, skeleton and accelerometer data, but in my work, only the depth image, color image and accelerometer data are used. Then humans are detected in the depth image. The Kinect SDK also provides a human index for each pixel, encoded in the depth image. This information is calculated from the depth image, so it is very robust. After a human has been detected, the system stores that information as the first detected obstacle.
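For reference, in the Kinect v1 SDK the per-pixel player index is packed into the low bits of each 16-bit depth value; a hedged sketch of unpacking it follows (the bit layout is as I recall it from the SDK documentation and is worth verifying against the exact SDK version):

```python
import numpy as np

def unpack_depth_frame(raw):
    """Split a packed Kinect v1 depth frame into depth and player index.

    `raw` is a uint16 array; the lower 3 bits hold the player index
    (0 = no player) and the upper 13 bits hold depth in millimeters.
    """
    player_index = raw & 0x7        # 0..6, 0 means background
    depth_mm = raw >> 3             # depth in millimeters
    human_mask = player_index > 0   # pixels belonging to detected people
    return depth_mm, human_mask
```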
The next module is obstacle detection in both the color and depth images using point cloud techniques. All the captured information is used to build a point cloud, a collection of 3D points, in order to reconstruct the environment around the blind person. Then, from the point cloud, several techniques are applied to find obstacles: plane segmentation, ground and wall plane detection, and then obstacle detection. The output of this module is the detected obstacles, including normal obstacles and stairs.
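Building the point cloud from the depth image is a direct application of the back-projection formula from Section 1.2.3; here is a minimal vectorized sketch (the intrinsics below are typical Kinect v1 values used as placeholders, not calibration results from this thesis):

```python
import numpy as np

def depth_to_point_cloud(depth_mm, fx=594.2, fy=591.0, cx=320.0, cy=240.0):
    """Back-project a depth image (millimeters) into an Nx3 point cloud."""
    h, w = depth_mm.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_mm.astype(np.float32) / 1000.0   # meters
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]            # drop invalid (zero-depth) pixels
```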
Another individual module is stair detection based on the color image. This module detects stairs directly in the color image by using line extraction techniques and the geometric relationships between lines.
In the obstacle fusion module, all the obstacles are checked again for their validity in the scene, and the most important (or most dangerous) obstacle to the blind person is returned in order to send commands to the user interface module through the obstacle warning module.