MINISTRY OF EDUCATION AND TRAINING MINISTRY OF NATIONAL DEFENSE
ACADEMY OF MILITARY SCIENCE AND TECHNOLOGY
DOAN VAN TUAN
RESEARCH ON THE APPROACH TO IMPROVE SIGNAL PROCESSING SPEED IN THE STEREO VISION SYSTEM
Specialization: Electronic Engineering
Code: 9 52 02 03
SUMMARY OF PhD THESIS IN ELECTRONIC ENGINEERING
Hanoi, 2019
The thesis has been completed at:
ACADEMY OF MILITARY SCIENCE AND TECHNOLOGY
Scientific supervisors:
1. Dr. Ha Huu Huy
2. Assoc. Prof. Dr. Bui Trung Thanh
Reviewer 1: Assoc. Prof. Dr. Hoang Manh Thang
Hanoi University of Science and Technology
Reviewer 2: Assoc. Prof. Dr. Le Nhat Thang
Posts and Telecommunications Institute of Technology
Reviewer 3: Dr. Vu Le Ha
Academy of Military Science and Technology
The thesis was defended at the Doctoral Evaluating Council at Academy level held at Academy of Military Science and Technology at
…… , date 2019
The thesis can be found at:
- The library of Academy of Military Science and Technology
- Vietnam National Library
INTRODUCTION
1 The necessity of the thesis
Science and technology have developed strongly, especially since the fourth industrial revolution (Industry 4.0) was initiated in Germany in 2013. One of the dominant factors of Industry 4.0 is that robots will gradually replace workers in factories. Such robots must therefore process information in a three-dimensional (3D) environment through a vision system in order to orient themselves and to locate and identify surrounding objects accurately; this is called stereo vision or 3D robot vision. In addition, stereo vision is applied in identification, reconstruction, positioning, surgery, self-propelled vehicles, mapping and art.
People want to create a robot vision system that resembles human vision. The simplest such system uses a stereo camera, consisting of two cameras combined with an embedded processing system, in place of the two human eyes. Stereo camera information is processed by algorithms running on devices such as CPU, DSP, GPU, FPGA and ASIC, combined with implementation tools such as Matlab, OpenCV and CUDA. Such a system is called a stereo vision system. The major challenges for a stereo vision system are that the volume of image data from the stereo camera keeps increasing, execution must respond in real time, reliability must be high and memory capacity is finite. One of the most effective ways to address these challenges is to develop better processing algorithms, since processing platforms have not yet developed enough to meet these needs.
2 The objectives of the thesis
Study specific approaches to improving belief propagation algorithms in order to speed up execution and reduce the amount of memory required when computing the disparity map of dense two-frame high-resolution stereo camera images in a stereo vision system for 3D robot vision.
3 Research objectives and scope of the thesis
- The thesis focuses on studying solutions to reduce the energy function of belief propagation algorithms that compute the disparity map of dense two-frame high-resolution stereo camera images in the stereo vision system.
- Stereo camera images are taken from the test data set.
- Study and propose solutions to speed up the BP algorithm in order to make the disparity map computation more effective.
4 Research methodology of the thesis
The thesis focuses on researching solutions to optimize the energy function of belief propagation algorithms that compute the disparity map of dense two-frame high-resolution stereo camera images. It analyzes improved belief propagation algorithms, proposes solutions to reduce the energy function of the belief propagation algorithm and selects an appropriate processing platform to achieve the purpose of the thesis. Starting from mathematical analysis and parameterization, the thesis uses simulation tools and data from the test data set to verify the correctness of the research results.
5 The scientific and practical contributions of the thesis
The disparity map of a stereo camera plays a very important role in 3D robot vision. From the disparity map, combined with triangulation, the depth map and the distance from the camera to the object can be estimated. This technique is widely applied in industry, robotics, surgery, self-propelled vehicles, localization and mapping.
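The triangulation step can be illustrated with a minimal sketch, assuming a rectified stereo pair in which depth follows Z = f·B/d (f is the focal length in pixels, B the baseline, d the disparity in pixels). The function name and the numeric values are hypothetical and chosen only for illustration; they do not come from the thesis.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m):
    """Depth Z = f * B / d for a rectified stereo pair.

    Pixels with zero (or negative) disparity have no valid match
    and are reported as infinitely far away (an assumption).
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# Hypothetical camera: 800 px focal length, 10 cm baseline.
depth = disparity_to_depth([[64.0, 32.0], [0.0, 16.0]],
                           focal_px=800.0, baseline_m=0.1)
```

Larger disparities map to smaller depths, which is why disparity resolution matters most for nearby objects.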
The thesis proposes two solutions to reduce the cost function of the belief propagation algorithm. The first solution reduces the number of nodes in the Markov random field model over successive loops using the coarse-to-fine (CTF) method at level 1. The second solution combines the local census transform algorithm with the global belief propagation algorithm, which reduces the cost at the initial start node when the belief propagation algorithm performs its message updates.
6 Research content and structure of the thesis
The thesis consists of 137 pages, organized into 3 chapters, with 40 figures, 29 tables and 14 charts.
Chapter 1:
OVERVIEW OF STEREO VISION AND SIGNAL PROCESSING IN THE STEREO VISION SYSTEM
1.1 Overview of stereo vision
Stereo vision is a very important component of computer vision and has been researched and developed by many scientists over the last two decades [46]. Stereo vision systems are widely applied in many areas such as robotics, self-propelled vehicles, medicine, arts, entertainment and especially Industry 4.0 industrial networks [59]. People want to create a robot vision system that works in a 3D environment in a way similar to human vision, called a stereo vision system, as in Figure 1.1, so that robots and humans can work together interactively.
Figure 1.1 Block diagram of the stereo vision system (blocks: image information, image processing, application)
1.2 Camera model
1.3 Calibration methods
The camera calibration method determines how quickly and how reliably the camera's internal and external parameters are obtained. Several classic calibration methods exist, such as Hall [39], Salvi [37], Tsai [91] and Weng [76], each based on a corresponding camera model. Each model has its own appropriate calibration methods, with different advantages and disadvantages.
1.4 Rectification methods
Rectification methods simplify the search for homologous points in the stereo camera images and improve image processing reliability. They fall into two types: rectification performed after calibration [9], [105], and rectification performed without calibration [26].
1.5 Stereo matching algorithms
Over the past two decades, many matching algorithms have been proposed [46]. Matching algorithms are classified according to the type of stereo camera image. Matching algorithms for sparse two-frame high-resolution stereo camera images, such as SIFT [10] and SURF [66], are used in stereo vision systems that require high speed and low memory capacity but do not require high reliability; they are often applied to navigation systems, mapping or SLAM [36] and self-propelled vehicles. Matching algorithms for dense two-frame high-resolution stereo camera images, such as [7], [44], are used in stereo vision systems that require high reliability; they are often applied to industrial product inspection, 3D robot vision, surgery and object reconstruction, but they have high computational complexity and high memory requirements. Matching algorithms for dense two-frame high-resolution stereo camera images are of three main types: local algorithms [15], [101], global algorithms [48], [78] and semi-global algorithms [24], [90].
1.6 Hardware processing methods in stereo vision
- CPU method
- DSP method
- GPU method
- FPGA/ASIC method
1.7 Evaluation of hardware processing methods in stereo vision
From CPU → DSP → GPU → FPGA → ASIC, processing efficiency increases step by step, while cost and power consumption decrease accordingly. Software implementations of stereo matching algorithms are more flexible and have short development cycles, whereas hardware implementations have a longer design cycle and less design flexibility, because the algorithm must be optimized and mapped to the hardware at the same time. From a practical point of view, dedicated stereo vision processing hardware is better suited to real-time stereo vision systems because it consumes less power and is cheaper.
1.8 Research directions to improve the efficiency of the stereo vision system
- Image segmentation or hierarchy optimization
- Occlusion and consistency handling
- Matching cost & energy optimization improvement
- Cooperative optimization
- Efficient memory arrangement method
- Advanced VLSI design method
1.9 Conclusion of chapter 1
Chapter 1 presents an overview of the main components of the stereo vision system, which consists of two main blocks: the image information block and the image processing block. Each component has been analyzed and its role in the system assessed. The image information block consists of two main components: the stereo camera and camera calibration. This block provides stereo camera parameters such as image size, depth disparity, and the internal and external parameters of the stereo camera. These parameters also affect the reliability of the system. The image processing block determines the effectiveness of the system and comprises software and hardware. The software consists of programs that implement the processing algorithms, including image rectification and matching algorithms. The matching algorithms in particular have the primary effect on the efficiency of the system. The hardware is the processing platform on which the software solutions run, and it also contributes to the efficiency of the stereo vision system. In addition, how well the processing platform and the matching algorithm fit each other influences the performance of the stereo vision system.
The selected hardware is an Nvidia GTX 750 Ti GPU processing platform with 2 GB, 460 memory and 128-bit bandwidth, using CUDA 7.5 and the Qt Creator development environment, combined with an Intel Core i7 CPU and 8 GB of RAM running Windows 8.1. The GPU platform is selected because it supports parallel processing, has many processor cores and offers high bandwidth and growing memory capacity, which suits the experimental program of the thesis.
Chapter 2:
RESEARCH BELIEF PROPAGATION ALGORITHMS AND BUILD THE METHODS TO IMPROVE SIGNAL PROCESSING SPEED IN THE STEREO VISION SYSTEM
2.1 Markov random field
The Markov random field (MRF) is a model from probability theory. It is used as a tool for modeling image data, combined with winner-take-all algorithms. In addition, the Markov random field is used as a means of performing inference on images. Inference over the basic image and frame structure solves problems such as image reconstruction, image segmentation, stereo vision and object labeling. The Markov random field model usually takes one of two forms: grid-like structure and part-based structure.
2.2 Belief propagation
Belief propagation uses messages that carry the disparity values of corresponding points and passes them between nodes iteratively to perform inference on the graph model. This method provides exact inference for part-based structure models and approximate inference for grid-like structures. A belief propagation algorithm is used to find the maximum a posteriori (MAP) estimate in Markov random field models for stereo vision problems.
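As an illustration of a single message update, the following is a minimal sketch of the standard min-sum rule on a grid MRF. It assumes a truncated linear smoothness cost V(fp, fq) = lam * min(|fp - fq|, trunc); the function name, the parameters `lam` and `trunc`, and the sample costs are hypothetical and are not taken from the thesis.

```python
import numpy as np

def update_message(data_cost_p, incoming, lam=1.0, trunc=2.0):
    """Min-sum message sent from node p to a neighbor q.

    data_cost_p: (K,) data cost of each disparity label at p.
    incoming:    list of (K,) messages into p from its neighbors other than q.
    """
    # h(fp) = data cost at p plus all messages arriving at p (except from q).
    h = np.asarray(data_cost_p, dtype=np.float64).copy()
    for m in incoming:
        h += m
    labels = np.arange(h.size)
    # V[fp, fq]: assumed truncated linear smoothness cost.
    V = lam * np.minimum(np.abs(labels[:, None] - labels[None, :]), trunc)
    # For each label fq of the receiver, minimize over the sender label fp.
    msg = np.min(h[:, None] + V, axis=0)
    # Subtract the minimum so message values stay bounded over iterations.
    return msg - msg.min()

# Node p strongly prefers disparity label 1; no other incoming messages.
msg = update_message([5.0, 0.0, 5.0, 5.0], incoming=[])
```

After the iterations finish, the disparity at a node is usually taken as the label that minimizes its data cost plus the sum of all incoming messages.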
2.3 Census transform
The census transform is a non-parametric transformation that does not depend on the lighting conditions of the image [86]. It converts each pixel into a bit string of length N that encodes the local spatial structure around the pixel. Each neighboring pixel, excluding the center pixel, is mapped to one bit of the N-bit sequence by a threshold comparison: if the intensity of the neighboring pixel is greater than the intensity of the center pixel, the bit is 1; otherwise the bit is 0.
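The transform can be sketched as follows, using the comparison direction stated above (a neighbor brighter than the center gives bit 1). The window radius, the row-major bit order and the edge-replication border handling are assumptions made for this illustration. The Hamming distance helper shows the usual way two census codes are compared as a matching cost.

```python
import numpy as np

def census_transform(img, radius=1):
    """Census code per pixel: one bit per neighbor, 1 if neighbor > center."""
    img = np.asarray(img, dtype=np.int64)
    h, w = img.shape
    # Border pixels reuse the nearest edge value (an assumption).
    padded = np.pad(img, radius, mode="edge")
    code = np.zeros((h, w), dtype=np.uint64)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue  # the center pixel contributes no bit
            neighbor = padded[radius + dy: radius + dy + h,
                              radius + dx: radius + dx + w]
            bit = (neighbor > img).astype(np.uint64)
            code = (code << np.uint64(1)) | bit
    return code

def hamming_distance(a, b):
    """Number of differing bits between two arrays of census codes."""
    x = np.bitwise_xor(np.asarray(a, dtype=np.uint64),
                       np.asarray(b, dtype=np.uint64))
    count = np.zeros(x.shape, dtype=np.uint8)
    while x.any():
        count += (x & np.uint64(1)).astype(np.uint8)
        x = x >> np.uint64(1)
    return count

img = np.array([[10, 20, 30],
                [40, 50, 60],
                [70, 80, 90]])
codes = census_transform(img)
```

With a 3x3 window each code has N = 8 bits, so it fits comfortably in the 64-bit integers used here; windows up to 7x9 or 9x7 (at most 64 neighbors) would fit in the same type.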
2.4 Approaches to speeding up the belief propagation algorithm
- Parallel calculations
- Reduce computational complexity
- Reduce the amount of memory required during execution
- Minimize message updates
- Optimize memory access
2.5 Proposed solutions to reduce cost functions
2.5.1 Proposed algorithm 1
The model of the proposed coarse-to-fine belief propagation algorithm (CFBP: proposed algorithm 1) is built on the Markov random field model with a grid-like structure, in which each node has 4 neighbors, as in Figure 2.16. Consider G = (V, E), where G is a graphical model, V is a set of nodes and E is a set of edges. Each node carries a label that is assigned the value of the intensity disparity between the corresponding points of the stereo camera images, often called the data value or data function. Each edge carries a label that is assigned the disparity value between two neighboring labels, often called the smoothness cost or smooth function.
Figure 2.16 Model of proposed algorithm 1
The model of proposed algorithm 1 shows that the algorithm uses the coarse-to-fine (CTF) method at level 1, as shown in Figure 2.17, to reduce the number of nodes after each loop. The CTF method reduces the number of nodes level by level. After executing CTF level l, the number of nodes in the current loop is S = 2×2^l times smaller than the number of nodes in the previous loop (S = 4 at level 1). The cost of performing inference at a node is determined by formula (2.36). The messages in the proposed algorithm are passed in a parallel scheme, as shown in Figure 2.18. The initial start node is the node labeled (0, 0), with the initial message values set to m0' = 0 and m0,0' = 0.
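The node reduction at CTF level 1 can be illustrated with a minimal sketch. It follows the common hierarchical construction in which each coarse node aggregates the data costs of a 2x2 block of pixels, and coarse messages are copied to the four child nodes to initialize the finer level. Whether the thesis aggregates by summation is an assumption, and both function names are hypothetical.

```python
import numpy as np

def coarsen_data_cost(cost):
    """Sum data costs over 2x2 pixel blocks: (H, W, K) -> (H//2, W//2, K).

    Assumes H and W are even. K is the number of disparity labels.
    """
    h, w, k = cost.shape
    return cost.reshape(h // 2, 2, w // 2, 2, k).sum(axis=(1, 3))

def upsample_messages(msg):
    """Copy each coarse message to its 2x2 children: (h, w, K) -> (2h, 2w, K)."""
    return np.repeat(np.repeat(msg, 2, axis=0), 2, axis=1)

fine_cost = np.ones((4, 6, 3))            # 24 nodes, 3 disparity labels
coarse_cost = coarsen_data_cost(fine_cost)  # 6 nodes
init_msgs = upsample_messages(np.zeros(coarse_cost.shape))
```

In this sketch the coarse grid has one quarter of the nodes of the fine grid, so each message passing loop at the coarse level touches four times fewer nodes.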
Figure 2.17 Structure of CTF level 1
Figure 2.18 Message passing scheme
The energy function at CTF level 1 and the energy function of the model are expressed in terms of message vectors of dimension given by the number of possible labels, where I_R(x, y) is the gray level of the right image at the coordinates (x, y) of the stereo camera image.
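For orientation, the energy that belief propagation stereo methods commonly minimize on a grid MRF can be written as below. This is the standard textbook form and is an assumption here: the thesis's own formulas may differ in symbols and terms. The symbols D_p (data cost), V (smoothness cost), f_p (disparity label at pixel p), I_L and I_R (left and right gray levels), and the absolute-difference data cost are illustrative choices.

```latex
% Total energy of a labeling f: data term over pixels plus
% smoothness term over 4-connected neighbor pairs.
E(f) = \sum_{p \in \mathcal{P}} D_p(f_p)
     + \sum_{(p,q) \in \mathcal{N}} V(f_p, f_q)

% A common (assumed) data cost for pixel p = (x, y) and disparity f_p:
D_p(f_p) = \bigl| I_L(x, y) - I_R(x - f_p,\, y) \bigr|
```

Lower energy corresponds to a disparity map that matches the image data while staying smooth between neighboring pixels.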
2.5.2 Proposed algorithm 2
The model of the proposed coarse-to-fine change-space belief propagation algorithm (CFCSBP: proposed algorithm 2) has the same structure as the model of proposed algorithm 1, as shown in Figure 2.20. The difference between the two models is that proposed algorithm 1 must perform a number of loops equal to the number of disparities of the image, whereas in proposed algorithm 2 the number of loops varies with the coefficient Z'' according to formula (2.50), relative to the depth disparity of the image.
Figure 2.20 Model of proposed algorithm 2
Consider a stereo camera image with resolution m, n and k, where m is the number of pixels in a row, n is the number of pixels in a column and k is the number of depth disparities of the image. The number of coarse-to-fine (CTF) level 1 loops is determined by formula (2.49), in the same way as the selection of k2', where Z'' is the coefficient of depth change.
The cost of passing a message in proposed algorithm 2 is calculated in the same way as in proposed algorithm 1, except that proposed algorithm 1 performs k' loops (from k1' and k2') while proposed algorithm 2 performs k'' loops (from k1'' and k2'').
2.6 Proposed solutions for cooperative optimization
2.6.1 Proposed algorithm 3
The model of the proposed census transform belief propagation algorithm (CTBP: proposed algorithm 3) is built on the Markov random field model with a grid-like structure, in which each node has 4 neighbors, as in Figure 2.22. Consider G = (V, E), where G is a graphical model, V is a set of nodes and E is a set of edges. Each node carries a label that is assigned the value of the intensity disparity between the corresponding points of the stereo camera images, often called the data value or data function. Each edge carries a label that is assigned the disparity value between two neighboring labels, often called the smoothness cost or smooth function. Let V1, V2, V3, V4 and E1, E2, E3, E4 be, respectively, the nodes and edges of part 1, part 2, part 3 and part 4 of the proposed algorithm 3 model.
Figure 2.22 Model of proposed algorithm 3
The proposed algorithm 3 model shows that the start nodes for message passing are labeled (0, 0) on the belief propagation algorithm model