CHAPTER 5 SINGLE-LENS TRINOCULAR STEREOVISION

In this chapter, we present a novel design for stereovision: a 3F filter (prism) based single-lens trinocular stereovision system. This system can be considered as an extension of the single-lens binocular stereovision system presented in Chapter 4. An image captured by this system is divided into three sub-images, called a stereo image triplet, and these three sub-images can be taken as the images captured by three virtual cameras which are created by the 3F filter. The stereo image triplet is captured simultaneously, and hence this system can handle a dynamic scene without any problem; video-rate image capturing is also not a problem for this system.
The basic ideas of the two approaches used to study the previous single-lens binocular system are also applied here to model and determine this single-lens trinocular system: one is based on a calibration technique and the other on geometrical analysis of ray sketching. The approach based on geometrical analysis of ray sketching is still of greater interest because of its significantly simpler implementation: it does not require the usual complicated calibration process, but only one simple field point test (see Section 4.1.3) to determine the whole system once the system is fixed and the pin-hole camera model is used. In addition, greater consideration is given so that the mathematical analysis used in this approach can be generalized, with as little modification as possible, to explain similar systems which employ prisms with similar pyramid-like structures but a different number of faces (≥3), the so-called single-lens multi-ocular stereovision systems, which will be introduced in the next chapter. An implicit mathematical solution is given. Due to its complexity, this solution can only be obtained numerically by computer programming, unlike the explicit mathematical expressions obtained for the single-lens binocular stereovision system by the geometrical analysis based approach. The mathematical method used by this approach is made generic, to facilitate comprehensive system analysis and also to provide a flexible way of analyzing any refractive ray problem involving a planar glass surface in 3-dimensional space. Experiments are conducted to test the feasibility of both approaches.
Search of stereo correspondence is a difficult issue in stereovision. Trinocular stereovision, which enables cross-checking of hypothesized correspondences using additional epipolar constraints, contributes to the solution of this problem. A short review of epipolar geometry and its application to trinocular stereovision is given in Appendix A. Trinocular stereovision can also help to solve the problem of occlusion in stereovision, and its redundant stereo information should lead to better accuracy in depth recovery. In 1986, the idea of trinocular stereovision was presented by Yachida et al. [38]. Extensive discussions were given by Ayache [39][41][44]. A list of the pioneers of trinocular stereovision is given in [38]-[44]. The trinocular vision systems that have appeared in the literature include both the orthogonal configuration and the non-orthogonal configuration.
Because of the potential advantages of trinocular stereovision, research is still being carried out and different applications have been developed in recent years. A list of more recent works on trinocular stereovision is [45]-[52]. Chiou et al. [45] discussed the optimal camera geometry of trinocular stereovision with regard to system performance; Agrawal and Davis [52] studied the problem of shortest paths and the ordering constraint in trinocular correspondence searching; Pollard et al. [51] presented an application of a trinocular stereo system to view synthesis. Discussions of trinocular stereovision can also be found in the books by Faugeras [5] and Sonka et al. [7], and a discussion of its geometrical properties can be found in the book by Hartley and Zisserman [6].
Nevertheless, the price to pay for trinocular stereovision is the third camera, which often increases the complexity of system setup, calibration and camera synchronization. Developing a single-lens trinocular stereovision system may help to solve these problems, but very few works on single-lens trinocular stereovision systems that can perform simultaneous image capturing have been reported. Some relevant but different works are presented by Kurada [53] and Ramsgaard [54]. Both systems employ mirrors and can perform close-range stereovision. The design of Kurada [53] uses a four-mirror setup such that three views of a scene can be imaged onto a camera image plane side by side via a tri-split lens head. However, its system configuration is relatively complex and, more importantly, the three virtual camera optical axes are nearly co-planar, which makes it difficult to apply epipolar constraints for correspondence searching. The design of Ramsgaard [54] positions two rectangular mirrors perpendicular to each other and both parallel to the real optical axis, such that the camera can simultaneously capture one direct view of the object and two reflected images of it. However, this system needs to capture an image via two reflections, and the information from it is not easily utilized because its quality depends on perfect alignment. It also suffers from inefficient CCD matrix usage.
In our work, an alternative way of building a single-lens trinocular stereovision system which avoids the above problems is presented, together with detailed methods to model the system, including a method that provides fast and efficient implementation. To our knowledge this design is novel. Part of the work reported in this chapter has been published in [55].
5.1 Virtual Camera Generation
The key issue in modeling and determining our single-lens trinocular system is the determination of the virtual cameras. If a 3F filter is vertically positioned in front of a CCD camera as shown in Figure 5.1, in which the shape of a 3F filter is also illustrated, the image plane of this camera will capture three different views of the same scene behind the filter in one shot. These three sub-images can be taken as the images captured by three virtual cameras which are generated by the 3F filter. One sample image captured by this system is given in Figure 5.2, from which significant differences among the three sub-images, caused by the different view angles and view scopes of the virtual cameras, can be observed. It is assumed that each virtual camera consists of one unique optical center and one "planar" image plane. The challenge is to determine the properties of these virtual cameras, mainly their focal lengths, positions and orientations, so that the disparity information in the sub-images can be exploited to perform depth recovery like a stereovision system. Furthermore, as these three views are captured simultaneously, this system theoretically possesses the merits of a typical trinocular stereovision system, including its special properties on epipolar constraints, which provide a significant advantage in correspondence searching.
Like the virtual camera model used for the single-lens binocular stereovision system in the previous chapter, it is assumed that the Field of View (FOV) of each virtual camera is constrained by two boundary lines (see Figure 5.4): one boundary line is the optical axis of the virtual camera, which can be determined by back-extending the refracted ray that is aligned with the real camera optical axis; the other FOV boundary line of the virtual camera can be determined by back-extending the refracted ray that is aligned with the real camera FOV boundary line(s). The optical center of the virtual camera is found at the intersection of these two FOV boundary lines. Thus, the generation of the virtual camera(s) is done by the preceding method. The properties of each virtual camera can be determined either by calibration or by geometrical analysis of ray sketching, which are presented in the next two sections.
The basic requirements to build this system are:
1) the image plane of the CCD camera in use has consistent properties;
2) the 3F filter is exactly symmetrical with respect to all three of its apex edges and its center axis, which passes through the prism vertex and is normal to its back plane;
3) the back plane of the 3F filter is positioned parallel to the real camera image plane; and
4) the projection of the 3F filter vertex on the camera image plane is located at the camera principal point, and the projection of one apex edge of the filter on the image plane bisects the camera image plane equally and vertically.
With the above requirements satisfied, the camera optical axis will pass through the 3F filter vertex, and the three virtual cameras will have identical properties and will be symmetrically located with respect to the real camera optical axis. Thus the analysis of any one virtual camera is sufficient, as the results can be transposed to the other two virtual cameras. Now the three sub-regions of the image plane (and the three corresponding virtual cameras) can be differentiated using the labels l, r and b, which stand for left, right and bottom, as shown in Figure 5.1.
Figure 5.1 Positioning a 3F filter in front of a CCD camera
Figure 5.2 One image captured by the single-lens trinocular system
5.1.1 Determining the Virtual Cameras by Calibration
The calibration technique introduced in Chapter 3 can also be used here to calibrate the virtual cameras, with slight modifications. Various coordinate systems can be created on the virtual cameras analogously, including the distorted virtual camera 2D image coordinate systems $(X_{d,l}, Y_{d,l})$, $(X_{d,r}, Y_{d,r})$ and $(X_{d,b}, Y_{d,b})$, the undistorted virtual camera 2D image coordinate systems $(X_{u,l}, Y_{u,l})$, $(X_{u,r}, Y_{u,r})$ and $(X_{u,b}, Y_{u,b})$, and the Left Virtual Camera Coordinate System (LCCS), the Right Virtual Camera Coordinate System (RCCS) and the Bottom Virtual Camera Coordinate System (BCCS). $(X_{d,l}, Y_{d,l})$, $(X_{d,r}, Y_{d,r})$ and $(X_{d,b}, Y_{d,b})$ can be linked to the computer image coordinates $(X_f, Y_f)$ via:
$$X_{d,l} = (X_f - C_x)\,dx', \quad Y_{d,l} = (Y_f - C_y)\,dy'; \quad X_{d,r} = (X_f - C_x)\,dx', \quad Y_{d,r} = (Y_f - C_y)\,dy'; \quad X_{d,b} = (X_f - C_x)\,dx', \quad Y_{d,b} = (Y_f - C_y)\,dy', \qquad (5.1)$$
where $dx'$ and $dy'$ are the pixel sizes of the computer sampled images (images captured by the computer and displayed on the computer screen); they can be obtained by multiplying the actual CCD pixel size by the CCD resolution and dividing by the computer sampled image resolution, in both the x and y directions. Hence the calibration of the virtual cameras becomes possible. Each virtual camera can be calibrated one by one using the information provided by its corresponding sub-image, from which the whole system can be determined. This system is then ready to perform depth recovery like a typical trinocular stereovision system using triangulation.
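For illustration, relation (5.1) can be sketched in code as follows; all numeric parameter values (CCD pixel size, resolutions, principal point) are hypothetical placeholders, not values from this thesis.

```python
# A minimal sketch of the sub-image coordinate mapping in equation (5.1).

# Hypothetical parameters: actual CCD pixel sizes (mm), CCD resolution
# and computer-sampled image resolution.
ccd_dx, ccd_dy = 8.4e-3, 9.8e-3
ccd_res_x, ccd_res_y = 768, 576
img_res_x, img_res_y = 640, 480
cx, cy = 320.0, 240.0               # principal point (C_x, C_y) in pixels

# dx' and dy': CCD pixel size times CCD resolution, divided by the
# computer-sampled image resolution, in each direction.
dx_p = ccd_dx * ccd_res_x / img_res_x
dy_p = ccd_dy * ccd_res_y / img_res_y

def computer_to_virtual(xf, yf):
    """Map computer image coordinates (X_f, Y_f) to the distorted image
    coordinates shared by the three virtual cameras, per (5.1)."""
    return (xf - cx) * dx_p, (yf - cy) * dy_p
```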
From the coordinate setup for calibration, the following equations can be obtained:
$$\begin{bmatrix} x_i \\ y_i \\ z_i \end{bmatrix} = R_i \begin{bmatrix} x_w \\ y_w \\ z_w \end{bmatrix} + T_i, \qquad i \in \{l, b, r\}, \qquad (5.2)$$

where

$$R_i \equiv \begin{bmatrix} r_{i1} & r_{i2} & r_{i3} \\ r_{i4} & r_{i5} & r_{i6} \\ r_{i7} & r_{i8} & r_{i9} \end{bmatrix}, \qquad T_i \equiv \begin{bmatrix} T_{ix} \\ T_{iy} \\ T_{iz} \end{bmatrix}.$$
The precondition for the preceding equations to hold is that the world coordinate systems used in calibrating the left, bottom and right virtual cameras must all be the same coordinate system (same origin and orientation).
From the calibration results, $R_l$, $T_l$, $R_b$, $T_b$, $R_r$ and $T_r$ can be obtained, as well as $f_l$, $f_b$ and $f_r$. The details of the calibration procedure can be found in Tsai [37] and in Chapter 4 of this thesis.
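As a sketch of this per-sub-image calibration step, the snippet below calibrates one virtual camera from correspondences observed in its sub-image. The thesis uses Tsai's method [37]; OpenCV's calibrateCamera is substituted here purely for illustration, and the function and variable names are assumptions.

```python
import numpy as np
import cv2

def calibrate_virtual_camera(object_pts, sub_image_pts, image_size):
    """object_pts: Nx3 world points (the SAME world frame must be used for
    l, b and r); sub_image_pts: Nx2 pixel points seen in one sub-image."""
    obj = [np.asarray(object_pts, np.float32)]
    img = [np.asarray(sub_image_pts, np.float32)]
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj, img, image_size, None, None)
    R, _ = cv2.Rodrigues(rvecs[0])   # rotation matrix R_l / R_b / R_r
    T = tvecs[0].reshape(3)          # translation vector T_l / T_b / T_r
    f = (K[0, 0] + K[1, 1]) / 2.0    # effective focal length (pixel units)
    return R, T, f
```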
It is also known that:
$$X_{u,i} = f_i\,\frac{x_i}{z_i}, \qquad Y_{u,i} = f_i\,\frac{y_i}{z_i}, \qquad i \in \{l, b, r\}, \qquad (5.3)$$

and hence, substituting (5.2) into (5.3),

$$X_{u,i} = f_i\,\frac{r_{i1}x_w + r_{i2}y_w + r_{i3}z_w + T_{ix}}{r_{i7}x_w + r_{i8}y_w + r_{i9}z_w + T_{iz}}, \qquad Y_{u,i} = f_i\,\frac{r_{i4}x_w + r_{i5}y_w + r_{i6}z_w + T_{iy}}{r_{i7}x_w + r_{i8}y_w + r_{i9}z_w + T_{iz}}, \qquad i \in \{l, b, r\}. \qquad (5.4)$$
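To make the projection relations concrete, the following is a direct transcription of (5.3) and (5.4) for one virtual camera; a minimal sketch under the reconstruction above, not the thesis code.

```python
import numpy as np

def project(R, T, f, pw):
    """Project a world point pw = (x_w, y_w, z_w) to the undistorted image
    coordinates (X_u, Y_u) of one virtual camera, per (5.3)-(5.4)."""
    x, y, z = R @ np.asarray(pw, float) + T   # camera-frame coordinates, (5.2)
    return f * x / z, f * y / z
```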
Rearranging (5.2) and (5.3) into one linear system in the unknowns $x_w$, $y_w$, $z_w$, $z_l$, $z_b$ and $z_r$ gives:

$$A\,c = B, \qquad (5.5)$$

where

$$A = \begin{bmatrix}
r_{l1} & r_{l2} & r_{l3} & -X_{u,l}/f_l & 0 & 0 \\
r_{l4} & r_{l5} & r_{l6} & -Y_{u,l}/f_l & 0 & 0 \\
r_{l7} & r_{l8} & r_{l9} & -1 & 0 & 0 \\
r_{b1} & r_{b2} & r_{b3} & 0 & -X_{u,b}/f_b & 0 \\
r_{b4} & r_{b5} & r_{b6} & 0 & -Y_{u,b}/f_b & 0 \\
r_{b7} & r_{b8} & r_{b9} & 0 & -1 & 0 \\
r_{r1} & r_{r2} & r_{r3} & 0 & 0 & -X_{u,r}/f_r \\
r_{r4} & r_{r5} & r_{r6} & 0 & 0 & -Y_{u,r}/f_r \\
r_{r7} & r_{r8} & r_{r9} & 0 & 0 & -1
\end{bmatrix},$$

$c = [x_w \; y_w \; z_w \; z_l \; z_b \; z_r]^T$ and $B = [-T_{lx} \; -T_{ly} \; -T_{lz} \; -T_{bx} \; -T_{by} \; -T_{bz} \; -T_{rx} \; -T_{ry} \; -T_{rz}]^T$. With the least squares solution,

$$c = (A^T A)^{-1} A^T B. \qquad (5.6)$$
The redundant information obtained with three virtual cameras (any two virtual cameras are enough for stereovision purposes) is handled by the least squares method, and the condition number appearing in calculating the matrix inverse is not a problem, as shown by our calculations in the experiment. This is believed to be due to the fact that all three virtual cameras are naturally symmetrically located (in other words, evenly scattered) about the optical axis of the real camera, and this situation leads to the maximum linear independence amongst the coordinate systems of the three virtual cameras that can be achieved in such a system design. (This explanation is equally valid for the calibration based approach and for the single-lens multi-ocular stereovision systems to be presented in the following sections and chapters.) Now this system is ready for depth recovery.
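A sketch of the least squares depth recovery in (5.5)-(5.6), assuming the matrix layout reconstructed above with $c = [x_w\; y_w\; z_w\; z_l\; z_b\; z_r]^T$; all names are illustrative.

```python
import numpy as np

def triangulate(obs):
    """obs: three tuples (R, T, f, (X_u, Y_u)) for the l, b and r virtual
    cameras. Returns the recovered world point (x_w, y_w, z_w)."""
    A = np.zeros((9, 6))
    B = np.zeros(9)
    for i, (R, T, f, (Xu, Yu)) in enumerate(obs):
        r0 = 3 * i
        A[r0:r0 + 3, 0:3] = R                          # rotation rows
        A[r0:r0 + 3, 3 + i] = [-Xu / f, -Yu / f, -1.0] # depth column z_i
        B[r0:r0 + 3] = -T
    c, *_ = np.linalg.lstsq(A, B, rcond=None)  # c = (A^T A)^-1 A^T B, (5.6)
    return c[:3]
```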
There are other ways to organize the triangulation information. For example, one method is to find the depth information using any two virtual cameras and take the average of the three results obtained from the three combinations of virtual cameras. However, organizing all the triangulation information in one linear system (equation (5.5)), which is more systematic, is preferred here.
It is well known that camera calibration is normally quite tedious to implement, since calibration software needs to be prepared, calibration patterns need to be fabricated with good precision, and the operation of calibration itself is not straightforward. In the next section, the approach of determining this system using geometrical analysis of ray sketching is presented, which avoids these problems and hence results in a much simpler system implementation process.
5.1.2 Determining the Virtual Cameras by Geometrical Analysis of Ray Sketching
In this section, the use of geometrical knowledge to analyze the ray sketching that links the real camera and the 3F filter is described, from which the properties of the virtual cameras can be determined. As explained in Chapter 3, the pin-hole camera model can be used to model the real camera, and this model is also used to approximate the virtual cameras. Hence camera lens distortions are ignored, which implies that the distorted 2D image coordinates are identical to the undistorted 2D image coordinates on the camera image plane.
Due to the complexity of the mathematics used by this approach, this section is divided into two parts: it first gives a simple and concise description for readers who want a quick understanding of the basic idea of this approach, and then it gives a thorough description for readers who want to know the details.
5.1.2.1 The basic idea
Assume that the real camera used by the system is not calibrated, but that the size and resolution of the camera CCD chip, the computer sampled image resolution, the geometry of the 3F filter, and its position relative to the real camera (Figure 5.3) are known. A ray sketch is drawn in Figure 5.4. Let us find a point P on the real camera image plane which defines one FOV boundary line of a virtual camera (its choice depends on how the effective range of the real camera image plane is defined), such that the line joining point P and the focal point F intersects the line O″D (the line which bisects triangle O″AC) at point M; this ray PM, after two refractions at the filter surfaces, becomes ray NL (point N is on plane A′B′C′) and goes into the view zone behind the filter. If this ray NL defines the boundary of the captured scene, or the boundary of interest within one sub-region of the real camera image plane, then it also defines the view boundary of the virtual camera corresponding to this sub-region.
Next, we look at ray KO″, where point K is the camera image plane center and point O″ is the filter vertex; this ray becomes ray JS (point J is on plane A′B′C′) after two refractions. As ray KO″ defines the real camera optical axis, ray JS defines the virtual camera optical axis according to the description of the virtual camera model in Section 5.1. By back-extending rays NL and JS, their intersection can be found, which is the optical center F′ of the virtual camera. This intersection always exists, as rays NL and JS lie in the same plane. This basically describes how the virtual cameras are determined via geometrical analysis.
This approach is simple to understand. For example, to find ray MN, applying the coordinate manipulation knowledge often used in the kinematics analysis of robotics, line PM first needs to be determined; then define an auxiliary coordinate system T which has its origin located at point M, its z-axis along line PM, its x-axis along line UV (an auxiliary line on plane O″AC perpendicular to line PM), and its y-axis determined using the right hand rule. The refraction that occurs at the filter surface actually rotates this coordinate system T by an angle θ about line UV, where θ can be determined via the refraction rule. Suppose the coordinate system T becomes coordinate system T′ after this rotation; then the following equation can be obtained:
$$T' = T \cdot \mathrm{ROT}(x, \theta), \qquad (5.7)$$

where both $T$ and $T'$ are expressed with respect to $C$, any reference coordinate system, and

$$\mathrm{ROT}(x, \theta) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta & 0 \\ 0 & \sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$
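The following sketch expresses equation (5.7) in code: a homogeneous rotation about the auxiliary frame's x-axis (line UV), with the refracted ray direction read off the rotated frame's z-axis. It is an illustration under the stated frame conventions, not the thesis implementation.

```python
import numpy as np

def rot_x(theta):
    """Homogeneous rotation by theta about a frame's x-axis, as in (5.7)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0,  0, 0],
                     [0, c, -s, 0],
                     [0, s,  c, 0],
                     [0, 0,  0, 1]], float)

# T: 4x4 pose of the auxiliary frame (z-axis along PM, x-axis along UV),
# expressed in a reference frame C. After refraction the frame becomes
#     T_prime = T @ rot_x(theta)
# and the new z-axis (third column of the rotation part of T_prime) gives
# the direction of the refracted ray MN.
```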
Camera and image coordinate systems similar to those of the calibration based approach can be built on the virtual cameras, except that the 2D computer image coordinate systems are rotated about their z-axes such that their x-axes bisect the corresponding sub-regions of the real camera image plane, for easier analysis. Hence $(X_{d,l}, Y_{d,l})$, $(X_{d,r}, Y_{d,r})$ and $(X_{d,b}, Y_{d,b})$ can be linked to the computer image coordinates $(X_f, Y_f)$ via:
$$\begin{aligned}
X_{d,l} &= -(X_f - C_x)\cos 30^\circ\,dx' + (Y_f - C_y)\sin 30^\circ\,dy', & Y_{d,l} &= -(X_f - C_x)\sin 30^\circ\,dx' - (Y_f - C_y)\cos 30^\circ\,dy';\\
X_{d,r} &= (X_f - C_x)\cos 30^\circ\,dx' + (Y_f - C_y)\sin 30^\circ\,dy', & Y_{d,r} &= -(X_f - C_x)\sin 30^\circ\,dx' + (Y_f - C_y)\cos 30^\circ\,dy';\\
X_{d,b} &= -(Y_f - C_y)\,dy', & Y_{d,b} &= (X_f - C_x)\,dx'.
\end{aligned} \qquad (5.8)$$
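A sketch of the mapping in (5.8); the bisector angles used here (30°, 150° and 270° for the r, l and b sub-regions) and all numeric values are assumptions of this reconstruction.

```python
import numpy as np

CX, CY = 320.0, 240.0     # principal point (hypothetical)
DXP, DYP = 0.01, 0.01     # dx', dy' from Section 5.1.1 (hypothetical)

def to_virtual_coords(xf, yf, cam):
    """Rotate computer image coordinates into the 2D frame of virtual
    camera 'l', 'r' or 'b', whose x-axis bisects its sub-region."""
    phi = np.deg2rad({'r': 30.0, 'l': 150.0, 'b': 270.0}[cam])
    x, y = (xf - CX) * DXP, (yf - CY) * DYP
    xd = x * np.cos(phi) + y * np.sin(phi)
    yd = -x * np.sin(phi) + y * np.cos(phi)
    return xd, yd
```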
Once the system can be described mathematically, we can study the effect on system performance when certain parameters are varied, and we can use this knowledge to enhance the design. For example, a larger 3F filter or a larger distance between the 3F filter and the real camera gives a larger baseline, i.e. the distance between the optical centers of any two virtual cameras; note that a larger baseline should give better precision in stereovision. The system view zone can also be inferred from the mathematical model.
Figure 5.3 Position relationship between real camera and 3F filter
The mathematics involved in this approach is not simpler than that of the calibration based approach, but by using this approach a complicated calibration procedure, including the preparation of camera calibration software and hardware and the calibration operation itself, can be avoided. Instead, only an alignment between the 3F filter and the real camera and a field point testing procedure are required. Hence a much simpler system implementation process can be expected.
Figure 5.4 Symbolic illustration of virtual camera modeling using geometrical analysis
5.1.2.2 Detailed description
This section describes the complete idea of modeling the virtual cameras using the geometrical analysis method, based on the introduction presented in the previous section, with emphasis on two problems for the geometrical analysis based approach to determining single-lens trinocular stereovision: one is how the virtual cameras are determined geometrically, and the other is the depth recovery; neither was discussed in detail in the previous section. They are now described separately.
According to the definition of the virtual camera made in the previous section, in Figure 5.4, ray KF (i.e. line KO″), O″J, PF (i.e. line PM) and line MN after refraction become O″J, JS, MN and NL respectively. Line NL and line JS are the boundaries of the virtual camera view scope and will help to determine the position of the virtual camera. The real camera can be modeled by lines KF and PF and point F. Other known conditions include f, d, t, h (see Figure 5.3), $n_r$ (the refraction index), etc. The virtual camera model is described by line K′F′ (the optical axis of the virtual camera), line P′F′ and point F′, which are to be determined. As shown in Figure 5.4, lines P′F′ and K′F′ are actually lines NL and JS. Thus the procedure can be separated into two main paths, as illustrated by Figure 5.5: to find line NL (Path A in Figure 5.5, denoted by red lines in Figure 5.4), and to find line JS (Path B in Figure 5.5, denoted by blue lines in Figure 5.4). These two paths can be further separated into sub-steps, as illustrated in Figure 5.5. Once line NL and line JS are found, point F′ can be determined easily. In the following analysis, all coordinates are referred to the 3D real camera coordinate system, which is located at the real camera optical center and denoted by C.
Path A – Solve For Line NL
Let plane AO″C be represented by Ax + By + Cz = 1, where x, y and z are the coordinates of any point in this plane, described with respect to the real camera coordinate system C located at the real camera optical center.
The plane AO″C passes through three points known from the filter geometry and its position: the filter vertex O″ and the back-plane vertices A and C,

$$O'' = (0,\; 0,\; f+d), \qquad A = \left(-\tfrac{l}{2},\; -\tfrac{\sqrt{3}}{6}l,\; f+d+h\right), \qquad C = \left(0,\; \tfrac{\sqrt{3}}{3}l,\; f+d+h\right). \qquad (5.12)$$

Hence the three preceding points can be used to solve for the coefficients A, B and C of the plane AO″C:

$$(f+d)\,C = 1, \qquad -\tfrac{l}{2}\,A - \tfrac{\sqrt{3}}{6}l\,B + (f+d+h)\,C = 1, \qquad \tfrac{\sqrt{3}}{3}l\,B + (f+d+h)\,C = 1. \qquad (5.13)$$

After solving the preceding equations, A, B and C are given by:

$$A = \frac{3h}{(f+d)\,l}, \qquad B = -\frac{\sqrt{3}\,h}{(f+d)\,l}, \qquad C = \frac{1}{f+d}. \qquad (5.14)$$
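As a numerical check on (5.12)-(5.14), the plane coefficients can also be computed directly from the three points; the sketch below does this for arbitrary (hypothetical) values of f, d, h and l.

```python
import numpy as np

def face_plane(p1, p2, p3):
    """Return (A, B, C) of the plane Ax + By + Cz = 1 through three points."""
    M = np.array([p1, p2, p3], float)
    return np.linalg.solve(M, np.ones(3))

# Example with the reconstructed points of (5.12); f, d, h, l hypothetical.
f, d, h, l = 25.0, 40.0, 5.0, 30.0
A, B, C = face_plane((0.0, 0.0, f + d),
                     (-l / 2, -np.sqrt(3) * l / 6, f + d + h),
                     (0.0, np.sqrt(3) * l / 3, f + d + h))
# A, B, C agree with the closed forms 3h/((f+d)l), -sqrt(3)h/((f+d)l), 1/(f+d).
```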
Figure 5.5 Workflow of determining the virtual camera via geometrical analysis. (Flowchart summary: from the known conditions of the real camera model, i.e. line KF, PQ, PF, point F, etc., Path A finds point M, line PM, line MN, point N and line NL, while Path B finds point O″, line KO″, line O″J, point J and line JS; together these yield the virtual camera model: line K′F′, P′K′, P′F′ and point F′.)
Step A1: Find Point M

As described in the previous section, point P is located on the real camera image plane and defines one FOV boundary line of a virtual camera (its choice depends on how the effective range of the real camera image plane is defined, let us say denoted by H), such that the line joining point P and focal point F intersects the line O″D (the line which bisects triangle O″AC) at point M. This gives:

$$x_P = -\frac{H}{2}\cos 30^\circ, \qquad y_P = \frac{H}{2}\sin 30^\circ, \qquad z_P = f; \qquad x_F = y_F = z_F = 0. \qquad (5.15)$$

Point M is the intersection of line PF and plane AO″C. Hence point M lies on the line P + t(F − P), and substituting this line into the plane equation gives:

$$M = P + \frac{1 - (A x_P + B y_P + C z_P)}{A(x_F - x_P) + B(y_F - y_P) + C(z_F - z_P)}\,(F - P). \qquad (5.16)$$
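Step A1 thus reduces to a standard line-plane intersection; a minimal sketch of (5.16):

```python
import numpy as np

def intersect(P, F, plane):
    """P, F: 3D points; plane: (A, B, C) of Ax + By + Cz = 1.
    Returns the intersection M of line PF with the plane, per (5.16)."""
    n = np.asarray(plane, float)
    P, F = np.asarray(P, float), np.asarray(F, float)
    t = (1.0 - n @ P) / (n @ (F - P))   # parameter along P + t(F - P)
    return P + t * (F - P)
```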
Step A2: Find Line PM

After obtaining point M, line PM can be determined easily, as point P is known.

Step A3: Find Line MN
First, we need to find the angle formed by line PF and plane AO″C, which is denoted by ρ. The distance between points P and M is given by:

$$|PM| = \sqrt{(x_P - x_M)^2 + (y_P - y_M)^2 + (z_P - z_M)^2}. \qquad (5.17)$$

The distance from point P to plane AO″C is:

$$d(P, AO''C) = \frac{|A x_P + B y_P + C z_P - 1|}{\sqrt{A^2 + B^2 + C^2}} \qquad (5.18)$$

(these two quantities determine ρ, since $\sin\rho = d(P, AO''C)/|PM|$). Since plane AO″C is known (see equation (5.13)), its normal can be determined easily; in vector form:

$$N_{AO''C} = [A,\; B,\; C], \qquad (5.19)$$

its norm is $\|N_{AO''C}\| = \sqrt{A^2 + B^2 + C^2}$, and its unit vector is given by:

$$n_{AO''C} = \frac{N_{AO''C}}{\|N_{AO''C}\|}. \qquad (5.20)$$
Trang 22Do note that plane AO″C has infinite number of the normals, but here the normal passing through point M is used The angle between this normal and line PM
will be calculated later
Now, we look at Figure 5.6 After refraction, ray PM changes direction to
MN, where point N is the intersection between line MN and plane A′B′C′ N AO”C represents a normal of plane AO″C which passes through point M Angle α is the
angle formed by line PM and the normal N AO”C, and angle β is the angle between line
MN and line N AO”C Let pm represent the unit vector of line PM, then
αcos
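Steps A3 to A5 then follow by applying the refraction rule at each surface. The sketch below computes α from the dot product above, β from Snell's law, and obtains the refracted direction by rotating pm about the axis pm × n (the line UV) by α − β, the vector-form equivalent of the frame rotation in (5.7); the orientation convention for the normal and the omission of total internal reflection are simplifying assumptions made here.

```python
import numpy as np

def refract(pm, n, n_ratio):
    """pm: unit ray direction; n: unit plane normal oriented so pm . n >= 0;
    n_ratio = n1/n2 across the surface (e.g. 1/n_r entering the filter).
    Returns the unit direction of the refracted ray."""
    alpha = np.arccos(np.clip(pm @ n, -1.0, 1.0))       # incidence angle
    beta = np.arcsin(np.clip(n_ratio * np.sin(alpha), -1.0, 1.0))  # Snell
    axis = np.cross(pm, n)                              # rotation axis (UV)
    norm = np.linalg.norm(axis)
    if norm < 1e-12:                                    # normal incidence
        return pm
    k = axis / norm
    theta = alpha - beta                                # rotate toward normal
    # Rodrigues rotation of pm about k by theta
    return (pm * np.cos(theta) + np.cross(k, pm) * np.sin(theta)
            + k * (k @ pm) * (1.0 - np.cos(theta)))
```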