Digital Elevation Model Generation from Stereo Images

Part of the document Advances in Environmental Remote Sensing: Sensors, Algorithms, and Applications (pages 223–243)

8.4 Methods, Processing, and Errors

8.4.4 Digital Elevation Model Generation from Stereo Images

When two images are acquired over the same site from two different viewpoints, it is possible to reconstruct the terrain relief and to generate digital terrain models (DTMs).

Because the sensor did not image the bald Earth but the top of feature surfaces, DEMs are in fact DSMs, which include the height, or a part of it, of natural and human-made surfaces (trees, houses, fences, etc.; Figure 8.12). Principally, two image-matching methods can be used to extract the elevation parallax for generating DSMs: (1) the computer-assisted (visual) method or (2) the automatic method. These two methods can of course be integrated to take advantage of the strengths of each.

Computer-assisted visual matching is an extension of the traditional photogrammetric method of extracting elevation data (contour lines) on a stereoplotter. It therefore requires full stereoscopic capabilities to generate the online 3D reconstruction of the stereo model and to capture the 3D planimetric and elevation features in real time. For elevation, spot elevations, contour lines, or irregular-grid DEMs can be generated. Stereoscopic viewing is realized on a computer screen using a system of optics. The stereo images are separated spatially, radiometrically, or temporally. Spatial separation is achieved by the use of two monitors or a split screen and an optical system using mirror or convex lenses or both.

Radiometric separation is achieved by anaglyphic or polarization techniques with colored or polarized lenses, respectively. Temporal separation is achieved by an alternate display of the two images and the use of special synchronized lenses (Walker and Petrie 1996).

Figure 8.12
Digital elevation models from stereoscopic images acquired from two different viewpoints. The heights of natural or human-made surfaces are included in the elevation. (Diagram: corresponding pixels A and B are matched between the left and right images; their parallax yields the x, y, and z position of each point, and the resulting DEM stores these elevations as pixel values.)

To obtain true 3D performance in a stereo workstation, the images are resampled into an epipolar or a quasi-epipolar geometry, in which only the X-parallax related to the elevation is retained (de Masson d'Autume 1979; Baker and Binford 1981). Another solution to control the image positioning from the raw imagery is to automatically follow the dynamic change by cancelling the Y-parallax using the previously computed stereo model (Toutin et al. 1993; Toutin and Beaudoin 1995). In the same way as with a conventional stereoplotter, the operator cancels the X-parallax by fusing the two floating marks (one per image) on the ground. The system then measures the bidimensional parallax between the images for each point and computes the X, Y, and Z cartographic coordinates using a 3D intersection.

The visual matching then combines in the brain a geometric aspect (fusing the floating marks together) and a radiometric aspect (fusing the floating marks on the corresponding image point). Some automatic tasks (displacement of the image or cursor, prediction of the corresponding image point position) are added.

However, computer-assisted visual matching, principally used with paper-format images and analytical stereo workstations, is a long and expensive process for deriving DEMs. When using digital images, automated image matching can be used instead. Since image matching has been a lively research topic for the last 30 years, an enormous body of research work and literature exists on the image matching of different EO sensors.

Most of the research studies on satellite image matching are based on Marr's research (1982) at the Massachusetts Institute of Technology (MIT), dealing with the modeling of human vision: if a computer program can be realized to see things as a human would, then the algorithm must have some basis in human visual processing. The stereo disparity is based on the following two "correct" assumptions about the real world (Marr and Poggio 1977): (1) a point of a surface has a unique position in space at any one time and (2) matter is cohesive. The first generation of image-matching processes based on these assumptions is the gray-level image-matching process. Gray-level matching between two images implies that the radiometric intensity data from one image, representing a particular element of the real world, must be matched with the intensity data from the second image, representing the same real-world element.

Although a satellite image of the real world represented by gray levels is not like a random-dot stereogram (which is easily matchable), gray-level matching has been widely studied and applied to remote sensing data. Most matching systems operate on reference and search windows. For each position in the search window, a match value is computed from the gray-level values in the reference window. The local maximum of all the match values computed over the search window gives the spatial position of the searched point. The match value can be computed with the normalized cross-correlation coefficient, sum of mean normalized absolute differences, stochastic sign change, or outer minimal number estimator methods. The first is considered to be the most accurate computation method (Leberl et al. 1994) and is largely used with remote sensing images. Leberl et al. also noticed that matching errors were smaller with SPOT images and digitized aerial photographs than with SAR images. The last two match-value computation methods have rarely or never been used by the remote sensing community.
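The window-based search with the normalized cross-correlation coefficient can be sketched as follows. This is a minimal pure-Python illustration, not an operational matcher; real systems add image pyramids, sub-pixel refinement, and thresholds on the match value.

```python
import math

def ncc(ref, patch):
    """Normalized cross-correlation coefficient between two equal-size
    gray-level windows (lists of lists)."""
    a = [v for row in ref for v in row]
    b = [v for row in patch for v in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0  # flat windows carry no match information

def match(reference, search, win):
    """Slide the reference window over the search area; the local maximum
    of the match values gives the position of the searched point."""
    best, best_pos = -2.0, (0, 0)
    rows, cols = len(search), len(search[0])
    for i in range(rows - win + 1):
        for j in range(cols - win + 1):
            patch = [row[j:j + win] for row in search[i:i + win]]
            c = ncc(reference, patch)
            if c > best:
                best, best_pos = c, (i, j)
    return best_pos, best

# A bright corner pattern embedded at offset (2, 3) in a dark search area
reference = [[10, 10, 10], [10, 90, 90], [10, 90, 90]]
search = [[0] * 8 for _ in range(8)]
for i in range(3):
    for j in range(3):
        search[2 + i][3 + j] = reference[i][j]
pos, score = match(reference, search, 3)
print(pos, round(score, 3))  # (2, 3) with a correlation of 1.0
```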

Another solution to the matching problem, introduced by Förstner (1982), is the least-squares approach, minimizing the squares of the image gray-level differences in an iterative process. This method makes possible the use of well-known mathematical tools and the estimation of error. Rosenholm (1986) found that the more complicated least-squares method applied to simulated SPOT images did not give any significant improvement when compared with the cross-correlation coefficient method. However, this least-squares method seems to be more accurate with real SPOT data (Day and Muller 1988).
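The iterative least-squares idea can be illustrated in one dimension: estimate a sub-pixel shift between two gray-level profiles by repeatedly linearizing the squared-difference cost (a Gauss-Newton update). The edge profile f and all names here are illustrative only; full least-squares matching also solves for radiometric gain and offset and for 2D geometric parameters.

```python
import math

def sample(signal, x):
    """Linearly interpolate a 1D gray-level profile at fractional position x."""
    i = int(x)
    if i < 0 or i + 1 >= len(signal):
        return None
    t = x - i
    return (1 - t) * signal[i] + t * signal[i + 1]

def lsq_shift(g1, g2, s=0.0, iters=20):
    """Iteratively estimate the sub-pixel shift s such that g1(x + s) ~ g2(x),
    minimizing the squared gray-level differences."""
    for _ in range(iters):
        num = den = 0.0
        for x in range(1, len(g2) - 1):
            v = sample(g1, x + s)
            lo = sample(g1, x + s - 0.5)
            hi = sample(g1, x + s + 0.5)
            if v is None or lo is None or hi is None:
                continue
            grad = hi - lo                  # numerical derivative of g1 at x + s
            num += grad * (g2[x] - v)
            den += grad * grad
        if den == 0.0:
            break
        s += num / den                      # Gauss-Newton step
    return s

# g2 is the same smooth edge profile as g1, shifted by +0.4 pixel
f = lambda x: 100.0 / (1.0 + math.exp(4.0 - x))
g1 = [f(x) for x in range(10)]
g2 = [f(x + 0.4) for x in range(10)]
print(round(lsq_shift(g1, g2), 2))  # close to the true shift of 0.4
```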

The notion of least-squares matching in the object domain (groundel) rather than in the image domain (pixel) was later introduced by Helava (1988). Predicted image densities, corresponding to each groundel, are mathematically computed with known geometric and radiometric image parameters, and matched to the original ones. The uncertainty in the parameters of a particular groundel is resolved by the least-squares method. An advantage of this approach is that more than two images from the same or different sensors can be used to make the least-squares solution meaningful; a disadvantage is the inability to correctly model the groundel attributes for each image. For these reasons, this matching technique is mainly used with air photos, since more than two images overlap the same ground area and their geometry and radiometry are better controlled.

Since one of Marr's assumptions was either missing or incorrectly implemented in gray-level matching (mainly with images of the real world), Marr developed a second generation of image matching: feature-based matching (Marr and Hildreth 1980). The same element of the real world may look considerably different in remote sensing images acquired at different times and with different geometries between the sensor, illumination, and terrain.

Instead, the edges in the images reflect the true structures (Cooper, Friedman, and Wood 1987). Although feature-based matching has not become very popular among the remote sensing community with satellite data, some applications have been realized with simulated SPOT and real Landsat-TM images (Cooper, Friedman, and Wood 1987). The DEM results were not as good as those obtained by Simard and Slaney (1986) with a Landsat-TM stereo pair using gray-level matching. Hahn and Förstner (1988) also found that the least-squares matching method is more accurate than feature-based matching, which is converse to Marr's theoretical prediction. Later, Schneider and Hahn (1995) tested the two methods to extract TPs on Modular Opto-electronic Multispectral Stereo Scanner (MOMS-02/D2) stereo images. Their results in planimetry and elevation were twice as accurate with intensity-based matching as with feature-based matching.

Hybrid approaches using multiprimitive multi-image matching can thus achieve better and faster results by combining gray-level matching and feature-based matching with a hierarchical multiscale algorithm, and also with computer-assisted visual matching.

An example is the satellite image precision processing (SAT-PP) algorithm and software, developed by the Institute of Geodesy and Photogrammetry of the Swiss Federal Institute of Technology Zurich (ETH Zurich), Switzerland, for multisensor data (Eisenbeiss et al. 2005; Figure 8.13). The feature-based approach may produce good results for identified features, but it produces no elevation at intermediate points. These features can then be used as seed points for gray-level matching. Another hybrid approach is to first generate gradient amplitude images, with gray-level values derived from the original stereo images, instead of gradient images with only binary edge values. In the second step, any gray-level matching technique can be used on these preprocessed images (Paillou and Gelautz 1999).

The linear gradient operator can be designed to be optimal for removing noise (if any) and enhancing edges. No attempt has been made with VIR images.
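The gradient-amplitude preprocessing step can be sketched as follows, here with a Sobel-like operator. The exact optimal linear gradient operator referred to above differs; this only illustrates the idea of producing an edge-strength image, on which any gray-level matching technique can then run, rather than a binary edge image.

```python
def gradient_amplitude(img):
    """Compute a gradient-amplitude image with Sobel-like horizontal and
    vertical differences; borders are left at zero."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            gx = (img[i-1][j+1] + 2*img[i][j+1] + img[i+1][j+1]
                  - img[i-1][j-1] - 2*img[i][j-1] - img[i+1][j-1])
            gy = (img[i+1][j-1] + 2*img[i+1][j] + img[i+1][j+1]
                  - img[i-1][j-1] - 2*img[i-1][j] - img[i-1][j+1])
            out[i][j] = (gx * gx + gy * gy) ** 0.5
    return out

# A vertical step edge: the amplitude peaks along the edge, is zero in flat areas
img = [[0] * 3 + [100] * 3 for _ in range(5)]
amp = gradient_amplitude(img)
print(amp[2])  # [0.0, 0.0, 400.0, 400.0, 0.0, 0.0]
```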

Although computer-assisted visual matching is a long process, it has been proven to be very accurate with photos and different satellite VIR data (Leberl et al. 1994; Raggam et al. 1994; Dorrer et al. 1995; Toutin and Beaudoin 1995). It can thus be used to eliminate blunders, to fill mismatched areas, or in areas where automated image matching gives errors larger than 1 pixel (about 10% for SPOT and 15% for digitized photographs; Leberl et al. 1994). It can also be used to generate seed points for automated matching.

Other developments have been realized and tested principally for airborne or close-range stereo images, but rarely with satellite images, such as the global approach, scale-space algorithms, relational matching, consideration of break lines, and multiple image primitives. Some other research studies using the recognition of corresponding structures (Della Ventura et al. 1990) or of uniform regions (Petit-Frère 1992; Abbasi-Dezfouli and Freeman 1996), a moment-based approach with affine-invariant features (Flusser and Suk 1994), or a wavelet transform approach (Djamdji and Bijaoui 1995) have been performed. They were only used to extract well-defined GCPs for image registration between different spaceborne VIR images.

More development must be done to integrate these solutions for generating seed points for gray-level matching. Some apparent contradictions should also be considered in future research studies, such as

• The theoretical prediction of Marr (1982) that feature-based matching is better than gray-level matching versus better experimental results with gray-level matching than with feature-based matching

Figure 8.13
Overview of the multiprimitive, multiimage matching method employed in the satellite image precision processing (SAT-PP) software package developed by the Institute of Geodesy and Photogrammetry of the Swiss Federal Institute of Technology Zurich (ETH Zurich), Switzerland. (Flowchart: images and orientation data undergo image preprocessing and image pyramid generation; an initial DSM is generated at the highest level of the pyramid; multiple-primitive multiimage matching combines feature point extraction and matching, grid point generation and matching, and edge extraction and matching, supported by a geometrically constrained candidate search, adaptive matching parameter determination, and probability-relaxation-based relational matching; the DSM is represented by a TIN with intermediate integration of feature points, grid points, and edges; modified multiimage geometrically constrained matching (MPGC) produces the final DSM.) (From Baltsavias, E., L. Zhang, and H. Eisenbeiss. 2005. DSM generation and interior orientation determination of IKONOS images using a test field in Switzerland. In International Society of Photogrammetry and Remote Sensing Workshop "High-Resolution Earth Imaging for Geospatial Information," May 17–20. CD-ROM. With permission.)

• The theoretical automated image matching error (much better than one pixel) versus the experimental results (one pixel and more, depending on the data)

• The so-called superiority of computer matching over visual matching versus the experimental results

Overall, studies conducted until now confirm our earlier statement in this section that image matching has been a lively research topic for the past 30 years, but only time will tell whether it will remain so for the next 30 years.

Whatever the matching method and strategy adopted, there is always a need for postprocessing the extracted elevation data, for example, to remove blunders, fill the mismatched areas, correct for vegetation cover, and smooth the DEM. Different methods can be used depending on the capability of the (stereo) workstation: manual, automatic, or interactive. A blunder-removal function is needed to remove any artifacts or noise where an elevation value is drastically different from its neighbors. These functions generally use existing filters based on statistical computation (mean, standard deviation). Some functions tend to remove small noisy areas, whereas, inversely, some tend to grow failed areas on the rationale that pixels surrounded by failed pixels have a high probability of being noisy. These functions are well adapted to being performed automatically.
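A blunder-removal filter of the kind described can be sketched as follows: flag any cell that deviates from its local neighborhood mean by more than k standard deviations. This is a minimal illustration; production tools use larger neighborhoods, iterate, and handle existing no-data cells.

```python
def remove_blunders(dem, k=3.0, nodata=None):
    """Flag DEM cells whose value deviates from the 3 x 3 neighborhood
    mean by more than k standard deviations (statistical blunder filter)."""
    rows, cols = len(dem), len(dem[0])
    out = [row[:] for row in dem]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            neigh = [dem[i + di][j + dj]
                     for di in (-1, 0, 1) for dj in (-1, 0, 1)
                     if not (di == 0 and dj == 0)]
            mean = sum(neigh) / len(neigh)
            std = (sum((v - mean) ** 2 for v in neigh) / len(neigh)) ** 0.5
            if std > 0 and abs(dem[i][j] - mean) > k * std:
                out[i][j] = nodata  # mark as failed for later interpolation
    return out

# A 5 x 5 DEM around 100 m with one 500 m spike at the center
dem = [[100 + (i + j) % 3 for j in range(5)] for i in range(5)]
dem[2][2] = 500
cleaned = remove_blunders(dem)
print(cleaned[2][2])  # None: the spike was flagged
```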

To fill the mismatched and noisy areas once they are detected, interpolation functions are used to replace the mismatched values by interpolating from good elevation values at the edges of the failed areas. Standard interpolation functions (bilinear, distance-weighted), which can be performed automatically, are adequate for small areas (less than 200 pixels). For larger areas, an operator should interactively stereo-extract seed points to fill the mismatched areas of the raw DEM. Another solution is to first transform the DEM into a triangular irregular network (TIN) and then display it over the stereo pair in the stereo workstation. The operator can then edit the appropriate triangle vertices to better fit the shape of the TIN to his or her 3D perception of the terrain relief. In addition, the operator can extract some specific geomorphologic features (mountain crests, thalwegs, lake shorelines), which can be integrated to reduce the largest errors, generally at the lowest and highest elevations in the DEM. Using human 3D perception to edit a DEM is thus advantageous, since it produces a more coherent and consistent terrain relief reconstruction.
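The automatic fill for small failed areas can be sketched with inverse-distance weighting. For brevity this version weights all good cells rather than only those on the edges of the failed area, which is adequate only for small holes; a production tool would restrict the interpolation to the border of each failed area.

```python
def fill_failed(dem, nodata=None):
    """Replace failed (nodata) cells by an inverse-squared-distance
    weighted mean of the good elevation values."""
    rows, cols = len(dem), len(dem[0])
    out = [row[:] for row in dem]
    for i in range(rows):
        for j in range(cols):
            if dem[i][j] is not nodata:
                continue
            num = den = 0.0
            for vi in range(rows):
                for vj in range(cols):
                    v = dem[vi][vj]
                    if v is nodata:
                        continue
                    w = 1.0 / ((vi - i) ** 2 + (vj - j) ** 2)
                    num += w * v
                    den += w
            if den:
                out[i][j] = num / den
    return out

# A sloped 3 x 3 DEM with one failed cell in the middle
dem = [[100.0, 101.0, 102.0],
       [101.0, None, 103.0],
       [102.0, 103.0, 104.0]]
filled = fill_failed(dem)
print(filled[1][1])  # 102.0, consistent with the surrounding slope
```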

Forested areas also have to be edited for vegetation cover, depending on the relation between the sensor resolution, the expected DEM accuracy, and the canopy height. An automatic classification, an interactive stereo extraction, or both can delimit the different forested areas and measure their canopy height. This information is then used to reduce the elevations to the ground level. Finally, an appropriate method of filtering must also be applied to smooth the "pit and hummock" pattern of the DEM, while preserving the sharp breaks in slopes. Filtering improves the relative DEM accuracy, or the relationship between neighboring values, whereas the absolute DEM accuracy appears to be controlled by the generation method, system, and software (Giles and Franklin 1996). Unfortunately, only a few research studies and scientific results have been devoted to and published on the postprocessing step. Most of the time, stereo workstation manufacturers develop their own methods and tools to achieve this last, but not least, step of DEM generation.

8.4.5 “Orthorectification”

The last step of the geometric processing is image rectification with a DEM (Figure 8.14). To orthorectify the original image into a map image, there are two processing operations:

1. A geometric operation to compute the cell coordinates in the original image for each map image cell (Figure 8.14a)

2. A radiometric operation to compute the intensity value or digital number (DN) of the map-image cell as a function of the intensity values of the original image cells that surround the previously computed position of the map image cell (Figure 8.14b)

8.4.5.1 Geometric Operation

The geometric operation requires the two equations of the geometric model with the previously computed unknown parameters, and sometimes elevation information. Since the 2D models do not use elevation information, the accuracy of the resulting rectified image will depend on the image viewing/look angle and the terrain relief. On the other hand, 3D models take into account the elevation distortion, and a DEM is thus needed to create accurate orthorectified images. This rectification should then be called an orthorectification. If no DEM is available, different altitude levels can be input for different parts of the image (a kind of "rough" DEM) in order to minimize this elevation distortion. It is then important to have a quantitative evaluation of the DEM impact on the rectification/orthorectification process, both in terms of elevation accuracy for the positioning accuracy and grid spacing for the level of detail. This last aspect is more important with HR images, because a poor grid spacing when compared with the image spacing could generate artifacts for linear features (wiggly roads or edges).

Figure 8.14
Image rectification to project the original image to the ground reference system: the geometric (a) and radiometric (b) operations. (Diagram: (a) each cell (p, q) of the rectified image in the X-Y map system is related back to its position in the unprocessed image of the observed landscape; (b) the DN of each corrected-image cell is computed from the surrounding cells of the original image.)

Figures 8.15 and 8.16 give the relationship between the DEM accuracy (including interpolation in the grid), the viewing and look angles, and the resulting positioning error on VIR and SAR orthoimages, respectively. These curves were mathematically computed with the elevation distortion parameters of a 3D physical model (Toutin 1995b). However, they can also be used as an approximation for other 3D physical and empirical models. One of the advantages of these curves is that they can be used to find any third parameter when two others are known. This can be useful not only for the quantitative evaluation of

Figure 8.15
Relationship between the digital elevation model (DEM) accuracy (in meters), the viewing angle (in degrees) of the visible and infrared (VIR) image, and the resulting positioning error (in meters) generated on the orthoimage. (From Toutin, T., EARSeL J Adv Remote Sens, 4, 2, 1995b.)

Figure 8.16
Relationship between the digital elevation model (DEM) accuracy (in meters), the look angle (in degrees) of the synthetic aperture radar (SAR) image, and the resulting positioning error (in meters) generated on the SAR orthoimage. The boxes at the bottom represent the range of look angles for each Radarsat beam mode (fine, standard, and wide). (From Toutin, T., Journal canadien de télédétection, 24, 1998.)

the orthorectification but also to forecast the appropriate input data, DEM, or viewing/look angles, depending on the objectives of the project.

For example (Figure 8.15), with a SPOT image acquired with a viewing angle of 10° and a 45-m accurate DEM, the error generated on the orthoimage is 9 m. Inversely, if a 4-m final positioning accuracy is required for the orthoimage and a 10-m accurate DEM is available, the VIR image should be acquired with a viewing angle of less than 20°. The same error evaluation can be applied to SAR data using the curves given in Figure 8.16.
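To first order, the elevation-induced planimetric error behind these curves behaves like the elevation error multiplied by the tangent of the viewing angle, so the worked numbers above can be approximated as follows. This is only an approximation: the published curves were computed with the full 3D physical model (Toutin 1995b) and therefore differ slightly.

```python
import math

def planimetric_error(dem_accuracy_m, viewing_angle_deg):
    """First-order approximation of the positioning error induced on an
    orthoimage by elevation error: error = dh * tan(viewing angle)."""
    return dem_accuracy_m * math.tan(math.radians(viewing_angle_deg))

def max_viewing_angle(dem_accuracy_m, target_error_m):
    """Largest viewing angle (degrees) keeping the induced error below a
    target, under the same approximation."""
    return math.degrees(math.atan(target_error_m / dem_accuracy_m))

# The SPOT example from the text: 45-m DEM at a 10 degree viewing angle
print(round(planimetric_error(45, 10), 1))   # ~7.9 m (the curves give ~9 m)
# Required angle for a 4-m error budget with a 10-m accurate DEM
print(round(max_viewing_angle(10, 4), 1))    # ~21.8 degrees (the curves: < 20)
```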

As another example, if positioning errors of 60 and 20 m are required on standard-1 (S1) and fine-5 (F5) orthoimages, respectively, a 20-m elevation error, which includes the DEM accuracy and the interpolation into the DEM, is sufficient. For HR images (spaceborne or airborne), the surface heights (buildings, forest, hedges) should be either included in the DTM to generate a DSM or taken into account in the overall elevation error. In addition, an inappropriate DEM in terms of grid spacing can generate artifacts with HR images acquired with large viewing angles, principally over high-relief areas (Zhang, Tao, and Mercer 2001).

Finally, for any map coordinates (X, Y), with the Z-elevation parameter extracted from a DEM when 3D models are used, the original image coordinates (column and line) are computed from the two resolved equations of the model. However, the computed image coordinates of the map image will not fall directly on a pixel center of the original image; in other words, the computed column and line values will rarely, if ever, be integer values.

8.4.5.2 Radiometric Operation

Since the computed coordinate values in the original image are not integers, one must compute the DN to be assigned to the map image cell. To do so, the radiometric operation applies a resampling kernel to the original image cells: either the DN of the closest cell is used (called "nearest-neighbor resampling"), or a specific interpolation or deconvolution algorithm uses the DNs of the surrounding cells. In the first case, the radiometry of the original image and the image spectral signatures are not altered, but the visual quality of the image is degraded. In addition to this radiometric degradation, a geometric error of up to half a pixel is introduced, which can cause a disjointed appearance in the map image. If these visual and geometric degradations are acceptable to the end user, it can be an advantageous solution.

In the second case, different interpolation or deconvolution algorithms (bilinear interpolation or a sinusoidal function) can be applied. The bilinear interpolation takes into account the four cells surrounding the computed position. The final DN is computed either from two successive linear interpolations in line and column, using the DNs of the two surrounding cells in each direction, or in one linear interpolation using the DNs of the four surrounding cells. The DNs are weighted as a function of the cell distance from the computed coordinate values. Due to the weighting function, this interpolation creates a smoothing in the final map image.
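The two resampling options described so far, nearest neighbor and bilinear, can be sketched as follows, where x is the fractional column and y the fractional line produced by the geometric operation (a minimal illustration for positive coordinates inside the image):

```python
def nearest_dn(img, x, y):
    """Nearest-neighbor resampling: DN of the closest original cell
    (no radiometric alteration, up to half a pixel of geometric error)."""
    return img[int(y + 0.5)][int(x + 0.5)]

def bilinear_dn(img, x, y):
    """Bilinear resampling: the DNs of the four surrounding cells are
    weighted by distance along each axis (two successive linear
    interpolations, first in column, then in line)."""
    j, i = int(x), int(y)
    tx, ty = x - j, y - i
    top = (1 - tx) * img[i][j] + tx * img[i][j + 1]
    bot = (1 - tx) * img[i + 1][j] + tx * img[i + 1][j + 1]
    return (1 - ty) * top + ty * bot

img = [[10, 20],
       [30, 40]]
print(bilinear_dn(img, 0.5, 0.5))  # 25.0: mean of the four surrounding DNs
print(nearest_dn(img, 0.5, 0.5))   # 40: snaps to the lower-right cell
```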

The theoretically ideal deconvolution function is the sin(x)/x function. As this function has an infinite domain, it cannot be computed exactly. Instead, it can be represented by a piecewise cubic function, such as the well-known cubic convolution. The cubic convolution then computes third-order polynomial functions over a 4 × 4 cell window. The DNs are first computed successively in the four column and line directions, and the final DN is the arithmetic mean of these DNs. This cubic convolution does not smooth, but enhances and generates some contrast in the map image (Kalman 1985).
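Cubic convolution can be sketched with Keys' piecewise cubic kernel (a = -0.5, a common choice for approximating sin(x)/x), in the standard separable formulation: four 1D interpolations along the lines, then one across the results. This is a close variant of the averaging scheme described above.

```python
def cubic_kernel(t, a=-0.5):
    """Piecewise cubic approximation of sin(x)/x on [-2, 2] (Keys' kernel)."""
    t = abs(t)
    if t < 1:
        return (a + 2) * t**3 - (a + 3) * t**2 + 1
    if t < 2:
        return a * t**3 - 5 * a * t**2 + 8 * a * t - 4 * a
    return 0.0

def cubic_dn(img, x, y):
    """Cubic convolution over a 4 x 4 cell window around the fractional
    position (x = column, y = line)."""
    j, i = int(x), int(y)
    tx, ty = x - j, y - i
    rows = []
    for di in range(-1, 3):  # interpolate along each of the four lines
        r = sum(img[i + di][j + dj] * cubic_kernel(tx - dj) for dj in range(-1, 3))
        rows.append(r)
    # then interpolate across the four line results
    return sum(rows[di + 1] * cubic_kernel(ty - di) for di in range(-1, 3))

# On a linear ramp, cubic convolution reproduces the exact values
img = [[10 * (r + c) for c in range(4)] for r in range(4)]
print(round(cubic_dn(img, 1.5, 1.5), 1))  # 30.0 on this ramp
```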
