Thematic Accuracy Assessment of Regional Scale Land-Cover Data

Siamak Khorram, Joseph F. Knight, and Halil I. Cakir
CONTENTS
7.1 Introduction
7.2 Approach
7.2.1 Sampling Design
7.2.2 Training
7.2.3 Photographic Interpretation
7.2.3.1 Interpretation Protocol
7.2.3.2 Interpretation Procedures
7.2.3.3 Quality Assurance and Quality Control
7.3 Results
7.3.1 Accuracy Estimates
7.3.2 Issues and Problems
7.3.2.1 Heterogeneity
7.3.2.2 Acquisition Dates
7.3.2.3 Location Errors
7.4 Further Research
Acknowledgments
References
Appendix A: MRLC Classification Scheme and Class Definitions
7.1 INTRODUCTION
The Multi-Resolution Land Characteristics (MRLC) consortium, a cooperative effort of several U.S. federal agencies, including the U.S. Geological Survey (USGS) EROS Data Center (EDC) and the U.S. Environmental Protection Agency (EPA), has conducted the National Land Cover Data (NLCD) program. This program used Landsat Thematic Mapper (TM) 30-m resolution imagery as baseline data and successfully produced a consistent and conterminous land-cover (LC) map of the lower 48 states at approximately an Anderson Level II thematic detail. The primary goal of the program was to provide a generalized and regionally consistent LC product for use in a broad range of applications (Lunetta et al., 1998). Each of the 10 U.S. federal geographic regions was mapped independently. EPA funded the Center for Earth Observation (CEO) at North Carolina State University (NCSU) to assess the accuracy of the NLCD for federal geographic Region IV.
An accuracy assessment is an integral component of any remote sensing-based mapping project. Thematic accuracy assessment consists of measuring the general and categorical qualities of the data (Khorram et al., 1999). An independent accuracy assessment was implemented for each federal geographic region after LC mapping was completed. The specific objective of this study was to estimate the overall accuracy and the category-specific accuracy of the LC mapping effort. Federal geographic Region IV included the states of Kentucky, Tennessee, Mississippi, Alabama, Georgia, Florida, North Carolina, and South Carolina (Figure 7.1).
7.2 APPROACH
Quantitative accuracy assessment of regional scale LC maps produced from remotely sensed data involves comparing thematic maps with reference data (Congalton, 1991). Since no suitable existing reference data were available for all federal regions, a practical and statistically sound sampling plan was designed by Zhu et al. (2000) to characterize the accuracy of both common and rare classes in the map product, using National Aerial Photography Program (NAPP) photographs as the reference data.
7.2.1 Sampling Design

The sampling design was developed based on the following criteria: (1) ensure the objectivity of sample selection and the validity of statistical inferences drawn from the sample data, (2) distribute sample sites spatially to ensure adequate coverage of the entire region, (3) reduce the variance of the estimated accuracy parameters, (4) provide a low-cost approach in terms of budget and time, and (5) be easy to implement and analyze (Zhu et al., 2000).
The sampling was a two-stage design. The first stage, the primary sampling unit (PSU), was the size of a NAPP aerial photograph. One PSU (photo) was randomly selected from each cluster of NAPP photographs; the selected PSU locations are shown in Figure 7.1.
Figure 7.1 Randomly selected photograph center points.
The second stage was a stratified random sample, within the extent of the selected PSUs only, of 100 sample sites per LC class. The selected sites were referred to as secondary sampling units (SSUs). The number of sites per photograph ranged from 1 to approximately 70 (Figure 7.2). The total number of sample sites in the study was 1500 (100 per cover class), although only 1473 sites were interpreted due to missing NAPP photos. This sampling approach was chosen by EDC over a standard random sample to reduce the cost of purchasing the NAPP photography (Zhu et al., 2000).
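The two-stage logic can be summarized in code. The sketch below is a minimal illustration of the sampling scheme described above, not the EDC implementation; the cluster layout, the pixel lists, and the function names (select_psus, select_ssus) are assumptions made for the example.

```python
import random
from collections import defaultdict

def select_psus(clusters, seed=0):
    """Stage 1: randomly draw one primary sampling unit (NAPP photo) per cluster."""
    rng = random.Random(seed)
    return [rng.choice(cluster) for cluster in clusters]

def select_ssus(pixels_by_class, n_per_class=100, seed=0):
    """Stage 2: stratified random sample of secondary sampling units, drawn only
    from candidate pixels inside the selected PSUs, with a fixed allocation per class."""
    rng = random.Random(seed)
    return {lc_class: rng.sample(pixels, min(n_per_class, len(pixels)))
            for lc_class, pixels in pixels_by_class.items()}

# Toy usage: three clusters of photo IDs, and candidate (photo, row, col) pixels per class.
clusters = [["photo_A", "photo_B"], ["photo_C", "photo_D"], ["photo_E"]]
psus = select_psus(clusters)
pixels_by_class = defaultdict(list, {
    "water":  [("photo_A", 10, 12), ("photo_C", 5, 7)],
    "forest": [("photo_E", 3, 3), ("photo_A", 8, 1), ("photo_C", 2, 9)],
})
print(psus)
print(select_ssus(pixels_by_class, n_per_class=2))
```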
7.2.2 Training

Before the NAPP photo interpretation for the sample sites could begin, the photo interpreters were trained to accomplish the goals of the study. To provide consistency among the interpreters, a comprehensive training program was devised. The program consisted of a full-day training session and subsequent on-the-job training. Two experienced aerial photo interpretation and photogrammetry instructors led the formal classroom training sessions. The training sessions included the following topics: (1) discussion of color theory and photo interpretation techniques, (2) understanding of the class definitions, (3) interpretation of over 100 sample sites of different classes during the training sessions, followed by interactive discussions about potential discrepancies, (4) creation of sample sites for later reference, and (5) repetition of interpretation practice after the sessions. The focus was on real-world situations that the interpreters would encounter during the project. Each participant was presented with over 100 preselected sites and was asked to provide his or her interpretation of the land cover for these sites. Their interpretations were analyzed and subsequently discussed to minimize any misconceptions. During the on-the-job portion of the training, each interpreter was assigned approximately 500 sites to examine. Their progress was monitored daily for accuracy and proper methodology. The interpreters kept logs of their decisions and of the sites for which they were uncertain about the LC classes. On a weekly basis, their questions were addressed by the project photo interpretation supervisor. The problem sites (approximately 400) were discussed until all team members felt comfortable with the class definitions and their consistency in interpretation. Agreement analysis among the three interpreters resulted in an average agreement of 84%.
7.2.3 Photographic Interpretation

7.2.3.1 Interpretation Protocol

The standard protocol used by the photo interpreters was as follows:
Figure 7.2 Sample sites clustered around the photograph center.
• Each interpreter was assigned 500 of the 1500 total sites.
• Interpretation was based on NAPP photographs.
• The sample site locations on the NAPP photos were found by first plotting the sites on TM false-color composite images and then finding the same area on the photo by context.
• During the interpretation process, cover type and other related information, such as site homogeneity, were recorded for later analysis.
• When there was some doubt as to the correct class, or when two classes could both be considered correct, the interpreters selected an alternate class in addition to the primary class.
• The interpretations were based on the majority of a 3 × 3 pixel window (Congalton and Green, 1999); a sketch of this majority rule follows the list.
7.2.3.2 Interpretation Procedures

The Landsat TM images were displayed using ERDAS Imagine. By plotting the site locations on the Landsat TM false-color composite images, the interpreters precisely located each site. Then, based on the context from the image, the interpreters located the site on the photographs as best they could. Clearly, some error was inherent in this location process; however, this was the simplest practical approach, and the 3 × 3 pixel majority window described above was intended to reduce the effect of location errors.
The interpreters examined each site's characteristics using the aerial photograph and TM image, determined the appropriate LC label for the site according to the classification scheme, and then entered the information into the project database. The following data were entered into the database: site identification number (sample site), coordinates, photography acquisition date, photograph identification code, imagery identification number, primary or dominant LC class, alternate LC class (if any), general site description, unusual observations, general comments, and any temporal site changes between the image and photo acquisition dates. The interpreters did not have prior access to the MRLC classification values during the interpretation process.
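For reference, one way to represent the database record described above is sketched below; the field names are paraphrases of the list in the text and are not the project's actual schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class SampleSiteRecord:
    """One secondary sampling unit (SSU) record, following the fields listed in the text."""
    site_id: int                           # site identification number
    coordinates: Tuple[float, float]       # site coordinates
    photo_date: str                        # photography acquisition date
    photo_id: str                          # photograph identification code
    image_id: str                          # imagery identification number
    primary_class: str                     # primary or dominant LC class
    alternate_class: Optional[str] = None  # alternate LC class, if any
    site_description: str = ""             # general site description
    comments: str = ""                     # unusual observations / general comments
    temporal_change: bool = False          # LC change between image and photo dates
```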
7.2.3.3 Quality Assurance and Quality Control

Individual interpreters analyzed 15% (n = 75) of each of the other interpreters' sample sites to create an overlap database used to evaluate the performance of the interpreters and the agreement among them. Selection of these 75 sites was done through random sampling. This scheme provided 225 sites that were interpreted by all three interpreters. Agreement analysis using these overlap sites indicated an average agreement of 84% among the three interpreters (Table 7.1).
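A minimal sketch of how an agreement figure like the 84% could be computed from the overlap sites is shown below, assuming each overlap site carries one call per interpreter; the pairwise-averaging convention is an assumption, since the chapter does not spell out how agreement was tallied.

```python
from itertools import combinations

def average_pairwise_agreement(calls_by_interpreter):
    """calls_by_interpreter maps interpreter name -> list of class calls, where index i
    in every list refers to the same overlap site. Returns the mean proportion of
    matching calls over all interpreter pairs."""
    rates = []
    for a, b in combinations(calls_by_interpreter, 2):
        calls_a, calls_b = calls_by_interpreter[a], calls_by_interpreter[b]
        matches = sum(x == y for x, y in zip(calls_a, calls_b))
        rates.append(matches / len(calls_a))
    return sum(rates) / len(rates)

# Toy example with 5 overlap sites instead of 225.
calls = {
    "PI1": ["forest", "water", "urban", "crop", "wetland"],
    "PI2": ["forest", "water", "crop",  "crop", "wetland"],
    "PI3": ["forest", "bare",  "urban", "crop", "wetland"],
}
print(round(average_pairwise_agreement(calls), 2))  # -> 0.73
```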
Quality assurance (QA) and quality control (QC) procedures were rigorously implemented in the study, as designated in the interpretation organization chart (Table 7.2). Discussions among the interpreters and project supervisors during the interpretation process provided an opportunity to identify problems as they occurred and to resolve them on the spot. The overlap sites were also analyzed to determine how similarly the interpreters would call the same sites. The initial results of this analysis revealed that some misunderstandings about class definitions had remained after the training process. As a result, the interpreters were retrained as a group to "calibrate" themselves. This helped to ensure that calls were more consistent among interpreters. Upon satisfactory completion of the retraining, the interpreters were assigned to complete interpretation of the 1500 sample sites.
7.3 RESULTS
7.3.1 Accuracy Estimates

Table 7.3 presents the error matrix for the MRLC Level II classes. The numbers across the top and sides of the matrices represent the 15 MRLC classes (Appendix A).
Table 7.1 Agreement Analysis Among PIs: Interpreter Call vs. Overlap Consensus for the 225 Overlap Sites
Table 7.4 presents the error matrix for the MRLC Level I classes. The Level II classes were grouped into the following Level I categories: (1) water, (2) urban or developed, (3) bare surface, (4) agriculture and other grasslands, (5) forest (upland), and (6) wetland (woody or nonwoody). The overall accuracies for the Level I and Level II classes were 66% and 44%, respectively.
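For reference, the overall and category-specific accuracies reported here follow the standard error-matrix definitions (Congalton, 1991); the row/column roles in the notation below are our convention for illustration rather than a restatement of the tables' exact layout:

```latex
\text{Overall accuracy} = \frac{\sum_{i=1}^{k} n_{ii}}{N}, \qquad
\text{User's accuracy}_i = \frac{n_{ii}}{n_{i+}}, \qquad
\text{Producer's accuracy}_j = \frac{n_{jj}}{n_{+j}}
```

where n_ij is the number of sample sites assigned to class i by one labeling source (e.g., the classified map) and to class j by the other (the photo-interpreted reference), n_i+ and n_+j are row and column totals, k is the number of classes, and N is the total number of sample sites (1473 here).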
Table 7.3 illustrates the confusion among the low-intensity residential, high-intensity residential, and commercial/transportation categories. Many factors may have contributed to this confusion; however, we believe the complex classification scheme used was a dominant factor. For example, the most ambiguous categories were the three urban classes, which were distinguished only by percentage of vegetation. Quantifying subpixel vegetation content was beyond the methods employed in this study. As a result, many high-intensity residential areas in the classified image were assigned to the low-intensity residential and commercial/transportation classes. This occurred because the high-intensity residential class, which had an intermediate percentage of vegetation, was easily confused with both lower-intensity and higher-intensity urban development.
Many problems were also encountered with the interpretation of cropland and pasture/hay, since these classes had very similar spectral and spatial patterns and occurred within the same agricultural areas. In addition, cropland was frequently converted to pasture/hay (or vice versa) during the interval between the two acquisition dates.
Table 7.2 Interpretation Team Organization

Role                 Responsibilities
Photo interpreters   PI #1: 500 pts + 75 pts from PI #2 and 75 pts from PI #3
                     PI #2: 500 pts + 75 pts from PI #1 and 75 pts from PI #3
                     PI #3: 500 pts + 75 pts from PI #1 and 75 pts from PI #2
PI supervisor        Random checking for consistency; checking the 225 overlapped sites; sites with questions from the three PIs
Project supervisor   Checking sites with questions from the PI supervisor; random checking of overall sites; overall QA/QC
Project director     Procedure establishment; discussions on issues; random checking; overall QA/QC
Figure 7.3 Training, photo interpretation (PI), and quality assurance and quality control (QA/QC) procedures.
Table 7.3 Error Matrix for the Level II MRLC Data (15 Classes)
Confusion also existed between evergreen forest and mixed forest, deciduous forest and mixed forest, barren ground and other grassland, low-intensity residential and mixed forest, and transitional and all other classes.

The difference between image classification and photo interpretation is that image classification is based mostly on the spectral values of the pixels, whereas photo interpretation incorporates color (tones), pattern recognition, and background context in combination. These issues are inherent in any accuracy assessment project using aerial photos as the reference data (Ramsey et al., 2001). For this project, however, aerial photos were the only reasonable reference data source.
The interpretation process is not the only component of the accuracy assessment process (Congalton and Green, 1999). Additional factors that should be considered are positional and correspondence error. To account for these errors, the following additional criteria for correct classification were considered in this project: (1) the primary label matches the classified pixel, (2) the primary or alternate label matches the classified pixel, (3) the primary label is the most common class in the classified 3 × 3 pixel area, (4) the primary label matches any pixel in the classified 3 × 3 pixel area, (5) the primary or alternate label is the most common class in the classified 3 × 3 pixel area, and (6) the primary or alternate label matches any pixel in the classified 3 × 3 pixel area. "Interpreted" refers to the classes chosen during the aerial photo interpretation process, "primary" and "alternate" are the most probable LC classes for a particular site, and "classified" refers to the MRLC classification result for that site. The analysis results for each cover class under these six cases are presented in Tables 7.5 and 7.6; the overall accuracies were 44% and 79.4% (n = 1473) for cases (1) and (6), respectively.
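A compact sketch of how the six agreement cases could be evaluated for one site is given below, using the interpreter's primary and alternate labels and the 3 × 3 block of classified pixels centered on the site; the function and variable names are illustrative only and not taken from the project.

```python
from collections import Counter

def agreement_cases(primary, alternate, center_pixel, window3x3):
    """Evaluate the six 'correct classification' criteria for one sample site.
    window3x3 is the flat list of the nine classified pixel values centered on the site."""
    labels = {primary} | ({alternate} if alternate is not None else set())
    mode = Counter(window3x3).most_common(1)[0][0]
    return {
        "1_primary_matches_pixel":       primary == center_pixel,
        "2_prim_or_alt_matches_pixel":   center_pixel in labels,
        "3_primary_is_mode_3x3":         primary == mode,
        "4_primary_matches_any_3x3":     primary in window3x3,
        "5_prim_or_alt_is_mode_3x3":     mode in labels,
        "6_prim_or_alt_matches_any_3x3": bool(labels.intersection(window3x3)),
    }

# Toy example: primary call 41, alternate 43; the classified center pixel is 43,
# but 41 dominates the 3 x 3 block, so cases (1) and (2)-(6) differ.
window = [41, 41, 43, 41, 43, 41, 41, 42, 41]
print(agreement_cases(41, 43, window[4], window))
```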
Table 7.4 Error Matrix for the Level I MRLC Data

                            MRLC data (Level I class code)
Class      1      2      3      4      8      9    Total   Prop. Correct   No. Correct
1         87      3     10      0      2      6      108        0.81            87
2          0    188      9      4     38      4      243        0.77           188
3          1     12    134     21      8      9      185        0.72           134
4          1     46     45    227     30     39      388        0.59           227
8          1     43     78     21    207     12      362        0.57           207
9          8      6     24     18      4    127      187        0.68           127
Total     98    298    300    291    289    197     1473
Prop.   0.89   0.63   0.45   0.78   0.72   0.64     0.66
Correct   87    188    134    227    207    127      970
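As a check on the figures in Table 7.4, the short script below recomputes the row and column proportions and the 66% overall accuracy from the matrix counts; which axis corresponds to the classified map and which to the reference data is not restated in the table, so the row/column naming in the code is only a convention.

```python
# Level I error matrix counts from Table 7.4 (rows and columns in class order 1, 2, 3, 4, 8, 9).
matrix = [
    [87,   3,  10,   0,   2,   6],
    [ 0, 188,   9,   4,  38,   4],
    [ 1,  12, 134,  21,   8,   9],
    [ 1,  46,  45, 227,  30,  39],
    [ 1,  43,  78,  21, 207,  12],
    [ 8,   6,  24,  18,   4, 127],
]

n_total = sum(sum(row) for row in matrix)                  # 1473
n_correct = sum(matrix[i][i] for i in range(len(matrix)))  # 970
overall = n_correct / n_total                              # ~0.66

row_props = [matrix[i][i] / sum(matrix[i]) for i in range(len(matrix))]
col_props = [matrix[i][i] / sum(r[i] for r in matrix) for i in range(len(matrix))]

print(f"overall accuracy: {overall:.2f} ({n_correct}/{n_total})")
print("row proportions:", [round(p, 2) for p in row_props])  # 0.81, 0.77, 0.72, 0.59, 0.57, 0.68
print("col proportions:", [round(p, 2) for p in col_props])  # 0.89, 0.63, 0.45, 0.78, 0.72, 0.64
```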
Table 7.5 Summary of Further Accuracy Analysis by Interpreted Cover Class: Number of Sites (columns: Class No.; Primary PI Matches MRLC; Prim. or Alt. Matches MRLC; Primary PI Is Mode of 3 × 3; Primary PI Matches Any 3 × 3; Prim. or Alt. PI Is Mode of 3 × 3; Prim. or Alt. PI Matches Any 3 × 3)
7.3.2 Issues and Problems

7.3.2.1 Heterogeneity

The heterogeneity of many areas caused confusion in assigning an exact class label to the sites. Since the spatial resolution of the Landsat TM data was 30 × 30 m, pixel heterogeneity was a significant problem: a single reference site could contain, for example, trees, grassland, and several houses (Plate 7.1a). Thus, the reflectance of the pixel was actually a combination of the reflectances of the different classes within that pixel. This factor contributed to confusion between evergreen forest and mixed forest, deciduous forest and mixed forest, low-intensity residential and other grassland, and transitional and several other classes.
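This mixed-pixel effect is often written as a linear combination of the reflectances of the cover types inside the pixel; the linear-mixing expression below is a standard approximation offered here for illustration and is not given in the chapter itself:

```latex
R_{\text{pixel}}(\lambda) \approx \sum_{i=1}^{k} f_i \, R_i(\lambda),
\qquad \sum_{i=1}^{k} f_i = 1, \quad f_i \ge 0
```

where f_i is the areal fraction of cover type i within the 30 × 30 m pixel and R_i(λ) is the reflectance of cover type i in spectral band λ.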
7.3.2.2 Acquisition Dates

Temporal discrepancies between photograph and image acquisition dates, if not reconciled, would negatively affect the classification accuracy (Plate 7.1b). For example, to interpret early forest growth areas, the interpreter had to decide whether the site was a transitional or a forested area. If the photograph was acquired before the image (e.g., as much as 6 years earlier), it was clear that those early forest growth sites would show up as forest cover on the satellite image. In this case, the interpreters decided the appropriate cover class based on the satellite imagery.
7.3.2.3 Location Errors

Locating the reference site on the photo was sometimes problematic. This frequently occurred when: (1) the LC had changed between the image and photo acquisition dates, (2) there were few clearly identifiable features for positional reference, and (3) the reference site was on the border of two or more classes (boundary pixel problem). When the LC had changed between acquisition dates, locating reference sites was difficult because the features surrounding the reference site had also changed. Similarly, when a reference site fell in an area with few identifiable features for positional reference, the interpreter had to approximate the location of the reference site.
Table 7.6 Summary of Further Accuracy Analysis by Interpreted Cover Class: Percentage of Sites for Each Class (columns: Class Percentage; Primary PI Matches MRLC; Prim. or Alt. PI Matches MRLC; Primary PI Is Mode of 3 × 3; Primary PI Matches Any 3 × 3; Prim. or Alt. PI Is Mode of 3 × 3; Prim. or Alt. PI Matches Any 3 × 3; Total Percentage)
For example, when the reference site was on the shadowy side of a mountain, it was impossible to see the reference features other than the ridgeline of the mountain; thus, the interpreter was required to locate the reference site based on the approximate distance to, and the direction of, the ridgeline. The third case was the most common source of confusion in the interpretation process. Reference sites were frequently on the border of two or more classes. In these situations, the interpreter
Plate 7.1 (See color insert following page 114.) (a) Heterogeneity problem: the reference site consists of several classes. (b) The LC class changed between acquisition dates at the reference site. (c) Ambiguity of class definitions: it is difficult to differentiate between the high-intensity residential and commercial classes according to their definitions. (Each panel pairs a Landsat TM image with a CIR aerial photo.)