Panel on Confidentiality Issues Arising from the Integration of
Remotely Sensed and Self-Identifying Data
Myron P. Gutmann and Paul C. Stern, editors
Committee on the Human Dimensions of Global Change
Division of Behavioral and Social Sciences and Education
NATIONAL RESEARCH COUNCIL
OF THE NATIONAL ACADEMIES
THE NATIONAL ACADEMIES PRESS
Washington, D.C.
NOTICE: The project that is the subject of this report was approved by the Governing Board of the National Research Council, whose members are drawn from the councils of the National Academy of Sciences, the National Academy of Engineering, and the Institute of Medicine. The members of the committee responsible for the report were chosen for their special competences and with regard for appropriate balance.
This study was supported by Contract/Grant Nos. BCS-0431863, NNH04PR35P, and N01-OD-4-2139, TO 131 between the National Academy of Sciences and the U.S. National Science Foundation, the U.S. National Aeronautics and Space Administration, and the U.S. Department of Health and Human Services, respectively. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the organizations or agencies that provided support for the project.
Library of Congress Cataloging-in-Publication Data
Putting people on the map : protecting confidentiality with linked social-spatial data / Panel on Confidentiality Issues Arising from the Integration of Remotely Sensed and Self-Identifying Data, Committee on the Human Dimensions of Global Change, Division of Behavioral and Social Sciences and Education.
p. cm.
"National Research Council."
Includes bibliographical references.
ISBN 9780309104142 (pbk.) -- ISBN 9780309668316 (pdf)
1. Social sciences--Research--Moral and ethical aspects. 2. Confidential communications--Social surveys. 3. Spatial analysis (Statistics) 4. Privacy, Right of--United States. 5. Public records--Access control--United States. I. National Research Council (U.S.). Panel on Confidentiality Issues Arising from the Integration of Remotely Sensed and Self-Identifying Data. II. Title: Protecting confidentiality with linked social-spatial data.
H62.P953 2007
174'.93--dc22
2006103005
Additional copies of this report are available from the National Academies Press, 500 Fifth Street, N.W., Lockbox 285, Washington, DC 20055; (800) 624-6242 or (202) 334-3313 (in the Washington metropolitan area); Internet, http://www.nap.edu.
Printed in the United States of America
Cover image: Tallinn, the capital city and main seaport of Estonia, is located on Estonia's north coast to the Gulf of Finland. Acquired on June 18, 2006, this scene covers an area of 35.6 x 37.5 km and is located at 59.5 degrees north latitude and 25 degrees east longitude. The red dots are arbitrarily selected and do not correspond to the locations of actual research participants.
Cover credit: NASA/GSFC/METI/ERSDAC/JAROS and U.S./Japan ASTER Science Team
Suggested citation: National Research Council. (2007). Putting People on the Map: Protecting Confidentiality with Linked Social-Spatial Data. Panel on Confidentiality Issues Arising from the Integration of Remotely Sensed and Self-Identifying Data. M.P. Gutmann and P.C. Stern, Eds. Committee on the Human Dimensions of Global Change. Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.
THE NATIONAL ACADEMIES
Advisers to the Nation on Science, Engineering, and Medicine
The National Academy of Sciences is a private, nonprofit, self-perpetuating society of distinguished scholars engaged in scientific and engineering research, dedicated to the furtherance of science and technology and to their use for the general welfare. Upon the authority of the charter granted to it by the Congress in 1863, the Academy has a mandate that requires it to advise the federal government on scientific and technical matters. Dr. Ralph J. Cicerone is president of the National Academy of Sciences.
The National Academy of Engineering was established in 1964, under the charter of the National Academy of Sciences, as a parallel organization of outstanding engineers. It is autonomous in its administration and in the selection of its members, sharing with the National Academy of Sciences the responsibility for advising the federal government. The National Academy of Engineering also sponsors engineering programs aimed at meeting national needs, encourages education and research, and recognizes the superior achievements of engineers. Dr. Wm. A. Wulf is president of the National Academy of Engineering.
The Institute of Medicine was established in 1970 by the National Academy of Sciences to secure the services of eminent members of appropriate professions in the examination of policy matters pertaining to the health of the public. The Institute acts under the responsibility given to the National Academy of Sciences by its congressional charter to be an adviser to the federal government and, upon its own initiative, to identify issues of medical care, research, and education. Dr. Harvey V. Fineberg is president of the Institute of Medicine.
The National Research Council was organized by the National Academy of Sciences in 1916 to associate the broad community of science and technology with the Academy's purposes of furthering knowledge and advising the federal government. Functioning in accordance with general policies determined by the Academy, the Council has become the principal operating agency of both the National Academy of Sciences and the National Academy of Engineering in providing services to the government, the public, and the scientific and engineering communities. The Council is administered jointly by both Academies and the Institute of Medicine. Dr. Ralph J. Cicerone and Dr. Wm. A. Wulf are chair and vice chair, respectively, of the National Research Council.
www.national-academies.org
Preface
The main themes of this report (protecting the confidentiality of human research subjects in social science research and simultaneously ensuring that research data are used as widely and as frequently as possible) have been the subject of a number of National Research Council (NRC) publications over a considerable span of time. Beginning with Sharing Research Data (1985) and continuing with Private Lives and Public Policies: Confidentiality and Accessibility of Government Statistics (1993), Protecting Participants and Facilitating Behavioral and Social Science Research (2003), and, most recently, Expanding Access to Research Data: Reconciling Risks and Opportunities (2005), a series of reports has emphasized the value of expanded sharing and use of social science data while simultaneously protecting the interests (and especially the confidentiality) of human research subjects. This report draws from those earlier evaluations and analyzes the role played by a type of data infrequently discussed in those publications: data that explicitly identify a location associated with a research subject, whether home, work, school, doctor's office, or somewhere else. The increased availability of spatial information, the increasing knowledge of how to perform sophisticated scientific analyses using it, and the growth of a body of science that makes use of these data and analyses to study important social, economic, environmental, spatial, and public health problems have led to an increase in the collection and preservation of these data and in the linkage of spatial and nonspatial information about the same research subjects. At the same time, questions have been raised about the best ways to increase the use of such data while preserving respondent confidentiality. The latter is important because analyses that make the most productive use of spatial information often require great accuracy and precision in that information: for example, if you want to know the route someone takes from home to the doctor's office, imprecision in one or the other degrades the analysis. Yet precise information about spatial location is almost perfectly identifying: if one knows where someone lives, one is likely to know the person's identity. That tension between the need for precision and the need to protect the confidentiality of research subjects is what motivates this study.
In this report, the Panel on Confidentiality Issues Arising from the Integration of Remotely Sensed and Self-Identifying Data recommends ways to find a successful balance between needs for precision and the protection of confidentiality. It considers both institutional and technical solutions and draws conclusions about each. In general, we find that institutional solutions are the most promising for the short term, though they need further development, while technical solutions have promise in the longer term and require further research.
As the report explains, the members of the panel chose in one significant way to broaden their mandate beyond the explicit target of "remotely sensed and self-identifying" data because working within the limitation of remotely sensed data restricted the problem domain in a way at odds with the world. From the perspective of confidentiality protection, when social science research data are linked with spatial information, it does not matter whether the geospatial locations are derived from remotely sensed imagery or from other means of determining location (GPS devices, for example). The issues raised by linking remotely sensed information are a special case within the larger category of spatially precise and accurate information. For that reason, the study considers all forms of spatial information as part of its mandate.
In framing the response to its charge, the panel drew heavily on existing reports, on published material, and on best practices in the field. The panel also commissioned papers and reports from experts; they were presented at a workshop held in December 2005 at the National Academies. Two of the papers are included as appendixes to this report. Biographical sketches of panel members and staff are also included at the end of this report.
This report could not have been completed successfully without the hard work of members of the NRC staff. Paul Stern served as study director for the panel and brought his usual skills in planning, organization, consensus building, and writing. Moreover, from a panel chair's perspective, he is a superb partner and collaborator. We also thank the members of the Committee on the Human Dimensions of Global Change, under whose auspices the panel was constituted, for their support.
The panel members and I also thank the participants in the Workshop on Confidentiality Issues in Linking Geographically Explicit and Self-Identifying Data. Their papers and presentations provided the members of the panel with a valuable body of information and interpretations, which contributed substantially to our formulation of both problems and solutions.
Rebecca Clark of the Demographic and Behavioral Sciences Branch of the National Institute of Child Health and Human Development has been a tireless supporter of many of the intellectual issues addressed by this study, both those that encourage the sharing of data and those that encourage the protection of confidentiality; and it was in good part her energy that led to the study's initiation. We gratefully acknowledge her efforts and the financial support of the National Institute of Child Health and Human Development, a part of the National Institutes of Health of the Department of Health and Human Services; the National Science Foundation; and the National Aeronautics and Space Administration.
Finally, I thank the members of the panel for their hard work and active engagement in the process of preparing this report. They are a lively group with a wide diversity of backgrounds and approaches to the use of spatial and social science data, who all brought a genuine concern for enhancing research, sharing data, and protecting confidentiality to the task that confronted us. National Research Council panels are expected to be interdisciplinary: that's the goal of constituting them to prepare reports such as this one. This particular panel was made up of individuals who were themselves interdisciplinary, and the breadth of their individual and group expertise made the process of completing the report especially rewarding. The panel's discussions aimed to find balance and consensus among these diverse individuals and their diverse perspectives. Writing the report was a group effort to which everyone contributed. I'm grateful for the hard work.
This report has been reviewed in draft form by individuals chosen for their diverse perspectives and technical expertise, in accordance with procedures approved by the Report Review Committee of the National Research Council. The purpose of this independent review is to provide candid and critical comments that assist the institution in making the published report as sound as possible and ensure that the report meets institutional standards for objectivity, evidence, and responsiveness to the study charge. The review comments and draft manuscript remain confidential to protect the integrity of the deliberative process.
We thank the following individuals for their participation in the review of the report: Joe S. Cecil, Division of Research, Federal Judicial Center, Washington, DC; Lawrence H. Cox, Research and Methodology, National Center for Health Statistics, Centers for Disease Control and Prevention, Hyattsville, MD; Glenn D. Deane, Department of Sociology, University at Albany; Jerome E. Dobson, Department of Geography, University of Kansas; George T. Duncan, Heinz School of Public Policy and Management, Carnegie Mellon University; Lawrence Gostin, Research and Academic Programs, Georgetown University Law Center, Washington, DC; Joseph C. Kvedar, Director's Office, Partners Telemedicine, Boston, MA; W. Christopher Lenhardt, Socioeconomic Data and Applications Center, Columbia University, Palisades, NY; Jean-Bernard Minster, Scripps Institution of Oceanography, University of California, La Jolla, CA; and Gerard Rushton, Department of Geography, The University of Iowa.
Although the reviewers listed above provided many constructive comments and suggestions, they were not asked to endorse the conclusions or recommendations nor did they see the final draft of the report before its release. The review of this report was overseen by Richard Kulka, Abt Associates, Durham, NC. Appointed by the National Research Council, he was responsible for making certain that an independent examination of this report was carried out in accordance with institutional procedures and that all review comments were carefully considered. Responsibility for the final content of this report rests entirely with the authoring panel and the institutions.
Myron P. Gutmann, Chair
Panel on Confidentiality Issues Arising from the Integration of Remotely Sensed and Self-Identifying Data
Contents
Executive Summary
1 Linked Social-Spatial Data: Promises and Challenges
2 Legal, Ethical, and Statistical Issues in Protecting Confidentiality
3 Meeting the Challenges
4 The Tradeoff: Confidentiality Versus Access
References
Appendixes
A Privacy for Research Data
Robert Gellman
B Ethical Issues Related to Linked Social-Spatial Data
Felice J. Levine and Joan E. Sieber
Biographical Sketches for Panel Members and Staff
Executive Summary
Precise, accurate spatial data are contributing to a revolution in some fields of social science. Improved access to such data about individuals, groups, and organizations makes it possible for researchers to examine questions they could not otherwise explore, gain better understanding of human behavior in its physical and environmental contexts, and create benefits for society from the knowledge that flows from new types of scientific research. However, to the extent that data are spatially precise, there is a corresponding increase in the risk of identification of the people or organizations to which the data apply. With identification comes a risk of various kinds of harm to those identified and the compromise of promises of confidentiality made to gain access to the data.
This report focuses on the opportunities and challenges that arise when accurate and precise spatial data on research participants, such as the locations of their homes or workplaces, are linked to personal information they have provided under promises of confidentiality. The availability of these data makes it possible to do valuable new kinds of research that links information about the external environment to the behavior and values of individuals. Among many possible examples, such research can explore how decisions about health care are made, how young people develop healthy lifestyles, and how resource-dependent families in poorer countries spend their time obtaining the energy and food that they need to survive. The linkage of spatial and social information, like the growing linkage of socioeconomic characteristics with biomarkers (biological data on individuals), has the potential to revolutionize social science and to significantly advance policy making.
While the availability of linked social-spatial data has great promise for research, the locational information makes it possible for a secondary user of the linked data to identify the participant and thus break the promise of confidentiality made when the social data were collected. Such a user could also discover additional information about the research participant, without asking for it, by linking to geographically coded information from other sources. Open public access to linked social and high-resolution spatial data greatly increases the risk of breaches of confidentiality. At the same time, highly restrictive forms of data management and dissemination carry very high costs: by making it prohibitively difficult for researchers to gain access to data or by restricting or altering the data so much that they are no longer useful for answering many types of important scientific questions.
CONCLUSIONS
CONCLUSION 1: Recent advances in the availability of social-spatial data and the development of geographic information systems (GIS) and related techniques to manage and analyze those data give researchers important new ways to study important social, environmental, economic, and health policy issues and are worth further development.
CONCLUSION 2: The increasing use of linked social-spatial data has created significant uncertainties about the ability to protect the confidentiality promised to research participants. Knowledge is as yet inadequate concerning the conditions under which and the extent to which the availability of spatially explicit data about participants increases the risk of confidentiality breaches.
Various new technical procedures involving transforming data or creating synthetic datasets show promise for limiting the risk of identification while providing broader access and maintaining most of the scientific value of the data. However, these procedures have not been sufficiently studied to realistically determine their usefulness.
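As one illustration of what transforming spatial data can mean in practice, the sketch below coarsens point coordinates by snapping each one to the center of a grid cell, so that nearby locations become indistinguishable. It is a minimal Python sketch, not a procedure taken from this report: the function name `coarsen`, the 0.01-degree cell size, and the sample coordinates are all assumptions chosen for illustration, and coarsening by itself is not claimed to give adequate protection.

```python
import math


def coarsen(lat, lon, cell_deg=0.01):
    """Snap a latitude/longitude pair to the center of its grid cell.

    cell_deg is the cell size in decimal degrees (a hypothetical default).
    """
    def snap(value):
        # Index of the cell containing the value, then that cell's midpoint.
        return (math.floor(value / cell_deg) + 0.5) * cell_deg

    return snap(lat), snap(lon)


# Two hypothetical nearby points fall in the same cell after coarsening.
home_a = coarsen(59.4370, 24.7536)
home_b = coarsen(59.4371, 24.7531)
print(home_a == home_b)
```

Larger cells reduce precision further, which mirrors the tradeoff described in the text: the more the coordinates are coarsened, the less useful they become for analyses that depend on exact distances or routes.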
CONCLUSION 3: Recent research on technical approaches for reducing the risk of identification and breach of confidentiality has demonstrated promise for future success. At this time, however, no known technical strategy or combination of technical strategies for managing linked spatial-social data adequately resolves conflicts among the objectives of data linkage, open access, data quality, and confidentiality protection across datasets and data uses.
CONCLUSION 4: Because technical strategies will not be sufficient in the foreseeable future for resolving the conflicting demands for data access, data quality, and confidentiality, institutional approaches will be required to balance those demands.
Institutional solutions involve establishing tiers of risk and access and developing data-sharing protocols that match the level of access to the risks and benefits of the planned research. Such protocols will require that the authority to decide about data access be allocated appropriately among primary researchers, data stewards, data users, institutional review boards (IRBs), and research sponsors and that those actors are very well informed about the benefits and risks of the data access policies they may be asked to approve.
We generally endorse the recommendations of the 2004 National Research Council report Protecting Participants and Facilitating Social and Behavioral Sciences Research, and the 2005 report, Expanding Access to Research Data: Reconciling Risks and Opportunities, regarding restricted access to confidential data and unrestricted access to public-use data that have been modified so as to protect confidentiality, expanded data access (remotely and through licensing agreements), increased research on ways to address the competing claims of access and confidentiality, and related matters. Those reports, however, have not dealt in detail with the risks and tradeoffs that arise with data that link the information in social science research with spatial locations. Consequently, we offer eight recommendations to address those data.
RECOMMENDATIONS
Recommendation 1: Technical and Institutional Research
Federal agencies and other organizations that sponsor the collection and analysis of linked social-spatial data (or that support data that could provide added benefits with such linkage) should sponsor research into techniques and procedures for disseminating such data while protecting confidentiality and maintaining the usefulness of the data for social-spatial analysis. This research should include studies to adapt existing techniques from other fields, to understand how the publication of linked social-spatial data might increase disclosure risk, and to explore institutional mechanisms for disseminating linked data while protecting confidentiality and maintaining the usefulness of the data.
Recommendation 2: Education and Training
Faculty, researchers, and organizations involved in the continuing professional development of researchers should engage in the education of researchers in the ethical use of spatial data. Professional associations should participate by establishing and inculcating strong norms for the ethical use and sharing of linked social-spatial data.
Recommendation 3: Training in Ethical Issues
Training in ethical considerations needs to accompany all methodological training in the acquisition and use of data that include geographically explicit information on research participants.
Recommendation 4: Outreach by Professional Societies and Other Organizations
Research societies and other research organizations that use linked social-spatial data and that have established traditions of protection of the confidentiality of human research participants should engage in outreach to other research societies and organizations less conversant in research with issues of human participant protection to increase attention to these issues in the context of the use of personal, identifiable data.
Recommendation 5: Research Design
Primary researchers who intend to collect and use spatially explicit data should design their studies in ways that not only take into account the obligation to share data and the disclosure risks posed, but also provide confidentiality protection for human participants in the primary research as well as in secondary research use of the data. Although the reconciliation of these objectives is difficult, primary researchers should nevertheless assume a significant part of this burden.
Recommendation 6: Institutional Review Boards
Institutional Review Boards and their organizational sponsors should develop the expertise needed to make well-informed decisions that balance the objectives of data access, confidentiality, and quality in research projects that will collect or analyze linked social-spatial data.
1
Linked Social-Spatial Data:
Promises and Challenges
Precise, accurate spatial data are contributing to a revolution in some fields of social science. Improved access to such data, combined with improved methods of analysis, is making possible deeper understanding of the relationships between people and their physical and social environments. Researchers are no longer limited to analyzing data provided by research participants about their personal characteristics and their views of the world; rather, it has become possible to link personal information to the exact locations of homes, workplaces, daily activities, and characteristics of the environment (e.g., water supplies). Those links allow researchers to understand much more about individual behavior and social interactions than previously, just as linking biomedical data (on genes, proteins, blood chemistry) to social data has helped researchers understand the progress of illness and health in relation to aspects of people's behavior. The potential for improved understanding of human activities at the individual, group, and higher levels by incorporating spatial information is only beginning to be unlocked.
Yet even as researchers are learning from new opportunities offered by precise spatial information, these data raise new challenges because they allow research participants to be identified and therefore threaten the promise of confidentiality made when collecting the social data to which spatial data are linked. Although the difficulties of ensuring access to data while preserving confidentiality have been addressed by previous National Research Council reports (1993, 2000, 2003, 2005a), those did not consider in detail the risks posed by data that link the information in social science research with spatial locations. This report directly addresses the tradeoffs between providing greater access to data and protecting research participants from breaches of confidentiality in the context of the unique capacity of spatial data to lead to the identification of individuals.
THE NEW WORLD OF LOCATIONAL DATA
The development of new data, approaches, spatial analysis tools, and data collection methods over the past several decades has revolutionized how researchers approach many questions. The availability of high-resolution satellite images of Earth, collected repeatedly over time, and of software for converting those images into digital information about specific locations, has made new methods of analysis possible. Along with more and improved satellite images, there are aerial images, global positioning systems (GPS), and other types of sensors (especially radio frequency identification (RFID) tags that can be used to track people worldwide) that allow the possibility of ubiquitous tracking of individuals and groups. The same technologies also permit enhanced research about business enterprises, for example, by providing tracking information for commercial vehicles or shipments of goods.
With the advent of GPS, the goal of real-time, continuous global coverage with an accuracy finer than 1 meter has been achieved, though some caveats, such as difficulty with indoor coverage, apply. Triangulation based on cellular telephone signal strength can be used to establish location on the order of 100 meters in many locations, and researchers are now developing techniques for mapping mobile locations at much higher resolutions (Borriello et al., 2005). Satellite remote sensing instruments have improved by more than an order of magnitude during the past two decades in several dimensions of resolution. Commercial remote sensing firms provide data with a sub-meter ground resolution. With the increasing availability of hyperspectral sensor systems (those that sense in hundreds of discrete spectral bands along the electromagnetic spectrum), the amount of geographic information being collected from satellites has increased at a staggering pace.
Terrestrial sensing systems are also increasing in quantity and capability. Low-cost solid-state imagers with GPS control are now widely deployed by private companies and scientific investigators. In addition, fixed sensor arrays (e.g., closed circuit television) are now used routinely in many locations to provide continuous coverage of events in their field of view. As computers continue to decrease in size and power consumption while also increasing in computing and storage capacity, inexpensive in situ sensor networks are able to record information that is transmitted over peer-to-peer networks and other types of radio communication technologies (Culler, Estrin, and Srivastava, 2004; Martinez, Hart, and Ong, 2004). These devices are now rather primitive, often sensing single types of information such as temperature or pressure, but their capabilities are increasing rapidly. Moreover, their space requirements are decreasing; some researchers now describe nanoscale computing and sensing devices (Geer, 2006).
These emerging technologies are being integrated with other developing streams of technology, such as RFID tags (Want, 2006) and wearable computers (Smailagic and Siewiorek, 2002), that are location and context aware. Indeed, the ubiquity of these devices has caused some to assert that traditional sensing and processing systems will, in essence, disappear (Streitz and Nixon, 2005; Weiser, 1991). These technologies are creating significant concerns about threats to privacy, although few, if any, of these concerns relate to research uses of the technologies. Nevertheless, emerging technological capabilities are an important part of the context for the research use of locational data.
As these new tools and methods have become more widely available, researchers have begun to pursue a variety of studies that were previously difficult to accomplish. For example, analysis of health services once focused on access as a function of age, sex, race, income, occupation, education, and employment. It is now possible to examine how access and its effects on health are influenced by distances from home and work to health care providers, as well as the quality of the available transportation routes and modes (Williams, 1983; Entwisle et al., 1997; Parker, 1998; Kwan, 2003; Balk et al., 2004). Improved understanding of how these spatial phenomena interact with social ones can give a much clearer picture of the nature of access to health care than was previously possible.
Critical to research linking social and spatial data are the development and use of geographical information systems (GIS) that make it possible to tie data from different sources to points on the surface of the Earth. This connection has great importance because geographic coordinates are a unique and unchanging identification system. With GIS, data collected from participants in a social survey can be linked to the location of the respondents' residences, workplaces, or land holdings and thus can be analyzed in connection with data from other sources, such as satellite observations or administrative records that are tied to the same physical location. Such data linkage can reveal more information about research participants than can be known from either source alone. Such revelations can increase the fund of human knowledge, but they can also be seen by the individuals whose data are linked as an invasion of privacy or a violation of a pledge of confidentiality.
Increasingly sophisticated tools for spatial analysis involving, but going far beyond, the simple digitized maps of the early geographical information systems have also contributed to this revolution. Not only has commercial software made spatial data processing, visualization, and integration relatively accessible, but several packages (including freeware; e.g., Anselin, 2005; Anselin et al., 2006; Bivand, 2006; also see http://www.r-project.org/) also make multivariate spatial regression analysis much easier (e.g., Fotheringham et al., 2002). Moreover, standard statistical software packages, such as Stata and Matlab, now have much greater functionality to accommodate spatial analytic models, and SAS (another software package) and Stata have increased flexibility to accommodate complex design effects often associated with spatially linked data.
SCOPE OF WORK
In response to such challenges of providing wider access to data used for social-spatial analysis while maintaining confidentiality, the sponsors of this study asked the National Academies to address the scientific value of linking remotely sensed and "self-identifying" social science data that are often collected in social surveys, that is, data that allow specific individuals and their attributes to be identified. The Academies were further asked to discuss and evaluate tradeoffs involving data accessibility, confidentiality, and data quality; consider the legal issues raised by releasing remotely sensed data in forms linked to self-identifying data; assess the costs and benefits of different methods for addressing confidentiality in the dissemination of such data; and suggest appropriate models for addressing the issues raised by the combined needs for confidentiality and data access.
In carrying out our study, it became clear that limiting the study to remotely sensed data unnecessarily restricted the problem domain. When social science research data are linked with spatially precise and accurate information, it does not matter in terms of confidentiality issues whether the geospatial locations are derived from remotely sensed imagery or from other means of determining location, such as GPS devices or address-matching using GIS technology. The issues raised by linking remotely sensed information are a special case within the larger category of spatially precise and accurate information. For that reason, the committee considered as part of its mandate all forms of spatial information. We also considered all forms of data collected from research participants that might allow them to be identified, including personal information about individuals, which may or may not be sensitive if revealed to others, and information about specific business enterprises. For purposes of simplicity we call all this personal and enterprise information used for the research considered here "social data," and their merger with spatial information "social-spatial data."
This report focuses mainly on microdata, specifically, information about individuals, households, or businesses that participate in research studies or supply data for administrative records that have the potential to be shared with researchers outside the original group that produced the data. This focus reflects the fact that such individual-, household-, or enterprise-level data are easily associated with precise locations. Microdata are especially important because spatial data can compromise confidentiality both by identifying respondents directly and by providing sensitive information that creates risk of harm if linked to identifying data. In addition, spatially precise information may sometimes be associated with small aggregates of individuals or businesses, and care is always needed when sharing data that have exact locations, for example, a cluster of persons or families living near each other.
This report provides guidance to agencies that sponsor data collection and research, to academic and nonacademic institutions and their institutional review boards (IRBs), to researchers who are collecting data, to institutions and individuals involved in the research enterprise (such as firms that contract to conduct surveys), and to those organizations charged with the long-term stewardship of data. It discusses the challenges they face in preserving confidentiality for linked social and spatial data, as well as ways that they can simultaneously honor their commitment to share their wealth of data and their commitment to preserve participant confidentiality. Although all these individuals and organizations involved in the research enterprise have somewhat different roles to play and somewhat different interests and concerns, we refer to them throughout this report as data stewards. This focus on the responsibilities of those who share data for analysis does not absolve others who have responsibility for the collected information from thinking about the risks associated with spatially explicit data. The report therefore also speaks to those who use linked social-spatial data, including researchers who analyze the data and editors who publish maps or other spatially explicit information that may reveal information that is problematic from a privacy perspective (e.g., Monmonier, 2002; Armstrong and Ruggles, 2005; Rushton et al., 2006).
This study follows and builds on a series of previous National Research Council reports that address closely related issues, including: issues of data access (1985); the challenges of protecting privacy and reducing disclosure risk while maximizing access to quality, detailed data for informed analyses (1993, 2000, 2003, 2004b); and ethical considerations in using micro-level data, including linked data (2005a). The conclusions and recommendations of several of these earlier studies inform this report. These earlier reports and other studies (e.g., National Research Council, 1998; Jabine, 1993; Melichar et al., 2002) have generally developed two themes, one emphasizing the need for data, especially microdata, to be shared among researchers, and the other the need to protect research participants. While the theme of expanding access to data has included data produced by both individual researchers and government agencies, it has generally emphasized the latter. In the closely related area of environmental data, the National Research Council (2001) emphasizes that publicly funded data are a public good and that the public is entitled to full and open access.
The consensus of this work is that secondary use of data for replication and new research is valuable and that both privately and publicly produced data should be shared. The most recent report on the subject (National Research Council, 2005a) presents a concise set of recommendations that encourage increased access to publicly produced data. At the same time, these reports and studies have also insisted on the protection of research participants, mostly in the broader context of protecting all human research subjects.
This report supports the conclusions of the prior work while exploring new ground. None of the earlier reports considered the potential for breaches of confidentiality posed by the increase in research using linked social-spatial data. The analyses and recommendations included in this report strive to expand the field to the new world of locational data.
The concerns addressed in this report are raised in the context of a broader recognition that vast amounts of data are available about most residents of the United States, that these data have been collected and collated without the explicit permission of their subjects, and that invasions of privacy take place frequently (O'Harrow 2005; Dobson and Fisher 2003; Goss 1995; Fisher and Dobson 2003; Sui 2005; Electronic Privacy Information Center [http://www.epic.org/privacy/census], 2003). Huge commercial databases of financial transactions, court records, telephone records, health information, and other personal information have been established, in many cases without any meaningful request to the relevant individuals for release of that information. These databases are often linked and the results made available for a fee to purchasers in a system that has greatly diminished individuals' and businesses' control over information about themselves. These invasions or perceived invasions of privacy, however, are not a subject of this report. All datasets that include personal information, including those created for commercial as well as research purposes, whether or not they have spatial information, are in need of comprehensive care to prevent breaches of confidentiality and invasions of privacy. Neither this report nor earlier reports deal with the kinds of information technology security required to prevent breaches or invasions, in the case of this report because there is nothing special for spatial data about the need for that security.
PRIVACY, CONFIDENTIALITY, IDENTIFICATION, AND HARM
To understand the dimensions of the confidentiality problem, it is important first to distinguish the concepts of privacy, confidentiality, identification, and harm (see Box 1-1). Privacy concerns the ability of individuals to control personal information that is not knowable from their public presentations of themselves (see Appendix A for a more detailed discussion of privacy and U.S. privacy law). When someone willingly provides information about himself or herself, it is not an invasion of privacy, especially if the person has been informed that it is acceptable to terminate the disclosure at any time. An invasion of privacy occurs when an agent obtains such information about a person without that person's agreement. An invasion of privacy is especially egregious when the person does not want the agent to have the information. An example is the acquisition and sale of the mobile telephone records of individuals without their permission (New York Times, 2006).
Confidentiality involves a promise given by an agent (a researcher in the cases of interest in this report) in exchange for information. Before a research activity begins, the researcher explains the purposes of the project, describes the benefits and harms that may affect the research participant and society more broadly, and obtains the consent of the participant to continue. This process is called "informed consent" (see National Research Council, 1993). The researcher then collects the information through interview, behavioral observation, physical examination, collection of biological sample specimens, or requests for the information from a third party, such as a hospital or a government agency. In exchange, the researcher promises to use that information only for specified purposes (often limited to statistical analysis) and not to reveal the participant's identity or any identifiable information to unauthorized third parties. If promises of confidentiality are kept, a participant's privacy is protected in relation to the information given to the researchers. In academic and other research organizations, the process of obtaining informed consent and making confidentiality promises is part of normal research protocol: institutional review boards have guidelines that require agreements and protection of confidentiality, and the ethical standards of research communities provide further support for confidentiality.
BOX 1-1
Brief Definitions of Some Key Terms
Privacy concerns the ability of individuals to control personal information that is not knowable from their public presentations of themselves. An invasion of privacy occurs when an agent obtains such information about a person without that person's agreement.
Confidentiality in the research context involves an agreement in which a research participant makes personal information available to a researcher in exchange for a promise to use that information only for specified purposes and not to reveal the participant's identity or any identifiable information to unauthorized third parties.
Identification of an individual in a database occurs when a third party learns the identity of the person whose attributes are described there. Identification disclosure risk is the likelihood of identification.
Harm is a negative consequence that affects a research participant because of a breach of confidentiality.
Identification is a key element in confidentiality promises. Confidentiality means that when researchers release any information (analyses, descriptions of the project, or databases that might be used by third parties) they promise that the identity of the participants will not be publicly revealed and cannot be inferred. Identification of an individual in a database occurs when a third party learns the identity of the person whose attributes are described there. Identification obviously increases the risk of breaches of confidentiality. Identification disclosure risk is sometimes quantified in terms of the likelihood of identification. In the context of this study, precise spatial information increases the risk of disclosure and thus the likelihood of identification.
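As a rough illustration of how disclosure risk might be quantified, the following Python sketch treats the risk for each record as one divided by the number of records that share its grid cell, and compares a fine grid with a coarse one. The household coordinates, the cell sizes, and the function names are hypothetical and chosen only for illustration; this is one simple convention, not a measure prescribed in this report.

```python
from collections import Counter
import math


def cell_of(x_m, y_m, cell_size_m):
    """Return the (column, row) of the grid cell containing a point given in meters."""
    return (math.floor(x_m / cell_size_m), math.floor(y_m / cell_size_m))


def identification_risk(points, cell_size_m):
    """Per-record risk, taken here as 1 / (number of records in the same grid cell)."""
    cells = [cell_of(x, y, cell_size_m) for x, y in points]
    counts = Counter(cells)
    return [1.0 / counts[c] for c in cells]


# Hypothetical household locations in projected meters (illustrative only).
households = [(10, 10), (40, 30), (120, 80), (450, 460), (900, 900)]

fine = identification_risk(households, cell_size_m=50)     # precise locations
coarse = identification_risk(households, cell_size_m=500)  # coarsened locations

print("risk with 50 m cells: ", fine)
print("risk with 500 m cells:", coarse)
```

In this toy example, enlarging the cells from 50 to 500 meters lowers the risk for four of the five households, while the isolated household remains unique, which suggests why coarsening alone may not protect every participant.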
It is important to note that it is not so much the information that is being protected, but the link of the information to the individual. For example, it is acceptable to describe a person's survey answers or characteristics so long as the identity of the participant is not revealed. The danger inherent in a breach of confidentiality is not only that private information about an individual might be revealed, but also that the successful conduct of research requires that there be no breaches of confidentiality: any such breach may significantly endanger future research by making potential research participants wary of sharing personal information. Including spatial data in a dataset with social data greatly increases the possibility of identification while at the same time being necessary for certain kinds of analysis.
Harm is a negative consequence that affects a survey respondent or other research participant, in the instances of interest in this study, because of a breach of confidentiality. Social science research can cause various kinds of harm (for example, legal, reputational, financial, or psychological) because information is revealed about a person that she or he does not wish others to know, such as financial liabilities or a criminal record. In exceptional cases, identification of a participant in social science research could put the person at risk of physical harm from a third party. In linking social and spatial data, the need to prevent breaches of confidentiality remains serious, even if no discernible harm is done to respondents, because even apparently harmless breaches violate the expectations of a trusting relationship and can also damage the reputation of the research enterprise.1
Thus, the challenge to the research community is to preserve confidentiality (and also to protect private information to the extent possible). This means that research participants must be protected from identification especially, but not only, when identification can harm them. Though the chance of a confidentiality breach is never zero, the risk of disclosure depends on the nature of the data. The separate risk of harm also depends on the nature of the data. In some instances, confidentiality is difficult to protect but the risk of harm to respondents is low (e.g., when the data include only information that is publicly available); in others, confidentiality may be easy to protect (e.g., because the data include few characteristics that might be used to identify someone), but the risk of harm may be high if identification occurs (because some of the recorded characteristics could, if known, endanger the well-being of the respondent). When precise locational data are included in or can be determined from a dataset, researchers face tougher challenges of protecting confidentiality and preventing identification.
OPPORTUNITIES AND CHALLENGES FOR RESEARCHERS
In response to the growing opportunities for knowledge about relationships between social and spatial phenomena on the part of researchers and policy makers, research funders (especially the National Institute of Child Health and Human Development [National Institutes of Health], the National Science Foundation, and the National Aeronautics and Space Administration, the sponsors of this study) have contributed substantial resources to the creation of linked social-spatial datasets (see Box 1-2). Such datasets cover parts of the United States (Arizona State University, 2006; University of Michigan, 2005a), Brazil (Moran, Brondizio, and VanWey, 2005; Indiana University, 2006), Ecuador (University of North Carolina, 2005), Thailand (Walsh et al., 2005; University of North Carolina, 2006), Nepal (University of Michigan, 2005b), and other countries. One outstanding example is research on the relationship among population, land use, and environment in the Nang Rong district of Thailand, described in Figure 1-1.
1 For more on the distinction between risk and harm, see the Risk and Harm Reports of the Social and Behavioral Sciences Working Group on Human Research Protections (http://www.aera.net/aera.old/humansubjects/risk-harm.pdf, accessed January 2007).
FIGURE 1-1 Confidentiality in Nang Rong, Thailand. The image is an aerial photo with simulated households identified and linked to their farm plots. At this resolution, it is impossible to prevent identification of households.
BOX 1-2
An Example of Social-Spatial Data
A good example of a social-spatial dataset comes from the Nang Rong study, begun in 1984. This project covers 51 villages in Nang Rong district, Northeast Thailand, an agricultural setting in the country's poorest region. The researchers who work on this project have collected data from all households in each village, including precise locations of dwelling units and agricultural parcels. Social network data link households along lines of kinship as well as economic assistance (who helps whom with agricultural tasks). The project team also follows migrants out to their most common destinations, including Bangkok and the country's Eastern Seaboard, a government-sponsored development zone. The project's social data have been merged with the locations of homes, fields, and migration destinations and then linked to a variety of other types of geographic information including satellite data, aerial photographs, elevation data, road networks, and hydrological features. These linked data have been used for many types of analysis (see University of North Carolina, 2006). Figures 1-1 and 1-2 are simulated data of the type created for the Nang Rong project. They show just how clearly individuals and households can be located in these data and therefore how easy it would be for anyone who has the spatial information for actual respondents to identify them.
FIGURE 1-2 Confidentiality issues in Bangkok, Thailand. The background is an Ikonos satellite image, with simulated household data overlaid. The figure shows that migrants from the same village cluster at their destination, forming a village enclave (upper insert) or cluster with migrants from other Nang Rong villages, forming a Nang Rong cluster (lower insert). Released in this fashion, the data can give away the identity of the migrants (unless circles are enlarged to cover more area, in which case the quality of the data is degraded).
Linking social data that are collected from individuals and households with spatial data about them, collected in place or by remote sensing, creates potential for improved understanding of a variety of social phenomena (see Butz and Torrey, 2006). Much has already been learned about the effects of context on social outcomes by analyzing social data at relatively imprecise geographic levels, such as census blocks and tracts or other primary sampling units (e.g., Gephart, 1997; Smith and Waitzman, 1997; LeClere et al., 1998; Ross et al., 2000; Sampson et al., 2002). Advances in geographic information science and in remote sensing make it possible to connect individuals and households to their geographic and biophysical environments (and changes in them) at much finer scales.
Because concerns about confidentiality have limited the use of linked social and fine-scale spatial data, the potential for advancing knowledge through such linkages is only beginning to be explored. There are some early hints of exciting work, and we can speculate about future progress. Some of the progress involves studies of human interactions with the natural environment, a field that has been supported by the agencies that have requested the present study (e.g., National Research Council, 1998, 2005b). Researchers have combined household surveys with remotely sensed data on changes in land use to gain deeper understanding of the processes driving those changes and their economic consequences (e.g., conversion of agricultural land to urban uses, Seto, 2005; changes in cropping patterns, Walsh et al., 2005; changes in forest cover, Foster, 2005; Moran et al., 2005).
Another area of research and opportunity involves global population patterns. Global gridded population data demonstrate that people tend to live at low elevation and near sea coasts and rivers (Small and Cohen, 2004; Small and Nicholls, 2003) and that people living in coastal regions are disproportionately residents of urban areas. Moreover, coastal regions, whether urban or rural, are much more densely populated than other types of ecosystems (McGranahan et al., 2005). About one of every ten people on Earth lives in a low elevation coastal zone at risk of storm surges associated with expected increases in sea levels (McGranahan et al., 2006).
Interesting examples come from health research. For example, the availability of exercise options near where people live, including features as simple as a sidewalk, affects people's health and physical fitness (Gordon-Larsen et al., 2006). Other research shows how migration responds to local environmental conditions, with recurrent droughts perhaps providing the best example (Deane and Gutmann, 2003; Gutmann et al., 2006). There are opportunities for improving estimates of vulnerability to famine by combining data on food availability with data on household coping capabilities and strategies (Hutchinson, 1998). In one example, combining demographic survey data with environmental variables showed that household factors (composition, size, assets), maternal education, and soil fertility were all significant determinants of child hunger in Africa (Balk et al., 2005). The future of health research offers myriad opportunities. For example, environmental factors (e.g., air and water quality) have been linked to people's health: as social and biophysical datasets become better integrated at finer scales, it will be possible to examine a variety of environmental factors and link them to people's health with greater precision and so develop better understanding of those environmental factors.
Another example of the future of research concerns understanding travel behavior by linking personal data with fine-scale spatial information on actual travel patterns. Researchers could evaluate simultaneously the individual attributes of the research participants, the environmental attributes of the places they live, work, or otherwise frequent, and the detailed travel patterns that lead from one to another. Beyond knowing whether a route to school has a sidewalk and whether a child walks to school, one can ask whether that route also has a candy store or a community exercise facility and whether the actual trip to school allows the child to stop there. Yet combining all that information (location of home and school, route taken, and attributes of child and family) and publishing it would reveal the actual identities of research participants and so breach the promise of confidentiality made when data were collected from them.
As research combining spatial data with social data collected from individuals has expanded, both researchers and their sponsors have been forced to confront questions about the release of the massive amounts of data they have accumulated. The opportunities for research offer the potential for great benefits, but there is also some risk of harm. Moreover, both professional ethics and agency policies require that researchers share their data with others. At the same time, researchers who collect social and behavioral data customarily promise the participants who provide the data confidentiality, and the same professional ethics and agency policies that require data sharing also require that pledges of confidentiality be honored. These requirements combine to produce the central dilemma that this report addresses.
See, for example, the codes of ethics of the Urban and Regional Information Systems Association (http://www.urisa.org/about/ethics); the American Society of Photogrammetry and Remote Sensing (http://www.asprs.org/membership/certification/appendix_a.html); the American Sociological Association (http://www.asanet.org/galleries/default-file/Code%20of%20Ethics.pdf); and the Association of American Geographers (http://www.aag.org/Publications/EthicsStatement.html). Also see, for example, the policies of the National Institutes of Health (http://grants1.nih.gov/grants/guide/notice-files/NOT-OD-03-032.html) and the National Science Foundation (article 36 at http://nsf.gov/pubs/2001/gc101/gc101rev1.pdf). [All above cited web pages accessed January 2007.]
In order to understand the challenges and opportunities, consider a recent finding and two hypothetical examples. The finding concerns the rapidly growing use of maps in medical research. Brownstein and colleagues (2006) identified 19 articles in five major medical journals in 2004 and 2005 that plotted the addresses of patients as dots or symbols on maps. To determine how easy it might be to identify individual patients from these maps, they created a simulated map of 550 geographically coded addresses of patients in Boston, using the minimum figure resolution required for publication in the New England Journal of Medicine, and attempted to re-identify the addresses using standard GIS technology. They precisely identified 79 percent of the addresses from the map, and came within 14 meters of precision with the rest. The authors' point was that improved ability to visualize disease patterns in space comes at a cost to patients' privacy.
The first hypothetical example concerns a researcher who (expanding on the insights in Gordon-Larsen et al., 2006) undertakes a project that includes a survey of adolescent behavior, including exercise and eating habits, in order to understand the causes of obesity in the teenage population. In addition to asking about how the research subjects get to school and the availability of places to walk and exercise, the researcher takes GPS readings of their homes and schools, and asks them to wear a device that tracks their location during waking hours for 1 week. Because of the complexity of the problem, the researcher asks about drug and alcohol consumption in addition to food consumption. Finally, the information obtained from the participants is merged with detailed maps of the communities in which they live in order to know the location of specific kinds of places and the routes between them. In the second example, a researcher interested in the effects of family size on land use and resource consumption in south Asia conducts a survey that asks each family about their reproductive and health history, as well as detailed questions about the ways that they obtain food and fuel. Then, walking in the community with family representatives, the researcher takes GPS readings of the locations of the families' farm plots and the areas where they gather wood for heat and cooking. Finally, the researcher spends a day with the women and children in the families as they go about gathering fuel wood, wearing a GPS-based tracking device so that the location and timing of their activities can be recorded. Some of these locations are outside the sanctioned areas in which the family is legally permitted to gather fuel.
In both hypothetical examples, the linking of the social data gathered from the participants and the spatial data will permit identification of some or all of the participants. Yet the researchers have made promises of confidentiality, which state that the data will only be analyzed by qualified researchers and that the participants will never be identified in any publication or presentation. Yet both the sponsor of the research and the research ethics require that the researchers make their data available to other researchers for replication and for new research. In both surveys, there are questions about activities that are outside officially sanctioned behavior, which if linked to an individual respondent might cause them harm if revealed.
In both hypothetical examples, the locational information is essential to the value of the data, so the researchers may not simply discard or modify data items that could lead to identification. Rather, they face a choice between honoring the requirement to share data and the commitment to protect confidentiality, or somehow finding a way to do both. Sharing data is not by itself automatically harmful to research participants. Responsible researchers regularly analyze data that include confidential information, and do so without compromising the promises that were made when the data were collected. The challenge arises when the data are shared with secondary researchers, who must either guarantee that they will adhere to the promise of confidentiality made by the original researcher, or receive data that are stripped of useful identifying information. The goal is to make sure that responsible secondary users do not reveal respondent identities, and do not share the data with others who might do so. But locational information may also make it possible for a secondary researcher to identify research participants by linking to data from other sources, without requesting permission for that information.
Some recent research suggests that it is possible to gauge social, demographic, and economic characteristics from remote sensing data alone (Cowen and Jensen, 1998; Cowen et al., 1993; Weeks, Larson, and Fugate, 2005), but this suggestive idea is unproven and would require considerable supporting research to overcome the challenge that the data are of limited value and have a high likelihood of error. Identifying social attributes from Earth-observing satellites is not easy, but satellite data, particularly from high-resolution satellites (launched since the late 1990s), make the identification of particular anthropogenic features (roads, buildings, infrastructure, vehicles) much easier than previously.3 Other forms of spatial data, such as aerial photographs, especially historic ones, are much less likely to be accurately georeferenced (if georeferenced at all) for fine-scale matching with other attributes, but may nevertheless foster identification.
Spatial data create the possibility that confidentiality may be compromised indirectly by secondary data users in ways that identify individual participants.4 Those ways relate to the spatial context of observations and the spatial covariance that exists among variables. Spatial covariance refers to the tendency of the magnitude of variables to be arranged systematically across space. For example, the locations of high values of one variable are often associated systematically with high values (or with low values) of another variable. Thus, if the spatial covariance structure between variables is known, and the value for one variable is also known, an estimate of the other variable can be made, along with an estimate of error. This knowledge can be applied in several ways, such as interpolation and contextual analyses associated with process models.

3 A review of satellites, their spatial and temporal resolutions and coverage, and detectable features can be found at http://sedac.ciesin.columbia.edu/tg/guide_frame.jsp?rd=RS&ds=1 [accessed January 2007].

4 Confidentiality issues rarely, if ever, arise for spatial data when unlinked to social data. Much spatial data are in the public domain, and the Supreme Court has ruled that privacy rights do not exist for observations made from publicly navigable airspace (see Appendix A).
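The estimation idea described above (a known covariance relationship plus a known value of one variable yields an estimate of the other, with an error estimate) can be illustrated with a minimal Python sketch. It is a deliberate simplification: it uses ordinary linear regression on paired observations and ignores spatial autocorrelation, so it is not a full spatial covariance model. The function name and the sample numbers are hypothetical and are not taken from the report.

```python
import math


def estimate_from_covariate(xs, ys, x_new):
    """Estimate y at x_new from paired observations (xs, ys).

    Uses the sample covariance of x and y (simple linear regression).
    Returns (estimate, residual_standard_error).
    Assumption: plain, non-spatial covariance stands in for the
    spatial covariance structure discussed in the text.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance of x and y, and sample variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    if var_x == 0:
        raise ValueError("x has no variation")
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    # Residual standard error: the "estimate of error" for the prediction.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    rse = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return intercept + slope * x_new, rse


if __name__ == "__main__":
    # Hypothetical values: a publicly observable variable and a sensitive one.
    observable = [0.2, 0.4, 0.5, 0.7, 0.9]
    sensitive = [2.1, 3.0, 3.4, 4.6, 5.2]
    estimate, error = estimate_from_covariate(observable, sensitive, 0.6)
    print(f"estimate={estimate:.2f}, residual standard error={error:.2f}")
```

The smaller the residual error, the more a publicly observable variable reveals about the confidential one, which is the disclosure concern the paragraph raises.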
Interpolation methods can be placed into two classes: exact and approximate (Lam, 1983). Exact methods enforce the condition that the interpolated surface will pass through the observations. Approximate methods use the data points to fit a surface that may pass above or below the actual observations. Kriging is a widely used exact method in which the link between location (x,y) and value of the observation (z) is preserved. Kriging, therefore, threatens confidentiality because it exactly reproduces data values for each sample point: if the spatial location of sample data points is known, the linked values of other variables can be revealed (Cox, 2004). Kriging also provides the analyst with an assessment of the error at each point.
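The exactness property described above can be checked numerically. The following is a minimal sketch, not a production kriging routine: it implements simple kriging with an assumed zero mean and an assumed Gaussian covariance model with a fixed length scale, whereas real applications would estimate a variogram from the data. All function names and sample values are hypothetical.

```python
import math


def gaussian_cov(p, q, length_scale=1.0):
    """Assumed covariance model: decays with squared distance between points."""
    d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return math.exp(-d2 / (2.0 * length_scale ** 2))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def simple_krige(points, values, target):
    """Simple kriging (zero-mean assumption): return (prediction, variance) at target."""
    k_matrix = [[gaussian_cov(p, q) for q in points] for p in points]
    k_vector = [gaussian_cov(p, target) for p in points]
    weights = solve(k_matrix, k_vector)
    prediction = sum(w * z for w, z in zip(weights, values))
    variance = gaussian_cov(target, target) - sum(
        w * k for w, k in zip(weights, k_vector)
    )
    return prediction, variance


if __name__ == "__main__":
    # Hypothetical sample locations (x, y) and confidential values z.
    sample_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
    sample_values = [3.0, 5.0, 4.0, 7.0]
    for point, z in zip(sample_points, sample_values):
        pred, var = simple_krige(sample_points, sample_values, point)
        print(point, "true:", z, "predicted:", round(pred, 6), "variance:", round(var, 6))
```

At each sample location the predicted value matches the observed value and the kriging variance is zero (up to floating-point error), so anyone who knows where the sample points are can read the confidential values off the interpolated surface.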
Contextual data are sometimes used to facilitate analysis when detailed exact data are either too sensitive for release or unavailable. However, contextual data can themselves be identifying; for example, a sequence of daily air quality monitoring readings from the nearest monitor provides a complete "signature" for each monitor, revealing fairly precise locations for individuals whose data are linked to such air quality readings. Knowledge about context can also be used to infer locations when deterministic spatial process models are used. Studies of the human effects of air pollution may use such models to study atmospheric dispersion of harmful substances. Given a model and a set of input parameters, such as wind speed, direction, temperature, and humidity, results are reported in the form of a plume "footprint" of dispersion (see, e.g., Chakraborty and Armstrong, 2001). If the location of a pollution source is known, along with the model and its parameters, a result from the model can be used to reveal the locations of participants in the dataset, who can then be identified, along with the confidential information they provided for the dataset.
DATA QUALITY, ACCESS, AND CONFIDENTIALITY: TRADEOFFS
More precise and accurate data are generally more useful for analysis. For analysis of social and spatial relationships, accuracy and precision in the spatial data are often crucial. However, having such data increases the chances that research participants can be identified, thus breaking researchers' promises of confidentiality. In general, as data with detailed locational information about participants become more widely accessible, the risk of a confidentiality breach increases. The problem of tradeoffs involving data quality, access, and confidentiality is becoming more urgent because of two recent trends. One is increased demands from research funders, particularly federal agencies, for improving data access so as to increase the scientific benefit derived from a relatively fixed investment in data collection. The other is the continuing improvement in computer technologies generally, and especially techniques for mining datasets, techniques that can be used not only to provide more detailed understanding of social phenomena, but also to identify research participants despite researchers' promises of confidentiality. The current context and a consideration of the ethical, legal, and statistical issues are discussed in Chapter 2.
This report also addresses ways to solve the problem of increasing the value of linked social-spatial data, both to the original researchers and to potential secondary users, while at the same time keeping promises of confidentiality to research participants. Chapter 3 examines several methods available for dealing with the problem. They can be roughly classified as technical and institutional, and each has significant limitations.
Both technical and institutional approaches limit the amount of data available, the usefulness of the data for research, or the ways that researchers can access those data in return for increased protection of pledges of confidentiality. Most researchers believe that those restrictions have had a negative effect on the amount and value of research that has been done, but there is relatively little solid evidence about the quantity of research not performed for this cause. It is not surprising that such negative evidence does not exist, and its absence does not prevent us from recommending improvements. At the workshop organized by the panel we heard testimony from users of data enclaves about the ways that the arduous rules of those institutions limited research. In addition, there was interesting testimony submitted at the time of the preparation of the 2000 U.S. census that documented research that could not be conducted because of variables and values that the Census Bureau proposed to remove from the Public Use Microdata Samples in order to reduce the risk of identification (Minnesota Population Center, 2000). The lack of readily accessible data about anything smaller than quite large areas does limit research. Research is not being done on certain topics that require knowledge of locations because the data are not available or access is difficult.
Some of the technical approaches involve changing data in various ways to protect confidentiality. One is to mask locations by shifting them randomly. This approach helps protect against identification, but makes the data less useful for understanding the spatial phenomena that justified creating the linked dataset in the first place: the significance of location of places (such as home and work) for the social conditions of interest. Researchers and data stewards need to be sensitive to linkages of data that are masked in order to avoid conclusions based on an overestimation of the accuracy of data that have been changed in some way.
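As an illustration of masking by random shifting, here is a minimal Python sketch. It assumes coordinates are in a projected system with units of meters, and it displaces each point by a random distance up to a chosen maximum in a random direction. The function name, the `max_shift` parameter, and the sample coordinates are hypothetical; the report does not prescribe a particular masking algorithm.

```python
import math
import random


def mask_location(x, y, max_shift, rng=random):
    """Return (x, y) shifted by a random distance in [0, max_shift] in a random direction.

    Assumption: x and y are projected coordinates in meters.
    """
    angle = rng.uniform(0.0, 2.0 * math.pi)
    # The square root spreads masked points uniformly over the disc's area
    # instead of clustering them near the true location.
    distance = max_shift * math.sqrt(rng.random())
    return x + distance * math.cos(angle), y + distance * math.sin(angle)


if __name__ == "__main__":
    rng = random.Random(2007)  # fixed seed so the illustration is repeatable
    home = (1000.0, 2000.0)    # hypothetical true location
    masked = mask_location(home[0], home[1], max_shift=500.0, rng=rng)
    error = math.hypot(masked[0] - home[0], masked[1] - home[1])
    print(f"masked location: ({masked[0]:.1f}, {masked[1]:.1f}), displacement: {error:.1f} m")
```

The displacement that protects a participant is also measurement error for the analyst: any distance computed from masked coordinates can be off by up to `max_shift` per point, which is why analyses of masked data should not treat the coordinates as exact.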
Institutional approaches include restrictions in access to the data. The notion of tiers of access to data means that there is a gradient of accessibility: data that create the greatest risk of identification are least available and those with the lowest risk are the most available. At the same time, many analyses will only be possible with data that have the highest risk of disclosure and harm and therefore will be the least available.

The seriousness of these tradeoffs, in terms of the likelihood of identification or disclosure and of the potential for harm to research participants, depends on attributes of the research population, the information in the dataset, the contexts of inadvertent disclosure, and the motives of secondary users who may act as "data spies" (Armstrong et al., 1999) in relation to the dataset, as well as on the strategy used to protect confidentiality. Most of these factors apply regardless of whether the data include spatial information, but the availability of spatial characteristics of the research population can affect the seriousness of the tradeoffs. For example, a highly clustered sample of school-age students (with school as the primary sampling unit and with geographic identifiers) is more identifiable and more open to risk of harm than a nationally scattered sample of adults, especially if the data collected include information about social networks. Many nonspatial factors can also affect disclosure risk. For example, questions about individuals' attitudes (what do you think about "x") are less likely to increase disclosure risk than questions about easily known characteristics of family or occupation (age, number of children, occupation, distance to place of employment).
At the same time, some questions, if identification occurs, are more likely to be harmful than others, with a question about drug use more likely to cause harm than a question about retirement planning. Finally, the seriousness of the tradeoffs may depend on the identities and motives of secondary users. At present, little is known about such users, what they might want, the conditions under which they might seek what they want from a confidential dataset, the extent to which what they want would lead to identification of research participants and their attributes, or the techniques that they might use (see, e.g., Duncan and Lambert, 1986b; Armstrong et al., 1999).
It is possible for the linkage of social and spatial data to create significant risks of harm to research participants. For example, it has been claimed that the Nazis used maps and tabulations of "Jews and Mixed Breeds" to round up people for concentration camps (Cox, 1996) and that the U.S. government used special tabulations of 1940 census data to locate Japanese Americans for internment (Anderson and Fienberg, 1997). Improvements in the precision of spatial data and advances in geocoding are likely to lower the costs of identifying people for such purposes. We note, however, that risks of identification and harm by governments or other organizations with strong capabilities for tracking people and mining datasets exist even if social data are not being collected under promises of confidentiality. The key issue for this study concerns the incremental risks of linking confidential social data to precise spatial information about research participants.

5 Because social networks locate individuals within a social space, releasing social network data involves analogous risks to the risks related to spatial network data discussed in this report. For discussions of ethical issues in social network research, see Borgatti and Molina (2003), Breiger (2005), Kadushin (2005), and Klovdahl (2005).
Among secondary users who might seek information about particular individuals, those who know that another person is likely or certain to be included in a database (e.g., a parent knowing that a child was studied or one spouse knowing about another) have a much easier time identifying a respondent than someone who starts without that knowledge. Experts suspect that although those who know which participant they are looking for may be interested in harming that individual, they are unlikely to be interested in harming the entire class of participants or the research process itself. The benefit-risk tradeoffs created by social-spatial data are a major challenge for research policy.
Most U.S. research organizations, whether in universities, commercial firms, or government agencies, have internal safeguards to help guide data collectors and data users in ethical and legal research practices. Some also have guidelines for the organizations responsible for preserving and disseminating data, data tables, or other compilations.

Government data stewardship agencies use a suite of tools to construct public-use datasets (micro and aggregates) and are guided by federal standards (Doyle et al., 2002; Confidentiality and Data Access Committee, 2000, 2002). For example, current practices that guide the U.S. Census Bureau require that geographic regions must contain at least 100,000 persons for micro data about them to be released (National Center for Health Statistics and Centers for Disease Control and Prevention, 2003). Most federal agencies that produce data for public use maintain disclosure review boards that are charged with the task of ensuring that the data made available to the public have minimal risk of identification and disclosure. Federal guidelines for data collected under the Health Insurance Portability and Accountability Act of 1996 (HIPAA) are less stringent: they prohibit release of data for regions with fewer than 20,000 persons. Table 2-1 shows the approaches of various federal agencies that regularly collect social data to maintaining confidentiality, including cell size restrictions and various procedural methods.
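The two population thresholds quoted above can be expressed as a simple screening rule. The sketch below is only an illustration of how such a rule might be applied to a list of regions; the dictionary keys, function name, and region populations are hypothetical, and actual agency disclosure review involves far more than a single population test.

```python
# Minimum region populations quoted in the text.
# The rule labels used as dictionary keys are hypothetical.
MIN_POPULATION = {
    "census_microdata": 100_000,  # regions must contain at least 100,000 persons
    "hipaa": 20_000,              # no release for regions with fewer than 20,000 persons
}


def releasable_regions(region_populations, rule):
    """Return the sorted names of regions whose population meets the rule's minimum."""
    threshold = MIN_POPULATION[rule]
    return sorted(
        name for name, population in region_populations.items()
        if population >= threshold
    )


if __name__ == "__main__":
    # Hypothetical regions and populations.
    regions = {"A": 150_000, "B": 25_000, "C": 5_000}
    print(releasable_regions(regions, "census_microdata"))  # ['A']
    print(releasable_regions(regions, "hipaa"))             # ['A', 'B']
```

The example makes the "less stringent" comparison concrete: region B passes the 20,000-person test but not the 100,000-person test.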
Fewer guidelines exist for nongovernmental data stewardship organizations. Many large organizations have their own internal standards and procedures for ensuring that confidentiality is not breached. Those procedures are designed to ensure that staff members are well trained to avoid disclosure risk and that data in their possession are subject to appropriate handling at every stage in the research, preservation, and dissemination cycle. The Inter-university Consortium for Political and Social Research (ICPSR) requires staff to certify annually that they will preserve confidentiality. It also has a continual process of reviewing and enhancing the training that its staff receives. Moreover, ICPSR requires that all data it acquires be subject to a careful examination that measures and, if necessary, reduces disclosure risk. ICPSR also stipulates that data that cannot be distributed publicly over the Internet be made available using a restricted approach (see Chapter 3). Other nongovernmental data stewardship organizations, such as the Roper Center (University of Connecticut), the Odum Institute (University of North Carolina), the Center for International Earth Science Information Network (CIESIN, at Columbia University), and the Murray Research Archive (Harvard University), have their own training and disclosure analysis procedures, which over time have been very effective; there have been no publicly acknowledged breaches of confidentiality involving the data handled by these organizations, and in private discussions with archive managers, we have learned of none that led to any known harm to research participants or legal action against data stewards.
TABLE 2-1 Agency-Specific Features of Data Use Agreements and Licenses
Mechanisms for Data Approval (a): IRB Approval Required; Institutional Concurrence; Security Pledges All Users; Report Disclosures
Agencies (names truncated and cell entries not recoverable in the source): National Center for ...; National Institute of Child Health and Human ...; National Institute on Alcohol Abuse and Alcoholism
(a) The agreement mechanisms for data use range from those believed to be most stringent (IRB approval) on the left to the least stringent (notification of reports) on the right. In practice, policies for human subjects protection often comprise several mechanisms or facets of them. IRB approval and "institutional concurrence" are similar, though the latter often encompasses financial and legal requirements of grants not generally covered by IRBs.

Universities and other organizations that handle social data have guidelines and procedures for collecting and using data that are intended to protect confidentiality. Institutional review boards (IRBs) are central in specifying these rules. They can be effective partners with data stewardship organizations in creating approaches that reduce the likelihood of confidentiality breaches. The main activities of IRBs in the consideration of research occur before the research is conducted, to ensure that it follows ethical and legal standards. Although IRBs are mandated to do periodic continuing review of ongoing research, they generally get involved in any major way only reactively, when transgressions occur and are reported. Few IRBs are actively involved in questions about data sharing over the life of a research project, and fewer still have expertise in the new areas of linked social-spatial data discussed in this report.

Although not all research is explicitly subject to the regulations that require IRB review, most academic institutions now require IRB review for all human subjects research undertaken by their students, faculty, and staff. In the few cases for which IRB review is not required for research that links location to other human characteristics and survey responses, researchers undertaking such studies are still subject to standard codes of research ethics. In addition, many institutions require that their researchers, regardless of their funding sources, undergo general human subjects protection training when such issues are pertinent to their work or their supervisory roles. IRBs are also taking a more public role; for example, making resources available for investigators and study subjects.1 Educating IRBs and
1 For example, see the website for Columbia University's IRB: http://www.columbia.edu/cu/irb/ [accessed April 2006].