
Handbook of Data Visualization


Document information

Pages: 954
File size: 33.65 MB

Contents



School of Business and Economics
Humboldt-Universität zu Berlin
Spandauer Straße
Berlin
Germany
haerdle@wiwi.hu-berlin.de

Professor Antony Unwin

Library of Congress Control Number: 

©  Springer-Verlag Berlin Heidelberg

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September , , in its current version, and permission for use must always be obtained from Springer. Violations are liable for prosecution under the German Copyright Law.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typesetting and Production: LE-TEX Jelonek, Schmidt & Vöckler GbR, Leipzig, Germany

Cover: deblik, Berlin, Germany

Printed on acid-free paper

        

springer.com

III Methodologies

III.1 Interactive Linked Micromap Plots for the Display of Geographically Referenced Statistical Data
Jürgen Symanzik, Daniel B. Carr

III.2 Grand Tours, Projection Pursuit Guided Tours, and Manual Controls
Dianne Cook, Andreas Buja, Eun-Kyung Lee, Hadley Wickham

III.3 Multidimensional Scaling
Michael A.A. Cox, Trevor F. Cox

III.4 Huge Multidimensional Data Visualization: Back to the Virtue of Principal Coordinates and Dendrograms in the New Computer Age
Francesco Palumbo, Domenico Vistocco, Alain Morineau

III.5 Multivariate Visualization by Density Estimation
Michael C. Minnotte, Stephan R. Sain, David W. Scott

III.6 Structured Sets of Graphs
Richard M. Heiberger, Burt Holland

III.7 Regression by Parts: Fitting Visually Interpretable Models with GUIDE
Wei-Yin Loh

III.8 Structural Adaptive Smoothing by Propagation–Separation Methods
Jörg Polzehl, Vladimir Spokoiny

III.9 Smoothing Techniques for Visualisation
Adrian W. Bowman

III.10 Data Visualization via Kernel Machines
Yuan-chin Ivan Chang, Yuh-Jye Lee, Hsing-Kuo Pao, Mei-Hsien Lee, Su-Yun Huang

III.11 Visualizing Cluster Analysis and Finite Mixture Models
Friedrich Leisch

III.12 Visualizing Contingency Tables
David Meyer, Achim Zeileis, Kurt Hornik

III.13 Mosaic Plots and Their Variants
Heike Hofmann

III.14 Parallel Coordinates: Visualization, Exploration and Classification of High-Dimensional Data
Alfred Inselberg

III.15 Matrix Visualization
Han-Ming Wu, ShengLi Tzeng, Chun-Houh Chen

III.16 Visualization in Bayesian Data Analysis
Jouni Kerman, Andrew Gelman, Tian Zheng, Yuejing Ding

III.17 Programming Statistical Data Visualization in the Java Language
Junji Nakano, Yoshikazu Yamamoto, Keisuke Honda

III.18 Web-Based Statistical Graphics using XML Technologies
Yoshiro Yamamoto, Masaya Iizuka, Tomokazu Fujino

IV Selected Applications

IV.1 Visualization for Genetic Network Reconstruction
Grace S. Shieh, Chin-Yuan Guo

IV.2 Reconstruction, Visualization and Analysis of Medical Images
Henry Horng-Shing Lu

IV.3 Exploratory Graphics of a Financial Dataset
Antony Unwin, Martin Theus, Wolfgang K. Härdle

IV.4 Graphical Data Representation in Bankruptcy Analysis
Wolfgang K. Härdle, Rouslan A. Moro, Dorothea Schäfer

IV.5 Visualizing Functional Data with an Application to eBay’s Online Auctions
Wolfgang Jank, Galit Shmueli, Catherine Plaisant, Ben Shneiderman

IV.6 Visualization Tools for Insurance Risk Processes
Krzysztof Burnecki, Rafał Weron

George Mason University
Center for Computational Statistics

cchen@stat.sinica.edu.tw

Dianne Cook
Iowa State University
Department of Statistics
USA

trevor.cox@unilever.com

Yuejing Ding
Columbia University
Department of Statistics
USA
yding@stat.columbia.edu

Fukuoka Women’s University
Department of Environmental Science

bholland@temple.edu

Keisuke Honda
Graduate University for Advanced Studies
Japan
khonda@ism.ac.jp

Kurt Hornik
Wirtschaftsuniversität Wien
Department of Statistics and Mathematics
Austria
Kurt.Hornik@wu-wien.ac.at

Su-Yun Huang
Academia Sinica
Institute of Statistical Science
Taiwan
syhuang@stat.sinica.edu.tw

Masaya Iizuka
Okayama University
Graduate School of Natural Science and Technology
Japan
iizuka@ems.okayama-u.ac.jp

Alfred Inselberg
Tel Aviv University
School of Mathematical Sciences
Israel
aiisreal@post.tau.ac.il

Wolfgang Jank
University of Maryland
Department of Decision and Information Technologies
USA
wjank@rhsmith.umd.edu

National Taiwan University of Science and Technology
Department of Computer Science and Information Engineering

Austria
David.Meyer@wu-wien.ac.at

George Michailidis
University of Michigan
Department of Statistics
USA
gmichail@umich.edu

Michael C. Minnotte
Utah State University
Department of Mathematics and Statistics
USA
mike.minnotte@usu.edu

Alain Morineau
La Revue MODULAD
France
alain.morineau@modulad.fr

Rouslan A. Moro
Humboldt-Universität zu Berlin
Institut für Statistik und Ökonometrie
Germany
rmoro@diw.de

Paul Murrell
University of Auckland
Department of Statistics
New Zealand

List of Contributors

National Taiwan University of Science and Technology
Department of Computer Science and Information Engineering

Wirtschaftsforschung (DIW) Berlin
German Institute for Economic Research

gshieh@stat.sinica.edu.tw

Galit Shmueli
University of Maryland
Department of Decision and Information Technologies
USA
gshmueli@rhsmith.umd.edu

Ben Shneiderman
University of Maryland
Department of Computer Science
USA
ben@cs.umd.edu

Vladimir Spokoiny
Weierstrass Institute for Applied Analysis and Stochastics
Germany
spokoiny@wias-berlin.de

Jürgen Symanzik
Utah State University
Department of Mathematics and Statistics
USA
symanzik@math.usu.edu

Martin Theus
University of Augsburg
Department of Computational Statistics and Data Analysis
Germany
martin.theus@math.uni-augsburg.de

ShengLi Tzeng
Academia Sinica
Institute of Statistical Science
Taiwan
hh@stat.sinica.edu.tw

Worcester Polytechnic Institute
Computer Science Department

gwills@spss.com

Han-Ming Wu
Academia Sinica
Institute of Statistical Science
Taiwan
hmwu@stat.sinica.edu.tw

Yoshikazu Yamamoto
Tokushima Bunri University
Department of Engineering
Japan
yamamoto@es.bunri-u.ac.jp

Yoshiro Yamamoto
Tokai University
Department of Mathematics
Japan
yamamoto@sm.u-tokai.ac.jp

Achim Zeileis
Wirtschaftsuniversität Wien
Department of Statistics and Mathematics
Austria
Achim.Zeileis@wu-wien.ac.at

Tian Zheng
Columbia University
Department of Statistics
USA
tzheng@stat.columbia.edu


Part I

Data Visualization

Introduction

Antony Unwin, Chun-houh Chen, Wolfgang K. Härdle

1.1 Computational Statistics and Data Visualization

Data Visualization and Theory

Presentation and Exploratory Graphics

Graphics and Computing

1.2 The Chapters

Summary and Overview; Part II

Summary and Overview; Part III

Summary and Overview; Part IV

The Authors

1.3 Outlook

a matter of common sense (in which case their common sense cannot be in good shape), while others believe that preparing graphics is a low-level task, not appropriate for scientific attention. This volume of the Handbook of Computational Statistics takes graphics for data visualization seriously.

1.1.1 Data Visualization and Theory

Graphics provide an excellent approach for exploring data and are essential for presenting results. Although graphics have been used extensively in statistics for a long time, there is not a substantive body of theory about the topic. Quite a lot of attention has been paid to graphics for presentation, particularly since the superb books of Edward Tufte. However, this knowledge is expressed in principles to be followed and not in formal theories. Bertin’s work from the s is often cited but has not been developed further. This is a curious state of affairs. Graphics are used a great deal in many different fields, and one might expect more progress to have been made along theoretical lines.

Sometimes in science the theoretical literature for a subject is considerable while there is little applied literature to be found. The literature on data visualization is very much the opposite. Examples abound in almost every issue of every scientific journal concerned with quantitative analysis. There are occasionally articles published in a more theoretical vein about specific graphical forms, but little else. Although there is a respected statistics journal called the Journal of Computational and Graphical Statistics, most of the papers submitted there are in computational statistics. Perhaps this is because it is easier to publish a study of a technical computational problem than it is to publish work on improving a graphic display.

1.1.2 Presentation and Exploratory Graphics

The differences between graphics for presentation and graphics for exploration lie in both form and practice. Presentation graphics are generally static, and a single graphic is drawn to summarize the information to be presented. These displays should be of high quality and include complete definitions and explanations of the variables shown and of the form of the graphic. Presentation graphics are like proofs of mathematical theorems; they may give no hint as to how a result was reached, but they should offer convincing support for its conclusion. Exploratory graphics, on the other hand, are used for looking for results. Very many of them may be used, and they should be fast and informative rather than slow and precise. They are not intended for presentation, so that detailed legends and captions are unnecessary. One presentation graphic will be drawn for viewing by potentially thousands of readers while thousands of exploratory graphics may be drawn to support the data investigations of one analyst.

Figure .. A barchart of the number of authors per paper, a histogram of the number of pages per paper, and parallel boxplots of length by number of authors. Papers with more than three authors have been selected.

Books on visualization should make use of graphics. Figure . shows some simple summaries of data about the chapters in this volume, revealing that over half the chapters had more than one author and that more authors does not always mean longer papers.

Developments in computing power have been of great benefit to graphics in recent years. It has become possible to draw precise, complex displays with great ease and to print them with impressive quality at high resolution. That was not always the case, and initially computers were more a disadvantage for graphics. Computing screens and printers could at best produce clumsy line-driven displays of low resolution without colour. These offered no competition to careful, hand-drawn displays. Furthermore, even early computers made many calculations much easier than before and allowed fitting of more complicated models. This directed attention away from graphics, and it is only in the last  years that graphics have come into their own again.


These comments relate to presentation graphics, that is, graphics drawn for the purpose of illustrating and explaining results. Computing advances have benefitted exploratory graphics, that is, graphics drawn to support exploring data, far more. Not just the quality of graphic representation has improved but also the quantity. It is now trivial to draw many different displays of the same data or to riffle through many different versions interactively to look for information in data. These capabilities are only gradually becoming appreciated and capitalized on.

The importance of software availability and popularity in determining what analyses are carried out and how they are presented will be an interesting research topic for future historians of science. In the business world, no one seems to be able to do without the spreadsheet Excel. If Excel does not offer a particular graphic form, then that form will not be used. (In fact Excel offers many graphic forms, though not all that a statistician would want.) Many scientists, who only rarely need access to computational power, also rely on Excel and its options. In the world of statistics itself, the packages SAS and SPSS were long dominant. In the last  years, first S and S-plus and now R have emerged as important competitors. None of these packages currently provide effective interactive tools for exploratory graphics, though they are all moving slowly in that direction as well as extending the range and flexibility of the presentation graphics they offer.

Data visualization is a new term. It expresses the idea that it involves more than just representing data in a graphical form (instead of using a table). The information behind the data should also be revealed in a good display; the graphic should aid readers or viewers in seeing the structure in the data. The term data visualization is related to the new field of information visualization. This includes visualization of all kinds of information, not just of data, and is closely associated with research by computer scientists. Up till now the work in this area has tended to concentrate just on presenting information, rather than on what may be deduced from it. Statisticians tend to be concerned more with variability and to emphasize the statistical properties of results. The closer linking of graphics with statistical modelling can make this more explicit and is a promising research direction that is facilitated by the flexible nature of current computing software. Statisticians have an important role to play here.

1.2 The Chapters

Needless to say, each Handbook chapter uses a lot of graphic displays. Figure . is a scatterplot of the number of figures against the number of pages. There is an approximate linear relationship with a couple of papers having somewhat more figures per page and one somewhat less. The scales have been chosen to maximize the data-ink ratio. An alternative version with equal scales makes clearer that the number of figures per page is almost always less than one.

Figure .. A scatterplot of the number of figures against the number of pages for the Handbook’s chapters.

The Handbook has been divided into three sections: Principles, Methodology, and Applications. Needless to say, the sections overlap. Figure . is a binary matrix visualization using Jaccard coefficients for both chapters (rows) and index entries (columns) to explore links between chapters. In the raw data map (lower-left portion of Fig .) there is a banding of black dots from the lower-left to upper-right corners indicating a possible transition of chapter/index combinations. In the proximity map of indices (upper portion of Fig .), index groups A, B, C, D, and E are overlapped with each other and are dominated by chapters of Good Graphics, History, Functional Data, Matrix Visualization, and Regression by Parts respectively.
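For readers unfamiliar with the Jaccard coefficient mentioned above, the following is a minimal sketch of how pairwise coefficients could be computed for the rows and columns of a binary chapter-by-index matrix. The small `incidence` matrix and the function names `jaccard` and `jaccard_matrix` are hypothetical and purely illustrative; they are not the Handbook's data or the authors' code, and the convention that two empty sets have similarity 1 is an assumption.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


def jaccard_matrix(incidence):
    """Pairwise Jaccard coefficients between the rows of a 0/1 matrix."""
    # Represent each row by the set of column positions holding a 1.
    sets = [{j for j, v in enumerate(row) if v} for row in incidence]
    return [[jaccard(s, t) for t in sets] for s in sets]


# Hypothetical incidence matrix: 3 chapters (rows) x 4 index entries (columns).
incidence = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
]

# Similarities between chapters (rows).
chapter_sim = jaccard_matrix(incidence)

# Similarities between index entries (columns), via the transpose.
index_sim = jaccard_matrix([list(col) for col in zip(*incidence)])
```

Proximity matrices of this kind could then be used to reorder rows and columns before drawing the binary map, although the ordering method used for the figure is not described in this passage.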

The ten chapters in Part II are concerned with principles of data visualization. First there is an historical overview by Michael Friendly, the custodian of the Internet Gallery of Data Visualization, outlining the developments in graphical displays over the last few hundred years and including many fine examples.

In the next chapter Antony Unwin discusses some of the guidelines for the preparation of sound and attractive data graphics. The question mark in the chapter title sums it up well: whatever principles or recommendations are followed, the success of a graphic is a matter of taste; there are no fixed rules.

The importance of software for producing graphics is incontrovertible. Paul Murrell in his chapter summarizes the requirements for producing accurate and exact static graphics. He emphasizes both the need for flexibility in customizing standard plots and the need for tools that permit the drawing of new plot types.

Structure in data may be represented by mathematical graphs. George Michailidis pursues this idea in his chapter and shows how this leads to another class of graphic displays associated with multivariate analysis methods.

Figure .. Matrix visualizations of the Handbook with chapters in the rows and index entries in the columns.

Lee Wilkinson approaches graph-theoretic visualizations from another point of view, and his displays are concerned predominantly, though by no means exclusively, with trees, directed graphs and geometric graphs. He also covers the layout of graphs, a tricky problem for large numbers of vertices, and raises the intriguing issue of graph matching.

Most data displays concentrate on one or two dimensions. This is frequently sufficient to reveal striking information about a dataset. To gain insight into multivariate structure, higher-dimensional representations are required. Martin Theus discusses the main statistical graphics of this kind that do not involve dimension reduction and compares their possible range of application.

Everyone knows about Chernoff faces, though not many ever use them. The potential of data glyphs for representing cases in informative and productive ways has not been fully realized. Matt Ward gives an overview of the wide variety of possible forms and of the different ways they can be utilized.

There are two chapters on linking. Adalbert Wilhelm describes a formal model for linked graphics and the conceptual structure underlying it. He is able to encompass different types of linking and different representations. Graham Wills looks at linking in a more applied context and stresses the importance of distinguishing between views of individual cases and aggregated views. He also highlights the variety of selection possibilities there are in interactive graphics. Both chapters point out the value of linking simple data views over linking complicated ones.

The final chapter in this section is by Simon Urbanek. He describes the graphics that have been introduced to support tree models in statistics. The close association between graphics and the models (and collections of models in forests) is particularly interesting and has relevance for building closer links between graphics and models in other fields.

The middle and largest section of the Handbook concentrates on individual areas of graphics research.

Geographical data can obviously benefit from visualization. Much of Bertin’s work was directed at this kind of data. Juergen Symanzik and Daniel Carr write about micromaps (multiple small images of the same area displaying different parts of the data) and their interactive extension.

Projection pursuit and the grand tour are well known but not easy to use. Despite the availability of attractive free software, it is still a difficult task to analyse datasets in depth with this approach. Dianne Cook, Andreas Buja, Eun-Kyung Lee and Hadley Wickham describe the issues involved and outline some of the progress that has been made.

Multidimensional scaling has been around for a long time. Michael Cox and Trevor Cox (no relation, but an MDS would doubtless place them close together) review the current state of research.

Advances in high-throughput techniques in industrial projects, academic studies and biomedical experiments and the increasing power of computers for data collection have inevitably changed the practice of modern data analysis. Real-life datasets become larger and larger in both sample size and numbers of variables. Francesco Palumbo, Alain Morineau and Domenico Vistocco illustrate principles of visualization for such situations.

Some areas of statistics benefit more directly from visualization than others. Density estimation is hard to imagine without visualization. Michael Minnotte, Steve Sain and David Scott examine estimation methods in up to three dimensions. Interestingly there has not been much progress with density estimation in even three dimensions.

Sets of graphs can be particularly useful for revealing the structure in datasets and complement modelling efforts. Richard Heiberger and Burt Holland describe an approach primarily making use of Cartesian products and the Trellis paradigm. Wei-Yin Loh describes the use of visualization to support the use of regression models, in particular with the use of regression trees.


Instead of visualizing the structure of samples or variables in a given dataset, researchers may be interested in visualizing images collected with certain formats. Usually the target images are collected with various types of noise pattern and it is necessary to apply statistical or mathematical modelling to remove or diminish the noise structure before the possible genuine images can be visualized. Jörg Polzehl and Vladimir Spokoiny present one such novel adaptive smoothing procedure in reconstructing noisy images for better visualization.

The continuing increase in computer power has had many different impacts on statistics. Computationally intensive smoothing methods are now commonplace, although they were impossible only a few years ago. Adrian Bowman gives an overview of the relations between smoothing and visualization. Yuan-chin Chang, Yuh-Jye Lee, Hsing-Kuo Pao, Mei-Hsien Lee and Su-Yun Huang investigate the impact of kernel machine methods on a number of classical techniques: principal components, canonical correlation and cluster analysis. They use visualizations to compare their results with those from the original methods.

Cluster analyses have often been a bit suspect to statisticians. The lack of formal models in the past and the difficulty of judging the success of the clusterings were both negative factors. Fritz Leisch considers the graphical evaluation of clusterings and some of the possibilities for a sounder methodological approach.

Multivariate categorical data were difficult to visualize in the past. The chapter by David Meyer, Achim Zeileis and Kurt Hornik describes fairly classical approaches for low dimensions and emphasizes the link to model building. Heike Hofmann describes the powerful tools of interactive mosaicplots that have become available in recent years, not least through her own efforts, and discusses how different variations of the plot form can be used for gaining insight into multivariate data features. Alfred Inselberg, the original proposer of parallel coordinate plots, offers an overview of this approach to multivariate data in his usual distinctive style. Here he considers in particular classification problems and how parallel coordinate views can be adapted and amended to support this kind of analysis.

Most analyses using graphics make use of a standard set of graphical tools, for example, scatterplots, barcharts, and histograms. Han-Ming Wu, ShengLi Tzeng and Chun-houh Chen describe a different approach, built around using colour approximations for individual values in a data matrix and applying cluster analyses to order the matrix rows and columns in informative ways.

For many years Bayesians were primarily theoreticians. Thanks to MCMC methods they are now able to also apply their ideas to great effect. This has led to new demands in assessing model fit and the quality of the results. Jouni Kerman, Andrew Gelman, Tian Zheng and Yuejing Ding discuss graphical approaches for tackling these issues in a Bayesian framework.

Without software to draw the displays, graphic analysis is almost impossible nowadays. Junji Nakano, Yoshikazu Yamamoto and Keisuke Honda are working on Java-based software to provide support for new developments, and they outline their approach here. Many researchers are interested in providing tools via the Web. Yoshiro Yamamoto, Masaya Iizuka and Tomokazu Fujino discuss using XML for interactive statistical graphics and explain the issues involved.

1.2.3 Summary and Overview; Part IV

The final section contains seven chapters on specific applications of data visualization. There are, of course, individual applications discussed in earlier chapters, but here the emphasis is on the application rather than principles or methodology.

Genetic networks are obviously a promising area for informative graphic displays. Grace Shieh and Chin-Yuan Guo describe some of the progress made so far and make clear the potential for further research.

Modern medical imaging systems have made significant contributions to diagnoses and treatments. Henry Lu discusses the visualization of data from positron emission tomography, ultrasound and magnetic resonance.

Two chapters examine company bankruptcy datasets. In the first one, Antony Unwin, Martin Theus and Wolfgang Härdle use a broad range of visualization tools to carry out an extensive exploratory data analysis. No large dataset can be analysed cold, and this chapter shows how effective data visualization can be in assessing data quality and revealing features of a dataset. The other bankruptcy chapter employs graphics to visualize SVM modelling. Wolfgang Härdle, Rouslan Moro and Dorothea Schäfer use graphics to display results that cannot be presented in a closed analytic form.

The astonishing growth of eBay has been one of the big success stories of recent years. Wolfgang Jank, Galit Shmueli, Catherine Plaisant and Ben Shneiderman have studied data from eBay auctions and describe the role graphics played in their analyses.

Krzysztof Burnecki and Rafal Weron consider the application of visualization in insurance. This is another example of how the value of graphics lies in providing insight into the output of complex models.

The editors would like to thank the authors of the chapters for their contributions. It is important for a collective work of this kind to cover a broad range and to gather many experts with different interests together. We have been fortunate in receiving the assistance of so many excellent contributors.

The mixture at the end remains, of course, a mixture. Different authors take different approaches and have different styles. It early became apparent that even the term data visualization means different things to different people! We hope that the Handbook gains rather than loses by this eclecticism.

Figures . and . earlier in the chapter showed that the chapter form varied between authors in various ways. Figure . reveals another aspect. The scatterplot shows an outlier with a very large number of references (the historical survey of Michael Friendly) and that some papers referenced the work of their own authors more than others. The histogram is for the rate of self-referencing.

Figure .. A scatterplot of the number of references to papers by a chapter’s authors against the total number of references and a histogram of the rate of self-referencing.

1.3 Outlook

There are many open issues in data visualization and many challenging research problems. The datasets to be analysed tend to be more complex and are certainly becoming larger all the time. The potential of graphical tools for exploratory data analysis has not been fully realized, and the complementary interplay between statistical modelling and graphics has not yet been fully exploited. Advances in computer software and hardware have made producing graphics easier, but they have also contributed to raising the standards expected.

Future developments will undoubtedly include more flexible and powerful software and better integration of modelling and graphics. There will probably be individual new and innovative graphics and some improvements in the general design of displays. Gradual gains in knowledge about the perception of graphics and the psychological aspects of visualization will lead to improved effectiveness of graphic displays. Ideally there should be progress in the formal theory of data visualization, but that is perhaps the biggest challenge of all.


Part II

Principles

Pre-17th Century: Early Maps and Diagrams

1600–1699: Measurement and Theory

1700–1799: New Graphic Forms

1800–1850: Beginnings of Modern Graphics

1850–1900: The Golden Age of Statistical Graphics

1900–1950: The Modern Dark Ages

1950–1975: Rebirth of Data Visualization

1975–present: High-D, Interactive and Dynamic Data Visualization

1.3 Statistical Historiography

History as ‘Data’

Analysing Milestones Data

What Was He Thinking? – Understanding Through Reproduction

1.4 Final Thoughts

Michael Friendly

It is common to think of statistical graphics and data visualization as relatively modern developments in statistics. In fact, the graphic representation of quantitative information has deep roots. These roots reach into the histories of the earliest map making and visual depiction, and later into thematic cartography, statistics and statistical graphics, medicine and other fields. Along the way, developments in technologies (printing, reproduction), mathematical theory and practice, and empirical observation and recording enabled the wider use of graphics and new advances in form and content.

This chapter provides an overview of the intellectual history of data visualization from medieval to modern times, describing and illustrating some significant advances along the way. It is based on a project, called the Milestones Project, to collect, catalogue and document in one place the important developments in a wide range of areas and fields that led to modern data visualization. This effort has suggested some questions concerning the use of present-day methods of analysing and understanding this history, which I discuss under the rubric of ‘statistical historiography.’

1.1 Introduction

The only new thing in the world is the history you don’t know. – Harry S. Truman

It is common to think of statistical graphics and data visualization as relatively ern developments in statistics In fact, the graphic portrayal of quantitative informa-tion has deep roots hese roots reach into the histories of the earliest map-makingand visual depiction, and later into thematic cartography, statistics and statisticalgraphics, with applications and innovations in many fields of medicine and sciencewhich are oten intertwined with each other hey also connect with the rise of statis-tical thinking and widespread data collection for planning and commerce up throughthe th century Along the way, a variety of advancements contributed to the wide-spread use of data visualization today hese include technologies for drawing andreproducing images, advances in mathematics and statistics, and new developments

mod-in data collection, empirical observation and recordmod-ing

From above ground, we can see the current fruit and anticipate future growth; wemust look below to understand their germination Yet the great variety of roots andnutrients across these domains, which gave rise to the many branches we see today,are oten not well known and have never been assembled in a single garden to bestudied or admired

This chapter provides an overview of the intellectual history of data visualization from medieval to modern times, describing and illustrating some significant advances along the way. It is based on what I call the Milestones Project, an attempt to provide a broadly comprehensive and representative catalogue of important developments in all fields related to the history of data visualization.

There are many historical accounts of developments within the fields of probability (Hald, ), statistics (Pearson, ; Porter, ; Stigler, ), astronomy (Riddell, ) and cartography (Wallis and Robinson, ), which relate to, inter alia, some of the important developments contributing to modern data visualization. There are other, more specialized, accounts which focus on the early history of graphic recording (Hoff and Geddes, , ), statistical graphs (Funkhouser, , ; Royston, ; Tilling, ), fitting equations to empirical data (Farebrother, ), economics and time-series graphs (Klein, ), cartography (Friis, ; Kruskal, ) and thematic mapping (Robinson, ; Palsky, ) and so forth; Robinson (Robinson, , Chap ) presents an excellent overview of some of the important scientific, intellectual and technical developments of the th–th centuries leading to thematic cartography and statistical thinking. Wainer and Velleman () provide a recent account of some of the history of statistical graphics.

But there are no accounts which span the entire development of visual thinking and the visual representation of data and which collate the contributions of disparate disciplines. Inasmuch as their histories are intertwined, so too should be any telling of the development of data visualization. Another reason for interweaving these accounts is that practitioners in these fields today tend to be highly specialized and unaware of related developments in areas outside their domain, much less of their history.

In organizing this history, it proved useful to divide history into epochs, each of which turned out to be describable by coherent themes and labels. This division is, of course, somewhat artificial, but it provides the opportunity to characterize the accomplishments in each period in a general way before describing some of them in more detail. Figure ., discussed in Sect .., provides a graphic overview of the epochs I describe in the subsections below, showing the frequency of events considered milestones in the periods of this history. For now, it suffices to note the labels attached to these epochs, a steady rise from the early th century to the late th century, with a curious wiggle thereafter.

In the larger picture – recounting the history of data visualization – it turns out that many of the milestone items have a story to be told: What motivated this development? What was the communication goal? How does it relate to other developments – What were the precursors? How has this idea been used or re-invented today? Each section below tries to illustrate the general themes with a few exemplars. In particular, this account attempts to tell a few representative stories of these periods, rather than to try to be comprehensive.

For reasons of economy, only a limited number of images could be printed here, and these only in black and white. Others are referred to by Web links, mostly from the Milestones Project, http://www.math.yorku.ca/SCS/Gallery/milestone/, where a colour version of this chapter will also be found.

18 Michael Friendly

Figure .. Time distribution of events considered milestones in the history of data visualization, shown by a rug plot and density estimate
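For readers unfamiliar with the two devices named in the figure caption, the following is a minimal sketch of a rug plot combined with a Gaussian kernel density estimate. It is not the code behind the chapter's figure: the event years, the bandwidth of 25 years and the text-mode rendering are all assumptions chosen purely for illustration, and the helper name `gaussian_kde` is hypothetical.

```python
import math


def gaussian_kde(points, bandwidth):
    """Return a function that estimates the density of `points` at x,
    using a Gaussian kernel with the given bandwidth."""
    n = len(points)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points
        )

    return density


# Hypothetical event years, invented for illustration only.
event_years = [1603, 1644, 1686, 1701, 1765, 1786, 1801, 1826, 1833, 1855]

# The bandwidth (in years) is an arbitrary choice for this sketch.
density = gaussian_kde(event_years, bandwidth=25.0)

# Text rendering: one row per 25-year bin. The '#' bar shows the smoothed
# density at the bin centre; the '|' ticks are the rug (one per event).
for start in range(1600, 1900, 25):
    rug = "|" * sum(start <= y < start + 25 for y in event_years)
    bar = "#" * round(density(start + 12.5) * 2000)
    print(f"{start}  {bar:<20} {rug}")
```

The rug shows every individual event, while the density curve smooths them into an overall trend; a larger bandwidth gives a smoother curve at the cost of hiding short-lived bursts of activity.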

Pre-17th Century: Early Maps and Diagrams

1.2.1

The earliest seeds of visualization arose in geometric diagrams, in tables of the positions of stars and other celestial bodies, and in the making of maps to aid in navigation and exploration. The idea of coordinates was used by ancient Egyptian surveyors in laying out towns, earthly and heavenly positions were located by something akin to latitude and longitude by at least  B.C., and the map projection of a spherical earth into latitude and longitude by Claudius Ptolemy [c –c ] in Alexandria would serve as reference standards until the th century.

Among the earliest graphical depictions of quantitative information is an anonymous th-century multiple time-series graph of the changing position of the seven most prominent heavenly bodies over space and time (Fig .), described by Funkhouser () and reproduced in Tufte (, p ). The vertical axis represents the inclination of the planetary orbits; the horizontal axis shows time, divided into intervals. The sinusoidal variation with different periods is notable, as is the use of a grid, suggesting both an implicit notion of a coordinate system and something akin to graph paper, ideas that would not be fully developed until the –s.

In the th century, the idea of plotting a theoretical function (as a proto bar graph)and the logical relation between tabulating values and plotting them appeared in

Trang 31

Figure ..Planetary movements shown as cyclic inclinations over time, by an unknown astronomer,

appearing in a th-century appendix to commentaries by A.T Macrobius on Cicero’s In Somnium

Sciponis Source: Funkhouser (, p )

a work by Nicole Oresme [–] Bishop of Liseus(Oresme, , ),

fol-lowed somewhat later by the idea of a theoretical graph of distance vs speed by

Nico-las of Cusa

By the th century, techniques and instruments for precise observation and

mea-surement of physical quantities and geographic and celestial position were well

de-veloped (for example, a ‘wall quadrant’ constructed by Tycho Brahe [–],

cov-ering an entire wall in his observatory) Particularly important were the development

of triangulation and other methods to determine mapping locations accurately

(Fri-sius, ; Tartaglia, ) As well, we see initial ideas for capturing images directly

(the camera obscura, used by Reginer Gemma-Frisius in  to record an eclipse

of the sun), the recording of mathematical functions in tables (trigonometric tables

by Georg Rheticus, ) and the first modern cartographic atlas (heatrum Orbis

Terrarum by Abraham Ortelius, ) hese early steps comprise the beginnings of

data visualization

Funkhouser (, p ) was sufficiently impressed with Oresme’s grasp of the relation between functions and graphs that he remarked, ‘If a pioneering contemporary had collected some data and presented Oresme with actual figures to work upon, we might have had statistical graphs four hundred years before Playfair.’

Amongst the most important problems of the th century were those concerned with physical measurement – of time, distance and space – for astronomy, surveying, map making, navigation and territorial expansion. This century also saw great new growth in theory and the dawn of practical application – the rise of analytic geometry and coordinate systems (Descartes and Fermat), theories of errors of measurement and estimation (initial steps by Galileo in the analysis of observations on Tycho Brahe’s star of  (Hald, , §.)), the birth of probability theory (Pascal and Fermat) and the beginnings of demographic statistics (John Graunt) and ‘political arithmetic’ (William Petty) – the study of population, land, taxes, value of goods, etc. for the purpose of understanding the wealth of the state.

Early in this century, Christopher Scheiner (–, recordings from ) introduced an idea Tufte () would later call the principle of ‘small multiples’ to show the changing configurations of sunspots over time, shown in Fig . The multiple images depict the recordings of sunspots from  October  until  December of that year. The large key in the upper left identifies seven groups of sunspots by the letters A–G. These groups are similarly identified in the  smaller images, arrayed left to right and top to bottom below.

Another noteworthy example (Fig .) shows a  graphic by Michael Florent van Langren [–], a Flemish astronomer to the court of Spain, believed to be the first visual representation of statistical data (Tufte, , p ). At that time, lack of a reliable means to determine longitude at sea hindered navigation and exploration. This -D line graph shows all  known estimates of the difference in longitude between Toledo and Rome and the name of the astronomer (Mercator, Tycho Brahe, Ptolemy, etc.) who provided each observation.

Figure .. Scheiner’s  representation of the changes in sunspots over time. Source: Scheiner (–)

Figure .. Langren’s  graph of determinations of the distance, in longitude, from Toledo to Rome. The correct distance is    ′. Source: Tufte (, p )

What is notable is that van Langren could have presented this information in various tables – ordered by author to show provenance, by date to show priority, or by distance. However, only a graph shows the wide variation in the estimates; note that the range of values covers nearly half the length of the scale. Van Langren took as his overall summary the centre of the range, where there happened to be a large enough gap for him to inscribe ‘ROMA.’ Unfortunately, all of the estimates were biased upwards; the true distance (′) is shown by the arrow. Van Langren’s graph is also a milestone as the earliest known exemplar of the principle of ‘effect ordering for data display’ (Friendly and Kwan, ).

In the s, the systematic collection and study of social data began in variousEuropean countries, under the rubric of ‘political arithmetic’ (John Graunt,  andWilliam Petty, ), with the goals of informing the state about matters related towealth, population, agricultural land, taxes and so forth,as well as for commercialpurposes such as insurance and annuities based on life tables (Jan de Witt, ) Atapproximately the same time, the initial statements of probability theory around (see Ball, ) together with the idea of coordinate systems were applied by Chris-tiaan Huygens in  to give the first graph of a continuous distribution function(from Graunt’s based on the bills of mortality) he mid-s saw the first bivariateplot derived from empirical data, a theoretical curve relating barometric pressure toaltitude, and the first known weather map,showing prevailing winds on a map ofthe earth (Halley, )

By the end of this century, the necessary elements for the development of graphical methods were at hand – some real data of significant interest, some theory to make sense of them, and a few ideas for their visual representation. Perhaps more importantly, one can see this century as giving rise to the beginnings of visual thinking, as illustrated by the examples of Scheiner and van Langren.

For navigation, latitude could be fixed from star inclinations, but longitude required accurate measurement of time at sea, an unsolved problem until  with the invention of a marine chronometer by John Harrison. See Sobel () for a popular account.

For example, Graunt () used his tabulations of London births and deaths from parish records and the bills of mortality to estimate the number of men the king would find available in the event of war (Klein, , pp –).

Image: http://math.yorku.ca/SCS/Gallery/images/huygens-graph.gif

Image: http://math.yorku.ca/SCS/Gallery/images/halleyweathermap-.jpg

1700–1799: New Graphic Forms

1.2.3

With some rudiments of statistical theory, data of interest and importance, and the idea of graphic representation at least somewhat established, the 18th century witnessed the expansion of these aspects to new domains and new graphic forms. In cartography, mapmakers began to try to show more than just geographical position on a map. As a result, new data representations (isolines and contours) were invented, and thematic mapping of physical quantities took root. Towards the end of this century, we see the first attempts at the thematic mapping of geologic, economic and medical data.

Abstract graphs, and graphs of functions became more widespread, along with the early stirrings of statistical theory (measurement error) and systematic collection of empirical data. As other (economic and political) data began to be collected, some novel visual forms were invented to portray them, so the data could ‘speak to the eyes.’

For example, the use of isolines to show contours of equal value on a coordinate grid (maps and charts) was developed by Edmund Halley (). Figure ., showing isogons – lines of equal magnetic declination – is among the first examples of thematic cartography, overlaying data on a map. Contour maps and topographic maps were introduced somewhat later by Philippe Buache () and Marcellin du Carla-Boniface ().

Timelines, or ‘cartes chronologiques,’ were first introduced by Jacques Barbeu-Dubourg in the form of an annotated chart of all of history (from Creation) on a -foot scroll (Ferguson, ). Joseph Priestley, presumably independently, used a more convenient form to show first a timeline chart of biography (lifespans of  famous people,  B.C. to A.D. , Priestley, ), and then a detailed chart of history (Priestley, ).

The use of geometric figures (squares or rectangles) and cartograms to compare areas or demographic quantities by Charles de Fourcroy () and August F.W. Crome () provided another novel visual encoding for quantitative data using superimposed squares to compare the areas of European states.

As well, several technological innovations provided necessary ingredients for the production and dissemination of graphic works. Some of these facilitated the reproduction of data images, such as three-colour printing, invented by Jacob le Blon in , and lithography, invented by Aloys Senefelder in . Of the latter, Robinson (, p ) says “the effect was as great as the introduction [of the Xerox machine].” Yet, likely due to expense, most of these new graphic forms appeared in publications with limited circulation, unlikely to attract wide attention.

Image: http://math.yorku.ca/SCS/Gallery/images/palsky/defourcroy.jpg

Figure .. A portion of Edmund Halley’s New and Correct Sea Chart Shewing the Variations in the Compass in the Western and Southern Ocean, . Source: Halley (), image from Palsky (, p )

A prodigious contributor to the use of the new graphical methods, Johann Lambert [–] introduced the ideas of curve fitting and interpolation from empirical data points. He used various sorts of line graphs and graphical tables to show periodic variation in, for example, air and soil temperature.

William Playfair [–] is widely considered the inventor of most of the graphical forms used today – first the line graph and barchart (Playfair, ), later the

Image: http://www.journals.uchicago.edu/Isis/journal/demo/vn//fg.gif

In this figure, the left axis and line on each circle/pie graph shows population, while the right axis and line shows taxes. Playfair intended that the slope of the line connecting the two would depict the rate of taxation directly to the eye; but, of course, the slope also depends on the diameters of the circles. Playfair’s graphic sins can perhaps be forgiven here, because the graph clearly shows the slope of the line for Britain to be in the opposite direction of those for the other nations.

A somewhat later graph (Playfair, ), shown in Fig ., exemplifies the best that Playfair had to offer with these graphic forms. Playfair used three parallel time series to show the price of wheat, weekly wages and reigning ruler over a -year span from  to  and used this graph to argue that workers had become better off in the most recent years.

By the end of this century (), the utility of graphing in scientific applications prompted a Dr Buxton in London to patent and market printed coordinate paper; curiously, a patent for lined notepaper was not issued until . The first known published graph using coordinate paper is one of periodic variation in barometric pressure (Howard, ). Nevertheless, graphing of data would remain rare for another  or so years, perhaps largely because there wasn’t much quantitative information (apart from widespread astronomical, geodetic and physical measurement) of sufficient complexity to require new methods and applications. Official statistics, regarding population and mortality, and economic data were generally fragmentary and often not publicly available. This would soon change.

Figure .. William Playfair’s  time-series graph of prices, wages and reigning ruler over a -year period. Source: Playfair (), image from Tufte (, p )

With the fertilization provided by the previous innovations of design and technique, the first half of the th century witnessed explosive growth in statistical graphics and thematic mapping, at a rate which would not be equalled until modern times.

In statistical graphics, all of the modern forms of data display were invented: bar- and piecharts, histograms, line graphs and time-series plots, contour plots, scatterplots and so forth. In thematic cartography, mapping progressed from single maps to comprehensive atlases, depicting data on a wide variety of topics (economic, social, moral, medical, physical, etc.), and introduced a wide range of novel forms of symbolism. During this period graphical analysis of natural and physical phenomena (lines of magnetism, weather, tides, etc.) began to appear regularly in scientific publications as well.

In , the first geological maps were introduced in England by William Smith

[–], setting the pattern for geological cartography or ‘stratigraphic geology’

William Herschel (), in a paper that describes the first instance of a modern scatterplot,

devoted three pages to a description of plotting points on a grid

Trang 38

26 Michael Friendly

(Smith, ) hese and other thematic maps soon led to new ways of showing titative information on maps and, equally importantly, to new domains for graphi-cally based inquiry

quan-In the s, Baron Charles Dupin [–] invented the use of continuousshadings (from white to black) to show the distribution and degree of illiteracy inFrance (Dupin, ) – the first unclassed choropleth map, and perhaps the firstmodern-style thematic statistical map (Palsky, , p ) Later given the lovelytitle ‘Carte de la France obscure et de la France éclairée,’ it attracted wide attention,and was also perhaps the first application of graphics in the social realm

More significantly, in , the ministry of justice in France instituted the first centralized national system of crime reporting, collected quarterly from all departments and recording the details of every charge laid before the French courts. In , André-Michel Guerry, a lawyer with a penchant for numbers, used these data (along with other data on literacy, suicides, donations to the poor and other ‘moral’ variables) to produce a seminal work on the moral statistics of France (Guerry, ) – a work that (along with Quételet, , ) can be regarded as the foundation of modern social science.

Guerry used maps in a style similar to Dupin to compare the ranking of departments on pairs of variables, notably crime vs. literacy, but other pairwise variable comparisons were made. He used these to argue that the lack of an apparent (negative) relation between crime and literacy contradicted the armchair theories of some social reformers who had argued that the way to reduce crime was to increase education. Guerry’s maps and charts made somewhat of an academic sensation both in France and the rest of Europe; he later exhibited several of these at the  London Exhibition and carried out a comparative study of crime in England and France (Guerry, ) for which he was awarded the Montyon Prize in statistics by the French Academy of Sciences. But Guerry’s systematic and careful work was unable to shine in the shadows cast by Adolphe Quételet, who regarded moral and social statistics as his own domain.

Image: http://math.yorku.ca/SCS/Gallery/images/dupin-map_.jpg

Guerry showed that rates of crime, when broken down by department, type of crime, age and gender of the accused and other variables, remained remarkably consistent from year to year, yet varied widely across departments. He used this to argue that such regularity implied the possibility of establishing social laws, much as the regularity of natural phenomena implied physical ones. Guerry also pioneered the study of suicide, with tabulations of suicides in Paris, –, by sex, age, education, profession, etc., and a content analysis of suicide notes as to presumed motives.

Today, one would use a scatterplot, but that graphic form had only just been invented (Herschel, ) and would not enter common usage for another  years; see Friendly and Denis ().

Guerry seemed reluctant to take sides. He also contradicted the social conservatives who argued for the need to build more prisons or impose more severe criminal sentences. See Whitt ().

Among the  plates in this last work, seven pairs of maps for England and France each included sets of small line graphs to show trends over time, decompositions by subtype of crime and sex, distributions over months of the year, and so forth. The final plate, on general causes of crime, is an incredibly detailed and complex multivariate semi-graphic display attempting to relate various types of crimes to each other, to various social and moral aspects (instruction, religion, population) as well as to their geographic distribution.

Figure .. A portion of Dr Robert Baker’s cholera map of Leeds, , showing the districts affected by cholera. Source: Gilbert (, Fig )

In October , the first case of asiatic cholera occurred in Great Britain, and over   people died in the epidemic that ensued over the next  months or so (Gilbert, ). Subsequent cholera epidemics in – and – produced similarly large death tolls, but the water-borne cause of the disease was unknown until when Dr John Snow produced his famous dot map (Snow, ) showing deaths due to cholera clustered around the Broad Street pump in London. This was indeed a landmark graphic discovery, but it occurred at the end of the period, roughly –, which marks a high point in the application of thematic cartography to human (social, medical, ethnic) topics. The first known disease map of cholera (Fig .), due to Dr Robert Baker (), shows the districts of Leeds ‘affected by cholera’ in the particularly severe  outbreak.

I show this figure to make another point – why Baker’s map did not lead to a ‘eureka’ experience, while John Snow’s did. Baker used a town plan of Leeds that had been divided into districts. Of a population of   in all of Leeds, Baker mapped the  cholera cases by hatching in red ‘the districts in which the cholera had prevailed.’ In his report, he noted an association between the disease and living conditions: ‘how exceedingly the disease has prevailed in those parts of the town where there is a deficiency, often an entire want of sewage, drainage and paving’ (Baker, , p ). Baker did not indicate the incidence of disease on his map, nor was he equipped to display rates of disease (in relation to population density), and his knowledge of possible causes, while definitely on the right track, was both weak and implicit (not analysed graphically or by other means). It is likely that some, perhaps tenuous, causal indicants or evidence were available to Baker, but he was unable to connect the dots or see a geographically distributed outcome in relation to geographic factors in even the simple ways that Guerry had tried.

Image: http://www.math.yorku.ca/SCS/Gallery/images/snow.jpg

At about the same time, –, the use of graphs began to become recognized in some official circles for economic and state planning – where to build railroads and canals? What is the distribution of imports and exports? This use of graphical methods is no better illustrated than in the works of Charles Joseph Minard [–], whose prodigious graphical inventions led Funkhouser () to call him the Playfair of France. To illustrate, we choose (with some difficulty) an  ‘tableau-graphique’ (Fig .) by Minard, an early progenitor of the modern mosaicplot (Friendly, ). On the surface, mosaicplots descend from bar charts, but Minard introduced two simultaneous innovations: the use of divided and proportional-width bars so that area had a concrete visual interpretation. The graph shows the transportation of commercial goods along one canal route in France by variable-width, divided bars (Minard, ). In this display the width of each vertical bar shows distance along this route; the divided-bar segments have height proportional to amount of goods of various types (shown by shading), so the area of each rectangular segment is proportional to the cost of transport. Minard, a true visual engineer (Friendly, ), developed such diagrams to argue visually for setting differential price rates for partial vs. complete runs. Playfair had tried to make data ‘speak to the eyes,’ but Minard wished to make them ‘calculer par l’œil’ as well.

It is no accident that, in England, outside the numerous applications of graphical methods in the sciences, there was little interest in or use of graphs amongst statisticians (or ‘statists’ as they called themselves). If there is a continuum ranging from ‘graph people’ to ‘table people,’ British statisticians and economists were philosophically more table-inclined and looked upon graphs with suspicion up to the time of William Stanley Jevons around  (Maas and Morgan, ). Statistics should be concerned with the recording of ‘facts relating to communities of men which are capable of being expressed by numbers’ (Mouat, , p ), leaving the generalization to laws and theories to others. Indeed, this view was made abundantly clear in the logo of the Statistical Society of London (now the Royal Statistical Society): a banded

The German geographer Augustus Petermann produced a ‘Cholera map of the British Isles’ in  using national data from the – epidemic (image: http://images.rgs.org/webimages//////S.jpg) shaded in proportion to the relative rate of mortality using class intervals (< ,   ,   , ). No previous disease map had allowed determination of the range of mortality in any given area.

Date posted: 01/06/2018, 15:04
