
A Handbook of Statistical Analyses Using R

SECOND EDITION


Brian S. Everitt and Torsten Hothorn

CRC Press
Taylor & Francis Group
Boca Raton London New York

CRC Press is an imprint of the Taylor & Francis Group, an informa business

A CHAPMAN & HALL BOOK


Taylor & Francis Group

6000 Broken Sound Parkway NW, Suite 300

Boca Raton, FL 33487-2742

© 2010 by Taylor and Francis Group, LLC

Chapman & Hall/CRC is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed in the United States of America on acid-free paper

10 9 8 7 6 5 4 3 2 1

International Standard Book Number: 978-1-4200-7933-3 (Paperback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

Everitt, Brian.
A handbook of statistical analyses using R / Brian S. Everitt and Torsten Hothorn. -- 2nd ed.
p. cm.
Includes bibliographical references and index.
ISBN 978-1-4200-7933-3 (pbk. : alk. paper)
1. Mathematical statistics--Data processing--Handbooks, manuals, etc. 2. R (Computer program language)--Handbooks, manuals, etc. I. Hothorn, Torsten. II. Title.
QA276.45.R3E94 2010


Preface to Second Edition

Like the first edition this book is intended as a guide to data analysis with the R system for statistical computing. New chapters on graphical displays, generalised additive models and simultaneous inference have been added to this second edition and a section on generalised linear mixed models completes the chapter that discusses the analysis of longitudinal data where the response variable does not have a normal distribution. In addition, new examples and additional exercises have been added to several chapters. We have also taken the opportunity to correct a number of errors that were present in the first edition. Most of these errors were kindly pointed out to us by a variety of people to whom we are very grateful, especially Guido Schwarzer, Mike Cheung, Tobias Verbeke, Yihui Xie, Lothar Häberle, and Radoslav Harman.

We learnt that many instructors use our book successfully for introductory courses in applied statistics. We have had the pleasure to give some courses based on the first edition of the book ourselves and we are happy to share slides covering many sections of particular chapters with our readers. LaTeX sources and PDF versions of slides covering several chapters are available from the second author upon request.

A new version of the HSAUR package, now called HSAUR2 for obvious reasons, is available from CRAN. Basically the package vignettes have been updated to cover the new and modified material as well. Otherwise, the technical infrastructure remains as described in the preface to the first edition, with two small exceptions: names of R add-on packages are now printed in bold font and we refrain from showing significance stars in model summaries.

Lastly we would like to thank Thomas Kneib and Achim Zeileis for commenting on the newly added material and again the CRC Press staff, in particular Rob Calver, for their support during the preparation of this second edition.

Brian S. Everitt and Torsten Hothorn

London and München, April 2009


Preface to First Edition

This book is intended as a guide to data analysis with the R system for statistical computing. R is an environment incorporating an implementation of the S programming language, which is powerful and flexible and has excellent graphical facilities (R Development Core Team, 2009b). In the Handbook we aim to give relatively brief and straightforward descriptions of how to conduct a range of statistical analyses using R. Each chapter deals with the analysis appropriate for one or several data sets. A brief account of the relevant statistical background is included in each chapter along with appropriate references, but our prime focus is on how to use R and how to interpret results. We hope the book will provide students and researchers in many disciplines with a self-contained means of using R to analyse their data.

R is an open-source project developed by dozens of volunteers for more than ten years now and is available from the Internet under the General Public Licence. R has become the lingua franca of statistical computing. Increasingly, implementations of new statistical methodology first appear as R add-on packages. In some communities, such as in bioinformatics, R already is the primary workhorse for statistical analyses. Because the sources of the R system are open and available to everyone without restrictions and because of its powerful language and graphical capabilities, R has started to become the main computing engine for reproducible statistical research (Leisch, 2002a,b, 2003, Leisch and Rossini, 2003, Gentleman, 2005). For a reproducible piece of research, the original observations, all data preprocessing steps, the statistical analysis as well as the scientific report form a unity and all need to be available for inspection, reproduction and modification by the readers.

Reproducibility is a natural requirement for textbooks such as the Handbook of Statistical Analyses Using R and therefore this book is fully reproducible using an R version greater or equal to 2.2.1. All analyses and results, including figures and tables, can be reproduced by the reader without having to retype a single line of R code. The data sets presented in this book are collected in a dedicated add-on package called HSAUR accompanying this book. The package can be installed from the Comprehensive R Archive Network (CRAN) via

R> install.packages("HSAUR")

and its functionality is attached by

R> library("HSAUR")

The relevant parts of each chapter are available as a vignette, basically a document including both the R sources and the rendered output of every analysis contained in the book. For example, the first chapter can be inspected by

R> vignette("Ch_introduction_to_R", package = "HSAUR")

and the R sources are available for reproducing our analyses by

R> edit(vignette("Ch_introduction_to_R", package = "HSAUR"))

An overview of all chapter vignettes included in the package can be obtained from

R> vignette(package = "HSAUR")
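As a minimal sketch, and assuming that the HSAUR2 package named in the preface to the second edition follows the same conventions as HSAUR, the overview call could be wrapped in a small helper that first checks whether the package is installed. The function name list_chapter_vignettes is hypothetical and is not part of either package.

```r
# Hypothetical helper (not part of HSAUR or HSAUR2): return the vignette
# overview of a package, or NULL with a hint if the package is missing.
list_chapter_vignettes <- function(pkg = "HSAUR2") {
    # requireNamespace() returns FALSE instead of failing when pkg is absent
    if (!requireNamespace(pkg, quietly = TRUE)) {
        message("Package '", pkg, "' is not installed; try install.packages(\"",
                pkg, "\")")
        return(invisible(NULL))
    }
    # Same call as in the text, with the package name passed as a variable
    vignette(package = pkg)
}
```

Calling list_chapter_vignettes() with no arguments would then list the vignettes of HSAUR2, provided it is installed, and list_chapter_vignettes("HSAUR") those of the first-edition package.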

We welcome comments on the R package HSAUR, and where we think these add to or improve our analysis of a data set we will incorporate them into the package and, hopefully at a later stage, into a revised or second edition of the book.

Plots and tables of results obtained from R are all labelled as ‘Figures’ in the text. For the graphical material, the corresponding figure also contains the ‘essence’ of the R code used to produce the figure, although this code may differ a little from that given in the HSAUR package, since the latter may include some features, for example thicker line widths, designed to make a basic plot more suitable for publication.

We would like to thank the R Development Core Team for the R system, and authors of contributed add-on packages, particularly Uwe Ligges and Vince Carey for helpful advice on scatterplot3d and gee. Kurt Hornik, Ludwig A. Hothorn, Fritz Leisch and Rafael Weißbach provided good advice with some statistical and technical problems. We are also very grateful to Achim Zeileis for reading the entire manuscript, pointing out inconsistencies or even bugs and for making many suggestions which have led to improvements. Lastly we would like to thank the CRC Press staff, in particular Rob Calver, for their support during the preparation of the book. Any errors in the book are, of course, the joint responsibility of the two authors.

Brian S Everitt and Torsten Hothorn

London and Erlangen, December 2005


List of Figures

1.1 Histograms of the market value and the logarithm of the market value for the companies contained in the Forbes 2000
1.2 Raw scatterplot of the logarithms of market value and sales
1.3 Scatterplot with transparent shading of points of the logarithms of market value and sales
1.4 Boxplots of the logarithms of the market value for four selected countries, the width of the boxes is proportional to the square roots of the number of companies
2.1 Histogram (top) and boxplot (bottom) of malignant melanoma mortality rates
2.2 Parallel boxplots of malignant melanoma mortality rates by contiguity to an ocean
2.3 Estimated densities of malignant melanoma mortality rates by contiguity to an ocean
2.4 Scatterplot of malignant melanoma mortality rates by geographical location
2.5 Scatterplot of malignant melanoma mortality rates against latitude
2.6 Bar chart of happiness
2.7 Spineplot of health status and happiness
2.8 Spinogram (left) and conditional density plot (right) of happiness depending on log-income
3.1 Boxplots of estimates of room width in feet and metres (after conversion to feet) and normal probability plots of estimates of room width made in feet and in metres
3.2 R output of the independent samples t-test for the roomwidth data
3.3 R output of the independent samples Welch test for the roomwidth data
3.4 R output of the Wilcoxon rank sum test for the roomwidth data
3.6 R output of the paired t-test for the waves data
3.7 R output of the Wilcoxon signed rank test for the waves data
3.8 Enhanced scatterplot of water hardness and mortality, showing both the joint and the marginal distributions and, in addition, the location of the city by different plotting [...]
[...] rearrests data computed via a binomial test
4.1 An approximation for the conditional distribution of the difference of mean roomwidth estimates in the feet and metres group under the null hypothesis. The vertical lines show the negative and positive absolute value of the test statistic T obtained from the original data
4.2 R output of the exact permutation test applied to the roomwidth data
4.3 R output of the exact conditional Wilcoxon rank sum test applied to the roomwidth data
4.4 R output of Fisher’s exact test for the suicides data
5.1 Plot of mean weight gain for each level of the two factors
5.2 R output of the ANOVA fit for the weightgain data
5.3 Interaction plot of type and source
5.4 Plot of mean litter weight for each level of the two factors for the foster data
5.5 Graphical presentation of multiple comparison results for the foster feeding data
5.6 Scatterplot matrix of epoch means for Egyptian skulls data
6.1 Scatterplot of velocity and distance
6.2 Scatterplot of velocity and distance with estimated regression line (left) and plot of residuals against fitted values (right)
6.3 Boxplots of rainfall
6.4 Scatterplots of rainfall against the continuous covariates
6.5 R output of the linear model fit for the clouds data
6.6 Regression relationship between S-Ne criterion and rainfall with and without seeding
6.7 Plot of residuals against fitted values for clouds seeding [...]
6.8 Normal probability plot of residuals from cloud seeding model clouds_lm
6.9 Index plot of Cook’s distances for cloud seeding data
7.1 Conditional density plots of the erythrocyte sedimentation rate (ESR) given fibrinogen and globulin
7.2 R output of the summary method for the logistic regression model fitted to ESR and fibrinogen
7.3 R output of the summary method for the logistic regression model fitted to ESR and both globulin and fibrinogen
7.4 Bubbleplot of fitted values for a logistic regression model fitted to the plasma data
7.5 R output of the summary method for the logistic regression model fitted to the womensrole data
7.6 Fitted (from womensrole_glm_1) and observed probabilities of agreeing for the womensrole data
7.7 R output of the summary method for the logistic regression model fitted to the womensrole data
7.8 Fitted (from womensrole_glm_2) and observed probabilities of agreeing for the womensrole data
7.9 Plot of deviance residuals from logistic regression model fitted to the womensrole data
7.10 R output of the summary method for the Poisson regression model fitted to the polyps data
7.11 R output of the print method for the conditional logistic regression model fitted to the backpain data
8.1 Three commonly used kernel functions
8.2 Kernel estimate showing the contributions of Gaussian kernels evaluated for the individual observations with bandwidth h = 0.4
8.3 Epanechnikov kernel for a grid between (−1.1, −1.1) and (1.1, 1.1)
8.4 Density estimates of the geyser eruption data imposed on a histogram of the data
8.5 A contour plot of the bivariate density estimate of the CYGOB1 data, i.e., a two-dimensional graphical display for a three-dimensional problem
8.6 The bivariate density estimate of the CYGOB1 data, here shown in a three-dimensional fashion using the persp function
8.7 Fitted normal density and two-component normal mixture for geyser eruption data
8.8 Bootstrap distribution and confidence intervals for the mean estimates of a two-component mixture for the geyser data
9.1 Initial tree for the body fat data with the distribution of body fat in terminal nodes visualised via boxplots
9.2 Pruned regression tree for body fat data
9.3 Observed and predicted DXA measurements
9.4 Pruned classification tree of the glaucoma data with class distribution in the leaves
9.5 Estimated class probabilities depending on two important variables. The 0.5 cut-off for the estimated glaucoma probability is depicted as a horizontal line. Glaucomateous eyes are plotted as circles and normal eyes are triangles
9.6 Conditional inference tree with the distribution of body fat content shown for each terminal leaf
9.7 Conditional inference tree with the distribution of glaucomateous eyes shown for each terminal leaf
10.1 A linear spline function with knots at a = 1, b = 3 and c = 5
10.2 Scatterplot of year and winning time
10.3 Scatterplot of year and winning time with fitted values from a simple linear model
10.4 Scatterplot of year and winning time with fitted values from a smooth non-parametric model
10.5 Scatterplot of year and winning time with fitted values from a quadratic model
10.6 Partial contributions of six exploratory covariates to the predicted SO2 concentration
10.7 Residual plot of SO2 concentration
10.8 Spinograms of the three exploratory variables and response variable kyphosis
10.9 Partial contributions of three exploratory variables with confidence bands
11.1 ‘Bath tub’ shape of a hazard function
11.2 Survival times comparing treated and control patients
11.3 Kaplan-Meier estimates for breast cancer patients who either received a hormonal therapy or not
11.4 R output of the summary method for GBSG2_coxph
11.5 Estimated regression coefficient for age depending on time for the GBSG2 data
11.6 Martingale residuals for the GBSG2 data
11.7 Conditional inference tree for the GBSG2 data with the survival function, estimated by Kaplan-Meier, shown for every subgroup of patients identified by the tree
12.1 Boxplots for the repeated measures by treatment group for the BtheB data
