This text will assist graduate programs in applied linguistics and second language acquisition/studies in providing "in-house" instruction on statistical techniques, using sample data and step-by-step guidance to carry out relatively advanced, novel, and/or underused statistical procedures. Using readily available statistical software packages such as SPSS, the chapters walk the reader from conceptualization through to output and interpretation of a range of advanced statistical procedures such as bootstrapping, mixed effects modeling, cluster analysis, discriminant function analysis, and meta-analysis. This practical hands-on volume equips researchers in applied linguistics and second language acquisition (SLA) with the necessary tools and knowledge to engage more fully with key issues and problems in SLA and to work toward expanding the statistical repertoire of the field.
Luke Plonsky (PhD, Michigan State University) is a faculty member in the Applied Linguistics program at Northern Arizona University. His interests include SLA and research methods, and his publications in these and other areas have appeared in Annual Review of Applied Linguistics, Applied Linguistics, Language Learning, Modern Language Journal, and Studies in Second Language Acquisition, among other major journals and outlets. He is also Associate Editor of Studies in Second Language Acquisition and Managing Editor of Foreign Language Annals.

SECOND LANGUAGE ACQUISITION RESEARCH SERIES
Susan M. Gass and Alison Mackey, Series Editors
Monographs on Theoretical Issues:
The Longitudinal Study of Advanced L2 Capacities (2008)
The Psychology of the Language Learner—Revisited (2015)
Monographs on Research Methodology:
Second Language Research: Methodology and Design (2005)
Gass with Behney & Plonsky
Second Language Acquisition: An Introductory Course, Fourth Edition (2013)
First published 2015
by Routledge
711 Third Avenue, New York, NY 10017
and by Routledge
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
Routledge is an imprint of the Taylor & Francis Group, an informa business
© 2015 Taylor & Francis
The right of Luke Plonsky to be identified as the author of the editorial material, and of the authors for their individual chapters, has been asserted in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.
Trademark notice: Product or corporate names may be trademarks or registered
trademarks, and are used only for identification and explanation without intent to infringe.
Library of Congress Cataloging-in-Publication Data
Plonsky, Luke.
Advancing quantitative methods in second language research / Luke Plonsky, Northern Arizona University.
pages cm — (Second Language Acquisition Research Series)
Includes bibliographical references and index.
1. Second language acquisition—Research. 2. Second language acquisition—Data processing. 3. Language and languages—Study and teaching—Research. 4. Language acquisition—Research. 5. Language acquisition—Data processing. 6. Quantitative research. 7. Multilingual computing. 8. Computational linguistics. I. Title.
CONTENTS

List of Illustrations xi
Acknowledgments xvii
3 Statistical Power, p Values, Descriptive Statistics, and Effect Sizes: A "Back-to-Basics" Approach to Advancing Quantitative
Luke Plonsky
4 A Practical Guide to Bootstrapping Descriptive Statistics,
Geoffrey T. LaFlair, Jesse Egbert, and Luke Plonsky
Thom Hudson
Luke Plonsky and Frederick L. Oswald
PART III
Eun Hee Jeon
Ian Cunnings and Ian Finlayson
9 Exploratory Factor Analysis and Principal
Shawn Loewen and Talip Gonulal
Beth Mackey and Steven J. Ross
Index 347
ILLUSTRATIONS

3.2 Screenshot of effect size calculator for Cohen's d 32
3.4 Linear regression dialogue box used to calculate CIs
3.7 Output for descriptive statistics produced through
3.8 Descriptive statistics and CIs for abstracts
4.4 Descriptive statistics table with bootstrapped 95% CIs
4.5 Correlations output table with bootstrapped 95% CIs
4.7 Independent-Samples Test output table with
4.8 Bootstrap mean differences, Q-Q plot, and jackknife-after-boot plot of the mean difference between English and Vietnamese 66
4.9 Plot of the bootstrap T-statistics, their Q-Q plot, and the
4.10 One-way ANOVA output table with bootstrapped
5.1 Cleveland’s 1993 graphic display of barley harvest data from
5.2 Types of graphics used over last four regular issues of five
5.3 Bar chart showing means of listening scores for each
category of self-rated confidence ratings with
5.5 Grouped bar chart for speaking scores by gender
5.9 Box-and-whisker plots for the five proficiency levels
5.10 Student scores (means and CIs) on five tests administered
5.11 Mean scores and 95% CIs on reading, listening,
5.12 Graphic representation of score data across levels with
5.13 Scatter plot for the relationship between reading scores
5.15 Mean state scores for NAEP data in Table 5.4 ordered
5.17 Number of weekly online posts with sparklines showing
6.2 Example of a funnel plot without the presence
6.3 Example of a funnel plot with the presence
7.6 SPSS standard multiple regression dialogue boxes:
7.7 SPSS standard multiple regression dialogue boxes:
7.10 SPSS hierarchical regression analysis dialogue boxes:
7.11 SPSS hierarchical regression analysis dialogue boxes:
selections of PV for the second and final model
8.1 Q-Q plots for untransformed (left) and transformed
9.3 Example of KMO measure of sampling adequacy
10.9 Output file three-factor model with correlated error
11.1 Step 1 247
11.3 Step 3, part 1 249
11.4 Step 3, part 2 249
11.5 Step 4, part 1 250
11.6 Step 4, part 2 251
11.12 Truncated agglomeration schedule for 947
11.13 Distance between fusion coefficients
11.25 Cluster membership by score level for the
11.26 Cluster membership by score level for the
13.6 Two-dimensional output for three group average values
TABLES
3.5 General benchmarks for interpreting d and r effect
5.1 Types of graphical charts and frequency of use found in
5.2 2009 average reading scale score sorted by gender, grade
5.3 2009 average NAEP reading scale scores by gender for
5.4 2009 average NAEP reading scale scores by gender for
grade 12 public schools in 11 states sorted on state mean
6.1 Suggested categories for coding within meta-analyses of L2 research 110
7.6 SPSS output for variables entered/removed in hierarchical
7.8 SPSS output for ANOVA resulting from hierarchical
regression 153
11.1 Reformatted fusion coefficients for final six clusters formed 257
11.3 Means and standard deviations for the three–cluster solution 263
13.2 Box’s M output for testing homogeneity of covariance
13.4 Relationship output for individual predictor variables
13.6 Accuracy of classification output for membership
14.3 Comparison of Means software (exploratory and
ACKNOWLEDGMENTS

I want to begin by expressing my sincere gratitude to the diverse set of individuals who have contributed to this volume in equally diverse ways. I am very grateful, first of all, to all 18 chapter authors. It is clear from their work that they are not only experts in the statistical procedures they have written about but in their ability to communicate and train others on these procedures as well. I also thank the authors for their perseverance and persistence in the face of my many requests.
In addition to my own comments, each chapter was also reviewed by at least one reviewer from both the target audience (graduate students or junior researchers with at least one previous course in statistics) and from the modest pool of applied linguists with expertise in the focal procedure of each chapter. I am very thankful for the comments and suggestions of these reviewers, which led to many substantial improvements throughout the volume: Dan Brown, Meishan Chen, Euijung Cheong, Joseph Collentine, Jersus Colmenares, Scott Crossley, Deirdre Derrick, Jesse Egbert, Maria Nelly Gutierrez Arvizu, Eun Hee Jeon, Tingting Kang, Geoffrey LaFlair, Jenifer Larson-Hall, Jared Linck, Junkyu Lee, Qiandi Liu, Meghan Moran, John Norris, Gary Ockey, Fred Oswald, Steven Ross, Erin Schnur, and Soo Jung Youn. Along these lines, my thanks go to the students in my ENG 599 and 705 courses, who read and commented on prepublication versions of many of the chapters in the book. Special thanks to Deirdre Derrick for all her help on the index. I also thank Shawn Loewen and Fred Oswald, both of whom have had a (statistically) significant effect on my development as a quantitative researcher. A big thanks goes to Sue Gass and Alison Mackey, series editors, for their encouragement and support in carrying this book from an idea to its current form. Last, thanks to you, the reader, for your interest in advancing the field's quantitative methods. In the words of Geoff Cumming, happy reading and "may all your confidence intervals be short!"
CONTRIBUTORS

Douglas Biber (Northern Arizona University)
James Dean Brown (University of Hawaii at Manoa)
Ian Cunnings (University of Reading)
Jesse Egbert (Brigham Young University)
Ian Finlayson (University of Edinburgh)
Talip Gonulal (Michigan State University)
Thom Hudson (University of Hawaii at Manoa)
Eun Hee Jeon (University of North Carolina, Pembroke)
Ute Knoch (University of Melbourne)
Geoffrey T. LaFlair (Northern Arizona University)
Shawn Loewen (Michigan State University)
Beth Mackey (University of Maryland)
Tim McNamara (University of Melbourne)
John M. Norris (Georgetown University)
Frederick L. Oswald (Rice University)
Luke Plonsky (Northern Arizona University)
Steven J. Ross (University of Maryland)
Rob Schoonen (University of Amsterdam)
Shelley Staples (Purdue University)
PART I
Introduction
1
INTRODUCTION

Luke Plonsky

Rationale for This Book

Several reviews of quantitative second language (L2) research have demonstrated that empirical efforts in the field rely heavily on a very narrow range of statistical procedures (e.g., Gass, 2009; Plonsky, 2013). Namely, nearly all quantitative studies employ t tests, ANOVAs, and/or correlations. In many cases, these tests are viable means to address the research questions at hand; however, problems associated with these techniques arise frequently (e.g., failing to meet statistical assumptions). More concerning, though, is the limited capacity of these tests to provide meaningful and informative answers to our questions about L2 learning, teaching, testing, use, and so forth. Also concerning is that the near-default status of these statistics restricts researchers' ability to understand relationships between constructs of interest as well as their use of analyses to examine such relationships. In other words, our research questions are being constrained by our knowledge of statistical tools.
This problem manifests itself in at least two ways. First, it is not uncommon to find researchers who convert intervally measured (independent) variables into categorical ones in order for the data to fit into an ANOVA model. Doing so trades precious variance for what appears to be a more straightforward analytical approach (see Plonsky, Chapter 3 in this volume, for further comments and suggestions related to this practice). Second, and perhaps more concerning, the relatively simple statistics found in most L2 research are generally unable to model the complex relationships we are interested in. L2 learning and use are multivariate in nature (see, e.g., Brown, Chapter 2 in this volume). Many studies account for the complexity in these processes by measuring multiple variables. Few, however, attempt to analyze them using multivariate techniques. Consequently, it is common to find 20 or 30 univariate tests in a single study, leading to a greater chance of Type I error and, more importantly, a fractured view of the relationships of interest (Plonsky, 2013).
Before going on I need to clarify two points related to the intentions behind this volume. First, neither I nor the authors who have contributed to this volume are advocating for blindly applied technical or statistical sophistication. I agree wholeheartedly with the recommendation of the American Psychological Association to employ statistical procedures that are "minimally sufficient" to address the research questions being posed (Wilkinson & Task Force on Statistical Inference, 1999, p. 598). Second, the procedures described in this book are just tools. Yes, they carry great potential to help us address substantive questions that cannot otherwise be answered. We have to remember, though, that our analyses must be guided by the substantive interests and relationships in question and not the other way around. I mention this because of the tendency, particularly among novice researchers, to become fascinated with a particular method or statistic and to allow one's research questions to be driven by the method.
With these rationales and caveats laid out, at the heart of this volume is an interest in informing and expanding the statistical repertoire of L2 researchers. Toward this end, each chapter provides the conceptual motivation for and the practical, step-by-step guidance needed to carry out a relatively advanced, novel, and/or underused statistical technique using readily available statistical software packages (e.g., SPSS). In related disciplines such as education and psychology, these techniques are introduced in statistics texts and employed regularly. Despite their potential in our field, however, they are rarely used and almost entirely absent from methodological texts written for applied linguistics.
This volume picks up where introductory texts (e.g., Larson-Hall, 2015) leave off and assumes a basic understanding of research design as well as basic statistical concepts and techniques used in L2 research (e.g., t test, ANOVA, correlation). The book goes beyond these procedures to provide a "second course," that is, a conceptual primer and practical tutorial on a number of analyses not currently available in other methods volumes in applied linguistics. The hope is that, by doing so, researchers in the field will be better equipped to address questions currently posed and to take on novel or more complex questions.
The book also seeks to improve methodological training in graduate programs, the need for which has been suggested as the result of recent studies surveying both published research as well as researcher self-efficacy (e.g., Loewen et al., 2014; Plonsky, 2014). This text will assist graduate programs in applied linguistics and second language acquisition/studies in providing "in-house" instruction on statistical techniques using sample data and examples tailored to the variables, interests, measures, and designs particular to L2 research.
Beyond filling gaps in the statistical knowledge of the field and in available texts and reference books, this volume also seeks to contribute to the budding methodological and statistical reform movement taking place in applied linguistics. The field has seen a rapid increase in its awareness of methodological issues in the last decade. Evidence of this movement, which holds that methodological rigor and transparency are critical to advancing our knowledge of L2 learning and teaching, is found in meta-analyses (e.g., Norris & Ortega, 2000), methodological syntheses (e.g., Hashemi & Babaii, 2013; Plonsky & Gass, 2011), methodologically oriented conferences and symposia (e.g., the Language Learning Currents conference in 2013), and a number of article- and book-length treatments raising methodological issues (e.g., Norris, Ross, & Schoonen, in press; Plonsky & Oswald, 2014; Porte, 2012). This book aims to both contribute to and benefit from the momentum in this area, serving as a catalyst for much additional work seeking to advance the means by which L2 research is conducted.
Themes
In addition to the general aim of moving forward quantitative L2 research, three major themes present themselves across the volume. The first and most prevalent theme is the role of researcher judgment in conducting each of the analyses presented here. Results based on statistical analyses can obscure the decisions made throughout the research process that led to those results. As Huff (1954) states in the now-classic How to Lie with Statistics, "despite its mathematical base, statistics is as much an art as it is a science" (p. 120). As noted throughout this book, decision points abound in more advanced and multivariate statistics. These procedures involve multiple steps and are particularly subject to the judgment of individual researchers. Consequently, researchers must develop and combine not only substantive but also methodological/statistical expertise in order for the results of such analyses to maximally inform L2 theory, practice, and future research.
The second theme, transparency, builds naturally on the first. Appropriate decision making is a necessary but insufficient requisite for the theoretical and/or practical potential of a study to be realized. Choices made throughout the process must also be justified in the written report, giving proper consideration to the strengths and weaknesses resulting from each decision relative to other available options. Consumers of research can then more adequately and confidently interpret study results. Of course, the need for transparency applies not only to methodological procedures but also to the reporting of data (see Larson-Hall & Plonsky, in press).
The third major theme found throughout this volume is the interrelatedness of the procedures presented. Statistical techniques are often presented and discussed in isolation despite great conceptual and statistical commonalities. ANOVA and multiple regression, for example, are usually considered—and taught—as distinct statistical techniques. However, ANOVA can be considered a type of regression with a single, categorical predictor variable; see Cohen's (1968) introduction to the general linear model (GLM). The relationship between these procedures can also be demonstrated statistically: The eta-squared effect size yielded by an
ANOVA will be equal to the R2 from a multiple regression based on the same independent/predictor and dependent/criterion variables. Both indices express the amount of variance the independent variable accounts for in the dependent variable. Whenever applicable, the chapters in this volume have drawn attention to such similarities and shared utility among procedures.
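This equivalence is easy to verify numerically. The following sketch (in Python rather than SPSS, purely for illustration; the scores and groups are hypothetical) computes eta-squared from a one-way ANOVA decomposition and R2 from regressing the same scores on a dummy-coded group variable:

```python
import math
import statistics as stats

# Hypothetical scores for two instructional groups
group_a = [12.0, 15.0, 14.0, 10.0, 13.0]
group_b = [18.0, 20.0, 17.0, 21.0, 19.0]
scores = group_a + group_b
dummy = [0.0] * len(group_a) + [1.0] * len(group_b)  # group as a dummy-coded predictor

# Eta-squared from a one-way ANOVA: SS_between / SS_total
grand = stats.mean(scores)
ss_total = sum((y - grand) ** 2 for y in scores)
ss_between = sum(len(g) * (stats.mean(g) - grand) ** 2 for g in (group_a, group_b))
eta_sq = ss_between / ss_total

# R-squared from regressing scores on the dummy-coded group
mx, my = stats.mean(dummy), stats.mean(scores)
num = sum((x - mx) * (y - my) for x, y in zip(dummy, scores))
den = math.sqrt(sum((x - mx) ** 2 for x in dummy) * sum((y - my) ** 2 for y in scores))
r_sq = (num / den) ** 2

print(round(eta_sq, 6), round(r_sq, 6))  # the two values are identical
```

Whatever numbers are substituted for these hypothetical ones, the two indices coincide, which is precisely the GLM point made above.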
Structure of the Book
This book is divided into three parts containing 14 chapters written by some of the most methodologically savvy scholars in the field. Part I sets up the book and the techniques found throughout. Brown's chapter, following this brief introduction, discusses the value and place of more advanced statistics, highlighting advantages/benefits and disadvantages of applying such techniques in L2 research. The remaining two parts correspond to two complementary approaches to advancing quantitative L2 research. The chapters in Part II seek to enhance and improve upon techniques currently in use. A chapter I wrote begins Part II with a critique of the status quo of null hypothesis significance testing (NHST) in L2 research. The chapter then guides readers toward more appropriate and informative use of p values, effect sizes, and descriptive statistics, particularly in the context of means-based comparisons (t tests, ANOVAs) and correlations. LaFlair, Egbert, and I provide a step-by-step guide to an alternative approach to running these same analyses, proposed to help L2 researchers overcome some of the problems commonly found in our data (e.g., non-normality, small Ns): bootstrapping.
Hudson then illustrates a number of key principles for visual presentations of quantitative data. In the final chapter of Part II, Fred Oswald and I present a practical guide to conducting meta-analyses of L2 research. (This chapter is an updated and expanded version of a similar one we published in 2012.)
The eight chapters in Part III focus on more advanced statistical procedures that, despite their potential, are not commonly found in L2 research. Each chapter begins with a conceptual overview followed by a step-by-step guide to the targeted technique. These include multiple regression (Jeon), mixed effects modeling and longitudinal analysis (Cunnings & Finlayson), factor analysis (Loewen & Gonulal), structural equation modeling (Schoonen), cluster analysis (Staples & Biber), Rasch analysis (Knoch & McNamara), discriminant function analysis (Norris), and Bayesian data analysis (Mackey & Ross). Practice data sets have been provided on the companion website to go along with each chapter in this part of the book as well as with Chapters 3, 4, and 6 in the previous part. The companion website can be found here: http://oak.ucc.nau.edu/ldp3/AQMSLR.html

Software
One of the challenges in preparing and using a book like this one is choosing the statistical software. Such a decision involves considering accessibility, cost,

TABLE 1.1 Software used and available for procedures in this book
Procedure                                     Software used          Also available*
Descriptives, NHST, effect sizes (Chapter 3)
Bootstrapping (Chapter 4)                     SPSS, R                Excel (macro)
Meta-analysis (Chapter 6)                     SPSS, Excel            R
Multiple regression (Chapter 7)
Mixed effects, longitudinal (Chapter 8)
Factor analysis (Chapter 9)                   SPSS                   R, Excel (macro)
Structural equation modeling (Chapter 10)     LISREL, SPSS (AMOS)    R, Excel (macro)
Cluster analysis (Chapter 11)                 SPSS                   R, Excel (macro)
Rasch analysis (Chapter 12)                   Winsteps, Facets       SPSS (extension), R, Excel (macro)
Discriminant function analysis (Chapter 13)
Bayesian analysis (Chapter 14)                Comparison of Means    SPSS (AMOS), R, Excel

*I have limited additional options to SPSS, R, and Excel, the three most commonly used programs for statistical analyses in applied linguistics according to Loewen et al. (2014).
user friendliness, and consistency across chapters, among other issues. Furthermore, there are numerous options available, each of which possesses a unique set of strengths and weaknesses. IBM's SPSS, for example, is very user friendly but can be costly. The default settings in SPSS can also lead to users not understanding the choices that the program makes for them (e.g., Mizumoto & Plonsky, in review; Plonsky & Gonulal, in press).
As shown in Table 1.1, most analyses in this book have been demonstrated using SPSS. To a much lesser extent, Microsoft Excel and R (R Development Core Team, 2014) have also been used along with, in a small number of cases, more specialized packages.
References

Hashemi, M. R., & Babaii, E. (2013). Mixed methods research: Toward new research designs in applied linguistics. Modern Language Journal, 97, 828–852.
Huff, D. (1954). How to lie with statistics. New York: Norton & Company.
Larson-Hall, J. (2015). A guide to doing statistics in second language research using SPSS and R. New York: Routledge.
Larson-Hall, J., & Plonsky, L. (in press). Reporting and interpreting quantitative research findings: What gets reported and recommendations for the field. Language Learning.
Norris, J. M., & Ortega, L. (2000). Effectiveness of L2 instruction: A research synthesis and quantitative meta-analysis. Language Learning, 50, 417–528.
Norris, J. M., Ross, S., & Schoonen, R. (Eds.) (in press). Improving and extending quantitative reasoning in second language research. Malden, MA: Wiley.
Plonsky, L. (2013). Study quality in SLA: An assessment of designs, analyses, and reporting practices in quantitative L2 research. Studies in Second Language Acquisition, 35, 655–687.
Plonsky, L. (2014). Study quality in quantitative L2 research (1990–2010): A methodological synthesis and call for reform. Modern Language Journal, 98, 450–470.
Plonsky, L., & Gass, S. (2011). Quantitative research methods, study quality, and outcomes: The case of interaction research. Language Learning, 61, 325–366.
Plonsky, L., & Gonulal, T. (2015). Methodological reviews of quantitative L2 research: A review of reviews and a case study of exploratory factor analysis. Language Learning.
R Development Core Team. (2014). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
Wilkinson, L., & Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604.
2
WHY BOTHER LEARNING ADVANCED QUANTITATIVE METHODS IN L2 RESEARCH?
James Dean Brown
Why would anyone bother to learn advanced quantitative methods in second language (L2) research? Isn't it bad enough that language researchers often need to learn basic statistical analyses? Well, the answer is that learning statistics is like learning anything else: About the time you think you've finished, you find that there is so much more to learn. Like so many things, every time you come to the crest of a hill, you see the next hill. So maybe instead of asking "Why bother?" you should be asking "What's next after I learn the basic stats?" That is what this book is about. In this chapter, I will summarize some of the benefits that you can reap from taking that next step and continuing to learn more advanced techniques in quantitative analysis. Naturally, such benefits must always be weighed against any disadvantages as well, so I will consider those too.
What Are the Advantages of Using Advanced
Quantitative Methods?
By advantages, I mean the benefits, the plusses, and the pros of learning and using the advanced quantitative methods covered in this book and elsewhere. The primary advantages are that you can learn to measure more precisely, think beyond the basic null hypothesis significance test, avoid the problem of multiple comparisons, increase the statistical power of your studies, broaden your research perspective, align your research analyses more closely to the way people think, reduce redundancy and the number of variables, expand the number and types of variables, get more flexibility in your analyses, and simultaneously address multiple levels of analysis. Let's consider each of these advantages in turn.
Measuring More Precisely
One concern that all researchers should share is for the accuracy and precision of the ways they measure the variables in their studies. Variables can be quantified as nominal, ordinal, interval, or ratio scales (for a readily available, brief review of these four concepts, see Brown, 2011a). Variables that are nominal, ordinal, or ratio scales in nature can typically be observed and quantified fairly easily and reliably. However, interval scales (e.g., proficiency test scores, questionnaire subsection scores) may be more problematic. That is why you should take special care in developing and piloting such measures and should always report the reliability in your study of the resulting scores as well as arguments that support their validity. One issue that is seldom addressed is the degree to which these "interval" scales are actually true interval scales. Can you say that the raw score points on a particular test actually represent equal intervals? If not, then defending the scores as an interval scale may not be justified. One solution to that problem is to use an advanced statistical technique called Rasch analysis. This form of analysis can help you analyze and improve any raw-score interval scales you use, but also as a byproduct, you can use Rasch analyses to convert those raw scores into logit scores, which arguably form a true interval scale. There are a number of other reasons why you might want to use Rasch analysis to better understand your scales and how precisely you are measuring the variables in your studies (Knoch & McNamara, Chapter 12 in this volume).
Thinking Beyond the Null Hypothesis Significance Test
In this volume, in Chapter 3 Plonsky examines the proper place of null hypothesis significance testing (NHST) and the associated p values, as well as the importance of examining the descriptive statistics that underlie the NHST and considering the statistical power of the study as well as the estimated effect sizes. As far back as the 1970s, I can remember my statistics teachers telling me that doing an analysis of variance (ANOVA) procedure and finding a significant result is just the beginning. They always stressed the importance of considering the assumptions and of following up with planned or post hoc comparisons, with plots of interaction effects, and with careful attention to the descriptive and reliability statistics. In the ensuing years, the importance of also considering confidence intervals (CI), power, and effect sizes (for more on these concepts see Plonsky, Chapter 3 in this volume; Brown 2007, 2008a, 2011b) has become increasingly evident. All of these advanced follow-up strategies are so much more informative than the initial result that it is downright foolish to stop interpreting the results once you
have found a significant p value. Similar arguments can be made for following up on initial multiple-regression results, on complex contingency table analyses, or on any other form of analysis you may perform. The point is that you should never stop just because you got (or didn't get) an initial significant p value. There is so much more to be learned from using follow-up analyses and more still from thinking about all of your results as one comprehensive picture of what is going on in your data.
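As a concrete illustration of one such follow-up, the sketch below computes Cohen's d, the standardized mean difference, from two sets of scores (a Python sketch for illustration only; the groups and scores are invented):

```python
import statistics as stats

def cohens_d(group1, group2):
    """Standardized mean difference using a pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = stats.stdev(group1), stats.stdev(group2)
    pooled_sd = (((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)) ** 0.5
    return (stats.mean(group1) - stats.mean(group2)) / pooled_sd

# Hypothetical posttest scores for a treatment and a comparison group
treatment = [78.0, 85.0, 82.0, 90.0, 74.0, 88.0]
comparison = [70.0, 80.0, 75.0, 72.0, 78.0, 69.0]
print(round(cohens_d(treatment, comparison), 2))  # 1.66
```

Unlike the bare p value, the d value tells you how large the difference is in standard deviation units, which is what a comprehensive picture of the results requires.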
Avoiding the Problem of Multiple Comparisons
Another important benefit of using advanced statistical analyses is that they can help you avoid the problem of multiple comparisons, also known technically as avoiding Type I errors (incorrect rejection of a true null hypothesis). This is a problem that arises when a researcher uses a univariate statistical test (one that was designed to make a single hypothesis test at a specified probability level within the NHST framework) multiple times in the same study with the same data. For more advanced ANOVA techniques with post hoc comparisons, or for studies with multiple dependent variables, multivariate ANOVA (or MANOVA) designs can greatly expand the possibilities for controlling or minimizing such Type I errors. These strategies work because they make it possible to analyze more variables simultaneously and adjust for multiple comparisons, thereby giving greater power to the study as a whole and avoiding or minimizing Type I errors. For more on this topic, see Plonsky, Chapter 3 in this volume, or Brown (1990, 2008b).
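The arithmetic behind this concern is simple to demonstrate. Assuming, for simplicity, independent tests, the Python sketch below shows how the familywise Type I error rate balloons across 20 tests run at alpha = .05, and how a Bonferroni-style adjustment (testing each comparison at alpha divided by the number of tests) reins it back in:

```python
def familywise_error(alpha, k):
    """Probability of at least one Type I error across k independent tests."""
    return 1.0 - (1.0 - alpha) ** k

# Twenty univariate tests at alpha = .05: roughly a 64% chance of at
# least one spurious "significant" result somewhere in the study.
print(round(familywise_error(0.05, 20), 2))  # 0.64

# Bonferroni correction: run each of the 20 tests at alpha / 20 instead.
print(round(familywise_error(0.05 / 20, 20), 3))  # 0.049
```

The multivariate designs discussed above achieve the same kind of control, but without paying the full price in power that a blunt per-test correction exacts.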
Increasing Statistical Power
Another way of looking at the issue of multiple statistical tests just described is that many of the more complex (and multivariate) statistical analyses provide strategies and tools for more powerful tests of significance when compared with a series of univariate techniques used with the same data. In the process, using these more omnibus designs, researchers are more likely to focus on CIs, effect sizes, and power instead of indulging in the mania for significance that multiple comparisons exemplifies (again see Plonsky, Chapter 3 in this volume).
In addition, as LaFlair, Egbert, and Plonsky point out in Chapter 4, the advanced statistical technique called bootstrapping provides a nonparametric alternative to the t test and ANOVA that can help to overcome problems of small sample sizes and nonnormal distributions, and do so with increased statistical power. Since many studies in our field have these problems with sample size and normality, bootstrapping is an advanced statistical technique well worth knowing about.
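The core resampling idea can be sketched in a few lines of Python (the scores below are invented; Chapter 4 itself demonstrates the procedure in SPSS and R). A percentile bootstrap draws many resamples with replacement, computes the statistic on each, and reads the confidence limits straight off the resulting distribution, with no normality assumption:

```python
import random

def bootstrap_mean_ci(sample, reps=5000, level=0.95, seed=42):
    """Percentile bootstrap confidence interval for the sample mean."""
    rng = random.Random(seed)
    n = len(sample)
    means = sorted(
        sum(rng.choice(sample) for _ in range(n)) / n for _ in range(reps)
    )
    lo_index = int(((1.0 - level) / 2.0) * reps)
    hi_index = int(((1.0 + level) / 2.0) * reps) - 1
    return means[lo_index], means[hi_index]

# A small, positively skewed sample of hypothetical test scores --
# exactly the situation in which bootstrapping earns its keep.
scores = [52, 55, 58, 60, 61, 63, 64, 70, 88, 95]
low, high = bootstrap_mean_ci(scores)
print(low, high)  # an interval bracketing the sample mean of 66.6
```

Because the interval is built from the data's own empirical distribution, it can be asymmetric when the data are skewed, which a textbook normal-theory CI cannot be.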
Broadening Your Research Perspective
More advanced statistical analyses will also encourage you to shift from a myopic focus on single factors or pairs of factors to examining multiple relationships among a number of variables. Thus, you will be more likely to look at the larger picture for patterns. Put another way, you are more likely to consider all parts of the picture at the same time, and might therefore see relationships between and among variables (all at once) that you might otherwise have missed or failed to understand.
Indeed, you will gain an even more comprehensive view of the data and results for a particular area of research by learning about and applying an advanced technique called meta-analysis. As Plonsky and Oswald explain (Chapter 6 in this volume), meta-analysis can be defined narrowly as “a statistical method for calculating the mean and the variance of a collection of effect sizes across studies, usually correlations (r) or standardized mean differences (d)” or broadly as “not only these narrower statistical computations, but also the conceptual integration of the literature and the findings that gives the meta-analysis its substantive meaning” (p. 106). Truly, this advanced form of analysis will give you the much broader perspective of comparing the results from a number of (sometimes contradictory) studies in the same area of research.

Aligning Your Research Analyses More Closely to the Way People Think
Because of their broadened focus, many advanced analyses more closely match the ways that you actually think (or perhaps should think) about your data. More specifically, language learning is complex and complicated to think about, and some of the advanced statistics can account for such complexity by allowing the study of multiple variables simultaneously, which of course provides a richer and more realistic way of looking at data than is provided by examining one single variable at a time or even pairs of variables.
In addition, Hudson (Chapter 5 in this volume) explains the importance of visually representing the data and results and doing so effectively. Two of the follow-up strategies mentioned earlier (plotting the interaction effects and CIs) are often effectively illustrated or explained in graphical representations (as line graphs and box-and-whisker plots, respectively). Indeed, thinking beyond the initial NHST and using more advanced statistical analyses will naturally tend to lead you to use tables and figures to visualize many relationships simultaneously. For example, a table of univariate follow-up tests adjusted for multiple comparisons puts all of the results in one place and forces you and your reader to consider them as a package; a factor analysis table shows the relationships among dozens of variables in one comprehensive way; a Rasch analysis figure can show the relationships between individual examinees’ performances and the item difficulty at the same time and on the same scale; and a structural equation model figure shows the relationships among all the variables in a study in an elegant, comprehensive picture. Such visual representations will not only help you interpret the complexity and richness of your data and results, but will also help your readers understand your results as a comprehensive set.

Reducing Redundancy and the Number of Variables
Few researchers think about it, but advanced statistical analyses can also help you by reducing the confusion of data that you may face. Since these advanced analyses often require careful screening of the data, redundant variables (e.g., two variables correlating at, say, .90, which means they are probably largely representing the same construct) are likely to be noticed and one of them eliminated (to avoid what is called multicollinearity). In fact, one very useful function of factor analysis (see Loewen & Gonulal, Chapter 9 in this volume) in its many forms is data reduction. For example, if a factor analysis of 32 variables reveals only eight factors, the researcher might want to consider the possibility that there is considerable redundancy in her data. As a result, she may decide to select only those eight variables with the highest loadings on the eight factors, or may decide to collapse (by averaging them) all of the variables loading together on each factor to create eight new variables, or may decide to use the eight sets of factor scores produced by the factor analysis as variables. Whatever choice is made, the study will have gone from 32 variables (with considerable dependencies, relationships, and redundancies among them) to eight variables (that are relatively orthogonal, or independent). Such a reduction in the number of variables will very often have the beneficial effect of increasing the overall power of the study as well as the parsimony in the model being examined (see Jeon, Chapter 7 in this volume).
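The redundancy screen described above can be sketched in a few lines; the four simulated "measures" and the .90 cutoff are illustrative only:

```python
# Screening a correlation matrix for redundant variables
# (pairs correlating above .90). Data are simulated.
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Simulate four measures; 'vocab2' is nearly a copy of 'vocab1'.
vocab1 = rng.normal(size=n)
vocab2 = vocab1 + rng.normal(scale=0.1, size=n)   # near-duplicate
grammar = rng.normal(size=n)
fluency = 0.5 * grammar + rng.normal(scale=1.0, size=n)

data = np.column_stack([vocab1, vocab2, grammar, fluency])
names = ["vocab1", "vocab2", "grammar", "fluency"]

r = np.corrcoef(data, rowvar=False)

# Flag pairs whose |r| exceeds .90 -- candidates for dropping or combining.
redundant = [(names[i], names[j])
             for i in range(len(names))
             for j in range(i + 1, len(names))
             if abs(r[i, j]) > 0.90]
print(redundant)  # only the vocab1/vocab2 pair should be flagged
```

Grammar and fluency correlate moderately but fall well below the cutoff, so only the near-duplicate vocabulary pair is flagged for elimination or combination.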
Expanding the Number and Types of Variables
Paradoxically, while reducing the number of variables, advanced statistical analyses can also afford you the opportunity to expand the number and types of variables
in your study in important ways. For instance, research books often devote considerable space to discussions of moderator variables and control variables, but simple univariate analyses do not lend themselves to including those sorts of variables. Fortunately, more complex analyses actually allow including such variables in a variety of ways. More precisely, multivariate analyses allow you to introduce additional moderator variables to determine the links between the independent and dependent variables or to specify the conditions under which those associations take place. Similarly, various multivariate analyses can be structured to include control variables, or to factor out associations between variables while examining still other associations (e.g., partial and semi-partial correlations, covariate analyses, hierarchical multiple regressions). Thus, moderator and control variables not only become a reality, but can also help us to more clearly understand the core analyses in a study.
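The logic of a partial correlation can be sketched numerically; the "age," "vocab," and "reading" variables below are simulated stand-ins, not data from any study:

```python
# A partial correlation: the association between two variables with a
# third (control) variable factored out. Data are simulated.
import numpy as np

rng = np.random.default_rng(7)
n = 500

# 'age' drives both 'vocab' and 'reading', creating a spurious link.
age = rng.normal(size=n)
vocab = 0.8 * age + rng.normal(scale=0.6, size=n)
reading = 0.8 * age + rng.normal(scale=0.6, size=n)

def partial_r(x, y, z):
    """r between x and y controlling for z, via the standard formula."""
    rxy = np.corrcoef(x, y)[0, 1]
    rxz = np.corrcoef(x, z)[0, 1]
    ryz = np.corrcoef(y, z)[0, 1]
    return (rxy - rxz * ryz) / np.sqrt((1 - rxz**2) * (1 - ryz**2))

simple = np.corrcoef(vocab, reading)[0, 1]
partial = partial_r(vocab, reading, age)
print(round(simple, 2), round(partial, 2))
# The sizable zero-order correlation shrinks toward zero once age is
# controlled, because age was its only source.
```

This is the basic move behind the covariate and hierarchical-regression strategies just mentioned: the control variable's contribution is removed before the association of interest is evaluated.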
Getting More Flexibility in Your Analyses
Most quantitative research courses offer an introduction to regression analysis, which is a useful form of analysis if you want to estimate the degree of relationship between two continuous scales (i.e., interval or ratio) or to predict one of those scales from the other. However, more advanced statistical analyses offer considerably more flexibility. For instance, multiple regression (see Jeon, Chapter 7 in this volume) allows you the possibility of predicting one dependent variable from multiple continuous and/or categorical independent variables. Discriminant function analysis (see Norris, Chapter 13 in this volume) makes it possible to predict a categorical variable from multiple continuous variables (or more accurately, to determine the degree to which the continuous variables correctly classify membership in the categories). Logistic regression makes it possible to predict a categorical variable such as group membership from categorical or continuous variables, or both. Loglinear modeling can be applied to purely categorical data to test the fit of a regression-like equation to the data. For excellent coverage of all of these forms of analysis, see Tabachnick and Fidell (2013).
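A quick sketch of one of these techniques, logistic regression, follows; the study-hours/pass data are simulated and the scikit-learn API shown is an assumption of this illustration, not a tool discussed in the chapter:

```python
# Logistic regression: predicting a categorical outcome (pass/fail)
# from a continuous predictor. Data are simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Simulated hours of study and a pass/fail outcome that depends on them.
hours = np.concatenate([rng.normal(5, 1.0, 50), rng.normal(10, 1.0, 50)])
passed = np.concatenate([np.zeros(50, dtype=int), np.ones(50, dtype=int)])

model = LogisticRegression().fit(hours.reshape(-1, 1), passed)

# Predicted class and pass probability for two new learners:
new = np.array([[4.0], [11.0]])
print(model.predict(new))                 # [0 1]
print(model.predict_proba(new)[:, 1].round(3))
```

The model returns both a predicted category and a probability of membership, which is what makes logistic regression so useful for group-membership questions.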
Other advanced statistical procedures provide the flexibility to look beyond simple relationships to patterns in relationships. For example, instead of looking at a correlation coefficient or a matrix of simple correlation coefficients, it is possible to examine patterns in those correlation coefficients by performing factor analysis, which can reveal subsets of variables in a larger set of variables that are related within subsets, yet are fairly independent between subsets. The three types of factor analysis (principal components analysis, factor analysis, and confirmatory factor analysis; see Chapter 9 in this volume for Loewen and Gonulal’s explanation of the differences) can help you understand the underlying pattern of relationships among your variables, and thereby help you to: (a) determine which variables are redundant and therefore should be eliminated (as described earlier); (b) decide which variables or combination of variables to use in subsequent analyses; and (c) item-analyze, improve, and/or validate your measures.
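The core idea behind principal components analysis can be sketched with an eigendecomposition of the correlation matrix; the six simulated variables and two underlying factors below are invented for illustration:

```python
# A rough numpy sketch of principal components analysis as data
# reduction: eigendecompose the correlation matrix and see how few
# components account for most of the variance. Data are simulated.
import numpy as np

rng = np.random.default_rng(3)
n = 300

# Six observed variables driven by just two underlying factors.
f1, f2 = rng.normal(size=(2, n))
obs = np.column_stack([
    f1 + 0.3 * rng.normal(size=n),
    f1 + 0.3 * rng.normal(size=n),
    f1 + 0.3 * rng.normal(size=n),
    f2 + 0.3 * rng.normal(size=n),
    f2 + 0.3 * rng.normal(size=n),
    f2 + 0.3 * rng.normal(size=n),
])

corr = np.corrcoef(obs, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]   # largest first

explained = eigvals / eigvals.sum()
print(explained.round(2))

# The first two components recover nearly all of the shared variance,
# signaling that six variables could be reduced to two.
print(round(explained[:2].sum(), 2))
```

In a full analysis one would go on to inspect loadings and rotate the solution, but even this bare version shows the pattern-finding role that factor-analytic techniques play.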
In contrast, cluster analysis is a “multivariate exploratory procedure that is used to group cases (e.g., participants or texts). Cluster analysis is useful in studies where there is extensive variation among the individual cases within predefined categories” (Staples & Biber, Chapter 11 in this volume, p. 243). Also useful is multiway analysis, which can help you study the associations among three or more categorical variables (see Tabachnick & Fidell, 2013, for more on multiway analysis).
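A minimal sketch of the clustering idea follows, using k-means on simulated "learner" score profiles; the chapter's examples use other tools and methods, so this only illustrates the general logic of grouping cases:

```python
# Cluster analysis with k-means (scikit-learn), grouping simulated
# "learners" by two score profiles.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)

# Two latent learner types: high-accuracy/low-fluency vs. the reverse.
type_a = rng.normal(loc=[80, 40], scale=5, size=(30, 2))
type_b = rng.normal(loc=[45, 75], scale=5, size=(30, 2))
scores = np.vstack([type_a, type_b])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scores)
labels = km.labels_

# Cases within each simulated type should land in the same cluster,
# even though the algorithm was never told the types existed.
print(labels[:30], labels[30:])
```

The point is that the grouping emerges from the cases themselves, which is what makes cluster analysis valuable when predefined categories hide extensive internal variation.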
Another form of analysis that provides you with considerable flexibility is
structural equation modeling (SEM), which is
a collection of analyses that can be used for many questions in L2 research. SEM can deal with multiple dependent variables and multiple independent variables, and these variables can be continuous, ordinal, or discrete [also known as categorical], and they can be indicated as observed variables (i.e., observed scores) or as latent variables (i.e., the underlying factor of a set of observed variables) (Mueller & Hancock, 2008; Ullman, 2006).

(Schoonen, Chapter 10 in this volume, p. 214)

SEM combines ideas that underlie many of the other forms of analysis discussed here, but can additionally be used to model theories (a) to investigate if your data fit them, (b) to compare that fit for several data sets (e.g., for boys and girls), or (c) to examine changes in fit longitudinally.
With regard to means comparisons, mixed effects models (see Cunnings & Finlayson, Chapter 8 in this volume), which by definition are models that include both fixed and random effects, are flexible enough to be used with data that are normally distributed or that are categorical (i.e., nonnumeric). In addition, mixed effects models are especially useful when designs are unbalanced (i.e., groups have different numbers of participants) or have missing data. Importantly, if you are studying learning over time, these models can accommodate repeated measures in longitudinal studies.
Simultaneously Addressing Multiple Levels of Analysis
Advanced statistical analyses, especially multivariate analyses, also encourage researchers to use more than one level of analysis. Indeed, these advanced analyses can provide multiple levels of analysis that help in examining data and the phenomena they represent in overarching ways. A simple example is provided by MANOVA, which is a first stage that can justify examining multiple univariate ANOVAs (with p values adjusted for the multiple comparisons) in a second stage. Stepwise regression or hierarchical/sequential versions of various analyses allow researchers to analyze predictor variables and combinations of variables in stages, even while factoring out another variable or combination of variables.
Similarly, Bayesian data analysis as Mackey and Ross apply it to item analysis in Chapter 14 (in this volume) not only provides an alternative to NHST ANOVA approaches, but in fact,
The conceptual difference between null hypothesis testing and the Bayesian alternative is that predictions about mean differences are stated a priori in a hierarchy of differences as motivated by theory-driven claims. In this approach, the null hypothesis is typically superfluous, as the researchers aim to confirm that the predicted order of mean differences is instantiated in the data. Support for the hierarchically ordered means hypothesis is evident only if the predicted order of mean differences is observed. The predicted and plausible alternative hypotheses thus must be expressed in advance of the data analysis—thus making the subsequent ANOVA confirmatory.

(Mackey & Ross, Chapter 14 in this volume, p. 334)

Clearly, this advanced alternative form of analysis not only provides a means for examining data hierarchically and with consideration to previous findings and/or theoretical predictions, but in fact, it also demands that the data be examined in that way from the outset.
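A loose numerical sketch of the ordered-means idea follows. This is not the Mackey and Ross procedure itself; it simply approximates posterior draws for three group means (all numbers invented) and asks how probable the theory-predicted ordering is:

```python
# Approximate posterior draws for three group means and estimate the
# probability of the a priori predicted order (low < mid < high).
# All summary statistics are hypothetical.
import numpy as np

rng = np.random.default_rng(11)

means = np.array([61.0, 66.0, 73.0])   # observed group means
sds = np.array([8.0, 8.0, 8.0])        # observed group SDs
ns = np.array([40, 40, 40])            # group sizes

# Crudely approximate each group's posterior for its mean as
# Normal(observed mean, SD / sqrt(n)) and draw jointly.
draws = rng.normal(means, sds / np.sqrt(ns), size=(20_000, 3))

# Proportion of joint draws in which the predicted order holds:
p_ordered = np.mean((draws[:, 0] < draws[:, 1]) & (draws[:, 1] < draws[:, 2]))
print(round(p_ordered, 2))
```

Instead of rejecting a null, the analysis directly quantifies support for the ordering the theory predicted in advance, which is the confirmatory spirit the quotation describes.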
What Are the Disadvantages of Using Advanced
Quantitative Methods?
So far, I have shown some of the numerous advantages of learning more about advanced statistical analyses. But given the investment of time and energy involved, the disadvantages of using these advanced techniques should be weighed as well. I will take up those issues next. By disadvantages, I mean the difficulties that are likely to be encountered in learning and using advanced quantitative methods like those covered in this book.
Larger Sample Sizes
Many of the advanced statistical procedures require larger sample sizes than the more traditional and simpler univariate analyses. The sample sizes often need to be in the hundreds, if not bigger, in order to produce meaningful and interpretable results. The central problem with applying many of these advanced statistics to small samples is that the standard errors of all the estimates will tend to be large, which may make analyzing the results meaningless.
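The standard-error problem is easy to see numerically; the SD of 15 below is a hypothetical value chosen only for illustration:

```python
# Why small samples hurt: the standard error of a mean shrinks only
# with the square root of n.
import math

sd = 15.0  # hypothetical score SD

for n in [25, 100, 400]:
    se = sd / math.sqrt(n)
    print(n, round(se, 2))
# 25 3.0
# 100 1.5
# 400 0.75
```

Quadrupling the sample size only halves the standard error, so estimates from samples in the hundreds are far more stable than those from a class-sized group of 25.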
Unfortunately, getting large sample sizes is often difficult because you will need to get people to cooperate and to get approval from human subjects committees. Getting people to cooperate is a problem largely because people are busy and, more to the point, they do not feel that your study is as important as you feel it is. Getting human subjects committees to approve your research can also be vexingly difficult because those committees are often made up of researchers from other fields who have little sympathy for or understanding of the problems of doing L2 research. Nonetheless, for those doing advanced statistical analyses, getting an adequate sample size is crucial, so the data gathering stage in the research process is an important place to invest a good deal of your time and energy.
Additional Assumptions
Another disadvantage of the more advanced statistical procedures is that they tend to require that additional assumptions be met. Where a simple correlation coefficient will have three assumptions, a multiple regression analysis will have at least five assumptions, two of which will require the data screening discussed in the next section. In addition, whereas for univariate statistics a good deal is known about robustness to violations of assumptions (e.g., it is known that ANOVA is fairly robust to violations of the assumption of equal variances if the cell sizes are fairly similar), less is known about such robustness in the more complex designs of advanced statistical procedures. For a summary of assumptions underlying univariate and some multivariate statistics, see Brown (1992), or for multivariate statistics, see the early sections of each of the chapters in Tabachnick and Fidell (2013).
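Checking assumptions of this sort can itself be automated to a degree. The sketch below (simulated groups, deliberately unequal in spread) uses two common scipy checks, Shapiro-Wilk for normality and Levene's test for equality of variances:

```python
# Checking two common assumptions before an ANOVA-type analysis:
# roughly normal distributions (Shapiro-Wilk) and equal group
# variances (Levene). The simulated groups deliberately differ in spread.
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)

group1 = rng.normal(loc=70, scale=10, size=40)   # tight spread
group2 = rng.normal(loc=75, scale=30, size=40)   # much wider spread

# Shapiro-Wilk: small p-values would signal nonnormality.
_, p_norm1 = stats.shapiro(group1)
_, p_norm2 = stats.shapiro(group2)

# Levene: a small p-value signals unequal variances, which should steer
# the analyst away from equal-variance ANOVA toward alternatives.
_, p_var = stats.levene(group1, group2)

print(round(p_norm1, 3), round(p_norm2, 3), round(p_var, 4))
```

With a variance ratio of nine to one, Levene's test flags the violation that an equal-variances analysis would otherwise silently absorb.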
Need for Data Screening
In analyzing whether the data in a study meet the assumptions of advanced
statistical procedures, data screening is often essential. For example, univariate normality (for each variable) and multivariate normality (for all variables taken together) are assumptions of a number of the more advanced forms of statistical analysis. Screening the data to see if these assumptions are met means examining the data for univariate and multivariate outliers, as well as examining skew and kurtosis statistics for each variable and sometimes looking at the actual histograms to ensure that they look approximately normal. Not only are such procedures tedious and time consuming, but also they may require you to eliminate cases that are outliers, change some of your data points to bring them into the distribution, or mathematically transform the data for one variable or more. Such moves are not difficult, but they are tedious. In addition, they are hard to explain to the readers of a study in applied linguistics and may seem to those readers as though you are manipulating your data. Worse yet, moves like mathematical transformations take the analysis one step away from the original data, which may start to become uncomfortable even for you (e.g., what does the correlation mean between a normally distributed scale and one transformed with its natural log, and how do you explain that to your readers?). Nonetheless, the assumptions of advanced procedures and the subsequent data screening may make such strategies absolutely necessary.

Complexity of Analyses and Interpretations
There is no question that advanced statistical techniques, especially multivariate ones, are more difficult to analyze and interpret. First, because they involve higher-level mathematics than univariate statistics, you may find yourself learning things like matrix algebra for the first time in your life. Second, because many of the analyses involve tedious recursive procedures, it is absolutely essential to use statistical computer programs (many of which are very expensive) to analyze the data. Third, the results in the computer output of advanced statistical techniques, especially multivariate ones, are often much more difficult to interpret than those from simpler univariate statistical analyses. In short, as Tabachnick and Fidell (2013) put it: “Multivariate methods are more complex than univariate by at least an order of magnitude” (p. 1).
Are the Disadvantages Really Disadvantages?
Fortunately, I have noticed over the years that the disadvantages of learning and using advanced quantitative methods most often lead to long-term advantages.
Larger Sample Sizes
For example, the need to obtain large sample sizes forces you to get responsibly large sample sizes. These large sample sizes lead in the long run to more stable results, a higher probability of finding significant results if they exist, more powerful results, and ultimately to more credible results in your own mind as well as in the minds of your readers.

Additional Assumptions
Checking the more elaborate assumptions of advanced statistical tests forces you to slow down at the beginning of your analyses and think about the descriptive statistics, the shapes of the distributions involved, the reliability of various measurements, the amounts of variance involved and accounted for, the degrees of redundancy among variables, any univariate or multivariate outliers, and so forth. Ultimately, all of this taken together with the results of the study can and should lead to greater understanding of your data and results.
Need for Data Screening
The need for data screening similarly forces you to consider descriptive statistics, distributions, reliability, variance, redundancy, and outliers in the data, but at a time when something can be done to make the situation better by eliminating outliers or bringing them into the relevant distribution, by transforming variables that are skewed, and so forth. Even if you cannot fix a problem that you have noticed in data screening, at the very least, you will have been put on notice that a problem exists (or an assumption has been violated) such that this information can be taken into account when you interpret the results later in the study.
Complexity of Analyses and Interpretations
In discussing the complexity issue, I mentioned earlier that Tabachnick and Fidell (2013) said that, “Multivariate methods are more complex than univariate by at least an order of magnitude.” But it is worth noting what they said directly after that: “However, for the most part, the greater complexity requires few conceptual leaps. Familiar concepts such as sampling distributions and homogeneity of variance simply become more elaborate” (p. 1). Moreover, given the advantages of using advanced statistical techniques, they may well (a) force you to learn matrix algebra for the first time in your life, which will not only make it possible for you to understand the more advanced statistics, but also make the math underlying the simpler statistics seem like child’s play; (b) motivate you to find a grant to pay for the computer software you need, or some other way to get your institution to pay for it, or indeed, to finally sit down and learn R, which is free; and (c) push you to finally get the manuals and/or supplementary books you need to actually understand the output and results of your more elaborate statistical analyses, and again, doing so will make the output from simpler statistical analyses seem like child’s play. In short, the added complexity involved in advanced statistical analyses is not all bad. Indeed, it can lead you to exciting places you never thought you would go.
Conclusion
In writing this chapter, I wrestled with using the word advantages. Perhaps it is better to think about the advanced procedures described here as opening up options rather than as having advantages—but then it occurred to me that people with those options will have distinct advantages, so I stayed with the idea of advantages.

That is not to say that using advanced statistics, especially multivariate analyses, for every study will be the best way to go. For example, I once had a student who hated statistics so much that he set out to write a paper that used only descriptive statistics and a single t-test, and he did it, writing an elegant, straightforward, and interesting paper. Simple as it was, he was using exactly the right tools for that research project.
However, learning new, advanced statistical techniques can help you to stay interested and up-to-date in your research. Having multiple options can also help you avoid getting stuck in a statistical rut. For instance, I know of one researcher in our field who clearly learned multiple regression (probably for her dissertation) and has used that form of analysis repeatedly and almost exclusively across a number of studies. She is clearly stuck in a statistical rut. She is holding a hammer, so she uses it for everything, including screws. I just wish she would extend her knowledge to include some other advanced statistical procedures, especially extensions of regression like factor analysis or SEM.

The bottom line here is that advanced statistics like those covered in this book can be useful and even exciting to learn, but the harsh reality is that these forms of analysis will mean nothing without good ideas, solid research designs, reliable measurement, sound data collection, adequate data screening, careful checking of assumptions, and comprehensive interpretations that include all facets of the data, their distributions, and all of the statistics in the study.
Fortunately, you have this book in your hands. I say fortunately because this collection of chapters is a particularly good place for L2 researchers to start expanding their knowledge of advanced statistical procedures: It covers advanced statistical techniques; it was written by L2 researchers; it was written for L2 researchers; and it contains examples drawn from L2 research.
Good researching!
References
Brown, J. D. (1990). The use of multiple t tests in language research. TESOL Quarterly, 24(4), 770–773.
Brown, J. D. (1992). Statistics as a foreign language—Part 2: More things to look for in reading statistical language studies. TESOL Quarterly, 26(4), 629–664.