REGIONAL VARIATION IN WRITTEN AMERICAN ENGLISH
The first study of its kind, Regional Variation in Written American English takes a corpus-based approach to map over 100 grammatical alternation variables across the United States. A multivariate spatial analysis of these maps shows that grammatical alternation variables follow a relatively small number of common regional patterns in American English, which can be explained based on both linguistic and extra-linguistic factors. Through this rigorous analysis of extensive data, Grieve identifies five primary modern American dialect regions, demonstrating that regional variation is far more pervasive and complex in natural language than is generally assumed. The wealth of maps and data and the ground-breaking implications of this volume make it essential reading for students and researchers in linguistics, English language, geography, computer science, sociology, and communication studies.
• Identifies and maps regional linguistic variation in written Standard English for the first time.
• Introduces a corpus-based approach to dialectology.
• Presents a statistical method for identifying individual and common patterns of regional variation.
Jack Grieve is Senior Lecturer in Forensic Linguistics in the School of Languages and Social Sciences at Aston University in Birmingham, England. He holds a Ph.D. in Applied Linguistics from Northern Arizona University, where he studied quantitative corpus linguistics under the supervision of Professor Douglas Biber. He was also a postdoctoral research fellow in Professor Dirk Geeraerts’s Quantitative Lexicology and Variational Linguistics research unit at the University of Leuven in Belgium.
a broad range of topics and approaches, including syntax, phonology, grammar, vocabulary, discourse, pragmatics and sociolinguistics, and is aimed at an international readership.
Already published in this series:
Irma Taavitsainen and Päivi Pahta (eds.): Medical Writing in Early Modern English
Colette Moore: Quoting Speech in Early English
David Denison, Ricardo Bermúdez-Otero, Chris McCully and Emma Moore (eds.): Analysing Older English
Jim Feist: Premodifiers in English: Their Structure and Significance
Steven Jones, M. Lynne Murphy, Carita Paradis and Caroline Willners: Antonyms in English: Construals, Constructions and Canonicity
Christiane Meierkord: Interactions across Englishes: Linguistic Choices in Local and International Contact Situations
Haruko Momma: From Philology to English Studies: Language and Culture in the Nineteenth Century
Raymond Hickey (ed.): Standards of English: Codified Varieties Around the World
Benedikt Szmrecsanyi: Grammatical Variation in British English Dialects: A Study in Corpus-Based Dialectometry
Daniel Schreier and Marianne Hundt (eds.): English as a Contact Language
Bas Aarts, Joanne Close, Geoffrey Leech and Sean Wallis (eds.): The Verb Phrase in English: Investigating Recent Language Change with Corpora
Martin Hilpert: Constructional Change in English: Developments in allomorphy, word formation, and syntax
Jakob R. E. Leimgruber: Singapore English: Structure, Variation and Usage
Christoph Rühlemann: Narrative in English Conversation
Dagmar Deuber: English in the Caribbean: Variation, Style and Standards in Jamaica and Trinidad
Eva Berlage: Noun Phrase Complexity in English
Nicole Dehé: Parentheticals in Spoken English: The Syntax-Prosody Relation
Jock Onn Wong: English in Singapore: A Cultural Analysis
Anita Auer, Daniel Schreier and Richard J. Watts: Letter Writing and Language Change
Marianne Hundt: Late Modern English Syntax
Irma Taavitsainen, Merja Kytö, Claudia Claridge, and Jeremy Smith: Developments in English: Expanding Electronic Evidence
Arne Lohmann: English Co-ordinate Constructions: A Processing Perspective on Constituent Order
John Flowerdew and Richard W. Forest: Signalling Nouns in English: A corpus-based discourse approach
Jeffrey P. Williams, Edgar W. Schneider, Peter Trudgill, and Daniel Schreier: Further Studies in the Lesser-Known Varieties of English
Nuria Yáñez-Bouza: Grammar, Rhetoric and Usage in English: Preposition Placement 1500–1900
Jack Grieve: Regional Variation in Written American English
Earlier titles not listed are also available
Cambridge University Press is part of the University of Cambridge.
It furthers the University’s mission by disseminating knowledge in the pursuit of education, learning and research at the highest international levels of excellence.
Cambridge University Press has no responsibility for the persistence or accuracy
of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
For Emily
Contents
3.2 Alternation variable selection and measurement
5.3 Spatially autocorrelated linguistic data matrix
5.5.2 Introduction to fuzzy cluster analysis
6.3 Comparison of internal and external explanations
6.4 Comparison to previous American dialect surveys
Appendices
Appendix C: Grammatical alternation variables: descriptive statistics
List of figures
1.1 American dialect regions: Linguistic Atlas of United States
1.2 American dialect regions: Dictionary of American English
1.3 American dialect regions: Atlas of North American English
4.1 Histograms for the variants of selected alternations
4.4 Anyone and Anybody simplified local spatial autocorrelation
4.5 Comparison of spatial weights matrices for Anyone/Anybody
4.6 Comparison of spatial weights matrices for Do Not
5.1 Interpolated local spatial autocorrelation map for
6.1 Average formality factor scores (based on Factors 1, 2, 3)
6.10 Phonetic Factor 1 (Atlas of North American English reanalysis)
6.11 Phonetic Factor 2 (Atlas of North American English reanalysis)
6.12 Phonetic Factor 3 (Atlas of North American English reanalysis)
6.13 Phonetic Factor 4 (Atlas of North American English
5.1 Preliminary factor analyses
6.3 Relative frequency of selected alternations PMW in COCA
I began working on the research reported in this book ten years ago as part of my Ph.D. dissertation. I had finished my M.A. at Simon Fraser University in Vancouver with Paul McFetridge and Maite Taboada, and moved to Northern Arizona University in Flagstaff to work with Doug Biber. I was incredibly lucky to be supervised by Doug, who more than anyone else is responsible for where I am today as an academic. His influence can be seen throughout this book, both in the methods I have applied and the manner in which I have interpreted my results. I am also grateful for the support and guidance I received from the members of my doctoral committee, Bill Crawford, Ray Huang, and Randi Reppen.
Following my Ph.D., I became a researcher at the Quantitative Lexicology and Variational Linguistics (QLVL) research unit at the University of Leuven, headed by Dirk Geeraerts and Dirk Speelman, whose theoretical and methodological outlook on linguistics has also shaped this book. While at QLVL, I worked closely with Costanza Asnaghi and Tom Ruette. Many of the ideas and methods described in this book were developed during conversations with them, as well as with other members of QLVL, including Alena Anishchanka, Dirk De Hertog, Kris Heylen, Natalia Levshina, Pedro Pulquério Vieira, Eline Zenner, and Weiwei Zhang.
After Leuven, I took up a lectureship at Aston University in Birmingham as a member of the Centre for Forensic Linguistics (CFL), which is where this book was written. I am particularly grateful to Tim Grant, the head of CFL, for his constant support, as well as my other colleagues at CFL, including Kate Haworth, Krzys Kredens, and Andrea Nini, as well as Malcolm Coulthard, who founded the unit. I would also like to thank all my colleagues in the School of Languages and Social Sciences, especially Judith Baxter, Urzsula Clark, Simon Green, Pam Moores, Garry Plappert, and Gertrud Reershemius.
The research reported in this book has also benefited from discussions with a number of linguists and social scientists from around the world, especially the dialectometrists Wilbert Heeringa, John Nerbonne, Benedikt Szmrecsanyi, and Martijn Weeling, as well as with Federica Barbieri, Eric Friginal, Diansheng Guo, Roland Van Hout, Dan Johnson, Alice Kasakoff, Yuri Thierein, and Bert Vaux. I would also like to thank Mark Davies and Bill Labov for sharing their data with me, as well as their thoughts on this project.
This book would also not have been possible without the support of Cambridge University Press, especially Helen Barton and Merja Kytö. I am grateful for the confidence they have had in this project, and their feedback throughout this process has been invaluable. I would also like to thank Helena Dowson, Bethany Gaunt, Christina Sarigiannidou, and Martin Barr for their help preparing this book for publication.
Finally, I would like to thank my family, especially my sister, Meg, my father, Tom, and my mother, Paula, who passed away while I was studying for my Ph.D. I would also like to thank my father’s partner Linda and my parents-in-law Bob and Kathleen for their love and support. Most of all, I must thank my wife, Emily, to whom I dedicate this book. I met Emily in Flagstaff and we were married in Leuven. She has read every line of this book, most several times, and we have discussed every part of this research together. This book would not have been possible without her.
Chapter 1
Introduction
The goal of this book is to map regional linguistic variation in written American English. To investigate this topic, a large corpus of modern American letters to the editor was collected from hundreds of cities from across the United States. This corpus was then used to map hundreds of measurements of grammatical variation in American English for the first time. Statistical analyses of these maps found that regional grammatical variation exists in written American English and that most grammatical variables follow only one of a few basic regional patterns. In addition, five modern American dialect regions were identified: the Northeast, the Southeast, the Midwest, the South Central States, and the West. These results challenge standard theories of American dialect regions and show that regional linguistic variation is far more complex than is generally assumed. This chapter situates this study by reviewing previous research in American dialectology and by presenting an outline for the rest of this book.
1.1 American dialectology
The first large-scale survey of regional dialect variation in American English was the Linguistic Atlas of the United States and Canada. As recounted by the director of the Atlas, Hans Kurath (Kurath et al., 1939), the project was first proposed in December 1928 by members of the Modern Language Association, inspired by the national dialect surveys being conducted across Europe at the turn of the century. A committee chaired by Charles C. Fries and including Kurath was formed to consider the feasibility of such a project. In January 1929, unaware that this committee had been formed, E. H. Sturtevant at Yale proposed a similar project to the American Council of Learned Societies. The two groups were united at a meeting in February 1929 that was organized by Fries, where a formal proposal for the project was drafted. Sturtevant then presented this proposal in March 1929 to the Executive Committee of the American Council of Learned Societies, who agreed to fund a conference to further discuss the project. This conference took place that summer at Yale and resulted in the appointment of a new committee, chaired by Hans Kurath and including Leonard Bloomfield among other top American linguists of the time, that was charged with presenting a proposal and a budget for the Linguistic Atlas of the United States and Canada to the Executive Committee. The plan was approved by the Council in January 1930, although they recommended that the committee first conduct a survey of New England before a continental survey was begun.
The Linguistic Atlas of New England began in 1931, with Kurath as the director and Miles L. Hanley as associate director. Data collection for the survey was completed in 1933, with 416 informants in 213 communities from across New England, as well as New Brunswick, having been interviewed by 9 fieldworkers, including Guy Lowman, the primary fieldworker, and Kurath, as well as noted linguists Bernard Bloch and Martin Joos. The fieldworkers gathered data using a standardized questionnaire designed by Kurath to elicit upwards of 800 different items, especially words used to discuss common subjects and regional activities, such as geography, weather, time, flora, fauna, farming, mining, and forestry. Grammatical data on a limited number of function word alternations (e.g. who/whom, ran across/into) and morphological alternations (e.g. dived vs. dove) were also collected. In addition, responses were phonetically transcribed by the fieldworkers so that phonological features could be analyzed. In most communities only two informants were interviewed: an elderly informant from an old, local family and a middle-aged and more well-educated informant from a local family. Informants with university educations were also interviewed in larger urban areas. This approach to selecting informants was taken because it was only possible to interview a small number of informants at each location, making it necessary to focus on informants who were most likely to use regional forms. In addition, because Kurath was specifically interested in identifying historical patterns of regional variation, non-mobile, older, rural male informants were generally preferred.
Following data collection, maps were produced showing the distribution of each of the linguistic forms. Because the maps were often quite unclear, with different variants dispersed across New England, Kurath also manually plotted linguistic borders known as isoglosses to divide the region into sub-regions where the different forms predominated. This allowed Kurath to make sense of the complex data he was faced with and to focus his analysis on the underlying patterns of regional variation in these maps. In addition, the maps for various linguistic variables were compared in order to identify bundles of isoglosses, that is, isoglosses for multiple variables that follow similar paths. In this way, common patterns of regional linguistic variation were identified and used to locate dialect regions. The major finding of this survey was that there were two principal dialect regions in New England: eastern New England and western New England, with the border between these regions running through Connecticut, Massachusetts, and Vermont. For example, the survey found that post-vocalic /r/ deletion, the pronunciation of library with two syllables, and the use of the term comforter rather than quilt were all features found primarily in eastern New England. These patterns were explained by appealing to historical settlement patterns, as the eastern region had been settled by colonists originating from the Atlantic coast whereas the western region had been settled by colonists originating from the Lower Connecticut River Valley and the Long Island Sound. The methods and results of this survey were published in three volumes beginning in 1939 (Kurath et al., 1939–1943) along with a handbook (Kurath et al., 1939).
After data collection was completed for the Linguistic Atlas of New England, Hans Kurath prepared to survey the rest of the Atlantic Coast for the Linguistic Atlas of the Middle and South Atlantic States. However, the Great Depression and the lack of funding and interest outside New England only allowed Kurath to send Guy Lowman into the field (Kretzschmar et al., 1993). From 1933 to 1938, Lowman traveled the eastern seaboard conducting interviews in communities from Delaware to northern Florida (McDavid & O’Cain, 1979). For these investigations, Lowman used the same basic procedure and questionnaire that was used in New England, although Kurath had modified the questionnaire, adding and removing certain forms. Due to a lack of local funds and interest, Kurath put the South Atlantic survey on hold in 1939 and sent Lowman to begin a survey of the Middle Atlantic States. Over the next two years Lowman collected data from Pennsylvania, West Virginia, New Jersey, eastern Ohio, and New York City (Kretzschmar et al., 1993). Tragically, in the summer of 1941 Lowman died in a car accident while collecting data around the Finger Lakes in Upstate New York (Kretzschmar et al., 1993). Following Lowman’s death, Kurath selected Raven I. McDavid, who had been recruited by Bloch at the 1937 Linguistic Institute, to complete data collection for the Middle and South Atlantic States (Kretzschmar et al., 1993). Data collection was put on hold in 1942, when McDavid joined the United States Army’s Intensive Language Program (Kretzschmar et al., 1993), but McDavid returned to the field in 1945 and by 1949 over 1,200 informants had been interviewed from across the Middle and South Atlantic States (Kurath & McDavid, 1961).

Figure 1.1 American dialect regions: Linguistic Atlas of the United States and Canada
The first major study to analyze the data from the Middle and South Atlantic States, as well as the data from New England, was Kurath’s Word Geography of the Eastern United States, published in 1949, which mapped lexical variation from New England to South Carolina. Kurath identified three major dialect regions in the Eastern United States by plotting and comparing isoglosses, a method that he had extended since the survey of New England and that was becoming the standard approach in American dialectology. These dialect regions are mapped in Figure 1.1 and consist of the North (where words such as pail and brook are more common), the Midland (where words such as skillet and snake feeder are more common), and the South (where words such as snap bean and turn of wood are more common). In addition, Kurath also identified internal divisions within these three regions, including a distinction between the Northern and Southern Midland. Kurath considered the identification of a distinct Midland region as the main descriptive contribution of the study.
Once again, Kurath explained these dialect patterns based on historical settlement patterns. He argued that the Northern dialect region corresponds to the area settled by British colonists originating in New England, who moved through New York and into northern Pennsylvania and Ohio, that the Midland dialect region corresponds to the region settled by British, Scotch-Irish, and German colonists originating in Philadelphia, who moved through southern Pennsylvania into western Virginia and the Lower Midwest, and that the Southern dialect region corresponds to the region settled by British colonists originating in Virginia and the Carolinas, who moved into the Deep South. Because these three groups of settlers had different linguistic and cultural backgrounds and were largely independent of each other, over time they developed distinct forms of speech, which were the foundation for the contemporary dialect regions that Kurath observed. This settlement theory of American dialect regions has dominated the field ever since.
The data from New England and the Middle and South Atlantic States, which by then contained data for over 1,400 informants, was also the basis for E. Bagby Atwood’s A Survey of Verb Forms in the Eastern United States, published in 1953. This book represents the first and until now the only American dialect survey to focus on grammatical variation. Atwood analyzed variation in the expression of tense (e.g. boiled/boilt), the present perfect (e.g. I have/am been), the present participle (e.g. singing/singin’), the infinitive (e.g. to tell/for to tell), verb agreement (e.g. you were/was), and verb negation (e.g. ain’t, hain’t), as well as the use of certain highly marked verbal constructions such as the might could double modal construction (e.g. I might could do it) and the belongs to be construction (e.g. he belongs to be careful). In line with Kurath, Atwood found evidence for the three-way division of the Eastern United States into Northern, Midland, and Southern dialect regions. For example, clim as the past tense of climb was identified as a Northern form, boilt as the past tense of boil was identified as a Midland form, and the belongs to be construction was identified as a Southern form. Overall, however, Atwood presents a somewhat different picture of the Midland than Kurath, noting that the Midland was characterized more by the absence of distinct forms than by their presence, as was the case for the North and the South. Atwood also discussed the social distribution of these non-standard forms, foreshadowing the shift toward social variation that was about to take place in dialectology, led by William Labov (1963, 1966a, 1969, 1972).
This three-way division of American dialect regions was also supported by Kurath and McDavid’s analysis of phonetic and phonological variation in the Eastern data set, which at that time represented the language of over 1,500 informants. Although the same basic patterns of regional variation were identified, Kurath and McDavid found that the border between the Northern Midland and the Southern Midland was stronger than the division Kurath had identified in his lexical analysis of the same data set (see also McDavid, 1993). Furthermore, while pervasive regional patterns in pronunciation were identified, like Atwood, Kurath and McDavid also found considerable variation across social groups. The results of this study were presented in The Pronunciation of English in the Atlantic States, published in 1961, which was the last major study based on the data gathered for the Linguistic Atlas Project in the Eastern United States. Kurath would pass away a few years later in 1964 and McDavid would take over directorship of the project, but momentum slowed. Partial records for the Middle and South Atlantic States were finally published in 1979 (McDavid & O’Cain, 1979) and, following McDavid’s death in 1984, a handbook was published in 1993, led by William Kretzschmar, who took over the directorship of the project and who maintains the records today. An Atlas of the Middle Atlantic States was never published.
While Kurath and his team were surveying the Eastern United States, affiliated regional surveys were being conducted elsewhere in the United States. As early as 1938, data was being collected for the Linguistic Atlas of the North Central States, under the directorship of Albert H. Marckwardt (Allen, 1973). Although at first the survey covered the entire Midwest, at a meeting in New York City in 1948 attended by Kurath and McDavid, Marckwardt agreed that the Linguistic Atlas of the Upper Midwest should be conducted as a separate survey, focusing on the states of Minnesota, Iowa, Nebraska, South Dakota, and North Dakota. Directorship of the survey was awarded to Harold B. Allen, who had been trained by Kurath and Bloch at the 1939 Linguistic Institute. Marckwardt continued to collect and analyze data from the Eastern Midwest and by 1978 over 550 informants had been interviewed in Michigan, Ohio, Indiana, Kentucky, Illinois, Wisconsin, and Southern Ontario (Kurath, 1979; Labov et al., 2006). While no atlas was ever published for this region, smaller studies (e.g. Marckwardt, 1957) found that the division between the Northern and Midland dialect regions in the Eastern United States extended into the Midwest, with the border between the two regions running through the northern third of Ohio, Indiana, and Illinois. These results agreed with Alva L. Davis’s 1948 doctoral dissertation, Word Atlas of the Great Lake Region, which was based on a postal questionnaire, and were replicated in Roger Shuy’s doctoral dissertation, which focused on the boundary between the Northern and Midland dialect regions in Illinois (Shuy, 1962).
Allen’s survey progressed independently in the Upper Midwest, using an extended version of Kurath’s basic questionnaire, with a total of 208 informants interviewed and recorded between 1949 and 1957. In addition, 1,064 total informants responded to a postal questionnaire following the approach to data collection developed by Davis. The results of the survey were published by Allen as the Linguistic Atlas of the Upper Midwest in three volumes from 1973 to 1976 (see also Allen, 1952, 1958, 1959, 1964). Based on an analysis of lexical, phonological, and morphological features, Allen concluded that the distinction between the Northern and Midland dialect regions also extended through the Upper Midwest. Like Kurath, Allen explained these patterns based on historical settlement patterns, with settlers of the northern half of the Upper Midwest coming from New York and northern Ohio, and with settlers of the southern half of the Upper Midwest coming from the Mid-Atlantic States and southern Ohio along the Old National Trail.
Around the same time as these Midwestern surveys were being conducted, E. Bagby Atwood, who had previously analyzed verb forms in the Eastern United States, was surveying the vocabulary of Texas and the South Central States including Louisiana, Arkansas, Oklahoma, and New Mexico, which was reported in The Regional Vocabulary of Texas, published in 1962. The data for this survey was gathered by Atwood and his students and colleagues during the 1950s, using an extended version of Kurath’s questionnaire. By comparing his results to Kurath’s, Atwood showed that Southern dialect words were relatively common across the South Central States, as were Midland dialect words and Spanish borrowings to a lesser extent, reflecting the mixed settlement history of this region. Based on this evidence, Atwood argued that the language spoken in Texas and the South Central States was a form of Southern English.
The last of the affiliated regional surveys to be completed was the Linguistic Atlas of the Gulf States. Primary fieldwork took place under the directorship of Lee Pederson between 1973 and 1979, during which 1,121 informants were interviewed and recorded by 256 field investigators in 8 southern states: Tennessee, Georgia, Florida, Alabama, Mississippi, Louisiana, Arkansas, and eastern Texas. The results were published in seven volumes from 1986 to 1993 (Pederson, 1986; Pederson et al., 1986–1993). The basic finding of the survey was that there were two major dialect regions in the Gulf States, the Upland and the Lowland, with the border between these two regions running through northern Georgia, Alabama, and Mississippi. These dialect regions correspond to the Southern Midland and the South as identified by Kurath and his colleagues in the Eastern United States, showing that these Eastern dialect regions had been extended across the South through settlement, much as they had been extended across the Midwest.
Other regional surveys affiliated with the Linguistic Atlas of the United States and Canada were begun, but none were ever completed or resulted in major publications. Most notably, in the Far West, data collection was begun and the preliminary results were reported for two surveys. Data collection for the Linguistic Atlas of the Pacific West was conducted in California and Nevada between 1952 and 1959, with initial analyses showing a distinction between the language of Northern and Southern California (Reed, 1954). Similarly, data collection for the Linguistic Atlas of the Pacific Northwest was conducted between 1953 and 1963 (Reed, 1956, 1957, 1961), with initial results showing, for example, that Northern and North Midland forms were common across the region, whereas Southern terms were relatively rare. Other unfinished regional surveys included the Linguistic Atlas of Oklahoma, whose preliminary data was analyzed by Atwood in his study of the vocabulary of the South Central States, and the Linguistic Atlas of the Rocky Mountain States, for which data collection reportedly began in 1988 (Labov et al., 2006). Aside from a small amount of data collected in Ontario and New Brunswick, Canada was never mapped as part of this survey.
The various regional surveys associated with the Linguistic Atlas of the United States and Canada mapped much of the United States, although there were several gaps in the analysis, especially in the West, and given the many years over which the surveys were completed, it is unclear how comparable these results are, or if taken together what era they could be said to represent. Nevertheless, the major patterns of regional linguistic variation identified by these surveys are combined and presented in Figure 1.1, which represents a synthesis and an interpolation of the results of these various regional surveys. The dialect regions identified in the Eastern United States are based directly on the results of the surveys described above. No data, however, is available for Missouri or parts of West Virginia, Kentucky, and Florida. In these regions the dialect borders are estimated based on the surrounding area. In the West, very little data was collected, but according to Kurath (1972) preliminary analyses on the West Coast demonstrated that the border between the North and the Midland extends to the Pacific Northwest, which is reflected in Figure 1.1. Although Kurath and his colleagues never produced such a national map, this map is consistent with the results of their surveys and with Kurath’s view of American dialect regions. This map therefore represents a theory of what the Linguistic Atlas Project would have found had the various regional surveys been completed and combined.
The Linguistic Atlas of the United States and Canada was not the only attempt to map American English. Long before the possibility of a dialect atlas was discussed by the Modern Language Association and the American Council of Learned Societies, a dictionary of American English was proposed at the founding of the American Dialect Society in 1889. Although the Society published research on American regional lexicography in their journal Dialect Notes, which was first published in 1890, and later in the Publications of the American Dialect Society, data collection for a dictionary of American English was not begun in earnest until Fredric G. Cassidy was appointed as the editor of the dictionary in 1962. Fieldwork was conducted between 1965 and 1970, over which time 80 fieldworkers interviewed 2,777 informants in 1,002 communities. The fieldworkers used a questionnaire developed by Cassidy that contained over 1,800 questions relating primarily to rare and archaic vocabulary items, which resulted in over 20,000 different lexical items being elicited (Carver, 1987). The results of the survey were published as the Dictionary of American Regional English in seven volumes between 1985 and 2013 (Cassidy & Hall, 1985, 1991; Hall & Cassidy, 1996; Hall, 2002, 2012, 2013).
The primary purpose of the Dictionary of American Regional English was to identify and define regional vocabulary items from across the United States, rather than to map the dialect regions of American English. However, the dictionary was the basis for Craig Carver’s analysis of regional lexical variation in American English, American Regional Dialects: A Word Geography, published in 1987, which represents the first complete survey of regional linguistic variation in American English. In order to analyze the massive amounts of data gathered for the dictionary, Carver focused on analyzing sets of words in the aggregate. Specifically, Carver identified what he called dialect layers, which were defined based on sets of words that he judged to exhibit similar regional distributions. The degree to which a particular location was part of a particular dialect layer was then calculated as the percentage of the words associated with that dialect layer observed at that location. For example, Carver’s New England Layer is defined based on 45 lexical items, including use of the word grinder for a type of sandwich and rotary for a roundabout. Each location was then scored based on the percentage of these 45 lexical items that had been attested at that location. Lines were then drawn around the locations with the highest percentage of those words to map that layer, with the highest concentration of New England words occurring at locations in New Hampshire, Central Massachusetts, and Western Vermont. Carver mapped a large number of layers in this manner and then used these results to infer the locations of American dialect regions.
Figure 1.2 American dialect regions: Dictionary of American Regional English (map labels: New England, Upper North, Lower North, Upper South, Lower South)
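The layer scoring described above, in which each location receives the percentage of a layer's words attested there, can be illustrated with a minimal Python sketch. The function name `layer_scores`, the location names, and the placeholder words `word_c` and `word_d` are hypothetical and are not taken from Carver's data; only grinder and rotary are mentioned in the text, and the real New England Layer contains 45 items.

```python
from typing import Dict, Set


def layer_scores(layer_words: Set[str],
                 attested: Dict[str, Set[str]]) -> Dict[str, float]:
    """Return, for each location, the percentage of the layer's words
    that were attested at that location."""
    if not layer_words:
        raise ValueError("layer_words must not be empty")
    return {
        location: 100.0 * len(layer_words & words) / len(layer_words)
        for location, words in attested.items()
    }


# Hypothetical toy layer with four items (the real layer has 45).
toy_layer = {"grinder", "rotary", "word_c", "word_d"}

# Hypothetical attestations: the words elicited at each location.
attested = {
    "Location A": {"grinder", "rotary", "word_c", "unrelated_word"},
    "Location B": {"rotary"},
    "Location C": set(),
}

scores = layer_scores(toy_layer, attested)
for location, score in sorted(scores.items()):
    print(f"{location}: {score:.1f}% of layer words attested")
```

Under this sketch, mapping a layer would amount to drawing lines around the highest-scoring locations. That step was a matter of Carver's judgment and is not modeled here.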
Based on this approach to the analysis of regional linguistic variation, Carver identified two major dialect regions in the United States: the North and the South. In turn, Carver divided the North into three main sub-regions (the Upper North, the Lower North, and the West) and the South into two main sub-regions (the Upper South and the Lower South), as mapped in Figure 1.2, with Carver’s Lower North and Upper South sub-regions corresponding roughly to Kurath’s Northern Midland and Southern Midland sub-regions respectively. The identification of a Western sub-region is also notable, as this was the first time that a sufficient amount of data had been collected to allow for such a distinction to be made. Carver’s two-way division of American dialect regions between the North and the South clearly differs from Kurath’s three-way division between the North, the Midland, and the South, but it was not without precedent. According to Kurath (1949), before he began his surveys of the Eastern United States, it was generally assumed that the basic distinction in American English was between the North and the South. This is why Kurath considered the identification of the Midland in the Word Geography of the Eastern United States to be such an important discovery. The results of Carver’s analysis, however, directly support this older and simpler conception of American dialect regions.
[...] shifts. Unlike previous dialect surveys, the Atlas of North American English focused on speakers in urban areas, where most of the population resides, and sampled informants from across demographic groups, so as to obtain a more general picture of regional variation in Modern American English. Informants were also interviewed by telephone, which greatly facilitated data collection. Interviews were conducted between 1991 and 1999, with the final data set containing the language of 762 informants from across the United States and Canada. In addition, an acoustic analysis of the first and second formants of the vowels of American English was carried out based on recordings of the interviews with 439 informants. This allowed for differences in the pronunciation of vowels to be measured objectively and quantitatively, rather than relying on the manual categorization of vowels by fieldworkers and dialectologists. The results were then mapped and isoglosses were drawn using a formalized technique. Finally, bundles of isoglosses were identified in order to define the dialect regions of Modern American English.
Based on this analysis, Labov, Ash and Boberg identified four major American dialect regions, consisting of the North, the Midland, the South, and the West (see also Labov, 1991), as well as several sub-regions, as presented in Figure 1.3. A distinct Canadian region was also identified. The analysis of American dialect regions presented in the Atlas of North American English falls somewhere in between Kurath’s and Carver’s analyses. The atlas clearly identified three dialect regions in the Eastern United States, which is similar to Kurath’s analysis, but it also split the North Midland and South Midland, classifying the South Midland as part of the South, which is similar to Carver’s analysis of the Upper South. In fact, the atlas does not even identify the region covered by Kurath’s South Midland or Carver’s Upper South as a distinct sub-region within the South, which differentiates this analysis from the analyses of both Kurath and Carver. The basic difference between the dialect regions identified by these three surveys can therefore be reduced to the status of the area covered by what Kurath called the Midland: for Kurath, the North Midland and South Midland combine to form a single Midland dialect region; for Carver, the North Midland (i.e. the Lower North) is part of the North and the South Midland (i.e. the Upper South) is part of the South; and for Labov, Ash and Boberg, the North Midland is an independent dialect region (i.e. the Midland) and the South Midland is part of the South, although not a distinct sub-region within the South.
Figure 1.3 American dialect regions: Atlas of North American English
In order to explain the dialect regions they identified, Labov, Ash and Boberg referenced the same settlement patterns as Kurath. However, unlike previous American dialect surveys, which only provided external explanations for dialect regions, Labov and his coauthors also provided internal linguistic explanations for these patterns. In particular, the Northern Cities, Southern, and Canadian vowel chain shifts were identified as being the source of much of the variation in vowel systems that was observed across North American English. For example, the vowels used in the region around the Great Lakes at the core of their Northern dialect region were found to be undergoing what is known as the Northern Cities Shift, which begins with the fronting and raising of /ae/, followed by the fronting of /o/ and the lowering of /oh/, and the backing of /e/, /uh/ and /i/. Similar but distinct chains of inter-related vowel shifts were also found to characterize speech in the South and in Canada. In this way, Labov, Ash and Boberg not only identified common patterns of regional variation and explained these patterns based on external factors, such as settlement patterns, but they also offered a linguistic explanation for why particular sets of vowels exhibited similar patterns of regional variation.
Finally, the last major dialect survey of American English was Bert Vaux’s Harvard Dialect Survey, which began as a paper questionnaire distributed in Vaux’s “Dialects of English” course at Harvard in 1999. Vaux eventually expanded the survey and placed it online in 2002, where it was completed by more than 47,000 informants over the next year, demonstrating how much data can be gathered online. The online questionnaire contained 122 items relating to phonological, grammatical and lexical variation. Although the results of the survey were never formally published, maps plotting the answers to all 122 items on the questionnaire are available online. This data set, however, has never been subjected to a detailed analysis to identify common patterns of regional variation, and therefore has not contributed to the debate on the location of American dialect regions.
1.2 Outline
Despite the long and active history of research in American dialectology, there are numerous important questions that remain unanswered. Chief among these is the nature of regional grammatical variation, which has never been mapped across the United States and has been the subject of very little research over the past fifty years. There is also considerable disagreement about the nature of American dialect regions, especially concerning the status of the Midland. Furthermore, very little is known about the development of American dialects since the turn of the century. Finally, almost nothing is known about regional linguistic variation across registers, and in particular if and how regional variation is patterned in written and standard forms of American English. The present study addresses all of these issues through a quantitative analysis of regional grammatical variation in a 36 million word corpus of modern letters to the editor representing 240 cities from across the United States.
In addition to pursuing these new research questions, this book also presents a new quantitative approach to the analysis of regional linguistic variation, which consists of a corpus-based approach to data collection and a statistical approach to data analysis. As opposed to previous American dialect surveys, which have always been based on language elicited through questionnaires or linguistic interviews, this study is based on a corpus of naturally occurring texts. This book also introduces a quantitative approach to the analysis of regional linguistic variation, which mirrors the traditional approach of identifying isoglosses and isogloss bundles but which is based on a combination of spatial and multivariate statistical techniques.
Based on this new quantitative approach to dialectology, this study presents a modern picture of regional variation in American English that both expands and challenges traditional theories of American dialects. First, this analysis maps regional linguistic variation in written American English, showing for the first time that regional variation exists in writing. Second, this analysis maps regional grammatical variation across the United States, showing for the first time that regional grammatical variation is systematically patterned in American English. The results of this study therefore demonstrate that regional linguistic variation is pervasive in natural language, existing across both linguistic levels and registers. The analysis also identifies three common patterns of regional grammatical variation and uses this information to identify five basic dialect regions: the Northeast, the Midwest, the Southeast, the South Central States, and the West. This result contrasts with all previous American dialect surveys, which have always identified a distinction between Northern and Midland regions or sub-regions, as opposed to the distinction between Northeastern and Midwestern regions identified here. To account for these results, it is argued that American dialect regions are in the process of changing, reflecting changing cultural regions, with the traditional distinction between the North and the Midland currently being replaced by a modern distinction between the Northeast and the Midwest. This observation forms the basis of a proposal for a cultural theory of regional linguistic variation, which states that American dialect regions correspond to American cultural regions of the time. In addition, internal explanations for the common patterns of regional grammatical variation observed in this study are presented based on a linguistic analysis of the grammatical alternation variables that show similar patterns of regional variation. Based on this functional analysis, it is argued that there are formality differences in how letters to the editor from across the United States are written.
The rest of this book is organized as follows. Chapter 2 introduces and defends the corpus-based approach to dialectology and describes the 36 million word corpus of letters to the editor that is the basis for this study, including its design, compilation, and dimensions. Chapter 3 introduces the concept of a grammatical alternation variable and presents the 135 grammatical alternation variables that are the focus of this study, including a map for all 295 of their variants. Chapter 4 describes the spatial analysis of the maps for each of the grammatical alternation variables, which allows for significant underlying patterns of spatial clustering to be identified. Chapter 5 describes the multivariate analysis of this data set, which allows for common patterns of spatial clustering and American dialect regions to be mapped. Chapter 6 considers explanations for these results, both from a linguistic and an extra-linguistic perspective, and proposes a general theory of regional linguistic variation and change. Finally, Chapter 7 concludes the study with a summary and a discussion of the significance of both the findings and the methods, while suggesting directions for future research.
The basis for this study is a 36 million word regional corpus of modern American letters to the editor, which contains over 200,000 letters written by over 160,000 authors, representing 240 cities from across the contiguous United States. This chapter introduces the corpus, including a discussion of its design, compilation, and dimensions. In addition, the corpus-based approach to dialectology is introduced and defended, as this is the first time that a corpus-based approach to data collection has been used as the basis for an analysis of regional linguistic variation in American English.
2.1 Corpus-based dialectology
A corpus is a collection of naturally occurring language (Biber et al., 1998). To create a corpus, spoken or written language is sampled from a particular variety of language. The corpus is then used as a basis for describing that variety of language. Furthermore, corpora are often stratified in the sense that they are composed of smaller sub-corpora that each represent a sub-variety of the variety of language under analysis (Biber, 1993). The stratified corpus is then used as a basis for comparing these sub-varieties to identify patterns of linguistic variation within that variety of language.
Taking a corpus-based approach to regional dialectology entails creating a stratified corpus that represents how a particular variety of language is used at various locations across a region of interest. The language used at each of the locations is then compared based on the values of one or more linguistic variables to identify patterns of regional linguistic variation in that variety of language. For example, the dialect corpus analyzed in this study represents the modern letter to the editor register of American English as written in 240 cities from across the United States. Patterns of regional grammatical variation were then identified by measuring the values of grammatical alternation variables across these 240 city sub-corpora.
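As a rough illustration of what measuring an alternation variable across city sub-corpora might involve, the following Python sketch computes, for each city, the share of one variant among all occurrences of either variant. It rests on several assumptions: the passage does not specify the measurement procedure, the pair "do not" / "don't" is a hypothetical example rather than a variable confirmed by the text, and the function names and toy letters are invented for illustration.

```python
import re
from typing import Dict, List, Optional


def count_phrase(text: str, phrase: str) -> int:
    """Count case-insensitive, whole-word occurrences of a phrase."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def variant_proportion(letters: List[str],
                       variant_a: str,
                       variant_b: str) -> Optional[float]:
    """Share of variant_a among all occurrences of either variant.

    Returns None when neither variant occurs in the sub-corpus.
    """
    a = sum(count_phrase(text, variant_a) for text in letters)
    b = sum(count_phrase(text, variant_b) for text in letters)
    return a / (a + b) if (a + b) else None


# Hypothetical city sub-corpora: each city maps to a list of letters.
city_corpora: Dict[str, List[str]] = {
    "City A": ["I do not agree.", "We don't know."],
    "City B": ["You do not say. I do not mind.", "Do not go. We don't."],
    "City C": ["Nothing relevant here."],
}

# One value per city for the hypothetical "do not" / "don't" alternation.
proportions = {
    city: variant_proportion(letters, "do not", "don't")
    for city, letters in city_corpora.items()
}
print(proportions)
```

Expressing the variable as a proportion rather than a raw count keeps cities with sub-corpora of different sizes comparable. A city where neither variant occurs is reported as `None`, so that missing data is not mistaken for a proportion of zero.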
Although the corpus-based approach is a relatively straightforward method for observing regional linguistic variation, analyzing natural language is uncommon in regional dialectology (although see Szmrecsanyi, 2008, 2011, 2013), especially in American dialectology. Instead, the standard approach to data collection in regional dialectology involves directly eliciting language from informants. In most cases language is elicited by asking informants a series of questions, such as how they pronounce a particular word or what word they use to refer to a particular concept. Occasionally data is gathered through open-ended interviews, where informants are encouraged to engage in discourse, which is recorded so that it can be analyzed at a later point in time. But even the most naturalistic interview is a communicative event in which the informant would not have otherwise participated, as it is initiated by the fieldworker for the sole purpose of collecting linguistic data. Unlike these standard approaches to data collection, the corpus-based approach to dialectology is based on the observation of natural language, which is produced without the intervention of the dialectologist. Both approaches to data collection can lead to valid results, but because there are advantages and disadvantages to both techniques, it is important to keep these differences in mind when collecting dialect data.
There are several advantages to eliciting language directly from informants. One important advantage of elicitation is that it allows for data collection to focus on specific linguistic forms. For example, dialectologists are often interested in analyzing relatively rare lexical items. If dialectologists were to restrict themselves to analyzing natural language, they might not be able to collect large enough samples of language to observe the low frequency words in which they are interested. Another advantage of eliciting language is that it allows for data to be collected in a highly controlled environment. This is useful for several reasons. In general, it ensures that the language collected is comparable across locations and facilitates the transcription and recording of this language so that accent can be analyzed accurately. Eliciting language also generally requires that the dialectologist interact with their informants, which allows the dialectologist to exert considerable control over selecting informants from particular social backgrounds. This is especially important when language is collected from a relatively small number of informants at each location. For dialectologists to identify patterns of regional variation in such a limited sample, it is necessary to focus on those informants who are most likely to produce regional dialect patterns, such as non-mobile, older, rural males. For all of these reasons, dialectologists have generally preferred to collect data through elicitation rather than through simple observation.
Most advantages associated with collecting dialect data through elicitation can also be seen as limitations associated with collecting dialect data through a corpus-based approach. There are, however, several advantages associated with collecting data through observation rather than through elicitation. Most important, a corpus-based approach allows for language to be collected far more efficiently than is possible through questionnaires or linguistic interviews. This is because corpus-based studies can often focus on written language or spoken language that is naturally recorded at the time of production, vastly simplifying the process of data collection, especially in regional dialect studies, where traditional methods of data collection often involve large amounts of travel and considerable effort expended by fieldworkers to interview informants. A corpus-based approach therefore allows for a dialect survey to be conducted for a fraction of the cost that would otherwise be necessary. Furthermore, because the corpus-based approach greatly simplifies data collection, it generally allows for much more language and many more informants to be sampled at each location. This is a major advantage of conducting a corpus-based dialect study, as it naturally leads to more reliable and generalizable results.
A corpus-based approach to data collection also appears to be particularly suitable for the analysis of quantitative regional grammatical variation, which can be difficult to access using traditional approaches to data collection. This is because grammatical variation often involves constructions that are too abstract to be elicited reliably from informants through direct questioning and too infrequent to be consistently produced in relatively short open-ended interviews. Given a moderately large corpus, however, many grammatical constructions can be easily observed. Furthermore, a corpus-based approach is generally more suitable for investigating quantitative linguistic variation of any type, because it allows for large amounts of running text or discourse to be analyzed, which is necessary to estimate the relative frequency of linguistic forms. This is especially important when analyzing grammatical variation, which is often quantitative, in the sense that an informant or a sample of informants from a particular location will generally use a range of equivalent grammatical constructions in varying proportions, rather than just a single construction.
Finally, the corpus-based approach to data collection allows for regional variation to be analyzed in specific varieties of natural language. Unlike collecting data through interviews and questionnaires, where the communicative event in which language is obtained is created by the dialectologist, compiling a corpus involves sampling speech or writing from one or more real varieties of language. The patterns of linguistic variation identified by analyzing a corpus are directly representative of how language is used in that particular variety of language. In contrast, the patterns of linguistic variation identified through the analysis of elicited data are only indirectly representative of how language is generally used in the real world. In order to analyze regional linguistic variation in a specific variety of language, such as letters to the editor, it is therefore necessary to adopt a corpus-based approach to data collection.
For all of these reasons, the study of regional grammatical variation presented in this book is based on a corpus of natural language. Its design, compilation, and dimensions are described in the rest of this chapter.
2.2 Corpus design
The corpus compiled for this study represents the letter to the editor register of modern American English as written in 240 cities from across the United States from 2000 to 2013. The decision to focus on a written variety of modern American English was predetermined by the basic goals of this study, as outlined in Chapter 1; however, the design of the corpus, and in particular the decision to focus on the letter to the editor register of written English and the selection of the specific 240 cities represented in the corpus, requires further discussion.
2.2.1 Register selection
Letters to the editor have long been a ubiquitous feature of American newspapers, with most large daily newspapers publishing several letters to the editor in every issue. A letter to the editor usually consists of a brief letter sent to a newspaper from a reader for publication. Letters are usually presented in the editorial section of the newspaper, often in a separate sub-section devoted to letters from readers. Individual letters to the editor usually address a single topic, but a wide range of different topics are common, including current issues affecting the community in which the newspaper is published, current issues in national and international news, commentary on articles published in the newspaper including corrections, and responses to the opinions expressed on the newspaper’s editorial page including other letters to the editor. Letters to the editor are also often used to make public announcements or to give public thanks to members of the community. In general, letters on particular topics or from particular authors are not solicited directly, but occasionally newspapers will explicitly ask their readers to comment on a particular topic by submitting a letter. Letters to the editor are also rarely anonymous, as most newspapers insist on publishing an author’s name and place of residence along with their letter. Four examples drawn from the final corpus are presented in Table 2.1 in order to illustrate the standard format and content of American letters to the editor.
The letter to the editor register of modern American newspaper writing was selected for analysis for several reasons. First, this register was selected because it is a variety of written language that is conducive to the analysis of regional linguistic variation, as the place of residence of the author of a letter to the editor is usually provided in the byline of the letter. Second, the letter to the editor register was selected because it is a variety of written language that is produced by a large number of people from across the United States, with letters to the editor being published on a daily basis by local newspapers from cities and towns in every state. Furthermore, letters to the editor are in the public domain and many newspapers make archives containing letters freely available online. Focusing on letters to the editor therefore facilitated the compilation of a large corpus including texts from a large number of authors and cities that were published over a short period of time. Third, letters to the editor are a type of correspondence, which is a very common form of written language, perhaps the form of written language that is participated in by the largest number of speakers of English. Of all the different types of written language that could have been analyzed, a form of written correspondence would thus seem to be a good choice for an initial analysis of regional variation in writing. Finally, analyzing letters to the editor helps control for register variation, because letters to the editor are a relatively specific variety of written language, which ensures that the majority of texts in a corpus of letters to the editor are written in a relatively consistent style with similar communicative purposes.
Despite the advantages of analyzing letters to the editor, one potential problem with focusing on this register is that newspaper staff can edit letters before publication. There are numerous reasons, however, to believe that editing should have relatively little effect on the results presented in this book. Based on discussions with editorial page editors from various newspapers represented in the corpus, it is clear that letters to the editor are edited to a certain degree, but mainly for length. While it is fairly common for passages to be deleted from long letters by the editorial staff of a newspaper, given a large enough corpus, such deletions should have no effect on a grammatical analysis. Letters are also edited for grammatical, typographical, punctuation, and content errors, but according to editorial
Table 2.1 Letter to the editor examples
Madison Wisconsin State Journal, November 19, 2006:
Sunday Forum columnist Lucy Mathiak listed failings of No Child Left Behind, but missed the worst problem with the law: It’s punitive.
NCLB sets standards for achievement and identifies schools and students in trouble. Then it bankrupts those schools. Why not send in a team of school experts, or give the school more money to hire the best teachers and administrators? Why not investigate the latest research on successful schools and try those approaches?
NCLB is a classic conservative response: It offers a simplistic, you’re-on-your-own solution to a tough, complex problem. We need thoughtful and creative approaches to giving every student a good chance to succeed. A big stick is not the answer.
Chattanooga Times Free Press, June 9, 2010:
Our Republican delegation has succeeded for a second time in their attempts at making bars and restaurants safer for everyone. All done in the name of protecting our Second Amendment rights which is the right to bear arms.
Our governor disagreed and tried to stop this legislation but, as we know, was overruled. Many others also disagreed, such as our County Commission, City Council and many family people across this state.
I agree our Second Amendment rights should be protected. Our Constitution with all its amendments should be and are protected every day by our uniform servicemen. What I don’t understand is why is it necessary to carry a gun to a bar or restaurant to protect those rights? The two don’t connect.
Wyoming Tribune Eagle, November 3, 2006:
This is in response to Mark Shubert’s letter on Oct 28 and Joe Morelli’s letter on Oct 29.
I was general manager at the Two Bar Bowl for over a quarter century. During most of that time, I worked hard with county residents to get the street on the west side of the building paved. I had very little success, most of the residents on the street were in the county, we were in the city. We belonged to the Greater Cheyenne Chamber of Commerce and one day I asked Larry Atwell if he could help. Within six months, we had an agreement to have the street paved.
That is why I am voting for Mr Atwell for county commissioner and urge others to do so because I know he will get things done.
Boston Herald, March 22, 2007:
Just a brief note of appreciation for James Verniere’s fine and historically accurate review of “The Wind that Shakes the Barley” (“‘Wind’ blows timeless message,” March 16). Any person who wishes to understand the lengths to which the British government had gone to intimidate and terrorize the people of Ireland during the years 1918–1921 should see this finely acted film.
When, for the first time in some 800 years of foreign occupation, the people of a then-unpartitioned nation had overwhelmingly voted in an independent republican government, the British had unleashed their native terrorists, the Black and Tans, with a license to burn cities and murder elected officials as well as any of the mere Irish who were unfortunate enough to cross paths with the Tans.
Many thanks to both the Herald and Verniere for presenting a true picture of a fine and well-produced film which had won rave reviews at Cannes.