VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF LANGUAGES AND INTERNATIONAL STUDIES
FACULTY OF ENGLISH LANGUAGE TEACHER EDUCATION
***
GRADUATION PAPER
CREATING A CORPUS-BASED LEGAL ACADEMIC WORD LIST FOR STUDENTS AND TEACHERS OF ENGLISH FOR LAW AT HANOI LAW UNIVERSITY
VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF LANGUAGES AND INTERNATIONAL STUDIES
FACULTY OF ENGLISH LANGUAGE TEACHER EDUCATION
GRADUATION PAPER
BUILDING AN ENGLISH VOCABULARY HANDBOOK FOR STUDENTS AND LECTURERS OF LEGAL ENGLISH AT HANOI LAW UNIVERSITY
ACCEPTANCE PAGE
I hereby state that I, Pham Huyen Trang, class 17E2, being a candidate for the degree of Bachelor of Arts, accept the requirements of the College relating to the retention and use of Bachelor’s Graduation Papers deposited in the library.
In terms of these conditions, I agree that the original of my paper deposited in the library should be accessible for the purposes of study and research, in accordance with the normal conditions established by the librarian for the care, loan or reproduction of the paper.
Signature
4th May 2021
Acknowledgements
First and foremost, I would like to send my deep gratitude to my supervisor, Ms Can Thi Chang Duyen, for having been the light that guided me throughout the days of completing this research. I am extremely grateful for the valuable suggestions, constructive comments and guidance that she has provided to help me amend my writing. She has shown me the direction and motivated me to finish this research.
Additionally, this study could not have been done without the support of the teacher from Hanoi Law University (HLU). Since the beginning, she has provided me with great insight into the learning and teaching of legal English in general and at HLU in particular. Thanks to her, I have become more confident with the work I am doing and with the final product. Furthermore, I would like to thank the participants of this research, the lawyers and students, who have made an enormous contribution to the completion of the study.
I am also thankful to have my family as an invaluable source of encouragement during the last few months. In particular, I would like to send my heartfelt thanks to my brother, who lent me his laptop and helped me with all the technical issues.
Last but not least, I would like to express my great appreciation to my beloved friends: Chim, Phu, Thao, Vor, Do and particularly Lan Phuong, who have always been by my side and given me tremendous mental support in this long and challenging journey. These incredible individuals have motivated me and raised me up whenever I wanted to give up. Had it not been for their company throughout the sleepless nights, I would not have managed to overcome the seemingly insurmountable difficulties that I encountered along the way.
Abstract
The present research is conducted with a view to developing an annotated Legal Academic Word List (LAWL) to serve pedagogical purposes for teachers, course designers and students at Hanoi Law University. This corpus-based lexical study consists of three phases, beginning with the creation of a legal corpus compiled from the two textbooks used in the subjects Advanced Legal English (ALE) 1 and 2. The corpus consists of 42,052 running words covering the eight topics studied in ALE 1 and 2. In the second phase, the LAWL was produced, and in the final phase it was annotated. Two software programs, namely Anthony’s AntConc (3.5.9) and AntWordProfiler (1.5.0) (Anthony, 2020, 2021), were employed to produce the word list. Based on range and frequency criteria, a word list of 236 lemma forms was created and served as the base for the annotated LAWL. Interviews with HLU teachers and students and some legal practitioners identified the words, and the aspects of each entry word, to be included in the annotated LAWL. The final version of the annotated LAWL contains 100 legal terms: 37 academic terms with specialised meanings, 34 exclusive words (included in neither the AWL nor the GSL) and 29 additional chunks/phrases. For each word, the annotated list shows the following features: phonetics, part of speech, definition, collocations, related phrases, sample sentence and topic. The list is expected to make a significant pedagogical contribution to the learning and teaching of students and teachers at HLU.
Keywords: ESP, Legal English, LAWL, Corpus-based
List of figures, tables and abbreviations
List of figures
List of tables
Table 1 How the textbooks are used in Advanced Legal English 1, 2
Table 3 The coverage comparison of the GSL and AWL in the corpus
List of abbreviations
ESP English for Specific Purposes
LAWL Legal Academic Word List
GSL General Service List
ALE Advanced Legal English
HLU Hanoi Law University
ILE International Legal English: A course for classroom or self-study use
PEIU-L Professional English in Use – Law
LE Legal English
TABLE OF CONTENTS
CHAPTER 2: LITERATURE REVIEW
b Characteristics of English Legal Language
2.2 Vocabulary in ESP teaching and learning
2.2.3 Selection of the appropriate vocabulary for teaching and learning
c The building of a specialised corpus
d Procedure of building a corpus
2.4.5 Evaluation lists of high-frequency words
CHAPTER 3: METHODOLOGY
CHAPTER 4: FINDINGS & DISCUSSION
4.1.1 Research question 1: What are the most frequent academic content words in the corpus of legal English textbooks that are not among the first 2,000 words of English as represented in the GSL (West, 1953), based on frequency criteria?
4.1.2 Research question 2: What are the most frequently used words that are exclusive to the Legal Academic Word List?
4.1.3 Research question 3: What are the words to be included in the annotated Legal Academic Word List?
4.1.4 Research question 4: What aspects can be included in the annotation for each word entry from the word list?
CHAPTER 1: INTRODUCTION
This chapter provides a brief description of the topic, the research problem, the objectives, the four questions to be addressed, and the significance and scope of this study. It also presents an overview of the organisation of this paper.
1.1 Statement of the research problem
In the context of globalisation and international integration, English has rapidly grown into one of the most widely used languages in the world (Graddol, 1997). As an international language, English is used for various purposes, from general ones to more specific ones focusing on a particular discipline such as medicine, pharmacy, IT, economics or law.
With regard to law, the use of English for legal purposes has become critical in both professional and academic contexts. Specifically, the unprecedented rise in international business transactions requires legal professionals to engage in activities conducted in English, such as contacting international clients, dealing with foreign documents and doing translations (Snyder, 2004). Therefore, English proficiency has become a vital requirement in law firms as well as in national and international organisations. For law students, the ability to understand and use English in legal contexts plays a crucial role in helping them not only extend their legal knowledge and improve their academic performance but also prepare for their future careers. Such a high demand for English in the legal field has highlighted the need for English for Specific Purposes (ESP).
ESP refers to the teaching and learning of English as a second language where learners aim to use English in a particular domain (Coxhead, 2003). It has become a prominent area, as reflected in the rising number of ESP courses around the world. In Vietnam, ESP has gained greater attention, especially in the field of law.
As a leading law school in the country, Hanoi Law University has offered a programme for students majoring in English for Law in the Department of English since 2014. The students are obliged to complete four subjects relating to Basic and Advanced Legal English in their third and fourth years. One of the focuses of these courses is to provide the students with sufficient legal vocabulary.
Vocabulary plays a vital role in language, as words are “the building blocks” of a language (Thornbury, 2002). Robinson (1991, p.4) also states, “It may often be thought that a characteristic or even a critical feature of ESP is that a course should involve specialist language (especially terminology) and content”. The role of vocabulary has been emphasised in reading comprehension (Bin Baki & Kemali, 2013; Golkar & Yamini, 2007) and written communication (Lee, 2003; Yang, 2013). Therefore, learning legal vocabulary is essential for HLU students to read and understand legal texts, write common legal text types, comprehend spoken English on legal topics, and improve their speaking skills in a range of situations typical of legal practice.
Considered one of the most visible traits of legal language, legal vocabulary is also one of the most challenging parts of learning legal English. The obstacles in learning the vocabulary stem from the fact that legal English is often very different from ordinary English and from other conventional technical languages (Hart, 1954). According to Lã Nguyễn Bình Minh and Nhạc Thanh Hương (2017), in an orientation document for first-year students majoring in English for Law, the nature of legal English, including its terminology and complex grammatical structures, poses many challenges for students learning the language. The authors listed three main problems with the learning and teaching of legal terminology at HLU: the differences in legal systems, the large number of semi-technical words, and the use of archaic words, doublets and triplets in legal English. Besides, the students’ lack of legal background knowledge and motivation also hinders them from acquiring the vocabulary effectively.
To assist the teaching and learning of English vocabulary, Nation (2001) suggests using word lists as the primary source of vocabulary learning. There are pre-compiled word lists derived from various corpora of millions of words. Word lists such as the GSL (West, 1953), the UWL (University Word List) (Xue and Nation, 1984) and the AWL (Academic Word List) (Coxhead, 1998) have proven beneficial to English teachers and learners. However, for ESP learning, discipline-specific word lists are becoming increasingly essential, as different disciplines show different registers (Hyland & Tse, 2007). Over the past decades, research has focused on particular disciplines such as medicine (Baker, 1988; Chen & Ge, 2007; Wang et al., 2008), engineering (Mudraya, 2006; Ward, 2009) and business (Konstantakis, 2007). These word lists have shown their importance in facilitating the teaching and learning of domain-specific vocabulary. This has become one of the driving forces for the researcher to develop a word list in law for the students at HLU, called the Legal Academic Word List (LAWL) in this paper. Furthermore, according to Nation (1997, p.17), frequency is one measure of a word’s usefulness, as it “provides a rational basis for making sure that learners get the best return for their vocabulary learning effort”. Hence, the words in the LAWL are listed by frequency of occurrence within the given corpus of two legal textbooks. A word list ordered by frequency is necessary for students’ vocabulary acquisition and for teachers and curriculum designers (Nation,
For all these reasons, the researcher was motivated to conduct a study on “Creating a corpus-based Legal Academic Word List for students and teachers of English for Law at Hanoi Law University”. The LAWL will be annotated to provide the students and teachers with more information on how the words are used in the contexts of the given corpus.
1.2 Research aims and Research questions
The research aims to create a word list developed from a corpus of the two textbooks used by 4th-year students at HLU. The research also investigates the perspectives of students, teachers and lawyers on the use of the designed word list in academic and professional settings.
Accordingly, the current study seeks to answer the following research questions:
Research question 1: What are the most frequent academic content words in the corpus of legal English textbooks that are not among the first 2,000 words of English as represented in the GSL (West, 1953), based on frequency criteria?
Research question 2: What are the most frequently used words that are exclusive to the Legal Academic Word List?
Research question 3: What are the words to be included in the annotated Legal Academic Word List?
Research question 4: What aspects can be included in the annotation for each word entry from the word list?
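Research questions 1 and 2 both rest on sorting the corpus vocabulary against reference lists. The study itself used AntWordProfiler for this step; purely as an illustration of the underlying idea, the Python sketch below assigns each word type to a GSL band, an AWL band or an "off-list" band. It assumes the reference lists are available as plain sets of word forms; the tiny GSL and AWL sets shown are hypothetical placeholders rather than the real lists, and the sketch ignores the lemmatisation and word-family grouping that the real lists rely on.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def profile(tokens, gsl, awl):
    """Count each word type and place it in the GSL, AWL or off-list band."""
    counts = Counter(tokens)
    bands = {"GSL": Counter(), "AWL": Counter(), "off-list": Counter()}
    for word, freq in counts.items():
        if word in gsl:
            bands["GSL"][word] = freq
        elif word in awl:
            bands["AWL"][word] = freq
        else:
            bands["off-list"][word] = freq
    return bands


# Hypothetical miniature reference lists (NOT the real GSL/AWL contents).
GSL = {"the", "a", "of", "for", "may", "term"}
AWL = {"contract", "terminate"}

text = "The plaintiff may terminate the contract for breach of a term."
bands = profile(tokenize(text), GSL, AWL)

for band, words in bands.items():
    print(band, words.most_common())
```

In this toy run, words outside the GSL band (here the AWL and off-list bands) are the kind of candidates RQ1 targets, while the off-list band alone (plaintiff, breach) corresponds to what RQ2 calls exclusive words.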
1.3 Scope of the study
This study focuses only on building a word list for 4th-year students majoring in English for Law who study the main textbooks International Legal English: A course for classroom or self-study use by Amy Krois-Lindner and TransLegal, and Professional English in Use – Law by Gillian D. Brown and Sally Rice. The word list consists only of vocabulary compiled from written texts in the textbooks (i.e., spoken texts have been excluded). As the word list is built on a specialised corpus, its use in a different context is not recommended.
1.4 Significance of the study
Overall, the research is expected to provide readers with insight into the process of creating a word list from a specialised corpus, which can be a source of helpful information for students, teachers, course designers and researchers interested in related topics.
As for the students, the study hopes to provide them with helpful instruction and focused material that could assist their learning of legal vocabulary in the programme. The word list is expected to become a potent tool for mastering Advanced Legal English.
This research can also support teachers in targeting and teaching essential vocabulary, since the corpus and word list can be used as data for material design, lesson planning and classroom activities.
Other researchers may also find reliable information for their related studies.
1.5 Organisation of the study
This thesis comprises five chapters.
Chapter 1: Introduction – presents the research problem, research aims, research questions, scope, significance and the organisation of the study.
Chapter 2: Literature Review – reviews the literature relevant to this research, including the definitions of some terms, followed by a description of the selection, evaluation and analysis of a corpus. This chapter sets out the framework of the study.
Chapter 3: Research Methodology – explains the research methods, the context and participants of the study, and the data collection and data analysis procedures.
Chapter 4: Findings and Discussion – presents the research results and discusses the findings to answer the four research questions.
Chapter 5: Conclusions – summarises the significant findings, provides recommendations for students, ESP lecturers and ESP course designers, highlights some limitations of the study, and suggests directions for future research.
CHAPTER 2: LITERATURE REVIEW
The second chapter reviews the theories and existing literature relevant to the research problem. This includes materials related to English for Specific Purposes (ESP), vocabulary in ESP teaching and learning, the lexical approach, corpus linguistics, and previous studies.
2.1 English for Specific Purposes
2.1.1 Definition
The definition of ESP varies among researchers. According to Hutchinson and Waters (1987), ESP is based on designing courses to meet learners’ needs. The authors define ESP as an approach to language teaching in which “all decisions as to content and method are based on the learner’s reason for learning” (p.19), rather than as a product. This is because ESP cannot be classified as a particular language or methodology and does not contain specific types of teaching materials.
Strevens (1988) defines ESP by distinguishing between (1) absolute characteristics and (2) two variable characteristics. The absolute characteristics are that ESP is designed to meet the specified needs of the learner; is related in content to particular disciplines, occupations and activities; is centred on the language appropriate to those activities in syntax, text, discourse, etc., and on the analysis of this discourse; and is in contrast with General English. Regarding the two variable characteristics, ESP may be restricted to particular language skills and may not be taught according to any pre-ordained methodology. Robinson’s (1991) definition of ESP is based on two criteria: (1) ESP is “normally goal-directed” and (2) ESP courses develop from a needs analysis. Considering the strengths and weaknesses of previous definitions, Dudley-Evans and St John (1998) modified Strevens’ (1988) definition of ESP and developed a more complete one:
“Absolute Characteristics
1 ESP is defined to meet the specific needs of the learners,
2 ESP makes use of underlying methodology and activities of the discipline it serves,
3 ESP is centered on the language appropriate to these activities in grammar, lexis, register, study skills, discourse, and genre
Variable Characteristics
1 ESP may be related to or designed for specific disciplines,
2 ESP may use, in specific teaching situations, a different methodology from that of General English,
3 ESP is likely to be designed for adult learners, either at a tertiary level or in a professional work situation. It could, however, be for learners at the secondary school level,
4 ESP is generally designed for intermediate or advanced students,
5 Most ESP courses assume some basic knowledge of the language systems” (Dudley-Evans & St John, 1998, p.4).
The definitions of ESP offered by the authors above share some critical features. First, ESP is based on an analysis of the students’ needs and is tailor-made to fulfil these needs. Second, it may differ from general language courses in terms of skills selection, themes, situations, functions, language and methodology. Third, it aims to prepare learners for successful language performance in occupational or educational environments without restricting the learners’ age or level.
The third type of ESP, as identified by Carver (1983), is English with specific topics, which is concerned with anticipated future English needs.
Hutchinson and Waters (1987) represent the relationship between ESP and ELT in the form of a tree, which shows the standard divisions that have been made in ESP. There are two main types of ESP, differentiated according to the learners’ purposes for learning English. The learners may require English for academic study (EAP: English for Academic Purposes) or for work (EOP/EVP/VESL: English for Occupational Purposes/English for Vocational Purposes/Vocational English as a Second Language). This way of classifying ESP is similar to the second type of ESP offered by Carver (1983). However, people can work and study simultaneously, and in some cases the language learnt for immediate use in an academic environment will be needed later in a working context. Hence, Hutchinson and Waters (1987) suggest that there is no clear-cut distinction between EAP and EOP. Another way to distinguish ESP courses, according to Hutchinson and Waters (1987), is by the general nature of the learners’ specialism, which falls into three large categories: EST (English for Science and Technology), EBE (English for Business and Economics) and ESS (English for the Social Sciences).
Robinson (1991) also divides ESP into two main areas, but distinguishes courses by the learners’ experience, that is, by when the course takes place. These distinctions are believed to play an important role in deciding the degree of specificity appropriate for a course. For example, specific work related to the actual discipline will be ruled out in a pre-experience or pre-study course.
However, Bojovic (2006) raises the concern that dividing ESP courses in this way causes various problems, as it fails to “capture [the] fluid nature of the various types of teaching and the degree of overlap between ‘common-core’ EAP and EBP and General English”. Consequently, she suggests the continuum of ELT course types by Dudley-Evans and St John (1998), which runs from general courses to more specific ones, as illustrated in the figure below:
Figure 1
Continuum of ELT course types
Even though the classifications of ESP may overlap and cause confusion, it is important to define and classify what is meant by ESP (Dudley-Evans and St John, 1998).
2.1.3 Legal English
a Definition
Identified as a branch of ESP, the term Legal English (LE) has been understood in various ways. Some people refer to LE as legalese, a traditional legal writing style that is not readily comprehensible to lay readers (Oates & Enquist, 2009, p.127) because it is cluttered, wordy and indirect and may contain redundant technical words or phrases. Meanwhile, other people may consider Legal English a shortcut for Anglo-American law. Such differences in interpretation have led ESP practitioners to replace the term LE with English for legal purposes (ELP) or with other terms for different subsets, such as EALP (English for academic legal purposes), EOLP (English for occupational legal purposes) and EGLP (English for general legal purposes) (Paltridge and Starfield, 2013).
Legal English is referred to as a “sublanguage” since it is different from ordinary English (Tiersma, 1999). Therefore, the study of legal language can be regarded as learning a second language with a specialised use of vocabulary, phrases and syntax that facilitates communication (Ramsfield, 2005). Similarly, in the Handbook of English for Specific Purposes, Northcott (2009) defines Legal English in more detail as “English language education to enable L2 law professionals to operate in academic and professional contexts requiring the use of English” (p.166).
b Characteristics of English Legal Language
Generally, the characteristics of English legal language can be summarised as follows: archaisms and borrowings; long and complex sentences; passivisation, subordination and nominalisation; legal doublets; an impersonal style that avoids personal pronouns; a particular usage of modal verbs, including legal “shall”, which imposes an obligation or duty on someone; technical vocabulary; and repetition of words.
Legal English learners may encounter difficulties in learning the language, firstly because of its writing conventions. David Crystal (2004) points to the influence of various styles on English legal language. This influence has resulted in a lack of transparency and in obscurity in legal discourse, with its frequent use of formal words, expressions with different meanings, extreme precision and complex grammatical structures (e.g., Danet, 1980; Mellinkoff, 1963). The influence of style results from developments in the history of the English language: Medieval French has led to long, complicated sentences in LE, while Anglo-Saxon has given alliterative phrases, which stem from an oral tradition.
Another issue with legal language is its system-bound nature, which means that many legal terms can only be understood by reference to a particular legal system. LE has traditionally been the preserve of lawyers from English-speaking countries, which share common law systems. Therefore, learners, including both students and practising lawyers, from countries such as Vietnam whose legal systems are based on civil law will encounter many difficulties in understanding legal terms.
The most challenging part of learning LE is the large number of difficult words and phrases, which Haigh (2009) categorises into four groups: legal terms of art, legal jargon, words whose legal meaning differs from their general meaning, and words used in apparently peculiar contexts. Other key features of LE that cause obstacles can also be added, such as the use of unfamiliar pro-forms (e.g., the same, the said, the aforementioned), pronominal adverbs (e.g., hereof, hereto) and phrasal verbs in a quasi-technical sense (e.g., parties enter into contracts, put down deposits).
2.2 Vocabulary in ESP teaching and learning
Vocabulary acquisition is regarded as a fundamental component of learning for most second language learners, and students need suitable strategies to deal with specific vocabulary. Paul Nation (2001) suggests that, to overcome the obstacles posed by specialised vocabulary, different types of vocabulary should be distinguished.
2.2.1 Types of vocabulary
Vocabulary can be divided into the following subtypes:
1 Spoken and written vocabulary: According to the Cambridge International Corpus (CIC) (Schmitt and McCarthy, 1997), written vocabulary mainly comprises lexical and non-lexical words, while spoken vocabulary tends to be made up of lexical words. The study shows that spoken language is the central source of contact with communicative language, while written language is a fundamental source of input (Schmitt and McCarthy, 1997).
2 Core and non-core vocabulary: Core vocabulary refers to words that occur frequently and are more central to the language, while subject-specific vocabulary can be considered non-core.
3 Discourse-structuring vocabulary and procedural vocabulary: Discourse-structuring vocabulary includes abstract nouns with little independent lexical content (e.g., assumption, variety). Meanwhile, procedural vocabulary is commonly used in dictionaries to provide definitions.
4 Technical, semi-technical and general vocabulary: Dudley-Evans and St John (1998) propose two broader categories: semi-technical vocabulary (i.e., words used in general language that occur more frequently in specific and technical descriptions and discussions) and technical vocabulary (i.e., words with specialised and restricted meanings in specific disciplines, which may vary in meaning across disciplines).
5 Academic vocabulary: According to Dudley-Evans and St John (1998), academic vocabulary and semi-technical vocabulary should be prioritised in the teaching and learning of ESP because these types of vocabulary appear not only in general contexts but also in scientific and technical descriptions and discussions.
2.2.2 Legal terminologies
The Cambridge English Dictionary defines terminology as special words and expressions used in relation to a particular subject or activity. As words and expressions are the “building blocks” of language (Thornbury, 2002), legal lexicon and legal jargon are the basic components of the language used in legal settings (Ma and Nguyen, 2019).
Berukstiene (2016) divides legal vocabulary into (1) purely technical vocabulary, (2) semi-technical vocabulary (common terms with uncommon meanings), and (3) shared, common or unmarked vocabulary. The first group makes extensive use of archaic vocabulary, doublets and triplets, Latin phrases (e.g., pari passu, de jure, de facto, pro bono) and other words such as herein and hereto. The second group refers to common terms with special meanings in legal contexts; Cao (2007) defines this type of term as ‘legal technical terms that carry special legal significance’. The last type of legal terminology comprises words that are also widely used in non-legal settings. Examples of performative English legal lexicon include agree, claim, represent, certify and declare.
Haigh’s (2009) division of legal terminology is mostly similar to Berukstiene’s. However, he adds another classification, distinguishing legal terms of art from legal jargon. Legal terms of art are “technical words and phrases that have precise and fixed legal meanings and which cannot usually be replaced by other words” (p.4). Some of them may be familiar to laypersons (e.g., patent, share, royalty), while others are known only to lawyers (e.g., bailment, abatement). Legal jargon, on the other hand, ranges from near slang to almost technically precise words and is known only to lawyers.
2.2.3 Selection of the appropriate vocabulary for teaching and learning
To learn vocabulary effectively, teachers and learners are recommended to use word lists as the main source of vocabulary learning (Nation, 2001). Derived from various corpora, pre-compiled word lists are highly useful, as they facilitate teachers’ selection of words to teach and allow learners to study systematically. However, teachers should follow certain criteria so that the chosen words suit the context and the students. First, the word list must contain words representing the variety of language it is intended to reflect. Second, the selected words should be found across a range of different text types. Finally, some multi-word units, such as so far, good night and all right, should be treated as whole items and included in the teaching list.
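The frequency and range criteria can be made concrete with a short computation. The Python sketch below, offered only as an illustration, treats each topic as a sub-corpus, counts how often each word occurs overall (frequency) and in how many topics it occurs (range), and keeps the words that pass both thresholds. The sample texts, the small function-word exclusion set and the threshold values are all hypothetical; the thresholds actually used in this study are not given in this section.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def frequency_and_range(texts_by_topic):
    """Return (frequency, range) Counters for every word type.

    Range = number of topics (sub-corpora) in which the word occurs at least once.
    """
    freq, rng = Counter(), Counter()
    for text in texts_by_topic.values():
        counts = Counter(tokenize(text))
        freq.update(counts)         # add this topic's occurrences
        rng.update(counts.keys())   # each word counted once per topic
    return freq, rng


def select_words(freq, rng, min_freq, min_range, exclude=frozenset()):
    """Keep words meeting both thresholds, most frequent first (ties alphabetical)."""
    chosen = [w for w in freq
              if freq[w] >= min_freq and rng[w] >= min_range and w not in exclude]
    return sorted(chosen, key=lambda w: (-freq[w], w))


# Hypothetical mini-corpus of three topics and a hypothetical exclusion set.
topics = {
    "contract law": "The parties sign the contract. A breach of contract gives a remedy.",
    "tort law": "The claimant seeks a remedy for negligence.",
    "company law": "The directors owe duties to the company under the contract.",
}
FUNCTION_WORDS = {"the", "a", "of", "to", "for", "under"}

freq, rng = frequency_and_range(topics)
shortlist = select_words(freq, rng, min_freq=2, min_range=2, exclude=FUNCTION_WORDS)
print(shortlist)
```

Raising min_range favours words spread across topics over words concentrated in a single unit, which is the intent of the second criterion above.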
2.3 The lexical approach
Lexis, or lexicon, is the vocabulary of a language as distinct from its grammar. The lexicon consists of frequently produced chunks of language that combine to create coherent communication (Lewis, 1993). The lexical approach is a method of teaching foreign languages proposed by Lewis in 1993. It is based on the principle that "Language is grammaticalised lexis, not lexicalised grammar" (Lewis, 1993). This means that lexis is central in creating meaning, while grammar plays a subservient, managerial role. Nattinger and DeCarrico (1992) also agree that it is the learners’ ability to use lexical phrases that helps them speak fluently. Prefabricated speech allows more efficient retrieval and permits speakers (and learners) to direct their attention to the larger structure of the discourse.
Central to the approach is the idea of collocation. Collocation is part of the “lexical chunk”, defined as pairs or groups of words commonly found together (e.g., by the way, up to now). However, a collocation specifically combines lexical content words (e.g., basic principles). Without access to a corpus, identifying chunks and collocations is said to rely on intuition.
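Where a corpus is available, recurring chunks can be surfaced mechanically rather than by intuition. As a rough sketch (not the procedure of any particular tool), the Python code below counts contiguous word sequences (n-grams) and keeps those that recur at least a minimum number of times; the sample sentence and the default thresholds are invented for illustration.

```python
from collections import Counter
import re


def ngrams(tokens, n):
    """Yield every contiguous sequence of n tokens as a tuple."""
    return zip(*(tokens[i:] for i in range(n)))


def recurring_chunks(text, n=2, min_freq=2):
    """Return (chunk, count) pairs for n-grams occurring at least min_freq times."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(" ".join(gram) for gram in ngrams(tokens, n))
    return [(chunk, c) for chunk, c in counts.most_common() if c >= min_freq]


# Invented sample text containing a quasi-technical phrasal verb.
SAMPLE = ("The parties enter into a contract. Both parties must enter into "
          "good faith negotiations before they enter into a contract.")

chunks = recurring_chunks(SAMPLE, n=2)
print(chunks)
print(recurring_chunks(SAMPLE, n=3))
```

Here the phrasal verb enter into surfaces as the most frequent two-word chunk. Such frequency lists only nominate candidates; deciding which sequences are genuine chunks or collocations still needs human judgement.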
Corpus analysis facilitates the learning of lexis by showing the actual use of a term, locating words found in close proximity and displaying set phrases. According to The Routledge Handbook of Corpus Linguistics, a corpus can provide information about (1) lexis and the lexicon (the general lexicon and word formation); (2) phraseology and phrases (collocation and patterning; fixed expressions and idioms); (3) meaning (context and meaning, polysemy, metaphor, connotation and ideology); (4) sets and synonyms; and (5) antonyms and opposites. These types of information are highly useful for language learning, especially for non-native speakers. Therefore, Lewis (1997) recommends corpus linguistics as a tool for implementing the lexical approach.
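Two of the services listed above, showing a term in use and locating nearby words, correspond to the keyword-in-context (KWIC) concordance and the collocate count offered by concordancers such as AntConc. The following minimal Python sketch illustrates both on an invented sentence; the window sizes are arbitrary choices, and real tools add sorting, statistical association measures and other options not shown here.

```python
from collections import Counter
import re


def tokenize(text):
    """Split text into word tokens, keeping the original case for display."""
    return re.findall(r"\w+", text)


def kwic(tokens, keyword, width=4):
    """Return (left context, hit, right context) for each occurrence of keyword."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def collocates(tokens, keyword, window=3):
    """Count (lowercased) words within `window` tokens either side of keyword."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            span = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(t.lower() for t in span)
    return counts


SAMPLE = ("The tenant must pay the rent. If the tenant fails to pay, "
          "the landlord may terminate the lease.")
tokens = tokenize(SAMPLE)

lines = kwic(tokens, "tenant")
for left, hit, right in lines:
    print(f"{left:>20} [{hit}] {right}")

colls = collocates(tokens, "tenant")
print(colls.most_common(3))
```

For this text, the two concordance lines show tenant used with must pay and fails to pay, and pay appears twice among its collocates.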
2.4 Corpus Linguistics
2.4.1 Definition
Corpora, as broadly defined by the Expert Advisory Group on Language Engineering Standards (EAGLES), can comprise any type of text, not only prose, newspapers, poetry and drama but also word lists, dictionaries, etc. However, Meyer (2002) defines a corpus as “a collection of texts or parts of texts upon which some general linguistic analysis can be conducted” (p.7). A corpus is a collection of computer-readable texts compiled for linguistic purposes (e.g., Wynne, 2005; Aston, 1996). Hence, corpus linguistics is generally considered a methodology for doing linguistic analysis (Meyer, 2002; O’Keeffe & McCarthy, 2010). Similarly, Cotos (2017) defines corpus linguistics (CL) as the study of large quantities of authentic language using computer-assisted methods.
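Because a corpus is a collection of computer-readable texts, its basic size figures can be computed directly. The sketch below, written in Python, assumes each text is stored as a UTF-8 .txt file in a single folder (this folder layout is hypothetical) and counts running words (tokens), distinct word forms (types) and the type/token ratio. The regular-expression tokeniser is a simplification, so its counts may differ from those reported by dedicated tools such as AntConc.

```python
from pathlib import Path
import re

# Simplified tokeniser: lowercase letters, allowing internal apostrophes/hyphens.
WORD = re.compile(r"[a-z]+(?:['-][a-z]+)*")


def load_corpus(folder):
    """Read every .txt file in `folder` (hypothetical layout) into {filename: text}."""
    return {p.name: p.read_text(encoding="utf-8")
            for p in sorted(Path(folder).glob("*.txt"))}


def corpus_stats(texts):
    """Return the number of texts, tokens, types and the type/token ratio."""
    tokens = [w for text in texts.values() for w in WORD.findall(text.lower())]
    types = set(tokens)
    return {
        "texts": len(texts),
        "tokens": len(tokens),
        "types": len(types),
        "ttr": len(types) / len(tokens) if tokens else 0.0,
    }


# In-memory example standing in for load_corpus("some_folder").
SAMPLE_TEXTS = {
    "unit1.txt": "A contract is an agreement.",
    "unit2.txt": "An agreement may be void.",
}
stats = corpus_stats(SAMPLE_TEXTS)
print(stats)
```

A raw type/token ratio falls as a corpus grows, so it is best used to compare texts of similar length rather than corpora of very different sizes.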
There are two major analytical approaches to corpus linguistics: corpus-based and corpus-driven. A corpus-based approach relies on corpora that are balanced and representative, can be either small or large, and are usually annotated. Corpus-driven studies, by contrast, do not necessarily use balanced and representative corpora, but they depend on large corpora. Furthermore, corpus-driven studies are not necessarily annotated. Given the small scope of this study, the corpus-based approach has been employed.
Corpora of Spoken English
Corpora of Academic English
Corpora of Professional English
Corpora of Learner English (First and Second Language Acquisition)
Historical (Diachronic) Corpora of English
Corpora in other languages
Parallel Corpora/Multilingual Corpora
Two opposing trends have appeared in the compilation of corpora. Corpora are getting either larger, with “mega-corpora” (e.g., the Bank of English and the Cambridge International Corpus), or more specialised, focusing on specific registers and genres (Flowerdew, 2002). One advantage of specialised corpora is that the corpus is more closely linked to the contexts in which the texts were produced. As a result, “the quantitative findings revealed by corpus analysis can be balanced and completed with qualitative findings” (Paltridge and Starfield, 2013). Such a connection between the corpus and its contexts of use is particularly relevant in ESP. Specialised corpora (e.g., the Michigan Corpus of Academic Spoken English (MICASE) and the Cambridge and Nottingham Spoken Business English Corpus (CANBEC)) contain texts from a particular genre or register, or from a specific time or context. Small corpora can be compiled to meet the particular needs of practitioners. They are believed to benefit learners because they represent the language characteristics of the registers (e.g., written or spoken, formal or informal) and genres (e.g., research articles, proposals, business reports) relevant to the learners’ particular purposes.
Paltridge and Starfield (2013) categorise corpora differently when it comes to ESP, dividing ESP corpora into the following types:
1 ESP corpora with restricted access: Corpora of this type are restricted because some of their texts are of little use to learners in other contexts. However, some small private ESP corpora are popular because the findings from their analysis are relevant to a wide range of learners. One example is Coxhead’s Academic Corpus, which was used to create the Academic Word List (AWL) (Coxhead, 2000, 2011) and has become a resource for EAP materials developers.
2 ESP corpora in the public domain: This type of corpora is similar to corpora of General English. One example is the British National Corpus (BNC), which contains 100 million words and a balance of texts from various domains of spoken and written language. ESP practitioners can use ready-made corpora that contain sub-components relevant to ESP to produce specialised corpora.
3 Academic learner corpora: These consist of student writing (e.g., the Corpus of Academic Learner English (CALE) (Callies & Zaytseva, 2013)).
4 Research article corpora: These corpora are made up of research articles.
2.4.3 The building of a corpus
a Text selection
Before building a corpus, it is essential to consider what counts as a text. When spoken language is studied in transcription, the term refers to transcribed speech; units of speech that might be considered texts include a telephone conversation, a lecture or a meeting. For written language, on the other hand, the notion of a “text” can be illustrated by a printed monograph or a work of narrative fiction. This research focuses on written texts.
Atkins, Clear & Ostler (1991) presented several characteristics of prototypical written texts. First, a prototypical text is discursive and typically at least several pages long. Second, it is integral, meaning that it usually has a beginning, middle and end. Third, it is the product of a unified authorial effort. Finally, its language is stylistically consistent. However, many types of written language do not have these features, so the authors listed some typical deviations. For example, it may be convenient to treat a whole newspaper issue, or a single newspaper article, as one text. Meanwhile, short poems by the same author are often more conveniently gathered into collections, with each collection treated as one text.
Wynne (2005, p.8) mentions some criteria for text selection, including:
“the mode of the text; whether the language originates in speech or writing, or perhaps nowadays in electronic mode;
the type of text; for example, if written, whether a book, a journal, a notice or a letter;
the domain of the text; for example, whether academic or popular;
the language or languages or language varieties of the corpus;
the location of the texts; for example, (the English of) the UK or Australia;
the date of the texts.”
He also notes that the criteria should be chosen according to the resources the builder has at the selection stage.
b Criteria of a corpus
Meyer (2002) lists the compilation criteria of a corpus, namely corpus size (i.e., number of words), genres (spoken and written), length of text samples, number of texts, range of speakers, time frame, and native vs non-native speakers. Regarding corpora for ESP, Aston (1996) groups these criteria into three aspects: (1) corpus specificity (i.e., corpora for ESP should be as specific as possible); (2) corpus size (i.e., the corpus should be as large as possible); and (3) corpus representativeness (i.e., larger and more general corpora are more likely to be representative).
The matter of size is closely linked to the representativeness of a corpus. Paltridge and Starfield (2013) state that a corpus needs to be sufficiently large to represent a given language variety or type of text. However, no specific number of words can be given to answer the question of how much data should be used, because corpus size is not a case of one size fits all (Carter and McCarthy, 2001). Corpora for specific research or pedagogical purposes tend to be smaller, yet they can still be representative of a specific language variety. A corpus for a specific purpose may contain fewer than 20 high-quality documents and still yield representative examples of keywords and collocations (Lewis, 2001).
In addition to representativeness, there is the notion of balance, which, as Wynne (2005) notes, is more ambiguous. To be described as balanced, a corpus must contain proportions of different kinds of texts that correspond to informed and intuitive judgements (Wynne, 2005). However, most general corpora today are said to be unbalanced because they lack spoken language. Another factor affecting balance is the extent to which the texts are specialised.
Representativeness and balance have been central issues in corpus design, so the builder should keep them as target notions.
c The building of a specialised corpus
The degree of specialisation of a corpus depends on the following criteria suggested by Flowerdew (2004, p.21):
Specific purpose for compilation, e.g., to investigate a particular grammatical or lexical item
Contextualisation: particular setting, participants and communicative purpose
Genre, e.g., promotional (grant proposals, sales letters)
Type of text/discourse, e.g., biology textbooks, casual conversation
Subject matter/topic, e.g., economics
Variety of English, e.g., Learner English
Many ESP corpora are highly specialised because they have been compiled for research or pedagogy. One example of a specialised ESP corpus is the Indianapolis Business Learner Corpus (IBLC), consisting of 200 letters of application written by business communication students from different countries as part of a writing course.
In addition to the degree of specialisation, developers should consider the following points when building a specialised corpus:
The corpus should be representative
The corpus should be set up in a way that is suitable for research
d Procedure of building a corpus
According to Bennet (2010), corpus builders should focus on three factors when designing a corpus. First, a corpus must contain language chosen according to specific characteristics. Second, it must include authentic texts, which the author defines as texts used for genuine communicative purposes. For instance, MICASE only includes natural speech from everyday events at a university. Finally, a corpus is stored electronically.
Taking the above factors into consideration, the creation of a corpus consists of three parts: collection, computerisation and annotation of data.
a Data collection
When collecting data, the builder should address the issues of sample size and balance to achieve an acceptable level of representativeness. The corpus itself is a sample, so it needs to represent a given aspect of language. Kilgarriff et al. (2006, p.129) note that “There are no generally agreed objective criteria that can be applied to this task: at best, corpus designers strive for a reasonable representation of the full repertoire of available text types”.
Next, the number of samples should be determined. Meyer (2002) suggests a “sampling frame”, also known as “probability sampling”, which is achieved by “identifying a specific population that one wishes to make generalizations about” (p.42). Another sampling approach is “non-probability”, “convenience” or “opportunistic” sampling. The corpus builder can also combine the two approaches to identify the number of samples.
Deciding the sample size and make-up is also an important phase when collecting samples. It has been suggested that approximately 20,000 words taken at random from the selected texts provide a sample of adequate size to represent a genre. Another issue in sample make-up is whether to use extracts or entire texts. Taking extracts at random may fail to achieve representativeness; to address this problem, the purpose for which the corpus will be used should be identified first.
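As an illustration of probability (simple random) sampling at the level of whole texts, the following is a minimal Python sketch. It assumes the candidate texts are already available as plain strings and counts words by splitting on whitespace. The function name `sample_to_word_target` and the fixed seed are hypothetical choices, and the 20,000-word default merely mirrors the figure mentioned above; this is not a procedure taken from Meyer (2002).

```python
import random


def sample_to_word_target(texts, target_words=20_000, seed=0):
    """Draw whole texts at random, without replacement, until the running
    word count reaches target_words or the pool of texts is exhausted."""
    rng = random.Random(seed)          # fixed seed so the sample is reproducible
    pool = list(texts)
    rng.shuffle(pool)                  # random order = simple random sampling
    sample, total = [], 0
    for text in pool:
        if total >= target_words:      # stop once the word budget is met
            break
        sample.append(text)
        total += len(text.split())     # crude whitespace word count
    return sample, total
```

Because whole texts are drawn, the sample usually overshoots the target slightly: with ten texts of 3,000 words each, seven texts (21,000 words) are selected. Sampling entire texts rather than random extracts avoids the representativeness problem noted above, at the cost of less control over the exact size.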
Another issue in sample collection is where to gather the texts. Two primary sources are introduced: publicly available data, such as newspapers, journals, magazines and the Internet, and private data.
b Data computerization & annotation
After the samples have been collected, the data need to be computerised and annotated. Corpus annotation is “the practice of adding interpretative linguistic information to a corpus” (Leech, 2004, p.1). There has been a debate about whether to include annotation in a corpus. Some authors, for example Sinclair (2001), prefer to investigate a “pure” corpus, while others believe that annotation is a means of enriching the original raw corpus. As a result, annotation is viewed as optional by different researchers.
2.4.4 Analysis of a corpus
Once the corpus has been built, a set of general tools is needed to process it, for example, WordSmith (Scott, 1996), MonoConc Pro (2000) and AntConc (Anthony, 2014).
These tools allow the corpus to be analysed in terms of tokens, types, lemmas and word families. According to Nation (2001, pp.7-8), tokens are all the running words in a text, counted every time they occur, including repetitions. Types are the distinct word forms, each counted only once however often it recurs. A lemma consists of a headword and its inflected forms, all of which belong to the same part of speech, whereas a word family is a group of words related to a headword that includes members from different parts of speech.
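To make the four counting units concrete, here is a small Python sketch. The `LEMMAS` and `FAMILIES` lookup tables are hypothetical toy data; real analyses rely on full lemma and word-family lists, and this is not how AntWordProfiler is implemented.

```python
# Hypothetical toy tables; real analyses use full lemma and word-family lists.
LEMMAS = {"appeals": "appeal", "appealed": "appeal", "courts": "court"}
FAMILIES = {"appellate": "appeal"}  # derived form -> family headword


def profile(text):
    """Count tokens, types, lemmas and word families in a whitespace-tokenised text."""
    tokens = text.lower().split()                         # every running word, repeats included
    types = set(tokens)                                   # each distinct form counted once
    lemmas = {LEMMAS.get(t, t) for t in tokens}           # inflected forms merged
    families = {FAMILIES.get(lem, lem) for lem in lemmas}  # derived forms merged as well
    return {"tokens": len(tokens), "types": len(types),
            "lemmas": len(lemmas), "families": len(families)}
```

On the input "appeal appeals appealed appellate court courts court", the sketch reports 7 tokens, 6 types, 3 lemmas and 2 word families, showing how each unit groups more forms together than the one before it.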
Basic requirements for corpus analysis are KWIC (key word in context) concordances, word frequency lists and collocation statistics. According to Paltridge and Starfield (2013), to achieve a new perspective on the language in the corpus, the first analytical steps generally involve producing a frequency list and generating concordances (i.e., examples of particular items in context). Lexicographers, syllabus designers and materials designers can benefit from frequency lists, which can also form the basis of more complex statistical measures. Concordance analysis is another valuable technique because it lets users see many examples of an item in one place, in their original context. The COBUILD Concordance and Collocations Sampler and the Corpus-based Concordances are two examples of popular online concordance programs.
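The two first analytical steps, a frequency list and a KWIC concordance, can be sketched in Python as follows. The tokenizer regex, the function names and the context-window width are assumptions for illustration, not the behaviour of any particular concordancer.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; hyphenated words and apostrophes stay together."""
    return re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())


def frequency_list(tokens, n=10):
    """The n most frequent word forms with their counts."""
    return Counter(tokens).most_common(n)


def kwic(tokens, node, width=3):
    """Key Word In Context: each hit of `node` with `width` words on either side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>30} [{tok}] {right}")  # left context right-aligned
    return lines
```

Right-aligning the left context lines up every occurrence of the node word in one column, which is what makes a concordance easy to scan for recurring patterns.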
More advanced text-processing tools might include lemmatisation, part-of-speech tagging, parsing, collocation, sense disambiguation and links to a lexical database (Atkins, Clear & Ostler, 1991). Lemmatisation relates a particular form of a word to its base form. Part-of-speech tagging assigns a word class to every word. Collocation tools compute the statistical association between word forms in the text.
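As one example of a collocation statistic, the sketch below scores adjacent word pairs with pointwise mutual information, MI(x, y) = log2(f(xy) * N / (f(x) * f(y))), where N is the number of tokens and f gives raw frequencies. MI is only one of several association measures, and nothing here claims that the tools cited above use it by default; the adjacent-word window, the `min_freq` cut-off and the function name are assumptions.

```python
import math
from collections import Counter


def mutual_information(tokens, min_freq=2):
    """Pointwise MI for adjacent word pairs: log2(f(xy) * N / (f(x) * f(y)))."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))   # adjacent pairs only
    scores = {}
    for (x, y), fxy in bigrams.items():
        if fxy >= min_freq:   # MI overrates rare pairs, so require a minimum count
            scores[(x, y)] = math.log2(fxy * n / (unigrams[x] * unigrams[y]))
    # highest-scoring pairs first
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

The frequency cut-off matters in practice: a pair seen once, made of two words seen once, receives the highest possible MI in a given corpus, even though it tells us little about typical usage.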
2.5 Review of previous studies
Paul Nation (2001) divides vocabulary into four categories: high-frequency words, academic words, technical words and low-frequency words. The first category includes the most common words in English, which beginners should focus on. The General Service List (GSL) (West, 1953), consisting of approximately 2,000 word families, is a well-known list of high-frequency words. On the other hand, learners of English for academic purposes, including ESP students, need to focus on acquiring academic vocabulary to achieve academic proficiency. Coxhead’s Academic Word List (AWL) (2000) is a well-known cross-disciplinary academic word list built to support the learning and teaching of academic vocabulary. The list was developed from a corpus of 3.5 million tokens of written academic texts in 28 subject areas across four major fields: Law, Science, Commerce, and Arts. Coxhead applied three criteria (range, frequency and specialised occurrence) to generate the 570 word families of the AWL. However, some researchers, such as Chen & Ge (2007) and Martinez (2009), have found that only a small number of AWL words occur across different specific disciplines.
As a result, more researchers have adopted an approach in which discipline-specific academic word lists are developed to suit learners in a particular field.
Many discipline-specific word lists have been compiled, for fields such as medicine (e.g., Chen and Ge, 2007; Wang et al., 2008), nursing (Yang, 2014) and engineering (Ward, 2009). While Chen and Ge (2007) developed a corpus and investigated the coverage of the AWL in medical texts, Wang et al. (2008) built the Medical Academic Word List (MAWL) from a corpus of medical research articles.
With regard to the learning of legal English, few prior studies have used corpora to facilitate students’ learning. Fan and Xunfeng (2002) documented a bilingual corpus of Chinese and English law to assist translation by Hong Kong students. Hafner and Candlin (2007) provided students on a legal writing course with a corpus built specifically for the course. In addition, S kier and Vibulphol (2016) developed a corpus of sixteen million words derived from high-quality authentic legal texts to facilitate the study of legal English by L2 learners. The corpus was created from the published opinions of the U.S. Supreme Court and written judgements from the highest state courts.
In the context of Vietnam, few studies related to LE have been found, and no word lists have been developed for LE learners in general or HLU students in particular. This gap motivated the researcher to build a corpus-based list of high-frequency words for fourth-year students at HLU.
2.5.1 Evaluation of lists of high-frequency words
There have been several corpus-based lists of high-frequency words, such as the General Service List (West, 1953), the Corpus of Contemporary American English (COCA) list (Davies & Gardner, 2010) and the SUBTLEX lists (Brysbaert & New, 2009). However, reviewers’ comments are rarely included in corpus documentation. Lexical coverage has been a key criterion in evaluating corpus-based word lists. Beyond coverage, teachers’ perceptions also play an important role in developing and validating academic vocabulary lists (He & Godfroid, 2018; Simpson-Vlach & Ellis, 2010).
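Lexical coverage is usually expressed as the percentage of running words (tokens) in a text that belong to a given list. A minimal Python sketch, assuming the list is a flat set of word forms rather than word families (coverage studies usually count families, so this simplification would understate coverage for inflected forms); the function name is hypothetical:

```python
import re


def lexical_coverage(text, word_list):
    """Percentage of running tokens in `text` that belong to `word_list`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    covered = sum(1 for t in tokens if t in word_list)   # repeats count each time
    return 100 * covered / len(tokens)
```

For example, a list containing only the and court covers 3 of the 5 tokens in "The court dismissed the appeal", i.e. 60% coverage.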
CHAPTER 3: METHODOLOGY
The purpose of this chapter is to describe the context in which the study takes place, outline the research methods employed and present the procedures followed. The chapter also explains why particular methods were chosen and how the study was conducted. Developing the word list involved five main stages and adopted a quantitative approach, with computational tools as the main instruments.
3.1 Research context
This study is conducted in the Department of English at Hanoi Law University. Students majoring in Legal English need to complete four compulsory subjects: Basic Legal English (BLE) 1, BLE 2, Advanced Legal English (ALE) 1, and ALE 2. They take BLE 1 in the second semester of their second year (semester 4), BLE 2 in the first semester of their third year (semester 5), ALE 1 in the second semester of their third year (semester 6) and ALE 2 in the first semester of their fourth year (semester 7). ALE 3 is an optional subject in the second semester of the fourth year (semester 8).
This study focuses on establishing a word list of Advanced Legal English for third- and fourth-year students in the program and for teachers of legal English, for the following reasons. First, the word list can help students achieve better academic results in their fourth year. The list helps them revise the vocabulary learnt in ALE 1 (second semester of the third year) after the summer break and prepare for ALE 2 in the first semester of the fourth year. The annotated word list will also be helpful when students prepare for their final exams. Second, the word list is important for their future careers. The ability to understand and use legal vocabulary is essential in helping fourth-year students perform well in their internships at law firms. However, according to the head teachers in the program, even though the students are proficient in general English, they encounter many difficulties in learning LE vocabulary, especially at the advanced level. For all these reasons, a word list of vocabulary from Advanced Legal English 1 and 2 will assist students in both their study and work. Hence, the phrase ‘HLU students’ in this research refers to third- and fourth-year students majoring in English for Law at HLU. As for teachers, the word list has pedagogical applications in teaching and test design. As the final exams focus on testing students’ vocabulary and reading/listening comprehension, the word list can help teachers select suitable words and phrases for effective teaching and testing.
Besides being limited to these users, the word list is restricted to the field of commercial law, since this is the program’s main focus. Hence, the word list concentrates on important topics such as company law, intellectual property and employment law. Regarding the legal systems in which this word list can be used, nearly all of the legal concepts and terms it includes can be found in legal systems and jurisdictions around the world.
3.2 Creating a Legal Academic Word List (LAWL)
The word list was developed through the steps shown in the chart below:
Figure 2
Wordlist development process
3.2.1 Corpus design
This study aims to construct a specialised corpus of legal English texts to facilitate learning by non-native speakers. Using Flowerdew’s (2004) parameters, this corpus is specialised in terms of contextualisation (the particular setting and participants described in the previous section), subject matter (law) and variety of English (Legal English for law students).
In this study, a quantitative, corpus-based approach was adopted, together with computational tools, to establish a Legal Academic Word List (LAWL). The corpus-based approach was selected because the specialised corpus is small and serves a limited group of users.
3.2.2 Data collection for the corpus
Advanced Legal English consists of two subjects: Advanced Legal English 1 (studied in the second semester of the third year) and Advanced Legal English 2 (studied in the first semester of the fourth year). Students study Advanced Legal English using two textbooks: International Legal English: A course for classroom or self-study use (ILE) by Amy Krois-Lindner and TransLegal, and Professional English in Use – Law (PEIU-L) by Gillian D. Brown & Sally Rice.
Content from each textbook was selected and integrated into lessons on different topics, as described in the table below.
Table 1
How the textbooks are used in Advanced Legal English 1, 2
Subject | Topic | ILE | PEIU-L
ALE 1 | Company law | Unit 2, 3, 4 |
ALE 1 | Insolvency and winding up | | Unit 24
ALE 1 | Mergers and Acquisitions | | Unit 27
ALE 2 | Intellectual property law | Unit 11 | Unit 42, 43
ALE 2 | Competition law | Unit 15, 8 | Unit 8
ALE 2 | Information technology law and cybercrime | | Unit 44
Purposive sampling was used to choose texts for building the corpus. As units 2, 3, 4, 8, 11 and 15 from ILE and units 8, 24, 26, 27, 41, 44 and 45 from PEIU-L are used in the Advanced Legal English subjects, these units were selected as data sources for building the corpus of Advanced Legal English for fourth-year students at HLU. The units cover eight topics: Company Law, Insolvency and winding up, Corporate tax, Mergers and Acquisitions, Intellectual property law, Competition law, Employment law, and Information technology law and cybercrime. The texts in each unit fall into the following categories: article extracts, language exercises, reports, conversations, letters, emails and speaking samples (presentations, discussions and giving opinions). All 100 pages from the selected units were scanned into PDF files, converted into Microsoft Word documents, and proofread. To compile the Legal Academic Word List (LAWL), all irrelevant figures, letters, charts and diagrams were removed, and the collected documents were converted into plain text format.
3.2.3 Data processing (Analysis of the corpus)
To process the collected data, Anthony’s AntConc (3.5.9) and AntWordProfiler (1.5.0) (Anthony, 2020, 2021) were employed, as they are comprehensive and easy-to-use corpus analysis programs.
In this study, AntWordProfiler was the primary tool used to identify statistical values, including the number of word tokens, lemma types and word families, as well as the range and frequency of the words appearing across the units in the corpus.
Of the total tokens in the corpus, which were generated using AntWordProfiler (1.5.0), only words meeting certain criteria were selected for the LAWL. The selection criteria were based on the principles Coxhead (2000) employed in creating the Academic Word List (AWL). She omitted GSL words and selected wide-ranging words appearing in at least 14 of the 28 subject areas with a minimum frequency of 100. Similarly, the first criterion in this study is that all word families included had to lie outside the first 2,000 most frequent words of the General Service List (GSL) (West, 1953), because the study aims to create an academic word list. AntWordProfiler (1.5.0) was used to identify the words shared between the LAWL and the GSL (West, 1953), so that the GSL words could be eliminated from the LAWL. Second, the word families had to appear in at least four of the eight subfields. Third, each word had to occur at least four times. Based on these conditions, the candidate words for the Legal Academic Word List (LAWL) were identified.
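The three selection criteria described above can be sketched in Python as a filter over word families. This is an illustration under stated assumptions, not AntWordProfiler's implementation: the corpus is given as a mapping from each subfield to its tokens, `family_of` is a hypothetical form-to-headword table, and the thresholds (range of at least 4 subfields, frequency of at least 4) are taken from the text.

```python
from collections import Counter, defaultdict


def select_candidates(subfield_tokens, gsl_families, family_of,
                      min_range=4, min_freq=4):
    """subfield_tokens: {subfield name: list of tokens}.
    family_of: maps a word form to its family headword (hypothetical table)."""
    freq = Counter()              # total occurrences of each family
    ranges = defaultdict(set)     # subfields in which each family occurs
    for subfield, tokens in subfield_tokens.items():
        for tok in tokens:
            fam = family_of.get(tok, tok)
            freq[fam] += 1
            ranges[fam].add(subfield)
    return sorted(
        fam for fam in freq
        if fam not in gsl_families            # criterion 1: outside the GSL
        and len(ranges[fam]) >= min_range     # criterion 2: range across subfields
        and freq[fam] >= min_freq             # criterion 3: minimum frequency
    )
```

In a toy corpus of eight subfields where liability occurs once in each of four subfields, company occurs everywhere but is listed in the GSL, and merger occurs five times in a single subfield, only liability passes all three criteria. Counting at the family level matters: if one occurrence is spelled liabilities and no mapping is supplied, liability drops to a range of three and is rejected.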
The software was also used to compare the LAWL with the Academic Word List (Coxhead, 2000), which consists of 570 word families. Through this comparison, all words exclusive to the LAWL were highlighted to draw learners’ attention to more discipline-specific vocabulary.
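Comparing the LAWL with the AWL amounts to a set intersection and a set difference over headwords. The Python sketch below assumes both lists are available as collections of headwords; the function name is hypothetical, and the words in the usage example are toy data rather than actual AWL or LAWL entries.

```python
def compare_with_awl(lawl, awl):
    """Split LAWL headwords into those shared with the AWL and those exclusive to the LAWL."""
    lawl, awl = set(lawl), set(awl)
    return {
        "shared": sorted(lawl & awl),     # academic words also in the general AWL
        "lawl_only": sorted(lawl - awl),  # candidates for discipline-specific vocabulary
    }
```

With toy lists, `compare_with_awl(["liability", "contract", "plaintiff"], ["contract", "liability", "analyse"])` returns `{"shared": ["contract", "liability"], "lawl_only": ["plaintiff"]}`.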
After the corpus was analysed, incorrect items, proper names and function words were removed from the lexical profiling analysis using AntConc (3.5.9). Paul Nation’s list of 320 function words (Nation, 2001) was used so that only content words were kept in the lexical profile.
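A minimal Python sketch of reducing a frequency profile to content words follows. Here `function_words` stands in for Nation's list of 320 function words, which is not reproduced. The proper-name rule (a word that never appears in lower case anywhere in the text is treated as a name) is a heuristic assumed for illustration only; the study does not say how proper names were identified, and the rule would wrongly drop a content word that only ever occurs at the start of a sentence.

```python
import re
from collections import Counter


def content_word_profile(text, function_words):
    """Frequency profile of content words only.
    Heuristic (an assumption, not the study's procedure): a word that never
    occurs in lower case anywhere in the text is treated as a proper name."""
    raw = re.findall(r"[A-Za-z]+", text)
    lower_forms = {w for w in raw if w.islower()}   # forms seen at least once in lower case
    counts = Counter(w.lower() for w in raw)
    return {w: c for w, c in counts.items()
            if w not in function_words and w in lower_forms}
```

For "The claimant sued Acme Ltd. The claim against Acme failed." with the function words the and against, the profile keeps claimant, sued, claim and failed, and drops Acme and Ltd as likely names.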
Before the word list was finalised, the LAWL was compared with the glossary and index of the two textbooks. This allowed the researcher to add any important words or chunks that appeared in the texts but were not included in the word list.
3.2.4 Wordlist annotation
An annotated vocabulary list was developed from the LAWL using AntConc (3.5.9). Words found in the corpus but not in the GSL were chosen for the annotated word list. The words are divided into two sections: Section A includes academic words with special legal meanings, and Section B includes exclusive terms. Since the LAWL includes every form of a word appearing in the corpus, such as plural and singular forms, the words selected for each section had to meet three criteria. First, only words with legal meanings were chosen; words appearing in the instructions or unrelated to law, such as email and Internet, were not included. Second, if words from the same word family appeared in different forms, the headword of the family was annotated. For example, the words filed and filing belong to the word family file, so the word file was annotated instead of filed and filing. Third, except for plural words with special meanings, words were annotated in their singular form. For instance, the word proceedings appears in the LAWL, yet it is noted as proceeding in the annotated LAWL.
Following each word, other important chunks or phrases that appear in the glossary and index of the two textbooks but not in the LAWL were also added, since “lexical chunks play an important role in human language processing and acquisition” (Bogart, Noord & Rosner, 2001). Lexical chunks are divided into poly-words, collocations, institutionalised expressions, and sentence heads or frames (sentence builders). In this research, only collocations were added, using the Clusters/N-Grams tool in AntConc.
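Cluster searches of this kind can be approximated by counting the word n-grams that contain a search term. The Python sketch below is an illustration under assumptions (whitespace tokens, a `min_freq` threshold, a hypothetical function name); it does not describe how AntConc's Clusters/N-Grams tool is implemented.

```python
from collections import Counter


def ngram_clusters(tokens, node, n=2, min_freq=2):
    """Word n-grams containing `node` with their frequencies, most frequent first."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    hits = [(" ".join(g), f) for g, f in grams.items()
            if node in g and f >= min_freq]          # keep recurring clusters only
    return sorted(hits, key=lambda gf: (-gf[1], gf[0]))
```

On the tokens of "file a claim and file a claim to file an appeal", the three-word clusters around file reduce to `[("file a claim", 2)]`, since every other trigram containing file occurs only once.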
Additionally, according to Schmitt (2000, p.5), all the properties that make up “word knowledge”, namely meaning, register, association, collocation, grammatical behavior, written form, spoken form and frequency, are vital in vocabulary acquisition. Therefore, apart from the words, collocations and parts of speech, other features such as definitions, sample sentences and the topics in which each word can be found were also included. The words were defined using the textbooks’ glossaries, Black's Law Dictionary (described as America’s most trusted law dictionary and highly recommended by the teachers at HLU) and the Cambridge Dictionary. Whereas most terms could be defined using dictionaries, some phrases have to be learnt through legal theory; these phrases were not annotated but noted in a separate column. To choose the correct contextual meaning of each word as used in the corpus, its prominent part of speech was decided according to how it was used in the corpus. Two parts of speech were noted when the word was used as both. For example, the word appeal was used as both a verb and a noun in the corpus, so both forms were included in the annotated word list. This process was done manually using the concordance function, since the software does not automatically identify the word classes in the corpus.
3.2.5 Wordlist validation
The word list validation stage provides the researcher with information on how words should be included in the annotated word list and how the list should be structured. The interview results will help the researcher identify the problems that legal English learners, users and teachers encounter, so that the annotated LAWL can address their needs. Moreover, the interviews identified strengths and weaknesses of the word list, and adjustments were made to improve it.
Participants
The completed word list will be sent to legal practitioners, fourth-year Legal English students and English teachers at the Department of English at HLU for validation.
First, the participants of this research include senior and junior associates at Vietnam-based commercial law firms, who use a great deal of Legal English in their work. Senior and junior associates are not only experienced in commercial law but also competent in using Legal English in authentic contexts. They can provide valuable information on how the words in