2. Would the group with review perform significantly better than the Tests-Only group?
Participants
All three classes of students were from the advanced stream of a freshman course within the English department of a foreign languages university in Japan. All were thus aged similarly (18-20 years old) and of similar language ability;
proficiency tiers are decided by combining student TOEFL scores with their grades on an internal entrance exam. The gender balance was also similar, with approximately 75% of each class being female.
Method
Students in all classes were given a pretest (see Appendix), which also functioned as an example of the types of quizzes they would be making in the Tests-Only and Tests-Plus-Review groups. They were also given a list of 25 words each week to study on their own or with their quiz teams.
Students in the Tests-Only and Tests-Plus-Review groups took turns making the weekly 10-item quiz with their quiz teams. After finishing a quiz, they would send it to their teacher for review of its accuracy and suitability. During the following class period, the students would take the quiz individually. We decided to have the students take the quizzes individually because of comments in the previous study expressing dislike of group scores contributing to individual grades.
At the end of the year, all students in all groups took the posttest, which was the same as the pretest. Their final scores were compared with their pretest scores to see the extent to which the tests and/or the review exercises had had a significant effect on their vocabulary.
Finally, the teacher in charge of the Tests-Only group asked for students' feedback on all parts of the course through an end-of-course survey. One part asked them to rate the vocabulary lists and tests that they used in class on a 5-point scale ranging from 1 = not at all useful to 5 = very useful.
All of the above results are reported in the next section.
Results
Looking at the descriptive statistics, all three groups increased their mean scores, with the largest gain made by the Tests-Only group and the smallest by the control group. Table 1 summarizes these findings, and Table 2 shows the results of the statistical analysis of the mean posttest scores.
Statistical analysis using ANCOVA and post-hoc pairwise comparisons revealed significant differences in mean posttest scores between the Tests-Only group and each of the other two groups, but no significant difference between the Tests-Plus-Review group and the Control group. In other words, the data show that the Tests-Only group significantly outperformed the Tests-Plus-Review and Control groups on the posttest, while the Tests-Plus-Review group's mean score was not significantly different from that of the Control group. An analysis of these mixed results is part of the discussion in the following section.
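For readers who wish to reproduce this kind of analysis, the following is a minimal sketch of an ANCOVA with pairwise follow-ups using Python's statsmodels. The file name and column names ('group', 'pretest', 'posttest') are hypothetical stand-ins, not artifacts of the study, and the sketch is illustrative rather than a record of the software actually used.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical long-format data: one row per student.
df = pd.read_csv("vocab_scores.csv")  # columns: group, pretest, posttest

# ANCOVA: posttest score by group, with the pretest score as the covariate.
model = ols("posttest ~ pretest + C(group)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))

# Post-hoc pairwise comparisons of the group term. The raw p values in the
# result frame correspond to an LSD-style (i.e., unadjusted) comparison,
# as in Table 2; the Holm-Sidak ('hs') column is a stricter alternative.
print(model.t_test_pairwise("C(group)", method="hs").result_frame)
```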
The end-of-course survey conducted by the teacher of the Tests-Only group included an item wherein students could rate the vocabulary portion of the course on a usefulness scale from 1 to 5. The 19 students who answered the survey mostly rated the vocabulary lists and tests as 4 (quite useful) or 5 (very useful), with a mean rating of 4.32 out of a maximum of 5 points.
Table 1
Pretest and Posttest Mean Scores

Group                Pretest M   Posttest M   Posttest SD   N
Tests-Only           11.41       16.35        2.760         17
Tests-Plus-Review    10.89       12.42        2.317         19
No Tests (Control)   12.06       12.81        2.786         16
Table 2
Pairwise Comparisons

(I) group            (J) group            Mean diff. (I-J)   SE     Sig.b   95% CI lowerb   95% CI upperb
No Tests (Control)   Tests-Only           -3.924*            .779   .000    -5.489          -2.358
No Tests (Control)   Tests-Plus-Review    -.296              .769   .702    -1.843           1.251
Tests-Only           No Tests (Control)    3.924*            .779   .000     2.358           5.489
Tests-Only           Tests-Plus-Review     3.627*            .745   .000     2.130           5.125
Tests-Plus-Review    No Tests (Control)    .296              .769   .702    -1.251           1.843
Tests-Plus-Review    Tests-Only           -3.627*            .745   .000    -5.125          -2.130

* The mean difference is significant at p < .05.
b Adjustment for multiple comparisons: Least Significant Difference (equivalent to no adjustment).
Discussion
The statistics showed that the Tests-Only group performed the best on the posttest relative to the other two groups, but a glaring question remains: Why did the Tests-Plus-Review group perform no better than the control group? Taken at face value, these results would imply that review activities, or at the very least the review activities employed in this study, are not only ineffective but inhibitory to vocabulary acquisition, to the extent that they counter any of the benefits of creating and taking the weekly tests.
However, accepting such a simplistic conclusion based on insufficient data would be ill-advised because it is likely that the statistics are not telling the whole story. For instance, issues such as individual differences in study habits among the groups, the timing of the posttest administration, and the fact that the groups were taught by different teachers all could have affected the results. Furthermore, the study itself was conducted as part of an ongoing course and was therefore not as controlled as a laboratory setting might be.
Qualitative differences between the groups such as these may have been substantial.
One other difference between the groups is the issue of word exposure. All groups were exposed to the same list of words and encountered all of the words at least one time when the list was given in class. However, because the student-made quizzes comprised only 10 of the 25 words given each week, and were chosen by the students, it is possible that the quizzes for the Tests-Only group coincidentally exposed students to more of the words that appeared on the pre-/posttest than the Tests-Plus-Review group, thus explaining their higher final scores. The reasons why students chose some words over others can only be assumed. For instance, it is possible that some students would favor words that they felt would challenge their peers, and others might choose words that they already knew in order to reduce the study burden. This is yet one more variable that lies beyond the scope of the present study.
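Although the study did not record which target words each team chose, the exposure confound described above could in principle be checked by logging quiz items over the year and computing their overlap with the pre-/posttest words. A hypothetical sketch, with invented word sets standing in for data the study did not collect:

```python
# Stand-in word sets; the actual quiz and test contents were not logged.
posttest_words = {"hypothesis", "derive", "implicit", "coherent"}
quiz_words = {
    "tests_only": {"hypothesis", "derive", "coherent", "notion"},
    "tests_plus_review": {"derive", "notion", "valid"},
}

# Count how many posttest target words each group's quizzes covered.
for group, words in quiz_words.items():
    overlap = words & posttest_words
    print(f"{group}: {len(overlap)} posttest words covered -> {sorted(overlap)}")
```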
In short, there are various confounding variables involved and any one of them might have had an impact on the posttest scores that the statistical analysis used cannot detect.
On the other hand, considering the results of the Tests-Only group is equally important. The fact that their scores were significantly higher than those of the other two groups, coupled with the overwhelmingly positive appraisals given on the survey, indicates that continuing to use the testing format while subjecting it to action research, or perhaps formalized laboratory inquiry, is a worthy pursuit.
To conclude, the results and statistics were mixed: One group using the tests performed very well, while another using the tests plus review activities performed no better than the control group. The reasons for this difference are unclear because the statistics cannot assist us in identifying and measuring the qualitative differences between the groups.
Nevertheless, because the Tests-Only group seemed to benefit a great deal from the testing format and the use of the NAWL, it is possible to recommend experimenting with this testing format to others who desire that their students engage with and learn more academic lexis.
Limitations
While the test format and the NAWL were highly appraised by the students in the Tests-Only group, one might worry about using students' beliefs regarding language learning to inform curriculum decisions. It is nevertheless important to consider what they say, given that students arguably have the most at stake in a Japanese university setting.
Additionally, even though the Tests-Only group performed significantly better on the posttest, we cannot claim that the testing format encouraged or facilitated long-term retention of the vocabulary. It is possible that students simply became better at taking the tests themselves as opposed to actually acquiring the words (e.g., process of elimination in multiple choice questioning).
Unequal exposure to the words was touched on in the discussion section, but it is worth reiterating this point. We expected that students in both treatment groups would achieve similar scores, and one possible explanation for why they did not is unequal exposure on account of the quizzes being student-created.
Because the quizzes made by the students were not the same between groups, this could call our reliance on a standardized posttest for comparative analysis into question. In other words, if students in the two groups were exposed to the vocabulary appearing on the posttest to differing degrees, then the results of the posttest are suspect and unreliable. Caution should be exercised if attempts are made to extrapolate any of the results of this paper.
Conclusion
Although this study's results and analysis do have some glaring weaknesses, the authors' goal is not to create a perfect instrument for assessment. Rather, we are attempting to improve upon the vocabulary component of the Foundational Literacies course by drawing on a corpus-based academic word list (the NAWL) and by empowering students to choose which words they would like to engage with and learn more deeply. In these goals, we have been largely successful. Finally, it is the hope of the authors that other second-language teachers and researchers will also take charge by experimenting with and reporting on the vocabulary learning and studying methods that they use in their own courses.
References
Browne, C., Culligan, B., & Phillips, J. (2013). The New Academic Word List. Retrieved from http://www.newacademicwordlist.org/
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213-238. https://doi.org/10.2307/3587951
Davies, R. J., & Ikeno, O. (2002). Shudan ishiki: Japanese group consciousness. In The Japanese mind: Understanding contemporary Japanese culture (pp. 195-199). Boston, MA: Tuttle.
Nation, I. S. P. (2013). Learning vocabulary in another language. Cambridge, England: Cambridge University Press.
New London Group. (1996). A pedagogy of multiliteracies: Designing social futures. Harvard Educational Review, 66(1), 60-92.
Owens, J., & Reed, J. (2017). So many words, so little time: Implementing a new method for teaching vocabulary in a literacies course. In G. Brooks (Ed.), The 2016 PanSIG Journal (pp. 249-258). Tokyo, Japan: JALT.
Schmitt, N., & Zimmerman, C. B. (2002). Derivative word forms: What do learners know? TESOL Quarterly, 36(2), 145-171.
Developing a New Locus of Control Instrument: The Abridged Kambara Scale
Michael James Rupp
Tokai University
michaelrupp@tsc.u-tokai.ac.jp

Ian Maxwell Isemonger
Kumamoto University
ian-m@kumamoto-u.ac.jp

This paper reports on the results of a confirmatory factor analysis (CFA) using survey data collected with an abridged version of the Kambara Locus of Control Scale (K-LoCS; 1982, 1987). Previous work on the K-LoCS, using CFA (Rupp, 2016a), exploratory factor analysis (EFA; Rupp, 2017b), and qualitative focus group studies (Rupp, 2016b), all provided evidence that a modified and abridged K-LoCS may produce scores with improved psychometric properties. A subset of items, informed by findings in the previous work, was assembled under a two-construct measurement model, and this model was tested a priori, using a new dataset (N = 211). The results, in terms of the p value for the chi-square and the model fit indexes, indicated considerable improvement over results in previous studies.
This paper reports the results of a confirmatory factor analysis (CFA) of survey data collected with an abridged version of Kambara's locus of control scale (K-LoCS; 1982, 1987). Previous research on the scale, using CFA (Rupp, 2016a), exploratory factor analysis (EFA; Rupp, 2017b), and qualitative focus group methods (Rupp, 2016b), all produced evidence that modifying and shortening the scale could improve the psychometric properties of the scores it yields. A subset of items, informed by the results of the previous studies, was assembled into a two-construct model, which was tested against a new dataset (N = 211). The results, in terms of the p value for the chi-square and the model fit indexes, showed marked improvement over previous findings.
Locus of control (Rotter, 1966) is a psychological construct in which people are seen to vary along a continuum ranging from internal locus of control (I-LoC) to external locus of control (E-LoC). An internal orientation is associated with a belief that outcomes in one’s life are mostly due to internal factors and are within one’s own control. Conversely, an external orientation means that one tends to believe that it is factors beyond one’s control which dictate the course of one’s life. These external factors might include such things as fate, luck, and influence from other people, among others. This construct has an important place in the field of language teaching and learning, because having a high degree of I-LoC is arguably aligned with having a strong sense of personal agency and self-efficacy and a greater degree of learner autonomy (Oxford, 2003; 2008); of course, these are qualities which are seen as beneficial to language learners. A number of studies with EFL students have demonstrated a correlation between a high I-LoC orientation and more successful
language learning outcomes (Chang & Ho, 2009; Ghonsooly & Elahi, 2012; Ghonsooly & Moharer, 2012; Ghonsooly & Shirvan, 2011; Peek, 2016).
In the Japanese context, the 43-item Kambara Locus of Control Scale (K-LoCS43; Kambara, 1987) has had a presence in the literature for many years. The 43-item scale has its roots in an earlier 18-item scale (K-LoCS18; Kambara, 1982) and represents a significant lengthening of the original instrument by simply adding further items. Since its creation, the K-LoCS43 has been used in a wide variety of domains including, but not limited to, studies on employee psychology (Kanda, 2006), developmental psychology (Fushimi, 2011), and, more recently, in secondary-level English education studies in Japan (Hosaka, 2007). The construct of locus of control clearly has a strong notional relationship to learner autonomy (Oxford, 2003; 2008), and given that learner autonomy has proved difficult to measure (Horai, 2013a, 2013b; Macaskill & Taylor, 2010), Rupp (2016a, 2016b,
2017b) proceeded with a line of research into the
psychometric properties of scores produced by the K-LoCS43, under the theoretical rationale that locus of control might stand in for an important aspect, or dimension, of learner autonomy. This work was conducted in the context of Japanese tertiary education, and it was initially necessary to test the model hypothesized for the instrument (both the 18-item and 43-item versions) by Kambara (1982, 1987) using CFA. Until then, such a test had not been conducted anywhere in the literature, in any domain; only EFAs had been conducted (Kambara, 1987; Hosaka, 2007). Scores produced by the K-LoCS43 were shown to have poor fit (Rupp, 2016a) with the model hypothesized in a study of Japanese high school students (N = 1125). A test of the 18-item version produced slightly better results than the 43-item version, but these were still unsatisfactory overall. These findings were later more fully explored through a focus group study (Rupp, 2016b) and an EFA-based analysis (Rupp, 2017b). The overall evidence from these various studies was combined in a mixed-methods analysis (Rupp, 2017a), which showed numerous areas of potential improvement for the K-LoCS43.
The results reported in this paper represent a continuation of this trajectory of research using new data. An abridged version of the K-LoCS43 was assembled by selecting a subset of items from the original scale. Previous research in the trajectory (Rupp, 2016a, 2016b, 2017b) had indicated that (a) the scale was too long, (b) items were repetitive and operationally redundant in cases, and (c) the 4-point Likert scale should be more refined. With respect to the length of the scale, other studies (Dragutinovich, White, & Austin, 1983; Ross, Kalucy, & Morten, 1983) on other locus of control instrumentation have reported better psychometric results after the number of items has been reduced in an instrument. Given this previous experience in the literature and the significant length of the K-LoCS43 for only two measured constructs (I-LoC and E-LoC), an abridged version was expected to have better prospects. The second point concerning operationally redundant items (i.e., near-repetitions of other items) corresponds with the first, because many of the items were not actually expanding the operational bandwidth of the two measured constructs. With respect to the third point, many focus group students
complained that the 4-point Likert scale was too restrictive (Rupp, 2016b), so for this study a 5-point scale was adopted.
In order to reduce the number of items, five items per construct were chosen (see Appendix) under the criterion that they not be operationally redundant in terms of any other item. Additionally, and using results from the previous EFA (Rupp, 2017b), items which had better loading coefficients on the two factors representing I-LoC and E-LoC were selected. Finally, participants in the focus group discussions reported that it was more natural for the Likert scale to range from positive to negative, in other words from strongly agree to strongly disagree. Kambara constructed the Likert scale to range from negative to positive, and previous studies by Rupp (2016a, 2016b, 2017b) had preserved this formulation. In this study, the preference indicated by the focus group participants was honored, and the scale was reversed accordingly.
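To make the selection rule concrete, the sketch below ranks items by their absolute loadings on a target factor from a prior EFA and keeps the top five per construct after excluding redundant items. The loadings, item labels, and redundancy set are invented placeholders; the actual coefficients appear in Rupp (2017b).

```python
import pandas as pd

# Placeholder EFA loadings: rows are K-LoCS43 items, columns are factors.
efa_loadings = pd.DataFrame(
    {"I-LoC": [0.55, 0.71, 0.68, 0.12, 0.08, 0.64, 0.59, 0.61],
     "E-LoC": [0.10, 0.08, 0.15, 0.66, 0.62, 0.05, 0.11, 0.09]},
    index=["k05", "k07", "k12", "k15", "k18", "k21", "k30", "k33"],
)
redundant = {"k30"}  # items judged near-repetitions of retained items

def select_items(loadings, factor, k=5, exclude=frozenset()):
    # Rank by absolute loading on the target factor, descending,
    # then drop operationally redundant items before taking the top k.
    ranked = loadings[factor].abs().sort_values(ascending=False)
    return [item for item in ranked.index if item not in exclude][:k]

print(select_items(efa_loadings, "I-LoC", exclude=redundant))
```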
Methods
The revised instrument comprised ten items (see Appendix) for which the students responded on a 5-point Likert scale.
Five items were hypothesized to measure the I-LoC construct (Items 02, 05, 06, 09, and 10) and five items the E-LoC construct (Items 01, 03, 04, 07, and 08). The survey was administered in Japanese, using the original Japanese phrasing from the K-LoCS43 for the selected items. The survey included a Japanese consent form informing the students of the voluntary nature of the questionnaire and that participation would not affect their grades. The abridged instrument created for this study was designated as the Kambara Locus of Control 10-Item Scale (K-LoCS10).
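The hypothesized two-construct measurement model can be written compactly in lavaan-style syntax. The sketch below uses the open-source semopy package as a stand-in for AMOS (the software actually used; see Analytical Procedure below). The data file name is hypothetical, but the item-to-construct assignments follow the text above.

```python
import pandas as pd
import semopy

# Two latent constructs, each measured by five observed items,
# matching the hypothesized structure of the K-LoCS10.
MODEL_DESC = """
ILoC =~ item02 + item05 + item06 + item09 + item10
ELoC =~ item01 + item03 + item04 + item07 + item08
"""

data = pd.read_csv("kloc10.csv")  # hypothetical file: one column per item
model = semopy.Model(MODEL_DESC)
model.fit(data)

# calc_stats reports the chi-square and a battery of fit indexes
# (CFI, TLI, RMSEA, among others).
print(semopy.calc_stats(model).T)
```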
Participants and Procedure
There were 213 total participants in this study with two responses removed for having incomplete data, leaving 211 usable responses (N = 211). The data was collected from students attending three Japanese universities, two public (32%) and one private (68%). Males represented 55% and females 45% of respondents. Ages ranged from 18 years to 24 years with a mean age of 19 years. The majority of respondents (95%) were between 18 years and 20 years. The faculties represented were agriculture (35.5%), business
administration (20.4%), engineering (11.4%), literature (8.5%), and management (24.2%). The survey required approximately 5 minutes to administer.
Analytical Procedure
The data collected from participants was entered directly into the IBM Statistical Package for the Social Sciences (SPSS; Version 20). Descriptive statistics (means, score distributions, and normality) and reliability estimates (Cronbach's alpha and the confidence intervals for alpha) were calculated using SPSS. The CFA was conducted with Analysis of Moment Structures (AMOS; Version 21). Item responses were given numerical values ranging from 1 to 5, with 1 being strongly disagree and 5 being strongly agree. The data was first analyzed with a focus on univariate normality (i.e., skewness and kurtosis). The critical ratios for skewness and kurtosis were compared against two criterion levels (stipulated in advance): less than 3.0 (more relaxed) and less than 2.0 (more rigorous), with the latter assisting in identifying particularly sound results.
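Both computations can be reproduced outside of SPSS for illustration. The sketch below, assuming a DataFrame with one column per item, computes Cronbach's alpha from its standard formula and the critical ratios for skewness and kurtosis using common large-sample standard-error approximations (SPSS and AMOS compute these internally, possibly with exact small-sample formulas).

```python
import numpy as np
import pandas as pd
from scipy.stats import skew, kurtosis

def cronbach_alpha(items: pd.DataFrame) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def normality_critical_ratios(items: pd.DataFrame) -> pd.DataFrame:
    """Critical ratio = statistic / its standard error, computed per item."""
    n = len(items)
    se_skew = np.sqrt(6.0 / n)   # approximate SE of skewness
    se_kurt = np.sqrt(24.0 / n)  # approximate SE of excess kurtosis
    crs = pd.DataFrame({
        "cr_skew": items.apply(lambda col: skew(col) / se_skew),
        "cr_kurt": items.apply(lambda col: kurtosis(col) / se_kurt),
    })
    worst = crs.abs().max(axis=1)
    crs["within_3.0"] = worst < 3.0  # more relaxed criterion
    crs["within_2.0"] = worst < 2.0  # more rigorous criterion
    return crs
```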
With respect to executing the CFA and assessing model fit, the analytical procedure went beyond merely relying on the chi-square statistic and its associated probability level, because various researchers in the literature caution that this statistic tends to over-reject models (Byrne, 2001; Hu & Bentler, 1999). In this regard, four indexes of model fit recommended by Hu and Bentler (1999) were adopted and used in triangulation. These included two absolute fit indexes (the SRMSR and RMSEA) and two incremental fit indexes (the TLI and CFI). An absolute fit index indicates how well the observed data fit the specified model, whereas an incremental fit index gives a comparison of how much improvement there is in model fit when the specified model is compared with a more restricted baseline model. It is important to note that these are indexes and not test statistics (like the chi-square), and therefore they are not interpreted in terms of probability (i.e., a p value), but rather on a continuum using cutoff criteria that are empirically informed and available in the literature.
The cutoff criteria for interpreting the values produced for the model on these indexes were adopted from Hu and Bentler's (1999) study. These were as follows: CFI > .95; TLI > .95; SRMSR < .08; RMSEA < .06. These cutoffs were empirically derived, using a Monte Carlo or simulation analysis, under the rationale of minimizing Type I and Type II error, that is, the mistake of rejecting a model when it should have been accepted or accepting a model when it should have been rejected. It is important to note that Hu and Bentler recommended using these indexes in triangulation. In other words, in order to claim model fit, a model has to satisfy the cutoffs on all of the indexes, not just one, or some, of them.
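The triangulation rule is simple to express programmatically: model fit is claimed only if every index satisfies its cutoff. A small illustration follows; the index values in the example are invented, and in practice they would come from whatever SEM software ran the CFA (AMOS here).

```python
# Hu and Bentler (1999) cutoffs, as adopted in this study.
HU_BENTLER_CUTOFFS = {
    "CFI":   lambda v: v > 0.95,   # incremental fit
    "TLI":   lambda v: v > 0.95,   # incremental fit
    "SRMSR": lambda v: v < 0.08,   # absolute fit
    "RMSEA": lambda v: v < 0.06,   # absolute fit
}

def model_fits(indexes: dict) -> bool:
    """Return True only if every index satisfies its cutoff (triangulation)."""
    return all(check(indexes[name]) for name, check in HU_BENTLER_CUTOFFS.items())

# Example with made-up index values:
print(model_fits({"CFI": 0.97, "TLI": 0.96, "SRMSR": 0.05, "RMSEA": 0.04}))  # True
```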
Results
The results, presented below, are reported in terms of descriptive statistics for item scores, reliability estimates for the two subscales, and model fit assessment consequent to the CFA.
Descriptive Statistics
Table 1 indicates the means and standard deviations for each item. From Table 1 it can be observed that means tended to fall in the positive range (or agreement range); that is, above 3.0. Only Item 04 and Item 08 presented with means falling in the negative range (or disagreement range).
Table 1
Item Means and Standard Deviations (SD) for Scores Derived on Items Comprising the K-LoCS10 (N = 211)

Test item   M      SD
Item 01     3.02   1.248
Item 02     4.09   1.036
Item 03     3.69   1.116
Item 04     2.55   1.184
Item 05     4.09   1.000
Item 06     3.61   1.227
Item 07     3.04   1.294
Item 08     2.33   1.332
Item 09     4.43   0.925
Item 10     4.14   0.993
Table 2 shows the critical ratios for skewness and kurtosis for each item. Skewness and kurtosis are essentially measures of the spread or distribution of scores. If, for example, there