Michael Mendicino
Educational Psychology, West Virginia University

Neil Heffernan
Computer Science Department, Worcester Polytechnic Institute

Journal of Interactive Learning Research

Title: Comparing the learning from intelligent tutoring systems, non-intelligent computer-based versions, and traditional classroom instruction
Abstract
There have been some studies that show that computer-assisted instructional systems (CAI) can be superior to traditional classroom instruction (Kulik, 1983, 1994, 2003; Bangert-Drowns, Kulik & Kulik, 1985). Other studies have compared new "intelligent tutoring systems" (ITS) to classroom instruction (Koedinger, Anderson, Hadley, & Mark, 1997; Anderson, Corbett, Koedinger, & Pelletier, 1995), while many studies have compared intelligent tutoring systems to CAI-like controls (Carroll & Kay, 1988; Corbett & Anderson, 2001; Mathan, 2003; Schooler & Anderson, 1990). We are aware of no studies that have taken a single ITS and compared it to both: 1) classroom instruction and 2) CAI. In this study we compare these three (classroom instruction, CAI, and ITS) using a newly developed ITS (Heffernan & Koedinger, 2003). We seek to quantify the value added of CAI over classroom instruction, versus the value added of ITS on top of CAI. We found evidence that the ITS was better than classroom instruction, with an effect size of 0.6. Our results in trying to calculate the value added of the CAI over the classroom were mixed, with two studies showing effects but the third not showing statistically reliable differences. The extra value added of the ITS over CAI did seem to be robust across the three studies, with an average 0.4 effect size.
There have been some studies that show that traditional computer-assisted instructional systems (CAI) can be superior to traditional classroom instruction (Kulik, 1983, 1994, 2003; Bangert-Drowns, Kulik & Kulik, 1985). Other studies have compared new so-called "intelligent" tutoring systems (ITS) to classroom instruction (Koedinger & Anderson, 1993; Koedinger, Anderson, Hadley, & Mark, 1997; Anderson, Corbett, Koedinger, & Pelletier, 1995), while many studies have compared intelligent tutoring systems to CAI-like controls (Carroll & Kay, 1988; Corbett & Anderson, 2001; Mathan, 2003; Schooler & Anderson, 1990). We are aware of no studies that have taken a single ITS and compared it to both: 1) classroom instruction and 2) CAI. In this study we compare all three with respect to student learning and "motivation" in the algebra domain.
Kulik's (1985 & 1994) studies suggest CAI systems lead to about 0.3 to 0.5 standard-deviation effect sizes over classroom instruction. The Koedinger et al. (1997) study, which compared a commercially available ITS (Cognitive Tutor) to a classroom control, suggests a 1 standard-deviation effect size for experimenter-designed metrics, while for external metrics (the Iowa Algebra Aptitude Test and a subset of the Math SAT) it found an effect size of 0.3; this study may also suffer from a confound of the effect of the ITS with a new textbook prepared to go along with the curriculum. We are uncertain how to compare these effect sizes with the Kulik and Kulik effect size of about 0.4, as we do not know whether the metrics in the Kulik and Kulik studies are more like externally designed measures or experimenter-defined measures. In another study, VanLehn et al. (2005) compared an ITS not to classroom instruction, but to doing homework in a traditional paper-and-pencil manner. They found results similar to the Cognitive Tutor results mentioned above, with effect sizes of about 1 SD for their own measures, and about 0.4 for what they consider analogous to "externally designed measures".
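Since effect sizes are compared repeatedly below, it may help to state the convention we assume these studies share: a standardized mean difference (Cohen's d with a pooled standard deviation):

$$ d = \frac{\bar{X}_{\mathrm{treatment}} - \bar{X}_{\mathrm{control}}}{s_{\mathrm{pooled}}}, \qquad s_{\mathrm{pooled}} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} $$

Under this convention, an effect size of 0.4 means the average treated student scored 0.4 pooled standard deviations above the average control student.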
In this study we compare these three (classroom instruction, CAI, and ITS) using a newly developed ITS (Heffernan & Koedinger, 2003). We seek to quantify the value added of CAI over classroom instruction, versus the value added of ITS on top of CAI. How much more learning does adding an "intelligent tutor" get you over CAI? This question is important because ITSs are complex and costly to build, and we need to understand whether they are worth the investment, or whether CAI is good enough. We do this in the context of a mathematics classroom while teaching the skill of writing algebra expressions for word problems, a skill we call symbolization. In this paper we report three experiments, with three teachers and a total of 160 students. All studies involved analyzing the amount of learning by students within one classroom period, measured by experimenter-constructed pre- and posttests administered the day before and the day after the experiment. Seven of the items were experimenter-designed questions and two were standardized test questions.
Area of Math we focused on
Students in the United States lag behind many other countries in math skills, particularly at the eighth and twelfth grade levels (TIMSS, 1997). While US eighth grade students showed improvement in math, scoring above the international average (TIMSS, 2003), better math instruction that integrates technology is still needed to ensure continued improvement. One skill students have difficulty with is writing algebra expressions for word problems, a skill we call symbolization. Heffernan & Koedinger (1997) stated that "symbolization is important because if students cannot translate problems into the language of algebra, they will not be able to apply algebra to solve real world problems." The need for this skill is more crucial now because students have access to graphing calculators and computers that can perform symbol manipulation, but translating word problems into the symbolic language of algebra remains a uniquely human endeavor.

Other studies have been conducted to determine what behaviors make human tutoring effective and how these behaviors can be incorporated into computer-based tutoring systems (McArthur, Stasz, & Zmuidzinas, 1990; Merrill, Reiser, Ranney, & Trafton, 1992; Graesser & Person, 1994; Chi, Siler, Jeong, Yamauchi, & Hausmann, 2001). For example, Merrill et al. (1992) concluded that a major reason human tutors are effective is that they let students do most of the work in overcoming impasses, while providing only as much assistance as necessary and keeping students from following "garden paths" of reasoning that are unlikely to lead to learning. VanLehn, Siler, & Murray (2003) also found that allowing students to reach impasses correlated with learning gains. Finally, numerous studies (Swanson, 1992; Graesser, Person, & Magliano, 1995; Chi, Siler, Jeong, Yamauchi, & Hausmann, 2001; Katz, Connelly, & Allbritton, 2003) hypothesized that it is the interactive nature of the tutorial dialog (i.e., the interaction hypothesis) that accelerates learning.
Computer-Assisted Instruction
Computer-based tutoring systems appear to hold promise for improving mathematics instruction. The first computer-based tutoring systems appeared over thirty years ago with the goal of approaching the effectiveness of human tutors. According to Corbett & Trask (2000), these systems, called computer-assisted instruction (CAI), afforded one advantage of human tutors: individualized interactive learning support. While these systems were interactive and provided explicit instruction in the form of long web pages or lectures, they offered no dialog. Studies demonstrated the effectiveness of CAI in mathematics at the elementary level (Burns & Bozeman, 1981), secondary level (Kulik, Bangert, & Williams, 1983), and college level (Kulik, Kulik, & Cohen, 1980). In a meta-analysis of 28 studies involving CAI, Kulik et al. (1985) found that CAI improved student achievement by an average effect size of 0.47 over students receiving conventional instruction. In another meta-analysis, Kulik (1994) summarized 97 studies from the 1980s that compared classroom instruction to computer-based instruction and found an average effect size of 0.32 in favor of computer-based instruction. Kulik claimed that students learned more and learned faster in courses that involved computer-based instruction. Finally, Kulik (2003) summarized the findings of eight meta-analyses covering 61 studies published after 1990. The median effect size for studies using computer tutorials was 0.59, meaning that students who received computer tutorials performed at the 72nd percentile while students receiving conventional instruction performed at the 50th percentile. While these studies suggest that CAI can be an effective instructional aid in both elementary and secondary schools, CAI does not address the main concern of McArthur et al. (1990), who claim that teaching tactics and strategies are the least well developed components of most intelligent tutors.
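The percentile figures quoted here follow from reading the effect size as a z-score under a normal model: a student at the control-group mean who moves up d standard deviations lands at the 100·Φ(d) percentile, where Φ is the standard normal cumulative distribution function. For the median effect size above:

$$ 100 \cdot \Phi(0.59) \approx 72, \qquad 100 \cdot \Phi(0) = 50. $$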
Cognitive Tutors

Cognitive tutors, in contrast, provide individualized assistance that is just-in-time and sensitive to the student's particular approach to a problem (Anderson, Corbett, Koedinger, & Pelletier, 1995). They also provide canned explanations and hint messages that get more explicit as students continue asking for help, until the tutor is telling the student exactly what to do. The feedback is immediate and step-wise and is structured so as to lead students toward expert-like performance. The tutor intervenes as soon as students deviate from the solution path, but the cognitive tutor does not engage students in dialog by asking new questions. Cognitive tutors also use knowledge tracing technology that traces students' knowledge growth across problem-solving activities and uses this information to select problems and adjust the pacing to adapt to individual student needs.
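The knowledge tracing used in Cognitive Tutors is standardly formalized as Bayesian Knowledge Tracing (in the Corbett & Anderson tradition); the sketch below shows the core update step. The parameter values are illustrative assumptions of ours, not values from any tutor discussed in this paper.

    def bkt_update(p_know, correct, p_slip=0.1, p_guess=0.2, p_learn=0.15):
        """One Bayesian Knowledge Tracing step: revise P(skill is known)
        from one observed response, then apply the learning transition."""
        if correct:
            evidence = p_know * (1 - p_slip)
            posterior = evidence / (evidence + (1 - p_know) * p_guess)
        else:
            evidence = p_know * p_slip
            posterior = evidence / (evidence + (1 - p_know) * (1 - p_guess))
        # The student may also have learned the skill at this opportunity.
        return posterior + (1 - posterior) * p_learn

    # A correct answer raises the tutor's estimate that the skill is known,
    # which in turn drives problem selection and pacing.
    p = 0.3
    p = bkt_update(p, correct=True)   # roughly 0.71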
Even though these new cognitive tutors do not engage students in dialog, they have nonetheless had a significant impact on student learning in a variety of domains. For example, Koedinger, Anderson, Hadley, & Mark (1997) compared a cognitive tutor, PAT (Pump Algebra Tutor), to traditional algebra instruction. The PAT intelligent tutor was built to support the Pittsburgh Urban Mathematics Project (PUMP) algebra curriculum, which is centrally focused on mathematical analysis of real world situations and the use of computational tools. The study evaluated the effect of the PUMP curriculum and PAT tutor use and found that students in the experimental classes outperformed control classes by 100% on assessments of the targeted problem solving and multiple representations. These results also translated into a one standard-deviation effect size. Recent studies comparing PAT and traditional algebra instruction have found improvements in the 50-100% range, thus replicating the above results (Koedinger, Corbett, Ritter, & Shapiro, 2000). This cognitive tutor is currently used by approximately 375,000 students in over 1000 schools.
Morgan & Ritter (2002) conducted a study comparing the Cognitive Tutor Algebra I course and a traditional Algebra I course, which used a different text, with students in their junior high school system. Dependent measures included the Educational Testing Service (ETS) Algebra I end-of-course exam, course grades, and a survey of attitudes towards mathematics. These measures have the benefit of not having been defined by the experimenters themselves. When restricting the analysis to only those teachers who taught both curricula, the researchers found statistically significant differences on all dependent measures in favor of the cognitive tutor. Morgan and Ritter state that the strongest components of teacher effects have to do with teacher education and professional development and only indirectly with practices. In their study, the curriculum effect they were examining had to do with teacher practices, which would be expected to be relatively small. Therefore, they conclude that the effect size of 0.29 is impressive taken in this context.
Finally, as part of the Andes project, VanLehn et al. (2004) evaluated Andes, an ITS developed to replace paper-and-pencil homework and to increase student learning in introductory college physics courses. Andes provides immediate feedback to student responses and also provides three kinds of help: 1) pop-up error messages when the error is probably due to lack of attention rather than lack of knowledge, 2) What's Wrong Help when the student is essentially asking what is wrong with an entry, and 3) Next Step Help when students are not sure what to do next. The What's Wrong and Next Step Help selections generate a hint sequence that includes a pointing hint, a teaching hint, and a bottom-out hint that tells students exactly what to do.

Andes was evaluated from 1999 to 2003, and in all years Andes students scored higher than control students, with effect sizes ranging from 0.21 to 0.92. VanLehn et al. compared their results to the results of the Koedinger et al. (1997) study, which they suggest is the benchmark study with respect to tutoring systems. The Koedinger et al. study evaluated the PAT intelligent tutoring system and a novel curriculum (PUMP), which Carnegie Learning distributes as the Algebra I Cognitive Tutor. Koedinger et al. used both experimenter-designed questions and standardized tests. Analyzing the experimenter-designed tests, they found effect sizes of 1.2 and 0.7, and they found effect sizes of 0.3 when analyzing multiple-choice standardized tests. VanLehn et al. found very similar effect sizes (1.21 & 0.69) for their conceptual, experimenter-written tests and a similar effect size, 0.29, for their multiple-choice standardized tests. Thus, both evaluations have similar tests and effect sizes: both have impressive 1.2 and 0.7 effect sizes for conceptual, experimenter-designed tests, and lower effect sizes on standardized, answer-only tests. Given the large difference between experimenter-designed tests and externally designed tests, it makes one wonder how to interpret the Kulik studies that argue that CAI, when compared to classroom instruction, gives between 0.3 and 0.7 effect sizes.
The authors of the Andes study stated that their evaluation differed from the Koedinger et al. evaluation in a crucial way. The Andes evaluations manipulated only the way that students did their homework: on Andes vs. on paper. The evaluation of the Pittsburgh Algebra Tutor (PAT) was also an evaluation of the Pittsburgh Urban Mathematics Project (PUMP) curriculum, which focused on analysis of real world situations and the use of computational tools such as spreadsheets and graphers. Therefore, how much gain was due to the tutoring system and how much was due to the new curriculum is not clear. Finally, VanLehn et al. stated that in their study the curriculum was not reformed; therefore, the gains in their evaluation may be a better measure of the power of intelligent tutoring systems per se.
Dialog-based Intelligent tutors
Both CAI and cognitive tutors have proved to be more effective than traditional classroom instruction, yet neither has approached the effectiveness of human tutors. Perhaps they have not captured the features of human tutoring that account for its effectiveness. Researchers have recently developed ITSs that incorporate dialog that is based on human tutors in specific domains. Preliminary results are promising. We mention two related projects before focusing on Heffernan's system used in this evaluation.

The Tutoring Research Group at the University of Memphis has developed AutoTutor (Graesser et al., 2001), an ITS that helps students construct answers to computer literacy questions and qualitative physics problems by holding a conversation in natural language, thus taking advantage of the interaction hypothesis. AutoTutor attempts to imitate a human tutor by reproducing the dialog patterns and strategies that were likely to be used by a human tutor. AutoTutor presents questions and problems from a curriculum script, attempts to comprehend learner contributions that are entered by keyboard, formulates dialog moves that are sensitive to the learner's contributions … and delivers the dialog moves with a talking head that simulates facial expressions and speech to give the impression of a discussion between the tutor and student (Graesser, Wiemer-Hastings, K., Wiemer-Hastings, P., & Kreuz, 1999). AutoTutor has produced gains of 0.4 to 1.5 sigma depending on the learning performance measure, the comparison condition, the subject matter, and the version of AutoTutor (Graesser et al., 2003).
Rosé, Jordan, Ringenberg, Siler, VanLehn, and Weinstein (2001) integrated Atlas and the Andes system to compare a model-tracing ITS with an ITS incorporating dialog. Atlas facilitates incorporating tutorial dialog, while Andes is a model-tracing ITS for quantitative physics that provides immediate feedback by highlighting each step attempted in green or red to indicate a right or wrong answer. Andes also provides a hint sequence for students asking for help. The researchers were able to compare student learning between the original Andes and the integrated Atlas-Andes with dialog. Atlas-Andes students scored significantly higher on post-test measures, with a difference of 0.9 standard deviations.
Heffernan & Koedinger (2002) and Heffernan (2001) developed an intelligent tutoring system (ITS) that uses dialog to help students write algebra expressions. It models both student behavior and tutorial behavior by combining a cognitive model of student behavior in the domain of symbolization with a tutorial model of strategies observed in human tutors. The cognitive student model has a set of production rules that models the problem-solving skills needed to write algebraic expressions. The tutorial model is based on the observation of an experienced human tutor during an hour-long tutoring session and thus captures tutorial strategies that were observed to be effective for human tutors in the domain of symbolization. It was the first intelligent tutor that had both a model of student thinking and a model of tutorial planning. The system is of the "coached practice" variety that is meant to scaffold "learning by doing" while students practice their problem-solving skills. It is different from typical Cognitive Tutors in that it takes its cues more from the dialogs that human tutors have with students, and it is also more flexible in the interactions it has with students. For example, it can acknowledge that part of an answer is correct and then engage a student in a "sub-dialog" to help him or her improve the incorrect path. It "breaks" problems down for students by asking questions and re-phrasing questions, but does not give students answers. Whereas Cognitive Tutors typically cover a full year of algebra, Heffernan's ITS deals only with symbolization problems. The following is a sample tutorial dialog:
Tutor: In the marching band there are 7 fewer flute players than drummers. If the number of drummers is "x", write an expression for the total number of flute players and drummers.
Student: 7-x+x {Correct answer is x+(x-7)}
Tutor: No. We will come back to the total number of flute players and drummers. Let us first write an expression for the number of flute players.
Student: x-7
Tutor: Correct. Now, let's try this question again: Please write an expression for the total number of flute players and drummers.
Student: x-7+x
Tutor: Correct. Let us go on to the next problem.
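The paper does not reproduce Heffernan's production rules, but a toy sketch may convey the flavor of a production-rule model of symbolization: small rules that map quantity relations in the problem text onto algebraic sub-expressions, which then compose into a full answer. Everything below is hypothetical and for illustration only, not the actual Ms. Lindquist model.

    # Hypothetical production-rule-flavored sketch; not Heffernan's model.
    def fewer_than(amount, base):
        # "A fewer X than Y" -> Y - A, e.g., "7 fewer flute players than drummers"
        return f"({base}-{amount})"

    def total(*parts):
        # "the total number of X and Y" -> X + Y
        return "+".join(parts)

    drummers = "x"
    flute_players = fewer_than("7", drummers)   # "(x-7)"
    answer = total(flute_players, drummers)     # "(x-7)+x", equivalent to x+(x-7)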
Heffernan (2001) investigated the impact on learning of two versions of his ITS. In the control version, if students answered incorrectly, the tutor told them the answer to type and then moved on to another problem. This approximates a common homework situation in which students can look up correct solutions in the back of the book. In the experimental version, the ITS engaged students in tutorial dialog specific to student errors in an attempt to help students construct the correct answer. Students in the experimental condition performed better on post-test measures, showing evidence of learning from dialogs. Heffernan only controlled for the number of problems in this experiment and not for time; therefore, he did not determine whether the extra time spent in the dialogs was worth the effort.
Heffernan (2002) reported on a web-based experiment in which he controlled for time in an attempt to see whether the learning gains students acquired were worth the extra time students spent in dialog. Heffernan found students in the experimental condition completed only half as many problems as students in the control condition, but still showed learning gains over the control condition with an effect size of 0.5. Heffernan also reported a possible motivational benefit of dialog. In summary, Ms. Lindquist seems to be one example supporting the hypothesis that incorporating dialog into an ITS can lead to increases in student learning. Heffernan & Croteau (2004) replicated some of these findings, showing that Ms. Lindquist seems to have some benefit over CAI for some lessons.
The purpose of these experiments is to replicate research comparing normal classroom instruction and CAI, and to extend that research by also comparing supposed "intelligent" tutoring instruction to the other conditions. We will test the hypothesis that "intelligent" dialog accounts for more learning than 1) computer-assisted instruction as well as 2) classroom instruction. This investigation will seek to determine how much added value "intelligence" accounts for above computer-assisted instruction when compared to classroom instruction. We will also investigate differences in learning and motivation when comparing classroom instruction, computer-assisted instruction, and intelligent tutoring.
Experiment 1: Compare One Teacher to CAI and ITS
In this experiment, students' learning of symbolization skills is measured with pretests and post-tests administered before and after classroom instruction (traditional or cooperative), computer-assisted instruction, or computer instruction with additional intelligent tutoring.

Research Question: The research question for these studies was: Are the effects of computer-delivered instruction significantly better than the effects of classroom instruction on students' ability to learn to symbolize? At a finer-grained level, are the effects of intelligent tutoring feedback different from the effects of the simple, non-intelligent tutoring approach of traditional CAI?
Setting and Participants. The study took place in the students' regular algebra classrooms and in a computer lab that consisted of 20 computers with internet access. The high school was located in a rural area and served approximately 1200 students. Forty-six percent of the students received free or reduced-price lunches. According to Department of Education data on NCLB, this school ranked in the bottom half and did not meet AYP due to low socio-economic subgroup scores.
The participants for Experiment 1 were students enrolled in equivalent Algebra 1 inclusion classes during the 2004-2005 school year. The classes were not Honors or Advanced Placement, but were typical classes with students mostly of average ability. One class had twenty-two students and the other had twenty-one students. However, a total of seven students, four from one class and three from the other, were not included in the study because they missed at least one day during the experiment. Therefore, a total of thirty-six students participated in the study, twenty-two females and fourteen males. Fourteen were students identified as learning disabled, and twenty-two were typical regular education students. There were thirty freshmen and six sophomores, ranging in age from fourteen to sixteen years. The classes were co-taught by a fully certified regular education math teacher and a highly qualified (math through Algebra 1) special education teacher. Both teachers shared responsibilities for teaching algebra content, lesson planning, and student accommodations. The lead author was the primary instructor for both classes during the experiment, but was not the students' regular teacher. Individual Education Programs were reviewed to ensure that the general classroom placement was the least restrictive and most appropriate Algebra I placement for students with learning disabilities.
Content. The computer curriculum is composed of five sections, starting with relatively easy one-operator problems (e.g., "7x") and progressing up to more difficult four- or five-operator problems (e.g., "3x+5*(20-x)"). The content of the 9-item pre- and post-tests was identical and contained four multiple-choice questions and five questions requiring students to write algebraic expressions. (See Appendix A for sample tests.) Seven of the items were experimenter-designed questions and two were standardized test questions. An answer key was constructed and used by the scorer to award one point for each correct answer.

The classroom lessons were designed with items of similar content, format, and difficulty level. In fact, problems used in the classroom lessons were isomorphic to the computer lessons so no group had an unfair advantage. (See Appendix B for sample classroom problems.)
Procedures. Both the control and experimental conditions took place during the students' regular fifty-minute class periods. The classroom lessons were delivered by the lead author, and the study was conducted over a one-week period, with the pretest, mid-test, and post-test administered on Monday, Wednesday, and Friday and the computer condition presented on Tuesday and Thursday. Prior to the experiment, students in both classes had minimal exposure to algebraic expressions and equations while working in their text: Algebra I, Glencoe Mathematics Series.
During the traditional instruction condition, the classroom activities were divided into two main parts: 1) introduction with in-class examples, and 2) guided practice. The introduction period began with the teacher giving each student a worksheet containing twenty-five word problems ranging in difficulty from simple one-operator problems to complex four-operator problems. After reviewing the objective of the lesson, problems were displayed on an overhead projector while the instructor read a problem and demonstrated how to translate it into an algebraic expression. The instructor used various instructional strategies, separately and in combination, while demonstrating problems. For example, on one problem the instructor exclusively used the "clue" word method, identifying clue words such as "more than", "less than", and "sum" that indicate mathematical operations and parentheses. On another problem, he used the "clue" word method along with dividing the problem into component parts and solving each part separately. On all problems demonstrated, however, the instructor continually checked for understanding by asking comprehension-gauging questions and eliciting questions and discussion from students. A total of five problems were presented, taking approximately twenty minutes. During guided practice, students were instructed to work on the remaining problems until the end of the class period, approximately thirty minutes. The instructor was available to all students and assisted in the order in which help was requested. The guidance was not interactive in nature, but consisted mainly of prompting students to look for clue words, defining words (e.g., "per" means "divide by", "twice" means "two times a number"), explaining procedures (e.g., "less than" is a backwards construction), and giving hints. All questions were answered regardless of their nature.
The cooperative instruction condition also consisted of two parts: introduction with in-class examples and cooperative learning groups. The introduction period followed the same instructional sequence used during the traditional instruction condition and also lasted twenty minutes. However, students were then placed in groups of four and encouraged to work together on the problems with no additional guidance from the instructor. The cooperative learning model had been used on a regular basis in these classes, so students were familiar with the structure and expectations. For example, students understood the concept of peer support inherent in the groupings and the many forms in which it can be manifested, such as clarifying, interpreting, modeling, explaining, and taking responsibility for their own learning as well as the group's learning. When students requested assistance from the instructor, they were reminded to attempt the problem as a group first and then were given indirect support, when needed. Students worked on the problems in their groups for thirty minutes.
During the computer-delivered lesson, students logged on to the computer as soon as the class began. This process took five minutes for the majority of students; a few, however, needed more time to fully log on. The computer system then randomly assigned each student either to the ITS or the CAI condition. Students continued working on computer-delivered problems until the end of class. The additional five to seven minutes spent logging on effectively resulted in less instructional time for the students in the computer lesson. Therefore, students in the classroom conditions received about eight percent more time on task.
Design. A counterbalanced design, in which all groups received all treatments but in a different order, was used in this study. Each student participated in the experimental condition and in either the cooperative or traditional instruction part of the control condition. For example, students in Group 1 participated in the control condition first, while students in Group 2 participated in the experimental condition first, ensuring that each group participated in a different sequence. The experiment lasted one week. On Monday, students were administered the pretest and were given instructions and a demonstration of how to log on to the computer-based system. Students were not allowed to practice items at this time; the goal was to become familiar with the computer system and its operations. On Tuesday, Group 1 participated in the cooperative instruction condition while Group 2 participated in the computer condition. On Thursday the order was reversed. On Wednesday and Friday all students were given a mid-test and post-test, respectively.

Every effort was taken to ensure a matched control group. While random assignment was not possible in the school setting, the control group was an equivalent Algebra 1 class taught by the same teachers. A pretest was administered to both groups to ensure initial balance on the dependent measures. The students were given a mid-test after the first condition and a post-test after completion of the experiment. The data for this study were analyzed using SPSS. Repeated measures analysis of variance, one-way analysis of variance, t-tests, and descriptive statistics were used. Table 1 displays the overall design of the study.
Table 1

Group 1 – Classroom First, Then Computer
Monday (~10-20 minutes): Pretest / introduction to the computer system
Tuesday (average 50 minutes): Experiment: Cooperative Learning Condition
Wednesday: Mid-test
Thursday (average 45 minutes): Experiment: Computer Condition (randomly assigned by computer to CAI or ITS)
Friday: Post-test

Group 2 – Computer First, Then Classroom
Monday (~10-20 minutes): Pretest / introduction to the computer system
Tuesday (average 45 minutes): Experiment: Computer Condition (randomly assigned by computer to CAI or ITS)
Wednesday: Mid-test
Thursday (average 50 minutes): Experiment: Traditional Learning Condition
Friday: Post-test
Results from Exp 1
The means of the two groups were balanced at pretest (mean number correct for the computer-first group = 3.00, sd = 1.085; mean number correct for the computer-second group = 2.89, sd = 1.323; t = .275, p = .785). Students in all conditions learned a significant amount, as shown by a repeated measures analysis of variance, which revealed statistically significant differences between the mean number correct at pretest, mid-test, and post-test (F = 30.32, p < .001). Given that the groups were balanced at pretest and there were large learning gains, we want to determine whether there were disproportionate gains dependent upon condition. We first compared both computer versions (CAI & ITS) as a group with classroom instruction. Later, we break out the CAI versus ITS comparison.
First we discuss the main effect of computer versus classroom. Given that we had a pretest, mid-test, and post-test, we calculated gain scores for each student for both the classroom and computer conditions. (For example, if "Johnny" was in the condition that first went to the computer lab and later, after the mid-test, had classroom instruction, Johnny's computer gain would be his mid-test score minus his pretest score, and Johnny's classroom gain would be his post-test score minus his mid-test score.) There was a statistically significant difference (t = 2.469, p = .019) between the average computer gain (m = 1.7, sd = 1.22) and the average classroom gain (m = 1.1, sd = .926), suggesting that students learned about 0.67 more problems from the computer than from the classroom. The effect size of this difference was 0.60, with a 95% confidence interval of 0.08 to 1.02.
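As a minimal sketch of the gain-score analysis described above (the data here are synthetic stand-ins, not the study's scores, and the effect-size computation is one common convention, not necessarily the one used in the paper):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n = 36
    # Synthetic (pretest, mid-test, post-test) scores on the 9-item test.
    pre = rng.integers(1, 5, n).astype(float)
    mid = pre + rng.normal(1.5, 1.2, n)
    post = mid + rng.normal(1.0, 1.0, n)
    computer_first = rng.random(n) < 0.5  # counterbalanced order flag

    # Which gain belongs to which condition depends on a student's order.
    computer_gain = np.where(computer_first, mid - pre, post - mid)
    classroom_gain = np.where(computer_first, post - mid, mid - pre)

    t, p = stats.ttest_rel(computer_gain, classroom_gain)  # paired, within student
    diff = computer_gain - classroom_gain
    d = diff.mean() / diff.std(ddof=1)
    print(f"t = {t:.3f}, p = {p:.3f}, d = {d:.2f}")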
Next we consider whether computer learning gains were different between CAI and ITS. When comparing conditions, the ITS group showed about a 0.5-problem gain over the CAI condition (CAI: m = 1.42, sd = 1.45; ITS: m = 1.95, sd = 1.04), which yielded an effect size of 0.41 and a confidence interval of -0.06 to 0.88, but these differences were not statistically significant (t = 1.26, p = .215). This somewhat large p-value suggested that we should run more subjects to see if we can get a statistically significant result (see Experiment 2 below). Given that the ITS seemed to be better than classroom instruction, we compared the performance of the ITS versus the classroom, dropping those students who got the CAI, and found that the students who had the ITS did much better (t = 2.673, p = .014), with an effect size of 0.59 and a confidence interval of -0.02 to 1.19. Conversely, when we compared CAI to classroom instruction the results were not statistically significant (t = .905, p = .383), with an effect size of 0.4 and a confidence interval of -0.34 to 1.15.
Because we used a counterbalanced design, in which half of the students got the computer first and the other half got the classroom instruction first, it is worth looking to see whether there is an effect of order, and indeed we found that students' average learning gains were higher in the first session as a main effect (F = 29.38, p < .001). When comparing computer gain by group, the computer-first group (m = 1.89, sd = 1.49) out-gained the classroom-first group (m = 1.78, sd = 1.26). The mean mid-test score for the computer-first group was 5.06 (sd = 1.11) and for the computer-second group 3.67 (sd = 1.71). The mean post-test score was 5.44 (sd = .856) for the computer-first group and 5.44 (sd = 1.38) for the computer-second group.
Because the first author was particularly interested in students with learning disabilities, we did a more focused analysis looking at the fifteen students with learning disabilities (LD) and found, not surprisingly, that they started with lower pretest scores (m = 2.93, sd = 1.22). Repeating the above analysis using only the 15 students with LD, we found that students with LD showed statistically significant differences in learning (t = 2.101, p = .054) and similar but larger effects, showing that the students with LD learned more from the computer than from classroom instruction. The average classroom gain was m = .954 (sd = .91) while the average computer gain was m = 1.62 (sd = 1.22), with an effect size of 0.55 and a confidence interval of -0.18 to 1.28. The students with learning disabilities made gains comparable to students without learning disabilities, but started out with lower average pre-totals.
Discussion of Experiment 1
On average, all students had learning gains in all conditions. However, these comparisons indicate that students' use of computer-delivered "intelligent" feedback (ITS) enhanced learning of symbolization skills more than teacher-centered classroom instruction and CAI. These results also hold for students with learning disabilities.
It is important to note that while the computer condition overall was significantly better than the classroom condition, the tutorial dialog was not significantly better than the simple computer-assisted condition. Therefore, we cannot be certain that tutorial dialog alone is more effective or more efficient than simple computer feedback.
There are other factors that may have accounted for the present results, including differences in classroom conditions. For example, students were either in the computer-first or computer-second conditions, and some students were in direct instruction while others were in cooperative learning groups. Also, we cannot rule out multiple-treatment interference, because each group received more than one treatment, nor can we rule out the effects of practice, as each student participated in pre-, mid-, and post-tests. Experimenter bias may also be a factor, because the first author taught both lessons in the classroom conditions. It is also important to mention that this experiment was conducted in October, only one and one-half months into the school year, with students having had only minimal exposure to writing algebra expressions; therefore, these students might be considered naïve learners with respect to symbolizing. Given that research (Rosé & VanLehn, 2003) suggests that naïve learners may benefit more from tutorial dialog, we cannot rule this out as a factor in the benefit of computer-delivered intelligent feedback in this experiment. Finally, the control condition consisted of two different teaching methodologies, traditional instruction and cooperative instruction. Perhaps one of these methods is not very effective for teaching algebra word-problem symbolization in a single class period.
Experiment 2: Comparing CAI versus ITS done as homework
In Experiment 1, students who received intelligent tutoring feedback showed about a two-thirds-problem gain over students who received simple computer-based feedback. The results were not statistically significant. Also, students in the traditional instruction condition out-gained students in the cooperative instruction condition by about one-third of a problem. Again, these results were not significant. Thus, the purpose of Experiment 2 was to determine if we could obtain statistically significant results between intelligent feedback and simple computer-based feedback with additional students, while also eliminating the confound between types of classroom teaching.
As we explain below, we intended to replicate Experiment 1 with larger numbers and to eliminate the confound of type of classroom instruction, but instead we wound up comparing CAI and ITS when done at home. This is similar to the VanLehn et al. (2004) study, in which they compared Andes, an ITS that provides immediate step-wise feedback but no dialog, to paper-and-pencil homework and found that Andes helped students learn while replacing only their paper-and-pencil homework. However, their study did not compare the ITS to CAI, nor did it look at motivational benefits of the ITS.
Our two research questions for Experiment 2 are:

Research question #2a: For students who have internet connections at home, does 40 minutes of classroom problem solving do better or worse, in terms of student learning gains, than ~40 minutes of home computer use?

Research question #2b: For students who have internet connections at home, does ~40 minutes of CAI do better or worse, and by how much, than supposed "intelligent" tutoring? Measures of interest included student learning gains measured in school, gains within the computer system itself, time on task, and student-reported satisfaction with the computer system. We also asked students to self-report times and condition.
Method
Setting and Participants. The setting for this study was two regular algebra classrooms in the same high school as in Experiment 1, and students' home computers. That is, instead of the experimental condition occurring in a computer lab in the school, students in both classes were offered extra credit to do the experimental condition as a homework assignment. Obviously, only students with internet access could participate in the study. This was not a planned circumstance, as we had intended to replicate Experiment 1, but because the computer labs had recently installed new security software that prevented the web site from functioning correctly, the first author instead sought volunteers to engage in the computer condition at home for extra class credit.

Fourteen students from each class agreed to participate in the experimental condition, which meant they agreed to work at home, for at least thirty minutes, on the computer-delivered system. Thus, the participants for this study were twenty-eight students (20 female, 8 male, ages 14-16 years) out of a possible forty-five students from both classes. All students were classified as typically achieving students; that is, none were identified as learning disabled. The lead author, while not the students' regular teacher, taught both classes during the experiment.
Procedures. As in Experiment 1, the study was conducted over a one-week period and included a pretest, mid-test, and post-test administered on Monday, Wednesday, and Friday. The control and experimental conditions occurred on Tuesday and Thursday in a counterbalanced manner. Group one received the traditional instruction control on Tuesday in their classroom and the experimental condition on Thursday, while group two received the experimental condition on Tuesday and the control condition on Thursday in their classroom. For the experimental condition, both groups were taught how to log on to the system and were instructed to spend at least thirty minutes on the computer system from the time they logged on, without stopping.
To be clear, students who did not volunteer to be part of the experiment were not in the classroom; students who did volunteer to do the extra homework were "pulled out" of their normal classroom for the "classroom instruction" part of this experiment. Obviously, this was a more motivated group of students.
During the traditional instruction condition we used the same format and materials (worksheets, pre-, mid-, and post-tests) used in Experiment 1. The classroom activities were again divided into two main parts: 1) introduction with in-class examples, and 2) guided practice. Students were given a worksheet with the same twenty-five problems in the same order used in Experiment 1. The instructor demonstrated how to translate word problems into algebraic expressions by first displaying problems on an overhead projector and reading the problems to the class. The instructor then discussed several traditional textbook methods used to translate word problems into algebraic expressions, including matching "clue" words with mathematical operations and procedures, and using problem-solving plans such as: explore the problem, plan the solution, solve the problem, and examine the solution. The instructor demonstrated five problems, which took approximately twenty minutes. During the remaining thirty minutes, students completed their worksheets, and the instructor was available to all students and assisted in the order in which students requested help.
Design. A counterbalanced design was used in which all groups received all conditions but in a different order. Specifically, one group participated in the experimental condition first while the other group participated in the control condition first, thus ensuring a different sequence of instruction for each group. In the experimental condition, students continued to receive computer-delivered instruction, but unlike Experiment 1, the control condition was limited to traditional classroom instruction.