Explore research comparing the effectiveness of speech-based tutorial dialogue systems versus text-based ones in educational applications. Findings suggest speech-based systems may enhance learning by promoting self-explanation and tailored presentation. Benefits of speech include improved self-explanation, reduced cognitive load, and enhanced emotional state detection. Discover a study on the WHY2 Conceptual Physics Tutoring project, examining reasons for its success. Gain insights into the impact of dialogue length on learning and the potential benefits of speech input.
A Comparison of Tutor and Student Behavior in Speech Versus Text Based Tutoring Carolyn P. Rosé, Diane Litman, Dumisizwe Bhembe, Kate Forbes, Scott Silliman, Ramesh Srivastava, Kurt Vanlehn May 31, 2003 HLT-NAACL Workshop on Educational Applications of NLP
Overview • Hypothesis: speech based tutorial dialogue systems may be more effective than text based ones • Research Context: WHY2 Conceptual physics tutoring project • Parallel corpus collection effort: speech versus text based human tutoring • Comparing features that correlate with learning gains in text based tutoring (Rosé et al., 2003) • Parallel system development: speech versus text based intelligent tutoring systems • ITSPOKE (Litman et al., 2003) adds speech input and output to WHY2-Atlas system (Vanlehn et al., 2002; Jordan and Vanlehn, 2002; Rosé et al., 2002)
Benefits of Tutorial Dialogue • Human tutoring more effective than classroom instruction • 2 Sigma effect sizes (Cohen et al., 1982; Bloom, 1984) • Conjecture: effective because of collaborative dialogue (Fox, 1993; Graesser et al., 1995; Merrill et al., 1992; Chi et al., 2001) • Student self-explanation enhances learning (Chi et al., 1981; Renkl, 1997; Pressley et al., 1992; Lemke, 1990) • Motivates an “Ask, don’t tell” tutoring strategy (Vanlehn et al., 1998) • Trend in favor of Socratic versus Didactic tutoring (Rosé et al, 2001) • Student language makes student thinking visible • Tutors may tailor the presentation of material to the needs of each student
Tutorial Dialogue Systems • Tutorial dialogue systems are typically text based • (Evans et al., 2001; Rosé et al., 2001; Aleven et al., 2001; Zinn et al., 2002; Vanlehn et al., 2002) • Many have been evaluated successfully with students • (Rosé et al., 2001; Heffernan and Koedinger, 2002; Ashley et al., 2002; Graesser et al., 2001) • Some effective tutoring systems with speech capabilities • (Mostow and Aist, 2001; Aist et al., 2003; Fry et al., 2001)
Benefits of Speech • Students do more self-explanation in speech versus text (Hausmann and Chi, 2002) • Our preliminary results show a larger effect size for speech based human tutoring in about half the time (1.23 sigma vs. .68 sigma) • Speech monopolizes fewer cognitive resources than typing • Leaves more resources for self-explanation and learning • Potential for hands-free interaction (Smith, 1992) • Speech contains prosodic and acoustic information that text lacks • Useful for predicting emotional states (Litman et al., 2003; Ang et al., 2002; Batliner et al., 2000) • Improves accuracy at detection and correction of misrecognized utterances (Litman et al., 2000 & 2001; Hirschberg et al., 2001)
WHY2-Atlas Typed Human-Human Tutoring Corpus • Pretest measures prior knowledge of physics • Focus on basic concepts and common misconceptions such as “heavier objects exert more force” (Hake, 1998; Halloun and Hestenes, 1985) • Students read 9-page mini-textbook • Students work through up to 10 essay problems designed to elicit the expression of common misconceptions • Two balls are released in a vacuum, one of which has twice the mass of the other. Which will hit the ground first? Explain. • Post-test, similar to pretest, allows us to measure gain in knowledge • Conditions: Text/Speech human tutoring, Atlas, Targeted Text
Targeted Text Versus Dialogue
Targeted text: Here are a few things to keep in mind when calculating acceleration for a body at rest. Acceleration is change in velocity over time. If velocity is not changing, then there is zero acceleration. Now, if for a finite time interval the velocity remains zero, then it is true that it is not changing. Therefore, when a body is at rest during a time interval, its acceleration within that time interval is zero.
Dialogue:
Tutor: Here are a few things to keep in mind when calculating acceleration for a body at rest. Acceleration is change of what over time?
Student: velocity
Tutor: Right. If the velocity is not changing, what is the magnitude of the acceleration?
Student: zero.
Tutor: Good. If velocity remains zero within a time interval, how much is it changing?
Student: zero
Tutor: Super. So if a body is at rest during a time interval, what is the...
Extended Student Explanations • Correlation between average turn length and learning • Effect of pretest score regressed out • R=.565, p<.05, N=17 • Detailed analysis of complete transcripts of 7 students who completed the study (Rosé et al., 2003) • Coded for question types and negative feedback types • Length of answer correlates with likelihood of receiving negative feedback (Kappa = .78) • R=.8065, p<.01, N=9 • Longer answers provide more opportunities for learning
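One way to read "effect of pretest score regressed out" is as a partial correlation: regress both average turn length and post-test score on pretest score, then correlate the two sets of residuals. The slide does not say which procedure was actually used, so the sketch below is an assumption, and the six data points in it are made-up illustrative values, not data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, x):
    """Residuals of y after a least-squares line fit on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return [b - (mean_y + slope * (a - mean_x)) for a, b in zip(x, y)]


def partial_correlation(x, y, control):
    """Correlation of x and y after regressing `control` out of both."""
    return pearson(residuals(x, control), residuals(y, control))


# Hypothetical values for six students (NOT from the study).
pretest = [0.30, 0.40, 0.50, 0.60, 0.70, 0.50]
avg_turn_length = [5.0, 9.0, 7.0, 14.0, 10.0, 12.0]
posttest = [0.50, 0.60, 0.60, 0.80, 0.75, 0.70]

r_partial = partial_correlation(avg_turn_length, posttest, pretest)
print(f"partial r = {r_partial:.3f}")
```

Correlating residuals keeps a student who simply started with more physics knowledge from inflating the apparent link between long turns and learning.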
ITSPOKE: Intelligent Tutoring SPOKEn Dialogue System • “Back-end” is Why2-Atlas • Speech input via Sphinx2 speech recognizer - 55 dialogue-dependent language models created from 4551 typed student utterances (Why2-Atlas corpus) - about to be enhanced with spoken data • Speech output via Festival text-to-speech synthesizer
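The idea behind dialogue-dependent language models can be illustrated with a small sketch: train one model per dialogue state on the student answers seen in that state, so the recognizer can favor words that are likely at that point in the dialogue. The slide does not describe how the 55 models were built or selected, so everything below is an assumption: the state names and utterances are hypothetical, and plain bigram counts with add-one smoothing stand in for whatever tooling was used with Sphinx2.

```python
from collections import Counter


def train_bigram_model(utterances):
    """Count context words and bigrams over whitespace-tokenized utterances."""
    unigrams, bigrams = Counter(), Counter()
    for utterance in utterances:
        tokens = ["<s>"] + utterance.lower().split() + ["</s>"]
        unigrams.update(tokens[:-1])            # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(model, prev, word, vocab_size):
    """Add-one smoothed estimate of P(word | prev)."""
    unigrams, bigrams = model
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


# Hypothetical dialogue states and typed student answers (NOT from the corpus).
utterances_by_state = {
    "ask_quantity": ["velocity", "the velocity", "change in velocity"],
    "ask_magnitude": ["zero", "it is zero", "zero acceleration"],
}

models = {state: train_bigram_model(utts)
          for state, utts in utterances_by_state.items()}

# Shared vocabulary: every word seen in any state, plus the end marker.
vocab = {w for utts in utterances_by_state.values()
         for u in utts for w in u.lower().split()} | {"</s>"}

for state, model in models.items():
    p = bigram_prob(model, "<s>", "velocity", len(vocab))
    print(f"P('velocity' starts the answer | {state}) = {p:.3f}")
```

In this toy setup, "velocity" is more probable as an opening word in the state where the tutor asked for a quantity, which is the effect a state-specific model is meant to capture.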
Typed vs. Spoken Human Tutoring: Overview of Results • Intelligent Tutoring evaluation (learning gains, reading control) • Spoken Dialogue evaluation (efficiency) • Dialogue Phenomena/Learning correlations - larger student turn lengths and student-tutor word ratios correlate with learning in text (Rosé et al., 2003, Core et al. 2002) - are turn length and word ratios similar in text and speech?
Post-Test Results (new!)
Condition               Mean   SD    N
Spoken Dialogue         .70    .15   7
Typed Dialogue          .67    .12   20
Reading Targeted Text   .57    .13   20
• Spoken > Reading Targeted (p<.03, sigma=1.23)
• Typed > Reading Targeted (p<.01, sigma=0.70)
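For readers unfamiliar with effect sizes expressed in sigma, the sketch below applies two common definitions to the rounded means and standard deviations in the table above: Glass's delta (mean difference divided by the control group's SD) and Cohen's d (mean difference divided by the pooled SD). The slide does not state how its sigma values were computed. They may be based on unrounded or pretest-adjusted scores, so these figures are not expected to reproduce 1.23 and 0.70.

```python
import math


def glass_delta(mean_treatment, mean_control, sd_control):
    """Mean difference in units of the control group's standard deviation."""
    return (mean_treatment - mean_control) / sd_control


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Mean difference in units of the pooled standard deviation."""
    pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                       / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled


# (mean, SD, N) as printed on the slide; treating Reading Targeted Text
# as the control condition is an assumption made for this illustration.
spoken = (0.70, 0.15, 7)
typed = (0.67, 0.12, 20)
reading = (0.57, 0.13, 20)

for name, (mean, sd, n) in [("Spoken", spoken), ("Typed", typed)]:
    delta = glass_delta(mean, reading[0], reading[1])
    d = cohens_d(mean, sd, n, *reading)
    print(f"{name} vs. Reading: Glass's delta = {delta:.2f}, Cohen's d = {d:.2f}")
```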
ITSPOKE (Spoken) Human-Human Tutoring Corpus • Data - Target size: 20 subjects - Current size (May 20): 10 subjects / 86 dialogues / 62 transcribed & turn-annotated • WHY2-Atlas (text) vs. ITSPOKE (speech) data - same experimental procedures - input and output dialogue modalities differ - strict turn-taking in typed; overlaps in speech
Time on Task Results (new!)
Condition               Mean   SD    N
Reading Targeted Text   85     38    20
Spoken Dialogue         170    56    7
Typed Dialogue          430    160   20
• Reading Targeted Text > Spoken Dialogue (p<.001)
• Spoken Dialogue > Typed Dialogue (p<.05)
Differences: Text/Speech (updated)
Condition   Participant   #turns/dialog   #words/dialog   #words/turn
speech      student       49.45           281.48          5.71
text        student       12.8            184.9           14.37
speech      tutor         48              1185            26.40
text        tutor         14.4            393.3           39.04
• Speech: (n=62 dialogues); Text (n=166)
• mean student turn length (correlated with learning gains in text) is shorter in speech (p<.001)
Differences (continued)
Ratio                         Speech Condition   Text Condition
#Student wrds / #Tutor wrds   0.27               0.45
#Student trns / #Tutor trns   1.00               0.88
• Relative proportion of student to tutor words does differ across conditions (t(27)=3.68, p<.001)
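The per-dialogue measures in these two slides (turns, words, words per turn, and student-to-tutor ratios) can be computed from a turn-annotated transcript along the following lines. This is a minimal sketch, not the authors' tooling: it assumes whitespace tokenization and a transcript stored as (speaker, utterance) pairs, and it uses an abridged version of the dialogue excerpt from the Targeted Text Versus Dialogue slide as sample input.

```python
from collections import defaultdict


def dialogue_stats(turns):
    """Per-speaker turn count, word count, and mean words per turn."""
    counts = defaultdict(lambda: {"turns": 0, "words": 0})
    for speaker, utterance in turns:
        counts[speaker]["turns"] += 1
        counts[speaker]["words"] += len(utterance.split())
    return {speaker: {**c, "words_per_turn": c["words"] / c["turns"]}
            for speaker, c in counts.items()}


def student_tutor_ratios(stats):
    """Student-to-tutor ratios of words and of turns for one dialogue."""
    return {
        "word_ratio": stats["student"]["words"] / stats["tutor"]["words"],
        "turn_ratio": stats["student"]["turns"] / stats["tutor"]["turns"],
    }


# Abridged excerpt; the first tutor turn is shortened to its question.
dialogue = [
    ("tutor", "Acceleration is change of what over time?"),
    ("student", "velocity"),
    ("tutor", "Right. If the velocity is not changing, what is the "
              "magnitude of the acceleration?"),
    ("student", "zero."),
    ("tutor", "Good. If velocity remains zero within a time interval, "
              "how much is it changing?"),
    ("student", "zero"),
]

stats = dialogue_stats(dialogue)
ratios = student_tutor_ratios(stats)
print(stats)
print(ratios)
```

Averaging such per-dialogue values across a corpus gives a mean of ratios, which generally differs from the ratio of the corpus-level means; that may be why the #words/turn column is not exactly #words/dialog divided by #turns/dialog.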
Why2-Atlas Ex. and ITSPOKE Ex. (Human-Human Corpora)

ITSPOKE Ex. (spoken)
Problem: If a car is able to accelerate at 2 m/s^2, what acceleration can it attain if it is towing another car of equal mass?
Original Essay: If the car is towing another car of equal mass, the maximum acceleration would be the same because the car would be towed behind and the friction caused would only be by the front of the first car.
-----------------------------6.5 min into dialogue--------------------------
Tutor: Now this law that force is equal to mass times acceleration, what's this law called? This is uh since this it is a very important basic uh fact uh it is it is a law of physics. Um you have you have read it in the background material. Can you recall it?
Student: Um no it was one of Newton's laws but I don't- remember which one. (laugh)
Tutor: Right, right- That- is Newton's second law of motion.
Student: he I- Ok, because I remember one, two, and three, but I didn't know if there was a different name
Tutor: Yeah that's right you know Newton was a genius-
Student: (laugh)
Tutor: and uh he looked at a large number of experiments and experimental data that was available and from that he could come to this general law and it is known as Newton's second law of motion. Um many many other scientists before him had seen all this data which was collected by scientists but had not concluded this now it looks very simple but to come to the conclusion from a mass of data was something which required the genius of Newton.
Student: mm hm

Why2-Atlas Ex. (typed)
Problem: Suppose that you released 3 identical balls of clay in a vacuum at exactly the same instant. They would all hit the ground at the same instant. Now you stick two of the balls together, forming one ball that is twice as heavy as the remaining, untouched clay ball. Both balls are released in a vacuum at exactly the same instant. Which ball hits the ground first?
Original Essay: Both balls will hit the ground at the same time. The balls are in free fall (only gravitational forces). The ratio of the masses and weight are equal.
--------------------------2 min into dialogue--------------------------------
Tutor: You have correctly said the balls are in free fall. What do you conclude from this fact?
Student: There is no air resistance and the balls’ accelerations are constant in regards to one another
Tutor: Right, so the conclusion is that they both have the same acceleration, the free fall acceleration. Now you have to show that time taken by them to reach the ground will be the same. How do you do that?
Student: F (net forces) / mass = f/m because F = m*a therefore ratio should equal one another
Tutor: But that does not tell you anything about the time taken. What physical quantities will determine the time taken. Will the initial velocity of the balls matter?
Student: yes, assuming starting at rest? sorry, kind of at a loss at the moment
Current and Future Directions • Further analysis of dialogue differences - more student questions and fewer tutor questions in speech - characterization of overlaps in speech - coding of questions and other phenomena • Further learning gains analyses • Additional tutors • Other dialogue evaluation metrics • Human-computer corpus collection and analysis
Summary Goal: develop and apply language and speech technology to yield the next generation of intelligent tutoring systems Contributions: • Empirical comparisons between typed and spoken tutorial dialogue • Correlation of dialogue behavior with learning gains