Eye-tracking reveals the effects of perceptual learning on neighboring phonemes Bridget Smith The Ohio State University
Using time-course data to view phonological processes Bridget Smith The Ohio State University
Background • The interaction of speech perception and production in sound change • Recreating sound change in a laboratory • Measuring changes in perception via lexical decision and identification tasks (RT and accuracy) and an eye-tracking paradigm
Theory of sound change • Phonetically-conditioned sound change begins with phonetic variation and ends with systematic change • e.g., O.E. kirke, kiken -> M.E. church, chicken • What happens in between? • How gradual is it? • Do lexical differences exist? • Can it be conditioned by other non-phonetic factors? • Can we reproduce it in a laboratory?
Research Question • Can we use perceptual learning and shadowing to reproduce sound change in a laboratory? • After participants are exposed to a pronunciation variant, do they exhibit a change in perception and production consistent with a sound shift? • (If yes, then there are many interesting questions about sound change to look at)
Perceptual Learning • When exposed to a pronunciation variant in familiar words, listeners incorporate the variant into their mental representation of that sound, changing the representation temporarily (or sometimes long-term) • e.g., Norris, McQueen, & Cutler 2003 • An ambiguous /s-f/ sound replaced a segment either in words with /s/ or in words with /f/; the direction of the boundary shift depended on which set of words listeners heard.
Shadowing/convergence • When saying a word after hearing it pronounced, talkers change their productions to be more similar to those of the model talker. • e.g., Goldinger 1998 • Measured similarity using AXB task • Variables measured in later studies include VOT, F0, amplitude envelope, and mean spectral frequency (center of gravity)
Sub-questions • Do participants undergo perceptual learning? – Does this extend to new talkers? new words? • Do participants undergo convergence with the trainers’ voices while shadowing? Is the change in pronunciation generalized to familiar words that they did not hear during training? • Does the change in one sound affect other neighboring sounds? • …
Experiment design • Needed a source of variation that was not known to have any indexical value or be a sound change in progress • Affrication of /tw/ • Phonetically natural: stops before approximants frequently become affricated • Historical precedent in English: /tr/ and /tj/
Experiment design • Two likely trajectories of change: “front” or “retracted” frication: tsw- or tchw- • tchw- common in observed variation • parallels with /tr/ and /tj/ • physiological basis - rounding gesture for /w/ • tsw- also possible, especially with dental /t/ • cf. OHG /t/ -> /ts/, even in front of /w/, e.g. zwei, or Japanese /t/ -> /ts/ before /ɯ/
Ambient variation or sound change-in-progress? • Known TV personalities who now say things like “chwenty” and “chwitter” and “betchween”: • Rachel Maddow • Michael Savage • Michael Ian Black
Links to videos • http://www.youtube.com/watch?v=XS3rR5N5vAA&t=49s • http://youtu.be/D4rq53Ztvbg?t=4m18s
Experiment 1 Design • 45 /tw-/ initial English words • 30 familiar-somewhat familiar • 15 highly unfamiliar/archaic • Necessitates different paradigm than traditional perceptual learning methods (e.g., Norris, McQueen & Cutler 2003).
Experiment 1 Design • Task 1: pre-training production and familiarity rating • Participants read familiar and unfamiliar words at a self-controlled pace • subset of training words, plus others: tw-, en-, vi-, t-, tr-, and str- words • Rate familiarity of each
Experiment 1 Design Familiarity Ratings: • very familiar – I know this word and use it • somewhat familiar – I know this word, but I may or may not use it myself • neither familiar nor unfamiliar – I may know this word, but do not use it • somewhat unfamiliar – I may have heard this word before, but have never used it • very unfamiliar – I have never heard or used this word before
Experiment 1 Design • Task 2: training/shadowing (en- vi- tw- words): • Participants see the word on the screen • Hear the word pronounced by trainers over headphones • Say the word out loud after hearing it • Hear the word again • Silently read a definition • See, hear, and say the word again • Repeat in blocks of definitions and then sentences with the word in context
Experiment Design • Hear each word 6 times, shadow 4 times • 2 trainers for each word – 1 male, 1 female • Total 8 trainers – 4 male, 4 female • 3 conditions: • Front tsw- • Retracted tchw- • Control tw-
Experiment Design • Task 3 - Lexical decision task: • Using button box, choose whether stimulus is word or non-word • 4 new talkers (2 male, 2 female) • /tw/ target words: 28 trained, 15 untrained • Half front tsw- variant, half retracted tchw- • Non-words with variant, also vi- and en- words and non-words
Experiment Design • Task 4 – Post-training production • Participants read words off the screen for comparison with their pre-training pronunciations • Task 5 – Identification task • Participants hear a stimulus and select whether they heard “two”, “chew”, or “tsu” • Tests whether adaptation is extended to the related environment /tu/
Overview Results Experiment 1 • Lexical decision showed greater acceptance of variant that participants were trained on • RT varies greatly by subject and other unknown factors, and cannot be directly compared • Identification task showed generalization of training variant to /tu/ by boundary shifts • Production results show convergence
Perceptual learning means [figure; plotted values: 0.981, 0.944, 0.964, 0.917, 0.986, 0.950, 0.800, 0.580, 0.733, 0.660, 0.617, 0.640, 0.878, 0.793, 0.825, 0.810, 0.825, 0.880]
Perceptual learning • All subjects performed better on words containing the tchw- variant (F(1,67)=16.9, p<0.001), but the difference was leveled out for the trained familiar words, which had a ceiling effect (F(2,134)=3.4, p<0.05). • tch- is a more common variant before other approximants in American English (such as in truck or congratulations). • All subjects performed better on familiar words, especially on words they heard during training (F(2,134)=77.2, p<0.001). • Familiar words are easier to access. • Subjects performed better on words containing the variant they heard in their training condition (F(2,67)=6.9, p<0.005). • Listeners displayed perceptual learning of a pronunciation variant (tsw- or tchw-), which they applied to new talkers’ and new words’ pronunciation during a lexical decision task.
Generalization to /tu/ • Listeners accepted more front-affricated variants as instances of plain t for the female talker, but labeled more retracted affricated variants as t for the male talker (F(1,64)=225.3, p<0.001). • Listeners already assume a retracted variant for men and a front variant for women. However, these differences are larger than can be explained by physiology or by the spectral characteristics of the sounds (as in Strand 1999, i.a.). • Listeners who were trained on the front tsw- variant were more likely to label front affricated ts- stimuli as instances of t. Listeners who were trained on the retracted affricate tchw- were less likely to judge ts- as an instance of plain t (F(2,64)=5.9, p<0.005). • Front affricated stimuli were more likely to be labeled as t- for the female talker by both control and front-trained groups (F(2,64)=3.5, p<0.05) because listeners already assume a more front variant for women. • The perceptual space for t depends on whether the talker is perceived as male or female: it is greater at the front end for females and greater at the back end for males. • The shift in t, by association with the shift in tw- caused by perceptual learning, is most apparent at the unbounded front edge, where both training groups show a shift in perception for the male talker. • The back is bounded by the tʃ phoneme for the male talker, while the front boundary (ts) is more flexible for both male and female talkers, so the perceptual range expands for the front variant and contracts for the back variant.
Production Results • Measurements of affrication and place of articulation • Center of gravity (centroid, average frequency of spectrum) • Spectral slope (difference in amplitude between 0-4500Hz and 4500-11025Hz) • Rise time (time from burst to maximum amplitude), normalized relative to duration
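The three measures listed above could be computed roughly as follows. This is a minimal sketch, not the analysis script used in the study: it assumes a magnitude spectrum is already available as parallel lists of frequencies and magnitudes, that the band "amplitude" is mean power in dB, and that duration is counted in envelope samples from the burst onward. All function and parameter names are hypothetical.

```python
import math

def centroid(freqs_hz, magnitudes):
    """Center of gravity: magnitude-weighted mean frequency of a spectrum."""
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("spectrum has no energy")
    return sum(f * m for f, m in zip(freqs_hz, magnitudes)) / total

def band_level_db(freqs_hz, magnitudes, lo_hz, hi_hz):
    """Mean power of the bins with lo_hz <= f < hi_hz, in dB (assumed definition)."""
    powers = [m * m for f, m in zip(freqs_hz, magnitudes) if lo_hz <= f < hi_hz]
    if not powers or sum(powers) == 0:
        raise ValueError("band is empty or silent")
    return 10 * math.log10(sum(powers) / len(powers))

def spectral_slope_db(freqs_hz, magnitudes, split_hz=4500.0, top_hz=11025.0):
    """Level of the 0-4500 Hz band minus level of the 4500-11025 Hz band."""
    low = band_level_db(freqs_hz, magnitudes, 0.0, split_hz)
    # Small offset so a bin exactly at top_hz is included in the high band.
    high = band_level_db(freqs_hz, magnitudes, split_hz, top_hz + 1e-9)
    return low - high

def normalized_rise_time(envelope, burst_index=0):
    """Samples from burst to peak amplitude, divided by samples from burst to end."""
    segment = envelope[burst_index:]
    if not segment:
        raise ValueError("no samples after the burst")
    peak_offset = max(range(len(segment)), key=segment.__getitem__)
    return peak_offset / len(segment)
```

A higher centroid and a lower (less positive) slope would both point toward a fronter, more [s]-like frication, while a shorter normalized rise time would point toward a more stop-like release.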
Combined distance • Trainers’ measurements for centroid, slope, and rise time in each condition were normalized to a standard normal distribution (z-score) mean=0, sd=1. • Each subject’s measurements for centroid, slope, and rise time in each condition were normalized relative to the trainers for that condition. (-3 = 3sd from trainers) • Euclidean distance
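The normalization and distance steps above can be sketched as follows. This is an illustration under stated assumptions, not the original analysis code: each token is a dict with the hypothetical keys `centroid`, `slope`, and `rise_time`, and the trainers' standard deviation is taken as the sample standard deviation (the slides do not say which was used).

```python
import math
import statistics

MEASURES = ("centroid", "slope", "rise_time")  # hypothetical key names

def trainer_stats(trainer_tokens):
    """Mean and sample sd of each measure over the trainers' tokens."""
    stats = {}
    for m in MEASURES:
        values = [tok[m] for tok in trainer_tokens]
        stats[m] = (statistics.mean(values), statistics.stdev(values))
    return stats

def z_relative_to_trainers(token, stats):
    """Express a subject's token in trainer sd units (0 = trainers' mean)."""
    return {m: (token[m] - stats[m][0]) / stats[m][1] for m in MEASURES}

def combined_distance(token, stats):
    """Euclidean distance from the trainers' mean in the 3-measure z-space."""
    z = z_relative_to_trainers(token, stats)
    return math.sqrt(sum(v * v for v in z.values()))
```

With this scaling, a token whose centroid is 3 sd below the trainers' mean and whose other two measures match the trainers has a combined distance of 3; a smaller distance after training means the subject's productions moved toward the trainers.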
Convergence • Defined here as significant (p<0.05) movement (i.e., difference from starting measures) in the direction of the trainers’ productions.
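One way to operationalize this definition is a paired comparison of each subject's distance from the trainers before and after (or during) shadowing. The sketch below assumes a paired t-test on per-subject distances; the slides report t statistics but do not spell out the exact procedure, so treat the test choice and the function name as assumptions.

```python
import math
import statistics

def paired_t(pre_distances, post_distances):
    """Paired t statistic and degrees of freedom for per-subject distances.

    A positive t means distances shrank, i.e. movement toward the trainers.
    """
    if len(pre_distances) != len(post_distances):
        raise ValueError("need one pre and one post value per subject")
    diffs = [pre - post for pre, post in zip(pre_distances, post_distances)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two subjects")
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1
```

The p-value would then be read from the t distribution with the returned degrees of freedom, for example with a statistics package; it is left out here to keep the sketch dependency-free.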
Production Results • Plain /tw/ control group no movement • Both front and retracted trained groups move significantly toward trainers during shadowing and back away after (p<0.05) • Retracted trained (chw-) group shows marginally significant difference between pre- and post- shadow productions (p=0.053) • What about novel words?
Production Results • Untrained words significantly different between pre- and post-shadowing for chw- group (t(7)=2.45, p<0.05), mean 0.62. • Trained words for chw- group: difference of 0.38 sd, not significant (t(7)=2.01, p=0.073). • tsw- group differences (n.s.): 0.35 on the untrained words and 0.20 on the trained words. • Convergence not greater for trained words!
Production results • Convergence for chw- group (near-convergence for tsw- ) generalizes to novel words after shadowing • But why isn’t convergence greater for trained words? • Reduction due to repetition effect
Patterns of convergence • Do convergers show greater effects of perceptual learning? • Looks like yes for subset, but need more data!
Subquestions • Do participants undergo perceptual learning, extending to new talkers and words? – yes, according to acceptance, but RT unclear • Do participants undergo convergence with the trainers’ pronunciation? - depends on the variant, chw- is effective, tsw- less so • Is the change in pronunciation generalized to familiar words that they did not hear during training? - yes, but also interaction with reductive processes
Experiment 2 Some problems to be addressed: • Strong dislike of one or two trainers in tsw- condition in anecdotal reports – get new trainers and norm stimuli • Issues with RT because we don’t know why RT would be increased. Unbalanced design makes it hard to compare. Need a more precise way of measuring whether perceptual learning has occurred – eyetracking can show what listeners think they hear. Can also use unfamiliar untrained words (non-words) to balance design.
Experiment 2 • Part 1 is the same, except using all training words instead of a subset • Part 2 is the same, except includes pictures in each step, and some stimuli replaced, dropped en- words to reduce load • Instead of lexical decision, identification, and post-test production, eye-tracking word search task
Experiment 2 • 4 additional talkers – 2 male, 2 female • tw- words with sw- and ch- competitors, words and non-words • v- filler words with f- and v- competitors • All tw- and v- words used twice, half with chw- variant, half with tsw-, once in each 15-min block
Example slide twin fin + vin chin