
Language modelling (word FST)




Presentation Transcript


  1. Language modelling (word FST)
  Operational model for categorizing mispronunciations
  Prompted image = ‘circus’
  step 1: decode visual image: (circus) (cursus) (circus)
  step 2: convert graphemes to phonemes: (/k y r s y s/) (k i r k y s) (/s i r k y s/)
  step 3: articulate phonemes: (/k y - k y r s y s/) (/ka - i - Er - k y s/) (/s i r k y s/)
  Spoken utterance: correct, miscue (step 3) or error (steps 1, 2)
  SPACE symposium - 6/2/09
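The three-step categorization above can be sketched as a small decision function. This is only a sketch: the function name and its boolean inputs are hypothetical, standing in for whatever per-step comparison the system actually performs.

```python
def categorize_utterance(step1_ok, step2_ok, step3_ok):
    """Operational categorization from the slide: problems in decoding
    (step 1) or grapheme-to-phoneme conversion (step 2) count as errors;
    problems only in articulation (step 3) count as miscues."""
    if step1_ok and step2_ok and step3_ok:
        return "correct"
    if not (step1_ok and step2_ok):
        return "error"
    return "miscue"
```

For the ‘circus’ example, a child who decodes "cursus" fails at step 1 and is categorized as an error, while a restart like /k y - k y r s y s/ fails only at step 3 and is a miscue.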

  2. Language modelling (word FST)
  Prevalence of errors of different types (Chorec data)
  • Children with RD tend to guess more often
  • Important to model steps 1 and 3; step 2 is less important

  3. Creation of word FST: model step 1
  [FST figure: paths over phonemes s t A r, with example paths
  s t t r a (logP = -5.8) and s t r t a t (logP = -7.2)]
  • correct pronunciation
  • predictable errors (prediction model needed)
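Scoring a path through such a word FST can be sketched with a toy arc table. The `ARCS` structure, its symbols and its probabilities are illustrative placeholders, not the slide's actual model; the point is only how a path accumulates a logP score.

```python
import math

# Hypothetical word FST for step 1:
# state -> {input symbol: (next state, arc probability)}.
ARCS = {
    0: {"s": (1, 1.0)},
    1: {"t": (2, 0.8), "A": (2, 0.2)},
    2: {"A": (3, 0.9), "t": (3, 0.1)},
    3: {"r": (4, 1.0)},
}

def path_logp(symbols, arcs=ARCS, start=0):
    """Sum of arc log-probabilities along one path, analogous to the
    logP scores shown next to the example paths on the slide."""
    state, logp = start, 0.0
    for sym in symbols:
        state, p = arcs[state][sym]
        logp += math.log(p)
    return logp
```

More probable pronunciations get less negative logP; predictable errors would be added as extra, lower-probability arcs.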

  4. Creation of word FST: model step 3
  [FST figure: branch over phonemes s t a r with articulation arcs such as Es, te, Er]
  Per branch in the previous FST:
  • correctly articulated
  • restarts (fixed probabilities for now)
  • spelling (phonemic) (fixed probabilities for now)
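The per-branch expansion could be sketched as below. The function name and the probability values are hypothetical stand-ins for the slide's "fixed probabilities for now"; a restart repeats an initial chunk (as in /k y - k y r s y s/), and spelling articulates the phonemes in isolation.

```python
def articulation_variants(phonemes, p_restart=0.05, p_spell=0.05):
    """Expand one FST branch into articulation variants (step 3),
    each with a fixed probability (values illustrative)."""
    variants = {"correct": (list(phonemes), 1.0 - p_restart - p_spell)}
    # Restart: first two phonemes, a pause, then the full word again.
    variants["restart"] = (phonemes[:2] + ["-"] + list(phonemes), p_restart)
    # Spelling: every phoneme separated by a pause.
    spelled = [tok for ph in phonemes for tok in (ph, "-")][:-1]
    variants["spelling"] = (spelled, p_spell)
    return variants
```

Applied to /k y r s y s/, the restart variant reproduces the slide's /k y - k y r s y s/ pattern.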

  5. Modelling image decoding errors
  • Model 1: memory model
    • adopted in the LISTEN project
    • per target word, create a list of errors found in the database
    • keep those with P(list entry = error | TW) > TH
  • advantages
    • very simple strategy
    • can model real-word and non-real-word errors
  • disadvantages
    • cannot model unseen errors
    • probably low precision
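The memory model's filtering step is simple enough to sketch directly, assuming the logged errors for one target word are available as a plain list (the example data and threshold are illustrative):

```python
from collections import Counter

def memory_model(observed_errors, threshold):
    """Keep only the error forms whose relative frequency for this
    target word (TW) exceeds the threshold TH."""
    counts = Counter(observed_errors)
    total = sum(counts.values())
    return {err for err, c in counts.items() if c / total > threshold}
```

A form seen only once in a long error list falls below the threshold and is dropped, which is exactly why the model cannot predict unseen errors.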

  6. Modelling image decoding errors
  • Model 2: extrapolation model (idea from ..)
    • look for existing words that
      • are expected to belong to the vocabulary of the child (= mental lexicon)
      • bear a good resemblance to the target word
    • select lexicon entries from that vocabulary
    • feature based: expose (dis)similarities with the TW
      • features: length differences, alignment agreement, word categories, graphemes in common, …
    • decision tree → P(entry = decoding error | features)
    • keep those with P > TH
  • advantage: can model previously unseen errors
  • disadvantage: can only model real-word errors
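The feature extraction feeding the decision tree could look roughly like this. Only `len_diff` and `common_graphemes` correspond to features named on the slide; `same_onset` is an invented stand-in for the others (alignment agreement, word categories), which need more machinery than fits in a sketch.

```python
def lexicon_features(candidate, target):
    """Feature vector exposing (dis)similarities between a mental-lexicon
    entry and the target word, as input to a decision tree."""
    return {
        "len_diff": abs(len(candidate) - len(target)),
        "common_graphemes": len(set(candidate) & set(target)),
        "same_onset": int(candidate[:1] == target[:1]),  # hypothetical extra feature
    }
```

A trained decision tree would then map such vectors to P(entry = decoding error | features) and keep candidates with P > TH.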

  7. Modelling image decoding errors
  • Model 3: rule-based model (under development)
    • look for frequently observed transformations at the subword level
      • grapheme deletions, insertions and substitutions (e.g. d → b)
      • grapheme inversions (e.g. leed → deel)
      • combinations
    • learn a decision tree per transformation
  • advantages
    • more generic → better recall/precision compromise
    • can model real-word and non-real-word errors
  • disadvantage
    • more complex and time-consuming to train
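Candidate generation from such rules can be sketched as follows, assuming a rule set of single-grapheme substitutions plus whole-word inversion; the default `d → b` rule mirrors the slide's example, and the rest is illustrative.

```python
def apply_rules(word, substitutions=(("d", "b"),)):
    """Generate candidate decoding errors from subword transformations:
    single-grapheme substitutions and inversion (e.g. leed -> deel)."""
    candidates = set()
    for src, dst in substitutions:
        for i, ch in enumerate(word):
            if ch == src:
                candidates.add(word[:i] + dst + word[i + 1:])
    candidates.add(word[::-1])  # grapheme inversion
    candidates.discard(word)    # never predict the target itself
    return candidates
```

Because the rules operate on graphemes rather than on a lexicon, the outputs can be non-words, which is what gives this model its better recall/precision compromise.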

  8. Modelling results so far
  • Measures (over target words with errors)
    • recall = nr of predicted errors / total nr of errors
    • precision = nr of predicted errors / nr of predictions
    • F-rate = 2·R·P / (R + P)
    • branch = average nr of predictions per word
  • Data: test set from the Chorec database
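The measures translate directly into code; the function and argument names below are mine, and the numbers in the usage note are illustrative, not results from the slide.

```python
def prediction_metrics(n_errors_predicted, n_errors_total, n_predictions):
    """Recall, precision and F-rate exactly as defined on the slide."""
    recall = n_errors_predicted / n_errors_total
    precision = n_errors_predicted / n_predictions
    f_rate = 2 * recall * precision / (recall + precision)
    return recall, precision, f_rate
```

For example, predicting 8 of 10 actual errors with 20 predictions gives R = 0.8, P = 0.4 and an F-rate of 8/15 ≈ 0.53.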
