1 / 24

Applying the Pronunciation Lexicon Specification to ASR & TTS

1. Applying the Pronunciation Lexicon Specification to ASR & TTS. Patrizio Bergallo. SpeechTEK ASTS - Advances in Text-to-Speech Processing. Monday , August 20, 2007. Agenda. Loquendo Today Introduction to PLS Reference Scenario Pronunciation Lexicons International Phonetic Alphabet

shirin
Download Presentation

Applying the Pronunciation Lexicon Specification to ASR & TTS

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 1 Applying the Pronunciation Lexicon Specification to ASR & TTS Patrizio Bergallo SpeechTEK ASTS - Advances in Text-to-Speech Processing Monday, August 20, 2007

  2. Agenda • Loquendo Today • Introduction to PLS • Reference Scenario • Pronunciation Lexicons • International Phonetic Alphabet • Overview of PLS • How does TTS use PLS? • How does ASR use PLS? • Examples of Use • Latest Improvements

  3. Loquendo Today • Global company of the Telecom Italia group, leader in Europe and South America in the Speech Technologies market • Company founded in 2001 from Telecom Italia Labs, benefiting from know-how gained from more than 30 years research experience • Complete set of Multilingual speech technologies on a wide spectrum of devices; 25 patents, 50 voices and 20 languages • Full support for international standards (MRCPv1/v2, VoiceXML 2.0/2.1, CCXML, SSML, SRGS, SISR) • Company ready for challenging future scenarios: Multimodality, Security • 100 employees, and displayed strong growth throughout 2007 • HQ in Turin, Offices in US, Spain, Germany and France, and a Worldwide Network of Partners

  4. Reference Scenario • Many speech applications need to specify pronunciation for words and phrases • Surnames, locations, company names • Acronyms • Names in specific contexts (restaurants, sports, movie titles, etc.) • Foreign words, mixed languages • Pronunciation is critical both for TTS and ASR • Improves reading of prompts by TTS • Improves ASR performance • VoiceXML 2.0/2.1 applications are the reference scenario • Prompts are based on SSML 1.0 (or in future SSML 1.1) • Recognition grammars are based on SRGS 1.0

  5. Pronunciation Lexicons • Pronunciation Lexicon • a mapping between words (or short phrases), their written representations, and their pronunciations suitable for use by an ASR engine or a TTS engine • Pronunciation lexicons are not only useful for voice browsers • They have also proven effective mechanisms to support accessibility for the differently able as well as greater usability for all users • They are used to good effect in screen readers and user agents supporting multimodal interfaces • The W3C Pronunciation Lexicon Specification (PLS) Version 1.0 is designed to enable interoperable specification of pronunciation lexicons

  6. Pronunciation Lexicon Specification • W3C specification status • Second Last Call Working Draft (26 October, 2006) • Currently the Implementation Report Plan and the Disposition of Comments are under development (all public comments were addressed) • Candidate Recommendation expected 3Q07 Part of first version of the Speech Interface Framework (Larson, 2000) W3C Recommendation W3C Last Call Working Draft

  7. International Phonetic Alphabet • Pronunciation is represented by a phonetic alphabet • Standard phonetic alphabets • International Phonetic Alphabet (IPA) • Well known phonetic alphabet • SAMPA - ASCII based (simple to write) • Pinyin (Chinese Mandarin), JEITA (Japanese), etc. • Proprietary phonetic alphabets • International Phonetic Alphabet (IPA) • Created by International Phonetic Association (active since 1896), collaborative effort by all the major phoneticians around the world • Universally agreed system of notation for sounds of languages • Covers all languages • Requires UNICODE to write it • Normatively referenced by PLS

  8. Overview of PLS • A PLS document is a container (<lexicon>) of several lexical entries (<lexeme>) • Each lexical entry contains • One or more spellings (<grapheme>) • One or more pronunciations (<phoneme>) or substitutions (<alias>) • Each PLS document is related to a single unique language (xml:lang) • SSML 1.0 and SRGS 1.0 documents can reference one or more PLS documents • Current version doesn’t include morphological, syntactic and semantic information associated with pronunciations

  9. PLS Example <?xml version="1.0" encoding="UTF-8"?> <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2005/01/pronunciationlexicon http://www.w3.org/TR/2007/CR-pronunciation-lexicon2007@@@@/pls.xsd" alphabet="ipa" xml:lang="en-US"> <lexeme> <grapheme>Sepulveda</grapheme> <phoneme>səˈpʌlvɪdə</phoneme> </lexeme> <lexeme> <grapheme>W3C</grapheme> <alias>World Wide Web Consortium</alias> </lexeme> </lexicon>

  10. How does TTS use PLS? • SSML 1.0 <?xml version="1.0" encoding="UTF-8"?> <speak version="1.0" … xml:lang="en-US"> <lexicon uri="http://www.example.com/SSMLexample.pls"/> The title of the movie is: "La vita è bella" (Life is beautiful), which is directed by Benigni. </speak> • PLS 1.0 <?xml version="1.0" encoding="UTF-8"?> <lexicon version="1.0" … alphabet="ipa" xml:lang="en-US"> <lexeme> <grapheme>La vita è bella</grapheme> <phoneme>ˈlɑ ˈviːɾə ˈʔeɪ ˈbɛlə</phoneme> </lexeme> <lexeme> <grapheme>Benigni</grapheme> <phoneme>bɛˈniːnji</phoneme> </lexeme> </lexicon>

  11. How does ASR use PLS? • SRGS 1.0 <?xml version="1.0" encoding="UTF-8"?> <grammar version="1.0" … xml:lang="en-US” root="movies" mode="voice"> <lexicon uri="http://www.example.com/SRGSexample.pls"/> <rule id="movies" scope="public"> <one-of> <item>Terminator 2: Judgment Day</item> <item>Pluto's Judgement Day</item> </one-of> </rule> </grammar> • PLS 1.0 <?xml version="1.0" encoding="UTF-8"?> <lexicon version="1.0" … alphabet="ipa" xml:lang="en-US"> <lexeme> <grapheme>judgment</grapheme> <grapheme>judgement</grapheme> <phoneme>ˈdʒʌdʒ.mənt</phoneme> </lexeme> </lexicon>

  12. Examples of Use • Multiple pronunciations for the same orthography • Multiple orthographies • Homophones • Homographs • Acronyms, Abbreviations, etc.

  13. Multiple pronunciations for the same orthography • Multiple pronunciations are represented by more than one <phoneme> or <alias> element <?xml version="1.0" encoding="UTF-8"?> <lexicon version="1.0" … alphabet="ipa" xml:lang="en-GB"> <lexeme> <grapheme>Newton</grapheme> <phoneme>ˈnjuːtən</phoneme> <phoneme>ˈnuːtən</phoneme> </lexeme> </lexicon>

  14. Multiple orthographies • Alternative textual representations for the same word or phrase are represented by more than one <grapheme> inside the same <lexeme> • All the pronunciations given within the <lexeme> apply to each and every <grapheme> within the <lexeme> <?xml version="1.0" encoding="UTF-8"?> <lexicon version="1.0" … alphabet="ipa" xml:lang="jp"> <lexeme> <grapheme>nihongo</grapheme> <grapheme>日本語</grapheme> <grapheme>にほんご</grapheme> <phoneme>ɲihoŋo</phoneme> </lexeme> </lexicon>

  15. Homophones • Words with the same pronunciation but different meanings are represented as different lexemes <?xml version="1.0" encoding="UTF-8"?> <lexicon version="1.0" … alphabet="ipa" xml:lang="en-US"> <lexeme> <grapheme>cede</grapheme> <phoneme>siːd</phoneme> </lexeme> <lexeme> <grapheme>seed</grapheme> <phoneme>siːd</phoneme> </lexeme> </lexicon>

  16. Homographs (1/2) • Words with the same spelling but pronounced in different ways are represented using the role attribute of the <lexeme> element • This mechanism allows for the referencing of defined taxonomies of word classes (part of speech, meaning, etc.) <lexicon version="1.0“ xmlns:claws=“http://www.example.com/claws7tags” alphabet="x-myorganization-pinyin" xml:lang="zh-CN"> <lexeme role="claws:VV0"> <!-- base form of lexical verb --> <grapheme>处</grapheme> <phoneme>chu3</phoneme> <!-- pinyin string is: "chǔ" in 处罚 处置 --> </lexeme> <lexeme role="claws:NN"> <!-- common noun, neutral for number --> <grapheme>处</grapheme> <phoneme>chu4</phoneme> <!-- pinyin string is: "chù" in 处所 妙处 --> </lexeme> </lexicon>

  17. Homographs (2/2) • SSML 1.1 will support the role attribute <speak version="1.1“ xmlns:claws="http://www.example.com/claws7tags" xml:lang="zh-CN"> <lexicon uri="http://www.example.com/lexicon.pls“ type="application/pls+xml“ xml:id="mylex"/> <lookup ref="mylex"> 他这个人很不好相<w role="claws:VV0">处</w>。 此<w role="claws:NN">处</w>不准照相。 </lookup> </speak> • Currently PLS doesn’t define/mandate any taxonomy • PLS generally defines role values as qualified names (QNames)

  18. Acronyms, Abbreviations, etc. • Pronunciations expressed as a sequence of other orthographies (acronyms, abbreviations, etc.) are represented by the <alias> element <?xml version="1.0" encoding="UTF-8"?> <lexicon version="1.0" … alphabet="ipa" xml:lang="en-US"> <lexeme> <grapheme>W3C</grapheme> <alias>World Wide Web Consortium</alias> </lexeme> <lexeme> <grapheme>101</grapheme> <alias>one hundred and one</alias> </lexeme> </lexicon>

  19. Latest Improvements • W3C Last Call Working Draft stage allows public comments to be addressed • Large majority were clarifications • New functionalities were deferred to a future version of PLS specification • Major clarifications were about • <alias> recursion • Multiple pronunciations • Changes are subject to a formal approval by the Working Group • Next Steps • PLS 1.0 is very close to Candidate Recommendation stage • SSML 1.1 will provide a more complete support of PLS 1.0

  20. <alias> recursion • Pronunciations of the <alias> element contents MUST be generated by the processor, using pronunciations described by the <phoneme> element of any constituent graphemes in the PLS document, and without invoking recursive access to the PLS document on the <alias> elements of any constituent graphemes <?xml version="1.0" encoding="UTF-8"?> <lexicon version="1.0" … alphabet="ipa" xml:lang="en-US"> <lexeme> <grapheme>GNU</grapheme> <alias>GNU is Not Unix</alias> <phoneme>gəˈnuː</phoneme> </lexeme> <lexeme> <grapheme>Unix</grapheme> <grapheme>UNIX</grapheme> <alias>a multiplexed information and computing service</alias> <phoneme>ˈjuːnɪks</phoneme> </lexeme> </lexicon> GNU is pronounced: gəˈnuː is Not ˈjuːnɪks

  21. Multiple pronunciations (1/2) • ASR • If more than one pronunciation for a given <lexeme> is specified, an ASR processor MUST consider each of them as valid pronunciations for the <grapheme> • TTS • If more than one pronunciation for a given <lexeme> is specified, a TTS processor MUST use the first one in document order that has the prefer attribute set to "true“ • If none of the pronunciations has prefer set to "true", the TTS processor MUST use the first one in document order unless theTTS processor is documented as having a method of selecting pronunciations, in which case the processor MUST use any one of the pronunciations

  22. Multiple pronunciations (2/2) • An ASR processor will recognize both pronunciations, whereas a TTS processor will only use the first one (because it is the first in document order that has prefer set to "true"). <?xml version="1.0" encoding="UTF-8"?> <lexicon version="1.0" … alphabet="ipa" xml:lang="en-US"> <lexeme> <grapheme>lead</grapheme> <alias prefer="true">led</alias> <phoneme prefer="true">liːd</phoneme> </lexeme> <lexeme> <grapheme>led</grapheme> <phoneme>led</phoneme> </lexeme> </lexicon>

  23. References • PLS 1.0 Second Last Call Working Draft (26 October, 2006) • http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20061026/ • Voice Browser Activity Page (VoiceXML, SSML, SRGS, …) • http://www.w3.org/Voice/ • International Phonetic Association • http://www.arts.gla.ac.uk/IPA/ • VoiceXML Forum • http://www.voicexml.org/

  24. Final Remarks • For more information please • Visit Loquendo’s booth#509 • Keep an eye on: www.loquendo.com • Contact us: patrizio.bergallo@loquendo.com THANK YOU

More Related