SSML Extension for Expressive Mandarin TTS

SSML Extension for Expressive Mandarin TTS. Shuang Li Hongwu Yang Lianhong Cai Tsinghua University. Outline. Motivation. Expression of Speech. Proposed SSML extension. Conclusion. Motivation(1/3). Sentences with the same text can be expressed with different styles, emotions and moods.

Presentation Transcript


  1. SSML Extension for Expressive Mandarin TTS Shuang Li Hongwu Yang Lianhong Cai Tsinghua University

  2. Outline • Motivation • Expression of Speech • Proposed SSML extension • Conclusion

  3. Motivation(1/3) • Sentences with the same text can be expressed with different styles, emotions and moods • Current TTS systems lack this variability

  4. Motivation(2/3) • Current SSML cannot define speaking style, emotion and mood • Good news: 生日快乐 “Happy birthday”, expressed with happiness (emotion) • Bad news: 张总去世了 “Director Zhang passed away”, expressed with sadness (emotion) • Information provider: 飞往纽约的飞机将要起飞 “The flight to New York is about to take off”, expressed in a mild mood • Dialog: 是中国队赢了吗? “Did the Chinese team win?”, emphasizing “Chinese”, with an interrogative mood • Current SSML can hardly express the differences between the expressions above

  5. Motivation(3/3) [Diagram: expressive speech split into three aspects. Characteristic: physiological/social characteristics, covered by the existing voice tag. Style: expressing pattern (news, sports commentary, dialog, information providing, ……), no existing tag. Emotion: physiological reactions (positive, neutral, negative), no existing tag.] • Emotion, style and characteristic are relatively independent but cannot be separated • Characteristic and style: relatively stable and global features • Emotion: short-time, local feature • With different speaking styles • Representing speaker’s attitude, purpose and emotion • More harmonious with the circumstance

  6. Outline • Motivation • Expression of Speech • Proposed SSML extension • Conclusion

  7. Expression of Speech • Style: speaking style (dialog, news, information providing, …) • Mood: mood (request, acquisition, affirmation, apology, …) • Emotion: emotional activity (neutral, negative, positive)

  8. Hierarchical framework of Prosody • Break level • B0: no break • B1: Syllable • B2: Prosodic word • B3: Prosodic Phrase • B4: Breath Group • B5: Prosodic Group • Chiu-yu Tseng et al., Fluent speech prosody: Framework and modeling. Speech Communication 46 (2005) 284-399

  9. 我永远忘不了<B3/25ms>一张对日抗战时的新闻照片,<B3/507ms>轰炸后的废墟焦土上,<B3/272ms>一个衣不蔽体、<B3/384ms>满身尘土灰烟的幼儿<B3/100ms>坐在地上<B3/75ms>无助的大哭着。<B5/1110ms>那是一再令我热泪盈眶的镜头。<B3/507ms>新闻摄影中的战争传真<B3/276ms>已不能只称是照片了。<B5/802ms> • From Chiu-yu Tseng, report at Beijing University, Oct 11, 2005
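
The annotation format on this slide (a break level and a pause duration after each text chunk, e.g. `<B3/25ms>`) can be read mechanically. The following is a minimal Python sketch, not part of the original presentation; the names `Segment`, `BREAK_TAG` and `parse_breaks` are hypothetical, and the code assumes every chunk is followed by exactly one tag of the form `<Bn/NNNms>`.

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    text: str         # the text chunk preceding the break tag
    break_level: int  # 0..5, as in the B0..B5 break levels
    pause_ms: int     # annotated pause duration in milliseconds


# Assumed tag shape: <B3/25ms>, <B5/1110ms>, ...
BREAK_TAG = re.compile(r"<B([0-5])/(\d+)ms>")


def parse_breaks(annotated: str) -> List[Segment]:
    """Split annotated text into (chunk, break level, pause) segments.

    Text after the last break tag, if any, is ignored in this sketch.
    """
    segments = []
    pos = 0
    for match in BREAK_TAG.finditer(annotated):
        chunk = annotated[pos:match.start()]
        segments.append(Segment(chunk, int(match.group(1)), int(match.group(2))))
        pos = match.end()
    return segments


sample = "我永远忘不了<B3/25ms>无助的大哭着。<B5/1110ms>"
segments = parse_breaks(sample)
for seg in segments:
    print(seg.break_level, seg.pause_ms, seg.text)
```

A list of such segments would make it straightforward to compare pause durations across break levels, for instance B3 phrase boundaries against B5 prosodic-group boundaries.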

  10. Outline • Motivation • Expression of Speech • Proposed SSML extension • Conclusion

  11. Proposed tag(1/2) • utterance: prosodic group, expressing a complete meaning • Attributes of utterance: • style: speaking style; values: news, reading, information provider, dialog, etc. • emotion: speaking emotion; values: happy, sad, angry, calm, despair, etc., or +1 for positive, 0 for neutral, -1 for negative • mood: speaking mood; values: given, request, acquisition, affirmation, apology, etc.

  12. Proposed tag(2/2) • bg: breath group • Attribute of bg: intonation; values: indicative, interrogative, imperative • pph: prosodic phrase • pw: prosodic word • syl: syllable
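
The five proposed elements form a nesting order (utterance, bg, pph, pw, syl), and bg carries a closed list of intonation values. A minimal Python sketch of a checker for these two rules follows. It is an illustration under stated assumptions, not the authors' tooling: the names `LEVELS`, `INTONATIONS` and `check` are hypothetical, element names are assumed to be lower case, and the style, emotion and mood values are not checked because the slides list them as open-ended ("etc").

```python
import xml.etree.ElementTree as ET

# Assumed nesting order, outermost first.
LEVELS = ["utterance", "bg", "pph", "pw", "syl"]
RANK = {name: i for i, name in enumerate(LEVELS)}
INTONATIONS = {"indicative", "interrogative", "imperative"}


def check(elem, parent_rank=-1, problems=None):
    """Collect nesting and intonation problems below elem."""
    if problems is None:
        problems = []
    tag = elem.tag.rsplit("}", 1)[-1]  # drop a namespace prefix if present
    rank = RANK.get(tag)
    if rank is not None:
        if rank <= parent_rank:
            problems.append(f"<{tag}> is nested inside an equal or lower level")
        if tag == "bg":
            value = elem.get("intonation")
            if value is not None and value not in INTONATIONS:
                problems.append(f"unknown intonation {value!r}")
        parent_rank = rank
    # Other elements (emphasis, break, ...) are passed through unchanged.
    for child in elem:
        check(child, parent_rank, problems)
    return problems


good = ET.fromstring(
    '<utterance style="dialog"><bg intonation="interrogative">'
    "<pph><pw>张威</pw></pph></bg></utterance>"
)
bad = ET.fromstring('<utterance><pw><bg intonation="shouting"/></pw></utterance>')
good_problems = check(good)
bad_problems = check(bad)
print(good_problems)
print(bad_problems)
```

The checker allows levels to be skipped (for example a pw directly inside a bg), since the examples in the presentation do so.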

  13. Some examples(1/3)
      <?xml version="1.0"?>
      <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                                 http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
             xml:lang="zh-CN">
        <utterance style="information provider" emotion="-1" mood="apology">
          <bg intonation="indicative">
            <pph>1121次航班(Flight 1121)</pph>
            <pph>延误(has been delayed)
              <pw><emphasis level="strong">1小时(for an hour)</emphasis></pw></pph>
            <break strength="medium" time="215ms"/>
            <pph>请旅客们到(Please go to)</pph>
            <pw><emphasis level="moderate">G6</emphasis></pw>
            <pph>候机厅等候(the waiting room)</pph>
          </bg>
        </utterance>
      </speak>

  14. Some examples(2/3)
      <?xml version="1.0"?>
      <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                                 http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
             xml:lang="zh-CN">
        <utterance style="dialog" emotion="neutral" mood="acquisition">
          <bg intonation="interrogative">
            <pph><pw>
              <emphasis level="strong">张威(Zhang Wei)</emphasis>
            </pw></pph>
            <break strength="medium" time="75ms"/>
            <pph>担心肖荫开车发晕(is afraid of Xiao Yin being dizzy when driving)</pph>
          </bg>
        </utterance>
      </speak>

  15. Some examples(3/3)
      <?xml version="1.0"?>
      <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                                 http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
             xml:lang="zh-CN">
        <utterance style="dialog" emotion="angry">
          <bg intonation="interrogative">
            <prosody rate="x-fast">难道不是你的错吗?(Isn’t it your fault?)</prosody>
            <break strength="medium" time="520ms"/>
          </bg>
          <bg intonation="imperative">
            以后你小心一点(Be careful next time)
          </bg>
        </utterance>
      </speak>
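
To show how a synthesizer front end might read the expressive attributes out of such a document, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It is an assumption-laden illustration, not the authors' implementation: the function `summarize` and the constant names are hypothetical, the element is assumed to be spelled `utterance` as in the tag definition and conclusion, and the embedded document is a shortened variant of the third example with the English glosses removed. Because the extension elements are written without a prefix, they fall into the default SSML namespace and have to be looked up with it.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
NS = {"s": SSML_NS}

# Shortened variant of the third example (assumed well-formed).
DOC = """<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="zh-CN">
  <utterance style="dialog" emotion="angry">
    <bg intonation="interrogative">
      <prosody rate="x-fast">难道不是你的错吗?</prosody>
      <break strength="medium" time="520ms"/>
    </bg>
    <bg intonation="imperative">以后你小心一点</bg>
  </utterance>
</speak>"""


def summarize(doc: str):
    """Return one dict per utterance with its attributes and breath groups."""
    root = ET.fromstring(doc)
    result = []
    for utt in root.iterfind(".//s:utterance", NS):
        groups = [
            (bg.get("intonation"), "".join(bg.itertext()).strip())
            for bg in utt.iterfind(".//s:bg", NS)
        ]
        result.append(
            {
                "style": utt.get("style"),
                "emotion": utt.get("emotion"),
                "mood": utt.get("mood"),  # None when the attribute is absent
                "breath_groups": groups,
            }
        )
    return result


summary = summarize(DOC)
for item in summary:
    print(item["style"], item["emotion"], item["mood"])
    for intonation, text in item["breath_groups"]:
        print("  ", intonation, text)
```

Utterance-level attributes (style, emotion, mood) apply to the whole prosodic group, while intonation is read per breath group, which mirrors the global versus local distinction drawn in the motivation slides.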

  16. Outline • Motivation • Expression of Speech • Proposed SSML extension • Conclusion

  17. Conclusion & questions • 5 elements for the hierarchical prosodic structure • utterance, bg, pph, pw, syl • 3 expressive attributes for utterance • style • emotion • mood • 1 intonation attribute for bg • intonation