Chapter 15 Speech Synthesis Principles

jnoto
Presentation Transcript


  1. Chapter 15 Speech Synthesis Principles • 15.1 History of Speech Synthesis • 15.2 Categories of Speech Synthesis • 15.3 Chinese Speech Synthesis • 15.4 Speech Generation and Synthesizer

  2. 15.1 History of Speech Synthesis (1) • Speech synthesis can be traced back to the 17th century. • The first synthesizer was invented in the 18th century. • A synthesizer is essentially a machine that generates voice-like sound: first by mechanical means, later by electronics, and now by computer. • Contributions of Fant, Flanagan, and Klatt. • The speech communication process.

  3. 15.2 Categories of Speech Synthesis (1) • Speech synthesis systems can be classified into four categories: • 1. Parameter-Based System • 2. Rule-Based System • 3. Waveform-Based System • 4. Text-to-Speech System

  4. Categories of Speech Synthesis (1) • Parametric Analysis-Synthesis • The synthesis unit is the syllable, semi-syllable, or phoneme. • First the units are analyzed: parameters are extracted frame by frame, encoded, and stored in a speech database. At output time, the corresponding parameters are retrieved from the database, edited and concatenated, and sent to the synthesizer, where they control generation of the output signal. The parameters include amplitude (intensity), fundamental frequency (pitch), and formants (timbre). The data rate is low, but the structure is complex and the quality is poorer than that of waveform synthesis.
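
The frame-by-frame analysis step on slide 4 can be sketched as follows. This is a minimal illustration, not the method of any particular system: it extracts only amplitude (as RMS) and fundamental frequency (by a plain autocorrelation peak search) and leaves out formants. The function name `analyze_frames`, the 80 to 400 Hz search range, and the non-overlapping 30 ms frames are assumptions made for the example.

```python
import math

def analyze_frames(signal, sample_rate, frame_len, f0_min=80.0, f0_max=400.0):
    """Return one (rms_amplitude, f0_hz) pair per non-overlapping frame."""
    min_lag = int(sample_rate / f0_max)   # shortest pitch period searched
    max_lag = int(sample_rate / f0_min)   # longest pitch period searched
    params = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        # Amplitude (intensity): root-mean-square of the frame.
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        # Pitch: lag with the largest positive autocorrelation value.
        best_lag, best_r = 0, 0.0
        for lag in range(min_lag, min(max_lag, frame_len - 1) + 1):
            r = sum(frame[n] * frame[n + lag] for n in range(frame_len - lag))
            if r > best_r:
                best_lag, best_r = lag, r
        # 0.0 marks a frame with no positive autocorrelation peak (e.g. silence).
        f0 = sample_rate / best_lag if best_lag else 0.0
        params.append((rms, f0))
    return params

# Hypothetical test input: a 200 Hz tone sampled at 8 kHz, three 30 ms frames.
sample_rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(720)]
frames = analyze_frames(tone, sample_rate, frame_len=240)
```

Each entry of `frames` is the small parameter set that would be encoded and stored in the speech database in place of the waveform itself, which is why the data rate is low.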

  5. Categories of Speech Synthesis (2) • Synthesis-by-Rule • Speech is generated using phonetic rules. The stored units are small: parameters of phonemes, diphones, semi-syllables, and syllables, together with rules for composing syllables from phonemes and words or sentences from syllables. The rules must account for co-articulation and similar effects. They can be divided into formant frequency rules, duration rules, tone rules, and intonation rules. The required memory is much less than for the parametric approach, but the quality is low because the rules are still imperfect and incomplete.

  6. Categories of Speech Synthesis (3) • Waveform Coding Synthesis • The synthesis unit is the word, phrase, or sentence. The original units are recorded and encoded (possibly compressed) to form the speech database. At output time, the corresponding waveform is retrieved and, after some processing, the signal is generated and output. This kind of system is easy to build and low in cost, and its naturalness is good, but it requires much more memory. With the development of integrated circuits, it is becoming a practical approach to synthesis. A simple example is a device that announces bus stops. Many chips can perform similarly simple tasks.
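
The bus-stop example on slide 6 amounts to looking up prerecorded units and playing them back to back. A minimal sketch, assuming the units are already decoded to lists of samples; the function `announce`, the unit names, and the 100 ms pause are hypothetical, and the constant-valued lists merely stand in for real recordings.

```python
def announce(unit_names, speech_base, sample_rate=8000, pause_ms=100):
    """Concatenate stored waveforms, with a short silence between units."""
    pause = [0.0] * (sample_rate * pause_ms // 1000)
    output = []
    for index, name in enumerate(unit_names):
        if index > 0:
            output.extend(pause)          # gap between consecutive units
        output.extend(speech_base[name])  # stored waveform, used unchanged
    return output

# Placeholder "recordings": 0.5 s and 0.75 s of samples at 8 kHz.
speech_base = {
    "next_stop": [0.1] * 4000,
    "central_station": [0.2] * 6000,
}
announcement = announce(["next_stop", "central_station"], speech_base)
```

Because whole phrases are stored as waveforms, the memory cost grows with every new phrase, which is the trade-off the slide notes.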

  7. Categories of Speech Synthesis (4) • Text-to-Speech Conversion System • The system input is a text string. The system comprises linguistic processing (with a semantic dictionary), phonological processing (with a phonetic dictionary), phonetic processing (prosodic rules, pronunciation variant rules, and so on), and speech waveform generation. A real TTS system is therefore an artificial intelligence system, and it is an active research direction. At present the intelligibility is acceptable, but the naturalness is not yet good. Much work remains to be done in this area.

  8. Categories of Speech Synthesis (5) • Some basic terminology : • Synthetic Unit • Synthetic Parameters • Database for synthesis • Speech Synthesizer • Quality of Synthetic Speech

  9. 15.3 Chinese Speech Synthesis (1) • Started in the 1960s. • Developed rapidly in the late 1970s. • All kinds of synthesizers now exist. Some store compressed waveforms in firmware and replay them when needed. • PSOLA (Pitch-Synchronous Overlap-Add) is widely used in waveform-based synthesis. • The techniques continue to improve.
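
The slides name PSOLA without describing it. The core idea is to cut the waveform into short windowed grains centered on pitch marks and overlap-add them at a different spacing. The sketch below is a heavily simplified time-domain version under strong assumptions: the pitch period is constant and known in advance, and the pitch marks are evenly spaced. The function name `psola_resynthesize` and its parameters are hypothetical.

```python
import math

def psola_resynthesize(signal, period, new_period):
    """Re-space two-period Hann grains to change pitch (constant-pitch sketch)."""
    # Hann window spanning two pitch periods; copies shifted by one period sum to 1.
    window = [0.5 - 0.5 * math.cos(math.pi * n / period) for n in range(2 * period)]
    # Analysis pitch marks, assumed known and evenly spaced at `period`.
    marks = list(range(period, len(signal) - period + 1, period))
    out = [0.0] * len(signal)
    center = period  # first synthesis pitch mark
    while center + period <= len(signal):
        # Reuse the analysis grain whose mark is nearest in time.
        nearest = min(max(round((center - period) / period), 0), len(marks) - 1)
        start = marks[nearest] - period
        for n in range(2 * period):
            out[center - period + n] += window[n] * signal[start + n]
        center += new_period
    return out

tone = [math.sin(2 * math.pi * n / 40) for n in range(400)]   # period: 40 samples
unchanged = psola_resynthesize(tone, period=40, new_period=40)
higher = psola_resynthesize(tone, period=40, new_period=32)   # pitch raised by 40/32
```

With `new_period` equal to `period` the overlapping windows sum to one and the interior of the signal is reproduced; with a smaller `new_period` the grains repeat more often, so the pitch rises while the overall duration stays the same. Real systems must also detect the pitch marks and handle unvoiced speech, which this sketch ignores.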

  10. Chinese Speech Synthesis (2) • A Chinese text-to-speech system consists of: • Text analyzer: a word segmentation program that segments a sentence into a word sequence. The segmentation error rate is about 10%. The main difficulties are overlapping ambiguity between two words and out-of-dictionary (new) words. Although many approaches have been proposed, none fully solves these problems. • The recorded speech is retrieved from the database according to the word and phonetic dictionaries. • Rules for editing and processing the recorded speech.
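
The two segmentation difficulties on slide 10 are easy to reproduce with a dictionary-based segmenter. The slides do not say which algorithm the text analyzer uses; the sketch below assumes forward maximum matching, a common baseline, and the function name and toy dictionary are hypothetical.

```python
def forward_max_match(sentence, dictionary, max_word_len=4):
    """Greedy left-to-right segmentation: take the longest dictionary word."""
    words = []
    i = 0
    while i < len(sentence):
        for size in range(min(max_word_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            # A single character is always accepted, so out-of-dictionary
            # text falls back to one-character "words".
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words

toy_dictionary = {"研究", "研究生", "生命", "起源"}
segmented = forward_max_match("研究生命起源", toy_dictionary)
# segmented == ["研究生", "命", "起源"]
```

The intended reading is 研究 / 生命 / 起源 ("study the origin of life"), but the greedy match takes 研究生 ("graduate student") first because it overlaps 生命 ("life"). That is the overlapping ambiguity, and the stranded 命 shows how unmatched text degrades to single characters, as happens with new words.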

  11. Chinese Speech Synthesis (3) • After processing, the regenerated signals are sent to the sound card to generate speech. During processing, all kinds of changes can be made to the signal.

  12. 15.4 Speech Generation and Synthesizer • Please see pages 291-301 of the book.
