RK: Text-to-Speech Systems for Indian Languages

Our research and development focus at LTRC, IIIT Hyderabad, is to develop speech interfaces for Indian languages. We are currently working on text-to-speech systems for Indian languages.

Data-Driven (Corpus-Based / Example-Based) Approaches to Text-to-Speech Synthesis

We are developing text-to-speech systems within the framework of data-driven speech synthesis, in which speech is synthesized by concatenating optimal units selected from a large corpus of recorded utterances. This approach produces very natural-sounding speech.

For a demonstration of our current system, visit http://speech.iiit.ac.in/

We are working with generic frameworks such as Festvox and have also developed our own synthesis systems from scratch. Data-driven text-to-speech synthesis involves several issues, including:

  • selection of an optimal text corpus, balancing size against wide coverage

  • automatic annotation of corpora

  • choice of the size of the basic synthesis unit

  • choice of speech features to be used

  • database pruning and storage

  • unit selection algorithms

  • choice and parameters of distance and cost functions
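
The last two issues, unit selection and cost functions, are commonly handled together by a dynamic programming (Viterbi) search that adds a target cost (how well a unit fits the wanted context) to a join cost (how smoothly neighbouring units concatenate). The sketch below is a generic textbook form of that search, not necessarily what our systems use; describing each candidate by a single feature value (for example pitch in Hz) and the names `viterbi_unit_selection`, `target_cost` and `join_cost` are illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple


def viterbi_unit_selection(
    candidates: Sequence[Sequence[float]],
    target_cost: Callable[[int, float], float],
    join_cost: Callable[[float, float], float],
) -> Tuple[List[int], float]:
    """Pick one candidate per position, minimizing summed target + join cost.

    candidates[i] holds a feature value (assumed: pitch) for every recorded
    occurrence of the i-th unit to synthesize; must be non-empty.
    Returns the chosen candidate index per position and the total cost.
    """
    n = len(candidates)
    # best[i][j]: cheapest cost of any path that ends at candidate j of position i
    best = [[target_cost(0, c) for c in candidates[0]]]
    back: List[List[int]] = [[-1] * len(candidates[0])]
    for i in range(1, n):
        row, ptr = [], []
        for c in candidates[i]:
            # cheapest predecessor, counting the cost of joining it to c
            k = min(
                range(len(candidates[i - 1])),
                key=lambda p: best[i - 1][p] + join_cost(candidates[i - 1][p], c),
            )
            row.append(best[i - 1][k] + join_cost(candidates[i - 1][k], c) + target_cost(i, c))
            ptr.append(k)
        best.append(row)
        back.append(ptr)
    # trace the back-pointers from the cheapest final candidate
    j = min(range(len(candidates[-1])), key=lambda q: best[-1][q])
    total = best[-1][j]
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    return path[::-1], total
```

For example, with candidates `[[98.0, 130.0], [111.0, 90.0], [119.0, 140.0]]`, target pitches of 100, 110 and 120 Hz, target cost `abs(c - target)` and join cost `0.5 * abs(a - b)`, the search picks the first candidate at every position for a total cost of 14.5.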

There are also system issues related to the size of the corpus and whether it can be used under various conditions. We are working on improving the system for use in a variety of situations. The projects I am currently working on, or have worked on recently, are described below.

Evolutionary Approach for Unit Selection: GASpSythesizer

Unit selection is the task of choosing the best units from the speech corpus for a given phonetic and prosodic context, so that the resulting sequence of units is as close as possible to an actual utterance. Closeness is measured in terms of certain speech parameters. In this project I am developing an evolutionary algorithm for the unit selection problem. Unit selection can be viewed as a minimization problem in n dimensions, where n is the length of the unit sequence to be synthesized. Each dimension consists of a set of discrete points, each corresponding to one recorded occurrence of the unit that the dimension represents. Every valid point in this n-dimensional space is therefore associated with a cost value, and the basic unit selection problem is to find a minimum in this search space. Over the past couple of decades, evolutionary algorithms have proven to be of significant interest for problems that involve searching spaces with a large number of dimensions. The project involves developing a basic evolutionary algorithm for unit selection in speech synthesis and implementing the selection, crossover (and mutation) operators. I am currently experimenting with the various parameters on which the initial system is based.
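
A minimal sketch of such an evolutionary search, assuming a caller-supplied function that returns the total cost of a complete unit sequence. Each individual holds one candidate index per dimension. The operators shown (tournament selection, one-point crossover, random-reset mutation, elitism), the function name `ga_unit_selection` and all parameter values are illustrative choices, not the actual design or settings of GASpSythesizer.

```python
import random
from typing import Callable, List, Sequence, Tuple


def ga_unit_selection(
    sizes: Sequence[int],                   # candidate occurrences available per position
    cost: Callable[[Sequence[int]], float], # total cost of a full sequence (lower is better)
    pop_size: int = 30,                     # illustrative values, not tuned settings
    generations: int = 100,
    mutation_rate: float = 0.1,
    seed: int = 0,
) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    n = len(sizes)
    # an individual is a point in the n-dimensional space: one index per position
    pop = [[rng.randrange(s) for s in sizes] for _ in range(pop_size)]

    def tournament() -> List[int]:
        # selection: the cheaper of two randomly drawn individuals wins
        a, b = rng.sample(pop, 2)
        return a if cost(a) <= cost(b) else b

    best = min(pop, key=cost)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best sequence over unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            # one-point crossover: prefix of one parent, suffix of the other
            cut = rng.randrange(1, n) if n > 1 else 0
            child = p1[:cut] + p2[cut:]
            # mutation: occasionally swap in a different occurrence of a unit
            for i in range(n):
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(sizes[i])
            children.append(child)
        pop = children
        best = min(pop, key=cost)
    return best, cost(best)
```

Because of elitism the best sequence never gets worse from one generation to the next; unlike an exhaustive dynamic programming search, however, the result is not guaranteed to be the global minimum.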

Pruning of Speech Databases: GPMF Approach

When a speech corpus is recorded for data-driven speech synthesis, meaningful sentences are read out. Although care is generally taken (as mentioned above) to optimize the text corpus for maximum coverage, several noise words and elements of little significance find their way into the corpus to keep the sentences meaningful. In addition, several units are often not recorded properly, usually because of speaker error. If kept in the database, such units do not add to unit coverage; instead, they may cause poor-quality synthesis. Automatic methods for pruning speech corpora are therefore of interest. Database pruning is also useful because it allows a speech synthesizer to be built at any desired size (with correspondingly reduced quality, of course), so a system can be produced for situations where the available storage space is limited. We have come up with an algorithm that automatically prunes speech databases using statistical analysis of the units in the speech corpus in the signal domain.
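
The GPMF algorithm itself is not described on this page, so the following is only a hypothetical illustration of signal-domain statistical pruning. It assumes each unit is summarized by one feature (here, duration in ms), drops units lying more than `z_max` standard deviations from the mean of their unit type as likely recording errors, and optionally keeps only the most typical realizations per type to meet a size budget. The names `prune_units`, `z_max` and `keep_per_label` are invented for this sketch.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Optional, Tuple

# (unit label such as a syllable, one signal-domain feature such as duration in ms)
Unit = Tuple[str, float]


def prune_units(units: List[Unit], z_max: float = 2.0,
                keep_per_label: Optional[int] = None) -> List[int]:
    """Return the sorted indices of the units to keep."""
    by_label: Dict[str, List[int]] = defaultdict(list)
    for idx, (label, _) in enumerate(units):
        by_label[label].append(idx)

    kept: List[int] = []
    for idxs in by_label.values():
        vals = [units[i][1] for i in idxs]
        mu, sd = mean(vals), pstdev(vals)
        # distance from this unit type's mean, in standard deviations
        z = {i: abs(units[i][1] - mu) / sd if sd > 0 else 0.0 for i in idxs}
        # drop statistical outliers, but never lose a unit type entirely
        good = [i for i in idxs if z[i] <= z_max] or [min(idxs, key=z.__getitem__)]
        # most typical realizations first, then apply the optional size budget
        good.sort(key=z.__getitem__)
        if keep_per_label is not None:
            good = good[:keep_per_label]
        kept.extend(good)
    return sorted(kept)
```

Lowering `keep_per_label` trades coverage of unit variants for a smaller database, which mirrors the size/quality trade-off described above.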

Experiments with Size of Speech Database

We have experimented with speech databases of various sizes (here without any pruning) and, through perceptual tests, have observed the effect of unit coverage on speech quality.

Low Memory Device Speech Synthesizer: LMDS

Following the development of the Unrestricted Domain Speech Synthesizer for Indian Languages (described below), we felt the need to run the synthesizer on low-memory devices such as the Simputer and other PDAs. The size of the synthesizer was the only prohibiting factor, so we developed a Low Memory Device Synthesizer for Indian Languages (LMDS). The resulting LMDS is 1.45 MB in size and runs in almost real time on short sentences. Such a synthesizer opens up a large number of possible applications, especially in tourism, rural kiosks and mobile AVRS. We have ported the synthesizer to the Simputer, and porting it to other platforms should not be an issue.

For each possible unit, we take the best realization in the database and store only that one, applying a speech coding technique to it to reduce the size further. The issue to be addressed here is the criterion for choosing one best unit from several candidates. We have experimented with several possible approaches.
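
One possible criterion, shown here purely as an assumption and not as the one LMDS uses, is to keep the medoid of each unit type: the realization whose feature vector has the smallest total distance to all other realizations of that unit. The feature vectors (for example, averaged spectral coefficients) are assumed to be computed elsewhere, and `pick_medoids` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence


def pick_medoids(realizations: Dict[str, List[Sequence[float]]]) -> Dict[str, int]:
    """For each unit label, return the index of its medoid realization.

    The medoid minimizes the summed Euclidean distance to every other
    realization of the same unit, i.e. it is the most "central" recording.
    """
    chosen: Dict[str, int] = {}
    for label, feats in realizations.items():
        chosen[label] = min(
            range(len(feats)),
            key=lambda i: sum(math.dist(feats[i], f) for f in feats),
        )
    return chosen
```

Unlike the mean feature vector, the medoid is always an actual recording, so the stored unit is real speech that can then be coded and kept.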

Unrestricted Domain Speech Synthesizer for Indian Languages

Following the successful development of a Limited Domain Synthesizer, the obvious next goal was an Unrestricted Domain Speech Synthesizer that retained features of the limited domain system, such as the use of syllables as the basic unit and a prosodic matching function for selected longer sequences. The unrestricted domain also brought new problems, such as what to do when a syllable is not found in the database. By solving these problems one by one in a systematic fashion, we arrived at a speech synthesizer that, though not very good initially, showed the promise of natural-sounding speech.
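
The "syllable not found" problem can be handled with a back-off strategy. The sketch below assumes a greedy longest match against the database's syllable inventory: it prefers the longest recorded run of syllables (up to `max_len`) and otherwise calls a hypothetical `split` function that breaks a missing syllable into smaller units. It illustrates the idea only and is not the strategy used in our synthesizer; `plan_units` and its parameters are invented names.

```python
from typing import Callable, List, Sequence, Set, Tuple


def plan_units(
    syllables: Sequence[str],
    inventory: Set[Tuple[str, ...]],     # syllable sequences recorded in the database
    split: Callable[[str], List[str]],   # hypothetical back-off into smaller units
    max_len: int = 3,
) -> List[Tuple[str, ...]]:
    plan: List[Tuple[str, ...]] = []
    i = 0
    while i < len(syllables):
        # try the longest recorded run of syllables starting at position i
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            chunk = tuple(syllables[i:i + n])
            if chunk in inventory:
                plan.append(chunk)
                i += n
                break
        else:
            # syllable not found at all: back off to smaller units
            plan.append(tuple(split(syllables[i])))
            i += 1
    return plan
```

For example, with inventory `{("na", "ma"), ("ste",)}` and `split=list`, the input `["na", "ma", "ste", "ji"]` is planned as `[("na", "ma"), ("ste",), ("j", "i")]`.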

Several improvements and pruning efforts followed from July to December. By the time IIIT Hyderabad officially launched the system, it was producing very good quality, near-natural speech. The system is currently available for Hindi and Telugu.

Related Applications:

  • Talking Tourist Aid
  • Email Client for Hindi (with speech support)
  • English - Hindi Dictionary
  • Samachaar Vaani: News Reader

TTS API for Windows

An API for the text-to-speech system has been implemented to support robust application development in Windows environments. Documentation is currently in progress. A couple of sample applications have been developed to demonstrate the use of the API. The latest API distribution also ships with a well-packaged TTS that implements unit pruning and the new unit selection algorithms. A lean distribution of nearly 32 MB is ready, which includes two voices, one each for Hindi and Telugu. The system supports installation of pluggable voices, and this distribution is much more "distributable" and "usable" for people who just want something that speaks and do not want to delve into the intricacies of speech synthesis.

The next step is to provide a large number of pluggable text processing modules for this distribution to support many common fonts. Sophisticated text normalization also needs to be implemented in the system.

Text Processing Front End: Font Converters for Hindi

We have made Unicode the input format for the front end of our text-to-speech engines, and have developed several font converters from various fonts and notations to Unicode.
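
A font converter is essentially a table lookup plus reordering. The sketch below uses an invented glyph table for an imaginary legacy font: the ASCII codes and the names `GLYPHS` and `legacy_to_unicode` are hypothetical, while the Unicode code points are real Devanagari characters. It shows greedy longest-match lookup and one common complication: many legacy Hindi fonts store the vowel sign I (U+093F) before its consonant in visual order, whereas Unicode stores it after. Conjuncts and other reordering cases are deliberately ignored.

```python
from typing import Dict, List

# Hypothetical table for an imaginary legacy font: legacy code -> Unicode text.
GLYPHS: Dict[str, str] = {
    "d": "\u0915",  # KA
    "e": "\u092E",  # MA
    "k": "\u093E",  # vowel sign AA
    "f": "\u093F",  # vowel sign I, typed before its consonant in this font
}
I_SIGN = "\u093F"


def is_consonant(ch: str) -> bool:
    # Devanagari consonants KA..HA occupy U+0915..U+0939
    return len(ch) == 1 and "\u0915" <= ch <= "\u0939"


def legacy_to_unicode(text: str, table: Dict[str, str] = GLYPHS) -> str:
    keys = sorted(table, key=len, reverse=True)  # try longest legacy runs first
    out: List[str] = []
    pending_i = False  # an I sign seen before its consonant
    i = 0
    while i < len(text):
        for k in keys:
            if text.startswith(k, i):
                u = table[k]
                i += len(k)
                break
        else:
            u = text[i]  # unknown characters pass through unchanged
            i += 1
        if u == I_SIGN:
            pending_i = True  # visual order: hold it until the next consonant
            continue
        out.append(u)
        if pending_i and is_consonant(u):
            out.append(I_SIGN)  # Unicode (logical) order: consonant, then sign
            pending_i = False
    if pending_i:
        out.append(I_SIGN)  # dangling sign: emit it rather than drop it
    return "".join(out)
```

With this table, `legacy_to_unicode("dke")` yields "काम" and `legacy_to_unicode("fd")` yields "कि".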



This Page Is: http://nlp.iiit.net/~rohit/tts.htm
Last Update On: Sunday, 30 November 2003