Text-to-Speech Systems for Indian Languages
Our research and development focus at LTRC, IIIT Hyderabad is to develop
speech interfaces for Indian languages. Currently we are working on
text-to-speech systems for Indian languages.
Data-Driven (Corpus-based / Example-based) Approaches for Text-to-Speech Systems
We are developing text-to-speech systems under the framework of data-driven
speech synthesis, wherein speech is synthesized by concatenating optimal
units selected from a large corpus of recorded utterances. This approach
produces very natural-sounding speech.
For a demonstration of our
current system, visit
http://speech.iiit.ac.in/
We are working with generic frameworks like Festvox and have also developed
our own synthesis systems from scratch. Several issues are involved in
text-to-speech synthesis by data-driven approaches. These include:
- selection of an optimal text corpus in terms of size and coverage
- automatic annotation of corpora
- choice of the size of the basic units of synthesis
- choice of speech features to be used
- database pruning and storage
- unit selection algorithms
- choice and parameters of distance and cost functions
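The first item in the list, choosing a small text corpus with wide unit
coverage, is commonly approached with a greedy set-cover heuristic. The
sketch below is only an illustration under stated assumptions, not the
procedure used in our system: the function names are hypothetical, and
whitespace-separated tokens stand in for the real synthesis units (for
example syllables).

```python
def units_of(sentence):
    """Hypothetical unit extractor: whitespace tokens stand in for syllables."""
    return set(sentence.split())


def greedy_select(sentences, max_sentences):
    """Repeatedly pick the sentence that adds the most not-yet-covered units."""
    covered = set()
    chosen = []
    remaining = list(sentences)
    while remaining and len(chosen) < max_sentences:
        # Sentence with the largest number of new units (first one wins ties).
        best = max(remaining, key=lambda s: len(units_of(s) - covered))
        gain = units_of(best) - covered
        if not gain:
            break  # nothing left adds coverage
        chosen.append(best)
        covered |= gain
        remaining.remove(best)
    return chosen, covered


if __name__ == "__main__":
    pool = ["ab cd", "ab", "cd ef", "ef"]
    chosen, covered = greedy_select(pool, max_sentences=3)
    print(chosen, sorted(covered))
```

The `max_sentences` limit is what trades corpus size against coverage: a
smaller limit gives a shorter recording script at the price of missing units.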
There are also system issues related to the size of the corpus and the
possibility of its use under various conditions. We are working on improving
the system for use in a wide range of situations. The projects and efforts I
am working on currently, or have worked on recently, are described here.
Evolutionary Approach for Unit Selection: GASpSythesizer
Unit selection is the selection of the best units from the speech corpus for
a given phonetic and prosodic context, so as to achieve a sequence of units
that is as close as possible to an actual utterance. The closeness is
measured in terms of certain speech parameters. In this project, I am
developing an evolutionary algorithm for the unit selection problem. Unit
selection can be considered a minimization problem in n dimensions, where n
is the length of the unit sequence to be synthesized. Each dimension consists
of a set of discrete points, each corresponding to one occurrence in the
corpus of the unit that the dimension represents. In this n-dimensional
space, each valid point is therefore one candidate unit sequence and is
associated with a cost value. The basic unit selection problem is to find a
minimum in this search space. Over the past couple of decades, evolutionary
algorithms have proven to be of significant interest for problems that
involve search in a space with a large number of dimensions. The project
involves the development of a basic evolutionary algorithm for unit selection
in speech synthesis and the implementation of the selection, crossover (and
mutation) operators. Currently I am experimenting with the various parameters
that the initial system is based on.
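To make the search-space formulation concrete, here is a minimal sketch of a
genetic algorithm over unit sequences. It is not the GASpSythesizer code.
Everything specific in it is an assumption chosen for illustration: each
candidate unit is reduced to a (start_pitch, end_pitch) pair, the cost is a
simple target cost plus join cost, and the operators are tournament
selection, one-point crossover and random-reset mutation with elitism.

```python
import random


def sequence_cost(choice, candidates, targets, join_weight=1.0):
    """Cost of one point in the search space.

    choice[i] is the index of the candidate picked for position i.
    Each candidate is an assumed (start_pitch, end_pitch) pair.
    """
    units = [candidates[i][c] for i, c in enumerate(choice)]
    # Target cost: distance of each unit's mean pitch from the desired pitch.
    target_cost = sum(abs((s + e) / 2 - t) for (s, e), t in zip(units, targets))
    # Join cost: pitch mismatch at each concatenation point.
    join_cost = sum(abs(a[1] - b[0]) for a, b in zip(units, units[1:]))
    return target_cost + join_weight * join_cost


def select_units(candidates, targets, pop_size=30, generations=60,
                 mutation_rate=0.1, seed=0):
    """Search for a low-cost unit sequence with a basic genetic algorithm."""
    rng = random.Random(seed)
    n = len(candidates)

    def cost(choice):
        return sequence_cost(choice, candidates, targets)

    def random_choice():
        return [rng.randrange(len(candidates[i])) for i in range(n)]

    population = [random_choice() for _ in range(pop_size)]
    best = min(population, key=cost)
    for _ in range(generations):
        next_pop = [best[:]]  # elitism: the best sequence always survives
        while len(next_pop) < pop_size:
            # Selection: two tournaments of three individuals each.
            p1 = min(rng.sample(population, 3), key=cost)
            p2 = min(rng.sample(population, 3), key=cost)
            # One-point crossover.
            cut = rng.randrange(1, n) if n > 1 else 0
            child = p1[:cut] + p2[cut:]
            # Mutation: re-draw a position's candidate with small probability.
            for i in range(n):
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(len(candidates[i]))
            next_pop.append(child)
        population = next_pop
        best = min(population, key=cost)
    return best, cost(best)


if __name__ == "__main__":
    candidates = [
        [(100, 110), (120, 125)],
        [(108, 112), (130, 128), (111, 115)],
        [(113, 118), (90, 95)],
    ]
    targets = [105, 112, 116]
    print(select_units(candidates, targets))
```

The parameters `pop_size`, `generations` and `mutation_rate` are the kind of
settings one would experiment with; the values here are arbitrary defaults.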
Pruning of Speech Databases: GPMF Approach
While recording a speech corpus for data-driven speech synthesis, meaningful
sentences are recorded. Though care is generally taken (as mentioned above)
to optimize the text corpus to be read out for maximum coverage, several
noise words and elements of little importance find their way into the corpus
in order to keep the sentences meaningful. Also, while recording the
database, several units do not get recorded properly, often due to speaker
error. Such units, if kept in the database, do not add to the coverage of
units and are instead potential troublemakers that may cause poor-quality
synthesis. Hence automatic methods of pruning a speech corpus are of
interest. Database pruning is also of interest because it allows a speech
synthesizer to be built at any desired size (of course at a corresponding
quality). This allows us to produce a system for use in situations where the
amount of available storage is limited. We have come up with an algorithm to
automatically prune speech databases using statistical analysis, in the
signal domain, of the units in the speech corpus.
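The details of the GPMF algorithm are not given here, so the following is
only a generic sketch of what statistical pruning in the signal domain can
look like, not our method. It assumes each unit instance is a label plus a
list of samples, computes two simple statistics per instance (duration and
RMS energy), and drops instances that lie more than `k` standard deviations
from the mean of their own unit type. All names and thresholds are
hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def rms(samples):
    """Root-mean-square energy of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def zscores(values):
    """Absolute z-score of each value; all zeros if there is no spread."""
    m, s = mean(values), pstdev(values)
    return [0.0 if s == 0 else abs(v - m) / s for v in values]


def prune(units, k=2.0):
    """Drop instances whose duration or energy is an outlier for their label.

    units: list of (label, samples) pairs. Returns the kept pairs.
    """
    by_label = defaultdict(list)
    for label, samples in units:
        by_label[label].append(samples)

    kept = []
    for label, instances in by_label.items():
        dur_z = zscores([len(s) for s in instances])
        rms_z = zscores([rms(s) for s in instances])
        for samples, dz, rz in zip(instances, dur_z, rms_z):
            if dz <= k and rz <= k:
                kept.append((label, samples))
    return kept


if __name__ == "__main__":
    good = [("ka", [0.5] * 100) for _ in range(10)]
    bad = [("ka", [0.01] * 100)]   # nearly silent, e.g. a recording error
    rare = [("ki", [0.3] * 50)]    # single instance, always kept
    kept = prune(good + bad + rare)
    print(len(kept), "of", len(good + bad + rare), "instances kept")
```

Lowering `k` prunes more aggressively, which is one simple way to trade
database size against quality. Unit types with a single instance are never
pruned by this rule, so coverage is preserved.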
Experiments with Size of Speech Database
We have experimented with various sizes of speech databases (here without
any pruning) and have observed the effect of unit coverage on speech quality
by conducting perceptual tests.
Low Memory Device Speech Synthesizer: LMDS
Following the development of an Unrestricted Domain Speech Synthesizer for
Indian Languages (mentioned below), the need was felt to have the synthesizer
on a low memory device such as the Simputer* and other PDAs. The size of the
synthesizer was the only prohibiting factor, so a Low Memory Device
Synthesizer for Indian Languages (LMDS) was developed. As the result of the
project, an LMDS of 1.45 MB with almost real-time performance for short
sentences has been developed. The availability of such a synthesizer opens
up a large number of possibilities for various applications, especially in
the domains of Tourism, Rural Kiosks and Mobile AVRS. We have ported the
synthesizer to the Simputer, and porting it to other platforms is not
expected to be an issue.
For each distinct unit in the database, we take the single best realization
and store only that one, with a speech coding technique applied to it to
reduce the size further. The issue to be addressed here is the criterion for
choosing the one best unit from the several available. We have experimented
with several possible approaches.
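One conceivable criterion, shown here purely as an illustration and not
necessarily one of the approaches we tried, is to keep the medoid: the
instance whose feature vector has the smallest total distance to all other
instances of the same unit. The sketch assumes each instance is a label plus
a fixed-length feature vector; the names are hypothetical.

```python
import math
from collections import defaultdict


def medoid_index(vectors):
    """Index of the vector with the smallest summed distance to the others."""
    def total_distance(i):
        return sum(math.dist(vectors[i], v) for v in vectors)
    return min(range(len(vectors)), key=total_distance)


def best_units(instances):
    """Map each unit label to the position (in `instances`) of its medoid.

    instances: list of (label, feature_vector) pairs.
    """
    positions = defaultdict(list)
    for pos, (label, _) in enumerate(instances):
        positions[label].append(pos)

    best = {}
    for label, pos_list in positions.items():
        vectors = [instances[p][1] for p in pos_list]
        best[label] = pos_list[medoid_index(vectors)]
    return best


if __name__ == "__main__":
    corpus = [
        ("ka", (0.0, 0.0)),
        ("ka", (1.0, 0.0)),
        ("ka", (10.0, 0.0)),
        ("ki", (5.0, 5.0)),
    ]
    print(best_units(corpus))
```

A medoid favours the most typical realization of a unit, which tends to
avoid badly recorded instances but ignores context, so it is only one of
several reasonable choices.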
Unrestricted Domain Speech Synthesizer for Indian Languages:
Following the successful development of a Limited Domain Synthesizer, an
obvious motivation was to develop an Unrestricted Domain Speech Synthesizer
using features of the Limited Domain System, such as the use of syllables as
the basic unit and the prosodic matching function for selected longer
sequences. The unrestricted domain also brought along new problems, such as
“What to do when a syllable is not found”. Solving each problem one by one in
a systematic fashion, we were able to come up with a speech synthesizer
which, though not very good initially, showed the promise of natural-sounding
speech.
Following this, several improvements and pruning continued from July to
December. By the time the system was officially launched by IIIT Hyderabad,
it was producing very good quality, near-natural-sounding speech. Currently
the system is available for Hindi and Telugu.
Related Applications:
- Talking Tourist Aid
- Email Client for Hindi (with speech support)
- English-Hindi Dictionary
- Samachaar Vaani: News Reader
TTS API for Windows
An API for the text-to-speech system on Windows has been implemented for
robust application development in Windows environments. Documentation is
currently in progress. A couple of sample applications have been developed to
demonstrate the use of the API. The latest API distribution also ships with a
well-packaged TTS in which unit pruning and the new unit selection algorithms
are implemented. A lean distribution of nearly 32 MB is ready, which includes
2 voices, one each for Hindi and Telugu. The system supports installation of
pluggable voices, and this distribution is certainly much more
"distributable" and "usable" for people who just want something that speaks
and do not want to delve into the intricacies of speech synthesis.
The next step would be to provide a large number of pluggable text processing
modules for this distribution to support many common fonts. Sophisticated
text normalization also needs to be implemented in the system.
Text Processing Front End: Font Converters for Hindi
We have made Unicode the front-end input format for our Text to Speech
Engines, and several font converters from various fonts and notations to
Unicode have been developed.