Rohit Kumar's Current Work

 
 


 
 

June 2004: Speech Recognition and all the lot

Kishore is back and is in ASR mode right now, so let's work on it. I guess the next two months are going to be hectic with this work, particularly as I want to work with IIIT students. And yes, there is more work with PICOPETA and maybe OUTSIDE-ECHO. I also need to pack up Khabrein and get it out this month so that more research on that side can be done. Further, I must write papers on the work done this summer.
 


May 2004: Summer Interns and lots more..

Working with several summer interns on lots of projects. See my FYP/Summer Projects List. Some are core paper projects. Some are add-on modules to systems we have developed, i.e. TTS and Khabrein. Also working on new projects under collaborations.

Further, I have some deliverables to send to PICOPETA, so there is lots to do this month.
 


March 2004: Paper Writing

Writing a couple of long-pending papers. Want to be done with them by mid-April.

Yes, I did complete and submit them. See the publications page for brief summaries.
 


February - March 2004: Khabrein - Online Hindi News Archives

Started work on a basic system for archiving Hindi (and, more generally, Indian language) news and providing services on top of the archives: one-place news, spoken news bulletins, and later search, categorization, news alerts, natural language query, etc. I have a preliminary version. Write to me and I can send you a link; I am not sure of the copyright implications.
 


February - March 2004: Working with Sebsibe on Prosodic Manipulation & Limited Vocabulary Speech Recognition

Sebsibe H/Mariam is a Ph.D. student working with us in Speech Lab here. He is a student at IIIT under an exchange program from Ethiopia. He took up a semester project course to develop a Prosody Manipulation system (basically pitch and duration modifier modules). So lately I have been sitting with him and discussing various implementation details of the modifiers and associated modules. He has done good work and has come up with nice modules, which we plan to use for some more interesting work in the coming summer.
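
To give a feel for what pitch and duration modifiers do, here is a minimal sketch that works on an F0 contour (one pitch value in Hz per frame) rather than on the waveform. It is not Sebsibe's implementation; the function names and the linear interpolation are assumptions made for illustration, and the contour is assumed to contain voiced frames only.

```python
def scale_pitch(f0, factor):
    """Pitch modifier: raise or lower every frame of the contour by a factor."""
    return [value * factor for value in f0]


def stretch_duration(f0, factor):
    """Duration modifier: resample the contour to factor times its length.

    Linear interpolation keeps the contour shape while changing how many
    frames it spans. Assumes every frame is voiced (no zero-valued gaps).
    """
    if not f0:
        return []
    n_out = max(1, round(len(f0) * factor))
    if len(f0) == 1 or n_out == 1:
        return [f0[0]] * n_out
    out = []
    for j in range(n_out):
        # Position of output frame j on the input time axis.
        pos = j * (len(f0) - 1) / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, len(f0) - 1)
        frac = pos - lo
        out.append(f0[lo] * (1 - frac) + f0[hi] * frac)
    return out
```

A real system would then impose the modified contour on the signal, for example with an overlap-add method; that step is left out here.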

He also started working on a Limited Domain Isolated Word Recognition System (concentrating on Indian Languages and Amharic) under the ARGUS Course. So we decided to build a very simple-to-use toolkit (RecogPack) for rapidly building limited domain recognizers. An elementary version of it has been built, but more work needs to be done to make it computationally practical. The coming summer will be a good time for it all.
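
As an illustration of isolated word recognition over a small vocabulary, here is a sketch of template matching with dynamic time warping (DTW). This is a classic technique chosen for the example, not a description of how RecogPack works; the one-number-per-frame features and the names `dtw_distance` and `recognize` are assumptions. The quadratic cost table also hints at why computation becomes a concern as templates grow.

```python
def dtw_distance(a, b):
    """DTW distance between two feature sequences (one number per frame)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch b
                cost[i][j - 1],      # stretch a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


def recognize(features, templates):
    """Return the vocabulary word whose template is closest under DTW."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))
```

Real features would be vectors (for example cepstral coefficients per frame) with a vector distance in place of `abs`.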
 


December 2003 - February 2004: Indian Language TTS - Text Processing Front End

At the start of December I decided to have two items on my agenda for the TTS. One of them is to build a good text processing front end for the Indian Language TTS we are building here. By good, I mean a well-designed set of blocks for font converters, text normalization, NLP, and phonetization (and syllabification). I also wanted to get more standardized with the notation that the rest of the department and the world follows. The basic push behind the requirement was Kishore's specification that we must support Unicode, ISCII and ITrans. Unicode and ISCII are supported; I am not sure if ITrans is worth the effort, so it's on standby for now. But yes, the converters for a large number of fonts and notations for Hindi and Telugu are ready, language independent text normalization modules are in place, and a good design for the NLP modules has evolved which is pretty much in line with the efforts going on in the other labs in LTRC.
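
To make the syllabification block a little more concrete, here is a minimal sketch that splits a Unicode Devanagari word into orthographic syllables (aksharas). It is not the front end's actual code: the function name is hypothetical, the input is assumed to be already converted to Unicode, and the rule used (combining marks attach to the current unit, and a consonant following a virama joins the cluster) gives written units, which can differ from spoken syllable boundaries.

```python
import unicodedata

VIRAMA = "\u094D"  # Devanagari sign virama (halant)


def aksharas(word):
    """Split a Devanagari word into orthographic syllables."""
    units = []
    for ch in word:
        # Matras, virama, anusvara, etc. all have a Unicode category M*.
        is_mark = unicodedata.category(ch).startswith("M")
        # A base letter right after a virama continues a consonant cluster.
        joins_cluster = bool(units) and units[-1].endswith(VIRAMA)
        if units and (is_mark or joins_cluster):
            units[-1] += ch
        else:
            units.append(ch)
    return units
```

For example, `aksharas("नमस्ते")` returns `["न", "म", "स्ते"]`.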

Check out a recent presentation on the Indian Language Text Processing Front End we have developed.
 


November 2003: Photorealistic Visual Speech Synthesis

I have started developing a Visual Speech Synthesis system built on corpus-based approaches and synthesis by unit selection. The basic plan is to achieve co-articulation through the use of divisemes, select text optimally to provide coverage of divisemes, collect and segment a corpus, and finally synthesize by unit selection and morphing. Here are some notes I wrote some time back.
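
The text selection step can be pictured as a greedy coverage problem. The sketch below is only one way to do it, under assumptions: each sentence is already mapped to a viseme sequence (the mapping itself is not shown), a diviseme is taken to be a pair of adjacent visemes, and the names are hypothetical.

```python
def divisemes(visemes):
    """Set of adjacent viseme pairs in one sentence."""
    return set(zip(visemes, visemes[1:]))


def select_sentences(corpus, max_sentences):
    """Greedily pick sentences that add the most uncovered divisemes.

    corpus maps a sentence id to its viseme sequence.
    Returns the chosen ids (in pick order) and the divisemes they cover.
    """
    remaining = {sid: divisemes(seq) for sid, seq in corpus.items()}
    covered, chosen = set(), []
    while remaining and len(chosen) < max_sentences:
        best = max(remaining, key=lambda sid: len(remaining[sid] - covered))
        gain = remaining[best] - covered
        if not gain:
            break  # nothing left adds new coverage
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen, covered
```

Greedy selection does not guarantee the smallest possible sentence set, but it is simple and usually close enough for building a recording script.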
 


October 2003: Some recent developments in Indian Language TTS work

A Unit Pruning Approach has been implemented, and from preliminary observations we find that we can remove between 50% and 70% of the units present in the database without perceptible loss of quality (naturalness as well as intelligibility), though this still needs to be established through perceptual testing. An objective measure could also be used for evaluation.
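
The pruning criterion is not spelled out above, so the following is only an illustrative sketch of one common idea: keep the instances of each unit type that the synthesizer actually selects most often. The database layout, the usage log and the function name are all assumptions.

```python
import math
from collections import Counter


def prune_units(database, usage_log, keep_fraction=0.5):
    """Keep the most frequently selected instances of each unit type.

    database maps a unit type to a list of instance ids; usage_log lists the
    instance ids picked while synthesizing some text. At least one instance
    of every unit type is kept so no unit type disappears.
    """
    counts = Counter(usage_log)
    pruned = {}
    for unit_type, instances in database.items():
        keep = max(1, math.ceil(len(instances) * keep_fraction))
        # sorted() is stable, so equally used instances keep their order.
        ranked = sorted(instances, key=lambda inst: -counts[inst])
        pruned[unit_type] = ranked[:keep]
    return pruned
```

With `keep_fraction=0.5` this removes about half of the instances, which corresponds to the lower end of the range mentioned above.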

A system based on an Evolutionary Algorithm for Unit Selection has been implemented and is producing quality equivalent to that of earlier systems. I need to further tune the various parameters and conduct several experiments and tests.
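
For readers unfamiliar with the idea, here is a toy sketch of evolutionary unit selection. It is not the algorithm implemented here: each candidate unit is reduced to a single pitch value, the target and join costs are simple absolute differences, and the parameter names (`pop_size`, `generations`, `mutation_rate`) are assumptions of the kind one would tune.

```python
import random


def total_cost(choice, candidates, targets):
    """Target cost plus join cost when choice[i] picks a unit for position i."""
    units = [candidates[i][c] for i, c in enumerate(choice)]
    target = sum(abs(u - t) for u, t in zip(units, targets))
    join = sum(abs(a - b) for a, b in zip(units, units[1:]))
    return target + join


def evolve(candidates, targets, pop_size=30, generations=40,
           mutation_rate=0.1, seed=0):
    """Search for a low-cost unit sequence with a simple genetic algorithm."""
    rng = random.Random(seed)
    n = len(candidates)

    def fitness(choice):
        return total_cost(choice, candidates, targets)

    population = [[rng.randrange(len(candidates[i])) for i in range(n)]
                  for _ in range(pop_size)]
    history = []  # best cost seen in each generation
    for _ in range(generations):
        population.sort(key=fitness)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]
        children = [population[0][:]]  # elitism: carry the best one over
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            for i in range(n):
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(len(candidates[i]))
            children.append(child)
        population = children
    best = min(population, key=fitness)
    history.append(fitness(best))
    return best, history
```

Because the best sequence is always carried over, the values in `history` never increase, which makes it easy to compare parameter settings run by run.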

More recently, an API for the text to speech system has been implemented for robust application development in Windows environments. For some details, see API for Windows.
 


Currently, we at Speech Lab, LTRC are concentrating our efforts on developing Text to Speech systems for Indian languages and related research. As we are working with data-driven approaches, the related work involves text corpus selection, speech corpus generation and annotation, and often some perceptual testing. Day to day, that means analysing current outputs, figuring out where things are going wrong, and coming up with ideas.

I am also attending the Speech Technology Course and coordinating it at the IIIT end, since it is being taught by Mr. S. P. Kishore from CMU, so quite a bit of my time is taken up by the course and its assignments.

I am also interacting with students of the Speech Course and with another student working in Speech Lab on a final year project, and I am working alongside a Ph.D. candidate, so I figure I have a lot of interaction to do besides research and development.

We are developing systems as well as doing basic research. System development and integration work involves making the current synthesizer a more usable and distributable product; basic research involves exploring better ways of synthesis and ways of improving the current synthesis in both the text and the signal domains.

Right now I am exploring speech coding techniques to compress the corpus to a small size for easy distribution of the system. The L. M. D. S. system is the result of a massive scale-down of the full synthesizer to the bare minimum. See the projects page for a brief description of this project. I am also exploring techniques for database pruning. At present I am also developing an Evolutionary Approach for Unit Selection based Synthesis that I came up with recently. I want to spend a significant part of my research time analysing the effect of changing the various parameters of the Evolutionary Algorithm.

There are also a lot of ideas jumping around in my head which I want to work on some time, possibly later in life or as projects with some IIIT students.

Also See:    < My Projects >     < My Ideas >

I maintain a technical blog of things I need to discuss with my guide, my boss, my colleague and everyone else (there is only one other person). The blog is at http://speech.iiit.net/~speech/rohit/

 



This Page Is: http://speech.iiit.net/~rohit/work.htm
Last Update On: Saturday, 12 June 2004