Rohit's Blog and Notes and Everything that hits me

 
 


 
 

3. May. 2004:

Ah, long time. No notes. Not much today either. I recently added a Final Year / Summer Project list with some of the ideas I want to work on, or get students to work on. You can check out the list HERE. Am I missing anything that I wanted to write? Maybe not. Or maybe too much of it. Okay. Will put it up when I remember.

14 November 2003:

OK, since we are doing Audio Visual Text to Speech Systems here, I was just wondering what would be a "good" application of such a system. News reader ... ah, not again. So what else? I was remembering the presentation by Prof. Zue at the MIT EECS Centenary Celebration about Anthropomorphic Interfaces, and the point raised by the elderly MIT grad (I am sure he was from the DoD) about how the Audio Visual Speech Synthesis System (Mary101) can be misused. Certainly, misuse of a technology is found even before a "good" use of it can be found. Of course, a parallel and appealing use would be generating CM Chandrababu Naidu's audio visual speeches in 4 languages just from the script of his talk. And the people of AP would believe the technology to be God. But is that technology exploiting the ignorance of people? Ethical issue. These bug me.... Let me not talk about them here.

OK, but the same technology (with a couple more things) can be used to provide a lot of things that humans have always wanted. Say, how about the ability to be in more than one place at the same time? You got it. And how about the ability to live forever? You almost got it. For females: the ability to always be the same as you were when you first got married. A dream, probably. That is what the whole portrait business started with, right, and it then went on to become the photograph and video businesses. So we take it forward. "This is a sellable application, dear. Whoa!!"

So what do we do? We'll make an audio visual speech synthesis system for any person who wants to be "immortal" in this particular sense of the word, by collecting a corpus of his speech. Then we'll build up AIML knowledge bases for things specific to the person. Then we put it all together and put it in a place on the internet, or maybe on a CD, or on a fancy storage device which connects through USB and has a photo of the person nicely embossed on it. And then anyone, anywhere can plug in that device and talk to it. Of course we are gonna charge the person a hell of a lot of money to do this.

The person (if all that popular) can live forever for his fans (who would not be able to see him live anyway). So the first thought I get here is BAD. And I know I should not be writing it here, but I will. Consider such a life for Saddam Hussein: he could be leading the whole of Iraq simply by this, even after he dies. And then a good application is probably the Allied Forces creating something like this (of course collecting the corpus would then be a hell of a job, but say they capture Saddam somehow) and sending messages from the reborn Saddam to his people to accept the new reign and live with it peacefully. (I hope this doesn't do me any harm with the apping.)

There are plenty of things we can do. What you really do is what matters.

8 November 2003:

Not able to find time to update this daily, so I am missing a lot of things. Anyway, here are a couple of recent ones.

While in college recently, we wanted to catch up with someone and could not, because we did not have his mobile number. In this situation of need, we found an application that would be useful to so many people: a service, provided either by the mobile service provider or by independents like Yahoo etc., to which someone can send an SMS with the message NUM rohitofpec. The service should then return the mobile number of the user subscribed with the ID rohitofpec. As such it is no big deal technology-wise for people who are into it. But it would serve a great purpose for the users, especially users of services like Reliance, whose number changes as one moves from place to place. Every time someone's number changes, the mobile owner sends an update message to the service like UPDATE rohitofpec (some password) 9805012345. Then the people who want to connect to this person can easily get the new number with one SMS and can get back to the person. I'll be glad if someone implements this service, and more so if someone lets me know whether something like it has already been implemented.
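To make the two commands concrete, here is a minimal in-memory sketch in Python. Only the NUM and UPDATE message shapes come from the note above; the class name, the reply strings, the rule that the first UPDATE registers an ID, and the unsalted hash are all assumptions made for illustration. A real service would sit behind an SMS gateway and a persistent store.

```python
import hashlib
import hmac


class NumberDirectory:
    """Hypothetical ID -> mobile number registry driven by SMS-style commands."""

    def __init__(self):
        # user_id -> (password_digest, current_number); kept in memory for the sketch
        self._entries = {}

    @staticmethod
    def _digest(password):
        # Unsalted SHA-256 keeps the sketch short; a real service needs a proper password hash.
        return hashlib.sha256(password.encode("utf-8")).hexdigest()

    def handle_sms(self, text):
        """Parse one incoming message and return the reply text."""
        parts = text.split()
        if len(parts) == 2 and parts[0].upper() == "NUM":
            entry = self._entries.get(parts[1])
            return entry[1] if entry else "UNKNOWN ID"
        if len(parts) == 4 and parts[0].upper() == "UPDATE":
            _, user_id, password, number = parts
            if not number.isdigit():
                return "BAD NUMBER"
            digest = self._digest(password)
            entry = self._entries.get(user_id)
            # Assumption: the first UPDATE for an ID registers it;
            # later ones must present the same password.
            if entry is not None and not hmac.compare_digest(entry[0], digest):
                return "WRONG PASSWORD"
            self._entries[user_id] = (digest, number)
            return "OK"
        return "BAD COMMAND"


directory = NumberDirectory()
directory.handle_sms("UPDATE rohitofpec secret 9805012345")
print(directory.handle_sms("NUM rohitofpec"))
```

The lookup itself is one dictionary access, which is the point made above: the technology is no big deal, the value is in someone actually running the service.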

Broadly, what I am seeing is that there should be a mechanism that does for mobile numbers what domain names did for IP addresses way back in the 80s. Now the how and the who is a set of permutations and combinations with associated costs and benefits, and I'll write a thesis on that, but not now. For analogy's sake, it could be like hosts files, which is the job address books are currently doing, or like some DNS service etc. And people say they are interested in Networking. Crap.

Also, I was explaining today to one of the new people in my lab that what we are doing for speech synthesis is basically search, and in fact all AI is basically search. Since we have this whole infinite memory and infinite bandwidth thing going around, which is at the core of the emergence of data-driven approaches and a lot of other things, I was thinking that the simplest and fastest search would be a search done offline: when we create a search space, we also create the solutions to all possible queries that can be made in that search space. I am thinking purely from a unit selection point of view, where we select a unit given the desirable features from the previous unit. Let X and Y be unit types (members of the set of unique units), and let x be an instance of X and y an instance of Y (both members of the set of all units in the database). During feature extraction, I'll create an index, a 2-dimensional array, which for every instance x and every type Y stores the instance y of Y that is most suitable to follow x. Search would then simply be retrieving the entry for the current index from the array. This will be incredibly fast, but as I calculated for my Hindi database, it will take another 160 MB to store this additional information. Given the infinite memory and infinite bandwidth funda, such an implementation may not sound crazy to many some years from now.
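A toy Python sketch of that precomputed index, under stated assumptions: the function names, the dictionary-of-dictionaries layout (standing in for the 2-dimensional array) and the pitch-difference join cost are all hypothetical, chosen only to keep the example small. It implements the greedy "best next unit given the previous one" lookup described above, not a full lattice search.

```python
def build_join_table(units, join_cost):
    """Offline pass.

    units: dict mapping a unit type (e.g. a phone) to its list of instance ids.
    join_cost(x, y): cost of concatenating instance y right after instance x.
    Returns table[x][Y] = the instance of type Y that is cheapest to join after x.
    """
    table = {}
    for instances in units.values():
        for x in instances:
            table[x] = {
                Y: min(candidates, key=lambda y: join_cost(x, y))
                for Y, candidates in units.items()
            }
    return table


def synthesize(table, first_unit, target_types):
    """Online pass: each step is a single table lookup, no cost computation."""
    sequence = [first_unit]
    for Y in target_types:
        sequence.append(table[sequence[-1]][Y])
    return sequence


# Made-up data: two unit types with two instances each, and a pitch value per instance.
units = {"a": ["a1", "a2"], "k": ["k1", "k2"]}
pitch = {"a1": 100, "a2": 120, "k1": 118, "k2": 101}


def pitch_gap(x, y):
    return abs(pitch[x] - pitch[y])


table = build_join_table(units, pitch_gap)
print(synthesize(table, "a2", ["k", "a"]))
```

The table holds one entry per (instance, type) pair, so its size grows as the number of unit instances times the number of unit types. That product is where the extra storage goes.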

The other day, I was having a discussion with one of our geek party about Google Hindi. He said it was great and I felt it wasn't much. Primarily it was just doing what Google normally does (which is of course impressive), only with the search space tuned to more Indian websites. But then the issue is the websites that use strange encodings for Hindi through a variety of fonts. At that time, I came up with an application which may be desirable to many. In any case we have to assume one standard. I like Unicode, for three reasons: it is an international effort that encompasses (or at least hopes to) everything we are ever going to face; MS Internet Explorer and even Netscape now render Hindi UTF-8 pretty well (though still not 100%); and we are now putting Unicode at the front end of our TTS. So we'll make a set of perfect converters from all the common fonts to UTF-8, and then we'll make a website that takes a URL as input, replaces all the font-encoded Hindi content on that page with its UTF-8 equivalent, and displays the page. It would be something like pornolize.com. But it will enable anyone to see any Hindi website without needing any particular fonts etc. to be installed. This is a basic service that many, many people would want to use. If not the end user, then particularly people like Google and ourselves, who may want all Hindi data in a common format in one place so that we can process it and provide services (like search) over it uniformly. I think I'll be in a position to do this soon. But let's see. Maybe some student can take it up as a project.
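A rough Python sketch of the conversion step, with loud assumptions: "DemoHindi" is an imaginary font and its five-entry glyph table is made up; real fonts each need their own, much larger table and more reordering rules. The one rule shown reflects how Unicode stores the short-i vowel sign after its consonant, whereas visual-order fonts place it before. Fetching the page from a URL is left out, and the regex only handles flat `<font face="...">` spans.

```python
import re

# Made-up glyph table for the imaginary legacy font "DemoHindi".
DEMO_HINDI = {
    "k": "\u0915",  # DEVANAGARI LETTER KA
    "m": "\u092e",  # DEVANAGARI LETTER MA
    "l": "\u0932",  # DEVANAGARI LETTER LA
    "a": "\u093e",  # DEVANAGARI VOWEL SIGN AA
    "f": "\u093f",  # DEVANAGARI VOWEL SIGN I, placed before its consonant in this font
}
FONT_TABLES = {"demohindi": DEMO_HINDI}

I_MATRA = "\u093f"
FONT_TAG = re.compile(
    r'<font\s+face="([^"]+)"\s*>(.*?)</font>', re.IGNORECASE | re.DOTALL
)


def legacy_to_unicode(text, table):
    """Map each font glyph code to Unicode, then fix the visual-order vowel sign."""
    mapped = "".join(table.get(ch, ch) for ch in text)
    # Move the short-i sign from before the consonant (KA..HA) to after it.
    return re.sub(I_MATRA + "([\u0915-\u0939])", r"\1" + I_MATRA, mapped)


def convert_page(html):
    """Replace every known font-encoded span with plain Unicode text."""
    def replace(match):
        table = FONT_TABLES.get(match.group(1).lower())
        if table is None:
            return match.group(0)  # unknown font: leave the span untouched
        return legacy_to_unicode(match.group(2), table)

    return FONT_TAG.sub(replace, html)


print(convert_page('<p><font face="DemoHindi">kml</font></p>'))
```

Keeping one table per font means supporting a new font is a data problem, not a code problem, which is what would make a student project of this manageable.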

10 October 2003:

Yesterday I had a paper presentation at Hyderabad University. Got to see Prof. Yegnanarayana LIVE. He was there to give the keynote address on "Trends in Speech Technology". I saw the person that Kishore sometimes described during our interactions. I must blog that lecture as a major milestone in my journey through the Speech & Language Technology field. He presented several radical views about speech processing, telling us what is wrong, what is missing and what is still to be done. He showed several new directions, and I could relate them to several things that are ongoing and see how these ideas could actually change the face of the current work. Of course it needs some seminal work in that direction, some addition to the science and not to the technology. And that is what we have all been waiting for, for a long time. I am sure that lecture will have considerable influence on my future ventures, especially during my Ph.D. Will write some notes sometime on the things jumping around in my head right now.


Will start putting content here soon. But this is not for news. This space is only for stuff that strikes me as random excitation in the time domain.

Need to put up a form-based method to add to it. I cannot keep on editing HTML pages and uploading them every time. Will do this soon.
 



This Page Is: http://speech.iiit.net/~rohit/blog.htm
Last Update On: Monday, 3. May. 2004