How text-to-speech works and how VocaTalk enhances it

I was planning to write an article about how text-to-speech works so that people could understand the core technology behind VocaTalk.  While searching, I found this interesting article:  http://riteshhowto.wordpress.com/2009/02/18/how-text-to-speech-works.  I think it is a good read for non-developers too.  Thanks, Ritesh!

VocaTalk does not come with its own text-to-speech engine; instead, it post-processes the speech output.  VocaTalk supports SAPI 5.1 and 5.3 compatible voice engines such as AT&T, Acapela and IVONA, all of which are built on principles similar to those described in Ritesh’s post.

After speech generation is complete, VocaTalk automatically starts enhancing the generated speech.  Since VocaTalk can use several different voices, and even different voice engines, in the same text, the first thing it does is level the speech volume.  This is necessary because different voices may be recorded at different volumes.  Then it turns the speech output into a stereo, 44.1 kHz signal.  Speech engines generally use a 16 kHz or 22 kHz sampling rate, so VocaTalk has to resample the output to get a 44.1 kHz signal.  This CD-quality signal is then used for all the other processing.  Some of the interesting effects VocaTalk adds to the speech are echoes, voice position and voice pitch modulation.  Finally, the selected music is mixed in with the speech.  The music volume is also leveled automatically for comfortable listening.
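To make the processing chain described above more concrete, here is a minimal sketch in Python.  It is not VocaTalk's actual code: the function names, the RMS-based leveling, the linear-interpolation resampler, the single-tap echo and the linear pan are all assumptions chosen for simplicity, and signals are plain lists of floats between -1 and 1.

```python
import math

def rms(samples):
    """Root-mean-square level of a mono signal."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def level(samples, target_rms=0.1):
    """Scale a signal so its RMS matches target_rms (assumed leveling rule)."""
    current = rms(samples)
    if current == 0.0:
        return list(samples)
    gain = target_rms / current
    # Clamp so the gain never pushes a sample outside [-1, 1].
    return [max(-1.0, min(1.0, s * gain)) for s in samples]

def resample_linear(samples, src_rate, dst_rate):
    """Resample by linear interpolation; real resamplers use better filters."""
    if not samples:
        return []
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # position in the source signal
        i0 = int(pos)
        i1 = min(i0 + 1, len(samples) - 1)
        frac = pos - i0
        out.append(samples[i0] * (1.0 - frac) + samples[i1] * frac)
    return out

def add_echo(samples, rate, delay_s=0.25, decay=0.4):
    """Add one delayed, attenuated copy of the signal to itself."""
    delay = int(delay_s * rate)
    out = list(samples) + [0.0] * delay
    for i, s in enumerate(samples):
        out[i + delay] += s * decay
    return out

def to_stereo(samples, pan=0.0):
    """Mono to stereo with a linear pan: -1 is left, 0 center, +1 right."""
    left = (1.0 - pan) / 2.0
    right = (1.0 + pan) / 2.0
    return [(s * left, s * right) for s in samples]

def mix(speech, music, music_gain=0.3):
    """Mix stereo music under stereo speech; output is as long as the speech."""
    out = []
    for i, (l, r) in enumerate(speech):
        ml, mr = music[i] if i < len(music) else (0.0, 0.0)
        out.append((l + ml * music_gain, r + mr * music_gain))
    return out

# Example: a short 440 Hz tone standing in for 16 kHz engine output.
voice = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(160)]
voice = level(voice)                            # 1. level the volume
voice = resample_linear(voice, 16000, 44100)    # 2. resample to 44.1 kHz
voice = add_echo(voice, 44100, delay_s=0.001)   # 3. add an effect
stereo = to_stereo(voice, pan=-0.5)             # 4. position the voice
music = to_stereo([0.1] * len(voice))           # stand-in for a music track
final = mix(stereo, music)                      # 5. mix in the music
```

The order follows the text: leveling comes first so that every voice enters the later stages at a comparable volume, and resampling happens before the effects so that they all run on the same 44.1 kHz signal.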

When enhancement is done, VocaTalk encodes the media and embeds album art and the original text into the final MP3 file.  This file can be played or copied to your MP3 player.  However, managing many such MP3 files is cumbersome.  To make the generated files easier to manage, VocaTalk also publishes them as local podcasts and can automatically launch iTunes and Zune to subscribe to and open them.  You can then download the files onto your iPod or Zune device.  Those applications automatically synchronize the podcast episodes, make sure you always have unlistened files on your player, and remember where you left off.
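As a rough illustration of what publishing files as a podcast involves, the sketch below builds a small RSS 2.0 feed in which each episode is an item with an enclosure pointing at an MP3 file.  How VocaTalk actually generates its feeds is not described here, so the function name build_feed, the episode dictionary keys and the example URLs are all hypothetical.

```python
import xml.etree.ElementTree as ET

def build_feed(title, link, episodes):
    """Return an RSS 2.0 document (as a string) listing MP3 episodes.

    Each episode is a dict with hypothetical keys: title, url, size_bytes.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Generated speech episodes"
    for ep in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ep["title"]
        # The guid lets a podcast client tell episodes apart.
        ET.SubElement(item, "guid").text = ep["url"]
        # The enclosure is what the client downloads and syncs.
        ET.SubElement(
            item,
            "enclosure",
            url=ep["url"],
            length=str(ep["size_bytes"]),
            type="audio/mpeg",
        )
    return ET.tostring(rss, encoding="unicode")

# Example with made-up local URLs.
feed_xml = build_feed(
    "My reading list",
    "http://localhost/feeds/reading-list",
    [
        {"title": "Chapter 1", "url": "http://localhost/audio/ch1.mp3", "size_bytes": 1200000},
        {"title": "Chapter 2", "url": "http://localhost/audio/ch2.mp3", "size_bytes": 1500000},
    ],
)
```

Once a podcast client is subscribed to such a feed, it can handle downloading, syncing and tracking of played episodes on its own.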

These features matter for listening to text, which usually takes much longer than a song.  VocaTalk makes it easier and more fun to listen to hour-long episodes and to manage hundreds of them.  Unlike MP3 music files, episodes are meant to be listened to once and then thrown away.  Well, you don’t really have to throw them away, but you get the point: it takes tools that acknowledge this fact to improve the experience.

Other features are currently under development, all with one goal: to make hours and hours of listening and learning fun and enjoyable.

Categories: VocaTalk