VOIP – Jason Goecke – Teaching Your Application to REALLY Talk
Jason Goecke is the Vice President of Innovation at Voxeo Labs and this article was originally posted on the Tropo blog. He’s been a friend and technology mentor to Agora Media’s Richard Kastelein since 2003 when they built the original Expatforums community as North American tech refugees in Europe.
Speech synthesis, otherwise known as Text to Speech (TTS), is a technology that synthesizes a human voice from text input. Speech synthesis is the default behavior for voice calls on the Tropo platform. The Tropo ‘say’ verb provides the TTS capability by taking a string of text and speaking it back. The verb can also take a URL to a ‘wav’ or ‘mp3’ file, in which case the pre-recorded audio is played instead.
When it comes to teaching your application to speak, we follow the Perl ethos of making “the simple things easy and difficult things possible”. Your application can speak well using nothing more than our simple APIs, or it can be as sophisticated and emotional as you like, because Tropo exposes powerful capabilities for giving your voices character.
For our first example we will simply say:
say 'I like squirrels!'
Which then renders this audio.
Next, we may choose a voice for any of the languages supported by Tropo (US/UK English, Castilian/Mexican Spanish, French, German, Italian and Dutch). Let’s give French a try for our next example:
say "J'aime les écureuils!", :voice => 'florence'
Which then renders this audio.
Those were the simple examples that anyone can use to add a little speech to an application. But remember, we also make the difficult possible for those who really want to make their characters speak. Sometimes simply choosing a voice is not enough, and you also want control over pitch, volume and intonation. For those cases Tropo natively supports a standard called the Speech Synthesis Markup Language (SSML).
SSML is a W3C standard for controlling the pace, tone, pitch and overall sound of computer-generated voices. Here’s a Ruby script that repeats the same sentence four times, each time at a lower speed:
answer
say "<speak> I like squirrels!.
I <prosody rate='-10%'>like squirrels!</prosody>
I <prosody rate='-30%'>like squirrels!</prosody>
I <prosody rate='-50%'>like squirrels!</prosody>
</speak>"
hangup
Which renders this audio. The previous example made use of the rate attribute of the SSML prosody element to control the playback speed. There are many other elements and attributes you may use, including emphasis and phoneme. To learn more about SSML and related technologies, check out the W3C site at http://www.w3.org/TR/speech-synthesis/.
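When the text or the rates come from your own data, writing the markup by hand gets tedious. The following is a minimal sketch of how the SSML string could be built in plain Ruby. The helpers escape_xml, prosody and slowdown_ssml are hypothetical names made up for this illustration and are not part of the Tropo API; they only produce a string. Unlike the script above, the sketch wraps the whole sentence in each prosody element and leaves out the unmodified first repetition.

```ruby
# Hypothetical helpers (not part of Tropo): they only build an SSML string.

XML_ESCAPES = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', "'" => '&apos;'
}.freeze

# Escape characters that would otherwise break the SSML markup.
def escape_xml(text)
  text.to_s.gsub(/[&<>']/, XML_ESCAPES)
end

# Wrap text in a prosody element; each hash entry becomes an attribute,
# for example rate, pitch or volume.
def prosody(text, attrs = {})
  attr_string = attrs.map { |name, value| " #{name}='#{escape_xml(value)}'" }.join
  "<prosody#{attr_string}>#{escape_xml(text)}</prosody>"
end

# Repeat one sentence once per rate inside a single speak document.
def slowdown_ssml(sentence, rates)
  parts = rates.map { |rate| prosody(sentence, rate: rate) }
  "<speak>#{parts.join(' ')}</speak>"
end

ssml = slowdown_ssml('I like squirrels!', ['-10%', '-30%', '-50%'])
puts ssml
```

Inside a Tropo script the resulting string would then be handed to the verb with say ssml. Escaping matters as soon as the spoken text is user-supplied, since a stray & or < would make the markup invalid. Whether a given voice honors attributes other than rate is something to verify against the platform.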
If you would like to call in and listen to these examples live, you may do so by dialing +990009369991429940 on Skype (free) or calling +1.408.940.5920 from any phone. What are you waiting for? Get started by signing up for an always free developer account @ Tropo.com.
Where does Mobile stand in your strategy?
Mobile is in the spotlight at the Mobile World Congress in Barcelona, with news such as the Verizon Wireless-Skype agreement, Google’s new Mobile mantra and the Windows phone.
Interesting announcements are being made, ubiquity is becoming clear, the Mobile Web is being adopted fast, Mobile Search is rising, and many more convergent developments are under way.
Where does Mobile stand in your strategy?
If it’s important for you, what are you doing right now to incorporate it in your strategy?
If not, what are the reasons for that decision?
Discuss as well via Google Wave
T-Commerce – where are the conversations?
T-Commerce is a new, rising market with great potential. Much is written from the developer and business point of view, products are being developed, and so on.
As far as I can see, though, there is a discrepancy between demand and supply. I’ve scanned the social sphere, but from the users’ point of view I’m not seeing many conversations, much hype or much interest.
Am I wrong? If not, what is causing it?
Discuss as well via Google Wave
Top 3 trends to look out for in 2010
What is your top 3 in Marketing, Internet, eCommerce and so on: the trends we need to look out for next year, or where you think the emphasis will be placed?
My personal top 3 is:
* Mobile marketing/applications (Social & Augmented Reality)
* Social TV, with a focus on concepts to further develop convergence of the TV, Mobile and Computer screen (“whatever screen works”)
* Further diffusion and software development for social media purposes
Do discuss in Google Wave as well, below.
Internet “Strategy”
An interesting remark and point of view came up in one of the other Waves, where it was argued that the Internet cannot be a strategy in itself because it should support the marketing strategy.
“In my opinion there is the marketing strategy, as defined by the classics like brand positioning, target segments, benefits of offer and experience. Internet consists of a series of touch points that should support that.
Some could be critical touch points that support the unique benefit, or just be touch points amongst the path that just need not to oppose the benefit. Customer needs comes first, then the positioning/ benefit, then the touch point analysis, that decides on the internet approach.”
I find this an interesting point of view. What do you think?
Can the Internet be strategized, or do you agree with the view above?
Do discuss in Google Wave as well, below.
Print Media and its place within Marketing
Following the remark of the editor-in-chief of Esquire with regard to their usage of Augmented Reality in their upcoming cover:
“I got so sick of people talking about old media versus new media. I wanted to prove that print is still kind of cool. I think of it as kind of our job to show people the strength of our medium.”
Do you think Print Media still has a place within the marketing strategy? Are efforts like this desperate attempts, or can print revive in a more innovative, cross-channel manner?
Mobile & Social Networking Convergence
I created the category Collective Intelligence to have open discussions on certain topics, mainly through Google Wave, and to be able to experiment.
The fact that the content in a Wave is centralized and portable, but also bounded (by being displayed) on a specific site, creates new angles.
Discuss through Wave or post your comment below.