
Automatic Subtitling With Speech Recognition

Thanks to advances in speech recognition technology, there is now an easier and faster way to convert speech into text, for example to generate transcripts of audio content. Machines can perform this tedious, time-consuming job quickly and accurately.
So who needs speech to text functionality?
The audio sections of libraries generate transcripts of faculty lectures and guest talks for students’ reference. Organizations like TED generate English transcripts of TED Talks, which make it easier to translate the talks into other languages and reach a wider audience. Broadcasters generate closed captions, also called subtitles, for television programming, which let deaf and hard-of-hearing viewers follow the news and other shows.

The beauty of transcripts is that they increase the potential audience of single-language video content. Generating subtitles in several languages and letting each viewer choose which one to display makes the content accessible to people who do not speak the language of the original audio. Research also suggests that subtitles can increase the number of viewers who watch a video to completion.
Transcripts also make content consumable with the sound muted. For instance, Oracle uses speech to text technology to generate closed captioning for the proceedings of its widely followed technology conferences: Oracle Partner KickOff, JavaOne, and Oracle OpenWorld. Oracle acknowledges that subtitling gives “a global audience” access to the conference proceedings, as well as “viewers tuning in at their desks requiring a muted volume.”
Another advantage of transcripts is that they add value to content by making it searchable for specific words and phrases. Search engines such as Google and Bing, and even a company’s internal search engine, can index transcripts in full, making previously unsearchable audio content discoverable. For instance, a civil engineering student might want to search faculty lectures for the keywords “green building” or “glass house,” which is possible only if the lectures have been transcribed.
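A transcript becomes searchable once its text is stored alongside the recording it came from. As a minimal sketch, assuming transcripts are kept as timestamped segments (the `Segment` structure and `find_phrase` function are hypothetical, not part of any GoVivace product), a keyword search can return both the lecture and the point in it where a phrase is spoken:

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timed line of a transcript (hypothetical structure)."""
    lecture_id: str
    start_seconds: float
    text: str


def find_phrase(segments: list[Segment], phrase: str) -> list[tuple[str, float]]:
    """Return (lecture_id, start time) for every segment containing the phrase.

    Matching is a simple case-insensitive substring test; a real search
    engine would tokenize, build an index, and rank results instead.
    """
    needle = phrase.lower()
    return [(s.lecture_id, s.start_seconds) for s in segments if needle in s.text.lower()]


transcripts = [
    Segment("CE-101-lecture-3", 12.0, "Today we introduce green building standards."),
    Segment("CE-101-lecture-3", 95.5, "A glass house loses heat quickly in winter."),
    Segment("CE-101-lecture-4", 40.0, "Revisiting Green Building certification."),
]

print(find_phrase(transcripts, "green building"))
# [('CE-101-lecture-3', 12.0), ('CE-101-lecture-4', 40.0)]
```

The linear scan is only for illustration; at library or broadcaster scale, the same segments would be fed to a search engine that maintains an index.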
Building speech to text functionality into apps could also let individual consumers generate transcripts of live audio streams for their own use, for example helping attendees of multilingual conferences or participants in multilingual conference calls follow what is being said.
GoVivace’s Speech to Text offers large-vocabulary speech recognition technology that is especially well suited to generating accurate subtitles and audio transcripts, speeding up what was once a labor-intensive job. Accurately converting speech into text requires a speech engine with a large and, preferably, evolving vocabulary, so that it can recognize practically any word it is fed. The GoVivace speech recognition engine fits the bill.

So spare yourself the listen-type-rewind cycle and raise the bar on your subtitles: take advantage of our real-time multilingual transcription service to enhance your business processes and customer engagement.

Preventing Financial Fraud Through Voice Biometrics

In the internet age, the loss of personal credentials through security breaches (such as the Target data breach in 2013) has become a common occurrence. This places individuals and financial institutions at elevated risk of fraud. Some of this fraud, however, can be stopped in its tracks by adding a voice biometrics system as an extra layer of security. Consider the case where a user calls the wire transfer line to request a wire. The user presents some “secret” information and is then able to complete the transaction. Under this setup, though, the secret information could easily be stolen and used against the account owner.

If the bank deploys a voice biometric system, its database stores not only the user’s credentials but also a small biometric key, less than a kilobyte in size, that contains the user’s voiceprint. Every time the user calls to request a transaction, the caller’s voice is analyzed and compared against the stored voiceprint to determine whether it is the same person. If an impostor makes the call, the system warns the bank agent to raise the security level and apply stricter authentication methods.
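To make the matching step concrete, here is a minimal sketch under assumptions the post does not state: that a voiceprint is a fixed-length vector of numbers (a speaker embedding), that the caller’s audio has already been converted into a vector of the same length, and that a match is decided by comparing their cosine similarity against a threshold. The names `verify_caller` and `MATCH_THRESHOLD` and the threshold value are hypothetical; this is a generic illustration, not GoVivace’s patented method.

```python
import math

# Hypothetical decision threshold; a real system would tune it on labeled
# calls to balance false accepts against false rejects.
MATCH_THRESHOLD = 0.7


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("voiceprints must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("voiceprints must be non-zero vectors")
    return dot / (norm_a * norm_b)


def verify_caller(enrolled: list[float], call: list[float]) -> str:
    """Return 'match' for a likely genuine caller, 'alert' to prompt the agent."""
    score = cosine_similarity(enrolled, call)
    return "match" if score >= MATCH_THRESHOLD else "alert"


enrolled = [0.9, 0.1, 0.3]
print(verify_caller(enrolled, [0.85, 0.15, 0.28]))  # match
print(verify_caller(enrolled, [-0.2, 0.9, 0.1]))    # alert
```

Scoring is a single pass over the vector, which is one reason comparing a call against a stored voiceprint can be so quick.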

Several features of modern voice biometric systems make this approach not a dream of the future but a practical deployment that is already under way.

First, voiceprints are small: each takes less than a kilobyte to store. A voiceprint can be added to the same database that stores the user’s passwords and other identifying information without significantly increasing the database’s size.
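As a rough illustration of the size claim, assume (hypothetically) that a voiceprint is a 200-dimensional vector of 32-bit floats. That works out to 200 × 4 = 800 bytes, which fits in an ordinary BLOB column next to the user’s other records. The dimension, the table layout, and the helper names below are made up for the example.

```python
import sqlite3
import struct

DIMENSIONS = 200  # hypothetical voiceprint length


def pack_voiceprint(vector: list[float]) -> bytes:
    """Serialize a voiceprint as little-endian 32-bit floats for a BLOB column."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_voiceprint(blob: bytes) -> list[float]:
    """Inverse of pack_voiceprint: 4 bytes per float."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


blob = pack_voiceprint([0.0] * DIMENSIONS)
print(len(blob))  # 800

# Store the voiceprint in the same (hypothetical) table as the user record.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id TEXT PRIMARY KEY, voiceprint BLOB)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", blob))
stored = conn.execute(
    "SELECT voiceprint FROM users WHERE user_id = ?", ("alice",)
).fetchone()[0]
```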

Second, computing the match is extremely fast. Comparing ten seconds of conversation from the start of a call against the stored voiceprint may take only a fraction of a second, and the result indicates whether the caller is a likely match.

Third, there was a time when a genuine speaker could easily be rejected for calling from a different handset or over a problematic telephone channel. However, recent advances in voice biometrics have significantly reduced the impact of such factors while preserving the differences between speakers (inter-speaker variability) that the system relies on to tell them apart.

Financial institutions are already required to record conversations with their clients and store them for a certain period of time. If an institution decides to enhance its security with a voice biometrics system, these stored recordings could be used to enroll voiceprints for all of its existing users. The user experience would not change, while call center agents would gain an additional authentication metric, giving the institution an extra layer of security without compromising convenience.
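One common way to turn several archived calls into a single enrolled voiceprint is to average per-call speaker embeddings and rescale the result to unit length. The sketch below assumes each recording has already been converted into an embedding by some extractor (not shown and hypothetical), and `enroll_from_recordings` is an illustrative name; it shows a generic technique, not GoVivace’s enrollment process.

```python
import math


def enroll_from_recordings(embeddings: list[list[float]]) -> list[float]:
    """Average per-recording voice embeddings into one unit-length voiceprint.

    Each embedding is assumed to come from a hypothetical extractor run on one
    archived call. Averaging several calls can help smooth out differences in
    handset and channel between individual recordings.
    """
    if not embeddings:
        raise ValueError("need at least one recording to enroll")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    # Element-wise mean across recordings.
    mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    # Rescale to unit length so later cosine comparisons are well behaved.
    norm = math.sqrt(sum(x * x for x in mean))
    if norm == 0:
        raise ValueError("embeddings cancel out; cannot normalize")
    return [x / norm for x in mean]


archived_calls = [[3.0, 4.0], [3.0, 4.0]]
print(enroll_from_recordings(archived_calls))  # [0.6, 0.8]
```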

So take advantage of GoVivace’s patented speaker verification technology to enhance data security for your business and, more importantly, to improve customer engagement and satisfaction.