511 services offer information ranging from today’s weather to “how soon will the L7 bus be here?” Today, millions of travelers rely on 511 every month to retrieve traffic updates, transit information, and alternative solutions to their transportation needs. 511 is an integral part of a larger effort to create “Smart Cities” throughout the world, and IBI Group is currently instrumental in building “Smart Cities” in 11 countries. IBI is considered the industry leader in designing, organizing, and producing 511 systems throughout the United States, most notably in the Greater Los Angeles area (LA, Orange, and Ventura counties), as well as 511 systems for the states of Florida, New York, Massachusetts, and Alaska.
The beauty of IBI Group’s 511 interactive voice response system is that each system can be tailored to a particular region’s needs. Customization allows a client to offer a handful of options in a rural area, while a more advanced, multi-layer system is essential in metropolitan areas. Callers to this system are unlikely to be in quiet locations. They are more likely to be in noisy environments such as train stations, bus stops, or the side of the road. The speech recognition technology therefore needs first-rate algorithms to filter out the noise and state-of-the-art technology to convert utterances into recognizable text. For that technology, IBI Group has repeatedly chosen GoVivace.
Using proprietary neural networks and deep learning techniques, GoVivace has been able to interpret even the noisiest utterances, guiding the Voice User Interface (VUI) to the appropriate response and completing a caller’s request quickly and accurately. The GoVivace speech engine can work with a large vocabulary or with a constrained, topic-specific language model. The latter delivers higher accuracy when the application permits it. Larry Baldwin, Manager of Voice Services at IBI Group, states: “I chose, and will continue to choose, GoVivace for three main reasons: accuracy, service, and price. I’ve found that the GoVivace team has produced a very high-quality product and their team is very supportive.”
The path from choosing a speech technology vendor to deploying a 511 solution was a long journey. The data that forms the system’s catalogue and menus came from disparate sources, and all of it needed to be normalized into an easily readable format. As is typical of large cities, many street names are not ordinary English words, and the vocabulary can be very large. The resulting grammars were gigantic, and careful optimization was needed to make everything run smoothly. IBI and GoVivace worked together to resolve all of these issues.
GoVivace is the go-to speech provider offering a combination of grammar-based and large-vocabulary (“say anything”) speech recognition with highly customized language models. This combination delivers high accuracy in noisy environments for specialized verticals such as finance, travel, and medicine. GoVivace can also fit the smallest footprints, from silicon to IoT devices, while scaling on cloud services such as AWS or Azure.
In the internet age, the loss of personal credentials through security breaches (such as the recent Target breach) has become a common occurrence. This places individuals and financial institutions at elevated risk of fraud. Some of this activity, however, can be stopped in its tracks with voice biometrics. Consider a user who calls the wire transfer line to request a wire. The user presents some “secret” information and is then able to complete the transaction. Under this setup, the secret information could easily be stolen and used against the account owner. If the bank deploys a voice biometric system, however, its database stores not only the user’s credentials but also a small biometric key, less than a kilobyte in size, that contains the user’s voiceprint. Every time the user calls to make a transaction, the caller’s voice is analyzed and compared with the stored voiceprint to tell whether it is the same person. If an impostor makes the call, the system warns the agent to raise the security level and use stricter authentication methods.
Several features of modern voice biometric systems make this not a dream of the future but a real implementation that is already under way. First, voiceprints are small. Storing one takes less than a kilobyte, so it can be added to the same database that stores the user’s passwords and other identifying information without much bloat in database size.
Second, computing the match is extremely fast. Ten seconds of conversation at the start of the call can be compared with the stored voiceprint in a fraction of a second to tell whether it is a likely match.
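The article does not say how a voiceprint is represented or scored. A common approach is to store a fixed-length speaker embedding and compare it with the caller's embedding using cosine similarity. The Python sketch below assumes a hypothetical 200-dimensional float32 voiceprint. At 4 bytes per value that is 800 bytes, which fits the under-a-kilobyte figure. It also assumes a hypothetical decision threshold of 0.7. The embedding extractor that turns audio into a vector is not shown, and synthetic vectors stand in for its output.

```python
import math
import struct

# Assumptions (not from the article): a 200-dimensional float32 voiceprint
# and a cosine-similarity threshold of 0.7. Real systems tune both on their own data.
DIM = 200
THRESHOLD = 0.7


def pack_voiceprint(vec):
    """Serialize a voiceprint as little-endian float32 bytes for a database column."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack_voiceprint(blob):
    """Inverse of pack_voiceprint: 4 bytes per float32 value."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_likely_match(stored_blob, call_vec, threshold=THRESHOLD):
    """Score the caller's embedding against the stored voiceprint; return (score, decision)."""
    score = cosine(unpack_voiceprint(stored_blob), call_vec)
    return score, score >= threshold


if __name__ == "__main__":
    # Synthetic vectors stand in for the output of a speaker-embedding extractor.
    enrolled = [math.sin(i) for i in range(DIM)]
    blob = pack_voiceprint(enrolled)
    print(len(blob), "bytes")  # 200 values x 4 bytes = 800 bytes
    genuine = [x + 0.05 * math.cos(i) for i, x in enumerate(enrolled)]
    impostor = [math.cos(3 * i) for i in range(DIM)]
    print(is_likely_match(blob, genuine))
    print(is_likely_match(blob, impostor))
```

Scoring is a single pass over a few hundred numbers, so the comparison itself is negligible. In practice, most of the time goes into extracting the embedding from the audio.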
Third, there was a time when a genuine speaker could easily be rejected for using a different handset or because the telephone channel was having issues. Recent advances in voice biometrics have significantly reduced the impact of such factors. They have done so while preserving inter-speaker variability, the differences between voices that allow the system to tell speakers apart.
Financial institutions are already required to record conversations with their clients and store them for a certain period of time. If an institution decides to enhance its security through voice biometrics, these stored recordings could be used to enrol voiceprints for all of its existing users. The user experience does not change, while call centre agents gain an additional authentication metric. This gives the institution an extra layer of security without compromising convenience.
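Enrolling from archived calls can be sketched under some explicit assumptions. First, the customer's side of each stored recording has already been separated from the agent's. Second, a speaker-embedding extractor (not shown, hypothetical) has produced one vector per recording. One simple enrollment strategy, not necessarily the one any given vendor uses, is to length-normalize each embedding, average them, and normalize the result. The customer IDs and the minimum of two recordings are illustrative.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("zero vector cannot be normalized")
    return [x / norm for x in vec]


def enroll_from_recordings(recording_embeddings, min_recordings=2):
    """Build one voiceprint from several archived calls of the same customer."""
    if len(recording_embeddings) < min_recordings:
        raise ValueError("not enough recordings to enroll")
    dim = len(recording_embeddings[0])
    if any(len(e) != dim for e in recording_embeddings):
        raise ValueError("embeddings must share one dimension")
    normalized = [l2_normalize(e) for e in recording_embeddings]
    # Average component-wise, then renormalize so scores stay comparable.
    mean = [sum(col) / len(normalized) for col in zip(*normalized)]
    return l2_normalize(mean)


def enroll_customers(archive):
    """archive: customer_id -> list of per-recording embeddings (hypothetical layout)."""
    voiceprints = {}
    for customer_id, embeddings in archive.items():
        try:
            voiceprints[customer_id] = enroll_from_recordings(embeddings)
        except ValueError:
            # Too little usable audio: this customer keeps the existing authentication only.
            continue
    return voiceprints
```

Because enrollment runs offline over recordings the institution already holds, customers are never asked to do anything new.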
Texting thumb is every smartphone user’s nightmare, and it is all because mobile apps have invaded every facet of modern life: socializing, shopping, banking, and so on. And they promise to become still more versatile. Voice technology is predicted to power the next wave of mobile apps.
Consider, for example, a grocery shopping mobile app. What every smartphone owner could use is a voice-enabled version that accepts spoken input instead of forcing the user to fumble with a tiny keyboard, and that actually understands and responds to what has been said. Using an online grocery store’s app, users could simply rattle off their shopping list and let the app search for the items and fill the cart. Then they could command the app to proceed to checkout, and voilà! Without a single keystroke, the chore is done (setting aside the wait for home delivery). No fumbling, no worn-out thumbs. Nothing beats hands-free voice input, wouldn’t you agree?
Welcome to the talky online world populated by the likes of Google’s voice search and Siri, Apple’s personal assistant app. Technology to voice-enable mobile apps is already a reality.
GoVivace Inc. of McLean, VA has designed an advanced Automatic Speech Recognition engine that is specifically suited for voice-enabled mobile apps. Why’s that?
The key to building reliable and robust voice-enabled mobile apps is to construct a comprehensive application grammar and vocabulary. These are the technical terms for the set of pre-specified possibilities the app consults to understand spoken input. The more inclusive the grammar, the better the app understands the user and the more intelligent it seems!
Say the user asks the voice-enabled mobile app of an online grocery store to add two packets of Oreo’s stuffed chocolate chip cookies to the shopping cart. A number of things happen behind the scenes. Just as with Siri or any other voice-enabled mobile app, the audio stream representing the spoken input is compressed and sent to a waiting farm of servers. The servers are also told the context in which the input was spoken. Putting the context and the input together, the servers quickly switch to a language model suited to the situation and then convert the audio into text. They recognize that “two packets” is the quantity and “Oreo’s stuffed chocolate chip cookies” is the name of the item. Essentially, the item is looked up in the app’s grammar and vocabulary, which represent the hundreds or even thousands of possible inputs the servers may have to process. Finally, the cart is updated. Many steps are involved, but everything happens so quickly that the user doesn’t notice them and happily goes on shopping.
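The lookup step can be illustrated with a deliberately small, hypothetical grammar. The sketch works on text the recognizer has already produced. A production grammar-based recognizer, by contrast, also uses the grammar to constrain recognition itself. The catalogue and the context name `grocery_cart` are assumptions made for this sketch, as is the regular expression standing in for a real grammar format.

```python
import re

# Hypothetical per-context catalogues. The article says the servers adapt to the
# context; here the "context" simply selects which item list the grammar accepts.
CATALOGUES = {
    "grocery_cart": [
        "oreo's stuffed chocolate chip cookies",
        "whole milk",
        "bananas",
    ],
}

NUMBER_WORDS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
UNITS = r"(?:packets?|packs?|boxes?|bottles?|bunch(?:es)?)"


def build_grammar(context):
    """Compile a tiny command grammar: add <quantity> [unit] [of] <item> [to the cart]."""
    # Longest items first so the alternation prefers the most specific product name.
    items = sorted(CATALOGUES[context], key=len, reverse=True)
    item_alt = "|".join(re.escape(item) for item in items)
    qty_alt = "|".join(NUMBER_WORDS) + r"|\d+"
    return re.compile(
        rf"^add (?P<qty>{qty_alt})(?: {UNITS})?(?: of)? (?P<item>{item_alt})"
        r"(?: to (?:the|my) cart)?$"
    )


def parse_command(recognized_text, context="grocery_cart"):
    """Map recognized text to cart slots, or return None if it is out of grammar."""
    match = build_grammar(context).match(recognized_text.lower().strip())
    if match is None:
        return None  # a real app would re-prompt the user here
    qty = match.group("qty")
    quantity = int(qty) if qty.isdigit() else NUMBER_WORDS[qty]
    return {"quantity": quantity, "item": match.group("item")}
```

For example, `parse_command("Add two packets of Oreo's stuffed chocolate chip cookies to the cart")` yields a quantity of 2 and the full item name. Anything outside the grammar returns `None`. That rejection is the trade-off of a constrained grammar: high precision on expected phrasings and no coverage for the rest.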
The performance of voice-enabled mobile apps also depends on the quality of the speech recognition engine. Ideally, the engine must be capable of understanding natural language and adapting to variations in voice quality and spoken content. At the same time, it should be easy to use and integrate.
GoVivace’s Automatic Speech Recognition uses both grammars and a statistical language model to understand natural language, which helps build highly precise voice-enabled mobile apps.
Thanks to the evolution of speech recognition technology, there is now an easier and faster way to convert speech into text, for example to generate transcripts of audio content. Machines can perform this tedious, time-consuming job accurately and quickly.
So who needs speech to text functionality?
Audio sections of libraries generate transcripts of faculty lectures and guest talks for students’ reference. Organizations like TED generate English transcripts of TED talks, which make it easier for people to translate the talks into other languages and reach a wider audience. Broadcasters generate closed captions, also called subtitles, for television programming, which let deaf and hard-of-hearing viewers follow the news and shows.
Transcripts also make content usable with the sound muted. For instance, Oracle uses speech to text technology to generate closed captioning for the proceedings of its enthusiastically followed technology conferences: Oracle Partner KickOff, JavaOne, and Oracle OpenWorld. The company acknowledges that subtitling gives “a global audience” access to the conference proceedings, as well as “viewers tuning in at their desks requiring muted volume.”
Another advantage of transcripts is that they enhance the value of content by making it searchable for specific words and phrases. Search engines like Google and Bing, and even a company’s internal search engine, can fully index transcripts and thus make previously inaccessible audio content accessible. For instance, a civil engineering student might want to search faculty lectures for the keywords “green building” or “glass house,” which is possible only if the lectures have been transcribed.
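The kind of search described above can be sketched as a positional inverted index over timed transcripts. The sketch assumes the recognizer emits one (start time, word) pair per spoken word, which is a hypothetical format. Returning each hit's start time lets a player jump straight to that point in the lecture. The lecture IDs used with it are invented for the example.

```python
from collections import defaultdict


def normalize(word):
    """Lowercase a token and strip surrounding punctuation."""
    return word.lower().strip(".,!?;:\"'()")


class TranscriptIndex:
    """Minimal positional inverted index over timed transcripts."""

    def __init__(self):
        self.postings = defaultdict(list)  # term -> [(doc_id, position), ...]
        self.start_times = {}              # doc_id -> start time (s) of each word

    def add(self, doc_id, timed_words):
        """timed_words: [(start_seconds, word), ...], one entry per spoken word (assumed format)."""
        self.start_times[doc_id] = [start for start, _ in timed_words]
        for position, (_, word) in enumerate(timed_words):
            self.postings[normalize(word)].append((doc_id, position))

    def search(self, phrase):
        """Return (doc_id, start_seconds) for each exact occurrence of the phrase."""
        terms = [normalize(w) for w in phrase.split()]
        if not terms:
            return []
        # Position sets for the later terms turn the adjacency check into set lookups.
        later = [set(self.postings.get(term, [])) for term in terms[1:]]
        hits = []
        for doc_id, pos in self.postings.get(terms[0], []):
            if all((doc_id, pos + k + 1) in later[k] for k in range(len(later))):
                hits.append((doc_id, self.start_times[doc_id][pos]))
        return hits
```

Storing word positions, rather than just which lectures contain a word, is what allows a two-word query such as “green building” to match only where the words are adjacent.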
Building speech to text functionality into apps could allow individual consumers to generate transcripts of live audio streams for their own use. For example, it could help attendees of multilingual conferences or participants in multilingual conference calls follow what’s going on.
GoVivace offers large vocabulary speech recognition technology that is especially well suited to generating accurate subtitles or audio transcripts, speeding up what was once a labour-intensive job. Accurately converting speech into text requires a speech engine with a large and, preferably, evolving vocabulary, so that it can understand practically any word fed to it. The GoVivace speech engine fits the bill.
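Generating subtitles ultimately means writing recognized text out in a caption format. SubRip (`.srt`) is one widely used plain-text format. Each cue has a sequence number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the caption text, and a blank line. The sketch below assumes the recognizer returns (start, end, text) segments, which is a hypothetical output shape. The 42-character line limit is a common captioning guideline rather than a requirement of the format.

```python
import textwrap


def srt_timestamp(seconds):
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments, max_chars=42):
    """segments: [(start_seconds, end_seconds, text), ...] from a recognizer (assumed format)."""
    cues = []
    for i, (start, end, text) in enumerate(segments, start=1):
        if end <= start:
            raise ValueError(f"segment {i} has non-positive duration")
        # Wrap long captions so each displayed line stays readable.
        lines = textwrap.wrap(text.strip(), width=max_chars)
        cues.append(
            f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n" + "\n".join(lines) + "\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)
```

Caption quality still depends on the recognizer's accuracy and timing. The formatter only decides how the recognized words are laid out on screen.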
Spare yourself the listen-type-rewind cycle, and raise the bar on subtitles in the process.