GoVivace’s patented speech-to-text technology accurately transcribes audio in real time.
Our patented speech-to-text (STT) solution, Listener, is based on state-of-the-art Automatic Speech Recognition (ASR) technology that enables machines to understand and transcribe speech. It uses machine learning algorithms and natural language processing techniques to recognize and transcribe speech accurately in real time. It supports standard telephony as well as web and mobile applications.
On the back end, Listener’s ASR engines use state-of-the-art Large Vocabulary Continuous Speech Recognition (LVCSR) models developed by GoVivace speech and NLP scientists. Listener can transcribe audio of any length, either in an online streaming fashion or in offline mode. Depending on the client’s needs, the large ASR models can be customized to support custom lingo, business terms, and proper nouns. On request, the ASR engine also supports Keyword Spotting and Hint Word Recognition.
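To make the difference between streaming and offline input concrete, here is a minimal sketch of how a client might slice raw audio into short frames before sending them to a streaming recognizer. The function name `pcm_chunks`, the 100 ms frame size, and the 8 kHz, 16-bit mono format are assumptions chosen for illustration; they are not part of a documented Listener interface.

```python
from typing import Iterator


def pcm_chunks(audio: bytes, sample_rate: int = 8000, sample_width: int = 2,
               chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield fixed-duration slices of raw mono PCM audio.

    All defaults are illustrative assumptions (8 kHz, 16-bit, 100 ms frames).
    """
    bytes_per_chunk = sample_rate * sample_width * chunk_ms // 1000
    if bytes_per_chunk <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for start in range(0, len(audio), bytes_per_chunk):
        # The final slice may be shorter than the others.
        yield audio[start:start + bytes_per_chunk]


# One second of silence at 8 kHz, 16-bit mono, used as stand-in audio.
audio = bytes(8000 * 2)
chunks = list(pcm_chunks(audio))
print(len(chunks), "chunks of", len(chunks[0]), "bytes")
```

In streaming mode a client would send each chunk as soon as it is captured, while in offline mode the whole recording can be submitted at once.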
The speech-to-text solution can also be provided as a grammar-based ASR that handles grammars ranging from very simple to very large, including those for complex tasks such as dates, complex commands, and yellow-pages-style directory lookups. We also offer performance tuning, in which we troubleshoot a poorly performing grammar by tuning the acoustic and language models for the target service. These grammar-based ASR engines work with both pre-compiled grammars, which can be referenced by name, and on-the-fly grammars, which evolve as the client uses the application and are detected when reused. Both kinds of grammar are stored on the server after compilation to ensure fast processing. GoVivace also offers consulting services for the design and development of complex grammars for our clients.
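The idea of storing compiled grammars and detecting reuse can be sketched as a small cache. This is an illustration under stated assumptions, not GoVivace's implementation: the class `GrammarCache`, the content-hash lookup, and the stand-in `fake_compile` function are all hypothetical.

```python
import hashlib


class GrammarCache:
    """Hypothetical cache: compile each distinct grammar text only once."""

    def __init__(self, compile_fn):
        self._compile = compile_fn
        self._by_name = {}   # pre-compiled grammars, referenced by name
        self._by_hash = {}   # every compiled grammar, keyed by content hash
        self.compile_count = 0

    def register(self, name, grammar_text):
        """Pre-compile a grammar and make it available by name."""
        self._by_name[name] = self._compile_once(grammar_text)

    def by_name(self, name):
        return self._by_name[name]

    def on_the_fly(self, grammar_text):
        """Compile a grammar sent with a request, reusing it if seen before."""
        return self._compile_once(grammar_text)

    def _compile_once(self, grammar_text):
        key = hashlib.sha256(grammar_text.encode("utf-8")).hexdigest()
        if key not in self._by_hash:
            self._by_hash[key] = self._compile(grammar_text)
            self.compile_count += 1
        return self._by_hash[key]


def fake_compile(text):
    # Stand-in for a real grammar compiler: split alternatives on "|".
    return tuple(text.split("|"))


cache = GrammarCache(fake_compile)
cache.register("yes_no", "yes|no")
first = cache.on_the_fly("monday|tuesday")
second = cache.on_the_fly("monday|tuesday")  # detected as reused
print(cache.compile_count)
```

Keying on a hash of the grammar text lets the same on-the-fly grammar be recognized on a later request without the client having to name it.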
GoVivace provides multilingual speech-to-text engines covering US, UK, and Indian-accented English, as well as Spanish, Portuguese, French, German, and Italian. It also provides code-switched ASR that handles English mixed with major Indic languages, including Hindi, Tamil, Telugu, Kannada, Marathi, Gujarati, Bengali, Malayalam, and Assamese, and more languages are being added. The Indic ASR supports Indian English accents to reflect the speaking habits of the Indian population.
Our distributed client-server architecture scales easily and supports an ever-growing list of client devices. A load balancer can serve as the front end, with servers added at the back end for redundancy, reliability, and scalability. We also support MRCPv2 for our ASR solution, so businesses that rely on MRCP can use GoVivace’s ASR plugin for UniMRCP servers.
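As a rough illustration of the front-end and back-end split described above, the sketch below shows round-robin selection across a pool of ASR servers that skips any server marked as down. The class `RoundRobinBalancer` and the host names `asr-1` to `asr-3` are hypothetical; a production deployment would normally use a dedicated load balancer rather than code like this.

```python
class RoundRobinBalancer:
    """Hypothetical front end: rotate across healthy back-end servers."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._next = 0

    def mark_down(self, server):
        self._healthy.discard(server)

    def mark_up(self, server):
        if server in self._servers:
            self._healthy.add(server)

    def pick(self):
        # Try each server at most once, starting from the rotation pointer.
        for _ in range(len(self._servers)):
            server = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if server in self._healthy:
                return server
        raise RuntimeError("no healthy ASR servers available")


lb = RoundRobinBalancer(["asr-1", "asr-2", "asr-3"])
before = [lb.pick() for _ in range(4)]
lb.mark_down("asr-2")  # simulate a failed back-end server
after = [lb.pick() for _ in range(3)]
print(before, after)
```

Adding capacity in this model means appending another server to the pool, and a failed server is simply skipped until it is marked up again.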
On-device solutions are available upon request to support the hardware of your choice.
Listener is available on Linux, Windows, and Mac platforms. A minimum of 4 GB of RAM and a 2.0 GHz processor is recommended for enterprise and SMB customers.
Some key features of the ASR engine:
- Accurate and noise-robust
- SDK library and WebSocket-based live transcription with bidirectional streaming, available as software as a service (SaaS) or as an on-premises deployment
- The same speech recognition engine can be used to build mobile, web, and high-volume applications that work around the clock and provide a uniform user experience
- Keyword Spotting and voice-trigger capability
- Hint word recognition
- Supports a distributed client/server architecture for easy scaling and an ever-growing list of devices
- Supports multiple languages and multiple accents
- Supports code-switched and multilingual ASR
- Supports unlimited vocabulary and continuous streaming of audio of any length
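One common way to think about hint word recognition is as a score boost applied to candidate transcripts that contain the hinted words. The sketch below illustrates that idea only; it does not describe how Listener implements hint words, and the function `rescore_with_hints`, the boost value, and the example scores are all assumptions.

```python
def rescore_with_hints(nbest, hint_words, boost=0.5):
    """Return the best transcript after boosting hypotheses with hint words.

    nbest is a list of (transcript, score) pairs where a higher score is
    better. Each token that matches a hint word adds `boost` to the score.
    """
    hints = {word.lower() for word in hint_words}

    def boosted(item):
        text, score = item
        matches = sum(1 for token in text.lower().split() if token in hints)
        return score + boost * matches

    return max(nbest, key=boosted)[0]


# Made-up hypotheses and log-scores for illustration.
nbest = [
    ("pay my bill at go vivas", -4.1),
    ("pay my bill at govivace", -4.3),
]
without_hints = rescore_with_hints(nbest, [])
with_hints = rescore_with_hints(nbest, ["GoVivace"])
print(without_hints)
print(with_hints)
```

Without hints the first hypothesis wins on raw score; with the hint word supplied, the second hypothesis is preferred because it contains the expected business term.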
Listener can be used in a wide range of applications, including:
- Conversational AI: To deliver fast, accurate responses so that users feel as if they are talking to a real person.
- Voicemail: To build applications that convert voicemail messages into email
- Dictation: To integrate with live dictation systems, note-taking and e-learning applications
- Transcription: To transcribe audio recordings of meetings, lectures, and interviews, making it easier to search for and analyze specific information.
- Keyword spotting and hint word recognition: To find a particular keyword in an audio clip or to boost recognition of a word or set of words.
- Call centers: To automate call center operations, such as call routing, information retrieval, and customer service.
- Closed Captioning: To create closed captions for television programs, movies, and other video content, making it accessible to viewers who are deaf or hard of hearing.
- Voice assistants: To integrate with voice assistants like VIVI, allowing users to interact through natural language.
- Language learning: To integrate with language learning applications, helping learners improve their pronunciation and speaking skills by providing feedback on their speech.
- Healthcare: To transcribe medical notes and records, which can save time and improve accuracy for a wide range of healthcare professionals.