Extracting a speaker's characteristics, such as age, gender, language, accent, or emotional state, from their speech is particularly valuable in intelligent commercial dialogue systems and smart call centers. Consider a caller whose call lands on an interactive voice response (IVR) system and who asks for a sales agent in the electronics department. Speech-to-text will do its job and route them to a representative in that department. In the meantime, however, the system can also estimate characteristics such as the caller's gender, age, and perhaps the region of the country they come from. Why not then connect them to an agent who hails from the same region, is in a similar age group, or has similar interests?
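The routing idea above can be sketched as a simple scoring rule. This is a minimal, hypothetical illustration: the `CallerProfile` and `Agent` classes, their fields, and the weights are assumptions for the example, and the estimated age group and region are assumed to come from upstream speech models that are not shown here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CallerProfile:
    # Hypothetical fields: department comes from the IVR request,
    # age_group and region are estimates from the caller's speech.
    department: str
    age_group: Optional[str] = None
    region: Optional[str] = None


@dataclass
class Agent:
    # Hypothetical agent record kept by the call center.
    name: str
    department: str
    age_group: str
    region: str
    available: bool = True


def match_agent(caller: CallerProfile, agents: List[Agent]) -> Optional[Agent]:
    """Pick an available agent in the requested department, preferring
    agents who share the caller's estimated region and age group."""
    candidates = [a for a in agents
                  if a.available and a.department == caller.department]
    if not candidates:
        return None

    def score(agent: Agent) -> int:
        s = 0
        if caller.region is not None and agent.region == caller.region:
            s += 2  # region match weighted higher (an arbitrary choice)
        if caller.age_group is not None and agent.age_group == caller.age_group:
            s += 1
        return s

    # max() returns the first highest-scoring candidate, so ties keep list order.
    return max(candidates, key=score)
```

Department routing stays the hard constraint, and the speaker estimates only break ties among eligible agents. If no characteristics can be estimated, every candidate scores zero and the call simply goes to the first available agent in the department.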
Similarly, while a customer is talking with an agent, our solution tracks every word in real time, keeping track of successes and related issues, while the emotion identification system runs in parallel in the background. If the words convey negative sentiment but the speaking style includes laughter, the two parties are probably just getting along well and sharing a chuckle. On the other hand, if the words are neutral but the speaking style is aggressive, a supervisor may need to barge in and look into the matter.
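The two cases described above amount to combining a text-based sentiment label with an audio-based emotion label. A minimal rule-based sketch follows, assuming both labels are produced upstream by models not shown here; the label strings and the `assess_turn` function are hypothetical, and a production system would need many more rules or a learned fusion model.

```python
from enum import Enum

# Assumed label sets from the (hypothetical) upstream models.
TEXT_SENTIMENTS = {"negative", "neutral", "positive"}


class CallAssessment(Enum):
    NO_ACTION = "no_action"
    RAPPORT = "rapport"      # negative words spoken with laughter
    ESCALATE = "escalate"    # neutral words spoken aggressively


def assess_turn(text_sentiment: str, vocal_emotion: str) -> CallAssessment:
    """Fuse transcript sentiment with the vocal emotion of one turn."""
    if text_sentiment not in TEXT_SENTIMENTS:
        raise ValueError(f"unknown text sentiment: {text_sentiment!r}")
    if text_sentiment == "negative" and vocal_emotion == "laughter":
        # The words alone look bad, but the tone suggests friendly banter.
        return CallAssessment.RAPPORT
    if text_sentiment == "neutral" and vocal_emotion == "aggressive":
        # The words alone look fine, but the tone suggests a problem.
        return CallAssessment.ESCALATE
    # All other combinations are left alone in this sketch.
    return CallAssessment.NO_ACTION
```

The point of the sketch is that neither signal is sufficient on its own: each rule fires only when the transcript and the tone disagree.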
Virtual reality and dialogue applications can likewise personalize themselves based on these speech “body language” cues, such as gender, age, emotion, and accent. In a well-designed system, such personalization can improve user satisfaction.