Consider a caller who reaches an IVR system and asks for a sales agent in the electronics department. Sure, speech-to-text will do its job and route them to a representative in that department. However, you can at the same time estimate speaker characteristics such as gender, age, and perhaps the part of the country they come from. Then how about connecting them to an agent who hails from the same region, or who perhaps shares similar interests or belongs to the same age group?
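As a rough illustration of the routing idea, the sketch below scores each available agent against the caller's estimated profile and picks the closest match. Everything in it is a hypothetical stand-in: the `Profile` fields, the weights, and the agent names are assumptions made for this example, and a real system would take the caller profile from its speaker-characteristics models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical profile; the fields are assumptions for this sketch."""
    region: str
    age_group: str


def match_score(caller: Profile, agent: Profile) -> int:
    """Score how closely an agent matches the caller (weights are arbitrary)."""
    score = 0
    if caller.region == agent.region:
        score += 2  # assumed: shared region counts more than shared age group
    if caller.age_group == agent.age_group:
        score += 1
    return score


def pick_agent(caller: Profile, agents: dict[str, Profile]) -> str:
    """Return the name of the best-matching agent in the department."""
    return max(agents, key=lambda name: match_score(caller, agents[name]))


# Usage: the caller profile would come from the speech models; here it is hard-coded.
caller = Profile(region="south", age_group="25-34")
electronics_agents = {
    "asha": Profile(region="south", age_group="35-44"),
    "ben": Profile(region="north", age_group="25-34"),
}
best = pick_agent(caller, electronics_agents)
```

Because the estimates from voice are uncertain, a scheme like this would more plausibly act as a soft preference among agents who are already free than as a hard rule.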
Similarly, when two people are talking at a call centre, and our technology is tracking every word they say live and keeping track of successes and issues, the emotion identification system can work in parallel in the background. If the words convey a negative sentiment, but there is laughter in the style of talking, maybe they are just getting along well and sharing a chuckle. On the other hand, if the words are neutral, but the emotional style of talking is aggressive, maybe a supervisor should barge in and look into the matter.
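The two cases above amount to a small decision rule that combines what was said with how it was said. The following is a minimal sketch of that rule, assuming the text sentiment and the vocal emotion arrive as simple string labels from two upstream models. The label names and the returned actions are made up for this example.

```python
def assess_call(text_sentiment: str, vocal_emotion: str) -> str:
    """Combine word-level sentiment with vocal emotion into one action.

    Both inputs are assumed to be labels produced by upstream models;
    the label vocabulary here is hypothetical.
    """
    # Aggressive delivery is flagged even when the words themselves look neutral.
    if vocal_emotion == "aggressive":
        return "alert_supervisor"
    # Negative words delivered with laughter are treated as friendly banter.
    if text_sentiment == "negative" and vocal_emotion == "laughter":
        return "no_action"
    # Negative words without a softening vocal cue are kept under watch (assumed default).
    if text_sentiment == "negative":
        return "monitor"
    return "no_action"


# Usage: one call per analysed stretch of the conversation.
action = assess_call(text_sentiment="neutral", vocal_emotion="aggressive")
```

In practice both models would emit confidence scores, not hard labels, so a production rule would likely apply thresholds and look at several consecutive utterances before alerting anyone.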
Virtual reality and dialogue applications can begin to personalize themselves on the basis of these “body language” cues in speech, such as gender, age, emotion, and accent. In a well-designed system, such an approach can improve user satisfaction.