Emotion Id Specification

Purpose

The purpose of this document is to share the Emotion Identification API specification so that potential GoVivace customers can test their integration. The contents of this document are GoVivace proprietary and subject to change.

Introduction

This API exposes Emotion Identification routines as a RESTful web service. The emotion identification process assumes that the input audio is 8 kHz, 16-bit linear PCM. If WAV format is used, the 44-byte WAV header is simply treated as audio; this has been found to work fine.
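
Before sending audio, one can verify locally that a file matches the expected format. The snippet below is a minimal sketch using Python's standard wave module; the filename sample1.wav is just the example file used later in this document, and single-channel (mono) audio is assumed.

import wave

# Minimal sanity check that a WAV file is 8 kHz, 16-bit linear PCM
# (mono is assumed here), the format the service expects.
with wave.open("sample1.wav", "rb") as wav:
    assert wav.getframerate() == 8000, "expected 8 kHz sampling rate"
    assert wav.getsampwidth() == 2, "expected 16-bit samples"
    assert wav.getnchannels() == 1, "expected mono audio"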

Usage

The emotion identification service accepts POST requests with the audio in the body of the message at the specified URI. For example, using the curl command:

curl --request POST --data-binary @sample1.wav "https://services.govivace.com:7687/EmotionId?action=identify&format=8K_PCM16&key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

You need to provide three parameters:

  • action: identify
  • format: 8K_PCM16, the audio format supported by our server
  • key: a 32-character alphanumeric authentication key, which we will provide to you for authentication

Here, sample1.wav is an 8 kHz sampling rate, 16-bit linear PCM audio file. The body of the POST contains the entire audio file in 16-bit linear PCM, 8 kHz format.
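
For reference, the same request can be made from Python. The sketch below is a minimal, non-authoritative equivalent of the curl call above using the third-party requests library; the key value is a placeholder and should be replaced with the key provided to you.

import requests

# Sketch of the POST request above using the `requests` library.
# The query parameters mirror the curl example; the key is a placeholder.
URL = "https://services.govivace.com:7687/EmotionId"
params = {
    "action": "identify",
    "format": "8K_PCM16",
    "key": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
}

with open("sample1.wav", "rb") as audio:
    # The raw audio bytes go in the request body, as with curl --data-binary.
    response = requests.post(URL, params=params, data=audio)

print(response.json())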

WebSocket API

wss://services.govivace.com:7687/EmotionId?action=identify&format=8K_PCM16&key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

After the last block of speech data, a special 3-byte ANSI-encoded string “EOS” (“end-of-stream”) needs to be sent to the server. This tells the server that no more speech is coming.

After sending “EOS”, the client must keep the WebSocket open to receive the result from the server. The server closes the connection itself once the results have been sent to the client. No more audio can be sent over the same WebSocket after “EOS” has been sent; to process a new audio stream, the client has to create a new WebSocket connection.
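
As an illustration of this streaming protocol, the sketch below uses the third-party websocket-client package (not the client.py shipped by GoVivace) to send the audio in chunks, send "EOS", and then wait for the JSON result. The chunk size and the use of a text frame for "EOS" are assumptions.

from websocket import create_connection  # pip install websocket-client

# Hypothetical streaming client: send audio in chunks, then "EOS",
# then keep the socket open until the server returns the result.
URI = ("wss://services.govivace.com:7687/EmotionId"
       "?action=identify&format=8K_PCM16&key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")

ws = create_connection(URI)
with open("sample1.wav", "rb") as audio:
    while True:
        chunk = audio.read(4000)  # roughly 0.25 s of 8 kHz, 16-bit audio
        if not chunk:
            break
        ws.send_binary(chunk)

ws.send("EOS")      # tell the server that no more audio is coming
result = ws.recv()  # the server sends the JSON result, then closes the connection
print(result)
ws.close()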

Python Client

python client.py --uri "wss://services.govivace.com:7687/EmotionId?action=identify&format=8K_PCM16&key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" --save-json-filename sample1_emotion.json --rate 8000 sample1.wav

Options for the Python client
➢ --save-json-filename : Save the intermediate JSON to the specified file
➢ --rate : Rate in bytes/sec at which audio should be sent to the server
➢ --uri : Server WebSocket URI
➢ --action : Action to perform, e.g. identify
➢ --key : Authentication key
➢ --file_format : File format (default is 8K_PCM16)

Response

{
  "emotion_score": 0.737160325050354,
  "status": 0,
  "processing_time": 1.494483,
  "message": "Emotion identification is successful",
  "identified_emotion": "neutral"
}
The server sends emotion identification results and other information to the client in JSON format. The response can contain the following fields:
  • status: response status (integer); see the codes below
  • message: status message
  • processing_time: total amount of time spent on the server side to process the audio
  • identified_emotion: one of neutral, angry, sad, happy, or dominant
  • emotion_score: confidence score of the identified emotion

The following status codes are currently in use:

  • 0: Success. Usually used when a recognition result is sent
  • 1: No speech. Sent when the incoming audio contains a large portion of silence or non-speech
  • 2: Aborted. Recognition was aborted for some reason
  • 9: Not Available. Used when all recognizer processes are currently in use and recognition cannot be performed
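
As an illustration, a client might dispatch on these status codes as in the sketch below; the handle_response function and its printed messages are hypothetical, not part of the API.

import json

# Hypothetical handling of the response fields and status codes described above.
STATUS_MESSAGES = {0: "Success", 1: "No speech", 2: "Aborted", 9: "Not Available"}

def handle_response(raw):
    result = json.loads(raw)
    if result.get("status") == 0:
        print("Identified emotion: %s (score %.3f, processing_time %.3f)"
              % (result["identified_emotion"], result["emotion_score"],
                 result["processing_time"]))
    else:
        code = result.get("status")
        print("Request failed [%s: %s]: %s"
              % (code, STATUS_MESSAGES.get(code, "Unknown"), result.get("message")))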

NOTE:

  • The range of the emotion score is zero to one (0-1), i.e. minimum 0 and maximum 1.
  • The higher the emotion score, the more accurate the result.