Conversation Processors

A Conversation Processor represents a discrete task that can be performed on the recorded audio stored in a Conversation object. Each processor produces output that is accessible as a field on the Conversation object. Processors can perform tasks such as transcribing speech, extracting topics, and estimating sentiment.

Processors can be bound to a Conversation at creation time or after the fact via a call to the Process a conversation endpoint. Each processor has a unique name that is used when binding.

You may bind additional processors to a Conversation at any time. Binding a processor that is already bound to a Conversation has no effect; it simply remains bound. Each processor therefore runs at most once against a given Conversation, and the API user needn’t worry about triggering redundant calculations.
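The at-most-once behavior can be pictured as set semantics. The following is a minimal client-side model only, not the service's implementation; the `bind` helper and the processor name strings are hypothetical and used purely for illustration.

```python
def bind(bound, requested):
    """Model binding processors to a Conversation.

    `bound` is the collection of processor names already bound;
    `requested` is the list of names in the new bind request.
    Returns (updated set of bound names, list of names newly bound).
    """
    already = set(bound)
    newly_bound = []
    for name in requested:
        if name not in already:
            # Only processors not yet bound are added (and would run).
            already.add(name)
            newly_bound.append(name)
    return already, newly_bound


# Hypothetical processor names, for illustration only.
bound, added = bind(set(), ["transcribe", "find_topics"])
bound, added_again = bind(bound, ["transcribe"])  # re-binding is a no-op
```

In this model the second call adds nothing, which mirrors the rule that a processor already bound simply remains bound.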

Some processors depend on other processors. For example, the Find Topics processor uses the transcript from the Transcribe processor to generate its topics. If a dependent processor is bound to a Conversation, the system automatically binds all the processors it requires, regardless of whether the client bound them explicitly. This frees the client from having to know the implementation details of individual processors.
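Automatic binding of required processors amounts to computing a transitive closure over a dependency map. The sketch below illustrates the idea under stated assumptions: the `DEPENDENCIES` map, the `resolve` function, and the lowercase processor names are hypothetical, and the real service may resolve dependencies differently.

```python
# Hypothetical dependency map: processor name -> processors it requires.
# Only the Find Topics -> Transcribe and call grading -> transcription
# relationships come from this document; the identifiers are made up.
DEPENDENCIES = {
    "transcribe": [],
    "find_topics": ["transcribe"],
    "grade_call": ["transcribe"],
}


def resolve(requested, dependencies=DEPENDENCIES):
    """Return the requested processors plus everything they require.

    Dependencies appear before the processors that need them, and each
    processor appears at most once.
    """
    ordered = []
    visited = set()

    def visit(name):
        if name in visited:
            return
        visited.add(name)
        for dep in dependencies.get(name, []):
            visit(dep)
        ordered.append(name)

    for name in requested:
        visit(name)
    return ordered


print(resolve(["find_topics"]))  # ['transcribe', 'find_topics']
```

Requesting only the dependent processor is enough; the required one is pulled in without the caller naming it.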


Transcribe

output fields: transcript

Transcribes the entire conversation. The transcript is provided as a list of segments, each containing the transcribed text, its timestamp, and the connection id of the speaker.
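As an illustration of working with such a segment list, the sketch below groups transcript text by speaker. The field names (`text`, `timestamp`, `connection_id`), their types, and the sample values are assumptions made for this example; the actual field names and timestamp units are not specified here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str            # transcribed text of the segment
    timestamp: float     # assumed: seconds from the start of the recording
    connection_id: str   # assumed: identifier of the speaker's connection


def text_by_speaker(segments):
    """Join each speaker's segments, in timestamp order, into one string."""
    grouped = {}
    for seg in sorted(segments, key=lambda s: s.timestamp):
        grouped.setdefault(seg.connection_id, []).append(seg.text)
    return {cid: " ".join(parts) for cid, parts in grouped.items()}


# Made-up sample data, for illustration only.
sample = [
    Segment("Thanks for calling.", 0.0, "conn-1"),
    Segment("Hi, I have a billing question.", 2.5, "conn-2"),
    Segment("Sure, I can help with that.", 5.1, "conn-1"),
]
by_speaker = text_by_speaker(sample)
```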

Find Topics

output fields: topics

Generates a list of topics of conversation. Selects words based on their information content, frequency of use, and relevance to the transcript as a whole.


output fields: detected_classes

Performs classification on a conversation. The model must already be constructed.


output fields: call_grades

Grades a call against a set of common quality metrics based on the voice and speech content.

The call_grades object is added to a Conversation when call grading is requested as a processor. Call grading looks at both the speech content (words) and the voice content (signal), so it will perform transcription first if the Conversation hasn’t already been transcribed.

All of the following values range from 0.0 to 1.0. The meaning of that scale is defined for each item below.

Required properties:

| Property | Type | Description |
| --- | --- | --- |
| outcome | float | Whether the parties of the call appear to have reached a positive result in the conversation. Higher is more successful. |
| quality | float | A measure of whether the call was cordial and professional. Higher is more cordial. |
| experience | float | Whether the parties analyzed seemed confident and capable given the topics discussed. Higher is more competent. |
| proactivity | float | The extent to which problems were addressed before they escalated. Higher is more proactive. |
| trust | float | A measure of perceived honesty and trust given the tone and speech content. Higher is better. |
| empathy | float | A measure of the extent to which parties reflexively react to the emotions of each other. Higher is more empathetic. |
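A client may want to check a call_grades object before using it. The sketch below assumes the object arrives as a plain dictionary keyed by the property names listed above; the `validate_call_grades` helper and the sample values are hypothetical and only illustrate the documented 0.0 to 1.0 range.

```python
GRADE_FIELDS = ("outcome", "quality", "experience",
                "proactivity", "trust", "empathy")


def validate_call_grades(call_grades):
    """Return the six grades as floats.

    Raises ValueError if a required property is missing or falls
    outside the documented 0.0 to 1.0 range.
    """
    checked = {}
    for field in GRADE_FIELDS:
        if field not in call_grades:
            raise ValueError(f"missing required property: {field}")
        value = float(call_grades[field])
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{field} out of range: {value}")
        checked[field] = value
    return checked


# Made-up values, for illustration only.
sample = {
    "outcome": 0.8, "quality": 0.9, "experience": 0.7,
    "proactivity": 0.4, "trust": 0.85, "empathy": 0.6,
}
grades = validate_call_grades(sample)
lowest = min(grades, key=grades.get)  # the weakest metric for this call
```

Because every metric shares the same scale and direction (higher is better), the lowest value is a simple way to find the area most in need of attention.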