Monitoring Live Calls

The Sift API offers the unique ability to get transcripts and NLP results from calls as they are happening. Any call or conference initiated through Sift can be transcribed and analyzed with just a few extra parameters.

Handling event callbacks

Any connection routine command that initiates a Conversation can specify a callback URL that will receive HTTP requests on certain voice events. Commands that start Conversations include JoinConference, CallPhone, and CallClient.

Let’s look at an example of a connection routine that declares an event callback:

    "routine": [
        {
            "name": "JoinConference",
            "conference_name": "party-room",
            "on_event": "",
            "live_transcription": true,
            "live_topics": true,
            "live_talk_time": true,
            "do_record": true
        }
    ]

This routine executes a JoinConference command, which tells the target connection to join a conference room called “party-room”. As soon as the first person enters the conference, a new Conversation begins. Conversations have many useful properties: you can record them, process them using Conversation Processors and receive event callbacks, which is what we will look at now.

In the JoinConference command, we have added the on_event property. This specifies a URL where we expect to receive the events. A Sift server will send an HTTP POST request to this URL during the call when an event of interest occurs.

The live_transcription, live_topics and live_talk_time parameters specify which types of events we want to receive. In this case, we have enabled all three: transcripts, topics and talk time.
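As a rough sketch, a routine like the one above could be assembled in Python before it is sent to Sift. The helper name join_conference_routine and the callback URL https://example.com/sift-events are placeholders invented for this illustration; only the property names are taken from the example above.

```python
import json


def join_conference_routine(conference_name, callback_url,
                            transcription=True, topics=True,
                            talk_time=True, record=True):
    """Build a routine holding one JoinConference command (illustrative helper)."""
    command = {
        "name": "JoinConference",
        "conference_name": conference_name,
        # URL that should receive the event callbacks (hypothetical endpoint).
        "on_event": callback_url,
        "live_transcription": transcription,
        "live_topics": topics,
        "live_talk_time": talk_time,
        "do_record": record,
    }
    return {"routine": [command]}


routine = join_conference_routine("party-room", "https://example.com/sift-events")
print(json.dumps(routine, indent=4))
```

Passing topics=False, for example, would leave transcripts and talk time enabled while switching topic events off.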

We will also receive the Conversation Ended Event when the last person disconnects from the conference (usually by hanging up). We don’t need to specify any extra parameters to receive this event; it is always sent if the on_event URL is defined.
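To make the callback flow concrete, here is a minimal sketch of a receiver built on Python's standard http.server module. It assumes the POST body is a JSON object shaped like the event examples in this guide; the function names handle_event and serve, and port 8080, are choices made for this illustration and are not part of the Sift API.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_event(event):
    """Return a short description of an event based on its event_type."""
    event_type = event.get("event_type")
    if event_type == "transcript":
        return "transcript: " + event["transcript"]
    if event_type == "topics":
        return "topics: " + ", ".join(event["topics"])
    if event_type == "talk-time":
        return "talk time for %d connection(s)" % len(event["talk_times"])
    # Any other type, including the Conversation Ended Event, lands here.
    return "unhandled event type: %s" % event_type


class EventHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        # Assumption: the request body is a JSON object like the examples.
        event = json.loads(self.rfile.read(length))
        print(handle_event(event))
        self.send_response(200)
        self.end_headers()


def serve(port=8080):
    """Listen for event callbacks until interrupted (port is arbitrary)."""
    HTTPServer(("", port), EventHandler).serve_forever()
```

Calling serve() starts the listener. The on_event URL in the routine would then need to point at a publicly reachable address that forwards to this port.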

Transcript events

Transcript events have the following form:

    {
        "conversation_id": "5d3dc73fc8cf34be",
        "conference_name": "party-room",
        "event_type": "transcript",
        "transcript": "yes i would love to subscribe to some magazines",
        "connection_id": "3143d63913ff344c",
        "timestamp_ms": 1462514977421,
        "duration_ms": 10430
    }

The fields on this event have the following meanings:

conversation_id
This field contains the ID of the conversation that is being generated. Every conversation has a unique ID, even if the conference room is the same. So, after the last person leaves the “party-room” conference, we will get a conversation ended event. If some time later we used another JoinConference command to send a participant to “party-room”, Sift would generate a new conversation ID.

conference_name
When a Conversation is generated by a JoinConference command, the conference_name property is populated with the name of the Conference that generated the Conversation. If we had used the CallPhone or CallClient command instead, this field would be omitted from the event.

event_type
Specifies what type of event this is. See Conversation Events for a full list of possible event types.

transcript
The transcript field contains a small bit of transcribed text from the conversation. The amount of text provided will vary depending on the speaker’s rate of speech and how often they pause. The text can correspond to anywhere between a few seconds and a couple of minutes of audio data.

connection_id
This is the ID of the Connection in the conversation that said the given text. Transcript events are always sent for one speaker at a time.

timestamp_ms
The time at which the text was said, represented as an integer number of milliseconds since the Unix Epoch. Usually text is sent within a few seconds of being spoken.
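The following sketch shows one way to work with these fields in Python: converting timestamp_ms to a datetime and grouping transcript snippets by connection_id. The function names spoken_at and group_by_speaker are illustrative only and are not part of Sift.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def spoken_at(event):
    """Convert timestamp_ms (milliseconds since the Unix Epoch) to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=event["timestamp_ms"])


def group_by_speaker(events):
    """Map each connection_id to its transcript snippets in time order."""
    snippets = defaultdict(list)
    for event in sorted(events, key=lambda e: e["timestamp_ms"]):
        snippets[event["connection_id"]].append(event["transcript"])
    return dict(snippets)


event = {
    "conversation_id": "5d3dc73fc8cf34be",
    "event_type": "transcript",
    "transcript": "yes i would love to subscribe to some magazines",
    "connection_id": "3143d63913ff344c",
    "timestamp_ms": 1462514977421,
}
print(spoken_at(event).isoformat())
print(group_by_speaker([event]))
```

Sorting by timestamp_ms before grouping keeps each speaker's text in spoken order even if callbacks arrive slightly out of sequence.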

Accuracy Tip

Although Sift uses state-of-the-art speech recognition technology, transcript quality can vary depending on the recording conditions. Just as it can be hard for a person to understand someone in a noisy room or on a bad phone connection, machine transcription can become inaccurate when the audio is hard to hear.

If possible, consider instructing end users to talk in a low-noise environment and stay close to the microphone if they are not using a handset.

Topic events

The live_topics parameter enables topic events. Topic events provide a list of high-level topics related to the conversation. Topics are great for use cases where a transcript might be too much text, like displaying a list of active conversations with a short description for each.

    {
        "conversation_id": "5d3dc73fc8cf34be",
        "conference_name": "party-room",
        "event_type": "topics",
        "topics": ["vacation", "grand canyon", "weather"]
    }

Topics are chosen by Sift algorithmically based on the transcript. Topic events are sent every few minutes.
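For the "list of active conversations" use case, one possible approach is to keep only the most recent topics per conversation. The sketch below assumes topic events arrive as dictionaries shaped like the example above; update_summaries is a hypothetical helper name.

```python
def update_summaries(summaries, event):
    """Store the latest topics for the event's conversation; ignore other events."""
    if event.get("event_type") == "topics":
        summaries[event["conversation_id"]] = ", ".join(event["topics"])
    return summaries


summaries = {}
update_summaries(summaries, {
    "conversation_id": "5d3dc73fc8cf34be",
    "conference_name": "party-room",
    "event_type": "topics",
    "topics": ["vacation", "grand canyon", "weather"],
})
for conversation_id, description in summaries.items():
    print(conversation_id, "-", description)
```

Because each new topic event overwrites the previous entry, the description always reflects the most recent topic event for that conversation.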

Talk time events

For some applications it is useful to know how long each participant has been talking in the conversation. For example, you might use it to encourage equal participation among everyone on a call. The Talk Time Event provides a measure of time spent talking for each connection in a Conversation.

    {
        "conversation_id": "5d3dc73fc8cf34be",
        "conference_name": "party-room",
        "event_type": "talk-time",
        "talk_times": {
            "3143d63913ff344c": 5128,
            "6cc621614fc279b0": 290282
        }
    }


The talk_times property contains a dictionary mapping connection IDs to the number of milliseconds each connection has spent speaking in the current conversation. In this example, connection 3143d63913ff344c has talked for about five seconds, while connection 6cc621614fc279b0 has spoken for about 290 seconds.
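To judge how balanced a conversation is, the raw millisecond counts can be turned into each connection's share of the total talk time. This is a small sketch, not part of the Sift API; talk_shares is a name chosen for illustration.

```python
def talk_shares(talk_times):
    """Return each connection's fraction of the total talk time (0.0 to 1.0)."""
    total = sum(talk_times.values())
    if total == 0:
        # Nobody has spoken yet, so every share is zero.
        return {connection_id: 0.0 for connection_id in talk_times}
    return {connection_id: ms / total for connection_id, ms in talk_times.items()}


talk_times = {
    "3143d63913ff344c": 5128,
    "6cc621614fc279b0": 290282,
}
for connection_id, share in talk_shares(talk_times).items():
    print("%s spoke %.1f%% of the time" % (connection_id, share * 100))
```

An application could, for instance, prompt a quieter participant once one connection's share passes a threshold of its own choosing.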


Sift measures talk time by listening for voice activity on a connection. If there are multiple participants talking on the same connection—a speaker phone in a large conference room, for instance—Sift has no way of knowing who is who, and will add to the talk time tally for that connection as a whole.