Recording Calls

The Sift API allows you to record the voice of each participant in a call individually, as well as the full conversation.

Call recording is available for point-to-point calls initiated via the CallPhone and CallClient commands, as well as for conferences created with JoinConference. In this example, we will set up recording on a conference call.

Enabling recording

There are many ways to initiate a call through Sift. For conference calls, it is usually most convenient to have participants connect via a dial-in number. The Receiving an incoming call guide covers how to set up a Sift phone number and associate it with an Application.

As before, we will implement an on_incoming callback URL using Flask.

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/my_conference', methods=['POST'])
def my_conference():
    """Connects to the conference."""
    response = {
        'routine': [
            {
                'name': 'JoinConference',
                'conference_name': 'my_conference',
                'do_record': True,
                # URL of the handler that receives conference events
                'event_callback': ''
            }
        ]
    }
    return jsonify(response)

The callback simply connects all incoming calls to a conference room named “my_conference”. Since we have set the do_record property to True, the Sift API will generate individual recordings for each participant, as well as a unified conference recording. We have also provided the event_callback parameter so that our server will be notified when the conference ends and our recordings are ready.
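
If several dial-in numbers need the same behaviour, the response body can be built by a small helper. The following is only a sketch: build_join_conference is a hypothetical helper name, the callback URL is a placeholder, and the field names are simply the ones used in the callback above.

```python
def build_join_conference(conference_name, event_callback, do_record=True):
    """Return a response body with a single JoinConference routine step.

    Hypothetical helper: only the field names come from the example above.
    """
    return {
        'routine': [
            {
                'name': 'JoinConference',
                'conference_name': conference_name,
                'do_record': do_record,
                'event_callback': event_callback,
            }
        ]
    }


# Placeholder URL; substitute the address of your own event handler.
body = build_join_conference('my_conference', 'https://example.com/conference_event')
```

Inside a Flask view, the returned dictionary can be passed straight to jsonify.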

Recordings are available immediately after the call ends. The Sift API notifies us when the conference ends via the event_callback URI, which we have implemented below.

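import requests

@app.route('/conference_event', methods=['POST'])
def conference_event():
    """Gets the recording when the conference ends."""
    event = request.get_json()
    if event['type'] == 'ended':
        conversation_id = event['conversation_id']

        # Fetch the finished conversation from the Retrieve a conversation endpoint.
        response = requests.get('' + conversation_id)
        conversation = response.json()

    # A Flask view must return a response; an empty 204 is assumed to be acceptable here.
    return '', 204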

We first check the event type. Several kinds of event may be sent during the conference, including live transcripts and topics, but we are only interested in the Conversation Ended Event. Once we know we have the right event, we can use the conversation_id property to get the conversation from the Retrieve a conversation endpoint.
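
Because the callback receives every event, it can help to keep the filtering logic in a plain function that is easy to test without a running server. This is a minimal sketch that assumes only the type and conversation_id properties used in the handler above; the 'transcript' type name in the usage lines is made up for illustration.

```python
def ended_conversation_id(event):
    """Return the conversation ID for a Conversation Ended event, else None."""
    if event.get('type') != 'ended':
        return None
    return event.get('conversation_id')


# 'transcript' is a made-up type name standing in for any other event.
ignored = ended_conversation_id({'type': 'transcript'})
found = ended_conversation_id({'type': 'ended', 'conversation_id': 'abc123'})
```

Events of any other type yield None, so the handler can return early without calling the REST API.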

Full conversation recording

We can get a recording of the full conversation from the Conversation object returned by the Sift REST API.

import os
import urllib.request
wav_url = conversation['audio']['wav_file']
urllib.request.urlretrieve(wav_url, os.path.expanduser('~/conversation.wav'))

Here we use Python's built-in urllib library to download the .wav recording to our home directory. The audio is also available in a compressed .mp3 format via the mp3_file property.
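
Since both formats sit side by side in the audio property, choosing between them can be wrapped in a small function. The sketch below assumes only the wav_file and mp3_file properties named above; recording_url is a hypothetical helper and the sample URLs are placeholders.

```python
def recording_url(conversation, fmt='wav'):
    """Return the URL of the full-conversation recording in the given format."""
    if fmt not in ('wav', 'mp3'):
        raise ValueError('unsupported format: ' + fmt)
    return conversation['audio'][fmt + '_file']


# Made-up sample data shaped like the audio property described above.
sample = {
    'audio': {
        'wav_file': 'https://example.com/full.wav',
        'mp3_file': 'https://example.com/full.mp3',
    }
}
mp3_url = recording_url(sample, 'mp3')
```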

Single-speaker recordings

In some scenarios, it is appropriate to record some participants in a conversation but not others. For instance, a salesperson may want to record their own voice during a call to review their sales pitch, but may want to avoid recording the voice of the customer for privacy reasons. The Sift API allows you to retrieve recordings for each individual channel of a call to assist with such scenarios.

Individual party recordings are also useful for improving intelligibility. If many speakers are talking at once, or one side of a connection has loud background noise, the listener can isolate a particular speaker to get a cleaner signal.

import urllib.request

for channel in conversation['channels']:
    if channel['from'] == '+15558881188':
        # This is the recording we are interested in.
        urllib.request.urlretrieve(channel['audio']['wav_file'], '/path/to/file.wav')

Each connection that joins the conference adds a new channel object to the channels property of the Conversation object. Each channel object stores the isolated recording from that connection, as well as information about the connection itself. In this example, we examine the from property of each channel in the conversation to find the phone number of the participant we are interested in hearing.
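
For the privacy scenario described earlier, where only some participants should be kept, the same lookup can be driven by a list of permitted numbers. This is a sketch under the assumption that each channel carries the from and audio properties shown above; recordings_to_keep is a hypothetical helper and the sample data is invented.

```python
def recordings_to_keep(conversation, allowed_numbers):
    """Return wav URLs only for channels whose caller is in allowed_numbers."""
    allowed = set(allowed_numbers)
    return [
        channel['audio']['wav_file']
        for channel in conversation['channels']
        if channel['from'] in allowed
    ]


# Made-up sample data shaped like the Conversation object described above.
sample = {
    'channels': [
        {'from': '+15558881188', 'audio': {'wav_file': 'https://example.com/a.wav'}},
        {'from': '+15550001111', 'audio': {'wav_file': 'https://example.com/b.wav'}},
    ]
}
kept = recordings_to_keep(sample, ['+15558881188'])
```

Only the URLs in kept would then be downloaded; recordings for the other channels are never fetched.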


In some circumstances the same connection can generate multiple channels in the same Conversation. For instance, if a connection leaves a conference and then re-joins some time later, there will be two channels in the final Conversation with the same connection_id property but different start_ms values. This ensures that each channel is a single contiguous block of audio.
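
To reassemble everything a single connection said, its channels can be grouped by connection_id and ordered by start_ms. The following sketch assumes only those two properties; segments_by_connection is a hypothetical helper and the sample values are invented.

```python
from collections import defaultdict


def segments_by_connection(conversation):
    """Group channels by connection_id, each group sorted by start_ms."""
    grouped = defaultdict(list)
    for channel in conversation['channels']:
        grouped[channel['connection_id']].append(channel)
    for segments in grouped.values():
        segments.sort(key=lambda channel: channel['start_ms'])
    return dict(grouped)


# Made-up sample: connection 'c1' left the conference and re-joined later.
sample = {
    'channels': [
        {'connection_id': 'c1', 'start_ms': 90000},
        {'connection_id': 'c2', 'start_ms': 500},
        {'connection_id': 'c1', 'start_ms': 0},
    ]
}
segments = segments_by_connection(sample)
```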

Audio quality metrics

Recording quality strongly impacts the accuracy of Sift’s transcription and analysis capabilities. The API provides two measures of audio quality for each audio recording: signal-to-noise ratio (SNR) and reverberation time (rt60). These values are available in the audio properties of the Conversation and Channel objects. Both are stored as real numbers.

if conversation['audio']['snr'] < 3.0:
    print('The conversation was very noisy')
if conversation['audio']['rt60'] > 5.0:
    print('The conversation had lots of echo')

A high snr value is desirable; low values indicate a lot of background noise in the recording. A low rt60 value is desirable; high values indicate a lot of echo in the recording environment. Using a speakerphone in a large room or speaking far from the microphone can increase the rt60 value.

Both metrics are also available on each channel, which is useful for detecting which end of a call is the source of quality degradation.

for channel in conversation['channels']:
    if channel['audio']['snr'] < 3.0:
        print('connection ' + channel['connection_id'] + ' was very noisy')
    if channel['audio']['rt60'] > 5.0:
        print('connection ' + channel['connection_id'] + ' was very echoey')
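
When the goal is to find the single worst offender rather than every channel below a threshold, the channels can be compared directly. This sketch assumes the per-channel audio property with an snr value described above; noisiest_channel is a hypothetical helper and the sample numbers are invented.

```python
def noisiest_channel(conversation):
    """Return the channel with the lowest SNR, or None if there are no channels."""
    channels = conversation['channels']
    if not channels:
        return None
    return min(channels, key=lambda channel: channel['audio']['snr'])


# Made-up sample data; real values come from the Conversation object.
sample = {
    'channels': [
        {'connection_id': 'c1', 'audio': {'snr': 12.5, 'rt60': 0.4}},
        {'connection_id': 'c2', 'audio': {'snr': 2.1, 'rt60': 0.6}},
    ]
}
worst = noisiest_channel(sample)
```

The same pattern with max and the rt60 value would pick out the connection with the most echo.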