A journey through real time...

I did my first project on real-time communications in 1997. In the past two decades I did numerous projects in this general area. In particular, these were related to voice over IP (VoIP), multimedia and web communication. As 2016 comes to an end, I reminisce my 20-year journey. I attempt to capture the gist of my projects. How my project themes and ideas evolved over time?

What are endpoint driven communication systems?

A lot of my work in the past decade has focussed on endpoint driven systems, e.g., peer-to-peer Internet telephony (P2P-SIP) [1,2] for inherent scalability/robustness, Rich Internet Applications for web video conferencing [3,4], and more recently, resource-based software architecture [5,6]. In this article, I emphasize the importance of such systems, and differentiate them from other system architectures in the context of real-time communication.

Impress.js 3D presentations on cloud, communication, WebRTC and mobile

I am really impressed by impress.js presentation framework based on CSS3. It is pretty low level, requires knowledge of HTML and CSS, and is small enough that it can stay out of your way. Others have created very impressive 3D presentations. I also did my share of 3D presentations in 2015, as various conference paper presentations, while at Avaya Labs.

click here to see all my presentations

Below, I list my presentations and associated demo videos largely based on impress.js framework. When viewing a presentation, use an HTML5 capable browser such as Chrome or Safari, and please wait for it to loads completely, and the browser's loading icon to stop spinning, before doing the slide show.

Command line Twilio client

[This post is neither contributed not endorsed by Twilio]

Twilio client [1] enables embedding voice conversation in web and mobile apps, by creating a voice pipe between your browser or mobile device and the service. One thing missing there is the ability to create such a voice pipe from non-mobile or non-web programs, such as a command line application. There is a shell script [2] to place a call for testing from command line, but it uses pre-defined text or recorded file for media, and looses the real-time interactive nature of the voice path. In particular, it does not create a voice pipe between the local machine and the service.

Motivation

Ability to connect real-time interactive voice path from an application opens doors to wide range of other use cases such as media path processing and analysis, e.g., for real-time transcription, or to bridge call between diverse services, e.g., translate between IM and voice call. Secondly, such as mechanism is independent of a specific browser or mobile platform, and can work in headless mode for automated testing or client-side programmability of voice call or its media path.

At the high level, there are four potential ways to create such a voice pipe. Two of these can be accomplished using my rtclite project, that we will describe in more detail in this article.

  1. WebRTC - using Twilio 1.3+ web client with command line WebRTC app
  2. Using Twilio mobile client interface ported to command line app
  3. RTMP - using Twilio 1.2 web client API with command line RTMP client
  4. SIP - send/receive SIP call to/from Twilio [3]

The first two approaches essentially implement a command line version of the web and mobile SDKs. For example, a WebRTC stack compiled for Linux/OS X may be used to accomplish (1). The third approach uses a command line RTMP client and an older version of client API, and the fourth one uses a command line SIP endpoint to dial into the service.

We describe how to do the last two using software pieces from our open source rtclite project [5]. The following video demonstration shows the command line call initiation. Don't forget to view in full screen!



Connect to Twilio from command line SIP endpoint

The approach is described on the provider's website [3] including the steps for creating the SIP domain/endpoint such as yourname.sip.twilio.com. The description there is targeted for VoIP providers rather than client. Currently it is not possible [4] to configure your SIP softphone to use the Twilio service as SIP server.  Once the SIP domain is created, you can send request to "sip:something@yourname.sip.twilio.com". The sender's IP address needs to be white-listed or the caller's credential needs to be preapproved for authentication. we have only tried the IP address white-listing, and not the SIP authentication using credentials.

After configuring a SIP endpoint, you should be able to see the SIP domain and its associated voice URL on the provider's website, e.g., "yourname.sip.twilio.com" mapped to voice URL of "http://yourserver/yourtwiml.xml".A simple call forwarding can be done using the following TwiML. These steps can be used to connect to any other TwiML application from the command line SIP endpoint, if you configure your SIP endpoint's voice URL accordingly.

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Dial timeout="20" callerId="1415xxxxxxx" method="POST">415yyyyyyy</Dial>
</Response>

You can also create programmable server side script to derive the target number using the called SIP address. For example, send call to sip:1415yyyyyyy@yourname.sip.twilio.com, and in your script, extract the user name part of the URL to populate the Dial tag's content.

Once the initial configuration is complete, use the SIP endpoint available in the rtclite project to initiate a call. More details on the command line SIP endpoint are available in my previous blog post as well as on the project website [6]. After setting up the initial dependencies of the project, run the caller module as follows.

$ python -m rtclite.app.sip.caller --domain=example.net --use-lf --samplerate=48000 \
         --to=sip:something@yourname.sip.twilio.com

The "domain", "use-lf" and "samplerate" options are described on the project website. The "to" option is important, and represents the target address to call to. Once the call is established, the py-audio project's modules are used to interface with the audio device, and to send and receive audio in the call.

Some important notes follow. For a demo account with the provider, your callerId and target number in the Dial tag must both be verified for your account. Some Internet Service Providers (ISP) may block a SIP request on port 5060. In fact, I sometime experience wifi reset of my residential equipment, when I send a SIP packet out from my home machine to outside VoIP service. Router interference of SIP messages may also be the reason for incorrect CRLF handling, and the need for use-lf option. Using TLS may be an option to work around router interference. Make sure that the sample rate specified on the command matches the allowed sample rate of the audio device on your client machine. On Mac OS X, audio capture is usually done with 48kHz.

Connect to Twilio from command line RTMP client

Here, we exploit the older Flash-based Twilio Web Client SDK version 1.2. First, we describe how to figure out the client-server RTMP message flow, and next we show how to do this using the rtmpclient module from the rtclite project.

The web client SDK 1.2 used this twilio.js javascript file. A quick look indicates that it internally loads another twilio.js file.

This second JavaScript file shows that the Flash-based client-server connection is accomplished using the MediaStream class which internally attempts RTMP connection to the service.

Using the hello-client-monkey.php example from the provider, but hosted on your website, and running wireshark on default RTMP port 1935, you can find more information about this client-server exchange. Make sure to use http instead of https for your hosted web app to avoid encryption. The following screenshots show a few example RTMP messages, and their order, and the parameters sent in the initial connect request.

In summary, the NetConnection's connect is done to "rtmp://chunder.twilio.com/chunder", with additional 6 arguments. The first argument looks like the capability token generated by the helper library. The second and third are null and empty string respectively. The fourth one is a JSON formatted string with some client side attributes. The fifth looks like the account SID. And last one looks like the client SDK version. After connect is complete, there are two createStream calls and a startCall RPC method. This is followed by a received RPC method callsid from the server. Next, the "input" stream name is published, and "output" stream name is played. This is followed by bunch of audio data.

We wrote an example application, client.py [7], which uses the rtmpclient module from the rtclite project, and automates client-server exchange process. It then connects the audio sent/received on the two streams to the local audio device. Although, not well tested, there are other command line options in this application that allow recording the received stream, or playing a file to the sent stream.

The command line application takes three mandatory parameters, which if not supplied, will be prompted for. These parameters are for account SID, auth token and application SID. The client application then connects to the service using RTMP and uses those supplied parameters. An example command line invocation is shown below, where the three parameters are fake - use the correct values in your test!

$ python -m rtclite.vnd.twilio.client --account=ACXXXXX --token=YYYYY --app=APZZZZZ

Use ctrl-C to terminate the client application.

Comparison

Here, we attempt to compare our two approaches for command line client: SIP and RTMP.

The audio codec used are G.711 in SIP and Speex in RTMP. Thus bandwidth requirement is more for SIP, but can theoretically give better quality, e.g., for real-time transcription. It may be possible to send G.711 in RTMP to server, but is not clear how to force the server to send back G.711 in RTMP. The media path is over UDP (RTP) for SIP vs. TCP for RTMP - making RTMP one more suseptible to network issues such as latency in interactive conversations. While both these are voice only at this time, using WebRTC or Mobile SDK based approach may allow video pipe. The py-webrtc project may become useful for this, once it is completed.

The SIP approach has provider supplied documentation on how to do incoming calls, but the RTMP approach requires more work to figure that out. The SIP approach seems to target server-to-server call flows, e.g., to connect your soft PBX to provider service. Due to router interference of SIP messages, it may not work in all the cases or all the time from a client machine. On the other hand, the RTMP approach is inspired by the client SDK, and hence suitable for clients. However, Flash-based client has been deprecated by the provider, and the corresponding service may no longer be available in near future. Moreover, RTMP is kind of an obsolete technology. Using WebRTC or Mobile SDK approach will work better in that case. Once SIP registration becomes available on the provider service, a standard command line SIP endpoint [6] should be enough for send/receive of calls.

References

  1. Twilio client: embed voice conversation in Web and Mobile Apps, https://www.twilio.com/client
  2. Twilio Labs: place a Twilio call from the shell, https://www.twilio.com/labs/bash
  3. Programmable voice SIP, https://www.twilio.com/docs/api/twilio-sip
  4. Can I configure my soft phone to work with the Twilio SIP endpoint, https://www.twilio.com/help/faq/voice/can-i-configure-my-soft-phone-to-work-with-the-twilio-sip-endpoint
  5. Rtclite: light weight implementations of real-time communication protocols and applications in Python, https://github.com/theintencity/rtclite
  6. Command line SIP endpoint, https://github.com/theintencity/rtclite/blob/master/rtclite/app/sip/caller.md
  7. Command line Twilio client, https://github.com/theintencity/rtclite/blob/master/rtclite/vnd/twilio/client.py

How to make phone calls from command line?

This article presents a command line SIP endpoint available as part of my rtclite [1] project. It is useful in a number of scenarios such as:
  • dialing out a phone number from command line, 
  • performing automated VoIP system tests, 
  • showing quick demos of communication systems, or 
  • experimenting with media processing on the voice path, e.g., for speech recognition, recording or text-to-speech. 
These tasks cannot easily be done using existing user interface based web, installed or mobile apps. I had implemented something like this, named sipua [2], about 15 years ago in C/C++ at Columbia University. There are other projects such as sipp [3], pjsua [4] or sipcmd [5] that implement some version of command line SIP user agent, but may have limitations such as lack of support for audio capture device, or hard to extend to add new media processing capability such as text-to-speech. This article describes a SIP endpoint written in Python [6] as part of my open source project.

Click here to see the full description of this SIP endpoint written in Python [6].

The project page shows how to use the SIP endpoint to send/receive instant messages and voice call. The command line options allow you to configure various attributes such as whether to register with a SIP server, how to respond to incoming requests, how to work around SIP entities that incorrectly handle CRLF for line endings, how to test signaling without media path, and so on. The project page also describes how to use the module in your own Python project.

Following video demo of the SIP endpoint shows interoperability with X-lite terminal and dialing out toll free numbers using a VoIP provider. It shows how to dial a phone number using a VoIP provider, how to send DTMF digits from terminal, and some experimental features such as text-to-speech and speech recognition. Do not forget to watch in full screen!


References

  1. Rtclite: light weight implementations of real-time communication protocols and applications in Python, http://www.rtclite.com, https://github.com/theintencity/rtclite
  2. Sipua: a SIP test user agent for Solaris, Linux, FreeBSD and Windows NT, http://www1.cs.columbia.edu/irt/cinema/doc/sipua.html
  3. SIPp: free open source test tool/traffic generator for SIP, http://sipp.sourceforge.net/ 
  4. Pjsua: open source command line SIP user agent (softphone), http://www.pjsip.org/pjsua.htm 
  5. sipcmd: the command line SIP/H.323/RTP softphone, https://github.com/tmakkonen/sipcmd
  6. caller: SIP application to initiate or receive VoIP calls from command line. https://github.com/theintencity/rtclite/blob/master/rtclite/app/sip/caller.md


Introducing rtclite

We have attempted to unify our diverse Python based projects related to SIP, RTP, XMPP, RTMP, REST, etc., into a single theme of real-time communication (RTC). In particular, we have migrated the relevant source code from other projects to the new "rtclite" project after cleanup and refactoring. Moving forward, any new functionality related to Python implementations of real-time communication protocols and applications will be done in this new project.

Over the next few weeks, we will continue to evolve this project, migrate or fix issues/comments,  and deprecate the previous older projects. 

I will also write blog posts here  to introduce various specific modules and how they are used in real world in the new few weeks.

From the project page at http://www.rtclite.com (currently forwards to https://github.com/theintencity/rtclite)

"Light-weight implementations of real-time communication protocols and application in Python

This project aims to create an open source repository of light weight implementations of real-time communication (RTC) protocols and applications. In a nutshell, it contains reference implementations, prototypes and sample applications, mostly written in Python, of various RTC standards defined in the IETF, W3C and elsewhere, e.g., RFC 3261 (SIP), RFC 3550 (RTP), RFC 3920 (XMPP), RTMP, etc."
We encourage all users of my open source projects to visit this new project, and if possible migrate your applications to use this new project.

rtclite


My WebRTC related papers from 2015

This post continues from the previous one and gives an overview of my ongoing research on web-based multimedia communication or WebRTC. I wrote five published co-authored research papers last year answering these questions - How do we solve some of the enterprise challenges of WebRTC using browser extensions? How do we address cloud challenges of a multi-services and apps vendor? How do we create pure web-based enterprise communication and collaboration system without depending on legacy protocols? How do we do user reachability in a multi-apps environment created by WebRTC? And, how do we do write-once-run-anywhere for WebRTC based team apps using a cross platform tool? Read on if one or more of these questions interest you...

Enterprise WebRTC powered by browser extensions

Traversing WebRTC flows created by external third-party websites across restricted enterprise firewalls is challenging. There are other challenges in adopting WebRTC in enterprises, e.g., how to integrate it seamlessly with existing communication equipments, or how to enforce enterprise policies such as call recording of WebRTC flows on third-party websites? We show how to use browser extensions to solve these problems. This systems paper is based on my implementations of two interesting projects.

“We use browser extensions to solve two important issues in adopting WebRTC (Web Real-Time Communications) in enterprises: how to integrate WebRTC-centric communication with existing systems such as corporate directories, communication infrastructure and intranet websites, and how to traverse media paths across enterprise firewalls. Vclick is a simple and easy to use web-based video collaboration application that enables click-to-call from other webpages. SecureEdge is a network border traversal system for policy and security enforcement, and consists of a secure media relay that sits at the network border or in the cloud. A browser extension in the enterprise user’s device transparently injects this media relay in every WebRTC media path needing to traverse the enterprise network edge to enable authenticated border traversal without help from the websites hosting the WebRTC pages. We attempt to generically support WebRTC in enterprises on a variety of application scenarios instead of creating another fragmented communication island. The challenges faced and techniques used in our proof-of-concepts are likely extensible to other enterprise WebRTC scenarios using the emerging HTML5 technologies.”

Keywords: WebRTC, enterprise communication, secure edge, browser extension, VoIP, video call, firewall traversal, media relay.

more>>

ALICE: Avaya Labs Innovations Cloud Engagement

Although we know how to create cloud-hosted services, platforms and infrastructures, little is known about cloud hosted communication and collaboration services, especially to enable multi-tenancy and self-service. This research focuses on the challenges of hosting cloud services for customer trials, where resources are limited to make existing services cloud ready or to fit a specific platform. This is based on my work on creating a cloud portal to host research-oriented early or pre-product services on the cloud, and identifying common themes and techniques.

“We present the architecture and implementation of our enterprise cloud portal named ALICE, Avaya Labs Innovations Cloud Engagement, which provides self-service access to service developers, tenants, and users to various communication and collaboration applications. Currently ALICE is used for field testing of advanced research prototype services based on technologies such as WebRTC and HTML5. This paper describes the current portal and extensions to support multi-tenancy.

We describe challenges in creating a self-service multi-tenant SaaS (software-as-a-service) portal to host communications and collaboration applications for small to medium scale businesses. The challenges faced and the techniques used in our architecture relate to security, provisioning, management, complexity, cost savings and multi-tenancy, and are applicable and useful to other cloud deployments of diverse enterprise applications.”

Keywords: Cloud, system architecture, portal, multi-tenancy, Internet telephony, enterprise communications, web collaboration.

more>>

Vclick: endpoint driven enterprise WebRTC

One of my earliest project at Avaya Labs was on creating a light-weight service for wide range of web communication and collaboration scenarios. Vclick is a collection of many loosely coupled apps that run the app-logic in the browser or endpoint, and mash up at the data level. It contains applications for video call, conferencing, video presence, text chat, click-to-call, screen sharing, shared notepad and whiteboard, and so on. It goes against the conventional web wisdom of thin-client, single-page-apps, or rigid GUI, and presents a new software architecture to create robust endpoint driven apps. The paper is really about how to keep the endpoints smart and network (or service) dumb in the context of collaboration applications.

“We present a robust, scalable and secure system architecture for web-based multimedia collaboration that keeps the application logic in the endpoint browser. Vclick is a simple and easy-to-use application for video interaction, collaboration and presence using HTML5 technologies including WebRTC (Web Real Time Communication), and is independent of legacy Voice-over-IP systems. Since its conception in early 2013, it has received many positive feedbacks, undergone improvements, and has been used in many enterprise communications research projects both in the cloud and on premise, on desktop as well as mobile. The techniques used and the challenges faced are useful to other emerging WebRTC applications.”

Keywords: WebRTC, enterprise communication, web video conferencing, resource-based architecture, web applications.

more>>

User reachability in multi-apps environment

With numerous "walled-garden" services and apps emerging because of WebRTC, there is a need to identify the best way to reach your contacts, irrespective of which app or service she is on. This systems paper describes my work on implementing a mobile (and desktop) app called Strata Top9, to quickly reach your important contacts. It really is a front-end to launch other applications. Unlike existing presence based systems, we propose to iterate during call initiation. The paper presents the software architecture and design decisions along with several motivational use cases of our project. It also details the concept of dynamic contacts, and endpoint driven caller and receiver policies.

“Recent progress in web real-time communication (WebRTC) promotes multi-apps environment by creating islands of communication apps where users of one website or service cannot easily communicate with those of another. We describe the architecture and implementation of a multi-platform system to do user reachability in multiple communication services where users decide how they want to be reached on multiple apps, e.g., in an organization that has voice-over-IP, web conferencing and messaging from different vendors. Our architecture separates the user contacts from reachability apps, supports user and endpoint driven reachability policies, and has several independent and non-interoperable WebRTC-based apps for two-way and multiparty multimedia communication. Our flexible implementation can be used for enterprise or personal communications, or as a white-labeled app for consumers of a business.”

Keywords: system design; mobile app; user reachability; multiservices; VoIP; WebRTC; caller policy.

more>>

Developing WebRTC-based team apps with a cross-platform mobile framework

Ability to write-once-run-anywhere still eludes many app developers. Luckily several cross-platform development tools exist. However, creating cross-platform communication and collaboration related apps is still challenging. This paper presents my implementation work on creating cross platform apps. In particular, four types of platforms - web app on PC and mobile, and installed app on PC and mobile - are considered, and seven different apps are covered for a wide range of enterprise use cases. Techniques and steps for creating such cross platform apps are presented along with lessons learned based on practical experience. Additionally, considerations for iOS and wearable Glass devices are presented.

“We present lessons learned in developing cross platform multi-party team applications. Our apps include a range of communication and collaboration scenarios: document and content sharing in a team space, an agent-based meeting helper, phone number dialer via a voice-over-IP (VoIP) gateway, and multi-party call in peer-to-peer or client-server mode. We use web real-time communication (WebRTC) to enable the audio and video media paths in the apps. We use frameworks such as Chrome Apps and Apache Cordova to create apps that can be accessed from a browser, or installed on a desktop, mobile device, or wearable. The challenges and techniques described in our paper related to audio, video, network, power conservation and security are important to other developers building cross-platform apps involving WebRTC, VoIP and cloud services.”

Keywords: HTML5, Apache Cordova, Chrome Apps, WebRTC, Mobile, Cloud, Wearable.

more>>