"Light-weight implementations of real-time communication protocols and application in Python


Articles on technology, Internet, Protocols, Web, Open Source, VoIP, P2P, SIP, RTMP and WebRTC.


This post continues from the previous one and gives an overview of my ongoing research on web-based multimedia communication or WebRTC. I wrote five published co-authored research papers last year answering these questions - How do we solve some of the enterprise challenges of WebRTC using browser extensions? How do we address cloud challenges of a multi-services and apps vendor? How do we create pure web-based enterprise communication and collaboration system without depending on legacy protocols? How do we do user reachability in a multi-apps environment created by WebRTC? And, how do we do write-once-run-anywhere for WebRTC based team apps using a cross platform tool? Read on if one or more of these questions interest you...
Traversing WebRTC flows created by external third-party websites across restricted enterprise firewalls is challenging. There are other challenges in adopting WebRTC in enterprises, e.g., how to integrate it seamlessly with existing communication equipments, or how to enforce enterprise policies such as call recording of WebRTC flows on third-party websites? We show how to use browser extensions to solve these problems. This systems paper is based on my implementations of two interesting projects.
“We use browser extensions to solve two important issues in adopting WebRTC (Web Real-Time Communications) in enterprises: how to integrate WebRTC-centric communication with existing systems such as corporate directories, communication infrastructure and intranet websites, and how to traverse media paths across enterprise firewalls. Vclick is a simple and easy to use web-based video collaboration application that enables click-to-call from other webpages. SecureEdge is a network border traversal system for policy and security enforcement, and consists of a secure media relay that sits at the network border or in the cloud. A browser extension in the enterprise user’s device transparently injects this media relay in every WebRTC media path needing to traverse the enterprise network edge to enable authenticated border traversal without help from the websites hosting the WebRTC pages. We attempt to generically support WebRTC in enterprises on a variety of application scenarios instead of creating another fragmented communication island. The challenges faced and techniques used in our proof-of-concepts are likely extensible to other enterprise WebRTC scenarios using the emerging HTML5 technologies.”
Keywords: WebRTC, enterprise communication, secure edge, browser extension, VoIP, video call, firewall traversal, media relay.
Although we know how to create cloud-hosted services, platforms and infrastructures, little is known about cloud hosted communication and collaboration services, especially to enable multi-tenancy and self-service. This research focuses on the challenges of hosting cloud services for customer trials, where resources are limited to make existing services cloud ready or to fit a specific platform. This is based on my work on creating a cloud portal to host research-oriented early or pre-product services on the cloud, and identifying common themes and techniques.
“We present the architecture and implementation of our enterprise cloud portal named ALICE, Avaya Labs Innovations Cloud Engagement, which provides self-service access to service developers, tenants, and users to various communication and collaboration applications. Currently ALICE is used for field testing of advanced research prototype services based on technologies such as WebRTC and HTML5. This paper describes the current portal and extensions to support multi-tenancy.
We describe challenges in creating a self-service multi-tenant SaaS (software-as-a-service) portal to host communications and collaboration applications for small to medium scale businesses. The challenges faced and the techniques used in our architecture relate to security, provisioning, management, complexity, cost savings and multi-tenancy, and are applicable and useful to other cloud deployments of diverse enterprise applications.”
Keywords: Cloud, system architecture, portal, multi-tenancy, Internet telephony, enterprise communications, web collaboration.
One of my earliest project at Avaya Labs was on creating a light-weight service for wide range of web communication and collaboration scenarios. Vclick is a collection of many loosely coupled apps that run the app-logic in the browser or endpoint, and mash up at the data level. It contains applications for video call, conferencing, video presence, text chat, click-to-call, screen sharing, shared notepad and whiteboard, and so on. It goes against the conventional web wisdom of thin-client, single-page-apps, or rigid GUI, and presents a new software architecture to create robust endpoint driven apps. The paper is really about how to keep the endpoints smart and network (or service) dumb in the context of collaboration applications.
“We present a robust, scalable and secure system architecture for web-based multimedia collaboration that keeps the application logic in the endpoint browser. Vclick is a simple and easy-to-use application for video interaction, collaboration and presence using HTML5 technologies including WebRTC (Web Real Time Communication), and is independent of legacy Voice-over-IP systems. Since its conception in early 2013, it has received many positive feedbacks, undergone improvements, and has been used in many enterprise communications research projects both in the cloud and on premise, on desktop as well as mobile. The techniques used and the challenges faced are useful to other emerging WebRTC applications.”
Keywords: WebRTC, enterprise communication, web video conferencing, resource-based architecture, web applications.
With numerous "walled-garden" services and apps emerging because of WebRTC, there is a need to identify the best way to reach your contacts, irrespective of which app or service she is on. This systems paper describes my work on implementing a mobile (and desktop) app called Strata Top9, to quickly reach your important contacts. It really is a front-end to launch other applications. Unlike existing presence based systems, we propose to iterate during call initiation. The paper presents the software architecture and design decisions along with several motivational use cases of our project. It also details the concept of dynamic contacts, and endpoint driven caller and receiver policies.
“Recent progress in web real-time communication (WebRTC) promotes multi-apps environment by creating islands of communication apps where users of one website or service cannot easily communicate with those of another. We describe the architecture and implementation of a multi-platform system to do user reachability in multiple communication services where users decide how they want to be reached on multiple apps, e.g., in an organization that has voice-over-IP, web conferencing and messaging from different vendors. Our architecture separates the user contacts from reachability apps, supports user and endpoint driven reachability policies, and has several independent and non-interoperable WebRTC-based apps for two-way and multiparty multimedia communication. Our flexible implementation can be used for enterprise or personal communications, or as a white-labeled app for consumers of a business.”
Keywords: system design; mobile app; user reachability; multiservices; VoIP; WebRTC; caller policy.
Ability to write-once-run-anywhere still eludes many app developers. Luckily several cross-platform development tools exist. However, creating cross-platform communication and collaboration related apps is still challenging. This paper presents my implementation work on creating cross platform apps. In particular, four types of platforms - web app on PC and mobile, and installed app on PC and mobile - are considered, and seven different apps are covered for a wide range of enterprise use cases. Techniques and steps for creating such cross platform apps are presented along with lessons learned based on practical experience. Additionally, considerations for iOS and wearable Glass devices are presented.
“We present lessons learned in developing cross platform multi-party team applications. Our apps include a range of communication and collaboration scenarios: document and content sharing in a team space, an agent-based meeting helper, phone number dialer via a voice-over-IP (VoIP) gateway, and multi-party call in peer-to-peer or client-server mode. We use web real-time communication (WebRTC) to enable the audio and video media paths in the apps. We use frameworks such as Chrome Apps and Apache Cordova to create apps that can be accessed from a browser, or installed on a desktop, mobile device, or wearable. The challenges and techniques described in our paper related to audio, video, network, power conservation and security are important to other developers building cross-platform apps involving WebRTC, VoIP and cloud services.”
Keywords: HTML5, Apache Cordova, Chrome Apps, WebRTC, Mobile, Cloud, Wearable.
![]() |
| Comparison of typical SIP vs WebRTC application stack. The dotted line separates what is programmed by the application developer and what is provided by the platform. |
![]() |
| Interoperability differences in SIP vs WebRTC; (o) means optional. |
| Bellhead | Nethead | Webhead | |
|---|---|---|---|
| Evolution | Initially there were Bellheads. They were happy living in their islands, working hard towards what the island demanded, and created applications and services for the island. | Then Netheads evolved, and separated applications from the grasp of the services. And interconnected the islands. They created Internet. They created the world-wide-web. They created SIP. And the list goes on. | Then Webheads emerged, and they started moving applications to the web. Slowly and steadily. First, it was email to webmail. Then documents on web editor. Then VoIP to web telephony. Now it looks like they have spread every where and practically moved all kinds of applications to the web. |
| Control | Believe in centralized control, like one super-power to rule them all. The central element has enough gravity to pull many applications and services, resulting is heavy weight servers but thin clients. | Believe in distributed control or no-control, resulting in chaos at times. The fluidity of the services and applications keeps them decoupled, or loosely connected with each other, spreading their wings to the peripherals. | Believe in centralized control in the form of the website on which their application is hosted, but often accept the fact that there can be numerous such control elements. Work hard to avoid any connectivity or information leak to other controls, outside their own. |
| Robustness | Everything hardware must be 99.999% reliable. Overall system must be 99.9% reliable. If system goes down, even if nobody is using it right now, they will lose business. Things have strict guarantees. | The system works most of the time, but fails sometimes, but no one person or entity may be blamed, because it is collective responsibility. No guarantees, just best effort service, whatever... whatever works well | The system works if it does, and does not if it does not. They work hard to keep the system running. But many systems are poorly designed, and often fail on sudden popularity of the website. Often blame for the failure is put on the network, and many times on the database. |
| Example | To add video communication on top of voice calls is a massive renovation project involving upgrading existing services, making the pipes bigger to carry video, incorporating fixed bit rate video codecs to guarantee quality, and upgrading terminal devices to support video. It must work all the time from day one. | To add video on top of a voice call client-server application, is just changing the application to use the desktop webcam and display, and pushing new video packet along side voice, without changing any other infrastructure element in most cases. | To add video on top of a voice call web page, there is a redesign of the website to accommodate the visual elements, so that everyone viewing the same web page can see and hear each other. Regarding the codecs and bandwidth, delegate that to Flash or WebRTC. |
rtmp-payload := enc-type[1B] | type[1B] | remaining
enc-type := is-intra[4b] | codec-type[4b]
is-intra := 1 if intra and 2 if non-intra
codec-type := 7 for H.264/AVC
remaining for config := version[1B] | profile-idc[1B]
| profile-compat[1B]
| level-idc[1B]
| length-flag[1B]
| sps-count[1B] | sps0 ...
| pps-count[1B] | pps0 ...
length-flag := 0[6b] | value[2b] where value + 1 is length-size
sps-count := 0[3b] | count[5b] where count is number of sps
pps-count := number of pps elements
sps(n) := length[2B] | sps
pps(n) := length[2B] | pps
remaining-picture := delay[3B] | nalu0 | nalu1 ...
nalu(n) := length | nalu
length := number in length-size bytes
nalu := NAL unit as per H.264
nalu := nal-flags[1B] | encoded-data
nal-flags := forbidden[1b] | nri[2b] | nal-type [5b]
To fragment:
let encoded-data = fragment0 | fragment1 | fragment2...
encoded-data of fragment(n) := orig-nal-flags[1B] | fragment(n)
orig-nal-flags := start[1b] | end[1b] | ignore[1b]
| orig-nal-type[5b]
start := 1 if first fragment else 0
end := 1 if last fragment else 0
To aggregate:
encoded-data of aggregate := nalu0 | nalu1 | nalu2 ...
nalu(n) := length[2B] | orig-nalu(n)