Articles on technology, Internet, Protocols, Web, Open Source, VoIP, P2P, SIP, RTMP and WebRTC.
Impress.js 3D presentations on cloud, communication, WebRTC and mobile
Below, I list my presentations and associated demo videos, largely based on the impress.js framework. When viewing a presentation, use an HTML5-capable browser such as Chrome or Safari, and please wait for it to load completely, and for the browser's loading icon to stop spinning, before starting the slide show.
My WebRTC related papers from 2015
This post continues from the previous one and gives an overview of my ongoing research on web-based multimedia communication, or WebRTC. I co-authored five research papers published last year, answering these questions: How do we solve some of the enterprise challenges of WebRTC using browser extensions? How do we address the cloud challenges of a multi-services and apps vendor? How do we create a pure web-based enterprise communication and collaboration system without depending on legacy protocols? How do we do user reachability in a multi-apps environment created by WebRTC? And how do we do write-once-run-anywhere for WebRTC-based team apps using a cross-platform tool? Read on if one or more of these questions interest you...
Enterprise WebRTC powered by browser extensions
Traversing WebRTC flows created by external third-party websites across restricted enterprise firewalls is challenging. There are other challenges in adopting WebRTC in enterprises, e.g., how to integrate it seamlessly with existing communication equipment, or how to enforce enterprise policies such as call recording of WebRTC flows on third-party websites. We show how to use browser extensions to solve these problems. This systems paper is based on my implementations of two interesting projects.
“We use browser extensions to solve two important issues in adopting WebRTC (Web Real-Time Communications) in enterprises: how to integrate WebRTC-centric communication with existing systems such as corporate directories, communication infrastructure and intranet websites, and how to traverse media paths across enterprise firewalls. Vclick is a simple and easy to use web-based video collaboration application that enables click-to-call from other webpages. SecureEdge is a network border traversal system for policy and security enforcement, and consists of a secure media relay that sits at the network border or in the cloud. A browser extension in the enterprise user’s device transparently injects this media relay in every WebRTC media path needing to traverse the enterprise network edge to enable authenticated border traversal without help from the websites hosting the WebRTC pages. We attempt to generically support WebRTC in enterprises on a variety of application scenarios instead of creating another fragmented communication island. The challenges faced and techniques used in our proof-of-concepts are likely extensible to other enterprise WebRTC scenarios using the emerging HTML5 technologies.”
Keywords: WebRTC, enterprise communication, secure edge, browser extension, VoIP, video call, firewall traversal, media relay.
ALICE: Avaya Labs Innovations Cloud Engagement
Although we know how to create cloud-hosted services, platforms and infrastructures, little is known about cloud hosted communication and collaboration services, especially to enable multi-tenancy and self-service. This research focuses on the challenges of hosting cloud services for customer trials, where resources are limited to make existing services cloud ready or to fit a specific platform. This is based on my work on creating a cloud portal to host research-oriented early or pre-product services on the cloud, and identifying common themes and techniques.
“We present the architecture and implementation of our enterprise cloud portal named ALICE, Avaya Labs Innovations Cloud Engagement, which provides self-service access to service developers, tenants, and users to various communication and collaboration applications. Currently ALICE is used for field testing of advanced research prototype services based on technologies such as WebRTC and HTML5. This paper describes the current portal and extensions to support multi-tenancy.
We describe challenges in creating a self-service multi-tenant SaaS (software-as-a-service) portal to host communications and collaboration applications for small to medium scale businesses. The challenges faced and the techniques used in our architecture relate to security, provisioning, management, complexity, cost savings and multi-tenancy, and are applicable and useful to other cloud deployments of diverse enterprise applications.”
Keywords: Cloud, system architecture, portal, multi-tenancy, Internet telephony, enterprise communications, web collaboration.
Vclick: endpoint driven enterprise WebRTC
One of my earliest projects at Avaya Labs was on creating a light-weight service for a wide range of web communication and collaboration scenarios. Vclick is a collection of many loosely coupled apps that run the app-logic in the browser or endpoint, and mash up at the data level. It contains applications for video call, conferencing, video presence, text chat, click-to-call, screen sharing, shared notepad and whiteboard, and so on. It goes against the conventional web wisdom of thin-client, single-page-apps, or rigid GUI, and presents a new software architecture to create robust endpoint-driven apps. The paper is really about how to keep the endpoints smart and the network (or service) dumb in the context of collaboration applications.
“We present a robust, scalable and secure system architecture for web-based multimedia collaboration that keeps the application logic in the endpoint browser. Vclick is a simple and easy-to-use application for video interaction, collaboration and presence using HTML5 technologies including WebRTC (Web Real Time Communication), and is independent of legacy Voice-over-IP systems. Since its conception in early 2013, it has received many positive feedbacks, undergone improvements, and has been used in many enterprise communications research projects both in the cloud and on premise, on desktop as well as mobile. The techniques used and the challenges faced are useful to other emerging WebRTC applications.”
Keywords: WebRTC, enterprise communication, web video conferencing, resource-based architecture, web applications.
User reachability in multi-apps environment
With numerous "walled-garden" services and apps emerging because of WebRTC, there is a need to identify the best way to reach your contacts, irrespective of which app or service they are on. This systems paper describes my work on implementing a mobile (and desktop) app called Strata Top9, to quickly reach your important contacts. It really is a front-end to launch other applications. Unlike existing presence-based systems, we propose to iterate during call initiation. The paper presents the software architecture and design decisions along with several motivational use cases of our project. It also details the concept of dynamic contacts, and endpoint-driven caller and receiver policies.
“Recent progress in web real-time communication (WebRTC) promotes multi-apps environment by creating islands of communication apps where users of one website or service cannot easily communicate with those of another. We describe the architecture and implementation of a multi-platform system to do user reachability in multiple communication services where users decide how they want to be reached on multiple apps, e.g., in an organization that has voice-over-IP, web conferencing and messaging from different vendors. Our architecture separates the user contacts from reachability apps, supports user and endpoint driven reachability policies, and has several independent and non-interoperable WebRTC-based apps for two-way and multiparty multimedia communication. Our flexible implementation can be used for enterprise or personal communications, or as a white-labeled app for consumers of a business.”
Keywords: system design; mobile app; user reachability; multiservices; VoIP; WebRTC; caller policy.
Developing WebRTC-based team apps with a cross-platform mobile framework
The ability to write-once-run-anywhere still eludes many app developers. Luckily, several cross-platform development tools exist. However, creating cross-platform communication and collaboration apps is still challenging. This paper presents my implementation work on creating cross-platform apps. In particular, four types of platforms (web app on PC and mobile, and installed app on PC and mobile) are considered, and seven different apps are covered for a wide range of enterprise use cases. Techniques and steps for creating such cross-platform apps are presented along with lessons learned from practical experience. Additionally, considerations for iOS and wearable Glass devices are presented.
“We present lessons learned in developing cross platform multi-party team applications. Our apps include a range of communication and collaboration scenarios: document and content sharing in a team space, an agent-based meeting helper, phone number dialer via a voice-over-IP (VoIP) gateway, and multi-party call in peer-to-peer or client-server mode. We use web real-time communication (WebRTC) to enable the audio and video media paths in the apps. We use frameworks such as Chrome Apps and Apache Cordova to create apps that can be accessed from a browser, or installed on a desktop, mobile device, or wearable. The challenges and techniques described in our paper related to audio, video, network, power conservation and security are important to other developers building cross-platform apps involving WebRTC, VoIP and cloud services.”
Keywords: HTML5, Apache Cordova, Chrome Apps, WebRTC, Mobile, Cloud, Wearable.
My WebRTC related papers from 2013
A Case for SIP in JavaScript
Taking on WebRTC in an enterprise
Building communicating web applications leveraging endpoints and cloud resource service
Private overlay of enterprise social data and interactions in the public web context
The Internet Video City
(For the last month and a half, I have been aggressively involved in another open source project, "videocity". This article describes the salient features and novel ideas in that project.)
The goal of the Internet video city project is to provide open source software tools, both client and server, for video communication and sharing. Unlike other file sharing systems, this is targeted towards video and live video sharing in small groups. Unlike other video communication services, this project provides the tools needed to build a service.
High level description
At a high level, the video communication is abstracted as a city. An individual can sign up with his email user@domain.com and own a home with a URL of the form http://server:5080/user@domain.com. This is also the location of the default guest room of that user. The user can build other rooms inside this URL, e.g., for hosting an online family gathering, he can get a room with the name "Family Gathering" and a room URL of the form http://server:5080/user@domain.com/Family.Gathering. Each room can be made public or private. A public room is accessible to anyone visiting the URL of the room, whereas a private room needs explicit permission to enter.
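As a minimal sketch of the URL scheme described above, the following Python helper builds a home or room URL from an email and a room name. The function name `room_url`, the default base `http://server:5080`, and the rule that spaces in a room name become dots (inferred only from the "Family Gathering" example) are assumptions for illustration, not the project's actual code.

```python
def room_url(email, room_name=None, base="http://server:5080"):
    """Return the home URL for a user, or the URL of a named room.

    Assumption: spaces in the room name are replaced by dots, as in the
    "Family Gathering" -> "Family.Gathering" example; the real project
    may normalize names differently.
    """
    url = f"{base}/{email}"
    if room_name:
        url += "/" + ".".join(room_name.split())
    return url


print(room_url("user@domain.com"))
print(room_url("user@domain.com", "Family Gathering"))
```

The home URL doubles as the default guest room, so calling the helper without a room name covers both cases.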
Once you have entered a room, you see the other members in the room, and can communicate with them using real-time audio, video and text chat. You can share media files such as photos and videos from your computer with others in the room. You can also share online photos and videos with others. All these shared resources are put in an active session and disappear when the room is closed, i.e., when all members have left the room.
The owner of the room can decorate his room by uploading, recording or editing the room's content. A room's content is described using an XML file containing multiple play lists. Each play list contains a sequence of media files or URLs. When you enter a room, you see all the pre-configured play lists in that room. This allows the owner to, for example, create a room with his family pictures and videos in a slide show, and give out the URL to others to view the photos. A media resource in a play list can be text, image or audio/video. The image and audio/video can be uploaded from the user's computer, downloaded from a web URL or recorded using the user's camera in real-time. The play list can be readily edited using drag-and-drop, the built-in text editor or various button controls.
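To make the idea of an XML room description with multiple play lists concrete, here is a small Python sketch that parses one. The element and attribute names (`room`, `playlist`, `item`, `type`, `src`) and the sample URLs are invented for illustration; the article does not give the project's real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical room description; the element and attribute names are
# invented for illustration and are not the project's real schema.
ROOM_XML = """
<room name="Family Gathering">
  <playlist name="Vacation slide show">
    <item type="image" src="http://example.com/beach.jpg"/>
    <item type="text">Welcome to our family room!</item>
  </playlist>
  <playlist name="Recorded greetings">
    <item type="video" src="greeting.flv"/>
  </playlist>
</room>
"""


def load_playlists(xml_text):
    """Return {playlist name: [(media type, source URL or inline text), ...]}."""
    root = ET.fromstring(xml_text)
    playlists = {}
    for playlist in root.findall("playlist"):
        items = []
        for item in playlist.findall("item"):
            # A media resource is either referenced by URL or carried inline.
            content = item.get("src") or (item.text or "").strip()
            items.append((item.get("type"), content))
        playlists[playlist.get("name")] = items
    return playlists


for name, items in load_playlists(ROOM_XML).items():
    print(name, items)
```

Keeping each play list as an ordered list of (type, content) pairs mirrors the description of a play list as a sequence of text, image or audio/video resources.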
Each signed in user also has an inbox. The inbox is a special XML file that gets loaded when a user logs in, and contains play lists that are sent by other users to this user. When you enter a room, you have an option to send a play list to the owner of the room, which turns up in the owner's inbox. You can record the play list using your camera, or create one using resources available from the web. The play list stored in the inbox is privately available only to the owner of the room.
This simple concept of play lists and rooms allows us to implement various communication scenarios, for example, real-time communication, video mails, publicly posted videos, and video web sites.
Novel idea
One of the novel concepts used in the project is the soft-card. A soft-card is a digital version of your ID card or visiting card. There are two types of cards: a private login card is your confidential ID card that you use to log in to the site, and an Internet visiting card is your room's visiting card, which you give out to your friends so that they can visit your room. Usually each signed-in person has a private login card, and each room owned by the person can have an Internet visiting card.
A soft-card looks like a digital image of your real ID and visiting cards. It is actually an image file in PNG format. The image has a photo, your name or your room's name, a list of keywords identifying your room, and the URL of your room. Unlike a regular PNG file, a soft-card has additional meta information that is used for secure identification and access. In particular, your private login card has your RSA private key (refer to PKI) and your Internet visiting card has an X.509 certificate, using an RSA public key, signed by the server. This meta information, such as keys, certificates, names, emails and keywords, is stored in information chunks of the PNG file itself.
As with public key cryptography, these digital files allow us to implement security, authentication, access control, privacy, confidentiality, etc. Essentially, anything you can do with PKI, you can do with these soft-cards. Additionally, these soft-cards give the visual appearance of an ID card or a visiting card containing the URL which they represent. Users receive them in email on signup, and can give out a visiting card to others in email. An example visiting card is shown at the top of this article. If you edit the card's file or image in any way, e.g., by converting it to JPEG and back or by using a photo editor, then the card's key information will become invalid and unusable. Note that a card is valid only within the domain it is created for. Thus a card created for http://server1/room1 cannot be used by http://server2/room1 even if both server1 and server2 virtual domains are hosted by the same server.
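The sketch below illustrates how extra information can travel inside a PNG file, using only the Python standard library. It assumes the standard `tEXt` chunk type and a made-up keyword `room-url`; the article only says "information chunks", so the chunk type, keyword and helper names here are illustrative assumptions rather than the project's actual card format. A 1x1 grayscale image stands in for the card picture.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type, data):
    """Encode one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png):
    """Yield (type, data) for every chunk in a PNG byte string."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        chunk_type = png[pos + 4:pos + 8]
        yield chunk_type, png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC


def add_text(png, keyword, text):
    """Return a copy of png with a tEXt chunk inserted just before IEND."""
    out = PNG_SIGNATURE
    for chunk_type, data in iter_chunks(png):
        if chunk_type == b"IEND":
            payload = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
            out += make_chunk(b"tEXt", payload)
        out += make_chunk(chunk_type, data)
    return out


def read_text(png):
    """Return {keyword: text} for all tEXt chunks in png."""
    result = {}
    for chunk_type, data in iter_chunks(png):
        if chunk_type == b"tEXt":
            keyword, _, text = data.partition(b"\x00")
            result[keyword.decode("latin-1")] = text.decode("latin-1")
    return result


# A 1x1 grayscale PNG stands in for the card image.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
blank = (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
         + make_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
         + make_chunk(b"IEND", b""))

# "room-url" is a hypothetical keyword chosen for this example.
card = add_text(blank, "room-url", "http://server:5080/user@domain.com")
print(read_text(card))
```

Because the extra chunk is ancillary, ordinary image viewers still display the picture, while software that knows the keyword can read the embedded information back.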
Once we have the login (private key) and visiting (public key) cards, implementing the rest of the security mechanisms is straightforward. For example, resources in an inbox can be encrypted using the public key of the owner, so that only a private login card can decrypt them. The public rooms are signed by the owner's private key, so that anyone with the visiting card of the room can verify the signature. When sending a media resource to another user, PKI can be used to establish a secure communication session. A room can be made private by allowing connections only from people who have a valid visiting card for that room, and having the owner send out the visiting card to his friends and family using an independent channel such as email. A room can be made public by uploading the visiting card to the room itself, so that anyone with the URL can first download the visiting card (i.e., public key) and use that to connect to the room. Although we haven't implemented most of the security mechanisms, we have the basic soft-card concept implemented in the project. In particular, you can create your cards, edit the layout of a card during creation, download the cards after creation, and upload them in the client to join a room or to log in. One thing to note is that within the Flash Player environment, the amount of security achievable using PKI is limited. But since we have our own video server implementation as well, we can do some novel tricks in that regard.
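The sign-with-the-login-card, verify-with-the-visiting-card idea can be sketched with textbook RSA in a few lines of Python. This is a toy under loud assumptions: the primes are tiny, there is no padding, and the names `sign`, `verify` and `digest` are invented here. It is not the project's code, and it must not be used for real security. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
import hashlib

# Toy RSA key with tiny primes, for illustration only. Real cards would
# carry a full-size key; never use numbers this small for actual security.
P, Q = 61, 53
N = P * Q                          # public modulus
E = 17                             # public exponent (visiting card side)
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (login card side)


def digest(content):
    """Hash the content and reduce it into the toy key's range."""
    return int.from_bytes(hashlib.sha256(content).digest(), "big") % N


def sign(content, private_exponent=D):
    """Owner side: sign the room content with the private key."""
    return pow(digest(content), private_exponent, N)


def verify(content, signature, public_exponent=E):
    """Visitor side: check the signature using only the public key."""
    return pow(signature, public_exponent, N) == digest(content)


room = b"<room name='Family Gathering'/>"
signature = sign(room)
print(verify(room, signature))  # True: the signature matches the content
```

With a modulus this small, different contents can collide on the same digest, which is one more reason a real deployment would rely on a full-size key and a vetted cryptography library.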
Product design ideas
There are several product design ideas we implemented in the project: (1) consistency, (2) flowing and smooth interface, and (3) performance. In this section, I describe these ideas and how they are implemented.
Consistency is very important in user interface design. The look and feel of various buttons should be consistent. Common operations should be consistent with what people are used to doing. For example, most Windows users see the 'close', 'maximize' and 'minimize' buttons in the top-right corner. Most Mac users see the bottom bar as the tools or commands bar. Most instant messaging users see notifications in the bottom-right corner of their screen. We used these concepts in our UI design as well.
Flash allows us to implement a nice, smooth and flowing user interface. When you go from one room to another, the view slides your window from one room to the other. The sliding window component in the project nicely abstracts out the details of this container. When a help video is played, it animates to the full view, and when it is paused, it goes back to the original position. For help videos, flowing subtitles along with audio/video give a better user experience. Computer users are comfortable with drag-and-drop operations using the mouse. In our project, the play list editing, video window re-organizing, delete button, etc., use the drag-and-drop mode of operation.
Performance is important once the project grows to a significant size. In particular, Flash Player spends a lot of cycles rendering images. This is improved significantly in our project since we use only programmatic skins for all our buttons and icons. Moreover, programmatic skins scale nicely when going to full screen or a different size.
There were a number of lessons we learned in this project from the product design perspective. Moreover, being responsible for both product design and product engineering helped us avoid ambiguity, which is usually seen in multiple team projects.
The big picture
Although the project is still a "work in progress" and a lot of work remains, I wanted to give a big picture of the project. Flash Player is a great browser plugin. However, being proprietary makes it hard for others to use it to its full potential. For example, until recently video communication was restricted to Flash Media Server only, and file uploads from the local computer to Flash Player were not allowed without going through the server. Although Adobe is making significant progress in keeping the developer community engaged (e.g., making the RTMP protocol open, or making file uploads and downloads available in the new Flash Player), there will always be some restrictions in Flash Player. For example, the absence of an H.264 encoder or a good audio quality/preprocessing engine prevents us from using it efficiently for true H.264 video communication or good real-time audio communication. In any case, since the RTMP protocol is open, and since there are a number of existing open source RTMP implementations, one can use back-end RTMP-based servers to perform some processing.
This videocity project gives us back-end tools to intercept RTMP, integrate web communication, and expose a single server to support various requirements of video conferencing. One can ask whether this will scale. The answer is: maybe not. The reason for doing the project, though, is that it fits nicely in the big picture of a P2P-SIP based communication framework. Flash gives a nice ubiquitous browser-based front end, whereas our videocity server gives tools that can be integrated with a peer-to-peer network. Thus we can gain the advantages of both worlds.
Distributing a conference in a P2P network is an already researched problem. Several solutions exist, ranging from application-level multicast for large conferences, to full mesh for small conferences, to picking a few servers as relay bridges. Maintaining shared distributed state of the conference and collaboration is interesting to explore. The SIP community has done significant work on a centralized conferencing framework, e.g., in the IETF XCON working group. The P2P-SIP working group is creating a protocol for standards-based peer-to-peer network maintenance and lookup for SIP service. Finally, some API or interface specification is needed for videocity's client-server model so that others can build clients or server adaptors to integrate between XCON, P2P-SIP and videocity. In particular, we will define all the interface elements such as the format of the soft-card, various RPC calls for uploading or downloading resources, sharing play lists and authenticating users, as well as communication mechanisms.
In summary, the project gives developers a starting point from which to build a video communication service, video message platform, video recording and editing system, collaboration engine, media sharing software, video blog web site, video rooms, multi-party conferencing applications, desktop clients, browser extensions, application sharing, new client applications, and so on. The client-server tools available in the project allow you to record a video or snapshot photo from your camera and store it in a local file, create play lists of various heterogeneous media resources, and share live and stored media with others using the system.
There is no hosted service for this software, and we don't plan to have one. This is because our goal is to go peer-to-peer, where various installations of the software will discover and communicate with each other!
Thank you for your reading time, and we love feedback!