This post gives an overview of my past and ongoing research on web-based multimedia communication. Last year I co-authored four published research papers that address these questions: How does WebRTC interoperate with existing SIP/VoIP systems, and how does SIP in JavaScript compare with other approaches? What challenges lie ahead for enterprises adopting WebRTC, and what can they do about them? How can developers take advantage of WebRTC and HTML5 to create rich internet applications in the cloud? How does emerging HTML5 technology allow better enterprise collaboration using rich application mash-ups? Read on if one or more of these questions interest you...
A Case for SIP in JavaScript
WebRTC (Web Real-Time Communication) promises easy interaction with others on a website using just your web browser. SIP (Session Initiation Protocol) is the backbone of Internet communications, interconnecting many of the devices, servers and systems you use daily for your calls. Connecting WebRTC with SIP/RTP raises more questions than we have answers for, e.g., does interoperability really work? What changes do you need in the existing devices and systems? Does the conversion happen at the server or in the browser? What are the trade-offs of doing the translation? What are the risks if you go full throttle with WebRTC in the browser? And many more. There is a need for a fair comparison that evaluates not only what is theoretically possible but also what is practically achievable with existing systems.
"This paper presents the challenges and compares the alternatives to interoperate between the Session Initiation Protocol (SIP)-based systems and the emerging standards for the Web Real-Time Communication (WebRTC). We argue for an end-point and web-focused architecture, and present both sides of the SIP in JavaScript approach. Until WebRTC has ubiquitous cross-browser availability, we suggest a fall back strategy for web developers — detect and use HTML5 if available, otherwise fall back to a browser plugin."
Taking on WebRTC in an enterprise
On one hand, WebRTC enables end-users to communicate with others directly from their browsers; on the other, it opens a can of worms, or rather a door to many novel communicating applications, that scares your enterprise IT guy. Existing enterprise policies and tools are designed to work with existing communication systems such as email and VoIP, and employ a bunch of edge devices such as firewalls, reverse proxies, email filters and SBCs (session border controllers) to protect enterprise data and interactions from the outside world. Unfortunately, session-less WebRTC media flows and, more importantly, end-to-end encrypted data channels pose the same threat to enterprise security that peer-to-peer applications did. The first reaction from your IT guy will likely be to block all such peer-to-peer flows at the firewall. There is a need to systematically understand the risks and propose potential solutions so that the pearls of WebRTC can co-exist with the chains-and-locks of enterprise security.
"WebRTC will have a major impact on enterprise communications, as well as consumer communications and their interactions with enterprises. This article illustrates and discusses a number of issues that are specific to WebRTC enterprise usage. Some of these relate to security: firewall traversal, access control, and peer-to-peer data flows. Others relate to compliance: recording, logging, and enforcing enterprise policies. Additional enterprise considerations relate to integration and interoperation with existing communication infrastructure and session-centric telephony systems."
Building communicating web applications leveraging endpoints and cloud resource service
One problem with the growth of SIP was that there were not many applications that demonstrated its capabilities beyond replacing a telephone call. This resulted in a telephony mindset dominating the evolution of both standards and deployments. At its core, WebRTC does most of what SIP aims for: peer-to-peer media flows and control in the end-points. Unfortunately, the telephony influence in existing SIP devices, servers and providers has hidden that benefit from the end-user. There is a need to avoid such a dominating telephony influence on WebRTC by demonstrating how easy it is to build communicating web applications without relying on existing (and expensive) VoIP infrastructure.
Secondly, the open web is threatened by closed social websites that tend to lock user data in their ecosystem, and force the user to only use what they build, go where they exist and talk to whom they allow. This developer-focused research is an attempt to go against them, by showing a rich internet application architecture focused on application logic in the end-point and application mash-ups at the data level. It shows my 15+ crucial web widgets, each written in a few hundred lines of code, covering a wide range of use cases from video click-to-call to automatic call distribution, call queues and shared whiteboards, and my web applications such as multiparty video collaboration, instant messaging/communicator, video presence and personal social wall, all written using HTML5 without depending on "the others" (imagine the tune from Lost playing here). The others are the existing SIP, XMPP or Flash Player based systems.
"We describe a resource-based architecture to quickly and easily build communicating web applications. Resources are structured and hierarchical data stored in the server but accessed by the endpoint via the application logic running in the browser. The architecture enables deployments that are fully cloud based, fully on-premise or hybrid of the two. Unlike a single web application controlling the user's social data, this model allows any application to access the authenticated user's resources promoting application mash-ups. For example, user contacts are created by one application but used by another based on the permission from the user instead of the first application."
Private overlay of enterprise social data and interactions in the public web context
Continuing the previous research theme, this research focuses on enterprise collaboration and enterprise social interactions. The goal is to define a few architectural principles that facilitate rich user interactions, both real-time and offline in the form of a digital trail, in an enterprise environment. In particular, my proof-of-concept project described in the paper shows how to do web annotations, virtual presence, co-browsing, click-to-call from the corporate directory, and an internal context-sensitive personal wall. The basic idea is to separate the application logic from the user data, keep the user data protected within the enterprise network, use the public web as a context for storing the user interactions, and build a generic web application framework running in the browser.
"We describe our project, living-content, that creates a private overlay of enterprise social data and interactions on public websites via a browser extension and HTML5. It enables starting collaboration from and storing private interactions in the context of web pages. It addresses issues such as lack of adoption, privacy concerns and fragmented collaboration seen in enterprise social networks. In a data-centric approach, we show application scenarios for virtual presence, web annotations, interactions and mash-ups such as showing a user's presence on linked-in pages or embedding a social wall in corporate directory without help from those websites. The project enables new group interactions not found in existing social networks."