The dangers to the promise of WebRTC

WebRTC is an emerging HTML5 component to enable real-time audio and video communication from within the web browser of a desktop or mobile platform. In this article I present my critical views on what can potentially go wrong in achieving the promise of WebRTC.

The promise

The technology promises the web developers a plugin-free cross-browser standard that will allow them to create exciting new audio/video communication applications and bring them to the millions (or even billions!) of web users on the desktop and mobile platform.

The rest of the article assumes these premises:
  1. A technology is easy to invent but hard to change.
  2. Business motivation drives technology deployment.

The dangers

Following are just some of the questions I will explore in this article.
  1. What if Apple does not commit to WebRTC?
  2. What if Microsoft's implementation is different than Google's?
  3. What if there are different flavors of the API implementation?
  4. What if a web browser cannot talk to legacy SIP infrastructure without gateways?
  5. What if "they" cannot agree on a common video codec?
Let us begin by listing the existing technologies - real-time communication protocols (RTP, SIP, SDP), HTTP, web servers, Flash Player and related media servers on desktop - these are hard to change. On the other hand the emerging HTML5 technologies can be changed in the right direction if needed.

Now what are the business motivations for the browser vendors? Google is the most open, rich and web-centric browser vendor with a lot of money and a lot of motivation to drive the technology for a true browser-to-browser real-time communication. It makes sense to design a platform that enables web clients to connect to each other via a rendezvous or relay service that runs in the cloud. Improving and simplifying the tools available to web developers while hiding the complexity of protocol negotiation will improve the overall web experience, which in turn will drive business for web-centric companies such as Google. Moreover, the end-to-end encryption and open codecs are more important to Google than having to deal with legacy VoIP infrastructure -- which gets reflected in the choice of the several new RTP profiles for media path and VP8 for video.

On the other hand, Apple prefers closed systems with integrated offering that works seamlessly across Apple platforms. Opening up the Facetime (or related technology) to all the web developers is interesting but not motivational enough, especially when the resulting web application and the WebRTC session is controlled by the web developers and users instead of the iOS developer platform or the iTunes portal. This makes Apple, as a browser vendor, uncommitted to WebRTC until it proves its worth to the business.

Microsoft already has a huge communication server business that is largely SIP based. Any technology that makes it difficult to interoperate (at the media path/RTP level) with the existing SIP application servers is likely to cause trouble. They could build a gateway, but it will be hard to compete with others who can provide such services on the low cost cloud infrastructure. Hence, having the building blocks in place that will enable a web browser to talk to existing SIP devices in the media path is probably a core requirement for Microsoft. Thus, low level access to SDP and more control over media path is proposed in their draft.

For a web browser on desktop, WebRTC is interesting in bringing a variety of new use cases and web application. However, these use cases are already available to web users on the desktop via alternatives such as Adobe's Flash Player. It it not clear what the WebRTC can contribute in terms of additional use cases for web applications on desktop. However, for mobile platforms and smart phones it is a completely different story. Flash Player either does not work or is too difficult to work for video conferencing on mobile. HTML5 presents a great opportunity to bridge the differences between end user platforms when it comes to web applications. Thus, adding WebRTC to web browsers on mobile phones such as iPhone and Android adds a lot of value.

Let us consider a few "what if"s that pose danger to the promise of WebRTC.

What if Apple does not commit to WebRTC?

If Apple with its iOS platform and variety of cool gadgets and frantic fans-base decides to keep the real-time communication out of reach of web developers -- people will still build such application albeit as standalone native applications and sell over iTunes. Eventually the web developer may end up writing custom  platform dependent kludges, e.g., invoke the native API for iOS but use WebRTC for others. Once Apple realizes that the web-based real-time communication is slipping its grips it may be too late.   Moreover, other browsers besides Safari could run on iOS instead.

What if Microsoft's implementation is different than Google's?

Based on the historic browser interoperability issues and the recent announcement by Microsoft, this is quite likely. Fortunately, web developers are used to such custom browser-dependent kludges. However, WebRTC poses greater risk of differences in operation and interoperability beyond just some HTML/Javascript hacks. Having a server side gateway for media path to facilitate interoperability among browsers from different vendors is not in the spirit of end-to-end principle and is not the correct motivation for WebRTC in my opinion. Moreover, the Chrome Frame extension for Internet Explorer could relieve web developers to some extent.

What if there are different flavors of the API in different browsers?

This is similar to the previous threat -- if the implementations among the browsers do not interwork, especially among the leading mobile platform vendors such as Apple, Google and Microsoft, then we are back to the square one. As a web developer what will be the motivation to build three different applications using WebRTC when one can build once using Adobe Flash Player? Perhaps, Adobe will revive the mobile version of Flash Player to solve the differences :)

What if a web browser cannot talk to legacy SIP infrastructure without gateways?

This issue has troubled both telecom and web-centric vendors alike. Web browsers do not want to blindly implement legacy SIP family of standards (actually RTP) because the problems on the web are different, and need to be addressed differently. The telecom centric vendors do not want to re-do everything to interoperate with web browser. If they can get the web browser to talk to their existing SIP/RTP components by keeping the media path intact, it will be a big deal. Having to implement a variety of new RTP profiles in their equipment for interoperability with web is unmotivational.

This could go in one of the two ways - either browser vendors agree on not interoperating with legacy RTP devices in which case we will need a gateway to do interoperability, or the browser vendors implement proprietary extensions (back doors!) that allow such hooks. If it is the latter, it will get immediately picked up by the telecom-sponsored developers to build applications that use the legacy RTP mode only -- not a good thing!

What if "they" cannot agree on a common video codec?

The debate on what codec to pick is as old as the debate on WebRTC. Currently, "they" are forming two camps -- one that supports royalty free VP8 family of standards and other that backs ubiquitous H.264 family of standards (or keep no mandatory codec). The advantage of H.264 is that many hardware already comes with fast H.264 processing hence will be available natively on mobile platform, whereas with VP8 you may end up using your battery life on the codec -- until VP8 becomes available in the hardware platform. It is not clear what will happen. If VP8 gets picked as the mandatory-to-implement codec in the browser, I can see some browser vendors digressing from the standard and implementing only H.264 (hardware supported codec) or implementing both VP8 and H.264.

What are the other potential threats to this technology? What if the standardization works drags for too long, until it is too late, and web users have moved on to something better! Web is driven by web applications. What a browser vendor does or what a standardization committee adopts are just a shadow of what the web applications do (or want to do). A web developer prefers to have a consistent, simple and unified API for real-time communication.

If it were in my hands, I would ban any IETF or W3C proposal without a running and open implementation. But the reality is far from this!