WebRTC vs Flash Player

This year has been great for the world of IP communications so far -- with the Skype deal, Flash Player adding echo cancellation, and now Google open sourcing WebRTC, including the source code for the audio/video codecs and quality engines.

RTC-Web is an effort started in the IETF (and Web-RTC in the W3C) to standardize the way media streams are transported end-to-end between two browser instances for a real-time communication experience within the browser. It consists of a protocol for establishing the end-to-end media path, abstractions for audio/video codecs and devices, and the language elements to use this feature from within Javascript/HTML. Traditionally, browser-based communication has been done using plugins such as Flash Player. I have written a few open source software projects that use Flash-based audio and video communication (flash-videoio, siprtmp, vvowproject). The WebRTC effort brings a completely new dimension, in a good way, because we no longer depend on external plugins for web-based real-time communication. Real-time communication becomes a first-class construct for web developers.
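To give a sense of what this looks like from the page, here is a minimal sketch of capturing local media and creating an offer in Javascript. It is based on the API shape that later stabilized (getUserMedia and RTCPeerConnection); the exact names were still in flux at the time of writing, and the sendToPeer function, the STUN server address, and the video element are assumptions, since WebRTC leaves signaling to the application.

    // Minimal sketch: capture local audio/video and create an offer.
    // The STUN server, sendToPeer() and the <video id="remote"> element
    // are assumptions -- WebRTC does not define the signaling channel.
    const pc = new RTCPeerConnection({ iceServers: [{ urls: 'stun:stun.example.org' }] });

    navigator.mediaDevices.getUserMedia({ audio: true, video: true })
      .then((stream) => {
        // Attach the local camera/microphone tracks to the peer connection.
        stream.getTracks().forEach((track) => pc.addTrack(track, stream));
        return pc.createOffer();
      })
      .then((offer) => pc.setLocalDescription(offer))
      .then(() => {
        // The offer must reach the remote browser over an application-defined channel.
        sendToPeer({ type: 'offer', sdp: pc.localDescription });
      });

    // Remote media arrives as tracks once the answer and ICE candidates are exchanged.
    pc.ontrack = (event) => {
      document.querySelector('video#remote').srcObject = event.streams[0];
    };

The point is that capture, transport setup and rendering are all driven from ordinary page script, with no plugin involved.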

This article summarizes some differences between the WebRTC and Flash Player approaches for real-time audio/video communication. It also mentions the separate-application approach described in the VVoW project.

WebRTC is in line with the evolution of web protocols, whereas using Flash Player is like patching an incomplete system. With WebRTC there is no external dependency beyond the basic web browser. However, given the ubiquitous availability of Flash Player compared to even basic HTML5 features that interoperate across browsers, the Flash Player approach is still promising, at least in the short term.

The number of web developers who understand Javascript/HTML is clearly much larger than the number who know Actionscript/MXML, which benefits the WebRTC approach: many more new applications and use cases can be implemented in practice. However, the complexity of building a Javascript application that combines the various individual pieces of the communication elements may be overwhelming. On the other hand, existing IDE tools for Flash development hide much of this complexity from developers.
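To illustrate those individual pieces, here is a sketch of the signaling glue that every application has to provide itself, since the browser handles the media path but not the offer/answer or ICE candidate exchange. As before, it uses the API names that later stabilized; the WebSocket server address and the message format are assumptions, and local media setup (as in the earlier sketch) is omitted.

    // Hand-written signaling glue over an assumed WebSocket server.
    const pc = new RTCPeerConnection();
    const signaling = new WebSocket('wss://signaling.example.org/room42');

    // Each ICE candidate discovered locally must be forwarded to the peer.
    pc.onicecandidate = (event) => {
      if (event.candidate) {
        signaling.send(JSON.stringify({ type: 'candidate', candidate: event.candidate }));
      }
    };

    // Offers, answers and candidates from the peer must be fed back into the connection.
    signaling.onmessage = async (event) => {
      const msg = JSON.parse(event.data);
      if (msg.type === 'offer') {
        await pc.setRemoteDescription(msg.sdp);
        const answer = await pc.createAnswer();
        await pc.setLocalDescription(answer);
        signaling.send(JSON.stringify({ type: 'answer', sdp: pc.localDescription }));
      } else if (msg.type === 'answer') {
        await pc.setRemoteDescription(msg.sdp);
      } else if (msg.type === 'candidate') {
        await pc.addIceCandidate(msg.candidate);
      }
    };

None of this is difficult, but each application has to rebuild it by hand, which is the kind of complexity the Flash tooling hides.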

Many users are reluctant to change their browser, and hence getting ubiquitous user adoption may take a long time unless WebRTC support is also added to Internet Explorer. Moreover, dealing with device interfaces in a portable manner is a challenge. It is also not clear how devices should be accessed across multiple instances of the same browser or across different browsers.

In the past, incompatibilities in HTML among browsers have been a nightmare for web developers, and extending HTML for yet another feature is bound to cause more interoperability problems. Two interoperability scenarios are significant: between browsers from different vendors running the same web page, and between two different web sites. The latter is tricky from a security point of view if open standards are used, because a web site owner may want to restrict its users from communicating with users of another web site, whereas the protocol itself will be capable of such communication.

On the other hand, Flash Player has shown more ubiquitous availability on users' desktops and laptops than any specific web browser. Flash Player allows implementing platform-agnostic software because all the incompatibilities between browsers and platforms are taken care of by the plugin vendor.

Flash Player has the ability to do group communication by building a scalable application-level multicast tree among Flash Player instances. This is useful for one-to-many, broadcast-type communication scenarios. WebRTC is still in the initial phase of two-party communication. Obviously, multiparty communication can be built on top of the two-party communication elements, but it requires more effort to achieve comparable efficiency.
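For example, multiparty built only on the two-party primitive typically ends up as a full mesh, sketched below: each of N participants maintains N-1 peer connections and uploads N-1 copies of its stream, which is exactly the inefficiency an application-level multicast tree avoids. The showRemoteVideo helper and the signaling that announces new participants are assumptions.

    // Full-mesh sketch: one RTCPeerConnection per remote participant.
    const peers = {};   // peerId -> RTCPeerConnection

    function connectTo(peerId, localStream) {
      const pc = new RTCPeerConnection();
      // The same local stream is sent separately to every remote peer.
      localStream.getTracks().forEach((track) => pc.addTrack(track, localStream));
      // showRemoteVideo() is a hypothetical UI helper for rendering the remote stream.
      pc.ontrack = (event) => showRemoteVideo(peerId, event.streams[0]);
      peers[peerId] = pc;
      return pc;
    }

    // Every new participant announced by the signaling server gets its own
    // offer/answer exchange, exactly as in the two-party case.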

In terms of video codecs, WebRTC provides an open source, high quality video codec (VP8), whereas Flash Player's camera-captured video is still encoded with the outdated Sorenson codec, making interoperability with non-Flash products difficult. The availability of source code enables a WebRTC-based project to add new codecs as needed, without depending on the vendor to provide new audio and video codec features.

The main problem with the Flash Player approach is that the protocol for the end-to-end media path is proprietary, so interoperating with existing VoIP gear is inefficient without buying server pieces from the plugin vendor. Although interoperability is possible using open RTMP and SIP-RTMP translators, it is not efficient, because the browser-to-translator media path over TCP incurs unnecessary latency for some users. Secondly, for any new feature, such as echo cancellation, a new codec, or portability to a new device, we depend on the vendor. Luckily, Adobe has been releasing updates with new features periodically. For example, the echo cancellation feature released in Flash Player 10.3 solved a lot of problems for real-time communication. (Please see the public-chat demo on my flash-videoio project page to try out video conferencing with echo cancellation.)

Some problems are common to both approaches: (1) the lack of a listening TCP socket or a general-purpose UDP socket that could be used to implement a peer-to-peer application protocol within the browser without relying on servers, and (2) the scope of an application is limited to a web page, as defined by its Javascript or Flash elements, so if the user navigates to another web page the communication is lost. The latter is not a problem for the web communication use case, but people are generally not used to this model from traditional communication.

On the other hand, the separate-application model used in the VVoW project provides host-resident software for communication, which can be used by any application, including a web application running in the browser, by connecting to the resident software locally. The resident application can reuse existing research, e.g., the Host Identity Protocol and P2P-SIP, and can save the initial setup time that WebRTC incurs for every connection. The main problem is that it involves yet another download and installation by the end user, which hampers wide adoption.
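As a rough illustration, in this model the web page only needs a thin local control channel to the resident software, which in turn handles the signaling and media path. The local port, the control protocol and the message format below are all assumptions, just to show the shape of the approach.

    // Sketch: a web page driving a hypothetical host-resident communication app
    // over a local WebSocket control channel (port and protocol are made up).
    const resident = new WebSocket('ws://127.0.0.1:5080/control');

    resident.onopen = () => {
      // Ask the resident software to place a call; media never touches the page.
      resident.send(JSON.stringify({ command: 'call', target: 'alice@example.net' }));
    };

    resident.onmessage = (event) => {
      console.log('status from resident app:', event.data);
    };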

I will continue to explore the WebRTC software released by Google and try to include it in my open source projects. Some example projects could be: (1) add interoperability between WebRTC and Flash Player for communication in my siprtmp project, (2) add an option in my flash-videoio project to detect WebRTC support and use it if available, falling back to Flash Player otherwise, (3) use the WebRTC source code to implement a separate application with a high quality end-to-end media path in the VVoW project, and (4) create a Python wrapper to use WebRTC from within any Python application.

3 comments:

Ravikant Cherukuri said...

Great comparison Kundan. Have you looked at the P2P capabilities of WebRTC in comparison with what SIP and RTMFP offer? Do you have an idea of how close Sorenson Spark is to H.263? To interop between Flash and WebRTC, would you have to transcode audio and video on the server?

Kundan Singh said...

The P2P capabilities are similar to those of SIP, as it allows an end-to-end media path. RTMFP's P2P capabilities go further, as it has a built-in application-level multicast group feature in addition to the end-to-end media path function. On the other hand, because of the open nature of the protocol, you can use your own media relay servers with WebRTC and SIP but not with RTMFP. So in practice, one can build a Skype-like scalable super-node architecture using WebRTC (and SIP).

I don't know much about Sorenson Spark. From what I know, it is closer to H.261 with some changes to make it scalable to any dimension.

Yes, for interoperating between Flash Player and WebRTC, ideally one would need to transcode both audio and video, because Flash Player supports Nellymoser, Speex and Sorenson, whereas WebRTC currently has iLBC, iSAC and VP8.

Val Gunnarsson said...

Thank you for a very helpful explanation!