Implementing video conferencing and text chat using Channel API

Last week, Google finally released the Channel API [1, 2] for Google App Engine. It has been available to developers for six months [3], but not on actual app engine for production. I had built a few video conferencing and text chat applications [4, 5] using Flash VideoIO project [6] on Google App Engine. Earlier, I had to use Ajax/polling technique to get events related to chat and user list. In the last couple of days, I modified those applications to use the asynchronous event notifications using the Channel API. More text from [6] follows:

"Random-Face [4]: This is a chatroulette-type application built using the Flash VideoIO component on Adobe Stratus service and Python-based Google App Engine. ... You can view the source code of two files, index.html that renders the front end user interface and main.py that forms the back-end service."

"Public-Chat [5]: This is a multi-party audio, video and text chat application built on top of Python-based Google App Engine and using Channel API for asynchronous instant messaging and presence. ... Developers can see the source code files: index.html is the front-end user interface, webtalk.js is the client side Javascript to do signaling, and main.py is the back-end service code."

The Channel API essentially implements an XMPP-style asynchronous communication from your server to the Javascript client. I use this to implement notifications for new messages, change in user list, and update of user video session to other participants in the system.

What is Flash Media Gateway?

I recently saw description of Adobe's Flash Media Gateway [1] and a related information on how Adobe Connect 8 can use it to make and receive SIP calls [2]. This article lists my view on advantages and problems of such a gateway architecture. (Disclaimer: I have not used any of these products though, so my views may be completely wrong).

In summary the new Flash Media Gateway is similar to the bunch of other SIP-RTMP gateway products that already existed for few years, e.g., siprtmp, gtalk2voip, flaphone and red5phone. I feel the industry demand of interoperating between Flash Player and SIP devices eventually forced the company to do something about it. Unfortunately, it did something which is sub-optimal as I describe here.

I have been involved with development of open source siprtmp project [3] hence I can speak from my experience about advantages and problems with such an architecture. I have also blogged earlier about FAQ on using Flash Player to make phone calls [4].

Advantages of Flash Media Gateway
  1. It allows you to build Flash applications that can talk to SIP devices using Adobe's servers at the back end. While it is not useful for those who already have resorted to other solutions such as Red5 and Wowza, it is useful to those who use Adobe's Flash Media Server (FMS) and do not want to switch to other alternatives for any reason. Problem: It is not clear whether the Flash Media Gateway can work with other media servers such as Red5 or Wowza.
  2. It supports audio transcoding among Speex, Nellymoser and G.711, as well as mixing for a simple conference bridge. This allows working with older Flash players that do not have Speex and with SIP devices that do not have Speex. A third-party product such as siprtmp is typically reluctant to implement transcoding with Nellymoser because of licensing restrictions. Problem: In general transcoding is not the best option because it takes significant CPU cycles on your (expensive) hosted servers. It can drastically reduce the capacity of your server by a factor, e.g., support 100 calls with Speex or support 10 calls between Speex and G.711.
  3. It supports video using H.264. Problem: It is not clear whether it allows only one-direction H.264 from SIP device to Flash Player, or whether it supports bi-directional H.264. A bi-directional H.264 will a huge advantage, but will mean that Flash Player is capable of capturing and sending H.264 video, which does not look like the case.
  4. It can potentially support UDP between Flash Player and server. Note that one of the biggest issue with real-time voice calls with Flash Player was that RTMP (over TCP) caused high latency not suitable for interactive communication. Adobe added another protocol, RTMFP (over UDP), that could allow end-to-end media path among the participants thus drastically reducing the end-to-end audio latency. While a gateway architecture does not allow end-to-end media path, it can still allow UDP between Flash Player and media server using RTMFP. This could reduce the end-to-end latency to some extent. Problem: It is not clear whether RTMFP can be used in conjunction with Flash Media Gateway.
Problems with Flash Media Gateway
  1. It does not allow you to build a SIP client in the browser. The communication between Flash application and the media server/gateway is still over RTMP (or RTMFP). This means unlike true end-to-end media path for SIP calls, the media must go through the server/gateway. I don't think the connect plugin is implementing a SIP/RTP stack because it says that it uses the gateway in the back end.
  2. If RTMFP is not allowed for such SIP calls, then the RTMP (over TCP) connection will significantly contribute to latency which is not suitable for interactive voice calls unless you have deployed the gateway close to your user.
  3. Most SIP-PSTN gateways that translate SIP calls to phone network support traditional voice codecs of G.711, G.729, G.723.1 but not Speex or Nellymoser, whereas the Flash Player supports only Speex and Nellymoser for captured voice. Thus you always need a transcoding. Unfortunately, G.711 at 64 kb/s is expensive on bandwidth compared to say G.729 at 8 kb/s. Since the gateway does not support common voice codecs of PSTN providers, in most cases you will need to run some form of transcoding, twice! or live with higher bandwidth usage.
  4. It does not add any more significant value to what already exists with red5phone or siprtmp. You still need to use a third-party SIP provider who can terminate your PSTN calls. It does not optimize the media path latency because of the gateway architecture. And finally it does not really improve the call experience for Flash to SIP calls to the end-user.
Ideally, the SIP/RTP and related protocols should become part of Flash Player, so that it allows one to create a SIP user agent in the browser and enable low latency end-to-end media path with third-party SIP user agents.