Saturday, February 20, 2010

FAQ on using Flash Player to make phone calls

I present my answers to some frequently asked questions (FAQ) on using Flash Player to make phone calls.

1. Is Flash Application a good choice for VOIP?

Depends, the RTMP based application is not a good choice, whereas new RTMFP application is good for Flash to Flash Internet voice applications. For Flash to Phone applications, Flash is not a good choice as it is. Flash is good at user interface and ubiquitous availability but the TCP-based RTMP is not suitable for real-time interactive media, and UDP-based RTMFP is proprietary so cannot interwork with existing SIP-based VoIP systems.

Secondly, Flash Player is missing some of the crucial VoIP pieces such as good silence suppression and echo cancellation, so Flash based VoIP client becomes useless without a headset.

Thirdly, Although Flash Player supports open standard Speex audio codec, many existing VoIP providers do not support Speex, and expect only traditional voice codecs like G.729 and G.723.1. So you may also need to incorporate transcoding which is CPU intensive. Video transcoding is more difficult because of the proprietary video codec in Flash Player.

2. Will there be any performance degradation when the call goes through the following paths? (Flash Client -> Media Server ->RTMP to SIP Converter -> VOIP Server -> VoIP/PSTN Gateway -> PSTN Network -> Telephone)

Yes. If you can avoid intermediaries to cut down on media path latency, it will help a lot. Typically the VoIP Server (or SIP proxy server) is independent of the media path so that doesn't affect. But the media path goes through Media Server (FMS?) and RTMP to SIP converter, and that too over TCP. This degrades the quality a lot. One way could be to remove the "Media Server" from your path by having Flash Client directly connect to the RTMP to SIP converter. Also if you can reduce the network distance between the Flash Client and RTMP to SIP Converter, that will help a lot.

Secondly, with Flash Player you may need to do audio transcoding in your RTMP to SIP converter. This further degrades the performance and limits the scalability of your converter.

3. Some experts says that the development in C or C++ is prefered for VOIP call to phone instead of Flash Player for performance reason. Is that true?

A native VoIP client is preferred over Flash Player because the media packets can go directly from the client to the telephone instead of going through the RTMP to SIP converter. The advantage is because (1) the native client can use UDP instead of restricted to TCP-based RTMP, and (2) the network distance is lower for a direct path. Even if your converter is on good network and close to your client so that the network distance is not much of an issue, the UDP-vs-TCP makes a great impact in improving the quality of native VoIP client implementation over Flash Player.

In general the network component affects the quality more than the programming language. So whether you use C/C++, Python, Java or some other language, it doesn't matter much. But if you can have end-to-end media path over UDP between the two clients, or between the client and the gateway, it is much better. Obviously with Flash Player you cannot have the packets go directly unless your RTMP to SIP converter is local to the Flash Client.

All the existing good quality systems (Skype, GTalk) tend to use end-to-end media-path over UDP as much as possible.

4. There are different media servers available. like Adobe Flash Media server (FMS), Wowza, Red5 etc. Which one is the best choice?

Do you still want to pursue RTMP to SIP converter? Anyways: In terms of performance I would guess that FMS is the best choice. But if your aim to build a RTMP to SIP converter than probably Red5 is the the best. FMS is proprietary with not much customization/programming choices available, so you cannot easily integrate a SIP stack or a RTMP to SIP converter to FMS. On the other hand Red5 is completely open source and in Java so allows easy integration with other Java based SIP stack. Additionally you could integrate SIP stacks written in other advanced languages such as Python or Ruby because Red5 allows applications in those languages, whereas an FMS application is restricted to ActionScript 1.0.

I haven't worked with or used Wowza so I cannot comment on that. I have worked with FMS and Red5 though, as well as Python based rtmplite and siprtmp projects.

6. We are now in a confusion whether to develop our VOIP application in Flash technology or QT/Java/C#. What will be your choice?

I think that decision mostly comes from your business case. But I would suggest non-Flash technology if possible and if your business demands very good quality of voice service. If your VoIP client will be assisting your main business, then people won't mind downloading and installing the VoIP client. The advantage Flash has is that it is already available on most people's browser so doesn't require additional download or installation. So if your VoIP application is only a small part of your main web-based business, then Flash technology will be better I think.

Another option is to use the Gmail video/voice architecture described in my article. Basically it uses Flash Player for user interface, but all the networking or voice related processing happens using their native GoogleTalk plugin.