Articles on technology, Internet, Protocols, Web, Open Source, VoIP, P2P, SIP, RTMP and WebRTC.
WebRTC vs Flash Player
RTC-Web is an effort started in the IETF (and Web-RTC in W3C) to standardize the way media streams are transported end-to-end between two browser instances for a real-time communication experience within the browser. It consists of a protocol for establishing end-to-end media path, abstractions for audio/video codecs and devices, and the language elements to use this feature from with Javascript/HTML. Traditionally browser communication has been done using plugins such as Flash Player. I have written a few open source software projects that use Flash based audio and video communication (flash-videoio, siprtmp, vvowproject). The WebRTC effort brings a completely new dimension, in a good way, because now we do not depend on external plugins for web based real time communications. The real-time communication becomes a first class construct to web developers.
This article summarizes some differences between WebRTC and Flash Player approaches for real-time audio/video communication. It also mentions a separate application approach as described in the VVoW project.
WebRTC is inline with the evolution of web protocols whereas using Flash Player is like patching an incomplete system. With WebRTC there is no external dependency beyond the basic web browser. However, given the ubiquitous availability of Flash Player compared to basic inter-operating HTML5 features, Flash Player approach is still promising, at least in the short term.
The number of web developers who understand Javascript/HTML is clearly much more than Actionscript/MXML, which benefits WebRTC approach as there can be many more new applications and use cases implemented in practice. However, the complexity of building Javascript based application combining various individual pieces of the communication elements may be overwhelming. On the other hand existing IDE tools for Flash development take away a lot of complexity from the developers.
Many users are reluctant to change their browser, and hence getting ubiquitous user adoption may take a long time unless this gets added to Internet Explorer. Moreover, dealing with device interfaces in a portable manner is a challenge. It is also not clear how the devices should be accessed across multiple instances of the same browser or different browser.
In the past, incompatibility in HTML among browsers has been a nightmare for web developers, and extending HTML for yet another feature is bound to cause more interoperability problems. Two interoperability scenarios are significant: between browsers from different vendors running the same web page, and between two different web sites. The latter is tricky from security point of view if open standards are used because the web site owners would want to restrict communication of its user to another web site user, whereas the protocol will be capable of such communication.
On the other hand, Flash Player has shown more ubiquitous availability on user's desktops and laptops than any specific web browser. Flash Player allows implementing platform agnostic software because all the incompatibilities between browsers and platforms are taken care by the plugin vendor.
Flash Player has the ability to do group communication by building scalable application level multicast tree among Flash Player instances. This is useful for one-to-many broadcast type communication scenarios. WebRTC is still in the initial phases of two party communication. Obviously, multiparty communication can be built on top of the two-party communication elements, but requires more effort to achieve efficiency.
In terms of video codecs, WebRTC provides open source high quality video codec, whereas Flash Player's camera captured video is still in outdated Sorenson codec, which is difficult to interoperate with non-Flash products. Availability of source code enables a WebRTC-based project to add new codecs as needed without depending on the vendor to provide new audio and video codec features.
The main problems with Flash Player approach is that the protocol for end-to-end media path is proprietary so interoperating with existing VoIP gears is inefficient without buying server pieces from the plugin vendor. Although, interoperability is possible using open RTMP and SIP-RTMP translators, it is not efficient because the browser to translator media path over TCP incurs unnecessary latency for some users. Secondly, for any new feature, we depend on the vendor, for example, echo cancellation, new codec, portability to new device. Luckily, Adobe has been releasing new updates with new features periodically. For example, echo cancellation feature released in Flash Player 10.3 solved a lot of problems for real-time communication. (Please see the public-chat demo in my flash-videoio project page to try out the video conference with echo cancellation.)
Some problems common to both the approaches are: (1) lack of a listening TCP socket or a general purpose UDP socket which could be used to implement a peer-to-peer application protocol within the browser without relying on servers, (2) the scope of an application is within a web page as defined by the Javascript or Flash elements, so if the user navigates to another web page the communication is lost. This is not a problem for web communication use case, but people are generally not used to this model in traditional communication.
On the other hand, the separate application model as used in the VVoW project allows you to have host resident software for communication, which can be used by any application including a web application running in your browser by connecting to the resident software locally. The resident application can reuse the existing research, e.g., Host Identity Protocol and P2P-SIP. This can save initial setup time for every connection of WebRTC. The main problem is that it involves yet another download and installation by the end user which hampers wide adoption.
I will continue to explore the WebRTC software developed by Google and try to include it in my open source projects. Some example projects could be: (1) add interoperability between WebRTC and Flash Player for communication in my siprtmp project, (2) add option to detect WebRTC support and use that in my flash-videoio project if available, and fallback to Flash Player, and (3) use the WebRTC source code to implement a separate application with high quality end-to-end media path in the VVoW project, and (4) create a Python wrapper to use WebRTC from within any Python application.
Performance of siprtmp: multitask vs gevent
There are several performance aspects of this software, e.g., CPU utilization per call or session, memory usage, bandwidth requirement, etc. This article only focuses on the CPU performance. Moreover, I only consider the steady state CPU usage to measure the number of active simultaneous calls through the gateway. The CPU usage during call setup and termination is not considered.
The conclusion of my measurement is as follows. The SIP-RTMP gateway software using gevent takes about 2/3 the CPU cycles than using multitask, and the RTMP server software using gevent takes about 1/2 the CPU cycles than using multitask. After the improvements, on a dual-core 2.13 GHz CPU machine, a single audio call going though gevent-based siprtmp using Speex audio codec at 8Hz sampling takes about 3.1% CPU, and hence in theory can support about 60 active calls in steady state. Another way to look at it is that the software requires CPU cycles of about 66 MHz per audio call.
Traditionally, I have used the multitask framework for co-operative multitasking in my Python software including p2p-sip, rtmplite and siprtmp. In the past, people have complained about high CPU utilization in siprtmp for a single call or even with no call. Part of the discussion is documented in issue 31. It turned out that the no-call CPU usage was a bug, and that we could optimize the multitask framework to improve the performance by approximately 2x. The optimization alters the way in which the multitask framework looks for io-events and more tasks. In particular, it gives more preference to tasks than to io-events, hence if a single io-event generates multiple tasks, all of them run before waiting for next io-events. These optimizations and fixes are in SVN r60 and r68. Unfortunately, these optimizations are not enough.
To further improve the performance, I looked at the built-in asyncore module of Python and re-implemented rtmp.py to use asyncore. There was significant improvement of approximately another 1.5x to 2x. Unfortunately, getting timers to work with asyncore is not trivial. Hence I couldn't implement siprtmp easily as the SIP/RTP library relies heavily on timers.
Then I looked at the gevent project, thanks to a co-worker for recommending it. It supports co-routine based co-operative multitasking by modifying the existing blocking modules such as socket. Compared to the multitask framework, the source code using gevent is more readable and easy to maintain because it works behind the scene. Unlike this, the multitask framework requires yield statements scattered everywhere and non-trivial StopIteration exception to return from a task. I re-implemented siprtmp.py, and related SIP/RTP modules, using gevent. Since siprtmp module includes all of rtmp module, this can also be used as an RTMP server in addition to being a SIP-RTMP gateway.
Test Setup
| Rate | multitask | gevent |
| 8 kHz | 4.8-5.1% | 3.1-3.2% |
| 16 kHz | 6.2-6.5% | 4.0-4.1% |
| Media | #players | multitask | gevent |
| Audio | 0 | 2.2% | 1.3% |
| Audio | 1 | 3.5% | 1.8% |
| Audio | 2 | 4.5% | 2.1% |
| Audio | 3 | 5.5% | 2.5% |
| Audio+Video | 0 | 3.0-3.9% | 1.4-1.7% |
| Audio+Video | 1 | 4.2-4.7% | 2.1% |
| Audio+Video | 2 | 5.5-6.3% | 2.7% |
| Audio+Video | 3 | 7.0-7.6% | 3.1% |
Based on these, we can conclude that gevent-based software takes less than 1/2 the CPU than the multitask-based software for RTMP streaming. Roughly, the gevent-based server takes 34 MHz/publisher and 12 MHz/player of the CPU cycles in steady state.
Implementing video conferencing and text chat using Channel API
"Random-Face [4]: This is a chatroulette-type application built using the Flash VideoIO component on Adobe Stratus service and Python-based Google App Engine. ... You can view the source code of two files, index.html that renders the front end user interface and main.py that forms the back-end service."
"Public-Chat [5]: This is a multi-party audio, video and text chat application built on top of Python-based Google App Engine and using Channel API for asynchronous instant messaging and presence. ... Developers can see the source code files: index.html is the front-end user interface, webtalk.js is the client side Javascript to do signaling, and main.py is the back-end service code."
The Channel API essentially implements an XMPP-style asynchronous communication from your server to the Javascript client. I use this to implement notifications for new messages, change in user list, and update of user video session to other participants in the system.
What is Flash Media Gateway?
- It allows you to build Flash applications that can talk to SIP devices using Adobe's servers at the back end. While it is not useful for those who already have resorted to other solutions such as Red5 and Wowza, it is useful to those who use Adobe's Flash Media Server (FMS) and do not want to switch to other alternatives for any reason. Problem: It is not clear whether the Flash Media Gateway can work with other media servers such as Red5 or Wowza.
- It supports audio transcoding among Speex, Nellymoser and G.711, as well as mixing for a simple conference bridge. This allows working with older Flash players that do not have Speex and with SIP devices that do not have Speex. A third-party product such as siprtmp is typically reluctant to implement transcoding with Nellymoser because of licensing restrictions. Problem: In general transcoding is not the best option because it takes significant CPU cycles on your (expensive) hosted servers. It can drastically reduce the capacity of your server by a factor, e.g., support 100 calls with Speex or support 10 calls between Speex and G.711.
- It supports video using H.264. Problem: It is not clear whether it allows only one-direction H.264 from SIP device to Flash Player, or whether it supports bi-directional H.264. A bi-directional H.264 will a huge advantage, but will mean that Flash Player is capable of capturing and sending H.264 video, which does not look like the case.
- It can potentially support UDP between Flash Player and server. Note that one of the biggest issue with real-time voice calls with Flash Player was that RTMP (over TCP) caused high latency not suitable for interactive communication. Adobe added another protocol, RTMFP (over UDP), that could allow end-to-end media path among the participants thus drastically reducing the end-to-end audio latency. While a gateway architecture does not allow end-to-end media path, it can still allow UDP between Flash Player and media server using RTMFP. This could reduce the end-to-end latency to some extent. Problem: It is not clear whether RTMFP can be used in conjunction with Flash Media Gateway.
- It does not allow you to build a SIP client in the browser. The communication between Flash application and the media server/gateway is still over RTMP (or RTMFP). This means unlike true end-to-end media path for SIP calls, the media must go through the server/gateway. I don't think the connect plugin is implementing a SIP/RTP stack because it says that it uses the gateway in the back end.
- If RTMFP is not allowed for such SIP calls, then the RTMP (over TCP) connection will significantly contribute to latency which is not suitable for interactive voice calls unless you have deployed the gateway close to your user.
- Most SIP-PSTN gateways that translate SIP calls to phone network support traditional voice codecs of G.711, G.729, G.723.1 but not Speex or Nellymoser, whereas the Flash Player supports only Speex and Nellymoser for captured voice. Thus you always need a transcoding. Unfortunately, G.711 at 64 kb/s is expensive on bandwidth compared to say G.729 at 8 kb/s. Since the gateway does not support common voice codecs of PSTN providers, in most cases you will need to run some form of transcoding, twice! or live with higher bandwidth usage.
- It does not add any more significant value to what already exists with red5phone or siprtmp. You still need to use a third-party SIP provider who can terminate your PSTN calls. It does not optimize the media path latency because of the gateway architecture. And finally it does not really improve the call experience for Flash to SIP calls to the end-user.
How to extend HTML5 for real-time video communication?
How to conduct a technical interview for software engineer?
- Know the position you are hiring for. If you have been part of a software engineering team or have read the book, "The mythical man-month", you would know that you need several different "types" of members in a successful team. You need a "magician", who knows or can figure out solution to every technical problem you may have. You need a couple of "plumbers" who are willing to fix any broken software piece. You need a "general" who is very motivated about what you are doing, knows how and when to delegate, and keeps everyone together. You need a few "soldiers" who can follow orders, do the job, and be happy to contribute. And so on. As an interviewer, you need to know what position you are hiring for? You need to tailer your interview as per the requirement. One interview pattern does not fit all types.
- Do your homework. Before the interview, thoroughly read the candidate's resume/CV. If she has extensive work experience, identify only one or two of her past projects to focus on. If you have even a slight doubt about her programming ability, prepare a written programming test. If possible, scheduler a separate or additional time slot before your face-to-face interview for the programming test. Do not use any existing online programming test material, otherwise you won't be able to distinguish between someone who knows how to program vs someone who has gone through many web sites containing interview questions. Do not give take home tests. Do not share your programming interview questions with other interviewers in your organization.
- Start with knowledge questions. During the interview, after initial introductions, start with a question on her past experience. Your interview should balance between knowledge and application types of questions. Do not ignore his experience or knowledge, and do not focus only on his experience. Getting started with what the candidate already knows is also a good way to make her comfortable. You can ask something from her past project, e.g., "Describe in one minute what you did in XYZ?", or ask about a past technology that he used extensively, e.g., "Did you use STL in C++? What are the common STL classes available?"
- Focus on real application problems. Most software engineering positions require applying your existing knowledge to a new problem. The one quality which distinguishes a good programmer from a mediocre programmer is that a good programmer can easily translate your problem in to pseudo-code. If you are interviewing for "soldiers" and not "magician" or "general", avoid discussing high-level design type of problems, but instead focus on more low level real technical problems. For example, instead of asking "How would you design a scalable web server for blah blah?" ask more specific questions. In my experience, people who can answer high level design questions can create "vaporware" but those who can translate a small real problem to pseudo-code can actually write "software". If you need software engineers, avoid wasting time on high level design questions. Also, such application problems should be independent of specific domains but just be able to test whether the candidate has the required mathematical and computer skills to translate your problem to pseudo-code. I have given some examples later.
- Follow thought processes and provide hints. If you believe that the candidate is getting diverted in to incorrect answer, there is no harm to give hints or counter-questions to course correct her thoughts. Do not be too adamant on your answer. Sometimes, a 75% correct answer is good enough.
- Provide itemized feedback. When you submit your recommendation to the HR or your manager about a candidate, specifically itemize individual qualities and performance, and emphasize specific skills and lack of it. For example, "I had a nice 45 min conversation with XYZ, and I found her to be a very good programmer but needs training on Flex. After initial introductions, I asked one algorithm and three programming questions. She did good in two programming ones and average on others. Programming ability: very good; Needs hand-holding: yes; Algorithms: average; Strength: programming; Weakness: Flex; Recommendation: weak accept." My final recommendation is one of strong-accept, weak-accept, weak-reject, or strong-reject, with implied meaning of "a very strong candidate, and must hire her", "a good enough candidate, but won't argue to hire him if others disliked her", "an average candidate, but won't argue to reject him if others strongly liked her", "a poor candidate, and must not hire her", respectively.
- Video conferencing layout: suppose you know the window dimension, WxH, and want to fit participant videos in MxM tile. Each video has fixed aspect ratio of 4:3. All video objects are of same size in the layout. Your MxM tile should be laid out in the middle-center, with potential empty spaces near window edges. The layout should maximize the size of the MxM tile, so that the empty spaces near edges are minimized. You are given an array of video objects V[] and a function layout(v:video, x, y, w, h) which lays out a single video object with size (w, h) at position (x, y) inside the window. Write pseudo-code to layout participant videos. (Hint: start with 1 video, then 2x2, then 3x3, then generalize. Additional questions: how would you modify it to NxM tile instead of MxM? What should happen if number of videos is more then 9 but less than 16 -- which boxes are empty? How would you modify it so that empty spaces including empty boxes are minimized in NxM layout?)
- Path optimization: suppose you have a map of a city with Manhattan-style layout. Suppose north-south streets are named, a1, a2, etc., and east-west streets are named b1, b2, etc. Some streets have traffic signals, with 5-second walk sign, 15 seconds count-down to continue walking if started, and 20 seconds don't walk sign, periodically repeating in that order. Other streets do not have traffic signal, in which case traffic must yield to pedestrians. Suppose you need to walk from corner of a5/b5 to corner of a7/b10, and only street with traffic lights on you way is a6. You walk at the same speed. Crossing a6 takes 15 seconds whereas crossing any other street takes only 5 seconds. You do not want to cross a6 if you know you can't finish before it turns to don't walk sign. You want to minimize the time taken from source to destination, hence minimize the time waiting on traffic lights. You have function named walk(), turn(left or right), stop(). Write pseudo-code for your decision process from your source to destination point. (Hint: draw out the map first, then it becomes easy to visualize and solve. Additional question: can you generalize between any two points as long as you know the complete map and which streets have signals?)
Why do P2P-SIP?
- Organizations and services providers can save cost of server maintenance. In particular, there is no new training or position required specifically for a VoIP IT staff. There is no need to host dedicated servers in data centers with 99% availability and pay for energy and bandwidth.
- The VoIP industry can move away from traditional service provider oriented business to a more open end-to-end user application. Essentially, it becomes more inline with your other Internet services such as web and email: how web browser don't distinguish between servers or domains, and how you can send email using any mail client and any provider to anybody else.
- Finally, the most important reason is that the cost saving eventually propagates to end-user, who can enjoy free VoIP service as long as they are paying for their IP network access. Peer-to-peer infrastructure enables highly scalable communication system such as Skype at a very low cost. A small vendor's VoIP is able to scale to millions of users only if it can save cost of server maintenance and bandwidth.
- It does not depend on a VoIP service provider for signaling and media path. Hence there is not much money for managed services in P2P-SIP.
- It can use end-user devices on public Internet to relay media path for end-users behind restricted networks. Hence it requires incentive for public end-users to help restricted end-users.
Theory vs practice of SIP-based VoIP
Tips for implementing application protocols
- Keep all blocking operations outside your protocol implementation. This mostly includes sockets, files and timers. If you design your protocol parser and controller to be independent of blocking calls, then it can easily be converted to various asynchronous or synchronous controllers as needed. For example, the rfc3261.py module implements core SIP stack using the Stack class. The application supplies API for timer creation, message receiving as well as sending. When the application receives a packet on socket, it invokes a method on the stack. When stack has parsed the received packet and needs to inform an high-level event such as incoming call to the application, it invokes a method on the application. This allows the application such as voip.py to provide co-operative multitasking based controller. On the other hand, the built-in HTTPServer in Python includes synchronous and blocking calls for sockets and disks. This makes the built-in class' HTTP implementation hard to use for various high-performance application that cannot afford to block. Due to this, almost every web framework implements its own HTTP, instead of re-using the built-in class. The trade-off is that your implementation may become more involved if you keep blocking operations outside.
- Do not use multi-threading in your protocol implementation. Firstly, getting a multi-threaded application right is very hard. Secondly, for CPU intensive tasks or disk I/O bound tasks, the CPython's global interpreter lock (GIL) will prevent efficiency anyway; hence multiprocessing should be used. Thirdly, for network I/O bound applications multi-threading has advantage, but not as much as multiprocessing. Consider using multiprocessing, but beware of cross-platform problems, especially on Windows! In my experience, co-operative multitasking (or green-thread) works best for protocol implementation. If you are worried about efficiency on multi-core CPUs, you should leave that decision to the main application that will use your protocol implementation to present a client or a server application. The main application can decide whether to use multi-threading or multiprocessing and co-ordinate among them.
- Decouple the protocol parsing and handling implementation. Sometimes you may need to use just the parser without the handler. For example, if a single incoming TCP connection can have either HTTP, SIP or RTSP messages, then it becomes easier for the application to first parse the message to determine what it is, and then invoke the appropriate handler. Because of NAT and firewall, many application protocols need to be sent via a single port, e.g., 80 or 443. If an application from Flash Player connects to your server on TCP, it will first send a socket policy request, before sending any other actual application protocol message. If your protocol parser is separate from the handler, you can invoke the socket policy request parser as well as protocol parser, to determine what request it is.
- Avoid blocking on DNS lookup, if possible. This goes back to first point; do not block in your protocol implementation. Usually it is hard to notice the DNS lookup as blocking. Most built-in libraries provide synchronous and blocking calls for DNS lookup. Consider using some asynchronous DNS library. If that is not possible, move the DNS lookup out of the core protocol implementation, to the main application. Sometimes DNS lookup is done during logging, e.g., to convert client IP address to host name, and may be hard to detect.
- Log all warning, errors and exceptions. In server implementations, you may get tempted to handle various exceptions and ignore it, to make your server more "robust". Unfortunately, this practice leads to more headaches later on when some critical bug appears but is hard to detect. If you log all warning, error or exception conditions, even if you ignore them, you may be able to detect such bugs early on. A warning is a suspicious behavior either in your code or external system. An error is a failure case due to some external problem, e.g., file requested by client was not found on server. An exception is most likely a programming mistake, e.g., accessing attribute on "NoneType".
- Do not hold on to resources. With automatic reference counting and garbage collection, it becomes your responsibility to free up any unused references. Typically the application protocol defines how long the resources should be kept, e.g., how long a SIP transaction lasts. But there are some resources which can persist for much longer duration, e.g., user contact location. External databases are more suitable for such resources. Secondly, with event driven software architecture such as listener-provider model, it is easy to get in to reference loops, e.g., listener has reference to provider and vice-versa. Similarly, a Message object may have list of Header objects, and each Header may refer back to the Message. Your cleanup code should correctly free up unused references, e.g., "del varname".
Scalability vs Performance
Lessons in starting a software project
- Brainstorm often: During the initial phases of software growth or even before starting to write a single line of code, you should do several sessions of brain storming. It could be on validating your idea, figuring out competition, predicting the future, picking a programming language, potential learning, etc. This is the difference between carefully planned birth versus unexpected pregnancy. Just because you can write some software, should you? Especially if better alternatives exist?
- Use good version control system: Even for the most trivial projects, you should try to use version control system. I like SVN (subversion) for my open-source projects, but if you can afford git, it works better for complex project management. If you are starting an open source project, consider code.google.com for hosting your SVN repository -- it is fast, simple and hassle free. It is like a good home for your baby software.
- Document all ideas: When the software is evolving you will have many ideas for new features, doing things differently, or incorporating competing features. Obviously due to lack of resources and time, you won't be able to incorporate all these. But you must document all the ideas, and if possible prioritize them. Keep a single list of ideas. Usually the software will evolve on its own to attract new features. Implement only the most crucial ideas and features, and resist the temptation to add many features.
- Few developers during growth: Keep the core set of excellent developers to one, two or at most three when the project is growing. Every major piece of software should have only one excellent developer. This avoid unnecessary friction and induces feeling of ownership. Software is like a baby, which needs a good parent to raise and grow, before it can mature and face the world. You wouldn't want to raise your software in a foster house where nobody feels ownership, i.e., in an organization with an engineering "team".
- Pick the right language and tools: Every programming language has some strengths and weaknesses. Make sure you select the right language, that is quick to develop with and maintain, and works well for your target application. For example, with low-level C/C++ you get performance, and with high-level Java, Python, you get portability. Over the years I have liked Python for most of my applications. Unfortunately, in corporate environment, Java is the pet-child because there are many fold more software developers and managers who know Java well. For modern Internet and web applications, Python, Ruby, Erlang and ActionScript are becoming more popular.
- Include testing and defensive programming: To be successful, sooner or later your software project will need to get out of the demo-mode and face the real world. It might become too late at that time to worry about scalability or glaring bugs if those involve redesigning your software. It saves a lot of time and energy to use common techniques such as good logging, unit testing, performance best practices, and defensive programming from day 1. Also maintain an issue tracker and log even the tiniest of issues with your software. Sooner or later you will need to address them.
- Don't procrastinate: If you have an idea to work on, don't procrastinate. Just get started, write something up, try to get a prototype going. Most successful projects need a complete re-write at least once. So don't be afraid to write throwaway code.
- Don't document before coding: While software engineering people will say that you should follow good software process -- writing requirements specification, design document, test cases, etc. -- those can be written later too! Source code is what makes or breaks a software. You can write detailed specification and design documents, after you already have a prototype and want to document it or propose a change. In my experience, any design document written before writing the code is incorrect, and needs to change drastically after the source code is written.
- Don't spend time on one-off items: For your software, there are some items which are directly related, and then there are one-off items. For example, for a VoIP client, the protocol implementation, good voice quality, etc., are directly related. On the other hand, having a user signup page, instant messaging text chat, file sharing, etc., are one or two-off items, which are not directly related, but indirectly assist users in VoIP. When you start a project, do not spend time doing one-off items, but work on directly related items first.
- Don't wait too long for 1.0 release: There is 80% difference between an 80% complete software and a released software. When you formally release your software, you have to take care of user manual, getting started guide, installer as well as finish those last annoying bugs. In the case of software projects, it is very easy to get started but very difficult to put an end. There is always an endless list of features which needs to be completed before the release, and hence your release never happens. Unless, you make it happen. You will have to make a firm decision about what bugs are important and what can remain as known issues for version 1.0.
Flash-VideoIO: Flash-based audio and video communication
I launched the Flash-VideoIO project to facilitate audio and video communication using easy-to-use reusable Flash application with extensive JavaScript API. More from the project page...
"Flash-VideoIO is a reusable generic Flash application to record and play live audio and video content. The Flash-VideoIO project aims at implementing a generic Flash application named VideoIO.swf which can be used for variety of use cases in audio and video communication, e.g., live camera view, recording of multimedia messages, playing video files from web server or via streaming, live video call and conferencing using client-server as well as peer-to-peer technology."
Developers are invited to explore and experiment with the VideoIO component, provide feedback and/or contribute to the development.
Distributed Systems Development: Client vs Server
Client
Considerations: Auto-configuration, IP address change detection, NAT and firewall traversal, robustness against failures, adapt to network condition, consistent user interface and view, command line vs user interface, guaranteed security, idle and sleep detection, responsiveness of user interface, redundant connections to servers, keep-alive, caching, analytics.A client is a user facing software. The responsiveness of the client user interface distinguishes a good software from an average one. For example, if the client doesn't get a server response within 200 ms, it can automatically inform the user via an hour glass or rolling wheel indication. If your GUI becomes unresponsive while it is "processing" instead of giving an indication, then user is likely to get annoyed or make mistakes clicking on the same button multiple times. You should always use event-based system for your user interface, instead of synchronous processing especially if it can block. Caching can be used if needed to speed up your performance. For example, instead of fetching the user list to display in your client every time you switch to the user list view, you can cache it and display a cached copy. Periodically, refresh your local cache with the actual data or from server. Caching is also useful in other places where client-server communication becomes overloaded.
In summary, a good client software is one which can do one thing that it is meant for. You may add many new features, but how you do the essential function is what will make your client useful and popular. Consider using analytics in finding which feature is gaining popularity, or which feature is no longer used. A software is like a human body. If you don't do exercise to remove body fat -- remove unused pieces and re-factor periodically -- you will become too fat, slow and useless. This is more important for client facing software, because client behavior keeps changing and what you used last year may not be the same client this year.
Server
Considerations: Easy configuration, logging, vertical and horizontal scalability, robustness and automatic failover, auto loading of configuration changes, connectivity to different backends, programmability, event based but multi-threaded, use multi-core CPUs, memory usage optimization, management console, command line control, activity monitoring, admission control for quality, stateless vs stateful, replication of critical data, partitioning of data for scalability, caching, keep-alive for crash detection of server, detection of idle or unresponsive clients.Examples: Apache web server, ejabberd, SIP express router
What Great Programmers think?
FAQ on using Flash Player to make phone calls
1. Is Flash Application a good choice for VOIP?
2. Will there be any performance degradation when the call goes through the following paths? (Flash Client -> Media Server ->RTMP to SIP Converter -> VOIP Server -> VoIP/PSTN Gateway -> PSTN Network -> Telephone)
3. Some experts says that the development in C or C++ is prefered for VOIP call to phone instead of Flash Player for performance reason. Is that true?
All the existing good quality systems (Skype, GTalk) tend to use end-to-end media-path over UDP as much as possible.
4. There are different media servers available. like Adobe Flash Media server (FMS), Wowza, Red5 etc. Which one is the best choice?
6. We are now in a confusion whether to develop our VOIP application in Flash technology or QT/Java/C#. What will be your choice?
Another option is to use the Gmail video/voice architecture described in my article. Basically it uses Flash Player for user interface, but all the networking or voice related processing happens using their native GoogleTalk plugin.