(For the last month and half, I have been aggressively involved in another open source project, "videocity". This article describes the salient features and novel ideas in that project.)
The goal of the Internet video city project is to provide open source software tools, both client and server, for video communication and sharing. Unlike other file sharing systems, this is targeted towards video and live video sharing in small groups. Unlike other video communication services, this project provides the tools needed to build a service.
High level description
At the high level, the video communication is abstracted out as a city. An individual can signup with his email firstname.lastname@example.org and own a home with URL of the form http://server:email@example.com. This is also the location of the default guest room of that user. The user can build other rooms inside this URL, e.g., for hosting a online family gathering, he can get a room with name "Family Gathering" and the room URL of the form http://server:firstname.lastname@example.org/Family.Gathering. Each room can be made public or private. A public room is accessible to anyone visiting the URL of the room, whereas a private room needs explicit permission to enter.
Once you have entered a room, you see other members in the room, and can communicate with others using real-time audio, video and text chat. You can share media files such as photos and videos from your computer with others in the room. You can also share online photos and videos with others. All these shared resources are put in an active session and would disappear when the room is closed, i.e., all members have left the room.
The owner of the room can decorate his room by uploading, recording or editing the room's content. A room's content is described using an XML file containing multiple play lists. Each play list contains sequence of media files or URLs. When you enter a room, you see all the pre-configured play lists in that room. This allows the owner to, for example, create a room with his family pictures and videos in a slide show, and give out the URL to others to view the photos. A media resource in a play list can be text, image or audio/video. The image and audio/video can be uploaded from user's computer, downloaded from a web URL or recorded using user's camera in real-time. The play list can be readily edited using drag-drop, built-in text editor or various button controls.
Each signed in user also has an inbox. The inbox is a special XML file that gets loaded when a user logs in, and contains play lists that are sent by other users to this user. When you enter a room, you have an option to send a play list to the owner of the room, which turns up in the owner's inbox. You can record the play list using your camera, or create one using resources available from the web. The play list stored in the inbox is privately available only to the owner of the room.
This simple concept of play list and rooms, allows us to implement various communication scenarios. For example, real-time communication, video mails, publicly posted videos, and video web sites.
One of the novel concept used in the project is that of soft-card. A soft-card is a digital version of your ID card or visiting card. There are two types of cards: a Private login card is your confidential ID card that you use for login to the site, an Internet visiting card is your room's visiting card, which you give out to your friends so that they can visit your room. Usually each signed in person has a private login card, and each room owned by the person can have an Internet visiting card.
A soft-card looks like a digital image of your real ID and visiting cards. It is actually a image file in PNG format. The image has a photo, your name or your room's name, some list of key words identifying your room, and a URL of your room. Unlike a regular PNG file, a soft-card has additional meta information that is used in secure identification and access. In particular, your private login card has your RSA private key (refer to PKI) and your Internet visiting card has X.509 certificate using RSA public key signed by the server. These meta information such as keys, certificates, names, emails, keywords, etc., are stored in information chunks of the PNG file itself.
Similar to public key cryptography, these digital files can allow us to implement security, authentication, access control, privacy, confidentiality, etc. Essentially, anything you can do with PKI, you can do with these soft-cards. Additionally, these soft cards give a visual appearance of an ID card or a visiting card containing the URL which they represent. Users receive them in email on signup, and can give out visiting card to others in email. An example visiting card is shown at the top of this article. If you edit the card's file or image in any way, e.g., converting to JPEG and back, or edit using photo editors, then the card's key information will become invalid and unusable. Note that a card is valid only within the domain it is created for. Thus a card created for http://server1/room1 can not be used by http://server2/room1 even if both server1 and server2 virtual domains are hosted by the same server.
Once we have the login (private key) and visiting (public key) cards, implementing rest of the security mechanisms is straight forward. For example, resources in an inbox can be encrypted using public key of the owner, so that only a private login card can decrypt it. The public rooms are signed by owner's private key, so that anyone with the visiting card of the room can verify the signature. When sending a media resource to another user, PKI can be used to establish a secure session of communication. A room can be made private by allowing only connections from people who have valid visiting card for that room, and have the owner send out visiting card to his friends and family using an independent channel such as email. A room can be made public by uploading the visiting card to the room itself, so that anyone with the URL can first download the visiting card (i.e., public key) and use that to connect to the room. Although we haven't implemented most of the security mechanisms, we have the basic soft-card concept implemented in the project. In particular, you can create your cards, edit the layout of the card during creation, download them after creation, and use them to upload in the client to join a room or to log in. One thing to note is that within the Flash Player environment, the amount of security using PKI is limited. But since we have our own video server implementation as well, we can do some novel tricks in that regard.
Product design ideas
There are several product design ideas we implemented in the project: (1) consistency, (2) flowing and smooth interface, and (3) performance. In this section, I describe these ideas and how they are implemented.
Consistency is very important in user interface design. The look and feel of various buttons should be consistent. Common operations should be consistent with what people are used to doing. For example, most windows users see the 'close', 'maximize', 'minimize' buttons on the top-right corner. Most mac users see the bottom bar as tools or commands bar. Most instant messaging users see notifications on the bottom-right corner of their screen. We used these concepts in our UI design as well.
Flash allows us to implement nice, smooth and flowing user interface. When you go from one room to another, the view slides your window from one room to another. The sliding window component in the project nicely abstracts out the details of this container. When a help video is played, it animates to the full view, and when it is paused, it goes back to the original position. For help videos, flowing subtitles along with audio/video give a better user experience. Computer users are comfortable with drag-and-drop operations using the mouse. In our project, the play list editing, video window re-organizing, delete button, etc., use the drag-and-drop mode of operation.
Performance is important once the project grows to a significant size. In particular, a Flash Player spends lot of cycles rendering images. This is improved significantly in our project since we use only programmatic skins for all our buttons and icons. Moreover, programmatic skins scale nicely when going to full screen or different size.
There were a number of lessons we learned in this project from the product design perspective. Moreover, being responsible for both product design and product engineering helped us avoid ambiguity, which is usually seen in multiple team projects.
The big picture
Although, the project is still "work in progress" and a lot of work is remaining, I wanted to give a big picture of the project. Flash Player is a great browser plugin. However being proprietary makes it hard for others to use it in full potential. For example, until recently the video communication was restricted to only Flash media server, or file upload were not allowed from local computer to Flash Player without going through the server. Although Adobe is making significant progress in keeping the developer community engaged, (e.g., making RTMP protocol open, or making file uploads and downloads available in new Flash Player) there will always be some restriction in the Flash Player. For example, absence of H.264 encoder or good audio quality/preprocessing engine prevents us from using it efficiently in true H.264 video communication or good real-time audio communication. In any case, since the RTMP protocol is open, and since there are a number existing open source RTMP implementations, one can use back-end RTMP based servers to perform some processing.
This videocity project gives us back-end tools to intercept RTMP, integrate web communication, and expose a single server to support various requirements of video conferencing. One can ask whether this will scale? The answer is, may be, not. The reason for doing the project though is that it fits nicely in the big picture of P2P-SIP based communication framework. Flash gives a nice ubiquitous browser based front end, whereas our videocity server gives tools that can be integrated with peer-to-peer network. Thus we can gain from advantages of both worlds.
Distributing a conference in a P2P network is an already researched problem. Several solutions exist, ranging from application level multicast for large conference, to full mesh small conferences, to picking a few servers as relay bridges. Maintaining shared distributed state of the conference and collaboration is interesting to explore. The SIP community has done significant work in centralized conferencing framework, e.g., in the IETF XCON working group. The P2P-SIP working group is creating protocol for standards based peer-to-peer network maintenance and lookup for SIP service. Finally, some API or interface specification is needed for the videocity's client-server model so that others can build clients or server adaptors to integrate between XCON, P2P-SIP and videocity. In particular, we will define all the interface elements such as format of the soft-card, various RPC calls for uploading or downloading resources, sharing play lists, authenticating users, as well as communication mechanisms.
In summary, the project gives developers a starting point from where you can build video communication service, video message platform, video recording and editing system, collaboration engine, media sharing software, video blog web site, video rooms, multi-party conferencing applications, desktop clients, browser extensions, application sharing, new client applications, and so on. The client-server tools available in the project allow you to record a video or snapshot photo from your camera and store it in local file, create play lists of various heterogenous media resources, and share live and stored media with others using the system.
There is no hosted service for this software, and we don't plan to have one. This is because our goal is to go peer-to-peer, where various installations of the software will discover and communicate with each other!
Thank you for your reading time, and we love feedback!