Software API Design

In this article I present my view on the design of software APIs. Several other people have written extensively about the importance of API design and best practices, mostly in the Java community, but still I find so many poorly designed software APIs.

What is a Good API?

At the high level only a few design principles: complete, easy-to-use and minimum, are enough to design good API. The API should be complete, without any missing element, to achieve all the necessary functions of the software. At the same time, the API should be minimum, without much redundancy. For example, if the same thing can be done using two method, pick the best, and use that. Most importantly the API should be easy to use.

Consider the Unix socket API. It provides the core functions such as connect, sendto, bind, using explicit methods whereas additional rarely used functions are supported using setsockopt or ioctl. The best feature of the API is that it treats a socket similar to a file descriptor, so you could indeed use the file related functions on a socket.
s = socket(...);
write(s, ...); /* use it like a file */

Asynchrounous I/O

The annoying part in Unix socket is the select and poll set of function for doing asynchronous I/O. The primary reason is that there is no standardized way for asynchronous API in Unix. On the other hand Windows invented the WSAAsyncSelect function which is more difficult to use. An event-based API for asynchronous I/O is clean and easy to use, provided the event is dispatched by the relevant object.
s = new Socket();
s.addEventListener("socketData", handlerFunction);

Return vs exception

Exception is better for indicating failure cases, if the failure is less likely to happen. On the other hand, return value of failure is suitable if the failure is part of the result. For example, a findUser() function can return the User object on success, and null on failure such as not-found. What if there is other kind of failure? Should it return null or throw an exception?

Exceptions make more cleaner code, at least in Python. This avoids several if-else constructs to check the return values, and you can accumulate all error handling at one place, either in this function or in the caller stack frame.

Value vs reference

You would have studied pass-by-value and pass-by-reference in your programming class. This topic goes beyond that. For example, if a Camera object represents a default-camera for your machine, should the Camera object automatically switch to the new camera if the user changes the default camera from the control panel independent of your software?
cam = Camera.getCamera("default")
# will cam be the current snapshot of default-camera or will it automatically
# change when default-camera changes?
In your software, if you need to represent the local logged in user as a LocalUser object, should you create a new object when the login changes or should the same object update its state to reflect the new logged in user? If you use a local structure to represent the local listening IP address of your network application, should this structure automatically change its IP value whenever your machine's IP address changes?

The value object is easy to implement and understand. However, if an application needs to detect any change in the value they need periodic polling or event dispatching on change. The reference object is more convenient and clean to the application developer, but needs additional work in the implementation of the API. One option is to have generic reference wrapper, which can be used to convert any value object to reference object, as long as the change detection is uniform.

Synchronous vs non-blocking

Synchronous APIs are easy to understand and use whereas non-blocking event-based APIs are difficult to use. Non-blocking calls have advantage that your thread is not held-up waiting for the response. This is particularly a problem for single threaded software systems. In some platforms, e.g., ActionScript on (single threaded) Flash Player, there is no choice, and you must do non-blocking calls. Python provides constructs such as yield, which allows you to write synchronous co-operative multitasking software. Thus, even though your application code looks like synchronous, it actually yields to other task behind the scene. Clearly critical section and shared resources must be protected appropriately for read and write access as needed.
task1: data = yield multitask.recv(sock, timeout=10)
task2: yield multitask.send(sock, somedata)

Generic vs specific

While most API designers tend to write methods that are specific for one task, the arguments should be generic. In C++ or Java, templates allow you to specify generic algorithm. The question on generic vs specific spans more use cases: should your supply a String as your method name, or should you define an Enum and use that? If you already have a function getUser(String name), should you define another function getUserById(int id) or should you overload using argument type?

Generic API is more extensible but care must be take to handle various edge cases and throw appropriate exceptions.

Document and testing

How should the API be specified? Should there be a set of documents that describe each and every function in various use cases, or should there be built-in test cases that show how to use the API? or both? With document alone, how do you make sure that the document is updated every time someone updates the implementation? Python's doctest module allows you to integrate the unit testing in the API document itself. This cleanly makes sure that your unit test will fail if your API implementation changed.


The single feature which improves the usability of an API is using an analogy with some existing thing. For example, a web interface modeled after Unix file I/O is very easy to understand and use. On the other hand, a completely new paradigm for your API will make it hard to understand. Other examples of existing paradigms are the listener-provider, event dispatcher, property getter setter, read-only vs read-write property, attribute vs container access, dictionary or hash map collection. For example, if you want to implement a new P2P storage module, consider exposing it as a dictionary where user can put and get values using keys, and can replace his existing code that uses local dict or HashMap with your new distribute dictionary.


Software APIs tend to last longer than expected. Careful thoughts in the design process can help your API withstand the test of time. When you design an API, weigh your options with respect to these properties: synchronous vs asynchronous, return vs exception, value vs reference, generic vs specific, and incorporate analogy with existing paradigm, and automatic documentation and testing tools.

Systems Software Research

A very interesting talk by Rob Pike on Systems Software Research is Irrelevant".

Some quotes from the slides (by Rob Pike):

"We see a thriving software industry that largely ignored research, and a research community that writes papers rather than software".

"Java is to C++ as Windows is to Machintosh: an industrial response to an interesting but technically flawed piece of systems software."

"Linux's cleverness is not in the software, but in the development model, hardly a triumph of academic CS (software engineering) by any measure."

"It (systems research) is just a lot of measurement: a misinterpretation and misapplication of the scientific method. Invention has been replaced by observation."

"If it didn't run on a PC, it didn't matter because the average, mean, median, and mode computer was a PC."

"To be a viable computer system, one must honor a huge list of large, and often changing, standards: TCP/IP, HTTP, HTML, XML, CORBA, Unicode, POSIX, NFS, SMB, MIME, POP, IMAP, X, ... With so many externally imposed structure, there is little left for novelty."

"Commercial companies that 'own' standards deliberately make standards hard to comply with, to frustrate competition. Academic is a casualty."

"New employees in our lab now bring their world (Unix, X, Emacs, Tex) with them, or expect it to be there when they arrive... Narrowness of experience leads to narrowness of imagination."

"In science, we reserve our highest honors for those who prove we were wrong. But in computer science..."

"How can operating systems research be relevant when the resulting operating systems are all indistinguishable? (Unix is) a victim of its own success: portability led to ubiquity. That meant architecture didn't matter, so there's only one."

"Government funded and corporate research is directed at very fast 'return on investment'... The metric of merit is wrong."

"Measure success by ideas, not just papers and money. Make the industry want your work."

"The future is distributed computation, but the language community has done very little to address that possibility."

My take on the lessons learned, again in the form of quotes:

"Keep the ideas flowing, even if the implementation is not feasible (using existing systems)."

"When thinking of distributed systems -- think beyond web, Browser and Flash Player"

"Something is popular, does not mean it is correct or best way to do that thing."

"Do not publish papers that fake measurement as research."

"Do not take a job that you are not truly motivated about."

"Writing software in Java is like writing detailed machine instructions. Learn Python instead."

REST, Flash Player, Flex

In doing some experiments with Flex and RESTful architecture, I realized that there seems to be a whole lot of problems. Flash Player was designed to support traditional HTTP web access such as form posting or resource retrieval using GET and POST. So a number of features that are used in RESTful design are not supported by Flash Player. People have written additional client-side or on server-side kludges to work around the problems with HTTP support in Flash Player. Most of the server side changes are hacks, and the client-side changes are in external third-party libraries, which are sometimes missing crucial features like cookies, TLS, etc. Even in the best possible scenario, you still need to provide crossdomain policy file even if the Flash application is accessing resources on the same server. One of the reasons I suspect is that Flash Player relies on the browser for HTTP support, and hence supports only the lowest common set of features, which are not enough for REST architecture.

What is the solution? Depends on what you want to do.
  1. If you have control over server-side of the system, you need to incorporate certain kludges to support Flash restrictions. For example, provide crossdomain policy-file, map some header or URL to method PUT and DELETE, return 200 success response with message body containing actual error code (e.g., 404 vs 405 vs 501) or headers. Any of these techniques looks like a hack at best.
  2. If you have control over the client, you can use an existing third-party RESTful client library in ActionScript. However, be prepared to provide a crossdomain policy-file or incorporate a proxy. Alternatively, you can also perform some Flash Player-specific translations on the fly in your proxy.
  3. Use Flash Player's ExternalInterface mechanism and incorporate your RESTful client code in JavaScript. This is sometimes not easy, error-free or feasible. Moreover, now your Flash application depends on your Javascript.
What are the problems with these solutions?
  1. You cannot build a general purpose RESTful web application (server-side) and expect Flash Player application to use them. You will need to be Flash Player-aware.
  2. You cannot build a general purpose Flash application (client-side) to use an existing RESTful web service without additional dependencies and network elements (proxies).
What is the real solution?
  1. It looks like HTTP+REST won't really be able to solve all the problems in ActionScript for Flash Player without Adobe's blessings, e.g., incorporate full HTTP stack in the Flash Player or provide a different way other than crossdomain to do authentication of scripts.
Some additional references [1, 2, 3, 4]


This article describes a RESTful SIP application server architecture.

Why do we need this?
SIP is the protocol of choice for Internet session initiation and control such as for VoIP or multimedia calls. Although SIP is similar to HTTP in many respects, there are crucial differences in the design. Two of the major difficulties among web developers in adopting SIP are (1) no existing SIP-based web tools similar to programming libraries for HTTP and XMPP on Flash Player, (2) the initial cost to get started with basic working system is huge with lot of specifications, e.g., for NAT and firewall traversal. On the other hand, web developers are used to building applications on top of HTTP which works for most cases out of the box. More recently RESTful architectures are gaining popularity among web services. In the absence of easy to use web tools for SIP and large set of specifications for a SIP system, web developers tend to resort to quick and dirty hacks which in the end are short term and not interoperable. Hence there is a need for a easy to use RESTful architecture for SIP-based systems that allows quick application development by web developers. This article proposes such an architecture.

What exactly is difficult?
SIP supports both UDP and TCP transports. Many earlier systems implemented UDP, whereas both transports are a must for SIP proxy servers. In client-server communication, with several clients behind NAT and firewall, UDP causes problem. Secondly, with UDP you also need the reliability of transactions and hence the transaction state machines in SIP. The SIP request forking and early media feature have created lot of stir and confusion among developers. Several other telephony-style features are also not needed for many Internet oriented SIP applications that do not talk to a phone network. The NAT and firewall traversal are defined outside core SIP, e.g., using rport, sip-outbound. A developer usually prefers to have an integrated application library and API that is quick and easy to use. Moreover with lots of RFCs related to SIP, it becomes difficult to figure out what specifications are core and what are optional for a particular use case. A number of new web-based video communication systems use proprietary technologies such as on Flash Player because of lack of a ready-to-use SIP library to satisfy the needs.

To solve the difficulties faced by web developers, a subset of the core features of SIP are needed as an easy to use API. Such an API could be available as a built-in browser feature or a plugin. Once the core set of resources are identified, rest of the API can be customized by the application server providers and developers, or in separate communities.

What use cases are considered?
SIP is designed to be used consistently in different use cases such as client-to-client communication, client-to-server as well as server-to-server. The core SIP says that each SIP user agent (application client) has both UAC (client) and UAS (server). In this article I refer to client as a user agent and server as an application server, which are different from SIP terminology. Since the target audience for the proposal is application developers, only the client-server interface needs to be considered. The backend application server can translate the client-server request to appropriate SIP messaging for server-to-server case if needed, e.g., for service provider's network you may need high performance UDP based server-to-server SIP messages.

What are the SIP-related resources?
Once we focus on a small subset of the problem -- define RESTful API for client-server communication to access a SIP application server -- rest of the solution falls in place naturally. In particular, the SIP application server will provide two core resources: "/login" and "/call" to represent list of currently logged in users and list of active calls. Additionally, it can provide user profiles of signed up users at "/user" which internally may contain things like voicemail resources for the user. The client uses standard HTTP requests, with some additional methods as shown below, to access the resources and interact with others. One difference with standard RESTful architecture is that the client-server connection may be long lived, and also used for notification from server to client. In that sense it does not remain pure RESTful.

Login: The SIP registration and unregistration are mapped to "/login/{email}" resource, e.g., "/login/". Doing a "POST /login/{email}" with message body containing your contacts, can be used to REGISTER. The response will return your unique identifier for the login resource, e.g., "/login/{email}/{contact_id}. Later, you can use "DELETE /login/{email}/{contact_id}" to un-REGISTER or a subsequent "PUT /login/{email}/{contact_id}" to do a REGISTER refresh. The actual representation of the login contact information can be in XML, JSON or plain text and is application dependent. For example one could combine the presence update including rich presence with the registration method. Clearly the login update requires appropriate authentication.
 POST /login/      -- new registration
request-body: {"contact": "sip:kundan@"}
response-body: {"url": "/login/", "id": 1, "expires": 3600}

PUT /login/ -- registration refresh
request-body: "sip:kundan@"

DELETE /login/ -- unregister

GET /login/ -- get list of contact locations
response-body: [{"id": 1, "contact": "sip:kundan@", ...},...]
Call: The call is split into two part: conference resource and invitation. The conference is represented using a "/call/{call_id}" resource, where a client can "POST /call" to create a new call identifier, or "POST /call/{call_id}" to join an existing call. The conference resource represents the list of participants in a call.
 POST /call             -- create a new call context
request-body: {"subject": "some discussion topic", ...}
response-body: {"id": "123", "url": "/call/123" }

POST /call/123 -- join a call
request-body: {"url": "/login/", "session": "rtsp://...", ...}
response-body: {"id": 2, "url": "/call/123/2", ...}

GET /call/123 -- get participant list and call info
response-body: {"subject": "some discussion topic",
"children": [{"url": "/call/123/2", "session": "rtsp://..."}]

Invite: Call invitation requires a new message such as "SEND". For example, "SEND /login/{email}" sends the given message body to the target logged in user. Similarly, "CANCEL /login/{email}/1" cancels a previously sent message it is not already sent. The message body gives additional details such as whether the message is a call invitation or an instant message. The message body is application dependent. The SIP application server does not need to understand the message body, as long as it can send a SEND message from one client to another. This makes a SEND more closer to an XMPP instead of a SIP INVITE. If the callee wants to accept the call invitation, it joins the particular session URL independently.
 SEND /login/     -- send call invitation
request-body: {"command": "invite", "url": "/call/123", "id": 567}

SEND /login/ -- cancel an invitation
request-body: {"command": "cancel", "url": "/call/123", "id": 567}

SEND /login/ -- sending a response
request-body: {"command": "reject", "url": "/call/123", "id": 567, "reason": ...}
Event: SIP includes an event subscription and notification mechanism which can be used in several applications including presence updates and conference membership updates. Similarly, one needs to define new mechanism to subscribe to any resource and get notification of a change. This gives rise to a concept known as active-resource. The idea is as follows: if a client does a GET on active resource, and does not terminate the connection, then the client keeps getting the initial state of the resource, as well as any future updates until the connection is terminated. The future updates may include the full state or a difference depending on the request parameter.
 GET /call/123          -- keep track of membership information
response 1: ... -- initial membership information
response 2: ... -- any addition or deletion in the membership

GET /login/ -- keep track of presence updates
response 1: ... -- initial presence information
response 2: ... -- subsequent presence updates.
Profile and messages: The SIP application server will host user profile at "/user/{user_id}". The concept of user identifier will be implementation dependent. In particular, the client could "POST /user" to create a new user account, and get the identifier in the response. It can then do a "GET /user/{user_id}" to know various URLs to get contact location of this user. It can then do a GET on that URL to fetch the contacts or do a SEND on that URL to send a message or call invitation.
 POST /user                            -- signup with a new account
request-body: {"email": "", ...}
response-body: {"id": "", "url": "/user/" }

POST /user/ -- send offline messages (voice/video mail)
request-body: {"url": "rtsp://..."}

GET /user/ -- retrieve list of messages
response-body: [{"url": "rtsp://...", ...]
Miscelleneous: There are several other design questions that are left unanswered in the above text. Most of these can be intuitively answered. For example, the HTTP authentication credential defines the sender of a message, i.e., SIP "From" header. The sequential or parallel forking is a decision left to the client application. The decision whether to use a SDP or XML-based session description is application and implementation dependent. For example, if the client is creating a conference on RTSP server, it will just send the RTSP URL in the call invitations. Similarly, for Flash Player conferencing it will send an RTMP URL in the call invitation. The call property such as participant's session description can be fetched by accessing the call resource on the server. Thus, whether an RTSP/RTMP server is used to host a conference or a multicast address is used is all client or application dependent. The application server will provide tools to allow such freedom.

Conclusion: A RESTful interface to SIP application server is an interesting idea described in this article. The idea looks feasible and doable using existing software and tools, and hopefully will benefit both the web developer and SIP community in getting wider usage of SIP systems. The goal is not to replace SIP, but to provide a new mechanism that allows web-centric applications to use services of a SIP application server and to allow building such easy to use SIP application server.

Several of the pieces described in this article are already implemented in Python, e.g., RESTful server tools, video conferencing application server, SIP-RTMP translation and SIP server and client library. The next step would be to combine these pieces to build a complete REST and SIP project. If you are interested in doing the project feel free to get in touch with me!

REST, RESTful and restlite

This post announces a new open source software:

What is restlite? Restlite is a light-weight Python implementation of server tools for quick prototyping of your RESTful web service. Instead of building a complex framework, it aims at providing functions and classes that allows your to build your own application.

restlite = REST + Python + JSON + XML + SQLite + authentication

  1. Very lightweight module with single file in pure Python and no other dependencies hence ideal for quick prototyping.
  2. Two levels of API: one is not intrusive (for low level WSGI) and other is intrusive (for high level @resource).
  3. High level API can conveniently use sqlite3 database for resource storage.
  4. Common list and tuple-based representation that is converted to JSON and/or XML.
  5. Supports pure REST as well as allows browser and Flash Player access (with GET, POST only).
  6. Integrates unit testing using doctest module.
  7. Handles HTTP cookies and authentication.
  8. Integrates well with WSGI compliant applications.

Motivation: As you may have noticed, the software provides tools such as (1) regular expression based request matching and dispatching WSGI compliant router, (2) high-level resource representation using a decorator and variable binding, (3) functions for converting from unified list representation to JSON and XML, and (3) data model and authentication classes. These tools can be used independent of each other. For example, you just need the router function to implement RESTful web services. If you also want to do high-level definitions of your resources you can use the @resource decorator, or bind functions to convert your function or object to WSGI compliant application that can be given to the router. You can return any representation from your application. However, if you want to support multiple consistent representations of XML and JSON, you can use the represent function of request.response method to do so. Finally, you can have any data model you like, but implementations of common SQL style data model and HTTP basic and cookie based authentication are provided for you to use if needed.

This software is provided with a hope to help you quickly realize RESTful services in your application without having to deal with the burden of large and complex frameworks. Any feedback is appreciated. If you have trouble using the software or want to learn more on how to use, feel free to send me a note!

Protocol Jungle of Internet multimedia communication

The diagram shows several protocols for Internet multimedia communication. (Click on the diagram to see the full size picture.) In the protocol jungle, a protocol is analogous to a species, its real-world implementation or deployment is an animal of the species. Some animals or species compete with each other for survival. Some animals live with each other in harmony. Some animals do not care or interact with each other since they live in different place, i.e., application or domain. Evolution and mutation results in long lasting survival of some species whereas others become extinct. Unlike using a protocol zoo metaphor, I use a protocol jungle, because there is really a competition between protocols when big companies have invested in certain protocol unlike a closely guarded and nurtured zoo system.

The diagram shows the species and its relationship with other species, e.g., whether A uses B or whether A and B are friendly. Due to space constraint, some items are grouped together, e.g., all the audio/video codecs, and some relationships are missing, e.g., RTMP is friendly with Speex. Ideally, we need a multi-dimensional representation to show multiple aspects of the jungle and how they are related. The following text lists the protocols that serve similar or common functions, and usually are competing within that function.

Structured data encodingXML, ASN.1, RFC822, others
Audio encodingG.711, G.723.1, G.722, G.726, G.728, G.729, MP3, Speex, Nellymoser, AMR, Silk, GIPS, etc.
Video encodingH.261, H.263, H.264, MPEG, Sorenson, Vidyo, etc.
Media transportRTP/RTCP, SRTP, ZRTP, Skype, IAX, RTMP, RTMFP
RendezvousSIP, H.323, Skype, Stratus/RTMFP
Session descriptionSDP, H.245, Jingle
Session negotiationSIP/SDP, H.245, Jingle, Skype, RTMFP
Call signaling and controlSIP, H.225/Q.931, Skype, IAX, MGCP, SCCP (Skinny), RTMFP
Streaming media controlRTSP, RTMP
Session announcementSAP
ConnectivityICE/STUN/TURN, Skype, RTMFP
Remote Procedure CallSOAP, XMLRPC, REST, RTMP
Programming callsCGI, CPL, CCXML, MSCML
Programming voice dialogVoiceXML
Instant messagingXMPP, SIMPLE, MSRP
Shared resource accessREST, XMPP, XCAP
Shared stateXMPP, RTMP, HTTP

As you can see that a SIP system typically employs one protocol for one task or a few related tasks, but integrated monolithic systems such as those based on RTMP/RTMFP, Skype or IAX tend to combine multiple functions in the single protocol. I have not listed H.32x protocols other than H.323 because those are intended for non-IP networks. Nevertheless, there are several H.32x systems, e.g., for room based video conferencing or for carrying voice among carriers.


With multiple protocols available for the same function, interoperability or interworking among those becomes important. I have talked about SIP and XMPP interworking in the last post. I have hands-on experience with several of the interworking scenarios among protocols shown in the diagram.

H.323-H.324: One of my projects in my first job was interworking between H.323 and H.324. Since both these systems use H.245 as the main session description and negotiation, the interworking task is relatively simple. I also worked on part of H.320 system to try to build H.323-H.320 interworking, but did not complete.

SIP-H.323: One of my first project during my M.S. at Columbia University was SIP-H.323 interworking. I have written sip323 software and couple of internet drafts and papers [1] on this. My PhD thesis gives a complete interworking procedure for basic call setup and registration. The conclusion was that while basic call setup and registration are easy to interwork, the full interworking of all the supplementary services is not feasible and not even needed in many cases. Since both SIP and H.323 use RTP/RTCP for media transport and can use the same set of codecs, the signaling gateway is efficient. The company SIPquest which productized my software demonstrated 10k simultaneous calls (this article).

SIP-RTSP: These protocols serve different purposes, but it is possible to build a system that needs both these functions in a standard compliant way. The sipum software is a voice mail and answering machine that uses SIP for calls and RTSP for recording and playback of media. Since both these use RTP/RTCP for media transport and can use the same set of codecs, the software is efficient as the media path can bypass the software. Please see my papers [1] for details.

SIP-RTMP: There have been several attempts at implementing Flash based SIP systems and SIP-RTMP translator is one of the approach. Some existing projects that implement these are siprtmp, gtalk2voip, red5phone and flaphone. Since RTMP is an integrated streaming protocol which can also do control and RPC, the translator is inefficient since it needs to incorporate the media path as well.

SIP-Skype: Being a proprietary protocol, it is not easy to interwork with Skype. However, Skype itself uses SIP to allow trunking with PSTN providers, and recently there was some news about SIP-based Skype gateway for enterprise.

SIP-IAX: Although IAX is open, it is an integrated protocol that combines media and signaling in the same connection, hence suffers from the same scalability problem as other integrated protocols like RTMP. Asterix also has a SIP gateway so that it can talk to SIP-enabled devices, especially carrier equipments.

SIP-XMPP: There is a interest group that discusses this in depth. My last post gives more links about the interworking scenarios using a gateway or co-location in the client.

SIP-RTMFP: Given the P2P promise of RTMFP, a gateway between these two protocols will be able to connect the proprietary Adobe protocol with the rest of the world for a true web-based end-to-end media path. I haven't seen any system that does this.

SIP-H.320: This gateway is particularly useful for existing room based video conferencing systems that want to connect with more Internet-style SIP devices. The idea is similar to SIP-H.323 translator, and in fact a real deployment may use two gateways: SIP-H.323 and H.323-H.320 in practice.

RTMP-XMPP: Since RTMP and XMPP serve two completely different functions, there is no need to interoperate. However, people have built systems that use XMPP for messaging and signaling while using RTMP for media path. Unfortunately since Jingle extension wants to define its own end-to-end session, it becomes not so useful for exchanging RTMP server session information. In particular use XMPP custom extensions based on presence and message to rendezvous, but do session control and call management in RTMP itself.

XMPP-SIMPLE: The SIP-XMPP interest group is also looking at SIMPLE-XMPP translation. However, given the disconnect between the two protocols, it is likely that all the presence and message updates go through the gateway and hence not as efficient as one would want for presence and instant messages.

RTMP-Skype: Now this is going to be really tough because firstly Skype is still a proprietary protocol, and secondly, both these are integrated protocols hence requiring complete conversion of signaling and media. An specific example could be allowing people to access Skype from web pages, e.g., by having a simple RTMP server in the Skype application itself. This works if Skype is running on your local computer. Alternatively, you need the Flash application to connect to Skype super-nodes running on public computers using RTMP. This poses security risk and is inefficient. Why inefficient? because RTMP over TCP means that only the applications on public Internet will be able to receive the connection, and RTMP is not really good for real-time interactive communication because of its latency and buffering. However, if such gateway are incorporated in Skype, then it truly become ubiquitous to web applications.

RTMP-RTSP: These are two competing streaming protocols. Instead of having a gateway that translates between the two protocols, it might be better to build an integrated client or integrated server -- you can record using RTMP and view using Quicktime (RTSP), or you can use the same client to access real-time streams from RTMP or RTSP. Since RTMP incorporates RPC along with streaming control and media path, whether as RTSP is just streaming control, a complete translation of all the functions may not be feasible.

ASN.1-XML: There has been effort to standardize this, e.g., XER. The proposed H.325 standard by ITU-T will use XML while allowing compatibility with some of the predecessors which are in ASN.1 PER. ASN.1 and XML are just data formats and for the purpose of P2P-SIP, they are not very significant.

If you have data about the usage in real deployment for particular protocol(s), feel free to post your comment.

[1] My publication page

SIP vs XMPP or SIP and XMPP?

(This post is unrelated to P2P, and describes the differences between the two sets of protocols SIP and XMPP. I have implemented both SIP and XMPP, as well as used several existing libraries for SIP and XMPP, so I can comment on the two sets of standards from a developer point of view as well)

SIP was invented to provide rendezvous for session establishment and negotiation on the Internet. XMPP (or Jabber) was invented to do structured data exchange such as synchronous or active presence and text communication among group of people. XMPP evolved from instant messaging and presence, whereas SIP evolved from Internet voice/video communication. Later, XMPP added support for session negotiation using the Jingle extension, and SIP community added extensions such as SIMPLE to support instant messaging and presence.

Technically comparing SIP and XMPP is like comparing apples and oranges because the core protocols serve different purposes: session randevous/establishment vs structured data exchange. On the other hand, because of the extensions invented in both the protocol worlds, SIMPLE and Jingle, they now have overlapping functions, and can be compared. When one compares SIP vs XMPP, actually the comparison is SIP/SIMPLE vs XMPP for IM and presence and/or SIP/SDP vs XMPP/Jingle for session negotiation. Even though the goals of the two sets of protocols are converging, there are fundamental architectural differences that I will enumerate in this article. There are other articles on SIP vs XMPP [1, 2, 3].

Differences: SIP vs XMPP
The following table lists the crucial differences between the two sets of protocols.

PurposeProvide rendezvous for session establishment and negotiation where the actual session is independent, e.g., over RTP media transport.Provide a streaming pipe for structured data exchange between group of clients with the help of server(s), e.g., for instant messaging and presence
ProtocolText-based request-response protocol similar to HTTP, where core attributes are signaled using headers, and additional data using message body, e.g., session description of capabilities.
XML-based client-server protocol to create a streaming pipe on which it sends request, response, indication or error using XML stanza between client and server, and between servers.
TransportUsually implemented in connection-less UDP as well as connection-oriented TCP transport. Also works over secure TLS transport.Works over connection-oriented TCP or TLS transport.
ConnectionA user-agent is both client and server, hence can send or receive connections, in case of TCP or TLS. This does not work well with NATs and firewalls, hence extensions are defined to use reverse connections when server wants to send message to client.The client initiates the connection to the server, which works well with NATs and firewalls. Additionally, extensions are defined such as BOSH to carry XMPP stanza over HTTP to work with very restricted firewalls

There are many other differences, e.g., the way a URI is represented, or how authentication is done, or what kinds of messages are supported. I will not go into details of those since they tend to become too specific for the kind of application and we miss the important points. From a developer's point of view 'ease of programming' is very important.

Ease of programming
Both SIP and XMPP are easy to implement. My 39 peers project has modules for both in few thousand lines of Python code. Although the basic protocol is easy to implement, a complete system such as a collaboration client with audio/video and messaging/presence support is very complex.

Because of the way these protocols have originated, they are well suited for certain kinds of applications. For example, if you want to build an audio/video communication system, it is better to start with SIP. Features such as interoperability with other VoIP phones, incorporating any-cast call distribution, or using existing VoIP provider for trunking are easy and readily available using SIP. If you want to build an instant messaging and presence client, it is better to start with XMPP. Features such as friends roster, group chat, blocking a user, storing offline messages, etc., are readily available using XMPP. Any advanced communication or collaboration system needs to include both these kinds of features.

XMPP has solved the application's problems and has defined mechanisms for several commonly used features in an instant messenger-type or shared state-type application, e.g., group chat, visiting card, avatars, etc. The emphasis is on application design, use cases, and practical solutions.

I think there are two main reasons for SIP's difficulty among developers: (1) the emphasis of SIP is on interoperability rather than application and feature design, and (2) the emphasis in SIP community is to have one protocol solve one problem, which requires implementing a plethora of protocols for a complete system. Let me explain these further.

When a new VoIP features is implemented by one phone, it must interoperate with another phone or VoIP service provider. Hence most SIP extensions focus on wire-protocol and interoperability mechanisms. Although specifications of several SIP extensions are available, there are no evaluation or open reference implementation on how they fit in the overall design. More recently efforts have been made, including my RFC 5638 (Simple SIP Usage Scenario for Applications in the Endpoints), to simplify the specifications for certain types of SIP applications -- those endpoints that want to work in web and Internet world without the legacy of the traditional telephony systems.

Secondly, SIP community tries to keep one protocol to solve one problem. Some extensions deviate from this guideline, but they are exceptions. The problem comes when this design principle involves implementing several distinct protocols just to get a complete system. For example, a SIP system incorporates other external mechanisms such as STUN, TURN, ICE, reverse-connection-reuse and rport-based symmetric request routing to solve the NAT and firewall traversal problem, and still does not guarantee media connectivity in all scenarios unless HTTPS/TCP tunnel in used. Implementing instant messaging and presence involves implementing several RFCs and drafts related to Event, PUBLISH, CPIM, PIDF, XCAP, MSRP, and still the application does not have all the features of commonly available XMPP client. In summary the SIP community has created numerous extensions for solving several problems in a way that scares away a new developer!

As you can see, both these reasons (emphasis on interoperability and one-protocol-one-problem) are ideal in theory. So what is wrong? The practice. To solve these problems, (1) IETF working groups should not proceed with a draft without an open-source and simple reference implementation, (2) IETF working groups should build reference applications combining several protocols for different kinds of applications and evaluate (a) consistency and (b) ease of programming.

Consistency indicates whether the new extension is consistent with existing guidelines, best practices, protocol format, as well as design principles. For example, if an extension incorporates a new processing in the server which could have been done in the endpoint, then it is against the principle of intelligence in the endpoints. Such extensions should be marked as such so that developers know the trade-off. There are only a few good design principles, hence creating a consistency matrix of extensions against principles should be easy.

Ease of programming is determined by three things: (1) how easy it is to implement the set of protocols, (2) how easy it is to build a real application using those protocols, and (3) how easy it is to build the real application using existing platforms and tools. The first is usually available as a software library, the second as an application and the third is re-usability. It should be easy to not only build the library but also use the library to build a usable application. Every new extension adds new things to the library, which cause more interaction in the application and hence more complexity. When a software project is started, usually the interoperability is not the highest requirement, but the re-usability, short development time and real prototype application are crucial requirements. Once the project is started on one path, it is very difficult to change the path by changing the core communication protocol. If there are reference implementations then not only they help you get started quickly but it also becomes easy to see how much additional complexity a particular SIP extension brings to the application. An important programming quote: less is better than more!

The flexibility of SIP also comes with its limitations. For example, SIP is flexible to support both UDP and TCP transport. However, UDP is treated as a second-class citizen by many programming languages or libraries even today, e.g., Tcl didn't support built-in UDP socket when it came out, and Adobe ActionScript does not have built-in UDP sockets for Flash Player even now. This prevents a developer from building a complete SIP stack as Flash application, for example. However, if you peek further, you would expect that if UDP is not supported then the platform is not suitable for real-time communication anyway. However, this does not prevent web-style developers to implement XMPP in ActionScript, and perhaps tweak it to support signaling of media sessions as well. The result is a broken or non-interoperable software application.

Reviewing the evolution of SIP vs XMPP specifications, I think XMPP has defined an architecture that allows adding new extensions easily and hence reduces the application complexity, whereas SIP extensions have focused on interoperability and wire-protocol without much needed attention to application design. While application design may seem unnecessary for protocol specification, it is very important in the short term. Consider a developer who uses some data structures for representing protocol elements. If a new extension is defined in XMPP, and it reuses the existing XML format that gets readily mapped to the data structures, it becomes very easy to incorporate this new extension in his source code. If a new extension is defined in SIP or SDP, which re-uses an existing mechanism of another protocol for which there is no real implementation available, then the developer will first have to implement that other mechanism, then integrate it with SIP or SDP. The mechanism may have its own formatting which needs to be incorporated in the data structures. Essentially the developer will have to spend more time implementing such an extension. In the end, the actual format of the message whether text-based or XML-based is not terribly difficult once you have a library for message formatting and parsing. However, if an extension uses a different format, connections, sessions, etc., that are not readily available in existing libraries and tools, complexity arises. For example, adding ICE to SIP/SDP created custom format whereas ICE in XMPP/Jingle re-used XML. Another example is how an particular endpoint is identified in XMPP vs SIP. In XMPP the URI itself is extended to include the resource, e.g., "user@domain/resource", whereas in SIP new extension such as globally routable user-agent URI (GRUU) is defined which is, well, more programming effort!

Scalability and performance
SIP is inherently a peer-to-peer protocol whereas XMPP is inherently client-server. Tasks that are easy in client-server systems such as shared state, roster storage on server, or offline messages on server, are well accomplished with XMPP. On the other hand, one of the primary goal of SIP is to keep the intelligence in the endpoint. Ideally, a SIP proxy server does not even maintain the session state for the SIP dialog. Few messages in SIP such as REGISTER and PUBLISH are intended for client-server communication. In XMPP, server is a must and all signaling communication goes through the server. There are message semantics defined for the types of messages, e.g., client-server information query, client-server-client message sending, client-server event publishing and server-client event notifications. Clearly client-server applications are limited by scalability and performance of the server. For example, an instant messaging session need not go through the SIP server saving bandwidth and processing at the server. But that means you lose the offline message storage feature at the server. In real SIP applications today, servers have become an integral part of the system and hence the scalability difference diminishes. In fact, the bulky message format of SIMPLE makes it less scalable than XMPP for presence updates that go through the server. Note also that although P2P-SIP is possible, a P2P-XMPP is not easy because XMPP is inherently client-server.

Once we know this, we understand that SIP and XMPP systems solve two different problems, are designed for two different architectures and have evolved with two different guidelines. From here, you can do two things: either try to incorporate/translate all the features of one system to the other and eventually give up, or try to design your system that uses best of both worlds.

Interworking and co-location
There have been interworking attempts to inter-operate SIP/SIMPLE and XMPP, especially the IM and presence part [draft-saintandre-sip-xmpp-*, draft-veikkolainen-sip-voip-xmpp-*]. The first reference shows how to implement a gateway to connect between SIP and XMPP networks, and the second shows how to implement a client that can support both SIP and XMPP and co-relate the two protocol messages if the user is connected to both servers by the same provider. The popular OpenSER (now OpenSIPs and Kamailio) SIP server has a Jabber module to inter-work with XMPP network. People have developed clients that can understand both SIP and XMPP. Interworking is complex, and not all features can be completely translated or used from one protocol to another, unless the protocol is changed a lot with custom hacks.

Industry experts predict that both SIP and XMPP will stay for a long time. Rather than arguing about the differences or trying to mend the protocols to be like each other, one could build systems that use both these protocols for what each is good at. XMPP is good at creating application level streaming/secure/client-server pipes that can be used for shared state, one-to-many message delivery and publish-subscribe-notify-type use cases. SIP is good at rendezvous of session establishment and negotiation of session parameters for a separate session establishment.

To interwork between XMPP and SIP, you could (1) use a gateway at the server to translate the basic functions, (2) learn or send SIP parameters over XMPP message from a client, or (3) use SIP to establish XMPP chat session with a client. For example, a multi-protocol client of user may be talking to over SIP, and discover that both clients support XMPP, and then add each other in XMPP roster or start an XMPP chat session. Alternatively, if they are chatting over XMPP and discover that the other supports SIP as well, then they initiate a SIP session to do multimedia call. Implementing both the protocols in the client is better than in the gateway for scalability and robustness. There are other interworking architectures possible, e.g., having two XMPP servers use SIP to communicate with each other or talk to a trunking provider, or having an integrated SIP-XMPP server that allows both SIP and XMPP users to seamlessly communicate with each other. These modes, however, are not interesting from a P2P point of view.

Reliability and scalability in P2P-SIP

In this article I list the important definitions that contribute to reliability and scalability of distributed systems, especially P2P-SIP. Scalability is defined as the ease with which a system can handle growth of demand (load, users, requests, etc.), and reliability is defined as the ease with which a system handles failure or loss (fault tolerant, availability).

Stateless: The amount of state stored in each networked component limits the scalability (as well as robustness). A truly stateless protocol or service does not need to store any state in the system. At the application level, a session oriented protocol such as RTSP or RTMP is stateful, whereas HTTP is stateless which can be made stateful using session cookies. At the transport level, TCP is stateful whereas UDP is stateless. SIP networked components come in several flavors: stateless, transaction stateful and session stateful. A stateful component has limited scalability not because of storage requirement of the state, but due to matching of a new request against the existing states -- this requires exclusive or read-write-locked access to shared resources and hence is slow. On the other hand a stateless request has all the information that is needed for handling that request. Secondly, a stateful component has limited reliability because if the component fails the state is lost and must be re-established, e.g., using new SIP transaction or new HTTP authentication. If you must use stateful component, then some form of distributed or replicated shared state is desirable.

Partitioning: Data or service partitioning among multiple machines helps in reducing load on each machine. A naive approach of using a hash function 'H(data.key) mod N' works for distributing data based on the lookup key among N identical and robust servers. However, the hashing algorithm remaps majority of the data if N changes -- new server is added or old one fails. On the other hand, consistent hashing using a large hash space, e.g., MD5, maps both data using H(data.key) and machines using id=H( in the same identifier space. A machine can then store the data whose keys are close to its own id. This principle is used in several structured P2P algorithms such as Chord, Pastry, CAN, and works especially well with large number of machines with significant amount of churn. Partitioning can be applied to services as well. Partitioning improves system scalability, but comes with an overhead of maintaining the correct partition when machines come and go.

Replication: Replication of data improves its reliability (also known as availability) as well as scalability to some extent. In a simple master-slave replication of data, write is done to the master which replicates the command to the slave, and read can be done from either master or slave servers. With higher failure probability, such as in P2P, you need more number of replicas. With N replica of some data, you can still get the data if N-1 machines holding the replica fail. Replication interacts with partitioning -- you want to keep access to the replica almost as fast as the original data. Typically a machine can replicate the data to its N (typically small, 2-8) neighbors and when the machine fails, the next data query automatically gets to its ex-neighbors. Note that another approach where a different hashing function, e.g., H(i+data.key), is used to store the i'th replica does not work well, because replication comes with an overhead of having the machines to keep the replicas up-to-date.

Redundancy: Replication is an example of redundancy of data. Redundancy can be applied to servers and services as well. A stateless protocol or service helps in easily deploying redundant server nodes. Data redundancy goes beyond simple replication of data object to N places. Suppose a large file is split into M chunks such that only N chunks are needed to reconstruct the original file, N<=M. Assuming that many data sources are fast, this approach not only improves reliability but also performance in terms of how quickly you can download the file. Such techniques are used in P2P file sharing applications, and can be applied to P2P-SIP as well, e.g., for storing video mails or live streaming. Redundancy improves reliability as well as scalability of the system.

Load sharing: Load sharing is one type of redundancy where the load of the system is shared among N redundant machines. Although not required, load sharing typically works in conjunction with partitioning, e.g., in two-stage SIP server farm where the first stage servers forward the requests to the second stage server clusters based on H(data.key). As with redundancy, a stateless protocol or service can be easily shared whereas a stateful system requires more work. P2P systems are inherently load shared among the participating peers. Load sharing improves system scalability.

Iterative vs recursive: Iterative request routing is one where a client sends a request to one destination, receives a response that redirects it to second destination, and so on. Recursive request routing is one where a client sends a request to one node, which sends request to another node, and so on. Iterative vs recursive is also called as redirect vs proxy. Clearly, iterative poses less load on the networked element but more on client, whereas recursive is opposite of that. If scalability of networked element is desired then iterative should be preferred. However, NATs and firewalls make iterative request processing difficult on the Internet. Secondly, the topology, bandwidth and connections among the networked elements sometimes make recursive routing faster and more efficient in practice than iterative. The decision to go iterative vs recursive affects the number of message that needs to be handled as well as the state carried in the message or stored in the networked element. It is easier to incorporate redundancy in iterative mode, since only the client needs to re-try to redundant destinations.

Keep-alive: Network protocols usually have periodic keep-alive messages to ensure connectivity or to detect failures. Stateful Transport protocol such as TCP has built-in keep-alive mechanisms that can be activated using socket API. Application protocols employ some kind of application level keep-alive, e.g., XMPP has an extension to do ping, SIP has session timers. These not only detect failure due to network but also due to server software crash. The keep-alive messages help in improving the reliability of the system by quickly detecting complete or partial failures. Keep-alives are especially important in P2P network because of the large number of network paths that can fail.

Exponential back-off: After a failure has been detected, a reconnection or resend is attempted. The time for such attempts needs to be backed-off exponentially -- if the failure happens at time 0, then send first attempt at t, say t=0.5s, and subsequent ones at 2*t, 4*t, 8*t, and so on, until it reaches a cut-off, say 5 min. After that keep attempting periodically, say, every 5 min. If the failure is transient or one time, then this mechanism quickly reconnects. On the other hand, if the failure is longer term, then it reduces the load and bandwidth for reconnection attempts. SIP uses exponential back-off when sending subsequent requests in the event of failure.

Request-response: Application protocols typically come in two flavors: send-and-forget and request-response. Most signaling and control protocols such as HTTP and SIP follow the request-response architecture. Media streaming protocols such as RTP usually don't send response for every message. A request-response protocol automatically makes the entity sending the request a client and the entity receiving the request a server. The client-server distinction may be on per-transaction basis, e.g., SIP has each endpoint act as user-agent-client and user-agent-server. Similarly, in P2P network every peer acts as both client and server, while there may be some nodes behind NAT that can act only as client. A request-response architecture is needed where reliability of the message delivery is important or when RPC semantics is desired in the protocol.

Redundant connections: Redundant connections are another form of redundancy found in distributed systems. For example, a client may be connected to multiple servers. It periodically pings the server and selects the best server to actually send the request to. If the best server fails, it can fail-over to the next best server. Redundant connections improve both reliability and performance of the system. Redundant connections are also useful with geographically distributed server farms to locate the closest server. The idea is to dynamically adapt instead of having statically configured connections.

Bi-directional master-slave: Master-slave replication is used for data reliability and to improve scalability of read-dominated applications, as mentioned before. In a bi-directional master-slave configuration, both machines act as both master and slave at the same time. Any write to any of the machine gets propagated to the other, and hence any read can be done from any of the machines. This improves scalability for write-dominated applications also, such as SIP server. The bi-directional replication can be extended to more than two machines by incorporating a circular ring topology of replica machines. The bi-directional replication comes with the over head of having to maintain replicas on all the machines, and some way to solve a race condition where two updates to two different machines result in eventual consistency. For certain types of data, such as SIP contact locations of the users, it is possible to achieve such consistency.

Vertical vs horizontal scalability: A lot of time you will hear people talking about vertical vs horizontal scalability. Vertical scalability implies that when the load increases you identify the bottleneck and add a new component (e.g., CPU, memory, disk) to your machine to improve the scalability. Horizontal scalability implies that you design the system such that when the load increases you add another machine in the network to handle the load, e.g., another server in the server farm. P2P systems are horizontally scalable. Clearly vertical scalability has a limit beyond which it may not scale, whereas server farms can scale linearly by partitioning.

Proxy and Cache: Caching improves performance and scalability of the system. DNS has epitomized the concept. Caching is also used in web and media streaming protocols. The idea is to install a cache in an intermediate networked element which uses the cache instead of sending subsequent requests to the actual destination. HTTP is designed to heavily use caching to improve performance and scalability of the servers. Caching a negative response is more challenging since the time-to-live for the cached entry is unknown. On the other hand, real-time communication protocols such as SIP and RTP have limited use for caching in the network. Nevertheless caching of data improves the performance and scalability of the SIP server, e.g., by using in-memory cache of the SIP contact locations instead of reading from the database every time. With caching comes the overhead of maintaining consistency among redundant data.

Crash vs byzantine failure: Crash and byzantine failure are two types of failure models: a crash indicates that the system has stopped working and it may be possible to detect its failure, e.g., using keep-alive; whereas a byzantine failure indicates that the system does not behave consistently as per the agreed protocol or algorithm all the time. Hence, it is very difficult to detect byzantine failures, especially if it is due to malicious intention. Byzantine failure and malicious node problem is especially important in P2P since the peers may not trust each other. Both crash and byzantine failure reduce the reliability of the system. Various mechanisms mentioned in this article help mitigate the crash failure, but do not help much with the byzantine failure.

[1] Singh, K. and Schulzrinne, H., "Failover, load sharing and server architecture in SIP telephony", Computer Communication 30, 5 (Mar. 2007), 927-942. DOI= [Author's copy]
[2] K.Singh, "Reliable, scalable and interoperable Internet telephony", PhD Thesis, Computer Science Department, Columbia University, New York, NY 10027, June, 2006.

Security in P2P-SIP

I frequently receive questions on security in P2P-SIP, mostly from researchers looking for a new topic to explore. Security in P2P-SIP (and in P2P in general) is a challenging problem. In this article I summarize my understanding of the challenges and open problems.

The first literature on P2P-SIP mentions that P2P-SIP needs to solve the challenges of client-server Internet telephony as well as privacy, confidentiality, malicious node behavior and "free riding" problems of P2P. For example, a malicious node may not forward the call requests correctly or may log all call requests for future misuse. A later publication formally identifies the key security challenges and potential solutions. The paper on survey of DHT security techniques presents a comprehensive listing of challenges, solutions and problems. Let us classify the challenges:

User Authentication: Similar to client-server SIP, authentication is essential. A receiver need to verify that a sender posing as is actually the owner of that identifier. If the user identity is based off some other information or identity owned by the user, e.g., email address, phone number, postal address, social-security number, credit card number, PKI, X.509 certificate. etc., then it is possible to delegate the identity to that mechanism, e.g., by sending email or phone caller ID verification. The challenge can be further divided into: whether the user owns the identity? whether the user can randomly pick his identity to anything? whether a user can be made to believe that he has (wrong) ID or password? whether a malicious user can get password from another user in the pretext of authentication so that the malicious user can later assume the other user's identity?

Node Authentication: Additionally, since a number of P2P algorithms use the node identity to locate a node or define data storage criteria, the node ID is also a candidate for spoofing. A receiver must verify that the sender owns the node ID that it is posing as. The problem can be divided into sub-problems: whether the node ID are randomly picked or self generated by (malicious) nodes or assigned securely by some authority? whether the node ID can be spoofed in the protocol messages or data storage? whether a node can be made to believe by other nodes that it has (wrong) ID? whether a malicious node can get the authentication credentials of another node and later assume other node's identity? These problems if not addressed can result in other problems such as man-in-middle or denial-of-service (DoS) attacks. The most important question is: Can authentication be done in P2P without relying on central trusted authority?

Overlay Routing: A malicious node in a P2P network can drop, alter or wrongly forward a message intentionally manipulating the correct routing algorithm to disrupt the network and hence availability. This partly depends on node ID assignment mechanism, whether a node can intentionally place itself in the topology at a particular place? Further questions to ask: what fraction of malicious nodes affect what fraction of P2P network? What is the relationship between performance (availability, routing and data storage) of P2P and the fraction of malicious nodes or users?

Overlay Maintenance: A malicious node may invite more malicious nodes or copies of itself in the P2P network. A malicious node may partition the P2P network so that one part can not reach the other. A malicious node may reject join requests from other good nodes to prevent them from joining the network. The questions: what fraction of malicious nodes can affect what fraction of P2P network availability? Can a malicious node eventually affect the whole network given enough time? Can a malicious node affect the discovery of bootstrap node by other nodes that affects the joining process of other nodes? Can a malicious node intentionally place itself in the topology at a particular place (e.g., as super peer), so that it affects more number of overlay messages?

Free riding: A P2P network works because the peers do. If a node refuses to serve as a peer, but just use the service of the other peers, how do you handle this? Can the system enforce or give incentive to a particular node to become part of the overlay? What fraction of the nodes must be part of the P2P overlay for the overlay to work?

Privacy, Confidentiality, Anonymity: Unlike the client-server telephony, in P2P-SIP the call signaling and media messages may traverse through other nodes in the system. Can other nodes know who is calling whom and hence infringe on user's privacy? Worse, can a malicious peer listen to the conversation (audio, video, text, etc.) between two other peers? Can the system allow you to make anonymous calls so that the receiver does not know who is calling? Can the system allow you to receive calls (e.g., any-cast calls to call centers) without divulging your identity to the caller?

SIP services: The client server SIP implements several new features and services, but those have limited use in P2P-SIP because of the trust model. For example, programmable services using SIP-CGI or SIP Servlet are difficult, e.g., unless the receiving peer can completely trust the calling peer's CGI. Emergency services, spam prevention and lawful interception that have been researched in client-server SIP are pretty challenging in P2P-SIP.

Cost of security: Most of the existing protocols on the Internet suffer because people don't implement or deploy enough security. For examples, the front web page of many banks do not use HTTPS/TLS but have login forms. The reason is that system and operations engineers see security as an overhead, and do not use unless really needed. P2P takes this to extreme because the (in)security of one node can affect several in the network. The questions: What is the cost of security? How much does performance suffer in terms of number of messages, overhead, delay, for a particular security mechanism?

Given these problem there are several approaches the researcher are taking in solving. However, the core of some of these problems still remain unsolved. The general approach is to define a sub-set of the P2P-SIP system which works for the given security mechanism. For example, the P2P authentication is very challenging -- hence most implementations use a central certificate authority (CA) and everyone trusts that -- similar to the web browser model which comes installed with some root CA. The other approach is to build a closed P2P network of trusted implementations and provide the service to the rest of the untrusted users, e.g., OpenDHT and Dynamo. This works similar to the server farm model, except that the server farm is built using sub-set of P2P features -- self adjusting, less configuration, distributed data storage, geographically distributed. Another approach is to build the closed and proprietary system and protocol which prevents (to some extent) others from injecting the malicious node in the system, e.g., Skype. Unfortunately, sooner or later the protocol gets reverse engineered and the security is not longer present. The research on distributed trust, reward, or credit/debit system works well for file sharing but has not be successfully proven for P2P-SIP. Finally, some researchers focus on the statistics and availability of the whole network, with the theory that a small fraction of malicious nodes do not disrupt the whole network. If there is enough incentive for the nodes to remain good, this may work well.

If you are interested, please read the article on when P2P makes sense? In particular, if (1) most of the peers do not trust each other AND (2) there is not much incentive to store the resources then P2P does not work well because the system does not evolve naturally to work. Think of it as people who do not trust each other and they do not have much incentive to help others or to store information for other people, will a person be able to get information he needs that another person has? The subset of problems I listed in the previous paragraph all try to twist the problem such that peers trust each other, i.e., (1) and the system tends to evolve naturally to work. Still more research is needed in (2) to identify and develop the incentive model for P2P-SIP use case.

The (in)security of Flash Player's crossdomain

This article discusses the (in)security of Flash Player's crossdomain or cross-domain-policy mechanism and why it is against P2P. Anyone who has worked with Flash Player's network (URLLoader, Socket, etc) to implement a protocol would know the problem and pain that Flash Player causes due to it (broken) crossdomain security.

The problem: A programmable content downloaded from site1 running in user's browser should need explicit permission to connect to or use content from another site2. Otherwise, a Flash application may randomly connect to or use content from any other site without user's knowledge and pose security risks.

The real problem: The real problem is in the way Flash Player tries to solve the above problem. If a Flash application is downloaded from site1, and wants to access or connect to another site2, then that site2 must give explicit permission using a web accessible http://site2/crossdomain.xml or in-line cross-domain-policy response in the TCP connection. The crossdomain.xml file lists all the other domains (such as site1) whose Flash applications are allowed to connect here, and also lists all the ports to which they can connect. There are options to give wild-card (*) for domains and ports. Thus, only if site2 trusts site1, will it allow site1 to connect.

The first problem with this approach is the trust model: Flash Player asks site2 instead of the user for permission. This means user still does not have control what other sites the Flash application connects to; if site1 used wild-card domains then any application can connect to it; and admins of site1 and site2 must co-ordinate and collaborate. In most real deployment, this means that site1 and site2 are owned by the same entity and the deployment builds a false sense of closed walled garden of client-server applications.

The second problem is that it is very easy to work around: if site1 is wants to use or connect to site2, but site2 does not trust site1, then site1 can install a connection proxy on its site1 and have the Flash application connect to site2 via this proxy on site1. So it does not really protect site2 from access from any third-party Flash application -- i.e., there is no closed walled garden for site2. What it actually means is that if site2 has a content or service then anybody can build a Flash application to access that content or service as long as he can host a proxy on the Internet. You just need _one_ person in the whole Internet with good bandwidth and an open proxy to potentially break the crossdomain trust model of _all_ the Flash applications.

The third problem is that it assumes Flash Player binary will not be reverse engineered: This is the worst sense of security in the literature. When a Flash application is downloaded from site1 by Flash Player, and wants to make connection to a another site2, it just checks the URL from where the Flash application was downloaded. If the URL contains the domain of site2, then usually the crossdomain check is not done, but if the URL does not contain site2, then it first gets crossdomain.xml file. If some person is able to hack and modify the URL variable inside Flash Player or in the transit on non-secure HTTP, then the Flash Player will effectively ignore the crossdomain check.

The fourth problem is the way it has evolved: The initial implementations of crossdomain policy was broken, with lots of ways to work around. With newer implementations of Flash Player some of these problems have been fixed. However, that means the older mechanism is deprecated and newer mechanism no longer works with older. Concrete example of the problem follows. Should an application downloaded from http://site1/p/app.swf be allowed to connect to site1:5222 (non-HTTP port), or to http://site1 or http://site1/q/second.swf? If yes, then public hosting of personal web pages effectively open up all the Flash applications of all the hosted users on that server. So why not define a meta-policy at the top level http://site1 which controls everything else? How can site2 allow only one application from site1 but not other to connect? How can site2 allow only applications that are signed by site2 to connect but not others, even if those applications are hosted on other sites? No answer!

The fifth problem is that it depends on other unsecured mechanisms: HTTP and DNS. I won't describe details on these, but it would have been nicer if the security was based on code signing or standard authentication mechanisms instead of comparing the hostname from the HTTP URL.

The sixth problem is its incompatibility with existing systems: When implementing a custom non-HTTP protocol, the Flash Player sends the first request as <policy-file-request/> on the connected socket. What if the service does not understand this request? Well use meta-policy. In that case for it to work, the meta-policy needs to be served from say, 843. What if another server is present on the machine on port 843? Or what if you want to add handling of this policy-file-request in your own service protocol? In my personal experience getting this in an XMPP server was a nightmare. What if Flash Player puts a nul character after its request? You cannot control the request or response for policy-file from ActionScript because it happens automatically in Flash Player before a socket is actually connected in the ActionScript.

The solution: In the context of open services and/or open network and/or P2P, the crossdomain is a problem rather than a solution. If site2 has hosted an open service, it should not restrict anybody to build applications to connect to site2. Thus site2 should put a crossdomain.xml with wild-card domains and ports. If site1 has a built a Flash application for an open service, the application should be allowed to connect to any service that follows the protocol as long as the user approves the connection. Thus site1 should build Flash application with correct user authentication, and then proxy the connection via site1's proxy to the site2's service instead of having it connect directly to site2. This allows site1 and site2 to operate independent of each other as long as they implement a common service protocol, e.g., XMPP or P2P-SIP.

The real solution: A correct security implementation in Flash Player should have done something like the following:
1. When the Flash application tries to connect to another new site, ask the user (similar to the camera and microphone security settings) if he wants to allow the connection. Also give an option to remember the approval if needed.
2. Allow site2 to sign a Flash application, and require that only signed Flash application with so-and-so root certificate should be allowed to connect to site2.
Beyond these any site should be free to implement its own authentication mechanism.

The Internet Video City

(For the last month and half, I have been aggressively involved in another open source project, "videocity". This article describes the salient features and novel ideas in that project.)

The goal of the Internet video city project is to provide open source software tools, both client and server, for video communication and sharing. Unlike other file sharing systems, this is targeted towards video and live video sharing in small groups. Unlike other video communication services, this project provides the tools needed to build a service.

High level description

At the high level, the video communication is abstracted out as a city. An individual can signup with his email and own a home with URL of the form http://server:5080/ This is also the location of the default guest room of that user. The user can build other rooms inside this URL, e.g., for hosting a online family gathering, he can get a room with name "Family Gathering" and the room URL of the form http://server:5080/ Each room can be made public or private. A public room is accessible to anyone visiting the URL of the room, whereas a private room needs explicit permission to enter.

Once you have entered a room, you see other members in the room, and can communicate with others using real-time audio, video and text chat. You can share media files such as photos and videos from your computer with others in the room. You can also share online photos and videos with others. All these shared resources are put in an active session and would disappear when the room is closed, i.e., all members have left the room.

The owner of the room can decorate his room by uploading, recording or editing the room's content. A room's content is described using an XML file containing multiple play lists. Each play list contains sequence of media files or URLs. When you enter a room, you see all the pre-configured play lists in that room. This allows the owner to, for example, create a room with his family pictures and videos in a slide show, and give out the URL to others to view the photos. A media resource in a play list can be text, image or audio/video. The image and audio/video can be uploaded from user's computer, downloaded from a web URL or recorded using user's camera in real-time. The play list can be readily edited using drag-drop, built-in text editor or various button controls.

Each signed in user also has an inbox. The inbox is a special XML file that gets loaded when a user logs in, and contains play lists that are sent by other users to this user. When you enter a room, you have an option to send a play list to the owner of the room, which turns up in the owner's inbox. You can record the play list using your camera, or create one using resources available from the web. The play list stored in the inbox is privately available only to the owner of the room.

This simple concept of play list and rooms, allows us to implement various communication scenarios. For example, real-time communication, video mails, publicly posted videos, and video web sites.

Novel idea

One of the novel concept used in the project is that of soft-card. A soft-card is a digital version of your ID card or visiting card. There are two types of cards: a Private login card is your confidential ID card that you use for login to the site, an Internet visiting card is your room's visiting card, which you give out to your friends so that they can visit your room. Usually each signed in person has a private login card, and each room owned by the person can have an Internet visiting card.

A soft-card looks like a digital image of your real ID and visiting cards. It is actually a image file in PNG format. The image has a photo, your name or your room's name, some list of key words identifying your room, and a URL of your room. Unlike a regular PNG file, a soft-card has additional meta information that is used in secure identification and access. In particular, your private login card has your RSA private key (refer to PKI) and your Internet visiting card has X.509 certificate using RSA public key signed by the server. These meta information such as keys, certificates, names, emails, keywords, etc., are stored in information chunks of the PNG file itself.

Similar to public key cryptography, these digital files can allow us to implement security, authentication, access control, privacy, confidentiality, etc. Essentially, anything you can do with PKI, you can do with these soft-cards. Additionally, these soft cards give a visual appearance of an ID card or a visiting card containing the URL which they represent. Users receive them in email on signup, and can give out visiting card to others in email. An example visiting card is shown at the top of this article. If you edit the card's file or image in any way, e.g., converting to JPEG and back, or edit using photo editors, then the card's key information will become invalid and unusable. Note that a card is valid only within the domain it is created for. Thus a card created for http://server1/room1 can not be used by http://server2/room1 even if both server1 and server2 virtual domains are hosted by the same server.

Once we have the login (private key) and visiting (public key) cards, implementing rest of the security mechanisms is straight forward. For example, resources in an inbox can be encrypted using public key of the owner, so that only a private login card can decrypt it. The public rooms are signed by owner's private key, so that anyone with the visiting card of the room can verify the signature. When sending a media resource to another user, PKI can be used to establish a secure session of communication. A room can be made private by allowing only connections from people who have valid visiting card for that room, and have the owner send out visiting card to his friends and family using an independent channel such as email. A room can be made public by uploading the visiting card to the room itself, so that anyone with the URL can first download the visiting card (i.e., public key) and use that to connect to the room. Although we haven't implemented most of the security mechanisms, we have the basic soft-card concept implemented in the project. In particular, you can create your cards, edit the layout of the card during creation, download them after creation, and use them to upload in the client to join a room or to log in. One thing to note is that within the Flash Player environment, the amount of security using PKI is limited. But since we have our own video server implementation as well, we can do some novel tricks in that regard.

Product design ideas

There are several product design ideas we implemented in the project: (1) consistency, (2) flowing and smooth interface, and (3) performance. In this section, I describe these ideas and how they are implemented.

Consistency is very important in user interface design. The look and feel of various buttons should be consistent. Common operations should be consistent with what people are used to doing. For example, most windows users see the 'close', 'maximize', 'minimize' buttons on the top-right corner. Most mac users see the bottom bar as tools or commands bar. Most instant messaging users see notifications on the bottom-right corner of their screen. We used these concepts in our UI design as well.

Flash allows us to implement nice, smooth and flowing user interface. When you go from one room to another, the view slides your window from one room to another. The sliding window component in the project nicely abstracts out the details of this container. When a help video is played, it animates to the full view, and when it is paused, it goes back to the original position. For help videos, flowing subtitles along with audio/video give a better user experience. Computer users are comfortable with drag-and-drop operations using the mouse. In our project, the play list editing, video window re-organizing, delete button, etc., use the drag-and-drop mode of operation.

Performance is important once the project grows to a significant size. In particular, a Flash Player spends lot of cycles rendering images. This is improved significantly in our project since we use only programmatic skins for all our buttons and icons. Moreover, programmatic skins scale nicely when going to full screen or different size.

There were a number of lessons we learned in this project from the product design perspective. Moreover, being responsible for both product design and product engineering helped us avoid ambiguity, which is usually seen in multiple team projects.

The big picture

Although, the project is still "work in progress" and a lot of work is remaining, I wanted to give a big picture of the project. Flash Player is a great browser plugin. However being proprietary makes it hard for others to use it in full potential. For example, until recently the video communication was restricted to only Flash media server, or file upload were not allowed from local computer to Flash Player without going through the server. Although Adobe is making significant progress in keeping the developer community engaged, (e.g., making RTMP protocol open, or making file uploads and downloads available in new Flash Player) there will always be some restriction in the Flash Player. For example, absence of H.264 encoder or good audio quality/preprocessing engine prevents us from using it efficiently in true H.264 video communication or good real-time audio communication. In any case, since the RTMP protocol is open, and since there are a number existing open source RTMP implementations, one can use back-end RTMP based servers to perform some processing.

This videocity project gives us back-end tools to intercept RTMP, integrate web communication, and expose a single server to support various requirements of video conferencing. One can ask whether this will scale? The answer is, may be, not. The reason for doing the project though is that it fits nicely in the big picture of P2P-SIP based communication framework. Flash gives a nice ubiquitous browser based front end, whereas our videocity server gives tools that can be integrated with peer-to-peer network. Thus we can gain from advantages of both worlds.

Distributing a conference in a P2P network is an already researched problem. Several solutions exist, ranging from application level multicast for large conference, to full mesh small conferences, to picking a few servers as relay bridges. Maintaining shared distributed state of the conference and collaboration is interesting to explore. The SIP community has done significant work in centralized conferencing framework, e.g., in the IETF XCON working group. The P2P-SIP working group is creating protocol for standards based peer-to-peer network maintenance and lookup for SIP service. Finally, some API or interface specification is needed for the videocity's client-server model so that others can build clients or server adaptors to integrate between XCON, P2P-SIP and videocity. In particular, we will define all the interface elements such as format of the soft-card, various RPC calls for uploading or downloading resources, sharing play lists, authenticating users, as well as communication mechanisms.

In summary, the project gives developers a starting point from where you can build video communication service, video message platform, video recording and editing system, collaboration engine, media sharing software, video blog web site, video rooms, multi-party conferencing applications, desktop clients, browser extensions, application sharing, new client applications, and so on. The client-server tools available in the project allow you to record a video or snapshot photo from your camera and store it in local file, create play lists of various heterogenous media resources, and share live and stored media with others using the system.

There is no hosted service for this software, and we don't plan to have one. This is because our goal is to go peer-to-peer, where various installations of the software will discover and communicate with each other!

Thank you for your reading time, and we love feedback!

Beauty of open source

This article presents an analogy between software projects and beauty: open source projects have natural beauty, whereas commercial projects acquire beauty through cosmetics and makeup. [see previous article].


Software project

Natural beauty is, well, natural, whereas cosmetics give artificial sense of beauty.

Open source software are built when some motivated developer feels like building something, where as commercial software are mostly built by engineers who are paid and forced to build.

Natural beauty is long lasting, whereas makeup wear down after few hours or days.

Open source software lives longer, whereas commercial projects tend to get out beaten by competition sooner or later.

You are born beautiful and don't have to pay for natural beauty, whereas cosmetics cost money, big money.

Open source software are mostly free, whereas you have to pay for commercial software.

It is hard for salesman to sell fruits and water to enhance your beauty, whereas it is easy for salesman to sell cream and nail-polish. Sometimes these advertisements are deceptive.

Usually you don't find people advertising their open source work much, whereas companies have dedicated sales team to sell the commercial product. Sometimes these sales pitch are deceptive.

Natural beauty is usually open and does not hide scars, whereas cosmetics are meant to hide scars, marks, etc., to give a (false) sense of beauty.

Open source software are open source code, where developers can jump right in the code and see things. Commercial software hides the source code, and instead presents documentation, power points, sales pitch, etc., to give a (false) sense of what is inside and actually hiding what is inside (source code).

Natural beauty does not require support. On the other hand if you are putting a nail polish, you will need a remover; if you are putting on make up, you will need to remove it before bed.

Open source software usually comes with no support. If you have it, you own it, and you put up with it. Commercial software usually comes with (expensive) support system. If you have it, you need to keep paying for bug fixes and upgrades. Otherwise it will harm you sooner or later.