Kundan Singh: October 2010

Why do P2P-SIP?

One of the questions I often get: if SIP itself is peer-to-peer, why do P2P SIP?

Unlike typical client-server web application or Jabber presence system, a SIP user agent is both client and server. Which means, a SIP user agent can send as well as receive a SIP request. In theory, a SIP proxy is not a required component in a SIP system. Ideally, the caller does not need to know whether a call to "sip:bob@some-server.com" will be received by a SIP user agent directly or by an intermediate SIP proxy server first. In practice, people often refer to an intermediate SIP proxy (and registration) server as the SIP server, and such SIP servers are norm rather than exceptions.

In theory, once the end-to-end SIP session is established between the caller and callee user agent, the media can traverse directly between the two user agents on the IP network. This is what makes SIP systems support peer-to-peer (or end-to-end) media path. In practice, presence of network address translators and firewalls prevent direct IP connectivity between the two user agents for media path. This requires workarounds such as interactive connectivity establishment or network relay services such as media-proxy.

A bigger problem in practice is that due to business reasons, the VoIP provider does not want the media path to be end-to-end, so that it can have control over the "service" for accounting, billing, advertisement or other reasons.

In short, SIP is a tool that enables peer-to-peer (or end-to-end) media as well as signaling path. Protocols are like tools, which vendors use to build applications and systems, similar to how a construction worker uses various tools to build a house. In the current Internet, many vendors have used SIP to build closed walled gardens of managed services, unlike what SIP was initially envisioned for.

On the other hand, P2P-SIP is a realization of this problem to explicitly define SIP-based communication without using managed servers. Instead of central SIP servers managed by your provider, which can impose constraints to break end-to-end media and communication services, P2P-SIP aims at decentralizing the registration and proxy functions so that the signaling path is peer-to-peer.

The main benefits of P2P-SIP are as follows:

Organizations and services providers can save cost of server maintenance. In particular, there is no new training or position required specifically for a VoIP IT staff. There is no need to host dedicated servers in data centers with 99% availability and pay for energy and bandwidth.
The VoIP industry can move away from traditional service provider oriented business to a more open end-to-end user application. Essentially, it becomes more inline with your other Internet services such as web and email: how web browser don't distinguish between servers or domains, and how you can send email using any mail client and any provider to anybody else.
Finally, the most important reason is that the cost saving eventually propagates to end-user, who can enjoy free VoIP service as long as they are paying for their IP network access. Peer-to-peer infrastructure enables highly scalable communication system such as Skype at a very low cost. A small vendor's VoIP is able to scale to millions of users only if it can save cost of server maintenance and bandwidth.

The important aspects of P2P-SIP are:

It does not depend on a VoIP service provider for signaling and media path. Hence there is not much money for managed services in P2P-SIP.
It can use end-user devices on public Internet to relay media path for end-users behind restricted networks. Hence it requires incentive for public end-users to help restricted end-users.

As you can see, benefits of P2P-SIP are for end-users, at the cost of service provider businesses. Most of existing VoIP effort is driven by large corporations and carriers who do not have any interest in making the service open to end-users and lose control over them.

Theory vs practice of SIP-based VoIP

I recently attended the VoIP conference and expo [1], at Illinois Institute of Technology, organized by Prof Carol Davids, and also got a chance to speak on a couple of topics [2]. There were many interesting presentations in the conference giving perspectives from leading software and equipment vendors, carriers and service providers, government bodies, standardization forums as well as open source developers. This article presents some of my thoughts regarding the theory and practice of SIP-based VoIP.

The inaugural session showed a demonstration of IP-based 911 call by students by integrating and using the software pieces developed at other universities. It was great to see sipd [3] being used by other universities for exciting new projects, and brought back the memories when we were developing sipd.

Theory

The session initiation protocol (SIP) was invented to create and control multimedia sessions on the Internet because the previous protocols were either insufficient (e.g., HTTP, RTSP) or too complex (e.g., H.323). Unlike HTTP which typically requires a dedicated server, SIP was designed to be more peer-to-peer. Hence, your VoIP phone itself acts like both client as well as server to send and receive SIP requests. Ideally, you do not need to keep the SIP proxies in the session path, except for initial call setup. The protocol is designed to encourage subsequent requests such as ACK, BYE or re-INVITEs to be end-to-end, if possible. The protocol includes mechanism to enable an intermediate proxy to require that all subsequent requests in a session be sent via that proxy. But this was designed to be used as an exception rather than a rule. The motivation is scalability: keep the proxies lightly loaded.

In theory, a SIP-based system follows the end-to-end principle of Internet: keep the intelligence in the end, and have the network (or intermediate proxies) be dumb. The end-to-end principle has been crucial in the success of the Internet and more recently the web applications. As long as you keep the network provider independent of your application provider, you see explosion of application innovations.

Practice

In practice, your SIP-based system is typically "owned" by your network provider who has business incentive to provide you billable applications and services, and prevent you from talking to other open SIP-based systems without going through their billed "services". The largest SIP systems such as Comcast and Verizon digital voice are designed to be closed systems which use SIP in the network without allowing end-users to access it directly. More recently, Apple's Facetime is a closed service and does not inter-operate with other SIP services. With SIP-based IP multimedia subsystem (IMS) being adopted by wireless carriers, there is more incentive for businesses to convert SIP from an end-to-end protocol to a network centered and managed service.

One of the term I kept hearing during the VoIP conference was the "managed" services, and why it is useful for consumers, and what kind of new innovations are happening. When I looked at details, these services and innovations are basically what SIP already provides, e.g., service APIs, unified communications, etc. I had the opportunity to work on some of these 5-10 years ago when I was at Columbia University. It is discouraging to know that because of narrow minded business incentives of vendors and carriers, walled gardens of SIP systems are created which prevent open innovations and require significantly many fold effort to get basic features to the consumers. First, (1) the vendors and carriers use an open protocol, SIP, to build a VoIP system, then (2) invest resources in making it a walled garden, and finally (3) invest more money and resources to create federation of these walled gardens. In the end, (2) and (3) nullify each other, and just (1) was sufficient. All the money invested in (2) and (3) gets wasted in the long term.

Conclusion

I would like to advise vendors and carriers to just focus on providing a good IP network with some quality of service and less restricted NATs, and leave the rest of the VoIP services to the millions of application developers. As Henry Sinnreich said during the conference, the only service is "the Internet". Instead of providing "managed" network services, open up your network for end-to-end innovations. In the long term this will boost your network and bring more revenue. With Internet and web, there is more opportunity for everyone, and a walled garden approach in the network is just going to keep you away from the long term growth.

The open innovations in VoIP are bound to happen. If you do not want to be part of that, someone else definitely will. Adrian Georgescu presented Blink [4], a fully featured easy to use SIP user agent, as a great example of SIP-based open innovation. For every VoIP developer in "managed" service organization, there are probably a thousand independent developers such as web application developers. Sooner or later, these developers will build some open platform or system which will attract hundred times more traffic than your managed services and hence many fold more revenue. At that time all your investment in "managed" services will go down the drain. Because if you don't open up your SIP systems, these developers will not wait for it, and build something else. This has happened before with Internet and web applications, and will happen again sooner or later with Internet voice/video communication or VoIP.

Tips for implementing application protocols

This article presents some tips for implementing application protocols such as for web services, multimedia communication, streaming or Internet telephony. The tips are mostly relevant for implementations in the Python programming language.

Keep all blocking operations outside your protocol implementation. This mostly includes sockets, files and timers. If you design your protocol parser and controller to be independent of blocking calls, then it can easily be converted to various asynchronous or synchronous controllers as needed. For example, the rfc3261.py module implements core SIP stack using the Stack class. The application supplies API for timer creation, message receiving as well as sending. When the application receives a packet on socket, it invokes a method on the stack. When stack has parsed the received packet and needs to inform an high-level event such as incoming call to the application, it invokes a method on the application. This allows the application such as voip.py to provide co-operative multitasking based controller. On the other hand, the built-in HTTPServer in Python includes synchronous and blocking calls for sockets and disks. This makes the built-in class' HTTP implementation hard to use for various high-performance application that cannot afford to block. Due to this, almost every web framework implements its own HTTP, instead of re-using the built-in class. The trade-off is that your implementation may become more involved if you keep blocking operations outside.
Do not use multi-threading in your protocol implementation. Firstly, getting a multi-threaded application right is very hard. Secondly, for CPU intensive tasks or disk I/O bound tasks, the CPython's global interpreter lock (GIL) will prevent efficiency anyway; hence multiprocessing should be used. Thirdly, for network I/O bound applications multi-threading has advantage, but not as much as multiprocessing. Consider using multiprocessing, but beware of cross-platform problems, especially on Windows! In my experience, co-operative multitasking (or green-thread) works best for protocol implementation. If you are worried about efficiency on multi-core CPUs, you should leave that decision to the main application that will use your protocol implementation to present a client or a server application. The main application can decide whether to use multi-threading or multiprocessing and co-ordinate among them.
Decouple the protocol parsing and handling implementation. Sometimes you may need to use just the parser without the handler. For example, if a single incoming TCP connection can have either HTTP, SIP or RTSP messages, then it becomes easier for the application to first parse the message to determine what it is, and then invoke the appropriate handler. Because of NAT and firewall, many application protocols need to be sent via a single port, e.g., 80 or 443. If an application from Flash Player connects to your server on TCP, it will first send a socket policy request, before sending any other actual application protocol message. If your protocol parser is separate from the handler, you can invoke the socket policy request parser as well as protocol parser, to determine what request it is.
Avoid blocking on DNS lookup, if possible. This goes back to first point; do not block in your protocol implementation. Usually it is hard to notice the DNS lookup as blocking. Most built-in libraries provide synchronous and blocking calls for DNS lookup. Consider using some asynchronous DNS library. If that is not possible, move the DNS lookup out of the core protocol implementation, to the main application. Sometimes DNS lookup is done during logging, e.g., to convert client IP address to host name, and may be hard to detect.
Log all warning, errors and exceptions. In server implementations, you may get tempted to handle various exceptions and ignore it, to make your server more "robust". Unfortunately, this practice leads to more headaches later on when some critical bug appears but is hard to detect. If you log all warning, error or exception conditions, even if you ignore them, you may be able to detect such bugs early on. A warning is a suspicious behavior either in your code or external system. An error is a failure case due to some external problem, e.g., file requested by client was not found on server. An exception is most likely a programming mistake, e.g., accessing attribute on "NoneType".
Do not hold on to resources. With automatic reference counting and garbage collection, it becomes your responsibility to free up any unused references. Typically the application protocol defines how long the resources should be kept, e.g., how long a SIP transaction lasts. But there are some resources which can persist for much longer duration, e.g., user contact location. External databases are more suitable for such resources. Secondly, with event driven software architecture such as listener-provider model, it is easy to get in to reference loops, e.g., listener has reference to provider and vice-versa. Similarly, a Message object may have list of Header objects, and each Header may refer back to the Message. Your cleanup code should correctly free up unused references, e.g., "del varname".