Reliability and scalability in P2P-SIP

In this article I list the important definitions that contribute to reliability and scalability of distributed systems, especially P2P-SIP. Scalability is defined as the ease with which a system can handle growth of demand (load, users, requests, etc.), and reliability is defined as the ease with which a system handles failure or loss (fault tolerant, availability).

Stateless: The amount of state stored in each networked component limits the scalability (as well as robustness). A truly stateless protocol or service does not need to store any state in the system. At the application level, a session oriented protocol such as RTSP or RTMP is stateful, whereas HTTP is stateless which can be made stateful using session cookies. At the transport level, TCP is stateful whereas UDP is stateless. SIP networked components come in several flavors: stateless, transaction stateful and session stateful. A stateful component has limited scalability not because of storage requirement of the state, but due to matching of a new request against the existing states -- this requires exclusive or read-write-locked access to shared resources and hence is slow. On the other hand a stateless request has all the information that is needed for handling that request. Secondly, a stateful component has limited reliability because if the component fails the state is lost and must be re-established, e.g., using new SIP transaction or new HTTP authentication. If you must use stateful component, then some form of distributed or replicated shared state is desirable.

Partitioning: Data or service partitioning among multiple machines helps in reducing load on each machine. A naive approach of using a hash function 'H(data.key) mod N' works for distributing data based on the lookup key among N identical and robust servers. However, the hashing algorithm remaps majority of the data if N changes -- new server is added or old one fails. On the other hand, consistent hashing using a large hash space, e.g., MD5, maps both data using H(data.key) and machines using id=H( in the same identifier space. A machine can then store the data whose keys are close to its own id. This principle is used in several structured P2P algorithms such as Chord, Pastry, CAN, and works especially well with large number of machines with significant amount of churn. Partitioning can be applied to services as well. Partitioning improves system scalability, but comes with an overhead of maintaining the correct partition when machines come and go.

Replication: Replication of data improves its reliability (also known as availability) as well as scalability to some extent. In a simple master-slave replication of data, write is done to the master which replicates the command to the slave, and read can be done from either master or slave servers. With higher failure probability, such as in P2P, you need more number of replicas. With N replica of some data, you can still get the data if N-1 machines holding the replica fail. Replication interacts with partitioning -- you want to keep access to the replica almost as fast as the original data. Typically a machine can replicate the data to its N (typically small, 2-8) neighbors and when the machine fails, the next data query automatically gets to its ex-neighbors. Note that another approach where a different hashing function, e.g., H(i+data.key), is used to store the i'th replica does not work well, because replication comes with an overhead of having the machines to keep the replicas up-to-date.

Redundancy: Replication is an example of redundancy of data. Redundancy can be applied to servers and services as well. A stateless protocol or service helps in easily deploying redundant server nodes. Data redundancy goes beyond simple replication of data object to N places. Suppose a large file is split into M chunks such that only N chunks are needed to reconstruct the original file, N<=M. Assuming that many data sources are fast, this approach not only improves reliability but also performance in terms of how quickly you can download the file. Such techniques are used in P2P file sharing applications, and can be applied to P2P-SIP as well, e.g., for storing video mails or live streaming. Redundancy improves reliability as well as scalability of the system.

Load sharing: Load sharing is one type of redundancy where the load of the system is shared among N redundant machines. Although not required, load sharing typically works in conjunction with partitioning, e.g., in two-stage SIP server farm where the first stage servers forward the requests to the second stage server clusters based on H(data.key). As with redundancy, a stateless protocol or service can be easily shared whereas a stateful system requires more work. P2P systems are inherently load shared among the participating peers. Load sharing improves system scalability.

Iterative vs recursive: Iterative request routing is one where a client sends a request to one destination, receives a response that redirects it to second destination, and so on. Recursive request routing is one where a client sends a request to one node, which sends request to another node, and so on. Iterative vs recursive is also called as redirect vs proxy. Clearly, iterative poses less load on the networked element but more on client, whereas recursive is opposite of that. If scalability of networked element is desired then iterative should be preferred. However, NATs and firewalls make iterative request processing difficult on the Internet. Secondly, the topology, bandwidth and connections among the networked elements sometimes make recursive routing faster and more efficient in practice than iterative. The decision to go iterative vs recursive affects the number of message that needs to be handled as well as the state carried in the message or stored in the networked element. It is easier to incorporate redundancy in iterative mode, since only the client needs to re-try to redundant destinations.

Keep-alive: Network protocols usually have periodic keep-alive messages to ensure connectivity or to detect failures. Stateful Transport protocol such as TCP has built-in keep-alive mechanisms that can be activated using socket API. Application protocols employ some kind of application level keep-alive, e.g., XMPP has an extension to do ping, SIP has session timers. These not only detect failure due to network but also due to server software crash. The keep-alive messages help in improving the reliability of the system by quickly detecting complete or partial failures. Keep-alives are especially important in P2P network because of the large number of network paths that can fail.

Exponential back-off: After a failure has been detected, a reconnection or resend is attempted. The time for such attempts needs to be backed-off exponentially -- if the failure happens at time 0, then send first attempt at t, say t=0.5s, and subsequent ones at 2*t, 4*t, 8*t, and so on, until it reaches a cut-off, say 5 min. After that keep attempting periodically, say, every 5 min. If the failure is transient or one time, then this mechanism quickly reconnects. On the other hand, if the failure is longer term, then it reduces the load and bandwidth for reconnection attempts. SIP uses exponential back-off when sending subsequent requests in the event of failure.

Request-response: Application protocols typically come in two flavors: send-and-forget and request-response. Most signaling and control protocols such as HTTP and SIP follow the request-response architecture. Media streaming protocols such as RTP usually don't send response for every message. A request-response protocol automatically makes the entity sending the request a client and the entity receiving the request a server. The client-server distinction may be on per-transaction basis, e.g., SIP has each endpoint act as user-agent-client and user-agent-server. Similarly, in P2P network every peer acts as both client and server, while there may be some nodes behind NAT that can act only as client. A request-response architecture is needed where reliability of the message delivery is important or when RPC semantics is desired in the protocol.

Redundant connections: Redundant connections are another form of redundancy found in distributed systems. For example, a client may be connected to multiple servers. It periodically pings the server and selects the best server to actually send the request to. If the best server fails, it can fail-over to the next best server. Redundant connections improve both reliability and performance of the system. Redundant connections are also useful with geographically distributed server farms to locate the closest server. The idea is to dynamically adapt instead of having statically configured connections.

Bi-directional master-slave: Master-slave replication is used for data reliability and to improve scalability of read-dominated applications, as mentioned before. In a bi-directional master-slave configuration, both machines act as both master and slave at the same time. Any write to any of the machine gets propagated to the other, and hence any read can be done from any of the machines. This improves scalability for write-dominated applications also, such as SIP server. The bi-directional replication can be extended to more than two machines by incorporating a circular ring topology of replica machines. The bi-directional replication comes with the over head of having to maintain replicas on all the machines, and some way to solve a race condition where two updates to two different machines result in eventual consistency. For certain types of data, such as SIP contact locations of the users, it is possible to achieve such consistency.

Vertical vs horizontal scalability: A lot of time you will hear people talking about vertical vs horizontal scalability. Vertical scalability implies that when the load increases you identify the bottleneck and add a new component (e.g., CPU, memory, disk) to your machine to improve the scalability. Horizontal scalability implies that you design the system such that when the load increases you add another machine in the network to handle the load, e.g., another server in the server farm. P2P systems are horizontally scalable. Clearly vertical scalability has a limit beyond which it may not scale, whereas server farms can scale linearly by partitioning.

Proxy and Cache: Caching improves performance and scalability of the system. DNS has epitomized the concept. Caching is also used in web and media streaming protocols. The idea is to install a cache in an intermediate networked element which uses the cache instead of sending subsequent requests to the actual destination. HTTP is designed to heavily use caching to improve performance and scalability of the servers. Caching a negative response is more challenging since the time-to-live for the cached entry is unknown. On the other hand, real-time communication protocols such as SIP and RTP have limited use for caching in the network. Nevertheless caching of data improves the performance and scalability of the SIP server, e.g., by using in-memory cache of the SIP contact locations instead of reading from the database every time. With caching comes the overhead of maintaining consistency among redundant data.

Crash vs byzantine failure: Crash and byzantine failure are two types of failure models: a crash indicates that the system has stopped working and it may be possible to detect its failure, e.g., using keep-alive; whereas a byzantine failure indicates that the system does not behave consistently as per the agreed protocol or algorithm all the time. Hence, it is very difficult to detect byzantine failures, especially if it is due to malicious intention. Byzantine failure and malicious node problem is especially important in P2P since the peers may not trust each other. Both crash and byzantine failure reduce the reliability of the system. Various mechanisms mentioned in this article help mitigate the crash failure, but do not help much with the byzantine failure.

[1] Singh, K. and Schulzrinne, H., "Failover, load sharing and server architecture in SIP telephony", Computer Communication 30, 5 (Mar. 2007), 927-942. DOI= [Author's copy]
[2] K.Singh, "Reliable, scalable and interoperable Internet telephony", PhD Thesis, Computer Science Department, Columbia University, New York, NY 10027, June, 2006.

Security in P2P-SIP

I frequently receive questions on security in P2P-SIP, mostly from researchers looking for a new topic to explore. Security in P2P-SIP (and in P2P in general) is a challenging problem. In this article I summarize my understanding of the challenges and open problems.

The first literature on P2P-SIP mentions that P2P-SIP needs to solve the challenges of client-server Internet telephony as well as privacy, confidentiality, malicious node behavior and "free riding" problems of P2P. For example, a malicious node may not forward the call requests correctly or may log all call requests for future misuse. A later publication formally identifies the key security challenges and potential solutions. The paper on survey of DHT security techniques presents a comprehensive listing of challenges, solutions and problems. Let us classify the challenges:

User Authentication: Similar to client-server SIP, authentication is essential. A receiver need to verify that a sender posing as is actually the owner of that identifier. If the user identity is based off some other information or identity owned by the user, e.g., email address, phone number, postal address, social-security number, credit card number, PKI, X.509 certificate. etc., then it is possible to delegate the identity to that mechanism, e.g., by sending email or phone caller ID verification. The challenge can be further divided into: whether the user owns the identity? whether the user can randomly pick his identity to anything? whether a user can be made to believe that he has (wrong) ID or password? whether a malicious user can get password from another user in the pretext of authentication so that the malicious user can later assume the other user's identity?

Node Authentication: Additionally, since a number of P2P algorithms use the node identity to locate a node or define data storage criteria, the node ID is also a candidate for spoofing. A receiver must verify that the sender owns the node ID that it is posing as. The problem can be divided into sub-problems: whether the node ID are randomly picked or self generated by (malicious) nodes or assigned securely by some authority? whether the node ID can be spoofed in the protocol messages or data storage? whether a node can be made to believe by other nodes that it has (wrong) ID? whether a malicious node can get the authentication credentials of another node and later assume other node's identity? These problems if not addressed can result in other problems such as man-in-middle or denial-of-service (DoS) attacks. The most important question is: Can authentication be done in P2P without relying on central trusted authority?

Overlay Routing: A malicious node in a P2P network can drop, alter or wrongly forward a message intentionally manipulating the correct routing algorithm to disrupt the network and hence availability. This partly depends on node ID assignment mechanism, whether a node can intentionally place itself in the topology at a particular place? Further questions to ask: what fraction of malicious nodes affect what fraction of P2P network? What is the relationship between performance (availability, routing and data storage) of P2P and the fraction of malicious nodes or users?

Overlay Maintenance: A malicious node may invite more malicious nodes or copies of itself in the P2P network. A malicious node may partition the P2P network so that one part can not reach the other. A malicious node may reject join requests from other good nodes to prevent them from joining the network. The questions: what fraction of malicious nodes can affect what fraction of P2P network availability? Can a malicious node eventually affect the whole network given enough time? Can a malicious node affect the discovery of bootstrap node by other nodes that affects the joining process of other nodes? Can a malicious node intentionally place itself in the topology at a particular place (e.g., as super peer), so that it affects more number of overlay messages?

Free riding: A P2P network works because the peers do. If a node refuses to serve as a peer, but just use the service of the other peers, how do you handle this? Can the system enforce or give incentive to a particular node to become part of the overlay? What fraction of the nodes must be part of the P2P overlay for the overlay to work?

Privacy, Confidentiality, Anonymity: Unlike the client-server telephony, in P2P-SIP the call signaling and media messages may traverse through other nodes in the system. Can other nodes know who is calling whom and hence infringe on user's privacy? Worse, can a malicious peer listen to the conversation (audio, video, text, etc.) between two other peers? Can the system allow you to make anonymous calls so that the receiver does not know who is calling? Can the system allow you to receive calls (e.g., any-cast calls to call centers) without divulging your identity to the caller?

SIP services: The client server SIP implements several new features and services, but those have limited use in P2P-SIP because of the trust model. For example, programmable services using SIP-CGI or SIP Servlet are difficult, e.g., unless the receiving peer can completely trust the calling peer's CGI. Emergency services, spam prevention and lawful interception that have been researched in client-server SIP are pretty challenging in P2P-SIP.

Cost of security: Most of the existing protocols on the Internet suffer because people don't implement or deploy enough security. For examples, the front web page of many banks do not use HTTPS/TLS but have login forms. The reason is that system and operations engineers see security as an overhead, and do not use unless really needed. P2P takes this to extreme because the (in)security of one node can affect several in the network. The questions: What is the cost of security? How much does performance suffer in terms of number of messages, overhead, delay, for a particular security mechanism?

Given these problem there are several approaches the researcher are taking in solving. However, the core of some of these problems still remain unsolved. The general approach is to define a sub-set of the P2P-SIP system which works for the given security mechanism. For example, the P2P authentication is very challenging -- hence most implementations use a central certificate authority (CA) and everyone trusts that -- similar to the web browser model which comes installed with some root CA. The other approach is to build a closed P2P network of trusted implementations and provide the service to the rest of the untrusted users, e.g., OpenDHT and Dynamo. This works similar to the server farm model, except that the server farm is built using sub-set of P2P features -- self adjusting, less configuration, distributed data storage, geographically distributed. Another approach is to build the closed and proprietary system and protocol which prevents (to some extent) others from injecting the malicious node in the system, e.g., Skype. Unfortunately, sooner or later the protocol gets reverse engineered and the security is not longer present. The research on distributed trust, reward, or credit/debit system works well for file sharing but has not be successfully proven for P2P-SIP. Finally, some researchers focus on the statistics and availability of the whole network, with the theory that a small fraction of malicious nodes do not disrupt the whole network. If there is enough incentive for the nodes to remain good, this may work well.

If you are interested, please read the article on when P2P makes sense? In particular, if (1) most of the peers do not trust each other AND (2) there is not much incentive to store the resources then P2P does not work well because the system does not evolve naturally to work. Think of it as people who do not trust each other and they do not have much incentive to help others or to store information for other people, will a person be able to get information he needs that another person has? The subset of problems I listed in the previous paragraph all try to twist the problem such that peers trust each other, i.e., (1) and the system tends to evolve naturally to work. Still more research is needed in (2) to identify and develop the incentive model for P2P-SIP use case.

The (in)security of Flash Player's crossdomain

This article discusses the (in)security of Flash Player's crossdomain or cross-domain-policy mechanism and why it is against P2P. Anyone who has worked with Flash Player's network (URLLoader, Socket, etc) to implement a protocol would know the problem and pain that Flash Player causes due to it (broken) crossdomain security.

The problem: A programmable content downloaded from site1 running in user's browser should need explicit permission to connect to or use content from another site2. Otherwise, a Flash application may randomly connect to or use content from any other site without user's knowledge and pose security risks.

The real problem: The real problem is in the way Flash Player tries to solve the above problem. If a Flash application is downloaded from site1, and wants to access or connect to another site2, then that site2 must give explicit permission using a web accessible http://site2/crossdomain.xml or in-line cross-domain-policy response in the TCP connection. The crossdomain.xml file lists all the other domains (such as site1) whose Flash applications are allowed to connect here, and also lists all the ports to which they can connect. There are options to give wild-card (*) for domains and ports. Thus, only if site2 trusts site1, will it allow site1 to connect.

The first problem with this approach is the trust model: Flash Player asks site2 instead of the user for permission. This means user still does not have control what other sites the Flash application connects to; if site1 used wild-card domains then any application can connect to it; and admins of site1 and site2 must co-ordinate and collaborate. In most real deployment, this means that site1 and site2 are owned by the same entity and the deployment builds a false sense of closed walled garden of client-server applications.

The second problem is that it is very easy to work around: if site1 is wants to use or connect to site2, but site2 does not trust site1, then site1 can install a connection proxy on its site1 and have the Flash application connect to site2 via this proxy on site1. So it does not really protect site2 from access from any third-party Flash application -- i.e., there is no closed walled garden for site2. What it actually means is that if site2 has a content or service then anybody can build a Flash application to access that content or service as long as he can host a proxy on the Internet. You just need _one_ person in the whole Internet with good bandwidth and an open proxy to potentially break the crossdomain trust model of _all_ the Flash applications.

The third problem is that it assumes Flash Player binary will not be reverse engineered: This is the worst sense of security in the literature. When a Flash application is downloaded from site1 by Flash Player, and wants to make connection to a another site2, it just checks the URL from where the Flash application was downloaded. If the URL contains the domain of site2, then usually the crossdomain check is not done, but if the URL does not contain site2, then it first gets crossdomain.xml file. If some person is able to hack and modify the URL variable inside Flash Player or in the transit on non-secure HTTP, then the Flash Player will effectively ignore the crossdomain check.

The fourth problem is the way it has evolved: The initial implementations of crossdomain policy was broken, with lots of ways to work around. With newer implementations of Flash Player some of these problems have been fixed. However, that means the older mechanism is deprecated and newer mechanism no longer works with older. Concrete example of the problem follows. Should an application downloaded from http://site1/p/app.swf be allowed to connect to site1:5222 (non-HTTP port), or to http://site1 or http://site1/q/second.swf? If yes, then public hosting of personal web pages effectively open up all the Flash applications of all the hosted users on that server. So why not define a meta-policy at the top level http://site1 which controls everything else? How can site2 allow only one application from site1 but not other to connect? How can site2 allow only applications that are signed by site2 to connect but not others, even if those applications are hosted on other sites? No answer!

The fifth problem is that it depends on other unsecured mechanisms: HTTP and DNS. I won't describe details on these, but it would have been nicer if the security was based on code signing or standard authentication mechanisms instead of comparing the hostname from the HTTP URL.

The sixth problem is its incompatibility with existing systems: When implementing a custom non-HTTP protocol, the Flash Player sends the first request as <policy-file-request/> on the connected socket. What if the service does not understand this request? Well use meta-policy. In that case for it to work, the meta-policy needs to be served from say, 843. What if another server is present on the machine on port 843? Or what if you want to add handling of this policy-file-request in your own service protocol? In my personal experience getting this in an XMPP server was a nightmare. What if Flash Player puts a nul character after its request? You cannot control the request or response for policy-file from ActionScript because it happens automatically in Flash Player before a socket is actually connected in the ActionScript.

The solution: In the context of open services and/or open network and/or P2P, the crossdomain is a problem rather than a solution. If site2 has hosted an open service, it should not restrict anybody to build applications to connect to site2. Thus site2 should put a crossdomain.xml with wild-card domains and ports. If site1 has a built a Flash application for an open service, the application should be allowed to connect to any service that follows the protocol as long as the user approves the connection. Thus site1 should build Flash application with correct user authentication, and then proxy the connection via site1's proxy to the site2's service instead of having it connect directly to site2. This allows site1 and site2 to operate independent of each other as long as they implement a common service protocol, e.g., XMPP or P2P-SIP.

The real solution: A correct security implementation in Flash Player should have done something like the following:
1. When the Flash application tries to connect to another new site, ask the user (similar to the camera and microphone security settings) if he wants to allow the connection. Also give an option to remember the approval if needed.
2. Allow site2 to sign a Flash application, and require that only signed Flash application with so-and-so root certificate should be allowed to connect to site2.
Beyond these any site should be free to implement its own authentication mechanism.