Friday, June 23, 2006

Structured vs unstructured P2P or Why we chose DHT for P2P-SIP?

One of the questions that people ask me is 'whether structured or unstructured P2P works better for SIP-based Internet telephony?'. My answer is that structured P2P such as distributed hash table (DHT) works better because of the following reason.

Unstructured P2P networks usually rely on caching and replication to improve the lookup performance and reliability. In the case of file-sharing application, once the file named "matrix-II.mpg" is uploaded in a P2P network such as Kazaa, the content of the file remains the same (un-mutable). Thus, for reliability the file may get replicated at multiple places. Intermediate nodes in the download path may cache the file locally to improve download performance by other downstream nodes. On the other hand, a phone's contact location or IP address uploaded under the resource named "bob@home.com" may change often (i.e., mutable), e.g., if bob changes his device or the device's IP address is changed. This makes all the randomly replicated and cached copy of the resource content useless. Thus, random caching and replication is not useful.

In unstructured P2P network, any replica of the file "matrix-II.mpg" is good enough, whereas in Internet telephony you want to reach only "bob@home.com" but not some replica of Bob. You can not replicate Bob's phone. The only thing that can be replicated is the mapping between Bob's identifier and his phone's IP address. However, to allow frequent updates and consistent view of the mapping, it is desired that the mapping is replicated in a structured manner so that both Bob can easily update the mapping and any prospective caller can easily query the up-to-date copy of this mapping. This means given the resource name, we can know where the resource should be present in the P2P network. Thus, it is structured.

Unstructured P2P is suitable for random or blind search and allows wild-card search. For example, you can search for a file name containing "matrix" keyword. On the other hand, in Internet telephony most calls are made to the known destination (phone number or destination user identifier). Hence, hash table interface that allows lookup based on a key is good enough. Distributed hash table (DHT)-style P2P algorithms such as Pastry, Chord, Bamboo provide ideal platform for such lookups.

Skype is believed to be unstructured among the super-node, but I think there may be some form of structure in how the data is stored among the super-nodes. Plus, there is no guarantee on upper bound on lookup cost which means it might be falling back to some centralized lookup if the number of hops in search exceed some limit. Since Skype is proprietary, we do not know how exactly it does lookups.

No comments: