Thursday, June 05, 2008

memcached

Recently I got a chance to look at memcached a distributed in-memory cache.

There are two ways of scaling databases: (1) replicating with master-slave such that reads can be done from any slave database whereas writes needs to be done in master, and (2) distributing the data among segments such that accessing a data needs to access only a particular segment instead of the whole database. The first technique doesn't work well for scaling SIP-related contact data, because number of writes (login/logout, presence updates) are significant compared to number of reads (call routing, IM delivery). Distributing the data among segments has been done before, and my thesis also describes it in the context of SIP telephony.

The memcached system uses this second technique, hence has much better performance in certain cases. It is targeted towards web applications that generate dynamic web content by accessing the database at the back-end. One could run memcached on several servers, potentially co-located with the web servers in the server farm. The dynamic web service (e.g., written in PHP) is then modified to first look in the cache, and if not found then query the actual database (and update the cache as well). The cache provides a distribute hash-table (DHT) interface.

When the data is updated in the cache, it gets stored in the appropriate memcached instance based on some hash of the data key. When a query is done, that instance is queried for the data. Each cache instance can be configured with a limit of memory size, typically 1-3GB based on the available memory in the system. The distributed nature of the cache allows you to linearly scale the cache by just adding more instances of the cache on different machines.

At first glance this looks very promising for use in a P2P-SIP application. However, there are some serious limitation. Although memcached has built-in hashing mechanism for data storage, there is no automatic replication of data (hence no fail-over), the hashing needs to be implemented by the client itself (hence no recursive queries). In particular, memcached implements only one part of a full P2P data storage application such as OpenDHT. Because of this limitation, it cannot be used effectively for P2P-SIP client implementations without adding a lot of additional software. Nevertheless, memcached can potentially be used for P2P-SIP proxy server farm.

As a new project idea, it would be interesting to try to use memcached in openser SIP proxy server to readily build a P2P SIP server farm without much overhead! If you are project student willing to work on this, I will be happy to mentor you.