Distributed Systems Development: Client vs Server

In this article I compare distributed systems development for clients and servers. When you start implementing a distributed system, such as a client or server for some protocol, the basic functionality is easy to implement. But making your software usable in the real world means handling client-specific or server-specific considerations, and these take a lot of time. This article describes what goes into building a good quality distributed system, whether client or server.

Client

Considerations: Auto-configuration, IP address change detection, NAT and firewall traversal, robustness against failures, adaptation to network conditions, consistent user interface and view, command line vs graphical user interface, guaranteed security, idle and sleep detection, responsiveness of user interface, redundant connections to servers, keep-alive, caching, analytics.

Examples: Firefox browser, Skype, Gtalk

Description: A client should automatically configure as much as possible from the system, e.g., network IP, hostname, username, machine type, etc. If the client relies on the local IP address, it should automatically detect any change in that address. For example, a SIP client should re-register the new IP address as its contact if there is a change. NAT and firewall traversal is one annoying reality of the Internet. Most often an HTTP based client works out of the box because most networks permit HTTP and HTTPS. However, if you are building any other client such as IM and chat, VoIP or media recording, then some network in some enterprise will block your connection. Most protocols have an alternative for NAT and firewall traversal. For example, RTMP has RTMPT, XMPP can work over BOSH, and SIP uses a bunch of techniques such as STUN, TURN and ICE.
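The IP address change detection described above can be sketched in a few lines of Python. This is only a minimal sketch under my own assumptions: the names current_local_ip and AddressWatcher are hypothetical, the probe address 192.0.2.1 is a documentation-only address chosen so that no real host is contacted, and the on_change callback stands in for whatever your protocol needs (for SIP, a fresh registration).

```python
import socket
from typing import Callable, Optional


def current_local_ip(probe_host: str = "192.0.2.1", probe_port: int = 9) -> Optional[str]:
    """Return the local IPv4 address the OS would use to reach probe_host, or None."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends no packet; it only selects a route,
        # which tells us which local address would be used.
        sock.connect((probe_host, probe_port))
        return sock.getsockname()[0]
    except OSError:
        return None  # no usable route, e.g. the machine is offline
    finally:
        sock.close()


class AddressWatcher:
    """Calls on_change(old_ip, new_ip) whenever the local address changes."""

    def __init__(self,
                 get_ip: Callable[[], Optional[str]],
                 on_change: Callable[[Optional[str], str], None]) -> None:
        self._get_ip = get_ip
        self._on_change = on_change
        self._last = get_ip()

    def poll(self) -> bool:
        """Check once; return True if a change was detected and reported."""
        ip = self._get_ip()
        # Ignore "no address" readings; act only when a new address appears.
        if ip is not None and ip != self._last:
            old, self._last = self._last, ip
            self._on_change(old, ip)
            return True
        return False
```

A client would construct AddressWatcher(current_local_ip, re_register), where re_register is its own function, and call poll() from a timer every few seconds.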

A client should be able to adapt to any network condition. This applies not only to the network topology and filtering, but also to the bandwidth and quality. A VoIP client should automatically adapt to a lower quality codec if it detects lower end-to-end bandwidth. Bandwidth detection and adaptation should be a continuous process instead of being performed only at the beginning. If you need to connect to a server, and there are many distributed servers, the client could periodically detect a list of the closest servers and connect to one or more of them, where network proximity is determined by network distance or by delay and jitter. This allows your client to handle geographic distribution. If you have multiple redundant servers, your client should be able to fail over in case one server fails. A better approach could be to keep persistent connections to two servers, so that failover latency is minimized. The automatic configuration, detection and adaptation of various network and system conditions is one of the most crucial properties of successful peer-to-peer clients such as Skype. Some clients need detection of idle or sleep behavior, e.g., to update your presence status. If the user puts the system to sleep (or standby) then your software may not get any chance to communicate the status to the server. In such cases, your protocol or server should be robust in detecting idle clients.
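To make the server selection and failover idea concrete, here is a rough Python sketch. It assumes the client has already collected round-trip time samples (in milliseconds) for each candidate server. The scoring formula (mean delay plus standard deviation as a stand-in for jitter) and the names proximity_score, rank_servers and ServerPool are my own illustrative choices, not a standard.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional, Sequence


def proximity_score(rtts_ms: Sequence[float], jitter_weight: float = 1.0) -> float:
    """Lower is closer: mean delay plus a penalty for jitter (std deviation)."""
    return mean(rtts_ms) + jitter_weight * pstdev(rtts_ms)


def rank_servers(samples: Dict[str, Sequence[float]]) -> List[str]:
    """Order servers from closest to farthest; servers with no samples are skipped."""
    reachable = [name for name, rtts in samples.items() if rtts]
    return sorted(reachable, key=lambda name: proximity_score(samples[name]))


class ServerPool:
    """Keeps `keep` servers active at once so that failover is immediate."""

    def __init__(self, ranked: Sequence[str], keep: int = 2) -> None:
        self._standby = list(ranked)
        count = min(keep, len(self._standby))
        self.active = [self._standby.pop(0) for _ in range(count)]

    @property
    def primary(self) -> Optional[str]:
        return self.active[0] if self.active else None

    def mark_failed(self, name: str) -> None:
        """Drop a failed server and promote the next closest standby, if any."""
        if name in self.active:
            self.active.remove(name)
            if self._standby:
                self.active.append(self._standby.pop(0))
```

Because the pool always holds two active servers, losing the primary only means switching to a connection that is already open. Re-running rank_servers periodically keeps the list current as network conditions change.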

A client is user facing software. The responsiveness of the client user interface distinguishes good software from average software. For example, if the client doesn't get a server response within 200 ms, it can automatically inform the user via an hourglass or spinning wheel indication. If your GUI becomes unresponsive while it is "processing" instead of giving an indication, then the user is likely to get annoyed, or to make mistakes by clicking the same button multiple times. You should always use an event-based system for your user interface instead of synchronous processing, especially if the processing can block. Caching can be used if needed to speed up your performance. For example, instead of fetching the user list from the server every time you switch to the user list view, you can cache it and display the cached copy. Periodically refresh your local cache with the actual data from the server. Caching is also useful in other places where client-server communication becomes overloaded.
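The user list caching described above might look like the following in Python. This is a sketch, not a recommendation of any particular library: TTLCache is a hypothetical name, the 30 second lifetime is an arbitrary assumption, and fetch stands for whatever function your client uses to ask the server for the list.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Caches the result of fetch() and refreshes it after ttl_seconds."""

    def __init__(self,
                 fetch: Callable[[], Any],
                 ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the cache can be tested without sleeping
        self._value: Any = None
        self._expires_at: Optional[float] = None

    def get(self) -> Any:
        """Return the cached value, calling fetch() only if it is missing or stale."""
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value

    def invalidate(self) -> None:
        """Force the next get() to go back to the server."""
        self._expires_at = None
```

Switching to the user list view then calls get(), which returns immediately in the common case. Call invalidate() when the server pushes a change notification, if your protocol has one.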

Command line clients are becoming less common these days. But such clients are more powerful in some scenarios. Consider whether a command line alternative is useful and feasible as well. Finally, a guarantee on security is a must for Internet client applications. Most application protocols define secure communication, e.g., over TLS/SSL, S/MIME, etc. Your client should have an option to run completely secure and encrypted.

In summary, good client software is software that does the one thing it is meant for. You may add many new features, but how you do the essential function is what will make your client useful and popular. Consider using analytics to find which features are gaining popularity and which are no longer used. Software is like a human body. If you don't exercise to remove body fat -- remove unused pieces and re-factor periodically -- you will become too fat, slow and useless. This is more important for client facing software, because client behavior keeps changing, and the client you used last year may not be the same client this year.

Server

Considerations: Easy configuration, logging, vertical and horizontal scalability, robustness and automatic failover, auto loading of configuration changes, connectivity to different backends, programmability, event based but multi-threaded, use multi-core CPUs, memory usage optimization, management console, command line control, activity monitoring, admission control for quality, stateless vs stateful, replication of critical data, partitioning of data for scalability, caching, keep-alive for crash detection of server, detection of idle or unresponsive clients.

Examples: Apache web server, ejabberd, SIP express router
Anti-example: Tomcat, Flash Media Server

Description: A server should have explicit, easy and extensive configuration options so that it can be deployed in a variety of different scenarios, e.g., the Apache config file. Note that when it comes to configuration: explicit is better than implicit, easy is better than complex. Another important feature of the server is being able to load configuration changes without having to kill the server. For example, Tomcat automatically detects new war files and re-deploys the applications. Apache web server can be made to re-read the configuration using a Unix signal. Some servers take the configuration to an extreme by defining an easy to use script that controls the server behavior. For example, SIP express router defines a Perl-like programming script to handle an incoming request, forward it to a telephony gateway or perform authentication. Such fine grained configuration allows deploying the server in a variety of different environments -- from personal use to enterprise or carrier deployments. On the other hand, I find the J2EE model of defining services and classes in XML configuration files hard to use. Even though the configuration is done by configuration file or script, an easy to use web based management console gives a clean interface for server control and monitoring.
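As an illustration of reloading configuration on a Unix signal without killing the server, here is a small Python sketch. Everything in it is an assumption for the sake of the example: the JSON file format, the name ReloadableConfig, and the choice of SIGHUP (which exists only on Unix-like systems). The signal handler only sets a flag, and the server's main loop does the actual file read, which avoids doing I/O or taking locks inside a signal handler.

```python
import json
import signal
from typing import Any, Dict


class ReloadableConfig:
    """Holds the last known good configuration and reloads it on request."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._data: Dict[str, Any] = {}
        self._pending = False
        if not self.reload():
            raise ValueError("cannot load initial configuration from " + path)

    def request_reload(self, signum: Any = None, frame: Any = None) -> None:
        """Safe to use as a signal handler: it only flips a flag."""
        self._pending = True

    def reload(self) -> bool:
        """Re-read the file; on any error keep the previous configuration."""
        try:
            with open(self._path, encoding="utf-8") as f:
                new_data = json.load(f)
        except (OSError, ValueError):
            return False
        if not isinstance(new_data, dict):
            return False
        self._data = new_data  # single assignment swaps the whole config
        return True

    def apply_pending(self) -> bool:
        """Call from the main loop; returns True if a new config was loaded."""
        if not self._pending:
            return False
        self._pending = False
        return self.reload()

    def get(self, key: str, default: Any = None) -> Any:
        return self._data.get(key, default)


def install_sighup_handler(config: ReloadableConfig) -> None:
    """Unix only: `kill -HUP <pid>` will then schedule a reload."""
    signal.signal(signal.SIGHUP, config.request_reload)
```

Keeping the old configuration when the new file is broken matters in practice: a typo in the config file should not take down a running server.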

Easy to use and configurable logging is another crucial piece of server software. A server log is typically the first place you go when you detect a problem. There is a tradeoff between extensive logging and selective logging. I prefer extensive logging with selective viewing. Also, I prefer accessing the log from the command line using "tail -f logfile.log" instead of the variety of web based log viewers.

Scalability and robustness are part of good server design. There are many other articles and web sites dedicated to discussing this, e.g., highscalability.com. There are several techniques such as event based thread pool, connectivity to different backends, bi-directional master-slave databases, replication of critical data, in-memory distributed cache such as memcached, partitioning of data, two-stage load sharing architecture, and use of servers from different vendors for robustness against security exploits. The server should prefer stateless operations. It should be able to detect unresponsive clients in case of stateful sessions, e.g., by periodically sending keep-alives. Note that a server-initiated keep-alive is more robust than a client-initiated one for distributed applications. For example, in client-initiated keep-alive, if client1's keep-alive fails, client1 assumes it is disconnected, but client2 doesn't know that client1 is disconnected; whereas in server-initiated keep-alive, once the server detects that client1 is disconnected, it can inform other related clients about it.
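The server-initiated keep-alive in the client1 and client2 example can be sketched as follows in Python. This is a simplified model under my own assumptions: KeepAliveMonitor is a hypothetical name, the 30 second timeout is arbitrary, every remaining client is treated as "related" to the one that disappeared, and the actual sending of keep-alive requests over the network is left out. Only the bookkeeping is shown.

```python
import time
from typing import Callable, Dict, List


class KeepAliveMonitor:
    """Tracks when each client last answered a keep-alive from the server."""

    def __init__(self,
                 timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic,
                 notify: Callable[[str, str], None] = lambda peer, gone: None) -> None:
        self._timeout = timeout_s
        self._clock = clock
        self._notify = notify  # notify(peer, gone): tell `peer` that `gone` left
        self._last_seen: Dict[str, float] = {}

    def client_connected(self, client_id: str) -> None:
        self._last_seen[client_id] = self._clock()

    def pong_received(self, client_id: str) -> None:
        """Record a keep-alive response; unknown clients are ignored."""
        if client_id in self._last_seen:
            self._last_seen[client_id] = self._clock()

    def sweep(self) -> List[str]:
        """Drop clients that stayed silent too long and tell the others."""
        now = self._clock()
        dead = [c for c, seen in self._last_seen.items() if now - seen > self._timeout]
        for client_id in dead:
            del self._last_seen[client_id]
        for client_id in dead:
            for peer in self._last_seen:
                self._notify(peer, client_id)
        return dead
```

The server would call sweep() from a periodic timer, right after sending the next round of keep-alive requests. In a real deployment notify would go only to clients that care about the lost one, e.g., those with it on their contact list.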

The server should use the available resources in the best possible way. Typically memory, CPU and bandwidth are the critical resources. Some form of activity monitor should detect the resource usage by the server and inform the concerned IT person in case of abnormal behavior. This could be because of a memory leak in the server or some security attack from malicious systems. Obviously the implementers should strive to fix any memory leaks. Another useful behavior by the server is to do admission control based on available resources. For example, if the server detects that it is using 90% of its bandwidth, then it should not admit a new media streaming client, or if it detects that its CPU is fully utilized, it should reject new requests with an appropriate error response, so that the client retries with exponential back-off timeouts. In a distributed server farm, the servers should be able to not only automatically configure based on the configuration of other servers, but also detect overload on and share load from other servers in the farm. For example, a self organizing server can detect other servers in the farm, and automatically assume load sharing and/or secondary server responsibility.
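Both halves of the admission control example, the server's decision and the client's retry schedule, fit in a short Python sketch. The 90% bandwidth limit comes from the example above; the 98% CPU limit, the request kinds "media" and "signaling", and the names Load, admit and backoff_delays are assumptions I made up for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Load:
    bandwidth: float  # fraction of link capacity in use, 0.0 to 1.0
    cpu: float        # fraction of CPU in use, 0.0 to 1.0


def admit(load: Load, kind: str,
          bandwidth_limit: float = 0.90, cpu_limit: float = 0.98) -> bool:
    """Server side: decide whether to accept a new request of the given kind."""
    if load.cpu >= cpu_limit:
        return False  # CPU saturated: reject everything
    if kind == "media" and load.bandwidth >= bandwidth_limit:
        return False  # link nearly full: reject new media streams only
    return True


def backoff_delays(attempts: int, base_s: float = 0.5, factor: float = 2.0,
                   cap_s: float = 30.0, jitter: float = 0.0,
                   rng: Callable[[], float] = random.random) -> List[float]:
    """Client side: seconds to wait before each retry, doubling up to cap_s."""
    delays = []
    for n in range(attempts):
        delay = min(cap_s, base_s * factor ** n)
        # Optional random jitter keeps many rejected clients from retrying in sync.
        delay += delay * jitter * rng()
        delays.append(delay)
    return delays
```

With the defaults, backoff_delays(5) gives waits of 0.5, 1, 2, 4 and 8 seconds. A non-zero jitter is usually worth having when many clients are rejected at the same moment.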

In summary, configuration, scalability and robustness form the core of a good server implementation.
