Sunday, June 16, 2019

Lessons in web software development

Here is a summary of what I feel is important for creating web applications as an individual developer or in a small team. My hope is that the topics I have assembled will help other developers.

1) Local development - This is the ability to run all the pieces of the puzzle locally on the development machine, even if some of the pieces are replacement stubs. What's more, if everything can work without an Internet connection, it brings heaven to home: rapid prototyping and productivity go to the next level. How many times did you get stuck because an external server did not behave as expected, or did not easily expose the logs for an internal server error? Retaining local development becomes more difficult, but also more important, as the system becomes more complex, such as with mobile devices or third-party services.
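
One way to keep a third-party service from blocking local work is to hide it behind a small interface and swap in a stub. The following is only a minimal sketch: the names (stubBackend, realBackend, createUserService, selectBackend) and the example.com URL are made up for illustration and do not come from any real library.

```javascript
// Stub backend: answers from in-memory data, so it works without a network.
const stubBackend = {
  async getUser(id) {
    const users = { 1: { id: 1, name: "Alice" } };
    if (!(id in users)) throw new Error(`user ${id} not found`);
    return users[id];
  },
};

// Real backend: would call the remote server (placeholder URL, an assumption).
const realBackend = {
  async getUser(id) {
    const res = await fetch(`https://api.example.com/users/${id}`);
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return res.json();
  },
};

// Application code depends only on the backend's interface, not on which one it is.
function createUserService(backend) {
  return {
    async displayName(id) {
      const user = await backend.getUser(id);
      return user.name.toUpperCase();
    },
  };
}

// A single switch decides which backend is used, e.g. driven by a dev flag.
function selectBackend(useStub) {
  return useStub ? stubBackend : realBackend;
}
```

With this shape, createUserService(selectBackend(true)) runs entirely offline, and nothing else in the application needs to know.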

2) Cross platform and responsive - The product designer may not yet have a mobile view, or the salesperson may not yet be targeting the Opera browser. But sooner or later these requirements will come up. If I don't write the web application to deal with these aspects from the beginning, then I am just incurring technical debt, which may bring foreclosure as the software grows. Luckily there are tools such as Chrome DevTools or a CSS autoprefixer [link] that can help me. Behind the scenes, cross platform for web applications is largely about modern HTML5 browsers vs. IE. The good news is that Edge is a modern browser. Moreover, there are polyfills out there to cover incompatible features.

3) Loose coupling but strict APIs - A basic guideline of modularity is to create small (preferably single file) modules that interact with other modules using well defined interfaces. For web applications, web components as well as iframes can enable such designs. Some frameworks have ways to enforce such modularity using declarative interface specification. Furthermore, for client-server APIs, enforcing and checking the strict behavior of the request and response content can help detect problems early on. Did the server team change an attribute, but not tell the client team about it? Now you can catch that early on.
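
To make the last point concrete, here is a small sketch of checking a response against an expected shape. The schema format (field name mapped to the expected typeof result) and the names userSchema and checkShape are assumptions of mine for illustration; a real project might prefer an established schema validator.

```javascript
// Hypothetical schema: field name -> expected result of typeof.
const userSchema = { id: "number", name: "string", email: "string" };

// Returns a list of problems; an empty list means the payload matches the schema.
function checkShape(payload, schema) {
  if (payload === null || typeof payload !== "object") {
    return ["payload is not an object"];
  }
  const problems = [];
  // Every declared field must be present with the right type.
  for (const [field, type] of Object.entries(schema)) {
    if (!(field in payload)) {
      problems.push(`missing field: ${field}`);
    } else if (typeof payload[field] !== type) {
      problems.push(`wrong type for ${field}: expected ${type}, got ${typeof payload[field]}`);
    }
  }
  // Undeclared fields are reported too, which catches silent renames.
  for (const field of Object.keys(payload)) {
    if (!(field in schema)) problems.push(`unexpected field: ${field}`);
  }
  return problems;
}
```

If the server renames email to mail, checkShape reports both a missing field and an unexpected one, so the client team learns about the change from a log line rather than from a broken page.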

4) Flexible code - Software flexibility is a huge plus, especially for early stage systems. As the requirements emerge, as the business landscape clears up, and as the team grows, the ability to quickly change the behavior without spending four sprints on one feature is really important. Flexibility can appear in how the data is stored, how the data is passed from one module to another, or how the code operates on the data. Such flexible pieces of code can then be easily orchestrated to run in one scenario versus the other. Modern JavaScript constructs such as ES6 features and Promises further help in creating flexible yet clean code.

5) Customization - Visual web applications are particularly susceptible to problems in beauty (or lack of it). Since beauty lies in the eye of the beholder (or beer-holder), every user can prefer things differently from how you do, or how your product designer does. At the basic level, knobs to control the theme color, font-family or font-size bring some engagement with the user, and at the advanced level the ability to customize every visual aspect brings incredible flexibility to navigate the rapidly changing landscape of product requirements. Luckily, the separation of view (CSS) from control (JavaScript) in web programming readily enables such customizations.

6) Avoid artificial or arbitrary restrictions - Although this is loosely related to flexible code, it is important enough to have its own bullet point. "The list display will not fit on a small screen if it has more than seven items, should I restrict it to a maximum of seven items, or should I do something else?" "If five tabs are open at the same time, the title starts getting clipped, should I restrict it to a maximum of five tabs, or should I do something different?" You get the idea? Solving this does tend to make things complex in the short term, but is really important in avoiding technical debt, or a call at 3am about a broken app. If you really need to have such a restriction, convey it clearly to the end user.

7) State vs. stateless - Depending on the web software architecture, the application state may or may not be at the client browser, and may or may not be at the server or database. However, each individual module also maintains some state. The ability to recreate the module from a stored state, and the ability to save the current state, go a long way in making a robust and flexible web application. For example, what would you do if your long running chat interaction app is now embedded in another web page, which allows the end user to navigate and hence reload the chat app? How would you recreate the last state after the user's browser crashes and the user re-launches your web application? Implementing stateless modules with complete separation of data is ideal, but not always feasible.
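
As a rough sketch of the save and recreate idea, consider a chat module whose entire state is a plain, JSON-friendly object. All names here (createChat, saveChat, restoreChat, createMemoryStorage, the "chat-state" key) are hypothetical; the storage argument is assumed to offer getItem and setItem the way the browser's localStorage does.

```javascript
// Hypothetical chat module: everything it knows lives in one plain object.
function createChat(initialState = { messages: [], draft: "" }) {
  // Deep copy so the caller's object is never mutated.
  const state = JSON.parse(JSON.stringify(initialState));
  return {
    addMessage(text) { state.messages.push(text); },
    setDraft(text) { state.draft = text; },
    // Hand out a copy, so outside code cannot change internal state.
    getState() { return JSON.parse(JSON.stringify(state)); },
  };
}

// `storage` is anything with getItem/setItem, e.g. localStorage in a browser.
function saveChat(chat, storage, key = "chat-state") {
  storage.setItem(key, JSON.stringify(chat.getState()));
}

// Recreate the module from stored state, or start fresh if nothing was saved.
function restoreChat(storage, key = "chat-state") {
  const saved = storage.getItem(key);
  return saved === null ? createChat() : createChat(JSON.parse(saved));
}

// In-memory stand-in for localStorage, so the sketch also runs outside a browser.
function createMemoryStorage() {
  const items = new Map();
  return {
    getItem: (k) => (items.has(k) ? items.get(k) : null),
    setItem: (k, v) => { items.set(k, String(v)); },
  };
}
```

Calling saveChat after every change and restoreChat at startup is enough for the module to survive a reload of the embedding page. A real app would also need to decide what to do with stored state that is corrupted or from an older version.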

8) Separation of data and application logic - I have talked about this in my earlier posts as well as presented system papers describing applications built using this paradigm. Nevertheless, this is one of the basic principles of scalable software in my opinion. Some existing frameworks enforce the idea. However, some design philosophies (such as object oriented design) can easily go against the idea. In my opinion, the web application logic should be able to work on data that may be obtained from any place, instead of enforcing a tight coupling of where the data is stored and how it is accessed. This allows many other ideas to be applied easily, such as data partitioning, client-driven sharding, or testing local code changes with production data. Furthermore, separation of static vs. dynamic data is just as important for scalability and performance.
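
A tiny illustration of what I mean, with made-up names (totalsByCustomer, fixtureSource, jsonSource, report): the logic only ever sees data, and a separate piece of glue decides where that data comes from.

```javascript
// Pure application logic: it sees only data, not where the data lives.
function totalsByCustomer(orders) {
  const totals = new Map();
  for (const { customer, amount } of orders) {
    totals.set(customer, (totals.get(customer) || 0) + amount);
  }
  return totals;
}

// Hypothetical data sources. Each one returns data in the same shape.
const fixtureSource = () => [
  { customer: "ann", amount: 10 },
  { customer: "bob", amount: 5 },
  { customer: "ann", amount: 7 },
];
// E.g. text exported from production, or the body of an API response.
const jsonSource = (text) => () => JSON.parse(text);

// Glue code picks the source; the logic above is unchanged.
function report(source) {
  return Object.fromEntries(totalsByCustomer(source()));
}
```

Because totalsByCustomer never reaches out for its data, the same function can run against fixtures in a test, one shard of a partitioned data set, or a production export on a developer machine.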

9) Performance - With event driven JavaScript, and extremely fast engines in the browser, it is hard to make performance an issue for web developers. However, a few points are certainly important: (a) use the right algorithm and data structures, (b) use the right tool for animation or background tasks, and (c) optimize and collate API calls if needed. You may be tempted to use an array with linear search, hoping that the array will not have more than twenty items or so. But adding a few more lines of code to use a hash table, with proper cleanup, will not only give you confidence in your code, but also prepare it for those corner cases when there could be ten thousand items. Using CSS for animation is usually more efficient than JavaScript. Finally, with RESTful principles, there is often a tendency to GET the full data object when you just need one or two attributes. How can the client-server interaction be optimized, in a generic manner, such that the client and server can accomplish what they are intended to do, as quickly as possible, without doing seven lock-steps to show one web form?
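
For point (a), the "few more lines" can look like the sketch below. ItemRegistry and findInArray are names I made up for illustration; Map is the standard JavaScript hash table.

```javascript
// Linear search: fine for twenty items, O(n) per lookup for ten thousand.
function findInArray(items, id) {
  return items.find((item) => item.id === id);
}

// Hypothetical registry backed by a Map, keyed by item id.
class ItemRegistry {
  constructor() {
    this.byId = new Map();
  }
  add(item) {
    this.byId.set(item.id, item);
  }
  // Average constant-time lookup, however many items there are.
  get(id) {
    return this.byId.get(id);
  }
  // The "proper cleanup": remove entries that are no longer needed.
  remove(id) {
    return this.byId.delete(id);
  }
  get size() {
    return this.byId.size;
  }
}
```

Both approaches return the same item; the difference only shows up when the list grows, which is exactly when it is too late to change the data structure comfortably.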

10) Robustness - At the system level, robustness is about the ability to recover from intermediate transient failures. At the module or code level, it shows up as proper error and exception handling and data verification before use, not to mention checking for null or undefined. Often a catch-all error notification is used in web applications to display or log any abnormal behavior. However, code robustness involves the ability to recover from the error if possible. For example, if a function that expects a JSON object failed to understand the object, was it because a JSON string was supplied? If this is an external facing function for the module, does it make sense to provide that flexibility and robustness? What happens if the string has unsupported characters that are not allowed in JSON? Robustness for client-server communication is also useful. How can the client retry the failed API or WebSocket connection? How often should it retry? If keepalive is needed for persistent connections or for liveness checks, who should initiate the keepalive? Concepts such as soft-state and exponential back-off refresh timers are well known, and are often useful in distributed web applications.
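
As an example of the retry question, here is a minimal exponential back-off sketch. The function names (backoffDelay, retry) and the default numbers (500 ms base, 30 s cap, 5 retries) are my own assumptions rather than recommended values, and a production version would likely add random jitter.

```javascript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at max.
function backoffDelay(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Default sleep based on setTimeout; injectable so tests need not wait.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs `operation` until it succeeds or the retries are used up.
async function retry(operation, options = {}) {
  const { retries = 5, baseMs = 500, maxMs = 30000, sleep = defaultSleep } = options;
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err;
      // Wait longer after each failure, but not after the final one.
      if (attempt < retries) await sleep(backoffDelay(attempt, baseMs, maxMs));
    }
  }
  throw lastError;
}
```

With the defaults, the waits are 500 ms, 1 s, 2 s, 4 s and 8 s, and the last error is rethrown if every attempt fails. The same delay function can drive a WebSocket reconnect loop.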

11) Caching - Web protocols as well as browsers and servers are experts at caching. However, due to project requirements or customer demands, web caching may have been configured at a sub-optimal level. For example, if images and APIs are served from the same server, perhaps the no-cache policy of the API also applies to images. Furthermore, problems in client side software may cause repeated requests to the same image or API within a short duration. Will it be useful to cache such requests in the client application code, instead of invoking the request every time? What should be the cache duration? If the same image is being used at twenty different places in the web application, should that be cached instead of loading the HTTP header for each such instance?
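
One possible answer for caching in the client application code is a small wrapper with a time limit. This is a sketch under my own assumptions: createCache is a hypothetical helper, the 60 second default is arbitrary, and loader stands for whatever function actually performs the request.

```javascript
// Wraps any loader function with a time-limited cache.
// `now` is injectable so the expiry logic can be exercised without waiting.
function createCache(loader, { ttlMs = 60000, now = Date.now } = {}) {
  const entries = new Map(); // key -> { value: Promise, expires: number }
  return function get(key) {
    const hit = entries.get(key);
    if (hit && hit.expires > now()) return hit.value;
    const entry = { expires: now() + ttlMs };
    // Store the promise itself, so concurrent callers share one request.
    entry.value = Promise.resolve(loader(key)).catch((err) => {
      // Do not keep failures in the cache.
      if (entries.get(key) === entry) entries.delete(key);
      throw err;
    });
    entries.set(key, entry);
    return entry.value;
  };
}
```

Twenty components asking for the same key within the cache duration then trigger one call to the loader instead of twenty. What the duration should be still depends on how stale the data is allowed to get.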

12) One more level of indirection - The web runs on indirection, i.e., the ability to resolve one name to another, or one DNS hostname to an IP address, or one web path to a specific blog article. The core idea is that there can be multiple names pointing to the same thing. This has wide-ranging implications in web application development: in creating short links, in routing paths to pages, in converting URLs to API calls, and so on. How this gets applied in a specific scenario completely depends on the scenario. However, there are two ways the indirection is resolved - proxy vs. redirect. Or recursion vs. iteration. As an example, if the code needs to do ten sequential tasks, should the controller invoke those ten tasks one after another, i.e., invoke the next one based on the result of the previous; or should the first task take care of invoking the next one, and return to the controller when all the tasks are completed?
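
The two styles in that last example can be sketched as follows, with three toy tasks standing in for the ten. The names (tasks, runFromController, chain) are hypothetical and the tasks are synchronous only to keep the sketch short.

```javascript
// Three hypothetical tasks; each takes the result of the previous one.
const tasks = [(x) => x + 1, (x) => x * 2, (x) => x - 3];

// Iteration (redirect style): the controller invokes every task itself,
// feeding each one the result of the previous.
function runFromController(steps, input) {
  let result = input;
  for (const step of steps) {
    result = step(result);
  }
  return result;
}

// Recursion (proxy style): each task is wired to call the next one, so the
// controller calls only the first and gets back the final result.
function chain(steps) {
  const done = (x) => x;
  return steps.reduceRight((next, step) => (x) => next(step(x)), done);
}

const first = chain(tasks);
```

Both runFromController(tasks, 5) and first(5) compute ((5 + 1) * 2) - 3. The difference is who knows about the sequence: the controller in the first case, the tasks themselves in the second.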

13) Security - Whatever I can write about security here is not going to be enough. For web applications, the security of not only the client-server exchange is important, but also that of the software code. Code obfuscation and minification are often done for web application files, but deobfuscation tools are equally popular. Is the application logic found in the client code something to protect? Is the client-server communication encrypted on the network? Should the client-server API be hidden from the developers that use devtools? Is a client certificate useful for the web application? Are passwords stored unencrypted in cookies or local storage or database?

14) Configuration - Developers often use fixed values or constants in the code, or assume a default among multiple choices to be applied if needed. Such values form the application configuration. Usually the configuration is stored at the server or database, and the client code just uses the values. However, in some cases it makes sense to allow the client app to be configured differently for different launches, e.g., using URL parameters. Identifying crucial configuration items and exposing them as easy-to-turn buttons or controls not only pleases the user but also makes your software more flexible.
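
For the URL parameter case, a minimal sketch could look like this. The defaults object and the configFromUrl helper are hypothetical; URL and its searchParams are standard in browsers and in Node.js.

```javascript
// Hypothetical application defaults; these are the only configurable items.
const defaults = { theme: "light", fontSize: 14, debug: false };

// Builds the configuration for one launch from the page URL.
// Unknown parameters are ignored, and each value is coerced to the default's type.
function configFromUrl(url, base = defaults) {
  const params = new URL(url).searchParams;
  const config = { ...base };
  for (const [key, fallback] of Object.entries(base)) {
    if (!params.has(key)) continue;
    const raw = params.get(key);
    if (typeof fallback === "number") {
      const n = Number(raw);
      // Keep the default if the value is empty or not a finite number.
      if (raw !== "" && Number.isFinite(n)) config[key] = n;
    } else if (typeof fallback === "boolean") {
      config[key] = raw === "true" || raw === "1";
    } else {
      config[key] = raw;
    }
  }
  return config;
}
```

In a browser, configFromUrl(window.location.href) would give the configuration for the current launch. Restricting overrides to keys that already exist in the defaults keeps a stray parameter from injecting settings the code never expected.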

15) Reduce fat - This is probably the most neglected one. Lean software has many benefits - easy maintenance, quick change, rapid testing, fast debugging, and above all, better performance and load time. Not just web developers but all software developers have a tendency to not remove code, even if the code is no longer needed or is replaceable by similar code elsewhere. The fear of breaking the running code is far greater than the pleasure of clean concise code. Also, use of external frameworks often exacerbates the fear in my opinion. Unfortunately, this causes technical debt, like no other, that is hard to fix later. With version control history and code minification tools, at the very least the unused code should be removed or commented out. And the near duplicate code should be merged or refactored. Like body fat, software fat reduces agility, and hence your productivity.

That's all folks! Happy developing....





