A journey through real time...

I did my first project on real-time communications in 1997. In the past two decades I did numerous projects in this general area. In particular, these were related to voice over IP (VoIP), multimedia and web communication. As 2016 comes to an end, I reminisce about my 20-year journey. I attempt to capture the gist of my projects, and how my project themes and ideas evolved over time.

(I used initials instead of names of the people I worked with. These people shaped my journey, gave directions and helped clear my path.)

1997-99  The door opened - "Age of ITU-T"

As part of a semester long practical training in the final year of my bachelor of engineering curriculum, I worked at Motorola India. Working with another student SKP, under the mentorship of SA and SA, we built our first PC-based video phone. They already had two ongoing video call projects, one with H.324 for modem and another with H.320 for ISDN. They wanted to dive into IP-based H.323 video call.
H.323 video phone: A desktop application with a really nice user interface to do two-party point-to-point LAN-based video calls using H.323. Includes H.225.0 for call setup via Q.931 and H.245 for call control, both carried in TPKT over TCP, media transport via RTP over UDP, G.711 for audio and H.261 for video. Reuses parts of Q.931, H.245 and codecs from other ongoing projects. Keywords: H.323, video call, WinNT, Win32, Visual Studio, VC++.
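The TPKT framing mentioned above is what lets message-oriented signaling such as Q.931 and H.245 ride on a TCP byte stream. The following Python sketch illustrates the framing as defined in RFC 1006: a 4-byte header with version 3, a reserved byte, and a 16-bit length that includes the header. The function names `tpkt_wrap` and `tpkt_split` are made up for this illustration; this is not the code from the original project.

```python
import struct

TPKT_VERSION = 3
HEADER = struct.Struct("!BBH")  # version, reserved, total length (header included)


def tpkt_wrap(payload: bytes) -> bytes:
    """Prefix one signaling message with a TPKT header."""
    return HEADER.pack(TPKT_VERSION, 0, HEADER.size + len(payload)) + payload


def tpkt_split(buffer: bytes):
    """Split a TCP byte buffer into complete payloads.

    Returns (messages, remaining), where remaining holds the bytes of a
    message that has not fully arrived yet.
    """
    messages = []
    while len(buffer) >= HEADER.size:
        version, _reserved, length = HEADER.unpack_from(buffer)
        if version != TPKT_VERSION or length < HEADER.size:
            raise ValueError("not a TPKT header")
        if len(buffer) < length:
            break  # wait for more bytes from the socket
        messages.append(buffer[HEADER.size:length])
        buffer = buffer[length:]
    return messages, buffer
```

A receiver would typically append each chunk read from the socket to the leftover bytes and call `tpkt_split` again.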
The project had quite a successful demonstration. Both of us, final year students, had job offers from elsewhere. SA offered to keep us at the company to continue and improve our project, and we stayed there. The previous project was further enhanced to include many other features and voice codecs. We also took part in ITU-T interoperability events.

While at Motorola, I did a few other projects too. During this time I also worked with and was inspired by ST and HS, who taught me that code can be beautiful! There were ongoing projects on H.320, H.323 and H.324. It made sense to create interworking functions too. I mentored two student projects. One was on an H.323-H.324 gateway where a PC with both modem and ethernet could be used as a protocol translator. Another was about porting an existing H.320 system on to a real-time operating system, pSOS. Developing on embedded systems had its own difficulties related to debugging and testing. In collaboration with another person in the QA department, I created a framework to help developers.
Assert assistant: A framework and supporting C/C++ libraries and macros, that enabled quick debugging of software. The idea is to automatically inject trace statements in function entry and exit points, and to dynamically enable or disable them in various modules. Keywords: debugging, testing, framework, embedded system, C/C++, macros.
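The idea of entry and exit tracing that can be switched on per module can be sketched in a few lines. The original framework used C/C++ macros; the Python decorator below is only an analogy, and the names `trace`, `ENABLED_MODULES` and `TRACE_LOG` are invented for this sketch.

```python
import functools

# Hypothetical registry of modules whose tracing is currently switched on.
ENABLED_MODULES = set()
# Collected trace lines; a real system would print or log them instead.
TRACE_LOG = []


def trace(module):
    """Wrap a function so that its entry and exit are traced for `module`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # The check runs on every call, so tracing can be toggled at run time.
            if module not in ENABLED_MODULES:
                return func(*args, **kwargs)
            TRACE_LOG.append(f"enter {module}.{func.__name__}")
            try:
                return func(*args, **kwargs)
            finally:
                TRACE_LOG.append(f"exit {module}.{func.__name__}")
        return wrapper
    return decorator


@trace("q931")
def parse_setup(data):
    """Stand-in for a real protocol function."""
    return len(data)
```

Adding `"q931"` to `ENABLED_MODULES` turns the trace on for that module only; removing it turns the trace off again without touching the traced functions.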
After about two years at Motorola, I decided to leave, to pursue higher studies. During the transition time, I worked on the system design of a personal project, a multimedia communication developer's kit. The core idea was to create and abstract basic multimedia communication elements such as camera, microphone, speaker, display and network, and to create a drag-and-drop user interface that allows creating real-time applications by interconnecting them. I did not implement the idea at that time. I got a chance to implement the core concept in another project, aRtisy, fifteen years later.

As I started applying for higher studies in the US, SKP told me about the emerging work on SIP at Columbia's CS department. SIP was invented as an alternative to H.323, and was similar to other Internet style protocols. It looked innovative. I applied there as well, and was accepted by HGS in his research group named IRT (Internet Real Time) laboratory.

1999-2003  Rapid growth in learning - "Age of SIP"

My first project at Columbia was to create a gateway between H.323 and SIP. With my prior H.323 experience and the relative simplicity of the emerging SIP, it was a perfect fit.
SIP-H.323 gateway: A signaling translator written using open source OpenH323 and an in-house SIP stack derived from our SIP server. The RTP media path is end-to-end. Can use either a built-in gatekeeper or an external one. Supports fast-connect as well as normal call setup. Keywords: SIP, H.323, gateway, signaling translator, IWF, RFC2543, OpenH323, C/C++, WinNT, Solaris, Linux. [link,paper,spec,ppt]
I did a successful demonstration of a voice call between the two protocols. I used our locally developed SIP e*phone and Microsoft's H.323 NetMeeting. HGS had another student working on that topic before I joined, but without much progress. My work became an instant hit within the lab as well as outside. It was the first such successful attempt at interoperability between these two competing protocols. It was also quite complete in its translation of incompatible concepts, e.g., fast/slow-start vs. three way handshake, logical channels and capability negotiations vs. offer-answer. I co-wrote academic papers [pdf] and Internet drafts [pdf], and did presentations [ppt] at the Voice on the Net (VON) conference on this topic. In collaboration with a bunch of other folks, we created the H.323-SIP IWF effort [rfc] in the IETF. The software was further refined, productized, and sold [link] by a Columbia spin-off named SIPquest, later renamed as First Hand Technologies [link].

I built the SIP side of the gateway from our ongoing SIP server project named sipd. I extracted the SIP parsing, formatting and transaction related code, added SIP dialog and other user agent capabilities, and created a higher-layer reusable library. This was later used in a number of other projects at Columbia whenever we needed SIP user agent functionality.
Libsip++ SIP stack: A cross-platform SIP user agent library with C++ APIs, and abstractions for message parsing, formatting, transaction and dialog, along with sessions for registration and call setup. Keywords: SIP, stack, C/C++, API, cross-platform, Win32, Unix.
Sipua - SIP/RTP user agent: A command line cross platform application for SIP user agent and RTP/G.711 media stack. Built-in voice interface on the Solaris/Sparc platform. Ability to launch external audio/video tools (rat, vic, etc.) on any platform. Keywords: SIP, user agent, voice call, C++, cross-platform, Win32, Unix. [link]
Furthermore, I wrote a platform abstraction layer. Thus, multi-threading and socket interface could be cross-platform. Many of my earlier projects were built to run on both Windows and Unix (Solaris, Linux, FreeBSD, Tru64) systems. I took HGS's class on Internet Systems Programming [link]. Among other things, it dealt with various Unix make/gcc quirks, system calls and interprocess communications. This further motivated and inspired me to improve the build system and cross platform capabilities. During this time I closely worked with JL, another student, especially related to the SIP stack and the SIP server.

Over the next year, while completing my MS at Columbia, I built a few more SIP applications and services. I also mentored some other students in their projects with HGS. The SIP voicemail system built using a distributed and scalable SIP+RTSP architecture, and the SIP multi-party conferencing server with voice mixing, and later video forwarding, capabilities were particularly popular.
Sipum - SIP/RTSP unified messaging: A modular Internet multimedia mail system using SIP and RTSP, to allow message recording and access via any Internet connected device, and using off the shelf streaming software. The media path goes directly from the caller to the media server, not via the voice mail server. Also works with access from PSTN via a gateway. The access user interface is written using Tcl. Keywords: Voicemail, unified messaging, SIP, RTSP, C/C++, Tcl, Win32, Unix. [link,pdf,ppt]
Sipconf - SIP conference bridge: A scalable voice (and later video) conference server using SIP. Supports a wide range of codecs - G.711, DVI, ADPCM, GSM, G.722, and later Speex. Includes call recording, authentication and load balancing. Uses a web interface to configure and set up a conference, or create an ad hoc conference on the fly. Includes video forwarding, text chat, VNC screen sharing and file sharing. Keywords: SIP conference, audio mixing, Win32, Unix. [link,pdf,ppt]
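The voice mixing in such a bridge can be illustrated with a small sketch. It assumes each participant's audio has already been decoded to 16-bit linear PCM frames of equal length (for example from G.711), and it uses the common approach of sending each participant the sum of everyone else. The function name `mix_for_others` and the frame layout are made up for this illustration; this is not the Sipconf code.

```python
PCM_MIN, PCM_MAX = -32768, 32767


def clip(sample):
    """Keep a mixed sample inside the 16-bit PCM range."""
    return max(PCM_MIN, min(PCM_MAX, sample))


def mix_for_others(frames):
    """Mix one frame per participant.

    frames maps a participant name to a list of 16-bit PCM samples.
    Returns, for each participant, the mix of all the other participants,
    so that nobody hears their own voice echoed back.
    """
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    # Sum everyone once, then subtract each participant's own samples.
    total = [0] * length
    for samples in frames.values():
        for i, sample in enumerate(samples):
            total[i] += sample
    return {
        name: [clip(total[i] - samples[i]) for i in range(length)]
        for name, samples in frames.items()
    }
```

Summing once and subtracting per participant keeps the cost linear in the number of participants, instead of building a separate sum for each listener.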
During this time I closely worked with another student, XW. I often used his SIP user agent for testing and demonstrating my server side systems. Another student, TK, helped in parts of the voicemail user interface implementation. The conference server was initially implemented as part of a class project jointly with GN for the Advanced Internet Services [link] class taught by HGS. Another student, SN, helped with various enhancements such as IPv6, TLS and database scalability in many of my projects. I also co-authored papers on the topics of unified messaging [pdf] and conferencing [pdf] for presentations at some conferences and workshops.

For a class project in Web enHanced Information Management [link] taught by GK, I built my first web-based phone.
Hello2web - web based IP telephony client: A Java applet in a web page that allows sending and receiving voice calls from within the browser. Applets do not have a voice (microphone) interface currently, hence create a plugin for Solaris/Netscape, to delegate the real-time voice capture and playback functionality. Uses a backend server to gateway with an actual SIP phone, and to perform encoding/decoding and RTP transport of the voice path. Keywords: Java, applet, plugin, click-to-call, VoIP, web.
I enjoyed the kinds of projects I was doing. By the time I was close to finishing my master's degree, I decided to enroll in the PhD program with HGS. I collaborated with other students, SN, JL, WJ, and XW to create an integrated VoIP architecture, named CINEMA, Columbia InterNet Extensible Multimedia Architecture.
CINEMA - Columbia InterNet Extensible Multimedia Architecture: An IP telephony architecture with a SIP server at the core, together with a bunch of other services such as voice mail, conferencing, interactive voice response, media server, PSTN gateways, H.323 translator, etc., and various SIP user agents, media players and phone devices at the edges. Ability to replace the traditional PBX based communication system in departments, organizations and university campuses. [link,link,poster,ppt,spec,paper]
Over the next year or so, the architecture became quite popular both as our internal test bed [link] as well as for external demonstrations [link] and publications [paper,paper,paper]. The core idea was to keep the system scalable and robust by keeping Internet style signaling, and by distributing various tasks to different servers. Furthermore, additional synchronous and asynchronous collaboration services were implemented to create a comprehensive multi-platform collaboration system [paper]. We also deployed the system within the Computer Science department [paper]. We assisted with the university-wide deployment as part of the Internet2 effort [link] to adopt SIP/VoIP. For some time, during the early days, CINEMA was available as open source. Later, due to the business needs of our sponsoring organization, it was made closed source [faq].

The project couldn't have been successful without some of the students who worked on it collaboratively at that time. XW took care of the SIP user agent. JL oversaw the SIP server, call routing and software distribution. WJ worked on PSTN interoperability and PBX interconnection. SN did many things including TLS, IPv6, faster database, presence and performance measurement. I was responsible for many of the server-side components including voicemail, conferencing, interworking, voice dialogs, media server, Win32 portability, SIP server farm, and so on.

By this time we already had many SIP services. We also had a gateway to connect between SIP/VoIP and our department PBX. One service missing was interactive voice dialog, e.g., to access the voicemail or conferencing system using an authentication PIN. I took it upon myself to implement such an IVR system using SIP.
Sipvxml - SIP-based VoiceXML browser: A VoiceXML interpreter with a SIP front-end. It can accept input via DTMF or voice, and play prompts to the SIP caller. Uses IBM ViaVoice TTS SDK. Uses RFC2833 for DTMF. Uses the rtspd media server for recording or playback of voice. Integrates with existing voicemail and conference services. Keywords: SIP, VoiceXML, IVR, C/C++. [link][ppt][paper]
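For readers unfamiliar with RFC 2833, DTMF digits travel as small "telephone-event" payloads inside RTP rather than as audio tones. The sketch below decodes the 4-byte payload layout defined by that RFC (event code, end bit, reserved bit, 6-bit volume, 16-bit duration). The function name and the returned dictionary are assumptions of this sketch, not part of Sipvxml.

```python
import struct

# Event codes 0-15 as assigned by RFC 2833 for DTMF.
DTMF_EVENTS = "0123456789*#ABCD"


def parse_telephone_event(payload: bytes):
    """Decode one RFC 2833 telephone-event payload (RTP header already removed)."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload needs at least 4 bytes")
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    return {
        "digit": DTMF_EVENTS[event] if event < len(DTMF_EVENTS) else None,
        "end": bool(flags & 0x80),   # E bit: last packet of this key press
        "volume": flags & 0x3F,      # power level in -dBm0, 0 to 63
        "duration": duration,        # in RTP timestamp units
    }
```

An IVR front-end would usually act on a digit only once per key press, for example when it first sees a packet with the end bit set for a given RTP timestamp, since senders repeat the final packet for reliability.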
The project was done jointly with another student, AN, who helped in implementing some pieces of the server. Later, we published our work in a conference [pdf]. I also mentored other students in creating various VoiceXML apps on top of this service.

Mentoring other project students played a big part in my graduate experience. During the course of my MS and PhD, I mentored more than 35 student projects. Click here to see the list of student projects. Many of these projects eventually became part of our CINEMA test bed. Some of them got used in my own research agenda in various ways. Most of these projects enhanced our test bed either to incorporate new ways of collaboration or to improve the servers, e.g., for codec support. Some of these projects created entirely new systems unrelated to the test bed. A significant number of the projects dealt with SIP and RTP for voice and video. Some projects were related to media streaming based on RTSP. And some others were related to improving the user interface of our system or adding new forms of collaboration in our test bed.

During the summer of 2002, I did an internship at Bell Labs, on an entirely different topic - IP mobility. The core idea was to allow NATed devices to change their IP addresses without breaking the end-to-end transport connections.
MobileNAT - mobility across heterogeneous address spaces: Decouple the identity and routing aspects of IP addressing in the endpoint by using two separate IP addresses, virtual vs. real. Implement four pieces: (a) a Windows driver to intercept certain IP messages and apply NAT, (b) client host application to act as a DHCP proxy to decouple the exposed and real IP addresses, (c) mobility manager in the network to manage changes in the NAT mappings, and (d) enhancements to the DHCP server to distribute virtual IP addresses in addition to the real IP addresses. Keywords: mobility, NAT, DHCP, iptables, Windows drivers, Linux. [paper,paper,ppt]
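The split between a fixed virtual address and a changing real address can be pictured with a toy model. The sketch below covers only the endpoint side: applications always see the virtual address, while packets on the wire carry whichever real address the host currently has. The class and field names are hypothetical, and the network-side mobility manager and DHCP pieces are left out; this is not the MobileNAT driver.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    data: bytes = b""


class AddressShim:
    """Toy endpoint translator between a fixed virtual and a changing real IP."""

    def __init__(self, virtual_ip, real_ip):
        self.virtual_ip = virtual_ip  # identity: what applications bind to
        self.real_ip = real_ip        # routing: what the current network assigned

    def move_to(self, new_real_ip):
        """Called when the host attaches to a new network."""
        self.real_ip = new_real_ip

    def outbound(self, packet):
        """Rewrite the source of packets leaving the host."""
        if packet.src == self.virtual_ip:
            return replace(packet, src=self.real_ip)
        return packet

    def inbound(self, packet):
        """Rewrite the destination of packets arriving at the host."""
        if packet.dst == self.real_ip:
            return replace(packet, dst=self.virtual_ip)
        return packet
```

Because a transport connection inside the host is keyed on the virtual address, a call to `move_to` changes what goes on the wire without changing what the application's socket sees.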
I successfully demonstrated connection persistence of telnet and real time streaming using this novel technique. The internship was quite successful with a couple of academic publications and a patent. It was also instrumental in getting me a job offer in the same department after I graduated.

At Columbia, by this time, we had a simple web interface to configure user accounts, contacts, etc. HGS created the initial user interface. Later, I improved and expanded it significantly using Tcl-based CGI scripts [link]. The web user interface included many advanced features. For example, the ability to start/stop the servers, configure database entries, or launch backup services. Our system had evolved into a complete VoIP and collaboration platform. I had presented the architecture and demonstrations numerous times, and had co-authored a few academic papers. 

As my projects evolved from simple "can you hear me?" demonstrations, to more mature systems, I started evaluating system performances. Measuring the quality of the voice path, server health and scalability constraints were very important. Equally important was applying the architecture to create a complete and comprehensive collaboration framework.

2004-2006  Preparing to take on the world - "Age of the Scale"

I started my final stretch towards completing my PhD, which actually involved thesis proposal and defense. I realized that all my prior projects on various servers and systems would only form a minor subset of my thesis. I would need a lot of material on performance evaluation and improvement. I would need to create systems that significantly change the state of the art. This resulted in a dedicated and focussed effort on two areas: (1) scalable and robust server farm architecture, and (2) peer-to-peer telephony to avoid using expensive servers.

For the first part, I explored both vertical and horizontal scalability and robustness of SIP servers. HGS proposed a two-stage server farm for SIP servers. I implemented it using the software derived from our SIP proxy. I showed that the performance is linear with the number of servers, indicating very good horizontal scalability of the architecture. Furthermore, I applied that same basic principle of two stage routing within a single application. I compared the effect of multi-threading and multi-processing, where different stages reside in different threads or processes. 
Failover and load sharing in SIP telephony: Apply existing web and email server scalability and robustness techniques to SIP servers. Create and evaluate two stage server farm. Implement database (MySQL) bi-replication for SIP location service. Keywords: SIP, scalability, failover, C/C++, server farm. [paper,paper,ppt]
SIP server architecture for scalability: Implement SIP servers in various event-based vs. multi-threading vs. thread-pool vs. process-pool architectures and evaluate their performances. Find bottlenecks and propose solutions to improve the system. Keywords: SIP server, multithreading, multiprocessing. [paper,ppt]
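The two-stage idea can be sketched as follows. The sketch assumes the first stage picks a second-stage cluster by hashing the destination user, and that each cluster is an ordered list with a primary followed by backups; both are assumptions of this illustration, as are the class and server names. It is not the code derived from the SIP proxy.

```python
import hashlib


class TwoStageFarm:
    """Toy model of a first stage that routes requests to second-stage clusters."""

    def __init__(self, clusters):
        # Each cluster is an ordered list: primary first, then backups.
        self.clusters = clusters
        self.down = set()  # servers currently considered failed

    def cluster_index(self, user):
        """Stateless choice of cluster from the destination user (load sharing)."""
        digest = hashlib.sha1(user.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % len(self.clusters)

    def route(self, user):
        """Return the first live server in the user's cluster (failover)."""
        for server in self.clusters[self.cluster_index(user)]:
            if server not in self.down:
                return server
        raise RuntimeError("no server available for " + user)
```

Since the first stage keeps no per-call state, any first-stage server can handle any request, and adding a cluster spreads users across more second-stage servers.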
I also wrote academic papers [pdf] on this topic. The SIP server scalability and architecture formed a big part of my thesis. However, server-based systems inherently suffered from scalability and robustness limits - particularly in disaster scenarios with limited Internet connectivity.

By then, Skype was quite popular. Another student, SAB, had spent some time understanding its behavior. I had spent some time understanding the various structured P2P networks or Distributed Hash Tables (DHTs). In one of the weekly meetings with HGS, we came up with the idea of using SIP messages to create a peer-to-peer network, and then use that to route SIP calls.
P2P-SIP: peer-to-peer Internet telephony using SIP: Architecture and implementation to explore how SIP and P2P can be combined in practice. Uses structured P2P networks. Supports user registration and lookups. Explores P2P-over-SIP and SIP-using-P2P approaches. Keywords: P2P, SIP, DHT, architecture. [link,pdf,ppt,spec]
SIPpeer - SIP based P2P IP telephony client adaptor: A piece of SIP software that runs on a client host or local network, and turns existing SIP user agents and phones into a P2P-SIP system. It creates a self-organizing, scalable and robust DHT-based P2P network, and uses it for various SIP functions such as registration, call routing, offline message delivery, and even multi-party conference. Keywords: P2P, SIP, DHT, C/C++. [pdf]
Using an external DHT as a SIP location service:  Modify a SIP user agent (SIPc) to use an external DHT (OpenDHT) as the location service to register and lookup users. Explore data vs service model. Securely store user location to avoid misuse and to enable authentication. Extend the system for presence and offline message storage. Keywords: P2P, SIP, DHT, Tcl. [pdf,spec]
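How a DHT can stand in for a SIP location service is easiest to see in a toy model. The sketch below places peers and user identifiers on a hash ring and stores each user's contacts on the first peer at or after the user's point on the ring. That successor rule, the 32-bit ring and all names here are assumptions of this sketch; it runs in one process with no networking and does not reproduce SIPpeer or OpenDHT.

```python
import bisect
import hashlib


def ring_id(text, bits=32):
    """Map a string to a point on the identifier ring."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class Ring:
    def __init__(self, nodes):
        self.points = sorted((ring_id(node), node) for node in nodes)
        self.ids = [point for point, _ in self.points]

    def responsible_node(self, key):
        """First node at or after the key's point, wrapping around the ring."""
        index = bisect.bisect_left(self.ids, ring_id(key)) % len(self.points)
        return self.points[index][1]


class LocationService:
    """Registration and lookup spread over the peers of a ring."""

    def __init__(self, ring):
        self.ring = ring
        self.storage = {node: {} for _, node in ring.points}

    def register(self, user, contact):
        node = self.ring.responsible_node(user)
        self.storage[node].setdefault(user, set()).add(contact)

    def lookup(self, user):
        node = self.ring.responsible_node(user)
        return sorted(self.storage[node].get(user, ()))
```

Registration and lookup hash the same user identifier, so they meet at the same peer without any central registrar.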
The P2P-SIP effort got a lot of hype within the IETF. A working group was formed to further explore the architecture and implementation. While I graduated and moved on to other projects, HGS and SAB continued the effort both within the lab and externally.

In 2006, I finished the final steps of my PhD. I defended my thesis by the mid year.
PhD thesis - reliable, scalable and interoperable Internet telephony: Shows how carrier grade scalability and reliability can be achieved in software using only low cost commodity hardware. Presents a comprehensive multi-platform collaboration system using SIP. [abstract,pdf,ppt]
I started my professional journey implementing voice and video features for telephony and conferencing. I continued on to implementing many different types of communication and collaboration services. By the end of my PhD, I had found my new love for decentralized peer-to-peer systems. The VoIP industry was moving towards more tight control and managed service offerings. But I wanted to implement peer-to-peer communication and collaboration systems. I wanted to build loosely coupled telephony components, without control from a single application or service provider. I wanted to take the path less travelled!

2006-2011  Jumping on a roller coaster - "Age of Startups"

I joined Bell Labs immediately after completing my research at Columbia. My first project was on design of a serverless mobile gaming platform. It was loosely based on peer-to-peer self-organizing concepts, but not quite the same. However, I got pulled into another project about implementing a Java-based user interface for Lucent's attack detection system. As I finished this one, and was working on the serverless mobile gaming platform, I got a call to join Adobe Systems. 

A small team of people in an internal startup mode were to add various VoIP capabilities to Adobe Flash Player. This was to be based on SIP/RTP to enable a point-to-point real-time media path. The goal was to make it available to all the web developers to use and innovate [link]. Adobe already had a plugin based proprietary enterprise collaboration application [link]. But they wanted to create a standards-based VoIP function directly in the Flash Player. Instead of getting involved as yet another VoIP vendor or provider, the goal was to enable other developers to easily create VoIP systems. A weapon supplier always profits no matter who wins the war!

HS, who knew me, was already working there. BS, who hired me, presented a vision of how some of my research ideas on SIP and P2P could integrate with Flash Player and have a tremendous impact. At Adobe, I closely worked with SC, who taught me that doing it right is more important than getting it to work! We created various working prototypes and pre-beta implementations. The project at that time was called Flash Voice, later renamed to Adobe Pacifica. 
Flash voice using SIP/RTP and P2P: Add SIP/RTP based VoIP capabilities for Flash applications. Add missing pieces necessary to enable point-to-point media path in Flash Player. Create sample Flash applications to demonstrate possible use cases. Keywords: Flash player, C/C++, ActionScript, SIP, RTP, XMPP.
I contributed to the RTP-based media stack in Flash Player. I created a SIP stack in ActionScript. I created several working demonstrations of use cases, e.g., Flash-based click-to-call, Flash-based soft phone to send/receive SIP calls, and Flash-based multi-protocol (SIP+XMPP) communicator for instant messaging, presence and voice calls. Furthermore, I created a DHT/P2P implementation in ActionScript. It built a self organizing peer-to-peer network of Flash Player instances. It allowed them to discover and reach each other for VoIP, presence and messaging. Some projects were in C/C++, but some others were in ActionScript. As I learned ActionScript, I really started to appreciate the framework, and more importantly, the development environment.

As the team started to grow, and competing business interests interfered, it became clear that the original goal of the project would not be met. People changed. Directions changed. One such change was that Flash voice would only be available in AIR apps, not in Flash Player, significantly limiting its potential. I decided to leave. There was some public news about the project after I had left. From what I heard later, the project was grounded before takeoff.

After that, I spent some time without a real job, working on open source projects, and learning Python. My original passion for P2P systems had not died yet. I created a Python implementation of a SIP stack. Thus, the 39peers project [link,link] was born.
39peers - open source P2P IP telephony software using SIP in Python: Implement various IETF standards including SIP, SDP, RTP, DTMF, STUN. Create applications and libraries for DHT, P2P node, P2P-SIP node. Implement Bamboo DHT algorithm, which is used in OpenDHT, but in Python. Keywords: SIP, P2P, RTP, DHT, Bamboo, Python.
Although the project was intended for P2P-SIP, it had implementations of the complete SIP and related standards. I wrote a very extended tutorial [web,pdf] on how to implement SIP in Python. It was a way to document my project source code in a bottom-up manner. As I got pulled into other real jobs, I had to abandon my 130+ page book, and it still remains incomplete. However, the chapters on how to write fully functional SIP, SDP and RTP stacks are complete and available there.

I contributed to another project, a Python-based RTMP server, and created a SIP-RTMP gateway. Later on, this project became quite popular. It was a quick and easy way to bridge Flash-based applications with VoIP service providers. The project itself was not tied to a service provider. It started as a voice gateway, and later, was extended to include video.
Siprtmp - SIP-RTMP gateway for voice and video: Using Adobe's Flash Player and Python SIP code, implement a PC-to-phone calling system which should allow both inbound and outbound calls between web browser and SIP. Allow any third-party to build user interface for the web-based soft-phone. Keywords: SIP, RTMP, Python, ActionScript. [link,link]
Those few months without a job were when I learned rapid prototyping with Python. I was very productive in creating working code. Over the years, the Python-based SIP and RTMP projects became quite mature. They included support for tunneling, peer-to-peer mode, client library, cloud hosting, better performance and a range of sample applications.

When NT of Tokbox approached me and described their work, it sounded very exciting. The goal was to create a Flash based video call and messaging system. They wanted to allow phone users to connect too. My background in VoIP and more recently in Flash and ActionScript was a perfect match. And I liked the enthusiasm in the leadership, NT and RH. After joining the company, within a month, I was able to show a successful demonstration. It was for a web to phone call using my new gateway written in Java. It used open source Red5 for the RTMP side, and the open source NIST SIP stack for the SIP/RTP side. However, it took them a really long time to decide to bring that feature to the customers.

After that initial work, I got absorbed into the main line development of the video calling system. I restructured the system from old Flash to newer Flex. I replaced the ad hoc signaling with robust XMPP. I made several improvements in the video call and conferencing application. I wrote most of the code using Flex in MXML and ActionScript. RH, an amazing programmer, was always striving for better and cleaner software. In a small team of talented developers, notably CW, GG, JF, BS, BR, we created many cool features. Besides those, I also worked on improvements to the voice and video quality, e.g., with respect to echo cancellation, and network quality monitoring. Later on, I got involved in reviving the Flash to SIP calling feature, and integrating the emerging RTMFP based peer-to-peer support of Flash Player.

After about one year, the leadership changed. As the tide turned, it became clear that the boat would steer in another direction. After I separated, I decided to focus again on my open source effort. I created a web forum for project ideas and to help students with projects [link]. I continued my work on previous SIP and RTMP based open source projects, and also created a few more open source projects related to web and Flash-based communication. Click here to see all my open source projects. These projects were initially hosted on Google Code, but were moved to GitHub after Google decided to shut down its code hosting website.
Restlite - lightweight restful server tools in Python: Allow quickly creating RESTful services in Python. Support JSON and XML entity types. Readymade integration with databases including built-in SQLite. Expose internal data structures, variables or files as read-write web resources. Web-style authentication. [link]
Videocity - web based video telephony, conference and messaging: Generalize the idea of video rooms using access keys based on public key cryptography. Enable live video, recording, streaming and non-video messaging and sharing. Allow embedding video rooms in other websites, and controlling the behavior from the parent web page. Self organizing, robust and portable client-server architecture. Client in Flash and server in Python. Keywords: Python, ActionScript, Flash Player, web, security. [link,what's new,video]
As I dived deeper into web services, and integration of web and communication, I learned various ways of doing the same thing - and the ability to distinguish the right from the wrong. I enjoyed the gratification of implementing non-trivial software.

Next, I went to work for 6Connex, a subsidiary of the more established DesignReactor. The project initially looked similar to my previous work, but was actually quite different. LP wanted to quickly create a web-based video conference and messaging framework that worked with virtual events. He wanted to sell it to existing enterprise software vendors. SB designed the user experience. The Flash based client was implemented by RN, a talented programmer. The server side was implemented in Java to tie in nicely with other existing enterprise software. I wrote a lot of server side code. In the new role, I did less coding, and more of software design and architecture of the system. Nevertheless, I did implement a few pieces here and there. After about a year there, I felt it was time for me to move on. In the long term, I wanted to be in the RnD field. Working in a narrowly focussed startup was not the path leading to that desire. I was without a job again for a few months.

During the free time, I got a chance to critically evaluate various programming approaches to create Flash based video conferencing. I concluded that many different types of web video conferencing and messaging applications can be built using only one Flash widget - a video box that represents a publish or play video stream.
Flash-VideoIO - reusable generic Flash application to record or play: A reusable generic Flash application to record and play live audio and video content. It can be used for a variety of use cases in audio/video communication, e.g., live camera view, recording of multimedia messages, playing video files from web server or via streaming, live video call and conference using client-server as well as peer-to-peer technology. [link,paper,slides]
This project got used in many of my other projects demonstrating a wide range of use cases. For example, multiparty conference web app [link,video], two party video call desktop application [link,video], a video chat roulette type app [link], a Facebook app for video call and broadcast, an online video office integrated with Google Chat [link], an online platform to connect to experts [link], and implementation of SIP user agent in JavaScript [link,video]. Initially, the project was closed source, but later was made open source.

During this time, I got a chance to collaborate with CD, who hosted a VoIP conference and expo. I did a couple of presentations at that conference [pdf,pdf,link]. In collaboration with CD, HS, WW and AJ, I also created the voice and video on web (vvow) project at IIT.
Voice and video on web: A web based hosted solution to enable multiparty video conference, text chat, slides presentation, file sharing, and such. Use RESTful APIs for signaling over WebSocket. Use Flash-VideoIO for peer-to-peer media path, and later, incorporate WebRTC. Cross browser portability. [link,link,video]
Although this project started out as a student project, I ended up doing most of the implementation. Initially, it used Flash-VideoIO, and was later expanded to use plugin-free WebRTC. Later, a bunch of us jointly published the work in a conference [pdf, ppt]. The paper compared various ideas on how web based communication can evolve, and described how my Flash based multiparty collaboration used web-oriented RESTful APIs.

As I was racing on in my open source work, I got called by Twilio. I liked JL and EC. But I also wanted to continue my open source and research activities. So I decided to work part time for Twilio, focusing on its RnD type activities. I spent the first couple of months creating prototype mobile apps for Android and iOS. They used SIP/RTP between the mobile device and the cloud service, with pjsip as the client side SIP stack. I also created an implementation of a SIP-RTMP gateway, largely based on my open source project, but written in Java. It used Flazr for the RTMP side and NIST/SIP for the SIP/SDP side. This enabled a web browser to create a voice path to their cloud service. Both these projects were eventually productized and made available to Twilio developers. Later, I did some voice quality measurements and made recommendations for improvement. I also enhanced the web client and gateway to add a Flash-based H.264 video path from the browser to the cloud service. This too was based on my open source Python SIP-RTMP gateway, but rewritten in Java. At Twilio, I worked closely with EC, JB and NV on separate projects.

During that time, I also did consulting for a bunch of other companies - mostly startups. I had created my own consulting business [link]. The goal of the company was to support my open source projects, and to enable other companies to use them in their products and services. For example, working with JK, I helped BitTorrent use my Python-based open source RTMP code, and do custom packetization of RTMP over unreliable transport.

I worked closely with MT of Emergent Communications, a startup specializing in providing SIP-based NG911 emergency call systems. In particular, I wrote a web-based call taker terminal using Flash [link,video], a Python-based public safety answering point (PSAP) director [link], and a Python-based location to service translation (LoST) client. The software enabled a phone or SIP caller to deliver emergency calls via voice, video, instant messaging or real-time text. The server included a conference server and recorder, and supported various NG911 standards. I used my previous open source projects to implement the server side. I also helped in interoperability testing of the system with other vendors. I hosted the service on Amazon EC2 for their trials. This was the first web-based NG911 call taker system that I knew of.

Finally, I continued my open source work. I created two new projects. First, Py-WebRTC [link], to expose various objects and methods of the voice and video engines of Google's WebRTC stack in Python, and second, SIP-JS. The first one never got completed. The goal of the second one, SIP-JS, was to implement a complete SIP/SDP stack in JavaScript. A related project, Flash-Network, provided RTP transport and media capture for such web-based SIP user agents.
SIP-JS - SIP in JavaScript: Port the SIP/SDP stack from my Python-based open source project to JavaScript. Support user login, call, outbound proxy, voice, video, instant messaging, and touch tone digits. Create example web-based video phone application. Allow both Flash and WebRTC as the media stack - automatically detect and fall back. Create hybrid mobile (Android) app using Cordova. [link,video,paper]
It started as a demonstration of SIP over WebSocket. The goal was to move the feature rich SIP stack from the server side to the client for web-based telephony applications. Unlike the SIP-RTMP gateway, where the SIP stack runs in the server or gateway, the SIP-JS approach moved it to the browser. Things that were unavailable in the browser, such as media capture and transport, were implemented using Flash Player. Later, the project became more mature, with WebRTC media stack integration, and an Android app.
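To give a flavor of what running SIP in the endpoint over WebSocket involves, here is a rough sketch of the text of a REGISTER request as a client might send it in a WebSocket frame. It is written in Python only for brevity, and is not the SIP-JS code or API. The helper name build_register and the example user and domain are hypothetical. The "WS" Via transport and the random ".invalid" host name follow the convention of the SIP over WebSocket specification (RFC 7118), since a browser has no reachable address of its own.

```python
import uuid


def build_register(user: str, domain: str, expires: int = 3600) -> str:
    """Format a minimal SIP REGISTER request for a WebSocket transport.

    Hypothetical helper for illustration; a real stack also handles
    authentication, retransmission of state, and response parsing.
    """
    # A browser client cannot know its own address, so use a random
    # placeholder host under the reserved ".invalid" domain.
    host = uuid.uuid4().hex[:12] + ".invalid"
    branch = "z9hG4bK" + uuid.uuid4().hex  # magic cookie prefix from RFC 3261
    tag = uuid.uuid4().hex[:8]
    call_id = uuid.uuid4().hex
    lines = [
        f"REGISTER sip:{domain} SIP/2.0",
        f"Via: SIP/2.0/WS {host};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:{user}@{domain}>;tag={tag}",
        f"To: <sip:{user}@{domain}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 REGISTER",
        f"Contact: <sip:{user}@{host};transport=ws>",
        f"Expires: {expires}",
        "Content-Length: 0",
    ]
    # Header lines end with CRLF; an empty line terminates the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


message = build_register("alice", "example.com")
```

The resulting string would be sent as a single WebSocket message to a SIP proxy that supports this transport; everything else, such as dialogs and transactions, stays in the endpoint.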

In those five years, I had quite a roller coaster ride - hopping from one company to another, doing many great projects one after another, not getting satisfied by a single project, wanting to do so many things, and lacking a systematic research direction. This was quite in contrast with the next four years, which brought more stability and a longer term research and development vision, but still with a wide range of projects.

2012-2016  Cruising in the distant seas - "Age of WebRTC"

When VK told me about an opening in his department at Avaya Labs, I eagerly applied. I had known some folks there, including XW and AJ. XW had high regard for the department and the work he did, and AJ was quite happy as well. In my presentation to the labs folks, I talked about the evolution of video conferencing, my contributions on Flash-based applications in the browser, and the emergence of web-based communication [pdf]. Thus, in a way, I set my general research direction in the area of web-based voice and video collaboration, and endpoint driven systems.

At Avaya, I worked on several RnD projects with three common connecting themes: (1) endpoint driven systems where most of the application logic runs in the endpoint [web], (2) separation of the application logic from the user data so that the end user or her organization can control, manage and store the data independent of a single application, and (3) exploration and application of the emerging WebRTC technology to web, mobile and cloud systems for various enterprise use cases. However, one of my first assignments was actually not on WebRTC. It was based on my prior open source work - in fixing H.264 video path conversion between SIP/RTP and Flash RTMP systems for their one-touch video system [link].

My employer's business was largely in the enterprise market. I decided to analyze the threats and propose recommendations around a few questions. How can enterprises adopt the emerging WebRTC technology? How does WebRTC affect enterprises? What are the novel use cases that were previously not possible? And how can enterprises deal with these use cases? As part of this effort, I refined my previous SIP in JavaScript open source work [link]. More specifically, I used WebRTC for such an endpoint driven SIP system. Furthermore, I proposed some disruptive enterprise applications enabled by WebRTC and HTML5 technologies. Some of these showed existing use cases of call, conference, messaging and presence. Many others showed other non-trivial ways of collaboration such as virtual presence, video presence, web annotations, digital trail, contextual collaboration, and so on.

In collaboration with JY and AJ, I presented threats and potential solutions to traverse WebRTC flows through enterprise firewalls. We identified problems and proposed solutions on how enterprise policies such as authentication and recording can be applied to WebRTC flows. The work got published and presented in a conference and a journal [paper,details,slides,video].
Secure edge - apply enterprise policies to WebRTC on any website: A system of media relay, firewall and browser extension to apply enterprise policies to WebRTC flows - enable only authorized flows on third-party websites but use user's enterprise credentials, record unencrypted media for all flows, restrict bandwidth or media types, or hide private IP addresses for such flows. [paper,details,slides,notes,video]
With the emergence of WebSocket and WebRTC technologies, many web communication apps emerged. There was a general tendency of developers to create custom messaging on top of WebSocket to enable WebRTC signaling. This created a fragmented world of many walled garden applications - each locking the user data, and hindering independent innovations. I extended and refined my resource oriented software architecture - created a very light weight, robust and scalable Python based resource server, and a generic and easy to use client-server API for shared data access and event notification. I proposed, and later implemented, numerous complex endpoint driven communicating applications per this architecture. Many of the projects listed below were covered and described in my co-authored conference papers - one on building communicating web applications leveraging endpoints and cloud resource service [paper,slides,poster,video], and another on private overlay of enterprise social data and interactions in the public web context [paper,slides,video].
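The core idea of shared data access with event notification can be sketched roughly as below. This is only a minimal in-memory illustration under my own assumptions, not the actual resource server or its API. The class name ResourceStore, its methods, and the example resource paths are all hypothetical, and a real service would expose such operations to clients over the network.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

# Hypothetical illustration: names and paths are assumptions, not the
# API of the resource server described in the text.
Callback = Callable[[str, str, Any], None]  # called as (event, path, value)


class ResourceStore:
    """In-memory path-addressed resources with change notification."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, path: str, callback: Callback) -> None:
        """Watch a resource and everything stored beneath it."""
        self._subscribers[path.rstrip("/")].append(callback)

    def put(self, path: str, value: Any) -> None:
        self._data[path] = value
        self._notify("put", path, value)

    def get(self, path: str) -> Any:
        return self._data.get(path)

    def delete(self, path: str) -> None:
        if path in self._data:
            del self._data[path]
            self._notify("delete", path, None)

    def _notify(self, event: str, path: str, value: Any) -> None:
        # Notify subscribers of the resource itself and of each ancestor.
        parts = path.rstrip("/").split("/")
        for i in range(len(parts), 0, -1):
            prefix = "/".join(parts[:i])
            for callback in list(self._subscribers.get(prefix, [])):
                callback(event, path, value)


# Usage: one endpoint watches a room's member list while others update it.
store = ResourceStore()
events = []
store.subscribe("/rooms/lobby/members", lambda e, p, v: events.append((e, p, v)))
store.put("/rooms/lobby/members/alice", {"video": True})
store.put("/rooms/other/members/bob", {"video": False})  # not under the watched path
store.delete("/rooms/lobby/members/alice")
```

In this sketch the store knows nothing about calls or conferences. The endpoints decide what the resources mean, which is the point of keeping the application logic at the edge and the data separate from any single application.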
Living-Content: A private overlay of enterprise digital trail on public web. Combine web annotations, virtual presence, ad hoc conversation, co-browsing and client-side application mashups for many enterprise use cases. An endpoint driven system to leave a digital trail of employees' interactions on and in the context of the public web, while keeping the data private, visible to employees but not outside.
Enterprise personal wall: A context sensitive personal wall for a user's social sharing within an organization. Includes automatically populated as well as user generated content. Changes appearance depending on how or where it is embedded. Ability to initiate interaction or contact request using digital visiting card.
With living content, you could edit or annotate any web page, see other employees' annotations, see who else was viewing the page at that instant or browsing the website, initiate ad hoc communication with them, and see the past conversation history around that web page or site. The text and drawing annotations, and the interaction history, could allow an organization to keep a digital trail of the social interactions of their employees. It also integrated with our in-house corporate directory as well as third-party social directory. For example, it added additional data and presented a presence and click-to-call button on fellow employees' profiles on LinkedIn when viewing them from within the enterprise network. A browser extension enabled various client side changes in the web page without help from those third-party websites. I also initiated a project to unify access to social and enterprise data from various sources, including social directory or internal mail system. Thus, a new authorized web or mobile application could easily and quickly access and update a user's social data.
HTML5 Communicator: A full feature communicator application written completely in HTML5 using WebSocket and WebRTC, but without legacy SIP, XMPP or Flash Player. Supports user signup, login, contact list, presence, custom status message, instant messaging, audio and video conferencing, multi-party chat (with text, voice and video), file sharing, conversation history, and emoticons. 
aRtisy developer platform: A platform to quickly create communicating web apps. An app editor with drag and drop for building an app by connecting various widgets, and an assortment of pre-built widgets for common communication and collaboration tasks, e.g., phone call, video publisher, conference state, automatic call distribution, call queue, click-to-call, or text-chat. 
Video recorder plugin: A browser plugin (not extension) to record audio and video from the webcam to an MP4 or wave file for video messaging. Supports NPAPI and ActiveX, and popular browsers, Chrome, Firefox, Internet Explorer and Safari. Flexible JavaScript API to control and monitor the plugin behavior, and to exchange the recorded file. Ability to selectively enable or disable the plugin only on certain websites.
Vclick - endpoint driven enterprise WebRTC: A full feature pure web based audio, video and text conferencing application. Implements the popular collaboration features such as screen sharing, white board, shared notepad, etc., as endpoint driven system, using resource oriented software architecture. Uses browser extension to enable click to call and presence in our corporate directory website. Extends to mobile, and has a version without depending on browser extension.  Has cloud hosting with appropriate security.
Always-on video presence:  A web based application for distributed teams to stay in touch during office hours. Shows periodic snapshots of people in a web room. Allows initiating or joining a conversation in one click. Goal is to replicate the behavior of people working in offices of a building, but using video presence, and still remain non-intrusive when possible.
The Vclick project was joint work with JY and AJ, and was actually quite popular in our internal demonstrations. It did many things differently, e.g., authentication, call initiation, separation of call intent from session negotiation, unidirectional peer connection, and so on. It changed the common perception of how video call and conferencing could work in the emerging web/mobile era. I integrated this with many other existing systems within the company. After several failed attempts to bring it out as an experimental product, we eventually got permission to publish and present our work in conferences - one on how to use browser extensions to facilitate enterprise collaboration [paper,details,slides,notes,video1,video2], and another on the project motivation and system implementation [paper,slides,notes,video].

As part of our cloud hosting trials, I learned many things about cloud software and service. I also implemented many new security features. I implemented a portal to host other similar web applications. Later, in collaboration with JB, we created a more robust and secure cloud portal for customer trials of this and other emerging systems. We also published our work in a conference [paper,slides,notes,video]. Continuing the collaboration with JB, I implemented a few other software pieces - team spaces for mobile; annotations and interactions in team spaces; and video wall. In particular, I created a mobile app for his connected spaces project - a team space system that enables sharing of documents and other editable contents on the web with other team members. I integrated parts of my previous projects to enable impromptu audio/video communication on a shared document, and to create annotations as overlay in the context of that document. The goal was to embed communication within the existing context of what the user is doing, instead of requiring her to launch a communication app outside her context. All our joint projects were hosted on the cloud for internal customer trials, and were available on both desktop browsers and mobile.

As the demand for mobile versions of my projects grew, I started exploring various options. Cordova is a cross-platform development tool to convert web code to mobile apps. Since many of my projects were already HTML5 and Chrome compatible, it made sense to use Cordova, and particularly the Chrome Cordova Apps framework and tools. I converted many of my existing web applications to Android apps, and some to iOS apps. I also created some new mobile oriented projects. In particular, I built a mobile client app to connect to Avaya IP Office over voice and video, an endpoint driven multi-party conference logic for client-server media path, a phone dialer to send outbound voice calls via Avaya Breeze, a server-less video phone to discover and connect with others in the local area network, and so on. Each of these cross platform apps could actually work as a web app, an installed desktop app as well as an installed mobile app. Jointly with JB, I published and presented our bag of tricks and findings in a conference [paper,slides,notes,video].

There were several communication apps, many based on WebRTC, internally as well as in the public domain. With numerous voice and video applications, each with its own walled garden, it became evident that we needed something to unify the diversity. Past attempts of server side translation or multi-protocol clients either failed or did not work well. In collaboration with VK, LP and others, I created another app, and its supporting service, for managing the user's popular contacts. It enabled the user to reach a contact quickly irrespective of which application the contact is using. This was also an endpoint driven system, and was based on resource oriented software architecture.
Strata Top9:  An app to quickly connect with your popular contacts on whichever application they are on. Include built-in WebRTC clients for some of our in-house systems - Vclick, IP office, media server, Scopia, Breeze, Messaging, and so on. Use modern HTML5 technologies, and cloud hosted services. 
The project actually became quite popular in our internal demonstrations. I also published and presented the project motivation and system architecture in a conference [paper,slides,notes,video]. It included ideas on dynamic contacts, ability to derive the right reachability address, endpoint driven software architecture, and many other non-trivial design decisions. The app was also written in HTML5, and converted to mobile using Cordova. There were a few other projects that I worked on - video wall, media-as-a-service and telemedicine collaboration - but, due to lack of any publications on those topics, I decided to not describe them here.

While building these numerous applications, I encountered several novel concepts and challenges, and found innovative ways to solve them. For example, how to create animations using CSS transitions? How to design for mobile and desktop alike? What kinds of asynchronous programming models make sense? And how can iframes be used to create web components? I created a set of best practices that helped me write cross platform and mobile compatible software in vanilla JavaScript, i.e., without using other JavaScript frameworks such as jQuery or Angular. I extensively used CSS3 for various animations and graphics. I relied on iframe based components or modules, instead of the bulky and slow single page application model. I also shared my bag of tricks internally and externally with other researchers. My two publications [first,second] cover many of these tips and tricks.

Although I enjoyed working on these exciting projects, my employer struggled to meet the business goals. When it was time to focus on these goals, long term research became an overhead that could be avoided. Subsequently, when a good incentive was offered to leave or be reassigned, I decided to leave. It was hard for me to abandon so many great projects, and to see my software vanish. Luckily, being in the research organization allowed me to publish and present many of my projects in academic conferences, and thus, brag about them. A bunch of us who had enjoyed working together decided to continue working together. And thus, Koopid [link], a new startup, was formed!

As I transitioned from one job to another, I decided to spend some of my time on my other open source activities. In particular, I consolidated various project repositories related to real-time communication protocols and systems into a single project [link,blog]. The goal was to create light weight implementations of various protocols and applications in Python. I already had many of the pieces in my previous open source projects. I added a few more modules and applications, e.g., ability to make phone calls from command line, to connect to Twilio voice path from command line, and to bootstrap web apps using a light weight notification service. I did lots and lots of refactoring!

The end of the year is usually the time when I want to reflect on my accomplishments, shortcomings and future goals. The end of 2016 is particularly special, as it marks my 20 years in the area of real-time communications. So I decided to reflect on my journey of two decades. And thank you for being a patient audience and reading all the way to the end.


(these pages are waiting to be filled with many more exciting projects)
