Scripting VoiceXML and TwiML using Tcl

This article describes how to write XML-based programmable scripts such as for W3C's VoiceXML or Twilio's TwiML using Tcl, the Tool Command Language. The associated project is available as open source software at http://github.com/theintencity/tcl-vxml-twiml.
  1. How easy is a programming language?
  2. What is Tcl?
  3. What are XML-based documents?
  4. How do I get started?


Since many readers may not be familiar with Tcl, the first part of the article is motivational - Why Tcl?


How easy is a programming language?

I believe there are only a few factors that make a programming language easy to learn: (1) its vocabulary and grammar, (2) how it expresses known abstractions, and (3) how it deals with uncommon abstractions.
  • Vocabulary and grammar: A language with a small number of consistent reserved keywords and a small number of consistent ways to arrange the words is easier to learn. Those familiar with multiple languages can compare Python vs. Perl or PHP or Ruby.
  • How does it express known abstractions? Most programmers are not engineers, and often they just need a way to programmatically express some data manipulation. People tend to think in terms of objects and operations, instead of wire protocols and style sheets. If you can describe a process using simple abstractions, can you implement it in a simple program in that language?
  • How does it deal with uncommon abstractions? Sooner or later, programmers have to deal with abstractions that are not common in natural languages. For example, asynchronous or non-linear behavior (think Memento) is hard to comprehend, but is implemented regularly by programmers. Is it easy to add a new programming construct to represent a new abstraction in that language? How does the language interact with other languages? How does it deal with foreign languages or Unicode?
A bilingual family often mixes multiple languages when talking to a family member. Unfortunately, such mixing and matching of programming languages in code is extremely difficult. Usually, external interfaces such as HTTP or JSON are used to glue together code from multiple languages, instead of planned interfaces such as JNI.

Often, a framework is used as an add-on to fill in the missing features of the underlying core language. Does the framework change the way you can express yourself, or does it preserve the underlying concepts and structure of the programming language? If the latter, then it is easier to learn because the framework deals with the unknown abstractions in a manner consistent with the underlying language.

There are several other factors that can be used to compare different programming languages. However, in my opinion, the above three factors determine how easy a language is to learn. Many other factors are important and can determine, for example, how easy it is to understand someone else's program, or how productive it is to write in a particular programming language. One is the verbosity of the language: some storytellers take pride in describing a person walking down the street in four pages of prose, whereas others capture the gist in two lines. Another is the syntax of the language: some writers employ pleasing calligraphy in a handwritten letter, while others scribble a barely readable memo on a Post-it. Code editors often play a role in making it easy to write a program and be more productive, even for a novice programmer, but they do not fundamentally change the ability to learn the language. In my opinion, where you put the curly braces or whether you need a semicolon in your statement does not make it easy or hard to learn to express yourself in that programming language, although it may affect productivity, which is a different topic.

If you think of this from a baby's perspective, it becomes clear. Babies are born without cognitive understanding of natural languages. They learn vocabulary one word at a time, and then put together words to make sentences even if the grammar may not be right, and before you know it, they grow up and learn to express known as well as unknown concepts in tweets, blogs and essays! A baby learns by putting together a small number of words without regard to complex grammar, so as to express common abstractions, e.g., "diaper change" or "milk" or "go outside". As the baby grows up, she needs some way to express things that were previously not done by her or others around her. She does that by connecting previously known words, e.g., "daddy go outside" or "mama get milk".

Thus, the vocabulary and grammar, and the expression of known as well as uncommon abstractions, become instrumental in the early learning phase. Having written a good amount of software in a number of different programming languages, I believe that the factors listed above largely affect how easy it is for me to learn a new programming language. I think that, to a large extent, the same applies to other programmers.

What is Tcl?

Tcl is a simple yet powerful programming language. It gained tremendous popularity about two decades ago. It is also great at interacting with other interpreters and applications. It is extensible, simple and generic, and is a glue language. If you haven't already, I recommend learning Tcl and Expect, especially if you like creating command line tools or non-HTML user interfaces.

Everything in Tcl is a command, a string, and a list at the same time. That could be called Tcl's trinity. Well... that is not entirely true, because Tcl does not have a separate built-in "list" at the language level; rather, strings can be treated and operated on as a "list".
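As a small illustration of this idea, here is a sketch (the variable names are made up for this example and are not part of the project) in which the same value is used as a string, as a list, and as a command.

```tcl
# One value, three views. The variable names are only for illustration.
set s "string toupper tcl"

# As a string: it has a length in characters.
set nchars [string length $s]

# As a list: whitespace separates the elements.
set nwords [llength $s]
set last   [lindex $s end]

# As a command: the first word names the command, the rest are its arguments.
set result [eval $s]

puts "$nchars characters, $nwords words, last word '$last', result '$result'"
# prints: 18 characters, 3 words, last word 'tcl', result 'TCL'
```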

I particularly like Tcl because the actual logic of almost everything in a Tcl program can reside in the application, not in the programming language. This includes, among others, the ability to define a new construct for loops, conditional branches, and even co-routines - all these can be done by defining new commands on top of the core programming language.
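For instance, a new looping construct can be written as an ordinary command. The following is a minimal sketch; the name repeat is hypothetical and chosen only for this example.

```tcl
# A hypothetical `repeat` construct, defined as an ordinary command.
# It runs `body` in the caller's scope `count` times.
proc repeat {count body} {
    for {set i 0} {$i < $count} {incr i} {
        uplevel 1 $body
    }
}

set total 0
repeat 3 {
    incr total 5
}
puts $total   ;# prints 15
```

The body is passed in curly braces, so it reaches repeat as an unevaluated string, and uplevel then runs it in the caller's scope. The tag commands described later in this article rely on the same pattern.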

I had created a primitive language interpreter with similar concepts in my early programming years, after learning Pascal and C, even before I was exposed to any scripting language. And that was done entirely using the C preprocessor and a pre-compilation tool. Later, at Columbia University, I built the entire web application of our CINEMA VoIP project in Tcl, and contributed significantly to the SIP user agent project, also written in Tcl and Tk. Around that time, I also created Tcl libraries for easily writing VoiceXML. The work described in this article is based on ideas derived from that project, but the code is different.
Based on the factors listed earlier, I can safely conclude that Tcl is by far the easiest, yet powerful, high-level programming language to learn that I have come across.
The question becomes: if Tcl is so easy, should it be the first programming language to learn? I say no! Continuing the baby analogy, a baby will easily learn whatever her mother (or father) speaks as the first language. A motivated student will learn whatever she is exposed to as the first programming language in school. So the complexity of the first language may be largely irrelevant. However, once a person knows how to read and write in one language, learning another language needs to consider the idiosyncrasies and incompatibilities with the first language. For example, articles (a, an, the) in English are particularly hard to grasp if the first language is, say, Hindi, because the latter does not have the concept of articles. So if your first programming language does not have certain concepts, such as the object-oriented paradigm, you may need to learn them later regardless.

So my recommendation is to have Python, Java or C/C++ as the first programming language. However, since Tcl is so easy to learn, it can easily become your second, third, or N-th programming language... over a weekend!

What are XML-based documents?

XML documents are structured text with hierarchical structure, such as
<?xml version="1.0"?>
<people>
  <person id="1234">
     <name>Kundan Singh</name>
     <url>http://kundansingh.com</url>
  </person>
  <person>
     <name>John Smith</name>
  </person>
</people>
Although XML is popular for machine-to-machine communication, many existing programming languages treat XML text as a second-class citizen. Thus, manipulating or parsing XML is clumsy, or requires an external library that may change the way you have to write XML-related code compared to the rest of the code. This is particularly relevant if XML is used to describe control commands, such as for VoiceXML or TwiML, instead of just storing structured data.
The following VoiceXML-based code instructs an IVR (Interactive Voice Response) system to play a voice prompt, collect digits, and invoke another program with the collected digits.
<vxml>
  <form>
    <field name="pin">
      <prompt>Please enter your four digit PIN</prompt>
    </field>
    <block>
      <submit next="after-pin.cgi" namelist="pin" />
    </block>
  </form>
</vxml>
Typically, such XML code is generated by web applications or server-side scripts, and is executed or interpreted by the IVR system. A script is used because the XML may be generated only on certain conditions, e.g., whether the caller is authenticated, and because some parts have to be substituted with values obtained from external sources, e.g., the path of the next script and the prompt text to play based on the caller's spoken language. The server-side script will typically look like the following.
if (!authenticated) {
    next_script = "after-pin.cgi"; // ... file name obtained from external source
    prompt = "Please enter your four digit PIN"; // ... prompt text from external
    println("<vxml>");
    println("  <form>");
    println("    <field name=\"pin\">");
    println("      <prompt>" + prompt + "</prompt>");
    println("    </field>");
    println("    <block>");
    println("      <submit next=\"" + next_script + "\" namelist=\"pin\" />");
    println("    </block>");
    println("  </form>");
    println("</vxml>");
}
To reduce the ugliness of the code, the programmer ends up writing supporting libraries with classes and methods to easily create such XML code, e.g.,
if (!authenticated) {
    next_script = "after-pin.cgi"
    prompt = "Please enter your four digit PIN"
    response = new voicexml()
    form = response.form()
    field = form.field(name="pin")
    field.prompt(prompt)
    block = form.block()
    block.submit(next=next_script, namelist="pin")
    print(response)
}
This reduces the opportunities to make mistakes, unlike writing the XML code by hand. Unfortunately, this does not really remove the ugliness from the code. Also, the programmer now has to not only understand the XML document but also the library that provides these new objects and methods.

Wouldn't it be nice if the XML elements became objects and operations in your code on demand? And the original hierarchical structure is preserved? Consider the following code as an example.
if {!$authenticated} {
  set next_script "after-pin.cgi"
  set prompt "Please enter your four digit PIN"
  voicexml {
    form {
      field name=pin {
        prompt {
          puts $prompt
        }
      }
      block {
        submit next=$next_script namelist=pin
      }
    }
  }
}
That is actually a piece of valid Tcl code. And I will describe how to do this shortly in this article.
Note that the core idea is not new. For example, Don Libes' cgi.tcl enables writing CGI scripts with a hierarchical structure parallel to the desired HTML output.

Another example is as follows, with Twilio's TwiML for IVR-style processing. The first block is the desired XML. The second is a Python script that can generate that XML using Twilio's Python SDK, which defines the new classes and methods it uses. The third is a Tcl script that closely resembles the XML, and can also generate that XML using the ideas and code mentioned in this article.
XML
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Dial>
    <Number sendDigits="wwww1928">
      415-123-4567
    </Number>
  </Dial>
</Response>
Python
from twilio.twiml.voice_response import Dial, VoiceResponse

response = VoiceResponse()
dial = Dial()
dial.number('415-123-4567', send_digits='wwww1928')
response.append(dial)

print(response)
Tcl
package require twiml
Response {
  Dial {
    Number sendDigits=wwww1928 {
      415-123-4567
    }
  }
}

How do I get started?

The first step is to get comfortable with the basics of Tcl, if you are not already familiar with it. Certain syntax and semantics are quite different from other popular scripting languages, e.g., the use of "quotes" or {curly braces}.
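As a quick illustration of that difference, the following sketch (with made-up variable names) shows that double quotes group words while still substituting variables and commands, whereas curly braces group words and keep the text verbatim.

```tcl
set name "World"

# Double quotes group words and still substitute variables and commands.
set quoted "Hello $name, [string length $name] letters"

# Curly braces group words and defer substitution: the text is kept verbatim.
set braced {Hello $name, [string length $name] letters}

puts $quoted   ;# prints: Hello World, 5 letters
puts $braced   ;# prints: Hello $name, [string length $name] letters
```

This is why the nested bodies in the examples of this article are written in curly braces: the body reaches the command as unevaluated text, and the command decides when to run it.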

Next, download the vxml and twiml packages in this repository. Use the examples directory to check out various examples, such as,
$ tclsh examples/vxml1.cgi
These examples are intended to be CGI scripts, but can be reused in other Tcl scripts. You may also rename the file extensions from .cgi to .tcl if you like. These example files include both the desired XML output as well as the Tcl script code to generate that output.

VoiceXML

Consider the following desired XML.
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" ... version="2.0">
  <form>
    <field name="drink">
      <prompt>
         Would you like coffee, tea, milk, or nothing?
      </prompt>
      <grammar src="drink.grxml" type="application/srgs+xml" />
    </field>
    <block>
      <submit next="http://www.drink.example.com/drink2.asp" />
    </block>
  </form>
</vxml>
For the script code, first include the required package. If the package is not available in the standard Tcl library path, you may need to update the search path too.
lappend auto_path .
package require vxml
This package defines all the VoiceXML elements (or tags) as commands. Thus, vxml, form, field, etc., can be assumed to be Tcl commands. (Actually, it uses the catch-all unknown handler behind the scenes to dynamically define code for these XML tags.)

Additionally, the package includes a voicexml command to wrap the output in CGI compatible format, e.g., with Content-Type header when needed. This command also includes the default namespaces and attributes for the top-level vxml tag, and inserts the initial xml declaration.

Thus, the previous XML can roughly map to the following hierarchical Tcl commands.
voicexml {
  form {
    field {
      prompt {
        ...
      }
      grammar
    }
    block {
      submit
    }
  }
}
Every command that represents an XML tag can also take zero or more attributes. An attribute is passed as an argument to the command, written as name=value or name="value". Thus, the field, grammar and submit commands are changed as follows.
voicexml {
  form {
    field name=drink {
      prompt {
        ...
      }
      grammar src=drink.grxml type=application/srgs+xml
    }
    block {
      submit next=http://www.drink.example.com/drink2.asp
    }
  }
}
Child elements of a tag are specified as the last argument, if applicable. This argument is executed as a set of commands, allowing a nested hierarchical structure. If the child element is just a text node, then the built-in puts command can be used to print it, as shown below.
      ...
      prompt {
        puts "Would you like coffee, tea, milk, or nothing?"
      }
If the child element has both inline text and elements, such as,
   <prompt>
     I have <value expr="card_type"/> card.
   </prompt>
then the corresponding Tcl script should include both text output as well as nested Tcl commands, as follows. The three statements are put on the same line to match the corresponding line in the XML document, but can be spread across three lines for readability.
   prompt {
     puts "I have "; value expr=card_type; puts " card."
   }
Alternatively, you can modify the vxml library to also define commands that return the XML representation, instead of printing out. For example, if value_ is defined as command to return a string representing this value element, then the Tcl code could become:
   prompt {
     puts "I have [value_ expr=card_type] card."
   }
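One possible shape for such a returning command is sketched below. This is my assumption of how it could be done, not code from the vxml package: element_string is a hypothetical helper, and attribute values are not XML-escaped here.

```tcl
# Hypothetical helper: build the XML for an empty element and return it
# as a string instead of printing it.
proc element_string {name args} {
    set xml "<$name"
    foreach arg $args {
        # Split each name=value argument at the first equals sign.
        set pos [string first = $arg]
        if {$pos < 1} {
            error "expected name=value but got \"$arg\""
        }
        set key   [string range $arg 0 [expr {$pos - 1}]]
        set value [string range $arg [expr {$pos + 1}] end]
        append xml " $key=\"$value\""
    }
    return "$xml/>"
}

# value_ returns the value element as a string.
proc value_ {args} {
    return [element_string value {*}$args]
}

puts "I have [value_ expr=card_type] card."
# prints: I have <value expr="card_type"/> card.
```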
A VoiceXML document can refer to other XML-based content, such as for specifying the grammar rules. The XML elements used by such content are not included in the vxml package. Consider the following XML from examples/vxml3.cgi.
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" ... version="2.0">
 <link next="operator_xfer.vxml">
   <grammar type="application/srgs+xml" root="root" version="1.0">
     <rule id="root" scope="public">operator</rule>
  </grammar>
 </link>
</vxml>
The corresponding Tcl script is as follows. Note that the child elements of the grammar tag, e.g., rule, are written as is, without using Tcl commands.
voicexml {
  link next=operator_xfer.vxml {
    grammar type=application/srgs+xml root=root version=1.0 {
      puts {<rule id="root" scope="public">operator</rule>}
    }
  }
}
However, if you are interested, you can implement a similar concept for such embedded external XML content in your own package.

Since VoiceXML allows element names such as if, else, elseif, or throw that are also Tcl commands, you can use the prefix vxml_ to invoke such VoiceXML commands from the Tcl program. Consider the following XML snippet.
  <if cond="card_type =='amex' || card_type =='american express'">
     Please say or key in your 15 digit card number.
  <else/>
     Please say or key in your 16 digit card number.
  </if>
The corresponding Tcl script is as follows. Note that vxml_if and vxml_else are used instead of if and else.
  vxml_if {cond=card_type == 'amex' || card_type == 'american express'} {
    puts "Please say or key in your 15 digit card number."
    vxml_else
    puts "Please say or key in your 16 digit card number."
  }
In fact, all the vxml commands, including form, field, block, etc., can be called with vxml_ prefix, to avoid name collision with other potential packages you may use. Alternatively, you can use Tcl namespace and modify the vxml package.

Check out other vxml examples in the repository.

TwiML

Using the twiml package is similar to using the vxml package, with some crucial differences: the set of XML tags, and hence the commands, are different; since the XML tag names start with upper case letters, there are no prefixed commands, as collision with built-in Tcl commands is unlikely; the generated XML is pretty-printed in vxml but not in twiml; and the twiml package also includes the ability to invoke Twilio REST APIs.

Furthermore, many TwiML tags do not include nested tags, hence the semantics of the last argument of the corresponding command are changed to reflect that. In particular, only the Response, Dial and Gather commands require the last argument to be executable commands that generate the child elements, whereas all other commands assume the last argument to be a string for the child text node in XML.

Let us start with a simple XML example.
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Dial action="/handleDialCallStatus" method="GET">
    415-123-4567
  </Dial>
  <Say>Goodbye</Say>
</Response>
First, include the necessary package.
package require twiml
Then use a Tcl command hierarchy that mirrors the nested XML structure.
Response {
  Dial action=/handleDialCallStatus method=GET {
    puts 415-123-4567
  }
  Say Goodbye
}
In comparison, the corresponding Python script is as follows.
from twilio.twiml.voice_response import Dial, VoiceResponse, Say

response = VoiceResponse()
response.dial('415-123-4567', action='/handleDialCallStatus', method='GET')
response.say('Goodbye')

print(response)
Another example follows.
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Gather input="speech dtmf" timeout="3" numDigits="1">
        <Say>Please press 1 or say sales for sales.</Say>
    </Gather>
</Response>
And the corresponding Tcl script is as follows.
Response {
  Gather input=speech\ dtmf timeout=3 numDigits=1 {
    Say "Please press 1 or say sales for sales."
  }
}
Note that the space in the attribute value needs to be escaped. Alternatively, you could use quoted value, or use curly braces around the entire first argument.
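The effect of these forms on plain Tcl word splitting can be checked with a small sketch. The show command is a hypothetical helper that only reports the arguments it receives. Since quotes group a word only when they start it, the quoted form here encloses the whole argument.

```tcl
# `show` is a hypothetical helper that reports the arguments it received.
proc show {args} {
    return "[llength $args] argument(s): [join $args { | }]"
}

puts [show input=speech\ dtmf timeout=3]     ;# backslash-escaped space
puts [show "input=speech dtmf" timeout=3]    ;# whole argument in quotes
puts [show {input=speech dtmf} timeout=3]    ;# whole argument in braces
puts [show input=speech dtmf timeout=3]      ;# unescaped space
```

The first three calls print "2 argument(s): input=speech dtmf | timeout=3", whereas the last one prints "3 argument(s): input=speech | dtmf | timeout=3", because the unescaped space splits the attribute into two words.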

The corresponding Python code follows:
from twilio.twiml.voice_response import Gather, VoiceResponse

response = VoiceResponse()
gather = Gather(input='speech dtmf', timeout=3, num_digits=1)
gather.say('Please press 1 or say sales for sales.')
response.append(gather)

print(response)
Compared to VoiceXML, a TwiML script is usually smaller, because TwiML lacks many control structures and telephony control commands available in VoiceXML. Instead, TwiML relies on the server side script to perform those functions.

Suppose the first TwiML to the caller is as follows.
<Response>
  <Say>Hello there!</Say>
  <Gather method="GET" action="?state=one">
    <Say>Please press 1 for sales or 2 for support.</Say>
  </Gather>
</Response>
Once the user enters a digit, say 1, suppose the second TwiML is as follows.
<Response>
  <Say>Let me connect you to a sales person</Say>
  <Dial timeout="10" record="true">
    <Number>+14151234567</Number>
  </Dial>
</Response>
And similarly, a different TwiML if the user enters 2.

To implement this logic in the same Tcl script, running as CGI script, first import the necessary libraries.
lappend auto_path .
package require twiml
You can use the cgi.tcl library for help in writing CGI Tcl scripts. Its cgi_input command captures the supplied CGI input, e.g., ?state=.... Its cgi_import command exposes the captured input as a Tcl variable. Note that the script receives the Digits input when the user enters some digits on the telephone keypad.
package require cgi
cgi_input
if {[catch {cgi_import state}]} { set state {} }
if {[catch {cgi_import Digits}]} { set Digits {} }
Based on the supplied input, you can now call the twiml commands as appropriate. The following example illustrates this.
Response {
    if {$state == ""} {
        Say "Hello there!"
        Gather method=GET action=?state=one {
            Say "Please press 1 for sales or 2 for support."
        }
    } else {
        if {$Digits == 1} {
            Say "Let me connect you to a sales person"
            Dial timeout=10 record=true {
                Number "+14151234567"
            }
        } else {
            Say "Let me connect you to customer support"
            Dial timeout=10 record=true {
                Number "+14151234000"
            }
        }
    }
}
Note that you may move the Response command inside the if and else blocks, to keep them closer to the nested twiml commands.

Both the vxml and twiml packages allow error handling in the script. Thus if your script has some errors, the top-level voicexml or Response commands will capture the error, throw away any partial XML generated so far, and then generate only a simple XML to speak out the error. This keeps the generated XML valid, instead of breaking the user dialog abruptly. You can modify the included packages to send an email or log the error too.

The twiml package additionally includes an optional TwiML command to wrap the error handling code. Thus, if you wish to move the Response command closer to the nested commands, and still be able to handle script errors, you can wrap all the relevant code inside TwiML as follows.
TwiML {
  if {$state == ""} {
    Response {
      ...
    }
  } else {
    ...
  }
}
Check out other twiml examples in the repository.

Twilio REST API

The twiml package also includes necessary code to use Twilio REST APIs. The Client namespace is used to encapsulate the code for this.

Consider the following curl command to send a text message.
curl -X POST https://api.twilio.com/2010-04-01/Accounts/ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/Messages \
   --data-urlencode "Body=What's up?" \
   --data-urlencode "From=+14151234567" \
   --data-urlencode "To=+12121234567" \
   -u ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX:your_auth_token
The corresponding Python code using the Twilio's Python SDK is as follows.
from twilio.rest import Client
client = Client('ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX', 'your_auth_token')
client.messages.create(body="What's up?", from_='+14151234567', to='+12121234567')
The corresponding Tcl code using our twiml package is as follows.
package require twiml
set client [Client::create "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" "your_auth_token"]
$client POST Messages Body "What's up?" From "+14151234567" To "+12121234567"
Note the differences between the raw curl API and the Python or Tcl code. The path and parameter names of the curl command map directly to the corresponding Tcl code elements, whereas the Python code requires some changes, e.g., From becomes from_.
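To make this mapping concrete, the following sketch shows how a path and a list of name-value pairs could be turned into the pieces of an HTTP request. It is an illustration under my own assumptions, not the code of the twiml package: describe_request is a hypothetical helper, nothing is sent over the network, and binary encode requires Tcl 8.6 or later.

```tcl
package require http   ;# standard Tcl package, used here only for formatQuery

# Hypothetical helper: describe the HTTP request for a REST call
# without actually sending it.
proc describe_request {account token method path args} {
    set url "https://api.twilio.com/2010-04-01/Accounts/$account/$path"
    # HTTP Basic authentication carries base64 of "account:token".
    set credentials [binary encode base64 "${account}:${token}"]
    return [dict create \
        method  $method \
        url     $url \
        headers [list Authorization "Basic $credentials"] \
        body    [::http::formatQuery {*}$args]]
}

set request [describe_request ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX your_auth_token \
    POST Messages Body "What's up?" From "+14151234567" To "+12121234567"]
puts [dict get $request url]
puts [dict get $request body]
```

The name-value pairs are form-encoded into the request body in the order given, much like the --data-urlencode options of the curl command.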

The client object encapsulates the account and token information, and exposes GET, POST, PUT and DELETE methods. These methods take the relative URL path and a list of name-value pairs for the parameters in the request body. On the other hand, any URL parameters must be supplied as part of the URL path. These methods return the received XML response as a DOM node. The twiml package includes XPath-style element and attribute extraction from the XML node, as shown in the following example.

Following is another example.
curl -X GET 'https://.../Calls?StartTimeAfter=2009-07-06T00%3A00%3A00Z&Status=completed' \
  -u ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX:your_auth_token
The response is in the following format.
<TwilioResponse>
  <Calls start="0" end="49" pagesize="50" ...>
    <Call>
      <Sid>...</Sid>
      ...
    </Call>
    <Call>
      ...
    </Call>
    ...
  </Calls>
</TwilioResponse>
The corresponding Python code is as follows:
from datetime import datetime
from twilio.rest import Client
client = Client('ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX', 'your_auth_token')
calls = client.calls.list(start_time_after=datetime(2009, 7, 6, 0, 0), status='completed')
for record in calls:
    print(record.sid)
And the corresponding Tcl code is shown below. Note the XPath-style XML attribute and element extraction from the response. You can again see that this matches the curl example more closely than the Python code does.
set client [Client::create "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" "your_auth_token"]
set calls [$client GET Calls?StartTimeAfter=2009-07-06T00%3A00%3A00Z&Status=completed]
puts "\[[$calls set /TwilioResponse/Calls/@start]-[$calls set /TwilioResponse/Calls/@end]\] \
     [$calls set /TwilioResponse/Calls/Call/Sid]"
$client delete
In case of any error in the API response, the client's method throws an exception. For example, the following may return an error if the call ID is not valid.
curl -X GET 'https://.../Calls/CAXXXXXX' -u '...'
<TwilioResponse>
  <RestException>
    <Message>The requested resource... was not found</Message>
    ...
  </RestException>
</TwilioResponse>
The corresponding Tcl code fragment to capture and print the error message is shown below.
if {[catch {$client GET Calls/CAXXXXX} errMsg]} {
    puts $errMsg
}

Closing words

The Tcl code for the twiml and vxml packages is pretty small, about 100-200 lines each. Tcl allows defining a catch-all command that is triggered when a command is invoked but not defined in the code. This feature is used to dynamically intercept an undefined command, and if it matches a desired XML tag name, then print out the corresponding XML code. All the attribute arguments of the command are captured to form the XML tag's attributes. The last argument, if not in attribute form, can optionally be interpreted to print the XML tag's child elements, recursively.
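The catch-all idea can be sketched in a few lines. The following is a simplified illustration and not the actual package code: the tag tables, the emit and xml_escape helpers, and the xml_out variable are all assumptions made for this example, and the output is collected in a variable rather than printed directly so that it can be inspected.

```tcl
# Hypothetical tag tables for this sketch; the real packages define their own.
set ::xml_tags       {Response Dial Gather Number Say}
set ::xml_containers {Response Dial Gather}
set ::xml_out ""

# Collect output in a variable so that it can be inspected afterwards.
proc emit {line} {
    append ::xml_out $line "\n"
}

# Escape the characters that are special in XML text and attribute values.
proc xml_escape {text} {
    return [string map {& &amp; < &lt; > &gt; {"} &quot;} $text]
}

# Keep the original catch-all handler so other unknown commands still work.
if {[llength [info commands ::unknown]]} {
    rename ::unknown ::original_unknown
}

proc ::unknown {name args} {
    if {$name ni $::xml_tags} {
        if {[llength [info commands ::original_unknown]]} {
            return [uplevel 1 [list ::original_unknown $name {*}$args]]
        }
        return -code error "invalid command name \"$name\""
    }
    set attrs ""
    set hasBody 0
    foreach arg $args {
        if {[regexp {^([A-Za-z_][\w:.-]*)=(.*)$} $arg -> key value]} {
            # name=value arguments become XML attributes.
            append attrs " $key=\"[xml_escape $value]\""
        } else {
            # Anything else is taken as the body of the element.
            set body $arg
            set hasBody 1
        }
    }
    if {!$hasBody} {
        emit "<$name$attrs/>"
    } elseif {$name in $::xml_containers} {
        # Container tags run their body as a script, which emits the children.
        emit "<$name$attrs>"
        uplevel 1 $body
        emit "</$name>"
    } else {
        # Other tags treat the body as a text node.
        emit "<$name$attrs>[xml_escape $body]</$name>"
    }
    return
}

# Usage: none of these commands is defined, so each call reaches ::unknown.
Response {
    Dial timeout=10 {
        Number +14151234567
    }
    Say "Goodbye & thanks"
}
puts -nonewline $::xml_out
```

With these definitions, the usage at the end produces a Response element that contains a Dial element with timeout="10" and a nested Number, followed by a Say element in which the ampersand is escaped as &amp;amp;. A command name that is not in the tag table is passed on to the original handler, so regular Tcl behavior is preserved.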

A wrapper command such as TwiML or voicexml is defined explicitly to capture the generated XML in a buffer, by replacing the built-in puts command with a custom one that writes to the buffer. This allows capturing the error, and generating a sane XML that indicates the error, instead of terminating the script abruptly.
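A minimal sketch of this buffering technique follows. It is an illustration under my own assumptions rather than the package code: capture_xml and capture_buffer are hypothetical names, the replacement puts only handles the simple one-argument form, and the fallback XML is a placeholder.

```tcl
# Hypothetical wrapper: run `body`, buffering whatever it prints with puts.
proc capture_xml {body} {
    set ::capture_buffer ""
    # Temporarily replace the built-in puts with one that appends to the buffer.
    rename ::puts ::capture_real_puts
    proc ::puts {args} {
        # This sketch only handles the simple form: puts string
        append ::capture_buffer [lindex $args end] "\n"
    }
    set failed [catch {uplevel 1 $body} message]
    # Always restore the built-in puts, whether or not the body failed.
    rename ::puts {}
    rename ::capture_real_puts ::puts
    if {$failed} {
        # Discard the partial output and return a small fallback document.
        return "<Response><Say>Sorry, an error occurred.</Say></Response>\n"
    }
    return $::capture_buffer
}

set good [capture_xml {
    puts "<Response>"
    puts "<Say>Hello there!</Say>"
    puts "</Response>"
}]
set bad [capture_xml {
    puts "<Response>"
    error "something went wrong"
}]
puts -nonewline $good
puts -nonewline $bad
```

In the second call, the partially written Response is thrown away and only the fallback document is returned, so the caller never sees malformed XML.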

The concepts mentioned above can be seen in the twiml.tcl and vxml.tcl files available in the included packages. Most of that code can be reused in your own XML-based document library written in Tcl.

Finally, note that Tcl is pronounced tickle. So if you got Tcl'ed today, remember to tickle someone else too...

Resources

  1. What is Tcl? https://en.wikipedia.org/wiki/Tcl
  2. Learn Tcl https://learnxinyminutes.com/docs/tcl/
  3. History of Tcl https://web.stanford.edu/~ouster/cgi-bin/tclHistory.php
  4. Tcl the misunderstood http://antirez.com/articoli/tclmisunderstood.html
  5. What went wrong with Tcl and Tk https://journal.dedasys.com/2010/03/30/where-tcl-and-tk-went-wrong/
  6. Comparing Tcl with Web technologies http://beauty-of-imagination.blogspot.com/2016/01/tcltk-vs-web-we-should-abandon-web.html
  7. Writing CGI in Tcl http://expect.sourceforge.net/cgi.tcl/ref.txt
  8. Source code of this project http://github.com/theintencity/tcl-vxml-twiml
