# AIML 2.0 Working Draft

Revision 1.0.2.22
March 9, 2014
Richard S. Wallace
ALICE A.I. Foundation
Contact: info@alicebot.org
## 1. Introduction
This document is a draft specification for a new AIML (Artificial Intelligence
Markup Language) standard, version 2.0 of the language. AIML is an XML
language for specifying the contents of a chat robot character. An AIML
Interpreter is a program capable of loading and running the bot, and providing
the bot’s responses in a chat session with a human user, called the client.
This document explains in detail both the syntax and semantics of AIML, as
well as key features that should be supported by an AIML interpreter.
The primary design goal of the original AIML language was simplicity. AIML is
motivated by two observations:
**1.** Creating an original, believable chatbot character requires writing a
significant amount of content, in the form of conversational replies.(*)
> (*) This proposition may not be true for chatbots based on other
technologies. AIML implements a form of supervised learning, where a person,
the botmaster, plays a crucial role in training the bot. Unsupervised
learning systems, on the other hand, attempt to teach a bot through
conversations, in effect crowdsourcing the bot content. The unsupervised
model has its own drawbacks, however. Specifically, the bot database becomes
filled with nonsense, which an editor must later delete. The tradeoff
between supervised and unsupervised methods might be summarized as “Creative
writing vs. deleting garbage.”
**2.** The people who are most suited to writing the bot content are not, by and
large, computer programmers. Those with literary backgrounds are more skilled
at developing content for original characters.(*)
> (*) The caveat to this observation is that there are, of course, some talented
people who have mastered both computer programming and the literary skill to
write quality chatbot content.
When AIML was first designed in the late 1990’s, the World Wide Web had burst
upon the stage and a rush of creative energy was poured into building websites.
This tsunami of activity has in fact continued to this day. What has changed
however is that the web lost its original simplicity. Perhaps it was
inevitable as users demanded more and more sophisticated services through the
web, that layers of complexity would be added. In 1994 however it was
possible to author a web site with only rudimentary knowledge of a few HTML
tags.
Because at that time, a number of creative people had mastered the then-simple
HTML, I made a decision to create an equally simple AIML. I was fond of
saying, “anyone who knows enough HTML to make a website, can learn enough AIML
to write a chatbot."
A parallel development beginning in the 1990’s was the development of XML,
including specifications, standards, documents, tools, and applications for
XML. Perhaps the world has not gone the way that the XML evangelists hoped in
the 1990’s, as its many competing formats remain viable today. But XML has not
gone away either. It remains true that XML is a broadly supported standard,
and its tag-based representation is easy to grasp without sophisticated
knowledge of computer science. AIML authors have found the many XML tools,
such as DTDs, syntax checkers, and editors, to be useful when creating bots.
For these reasons AIML 2.0 remains hitched to the XML wagon.
At some level however, AIML does not depend on XML syntax. There is a deeper
representation of the data we represent in XML files. As long as the
representation can capture the basic structure of a pattern path (the input
pattern, that pattern and topic pattern), and a hierarchical response template,
then AIML could be written in a number of different formats, including Lisp
S-expressions, JSON, or a structured text format. The AIML 2.0 draft even
includes an alternative representation: a hybrid of flat files and XML called
AIML Intermediate Format (described in a section below).
Modifying AIML inevitably reduces some of its original simplicity. Adding
more tags and more features makes the language more difficult for people to
understand. The urge to keep it as simple as possible is tempered by our
experience over the past decade, in which AIML botmasters learned that the
language had some serious limitations. AIML 2.0 is an attempt to address the
shortcomings, while balancing the original goal of keeping the language as
simple as possible. This AIML 2.0 draft specification is, for the most part,
designed to be backwards-compatible with the AIML 1.0 and earlier standards, in
that way preserving the simplicity of the original language. The new
features build on top of the original language in such a way that
the concepts can be organized pedagogically, so that AIML can be taught at
beginner, intermediate and advanced levels.
### What’s new in AIML 2.0?
* Zero+ wildcards: new wildcards that match 0 or more words.
* Highest priority matching: select certain words to have top matching
priority
* Migrating from attributes to tags: more dynamic control of attribute values
* AIML Sets: match inputs with sets of words and phrases
* AIML Maps: map set elements to members of other sets
* Loops: Iterations
* Local variables: variables with scope limited to one category.
* Sraix: access external web services and other Pandorabots
* Denormalization: the (approximate) inverse of normalization.
* Pandorabots extensions
    * date: formatted date and time
    * request: access previous input request history
    * response: access previous bot response history
    * unbound predicates: check if a predicate has been set or not
    * learn: learn new AIML categories
    * learnf: learn new AIML categories and save in a file
    * explode: split words and phrases into individual characters
* OOB (Out of Band) Tags: AIML extension for mobile device control
### What’s gone from AIML 1.0?
* Gossip - never well defined anyway
* Javascript - The interpreter does not have to support a scripting language
(to be restored in AIML 2.1).
## 2. AIML System overview
AIML defines a relationship between three entities: a human chatter called the
client, a human chat bot author called the botmaster, and the robot or bot
itself. In general a botmaster can author multiple bots, and each bot can have
multiple clients. A system like Pandorabots provides for multiple botmasters,
multiple bots, and multiple clients. An AIML system embedded in a consumer
device might have only one bot and one client. The AIML standard does not
specify the number of bots, botmasters or clients (except that defining AIML
means we have to talk about at least one of each). The details of handling
multiple bots, botmasters and clients are left up to the implementation.
Care should be taken however to manage the state of each bot and each client
session.
**A. Bot configuration and state:**
**AIML Files** -- Each bot is assumed to have its own set of AIML files. This
collection of AIML files uniquely defines the personality of the bot character.
A bot may be a clone of another bot, or may connect to another bot through
`<sraix>` (defined below), but for the purpose of defining the AIML language, the
simple assumption is that each bot has its own AIML files.
**Learnf file** -- one AIML file with special meaning is the file created by the
`<learnf>` tag (defined below). When an AIML template activates a `<learnf>` tag,
the bot remembers or “learns” the new category, specifically, by saving it in a
file given a specific name by the interpreter (for example, learnf.aiml). The
new categories learned with `<learnf>` are global to all clients chatting with
the bot, so the learnf file should be part of the bot’s AIML file collection.
**Bot properties** -- global values for a bot, retrieved with the `<bot>` tag. A multiple bot system should take care to maintain bot
properties individually and separately for each bot.
**Substitutions** -- normalizing substitutions, person substitutions, gender
substitutions and sentence splitters are unique to each bot. Many bots may
use copies of the same substitutions, but a multiple-bot system should ensure
that each bot can have its own custom substitutions.
**Predicate defaults** - Predicate values in AIML are like local variables specific
to one client. Typically one thinks of client profile information like name,
age and gender predicates, but predicates can be used to store any string.
AIML predicates are set with the `<set>` tag and retrieved with
the `<get>` tag. Predicates are specific to an individual
client, but the predicates may have default values that are defined for a
specific bot. There should also be a global predicate default for any
predicate whose default value is not specified for a bot.
**Sets and Maps** - AIML 2.0 includes a feature that implements sets (collections)
and maps. Set members are strings, and maps define a mapping from
string to string. Unique collections of Sets and Maps may be defined for
each bot.
The AIML standard does not specify where or how the properties, sets, maps,
substitutions and predicates are defined. This is an implementation detail
left up to the interpreter designer. The values could be entered through a
user interface, saved in text files or a database, or in any other format
including XML and JSON, as long as the interpreter can read them when the bot
is launched.
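The predicate handling described above can be sketched as follows. This is a minimal illustration, not part of the specification: the class names `Bot` and `ClientSession`, the method names, and the global default value `"unknown"` are all assumptions made for the example.

```python
from typing import Dict, Optional

# Assumed global default for predicates with no bot-specific default.
GLOBAL_DEFAULT = "unknown"


class Bot:
    """Holds the per-bot predicate defaults (hypothetical structure)."""

    def __init__(self, predicate_defaults: Optional[Dict[str, str]] = None):
        self.predicate_defaults = dict(predicate_defaults or {})


class ClientSession:
    """Holds the predicate state of one client chatting with one bot."""

    def __init__(self, session_id: str, bot: Bot):
        self.session_id = session_id
        # Initialization copies the bot's defaults into the session.
        self.predicates: Dict[str, str] = dict(bot.predicate_defaults)

    def set_predicate(self, name: str, value: str) -> str:
        self.predicates[name] = value
        return value

    def get_predicate(self, name: str) -> str:
        # Fall back to the global default when nothing is stored.
        return self.predicates.get(name, GLOBAL_DEFAULT)
```

Because each session copies the defaults, a value set for one client never leaks into another client's session, even when both are chatting with the same bot.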
**B. Client session and state**
**Initialization** -- when a client connects to a bot, before they begin chatting,
the bot must initialize a client session. The client session is assigned a
unique ID so that the AIML interpreter can track the state of the conversation.
This is important when a single bot is chatting with multiple clients, for
example a web based bot.
**Predicate defaults** -- The initialization step also includes setting predicates to
the default values specified for the bot.
**Predicate state** -- The chat session must keep track of the state of predicate
values. Whenever a client activates an AIML category, a `<set>` tag may be
evaluated and some predicate values may change. The interpreter must remember the
predicate values through the course of the conversation.
**Topic** - The AIML topic is a unique predicate value, because it becomes part of
the pattern matching process. The topic can be set with `<set name="topic">`.
**Conversation log** -- Generally an interpreter keeps a conversation log of the
interactions between a bot and a client. The AIML 2.0 draft does not specify
how or in what format these logs are stored.
**History** -- The AIML 2.0 draft does however specify that the bot maintain,
within a chat session, a history of interactions for the purpose of evaluating
the tags `<input>`, `<request>`, `<response>` and `<that>`. The size of the history
(the number of elements saved or remembered) is left up to the interpreter
designer.
**Learned categories** -- Categories learned with `<learnf>` are saved globally for
the bot (see Learnf file above), but categories learned with the `<learn>` tag
are specific to each client. The chat session should maintain any categories
learned with `<learn>`.
**C. Counting interactions and sentence splitting**
The basic step of AIML pattern matching is to match one input sentence against
the bot’s set of AIML categories. Because inputs and responses may contain
more than one sentence, AIML has adopted a particular system for counting and
indexing inputs and outputs.
When the bot receives a multiple-sentence input
In general one input sentence may result in 1 or more output sentences.
```xml
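<input/> - the current input sentence
<input index="2"/> - the previous input sentence
<input index="N"/> - the Nth previous input sentence.
<request/> = <request index="1"/> - the client’s last input request, consisting
of one or more input sentences.
<request index="2"/> - the client’s 2nd to last input request.
<request index="N"/> - the client’s Nth to last input request.
<response/> = <response index="1"/> - the bot’s last response, consisting of
one or more sentences.
<response index="2"/> - the bot’s second to last response.
<response index="N"/> - the bot’s Nth to last response.
<that/> = <that index="1,1"/> - the last sentence the bot uttered.
<that index="1,2"/> - the 2nd to last sentence in <response index="1"/>,
provided it exists.
<that index="2,1"/> - The last sentence of <response index="2"/>.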
```
**Human**: Hello
**Robot**: Hi nice to see you!
**Human**: How are you? My name is Jeff.
**Robot**: I’m very well. How are you doing? What's up, Jeff?
**Human**: I’m talking to a robot
**Robot**: Would you like to say more about that?
**Human**: Sure
At this point, the bot finds a category with a response to the input “Sure".
The following table summarizes the current state of input/that and request/
response history at the time when that category’s template is evaluated.
Entity | Normalized Sentence | input/that | request/response
--- | --- | --- | ---
Human | Hello | `````` | ``````
Robot | Hi nice to see you | `````` | ``````
Human | How are you | `````` | ``````
Human | My name is Jeff | `````` | ``````
Robot | How are you doing | `````` |
Robot | What is up Jeff | ``````
Human | I'm talking to a robot | `````` | ``````
Robot | Would you like to say more about that | `````` | ``````
Human | Sure | ``````
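The counting scheme can be illustrated with a small history structure. This is a sketch under stated assumptions: index 1 is taken to mean the most recent item, a request or response is stored as a list of normalized sentences, and the class name `History` and its methods are hypothetical, not defined by the draft.

```python
from typing import List


class History:
    """Per-session history of requests and responses (illustrative only)."""

    def __init__(self) -> None:
        self.requests: List[List[str]] = []   # each request: list of input sentences
        self.responses: List[List[str]] = []  # each response: list of output sentences

    def input(self, index: int = 1) -> str:
        # Flatten all requests so that input sentences are counted one by one.
        sentences = [s for request in self.requests for s in request]
        return sentences[-index]

    def request(self, index: int = 1) -> List[str]:
        return self.requests[-index]

    def response(self, index: int = 1) -> List[str]:
        return self.responses[-index]

    def that(self, response_index: int = 1, sentence_index: int = 1) -> str:
        # "M,N": the Nth to last sentence of the Mth to last response.
        return self.responses[-response_index][-sentence_index]


history = History()
history.requests = [["HELLO"], ["HOW ARE YOU", "MY NAME IS JEFF"],
                    ["I AM TALKING TO A ROBOT"], ["SURE"]]
history.responses = [["HI NICE TO SEE YOU"],
                     ["I AM VERY WELL", "HOW ARE YOU DOING", "WHAT IS UP JEFF"],
                     ["WOULD YOU LIKE TO SAY MORE ABOUT THAT"]]
```

An index that reaches past the stored history raises `IndexError` in this sketch; a real interpreter would need to decide what value to return in that case.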
## 3. Migrating from attributes to tags in AIML 2.0
One odd feature of XML is the distinction between tags and attributes.
Consider the HTML img tag in an expression like
```html
<img src="http://alicebot.org/logo.jpg"/>
```
Why was this tag developed to use an attribute, rather than a subtag like:
```html
<img><src>http://alicebot.org/logo.jpg</src></img> ?
```
HTML is interpreted in a static way, but an XML language can be defined to
interpret tags dynamically. For XML languages like AIML, the problem with
attributes is that they are not easy to rewrite dynamically. Suppose we want
the value of the src attribute to vary depending on another XML expression.
The problem is that XML syntax forbids placing an XML expression inside an
attribute value.
Of course, this problem is not hard to solve with a little computer
programming. The XML attribute values can be rewritten by another process
writing the XML. But at least for AIML and XML languages like it, we would
like to specify attribute values dynamically, and allow the botmaster to write
the expressions for those values in XML.
Fortunately the problem has a simple solution: don’t use attributes. Any value
in an attribute can just as well be represented with a subtag, as in our example
```html
<img><src>http://alicebot.org/logo.jpg</src></img>
```
AIML 2.0 modifies the definition of every AIML tag that takes an attribute so
that the attribute value can be specified with a subtag having the same name.
For example:
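```xml
<get name="age"/> may be written as
<get><name>age</name></get>

<condition name="job" value="manager">Hi, boss!</condition> may be written as
<condition><name>job</name><value>manager</value>Hi, Boss!</condition>

<date format="%D %H"/> may be written as
<date><format>%D %H</format></date>
```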
Even more generally, the contents inside the attribute tags may be any template
expression, as these examples show:
```xml
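<get><name><srai>PREDICATE NAME</srai></name></get>
<condition><name>job</name><value><get><name>profession</name></get></value>Hi,
Boss!</condition>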
```
Care should be taken to ensure that whatever these template expressions return
is a valid expression for the attribute. For example in,
```xml
GET INDEX
```
The ```GET INDEX``` should return a valid index number > 0.
To retain backwards compatibility, either the attribute form or the subtag form
may be used in AIML 2.0. In the definitions of XML tags that follow, with a
couple of exceptions noted, the attribute values may also be written in the
subtag form.
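One way an interpreter might support both forms is to look for the XML attribute first and fall back to a child element of the same name. The sketch below uses Python's standard `xml.etree.ElementTree`; the function name `attribute_value` and the optional `evaluate` callback (standing in for template evaluation of the subtag contents) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def attribute_value(element: ET.Element, name: str,
                    evaluate: Optional[Callable[[ET.Element], str]] = None) -> Optional[str]:
    """Return the value of `name`, given either as an attribute or as a subtag."""
    # Attribute form: <get name="age"/>
    if name in element.attrib:
        return element.attrib[name]
    # Subtag form: <get><name>age</name></get>
    child = element.find(name)
    if child is None:
        return None
    if evaluate is not None:
        # The subtag may contain any template expression, so evaluate it.
        return evaluate(child)
    # Without an evaluator, fall back to the literal text of the subtag.
    return "".join(child.itertext()).strip()


attr_form = ET.fromstring('<get name="age"/>')
subtag_form = ET.fromstring("<get><name>age</name></get>")
```

Both `attribute_value(attr_form, "name")` and `attribute_value(subtag_form, "name")` yield the same predicate name, so the rest of the interpreter does not need to know which form the botmaster used.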
## 4. AIML Syntax
Edit: This section should be rewritten using RELAX NG notation.
http://en.wikipedia.org/wiki/RELAX_NG
This section makes use of a variant of BNF notation to describe the syntax of AIML
in detail. An XML language syntax may also be specified by a DTD or XML
Schema. The BNF variant here is slightly more convenient for someone writing
an AIML interpreter, and also it captures one feature of AIML that goes beyond
standard XML syntax, namely the AIML pattern language.
In this BNF variant:
* Literal tag names and attribute expression are written in Consolas Bold
font.
* Expressions and clauses are written in CONSOLAS UPPERCASE.
* The following notation is used to define an expression:
```
(EXPRESSION) - The expression EXPRESSION is optional.
(EXPRESSION)* - The expression EXPRESSION may be repeated 0 or more times.
(EXPRESSION)+ - The expression EXPRESSION may be repeated 1 or more times.
EXPRESSION3 ::== EXPRESSION1 | EXPRESSION2 - Expression EXPRESSION3 may consist
of either EXPRESSION1 or EXPRESSION2. This is equivalent to the two statements
EXPRESSION3 ::== EXPRESSION1
EXPRESSION3 ::== EXPRESSION2
```
The full description of AIML syntax follows:
```
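AIML_FILE ::== <aiml version="AIML_VERSION">AIML</aiml>
AIML_VERSION ::== 0.9 | 1.0 | 1.1 | 2.0
AIML ::== (CATEGORY_EXPRESSION | TOPIC_EXPRESSION)*
TOPIC_EXPRESSION ::== <topic name="PATTERN_EXPRESSION">
(CATEGORY_EXPRESSION)+</topic>
CATEGORY_EXPRESSION ::== <category><pattern>PATTERN_EXPRESSION</pattern>(<that>
PATTERN_EXPRESSION</that>)(<topic>PATTERN_EXPRESSION</topic>)
<template>TEMPLATE_EXPRESSION</template></category>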
PATTERN_EXPRESSION ::== WORD | PRIORITY_WORD | WILDCARD | SET_STATEMENT |
PATTERN_SIDE_BOT_PROPERTY_EXPRESSION
PATTERN_EXPRESSION ::== PATTERN_EXPRESSION PATTERN_EXPRESSION
```
with exactly one space “ “ between pattern expressions.
The definition of WORD is language dependent. For English, we generally
acknowledge any combination from the regular expression [a-zA-Z0-9]* as an AIML
word. The AIML preprocessor step called normalization (described in detail
below) converts an input sentence to a normalized form where punctuation has
been removed, and each word consists of an element of [a-zA-Z0-9]*.
```
WILDCARD ::== * | _
SET_STATEMENT ::== <set>SET_NAME</set>
SET_NAME ::== WORD
PRIORITY_WORD ::== $WORD
PATTERN_SIDE_BOT_PROPERTY_EXPRESSION ::== <bot name="PROPERTY_NAME"/> | <bot><name>PROPERTY_NAME</name></bot>
PROPERTY_NAME ::== WORD
```
Now we turn to the AIML template, which has a hierarchical structure:
```
TEMPLATE_EXPRESSION ::== TEXT | TAG_EXPRESSION | (TEMPLATE_EXPRESSION)*
```
TEXT is any ordinary English text consisting of any character except “<” and
“>”, which must be specified as `&lt;` and `&gt;` respectively.
NORMALIZED_TEXT is any TEXT that has been normalized by the AIML preprocessor.
The exact normalization substitutions are left up to the botmaster.
```
TAG_EXPRESSION ::==
RANDOM_EXPRESSION |
CONDITION_EXPRESSION |
SRAI_EXPRESSION |
SRAIX_EXPRESSION |
SET_PREDICATE_EXPRESSION |
GET_PREDICATE_EXPRESSION |
MAP_EXPRESSION |
BOT_PROPERTY_EXPRESSION |
DATE_EXPRESSION |
THINK_EXPRESSION |
EXPLODE_EXPRESSION |
NORMALIZE_EXPRESSION |
DENORMALIZE_EXPRESSION |
FORMAL_EXPRESSION |
UPPERCASE_EXPRESSION |
LOWERCASE_EXPRESSION |
SENTENCE_EXPRESSION |
PERSON_EXPRESSION |
PERSON2_EXPRESSION |
GENDER_EXPRESSION |
SYSTEM_EXPRESSION |
STAR_EXPRESSION |
THATSTAR_EXPRESSION |
TOPICSTAR_EXPRESSION |
THAT_EXPRESSION |
REQUEST_EXPRESSION |
RESPONSE_EXPRESSION |
LEARN_EXPRESSION |
INTERVAL_EXPRESSION |
| | |
RANDOM_EXPRESSION ::== (
CONDITION_ATTRIBUTES ::== (name="NAME") | (value="NORMALIZED_TEXT")
CONDITION_EXPRESSION ::==
(CONDITION_ITEM_EXPRESSION)*
SRAI_EXPRESSION ::== <srai>TEMPLATE_EXPRESSION</srai>
SRAIX_EXPRESSION ::== TEMPLATE_EXPRESSION |
(SRAIX_ATTRIBUTE_TAGS)*TEMPLATE_EXPRESSION
SRAIX_ATTRIBUTES ::= host="HOSTNAME" | botid="BOTID" | hint="TEXT" | apikey="
APIKEY" | service="SERVICE"
SRAIX_ATTRIBUTE_TAGS ::= TEMPLATE_EXPRESSION |
TEMPLATE_EXPRESSION | TEMPLATE_EXPRESSION |
TEMPLATE_EXPRESSION | TEMPLATE_EXPRESSION
GET_PREDICATE_EXPRESSION ::== |
TEMPLATE_EXPRESSION | | WORD
get>
SET_PREDICATE_EXPRESSION ::== TEMPLATE_EXPRESSION |
TEMPLATE_EXPRESSIONTEMPLATE_EXPRESSION | TEMPLATE_EXPRESSION | TEMPLATE_EXPRESSION
TEMPLATE_EXPRESSION
MAP_EXPRESSION ::= |
BOT_PROPERTY_EXPRESSION ::== |
TEMPLATE_EXPRESSION
DATE_EXPRESSION ::== | (DATE_ATTRIBUTE_TAG)
date>
DATE_ATTRIBUTES ::== (format="LISP_DATE_FORMAT") | (jformat="JAVA DATE FORMAT")
DATE_ATTRIBUTE_TAG ::== TEMPLATE_EXPRESSION |
TEMPLATE_EXPRESSION
INTERVAL_EXPRESSION ::== (DATE_ATTRIBUTE_TAGS)(TEMPLATE_EXPRESSION)
(TEMPLATE_EXPRESSION)
THINK_EXPRESSION ::== <think>TEMPLATE_EXPRESSION</think>
EXPLODE_EXPRESSION ::== <explode>TEMPLATE_EXPRESSION</explode>
NORMALIZE_EXPRESSION ::== <normalize>TEMPLATE_EXPRESSION</normalize>
DENORMALIZE_EXPRESSION ::== <denormalize>TEMPLATE_EXPRESSION</denormalize>
PERSON_EXPRESSION ::== <person>TEMPLATE_EXPRESSION</person>
PERSON2_EXPRESSION ::== <person2>TEMPLATE_EXPRESSION</person2>
GENDER_EXPRESSION ::== <gender>TEMPLATE_EXPRESSION</gender>
SYSTEM_EXPRESSION ::==
TEMPLATE_EXPRESSION |
TEMPLATE_EXPRESSION
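TIMEOUT_ATTRIBUTE ::== timeout="NUMBER"
STAR_EXPRESSION ::== <star (INDEX_ATTRIBUTE)/> |
<star><index>TEMPLATE_EXPRESSION</index></star>
INDEX_ATTRIBUTE ::== index="NUMBER"
THATSTAR_EXPRESSION ::== <thatstar (INDEX_ATTRIBUTE)/> |
<thatstar><index>TEMPLATE_EXPRESSION</index></thatstar>
TOPICSTAR_EXPRESSION ::== <topicstar (INDEX_ATTRIBUTE)/> |
<topicstar><index>TEMPLATE_EXPRESSION</index></topicstar>
THAT_EXPRESSION ::== <that (THAT_INDEX)/> | <that><index>TEMPLATE_EXPRESSION</index></that>
THAT_INDEX ::== index="NUMBER,NUMBER"
REQUEST_EXPRESSION ::== <request (INDEX_ATTRIBUTE)/> |
<request><index>TEMPLATE_EXPRESSION</index></request>
RESPONSE_EXPRESSION ::== <response (INDEX_ATTRIBUTE)/> |
<response><index>TEMPLATE_EXPRESSION</index></response>
LEARN_EXPRESSION ::== <learn>LEARN_CATEGORY_EXPRESSION</learn> |
<learnf>LEARN_CATEGORY_EXPRESSION</learnf>
LEARN_CATEGORY_EXPRESSION ::== <category>
<pattern>LEARN_PATTERN_EXPRESSION</pattern>(<that>LEARN_PATTERN_EXPRESSION</that>)(<topic>LEARN_PATTERN_EXPRESSION</topic>)
<template>LEARN_TEMPLATE_EXPRESSION</template></category>
EVAL_EXPRESSION ::== <eval>TEMPLATE_EXPRESSION</eval>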
LEARN_PATTERN_EXPRESSION ::== PATTERN_EXPRESSION | EVAL_EXPRESSION
LEARN_PATTERN_EXPRESSION ::== (LEARN_PATTERN_EXPRESSION)+
LEARN_TEMPLATE_EXPRESSION ::== TEXT | TAG_EXPRESSION | EVAL_EXPRESSION
LEARN_TEMPLATE_EXPRESSION ::== (LEARN_TEMPLATE_EXPRESSION)*
```
## 5. AIML Pattern Language
AIML patterns are made up of words, wildcards, AIML set expressions, and bot
properties.
A word is any sequence of characters output by the normalization pre-processor
that does not contain a space. The space character is reserved to indicate a
space between words, as it does in many human languages including English.
Exactly which characters are allowed in normalization depends on the
botmaster’s choice of normalization substitutions and the input language, but
generally the idea with normalization is:
* Remove punctuation
* Expand contractions
* Correct a few common spelling mistakes
* Ensure one space between words
So “Hello”, “123”, “HaveFun” are normalized words but “can’t”, “1.23”, and
“Have-Fun” are not. Some AIML applications that require the bot to have
knowledge of the original punctuation include normalization substitutions so
that for example “,” becomes “comma”, “-” becomes “dash” and “.” becomes
“point”.
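The normalization steps listed above can be sketched in a few lines. The substitution table here is a tiny hypothetical sample; in practice the botmaster chooses the substitutions, and the function name `normalize` is an assumption for this example.

```python
import re

# Hypothetical sample of normalizing substitutions (botmaster-defined in practice).
SUBSTITUTIONS = [
    (r"\bcan't\b", "can not"),
    (r"\bi'm\b", "I am"),
    (r"\bwhat's\b", "what is"),
]


def normalize(sentence: str) -> str:
    """Expand contractions, remove punctuation, and collapse whitespace."""
    # Treat the typographic apostrophe like the ASCII one.
    text = sentence.replace("\u2019", "'")
    for pattern, replacement in SUBSTITUTIONS:
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    # Anything outside [a-zA-Z0-9] and space is treated as punctuation.
    text = re.sub(r"[^A-Za-z0-9 ]", " ", text)
    # Ensure exactly one space between words.
    return " ".join(text.split())
```

With this table, `normalize("What's up, Jeff?")` yields `what is up Jeff`. Spelling correction and sentence splitting are left out of the sketch.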
One way to process inputs from languages like Japanese and Chinese that do not
separate words with spaces is to place an implicit space between each character
and treat each one as a “word”.
Preprocess the input
```
日本の伝統
```
into
```
日 本 の 伝 統
```
and use patterns like
```xml
<pattern>* の 伝 統</pattern>
```
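The character-splitting preprocessing step is simple to express in code. The helper name `split_characters` is hypothetical; the sketch just inserts one space between every non-space character.

```python
def split_characters(text: str) -> str:
    """Treat every character as its own word by separating characters with spaces."""
    return " ".join(ch for ch in text if not ch.isspace())
```

Applied to the input above, `split_characters("日本の伝統")` produces `日 本 の 伝 統`, which can then be matched word by word like any other normalized sentence.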
AIML 2.0 includes some new wildcards and pattern-side expressions.
A. Zero or more words wildcards
The AIML 1.0 wildcards * and _ are defined so that they match one or more
words. AIML 2.0 introduces two new wildcards, ^ and #, defined to match zero
or more words. As a shorthand description, we refer to these as “zero+
wildcards”.
Both ^ and # are defined to match 0 or more words. The difference between
them is the same as the difference between * and _. The # matching operator
has highest priority in matching, followed by _, followed by an exact word
match, followed by ^, and finally * has the lowest matching priority.
When defining a zero+ wildcard it is necessary to consider what the value of
`<star/>` (as well as `<thatstar/>` and `<topicstar/>`) should be when the wildcard
match has zero length. In AIML 2.0 we leave this up to the botmaster. Each
bot can have a global property named nullstar which the botmaster can set to
“”, “unknown”, or any other value.
Examples:
```xml
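<category><pattern>SHARPTEST #</pattern>
<template>#star = <star/></template></category>
<category><pattern>SHARPTEST # TEST</pattern>
<template>#star = <star/></template></category>
<category><pattern># KEYWORD #</pattern>
<template>Found KEYWORD</template></category>
<category><pattern>^ CARETTEST</pattern>
<template>^star = <star/></template></category>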
```
Sample dialog:
**Human**: sharptest
**Robot**: #star = unknown
**Human**: keyword
**Robot**: Found KEYWORD
**Human**: sharptest foo
**Robot**: #star = foo
**Human**: sharptest foo bar test
**Robot**: #star = foo bar
**Human**: xyz abc carettest
**Robot**: ^star = xyz abc
**Human**: carettest
**Robot**: ^star = unknown
**Human**: keyword
**Robot**: Found KEYWORD
**Human**: abc def keyword ghi jkl
**Robot**: Found KEYWORD
**Human**: abc keyword
**Robot**: Found KEYWORD
**Human**: keyword def
**Robot**: Found KEYWORD
B. $ operator
In some cases it is desirable to make an exact word match have higher priority
than _.
For example, the category
```xml
<category><pattern>_ ALICE</pattern><template><srai><star/></srai></template></category>
```
is useful for removing the bot’s name, Alice, from many queries such as “Tell
me the time, Alice” and “What is your favorite color, Alice?”. The `<srai>`
simplifies these inputs to “Tell me the time” and “What is your favorite color”
respectively. But the category breaks down for other inputs like “Who is
Alice?” which we wouldn’t want to reduce to just “Who is”.
Using the $ operator we can add the category
```xml
<category><pattern>$WHO IS ALICE</pattern>
<template>I am Alice.</template></category>
```
so that the input “Who is Alice?” matches this category and not the one with
```_ ALICE```.
The $ indicates that the word has higher matching priority than _.
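The matching priorities described in subsections A and B can be sketched as a depth-first search over a word graph. This is an illustration under stated assumptions, not the reference algorithm: it covers only the input pattern (no that or topic pattern, no sets), it places `$` words ahead of `#`, and the names `build_graph`, `find` and `match` are hypothetical. Inputs are assumed to be already normalized.

```python
END = "<end>"  # marker key; cannot collide with a normalized word


def build_graph(patterns):
    """Store each pattern as a path of tokens in a nested dictionary."""
    root = {}
    for pattern in patterns:
        node = root
        for token in pattern.split():
            node = node.setdefault(token, {})
        node[END] = pattern
    return root


def find(node, words):
    """Return the first matching pattern, trying branches in priority order."""
    if not words:
        if END in node:
            return node[END]
        # A trailing zero+ wildcard may match zero words.
        for wildcard in ("#", "^"):
            if wildcard in node:
                hit = find(node[wildcard], words)
                if hit is not None:
                    return hit
        return None
    head, rest = words[0], words[1:]
    # $WORD: exact match with the highest priority (assumed to precede #).
    if "$" + head in node:
        hit = find(node["$" + head], rest)
        if hit is not None:
            return hit
    # Then: # (0+), _ (1+), exact word, ^ (0+), * (1+).
    for key, minimum in (("#", 0), ("_", 1), (head, None), ("^", 0), ("*", 1)):
        if key not in node:
            continue
        if minimum is None:  # exact word consumes exactly one word
            hit = find(node[key], rest)
            if hit is not None:
                return hit
            continue
        for count in range(minimum, len(words) + 1):
            hit = find(node[key], words[count:])
            if hit is not None:
                return hit
    return None


def match(graph, sentence):
    return find(graph, sentence.upper().split())
```

For example, with the patterns `_ ALICE` and `$WHO IS ALICE` in the graph, the input `who is alice` is matched by `$WHO IS ALICE`; if the `$` is dropped from the second pattern, the same input is captured by `_ ALICE` instead.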
### AIML Sets
A pattern in AIML 2.0 may contain an expression referring to an AIML set. If
the botmaster has defined a set named “color” of color names, then the
expression
`<set>color</set>`
can match any member of this set.
### Examples of valid AIML patterns
The following are examples of valid AIML patterns
```xml
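<pattern>*</pattern>
<pattern>HOW ARE YOU</pattern>
<pattern>How are you</pattern>
<pattern>HoW aRe YoU</pattern> -- AIML patterns are case insensitive
<pattern><set>color</set></pattern>
<pattern><set>COLOR</set></pattern>
<pattern>I LIKE <set>color</set></pattern>
<pattern>_ THANK YOU</pattern>
<pattern>_ MUSIC *</pattern>
<pattern># MUSIC #</pattern>
<pattern>I LIKE # MUSIC</pattern>
<pattern>_ * _ * _</pattern> -- may not be useful but it is valid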
```
Examples of invalid patterns:
The following are not valid AIML patterns
```xml
<pattern></pattern> -- no concept of a blank pattern
<pattern>How are you?</pattern> -- no punctuation in AIML patterns
<pattern>I LIKE*</pattern> -- wildcard should have a space separating it from word
<pattern><set>*</set></pattern> -- no wildcard in set name
<pattern>_*_*_</pattern> -- wildcards should have spaces separating them
```
## 6. AIML Semantics
This section explains the semantics of each AIML tag.
a. `<aiml>` tag
The `<aiml>` tag wraps the contents of an AIML file.
Example:
```xml
COLOR is a color.
```
The AIML file is an XML file and so may also have an optional header such as an
XML declaration; however, the definition of the XML header is outside the scope of AIML 2.0. The
`<aiml>` tag wraps the AIML contents.
b. `<topic>` tag
The `<topic>` tag wraps a collection of categories that share the same topic
pattern.
Example:
```xml
YESWOULD YOU LIKE TO ADD * AS A CONTACTunknownNEWCONTACTIDLEARN CONTACTID DISPLAYNAME
```
I've saved to your contacts.
```xml
RESUMEACTION *WOULD YOU LIKE TO ADD * AS A CONTACTunknownCONTACTFINALIZE
```
In this example, the first category has input pattern YES and the second
category has input pattern *. Both categories have the same that pattern,
WOULD YOU LIKE TO ADD * AS A CONTACT, and the same topic pattern, ASKING TO ADD
NEW CONTACTNAME.
In AIML 2.0 the topic pattern may also be defined inside a category. That is,
a category has an input pattern specified by the `<pattern>` tag, a that pattern
specified by the `<that>` tag, and a topic pattern specified by the `<topic>` tag.
If either `<that>` or `<topic>` is omitted, the corresponding pattern is defined
as * by default.
c. `<category>` tag
The basic unit of knowledge in AIML is a category. The `<category>` tag always
contains an input `<pattern>` and a response `<template>`. Optionally it may also
contain a `<that>` pattern and a `<topic>` pattern. If either of `<that>` or
`<topic>` is omitted, the AIML interpreter assigns the corresponding pattern a
value of *.
Example:
```xml
<category><pattern>HI</pattern>
<template>Hi there!</template></category>
```
d. `<pattern>` tag
The `<pattern>` tag specifies the input pattern. The contents of the pattern
tag are defined in the AIML Pattern Language section.
e. Pattern-side `<set>`
A `<set>` tag in a `<pattern>` expression denotes an AIML Set, a collection of
words or phrases (sequences of words) that can be matched by this part of the
input pattern. The `<set>` tag contains the name of the AIML Set. The exact
representation of the set is left up to the interpreter. The set should
contain normalized items that can be matched by normalized inputs.
AIML matching treats an AIML set much like a wildcard (* or _). The `<set>`
expression matches one or more words. The `<set>` match has higher priority
than *, but lower priority than an exact word match. See the section AIML
Pattern Matching for more details.
Whichever words match the `<set>` expression may be retrieved on the
template-side with `<star/>`. If there is more than one `<set>` element in a
pattern, the matching word sequences may be retrieved with `<star index="2"/>`,
and so on.
Similarly, any sequences matching a `<set>` expression in a that-pattern or
topic-pattern may be accessed on the template-side with `<thatstar/>` and
`<topicstar/>`.
Examples:
```xml
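<category><pattern>I LIKE <set>color</set></pattern><template><star/> is a nice color.</template></category>
<category><pattern>CALL <set>number</set></pattern><template><srai>DIAL <star/></srai></template></category>
<category><pattern><set>name</set></pattern><template><star/> is a name.</template></category>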
```
Sample dialogs:
**Human**: I like blue.
**Robot**: Blue is a nice color.
**Human**: Call 5551234
**Robot**: Now dialing 5551234
**Human**: Joseph
**Robot**: Joseph is a name.
See the companion document Sets and Maps in AIML 2.0
f. Pattern-side `<bot>`
The bot property tag may appear in a pattern expression. It is equivalent to a
word in pattern matching.
Example:
```xml
ARE YOU
Yes, I am.
```
g. `<that>`
The `<that>` tag contains an AIML pattern. Like `<pattern>` and `<topic>`,
`<that>` may contain any valid AIML pattern elements, including words, wildcards,
and `<set>` and `<bot>` elements.
The purpose of `<that>` is to match the bot’s last utterance, specifically, the
last sentence of the last response. Typically `<that>` plays an important role
in question answering. If the client says “Yes”, the bot should remember what
question it asked to make the client say “Yes”, so that it can put together the
affirmative response with the question, in order to formulate a reply.
Examples:
```xml
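<category><pattern>YES</pattern><that>ARE YOU TIRED</that>
<template>Maybe you should get some rest. I will still be here later.</template></category>
<category><pattern>NO</pattern><that>CAN YOU HEAR ME</that>
<template>Try adjusting the media volume on your device Settings.</template></category>
<category><pattern>*</pattern><that>WHAT IS YOUR NAME</that><template><srai>MY NAME IS <star/></srai></template></category>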
```
h. `<topic>`
AIML 2.0 migrates the `<topic>` tag from a category wrapper to a tag inside a
category. For backwards compatibility, the category wrapper tag (see
subsection b. `<topic>` tag) continues to be permitted in AIML 2.0.
But the `<topic>` tag around a category was always confusing, because the AIML
pattern matcher builds a pattern path by appending the input pattern, that
pattern and topic pattern in that order. Having the `<topic>` tag around a
category suggests that the order might have been: topic pattern, input pattern,
that pattern, which was not the case.
The `<topic>` tag in a category, like `<pattern>` and `<that>`, contains a valid AIML
pattern. Unlike the `<topic>` tag described in subsection b, the `<topic>`
tag inside a category omits the name attribute and simply encloses the
topic pattern in a pair of tags.
Examples:
```xml
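<category><pattern>*</pattern><topic>TRAVEL</topic>
<template>Have you been to
<random>
<li>Rome</li>
<li>London</li>
<li>Paris</li>
</random>?</template></category>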
_WHAT IS YOUR MESSAGE TO *ASKING MESSAGEBODYunknownRESUMEACTION
```
i. `<template>`
The `<template>` contains the AIML response. In its simplest form, the response
consists of plain text. In general however, the `<template>` contains what is
in effect a mini computer program, written with AIML tags, that computes a
response. Generally, a `<template>` contains a mix of plain text and AIML tags.
Because the `<template>` may contain the `<srai>` tag, this computation may
activate other AIML categories, and evaluate their responses to
build a response recursively.
Examples:
```xml
YOU ARE HELPFUL
I like to help people.
NO YOU *NOYOU NAME
I am
Call me
My name is
I am called
People call me
You can call me
.
.
```
This discussion of the `<template>` tag completes the tour of `<category>`
subelements. To summarize: every category has a `<pattern>` and a `<template>`,
and optionally a `<that>` and/or `<topic>`. The remaining AIML tags discussed in
the subsections below are all subtags of `<template>`.
j. `<random>`
The purpose of the `<random>` tag is to allow the bot to select one of a list of
responses randomly. The distribution of selections should be uniformly random:
no response has higher probability of selection than the other choices. The
`<random>` tag uses the `<li>` subtag to indicate selections. The `<li>` items may
contain any valid `<template>` expression.
The AIML interpreter replaces the `<random>` tag and its contents with the value
of the evaluated `<li>` expression.
Example:
```xml
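<category><pattern>PURPOSE</pattern>
<template><random>
<li>I'm here to help you in any way I can.</li>
<li>I am a mobile virtual assistant, ready to do what I can for you.</li>
<li>I'm here to help.</li>
</random></template></category>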
```
Sample dialog:
**Human**: Purpose?
**Robot**: I’m here to help
**Human**: Purpose:
**Robot**: I am a mobile virtual assistant, ready to do what I can for you.
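A uniform random selection of this kind might be implemented as below. The sketch represents each list item as a zero-argument callable so that only the chosen item is evaluated; that representation, the function name `evaluate_random`, and the blank result for an empty list are assumptions for illustration.

```python
import random
from typing import Callable, Sequence


def evaluate_random(items: Sequence[Callable[[], str]], rng=random) -> str:
    """Pick one list item uniformly at random and evaluate only that item."""
    if not items:
        return ""  # assumed behaviour for an empty list
    chosen = rng.choice(items)
    return chosen()


purpose_items = [
    lambda: "I'm here to help you in any way I can.",
    lambda: "I am a mobile virtual assistant, ready to do what I can for you.",
    lambda: "I'm here to help.",
]
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible, which is convenient when testing a bot.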
k. `<condition name="predicate" value="v">` and `<condition><name>predicate</name><value>v</value>`
The `<condition>` tag has three basic forms, described in this and the next two
subsections. The first type is known as the “one-shot condition” and contains
both a predicate name and a value attribute. If the named predicate value
equals the given value, the AIML interpreter replaces the `<condition>` tag and
its contents with the result of evaluating those contents. If the values are
not equal, the expression is replaced with a blank.
AIML 2.0 adopts one of the Pandorabots extensions to the `<condition>` tag, where
a value of * indicates that the predicate is bound to something. If a
predicate p has not been set by a previously evaluated `<set>` tag (nor
initialized by the interpreter), then it is called “unbound”. A predicate with
an assigned value is “bound”.
The expression `<condition name="p" value="*">X</condition>` evaluates to X
provided that p is a bound predicate. Otherwise, it evaluates to a blank.
This *-value notation applies to all three forms of the `<condition>` tag.
In AIML 2.0 the name and/or value attribute values may be specified in subtags
instead of XML attribute values. The subtags have the same names as the
original attribute names.
```xml
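<condition name="predicate" value="v">X</condition>
<condition name="predicate"><value>v</value>X</condition>
<condition value="v"><name>predicate</name>X</condition>
<condition><name>predicate</name><value>v</value>X</condition>
<!-- All four forms are equivalent. -->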
```
Example:
```xml
DIALNUMBER UNKNOWN *DIALNUMBER MOBILE DIALNUMBER HOME DIALNUMBER WORK DIALNUMBER CUSTOM
```
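
As a rough sketch of the one-shot semantics, including the `*` test for bound predicates, consider the following Python function. The name `one_shot_condition`, the use of a plain dictionary for the client's predicates, and exact string equality are all assumptions made for illustration; an interpreter may normalize values before comparing them.

```python
def one_shot_condition(predicates, name, value, contents):
    """Evaluate a one-shot condition (hypothetical helper).

    predicates: dict of bound predicates; a missing key means "unbound".
    value: the value to compare against, or "*" to test for boundness.
    contents: the already-evaluated contents of the <condition> tag.
    """
    bound = name in predicates
    if value == "*":
        return contents if bound else ""
    if bound and predicates[name] == value:
        return contents
    return ""  # values differ or predicate unbound: blank result

client = {"gender": "female"}
print(one_shot_condition(client, "gender", "female", "X"))  # X
print(one_shot_condition(client, "age", "*", "X"))          # blank line
```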
l. `<condition name>` or `<condition var>` and `<li value>`
The second form of the condition tag contains only the predicate attribute. In
AIML 2.0, this attribute may be specified in a subtag:
```xml
<condition name="predicate">...</condition>
<!-- is equivalent to -->
<condition><name>predicate</name>...</condition>
```
This form of `<condition>` contains a list of items specified by the `<li>`
tag.
The interpreter checks each list item to see if the predicate value equals the
specified value in the associated list item, and if they are equal, replaces
the `<condition>` tag and its contents by the result of evaluating the list item
contents.
Each list item except optionally the last has the form
`<li value="v">X</li>`
or
`<li><value>v</value>X</li>`
The interpreter checks the list items one at a time to see if an equality
condition holds. For the first list item containing equal values, the
interpreter evaluates the associated list item and replaces the `<condition>`
tag and its contents with that evaluated result.
A `<condition name>` may optionally contain a final list item of the
form
`<li>X</li>`
with no specified value. If the interpreter is unable to find a true equality
condition for the previous list of items, and the `<condition>` list contains
this last default item, then the interpreter replaces the `<condition>` tag and
its contents with the result of evaluating the contents of this final item.
If the default item is omitted and none of the other items are selected, the
result of the `<condition>` tag is blank.
Like the one-shot `<condition>`, the `<condition name>` tag may use *
for the value attribute to test whether a predicate is bound or not.
Examples:
```xml
HE IS GOING TO *
Who is he?
is going to?
I AM *ISANUMBER ISANAME
MY AGE IS
MY NAME IS
IAMRESPONSE
IS * EQUALTO *
true
false
```
m. `<condition>` and `<li name value>` or `<li var value>`
The final form of the `<condition>` tag is the most general. It contains list
items, and each list item (except optionally the last) specifies a predicate
name and a value.
As with the first two forms of `<condition>`, the predicate name and value may be
specified in subtags as well as in attribute values:
```xml
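<li name="predicate" value="v">X</li>
<li name="predicate"><value>v</value>X</li>
<li value="v"><name>predicate</name>X</li>
<li><name>predicate</name><value>v</value>X</li>
<!-- All four forms are equivalent. -->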
```
The interpreter checks the list items one at a time to see if an equality
condition holds. For the first list item containing equal values, the
interpreter evaluates the associated list item and replaces the `<condition>`
tag and its contents with that evaluated result.
A general `<condition>` may optionally contain a final list item of the form
`<li>X</li>`
with no specified value. If the interpreter is unable to find a true equality
condition for the previous list of items, and the `<condition>` list contains
this last default item, then the interpreter replaces the `<condition>` tag and
its contents with the result of evaluating the contents of this final item.
If the default item is omitted and none of the other items are selected, the
result of the `<condition>` tag is blank.
Like the one-shot `<condition>` and the `<condition name>`, the general
`<condition>` tag may use * for the value
attribute to test whether a predicate is bound or not.
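
The first-match-wins behaviour of the list forms, with an optional default item, might be sketched as follows. The tuple representation `(name, value, contents)` and the function name `evaluate_condition` are assumptions for illustration; a default item is modelled as an entry whose value is `None`.

```python
def evaluate_condition(items, predicates):
    """Return the contents of the first list item whose test holds.

    items: list of (name, value, contents) tuples standing in for <li>
    elements; value None marks the final default item (an assumption of
    this sketch). predicates: dict of bound predicates.
    """
    for name, value, contents in items:
        if value is None:                 # default item: always selected
            return contents
        if name in predicates and (value == "*" or predicates[name] == value):
            return contents
    return ""                             # no default and nothing matched

items = [
    ("gender", "male", "He"),
    ("gender", "female", "She"),
    ("name", "*", "That person"),
    (None, None, "Someone"),
]
print(evaluate_condition(items, {"gender": "female"}))  # She
print(evaluate_condition(items, {}))                    # Someone
```

For the second form of the tag, where the predicate name is given once on the `<condition>` element, every tuple would simply carry the same name.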
Local variables
Like `<set>` and `<get>` in AIML 2.0, the `<condition>` tags may refer to a local
variable instead of a predicate. If the attribute in a `<condition>` expression
in some template is var instead of name, then the condition statement looks for
a local variable within the scope of the template.
Example:
```xml
IS _ EQUALTO *
true
false
```
Here the var indicates that the variable star has scope limited to this
category. This lets the botmaster reuse common names such as “star” in many
different categories, without worrying about saving this value in a global
predicate value.
n. `<loop/>`
AIML 2.0 provides for a `<loop/>` tag in a condition list item `<li>`
element. If the activated `<li>` element contains a `<loop/>` tag, then the
condition statement is re-evaluated until it reaches another `<li>`
item without `<loop/>`.
When an `<li>` element containing `<loop/>` is evaluated, the interpreter
replaces the `<loop/>` tag with an empty string.
Each time `<loop/>` returns and the `<condition>` tag is re-evaluated, any text
resulting from the evaluation of the `<li>` item is appended to a concatenated
string, so that finally the `<condition>` element is replaced with the
appended, evaluated list items.
```xml
COUNT TO number0
NTH number *1
lettersundefined has only
count
The letter is FIRSTLETTER
REMAININGLETTERS
```
Sample dialogs:
**Human**: Count to 14
**Robot**: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
**Human**: what is the first letter of dog?
**Robot**: The First letter is d.
**Human**: what is the seventh letter of church?
**Robot**: church has only 6 letters.
**Human**: what is the 22nd letter of the alphabet?
**Robot**: The Twenty second letter is V
**Human**: what is the eleventh letter in greenhouse?
**Robot**: greenhouse has only 9 letters.
**Human**: what is the 7th letter within greenhouse?
**Robot**: The Seventh letter is o
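
The re-evaluation and concatenation performed by `<loop/>` can be sketched in Python as below. This is a simplified model under stated assumptions: each list item is a pair `(value, action)`, where the hypothetical `action` callable evaluates the item (possibly changing predicates) and reports whether the item contained `<loop/>`. The `max_iterations` guard is also an assumption, not something the specification requires.

```python
def evaluate_loop_condition(name, items, predicates, max_iterations=1000):
    """Evaluate a <condition name=...> list whose items may loop.

    items: list of (value, action); value None marks the default item.
    action(predicates) returns (text, has_loop).
    """
    output = []
    for _ in range(max_iterations):
        for value, action in items:
            if value is None or predicates.get(name) == value:
                text, has_loop = action(predicates)
                output.append(text)        # append text from each pass
                break
        else:
            return "".join(output)         # no item selected: stop
        if not has_loop:
            return "".join(output)         # reached an item without <loop/>
    raise RuntimeError("loop limit exceeded")

def count_step(p):
    """Default item: increment the counter, emit it, and loop again."""
    n = int(p["count"]) + 1
    p["count"] = str(n)
    if n >= int(p["target"]):
        p["done"] = "true"
    return f"{n} ", True

items = [("true", lambda p: ("", False)), (None, count_step)]
predicates = {"count": "0", "target": "3", "done": "false"}
result = evaluate_loop_condition("done", items, predicates).strip()
print(result)
```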
o. `<srai>` and `<sr/>`
There has never been any agreement over what the “sr” stands for in the `<srai>`
tag. This is because the tag has so many applications in AIML. At different
times, we describe `<srai>` as Symbolic Reduction, Simple Recursion, Syntactic
Rewrite or Stimulus-Response.
When the interpreter encounters an `<srai>` tag, it evaluates the contents and
converts the result to normalized form. The interpreter then feeds these
normalized, evaluated contents back into the pattern matcher to find a matching
category. When the matching category is found, the interpreter evaluates its
template and replaces the `<srai>` with the result.
A very common expression in AIML is `<srai><star/></srai>`
where the `<srai>` is applied to whatever matched the first wildcard * or _ (or
in AIML 2.0, a pattern-side `<set>`). Because the botmaster may end up writing
this expression so often, AIML includes an abbreviation defined as
`<sr/>` = `<srai><star/></srai>`.
The best way to understand the recursive action of the AIML `<srai>` tag is by
example.
Client: You may say that again Alice.
Robot: Once more? "that."
The robot has no specific response to the pattern "You may say that again
Alice." Instead, the robot builds its response to the client input in four
steps. This simple sentence activated a sequence of four categories linked by
`<srai>` tags. The robot constructed the reply "Once more? 'that'" recursively as
each subsentence triggered the next matching pattern.
In this example the processing proceeds in four steps, because each of the
first three steps evokes another symbolic reduction.
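
| Step | Normalized input | Matching pattern | Template | Response |
|---|---|---|---|---|
| 1 | YOU MAY SAY THAT AGAIN ALICE | `_ <bot name="name"/>` | `<sr/>` | |
| 2 | YOU MAY SAY THAT AGAIN | `_ AGAIN` | `Once more? <sr/>` | |
| 3 | YOU MAY SAY THAT | `YOU MAY *` | `<sr/>` | Once More? |
| 4 | SAY THAT | `SAY *` | `"<star/>"` | Once More? "that". |
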
In step 1, the patterns with "_" match first because they are last in
alphabetical order. The order of the matches depends on this alphabetical
ordering of patterns. ALICE always matches suffixes with "_" before prefixes
with "*". Whatever matches either wildcard symbol becomes the value of `<star/>`.
Steps 1 through 3 illustrate the common AIML templates that use the abbreviated
`<sr/>` tag. (Remember, `<sr/>` = `<srai><star/></srai>`). The categories with the
patterns "_ `<bot name="name"/>`" and "YOU MAY *" simply reduce the sentence to
whatever matches the wildcard, as illustrated by steps 1 and 3.
Some AIML templates in ALICE combine the `<sr/>` with an ordinary text response,
as step 2 with the pattern "_ AGAIN". The phrase "Once more?" becomes part of
any reply ending in "AGAIN".
The category in step 4 with "SAY *" is a default that often produces logically
correct but amusing dialogue:
Client: Say hello in Swedish.
Robot: "Hello in Swedish."
or as in this case:
Client: Say that.
Robot: "that."
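
The four-step reduction above can be traced with a toy Python interpreter. This is a deliberately small sketch, not the ALICE implementation: the category list, the `match` and `respond` helpers, and the recursion limit are all assumptions, the bot name is hard-coded as ALICE in place of `<bot name="name"/>`, and each pattern is limited to one wildcard at its start or end with at least one fixed word.

```python
import re

# (pattern, template) pairs, listed so that "_" patterns are tried first.
CATEGORIES = [
    ("_ ALICE", "<sr/>"),
    ("_ AGAIN", "Once more? <sr/>"),
    ("YOU MAY *", "<sr/>"),
    ("SAY *", '"<star/>."'),
]

def match(pattern, words):
    """Return the words bound to the wildcard, or None if no match."""
    p = pattern.split()
    if len(words) < len(p):            # wildcard needs at least one word
        return None
    if p[0] in ("_", "*"):             # leading wildcard, fixed suffix
        fixed = p[1:]
        if [w.upper() for w in words[-len(fixed):]] == fixed:
            return words[:-len(fixed)]
    elif p[-1] in ("_", "*"):          # trailing wildcard, fixed prefix
        fixed = p[:-1]
        if [w.upper() for w in words[:len(fixed)]] == fixed:
            return words[len(fixed):]
    return None

def respond(text, depth=0):
    """Match the input, then expand <star/> and <sr/> recursively."""
    if depth > 10:
        raise RecursionError("too many reductions")
    words = re.sub(r"[^\w\s]", "", text).split()   # crude normalization
    for pattern, template in CATEGORIES:
        star = match(pattern, words)
        if star is not None:
            star_text = " ".join(star)
            out = template.replace("<star/>", star_text)
            if "<sr/>" in out:
                out = out.replace("<sr/>", respond(star_text, depth + 1))
            return out
    return ""

print(respond("You may say that again Alice."))
```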
Many patterns, one reply
The most common use of `<srai>` is to map two, or more, patterns to the same
response:
```xml
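<category><pattern>P</pattern><template><srai>Q</srai></template></category>
<category><pattern>Q</pattern><template>R</template></category>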
```
An input matching either pattern, P or Q, gets the same response R.
To show a more concrete example: the input "Hello" should have an appropriate
response like "Hi there!". But we can expand the inputs generating this
response to include all the common variations of "Hello":
```xml
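<category><pattern>HI</pattern><template><srai>HELLO</srai></template></category>
<category><pattern>HOWDY</pattern><template><srai>HELLO</srai></template></category>
<category><pattern>HALLO</pattern><template><srai>HELLO</srai></template></category>
<category><pattern>HI THERE</pattern><template><srai>HELLO</srai></template></category>
<category><pattern>HELLO</pattern><template>Hi there!</template></category>
<!-- As the following example shows, we can use the <sr/> tag as an
     abbreviation for <srai><star/></srai>. This category creates a compound
     response to both "hello" and whatever matches the wildcard *: -->
<category><pattern>HELLO *</pattern><template><srai>HELLO</srai> <sr/></template></category>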
```
Symbolic reductions and state
The `<that>` tag in AIML introduces one dimension of "dialogue state" into the
robot response. The value of `<that>` is whatever the robot said before that
provoked the current client input.
The inputs "yes" and "no" are two of the most common human queries. But a
careful analysis of the dialogues shows that most of the time, people say "yes"
or "no" to only a small set of questions asked by the robot.
```xml
<category><pattern>NO</pattern><that>I UNDERSTAND</that><template><srai>YOU DO NOT UNDERSTAND</srai></template></category>
```
illustrates the use of `<that>` with `<srai>`. The client input matches simply
"No". What did the robot say that made the client say no? If it was "I
understand." then this category formulates a response with
`<srai>YOU DO NOT UNDERSTAND</srai>`
which in turn activates another category with the pattern
`<pattern>YOU DO NOT UNDERSTAND</pattern>`
This category responds: "I do so understand. It all makes sense to my
artificial mind."
Summary of `<srai>`
The AIML `<srai>` tag simplifies and combines four important chat robot
operations:
* Maps multiple patterns to the same response.
* Reduces a complex sentence structure to a simpler form.
* Diminishes the need for multiple-wildcard input patterns.
* Translates state-dependent inputs into simpler stimuli.
In some sense `<srai>` is a very low-level operation, but its simplicity captures
a wide range of typical chat robot functions.
p. `<sraix>`
`<sraix>` is a new AIML 2.0 tag designed to allow a bot to access external
natural language web services and also other AIML bots. The `<sraix>` tag
extends the concept of the classic AIML `<srai>` tag. The `<srai>` tag essentially
allows a bot to reformulate an input and feed the revised input back to itself.
The `<sraix>` tag feeds the reformulated input to another bot.
Recall the use of the classic `<srai>` in this example:
```xml
<category><pattern>WHAT IS *</pattern><template><srai>XFIND <star/></srai></template></category>
```
This category matches inputs starting with “What is”. The purpose of the
`<srai>` here is to rewrite the input as a sentence starting with the keyword
XFIND. For example, “What is hermeneutics?” would be rewritten as “XFIND
hermeneutics”. Another category with the pattern “XFIND *” might
contain a list of random responses to try to cover the fact that the bot doesn’t
really know the answer, or it might contain some JavaScript to try to access
the information from a remote source. It has not been possible with AIML,
until now, to directly query another bot or service for the response.
The `<sraix>` tag allows the botmaster to write categories that access other
services. The “What is” category may now be written as:
```xml
<category><pattern>WHAT IS *</pattern><template><sraix>WHAT IS <star/></sraix></template></category>
```
In this case the input is not really reformulated at all. The template merely
sends, in our example, “What is hermeneutics” to another bot. The other bot in
the default case is the Pannous web service. In the general case, the bot
could be a remote web service, or even another Pandorabot.
`<sraix>` attributes
* No attribute - The default case should access the Pannous service (with what
API key?).
* bot attribute - On a server with multiple bots created by multiple users,
specify the destination bot by a combination of username and botname.
* limit attribute - Limit the response to the first n sentences received from
the remote bot. For example `<sraix limit="5">` clips the response after the
first 5 sentences. The default limit value is 3.
* service attribute - Access a named service. We should register names
of high-usage services like “pannous” for the convenience of the botmaster.
* apikey attribute - Some services may require botmaster-specific API keys.
* botid attribute - Access a bot on the same server using its bot id.
* server attribute - Access another Pandorabot.
* hint attribute - Give the sraix processor a hint about which type
of service to use. hint=“event” means calendar or alarm event.
* default attribute - `<sraix default="default reply">` makes “default reply”
the response when `<sraix>` fails, instead of calling the SRAIXFAILED category.
Examples:
```xml
WHAT IS THE WEATHER FORECAST FOR *WHAT IS THE WEATHER FORECAST FOR FAVORITE *
FAVORITE SCHEDULEDESCRIPTION * SCHEDULEREQUEST *
I'm unable to schedule that event.
I'll put that event on your calendar.
```
Sample dialog:
**Human**: weather in san francisco
**Robot**: San francisco It is currently mostly cloudy 60 Fahrenheit.
**Human**: what is the capital of texas
**Robot**: Capital of Texas usually refers to the Austin Texas the present capital.
However several other cities served as the capital of Texas During the period
of the Republic of Texas 1836–1845 the capital of Texas moved from city to
city.
**Human**: define arbitration
**Robot**: Arbitration a form of alternative dispute resolution ADR is a legal
technique for the resolution of disputes outside the courts where the parties
to a dispute refer it to one or more persons the arbitrators arbiters or
arbitral tribunal by whose decision the award they agree to be bound. It is a
resolution technique in which a third party reviews the evidence in the case
and imposes a decision that is legally binding for both sides and enforceable 1
Other forms of ADR include mediation 2 a form of settlement negotiation
facilitated by a neutral third party and non binding resolution by experts.
Arbitration is often used for the resolution of commercial disputes
particularly in the context of international commercial transactions.
**Human**: who won the most olympic medals?
**Robot**: Michael Phelps has won the most medals in a career with 19 medals. 15
gold medals 2 silver and 2 bronze. He is from the United States of America and
is a swimmer.
**Human**: do you know who phyllis diller is
**Robot**: Phyllis Diller July 17 1917 – August 20 2012 was an American actress and
comedian. She created a stage persona of a wild haired eccentrically dressed
housewife who made self deprecating jokes about her age and appearance her
terrible cooking and a husband named Fang while pretending to smoke from a long
cigarette holder. Diller’s signature was her unusual laugh.
Using `<sraix>` in AIML
Important Note:
To avoid the possibility of infinite recursion, you should include an AIML
category with the pattern SRAIXFAILED. When the `<sraix>` operation is unable to
connect to the remote bot, for example if the service is down or network
connection lost, or if the remote bot is unable to provide an answer,
Pandorabots will attempt to activate the category with pattern SRAIXFAILED.
It is extremely important not to include another `<sraix>` in the response
template for SRAIXFAILED.
Example:
```xml
SRAIXFAILED
I am unable to answer.
I asked another robot, but he did not know.
Try asking me a different way.
```
Bad Example:
```xml
*
```
If this category exists and no SRAIXFAILED is found, the bot will go into
infinite recursion, stopping when the interpreter reaches its recursion limit.
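
The interaction between the limit attribute, the default attribute and the SRAIXFAILED category might look like the following Python sketch. Everything here is hypothetical: the `sraix` function, the `service` callable standing in for a remote bot, the dictionary of categories, and the naive rule that a sentence ends at `.`, `!` or `?` followed by whitespace.

```python
import re

def sraix(query, service, limit=3, default=None, categories=None):
    """Query a remote service and clip the reply (illustrative only).

    service: callable taking the query and returning a reply string;
             it may raise or return "" when it cannot answer.
    limit: keep only the first `limit` sentences (the draft's default is 3).
    default: reply to use on failure instead of the SRAIXFAILED category.
    categories: dict mapping pattern to template text for the local bot.
    """
    try:
        reply = service(query)
    except Exception:
        reply = ""                       # treat any error as "no answer"
    if not reply:
        if default is not None:
            return default
        # The SRAIXFAILED template must not call sraix again.
        return (categories or {}).get("SRAIXFAILED", "")
    sentences = re.split(r"(?<=[.!?])\s+", reply.strip())
    return " ".join(sentences[:limit])

def service_down(query):
    raise ConnectionError("service unavailable")

print(sraix("WHAT IS AIML", lambda q: "A. B. C. D.", limit=2))
print(sraix("WHAT IS AIML", service_down,
            categories={"SRAIXFAILED": "I am unable to answer."}))
```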
Example
We’ll start with a simple example where we use `<sraix>` to answer “Who is...”
questions.
We can write a simple AIML category to answer “Who is...” questions by
contacting the Pannous service.
```xml
<category><pattern>WHO IS *</pattern><template><sraix>WHO IS <star/></sraix></template></category>
```
This category will match inputs like “Who is Abraham Lincoln”, “Who is Alan
Turing”, and “Who is Bob Marley”, all of which could be answered by Pannous.
This category by itself however will not match all the ways of asking “Who is
X”, nor will all the inputs it matches be questions that Pannous can answer.
As examples, consider:
(A). “Who is your mother?” - Matches “WHO IS *”, but is really a personality
question for the bot. (false positive match)
(B). “Do you know who Alan Turing is?” - Does not match “WHO IS *”, but could
be answered by Pannous. (false negative match)
To narrow the focus of the queries and resolve problem (A), the bot should have
a number of other patterns and responses to answer personality questions, give
opinions, and provide profile information. These include:
WHO IS YOUR MOTHER
WHO IS BETTER * OR *
WHO ARE YOUR FRIENDS
WHO IS MY GIRLFRIEND
WHO DO YOU LIKE BETTER * OR *
WHO IS YOUR PROGRAMMER
To increase the accuracy of matching for inputs that should be directed to
Pannous, we can add categories with specific patterns like:
WHO IS THE PRESIDENT OF *
WHO WROTE *
WHO PLAYED *
WHO WON *
WHO INVENTED *
WHO IS THE LEAD SINGER OF *
To increase the variety of questions that can be asked and resolve problem (B),
we need a set of reduction categories to capture various ways of phrasing
“Who”-questions:
DO YOU KNOW WHO * IS → `<srai>WHO IS <star/></srai>`
WHO THE HELL IS * → `<srai>WHO IS <star/></srai>`
TELL ME WHO * IS → `<srai>WHO IS <star/></srai>`
SO WHO IS * → `<srai>WHO IS <star/></srai>`
q. `<set name>` and `<set var>` (template-side)
The template-side `<set>` tag provides a way to set variables, called
predicates, that are specific to one client chatting with the bot. When
evaluating a `<set>` tag, the interpreter processes the contents of the tag and
stores the result as the value of the named predicate. The `<set>` expression
is then replaced with this value, except in cases where the botmaster has
specified the return value should be the predicate name instead of the value,
as described below.
Like other template-side tags in AIML, the attribute for `<set>` may be specified
in a subtag:
`<set name="predicate">X</set>` is equivalent to
`<set><name>predicate</name>X</set>`
The botmaster may choose to have certain pronoun predicates, such as he, she
and it, configured so that `<set>` returns the pronoun name rather than the
value.
Given the category
```xml
<category><pattern>WHO IS LIONEL MESSI</pattern>
<template><set name="he">Lionel Messi</set> is a famous football star.</template></category>
```
the botmaster might prefer to have the response be
He is a famous football star.
rather than
Lionel Messi is a famous football star.
The exact method used to specify these predicate-name-return cases is left up
to the interpreter.
A special case of `<set>` is `<set name="topic">`. “Topic” is a reserved predicate
name for the AIML topic. Whatever value the predicate topic has is used in
the AIML matching process and is significant when matching a category with a
`<topic>` tag.
(For a description of the difference between the name and var attributes, see
the following subsection on `<get>`).
r. `<get name>` and `<get var>`
The complement to the template-side `<set>` tag is the `<get>` tag, which
retrieves the value of a named predicate. If the predicate has been set by a
previously activated `<set>`, the interpreter replaces the `<get>` tag with that
saved value. Predicates that have not yet been set may have default values
specified by the botmaster. The format and means of specifying these default
values is left up to the interpreter. There should also be a global default
value for any predicates that have neither been set, nor have explicit default
values.
The difference between the attributes name and var is like the distinction
between global and local variables. A var only has scope for the category
template in which it is set. If we try to access a var value outside of this
scope, it should return whatever global default value the botmaster defined for
predicates. A var never has a default value, except this global default value.
Example:
```xml
TEST VARsome valuesomething
```
TEST VAR:
```xml
unboundpredicate = .
boundpredicate = .
unboundvar = .
boundvar = .
TEST VAR SRAI
```
Sample dialog:
Robot: TEST VAR:
unboundpredicate = unknown.
boundpredicate = some value.
unboundvar = unknown.
boundvar = something.
TEST VAR SRAI:
unboundpredicate = unknown.
boundpredicate = some value.
unboundvar = unknown.
boundvar = unknown.
In this sample dialog, the first category with pattern TEST VAR contains a
local variable called boundvar. The value of boundvar is set to “something”,
and can be retrieved within that category by `<get var="boundvar"/>`. When the
category activates another one through `<srai>` however, the value of boundvar is
“unknown”, the global default predicate value.
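
The scoping difference between name and var can be modelled with two dictionaries, as in this Python sketch. The class and function names are hypothetical, templates are represented as plain Python functions, and "unknown" is assumed to be the global default value as in the sample dialog.

```python
GLOBAL_DEFAULT = "unknown"   # assumed global default for unset values

class Session:
    """Per-client state: predicates persist across categories."""
    def __init__(self):
        self.predicates = {}

def evaluate(session, template):
    """Evaluate a template with a fresh local-variable scope."""
    local_vars = {}
    return template(session, local_vars)

def report(session, local_vars):
    """Stand-in for a template that reads one predicate and one var."""
    pred = session.predicates.get("boundpredicate", GLOBAL_DEFAULT)
    var = local_vars.get("boundvar", GLOBAL_DEFAULT)
    return f"boundpredicate = {pred}. boundvar = {var}."

def category_test_var(session, local_vars):
    session.predicates["boundpredicate"] = "some value"  # like <set name>
    local_vars["boundvar"] = "something"                  # like <set var>
    here = report(session, local_vars)     # same template, same scope
    there = evaluate(session, report)      # like <srai>: new scope
    return here, there

here, there = evaluate(Session(), category_test_var)
print(here)
print(there)
```

The predicate survives the simulated `<srai>` call because it lives in the session, while the local variable is lost because each evaluated template starts with an empty scope.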
s.
The following category displays the bot’s age in years, or in months if the bot
is less than one year old.
AGEMMMMMMMMM dd, YYYYOctober 9, 2012MMMMMMMMM dd, YYYYOctober 9, 2012
I am months old.
I am years old.
tt. `<size/>`
The `<size/>` tag returns the number of AIML categories stored in the graph.
Example:
`<category><pattern>HOW BIG ARE YOU</pattern><template>My brain contains <size/> categories.</template></category>`
Human: How big are you?
Robot: My brain contains 7093 categories.
## 7. AIML Pattern Matching
Every AIML category is uniquely specified by an input pattern, that pattern and
topic pattern. Remember, if the `<that>` and/or `<topic>` are left unspecified,
they are assumed to have a default value of *. A Pattern Path is defined as a
linked sequence of nodes, where the nodes are linked by edges labeled with the
words. The sequence of words in a Pattern Path is specified as the words from
the input pattern, followed by the symbol `<that>`, the words in the that
pattern, followed by the symbol `<topic>` and the words in the topic pattern.
Figure 1 depicts the Pattern Path for a category
`<category><pattern>DO YOU HAVE A DOG</pattern>`
`<template>Can your dog be my pet too?</template></category>`
Figure 1. Pattern Path for a category
The AIML interpreter builds an object called the Graphmaster by reading the
AIML files, constructing a Pattern Path for each category, and inserting the
path into a directed, rooted graph. At the end of each path, the Graphmaster
contains a link to the AIML template for the associated category.
Figure 2 shows an example of a simple Graphmaster for a bot with five
categories.
Figure 2. Simple Graphmaster with five categories
Given a specific input to the bot, the AIML interpreter builds an Input Path,
similar to a Pattern Path, containing the normalized input, the bot’s last
reply (the value of `<that/>`), also normalized, and the normalized topic.
Figure 3 illustrates an example of an Input path resulting from the
conversation fragment:
Robot: I try to be upbeat and friendly.
Human: Do you have fun?
For the purpose of this example, the topic is “unknown”.
Figure 3. Input Path with `<that>` and `<topic>`
The AIML pattern matching algorithm searches the Graphmaster for a match of the
Input Path. The search proceeds in a depth-first sequence. When searching a
branch of the graph fails to find a match, the search algorithm backtracks to
the last node with unexplored branches and searches those.
The search sequence at each node is guided by the following sequence:
1. $word
2. #
3. _
4. word
5. `<set>name</set>`
6. ^
7. *
Or in plainer English,
1. dollar match - top priority word match
2. sharp match - zero+ word wildcard match
3. underscore match - one+ word wildcard match
4. word match - exact word match
5. set match - match found in AIML Set
6. caret match - zero+ wildcard match
7. star match - one+ word wildcard match
The wildcard # can match zero or more words from the input path.
The wildcard _ can match one or more words from the input path.
For an exact word match, the next word in the input path must be identical (up
to case invariance) with the word labeling the branch.
A set match also consumes one or more words, like a wildcard, but the word
sequence must be a member of the named AIML set (see Sets and Maps in AIML
2.0).
The wildcard ^ can match zero or more words from the input path.
Finally, the wildcard * can match one or more words.
For the purpose of matching, the special symbols `<that>` and `<topic>` in the
pattern and input paths are treated like exact words.
If no match at all is found in the graph, the interpreter should return a
default string like “I have no answer for that”, specified by the botmaster.
When matching wildcards and set items, the pattern matching algorithm contains
appropriate bookkeeping functions to index and store the values of `<star/>`,
`<thatstar/>` and `<topicstar/>`.
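
A depth-first search that tries the branches of each node in the priority order listed above might be sketched like this. It is a minimal illustration under several assumptions: the `Node`, `insert` and `search` names are invented here, the path contains only the input pattern (no `<that>` or `<topic>` parts), set matching is omitted, input words are already normalized to upper case, and no wildcard values are recorded.

```python
class Node:
    def __init__(self):
        self.children = {}     # edge label (word or wildcard) -> Node
        self.template = None   # set when a pattern path ends here

def insert(root, pattern, template):
    """Add one pattern path to the graph."""
    node = root
    for token in pattern.split():
        node = node.children.setdefault(token, Node())
    node.template = template

def search(node, words):
    """Return the template of the first match, or None (with backtracking)."""
    if not words and node.template is not None:
        return node.template
    # (edge label, min words consumed, max words consumed), in priority order
    steps = []
    if words:
        steps.append(("$" + words[0], 1, 1))   # 1. dollar match
    steps.append(("#", 0, len(words)))          # 2. sharp, zero or more
    steps.append(("_", 1, len(words)))          # 3. underscore, one or more
    if words:
        steps.append((words[0], 1, 1))          # 4. exact word
    steps.append(("^", 0, len(words)))          # 6. caret, zero or more
    steps.append(("*", 1, len(words)))          # 7. star, one or more
    for label, lo, hi in steps:
        child = node.children.get(label)
        if child is None:
            continue
        for n in range(lo, hi + 1):             # shortest binding first
            found = search(child, words[n:])
            if found is not None:
                return found
    return None                                  # caller backtracks

root = Node()
insert(root, "HELLO *", "star")
insert(root, "_ THERE", "underscore")
insert(root, "HELLO THERE", "exact")
print(search(root, "HELLO THERE".split()))   # underscore
print(search(root, "HELLO WORLD".split()))   # star
```

In this toy graph the input HELLO THERE reaches the `_ THERE` category rather than the exact HELLO THERE category, because the underscore branch is explored before the exact word branch.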
Non-greedy
It is important to note that the AIML pattern matching is non-greedy. If the
pattern contains a sequence of multiple wildcards, each wildcard except the
last will consume one word of the input. For example, if the pattern is
* * *
and the matching input is “First second third fourth fifth”, then the following
should be true on the template side:
`<star/>` = First
`<star index="2"/>` = second
`<star index="3"/>` = third fourth fifth
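
The non-greedy binding in this example can be reproduced with a short recursive function. The sketch below is an assumption-laden illustration: `bind_stars` is a hypothetical name, only the `*` wildcard and exact words are supported, and the pattern is a flat list of tokens rather than a path in the Graphmaster.

```python
def bind_stars(pattern, words):
    """Return the list of wildcard bindings, or None if there is no match.

    Each "*" takes as few words as possible (at least one), so only the
    last wildcard absorbs the remaining input.
    """
    if not pattern:
        return [] if not words else None
    head, rest = pattern[0], pattern[1:]
    if head == "*":
        for n in range(1, len(words) + 1):      # try the shortest first
            tail = bind_stars(rest, words[n:])
            if tail is not None:
                return [" ".join(words[:n])] + tail
        return None
    if words and words[0].upper() == head:       # exact word match
        return bind_stars(rest, words[1:])
    return None

stars = bind_stars(["*", "*", "*"], "First second third fourth fifth".split())
print(stars)
```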
Graph implementation
If the graph nodes are implemented with hash tables, it is possible to find an
exact word match in O(1) time. That is, provided the input path matches a path
in the graph word-for-word, with no wildcards, the number of steps to find the
response is proportional to the length of the path, and does not depend on the
size of the graph.
Pathological cases exist however such as a long sequence of wildcards like:
* * * * * * * * * * * * * * * *
In a simple backtracking implementation, the matcher might try to match an
input of 15 words with a 16 wildcard pattern. The algorithm would work its
way through a vast number of combinations, trying one word for each wildcard,
then two words, three and so on, but never finding a match with this pattern.
To mitigate this problem, the graph nodes can have a height property defined as
the minimum number of words in the input path needed to reach a leaf from that
node.
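
One way to picture the height property is the following Python sketch, which stores patterns in a nested dictionary and computes the height of a node recursively. The dictionary layout, the `None` key used as a leaf marker, and the function names are assumptions of this illustration; zero-or-more wildcards are counted as needing no words.

```python
import math

ZERO_PLUS = {"#", "^"}   # wildcards that may match zero words

def build(patterns):
    """Build a nested-dict graph; the key None marks the end of a path."""
    root = {}
    for pattern in patterns:
        node = root
        for token in pattern.split():
            node = node.setdefault(token, {})
        node[None] = True
    return root

def height(node):
    """Minimum number of input words needed to reach a leaf from node."""
    if None in node:
        return 0
    return min(
        ((0 if token in ZERO_PLUS else 1) + height(child)
         for token, child in node.items()),
        default=math.inf,
    )

def worth_searching(node, remaining_words):
    """Prune a branch when too few input words remain to reach a leaf."""
    return len(remaining_words) >= height(node)

graph = build([" ".join(["*"] * 16)])
fifteen_words = ["WORD"] * 15
print(height(graph))                           # 16
print(worth_searching(graph, fifteen_words))   # False
```

With this check the 15-word input is rejected before any wildcard combinations are tried. A real interpreter would presumably cache the height on each node when the graph is built rather than recompute it on every visit.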
Duplicate Categories
Two categories are said to be duplicates if they have the same pattern, that
and topic. A pair of duplicate categories may have different templates. As
the AIML interpreter loads categories from AIML files, it may encounter
duplicates. The method for handling duplicates is called the merge policy.
The interpreter should give the botmaster some control over the merge policy.
Some typical merge policies are:
1. Use first loaded - Keep the first duplicate loaded, and discard any
subsequent ones.
2. Use last loaded - Keep the last duplicate category loaded, and discard any
previous ones.
3. Merge with `<random>` - If the two duplicate categories’ templates differ,
combine them by placing each in a `<random>` list item.
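
The three policies could be sketched as follows. This is an illustration only: the policy names `"first"`, `"last"` and `"random"`, the tuple layout of a category, and the `merge` and `render` helpers are assumptions, and templates are treated as plain strings.

```python
def merge(categories, policy="last"):
    """Collapse duplicates according to a merge policy.

    categories: iterable of (pattern, that, topic, template) in load order.
    Returns a dict mapping (pattern, that, topic) to a list of templates.
    """
    merged = {}
    for pattern, that, topic, template in categories:
        key = (pattern, that, topic)
        if key not in merged:
            merged[key] = [template]
        elif policy == "first":
            continue                          # keep the first one loaded
        elif policy == "last":
            merged[key] = [template]          # keep the last one loaded
        elif policy == "random":
            if template not in merged[key]:   # combine differing templates
                merged[key].append(template)
        else:
            raise ValueError(f"unknown merge policy: {policy}")
    return merged

def render(templates):
    """Wrap several templates in a random list; pass a single one through."""
    if len(templates) == 1:
        return templates[0]
    items = "".join(f"<li>{t}</li>" for t in templates)
    return f"<random>{items}</random>"

loaded = [
    ("HELLO", "*", "*", "Hi."),
    ("HELLO", "*", "*", "Hello there."),
    ("BYE", "*", "*", "Goodbye."),
]
print(render(merge(loaded, "random")[("HELLO", "*", "*")]))
```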
## 8. AIML Intermediate Format (or “I’m flattened”)
An AIML bot typically consists of thousands, or hundreds of thousands, of AIML
categories. Each category contains an input pattern and a template, and
optionally a `<that>` pattern and a `<topic>` pattern. When we think of AIML
content this way, it brings to mind row-oriented data that might be found in a
database or spreadsheet. You can imagine AIML represented in a table, where
the rows are the categories and the columns are labeled “pattern”, “that”,
“topic” and “template”. What’s more, there are other attributes that can be
associated with each category, for example the number of times each category is
activated, and the name of the AIML source file containing the category. The
imaginary spreadsheet might also include columns called “activation count” and
“filename”.
Yet the AIML template is more like a hierarchical, tree-like structure. The
simplest AIML template contains only text, but AIML allows the text to be
marked up with AIML tags. These tags might enclose more text and more tags,
giving rise to a structure like a tree. You can think of the template as the
root of a tree, with branches leading to tags and text.
AIML is therefore a hybrid of database-style row data, and the hierarchical
data in the templates. Because AIML is based on XML, and there is a
significant amount of software written in a variety of languages for parsing
XML, many AIML interpreters simply read the AIML files and process them as
hierarchical data. An XML parser reads the AIML file, and parses it into a
collection of nodes representing categories, and the nodes are further parsed
into the component pattern, that, topic and template parts. The template is
decomposed into its constituent hierarchical structure.
The problem with this approach is that, because of the size of the AIML files,
the full XML parsing can be relatively slow. Also, there is no real need to
parse all the individual templates until they are activated. For these
reasons, we defined an intermediate AIML format that represents the categories
as row data, and stores the templates as XML data. This format is called AIML
Intermediate Format, or AIMLIF.
AIMLIF is a plain text format representing categories as line data.
One category per line, one line per category:
ACTIVATION_COUNT,PATTERN,THAT,TOPIC,TEMPLATE,FILENAME
In TEMPLATE, we replace “\n” with “#\Newline” and “,” with “#\Comma”, so that
each category takes only one line of text.
Let’s look at an example. This AIML category comes from the file
reductions.aiml:
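`<category><pattern>DO YOU THINK YOU WILL *</pattern><template><srai>WILL YOU <star/></srai></template></category>`
has an intermediate format representation
`0,DO YOU THINK YOU WILL *,*,*,<srai>WILL YOU <star/></srai>,reductions.aiml`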
By preprocessing the AIML files into AIMLIF, our program can load the AIML much
faster. Instead of parsing all the AIML XML files at load time, the program
loads the AIMLIF files and stores the `<template>` XML for later use. The
`<template>` XML is parsed only when the category is activated.
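
Converting a category to and from a single AIMLIF line might look like the Python sketch below. The function names are hypothetical, and the replacement tokens are written as `#Newline` and `#Comma`, which is an assumption about the exact spelling; an implementation should use whatever tokens its interpreter defines.

```python
NEWLINE_TOKEN = "#Newline"   # assumed spelling of the newline replacement
COMMA_TOKEN = "#Comma"       # assumed spelling of the comma replacement

def to_aimlif(count, pattern, that, topic, template, filename):
    """Flatten one category into a single comma-separated line."""
    flat = template.replace("\n", NEWLINE_TOKEN).replace(",", COMMA_TOKEN)
    return ",".join([str(count), pattern, that, topic, flat, filename])

def from_aimlif(line):
    """Split an AIMLIF line back into its six fields."""
    fields = line.split(",")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, found {len(fields)}")
    count, pattern, that, topic, flat, filename = fields
    template = flat.replace(COMMA_TOKEN, ",").replace(NEWLINE_TOKEN, "\n")
    return int(count), pattern, that, topic, template, filename

line = to_aimlif(0, "HELLO", "*", "*", "Hi, friend", "personality.aiml")
print(line)
print(from_aimlif(line))
```

Because the template is kept as an unparsed string, the cost of XML parsing is deferred until a category is actually activated, which is the point of the intermediate format.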
Unix Tools
A pleasant side-effect of using AIMLIF representation is that it facilitates
applying common Unix tools to analyze and process the AIML. Because each
category is represented as a single line of text, we can use tools like sed,
awk, grep, sort and uniq on the AIMLIF files easily.
CSV File Format
The AIMLIF format is easily recognizable as the familiar spreadsheet .csv file
format. AIMLIF files can be read and edited with spreadsheet tools, including
MS Excel and Google Docs, facilitating such features as upload/download of AIML
files in .csv format, and the development of customized spreadsheet editors for
AIML.
Changing CSV delimiter
The “,” (comma) character is slightly problematic as a field separator for
AIMLIF because the comma is common in `<template>` expressions. If you are
editing your AIML files in CSV format using Excel, take care not to use “,” in
the `<template>`. A stray comma can cause the AIMLIF expression
0,HELLO,*,*,Hi, friend,personality.aiml
to be interpreted as a category
`<category><pattern>HELLO</pattern><template>Hi</template></category>`
saved in a file called “friend” instead of “personality.aiml”.
The recommended solution is to use the symbol #Comma, as in
0,HELLO,*,*,Hi#Comma friend,personality.aiml
If you are using Windows however, it’s possible to change the default delimiter
under Control Panel-->Region and Language-->Additional Settings.
A good choice might be “:” or “|” because these are less common in `<template>`
expressions. If you are using Program AB to convert between CSV and AIML
files, you need to set the Magic String aimlif_split_char to your new
delimiter.
For a discussion about changing the default list separator in Windows, see
http://www.techmickey.org/how-to-change-delimiters-in-excel-to-open-csv-file-using-semi-colon-and-comma-as-delimiters-in-csv-files/
Note: we need to add a configuration option in Program AB to make it possible to
specify a different list separator.
Appendix I. OOB Tags
AIML 2.0 includes a new set of tags to control device actions called OOB tags.
"OOB" means "Out of Band", an engineering term for a conversation on a
separate, hidden channel. For example if you are having a phone call with
someone, and during the call send them a text message, the text message is "out
of band" from the voice call. In our case this refers to commands that the bot
sends to the phone as part of a reply, but these commands are hidden from the
end-user.
AIML 2.0 includes the OOB tag specification, but it is important to note that OOB
tag processing adds a second pass of XML processing to an AIML 2.0 interpreter.
In the first pass, the interpreter processes an input, evaluates any template
tags, and produces a result. This result may contain unevaluated OOB tags. The
OOB tags are processed in a second phase.
The two-phase model achieves a clean separation between “AI functions” and
“device functions”. The AIML interpreter can be written as a library, or even
run on a remote server, and a more lightweight app running on a device can
process the OOB commands.
The two-phase model also allows the AIML to dynamically write the contents of
the OOB tags.
The complete description of OOB tags may be found in a companion document:
http://code.google.com/p/aiml-en-us-pandorabots-callmom/wiki/CallMomOOBTags
References
1. AIML OOB Tags
2. Sets and Maps in AIML 2.0
3. Artificial Intelligence Markup Language (AIML) Version 1.0.1
Glossary
AIML File - an XML format file containing AIML categories
AIML Interpreter - a program that can load and run an AIML bot and provide
responses to conversational requests according to the AIML specification in
this document.
AIML Map - a function that computes a member of one AIML Set from another.
AIML Set - a collection of strings (words and phrases) that can be matched in
an input.
Bot - a collection of AIML files, configuration files, and AIML Sets and Maps,
serving conversational requests in an AIML interpreter.
Bot property - a global constant value for a bot.
Botmaster - the author of an AIML bot.
Category - The basic unit of knowledge in AIML, consisting of an input pattern,
response template and optionally a that pattern and topic pattern.
Client - a person (or other program) chatting with a bot.
Default category - a category with a pattern containing a wildcard.
Depth-first Search - The method of searching the Graphmaster for a match (see
http://en.wikipedia.org/wiki/Depth-first_search).
Duplicate categories - a pair of categories with the same input pattern, that
pattern and topic pattern (but not necessarily the same template).
Graphmaster - the object storing the AIML categories in a tree, where each
category is uniquely identified by a path from the root to a leaf node.
Input - A single sentence transmitted to the bot.
Input path - A sequence formed by combining an input sentence, the robot’s last
reply (“that”) and the topic.
Knowledge Base - another name for the bot’s AIML files.
Map - see AIML Map
Normalization - a process that applies a series of substitutions to an input to
put it into a standard format for AIML pattern matching. Typically
normalization removes punctuation, corrects some spelling mistakes, and expands
contractions.
One+ wildcard - a wildcard that can match one or more words.
Out-of-Band (OOB) - XML embedded in the AIML response that is not interpreted
by the AIML Interpreter, but passed through to a secondary process that takes
some action based on the OOB command.
Pattern Path - a sequence formed by combining an input pattern, that pattern
and topic pattern.
Predicate - a variable specific to a client.
Recursion - another name for reduction.
Reduction - An operation using the AIML `<srai>` tag that simplifies, translates,
rewrites, or reduces the input into another form, and then sends that form back
to the Graphmaster to match another AIML category.
Set - see AIML Set.
Symbolic reduction - another name for reduction.
Tag - an XML symbol denoting the beginning and end of an XML expression.
Template - the response part of an AIML category. The template consists of
text and AIML markup.
That - The last sentence of the robot’s last reply.
Topic - A global state variable that may be set in a template, and used to
control category matching.
Ultimate Default Category - The AIML category whose pattern is the wildcard *
by itself. It matches only when no more specific category matches the input,
which makes the UDC the category of last resort.
Wildcard - a symbol in an AIML pattern expression that can match any words in
the input.
XML - http://en.wikipedia.org/wiki/XML
Zero+ Wildcard - a wildcard that can match zero or more words.