2012 Ninth International Conference on Information Technology - New Generations
         Development and Implementation of a Chat Bot
                     in a Social Network
                                       Salto Martínez Rodrigo, Jacques García Fausto Abraham
978-0-7695-4654-4/12 $26.00 © 2012 IEEE
DOI 10.1109/ITNG.2012.147

Abstract: This document describes how to implement a Chat Bot on the Twitter social network for entertainment and viral advertising using a database and a simple algorithm. The main goal is a successful Chat Bot implementation that people do not classify as SPAM. As a result, a Twitter account (@DonPlaticador) was obtained that works without human intervention and gains more followers every day.

Index Terms: Social Media, Sentence Processing, Knowledge Databases, Artificial Intelligence.

I. INTRODUCTION

   Years ago, Alan Turing proposed the question “Can machines think?”. Since then, a large number of Bots have attempted to answer this question and to pass the “Turing test” [1].
   The complexity is clear, and there exist a fair number of methods to build a Chat Bot. Generally they are implemented on IRC channels, trying to cover a wide range of issues and topics, while leaving aside many other areas of opportunity. The method presented here aims to take advantage of Social Networks (electronic platforms for social interaction), their usage and their general rules to implement a Chat Bot oriented to a specific topic. With the help of a Relational Database, used to create a dictionary of key words and phrases, the Chat Bot is capable of answering questions, making specific searches and keeping a conversation [2].
   The most common uses of a Chat Bot are 1) Advertising (Spam), 2) Entertainment and 3) Customer Service (Knowledge Databases).

II.I ELIZA

   ELIZA is a computer program designed in 1966 by Joseph Weizenbaum, who was trying to keep a coherent conversation with the user. ELIZA searches for key words within the text written by the user and replies with a phrase from its database.

II.II ALICE

   ALICE (Artificial Linguistic Internet Computer Entity) is an Internet project, part of the Pandora Project.
   This project involves the development of many types of bots, especially chat ones. On ALICE’s webpage, the user can chat with an intelligent conversation program that simulates a real talk, so the user may have trouble realizing they are talking with a robot.
   This technology is developed in Java by Dr. Richard S. Wallace.

III. DESIGN

III.I Analysis

   An area of opportunity for the development and implementation of a Chat Bot is the social network Twitter, since it starts from a simple concept: the exchange of short messages no longer than 140 characters, which drastically reduces the amount of information and the way it is published. The limited number of characters is an advantage and an opportunity to improve the Chat Bot’s performance, as it drastically reduces the amount of information the bot receives and processes, allowing a more accurate and detailed database to be built for the topic it manages.
   For the bot to perform accurately, it is necessary to define its objective and the topics it will have knowledge of, since its replies are based on the text, phrases and words it receives from the users. It is also important to sort these phrases and words by relevance and resemblance, so the answers can be as correct as possible.
   The easiest way for the bot to obtain answers depends on the way the users write their messages (Tweets). The bot compares a Tweet with its database, which, as previously said, is sorted by relevance of words and phrases, until it finds a suitable
answer. If this answer cannot be found, the Tweet is saved into the database for later analysis to improve the bot’s capabilities.
   It is possible to define one or more answers for one or more key words or phrases. This way the bot can find a suitable reply to synonyms without continuously repeating the same answer, which helps make it difficult for the users to realize they are not talking to a person.
   It is important to highlight that the database does not contain complete phrases; only the important and significant parts are stored, avoiding prepositions or any other data that does not represent relevant information.

III.II Algorithm

   The process is divided into three parts: 1) Message reception, 2) Message processing, 3) Generation of a suitable reply.

III.II.I Message Reception

   The bot must be capable of receiving messages written by a user, regardless of the platform or the publishing method being used. The message must have its punctuation marks and special characters removed, and it must be converted to upper or lower case (whichever has been defined beforehand).
   Furthermore, the bot must be able to know two things: one, whether the message was generated by the bot itself; and two, whether the message has been repeated. If the answer to either of these questions is yes, the message must be rejected and the bot will not work with it. It is important to prevent the bot from having a conversation with itself and from replying to a message that has already been replied to.

III.II.II Message Processing

   After the message has been formatted, the bot must look for the remaining words in the database.
   The information in the database is stored in a table: one field contains the phrases separated by the special character ‘|’ (pipe), and another field stores the suitable replies to those key words, also separated by ‘|’. Finally, a third field with numeric values, which determines the relevance of the match, is used to sort the results and to choose the reply with the highest level of relevance.
   The process consists of looking through the rows in the table until a suitable reply is found; once it gets a positive result, the process halts regardless of how many rows have been checked. If the process goes through every row in the table and no suitable answer is found, the original message is stored in the database for subsequent analysis, which ends the message processing.

III.II.III Generation of a suitable reply

   If the previous step produced a positive result, every possible answer in the chosen row is retrieved; these answers are not ranked in any way, so that one can be chosen at random. Once the random answer has been picked, the username that generated the message is added to the beginning of it, so that the reply is not shown to every other user who is following the bot and is not considered SPAM.
   Once the reply has been sent, the received message’s ID is stored for management purposes; this way, it is possible to avoid replying to old messages.
   The full process is shown in Fig. 1.

III.II.IV Pseudocode

Variables:
ROW: DB Result Set.
Answer: Flag.
LastID: Twitter message ID.

Procedure:
LastID=0
If Mentions > 0:
   For each Mention:
      Answer=FALSE
      Get Answer From DB
      While ROW:
         If Key Word found && Username_Sender != Username_Bot:
            Get random answer
            Send answer
            Answer=TRUE
            LastID=id tweet
            Exit (While)
         End (If)
      End (While)
      If Answer == FALSE:
         Insert Tweet in DB
      End (If)
   End (For each)
End (If)
If LastID > 0:
   Update DB LastID
End (If)
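
The three steps described in III.II.I to III.II.III can be sketched in Python as follows. This is only a minimal sketch under stated assumptions, not the authors' PHP implementation: the sample rows, the function names (normalize, find_replies, build_reply) and the whole-word matching rule are hypothetical, and key phrases are assumed to be stored in upper case.

```python
import random
import re

# Hypothetical rows mirroring the three fields described in III.II.II:
# pipe-separated key phrases, pipe-separated replies, numeric relevance.
ROWS = [
    ("HELLO|HI", "Hello there!|Hi, how are you?", 5),
    ("PARTY|FIESTA", "Where is the party?|I never miss a party!", 10),
]


def normalize(text):
    """Remove punctuation and special characters, then convert to upper case."""
    cleaned = re.sub(r"[^\w\s]", " ", text)
    return " ".join(cleaned.upper().split())


def find_replies(message, rows):
    """Return the replies of the most relevant matching row, or None."""
    padded = f" {normalize(message)} "
    # Visit rows from highest to lowest relevance and stop at the first match.
    for phrases, replies, _relevance in sorted(rows, key=lambda r: r[2], reverse=True):
        for phrase in phrases.split("|"):
            if f" {phrase} " in padded:  # whole-word match (an assumption)
                return replies.split("|")
    return None


def build_reply(sender, message, rows):
    """Pick one candidate reply at random and prefix the sender's username."""
    candidates = find_replies(message, rows)
    if candidates is None:
        return None  # the caller would store the message for later analysis
    return f"@{sender} {random.choice(candidates)}"
```

For example, build_reply("user1", "Any party tonight?", ROWS) returns one of the two replies of the PARTY row, prefixed with "@user1", while a message with no known key word returns None.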
      Fig. 1. General Chat Bot Process
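
The overall loop of the pseudocode in III.II.IV can be sketched in Python as follows. The shape of the mention dictionaries and the lookup, send and store_unanswered callbacks are hypothetical stand-ins for the Twitter API and database calls, which the paper does not detail. Unlike the pseudocode, which records the ID only after a reply is sent, this sketch advances the stored ID for every handled mention, so that unanswered messages are not stored twice; that choice is an assumption.

```python
import random


def process_mentions(mentions, bot_name, last_id, lookup, send, store_unanswered):
    """Handle each new mention once and return the updated last-seen ID.

    mentions: iterable of dicts with 'id', 'user' and 'text' keys (assumed shape).
    lookup:   callable returning a list of candidate replies, or None.
    """
    newest = last_id
    for mention in sorted(mentions, key=lambda m: m["id"]):
        if mention["id"] <= last_id:
            continue  # already handled in an earlier run
        if mention["user"].lower() == bot_name.lower():
            continue  # the bot must not hold a conversation with itself
        candidates = lookup(mention["text"])
        if candidates:
            # Prefix the username so the tweet is addressed to the sender.
            send(f"@{mention['user']} {random.choice(candidates)}")
        else:
            store_unanswered(mention["text"])  # kept for later human analysis
        newest = max(newest, mention["id"])
    return newest
```

The returned value would be persisted (the "Update DB LastID" step) and passed back as last_id on the next run.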

III.III Searches

   In addition to replying to users’ messages, the bot can also perform searches within Twitter using the logic operators AND, OR and NOT. Having a limited but well-defined number of key words and phrases to look for allows the bot to avoid SPAM.
   If there is a clear way to start a conversation, only three or four search terms are necessary to achieve it. The purpose of performing searches is to find users to start a conversation with, although the most important part of the conversation takes place later. Since Twitter accounts are public by default, Twitter offers a search engine that can be accessed to perform specific searches. It is possible to customize these searches by defining the language of the Tweet, the time and the geographic zone.
   It is also possible to search the users’ profiles, which makes it easier to know their preferences. Performing this kind of search makes the Bot more accurate.

III.IV Mentions

   In Twitter, a mention is a Tweet that contains the username of another person. It is the easiest way for users to interact; therefore it is the most important part of the bot. Message processing and reply generation always start when a mention is received; this is the moment the interaction between the user and the bot begins. When a user sends a mention to the bot, it is because the bot has caught that user’s attention. From this point on, it is important to keep this attention. A real conversation with other users is different from a search, because there is no way to know what the users will write or what they intend to express; this is where the database plays an important role. If the database is well organized and as complete as it can be, the retrieved replies can seem natural and allow a conversation to flow without major problems.

III.V Contests

   Companies have started noticing the importance of Twitter for positioning their brand and reaching more market segments. A very common practice is offering promotions or giving away products, mainly through activities whose main purpose is to attract as many people as possible to get to know the company and talk about it. Most of these activities have as rules that people must follow the company and encourage other people to do the same in order to win. Managing these activities, validating these rules and informing the user of the progress of the activity can be automated using a Bot [3].
   Implementing a Bot to take control of these tasks reduces the workload of the person in charge of the account, as the number of replies received in a couple of seconds is often too large for a human being to process.

III.VI Analysis of the Messages

   Once there is a considerable number of messages that could not be replied to, the messages can be analyzed to get a clear idea of the most common messages the bot receives that were not covered in the Database. This analysis must be done by a human being and can be done in two ways.
   The first is to analyze the complete text of each message to understand what it is trying to convey, create groups of topics that have not been addressed and, accordingly, generate new entries in the Database that cover them.
   The second is to divide the messages into words and phrases, find those that appear most frequently and generate specific answers for them. The problem with this method is that there is no full understanding of what the user is trying to express, and it is affected by syntax errors.

IV. IMPLEMENTATION

   For the implementation of this Chat Bot, a web server with Internet access, PHP 5+, MySQL and access keys to the Twitter API were used.
   The algorithm used for the implementation of this Bot can work in any other programming language and with any other database manager.

V. TESTS AND RESULTS

   To perform the tests, different Twitter accounts were created with different goals: 1) @DonPlaticador, 2) @WootterC and 3) @Siguientescena_.

V.I @DonPlaticador

   This account’s goal is to entertain. It is the one that most closely follows the guidelines of a Chat Bot. @DonPlaticador resembles a Talking Parrot that lives to party and keeps lonely or bored people company.

   Fig. 2. @DonPlaticador

   @DonPlaticador uses searches to begin and follow conversations. To consider this account a success, it was necessary for people not to report it as SPAM, for it to keep a conversation of more than 5 tweets on average without the user noticing that they are talking to a Bot, and for users to follow the account without being followed back.
   @DonPlaticador was created on July 16, 2010 and, as of October 1, 2011, it is still functioning without human intervention. Its profile, shown in Fig. 3, has more than five thousand Followers and more than 212,000 Tweets, with followers that so far do not know it is a Bot, or that think it needs a person to work.

   Fig. 3. @DonPlaticador’s Profile

V.II @WootterC

   The goal of this account is to raise awareness of a Free Software project called Wootter by looking for people possibly interested in it. For this account’s success to be assessed, the project was not promoted by any other means; only @WootterC’s account was used. Over approximately eight months, as seen in Fig. 4, visits from forty-seven different countries were achieved, as well as wide coverage in digital media such as blogs, and invitations to different Free Software events organized by communities or Universities.
   The main activity this account performs is searching; it searches for terms like ‘Free Software’, ‘Open Source’ and ‘Twitter Client’. It also sends some Tweets that are programmed to be sent periodically.

   Fig. 4. Analytics Wootter

V.III @Siguientescena_

   For the fourth edition of the International Festival of Alternative Performing Arts “Siguientescena”, organized in Querétaro City, México, a social media campaign was conducted with the goal of attracting visitors to the event and giving away tickets for it. The method for giving away the tickets on Twitter consisted of asking people to compose a Tweet with the text “The @Siguientescena_ Festival takes @username to backstage”, where “@username” was the individual being voted to win the tickets. Furthermore, those individuals had to be followers of the account to participate.
   A Bot was implemented to count and validate the votes automatically, receiving over two thousand votes and informing each voter whether the vote was valid or not, speeding up the process and eliminating account handling errors.
   It also performs specific searches that are relevant to the festival, such as Tweets that contain the names of the performers.
   The Bot was also used to answer the most common questions people had, regarding the performers’ shows, presentations, times, ticket sales and locations.
   Once again, wide coverage in digital media was achieved, with more than two thousand publications on Internet sites.

VI. CONCLUSION

   It is difficult to create a Chat Bot if there is no specific goal. Only by having a good idea of what is intended to be achieved, and thoroughly studying the way to accomplish it, can good results be obtained. A Bot can hardly replace a human being, but it is a great help in accomplishing specific objectives with a limited reach.
   By moving away from the general uses Chat Bots are usually given, a useful product can be obtained, one that allows the user to have a different experience without feeling plagued with useless and senseless information.
   The next step towards improving the performance of these Bots, besides phrase or word hierarchy, is adding a numeric method to understand the context of the message and to distinguish the mood and the sense of the sentence, leaving aside grammatical errors. Such grammatical errors are generally not taken into account and make the final user think the Bot is programmed incorrectly, or that it just does not work the way it should.
   It would be of great help to be able to add other data to the tables of the Database, in order to have more information that allows us to select an answer more efficiently. Relying on the option Twitter offers to see a conversation history, an extra field could be defined containing the IDs of the answers that should have been sent previously for the selected answer to be considered valid. This way, a context of what the topic of the conversation is, and what direction the conversation is taking, can be obtained.
   To successfully implement a Chat Bot, a lot of factors must be considered. It is essential to continuously monitor its operation at the early stages and, if necessary, make the appropriate changes. Furthermore, the Database must be persistently updated to add new search terms, keywords, or answers that are more consistent with the people interacting with the Bot. This makes it possible to limit the time users spend interacting with the Bot, so that they do not have the opportunity to learn all its answers or limitations.

ACKNOWLEDGMENT

   Special thanks to Irving Pérez de León, Iris Selenne Ramírez Rodriguez, Diego Octavio Ibarra Corona and Carlos Alberto Olmos Trejo for their collaboration and insightful comments.

REFERENCES

[1]   Turing, A.M.: Computing Machinery & Intelligence. Mind LIX(236) (1950)
[2]   Sawar, A., Atwell: Chatbots: are they really useful? LDV-Forum Band (2007)
[3]   http://business.twitter.com/