Python for NLP: Creating a Rule-Based Chatbot https://stackabuse.com/python-for-nlp-creating-a-rule-based-chatbot/
Python for NLP: Creating a Rule-Based Chatbot
By Usman Malik (https://twitter.com/usman_malikk)
This is the 12th article in my series on Python for NLP. In the previous article
(/python-for-nlp-working-with-the-gensim-library-part-2/), I briefly explained the
different functionalities of Python's Gensim library (https://pypi.org/project/gensim/).
So far in this series, we have covered almost all of the most commonly used NLP
libraries, such as NLTK, SpaCy, Gensim, StanfordCoreNLP, Pattern, and TextBlob.
In this article, we are not going to explore any NLP library. Rather, we will develop a
very simple rule-based chatbot capable of answering user queries regarding the sport
of tennis. But before we begin the actual coding, let's briefly discuss what chatbots
are and how they are used.
What is a Chatbot?
A chatbot is a conversational agent capable of answering user queries in the form of
text, speech, or via a graphical user interface. In simple words, a chatbot is a software
application that can chat with a user on any topic. Chatbots can be broadly categorized
into two types: Task-Oriented Chatbots and General Purpose Chatbots.
Task-oriented chatbots are designed to perform specific tasks. For instance, a
task-oriented chatbot can answer queries related to train reservations or pizza delivery;
it can also work as a personal medical therapist or personal assistant.
On the other hand, general purpose chatbots can have open-ended discussions with
users.
There is also a third type, hybrid chatbots, which can engage in both task-oriented
and open-ended discussions with users.
Approaches for Chatbot Development
Chatbot development approaches fall into two categories: rule-based chatbots and
learning-based chatbots.
Learning-Based Chatbots
Learning-based chatbots use machine learning techniques and a dataset to learn how to
respond to user queries. Learning-based chatbots can be further divided into two
categories: retrieval-based chatbots and generative chatbots.
Retrieval-based chatbots learn to select a response to a user query from a set of
existing responses. On the other hand, generative chatbots learn to generate a response
on the fly.
One of the main advantages of learning-based chatbots is their flexibility to answer a
variety of user queries. Though the response might not always be correct, learning-based
chatbots are capable of responding to any type of user query. One of their major
drawbacks is that they may need a huge amount of time and data to train.
Rule-Based Chatbots
Rule-based chatbots are pretty straightforward compared to learning-based chatbots.
They follow a specific set of rules. If the user query matches a rule, the answer to the
query is generated; otherwise the user is notified that no answer to the query exists.
One of the advantages of rule-based chatbots is that their behavior is predictable: a
query that matches a rule always receives the answer defined for that rule. However, on
the downside, they do not scale well. To add more responses, you have to define new rules.
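To make the idea concrete, here is a minimal sketch of rule matching, separate from the tennis chatbot built later in this article. The rules, the fallback message, and the answer function are made up for illustration only.

```python
# Hypothetical rules for illustration: each keyword maps to a fixed answer.
RULES = {
    "opening hours": "We are open from 9am to 5pm.",
    "price": "A ticket costs 10 dollars.",
}
FALLBACK = "Sorry, I do not have an answer for that."


def answer(query):
    """Return the answer of the first rule whose keyword occurs in the query."""
    query = query.lower()
    for keyword, reply in RULES.items():
        if keyword in query:
            return reply
    # No rule matched, so tell the user that no answer exists.
    return FALLBACK


print(answer("What is the price of a ticket?"))
print(answer("Do you sell rackets?"))
```

Supporting a new kind of question means adding another entry to RULES, which is exactly the scaling limitation mentioned above.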
In the following section, I will explain how to create a rule-based chatbot that will reply
to simple user queries regarding the sport of tennis.
Rule-Based Chatbot Development with Python
The chatbot we are going to develop will be very simple. First, we need a corpus that
contains plenty of information about the sport of tennis. We will build such a corpus by
scraping the Wikipedia article on tennis. Next, we will perform some preprocessing on
the corpus and then divide it into sentences.
When a user enters a query, the query will be converted into vectorized form. All the
sentences in the corpus will also be converted into their corresponding vectorized
forms. Next, the sentence with the highest cosine similarity
(https://en.wikipedia.org/wiki/Cosine_similarity) with the user input vector will be
selected as the response to the user input.
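The selection step rests on cosine similarity, which for two vectors a and b is their dot product divided by the product of their lengths. The sketch below computes it in plain Python for simple word-count vectors. The article itself uses TF-IDF vectors from scikit-learn, so the helper name and the sample sentences here are illustrative assumptions.

```python
import math
from collections import Counter


def cosine_similarity_counts(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Dot product over the words that both texts share.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[word] * counts_b[word] for word in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)


sentences = [
    "tennis is played with a racket",
    "the grand slam tournaments are the major events",
]
query = "what is a racket"
# Pick the sentence whose vector points in the direction closest to the query.
best = max(sentences, key=lambda s: cosine_similarity_counts(query, s))
print(best)
```

The first sentence shares three words with the query and the second shares none, so the first one is selected.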
Follow these steps to develop the chatbot:
Importing Required Libraries
import nltk
import numpy as np
import random
import string
import bs4 as bs
import urllib.request
import re
We will be using the Beautifulsoup4 (https://beautiful-soup-4.readthedocs.io/en/latest/)
library to parse the data from Wikipedia. Furthermore, Python's regex library
(/using-regex-for-text-manipulation-in-python/), re, will be used for some
preprocessing tasks on the text.
Creating the Corpus
As we said earlier, we will use the Wikipedia article on tennis to create our corpus. The
following script retrieves the Wikipedia article and extracts all the paragraphs from the
article text. Finally, the text is converted to lower case for easier processing.
raw_html = urllib.request.urlopen('https://en.wikipedia.org/wiki/Tennis')
raw_html = raw_html.read()
# The 'lxml' parser requires the lxml package to be installed.
article_html = bs.BeautifulSoup(raw_html, 'lxml')
article_paragraphs = article_html.find_all('p')
article_text = ''
# Concatenate the text of every <p> element into one string.
for para in article_paragraphs:
    article_text += para.text
article_text = article_text.lower()
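If you want to try the paragraph extraction without network access, the following sketch applies the same idea to an inline HTML string. Note the substitutions: it uses the standard library's html.parser instead of BeautifulSoup, and the ParagraphExtractor class and the sample HTML are made up for this example.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text found inside <p> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # greater than 0 while inside a <p> element
        self.chunks = []  # pieces of paragraph text, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


sample_html = (
    "<html><body><h1>Tennis</h1>"
    "<p>Tennis is a <b>racket</b> sport.</p>"
    "<p>It is played worldwide.</p></body></html>"
)
parser = ParagraphExtractor()
parser.feed(sample_html)
parser.close()
# Join the pieces and lower-case them, as the article does with article_text.
sample_text = "".join(parser.chunks).lower()
print(sample_text)
```

The heading is skipped because it is not inside a paragraph, while the text of the nested b element is kept.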
Text Preprocessing and Helper Function
Next, we need to preprocess our text to remove the bracketed reference numbers (such as
[1]) and the extra whitespace. The following regular expressions do that:
article_text = re.sub(r'\[[0-9]*\]', ' ', article_text)
article_text = re.sub(r'\s+', ' ', article_text)
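To see what these two substitutions do, here is a small sketch that applies them to a made-up sample string. Only the sample text is an assumption; the patterns are the ones used above.

```python
import re

sample = "Tennis is an Olympic sport[1]  played at\nall levels.[23][citation needed]"
sample = re.sub(r'\[[0-9]*\]', ' ', sample)  # replace numeric reference markers with a space
sample = re.sub(r'\s+', ' ', sample)         # collapse runs of whitespace into one space
print(sample)
# Tennis is an Olympic sport played at all levels. [citation needed]
```

Markers that contain anything other than digits, such as [citation needed], are left in place by the first pattern.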
We need to divide our text into sentences and words, since the user input will be
compared with each sentence using cosine similarity. Execute the following script:
article_sentences = nltk.sent_tokenize(article_text)
article_words = nltk.word_tokenize(article_text)
Finally, we need to create helper functions that will remove the punctuation from the
user input text and will also lemmatize the text. Lemmatization refers to reducing a
word to its root form. For instance, lemmatizing the word "ate" returns "eat", the word
"throwing" becomes "throw", and the word "worse" is reduced to "bad". Note that these
results depend on the lemmatizer knowing the part of speech; NLTK's WordNetLemmatizer
treats every word as a noun unless told otherwise.
Execute the following code:
wnlemmatizer = nltk.stem.WordNetLemmatizer()

def perform_lemmatization(tokens):
    return [wnlemmatizer.lemmatize(token) for token in tokens]

punctuation_removal = dict((ord(punctuation), None) for punctuation in string.punctuation)

def get_processed_text(document):
    return perform_lemmatization(nltk.word_tokenize(document.lower().translate(punctuation_removal)))
In the script above, we first instantiate the WordNetLemmatizer from the NLTK
(https://www.nltk.org/) library. Next, we define a function perform_lemmatization,
which takes a list of words as input and returns the corresponding list of lemmatized
words. The punctuation_removal dictionary is a translation table that removes the
punctuation from the passed text. Finally, the get_processed_text function takes a
sentence as input, converts it to lower case, removes the punctuation, tokenizes it,
and lemmatizes the tokens.
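The punctuation step can be tried on its own with nothing but the standard library. The sketch below rebuilds the same translation table and applies it to a made-up sentence; tokenization and lemmatization are left out here because they need NLTK and its data files.

```python
import string

# Map the code point of every punctuation character to None, which deletes it.
punctuation_removal = {ord(ch): None for ch in string.punctuation}

cleaned = "Who won Wimbledon, and when?!".lower().translate(punctuation_removal)
print(cleaned)  # who won wimbledon and when
```

Because characters are deleted rather than replaced with a space, a word such as "don't" becomes "dont".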
Responding to Greetings
Since we are developing a rule-based chatbot, we need to handle different types of
user inputs in a different manner. For instance, for greetings we will define a dedicated
function. To handle greetings, we will create two collections: greeting_inputs and
greeting_responses. When a user enters a greeting, we will search for it in
greeting_inputs; if the greeting is found, we will randomly choose a response
from greeting_responses.
Look at the following script:
greeting_inputs = ("hey", "good morning", "good evening", "morning", "evening", "hi", "whatsup")
greeting_responses = ["hey", "hey hows you?", "*nods*", "hello, how you doing", "hello", "Welcome, I am good and you"]

def generate_greeting_response(greeting):
    for token in greeting.split():
        if token.lower() in greeting_inputs:
            return random.choice(greeting_responses)
Here the generate_greeting_response() function is responsible for validating the
greeting message and generating the corresponding response.
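The self-contained sketch below repeats the greeting function so that its behavior can be checked in isolation. The explicit return None and the sample inputs are additions made for this illustration.

```python
import random

greeting_inputs = ("hey", "good morning", "good evening", "morning", "evening", "hi", "whatsup")
greeting_responses = ["hey", "hey hows you?", "*nods*", "hello, how you doing", "hello", "Welcome, I am good and you"]


def generate_greeting_response(greeting):
    # The input is split on whitespace, so only single-word entries can match.
    for token in greeting.split():
        if token.lower() in greeting_inputs:
            return random.choice(greeting_responses)
    return None  # no greeting word found


print(generate_greeting_response("Good morning TennisRobo"))  # one of greeting_responses
print(generate_greeting_response("who is roger federer"))     # None
```

Since matching works token by token, "good morning" is recognized through the single word "morning", and a greeting with attached punctuation such as "hi!" is not recognized at all.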
Responding to User Queries
As we said earlier, the response will be generated based upon the cosine similarity
between the vectorized form of the input sentence and the sentences in the corpus. The
following script imports the TfidfVectorizer class and the cosine_similarity function:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
Now we have everything set up that we need to generate a response to user queries
related to tennis. We will create a function that takes the user input, computes its
cosine similarity with every sentence in the corpus, and returns the most similar
sentence. Look at the following script:
def generate_response(user_input):
    tennisrobo_response = ''
    # Temporarily add the query to the corpus so it is vectorized with it.
    article_sentences.append(user_input)

    word_vectorizer = TfidfVectorizer(tokenizer=get_processed_text, stop_words='english')
    all_word_vectors = word_vectorizer.fit_transform(article_sentences)
    # Compare the query (last row) with every row, including itself.
    similar_vector_values = cosine_similarity(all_word_vectors[-1], all_word_vectors)
    similar_sentence_number = similar_vector_values.argsort()[0][-2]

    matched_vector = similar_vector_values.flatten()
    matched_vector.sort()
    vector_matched = matched_vector[-2]

    if vector_matched == 0:
        tennisrobo_response = tennisrobo_response + "I am sorry, I could not understand you"
        return tennisrobo_response
    else:
        tennisrobo_response = tennisrobo_response + article_sentences[similar_sentence_number]
        return tennisrobo_response
You can see that the generate_response() function accepts one parameter, the user
input. Next, we define an empty string tennisrobo_response. We then append the user
input to the list of already existing sentences. After that, in the following lines:
word_vectorizer = TfidfVectorizer(tokenizer=get_processed_text, stop_words='english')
all_word_vectors = word_vectorizer.fit_transform(article_sentences)
We initialize the TfidfVectorizer and then convert all the sentences in the corpus,
along with the input sentence, into their corresponding vectorized form.
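To show what such a vectorized form contains, the sketch below computes TF-IDF weights by hand for a toy corpus. It assumes the smoothed formula idf(t) = ln((1 + n) / (1 + df(t))) + 1 that scikit-learn documents as the TfidfVectorizer default. It leaves out the final length normalization as well as the stop-word and lemmatization steps, so the numbers will not match scikit-learn's output. The function name and the corpus are made up.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document (no normalization)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_freq.items()
        })
    return vectors


corpus = ["federer won wimbledon", "nadal won paris", "federer"]
vectors = tfidf_vectors(corpus)
# "won" occurs in two documents and "wimbledon" in one, so "wimbledon" weighs more.
print(vectors[0])
```

Terms that appear in fewer documents get a larger weight, which is why rare words in a query have the most influence on which sentence is matched.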
In the following line:
similar_vector_values = cosine_similarity(all_word_vectors[-1], all_word_vectors)
We use the cosine_similarity function to find the cosine similarity between the last
row of the all_word_vectors matrix (which is the vector for the user input, since it
was appended at the end) and the vectors for all the sentences in the corpus.
Next, in the following line:
similar_sentence_number = similar_vector_values.argsort()[0][-2]
The argsort() call returns the sentence indexes ordered by ascending cosine similarity.
The last index normally belongs to the user input itself, because the input is also
compared with itself, so we do not select it. The second-to-last index points to the
corpus sentence with the highest cosine similarity to the user input.
Finally, we flatten the retrieved cosine similarities, sort them, and check whether the
second-highest value is equal to zero. If the cosine similarity of the matched vector is
0, our query did not have an answer. In that case, we simply return a message saying
that we do not understand the user query.
Otherwise, if the cosine similarity is not equal to zero, we found a sentence in our
corpus that is similar to the input. In that case, we use the index of the matched
sentence to look up the response in the article_sentences list, which contains all the
sentences.
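The index selection and the zero check can be followed on a small example. The sketch below mimics the ascending index ordering of argsort in plain Python; the similarity values are invented, with the last entry standing for the query compared with itself.

```python
# Hypothetical similarity row: four corpus sentences plus the query itself (last).
similarities = [0.0, 0.31, 0.12, 0.0, 1.0]

# Indexes ordered by ascending similarity, like numpy's argsort.
order = sorted(range(len(similarities)), key=lambda i: similarities[i])

best_index = order[-2]  # order[-1] is the query, which matches itself
best_score = similarities[best_index]

if best_score == 0:
    print("I am sorry, I could not understand you")
else:
    print("best match: sentence", best_index, "with similarity", best_score)
```

Here sentence 1 is chosen. If every corpus entry had a similarity of 0, the fallback message would be printed instead.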
Chatting with the Chatbot
As a final step, we need a way to chat with the chatbot that we just designed. To do
so, we will write a loop that keeps executing until the user types "bye".
Look at the following script; the code is explained after it:
continue_dialogue = True
print("Hello, I am your friend TennisRobo. You can ask me any question regarding tennis:")
while continue_dialogue:
    human_text = input()
    human_text = human_text.lower()
    if human_text != 'bye':
        if human_text == 'thanks' or human_text == 'thank you very much' or human_text == 'thank you':
            continue_dialogue = False
            print("TennisRobo: Most welcome")
        else:
            if generate_greeting_response(human_text) is not None:
                print("TennisRobo: " + generate_greeting_response(human_text))
            else:
                print("TennisRobo: ", end="")
                print(generate_response(human_text))
                # Remove the query again so it does not become part of the corpus.
                article_sentences.remove(human_text)
    else:
        continue_dialogue = False
        print("TennisRobo: Good bye and take care of yourself...")
In the script above, we first set the flag continue_dialogue to True. After that, we
print a welcome message to the user asking for any input. Next, we start a while loop
that keeps executing as long as the continue_dialogue flag is true. Inside the loop, the
user input is received and converted to lower case. The user input is stored in the
human_text variable. If the user enters the word "bye", continue_dialogue is set to
False and a goodbye message is printed to the user.
On the other hand, if the input text is not equal to "bye", we check whether the input
is "thanks", "thank you", or "thank you very much". If so, the reply "Most welcome" is
printed and continue_dialogue is set to False, which also ends the conversation.
Otherwise, the input is passed to generate_greeting_response. If that call returns a
greeting rather than None, the greeting is printed; if not, the generate_response
function is called, which fetches the response based on the cosine similarity, as
explained in the last section.
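The same branching can also be written as a small function that returns the reply instead of printing it, which makes each rule easy to test on its own. This is a restructuring sketch rather than the article's code: reply_to and the two stub functions are hypothetical names, and the stubs merely stand in for generate_greeting_response and generate_response.

```python
def reply_to(human_text, greet, respond):
    """Return (reply, keep_talking) for one line of user input."""
    human_text = human_text.lower()
    if human_text == 'bye':
        return "Good bye and take care of yourself...", False
    if human_text in ('thanks', 'thank you', 'thank you very much'):
        return "Most welcome", False
    greeting = greet(human_text)
    if greeting is not None:
        return greeting, True
    return respond(human_text), True


# Stub functions, used only for this demonstration.
def demo_greet(text):
    return "hello" if "hi" in text.split() else None


def demo_respond(text):
    return "(closest sentence for: " + text + ")"


print(reply_to("Hi", demo_greet, demo_respond))
print(reply_to("roger federer", demo_greet, demo_respond))
print(reply_to("Bye", demo_greet, demo_respond))
```

Passing the two functions in as arguments is a design choice made for this sketch, so that the rules can be exercised without building the corpus first.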
Once the response is generated, the user input is removed from the collection of
sentences since we do not want the user input to be part of the corpus. The process
continues until the user types "bye". You can see why this type of chatbot is called a
rule-based chatbot. There are plenty of rules to follow, and if we want to add more
responses, we have to define more rules.
Hello, I am your friend TennisRobo. You can ask me any question regarding tennis:
roger federer
TennisRobo:
C:\Users\usman\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py:399: UserWarning: Your stop_words may be inconsistent with your preprocessing. Tokenizing the stop words generated tokens ['ha', 'le', 'u', 'wa'] not in stop_words.
  'stop_words.' % sorted(inconsistent))
however it must be noted that both rod laver and ken rosewall also won major pro slam tournaments on all three surfaces (grass, clay, wood) rosewall in 1963 and laver in 1967. more recently, roger federer is considered by many observers to have the most "complete" game in modern tennis.
Copyright© 2020, Stack Abuse (https://stackabuse.com). All Rights Reserved.