Topic 7 Notes
Natural Language Processing (NLP) and Generation
Natural language processing (NLP) is a subfield of computer science, information engineering,
and artificial intelligence concerned with the interactions between computers and human (natural)
languages, in particular how to program computers to process and analyze large amounts of natural
language data.
Language Learning Survey
Speech recognition
Given a sound clip of a person or people speaking, determine the textual representation of the
speech. This is the opposite of text to speech and is one of the extremely difficult problems
colloquially termed "AI-complete" (see above). In natural speech there are hardly any pauses
between successive words, and thus speech segmentation is a necessary subtask of speech
recognition (see below). Note also that in most spoken languages, the sounds representing
successive letters blend into each other in a process termed coarticulation, so the conversion of
the analog signal to discrete characters can be a very difficult process.
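As a small illustration, the sketch below uses the third-party Python SpeechRecognition package to turn a sound clip into text. The package, the file name clip.wav, and the choice of the Google Web Speech backend are assumptions for the example, not part of these notes.

import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("clip.wav") as source:
    audio = recognizer.record(source)      # read the whole clip into memory

try:
    # Send the audio to a recognition backend and print the textual representation.
    print(recognizer.recognize_google(audio))
except sr.UnknownValueError:
    print("Speech was unintelligible")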
Speech segmentation
Given a sound clip of a person or people speaking, separate it into words. A subtask of speech
recognition and typically grouped with it.
Components of Expert Systems
Expert System
Basic Concept of an Expert System Function
The expert system consists of two major components: knowledge base and inference engine.
A knowledge base is a technology used to store complex structured and unstructured information
used by a computer system. A knowledge-based system (KBS) is a computer program that
reasons and uses a knowledge base to solve complex problems. The term is broad and is used to refer to
many different kinds of systems; the one common theme that unites all knowledge-based systems
is an attempt to represent knowledge explicitly, via tools such as ontologies and rules, rather than
implicitly via code the way a conventional computer program does. A knowledge-based system
has three types of sub-systems: a knowledge base, a user interface and an inference engine.
The knowledge base contains the domain knowledge which is used by the inference engine to draw
conclusions. The inference engine is the generic control mechanism that applies the axiomatic
knowledge to the task-specific data to arrive at some conclusion. When a user supplies facts or
other relevant information as a query to the expert system, he or she receives advice or expertise in
response. That is, given the facts, the system uses the inference engine, which in turn uses the
knowledge base, to infer the solution.
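To make the split between knowledge base and inference engine concrete, here is a minimal forward-chaining sketch in Python. The rule format, the example medical facts, and the forward_chain helper are illustrative assumptions, not a production expert system.

# Hypothetical knowledge base: each rule maps a set of required facts to a conclusion.
knowledge_base = [
    ({"has_fever", "has_rash"}, "suspect_measles"),
    ({"suspect_measles"}, "refer_to_doctor"),
]

def forward_chain(rules, facts):
    # Generic inference engine: apply rules until no new conclusions can be drawn.
    facts = set(facts)
    fired = []      # record which rules fired, as a simple explanation facility
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                fired.append((conditions, conclusion))
                changed = True
    return facts, fired

# The user supplies facts as a query and receives advice in response.
conclusions, explanation = forward_chain(knowledge_base, {"has_fever", "has_rash"})
print(conclusions)
print(explanation)

Note that the engine knows nothing about the domain; all domain knowledge lives in the knowledge base, which is exactly the separation described above.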
Objectives of expert systems are:
1. Increase the probability, frequency, and consistency of making good decisions.
2. Facilitate the distribution of human expertise.
3. Facilitate real-time, low-cost, expert-level decisions by the non-expert.
4. Enhance the utilization of most of the available data.
5. Permit objectivity by weighing evidence without bias and without regard for the user's
personal and emotional reactions.
6. Permit dynamism through modularity of structure.
7. Free up the mind and the time of the human expert to enable him or her to concentrate on
more creative activities.
8. Encourage investigations into the subtle areas of a problem.
Characteristics of Expert Systems
High performance: They should perform at the level of a human expert.
Adequate response time: They should have the ability to respond in a reasonable amount of time.
Time is crucial, especially for real-time systems.
Reliability: They must be reliable and should not crash.
Understandable: They should not be a black box; instead, they should be able to explain the steps of the
reasoning process and justify their conclusions in the same way a human expert explains why
he or she arrived at a particular conclusion.
Shell
A shell is a special-purpose tool designed based on the requirements of a particular application.
The user supplies the knowledge base to the shell. An example of a shell is EMYCIN (Empty
MYCIN). The shell manages the input and output, processes the information given by the user,
relates it to the concepts contained in the knowledge base, and provides a solution for the particular
problem.
Development of Expert Systems: General Steps
The process of ES development is iterative. Steps in developing the ES include:
1. Identify Problem Domain
The problem must be suitable for an expert system to solve.
Find the experts in task domain for the ES project.
Establish cost-effectiveness of the system.
2. Design the System
Identify the ES Technology.
Know and establish the degree of integration with the other systems and databases.
Realize how the concepts can represent the domain knowledge best.
3. Develop the Prototype
Form Knowledge Base: The knowledge engineer works to:
Acquire domain knowledge from the expert.
Represent it in the form of IF-THEN-ELSE rules.
4. Test and Refine the Prototype
The knowledge engineer uses sample cases to test the prototype for any deficiencies in
performance.
End users test the prototypes of the ES.
5. Develop and Complete the ES
Test and ensure the interaction of the ES with all elements of its environment, including end
users, databases, and other information systems.
Document the ES project well.
Train the user to use ES.
6. Maintain the System
Keep the knowledge base up-to-date by regular review and update.
Cater for new interfaces with other information systems, as those systems evolve.
Benefits of Expert Systems
a. Availability: They are easily available due to mass production of software.
b. Less Production Cost: Production cost is reasonable. This makes them affordable.
c. Speed: They offer great speed. They reduce the amount of work an individual puts in.
d. Less Error Rate: Error rate is low as compared to human errors.
e. Reducing Risk: They can work in environments dangerous to humans.
f. Steady response: They work steadily without getting emotional, tense, or fatigued.
Advantages of Expert Systems
Availability: Expert systems are easily available due to the mass production of software.
Cheaper: The cost of providing expertise is relatively low.
Reduced danger: They can be used in risky environments where humans cannot work.
Permanence: The knowledge lasts indefinitely.
Multiple expertise: They can be designed to hold the knowledge of many experts.
Explanation: They are capable of explaining in detail the reasoning that led to a conclusion.
Fast response: They can respond at great speed due to the inherent advantages of computers over
humans.
Unemotional and responsive at all times: Unlike humans, they do not get tense, fatigued, or panicked,
and they work steadily during emergency situations.
Revision Question
a) For most expert systems, it is crucial for the system developer to interact with a domain
expert. What is the process called, and what can the domain expert be used for?
b) Is an Expert System appropriate for this problem? Why or why not?
Topic 8: Genetic Algorithms
Introduction
A genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection that
belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly
used to generate high-quality solutions to optimization and search problems by relying on bio-
inspired operators such as mutation, crossover and selection. John Holland introduced the genetic
algorithm in the 1960s based on the concept of Darwin's theory of evolution; afterwards, his
student David Goldberg extended GA in 1989.
Objectives
By the end of this topic you should be able to:
i. Describe Genetic Algorithms
ii. Build genetic algorithms
iii. Create Monte Carlo simulations
iv. Investigate cellular automata
Learning Activity 8.1: Discuss how to grow a decision tree
Learning Activity 8.2: Essay
Write an essay on using heuristics and designing fitness functions.
Assessment
Activities 8.1 and 8.2 will be graded.
Topic Resources
1. Akbari, Ziarati (2010). "A multilevel evolutionary algorithm for optimizing numerical
functions". IJIEC 2 (2011): 419–430.
2. Ting, Chuan-Kang (2005). "On the Mean Convergence Time of Multi-parent Genetic
Algorithms Without Selection". Advances in Artificial Life.
URL Links
https://en.wikipedia.org/wiki/Genetic_algorithm#Genetic_operators
Topic 1 Notes
Decision Tree
Decision trees come in two forms: classification trees and regression trees. In both cases, you
ask questions and take a branch depending on the answer. Once you reach a leaf node, you reach
a decision. The questions asked can be categorical (e.g., which color) or numerical (e.g., how
high). For a classification tree, the leaf is a category, like inside or outside a paper bag. For a
regression tree, the leaf is an equation that gives a numeric value.
You can present your decision tree as a tree or a list of rules. In tree form, it looks like a flowchart.
You can then transform this flowchart into a list of if-then-else rules by writing down the
questions at each branch. You can also transform the rules into a tree. Each if statement makes
a branch, and the then and else statements make sub-trees or leaves.
There are two main ways to build a decision tree: bottom-up and top-down induction of decision
trees. The bottom-up approach builds a classifier from one data item at a time, whereas the top-
down approach starts with all of the training data and then gradually divides it.
With the bottom-up approach, let’s say you have a list of data that’s classified as “good” or
“bad” and includes letter and number features:
data = [['a', 0, 'good'], ['b', -1, 'bad'], ['a', 101, 'good']]
label = ['letter', 'number', 'class']
You can make a rule from the first data item using Pythonesque pseudocode.
if letter == 'a' and number == 0 then return 'good'
else
    return 'No idea'
You can then gradually relax existing rules or add new rules as more data is considered. When
you use the next data item, you add another rule:
if letter == 'a' and number == 0 then return 'good'
else if letter == 'b' and number == -1 then return 'bad'
else
    return 'No idea'
The third data item lets you collapse these down. You already have letter 'a' mapping to 'good',
and it doesn’t matter what the number is:
if letter == 'a' then return 'good'
else
    return 'bad'
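A runnable Python version of these hand-built rules might look like the sketch below; the function names and the final assertion loop are added for illustration.

# Rule learned from the first data item only.
def rule_v1(letter, number):
    if letter == 'a' and number == 0:
        return 'good'
    return 'No idea'

# After the second data item, another rule is added.
def rule_v2(letter, number):
    if letter == 'a' and number == 0:
        return 'good'
    if letter == 'b' and number == -1:
        return 'bad'
    return 'No idea'

# The third data item shows the number does not matter, so the rules collapse.
def rule_v3(letter, number):
    return 'good' if letter == 'a' else 'bad'

data = [['a', 0, 'good'], ['b', -1, 'bad'], ['a', 101, 'good']]
for letter, number, expected in data:
    assert rule_v3(letter, number) == expected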
In contrast, the top-down approach uses all of the training data and gradually divides it up to build
a tree. When the letter is 'a', you get 'good'; when you have 'b', you get 'bad'. You can encode
this as a decision tree.
For this small dataset, the split point is easy to find. However, as you add more data, you’ll need
a way to determine on which feature to split. As mentioned earlier, you’ll use entropy. Entropy
has a formal definition in thermodynamics relating to the chaos of a system. In information
theory, it measures uncertainty. If you toss a two-headed coin, you can be certain you will get
heads every time. If you toss a normal coin, you have a 50/50 chance of getting heads. The normal
coin has more entropy. The coin type allows you to predict what can happen in the future. In the
case of the lost turtle, you don’t have heads or tails, but you do have x and y coordinates, and it
works in much the same way.
Using a Python dictionary to represent the trees, you can start with an empty dictionary:
tree={}. The key tells you which attribute to split on, and the value tells you what to do with
split data. This’ll either be a category or another tree.
For the letter and number data, you can make the letter a key which represents the split or branch.
The corresponding value will then need two leaf nodes (one for each value); these are also
dictionaries. One maps 'a' to 'good' and the other maps 'b' to 'bad'. Your tree looks like this:
tree = {'letter': {'a': 'good', 'b': 'bad'}}
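Given that nested-dictionary representation, classifying an item is a short recursive lookup. The classify helper below is a sketch added for illustration; it assumes each item is a dictionary keyed by feature name.

tree = {'letter': {'a': 'good', 'b': 'bad'}}

def classify(tree, item):
    # Walk the nested dictionaries until a category (a non-dict leaf) is reached.
    if not isinstance(tree, dict):
        return tree
    feature = next(iter(tree))            # the attribute this node splits on
    subtree = tree[feature][item[feature]]
    return classify(subtree, item)

print(classify(tree, {'letter': 'a', 'number': 42}))   # 'good'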
Divide Your Data
Once you have a way to choose features for your branches, you can partition your data
recursively in a similar way to quicksort. Think about how quicksort works:
• Pick a pivot; one element in the data.
• Rearrange the data into two groups; less than or equal to the pivot in one, everything else
in the other.
• Apply the first two steps to each group, until you have groups of one or zero elements.
Quicksort uses a pivot point to divide the data into low and high values. A decision tree
partitions the data using a feature instead of a pivot, but still recursively builds up a decision
tree. By keeping track of the features on which you split, you can report the decision tree you
built and try it on any data. You can also transform the tree into equivalent rules. This gives
you a choice of ways to report what you discover.
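A partition step in the quicksort spirit, but keyed on a feature value rather than a pivot, could look like this sketch (the helper name and the row layout follow the earlier letter/number example):

def partition(data, feature_index):
    # Group rows by the value they hold in the chosen feature column.
    groups = {}
    for row in data:
        groups.setdefault(row[feature_index], []).append(row)
    return groups

data = [['a', 0, 'good'], ['b', -1, 'bad'], ['a', 101, 'good']]
print(partition(data, 0))
# {'a': [['a', 0, 'good'], ['a', 101, 'good']], 'b': [['b', -1, 'bad']]}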
You sometimes end up with lots of rules. In the worst case, you can get one rule per training
data item. There are various ways to prune these back. Concerning the turtle point dataset, you
can use the fact that the paper bag was square to transform a large ruleset into succinct rules.
How to Grow a Decision Tree
You build a tree from leaf nodes and sub-trees. The algorithm looks a lot like quicksort,
partitioning the data and proceeding recursively:
ID3(data, features, tree = {}):
    if data is (mostly) in same category:
        return leaf_node(data)
    feature = pick_one(data, features)
    tree[feature] = {}
    groups = partition(data, feature)
    for group in groups:
        tree[feature][group] = ID3(group, features)
    return tree
You partition the data into groups with the same value of your chosen feature. You build up
sub-trees and make a leaf node when all of the data is in the same category—or it is mostly
in the same category. This might be just one data item.
To decide a feature on which to partition the data, you can pick a feature at random, then build
a random forest and vote to form a decision. Unfortunately, there's not space to cover forests
in this book, but they're worth trying out. Instead, for this exercise, you'll build an ID3 decision
tree using a more direct approach.
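As a runnable counterpart to the pseudocode above, here is one possible Python sketch of the recursive builder. The majority-vote leaf, the choose_feature argument (random by default, as mentioned above), and the row layout with the class label in the last column are all assumptions for illustration; an entropy-based chooser, discussed next, can be passed in instead.

import random
from collections import Counter

def majority_class(rows):
    # Leaf node: the most common class label (last column) among these rows.
    return Counter(row[-1] for row in rows).most_common(1)[0][0]

def partition(rows, feature):
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row)
    return groups

def build_tree(rows, features, choose_feature=lambda rows, features: random.choice(features)):
    classes = {row[-1] for row in rows}
    if len(classes) == 1 or not features:
        return majority_class(rows)                 # (mostly) one category: make a leaf
    feature = choose_feature(rows, features)
    remaining = [f for f in features if f != feature]
    tree = {feature: {}}
    for value, group in partition(rows, feature).items():
        tree[feature][value] = build_tree(group, remaining, choose_feature)
    return tree

data = [['a', 0, 'good'], ['b', -1, 'bad'], ['a', 101, 'good']]
print(build_tree(data, features=[0, 1]))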
How to Decide the Best Feature
You can use all kinds of criteria for selecting features. Let’s think generally first. Consider
four points, (0, 0), (1, 0), (0, 1), and (1, 1). Suppose the first two are inside your bag and the
last two are outside.
You only have two features from which to choose: the x and y values of the coordinates. The
x coordinate can be inside or outside of the bag regardless of whether it’s a value of 0 or 1.
However, the y coordinate is only outside of the bag if its value is set to 1. With that
knowledge, you can make this into a decision tree.
Of course, some points can be below the bottom of the bag or beyond its edges, so this isn't
the full story. ID3 only knows what to do with feature values it was trained on. There are only
four possible points using these x and y values, so you cannot use the decision tree it makes
on any other data points. You will be able to make trees for other data using the same method,
though. You can try your algorithm on a larger paper bag, with edges at x=-1, x=1, y=-1, y=1,
and use five training points: the center (inside the bag) and four points on the edge:
data = [[0, 0, False], [-1, 0, True], [1, 0, True], [0, -1, True], [0, 1, True]]
label = ['x', 'y', 'out']
You can now make new combinations of x and y values you haven’t included in the training
data, such as [1, 1]. You will be able to build a decision tree that classifies this new
coordinate.
For now, let’s consider how to use the four data points. You want to find the purity a feature
gives you when you use it to partition your data. For the four points, using the y coordinate
gave you two pure groups. There are many ways to measure this purity. Some use
probabilities, and others use formal statistical measures. This advanced topic is not covered
in this book, but a web search for “decision tree split quality” should get you started. However,
since entropy doesn’t need detailed statistical knowledge, it’s a good place to start. You can
try other ways to select features once you have learned how to build a decision tree.
Entropy tells you how homogeneous a dataset is. Entropy captures the concept of randomness or
chaos in a system. Data with less entropy contains more structure and can be compressed more
easily. If everything is identical, like with a two-headed coin (as we discussed on page 17),
you have zero entropy.
Entropy uses logarithms. The logarithm of a number, in a base, tells you what power of the base
gives that number. 2^3 = 8, so the logarithm of 8 is 3 in base two. 3^2 = 9, so the logarithm of 9 is
2 in base three. Also, notice you can add the powers:
(2 × 2 × 2) × (2 × 2) = (2 × 2 × 2 × 2 × 2) = 2^5
⇒ 2^3 × 2^2 = 2^(3+2) = 2^5
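A sketch of the entropy calculation, together with an information-gain style measure for comparing candidate split features, might look like this. The function names, and measuring entropy in bits with a base-2 logarithm, are choices made for the example rather than anything fixed by the notes.

import math
from collections import Counter

def entropy(rows):
    # Shannon entropy, in bits, of the class labels (last column) in rows.
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())

def information_gain(rows, feature):
    # How much splitting on this feature reduces entropy, on average.
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(rows) - remainder

# The four corner points from earlier: y gives the purer split, so it has the higher gain.
points = [[0, 0, 'in'], [1, 0, 'in'], [0, 1, 'out'], [1, 1, 'out']]
print(information_gain(points, 0), information_gain(points, 1))   # 0.0 1.0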
Operators
The next step is to generate a second-generation population of solutions from those selected,
through a combination of genetic operators: crossover (also called recombination) and mutation.
For each new solution to be produced, a pair of "parent" solutions is selected for breeding from
the pool selected previously. By producing a "child" solution using the above methods of crossover
and mutation, a new solution is created which typically shares many of the characteristics of its
"parents". New parents are selected for each new child, and the process continues until a new
population of solutions of appropriate size is generated. Although reproduction methods that are
based on the use of two parents are more "biology inspired", some research suggests that more
than two "parents" generate higher quality chromosomes.
These processes ultimately result in the next generation population of chromosomes that is
different from the initial generation. Generally, the average fitness of the population will have
increased by this procedure, since only the best organisms from the first generation are selected
for breeding, along with a small proportion of less fit solutions. These less fit solutions ensure
genetic diversity within the genetic pool of the parents and therefore ensure the genetic diversity
of the subsequent generation of children.
Opinion is divided over the importance of crossover versus mutation. There are many references
in Fogel (2006) that support the importance of mutation-based search. Although crossover and
mutation are known as the main genetic operators, it is possible to use other operators such as
regrouping, colonization-extinction, or migration in genetic algorithms.
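A minimal sketch of the two main operators on bit-string chromosomes, assuming single-point crossover and per-gene bit-flip mutation (other encodings and operators are possible):

import random

def crossover(parent_a, parent_b):
    # Single-point crossover: the children swap tails after a random cut point.
    point = random.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:], parent_b[:point] + parent_a[point:]

def mutate(chromosome, rate=0.01):
    # Flip each bit independently with probability `rate`.
    return [1 - gene if random.random() < rate else gene for gene in chromosome]

mum = [0, 0, 0, 0, 0, 0, 0, 0]
dad = [1, 1, 1, 1, 1, 1, 1, 1]
child_a, child_b = crossover(mum, dad)
print(mutate(child_a), mutate(child_b))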
Parameters
It is worth tuning parameters such as the mutation probability, crossover probability and
population size to find reasonable settings for the problem class being worked on. A very small
mutation rate may lead to genetic drift (which is non-ergodic in nature). A recombination rate that
is too high may lead to premature convergence of the genetic algorithm. A mutation rate that is
too high may lead to loss of good solutions, unless elitist selection is employed.
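Pulling the pieces together, the sketch below shows where the population size, crossover probability, mutation rate, and an elitist step fit into one generation loop. The one-max fitness function (count the 1s), tournament selection, and all parameter values are illustrative assumptions, not recommended settings.

import random

POPULATION_SIZE = 20      # tunable parameters for the problem class being worked on
CROSSOVER_RATE = 0.7
MUTATION_RATE = 0.01
GENOME_LENGTH = 16
GENERATIONS = 50

def fitness(chromosome):
    return sum(chromosome)                # toy "one-max" problem: maximise the number of 1s

def select(population):
    # Tournament selection: the better of two randomly chosen individuals.
    a, b = random.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(a, b):
    if random.random() > CROSSOVER_RATE:
        return a[:], b[:]
    point = random.randint(1, GENOME_LENGTH - 1)
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chromosome):
    return [1 - g if random.random() < MUTATION_RATE else g for g in chromosome]

population = [[random.randint(0, 1) for _ in range(GENOME_LENGTH)]
              for _ in range(POPULATION_SIZE)]

for generation in range(GENERATIONS):
    best = max(population, key=fitness)
    next_generation = [best[:]]           # elitism: always keep the best solution so far
    while len(next_generation) < POPULATION_SIZE:
        child_1, child_2 = crossover(select(population), select(population))
        next_generation.extend([mutate(child_1), mutate(child_2)])
    population = next_generation[:POPULATION_SIZE]

best = max(population, key=fitness)
print(best, fitness(best))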