UNIT-3
Semantics and Pragmatics
3.1 Requirements for Representation
One would consider the need of NLP in order to allow people or machines that speak 2
very distinct and different languages to be able to communicate and share ideas with
each other throughout the entire process. One can represent the meaning of these
languages in the way people talk and make conversation. Each conversation is made
up of sentences put together. So these sentences are made up of words which are then
made up of letters. The letters are pronounced in the way that they are phonetically
correct.
If we want to refer to the meaning of the text that we are using we call it semantics. To
have a language that can be correctly interpreted it is very important that the
semantics describe how people think and learn about different ideas being portrayed
in that language. This theory of unifying all meanings for any set language may not be
available readily which is why people resort to different types of methods to achieve
the same. This being us to the question of why representation and why do people use it
?
Verifiability
As the name suggests it talks about the verification of a certain text or other ideas
present in the NLP semantics. We can define this as the ability of any system to
compare or model the knowledge achieved from one data model to the knowledge
achieved from a knowledge base. This simply means that one can match their
knowledge against the facts that are already in existence about the same.
E.g. India is a country of only 1 religion.
This statement if made by one of the worlds described is compared to the one with all
trained realistic situations and data in it, it would come out as false. This is how
semantics helps people understand of their very own data is correct or incorrect by
comprehending the meaning of the data.
Unambiguous Representation
NLP is filled with ambiguous situations which was one of the main problems we studied
in the previous model. We need to have a lot more than just a single context in order to
make any inferences from an ambiguous statement. So let us take an example of the
following.
E.g. Rakesh wants to eat at a place close to the Hawa Mahal.
This statement is filled with ambiguity because of the fact that there could be so many
different places to eat near ‘Hawa Mahal’. Rakesh would like to try one of them but
which one is not known for sure. There needs to be a certain kind of appropriate
representation analysis of the same.
However, when we talk about the representation of a system we need to ensure that it is
free from any type of ambiguity. This is because the representation talks about the
semantics (meanings). For the user to allow the system to comprehend any amount of
information one would need to understand the whole process of determining the same.
Vagueness
Once again this concept is very much related to the previous term called
ambiguity. Vagueness is a form of ambiguity which simply does not allow parts of the
sentence to be specified. The meaning of the representation is thus not achieved. One
must note that it does not give rise to different types of representations for the same
string.
E.g. Mukesh likes to eat Indian food.
The cuisine of the food is specified. However, the cuisine could include so many dishes
which makes it very difficult to represent the meaning accurately without potentially
causing a blunder in the same.
Canonical Representation
This is the concept where one is able to map different inputs to the same output. The
form that is given as a meaning is forged as part of a representation. We could use
simple grammatical rules to relate the sentences and form a certain meaning of the
same. We would be able to easily understand this with the help of examples.
1. Does Sagar serve vegetarian food ?
2. Do they have veg. food at Sagar ?
3. Does Sagar have any vegetarian dishes ?
Now if we look at all of the 3 examples given above it is sure that one would understand
the relation between all 3 of them. They all imply the same thing but with words that are
not so similar. The system might not be able to comprehend the words like ‘food’,
‘dishes’ together. This would make it very difficult to represent the meaning. However,
we could do this by simply converting them to a simple and general form with a
difference in the placement as well.
1. Sagar serves vegetarian food.
2. Vegetarian food is served at Sagar.
A simple yet elegant representation shows us that this is a clear representation that
follows the two parts of ‘Sagar’ and ‘Vegetarian food’ being related to each other
despite being placed in different parts of the sentence.
Inference
When we deal with sentences that are a little more complex than the above examples
we need our system to understand the context and make inferences based on the data
it is trained with.
E.g. Can Vegetarians eat food at Sagar ?
This sentence ultimately asks the question about the availability of veg. food at Sagar.
One would need to infer this form the statement considering the key ‘tokens’ that are
available. Inference is used to describe the inputs and give some background
knowledge about the same.
Key Takeaways:
Verifiability talks about the verification of a certain text or other ideas present
in the NLP semantics.
Vagueness is a form of ambiguity which simply does not allow parts of the
sentence to be specified.
Inference is used to describe the inputs and give some background
knowledge about the same.
3.2 First – Order Logic & Description Logic
First -Order logic which is also referred to as FOL is a logic that has been developed to
understand and comprehend the meaning representations that could come out of
every different string that we might intend to process. It allows the system to run a
computational analysis to infer the requirements and match the data in order to
provide appropriate semantic analysis of the natural language.
FOL also has a feature that allows it to have a say in the representation of different
structures that might be part of the whole process. Schemes that are related may
achieve a certain amount of inference from the above. We will further deal with the
elements that make up First - Order Logic and its applications.
1. Term
This refers to the examination of a certain notion that is part of the string that is to be
processed. The representation of certain terms can be done wit the help of 3 different
blocks given below.
Constants
FOL has constants that basically refer to a certain object that is being described. These
can easily be identified because of their representation. IT is either in the form of a
type of letter like ‘A or ‘Z’’. This may also be possible for the constant if they are
depicted with the help of proper nouns like ‘Sagar’ or ‘Hari’.
Functions
FOL uses this term in order to correspond to the semantics that would describe the
current status or state of an object. We could talk about the location of ‘Hari’. This would
have a representation similar to.
Representation = Location of ( Hari )
These functions are same in terms of their syntax. They will always refer to objects that
are unique. Most of the cases are very favourable to get an idea about the same. But for
most of the part they are all unique and represented as stated above.
Variables
This is the mechanism given by FOL which is used in order to simply refer to objects.
These are depicted in the form of lower case letters. We can use this in order to make
certain inferences about the object that we have in question. We can easily relate a
certain object to one that is distant in the form of other meanings.
Representation = Serves (Sagar, Vegetarian)
Semantics of First – Order Logic
There is a table that needs to be followed in order to make the correct inferences
based on the logic provided by the notations. One can follow up with the table in order
to make appropriate deductions.
¬ = NOT
∧ = AND
∨ = OR
F = FALSE
T = TRUE
P B ¬P P∧Q P∨Q PQ
F F T F F T
F T T F T T
T F F F T F
T T F T T T
Description Logic
To allow the language to work the magic that it has it is very important to understand
the references that have been made about it. The description of the same needs to be
done very well. We us a different type of logic in order to correctly specify different
types of semantics for the structured representations. A framework has been created in
order to get a grip on the model. It essentially describes the set of FOL by being part
of the subsets.
Terminology
We need to emphasize the process of representation while having descriptive logics
work to separate the categories. It is simply the backbone of the entire process of
description logic. We can call all the categories as well as the concepts that go into the
application as terminology.
We also have a base of knowledge that is known to contain all this terminologies and is
rightly known as TBox. Apart from this there is also a box that contains certain facts
about the data that we are intending to describe. This is called the ABox. Further, all
these terminologies are arranged in the form of hierarchy to produce inferences
through set theory. This process is also known as ontology.
Subsumption
Now that we have an idea about all the facts that can go into the mainframe while
describing the algorithm for logic. We look at the different sets that are formed as part
of the ontology. Let us say we have 2 sets C and D. These 2 are represented with the
same symbol as that of a ‘proper subset’. It is also read as C is subsumed by D. This
simply implies that all the elements that are part of the C set are also part of the D set.
We can see this in a diagrammatic representation of the same very easily. Here we area
referring to different cuisines being part of the whole term known as ‘restaurant’.
Semantic Roles and Analysis
Semantics is the study that allows people to work and get a better idea of the
happenings between two languages. These representations of the shallow semantics
are very important. We need to have a certain number of labels that can assign riles to
the different types of restrictions.
Semantics is essentially used in order to provide its construction segments with
meaning. The tokens are given a certain discrete structure. Symbols as well as other
features are easily interpreted by the study of semantics. The analysis part is only key
part in deciding whether the structure that was created using semantics is useful in
terms of meaning or not. We can write this in the form of a formula as :
CFG (Context Free Grammar) + Semantic Rules = Syntax Definitions
This usually depends heavily on the concurrent parsing or other different levels of
parsing. E can use Lambda expressions as a combination with the same in order to find
the FOL arguments. We use the Principle of Compositionality in order to study the
syntax that is used to understand the sematic analysis. This has 2 main factors that
speak volumes about the process given below.
The meaning of certain parts or tokens are used in order to build the
meaning of the entire corpus.
There are certain relations like syntactic ones that are used to guide the
following process.
The semantic analysis is first done by conduction the lexical analysis of the entire
paraphrase which tells us whether the structure is fundamentally sound. Sematic
analysis is key in order to evaluate the grammar and other language use. We can do the
same as follows :
1. Resolution of Scope
2. Checking of the Type
3. Checking bound to arrays
Semantic Errors :
There are a few different types of errors that can easily be recognized by the analyzer.
A list of these are given below :
Mismatching of the type
Variable not being declared
Misuse of a reserved identifier
Multiple variables being declared
Accessing a variable that is out of scope
The formal parameter mismatching with the actual one
Attribute Grammar
This is a very special type of grammar that is made up of CFG along with the addition
of certain attributes. There are appended to the non-terminal components. The values
are well-defined. E.g. char, float, int, string, etc.
One can easily understand the semantics that is used in a programming language with
the help of attribute grammar. Value scan be passed of the same while being seen in
the form of a parse tree.
A A + T { A.value = A.value + T.value }
There are also 2 different types of attributes used which are :
1. Synthesized attributes
2. Inherited attributes
There 2 separate approaches that are used in the process of Semantic analysis.
1. Unification based Approach
2. Store and Retrieve Approach
Word Senses
Words are used in any language in order to construct sentences which will go on
forward to make more sense. This is a very simple method used in order to
communicate between different people around. Words can also mean a lot of different
things. This involves a certain sense of ambiguity surrounding them.
Word sense can be defined as the representation of a certain aspect that makes a
word discretely. This is controlled online with the help of the WordNet. This is basically
a database that is used in order to represent all of the word senses easily. We need
to establish a certain relationship between the words that are used in order to make the
relations a little simpler. Let us take 2 words which are synonyms. Having a relation
between both of the words can make it easier for the language to process the same
when asked.
A sense can give very discrete understanding about the word. Let us take the English
language as an example. There are a few words that could have different meanings and
at the same time be pronounced and spelled the same.
E.g.
1. Mouse [1] = A device used to scroll around a desktop computer.
2. Mouse [2] = An animal that is usually found in houses and other places with
food.
1. Bank [1] = A place where people store money and also make financial
transactions.
2. Bank [2] = The part beside a river.
The above words that are used to give textual definitions for the words in order to have
a unique word sense are known as glosses. These are not exactly used to give a formal
description but come in handy to the users going through the algorithm.
Number of Senses for a Word
The number of words that can be same and have different meanings are endless. To
analyze and get a better understanding we can build something known as
a thesaurus. This can provide the user with a certain specific criteria that is easily
used to pose different conditions. These can be distinct as well as similar. One can join
two word senses and form a conjunction which is known as zeugma.
E.g.
1. Which of these flights serve lunch ?
Does Air Asia serve Mumbai ?
? Does Air Asia serve lunch and Mumbai ?
The question mark ( ? ) is used in order to depict sentences that are semantically
incorrect or wrongly formed. We cannot find a suitable way to explain the 3rd case
which is an example of zeugma.
Relations Between Senses
Word sense talk about the words that have discrete meanings. Here wee will look at
how these senses could be related to each other based on different factors.
1. Synonymy
When 2 words that are in a certain string may or may not have the same spelling and
pronunciation but have meanings that are exactly the same or very identical, we can
call them synonyms. The technical term while processing the language for terms like
these are also called synonymy.
E.g.
Sofa = couch
Throw up = vomit
Car = vehicle
Back = behind
Finish = End
Big = Huge
A synonymy works as more of a relation between senses that it does between words.
We only need to be able to grasp the meaning of the word and find out if we could
swap a word in place of that. This is the main aim.
Antonymy
This is exactly the opposite of the previous terminology that we learned about. When 2
words that are in a certain string may or may not have the same spelling and
pronunciation but have their meanings exactly opposite to each other are known as
antonyms.
E.g.
Short = long
In = out
Outside = inside
Behind = front
Hot = cold
Wet = dry
An antonymy works as more of a relation between the negation of senses rather than
simple words. There are also a separate type of antonymy’s which are known to be the
opposites for those words that change with respect to opposite directions. These are
also known as reverslves.
Taxonomic Relations
We must also consider that there are different ways in which one could relate word
senses. The main one would be known in two types.
Hyponym
A word sense is also called a hyponym if it makes the previous word a little more
specific about itself. It essentially is a part of the subclass of the same.
E.g.
Cat is the hyponym of animal.
Apple is the hyponym of fruits.
Hypernym
This is the exact opposite of a hyponym. In a hypernym the same two words that were
hyponyms, when reversed become this. They simply switch places in order to exist.
E.g.
Animal is the hypernym of cat.
Fruit is the hypernym of apple.
Meronym
We could call this relation one that tells us the relation between the 2 words as a whole.
One of the words is part of the set of another. The word A would be part of the set B.
E.g.
Wheel is the meronym of cycle.
Knee is a meronym of leg.
Structured Polysemy
We can make a relation between the 2-word sense through semantics. This is also
known as structured polysemy.
Thematic Roles
The algorithm which assigns the verb to the Thematic properties are called Thematic
Roles.
1. Agent – The action of the verb is carried out by this particular entity. This
action is carried out intentionally in order to know that the function is taking
place.
2. Experiencer – This refers to a certain emotion that is carried out by a
certain entity. This can also tell us about the perception that the entity has for
the verb who is carrying out the action.
3. Theme – If there is an entity that is going to receive the action that is carried
on by the agent. This is the work of the verb to carry out the action on it.
4. Instrument – This is basically the person or entity that is going to carry out
the action that is happening to the theme. If there was a mediator in any
process, the instrument would have a role very synonymous to it.
5. Goal – After the action is carried out by the verb, the entity might move in a
certain direction (metaphorically). This is essentially the goal or the final
objective of the verb action.
6. Source – Just as the verb is carried out on the theme, one would expect
some effect from it as well. This could also be influenced by the direction from
where the verb has originated.
7. Location – Just as the word says correctly it talks about the location of the
action happening. Here, the action that takes place between the agent and
theme through the mediation of the instrument has a certain location.
8. Benefactive – The final result may have consequences on the theme or
anything that is directly or indirectly influenced by it. The entity that is
objected to the abstract feeling delivered to it by the verb is known as
benefactive.
E.g.
1. Raju hit Shyam
Here Raju is the agent and Shyam is the theme.
Baburao bought his dhoti from Raju at the shop
Here Baburao is the agent, Raju is the theme and the shop is the source.
Raju was fooled in his house by Laxmi Chit Fund.
Here Raju is the theme, Laxmi Chit Fund is the agent and the house is the location.
Key Takeaways:
Term, constant, function and variable are ethe main features of FOL.
The algorithm which assigns the verb to the Thematic properties are
called Thematic Roles.
An antonymy works as more of a relation between the negation of senses
rather than simple words.
Word sense can be defined as the representation of a certain aspect that
makes a word discretely. This is controlled online with the help of the
WordNet.
3.3 Word Sense Disambiguation
The process of making the selection of an appropriate word sense to suit a word that is
chose is known as word sense disambiguation. A lot of algorithms are written based
in the above in order to have potential outputs being scanned. This can be done in a lot
of different steps which are noted below.
Tasks and Datasets of the WSD Algorithm
There are different tags that are essential in order to get the correct output. We need to
have proper datasets that can be used in order to process different algorithms being
used in the above section. WordNet is a good source of resources that can be used to
train the same model and algorithm. Let us consider the word ‘BASS’. This is going to
have different meanings which can be seen easily with the help of the diagram that is
given below.
We could also go through the algorithm being used for different words that serve as
input models for the above. Nouns, adjectives as well as other parts of speech are part
of this algorithm.
Evaluation of WSD algorithm using thesaurus
Evaluating a WSD algorithm can be done with th help of 2 heuristics. These are key to
understand the relations between the word sense and help analyze them further and
bring out inferences as well.
These systems are evaluated in the intrinsic form. We can use the F1 hand-labelled tags
which can evaluate it on the basis of each corpora. To understand the same we use a
term that is known as most frequent sense. We have to choose each word or token
from the labels that are present in the corpus. Each word may be tagged as well with
their counts coming from the WordNet part. This baseline of most frequent sense is
fairly accurate and can be used as a default in order to train a supervised algorithm,
especially with data that is insufficient.
There was another observation made while going through this algorithm which stated
that a certain word may appear multiple times in a corpus with the same sense. This
heuristic is known as one sense per discourse. We notice that this case runs very well
for word sense that are in the form of homonymy than simply polysemy. There is bound
to be some sense of ambiguity throughout the whole process of evaluation.
1 – nearest – neighbour – algorithm
When we look at all the different WSD algorithms that are present around us we notice
that this particular algorithm that uses word embeddings that are contextual, stands out
to be one of the most efficient.
When we pass each string through a certain algorithm at the time of training, there is
embedding done for each word or token that is selected from the string. When we
take each sense ci along with each word c, we are able to compute the average
embedding sense Vs through that sentence. The formula is given below as follows :
For testing the algorithm, we simply use the formula below to calculate the embedding
and also choose appropriate nearest neighbours. All of this is done from the training
set.
Hidden Markov Model
A Hidden Markov Model (HMM) is a model that is a modified stochastic tagging model
with the main process being underlined throughout the main process. The result is
simply the same if not more accurate. However, the processes that get done cannot be
seen. The results of each sequence throughout the process is the only part that can be
seen and comprehended.
To understand this a little better we need to understand the term Markov chain. This
is simply a model that gives us inferences about the sequences from the different
probability densities occurring throughout the corpus. To make a prediction the
markov chain relies heavily on the current position of the tag or word count in the
string. The Hidden Markov Model allows the user to dwell around both visible events
(input strings) as well as hidden events (POS tagging).
Let us take an example to understand better the HMM process for a certain example.
Say we have to build an HMM of the problem where coins are tossed in a certain
sequence. To get a better understand we can break this down into the 2 processes and
factors
1. Visible Events
These are the events that we know are happening and can physically or virtually
picture them. In this case it is the tossing of coins. We are sure that coins are being
tossed with an option of either showing ‘heads’ or ‘tails’ as the outcome.
Hidden Events
These are the factors that are completely hidden from the user. In this case they could
be the number of coins that are used in the problem or the order in which the coins are
tossed up in case they are different or weighted. One could conclude so many
different models to understand this process better. One of the forms of the model is
given below.
Tags in each of the following are basically sequences that generate tags for the outputs
being given in the hidden states. These states have tags that could give an output
which is observable to the user.
Maximum Entropy Markov Model (MEMM)
This model is used as an alternate form for the above Markov Model. In this model, the
sequences which are taking place are replaced by a single function that we can define.
This model can simply tell us about the happenings in the predicted state s while
knowing the past states’ as well as the current observation o. P (s | s’ , o)
The main difference that we can get ourselves familiar with is the part where HMM’s
are only dependent on the current state whereas the MEMM’s have their inferences
which can be based on the current as well as the past states. This could involve a
better form of accuracy while going through the entire tagging process. We can even
see the difference in the diagram which is given below.
Disadvantages of MEMM’s
1. MEMM models have a problem due to the label bias problem. Once the state has
been selected, the observation which is going to succeed the current one will
select all of the transitions that might be leaving the state.
There are a few examples in words such as “rib” and “rob” which when placed in the
algorithm in their raw from have a little different path as compared to the other words.
When we have “r_b” there is a path that goes through 0-1 and 0-4 to get the probability
mass function. Both the paths would be equally likely not allowing the model to select
one fixed word that could be predicted by the model correctly.
Key Takeaways:
The process of making the selection of an appropriate word sense to suit a word
that is chose is known as word sense disambiguation.
References:
1. Fundamentals of Speech Recognition by Lawrence Rabiner
2. Speech and Language Processing by Daniel Jurafsky and James H. martin