Natural Language Processing
& Applications
Why Text?
Source: www.pinterest.com
2
Source: RECOMND
3
Source: lifeboat.com
4
Natural Language Processing
• A hallmark of human intelligence.
• Natural Language Processing
• NLP = Natural Language Understanding + Natural Language Generation
• Process information contained in natural language text.
• Also known as Computational Linguistics (CL), Human Language Technology
(HLT), Natural Language Engineering (NLE)
• Can machines understand human language?
Ultimate goal
Analyze, understand and generate human languages just like humans do.
5
Fitting in CS taxonomy
Computers
Databases Algorithms Networking
Artificial Intelligence
Robotics Natural Language Processing Search
6
NLP Tasks
Natural Language Understanding
Taking some spoken/typed sentence and working out what it means
Natural Language Generation
Taking some formal representation of what you want to say and working out a
way to express it in a natural (human) language (e.g., English)
7
Working towards
• Applying computational techniques to language domain.
• Use the theories to build systems that can be of social use.
• Make computers learn our language rather than us learning theirs.
8
Natural language understanding
Raw speech signal / raw text
• Speech recognition
Sequence of words spoken / written
• Syntactic analysis
Structure of the sentence
• Semantic analysis
Partial representation of meaning of sentence
• Discourse & Pragmatic analysis
Final representation of meaning of sentence
9
Need for Language Technologies – In Daily life
A computer could be used for:
• Answering the phone and replying to a question
• Translating a daily newspaper
• Reading a whole newspaper and reporting only the important news
• Automatically generating movie subtitles
• Correcting sentences
• Grading answers to descriptive questions
• Understanding text in journals and books and building an expert system
10
Application Areas
➢Machine Translation
➢Information Retrieval
Selecting from a set of documents the ones that are relevant to a query
➢Text Categorization
Classifying text into fixed topic categories
➢Question Answering
➢Information Extraction
Converting unstructured text into structured data
11
Application Areas (cont.)
➢Spoken language control systems
➢Spelling and grammar checkers
➢Sentiment Analysis
➢Text-to-Speech & Speech recognition
➢Natural Language Dialogue Interfaces to Databases
12
Question Answering
Source: Google
13
Information Retrieval
• NLP improves web search
• Search for ‘Jaguar’
• Search for ‘Apple’
• Search for ‘notebook’, find ‘laptop’
Source: Google
14
Email Spam Filtering/Categorizing
Source: junkemailfilter.com
15
Text Categorization
• Assign a label to a document that represents its content (e.g., an ACM keyword or a Yahoo
category).
• E.g., decide whether a newspaper article is about politics, business, or sports.
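A minimal sketch of the newspaper example, assuming a hand-written keyword list per category. The keyword sets and the `categorize` function are illustrative assumptions; a real system would usually learn such associations from labelled documents.

```python
import re
from collections import Counter

# Hypothetical keyword lists, chosen only for illustration.
CATEGORY_KEYWORDS = {
    "politics": {"election", "minister", "parliament", "vote"},
    "business": {"market", "profit", "shares", "company"},
    "sports": {"match", "goal", "team", "tournament"},
}

def categorize(text):
    """Return the category whose keywords occur most often, or None if none occur."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        label: sum(tokens[word] for word in keywords)
        for label, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(categorize("The team scored a late goal to win the match"))
```

Counting keywords ignores word order and context, which is why it is only a starting point for the fixed-topic classification described above.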
16
Source: Medium
Machine Translation
• Multilingual Usage
• Machine-assisted human Translation
• Scope
Creating Language resources.
Source: www.localizer.co
17
Source: Google
18
Duplicate Question Detection
19
Knowledge Extraction
Source: http://aritter.github.io
20
Information Extraction
Information extraction systems
• Find and understand the relevant parts of a text.
• Produce a structured representation of the relevant information
from the text, in the form of:
• entities,
• relations between entities,
• events in which the entities are involved.
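As a rough illustration of turning unstructured text into a structured record, the sketch below pulls take-over events out of a sentence with a regular expression. The `Acquisition` record, the pattern, and the company names are made-up assumptions; practical IE systems rely on named-entity recognition and parsing rather than a single pattern.

```python
import re
from dataclasses import dataclass

@dataclass
class Acquisition:
    """Structured representation of one take-over event (hypothetical schema)."""
    buyer: str
    target: str

# A capitalised word sequence, a take-over verb, then another capitalised sequence.
NAME = r"[A-Z]\w*(?: [A-Z]\w*)*"
PATTERN = re.compile(
    rf"(?P<buyer>{NAME}) (?:acquired|bought|took over) (?P<target>{NAME})"
)

def extract_acquisitions(text):
    """Return one Acquisition per pattern match in the text."""
    return [Acquisition(m["buyer"], m["target"]) for m in PATTERN.finditer(text)]

text = "Last year Acme Corp acquired Widget Works. Analysts say Globex took over Initech in March."
for event in extract_acquisitions(text):
    print(event)
```

Each match yields two entities (buyer, target) and the relation between them, which is the entity/relation/event structure listed above.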
21
Information Extraction
Source: cs.washington.edu
22
Applications of IE Systems
• Extracting diagnoses, symptoms, physical findings, and test results from medical
patient records
• Gathering earnings, profits, board members, etc. from company reports
• Automatically verifying construction industry specification documents
• Processing real estate advertisements
• Building job databases from textual job vacancy postings
• Extracting company take-over events
• Categorizing customer feedback by product name
• Extracting locations from social media text for security applications
23
Semantic Web
• Linked Data
• Vocabularies / Domain Information
• Inference
• Query
Source: Google
24
TOOLS
• Apache OpenNLP: Java machine learning toolkit for natural language
processing
• OpenCalais: Tags the people, places, companies, facts, and events in your
content to increase its value, accessibility and interoperability
• DBpedia Spotlight: Tool for automatically annotating mentions of DBpedia
resources in text
• Natural Language Toolkit (NLTK): A suite of Python libraries and programs for NLP
• General Architecture for Text Engineering (GATE)
• spaCy: A free, open-source library featuring state-of-the-art speed and
accuracy and a powerful Python API
• Stanford CoreNLP: A Java annotation pipeline framework that provides
most of the common core natural language processing (NLP) steps, from
tokenization through to coreference resolution
25
Aspects of Language Processing
• Phonology
• Word, lexicon: lexical analysis
• Morphology, word segmentation
• Syntax
• Sentence structure, phrase, grammar, …
• Semantics
• Meaning
• Discourse analysis
• Meaning of a text
• Relationship between sentences
• Pragmatics
• The study of meaning in different contexts of use
26
Phonology
Speech processing
• Humans process speech remarkably well.
• Speech interfaces can replace keyboards and monitors.
• Convert acoustic signals to text.
• A phoneme is the smallest recognizable unit of speech in a language.
• A grapheme is a way of writing down a phoneme.
Applications: speech recognition, text-to-speech conversion
27
Lexical Analysis - Morphology
Delegate
(de + leg + ate)
Take the legs from
cashier
(cashy + er)
More wealthy
Source: www.pinterest.co.uk
28
Morphology
• Structures and patterns in words
• A word is a sequence of morphemes.
• Morpheme: the smallest meaningful unit in a word.
• Morphology analyses how words are formed from morphemes,
e.g., dogs = dog + s.
• Inflectional morphology: the part of speech stays the same.
• Buses = Bus + es
• Carried = Carry + ed
• Derivational morphology: the part of speech changes.
• Destruct + ion = Destruction (Noun)
• Beauty + ful = Beautiful (Adjective)
• Affixes: prefixes and suffixes. Rules govern how they fuse with the stem.
Applications: spell checkers, lemmatization, information retrieval
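The inflectional examples above (dogs, buses, carried) can be sketched as a few suffix rules checked against a stem lexicon. The rule table, the tiny lexicon, and the `segment` function are illustrative assumptions, not a full morphological analyser.

```python
# (surface suffix, text restored to recover the stem, morpheme label)
SUFFIX_RULES = [
    ("ied", "y", "ed"),  # carried -> carry + ed
    ("es", "", "es"),    # buses   -> bus + es
    ("s", "", "s"),      # dogs    -> dog + s
]

# Hypothetical stem lexicon; a rule applies only if it yields a known stem.
KNOWN_STEMS = {"dog", "bus", "carry"}

def segment(word):
    """Split a word into [stem, suffix] when a rule yields a known stem."""
    word = word.lower()
    for suffix, restored, morpheme in SUFFIX_RULES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)] + restored
            if stem in KNOWN_STEMS:
                return [stem, morpheme]
    return [word]

for w in ["dogs", "buses", "carried", "bus"]:
    print(w, "=", " + ".join(segment(w)))
```

Checking the candidate stem against a lexicon is what stops "bus" from being split into "bu" + "s".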
29
Syntax
• Words convey more when they are put together.
• Syntax is the grammatical structure of the
sentence.
• Syntactic analysis (parsing)
The process of assigning a parse tree to a
sentence.
Parsing: given a sentence and a grammar, a parser
• Checks that the sentence is correct
according to the grammar
• Returns a parse tree representing the
structure of the sentence
Applications: grammar checking tools, information extraction, phrase identification
30
Syntactic Analysis - Grammar
sentence -> noun_phrase, verb_phrase
noun_phrase -> proper_noun
noun_phrase -> determiner, noun
verb_phrase -> verb, noun_phrase
proper_noun -> [mary]
noun -> [apple]
verb -> [ate]
determiner -> [the]
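The grammar above can be exercised with a small top-down (recursive-descent) parser. This is a sketch under the assumption that the rules are stored in a Python dictionary; the names `GRAMMAR`, `parse`, `expand`, and `parse_sentence` are hypothetical, and the approach only suits grammars without left recursion, like this one.

```python
# The slide's grammar: each non-terminal maps to its alternative right-hand sides.
GRAMMAR = {
    "sentence": [["noun_phrase", "verb_phrase"]],
    "noun_phrase": [["proper_noun"], ["determiner", "noun"]],
    "verb_phrase": [["verb", "noun_phrase"]],
    "proper_noun": [["mary"]],
    "noun": [["apple"]],
    "verb": [["ate"]],
    "determiner": [["the"]],
}

def parse(symbol, words, pos):
    """Yield (tree, next_pos) for every way `symbol` matches words starting at pos."""
    if symbol not in GRAMMAR:  # terminal: must equal the next word
        if pos < len(words) and words[pos] == symbol:
            yield symbol, pos + 1
        return
    for production in GRAMMAR[symbol]:
        yield from expand(symbol, production, words, pos, [])

def expand(symbol, remaining, words, pos, children):
    """Match the remaining right-hand-side symbols one after another."""
    if not remaining:
        yield (symbol, children), pos
        return
    for subtree, next_pos in parse(remaining[0], words, pos):
        yield from expand(symbol, remaining[1:], words, next_pos, children + [subtree])

def parse_sentence(text):
    """Return a parse tree covering the whole sentence, or None if it is rejected."""
    words = text.lower().split()
    for tree, end in parse("sentence", words, 0):
        if end == len(words):
            return tree
    return None

print(parse_sentence("Mary ate the apple"))
print(parse_sentence("ate Mary the apple"))
```

The first call returns a nested tuple mirroring the grammar rules; the second returns None because no rule sequence derives that word order.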
31
Parsing
• Analyze the structure of a sentence
[S [NP [D The] [N student]]
   [VP [V put]
       [NP [D the] [N book]]
       [PP [P on] [NP [D the] [N table]]]]]
32
Semantic Analysis
• What do you mean?
• Words: lexical semantics
• Sentences: compositional semantics
• Converting syntactic structures into a semantic format, i.e., a meaning
representation.
• Semantics: the meaning of a word or phrase within a sentence
Applications: event extraction, knowledge base construction
33
Semantic Representations
• Meaning representation of the sentence, derived from its syntactic structure(s)
• Ways of representing the meaning of a sentence:
• Logical forms
Sentence: A tall man plays basketball.
Representation: ∃x (man(x) & tall(x) & plays(x, basketball))
• Semantic role labelling
Sentence: 3 people were killed as X fired a gun.
Representation: Kill(Agent: X, Victim: 3 people, Instrument: Gun)
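One way to see what the logical form buys us: once a sentence is a formula, it can be checked against a model of the world. The sketch below evaluates "some x is a man, is tall, and plays basketball" over a toy world; the entities, their properties, and the function name are invented for illustration.

```python
# Toy world (hypothetical): each entity maps to the set of properties it has.
ENTITIES = {
    "alan": {"man", "tall"},
    "bob": {"man"},
    "carol": {"tall"},
}

# Pairs (player, sport) for which plays(player, sport) holds.
PLAYS = {("alan", "basketball"), ("carol", "tennis")}

def exists_tall_man_playing(sport):
    """True if some x satisfies man(x) & tall(x) & plays(x, sport)."""
    return any(
        "man" in props and "tall" in props and (x, sport) in PLAYS
        for x, props in ENTITIES.items()
    )

print(exists_tall_man_playing("basketball"))
print(exists_tall_man_playing("tennis"))
```

The existential quantifier becomes a search over entities, and each conjunct becomes a membership test.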
34
Discourse Analysis
• The meaning of an individual sentence may depend on the sentences that
precede it and may influence the meaning of the sentences that follow it.
• Issues related to discourse integration
• Anaphora
Resolving what a pronoun refers to (co-reference resolution)
• Ellipsis
Incomplete sentences
• Anaphora examples
• I read the book by Dr. Kalam. It was great.
• He hits the car with a stone. It bounces back.
35
Anaphora Resolution
• Anaphora resolution (AR) is the process of
determining the antecedent of an anaphor.
• Anaphor: the expression that points back to a
previous item (he, it)
• Antecedent: the entity to which the anaphor
refers (John, ice cream)
• Mary bought a book for Kelly. She didn’t like it.
• Does “She” refer to Mary or Kelly?
• “It” refers to the book.
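A minimal sketch of why the example is hard, assuming each earlier mention has been hand-labelled with a gender feature. The feature table and `candidate_antecedents` are illustrative assumptions. Filtering by agreement narrows "it" to one candidate but leaves "She" with two.

```python
# Agreement feature required by each pronoun (simplified, hypothetical table).
PRONOUN_FEATURES = {"she": "female", "he": "male", "it": "neuter"}

def candidate_antecedents(pronoun, mentions):
    """Return earlier mentions whose gender feature agrees with the pronoun."""
    wanted = PRONOUN_FEATURES[pronoun.lower()]
    return [name for name, gender in mentions if gender == wanted]

# Mentions from "Mary bought a book for Kelly." with hand-assigned features.
mentions = [("Mary", "female"), ("book", "neuter"), ("Kelly", "female")]

print(candidate_antecedents("She", mentions))
print(candidate_antecedents("it", mentions))
```

Choosing between the remaining candidates needs further evidence, such as syntactic role or world knowledge.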
36
Discourse Structures - Ellipsis
• Ellipsis – Incomplete sentences
• “What’s your name?”
• “Sri, and yours?”
The second sentence is not complete, but what it means can be inferred
from the first one.
37
Pragmatics
• Uses the context of the utterance
• Where, by whom, to whom, why, and when it was
said
• Intentions: inform, request, promise,
criticize, …
38
Challenges in NLP: Ambiguity
Morphology
• Words with different parts of speech (POS)
Ex. Issue (I have an issue / Please issue a ticket)
• Words with different meanings but the same POS
Ex. Bank (river bank, Indian Bank)
39
Syntax Ambiguity
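Sentence: Teacher strikes idle kids
Parse 1 (N N V N): [S [NP [N Teacher] [N strikes]] [VP [V idle] [NP [N kids]]]]
Parse 2 (N V Adj N): [S [NP [N Teacher]] [VP [V strikes] [NP [Adj idle] [N kids]]]]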
40
Attachment Ambiguity
• A sentence has attachment ambiguity if a constituent fits more than one
position in a parse tree.
• Attachment ambiguity arises from uncertainty about which part of a sentence a phrase or clause
attaches to.
• “John saw Mary with a telescope”
• John saw (Mary with a telescope): Mary has the telescope.
• John (saw Mary) (with a telescope): John used the telescope to see Mary.
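The two readings can be counted mechanically. The sketch below runs a CKY-style chart over a toy grammar in which a prepositional phrase may attach either to a noun phrase or to a verb phrase. The grammar, lexicon, and `count_parses` are assumptions made for illustration.

```python
from collections import defaultdict

# Binary rules (parent, left child, right child) of a toy grammar.
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP attaches to the verb phrase
    ("NP", "NP", "PP"),  # PP attaches to the noun phrase
    ("PP", "P", "NP"),
    ("NP", "Det", "N"),
]

LEXICON = {
    "john": ["NP"], "mary": ["NP"], "saw": ["V"],
    "with": ["P"], "a": ["Det"], "telescope": ["N"],
}

def count_parses(words, start="S"):
    """Count the distinct parse trees for the word list using a CKY chart."""
    n = len(words)
    chart = defaultdict(int)  # (i, j, symbol) -> number of trees over words[i:j]
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, []):
            chart[(i, i + 1, tag)] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    chart[(i, j, parent)] += chart[(i, k, left)] * chart[(k, j, right)]
    return chart[(0, n, start)]

print(count_parses("john saw mary with a telescope".split()))
print(count_parses("john saw mary".split()))
```

With both attachment rules present, the telescope sentence receives two trees, while the sentence without the prepositional phrase receives one.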
41
Semantic Ambiguity
• Meaning of the words themselves can be misinterpreted.
• Example 1: The car hit the pole while it was moving.
• The interpretations can be
• The car, while moving, hit the pole
• The car hit the pole while the pole was moving.
• Example 2:
42
Semantic Ambiguity
Semantic ambiguity: “I saw the Prudential Building flying into Boston”
Semantic Restriction, Domain Knowledge - Ontology
43
Sample Ontology
44
Discourse Ambiguity
“We gave the monkeys the bananas because they were hungry”
“We gave the monkeys the bananas because they were over-ripe”
45
Pragmatics Ambiguity
Pragmatic ambiguity: “You’re late”
What’s the speaker’s intention: informing or criticizing?
46
Enabling Computing Techniques
• Stemming
• Reduce words to base form.
• Part of Speech Tagging
• Determine for each word whether it is a noun, adjective, verb, …
• Parsing
• sentence to parse tree
• WordNet: lexical database with 206,941 word-sense pairs
• Word Sense Disambiguation
• Bank (Financial Bank vs Riverbank)
• Semantic similarity metrics
• Vector Representations of Words, Sentences
• Neural Network based Models
• Word2Vec, GloVe, ELMo.
• Pretrained models
• BERT etc.
• Large Language Models
• GPT, Llama etc.
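Vector representations make semantic similarity computable: words are compared by the angle between their vectors. The sketch below uses cosine similarity on three hand-made vectors. The numbers are invented for illustration and are not taken from Word2Vec, GloVe, or any trained model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Made-up 3-dimensional "embeddings"; real models use hundreds of dimensions.
TOY_VECTORS = {
    "laptop": [0.9, 0.8, 0.1],
    "notebook": [0.8, 0.9, 0.2],
    "banana": [0.1, 0.0, 0.9],
}

print(cosine_similarity(TOY_VECTORS["laptop"], TOY_VECTORS["notebook"]))
print(cosine_similarity(TOY_VECTORS["laptop"], TOY_VECTORS["banana"]))
```

With trained embeddings, the same comparison is what lets a search for "notebook" also retrieve documents about laptops.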
47
Conclusion
• Complete human-level natural language understanding is still a
distant goal
• Develop algorithms for each level of language processing.
• Find an appropriate match between the application domain and the
available methods.
48
References
Books
• Dan Jurafsky and James H. Martin, Speech and Language Processing, Pearson
Education
49