Natural Language Processing
Dr. Ankur Priyadarshi
Assistant Professor
Computer Science and Information Technology
Syllabus
Prerequisites:
1. Basic knowledge of English grammar and the
Theory of Computation.
2. Basic knowledge of Machine Learning tools.
Course objectives
1. To understand the algorithms available for the processing of
linguistic information and computational properties of natural languages.
2. To conceive basic knowledge on various morphological,
syntactic and semantic NLP tasks.
3. To become familiar with publicly available NLP software
libraries and datasets.
4. To develop systems for various NLP problems with moderate
complexity.
5. To learn various strategies for NLP system evaluation and error
analysis.
Unit I:
INTRODUCTION TO NLP
Natural Language Processing
⊹ Natural language processing (NLP) refers to the branch of computer
science—and more specifically, the branch of artificial intelligence or
AI—concerned with giving computers the ability to understand text and
spoken words in much the same way human beings can.
⊹ NLP combines computational linguistics—rule-based modeling of human
language—with statistical, machine learning, and deep learning models.
⊹ Together, these technologies enable computers to process human
language in the form of text or voice data and to ‘understand’ its full
meaning, complete with the speaker or writer’s intent and sentiment.
NLP APPLICATIONS
1. Information Extraction
2. Question Answering
3. Sentiment Analysis
4. Machine Translation, and many more:
Speech recognition, intent classification, urgency detection, auto-correct, market intelligence, email
filtering, voice assistants and chatbots, targeted advertising, recruitment
Information Extraction (IE)
1. Working with an enormous amount of text data by hand is tedious
and time-consuming.
2. Hence, many companies and organisations rely on Information
Extraction techniques to automate manual work with intelligent
algorithms.
3. Information extraction can reduce human effort, reduce expenses,
and make the process less error-prone and more efficient.
Example: IE
For example, from a short report on a cricket match we can extract the following structured fields (a code sketch follows the list):
● Country – India, Captain – Virat Kohli
● Batsman – Virat Kohli, Runs – 2
● Bowler – Kyle Jamieson
● Match venue – Wellington
● Match series – New Zealand
● Series highlight – single fifty, 8 innings, 3 formats
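A common way to automate this kind of extraction is named-entity recognition (NER). Below is a minimal sketch using spaCy, assuming the library and its small English model en_core_web_sm are installed (python -m spacy download en_core_web_sm); the example sentence is a hypothetical paraphrase of the match report.

```python
import spacy

# Load spaCy's small English pipeline (an assumption; any English model works).
nlp = spacy.load("en_core_web_sm")

text = ("India captain Virat Kohli was dismissed for 2 runs by "
        "Kyle Jamieson in the Test match at Wellington.")

# The model tags spans with generic entity labels such as
# PERSON, GPE (geo-political entity), and CARDINAL.
for ent in nlp(text).ents:
    print(f"{ent.text:15s} {ent.label_}")
```

Mapping such generic labels onto domain-specific fields like Captain or Bowler would still require task-specific rules or a fine-tuned model.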
Question Answering
⊹ Question answering is a critical NLP problem and a
long-standing artificial intelligence milestone.
⊹ QA systems allow a user to express a question in natural
language and get an immediate and brief response.
⊹ QA systems are now found in search engines and phone
conversational interfaces, and they are fairly good at
answering simple factual questions.
⊹ On harder questions, however, they normally only go as
far as returning a list of snippets that we, the users, must
then browse through to find the answer to our question.
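As a concrete illustration, the Hugging Face transformers library exposes an extractive QA pipeline. The sketch below is a minimal example, assuming transformers (with a backend such as PyTorch) is installed; a default pretrained QA model is downloaded on first use.

```python
from transformers import pipeline

# Build an extractive question-answering pipeline.
qa = pipeline("question-answering")

context = ("Virat Kohli captained India in the Test match at Wellington, "
           "where he was dismissed by Kyle Jamieson for 2 runs.")

result = qa(question="Who dismissed Virat Kohli?", context=context)

# The pipeline returns the answer span it found in the context,
# together with a confidence score.
print(result["answer"], result["score"])
```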
Sentiment Analysis
⊹ Sentiment analysis (or opinion mining) is a natural
language processing (NLP) technique used to determine
whether data is positive, negative or neutral.
⊹ Sentiment analysis, as the name suggests, identifies the
view or emotion behind a situation. It means analyzing a
piece of text, speech, or any other mode of communication
to find the emotion or intent behind it.
Suppose, there is a fast-food chain company and they sell a variety of
different food items like burgers, pizza, sandwiches, milkshakes, etc. They
have created a website to sell their food and now the customers can order
any food item from their website and they can provide reviews as well, like
whether they liked the food or hated it.
● User Review 1: I love this cheese sandwich, it’s so delicious.
● User Review 2: This chicken burger has a very bad taste.
● User Review 3: I ordered this pizza today.
Of these three reviews:
The first review is definitely positive and signifies that the customer was
really happy with the sandwich. The second review is negative, so the
company needs to look into its burger department. And the third one does not
signify whether the customer is happy or not, so we can consider it a
neutral statement.
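One lightweight way to reproduce this three-way classification is NLTK's rule-based VADER analyzer. The sketch below assumes nltk is installed and downloads the VADER lexicon on first run; the ±0.05 cutoffs on the compound score are VADER's conventional defaults.

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

reviews = [
    "I love this cheese sandwich, it's so delicious.",
    "This chicken burger has a very bad taste.",
    "I ordered this pizza today.",
]
for review in reviews:
    # 'compound' is a normalized polarity score in [-1, 1].
    score = sia.polarity_scores(review)["compound"]
    label = ("positive" if score > 0.05
             else "negative" if score < -0.05
             else "neutral")
    print(f"{label:8s} {score:+.2f}  {review}")
```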
Machine Translation
Machine Translation (MT) is the task of automatically
converting one natural language into another, preserving
the meaning of the input text, and producing fluent text in the
output language.
While machine translation is one of the oldest subfields of artificial intelligence
research, the recent shift towards large-scale empirical techniques has led to
very significant improvements in translation quality.
The Stanford Machine Translation group's research interests lie in techniques
that utilize both statistical methods and deep linguistic analyses.
Machine translation: approaches
● Rule-based Machine Translation (RBMT): 1970s-1990s
● Statistical Machine Translation (SMT): 1990s-2010s
● Neural Machine Translation (NMT): 2014-...
Rule based MT (RBMT)
A rule-based system requires experts’ knowledge about the source and
the target language to develop syntactic, semantic and morphological
rules to achieve the translation.
The Wikipedia article on RBMT includes a basic example of rule-based
translation from English to German. The translation needs an
English-German dictionary, a rule set for English grammar, and a rule
set for German grammar.
An RBMT system contains a pipeline of Natural Language Processing
(NLP) tasks including Tokenization, Part-of-Speech tagging and so on.
Most of these jobs have to be done in both source and target language.
SYSTRAN is one of the oldest machine translation companies.
It translates to and from around 20 languages.
SYSTRAN was used for the Apollo-Soyuz project (1973) and by the
European Commission (1975)
Advantages
● No bilingual text required
● Domain-independent
● Total control (a possible new rule for every situation)
● Reusability (existing rules of languages can be transferred
when paired with new languages)
Disadvantages
● Requires good dictionaries
● Manually set rules (requires expertise)
Statistical MT
This approach uses statistical models based on the analysis of bilingual
text corpora.
It was first introduced in 1955, but it gained interest only after 1988
when the IBM Watson Research Center started using it.
SMT Examples
● Google Translate (between 2006 and 2016, when it
announced the switch to NMT)
● Microsoft Translator (switched to NMT in 2016)
● Moses: Open source toolkit for statistical machine translation
Advantages
● Less manual work from linguistic experts
● One SMT suitable for more language pairs
● Less out-of-dictionary translation: with the right language
model, the translation is more fluent
Disadvantages
● Requires bilingual corpus
● Specific errors are hard to fix
● Less suitable for language pairs with big differences in word order
Neural MT
❖ The neural approach uses neural networks to achieve machine
translation.
❖ Compared to the previous models, NMTs can be built with one
network instead of a pipeline of separate tasks.
NMT examples
● Google Translate (from 2016)
● Microsoft Translator (from 2016)
● Translation on Facebook
● OpenNMT: An open-source neural machine translation
system.
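As a minimal sketch of this end-to-end property, the snippet below runs one publicly available pretrained model (Helsinki-NLP/opus-mt-en-de, chosen here as an assumption; any translation model would work) through the Hugging Face transformers pipeline. It assumes transformers, a backend such as PyTorch, and sentencepiece are installed.

```python
from transformers import pipeline

# A single pretrained network maps source text to target text end to end,
# with no separate tokenization/tagging/transfer stages to maintain.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")

result = translator("Machine translation preserves the meaning of the input text.")
print(result[0]["translation_text"])
```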
Advantages
● End-to-end models (no pipeline of specific tasks)
Disadvantages
● Requires bilingual corpus
● Rare word problem
NLP PHASES
Lexical Analysis
● It involves identifying and analyzing the structure of words. Lexicon of a
language means the collection of words and phrases in that particular
language.
● Lexical analysis divides the text into paragraphs, sentences, and words,
and the resulting words then need lexicon normalization.
The two most common lexicon normalization techniques are stemming and lemmatization:
● Stemming: Stemming is the process of reducing derived words to their word
stem, base, or root form, generally by stripping written suffixes such as “-ing”,
“-ly”, “-es”, and “-s”.
● Lemmatization: Lemmatization is the process of reducing a group of words
to their lemma, or dictionary form. It takes into account things like the POS
(part of speech) and the meaning of the word in the sentence and in nearby
sentences before reducing the word to its lemma.
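The sketch below contrasts the two techniques using NLTK (one possible choice; spaCy or other libraries work equally well). It assumes nltk is installed and downloads the WordNet data on first run.

```python
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet", quiet=True)

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["studies", "studying", "leaves", "happily"]:
    # Stemming strips suffixes heuristically; the result may not be a real word.
    # Lemmatization consults a dictionary and a POS hint ("v" = treat as a verb).
    print(f"{word:10s} stem={stemmer.stem(word):8s} "
          f"lemma={lemmatizer.lemmatize(word, pos='v')}")
```

For example, "studies" stems to the non-word "studi" but lemmatizes to the dictionary form "study".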
Syntactic Analysis
Syntactic Analysis is used to check grammar, arrangements of words,
and the interrelationship between the words.
Example: "Mumbai goes to the Sara"
Here "Mumbai goes to the Sara" does not make any sense, so this
sentence is rejected by the syntactic analyzer.
Syntactic parsing involves analyzing the words in a sentence for
grammar.
Dependency grammar and part-of-speech (POS) tags are the key
attributes of syntactic analysis, as the sketch below illustrates.
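A minimal sketch of both attributes with spaCy, assuming its small English model en_core_web_sm is installed (the sentence is illustrative):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Sara goes to Mumbai")

for token in doc:
    # token.pos_ is the coarse part-of-speech tag; token.dep_ is the
    # dependency relation linking the token to its syntactic head
    # (e.g. 'nsubj' marks 'Sara' as the subject of 'goes').
    print(f"{token.text:8s} {token.pos_:6s} {token.dep_:8s} head={token.head.text}")
```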
Semantic analysis
The way we understand what someone has said is an unconscious
process relying on our intuition and knowledge about language itself.
In other words, the way we understand language is heavily based on
meaning and context. Computers need a different approach, however.
The word “semantic” is a linguistic term and means "related to
meaning or logic."
Semantic analysis is the process of understanding the meaning and
interpretation of words, signs and sentence structure.
Discourse Integration
Discourse integration is closely related to pragmatics (the context of the sentence).
Discourse integration is considered the larger context for any smaller part of NL
structure. NL is so complex that, most of the time, sequences of text depend
on prior discourse.
This concept often arises as pragmatic ambiguity. The analysis deals with how the
immediately preceding sentence can affect the meaning and interpretation of the
next sentence. Context can also be analyzed at larger scales, such as the paragraph
level, the document level, and so on.
Pragmatic Analysis
Pragmatic Analysis is part of the process of extracting information from text.
Specifically, it is the portion that focuses on taking a structured set of text and
figuring out its actual meaning.
It comes from the field of linguistics (as a lot of NLP does), where text is
considered together with its context.
Why is this important? Because a lot of text’s meaning does have to do with
the context in which it was said/written.
Ambiguity, and limiting it, lies at the core of natural language
processing, so pragmatic analysis is crucial to extracting meaning
or information.
Difficulty In NLP
● Contextual words and phrases and homonyms
● Synonyms
● Irony and sarcasm
● Ambiguity
● Errors in text or speech
● Colloquialisms and slang
● Domain-specific language
● Low-resource languages
● Lack of research and development