Natural Language Processing (NLP) - Complete Comprehensive Notes
Table of Contents
1. Introduction to NLP
2. Components of NLP
3. Steps in NLP Processing
4. Applications of NLP
5. Word Structure and Morphology
6. NLTK and Text Processing
7. Syntax and Parsing
8. Ambiguity in NLP
9. Algorithms and Models
10. Knowledge Bottlenecks
11. Advanced Topics
Introduction to NLP
Natural Language Processing (NLP) is the study and engineering of
computational systems that can analyze, understand, generate, and
interact using human language. It bridges unstructured linguistic signals
—text and speech—and structured machine representations by
combining computational linguistics with statistical and neural machine
learning.
Key Objectives
Making computers learn, understand, analyze, manipulate and
interpret natural (human) languages
Enable human-computer interaction through natural language
Part of Computer Science, Human Languages/Linguistics, and
Artificial Intelligence
NLP Pipeline Architecture
Complete Pipeline Flow:
text
Raw Text → Normalization → Tokenization → Linguistic Analysis →
Representations → Model → Output
Detailed Examples for Each Stage:
Normalization: Lowercasing, Unicode NFKC
Tokenization: Whitespace splitting, WordPiece
Linguistic Analysis: POS tagging, dependency parsing
Representations: TF-IDF, BERT embeddings
Model: CRF, Transformer
Output: Labels, summary, translation
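As a minimal illustration of the stages above, the sketch below runs normalization, tokenization, POS tagging, and a simple bag-of-words representation using NLTK and the standard library. The example sentence is invented, the Model and Output stages are omitted, and the NLTK resources "punkt" and "averaged_perceptron_tagger" are assumed to be downloaded.
python
# A minimal, illustrative pass over the first pipeline stages.
import unicodedata
from collections import Counter

import nltk
from nltk.tokenize import word_tokenize

raw_text = "Cafe\u0301s in New York serve great coffee."

# Normalization: Unicode NFKC composition plus lowercasing
normalized = unicodedata.normalize("NFKC", raw_text).lower()

# Tokenization: rule-based word tokens
tokens = word_tokenize(normalized)

# Linguistic analysis: POS tagging
tagged = nltk.pos_tag(tokens)

# Representation: a simple bag-of-words count vector
bow = Counter(tokens)

print(tagged)
print(bow)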
Historical Evolution
1. Rule-based Systems: Grammars, lexicons
2. Statistical NLP: N-grams, HMMs/CRFs, PCFGs
3. Neural NLP: Attention-based Transformers, large-scale pretraining
Components of NLP
1. Natural Language Understanding (NLU)
Function: Transforms human language into machine-readable
format
Tasks: Extract keywords, emotions, relations, semantics
Complexity: Harder than NLG
2. Natural Language Generation (NLG)
Function: Converts computerized data into natural language
Components:
Text planning
Sentence planning
Text realization
Steps in NLP Processing
1. Lexical Analysis
Primary Phase: Scans text as character stream
Functions:
Converts characters into meaningful lexemes
Divides text into paragraphs, sentences, words
Key Concepts:
Lexeme: Basic unit of meaning (individual or multiword)
Examples:
Individual: "talk" → "talks", "talked", "talking"
Multiword: "speak up", "pull through"
2. Syntactic Analysis (Parsing)
Purpose: Check grammar and word arrangements
Functions:
Show relationships among words
Reject grammatically incorrect sentences
Example: "The school goes to boy" → Rejected by English
syntactic analyzer
3. Semantic Analysis
Focus: Literal meaning of words, phrases, sentences
Functions:
Meaning representation
Reject semantically incorrect phrases
Examples:
Rejects: "hot ice-cream"
Passes syntax but fails semantics: "Manhattan calls out to
Dave"
4. Discourse Integration
Context Dependency: Considers preceding and following
sentences
Example:
"Manhattan speaks to all its people"
"It calls out to Dave"
→ "It" refers to Manhattan
5. Pragmatic Analysis
Purpose: Re-interpret what was said vs. what was meant
Requirements: Real-world knowledge
Example: "Manhattan speaks to all its people" → Metaphor for
emotional connection
Applications of NLP
1. Sentiment Analysis (Opinion Mining)
Purpose: Identify emotional tone (positive, negative, neutral)
Applications: Customer sentiment, brand reputation
Data Sources: Emails, reviews, social media, surveys
Technology: Machine learning text analysis models
2. Machine Translation (MT)
Challenges:
No equivalent words across languages
Multiple word meanings
Idiom translation
Solutions: Corpus statistical and neural techniques
Example: Handling linguistic typology differences
3. Text Extraction
Information Types: Entity names, locations, quantities
Industries: Healthcare, finance, e-commerce
Benefits: Process unstructured data efficiently
4. Text Classification
Also Known As: Text tagging, text categorization
Process: Categorize text into organized groups
Business Value: Automated insights, process automation
5. Speech Recognition (ASR)
Alternative Names: Computer speech recognition, Speech-to-Text (STT)
Applications by Industry:
Automotive: Voice-activated navigation
Technology: Virtual assistants (Siri, Alexa, Google Assistant)
Healthcare: Medical dictation
Sales: Call center transcription, AI chatbots
6. Chatbots
Function: Automated conversation systems
Benefits: 24/7 customer service, quick assistance
Types: Pre-scripted or AI-generated responses
7. Email Filtering
Evolution: From simple spam filters to intelligent categorization
Example: Gmail's categorization (main, social, promotional)
8. Search Autocorrect and Autocomplete
Functions:
Suggest probable search keywords
Correct typing errors
Improve search accuracy
Word Structure and Morphology
Tokens and Tokenization
Tokens: Syntactic words with independent roles
Examples:
"newspaper" → compound word with derivational structure
"won't" → "will" + "not" (two syntactic words)
Lexemes and Lemmas
Lexeme: Set of alternative forms expressing same concept
Lemma: Citation form of lexeme
Operations:
Inflection: Convert word to other forms (mouse → mice)
Derivation: Transform to morphologically related lexeme
(receive → receiver, reception)
Morphemes
Definition: Minimal meaningful elements of words
Types:
Stems: Core meaning (play, cat, friend)
Affixes: Modify meaning
Prefixes: Precede stem (un-)
Suffixes: Follow stem (-ed, -s, -ly)
Morphological Processes
Inflectional Morphology
Purpose: Different forms of same word
Examples:
cat → cats
mouse → mice
Derivational Morphology
Purpose: Create new words from roots
Examples:
inter + national = international
international + ize = internationalize
internationalize + ation = internationalization
Morphophonemic Changes
Allomorphs: Alternative forms of morphemes
Examples: Plural morpheme
-s in "cats", "dogs"
-es in "dishes"
-en in "oxen"
Morphological Typology
1. Isolating/Analytic Languages
Characteristics: Few morphemes per word
Examples: Chinese, Vietnamese, Thai, English
2. Synthetic Languages
Agglutinative: One function per morpheme
Examples: Korean, Japanese, Finnish, Tamil
Fusional: Multiple functions per morpheme
Examples: Arabic, Czech, Latin, German
3. Word Formation Processes
Concatenative: Morphemes linked sequentially
Nonlinear: Structural components merge non-sequentially
NLTK and Text Processing
Introduction to NLTK
Natural Language Toolkit - Pedagogical and prototyping Python
library
Key Features:
Tokenizers, stemmers, lemmatizers
POS taggers, chunkers, parsers
Corpus readers, evaluation metrics
Access to corpora (Gutenberg, Brown, movie_reviews)
Basic NLTK Workflow
python
# Installation and setup
import nltk
nltk.download() # opens the interactive downloader; pass a name (e.g. "punkt") to fetch a single resource
1. Tokenization
Sentence Tokenization
python
import nltk
nltk.download("punkt")  # tokenizer models used by sent_tokenize and word_tokenize
from nltk.tokenize import sent_tokenize, word_tokenize
example_string = """
Muad'Dib learned rapidly because his first training was in how to learn.
And the first lesson of all was the basic trust that he could learn.
It's shocking to find how many people do not believe they can learn.
"""
sentences = sent_tokenize(example_string)
Output:
text
["Muad'Dib learned rapidly because his first training was in how to learn.",
'And the first lesson of all was the basic trust that he could learn.',
"It's shocking to find how many people do not believe they can learn."]
Word Tokenization
python
words = word_tokenize(example_string)
Output:
text
["Muad'Dib", 'learned', 'rapidly', 'because', 'his', 'first', 'training',
'was', 'in', 'how', 'to', 'learn', '.', 'And', 'the', 'first', 'lesson', ...]
2. Stop Words Filtering
python
import nltk
nltk.download("stopwords")  # stop word lists used below
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
worf_quote = "Sir, I protest. I am not a merry man!"
words_in_quote = word_tokenize(worf_quote)
stop_words = set(stopwords.words("english"))
filtered_list = []
for word in words_in_quote:
    if word.casefold() not in stop_words:
        filtered_list.append(word)
Results:
Original: ['Sir', ',', 'I', 'protest', '.', 'I', 'am', 'not', 'a', 'merry', 'man', '!']
Filtered: ['Sir', ',', 'protest', '.', 'merry', 'man', '!']
Content vs. Context Words
Content Words: Information about topics and sentiment
Context Words: Information about writing style
3. Stemming
python
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
stemmer = PorterStemmer()
string_for_stemming = "The crew of the USS Discovery discovered many discoveries. Discovering is what explorers do."
words = word_tokenize(string_for_stemming)
stemmed_words = [stemmer.stem(word) for word in words]
Stemming Results:
| Original Word | Stemmed Version |
|---------------|-----------------|
| 'Discovery' | 'discoveri' |
| 'discovered' | 'discov' |
| 'discoveries' | 'discoveri' |
| 'Discovering' | 'discov' |
4. Part-of-Speech (POS) Tagging
Parts of Speech Categories
| Part of Speech | Role | Examples |
|----------------|------|----------|
| Noun | Person, place, or thing | mountain, bagel, Poland |
| Pronoun | Replaces a noun | you, she, we |
| Adjective | Describes what a noun is like | efficient, windy, colorful |
| Verb | Action or state of being | learn, is, go |
| Adverb | Modifies verb, adjective, or adverb | efficiently, always, very |
| Preposition | Shows relationship between noun/pronoun and another word | from, about, at |
| Conjunction | Connects words or phrases | so, because, and |
| Interjection | Exclamation | yay, ow, wow |
python
import nltk
nltk.download('averaged_perceptron_tagger')
from nltk.tokenize import word_tokenize
sagan_quote = "If you wish to make an apple pie from scratch, you must first invent the universe."
words_in_sagan_quote = word_tokenize(sagan_quote)
pos_tags = nltk.pos_tag(words_in_sagan_quote)
Output:
text
[('If', 'IN'), ('you', 'PRP'), ('wish', 'VBP'), ('to', 'TO'), ('make', 'VB'),
('an', 'DT'), ('apple', 'NN'), ('pie', 'NN'), ('from', 'IN'), ('scratch', 'NN'),
(',', ','), ('you', 'PRP'), ('must', 'MD'), ('first', 'VB'), ('invent', 'VB'),
('the', 'DT'), ('universe', 'NN'), ('.', '.')]
5. Lemmatization
python
import nltk
nltk.download("wordnet")  # WordNet data used by the lemmatizer
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
lemmatizer = WordNetLemmatizer()
string_for_lemmatizing = "The friends of DeSoto love scarves."
words = word_tokenize(string_for_lemmatizing)
lemmatized_words = [lemmatizer.lemmatize(word) for word in words]
Advanced Lemmatization:
python
lemmatizer.lemmatize("worst") # Output: 'worst'
lemmatizer.lemmatize("worst", pos="a") # Output: 'bad'
6. Chunking
Purpose: Identify phrases (groups of words functioning as single units)
Noun Phrase Examples
"A planet"
"A tilting planet"
"A swiftly tilting planet"
python
import nltk
from nltk.tokenize import word_tokenize
quote = "It's a dangerous business, Frodo, going out your door."
words_quote = word_tokenize(quote)
tags = nltk.pos_tag(words_quote)
# Define chunk grammar
grammar = "NP: {<DT>?<JJ>*<NN>}"
chunk_parser = nltk.RegexpParser(grammar)
tree = chunk_parser.parse(tags)
Output Tree Structure:
text
(S
It/PRP
's/VBZ
(NP a/DT dangerous/JJ business/NN)
,/,
Frodo/NNP
,/,
going/VBG
out/RP
your/PRP$
(NP door/NN)
./.)
7. Chinking
Purpose: Exclude patterns from chunks (opposite of chunking)
python
grammar = """
Chunk: {<.*>+}
}<JJ>{"""
chunk_parser = nltk.RegexpParser(grammar)
tree = chunk_parser.parse(tags)
8. Named Entity Recognition (NER)
python
nltk.download("maxent_ne_chunker")
nltk.download("words")
tree = nltk.ne_chunk(tags)
Output:
text
(S
It/PRP
's/VBZ
a/DT
dangerous/JJ
business/NN
,/,
(PERSON Frodo/NNP)
,/,
going/VBG
out/RP
your/PRP$
door/NN
./.)
Binary NER (without entity type specification):
python
tree = nltk.ne_chunk(tags, binary=True)
Syntax and Parsing
Grammar Types in NLP
1. Context-Free Grammar (CFG)
Structure: Rules for forming well-structured sentences
Language Patterns: SVO (Subject-Verb-Object), SOV, OSV
2. Constituency Grammar (Phrase Structure
Grammar)
Focus: Phrase and clause structure
Examples: NP (Noun Phrase), PP (Prepositional Phrase), VP (Verb
Phrase)
3. Dependency Grammar
Focus: Grammatical relations between individual words
Structure: Network of relations rather than recursive structure
Features: Labeled relations between words
Parsing Process
Definition: Determining syntactic structure of text by analyzing
constituent words based on underlying grammar.
Example Grammar Rules
text
sentence → noun_phrase verb_phrase
noun_phrase → determiner noun
verb_phrase → verb noun_phrase
determiner → 'the', 'a', 'an'
noun → 'Tom', 'apple'
verb → 'ate'
Parse Tree Structure
Root: sentence
Non-terminals: noun_phrase, verb_phrase (intermediate nodes)
Terminals: 'Tom', 'ate', 'an', 'apple' (leaves)
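A short sketch of these rules in NLTK's CFG notation follows; one extra rule (NP → N) is added so the bare noun 'Tom' can form a noun phrase on its own, which the rules above leave implicit.
python
import nltk

# The toy rules above in NLTK's CFG notation.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> DT N | N
VP -> V NP
DT -> 'the' | 'a' | 'an'
N -> 'Tom' | 'apple'
V -> 'ate'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse(["Tom", "ate", "an", "apple"]):
    tree.pretty_print()  # prints the parse tree with the sentence symbol at the root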
Syntactic Structure Representations
1. Dependency Parsing
Philosophy: Connect head word with dependents in phrase
Structure: Directed (asymmetric) connections
Components:
Words as vertices
Directed arcs as binary relations (head to dependent)
Each word depends on exactly one parent
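For a quick look at labeled head-dependent arcs in practice, the following sketch uses spaCy rather than NLTK; it assumes spaCy and its small English model en_core_web_sm are installed.
python
# Labeled head-dependent arcs with spaCy (not part of NLTK).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Tom ate an apple")

# Each word has exactly one head, reached via a labeled, directed arc.
for token in doc:
    print(f"{token.text:8} --{token.dep_}--> {token.head.text}")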
2. Phrase Structure Parsing
Approach: Partition sentences into constituents
Method: Recursive partitioning into phrases (NP, VP, PP)
Traditional: Derives from sentence diagrams
Constituency Tree vs. Dependency Tree
Dependency structures explicitly
represent:
Head-dependent relations (directed arcs)
Functional categories (arc labels)
Possibly some structural categories (POS)
Phrase structure explicitly represents:
Phrases (non-terminal nodes)
Structural categories (non-terminal labels)
Possibly some functional categories (grammatical functions)
Treebanks: Data-Driven Approach
Definition: Linguistically annotated corpus including syntactic analysis
beyond POS tagging
Key Features:
Collection of sentences with complete syntax analysis
Human expert judgment for most plausible analysis
Consistent treatment across related grammatical phenomena
No explicit grammar rules provided
Benefits:
1. Solves Grammar Problem: Syntactic analysis directly given
2. Solves Probability Problem: Supervised learning for scoring
functions
Analysis Types:
Dependency Analysis: Favored for free word order languages
(Czech, Turkish)
Phrase Structure Analysis: Used for long-distance dependencies
(English, French)
Parsing Algorithms
Key Concepts:
Derivation: Sequence of steps to derive string from grammar
Sentential Form: Each line in derivation sequence
Rightmost Derivation: Expand rightmost nonterminal at each
step
Algorithm Types:
CKY Parsing: Requires CNF grammar, fills triangular chart bottom-up
Earley Parsing: Handles arbitrary CFGs with dotted rules
Neural Parsers: Use encoders for contextual token vectors
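The toy CKY recognizer sketched below (grammar and sentence invented, assumed already in Chomsky Normal Form) shows the bottom-up chart filling; a real parser would also keep back-pointers to recover trees and rule probabilities to rank them.
python
# A bare-bones CKY recognizer for a toy CNF grammar.
from itertools import product

binary_rules = {("NP", "VP"): {"S"}, ("DT", "N"): {"NP"}, ("V", "NP"): {"VP"}}
lexical_rules = {"Tom": {"NP"}, "ate": {"V"}, "an": {"DT"}, "apple": {"N"}}

def cky_recognize(words):
    n = len(words)
    # chart[i][j] holds the nonterminals that can span words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(lexical_rules.get(w, set()))
    for span in range(2, n + 1):              # fill the chart bottom-up by span length
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):         # try every split point
                for B, C in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= binary_rules.get((B, C), set())
    return "S" in chart[0][n]

print(cky_recognize("Tom ate an apple".split()))  # True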
Ambiguity in NLP
Core Challenge: Ambiguity drives the fundamental difficulty in
language understanding
Types of Ambiguity
1. Lexical Ambiguity
Definition: Word has multiple senses
Example: "bank" = financial institution vs. river edge
Resolution: Word Sense Disambiguation using context and world
knowledge
2. Syntactic Ambiguity
Definition: Same sequence yields multiple parse trees
Example: "I saw the man with a telescope"
Did seeing happen with telescope?
Did the man have the telescope?
Resolution: Probabilistic or neural parsers score structures
3. Semantic Ambiguity
Definition: Sentence-meaning level ambiguity
Example: "Visiting relatives can be boring"
Act of visiting them is boring
Relatives who visit are boring
Requirements: Selectional preferences and event structure
4. Pragmatic Ambiguity
Definition: Depends on context, intention, social norms
Example: "Can you pass the salt?" (request, not ability query)
Resolution: Speech act recognition
5. Referential/Anaphoric Ambiguity
Definition: Pronouns/descriptions have multiple candidates
Example: "Alice told Jane that she would win" (who is "she"?)
Resolution: Coreference resolution and entity tracking
Mathematical Approaches to Ambiguity
Bayesian Scoring
For syntactic ambiguity: P(T|x) ∝ P(x|T)P(T)
Viterbi Algorithm
Used in HMMs for optimal tag sequence selection
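A compact Viterbi sketch for a two-tag toy HMM is shown below; the transition and emission probabilities are invented purely for illustration, whereas a real tagger estimates them from a treebank.
python
# Viterbi decoding over a toy two-state HMM.
import math

states = ["NOUN", "VERB"]
start_p = {"NOUN": 0.6, "VERB": 0.4}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},
           "VERB": {"NOUN": 0.6, "VERB": 0.4}}
emit_p = {"NOUN": {"fish": 0.6, "swim": 0.4},
          "VERB": {"fish": 0.3, "swim": 0.7}}

def viterbi(words):
    # V[t][s] = best log-probability of any tag sequence ending in state s at position t
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][words[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(words)):
        V.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, V[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda x: x[1])
            V[t][s] = score + math.log(emit_p[s][words[t]])
            back[t][s] = prev
    # Trace back-pointers from the best final state
    best = max(V[-1], key=V[-1].get)
    tags = [best]
    for t in range(len(words) - 1, 0, -1):
        best = back[t][best]
        tags.append(best)
    return list(reversed(tags))

print(viterbi(["fish", "swim"]))  # ['NOUN', 'VERB'] with these toy numbers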
Persistent Challenges
Rare senses
Long-distance dependencies
Idioms and sarcasm
Under-specified references
Need for external knowledge and discourse modeling
Algorithms and Models
Algorithmic Families by Task and Data
1. Rule-Based Methods
Components: Finite-state transducers, handcrafted grammars
Advantages: Interpretability, precision in constrained domains
Applications: Tokenization, morphology, parsing
2. Statistical Learning
Models:
N-gram language models
HMMs and CRFs (tagging, segmentation)
PCFGs (parsing with chart algorithms)
Decoding: Dynamic programming
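As a small example of the statistical family, the sketch below builds an add-one (Laplace) smoothed bigram language model over an invented three-sentence corpus; real systems use much larger corpora and better smoothing.
python
from collections import Counter

corpus = ["the cat sat", "the dog sat", "the cat ran"]
sents = [["<s>"] + line.split() + ["</s>"] for line in corpus]

unigrams = Counter(w for sent in sents for w in sent)
bigrams = Counter(pair for sent in sents for pair in zip(sent, sent[1:]))
vocab_size = len(unigrams)

def bigram_prob(prev, word):
    # Add-one smoothing: (count(prev, word) + 1) / (count(prev) + V)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

print(bigram_prob("the", "cat"))  # 0.3 : frequently seen after "the"
print(bigram_prob("cat", "dog"))  # ~0.11 : unseen, but non-zero thanks to smoothing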
3. Neural Models
Evolution:
RNNs and LSTMs (sequences)
CNNs (character/subword features)
Attention mechanisms (long-range dependencies)
Transformer architecture (self-attention only)
Knowledge Bottlenecks
The Knowledge Gap
Definition: Gap between what model encodes vs. what robust language
understanding requires
Problem Areas:
Text alone lacks full commonsense and world knowledge
Labeled data is costly and uneven across domains/languages
Distribution shift causes failures when test differs from training
Classic Error Examples:
Pronoun Resolution: "The trophy doesn't fit in the suitcase
because it is too small" (it = suitcase)
Temporal Reasoning: Understanding time relationships
Spatial Relations: Understanding spatial concepts
Procedural Knowledge: Understanding processes
Strategies to Address Bottlenecks:
1. Pretraining
Massive corpora for linguistic and factual regularities
2. Retrieval Augmentation
Ground generation in up-to-date sources
3. Knowledge Integration
Structured knowledge graphs (Wikidata)
Differentiable memory systems
4. Weak Supervision
Data programming to expand coverage
5. Instruction Tuning
Preference optimization for better task intent following
Advanced Topics
Word-Level Analysis
Text Normalization (Task-Dependent
Decisions):
Lowercasing: Affects proper nouns
Unicode Normalization: NFKC harmonizes similar characters
Punctuation Handling: Differs between IR and sentiment tasks
Number Handling: Map digits to placeholders
Contractions: Expansion improves parsing
Stopword Removal: Reduces noise for bag-of-words, may harm
generation
Stemming vs. Lemmatization:
Stemming: Aggressive suffix chopping (may over-stem)
Lemmatization: Uses vocabulary and POS for accurate lemmas
Tokenization Approaches:
Whitespace → Rule-based → Subword (BPE/WordPiece)
Trade-offs: OOV handling vs. morphological fidelity
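To make the subword idea concrete, here is a toy sketch of the BPE training loop (repeatedly merge the most frequent adjacent symbol pair); the word frequencies are classic illustrative counts, not real corpus statistics.
python
from collections import Counter

# word (as a tuple of symbols, with an end-of-word marker) -> frequency
vocab = {("l", "o", "w", "</w>"): 5,
         ("l", "o", "w", "e", "r", "</w>"): 2,
         ("n", "e", "w", "e", "s", "t", "</w>"): 6,
         ("w", "i", "d", "e", "s", "t", "</w>"): 3}

def most_frequent_pair(vocab):
    pairs = Counter()
    for word, freq in vocab.items():
        for pair in zip(word, word[1:]):
            pairs[pair] += freq
    return pairs.most_common(1)[0][0]

def merge(vocab, pair):
    merged = {}
    for word, freq in vocab.items():
        out, i = [], 0
        while i < len(word):
            if i < len(word) - 1 and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])  # fuse the pair into one symbol
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

for _ in range(3):  # a handful of merges; real vocabularies use tens of thousands
    pair = most_frequent_pair(vocab)
    vocab = merge(vocab, pair)
    print("merged:", pair)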
Edit Distance
Levenshtein Distance:
Counts insertions, deletions, substitutions (unit costs)
Example: "kitten" → "sitting" = 3 edits
1. k → s (substitute)
2. e → i (substitute)
3. Insert g at end
Damerau-Levenshtein Distance:
Adds transposition of adjacent characters
Applications:
Spell correction
DNA sequence comparison
String similarity tasks
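A standard dynamic-programming sketch of Levenshtein distance with unit costs, reproducing the kitten → sitting example above:
python
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))          # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3, matching the example above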
Spelling Correction
Two-Phase Process:
1. Detection: Is token erroneous?
2. Correction: Which candidate is intended?
Error Types:
Nonword Errors: Easy detection with lexicon
Real-word Errors: Require contextual modeling ("peace" vs
"piece")
Candidate Generation Methods:
Edit neighborhoods (distance 1-2)
Keyboard adjacency graphs
Phonetic hashing (Soundex, Metaphone)
Morphological variants
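A short sketch of edit-distance-1 candidate generation in the style of Norvig's well-known spell corrector; the lexicon here is a tiny stand-in for a real word list.
python
import string

lexicon = {"peace", "piece", "speak", "spell", "spelling"}

def edits1(word):
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def candidates(word):
    # keep only candidates that are actual lexicon entries
    return edits1(word) & lexicon

print(candidates("peice"))  # {'piece', 'peace'} - context must pick between them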
POS Tagging Details
Word Classes:
Open Classes: Nouns, verbs, adjectives, adverbs
Closed Classes: Prepositions, determiners, conjunctions,
pronouns, particles
Tagset Examples:
Penn Treebank: NN, NNS, NNP, VB, VBD, VBG, JJ, RB, IN, DT, PRP
Universal Dependencies: NOUN, VERB, ADJ, ADV, ADP, PRON,
DET
Applications:
Supporting parsing
Lemmatization (verb vs noun lemmas)
Downstream tasks like NER
Modern System Patterns
1. Retrieval-Augmented Generation (RAG)
Dense retrievers fetch knowledge
Generate grounded responses
2. Tool-Use Agents
Combine language models with deterministic tools
Ensure reliable actions
Evaluation Considerations
Task-Specific Metrics:
Classification/Extraction: Accuracy, F1-score
Generation: BLEU, ROUGE, BERTScore
Production: Human preference, task success metrics, latency,
cost
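A quick worked example of precision, recall, and F1 for a toy extraction task (the gold and predicted entity sets are invented for illustration):
python
gold = {"Paris", "Alice", "Acme Corp"}
pred = {"Paris", "Alice", "London"}

tp = len(gold & pred)                      # true positives
precision = tp / len(pred)
recall = tp / len(gold)
f1 = 2 * precision * recall / (precision + recall)
print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.67 0.67 0.67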
Deployment Considerations:
Multilingual coverage
Fairness and bias audits
Privacy for sensitive data
Robustness to adversarial prompts/noisy inputs
Summary
This comprehensive guide covers all essential aspects of Natural
Language Processing, from basic concepts and preprocessing techniques
to advanced parsing algorithms and modern neural approaches. The field
continues to evolve rapidly, with current research focusing on large
language models, multimodal systems, and addressing the persistent
challenges of ambiguity, knowledge integration, and robust
understanding across diverse domains and languages.
The integration of statistical methods with neural architectures,
combined with massive pretraining and retrieval-augmented approaches,
represents the current state-of-the-art, while traditional rule-based and
statistical methods remain important for specific applications and
understanding fundamental principles.