Introduction to NLP (Natural Language Processing)
What is NLP?
Natural Language Processing (NLP) is a subfield of Artificial Intelligence (AI) and computational linguistics
that enables computers to understand, interpret, and generate human language in a way that is both
meaningful and useful. It combines insights from linguistics, computer science, and machine learning,
using statistical and deep learning methods to process and analyze large amounts of natural language
data and to bridge the gap between human communication and computational systems.
Key Concepts in NLP
1. Human Language Complexity:
o Language is inherently ambiguous, context-dependent, and varies across cultures,
dialects, and individuals.
o NLP aims to model this complexity to perform tasks like translation, sentiment
analysis, or question answering.
2. Goals of NLP:
o Understanding: Extract meaning from text or speech (e.g., identifying intent in
"Can you open the window?").
o Generation: Produce coherent and contextually appropriate language (e.g.,
chatbots generating responses).
o Interaction: Enable seamless human-computer communication (e.g., virtual
assistants like Siri or Alexa).
Importance of NLP
Automates text-based tasks (e.g., translation, summarization, and sentiment analysis).
Improves human-computer interaction (e.g., chatbots and virtual assistants).
Enhances data-driven decision-making (e.g., extracting insights from customer reviews
and social media).
Applications of NLP
1. Machine Translation – Google Translate, DeepL.
2. Speech Recognition – Siri, Alexa, Google Assistant.
3. Sentiment Analysis – Identifying opinions in reviews and social media posts.
4. Text Summarization – Summarizing news articles or reports.
5. Chatbots and Virtual Assistants – AI-powered customer service bots.
6. Information Retrieval – Search engines like Google.
7. Text-to-Speech (TTS) and Speech-to-Text (STT) – Assistive technologies.
8. Grammar and Spell Checking – Grammarly, MS Word Editor.
Historical Context:
Rule-Based Systems (1950s–1980s): Relied on handcrafted
grammatical rules (e.g., ELIZA chatbot).
Statistical Methods (1990s–2010s): Leveraged probabilistic models
(e.g., Hidden Markov Models for speech recognition).
Deep Learning Era (2010s–present): Dominated by neural
networks (e.g., Transformers, BERT) for context-aware processing.
Challenges in NLP
Ambiguity: Words or sentences can have multiple meanings.
Context Understanding: Words change meaning based on context.
Sarcasm and Irony: Difficult for machines to detect.
Languages and Dialects: Variation in grammar, structure, and vocabulary.
Data Quality and Quantity: NLP models need large, high-quality datasets for training.
NLP Tasks in Syntax, Semantics, and
Pragmatics (Components of NLP)
1. Syntax-based NLP Tasks (Structure of Language)
Syntax refers to the arrangement of words in a sentence following grammatical rules. NLP tasks
that deal with syntax include:
a) Part-of-Speech (POS) Tagging
Purpose: Assign a grammatical category (noun, verb, adjective, etc.) to each token.
Example: "She runs" → [("She", PRP), ("runs", VBZ)].
Another example:
Sentence: "The dog runs quickly."
POS Tags: The (DET) dog (NOUN) runs (VERB) quickly (ADV).
Methods: Hidden Markov Models (HMMs), CRFs, or neural networks.
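For illustration, a minimal Python sketch using spaCy's pre-trained English pipeline (the en_core_web_sm model is assumed to be installed); it prints both the coarse POS tag and the fine-grained Penn Treebank tag for each token:

import spacy

nlp = spacy.load("en_core_web_sm")   # pre-trained English pipeline (assumed installed)
doc = nlp("The dog runs quickly.")

for token in doc:
    # coarse part-of-speech (pos_) and fine-grained Penn Treebank tag (tag_)
    print(token.text, token.pos_, token.tag_)
# Expected: The DET DT, dog NOUN NN, runs VERB VBZ, quickly ADV RB, . PUNCT .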
b) Parsing
Analyzes the grammatical structure of a sentence.
Types of Parsing:
o Dependency Parsing: Identifies grammatical relationships between words (e.g., subject, object).
Example: "The cat sat" → "cat" (subject) → "sat" (root).
o Constituency Parsing: Breaks a sentence into nested phrases by building a parse tree
with phrasal grammar rules (e.g., NP → Det + N).
Example:
Sentence: "The cat sat on the mat."
Parse Tree:
(S
(NP (DT The) (NN cat))
(VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))
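As a sketch of dependency parsing, the same spaCy pipeline (en_core_web_sm, assumed installed) can print each word, its dependency relation, and the head word it attaches to:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The cat sat on the mat.")

for token in doc:
    # word, dependency label, and the head it depends on
    print(token.text, token.dep_, "->", token.head.text)
# e.g. "cat nsubj -> sat" shows "cat" is the subject of the root verb "sat"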
c) Lemmatization & Stemming
o Lemmatization: Reduces words to base forms using dictionaries
(e.g., "running" → "run").
o Stemming: Chops off affixes crudely (e.g., "studies" → "studi", "running" → "run").
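A short sketch contrasting the two with NLTK (the nltk package and its WordNet data are assumed to be installed; note the lemmatizer needs the part of speech to map "running" to "run"):

import nltk
from nltk.stem import WordNetLemmatizer, PorterStemmer
nltk.download('wordnet')   # lexical database used by the lemmatizer (assumed available)

lemmatizer = WordNetLemmatizer()
stemmer = PorterStemmer()

print(lemmatizer.lemmatize("running", pos="v"))  # 'run'   (dictionary-based)
print(stemmer.stem("studies"))                   # 'studi' (crude suffix stripping)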
d) Sentence Segmentation
Splitting text into individual sentences.
Example:
Input: "Dr. Smith is a scientist. He works at NASA."
Output: ["Dr. Smith is a scientist.", "He works at NASA."]
Challenges: Ambiguous periods (e.g., "Mr. Smith arrived.").
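For illustration, a minimal sketch using spaCy (en_core_web_sm assumed installed), whose sentence splitter typically handles common abbreviations such as "Dr.":

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Dr. Smith is a scientist. He works at NASA.")

print([sent.text for sent in doc.sents])
# ['Dr. Smith is a scientist.', 'He works at NASA.']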
e) Word Tokenization
Splitting sentences into individual words.
Example:
Input: "Natural Language Processing is amazing!"
Output: ["Natural", "Language", "Processing", "is", "amazing", "!"]
2. Semantics-based NLP Tasks (Meaning of Language)
Semantics deals with the meaning of words, phrases, and sentences. NLP tasks that focus on
semantics include:
a) Named Entity Recognition (NER)
Identifies named entities such as persons, organizations, locations, and dates in text.
Example:
Sentence: "Elon Musk founded SpaceX in 2002."
NER Output:
o Person: Elon Musk
o Organization: SpaceX
o Date: 2002
Another example:
"Apple launched iPhone in 2007." → [ORG: Apple], [PRODUCT: iPhone],
[DATE: 2007].
Methods: CRFs, BiLSTM-CRF, or transformer models.
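For illustration, a minimal sketch with spaCy's pre-trained NER component (en_core_web_sm assumed installed; the exact labels depend on the model):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Elon Musk founded SpaceX in 2002.")

for ent in doc.ents:
    print(ent.text, ent.label_)
# Expected (model-dependent): Elon Musk PERSON, SpaceX ORG, 2002 DATE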
b) Word Sense Disambiguation (WSD)
Determines the correct meaning of a polysemous word based on its context.
Example:
Sentence: "He went to the bank to withdraw money."
o Bank (Financial Institution) vs. Bank (Riverbank)
o WSD ensures the correct meaning is chosen.
Challenges: Requires context (e.g., "I deposited money at the bank").
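A minimal sketch using the classic Lesk algorithm from NLTK, which picks the WordNet sense whose definition overlaps most with the surrounding words (WordNet data assumed installed; Lesk is a simple baseline and can choose the wrong sense):

import nltk
from nltk.wsd import lesk
nltk.download('wordnet')   # WordNet sense inventory (assumed available)

context = "He went to the bank to withdraw money".lower().split()
sense = lesk(context, "bank", pos="n")
# prints the WordNet synset Lesk selects and its gloss; the overlap heuristic is not always right
print(sense, "-", sense.definition())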
c) Semantic Role Labeling (SRL)
Identifies who did what to whom, when, and where in a sentence.
Example:
Sentence: "John gave Mary a book."
o Agent: John
o Recipient: Mary
o Theme: Book
o "Mary ate an apple." → [Agent: Mary], [Action: ate], [Patient: apple].
d) Text Similarity and Paraphrasing
Measures how similar two texts are.
Example:
Sentence 1: "He is going to New York tomorrow."
Sentence 2: "Tomorrow, he will travel to NYC."
o NLP can recognize these as having the same meaning.
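As a simple illustration, a TF-IDF cosine-similarity sketch with scikit-learn (assumed installed). Note this only measures word overlap, so it scores the paraphrase above lower than an embedding-based model would:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = ["He is going to New York tomorrow.",
             "Tomorrow, he will travel to NYC."]

vectors = TfidfVectorizer().fit_transform(sentences)
score = cosine_similarity(vectors[0], vectors[1])[0][0]
print(f"Similarity: {score:.2f}")   # lexical-overlap similarity between the two sentences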
e) Sentiment Analysis
Determines the emotion behind a text (positive, negative, or neutral).
Example:
Sentence: "I love this phone, it’s amazing!"
Sentiment: Positive
Methods: Lexicon-based approaches (e.g., VADER) or deep learning (e.g., BERT).
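A minimal lexicon-based sketch using the VADER analyzer shipped with NLTK (the vader_lexicon data is assumed to be installed):

import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
nltk.download('vader_lexicon')   # sentiment lexicon used by VADER (assumed available)

sia = SentimentIntensityAnalyzer()
scores = sia.polarity_scores("I love this phone, it's amazing!")
print(scores)   # a 'compound' score above 0.05 is usually read as positive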
f) Semantic Parsing:
o Purpose: Convert text to structured meaning representations
(e.g., SQL queries).
o Example: "Show flights to Paris" → SELECT * FROM flights WHERE
destination = 'Paris';
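Real semantic parsers are usually learned models; purely for illustration, a toy rule-based sketch that maps the pattern "Show flights to <city>" onto the SQL template above (the function name and grammar here are hypothetical):

import re

def parse_to_sql(utterance: str) -> str:
    # Toy pattern: "Show flights to <destination>"
    match = re.match(r"show flights to (\w+)", utterance, re.IGNORECASE)
    if not match:
        raise ValueError("utterance not covered by this toy grammar")
    return f"SELECT * FROM flights WHERE destination = '{match.group(1)}';"

print(parse_to_sql("Show flights to Paris"))
# SELECT * FROM flights WHERE destination = 'Paris';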
3. Pragmatics-based NLP Tasks (Context and Intent)
Pragmatics deals with interpreting the contextual meaning of language in different situations,
including implied intentions.
a) Coreference Resolution
Identifies when different words refer to the same entity.
Example:
Sentence: "Sara went to the store. She bought milk."
o She refers to Sara.
o "John said he’s late." → "he" refers to "John".
Methods: Rule-based systems (e.g., Hobbs' algorithm) or neural models (e.g., SpanBERT).
b) Text Summarization
Generates a short version of a long text while keeping key information.
Types:
o Extractive Summarization: Selects key sentences from the text.
o Abstractive Summarization: Generates new sentences with the same meaning.
Example:
Original Text: "The global economy is facing challenges due to inflation and supply
chain issues."
Summary: "Global economy struggles with inflation and supply chains."
c) Dialogue Systems (Chatbots and Conversational AI)
Understands and generates human-like conversations.
Example:
o User: "What’s the weather today?"
o Bot: "It’s sunny and 25°C."
d) Speech Recognition and Text-to-Speech
Converts spoken language into text (Speech-to-Text).
Converts text into spoken words (Text-to-Speech).
Example:
o Voice assistants like Google Assistant, Siri, and Alexa.
e) Question Answering (QA)
Answers specific questions based on a document or knowledge base.
Example:
Question: "Who is the President of the USA?"
Answer: "Joe Biden."
f) Discourse Analysis
o Purpose: Understand how sentences connect (e.g., cause-effect,
contrast).
o Example: "It rained. The match was canceled." → The second
sentence is a result of the first.
g) Speech Act Recognition
o Purpose: Classify utterances into actions (e.g., request,
command).
o Example: "Can you pass the salt?" → Polite request (not a literal
question).
h) Implied Meaning & Sarcasm Detection
o Purpose: Infer unstated meaning or detect sarcasm.
o Example: "Great job!" (context-dependent—could be sincere or
sarcastic).
o Challenges: Requires world knowledge and tonal cues (often
absent in text).