Computational Linguistics Notes
MODULE 1:
Overview
NLP: Field combining linguistics, artificial intelligence, computer science, and
cognitive science to enable computers to process, understand, and generate
human language.
Point-to-Point Notes
Objectives:
NLP (LINGUISTICS) 1
7. Pragmatic Analysis: Interpret intended meaning considering real-world
knowledge.
Complexities in NLP:
NLP Flowchart
Text/Speech Input
↓
Preprocessing (Tokenization, Cleaning, Normalization)
↓
Morphological Analysis
↓
Syntactic Parsing
↓
Semantic Analysis
↓
Discourse Integration
↓
Pragmatic Interpretation
↓
Application Output (Translation, Information Extraction, Dialogue, etc.)
Overview
Computational Linguistics: Interdisciplinary domain focused on computational
aspects of human language faculties—both in theory and application.
Two Components:
Point-to-Point Notes
Goals:
Interdisciplinary Nature:
Areas of Focus:
Corpus linguistics: Use of large text datasets for model training and
evaluation.
CL Flowchart
Linguistic Data Collection
↓
Formal Representation (Grammar, Description)
↓
Algorithm Design (Parsing, Generation)
↓
Model Implementation (Theoretical Models)
↓
Evaluation & Refinement (against corpora/cognitive studies)
↓
Application in NLP Tools/Systems
Major Applications
Machine Translation: Automated translation between different languages
(e.g., Google Translate, SYSTRAN).
Speech Recognition & Synthesis: Converting speech to text and vice versa.
Dialogue Systems & Chatbots: Conversational agents (e.g., Siri, Alexa, ELIZA).
Applications Flowchart
Raw Text/Speech
↓
NLP Pipeline (as above)
↓
Task-Specific Processing Module:
Machine Translation
Information Extraction
Summarization
Dialogue Management
↓
End-User Application (translator, bot, summarizer, etc.)
4. Comparison Table
Aspect               | NLP (Natural Language Processing) | Computational Linguistics (CL)
Typical Users        | Engineers, product developers     | Linguists, researchers, AI scientists
Research Orientation | More applied                      | Both applied and theoretical
2000s–Present: Machine learning dominance; large corpora, deep learning,
integrated symbolic-statistical techniques.
3. Feature Extraction: The audio is divided into phonemes (basic sound units).
English has about 44 distinct phonemes.
4. Statistical Analysis: Probabilistic models match phoneme sequences to
words using context.
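A minimal sketch of the statistical step, assuming a toy bigram table. The probabilities, the phoneme sequence, and the names BIGRAM_PROB, CANDIDATES, and pick_word are all invented for illustration; real recognizers use far larger models.

```python
# Toy bigram probabilities P(word | previous word); the numbers are invented.
BIGRAM_PROB = {
    ("over", "there"): 0.60,
    ("over", "their"): 0.05,
    ("lost", "their"): 0.55,
    ("lost", "there"): 0.02,
}

# Hypothetical pronunciation lexicon: one phoneme sequence, several spellings.
CANDIDATES = {
    ("ð", "ɛ", "r"): ["there", "their"],
}


def pick_word(phonemes, previous_word, floor=1e-6):
    """Return the candidate word most probable after previous_word."""
    words = CANDIDATES.get(tuple(phonemes), [])
    if not words:
        return None  # phoneme sequence not in the toy lexicon
    # Unseen word pairs fall back to a small floor probability.
    return max(words, key=lambda w: BIGRAM_PROB.get((previous_word, w), floor))


print(pick_word(["ð", "ɛ", "r"], "over"))
print(pick_word(["ð", "ɛ", "r"], "lost"))
```

The same sounds map to different words depending on the preceding word, which is the role context plays in this step.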
Types of ASR:
Directed Dialogue: The user can only reply with specific, recognized
commands.
Active Learning: The ASR adapts dynamically by learning from new user input
and correcting mistakes.
2. Transcription
Definition:
Transcription converts spoken words into written language, often using a phonetic
alphabet for accuracy.
Types:
Example: “The cat sat on the mat” → /ðə kæt sæt ɑn ðə mæt/
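The example above can be reproduced with a dictionary lookup. This is only a sketch: the PRONUNCIATIONS table and the transcribe function are hypothetical, and the table holds just the words from the example.

```python
# Hypothetical mini pronunciation dictionary, copied from the example above.
PRONUNCIATIONS = {
    "the": "ðə",
    "cat": "kæt",
    "sat": "sæt",
    "on": "ɑn",
    "mat": "mæt",
}


def transcribe(sentence):
    """Return a phonemic transcription, marking unknown words with '?'."""
    words = sentence.lower().strip(".!?").split()
    phonemic = [PRONUNCIATIONS.get(word, "?") for word in words]
    return "/" + " ".join(phonemic) + "/"


print(transcribe("The cat sat on the mat"))  # /ðə kæt sæt ɑn ðə mæt/
```

Words missing from the table come out as "?", which shows why practical systems need a full pronunciation lexicon or letter-to-sound rules.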
What is Sound?
Sound: Vibrations traveling as waves through a medium, perceived by the
brain.
Tone: The steady pitch of a sound; in tone languages, changing tone can alter
meaning.
Example: “You’re coming?” (rising intonation = question)
[Figure: sound wave plotted as amplitude (loudness) against time; the wavelength determines pitch.]
mā (妈) | High | Mother [Mandarin]
🔹 1. Consonants
✅ Definition:
Consonants are speech sounds produced by obstructing airflow in the vocal
tract.
🔍 Key Features:
Produced with restriction or closure.
🧭 Places of Articulation:
Place Description Example Sounds
🔹 2. Vowels
✅ Definition:
Vowels are produced without significant constriction of airflow in the vocal tract.
🔍 Key Features:
Shaped by tongue position and lip rounding.
🗺️ Classified by:
Feature Types
Phones are the physical realization of sounds.
🧾 Example:
[tʰ] in top and [t] in stop are two different phones.
🔹 2. Allophone
An allophone is a variant of a phoneme that occurs in specific contexts.
🧠 Key Point:
Allophones belong to the same phoneme but sound slightly different depending
on context.
🔹 1. Complementary Distribution
✅ Definition:
Two sounds are in complementary distribution when they never occur in the
same phonetic environment.
📌 Key Features:
They are predictable variants of the same phoneme (i.e., allophones).
Substituting one for the other does not change the word’s meaning.
🧾 Example:
[pʰ] (aspirated) in pin vs. [p] (unaspirated) in spin
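Because the variants are predictable, the choice between them can be written as a rule. The sketch below assumes a deliberately simplified rule (aspirated at the start of a word, plain elsewhere); the function names are hypothetical, and real English aspiration also depends on stress.

```python
def realize_p(index):
    """Pick the allophone of /p/ from its position in the word.

    Simplified assumption: aspirated word-initially, unaspirated elsewhere
    (which covers /p/ after /s/, as in "spin").
    """
    return "pʰ" if index == 0 else "p"


def narrow_transcription(phonemes):
    """Replace each /p/ with its predicted allophone and wrap in brackets."""
    phones = [
        realize_p(i) if phoneme == "p" else phoneme
        for i, phoneme in enumerate(phonemes)
    ]
    return "[" + "".join(phones) + "]"


print(narrow_transcription(["p", "ɪ", "n"]))       # [pʰɪn]
print(narrow_transcription(["s", "p", "ɪ", "n"]))  # [spɪn]
```

The environment alone decides which phone appears, so the two never compete in the same position.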
🔸 2. Contrastive Distribution
✅ Definition:
Two sounds are in contrastive distribution if they occur in the same environment
but change the meaning of the word.
📌 Key Features:
They are separate phonemes.
Substitution leads to a minimal pair (words that differ by only one sound).
🧾 Example:
/p/ vs. /b/ in:
🔹 Structure Components:
Syllable = Onset + Nucleus + Coda
Component Description Example (Word: "cat")
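A rough sketch of the Onset + Nucleus + Coda split for a single syllable. It assumes spelling stands in for sounds and treats the first run of vowel letters as the nucleus; VOWELS and split_syllable are hypothetical names.

```python
VOWELS = set("aeiou")  # assumption: letters stand in for sounds


def split_syllable(syllable):
    """Split one syllable into (onset, nucleus, coda) around its vowel run."""
    start = next((i for i, ch in enumerate(syllable) if ch in VOWELS), None)
    if start is None:
        raise ValueError("syllable has no vowel nucleus")
    end = start
    # Extend the nucleus across consecutive vowel letters.
    while end < len(syllable) and syllable[end] in VOWELS:
        end += 1
    return syllable[:start], syllable[start:end], syllable[end:]


print(split_syllable("cat"))     # ('c', 'a', 't')
print(split_syllable("stream"))  # ('str', 'ea', 'm')
print(split_syllable("go"))      # ('g', 'o', '')
```

Only the nucleus is required: "go" has an empty coda, and a syllable such as "at" would have an empty onset.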
🔊 2. Stress Pattern
✅ Definition:
Stress is the emphasis placed on a syllable within a word or phrase.
🔹 Types:
Word Stress: One syllable is more prominent in a word.
Longer duration
Higher pitch
🧩 3. Minimal Pair
✅ Definition:
A minimal pair is a pair of words that differ by only one sound in the same
position but have different meanings.
🔹 Purpose:
Used to identify phonemes in a language.
🧾 Examples:
Word 1 Word 2 Difference
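The form side of the definition can be checked mechanically. This sketch assumes words are already given as lists of phonemes; the function name and the sample words (pat, bat) are illustrative choices, and the check says nothing about meaning, which still has to be confirmed separately.

```python
def is_minimal_pair(first, second):
    """True if two phoneme sequences differ in exactly one position."""
    if len(first) != len(second):
        return False
    differences = sum(1 for a, b in zip(first, second) if a != b)
    return differences == 1


print(is_minimal_pair(["p", "æ", "t"], ["b", "æ", "t"]))       # True
print(is_minimal_pair(["p", "ɪ", "n"], ["s", "p", "ɪ", "n"]))  # False
```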
1. What is Morphology?
1. Definition: Study of the internal structure of words and the rules for word
formation.
2. Morphemes
Free Morphemes
Bound Morphemes
Cannot stand alone; must attach to a stem (e.g., prefixes un-, pre-,
suffixes -ing, -ness).
3. Types of Morphology
Morphological processes fall into two major categories:
A. Derivational Morphology
Purpose: Creates new words or changes a word’s part of speech/meaning.
Examples:
1. Prefixes
2. Suffixes
Note: Often changes word class (e.g., run (verb) → runner (noun) via -er).
B. Inflectional Morphology
Purpose: Modifies a word to express grammatical features (tense, number,
degree) without changing its core meaning or part of speech.
1. Verbs
2. Nouns
-s (plural): cat‑s
’s (possessive): Laura‑’s
3. Adjectives
-er (comparative): quick‑er
Key Point: Does not change word class—just marks grammatical distinctions.
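A toy suffix stripper for the three inflections listed above. It is a sketch under strong assumptions: it works on spelling only, uses a straight apostrophe, and the names INFLECTIONS and analyze are hypothetical.

```python
# Inflectional suffixes from the notes; "'s" is listed before "s"
# so the possessive is tested before the plural.
INFLECTIONS = [
    ("'s", "possessive"),
    ("er", "comparative"),
    ("s", "plural"),
]


def analyze(word):
    """Return (stem, suffix, feature), or (word, None, None) if nothing matches."""
    for suffix, feature in INFLECTIONS:
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], suffix, feature
    return word, None, None


print(analyze("cats"))     # ('cat', 's', 'plural')
print(analyze("Laura's"))  # ('Laura', "'s", 'possessive')
print(analyze("quicker"))  # ('quick', 'er', 'comparative')
```

Plain string matching over-applies: it would also label "runner" as a comparative, although that -er is derivational. A practical analyzer needs a lexicon and part-of-speech information to tell the two apart.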
NLP Applications:
🧠 Acquisition
B.F. Skinner – Behaviourist Theory of Language
📌 Main Idea:
Language is learned through imitation, reinforcement, and conditioning,
similar to how animals learn behaviors.
🧪 Background:
Based on behaviourist psychology and experiments with animals (e.g., rats
and pigeons).
✅ Key Features:
1. Imitation:
2. Positive Reinforcement:
3. Negative Reinforcement:
🧠 Example:
Child: “Ball” → Parent gives the ball → Child learns to associate word with the
object.
Over time, correct words are reinforced, and incorrect ones fade.
She learned words but never fully mastered grammar.
🧬 Acquisition
Noam Chomsky – Innateness Theory of Language
📌 Main Idea:
Children are born with an innate ability to acquire language.
Allows children to extract grammar rules from the language they hear.
🌐 Universal Grammar:
All human languages share basic rules (e.g., presence of nouns, verbs, word
order).
The LAD (Language Acquisition Device) helps the child identify how their native language fits those universal rules.
🧠 Examples:
Child hears “played”, “worked” → applies rule → says “goed” (instead of
“went”).
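The "goed" error can be modelled as a regular rule applied before the exception has been memorized. This is a loose illustration rather than a cognitive model; IRREGULAR_PAST and past_tense are hypothetical names.

```python
# Hypothetical store of memorized irregular past forms.
IRREGULAR_PAST = {"go": "went", "eat": "ate"}


def past_tense(verb, known_irregulars):
    """Use a memorized irregular form if known, else apply the regular -ed rule."""
    if verb in known_irregulars:
        return known_irregulars[verb]
    return verb + "ed"


print(past_tense("play", {}))            # played
print(past_tense("go", {}))              # goed (rule over-applied)
print(past_tense("go", IRREGULAR_PAST))  # went
```

The error is evidence of rule use: "goed" is never heard from adults, so it cannot come from imitation alone.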
✅ Supporting Evidence:
1. Virtuous Errors:
2. Creole Languages:
Suggests children’s brains impose structure on limited input.
4. Brain Studies:
Watched TV and heard speech, but did not develop language until a
speech therapist interacted with him.
LAD explains “how” language is learned but not why children want to
speak or communicate.
Natural Language Processing (NLP):
Detailed Notes
Overview
Natural Language Processing (NLP) is an interdisciplinary field combining
linguistics, artificial intelligence (AI), computer science, and cognitive science. Its
goal is to enable computers to process, understand, interpret, and generate
human language, both written and spoken [1, 2, 3].
Main Application Areas:
Objectives
Facilitate natural, intuitive interaction between humans and machines using
text or speech.
1. Data Acquisition
Text Data: Collect raw text from sources like documents, web pages, emails,
and chat logs.
2. Preprocessing
Tokenization: Divide text into sentences and words (called tokens).
Example:
Original: "NLP techniques are advancing rapidly!"
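Word-level tokenization of the sentence above can be sketched with one regular expression. The tokenize function is a hypothetical helper that assumes punctuation marks should become separate tokens; library tokenizers handle many more cases (contractions, URLs, abbreviations).

```python
import re


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+ matches a run of letters/digits; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)


print(tokenize("NLP techniques are advancing rapidly!"))
# ['NLP', 'techniques', 'are', 'advancing', 'rapidly', '!']
```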
3. Morphological Analysis
Identifies structure within words, such as prefixes, suffixes, and roots.
Example Parse:
"The cat sat on the mat."
Subject: cat
Verb: sat
5. Semantic Analysis
Assigns meaning to syntactically valid sentences.
Example:
"I saw the man with the telescope."
6. Discourse Integration
Links meaning across sentences and paragraphs.
Example:
7. Pragmatic Analysis
Interprets intended meaning beyond literal text.
Ambiguity
Lexical Ambiguity: Single word, multiple meanings (e.g., "bank": riverbank vs.
financial institution) [8, 9].
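One simple way to resolve the "bank" example is to count words shared between the context and each sense, in the spirit of the simplified Lesk approach. The sketch below is a toy: the SENSES inventory and its signature words are invented, and disambiguate is a hypothetical name.

```python
# Hypothetical sense inventory: each sense of "bank" has a few signature words.
SENSES = {
    "riverbank": {"river", "water", "shore", "fishing"},
    "financial institution": {"money", "loan", "account", "deposit"},
}


def disambiguate(context_words):
    """Return the sense whose signature overlaps most with the context."""
    context = {word.lower() for word in context_words}
    # On a tie (including zero overlap) the first listed sense is returned.
    return max(SENSES, key=lambda sense: len(SENSES[sense] & context))


print(disambiguate("she opened an account at the bank".split()))
print(disambiguate("we went fishing by the bank".split()))
```

The first call selects "financial institution" and the second "riverbank", because only the surrounding words differ.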
Context-Dependence
Context can drastically change meaning: Words or sentences require
surrounding information for clarity.
Example: "She gave her dog food" (Did she give her dog some food, or did
she give food that belonged to her dog?).
Must be robust to errors and gaps in input [10, 11].
Language Diversity
Enormous variation in grammar, word order, morphology, and idioms across
languages complicates universal NLP solutions [10].
Applications of NLP
Machine translation (e.g., Google Translate)
Text summarization
Sentiment analysis
Morphological Analysis | Identify and analyze word roots and forms | "running" → root "run"
Syntactic Analysis     | Determine grammar, parse structure        | Parse tree, POS tags
Semantic Analysis      | Assign meaning to phrases and sentences   | Sense of "bank" in context