
TM340 Natural Language Processing

Introduction to Natural Language Processing

Based on slides by Dan Jurafsky and Chris Manning


Agenda

• Course Overview
• Introduction to Natural Language Processing
• Linguistic Levels
• Ambiguity
• NLP Pipeline and Main Approaches
• Turing Test
• A brief history of NLP
Course Overview

Credit and mark distribution

• An 8-credit, one-semester course
• Prerequisite course: TM271
• Assessment: TMA (20%), MTA (30%), Final Exam (50%)
• To pass the course you must obtain:
  • A minimum of 40% on the continuous assessment (TMA and MTA)
  • A minimum of 40% on the final exam
  • A minimum of 50% as the average of the continuous assessment and the final exam
Course Structure

• The course covers the following topics:
  • Introduction to Natural Language Processing
  • Text Processing and Normalization
  • Language Modeling
  • Text Classification
  • Information Retrieval
  • Ranked Information Retrieval
  • Sequence Labeling for Parts of Speech
  • Vector Semantics and Embeddings
  • Neural Language Models
  • Chatbots and Dialogue Systems
Introduction to Natural Language Processing
Language Processing in Computers

• The concept of computers processing human language has existed since the inception of computers.

• This field aims to develop systems that enable computers to perform tasks involving human language, such as:
  • Extracting information from language
  • Interacting with humans via language
Natural Language Processing

• Natural Language Processing (NLP) is a field of artificial intelligence that focuses on the interaction between computers and human language. It involves the development of algorithms and models that enable computers to understand, interpret, and generate human-like language.
Natural Language Processing

• Natural Language Processing has roots in multiple disciplines:
  • Natural Language Processing (Computer Science)
  • Computational Linguistics (Linguistics)
  • Speech Recognition (Electrical Engineering)
  • Computational Psycholinguistics (Psychology)
Natural Language Processing in the Commercial World
Personal Assistants

Natural Language Processing Main Tasks

• Natural Language Processing (NLP) is a vast field with numerous tasks. Here are some of the core ones:
  • Information Retrieval and Extraction
  • Text Classification
  • Chatbots and Dialogue Systems
  • Web-Based Question Answering
  • Machine Translation
  • Summarization
  • Computational Biology: Comparing Sequences
Linguistic Levels

Natural Language Processing

• What distinguishes language processing applications from other data processing systems is their use of knowledge of language.

• To process language, we need to consider the levels at which natural language can be analyzed.
Linguistic Levels

• Phonetics and Phonology: Knowledge of sounds.
• Morphology: Word components and meanings.
• Syntax: Structural relationships between words.
• Semantics: Meaning of words and sentences.
• Pragmatics: Speaker’s intentions and goals.
• Discourse: Connections across larger text units.
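Several of these levels can be inspected directly with standard tools. Below is a minimal sketch using the spaCy library; it assumes the small English model en_core_web_sm has been installed (python -m spacy download en_core_web_sm), and the example sentence is purely illustrative.

    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("I made her duck.")
    for token in doc:
        # Morphology: the lemma (base form) of each word.
        # Syntax: the part-of-speech tag and the dependency relation.
        print(token.text, token.lemma_, token.pos_, token.dep_)

Higher levels such as pragmatics and discourse are much harder to read off a single sentence and remain active research areas.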
Why Is Natural Language Processing Hard?

• Natural Language Processing (NLP) is considered difficult for several reasons.

• One of the main reasons is ambiguity.
Ambiguity

Ambiguity

• Natural language is designed to make human communication efficient:
  • We omit a lot of “common sense” knowledge, which we assume the hearer/reader possesses.
  • We keep a lot of ambiguities, which we assume the hearer/reader knows how to resolve.
Ambiguity

• Ambiguity occurs when a sentence or word can have multiple interpretations or meanings.

• Speech and language processing involve resolving ambiguities.

"I made her duck"

• This sentence demonstrates various types of ambiguity at different linguistic levels.
Ambiguity

"I made her duck"

• Morphological/Syntactic Ambiguity:
  • Word 'duck': can be a noun (waterfowl) or a verb (to lower the head/body).
  • Word 'her': can be a dative pronoun (to/for her) or a possessive pronoun (belonging to her).
Ambiguity

"I made her duck"

• Semantic Ambiguity:
  • Word 'make': can mean create or cook.

• Structural/Syntactic Ambiguity:
  • Transitive use: "I made her duck" = I cooked her waterfowl.
  • Ditransitive use: "I made her duck" = I turned her into a duck.
  • Causative use: "I made her duck" = I caused her to lower her head.
Ambiguity

"I made her duck"

• Phonological Ambiguity:
  • The spoken sentence may be misinterpreted as "Eye maid her duck" instead of "I made her duck".

• Speech Act Interpretation:
  • Determining whether a sentence is a statement or a question.
Ambiguity

• Disambiguation Techniques:
  • Part-of-Speech Tagging: resolves whether "duck" is a noun or a verb (see the sketch after this list).
  • Word Sense Disambiguation: determines whether "make" means "create" or "cook".
  • Probabilistic Parsing: addresses syntactic ambiguities, e.g., whether "her" and "duck" are the same or different entities.
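A minimal sketch of the first technique, part-of-speech tagging with NLTK; it assumes the tokenizer and tagger models have been downloaded (resource names can vary slightly across NLTK versions).

    import nltk

    # One-time model downloads for the tokenizer and the tagger.
    nltk.download("punkt", quiet=True)
    nltk.download("averaged_perceptron_tagger", quiet=True)

    for sentence in ["I made her duck", "The duck swam across the lake"]:
        tokens = nltk.word_tokenize(sentence)
        print(nltk.pos_tag(tokens))
    # The tagger commits to one tag per token in context, e.g. tagging
    # "duck" as a noun (NN) in the second sentence; genuinely ambiguous
    # sentences like the first can still trip it up.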
Generic NLP Pipeline

Generic NLP Pipeline

• The concept of a pipeline in NLP:
  • Streamlines organization
  • Promotes flexibility
  • Supports collaboration
  • Ensures maintainability throughout the development process
  • Systematically tackles the unique challenges of NLP
  • Enables the conversion of raw text data into meaningful insights
Generic NLP Pipeline

1. Problem Definition and Business Value
2. Data Collection
3. Data Preprocessing
4. Feature Engineering
5. Model Selection and Training
6. Model Evaluation
7. Deployment
8. Monitoring and Maintenance
Generic NLP Pipeline

1. Problem Definition and Business Value: Before starting, it's important to know why you're using NLP. What problem are you trying to solve? How will it help your business? Make sure you have a clear goal.

2. Data Collection: Gather the raw text data needed for your NLP task. This could involve scraping websites, accessing databases, using pre-existing datasets, or other methods of collecting relevant data.

3. Data Preprocessing: Clean and prepare the text. Common tasks include breaking the text into smaller units (tokenization), converting everything to lowercase, and reducing words to their base form (stemming/lemmatization). These steps standardize the text for analysis.

4. Feature Engineering: Convert the preprocessed text into numerical forms that a machine learning model can understand. Techniques like TF-IDF (Term Frequency-Inverse Document Frequency) or word embeddings (e.g., Word2Vec or GloVe) are often used. A short sketch of steps 3-4 follows.
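Below is a minimal, illustrative sketch of steps 3-4 using NLTK for preprocessing and scikit-learn for TF-IDF features; the toy documents are invented for the example.

    import nltk
    from nltk.stem import PorterStemmer
    from sklearn.feature_extraction.text import TfidfVectorizer

    nltk.download("punkt", quiet=True)  # one-time tokenizer model download

    stemmer = PorterStemmer()

    def preprocess(text):
        # Tokenize, lowercase, keep alphabetic tokens, and stem to a base form.
        tokens = nltk.word_tokenize(text.lower())
        return " ".join(stemmer.stem(t) for t in tokens if t.isalpha())

    docs = ["The cats are chasing the mice.", "A cat chased a mouse yesterday."]
    cleaned = [preprocess(d) for d in docs]

    # Feature engineering: turn the cleaned text into TF-IDF vectors.
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(cleaned)
    print(vectorizer.get_feature_names_out())
    print(X.toarray().round(2))

After stemming, "cats"/"cat" and "chasing"/"chased" collapse to shared base forms, so the two documents end up with overlapping features.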
Generic NLP Pipeline

5. Model Selection and Training: Choose and train a suitable model for your NLP task, such as text classification, named entity recognition, or translation. The model learns patterns from the numerical data.

6. Model Evaluation: After training, evaluate your model's performance using metrics like accuracy, precision, recall, or F1-score, depending on your task. This helps you see how well the model works on new, unseen data.

7. Deployment: Once the model performs well, deploy it in a real-world setting where it can make predictions on new text data. This might involve integrating it into an app, API, or other systems.

8. Monitoring and Updating: After deployment, keep an eye on how the model performs in the real world. If its accuracy decreases or it becomes less reliable, you may need to retrain or update the model to keep it effective.
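Continuing the sketch, steps 5-6 with scikit-learn: a toy text classifier trained and evaluated on a tiny invented dataset.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    # Invented toy dataset: 1 = positive review, 0 = negative review.
    texts = [
        "great movie, loved it", "excellent plot and acting",
        "what a wonderful film", "really enjoyed this one",
        "terrible movie, hated it", "awful plot and bad acting",
        "what a boring film", "really disliked this one",
    ]
    labels = [1, 1, 1, 1, 0, 0, 0, 0]

    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, stratify=labels, random_state=0
    )

    # Step 5: feature extraction and model training in one pipeline object.
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)

    # Step 6: evaluate on held-out data (precision, recall, F1 per class).
    print(classification_report(y_test, model.predict(X_test)))

On such a tiny dataset the numbers are meaningless, but the shape of the workflow (fit on training data, report metrics on held-out data) is the same one used at scale.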
Main Approaches for NLP

• There are three main approaches to NLP:
  • Rule-Based NLP: the oldest approach. It uses pre-written rules to understand language (a toy example follows this list).
  • Statistical NLP: uses mathematics and statistics to understand language.
  • Deep Learning NLP: the newest and most powerful approach. It uses large amounts of data and neural networks to learn language.
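To make the contrast concrete, here is a minimal sketch of a rule-based approach: a keyword-matching sentiment classifier whose word lists are invented for illustration. A statistical approach would instead learn such associations from labeled data, as in the scikit-learn sketch in the pipeline section.

    import re

    # Hand-written rules: invented keyword lists for a toy sentiment task.
    POSITIVE = {"good", "great", "excellent", "love", "enjoyed"}
    NEGATIVE = {"bad", "terrible", "awful", "hate", "boring"}

    def rule_based_sentiment(text):
        tokens = re.findall(r"[a-z']+", text.lower())
        score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
        return "positive" if score > 0 else "negative" if score < 0 else "neutral"

    print(rule_based_sentiment("I love this great course"))  # positive

Rule-based systems like this are transparent and easy to start with, but every new phenomenon (negation, sarcasm, new vocabulary) requires another hand-written rule, which is exactly the limitation that statistical and deep learning methods address.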
NLP tools

• Popular Natural Language Processing (NLP) tools and libraries include:
  • Natural Language Toolkit (NLTK)
  • spaCy
  • Transformers (Hugging Face)
  • TextBlob
  • KerasNLP
  • TensorFlow
  • Stanford NLP
  • Google AI tools
  • Facebook AI tools
Turing Test

Turing Test

• The effective use of language is intertwined with general cognitive abilities, making it a significant marker of intelligence.

• Turing Test (1950): introduced as an empirical test of whether a machine could think.

• Turing Test concept: a machine is considered intelligent if it can use language to fool a human interrogator into believing it is human.
The Turing Test

• Participants:
  1. Interrogator (Human) (C)
  2. Computer (A)
  3. Another Human (B)

• Objective: the interrogator must determine which participant is the machine by asking questions.

• Machine's task: respond like a human to fool the interrogator.
A brief history of NLP

A brief history of NLP

• Foundational Insights (1940s-1950s)
  • Automaton and Probabilistic Models: origins in post-WWII work, including Turing's model of algorithmic computation.
  • Finite-State Machines: contributions by Chomsky, Kleene, and Shannon on language and grammar.
  • Shannon's Information Theory: introduced the concepts of noisy channels and entropy in language.
  • Early Speech Recognition: Bell Labs' system recognized spoken digits with high accuracy.
A brief history of NLP

• Symbolic vs. Stochastic Paradigms (1957-1970)
  • Symbolic Paradigm:
    • Generative Syntax: Chomsky's work in formal language theory and the AI focus on reasoning and logic.
    • Early AI Systems: used pattern matching and keyword searches.
  • Stochastic Paradigm:
    • Bayesian Methods: applied to text recognition and authorship attribution (Bledsoe and Browning, 1959).
    • Psychological Models: based on transformational grammar, alongside the rise of online corpora (e.g., the Brown Corpus).
A brief history of NLP

• Research Paradigms (1970-1983)
  • Stochastic Paradigm: introduction of Hidden Markov Models (HMMs) for speech recognition.
  • Logic-Based Paradigm: development of Prolog, Lexical Functional Grammar, and natural language understanding systems.
  • Discourse Modeling: focused on discourse structure, reference resolution, and speech acts.
A brief history of NLP

• Empiricism and Finite-State Models Redux (1983-1993)
  • Finite-State Models: revival in phonology, morphology, and syntax.
  • Rise of Empiricism: data-driven and probabilistic methods in speech and language processing.
  • Evaluation Focus: quantitative metrics and model comparison.
  • Natural Language Generation: expansion of work in this area.
A brief history of NLP

• The Field Comes Together (1994-1999)
  • Standardization of Probabilistic Models: incorporated into NLP tasks like parsing and tagging.
  • Technological Advances: enabled commercial applications in speech recognition and grammar correction.
  • Web Influence: increased demand for language-based information retrieval and extraction.
A brief history of NLP

• The Rise of Machine Learning (2000-2008)
  • Annotated Resources: availability of large datasets like the Penn Treebank facilitated supervised learning.
  • Machine Learning Techniques: SVMs, maximum entropy, and Bayesian models became standard.
  • High-Performance Computing: enabled deployment of complex systems.
  • Unsupervised Learning: gained traction due to difficulties in obtaining annotated data (e.g., machine translation, topic modeling).
A brief history of NLP

• Current trends:
  • Transformer Models and Pre-trained Language Models: the continued advancement of models like BERT, GPT, and T5, which have revolutionized NLP by offering state-of-the-art performance on a wide range of tasks through large-scale pre-training.
  • Few-Shot, Zero-Shot, and Transfer Learning: the development of models that can generalize across tasks with minimal training data, making them highly adaptable and reducing the need for task-specific datasets.
  • Ethical AI and Bias Mitigation: growing emphasis on addressing biases in NLP models, ensuring that AI systems are fair, transparent, and ethical, particularly in their impact on diverse demographic groups.
  • Multimodal and Multilingual NLP: the integration of NLP with other modalities (e.g., vision, speech) and the expansion of NLP capabilities to handle multiple languages, including low-resource languages, to make AI more universally accessible.
Thank You
