ACADEMIC YEAR 2024-2025
IFETCE R-2023
Unit I – INTRODUCTION

1.1 NLP: Overview:
Natural Language Processing (NLP) is a fascinating and rapidly evolving field that intersects computer science, artificial intelligence, and linguistics.
With the increasing volume of text data generated every day, from social media posts to research articles, NLP has become an essential tool for extracting valuable insights and automating various tasks.
Natural language processing (NLP) is a field of computer science and a subfield of artificial intelligence that aims to make computers understand human language.

1.1.2 NLP Techniques:
Text Processing and Preprocessing in NLP
Syntax and Parsing in NLP
Semantic Analysis
Information Extraction
Text Classification in NLP
Language Generation
Speech Processing
Question Answering
Dialogue Systems
Sentiment and Emotion Analysis in NLP

1.1.3 Working of Natural Language Processing (NLP):
Working in natural language processing (NLP) typically involves using computational techniques to analyze and understand human language. This can include tasks such as language understanding, language generation, and language interaction.

1.1.4 Applications of Natural Language Processing (NLP):
Spam Filters
Algorithmic Trading
Question Answering
Summarizing Information

1.1.5 Future Scope:
Bots: Chatbots assist clients to get to the point quickly by answering inquiries and referring them to relevant resources and products at any time of day or night.
Supporting Invisible UI: Almost every connection we have with machines involves human communication, both spoken and written.
Smarter Search: NLP’s future also includes improved search, something we’ve been discussing at Expert System for a long time.

1.2 Approaches in NLP:
https://www.geeksforgeeks.org/rule-based-approach-in-nlp/
In the context of Natural Language Processing (NLP), "approaches" refer to different methodologies or techniques used to tackle various tasks related to understanding and processing human language.
There are three types of NLP approaches:
Rule-based Approach – based on linguistic rules and patterns
Machine Learning Approach – based on statistical analysis
Neural Network Approach – based on various artificial, recurrent, and convolutional neural network algorithms

1.3 Data Acquisition:
https://www.linkedin.com/pulse/data-acquisition-natural-language-processing-nlp-vivekanandan
Data acquisition is the process of gathering and collecting data for use in natural language processing (NLP) tasks. The quality and quantity of the data are critical to the success of any NLP model.
There are a number of different ways to acquire data for NLP tasks. Some common methods include:
Crawling and scraping the web
Using social media data
Customer reviews
Using public datasets
Generating synthetic data
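The first method listed, crawling and scraping the web, can be sketched with Python's built-in html.parser. Everything here (the TextCollector class, the extract_text helper, the sample page) is an illustrative assumption, not part of the notes; a real crawler would first download pages, for example with the requests library:

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects visible text from an HTML page, skipping script/style blocks."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a script/style element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Returns the visible text of an already-downloaded HTML page."""
    parser = TextCollector()
    parser.feed(html)
    return " ".join(parser.chunks)


page = ("<html><head><style>p{color:red}</style></head>"
        "<body><h1>NLP</h1><p>Data acquisition example.</p></body></html>")
print(extract_text(page))  # NLP Data acquisition example.
```

The scraped text would then feed the preprocessing steps described in the later sections.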
1.4 Text extraction: Unicode Normalization:
Text extraction in NLP refers to the process of identifying and extracting relevant information or structured data from unstructured textual data. This is particularly useful for tasks such as information retrieval, information extraction, and summarization.
Techniques involved in text extraction:
Entity Extraction
Keyword Extraction
Phrase Extraction
Information Extraction
Template Filling
Text Summarization
Feature Extraction
Document Classification

Unicode Normalization
http://www.unicode.org/reports/tr15/
Unicode normalization makes processing more uniform; a Unicode normalization standard decomposes a character into its basic parts.
Unicode Normalization Forms are formally defined normalizations of Unicode strings which make it possible to determine whether any two Unicode strings are equivalent to each other. Depending on the particular Unicode Normalization Form, that equivalence can be either a canonical equivalence or a compatibility equivalence.
The four Unicode Normalization Forms are summarized in Table 1.
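The canonical and compatibility equivalences described above can be demonstrated with Python's standard unicodedata module; the example strings below are illustrative:

```python
import unicodedata

# "é" can be stored as one code point (U+00E9) or as "e" plus a
# combining acute accent (U+0301) -- canonically equivalent, yet unequal.
composed = "caf\u00e9"
decomposed = "cafe\u0301"
print(composed == decomposed)  # False: different code point sequences

# NFC composes, NFD decomposes; normalizing makes the equivalence testable.
print(unicodedata.normalize("NFC", decomposed) == composed)    # True
print(unicodedata.normalize("NFD", composed) == decomposed)    # True

# NFKC additionally applies compatibility mappings,
# e.g. the ligature "ﬁ" (U+FB01) becomes the two letters "fi".
print(unicodedata.normalize("NFKC", "\ufb01le"))  # file
```

Normalizing text this way before comparison or tokenization avoids treating equivalent strings as distinct.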
1.4.3 Spell Corrections:
https://www.naukri.com/code360/library/spelling-correction-in-nlp
One way to deal with spelling errors in NLP is by using techniques such as spell checking, phonetic matching, and incorporating language models that handle out-of-vocabulary words effectively.
1.4.3.1 Techniques commonly used to handle spelling errors:
Spell Checking
Phonetic Matching
Language Models
Rule-Based Approaches
User Feedback
Domain-Specific Customization
Pre-processing

1.5 Text preprocessing:
https://www.analyticsvidhya.com/blog/2021/06/text-preprocessing-in-nlp-with-python-codes/
Text preprocessing is an essential step in natural language processing (NLP) that involves cleaning and transforming unstructured text data to prepare it for analysis. It includes tokenization, stemming, lemmatization, stop-word removal, and part-of-speech tagging.
Text preprocessing prepares the text data for model building; it is the very first step of NLP projects. Some of the preprocessing steps are:
Removing punctuations like . , ! $ ( ) * % @
Removing URLs
Removing stop words
Lower casing
Tokenization
Stemming
Lemmatization
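Several of the preprocessing steps listed above (lower casing, URL removal, punctuation removal, tokenization, stop-word removal) can be sketched with the standard library alone. The STOP_WORDS set below is a tiny illustrative sample, not a real stop-word list, and stemming/lemmatization would normally use a library such as NLTK rather than hand-written code:

```python
import re
import string

# Illustrative stop-word sample; real projects use a fuller list (e.g. NLTK's).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in"}


def preprocess(text):
    """Applies a few of the frequent preprocessing steps in order."""
    text = text.lower()                                # lower casing
    text = re.sub(r"https?://\S+", "", text)           # removing URLs
    text = text.translate(                             # removing punctuation
        str.maketrans("", "", string.punctuation))
    tokens = text.split()                              # whitespace tokenization
    return [t for t in tokens if t not in STOP_WORDS]  # stop-word removal


print(preprocess("Text preprocessing is the FIRST step! See https://example.com"))
# ['text', 'preprocessing', 'first', 'step', 'see']
```

The order matters: URLs are stripped before punctuation removal, otherwise "https://example.com" would degrade into stray tokens.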
Preliminaries
https://www.slideshare.net/slideshow/lecture-2-preliminaries-understanding-and-preprocessing-data/54905946
In Natural Language Processing (NLP), the preliminaries in preprocessing refer to the initial steps taken to prepare raw text data before it can be used for more advanced linguistic analysis or modeling tasks.
These preliminary steps are crucial as they help clean and transform the text into a format that is more suitable for the specific NLP task at hand.
Here are some common preliminaries in preprocessing:
Text Cleaning
Tokenization
Stopword Removal
Normalization
Handling Noise
Handling Rare Words
Sentence Segmentation
Part-of-Speech Tagging (POS tagging)
Feature Extraction
These preliminaries are essential because they lay the groundwork for more advanced NLP tasks such as sentiment analysis, named entity recognition, machine translation, and more.

1.5.1 Frequent steps:
https://www.analyticsvidhya.com/blog/2021/06/text-preprocessing-in-nlp-with-python-codes/
Steps in NLP:
Tokenization
Stemming
Lemmatization
Part-of-speech (POS) tagging
Named entity recognition
Chunking

1.6 Feature engineering:
https://www.geeksforgeeks.org/feature-extraction-techniques-nlp/
Feature engineering is the process of transforming raw data into features that are suitable for machine learning models.
In other words, it is the process of selecting, extracting, and transforming the most relevant features from the available data to build more accurate and efficient machine learning models.
Processes Involved in Feature Engineering:
Feature Creation
Feature Transformation
Feature Extraction
Feature Selection
Feature Scaling
Techniques Used in Feature Engineering:
One-Hot Encoding
Binning
Scaling
Feature Split
Text Data Preprocessing
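As a minimal illustration of turning raw text into model-ready features, the sketch below builds count vectors over a shared vocabulary (a bag-of-words encoding, closely related to the one-hot encoding listed above). The bag_of_words function and the example documents are assumptions for illustration only:

```python
def bag_of_words(docs):
    """Builds a sorted vocabulary and one count vector per document."""
    vocab = sorted({w for doc in docs for w in doc.split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)          # one slot per vocabulary word
        for w in doc.split():
            vec[index[w]] += 1          # count occurrences in this document
        vectors.append(vec)
    return vocab, vectors


docs = ["nlp is fun", "nlp is hard"]
vocab, vectors = bag_of_words(docs)
print(vocab)    # ['fun', 'hard', 'is', 'nlp']
print(vectors)  # [[1, 0, 1, 1], [0, 1, 1, 1]]
```

In practice a library vectorizer (e.g. scikit-learn's CountVectorizer) performs this step, but the underlying structure is the same: each document becomes a fixed-length numeric vector a model can consume.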
Feature Engineering Tools:
There are several tools available for feature engineering; some popular ones are:
Featuretools
TPOT
DataRobot
Alteryx
H2O.ai

1.6.1 Machine Learning Pipeline in NLP:
https://www.geeksforgeeks.org/natural-language-processing-nlp
Feature engineering in the context of Machine Learning (ML) and Deep Learning (DL) pipelines in Natural Language Processing (NLP) refers to the process of creating meaningful and relevant features from raw text data that can be used as input to machine learning or deep learning models.
This is crucial because raw text data, being unstructured, needs to be transformed into a structured format that can effectively capture the underlying patterns and relationships in the data.
Here’s how feature engineering fits into ML and DL pipelines in NLP:
Machine Learning Pipeline in NLP:
Text Preprocessing
Feature Extraction
Feature Selection/Engineering
Model Training and Evaluation
Deep Learning Pipeline in NLP:
Text Preprocessing
Feature Representation
Model Architecture
Training and Optimization
Fine-tuning
Evaluation
Integration of Feature Engineering in ML and DL Pipelines:
Pipeline Design: The design of ML and DL pipelines in NLP often involves integrating various stages of text preprocessing, feature extraction, model training, and evaluation.
Iterative Process: Feature engineering is often an iterative process where different features and representations are experimented with to find the most effective ones for the task at hand.
Domain Knowledge: Incorporating domain knowledge and task-specific requirements into feature engineering enhances the relevance and effectiveness of the features extracted.

1.6.2 Modelling:
In Natural Language Processing (NLP), modelling refers to the process of building computational models that can understand, generate, or analyze human language. These models are designed to process textual data in a way that enables them to perform specific tasks or solve particular problems.
Tasks in NLP Modeling:
Text Classification
Named Entity Recognition (NER)
Machine Translation
Text Generation
Question Answering
Sentiment Analysis
Steps in NLP Modeling:
Data Preparation
Model Selection
Training
Evaluation
Deployment and Fine-tuning
Challenges in NLP Modeling:
Ambiguity and Variability
Data Sparsity
Interpretable Representations

1.6.3 Evaluation:
Evaluation metrics are quantitative measures used to assess the performance and effectiveness of Natural Language Processing (NLP) systems.
These metrics help evaluate how well a particular NLP system performs its intended task, such as machine translation, sentiment analysis, or named entity recognition.
Importance of evaluation metrics:
Accuracy of NLP Results
Comparative Analysis
Improvement and Optimization
Task-Specific Expertise
Quality Assurance
Key areas covered under the umbrella of evaluation metrics:
Precision and Recall
F1 Score
Accuracy
Perplexity
Task-Specific Metrics
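Precision, recall, and F1 from the list above can be computed directly from true/false positives and negatives. The sketch below assumes binary labels (1 = positive class) and uses illustrative data:

```python
def precision_recall_f1(gold, predicted):
    """Computes precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0   # correctness of positives
    recall = tp / (tp + fn) if tp + fn else 0.0      # coverage of positives
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # harmonic mean of the two
    return precision, recall, f1


gold      = [1, 1, 1, 0, 0]
predicted = [1, 0, 1, 1, 0]
p, r, f = precision_recall_f1(gold, predicted)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.67 0.67 0.67
```

Here tp=2, fp=1, fn=1, so both precision and recall are 2/3; F1, being their harmonic mean, is also 2/3.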
Applications of Evaluation Metrics:
Model Development and Selection
Algorithm Fine-tuning and Optimization
Benchmarking and Research Comparisons
Quality Assurance and User Satisfaction
Performance Monitoring and Error Analysis
1.6.4 Post Modelling Phases:
Post-modeling phases in Natural Language Processing (NLP) involve activities that occur after the model has been trained and evaluated.
The key post-modeling phases in NLP are:
Model Evaluation and Validation
Hyperparameter Tuning
Model Deployment
Performance Monitoring and Maintenance
Iterative Improvement and Feedback Loop
Ethical Considerations and Bias Mitigation
Documentation and Knowledge Sharing
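The model deployment phase above can be sketched in miniature: persist a trained model, then reload it in the serving process. The toy lexicon "model" and predict function below are hypothetical stand-ins for a real trained model, and Python's standard pickle module stands in for a proper serialization/serving stack:

```python
import pickle

# Toy "model": a sentiment lexicon (hypothetical, for illustration only).
model = {"positive_words": {"good", "great"},
         "negative_words": {"bad", "poor"}}


def predict(model, text):
    """Labels text by counting lexicon hits; a stand-in for real inference."""
    words = set(text.lower().split())
    score = (len(words & model["positive_words"])
             - len(words & model["negative_words"]))
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


# Deployment step: serialize after training, deserialize at serving time.
blob = pickle.dumps(model)
restored = pickle.loads(blob)
print(predict(restored, "a good result"))  # positive
```

The same pattern underlies real deployments: the training pipeline writes the model artifact once, and the monitored serving system loads it for every prediction.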