ACADEMIC YEAR 2024-2025
IFETCE R-2023
Unit I – INTRODUCTION

1.1 NLP: Overview:
Natural Language Processing (NLP) is a fascinating and rapidly evolving field that intersects computer science, artificial intelligence, and linguistics.
With the increasing volume of text data generated every day, from social media posts to research articles, NLP has become an essential tool for extracting valuable insights and automating various tasks.
Natural language processing (NLP) is a field of computer science and a subfield of artificial intelligence that aims to make computers understand human language.

1.1.2 NLP Techniques:
Text Processing and Preprocessing in NLP
Syntax and Parsing in NLP
Semantic Analysis
Information Extraction
Text Classification in NLP
Language Generation
Speech Processing
Question Answering
Dialogue Systems
Sentiment and Emotion Analysis in NLP

1.1.3 Working of Natural Language Processing (NLP):
Working in natural language processing (NLP) typically involves using computational techniques to analyze and understand human language. This can include tasks such as language understanding, language generation, and language interaction.

1.1.4 Applications of Natural Language Processing (NLP):
Spam Filters
Algorithmic Trading
Question Answering
Summarizing Information

1.1.5 Future Scope:
Bots: Chatbots assist clients to get to the point quickly by answering inquiries and referring them to relevant resources and products at any time of day or night.
Supporting Invisible UI: Almost every connection we have with machines involves human communication, both spoken and written.
Smarter Search: NLP’s future also includes improved search, something we’ve been discussing at Expert System for a long time.

1.2 Approaches in NLP:
https://www.geeksforgeeks.org/rule-based-approach-in-nlp/
In the context of Natural Language Processing (NLP), "approaches" refer to different methodologies or techniques used to tackle various tasks related to understanding and processing human language.
There are three types of NLP approaches:
Rule-based Approach – based on linguistic rules and patterns
Machine Learning Approach – based on statistical analysis
Neural Network Approach – based on various artificial, recurrent, and convolutional neural network algorithms

1.3 Data Acquisition:
https://www.linkedin.com/pulse/data-acquisition-natural-language-processing-nlp-vivekanandan
Data acquisition is the process of gathering and collecting data for use in natural language processing (NLP) tasks. The quality and quantity of the data are critical to the success of any NLP model.
There are a number of different ways to acquire data for NLP tasks. Some common methods include:
Crawling and scraping the web
Using social media data
Customer reviews
Using public datasets
Generating synthetic data
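The first method listed, crawling and scraping the web, can be sketched with Python's built-in html.parser. Everything here (the TextCollector class, the extract_text helper, the sample page) is an illustrative assumption, not part of the notes; a real crawler would first download pages, for example with the requests library:

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects visible text from an HTML page, skipping script/style blocks."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a script/style element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Returns the visible text of an already-downloaded HTML page."""
    parser = TextCollector()
    parser.feed(html)
    return " ".join(parser.chunks)


page = ("<html><head><style>p{color:red}</style></head>"
        "<body><h1>NLP</h1><p>Data acquisition example.</p></body></html>")
print(extract_text(page))  # NLP Data acquisition example.
```

The scraped text would then feed the preprocessing steps described in the later sections.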
1.4 Text extraction: Unicode Normalization:
Text extraction in NLP refers to the process of identifying and extracting relevant information or structured data from unstructured textual data. This is particularly useful for tasks such as information retrieval, information extraction, and summarization.
Techniques involved in text extraction:
Entity Extraction
Keyword Extraction
Phrase Extraction
Information Extraction
Template Filling
Text Summarization
Feature Extraction
Document Classification

Unicode Normalization
http://www.unicode.org/reports/tr15/
Unicode normalization makes processing more uniform; a Unicode normalization standard decomposes a character into its basic parts.
Unicode Normalization Forms are formally defined normalizations of Unicode strings which make it possible to determine whether any two Unicode strings are equivalent to each other. Depending on the particular Unicode Normalization Form, that equivalence can be either a canonical equivalence or a compatibility equivalence.
The four Unicode Normalization Forms are summarized in Table 1.
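The canonical and compatibility equivalences described above can be demonstrated with Python's standard unicodedata module; the example strings below are illustrative:

```python
import unicodedata

# "é" can be stored as one code point (U+00E9) or as "e" plus a
# combining acute accent (U+0301) -- canonically equivalent, yet unequal.
composed = "caf\u00e9"
decomposed = "cafe\u0301"
print(composed == decomposed)  # False: different code point sequences

# NFC composes, NFD decomposes; normalizing makes the equivalence testable.
print(unicodedata.normalize("NFC", decomposed) == composed)    # True
print(unicodedata.normalize("NFD", composed) == decomposed)    # True

# NFKC additionally applies compatibility mappings,
# e.g. the ligature "ﬁ" (U+FB01) becomes the two letters "fi".
print(unicodedata.normalize("NFKC", "\ufb01le"))  # file
```

Normalizing text this way before comparison or tokenization avoids treating equivalent strings as distinct.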
1.4.3 Spell Corrections:
https://www.naukri.com/code360/library/spelling-correction-in-nlp
One way to deal with spelling errors in NLP is by using techniques such as spell checking, phonetic matching, and incorporating language models that handle out-of-vocabulary words effectively.
1.4.3.1 Techniques commonly used to handle spelling errors:
Spell Checking
Phonetic Matching
Language Models
Rule-Based Approaches
User Feedback
Domain-Specific Customization
Pre-processing

1.5 Text preprocessing:
https://www.analyticsvidhya.com/blog/2021/06/text-preprocessing-in-nlp-with-python-codes/
Text preprocessing is an essential step in natural language processing (NLP) that involves cleaning and transforming unstructured text data to prepare it for analysis. It includes tokenization, stemming, lemmatization, stop-word removal, and part-of-speech tagging.
Text preprocessing prepares the text data for model building; it is the very first step of NLP projects. Some of the preprocessing steps are:
Removing punctuations like . , ! $ ( ) * % @
Removing URLs
Removing stop words
Lower casing
Tokenization
Stemming
Lemmatization
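Several of the preprocessing steps listed above (lower casing, URL removal, punctuation removal, tokenization, stop-word removal) can be sketched with the standard library alone. The STOP_WORDS set below is a tiny illustrative sample, not a real stop-word list, and stemming/lemmatization would normally use a library such as NLTK rather than hand-written code:

```python
import re
import string

# Illustrative stop-word sample; real projects use a fuller list (e.g. NLTK's).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in"}


def preprocess(text):
    """Applies a few of the frequent preprocessing steps in order."""
    text = text.lower()                                # lower casing
    text = re.sub(r"https?://\S+", "", text)           # removing URLs
    text = text.translate(                             # removing punctuation
        str.maketrans("", "", string.punctuation))
    tokens = text.split()                              # whitespace tokenization
    return [t for t in tokens if t not in STOP_WORDS]  # stop-word removal


print(preprocess("Text preprocessing is the FIRST step! See https://example.com"))
# ['text', 'preprocessing', 'first', 'step', 'see']
```

The order matters: URLs are stripped before punctuation removal, otherwise "https://example.com" would degrade into stray tokens.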
Preliminaries
https://www.slideshare.net/slideshow/lecture-2-preliminaries-understanding-and-preprocessing-data/54905946
In Natural Language Processing (NLP), the preliminaries in preprocessing refer to the initial steps taken to prepare raw text data before it can be used for more advanced linguistic analysis or modeling tasks.
These preliminary steps are crucial as they help clean and transform the text into a format that is more suitable for the specific NLP task at hand.
Here are some common preliminaries in preprocessing:
Text Cleaning
Tokenization
Stopword Removal
Normalization
Handling Noise
Handling Rare Words
Sentence Segmentation
Part-of-Speech Tagging (POS tagging)
Feature Extraction
These preliminaries are essential because they lay the groundwork for more advanced NLP tasks such as sentiment analysis, named entity recognition, machine translation, and more.

1.5.1 Frequent steps:
https://www.analyticsvidhya.com/blog/2021/06/text-preprocessing-in-nlp-with-python-codes/
Steps in NLP:
Tokenization
Stemming
Lemmatization
Part-of-speech (POS) tagging
Named entity recognition
Chunking

1.6 Feature engineering:
https://www.geeksforgeeks.org/feature-extraction-techniques-nlp/
Feature engineering is the process of transforming raw data into features that are suitable for machine learning models.
In other words, it is the process of selecting, extracting, and transforming the most relevant features from the available data to build more accurate and efficient machine learning models.
Processes Involved in Feature Engineering:
Feature Creation
Feature Transformation
Feature Extraction
Feature Selection
Feature Scaling
Techniques Used in Feature Engineering:
One-Hot Encoding
Binning
Scaling
Feature Split
Text Data Preprocessing
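As a minimal illustration of turning raw text into model-ready features, the sketch below builds count vectors over a shared vocabulary (a bag-of-words encoding, closely related to the one-hot encoding listed above). The bag_of_words function and the example documents are assumptions for illustration only:

```python
def bag_of_words(docs):
    """Builds a sorted vocabulary and one count vector per document."""
    vocab = sorted({w for doc in docs for w in doc.split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)          # one slot per vocabulary word
        for w in doc.split():
            vec[index[w]] += 1          # count occurrences in this document
        vectors.append(vec)
    return vocab, vectors


docs = ["nlp is fun", "nlp is hard"]
vocab, vectors = bag_of_words(docs)
print(vocab)    # ['fun', 'hard', 'is', 'nlp']
print(vectors)  # [[1, 0, 1, 1], [0, 1, 1, 1]]
```

In practice a library vectorizer (e.g. scikit-learn's CountVectorizer) performs this step, but the underlying structure is the same: each document becomes a fixed-length numeric vector a model can consume.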
Feature Engineering Tools:
There are several tools available for feature engineering; some popular ones are:
Featuretools
TPOT
DataRobot
Alteryx
H2O.ai

1.6.1 Machine Learning Pipeline in NLP:
https://www.geeksforgeeks.org/natural-language-processing-nlp
Feature engineering in the context of Machine Learning (ML) and Deep Learning (DL) pipelines in Natural Language Processing (NLP) refers to the process of creating meaningful and relevant features from raw text data that can be used as input to machine learning or deep learning models.
This is crucial because raw text data, being unstructured, needs to be transformed into a structured format that can effectively capture the underlying patterns and relationships in the data.
Here’s how feature engineering fits into ML and DL pipelines in NLP:
Machine Learning Pipeline in NLP:
Text Preprocessing
Feature Extraction
Feature Selection/Engineering
Model Training and Evaluation
Deep Learning Pipeline in NLP:
Text Preprocessing
Feature Representation
Model Architecture
Training and Optimization
Fine-tuning
Evaluation
Integration of Feature Engineering in ML and DL Pipelines:
Pipeline Design: The design of ML and DL pipelines in NLP often involves integrating various stages of text preprocessing, feature extraction, model training, and evaluation.
Iterative Process: Feature engineering is often an iterative process where different features and representations are experimented with to find the most effective ones for the task at hand.
Domain Knowledge: Incorporating domain knowledge and task-specific requirements into feature engineering enhances the relevance and effectiveness of the features extracted.

1.6.2 Modelling:
In Natural Language Processing (NLP), modelling refers to the process of building computational models that can understand, generate, or analyze human language. These models are designed to process textual data in a way that enables them to perform specific tasks or solve particular problems.
Tasks in NLP Modeling:
Text Classification
Named Entity Recognition (NER)
Machine Translation
Text Generation
Question Answering
Sentiment Analysis
Steps in NLP Modeling:
Data Preparation
Model Selection
Training
Evaluation
Deployment and Fine-tuning
Challenges in NLP Modeling:
Ambiguity and Variability
Data Sparsity
Interpretable Representations

1.6.3 Evaluation:
Evaluation metrics are quantitative measures used to assess the performance and effectiveness of Natural Language Processing (NLP) systems.
These metrics help evaluate how well a particular NLP system performs its intended task, such as machine translation, sentiment analysis, or named entity recognition.
Importance of evaluation metrics:
Accuracy of NLP Results
Comparative Analysis
Improvement and Optimization
Task-Specific Expertise
Quality Assurance
Key areas covered under the umbrella of evaluation metrics:
Precision and Recall
F1 Score
Accuracy
Perplexity
Task-Specific Metrics
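Precision, recall, and F1 from the list above can be computed directly from true/false positives and negatives. The sketch below assumes binary labels (1 = positive class) and uses illustrative data:

```python
def precision_recall_f1(gold, predicted):
    """Computes precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0   # correctness of positives
    recall = tp / (tp + fn) if tp + fn else 0.0      # coverage of positives
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # harmonic mean of the two
    return precision, recall, f1


gold      = [1, 1, 1, 0, 0]
predicted = [1, 0, 1, 1, 0]
p, r, f = precision_recall_f1(gold, predicted)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.67 0.67 0.67
```

Here tp=2, fp=1, fn=1, so both precision and recall are 2/3; F1, being their harmonic mean, is also 2/3.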
Applications of Evaluation Metrics:
Model Development and Selection
Algorithm Fine-tuning and Optimization
Benchmarking and Research Comparisons
Quality Assurance and User Satisfaction
Performance Monitoring and Error Analysis
1.6.4 Post Modelling Phases:
Post-modeling phases in Natural Language Processing (NLP) involve activities that occur after the model has been trained and evaluated.
The key post-modeling phases in NLP are:
Model Evaluation and Validation
Hyperparameter Tuning
Model Deployment
Performance Monitoring and Maintenance
Iterative Improvement and Feedback Loop
Ethical Considerations and Bias Mitigation
Documentation and Knowledge Sharing
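The model deployment phase above can be sketched in miniature: persist a trained model, then reload it in the serving process. The toy lexicon "model" and predict function below are hypothetical stand-ins for a real trained model, and Python's standard pickle module stands in for a proper serialization/serving stack:

```python
import pickle

# Toy "model": a sentiment lexicon (hypothetical, for illustration only).
model = {"positive_words": {"good", "great"},
         "negative_words": {"bad", "poor"}}


def predict(model, text):
    """Labels text by counting lexicon hits; a stand-in for real inference."""
    words = set(text.lower().split())
    score = (len(words & model["positive_words"])
             - len(words & model["negative_words"]))
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


# Deployment step: serialize after training, deserialize at serving time.
blob = pickle.dumps(model)
restored = pickle.loads(blob)
print(predict(restored, "a good result"))  # positive
```

The same pattern underlies real deployments: the training pipeline writes the model artifact once, and the monitored serving system loads it for every prediction.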