Unit 6: Natural Language Processing
Introduction
A natural language is a human language, such as French, Spanish, English, Japanese, etc.
Features of Natural Languages
● They are governed by set rules that include syntax, lexicon, and semantics.
● All natural languages are redundant, i.e., the information can be conveyed in multiple ways.
● All natural languages change over time.
Computer Language
Computer languages are languages used to interact with a computer, such as Python, C++, Java, HTML,
etc.
Can computers understand our language?
Computers require a specific set of instructions, called programs, to understand
human input.
To talk to a computer, we convert natural language into a
language that a computer understands. We need Natural
Language Processing to help computers understand natural
language.
Why is NLP important?
Computers can only process electronic signals in the form of
binary language. Natural Language Processing facilitates the
conversion of natural language into this digital form.
Thus, the whole purpose of NLP is to make communication
between computer systems and humans possible. This includes
creating different tools and techniques that facilitate better
communication of intent and context.
Demystify Natural Language Processing (NLP)
Natural Language Processing, or NLP, is the sub-field of AI focused on enabling computers to
analyse, understand, and process human languages in order to derive meaningful information
from them.
Applications of Natural Language Processing
Since Artificial Intelligence is becoming an integral part of our lives, its applications are
commonly used by most people every day.
Here are some applications of Natural Language Processing that are used in real-life
scenarios:
Voice assistants: Voice assistants take our natural speech, process
it, and give us an output. These assistants leverage NLP to
understand natural language and execute tasks efficiently.
For example:
Hey Google, set an alarm at 3.30 pm
Hey Alexa, play some music
Hey Siri, what's the weather today
Auto-generated captions: Captions are generated by turning natural
speech into text in real-time. It is a valuable feature for enhancing
the accessibility of video content.
For example:
Auto-generated captions on YouTube and Google Meet.
Language Translation: Language translation converts text or
speech from one language to another, facilitating cross-linguistic
communication and fostering global connectivity.
For example:
Google Translate
Sentiment Analysis: Sentiment Analysis is a tool that determines
whether the underlying sentiment of a piece of text is positive,
negative, or neutral. Customer sentiment analysis helps in the
automatic detection of emotions when customers interact with
products, services, or a brand.
Text Classification: Text classification is a tool which classifies a
sentence or document category-wise.
In the example, we can observe news articles containing
information on various sectors, including Food, Sports, and
Politics, being categorized through the text classification process.
This process classifies the raw texts into predefined groups or
categories.
Keyword Extraction: Keyword extraction is a tool that automatically
extracts the most frequently used and most important words and expressions from a text.
It can give valuable insights into people’s opinions about any business
on social media.
Customer Service can be improved by using a Keyword extraction
tool.
Stages of Natural Language Processing (NLP)
The different stages of Natural Language Processing (NLP) serve various purposes in the overall task of
understanding and processing human language. The stages of Natural Language Processing (NLP)
typically involve the following:
Lexical Analysis:
NLP starts with identifying the structure of the input words. It is the process of dividing a large chunk of
text into structural units such as paragraphs, sentences, and words.
Lexicon stands for a collection of the various words and phrases used in a language.
Lengthy text is broken down into chunks.
Syntactic Analysis / Parsing
It is the process of checking the grammar of sentences and phrases. It forms a relationship among words
and eliminates logically incorrect sentences.
The grammar is correct!
Semantic Analysis
In this stage, the input text is now checked for meaning, and every word and phrase is checked for
meaningfulness.
For example:
It will reject a phrase like ‘hot ice cream’ or a sentence like ‘The fox jumped into the dog’, as they do not carry valid meaning.
Sentences make actual sense!
Discourse Integration
It is the process of forming the story of the sentence. Every sentence should have a relationship with its
preceding and succeeding sentences.
The flow of words makes sense!
Pragmatic Analysis
In this stage, sentences are checked for their relevance in the real world. Pragmatic means practical or
logical; this step requires knowledge of the intent behind a sentence. Where needed, the literal meaning
obtained after semantic analysis is set aside in favour of the intended meaning.
The intended meaning has been achieved!
In summary, the lexical, syntactic, semantic, discourse, and pragmatic stages together take raw text from individual words and grammar all the way to its intended meaning.
Chatbots
One of the most common applications of Natural Language Processing is a chatbot.
A chatbot is a computer program that's designed to simulate human conversation through voice
commands or text chats or both.
It can learn over time how best to interact with humans. It can answer questions and troubleshoot
customer problems, evaluate and qualify prospects, generate sales leads, and increase sales on an
e-commerce site. There are a lot of chatbots available.
Let us try some of the chatbots and see how they work.
Elizabot - https://www.masswerk.at/elizabot/
Kuki (formerly Mitsuku) - https://www.kuki.ai/
Cleverbot - https://www.cleverbot.com/
Singtel - https://www.singtel.com/personal/support
As you interact with more and more chatbots, you will realise that some of them are scripted (in other
words, traditional chatbots), while others are AI-powered and have more knowledge.
From this experience, we can understand that there are two types of chatbots around us:
Script-bot and Smart-bot.
Text Processing
Humans interact with each other very easily. For us, the natural languages that we use are so convenient
that we speak them easily and understand them well too. But for computers, our languages are very
complex.
As you have already gone through some of the complications in human languages above, now it is time
to see how Natural Language Processing makes it possible for machines to understand and speak in
Natural Languages just like humans.
Since the language of computers is numerical, the very first step that comes to mind
is to convert our language into numbers. This conversion takes a few steps.
The first step to it is Text Normalisation.
Since human languages are complex, we first need to simplify them so that machines can understand
them. Text Normalisation helps in cleaning up the textual data and bringing it down to a level where its
complexity is lower than that of the original data.
Text Normalisation
In Text Normalisation, we go through several steps to normalise the text to a lower level. Before we begin,
we need to understand that in this section, we will be working on a collection of written text.
That is, we will be working on text from multiple documents, and the whole textual data from all the
documents taken together is known as the corpus.
We will not only go through all the steps of Text Normalisation but also work them out on a corpus.
Let us take a look at the steps:
Sentence Segmentation
Under sentence segmentation, the whole corpus is divided into sentences. Each sentence is treated as a
separate piece of data, so the whole corpus is reduced to a set of sentences.
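To see what this looks like in practice, here is a minimal Python sketch of sentence segmentation (the sample corpus is illustrative, and the simple punctuation-based split is only an approximation of what NLP libraries such as NLTK or spaCy do):

import re

# Illustrative corpus (not from any real dataset)
corpus = ("Aman and Avni are stressed. Aman went to a therapist. "
          "Avni went to download a health chatbot.")

# Split wherever a sentence-ending mark (. ! ?) is followed by whitespace.
sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", corpus) if s.strip()]

for sentence in sentences:
    print(sentence)
# Aman and Avni are stressed.
# Aman went to a therapist.
# Avni went to download a health chatbot.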
Tokenisation
After segmenting the sentences, each sentence is further divided into tokens. A token is any word,
number, or special character occurring in a sentence. Under tokenisation, every word, number, and
special character is considered separately, and each of them becomes a separate token.
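A small Python sketch of tokenisation using a regular expression (the sentence is illustrative; library tokenisers such as NLTK's word_tokenize apply the same idea with many more rules):

import re

sentence = "Hey Alexa, play some music!"

# Every word and every standalone special character becomes its own token.
tokens = re.findall(r"\w+|[^\w\s]", sentence)
print(tokens)
# ['Hey', 'Alexa', ',', 'play', 'some', 'music', '!']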
Removing Stop words, Special Characters and Numbers
In this step, the tokens which are not necessary are removed from the token list. What are the possible
words which we might not require?
Stop words are the words which occur very frequently in the corpus but do not add any value to it.
Humans use grammar to make their sentences meaningful for the other person to understand. But
grammatical words do not add any essence to the information that is to be transmitted through the
statement; hence they come under stop words. Some examples of stop words are: a, an, and, the, is, are, to, of, in, etc.
These words occur the most in any given corpus but talk very little or nothing about the context or the
meaning of it. Hence, to make it easier for the computer to focus on meaningful terms, these words are
removed.
Along with these words, a lot of times our corpus might have special characters and/or numbers. Now it
depends on the type of corpus that we are working on whether we should keep them in it or not.
For example, if you are working on a document containing email IDs, then you might not want to remove
the special characters and numbers whereas in some other textual data if these characters do not make
sense, then you can remove them along with the stop words.
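A minimal sketch of this step in Python; the stop word list here is a short illustrative subset, and whether numbers and special characters are dropped is a choice you make for your corpus:

# Illustrative (not exhaustive) stop word list
stop_words = {"a", "an", "and", "are", "the", "is", "to", "of"}

tokens = ["aman", "and", "avni", "are", "stressed", "!", "2"]

# Keep only alphabetic tokens that are not stop words; numbers and
# special characters are removed along with the stop words.
filtered = [t for t in tokens if t.isalpha() and t not in stop_words]
print(filtered)
# ['aman', 'avni', 'stressed']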
Converting Text to a Common Case
After stop word removal, we convert the whole text into the same case, preferably lowercase. This
ensures that the machine does not treat the same word as different words just because they appear in
different cases.
Here in this example, all the 6 forms of hello would be converted to lowercase and hence would be
treated as the same word by the machine.
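In Python this is a single operation; the six spellings below are illustrative stand-ins for the forms shown in the example:

tokens = ["Hello", "hello", "HELLO", "HeLLo", "hellO", "heLLo"]

# All six spellings collapse to the same lowercase form.
print([t.lower() for t in tokens])
# ['hello', 'hello', 'hello', 'hello', 'hello', 'hello']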
Stemming
In this step, the remaining words are reduced to their root words. In other words, stemming is the process
in which the affixes of words are removed and the words are converted to their base form.
Note that in stemming, the stemmed words (words that we get after removing the affixes) might not be
meaningful. In this example, healed, healing, and healer are all reduced to heal, but studies is reduced
to studi after affix removal, which is not a meaningful word. Stemming does not check whether the
stemmed word is meaningful; it simply removes the affixes, and hence it is faster.
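A quick sketch of stemming using NLTK's Porter stemmer (assuming the nltk package is installed); outputs may differ slightly with other stemmers:

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ["healed", "healing", "studies"]

# The stemmer strips affixes without checking whether the result is a real word.
print([stemmer.stem(w) for w in words])
# ['heal', 'heal', 'studi']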
Lemmatization
Stemming and lemmatization are alternative processes to each other, as both have the same role:
removal of affixes. The difference between them is that in lemmatization, the word we get after affix
removal (known as the lemma) is always a meaningful one. Lemmatization makes sure that the lemma
is a word with meaning, and hence it takes longer to execute than stemming.
As you can see in the same example, the output for studies after affix removal has become study instead
of studi.
The difference between stemming and lemmatization can be summarized by this example:
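A minimal sketch contrasting the two on the word ‘studies’, using NLTK's Porter stemmer and WordNet lemmatizer (assuming nltk and its WordNet data are installed):

from nltk.stem import PorterStemmer, WordNetLemmatizer

# The WordNet lemmatizer needs the WordNet data: nltk.download('wordnet')
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

word = "studies"

# Stemming chops the affix off blindly; lemmatization returns a real
# dictionary form (the lemma).
print(stemmer.stem(word))          # studi
print(lemmatizer.lemmatize(word))  # study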
With this, we have normalised our text to tokens, which are the simplest form of words present in the
corpus. Now it is time to convert the tokens into numbers. For this, we use the Bag of Words
algorithm.
Bag of Words
Bag of Words is a Natural Language Processing model which helps in extracting features out of the text
which can be helpful in machine learning algorithms. In the bag of words, we get the occurrences of each
word and construct the vocabulary for the corpus.
This image gives us a brief overview of how the bag of words works. Let us assume that the text on the
left in this image is the normalised corpus which we have got after going through all the steps of text
processing. Now, as we put this text into the bag of words algorithm, the algorithm returns to us the
unique words out of the corpus and their occurrences in it. As you can see on the right, it shows us a list
of words appearing in the corpus and the numbers corresponding to it show how many times the word
has occurred in the text body. Thus, we can say that the bag of words gives us two things:
1. A vocabulary of words for the corpus
2. The frequency of these words (number of times it has occurred in the whole corpus).
Here calling this algorithm a “bag” of words symbolises that the sequence of sentences or tokens does not
matter. In this case, all we need are the unique words and their frequency.
Here is the step-by-step approach to implementing the bag of words algorithm:
1. Text Processing: Collect data and pre-process it
2. Create a Dictionary: Make a list of all the unique words occurring in the corpus. (Vocabulary)
3. Create document vectors: For each document in the corpus, find out how many times the word
from the unique list of words has occurred.
4. Create document vectors for all the documents.
Let us go through all the steps with an example:
Step 1: Collecting data and pre-processing it.
Document 1: Aman and Avni are stressed
Document 2: Aman went to a therapist
Document 3: Avni went to download a health chatbot
Here are three documents having one sentence each. After text normalisation, the text becomes:
Document 1: [aman, and, avni, are, stressed]
Document 2: [aman, went, to, a, therapist]
Document 3: [avni, went, to, download, a, health, chatbot]
Step 2: Create a Dictionary
Go through all the documents and create a dictionary, i.e., list all the words which occur across the three
documents:
Dictionary: aman, and, avni, are, stressed, went, to, a, therapist, download, health, chatbot
Note that even though some words are repeated in different documents, they are written just once:
while creating the dictionary, we list only the unique words.
Step 3: Create a document vector
In this step, the vocabulary is written in the top row. Now, for each word in the document, if it matches
the vocabulary, put a 1 under it. If the same word appears again, increment the previous value by 1. And
if the word does not occur in that document, put a 0 under it.
Since the first document contains the words aman, and, avni, are, and stressed, all these words get a
value of 1 and the rest of the words get a value of 0.
Step 4: Create document vectors for all the documents.
The same exercise has to be done for all the documents. Hence, the table becomes:
In this table, the header row contains the vocabulary of the corpus and three rows correspond to three
different documents. Take a look at this table and analyse the positioning of 0s and 1s in it.
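The same vocabulary and document vector table can be reproduced with a short pure-Python sketch (no external libraries; variable names are for illustration only):

# Normalised documents from the worked example
docs = [
    ["aman", "and", "avni", "are", "stressed"],
    ["aman", "went", "to", "a", "therapist"],
    ["avni", "went", "to", "download", "a", "health", "chatbot"],
]

# Step 2: build the dictionary (vocabulary) of unique words
vocabulary = []
for doc in docs:
    for word in doc:
        if word not in vocabulary:
            vocabulary.append(word)

# Steps 3 and 4: one document vector per document; each entry counts how
# many times that vocabulary word occurs in the document
vectors = [[doc.count(word) for word in vocabulary] for doc in docs]

print(vocabulary)
for vector in vectors:
    print(vector)
# ['aman', 'and', 'avni', 'are', 'stressed', 'went', 'to', 'a', 'therapist', 'download', 'health', 'chatbot']
# [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
# [1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
# [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1]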
Finally, this gives us the document vector table for our corpus. However, these counts do not yet tell us
how valuable each word is to the corpus. This leads us to the final step of our algorithm: TFIDF.
TFIDF: Term Frequency & Inverse Document Frequency
Suppose you have a book. Which characters or words do you think would occur the most in it?
The bag of words algorithm gives us the frequency of words in each document we have in our corpus. It
gives us an idea that if the word is occurring more in a document, its value is more for that document.
For example, if I have a document on air pollution, air and pollution would be the words which occur
many times in it. And these words are valuable too as they give us some context around the document.
But let us suppose we have 10 documents and all of them talk about different issues.
One is on women's empowerment; the other is on unemployment and so on.
Do you think air and pollution would still be one of the most occurring words in the whole corpus?
If not, then which words do you think would have the highest frequency in all of them?
Words such as ‘and’, ‘this’, ‘is’ and ‘the’ occur the most in almost all the documents. But these words
do not talk about the corpus at all.
Though they are important for humans as they make the statements understandable to us, for the machine
they are a complete waste as they do not provide us with any information regarding the corpus.
Hence, these are termed as stop words and are mostly removed at the pre-processing stage only.
Take a look at this graph. It is a plot of the occurrence of words versus their value.
As you can see, if the words have the highest occurrence in all the documents of the corpus, they are said
to have negligible value hence they are termed as stop words. These words are mostly removed at the
pre-processing stage only.
Now as we move ahead from the stop words, the occurrence level drops drastically and the words which
have adequate occurrence in the corpus are said to have some amount of value and are termed as
frequent words. These words mostly talk about the document’s subject and their occurrence is adequate
in the corpus.
Then as the occurrence of words drops further, the value of such words rises. These words are termed as
rare or valuable words. These words occur the least but add the most value to the corpus.
Hence, when we look at the text, we consider frequent and rare words.
TFIDF stands for Term Frequency and Inverse Document Frequency.
TFIDF helps us identify the value of each word.
Term Frequency
Term frequency is the frequency of a word in one document. Term frequency can easily be found in the
document vector table as in that table we mention the frequency of each word of the vocabulary in each
document.
Here, you can see that the frequency of each word for each document has been recorded in the table.
These numbers are nothing but the Term Frequencies!
Inverse Document Frequency
Now, let us look at the other half of TFIDF which is Inverse Document Frequency. For this, let us first
understand what document frequency means. Document Frequency is the number of documents in which
the word occurs irrespective of how many times it has occurred in those documents. The document
frequency for the exemplar vocabulary would be:
Here, you can see that the document frequency of ‘aman’, ‘avni’, ‘went’, ‘to’ and ‘a’ is 2, as they have
occurred in two documents. The rest of the words occurred in just one document; hence the document
frequency for them is one.
For inverse document frequency, we put the document frequency in the denominator and the total
number of documents in the numerator. Here, the total number of documents is 3; hence the inverse
document frequency becomes 3/2 for ‘aman’, ‘avni’, ‘went’, ‘to’ and ‘a’, and 3/1 for all the other words.
Finally, the formula of TFIDF for any word W becomes:
TFIDF(W) = TF(W) * log( IDF(W) )
Here, log is to the base of 10. Don’t worry! You don’t need to calculate the log values by yourself.
Simply use the log function in the calculator and find out!
Now, let’s multiply the IDF values by the TF values.
Note that the TF values are for each document while the IDF values are for the whole corpus. Hence, we
need to multiply the IDF values to each row of the document vector table.
Here, you can see that the IDF value for ‘aman’ is the same in each row, and a similar pattern is
followed for all the words of the vocabulary. After calculating all the values, we get the complete
TFIDF table for the corpus.
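The table can be computed with a short continuation of the earlier bag-of-words sketch, using the formula above, TFIDF(W) = TF(W) * log10(total documents / document frequency of W):

import math

docs = [
    ["aman", "and", "avni", "are", "stressed"],
    ["aman", "went", "to", "a", "therapist"],
    ["avni", "went", "to", "download", "a", "health", "chatbot"],
]

vocabulary = []
for doc in docs:
    for word in doc:
        if word not in vocabulary:
            vocabulary.append(word)

N = len(docs)  # total number of documents

# Document frequency: in how many documents does each word occur?
df = {w: sum(1 for doc in docs if w in doc) for w in vocabulary}

# TFIDF(W) = TF(W) * log10(N / DF(W)), computed row by row
for doc in docs:
    print([round(doc.count(w) * math.log10(N / df[w]), 3) for w in vocabulary])
# Words with document frequency 2 ('aman', 'avni', 'went', 'to', 'a') get
# log10(3/2) ≈ 0.176 per occurrence; words unique to one document get
# log10(3/1) ≈ 0.477; a word occurring in every document would get 0.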
Finally, the words have been converted to numbers. These numbers are the value of each word for each
document.
Here, you can see that since we have a small amount of data, words like ‘are’ and ‘and’ also have a high
value. But as the document frequency of a word increases, its value decreases.
That is, for example:
Total number of documents: 10
Number of documents in which ‘and’ occurs: 10. Therefore, IDF(and) = 10/10 = 1, and log(1) = 0. Hence, the value of ‘and’ becomes 0.
On the other hand, the number of documents in which ‘pollution’ occurs: 3. Therefore, IDF(pollution) = 10/3 = 3.3333… and log(3.3333) = 0.522, which shows that the word ‘pollution’ has considerable value in the
corpus.
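These log values can be checked quickly with Python's base-10 log function:

import math

print(math.log10(10 / 10))  # 0.0      -> 'and' contributes nothing
print(math.log10(10 / 3))   # 0.522... -> 'pollution' carries value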
Summarising the concept, we can say that:
1. Words that occur in all the documents with high term frequencies have the lowest values and are
considered to be the stop words.
2. For a word to have a high TFIDF value, the word needs to have a high term frequency but a low
document frequency, which shows that the word is important for one document but is not a common
word across all documents.
3. These values help the computer understand which words are to be considered while processing the
natural language. The higher the value, the more important the word is for a given corpus.
Applications of TFIDF
TFIDF is commonly used in the Natural Language Processing domain. Some of its applications are document classification, topic modelling, information retrieval, and stop word filtering.
Examples of Code and No-code NLP Tools
Applications of NLP
Introduction to Sentiment Analysis
Applications of Sentiment Analysis - Customer Service
Applications of Sentiment Analysis - Voice of the Customer
● Voice of the customer analysis helps to analyse customer feedback and gain actionable insights from it.
● It measures the gap between what customers expect and what they actually experience when they use the products or services.