NLP Class X AI

The document provides an overview of Natural Language Processing (NLP), a sub-field of AI focused on enabling computers to understand human languages. It discusses various applications of NLP, including automatic summarization, sentiment analysis, text classification, and virtual assistants like chatbots. Additionally, it addresses challenges in processing natural language and outlines data processing techniques such as text normalization, stemming, lemmatization, and the Bag of Words algorithm.

Uploaded by

swastiksambhu10
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
10 views36 pages

NLP Class X AI

The document provides an overview of Natural Language Processing (NLP), a sub-field of AI focused on enabling computers to understand human languages. It discusses various applications of NLP, including automatic summarization, sentiment analysis, text classification, and virtual assistants like chatbots. Additionally, it addresses challenges in processing natural language and outlines data processing techniques such as text normalization, stemming, lemmatization, and the Bag of Words algorithm.

Uploaded by

swastiksambhu10
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 36

ARTIFICIAL INTELLIGENCE

As Per Latest CBSE Class X Syllabus

Natural Language Processing

• It is the sub-field of AI that is focused on enabling computers to understand and process human languages.

• It is concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyse large amounts of natural language data.
Applications of Natural Language Processing

• Automatic Summarization

• Sentiment Analysis

• Text classification

• Virtual Assistants
Automatic Summarization

• It is the process of shortening a set of data computationally, to create a summary that represents the most relevant information within the original content.

• It comes out as a solution to information overload.

• It is also about understanding the emotional meanings within the information.
Sentiment analysis

• It is about identifying sentiment among several posts, or even within the same post where emotion is not always explicitly expressed.

• Companies use NLP applications such as sentiment analysis to identify opinions and sentiment online, to help them understand what customers think about their products and services.
Text classification

• Text classification makes it possible to assign predefined categories to a document and organize it to help find the information needed.

• For example, an application of text categorization is spam filtering in email.
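As a toy illustration of the spam-filtering example, a keyword-based classifier can assign one of two predefined categories to a message. The keyword list and messages below are invented for demonstration; real filters learn such features from labelled data.

```python
# Hypothetical spam keywords, chosen only for this example.
SPAM_KEYWORDS = {"lottery", "winner", "free", "prize"}

def classify(message):
    """Assign the message to the 'spam' or 'ham' category by keyword match."""
    words = set(message.lower().split())
    return "spam" if words & SPAM_KEYWORDS else "ham"

print(classify("You are the lucky winner of a free prize"))  # spam
print(classify("Meeting moved to 3 pm"))                     # ham
```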
Virtual Assistants

• An application program that understands natural language voice commands and completes tasks for the user.

• Benefits of AI Assistants:
• Improved customer support
• Ease of key data collection
• Personalized user experience

• Examples:
Chatbots, Voice Assistants, AI Avatars, Domain-Specific Virtual Assistants, etc.
Chatbots

• One of the most common applications of Natural Language Processing is a chatbot.

• An AI software that can simulate a real human conversation, with real-time responses to users based on reinforcement learning.

• AI Chatbots either use text messages, voice commands, or both.


Chatbots…
• Ex-
• Mitsuku Bot
https://www.pandorabots.com/mitsuku/
• CleverBot
https://www.cleverbot.com/
• Jabberwacky
http://www.jabberwacky.com/
• Haptik
https://haptik.ai/contact-us
• Rose
http://ec2-54-215-197-164.us-west-1.compute.amazonaws.com/speech.php
• Ochatbot
https://www.ometrics.com/blog/list-of-fun-chatbots/
Chatbots…

• There are 2 types of chatbots: script bots and smart bots.

• Script bots work around a predefined script, while smart bots use AI to learn from their interactions.

Ex- bots deployed in the customer care section of various companies are usually script bots.


Human Language VS Computer Language

• The human brain continuously processes everything it perceives around it, makes sense of it and stores it somewhere.

• When someone whispers, the focus of our brain automatically shifts to that speech (giving it more priority) and starts processing it automatically.

• The computer, on the other hand, understands only the language of numbers.

• Everything that is sent to the machine has to be converted to numbers.

Difficulties in processing natural language by a machine

Arrangement of the words and meaning

• There are structures/characteristics in the human language that might be easy for a human to understand but extremely difficult for a computer to understand.

• Different syntax, same semantics: 2 + 3 = 3 + 2

• Different semantics, same syntax: 2/3 (Python 2.7) ≠ 2/3 (Python 3)
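The 2/3 example can be reproduced inside Python 3 alone, since the // operator keeps the integer-division behaviour that / had for integers in Python 2.7:

```python
# Same syntax, different semantics across Python versions:
# in Python 2.7, 2/3 performed integer division and evaluated to 0.
true_division = 2 / 3    # Python 3: true division
floor_division = 2 // 3  # what 2/3 evaluated to in Python 2.7

print(true_division)   # 0.6666666666666666
print(floor_division)  # 0
```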



Multiple Meanings of a word

=> His face turned red after he found out that he took the wrong bag.

=> His face turns red after consuming the medicine.

• Both the sentences use the same words, yet they might have multiple meanings: in the first, 'red' could signal anger or embarrassment, while in the second it could signal an allergic reaction.

Perfect Syntax, but no Meaning

=> Chickens feed extravagantly while the moon drinks tea.

• This sentence is syntactically perfect, yet it carries no meaning; a machine that checks only grammar cannot detect this.


Data Processing: Text Normalization

• It involves preparing and cleaning text data for machines to be able to analyse it.

• This process puts data in a workable form and highlights features in the text that an algorithm can work with.

• There are several steps through which this can be done, including:

Sentence Segmentation:

In this process the whole corpus is divided into sentences. Each sentence is taken as a different data point, so the whole corpus gets reduced to sentences.
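A minimal regex-based sketch of sentence segmentation, splitting on sentence-ending punctuation followed by whitespace (real NLP libraries also handle abbreviations and other edge cases; the example corpus is invented):

```python
import re

def segment_sentences(corpus):
    """Split a corpus into sentences at ., ! or ? followed by whitespace."""
    return [s for s in re.split(r'(?<=[.!?])\s+', corpus.strip()) if s]

corpus = "Aman is stressed. He went to a therapist! Will he feel better?"
print(segment_sentences(corpus))
# ['Aman is stressed.', 'He went to a therapist!', 'Will he feel better?']
```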
• Tokenisation:

It is the process of breaking down the sentences into smaller units (tokens) to work with.
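A simple tokeniser that treats runs of letters and digits as tokens and discards punctuation (a sketch; the sentence is illustrative):

```python
import re

def tokenise(sentence):
    """Break a sentence into word/number tokens, dropping punctuation."""
    return re.findall(r"[A-Za-z0-9]+", sentence)

print(tokenise("Aman, Anil are stressed!"))
# ['Aman', 'Anil', 'are', 'stressed']
```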
• Removing Stopwords, Special Characters and Numbers:

In this process, common words, special characters and numbers (which do not add any essence to the information) are removed from the text, so that the unique words which offer the most information about the text remain.

Some examples of stopwords are:

a, an, are, for, etc.
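Stopword removal can be sketched with a tiny hand-picked stopword set; real lists (for example NLTK's) are much longer, and this one is only for illustration:

```python
# A tiny illustrative stopword list; real stopword lists are far longer.
STOPWORDS = {"a", "an", "and", "are", "the", "is", "for", "to"}

def remove_stopwords(tokens):
    """Keep only tokens that are not stopwords."""
    return [t for t in tokens if t.lower() not in STOPWORDS]

print(remove_stopwords(["Aman", "and", "Anil", "are", "stressed"]))
# ['Aman', 'Anil', 'stressed']
```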


• Converting text to a common case:

In this process the whole text is converted into a similar case (usually lower case). This ensures that the machine does not treat the same word in different cases as different words.
• Stemming:

Here, the remaining words are reduced to their root words. It is the process in which the affixes of words are removed and the words are converted to their base form.
• Lemmatization:

The process in which a word is converted to its meaningful root form.

Stemming and lemmatization are alternative processes to each other, as the role of both processes is the same: removal of affixes. But the difference between them is that in lemmatization, the word we get after affix removal (also known as the lemma) is a meaningful one.
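The contrast can be sketched with a naive suffix-stripper for stemming and a small hand-written lookup table for lemmatization. Real systems (such as NLTK's PorterStemmer and WordNetLemmatizer) are far more sophisticated; both the suffix list and the mini-dictionary below are assumptions made purely for illustration:

```python
def naive_stem(word):
    """Strip common suffixes blindly; the result need not be a real word."""
    for suffix in ("ies", "ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Hypothetical mini-dictionary mapping words to their lemmas.
LEMMAS = {"studies": "study", "caring": "care", "better": "good"}

def lemmatise(word):
    """Return the dictionary lemma if known, else the word unchanged."""
    return LEMMAS.get(word, word)

print(naive_stem("studies"), lemmatise("studies"))
# 'stud' (not a meaningful word) vs 'study' (a meaningful lemma)
```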
Bag of words Algorithm

• A Natural Language Processing model which helps in extracting features out of the text, which is very helpful in machine learning algorithms.

• The occurrences of each word are counted and the vocabulary for the corpus is constructed.

The step-by-step approach to implement bag of words algorithm:

1. Text Normalisation: Collect data and pre-process it.

2. Create Dictionary: Make a list of all the unique words occurring in the
corpus. (Vocabulary).

3. Create document vectors: For each document in the corpus, find out
how many times the word from the unique list of words has occurred.

4. Create document vectors for all the documents.
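The four steps above can be sketched end-to-end. The three one-sentence documents are assumed placeholders for illustration (they are not necessarily the corpus used in these notes):

```python
def bag_of_words(documents):
    """Build a vocabulary and one count vector per document."""
    # Step 1: text normalisation — lower-case and split into tokens.
    tokenised = [doc.lower().split() for doc in documents]
    # Step 2: create the dictionary of unique words (vocabulary).
    vocabulary = sorted({word for doc in tokenised for word in doc})
    # Steps 3-4: count each vocabulary word in every document.
    vectors = [[doc.count(word) for word in vocabulary] for doc in tokenised]
    return vocabulary, vectors

docs = ["aman and anil are stressed",
        "aman went to a therapist",
        "anil went to download a health chatbot"]
vocab, vecs = bag_of_words(docs)
print(vocab)
print(vecs)
```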



Here are three documents having one sentence each. After text normalization, the text
becomes:

Note that no tokens have been removed in the stopwords removal step. It is because we have
very little data and since the frequency of all the words is almost the same, no word can be said
to have lesser value than the other.
List down all the words which occur in all three documents:

In this step,
• The vocabulary is written in the top row.
• Now, for each word in the document, if it matches with the vocabulary, put a 1 under it.
• If the same word appears again, increment the previous value by 1.
• And if the word does not occur in that document, put a 0 under it.
Since in the first document we have the words: aman, and, anil, are, stressed, all these words get a value of 1 and the rest of the words get a 0 value.
This gives us the document vector table for our corpus. But the tokens have still not been converted to numbers. This leads us to the final step of our algorithm: TFIDF.
(Figure: a plot of the occurrence of words versus their value.)
TFIDF stands for Term Frequency and Inverse Document Frequency.
It helps in identifying the value of each word.
Let us understand each term one by one.

Term Frequency:
▪ Term frequency is the frequency of a word in one document.
▪ It can easily be found from the document vector table.

Inverse Document Frequency:
▪ It is the total number of documents divided by the document frequency of the word.
▪ IDF(W) = (total no. of documents) / (document frequency of W)

TFIDF(W) = TF(W) * log( IDF(W) )
After calculating all the values:

Conclusion:
The value of a word is inversely proportional to its document frequency: the more documents a word occurs in, the lower its value.
Ex-
Total number of documents: 10
Number of documents in which ‘and’ occurs: 10
Therefore, IDF(and) = 10/10 = 1
Which means: log(1) = 0.
Hence, the value of ‘and’ becomes 0.
On the other hand,
Number of documents in which ‘pollution’ occurs: 3
IDF(pollution) = 10/3 = 3.3333…
Which means: log(3.3333) ≈ 0.523,
which shows that the word ‘pollution’ has considerable value in the corpus.
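The worked example uses base-10 logarithms; a short sketch reproducing the two calculations:

```python
import math

def tfidf(term_frequency, total_docs, doc_frequency):
    """TFIDF(W) = TF(W) * log10(total documents / document frequency of W)."""
    return term_frequency * math.log10(total_docs / doc_frequency)

# 'and' occurs in all 10 documents: log10(10/10) = 0, so its value is 0.
print(round(tfidf(1, 10, 10), 3))  # 0.0
# 'pollution' occurs in 3 of 10 documents: log10(10/3) ≈ 0.523.
print(round(tfidf(1, 10, 3), 3))   # 0.523
```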
Applications of TFIDF:-

• Document Classification

• Topic Modelling

• Information Retrieval System

• Stop word filtering
