UNIT – 7
Natural Language Processing Models
Natural language processing is one of the most fascinating topics in artificial intelligence. Deep
learning models that have been trained on a large dataset to perform specific NLP tasks are referred
to as pre-trained models (PTMs) for NLP, and they can aid in downstream NLP tasks by avoiding
the need to train a new model from scratch.
List of natural language processing models:
1. BERT
BERT is short for Bidirectional Encoder Representations from Transformers; it was created by
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. It is a natural language
processing machine learning (ML) model, released in 2018, that serves as a Swiss Army Knife
solution for 11+ of the most common language tasks, such as sentiment analysis and named entity
recognition.
BERT, compared to recent language representation models, is intended to pre-train deep
bidirectional representations by conditioning on both the left and right contexts in all layers. As a
matter of fact, the pre-trained BERT representations can be fine-tuned with just one additional
output layer to produce cutting-edge models for a variety of tasks, including question answering
and language inference, without requiring significant task-specific modifications.
This bidirectional model can learn from both the left and the right directions of a token’s context,
which is very important when we want the model to understand the context very well.
Ex:
We went to the river bank.
I need to go to the bank to make a deposit.
We want to get the context of the word “bank”.
If we look only at the words to the left of “bank”, or only at the words to its right, we will
certainly get the context wrong.
We must consider both the left and the right parts of the sentence to get the right context of the
word “bank”.
This is exactly what BERT does.
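As a rough illustration (assuming the Hugging Face transformers and torch packages are installed; the helper function bank_vector is ours, for illustration only), the sketch below compares the contextual vectors BERT assigns to “bank” in the two sentences above:
```
# Compare BERT's contextual representation of "bank" in two sentences.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence):
    # Encode the sentence and take the hidden state of the "bank" token.
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index("bank")]

v_river = bank_vector("We went to the river bank.")
v_money = bank_vector("I need to go to the bank to make a deposit.")

# A cosine similarity clearly below 1 shows that the two occurrences of
# "bank" get different, context-dependent representations.
print(torch.cosine_similarity(v_river, v_money, dim=0).item())
```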
BERT’s continued success has been aided by a massive dataset of 3.3 billion words. It was trained
on English Wikipedia (about 2.5B words) and the BooksCorpus (about 800M words). These
massive informational datasets aided BERT’s deep understanding not only of the English language
but also of our world.
Key performances of BERT
• BERT provides a pre-trained model that can be applied to specific NLP tasks without any
significant architecture changes; a minimal fine-tuning sketch follows below.
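As a hedged sketch of this point (assuming the Hugging Face transformers and torch packages; the checkpoint name and the binary sentiment task are illustrative), the same pre-trained BERT encoder is reused and only a small classification head is added on top:
```
# Reuse the pre-trained BERT encoder; only one new output layer is added.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2   # adds a single classification head
)

inputs = tokenizer("This movie was surprisingly good!", return_tensors="pt")
logits = model(**inputs).logits          # shape: (1, 2)
# The head is untrained here; it would be fine-tuned on labelled data
# before its scores become meaningful.
print(logits.shape)
```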
2. GPT-2/3
GPT stands for Generative Pre-trained Transformer. GPT-2/3 are autoregressive language models
that use deep learning to produce human-like text.
It utilizes the concept of the multi-layer transformer decoder as the feature extractor.
Given the wide variety of possible tasks and the difficulty of collecting a large labeled training
dataset, researchers proposed an alternative solution, which was scaling up language models to
improve task-agnostic few-shot performance. They put their solution to the test by training and
evaluating a 175B-parameter autoregressive language model called GPT-3 on a variety of NLP
tasks.
A plain Transformer does not restrict feature extraction to one direction (i.e., from left to right);
it can also observe the following words while predicting the next word. GPT therefore uses the
masked multi-head attention concept, i.e., the transformer considers only the part of the input text
to the left of the current position, which makes it a one-way transformer. GPT-2 is a successor of
the GPT model, with about 1.5 billion parameters, trained on millions of web pages. The main
objective is to predict the next word given all the prior words in the context.
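A minimal sketch of this next-word objective (assuming the Hugging Face transformers and torch packages; the prompt is made up for illustration):
```
# Ask GPT-2 for the single most likely next token given all prior words.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "The main objective of a language model is to predict the next"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits      # (1, seq_len, vocab_size)

# Masked (causal) self-attention guarantees that the score at the last
# position depends only on the tokens before it.
next_id = int(logits[0, -1].argmax())
print(tokenizer.decode(next_id))
```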
The evaluation results show that GPT-3 achieves promising results and occasionally outperforms
the state of the art achieved by fine-tuned models under few-shot learning, one-shot learning, and
zero-shot learning.
Key performances of GPT-3
• It can create anything with a text structure, and not just human language text.
• It can automatically generate text summarizations and even programming code.
3. XLNet
The XLNet model was proposed in XLNet: Generalized Autoregressive Pretraining for Language
Understanding by Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan
Salakhutdinov, and Quoc V. Le.
XLNet is a Transformer-XL model extension that was pre-trained using an autoregressive method
to maximize the expected likelihood over all permutations of the input sequence factorization
order.
XLNet is a generalized autoregressive pretraining method that enables learning in bidirectional
contexts by maximizing the expected likelihood over all permutations of the factorization order and
overcomes the limitations of other models thanks to its autoregressive formulation.
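The permutation idea can be illustrated with a toy, pure-Python sketch that prints the autoregressive factorization of a three-token sequence under every ordering (the tokens are made up for illustration):
```
# Enumerate every factorization order of p(x1, x2, x3); XLNet maximizes the
# expected log-likelihood over such orders, so each token is eventually
# predicted with context from both sides while each order stays autoregressive.
from itertools import permutations

tokens = ["New", "York", "is"]            # positions 0, 1, 2
for order in permutations(range(len(tokens))):
    factors = []
    for t, pos in enumerate(order):
        context = ", ".join(tokens[p] for p in order[:t]) or "empty"
        factors.append(f"p({tokens[pos]} | {context})")
    print(" * ".join(factors))
```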
Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive
model, into pretraining. Empirically, XLNet outperforms BERT, for example, on 20 tasks, often by
a large margin, and achieves state-of-the-art results on 18 tasks, including question answering,
natural language inference, sentiment analysis, and document ranking.
Key performances of XLNet
• The new model outperforms previous models on 18 NLP tasks, including question
answering, natural language inference, sentiment analysis, and document ranking.
• XLNet consistently outperforms BERT, often by a wide margin.
4. ELMo
Embeddings from Language Models (ELMo) is another pre-trained autoregressive model used in
building NLP applications. It combines two one-way language models, a forward and a backward
long short-term memory (LSTM) network, for autoregressive pre-training.
Because of the way it is pre-trained, the LSTM pair is not used as a single bidirectional encoder.
Each LSTM produces its own contextual representation, which impacts the result of prediction.
ELMo is thus built from one-way models: it handles the encoding and characterization in each
direction separately and only combines the two directions afterwards.
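A conceptual PyTorch sketch of this forward/backward combination (the dimensions and layers are illustrative, not ELMo's actual character-aware configuration):
```
# Run a forward LSTM and a backward LSTM separately over the same sequence
# and combine their hidden states only afterwards, ELMo-style.
import torch
import torch.nn as nn

embed_dim, hidden_dim, seq_len = 50, 64, 7
x = torch.randn(1, seq_len, embed_dim)              # one toy sentence

forward_lm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
backward_lm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

fwd_out, _ = forward_lm(x)                          # left-to-right pass
bwd_out, _ = backward_lm(torch.flip(x, dims=[1]))   # right-to-left pass
bwd_out = torch.flip(bwd_out, dims=[1])             # re-align with the input

# Each direction is a one-way model; the two representations are combined
# here by simple concatenation.
elmo_like = torch.cat([fwd_out, bwd_out], dim=-1)
print(elmo_like.shape)                              # torch.Size([1, 7, 128])
```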
5. ERNIE
BERT's masked language model (MLM), in its Chinese version, can capture linguistic information
from the input text, such as collocations, but it is comparatively weaker at capturing semantic and
entity-level information. Hence, the ERNIE model extends the widely used pre-trained models with
knowledge beyond plain-text comprehension to boost performance on knowledge-level workloads.
Its pre-training tasks, illustrated by the sketch after this list, are:
• Basic-level masking: individual words are masked, as in BERT; at this level it is difficult to
learn higher-level semantic information.
• Phrase-level masking: the input is still handled at the word level, but the mask covers a
continuous phrase.
• Entity-level masking: entity recognition is performed first, and the recognized entities are then
masked.
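A toy sketch of the three masking levels (the sentence, the spans, and the mask helper are hand-picked for illustration and are not ERNIE's actual pipeline):
```
# Apply word-, phrase-, and entity-level masks to one toy sentence.
tokens = ["Harry", "Potter", "is", "a", "series", "of", "fantasy", "novels"]

def mask(tokens, spans):
    out = list(tokens)
    for start, end in spans:          # mask each span as a whole
        for i in range(start, end):
            out[i] = "[MASK]"
    return " ".join(out)

# Basic level: individual words are masked, as in BERT.
print(mask(tokens, [(2, 3), (6, 7)]))
# Phrase level: a continuous phrase ("fantasy novels") is masked as a unit.
print(mask(tokens, [(6, 8)]))
# Entity level: a named entity ("Harry Potter") found by NER is masked.
print(mask(tokens, [(0, 2)]))
```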
6. ELECTRA
ELECTRA is a model for self-supervised language representation learning. It requires less
computation and can be used to pre-train transformer-based models. These models learn to
distinguish between real input tokens and fake tokens created by another network.
Existing pre-training methods generally fall under two categories: language models (LMs), such
as GPT, which process the input text left-to-right, predicting the next word given the previous
context, and masked language models (MLMs), such as BERT, RoBERTa, and ALBERT, which
instead predict the identities of a small number of words that have been masked out of the input.
MLMs have the advantage of being bidirectional rather than unidirectional: they “see” the text to
both the left and the right of the token being predicted, instead of only to one side. However, the
MLM objective (and related objectives such as XLNet’s) also has a disadvantage: instead of
predicting every single input token, those models predict only a small subset, the 15% that was
masked out, which reduces the amount learned from each sentence.
[Figure: Existing pre-training methods and their disadvantages. Arrows indicate which tokens are used to produce a given
output representation (rectangle). Left: traditional language models (e.g., GPT) use only context to the left of the current
word. Right: masked language models (e.g., BERT) use context from both the left and right, but predict only a small
subset of words for each input.]
Replaced token detection trains a bidirectional model while learning from all input positions.
The replacement tokens come from another neural network called the generator. While the
generator can be any model that produces an output distribution over tokens, we use a small
masked language model (i.e., a BERT model with small hidden size) that is trained jointly with
the discriminator. Although the structure of the generator feeding into the discriminator is similar
to a GAN, we train the generator with maximum likelihood to predict masked words, rather than
adversarially, due to the difficulty of applying GANs to text. The generator and discriminator share
the same input word embeddings. After pre-training, the generator is dropped and the discriminator
(the ELECTRA model) is fine-tuned on downstream tasks.
ELECTRA models all use the Transformer neural architecture.
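A minimal sketch of replaced token detection with a publicly released ELECTRA discriminator (assuming the Hugging Face transformers and torch packages; the replaced word is chosen by hand for illustration):
```
# Score every token of a corrupted sentence as "original" or "replaced".
import torch
from transformers import ElectraForPreTraining, ElectraTokenizerFast

name = "google/electra-small-discriminator"
tokenizer = ElectraTokenizerFast.from_pretrained(name)
model = ElectraForPreTraining.from_pretrained(name)

# The verb has been replaced by the implausible "flew".
sentence = "The chef flew the delicious meal"
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    scores = model(**inputs).logits[0]        # one score per token

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, score in zip(tokens, scores):
    # Higher scores mean the discriminator suspects a replaced token.
    print(f"{token:>10s}  {score.item(): .2f}")
```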
Text-to-Text Transfer Transformer
The model is called the Text-to-Text Transfer Transformer (T5). From the name, you may already
guess that the architecture of T5 is the Transformer and that it leverages transfer learning. The
details of the Transformer architecture are not repeated here.
The T5 model uses the transformer framework. It finds application in language translation,
document translation, question generation and answering, and text classification tasks.
It takes a text document as input for training and outputs text. This text-to-text formulation lets
the same model, loss function, and hyperparameters be used across various text-based tasks.
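A minimal sketch of the text-to-text interface (assuming the Hugging Face transformers and sentencepiece packages; the task prefix and sentence are illustrative):
```
# The task is named in the input prefix and the answer comes back as text.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# The same model, loss, and hyperparameters handle translation,
# summarization, classification, etc.; only the textual prefix changes.
text = "translate English to German: The house is wonderful."
inputs = tokenizer(text, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```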
Text Summarization Techniques
Text Summarization is the process of creating a compact yet accurate summary of text
documents.
What is the need for Text Summarization?
With the ever-growing amount of information available, it is important to have shorter,
meaningful summaries that give better structure to that information.
Forms of Text Summarization
There are two primary approaches towards text summarization. They are -
1. Extractive
Within this approach, the most relevant sentences in the text document are reproduced as they are
in the summary. New words or phrases are thus not added.
2. Abstractive
This approach, on the other hand, focuses on interpreting the text within documents and generating
new phrases that best represent the essence of the document.
Extractive Approach
The Extractive Approach is mainly based on three independent steps as described below.
1. Generation of an Intermediate Representation
The text that is to be summarized is mapped to an intermediate representation, either a topic
representation or an indicator representation.
Each kind of representation differs in complexity and has several techniques for
performing it.
2. Assign a score to each sentence
The Sentence Score directly implies how important the sentence is to the text.
3. Select Sentences for the Summary
The most relevant k sentences are selected for the summary based on several factors, such as
eliminating redundancy, fulfilling the context, etc.
Abstractive Approach
The Abstractive Approach is mainly based on the following steps -
1. Establishing a context for the text
An abstractive approach works similarly to human text summarization.
Thus, the first step is to understand the context of the text.
2. Semantics
Words based on semantic understanding of the text are either reproduced from the
original text or newly generated.
Example
Text: There were bad weather conditions in the town. Subsequently, the roads were impassable.
Extractive Approach
There were bad weather conditions in the town. Subsequently, the roads were impassable.
Bad weather conditions town. Subsequently, roads impassable
Abstractive Approach
Bad weather conditions made town roads impassable.
Methods of Implementation
Following are the text summarization techniques:
• Luhn's Heuristic Method
• Edmundson's Heuristic Method
• SumBasic
• KL-Sum
• LexRank
• TextRank
• Reduction
• Latent Semantic Analysis
Listed below are some common methods of text summarization, their advantages and
disadvantages –
Luhn's Heuristic Method
• Luhn proposed that the significance of each word in a text document indicates how relevant it
is.
• Filler words like 'a', 'and', 'the' and the like are ignored, and more importance is assigned to
the sentences at the beginning of the document.
• The idea is that any sentence with the maximum occurrences of the highest-frequency words is
more important to the meaning of the document than the others.
This is one of the earliest approaches to text summarization and is not considered very accurate.
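A minimal sketch in the spirit of Luhn's heuristic (the stop-word list, regular expressions, and scoring are simplified stand-ins, not Luhn's exact procedure):
```
# Score sentences by the frequency of their non-filler words and keep the
# top-k sentences in their original order.
import re
from collections import Counter

STOP = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "it", "at"}

def summarize(text, k=1):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP]
    freq = Counter(words)

    def score(sentence):
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower())
                   if w not in STOP)

    top = sorted(sentences, key=score, reverse=True)[:k]
    return " ".join(s for s in sentences if s in top)

text = ("Text summarization shortens documents. Frequent words hint at the "
        "main topic. Sentences full of frequent words summarize the topic well.")
print(summarize(text, k=1))
```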
Edmundson's Heuristic Method
• This method uses the idea of defining bonus words and stigma words, words that are of
high or low importance respectively.
• Words in the document title are given additional importance.
• It is one of the earlier methods of text summarization, along with Luhn's Method.
SumBasic
• It is generally used for generating multi-document summaries.
• It applies the basic idea of probability, assuming that the high-frequency words in the
bag-of-words model of the document have a higher possibility of occurring in the
summary of the document.
• Probabilities are assigned to each word on the basis of their term frequency in the
document, and these probabilities are updated as sentences are chosen for the summary.
KL-Sum
• This method is based on the concept of KL Divergence and Unigram distribution.
• It adds to the summary those sentences that minimize the KL divergence between the summary's
vocabulary distribution and that of the original input vocabulary.
This method has no explicit way of eliminating redundancy.
LexRank
It is "based on the concept of eigenvector centrality in a graph representation of sentences".
• Within this algorithm, each sentence recommends sentences similar to it.
• A graph is created with each node being a sentence, connected to the sentences similar to it
(the similarity measure is usually cosine similarity over TF-IDF vectors).
• Sentences with the most recommendations are more likely to be picked for the summary.
• The idea is that any sentence important to the text document will probably be repeated in
similar ways and will thus have a greater number of similar sentences.
TextRank
This algorithm is similar to LexRank but relatively simpler.
• It works on the same basic principle as LexRank, with the only difference being the similarity
measure, i.e., the metric used to construct the edges of the graph.
• In this algorithm, the number of common words measures sentence similarity (a minimal
graph-ranking sketch follows below).
• While LexRank can be applied to multiple documents, TextRank is primarily used for
single documents.
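A minimal graph-ranking sketch in the spirit of LexRank/TextRank (assuming scikit-learn and networkx; TF-IDF cosine similarity is used here for the edge weights rather than raw common-word counts):
```
# Build a sentence-similarity graph and rank sentences with PageRank.
import networkx as nx
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "The storm flooded several streets in the town.",
    "Flooded streets kept residents of the town at home.",
    "The bakery introduced a new chocolate cake.",
]

tfidf = TfidfVectorizer().fit_transform(sentences)
similarity = cosine_similarity(tfidf)
np.fill_diagonal(similarity, 0)          # no self-recommendation

graph = nx.from_numpy_array(similarity)  # weighted, undirected graph
scores = nx.pagerank(graph)              # most-"recommended" sentences rank highest

best = max(scores, key=scores.get)
print(sentences[best])
```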
Reduction
• This method also works on the idea of graph-based modelling of the text document.
• It assigns importance to sentences in accordance with the sum of their edges to other
sentences.
Latent Semantic Analysis
• It works on the principle of Term Frequency along with Singular Value Decomposition.
• The idea is to resolve the document space to a "concept space", meaning the document is
broken down into the actual underlying concept and comparisons are made within that
space.
• This is a more complicated method as compared to others.
Applications
Text summarization finds a wide variety of applications in the creation of headlines, synopses,
reviews, book and movie summaries, résumés, and so on.
Extractive text summarization refers to extracting (summarizing) the relevant sentences from a
large document while retaining the most important information.
BERT
BERT (Bidirectional Encoder Representations from Transformers) introduces a rather advanced
approach to performing NLP tasks.
BERT (a bidirectional transformer) is used to overcome limitations of RNNs and other neural
networks, such as handling long-term dependencies. It is a pre-trained model that is naturally
bidirectional. This pre-trained model can easily be fine-tuned to perform the NLP tasks as
specified.
Points at a glance:
1. BERT models are pre-trained on huge datasets; thus no further training is required.
2. It uses a powerful flat architecture with inter-sentence transformer layers so as to get the best
results in summarization.
Advantages
1. It is one of the most efficient summarizers to date.
2. It is faster than RNNs.
• The summary sentences are assumed to represent the most important points of a
document.
Methodology
For a set of sentences {sent_1, sent_2, sent_3, ..., sent_n}, each sentence has two possibilities,
y_i = {0, 1}, which denotes whether that particular sentence will be picked for the summary or not.
Being trained as a masked language model, BERT's output vectors correspond to tokens rather than
sentences. Unlike other extractive summarizers, it makes use of segment embeddings to indicate
the different sentences, and it has only two labels, namely sentence A and sentence B, rather than
one label per sentence. These embeddings are modified accordingly to generate the required
summaries.
The complete process can be divided into several phases, as follows:
Encoding Multiple Sentences
In this step, the sentences from the input document are encoded for preprocessing. Each
sentence is preceded by a [CLS] tag and followed by a [SEP] tag. The [CLS] tag is used to
aggregate the features of one or more sentences.
Interval Segment Embeddings
This step is dedicated to distinguishing the sentences in a document. Each sentence is assigned
one of the two labels discussed above. For example,
sent_i is assigned E_A or E_B depending on whether i is odd or even: E_A for odd i and E_B for even i.
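A tiny sketch of this alternating assignment:
```
# Sentences alternate between the two segment embeddings.
sentences = ["sent1", "sent2", "sent3", "sent4", "sent5"]
segments = ["E_A" if i % 2 == 1 else "E_B" for i in range(1, len(sentences) + 1)]
print(list(zip(sentences, segments)))
# [('sent1', 'E_A'), ('sent2', 'E_B'), ('sent3', 'E_A'), ('sent4', 'E_B'), ('sent5', 'E_A')]
```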
Embeddings
It basically refers to the representation of words in their vector form, which helps make their
usage flexible. Even Google utilizes this feature of BERT for a better understanding of search
queries. It helps unlock various semantics-related functionality, from understanding the intent
of a document to developing a similarity model between words.
There are three types of embeddings applied to our text prior to feeding it to the BERT layer,
namely:
a. Token Embeddings - Words are converted into a fixed-dimension vector. [CLS] and [SEP] are
added at the beginning and end of the sentences, respectively.
b. Segment Embeddings - They are used to distinguish, or classify, the different inputs using
binary coding. For example, take input1 = "I love books" and input2 = "I love sports". After
processing through token embedding we would have
```[CLS],I,love,books,[SEP],I,love,sports ```
segment embedding would result into
```[0,0,0,0,0,1,1,1] ```
``` Input1=0 , Input2=1```
c. Position Embeddings - BERT supports input sequences of up to 512 tokens, so the resulting
vector dimensions are (512, 768). Positional embeddings are used because the position of a word
in a sentence may alter the contextual meaning of the sentence, so the same word should not
always have the same vector representation. For example, in "We did not play, however we were
spectating." the two occurrences of "we" must not have the same vector representation.
NOTE - Every word is stored as a 768-dimensional representation. The overall sum of these
embeddings is the input to BERT.
BERT uses a different approach to handle the different forms of a word: for instance, "playing"
and "played" are represented as play + ##ing and play + ##ed. Here, ## marks
the subwords.
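A minimal sketch (assuming the Hugging Face transformers package) showing the token and segment inputs a BERT tokenizer actually produces for the two example inputs above; note that it also appends a final [SEP], which the hand-worked example omits:
```
# Inspect the token and segment (token_type) ids for a pair of inputs.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoded = tokenizer("I love books", "I love sports")

print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
# ['[CLS]', 'i', 'love', 'books', '[SEP]', 'i', 'love', 'sports', '[SEP]']
print(encoded["token_type_ids"])
# [0, 0, 0, 0, 0, 1, 1, 1, 1]   <- segment ids; position ids are added inside the model
```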
BERT Architecture
The following two BERT models were introduced:
1. BERT base
In the BERT base model we have 12 transformer layers along with 12 attention heads
and 110 million parameters.
2. BERT large
In the BERT large model we have 24 transformer layers along with 16 attention heads and
340 million parameters.
Transformer layer - A transformer layer is actually a combination of a complete set of encoder and
decoder layers and the intermediate connections. Each encoder includes attention layers along
with a feed-forward network. The decoder has the same architecture, but it includes another
attention layer in between, as does the seq2seq model. This helps the model concentrate on
important words.
Summarization layers
The one major noticeable difference between an RNN and BERT is the self-attention layer. The
model tries to identify the strongest links between the words, which helps in building their
representation.
We can have different types of layers on top of the BERT model, each having its own
specifications:
1. Simple Classifier - In the simple classifier method, a linear layer is added on top of BERT,
together with a sigmoid function, to predict the score Ŷ_i (a minimal sketch of this head
follows after this list):
Ŷ_i = σ(W_o T_i + b_o)
2. Inter-Sentence Transformer - In the inter-sentence transformer, the simple classifier is not
used. Instead, additional transformer layers are added to the model, applied only to the
sentence representations, making it more efficient. This helps in recognizing the important
points of the document.
h̃^l = LN(h^(l-1) + MHAtt(h^(l-1)))
h^l = LN(h̃^l + FFN(h̃^l))
where h^0 = PosEmb(T), T are the sentence vectors output by BERT, PosEmb is the function
that adds positional embeddings (indicating the position of each sentence) to T, LN is the layer
normalization operation, MHAtt is the multi-head attention operation, and the superscript l
indicates the depth of the stacked layer.
These layers are followed by the sigmoid output layer
Ŷ_i = σ(W_o h_i^L + b_o)
where h_i^L is the vector for sent_i from the top layer (the L-th layer) of the Transformer.
3. Recurrent Neural Network - An LSTM layer is added on top of the BERT output in
order to learn summarization-specific features, and each LSTM cell is layer-normalized.
At time step i, the input to the LSTM layer is the BERT output T_i:
C_i = σ(F_i) ⊙ C_(i-1) + σ(I_i) ⊙ tanh(G_i)
h_i = σ(O_i) ⊙ tanh(LN_c(C_i))
where F_i, I_i, O_i are the forget, input, and output gates; G_i is the hidden vector; C_i is the
memory vector; h_i is the output vector; and LN_h, LN_x, LN_c are different layer
normalization operations. The output layer is again a sigmoid layer.
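A minimal PyTorch sketch of the simple classifier head from point 1 (dimensions are illustrative and T is a random stand-in for the sentence vectors produced by BERT):
```
# A linear layer plus a sigmoid on top of each BERT sentence vector T_i.
import torch
import torch.nn as nn

hidden_size, n_sentences = 768, 5
T = torch.randn(n_sentences, hidden_size)   # stand-in for the [CLS] vectors

classifier = nn.Linear(hidden_size, 1)      # W_o and b_o
y_hat = torch.sigmoid(classifier(T)).squeeze(-1)

# Each score in [0, 1] estimates whether the corresponding sentence
# should be included in the extractive summary.
print(y_hat)
```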
Pseudocode
The BERT extractive summarizer library can be installed directly in Python using the following
command, for ease of implementation:
pip install bert-extractive-summarizer
Import the required module from the library and create its object.
from summarizer import Summarizer
model = Summarizer()
The text to be summarized is stored in a variable:
text='''
OpenGenus Foundation is an open-source non-profit organization with the aim to enable people
to work offline for a longer stretch, reduce the time spent on searching by exploiting the fact that
almost 90% of the searches are same for every generation and to make programming more
accessible.OpenGenus is all about positivity and innovation.Over 1000 people have contributed
to our missions and joined our family. We have been sponsored by three great companies namely
Discourse, GitHub and DigitalOcean. We run one of the most popular Internship program and
open-source projects and have made a positive impact over people's life.
'''
Finally, we call the model, passing our text for summarization:
summary=model(text)
print(summary)
OUTPUT-
OpenGenus Foundation is an open-source non-profit organization with the aim to enable people
to work offline for a longer stretch , reduce the time spent on searching by exploiting the fact that
almost 90 % of the searches are same for every generation and to make programming more
accessible. We run one of the most popular Internship program and open-source projects and have
made a positive impact over people 's life