Mini Project
Natural Language Processing
“AUTOMATIC TEXT SUMMARIZATION”
Group Members
Makarand Bhalerao - A - 21
Tejas Hasabnis - A - 35
Shrutika Kadam - A - 40
Datta Meghe College of Engineering
Department of Computer Engineering
October 2022
Introduction
Text Summarization is one of those applications of Natural Language
Processing (NLP) which is bound to have a huge impact on our lives.
With digital media and publishing growing constantly, who has the
time to go through entire articles, documents, or books? This is where
text summarization helps, and apps like Inshorts use it effectively.
Automatic Text Summarization gained attention as early as the 1950s.
A research paper published by Hans Peter Luhn in the late 1950s,
titled “The Automatic Creation of Literature Abstracts”, used features
such as word frequency and phrase frequency to extract important
sentences from the text for summarization purposes.
Summarization is the task of condensing a piece of text to a shorter
version, reducing the size of the initial text while preserving its key
informational elements and the meaning of the content.
Since manual text summarization is time-consuming and generally
laborious, automating the task is gaining popularity and constitutes
a strong motivation for academic research. In the big data era, there
has been an explosion in the amount of text data from a variety of
sources. This volume of text is an invaluable source of information
and knowledge, which needs to be effectively summarized to be useful.
The increasing availability of documents has driven extensive research
in the NLP area on automatic text summarization, the task of producing
a concise and fluent summary without any human help while preserving
the meaning of the original text document.
Problem Definition
Automatic Text Summarization is one of the most challenging and
interesting problems in the field of Natural Language Processing
(NLP). It is the process of generating a concise and meaningful
summary of text from sources such as books, news articles, blog
posts, research papers, emails, and tweets.
The demand for automatic text summarization systems is rising sharply
these days thanks to the availability of large amounts of textual data.
In summarization, a text is given to the computer and the computer
returns a condensed extract of the original text document. Our method
for sentence extraction-based text summarization uses a graph-based
algorithm to calculate the importance of each sentence in the document,
and the most important sentences are extracted to generate the document
summary. Such extraction-based methods assign an indexing weight to the
document terms in order to compute the similarity values between
sentences.
Thus, automatic text summarization is very helpful in today's era.
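As a rough illustration of the term weighting described above, the sketch below gives each term a TF-IDF weight and compares sentences with cosine similarity. TF-IDF is an assumption made for this example, since the report does not name the exact weighting scheme, and the helper names (tokenize, tfidf_vectors, cosine) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def tfidf_vectors(sentences):
    """Return one {term: weight} dict per sentence (each sentence is a 'document')."""
    token_lists = [tokenize(s) for s in sentences]
    n = len(token_lists)
    # Document frequency: number of sentences that contain each term.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Smoothed IDF (assumed variant): terms found in every sentence keep a positive weight.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up example sentences, used only to exercise the functions.
sentences = [
    "Text summarization condenses long documents.",
    "Summarization of long documents saves reading time.",
    "The weather was sunny yesterday.",
]
vectors = tfidf_vectors(sentences)
sim_related = cosine(vectors[0], vectors[1])    # sentences sharing several terms
sim_unrelated = cosine(vectors[0], vectors[2])  # sentences sharing no terms
print(sim_related, sim_unrelated)
```

Sentences that share weighted terms receive a positive similarity, while sentences with no terms in common score zero. These pairwise values are what a graph-based ranking step would then consume.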
Proposed Solution
The first step is to concatenate all the text contained in the
articles and then split the text into individual sentences. In the next
step, we compute a vector representation (built from word embeddings)
for each sentence.
Similarities between sentence vectors are then calculated and stored in
a matrix. The similarity matrix is then converted into a
graph, with sentences as vertices and similarity scores as edge
weights, for sentence rank calculation. Finally, a certain number of
top-ranked sentences form the final summary.
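The steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes bag-of-words counts in place of word embeddings, a naive regex-based sentence splitter, and a hand-written PageRank-style iteration instead of a graph library. All function names and the sample articles are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentence_vector(sentence):
    """Bag-of-words counts (a stand-in for the word embeddings named in the text)."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c * v[t] for t, c in u.items() if t in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def similarity_matrix(vectors):
    """Pairwise sentence similarities; the diagonal is zero (no self-loops)."""
    n = len(vectors)
    return [
        [0.0 if i == j else cosine(vectors[i], vectors[j]) for j in range(n)]
        for i in range(n)
    ]


def rank_sentences(sim, damping=0.85, iterations=50):
    """PageRank-style power iteration on the weighted sentence graph."""
    n = len(sim)
    out_weight = [sum(row) for row in sim]  # total edge weight leaving each vertex
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to edge weight j -> i.
            incoming = sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def summarize(articles, top_n=2):
    """Concatenate articles, rank sentences, return the top_n in original order."""
    sentences = split_sentences(" ".join(articles))
    if not sentences:
        return []
    vectors = [sentence_vector(s) for s in sentences]
    scores = rank_sentences(similarity_matrix(vectors))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(top)]


# Made-up input articles, used only to exercise the pipeline.
articles = [
    "Text summarization shortens long documents. "
    "Extractive summarization selects important sentences from documents.",
    "Graph ranking scores sentences by similarity to other sentences. "
    "My cat sleeps all afternoon.",
]
summary = summarize(articles, top_n=2)
print(summary)
```

Returning the selected sentences in their original order, rather than in rank order, is a design choice that tends to keep the summary readable. A sentence that shares no words with the rest of the text receives no incoming score and is therefore ranked last.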
Extractive summarization picks up sentences directly from the
document based on a scoring function to form a coherent summary.
This method works by identifying important sections of the text, then
cropping out and stitching together portions of the content to produce
a condensed version.
Steps of the project:
Import libraries
Load and preprocess the data
Apply tokenization
Create the model
Plot the model
Build the model
Predict
Workflow of Project:
Code
Conclusion
Thus, text summarization is the technique of generating a concise and
precise summary of voluminous texts while focusing on the sections
that convey useful information, without losing the overall
meaning.
A final advantage of text summarization lies in its ability to increase
user engagement. When people read short summaries instead of
lengthy articles, they tend to spend less time on each piece and
typically read more pieces as a result. This leads to higher levels of
engagement.
The study of automated text summarization still has a long way to go
before we can really claim to understand the nature of summaries.
The vast growth in the amount of information due to the internet has
created a need for efficient summarization systems. Although research
on text summarization started many decades ago, much remains to be
investigated.