From book to presentation generation using hybrid summarization
Final Year Project Proposal by
Samiha Azeem
(231462422)
Aleezae Zia
(231523065)
Fateh Ali Alim
(231523017)
Primary Advisor: Muhammad Salman Chaudhry
Secondary Advisor: Rabranea Bqa
Department of Computer Science
Forman Christian College University
Lahore, Pakistan (20xx)
ABSTRACT
Books play an important part in every field, from storybooks to extensive academic texts, and almost
everyone needs access to them. Over the past few years, Artificial Intelligence (AI) and machine
learning have gained widespread importance and are now integral parts of various industries.
Summarization methods have become significant for information retrieval because enormous volumes
of information are accessible on the internet and, given time constraints, it is hard for a person to
sift through so much material to find the relevant parts. The goal of this research is to identify a
suitable technique and use it to develop a prototype that automatically generates a PowerPoint
presentation from a given book using hybrid summarization. Hybrid summarization, which combines
extractive and abstractive methods, will be studied with the aim of producing summaries that are both
more precise and more coherent. The objective is to give teachers and professionals a tool for quickly
and effectively creating presentations from written material. The system would take the most
important ideas and concepts from the book, build a presentation outline, and produce slides with
text and images that best represent the content. The study includes investigating different
summarization strategies, assessing the effectiveness of the system, and exploring its potential
applications in different fields. The models will be evaluated using several metrics, including
ROUGE and human evaluation.
TABLE OF CONTENTS
Introduction
Problem Statement
Literature Review
Project Overview
Project Development Methodology
Project Milestones and Deliverables
Work Division
Costing
References
1. INTRODUCTION
In today's knowledge-driven economy, academic books play a critical role in disseminating new
ideas and research findings across the scholarly community. However, the traditional format of
academic books can make it challenging for readers to quickly identify the most important ideas
and concepts contained within. With the abundance of information available today, time is often a
precious commodity, and it can be challenging to effectively communicate important ideas and
concepts to others. This can be especially challenging for busy academics and researchers who may
not have the time to read an entire book cover-to-cover.
Researchers and practitioners have begun to explore the use of AI summarization approaches to
automatically generate presentations from academic books. By leveraging the power of machine
learning and natural language generation, it may be possible to quickly and efficiently extract the
key ideas and concepts from an academic book and present them in a way that is both informative
and engaging.
To address this challenge, we are researching summarization techniques that can support an
AI-powered product that uses machine learning and natural language generation to automatically
generate presentations from books. This research will aid in designing a product that helps
individuals and organizations quickly and easily extract the most important ideas and concepts
from a book and present them in a way that is engaging, informative, and customizable to the
needs of the audience.
The system can save time and effort for those who need to prepare presentations from written
material, as it automates the process of summarizing and creating slides. The generated
PowerPoint presentations may also be more accurate and comprehensive than manually created
ones, as the system uses modern summarization methods to extract key information.
In addition, the tool can improve accessibility for people with visual impairments or learning
disabilities who might find it challenging to read lengthy books. Accurate and easy-to-follow
presentations can help them access and understand the content.
In this paper, we will describe the development process for our prototype, including the AI models
and algorithms used, as well as the user interface and user experience design. We will also discuss
potential applications and use cases for this technology, as well as its limitations and areas for
future development.
2. PROBLEM STATEMENT
There is a growing need for a tool that can convert lengthy and complex academic and educational
books into more engaging and interactive presentations that can be consumed in a shorter amount of
time. The research topic "From book to presentation generation using hybrid summarization"
addresses the time-consuming and difficult process of creating PowerPoint presentations from
written material. Professionals, instructors, and researchers frequently need to prepare
presentations based on books, reports, and other lengthy documents. However, manually
summarizing the content and creating a presentation outline can be time-consuming and require
significant effort.
The extractive or abstractive summarization methods that are utilized by the currently available
summarization tools are often hampered in terms of accuracy and coherence. Abstractive
summarization generates new sentences that capture the essence of the original text, whereas
extractive summarization simply selects and condenses existing sentences from the text. Hybrid
summarization, on the other hand, combines these strategies to produce a more accurate and coherent
summary of the content.
The problem statement, therefore, is the lack of an automated tool that can generate PowerPoint
presentations from a given book using hybrid summarization techniques. For those who need to
make presentations based on written material, such a tool could save time and effort while also
providing a more accurate and comprehensive summary of the material.
Using hybrid summarization methods, the proposed research aims to create a system that can
automate the process of summarizing books and creating PowerPoint presentations. By streamlining
the process of creating presentations and enhancing the information's accuracy and accessibility, this
would be a useful tool for educators and professionals in a variety of fields.
To achieve this goal, the study will explore several research questions, such as:
1. How can hybrid summarization techniques be effectively applied to generate PowerPoint
presentations from a given book?
2. What are the most important features and elements to include in a PowerPoint presentation
generated from a book, and how can they be selected using natural language processing
techniques?
3. How can the system be evaluated in terms of accuracy, coherence, and efficiency, and how
does it compare to existing summarization tools?
3. LITERATURE REVIEW
This review begins with a paper that summarizes the basics of automated summarization. "Text
Summarization Techniques: A Brief Survey" by Mehdi Allahyari, Seyedamin Pouriyeh, Mehdi Assefi,
and colleagues provides a good starting point. It explains abstractive and extractive summarization
in simple terms and discusses the models that can be used for each. Extractive summarization builds
a summary from the important sentences and phrases in a text, identified by frequency and by how
representative the sentences, phrases, and words are. This method has drawbacks, such as loss of
coherence and the inability to generate new information, so it is best suited to simple,
uncomplicated, and informal text. Extractive summarizers typically use graph-based, sentence-scoring,
and clustering-based methods, which share the drawback of not capturing the semantic relationships
between sentences. Abstractive summarization, on the other hand, paraphrases the text by creating
new sentences, which addresses the incoherence of extractive summaries. However, it has its own
challenges, such as the difficulty of preserving the meaning of the source text, and therefore needs
more advanced natural language processing techniques. Abstractive techniques may use rule-based,
machine learning-based, or deep learning-based methods. Deep learning is the most widely used
because it can capture complicated semantic relationships between words as well as sentences, but it
requires a sizable amount of data and computational resources. This paper provides a good base for
researching more complex and specialized applications of these techniques and models.
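To make the frequency-based sentence scoring described above concrete, the following is a minimal sketch in Python. It is not the method of the surveyed paper: the function names (split_sentences, tokenize, extractive_summary), the small stop-word list, and the choice to score a sentence by its average word frequency are all assumptions made for illustration.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption, not taken from the survey).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "that", "are", "on"}


def split_sentences(text):
    """Naive splitter: breaks after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def extractive_summary(text, num_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average word frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return [sentences[i] for i in keep]
```

Because the selected sentences are returned verbatim, the output can never contain wording that is absent from the source, which illustrates the limitation of extractive methods noted above.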
Turning to abstractive summarization, "Abstractive Text Summarization using Sequence-to-sequence
RNNs and Beyond" by Nallapati et al. generates a summary from the meaning of the text itself,
rephrasing and shortening the input with deep learning models. The researchers use recurrent neural
networks (RNNs), which can serve both abstractive and extractive summarization, within a
sequence-to-sequence model, so the summary is produced sequentially. The approach is borrowed from
machine translation: an encoder turns the source text into vector representations, which a decoder
then turns into a summary based on key information from the original text. Combining
sequence-to-sequence modeling with RNNs is a successful method, but abstractive summarization can
lead to some issues. The method works well for simple books written in basic language, where words
can be interchanged to make the text simpler and easier to understand. An input text that is
terminology-heavy and complex, however, may produce an inaccurate summary, since some words and
definitions are not interchangeable.
For a different approach, "Extractive Text Summarization Using Machine Learning" by Acharya
focuses on machine learning techniques for extractive summarization, mainly supervised learning for
identifying important words and phrases in an input text. The authors describe several techniques
under this umbrella, such as decision trees, Support Vector Machines (SVMs), and the chi-square test.
They select a combination of SVMs and chi-square, which is a feature selection technique. This
method may produce acceptable precision, recall, and F-scores; nevertheless, it has major issues.
First, it requires a large amount of training data, and if the training data is not of sufficient
quality, the model may not be able to handle specific or unusual texts that were not covered. Another
issue is that the approach does not take into account the semantic relationships in the text.
"Get To The Point: Summarization with Pointer-Generator Networks" by See et al. also focuses on
abstractive summarization but proposes the use of pointer-generator networks. The paper's main goal
is to generate a coherent and informative summary that captures the main idea of the source text. To
this end, the authors propose a neural network that incorporates both pointing and generating. The
pointing mechanism allows words to be copied from the original input, while the generation mechanism
allows the model to produce new words that do not appear in the input. The proposed network is based
on a sequence-to-sequence model with an encoder and a decoder. In this model, the encoder takes an
input and produces states that the decoder uses to generate a summary. The paper adds two
components: the pointer mechanism, which copies words from the source, and a coverage mechanism,
which discourages the model from repeatedly attending to parts of the input it has already covered,
reducing repetition. The evaluation compares the model against baselines in terms of ROUGE scores
and shows an improvement, demonstrating the effectiveness of the approach and its potential for
improving automatic summarization tools.
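The way pointing and generating are combined can be illustrated with a small numeric sketch. In a pointer-generator network, the probability of emitting a word at a decoder step is a mixture of the vocabulary distribution and the attention over source positions, weighted by a generation probability. The function name final_distribution and its argument names are hypothetical, and plain Python dictionaries and lists stand in for the tensors a real model would use.

```python
from collections import defaultdict


def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix the generation and copy distributions for one decoder step.

    p_gen         -- probability of generating from the vocabulary (0..1)
    vocab_dist    -- dict mapping vocabulary word -> generation probability
    attention     -- attention weight for each source position (sums to 1)
    source_tokens -- the source words, aligned with `attention`
    """
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must have equal length")
    final = defaultdict(float)
    # Generation mass: scaled vocabulary distribution.
    for word, prob in vocab_dist.items():
        final[word] += p_gen * prob
    # Copy mass: attention on every position where the word occurs in the source.
    for word, weight in zip(source_tokens, attention):
        final[word] += (1.0 - p_gen) * weight
    return dict(final)
```

A source word that is missing from the vocabulary still receives probability through the copy term, which is what lets such a model reproduce rare names and technical terms from a book.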
The paper by Saif Mohammad and Bonnie J. Dorr, "A Hybrid Approach to Automatic Summarization
of News Articles", differs from the other papers in that it implements the idea of hybrid
summarization, combining extractive and abstractive techniques to generate a summary. This is done
in two carefully constructed steps. First, the most important sentences and words are extracted from
the input text using a graph-based algorithm, which is a form of extractive summarization. Next, a
sequence-to-sequence model, of the kind usually used in abstractive summarization, generates the
summary from the extracted sentences. The authors compare their results with purely abstractive and
purely extractive summarization using ROUGE and report that the hybrid approach outperforms the
other methods in terms of ROUGE score. A human evaluation also indicated that hybrid summarization
produced a more readable and informative summary of the source text.
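The two-stage pipeline described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: sentence similarity is plain word overlap, ranking uses a PageRank-style iteration in the spirit of TextRank, and the abstractive stage is left as a caller-supplied rewrite function (hypothetical) where a trained sequence-to-sequence model would be plugged in.

```python
import re


def _words(sentence):
    return set(re.findall(r"[a-z]+", sentence.lower()))


def _similarity(a, b):
    """Word-overlap (Jaccard) similarity between two sentences."""
    wa, wb = _words(a), _words(b)
    return len(wa & wb) / len(wa | wb) if wa and wb else 0.0


def rank_sentences(sentences, damping=0.85, iterations=30):
    """PageRank-style scores over a sentence-similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    weights = [[_similarity(a, b) if i != j else 0.0
                for j, b in enumerate(sentences)]
               for i, a in enumerate(sentences)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[j] * weights[j][i] / out_sum[j]
                            for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


def hybrid_summary(sentences, rewrite, top_k=2):
    """Stage 1: extract the top_k central sentences. Stage 2: hand them to `rewrite`."""
    scores = rank_sentences(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    extracted = [sentences[i] for i in sorted(ranked[:top_k])]
    # `rewrite` is a placeholder for the abstractive model (an assumption).
    return rewrite(extracted)
```

Passing a simple joining function such as `lambda xs: " ".join(xs)` as rewrite reduces the pipeline to purely extractive output, which gives a convenient baseline for comparing against a real abstractive stage.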
In "Comparative Study of Text Summarization Methods", Munot and Govilkar evaluate existing
summarization techniques. The methods evaluated include statistical and machine learning-based
approaches, graph-based methods, and hybrid techniques, and the paper highlights the importance of
domain-specific summarization. The comparative study used six summarization techniques: KLSum,
TextRank, LexRank, Luhn's algorithm, Latent Semantic Analysis (LSA), and Naive Bayes (NB). The
study was based on a dataset of 10 documents with a total of 5,554 sentences, and the evaluation was
performed using ROUGE metrics. The results showed that TextRank and LexRank performed best on
this dataset, while LSA and NB did not perform as well. One of the strengths of this paper is that it
provides a comprehensive overview of the different text summarization methods and compares them in
a detailed and systematic manner. The authors have also included a critical analysis of the existing
literature on text summarization, which can be valuable for researchers and practitioners working in
this field.
Summarizing large amounts of written material has been the subject of much research in the field of
natural language processing (NLP). Techniques for automatically summarizing lengthy texts into
shorter summaries have been developed using extractive and abstractive summarization. Abstractive
summarization creates new sentences that capture the essence of the original text, whereas extractive
summarization selects and condenses existing sentences from the text.
However, these methods can lack coherence and accuracy, so hybrid summarization methods have been
developed to overcome these drawbacks. Hybrid summarization combines extractive and abstractive
strategies to produce more accurate and coherent summaries. This approach has been shown to
outperform both extractive and abstractive techniques on various metrics (Rush et al., 2015).
In conclusion, automated summarization has come a long way in recent years, with advancements in
both extractive and abstractive summarization techniques. Extractive summarization has its
limitations but is ideal for simple and informal texts, while abstractive summarization can generate
more coherent and informative summaries but requires more advanced natural language processing
techniques. Deep learning-based methods are the most popular in abstractive summarization, but they
require a large amount of data and computational resources. Pointer-generator networks and hybrid
summarization are some of the newer approaches that have shown promise in generating more
efficient and effective summaries. However, despite the progress made, there are still challenges that
need to be addressed, such as the difficulty in preserving the source text's meaning, requiring the use
of more advanced natural language processing techniques.
After the literature review, the main question that arose was: "Can a hybrid model for book
summarization and presentation generation outperform existing methods in terms of accuracy,
efficiency, and user satisfaction?"
Based on this, we formed our hypothesis: "The hybrid model for book summarization and presentation
generation will outperform existing methods in terms of accuracy, efficiency, and user satisfaction."
4. PROJECT OVERVIEW/GOAL
The project "From book to presentation generation using hybrid summarization" aims to create a
system that automates the process of summarizing a book and producing a PowerPoint presentation
using hybrid summarization methods. The proposed system aims to provide a valuable tool for teachers
and professionals in different fields, as making a PowerPoint presentation from a book can be tedious
and challenging.
In order to produce summaries that are more precise and coherent, hybrid summarization combines
extractive and abstractive approaches. This approach has been shown to outperform both extractive
and abstractive techniques on various metrics and has been applied in different applications,
including document summarization and text-to-speech conversion. However, to our knowledge, the use
of hybrid summarization methods to generate PowerPoint presentations from a book has not yet been
the subject of research.
As a result, the proposed research aims to fill this void by creating a system that can automate the
process of summarizing books and creating PowerPoint presentations using hybrid summarization
techniques. The system will be compared to existing summarization tools and its accuracy, coherence,
and efficiency will be evaluated. The tool's potential uses and advantages for educators and
professionals in various fields will also be investigated, along with potential extensions and
enhancements such as the incorporation of multimedia content or interactive elements.
Overall, the proposed research has the potential to offer professionals and educators a useful tool
for presenting the information in books as PowerPoint presentations in a more accessible and
effective way.
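One possible way to turn chapter-level summaries into a presentation outline is sketched below. The Slide class, the build_outline function, and the default limit of four bullets per slide are assumptions for illustration only; this proposal does not fix a slide format, and writing the actual PowerPoint file would be a separate step.

```python
from dataclasses import dataclass, field


@dataclass
class Slide:
    title: str
    bullets: list = field(default_factory=list)


def build_outline(chapter_summaries, max_bullets=4):
    """Turn {chapter title: [summary sentences]} into a list of slides.

    Chapters with more than `max_bullets` sentences are split across
    several slides; continuation slides get a "(cont.)" suffix.
    """
    slides = []
    for title, sentences in chapter_summaries.items():
        for start in range(0, len(sentences), max_bullets):
            suffix = " (cont.)" if start else ""
            slides.append(Slide(title + suffix, sentences[start:start + max_bullets]))
    return slides
```

Keeping the outline as plain data separates summarization from rendering, so the same outline could later be exported to PowerPoint or displayed in a web app.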
5. PROJECT DEVELOPMENT METHODOLOGY / ARCHITECTURE
Data Collection:
The first step in this project is the collection of a dataset. The following datasets are considered.
The "SciSumm" dataset contains abstracts and summaries of scientific articles in the field
of computational linguistics. It includes both extractive and abstractive summaries and is
intended for evaluating the effectiveness of summarization techniques on scientific
literature.
The "ACLRD" (Academic Corpus for Longform Reading Comprehension Dataset) dataset
contains academic articles and books from various domains, such as computer science,
economics, and law. It includes the full text of the articles and book chapters as well as
human-generated summaries.
Preprocessing:
Once the dataset is collected, it will be preprocessed, which involves standardizing its format,
before the models are trained. The hybrid model will then be trained and tested on this dataset by
building a sequence-to-sequence model with a deep learning framework such as TensorFlow, Keras,
or PyTorch.
Evaluation Metrics:
ROUGE (Recall-Oriented Understudy for Gisting Evaluation): Measures the overlap between
the generated summary and a reference summary.
Human evaluation: A subjective evaluation conducted by human judges who assess the
quality of the generated summaries based on readability, accuracy, and coherence.
F1 score: The harmonic mean of precision and recall, which balances the two. Precision is the
percentage of the information in the generated summary that is relevant, and recall is the
percentage of the relevant information in the reference summary that also appears in the
generated summary.
These evaluation metrics can be used individually or in combination to assess the quality of the
generated summaries and the effectiveness of the presentation generation tool.
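As a concrete illustration of how these overlap-based scores are computed, the sketch below calculates ROUGE-1 style precision, recall, and F1 from unigram counts. It is a simplified stand-in that assumes lowercased word tokens and no stemming; the function name rouge_1 is hypothetical, and an established ROUGE implementation would be preferable for reported results.

```python
import re
from collections import Counter


def rouge_1(candidate, reference):
    """Unigram-overlap precision, recall, and F1 between two summaries."""
    cand = Counter(re.findall(r"\w+", candidate.lower()))
    ref = Counter(re.findall(r"\w+", reference.lower()))
    # Counter intersection clips each word's count to the smaller of the two.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, the candidate "the cat" scored against the reference "the cat sat down" has precision 1.0 (every candidate word is in the reference) but recall 0.5 (half of the reference is covered), which is why the two are combined into F1.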
Figure 1: System block-level diagram
6. PROJECT MILESTONES AND DELIVERABLES
7. WORK DIVISION
Samiha Azeem             | Aleezae Zia              | Fateh Ali Alim
Proposal                 | Proposal                 | Proposal
Documentation (SRS)      | Documentation (SRS)      | Documentation (SRS)
Dataset Generation       | Dataset Generation       | Dataset Generation
Training models          | Training models          | Training models
Testing                  | Testing                  | Testing
Updating Research Paper  | Updating Research Paper  | Updating Research Paper
Web app (frontend)       | Web app (frontend)       | Web app (frontend)
Web app (backend)        | Web app (backend)        | Web app (backend)
8. COSTING
At this stage, the research will be conducted using online resources, so no investment is required.
When the website is built in the future, additional funds may be needed for the domain.
REFERENCES
1. "Text Summarization Techniques: A Brief Survey" by Mehdi Allahyari, Seyedamin Pouriyeh,
Mehdi Assefi, et al.
2. "Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond" by
Nallapati et al.
3. "Extractive Text Summarization Using Machine Learning" by Acharya
4. "Get To The Point: Summarization with Pointer-Generator Networks" by See et al.
5. "A Hybrid Approach to Automatic Summarization of News Articles" by Saif Mohammad and
Bonnie J. Dorr
6. "Comparative Study of Text Summarization Methods" by Munot and Govilkar
7. Rush, A. M., Chopra, S., & Weston, J. (2015). A neural attention model for abstractive sentence
summarization. arXiv preprint arXiv:1509.00685. https://arxiv.org/abs/1509.00685
8. Kirmani, M., Manzoor Hakak, N., Mohd, M., & Mohd, M. (2019). Hybrid text summarization:
a survey. In Soft Computing: Theories and Applications: Proceedings of SoCTA 2017 (pp. 63-
73). Springer Singapore.
9. El-Kassas, W. S., Salama, C. R., Rafea, A. A., & Mohamed, H. K. (2021). Automatic text
summarization: A comprehensive survey. Expert systems with applications, 165, 113679.
10. Thione, G. L., Van den Berg, M., Polanyi, L., & Culy, C. (2004, July). Hybrid text
summarization: Combining external relevance measures with structural analysis. In Text
Summarization Branches Out (pp. 51-55).