1. Problem Statement
The objective of this project is to develop a highly accurate and efficient audio and video
transcription system that converts spoken content from various recordings into written text. The
system should support a wide range of audio and video file formats, handling different accents,
dialects, and languages. The transcription should reflect spoken words accurately, including
proper names, technical terms, and context-specific vocabulary, with appropriate punctuation
and capitalization to enhance readability. Key functionalities must also include noise reduction to
manage background sounds and maintain transcription accuracy, as well as real-time
transcription capabilities for live recordings alongside batch processing for pre-recorded
files. The output should be exportable in formats like TXT.
2. Data Gathering
To enhance the dataset for our audio-to-text transcription and summarization project, we
collected a diverse set of audio data from various YouTube videos.
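One common way to collect such audio is the yt-dlp command-line tool; this is a sketch, assuming yt-dlp and ffmpeg are installed, and the URL is a placeholder:

```shell
# Download a video's audio track and convert it to WAV (ffmpeg does the conversion).
yt-dlp -x --audio-format wav -o "%(title)s.%(ext)s" "https://www.youtube.com/watch?v=VIDEO_ID"
```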
3. Understanding APIs
1. I researched multiple APIs used to convert audio data to text.
2. I used AssemblyAI, which provides a powerful API for speech-to-text transcription. It is designed to transcribe audio data into text with high accuracy and speed.
4. Creating an AssemblyAI Key
I have created an AssemblyAI API key for transcribing audio to text.
5. Searching Methods for Transcribing Audio to Text
1. I found the Hugging Face library.
2. The Hugging Face library has three models that are useful for converting audio to text.
3. I implemented the Wav2Vec2 and Whisper models for transcribing audio to text.
6. Displaying Video
Libraries for Displaying Video and Audio
1. I have used the moviepy library for reading video files.
2. I have used the pygame library for handling video and audio playback.
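The moviepy step above can be sketched as follows; the file names are hypothetical, and `extract_audio` assumes `pip install moviepy`:

```python
import os

def audio_output_path(video_path):
    # Pure helper: "lecture.mp4" -> "lecture.wav"
    base, _ = os.path.splitext(video_path)
    return base + ".wav"

def extract_audio(video_path):
    # Write the video's audio track to a WAV file next to it.
    from moviepy.editor import VideoFileClip  # pip install moviepy
    clip = VideoFileClip(video_path)
    out = audio_output_path(video_path)
    clip.audio.write_audiofile(out)
    clip.close()
    return out

# Usage (not run here): extract_audio("lecture.mp4")
```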
7. Transcribing Audio to Text
1. I have used AssemblyAI for transcribing audio to text.
2. I used the librosa library to load audio files.
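A minimal sketch of the AssemblyAI call, assuming `pip install assemblyai`; the file name and key are placeholders, and the helper takes the transcriber as an argument so the flow is easy to follow and test:

```python
def transcribe_file(path, transcriber):
    # transcriber follows the AssemblyAI SDK shape:
    # .transcribe(path) returns an object with a .text attribute.
    transcript = transcriber.transcribe(path)
    return transcript.text

def make_assemblyai_transcriber(api_key):
    # Build a real transcriber (requires a valid AssemblyAI key).
    import assemblyai as aai  # pip install assemblyai
    aai.settings.api_key = api_key
    return aai.Transcriber()

# Usage (not run here):
#   text = transcribe_file("interview.mp3", make_assemblyai_transcriber("YOUR_KEY"))
```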
8. Perform EDA
A. Perform Visualization
1. Visualize the Waveform
2. Visualize the Spectrogram: compute and display the spectrogram
3. Extract and Visualize MFCCs
4. Analyze Silence and Noise
5. Extract and Analyze Other Features
a. Perform Zero-Crossing: The zero-crossing rate is the rate at which the signal changes sign, indicating noise levels.
b. Perform Spectral Centroid: The spectral centroid indicates where the center of mass of the spectrum is located, giving a sense of brightness.
c. Perform Statistical Analysis: Perform statistical analysis to understand the distribution and characteristics of your audio data.
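The zero-crossing rate, spectral centroid, and basic statistics above can be computed directly with NumPy; this is a sketch on a synthetic 440 Hz tone (librosa offers equivalents such as `librosa.feature.zero_crossing_rate`):

```python
import numpy as np

def zero_crossing_rate(signal):
    # Fraction of consecutive sample pairs where the signal changes sign.
    signs = np.sign(signal)
    return np.mean(signs[:-1] != signs[1:])

def spectral_centroid(signal, sr):
    # Magnitude-weighted mean frequency: the "center of mass" of the spectrum.
    mags = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    return np.sum(freqs * mags) / np.sum(mags)

# Synthetic example: one second of a 440 Hz sine sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)

zcr = zero_crossing_rate(tone)          # a 440 Hz sine crosses zero 880 times/sec
centroid = spectral_centroid(tone, sr)  # sits at 440 Hz for a pure tone

# Basic statistics for the "statistical analysis" step:
stats = {"mean": tone.mean(), "std": tone.std(), "peak": np.abs(tone).max()}
```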
B. Post-Processing the Transcribed Text
1. Cleaning the Transcriptions
a. Remove unwanted characters (I used the re library).
2. Normalizing the Text
3. Handling Silence and Noise
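The cleaning and normalization steps can be sketched with the `re` library; the exact patterns below (filler words, bracketed noise tags) are illustrative, not the project's actual rules:

```python
import re

def clean_transcript(text):
    # Drop bracketed noise tags and filler tokens, then collapse whitespace.
    text = re.sub(r"\[(?:noise|silence|music)\]", " ", text, flags=re.IGNORECASE)
    text = re.sub(r"\b(?:um+|uh+)\b", " ", text, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", text).strip()

def normalize_transcript(text):
    # Lowercase and keep only word characters and basic punctuation.
    return re.sub(r"[^\w\s.,?!']", "", text.lower())
```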
9. Transcribing Audio to Text with Hugging Face Models
1. I researched the Hugging Face library and used the Wav2Vec2 and Whisper models for transcribing audio to text.
2. For Wav2Vec2, I used the IPython, SciPy, and NumPy libraries.
3. After that, I used the Whisper model for transcribing audio into text.
Screenshot of Wav2Vec2:
Screenshot of Whisper:
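Both models can be driven through the same Hugging Face `transformers` ASR pipeline; a sketch assuming `pip install transformers` plus an audio backend, with example checkpoint and file names:

```python
def transcribe_clip(path, asr):
    # asr is any pipeline-style callable returning {"text": ...}.
    return asr(path)["text"]

def load_asr_pipeline(model_name):
    # Build a real ASR pipeline; example checkpoints:
    #   "facebook/wav2vec2-base-960h" (Wav2Vec2) or "openai/whisper-small".
    from transformers import pipeline  # pip install transformers
    return pipeline("automatic-speech-recognition", model=model_name)

# Usage (not run here):
#   whisper = load_asr_pipeline("openai/whisper-small")
#   print(transcribe_clip("clip.wav", whisper))
```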
10. Learning Gemini / Gemini Pro
Gemini:
1. I learned about Gemini and received some resources from Ritesh.
2. I created a Gemini API key.
3. Using this API key, I will transcribe audio into text.
Gemini-Pro:
1. I learned about Gemini Pro.
2. I also learned how to convert video into text, and I converted a video into summarized text using Gemini.
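The Gemini summarization step can be sketched with the `google-generativeai` SDK; the key, file name, and prompt wording are placeholders:

```python
def build_summary_prompt(transcript):
    # Pure helper: wrap the transcript in a summarization instruction.
    return "Summarize the following transcript in a few sentences:\n\n" + transcript

def summarize_with_gemini(transcript, api_key):
    # Requires `pip install google-generativeai` and a valid API key.
    import google.generativeai as genai
    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-pro")
    response = model.generate_content(build_summary_prompt(transcript))
    return response.text

# Usage (not run here):
#   summary = summarize_with_gemini(open("transcript.txt").read(), "YOUR_KEY")
```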
11. Creating a Streamlit App Using the AssemblyAI/Wav2Vec2/Whisper Models
1. I have created a Streamlit application for transcribing audio to text using AssemblyAI and saving that text into a PDF.
2. I had a discussion with Ritesh Mahale, who guided me on the Streamlit UI. Finally, I completed the Streamlit UI.
3. I have created a Streamlit application using the Whisper model for transcribing audio to text, and the results were really good. Therefore, I finalized the Whisper model.
AssemblyAI:
Whisper:
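The Streamlit UI can be sketched as below; the transcription backend is stubbed out so the layout is the focus, and the widget labels and file names are illustrative (in a real `app.py`, call `render_app()` at the top level and run `streamlit run app.py`):

```python
def fake_transcribe(audio_bytes):
    # Placeholder backend: swap in the Whisper or AssemblyAI call here.
    return "[transcript of %d bytes of audio]" % len(audio_bytes)

def render_app(transcribe=fake_transcribe):
    import streamlit as st  # pip install streamlit
    st.title("Audio to Text")
    uploaded = st.file_uploader("Upload an audio file", type=["wav", "mp3", "m4a"])
    if uploaded is not None:
        text = transcribe(uploaded.read())
        st.text_area("Transcript", text)
        st.download_button("Download TXT", text, file_name="transcript.txt")
```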
Finalisation
1. I have used the Whisper model for transcribing audio to text.
2. I have used Gemini Pro for text summarization, and I have finally completed the summarization.
Model Deployment
Using the Whisper Model:
1. I have deployed the model on Streamlit.
Link: text-summarization-jkeca3vdz9ewoingfdskaq.streamlit.app
2. I have used the Whisper model for transcribing audio into text and Gemini for text summarization.
Using AssemblyAI:
1. I have deployed the model on Streamlit.
Link: audiototext-gf2ksybgvmuhdzsvbbxmgx.streamlit.app
2. I have used AssemblyAI for transcribing audio into text.