Data Analysis and Visualization Lab (CS-352)
Namal University Mianwali
Faculty of Computer Science
Lab 04 – Example Dataset
In this exercise you are given a .txt file of a famous speech by Theresa May. The aim of the exercise is to plot a
bar chart of the most frequent words in the speech.
The resulting chart will look similar to the one produced in Task 5 below.
Task 1  Read the file speech.txt using Python and extract each word in the file. (Remember: this list will, and should, contain duplicates.)
Solution
# Import libraries
import numpy as np
import pandas as pd
import string
import matplotlib.pyplot as plt
%matplotlib inline
# Read the speech.txt file and create a word list
word_list = []
with open('speech.txt', 'r', encoding='utf-8') as file:
    text = file.read()

words = text.lower().split()

# Remove punctuation characters and numbers from the word list
for word in words:
    # Remove leading/trailing punctuation and any en-dashes inside the word
    word = word.strip(string.punctuation).replace('–', '')
    # Add the cleaned word to the word list (skip empty strings and numbers)
    if word and not word.isdigit():
        word_list.append(word)

print(word_list[:20])  # printing some values
OUTPUT
['thank', 'you', 'today', 'i', 'want', 'to', 'talk', 'about', 'the', 'united', 'kingdom', 'our', 'place', 'in', 'the',
'world', 'and', 'our', 'membership', 'of']
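Note (optional): the same tokenization can be done with a regular expression. The sketch below is an alternative to the loop above, not part of the required solution; it assumes text still holds the raw speech string and may split hyphenated words slightly differently.
import re

# Optional alternative: pull out alphabetic tokens directly with a regex,
# which drops digits and punctuation (including en-dashes) in one step
word_list_re = re.findall(r"[a-z']+", text.lower())
print(word_list_re[:20])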
Task 2  Read stopwords.csv using pandas and extract all the stopwords in the file that belong to the English language.
Solution
# Read the 'stopwords.csv' file as a pandas dataframe and print its head()
data = pd.read_csv('stopwords.csv')
data.head()
OUTPUT (head of the stopwords dataframe; the columns used below are Language and Words)
# Selecting stopwords that belong to English language
english_stopwords = set(data[data['Language'] == 'English']['Words'].tolist())
print(english_stopwords)
OUTPUT
{'was', 're', 'himself', 'after', 'ourselves', 'whom', 'out', 'against', 'has', 'aren', 't', 'we', 'have', 'own',
'it', 'as', 'do', "should've", "aren't", "hadn't", 'where', 'll', 'its', 'their', 'again', 'below', 's', 've', 'the',
'too', 'wouldn', 'more', 'y', 'can', 'are', 'itself', 'needn', 'she', 'having', 'now', 'or', 'at', 'nor', 'about',
"don't", 'very', 'were', 'mustn', "mustn't", 'his', "you're", "wasn't", 'once', 'which', 'doing', 'don',
"mightn't", 'but', "shouldn't", 'ain', 'most', 'then', "isn't", 'with', 'both', 'them', 'weren', 'be', 'had',
'will', 'hasn', 'isn', 'that', "couldn't", 'there', 'only', 'into', 'here', 'does', 'doesn', 'herself', 'me', 'by',
'is', "shan't", 'if', "it's", 'from', 'while', 'theirs', 'wasn', 'of', 'because', 'just', 'm', 'than', 'in', 'haven',
'yourself', 'him', "she's", 'my', 'our', 'did', 'off', 'each', "won't", 'shouldn', 'above', 'other',
'yourselves', 'how', 'they', 'hers', 'ours', 'hadn', 'am', 'her', 'shan', 'such', 'couldn', "haven't", 'those',
'when', 'been', 'for', 'same', "you've", 'between', "didn't", 'why', 'and', 'ma', 'on', 'should', 'under',
"you'll", 'over', 'some', 'further', "needn't", "doesn't", 'myself', 'until', "weren't", 'before', 'any', 'he',
'o', 'didn', 'being', 'not', 'few', 'to', 'up', 'won', 'an', 'down', 'd', "that'll", 'through', "wouldn't", 'you',
'during', 'your', 'what', 'mightn', 'i', 'all', 'yours', 'no', 'so', 'a', 'these', 'themselves', "you'd", 'who',
"hasn't", 'this'}
Task 3  Remove all the words extracted from the speech file that are present in the English stopwords.
Solution
# Keep only the words that are not English stopwords
word_list = [word for word in word_list if word not in english_stopwords]
print(word_list[:20])  # first 20 words after stopword removal
OUTPUT
['thank', 'today', 'want', 'talk', 'united', 'kingdom', 'place', 'world', 'membership', 'european',
'union', 'start', 'want', 'make', 'clear', 'see', 'rally', 'attack', 'even', 'criticism']
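As an optional sanity check (not required by the task), the intersection of the pruned list with the stopword set should now be empty.
# Optional sanity check: no English stopword should remain in the pruned list
assert not english_stopwords & set(word_list), "stopwords still present"
print(len(word_list), "words remain after stopword removal")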
Task 4 Sort the pruned list of words from the file in order of the highest frequency of occurrence.
Solution
# Count how often each word occurs in the pruned list
frequency_dict = {}
for word in word_list:
    frequency_dict[word] = frequency_dict.get(word, 0) + 1
# Sort the dictionary by frequency of occurrence, highest first
sorted_word_freq = dict(sorted(frequency_dict.items(), key=lambda item: item[1], reverse=True))
OUTPUT
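The counting and sorting can also be done with collections.Counter from the standard library. The sketch below is an equivalent alternative to the dictionary approach above, not the required solution.
from collections import Counter

# Alternative sketch: Counter tallies the pruned words in one call and
# most_common() returns (word, count) pairs already sorted by frequency
word_counts = Counter(word_list)
sorted_word_freq = dict(word_counts.most_common())
print(word_counts.most_common(5))  # the five most frequent words and their counts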
Task 5 Plot a bar chart that represents the frequency of the top 15 words of the speech.
Solution
# Separate the sorted frequency dictionary into two lists for easy plotting
keys = list(sorted_word_freq.keys())
values = list(sorted_word_freq.values())
# Plot the words on the x-axis and their frequencies on the y-axis
plt.figure(figsize=(10, 5))
plt.bar(keys[:15], values[:15])
plt.title("Most Frequent Words in Theresa May's Speech")
plt.xlabel("Words")
plt.ylabel("Frequency")
plt.xticks(rotation=90)
plt.show()
OUTPUT (bar chart of the 15 most frequent words in the speech)
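As an optional variation on the chart above, the same bar chart can be drawn directly from a pandas Series built from the sorted frequencies. The sketch below assumes sorted_word_freq from Task 4 is still in memory.
# Optional variation: plot the top 15 words straight from a pandas Series
top15 = pd.Series(sorted_word_freq).head(15)
ax = top15.plot(kind='bar', figsize=(10, 5),
                title="Most Frequent Words in Theresa May's Speech")
ax.set_xlabel("Words")
ax.set_ylabel("Frequency")
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()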
THE END