
SRM VALLIAMMAI ENGINEERING COLLEGE

(An Autonomous Institution)

SRM Nagar, Kattankulathur-603203.

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

LAB MANUAL
(Regulation 2023)

AD3364 DATA EXPLORATION AND VISUALIZATION LABORATORY

(THIRD SEMESTER)

Academic Year: 2025-2026 (ODD)

Prepared By
Mr. B. Yogesh Kumar, Asst. Prof. (O.G) / AI&DS

INDEX
E.No  Experiment Name

A     PEO, PO, PSO
B     Syllabus
C     Introduction / Description of major Software & Hardware involved in lab
D     CO, CO-PO Matrix, CO-PSO Matrix
E     Mode of Assessment
1     Install Data Analysis and Visualization tools: R/Python/Tableau Public/Power BI.
2     Perform Exploratory Data Analysis (EDA) on datasets such as an email dataset. Export all your emails as a dataset, import them into a pandas DataFrame, visualize them and derive different insights from the data.
3     Working with NumPy arrays, Pandas DataFrames, Basic Plots using Matplotlib.
4     Explore various variable and row filters in R for cleaning data. Apply various plot features in R on sample datasets and visualize.
5     Perform Time Series Analysis and apply various visualization techniques.
6     Perform Data Analysis and representation on a map using map datasets with Mouse Rollover effect, user interaction, etc.
7     Build Cartographic visualization for multiple datasets involving various countries of the world, states and districts in India, etc.
8     Perform EDA on Wine Quality Data Set.
9     Use a case study on a dataset, apply various EDA and visualization techniques, and present an analysis report.

PROGRAMME EDUCATIONAL OBJECTIVES (PEOs)

1. To afford the necessary background in the field of Information Technology to deal with engineering problems
to excel as engineering professionals in industries.
2. To improve the qualities like creativity, leadership, teamwork and skill thus contributing towards the growth
and development of society.
3. To develop ability among students towards innovation and entrepreneurship that caters to the needs of Industry
and society.
4. To inculcate an attitude for the life-long learning process through the use of information technology sources.
5. To prepare them to be innovative and ethical leaders, both in their chosen profession and in other activities.

PROGRAMME OUTCOMES (POs)

After going through the four years of study, Information Technology Graduates will exhibit the ability to:

PO#   Graduate Attribute: Programme Outcome

PO1   Engineering knowledge: Apply the knowledge of mathematics, science, engineering fundamentals, and an engineering specialization to the solution of complex engineering problems.
PO2   Problem analysis: Identify, formulate, research literature, and analyze complex engineering problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences.
PO3   Design/development of solutions: Design solutions for complex engineering problems and design system components or processes that meet the specified needs with appropriate consideration for public health and safety, and cultural, societal, and environmental considerations.
PO4   Conduct investigations of complex problems: Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions.
PO5   Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern engineering and IT tools, including prediction and modeling, to complex engineering activities with an understanding of the limitations.
PO6   The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal, health, safety, legal, and cultural issues and the consequent responsibilities relevant to the professional engineering practice.
PO7   Environment and sustainability: Understand the impact of the professional engineering solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for, sustainable development.
PO8   Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of the engineering practice.
PO9   Individual and team work: Function effectively as an individual, and as a member or leader in diverse teams, and in multidisciplinary settings.
PO10  Communication: Communicate effectively on complex engineering activities with the engineering community and with society at large, such as being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions.
PO11  Project management and finance: Demonstrate knowledge and understanding of the engineering and management principles and apply these to one’s own work, as a member and leader in a team, to manage projects and in multidisciplinary environments.
PO12  Life-long learning: Recognize the need for, and have the preparation and ability to engage in, independent and life-long learning in the broadest context of technological change.

PROGRAMME SPECIFIC OUTCOMES (PSOs)
After the completion of the Bachelor of Technology in Artificial Intelligence and Data Science programme, students will have the following programme specific outcomes:
1. Design and develop secured database applications with data analytical approaches of data preprocessing, optimization, visualization techniques and maintenance using state-of-the-art methodologies based on ethical values.
2. Design and develop intelligent systems using computational principles, methods and systems for extracting knowledge from data to solve real-time problems using advanced technologies and tools.
3. Design, plan and set up networks that are helpful for contemporary business environments using the latest software and hardware.
4. Plan and define test activities by preparing test cases that can predict and correct errors, ensuring a socially transformed product catering to all technological needs.

AD3364 DATA EXPLORATION AND VISUALIZATION LABORATORY
L T P C: 0 0 3 1.5
OBJECTIVES:

• To understand the key techniques behind data visualization.
• To learn about various visualization structures.
• To evaluate the information visualization systems.
• To design and build data visualization systems.
• To analyze and identify trends in data sets.

LIST OF EXPERIMENTS

1. Install Data Analysis and Visualization tools: R/Python/Tableau Public/Power BI.

2. Perform Exploratory Data Analysis (EDA) on datasets such as an email dataset. Export all your emails as a dataset, import them into a pandas DataFrame, visualize them and derive different insights from the data.
3. Working with Numpy arrays, Pandas data frames, Basic Plots using Matplotlib.
4. Explore various variable and row filters in R for cleaning data. Apply various plot features in R on
sample data sets and visualize.
5. Perform Time Series Analysis and apply the various visualizations techniques.
6. Perform Data Analysis and representation on map using map data sets with Mouse Rollover effect,
user interaction, etc.
7. Build Cartographic visualization for multiple datasets involving various countries of the world; states
and districts in India etc.
8. Perform EDA on Wine Quality Data Set.
9. Use a case study on a data set and apply the various EDA and visualizations techniques and present
an analysis report.

TOTAL: 45 PERIODS

LIST OF EQUIPMENTS FOR A BATCH OF 30 STUDENTS

SOFTWARE:
Standalone desktops with Python 3 interpreter for Windows / Linux 30 Nos. (or) Server with Python 3
interpreter for Windows/Linux supporting 30 terminals or more.
HARDWARE:
Standalone Desktops: 30 Nos.

COURSE OUTCOMES

AD3364.1 Understand the fundamentals of exploratory data analysis.

AD3364.2 Implement the data visualization using Matplotlib.

AD3364.3 Perform univariate data exploration and analysis.

AD3364.4 Apply bivariate data exploration and analysis.

AD3364.5 Use data exploration and visualization techniques for multivariate and time series data.

CO-PO-PSO MATRIX

CO        PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10 PO11 PO12 PSO1 PSO2 PSO3 PSO4
1         3    -    3    -    -    -    -    -    -    -    -    -    2    -    -    1
2         -    2    -    -    1    -    -    -    -    2    -    -    1    -    -    -
3         -    -    3    1    -    -    -    -    -    3    -    2    2    -    -    -
4         2    -    -    -    -    -    -    -    1    -    -    -    2    -    1    2
5         -    3    1    3    1    -    -    -    1    -    -    2    3    1    -    -
Average   2.5  2.5  2.3  2.0  1.0  -    -    -    1.0  2.5  -    2.0  2.0  1.0  1.0  1.5

EVALUATION PROCEDURE FOR EACH EXPERIMENT

S. No Description Mark

1. Aim & Procedure 20

2. Observation 30

3. Conduction and Execution 30

4. Output & Result 10

5. Viva 10

Total 100

INTERNAL ASSESSMENT FOR LABORATORY

S. No Description Mark

1. Conduction & Execution of Experiment 25

2. Record 10

3. Model Test 15

Total 50

Ex. No: 1 INSTALLING DATA ANALYSIS AND VISUALIZATION TOOLS

AIM
To write the steps to install data analysis and visualization tools: R / Python / Tableau Public / Power BI.

PROCEDURE
1. R:
• Download R: Visit the official R website (https://cran.r-project.org/) and download the installer for your operating system (Windows, macOS, or Linux).
• Install R by following the instructions provided in the installer.

2. Python:
• Download Python: Visit the official website (https://www.python.org/downloads/) and download the Python installer for your OS (Windows, macOS, or Linux).
• Install Python by running the installer, making sure to check the option to add Python to your system’s PATH during installation.

(i) INSTALL NUMPY WITH PIP

NumPy (Numerical Python) is an open-source core Python library for scientific computing. It is a general-purpose package for processing arrays and matrices.

pip install numpy

(ii) INSTALL JUPYTER LAB

Install Jupyter Lab with pip:

pip install jupyterlab

Once installed, launch Jupyter Lab with:

jupyter-lab

(iii) JUPYTER NOTEBOOK

Install the classic Jupyter Notebook with:

pip install notebook

To run the notebook:

jupyter notebook

(iv) INSTALL SCIPY

SciPy is a Python library for solving many mathematical, scientific and engineering problems. It is built on top of the NumPy library. SciPy stands for Scientific Python.

pip install scipy

(v) INSTALL PANDAS

Pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive.

pip install pandas

(vi) INSTALL MATPLOTLIB

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.

pip install matplotlib
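After running the pip commands above, a short script can confirm which packages the current interpreter can import and which versions were installed. This is a minimal sketch using only the standard library (importlib.util.find_spec and importlib.metadata.version, Python 3.8 or later); the package list is simply the set installed in this experiment, and check_packages is an illustrative helper name.

```python
import importlib.util
import sys
from importlib.metadata import PackageNotFoundError, version

# Packages installed in this experiment (import name equals the pip name for each of them)
LAB_PACKAGES = ["numpy", "scipy", "pandas", "matplotlib", "jupyterlab", "notebook"]


def check_packages(names):
    """Return {name: version string, or None if the package cannot be imported}."""
    status = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            status[name] = None  # not installed for this Python interpreter
            continue
        try:
            status[name] = version(name)
        except PackageNotFoundError:
            # importable but without pip metadata (e.g. a standard-library module)
            status[name] = "installed (version unknown)"
    return status


print("Python", sys.version.split()[0])
for pkg, ver in check_packages(LAB_PACKAGES).items():
    print(f"{pkg:12} {ver or 'NOT INSTALLED'}")
```

If a package prints NOT INSTALLED, the pip command was probably run for a different Python installation than the one executing the script.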

3. Tableau Public:
• Tableau Public is a web-based tool, so no installation is required. Simply visit the Tableau Public website (https://public.tableau.com/s/gallery) and create an account to start using it.

4. Power BI:
• Download Power BI Desktop: Go to the official Power BI website (https://powerbi.microsoft.com/en-us/desktop/) and download Power BI Desktop.
• Install Power BI Desktop by running the installer.

PROGRAM 1
import numpy as np
import pandas as pd
# Each list is one row of the table: [Name, Age]
hafeez = ['Hafeez', 19]
aslam = ['Aslam', 21]
kareem = ['kareem', 18]
dataframe = pd.DataFrame([hafeez, aslam, kareem], columns=['Name', 'Age'])
print(dataframe)

Output 1

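PROGRAM 2
import pandas as pd
import matplotlib.pyplot as plt
# Read the CSV file created in Excel (steps below)
data = pd.read_csv("CountryData.csv")
# One bar per code, bar height = Total_personal_income
plt.bar(data["code"].astype(str), data["Total_personal_income"])
plt.xlabel("code")
plt.ylabel("Total_personal_income")
plt.show()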

CREATE A CSV FILE IN EXCEL:

• First, create a sheet in Excel with the columns ‘code’ and ‘Total_personal_income’.

• Save the file as “CountryData.csv” (CSV format), the filename used in the program above.

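Instead of Excel, the same file can be written from Python with the standard csv module. The rows below are placeholders (hypothetical codes and incomes, not real statistics) and should be replaced with your own data; write_country_csv is an illustrative helper name.

```python
import csv

# Placeholder rows: hypothetical codes and income values, NOT real statistics
SAMPLE_ROWS = [
    (101, 52000),
    (102, 61000),
    (103, 47000),
    (104, 58000),
]


def write_country_csv(path, rows):
    """Write a header row followed by (code, Total_personal_income) rows."""
    with open(path, "w", newline="") as f:  # newline="" avoids blank lines on Windows
        writer = csv.writer(f)
        writer.writerow(["code", "Total_personal_income"])
        writer.writerows(rows)


write_country_csv("CountryData.csv", SAMPLE_ROWS)
```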
Output 2

RESULT
Thus, data analysis and visualization tools such as R, Python, Tableau Public and Power BI were installed, and their features were explored successfully.
Ex. No: 2 EXPLORATORY DATA ANALYSIS (EDA) ON DATASETS

AIM
To perform exploratory data analysis (EDA) on datasets such as an email dataset.

PROCEDURE
Exploratory Data Analysis (EDA) on email datasets involves importing the data, cleaning it, visualizing it, and extracting insights. Here is a step-by-step guide to performing EDA on an email dataset using Python and Pandas:
1. Import Necessary Libraries:
Import the required Python libraries for data analysis and visualization.

2. Load Email Data:

Assuming you have a folder containing email files (e.g., .eml files), you can use the email library to
parse and extract the email contents.

3. Data Cleaning:
Depending on your dataset, you may need to clean and preprocess the data. Common cleaning steps
include handling missing values, converting dates to datetime format, and removing duplicates.

4. Data Exploration:
Now, you can start exploring the dataset using various techniques. Here are some common EDA
tasks:
Basic Statistics:
Get summary statistics of the dataset.
Distribution of Dates:
Visualize the distribution of email dates.

5. Word Cloud for Subject or Message:

Create a word cloud to visualize common words in email subjects or messages.

6. Top Senders and Recipients:

Find the top email senders and recipients.
Depending on your dataset, you can explore further, analyze sentiment, perform network analysis,
or any other relevant analysis to gain insights from your email data.
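Step 2 describes parsing .eml files with Python's email package, while the program below reads a ready-made CSV. A minimal sketch of that parsing step, assuming the exported emails are saved as individual .eml files in one folder (the folder name is hypothetical); parse_eml, load_eml_folder and top_senders are illustrative helper names, not library functions.

```python
import email
import email.policy
from collections import Counter
from email.utils import parsedate_to_datetime
from pathlib import Path


def parse_eml(raw_bytes):
    """Parse one raw email (the bytes of an .eml file) into a flat dict."""
    msg = email.message_from_bytes(raw_bytes, policy=email.policy.default)
    body = msg.get_body(preferencelist=("plain",))  # None if there is no text/plain part
    try:
        sent = parsedate_to_datetime(str(msg["Date"])) if msg["Date"] else None
    except (TypeError, ValueError):
        sent = None  # malformed Date header
    return {
        "from": str(msg["From"] or ""),
        "to": str(msg["To"] or ""),
        "subject": str(msg["Subject"] or ""),
        "date": sent,
        "body": body.get_content() if body is not None else "",
    }


def load_eml_folder(folder):
    """Parse every *.eml file in `folder` (a hypothetical export directory)."""
    return [parse_eml(path.read_bytes()) for path in sorted(Path(folder).glob("*.eml"))]


def top_senders(records, n=5):
    """Return the n most frequent 'From' values as (sender, count) pairs."""
    return Counter(rec["from"] for rec in records).most_common(n)
```

For example, records = load_eml_folder("exported_emails") followed by df = pd.DataFrame(records) gives one row per email, ready for the date-distribution and top-sender steps.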

PROGRAM
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the dataset (raw string so the Windows backslashes are kept literally)
df = pd.read_csv(r'D:\ARCHANA\dxv\LAB\DXV\Emaildataset.csv')
# Display basic information about the dataset
print(df.info())
# Display the first few rows of the dataset
print(df.head())
# Descriptive statistics
print(df.describe())
# Check for missing values
print(df.isnull().sum())
# Visualize the distribution of numerical variables
sns.pairplot(df)
plt.show()
# Visualize the distribution of categorical variables
sns.countplot(x='label', data=df)
plt.show()
# Correlation matrix for numerical variables (text columns are skipped)
correlation_matrix = df.corr(numeric_only=True)
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.show()
# Word cloud for the email text column (requires: pip install wordcloud)
from wordcloud import WordCloud
text_data = ' '.join(df['text'])
wordcloud = WordCloud(width=800, height=400, random_state=21,
                      max_font_size=110).generate(text_data)
plt.figure(figsize=(10, 7))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis('off')
plt.show()

OUTPUT
Data columns (total 4 columns):
# Column Non-Null Count Dtype

0 Unnamed: 0 5171 non-null int64
1 label 5171 non-null object
2 text 5171 non-null object
3 label_num 5171 non-null int64
dtypes: int64(2), object(2)
memory usage: 161.7+ KB
None
Unnamed: 0 label text label_num
0 605 ham Subject: enron methanol; meter #: 988291\r\n... 0
1 2349 ham Subject: hpl nom for january 9 , 2001\r\n( see... 0
2 3624 ham Subject: neon retreat\r\nho ho ho , we ' re ar... 0
3 4685 spam Subject: photoshop , windows , office . cheap ... 1
4 2030 ham Subject: re : indian springs\r\nthis deal is t... 0
Unnamed: 0 label_num
count 5171.000000 5171.000000
mean 2585.000000 0.289886
std 1492.883452 0.453753
min 0.000000 0.000000
25% 1292.500000 0.000000
50% 2585.000000 0.000000
75% 3877.500000 1.000000
max 5170.000000 1.000000
Unnamed: 0 0
label 0
text 0
label_num 0
dtype: int64

RESULT
Thus, exploratory data analysis (EDA) on an email dataset was performed successfully.
Ex. No:3 WORKING WITH NUMPY ARRAYS, PANDAS DATA FRAMES, BASIC PLOTS
USING MATPLOTLIB

AIM
To work with NumPy arrays, Pandas DataFrames and basic plots using Matplotlib.

PROCEDURE

1. NumPy:
NumPy is a fundamental library for numerical computing in Python. It provides support for multi-
dimensional arrays and various mathematical functions. To get started, you’ll first need to install NumPy
if you haven’t already (you can use pip):

pip install numpy

Once NumPy is installed, you can use it as follows:

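import numpy as np
# Creating NumPy arrays
arr = np.array([1, 2, 3, 4, 5])
print(arr)
# Basic operations ('total' avoids shadowing Python's built-in sum)
mean = np.mean(arr)
total = np.sum(arr)
print(mean, total)
# Mathematical functions (applied element-wise)
square_root = np.sqrt(arr)
exponential = np.exp(arr)
print(square_root)
print(exponential)
# Indexing and slicing
first_element = arr[0]
sub_array = arr[1:4]
print(first_element, sub_array)
# Array operations
combined_array = np.concatenate([arr, sub_array])
print(combined_array)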

OUTPUT

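The NumPy program above uses only a one-dimensional array, although NumPy's main strength is multi-dimensional arrays. A short sketch (values chosen only for illustration) of a 2-D array with axis-wise reductions, broadcasting and boolean masking:

```python
import numpy as np

# 2-D array (matrix) with 2 rows and 3 columns, built by reshaping 0..5
matrix = np.arange(6).reshape(2, 3)        # [[0 1 2], [3 4 5]]

# Axis-wise reductions: axis=0 runs down the columns, axis=1 across the rows
column_sums = matrix.sum(axis=0)           # [3 5 7]
row_means = matrix.mean(axis=1)            # [1. 4.]

# Broadcasting: the length-3 row is multiplied into every row of the matrix
scaled = matrix * np.array([1, 10, 100])   # [[0 10 200], [3 40 500]]

# Boolean mask: keep only the elements greater than 2
large_values = matrix[matrix > 2]          # [3 4 5]

print(matrix.shape, matrix.ndim)
print(column_sums, row_means)
print(scaled)
print(large_values)
```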
2. Pandas:
Pandas is a powerful library for data manipulation and analysis.
You can install pandas using pip:

pip install pandas

Here’s how to work with Pandas DataFrames:

import pandas as pd
# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 28, 22],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami'],
}
df = pd.DataFrame(data)
# Display the entire DataFrame
print("DataFrame:")
print(df)
# Accessing specific columns
print("\nAccessing 'Name' Column:")
print(df['Name'])
# Adding a new column
df['Salary'] = [50000, 60000, 75000, 48000, 55000]
# Filtering data
print("\nPeople older than 30:")
print(df[df['Age'] > 30])
# Sorting by a column
print("\nSorting by 'Age' in descending order:")
print(df.sort_values(by='Age', ascending=False))
# Aggregating data
print("\nAverage age:")
print(df['Age'].mean())
# Grouping and aggregation
grouped_data = df.groupby('City')['Salary'].mean()
print("\nAverage salary by city:")
print(grouped_data)
# Applying a function to a column
df['Age_Squared'] = df['Age'].apply(lambda x: x ** 2)
# Removing a column
df = df.drop(columns=['Age_Squared'])
# Saving the DataFrame to a CSV file
df.to_csv('output.csv', index=False)
# Reading a CSV file into a DataFrame
new_df = pd.read_csv('output.csv')
print("\nDataFrame from CSV file:")
print(new_df)

OUTPUT

3. Matplotlib:
Matplotlib is a popular library for creating static, animated, or interactive plots and graphs.
Install Matplotlib using pip:
pip install matplotlib

Here’s a simple example of creating a basic plot:

import numpy as np
import matplotlib.pyplot as plt
# Sample data: 100 evenly spaced points between 0 and 10
x = np.linspace(0, 10, 100)
y = np.sin(x)
# Create a line plot
plt.figure(figsize=(8, 6))
plt.plot(x, y, label='Sine Wave')
plt.title('Sine Wave Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.grid(True)
plt.show()

OUTPUT

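The Matplotlib example above draws only a line plot. Other basic plot types can be placed side by side with plt.subplots; the sketch below uses synthetic data (a seeded random generator and made-up bar counts) purely for illustration, and basic_plots is an illustrative helper name.

```python
import matplotlib.pyplot as plt
import numpy as np


def basic_plots():
    """Draw four basic plot types on one 2x2 figure and return the figure."""
    rng = np.random.default_rng(0)   # seeded, so the random sample data is reproducible
    x = np.linspace(0, 10, 50)
    categories = ["A", "B", "C"]
    counts = [5, 3, 7]               # made-up counts for the bar chart

    fig, axes = plt.subplots(2, 2, figsize=(10, 8))
    axes[0, 0].plot(x, np.cos(x), color="tab:blue")
    axes[0, 0].set_title("Line plot")
    axes[0, 1].scatter(x, np.sin(x) + rng.normal(0, 0.2, x.size), s=12)
    axes[0, 1].set_title("Scatter plot")
    axes[1, 0].bar(categories, counts, color="tab:orange")
    axes[1, 0].set_title("Bar chart")
    axes[1, 1].hist(rng.normal(0, 1, 500), bins=20, edgecolor="black")
    axes[1, 1].set_title("Histogram")
    fig.tight_layout()
    return fig


fig = basic_plots()
plt.show()
```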
RESULT
Thus, working with NumPy arrays, Pandas DataFrames and basic Matplotlib plots was completed successfully.

Ex. No: 4 EXPLORING VARIOUS VARIABLE AND ROW FILTERS IN R FOR CLEANING
DATA

AIM
To explore various variable and row filters in R for cleaning data.

PROCEDURE
Data Preparation and Cleaning
First, let’s create a sample dataset and then explore various variable and row filters to clean the data.
# Create a sample dataset
set.seed(123)
data <- data.frame(
  ID = 1:10,
  Age = sample(18:60, 10, replace = TRUE),
  Gender = sample(c("Male", "Female"), 10, replace = TRUE),
  Score = sample(1:100, 10)
)
# Print the sample data
print(data)

OUTPUT

Variable Filters
1. Filtering by a Specific Value:
To filter rows based on a specific value in a variable (e.g., only show rows where Age is greater than 30):
filtered_data <- data[data$Age > 30, ]
2. Filtering by Multiple Conditions:
You can filter rows based on multiple conditions using the & (AND) or | (OR) operators (e.g., show rows where Age is greater than 30 and Gender is "Male"):
filtered_data <- data[data$Age > 30 & data$Gender == "Male", ]

Row Filters
1. Removing Duplicate Rows:
To remove duplicate rows based on certain columns (e.g., rows that repeat the same ‘ID’, ‘Age’ and ‘Gender’) while keeping all columns:
cleaned_data <- data[!duplicated(data[, c("ID", "Age", "Gender")]), ]
2. Removing Rows with Missing Values:
To remove rows with missing values (NA):
cleaned_data <- na.omit(data)

Data Visualization
Apply various plot features using the ggplot2 package to visualize the cleaned data.
# Load the ggplot2 package
library(ggplot2)
# Create a scatterplot of Age vs. Score with points colored by Gender
ggplot(data = cleaned_data, aes(x = Age, y = Score, color = Gender)) +
  geom_point() +
  labs(title = "Scatterplot of Age vs. Score", x = "Age", y = "Score")
# Create a histogram of Age
ggplot(data = cleaned_data, aes(x = Age)) +
  geom_histogram(binwidth = 5, fill = "blue", alpha = 0.5) +
  labs(title = "Histogram of Age", x = "Age", y = "Frequency")
# Create a bar chart of Gender distribution
ggplot(data = cleaned_data, aes(x = Gender)) +
  geom_bar(fill = "green", alpha = 0.7) +
  labs(title = "Gender Distribution", x = "Gender", y = "Count")

RESULT
Thus, various variable and row filters in R were explored for cleaning data, and the cleaned data was visualized successfully.

Ex. No: 5 TIME SERIES ANALYSIS USING VARIOUS VISUALIZATION TECHNIQUES

AIM
To perform time series analysis and apply the various visualization techniques.

PROCEDURE

DOWNLOAD DATASET

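Step 1: Open a web browser, go to http://github.com/jbrownlee/Datasets and download a time series dataset as a CSV file.
Step 2: Write the following code to load the dataset, print its first rows and draw a line plot:
from pandas import read_csv
from matplotlib import pyplot
# 'pathname' is the path of the downloaded CSV; the first column is assumed to hold the dates
series = read_csv('pathname', header=0, index_col=0, parse_dates=True)
print(series.head())
series.plot()
pyplot.show()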

OUTPUT

Step 3: To draw the time series line plot with a dash-dot line style:
series.plot(style='-.')
pyplot.show()

OUTPUT

Step 4: To create a Histogram:

series.hist()
pyplot.show()

OUTPUT

Step 5: To create a density plot (pandas uses SciPy for the kernel density estimate):
series.plot(kind='kde')
pyplot.show()

OUTPUT

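The steps above only plot the raw series. As an optional extension, a rolling mean, a monthly resample and a lag-1 autocorrelation are common next steps in time series analysis. The sketch below uses a synthetic daily series (a seasonal sine wave plus a small linear trend) rather than a downloaded dataset, so its numbers are illustrative only; with a real dataset, apply the same calls to the series loaded in Step 2 (a single-column DataFrame may need series.iloc[:, 0] first).

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic daily series for illustration only: seasonal sine wave + small upward trend
days = np.arange(365)
dates = pd.date_range("2020-01-01", periods=365, freq="D")
series = pd.Series(10 + 5 * np.sin(2 * np.pi * days / 365) + 0.01 * days,
                   index=dates, name="value")

# 7-day centred rolling mean smooths short-term variation (first/last 3 values are NaN)
rolling_mean = series.rolling(window=7, center=True).mean()

# Monthly averages ("MS" labels each month by its first day)
monthly = series.resample("MS").mean()

# Lag-1 autocorrelation: correlation between each value and the previous day's value
lag1 = series.autocorr(lag=1)

print(monthly.round(2))
print("lag-1 autocorrelation:", round(lag1, 4))

# Raw series with the rolling mean drawn on top
ax = series.plot(alpha=0.5, label="daily")
rolling_mean.plot(ax=ax, label="7-day rolling mean")
ax.legend()
plt.show()
```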
RESULT
Thus, time series analysis was performed and the series was visualized using line, histogram and density plots.

Ex. No: 6 DATA ANALYSIS AND REPRESENTATION ON MAP

AIM
To write a program to perform data analysis and representation on a map using various map datasets with mouse rollover effect, user interaction, etc.

PROCEDURE
STEP 1:
• Make sure the necessary libraries are installed:
pip install geopandas folium bokeh
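The program below expects geographic.csv to contain Latitude, Longitude and Info columns. If no map dataset is at hand, a small test file can be generated with the standard csv module; the rows below are sample points with approximate city-centre coordinates (for testing only), write_geographic_csv is an illustrative helper name, and read_csv in the program should then point to this file.

```python
import csv

# Sample rows with approximate city-centre coordinates (for testing only).
# Column names match the ones the program reads: Latitude, Longitude, Info.
SAMPLE_ROWS = [
    {"Latitude": 13.0827, "Longitude": 80.2707, "Info": "Chennai"},
    {"Latitude": 19.0760, "Longitude": 72.8777, "Info": "Mumbai"},
    {"Latitude": 28.6139, "Longitude": 77.2090, "Info": "New Delhi"},
]


def write_geographic_csv(path, rows):
    """Write rows as a CSV file with a Latitude, Longitude, Info header."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["Latitude", "Longitude", "Info"])
        writer.writeheader()
        writer.writerows(rows)


write_geographic_csv("geographic.csv", SAMPLE_ROWS)
```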

PROGRAM
from bokeh.io import show
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure
from bokeh.layouts import column
import pandas as pd
import folium
# Load your data (expects Latitude, Longitude and Info columns)
data = pd.read_csv(r'D:\ARCHANA\dxv\LAB\DXV\geographic.csv')
# Create a Bokeh figure
p = figure(width=800, height=400, tools='pan,wheel_zoom,reset')
# Create a ColumnDataSource to hold data
source = ColumnDataSource(data)
# Add circle markers to the figure (x = longitude, y = latitude)
p.scatter(x='Longitude', y='Latitude', size=10, source=source, color='orange')
# Create a hover tool for the mouse rollover effect
hover = HoverTool()
hover.tooltips = [("Info", "@Info"), ("Latitude", "@Latitude"), ("Longitude", "@Longitude")]
p.add_tools(hover)
# Display the Bokeh plot
layout = column(p)
show(layout)
# Create a folium map centred on the mean position of the data points
m = folium.Map(location=[data['Latitude'].mean(), data['Longitude'].mean()], zoom_start=10)
# Add markers: the tooltip appears on mouse rollover, the popup on mouse click
for index, row in data.iterrows():
    folium.Marker(
        location=[row['Latitude'], row['Longitude']],
        popup=row['Info'],
        tooltip=row['Info'],
    ).add_to(m)
# Save the map to an HTML file
m.save('map.html')

OUTPUT

RESULT
Thus, data analysis and representation on a map using various map datasets with mouse rollover effect and user interaction was completed successfully.
Ex. No: 7 BUILDING CARTOGRAPHIC VISUALIZATION

AIM
To build cartographic visualization for multiple datasets involving various countries of the world, states and districts in India, etc.

PROCEDURE
STEP 1:
Collect Datasets
Gather the datasets containing geographical information for countries, states, or districts. Make sure these
datasets include the necessary attributes for mapping (e.g., country/state/district names, codes, and relevant
data).

STEP 2:
Install Required Libraries:
pip install geopandas matplotlib

STEP 3:
Load Geographic Data:
Use Geopandas to load the geographic data for countries, states, or districts. Make sure to match the
geographical data with your datasets based on the common attributes.

STEP 4:
Merge Datasets:
Merge your datasets with the geographic data based on common attributes. This step is crucial for linking your
data to the corresponding geographic regions.

STEP 5:
Create Cartographic Visualizations:
Use Matplotlib to create cartographic visualizations. You can create separate plots for different datasets or
overlay them on a single map.

STEP 6:
Customize and Enhance:
Customize your visualizations based on your needs. You can add legends, labels, titles, and other elements to
enhance the interpretability of your maps.

STEP 7:
Save and Share:
Save your visualizations as image files or interactive plots if needed. You can then share these visualizations
with others.
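STEP 4 (merging) is not shown in the program below, which plots points rather than regions. A minimal sketch of that step using plain pandas, with a stand-in table of region names in place of a GeoDataFrame; the data, the column names st_name and state, and the helper merge_report are hypothetical. An outer merge with indicator=True reports names that fail to match, a frequent problem when joining Indian state or district data (for example, the former name "Orissa" versus "Odisha").

```python
import pandas as pd

# Stand-in for the region-name column of a states GeoDataFrame (hypothetical subset)
regions = pd.DataFrame({"st_name": ["Tamil Nadu", "Kerala", "Karnataka", "Odisha"]})

# Hypothetical attribute data; "Orissa" (the former name of Odisha) will not match
values = pd.DataFrame({
    "state": ["Tamil Nadu", "Kerala", "Karnataka", "Orissa"],
    "value": [72, 95, 68, 61],
})


def merge_report(regions, values, left_on, right_on):
    """Outer-merge the tables and list the keys that found no partner on the other side."""
    merged = regions.merge(values, left_on=left_on, right_on=right_on,
                           how="outer", indicator=True)
    no_data = merged.loc[merged["_merge"] == "left_only", left_on].tolist()
    no_region = merged.loc[merged["_merge"] == "right_only", right_on].tolist()
    return merged, no_data, no_region


merged, no_data, no_region = merge_report(regions, values, "st_name", "state")
print("Regions without data:", no_data)         # these would be blank on the map
print("Data rows without a region:", no_region)
```

With a real GeoDataFrame of state boundaries, calling merge on the GeoDataFrame (e.g. states.merge(values, left_on="st_name", right_on="state", how="left")) keeps the geometry, and .plot(column="value", legend=True) then draws a choropleth.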

PROGRAM:
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
# City coordinates in degrees (WGS 84)
df = pd.DataFrame({'city': ['Berlin', 'Paris', 'Munich'],
                   'latitude': [52.518611111111, 48.856666666667, 48.137222222222],
                   'longitude': [13.408333333333, 2.3516666666667, 11.575555555556]})
# Build point geometries (x = longitude, y = latitude); keep only the city column as data
gdf = gpd.GeoDataFrame(df.drop(['latitude', 'longitude'], axis=1), crs='EPSG:4326',
                       geometry=gpd.points_from_xy(df.longitude, df.latitude))
print(gdf)
# World country outlines. GeoPandas 1.0 removed the bundled 'naturalearth_lowres' dataset:
# download "Admin 0 - Countries" (1:110m) from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/
# and pass the path of the downloaded file (the filename below is a placeholder)
world = gpd.read_file('ne_110m_admin_0_countries.zip')
base = world.plot(color='white', edgecolor='black')
gdf.plot(ax=base, marker='o', color='red', markersize=5)
plt.show()

OUTPUT
city geometry
0 Berlin POINT (13.40833 52.51861)
1 Paris POINT (2.35167 48.85667)
2 Munich POINT (11.57556 48.13722)

RESULT
Thus, cartographic visualization for multiple datasets involving various countries of the world was built successfully.

Ex. No: 8 PERFORM EDA ON WINE QUALITY DATA SET.

AIM
To write a program to perform EDA on the Wine Quality Data Set.

PROGRAM
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Load the dataset ('pathname' is a placeholder; the UCI winequality CSV files
# are semicolon-separated, so add sep=';' when reading them)
data = pd.read_csv("pathname")
# Display the first few rows of the dataset
print(data.head())
# Get information about the dataset
data.info()
# Summary statistics
print(data.describe())
# Distribution of wine quality
sns.countplot(x='quality', data=data)
plt.title("Wine Quality data set")
plt.show()
# Box plots for selected features by wine quality
features = ['alcohol', 'volatile acidity', 'citric acid', 'residual sugar']
for feature in features:
    plt.figure(figsize=(8, 6))
    sns.boxplot(x='quality', y=feature, data=data)
    plt.title(f'{feature} by Wine Quality')
    plt.show()
# Pair plot of selected features
sns.pairplot(data, vars=features, hue='quality', diag_kind='kde')
plt.suptitle("Pair Plot of Selected Features")
plt.show()
# Correlation heatmap (numeric_only skips any text column, e.g. a wine 'type' column)
corr_matrix = data.corr(numeric_only=True)
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", fmt=".2f")
plt.title("Correlation Heatmap")
plt.show()
# Histograms of selected features
for feature in features:
    plt.figure(figsize=(6, 4))
    sns.histplot(data[feature], kde=True, bins=20)
    plt.title(f"Distribution of {feature}")
    plt.show()

OUTPUT

RESULT
Thus, EDA on the Wine Quality Data Set was performed successfully.

Ex. No: 9 VISUALIZING VARIOUS EDA TECHNIQUES AS CASE STUDY FOR IRIS
DATASET

AIM
The Mini Project to predict the time taken to solve a problem given the current status of the user using
Random Forest Regressor Model.

PROCEDURE
Import Libraries:
Start by importing the necessary libraries and loading the dataset.
Descriptive Statistics:
Compute and display descriptive statistics.
Check for Missing Values:
Verify whether there are any missing values in the dataset.
Visualize Data Distributions:
Visualize the distribution of numerical variables.
Correlation Heatmap:
Examine the correlation between numerical variables.
Boxplots for Categorical Variables:
Use boxplots to visualize the distribution of features by species.
Violin Plots:
Combine box plots with kernel density estimation for better visualization.
Correlation between Features:
Visualize pair-wise feature correlations.

Conclusion and Summary:

Summarize key findings and insights from the analysis.
This case study provides a comprehensive analysis of the Iris dataset, including data exploration, descriptive statistics, visualization of data distributions, correlation analysis, and feature-specific visualizations.

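The procedure above lists the steps without code. A minimal sketch, assuming the Iris data is loaded into a pandas DataFrame with numeric measurement columns and a categorical species column (as in seaborn's sample iris dataset); iris_summary and iris_plots are illustrative helper names, not library functions.

```python
import pandas as pd


def iris_summary(df, target="species"):
    """Return describe(), missing-value counts, per-species means and the correlation matrix."""
    stats = df.describe()                                    # count, mean, std, quartiles
    missing = df.isnull().sum()                              # missing values per column
    class_means = df.groupby(target).mean(numeric_only=True)
    corr = df.corr(numeric_only=True)                        # Pearson correlation
    return stats, missing, class_means, corr


def iris_plots(df, target="species"):
    """Draw the plots listed in the procedure (needs matplotlib and seaborn)."""
    import matplotlib.pyplot as plt  # imported here so iris_summary works without them
    import seaborn as sns

    numeric_cols = df.select_dtypes("number").columns
    df[numeric_cols].hist(bins=20, figsize=(10, 6))                       # distributions
    plt.figure()
    sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="coolwarm")  # correlation heatmap
    for col in numeric_cols:
        _, (ax_box, ax_violin) = plt.subplots(1, 2, figsize=(10, 4))
        sns.boxplot(x=target, y=col, data=df, ax=ax_box)        # box plot by species
        sns.violinplot(x=target, y=col, data=df, ax=ax_violin)  # box plot + KDE
    sns.pairplot(df, hue=target)                                # pair-wise relationships
    plt.show()
```

For example, iris = sns.load_dataset("iris") (which downloads the sample data on first use), then stats, missing, class_means, corr = iris_summary(iris) and iris_plots(iris).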
Data Analytics With Python Curriculum (LOCTECH) PDF
6 pages
Introduction To Python For Science and Engineering Second Edition David J. Pine Download
100% (1)
Introduction To Python For Science and Engineering Second Edition David J. Pine Download
61 pages
Panda Joins
No ratings yet
Panda Joins
25 pages
Python Packages To Learn Data Science E-Book
No ratings yet
Python Packages To Learn Data Science E-Book
76 pages
Python Lab Exam Guide
No ratings yet
Python Lab Exam Guide
7 pages
Amarjeet Kumar
No ratings yet
Amarjeet Kumar
2 pages
Machine Learning Part 02
No ratings yet
Machine Learning Part 02
161 pages
MLS 2 - NumPy and Pandas
No ratings yet
MLS 2 - NumPy and Pandas
27 pages
Python For AI Developers
No ratings yet
Python For AI Developers
45 pages
AI and Machine Learning in Action Real World Solutions For Coders
No ratings yet
AI and Machine Learning in Action Real World Solutions For Coders
175 pages
Set 2
No ratings yet
Set 2
3 pages
Where
No ratings yet
Where
22 pages
Informatics Practices Q & A
No ratings yet
Informatics Practices Q & A
14 pages
Class XII Python Practical File
No ratings yet
Class XII Python Practical File
19 pages
DSA Lab Manual Pgms - fINAL
No ratings yet
DSA Lab Manual Pgms - fINAL
34 pages
Pandas 2
No ratings yet
Pandas 2
17 pages
Data Scientist Bootcamp - NG
No ratings yet
Data Scientist Bootcamp - NG
25 pages
Fundamentals of Data Science Lab Manual New
No ratings yet
Fundamentals of Data Science Lab Manual New
33 pages
Graph Plotting Questions Class12
No ratings yet
Graph Plotting Questions Class12
5 pages
Data Manipulation With Pandas
No ratings yet
Data Manipulation With Pandas
39 pages
AI Assignment Guide for Students
No ratings yet
AI Assignment Guide for Students
7 pages
Data Analyst Interview Questions
No ratings yet
Data Analyst Interview Questions
9 pages
Data Centric Computing
No ratings yet
Data Centric Computing
8 pages
Data Science Lab Guide
No ratings yet
Data Science Lab Guide
12 pages
ML Lab Manual
No ratings yet
ML Lab Manual
12 pages
Resumefornonte4ch PDF
No ratings yet
Resumefornonte4ch PDF
1 page
Data Analyst Roadmap 2025
No ratings yet
Data Analyst Roadmap 2025
19 pages
X AI - Programs
No ratings yet
X AI - Programs
12 pages
Practical File (Edited) 5
No ratings yet
Practical File (Edited) 5
21 pages
Data Analysis in Python RichContent
No ratings yet
Data Analysis in Python RichContent
61 pages


SRM VALLIAMMAI ENGINEERING COLLEGE

(An Autonomous Institution)

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

AD3364 DATA EXPLORATION AND VISUALIZATION LABORATORY

Academic Year: 2025-2026 (ODD)

A PEO, PO, PSO 3-5

C Introduction / Description of major Software & Hardware involved in lab 7

D CO, CO-PO Matrix, CO-PSO Matrix 7

1 Install Data Analysis and Visualization tools: R/Python/Tableau Public/Power BI. 9-12

Build Cartographic visualization for multiple datasets involving various

8 Perform EDA on Wine Quality Data Set. 32-36

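PROGRAMME OUTCOMES (POs)

PO#    Graduate Attribute       Programme Outcome

PO1    Engineering knowledge    Apply the knowledge of mathematics, science, engineering fundamentals, and an engineering specialization to the solution of complex engineering problems.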

• To understand the key techniques behind data visualization.

1. Install Data Analysis and Visualization tools: R/Python/Tableau Public/Power BI.

AD3364.1 Understand the fundamentals of exploratory data analysis.

AD3364.2 Implement the data visualization using Matplotlib.

AD3364.3 Perform univariate data exploration and analysis.

AD3364.4 Apply bivariate data exploration and analysis.

CO-PO-PSO MATRIX

1. Aim & Procedure 20

3. Conduction and Execution 30

4. Output & Result 10

INTERNAL ASSESSMENT FOR LABORATORY

1. Conduction & Execution of Experiment 25

(i) INSTALL NUMPY WITH PIP

pip install numpy

(ii) INSTALL JUPYTERLAB

Install JupyterLab with pip:

pip install jupyterlab

Once installed, launch JupyterLab with:

jupyter lab

(iii) INSTALL JUPYTER NOTEBOOK

Install the classic Jupyter Notebook with:

pip install notebook

To run the notebook:

jupyter notebook

(iv) INSTALL SCIPY

pip install scipy

(v) INSTALL PANDAS

pip install pandas

(vi) INSTALL MATPLOTLIB

pip install matplotlib
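A quick way to confirm that the pip installs above succeeded is to import each package and print its version. The following is a minimal sketch, not part of the original procedure; the helper name `check_packages` is hypothetical.

```python
# Sketch: confirm that the packages installed with pip can be imported,
# and report each one's version string.
import importlib

# Import names of the libraries installed above (JupyterLab and Notebook
# are launched from the command line, so they are not checked here).
PACKAGES = ["numpy", "scipy", "pandas", "matplotlib"]


def check_packages(names):
    """Return a dict mapping each package name to its version, or None if missing."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            # Not every module defines __version__, so fall back to "unknown".
            report[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, version in check_packages(PACKAGES).items():
        status = version if version is not None else "NOT INSTALLED"
        print(f"{name:<12} {status}")
```

Any package reported as NOT INSTALLED can be reinstalled with the matching `pip install` command.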

CREATE A CSV FILE IN EXCEL:
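As an alternative to building the sheet by hand in Excel, a CSV of the same shape can be written from Python with pandas. This is a minimal sketch in which the column names (From, To, Subject, Date), the addresses, and the file name `emails.csv` are hypothetical placeholders, not data from the manual.

```python
# Sketch: build a small email table in pandas and save it as CSV.
# Column names and rows are hypothetical placeholders.
import pandas as pd

emails = pd.DataFrame(
    {
        "From": ["alice@example.com", "bob@example.com", "alice@example.com"],
        "To": ["bob@example.com", "alice@example.com", "carol@example.com"],
        "Subject": ["Lab schedule", "Re: Lab schedule", "Dataset for EDA"],
        "Date": ["2025-07-01 09:15", "2025-07-01 10:02", "2025-07-02 14:30"],
    }
)

# index=False stops pandas from writing the row index as an extra column.
emails.to_csv("emails.csv", index=False)
```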

2. Load Email Data:
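Loading the exported emails is a single `pandas.read_csv` call. The sketch below assumes a file with From, To, Subject and Date columns; those names are an assumption about the export format, and an inline sample stands in for the real file so the example runs on its own.

```python
# Sketch: load exported emails into a pandas DataFrame and parse the timestamps.
import io

import pandas as pd

# Stand-in for the exported file; for real data, replace
# io.StringIO(SAMPLE_CSV) with the path to your CSV, e.g. "emails.csv".
SAMPLE_CSV = """From,To,Subject,Date
alice@example.com,bob@example.com,Lab schedule,2025-07-01 09:15
bob@example.com,alice@example.com,Re: Lab schedule,2025-07-01 10:02
alice@example.com,carol@example.com,Dataset for EDA,2025-07-02 14:30
"""

# parse_dates converts the Date column to datetime64 so it can be
# grouped by day, hour, etc. later in the analysis.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=["Date"])

print(df.shape)   # (rows, columns)
print(df.dtypes)  # column types
print(df.head())  # first few rows
```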

5. Word Cloud for Subject or Message:
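A word cloud sizes each word by how often it occurs, so the underlying step is a word-frequency count. The sketch below computes those counts for a Subject column with `collections.Counter`; the column name, sample rows, and the small stop-word list are assumptions for illustration.

```python
# Sketch: count word frequencies in email subjects (the data a word cloud draws).
import io
import re
from collections import Counter

import pandas as pd

SAMPLE_CSV = """From,Subject
alice@example.com,Lab schedule
bob@example.com,Re: Lab schedule
alice@example.com,Dataset for EDA
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Hand-picked stop words (an assumption); extend for real mailboxes.
STOP_WORDS = {"re", "fw", "fwd", "for", "the", "a", "an", "and", "of", "to"}


def word_frequencies(texts):
    """Lower-case each text, keep alphabetic tokens, drop stop words, and count."""
    counts = Counter()
    for text in texts.dropna():
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


freqs = word_frequencies(df["Subject"])
print(freqs.most_common(5))
```

If the third-party `wordcloud` package is installed (`pip install wordcloud`), these counts can be rendered with `WordCloud(width=800, height=400, background_color="white").generate_from_frequencies(freqs)` and displayed via `plt.imshow(...)` with `plt.axis("off")`.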

6. Top Senders and Recipients:
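Ranking senders and recipients is a counting task that `Series.value_counts()` handles directly. In this sketch the From/To column names are assumed, and it further assumes that one To cell may list several addresses separated by ";", so recipients are split into one row each before counting.

```python
# Sketch: rank the most frequent senders and recipients.
import io

import pandas as pd

SAMPLE_CSV = """From,To
alice@example.com,bob@example.com
bob@example.com,alice@example.com
alice@example.com,bob@example.com; carol@example.com
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Senders: one address per row, so value_counts() ranks them directly.
top_senders = df["From"].value_counts().head(10)

# Recipients: split multi-address cells, give each address its own row
# with explode(), and trim the spaces left after splitting.
recipients = df["To"].str.split(";").explode().str.strip()
top_recipients = recipients.value_counts().head(10)

print("Top senders:")
print(top_senders)
print("Top recipients:")
print(top_recipients)
```

Either result can be charted with `top_senders.plot(kind="barh")` once Matplotlib is available.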

pip install numpy

Once NumPy is installed, you can use it as follows:
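A short sketch of everyday NumPy array operations: creation, reshaping, element-wise arithmetic, aggregation along an axis, and boolean filtering. The values are arbitrary examples.

```python
# Sketch of common NumPy array operations.
import numpy as np

a = np.array([1, 2, 3, 4, 5])       # 1-D array from a Python list
m = np.arange(1, 7).reshape(2, 3)   # 2x3 matrix with rows [1, 2, 3] and [4, 5, 6]

doubled = a * 2                     # element-wise: 2, 4, 6, 8, 10 (no Python loop)
col_sums = m.sum(axis=0)            # sums down each column: 5, 7, 9
overall_mean = m.mean()             # mean of all six elements: 3.5
big = a[a > 2]                      # boolean mask keeps 3, 4, 5

print(doubled)
print(m.shape)
print(col_sums)
print(overall_mean)
print(big)
```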

pip install pandas

Here’s how to work with Pandas DataFrames:
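A sketch of basic DataFrame work: building a frame from a dictionary, inspecting it, filtering rows with a condition, adding a derived column, and grouping. The student IDs, departments, and marks are hypothetical.

```python
# Sketch of basic pandas DataFrame operations.
import pandas as pd

df = pd.DataFrame(
    {
        "student": ["S1", "S2", "S3", "S4"],
        "dept": ["AIDS", "CSE", "AIDS", "CSE"],
        "marks": [88, 72, 95, 64],
    }
)

print(df.head())      # first rows
print(df.describe())  # summary statistics of numeric columns

# Row filter: keep only rows whose marks are at least 75.
high = df[df["marks"] >= 75]

# Derived column: True where marks meet a pass threshold of 50.
df["passed"] = df["marks"] >= 50

# Average marks per department.
avg_by_dept = df.groupby("dept")["marks"].mean()

print(high)
print(avg_by_dept)
```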

Here’s a simple example of creating a basic plot:
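A minimal line-plot sketch using Matplotlib's object-oriented interface (`plt.subplots` returning a figure and axes). The sine data and the output file name `line_plot.png` are arbitrary choices; the Agg backend is selected only so the script also runs without a display.

```python
# Sketch of a basic labelled line plot with Matplotlib.
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)  # 100 evenly spaced points from 0 to 2*pi
y = np.sin(x)

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, y, label="sin(x)", color="tab:blue")
ax.set_xlabel("x (radians)")
ax.set_ylabel("sin(x)")
ax.set_title("Basic line plot")
ax.legend()
ax.grid(True)

fig.savefig("line_plot.png", dpi=100)  # use plt.show() in an interactive session
```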

Step 4: To create a Histogram:
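A histogram groups values into bins and draws one bar per bin, with bar height equal to the number of values in that bin. This sketch uses seeded synthetic normal data purely for illustration; the bin count and file name are arbitrary.

```python
# Sketch of a histogram with Matplotlib.
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=42)           # seeded for a reproducible figure
data = rng.normal(loc=50, scale=10, size=500)  # synthetic values, illustration only

fig, ax = plt.subplots(figsize=(6, 4))
# hist returns the count per bin, the bin edges, and the drawn bar patches.
counts, bin_edges, patches = ax.hist(data, bins=20, color="tab:orange", edgecolor="black")
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
ax.set_title("Histogram of 500 normally distributed values")

fig.savefig("histogram.png", dpi=100)  # use plt.show() in an interactive session
```

With `bins=20` there are 20 counts and 21 bin edges, and the counts sum to the number of data points.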

Conclusion and Summary:
