
SAGAR INSTITUTE OF RESEARCH AND TECHNOLOGY

Bhopal, Madhya Pradesh (462021)

DEPARTMENT OF AIML

Lab Manual

of

Data and Visual Analytics AL603 (B)


INDEX

S.No | Name of Experiment | Plan Date | Date of Completion | Remark
(the date and remark columns are left blank to be filled in during the lab)

1.  Write a program to implement Descriptive Statistics with Measures of Central Tendency and Dispersion in Python.
2.  Write a program to implement Statistical Inference and Sampling Distribution in Python.
3.  Write a program to implement Statistical Hypothesis Testing and Analysis in Python.
4.  Write a program to implement Regression Modeling and Bayesian Inference in Python.
5.  Write a program to implement Data Wrangling and Cleaning in Python.
6.  Write a program to implement Data Visualization in Data Analysis in Python.
7.  Write a program to implement Data Ecosystem Overview, File Formats, and Sources of Data in Python.
8.  Write a program to implement Data Pipelines, ETL, and Big Data Processing with Spark in Python.
9.  Write a program to implement Basic Data Visualizations using Matplotlib, Seaborn, and Pandas in Python.
10. Write a program to implement Interactive Visualizations using Plotly in Python.
1) Write a program to implement Descriptive Statistics with Measures of
Central Tendency and Dispersion in Python.

import numpy as np
import pandas as pd
from scipy import stats

# Sample data representing test scores of students (ratio scale data)
data = [56, 77, 89, 92, 65, 74, 83, 90, 78, 85, 60, 69]

# Convert the data into a pandas Series for better management and indexing
data_series = pd.Series(data)

# Descriptive statistics
mean = np.mean(data_series)
median = np.median(data_series)
# keepdims=False returns a scalar mode (required in SciPy >= 1.11);
# with no repeated value, SciPy reports the smallest value as the mode.
mode = stats.mode(data_series, keepdims=False).mode
variance = np.var(data_series)         # population variance (ddof=0)
std_deviation = np.std(data_series)    # population standard deviation
range_data = np.max(data_series) - np.min(data_series)

# Output the results
print("Descriptive Statistics for Test Scores:")
print(f"Mean: {mean:.4f}")
print(f"Median: {median:.4f}")
print(f"Mode: {mode}")
print(f"Variance: {variance:.4f}")
print(f"Standard Deviation: {std_deviation:.4f}")
print(f"Range: {range_data}")

Sample Output:
Descriptive Statistics for Test Scores:
Mean: 76.5000
Median: 77.5000
Mode: 56
Variance: 131.9167
Standard Deviation: 11.4855
Range: 36
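As a cross-check, pandas can produce most of these statistics in one call. A minimal sketch using the same data_series; note that pandas uses the sample formulas (ddof=1), so its std differs slightly from the population values above:

# Quick cross-check with pandas' built-in summary (count, mean, std, quartiles)
print(data_series.describe())
print(f"Sample variance (ddof=1): {data_series.var():.4f}")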
2) Write a program to implement Statistical Inference and Sampling
Distribution in Python.

In this program, we will demonstrate sampling distributions, resampling, and statistical inference, using bootstrapping to estimate a confidence interval for the population mean based on a sample.

import numpy as np
import matplotlib.pyplot as plt

# Sample data (ratio-level data)
data = [56, 77, 89, 92, 65, 74, 83, 90, 78, 85, 60, 69]

# Function for bootstrapping to estimate a confidence interval
def bootstrap(data, n_iterations, sample_size):
    means = []
    for _ in range(n_iterations):
        sample = np.random.choice(data, size=sample_size, replace=True)
        means.append(np.mean(sample))
    return np.array(means)

# Parameters
np.random.seed(42)       # fix the seed so the bootstrap results are reproducible
n_iterations = 1000      # Number of bootstrap iterations
sample_size = len(data)  # Size of each resample

# Perform bootstrapping
bootstrap_means = bootstrap(data, n_iterations, sample_size)

# Calculate the 95% confidence interval
conf_interval = np.percentile(bootstrap_means, [2.5, 97.5])

# Output the results
print(f"Bootstrap 95% Confidence Interval for the Mean: {conf_interval}")
print(f"Bootstrap Mean Estimate: {np.mean(bootstrap_means):.4f}")

# Plot the distribution of the bootstrap sample means
plt.hist(bootstrap_means, bins=30, edgecolor='black')
plt.axvline(conf_interval[0], color='red', linestyle='dashed', linewidth=2,
            label=f'2.5% CI: {conf_interval[0]:.2f}')
plt.axvline(conf_interval[1], color='red', linestyle='dashed', linewidth=2,
            label=f'97.5% CI: {conf_interval[1]:.2f}')
plt.axvline(np.mean(bootstrap_means), color='blue', linestyle='solid', linewidth=2,
            label=f'Mean: {np.mean(bootstrap_means):.2f}')
plt.title('Bootstrap Distribution of Sample Means')
plt.xlabel('Sample Means')
plt.ylabel('Frequency')
plt.legend()
plt.show()
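For comparison, the classical t-based confidence interval for the mean can be computed analytically. A minimal sketch with scipy.stats, using the same data list as above; it should roughly agree with the bootstrap interval:

import numpy as np
from scipy import stats

data = [56, 77, 89, 92, 65, 74, 83, 90, 78, 85, 60, 69]

# Classical 95% CI: mean +/- t * (s / sqrt(n)), using the t distribution
# with n - 1 degrees of freedom.
n = len(data)
mean = np.mean(data)
sem = stats.sem(data)  # standard error of the mean (uses ddof=1)
ci_low, ci_high = stats.t.interval(0.95, df=n - 1, loc=mean, scale=sem)
print(f"t-based 95% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")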

3) Write a program to implement Statistical Hypothesis Testing and Analysis in Python.

This program performs hypothesis testing, including the Chi-Square test, t-Test, and ANOVA
(Analysis of Variance), along with Correlation Analysis.

import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

# Data for testing
group1 = np.array([23, 45, 67, 89, 56, 45, 78, 23, 67, 89])  # Example data for group 1
group2 = np.array([45, 67, 56, 34, 23, 90, 56, 78, 54, 68])  # Example data for group 2
observed = np.array([10, 20, 30, 40, 50])  # Observed frequencies for Chi-square test
expected = np.array([12, 18, 28, 38, 52])  # Expected frequencies for Chi-square test

# Chi-Square Test (goodness of fit).
# scipy.stats.chisquare requires the observed and expected totals to match,
# so rescale the expected frequencies to the observed total first.
expected = expected * observed.sum() / expected.sum()
chi2_stat, p_val_chi2 = stats.chisquare(observed, expected)
print(f"Chi-Square Test: chi2_stat = {chi2_stat}, p-value = {p_val_chi2}")

# Independent t-Test
t_stat, p_val_ttest = stats.ttest_ind(group1, group2)
print(f"\nt-Test: t_stat = {t_stat}, p-value = {p_val_ttest}")

# ANOVA (Analysis of Variance) - testing whether there are significant
# differences between multiple groups
group3 = np.array([100, 110, 120, 130, 140])
group4 = np.array([150, 160, 170, 180, 190])
f_stat, p_val_anova = stats.f_oneway(group1, group2, group3, group4)
print(f"\nANOVA: F-stat = {f_stat}, p-value = {p_val_anova}")

# Correlation Analysis - Pearson's correlation
correlation, p_val_corr = stats.pearsonr(group1, group2)
print(f"\nCorrelation Analysis: Pearson correlation = {correlation}, p-value = {p_val_corr}")

# Plot the relationship between the two groups as a scatter plot
plt.scatter(group1, group2)
plt.title("Correlation between Group 1 and Group 2")
plt.xlabel("Group 1")
plt.ylabel("Group 2")
plt.show()
Sample Output (values rounded):

Chi-Square Test: chi2_stat ≈ 0.842, p-value ≈ 0.933

t-Test: t_stat ≈ 0.111, p-value ≈ 0.913

ANOVA: F-stat ≈ 45.6, p-value < 0.001

Correlation Analysis: Pearson correlation ≈ -0.284, p-value ≈ 0.43
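The goodness-of-fit test above compares one observed series against expected counts; for two categorical variables, the usual tool is a chi-square test of independence on a contingency table. A minimal sketch with hypothetical counts:

import numpy as np
from scipy import stats

# Hypothetical 2x3 contingency table: rows = gender, columns = preference
table = np.array([[20, 15, 25],
                  [30, 25, 10]])
chi2, p, dof, expected_counts = stats.chi2_contingency(table)
print(f"Chi-square independence test: chi2 = {chi2:.3f}, "
      f"p-value = {p:.3f}, dof = {dof}")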
4) Write a program to implement Regression Modeling and Bayesian Inference in Python.

This program demonstrates Regression Modeling, including Linear Regression and Bayesian
Inference using a Bayesian Network approach to predict outcomes based on data.

import numpy as np
import statsmodels.api as sm
import pymc3 as pm  # legacy package; on newer installs use `import pymc as pm`
import matplotlib.pyplot as plt

# Generate synthetic data for linear regression
# (example: predicting house price based on size)
np.random.seed(0)
size = np.random.normal(1500, 500, 100)                   # House sizes in square feet
price = size * 300 + np.random.normal(50000, 10000, 100)  # Price = 300 * size + noise

# Linear Regression (Ordinary Least Squares)
X = sm.add_constant(size)       # Add a constant column for the intercept
model = sm.OLS(price, X).fit()  # Fit the model
print(model.summary())

# Visualization of the regression line
plt.scatter(size, price, color='blue', label='Data Points')
plt.plot(size, model.predict(X), color='red', label='Regression Line')
plt.title('Linear Regression: House Price vs. Size')
plt.xlabel('Size (sq ft)')
plt.ylabel('Price (in dollars)')
plt.legend()
plt.show()

# Bayesian Inference using PyMC3 (Bayesian Linear Regression)
with pm.Model() as model_bayesian:
    # Priors for the unknown model parameters
    alpha = pm.Normal('alpha', mu=0, sigma=10)
    beta = pm.Normal('beta', mu=0, sigma=10)
    sigma = pm.HalfNormal('sigma', sigma=10)

    # Linear model
    mu = alpha + beta * size

    # Likelihood of the observations
    Y_obs = pm.Normal('Y_obs', mu=mu, sigma=sigma, observed=price)

    # Sample from the posterior distribution
    trace = pm.sample(2000, return_inferencedata=False)

# Plot posterior distributions
pm.plot_posterior(trace, var_names=['alpha', 'beta', 'sigma'])
plt.show()

Sample Output: the OLS regression summary table printed by model.summary() (coefficients, standard errors, R-squared, p-values), followed by the regression-line plot and the posterior distribution plots for alpha, beta, and sigma.
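If PyMC3 is not installed (it is a heavy dependency, since superseded by PyMC), scikit-learn's BayesianRidge gives a lightweight Bayesian linear regression on the same synthetic data. A minimal sketch, not the manual's method:

import numpy as np
from sklearn.linear_model import BayesianRidge

np.random.seed(0)
size = np.random.normal(1500, 500, 100)
price = size * 300 + np.random.normal(50000, 10000, 100)

# BayesianRidge places Gaussian priors on the coefficients and learns the
# noise precision from the data (no MCMC sampling needed).
model = BayesianRidge()
model.fit(size.reshape(-1, 1), price)
print(f"Posterior mean slope: {model.coef_[0]:.2f}")
print(f"Posterior mean intercept: {model.intercept_:.2f}")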
5) Write a program to implement Data Wrangling and Cleaning in Python.

This program demonstrates how to gather, assess, and clean data before performing
visualizations. We will work with a sample dataset (e.g., from a CSV file) to perform basic
data wrangling tasks.

import pandas as pd
import numpy as np

# Load a sample dataset (ordinarily this would come from a CSV file);
# for this example, we use a small manually-created dataset for simplicity.
data = {
    'Age': [23, 45, 36, np.nan, 50, 29, 35, np.nan, 60, 44],
    'Income': [50000, 70000, 80000, 60000, 120000, np.nan, 55000, 80000, 100000, 85000],
    'City': ['New York', 'Los Angeles', 'Chicago', 'New York', 'Chicago',
             'Los Angeles', np.nan, 'Chicago', 'New York', 'Los Angeles']
}

df = pd.DataFrame(data)

# Display the initial dataset
print("Initial Dataset:")
print(df)

# Assess the data - check for missing values
print("\nMissing Values:")
print(df.isnull().sum())

# Clean the data - handle missing values.
# For 'Age' and 'Income', fill missing values with the column mean.
# (Assignment is preferred over inplace fillna on a column, which recent
# pandas versions treat as deprecated chained assignment.)
df['Age'] = df['Age'].fillna(df['Age'].mean())
df['Income'] = df['Income'].fillna(df['Income'].mean())

# For 'City', fill missing values with the mode (most common value)
df['City'] = df['City'].fillna(df['City'].mode()[0])

# Display the cleaned dataset
print("\nCleaned Dataset:")
print(df)

Sample Output:

Initial Dataset:
    Age    Income         City
0  23.0   50000.0     New York
1  45.0   70000.0  Los Angeles
2  36.0   80000.0      Chicago
3   NaN   60000.0     New York
4  50.0  120000.0      Chicago
5  29.0       NaN  Los Angeles
6  35.0   55000.0          NaN
7   NaN   80000.0      Chicago
8  60.0  100000.0     New York
9  44.0   85000.0  Los Angeles

Missing Values:
Age       2
Income    1
City      1
dtype: int64

Cleaned Dataset:
     Age         Income         City
0  23.00   50000.000000     New York
1  45.00   70000.000000  Los Angeles
2  36.00   80000.000000      Chicago
3  40.25   60000.000000     New York
4  50.00  120000.000000      Chicago
5  29.00   77777.777778  Los Angeles
6  35.00   55000.000000      Chicago
7  40.25   80000.000000      Chicago
8  60.00  100000.000000     New York
9  44.00   85000.000000  Los Angeles
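Beyond missing values, wrangling typically also covers duplicates and outliers. A minimal sketch on the same df, using the common 1.5 x IQR rule:

# Drop exact duplicate rows (none in this small example, but a standard step)
df = df.drop_duplicates()

# Flag Income outliers with the 1.5 * IQR rule
q1, q3 = df['Income'].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df['Income'] < q1 - 1.5 * iqr) | (df['Income'] > q3 + 1.5 * iqr)]
print("Income outliers by the IQR rule:")
print(outliers)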
6) Write a program to implement Data Visualization in Data Analysis in Python.

This program focuses on various types of visualizations: univariate, bivariate, and multivariate exploration. We use the cleaned DataFrame df from Program 5 and apply different visualization techniques to explore the relationships in the dataset.

import matplotlib.pyplot as plt
import seaborn as sns

# df is assumed to be the cleaned DataFrame from Program 5

# Univariate exploration: distribution of 'Age'
plt.figure(figsize=(8, 6))
sns.histplot(df['Age'], kde=True, color='blue', bins=8)
plt.title('Distribution of Age')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.show()

# Bivariate exploration: scatter plot of 'Income' vs 'Age'
plt.figure(figsize=(8, 6))
sns.scatterplot(x='Age', y='Income', data=df, hue='City', palette='viridis', s=100)
plt.title('Income vs Age by City')
plt.xlabel('Age')
plt.ylabel('Income')
plt.show()

# Multivariate exploration: pairplot (pairwise relationships).
# pairplot creates its own figure, so the title is set via suptitle.
g = sns.pairplot(df, hue='City', diag_kind='kde')
g.fig.suptitle('Pairplot of Age, Income, and City', y=1.02)
plt.show()

# Explanatory visualization: boxplot of 'Income' across 'City'
plt.figure(figsize=(8, 6))
sns.boxplot(x='City', y='Income', data=df)
plt.title('Income Distribution by City')
plt.xlabel('City')
plt.ylabel('Income')
plt.show()
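A correlation heatmap is another common multivariate view. A minimal sketch over the numeric columns of df (numeric_only requires a recent pandas):

# Heatmap of pairwise correlations between the numeric columns (Age, Income)
plt.figure(figsize=(6, 4))
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Correlation Matrix of Numeric Columns')
plt.show()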
7) Write a program to implement Data Ecosystem Overview, File Formats,
and Sources of Data in Python.

This program demonstrates the handling of different data types and file formats (CSV, JSON,
and NoSQL with MongoDB) to understand various sources of data and data repositories.

import pandas as pd
import json
import pymongo
from bson.json_util import dumps

# 1. Overview of Data Types (Structured, Semi-structured, Unstructured)

# Structured Data: tabular format (e.g., CSV, SQL databases)
structured_data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(structured_data)
print("Structured Data (DataFrame):")
print(df)

# Semi-structured Data: JSON format
semi_structured_data = [
    {"Name": "Alice", "Age": 24, "City": "New York"},
    {"Name": "Bob", "Age": 30, "City": "Los Angeles"},
    {"Name": "Charlie", "Age": 35, "City": "Chicago"}
]
json_data = json.dumps(semi_structured_data, indent=4)
print("\nSemi-structured Data (JSON):")
print(json_data)

# Unstructured Data: free text (example)
unstructured_data = "Alice is 24 years old. Bob is 30 years old. Charlie is 35 years old."
print("\nUnstructured Data (Text):")
print(unstructured_data)

# 2. Sources of Data: reading a CSV file
# (assumes 'sample_data.csv' exists in your working directory)
csv_data = pd.read_csv('sample_data.csv')
print("\nCSV Data Loaded:")
print(csv_data.head())

# 3. MongoDB NoSQL Database Example (locally hosted MongoDB)
client = pymongo.MongoClient("mongodb://localhost:27017/")
db = client['test_database']
collection = db['people']

# insert_many expects a list of documents (dicts), so insert the
# record-oriented form of the data, not the column-oriented dict above.
collection.insert_many(df.to_dict('records'))
print("\nMongoDB Data Insertion Complete.")

# Retrieve and print the documents (materialize the cursor as a list
# so it can be serialized by bson.json_util.dumps)
documents = list(collection.find())
print("\nData Retrieved from MongoDB:")
print(dumps(documents, indent=4))

Sample Output:
Structured Data (DataFrame):
Name Age City
0 Alice 24 New York
1 Bob 30 Los Angeles
2 Charlie 35 Chicago

Semi-structured Data (JSON):
[
{
"Name": "Alice",
"Age": 24,
"City": "New York"
},
{
"Name": "Bob",
"Age": 30,
"City": "Los Angeles"
},
{
"Name": "Charlie",
"Age": 35,
"City": "Chicago"
}
]

Unstructured Data (Text):
Alice is 24 years old. Bob is 30 years old. Charlie is 35 years old.

CSV Data Loaded:
Name Age City
0 Alice 24 New York
1 Bob 30 Los Angeles
2 Charlie 35 Chicago

MongoDB Data Insertion Complete.

Data Retrieved from MongoDB (each document also carries an auto-generated "_id"):
[
    {
        "_id": {
            "$oid": "..."
        },
        "Name": "Alice",
        "Age": 24,
        "City": "New York"
    },
    {
        "_id": {
            "$oid": "..."
        },
        "Name": "Bob",
        "Age": 30,
        "City": "Los Angeles"
    },
    {
        "_id": {
            "$oid": "..."
        },
        "Name": "Charlie",
        "Age": 35,
        "City": "Chicago"
    }
]
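As an extension on file formats, the same DataFrame can be written to and read back from other common formats. A minimal sketch (the file names are illustrative placeholders):

# Write and read back a few common formats supported by pandas
df.to_csv('people.csv', index=False)                    # comma-separated values
df.to_json('people.json', orient='records', indent=4)   # JSON records

print(pd.read_csv('people.csv').head())
print(pd.read_json('people.json').head())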
8) Write a program to implement Data Pipelines, ETL, and Big Data
Processing with Spark in Python.

This program demonstrates ETL (Extract, Transform, Load) and basic big data processing
using Apache Spark (via PySpark). It also touches on concepts like Hadoop Distributed File
System (HDFS) and processing large datasets in Spark.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# 1. Set up a Spark session
spark = SparkSession.builder \
    .appName("ETL and Big Data Processing") \
    .getOrCreate()

# 2. Extract: load a large dataset (e.g., a CSV file from HDFS or the
#    local file system - adjust the path accordingly)
# data = spark.read.csv("hdfs://path_to_large_dataset.csv", header=True, inferSchema=True)

# For demonstration purposes, we use a small sample DataFrame instead
data = spark.createDataFrame([
    ("Alice", 24, "New York"),
    ("Bob", 30, "Los Angeles"),
    ("Charlie", 35, "Chicago"),
    ("David", 40, "Miami"),
    ("Eve", 22, "San Francisco")
], ["Name", "Age", "City"])

# 3. Transform: keep only people older than 30
filtered_data = data.filter(col("Age") > 30)

# 4. Load: show the result after transformation (end of the ETL process)
print("Transformed Data:")
filtered_data.show()

# 5. Big data processing with Spark (e.g., group by city and count)
city_count = data.groupBy("City").count()

print("\nCity Count Aggregation (Big Data Processing Example):")
city_count.show()

# Stop the Spark session
spark.stop()

Sample Output:

Transformed Data:
+-------+---+-------+
|   Name|Age|   City|
+-------+---+-------+
|Charlie| 35|Chicago|
|  David| 40|  Miami|
+-------+---+-------+

City Count Aggregation (Big Data Processing Example):
+-------------+-----+
|         City|count|
+-------------+-----+
|     New York|    1|
|  Los Angeles|    1|
|      Chicago|    1|
|        Miami|    1|
|San Francisco|    1|
+-------------+-----+
(row order may vary)
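Columnar formats such as Parquet are the usual sink in Spark pipelines, and Spark SQL is the usual query layer. A minimal sketch (the output path is a placeholder, and these calls would go before spark.stop()):

# Write the transformed data to Parquet, the de facto columnar format for
# Spark; in a real pipeline the path would point at HDFS or object storage.
filtered_data.write.mode("overwrite").parquet("output/filtered_people.parquet")

# Spark SQL view of the same data
data.createOrReplaceTempView("people")
spark.sql("SELECT City, AVG(Age) AS avg_age FROM people GROUP BY City").show()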
9) Write a program to implement Basic Data Visualizations using
Matplotlib, Seaborn, and Pandas in Python.

This program demonstrates how to use Matplotlib for basic plots, Seaborn for advanced
statistical visualizations, and Pandas for data handling.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

# 1. Data Preparation using Pandas
# Creating a sample DataFrame with random data
data = {
    'Age': np.random.randint(20, 60, 100),            # Random ages between 20 and 60
    'Income': np.random.randint(30000, 120000, 100),  # Random income values
    'City': np.random.choice(['New York', 'Los Angeles', 'Chicago', 'Miami'], 100)
}

df = pd.DataFrame(data)

# 2. Basic Plotting with Matplotlib
plt.figure(figsize=(8, 6))
plt.hist(df['Age'], bins=10, color='skyblue', edgecolor='black')
plt.title('Histogram of Age')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.show()

# 3. Seaborn: Boxplot to compare Income by City
plt.figure(figsize=(8, 6))
sns.boxplot(x='City', y='Income', data=df, palette='Set2')
plt.title('Income Distribution by City')
plt.xlabel('City')
plt.ylabel('Income')
plt.show()

# 4. Seaborn: Scatterplot to explore relationship between Age and Income
plt.figure(figsize=(8, 6))
sns.scatterplot(x='Age', y='Income', data=df, hue='City', palette='Set1', s=100)
plt.title('Scatterplot of Age vs Income')
plt.xlabel('Age')
plt.ylabel('Income')
plt.show()

Sample Output:
1. Histogram of Age:
o A histogram showing the distribution of Age with bins, allowing you to
observe the frequency of different age ranges.
2. Boxplot of Income by City:
o A boxplot for Income across different cities, showing the central tendency,
spread, and potential outliers.
3. Scatterplot of Age vs Income:
o A scatter plot showing how Income correlates with Age, with different colors
representing different cities.
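Pandas itself also has built-in plotting (a thin wrapper over Matplotlib), which this experiment's title calls for. A minimal sketch on the same df, reusing the plt already imported above:

# Pandas' .plot API: a histogram and a bar chart of mean income per city
df['Age'].plot(kind='hist', bins=10, title='Age (pandas .plot)', edgecolor='black')
plt.xlabel('Age')
plt.show()

df.groupby('City')['Income'].mean().plot(kind='bar', title='Mean Income by City')
plt.ylabel('Mean Income')
plt.show()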

10) Write a program to implement Interactive Visualizations using Plotly
in Python.

This program demonstrates how to create interactive visualizations using Plotly. We will use
Plotly Express to visualize a dataset, making it more dynamic and user-friendly.

import plotly.express as px
import pandas as pd
import numpy as np

# 1. Data Preparation using Pandas
# Creating a sample DataFrame with random data
data = {
    'Age': np.random.randint(20, 60, 100),            # Random ages between 20 and 60
    'Income': np.random.randint(30000, 120000, 100),  # Random income values
    'City': np.random.choice(['New York', 'Los Angeles', 'Chicago', 'Miami'], 100)
}

df = pd.DataFrame(data)

# 2. Plotly: scatter plot of Age vs Income, colored by City
fig = px.scatter(df, x='Age', y='Income', color='City',
                 title="Age vs Income by City",
                 labels={'Age': 'Age (years)', 'Income': 'Income ($)'})
fig.show()

# 3. Plotly: box plot of Income by City
fig = px.box(df, x='City', y='Income', title="Income Distribution by City",
             labels={'Income': 'Income ($)', 'City': 'City'})
fig.show()

# 4. Plotly: histogram of Age
fig = px.histogram(df, x='Age', title="Distribution of Age",
                   labels={'Age': 'Age (years)'})
fig.show()
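Plotly figures can also be exported as standalone interactive HTML for sharing. A one-line sketch (the file name is a placeholder):

# Save the last figure as a self-contained interactive HTML file that can be
# opened in any browser, preserving zoom and hover interactivity.
fig.write_html("age_histogram.html")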

Sample Output:
1. Interactive Scatter Plot (Age vs Income by City):
o A dynamic scatter plot where you can hover over points to see Age, Income,
and City. You can zoom in or out to explore specific data points.
2. Interactive Box Plot (Income by City):
o An interactive box plot showing the distribution of Income across cities.
Hovering over the box shows detailed statistics like median, quartiles, and
outliers.
3. Interactive Histogram (Age Distribution):
o An interactive histogram for Age showing the frequency of different age
groups. You can zoom in or click on specific bins for more detailed analysis.
Key Concepts Covered in These Programs:
1. Matplotlib:
o Used for creating basic plots like histograms and line plots.
o Offers customization options for appearance (e.g., titles, labels, colors).
2. Seaborn:
o Built on top of Matplotlib, Seaborn provides higher-level abstractions and
improved visualizations.
o Great for statistical plots like boxplots and scatterplots with hue based on
categories.
3. Plotly:
o Plotly enables the creation of interactive visualizations. It's excellent for
scatter plots, box plots, histograms, and more.
o Offers interactivity features like zooming, panning, and hover text.
4. Pandas:
o Pandas is used to manipulate and prepare the data for visualization.
o It simplifies handling and transforming data for plotting with libraries like
Matplotlib, Seaborn, and Plotly.
