Some exercises:
Exercise 1: Data Cleaning
Objective: Cleanse the data to prepare it for analysis.
Dataset: You have a dataset with information about customers, but it contains missing values
and outliers.
Tasks:
1. Identify and handle missing values (you can use techniques like filling with
mean/median or removing rows/columns).
2. Detect and deal with outliers in numerical variables.
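A minimal sketch of both tasks, using a small made-up customer table (the column names and values here are illustrative, not from a real dataset):
import pandas as pd
# Illustrative customer data with a missing value and an extreme income
customers = pd.DataFrame({
    'Age': [25, 31, None, 44, 29, 38],
    'Income': [42000, 58000, 61000, 39000, 47000, 950000]
})
# Task 1: fill missing numerical values (the median is robust to outliers)
customers['Age'] = customers['Age'].fillna(customers['Age'].median())
# Task 2: detect outliers with the 1.5 * IQR rule and remove them
q1, q3 = customers['Income'].quantile([0.25, 0.75])
iqr = q3 - q1
within_bounds = customers['Income'].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
customers = customers[within_bounds]
print(customers)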
Exercise 2: Exploratory Data Analysis (EDA)
Objective: Understand the characteristics of the dataset through exploratory data analysis.
Dataset: Use a dataset containing information about sales transactions.
Tasks:
1. Create summary statistics (mean, median, standard deviation, etc.) for numerical
variables.
2. Generate visualizations (histograms, box plots, scatter plots) to explore the
distribution of key variables.
3. Identify correlations between variables.
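A possible sketch of the three tasks, assuming the transactions sit in a pandas DataFrame called sales (the column names below are illustrative):
import pandas as pd
import matplotlib.pyplot as plt
# Illustrative transactions
sales = pd.DataFrame({
    'Quantity': [1, 3, 2, 5, 4, 2, 1, 6],
    'UnitPrice': [9.99, 4.50, 19.00, 4.50, 7.25, 19.00, 9.99, 3.10]
})
sales['Revenue'] = sales['Quantity'] * sales['UnitPrice']
# Task 1: summary statistics for numerical variables
print(sales.describe())
# Task 2: distribution and relationship plots
sales['Revenue'].plot(kind='hist', title='Revenue distribution')
plt.show()
sales.plot(kind='scatter', x='Quantity', y='Revenue', title='Quantity vs. Revenue')
plt.show()
# Task 3: correlations between variables
print(sales.corr())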
Exercise 3: Regression Analysis
Objective: Explore relationships between variables and make predictions.
Dataset: Use a dataset with information about housing prices, including features like square
footage, number of bedrooms, etc.
Tasks:
1. Perform a simple linear regression to predict housing prices based on a single variable
(e.g., square footage).
2. Evaluate the model's performance using metrics like Mean Squared Error or R-squared.
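A sketch of both tasks with scikit-learn, using made-up housing data (the column names and prices are illustrative):
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Illustrative housing data
housing = pd.DataFrame({
    'SquareFootage': [850, 900, 1200, 1500, 1700, 2100, 2300, 2600],
    'Price': [120000, 130000, 165000, 200000, 215000, 260000, 275000, 310000]
})
# Task 1: simple linear regression on a single predictor
X = housing[['SquareFootage']]
y = housing['Price']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
model = LinearRegression().fit(X_train, y_train)
# Task 2: evaluate with MSE and R-squared
y_pred = model.predict(X_test)
print('Mean Squared Error:', mean_squared_error(y_test, y_pred))
print('R-squared:', r2_score(y_test, y_pred))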
Exercise 4: Clustering
Objective: Group similar data points together based on certain criteria.
Dataset: Use a dataset containing customer behavior data.
Tasks:
1. Apply a clustering algorithm (e.g., k-means) to group customers based on their
behavior.
2. Visualize the clusters and interpret the results.
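A sketch of k-means clustering on made-up behavior data (the two features are illustrative):
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
# Illustrative customer behavior: three loose groups
behavior = pd.DataFrame({
    'VisitsPerMonth': [2, 3, 2, 10, 12, 11, 25, 27, 26],
    'AvgBasketValue': [15, 18, 14, 60, 65, 58, 120, 130, 125]
})
# Task 1: group customers into 3 clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
behavior['Cluster'] = kmeans.fit_predict(behavior[['VisitsPerMonth', 'AvgBasketValue']])
# Task 2: visualize and interpret the clusters
plt.scatter(behavior['VisitsPerMonth'], behavior['AvgBasketValue'], c=behavior['Cluster'])
plt.xlabel('Visits per month')
plt.ylabel('Average basket value')
plt.title('Customer clusters (k-means, k=3)')
plt.show()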
Exercise 5: Classification
Objective: Assign observations to predefined categories or classes.
Dataset: Use a dataset with information about customer purchases.
Tasks:
1. Define a target variable (e.g., whether a customer will make a repeat purchase).
2. Split the data into training and testing sets.
3. Train a classification model (e.g., logistic regression or decision tree) to predict the
target variable.
4. Evaluate the model's performance on the testing set.
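A sketch of the four tasks with a logistic regression classifier, using made-up purchase data (columns and values are illustrative):
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Illustrative purchase history; Task 1: RepeatPurchase is the target variable
purchases = pd.DataFrame({
    'NumOrders': [1, 5, 2, 8, 1, 7, 3, 9, 2, 6],
    'TotalSpent': [20, 250, 60, 480, 15, 300, 90, 520, 40, 280],
    'RepeatPurchase': [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
})
# Task 2: split into training and testing sets
X = purchases[['NumOrders', 'TotalSpent']]
y = purchases['RepeatPurchase']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
# Task 3: train the classifier
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# Task 4: evaluate on the testing set
print('Accuracy:', accuracy_score(y_test, clf.predict(X_test)))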
Exercise 6: Data Visualization
Objective: Create visual representations of data to aid in understanding.
Dataset: Choose a dataset that interests you.
Tasks:
1. Select two or more variables and create appropriate visualizations (e.g., bar chart, line
plot, pie chart).
2. Use color and labeling to enhance the interpretability of the visualizations.
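A sketch with matplotlib, using a small made-up dataset (all values are illustrative):
import matplotlib.pyplot as plt
# Illustrative monthly sales for two product lines
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
product_a = [120, 135, 150, 160, 175]
product_b = [90, 95, 110, 105, 120]
# Task 1: line plot of two variables
plt.plot(months, product_a, color='tab:blue', marker='o', label='Product A')
plt.plot(months, product_b, color='tab:orange', marker='s', label='Product B')
# Task 2: color, labels, and a title for interpretability
plt.title('Monthly sales by product line')
plt.xlabel('Month')
plt.ylabel('Units sold')
plt.legend()
plt.show()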
Exercises with solutions
Exercise 1: Data Cleaning
Dataset: Download the dataset
Objective: Cleanse the COVID-19 dataset to prepare it for analysis.
Tasks:
1. Load the dataset into your preferred data analysis tool.
2. Identify and handle missing values.
3. Remove unnecessary columns.
4. Ensure consistency in date formats.
Solution:
import pandas as pd
# Load the dataset
url = "https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-
aggregated.csv"
covid_data = pd.read_csv(url)
# Identify and handle missing values
covid_data.dropna(inplace=True)
# Remove unnecessary columns
covid_data = covid_data[['Date', 'Country', 'Confirmed', 'Recovered', 'Deaths']]
# Ensure consistency in date formats
covid_data['Date'] = pd.to_datetime(covid_data['Date'])
# Display the cleaned dataset
print(covid_data.head())
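To cover the "identify" part of Task 2 explicitly, the missing values can also be counted per column; a small addition that would go just before the dropna call above:
# Count missing values per column (run before dropna)
print(covid_data.isna().sum())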
Exercise 2: Exploratory Data Analysis (EDA)
Dataset: Download the dataset
Objective: Perform exploratory data analysis on the Auto MPG dataset.
Tasks:
1. Load the dataset into your preferred data analysis tool.
2. Generate summary statistics for numerical variables.
3. Create visualizations to explore the distribution of key variables.
4. Identify correlations between variables.
Solution:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data"
column_names = ["MPG", "Cylinders", "Displacement", "Horsepower", "Weight",
"Acceleration", "Model Year", "Origin", "Car Name"]
auto_data = pd.read_csv(url, delim_whitespace=True, names=column_names, na_values="?")  # "?" marks missing Horsepower values
# Summary statistics
print(auto_data.describe())
# Visualizations
sns.pairplot(auto_data[['MPG', 'Cylinders', 'Displacement', 'Weight']])
plt.show()
# Correlation matrix
correlation_matrix = auto_data.corr(numeric_only=True)  # restrict to numeric columns (Car Name is text)
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.show()
Exercise 3: Regression Analysis
Dataset: Download the dataset
Objective: Perform a simple linear regression to predict the age of abalones.
Tasks:
1. Load the dataset into your preferred data analysis tool.
2. Choose a variable to predict (e.g., Rings).
3. Split the data into training and testing sets.
4. Train a linear regression model.
5. Evaluate the model's performance using metrics like Mean Squared Error or R-squared.
Solution:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Load the dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data"
column_names = ["Sex", "Length", "Diameter", "Height", "WholeWeight",
"ShuckedWeight", "VisceraWeight", "ShellWeight", "Rings"]
abalone_data = pd.read_csv(url, names=column_names)
# Choose a variable to predict
X = abalone_data[['Length', 'Diameter', 'Height', 'WholeWeight', 'ShuckedWeight',
'VisceraWeight', 'ShellWeight']]
y = abalone_data['Rings']
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Evaluate the model's performance
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
print(f'R-squared: {r2}')
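Because the objective is the age of the abalones, the predicted ring counts are usually converted to years; the dataset's documentation gives age = rings + 1.5. A short follow-up reusing y_pred from the solution above:
# Convert predicted ring counts to estimated age in years (age = rings + 1.5)
predicted_age = y_pred + 1.5
print(predicted_age[:5])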
Basic exercises with solutions
Exercise 1: Data Cleaning
Objective: Cleanse a simple dataset.
Dataset:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma', 'Frank', 'Grace', 'Harry', 'Ivy', 'Jack'],
'Age': [25, 28, None, 22, 30, 35, 28, None, 24, 29],
'Salary': [50000, 60000, 75000, 48000, None, 90000, 80000, 75000, 52000, 60000]
}
df = pd.DataFrame(data)
Tasks:
1. Handle missing values in the 'Age' and 'Salary' columns.
2. Drop any rows with missing values.
3. Display the cleaned dataset.
Solution:
# Handle missing values
df['Age'] = df['Age'].fillna(df['Age'].mean())          # fill missing ages with the mean
df['Salary'] = df['Salary'].fillna(df['Salary'].median())  # fill missing salaries with the median
# Drop rows with missing values
df.dropna(inplace=True)
# Display the cleaned dataset
print(df)
Exercise 2: Exploratory Data Analysis (EDA)
Objective: Explore a dataset and generate basic insights.
Dataset:
import pandas as pd
data = {
'Student_ID': [1, 2, 3, 4, 5],
'Math_Score': [85, 90, 78, 92, 88],
'English_Score': [75, 80, 85, 88, 92],
'Science_Score': [90, 85, 88, 80, 95]
}
df = pd.DataFrame(data)
Tasks:
1. Calculate the mean, median, and standard deviation for each subject.
2. Plot a bar chart to visualize the average scores for each subject.
Solution:
# Calculate mean, median, and standard deviation for each subject (excluding the ID column)
subject_stats = df.drop(columns='Student_ID').describe().loc[['mean', '50%', 'std']].transpose()
subject_stats = subject_stats.rename(columns={'50%': 'median'})
print(subject_stats)
# Plot a bar chart
import matplotlib.pyplot as plt
subject_stats.plot(kind='bar', y='mean', yerr='std', legend=False)
plt.title('Average Scores for Each Subject')
plt.ylabel('Score')
plt.xlabel('Subject')
plt.show()
Exercise 3: Data Visualization
Objective: Visualize a dataset using scatter plots.
Dataset:
import pandas as pd
data = {
'Hours_Studied': [2, 3, 5, 1, 4, 6, 7, 3, 2, 5],
'Exam_Score': [50, 65, 80, 40, 75, 90, 95, 60, 55, 85]
}
df = pd.DataFrame(data)
Tasks:
1. Create a scatter plot to visualize the relationship between hours studied and exam
scores.
2. Add labels and a title to the plot.
Solution:
import matplotlib.pyplot as plt
# Scatter plot
plt.scatter(df['Hours_Studied'], df['Exam_Score'])
plt.title('Relationship Between Hours Studied and Exam Score')
plt.xlabel('Hours Studied')
plt.ylabel('Exam Score')
plt.show()