Data Science
INDEX

Sr No   Title                                        Date   Sign
1       Introduction to Excel
2       Data Frames and Basic Data Pre-processing
3       Feature Scaling and Dummification
4       Hypothesis Testing
5       ANOVA (Analysis of Variance)
6       Regression and Its Types
7       Logistic Regression and Decision Tree
8       K-Means Clustering
9       Principal Component Analysis (PCA)
10      Data Visualization and Storytelling
PRACTICAL 1
Introduction to Excel
A. Perform conditional formatting on a dataset using various criteria.
Steps:
Step 1: Select the data range, then go to Home > Conditional Formatting > Highlight Cells Rules > Greater Than.
Step 2: Enter the threshold value for the rule, for example 2000, and click OK.
Step 3: To add data bars, go to Conditional Formatting > Data Bars > Solid Fill.
B. Create a pivot table to analyse and summarize data.
Steps:
Step 1: Select the entire table and go to Insert > PivotChart > PivotChart.
Step 2: Select "New Worksheet" in the Create PivotChart window and click OK.
Step 3: Drag the required attributes into the field boxes below (Filters, Legend, Axis, Values).
C. Use the VLOOKUP function to retrieve information from a different worksheet or table.
Steps:
Step 1: Click on an empty cell and type a formula of the following form:
=VLOOKUP(B3, B3:D10, 2, FALSE)
Here B3 is the lookup value, B3:D10 is the table range, 2 is the number of the column in that range to return, and FALSE requests an exact match.
D. Perform what-if analysis using Goal Seek to determine input values for a desired output.
Steps:
Step 1: On the Data tab, go to What-If Analysis > Goal Seek.
Step 2: Fill in "Set cell", "To value", and "By changing cell" in the Goal Seek window and click OK.
PRACTICAL 2
Data Frames and Basic Data Pre-processing
A. Read data from CSV and JSON files into a data frame.
B. Perform basic data pre-processing tasks such as handling missing values and outliers.
Code:
import pandas as pd
# Reading CSV file into DataFrame
df = pd.read_csv("samp.csv")
print("Our dataset:")
print(df)
# Reading JSON file into DataFrame
data = pd.read_json("sample.json")
print(data)
# Displaying the first 10 rows of the DataFrame
print(df.head(10))
# Filling missing values with 0
print("Dataset after filling NA values with 0:")
df2 = df.fillna(value=0)
print(df2)
# Dropping rows with any missing values
print("Dataset after dropping NA values:")
df.dropna(inplace=True)
print(df)
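Part B also asks for outlier handling, which the code above does not cover. A minimal sketch using the interquartile range (IQR) rule, assuming the numeric column "age" that is used later in this practical:
import pandas as pd

df = pd.read_csv("samp.csv")

# Compute the interquartile range (IQR) of the "age" column
q1 = df["age"].quantile(0.25)
q3 = df["age"].quantile(0.75)
iqr = q3 - q1

# Keep only rows whose "age" lies within 1.5 * IQR of the quartiles
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
df_no_outliers = df[(df["age"] >= lower) & (df["age"] <= upper)]
print("Dataset after removing outliers in 'age':")
print(df_no_outliers)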
C. Manipulate and transform data using functions like filtering, sorting, and grouping.
Code:
import pandas as pd
# Reading CSV file into DataFrame
df = pd.read_csv("samp.csv")
# Filtering data based on a condition (e.g., age greater than 25)
filtered_df = df[df["age"] > 25]
# Sorting data based on a column (e.g., sorting by age in descending order)
sorted_df = df.sort_values(by="age", ascending=False)
# Grouping data based on a column and applying an aggregation function (e.g., finding the average age per city)
grouped_df = df.groupby("city").agg({"age": "mean"})
# Displaying the filtered DataFrame
print("Filtered DataFrame:")
print(filtered_df)
# Displaying the sorted DataFrame
print("\nSorted DataFrame:")
print(sorted_df)
# Displaying the grouped DataFrame
print("\nGrouped DataFrame:")
print(grouped_df)
PRACTICAL 3
Feature Scaling and Dummification
A. Apply feature-scaling techniques like standardization and normalization to numerical
features.
Code:
# Standardization and normalization
import pandas as pd
import numpy as np
from sklearn.preprocessing import Normalizer
from sklearn.preprocessing import StandardScaler
print("printing few data")
df = pd.read_csv("D:\TYCS\Data Science\SampleFile.csv")
print(df.head())
print("Max values")
max_vals = np.max(np.abs(df))
print(max_vals)
print((df - max_vals) / max_vals)
print("Normalization")
scaler = Normalizer()
scaled_data = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_data, columns=df.columns)
print(scaled_df.head())
print("Standardization")
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_data, columns=df.columns)
print(scaled_df.head())
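Note that sklearn's Normalizer rescales each row to unit norm; column-wise normalization into [0, 1] is usually done with MinMaxScaler instead. A short sketch, continuing from the script above:
from sklearn.preprocessing import MinMaxScaler

# Min-max scaling maps each column into the [0, 1] range
print("Min-max normalization")
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_data, columns=df.columns)
print(scaled_df.head())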
B. Perform feature Dummification to convert categorical variables into numerical
representations.
Code:
import pandas as pd
data = pd.read_csv("data32.csv")
categorical_features = data.select_dtypes(include="object")
dummies = pd.get_dummies(categorical_features)
data = pd.concat([data, dummies], axis=1)
data.drop(categorical_features.columns, axis=1, inplace=True)
data.to_csv("Output.csv")
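To see what dummification produces, here is a self-contained sketch on a small hypothetical frame (the column names are illustrative, not taken from data32.csv):
import pandas as pd

# A tiny illustrative frame with one categorical column
demo = pd.DataFrame({"city": ["Pune", "Mumbai", "Pune"], "age": [25, 30, 22]})
# Each distinct city becomes its own 0/1 indicator column
print(pd.get_dummies(demo, columns=["city"]))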
Practical 4
Hypothesis Testing
Conduct a hypothesis test using appropriate statistical tests (e.g., t-test, chi-square test).
# t-test
import numpy as np
import scipy.stats as stats
np.random.seed(42)
scoreA = np.random.normal(loc=70,scale=10,size=30)
scoreB = np.random.normal(loc=75,scale=10,size=30)
t_stat,pvalue = stats.ttest_ind(scoreA,scoreB)
print(f"T-Statistics: {t_stat}\nP-Value: {pvalue}")
alpha = 0.05
if pvalue < alpha:
    print("Reject the null hypothesis. There is a significant difference in exam scores.")
else:
    print("Fail to reject the null hypothesis. There is no significant difference in exam scores.")
Output:
# Chi-square test
import numpy as np
import scipy.stats as stats
observed_data = np.array([[25, 15], [20, 40]])
chi2, pvalue, dof, expected = stats.chi2_contingency(observed_data)
print(f'Chi-Square Statistic: {chi2}\nP-Value: {pvalue}\nDegrees of Freedom: {dof}\nExpected frequency:\n{expected}')
alpha = 0.05
if pvalue < alpha:
    print("Reject the null hypothesis. There is a significant association between gender and job satisfaction.")
else:
    print("Fail to reject the null hypothesis. Gender and job satisfaction are independent.")
Output:
Practical 5
ANOVA (Analysis of Variance)
Perform one-way ANOVA to compare means across multiple groups.
from scipy.stats import f_oneway
# Define sample data for each group
group1 = [15, 20, 25, 30, 35]
group2 = [10, 18, 22, 28, 32]
group3 = [12, 16, 20, 24, 28]
f_statistic, p_value = f_oneway(group1, group2, group3)
print("One-way ANOVA results:")
print("F-statistic:", f_statistic)
print("P-value:", p_value)
alpha = 0.05
if p_value < alpha:
    print(
        "Reject null hypothesis: There are significant differences between the means of the groups."
    )
else:
    print(
        "Fail to reject null hypothesis: There are no significant differences between the means of the groups."
    )
Output:
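If the ANOVA rejects the null hypothesis, a post-hoc test shows which pairs of groups differ. A sketch using Tukey's HSD, assuming the statsmodels package is installed:
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Same three samples as above
group1 = [15, 20, 25, 30, 35]
group2 = [10, 18, 22, 28, 32]
group3 = [12, 16, 20, 24, 28]

# Label each observation with its group and compare all pairs of means
scores = np.array(group1 + group2 + group3)
labels = ["group1"] * 5 + ["group2"] * 5 + ["group3"] * 5
print(pairwise_tukeyhsd(scores, labels, alpha=0.05))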
Practical 6
Regression and its Types.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Independent variable (predictor)
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
# Dependent variable (response)
y = np.array([[7], [9], [11], [13], [15], [17], [19], [21], [23], [25]])
# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Simple Linear Regression
model = LinearRegression()
model.fit(X_train, y_train) # Fitting the model
# Coefficients
print("Intercept:", model.intercept_[0])
print("Coefficient:", model.coef_[0][0])
# Predictions
y_pred = model.predict(X_test)
# Model Evaluation
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean Squared Error:", mse)
print("R-squared:", r2)
# Plotting the regression line
plt.scatter(X_test, y_test, color="blue")
plt.plot(X_test, y_pred, color="red")
plt.title("Simple Linear Regression")
plt.xlabel("Independent Variable (X)")
plt.ylabel("Dependent Variable (y)")
plt.show()
Output:
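The practical's title mentions types of regression; as one further illustration, polynomial regression can be fitted with the same tools. A minimal sketch on synthetic quadratic data:
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Synthetic data with a quadratic relationship
X = np.arange(1, 11).reshape(-1, 1)
y = 2 * X.ravel() ** 2 + 3

# Expand X with polynomial terms (degree 2), then fit a linear model on them
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
model = LinearRegression().fit(X_poly, y)
print("Intercept:", model.intercept_)
print("Coefficients:", model.coef_)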
Practical 7
Logistic Regression and Decision Tree
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score

# Load the built-in Iris dataset (used here as sample data)
iris = load_iris()
X, y = iris.data, iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Logistic regression: a linear model for classification
log_reg = LogisticRegression(max_iter=200)
log_reg.fit(X_train, y_train)
y_pred_lr = log_reg.predict(X_test)
print("Logistic Regression accuracy:", accuracy_score(y_test, y_pred_lr))

# Decision tree: a rule-based classifier
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)
y_pred_dt = tree.predict(X_test)
print("Decision Tree accuracy:", accuracy_score(y_test, y_pred_dt))

# Visualize the fitted decision tree
plt.figure(figsize=(10, 6))
plot_tree(
    tree,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
    filled=True,
)
plt.title("Decision Tree on the Iris Dataset")
plt.show()
Output:
Practical 8
K-Means clustering
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
# Load data
data = pd.read_csv("wholesale.csv")
# Display the first few rows of the dataset
print(data.head())
# Define categorical and continuous features
categorical_features = ["Channel", "Region"]
continuous_features = [
"Fresh",
"Milk",
"Grocery",
"Frozen",
"Detergents_Paper",
"Delicassen",
]
# Descriptive statistics for continuous features
print(data[continuous_features].describe())
# Convert categorical features into dummy variables
for col in categorical_features:
    dummies = pd.get_dummies(data[col], prefix=col)
    data = pd.concat([data, dummies], axis=1)
    data.drop(col, axis=1, inplace=True)
# Display the first few rows of the updated dataset
print(data.head())
# Normalize the data
mms = MinMaxScaler()
data_transformed = mms.fit_transform(data)
# Calculate the sum of squared distances for different values of k
sum_of_squared_distances = []
K = range(1, 15)
for k in K:
    km = KMeans(n_clusters=k)
    km.fit(data_transformed)
    sum_of_squared_distances.append(km.inertia_)
# Plot the elbow method graph
plt.plot(K, sum_of_squared_distances, "bx-")
plt.xlabel("Number of Clusters (k)")
plt.ylabel("Sum of Squared Distances")
plt.title("Elbow Method for Optimal k")
plt.show()
Output:
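The elbow plot only suggests a value of k; the clusters still have to be fitted. A short sketch continuing from the script above, assuming the elbow is read off at k = 5 (the actual elbow depends on the dataset):
# Fit K-Means at the chosen k (k = 5 is assumed here for illustration)
km = KMeans(n_clusters=5, random_state=0)
labels = km.fit_predict(data_transformed)

# Attach the cluster label to each customer and inspect cluster sizes
data["cluster"] = labels
print(data["cluster"].value_counts())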
Practical 9
Principal Component Analysis (PCA)
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
target_names = iris.target_names
# Perform PCA
pca = PCA(n_components=2) # Specify the number of components (dimensions)
X_r = pca.fit_transform(X)
# Create a DataFrame for visualization
df = pd.DataFrame(data=X_r, columns=['PC1', 'PC2'])
df['target'] = y
# Plot the data
plt.figure(figsize=(8, 6))
colors = ['navy', 'turquoise', 'darkorange']
lw = 2
for color, i, target_name in zip(colors, [0, 1, 2], target_names):
    plt.scatter(df.loc[df['target'] == i, 'PC1'], df.loc[df['target'] == i, 'PC2'],
                color=color, alpha=.8, lw=lw, label=target_name)
plt.title('PCA of IRIS dataset')
plt.legend(loc='best', shadow=False, scatterpoints=1)
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.show()
Output:
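A useful supplementary check, continuing from the script above, is how much of the total variance the two components retain:
# Proportion of total variance captured by each principal component
print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Total variance retained:", pca.explained_variance_ratio_.sum())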
Practical 10
Data Visualization and Storytelling
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the dataset
# Assume 'data.csv' contains your dataset
df = pd.read_csv("data.csv")
# Perform data analysis
# Example: Calculate summary statistics
summary_stats = df.describe()
# Create meaningful visualizations
# Example: Plot a histogram of a numerical variable
plt.figure(figsize=(8, 6))
sns.histplot(data=df, x="numerical_variable", bins=20, kde=True)
plt.title("Histogram of Numerical Variable")
plt.xlabel("Numerical Variable")
plt.ylabel("Frequency")
plt.show()
# Example: Plot a bar chart of a categorical variable
plt.figure(figsize=(8, 6))
sns.countplot(data=df, x="categorical_variable", palette="viridis")
plt.title("Bar Chart of Categorical Variable")
plt.xlabel("Categories")
plt.ylabel("Count")
plt.xticks(rotation=45)
plt.show()
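# Example: Plot a scatterplot of two numerical variables, colored by category
# (added to match the scatterplot insight below; "numerical_variable_2" is a
# placeholder column name, like the other placeholders in this script)
plt.figure(figsize=(8, 6))
sns.scatterplot(
    data=df,
    x="numerical_variable",
    y="numerical_variable_2",
    hue="categorical_variable",
)
plt.title("Scatterplot of Numerical Variables by Category")
plt.xlabel("Numerical Variable 1")
plt.ylabel("Numerical Variable 2")
plt.show()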
# Present findings and insights in a clear and concise manner
# Example: Use Markdown to format text for presentation
print("# Data Analysis and Visualization Report\n")
print("## Summary Statistics:\n")
print(summary_stats)
print("\n## Insights:\n")
print(
    "- The histogram shows that the distribution of the numerical variable is approximately normal."
)
print(
    "- The bar chart indicates that category A is the most frequent in the categorical variable."
)
print(
    "- The scatterplot suggests a positive correlation between numerical variables 1 and 2, with different categories showing distinct patterns.\n"
)
Output: