KDDLabManual ForStudentsReference

The document is a lab manual for the Knowledge Discovery and Data Mining course at D Y Patil International University, outlining the course's vision, mission, objectives, and outcomes. It includes detailed descriptions of various experiments using tools like Weka and Python for data preprocessing, outlier removal, and data mining techniques. The manual emphasizes hands-on learning through practical applications in areas such as classification, clustering, and data visualization.

Uploaded by

Sumeet malu

DYPIU

D Y Patil International University Akurdi, Pune

TY Btech – DS Track

Knowledge Discovery and Data Mining

Lab Manual


Vision of the University:


To create a vibrant learning environment that fosters innovation, creativity,
and experiential learning, is inspired by research, and focuses on
regionally, nationally and globally relevant areas.

Mission:
● To provide a diverse, vibrant and inspirational learning environment.
● To establish the university as a leading experiential-learning and
research-oriented center.
● To become a responsive university serving the needs of industry and
society.
● To embed internationalization, employability and value thinking.

Knowledge Discovery and Data Mining
Course Objectives:

1. KDD deals with data integration techniques and with the discovery, interpretation and visualization of patterns in large
collections of data.
2. Topics covered in this course include data mining methods such as rule-based learning, decision trees, association rules
and neural networks; data visualization; and the Cross-Industry Standard Process for Data Mining (CRISP-DM).
3. The work discussed originates in the fields of artificial intelligence, machine learning, statistical data analysis, data
visualization, databases, and information retrieval.
4. Several scientific and industrial applications of KDD will be described. In particular, current applications
to bioinformatics, e-commerce, and web mining will be studied.

Course Outcomes:

On completion of the course, the student should be able to:

• Understand the basic concepts of knowledge discovery and data mining.

• Apply data pre-processing methods to improve the quality of data.

• Design a data warehouse for an enterprise/company.

• Implement various classification methods and common data mining techniques, such as mining frequent patterns, associations and correlations, and clustering on a database.

• Identify the importance and role of data mining in real-world applications.

Rules and Regulations for Laboratory:

● Students should be regular and punctual for all lab practicals.

● Lab assignments and practicals should be submitted within the given time.

● Mobile phones are strictly prohibited in the Lab.

● Please shut down electronic devices before leaving the Lab.

● Please return the chairs to their proper position before leaving the Lab.

● Maintain proper discipline in the Lab.


TY Knowledge Discovery & Data Mining

EXPERIMENT NO.1
Title: Weka Tools
Aim: Demonstrate the WEKA Explorer for preprocessing data and
studying its attributes.

Step 1: Load the dataset

Step 2: Apply preprocessing to bring all attributes to the same scale

Step 3: Apply the Random Forest classifier to predict whether the given
person has diabetes or not
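
The manual does not say which Weka filter was applied in Step 2. As a rough illustration only, the sketch below assumes min-max normalization, which rescales each attribute to the range [0, 1] (comparable to what Weka's Normalize filter does by default). The attribute names and values are hypothetical, and the code is plain Python so it runs without any third-party library.

```python
def min_max_scale(column):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical values for two attributes on very different scales.
glucose = [85, 120, 155, 190]
bmi = [20.0, 25.0, 30.0, 40.0]

scaled_glucose = min_max_scale(glucose)
scaled_bmi = min_max_scale(bmi)
print(scaled_glucose)
print(scaled_bmi)
```

After scaling, both attributes span the same interval, so neither dominates a distance or split computation merely because of its units.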

Conclusion:
1. Weka is a tool for performing machine learning tasks
without writing any code.
2. In the example above, we used Random Forest classification
to determine whether a person has diabetes or not.
3. The model classified 75% of the data correctly and
misclassified the remaining 25%.
4. The mean absolute error was 0.3115, which is a reasonably
good result.

EXPERIMENT NO.2
Title: Python
Aim: Remove outliers using the IQR and Z-score methods.

1. Z-Score – Iris Dataset

a. Import the libraries and load the dataset

b. Check for missing values

c. Statistical Summary for numeric columns

d. Box plot for checking Outliers

e. Performing Z-Score Stats to remove outliers

f. Dataset after removing the outlier value

Conclusion:
As we can see, there was one outlier value in the sepal width column, which was identified and
removed using the z-score method.
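
The screenshots of the original code are not available here, so the following is only a minimal sketch of z-score filtering. It assumes the common convention of dropping values whose absolute z-score exceeds 3; the threshold, the function name, and the sample values are assumptions, not the manual's own code or data.

```python
import statistics


def remove_outliers_zscore(values, threshold=3.0):
    """Keep only values whose absolute z-score is at most `threshold`."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        # All values are identical, so nothing can be an outlier.
        return list(values)
    return [v for v in values if abs((v - mean) / std) <= threshold]


# Hypothetical sepal-width-like sample with one extreme value (9.0).
sepal_width = [3.0, 3.1, 2.9, 3.2, 3.0, 2.8, 3.1, 3.0, 2.9, 3.3, 3.0, 9.0]
cleaned = remove_outliers_zscore(sepal_width, threshold=3.0)
print(len(sepal_width), "->", len(cleaned))
```

Note that on very small samples a single extreme value inflates the standard deviation, so its z-score may stay below the threshold; the method works best with a reasonable number of observations.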

2. IQR – Titanic Dataset

a. Import Libraries and load the dataset

b. Check for missing values

c. Fill the missing values using the mean for numeric columns and the mode for categorical columns

d. Box plot for the fare column to check for the outliers

e. Remove the Outliers using IQR method

f. Outlier values in the dataset

g. Dataset After removing the outlier

h. Boxplot for the Fare column after removing the outlier

Conclusion:
As we can see, the Fare column had multiple outlier values, which were identified and removed
using the interquartile range (IQR) method. By comparing the boxplots before and after the removal of
outliers, we can see how the distribution of the data changed.
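
As an illustration of the IQR rule described above, the sketch below flags values outside the fences Q1 - 1.5 * IQR and Q3 + 1.5 * IQR. The multiplier 1.5 is the usual boxplot convention and is assumed here; the fare values are hypothetical, and the quartile interpolation method may differ slightly from the one used in the original notebook.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical fares; the last two are far above the rest.
fares = [7.25, 7.9, 8.05, 10.5, 13.0, 15.5, 26.0, 30.0, 263.0, 512.33]
lower, upper = iqr_bounds(fares)

outliers = [f for f in fares if f < lower or f > upper]
kept = [f for f in fares if lower <= f <= upper]
print("outliers:", outliers)
print("kept:", kept)
```

Unlike the z-score, the quartiles are barely affected by the extreme values themselves, which is why this rule suits a skewed column such as Fare.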

EXPERIMENT NO.3
Title: Python
Aim: Fill missing values and remove outliers using the standard deviation

a. Import libraries and load the dataset

b. Check for the missing values

c. Method 1: Fill the missing values with a constant value chosen for each column

d. Dataset after filling the missing values

e. Method 2: Fill the missing values using the mean and mode

f. Dataset after filling the missing values

g. Boxplot of the Fare column to observe the outliers

h. Calculating Standard Deviation and removing the outliers

i. Boxplot to confirm the removal of outlier

Conclusion:
We filled the missing values in the dataset using two methods: a constant value, and the mean,
median and mode. We also identified and removed the outliers from the Fare column
using the standard deviation.
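
The two imputation methods can be sketched as follows. This is a minimal plain-Python illustration under stated assumptions: the records, the column names, and the constants 0.0 and "Unknown" are hypothetical, and missing values are represented by None rather than by whatever marker the original dataset used.

```python
import statistics

# Hypothetical Titanic-like records; None marks a missing value.
rows = [
    {"age": 22.0, "embarked": "S"},
    {"age": None, "embarked": "C"},
    {"age": 30.0, "embarked": None},
    {"age": 26.0, "embarked": "S"},
]


def fill_constant(rows, column, value):
    """Method 1: replace missing entries in `column` with a fixed value."""
    return [{**r, column: value if r[column] is None else r[column]} for r in rows]


def fill_statistic(rows, column, stat):
    """Method 2: replace missing entries with a statistic of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    return fill_constant(rows, column, stat(observed))


by_constant = fill_constant(fill_constant(rows, "age", 0.0), "embarked", "Unknown")
by_stats = fill_statistic(
    fill_statistic(rows, "age", statistics.mean), "embarked", statistics.mode
)
print(by_constant)
print(by_stats)
```

Both helpers return new lists and leave the input untouched, which makes it easy to compare the two strategies on the same data.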

EXPERIMENT NO.4
Title: Python
Aim: Binning Method

Conclusion:
We performed equal-width binning and equal-frequency binning on the age column
of the dataset. We also smoothed the data in the resulting bins by bin means, by bin medians and
by bin boundaries.
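
Since the code for this experiment is not reproduced in the manual text, here is a small sketch of the techniques named in the conclusion. The function names and the age values are hypothetical, and the bin count of 3 is chosen only for illustration.

```python
import statistics


def equal_width_bins(values, k):
    """Split sorted values into k bins covering equal-sized value ranges."""
    data = sorted(values)
    lo, hi = data[0], data[-1]
    width = (hi - lo) / k
    bins = [[] for _ in range(k)]
    for v in data:
        # Clamp so the maximum value (or a zero-width range) stays in range.
        idx = min(int((v - lo) / width), k - 1) if width else 0
        bins[idx].append(v)
    return bins


def equal_frequency_bins(values, k):
    """Split sorted values into k bins holding (nearly) equal counts."""
    data = sorted(values)
    size, extra = divmod(len(data), k)
    bins, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < extra else 0)
        bins.append(data[start:end])
        start = end
    return bins


def smooth_by_mean(bins):
    """Replace every value in a bin with the bin mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins if b]


def smooth_by_median(bins):
    """Replace every value in a bin with the bin median."""
    return [[statistics.median(b)] * len(b) for b in bins if b]


def smooth_by_boundaries(bins):
    """Replace every value with the nearer of the bin's minimum or maximum."""
    return [[b[0] if v - b[0] <= b[-1] - v else b[-1] for v in b] for b in bins if b]


# Hypothetical ages, not taken from the manual's dataset.
ages = [4, 8, 15, 21, 21, 24, 25, 28, 34]
width_bins = equal_width_bins(ages, 3)
freq_bins = equal_frequency_bins(ages, 3)
print(width_bins)
print(freq_bins)
print(smooth_by_mean(freq_bins))
print(smooth_by_boundaries(freq_bins))
```

Equal-width bins can end up with very different counts when the data is skewed, while equal-frequency bins keep the counts balanced at the cost of uneven value ranges.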

EXPERIMENT NO.5
Title: Assignment
Aim: Data Warehouse Schemas and Data Cube Construction.

Scan the signed handwritten copy from your lab book and paste it here.

EXPERIMENT NO.6
Title: Python
Aim: ETL on a given dataset

a. Log File for ETL Process

b. The output csv file

Conclusion:
We extracted data from different sources: CSV files, JSON files and XML files.
We then applied transformations to all of those files so that they could be merged, and
loaded all the data into a single CSV file.
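
The extract, transform and load stages described above might look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the three in-memory sources stand in for the real files, the field names are hypothetical, and the unit conversion is just one example of a transformation. The logging step mentioned in the experiment is omitted to keep the sketch short.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical in-memory sources standing in for the CSV, JSON and XML files.
CSV_SOURCE = "name,height,weight\nalex,65.8,112.9\n"
JSON_SOURCE = '[{"name": "jack", "height": 68.7, "weight": 123.3}]'
XML_SOURCE = (
    "<data><person><name>simon</name>"
    "<height>67.9</height><weight>112.4</weight></person></data>"
)


def extract():
    """Read every source into a common list-of-dicts shape."""
    rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    rows += json.loads(JSON_SOURCE)
    for person in ET.fromstring(XML_SOURCE).findall("person"):
        rows.append({child.tag: child.text for child in person})
    return rows


def transform(rows):
    """Cast to float and convert inches to metres and pounds to kilograms."""
    return [
        {
            "name": r["name"],
            "height_m": round(float(r["height"]) * 0.0254, 2),
            "weight_kg": round(float(r["weight"]) * 0.45359237, 2),
        }
        for r in rows
    ]


def load(rows):
    """Write the merged rows as CSV text (a real file handle works the same way)."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "height_m", "weight_kg"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


merged_csv = load(transform(extract()))
print(merged_csv)
```

Bringing every source into the same list-of-dicts shape during extraction is what lets a single transform and a single load step handle all three formats.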

EXPERIMENT NO.7
Title: Python
Aim: Frequent Itemset Patterns

Conclusion:
Here we can observe the frequent itemsets and the frequent patterns derived from them, selected by their support
values and confidence values.
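
To make the two measures concrete: the support of an itemset is the fraction of transactions containing it, and the confidence of a rule A -> B is support(A and B together) divided by support(A). The sketch below computes both by brute-force enumeration on a handful of hypothetical transactions; it is not the code used in the experiment, and the minimum support of 0.6 is an arbitrary choice.

```python
from itertools import combinations

# Hypothetical transactions; each is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def frequent_itemsets(transactions, min_support, max_size=3):
    """Enumerate all itemsets up to `max_size` and keep the frequent ones."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            s = support(combo, transactions)
            if s >= min_support:
                result[combo] = s
    return result


frequent = frequent_itemsets(transactions, min_support=0.6)
for itemset, s in frequent.items():
    print(itemset, s)
print(confidence({"bread"}, {"milk"}, transactions))
```

Exhaustive enumeration is only practical for a few items, because the number of candidate itemsets grows exponentially with the number of distinct items.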

EXPERIMENT NO.8
Title: Python
Aim: Linear regression

a. Import Libraries and load the dataset

b. Statistical Summary

c. Plot the distribution

d. Scatter Plot

e. Fit the Linear Regression Line on train data

f. Plot the Linear regression on test data

Conclusion:
We have successfully trained a linear regression model on the Salary dataset.
After splitting the data into training and testing sets, we trained the model on the
training data and then evaluated it on the test data.
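
The train-then-evaluate workflow can be sketched with ordinary least squares for a single feature. The data points below are hypothetical (they are not the Salary dataset used in the lab), the hold-out split is fixed by hand for reproducibility, and R squared is used as one possible evaluation measure; the manual does not state which metric, if any, was reported.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination of the fitted line on (xs, ys)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical (years of experience, salary) pairs.
years = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
salary = [40000.0, 45500.0, 49000.0, 56000.0, 60500.0, 64000.0, 71000.0, 75000.0]

# Simple hold-out split: the 4th and 8th points are kept for testing.
test_idx = {3, 7}
pairs = list(zip(years, salary))
train = [p for i, p in enumerate(pairs) if i not in test_idx]
test = [p for i, p in enumerate(pairs) if i in test_idx]

# Fit on the training data only, then score on the unseen test data.
slope, intercept = fit_line(*zip(*train))
test_r2 = r_squared(*zip(*test), slope, intercept)
print(slope, intercept, test_r2)
```

The key point is that the slope and intercept are estimated from the training pairs alone; the test pairs are used only to check how well the line generalizes.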

EXPERIMENT NO.9
Title: Python
Aim: Apriori algorithm

a. Import libraries and load the dataset

b. The shape of the dataset

c. Applying the association rules on the dataset

Conclusion:
We applied association rule mining to the grocery basket dataset to understand how the
products are associated with one another.
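
For reference, a compact sketch of the Apriori idea follows: frequent itemsets are built level by level, and a candidate of size k is considered only if all of its subsets of size k-1 are already frequent. The baskets and thresholds are hypothetical, and this hand-written version is not the library call used in the lab, which the manual text does not show.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support} for all frequent itemsets, level by level."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = set().union(*transactions)
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Generate (antecedent, consequent, confidence) rules from frequent itemsets."""
    rules = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                conf = sup / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((set(antecedent), set(itemset - antecedent), conf))
    return rules


# Hypothetical grocery baskets.
baskets = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"milk", "eggs"},
    {"bread", "eggs"},
    {"milk", "bread", "eggs", "jam"},
]
frequent = apriori(baskets, min_support=0.6)
rules = association_rules(frequent, min_confidence=0.7)
for antecedent, consequent, conf in rules:
    print(antecedent, "->", consequent, round(conf, 2))
```

The prune step is what distinguishes Apriori from brute-force enumeration: an infrequent item such as "jam" here is discarded at level 1 and never appears in any larger candidate.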

EXPERIMENT NO.10
Title: Python
Aim: Classification

Conclusion:
We have classified the samples of the Iris dataset into its 3 classes using the
K-Nearest Neighbors classification method.
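
A minimal sketch of K-Nearest Neighbors classification is given below. It assumes Euclidean distance and k = 3, neither of which is stated in the manual, and it uses a few hypothetical two-feature samples rather than the full Iris dataset.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical (petal length, petal width) samples in cm for the three classes.
train_X = [
    (1.4, 0.2), (1.5, 0.2), (1.3, 0.3),  # setosa-like
    (4.5, 1.5), (4.1, 1.3), (4.7, 1.4),  # versicolor-like
    (5.8, 2.2), (6.0, 2.5), (5.5, 2.1),  # virginica-like
]
train_y = ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3

queries = [(1.6, 0.3), (4.4, 1.4), (5.9, 2.3)]
predictions = [knn_predict(train_X, train_y, q) for q in queries]
print(predictions)
```

KNN has no training phase beyond storing the labelled points; all the work happens at prediction time, when distances to every stored point are computed.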

EXPERIMENT NO.11
Title: Python
Aim: Clustering

Conclusion:
We successfully implemented the clustering algorithm on the custom dataset.
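
The manual does not name the clustering algorithm that was used. Purely as an example, the sketch below assumes k-means (Lloyd's algorithm) with two clusters; the points and the hand-picked initial centroids are hypothetical and chosen so that the run is deterministic.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)), key=lambda i: math.dist(p, centroids[i])
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
# Initial centroids are picked by hand here so the result is reproducible.
centroids, clusters = kmeans(points, centroids=[(1.0, 1.0), (9.0, 9.5)])
print(centroids)
print(clusters)
```

In practice the initial centroids are usually chosen at random, so different runs can converge to different clusterings; fixing them here simply keeps the example repeatable.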
