TRIBHUVAN UNIVERSITY
INSTITUTE OF SCIENCE AND TECHNOLOGY
BIRENDRA MULTIPLE CAMPUS
Data Warehousing and Data Mining
BIT 454
Submitted by
Aaditya Pageni (BIT 267/077)
Submitted to
Lab 01:
Implementation of K-means clustering algorithm
Objective:
Write a Python program to implement K-means Clustering algorithm. Generate 1000 2D data
points in the range 0-100 randomly. Divide data points into 3 clusters.
Required Theory:
K-means clustering is a popular unsupervised machine learning algorithm used for partitioning
a dataset into a predetermined number of clusters. The goal of K-means is to minimize the
within-cluster variance, also known as inertia: the sum of squared distances from each point in a
cluster to that cluster's centroid. Here is a step-by-step overview of how the algorithm works:
1. Initialization: Choose the number of clusters (K) and randomly initialize K centroids.
Centroids are the points that represent the center of each cluster.
2. Assignment Step: Assign each data point to the nearest centroid. This is typically done by
calculating the Euclidean distance between each point and each centroid, and assigning each
point to the cluster with the nearest centroid.
3. Update Step: After all points have been assigned to clusters, calculate the mean of the points
in each cluster and update the centroid to be the mean. This moves the centroid to the center of
its cluster.
4. Repeat: Repeat steps 2 and 3 until convergence. Convergence occurs when the centroids no
longer change significantly or when a maximum number of iterations is reached.
5. Final Step: Once convergence is reached, the algorithm outputs the final centroids and the
cluster assignments for each data point.
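The five steps above can also be sketched from scratch with NumPy, independently of the scikit-learn code used below. This is a minimal illustration; the function and variable names are chosen for this sketch, not taken from any library:

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(points, k=3, max_iter=100, tol=1e-4):
    """Minimal K-means following the five steps described above."""
    # 1. Initialization: pick k random data points as starting centroids
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # 2. Assignment: label each point with its nearest centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Update: move each centroid to the mean of its cluster
        new_centroids = np.array([points[labels == j].mean(axis=0)
                                  for j in range(k)])
        # 4. Repeat until the centroids stop moving (convergence)
        if np.linalg.norm(new_centroids - centroids) < tol:
            break
        centroids = new_centroids
    # 5. Final centroids and per-point cluster assignments
    return centroids, labels

data = rng.random((1000, 2)) * 100
centers, labels = kmeans(data, k=3)
```

In practice library implementations such as scikit-learn's add refinements (multiple restarts, smarter initialization) on top of this basic loop.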
Executable Python code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Generate 1000 random 2D data points in the range 0-100
data = np.random.rand(1000, 2) * 100

# Fit K-means with 3 clusters and random centroid initialization
km = KMeans(n_clusters=3, init="random", n_init=10)
km.fit(data)
centers = km.cluster_centers_
labels = km.labels_
print("Cluster centers: ", *centers)
# print("Cluster Labels: ", *labels)

# Plot each point with a color and marker according to its cluster
colors = ["r", "g", "b"]
markers = ["+", "x", "*"]
for i in range(len(data)):
    plt.plot(data[i][0], data[i][1], color=colors[labels[i]],
             marker=markers[labels[i]])

# Mark the cluster centers with large squares
plt.scatter(centers[:, 0], centers[:, 1], marker="s", s=100, linewidths=5)
plt.show()
Output:
Cluster centers: [82.29904926 50.2243326 ] [31.71356124 22.45600779]
[31.52067957 77.620532 ]
Conclusion:
Hence, we implemented the K-means clustering algorithm in Python using Google Colab.
Lab 02:
Implementation of Apriori algorithm
Objective:
Write a Python program to utilize the Apriori algorithm on a retail dataset and identify
significant association rules between items in customer transactions.
Required Theory:
The Apriori algorithm is a classical algorithm in data mining and machine learning used for
association rule mining in transactional databases. It aims to find interesting relationships or
associations among a set of items in large datasets. The most common application of the Apriori
algorithm is in market basket analysis, where it helps identify associations between products that
are frequently purchased together.
Algorithm Steps:
1. Generating Candidate Itemsets:
- The Apriori algorithm starts by generating candidate itemsets of length 1 (individual items)
and then iteratively generates larger itemsets.
- New candidate itemsets are generated by joining pairs of frequent itemsets found in the
previous iteration.
2. Pruning Candidate Itemsets:
- After generating candidate itemsets, the algorithm scans the dataset to count the support of
each candidate itemset.
- Candidate itemsets that do not meet the minimum support threshold are pruned from further
consideration.
3. Generating Association Rules:
- Once frequent itemsets are identified, association rules are generated from these itemsets.
- Association rules are generated by partitioning frequent itemsets into non-empty subsets and
calculating support, confidence, and lift for each rule.
- Rules that meet the minimum confidence and lift thresholds are considered significant and are
returned as the final output of the algorithm.
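As a toy illustration of steps 1-3, the following sketch counts support over five hand-made transactions, prunes infrequent itemsets, and derives rules with their confidence. The transactions, the threshold, and the helper names are invented for this example:

```python
from itertools import combinations

transactions = [
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "bread"},
    {"milk", "butter"},
    {"bread"},
]
min_support = 0.4

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Step 1: candidate 1-itemsets; step 2: prune by minimum support
items = {i for t in transactions for i in t}
L1 = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]

# Join frequent 1-itemsets into candidate 2-itemsets, prune again
L2 = [a | b for a, b in combinations(L1, 2) if support(a | b) >= min_support]

# Step 3: generate rules from each frequent 2-itemset with their confidence
for itemset in L2:
    for item in itemset:
        antecedent = itemset - {item}
        confidence = support(itemset) / support(antecedent)
        print(set(antecedent), "->", {item}, f"(conf={confidence:.2f})")
```

A real implementation such as apyori, used below, continues joining to longer itemsets and also filters rules by lift.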
Dataset Description:
We are using a dataset from a retail store which contains 7501 customer transactions (rows) in
a CSV file. A snapshot of the dataset transactions is given below:
Executable Python code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from apyori import apriori

path = "/content/store_data.csv"
dataset = pd.read_csv(path, header=None)  # the CSV has no header row
dataset.head()

# Convert each row of the DataFrame into a list of item strings
records = []
for i in range(len(dataset)):
    row = dataset.iloc[i].dropna()
    records.append([str(item) for item in row])

# Mine association rules with minimum support, confidence, and lift
association_rules = apriori(
    records, min_support=0.005, min_confidence=0.2,
    min_lift=3, min_length=2
)
association_results = list(association_rules)

# Print each rule as antecedent -> consequent
for item in association_results:
    rule = item.ordered_statistics[0]
    print(list(rule.items_base), '->', list(rule.items_add))
Output:
Association rules generated:
['mushroom cream sauce'] -> ['escalope']
['pasta'] -> ['escalope']
['herb & pepper'] -> ['ground beef']
['tomato sauce'] -> ['ground beef']
['whole wheat pasta'] -> ['olive oil']
['pasta'] -> ['shrimp']
['chocolate', 'frozen vegetables'] -> ['shrimp']
['spaghetti', 'frozen vegetables'] -> ['ground beef']
['shrimp', 'mineral water'] -> ['frozen vegetables']
['spaghetti', 'frozen vegetables'] -> ['olive oil']
['spaghetti', 'frozen vegetables'] -> ['shrimp']
['spaghetti', 'frozen vegetables'] -> ['tomatoes']
['spaghetti', 'grated cheese'] -> ['ground beef']
['herb & pepper', 'mineral water'] -> ['ground beef']
['herb & pepper', 'spaghetti'] -> ['ground beef']
['shrimp', 'ground beef'] -> ['spaghetti']
['milk', 'spaghetti'] -> ['olive oil']
['mineral water', 'soup'] -> ['olive oil']
['pancakes', 'spaghetti'] -> ['olive oil']
Conclusion:
Hence, we implemented the Apriori algorithm in Python using Google Colab.
Lab 03:
Implementation of ID3 decision tree algorithm
Objective:
Write a Python program to predict diabetes using the ID3 Decision Tree Classifier.
Required Theory:
The ID3 (Iterative Dichotomiser 3) algorithm is a classic and straightforward algorithm used for
constructing decision trees. It was developed by Ross Quinlan in 1986 and is particularly
popular for its simplicity and ease of understanding.
ID3 algorithm works as:
1. Input Data: ID3 algorithm starts with a dataset containing features and corresponding target
labels.
2. Feature Selection: It selects the best attribute to split the data at each node based on a
criterion called Information Gain. Information Gain measures how much entropy (uncertainty or
randomness) is reduced in the dataset after splitting on a particular attribute.
3. Tree Construction: It recursively constructs the decision tree by selecting the best attribute to
split the data at each node. This process continues until one of the stopping criteria is met, such
as:
- All instances at a node belong to the same class.
- No more attributes are left to split on.
- The tree reaches a maximum depth.
4. Output: The resulting decision tree is used for classification by following the decision paths
from the root to the leaf nodes based on the values of the features of the input data.
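The information-gain criterion from step 2 can be computed directly from the entropy formula. Below is a minimal sketch with a toy weather-style example; the attribute values, labels, and function names are invented for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, labels, attribute_index):
    """Reduction in entropy from splitting the data on one attribute."""
    base = entropy(labels)
    # Group labels by the value of the chosen attribute
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute_index], []).append(label)
    # Weighted average entropy of the resulting subsets
    weighted = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return base - weighted

# Toy example: a single "Outlook" attribute vs. a yes/no target
rows = [("sunny",), ("sunny",), ("overcast",), ("rain",), ("rain",)]
labels = ["no", "no", "yes", "yes", "no"]
print(information_gain(rows, labels, 0))
```

ID3 evaluates this gain for every remaining attribute at each node and splits on the attribute with the highest value; scikit-learn's `DecisionTreeClassifier(criterion="entropy")`, used below, applies the same entropy criterion.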
Dataset Description:
We are using a hospital dataset which contains 768 patient records (rows) in a
CSV file. A snapshot of the dataset records is given below:
Executable Python code:
import pandas as pd
from sklearn import metrics
from sklearn.tree import DecisionTreeClassifier

path = "/content/Diabetes.csv"
dataset = pd.read_csv(path)
print("Dataset Size: ", len(dataset))

# 70/30 train-test split
split = int(len(dataset) * 0.7)
train, test = dataset.iloc[:split], dataset.iloc[split:]

# Feature column names exactly as they appear in the CSV
features = ["Pragnency", "Glucose", "Blod Pressure", "Skin Thikness",
            "Insulin", "BMI", "DFP", "Age"]

traininput = train[features].values
trainlabels = train["Diabetes"].values

# Entropy criterion selects splits by information gain, as in ID3
model = DecisionTreeClassifier(criterion="entropy", max_depth=4)
model.fit(traininput, trainlabels)

testinput = test[features].values
testlabels = test["Diabetes"].values
predicted = model.predict(testinput)
# print('Actual Class:', *testlabels)
# print('Predicted Class:', *predicted)

print("Confusion Matrix:")
print(metrics.confusion_matrix(testlabels, predicted))
print("\nClassification Measures:")
print("Accuracy:", metrics.accuracy_score(testlabels, predicted))
print("Recall:", metrics.recall_score(testlabels, predicted))
print("Precision:", metrics.precision_score(testlabels, predicted))
print("F1-score:", metrics.f1_score(testlabels, predicted))
Output:
Dataset Size: 767
Confusion Matrix:
[[117 35]
[ 17 62]]
Classification Measures:
Accuracy: 0.7748917748917749
Recall: 0.7848101265822784
Precision: 0.6391752577319587
F1-score: 0.7045454545454545
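The measures above follow directly from the printed confusion matrix (scikit-learn prints actual classes as rows and predicted classes as columns, so TN = 117, FP = 35, FN = 17, TP = 62). A quick hand check:

```python
# Values read off the confusion matrix printed above
tn, fp = 117, 35
fn, tp = 17, 62
total = tn + fp + fn + tp

accuracy = (tp + tn) / total         # fraction of correct predictions
recall = tp / (tp + fn)              # fraction of actual positives found
precision = tp / (tp + fp)           # fraction of predicted positives correct
f1 = 2 * precision * recall / (precision + recall)

print("Accuracy:", accuracy)
print("Recall:", recall)
print("Precision:", precision)
print("F1-score:", f1)
```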
Conclusion:
Hence, we implemented the ID3 decision tree algorithm in Python using Google Colab.