
TRIBHUVAN UNIVERSITY

INSTITUTE OF SCIENCE AND TECHNOLOGY

BIRENDRA MULTIPLE CAMPUS

Data Warehousing and Data Mining


BIT 454

Submitted by
Aaditya Pageni (BIT 267/077)

Submitted to
Lab 01: Using the Weka tool for data mining

Objective: To use the Weka tool for data visualization

Required Theory

Weka is an open-source software suite written in Java that provides a collection of machine learning
algorithms for data mining tasks. It offers a user-friendly graphical interface, allowing users to perform
data preprocessing, classification, regression, clustering, association rule mining, and visualization.

Apple sweetness data visualization

Lab 02:

Implementation of K-means clustering algorithm

Objective:

Write a Python program to implement the K-means clustering algorithm. Randomly generate 1000 2D data
points in the range 0-100 and divide them into 3 clusters.
Required Theory:

K-means clustering is a popular unsupervised machine learning algorithm used for partitioning
a dataset into a predetermined number of clusters. The goal of K-means is to minimize the
within-cluster variance, also known as inertia or the sum of squared distances from each point in a
cluster to that cluster's centroid. Here is a step-by-step overview of how the algorithm works:

1. Initialization: Choose the number of clusters (K) and randomly initialize K centroids.
Centroids are the points that represent the center of each cluster.

2. Assignment Step: Assign each data point to the nearest centroid. This is typically done by
calculating the Euclidean distance between each point and each centroid, and assigning each
point to the cluster with the nearest centroid.

3. Update Step: After all points have been assigned to clusters, calculate the mean of the points
in each cluster and update the centroid to be the mean. This moves the centroid to the center of
its cluster.

4. Repeat: Repeat steps 2 and 3 until convergence. Convergence occurs when the centroids no
longer change significantly or when a maximum number of iterations is reached.

5. Final Step: Once convergence is reached, the algorithm outputs the final centroids and the
cluster assignments for each data point.
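The steps above can be illustrated with a minimal from-scratch sketch in NumPy (an illustrative version only; the lab code below uses scikit-learn's KMeans):

import numpy as np

def kmeans(points, k=3, max_iter=100, tol=1e-4):
    # 1. Initialization: pick k random data points as the initial centroids
    rng = np.random.default_rng(0)
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(max_iter):
        # 2. Assignment: assign each point to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Update: move each centroid to the mean of the points assigned to it
        new_centroids = np.array([
            points[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(k)
        ])
        # 4. Repeat until the centroids stop moving significantly (convergence)
        if np.linalg.norm(new_centroids - centroids) < tol:
            break
        centroids = new_centroids
    # 5. Final step: return the final centroids and cluster assignments
    return centroids, labels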

Executable python code:


import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Generate 1000 random 2D data points in the range 0-100
data = np.random.rand(1000, 2) * 100

# Fit K-means with 3 clusters and random centroid initialization
km = KMeans(n_clusters=3, init="random")
km.fit(data)
centers = km.cluster_centers_
labels = km.labels_
print("Cluster centers: ", *centers)
# print("Cluster labels: ", *labels)

# Plot each point coloured by its cluster, then overlay the centroids
colors = ["r", "g", "b"]
markers = ["+", "x", "*"]
for i in range(len(data)):
    plt.plot(data[i][0], data[i][1], color=colors[labels[i]],
             marker=markers[labels[i]])
plt.scatter(centers[:, 0], centers[:, 1], marker="s", s=100, linewidths=5)
plt.show()

Output:

Cluster centers: [82.29904926 50.2243326 ] [31.71356124 22.45600779] [31.52067957 77.620532 ]
Conclusion:

Hence, we implemented the K-means clustering algorithm in Python using Google Colab.

Lab 03:

Implementation of Apriori algorithm

Objective:

Write a Python program to utilize the Apriori algorithm on a retail dataset and identify
significant association rules between items in customer transactions.

Required Theory:

The Apriori algorithm is a classical algorithm in data mining and machine learning used for
association rule mining in transactional databases. It aims to find interesting relationships or
associations among a set of items in large datasets. The most common application of the Apriori
algorithm is in market basket analysis, where it helps identify associations between products that
are frequently purchased together.

Algorithm Steps:

1. Generating Candidate Itemsets:

- The Apriori algorithm starts by generating candidate itemsets of length 1 (individual items)
and then iteratively generates larger itemsets.

- New candidate itemsets are generated by joining pairs of frequent itemsets found in the
previous iteration.

2. Pruning Candidate Itemsets:

- After generating candidate itemsets, the algorithm scans the dataset to count the support of
each candidate itemset.

- Candidate itemsets that do not meet the minimum support threshold are pruned from further
consideration.

3. Generating Association Rules:

- Once frequent itemsets are identified, association rules are generated from these itemsets.

- Association rules are generated by partitioning frequent itemsets into non-empty subsets and
calculating support, confidence, and lift for each rule.

- Rules that meet the minimum confidence and lift thresholds are considered significant and are
returned as the final output of the algorithm.
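As a small illustration of these measures, the sketch below computes support, confidence, and lift for a single candidate rule over a handful of made-up transactions (hypothetical data, not the retail dataset used in this lab):

# Toy transactions (hypothetical, only to illustrate the measures)
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]
n = len(transactions)

# Candidate rule: {bread} -> {milk}
antecedent, consequent = {"bread"}, {"milk"}

support_a = sum(antecedent <= t for t in transactions) / n                    # P(bread)
support_c = sum(consequent <= t for t in transactions) / n                    # P(milk)
support_ac = sum((antecedent | consequent) <= t for t in transactions) / n    # P(bread and milk)

confidence = support_ac / support_a   # P(milk | bread)
lift = confidence / support_c         # how much likelier than chance

print(f"support={support_ac:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")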

Dataset Description:

We are using a retail-store dataset which contains 7501 customer transactions (rows) in
a CSV file. A snapshot of the transactions is given below:

Executable python code:

!pip install apyori

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from apyori import apriori

# Load the retail transactions CSV from Google Drive
path = "/content/drive/MyDrive/DataSet/store_data.csv"
dataset = pd.read_csv(path)
dataset.head(None)

# Convert each transaction (row) into a list of item names, skipping empty cells
records = []
for i in range(0, 7500):
    test = []
    data = dataset.iloc[i]
    data = data.dropna()
    for j in range(0, len(data)):
        test.append(str(dataset.values[i, j]))
    records.append(test)

# Mine association rules with minimum support, confidence and lift thresholds
association_rules = apriori(
    records, min_support=0.005, min_confidence=0.2,
    min_lift=3, min_length=2
)
association_results = list(association_rules)

# Each result carries ordered statistics: print antecedent -> consequent
for item in association_results:
    print(list(item[2][0][0]), '->', list(item[2][0][1]))

Output:

Association rules generated:

['mushroom cream sauce'] -> ['escalope']


['pasta'] -> ['escalope']
['herb & pepper'] -> ['ground beef']
['tomato sauce'] -> ['ground beef']
['whole wheat pasta'] -> ['olive oil']
['pasta'] -> ['shrimp']
['chocolate', 'frozen vegetables'] -> ['shrimp']
['spaghetti', 'frozen vegetables'] -> ['ground beef']
['shrimp', 'mineral water'] -> ['frozen vegetables']
['spaghetti', 'frozen vegetables'] -> ['olive oil']
['spaghetti', 'frozen vegetables'] -> ['shrimp']
['spaghetti', 'frozen vegetables'] -> ['tomatoes']
['spaghetti', 'grated cheese'] -> ['ground beef']
['herb & pepper', 'mineral water'] -> ['ground beef']
['herb & pepper', 'spaghetti'] -> ['ground beef']
['shrimp', 'ground beef'] -> ['spaghetti']
['milk', 'spaghetti'] -> ['olive oil']
['mineral water', 'soup'] -> ['olive oil']
['pancakes', 'spaghetti'] -> ['olive oil']

Conclusion:

Hence, we implemented the Apriori algorithm in Python using Google Colab.


Lab 04:

Implementation of ID3 decision tree algorithm

Objective:

Write a Python program to predict diabetes using the ID3 Decision Tree classifier.

Required Theory:

The ID3 (Iterative Dichotomiser 3) algorithm is a classic and straightforward algorithm used for
constructing decision trees. It was developed by Ross Quinlan in 1986 and is particularly
popular for its simplicity and ease of understanding.

ID3 algorithm works as:

1. Input Data: ID3 algorithm starts with a dataset containing features and corresponding target
labels.

2. Feature Selection: It selects the best attribute to split the data at each node based on a
criterion called Information Gain. Information Gain measures how much entropy (uncertainty or
randomness) is reduced in the dataset after splitting on a particular attribute.

3. Tree Construction: It recursively constructs the decision tree by selecting the best attribute to
split the data at each node. This process continues until one of the stopping criteria is met, such
as:

- All instances at a node belong to the same class.

- No more attributes are left to split on.

- The tree reaches a maximum depth.

4. Output: The resulting decision tree is used for classification by following the decision paths
from the root to the leaf nodes based on the values of the features of the input data.
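To make the feature-selection step concrete, the short sketch below computes entropy and the information gain of a candidate attribute on made-up labels (an illustration only, separate from the lab code that follows):

import numpy as np

def entropy(labels):
    # H(S) = -sum(p_i * log2(p_i)) over the class proportions p_i
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, attribute_values):
    # Gain(S, A) = H(S) - sum(|S_v| / |S| * H(S_v)) over the values v of attribute A
    weighted = 0.0
    for v in np.unique(attribute_values):
        subset = labels[attribute_values == v]
        weighted += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - weighted

# Made-up example: an attribute that separates the classes perfectly has maximal gain
y = np.array(["yes", "yes", "no", "no", "yes", "no"])
a = np.array(["sunny", "sunny", "rain", "rain", "sunny", "rain"])
print(information_gain(y, a))   # prints 1.0; higher gain -> better attribute to split on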

Dataset Description:

We are using a hospital dataset which contains 768 patient records (rows) in a
CSV file. A snapshot of the dataset is given below:

Executable python code:

import pandas as pd
from sklearn import metrics
from sklearn.tree import DecisionTreeClassifier

# Load the diabetes dataset from Google Drive
path = "/content/drive/MyDrive/DataSet/Diabetes.csv"
dataset = pd.read_csv(path)
print("Dataset Size: ", len(dataset))

# 70/30 train-test split
split = int(len(dataset) * 0.7)
train, test = dataset.iloc[:split], dataset.iloc[split:]

# Training features and target (column names must match the CSV headers)
p = train["Pregnency"].values
g = train["Glucose"].values
bp = train["Blod Pressure"].values
st = train["Skin Thikness"].values
ins = train["Insulin"].values
bmi = train["BMI"].values
dpf = train["DFP"].values
a = train["Age"].values
d = train["Diabetes"].values
trainfeatures = zip(p, g, bp, st, ins, bmi, dpf, a)
traininput = list(trainfeatures)
# print(traininput)

# ID3-style decision tree: entropy (information gain) criterion, limited depth
model = DecisionTreeClassifier(criterion="entropy", max_depth=4)
model.fit(traininput, d)

# Test features and target
p = test["Pregnency"].values
g = test["Glucose"].values
bp = test["Blod Pressure"].values
st = test["Skin Thikness"].values
ins = test["Insulin"].values
bmi = test["BMI"].values
dpf = test["DFP"].values
a = test["Age"].values
d = test["Diabetes"].values
testfeatures = zip(p, g, bp, st, ins, bmi, dpf, a)
testinput = list(testfeatures)

# Evaluate on the held-out test set
predicted = model.predict(testinput)
# print('Actual Class:', *d)
# print('Predicted Class:', *predicted)
print("Confusion Matrix:")
print(metrics.confusion_matrix(d, predicted))
print("\nClassification Measures:")
print("Accuracy:", metrics.accuracy_score(d, predicted))
print("Recall:", metrics.recall_score(d, predicted))
print("Precision:", metrics.precision_score(d, predicted))
print("F1-score:", metrics.f1_score(d, predicted))

Output:

Dataset Size: 767
Confusion Matrix:
[[117 35]
[ 17 62]]

Classification Measures:
Accuracy: 0.7748917748917749
Recall: 0.7848101265822784
Precision: 0.6391752577319587
F1-score: 0.7045454545454545
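
These measures follow directly from the confusion matrix above: with 117 true negatives, 35 false positives, 17 false negatives, and 62 true positives, accuracy = (117 + 62) / 231 ≈ 0.775, recall = 62 / (62 + 17) ≈ 0.785, precision = 62 / (62 + 35) ≈ 0.639, and F1 = 2 × precision × recall / (precision + recall) ≈ 0.705.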
Conclusion:

Hence, we implemented the ID3 decision tree algorithm in Python using Google Colab.

Lab 05:

Implementation of DBSCAN clustering algorithm

Objective:

Write a Python program to implement the DBSCAN clustering algorithm.

Required Theory:

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a popular clustering
algorithm used in machine learning and data mining. It is particularly useful for identifying
clusters of arbitrary shape in spatial data, and it is robust to outliers.

Here's how DBSCAN works:

1. Density-Based: DBSCAN defines clusters as areas in the data space where there are many
data points concentrated together, separated by areas with few or no data points. It doesn't
assume that clusters have a specific shape.

2. Parameters:

- Epsilon (ε): This is a distance threshold that defines the radius within which to search for
neighboring points. It determines how close points must be to one another to count as neighbors.

- MinPts: This parameter specifies the minimum number of points required to form a dense
region. Any point with at least MinPts points within distance ε is considered a core point.

3. Core Points: A core point is a point that has at least MinPts neighboring points within
distance ε. Core points lie deep within a cluster.

4. Border Points: Border points are not core points themselves but are within the ε neighborhood
of a core point. They belong to the same cluster as the core point but lie on the edges.

5. Noise Points (Outliers): Points that are neither core points nor border points are considered
noise points or outliers. They do not belong to any cluster.
6. Algorithm Steps:

- DBSCAN starts by randomly selecting a point from the dataset.

- It then finds all points in the ε-neighborhood of this point. If the number of points in this
neighborhood is less than MinPts, the point is labeled as noise. Otherwise, it's labeled as a core
point and assigned to a new cluster or an existing cluster.

- It continues this process recursively, expanding the cluster by adding core points and their
reachable neighbors until no more points can be added.

- Once all points are either assigned to a cluster or labeled as noise, the algorithm terminates.
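
The point roles described above can be illustrated with a small NumPy sketch that labels every point as core, border, or noise for a given ε and MinPts (a simplified illustration; the lab code below uses scikit-learn's DBSCAN for the actual clustering):

import numpy as np

def point_roles(points, eps=0.5, min_pts=5):
    # Pairwise Euclidean distances between all points
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    # A core point has at least min_pts points (itself included) within distance eps
    neighbor_counts = (dists <= eps).sum(axis=1)
    core = neighbor_counts >= min_pts
    # A border point is not core itself but lies within eps of some core point
    border = ~core & ((dists <= eps) & core[None, :]).any(axis=1)
    # Everything else is noise (an outlier belonging to no cluster)
    noise = ~core & ~border
    return core, border, noise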

Executable python code:


import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Generate some random data for demonstration
data, labels = make_blobs(n_samples=300, centers=3, random_state=42)

# Use DBSCAN for clustering
dbscan = DBSCAN(eps=0.5, min_samples=5)
clusters = dbscan.fit_predict(data)

# Plot the results
plt.scatter(data[:, 0], data[:, 1], c=clusters, cmap='viridis')
plt.title('DBSCAN Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()

Output:
Conclusion:

Hence, we implemented the DBSCAN clustering algorithm in Python using Google Colab.
