
EX. NO: 2

ASSOCIATION RULE MINING

APRIORI ALGORITHM

MOTIVATION OF THE MINING TECHNIQUE:

Apriori is an algorithm for frequent itemset mining and association rule learning over transactional databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger itemsets, as long as those itemsets appear sufficiently often in the database. The frequent itemsets determined by Apriori can be used to derive association rules that highlight general trends in the database. This has applications in domains such as book-sharing systems.

MINING LOGIC:

Apriori searches level by level: it first scans the database to find the frequent 1-itemsets, then repeatedly joins the frequent k-itemsets to form candidate (k+1)-itemsets, discards candidates that contain an infrequent subset (the Apriori property), and scans the database again to count the surviving candidates, stopping when no new frequent itemsets are found. A sketch of this loop is given below.
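The following is a minimal, self-contained Java sketch of this level-wise search. It is illustrative only: the transactions and item names are invented for the example, the support threshold is hard-coded, and the subset-pruning step is folded into the re-counting pass; it is not the Weka implementation that produces the output recorded below.

import java.util.*;

class AprioriSketch {
    public static void main(String[] args) {
        // Example transactions: each set holds the items bought together.
        List<Set<String>> db = List.of(
            Set.of("shirt", "belt", "shoe"),
            Set.of("shirt", "belt"),
            Set.of("shirt", "socks"),
            Set.of("belt", "shoe"),
            Set.of("shirt", "belt", "socks"));
        int minSupport = 2; // absolute support threshold

        // Level 1: count individual items and keep the frequent ones.
        Map<Set<String>, Integer> frequent = count(candidates1(db), db, minSupport);
        while (!frequent.isEmpty()) {
            frequent.forEach((items, sup) -> System.out.println(items + " support=" + sup));
            // Join step: extend frequent k-itemsets by one item, then re-count.
            frequent = count(join(frequent.keySet()), db, minSupport);
        }
    }

    // Candidate 1-itemsets: every distinct item in the database.
    static Set<Set<String>> candidates1(List<Set<String>> db) {
        Set<Set<String>> c = new HashSet<>();
        for (Set<String> t : db) for (String item : t) c.add(Set.of(item));
        return c;
    }

    // Join step: union pairs of k-itemsets whose union has size k+1.
    static Set<Set<String>> join(Set<Set<String>> level) {
        Set<Set<String>> next = new HashSet<>();
        for (Set<String> a : level) for (Set<String> b : level) {
            Set<String> u = new TreeSet<>(a);
            u.addAll(b);
            if (u.size() == a.size() + 1) next.add(u);
        }
        return next;
    }

    // Count the support of each candidate; keep those meeting minSupport.
    static Map<Set<String>, Integer> count(Set<Set<String>> cands,
                                           List<Set<String>> db, int minSupport) {
        Map<Set<String>, Integer> kept = new LinkedHashMap<>();
        for (Set<String> c : cands) {
            int sup = 0;
            for (Set<String> t : db) if (t.containsAll(c)) sup++;
            if (sup >= minSupport) kept.put(c, sup);
        }
        return kept;
    }
}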
APPLICATION:

dataset.arff is an ARFF file. It describes the transactions of a clothing retail shop. The attributes are: Shirt, T-Shirt, Pant, Shoe, Belt, Socks, Sweater, Sunglass, Hat, Perfume.

CREATING DATASET:

import java.util.*;
import java.io.*;

class random1 {
    public static void main(String[] args) throws IOException {
        Random rand = new Random();
        StringBuilder sb = new StringBuilder();
        // Append the generated transactions to the ARFF file (its header is written separately).
        FileWriter fw = new FileWriter("C:/Users/Administrator.OS-CP23/Desktop/dataset.arff", true);
        PrintWriter printWriter = new PrintWriter(fw);
        printWriter.println();
        int nextnumber;
        int j = 0;
        for (int i = 0; i < 25; i++) {
            // Draw a random 10-bit number (0-1023) and zero-pad its binary form to 10 digits.
            nextnumber = rand.nextInt(1024);
            String my = String.format("%10s", Integer.toBinaryString(nextnumber)).replace(' ', '0');
            // Emit the 10 bits as one comma-separated transaction row.
            while (j != 9) {
                sb.append(my.charAt(j++));
                sb.append(',');
            }
            sb.append(my.charAt(j));
            j = 0;
            printWriter.println(sb.toString()); // new line per transaction
            sb.setLength(0);
        }
        printWriter.close();
    }
}

.ARFF File

@relation ClothingShop

@attribute Shirt {1,0}
@attribute T-Shirt {1,0}
@attribute Pant {1,0}
@attribute Shoe {1,0}
@attribute Belt {1,0}
@attribute Socks {1,0}
@attribute Sweater {1,0}
@attribute Sunglass {1,0}
@attribute Hat {1,0}
@attribute Perfume {1,0}

@data
0,0,0,1,0,1,1,0,0,0
1,1,0,0,1,1,1,1,0,0
1,0,1,0,0,0,1,1,0,1
1,0,1,0,0,0,1,1,1,0
0,0,1,1,1,1,0,1,0,1
1,0,0,0,0,1,1,0,1,0
1,0,0,0,1,1,0,1,1,0
0,1,0,1,1,0,1,0,0,1
1,0,0,0,0,1,1,1,0,1
0,0,1,0,0,0,0,0,1,1
1,1,1,1,0,1,0,1,0,1
0,0,0,1,0,1,1,1,0,1
1,0,0,0,0,1,1,1,0,0
0,1,1,0,0,1,0,0,1,1
1,1,0,1,0,0,0,1,0,0
1,0,1,1,0,0,1,1,0,1
1,0,0,1,0,1,0,1,0,1
1,0,1,1,0,1,0,1,1,1
1,0,0,0,0,0,1,1,1,0
1,0,0,0,1,1,1,1,1,0
0,0,1,0,0,1,0,0,0,1
1,1,0,1,0,1,0,0,0,1
1,0,0,1,1,1,1,0,0,0
0,1,1,0,0,0,1,1,0,1
0,0,0,0,1,0,1,0,1,1
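Assuming the experiment is run with Weka (as the ARFF format suggests), the rules recorded under OUTPUT could be generated from Java as follows. This is a minimal sketch: weka.jar on the classpath, the file path reused from the generator above, and the option values are assumptions, not the exact settings behind the recorded output.

import java.io.FileReader;
import weka.associations.Apriori;
import weka.core.Instances;

class RunApriori {
    public static void main(String[] args) throws Exception {
        // Load the generated ARFF dataset.
        Instances data = new Instances(
            new FileReader("C:/Users/Administrator.OS-CP23/Desktop/dataset.arff"));
        Apriori apriori = new Apriori();
        apriori.setNumRules(10);       // report the ten best rules (assumed setting)
        apriori.buildAssociations(data);
        System.out.println(apriori);   // prints frequent itemsets and rules
    }
}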
OUTPUT:
EX. NO: 3

ASSOCIATION RULE MINING

FP-GROWTH ALGORITHM

MOTIVATION OF THE MINING TECHNIQUE:

The FP-Growth algorithm is an efficient and scalable method for mining the complete set of frequent patterns by pattern-fragment growth, using an extended prefix-tree structure that stores compressed, crucial information about frequent patterns, called the frequent-pattern tree (FP-tree).

MINING LOGIC:

Algorithm 1: FP-tree construction

Input: A transaction database DB and a minimum support threshold ξ.
Output: FP-tree, the frequent-pattern tree of DB.
Method: The FP-tree is constructed as follows.

1. Scan the transaction database DB once. Collect F, the set of frequent items, and
the support of each frequent item. Sort F in support-descending order as FList,
the list of frequent items.
2. Create the root of an FP-tree, T, and label it as "null". For each transaction Trans
in DB, do the following (a Java sketch of insert_tree is given after this list):

● Select the frequent items in Trans and sort them according to the order of FList. Let
the sorted frequent-item list in Trans be [p | P], where p is the first element and P is
the remaining list. Call insert_tree([p | P], T).
● The function insert_tree([p | P], T) is performed as follows. If T has a child N such
that N.item-name = p.item-name, then increment N's count by 1; otherwise create a new
node N with its count initialized to 1, its parent link set to T, and its node-link
linked to the nodes with the same item-name via the node-link structure. If P is
nonempty, call insert_tree(P, N) recursively.
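Below is a minimal Java sketch of the FP-tree node structure and the insert_tree function just described. It is illustrative only: the class and field names are assumptions, and the header-table/node-link bookkeeping is noted in a comment rather than implemented.

import java.util.*;

class FPNode {
    String itemName;            // null for the root
    int count;
    FPNode parent;
    Map<String, FPNode> children = new HashMap<>();

    FPNode(String itemName, FPNode parent) {
        this.itemName = itemName;
        this.parent = parent;
        this.count = 1;
    }

    // insert_tree([p | P], T): p is items.get(idx), P is the rest of the list.
    static void insertTree(List<String> items, int idx, FPNode t) {
        if (idx == items.size()) return;          // P is empty: stop recursing
        String p = items.get(idx);
        FPNode n = t.children.get(p);
        if (n != null) {
            n.count++;                            // T already has a child named p
        } else {
            n = new FPNode(p, t);                 // new node with count 1, parent T
            t.children.put(p, n);
            // (A full implementation would also thread n into the node-link
            // chain for item p via the header table.)
        }
        insertTree(items, idx + 1, n);            // recurse on the remaining list P
    }

    public static void main(String[] args) {
        FPNode root = new FPNode(null, null);     // root labelled "null"
        // Transactions already filtered and sorted in FList order.
        insertTree(List.of("Shirt", "Socks", "Sweater"), 0, root);
        insertTree(List.of("Shirt", "Socks", "Belt"), 0, root);
        System.out.println(root.children.get("Shirt").count);  // prints 2
    }
}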

This algorithm works as follows: first, it compresses the input database, creating an FP-tree instance to represent the frequent items. It then divides the compressed database into a set of conditional databases, each one associated with one frequent pattern. Finally, each such database is mined separately. Using this strategy, FP-Growth reduces the search cost by looking for short patterns recursively and then concatenating them into long frequent patterns, offering good selectivity.

CREATING DATASET:

import java.util.*;
import java.io.*;

class random1 {
    public static void main(String[] args) throws IOException {
        Random rand = new Random();
        StringBuilder sb = new StringBuilder();
        // Append the generated transactions to the ARFF file (its header is written separately).
        FileWriter fw = new FileWriter("C:/Users/Administrator.OS-CP23/Desktop/dataset.arff", true);
        PrintWriter printWriter = new PrintWriter(fw);
        printWriter.println();
        int nextnumber;
        int j = 0;
        for (int i = 0; i < 25; i++) {
            // Draw a random 10-bit number (0-1023) and zero-pad its binary form to 10 digits.
            nextnumber = rand.nextInt(1024);
            String my = String.format("%10s", Integer.toBinaryString(nextnumber)).replace(' ', '0');
            // Emit the 10 bits as one comma-separated transaction row.
            while (j != 9) {
                sb.append(my.charAt(j++));
                sb.append(',');
            }
            sb.append(my.charAt(j));
            j = 0;
            printWriter.println(sb.toString()); // new line per transaction
            sb.setLength(0);
        }
        printWriter.close();
    }
}

.ARFF File

@relation ClothingShop

@attribute Shirt {1,0}
@attribute T-Shirt {1,0}
@attribute Pant {1,0}
@attribute Shoe {1,0}
@attribute Belt {1,0}
@attribute Socks {1,0}
@attribute Sweater {1,0}
@attribute Sunglass {1,0}
@attribute Hat {1,0}
@attribute Perfume {1,0}

@data
0,0,0,1,0,1,1,0,0,0
1,1,0,0,1,1,1,1,0,0
1,0,1,0,0,0,1,1,0,1
1,0,1,0,0,0,1,1,1,0
0,0,1,1,1,1,0,1,0,1
1,0,0,0,0,1,1,0,1,0
1,0,0,0,1,1,0,1,1,0
0,1,0,1,1,0,1,0,0,1
1,0,0,0,0,1,1,1,0,1
0,0,1,0,0,0,0,0,1,1
1,1,1,1,0,1,0,1,0,1
0,0,0,1,0,1,1,1,0,1
1,0,0,0,0,1,1,1,0,0
0,1,1,0,0,1,0,0,1,1
1,1,0,1,0,0,0,1,0,0
1,0,1,1,0,0,1,1,0,1
1,0,0,1,0,1,0,1,0,1
1,0,1,1,0,1,0,1,1,1
1,0,0,0,0,0,1,1,1,0
1,0,0,0,1,1,1,1,1,0
0,0,1,0,0,1,0,0,0,1
1,1,0,1,0,1,0,0,0,1
1,0,0,1,1,1,1,0,0,0
0,1,1,0,0,0,1,1,0,1
0,0,0,0,1,0,1,0,1,1
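Assuming Weka is used here as well, the following is a minimal sketch of running its FPGrowth associator over this file from Java. The classpath and default options are assumptions; Weka's FPGrowth expects binary attributes, which the {1,0} attributes above satisfy.

import java.io.FileReader;
import weka.associations.FPGrowth;
import weka.core.Instances;

class RunFPGrowth {
    public static void main(String[] args) throws Exception {
        // Load the generated ARFF dataset (binary {1,0} attributes).
        Instances data = new Instances(
            new FileReader("C:/Users/Administrator.OS-CP23/Desktop/dataset.arff"));
        FPGrowth fp = new FPGrowth();
        fp.buildAssociations(data);    // builds the FP-tree and mines the rules
        System.out.println(fp);        // prints the rules found
    }
}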

OUTPUT:
EX. NO: 4

DATA CLUSTERING

K-MEANS ALGORITHM

MOTIVATION OF THE MINING TECHNIQUE:

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

MINING LOGIC:

The most common algorithm uses an iterative refinement technique. Due to its ubiquity, it is often called the k-means algorithm; it is also referred to as Lloyd's algorithm, particularly in the computer science community.

Given an initial set of k means m_1^{(1)}, \ldots, m_k^{(1)}, the algorithm proceeds by alternating between two steps:[6]

Assignment step: Assign each observation to the cluster whose mean has the least squared Euclidean distance; this is intuitively the "nearest" mean.[7] (Mathematically, this means partitioning the observations according to the Voronoi diagram generated by the means.)

S_i^{(t)} = \{ x_p : \| x_p - m_i^{(t)} \|^2 \le \| x_p - m_j^{(t)} \|^2 \;\; \forall j,\; 1 \le j \le k \},

where each x_p is assigned to exactly one S^{(t)}, even if it could be assigned to two or more of them.

Update step: Calculate the new means (centroids) of the observations in the new clusters:

m_i^{(t+1)} = \frac{1}{|S_i^{(t)}|} \sum_{x_j \in S_i^{(t)}} x_j

A sketch of this assignment/update loop follows.
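The following is a minimal Java sketch of Lloyd's algorithm on 2-D points. It is illustrative only: the points, k, the initial means, and the fixed iteration count are all assumptions; the clustering recorded under OUTPUT below is computed over the ARFF dataset, not these points.

import java.util.*;

class KMeansSketch {
    public static void main(String[] args) {
        double[][] points = {{1, 1}, {1.5, 2}, {8, 8}, {9, 9}, {0.5, 1}, {8.5, 9.5}};
        int k = 2;
        double[][] means = {points[0], points[2]};   // initial means m_1, m_2
        int[] assign = new int[points.length];

        for (int iter = 0; iter < 10; iter++) {
            // Assignment step: each point goes to the cluster with the nearest mean.
            for (int p = 0; p < points.length; p++) {
                double best = Double.MAX_VALUE;
                for (int i = 0; i < k; i++) {
                    double d = sqDist(points[p], means[i]);
                    if (d < best) { best = d; assign[p] = i; }
                }
            }
            // Update step: recompute each mean as the centroid of its cluster.
            double[][] sums = new double[k][2];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                sums[assign[p]][0] += points[p][0];
                sums[assign[p]][1] += points[p][1];
                counts[assign[p]]++;
            }
            for (int i = 0; i < k; i++)
                if (counts[i] > 0)
                    means[i] = new double[]{sums[i][0] / counts[i], sums[i][1] / counts[i]};
        }
        System.out.println(Arrays.toString(assign));     // cluster index per point
        System.out.println(Arrays.deepToString(means));  // final centroids
    }

    // Squared Euclidean distance between two 2-D points.
    static double sqDist(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1];
        return dx * dx + dy * dy;
    }
}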
.ARFF File

@relation ClothingShop

@attribute Shirt {1,0}
@attribute T-Shirt {1,0}
@attribute Pant {1,0}
@attribute Shoe {1,0}
@attribute Belt {1,0}
@attribute Socks {1,0}
@attribute Sweater {1,0}
@attribute Sunglass {1,0}
@attribute Hat {1,0}
@attribute Perfume {1,0}

@data
0,0,0,1,0,1,1,0,0,0
1,1,0,0,1,1,1,1,0,0
1,0,1,0,0,0,1,1,0,1
1,0,1,0,0,0,1,1,1,0
0,0,1,1,1,1,0,1,0,1
1,0,0,0,0,1,1,0,1,0
1,0,0,0,1,1,0,1,1,0
0,1,0,1,1,0,1,0,0,1
1,0,0,0,0,1,1,1,0,1
0,0,1,0,0,0,0,0,1,1
1,1,1,1,0,1,0,1,0,1
0,0,0,1,0,1,1,1,0,1
1,0,0,0,0,1,1,1,0,0
0,1,1,0,0,1,0,0,1,1
1,1,0,1,0,0,0,1,0,0
1,0,1,1,0,0,1,1,0,1
1,0,0,1,0,1,0,1,0,1
1,0,1,1,0,1,0,1,1,1
1,0,0,0,0,0,1,1,1,0
1,0,0,0,1,1,1,1,1,0
0,0,1,0,0,1,0,0,0,1
1,1,0,1,0,1,0,0,0,1
1,0,0,1,1,1,1,0,0,0
0,1,1,0,0,0,1,1,0,1
0,0,0,0,1,0,1,0,1,1
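Assuming Weka is also used for the clustering run, the following is a minimal sketch of applying its SimpleKMeans clusterer to this file from Java. The classpath and the choice of k = 2 are assumptions, not the settings behind the recorded output.

import java.io.FileReader;
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;

class RunKMeans {
    public static void main(String[] args) throws Exception {
        // Load the generated ARFF dataset.
        Instances data = new Instances(
            new FileReader("C:/Users/Administrator.OS-CP23/Desktop/dataset.arff"));
        SimpleKMeans kmeans = new SimpleKMeans();
        kmeans.setNumClusters(2);      // k, chosen here for illustration
        kmeans.buildClusterer(data);
        System.out.println(kmeans);    // prints centroids and cluster sizes
    }
}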
OUTPUT:
