EX.NO:2
ASSOCIATION RULE MINING
APRIORI ALGORITHM
MOTIVATION OF THE MINING TECHNIQUE:
Apriori is an algorithm for frequent item set mining and association rule learning
over transactional databases. It proceeds by identifying the frequent individual items in the
database and extending them to larger and larger item sets, as long as those item sets appear
sufficiently often in the database. The frequent item sets determined by Apriori can be used to
derive association rules, which highlight general trends in the database. This has
applications in domains such as book-sharing systems.
MINING LOGIC:
Apriori searches the item-set lattice level by level. A first scan of the database finds the
frequent 1-item sets, i.e. those whose support meets the minimum support threshold. At each
subsequent level k, candidate k-item sets are generated by joining pairs of frequent
(k-1)-item sets; any candidate containing an infrequent subset is pruned, because every subset
of a frequent item set must itself be frequent (the Apriori property). The database is then
scanned to count the support of the surviving candidates, and the process repeats until no
further frequent item sets are found. Association rules are finally derived from the frequent
item sets and kept if they meet the minimum confidence threshold. A small sketch of this
search follows.
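The following self-contained Java program is a minimal illustration of the level-wise search,
not Weka's Apriori: the class name, the hard-coded transactions, and the minimum support
value are assumptions for demonstration, and the subset-pruning refinement is omitted, since
support counting alone still yields the correct frequent item sets.

import java.util.*;

// Minimal Apriori sketch: level-wise candidate generation and support
// counting over a hard-coded transaction list (illustrative data only).
class AprioriSketch {
    public static void main(String[] args) {
        List<Set<String>> db = Arrays.asList(
                new HashSet<>(Arrays.asList("Shirt", "Socks", "Sweater")),
                new HashSet<>(Arrays.asList("Shirt", "Sunglass", "Perfume")),
                new HashSet<>(Arrays.asList("Shirt", "Socks", "Sunglass")),
                new HashSet<>(Arrays.asList("Socks", "Sweater", "Sunglass")));
        int minSupport = 2; // absolute minimum support (illustrative)

        // Level 1: count individual items and keep the frequent ones
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> t : db)
            for (String item : t)
                counts.merge(item, 1, Integer::sum);
        Set<Set<String>> current = new HashSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet())
            if (e.getValue() >= minSupport) {
                current.add(new TreeSet<>(Collections.singleton(e.getKey())));
                System.out.println("[" + e.getKey() + "] support=" + e.getValue());
            }

        // Level k: join frequent (k-1)-item sets, then scan DB to count support
        while (!current.isEmpty()) {
            Set<Set<String>> candidates = new HashSet<>();
            for (Set<String> a : current)
                for (Set<String> b : current) {
                    Set<String> union = new TreeSet<>(a);
                    union.addAll(b);
                    if (union.size() == a.size() + 1) // a and b share k-1 items
                        candidates.add(union);
                }
            Set<Set<String>> next = new HashSet<>();
            for (Set<String> c : candidates) {
                int support = 0;
                for (Set<String> t : db)
                    if (t.containsAll(c)) support++;
                if (support >= minSupport) {
                    next.add(c);
                    System.out.println(c + " support=" + support);
                }
            }
            current = next;
        }
    }
}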
APPLICATION:
dataset.arff is an ARFF file. It deals with the attributes of a clothing retail shop. The attributes are:
Shirt, T-Shirt, Pant, Shoe, Belt, Socks, Sweater, Sunglass, Hat, Perfume.
CREATING DATASET:
import java.util.*;
import java.io.*;

// Generates 25 random transactions: each random 10-bit number becomes one
// comma-separated row of ten 0/1 attribute values, appended to dataset.arff.
class random1{
    public static void main(String[] args) throws IOException{
        Random rand = new Random();
        StringBuilder sb = new StringBuilder();
        // File is opened in append mode, so the rows land below the ARFF header
        FileWriter fw = new FileWriter("C:/Users/Administrator.OS-CP23/Desktop/dataset.arff", true);
        PrintWriter printWriter = new PrintWriter(fw);
        printWriter.println();
        for(int i = 0; i < 25; i++){
            int nextnumber = rand.nextInt(1024); // 0..1023 covers every 10-bit pattern
            // Left-pad the binary string with '0' to a fixed width of 10
            String mys = String.format("%10s", Integer.toBinaryString(nextnumber)).replace(' ', '0');
            int j = 0;
            while(j != 9){
                sb.append(mys.charAt(j++));
                sb.append(',');
            }
            sb.append(mys.charAt(j)); // last value, no trailing comma
            printWriter.println(sb.toString()); // one transaction per line
            sb.setLength(0);
        }
        printWriter.close();
    }
}
.ARFF File
@relation ClothingShop
@attribute Shirt{1,0}
@attribute T-Shirt{1,0}
@attribute Pant{1,0}
@attribute Shoe{1,0}
@attribute Belt{1,0}
@attribute Socks{1,0}
@attribute Sweater{1,0}
@attribute Sunglass{1,0}
@attribute Hat{1,0}
@attribute Perfume{1,0}
@data
0,0,0,1,0,1,1,0,0,0
1,1,0,0,1,1,1,1,0,0
1,0,1,0,0,0,1,1,0,1
1,0,1,0,0,0,1,1,1,0
0,0,1,1,1,1,0,1,0,1
1,0,0,0,0,1,1,0,1,0
1,0,0,0,1,1,0,1,1,0
0,1,0,1,1,0,1,0,0,1
1,0,0,0,0,1,1,1,0,1
0,0,1,0,0,0,0,0,1,1
1,1,1,1,0,1,0,1,0,1
0,0,0,1,0,1,1,1,0,1
1,0,0,0,0,1,1,1,0,0
0,1,1,0,0,1,0,0,1,1
1,1,0,1,0,0,0,1,0,0
1,0,1,1,0,0,1,1,0,1
1,0,0,1,0,1,0,1,0,1
1,0,1,1,0,1,0,1,1,1
1,0,0,0,0,0,1,1,1,0
1,0,0,0,1,1,1,1,1,0
0,0,1,0,0,1,0,0,0,1
1,1,0,1,0,1,0,0,0,1
1,0,0,1,1,1,1,0,0,0
0,1,1,0,0,0,1,1,0,1
0,0,0,0,1,0,1,0,1,1
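RUNNING THE MINER:
Assuming the experiment is run with Weka, as the .arff format suggests, the association rules
can be produced from the command line (weka.jar must be on the classpath; the rule count -N
and minimum confidence -C shown are illustrative values):
java weka.associations.Apriori -t dataset.arff -N 10 -C 0.9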
OUTPUT:
EX.NO:3
ASSOCIATION RULE MINING
FP-GROWTH ALGORITHM
MOTIVATION OF THE MINING TECHNIQUE:
The FP-growth algorithm is an efficient and scalable method for mining the
complete set of frequent patterns by pattern-fragment growth. It uses an extended prefix-tree
structure that stores compressed but crucial information about the frequent patterns, called the
frequent-pattern tree (FP-tree).
MINING LOGIC:
Algorithm 1: FP-tree construction
Input: A transaction database DB and a minimum support threshold ξ.
Output: FP-tree, the frequent-pattern tree of DB.
Method: The FP-tree is constructed as follows.
1. Scan the transaction database DB once. Collect F, the set of frequent items, and
the support of each frequent item. Sort F in support-descending order as FList,
the list of frequent items.
2. Create the root of an FP-tree, T, and label it as “null”. For each transaction Trans
in DB do the following:
● Select the frequent items in Trans and sort them according to the order of FList. Let
the sorted frequent-item list in Trans be [p | P], where p is the first element and P is
the remaining list. Call insert_tree([p | P], T).
● The function insert_tree([p | P], T) is performed as follows. If T has a child N such
that N.item-name = p.item-name, then increment N's count by 1; else create a new
node N, with its count initialized to 1, its parent link linked to T, and its node-link
linked to the nodes with the same item-name via the node-link structure. If P is
nonempty, call insert_tree(P, N) recursively.
This algorithm works as follows: first it compresses the input database, creating an
FP-tree instance to represent the frequent items. After this first step it divides the compressed
database into a set of conditional databases, each one associated with one frequent pattern.
Finally, each such database is mined separately. Using this strategy, FP-growth reduces the
search cost by looking for short patterns recursively and then concatenating them into longer
frequent patterns, offering good selectivity. A sketch of the insert_tree step follows.
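The following Java program is a minimal illustration of insert_tree, not Weka's FPGrowth
implementation: the class and method names are hypothetical, the node-link structure is kept
only as a map from item name to the nodes carrying that item, and the example transactions are
assumed to be already restricted to frequent items and sorted in FList order.

import java.util.*;

class FPTreeSketch {
    static class Node {
        String item;            // item-name (null for the root)
        int count = 1;
        Node parent;
        Map<String, Node> children = new HashMap<>();
        Node(String item, Node parent){ this.item = item; this.parent = parent; }
    }

    // Stand-in for the header table: item name -> nodes carrying that item
    static Map<String, List<Node>> nodeLinks = new HashMap<>();

    // insert_tree([p | P], T): follow or create the child for p, recurse on P
    static void insertTree(List<String> trans, Node t){
        if (trans.isEmpty()) return;
        String p = trans.get(0);
        Node n = t.children.get(p);
        if (n != null){
            n.count++;                          // shared prefix: increment count
        } else {
            n = new Node(p, t);                 // new branch with count 1
            t.children.put(p, n);
            nodeLinks.computeIfAbsent(p, k -> new ArrayList<>()).add(n);
        }
        insertTree(trans.subList(1, trans.size()), n);
    }

    public static void main(String[] args){
        Node root = new Node(null, null);       // the root labelled "null"
        insertTree(Arrays.asList("Shirt", "Sunglass", "Socks"), root);
        insertTree(Arrays.asList("Shirt", "Sunglass", "Perfume"), root);
        insertTree(Arrays.asList("Shirt", "Socks"), root);
        print(root, "");
    }

    static void print(Node n, String indent){
        if (n.item != null) System.out.println(indent + n.item + ":" + n.count);
        for (Node c : n.children.values()) print(c, indent + "  ");
    }
}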
CREATING DATASET:
import java.util.*;
import java.io.*;

// Generates 25 random transactions: each random 10-bit number becomes one
// comma-separated row of ten 0/1 attribute values, appended to dataset.arff.
class random1{
    public static void main(String[] args) throws IOException{
        Random rand = new Random();
        StringBuilder sb = new StringBuilder();
        // File is opened in append mode, so the rows land below the ARFF header
        FileWriter fw = new FileWriter("C:/Users/Administrator.OS-CP23/Desktop/dataset.arff", true);
        PrintWriter printWriter = new PrintWriter(fw);
        printWriter.println();
        for(int i = 0; i < 25; i++){
            int nextnumber = rand.nextInt(1024); // 0..1023 covers every 10-bit pattern
            // Left-pad the binary string with '0' to a fixed width of 10
            String mys = String.format("%10s", Integer.toBinaryString(nextnumber)).replace(' ', '0');
            int j = 0;
            while(j != 9){
                sb.append(mys.charAt(j++));
                sb.append(',');
            }
            sb.append(mys.charAt(j)); // last value, no trailing comma
            printWriter.println(sb.toString()); // one transaction per line
            sb.setLength(0);
        }
        printWriter.close();
    }
}
.ARFF File
@relation ClothingShop
@attribute Shirt{1,0}
@attribute T-Shirt{1,0}
@attribute Pant{1,0}
@attribute Shoe{1,0}
@attribute Belt{1,0}
@attribute Socks{1,0}
@attribute Sweater{1,0}
@attribute Sunglass{1,0}
@attribute Hat{1,0}
@attribute Perfume{1,0}
@data
0,0,0,1,0,1,1,0,0,0
1,1,0,0,1,1,1,1,0,0
1,0,1,0,0,0,1,1,0,1
1,0,1,0,0,0,1,1,1,0
0,0,1,1,1,1,0,1,0,1
1,0,0,0,0,1,1,0,1,0
1,0,0,0,1,1,0,1,1,0
0,1,0,1,1,0,1,0,0,1
1,0,0,0,0,1,1,1,0,1
0,0,1,0,0,0,0,0,1,1
1,1,1,1,0,1,0,1,0,1
0,0,0,1,0,1,1,1,0,1
1,0,0,0,0,1,1,1,0,0
0,1,1,0,0,1,0,0,1,1
1,1,0,1,0,0,0,1,0,0
1,0,1,1,0,0,1,1,0,1
1,0,0,1,0,1,0,1,0,1
1,0,1,1,0,1,0,1,1,1
1,0,0,0,0,0,1,1,1,0
1,0,0,0,1,1,1,1,1,0
0,0,1,0,0,1,0,0,0,1
1,1,0,1,0,1,0,0,0,1
1,0,0,1,1,1,1,0,0,0
0,1,1,0,0,0,1,1,0,1
0,0,0,0,1,0,1,0,1,1
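RUNNING THE MINER:
Assuming Weka is used here as well, FP-growth can be run on the same file from the command
line (weka.jar on the classpath; default options shown):
java weka.associations.FPGrowth -t dataset.arff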
OUTPUT:
EX.NO:4
DATA CLUSTERING
K-MEANS ALGORITHM
MOTIVATION OF THE MINING TECHNIQUE:
k-means clustering is a method of vector quantization, originally from signal processing,
that is popular for cluster analysis in data mining. k-means clustering aims
to partition n observations into k clusters in which each observation belongs to the cluster with
the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the
data space into Voronoi cells.
MINING LOGIC:
The most common algorithm uses an iterative refinement technique. Due to its ubiquity, it is
often called the k-means algorithm; it is also referred to as Lloyd's algorithm, particularly in
the computer science community.
Given an initial set of k means $m_1^{(1)}, \ldots, m_k^{(1)}$, the algorithm proceeds by
alternating between two steps:
Assignment step: Assign each observation to the cluster whose mean has the least
squared Euclidean distance; this is intuitively the "nearest" mean. (Mathematically,
this means partitioning the observations according to the Voronoi diagram generated by
the means.)
$$S_i^{(t)} = \{\, x_p : \|x_p - m_i^{(t)}\|^2 \le \|x_p - m_j^{(t)}\|^2 \ \forall j,\ 1 \le j \le k \,\}$$
where each observation $x_p$ is assigned to exactly one cluster $S_i^{(t)}$, even if it could be
assigned to two or more of them.
Update step: Recalculate the means (centroids) of the observations assigned to each cluster:
$$m_i^{(t+1)} = \frac{1}{|S_i^{(t)}|} \sum_{x_j \in S_i^{(t)}} x_j$$
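A minimal sketch of these two alternating steps in Java follows. It is an illustration, not
Weka's SimpleKMeans: the class name, the sample points, the initial means, and the fixed
iteration count are assumptions; a full implementation would iterate until the assignments
stop changing.

import java.util.*;

class KMeansSketch {
    public static void main(String[] args){
        // Sample 2-D observations and k = 2 initial means (illustrative)
        double[][] points = {{1,1},{1.5,2},{3,4},{5,7},{3.5,5},{4.5,5},{3.5,4.5}};
        double[][] means  = {{1,1},{5,7}};
        int[] assign = new int[points.length];

        for (int iter = 0; iter < 10; iter++){
            // Assignment step: each point goes to the nearest mean
            // (least squared Euclidean distance)
            for (int p = 0; p < points.length; p++){
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int i = 0; i < means.length; i++){
                    double d = sqDist(points[p], means[i]);
                    if (d < bestDist){ bestDist = d; best = i; }
                }
                assign[p] = best;
            }
            // Update step: each mean becomes the centroid of its cluster
            for (int i = 0; i < means.length; i++){
                double sx = 0, sy = 0; int n = 0;
                for (int p = 0; p < points.length; p++)
                    if (assign[p] == i){ sx += points[p][0]; sy += points[p][1]; n++; }
                if (n > 0){ means[i][0] = sx / n; means[i][1] = sy / n; }
            }
        }
        System.out.println("means: " + Arrays.deepToString(means));
        System.out.println("assignments: " + Arrays.toString(assign));
    }

    static double sqDist(double[] a, double[] b){
        double dx = a[0] - b[0], dy = a[1] - b[1];
        return dx * dx + dy * dy;
    }
}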
.ARFF File
@relation ClothingShop
@attribute Shirt{1,0}
@attribute T-Shirt{1,0}
@attribute Pant{1,0}
@attribute Shoe{1,0}
@attribute Belt{1,0}
@attribute Socks{1,0}
@attribute Sweater{1,0}
@attribute Sunglass{1,0}
@attribute Hat{1,0}
@attribute Perfume{1,0}
@data
0,0,0,1,0,1,1,0,0,0
1,1,0,0,1,1,1,1,0,0
1,0,1,0,0,0,1,1,0,1
1,0,1,0,0,0,1,1,1,0
0,0,1,1,1,1,0,1,0,1
1,0,0,0,0,1,1,0,1,0
1,0,0,0,1,1,0,1,1,0
0,1,0,1,1,0,1,0,0,1
1,0,0,0,0,1,1,1,0,1
0,0,1,0,0,0,0,0,1,1
1,1,1,1,0,1,0,1,0,1
0,0,0,1,0,1,1,1,0,1
1,0,0,0,0,1,1,1,0,0
0,1,1,0,0,1,0,0,1,1
1,1,0,1,0,0,0,1,0,0
1,0,1,1,0,0,1,1,0,1
1,0,0,1,0,1,0,1,0,1
1,0,1,1,0,1,0,1,1,1
1,0,0,0,0,0,1,1,1,0
1,0,0,0,1,1,1,1,1,0
0,0,1,0,0,1,0,0,0,1
1,1,0,1,0,1,0,0,0,1
1,0,0,1,1,1,1,0,0,0
0,1,1,0,0,0,1,1,0,1
0,0,0,0,1,0,1,0,1,1
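RUNNING THE CLUSTERER:
Assuming Weka is used, k-means can be run on the same file from the command line (weka.jar
on the classpath; the cluster count -N 2 is an illustrative choice):
java weka.clusterers.SimpleKMeans -t dataset.arff -N 2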
OUTPUT: