AGNI COLLEGE OF TECHNOLOGY
(Approved by AICTE & Affiliated to Anna University)
Thalambur, Chennai – 600 130

Machine Learning Laboratory
EVEN SEMESTER
Year/Dept/Section:
Register Number:
LIST OF EXPERIMENTS:
1. For a given set of training data examples stored in a .CSV file, implement and demonstrate the
Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the
training examples.
2. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an
appropriate data set for building the decision tree and apply this knowledge to classify a new sample.
3. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same
using appropriate data sets.
4. Write a program to implement the naïve Bayesian classifier for a sample training data set stored as a
.CSV file and compute the accuracy with a few test data sets.
5. Implement naïve Bayesian Classifier model to classify a set of documents and measure the accuracy,
precision, and recall.
6. Write a program to construct a Bayesian network to diagnose CORONA infection using standard WHO
Data Set.
7. Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for
clustering using the k-Means algorithm. Compare the results of these two algorithms.
8. Write a program to implement the k-Nearest Neighbour algorithm to classify the iris data set. Print
both correct and wrong predictions.
9. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points.
Select an appropriate data set for your experiment and draw graphs.
TOTAL: 60 PERIODS
OUTCOMES:
At the end of this course, the students will be able to:
CO1: Apply suitable algorithms for selecting the appropriate features for analysis.
CO2: Implement supervised machine learning algorithms on standard datasets and evaluate the performance.
CO3: Apply unsupervised machine learning algorithms on standard datasets and evaluate the performance.
CO4: Build graph-based learning models for standard data sets.
CO5: Assess and compare the performance of different ML algorithms and select the suitable one based
on the application.
EX.NO. : 1 For a given set of training data examples stored in a .CSV file, implement
DATE : and demonstrate the Candidate-Elimination algorithm to output a
description of the set of all hypotheses consistent with the training examples.
Aim:
For a given set of training data examples stored in a .CSV file, implement and
demonstrate the Candidate-Elimination algorithm to output a description of the set of all
hypotheses consistent with the training examples.
Program:
class Holder:
    factors = {}     # dictionary mapping each attribute to its list of possible values
    attributes = ()  # tuple of attribute names

    def __init__(self, attr):
        '''Constructor of class Holder; self refers to the instance of the class'''
        self.attributes = attr
        for i in attr:
            self.factors[i] = []

    def add_values(self, factor, values):
        self.factors[factor] = values
class CandidateElimination:
    Positive = {}  # Initialize positive empty dictionary
    Negative = {}  # Initialize negative empty dictionary

    def __init__(self, data, fact):
        self.num_factors = len(data[0][0])
        self.factors = fact.factors
        self.attr = fact.attributes
        self.dataset = data

    def run_algorithm(self):
        '''
        Initialize the specific and general boundaries, and loop the dataset
        against the algorithm
        '''
        G = self.initializeG()
        S = self.initializeS()
        for trial_set in self.dataset:
            if self.is_positive(trial_set):  # the example is positive
                G = self.remove_inconsistent_G(G, trial_set[0])  # drop inconsistent hypotheses from the general boundary
                S_new = S[:]
                print(S_new)
                for s in S:
                    if not self.consistent(s, trial_set[0]):
                        S_new.remove(s)
                        generalization = self.generalize_inconsistent_S(s, trial_set[0])
                        generalization = self.get_general(generalization, G)
                        if generalization:
                            S_new.append(generalization)
                S = S_new[:]
                S = self.remove_more_general(S)
                print(S)
            else:  # the example is negative
                S = self.remove_inconsistent_S(S, trial_set[0])  # drop inconsistent hypotheses from the specific boundary
                G_new = G[:]
                print(G_new)
                for g in G:
                    if self.consistent(g, trial_set[0]):
                        G_new.remove(g)
                        specializations = self.specialize_inconsistent_G(g, trial_set[0])
                        specializations = self.get_specific(specializations, S)
                        if specializations != []:
                            G_new += specializations
                G = G_new[:]
                G = self.remove_more_specific(G)
                print(G)
    def initializeS(self):
        ''' Initialize the specific boundary '''
        S = tuple(['-' for factor in range(self.num_factors)])  # 6 constraints in the vector
        return [S]

    def initializeG(self):
        ''' Initialize the general boundary '''
        G = tuple(['?' for factor in range(self.num_factors)])  # 6 constraints in the vector
        return [G]

    def is_positive(self, trial_set):
        ''' Check if a given training trial_set is positive '''
        if trial_set[1] == 'Y':
            return True
        elif trial_set[1] == 'N':
            return False
        else:
            raise TypeError("invalid target value")

    def match_factor(self, value1, value2):
        ''' Check whether the factor values match; necessary while checking
        the consistency of a training trial_set with a hypothesis '''
        if value1 == '?' or value2 == '?':
            return True
        elif value1 == value2:
            return True
        return False

    def consistent(self, hypothesis, instance):
        ''' Check whether the instance is part of the hypothesis '''
        for i, factor in enumerate(hypothesis):
            if not self.match_factor(factor, instance[i]):
                return False
        return True
    def remove_inconsistent_G(self, hypotheses, instance):
        ''' For a positive trial_set, the hypotheses in G
        inconsistent with it should be removed '''
        G_new = hypotheses[:]
        for g in hypotheses:
            if not self.consistent(g, instance):
                G_new.remove(g)
        return G_new

    def remove_inconsistent_S(self, hypotheses, instance):
        ''' For a negative trial_set, the hypotheses in S
        consistent with it should be removed '''
        S_new = hypotheses[:]
        for s in hypotheses:
            if self.consistent(s, instance):
                S_new.remove(s)
        return S_new
    def remove_more_general(self, hypotheses):
        ''' After generalizing S for a positive trial_set, any hypothesis in S
        more general than another in S should be removed '''
        S_new = hypotheses[:]
        for old in hypotheses:
            for new in S_new:
                if old != new and self.more_general(new, old):
                    S_new.remove(new)
        return S_new

    def remove_more_specific(self, hypotheses):
        ''' After specializing G for a negative trial_set, any hypothesis in G
        more specific than another in G should be removed '''
        G_new = hypotheses[:]
        for old in hypotheses:
            for new in G_new:
                if old != new and self.more_specific(new, old):
                    G_new.remove(new)
        return G_new
    def generalize_inconsistent_S(self, hypothesis, instance):
        ''' When an inconsistent hypothesis for a positive trial_set is seen in the
        specific boundary S, it should be generalized to be consistent with the
        trial_set; we will get one hypothesis '''
        hypo = list(hypothesis)  # convert tuple to list for mutability
        for i, factor in enumerate(hypo):
            if factor == '-':
                hypo[i] = instance[i]
            elif not self.match_factor(factor, instance[i]):
                hypo[i] = '?'
        generalization = tuple(hypo)  # convert list back to tuple for immutability
        return generalization
    def specialize_inconsistent_G(self, hypothesis, instance):
        ''' When an inconsistent hypothesis for a negative trial_set is seen in the
        general boundary G, it should be specialized to be consistent with the
        trial_set; we will get a set of hypotheses '''
        specializations = []
        hypo = list(hypothesis)  # convert tuple to list for mutability
        for i, factor in enumerate(hypo):
            if factor == '?':
                values = self.factors[self.attr[i]]
                for j in values:
                    if instance[i] != j:
                        hyp = hypo[:]
                        hyp[i] = j
                        hyp = tuple(hyp)  # convert list back to tuple for immutability
                        specializations.append(hyp)
        return specializations
    def get_general(self, generalization, G):
        ''' Checks if there is a more general hypothesis in G for a
        generalization of an inconsistent hypothesis in S in case of a
        positive trial_set, and returns the valid generalization '''
        for g in G:
            if self.more_general(g, generalization):
                return generalization
        return None

    def get_specific(self, specializations, S):
        ''' Checks if there is a more specific hypothesis in S for each
        hypothesis in the specializations of an inconsistent hypothesis in G
        in case of a negative trial_set, and returns the valid specializations '''
        valid_specializations = []
        for hypo in specializations:
            for s in S:
                if self.more_specific(s, hypo) or s == self.initializeS()[0]:
                    valid_specializations.append(hypo)
        return valid_specializations
    def exists_general(self, hypothesis, G):
        ''' Check whether there exists a more general hypothesis in the
        general boundary of the version space '''
        for g in G:
            if self.more_general(g, hypothesis):
                return True
        return False

    def exists_specific(self, hypothesis, S):
        ''' Check whether there exists a more specific hypothesis in the
        specific boundary of the version space '''
        for s in S:
            if self.more_specific(s, hypothesis):
                return True
        return False
    def more_general(self, hyp1, hyp2):
        ''' Check whether hyp1 is more general than hyp2 '''
        hyp = zip(hyp1, hyp2)
        for i, j in hyp:
            if i == '?':
                continue
            elif j == '?':
                if i != '?':
                    return False
            elif i != j:
                return False
            else:
                continue
        return True

    def more_specific(self, hyp1, hyp2):
        ''' hyp1 being more specific than hyp2 is equivalent to
        hyp2 being more general than hyp1 '''
        return self.more_general(hyp2, hyp1)
dataset = [(('sunny', 'warm', 'normal', 'strong', 'warm', 'same'), 'Y'),
           (('sunny', 'warm', 'high', 'strong', 'warm', 'same'), 'Y'),
           (('rainy', 'cold', 'high', 'strong', 'warm', 'change'), 'N'),
           (('sunny', 'warm', 'high', 'strong', 'cool', 'change'), 'Y')]
attributes = ('Sky', 'Temp', 'Humidity', 'Wind', 'Water', 'Forecast')
f = Holder(attributes)
f.add_values('Sky', ('sunny', 'rainy', 'cloudy'))  # Sky can be sunny, rainy or cloudy
f.add_values('Temp', ('cold', 'warm'))             # Temp can be cold or warm
f.add_values('Humidity', ('normal', 'high'))       # Humidity can be normal or high
f.add_values('Wind', ('weak', 'strong'))           # Wind can be weak or strong
f.add_values('Water', ('warm', 'cold'))            # Water can be warm or cold
f.add_values('Forecast', ('same', 'change'))       # Forecast can be same or change
a = CandidateElimination(dataset, f)  # pass the dataset and attribute holder to the algorithm class
a.run_algorithm()                     # and call the run_algorithm method
Output:
[('sunny', 'warm', 'normal', 'strong', 'warm', 'same')]
[('sunny', 'warm', 'normal', 'strong', 'warm','same')]
[('sunny', 'warm', '?', 'strong', 'warm', 'same')]
[('?', '?', '?', '?', '?', '?')]
[('sunny', '?', '?', '?', '?', '?'), ('?', 'warm', '?', '?', '?', '?'), ('?', '?', '?', '?', '?', 'same')]
[('sunny', 'warm', '?', 'strong', 'warm', 'same')]
[('sunny', 'warm', '?', 'strong', '?', '?')]
[('sunny', 'warm', '?', 'strong', '?', '?')]
[('sunny', '?', '?', '?', '?', '?'), ('?', 'warm', '?', '?', '?', '?')]
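The last two lines of the output are the converged boundaries of the version space: S = [('sunny', 'warm', '?', 'strong', '?', '?')] and G = [('sunny', '?', '?', '?', '?', '?'), ('?', 'warm', '?', '?', '?', '?')]; every hypothesis lying between these two boundaries is consistent with all four training examples.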
Result:
Hence, the Candidate-Elimination algorithm was implemented and
demonstrated successfully.
EX.NO. :2 Write a program to demonstrate the working of the decision
DATE : tree based ID3 algorithm. Use an appropriate data set for
building the decision tree and apply this knowledge to
classify a new sample.
Aim:
Write a program to demonstrate the working of the decision tree based ID3 algorithm.
Use an appropriate data set for building the decision tree and apply this knowledge to
classify a new sample.
Program:
import math
import numpy as np
from Data_loader import read_data  # read_data is defined in Data_loader.py below

class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""

def subtables(data, col, delete):
    ''' Split the data into one subtable per distinct value of column col '''
    dict = {}
    items = np.unique(data[:, col])
    count = np.zeros((items.shape[0], 1), dtype=np.int32)
    for x in range(items.shape[0]):
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                count[x] += 1
    for x in range(items.shape[0]):
        dict[items[x]] = np.empty((int(count[x]), data.shape[1]), dtype="|S32")
        pos = 0
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                dict[items[x]][pos] = data[y]
                pos += 1
        if delete:
            dict[items[x]] = np.delete(dict[items[x]], col, 1)
    return items, dict

def entropy(S):
    ''' Entropy of the class column S '''
    items = np.unique(S)
    if items.size == 1:
        return 0
    counts = np.zeros((items.shape[0], 1))
    sums = 0
    for x in range(items.shape[0]):
        counts[x] = sum(S == items[x]) / (S.size * 1.0)
    for count in counts:
        sums += -1 * count * math.log(count, 2)
    return sums

def gain_ratio(data, col):
    ''' Information gain of column col divided by its intrinsic value '''
    items, dict = subtables(data, col, delete=False)
    total_size = data.shape[0]
    entropies = np.zeros((items.shape[0], 1))
    intrinsic = np.zeros((items.shape[0], 1))
    for x in range(items.shape[0]):
        ratio = dict[items[x]].shape[0] / (total_size * 1.0)
        entropies[x] = ratio * entropy(dict[items[x]][:, -1])
        intrinsic[x] = ratio * math.log(ratio, 2)
    total_entropy = entropy(data[:, -1])
    iv = -1 * sum(intrinsic)
    for x in range(entropies.shape[0]):
        total_entropy -= entropies[x]
    return total_entropy / iv

def create_node(data, metadata):
    ''' Recursively build the decision tree '''
    if (np.unique(data[:, -1])).shape[0] == 1:
        node = Node("")
        node.answer = np.unique(data[:, -1])[0]
        return node
    gains = np.zeros((data.shape[1] - 1, 1))
    for col in range(data.shape[1] - 1):
        gains[col] = gain_ratio(data, col)
    split = np.argmax(gains)
    node = Node(metadata[split])
    metadata = np.delete(metadata, split, 0)
    items, dict = subtables(data, split, delete=True)
    for x in range(items.shape[0]):
        child = create_node(dict[items[x]], metadata)
        node.children.append((items[x], child))
    return node

def empty(size):
    ''' Indentation string used when printing the tree '''
    s = ""
    for x in range(size):
        s += "   "
    return s

def print_tree(node, level):
    if node.answer != "":
        print(empty(level), node.answer)
        return
    print(empty(level), node.attribute)
    for value, n in node.children:
        print(empty(level + 1), value)
        print_tree(n, level + 2)
Data_loader.py
import csv

def read_data(filename):
    with open(filename, 'r') as csvfile:
        datareader = csv.reader(csvfile, delimiter=',')
        headers = next(datareader)
        metadata = []
        traindata = []
        for name in headers:
            metadata.append(name)
        for row in datareader:
            traindata.append(row)
    return (metadata, traindata)
outlook,temperature,humidity,wind,answer
sunny,hot,high,weak,no
sunny,hot,high,strong,no
overcast,hot,high,weak,yes
rain,mild,high,weak,yes
rain,cool,normal,weak,yes
rain,cool,normal,strong,no
overcast,cool,normal,strong,yes
sunny,mild,high,weak,no
sunny,cool,normal,weak,yes
rain,mild,normal,weak,yes
sunny,mild,normal,strong,yes
overcast,mild,high,strong,yes
overcast,hot,normal,weak,yes
rain,mild,high,strong,no
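With the loader above and the training data saved to a CSV file, a short driver appended to the main program builds the tree and prints it. The filename tennis.csv below is an assumption for the data listed above:

metadata, traindata = read_data("tennis.csv")  # filename assumed for the data listed above
data = np.array(traindata)
node = create_node(data, metadata)
print_tree(node, 0)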
Output:
outlook
   overcast
      b'yes'
   rain
      wind
         b'strong'
            b'no'
         b'weak'
            b'yes'
   sunny
      humidity
         b'high'
            b'no'
         b'normal'
            b'yes'
Result:
Hence, the working of the decision tree based ID3 algorithm was
demonstrated successfully.
EX.NO. :3 Build an Artificial Neural Network by implementing
DATE : the Backpropagation algorithm and test the same
using appropriate data sets.
Aim:
Build an Artificial Neural Network by implementing the Backpropagation
algorithm and test the same using appropriate data sets.
Algorithm:
Step 1 : Create a feed-forward network with n_in inputs, n_hidden hidden units, and n_out
output units.
Step 2 : Initialize all network weights to small random numbers.
Step 3 : Until the termination condition is met, do
Step 4 : For each sample in the training examples, do
Step 5 : Propagate the input forward through the network: input the instance to
the network and compute the output o_u of every unit u in the network.
Step 6 : For each network output unit k, calculate its error term δ_k.
Step 7 : For each hidden unit h, calculate its error term δ_h.
Step 8 : Update each network weight w_ji.
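For the sigmoid units used in the program below, these quantities take the standard textbook form: for each output unit k, δ_k = o_k(1 − o_k)(t_k − o_k); for each hidden unit h, δ_h = o_h(1 − o_h) Σ_k w_kh δ_k; and each weight is updated as w_ji ← w_ji + η δ_j x_ji, where η is the learning rate.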
Program:
import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X / np.amax(X, axis=0)  # normalise each feature column by its maximum
y = y / 100

# Sigmoid function and its derivative (expressed in terms of the activation)
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def derivatives_sigmoid(x):
    return x * (1 - x)

# Variable initialization
epoch = 7000               # setting training iterations
lr = 0.1                   # setting learning rate
inputlayer_neurons = 2     # number of features in the data set
hiddenlayer_neurons = 3    # number of hidden layer neurons
output_neurons = 1         # number of neurons at the output layer

# weight and bias initialization: draws numbers uniformly at random of dim x*y
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

for i in range(epoch):
    # Forward propagation
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)

    # Backpropagation
    EO = y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)  # how much the hidden layer weights contributed to the error
    d_hiddenlayer = EH * hiddengrad
    wout += hlayer_act.T.dot(d_output) * lr  # dot product of next-layer error and current-layer output
    # bout += np.sum(d_output, axis=0, keepdims=True) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
    # bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr

print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)
Output:
Input:
[[ 0.66666667  1.        ]
 [ 0.33333333  0.55555556]
 [ 1.          0.66666667]]
Actual Output:
[[ 0.92]
 [ 0.86]
 [ 0.89]]
Predicted Output:
[[ 0.89559591]
 [ 0.88142069]
 [ 0.8928407 ]]
Result:
Hence, an Artificial Neural Network was built by implementing the Backpropagation
algorithm and tested successfully.
EX.NO. : 4 Write a program to implement the naïve Bayesian classifier
DATE : for a sample training data set stored as a .CSV file. Compute
the accuracy of the classifier, considering a few test data sets.
Aim:
To write a program to implement the naïve Bayesian classifier for a sample training
data set stored as a .CSV file and to compute the accuracy of the classifier,
considering a few test data sets.
Program:
import csv
import math
import random

def loadCsv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    for i in range(len(dataset)):
        # converting strings into numbers for processing
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

def mean(numbers):
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x - avg, 2) for x in numbers]) / float(len(numbers) - 1)
    return math.sqrt(variance)

def summarize(dataset):
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*dataset)]
    del summaries[-1]  # drop the summary of the class column
    return summaries

def summarizeByClass(dataset):
    separated = separateByClass(dataset)
    summaries = {}
    for classValue, instances in separated.items():
        # summaries is a dict of tuples (mean, std) for each class value
        summaries[classValue] = summarize(instances)
    return summaries

def main():
    filename = '5data.csv'
    splitRatio = 0.67
    dataset = loadCsv(filename)

main()
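The listing above calls helpers (separateByClass, a train/test split, prediction, and accuracy) that are not shown. A minimal sketch of those missing pieces, assuming the usual Gaussian naive Bayes formulation, might look like:

def separateByClass(dataset):
    # group rows by their class value (assumed to be the last column)
    separated = {}
    for row in dataset:
        separated.setdefault(row[-1], []).append(row)
    return separated

def splitDataset(dataset, splitRatio):
    # random train/test split
    trainSize = int(len(dataset) * splitRatio)
    copy = list(dataset)
    random.shuffle(copy)
    return copy[:trainSize], copy[trainSize:]

def calculateProbability(x, mean, stdev):
    # Gaussian probability density of x under N(mean, stdev^2)
    exponent = math.exp(-(pow(x - mean, 2) / (2 * pow(stdev, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

def predict(summaries, inputVector):
    # pick the class with the highest product of per-attribute densities
    best, bestProb = None, -1
    for classValue, classSummaries in summaries.items():
        prob = 1
        for i, (m, s) in enumerate(classSummaries):
            prob *= calculateProbability(inputVector[i], m, s)
        if prob > bestProb:
            best, bestProb = classValue, prob
    return best

def getAccuracy(testSet, predictions):
    correct = sum(1 for i in range(len(testSet)) if testSet[i][-1] == predictions[i])
    return (correct / float(len(testSet))) * 100.0

main() would then split the dataset with splitDataset, call summarizeByClass on the training part, predict each test row, and print getAccuracy(testSet, predictions).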
Output:
Confusion matrix is as follows:
[[17  0  0]
 [ 0 17  0]
 [ 0  0 11]]
Accuracy metrics:
precision recall f1-score support
Result :
Hence, the naïve Bayesian classifier for a sample training data set was
implemented successfully.
EX.NO. :5 Assuming a set of documents that need to be classified, use the
DATE : naïve Bayesian Classifier model to perform this task. Built-in
Java classes/API can be used to write the program. Calculate the
accuracy, precision, and recall for your data set.
Aim:
To classify a set of documents using the naïve Bayesian Classifier model and to
calculate the accuracy, precision, and recall for the data set.
Algorithm:
Step 1 : Collect all words, punctuation, and other tokens that occur in Examples
Step 2 : Vocabulary ← the set of all distinct words and other tokens occurring in any
text document from Examples
Step 3 : Calculate the required P(vj) and P(wk|vj) probability terms
Step 4 : For each target value vj in V do
Step 5 : docsj ← the subset of documents from Examples for which the target value
is vj
Step 6 : P(vj) ← | docsj | / |Examples|
Step 7 : Textj ← a single document created by concatenating all members of docsj
Step 8 : n ← total number of distinct word positions in Textj
Step 9 : for each word wk in Vocabulary
Step 10: nk ← number of times word wk occurs in Textj
Step 11: P(wk|vj) ← ( nk + 1) / (n + | Vocabulary| )
Program:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer

msg = pd.read_csv('naivetext1.csv', names=['message', 'label'])
print('The dimensions of the dataset', msg.shape)
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum
print(X)
print(y)

# split the data and build the bag-of-words count matrix
xtrain, xtest, ytrain, ytest = train_test_split(X, y)
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm = count_vect.transform(xtest)
df = pd.DataFrame(xtrain_dtm.toarray(), columns=count_vect.get_feature_names())
print(df)           # tabular representation
print(xtrain_dtm)   # sparse matrix representation
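The fragment above stops before the classifier itself. A minimal sketch of the missing training and evaluation step, assuming sklearn's MultinomialNB, might look like:

from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

clf = MultinomialNB().fit(xtrain_dtm, ytrain)  # train on the count matrix
predicted = clf.predict(xtest_dtm)             # classify the held-out documents
print('Accuracy:', metrics.accuracy_score(ytest, predicted))
print('Precision:', metrics.precision_score(ytest, predicted))
print('Recall:', metrics.recall_score(ytest, predicted))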
Result:
Hence, a set of documents was classified successfully using the naïve
Bayesian Classifier model.
EX.NO. : 6 Write a program to construct a Bayesian network considering
DATE : medical data. Use this model to demonstrate the diagnosis of heart
patients using standard Heart Disease Data Set. You can use
Java/Python ML library classes/API.
Aim:
To write a program to construct a Bayesian network considering medical
data.
Algorithm:
Step 1 : A Bayesian network is a directed acyclic graph in which each edge
corresponds to a conditional dependency, and each node corresponds to a
unique random variable.
Step 2 : Bayesian network consists of two major parts: a directed acyclic graph and
a set of conditional probability distributions
Step 3 : The directed acyclic graph is a set of random variables represented by
nodes.
Step 4 : The conditional probability distribution of a node (random variable) is
defined for every possible outcome of the preceding causal node(s).
Program:
from pomegranate import *

asia = DiscreteDistribution({'True': 0.5, 'False': 0.5})

tuberculosis = ConditionalProbabilityTable(
    [['True', 'True', 0.2],
     ['True', 'False', 0.8],
     ['False', 'True', 0.01],
     ['False', 'False', 0.98]], [asia])

bronchitis = ConditionalProbabilityTable(
    [['True', 'True', 0.92],
     ['True', 'False', 0.08],
     ['False', 'True', 0.03],
     ['False', 'False', 0.98]], [smoking])

tuberculosis_or_cancer = ConditionalProbabilityTable(
    [['True', 'True', 'True', 1.0],
     ['True', 'True', 'False', 0.0],
     ['True', 'False', 'True', 1.0],
     ['True', 'False', 'False', 0.0],
     ['False', 'True', 'True', 1.0],
     ['False', 'True', 'False', 0.0],
     ['False', 'False', 'True', 1.0],
     ['False', 'False', 'False', 0.0]], [tuberculosis, lung])

xray = ConditionalProbabilityTable(
    [['True', 'True', 0.885],
     ['True', 'False', 0.115],
     ['False', 'True', 0.04],
     ['False', 'False', 0.96]], [tuberculosis_or_cancer])

dyspnea = ConditionalProbabilityTable(
    [['True', 'True', 'True', 0.96],
     ['True', 'True', 'False', 0.04],
     ['True', 'False', 'True', 0.89],
     ['True', 'False', 'False', 0.11],
     ['False', 'True', 'True', 0.96],
     ['False', 'True', 'False', 0.04],
     ['False', 'False', 'True', 0.89],
     ['False', 'False', 'False', 0.11]], [tuberculosis_or_cancer, bronchitis])

s0 = State(asia, name="asia")
s1 = State(tuberculosis, name="tuberculosis")
s2 = State(smoking, name="smoker")

network = BayesianNetwork("asia")
network.add_states(s0, s1, s2)
network.add_edge(s0, s1)
network.add_edge(s1, s2)
network.bake()
print(network.predict_proba({'tuberculosis': 'True'}))
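As written, the program references two nodes, smoking and lung, whose distributions are never defined in the listing. A minimal sketch of the missing definitions is below; the probability values here are placeholders, not values from the source, and must be replaced with the intended ones:

smoking = DiscreteDistribution({'True': 0.5, 'False': 0.5})  # placeholder prior

lung = ConditionalProbabilityTable(                          # placeholder probabilities
    [['True', 'True', 0.75],
     ['True', 'False', 0.25],
     ['False', 'True', 0.02],
     ['False', 'False', 0.98]], [smoking])

These definitions must appear before the tables that reference them.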
Result :
Hence, a Bayesian network considering medical data was constructed and
implemented successfully.
EX.NO. : 7 Apply EM algorithm to cluster a set of data stored in a .CSV
DATE : file. Use the same data set for clustering using k-Means
algorithm. Compare the results of these two algorithms and
comment on the quality of clustering. You can add
Java/Python ML library classes/API in the program.
Aim:
To cluster a set of data stored in a .CSV file using the EM algorithm, cluster the same
data using the k-Means algorithm, and compare the results of the two algorithms.
Program:
def draw_ellipse(position, covariance, ax=None, **kwargs):
    ''' Draw an ellipse for one Gaussian component of the mixture '''
    ax = ax or plt.gca()
    if covariance.shape == (2, 2):
        # full covariance: orient the ellipse along the principal axes
        U, s, Vt = np.linalg.svd(covariance)
        angle = np.degrees(np.arctan2(U[1, 0], U[0, 0]))
        width, height = 2 * np.sqrt(s)
    else:
        # spherical/diagonal covariance: axis-aligned ellipse
        angle = 0
        width, height = 2 * np.sqrt(covariance)
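Only the ellipse-drawing helper of the EM part survives above. A minimal sketch of the EM clustering step itself, assuming sklearn's GaussianMixture and the two-feature CSV listed below (the filename driverdata.csv is an assumption), might look like:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture

data = pd.read_csv('driverdata.csv')           # filename assumed; data listed below
X = data[['Distance_Feature', 'Speeding_Feature']].values
gmm = GaussianMixture(n_components=3).fit(X)   # EM fit with 3 Gaussian components
labels = gmm.predict(X)                        # cluster assignment for each driver
plt.scatter(X[:, 0], X[:, 1], c=labels)
plt.show()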
K-means
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

data = pd.read_csv('driverdata.csv')  # filename assumed; data listed below
f1 = data['Distance_Feature'].values
f2 = data['Speeding_Feature'].values
X = np.matrix(list(zip(f1, f2)))
plt.plot()
plt.xlim([0, 100])
plt.ylim([0, 50])
plt.title('Dataset')
plt.ylabel('Speeding_Feature')
plt.xlabel('Distance_Feature')
plt.scatter(f1, f2)
plt.show()
# KMeans algorithm, K = 3
kmeans_model = KMeans(n_clusters=3).fit(X)
colors = ['b', 'g', 'r']   # one colour/marker per cluster (choice arbitrary)
markers = ['o', 'v', 's']
plt.plot()
for i, l in enumerate(kmeans_model.labels_):
    plt.plot(f1[i], f2[i], color=colors[l], marker=markers[l], ls='None')
plt.xlim([0, 100])
plt.ylim([0, 50])
plt.show()
Driver_ID,Distance_Feature,Speeding_Feature
3423311935,71.24,28
3423313212,52.53,25
3423313724,64.54,27
3423311373,55.69,22
3423310999,54.58,25
3423313857,41.91,10
3423312432,58.64,20
3423311434,52.02,8
3423311328,31.25,34
3423312488,44.31,19
3423311254,49.35,40
3423312943,58.07,45
3423312536,44.22,22
3423311542,55.73,19
3423312176,46.63,43
3423314176,52.97,32
3423314202,46.25,35
3423311346,51.55,27
3423310666,57.05,26
3423313527,58.45,30
3423312182,43.42,23
3423313590,55.68,37
3423312268,55.15,18
Result :
Hence, the data stored in a .CSV file was clustered using both the EM and k-Means
algorithms, and their results were compared successfully.
EX.NO. :8
DATE :
Write a program to implement the k-Nearest Neighbor algorithm to classify
the iris data set. Print both correct and wrong predictions. Java/Python
ML library classes can be used for this problem.
Aim:
To write a program to implement the k-Nearest Neighbor algorithm to classify the
iris data set, printing both correct and wrong predictions.
Algorithm:
Step 1 : Training algorithm – for each training example (x, f(x)), add the example
to the list training_examples.
Step 2 : Given a query instance xq to be classified,
Step 3 : let x1 … xk denote the k instances from training_examples that are
nearest to xq, and return
f(xq) ← Σ(i = 1 to k) f(xi) / k
Step 4 : where f(xi) is the target value of the i-th nearest training example, so
f(xq) is the mean value of the k nearest training examples.
Program:
import csv
import random
import math
import operator

def getResponse(neighbors):
    # majority vote over the class labels of the neighbours
    classVotes = {}
    for x in range(len(neighbors)):
        response = neighbors[x][-1]
        if response in classVotes:
            classVotes[response] += 1
        else:
            classVotes[response] = 1
    sortedVotes = sorted(classVotes.items(), key=operator.itemgetter(1), reverse=True)
    return sortedVotes[0][0]

def main():
    # prepare data
    trainingSet = []
    testSet = []
    split = 0.67
    loadDataset('knndat.data', split, trainingSet, testSet)
    print('Train set: ' + repr(len(trainingSet)))
    print('Test set: ' + repr(len(testSet)))
    # generate predictions
    predictions = []
    k = 3
    for x in range(len(testSet)):
        neighbors = getNeighbors(trainingSet, testSet[x], k)
        result = getResponse(neighbors)
        predictions.append(result)
        print('> predicted=' + repr(result) + ', actual=' + repr(testSet[x][-1]))
    accuracy = getAccuracy(testSet, predictions)
    print('Accuracy: ' + repr(accuracy) + '%')

main()
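main() relies on loadDataset, getNeighbors, and getAccuracy, which are missing from the listing. A minimal sketch of those helpers, assuming the comma-separated iris file knndat.data referenced above (four numeric features followed by a class label), might be:

def loadDataset(filename, split, trainingSet, testSet):
    # randomly assign each row to the training or test set
    with open(filename, 'r') as csvfile:
        dataset = list(csv.reader(csvfile))
    for row in dataset:
        for i in range(4):
            row[i] = float(row[i])
        if random.random() < split:
            trainingSet.append(row)
        else:
            testSet.append(row)

def euclideanDistance(instance1, instance2, length):
    distance = 0
    for i in range(length):
        distance += pow(instance1[i] - instance2[i], 2)
    return math.sqrt(distance)

def getNeighbors(trainingSet, testInstance, k):
    # k training rows closest to the test instance
    distances = []
    for row in trainingSet:
        dist = euclideanDistance(testInstance, row, len(testInstance) - 1)
        distances.append((row, dist))
    distances.sort(key=operator.itemgetter(1))
    return [distances[i][0] for i in range(k)]

def getAccuracy(testSet, predictions):
    correct = sum(1 for i in range(len(testSet)) if testSet[i][-1] == predictions[i])
    return (correct / float(len(testSet))) * 100.0

These would need to be defined before main() is called.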
OUTPUT:
Result :
Hence, k-Nearest Neighbor algorithm to classify the iris data set was
implemented successfully.
EX.NO. : 9 Implement the non-parametric Locally Weighted Regression
DATE : algorithm in order to fit data points. Select an appropriate data
set for your experiment and draw graphs.
Aim:
To implement the non-parametric Locally Weighted Regression algorithm in order to
fit data points, selecting an appropriate data set and drawing graphs.
Algorithm:
Step 1 : Read the given data sample into X and the curve (linear or non-linear) into Y.
Step 2 : Set the value of the smoothing (free) parameter τ.
Step 3 : Set the bias / point of interest x0, which is a subset of X.
Step 4 : Determine the weight matrix W (e.g. the Gaussian kernel
w(i) = exp(−(x0 − x(i))² / (2τ²))).
Step 5 : Determine the value of the model parameter β using:
β(x0) = (XᵀWX)⁻¹ XᵀWy
Step 6 : Prediction = x0 · β(x0)
Program:
from numpy import *
import operator
from os import listdir
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np1
import numpy.linalg as np
from scipy.stats.stats import pearsonr

def kernel(point, xmat, k):
    # diagonal matrix of Gaussian weights centred on the query point
    m, n = np1.shape(xmat)
    weights = np1.mat(np1.eye(m))
    for j in range(m):
        diff = point - xmat[j]
        weights[j, j] = np1.exp(diff * diff.T / (-2.0 * k ** 2))
    return weights

def localWeight(point, xmat, ymat, k):
    wei = kernel(point, xmat, k)
    W = (xmat.T * (wei * xmat)).I * (xmat.T * (wei * ymat.T))
    return W

def localWeightRegression(xmat, ymat, k):
    m, n = np1.shape(xmat)
    ypred = np1.zeros(m)
    for i in range(m):
        ypred[i] = xmat[i] * localWeight(xmat[i], xmat, ymat, k)
    return ypred
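The listing stops before any data is loaded or plotted. A minimal driver sketch, assuming a tips.csv file with total_bill and tip columns (both the file and column names are assumptions), might be:

# load data points (filename and column names assumed)
data = pd.read_csv('tips.csv')
bill = np1.array(data.total_bill)
tip = np1.array(data.tip)

# build the design matrix [1, bill] and run LWR with tau = 2
mbill = np1.mat(bill)
mtip = np1.mat(tip)
m = np1.shape(mbill)[1]
one = np1.mat(np1.ones(m))
X = np1.hstack((one.T, mbill.T))
ypred = localWeightRegression(X, mtip, 2)

# plot the fitted curve over the scatter of data points
SortIndex = X[:, 1].argsort(0)
xsort = X[SortIndex][:, 0]
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(bill, tip, color='green')
ax.plot(xsort[:, 1], ypred[SortIndex], color='red', linewidth=5)
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show()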
Result:
Hence, the non-parametric Locally Weighted Regression algorithm was implemented
successfully to fit data points.