Ad3461 ML Lab Manual Format Edited
AIM:
To implement the Candidate-Elimination algorithm and output the set of all hypotheses consistent with the given training examples, using Python.
ALGORITHM:
Step 1: Initialize the boundary hypotheses.
● Initialize the most general hypothesis (h_G) to the maximally general hypothesis (all attributes set to '?').
● Initialize the most specific hypothesis (h_S) to the maximally specific hypothesis (all attributes set to the values of the first positive example, or 'null' if not possible).
Step 2: Process each training example.
● For each attribute of h_S that does not match a positive example, make h_S more general by replacing that attribute with '?', and drop the corresponding constraint from h_G.
● Attributes of h_S that already match the positive example are left unchanged.
● For each attribute whose value in a negative example differs from h_S, make h_G more specific by constraining that attribute to the value kept in h_S.
● Attributes that match the negative example place no constraint and remain '?' in h_G.
Step 3: Refine the version space.
● Remove from h_G any hypothesis that is less general than another member of h_G, and remove from h_S any hypothesis that is more general than another member of h_S.
● Keep iterating through the training examples and refining the version space until every example has been processed; the remaining hypotheses between h_S and h_G are exactly those that correctly classify all the training examples.
PROGRAM:
import numpy as np
import pandas as pd

# Load the training examples; the last column holds the target concept
data = pd.read_csv('finds1.csv')
concepts = np.array(data.iloc[:, 0:-1])
target = np.array(data.iloc[:, -1])

# Initialise S to the first example and G to the most general hypothesis
specific_h = concepts[0].copy()
general_h = [['?' for _ in range(len(specific_h))] for _ in range(len(specific_h))]
print(specific_h)
print(general_h)

for i, h in enumerate(concepts):
    if target[i] == "Yes":
        # Positive example: generalise S and relax the matching constraints in G
        for x in range(len(specific_h)):
            if h[x] != specific_h[x]:
                specific_h[x] = '?'
                general_h[x][x] = '?'
    if target[i] == "No":
        # Negative example: specialise G using the values retained in S
        for x in range(len(specific_h)):
            if h[x] != specific_h[x]:
                general_h[x][x] = specific_h[x]
            else:
                general_h[x][x] = '?'
    print("Specific_h", i + 1)
    print(specific_h)
    print("general_h", i + 1)
    print(general_h)

# Discard the rows of G that carry no constraint at all
indices = [i for i, val in enumerate(general_h) if val == ['?'] * len(specific_h)]
for i in indices:
    general_h.remove(['?'] * len(specific_h))

print("Final Specific_h:", specific_h)
print("Final General_h:", general_h)
OUTPUT:
Specific_h 8
general_h 8
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', 'Strong', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
Final Specific_h:
Final General_h:
RESULT:
Thus the Candidate-Elimination algorithm has been implemented successfully.
AIM:
To implement the ID3 decision tree algorithm, build the decision tree from a given data set, and use the tree to classify new samples, using Python.
ALGORITHM:
Step 1: Start the program and import the required libraries.
Step 2: Load the dataset and organize it into a table, with rows representing instances and columns representing features. The last column should contain the class labels.
Step 3: Define a function to calculate the entropy of the dataset. Entropy measures the uncertainty in the dataset based on the class distribution.
Step 4: For each feature, calculate the information gain. Information gain measures how much a feature contributes to reducing the uncertainty in the dataset (both formulas are given after this list).
Step 5: Select the feature with the highest information gain as the best feature to split the dataset.
Step 6: Divide the dataset into subsets based on the values of the best feature found in Step 5.
Step 7: Repeat Steps 3 to 6 recursively for each subset until every subset contains instances of a single class or no features remain.
Step 8: Build the decision tree by assigning the best feature as the splitting criterion at each internal node and the majority class as the class label for each leaf node.
Step 9: Use the created decision tree to classify new instances by traversing the tree from the root to the appropriate leaf node based on their feature values.
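For reference, the two quantities used in Steps 3 and 4 are the standard ID3 definitions:

Entropy(S) = -\sum_{i=1}^{c} p_i \log_2 p_i

Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)

where p_i is the proportion of examples in S that belong to class i, and S_v is the subset of S for which attribute A takes the value v.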
PROGRAM:
import pandas as pd
import numpy as np

dataset = pd.read_csv('playtennis.csv',
                      names=['outlook', 'temperature', 'humidity', 'wind', 'class'])

def entropy(target_col):
    # Entropy of the class distribution in target_col
    elements, counts = np.unique(target_col, return_counts=True)
    entropy = np.sum([(-counts[i] / np.sum(counts)) * np.log2(counts[i] / np.sum(counts))
                      for i in range(len(elements))])
    return entropy

def InfoGain(data, split_attribute_name, target_name="class"):
    # Information gain = entropy of the whole set minus the weighted
    # entropy of the subsets produced by splitting on the attribute
    total_entropy = entropy(data[target_name])
    vals, counts = np.unique(data[split_attribute_name], return_counts=True)
    Weighted_Entropy = np.sum([(counts[i] / np.sum(counts)) *
                               entropy(data.where(data[split_attribute_name] == vals[i]).dropna()[target_name])
                               for i in range(len(vals))])
    Information_Gain = total_entropy - Weighted_Entropy
    return Information_Gain

def ID3(data, originaldata, features, target_attribute_name="class", parent_node_class=None):
    # All remaining examples have the same class: return that class
    if len(np.unique(data[target_attribute_name])) <= 1:
        return np.unique(data[target_attribute_name])[0]
    # No examples left: return the majority class of the original dataset
    elif len(data) == 0:
        return np.unique(originaldata[target_attribute_name])[
            np.argmax(np.unique(originaldata[target_attribute_name], return_counts=True)[1])]
    # No features left: return the parent node's majority class
    elif len(features) == 0:
        return parent_node_class
    else:
        # Majority class of the current data becomes the default for the children
        parent_node_class = np.unique(data[target_attribute_name])[
            np.argmax(np.unique(data[target_attribute_name], return_counts=True)[1])]
        # Choose the feature with the highest information gain
        item_values = [InfoGain(data, feature, target_attribute_name) for feature in features]
        best_feature_index = np.argmax(item_values)
        best_feature = features[best_feature_index]
        tree = {best_feature: {}}
        features = [f for f in features if f != best_feature]
        # Grow a branch for each value of the best feature
        for value in np.unique(data[best_feature]):
            sub_data = data.where(data[best_feature] == value).dropna()
            subtree = ID3(sub_data, dataset, features, target_attribute_name, parent_node_class)
            tree[best_feature][value] = subtree
        return tree

tree = ID3(dataset, dataset, dataset.columns[:-1])
print("Display Tree\n", tree)
OUTPUT:
Display Tree
{'outlook': {'Overcast': 'Yes', 'Rain': {'wind': {'Strong': 'No', 'Weak': 'Yes'}}, 'Sunny': {'humidity': {'High': 'No', 'Normal': 'Yes'}}}}
RESULT:
Thus the ID3 decision tree algorithm has been implemented successfully.
AIM:
To implement an Artificial Neural Network using the Back-Propagation algorithm with a Python script.
ALGORITHM:
Step 1: Present the inputs X to the network through the preconnected paths.
Step 2: Model the input using real weights W. The weights are usually chosen randomly.
Step 3: Calculate the output of each neuron from the input layer, through the hidden layer, to the output layer.
Step 4: Calculate the error at the output: Error = Actual output - Desired output.
Step 5: From the output layer, go back to the hidden layer and adjust the weights so that the error decreases.
Step 6: Repeat the process until the desired output is achieved.
PROGRAM:
import numpy as np

# Training data: two input features and one target value per example
X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X / np.amax(X, axis=0)   # normalize the features
y = y / 100                  # scale the target to [0, 1]

# Sigmoid function and its derivative
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def derivatives_sigmoid(x):
    return x * (1 - x)

# Variable initialization
epoch = 5000                 # number of training iterations
lr = 0.1                     # learning rate
inputlayer_neurons = 2       # number of input features
hiddenlayer_neurons = 3      # number of hidden units
output_neurons = 1           # number of output units
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

for i in range(epoch):
    # Forward Propagation
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)
    # Backpropagation
    EO = y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)
    d_hiddenlayer = EH * hiddengrad
    wout += hlayer_act.T.dot(d_output) * lr
    wh += X.T.dot(d_hiddenlayer) * lr

print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)
OUTPUT:
Input:
[[ 0.66666667 1. ]
[ 0.33333333 0.55555556]
[ 1. 0.66666667]]
Actual Output:
[[ 0.92]
[ 0.86]
[ 0.89]]
Predicted Output:
[[ 0.89559591]
[ 0.88142069]
[ 0.8928407 ]]
RESULT:
Thus the implementation of back propagation algorithm has been done successfully.
AIM:
To implement the Naïve Bayesian classifier to classify a set of text messages stored in a .CSV file and to compute the accuracy, precision, and recall of the classifier, using Python.
ALGORITHM:
PROGRAM:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

msg = pd.read_csv('naivetext1.csv', names=['message', 'label'])
msg['labelnum']=msg.label.map({'pos':1,'neg':0})
X=msg.message
y=msg.labelnum
print(X)
print(y)
xtrain,xtest,ytrain,ytest=train_test_split(X,y)
print(xtest.shape)
print(xtrain.shape)
print(ytest.shape)
print(ytrain.shape)
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm=count_vect.transform(xtest)
clf = MultinomialNB().fit(xtrain_dtm,ytrain)
predicted = clf.predict(xtest_dtm)
print('Accuracy metrics')
print('Confusion matrix')
print(metrics.confusion_matrix(ytest,predicted))
print(metrics.recall_score(ytest,predicted))
print(metrics.precision_score(ytest,predicted))
OUTPUT:
8 He is my sworn enemy
9 My boss is horrible
12 I love to dance
0     1
1     1
2     1
3     1
4     1
5     0
6     0
7     0
8     0
9     0
10 1
11 0
12 1
13 0
14 1
15 0
16 1
17 0
(5,)
(13,)
(5,)
(13,)
Accuracy metrics
Confusion matrix
[[3 1]
[0 1]]
1.0
0.5
RESULT:
Thus the implementation of Naive Bayesian Classifier algorithm has been done
successfully.
AIM:
To implement the Naïve Bayesian classifier model to classify the document set using Python.
ALGORITHM:
PROGRAM:
import csv
import random
import math

def loadCsv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    for i in range(len(dataset)):
        # converting strings into numbers for processing
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset
def splitDataset(dataset, splitRatio):
    # randomly split the data set into a training set and a test set
    trainSize = int(len(dataset) * splitRatio)
    trainSet = []
    copy = list(dataset)
    while len(trainSet) < trainSize:
        index = random.randrange(len(copy))
        trainSet.append(copy.pop(index))
    return [trainSet, copy]
def separateByClass(dataset):
    # creates a dictionary keyed by class value (1 and 0); each entry holds
    # the instances belonging to that class
    separated = {}
    for i in range(len(dataset)):
        vector = dataset[i]
        if vector[-1] not in separated:
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

def mean(numbers):
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x - avg, 2) for x in numbers]) / float(len(numbers) - 1)
    return math.sqrt(variance)

def summarize(dataset):
    # (mean, stdev) of every attribute column; drop the class column
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*dataset)]
    del summaries[-1]
    return summaries

def summarizeByClass(dataset):
    separated = separateByClass(dataset)
    summaries = {}
    for classValue, instances in separated.items():
        # summaries is a dict of (mean, stdev) tuples for each class value
        summaries[classValue] = summarize(instances)
    return summaries
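# Sketch (assumption): the Gaussian probability and prediction helpers that a
# complete naive Bayes implementation typically builds on top of the class
# summaries computed above; the names and details here are illustrative only.
def calculateProbability(x, mean_, stdev_):
    # Gaussian probability density of x under N(mean_, stdev_^2)
    exponent = math.exp(-(math.pow(x - mean_, 2) / (2 * math.pow(stdev_, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev_)) * exponent

def calculateClassProbabilities(summaries, inputVector):
    # Multiply the per-attribute densities for each class
    probabilities = {}
    for classValue, classSummaries in summaries.items():
        probabilities[classValue] = 1
        for i in range(len(classSummaries)):
            m, s = classSummaries[i]
            probabilities[classValue] *= calculateProbability(inputVector[i], m, s)
    return probabilities

def predict(summaries, inputVector):
    # Choose the class with the highest probability
    probabilities = calculateClassProbabilities(summaries, inputVector)
    return max(probabilities, key=probabilities.get)

def getPredictions(summaries, testSet):
    return [predict(summaries, testSet[i]) for i in range(len(testSet))]

def getAccuracy(testSet, predictions):
    correct = sum(1 for i in range(len(testSet)) if testSet[i][-1] == predictions[i])
    return (correct / float(len(testSet))) * 100.0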
def main():
    filename = '5data.csv'
    splitRatio = 0.67
    dataset = loadCsv(filename)
    trainingSet, testSet = splitDataset(dataset, splitRatio)
    summaries = summarizeByClass(trainingSet)

main()
OUTPUT:
confusion matrix is as follows
[[17  0  0]
 [ 0 17  0]
 [ 0  0 11]]
Accuracy metrics
precision recall f1-score support
RESULT:
Thus the implementation of Naïve Bayesian Classifier model has been done successfully.
AIM:
To implement a Bayesian network to diagnose an infection using the WHO dataset with a Python script.
ALGORITHM:
Step 1:
Step 2:
Step 3:
Step 4:
Step 5:
Step 6:
Step 7:
Step 8:
Step 9:
PROGRAM:
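A minimal sketch of one way such a diagnostic network can be built with the pgmpy library; the network structure, the file name 'who_infection.csv', and its column names are illustrative assumptions only:

import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

data = pd.read_csv('who_infection.csv')   # hypothetical data file with discrete columns

# Define the network structure: each edge points from a cause to an effect
model = BayesianNetwork([('exposure', 'infection'),
                         ('age', 'infection'),
                         ('infection', 'fever'),
                         ('infection', 'cough')])

# Learn the conditional probability tables from the data
model.fit(data, estimator=MaximumLikelihoodEstimator)

# Query the probability of infection given observed symptoms
infer = VariableElimination(model)
result = infer.query(variables=['infection'], evidence={'fever': 1, 'cough': 1})
print(result)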
OUTPUT:
RESULT:
Thus the implementation of a Bayesian network to diagnose an infection using the WHO dataset has been done successfully.
AIM:
To implement the EM algorithm to cluster a data set using Python.
ALGORITHM:
Step 1: Load the data set and select the features to be clustered.
Step 2: Choose the number of clusters k and initialize the mean, covariance, and mixing weight of each Gaussian component.
Step 3: E-step: for every data point, compute the responsibility (posterior probability) of each component using the current parameters.
Step 4: M-step: re-estimate each component's mean, covariance, and mixing weight from the responsibilities.
Step 5: Repeat the E and M steps until the log-likelihood converges, then assign each point to the component with the highest responsibility (a single iteration is sketched after these steps).
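For illustration, a minimal sketch of a single EM iteration for a two-component, one-dimensional Gaussian mixture; the data values and starting parameters below are toy assumptions, not taken from the data set used in the program:

import numpy as np

x = np.array([1.0, 1.2, 0.8, 5.0, 5.5, 4.8])   # toy data
mu = np.array([0.0, 4.0])                       # initial means
var = np.array([1.0, 1.0])                      # initial variances
pi = np.array([0.5, 0.5])                       # initial mixing weights

def gaussian(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# E-step: responsibility of each component for each point
resp = np.vstack([pi[k] * gaussian(x, mu[k], var[k]) for k in range(2)]).T
resp = resp / resp.sum(axis=1, keepdims=True)

# M-step: re-estimate the parameters from the responsibilities
Nk = resp.sum(axis=0)
mu = (resp * x[:, None]).sum(axis=0) / Nk
var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
pi = Nk / len(x)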
PROGRAM:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture
from sklearn.cluster import KMeans

X = pd.read_csv("kmeansdata.csv")
x1 = X['Distance_Feature'].values
x2 = X['Speeding_Feature'].values
X = np.array(list(zip(x1, x2))).reshape(len(x1), 2)

plt.plot()
plt.xlim([0, 100])
plt.ylim([0, 50])
plt.title('Dataset')
plt.scatter(x1, x2)
plt.show()
#code for EM
gmm = GaussianMixture(n_components=3)
gmm.fit(X)
em_predictions = gmm.predict(X)
print("\nEM predictions")
print(em_predictions)
print("mean:\n",gmm.means_)
print('\n')
print("Covariances\n",gmm.covariances_)
print(X)
plt.title('Expectation Maximization')
plt.scatter(X[:,0], X[:,1],c=em_predictions,s=50)
plt.show()
kmeans = KMeans(n_clusters=3)
kmeans.fit(X)
print(kmeans.cluster_centers_)
print(kmeans.labels_)
plt.title('KMEANS')
plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, s=50)
plt.show()
OUTPUT:
EM predictions
[0 0 0 1 0 1 1 1 2 1 2 2 1 1 2 1 2 1 0 1 0 1 1]
mean:
[[57.70629058 25.73574491]
 [52.12044022 22.46250453]
 [46.4364858  39.43288647]]

Covariances
[[[83.51878796 14.926902  ]
  [14.926902    2.70846907]]
 [[29.95910352 15.83416554]
  [15.83416554 67.01175729]]
 [[79.34811849 29.55835938]
  [29.55835938 18.17157304]]]
[[71.24 28.  ]
 [52.53 25.  ]
 [64.54 27.  ]
 [55.69 22.  ]
 [54.58 25.  ]
 [41.91 10.  ]
 [58.64 20.  ]
 [52.02  8.  ]
 [31.25 34.  ]
 [44.31 19.  ]
 [49.35 40.  ]
 [58.07 45.  ]
 [44.22 22.  ]
 [55.73 19.  ]
 [46.63 43.  ]
 [52.97 32.  ]
 [46.25 35.  ]
 [51.55 27.  ]
 [57.05 26.  ]
 [58.45 30.  ]
 [43.42 23.  ]
 [55.68 37.  ]
 [55.15 18.  ]]
[[57.74090909 24.27272727]
 [48.6        38.        ]
 [45.176      16.4       ]]
[0 0 0 0 0 2 0 2 1 2 1 1 2 0 1 1 1 0 0 0 2 1 0]
RESULT:
Thus the EM Algorithm to cluster a data set has been implemented successfully.
AIM:
To implement the K-Nearest Neighbour algorithm to classify the iris data set and to print both correct and wrong predictions, using Python.
ALGORITHM:
Step 1: Start the Program
Step 3: Create the data set; scikit-learn provides many tools for creating synthetic data sets, or an existing data set can be loaded from a CSV file.
Step 7: Make predictions with the KNN classifier: forecast the target values for the test set and compare them with the actual values.
PROGRAM:
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics
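# Sketch (assumption) of the data-loading and training steps the listing below
# relies on; the file name 'iris.csv', the test split, and n_neighbors are
# illustrative only.
dataset = pd.read_csv('iris.csv', header=None)   # hypothetical file name
X = dataset.iloc[:, :-1]    # the four measurement columns
y = dataset.iloc[:, -1]     # the class label column
print(X.head())

# Hold out part of the data for testing
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.25)

# Train the K-Nearest Neighbour classifier
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(Xtrain, ytrain)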
ypred = classifier.predict(Xtest)
i=0
print ("\n-------------------------------------------------------------------------")
print ('%-25s %-25s %-25s' % ('Original Label', 'Predicted Label', 'Correct/Wrong'))
print ("-------------------------------------------------------------------------")
for label in ytest:
    print('%-25s %-25s' % (label, ypred[i]), end="")
    if label == ypred[i]:
        print(' %-25s' % ('Correct'))
    else:
        print(' %-25s' % ('Wrong'))
    i = i + 1
print ("-------------------------------------------------------------------------")
print("\nConfusion Matrix:\n",metrics.confusion_matrix(ytest, ypred))
print ("-------------------------------------------------------------------------")
print("\nClassification Report:\n",metrics.classification_report(ytest, ypred))
print ("-------------------------------------------------------------------------")
print('Accuracy of the classifer is %0.2f' % metrics.accuracy_score(ytest,ypred))
print ("-------------------------------------------------------------------------")
OUTPUT:
     0    1    2    3
0  5.1  3.5  1.4  0.2
1  4.9  3.0  1.4  0.2
2  4.7  3.2  1.3  0.2
3  4.6  3.1  1.5  0.2
4  5.0  3.6  1.4  0.2
-------------------------------------------------------------------------
RESULT:
Thus the K-Nearest Neighbour Algorithm to classify the data set using Python has been
implemented successfully.
EX NO 9: IMPLEMENTATION OF NON-PARAMETRIC
LOCALLY WEIGHTED REGRESSION ALGORITHM
AIM:
To implement the non-parametric Locally Weighted Regression algorithm to fit data points, using Python.
ALGORITHM:
Step 1:
Step 2:
Step 3:
Step 4:
Step 5:
Step 6:
Step 7:
Step 8:
Step 9:
PROGRAM:
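# Imports assumed by the locally weighted regression listing below.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt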
def kernel(point, xmat, k):
    # Weight every training point with a Gaussian kernel centred on the query point
    m, n = np.shape(xmat)
    weights = np.mat(np.eye((m)))
    for j in range(m):
        diff = point - xmat[j]
        weights[j, j] = np.exp(diff * diff.T / (-2.0 * k ** 2))
    return weights
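# A minimal sketch (assumption) of the remaining pieces the listing relies on:
# the local-weight helpers and the data loading. The file name 'tips.csv' and
# its column names are illustrative, not taken from this manual.
def localWeight(point, xmat, ymat, k):
    # Solve the weighted least-squares problem for one query point
    wei = kernel(point, xmat, k)
    W = (xmat.T * (wei * xmat)).I * (xmat.T * (wei * ymat.T))
    return W

def localWeightRegression(xmat, ymat, k):
    # Predict every point with its own locally weighted linear fit
    m, n = np.shape(xmat)
    ypred = np.zeros(m)
    for i in range(m):
        ypred[i] = xmat[i] * localWeight(xmat[i], xmat, ymat, k)
    return ypred

data = pd.read_csv('tips.csv')        # hypothetical data file
bill = np.array(data.total_bill)      # predictor
tip = np.array(data.tip)              # response
mbill = np.mat(bill)
mtip = np.mat(tip)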
m= np.shape(mbill)[1]
one = np.mat(np.ones(m))
X = np.hstack((one.T,mbill.T))
#set k here
ypred = localWeightRegression(X,mtip,0.5)
SortIndex = X[:,1].argsort(0)
xsort = X[SortIndex][:,0]
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.scatter(bill,tip, color='green')
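# Assumed completion of the plot: draw the fitted curve over the scatter and display it.
ax.plot(xsort[:, 1], ypred[SortIndex], color='red', linewidth=5)
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show()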
OUTPUT:
RESULT:
Thus the non-parametric Locally Weighted Regression algorithm has been implemented successfully.
ALGORITHM:
Step 1:
Step 2:
Step 3:
Step 4:
Step 5:
Step 6:
Step 7:
Step 8:
Step 9:
PROGRAM:
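A minimal sketch of one possible regression program for this experiment, assuming simple linear regression on a hypothetical 'data.csv' with a predictor column x and a target column y (file and column names are assumptions):

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical data file with a single predictor column 'x' and a target 'y'
data = pd.read_csv('data.csv')
X = data[['x']]
y = data['y']

# Hold out part of the data to evaluate the fitted line
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.2)

model = LinearRegression()
model.fit(Xtrain, ytrain)
ypred = model.predict(Xtest)

print("Coefficient:", model.coef_)
print("Intercept:", model.intercept_)
print("MSE:", mean_squared_error(ytest, ypred))
print("R2 score:", r2_score(ytest, ypred))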
RESULT:
Thus the Regression Algorithm using Python has been implemented successfully.
ALGORITHM:
Step 1:
Step 2:
Step 3:
Step 4:
Step 5:
Step 6:
Step 7:
Step 8:
Step 9:
PROGRAM:
RESULT: