
Laboratory Report Cover Sheet

SRM Institute of Science and Technology


College of Engineering and Technology
Department of Electronics and Communication Engineering
18ECCE242J Pattern Recognition and Neural Networks
Fifth Semester, 2023-24 (odd semester)

Name :
Register No. :
Day / Session :
Venue :
Title of Experiment :
Date of Conduction :
Date of Submission :

Particulars Max. Marks Marks Obtained


Pre-lab questions 10
In-lab experiment 15
Post-lab questions 10
Viva 5
Total 40

REPORT VERIFICATION

Date :

Staff Name :

Signature :
1. DIGITIZATION OF ANALOG SIGNAL

1.1 Objective
To convert an analog signal into a digital signal

1.2 Tasks
I. Generate a continuous-time signal.

II. Discretize the continuous-time signal using sampling.

III. Convert the discrete-time signal into a digital signal using the rounding and truncation methods of
quantization.

1.3 Theory
A signal is defined as any physical quantity that varies with time, space, or any other
independent variable or variables. Digitization of the signal is shown in the following figure.

Fig 1.1 Digitization of the Signal

The sampler converts the analog signal into a discrete-time signal. After sampling, the
amplitude values are still infinite in number between the two limits. The quantizer maps these
infinitely many amplitude values onto a finite set of known values. The encoder converts each
quantized value into a digital code (a sequence of 0s and 1s).

Sampling

In sampling, the analog signal is sampled every Ts seconds. Ts is referred to as the sampling interval,
and fs = 1/Ts is called the sampling rate or sampling frequency.

There are 3 sampling methods:

• Ideal - an impulse at each sampling instant (the amplitude is taken at each Ts interval).


• Natural - a pulse of short width with varying amplitude
• Flattop - sample and hold, like natural but with single amplitude value

Generally, ideal sampling is used.


Ideal sampling: x(n) = x(nTs), where Ts = 1/Fs and Fs is the sampling frequency.

Fig 1.2 Sampling Techniques

Sampling Theorem: according to the Nyquist theorem, the sampling rate must be at least 2
times the highest frequency contained in the signal: Fs ≥ 2Fmax.
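As a minimal sketch, the condition can be checked directly for the values used in the programme below (signal frequency 100 Hz, sampling frequency 800 Hz).

// Minimal sketch: checking the Nyquist condition for the values used in the programme
f  = 100;   // highest frequency in the signal (Hz)
Fs = 800;   // chosen sampling frequency (Hz)
if Fs >= 2*f then
    disp("Sampling theorem satisfied: Fs >= 2*Fmax");
else
    disp("Aliasing will occur: Fs < 2*Fmax");
end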

Quantizer: sampling results in a series of pulses of varying amplitude values ranging between
two limits, a minimum and a maximum. The amplitude values between the two limits are infinite
in number, so we need to map them onto a finite set of known values.

If we want to quantize using b bits, the number of levels (or zones) is L = 2^b. This is
achieved by dividing the distance between min and max into L zones, each of height
Δ = (max − min)/L.

Assume we have a voltage signal with amplitudes Vmin = −20 V and Vmax = +20 V. We want to
use b = 3 bits, so the number of levels is L = 2^3 = 8 quantization levels. The discretized
value d is converted to a quantized-type value q as follows (the levels run from 0 to L − 1):

q1 = (Vmax − Vmin)/(L − 1)

q = (d − Vmin)/q1

There are 2 methods used for quantization.

(i) Rounding

(ii) Truncation
Rounding: convert the value of q into the nearest integer nq by rounding.

For example, if q = 1.8 then nq = 2.

If q = 1.2 then nq = 1.

Truncation: convert the value of q into an integer nq by truncating (discarding the fractional part).

For example, if q = 1.8 then nq = 1.

If q = 1.2 then nq = 1.
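As a minimal sketch (assuming the ±20 V, 3-bit example above), the level of one discretized sample can be computed by both methods:

// Minimal sketch: quantization of one sample by rounding and by truncation
// (assumes the example above: Vmin = -20 V, Vmax = +20 V, b = 3 bits)
Vmin = -20; Vmax = 20;
b = 3;
L = 2^b;                      // number of levels
q1 = (Vmax - Vmin)/(L - 1);   // step size between levels
d = -9.7;                     // an arbitrary discretized sample value
q = (d - Vmin)/q1;            // continuous level index (about 1.8 here)
nq_round = round(q);          // quantization by rounding -> 2
nq_trunc = floor(q);          // quantization by truncation -> 1
disp("Level by rounding:");   disp(nq_round);
disp("Level by truncation:"); disp(nq_trunc);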

Fig 1.3 Quantization levels

Basic codes used in the programme:

sin-------- It is used to generate sin wave

cos---It is used to generate cosine wave

plot----- plot the continuous time signal

plot2d3--- plot the discrete time signal

xlabel-----labelling the x axis of the plot.

ylabel----labelling the y axis of the plot

subplot------ divide the plot

title----- giving title to the plot

round---rounding the floating number to the integer

floor--- truncating the floating number to the integer

dec2bin-----convert decimal to binary


disp------display the values in command window

1.4 Algorithm

Step-1:

Step-2:

1.5 Programme

// Experiment number-1

//Digitization of analog signal

// Clear all the things in command window

clc ;

close ;

clear ;

//Part-1---Generation of analog signal

// Generation of sin wave with peak value 5;

A=5; // peak value

Amax=A // maximum value

Amin=-A // minimum value


f =100; // maximum frequency=100 Hz

t=0:0.00009:0.5; // 0.5 seconds

y = A*sin (2* %pi * f * t ) ;

subplot(2,1,1);

plot (t, y );

title ( "Sin wave" )

xlabel ( "Time" ) ;

ylabel ( "Amplitude" ) ;

//Part-2 --Sampling

//Sampling frequency // Fs>>2fm

// x(n)=x(Ts . n)

Fs=800; // sampling frequency

Ts=1/Fs;

t1=0:Ts:0.5;

n=1:length(t1);

y1 = A*sin (2* %pi * f * t1 ) ;

subplot(2,1,2);

plot2d3 ( y1 );

title ( "Out put of sampler" )

xlabel ( "Number of samples" ) ;

ylabel ( "Amplitude" ) ;

// Part-3 ---Quantization

// Quantization

b=3; // number of bits for quantization (excluding sign bit)


L=2^b // number of levels

quant1=(Amax-Amin)/(L-1);

quant=(y1-Amin)/quant1;

// rounding

y2=round(quant);

out1=dec2bin(y2,b) // Converting to binary value

y3=floor(quant);

out2=dec2bin(y3,b) // Converting to binary value

out3=[out1;out2];

disp(out3); // Displaying both the quantized signal

1.6 Results

1.7 Screen Shots


1.8 Pre-lab Questions
1. Define the Nyquist theorem (sampling theorem).

2. Explain the process of Ideal sampling.

3. Explain the process of quantization using rounding operation.

4. Explain the process of quantization using truncation operation.

Pre-lab Answers
1.

2.

3.

4.

1.9 Post-lab Questions


1. Write a programme using Scilab to generate a cosine wave with peak value 3 volt and
maximum frequency 200 Hz. Show the graph.

2. Sample the generated cosine wave with a sampling frequency of 600 Hz. Plot the sampled
signal.

3. Sample the generated cosine wave with a sampling frequency of 300 Hz. Plot the sampled
signal. Find the difference between the plot in question 2 and the plot in this question.

4. Generate the binary code using the rounding operation for number of bits = 4. What is the
binary value at sample number 10?
5. Generate the binary code using the truncation operation for number of bits = 4. What is the
binary value at sample number 10?

Post-lab Answers
1.

2.

3.

4.

5.

1.10 Conclusion
2. PROGRAM TO COUNT THE WHITE PIXELS FROM THE IMAGE

2.1 Objective
Program to count the white pixels from the image

2.2 Tasks
I. Read the image file.

II. Convert the image to Gray scale

III. Count the number of white pixels and black pixels in the image.

2.3 Theory
Basic about Pixels of image

An image consists of a rectangular array of dots called pixels. The size of the image is
usually specified as width X height, in numbers of pixels. The physical size of the image, in
inches or centimeters or whatever, depends on the resolution of the device on which the
image is displayed. Resolution is usually measured in terms of DPI, which stands for dots
per inch. An image will appear smaller (and generally sharper) on a device with a higher
resolution than on one with a lower resolution.

The usefulness of imread()

A=imread("image name") ----- is used in Scilab/MATLAB to read the image. Generally, A

is a 3-dimensional array. The first two dimensions index the pixel positions (rows and
columns). The third dimension of the image array tells us that the image has 3 planes, here (red,
green, blue), since the image is RGB, i.e. a colour image. Gray is a valid colour, just not a
terribly exciting one. (And it is available in more than 50 shades.)

imshow(A) ----Display the image

For Scilab use atomsInstall(“IPCV”)
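A minimal sketch of inspecting the array returned by imread (assuming the IPCV toolbox is installed and an RGB file such as the "cameraman.jpg" used in the programme below is available):

// Minimal sketch: inspect the array returned by imread (requires the IPCV toolbox)
A = imread("cameraman.jpg");     // assumed RGB image file
dims = size(A);                  // [rows, columns, number of planes] for an RGB image
disp("Image size (rows, cols, planes):");
disp(dims);
disp("Value of the top-left pixel in the red plane:");
disp(A(1,1,1));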

Counting of white pixels and black pixels

For a white pixel the pixel value is 255, and for a black pixel the pixel value is 0. Three methods
can be used to detect white pixels, as described below.
Method-1: Calculate the channels

i. redChannel = A(:, :, 1);

ii. greenChannel = A(:, :, 2);

iii. blueChannel = A(:, :, 3);

Then count how many pixels have the value 255 in all three channels; this gives the number of
white pixels.

Method-2:

Convert the image into a grayscale image: b11=rgb2gray(A) converts RGB to gray. After that,
simply count the number of pixels whose value is 255 (white) and the number whose value is 0
(black).

Method-3:

Convert the grayscale image into double format: a=im2double(b11) rescales the pixel values to
the range [0, 1]. In this format a value of 1 corresponds to white and a value of 0 corresponds to
black.

2.4 Algorithm
Step-1:

Step-2:

2.5 Programme
//Experiment number-2 //Program to count the white pixels from the image
clc

clear all;

//atomsInstall("IPCV")

x=imread("cameraman.jpg");

figure; imshow(x)

// 3 channels

redChannel = x(:, :, 1); greenChannel = x(:, :, 2); blueChannel = x(:, :, 3);

//Method-1 // calculation of number of white pixels

whitePixels = redChannel == 255 & greenChannel == 255 & blueChannel == 255;

count = sum(whitePixels(:));

disp( "by method-1 number of white pixels=", count)

// Method-2
b11=rgb2gray(x); // convert to gray image

figure;

imshow(b11)

whitePixels = b11== 255

blackPixels=b11==0

count1 = sum(whitePixels(:));

count2= sum(blackPixels(:));

disp( "by method-2 number of white pixels=", count1)

disp( "by method-2 number of black pixels=", count2)

//Method-3

a=im2double(b11);

//im2double(I) converts the image I to double precision.


// I can be a grayscale intensity image, a truecolor image, or a binary image.

// im2double rescales the output from integer data types to the range [0, 1].

// black correspond to zero and

// white correspond to one if image is double format

b=find(a==0); // find black pixels

w=find(a==1); // find white pixels

numBlackPixels = length(b); // Count the number of linear indexes returned.

numWhitePixels = length(w); // Count the number

disp( "by method-3 number of white pixels=", numWhitePixels )

disp( "by method-3 number of black pixels=", numBlackPixels)

2.6 Results

2.7 Screen Shots


2.8 Pre-lab Questions
1. Explain the concept of a pixel in an image.

2. A=imread(). A is a 3-dimensional matrix. Explain each dimension.

3. What is the pixel value for white colour?

4. What is the pixel value for black colour?

Pre-lab Answers
1.

2.

3.

4.

2.9 Post-lab Questions


1. Read the image "fruit" as given in the folder.

2. Display the image.

3. Convert the image into grayscale and display it.

4. Use all 3 methods to calculate the number of white pixels and black pixels. Display the
counts.

Post-lab Answers
1.

2.
3.

4.

2.10 Conclusion
3. ANALYSIS OF DATA SET WITH CLASSIFIERS

3.1 Objective
To analyse a data set with classifiers
3.2 Tasks
I. Create datasets.
II. Create datasets and assign label to datasets
III. Compute the conditional probability density function using the Gaussian probability density
function
IV. Compute posterior probability
V. Design Bayes classifier
VI. Computation of accuracy and classification error using Bayes classifier
3.3 Theory
Some basic notations

• p(wi|x) = qi(X) = posterior probability of class wi given the feature vector x

• p(wi) = Pi = prior probability of class wi

• p(x|wi) = fi(X) = conditional probability density function of the feature vector x for class wi

• p(wi|x) = p(x|wi) p(wi) / Z

• Z = ∑ p(x|wi) p(wi), the sum running over i = 1, ..., M

Bayes Classifier

Bayes' theorem states that qi(X) = fi(X) Pi / Z, where Z = f0(X) P0 + f1(X) P1 is the normalizing
constant.

Consider the classifier given by

h(X) = 0 if q0(X)/q1(X) > 1, and h(X) = 1 otherwise.

This is called the binary Bayes classifier.

We are given a pattern whose class label is unknown.

Let x ≡ [x(1), x(2), ..., x(l)]^T ∈ R^l be its corresponding feature vector, which results from
some measurements. Also, we let the number of possible classes be equal to c, that is,
w1, ..., wc.

According to Bayes decision theory, x is assigned to the class wi if

p(wi|x) > p(wj|x) for every j ≠ i,

where p(wi|x) is the posterior probability. Equivalently,

p(x|wi) p(wi) > p(x|wj) p(wj) for every j ≠ i.

The Bayes classifier is optimal in the sense that it minimizes the probability of error.
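A minimal sketch of this decision rule for a single one-dimensional measurement, using assumed class-conditional Gaussians and priors (not the lab data), follows:

// Minimal sketch: binary Bayes decision for one 1-D measurement
// (assumed class-conditional Gaussians and priors)
x  = 1.2;                  // measurement to classify
m  = [0 2];                // class means (class w1, class w2)
s  = [1 1];                // class standard deviations
P  = [0.5 0.5];            // prior probabilities
f1 = exp(-0.5*((x-m(1))/s(1))^2)/(sqrt(2*%pi)*s(1));   // p(x|w1)
f2 = exp(-0.5*((x-m(2))/s(2))^2)/(sqrt(2*%pi)*s(2));   // p(x|w2)
Z  = f1*P(1) + f2*P(2);    // normalizing constant
q1 = f1*P(1)/Z;            // posterior p(w1|x)
q2 = f2*P(2)/Z;            // posterior p(w2|x)
if q1 > q2 then
    disp("x is assigned to class w1");
else
    disp("x is assigned to class w2");
end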

The techniques adopted in this programme


In this programme the datasets are generated using the random normal distribution function
"grand". Each randomly generated sample is assigned a binary label (-1 or 1). The conditional
density function is estimated using the Gaussian distribution, where

σ = variance (or covariance matrix)

μ = mean vector

X = input data.

After the conditional density function is computed, it is multiplied by the prior probability to
estimate the posterior probability. Based on the posterior probability a decision is taken, and a
class is assigned accordingly. Finally, the accuracy and the error (in percentage) are calculated
by counting how many samples were correctly classified.
3.4 Algorithm
Step-1:
Step-2:

3.5 Programme
Programme-1: Create datasets
clc;
//
//% Generate the first dataset (case #1)
rand('seed',0);
m=[0 0]';
S=[1 0;0 1];
N=500;

X= grand(N, "mn", m, S) // Generates multivariate normal random variates;


//Mean (m) must be a m x 1 column vector and Cov(S) a m-by-m symmetric positive definite
matrix (Y is then a m-by-n matrix)

//% Plot the first dataset


figure(1);
plot(X(1,:),X(2,:),'.');

//% Generate and plot the second dataset (case #2)


m=[0 0]';
S=[0.2 0;0 0.2];
N=500;
X= grand(N, "mn", m, S)
figure(2),
plot(X(1,:),X(2,:),'.');

//% Generate and plot the third dataset (case #3)


m=[0 0]';
S=[2 0;0 2];
N=500;
X= grand(N, "mn", m, S)
figure(3),
plot(X(1,:),X(2,:),'.');

//% Generate and plot the fourth dataset (case #4)


m=[0 0]';
S=[0.2 0;0 2];
N=500;
X= grand(N, "mn", m, S)
figure(4),
plot(X(1,:),X(2,:),'.');

//
//% Generate and plot the fifth dataset (case #5)
m=[0 0]';
S=[2 0;0 0.2];
N=500;
X= grand(N, "mn", m, S)
figure(5),
plot(X(1,:),X(2,:),'.');

//% Generate and plot the sixth dataset (case #6)


m=[0 0]';
S=[1 0.5;0.5 1];
N=500;
X= grand(N, "mn", m, S)
figure(6),
plot(X(1,:),X(2,:),'.');

// Generate and plot the seventh dataset (case #7)


m=[0 0]';
S=[.3 0.5;0.5 2];
N=500;
X= grand(N, "mn", m, S)
figure(7),
plot(X(1,:),X(2,:),'.');

//% Generate and plot the eighth dataset (case #8)


m=[0 0]';
S=[.3 -0.5;-0.5 2];
N=500;
X= grand(N, "mn", m, S)
figure(8),
plot(X(1,:),X(2,:),'.');

Programme-2: Create datasets and assign label to datasets


clc
clear all

rand('seed',0);
// Generate the dataset X1 as well as the vector containing the class labels of
// the points in X1
N=[100 100]; // 100 vectors per class
l=2; // Dimensionality of the input space

x=[3 3]';
//x=[2 2]'; //for X2
// x=[0 2]'; for X3
// x=[1 1]'; for X4

X1=[2*rand(l,N(1)) 2*rand(l,N(2))+x*ones(1,N(2))];
X1=[X1; ones(1,sum(N))];
y1=[-ones(1,N(1)) ones(1,N(2))];

// 1. Plot X1, where points of different classes are denoted by different colors,
figure(1),
plot(X1(1,y1==1),X1(2,y1==1),'bo',X1(1,y1==-1),X1(2,y1==-1),'r.')

Programme-3: Compute conditional probability density function using Gaussian


probabilistic density function

function [z]=comp_gauss_dens_val(m, S, x)

//% FUNCTION
//% [z]=comp_gauss_dens_val(m,S,x)
//% Computes the value of a Gaussian distribution, N(m,S), at a specific point

//%
//% INPUT ARGUMENTS:
//% m: l-dimensional column vector corresponding to the mean vector of the
//% gaussian distribution.
//% S: lxl matrix that corresponds to the covariance matrix of the
//% gaussian distribution.
//% x: l-dimensional column vector where the value of the gaussian
//% distribution will be evaluated.
//%
//% OUTPUT ARGUMENTS:
//% z: the value of the gaussian distribution at x.
//%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

[l,c]=size(m);
z=(1/( (2*%pi)^(l/2)*det(S)^0.5) )*exp(-0.5*(x-m)'*inv(S)*(x-m));
end

Programme-4: Design Bayes classifier


function [z]=bayes_classifier(m, S, P, X)

//%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
// FUNCTION
// [z]=bayes_classifier(m,S,P,X)
// Bayesian classification rule for c classes, modeled by Gaussian
//% distributions (also used in Chapter 2).
//%
//% INPUT ARGUMENTS:
//% m: lxc matrix, whose j-th column is the mean of the j-th class.
//% S: lxlxc matrix, where S(:,:,j) corresponds to
//% the covariance matrix of the normal distribution of the j-th
//% class.
//% P: c-dimensional vector, whose j-th component is the a priori
//% probability of the j-th class.
//% X: lxN matrix, whose columns are the data vectors to be
//% classified.
//%
//% OUTPUT ARGUMENTS:
//% z: N-dimensional vector, whose i-th element is the label
//% of the class where the i-th data vector is classified.

[l,c]=size(m);
[l,N]=size(X);

for i=1:N
for j=1:c
t(j)=P(j)*comp_gauss_dens_val(m(:,j),S(:,:,j),X(:,i));
end
[num,z(i)]=max(t);
end
end

Programme-5: Classification using Bayes classifier


clc
rand('seed',0);
// Generate the dataset X1 as well as the vector containing the class labels of
// the points in X1
N=[100 100]; // 100 vectors per class
l=2; // Dimensionality of the input space

x=[3 3]';
//x=[2 2]'; //for X2
// x=[0 2]'; for X3
// x=[1 1]'; for X4

X1=[2*rand(l,N(1)) 2*rand(l,N(2))+x*ones(1,N(2))];
X1=[X1; ones(1,sum(N))];
y1=[-ones(1,N(1)) ones(1,N(2))];

// 1. Plot X1, where points of different classes are denoted by different colors,
figure(1),
plot(X1(1,y1==1),X1(2,y1==1),'bo',X1(1,y1==-1),X1(2,y1==-1),'r.')

m=[0 0 0; 1 2 2; 3 3 4]';
S1=0.8*eye(3);
S(:,:,1)=S1;S(:,:,2)=S1;S(:,:,3)=S1;
P=[1/3 1/3 1/3]';

z_bayesian=bayes_classifier(m,S,P,X1);
for i=1:length(z_bayesian)
if (z_bayesian(i)==3)
z11(i)=1;
else
z11(i)=-1;
end
end

//Compute the error probability for each classifier


l22=find(y1'==z11);
correctsamp=length(l22);
err_bayesian = (1-(correctsamp/length(z11)))*100;
acc_classification=((correctsamp/length(z11)))*100;
z22=[z11 y1']
disp(z22);
disp('in percentage error is', err_bayesian);
disp('in percentage accuracy is', acc_classification);

3.6 Results

3.7 Screen Shots


3.8 Pre-lab Questions
1. Define Bayes classifier.
2. Explain about the normal distribution function
3. Explain about the Gaussian distribution function
Pre-lab Answers
1.

2.

3.

3.9 Post-lab Questions


1. Identify the difference in creation of 8 datasets in Programme-1.
2. In programme-5, set x=[2 2]’ . Write the programme. Show the percentage of error
and percentage of accuracy.
3. In programme-5, set m=[0 0 1; 1 3 2; 3 2 4]'; S1=0.6*eye(3);. Write the programme.
Show the percentage of error and percentage of accuracy.
4. Compare the accuracy of Programme-5 and post-lab question 3.
Post-lab Answers
1.

2.

3.

4.

3.10 Conclusion
4. PROGRAMS ON ESTIMATION

4.1 Objective
To estimate the parameters of a normal distribution function

4.2 Tasks
I. Estimation of parameters using maximum likelihood method

II. Generation of dataset with class labels by using Gaussian distribution.

III. Estimation of parameters using maximum posterior estimation

4.3 Theory
Parametric Method

One of the most straightforward approaches is to estimate the conditional probability density p(x)
in terms of a specific functional form that contains a number of adjustable parameters. The
values of the parameters can then be optimized to give the best fit to the data. The simplest and
most widely used parametric model is the normal or Gaussian distribution, which has a
number of convenient analytical and statistical properties.

The normal distribution

The normal density function for the case of a single variable can be written in the form

p(x) = (1/(2πσ²)^(1/2)) exp{ −(x − μ)² / (2σ²) }     (1)

where μ = mean and σ² = variance. The square root of the variance is called the standard deviation.

The coefficient in front of the exponential in equation (1) ensures that

∫ p(x) dx = 1 (the integral running from −∞ to +∞)
The mean and variance of the one-dimensional distribution satisfy


μ = E[x] = ∫ x p(x) dx (integral from −∞ to +∞)

σ² = E[(x − μ)²] = ∫ (x − μ)² p(x) dx (integral from −∞ to +∞)

Where E [ . ] denotes the expectation.
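The following minimal sketch (with an assumed mean and variance) evaluates equation (1) on a fine grid and checks numerically that it integrates to one and reproduces the stated mean.

// Minimal sketch: numerical check of the 1-D normal density in equation (1)
mu = 1; sigma = 0.5;                       // assumed parameters
dx = 0.001;
x  = -10:dx:10;                            // fine grid covering the support
p  = exp(-(x-mu).^2/(2*sigma^2))/sqrt(2*%pi*sigma^2);
disp("Integral of p(x) (should be close to 1):");
disp(sum(p)*dx);
disp("Numerical mean (should be close to mu):");
disp(sum(x.*p)*dx);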

In d dimensions the general multivariate normal probability density can be written as

p(X) = (1/((2π)^(d/2) |S|^(1/2))) exp{ −(1/2)(x − μ)^T S^(−1) (x − μ) }     (2)

where the mean μ is a d-dimensional vector, S is a d×d covariance matrix, and |S| is the
determinant of S.

The pre-factor in equation (2) ensures that

∫ p(X) dX = 1 (the integral taken over the whole space)

μ = E[x]

S = E[(x − μ)(x − μ)^T] (the covariance matrix)

Δ² = (x − μ)^T S^(−1) (x − μ) is called the Mahalanobis distance from x to μ.

It is sometimes convenient to consider a simplified form of the Gaussian distribution in which
the covariance matrix is diagonal:

(S)ij = δij σj²

where δij is the Kronecker delta. This reduces the total number of independent parameters to 2d.

• In this case, the components of X are statistically independent, and the density can be written
as a product of one-dimensional distributions:

p(X) = ∏ p(xi), the product running over i = 1, ..., d.

A further simplification can be obtained by choosing σj = σ.

Maximum likelihood method (Maximum likelihood estimation)

For simplicity, we use the normal density function as the conditional density. Suppose we
consider a conditional density function p(X) which depends on a set of parameters
θ = (θ1, ..., θM)^T. In a classification problem we would take one such function for each class.
Here we omit the class labels for simplicity, but essentially the same steps are performed
separately for each class in the problem. To make the dependence on the parameters explicit,
the density function is written in the form p(X|θ). We also have a dataset of N vectors,
χ = {X1, ..., XN}. If these vectors are drawn independently from the distribution p(X|θ), then
the joint probability density of the whole dataset χ is given by

p(χ|θ) = ∏ p(Xn|θ) = L(θ), the product running over n = 1, ..., N.

Here L(θ) can be viewed as a function of θ for fixed χ, in which case it is referred to as the
likelihood of θ for the given χ. The technique of maximum likelihood sets the value of θ by
maximizing L(θ); the idea is to choose the θ that is most likely to give rise to the observed data.

In practice, it is often more convenient to consider the negative logarithm of likelihood.

E = −ln L(θ)     (3)

and to find the minimum of E.

This is equivalent to maximizing L, since the negative logarithm is a monotonically decreasing
function. The negative log-likelihood can be regarded as an error function. For most choices of
density function the optimum θ has to be found by an iterative numerical procedure. However,
for the special case of a multivariate normal density we can find the maximum likelihood
solution by analytic differentiation of equation (3).

Some straightforward but rather involved matrix algebra then leads to the following results.

μ̂ = (1/N) ∑ Xn, the sum running over n = 1, ..., N     (4)

Ŝ = (1/N) ∑ (Xn − μ̂)(Xn − μ̂)^T, the sum running over n = 1, ..., N     (5)

Equation (4) states that the maximum likelihood estimate μ̂ of the mean vector μ is given by the
sample average (i.e. the average with respect to the given dataset). Similarly, the maximum
likelihood estimate Ŝ of the covariance matrix S is given by equation (5).

It can suffer from some deficiencies.
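A minimal sketch applying equations (4) and (5) directly to a small, assumed data matrix (columns are samples) is given below; the function Gaussian_ML_estimate in the programme performs the same computation.

// Minimal sketch: ML estimates of mean and covariance from equations (4) and (5)
// X has one sample per column (assumed toy data)
X = [2.1 1.9 2.4 1.7 2.0;
    -1.8 -2.2 -2.1 -1.9 -2.0];
[l, N] = size(X);
m_hat = sum(X, "c")/N;                 // equation (4): sample mean vector (l x 1)
S_hat = zeros(l, l);
for k = 1:N
    S_hat = S_hat + (X(:,k) - m_hat)*(X(:,k) - m_hat)';
end
S_hat = S_hat/N;                       // equation (5): sample covariance matrix
disp("Estimated mean:");       disp(m_hat);
disp("Estimated covariance:"); disp(S_hat);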

Maximum posterior estimation:

Here a prior probability is assumed and, continuing with the same likelihood-based approach,
the posterior probability is evaluated using Bayes' theorem. The posterior is built by generating
the Gaussian distribution from the current parameters. The parameter value that best explains
the data according to this posterior is taken as the estimated parameter; this is maximum
posterior estimation.

4.4 Algorithm
Step-1:

Step-2:

4.5 Programme
Programme-1: Estimation of parameters using Maximum likelihood estimation

function [m_hat, S_hat]=Gaussian_ML_estimate(X)


//%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
//% FUNCTION
//% [m_hat,S_hat]=Gaussian_ML_estimate(X)
//% Maximum Likelihood parameters estimation of a multivariate Gaussian
//% distribution, based on a data set X.
//%
//% INPUT ARGUMENTS:
//% X: lxN matrix, whose columns are the data vectors.
//%
//% OUTPUT ARGUMENTS:
//% m_hat: l-dimensional estimate of the mean vector of the distribution.
//% S_hat: lxl estimate of the covariance matrix of the distribution.
//%
//%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

[l,N]=size(X);
m_hat=(1/N)*sum(X')';
S_hat=zeros(l);
for k=1:N
S_hat=S_hat+(X(:,k)-m_hat)*(X(:,k)-m_hat)';
end
S_hat=(1/N)*S_hat;
end

Programme-2: Estimation of parameters (mean and variance) using Maximum likelihood


estimation for a dataset generated by using normal distribution function

clc;
// Generate dataset X
rand('seed',0);
m = [2 -2]'; //mean
S= [0.9 0.2; 0.2 .3]; // covariance
X=grand(50, "mn", m,S);
figure;
plot(X(1,:), X(2,:),'.r');
title ("Dataset")
// Compute the ML estimates of m and S
[m_hat_1, S_hat_1]=Gaussian_ML_estimate(X);
disp( "Estimate mean by maximum likelihood estimation", m_hat_1);
disp( "Estimate mean by maximum likelihood estimation",S_hat_1);

Programme-3: Generation of dataset with class labels using Gaussian distribution

function [X, y]=generate_gauss_classes(m, S, P, N)


//
//%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
//% FUNCTION
//% [X,y]=generate_gauss_classes(m,S,P,N)
//% Generates a set of points that stem from c classes, given
//% the corresponding a pricori class probabilities and assuming that each
//% class is modeled by a Gaussian distribution (also used in Chapter 2).
//%
//% INPUT ARGUMENTS:
//% m: lxc matrix, whose j-th column corresponds to the mean of
//% the j-th class.
//% S: lxlxc matrix. S(:,:,j) is the covariance matrix of the j-th normal
//% distribution.
//% P: c-dimensional vector whose j-th component is the a priori
//% probability of the j-th class.
//% N: total number of data vectors to be generated.
//%
//% OUTPUT ARGUMENTS:
//% X: lxN matrix, whose columns are the produced data vectors.
//% y: N-dimensional vector whose i-th component contains the label
//% of the class where the i-th data vector belongs.
//%

//%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

[l,c]=size(m);
X=[];
y=[];
for j=1:c
//% Generating the [p(j)*N] vectors from each distribution
// t=mvnrnd(m(:,j),S(:,:,j),fix(P(j)*N))';
t=grand(N,"mn",m(:,j),S(:,:,j))';
// % The total number of data vectors may be slightly less than N due to
// % the fix operator
X=[X t];
y=[y ones(1,fix(P(j)*N))*j];
end
end

Programme-4: Estimation of parameters (mean and variance) using Maximum likelihood


estimation for a dataset generated with class labels using Gaussian distribution

clc;
//To generate X, utilize the function generate_gauss_classes
m=[0 0 0; 1 2 2; 3 3 4]'; // class means (one column per class), matching the 3x3 covariance below
S1=0.8*eye(3);
S(:,:,1)=S1;S(:,:,2)=S1;S(:,:,3)=S1;
P=[1/3 1/3 1/3]';
N=1000;
rand('seed',0);
[X,y]=generate_gauss_classes(m,S,P,N);

// Compute the ML estimates of the mean values and covariance matrix


//(common to all three classes) using function Gaussian_ML_estimate
class1_data=X(:,find(y==1)); // columns of X are the data vectors
[m1_hat, S1_hat]=Gaussian_ML_estimate(class1_data);
class2_data=X(:,find(y==2));
[m2_hat, S2_hat]=Gaussian_ML_estimate(class2_data);
class3_data=X(:,find(y==3));
[m3_hat, S3_hat]=Gaussian_ML_estimate(class3_data);
S_hat=(1/3)*(S1_hat+S2_hat+S3_hat);
m_hat=[m1_hat m2_hat m3_hat];

disp( "Estimate mean by maximum likelihood estimation for Gaussian class", m_hat);

Programme-5: Computation of values of each Gaussian

function [z]=gauss(x, m, s)
//
//%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
//% FUNCTION (auxiliary)
//% [z]=gauss(x,m,s)
//% Takes as input the mean values and the variances of a number of Gaussian
//% distributions and a vector x and computes the value of each
//% Gaussian at x.
//%
//% NOTE: It is assumed that the covariance matrices of the gaussian
//% distributions are diagonal with equal diagonal elements, i.e. it has the
//% form sigma^2*I, where I is the identity matrix.
//%
//% INPUT ARGUMENTS:
//% x: l-dimensional row vector, on which the values of the J
//% gaussian distributions will be calculated
//% m: Jxl matrix, whose j-th row corresponds to the
//% mean of the j-th gaussian distribution
//% s: J-dimensional row vector whose j-th component corresponds to
//% the variance for the j-th gaussian distribution (it is assumed
//% that the covariance matrices of the distributions are of the
//% form sigma^2*I, where I is the lxl identity matrix)
//%
//% OUTPUT ARGUMENTS:
//% z: J-dimensional vector whose j-th component is the value of the
//% j-th gaussian distribution at x.
//%
//%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

[J,l]=size(m);
[p,l]=size(x);
z=[];
for j=1:J
t=(x-m(j,:))*(x-m(j,:))';
c=1/(2*%pi*s(j))^(l/2);
z=[z c*exp(-t/(2*s(j)))];
end
end

Programme-6: Estimation of parameters using maximum posterior estimation (ONLY


RUNS FOR THE CASE WHERE THE COVARIANCE MATRICES ARE OF THE
FORM sigma^2*I. )
function [m, s, Pa, iter, Q_tot, e_tot]=em_alg_function(x, m, s, Pa, e_min)

//%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
//% FUNCTION
//% [m,s,Pa,iter,Q_tot,e_tot]=em_alg_function(x,m,s,Pa,e_min)
//% EM algorithm for estimating the parameters of a mixture of normal
//% distributions, with diagonal covariance matrices.
//% WARNING: IT ONLY RUNS FOR THE CASE WHERE THE COVARIANCE MATRICES
//% ARE OF THE FORM sigma^2*I. IN ADDITION, IF sigma_i^2=0 FOR SOME
//% DISTRIBUTION AT AN ITERATION, IT IS ARBITRARILY SET EQUAL TO 0.001.
//%
//% INPUT ARGUMENTS:
//% x: lxN matrix, each column of which is a feature vector.
//% m: lxJ matrix, whos j-th column is the initial
//% estimate for the mean of the j-th distribution.
//% s: 1xJ vector, whose j-th element is the variance
//% for the j-th distribution.
//% Pa: J-dimensional vector, whose j-th element is the initial
//% estimate of the a priori probability of the j-th distribution.
//% e_min: threshold used in the termination condition of the EM
//% algorithm.
//%
//% OUTPUT ARGUMENTS:
//% m: it has the same structure with input argument m and contains
//% the final estimates of the means of the normal distributions.
//% s: it has the same structure with input argument s and contains
//% the final estimates of the variances of the normal
//% distributions.
//% Pa: J-dimensional vector, whose j-th element is the final estimate
//% of the a priori probability of the j-th distribution.
//% iter: the number of iterations required for the convergence of the
//% EM algorithm.
//% Q_tot: vector containing the likelihood value at each iteration.
//% e_tot: vector containing the error value at each itertion.
//%
//%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

x=x';
m=m';
[p,n]=size(x);
[J,n]=size(m);

e=e_min+1;

Q_tot=[];
e_tot=[];

iter=0;
//while (e>e_min)
while (iter<20)

iter=iter+1;
e;

P_old=Pa;
m_old=m;
s_old=s;

//% Determine P(j|x_k; theta(t))


for k=1:p
temp=gauss(x(k,:),m,s);
ru=isnan(temp);
temp(ru)=0;

P_tot=temp*Pa';
for j=1:J
P(j,k)=temp(j)*Pa(j)/P_tot;
end
end

//% Determine the log-likelihood


Q=0;
for k=1:p
for j=1:J
Q=Q+P(j,k)*(-(n/2)*log(2*%pi*s(j)) - sum( (x(k,:)-m(j,:)).^2)/(2*s(j)) + log(Pa(j)) );
end
end
Q_tot=[Q_tot Q];

// % Determine the means


for j=1:J
a=zeros(1,n);
for k=1:p
a=a+P(j,k)*x(k,:);
end

la=(sum(P(j,:)));
m(j,:)=a/la; // update the full mean vector of the j-th component

end

// % Determine the variances


for j=1:J
b=0;
for k=1:p
b=b+ P(j,k)*((x(k,:)-m(j,:))*(x(k,:)-m(j,:))');
end
s(j)=b/(n*sum(P(j,:)));

if(s(j)<10^(-10))
s(j)=0.001;
end

end
end
// % Determine the a priori probabilities
for j=1:J
a=0;
for k=1:p
a=a+P(j,k);
end
Pa(j)=a/p;
end
ru1=isnan(m);
m(ru1)=0;
ru1=isnan(s);
s(ru1)=0;
e=sum(abs(Pa-P_old))+sum(sum(abs(m-m_old)))+sum(abs(s-s_old));
e_tot=[e_tot e];

end

Programme-7: Estimation of parameters (mean and variance) using Maximum posterior


estimation for a dataset generated by using normal distribution function

clc;
// Generate dataset X
rand('seed',0);
m = [2 -2]'; //mean
S= [0.2 0; 0 0.2]; // covariance
X=grand(50, "mn", m,S);

//maximum posterior estimation

m1_ini=[2; -2]; m2_ini=[1; -1]; // initial mean estimates (one column per component)
m_ini=[m1_ini m2_ini];
s_ini=[0.1 0.1]; // initial variances, one per component (covariances assumed to be sigma^2*I)
Pa_ini=[1/2 1/2 ];
e_min=10^(-5);
[m_hat,s_hat,Pa,iter,Q_tot,e_tot]=em_alg_function(X,m_ini,s_ini,Pa_ini,e_min);
disp("Estimate mean by posterior likelihood estimation",m_hat);
disp("Estimate covariance by posterior likelihood estimation",s_hat);

4.6 Results

4.7 Screen Shots

4.8 Pre-lab Questions


1. Explain the parametric method.

2. Differentiate between the maximum likelihood method and the Bayesian inference approach.

3. Write the formula for estimation of the mean and covariance in the maximum likelihood method
for one-dimensional data.

Pre-lab Answers
1.
2.

3.

4.9 Post-lab Questions

1. Generate a dataset with the normal distribution function having mean [−1; 1] and
covariance [0.8 0.2; 0.2 0.4].

2. Estimate the mean and covariance using maximum likelihood estimation for the dataset
generated in post-lab question (1).

3. Generate a dataset with the normal distribution function having mean [2; 3] and
covariance [0.6 0; 0 0.6].

4. Estimate the mean and covariance using maximum posterior estimation for the dataset
generated in post-lab question (3).

Post-lab Answers
1.

2.

3.

4.

4.10 Conclusion
5. LOADING A DATA SET AND SELECTING PREDICTIVE FEATURES

5.1 Objective
To load a data set and select predictive features
5.2 Tasks
I. Load the datasets
II. Interpret the datasets
III. Extract the general features from the dataset
IV. Histogram plot of dataset
V. Estimation of best feature selection using correlation method

5.3 Theory
Pattern recognition is formally defined as the process whereby a received pattern/signal is
assigned to one of a prescribed number of classes. If the class labels of the training patterns are
known, the task is called supervised classification; otherwise it is known as unsupervised
classification.

Features depend on the problem: they measure 'relevant' quantities. Some techniques are
available to extract 'more relevant' quantities from the initial measurements (e.g. PCA,
principal component analysis). After feature extraction each pattern is a vector, and a classifier
is a function that maps such a vector onto a class label. Many general techniques for classifier
design are available, and the final system must be tested and validated.
Using all the features may not give good performance. For example, if you read the entire
textbook before a test, you may not score very well, because you waste effort studying
irrelevant material. You can assist your algorithm by feeding in only those features that are
really important for the predictive model. So you have to choose the best combination of
possible features for training the machine learning algorithm. This process is known as
feature selection.
Main reasons to use feature selection are:
 It enables the machine learning algorithm to train faster.
 It reduces the complexity of a model and makes it easier to interpret.
 It improves the accuracy of a model if the right subset is chosen.
 It reduces overfitting.
There are three methods for feature selection are available
 Filter Methods
 Wrapper Methods
 Embedded Methods
Filter method
In this method the selection of features is independent of any machine learning algorithms.
The features are selected on the basis of their scores in various statistical tests for their
correlation with the outcome variable.

Pearson’s Correlation:
It quantifies the linear dependence between two continuous variables X and Y, and varies from −1
to +1. Pearson's correlation is given as:

ρ(X,Y) = cov(X,Y) / (σX σY)
In this experiment Pearson’s correlation is used to select the best features.
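A minimal sketch of this computation on two assumed feature vectors follows; the explicit formula and the correl function (used in the programme below) should agree.

// Minimal sketch: Pearson's correlation between two assumed feature vectors
x = [1 2 3 4 5 6]';
y = [2 4 5 4 5 7]';
cx = x - mean(x);  cy = y - mean(y);                       // centred vectors
rho_manual = sum(cx.*cy)/sqrt(sum(cx.^2)*sum(cy.^2));      // explicit formula
disp("Pearson correlation (explicit formula):");
disp(rho_manual);
disp("Pearson correlation (correl):");
disp(correl(x, y));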

Database used:
There are 3 databases used.
(i) diabetes database,
(ii) lung cancer database.
(iii) Shape database
The first two databases are one-dimensional (tabular) databases. The shape database is
two-dimensional (images). For the image database, an Excel sheet has been created to store the
names of the files in the database.
Diabetes database:
It consists of 9 columns (9 feature vectors). The description is as follows
Pregnancies: Number of times pregnant
Glucose: Plasma glucose concentration at 2 hours in an oral glucose tolerance test
BloodPressure: Diastolic blood pressure (mm Hg)
SkinThickness: Triceps skin fold thickness (mm)
Insulin: 2-Hour serum insulin (mu U/ml)
BMI: Body mass index (weight in kg/(height in m)^2)
DiabetesPedigreeFunction: Diabetes pedigree function
Age: Age (years)
Outcome: Class variable (0 or 1)

Some scilab codes used in the programme:

a=read_csv(file name); // It converts all the content to string


b=csvRead(file name); // It only shows numerical value

[fd,SST,Sheetnames,Sheetpos]=xls_open('imagedataset.xls'); // open the Excel sheet

a22=strcat([a11,".png"]) // concatenation of 2 strings

Features extracted in the programme


//1. count--- It shows the number of non-empty rows in a feature
//2. mean1------It shows the mean value of each feature vector
//3. std1-----Standard deviation
//4. min1----minimum value
//max1----maximum value
//f25---25% percentile
//f50---50% percentile
//f75---75% percentile

5.4 Algorithm
Step-1:
Step-2:

5.5 Programme
Programme-1: Load the diabetes dataset and selection of best predictive feature using 1-D
database

//Experiment number-5
//Loading the dataset and selection of best features using 1-D data
clc
clear ;
// load the database
a=read_csv('diabetes.csv'); // It converts all the content to string
b=csvRead('diabetes.csv'); // It only shows numerical value

// Display the databse


disp(a);

// Display 5 rows only


disp(a(1:5,:));

// size of the matrix


[m,n]=size(b);

// Computing the features


//1. count--- It shows number of NoN-empty rows in a feature
//2. mean1------It shows the mean value of each feature vector
//3. std1-----Standard deviation
//4. min1----minimum value
//max1----maximum value
//f25---25% percentile
//f50---50% percentile
//f75---75% percentile

count=zeros(1,n);
for i=1:n
c=b(:,i);
for j=1:m
if ~isnan(c(j))
count(1,i)=count(1,i)+1;
end
end
loc1=find(~isnan(c))
d=c(loc1)
mean1(1,i)=mean(d);
std1(1,i)=stdev(d);
min1(1,i)=min(d);
max1(1,i)=max(d);
e(:,i)=gsort(d,'g','i');

ad1=round(length(d)*0.25);
ad2=round(length(d)*0.5);
ad3=round(length(d)*0.75);
f25(1,i)=e(ad1,i)
f50(1,i)=e(ad2,i)
f75(1,i)=e(ad3,i)
end
features=[count; mean1; std1; min1; max1; f25; f50; f75];
disp(features);
figure(1);
histplot(20,b(2:m,1), style=2)
title("pregnencies histogram plot")
figure(2);
histplot(20,b(2:m,2),style=2)
title("Glucose histogram plot")

figure(3);
histplot(20,b(2:m,3),style=2)
title("BloodPressure histogram plot")
figure(4);
histplot(20,b(2:m,4),style=2)
title("SkinThickness histogram plot")
figure(5);
histplot(20,b(2:m,5),style=2)
title("Insulin histogram plot")
figure(6);
histplot(20,b(2:m,6),style=2)
title("BMI histogram plot")
figure(7);
histplot(20,b(2:m,7),style=2)
title("DiabetesPedigreeFunction histogram plot")

figure(8);
histplot(20,b(2:m,8),style=2)
title("Age histogram plot")

figure(9);
histplot(20,b(2:m,9),style=2)
title("Outcome histogram plot")

// pearson's correlation estimation


for i=1: n
for j=1:n
car(i,j)= correl(b(2:m,i),b(2:m,j));
end
end

[bestfr,bestfc]=find (car>0.5 & car<0.9);

best_features=[bestfr' bestfc'];
disp(best_features);

Programme-2: Load the diabetes dataset and selection of best predictive feature using 2-D
database

//Experiment number-5
//Loading the dataset and selection of best features using 2-D data
clc;
clear

[fd,SST,Sheetnames,Sheetpos]=xls_open('imagedataset.xls');
for l=1:30
a11=SST(l)
a22=strcat([a11,".png"])
a=imread(a22);
b=rgb2gray(a);
// figure
//imshow (b);
a33=im2double(b);
mean1(l,1)=mean(a33);
std1(l,1)=stdev(a33);
min1(l,1)=min(a33);
max1(l,1)=max(a33);
// h11(l,1)=histc(a33);

end
features=[mean1,std1,min1,max1];
disp("The features are", features)

// pearson's correlation estimation


[m,n]=size(features)
for i=1: n
for j=1:n
car(i,j)= correl(features(:,i),features(:,j));
end
end

[bestfr,bestfc]=find (car>0 & car<0.9);

best_features=[bestfr' bestfc'];
disp(best_features);

5.6 Results

5.7 Screen Shots


5.8 Pre-lab Questions
1. Differentiate between supervised learning and unsupervised learning.
2. Explain the need of feature extraction in pattern recognition system.
3. Explain the need of feature selection in pattern recognition system.

Pre-lab Answers
1.

2.

3.

5.9 Post-lab Questions


1. Load the lung cancer dataset as given in the csv file.
2. Plot the histogram for all features provided in the database.
3. Calculate features like mean, standard deviation, minimum value and maximum value for
each feature vector given in the dataset.
4. Choose the best features from the database using the correlation method.
5. Create an Excel sheet for the shape dataset and display the drawings (20).

Post-lab Answers
1.

2.

3.
4.

5.10 Conclusion

6. PROGRAM ON CLUSTERING TECHNIQUES

6.1 Objective
To write a program on clustering the dataset

6.2 Tasks
I. Create datasets.

II. Develop K mean clustering algorithm

III. Application of K mean clustering algorithm to cluster the dataset into k number of
clusters

6.3 Theory
Clustering is a technique for finding similarity groups in data, called clusters. I.e., it groups
data instances that are similar to (near) each other in one cluster and data instances that are
very different (far away) from each other into different clusters. Clustering is often called an
unsupervised learning task as no class values denoting an a priori grouping of the data
instances are given, which is the case in supervised learning. Due to historical reasons,
clustering is often considered synonymous with unsupervised learning. In fact, association
rule mining is also unsupervised.

K-means clustering: K-means is a partitional clustering algorithm. Let the set of data points
(or instances) D be {x1, x2, …, xn},

where xi = (xi1, xi2, …, xir) is a vector in a real-valued space X ⊆ R^r, and r is the number of
attributes (dimensions) in the data.
For example, the diabetes database consists of 9 columns (9 feature vectors: Pregnancies,
Glucose, BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, Age,
Outcome) and 763 rows (patients).

The k-means algorithm partitions the given data into k clusters. Each cluster has a cluster
center, called centroid. k is specified by the user.

Given k, the k-means algorithm works as follows:

Step-1: Randomly choose k data points (seeds) to be the initial centroids, cluster centers

Step-2: Assign each data point to the closest centroid (closest point can be estimated by
calculating distance).

Step-3: Re-compute the centroids using the current cluster memberships.

Step-4: If a convergence criterion is not met, go to step-2).

Stopping and converging criteria is

1. no (or minimum) re-assignments of data points to different clusters,

2. no (or minimum) change of centroids, or

3. minimum decrease in the sum of squared error (SSE),

SSE = ∑j ∑x∈Cj dist(x, mj)², where Cj is the jth cluster, mj is the centroid of cluster Cj (the mean
vector of all the data points in Cj), and dist(x, mj) is the distance between data point x and centroid mj.
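A minimal sketch of this criterion, for an assumed set of points, centroids, and cluster memberships (not produced by the k-means programme below), might look like this:

// Minimal sketch: sum of squared error (SSE) for an assumed clustering
F      = [0.1 0.2; 0.3 0.1; 2.0 2.1; 2.2 1.9];   // data points (one per row)
CENTS  = [0.2 0.15; 2.1 2.0];                     // centroids (one per row)
labels = [1; 1; 2; 2];                            // assumed cluster memberships
SSE = 0;
for i = 1:size(F,1)
    SSE = SSE + norm(F(i,:) - CENTS(labels(i),:))^2;   // dist(x, mj)^2
end
disp("SSE for this assignment:");
disp(SSE);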

6.4 Algorithm
Step-1:

Step-2:
6.5 Program
Programme-1: k-mean clustering algorithm

//%% K-means
function [CENTS, DAL]=km_fun(F, K, KMI)
CENTS = F( ceil(rand(K,1)*size(F,1)) ,:); // Cluster Centers
DAL = zeros(size(F,1),K+2); //Distances and Labels
for n = 1:KMI

for i = 1:size(F,1)
for j = 1:K
DAL(i,j) = norm(F(i,:) - CENTS(j,:));
end
[Distance, CN] = min(DAL(i,1:K)); // columns 1:K are distances from the cluster centers
DAL(i,K+1) = CN; // K+1 is Cluster Label
DAL(i,K+2) = Distance; // K+2 is Minimum Distance
end
for i = 1:K
A = (DAL(:,K+1) == i); // Cluster K Points
CENTS(i,:) = mean(F(A,:)); // New Cluster Centers
if sum(isnan(CENTS(:))) ~= 0 // if CENTS(i,:) is NaN then replace it with a random point
NC = find(isnan(CENTS(:,1)) == 1); // Find Nan Centers
for Ind = 1:size(NC,1)
CENTS(NC(Ind),:) = F(ceil(rand()*size(F,1)),:); // pick a random data point (randi is not available in Scilab)
end
end
end
end
end

Programme-2: Cluster the dataset using k mean clustering algorithm

// This code implements K-means Clustering


//Parameters (number of clusters, K-means Iteration)
clc;
// Generate Points
Sigma = [0.5 0.05; 0.05 0.5];
f1 = grand(1000,"mn",[0.5 0]' ,Sigma);
f2=grand(1000,"mn",[0.5 0.5]' ,Sigma);
f3=grand(1000,"mn",[0.5 1]' ,Sigma);
f4=grand(1000,"mn",[0.5 1.5]' ,Sigma);
//F = [f1;f2;f3;f4]';
F=f1'
figure;
plot(F(:,1), F(:,2),'ro')

//%% K-means options


feature_vector = F; //% Input
number_of_clusters = 4; //% Number of Clusters
Kmeans_iteration = 40; //% K-means Iteration
//%% Test K-means
[cluster_centers, data] = km_fun(feature_vector, number_of_clusters, Kmeans_iteration);
//% K-means clustering
disp(cluster_centers);
//%% Plot

figure;
PT = feature_vector(data(:,number_of_clusters+1) == 1, :);
plot(PT(:, 1),PT(:, 2),'bo', 'LineWidth', 2);
set(gca(),"auto_clear","off")
PT = feature_vector(data(:,number_of_clusters+1) == 2, :);
plot(PT(:, 1),PT(:, 2),'ro', 'LineWidth', 2);
set(gca(),"auto_clear","off")
PT = feature_vector(data(:,number_of_clusters+1) == 3, :);
plot(PT(:, 1),PT(:, 2),'go', 'LineWidth', 2);
set(gca(),"auto_clear","off") //% Plot points with determined color and shape
PT = feature_vector(data(:,number_of_clusters+1) == 4, :);
plot(PT(:, 1),PT(:, 2),'yo', 'LineWidth', 2);
set(gca(),"auto_clear","off") //% Plot points with determined color and shape

plot(cluster_centers(:, 1), cluster_centers(:, 2), '*k', 'LineWidth', 7); //% Plot cluster centers

6.6 Results

6.7 Screen Shots


6.8 Pre-lab Questions
1.Q. Define clustering
2.Q. Write the algorithm steps for k-means clustering
3.Q. Explain the k-means clustering algorithm with diagram.
4. Q. write the strength and weakness of k-means clustering algorithm.

Pre-lab Answers
1.

2.

3.

4.
6.9 Post-lab Questions
1. Develop the programme to generate 1000 random data points using the multivariate normal
distribution function with mean [0.5; 0.5] and covariance [0.5 0.05; 0.05 0.5]. Plot the
generated dataset.

2. Cluster the data in post-lab question 1 into 3 clusters using k-means clustering with
number of iterations = 50. Plot the clusters with centroids.

3. Develop the programme to generate 1000 random data points using the multivariate normal
distribution function with mean [0.5; 1.5] and covariance [0.5 0.05; 0.05 0.5]. Plot the
generated dataset.

4. Cluster the data in post-lab question 3 into 5 clusters using k-means clustering with
number of iterations = 60. Plot the clusters with centroids.

Post-lab Answers
1.

2.

3.

4.

6.10 Conclusion
7. LOGIC GATE FUNCTION DESCRIPTION WITH HEBB RULE

7.1 Objective
Logic gate function description with Hebb rule

7.2 Tasks
I. Develop programme to Implement of Hebb network to classify 2 dimensional patterns

II. Develop program to compute weight and bias using Hebb rule with target created by AND
logic gate

7.3 Theory
Definition of neural network: a neural network is a massively parallel distributed processor
made up of simple processing units that has a natural propensity (natural tendency to behave
in a particular way) for storing experiential knowledge and making it available for use. It
resembles the brain in two respects:

1. Knowledge is acquired by the network from its environment through a learning process.

2. Interneuron connection strengths, known as synaptic weights, are used to store the
acquired knowledge.

The procedure used to perform the learning process is called a learning algorithm, the
function of which is to modify the synaptic weights of the network in an orderly fashion to
attain a desired design objective. The modification of synaptic weights provides the
traditional method for the design of neural networks.

MODELS OF A NEURON
A neuron is an information-processing unit that is fundamental to the operation of a neural
network. The block diagram shown below presents the model of a neuron, which
forms the basis for designing a large family of neural networks. Here, we identify three basic
elements of the neural model:

1. A set of synapses, or connecting links, each of which is characterized by a weight or


strength of its own. Specifically, a signal x j at the input of synapse j connected to
neuron k is multiplied by the synaptic weight w kj. It is important to make a note of the
manner in which the subscripts of the synaptic weight w kj are written.

The first subscript in w kj refers to the neuron in question, and the second subscript refers to
the input end of the synapse to which the weight refers. Unlike the weight of a synapse in the
brain, the synaptic weight of an artificial neuron may lie in a range that includes negative as
well as positive values.

2. An adder for summing the input signals, weighted by the respective synaptic strengths of
the neuron; the operations described here constitute a linear combiner.

3. An activation function for limiting the amplitude of the output of a neuron. The activation
function is also referred to as a squashing function, in that it squashes (limits) the permissible
amplitude range of the output signal to some finite value.

Fig 7.1 Nonlinear Model of a neuron, labelled k

Learning process in Neural network:

According to the error the updating weight and bias is known as learning process in neural
network.
Hebb network algorithm (Hebb learning rule) is used to update the weight and bias.

Hebb learning rule

Step-1: Initialize weights, they can be set to zero,

wi = 0 for i = 1 to n, where n = total number of input neurons.

Set bias (b) to 0.

Step-2: Weight and bias adjustments are performed as follows, where xi is the input and y is the
target:

wi(new) = wi(old) + xi y

b(new) = b(old) + y

Change in weight: Δwi = xi y

Example with problem

Using the Hebb rule, find the weights required to perform the following classification of the
given input patterns. The '+' symbol represents the value 1 and an empty square indicates −1.
Consider that 'I' belongs to the class (target value 1) and 'O' does not belong to the class
(target value −1). Use the manual method to calculate the new weights and bias.

Pattern 'I' (3×3):      Pattern 'O' (3×3):
+ + +                   + + +
  +                     +   +
+ + +                   + + +

Ans:- According to the given data the following table is formulated.

Input/class x1 x2 x3 x4 x5 x6 x7 x8 x9 y

I 1 1 1 -1 1 -1 1 1 1 1

O 1 1 1 1 -1 1 1 1 1 -1

Step-1: Initialize weights, they can be set to zero,


wi = 0 for i = 1 to n, where n = total number of input neurons.

Set the bias (b) to 0. Here n = 9.

Step-2: wi(new) = wi(old) + xi y

b(new) = b(old) + y

Epoch-1 (For class I) y=1 here

w1(new) = 0 + 1×1 = 1    w2(new) = 0 + 1×1 = 1    w3(new) = 0 + 1×1 = 1

w4(new) = 0 + (−1)×1 = −1    w5(new) = 0 + 1×1 = 1    w6(new) = 0 + (−1)×1 = −1

w7(new) = 0 + 1×1 = 1    w8(new) = 0 + 1×1 = 1    w9(new) = 0 + 1×1 = 1

b(new) = 0 + 1 = 1

w(new) = [1, 1, 1, −1, 1, −1, 1, 1, 1]

Epoch-2 (For class O) y = −1 here

w1(new) = 1 + 1×(−1) = 0    w2(new) = 1 + 1×(−1) = 0    w3(new) = 1 + 1×(−1) = 0

w4(new) = −1 + 1×(−1) = −2    w5(new) = 1 + (−1)×(−1) = 2

w6(new) = −1 + 1×(−1) = −2    w7(new) = 1 + 1×(−1) = 0

w8(new) = 1 + 1×(−1) = 0    w9(new) = 1 + 1×(−1) = 0

b(new) = 1 + (−1) = 0

w(new) = [0, 0, 0, −2, 2, −2, 0, 0, 0] (ans)

b = 0 (ans)
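The hand calculation above can be checked with a minimal Scilab sketch (same patterns and targets):

// Minimal sketch: Hebb rule check for the I / O patterns worked out above
I = [1 1 1 -1 1 -1 1 1 1];     // pattern I, target +1
O = [1 1 1 1 -1 1 1 1 1];      // pattern O, target -1
x = [I; O];
t = [1 -1];
w = zeros(1, 9);  b = 0;
for i = 1:2
    w = w + x(i,:)*t(i);       // w(new) = w(old) + x*y
    b = b + t(i);              // b(new) = b(old) + y
end
disp("Final weights:"); disp(w);   // expected [0 0 0 -2 2 -2 0 0 0]
disp("Final bias:");    disp(b);   // expected 0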

Hebb rule with target created by AND logic gate

AND gate

x1 x2 y
1 1 1

1 0 0

0 1 0

0 0 0

Here 0 is considered as -1

Here bias(b) assumed to be 1

x1 x2 y

1 1 1

1 -1 -1

-1 1 -1

-1 -1 -1

x1 x2 B y

1 1 1 1

1 -1 1 -1

-1 1 1 -1

-1 -1 1 -1

Change in weight=∆ wi=x i y

∆ b= y
x1 x2 b y ∆ w1 ∆ w2 ∆b

1 1 1 1 1 1 1

1 -1 1 -1 -1 1 -1

-1 1 1 -1 1 -1 -1

-1 -1 1 -1 1 1 -1

w i ( new )=wi ( old ) + x i y

b ( new )=b ( old ) + y

x1 x2 b y ∆ w1 ∆ w2 ∆b w1 w2 b

1 1 1 1 1 1 1 1 1 1

1 -1 1 -1 -1 1 -1 0 2 0

-1 1 1 -1 1 -1 -1 1 1 -1

-1 -1 1 -1 1 1 -1 2 2 -2

7.4 Algorithm
Step-1:

Step-2:
7.5 Programme
Programme-1: Develop programme to Implement of Hebb network to classify 2
dimensional patterns

// Experiment number-7
//Hebb net to classify two-dimensional input patterns

clear ;
clc ;

// Input patterns
E =[1 1 1 1 1 -1 -1 -1 1 1 1 1 1 -1 -1 -1 1 1 1 1];
F =[1 1 1 1 1 -1 -1 -1 1 1 1 1 1 -1 -1 -1 1 -1 -1 -1];
x (1 ,1:20) =E;
x (2 ,1:20) =F;
w (1:20) =0;
w=w'
t =[1 -1];
b =0;
for i =1:2
w=w+x(i ,1:20) *t(i);

b=b+t(i);
end
disp ( 'Weight matrix' );
disp (w);
disp ( 'Bias' );
disp (b);
Programme-2: Develop program to compute weight and bias using Hebb rule with
target created by AND logic gate

//Computation of bias and weight for the AND function using the Hebb rule
clear ;
clc ;
E=[1,1,-1,-1];
F=[1,-1,1,-1];
x (1 ,1:4) =E;
x (2 ,1:4) =F;
[m,n]=size(E)
B=[1 1 1 1]
for i=1:n
if x(1,i)==1 & x(2,i)==1
y(1,i)=1
elseif x(1,i)==-1 & x(2,i)==1
y(1,i)=-1
elseif x(1,i)==1 & x(2,i)==-1
y(1,i)=-1
else
y(1,i)=-1
end
end
disp("input is", x);
disp("Target is",y)

for i=1:n
delw1(1,i)=x(1,i)*y(1,i)
delw2(1,i)=x(2,i)*y(1,i)
delb(1,i)=y(1,i)
end
disp("Del w1 is",delw1);
disp("Del w2 is",delw2 );
disp("Del bias is", delb );
w1old=0;
w2old=0;
bold=0;
for i=1:n
w1new(1,i)=w1old+delw1(1,i);
w2new(1,i)=w2old+delw2(1,i);
bnew(1,i)=bold+delb(1,i);
w1old=w1new(1,i);
w2old=w2new(1,i);
bold=bnew(1,i)
end

disp(" w1 newis",w1new);
disp("w2 new is",w2new );
disp("new bias is", bnew );

7.6 Results

7.7 Screen Shots


7.8 Pre-lab Questions
1.Q. Define neural network.
2.Q. Define model of neuron with block diagram.
3.Q. Write the algorithm for Hebb learning rule.
4.Q. Compute the weight and bias using AND function with Hebb network.

Pre-lab Answers
1.

2.

3.

4.

7.9 Post-lab Questions


1. Using the Hebb rule, find the weights required to perform the following classification of the
given input patterns: the '+' symbol represents the value 1 and an empty square indicates −1.
Consider that 'I' belongs to the class (target value 1) and 'O' does not belong to the class
(target value −1). Use the manual method to calculate the new weights and bias.

+ +
+ + + + +
+ +
I + + O

2. Write the programme for the input given in post-lab Q1. Compare the results of question 1
and question 2.

3. Using the Hebb rule, find the weights required to perform the following classification of the
given input patterns: the '+' symbol represents the value 1 and an empty square indicates −1.
Consider that 'I' belongs to the class (target value 1) and 'O' does not belong to the class
(target value −1). Use the manual method to calculate the new weights and bias.

+ + + + +
+ + + + +
+ +
I O

4. Write the programme for the input given in post-lab Q3. Compare the results of question 3
and question 4.

Post-lab Answers
1.

2.

3.
4.

7.10 Conclusion

8. EVALUATING FUNCTION WITH DIFFERENT LEARNING RULES

8.1 Objective

To update the weights and bias using learning rules for different functions

8.2 Tasks

I. Develop programme to Implement simple neuron model

II. Develop a program to update the weight and bias using error-correction learning for the AND function

8.3 Theory
MODELS OF A NEURON

A neuron is an information-processing unit that is fundamental to the operation of a neural


network. The block diagram shown below presents the model of a neuron, which
forms the basis for designing a large family of neural networks. Here, we identify three basic
elements of the neural model:

1. A set of synapses, or connecting links, each of which is characterized by a weight or


strength of its own. Specifically, a signal x j at the input of synapse j connected to neuron k is
multiplied by the synaptic weight w kj. It is important to make a note of the manner in which
the subscripts of the synaptic weight w kj are written.
The first subscript in w kj refers to the neuron in question, and the second subscript refers to
the input end of the synapse to which the weight refers. Unlike the weight of a synapse in the
brain, the synaptic weight of an artificial neuron may lie in a range that includes negative as
well as positive values.

2. An adder for summing the input signals, weighted by the respective synaptic strengths of
the neuron; the operations described here constitute a linear combiner.

3. An activation function for limiting the amplitude of the output of a neuron. The activation
function is also referred to as a squashing function, in that it squashes (limits) the permissible
amplitude range of the output signal to some finite value.

Fig 8.1 Nonlinear Model of a neuron, labelled k

Typically, the normalized amplitude range of the output of a neuron is written as the
closed unit interval [0,1], or, alternatively, [-1,1].

The neural model of Fig. 8.1 also includes an externally applied bias, denoted by bk. The
bias bk has the effect of increasing or lowering the net input of the activation function,
depending on whether it is positive or negative, respectively. In mathematical terms, we
may describe the neuron k depicted in Fig. 8.1 by the following pair of equations:

m
uk =∑ w kj x j
j =1

y k =φ(u k + bk )

v k =uk +b k
Where x 1 , x 2 ,… .. x m are input signal, w k 1 , w k 2 ,… .. w km are respective synaptic weights of
neuron k . uk is the linear combiner output due to the input signal. b k is the bias. φ is the
activation function. y k is the output signal of the neuron.

Two basic types of activation functions are available

1. Threshold function: The threshold function is defined as

        { 1   if v ≥ 0
φ (v) = {
        { 0   if v < 0

v is the input to the activation function

In engineering, this form of a threshold function is commonly referred to as a Heaviside
function.

2. Sigmoid function The sigmoid function is defined as

φ (v) = 1 / (1 + exp(−a v))

a = slope parameter of the sigmoid function

Fig 8.2 (a) Threshold function (b) Sigmoid function for varying slope parameter a
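
The following short Scilab sketch (illustrative only, with an assumed net input v and slope parameter a; it is separate from the programmes in Section 8.5) evaluates both activation functions defined above:

// Illustrative sketch: evaluate both activation functions for an assumed net input
v = 0.4;  // assumed net input to the activation function
a = 1;    // assumed slope parameter of the sigmoid
// Threshold (Heaviside) function
if v >= 0 then
    y1 = 1;
else
    y1 = 0;
end
// Sigmoid function
y2 = 1/(1 + exp(-a*v));
disp("Threshold output", y1);
disp("Sigmoid output", y2);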

Definition of learning

Learning is a process by which the free parameters of a neural network are adapted through a
process of stimulation by the environment in which the network is embedded. The type of the
learning is determined by the manner in which the parameter changes take place. (Mendel &
McClaren 1970).

Five Basic Learning Rules

• Error-correction learning <- optimum filtering

• Memory-based learning <- memorizing the training data explicitly

• Hebbian learning <- neurobiological

• Competitive learning <- neurobiological

• Boltzmann learning <- statistical mechanics

Error-correction learning

error signal = desired response – output signal

ek(n) = dk(n) − yk(n)

ek(n) actuates a control mechanism to make the output signal yk(n) come closer to the desired
response dk(n) in step by step manner

Fig 8.3 Illustrating error-correction learning

A cost function E(n) = ½ e²k(n) is the instantaneous value of the error energy. Minimizing E(n)
drives the weights toward a steady state and leads to the delta rule, or Widrow-Hoff rule:

Δwkj(n) = η ek(n) xj(n)

where η is the learning-rate parameter. The adjustment made to a synaptic weight of a neuron is
proportional to the product of the error signal and the input signal of the synapse in
question.

wkj(n+1) = wkj(n) + Δwkj(n)
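
A single error-correction step can be written directly from these two equations. The sketch below is only an illustration with assumed values for the learning rate, input, weights and desired response; the complete training loop is given in Programme-2 of Section 8.5.

// Illustrative sketch: one error-correction (delta rule) update with assumed values
eta = 0.5;              // assumed learning rate
x = [1 0 1];            // assumed input (bias input 1, then x1, x2)
w = [0.2 -0.1 0.4];     // assumed current weights
d = 1;                  // assumed desired response
net = sum(w .* x);      // net input to the neuron
if net >= 0 then
    y = 1;
else
    y = 0;
end
e = d - y;              // error signal e(n) = d(n) - y(n)
w = w + eta*e*x;        // w(n+1) = w(n) + eta*e(n)*x(n)
disp("Updated weights", w);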

Model for a 2-input function

(Diagram: inputs x1 and x2, with weights w1 and w2 and bias w0, feed a summing junction followed by an activation function that produces the output y)

AND Function

x1 x2 y
0 0 0
0 1 0
1 0 0
1 1 1

AND-NOT Function

x1 x2 y
0 0 0
0 1 0
1 0 1
1 1 0

OR Function

x1 x2 y
0 0 0
0 1 1
1 0 1
1 1 1
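
Note that in Programme-2 below, the bias is handled by adding a constant first input of 1 to every training pattern (the column labelled B), so the bias is learned as the first element of the weight vector together with the weights for x1 and x2.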

8.4 Algorithm

Step-1:

Step-2:

8.5 Programme

Programme-1: Develop programme to Implement simple neuron model

clear;
clc;
//number of inputs
n=input("Enter number of inputs for neural network");

// randomly generate input


x=-1+rand(1,n)*2; // to generate input between -1 and 1
disp("Input is ", x)

// randomly generate weight


w=-1+rand(1,n)*2;
disp("weight is ", w)
// enter bias
b=rand(1,1);

u=0;
for i=1:n
u=u+(x(i)*w(i));
end
v=u+b;

// Activation function
// Threshold activation function
if v>=0
y1=1
else
y1=0
end
disp("Output considering activation function as threshold function", y1)
// sigmoid activation function
a=1;
y2=1/(1+exp(-a*v));
if y2>=0.5
y3=1
else
y3=0
end
disp("Output considering activation function as sigmoid function", y3)

Programme-2: Develop program to update weight and bias by using error-correction learning
for AND function

// Use of error correction learning for AND gate function


clear
clc
//initialise the inputs
x=[1 0 0
1 0 1
1 1 0
1 1 1];
disp("Input is :");
disp("B x1 x2");
disp(x);
yd=[0;0;0;1]; // This is for AND gate
disp("Target Output Yd Is :");
disp(yd);

ya=rand(4,1);

//Initialise the weights


w=rand(1,3);
w1=w

disp("Initialise Random Weights:");


disp(" W1 W2 W3");
disp(w);

lr=0.5; // learning rate


disp("Learning Coefficient =");
disp(lr);
flag=0;
net=0;
err=0;
epoch=0;
thresh=0;

while flag==0 do
for i=1:4
for j=1:3
net=net+w(1,j)*x(i,j);
end;
if net >= thresh then // threshold activation function
ya(i,1)=1;
else
ya(i,1)=0;
end;

err=yd(i,1)-ya(i,1);
for j=1:3
w(1,j)=w(1,j)+ (lr*x(i,j)*err); // error correction learning
end;
net=0.00; //Reset net for next iteration
end
disp(ya,"Actual Output");
disp(yd,"Desired Output");

epoch=epoch+1;
disp("End of Epoch No:");
disp(epoch);
disp("************************************************************’");
if epoch > 1000 then
disp("Learning Attempt Failed !")
break
end;

if yd(1,1) == ya(1,1)& yd(2,1) == ya(2,1) & yd(3,1) == ya(3,1) & yd(4,1) == ya(4,1) then
flag=1;
else
flag=0;
end
end
disp("Initial Random Weights -");
disp(w1);
disp("Final Adjusted Weights -");
disp(w);
disp(lr,"Learning rate is – ")
disp("***********************************’")
plot(yd,ya);

8.6 Results

8.7 Screen Shots

8.8 Pre-lab Questions


1.Q. Define learning process in neural network.
2.Q. What is the use of activation function in neuron model?
3.Q. Define threshold and sigmoid activation function
4.Q. Explain error correction learning.

Pre-lab Answers

1.

2.

3.

4.

8.9 Post-lab Questions

1. Q. Develop program to update weight and bias by using error-correction learning for the OR
function, with learning rate 0.3.
2. Q. Develop program to update weight and bias by using error-correction learning for the OR
function, with learning rate 0.7.
3. Q. Develop program to update weight and bias by using error-correction learning for the
AND-NOT function, with learning rate 0.4.
4. Q. Develop program to update weight and bias by using error-correction learning for the
AND-NOT function, with learning rate 0.6.

8.10 Post-lab Answers

1.

2.

3.

4.

8.11 Conclusion
9. XOR PROBLEM WITH PERCEPTRON NETWORK

9.1 Objective

To implement the XOR function using McCulloch-Pitts and Madaline networks

9.2 Tasks

I. Develop programme to implement McCulloch-Pitts net for XOR function


II. Develop programme to implement XOR function using Madaline network

9.3 Theory
The multilayer perceptron network consists of input layer, output layer and hidden layers.

Fig 9.1 Multilayer Perceptron Network (input layer, hidden layers, output layer)


Fig 9.2 A solution for the XOR problem
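
As a concrete illustration of this solution, the following minimal sketch computes XOR through two hidden units using the fixed weight set quoted at the end of Programme-1 (w11 = 1, w21 = -1, w12 = -1, w22 = 1, v1 = 1, v2 = 1, theta = 1), with no user input required:

// Illustrative sketch: XOR via two hidden units with fixed weights
x1 = [0 0 1 1];
x2 = [0 1 0 1];
w11 = 1; w21 = -1; w12 = -1; w22 = 1;
v1 = 1; v2 = 1; theta = 1;
for i = 1:4
    if x1(i)*w11 + x2(i)*w21 >= theta then  // hidden unit Z1: x1 AND NOT x2
        z1 = 1;
    else
        z1 = 0;
    end
    if x1(i)*w12 + x2(i)*w22 >= theta then  // hidden unit Z2: x2 AND NOT x1
        z2 = 1;
    else
        z2 = 0;
    end
    if z1*v1 + z2*v2 >= theta then          // output unit: Z1 OR Z2
        y(i) = 1;
    else
        y(i) = 0;
    end
end
disp("XOR outputs for (0,0),(0,1),(1,0),(1,1)", y);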

9.4 Algorithm

Step-1:

Step-2:
9.5 Programme

Programme-1: Develop programme to implement McCulloch-Pitts net for XOR function

//McCulloch-Pitts net for XOR function
//Windows 10
//Scilab 5.4.1
clear ;
clc ;

// Getting weights and threshold value
disp('Enter weights');
w11 = input('Weight w11=');
w12 = input('Weight w12=');
w21 = input('Weight w21=');
w22 = input('Weight w22=');
v1 = input('Weight v1=');
v2 = input('Weight v2=');
disp('Enter threshold value');
theta = input('theta=');
x1 =[0 0 1 1];
x2 =[0 1 0 1];
z =[0;1;1;0];

con =1;
while con
zin1 = x1*w11 + x2*w21; // net input to hidden neuron Z1
zin2 = x1*w12 + x2*w22; // net input to hidden neuron Z2
for i =1:4
if zin1(i) >= theta
y1(i)=1;
else
y1(i)=0;
end
if zin2(i) >= theta
y2(i)=1;
else
y2(i)=0;
end
end
yin = y1*v1 + y2*v2; // net input to output neuron Y
for i =1:4
if yin(i) >= theta
y(i)=1;
else
y(i)=0;
end
end
disp('Output of net');
disp(y);
if y == z
con =0;
else
disp('Net is not learning, enter another set of weights and threshold value');
end
end
disp('McCulloch-Pitts net for XOR function');
disp('Weights of neuron Z1');
disp(w11);
disp(w21);
disp('Weights of neuron Z2');
disp(w12);
disp(w22);
disp('Weights of neuron Y');
disp(v1);
disp(v2);
disp('Threshold value');
disp(theta);

//For this programme w11=1, w21=-1, w12=-1, w22=1, v1=1, v2=1, theta=1

Programme-2: Develop programme to implement XOR function using Madaline network

// Generate XOR function for bipolar inputs and


// targets using Madaline network
// Truth table for XOR gate
// X1  X2   Y
// -1  -1  -1
// -1   1   1
//  1  -1   1
//  1   1  -1   (Bipolar (1, -1))

clc ;
clear ;
x =[1 1 -1 -1;1 -1 1 -1]; // input
t=[ -1 1 1 -1]; // target
// assuming initial weight matrix and bias
w =[0.05 0.1;0.2 0.2];
b1 =[0.3 0.15];
disp("Initial weight",w);
disp("bias",b1);
v =[0.5 0.5];
b2 =0.5;
con =1;
alpha =0.5; // learning parameter
epoch =0;
while con
con =0;
for i =1:4
for j =1:2
zin (j)=b1(j)+x(1,i)*w(1,j)+x(2,i)*w(2,j); // net input to hidden neuron Zj
if zin (j) >=0 then
z(j)=1;
else
z(j)= -1;
end
end
yin =b2+z(1)*v (1) +z(2)*v (2) ;
if yin >=0 then
y =1;
else
y= -1;
end
if y~=t(i) then
con =1;
if t(i) ==1 then
if abs ( zin (1))>abs (zin (2) ) then
k =2;
else
k =1;
end
b1(k)=b1(k)+ alpha *(1 - zin(k)); // updating bias
w(1:2 ,k)=w(1:2 ,k)+ alpha *(1 - zin(k))*x(1:2 ,i); // updating weight
else
for k =1:2
if zin (k) >0 then

b1(k)=b1(k)+ alpha *( -1 - zin(k)); // updating bias
w(1:2 ,k)=w(1:2 ,k)+ alpha *( -1 - zin(k))*x(1:2 ,i); // updating weight
end
end
end
end
y1(i)=y
end
epoch = epoch +1;
disp("----Output in in each iteration")
disp("Weight matrix of hidden layer");
disp ( w );
disp ( " Bi a s " ) ;
disp ( b1 ) ;
disp ( " Epoch number ") ;
disp(epoch);
disp("The output by network",y1)
disp("The output of XOR",t)
end
disp("-----------Final result------------")
disp('Weight matrix of hidden layer');
disp(w);
disp('Bias');
disp(b1);
disp('Total epoch');
disp ( epoch );

9.6 Results

9.7 Screen Shots


9.8 Pre-lab Questions

1.Q. Explain single layer perceptron network

2.Q. Draw the diagram for implementation of XOR gate using multi-layer perceptron
network

Pre-lab Answers

1.

2.

9.9 Post-lab Questions

1. Q. Develop programme to implement XOR function using Madaline network with


learning parameter = 0.7
2. Q. Develop programme to implement XOR function using Madaline network with
learning parameter = 0.5, w = [0.05 0.1; 0.2 0.2], b1 = [0.3 0.15]

Post-lab Answers

1.

2.
9.10 Conclusion

10. PROGRAMS ON TRAINING A HOPFIELD NETWORK

10.1 Objective

Train and test a discrete Hopfield network

10.2 Tasks

I. Develop programme to train a discrete Hopfield network


II. Develop programme to test a discrete Hopfield network

10.3 Theory

Hopfield neural network was invented by Dr. John J. Hopfield in 1982. It consists of a single
layer which contains one or more fully connected recurrent neurons. The Hopfield network
is commonly used for auto-association and optimization tasks.

Discrete Hopfield Network

A Hopfield network operates in a discrete fashion; in other words, the input and output
patterns are discrete vectors, which can be either binary (0, 1) or bipolar (+1, −1) in nature.
The network has symmetrical weights with no self-connections,
i.e., w ij = w ji and w ii = 0.
Architecture

Following are some important points to keep in mind about discrete Hopfield network −

 This model consists of neurons with one inverting and one non-inverting output.

 The output of each neuron should be the input of other neurons but not the input of
self.

 Weight/connection strength is represented by w ij.

 Connections can be excitatory as well as inhibitory. The connection would be excitatory if the
output of the neuron is the same as the input, otherwise inhibitory.

 Weights should be symmetrical, i.e. w ij = w ji

Fig 10.1 Hopfield Network

The output from Y 1 going to Y 2, Y i and Y n have the weights w 12, w 1 i and w 1 n
respectively. Similarly, other arcs have the weights on them.

Training Algorithm

During training of discrete Hopfield network, weights will be updated. As we know that we
can have the binary input vectors as well as bipolar input vectors. Hence, in both the cases,
weight updates can be done with the following relation

Case 1 − Binary input patterns


For a set of binary patterns s(p), for p = 1 to P

Here, s(p) = s1(p), s2(p), ..., si(p), ..., sn(p)

Weight matrix is given by

w ij = ∑ [2 si(p) − 1][2 sj(p) − 1]   for i ≠ j   (sum over p = 1 to P)

Case 2 − Bipolar input patterns

For a set of bipolar patterns s(p), for p = 1 to P

Here, s(p) = s1(p), s2(p), ..., si(p), ..., sn(p)

Weight matrix is given by

w ij = ∑ si(p) sj(p)   for i ≠ j   (sum over p = 1 to P)
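
As an illustration of this bipolar rule, the short sketch below (with two assumed example patterns) builds the weight matrix by summing the outer products of the stored patterns and then zeroing the diagonal; it is separate from the prescribed programme in Section 10.5.

// Illustrative sketch: Hopfield weight matrix for two assumed bipolar patterns
s = [1 1 -1 -1; -1 1 1 -1];   // assumed patterns, one per row
n = size(s, 2);
w = zeros(n, n);
for p = 1:size(s, 1)
    w = w + s(p, :)'*s(p, :); // add the outer product of pattern p
end
for i = 1:n
    w(i, i) = 0;              // no self-connections (w_ii = 0)
end
disp("Hopfield weight matrix", w);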
p=1

10.4 Testing Algorithm

Step 1 − Initialize the weights, which are obtained from training algorithm by using Hebbian
principle.

Step 2 − Perform steps 3-9 as long as the activations of the network have not converged.

Step 3 − For each input vector X, perform steps 4-8.

Step 4 − Make the initial activation of the network equal to the external input vector X as
follows: y i = x i for i = 1, 2, 3, ..., n

Step 5 − For each unit y i, perform steps 6-9.

Step 6 − Calculate the net input of the network as follows:

y ini =x i+ ∑ y j w ji
j

Step 7 − Apply the activation as follows over the net input to calculate the output:
      { 1    if y ini > θi
y i = { y i  if y ini = θi
      { 0    if y ini < θi

Here θi is the threshold.

Step 8 − Broadcast this output y i to all other units.

Step 9 − Test the network for convergence.
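
The three-case activation in Step 7 (including the "keep the previous value" case when the net input equals the threshold) can be sketched for a single asynchronous update as shown below. The weight matrix used here is the one the programme in Section 10.5 derives from the stored pattern [1 1 1 0]; note that the programme itself simplifies Step 7 by only setting y_i = 1 when the net input is positive.

// Illustrative sketch: one asynchronous update (Steps 4-8) of unit i = 1
x = [0 0 1 0];   // external (test) input vector
y = x;           // initial activation (Step 4)
w = [ 0  1  1 -1
      1  0  1 -1
      1  1  0 -1
     -1 -1 -1  0];   // weights from the stored pattern [1 1 1 0], diagonal zeroed
i = 1;               // unit chosen for this update
theta = 0;           // threshold
yin = x(i) + sum(y .* w(:, i)');   // net input (Step 6)
if yin > theta then
    y(i) = 1;
elseif yin == theta then
    y(i) = y(i);                   // keep the previous activation
else
    y(i) = 0;
end
disp("Activation after updating unit 1", y);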

Algorithm

Step-1:

Step-2:

10.5 Programme: Develop a programme to train and test a discrete Hopfield network

// Discrete Hopfield net
clear ;
clc ;
x =[1 1 1 0]; // target
tx =[0 0 1 0]; //input
//training
w1 =(2*x'-1);
w2 =(2*x -1);
w=w1*w2; // creation of weights

for i =1:4
w(i,i)=0;
end
// testing
con =1;
y =tx; //assignment of input
epoch=0;
while con
up =[4 2 1 3]; // order in which the units are updated (asynchronous update)
epoch=epoch+1;
for i =1:4
yin (up(i))=tx(up(i))+y*w(1:4 , up(i)); //calculate net input
if yin (up(i)) >0
y(up(i)) =1; // application of activation function
end
end
disp("The epoch number", epoch);
disp ("ouput in loop",y);
if y==x
disp('Convergence has been obtained');
disp('The converged output');
disp (y);
disp("epoch number", epoch);
con =0;
end
end

10.6 Results
10.7 Screen Shots

10.8 Pre-lab Questions

1. Define Hopfield network.

2. Write the steps for training of discrete Hopfield network.

3. Write the steps for testing of discrete Hopfield network.

Pre-lab Answers

1.

2.

3.

10.9 Post-lab Questions

1. Q. Develop a programme to train and test a discrete Hopfield network with a binary input
pattern, where the input is [0 0 1 0] and the target is [0 1 1 0].
2. Q. Develop a programme to train and test a discrete Hopfield network with a bipolar input
pattern, where the input is [-1 -1 1 -1] and the target is [-1 1 1 -1].

Post-lab Answers

1.
2.

10.10 Conclusion

11. PROGRAMS ON AUTO AND HETERO ASSOCIATION OF MEMORY

11.1 Objective

To train and test auto and hetero association of memory

11.2 Tasks

I. Develop programme to train an auto associative network


II. Develop programme to test an auto associative network
III. Develop programme to train a hetero associative network

11.3 Theory
Associative memory network:

These kinds of neural networks work on the basis of pattern association, which means they
can store different patterns and at the time of giving an output they can produce one of the
stored patterns by matching them with the given input pattern. These types of memories are
also called Content-Addressable Memory (CAM). Associative memory makes a parallel
search with the stored patterns as data files.

Following are the two types of associative memories we can observe :

(i) Auto Associative Memory


(ii) Hetero Associative memory

Auto Associative Memory

This is a single layer neural network in which the input training vector and the output target
vectors are the same. The weights are determined so that the network stores a set of patterns.

Architecture

As shown in the following figure, the architecture of Auto Associative memory network
has ‘n’ number of input training vectors and similar ‘n’ number of output target vectors

Fig.11.1 Auto Associative Memory

Training Algorithm

For training, this network is using the Hebb or Delta learning rule.

Step 1 − Initialize all the weights to zero as w ij = 0 (i = 1 to n, j = 1 to n)

Step 2 − Perform steps 3-4 for each input vector.

Step 3 − Activate each input unit as follows −

x i = s i (i = 1 to n)

Step 4 − Activate each output unit as follows −


y j = s j (j = 1 to n)

Step 5 − Adjust the weights as follows −

w ij ( new )=w ij ( old )+ x i y j
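
For a single stored pattern, these training steps collapse to one outer product, which is essentially what the programme in Section 11.5 computes:

// Hebbian training of an auto-associative net for one pattern (outer product)
s = [1 1 -1 -1];   // pattern to be stored (input vector = target vector)
w = zeros(4, 4);   // Step 1: weights initialised to zero
w = w + s'*s;      // Steps 3-5: w_ij(new) = w_ij(old) + x_i*y_j for all i, j
disp("Weight matrix", w);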

Testing Algorithm

Step 1 − Set the weights obtained during training for Hebb’s rule.

Step 2 − Perform steps 3-5 for each input vector.

Step 3 − Set the activation of the input units equal to that of the input vector.

Step 4 − Calculate the net input to each output unit j = 1 to n

n
y inj =∑ x i wij
i=1

Step 5 − Apply the following activation function to calculate the output

y j = f(y inj) = { +1 if y inj > 0
                 { −1 if y inj ≤ 0

Hetero Associative memory

Similar to Auto Associative Memory network, this is also a single layer neural network.
However, in this network the input training vector and the output target vectors are not the
same. The weights are determined so that the network stores a set of patterns. Hetero
associative network is static in nature, hence, there would be no non-linear and delay
operations.

Architecture

As shown in the following figure, the architecture of the Hetero Associative Memory
network has 'n' number of input training vectors and 'm' number of output target vectors.
Fig.11.2 Hetero Associative Memory

Training Algorithm

For training, this network is using the Hebb or Delta learning rule.

Step 1 − Initialize all the weights to zero as w ij = 0 (i = 1 to n, j = 1 to m)

Step 2 − Perform steps 3-4 for each input vector.

Step 3 − Activate each input unit as follows −

x i = s i (i = 1 to n)

Step 4 − Activate each output unit as follows −

y j = s j (j = 1 to m)

Step 5 − Adjust the weights as follows −

w ij ( new )=w ij ( old )+ x i y j

Testing Algorithm

Step 1 − Set the weights obtained during training for Hebb’s rule.

Step 2 − Perform steps 3-5 for each input vector.

Step 3 − Set the activation of the input units equal to that of the input vector.

Step 4 − Calculate the net input to each output unit j = 1 to m

n
y inj =∑ x i wij
i=1

Step 5 − Apply the following activation function to calculate the output


                 {  1 if y inj > 0
y j = f(y inj) = {  0 if y inj = 0
                 { −1 if y inj < 0

11.4 Algorithm

Step-1:

Step-2:

11.5 Programme

clear ;
clc ;
// Auto associative net to store the vector
x =[1 1 -1 -1];
xv =[1;1; -1; -1];
//Training
w= zeros (4 ,4);
w=x'*x;
//Testing
yin =x*w;
for i =1:4
if yin (i) >0
y(i)=1;
else
y(i)= -1;
end
end
disp('---Auto associative network---');
disp('Weight matrix');
disp (w);
disp('output',y);
if xv ==y
disp('The vector is a known vector');
else
disp('The vector is an unknown vector');
end

// Hetero associative neural net
x =[1 1 0 0;1 0 1 0;1 1 1 0;0 1 1 0];
t =[1 0;1 0;0 1;0 1];
//Training
w= zeros (4 ,2);
for i =1:4
w=w+x(i,1:4)'*t(i,1:2);
end
disp('---Hetero associative memory network---')
disp('Weight matrix');
disp (w);
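
The programme above only trains the hetero-associative net. A minimal sketch of the corresponding testing phase (Steps 3-5 of the testing algorithm), reusing the weight matrix w and the training pairs x and t computed above, could look like this:

// Sketch of testing the hetero-associative net on the training inputs
yt = zeros(4, 2);
for i = 1:4
    yin = x(i, 1:4)*w;       // net input to each of the two output units
    for j = 1:2
        if yin(j) > 0 then
            yt(i, j) = 1;
        elseif yin(j) == 0 then
            yt(i, j) = 0;
        else
            yt(i, j) = -1;
        end
    end
end
disp("Recalled outputs", yt);
disp("Target outputs", t);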

11.6 Results
11.7 Screen Shots

11.8 Pre-lab Questions

1. Define Hopfield network.

2. Write the steps for training of auto associative memory network.

3. Write the steps for testing of auto associative memory network.

4. Write the steps for training of hetero associative memory network.

5. Write the steps for testing of hetero associative memory network.

Pre-lab Answers

1.

2.

3.

4.

5.

11.9 Post-lab Questions


1. Q. Develop a programme to train and test an auto associative memory with input and
target [1 -1 1 -1]
2. Q. Develop a programme to train a hetero associative memory and display the weight
matrix, using input [1 1 0 0; 1 1 0 1; 1 0 1 1; 0 1 0 0] and target [1 0; 0 1; 1 1; 0 0]

Post-lab Answers

1.

2.

11.10 Conclusion

12. EVALUATION OF ERROR IN BACK PROPAGATION NETWORK (BPN)

12.1 Objective
To evaluate error in Back propagation network (BPN)

12.2 Tasks

I. Develop a multilayer perceptron architecture


II. Calculate output using feedforward network
III. Update weight by using back propagation algorithm.

12.3 Theory
Back propagation algorithm

Two phases of computation:


• Forward pass: run the NN and compute the error for each neuron of the
output layer.

• Backward pass: start at the output layer, and pass the errors backwards
through the network, layer by layer, by recursively computing the local
gradient of each neuron.

Summary of multilayer perceptron architecture (algorithm) by using back propagation


algorithm

Step-1: Initialize weights and learning rate (random small values are taken) (in problem it
will be provided).

Step-2: Calculate net input to each hidden layer Z j for j = 1, 2, ..., n

(Z j)in = w 0j + ∑ x i w ij   (sum over i = 1 to n)

Step-3: Calculate the output for each hidden layer Z j by applying activation function. a=1
here, (sigmoid)

Step-4: These signals are set as input signal to output unit

Step-5: For each output unit y k for k = 1, 2, ..., p,

Calculate net input (y k)in = w 0k + ∑ z i w ik   (sum over i = 1 to p)

Step-6: Apply activation function to get y k

Step-7: Compute local gradient for output layer: δ k = y k (1 − y k)(d k − y k)

Here d k is the target.

Step-8: Compute change in weight in output layer: Δ w i = η δ k Z i, Δ w 0 = η δ k

Step-9: Compute local gradient for hidden layer: δ jj = Z j [1 − Z j] ∑ δ k w kj   (sum over k)

Step-10: Compute change in weight in hidden layer: Δ w ij = η δ jj x i, Δ w 0j = η δ jj

Step-11: new weight = old weight + change in weight
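
A point that is implicit in Steps 7 and 9: for the binary sigmoid f(v) = 1/(1 + e^(−v)), the derivative satisfies f'(v) = f(v)[1 − f(v)], which is why the local gradients contain the factors y k (1 − y k) for the output unit and Z j [1 − Z j] for the hidden units.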


After updating weights calculate output (step-2 to step-6). After calculating the output,
subtract from the target to get the error.

Problem

1.Q. Calculate the new weight of the multilayer perceptron neural network. If x 1=0 ,
x 2=1 , w01=0.3, w 02=0.5, w 11=0.6 , w21=−0.1, w 12=−0.3 ,

w 22=0.4 , w 0=−0.2 , w 1=0.4 , w 2=0.1. Target output=1, Learning rate=0.25, use binary
sigmoid activation function.

Fig 12.1 Multilayer Perceptron Network for the problem: inputs x 1 and x 2 feed hidden units Z 1 and Z 2 through weights w 11, w 21, w 12, w 22 (with biases w 01 and w 02), and the hidden outputs feed the output unit y through weights w 1 and w 2 (with bias w 0)

Solution

Step-2: Calculate net input to each hidden layer Z j for j = 1, 2, ..., n

(Z j)in = w 0j + ∑ x i w ij   (sum over i = 1 to n)

x 1=0 , x 2=1 , w01=0.3, w 02=0.5, w 11=0.6 w 21=−0.1,

w 12=−0.3 , w22=0.4

(Z 1)in = w 01 + x 1 w 11 + x 2 w 21 = 0.3 + 0 − 0.1 = 0.2

(Z 2)in = w 02 + x 1 w 12 + x 2 w 22 = 0.5 + 0 + 0.4 = 0.9

Step-3: Calculate the output for each hidden layer Z j by applying activation function. a=1
here, (sigmoid)

z 1 = 1 / (1 + e^(−(Z 1)in)) = 1 / (1 + e^(−0.2)) = 0.5498

z 2 = 1 / (1 + e^(−(Z 2)in)) = 1 / (1 + e^(−0.9)) = 0.7109

Step-4 and 5: For each output unit y k for k = 1, 2, ..., p,

Calculate net input (y k)in = w 0k + ∑ z i w ik   (sum over i = 1 to p)

w 0=−0.2 , w 1=0.4 , w 2=0.1

(y)in = w 0 + Z 1 w 1 + Z 2 w 2

      = −0.2 + (0.5498 × 0.4) + (0.7109 × 0.1) = 0.09101

Step-6: Apply activation function to get y

y = 1 / (1 + e^(−(y)in)) = 1 / (1 + e^(−0.09101)) = 0.5227

Step-7: Compute local gradient for output layer: δ 1 = y(1 − y)(d − y)

Here the target d = 1.

δ 1 = 0.5227(1 − 0.5227)(1 − 0.5227) = 0.1191
Step-8: compute change in weight in output layer

Δ w i = η δ 1 Z i

η = 0.25, z 1 = 0.5498, z 2 = 0.7109, δ 1 = 0.1191

Δ w 1 = η δ 1 Z 1 = 0.25 × 0.1191 × 0.5498 = 0.0164

Δ w 2 = η δ 1 Z 2 = 0.25 × 0.1191 × 0.7109 = 0.02117

Δ w 0 = η δ 1 = 0.25 × 0.1191 = 0.02978

Step-9: Compute local gradient for hidden layer:


δ jj = Z j [1 − Z j] ∑ δ k w kj   (sum over k)

z 1 = 0.5498, z 2 = 0.7109, w 1 = 0.4, w 2 = 0.1, δ 1 = 0.1191

δ 11 = Z 1 [1 − Z 1] δ 1 w 1 = 0.5498 × [1 − 0.5498] × 0.1191 × 0.4 = 0.0118

δ 22 = Z 2 [1 − Z 2] δ 1 w 2 = 0.7109 × [1 − 0.7109] × 0.1191 × 0.1 = 0.00245

Step-10: compute change in weight in hidden layer

δ 11 = 0.0118, δ 22 = 0.00245, x 1 = 0, x 2 = 1, η = 0.25

Δ w 11 = η δ 11 x 1 = 0.25 × 0.0118 × 0 = 0

Δ w 21 = η δ 11 x 2 = 0.25 × 0.0118 × 1 = 0.00295

Δ w 01 = η δ 11 = 0.25 × 0.0118 = 0.00295

Δ w 22 = η δ 22 x 2 = 0.25 × 0.00245 × 1 = 0.0006125

Δ w 12 = η δ 22 x 1 = 0.25 × 0.00245 × 0 = 0

Δ w 02 = η δ 22 = 0.25 × 0.00245 = 0.0006125


Step-11: new weight=old weight+ change in weight

Δ w 11 = 0, Δ w 21 = 0.00295, Δ w 01 = 0.00295, Δ w 22 = 0.0006125

Δ w 12 = 0, Δ w 02 = 0.0006125, Δ w 1 = 0.0164, Δ w 2 = 0.02117, Δ w 0 = 0.02978

w 01 = 0.3, w 02 = 0.5, w 11 = 0.6, w 21 = −0.1, w 12 = −0.3, w 22 = 0.4,

w 0 = −0.2, w 1 = 0.4, w 2 = 0.1

w 11(new)= w 11(old)+ Δ w 11=0.6+0=0.6

w 21(new)= w 21(old)+ Δ w 21=-0.1+0.00295=-0.09705

w 01(new)= w 01(old)+ Δ w 01=0.3+0.00295=0.30295

w 22(new)= w 22(old)+ Δ w 22=0.4+0.0006125=0.4006125

w 12(new)= w 12(old)+ Δ w 12=-0.3+0=-0.3

w 02(new)= w 02(old)+ Δ w 02=0.5+0.0006125=0.5006125

w 1(new)= w 1(old)+ Δ w 1=0.4+0.0164=0.4164

w 2(new)= w 2(old)+ Δ w 2=0.1+0.02117=0.12117

w 0(new)= w 0(old)+ Δ w 0=-0.2+0.02978=-0.17022

After updating weights calculate output (step-2 to step-6). After calculating the output,
subtract from the target to get the error.

12.4 Algorithm

Step-1:

Step-2:
12.5 Programme

clc;
clear;

x=[0 1];// input


wh=[0.3 0.6 -0.1 -0.3 0.4 0.5]// weight of hidden layer
//wh=[w01 w11 w21 w12 w22 w02]
wo=[-0.2 0.4 0.1]; // weight of output layer
//wo=[w0 w1 w2]
d=1 // target
n=0.25 // learning rate
//step-2
// estimation of Zjin
z1in=wh(1)+(x(1)*wh(2))+(x(2)*wh(3));
z2in=wh(6)+(x(1)*wh(4))+(x(2)*wh(5));

//Step-3
z1=1/(1+exp(-z1in));
z2=1/(1+exp(-z2in));
disp("z1",z1);
disp("z2",z2);
//step-4 and 5
yin=wo(1)+(z1*wo(2))+(z2*wo(3));
//step-6
y=1/(1+exp(-yin));
disp("Output",y);

//step-7 (local gradient for output layer)


del1=y*(1-y)*(d-y);

//step-8 change in weight in output layer


delw0=n*del1;
delw1=n*del1*z1;
delw2=n*del1*z2;

delwo=[delw0 delw1 delw2];


disp("---Change in weight of hidden layer---")
disp(delwo);
//step-9
del11=z1*(1-z1)*del1*wo(2);
del22=z2*(1-z2)*del1*wo(3);

// step-10
delw01=n*del11;
delw11=n*del11*x(1);
delw21=n*del11*x(2);
delw02=n*del22;
delw12=n*del22*x(1);
delw22=n*del22*x(2);
disp("---Change in weight of hidden layer---")
disp(delw01);
disp(delw11);
disp(delw21);
disp(delw12);
disp(delw22);
disp(delw02);
delwh=[delw01 delw11 delw21 delw12 delw22 delw02];
//new weight (updataion of weight)
wh=wh+delwh;
wo=wo+delwo;
disp("new weight of hidden layer",wh);
disp("new weight of output layer",wo);
// Calculation of output after weight updation
//step-2
// estimation of Zjin
z1in=wh(1)+(x(1)*wh(2))+(x(2)*wh(3));
z2in=wh(6)+(x(1)*wh(4))+(x(2)*wh(5));

//Step-3
z1=1/(1+exp(-z1in));
z2=1/(1+exp(-z2in));
disp(z1);
disp(z2);
//step-4 and 5
yin=wo(1)+(z1*wo(2))+(z2*wo(3));
//step-6
y=1/(1+exp(-yin));
disp(y);
// error
err=d-y;
disp("Error is",err);

12.6 Results

12.7 Screen Shots


12.8 Pre-lab Questions

1.Q. Summarize backpropagation algorithm in multilayer perceptron network.

2.Q. What is the XOR problem?

Pre-lab Answers

1.

2.

12.9 Post-lab Questions

1. Q. Develop a programme to compute the error in a multilayer perceptron network and
calculate the new weights, where x 1=1, x 2=0, w 01=0.3, w 02=0.5, w 11=0.6, w 21=−0.1,
w 12=−0.3, w 22=0.4, w 0=−0.2, w 1=0.4, w 2=0.1, target output = 1, learning rate = 0.2;
use the binary sigmoid activation function.
2. Q. Develop a programme to compute the error in a multilayer perceptron network and
calculate the new weights, where x 1=1, x 2=0, w 01=0.3, w 02=0.5, w 11=0.6, w 21=−0.1,
w 12=−0.3, w 22=0.4, w 0=−0.2, w 1=0.4, w 2=0.1, target output = 1, learning rate = 0.5;
use the binary sigmoid activation function.

Post-lab Answers

1.
2.

12.10 Conclusion

13. PROGRAMS ON ORTHOGONALITY AND EVALUATING INPUT AND


OUTPUT FOR ASSOCIATION

13.1 Objective
To check whether the input vectors are orthogonal and, if they are, to estimate the weight matrix

13.2 Tasks

I. Check whether the input vectors are orthogonal or not


II. Train and test the input vector
III. Calculate weight for storing 2 vectors in auto-associative network

13.3 Theory
In this experiment, a single-row bipolar vector is treated as orthogonal if the sum of its elements is zero.

For example, X = [1 -1 1 -1] gives sum(X) = 1 - 1 + 1 - 1 = 0; since the sum is zero, X is taken to be orthogonal.
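
A minimal sketch of this check for several vectors at once (as the post-lab question asks) might look like the following; the three candidate vectors are assumed for illustration.

// Illustrative sketch: orthogonality check (sum of elements equal to zero)
X = [1 -1 1 -1; 1 1 -1 -1; 1 1 1 -1];   // assumed candidate vectors, one per row
for i = 1:size(X, 1)
    if sum(X(i, :)) == 0 then
        disp("Vector " + string(i) + " is orthogonal");
    else
        disp("Vector " + string(i) + " is not orthogonal");
    end
end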

More than one vector can be stored in an auto-associative net by simply adding the weight
matrices obtained for each vector. For example, assume we want to store (1 1 -1 -1) and
(-1 1 1 -1) in an auto-associative net: we obtain the weight matrix for each pattern and add
them up.
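
With these two patterns, w1 = x1'*x1 and w2 = x2'*x2 where x1 = (1 1 -1 -1) and x2 = (-1 1 1 -1); keeping the diagonal (as the programme in Section 13.5 does), their sum works out to

W = w1 + w2 =
  2   0  -2   0
  0   2   0  -2
 -2   0   2   0
  0  -2   0   2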

The auto-associative network is explained as follows:

Auto Associative Memory

This is a single layer neural network in which the input training vector and the output target
vectors are the same. The weights are determined so that the network stores a set of patterns.

Architecture

As shown in the following figure, the architecture of Auto Associative memory network
has ‘n’ number of input training vectors and similar ‘n’ number of output target vectors

Fig.13.1 Auto Associative Memory

Training Algorithm

For training, this network is using the Hebb or Delta learning rule.

Step 1 − Initialize all the weights to zero as w ij = 0 (i = 1 to n, j = 1 to n)

Step 2 − Perform steps 3-4 for each input vector.

Step 3 − Activate each input unit as follows −


x i = s i (i = 1 to n)

Step 4 − Activate each output unit as follows −

y j = s j (j = 1 to n)

Step 5 − Adjust the weights as follows −

w ij ( new )=w ij ( old )+ x i y j

Testing Algorithm

Step 1 − Set the weights obtained during training for Hebb’s rule.

Step 2 − Perform steps 3-5 for each input vector.

Step 3 − Set the activation of the input units equal to that of the input vector.

Step 4 − Calculate the net input to each output unit j = 1 to n

n
y inj =∑ x i wij
i=1

Step 5 − Apply the following activation function to calculate the output

y j = f(y inj) = { +1 if y inj > 0
                 { −1 if y inj ≤ 0

13.4 Algorithm

Step-1:

Step-2:
13.5 Programme

clc
clear;

// Getting two input vectors for the auto-associative network


x1=input("Enter the first training vector for auto-association (1*4 matrix)");
//Eg. [1 1 -1 -1]
x2=input("Enter the second training vector for auto-association (1*4 matrix)");
//Eg. [1 -1 1 -1]

if sum(x1)==0 && sum(x2)==0 then


disp("The vector is orthogonal");

//Training 1st input vector


w1= zeros (4 ,4);
w1=x1 '*x1;
//Testing
yin1 =x1*w1;
for i =1:4
if yin1 (i) >0
y1(i)=1;
else
y1(i)= -1;
end
end
//Training 2nd input vector
w2= zeros (4 ,4);
w2=x2 '*x2;
//Testing
yin2=x2*w2;
for i =1:4
if yin2 (i) >0
y2(i)=1;
else
y2(i)= -1;
end
end
disp(y1)
disp(y2)

// Weight for association


w=w1+w2
disp("final weight", w)
else
disp("The vector is non-orthogonal");
end

13.6 Results

13.7 Screen Shots


13.8 Pre-lab Questions

1. Explain the difference between an auto-associative network and a hetero-associative network.

2. How will you check the orthogonality of a single-row input vector?

Pre-lab Answers

1.

2.

13.9 Post-lab Questions

1. Give three input vectors (matrices with 1 row and 4 columns) with bipolar values (1 or -1)
and check whether all the input vectors are orthogonal or not.

Post-lab Answers

1.

13.10 Conclusion

You might also like