* We continue to iterate the process until we reach a pre-set number of weak learners or we cannot observe further improvement on the dataset. At the end of the algorithm, we are left with a set of weak learners, each with an associated stage value.
"BFAD Difference between Bagging and Boosting
1. Bagging is a technique that builds multiple homogeneous models from different subsamples of the same training dataset to obtain more accurate predictions than its individual models. Boosting refers to a group of algorithms that utilize weighted averages to make weak learning algorithms stronger learning algorithms.
2. Bagging learns the models independently from each other in parallel. Boosting learns them sequentially in a very adaptive way.
3. Bagging helps in reducing variance. Boosting helps in reducing bias and variance.
4. In bagging, every model receives an equal weight. In boosting, models are weighted by their performance.
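As a rough illustration of this contrast (not taken from the text), the sketch below trains a bagging ensemble and a boosting ensemble with scikit-learn's BaggingClassifier and AdaBoostClassifier; the synthetic dataset and the number of estimators are arbitrary example choices.

```python
# Hedged sketch: bagging (independent models on bootstrap subsamples)
# versus boosting (sequential weak learners weighted by performance).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

# Arbitrary synthetic classification data, for demonstration only.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

bagging = BaggingClassifier(n_estimators=50, random_state=0)    # bootstrap-aggregated trees
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)  # sequentially re-weighted stumps

print("Bagging  accuracy:", cross_val_score(bagging, X, y, cv=5).mean())
print("Boosting accuracy:", cross_val_score(boosting, X, y, cv=5).mean())
```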
Clustering
* Given a set of objects, place them in groups such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups.
* Cluster analysis can be a powerful data-mining tool for any organization that needs to identify discrete groups of customers, sales transactions, or other types of behaviors and things. For example, insurance providers use cluster analysis to detect fraudulent claims and banks use it for credit scoring.
* Cluster analysis uses mathematical models to discover groups of similar customers
based on the smallest variations among customers within each group.
* A cluster is a group of objects that belong to the same class. In other words, similar objects are grouped in one cluster and dissimilar objects are grouped in another cluster.
* Clustering is a process of partitioning a set of data into a set of meaningful subclasses. Every data item in a subclass shares a common trait. It helps a user understand the natural grouping or structure in a data set.
* Various types of clustering methods are partitioning methods, hierarchical clustering, fuzzy clustering, density-based clustering and model-based clustering.
* Cluster analysis is the process of grouping a set of data objects into clusters.
* Desirable properties of a clustering algorithm are as follows :
Fig. 3.3.1
1. Scalability (in terms of both time and space)
2. Ability to deal with different data types
3. Minimal requirements for domain knowledge to determine input parameters
4. Interpretability and usability.
* Clustering of data is a method by which large sets of data are grouped into clusters of smaller sets of similar data. Clustering can be considered the most important unsupervised learning problem.
* A cluster is therefore a collection of objects which are "similar" between them and are "dissimilar" to the objects belonging to other clusters. Fig. 3.3.1 shows clusters.
* In this case we easily identify the 4 clusters into which the data can be divided; the similarity criterion is distance : two or more objects belong to the same cluster if they are "close" according to a given distance (in this case geometrical distance). This is called distance-based clustering.
Fig. 3.3.2 : Raw data → Clustering algorithm → Clusters of data
* Clustering means grouping of data or dividing a large data set into smaller data sets of some similarity.
* A clustering algorithm attempts to find natural groups of components or data based on some similarity. Also, the clustering algorithm finds the centroid of a group of data sets; the output of a clustering algorithm is basically a statistical description of the cluster centroids with the number of components in each cluster.
* Cluster centroid : The centroid of a cluster is a point whose parameter values are the mean of the parameter values of all the points in the cluster. Each cluster has a well-defined centroid.
* Distance : The distance between two points is taken as a common metric to assess the similarity among the components of a population. The commonly used distance measure is the Euclidean metric, which defines the distance between two points p = (p_1, p_2, ...) and q = (q_1, q_2, ...) as

  d(p, q) = \sqrt{\sum_{i=1}^{k} (p_i - q_i)^2}
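As a minimal sketch (not from the text), the formula above can be evaluated directly in NumPy; the two example points are arbitrary.

```python
# Minimal sketch: Euclidean distance d(p, q) = sqrt(sum_i (p_i - q_i)^2).
import numpy as np

p = np.array([1.0, 2.0, 3.0])   # arbitrary example points
q = np.array([4.0, 6.0, 3.0])

d = np.sqrt(np.sum((p - q) ** 2))   # equivalent to np.linalg.norm(p - q)
print(d)                            # 5.0
```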
* The goal of clustering is to determine the intrinsic grouping in a set of unlabeled data. But how to decide what constitutes a good clustering ? It can be shown that there is no absolute "best" criterion which would be independent of the final aim of the clustering. Consequently, it is the user who must supply this criterion, in such a way that the result of the clustering will suit their needs.
* Clustering analysis helps construct meaningful partitioning of a large set of objects. Cluster analysis has been widely used in numerous applications, including pattern recognition, data analysis, image processing, etc.
* Clustering algorithms may be classified as listed below :
1. Exclusive clustering
2. Overlapping clustering
3. Hierarchical clustering
4. Probabilistic clustering.
* A good clustering method will produce high-quality clusters with high intra-class similarity and low inter-class similarity. The quality of a clustering result depends on both the similarity measure used by the method and its implementation. The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns.
* Clustering techniques types : The major clustering techniques are,
a) Partitioning methods
b) Hierarchical methods
c) Density-based methods.
K-Means Clustering
* K-Means clustering is a heuristic method. Here each cluster is represented by the center of the cluster. The 'k' stands for the number of clusters; it is typically a user input to the algorithm, although some criteria can be used to automatically estimate k.
* This method initially takes the number of components of the population equal to the final required number of clusters. In this step itself, the final required number of clusters is chosen such that the points are mutually farthest apart.
* Next, it examines each component in the population and assigns it to one of the clusters depending on the minimum distance. The centroid's position is recalculated every time a component is added to the cluster, and this continues until all the components are grouped into the final required number of clusters.
* Given K, the K-means algorithm consists of four steps :
1. Select initial centroids at random.
2. Assign each object to the cluster with the nearest centroid.
3. Compute each centroid as the mean of the objects assigned to it.
4. Repeat the previous two steps until no change.
* The x_1, ..., x_N are data points or vectors of observations. Each observation (vector x_i) will be assigned to one and only one cluster. C(i) denotes the cluster number for the i-th observation. K-means minimizes the within-cluster point scatter :

  W(C) = \frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \sum_{C(j)=k} \|x_i - x_j\|^2 = \sum_{k=1}^{K} N_k \sum_{C(i)=k} \|x_i - m_k\|^2

  where m_k is the mean vector of the k-th cluster and N_k is the number of observations in the k-th cluster.
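The four steps above can be sketched directly in NumPy. This is an illustrative toy implementation (not from the text); the example data, K and random seed are arbitrary assumptions, and empty clusters are not handled.

```python
# Toy K-means sketch following the four steps above (assumes no cluster becomes empty).
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1 : select initial centroids at random.
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Step 2 : assign each object to the cluster with the nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3 : compute each centroid as the mean of the objects assigned to it.
        new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        # Step 4 : repeat until no change.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Arbitrary two-blob example data.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
centroids, labels = kmeans(X, K=2)
print(centroids)
```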
K-Means Algorithm Properties
1. There are always K clusters.
2. There is always at least one item in each cluster.
3. The clusters are non-hierarchical and they do not overlap.
4. Every member of a cluster is closer to its cluster than to any other cluster, because closeness does not always involve the 'center' of clusters.
K-Means Algorithm Process
* The dataset is partitioned into K clusters and the data points are randomly assigned to the clusters, resulting in clusters that have roughly the same number of data points.
* For each data point :
a. Calculate the distance from the data point to each cluster.
b. If the data point is closest to its own cluster, leave it where it is. If the data point is not closest to its own cluster, move it into the closest cluster.
* Repeat the above step until a complete pass through all the data points results in no data point moving from one cluster to another. At this point the clusters are stable and the clustering process ends.
* The choice of initial partition can greatly affect the final clusters that result, in terms of inter-cluster and intra-cluster distances and cohesion.
* The K-means algorithm is iterative in nature. It converges, however only a local minimum is obtained. It works only for numerical data. This method is easy to implement.
* Advantages of K-Means Algorithm :
1. Efficient in computation.
2. Easy to implement.
* Weaknesses :
1. Applicable only when the mean is defined.
2. Need to specify K, the number of clusters, in advance.
3. Trouble with noisy data and outliers.
4. Not suitable to discover clusters with non-convex shapes.
"© kNearest Neighbour is one of the only Machine Learning algorithms based totally
on supervised learning approach.
* NN algorithm assumes the similarity between the brand new case/facts and
available instances and placed the brand new case into the category that is
Maximum similar to the to be had classes.
: NN set of rules shops all of the to be had facts and classifies a new statistics
Point based at the similarity. This means when new data seems then it may be
effortlessly categorised into a properly suite class by using k-NN algorithm.
* kNN can be used for regression as well as for classification, but normally it is used for classification problems.
* kNN is a non-parametric algorithm, because it does not make any assumption about the underlying data.
* It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead it stores the dataset and, at the time of classification, it performs an action on the dataset.
* The kNN algorithm at the training phase just stores the dataset, and when it gets new data, it classifies that data into a category that is most similar to the new data.
* Example : Suppose we have an image of a creature that looks similar to a cat and a dog, but we want to know whether it is a cat or a dog. For this identification, we can use the kNN algorithm, because it works on a similarity measure. Our kNN model will find the features of the new data set that are similar to the cat and dog images and, based on the most similar features, it will put it in either the cat or dog category.
Why Do We Need kNN ?
* Suppose there are two categories, i.e., Category A and Category B, and we have a new data point x1; this data point will lie in one of these categories. To solve this type of problem, we need a kNN algorithm. With the help of kNN, we can easily identify the category or class of a particular dataset. Consider the below diagram :
Fig. 3.4.1 Why do we need kNN ? (Before kNN / After kNN : the new data point is assigned to Category A)
How Does kNN Work ?
* The working of kNN can be explained on the basis of the below algorithm :
Step 1 : Select the number k of neighbours.
Step 2 : Calculate the Euclidean distance of the k number of neighbours.
Step 3 : Take the k nearest neighbours according to the calculated Euclidean distance.
Step 4 : Among these k neighbours, count the number of data points in each class.
Step 5 : Assign the new data point to that category for which the number of neighbours is maximum.
Step 6 : Our model is ready.
* Suppose we have a new data point and we want to place it in the required category. Consider the below image :
Fig. 3.4.2 kNN example
* Firstly, we will choose the number of neighbours, so we will choose k = 5.
* Next, we will calculate the Euclidean distance between the data points. The Euclidean distance is the distance between two points, which we have already studied in geometry. It may be calculated as :
Euclidean distance between A and B = \sqrt{(X_2 - X_1)^2 + (Y_2 - Y_1)^2}

Fig. 3.4.3 kNN example (continued)
* By calculating the Euclidean distance we got the nearest neighbours : three nearest neighbours in Category A and two nearest neighbours in Category B. Consider the below image :
Fig. 3.4.4 kNN example (continued) : Category A : 3 neighbours, Category B : 2 neighbours
* As we can see, the three nearest neighbours are from Category A, hence this new data point must belong to Category A.
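A hedged sketch of the same idea with scikit-learn's KNeighborsClassifier is shown below; the coordinates, labels and k = 5 only loosely mirror the example above and are arbitrary assumptions.

```python
# Sketch: classifying a new point by majority vote of its k = 5 nearest neighbours.
from sklearn.neighbors import KNeighborsClassifier

# Arbitrary example points: label 0 = Category A, label 1 = Category B.
X_train = [[1, 1], [1, 2], [2, 1], [2, 3], [6, 5], [7, 6], [6, 8], [7, 7]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)           # lazy learner: training just stores the data

new_point = [[3, 3]]
print(knn.predict(new_point))       # majority of the 5 nearest neighbours -> Category A (0)
print(knn.kneighbors(new_point))    # distances and indices of those 5 neighbours
```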
Difference between K-means and kNN
1. K-Means is an unsupervised machine learning algorithm used for clustering. kNN is a supervised machine learning algorithm used for classification.
2. K-Means is an eager learner. kNN is a lazy learner.
3. K-Means is used for clustering. kNN is used mostly for classification, and sometimes even for regression.
4. 'K' in K-Means is the number of clusters the algorithm is trying to identify/learn from the data. 'K' in kNN is the number of nearest neighbours used to classify or predict a test sample.
5. K-Means requires unlabelled data; it gathers and groups data into k number of clusters. kNN requires labelled data and classifies new data points according to the k closest data points.
Gaussian Mixture Models
* Gaussian mixture models form a "soft" clustering algorithm, where each point probabilistically "belongs" to all clusters. This is different from k-means, where each point belongs to one cluster.
* The Gaussian mixture model is a probabilistic model that assumes all the data points are generated from a mixture of Gaussian distributions with unknown parameters.
* For example, in modeling human height data, height is typically modeled as a normal distribution for each gender, with a mean of approximately 5'10" for males and 5'5" for females. Given only the height data and not the gender assignments for each data point, the distribution of all heights would follow the sum of two scaled (different variance) and shifted (different mean) normal distributions. A model making this assumption is an example of a Gaussian mixture model.
* Gaussian mixture models do not rigidly classify each and every instance into one
class or the other. The algorithm attempts to produce K-Gaussian distributions that
would take into account the entire training space. Every point can be associated
with one or more distributions. Consequently, the deterministic factor would be
the probability that each point belongs to a certain Gaussian distribution.
* A Gaussian mixture model consists of two parts : mean vectors and covariance matrices.
+ A Gaussian distribution is defined as a continuous probability distribution that
takes on a bell-shaped curve. Another name for Gaussian distribution is the
normal distribution.
* In a one-dimensional space, the probability density function of a Gaussian distribution is given by :

  f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}

  where \mu is the mean and \sigma^2 is the variance.
* Gaussian mixture models can be used for a variety of use cases, including identifying customer segments, detecting fraudulent activity and clustering images.
* GMMs have a variety of real-world applications. Some of them are listed below :
a) Used for signal processing applications
b) Used for customer churn analysis
c) Used for language identification
d) Used in the video game industry
e) Genre classification of songs.
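As an illustrative sketch (not from the text), a two-component mixture can be fitted with scikit-learn's GaussianMixture. The simulated "height" data echoes the earlier example; all numeric values are arbitrary assumptions.

```python
# Sketch: fitting a two-component Gaussian mixture to simulated height data.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
males = rng.normal(loc=70, scale=3.0, size=500)      # ~5'10" mean, arbitrary spread
females = rng.normal(loc=65, scale=2.5, size=500)    # ~5'5" mean, arbitrary spread
heights = np.concatenate([males, females]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(heights)
print("means     :", gmm.means_.ravel())
print("variances :", gmm.covariances_.ravel())
# "Soft" clustering: each point gets a probability of belonging to each component.
print(gmm.predict_proba(heights[:3]))
```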
Expectation - Maximization
* In Gaussian mixture models, an expectation-maximization method is a powerful tool for estimating the parameters of a Gaussian mixture model. The expectation step is termed E and the maximization step is termed M.
* Expectation is used to find the Gaussian parameters which are used to represent each component of the Gaussian mixture model. Maximization is involved in determining whether new data points can be added or not.
* The Expectation-Maximization (EM) algorithm is used in maximum likelihood estimation where the problem involves two sets of random variables, of which one, X, is observable and the other, Z, is hidden.
* The goal of the algorithm is to find the parameter vector φ that maximizes the likelihood of the observed values of X, L(φ | X).
* But in cases where this is not feasible, we associate the extra hidden variables Z and express the underlying model using both, to maximize the likelihood of the joint distribution of X and Z, the complete likelihood L_c(φ | X, Z).
EM Algorithm
_« Expectation-Maximization (EM) is an iterative method used to find maximum
likelihood estimates of parameters in probabilistic models, where the model
depends on unobserved, also called latent, variables.
EM alternates between performing an expectation (E) step, which computes an
expectation of the likelihood by including the latent variables as if they were
observed, and maximization (M) step, which computes the maximum likelihood
estimates of the parameters by maximizing the expected likelihood found in the E
step.
© The parameters found on the M step are then used to start another E step, and the
"process is repeated until some criterion is satisfied. EM is frequently used for data
clustering like for example in Gaussian mixtures.
* In the Expectation step, find the expected values of the latent variables (here you need to use the current parameter values).
* In the Maximization step, first plug in the expected values of the latent variables into the log-likelihood of the augmented data. Then maximize this log-likelihood to re-evaluate the parameters.
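To make the E-step / M-step alternation concrete, here is a minimal one-dimensional, two-component Gaussian mixture fitted by EM; the initialization, iteration count and example data are arbitrary assumptions, not the text's prescription.

```python
# Minimal EM sketch for a 1-D mixture of two Gaussians.
import numpy as np

def em_gmm_1d(x, n_iters=100):
    # Arbitrary initial guesses for the mixing weights, means and variances.
    w = np.array([0.5, 0.5])
    mu = np.array([x.min(), x.max()], dtype=float)
    var = np.array([x.var(), x.var()])
    for _ in range(n_iters):
        # E-step : posterior responsibility of each component for each point.
        pdf = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        resp = w * pdf
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step : re-estimate parameters from the responsibility-weighted data.
        nk = resp.sum(axis=0)
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(5, 1, 300)])
print(em_gmm_1d(x))
```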
* Expectation-Maximization (EM) is a technique used in point estimation. Given a set of observable variables X and unknown (latent) variables Z, we want to estimate parameters φ in a model.
* The Expectation-Maximization (EM) algorithm is a maximum likelihood estimation procedure for statistical models in which some of the variables of the model are not observed.
The EM algorithm is an elegant and powerful method for finding the maximum
likelihood of models with hidden variables. The key concept in the EM algorithm
is that it iterates between the expectation step (E-step) and maximization step
(M-step) until convergence.
In the E-step, the algorithm estimates the posterior distribution of the hidden
variables Q given the observed data and the current parameter settings; and in the
M-step the algorithm calculates the ML parameter settings with Q fixed.
At the end of each iteration the lower bound on the likelihood is optimized for the
given parameter setting (M-step) and the likelihood is set to that bound (E-step),
which guarantees an increase in the likelihood and convergence to a local
maximum, or global maximum if the likelihood function is unimodal.
Generally, EM works best when the fraction of missing information is small and
the dimensionality of the data is not too large. EM can require many iterations,
and higher dimensionality can dramatically slow down the E-step.
* EM is useful for several reasons : conceptual simplicity, ease of implementation, and the fact that each iteration improves l(φ). The rate of convergence on the first few steps is typically quite good, but can become excruciatingly slow as you approach local optima.
Sometimes the M-step is a constrained maximization, which means that there are
constraints on valid solutions not encoded in the function itself.
* Expectation maximization is an effective technique that is often used in data analysis to manage missing data. Indeed, expectation maximization overcomes some of the limitations of other techniques, such as mean substitution or similar single-value substitution methods. These alternative techniques generate biased estimates and, specifically, underestimate the standard errors. Expectation maximization overcomes this problem.
Two Marks Questions with Answers
Q.1 What is unsupervised learning ?
Ans. : Unsupervised learning is a type of machine learning in which the model is trained on unlabeled data and must discover patterns, groupings or structure in the data on its own.
Q.2 What is semi-supervised learning ?
Ans. : Semi-supervised learning uses both labeled and unlabeled data to improve supervised learning.
Q.3 What is an ensemble method ?
Ans. : An ensemble method is a machine learning technique that combines several base models in order to produce one optimal predictive model. It combines the insights obtained from multiple learning models to facilitate accurate and improved decisions.
Q.4 What is a cluster ?
Ans. : A cluster is a group of objects that belong to the same class. In other words, similar objects are grouped in one cluster and dissimilar objects are grouped in another cluster.
Q.5 Explain clustering.
Ans. : Clustering is a process of partitioning a set of data into a set of meaningful subclasses. Every data item in the subclass shares a common trait. It helps a user understand the natural grouping or structure in a data set.
Q.6 What is bagging ?
Ans. : Bagging, also known as bootstrap aggregation, is an ensemble method that works by training multiple models independently and combining them later to produce a strong model.
Q.7 Define boosting.
Ans. : Boosting refers to a group of algorithms that utilize weighted averages to make weak learning algorithms stronger learning algorithms.
Q.8 What is the k-Nearest Neighbour method ?
Ans. : The k-Nearest Neighbour (kNN) method is a classical classification method that requires no training effort and critically depends on the quality of the distance measures among examples. The kNN classifier uses a distance function such as the Mahalanobis distance. A sample is classified according to the majority vote of its k nearest training samples in the feature space. The distance of a sample to its neighbours is defined using a distance function.
Q.9 Which are the performance factors that influence the kNN algorithm ?
Ans. : The performance of the kNN algorithm is influenced by three main factors :
1. The distance function or distance metric used to determine the nearest
neighbors.
2. The decision rule used to derive a classification from the k-nearest neighbors.
3. The number of neighbors used to classify the new example.
Q.10 What is k-means clustering ?
Ans. : k-means clustering is a heuristic method. Here each cluster is represented by the center of the cluster. The k-means algorithm takes the input parameter, k, and partitions a set of n objects into k clusters so that the resulting intracluster similarity is high but the intercluster similarity is low.
Q.11 List the properties of the K-Means algorithm.
Ans. : 1. There are always k clusters.
2. There is always at least one item in each cluster.
3. The clusters are non-hierarchical and they do not overlap.
Q.12 What is stacking ?
Ans. : Stacking, sometimes called stacked generalization, is an ensemble machine learning method that combines multiple heterogeneous base or component models via a meta-model.
Q.13 How do GMMs differ from K-means clustering ?
Ans. : GMMs and K-means are both clustering algorithms used in unsupervised learning tasks. However, the basic difference between them is that K-means is a distance-based clustering method while the GMM is a distribution-based clustering method.