Unit-2 MLT

The document discusses regression analysis, emphasizing its importance in predicting the value of a variable based on others, with generalized linear models being the most commonly used technique. It outlines the regression process, common reasons for conducting regression, and the formulation of linear models. Additionally, it introduces Bayes theorem and concept learning, detailing the brute-force MAP learning algorithm and its assumptions regarding hypothesis probabilities.

UNIT-2
1. Regression

Prepared By: Deepti Singh


Regression Modelling
• Regression Analysis is widely used because there are many statistical
problems that can be framed as finding out how to predict the value of a
variable from the values of other variables.
Generalized Linear Models:
• The fitting of generalized linear models is currently the most frequently
applied statistical technique.
- It is used to describe the relationship between the mean, sometimes
called the trend, of one variable and the values taken by several other
variables.
• Modelling this type of relationship is sometimes called Regression.
Regression
• Regression: It is the process of determining how a variable y is related
to one or more other variables x1, x2, x3, …, xn.
Common reasons for doing a regression:
• The output is expensive to measure, but the inputs are not, and so
cheap predictions of the output are sought.
• The values of the inputs are known earlier than the output.
• We can control the values of the inputs; if we believe there is a causal
link between the inputs and the output, we want to know what values of
the inputs should be chosen to obtain a particular target value for the
output.
• If we believe there is a causal link between the inputs and the output, we
may wish to identify which inputs are related to the output.
• The most widely used form of regression model is the GLM.
• The linear model is usually written as:
Cont…
• Yj = β0 + β1x1j + β2x2j + … + βnxnj

For a single input variable this reduces to:
Y = β0 + β1x
y = dependent variable
x = independent variable
β0 = constant term or intercept
β1 = slope or coefficient of x
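As a minimal sketch of fitting such a linear model, the following Python snippet estimates β0 and β1 by ordinary least squares on a small made-up dataset (the data and variable names are illustrative, not from the slides):

```python
import numpy as np

# Illustrative data: x = independent variable, y = dependent variable.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

# Build the design matrix [1, x] so the first coefficient is the intercept beta0.
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: solve min ||X @ beta - y||^2 for beta = (beta0, beta1).
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
beta0, beta1 = beta
print(f"Y = {beta0:.3f} + {beta1:.3f} x")
```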
Example: Logistic regression for loan repayment
• Consider training data giving the yearly savings (in lakhs) of customers together with a
binary outcome y (1 = loan repaid / non-defaulter, 0 = defaulter).
• The logistic model takes

   P(y = 1 | x; β0, β1) = h(x) = 1 / (1 + exp(-(β0 + β1x)))

so that h(x) is the estimated probability of repayment and 1 - h(x) the probability of default.
• The unknown parameters (β0, β1) are chosen to maximize the likelihood of the observed labels:

   L(β0, β1) = Π over i of h(xi)^yi (1 - h(xi))^(1 - yi)

• Maximizing this likelihood (equivalently, its logarithm) gives the maximum likelihood
estimates of β0 and β1.
• Plotting savings (lakhs) versus the probability of loan repayment, with the fitted logistic
regression curve, regression analysis using the maximum likelihood estimate gives the
following output: the coefficients are β0 = -4.07778 and β1 = 1.5046.

[Figure: scatter of the repayment indicator and the fitted logistic curve; y-axis is the
probability of loan repayment (0.00–1.00), x-axis is savings in lakhs (1–5).]

Figure 8.3 Probability of loan repayment with yearly savings.


• These coefficients are entered into the logistic regression equation to estimate the
probability of being a non-defaulter:

   Probability of being a non-defaulter = 1 / (1 + exp(-(1.5046 × Savings - 4.07778)))

• For example, for a customer with 2 lakhs savings per year, the estimated probability of
being a non-defaulter is 1 / (1 + exp(-(1.5046 × 2 - 4.0777))) ≈ 0.26.
• For a customer with 4 lakhs savings per year, the estimated probability of being a
non-defaulter is 1 / (1 + exp(-(1.5046 × 4 - 4.0777))) ≈ 0.87.
• The predicted and actual values of defaulter/non-defaulter for the whole dataset are shown
in Table 1.2. With a cutoff of 0.5, a fitted value above 0.5 is predicted as 1 (non-defaulter)
and a fitted value below 0.5 as 0 (defaulter).
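The fitted probabilities can be reproduced with a short Python sketch; the coefficients are the ones reported above, while the function name and cutoff handling are our own illustration:

```python
import math

BETA0 = -4.07778   # intercept from the maximum likelihood fit
BETA1 = 1.5046     # coefficient of yearly savings (lakhs)

def prob_non_defaulter(savings):
    """Logistic model: P(non-defaulter | savings) = 1 / (1 + exp(-(b0 + b1*savings)))."""
    return 1.0 / (1.0 + math.exp(-(BETA0 + BETA1 * savings)))

for s in (2.0, 4.0):
    p = prob_non_defaulter(s)
    # Predicted class with a 0.5 cutoff: 1 = non-defaulter, 0 = defaulter.
    print(f"savings = {s} lakhs -> P(non-defaulter) = {p:.4f}, predicted = {int(p > 0.5)}")
```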

Table 1.2 Predicted and actual values of defaulter/non-defaulter

Savings (lakhs)   Defaulter   Fitted Value    Predicted Value
0.5               0           0.034710025     0
0.75              0           0.04977197      0
1                 0           0.070889852     0
1.25              0           0.100024715     0
1.5               0           0.139337907     0
1.75              0           0.190826302     0
1.75              1           0.190826302     0
2                 0           0.255688447     0
2.25              1           0.333510508     0
2.5               0           0.421602115     0
2.75              1           0.514983013     1
3                 0           0.607329347     1
3.25              1           0.692588758     1
3.5               0           0.766454783     1
4                 1           0.874429026     1
4.25              1           0.910262967     1
4.5               1           0.936612324     1
4.75              1           0.955602124     1
5                 1           0.969090667     1
5.5               1           0.985190        1
From Table 1.2 and the 0.5 cutoff, the four possible outcomes are:
1. TP (True Positive): the model predicts 1 (non-defaulter) and the actual value is also 1.
2. TN (True Negative): the model predicts 0 (defaulter) and the actual value is also 0.
3. FP (False Positive): the model predicts 1 but the actual value is 0.
4. FN (False Negative): the model predicts 0 but the actual value is 1.
• The matrix, also called the confusion matrix, shown in Table 8.3, helps classify the values that were
correctly predicted using the model built.
• These classifications are used to calculate accuracy, precision (or positive predictive value), recall (or
sensitivity), specificity, and negative predictive value. These are a few metrics used to evaluate the accuracy
of the logistic model based on the confusion matrix, shown in Table 8.4.

Table 8.4 Metrics to evaluate logistic regression for defaulter/non-defaulter prediction

Accuracy                                   (TP + TN)/Total Number of Observations
Precision or positive predictive value     TP/(TP + FP)
Negative predictive rate                   TN/(TN + FN)
Sensitivity                                TP/(TP + FN)
Specificity                                TN/(TN + FP)

Let us consider the following example of confusion matrix:

Predicted 0 Predicted 1
Actual 0 8 (TN) 2(FP)
Actual 1 2(FN) 8(TP)

Accuracy = (TP + TN)/N = 16/20 = 0.8

Precision = TP/(TP + FP) = 8/10 = 0.8

Negative predictive rate = TN/(TN + FN) = 8/10 = 0.8

Sensitivity = TP/(TP + FN) = 8/10 = 0.8

Specificity = TN/(TN + FP) = 8/10 = 0.8
In this sample dataset the accuracy obtained is 80%. This is how accuracy and precision are computed on a
real-time dataset.
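A small sketch that computes the metrics of Table 8.4 from confusion-matrix counts; the function name and dictionary layout are ours, and the counts are taken from the example above:

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the metrics from Table 8.4 given confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),              # positive predictive value
        "negative_predictive_rate": tn / (tn + fn),
        "sensitivity": tp / (tp + fn),             # recall
        "specificity": tn / (tn + fp),
    }

# Counts from the example: Actual 0 -> 8 TN, 2 FP; Actual 1 -> 2 FN, 8 TP.
print(classification_metrics(tp=8, tn=8, fp=2, fn=2))   # every metric is 0.8
```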
UNIT-2
Bayes Theorem & Concept Learning
Bayes Optimal Classifier
Naïve Bayes Classifier
Bayesian Belief Network
EM Algorithm

Prepared By: Deepti Singh


Bayes Theorem & Concept Learning
1. Brute-Force Bayes Concept Learning
• We can design a concept learning algorithm that outputs the maximum a posteriori (MAP)
hypothesis. The BRUTE-FORCE MAP LEARNING algorithm, based on Bayes theorem, is as
follows:
1. For each hypothesis h in H, calculate the posterior probability

   P(h|D) = P(D|h) P(h) / P(D)

2. Output the hypothesis hMAP with the highest posterior probability

   hMAP = argmax over h in H of P(h|D)

• This algorithm is impractical for large hypothesis spaces, because it requires computing the
posterior probability of every hypothesis in H.


• To specify a learning problem for the BRUTE-FORCE MAP LEARNING algorithm, we must
specify what values are to be used for P(h) and for P(D|h).
• We may choose the probability distributions P(h) and P(D|h) in any way we wish,
to describe our prior knowledge about the learning task.
• Here let us choose them to be consistent with the following assumptions:
1. The training data D is noise free (i.e., di = c(xi)).
2. The target concept c is contained in the hypothesis space H.
3. We have no a priori reason to believe that any hypothesis is more probable
than any other.
• Given these assumptions, what values should we specify for P(h)?
- Given no prior knowledge that one hypothesis is more likely than another, it is
reasonable to assign the same prior probability to every hypothesis h in H.
- Because we assume the target concept is contained in H, we should require that
these prior probabilities sum to 1.

   P(h) = 1/|H| for all h in H   ….. Eq. 1
• What choice shall we make for P(D|h)? P(D|h) is the probability of observing the
target values D = (d1 . . . dm) for the fixed set of instances (x1 . . . xm), given a world
in which hypothesis h holds.
- Since we assume noise-free training data, the probability of observing
classification di given h is just 1 if di = h(xi) and 0 if di ≠ h(xi). Therefore,

   P(D|h) = 1 if di = h(xi) for every di in D, and 0 otherwise   ….. Eq. 2

• In other words, the probability of data D given hypothesis h is 1 if D is consistent with
h, and 0 otherwise.
• Given these choices for P(h) and for P(D|h) we now have a fully-defined problem
for the above BRUTE-FORCE MAP LEARNING algorithm.
-> Let us consider the first step of this algorithm, which uses Bayes theorem to
compute the posterior probability P(h|D) of each hypothesis h given the
observed training data D.

Using Bayes theorem, we have:

   P(h|D) = P(D|h) P(h) / P(D)

First consider the case where h is inconsistent with the training data D. Since
Equation (2) defines P(D|h) to be 0 when h is inconsistent with D, we have:

   P(h|D) = (0 · P(h)) / P(D) = 0

The posterior probability of a hypothesis inconsistent with D is zero.

Now consider the case where h is consistent with D. Since Equation (2) defines
P(D|h) to be 1 when h is consistent with D, we have:

   P(h|D) = (1 · 1/|H|) / P(D) = (1/|H|) / (|VSH,D| / |H|) = 1 / |VSH,D|

where VSH,D is the version space of H with respect to D (the subset of hypotheses in H
consistent with D), and P(D) = |VSH,D| / |H| under the uniform prior.
• To summarize, Bayes theorem implies that the posterior probability P(h|D) under
our assumed P(h) and P(D|h) is:

   P(h|D) = 1 / |VSH,D| if h is consistent with D, and P(h|D) = 0 otherwise.
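A minimal sketch of BRUTE-FORCE MAP LEARNING under the assumptions above (uniform prior, noise-free data); the toy hypothesis space of threshold functions is our own illustration, not part of the slides:

```python
# Toy hypothesis space: h_t(x) = 1 if x >= t else 0, for a few thresholds t.
thresholds = [0.0, 1.0, 2.0, 3.0, 4.0]
H = [lambda x, t=t: int(x >= t) for t in thresholds]

# Noise-free training data: (x_i, d_i) pairs generated by the target concept "x >= 2".
D = [(0.5, 0), (1.5, 0), (2.5, 1), (3.5, 1)]

prior = 1.0 / len(H)                       # P(h) = 1/|H| (uniform prior, Eq. 1)

def likelihood(h, data):
    """P(D|h): 1 if h is consistent with every example, else 0 (Eq. 2)."""
    return 1.0 if all(h(x) == d for x, d in data) else 0.0

# Unnormalized posteriors P(D|h) * P(h); dividing by P(D) would not change the argmax.
posteriors = [likelihood(h, D) * prior for h in H]
best = max(range(len(H)), key=lambda i: posteriors[i])
print("hMAP: threshold =", thresholds[best])   # only the threshold-2.0 hypothesis is consistent
```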
2. MAP Hypotheses and Consistent Learners
• We will say that a learning algorithm is a consistent learner if it outputs a
hypothesis that commits zero errors over the training examples.
• Given the previous analysis, we can conclude that every consistent
learner outputs a MAP hypothesis, if we assume a uniform
prior probability distribution over H (i.e., P(hi) = P(hj) for all i, j), and if
we assume deterministic, noise-free training data (i.e., P(D|h) = 1 if D
and h are consistent, and 0 otherwise).
Example:
• FIND-S looks through the hypothesis space, starting with the most
specific hypotheses and moving to more general ones.
• It stops when it finds the most specific hypothesis that fits all the
data (called a consistent hypothesis).
• Even though FIND-S doesn't use probability calculations, its chosen
hypothesis turns out to match the MAP hypothesis if the probability
distributions P(h) (prior probability) and P(D|h) (likelihood)
favor more specific explanations.
BAYES OPTIMAL CLASSIFIER
• "what is the most probable hypothesis given the training data?' In fact, the
question that is often of most significance is the closely related question "what
is the most probable classification of the new instance given the training data?
• Although it may seem that this second question can be answered by simply
applying the MAP hypothesis to the new instance, in fact it is possible to do
better.
• the most probable classification of the new instance is obtained by combining the
predictions of all hypotheses, weighted by their posterior probabilities. If the
possible classification of the new example can take on any value v, from some set
V, then the probability P(vj|D) that the correct classification for the new instance
is vj, is just :
• The optimal classification of the new instance is the value vj, for which P (vj|D) is
maximum.

• Any system that classifies new instances according to the above equation is
called a Bayes optimal classifier, or Bayes optimal learner.
• No other classification method using the same hypothesis space and same prior
knowledge can outperform this method on average.
• This method maximizes the probability that the new instance is classified
correctly, given the available data, hypothesis space, and prior probabilities over
the hypotheses.
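A minimal sketch of the Bayes optimal classification rule; the three hypotheses, their posterior probabilities, and their predictions are made-up illustrative values:

```python
from collections import defaultdict

# Posterior probabilities P(h|D) for three hypotheses (illustrative values).
posterior = {"h1": 0.4, "h2": 0.3, "h3": 0.3}

# Each hypothesis deterministically predicts a class for the new instance,
# so P(v|h) is 1 for the predicted class and 0 otherwise.
prediction = {"h1": "+", "h2": "-", "h3": "-"}

# P(v|D) = sum over h of P(v|h) * P(h|D)
class_prob = defaultdict(float)
for h, p_h in posterior.items():
    class_prob[prediction[h]] += p_h

v_opt = max(class_prob, key=class_prob.get)
print(dict(class_prob), "->", v_opt)   # {'+': 0.4, '-': 0.6} -> '-'
```

Note that even though h1 is the MAP hypothesis here, the combined classification is "-", which illustrates why weighting all hypotheses can do better than applying the MAP hypothesis alone.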
Naive Bayes Classifier
• One highly practical Bayesian learning method is the naive Bayes learner, often
called the naive Bayes classifier. In some domains its performance has been
shown to be comparable to that of neural network and decision tree learning.

• The naive Bayes classifier applies to learning tasks where each instance x is
described by a conjunction of attribute values and where the target function f (x)
can take on any value from some finite set V.

• A set of training examples of the target function is provided, and a new instance
is presented, described by the tuple of attribute values (a1, a2.. .an). The learner is
asked to predict the target value, or classification, for this new instance.
• The Bayesian approach to classifying the new instance is to assign the most probable
target value, vMAP, given the attribute values (a1, a2, . . ., an) that describe the instance.
Using Bayes theorem (and dropping the denominator, which does not depend on vj):

• vMAP = argmax over vj in V of P(vj | a1, a2, . . ., an)
       = argmax over vj in V of P(a1, a2, . . ., an | vj) P(vj)   ….Eq 1
• The naive Bayes classifier is based on the simplifying assumption that the attribute
values are conditionally independent given the target value. In other words, the
assumption is that given the target value of the instance, the probability of observing the
conjunction a1, a2, . . ., an is just the product of the probabilities for the individual attributes:

   P(a1, a2, . . ., an | vj) = Π over i of P(ai | vj)

• Substituting this into Equation (1), we have the approach used by the naive Bayes
classifier:

   vNB = argmax over vj in V of P(vj) Π over i of P(ai | vj)

• where vNB denotes the target value output by the naive Bayes classifier.

• An Illustrative Example: Let us apply the naive Bayes classifier to a concept
learning problem, i.e., classifying days according to whether someone will play
tennis.
• The below table provides a set of 14 training examples of the target concept
PlayTennis, where each day is described by the attributes Outlook, Temperature,
Humidity, and Wind
• Here we use the naive Bayes classifier and the training data from this table to
classify the following novel instance:
• < Outlook = sunny, Temperature = cool, Humidity = high, Wind = strong >
• Our task is to predict the target value (yes or no) of the target concept PlayTennis
for this new instance
   vNB = argmax over vj in {yes, no} of P(vj) P(Outlook = sunny | vj) P(Temperature = cool | vj)
         P(Humidity = high | vj) P(Wind = strong | vj)   ….eqn1

• To calculate vNB we now require 10 probabilities that can be estimated from the
training data. First, the probabilities of the different target values can easily be
estimated based on their frequencies over the 14 training examples:
P(PlayTennis = yes) = 9/14 = .64
P(PlayTennis = no) = 5/14 = .36
• Similarly, we can estimate the conditional probabilities. For example, those for Wind = strong are
P(Wind = strong | PlayTennis = yes) = 3/9 = .33
P(Wind = strong | PlayTennis = no) = 3/5 = .60

Temperature Y N Outlook Y N
hot 2/9 2/5 Sunny 2/9 3/5
mild 4/9 2/5 Overcast 4/9 0
cool 3/9 1/5 Rain 3/9 2/5

Windy Y N Humidity Y N
Strong 3/9 3/5 High 3/9 4/5
Weak 6/9 2/5 Normal 6/9 1/5
• Using these probability estimates and similar estimates for the remaining
attribute values, we calculate vNB according to Equation (1) as follows:

P(yes) P(sunny|yes) P(cool|yes) P(high|yes) P(strong|yes) = 9/14 × 2/9 × 3/9 × 3/9 × 3/9 = .0053
P(no) P(sunny|no) P(cool|no) P(high|no) P(strong|no) = 5/14 × 3/5 × 1/5 × 4/5 × 3/5 = .0206

Thus, the naive Bayes classifier assigns the target value PlayTennis = no to this
new instance, based on the probability estimates learned from the training data.
• By normalizing the above quantities to sum to one, we can calculate the
conditional probability that the target value is no, given the observed attribute
values: .0206 / (.0206 + .0053) = .795.
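The calculation above can be reproduced with a short sketch; the probability estimates are taken from the tables above, while the dictionary layout and function name are ours:

```python
# Prior and conditional probability estimates from the 14 PlayTennis examples.
prior = {"yes": 9/14, "no": 5/14}
cond = {
    "yes": {"Outlook=sunny": 2/9, "Temperature=cool": 3/9,
            "Humidity=high": 3/9, "Wind=strong": 3/9},
    "no":  {"Outlook=sunny": 3/5, "Temperature=cool": 1/5,
            "Humidity=high": 4/5, "Wind=strong": 3/5},
}

def naive_bayes(instance):
    """v_NB = argmax over v of P(v) * prod_i P(a_i | v)."""
    scores = {}
    for v in prior:
        score = prior[v]
        for attr in instance:
            score *= cond[v][attr]
        scores[v] = score
    return max(scores, key=scores.get), scores

new_instance = ["Outlook=sunny", "Temperature=cool", "Humidity=high", "Wind=strong"]
label, scores = naive_bayes(new_instance)
print(scores)                  # {'yes': ~0.0053, 'no': ~0.0206}
print("PlayTennis =", label)   # no
```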
• ESTIMATING PROBABILITIES
• We have estimated probabilities by the fraction of times the event is observed to occur over
the total number of opportunities.
e.g.: we estimated P(Wind = strong | PlayTennis = no) by the fraction nc/n,
where n = 5 is the total number of training examples for which PlayTennis = no, and nc = 3 is
the number of these for which Wind = strong.
• While this observed fraction provides a good estimate of the probability in many cases, it
provides poor estimates when nc is very small.
• To avoid this difficulty we can adopt a Bayesian approach to estimating the probability, using
the m-estimate defined as follows:

   m-estimate of probability = (nc + m·p) / (n + m)

where p = prior estimate of the probability, and
m is a constant called the equivalent sample size.
Bayesian Belief Network
• It is a graphical model that represents the probabilistic relationships among
variables.
• It is used to handle uncertainty and make predictions or decisions based on
probabilities.
• Graphical Representation: Variables are represented as nodes in a directed
acyclic graph (DAG), and their dependencies are shown as edges.
• Conditional Probabilities: Each node’s probability depends on its parent
nodes, expressed as P(Variable | Parent).
• Probabilistic Model: Built from probability distributions, BBNs apply
probability theory for tasks like prediction and anomaly detection.
• The naive Bayesian classifier makes the assumption of class conditional
independence, i.e., given the class label of a tuple, the values of the attributes
are assumed to be conditionally independent of one another. This simplifies
computation.
• When this assumption holds true, the naïve Bayesian classifier is the most
accurate in comparison with other classifiers. In practice, however, dependencies
can exist between variables. Bayesian belief networks define joint conditional
probability distributions.
• They enable class conditional independencies to be represented among subsets
of variables. They provide a graphical structure of causal relationships, on which
learning can be performed. Trained Bayesian belief networks are used for
classification. Bayesian belief networks are also called belief networks, Bayesian
networks, and probabilistic networks.
• A belief network is defined by two components: a directed acyclic
graph and a set of conditional probability tables. Every node in the directed
acyclic graph represents a random variable. The variables can be discrete- or
continuous-valued.
• In general, a Bayesian network represents the joint probability distribution by
specifying a set of conditional independence assumptions (represented by a
directed acyclic graph), together with sets of local conditional probabilities.
• The joint probability for any desired assignment of values (y1, . . . , yn) to the tuple
of network variables (Y1 . . . Yn) can be computed by the formula:

   P(y1, . . . , yn) = Π over i = 1 to n of P(yi | Parents(Yi))

• where Parents(Yi) denotes the set of immediate predecessors of Yi in the
network. Note that the values of P(yi | Parents(Yi)) are precisely the values stored in the
conditional probability table associated with node Yi.
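As a minimal sketch of this factorization, consider a hypothetical two-node network Rain → WetGrass with made-up conditional probability tables (the network and numbers are illustrative only, not from the slides):

```python
# Hypothetical CPTs for a tiny network: Rain -> WetGrass.
p_rain = {True: 0.2, False: 0.8}                       # P(Rain)
p_wet_given_rain = {True: {True: 0.9, False: 0.1},     # P(WetGrass | Rain)
                    False: {True: 0.2, False: 0.8}}

def joint(rain, wet):
    """P(rain, wet) = P(rain) * P(wet | Parents(WetGrass)), per the factorization formula."""
    return p_rain[rain] * p_wet_given_rain[rain][wet]

print(joint(True, True))   # 0.2 * 0.9 = 0.18
# The joint distribution sums to 1 over all assignments of the network variables.
print(sum(joint(r, w) for r in (True, False) for w in (True, False)))   # 1.0
```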
Example
Expectation-Maximization Algorithm (EM)
• In real world applications of machine learning it is common that there are many
relevant features available for learning but only a small subset of them are
observable.
• The EM algorithm can be used for the latent variables (variables that are not
directly observable and are actually inferred from the values of the other
observed variables.)
• It has a wide range of applications, but it is likely best recognized in machine
learning for its usage in unsupervised learning tasks such as density estimation
and expectation maximization clustering.
EM Algorithm
• Initially, a set of initial values of parameters are considered. A set of incomplete
observed data is given to the system with the assumptions that the observed data
comes from a specific model.
• The second step, known as the Expectation or E-step, is used to estimate or guess the
values of the missing or incomplete data using the observed data. The E-step also
updates the estimates of the latent variables.
• The third step is known as the Maximization or M-step, in which we use the
complete data from the second step to update the parameter values (i.e., update the
hypothesis).
• The fourth or final step is to determine whether or not the values of latent
variables are converging. If it returns “yes,” end the procedure; otherwise, restart
from step 2 until convergence occurs.
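A minimal sketch of these steps for a simple case, estimating the two means of a 1-D Gaussian mixture with equal weights and unit variances; the data and initial guesses are made up, and a fixed iteration count stands in for the convergence check:

```python
import math
import random

random.seed(0)
# Observed data from two hypothetical clusters; which cluster generated each point is latent.
data = [random.gauss(0.0, 1.0) for _ in range(50)] + [random.gauss(5.0, 1.0) for _ in range(50)]

mu = [-1.0, 6.0]          # Step 1: initial parameter values (guesses for the two means)
for _ in range(30):
    # E-step: estimate responsibilities P(cluster k | x) for each point,
    # assuming equal mixing weights and unit variance.
    resp = []
    for x in data:
        w = [math.exp(-0.5 * (x - m) ** 2) for m in mu]
        s = sum(w)
        resp.append([wk / s for wk in w])
    # M-step: update the parameters (cluster means) using the responsibilities.
    for k in range(2):
        total = sum(r[k] for r in resp)
        mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / total
    # Step 4 (convergence check) is omitted here; a fixed number of iterations is used instead.

print("Estimated means:", [round(m, 2) for m in mu])   # roughly [0, 5]
```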
EM Algorithm
Uses:
• It can be used to fill the missing data in a sample.
• It can be used as the basis of unsupervised learning of clusters.
• Used for the purpose of estimating parameters of Hidden Markov
Model (HMM).
• Discovers the values of latent variables.
Advantages:
Disadvantages:
Support Vector Machine
• -Introduction
- Types of Support Vector Kernel
- Hyperplane
- Properties of SVM & Issues.
Support Vector Machine

• Support Vector Machines are Supervised Learning algorithms that were introduced in 1992.
- They became popular because of their success in handwritten digit recognition.
- Experimentally, it was shown that SVMs have a low error rate:
- a 1.1% test error rate for SVM (the same as neural networks).
- They can be employed for both classification and regression purposes.
- An SVM tries to map the input space into an output space using a non-linear mapping
function Φ such that the problem, or the data points, become linearly separable in the
output space.
- When the points become linearly separable, the SVM discovers the optimal separating
hyperplane.

• The goal of SVM is to find the optimal hyperplane which maximizes the margin of the
training data.
SVM: Types
1. Linear SVM:
• We want to find the best hyperplane (i.e. decision boundary) linearly separating our classes. Our
boundary will have equation: wTx + b = 0.
• Anything above the decision boundary should have label 1. i.e.,
• wTxi + b > 0 will have corresponding yi = 1.
• Similarly, anything below the decision boundary should have label -1. i.e.,
wTxi + b < 0 will have corresponding yi = -1.
2. Non-Linear SVM: The dataset cannot be classified into two classes by using
a straight line.
• Non-linear classification is carried out using the kernel concept.
• A non-linear SVM applies a kernel function to map the data into a space that has
higher dimensions (see the sketch below).
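A brief sketch contrasting the two types using scikit-learn (assumed to be available) on made-up 2-D datasets; the linearly separable blobs and the ring-shaped data are illustrative only:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Made-up 2-D data: two separated blobs (linearly separable) for the linear SVM.
X_lin = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(3, 0.5, (20, 2))])
y_lin = np.array([0] * 20 + [1] * 20)
linear_svm = SVC(kernel="linear").fit(X_lin, y_lin)        # learns w, b of w.T x + b = 0

# Made-up non-linear data: an inner cluster surrounded by an outer ring.
angles = rng.uniform(0, 2 * np.pi, 40)
X_nonlin = np.vstack([rng.normal(0, 0.3, (40, 2)),
                      np.c_[3 * np.cos(angles), 3 * np.sin(angles)]])
y_nonlin = np.array([0] * 40 + [1] * 40)
rbf_svm = SVC(kernel="rbf", gamma=1.0).fit(X_nonlin, y_nonlin)  # kernel trick handles non-linearity

print("linear SVM training accuracy:", linear_svm.score(X_lin, y_lin))
print("RBF SVM training accuracy:", rbf_svm.score(X_nonlin, y_nonlin))
```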
Kernel Trick in SVM
• In Machine Learning, the data can be text, image or video.
• The kernel trick maps the low-dimensional input space into a
higher-dimensional space.
• We need to extract features from these data for the classification purpose.
• In the real world, many classification models are complex and mostly require non-
linear decision boundaries.
• E.g., the mapping function Φ: R² → R³ is used to transform 2-D data into
3-D data. It is given as follows:
Φ(x, y) = (x², √2xy, y²)
Kernel Trick for 2nd Degree polynomial Mapping.
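A small sketch verifying, on two made-up vectors, that this explicit mapping gives the same inner product as the second-degree polynomial kernel K(a, b) = (aᵀb)² computed directly in the original 2-D space:

```python
import numpy as np

def phi(v):
    """Explicit feature map from R^2 to R^3 for the 2nd-degree polynomial kernel."""
    x, y = v
    return np.array([x**2, np.sqrt(2) * x * y, y**2])

a = np.array([1.0, 2.0])
b = np.array([3.0, 0.5])

# Inner product computed in the higher-dimensional feature space ...
explicit = phi(a) @ phi(b)
# ... equals the kernel evaluated in the original space (no explicit mapping needed).
kernel = (a @ b) ** 2
print(explicit, kernel)   # both 16.0
```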
Types of Kernel
• Linear Kernel
• Polynomial Kernel
• Homogeneous Kernel
• Inhomogeneous Kernel
• Gaussian Kernel or Radial-Basis Function (RBF) Kernel
• Sigmoid Kernel
• Etc.
1. Linear Kernel
• Linear kernels are of the type:

   K(x, y) = xᵀy

where x and y are two vectors.

• Therefore, the linear kernel is simply the dot product of the two input vectors.
2. Polynomial Kernel
• Polynomial kernels are of the type:

   K(x, y) = (xᵀy)^q

• This is called a homogeneous kernel.
• Here q is the degree of the polynomial.
• If q = 2, it is called a quadratic kernel.
• The inhomogeneous kernel is given as:

   K(x, y) = (xᵀy + c)^q

• Here, c is a constant and q is the degree of the polynomial.
• If c = 0 and q = 1, the polynomial kernel is reduced to a linear kernel.
• The value of q should be optimal, as a higher degree may lead to overfitting.
Example:
Gaussian Kernels
• RBF or Gaussian kernels are extremely useful in SVM.
• They are given as follows:

   K(x, y) = exp(-γ ||x - y||²)

• Here, the parameter γ (gamma) plays a very important role. If γ is small, then
the RBF kernel behaves similarly to a linear SVM.
• If γ is large, then the kernel is influenced by more support vectors.
• The RBF kernel corresponds to a dot product in R∞, and because of this it is
highly effective in separating the classes.
Example:
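As an illustrative sketch of the Gaussian kernel and the role of γ, the vectors and γ values below are made up:

```python
import numpy as np

def rbf_kernel(x, y, gamma):
    """Gaussian / RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

x = np.array([1.0, 2.0])
y = np.array([2.0, 3.0])   # squared distance ||x - y||^2 = 2

for gamma in (0.01, 1.0, 10.0):
    # Small gamma: kernel stays close to 1 even for distant points (smooth, near-linear behaviour).
    # Large gamma: kernel decays quickly, so each support vector influences only nearby points.
    print(f"gamma = {gamma}: K(x, y) = {rbf_kernel(x, y, gamma):.4f}")
```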
Sigmoid Kernel
• The sigmoid kernel is of the type K(x, y) = tanh(γ xᵀy + c), where γ and c are kernel parameters.
Hyperplane
• A hyperplane in Support Vector Machine (SVM) is a decision boundary that
separates different classes in the data
Its Purpose is:
• To classify data points into different categories.
• To maximize the margin between classes for optimal separation.
• SVM can work with any number of dimensions:
• In 1-D space, a hyperplane is a point; in 2-D space, it is a line; in 3-D space, it is a
plane; in higher dimensions, it is called a hyperplane.
• The optimal hyperplane is the one that maximizes the margin between two
classes.
• SVM uses support vectors (data points closest to the hyperplane) to define this
boundary.
Cont…

w⊤x + b = 0

• w = weight vector perpendicular to the hyperplane


• x = feature vector
• b = bias
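A tiny sketch of classifying points with such a hyperplane; the weight vector, bias, and points are made up for illustration:

```python
import numpy as np

w = np.array([1.0, -1.0])   # weight vector perpendicular to the hyperplane
b = -0.5                    # bias

def classify(x):
    """Label +1 if w.x + b > 0 (above the boundary), otherwise -1."""
    return 1 if w @ x + b > 0 else -1

for point in (np.array([2.0, 0.5]), np.array([0.0, 1.0])):
    print(point, "->", classify(point))   # +1 and -1 respectively
```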
Properties of SVM
• Margin Maximization: SVM aims to find the hyperplane that best separates the
classes in the feature space by maximizing the margin, which is the distance
between the hyperplane and the nearest data points (support vectors).
• Support Vectors: Only the data points closest to the decision boundary (support
vectors) influence the position of the hyperplane. This makes SVM robust to
outliers.
• Kernel Trick: SVM can handle non-linear data by applying kernel functions (e.g.,
linear, polynomial, radial basis function (RBF)) to transform the input features
into a higher-dimensional space where the data becomes linearly separable.
• Dual Formulation: SVM uses a dual optimization problem, which allows it to
operate efficiently, especially when the number of features exceeds the number of
data points.
Cont…

• Regularization Parameter (C): The parameter C controls the trade-off between
achieving a low error on the training data and maintaining a large margin. A
smaller C encourages a wider margin, while a larger C prioritizes fitting the data.

• Scalability: While SVMs work well with small- to medium-sized datasets, their
training time can grow significantly with large datasets, as the complexity depends
on the number of support vectors.
Issues in SVM

• Scalability: SVMs struggle with large datasets because their training time can be
computationally expensive, especially if the number of support vectors grows.
• Choice of Kernel: Deciding on the right kernel function and its parameters (like
the degree for polynomial kernels or gamma for RBF kernels) can be tricky and
often requires experimentation.
• Sensitivity to Parameters: SVM performance heavily depends on
hyperparameters (e.g., the regularization parameter C and kernel parameters). Poor
tuning can lead to suboptimal results.
Cont…
• Non-Probabilistic Outputs: SVMs don't directly provide probabilistic outputs.
This means that the model provides a definitive classification or decision (e.g.,
"Class A" or "Class B") rather than a probability score indicating the likelihood of
belonging to a particular class.
• Difficulty with Noisy Data: When the data is noisy or classes overlap, SVM can
struggle to find a clear decision boundary, affecting its performance.
