Week 5: Logistic Regression & SVM Quiz

Uploaded by Ashok Kumar
Copyright © All Rights Reserved

NPTEL Online Certification Courses

Indian Institute of Technology Kharagpur

Course -Introduction to Machine Learning


Assignment- Week 5 (Logistic Regression, SVM, Kernel Function, Kernel SVM)
TYPE OF QUESTION: MCQ/MSQ
Number of Questions: 10 Total Marks: 10x2 = 20

1. What would be the ideal complexity of the curve which can be used for separating the two
classes shown in the image below?

A) Linear
B) Quadratic
C) Cubic
D) Insufficient data to draw a conclusion

Answer: A
(The blue point in the red region is an outlier, most likely noise. The rest of the data is
linearly separable.)

2. Consider the following statements:
I. Logistic Regression is used for regression purposes.
II. Logistic Regression is used for classification purposes.

A) Only I is Correct
B) Only II is Correct
C) Both I and II are Correct
D) Both I and II are Incorrect

Answer: C
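Logistic regression models a continuous quantity, the probability of the positive class (a regression on the log-odds), which is then thresholded to assign a class. In this sense it is used for both classification and regression tasks.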

3. Which of the following methods do we use to best fit the data in Logistic Regression?

A) Least Square Error


B) Maximum Likelihood
C) Jaccard distance
D) Both A and B

Answer: B
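In logistic regression, the parameters are fitted by maximum likelihood (equivalently, by minimizing the log-loss). Least square error is not the standard fitting criterion for logistic regression, so option D is incorrect.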
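Here is a minimal sketch of maximum-likelihood fitting. It assumes NumPy and a hypothetical six-point dataset, maximizes the Bernoulli log-likelihood by plain gradient ascent, and then checks that the gradient has vanished at the fitted parameters. The toy classes overlap, so the maximum-likelihood estimate is finite. Libraries typically use more sophisticated solvers and often add regularization, so this is an illustration of the criterion, not of any particular implementation.

```python
import numpy as np

def log_likelihood(w, X, y):
    """Bernoulli log-likelihood: sum_i [y_i log p_i + (1 - y_i) log(1 - p_i)]."""
    z = X @ w
    # log p = -log(1 + e^{-z}) and log(1 - p) = -log(1 + e^{z}); logaddexp keeps this stable
    return np.sum(-y * np.logaddexp(0.0, -z) - (1 - y) * np.logaddexp(0.0, z))

def gradient(w, X, y):
    """Gradient of the log-likelihood with respect to w: X^T (y - p)."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (y - p)

# Hypothetical toy data: a bias column plus one feature; the classes overlap.
X = np.array([[1, -2.0], [1, -1.0], [1, -0.5], [1, 0.5], [1, 1.0], [1, 2.0]])
y = np.array([0, 0, 1, 0, 1, 1])

w = np.zeros(2)
for _ in range(5000):
    w += 0.1 * gradient(w, X, y)  # gradient *ascent*: we maximize the likelihood

print("fitted w:", w)
print("log-likelihood at w = 0:", log_likelihood(np.zeros(2), X, y))
print("log-likelihood at fitted w:", log_likelihood(w, X, y))
```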

4. Consider the following model for logistic regression: P(y = 1|x, w) = g(w0 + w1x)

where g(z) is the logistic function.

In the above equation, P(y = 1|x; w), viewed as a function of x, describes a family of curves that we can get by changing the parameters w.

What would be the range of P in such a case?

A) (-inf,0)
B) (0,1)
C) (-inf, inf)
D) (0,inf)

Answer: B
For any x in (-inf, +inf), the logistic function always gives an output in the range (0, 1).
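The range claim can be checked numerically. This sketch assumes NumPy and arbitrary hypothetical parameters w0 = -1 and w1 = 2; every output stays strictly between 0 and 1. In float64 arithmetic, inputs beyond roughly z = 37 round to exactly 1.0, which is a floating-point artifact rather than a property of the logistic function.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters; any real values give outputs strictly inside (0, 1).
w0, w1 = -1.0, 2.0
x = np.linspace(-15, 15, 301)
p = sigmoid(w0 + w1 * x)
print(p.min(), p.max())  # tiny positive value, value just below 1
```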

5. State whether True or False.


After training an SVM, we can discard all examples which are not support vectors and can
still classify new examples.

A) TRUE
B) FALSE

Answer: A
This is true because only the support vectors determine the decision boundary; the other examples do not contribute to it.
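A minimal sketch of why the other examples can be discarded, assuming scikit-learn's `SVC` with a linear kernel and hypothetical `make_blobs` data. The trained model's decision function is rebuilt from only the stored support vectors (`support_vectors_`), their coefficients y_i·α_i (`dual_coef_`) and the bias (`intercept_`), and it matches `decision_function` on new points.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Hypothetical two-class data; any binary dataset would do.
X, y = make_blobs(n_samples=200, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Keep only what the model stores about its support vectors.
sv = clf.support_vectors_   # the support vectors, shape (n_SV, 2)
alpha_y = clf.dual_coef_    # y_i * alpha_i for each support vector, shape (1, n_SV)
b = clf.intercept_          # bias term, shape (1,)

def decision(X_new):
    """f(x) = sum over support vectors of y_i alpha_i <x_i, x> + b."""
    return (X_new @ sv.T @ alpha_y.T + b).ravel()

X_new = np.array([[0.0, 2.0], [2.0, 4.0], [1.0, 1.0]])
print(len(sv), "of", len(X), "training points are support vectors")
print(np.allclose(decision(X_new), clf.decision_function(X_new)))
```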

6. Suppose you are dealing with a 3-class classification problem and you want to train an SVM
model on the data using the One-vs-all method.

How many times do we need to train our SVM model in such a case?

A) 1
B) 2
C) 3
D) 4

Answer: C

In an N-class classification problem, the one-vs-all method trains the SVM N times, one binary
classifier per class (that class versus all the others).
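A quick check of the count, assuming scikit-learn and the Iris dataset (3 classes) as a stand-in for the question's data. `OneVsRestClassifier` fits one binary SVM per class. Note that `SVC` on its own handles multi-class problems with one-vs-one instead, which trains N(N-1)/2 classifiers; that also happens to be 3 when N = 3.

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Iris has 3 classes, matching the 3-class setting of the question.
X, y = load_iris(return_X_y=True)

# One binary SVM per class: "class k" versus "all other classes".
ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)
print(len(ovr.estimators_))  # number of binary SVMs trained
```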

7. What is/are true about kernels in SVM?

1. Kernel function maps low-dimensional data to a high-dimensional space


2. It’s a similarity function

A) 1
B) 2
C) 1 and 2
D) None of these.

Answer: C
Kernels are used in SVMs to map low-dimensional data into a high-dimensional feature space in
order to classify non-linearly separable data. A kernel K(x, z) returns the inner product of x
and z in that feature space without computing the mapping explicitly, and this inner product
acts as a similarity measure between the two points.
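Both statements can be seen in a small example. This sketch assumes NumPy and uses the homogeneous quadratic kernel K(x, z) = (x · z)² as an illustrative choice. For 2-D inputs it equals the inner product of the explicit 3-D feature map φ(x) = (x1², √2·x1·x2, x2²), so the kernel "maps" to a higher-dimensional space and, being an inner product, measures similarity.

```python
import numpy as np

def poly2_kernel(x, z):
    """Homogeneous quadratic kernel K(x, z) = (x . z)^2, computed in the original space."""
    return float(np.dot(x, z)) ** 2

def phi(x):
    """Explicit 3-D feature map that this kernel corresponds to (2-D inputs only)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])
print(poly2_kernel(x, z))      # 121.0, computed in 2-D
print(float(phi(x) @ phi(z)))  # same value up to rounding, computed in 3-D
```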

8. Suppose you are using an RBF kernel in SVM with a high gamma value. What does this signify?

A) The model would consider even far away points from hyperplane for modelling.
B) The model would consider only the points close to the hyperplane for
modelling.
C) The model would not be affected by distance of points from hyperplane for
modelling.
D) None of the above

Answer: B
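The gamma parameter of the RBF kernel, K(x, z) = exp(-γ‖x - z‖²), controls how far the
influence of a single training point reaches: the larger gamma is, the faster the kernel decays
with distance.
For a low gamma, even points far away influence the boundary, so the model is too constrained
and cannot really capture the shape of the data.
For a high gamma, only points close to the hyperplane matter, so the model follows the shape of
the dataset closely (and can overfit if gamma is too high).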
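The effect of gamma can be seen by evaluating the kernel directly. This sketch assumes NumPy; the gamma values 0.1 and 10 and the distances 0.5, 1 and 2 are arbitrary illustrative choices. With the high gamma, the similarity to points only one unit away is already close to zero, so far points stop influencing the model.

```python
import numpy as np

def rbf_kernel(x, z, gamma):
    """RBF (Gaussian) kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

x = [0.0, 0.0]
for gamma in (0.1, 10.0):
    # similarity between x and points at distance 0.5, 1 and 2 from it
    sims = [rbf_kernel(x, [d, 0.0], gamma) for d in (0.5, 1.0, 2.0)]
    print(gamma, [round(s, 4) for s in sims])
```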

9. Below are the labelled instances of 2 classes and hand-drawn decision boundaries for
logistic regression. Which of the following figures demonstrates overfitting of the training data?

A) A
B) B
C) C
D) None of these

Answer: C
In figure 3 (option C), the decision boundary is very complex and unlikely to generalize well to unseen data.

10. What do you conclude after seeing the visualization in the previous question?

C1. The training error in the first plot is higher than in the second and third plots.
C2. The best model for this regression problem is the last (third) plot because it has
minimum training error (zero).
C3. Out of the 3 models, the second model is expected to perform best on unseen data.
C4. All will perform similarly because we have not seen the test data.

A) C1 and C2
B) C1 and C3
C) C2 and C3
D) C4

Answer: B
From the visualization, it is clear that plot A has more misclassified samples than plots B and
C, so C1 is correct. In figure 3, the training error is low because of the complex boundary, but
such a boundary is unlikely to generalize well; therefore, C2 is wrong.
The first model is very simple and underfits the training data. The third model is very
complex and overfits the training data. The second model has lower training error than the first
and a simpler boundary than the third, so it is likely to perform best on unseen data. So, C3 is correct.
We can estimate the performance of a model on unseen data by observing the nature of its
decision boundary. Therefore, C4 is incorrect.

End

Course -Introduction to Machine Learning


Assignment- Week 5 (Logistic Regression, SVM, Kernel Function, Kernel SVM)
TYPE OF QUESTION: MCQ/MSQ
Number of Questions: 10 Total Marks: 10x2 = 20

1. What would be the ideal complexity of the curve which can be used for separating the two
classes shown in the image below?
A) Linear
B) Quadratic
C) Cubic
D) Insufficient data to draw a conclusion

Answer: A
(The blue point in the red region is an outlier, most likely noise. The rest of the data is
linearly separable.)

2. Which of the following options is true?

A) Linear regression error values have to be normally distributed, but this is not the case
for logistic regression
B) Logistic regression error values have to be normally distributed, but this is not the case
for linear regression
C) Both linear and logistic regression error values have to be normally distributed
D) Both linear and logistic regression error values need not be normally distributed

Answer: A
The classical linear regression model assumes normally distributed errors (this underpins its
standard t- and F-based inference). Logistic regression makes no such assumption, since its
outcome follows a Bernoulli distribution.

3. Which of the following methods do we use to best fit the data in Logistic Regression?
A) Manhattan distance
B) Maximum Likelihood
C) Jaccard distance
D) Both A and B

Answer: B
In logistic regression, maximum likelihood is the estimation method used to fit the data.

4. Imagine you are given the graph below for logistic regression, which shows the relationship
between the cost function and the number of iterations for 3 different learning rate values
(each color shows the curve for a different learning rate).

Suppose you saved the graph for future reference but forgot to save the values of the
learning rates. Now you want to find out the relation between the learning rate values of
these curves. Which of the following is the true relation?
Note: 1. The learning rate for blue is L1.
2. The learning rate for red is L2.
3. The learning rate for green is L3.

A) L1>L2>L3
B) L1=L2=L3
C) L1<L2<L3
D) None of these

Answer: C
A low learning rate makes the cost function decrease slowly, while a larger learning rate makes
it decrease faster (as long as the rate is small enough that gradient descent does not oscillate
or diverge).
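To illustrate this, here is a minimal sketch that assumes NumPy and a hypothetical six-point dataset (not the data behind the graph). It runs batch gradient descent on the logistic-regression cost with three learning rates and compares the cost after the same number of iterations. Within the stable range, the larger rates reach a lower cost sooner; a rate that is too large would instead make the cost oscillate or blow up.

```python
import numpy as np

def mean_log_loss(w, X, y):
    """Cost J(w): mean negative log-likelihood of logistic regression."""
    z = X @ w
    return np.mean(y * np.logaddexp(0.0, -z) + (1 - y) * np.logaddexp(0.0, z))

def gd_costs(X, y, lr, n_iter=100):
    """Batch gradient descent from w = 0, recording the cost after each step."""
    w = np.zeros(X.shape[1])
    costs = []
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
        costs.append(mean_log_loss(w, X, y))
    return costs

# Hypothetical toy data: a bias column plus one feature.
X = np.array([[1, -2.0], [1, -1.0], [1, -0.5], [1, 0.5], [1, 1.0], [1, 2.0]])
y = np.array([0, 0, 1, 0, 1, 1])

for lr in (0.001, 0.01, 0.1):
    print(lr, round(gd_costs(X, y, lr)[-1], 4))  # cost after 100 iterations
```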

5. State whether True or False.


After training an SVM, we can discard all examples which are not support vectors and can
still classify new examples.
A) TRUE
B) FALSE

Answer: A
This is true because only the support vectors determine the decision boundary; the other examples do not contribute to it.

6. Suppose you are dealing with a 3-class classification problem and you want to train an SVM
model on the data using the One-vs-all method.

How many times do we need to train our SVM model in such a case?
A) 1
B) 2
C) 3
D) 4

Answer: C

In an N-class classification problem, the one-vs-all method trains the SVM N times, one binary
classifier per class (that class versus all the others).

7. What is/are true about kernels in SVM?

1. Kernel function maps low-dimensional data to a high-dimensional space


2. It’s a similarity function

A) 1
B) 2
C) 1 and 2
D) None of these.

Answer: C
Kernels are used in SVMs to map low-dimensional data into a high-dimensional feature space in
order to classify non-linearly separable data. A kernel K(x, z) returns the inner product of x
and z in that feature space without computing the mapping explicitly, and this inner product
acts as a similarity measure between the two points.

8. Suppose you are using an RBF kernel in SVM with a high gamma value. What does this signify?

A) The model would consider even far away points from hyperplane for modelling.
B) The model would consider only the points close to the hyperplane for
modelling.
C) The model would not be affected by distance of points from hyperplane for
modelling.
D) None of the above

Answer: B
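The gamma parameter of the RBF kernel, K(x, z) = exp(-γ‖x - z‖²), controls how far the
influence of a single training point reaches: the larger gamma is, the faster the kernel decays
with distance.
For a low gamma, even points far away influence the boundary, so the model is too constrained
and cannot really capture the shape of the data.
For a high gamma, only points close to the hyperplane matter, so the model follows the shape of
the dataset closely (and can overfit if gamma is too high).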

9. Below are the labelled instances of 2 classes and hand-drawn decision boundaries for
logistic regression. Which of the following figures demonstrates overfitting of the training data?

A) A
B) B
C) C
D) None of these

Answer: C
In figure 3 (option C), the decision boundary is very complex and unlikely to generalize well to unseen data.

10. What do you conclude after seeing the visualization in the previous question?

C1. The training error in the first plot is higher than in the second and third plots.
C2. The best model for this regression problem is the last (third) plot because it has
minimum training error (zero).
C3. Out of the 3 models, the second model is expected to perform best on unseen data.
C4. All will perform similarly because we have not seen the test data.

A) C1 and C2
B) C1 and C3
C) C2 and C3
D) C4

Answer: B
From the visualization, it is clear that plot A has more misclassified samples than plots B and
C, so C1 is correct. In figure 3, the training error is low because of the complex boundary, but
such a boundary is unlikely to generalize well; therefore, C2 is wrong.
The first model is very simple and underfits the training data. The third model is very
complex and overfits the training data. The second model has lower training error than the first
and a simpler boundary than the third, so it is likely to perform best on unseen data. So, C3 is correct.
We can estimate the performance of a model on unseen data by observing the nature of its
decision boundary. Therefore, C4 is incorrect.

End
Introduction to Machine Learning -IITKGP
Assignment - 5
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 15 Total marks: 2 * 15 = 30

1. What would be the ideal complexity of the curve which can be used for separating the two
classes shown in the image below?
a. Linear
b. Quadratic
c. Cubic
d. Insufficient data to draw a conclusion

Correct Answer: a

Explanation: The blue point in the red region is an outlier (most likely noise). The rest of
the data is linearly separable.

2. Suppose you are using a linear SVM classifier for a 2-class classification problem. You
have been given the following data, in which some points are circled in red; these represent
the support vectors.

If you remove any one of the red points from the data, will the decision boundary
change?
a. Yes
b. No

Correct Answer: a

Explanation: These three examples are positioned such that removing any one of them
introduces slack in the constraints, so the decision boundary would change.

3. What do you mean by a hard margin in SVM Classification?


a. The SVM allows very low error in classification
b. The SVM allows high amount of error in classification
c. Both are True
d. Both are False
Correct Answer: a
Explanation: A hard margin means that the SVM is very rigid: it tries to classify every
training point correctly with no margin violations, which can cause overfitting.

4. Which of the following statements accurately compares linear regression and logistic
regression?
a. Linear regression is used for classification tasks, while logistic regression is used for
regression tasks.
b. Linear regression models the relationship between input features and continuous
target variables, while logistic regression models the probability of binary outcomes.
c. Linear regression and logistic regression are identical in their mathematical
formulation and can be used interchangeably.
d. Linear regression and logistic regression both handle multi-class classification tasks
equally effectively.
Correct Answer: b
Explanation: Linear regression is employed to predict continuous numeric target variables
based on input features. It finds the best-fitting linear relationship between features and the
target variable. Logistic regression, on the other hand, is designed for binary classification
tasks where the goal is to estimate the probability that a given input belongs to a particular
class. It employs the logistic (sigmoid) function to map the linear combination of features to a
probability value between 0 and 1. Linear regression and logistic regression serve different
purposes and are not interchangeable due to their distinct objectives and mathematical
formulations.
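To make the contrast in option b concrete, the sketch below (assuming NumPy and a hypothetical 1-D dataset with 0/1 labels) fits ordinary least-squares linear regression to the labels. Away from the training data its predictions leave [0, 1], which is one reason logistic regression passes the linear combination of features through the sigmoid instead of using it directly.

```python
import numpy as np

# Hypothetical 1-D data with binary targets.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Linear regression: ordinary least-squares fit of y = b0 + b1 * x.
A = np.c_[np.ones_like(x), x]
(b0, b1), *_ = np.linalg.lstsq(A, y, rcond=None)

x_far = np.array([-5.0, 10.0])
linear_pred = b0 + b1 * x_far  # not confined to [0, 1], so not a valid probability
print(b0, b1, linear_pred)
```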

5. After training an SVM, we can discard all examples which are not support vectors and
still classify new examples.

a. True
b. False

Correct Answer: a

Explanation: Only the support vectors determine the decision boundary, so the remaining
examples can be discarded without changing the classifier.

6. Suppose you are building an SVM model on data X. The data X can be error-prone, which
means that you should not trust any specific data point too much. Now suppose you want
to build an SVM model with a quadratic kernel function (polynomial of degree 2) that uses
the slack penalty C as one of its hyperparameters.

What would happen when you use a very large value of C (C -> infinity)?
a. We can still classify data correctly for the given setting of hyperparameter C.
b. We cannot classify data correctly for the given setting of hyperparameter C.
c. None of the above

Correct Answer: a

Explanation: For large values of C, the penalty for misclassifying points is very high, so
the decision boundary will perfectly separate the data if possible.

7. Following Question 6, what would happen when you use a very small C (C ≈ 0)?
a. Data will be correctly classified
b. Misclassification would happen
c. None of these

Correct Answer: b

Explanation: The classifier can maximize the margin between most of the points, while
misclassifying a few points, because the penalty is so low.
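Questions 6 and 7 can be explored with a small experiment. This sketch assumes scikit-learn's `SVC` with a degree-2 polynomial kernel and hypothetical data that a quadratic boundary separates perfectly (an inner disc versus an outer ring); C = 10⁴ and C = 10⁻³ stand in for "very large" and "very small". With the large C the training data is classified correctly. With the tiny C, margin violations are so cheap that nearly every point becomes a support vector, and training accuracy is typically lower.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical data: class 1 inside radius 1, class 0 on a ring of radius 2 to 3,
# so a quadratic (degree-2) boundary can separate the classes perfectly.
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 200)
radii = np.r_[rng.uniform(0, 1, 100), rng.uniform(2, 3, 100)]
X = np.c_[radii * np.cos(angles), radii * np.sin(angles)]
y = np.r_[np.ones(100, dtype=int), np.zeros(100, dtype=int)]

models = {}
for C in (1e4, 1e-3):  # very large C versus very small C
    models[C] = SVC(kernel="poly", degree=2, coef0=1.0, C=C).fit(X, y)
    print(f"C={C:g}: training accuracy={models[C].score(X, y):.3f}, "
          f"support vectors={models[C].n_support_.sum()}")
```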

8. If g(z) is the sigmoid function, then its derivative with respect to z may be written in terms of
g(z) as

a. g(z)(1-g(z))
b. g(z)(1+g(z))
c. -g(z)(1+g(z))
d. g(z)(g(z)-1)

Correct Answer: a

Detailed Solution:

$$g'(z) = \frac{d}{dz}\left(\frac{1}{1+e^{-z}}\right) = \frac{1}{(1+e^{-z})^{2}}\,e^{-z} = \frac{1}{1+e^{-z}}\left(1 - \frac{1}{1+e^{-z}}\right) = g(z)\bigl(1 - g(z)\bigr)$$
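The closed form can be checked numerically. This sketch assumes NumPy and compares g(z)(1 - g(z)) with a central finite-difference estimate of g'(z) on a grid of z values; the step h = 1e-5 is an arbitrary choice. At z = 0 the derivative takes its maximum value, 0.25.

```python
import numpy as np

def g(z):
    """Sigmoid function."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-6, 6, 25)
h = 1e-5
numeric = (g(z + h) - g(z - h)) / (2 * h)  # central finite difference
closed_form = g(z) * (1 - g(z))            # g'(z) = g(z)(1 - g(z))
print(np.max(np.abs(numeric - closed_form)))
```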

9. In the linearly non-separable case, what effect does the C parameter have on the
SVM model?

a. it determines how many data points lie within the margin


b. it is a count of the number of data points which do not lie on their respective
side of the hyperplane
c. it allows us to trade-off the number of misclassified points in the training data
and the size of the margin
d. it counts the support vectors

Correct Answer: c

Explanation: A high value of C puts more emphasis on the penalties for points lying on the
wrong side of the margin. The optimizer then accepts a narrower margin in exchange for fewer
such violations; a low value of C does the opposite, allowing more violations in exchange for a
wider margin.
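The trade-off is visible in the standard primal soft-margin objective, ½‖w‖² + C·Σᵢ max(0, 1 - yᵢ(w·xᵢ + b)). The first term rewards a wide margin (the margin width is 2/‖w‖) and the second penalizes violations, weighted by C. This sketch assumes NumPy, labels in {-1, +1}, and a hypothetical three-point dataset and fixed (w, b); it only evaluates the objective and does not train a model.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i (w . x_i + b)), labels y_i in {-1, +1}."""
    margins = y * (X @ w + b)
    slack = np.maximum(0.0, 1.0 - margins)  # xi_i: how much each point violates the margin
    return 0.5 * np.dot(w, w) + C * slack.sum()

X = np.array([[1.0, 0.0], [-1.0, 0.0], [0.2, 0.0]])
y = np.array([1, -1, -1])  # the third point lies on the wrong side of x1 = 0
w, b = np.array([1.0, 0.0]), 0.0
for C in (0.1, 1.0, 10.0):
    # the slack term grows with C, so violations dominate the objective for large C
    print(C, soft_margin_objective(w, b, X, y, C))
```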

10. What type of kernel function is commonly used for non-linear classification tasks in
SVM?

a. Linear kernel
b. Polynomial kernel
c. Sigmoid kernel
d. Radial Basis Function (RBF) kernel

Correct Answer: d

Explanation: The Radial Basis Function (RBF) kernel is commonly used for non-linear
classification tasks in SVM. It introduces non-linearity by mapping data points into a high-
dimensional space, where a linear decision boundary corresponds to a non-linear decision
boundary in the original feature space. The RBF kernel is suitable for capturing complex
relationships and is widely used due to its effectiveness.

11. Which of the following statements is/are true about kernels in SVM?

1. Kernel function maps low-dimensional data to a high-dimensional space


2. It’s a similarity function

a. 1 is True but 2 is False


b. 1 is False but 2 is True
c. Both are True
d. Both are False

Correct Answer: c
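Explanation: A kernel computes inner products in a (possibly high-dimensional) feature space, which implicitly maps the data to that space, and that inner product measures how similar two points are. See the lecture notes for details.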

12. The soft-margin SVM is preferred over the hard-margin SVM when:

a. The data is linearly separable


b. The data is noisy
c. The data contains overlapping points

Correct Answer: b, c

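Explanation: When the data is noisy or has overlapping points, there may be no hyperplane that
separates the classes without misclassifying some points, so a hard-margin SVM either has no
solution or overfits; the soft margin tolerates a few violations.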

13. Consider the data-points in the figure below.


Let us assume that the black-colored circles represent positive class whereas the white-colored
circles represent negative class. Which of the following among H1, H2 and H3 is the
maximum-margin hyperplane?

a. H1
b. H2
c. H3
d. None of the above.

Correct Answer: c

Explanation: In a Support Vector Machine (SVM), the maximum-margin hyperplane is the
one that has the largest distance between itself and the nearest data point of either class. This
hyperplane tends to give the best generalization to unseen data. The SVM aims to maximize this
margin while still correctly classifying the training data.

To determine the maximum-margin hyperplane, you need to look for the hyperplane that has
the largest "margin" between it and the nearest data point. The margin is the perpendicular
distance between the hyperplane and the closest data point from either class.

H3 has the largest gap between itself and the nearest data point. That hyperplane would be the
maximum-margin hyperplane.
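The comparison described above can be written as a short computation. This sketch assumes NumPy and labels in {-1, +1}; the four points and the three candidate hyperplanes are hypothetical stand-ins, not the H1, H2 and H3 of the figure. For each candidate w·x + b = 0 it computes the geometric margin minᵢ yᵢ(w·xᵢ + b)/‖w‖ and picks the largest.

```python
import numpy as np

def geometric_margin(w, b, X, y):
    """Smallest signed distance y_i (w . x_i + b) / ||w|| over all points (labels in {-1, +1})."""
    w = np.asarray(w, dtype=float)
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)

# Hypothetical points and candidate hyperplanes (not the ones in the figure).
X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]])
y = np.array([-1, -1, 1, 1])
candidates = {
    "x1 = 1": ([1.0, 0.0], -1.0),
    "x1 = 1.5": ([1.0, 0.0], -1.5),
    "x1 + x2 = 2": ([1.0, 1.0], -2.0),
}
margins = {name: geometric_margin(w, b, X, y) for name, (w, b) in candidates.items()}
print(max(margins, key=margins.get), margins)
```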

14. What is the primary advantage of Kernel SVM compared to traditional SVM with a linear
kernel?

a. Kernel SVM requires less computational resources.


b. Kernel SVM does not require tuning of hyperparameters.
c. Kernel SVM can capture complex non-linear relationships between data points.
d. Kernel SVM is more robust to noisy data.

Correct Answer: c

Explanation: The primary advantage of Kernel SVM is its ability to capture complex non-
linear relationships between data points through the use of kernel functions. While traditional
SVM with a linear kernel is limited to finding linear decision boundaries, Kernel SVM can
transform the data into higher-dimensional spaces where non-linear decision boundaries can
be effectively learned. This makes Kernel SVM suitable for a wide range of classification tasks
where linear separation is not sufficient.

15. What is the sigmoid function's role in logistic regression?

a. The sigmoid function transforms the input features to a higher-dimensional space.


b. The sigmoid function calculates the dot product of input features and weights.
c. The sigmoid function defines the learning rate for gradient descent.
d. The sigmoid function maps the linear combination of features to a probability value.

Correct Answer: d

Explanation: The sigmoid function, also known as the logistic function, plays a crucial role in
logistic regression. It transforms the linear combination of input features and corresponding
weights into a value between 0 and 1. This value represents the estimated probability that the
input belongs to a particular class. The sigmoid function's curve ensures that the output remains
within the probability range, making it suitable for binary classification.

************END************
Course -Introduction to Machine Learning
Assignment- Week 5 (Logistic Regression, SVM, Kernel Function, Kernel
SVM)
TYPE OF QUESTION: MCQ/MSQ
Number of Questions: 10 Total Marks: 10x2 = 20
__________________________________________________________________
Question 1:
What would be the ideal complexity of the curve which can be used for separating the
two classes shown in the image below?

A) Linear
B) Quadratic
C) Cubic
D) Insufficient data to draw a conclusion

Correct Answer: A
Detailed Solution: The blue point in the red region is an outlier. The rest of the data is
linearly separable.
__________________________________________________________________

Question 2:

Suppose you have a dataset with n=10 features and m=1000 examples. After training a
logistic regression classifier with gradient descent, you find that it has high training error
and does not achieve the desired performance on training and validation sets. Which of
the following might be promising steps to take?
1. Use SVM with a non-linear kernel function
2. Reduce the number of training examples
3. Create or add new polynomial features

A) 1, 2
B) 1, 3
C) 1, 2, 3
D) None
Correct Answer: B
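Detailed Solution: High error on both the training and validation sets indicates underfitting:
the linear decision boundary of logistic regression is too simple, and the dataset is likely not
linearly separable. SVM with a non-linear kernel works well for non-linearly separable datasets,
and creating new polynomial features also helps capture the non-linearity in the dataset.
Reducing the number of training examples does not reduce this bias, so option 2 does not help.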
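A minimal sketch of option 3, assuming NumPy and hypothetical data made of two inner rings (class 1) inside two outer rings (class 0). The same gradient-descent logistic regression is trained on the raw features x1, x2 and then on added polynomial features x1², x2², x1·x2. No linear boundary can separate concentric rings, so the raw-feature model stays near chance, while the polynomial features make a circular boundary expressible.

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, n_iter=5000):
    """Batch gradient descent on the mean log-loss, starting from w = 0."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def design(F):
    """Standardize each feature column and prepend an intercept column of ones."""
    F = (F - F.mean(axis=0)) / F.std(axis=0)
    return np.c_[np.ones(len(F)), F]

# Hypothetical data: 16 evenly spaced points on each of four rings.
angles = np.linspace(0, 2 * np.pi, 16, endpoint=False)
P = np.vstack([np.c_[r * np.cos(angles), r * np.sin(angles)] for r in (0.5, 1.0, 2.0, 2.5)])
y = np.r_[np.ones(32), np.zeros(32)]  # inner two rings -> 1, outer two rings -> 0

X_lin = design(P)                                     # features: x1, x2
X_poly = design(np.c_[P, P ** 2, P[:, 0] * P[:, 1]])  # adds x1^2, x2^2, x1*x2

accs = {}
for name, X in (("linear", X_lin), ("polynomial", X_poly)):
    w = fit_logistic(X, y)
    accs[name] = np.mean((X @ w >= 0) == (y == 1))
    print(name, "training accuracy:", round(accs[name], 3))
```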
__________________________________________________________________

Question 3:

In logistic regression, we learn the conditional distribution p(y|x), where y is the class
label and x is a data point. If h(x) is the output of the logistic regression classifier for an
input x, then p(y|x) equals:

A. $h(x)^{y}\,(1 - h(x))^{1-y}$
B. $h(x)^{y}\,(1 + h(x))^{1-y}$
C. $h(x)^{1-y}\,(1 - h(x))^{y}$
D. $h(x)^{y}\,(1 + h(x))^{1+y}$

Correct Answer: A
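Detailed Solution: For a label y ∈ {0, 1} with h(x) = P(y = 1|x), the expression $h(x)^{y}(1 - h(x))^{1-y}$ equals h(x) when y = 1 and 1 - h(x) when y = 0, i.e. it is a Bernoulli distribution. Refer to the lecture.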
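The single formula in option A covers both labels. This sketch uses plain Python with a hypothetical classifier output h(x) = 0.8: plugging in y = 1 returns h(x), and y = 0 returns 1 - h(x).

```python
def p_y_given_x(h, y):
    """Bernoulli likelihood h^y * (1 - h)^(1 - y) for y in {0, 1}, where h = P(y = 1 | x)."""
    return h ** y * (1 - h) ** (1 - y)

h = 0.8  # hypothetical output h(x) of the classifier for some input x
print(p_y_given_x(h, 1))  # 0.8
print(p_y_given_x(h, 0))  # 1 - 0.8, i.e. 0.2 up to floating-point rounding
```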
__________________________________________________________________

Question 4:

The output of binary class logistic regression lies in the range:


A. [-1,0]
B. [0,1]
C. [-1,-2]
D. [1,10]

Correct Answer: B
Detailed Solution: The output of binary class logistic regression lies in the range:
[0,1].

__________________________________________________________________

Question 5:

State whether True or False.


“After training an SVM, we can discard all examples which are not support
vectors and can still classify new examples.”
A) TRUE
B) FALSE

Correct Answer: A
Detailed Solution: Only the support vectors determine the decision boundary, so using only the
support vector points it is still possible to classify new examples.
__________________________________________________________________

Question 6:

Suppose you are dealing with a 3-class classification problem and you want to train an
SVM model on the data. For that you are using the One-vs-all method. How many
times do we need to train our SVM model in such a case?
A) 1
B) 2
C) 3
D) 4

Correct Answer: C
Detailed Solution: In an N-class classification problem, we have to train the SVM N
times in the one-vs-all method, one binary classifier per class.
__________________________________________________________________

Question 7:

What is/are true about kernels in SVM?

1. Kernel function can map low dimensional data to high dimensional space
2. It’s a similarity function

A) 1
B) 2
C) 1 and 2
D) None of these.

Correct Answer: C
Detailed Solution: Kernels are used in SVMs to map low dimensional data into high
dimensional feature space to classify non-linearly separable data. It also acts as a
similarity function.
_________________________________________________________________

Question 8:

If g(z) is the sigmoid function, then its derivative with respect to z may be written in
terms of g(z) as

A) g(z)(g(z)-1)
B) g(z)(1+g(z))
C) -g(z)(1+g(z))
D) g(z)(1-g(z))

Correct Answer: D
Detailed Answer:

$$g'(z) = \frac{d}{dz}\left(\frac{1}{1+e^{-z}}\right) = \frac{e^{-z}}{(1+e^{-z})^{2}} = \frac{1}{1+e^{-z}}\left(1 - \frac{1}{1+e^{-z}}\right) = g(z)\bigl(1 - g(z)\bigr)$$
__________________________________________________________________
Question 9:

Below are the labelled instances of 2 classes and hand-drawn decision boundaries for
logistic regression. Which of the following figures demonstrates overfitting of the
training data?

A) A
B) B
C) C
D) None of these

Correct Answer: C
Detailed Solution: In figure 3 (option C), the decision boundary is very complex and unlikely to
generalize well to unseen data.
__________________________________________________________________
Question 10:

What do you conclude after seeing the visualization in the previous question (Question
9)?

C1. The training error in the first plot is higher than in the second and third
plots.
C2. The best model for this regression problem is the last (third) plot because it
has minimum training error (zero).
C3. Out of the 3 models, the second model is expected to perform best on
unseen data.
C4. All will perform similarly because we have not seen the test data.

A) C1 and C2
B) C1 and C3
C) C2 and C3
D) C4

Correct Answer: B
Detailed Solution: From the visualization, it is clear that plot A has more misclassified
samples than plots B and C, so C1 is correct. In figure 3, the training error is low because
of the complex boundary, but such a boundary is unlikely to generalize well; therefore,
C2 is wrong.
The first model is very simple and underfits the training data. The third model is very
complex and overfits the training data. The second model has lower training error than the
first and a simpler boundary than the third, so it is likely to perform best on unseen data.
So, C3 is correct.
We can estimate the performance of a model on unseen data by observing the nature of its
decision boundary. Therefore, C4 is incorrect.

__________________________________________________________________
End
