Applied Machine Learning

Model Exam Paper

QUESTION 01

You are the program chair of a large academic conference. When papers are submitted to the
conference, you must decide which paper should go into which topical area. For instance, a paper
on neural networks should go into the "Artificial Intelligence" area. The areas are fixed before
submission takes place. For each area, you have recruited an expert who will organize reviews
for each paper in their area. These experts are called "area chairs". You use statistical
classification to route the incoming papers into various areas. You have several decades of papers
labelled with the area to which they were manually assigned.

a) Explain how to set up a suitable Naive Bayes classifier for this task and derive the
required parameter estimates. Give all necessary formulae.
b) Explain why the Naive Bayes classifier would be a good choice compared to logistic
regression and Support Vector Machines.
c) You now want to quantify how well your classifier is doing. Given that you can ask your area
chairs for instant feedback, which two different evaluation methods can you realize in your
setting, and how would you do this? Your answer should give details about the data split and
metrics.
d) We have assumed that the areas are stable. You have now found out that, for the first time
in your field's history, some areas have changed for your upcoming conference. For each of the
cases below, explain what would happen if you ran your classifier from (a) unchanged in the
new situation, and propose the best course of action in light of your existing classifier,
giving your reasons.

i. An area has become unpopular and is no longer treated in this year's conference.
ii. A new area has been proposed, treating material never covered in your conference.
iii. An existing area has split into two new areas.
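As a non-authoritative illustration of the setup asked for in part (a), the sketch below implements a multinomial Naive Bayes classifier with Laplace smoothing from first principles. The vocabulary, toy "papers", and area labels are invented purely for illustration; a real submission system would use the conference's own historical labelled corpus.

```python
import numpy as np

# Toy corpus: papers as bags of words over a tiny vocabulary, labelled
# with their area. All data here is invented purely for illustration.
vocab = ["neural", "network", "proof", "theorem", "kernel"]
docs = np.array([
    [3, 2, 0, 0, 1],   # AI paper
    [2, 3, 0, 0, 0],   # AI paper
    [0, 0, 4, 3, 0],   # Theory paper
    [0, 1, 2, 4, 0],   # Theory paper
])
labels = np.array([0, 0, 1, 1])  # 0 = "AI" area, 1 = "Theory" area

def fit_multinomial_nb(X, y, alpha=1.0):
    """Estimate priors P(c) and word likelihoods P(w|c) with Laplace smoothing.

    P(c)   = (#papers in area c) / (#papers)
    P(w|c) = (count of w in area c + alpha) / (total words in area c + alpha * |V|)
    """
    classes = np.unique(y)
    priors = np.array([(y == c).mean() for c in classes])
    counts = np.array([X[y == c].sum(axis=0) for c in classes], dtype=float)
    likelihoods = (counts + alpha) / (counts.sum(axis=1, keepdims=True) + alpha * X.shape[1])
    return np.log(priors), np.log(likelihoods)

def predict(X, log_priors, log_likelihoods):
    # argmax over classes c of: log P(c) + sum_w count(w) * log P(w|c)
    return np.argmax(log_priors + X @ log_likelihoods.T, axis=1)

log_priors, log_lik = fit_multinomial_nb(docs, labels)
new_paper = np.array([[2, 1, 0, 0, 0]])   # mentions "neural" and "network"
print(predict(new_paper, log_priors, log_lik))  # → [0], i.e. the "AI" area
```

The log-space scoring avoids numerical underflow when documents are long, and the smoothing constant alpha handles vocabulary words never seen in a given area.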
QUESTION 02

The provided graph displays the training and testing errors as a function of model complexity,
which might represent the degree of a polynomial model used in regression.

Answer the following questions.


a) The point at which overfitting begins is marked in the graph. Is this the actual onset
of overfitting? Explain your reasoning.
b) Determine the model complexity that minimizes the testing error. Discuss why this point
is considered optimal.
c) Explain the significance of the gap between the training and testing errors increasing as
model complexity increases. What does this imply about the model's generalizability?
d) Describe the trend of both errors as the model complexity increases from 1 to 20. What
machine learning concept can explain this behavior?
e) Assume that a machine learning engineer chose a degree-eight polynomial as the best
model fit after observing the above graph and analyzing the error trends. Do you agree
with this choice? If not, what model complexity best avoids both overfitting and
underfitting?
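The train/test error behaviour this question probes can be reproduced with a small experiment: fit polynomials of increasing degree to noisy synthetic data and compare the two errors. The underlying cubic function, noise level, and even/odd data split below are assumptions made for illustration only; the graph in the actual exam may show different numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D regression task: a cubic trend plus noise (invented for illustration).
x = np.linspace(-1, 1, 40)
y = x**3 - 0.5 * x + rng.normal(scale=0.05, size=x.size)
x_tr, y_tr = x[::2], y[::2]      # even indices for training
x_te, y_te = x[1::2], y[1::2]    # odd indices for testing

def errors(degree):
    """Return (train MSE, test MSE) for a polynomial fit of the given degree."""
    coeffs = np.polyfit(x_tr, y_tr, degree)
    mse = lambda xs, ys: np.mean((np.polyval(coeffs, xs) - ys) ** 2)
    return mse(x_tr, y_tr), mse(x_te, y_te)

for d in (1, 3, 15):
    tr, te = errors(d)
    print(f"degree {d:2d}: train MSE {tr:.4f}, test MSE {te:.4f}")
```

Training error falls monotonically with degree, while test error first falls (the degree-1 model underfits the cubic trend) and later rises as high-degree polynomials start fitting the noise — the bias–variance trade-off the question is driving at.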
QUESTION 03

A financial services company in Sri Lanka plans to use machine learning to predict whether loan
applicants are likely to default. The dataset includes the following features for each applicant:
credit score (ranging from 300 to 850), annual income (ranging from Rs 100,000 to
Rs 10,000,000), debt-to-income ratio (ranging from 0% to 50%), loan amount (ranging from
Rs 50,000 to Rs 500,000), and past repayment history (categorized as 'good', 'fair', 'poor').
The company has decided to use a Support Vector Machine (SVM) for this binary classification
problem, with applicants classified as 'default' or 'no default'. The dataset consists of 10,000
entries with about 800 cases of defaults.

a) Describe the preprocessing steps before applying SVM to this dataset. Specifically,
address how you would handle scaling, encoding, and potential outliers.
b) Assume you initially choose a linear kernel for the SVM. Considering the nature of the
features, why might a polynomial kernel be more appropriate?
c) Given a subset of the dataset with two features, credit score and debt-to-income ratio,
draw a conceptual sketch of the decision boundary using a polynomial kernel of degree 2.
Label areas of likely 'default' and 'no default'.
d) What specific techniques would you use to handle the class imbalance in the dataset?
Discuss how these methods would affect the SVM's learning process.
e) Given the potential financial consequences of incorrectly predicting 'no default' for
someone who ends up defaulting, which metrics would be most crucial for evaluating the
performance of the SVM model? Explain your choices.
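A minimal sketch tying together parts (a), (b), and (d): feature scaling inside a pipeline, a degree-2 polynomial kernel, and class reweighting to counter the roughly 8% default rate. The synthetic two-feature data (credit score and debt-to-income ratio) and its class-conditional means are assumptions fabricated for illustration, not the company's actual dataset.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score, precision_score

rng = np.random.default_rng(42)

# Synthetic stand-in for the loan dataset: two numeric features with
# ~8% defaults, mirroring the 800-in-10,000 imbalance described above.
n = 2000
y = (rng.random(n) < 0.08).astype(int)                       # 1 = 'default'
credit = np.where(y == 1, rng.normal(450, 60, n), rng.normal(650, 60, n))
dti = np.where(y == 1, rng.normal(40, 5, n), rng.normal(20, 5, n))
X = np.column_stack([credit, dti])

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Scaling is essential for SVMs (features on very different ranges would
# otherwise dominate the kernel); class_weight='balanced' scales the
# misclassification penalty inversely to class frequency.
model = make_pipeline(
    StandardScaler(),
    SVC(kernel="poly", degree=2, class_weight="balanced"))
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print(f"recall on 'default':    {recall_score(y_te, pred):.2f}")
print(f"precision on 'default': {precision_score(y_te, pred):.2f}")
```

Recall on the 'default' class is the metric part (e) points toward: a missed defaulter (false 'no default') is the costly error, so recall and the related F-beta scores matter more here than raw accuracy, which an always-'no default' model would trivially inflate to ~92%.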
