Course Name : Applied Machine Learning
Course Code : MTDS 5143
Course Mode : Modular
Schedule : 25-26 Nov, 2019 (ML)
02-03 Dec, 2019 (ML, DL)
09-10 Dec, 2019 (DL)
16-17 Dec, 2019 (Assessment)
ULearn : MTDS 5143 Applied Machine Learning
(All work should be submitted through ULearn platform)
Instructor : Dr. Noor Fazilla Binti Abd Yusof
elle@utem.edu.my
: Assoc. Prof. Ts. Dr. Choo Yun Huoy
huoy@utem.edu.my
Subject Information : MTDS5143 Applied Machine Learning
Session : 2019/2020 Semester I
Suggested Reference
2
Main References and Lab/Practical texts (book covers shown on the slide)
By
Assoc. Prof. Dr. Choo Yun Huoy
Department of Intelligent Computing & Analytics
Faculty of Information and Communication Technology
Universiti Teknikal Malaysia Melaka (UTeM)
76109 Durian Tunggal, Melaka, Malaysia.
huoy@utem.edu.my
MTDS 5143 APPLIED MACHINE LEARNING
4
§ From AI To Machine Learning
§ What Is Machine Learning?
§ Machine Learning Task
§ Using Data to Make Decisions
§ The Machine Learning Workflow
§ Summary
From AI To Machine Learning
5
§ AI is the broader idea of making machines
intelligent, while machine learning is the
implementation of the computational
methods that support it.
§ AI is the science; machine learning is
the set of algorithms that makes
machines smarter.
§ “So the enabler for AI is machine
learning”
Source: http://www.wired.co.uk/article/machine-learning-ai-explained
From AI To Machine Learning
6 Source: http://iot.ghost.io/is-it-all-machine-learning/
Where Is Machine Learning?
7
What is Concept Learning?
8
§ The key word in Machine Learning is the learning process itself.
§ Learning means learning a concept.
§ A concept describes a set of objects or events with similar
characteristics.
Examples of concepts: Owl, Butterflies, Love, Trees, Happy, Sad.
Iris Flower Concept
9
The Iris Flower Data Set
Famous database; from Fisher, 1936
https://archive.ics.uci.edu/ml/datasets/Iris
§ To predict the Iris flower species, we must first
learn the concept of an Iris flower.
How do we recognise an Iris flower?
What Is Machine Learning?
10
§ Using the right features to build the right models that
achieve the right tasks.
What Is Machine Learning?
11
§ Using the right features to build the right models that
achieve the right tasks.
§ Tasks: the problem to be solved.
§ Features: the object descriptors.
§ Models: the abstraction of the mapping / relations; the method used to solve the task.
Machine Learning Vocabulary
12
§ Target: predicted category or value of the
data (column to predict)
Data
13
The Iris Flower Data Set
Famous database; from Fisher, 1936
https://archive.ics.uci.edu/ml/datasets/Iris
§ Predict the type of iris plant based on the sepal and
petal size (width and length).
Machine Learning Vocabulary
14
The Iris Flower Data Set
Target = the species column (the column to predict)
sepal length  sepal width  petal length  petal width  species
6.7           3.0          5.2           2.3          virginica
6.4           2.8          5.6           2.1          virginica
4.6           3.4          1.4           0.3          setosa
6.9           3.1          4.9           1.5          versicolor
4.4           2.9          1.4           0.2          setosa
4.8           3.0          1.4           0.1          setosa
5.9           3.0          5.1           1.8          virginica
5.4           3.9          1.3           0.4          setosa
4.9           3.0          1.4           0.2          setosa
5.4           3.4          1.7           0.2          setosa
Machine Learning Vocabulary
15
§ Target: predicted category or value of the
data (column to predict)
§ Features: properties of the data used for
prediction (non-target columns)
Machine Learning Vocabulary
16
The Iris Flower Data Set
Features = the four measurement columns (the non-target columns)
sepal length  sepal width  petal length  petal width  species
6.7           3.0          5.2           2.3          virginica
6.4           2.8          5.6           2.1          virginica
4.6           3.4          1.4           0.3          setosa
6.9           3.1          4.9           1.5          versicolor
4.4           2.9          1.4           0.2          setosa
4.8           3.0          1.4           0.1          setosa
5.9           3.0          5.1           1.8          virginica
5.4           3.9          1.3           0.4          setosa
4.9           3.0          1.4           0.2          setosa
5.4           3.4          1.7           0.2          setosa
Machine Learning Vocabulary
17
§ Target: predicted category or value of the
data (column to predict)
§ Features: properties of the data used for
prediction (non-target columns)
§ Example: a single data point within the data
(one row)
Machine Learning Vocabulary
18
The Iris Flower Data Set
Examples = the individual rows; each row is one data point
sepal length  sepal width  petal length  petal width  species
6.7           3.0          5.2           2.3          virginica
6.4           2.8          5.6           2.1          virginica
4.6           3.4          1.4           0.3          setosa
6.9           3.1          4.9           1.5          versicolor
4.4           2.9          1.4           0.2          setosa
4.8           3.0          1.4           0.1          setosa
5.9           3.0          5.1           1.8          virginica
5.4           3.9          1.3           0.4          setosa
4.9           3.0          1.4           0.2          setosa
5.4           3.4          1.7           0.2          setosa
Machine Learning Vocabulary
19
§ Target: predicted category or value of the
data (column to predict)
§ Features: properties of the data used for
prediction (non-target columns)
§ Example: a single data point within the data
(one row)
§ Label: the target value for a single data point
Machine Learning Vocabulary
20
The Iris Flower Data Set
Label = the species value recorded for a single row
sepal length  sepal width  petal length  petal width  species
6.7           3.0          5.2           2.3          virginica
6.4           2.8          5.6           2.1          virginica
4.6           3.4          1.4           0.3          setosa
6.9           3.1          4.9           1.5          versicolor
4.4           2.9          1.4           0.2          setosa
4.8           3.0          1.4           0.1          setosa
5.9           3.0          5.1           1.8          virginica
5.4           3.9          1.3           0.4          setosa
4.9           3.0          1.4           0.2          setosa
5.4           3.4          1.7           0.2          setosa
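The vocabulary above can be made concrete in code. A minimal sketch in Python, assuming scikit-learn and pandas are installed (the Iris data set ships with scikit-learn):

```python
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame                       # 150 examples (rows)

features = df.drop(columns="target")  # the four measurement columns
target = df["target"]                 # the column to predict (species)

example = df.iloc[0]                  # a single example: one row of the data
label = target.iloc[0]                # the label: that example's target value

print(features.columns.tolist())
print(iris.target_names[label])       # e.g. 'setosa'
```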
Machine Learning Task
21
Learning Types: Supervised, Unsupervised, Reinforcement
Machine Learning Task
22
§ Supervised: Regression (continuous target), Classification (categorical target)
§ Unsupervised: Clustering, Association Analysis
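A minimal sketch of these task types, assuming scikit-learn is available; the data sets and models are only illustrative:

```python
from sklearn.datasets import load_iris, load_diabetes
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.cluster import KMeans

X_iris, y_species = load_iris(return_X_y=True)        # categorical target
X_diab, y_progress = load_diabetes(return_X_y=True)   # continuous target

clf = DecisionTreeClassifier().fit(X_iris, y_species)  # classification
reg = DecisionTreeRegressor().fit(X_diab, y_progress)  # regression
clu = KMeans(n_clusters=3, n_init=10).fit(X_iris)      # clustering: no target used

print(clf.predict(X_iris[:1]), reg.predict(X_diab[:1]), clu.labels_[:5])
```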
The Machine Learning Tree
Source: https://vas3k.com/blog/machine_learning/
Classical ML vs Deep Learning
Classical Machine Learning
Source: https://becominghuman.ai/deep-learning-made-easy-with-deep-cognition-403fbe445351
Artificial Neural Network vs
Deep Learning Neural Network
Source: https://www.pnas.org/content/116/4/1074
Source: https://vas3k.com/blog/machine_learning/
Tree Based Learning
Source: https://vas3k.com/blog/machine_learning/
Function Based Learning
Source: https://vas3k.com/blog/machine_learning/
Probability Based Learning
Source: https://vas3k.com/blog/machine_learning/
Source: https://vas3k.com/blog/machine_learning/
Value Prediction
Source: https://vas3k.com/blog/machine_learning/
Source: https://vas3k.com/blog/machine_learning/
Distance-based Clustering
Source: https://towardsdatascience.com/the-5-clustering-algorithms-data-scientists-need-to-know-a36d136ef68
Mean-Shift Clustering
Mean-shift clustering for a single sliding window; the entire process of mean-shift clustering
Source: https://towardsdatascience.com/the-5-clustering-algorithms-data-scientists-need-to-know-a36d136ef68
Density-Based Spatial Clustering of
Applications with Noise (DBSCAN)
DBSCAN Smiley Face Clustering
Source: https://towardsdatascience.com/the-5-clustering-algorithms-data-scientists-need-to-know-a36d136ef68
Expectation–Maximization (EM)
Clustering using Gaussian Mixture
Models (GMM)
Source: https://towardsdatascience.com/the-5-clustering-algorithms-data-scientists-need-to-know-a36d136ef68
Agglomerative Hierarchical
Clustering
Source: https://towardsdatascience.com/the-5-clustering-algorithms-data-scientists-need-to-know-a36d136ef68
Source: https://vas3k.com/blog/machine_learning/
Component Analysis
Finding Principal Components
Source: http://www.ait.edu.gr/ait_web_site/faculty/apne/Face_Recognition.html
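A minimal principal component analysis sketch, assuming scikit-learn: project the four Iris features onto their first two principal components.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)           # 150 x 2 projection

print(X_2d.shape)
print(pca.explained_variance_ratio_)  # variance captured by each component
```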
Source: https://vas3k.com/blog/machine_learning/
Association Rule Mining
Source: https://www.stratlytics.com/blog.php?id=9
Association Rule Mining
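A minimal association rule mining sketch. It assumes the third-party mlxtend package is installed (pip install mlxtend); the transactions are made up for illustration:

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [["bread", "milk"],
                ["bread", "diapers", "beer"],
                ["milk", "diapers", "beer"],
                ["bread", "milk", "diapers", "beer"]]

# One-hot encode the transactions, then mine frequent itemsets and rules.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit_transform(transactions), columns=te.columns_)

frequent = apriori(onehot, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```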
Machine Learning Task
45
Machine Learning Task
46
§ Semi-supervised learning uses a small labelled
training set to build an initial model, which is then
refined using the unlabelled data.
§ Semi-supervised learning is used when constructing a
labelled training set is a painstaking process.
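A minimal semi-supervised sketch, assuming scikit-learn: most labels are hidden (marked -1), an initial model is built from the few labelled examples, then refined on the unlabelled ones via self-training.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.RandomState(0)
y_partial = y.copy()
y_partial[rng.rand(len(y)) < 0.9] = -1        # hide roughly 90% of the labels

model = SelfTrainingClassifier(DecisionTreeClassifier(max_depth=3))
model.fit(X, y_partial)                       # refines itself on unlabelled rows
print((model.predict(X) == y).mean())         # accuracy against the true labels
```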
MODEL
47
The Output of Machine Learning
• Geometric models use intuitions from geometry such as separating
(hyper)planes, linear transformations, and distance metrics.
• SVM, Nearest Neighbour, PCA
• Probabilistic models view learning as a process of reducing uncertainty,
modelled by means of probability distributions.
• Bayes model, likelihood ratio, Naive Bayes
• Logical models are defined in terms of easily interpretable logical rules.
• Decision trees
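A minimal sketch, assuming scikit-learn, fitting one model from each of these families on the Iris data:

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC                      # geometric: separating hyperplane
from sklearn.naive_bayes import GaussianNB       # probabilistic: class distributions
from sklearn.tree import DecisionTreeClassifier  # logical: interpretable rules

X, y = load_iris(return_X_y=True)
for model in (SVC(), GaussianNB(), DecisionTreeClassifier()):
    print(type(model).__name__, model.fit(X, y).score(X, y))
```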
MODEL
48
The Output of Machine Learning
FEATURES
49
The Workhorses of Machine Learning
§ Features and models are intimately connected.
Features define the model; even a single feature can be turned into
a univariate model.
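A minimal sketch, assuming scikit-learn: a univariate model built from a single Iris feature (petal length).

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
petal_length = X[:, 2:3]              # keep just one feature column

uni = DecisionTreeClassifier(max_depth=2).fit(petal_length, y)
print(uni.score(petal_length, y))     # training accuracy using one feature only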
FEATURES
50
The Workhorses of Machine Learning
§ Features may interact in various ways. These
interactions can be exploited, ignored, or pose a
challenge.
§ Examples:
§ Covariance
§ Correlation coefficient
§ Empirical estimate of the sample mean
§ Expectation operator, e.g. population variance
Data exploration on different features is
important to build a good model!
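A minimal data exploration sketch, assuming scikit-learn and pandas, computing the interaction measures above on the Iris features:

```python
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
features = iris.frame.drop(columns="target")

print(features.mean())   # empirical estimate of the sample mean
print(features.var())    # sample variance of each feature
print(features.cov())    # covariance between pairs of features
print(features.corr())   # correlation coefficients (scale-free)
```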
FEATURES
51
The Workhorses of Machine Learning
§ Feature construction and transformation are important
to create a good model.
§ The kernel trick is used to
modify the way the
decision boundary is
calculated.
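A minimal kernel trick sketch, assuming scikit-learn: the same SVM with a linear and an RBF kernel on data that is not linearly separable.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)   # linear decision boundary
rbf = SVC(kernel="rbf").fit(X, y)         # kernel trick: non-linear boundary

print("linear kernel accuracy:", linear.score(X, y))  # struggles on circles
print("RBF kernel accuracy:   ", rbf.score(X, y))     # close to perfect
```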
Using Data to Make Decisions
52 The contents of this slide have been modified from its original version.
§ We make decisions all the time...
§ Machine learning models can be used to assist
decision making.
§ Examples: spam filtering, web search, postal mail routing,
fraud detection, movie recommendations, vehicle driver assistance,
web advertisements, social networks, speech recognition.
Using Data to Make Decisions
53
1. Converting business problems into analytics
solutions
i. What is the business problem? What are the goals
that the business wants to achieve?
ii. How does the business currently work?
iii. In what way could a predictive analytics model help
to address the business problem?
Case Study : Motor Insurance Fraud
Problem to Solutions
54
Case Study : Motor Insurance Fraud
In spite of having a fraud investigation team that investigates up
to 30% of all claims made, a motor insurance company is still
losing too much money due to fraudulent claims. The following
predictive analytics solutions could be proposed to help address
this business problem:
§ Claim prediction: predict the likelihood that a claim is fraudulent.
§ Member prediction: predict the propensity of a member to
commit fraud in the near future.
§ Application prediction: predict the likelihood that a policy
application will ultimately result in a fraudulent claim.
§ Payment prediction: predict the amount to be paid out after
investigation.
Using Data to Make Decisions
55
2. Assessing Feasibility
i. The key objects in the company’s data model and the
data available regarding them.
ii. The connections that exist between key objects in the
data model.
iii. The granularity of the data that the business has
available.
iv. The volume of data involved.
v. The time horizon for which data is available.
Case Study : Motor Insurance Fraud
Problem to Solutions
56
Case Study : Motor Insurance Fraud
[Claim prediction]
§ Data Requirements:
A large collection of historical claims marked as fraudulent or non-
fraudulent; the details of each claim; the related policy; and the related
claimant.
§ Capacity Requirements:
Given that the insurance company already has a claims investigation
team, the main requirements would be that a mechanism could be put in
place to inform claims investigators that some claims were prioritized
above others. This would also require that information about claims become
available in a suitably timely manner so that the claims investigation
process would not be delayed by the model.
Problem to Solutions
57
Case Study : Motor Insurance Fraud
[Member prediction] to predict the propensity of a member to
commit fraud in the near future
§ Data Requirements:
____________________________________________________________
____________________________________________________________
____________________________________________________________
§ Capacity Requirements:
_________________________________________________________
_________________________________________________________
_________________________________________________________
Problem to Solutions
58
Case Study : Motor Insurance Fraud
[Member prediction]
§ Data Requirements:
A large collection of claims labelled as either fraudulent or non-fraudulent; all
relevant details; all claims and policies can be connected to an identifiable
member; historical data on recorded changes to a policy.
§ Capacity Requirements:
Assume the prediction is run every quarter to analyse the behaviour of each
customer. The company needs the capacity to advise members without
damaging the customer relationship so badly as to lose the customer. Finally,
there are possibly legal restrictions associated with making this kind of
contact.
Using Data to Make Decisions
59
3. Designing the Analytics Base Table
i. Elicit the domain concepts, subdomain concepts, and
the features involved.
ii. Prediction subject details
iii. Demographics
iv. Usage (frequency, recency, monetary value)
v. Changes in usage, and special usage
vi. Lifecycle phase
vii. Network links
Case Study : Motor Insurance Fraud
Problem to Solutions
60
Case Study : Motor Insurance Fraud
[Claim prediction]
§ Concepts elicitation:
Problem to Solutions
61
Case Study : Motor Insurance Fraud
[Claim prediction]
§ Analytic Base Table:
Using Data to Make Decisions
62
4. Designing & Implementing Features
i. Data Availability : Data Type
Using Data to Make Decisions
63
4. Designing & Implementing Features
i. Data Availability : Features
§ Raw features
§ Derived features
§ Aggregates
§ Flags
§ Ratios
§ Mappings
ii. Data Availability Timing : propensity
§ Observation period
§ Outcome period
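A minimal pandas sketch of these feature types; the claims table and all column names are hypothetical, chosen only to illustrate raw, derived (ratio, flag, mapping), and aggregate features:

```python
import pandas as pd

# Hypothetical per-claim table; every column name here is made up.
claims = pd.DataFrame({
    "member_id":      [1, 1, 2, 3],
    "claim_amount":   [1200.0, 300.0, 80.0, 9500.0],
    "policy_premium": [600.0, 600.0, 450.0, 700.0],
    "claim_type":     ["theft", "glass", "glass", "injury"],
})

raw = claims["claim_amount"]                                   # raw feature
claims["claim_to_premium"] = (claims["claim_amount"]
                              / claims["policy_premium"])      # ratio
claims["large_claim_flag"] = (claims["claim_amount"] > 1000).astype(int)  # flag
claims["claim_severity"] = claims["claim_type"].map(
    {"glass": "low", "theft": "medium", "injury": "high"})     # mapping

# Aggregates over the observation period, one row per prediction subject.
per_member = claims.groupby("member_id")["claim_amount"].agg(
    num_claims="size", total_claimed="sum")
print(per_member)
```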
Using Data to Make Decisions
68
4. Designing & Implementing Features
iii. Feature Longevity
iv. Legal Issues
§ Collection limitation principle
§ Purpose specification principle
§ Use limitation principle
v. Data Manipulation
§ Joining data sources
§ Filtering rows and fields in a data source
§ Combining or transforming features
§ Aggregating data sources
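A minimal pandas sketch of the four data manipulation steps, using hypothetical policy and claim tables:

```python
import pandas as pd

policies = pd.DataFrame({"policy_id": [1, 2, 3],
                         "premium":   [600.0, 450.0, 700.0]})
claims = pd.DataFrame({"policy_id": [1, 1, 2, 3],
                       "amount":    [1200.0, 300.0, 80.0, 9500.0]})

# 1. Joining data sources on a shared key.
joined = claims.merge(policies, on="policy_id", how="left")
# 2. Filtering rows and fields.
large = joined.loc[joined["amount"] > 100, ["policy_id", "amount", "premium"]].copy()
# 3. Combining or transforming features.
large["amount_to_premium"] = large["amount"] / large["premium"]
# 4. Aggregating to one row per prediction subject.
abt = large.groupby("policy_id").agg(n_claims=("amount", "size"),
                                     total_amount=("amount", "sum"))
print(abt)
```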
Problem to Solutions
69
Case Study : Motor Insurance Fraud
[Claim prediction]
What are the observation period and outcome period for
the motor insurance claim prediction scenario?
Problem to Solutions
70
Case Study : Motor Insurance Fraud
[Claim prediction]
What are the observation period and outcome period for
the motor insurance claim prediction scenario?
§ The observation period and outcome period are measured
over different dates for each insurance claim, defined relative
to the specific date of that claim.
Problem to Solutions
71
Case Study : Motor Insurance Fraud
[Claim prediction]
What are the observation period and outcome period for
the motor insurance claim prediction scenario?
§ The observation period and outcome period are measured
over different dates for each insurance claim, defined relative
to the specific date of that claim.
§ The observation period is the time prior to the claim event,
over which the descriptive features capturing the claimant’s
behavior are calculated.
Problem to Solutions
72
Case Study : Motor Insurance Fraud
[Claim prediction]
What are the observation period and outcome period for
the motor insurance claim prediction scenario?
§ The observation period and outcome period are measured
over different dates for each insurance claim, defined relative
to the specific date of that claim.
§ The observation period is the time prior to the claim event,
over which the descriptive features capturing the claimant’s
behavior are calculated.
§ The outcome period is the time immediately after the claim
event, during which it will emerge whether the claim is
fraudulent or genuine.
Problem to Solutions
73
Case Study : Motor Insurance Fraud [Claim prediction]
Problem to Solutions
74
Case Study : Motor Insurance Fraud [Claim prediction]
What are the features for
the Claim Type subdomain?
Problem to Solutions
75
Case Study : Motor Insurance Fraud [Claim prediction]
The Analytics Base Table
76
§ The table contains more descriptive features.
§ The table shows the first four instances.
§ If we examine the table closely, we see a number of strange
values (e.g., -9,999) and a number of missing values.
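A minimal pandas sketch of spotting such issues in an analytics base table; the columns and the -9,999 sentinel are illustrative:

```python
import numpy as np
import pandas as pd

abt = pd.DataFrame({"income":     [52000, -9999, 48000, None],
                    "num_claims": [1, 0, -9999, 2]})

abt = abt.replace(-9999, np.nan)   # treat the sentinel code as missing
print(abt.isna().sum())            # count missing values per column
print(abt.describe())              # sanity-check the remaining value ranges
```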
Machine Learning Workflow
77
(Workflow diagram: a machine learning algorithm takes data and produces a model.)
Machine Learning Experiment
78
A machine learning experiment evaluates a
particular model on one or more data sets
and uses the resulting measurements to
answer the questions posed by the
formulated problem.
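A minimal experiment sketch, assuming scikit-learn: hold out a test set and also cross-validate the same model, then use the scores to answer the formulated question.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

model = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
print("hold-out accuracy:", model.score(X_test, y_test))

scores = cross_val_score(DecisionTreeClassifier(max_depth=3), X, y, cv=5)
print("5-fold CV accuracy:", scores.mean())
```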
Summary
79
§ Machine learning is using the right features to build the
right models that achieve the right tasks.
§ Tasks are addressed by models, whereas learning
problems are solved by learning algorithms that
produce models.
§ Predictive data analytics models built using machine
learning techniques are tools that we can use to help
make better decisions.
§ It is important to fully understand the business problem
that a model is being constructed to address, and the
goal behind it.
Primary References:
Kelleher, John D., Brian Mac Namee, and Aoife D'Arcy. Fundamentals of Machine Learning for Predictive
Data Analytics: Algorithms, Worked Examples, and Case Studies. MIT Press, 2015.
Flach, Peter. Machine Learning: The Art and Science of Algorithms that Make Sense of Data.
Cambridge University Press, 2012.