Machine Learning Overview
Course: Artificial Intelligence Fundamentals
Instructor: Marco Bonzanini
Machine Learning vs Programming
• Classical programming: Rules + Data → Answers
• Machine learning: Data + Answers → Rules
Ref: Deep Learning with Python, F. Chollet, 2017.
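To make the inversion concrete, a tiny sketch (the spam-detection rule below is an assumption for illustration, not an example from the course):

```python
# Classical programming: a human writes the rules;
# rules + data then produce answers
def is_spam(email_text):
    # hand-crafted rule (assumed for illustration)
    return "win a prize" in email_text.lower()

print(is_spam("Click here to win a prize!"))  # True

# Machine learning inverts this: given data (emails) and answers
# (spam / not-spam labels), the training step produces the rules,
# i.e. the model -- see the classification sketch later in the deck.
```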
Examples of ML Applications
• Filtering Emails (Spam Detection)
• Automatic Trading
• Fraud Detection
• Self-driving cars
• Playing chess / poker / Go
• Recommending products / items / services
Machine Learning Tasks
• Supervised learning, discrete data → Classification (predict a label)
• Unsupervised learning, discrete data → Clustering (group similar items)
• Supervised learning, continuous data → Regression (predict a quantity)
• Unsupervised learning, continuous data → Dimensionality Reduction (reduce the number of variables)
Machine Learning Process
• Exercise:
  - Search “machine learning stages” (or steps, or process) on Google
  - Find dozens of “The X stages of Machine Learning” articles
• No standard process?!
Recap: CRISP-DM
(Business Understanding → Data Understanding → Data Preparation → Modelling → Evaluation → Deployment)
Machine Learning Process
• What’s the problem you’re trying to solve? (identify the ML task)
• What ML algorithms are available for that task?
• What does the data set look like? (enough data? need labelled data? need pre-processing?)
ML Modelling
• Step 1: Learning (a.k.a. Training)
  - Batch process (could take hours or days)
  - “Learn” from the data
  - Output: your “model”
• Step 2: Prediction (a.k.a. Testing)
  - Given a trained model, make a prediction on new, unseen data
  - Output: depends on the task
Example: classification task
Ref: Mastering Social Media Mining with Python, M. Bonzanini, 2016.
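A minimal sketch of the two steps for a classification task (not the book’s exact example; scikit-learn, the iris data set, and logistic regression are assumptions chosen for illustration):

```python
# Step 1 (Learning) and Step 2 (Prediction) with scikit-learn
# (data set and model choice are assumptions for illustration)
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Step 1: Learning (training) -- the output is the fitted "model"
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Step 2: Prediction (testing) -- apply the model to new, unseen data
predictions = model.predict(X_test)
print(predictions[:5])  # predicted labels for the first 5 test items
```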
ML Terminology
• Item or Sample: the “objects” we’re dealing with
• Item representation: how an item is encoded (e.g. as a vector)
• Features: the attributes of an item (e.g. the elements of a vector)
Item Representation
• We can use any type of attribute
• Numerical features
• Categorical features → one-hot encoding
• Text → bag-of-words
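For the text case, a minimal bag-of-words sketch (the example sentences are assumptions; assumes scikit-learn ≥ 1.0 for get_feature_names_out):

```python
# Bag-of-words sketch: represent each document by its word counts
# (example sentences are assumptions for illustration)
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat", "the dog sat"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())  # vocabulary learned from docs
print(X.toarray())  # one row of word counts per document
```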
One-hot Encoding
Rome = [1, 0, 0, 0, 0, 0, …, 0]
Paris = [0, 1, 0, 0, 0, 0, …, 0]
Italy = [0, 0, 1, 0, 0, 0, …, 0]
France = [0, 0, 0, 1, 0, 0, …, 0]
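A minimal sketch of the same encoding in plain Python (the four-word vocabulary mirrors the slide; a real vocabulary would be much larger):

```python
# One-hot encoding sketch: each word maps to a vector with a
# single 1 at its own index (toy vocabulary from the slide)
vocabulary = ["Rome", "Paris", "Italy", "France"]

def one_hot(word, vocabulary):
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(word)] = 1
    return vector

print(one_hot("Rome", vocabulary))    # [1, 0, 0, 0]
print(one_hot("France", vocabulary))  # [0, 0, 0, 1]
```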
Feature Engineering
• Using domain knowledge of the data to create features that make ML algorithms work
• Fundamental, difficult, expensive, time-consuming
• Quality and quantity of features can have a big impact on the final result
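As a concrete illustration, a small sketch of hand-crafting features from a raw record (the fraud-detection fields and the derived features are assumptions for illustration):

```python
# Feature engineering sketch: derive features from a raw record
# using domain knowledge (field names are assumptions)
from datetime import datetime

transaction = {"amount": 250.0, "timestamp": "2017-03-04T02:30:00"}
ts = datetime.fromisoformat(transaction["timestamp"])

features = {
    "amount": transaction["amount"],
    "hour": ts.hour,                  # e.g. fraud may cluster at night
    "is_weekend": ts.weekday() >= 5,  # weekday() is 5/6 on Sat/Sun
}
print(features)
```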
Feature Selection
• Dimensionality! How many words in the English vocabulary? How many unique tokens on the Web?
• Using millions of features is not feasible for some classifiers
• Fewer features also means faster training
• Can improve generalisation, e.g. by eliminating noise and avoiding overfitting
Feature Selection
• Define a utility function A(f, c): for a given class c, compute A(f, c) for every feature f, and only use the k features with the highest utility
• Example: frequency-based selection (document frequency)
  - Discard words that appear in very many documents
  - Discard words that appear in very few documents
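A minimal sketch of the top-k idea with scikit-learn’s SelectKBest (using chi-squared as the utility function A(f, c) is an assumption; any scoring function works, and the data set and k are chosen for illustration):

```python
# Feature selection sketch: score every feature with a utility
# function and keep the k highest-scoring ones
# (chi2 as the utility and k=2 are assumptions for illustration)
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)
selector = SelectKBest(score_func=chi2, k=2)
X_selected = selector.fit_transform(X, y)

print(X.shape, "->", X_selected.shape)  # (150, 4) -> (150, 2)
```

For the frequency-based example, CountVectorizer’s min_df and max_df parameters implement exactly the discard-too-rare / discard-too-common idea.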
Feature Scaling
• a.k.a. data normalisation
• Different features may have very different ranges of values
• Many algorithms rely on a notion of “distance”, so features with a broad range will dominate
• After scaling, features contribute to the distance on a comparable scale
Feature Scaling (2)
• Many options for scaling
• “Standardisation”: zero mean and unit variance, i.e. x' = (x - μ) / σ for each feature
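A minimal standardisation sketch with scikit-learn (the toy data is an assumption for illustration):

```python
# Standardisation sketch: rescale each feature (column) to zero
# mean and unit variance (toy data is an assumption)
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.mean(axis=0))  # ~[0. 0.]
print(X_scaled.std(axis=0))   # ~[1. 1.]
```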
Overfitting and Underfitting
• Symptom: your ML model doesn’t perform well outside of your test environment
• Possible cause: generalisation is hard!
• More precisely:
  - Overfitting
  - Underfitting
Overfitting
• Your model learns the details of the training data set “too well”
• Good performance on the given data set, but not on new data sets
• Noise and random fluctuations in your training data are treated as important information
• Possible solution: cross-validation
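A minimal cross-validation sketch (the data set, model, and k = 5 folds are assumptions for illustration):

```python
# Cross-validation sketch: estimate generalisation by averaging
# scores over k folds instead of trusting a single train/test split
# (data set, model, and cv=5 are assumptions for illustration)
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean(), scores.std())  # mean accuracy and its spread
```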
Underfitting
• Less discussed (it’s apparent from the start)
• Your model performs badly on the given data set, and doesn’t generalise to new data
• Possible solution: move on (change the feature engineering, the feature selection, or the ML algorithm altogether)
Questions?