Machine Learning Overview
Course: Artificial Intelligence Fundamentals
Instructor: Marco Bonzanini
Machine Learning vs Programming
• Classical programming: Rules + Data → Answers
• Machine learning: Data + Answers → Rules
Ref: Deep Learning with Python, F. Chollet, 2017.
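To make the inversion concrete, a tiny sketch (the spam-detection rule below is an assumption for illustration, not an example from the course):

```python
# Classical programming: a human writes the rules;
# rules + data then produce answers
def is_spam(email_text):
    # hand-crafted rule (assumed for illustration)
    return "win a prize" in email_text.lower()

print(is_spam("Click here to win a prize!"))  # True

# Machine learning inverts this: given data (emails) and answers
# (spam / not-spam labels), the training step produces the rules,
# i.e. the model -- see the classification sketch later in the deck.
```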
Examples of ML Applications
• Filtering Emails (Spam Detection)
• Automatic Trading
• Fraud Detection
• Self-driving cars
• Playing chess / poker / Go
• Recommending products / items / services
Machine Learning Tasks
• Supervised learning, discrete data → Classification (predict a label)
• Unsupervised learning, discrete data → Clustering (group similar items)
• Supervised learning, continuous data → Regression (predict a quantity)
• Unsupervised learning, continuous data → Dimensionality Reduction (reduce the number of variables)
Machine Learning Process
• Exercise:
  - Search “machine learning stages” (or steps, or process) on Google
  - Find dozens of “The X stages of Machine Learning” articles
• No standard process?!
Recap: CRISP-DM
(Business Understanding → Data Understanding → Data Preparation → Modelling → Evaluation → Deployment)
Machine Learning Process
• What’s the problem you’re trying to solve? (identify the ML task)
• What ML algorithms are available for that task?
• What does the data set look like? (enough data? need labelled data? need pre-processing?)
ML Modelling
• Step 1: Learning (a.k.a. Training)
  - Batch process (could take hours or days)
  - “Learn” from the data
  - Output: your “model”
• Step 2: Prediction (a.k.a. Testing)
  - Given a trained model, make a prediction on new, unseen data
  - Output: depends on the task
Example: classification task
Ref: Mastering Social Media Mining with Python, M. Bonzanini, 2016.
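A minimal sketch of the two steps for a classification task (not the book’s exact example; scikit-learn, the iris data set, and logistic regression are assumptions chosen for illustration):

```python
# Step 1 (Learning) and Step 2 (Prediction) with scikit-learn
# (data set and model choice are assumptions for illustration)
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Step 1: Learning (training) -- the output is the fitted "model"
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Step 2: Prediction (testing) -- apply the model to new, unseen data
predictions = model.predict(X_test)
print(predictions[:5])  # predicted labels for the first 5 test items
```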
ML Terminology
• Item or Sample: the “objects” we’re dealing with
• Item representation: how an item is encoded (e.g. as a vector)
• Features: the attributes of an item (e.g. the elements of a vector)
Item Representation
• We can use any type of attribute
• Numerical features
• Categorical features → one-hot encoding
• Text → bag-of-words
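For the text case, a minimal bag-of-words sketch (the example sentences are assumptions; assumes scikit-learn ≥ 1.0 for get_feature_names_out):

```python
# Bag-of-words sketch: represent each document by its word counts
# (example sentences are assumptions for illustration)
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat", "the dog sat"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())  # vocabulary learned from docs
print(X.toarray())  # one row of word counts per document
```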
One-hot Encoding
Rome = [1, 0, 0, 0, 0, 0, …, 0]
Paris = [0, 1, 0, 0, 0, 0, …, 0]
Italy = [0, 0, 1, 0, 0, 0, …, 0]
France = [0, 0, 0, 1, 0, 0, …, 0]
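A minimal sketch of the same encoding in plain Python (the four-word vocabulary mirrors the slide; a real vocabulary would be much larger):

```python
# One-hot encoding sketch: each word maps to a vector with a
# single 1 at its own index (toy vocabulary from the slide)
vocabulary = ["Rome", "Paris", "Italy", "France"]

def one_hot(word, vocabulary):
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(word)] = 1
    return vector

print(one_hot("Rome", vocabulary))    # [1, 0, 0, 0]
print(one_hot("France", vocabulary))  # [0, 0, 0, 1]
```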
Feature Engineering
• Using domain knowledge of the data to create features that make ML algorithms work
• Fundamental, difficult, expensive, time-consuming
• Quality and quantity of features can have a big impact on the final result
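As a concrete illustration, a small sketch of hand-crafting features from a raw record (the fraud-detection fields and the derived features are assumptions for illustration):

```python
# Feature engineering sketch: derive features from a raw record
# using domain knowledge (field names are assumptions)
from datetime import datetime

transaction = {"amount": 250.0, "timestamp": "2017-03-04T02:30:00"}
ts = datetime.fromisoformat(transaction["timestamp"])

features = {
    "amount": transaction["amount"],
    "hour": ts.hour,                  # e.g. fraud may cluster at night
    "is_weekend": ts.weekday() >= 5,  # weekday() is 5/6 on Sat/Sun
}
print(features)
```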
Feature Selection
• Dimensionality! How many words in the English vocabulary? How many unique tokens on the Web?
• Using millions of features is not feasible for some classifiers
• Fewer features also means faster training
• Can improve generalisation, e.g. by eliminating noise and avoiding overfitting
Feature Selection
• Define a utility function A(f, c): for a given class c, compute A(f, c) for every feature f, and only use the k features with the highest utility
• Example: frequency-based selection (document frequency)
  - Discard words that appear in very many documents
  - Discard words that appear in very few documents
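A minimal sketch of the top-k idea with scikit-learn’s SelectKBest (using chi-squared as the utility function A(f, c) is an assumption; any scoring function works, and the data set and k are chosen for illustration):

```python
# Feature selection sketch: score every feature with a utility
# function and keep the k highest-scoring ones
# (chi2 as the utility and k=2 are assumptions for illustration)
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)
selector = SelectKBest(score_func=chi2, k=2)
X_selected = selector.fit_transform(X, y)

print(X.shape, "->", X_selected.shape)  # (150, 4) -> (150, 2)
```

For the frequency-based example, CountVectorizer’s min_df and max_df parameters implement exactly the discard-too-rare / discard-too-common idea.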
Feature Scaling
• a.k.a. data normalisation
• Different features may have very different ranges of values
• Many algorithms rely on a notion of “distance”, so features with a broad range will dominate
• After scaling, features contribute to the distance on a comparable scale
Feature Scaling (2)
• Many options for scaling
• “Standardisation”: zero mean and unit variance, i.e. x' = (x - μ) / σ for each feature
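A minimal standardisation sketch with scikit-learn (the toy data is an assumption for illustration):

```python
# Standardisation sketch: rescale each feature (column) to zero
# mean and unit variance (toy data is an assumption)
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.mean(axis=0))  # ~[0. 0.]
print(X_scaled.std(axis=0))   # ~[1. 1.]
```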
Overfitting and Underfitting
• Symptom: your ML model doesn’t perform well outside of your test environment
• Possible cause: generalisation is hard!
• More precisely:
  - Overfitting
  - Underfitting
Overfitting
• Your model learns the details of the training data set “too well”
• Good performance on the given data set, but not on new data sets
• Noise and random fluctuations in your training data are treated as important information
• Possible solution: cross-validation
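A minimal cross-validation sketch (the data set, model, and k = 5 folds are assumptions for illustration):

```python
# Cross-validation sketch: estimate generalisation by averaging
# scores over k folds instead of trusting a single train/test split
# (data set, model, and cv=5 are assumptions for illustration)
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean(), scores.std())  # mean accuracy and its spread
```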
Underfitting
• Less discussed (it’s apparent from the start)
• Your model performs badly on the given data set, and doesn’t generalise to new data
• Possible solution: move on (change the feature engineering, the feature selection, or the ML algorithm altogether)
Questions?