Introduction to Machine Learning
What to expect from the course?
What is Machine Learning?
Data Visualization/Analysis, Pandas, NumPy, …
Dimensionality Reduction
Different Machine Learning Algorithms (Supervised, Unsupervised, metrics)
Deep Learning (Neural Networks, backpropagation, loss functions)
CNN, RNN, LSTM
… and more
(Hands-on sessions will be conducted in parallel.)
Introduction
Artificial Intelligence: any technique which enables computers to mimic human behaviour.
Machine Learning, and within it Deep Learning, are successively narrower subfields of Artificial Intelligence.
Learning is much deeper than memorization and information recall.
Learning is "a process that leads to change, which occurs as a result of experience and increases the potential for improved performance and future learning" (Ambrose et al., 2010, p. 3).
Machine Learning
Machine learning is a "field of study that gives computers the ability to learn without being explicitly programmed." – Arthur Samuel
[Diagram: Input/Data → Intelligent System → Decisions, Outputs, Actions]
The function of a machine learning system can be:
descriptive, meaning that the system uses the data to explain what happened
predictive, meaning the system uses the data to predict what will happen
prescriptive, meaning the system will use the data to make suggestions about what action to take
Data Driven Problem Solving

Dataset 1                     Dataset 2
Area (sq.ft)   Price          Area (sq.ft)   Price
250            250000         250            145500
120            120000         120            212800
310            310000         310            194390
290            290000         290            ?

Dataset 1: a simple, well-known solution (Price = Area * 1000). The relation is obtained in a trivial way, even from a single example.
Dataset 2: not a trivial solution. There should be more parameters (e.g., Age, Location), and a lot more data is needed to solve it.
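For the trivial case, a model can be fit to the data rather than guessed. A minimal sketch, assuming scikit-learn (the library choice is an assumption, not part of the slide):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Dataset 1: the trivial relation Price = Area * 1000 can be recovered exactly.
area = np.array([[250], [120], [310], [290]])
price = np.array([250000, 120000, 310000, 290000])

model = LinearRegression().fit(area, price)
print(model.coef_, model.intercept_)   # ~[1000.]  ~0.0
print(model.predict([[274]]))          # ~274000

# Dataset 2 would need more features (e.g., Age, Location) and far more rows
# before any model could capture the underlying pattern.
```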
Remarks
General Strategy: Given many examples of (X, Y), learn an automated solution to predict Y.
Given a new X, Y = F(X).
Main Challenge: The data is becoming complex.
What if X is not a simple number?
An N-dimensional vector, e.g., [3.1, -2.6, 0.41, 1.89, 15.2, …, 9.23]?
Entities other than numbers, e.g., 3.9 m, ₹ 8.2 L, Blue, Sedan?
A picture?
A sound bite?
How do we get the machine to do this?
General Strategy: Given many examples of (X, Y), learn an automated solution to predict Y.
Given a new X, Y = F(X).
There is too much information in raw data, e.g., a long vector such as [3.1, -2.6, 0.41, 1.89, 15.2, …, 9.23] or mixed attributes such as 3.9 m, ₹ 8.2 L, Blue, Sedan.
The relevant information is probably hidden.
This leads to Feature Extraction: extracting useful information (X) from raw data.
Representation: From Raw data to Features

Area  Bedrooms  Bathrooms  Age  Parking  Basement  Price
240   3         2          10   No       Yes       250000

Convert all raw data into a vector of real numbers: each example becomes a point X_i in a feature space.
Convert all predictions into an integer/real number Y_i.
How do we deal with categorical data?
Categorical Data
Ordinal Data – The categories have a meaningful order or ranking, but the intervals between the categories are not necessarily equal. e.g., Satisfaction Rating: Poor, Fair, Good, Excellent.
Nominal Data – The categories are names or labels with no inherent order or ranking. e.g., Colors: Red, Green, Blue; Types of Pets: Dog, Cat, Bird, Fish.
Use Integer Encoding for ordinal data, where the order of categories is meaningful, as sketched below. Categories like "low", "medium", and "high" can be represented as 1, 2, and 3, respectively; the numerical values reflect the order or ranking among the categories.
One-Hot Encoding is used for nominal data, where there is no natural order, or to prevent algorithms from mistakenly interpreting ordinal relationships between categories.
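A minimal sketch of integer encoding for an ordinal feature, assuming pandas is available (the column names are illustrative, not from the slide):

```python
import pandas as pd

# Hypothetical ordinal feature: satisfaction ratings with a natural order.
df = pd.DataFrame({"satisfaction": ["Poor", "Good", "Excellent", "Fair", "Good"]})

# Map each category to an integer that preserves the ranking.
order = {"Poor": 1, "Fair": 2, "Good": 3, "Excellent": 4}
df["satisfaction_encoded"] = df["satisfaction"].map(order)
print(df)
```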
One-Hot Encoding
The most widespread approach for categorical data, unless the categorical variable takes on a large number of values.

Pets   →   Cat  Dog  Fish
Cat        1    0    0
Cat        1    0    0
Dog        0    1    0
Fish       0    0    1
Dog        0    1    0
Cat        1    0    0
Fish       0    0    1

One-hot encoding can lead to a significant increase in the number of features, especially if the categorical feature has many unique values.
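A minimal sketch of one-hot encoding the Pets column, assuming pandas (pd.get_dummies is one common option; scikit-learn's OneHotEncoder would also work):

```python
import pandas as pd

pets = pd.DataFrame({"Pets": ["Cat", "Cat", "Dog", "Fish", "Dog", "Cat", "Fish"]})

# Each unique category becomes its own 0/1 indicator column.
encoded = pd.get_dummies(pets, columns=["Pets"], dtype=int)
print(encoded)   # columns: Pets_Cat, Pets_Dog, Pets_Fish
```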
Representation: From Raw data to Features

Area  Bedrooms  Bathrooms  Age  Parking  Basement  Price
240   3         2          10   No       Yes       250000

We are given a set of n examples: {(X_i, Y_i)}, i = 1, …, n.
Our goal is to learn a model Y = F(X) that captures the pattern of the training samples.
We can assume a model and learn its parameters.
Once we learn the model, we can predict the output corresponding to any new input X': Y' = F(X').
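A minimal sketch of this assume-a-model-and-learn-its-parameters idea for the housing table, assuming a linear model and scikit-learn; the extra training rows, their prices, and the 0/1 encoding of Parking/Basement are illustrative assumptions (only the first row comes from the slide):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Features: Area, Bedrooms, Bathrooms, Age, Parking (0/1), Basement (0/1).
X = np.array([
    [240, 3, 2, 10, 0, 1],   # row from the slide
    [180, 2, 1,  5, 1, 0],   # illustrative rows
    [320, 4, 3, 15, 1, 1],
    [150, 2, 1,  2, 0, 0],
])
Y = np.array([250000, 190000, 340000, 160000])   # prices (illustrative except the first)

F = LinearRegression().fit(X, Y)   # assume a linear model, learn its parameters

X_new = np.array([[260, 3, 2, 8, 1, 1]])   # a new input X'
print(F.predict(X_new))                     # predicted Y' = F(X')
```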
Usual Programming vs Machine Learning

Programming: Data + Program → Computer → Output

Machine Learning:
Training phase: Data X and Output Y → Computer → Program F(X, Y)
Testing phase: New Data X' → Computer (running the learned Program F) → Output Y'
ML Based on Training-Testing Data

Labelled data is split into Training Data and Test Data (the remaining samples).
Take care to not leak information from the Test Data into the Model.

Goal: to predict f().
Training Data → feature extraction, with the representation of the feature space → learn about f() from the training data → building a model → model design and validation → Trained Model.
Test Data → feature extraction, with the representation of the feature space → Trained Model → compute the prediction for the test data → model evaluation and deployment.
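A minimal sketch of a leakage-safe split, assuming scikit-learn; the random dataset and the scaling step are illustrative, the point is that any preprocessing is fit on the training data only:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.rand(100, 4)          # illustrative labelled data: 100 samples, 4 features
y = np.random.randint(0, 2, 100)    # illustrative labels

# Hold out part of the labelled data as the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Fit preprocessing on the training data only, then apply it to the test data.
# Fitting the scaler on all of X would leak test-set statistics into the model.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)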
Data Representation

Area  Age  Property
230   15   A
120   6    B
202   2    B
398   11   A
274   8    ?

[Figure: the examples plotted as points in a 2D feature space, with Area and Age as the axes]
Feature Space Representation

Goal: to predict f().
Training Data (Area, Age as points in 2D space, each labelled with Property Type A/B) → feature extraction, with the representation of the feature space → learn about f() from the training data → building a model (finding the best-fit line, i.e., the equation of the line) → model design and validation → Trained Model.
Test Data (Area, Age as points in 2D space, with unknown Property Type) → feature extraction, with the representation of the feature space → Trained Model → compute the prediction (Property Type A/B) for the test data → model evaluation and deployment.
Point vs. Line: the data are represented as points; the learned model is a line.
Learning is concerned with accurate prediction of future data, not accurate prediction of training or available data.
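A minimal sketch of learning such a separating line for the Area/Age example, assuming scikit-learn's LogisticRegression (the choice of a linear classifier is an assumption consistent with the best-fit-line picture on the slide):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Training data: (Area, Age) points with known property type.
X_train = np.array([[230, 15], [120, 6], [202, 2], [398, 11]])
y_train = np.array(["A", "B", "B", "A"])

# A linear classifier learns a line in the 2D feature space separating A from B.
clf = LogisticRegression().fit(X_train, y_train)

# Test point with unknown property type.
print(clf.predict([[274, 8]]))   # predicted property type for (Area=274, Age=8)
```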
Summary – Machine Learning Framework

y = f(x)
where y is the output, f is the prediction function, and x is the feature or representation.
Note: the training set and the testing set come from the same distribution.

Training: given a training set of labeled examples {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}, estimate the prediction function f by minimizing the prediction error.
Testing: apply f to a test example x' and output the predicted value y' = f(x').
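A minimal sketch of "estimate f by minimizing the prediction error", using plain NumPy least squares; the linear model, squared-error loss, and synthetic data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training set {(x_i, y_i)}: y is roughly 2x + 1 plus noise.
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=50)

# Choose f(x) = w*x + b and pick (w, b) minimizing the squared prediction error.
A = np.column_stack([x, np.ones_like(x)])
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)

# Testing: apply f to a new example x'.
x_new = 7.5
print(w * x_new + b)   # predicted y' = f(x')
```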
Summary – Machine Learning Framework

y = f(x)
where y is the output, f is the prediction function, and x is the feature or representation.

The input is converted to a vector x.
The output is a value indicated by y.
Depending on the nature of x and y, we define:
1) Regression
2) Classification
3) …
Representations
Representations in machine learning refer to the way data is transformed or encoded into a format that is suitable for a learning algorithm to process.

Sepal Length  Sepal Width  Petal Length  Petal Width  Species
5.1           3.5          1.4           0.2          A
5.4           3.7          1.1           0.1          A
5.2           2.7          3.9           1.0          B
6.6           2.9          3.5           1.2          B
5.8           2.8          5.1           2.4          C
7.7           3.7          6.7           2.2          C
…             …            …             …            …

Feature Space: each row of the table is a point in this space.
Representations
Images: Raw Pixel Representation, Deep Learning Based Features
Simple hand-crafted image features: the sum of all the pixels, the number of boundary pixels, edge detection.
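A minimal sketch of such simple image features using NumPy only; the random "image" is illustrative, and the edge measure here is a plain gradient-magnitude sum, a simplification of real edge detection:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((32, 32))            # illustrative grayscale image, values in [0, 1]

sum_of_pixels = img.sum()             # the sum of all the pixels

# Boundary pixels: the outer frame of the image.
boundary = np.concatenate([img[0, :], img[-1, :], img[1:-1, 0], img[1:-1, -1]])
num_boundary_pixels = boundary.size

# A crude edge measure: total gradient magnitude across the image.
gy, gx = np.gradient(img)
edge_strength = np.sqrt(gx**2 + gy**2).sum()

features = np.array([sum_of_pixels, num_boundary_pixels, edge_strength])
print(features)
```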
Representations
Sound: Waveform Representation, Spectrogram Representation, Mel-Frequency Cepstral Coefficients (MFCCs)

Reference: Towards Low-Complexity Wireless Technology Classification Across Multiple Environments, DOI: 10.1016/j.adhoc.2019.101881
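A minimal sketch of computing these audio representations, assuming the librosa library (not mentioned on the slide; the file path is a placeholder):

```python
import numpy as np
import librosa

# Load an audio file (placeholder path) as a waveform plus its sampling rate.
signal, sr = librosa.load("example.wav", sr=None)   # waveform representation

# Spectrogram: magnitude of the short-time Fourier transform.
spectrogram = np.abs(librosa.stft(signal))

# Mel-Frequency Cepstral Coefficients (MFCCs).
mfccs = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)
print(spectrogram.shape, mfccs.shape)
```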
Representations – Textual Data
Text Data: N-grams, Bag of Words, Term Frequency-Inverse Document Frequency, Word Embeddings

Sentence: "The weather is sunny today"

N-gram             Generated N-grams                                         Number of N-grams
Unigram (1-gram)   "The", "weather", "is", "sunny", "today"                  5
Bigram (2-gram)    "The weather", "weather is", "is sunny", "sunny today"    4
Trigram (3-gram)   "The weather is", "weather is sunny", "is sunny today"    3
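A minimal sketch of generating contiguous n-grams in plain Python (whitespace tokenization is an assumption; real pipelines usually use a proper tokenizer):

```python
def ngrams(sentence, n):
    """Return the list of contiguous n-grams of a whitespace-tokenized sentence."""
    words = sentence.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

sentence = "The weather is sunny today"
print(ngrams(sentence, 1))   # 5 unigrams
print(ngrams(sentence, 2))   # 4 bigrams
print(ngrams(sentence, 3))   # 3 trigrams
```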
Representations – Textual Data
Text Data: N-grams, Bag of Words, Term Frequency-Inverse Document Frequency, Word Embeddings

Sentence 1: "The weather is sunny today"
Sentence 2: "The weather was rainy yesterday"

Vocabulary:    1    2        3   4      5      6    7      8          Sentence Length
               The  weather  is  sunny  today  was  rainy  yesterday
Sentence 1:    1    1        1   1      1      0    0      0          5
Sentence 2:    1    1        0   0      0      1    1      1          5

Vector of Sentence 1: [1 1 1 1 1 0 0 0]
Vector of Sentence 2: [1 1 0 0 0 1 1 1]
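A minimal sketch of building bag-of-words vectors with scikit-learn's CountVectorizer; note that the library orders the vocabulary alphabetically and lowercases tokens, so the columns will not match the order shown in the table above:

```python
from sklearn.feature_extraction.text import CountVectorizer

sentences = [
    "The weather is sunny today",
    "The weather was rainy yesterday",
]

# Each sentence becomes a vector of word counts over the shared vocabulary.
vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(sentences)

print(vectorizer.get_feature_names_out())   # vocabulary (alphabetical order)
print(bow.toarray())                         # one row per sentence
```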
Why the sudden interest in AI?
Appearance of large, high-quality labeled datasets
Massively parallel computing with GPUs
Backprop-friendly activation functions, improved architectures
New regularization techniques, robust optimizers
Software platforms, cloud compute, APIs, libraries
More people, papers, results, funding, positive feedback
Where is Machine Learning?
Recommendation Systems
Virtual Assistants
Facial Recognition
E-Commerce
Creating Photographs, Paintings
Chess/Go Champions
Autonomous Cars/Navigation
Speech Recognition
Segmentation
Image Courtesy: Google
Other Applications
• Surveillance
• Automated Assembly
• Mail Sorting
• Face detection (photography)
• Robot Navigation
• Content-Based Image Retrieval
• Entertainment
• And many more…
Image Courtesy: Google