Lecture Machinelearning

Machine learning involves using example data or past experience to allow computers to optimize performance and automatically detect patterns in data. Machine learning can be used to predict future data or other outcomes of interest by uncovering patterns in existing data. Machine learning based on statistics attempts to find the relationship between input and output variables by estimating a function that maps the input to the output.

MACHINE LEARNING

Machine learning is programming computers to optimize a performance criterion using example data or past
experience. Machine learning can automatically detect patterns in data and then use the uncovered patterns to
predict future data or other outcomes of interest.

What

[Figure: human learning (observations lead to learning) alongside machine learning (data leads to a model/predictor)]

Start-Tech Academy
Machine learning based on statistics is basically attempting to find the relationship between input and output
variables.

Why

Organizations and governments are collecting a lot of data.

Information from data is being used to take key business and political decisions.

At lower levels of an organization, data is used for MIS reporting.

At higher levels, data-based prescriptive and predictive models are being built.

Machine learning is one of the most popular techniques for creating these predictive and prescriptive models.

Machine learning is closely associated with Statistics, AI and Data mining

ML vs others

Machine Learning Vs. Statistics
• Traditional statistics focuses on provable results under mathematical
assumptions, and cares less about computation
• “Statistics: A useful tool for Machine Learning”

Machine Learning Vs. Artificial Intelligence
• “Machine Learning is one possible route to realize AI”

Machine Learning Vs. Data Mining
• Traditional data mining focuses on provable results under mathematical
assumptions, along with efficient computation on large datasets
• “Difficult to distinguish ML and DM in reality”


Use cases: Banking / Telecom / Retail

• Identify:
– Prospective customers
– Dissatisfied customers
– Good customers
– Bad payers
• Obtain:
– More effective advertising
– Less credit risk
– Less fraud
– Decreased churn rate


Use cases: Biomedical / Biometrics

• Medicine:
– Screening
– Diagnosis and prognosis
– Drug discovery
• Security:
– Face recognition
– Signature / fingerprint / iris verification
– DNA fingerprinting


Use cases: Computer / Internet

• Computer interfaces:
– Troubleshooting wizards
– Handwriting and speech
– Chat bots
• Internet:
– Hit ranking
– Spam filtering
– Text categorization
– Text translation
– Recommendation


Example

For example, a real estate agent who wants to price a particular property will
have:

Output variable: Price of property (Y)

Input variables: Area covered (X1), number of bedrooms (X2), proximity to a
landmark (X3), proximity to market (X4), recent sale price of a neighborhood
property (X5) and so on

The real estate agent wants to find the function

Y = f(X1, X2, X3, X4, X5…)

so that whenever s/he gives values of the input variables to this function, s/he
can get the price of the property.

WHY ESTIMATE f(x)
f(x) defines the relationship between dependent and independent variables.

Types

There are two major reasons to estimate f(x):

1. Prediction – When the values of the input variables are available and the output
variable is to be predicted. We are only interested in the value of Y, not in
the relationship of Y with other variables.

2. Inference – When the relationship between the input and output variables is
important. We want to establish how the output variable varies with a change in
each predictor variable.


Choice of Model

The choice of model for estimating f(x) will depend on whether we want to predict or
infer.

• For prediction, the accuracy of the estimated function is most important

• For inference, the interpretability of the estimated function is most important

For example, linear regression is simple to interpret but may not give very
accurate predicted values of Y, whereas highly non-linear models may predict very
accurately but the relationship may be very difficult to interpret.

HOW TO ESTIMATE f(x)
Next, we need to specify the type of learning method.

Parametric vs Non parametric

In the parametric approach, we assume the functional form of the relationship
between the predictors and the predicted variable.
For example, we may assume a linear relationship between house price and the
other variables:
Price (Y) = a0 + a1*x1 + a2*x2 + a3*x3 + … + an*xn
Then we use the training data to estimate the values of a0, a1, a2, a3, …, an.
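
As a minimal sketch of the parametric approach, the Python below assumes a linear form and estimates a0, a1 and a2 by ordinary least squares (solving the normal equations). The function name fit_linear, the two inputs (area, bedrooms) and all the numbers are made up for illustration; they are not from the lecture.

```python
def fit_linear(rows, targets):
    """Least-squares estimate of [a0, a1, ..., an] for y = a0 + a1*x1 + ... + an*xn."""
    X = [[1.0] + list(r) for r in rows]   # leading 1 multiplies the intercept a0
    n = len(X[0])
    # Normal equations: (X^T X) a = X^T y
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)] for i in range(n)]
    b = [sum(x[i] * y for x, y in zip(X, targets)) for i in range(n)]
    # Forward elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / A[i][i]
    return coeffs


# Hypothetical training data: (area, bedrooms) -> price
houses = [(100, 2), (150, 3), (80, 1), (120, 3), (200, 4)]
prices = [50 + 2 * area + 10 * beds for area, beds in houses]

a0, a1, a2 = fit_linear(houses, prices)
print(round(a0, 3), round(a1, 3), round(a2, 3))
```

Because the toy prices were generated from 50 + 2*area + 10*bedrooms, the estimated coefficients should come out close to 50, 2 and 10. With real data the fit would only approximate the prices.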

In the non-parametric approach, we do not assume any functional form for the
relationship. Instead, f is estimated by getting as close as possible to the training points.
For example, in the image shown, for three variables, a three-dimensional
spline is created which is as close as possible to the points and has a smooth surface.
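
The slide's example is a smooth fitted surface. The sketch below swaps in a different and simpler non-parametric method, k-nearest-neighbour averaging, only to show the idea of estimating f directly from nearby training points without assuming a formula. The function name knn_predict and the data are hypothetical.

```python
def knn_predict(train_x, train_y, query, k=2):
    """Average the targets of the k training points closest to `query`."""
    by_distance = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - query))
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical training points that happen to follow y = x**2;
# the method is never told this functional form.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [1.0, 4.0, 9.0, 16.0, 25.0]

prediction = knn_predict(train_x, train_y, 2.4, k=2)
print(prediction)  # average of the targets at x=2 and x=3
```

The estimate depends entirely on which training points lie near the query, which is one reason such methods tend to need more data than a parametric fit.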

Parametric vs Non parametric

Parametric approach
• Usually more interpretable
• May not be as accurate
• Preferable if inference is the reason for estimating f(x)

Non-parametric approach
• Less interpretable
• Potentially more accurate
• Needs a large amount of data to train
• Preferable if prediction is the priority

TYPES OF LEARNING
Supervised vs Unsupervised learning

Supervised Learning:
• Supervised learning is where you have input variables (x) and an output variable
(Y) and you use an algorithm to learn the mapping function from the input to the
output.
• The goal is to approximate the mapping function so well that when you have
new input data (x) you can predict the output variable (Y) for that data.

Unsupervised Learning:
• Unsupervised learning is where you only have input data (X) and no
corresponding output variables.
• The goal for unsupervised learning is to model the underlying structure or
distribution in the data in order to learn more about the data.

Supervised Learning: Example
[Figure: labeled examples, each example shown with its label (label1, label3, label4, label5)]


Category    Weight
Apple       100 gm
Apple       80 gm
Banana      40 gm
Banana      60 gm

[Figure: the labeled examples above are fed to a model/predictor]

Supervised Learning: classification

[Figure: a new example is passed to the model/predictor, which outputs a predicted category]

Supervised Learning (classification)
Classification:
• Example: Credit scoring
• Differentiating between low-risk and high-risk customers from their income
and savings
• Model - Discriminant
IF income > θ1 AND savings > θ2
THEN low-risk ELSE high-risk

Applications:
• Pattern recognition
• Face recognition
• Character recognition
• Medical diagnosis
• Web Advertising
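
The credit-scoring discriminant from this slide can be written directly as a small function. This is only a sketch: the threshold values below are invented placeholders, and in practice θ1 and θ2 would be learned from labeled customer data.

```python
def credit_risk(income, savings, theta1, theta2):
    """Apply the rule: low-risk only if income > theta1 AND savings > theta2."""
    if income > theta1 and savings > theta2:
        return "low-risk"
    return "high-risk"


# Hypothetical thresholds (not from the lecture)
THETA1 = 40_000   # income threshold
THETA2 = 10_000   # savings threshold

# Hypothetical customers as (income, savings)
customers = [(55_000, 15_000), (55_000, 5_000), (30_000, 20_000)]
labels = [credit_risk(inc, sav, THETA1, THETA2) for inc, sav in customers]
print(labels)
```

Only the first customer exceeds both thresholds, so only that customer is classified as low-risk.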

Supervised Learning: Regression

[Figure: a new example is passed to the model/predictor, which outputs a predicted weight]

Supervised Learning (Regression)
Regression:
• Example: Price of a used car
• x : car attributes (e.g. mileage)
y : price
• Model – Linear Regression
y = wx + w0

Applications:
• Weather forecast
• Sales forecasting
• Advertising budget allocation
• Product pricing
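
For the one-variable model y = wx + w0, the least-squares values of w and w0 have a simple closed form. The sketch below applies it to a made-up used-car data set; the function name fit_line, the units and the numbers are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Closed-form least squares for y = w*x + w0 with a single input x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx                # slope
    w0 = mean_y - w * mean_x     # intercept
    return w, w0


# Hypothetical data: mileage in thousand km, price in thousand currency units
mileage = [20.0, 40.0, 60.0, 80.0]
price = [18.0, 14.0, 10.0, 6.0]

w, w0 = fit_line(mileage, price)
predicted_price = w * 50.0 + w0   # predicted price of a car with 50 thousand km
print(w, w0, predicted_price)
```

The negative slope says that, in this toy data, price falls as mileage rises, which is the kind of statement inference is interested in; the predicted price for a new car is what prediction is interested in.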

Supervised Learning Algorithms


Unsupervised Learning: Example

Unsupervised learning: given data, i.e. examples, but no labels

Unsupervised Learning Algorithms

Unsupervised Learning - Algorithms:

• Clustering
o K-means
o Hierarchical clustering
• Hidden Markov Models (HMM)
• Dimension Reduction (Factor Analysis, PCA)
• Feature Extraction methods
• Self-organizing Maps (Neural Nets)
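
To make the first item in this list concrete, here is a minimal sketch of K-means on one-dimensional data. It assumes fixed starting centers so the run is reproducible (real implementations usually choose them randomly); the function name kmeans_1d and the data are hypothetical.

```python
def kmeans_1d(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and
    moving each center to the mean of its assigned points."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Unlabeled data with two visible groups
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = kmeans_1d(points, centers=[1.0, 12.0])
print(centers, clusters)
```

No labels are supplied anywhere: the two groups are discovered from the structure of the data alone.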

Machine Learning Model

Steps in Building ML Model

1. Problem formulation
2. Data Tidying
3. Pre-Processing
4. Train-Test Split
5. Model Building
6. Validation and Model Accuracy
7. Prediction


1. Problem formulation

• Convert your business problem into a statistical problem
• Clearly define the dependent and independent variables
• Identify whether you want to predict or infer


2. Data Tidying

• Transform collected data into a usable data table format

[Figure: example of tidying collected data into a table]


3. Data Pre-Processing

• Filter data
• Aggregate values
• Missing value treatment
• Outlier treatment
• Variable transformation
• Variable reduction
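
Two of these steps, missing value treatment and outlier treatment, can be sketched in a few lines. Median imputation and capping to fixed bounds are just two common choices picked here for illustration; the lecture does not prescribe a method, and the bounds and data below are made up.

```python
import statistics


def impute_missing(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def cap_outliers(values, lower, upper):
    """Clip every value into the range [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


# Hypothetical column with one missing value and one extreme value
raw = [10.0, 12.0, None, 11.0, 95.0]
cleaned = cap_outliers(impute_missing(raw), lower=0.0, upper=20.0)
print(cleaned)
```

The median is used rather than the mean so that the extreme value 95.0 does not distort the value filled in for the missing entry.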


4. Test - Train Split

Training data is the information used to train an algorithm.

The training data includes both input data and the corresponding expected
output.
Based on this data, the algorithm can learn the relationship between input and
output variables.
For testing, the model is given only the input data; the corresponding expected
output is held back and compared with the model's predictions.
The testing data is used to assess the accuracy of the model, i.e. the
predictor function created using the training data.

• There should not be any overlap between the two.

• Usually, 70-80% of the available data is used as training data and 20-30% as
testing data
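
A minimal sketch of such a split, using only the Python standard library. The function name train_test_split, the 25% test fraction and the fixed seed are choices made here for illustration.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and cut it into non-overlapping train and test parts."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical data set of 20 row identifiers
data = list(range(20))
train, test = train_test_split(data, test_fraction=0.25)
print(len(train), len(test))
```

Shuffling before cutting avoids a split that simply follows the order in which the data was collected, and slicing one shuffled list guarantees that no row lands in both parts.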


5. Model Training

y = f(x)

Here y is the output, f is the function and x stands for the input variables.


6. Performance Metrics and Validation

In-sample error
• Error that results from applying your prediction algorithm to the dataset you
built it with
Out-of-sample error
• Error that results from applying your prediction algorithm to a new data set
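
The difference between the two errors shows up clearly with a model that simply memorizes its training data. The sketch below uses a one-nearest-neighbour predictor and mean squared error; both are choices made here for illustration, and the data is invented.

```python
def nearest_neighbour_predict(train_x, train_y, x):
    """Return the target of the single closest training point (a memorizing model)."""
    idx = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[idx]


def mean_squared_error(xs, ys, predict):
    """Average squared difference between predictions and true outputs."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data the model was built with, and new data it has never seen
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9]
new_x, new_y = [1.4, 3.6], [1.4, 3.6]


def model(x):
    return nearest_neighbour_predict(train_x, train_y, x)


in_sample_error = mean_squared_error(train_x, train_y, model)
out_of_sample_error = mean_squared_error(new_x, new_y, model)
print(in_sample_error, out_of_sample_error)
```

The in-sample error is zero because every training point is its own nearest neighbour, while the error on new data is not. This is why accuracy is judged on data the model was not built with.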


7. Prediction

• Set up a pipeline to use your model in a real-life scenario
• Improve by monitoring your model over time
• Try to automate
