02 Linear Regression Models

In-depth explanation of Linear Regression Models, with background and historical analysis as well as the details needed to solve complex problems.


Linear Regression

Linear Regression

● The first machine learning algorithm we will explore is also one of the oldest!
Linear Regression

● Linear Regression
○ Theory of Linear Regression
○ Simple Implementation with Python
○ Scikit-Learn Overview
○ Linear Regression with Scikit-learn
○ Polynomial Regression
○ Regularization
○ Overview of Project Dataset
Let’s get started!
Introduction to Linear
Regression
Algorithm Theory - Part One
History and Motivation
Linear Regression

● Before we do any coding, we will have a deep dive into building out an intuition of the theory and motivation behind Linear Regression.
Linear Regression

● This will include understanding:


○ Brief History
○ Linear Relationships
○ Ordinary Least Squares
○ Cost Functions
○ Gradient Descent
○ Vectorization
Introduction to Linear
Regression
Brief History
Linear Regression

● The history of the “invention” of linear regression is a bit muddled.
● The linear regression methods based on least squares grew out of a need to mathematically improve navigation methods based on astronomy during the Age of Exploration in the 1700s.
Linear Regression

● 1722 - Roger Cotes discovers that combining different observations yields better estimates of the true value.
● 1750 - Tobias Mayer explores averaging different results under similar conditions while studying the librations of the Moon.
Linear Regression

● 1757 - Roger Joseph Boscovich further develops the combining of observations while studying the shape of the Earth.
● 1788 - Pierre-Simon Laplace develops similar averaging theories in explaining the differences in motion between Jupiter and Saturn.
Linear Regression

● 1805 - First public exposition on Linear Regression with the least squares method, published by Adrien-Marie Legendre in Nouvelles Méthodes pour la Détermination des Orbites des Comètes.
Linear Regression

● 1809 - Carl Friedrich Gauss publishes his methods of calculating the orbits of celestial bodies.
● He claims to have invented least squares back in 1795!
Linear Regression

● 1808 - Robert Adrain published his formulation of least squares (a year before the publication by Gauss).
Introduction to Linear
Regression
Linear Relationships
Linear Regression

● Put simply, a linear relationship implies some constant straight line relationship.
● The simplest possible being y = x.
Linear Regression

● Here we see x = [1,2,3] and y = [1,2,3]


Linear Regression

● We could then, based on the three real data points, build out the relationship y = x as our “fitted” line.
Linear Regression

● This implies for some new x value I can predict its related y.
Linear Regression

● But what happens with real data?


Linear Regression

● How do we draw a “fitted” line?


Linear Regression

● How do we draw a better “fitted” line?


Linear Regression

● Fundamentally, we understand we want to minimize the overall distance from the points to the line.
Linear Regression

● We also know we can measure this error from the real data points to the line, known as the residual error.
Linear Regression

● Some lines will clearly be better fits than others.
Linear Regression

● We can also see the residuals can be both positive and negative.
Introduction to Linear
Regression
Ordinary Least Squares
Linear Regression

● Ordinary Least Squares works by minimizing the sum of the squares of the differences between the observed dependent variable (the values of the variable being observed) in the given dataset and those predicted by the linear function.
Linear Regression

● We can visualize the squared error to minimize:
Linear Regression

● Having a squared error will help us simplify our calculations later on when setting up a derivative.
Linear Regression

● Let’s explore Ordinary Least Squares by converting a real data set into mathematical notation, then working to solve a linear relationship between features and a variable!
Introduction to Linear
Regression
Algorithm Theory - Part Two
OLS Equations
Linear Regression

● Linear Regression OLS Theory
○ We know the equation of a simple straight line:
■ y = mx + c
● m is the slope or gradient
● c is the intercept with the y-axis (the value of y when x is zero, i.e. the point [0, c])
Linear Regression

● Linear Regression OLS Theory
○ We can see that for y = mx + c there is only room for one possible feature, x.
○ OLS will allow us to directly solve for the slope m and intercept c.
○ We will later see that we’ll need tools like gradient descent to scale this to multiple features.
Linear Regression

● Let’s explore how we could translate a real data set into mathematical notation for linear regression.
● Then we’ll solve a simple case of one feature to explore OLS in action.
● Afterwards we’ll focus on gradient descent for real-world data set situations.
Linear Regression

● Linear Regression allows us to build a relationship between multiple features to estimate a target output/predicted value.

Area m²   Bedrooms   Bathrooms   Price
200       3          2           $500,000
190       2          1           $450,000
230       3          3           $650,000
180       1          1           $400,000
210       2          2           $550,000
Linear Regression
● We can translate this data into
generalized mathematical notation
● [x: features, y: predicted value]
X (features)                        y (predicted value)
Area m²   Bedrooms   Bathrooms     Price
200       3          2             $500,000
190       2          1             $450,000
230       3          3             $650,000
180       1          1             $400,000
210       2          2             $550,000
Linear Regression

● We can translate this data into generalized mathematical notation.

X                       y
x1     x2     x3        y
200    3      2         $500,000
190    2      1         $450,000
230    3      3         $650,000
180    1      1         $400,000
210    2      2         $550,000
Linear Regression

● We can translate this data into generalized mathematical notation.

X                       y
x1     x2     x3        y
x11    3      2         $500,000
x21    2      1         $450,000
x31    3      3         $650,000
x41    1      1         $400,000
x51    2      2         $550,000
Linear Regression

● We can translate this data into generalized mathematical notation.

X                       y
x1     x2     x3        y
x11    x12    x13       y1
x21    x22    x23       y2
x31    x32    x33       y3
x41    x42    x43       y4
x51    x52    x53       y5
Linear Regression

● Now let’s build out a linear relationship between the features X and label y.

X                       y
x1     x2     x3        y
x11    x12    x13       y1
x21    x22    x23       y2
x31    x32    x33       y3
x41    x42    x43       y4
x51    x52    x53       y5


Linear Regression

● Now let’s build out a linear relationship between the features X and label y.

X                       y
x1     x2     x3        y
Linear Regression

● Reformat toward a y = x style equation, with the label y on the left and the features x1, x2, x3 on the right.
Linear Regression

● Each feature should have some Beta coefficient associated with it:

y = β0 + β1x1 + β2x2 + β3x3
Linear Regression

● This is the same as the common notation for a simple straight line, y = mx + c: each βi plays the role of the slope m for its feature, and β0 plays the role of the intercept c.
Linear Regression

● This is stating that there is some Beta coefficient for each feature, chosen to minimize error.
Linear Regression

● We can also express this equation as a sum:

ŷ = β0 + Σ βi·xi   (sum over the i = 1 … n features)
Linear Regression

● Note the y hat symbol (ŷ) displays a prediction.
Linear Regression

● Line equation:

ŷ = β0 + β1x1 + β2x2 + … + βnxn

● The β (Beta) values are the coefficients we need to solve for: β0 is the intercept and each βi scales its corresponding feature xi.
Linear Regression

● For simple problems with one X feature we can easily solve for the Beta values with an analytical solution.

● Let’s quickly solve a simple example problem (a minimal sketch follows below), then we will see that for multiple features we will need gradient descent.
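● As a minimal sketch (the x and y values below are made up purely for illustration), the analytical OLS solution for a single feature can be computed directly with NumPy:

import numpy as np

# Hypothetical example data: one feature x and one label y
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form OLS for a single feature:
#   slope m     = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)**2)
#   intercept c = y_mean - m * x_mean
x_mean, y_mean = x.mean(), y.mean()
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
c = y_mean - m * x_mean

print(f"slope m = {m:.3f}, intercept c = {c:.3f}")
# Predict y for a new x value using the fitted line y = mx + c
print("prediction for x = 6:", m * 6 + c)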
Linear Regression

● As we expand to more than a single feature however, an analytical solution quickly becomes unscalable.

● Instead we shift focus to minimizing a cost function with gradient descent.
Linear Regression

● We can use gradient descent to minimize a cost function and calculate the Beta values!
Introduction to Linear
Regression
Algorithm Theory - Part Three
Cost Function
Linear Regression

● What we know so far:


○ Linear Relationships
■ y = mx+c
○ Ordinary Least Squares (OLS)
■ Solve simple linear regression
○ Not scalable for multiple features
○ Translating real data to Matrix Notation
○ Generalized formula for Beta coefficients
Linear Regression

● Remember we are searching for Beta values for a best-fit line.
Linear Regression

● The equation below simply defines our line, but how do we choose the Beta coefficients?

ŷ = β0 + β1x1 + β2x2 + … + βnxn
Linear Regression

● We’ve decided to define a “best-fit” as minimizing the squared error, better known as the mean squared error (MSE).
Linear Regression

● What is a Cost Function?

○ It is a function that measures the performance of a Machine Learning model for given data.

○ A Cost Function quantifies the error between predicted values and expected values and presents it in the form of a single real number.
Linear Regression

● What is a Cost Function?

○ In this situation, the event we are finding the cost of is the difference between the estimated values (the hypothesis) and the real values — the actual data we are trying to fit a line to.

https://medium.com/@lachlanmiller_52885/understanding-and-calculating-the-cost-function-for-linear-regression-39b8a3519fcb
Linear Regression
● What is a Cost Function?
Linear Regression

● What is a Cost Function?

○ The goal here is to find a line of best fit: a line that approximates the values most accurately.
Linear Regression

● What is a Cost Function?

○ Here are some random guesses for the slope of each line:
Linear Regression

● What is a Cost Function?

We have three hypotheses — three potential lines of best fit, each with its own slope.

best_fit_2 looks pretty good. But we are data scientists; we don’t guess, we conduct analysis and make well-founded statements using mathematics.
Linear Regression

● Our cost function can be defined by the squared error formula:

J = 1/(2m) · Σ (ŷi − yi)²   (sum over the i = 1 … m samples)
Linear Regression

● Remember a cost function maps an event or values of one or more variables onto a real number.

● In this case, the event we are finding the cost of is the difference between the estimated values (the hypothesis) and the real values — the actual data we are trying to fit a line to.
Linear Regression

• m is the number of samples — in this case, we have three samples for X.

• Those are 1, 2 and 3. So the 1/(2m) term is a constant. It turns out to be 1/6, or 0.1667.
Linear Regression

• Now we have sigma (Σ). This means the sum.

• In this case, the sum runs from i = 1 to m, i.e. from 1 to 3.


Linear Regression

• We repeat the calculation to the right of the sigma for each sample.

• The actual calculation is just the hypothesis value minus the actual value of y. Then you square whatever you get.
Linear Regression

• The final result will be a single number.

• We repeat this process for all the hypotheses, in this case best_fit_1, best_fit_2 and best_fit_3. Whichever has the lowest result, or the lowest “cost”, is the best fit of the three hypotheses.


Linear Regression

● Calculating the Cost Function by Hand

Let’s run the calculation for best_fit_1


Linear Regression

● Calculating the Cost Function by Hand

Let’s run the calculation for best_fit_1

1/(2m) = 1/(2·3) = 1/6
Linear Regression

● Calculating the Cost Function by Hand

Let’s run the calculation for best_fit_1

(0.50 − 1)² = 0.25


Linear Regression

● Calculating the Cost Function by Hand

Let’s run the calculation for best_fit_1

(1.00 − 2.50)² = 2.25


Linear Regression

● Calculating the Cost Function by Hand

Let’s run the calculation for best_fit_1

(1.50 − 3.50)² = 4.00
Linear Regression

J = 1/6 × (0.25 + 2.25 + 4.00)

J = 1.083
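● To make the hand calculation concrete, here is a minimal Python sketch of the same cost function. The x values, y values, and the best_fit_1 slope of 0.5 come from the worked example above; the function itself accepts any hypothesis slope:

import numpy as np

x = np.array([1.0, 2.0, 3.0])    # the three samples for X
y = np.array([1.0, 2.5, 3.5])    # the actual y values from the example

def cost(slope):
    # Squared-error cost J = 1/(2m) * sum((h(x_i) - y_i)^2) for a line through the origin
    m = len(x)                   # number of samples
    predictions = slope * x      # hypothesis values h(x_i)
    return np.sum((predictions - y) ** 2) / (2 * m)

print(cost(0.5))                 # best_fit_1 -> approximately 1.083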
Linear Regression

• Repeat the same process for all the other hypotheses and we get:

• A low cost is desirable. A low cost represents a smaller difference. By minimizing the cost, we are finding the best fit.

• Out of the three hypotheses presented, best_fit_2 has the lowest cost.
Linear Regression

The orange line, best_fit_2, is the best fit of the three.

We can see this is likely the case by visual inspection, but now we have a more defined process for confirming our observations.
Linear Regression

● Unfortunately, it is not scalable to try to get an analytical solution to minimize this cost function.

● In the next lecture we will learn to use gradient descent to minimize this cost function.
Introduction to Linear
Regression
Algorithm Theory - Part Four
Gradient Descent
Linear Regression

● We just figured out a cost function to minimize!

● Taking the cost function derivative and then solving for zero to get the set of Beta coefficients will be too difficult to solve directly through an analytical solution.
Linear Regression

● Instead we can describe this cost function through vectorized matrix notation and use gradient descent to have a computer figure out the set of Beta coefficient values that minimize the cost/loss function.
Linear Regression

● Our goals:
○ Find a set of Beta coefficient values
that minimizes the error (cost
function)
○ Leverage computational power
instead of having to manually attempt
to analytically solve the derivative.
Linear Regression

● What is Gradient Descent?

Gradient Descent (GD) is an efficient optimization algorithm that attempts to find a local or global minimum of a function.

In other words, GD is a general method for minimizing a function, in this case the Mean Squared Error cost function.
Linear Regression

● What is Gradient Descent?

Gradient Descent basically just does what we were doing by hand — change the theta values, or parameters, bit by bit, until we hopefully arrive at a minimum.
Linear Regression

● What is Gradient Descent?

Gradient descent enables a model to learn the gradient, or direction, that the model should take in order to reduce errors (differences between the actual y and the predicted y).
Linear Regression

● What is Gradient Descent?

Direction in the simple linear regression example refers to how the model parameters m and c should be tweaked or corrected to further reduce the cost function.
Linear Regression

● What is Gradient Descent?

As the model iterates, it gradually converges towards a minimum where further tweaks to the parameters produce little or zero change in the loss — also referred to as convergence.
Linear Regression

● Gradient descent can be defined by the following update formula, where η is the learning rate (step size) and ∇β J(β) is the gradient of the cost function with respect to the Beta coefficients:

β(next step) = β − η · ∇β J(β)

Source: https://www.oreilly.com/library/view/hands-on-machine-learning/9781491962282/ch04.html
Linear Regression

At this point the model has optimized the weights such that they minimize the cost function.
Linear Regression
Linear Regression

The first thing to notice is the thick red line. This is the line estimated from the initial values of m and c.

You can see that this doesn’t fit the data points well at all, and because of this it has the highest error (MSE).
Linear Regression
However, you can see the lines gradually moving toward the data points until a line of best fit (the thick blue line) is identified.

In other words, upon each iteration the model has learned better values for m and c until it finds the values that minimize the cost function.
Linear Regression

● What is Gradient Descent?

The alternative to the gradient descent process would be brute forcing a potentially infinite combination of parameters until the set that minimizes the cost is identified.

For obvious reasons this isn’t really feasible.


Linear Regression

● What is Gradient Descent?

Gradient descent, therefore, enables the learning process to make corrective updates to the learned estimates that move the model toward an optimal combination of parameters.
Linear Regression

● Let’s visually explore what this looks like in the case of a single Beta value.
Linear Regression

● Common mountain analogy: think of the cost function as a mountain range. From wherever you are standing, you look for the direction of steepest descent and take a step downhill, repeating until you reach the bottom of a valley (a minimum of the cost).
Linear Regression

● This is exactly what gradient descent does!

● It even looks similar for the case of a single coefficient search.
Linear Regression

● 1 dimensional cost function (single Beta)


Linear Regression

● Choose a starting point


Linear Regression

● Calculate gradient at that point


Linear Regression

● Step forward proportional to the negative gradient.
Linear Regression

● Repeat the steps


Linear Regression

● Repeat the steps


Linear Regression

● Note how we are essentially mapping the gradient!
Linear Regression

● Eventually we will find the Beta that minimizes the cost function!
Linear Regression

● Steps are proportional to the negative gradient!
Linear Regression

● A steeper gradient at the start gives larger steps.
Linear Regression

● A smaller gradient near the end gives smaller steps.
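● Tying the pieces together, below is a minimal sketch of gradient descent for simple linear regression (one feature plus an intercept). The data, learning rate, and iteration count are illustrative assumptions, not values from this lecture:

import numpy as np

# Hypothetical data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

beta0, beta1 = 0.0, 0.0      # intercept and slope, chosen starting point
learning_rate = 0.01         # step size (assumed value)

for _ in range(5000):
    predictions = beta0 + beta1 * x
    error = predictions - y
    # Gradients of J = 1/(2m) * sum(error^2) with respect to each coefficient
    grad_beta0 = error.mean()
    grad_beta1 = (error * x).mean()
    # Step proportional to the negative gradient
    beta0 -= learning_rate * grad_beta0
    beta1 -= learning_rate * grad_beta1

print(f"intercept = {beta0:.3f}, slope = {beta1:.3f}")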
Linear Regression

● Finally! We can now leverage all our computational power to find the optimal Beta coefficients that minimize the cost function, producing the line of best fit!

● We are now ready to code out Linear Regression!
Simple Linear
Regression
Linear Regression

● Now that we understand what is happening “under the hood” for linear regression, let’s begin by coding through an example of simple linear regression.
Linear Regression

● Simple Linear Regression
○ Limited to one X feature (y = mx + c)
○ We will create a best-fit line to map out a linear relationship between total advertising spend and resulting sales.
● Let’s head over to the notebook! (A quick NumPy preview follows below.)
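● As a rough preview of what the notebook does, here is a hedged sketch using NumPy’s built-in polynomial fitting. The CSV filename and column names are assumptions about the advertising data, not confirmed names from the course files:

import numpy as np
import pandas as pd

df = pd.read_csv("Advertising.csv")      # assumed filename
X = df["total_spend"]                    # assumed column: total advertising spend
y = df["sales"]                          # assumed column: resulting sales

# Fit a degree-1 polynomial (a straight line); returns [slope, intercept]
slope, intercept = np.polyfit(X, y, deg=1)

# Predict sales for a new spend value using y = mx + c
new_spend = 200
print(slope * new_spend + intercept)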
Linear Regression

● Simple Linear Regression Exercise
○ Years Old
○ Experience Years
○ Academic Years
○ Salary

● Predict the Salary for the following people:
Scikit-Learn
Overview
Scikit-Learn

● We’ve seen that NumPy has some built-in capabilities for simple linear regression, but when it comes to more complex models, we’ll need Scikit-Learn!
● Before we jump straight into machine learning with Scikit-Learn and Python, let’s understand the philosophy behind sklearn.
Scikit-Learn

● Scikit-learn is a library containing many machine learning algorithms.
● It utilizes a generalized “estimator API” framework for calling the models.
● This means the way algorithms are imported, fitted, and used is uniform across all algorithms.
Scikit-Learn

● This allows users to easily swap algorithms in and out and test various approaches.

● This uniform framework also means users can easily apply almost any algorithm effectively without truly understanding what the algorithm is doing!
Scikit-Learn

● Scikit-learn also comes with many convenience tools, including train test split functions, cross validation tools, and a variety of reporting metric functions.
● This makes Scikit-Learn a “one-stop shop” for many of our machine learning needs.
Scikit-Learn

● Philosophy of Scikit-Learn
○ Scikit-Learn’s approach to model
building focuses on applying
models and performance metrics.
○ This is a more pragmatic industry
style approach rather than an
academic approach of describing
the model and its parameters.
Scikit-Learn

● Philosophy of Scikit-Learn
○ Academic users used to R style
reporting may also want to explore
the statsmodels python library if
interested in more statistical
description of models such as
significance levels.
Scikit-Learn

● Let’s quickly review the framework of Scikit-Learn for the supervised machine learning process.
● We will quickly see how the code directly relates to the process theory!
Supervised Machine Learning Process

● Recall that we will perform a Train | Test split for supervised learning.

        Area m²   Bedrooms   Bathrooms   Price
TRAIN   200       3          2           $500,000
TRAIN   190       2          1           $450,000
TRAIN   230       3          3           $650,000
TEST    180       1          1           $400,000
TEST    210       2          2           $550,000
Supervised Machine Learning Process

● Also recall there are 4 main components after a Train | Test split:

           Area m²   Bedrooms   Bathrooms |  Price
X TRAIN    200       3          2         |  $500,000   Y TRAIN
X TRAIN    190       2          1         |  $450,000   Y TRAIN
X TRAIN    230       3          3         |  $650,000   Y TRAIN
X TEST     180       1          1         |  $400,000   Y TEST
X TEST     210       2          2         |  $550,000   Y TEST
Supervised Machine Learning Process

● Scikit-Learn easily does this split (as well as more advanced cross-validation).

        Area m²   Bedrooms   Bathrooms   Price
TRAIN   200       3          2           $500,000
TRAIN   190       2          1           $450,000
TRAIN   230       3          3           $650,000
TEST    180       1          1           $400,000
TEST    210       2          2           $550,000
Scikit-Learn

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y)
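As an illustration, optional parameters control the split size and reproducibility (the 30% test size and the seed below are assumed example values, not course requirements):

# Hold out 30% of the rows for testing and fix the random seed
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)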


Supervised Machine Learning Process

● Also recall that we want to compare predictions to the y test labels.

        Predictions   Area m²   Bedrooms   Bathrooms   Price
TEST    $410,000      180       1          1           $400,000
TEST    $540,000      210       2          2           $550,000
Scikit-Learn

from sklearn.model_family import ModelAlgo


Scikit-Learn

from sklearn.model_family import ModelAlgo


mymodel = ModelAlgo(param1,param2)
Scikit-Learn

from sklearn.model_family import ModelAlgo


mymodel = ModelAlgo(param1,param2)
mymodel.fit(X_train,y_train)
Scikit-Learn

from sklearn.model_family import ModelAlgo


mymodel = ModelAlgo(param1,param2)
mymodel.fit(X_train,y_train)
predictions = mymodel.predict(X_test)
Scikit-Learn

from sklearn.model_family import ModelAlgo


mymodel = ModelAlgo(param1,param2)
mymodel.fit(X_train,y_train)
predictions = mymodel.predict(X_test)

from sklearn.metrics import error_metric


Scikit-Learn

from sklearn.model_family import ModelAlgo


mymodel = ModelAlgo(param1,param2)
mymodel.fit(X_train,y_train)
predictions = mymodel.predict(X_test)

from sklearn.metrics import error_metric


performance = error_metric(y_test,predictions)
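As a concrete instance of this generic framework, here is a minimal sketch using Linear Regression and mean squared error, with synthetic stand-in data so it runs end to end:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic stand-in data: 100 rows, 3 features, a known linear relationship plus noise
rng = np.random.default_rng(42)
X = rng.random((100, 3))
y = X @ np.array([3.0, 5.0, 2.0]) + 10 + rng.normal(scale=0.5, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

mymodel = LinearRegression()             # plays the role of ModelAlgo here
mymodel.fit(X_train, y_train)
predictions = mymodel.predict(X_test)

performance = mean_squared_error(y_test, predictions)
print(performance)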
Scikit-Learn

● This framework will be similar for any supervised machine learning algorithm.
● Let’s begin exploring it further with Linear Regression!
Linear Regression with
Scikit-Learn
Part One:
Data Setup and Model Training
Linear Regression

● Previously, we explored “Is there a relationship between total advertising spend and sales?”
● Now we want to expand this to “What is the relationship between each advertising channel (TV, Radio, Newspaper) and sales?”
Linear Regression

● Let’s jump into the Jupyter notebook to answer this question!
Performance
Evaluation
Regression Metrics
Evaluating Regression

● Now that we have a fitted model that can perform predictions based on features, how do we decide if those predictions are any good?
● Fortunately we have the known test labels to compare our results to.
Evaluating Regression

● Let’s take a moment now to discuss evaluating Regression Models.
● Regression is a task where a model attempts to predict continuous values (unlike categorical values, which is classification).
Evaluating Regression

● For example, attempting to predict the price of a house given its features is a regression task.
● Attempting to predict the country a house is in given its features would be a classification task.
Evaluating Regression

● You may have heard of some evaluation metrics like accuracy or recall.
● These sorts of metrics aren’t useful for regression problems; we need metrics designed for continuous values!
Evaluating Regression

● Let’s discuss some of the most common evaluation metrics for regression:
○ Mean Absolute Error
○ Mean Squared Error
○ Root Mean Square Error
Evaluating Regression

● The metrics shown here apply to any regression task, not just Linear Regression!
Evaluating Regression

● Mean Absolute Error (MAE)
○ This is the mean of the absolute value of the errors: MAE = (1/n) · Σ |yi − ŷi|
○ Easy to understand
Evaluating Regression
● MAE won’t punish large errors however

https://en.wikipedia.org/wiki/Anscombe%27s_quartet
Evaluating Regression

● MAE won’t punish large errors, however.
Evaluating Regression

● We want our error metrics to account for these!
Evaluating Regression

● Mean Squared Error (MSE)
○ MSE = (1/n) · Σ (yi − ŷi)²
○ Large errors are “punished” more than with MAE, making MSE more popular.
Evaluating Regression

● Mean Squared Error (MSE)


○ Issue with MSE:
■ Different units than original y.
■ It reports units of y squared!
Evaluating Regression

● Root Mean Square Error (RMSE)
○ This is the root of the mean of the squared errors: RMSE = √( (1/n) · Σ (yi − ŷi)² )
○ Most popular (has the same units as y)
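● A minimal sketch of computing all three metrics with Scikit-Learn and NumPy; the two example rows reuse the predictions table shown earlier, and in practice you would pass in your model’s y_test and predictions:

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_test = np.array([400_000, 550_000])        # true prices
predictions = np.array([410_000, 540_000])   # model predictions

mae = mean_absolute_error(y_test, predictions)
mse = mean_squared_error(y_test, predictions)
rmse = np.sqrt(mse)                          # same units as y

print(mae, mse, rmse)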
Machine Learning

● Most common question:
○ “What is a good value for RMSE?”
● Context is everything!
● An RMSE of $10 is fantastic for predicting the price of a house, but horrible for predicting the price of a candy bar!
Machine Learning

● Compare your error metric to the average value of the label in your data set to try to get an intuition of its overall performance.

● Domain knowledge also plays an important role here!
Machine Learning

● Context of importance is also necessary to consider.
○ We may create a model to predict how much medication to give, in which case small fluctuations in RMSE may actually be very significant.
Machine Learning

● Context of importance is also necessary to consider.
○ If we create a model to try to improve a runner’s performance, we would need some baseline RMSE to compare to.
Evaluating Regression

● Let’s quickly jump back to the notebook and calculate these metrics with Scikit-Learn!
Evaluating Residuals
Linear Regression

● Often for Linear Regression it is a good idea to separately evaluate the residuals (y − ŷ) and not just calculate performance metrics (e.g. RMSE).
● Let’s explore why this is important.
Linear Regression

● Anscombe’s Quartet:
Linear Regression

● Clearly Linear Regression is not suitable!


Linear Regression

● How can we tell whether a linear fit is appropriate when we’re dealing with more than one x feature?

● We cannot see this discrepancy of fit visually if we have multiple features!
Linear Regression

● What we could do is plot the residual error against the true y values.
● Consider an appropriate data set:
Linear Regression

● The residual errors should be random and close to a normal distribution.
Linear Regression

● A residual plot shows the residual error vs. the true y value.
Linear Regression

● There should be no clear line or curve.


Linear Regression

● What about datasets where linear regression is not a valid choice?


Linear Regression

● What about datasets where linear regression is not a valid choice? Example 1
Linear Regression

● Residual plot showing a clear pattern, indicating Linear Regression is not valid, and we should choose another model.
Linear Regression

● Residual plot showing a clear pattern, indicating Linear Regression is not valid!
Linear Regression

● What about datasets where linear regression is not a valid choice? Example 2


Linear Regression

● Residual plot showing a clear pattern, indicating Linear Regression is not valid!
Linear Regression

● Let’s explore creating these plots with Python and our model results! (A minimal plotting sketch follows below.)
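● A minimal plotting sketch with Matplotlib and Seaborn; the small arrays below stand in for a real model’s y_test and predictions:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

y_test = np.array([400_000.0, 550_000.0, 500_000.0, 450_000.0])
predictions = np.array([410_000.0, 540_000.0, 495_000.0, 460_000.0])

residuals = y_test - predictions

# Residual error vs. true y value: points should scatter randomly around zero
plt.scatter(y_test, residuals)
plt.axhline(y=0, color="red", linestyle="--")
plt.xlabel("True y")
plt.ylabel("Residual (y - y_hat)")
plt.show()

# Distribution of the residuals: should be roughly normal and centered at zero
sns.histplot(residuals, kde=True)
plt.show()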
Model Deployment
Linear Regression

● We’re almost done with our first machine learning run-through!
● Let’s quickly review what we’ve done so far in the ML process.
Supervised Machine Learning Process

● Recall the Supervised ML Process:

X and y Data → Training Data Set → Fit/Train Model → Evaluate Performance (on the Test Data Set) → Adjust as Needed → Deploy Model
Linear Regression

● We will explore polynomial regression and regularization later on as model adjustments.
● For now, let’s focus on a simple “deployment” of our model by saving and loading it, then applying it to new data.
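● A minimal sketch of this kind of “deployment” with joblib; the synthetic data, the filename, and the new-data values are illustrative assumptions:

import numpy as np
from joblib import dump, load
from sklearn.linear_model import LinearRegression

# Tiny synthetic stand-in for the advertising data (TV, Radio, Newspaper -> Sales)
rng = np.random.default_rng(101)
X = rng.random((100, 3)) * 100
y = X @ np.array([0.05, 0.10, 0.01]) + 5 + rng.normal(scale=0.5, size=100)

final_model = LinearRegression()
final_model.fit(X, y)                            # fit on all available data before saving

dump(final_model, "final_sales_model.joblib")    # save the fitted model to disk

loaded_model = load("final_sales_model.joblib")  # load it later (or in another script)
new_campaign = [[149.0, 22.0, 12.0]]             # assumed TV, Radio, Newspaper spend
print(loaded_model.predict(new_campaign))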
