Machine Learning
Linear Regression
Quan Minh Phan & Ngoc Hoang Luong
University of Information Technology, Vietnam National University Ho Chi Minh City
October 7, 2022
New Packages
numpy → used very frequently in ML (Python)
Link: https://numpy.org/doc/stable/user/index.html#user
>> import numpy as np
matplotlib → for visualization
Link: https://matplotlib.org/stable/tutorials/index.html
>> import matplotlib.pyplot as plt
Generate A Regression Problem
>> from sklearn.datasets import make_regression
>> X, y = make_regression(n_samples=500, n_features=1,
   n_informative=1, noise=25, random_state=42)
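A quick sanity check (not on the original slide): with these arguments, make_regression returns a feature matrix of shape (500, 1) and a target vector of shape (500,).

>> X.shape, y.shape
>> ((500, 1), (500,))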
Data Visualization
>> plt.scatter(X, y, facecolor='tab:blue', edgecolor='white', s=70)
   plt.xlabel('X')
   plt.ylabel('y')
   plt.show()
Recall (Linear Regression)
Figure: The general concept of Linear Regression
Minimizing cost function with gradient descent
Cost function (Squared Error):

J(w) = \frac{1}{2} \sum_{i} \left( y^{(i)} - \hat{y}^{(i)} \right)^2    (1)

Update the weights:

w_{t+1} := w_t + \Delta w    (2)

\Delta w = -\eta \nabla J(w)    (3)

\frac{\partial J}{\partial w_j} = -\sum_{i} \left( y^{(i)} - \hat{y}^{(i)} \right) x_j^{(i)}    (4)

\Delta w_j = -\eta \frac{\partial J}{\partial w_j} = \eta \sum_{i} \left( y^{(i)} - \hat{y}^{(i)} \right) x_j^{(i)}    (5)
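For illustration (not on the original slide), one batch update implementing equations (2)-(5) can be sketched in NumPy. Here w, eta, X, and y are assumed to be defined already, with w[0] holding the intercept and w[1:] the coefficients:

>> y_hat = np.dot(X, w[1:]) + w[0]      # predictions ŷ
   diff = y - y_hat                     # (y − ŷ)
   w[0] += eta * np.sum(diff)           # intercept update
   w[1:] += eta * np.dot(X.T, diff)     # coefficient update, equation (5)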
Minimizing cost function with gradient descent (cont.)
w_j := \begin{cases} w_j + \eta \sum_{i} \left( y^{(i)} - \hat{y}^{(i)} \right) & j = 0 \\ w_j + \eta \sum_{i} \left( y^{(i)} - \hat{y}^{(i)} \right) x_j^{(i)} & j \in [1, \dots, n] \end{cases}
Pseudocode of the Training Process
Algorithm 1 Gradient Descent
1: Initialize the weights, w
2: while the stopping criterion is not satisfied do
3:     Compute the output values, ŷ
4:     Compute the difference between y and ŷ
5:     Update the weights:
6:         Update the intercept
7:         Update the coefficients
8: end while
Components
Hyperparameters
eta (float): the learning rate
max_iter (int): the maximum number of iterations
random_state (int): the random seed for weight initialization
Parameters
w (list/array): the weight values
costs (list/array): the cost values recorded over the iterations
Methods
fit(X, y)
predict(X)
Implement (code from scratch)
class LinearRegression_GD:
    def __init__(self, eta=0.001, max_iter=20, random_state=42):
        self.eta = eta                    # learning rate
        self.max_iter = max_iter          # maximum number of iterations
        self.random_state = random_state  # seed for weight initialization
        self.w = None                     # weights: w[0] is the intercept, w[1:] the coefficients
        self.costs = []                   # cost value recorded at each iteration

    def predict(self, X):
        return np.dot(X, self.w[1:]) + self.w[0]
’fit’ method
def fit(self, X, y):
    rgen = np.random.RandomState(self.random_state)
    # small random initial weights: 1 intercept + 1 weight per feature
    self.w = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
    self.costs = []
    for n_iters in range(self.max_iter):
        y_pred = self.predict(X)
        diff = y - y_pred
        self.w[0] += self.eta * np.sum(diff)        # update the intercept
        for j in range(X.shape[1]):                 # j = 0, 1, ..., X.shape[1] - 1
            delta = 0.0
            for i in range(X.shape[0]):             # i = 0, 1, ..., X.shape[0] - 1
                delta += self.eta * diff[i] * X[i][j]
            self.w[j + 1] += delta                  # update the j-th coefficient
        cost = np.sum(diff ** 2) / 2
        self.costs.append(cost)
’fit’ method (2)
def fit(self, X, y):
    rgen = np.random.RandomState(self.random_state)
    self.w = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
    self.costs = []
    for n_iters in range(self.max_iter):
        y_pred = self.predict(X)
        diff = y - y_pred
        self.w[0] += self.eta * np.sum(diff)        # update the intercept
        self.w[1:] += self.eta * np.dot(X.T, diff)  # update all coefficients at once
        cost = np.sum(diff ** 2) / 2
        self.costs.append(cost)
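Note that np.dot(X.T, diff) computes \sum_i (y^{(i)} - \hat{y}^{(i)}) x_j^{(i)} for every feature j at once, so the two nested loops of the previous version collapse into a single vectorized update that yields the same weights.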
Train Model
Gradient Descent
>> reg_GD = LinearRegression_GD(eta=0.001, max_iter=20, random_state=42)
   reg_GD.fit(X, y)
Visualize the trend in the cost values (Gradient Descent)
>> plt.plot(range(1, len(reg_GD.costs) + 1), reg_GD.costs)
   plt.xlabel('Epochs')
   plt.ylabel('Cost')
   plt.title('Gradient Descent')
   plt.show()
Visualize on Data
>> plt.scatter(X, y, facecolor='tab:blue', edgecolor='white', s=70)
   plt.plot(X, reg_GD.predict(X), color='green', lw=6, label='Gradient Descent')
   plt.xlabel('X')
   plt.ylabel('y')
   plt.legend()
   plt.show()
Weight values
>> w_GD = reg_GD.w
   w_GD
>> [-0.9794002, 63.18592509]
The first value is the fitted intercept; the second is the coefficient.
Implement (package)
Stochastic Gradient Descent
from sklearn.linear_model import SGDRegressor
Hyperparameters: eta0, max_iter, random_state
Parameters: intercept_, coef_
Methods: fit(X, y), predict(X)
Implement (package) (cont.)
Normal Equation
from sklearn.linear_model import LinearRegression
Parameters: intercept_, coef_
Methods: fit(X, y), predict(X)
Differences
Gradient Descent
w := w + \Delta w, \qquad \Delta w = \eta \sum_{i} \left( y^{(i)} - \hat{y}^{(i)} \right) x^{(i)}
Stochastic Gradient Descent
w := w + \Delta w, \qquad \Delta w = \eta \left( y^{(i)} - \hat{y}^{(i)} \right) x^{(i)}
Normal Equation
w = (X^T X)^{-1} X^T y
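For illustration (not on the original slides), the normal-equation solution can be computed directly in NumPy. X_b and w_ne below are hypothetical names; a column of ones is prepended to X so the intercept is estimated together with the coefficient:

>> X_b = np.hstack([np.ones((X.shape[0], 1)), X])   # add a bias column of ones
   w_ne = np.linalg.solve(X_b.T @ X_b, X_b.T @ y)   # solve (X^T X) w = X^T y
   w_ne                                             # [intercept, coefficient]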
Practice (cont.)
Stochastic Gradient Descent
>> from sklearn.linear_model import SGDRegressor
>> reg_SGD = SGDRegressor(eta0=0.001, max_iter=20, random_state=42, learning_rate='constant')
   reg_SGD.fit(X, y)
Normal Equation
>> from sklearn.linear_model import LinearRegression
>> reg_NE = LinearRegression()
   reg_NE.fit(X, y)
Weight Values Comparisons
Gradient Descent (ours)
>> w_GD = reg_GD.w
   w_GD
>> [-0.9794002, 63.18592509]
Stochastic Gradient Descent
>> w_SGD = np.append(reg_SGD.intercept_, reg_SGD.coef_)
   w_SGD
>> [-1.02681553, 63.08630288]
Normal Equation
>> w_NE = np.append(reg_NE.intercept_, reg_NE.coef_)
   w_NE
>> [-0.97941333, 63.18605572]
All three methods produce nearly the same weights: the from-scratch gradient-descent result is very close to the normal-equation solution, while SGD deviates slightly after 20 iterations.
Visualize on Data (all)
>> plt.scatter(X, y, facecolor='tab:blue', edgecolor='white', s=70)
   plt.plot(X, reg_GD.predict(X), color='green', lw=6, label='Gradient Descent')
   plt.plot(X, reg_SGD.predict(X), color='black', lw=4, label='Stochastic Gradient Descent')
   plt.plot(X, reg_NE.predict(X), color='orange', lw=2, label='Normal Equation')
   plt.xlabel('X')
   plt.ylabel('y')
   plt.legend()
   plt.show()
Performance Evaluation
Mean Absolute Error (MAE)

\mathrm{MAE}(y, \hat{y}) = \frac{1}{n} \sum_{i} \left| y^{(i)} - \hat{y}^{(i)} \right|    (6)

Mean Squared Error (MSE)

\mathrm{MSE}(y, \hat{y}) = \frac{1}{n} \sum_{i} \left( y^{(i)} - \hat{y}^{(i)} \right)^2    (7)

R-Squared (R^2)

R^2(y, \hat{y}) = 1 - \frac{\sum_{i} \left( y^{(i)} - \hat{y}^{(i)} \right)^2}{\sum_{i} \left( y^{(i)} - \bar{y} \right)^2}    (8)
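As a cross-check (not on the original slides), the three metrics can also be computed directly from equations (6)-(8) in NumPy; the function names below are hypothetical:

>> def mae(y, y_hat):
       return np.mean(np.abs(y - y_hat))              # equation (6)
   def mse(y, y_hat):
       return np.mean((y - y_hat) ** 2)               # equation (7)
   def r2(y, y_hat):
       return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)   # equation (8)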
Performance Evaluation
>> from sklearn.metrics import mean_absolute_error as MAE
   from sklearn.metrics import mean_squared_error as MSE
   from sklearn.metrics import r2_score as R2
>> y_pred_GD = reg_GD.predict(X)
>> y_pred_SGD = reg_SGD.predict(X)
>> y_pred_NE = reg_NE.predict(X)
Performance Evaluation (cont.)
Mean Absolute Error
>> print('MAE of GD:', round(MAE(y, y_pred_GD), 6))
   print('MAE of SGD:', round(MAE(y, y_pred_SGD), 6))
   print('MAE of NE:', round(MAE(y, y_pred_NE), 6))
Mean Squared Error
>> print('MSE of GD:', round(MSE(y, y_pred_GD), 6))
   print('MSE of SGD:', round(MSE(y, y_pred_SGD), 6))
   print('MSE of NE:', round(MSE(y, y_pred_NE), 6))
R^2 score
>> print('R2 of GD:', round(R2(y, y_pred_GD), 6))
   print('R2 of SGD:', round(R2(y, y_pred_SGD), 6))
   print('R2 of NE:', round(R2(y, y_pred_NE), 6))
Run Gradient Descent with learning rate eta = 0.005
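A minimal sketch of re-running the from-scratch trainer with this larger learning rate, assuming the class and data defined earlier (reg_GD_2 is a hypothetical name):

>> reg_GD_2 = LinearRegression_GD(eta=0.005, max_iter=20, random_state=42)
   reg_GD_2.fit(X, y)
   plt.plot(range(1, len(reg_GD_2.costs) + 1), reg_GD_2.costs)
   plt.xlabel('Epochs')
   plt.ylabel('Cost')
   plt.title('Gradient Descent (eta = 0.005)')
   plt.show()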
Polynomial Regression
Example
X = [258.0, 270.0, 294.0, 320.0, 342.0, 368.0, 396.0, 446.0, 480.0, 586.0]
y = [236.4, 234.4, 252.8, 298.6, 314.2, 342.2, 360.8, 368.0, 391.2, 390.8]
>> X = np.array([258.0, 270.0, 294.0, 320.0, 342.0, 368.0, 396.0, 446.0, 480.0, 586.0])[:, np.newaxis]
   y = np.array([236.4, 234.4, 252.8, 298.6, 314.2, 342.2, 360.8, 368.0, 391.2, 390.8])
>> plt.scatter(X, y, label='Training points')
   plt.xlabel('X')
   plt.ylabel('y')
   plt.legend()
   plt.show()
Visualize data
Experiment with Linear Regression
>> from sklearn.linear_model import LinearRegression
   lr = LinearRegression()
   lr.fit(X, y)
Experiment with Linear Regression (cont.)
Experiment with Polynomial Regression
Syntax
from sklearn.preprocessing import PolynomialFeatures
>> from sklearn.preprocessing import PolynomialFeatures
   quadratic = PolynomialFeatures(degree=2)
   X_quad = quadratic.fit_transform(X)
   pr = LinearRegression()
   pr.fit(X_quad, y)
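For intuition (not on the original slide), PolynomialFeatures(degree=2) maps each single-feature sample x to the columns [1, x, x^2], so the subsequent LinearRegression fits a quadratic in x:

>> quadratic.fit_transform(np.array([[3.0]]))
>> [[1., 3., 9.]]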
Experiment with Polynomial Regression (cont.)
>> X_test = np.arange(250, 600, 10)[:, np.newaxis]
>> y_pred_linear = lr.predict(X_test)
   y_pred_quad = pr.predict(quadratic.fit_transform(X_test))
>> plt.scatter(X, y, label='Training points')
   plt.xlabel('X')
   plt.ylabel('y')
   plt.plot(X_test, y_pred_linear, label='Linear fit', c='black')
   plt.plot(X_test, y_pred_quad, label='Quadratic fit', c='orange')
   plt.legend()
   plt.show()
Practice
Dataset: 'Boston Housing' (housing.csv), 14 attributes: 13 independent variables + 1 target variable
File: boston_housing.ipynb
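As a starting point (not prescribed by the slides), the dataset could be loaded with pandas; the column layout is an assumption here, with the last column taken as the target:

>> import pandas as pd
   df = pd.read_csv('housing.csv')      # assumes a comma-separated file with a header row
   X = df.iloc[:, :-1].values           # 13 independent variables (assumption: target is the last column)
   y = df.iloc[:, -1].values            # target variable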