
Week 02 Lab04: Gradient Descent for Linear Regression

Goals
In this lab, you will:

automate the process of optimizing w and b using gradient descent.

Tools
In this lab, we will make use of:

NumPy, a popular library for scientific computing


Matplotlib, a popular library for plotting data
plotting routines in the lab_utils_uni.py file in the local directory

In [ ]: import math, copy
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('./deeplearning.mplstyle')
from lab_utils_uni import plt_house_x, plt_contour_wgrad, plt_divergence, plt_gradients


Problem Statement
Let's use the same two data points as before - a house with 1000 square feet sold for
$300,000 and a house with 2000 square feet sold for $500,000.

Size (1000 sqft) Price (1000s of dollars)

1 300

2 500

In [ ]: # Load our data set


x_train = np.array([1.0, 2.0]) #features
y_train = np.array([300.0, 500.0]) #target value

Compute_Cost
This was developed in the last lab. We'll need it again here.

In [ ]: #Function to calculate the cost


def compute_cost(x, y, w, b):

m = x.shape[0]
cost = 0

for i in range(m):
f_wb = w * x[i] + b
cost = cost + (f_wb - y[i])**2
total_cost = 1 / (2 * m) * cost

return total_cost
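
As a quick sanity check (an added illustration; the parameter values below are arbitrary, not given in the lab), you can evaluate the cost at a couple of candidate (w, b) pairs. The pair (200, 100) happens to fit both training points exactly, so its cost is zero.

In [ ]: # Illustrative check with arbitrary parameter values (not part of the lab code)
print(compute_cost(x_train, y_train, 200, 100))   # fits both training points exactly -> cost 0
print(compute_cost(x_train, y_train, 0, 0))       # poor fit -> large cost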


Gradient descent summary


So far in this course, you have developed a linear model that predicts $f_{w,b}(x^{(i)})$:

$$f_{w,b}(x^{(i)}) = wx^{(i)} + b \tag{1}$$

In linear regression, you utilize input training data to fit the parameters $w$, $b$ by minimizing a measure of the error between our predictions $f_{w,b}(x^{(i)})$ and the actual data $y^{(i)}$. The measure is called the cost, $J(w,b)$. In training you measure the cost over all of our training samples $(x^{(i)}, y^{(i)})$:

$$J(w,b) = \frac{1}{2m} \sum_{i=0}^{m-1} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)^2 \tag{2}$$

In lecture, gradient descent was described as:

$$\text{repeat until convergence: } \lbrace$$
$$w = w - \alpha \frac{\partial J(w,b)}{\partial w} \tag{3}$$
$$b = b - \alpha \frac{\partial J(w,b)}{\partial b}$$
$$\rbrace$$

where parameters $w$, $b$ are updated simultaneously.


The gradient is defined as:

$$\frac{\partial J(w,b)}{\partial w} = \frac{1}{m} \sum_{i=0}^{m-1} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x^{(i)} \tag{4}$$

$$\frac{\partial J(w,b)}{\partial b} = \frac{1}{m} \sum_{i=0}^{m-1} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) \tag{5}$$

Here, simultaneously means that you calculate the partial derivatives for all the parameters before updating any of the parameters, as in the short sketch below.
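
A minimal sketch of one simultaneous update step, written out by hand for this two-point data set (this is an added illustration, not part of the lab code; the starting values and learning rate are arbitrary):

In [ ]: # Illustrative: one simultaneous gradient-descent step (not part of the lab code).
w, b, alpha = 0.0, 0.0, 0.01
m = x_train.shape[0]
f_wb  = w * x_train + b                          # predictions at the current (w, b)
dj_dw = ((f_wb - y_train) * x_train).sum() / m   # equation (4)
dj_db = (f_wb - y_train).sum() / m               # equation (5)
# both derivatives are evaluated before either parameter is changed
w = w - alpha * dj_dw
b = b - alpha * dj_db
print(w, b)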


Implement Gradient Descent


You will implement the gradient descent algorithm for one feature. You will need three
functions.

compute_gradient implementing equation (4) and (5) above


compute_cost implementing equation (2) above (code from previous lab)
gradient_descent , utilizing compute_gradient and compute_cost

Conventions:

The naming of python variables containing partial derivatives follows this pattern: $\frac{\partial J(w,b)}{\partial b}$ will be dj_db .
w.r.t. is With Respect To, as in the partial derivative of $J(w,b)$ With Respect To $b$.

compute_gradient

compute_gradient implements (4) and (5) above and returns $\frac{\partial J(w,b)}{\partial w}$, $\frac{\partial J(w,b)}{\partial b}$. The
embedded comments describe the operations.

In [ ]: def compute_gradient(x, y, w, b):


"""
Computes the gradient for linear regression
Args:
x (ndarray (m,)): Data, m examples
y (ndarray (m,)): target values
w,b (scalar) : model parameters
Returns
dj_dw (scalar): The gradient of the cost w.r.t. the parameters w
dj_db (scalar): The gradient of the cost w.r.t. the parameter b
"""

# Number of training examples


m = x.shape[0]
dj_dw = 0
dj_db = 0

for i in range(m):
f_wb = w * x[i] + b
dj_dw_i = (f_wb - y[i]) * x[i]
dj_db_i = f_wb - y[i]
dj_db += dj_db_i
dj_dw += dj_dw_i
dj_dw = dj_dw / m
dj_db = dj_db / m

return dj_dw, dj_db
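
As an added quick check (the parameter values here are arbitrary, not from the lab), you can call compute_gradient at a single point and confirm the signs agree with the plots produced below:

In [ ]: # Illustrative check at w=0, b=0 (arbitrary point, not part of the lab code).
dj_dw, dj_db = compute_gradient(x_train, y_train, 0, 0)
# both gradients are negative here, so equation (3) will increase w and b
print(f"dj_dw: {dj_dw}, dj_db: {dj_db}")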


The lectures described how gradient descent utilizes the partial derivative of the cost with respect to a parameter at a point to update that parameter. Let's use our compute_gradient function to find and plot some partial derivatives of our cost function relative to one of the parameters, $w_0$.

In [ ]: plt_gradients(x_train,y_train, compute_cost, compute_gradient)


plt.show()

Above, the left plot shows $\frac{\partial J(w,b)}{\partial w}$, or the slope of the cost curve relative to $w$, at three points. On the right side of the plot, the derivative is positive, while on the left it is negative. Due to the 'bowl shape', the derivatives will always lead gradient descent toward the bottom where the gradient is zero.

The left plot has fixed $b = 100$. Gradient descent will utilize both $\frac{\partial J(w,b)}{\partial w}$ and $\frac{\partial J(w,b)}{\partial b}$ to update parameters. The 'quiver plot' on the right provides a means of viewing the gradient of both parameters. The arrow sizes reflect the magnitude of the gradient at that point. The direction and slope of the arrow reflects the ratio of $\frac{\partial J(w,b)}{\partial w}$ and $\frac{\partial J(w,b)}{\partial b}$ at that point. Note that the gradient points away from the minimum. Review equation (3) above. The scaled gradient is subtracted from the current value of $w$ or $b$. This moves the parameter in a direction that will reduce cost.

Gradient Descent
Now that gradients can be computed, gradient descent, described in equation (3) above, can be implemented below in gradient_descent . The details of the implementation are described in the comments. Below, you will utilize this function to find optimal values of $w$ and $b$ on the training data.


In [ ]: def gradient_descent(x, y, w_in, b_in, alpha, num_iters, cost_function, gradient_function):


"""
Performs gradient descent to fit w,b. Updates w,b by taking
num_iters gradient steps with learning rate alpha

Args:
x (ndarray (m,)) : Data, m examples
y (ndarray (m,)) : target values
w_in,b_in (scalar): initial values of model parameters
alpha (float): Learning rate
num_iters (int): number of iterations to run gradient descent
cost_function: function to call to produce cost
gradient_function: function to call to produce gradient

Returns:
w (scalar): Updated value of parameter after running gradient descent
b (scalar): Updated value of parameter after running gradient descent
J_history (List): History of cost values
p_history (list): History of parameters [w,b]
"""

w = copy.deepcopy(w_in) # avoid modifying global w_in


# An array to store cost J and w's at each iteration primarily for graphing
J_history = []
p_history = []
b = b_in
w = w_in

for i in range(num_iters):
# Calculate the gradient and update the parameters using gradient_funct
dj_dw, dj_db = gradient_function(x, y, w , b)

# Update Parameters using equation (3) above


b = b - alpha * dj_db
w = w - alpha * dj_dw

# Save cost J at each iteration


if i<100000: # prevent resource exhaustion
J_history.append( cost_function(x, y, w , b))
p_history.append([w,b])
# Print the cost at ~10 evenly spaced intervals (every iteration if num_iters < 10)
if i% math.ceil(num_iters/10) == 0:
print(f"Iteration {i:4}: Cost {J_history[-1]:0.2e} ",
f"dj_dw: {dj_dw: 0.3e}, dj_db: {dj_db: 0.3e} ",
f"w: {w: 0.3e}, b:{b: 0.5e}")

return w, b, J_history, p_history #return w and J,w history for graphing


In [ ]: # initialize parameters
w_init = 0
b_init = 0
# some gradient descent settings
iterations = 10000
tmp_alpha = 1.0e-2
# run gradient descent
w_final, b_final, J_hist, p_hist = gradient_descent(x_train, y_train, w_init, b_init, tmp_alpha,
                                                    iterations, compute_cost, compute_gradient)
print(f"(w,b) found by gradient descent: ({w_final:8.4f},{b_final:8.4f})")

Take a moment and note some characteristics of the gradient descent process printed above.

The cost starts large and rapidly declines, as described in the slide from the lecture.
The partial derivatives, dj_dw and dj_db , also get smaller, rapidly at first and then more slowly. As shown in the diagram from the lecture, as the process nears the 'bottom of the bowl' progress is slower due to the smaller value of the derivative at that point.
Progress slows even though the learning rate, alpha, remains fixed. You can confirm this directly from the returned histories; see the sketch below.
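
A small added sketch (assuming the J_hist and p_hist values returned by the run above) that prints early versus late cost values, so you can see the rapid initial drop and the much slower later progress:

In [ ]: # Illustrative: inspect the histories returned by gradient_descent above.
print("first 5 costs:", [f"{c:0.2e}" for c in J_hist[:5]])
print("last 5 costs: ", [f"{c:0.2e}" for c in J_hist[-5:]])
# the per-iteration change in w near the end is tiny even though alpha is unchanged
print("final change in w per step:", abs(p_hist[-1][0] - p_hist[-2][0]))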

Cost versus iterations of gradient descent


A plot of cost versus iterations is a useful measure of progress in gradient descent. Cost should always decrease in successful runs. The change in cost is so rapid initially that it is useful to plot the initial descent on a different scale than the final descent. In the plots below, note the scale of cost on the axes and the iteration step.

In [ ]: # plot cost versus iteration


fig, (ax1, ax2) = plt.subplots(1, 2, constrained_layout=True, figsize=(12, 4))
ax1.plot(J_hist[:100])
ax2.plot(1000 + np.arange(len(J_hist[1000:])), J_hist[1000:])
ax1.set_title("Cost vs. iteration (start)"); ax2.set_title("Cost vs. iteration (end)")
ax1.set_ylabel('Cost') ; ax2.set_ylabel('Cost')
ax1.set_xlabel('iteration step') ; ax2.set_xlabel('iteration step')
plt.show()


Predictions
Now that you have discovered the optimal values for the parameters $w$ and $b$, you can use the model to predict housing values based on the learned parameters. As expected, the predicted values are nearly the same as the training values for the same house sizes. Further, the prediction for a size not in the training set (1200 sqft) is in line with the expected value.

In [ ]: print(f"1000 sqft house prediction {w_final*1.0 + b_final:0.1f} Thousand dollars")
print(f"1200 sqft house prediction {w_final*1.2 + b_final:0.1f} Thousand dollars")
print(f"2000 sqft house prediction {w_final*2.0 + b_final:0.1f} Thousand dollars")

Plotting
You can show the progress of gradient descent during its execution by plotting the
cost over iterations on a contour plot of the cost(w,b).

In [ ]: fig, ax = plt.subplots(1,1, figsize=(12, 6))


plt_contour_wgrad(x_train, y_train, p_hist, ax)

Above, the contour plot shows the cost(w, b) over a range of w and b. Cost levels
are represented by the rings. Overlayed, using red arrows, is the path of gradient
descent. Here are some things to note:

The path makes steady (monotonic) progress toward its goal.
Initial steps are much larger than the steps near the goal.

Zooming in, we can see the final steps of gradient descent. Note the distance between steps shrinks as the gradient approaches zero.

In [ ]: fig, ax = plt.subplots(1,1, figsize=(12, 4))


plt_contour_wgrad(x_train, y_train, p_hist, ax, w_range=[180, 220, 0.5],
contours=[1,5,10,20],resolution=0.5)


Increased Learning Rate

In the lecture, there was a discussion related to the proper value of the learning rate, $\alpha$, in equation (3). The larger $\alpha$ is, the faster gradient descent will converge to a solution. But, if it is too large, gradient descent will diverge. Above you have an example of a solution which converges nicely.

Let's try increasing the value of $\alpha$ and see what happens:

In [ ]: # initialize parameters
w_init = 0
b_init = 0
# set alpha to a large value
iterations = 10
tmp_alpha = 8.0e-1
# run gradient descent
w_final, b_final, J_hist, p_hist = gradient_descent(x_train, y_train, w_init, b_init, tmp_alpha,
                                                    iterations, compute_cost, compute_gradient)

Above, $w$ and $b$ are bouncing back and forth between positive and negative with the absolute value increasing with each iteration. Further, each iteration $\frac{\partial J(w,b)}{\partial w}$ changes sign and cost is increasing rather than decreasing. This is a clear sign that the learning rate is too large and the solution is diverging. Let's visualize this with a plot.

In [ ]: plt_divergence(p_hist, J_hist,x_train, y_train)


plt.show()

Above, the left graph shows w's progression over the first few steps of gradient
descent. w oscillates from positive to negative and cost grows rapidly. Gradient
Descent is operating on both w and b simultaneously, so one needs the 3-D plot on
the right for the complete picture.


Congratulations!
In this lab you:

delved into the details of gradient descent for a single variable.


developed a routine to compute the gradient
visualized what the gradient is
completed a gradient descent routine
utilized gradient descent to find parameters
examined the impact of sizing the learning rate

In [ ]:

Additional Challenge Questions


Q1. [Prediction Accuracy]
Try modifying the number of training examples or adding noisy data to the dataset.
How does this affect the convergence of gradient descent? How does it impact the
accuracy of predictions?
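
One possible starting point for Q1 (an added sketch; the sample count, noise level, and underlying line are arbitrary choices, not part of the assignment) is to generate a larger, noisy data set around the line that fits the original two points and rerun gradient descent on it:

In [ ]: # Illustrative sketch for Q1: noisy data around y = 200x + 100 (arbitrary choices).
rng = np.random.default_rng(0)
x_noisy = rng.uniform(1.0, 2.0, 20)                      # 20 house sizes (1000s of sqft)
y_noisy = 200 * x_noisy + 100 + rng.normal(0, 20, 20)    # underlying line plus noise
w_n, b_n, J_n, p_n = gradient_descent(x_noisy, y_noisy, 0, 0, 1.0e-2, 10000,
                                      compute_cost, compute_gradient)
print(f"fit on noisy data: w = {w_n:0.1f}, b = {b_n:0.1f}, final cost = {J_n[-1]:0.2f}")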

Q2. [Learning Rate Sensitivity]


Try using very small (e.g., 0.0001) or very large (e.g., 1.0) learning rates. Observe the
cost function during training. Does the model converge? Why or why not?

Q3. [Visual Analysis]


Plot the cost function value versus iteration number for different learning rates. What
can you conclude about the relationship between learning rate and convergence
speed/stability?
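
A possible starting point for Q2 and Q3 (an added sketch; the learning rates and iteration count are only examples) is to rerun gradient descent for several values of alpha and plot the resulting cost histories on a single set of axes:

In [ ]: # Illustrative sketch for Q2/Q3: compare cost histories across learning rates.
fig, ax = plt.subplots(figsize=(8, 4))
for alpha in [1.0e-4, 1.0e-2, 8.0e-1]:
    _, _, J_a, _ = gradient_descent(x_train, y_train, 0, 0, alpha, 100,
                                    compute_cost, compute_gradient)
    ax.plot(J_a, label=f"alpha = {alpha}")
ax.set_yscale('log')   # a diverging run would dwarf the converging ones on a linear scale
ax.set_xlabel("iteration"); ax.set_ylabel("cost (log scale)")
ax.legend(); plt.show()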
