Training Models
Mohamed Yasser
Topics
1) Training model
2) Computational complexity
3) Gradient descent
4) Batch gradient descent
5) Stochastic gradient descent
6) Mini-batch gradient descent
7) Polynomial regression
Training model
Training a model refers to the process of teaching a machine learning algorithm to make predictions or decisions based on input data. The goal is to enable the model to generalize well to new, unseen data.
Computational complexity
In machine learning, computational
complexity refers to the amount of
computational resources, such as time
and memory, required by an algorithm
to perform a certain task as a function of
the input size. It is concerned with
analyzing the efficiency and scalability
of machine learning algorithms.
Example:
- Predicting with the KNN algorithm requires sorting the distances to all n training points, which takes O(n log n) (see the sketch below).
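A minimal sketch of where that O(n log n) comes from; the function name, data, and choice of k below are illustrative assumptions, not from the slides:

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    # Distance to every training point: O(n) work.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Sorting all n distances dominates the cost: O(n log n).
    nearest = np.argsort(dists)[:k]
    # Majority vote among the k nearest labels.
    return np.bincount(y_train[nearest]).argmax()

X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([5.5, 5.0])))  # -> 1
```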
Linear model
Linear regression is a statistical method and a
fundamental algorithm in machine learning used for
modeling the relationship between a dependent variable
(target) and one or more independent variables
(features). The basic idea is to find the best-fitting linear
relationship (a straight line) that represents the data. This
relationship is expressed in the form of a linear equation.
The equation for simple linear regression with one
independent variable is:
y = mx + b
y: dependent variable (target)
x: independent variable (feature)
m: slope
b: bias (intercept)
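As a quick illustration of recovering m and b from data, here is a minimal least-squares sketch; the synthetic data and the use of np.polyfit are assumptions for illustration:

```python
import numpy as np

# Synthetic data following y = 2x + 1 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=50)

# polyfit with degree 1 fits the best straight line y = mx + b.
m, b = np.polyfit(x, y, 1)
print(f"slope m ~ {m:.2f}, bias b ~ {b:.2f}")  # close to 2 and 1
```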
Gradient descent
Gradient Descent is an iterative optimization
algorithm commonly used in machine learning to
minimize a cost function. Its primary purpose is to find
the optimal parameters for a model by adjusting
them in the direction that leads to a reduction in the
cost.
The general idea behind Gradient Descent is to
iteratively move towards the minimum of a function
by taking steps proportional to the negative of the
gradient (derivative) of the function at the current
point.
Formula
θ := θ − α · ∂J(θ)/∂θ
θ: the parameter or weight being optimized.
α: the learning rate, a positive constant determining the step size of the update.
∂J(θ)/∂θ: the partial derivative of the cost function J(θ) with respect to θ.
How it works
1. Initialize weights randomly.
2. Calculate the predicted output.
3. Compute the loss between the predicted and actual output.
4. Calculate the gradient of the loss with respect to the weights.
5. Update the weights using the gradient and a learning rate.
6. Repeat steps 2-5 until convergence or a specified number of iterations (see the sketch below).
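A minimal sketch of these six steps for simple linear regression with a mean-squared-error cost; the data, learning rate, and variable names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=100)
y = 3.0 * x + 0.5 + rng.normal(scale=0.1, size=100)

m, b = rng.normal(), rng.normal()    # step 1: random initialization
alpha = 0.1                          # learning rate

for _ in range(1000):                # step 6: repeat
    y_pred = m * x + b               # step 2: predicted output
    error = y_pred - y               # step 3: loss is mean(error**2)
    grad_m = 2 * np.mean(error * x)  # step 4: gradient of the loss
    grad_b = 2 * np.mean(error)
    m -= alpha * grad_m              # step 5: update the weights
    b -= alpha * grad_b

print(f"m ~ {m:.2f}, b ~ {b:.2f}")   # should approach 3 and 0.5
```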
Types of gradient descent
✓ Batch Gradient Descent (BGD)
✓ Stochastic Gradient Descent (SGD)
✓ Mini-Batch Gradient Descent
Batch gradient descent
In Batch Gradient Descent, the entire
training dataset is used to compute the
gradient of the cost function with respect
to the model parameters in each iteration.
The algorithm calculates the average
gradient for the entire dataset and then
updates the model parameters.
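A sketch of the idea, assuming a linear model and MSE cost; every update below averages the gradient over the entire dataset (names and data are illustrative):

```python
import numpy as np

def batch_gd_step(X, y, w, alpha=0.1):
    # One iteration uses ALL training examples:
    # average gradient of the MSE cost over the whole dataset.
    grad = 2 * X.T @ (X @ w - y) / len(y)
    return w - alpha * grad

# Tiny illustrative dataset: a column of ones carries the bias term.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.5, 2.5, 4.5])  # roughly y = 2x + 0.5
w = np.zeros(2)
for _ in range(500):
    w = batch_gd_step(X, y, w)
print(w)  # approaches [0.5, 2.0]
```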
Stochastic gradient descent
In Stochastic Gradient Descent, the model
parameters are updated after computing
the gradient of the cost function with
respect to the parameters for each
training example individually. The key
difference from Batch Gradient Descent is
that it uses only one training example at a
time for the update.
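For contrast, a sketch where each update comes from a single training example (same illustrative dataset as the batch sketch):

```python
import numpy as np

def sgd_epoch(X, y, w, alpha=0.05):
    # One pass over the data: one update per individual example.
    for i in np.random.permutation(len(y)):  # shuffle each epoch
        grad = 2 * X[i] * (X[i] @ w - y[i])  # gradient from ONE example
        w = w - alpha * grad
    return w

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.5, 2.5, 4.5])
w = np.zeros(2)
for _ in range(200):
    w = sgd_epoch(X, y, w)
print(w)  # noisy, but close to [0.5, 2.0]
```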
Mini batch gradient descent
Mini-Batch Gradient Descent is a
compromise between Batch Gradient
Descent and Stochastic Gradient Descent.
Instead of using the entire training
dataset (Batch GD) or just one example
(SGD), Mini-Batch GD processes a small
random subset of the training data (a
mini-batch) in each iteration.
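A sketch where each update averages the gradient over a small random mini-batch; the batch size and data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def minibatch_gd_step(X, y, w, batch_size=2, alpha=0.1):
    # Sample a small random subset (mini-batch) of the data.
    idx = rng.choice(len(y), size=batch_size, replace=False)
    # Average the gradient over the mini-batch only.
    grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
    return w - alpha * grad

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.5, 2.5, 4.5])
w = np.zeros(2)
for _ in range(1000):
    w = minibatch_gd_step(X, y, w)
print(w)  # approaches [0.5, 2.0]
```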
Polynomial regression
Polynomial regression is a type of
regression analysis in which the
relationship between the independent
variable x and the dependent variable y is
modeled as an n-th degree polynomial.
Instead of fitting a straight line (as in linear
regression), polynomial regression uses a
polynomial equation to capture the
nonlinear relationships between variables.
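As a sketch, numpy can fit such a polynomial directly; the degree and synthetic data below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 60)
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.3, size=60)

# Fit a 2nd-degree polynomial y = c2*x^2 + c1*x + c0.
coeffs = np.polyfit(x, y, deg=2)
print(coeffs)  # approximately [0.5, -2.0, 1.0], highest degree first
```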
Thank you