Linear Regression
Group 4
What is Linear Regression?
Linear regression is an algorithm that models a linear relationship between an independent variable and a dependent variable.
It is used to predict the outcomes of future events.
Note:
Independent variable: the predictor variable (X)
Dependent variable: the outcome variable (Y)
Purpose: predict continuous (numeric) variables (e.g., sales, salary, price)
Best Fit Line for a Linear Regression Model
[Figure: scatter plot with the regression line]
X-axis: independent variable
Y-axis: dependent (output) variable
Regression line: the best-fit line for the model
Key Benefits of Linear Regression
1. Easy implementation
The linear regression model is computationally simple to implement, as it does not require significant engineering
overhead, either before the model launch or during its maintenance.
2. Interpretability
Unlike deep learning models such as neural networks, linear regression is relatively straightforward to interpret. As a result,
this algorithm stands out against black-box models, which cannot explain how input variables influence changes in the
output variable.
3. Scalability
Linear regression is not computationally intensive and, therefore, functions well in scenarios where scaling is
crucial. For example, the model scales effectively with increased data volume (big data).
4. Optimal for online settings
The ease of computation associated with these algorithms allows them to be utilized in online settings. The model
can be trained and retrained with each new example to generate predictions in real-time, unlike neural networks or
support vector machines, which are computationally intensive and require substantial computing resources and
significant waiting time to retrain on a new dataset. All these factors render compute-intensive models expensive and
unsuitable for real-time applications.
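As a quick illustration of this benefit, here is a minimal sketch of online (incremental) training, assuming scikit-learn's SGDRegressor; the simulated data stream and learning-rate settings are illustrative, not part of the slides.

import numpy as np
from sklearn.linear_model import SGDRegressor

# Linear model trained by stochastic gradient descent, one example at a time.
model = SGDRegressor(learning_rate="constant", eta0=0.01)

# Simulate a stream of (x, y) examples arriving one at a time.
rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.uniform(0, 10, size=(1, 1))           # one new example
    y = 3.0 * x.ravel() + 2.0 + rng.normal(0, 1)  # noisy linear signal
    model.partial_fit(x, y)                       # update the weights in place

print(model.coef_, model.intercept_)  # should approach [3.0] and 2.0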
Linear Regression Equation
Equation:
Y = mX + b
Where:
Y: Dependent variable
X: Independent variable
m: Slope (rate of change)
b: Y-axis intercept
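As a worked example of this equation, here is a minimal sketch that estimates m and b by ordinary least squares using only NumPy; the data points are invented for illustration.

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.3, 6.2, 8.0, 9.9])

# Closed-form least-squares estimates:
#   m = cov(X, Y) / var(X)
#   b = mean(Y) - m * mean(X)
m = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b = Y.mean() - m * X.mean()

print(f"Y = {m:.2f}X + {b:.2f}")  # roughly Y = 1.93X + 0.31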
Multiple Linear Regression Equation
Equation:
y(x) = p₀ + p₁x₁ + p₂x₂ + ... + pₙxₙ
Where:
y(x): the predicted value of the dependent variable (the variable you're trying to predict).
p₀: the y-intercept; the value of y when all the independent variables are zero.
p₁, p₂, ..., pₙ: the regression coefficients, also known as slopes. Each represents the change in the predicted value of y for a one-unit increase in the corresponding independent variable, holding all other independent variables constant.
x₁, x₂, ..., xₙ: the independent variables (also called predictors or regressors) used to predict the value of y.
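To make the equation concrete, here is a minimal sketch of multiple linear regression, assuming scikit-learn's LinearRegression; the features echo the blood-pressure example later in the slides, but all numbers are invented.

import numpy as np
from sklearn.linear_model import LinearRegression

# Each row: [height_cm, weight_kg, exercise_hours_per_week]
X = np.array([
    [170, 70, 2],
    [160, 55, 5],
    [180, 90, 0],
    [175, 80, 3],
    [165, 60, 4],
])
y = np.array([120, 110, 140, 130, 112])  # systolic blood pressure (made up)

model = LinearRegression().fit(X, y)
print("p0 (intercept):", model.intercept_)
print("p1..pn (coefficients):", model.coef_)
print("prediction:", model.predict([[172, 75, 1]]))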
The Concept of MSE (Mean Squared Error)
Equation:
MSE = (1/N) Σᵢ₌₁ᴺ (yᵢ − ŷᵢ)²
Where:
N: the number of data points.
yᵢ: the actual value.
ŷᵢ: the predicted value.
Purpose: measures the average squared difference between actual and predicted values.
Objective: minimize MSE for better model accuracy.
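A minimal sketch of computing MSE with NumPy; the actual and predicted values below are toy data.

import numpy as np

y_actual = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.4, 6.9, 9.5])

# Mean of the squared differences between actual and predicted values.
mse = np.mean((y_actual - y_pred) ** 2)
print(mse)  # (0.2² + 0.4² + 0.1² + 0.5²) / 4 = 0.115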
Types of Linear Regression
Simple Linear Regression: One independent variable predicting a dependent variable.
Example: Pollution levels vs. temperature.
Multiple Linear Regression: Multiple independent variables predict a dependent variable.
Example: Blood pressure prediction using height, weight, and exercise.
Types of Linear Regression (cont'd)
Logistic Regression: Predicts binary outcomes (0 or 1).
Example: Likelihood of clicking on an offer (see the sketch after this list).
Ordinal Regression: Predicts an ordered categorical variable.
Example: Survey responses (Agree, Strongly Agree, etc.).
Multinomial Logistic Regression: Predicts outcomes with more than two categories.
Example: Predicting program choice (Vocational, Sports, Academic).
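As a sketch of the logistic-regression case above (binary outcomes), assuming scikit-learn; the click data is invented for illustration.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Feature: seconds spent viewing an offer; label: 1 = clicked, 0 = did not click.
X = np.array([[2], [5], [8], [12], [15], [20]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
print(clf.predict([[10]]))        # predicted class (0 or 1)
print(clf.predict_proba([[10]]))  # probabilities for each class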
Conclusion
Linear regression is fundamental in predictive analytics and machine learning.
It is simple, scalable, and interpretable.
Proper assumptions and best practices ensure accurate models.
Linear regression remains one of the most powerful and commonly used methods in data science.