Linear Regression Model

Aim: Predicting Sales from Advertisement Data Using Linear Regression for Smart Business Modeling
Software Tools: Google Colab
Libraries: pandas (pd) → For reading the CSV file and handling tabular data.
numpy (np) → For numerical calculations (mean, sum, array operations).
matplotlib.pyplot (plt) → For visualizing the dataset and regression results.

Description: This experiment demonstrates predicting sales from advertisement data using a simple linear
regression model implemented from scratch.
1. Data Upload & Loading → The Advertising.csv file is uploaded, cleaned (column names
stripped of spaces), and displayed.
2. Feature Selection → TV advertising budget is chosen as the input feature (X) and sales as the
output target (Y).
3. Data Splitting → The dataset is split manually by row position: the first 150 rows form the
training set and the remaining rows form the test set.
4. Model Building →
o Slope (m) and intercept (c) are calculated from the training data using the least-squares formulas.
o The regression line is defined as y = mx + c.
5. Visualization → Scatter plots show actual data points and the fitted regression line.
6. Prediction → Sales prediction is made for a TV ad budget of ₹50,000.
7. Model Testing → Predicted sales are compared with actual test set sales using a scatter plot.
8. Performance Evaluation →
o Mean Squared Error (MSE) measures the average squared difference between actual and predicted sales.
o R-squared indicates the proportion of variance in sales that the model explains.
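
The formulas behind steps 4 and 8 are not written out in the description. The standard least-squares and evaluation formulas are given below as a reference sketch, where x̄ and ȳ denote sample means, ŷ_i = m x_i + c is the predicted value, and n is the number of points being evaluated (this notation is introduced here, not taken from the original):

```latex
m = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i}(x_i - \bar{x})^2},
\qquad
c = \bar{y} - m\,\bar{x}

\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2,
\qquad
R^2 = 1 - \frac{\sum_{i}(y_i - \hat{y}_i)^2}{\sum_{i}(y_i - \bar{y})^2}
```

For a least-squares line evaluated on its own training data, this R-squared equals the squared correlation between x and y; on a separate test set the two quantities can differ.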

Code:

# Step 1: Upload your Advertising.csv file
from google.colab import files
uploaded = files.upload()

# Step 2: Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Step 3: Load dataset
# Get the uploaded file name
uploaded_file_name = list(uploaded.keys())[0]
data = pd.read_csv(uploaded_file_name)

# Clean column names in case there are spaces or casing issues
data.columns = data.columns.str.strip()

# Display the dataset
print("First 5 rows of dataset:")
print(data.head())
print("Column Names:", data.columns)

# Step 4: Extract Feature (TV) and Target (Sales) variables
x = data['TV'].values
y = data['Sales'].values

# Step 5: Split dataset into training and test sets
x_train = x[:150]
x_test = x[150:]
y_train = y[:150]
y_test = y[150:]

# Step 6: Define helper functions
def errors_product(x, y):
    # Sum of products of deviations from the mean (numerator of the slope)
    return np.sum((x - np.mean(x)) * (y - np.mean(y)))

def squared_errors(x):
    # Sum of squared deviations from the mean (denominator of the slope)
    return np.sum((x - np.mean(x))**2)

# Step 7: Calculate slope and intercept
slope = errors_product(x_train, y_train) / squared_errors(x_train)
intercept = np.mean(y_train) - slope * np.mean(x_train)

print(f"Slope: {slope}")
print(f"Intercept: {intercept}")

# Step 8: Plot the best fit regression line
plt.figure(figsize=(8, 5))
# Label each artist directly so the legend does not depend on drawing order
plt.scatter(x, y, color='red', marker='o', label='Data')
plt.plot(x, slope * x + intercept, color='black', linewidth=2, label='Best Fit Line')
plt.title('Regression Line: TV Advertisement vs Sales')
plt.xlabel('TV Advertisement Expense (₹1000s)')
plt.ylabel('Sales (Units in 1000s)')
plt.legend()
plt.grid(True)
plt.show()

# Step 9: Predict sales for Rs. 50,000 spent on TV ads
def sales_predicted(tv_budget_k):
    # tv_budget_k is the TV budget in thousands; the result is sales in thousands of units
    return slope * tv_budget_k + intercept

predicted_sales = sales_predicted(50) * 1000  # convert thousands of units to units

print(f"Predicted Sales for Rs 50,000 spent on TV ads: {predicted_sales}")

# Step 10: Compare original and predicted test data
y_predicted = slope * x_test + intercept

plt.figure(figsize=(8, 5))
plt.scatter(x_test, y_test, color='red', marker='o')
plt.scatter(x_test, y_predicted, color='black', marker='+')
plt.title('Original vs Predicted Sales (Test Set)')
plt.xlabel('TV Advertisement Expense (₹1000s)')
plt.ylabel('Sales (Units in 1000s)')
plt.legend(['Original', 'Predicted'])
plt.grid(True)
plt.show()

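# Step 11: Evaluate model performance
mean_error = np.mean((y_test - y_predicted)**2)

# R-squared as 1 - SS_res / SS_tot. The squared correlation between y_test and
# y_predicted does not depend on the fitted intercept or on the size of the
# slope, so it cannot detect a badly fitted line.
ss_res = np.sum((y_test - y_predicted)**2)
ss_tot = np.sum((y_test - np.mean(y_test))**2)
r_squared = 1 - ss_res / ss_tot

print(f"Mean Squared Error: {mean_error}")
print(f"R-squared Value: {r_squared}")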
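
Two points in the listing above can be sanity-checked without the CSV file: the closed-form slope and intercept, and the way R-squared is computed. The sketch below uses synthetic data as a stand-in for the TV and Sales columns (the data, the seed, and the helper names r2_determination and r2_correlation are assumptions made for illustration). It compares the closed-form estimates with np.polyfit and contrasts the two common ways of computing R-squared.

```python
import numpy as np

# Synthetic stand-in for the TV/Sales columns (an assumption, not the real data).
rng = np.random.default_rng(0)
x = rng.uniform(0, 300, size=200)
y = 7.0 + 0.05 * x + rng.normal(0, 3, size=200)

# Same positional split as in Step 5.
x_train, x_test = x[:150], x[150:]
y_train, y_test = y[:150], y[150:]

# Closed-form least-squares estimates, as in Steps 6 and 7.
x_dev = x_train - x_train.mean()
y_dev = y_train - y_train.mean()
slope = np.sum(x_dev * y_dev) / np.sum(x_dev ** 2)
intercept = y_train.mean() - slope * x_train.mean()

# For degree 1, np.polyfit returns the coefficients as [slope, intercept].
ref_slope, ref_intercept = np.polyfit(x_train, y_train, 1)


def r2_determination(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot


def r2_correlation(y_true, y_pred):
    """Squared Pearson correlation between actual and predicted values."""
    return np.corrcoef(y_true, y_pred)[0, 1] ** 2


good_pred = slope * x_test + intercept
# A deliberately wrong line: doubled slope and shifted intercept.
bad_pred = 2 * slope * x_test + intercept + 10

print("closed form :", slope, intercept)
print("np.polyfit  :", ref_slope, ref_intercept)
print("corr^2      good vs bad:", r2_correlation(y_test, good_pred),
      r2_correlation(y_test, bad_pred))
print("1 - SSres/SStot good vs bad:", r2_determination(y_test, good_pred),
      r2_determination(y_test, bad_pred))
```

In this sketch the squared correlation is the same for the good and the deliberately wrong line, because correlation ignores any positive rescaling or shift of the predictions, while 1 - SS_res/SS_tot drops sharply for the wrong line. This is why the second form is the safer choice when scoring predictions on a test set.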

Output:
Saving Advertising.csv to Advertising (3).csv
First 5 rows of dataset:
   Unnamed: 0     TV  Radio  Newspaper  Sales
0           1  230.1   37.8       69.2   22.1
1           2   44.5   39.3       45.1   10.4
2           3   17.2   45.9       69.3    9.3
3           4  151.5   41.3       58.5   18.5
4           5  180.8   10.8       58.4   12.9
Column Names: Index(['Unnamed: 0', 'TV', 'Radio', 'Newspaper', 'Sales'], dtype='object')
Slope: 0.04906288039571123
Intercept: 7.110732084446855
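
The printed slope and intercept are enough to reproduce the Step 9 prediction by hand. The short sketch below plugs the two values shown above into y = mx + c for a TV budget of 50 (that is, ₹50,000 expressed in thousands) and rescales the result from thousands of units to units, following the axis labels used in the plots. The variable names are illustrative.

```python
# Values copied from the printed output above.
slope = 0.04906288039571123
intercept = 7.110732084446855

tv_budget_k = 50  # Rs 50,000 expressed in thousands

# Predicted sales in thousands of units, then rescaled to units.
sales_k = slope * tv_budget_k + intercept
predicted_sales = sales_k * 1000

print(f"Predicted sales (thousands of units): {sales_k:.4f}")
print(f"Predicted sales (units): {predicted_sales:.2f}")
```

With these values the line gives roughly 9.56 thousand units, or about 9,564 units, for a ₹50,000 TV budget.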
