Assignment 1: Linear and Logistic Regression
SET A
1) Create a 'sales' data set with 5 columns, namely ID, TV, Radio, Newspaper
and Sales (500 random entries). Identify the independent and target variables,
split them into training and testing sets in a 7:3 ratio, and print the sets.
Build a simple linear regression model.
Program:-
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
# Step 1: Create the sales dataset
np.random.seed(42)
ID = np.arange(1, 501)
TV = np.random.uniform(0, 100, 500)
Radio = np.random.uniform(0, 50, 500)
Newspaper = np.random.uniform(0, 30, 500)
Sales = 3 + 0.05 * TV + 0.1 * Radio + 0.02 * Newspaper + np.random.normal(0, 5, 500)
sales_data = pd.DataFrame({
'ID': ID,
'TV': TV,
'Radio': Radio,
'Newspaper': Newspaper,
'Sales': Sales
})
# Step 2: Split the data into independent (X) and target (y) variables
X = sales_data[['TV', 'Radio', 'Newspaper']]
y = sales_data['Sales']
# Step 3: Split the dataset into training and testing sets (7:3 ratio)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Step 4: Print the split data
print("Training set (X_train):")
print(X_train.head())
print("Testing set (X_test):")
print(X_test.head())
# Step 5: Build the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Step 6: Make predictions
y_pred = model.predict(X_test)
# Print the coefficients
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
# Step 7: Plot the results
plt.scatter(y_test, y_pred)
plt.xlabel("Actual Sales")
plt.ylabel("Predicted Sales")
plt.title("Linear Regression: Actual vs Predicted Sales")
plt.show()
Output:-
Example output for training set:
Training set (X_train):
            TV      Radio  Newspaper
374   4.537760  25.451522   9.047601
28   70.243315  25.989796  22.231161
456  80.651719  44.563722  12.669033
209  60.330544  16.218829  26.485149
431  96.945695  27.497699  18.827547
Example output for testing set:
Testing set (X_test):
            TV      Radio  Newspaper
80    8.139962  43.664348   3.476083
125  45.285008  15.660353  28.916305
225  65.058937  27.791765   3.798982
282  72.334036  48.151510  12.084336
305  55.535741  37.179261   9.443671
Coefficients and Intercept: After training the model, you will see the model's coefficients
and intercept printed, showing the relationship between the independent variables and the
target (sales).
Example output:
Coefficients: [0.05023864 0.09843639 0.02031991]
Intercept: 3.0009676921841325
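To make the meaning of the coefficients and intercept concrete, the sketch below rebuilds a prediction by hand as intercept + (coefficient x feature), summed over the three features, and checks it against model.predict. The data is regenerated in the same way as in the program above; np.random.default_rng and the names rng and manual_pred are assumptions made for this illustration and are not part of the original program.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic advertising data, generated like the 'sales' dataset above
rng = np.random.default_rng(42)
X = np.column_stack([
    rng.uniform(0, 100, 500),  # TV
    rng.uniform(0, 50, 500),   # Radio
    rng.uniform(0, 30, 500),   # Newspaper
])
y = 3 + 0.05 * X[:, 0] + 0.1 * X[:, 1] + 0.02 * X[:, 2] + rng.normal(0, 5, 500)

model = LinearRegression().fit(X, y)

# A prediction is the intercept plus the weighted sum of the features
manual_pred = model.intercept_ + X @ model.coef_

# Prints True when the hand-computed values match model.predict
print(np.allclose(manual_pred, model.predict(X)))
```

Each coefficient can therefore be read as the expected change in Sales for a one-unit increase in that feature, with the other features held fixed.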
2) Create 'realestate' Data set having 4 columns namely: ID, flat, houses and
purchases (random 500 entries). Build a linear regression model by
identifying independent and target variable. Split the variables into training
and testing sets and print them. Build a simple linear regression model for
predicting purchases.
Program:-
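import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
# Step 1: Create the real estate dataset
ID = np.arange(1, 501)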
flat = np.random.uniform(50, 200, 500)
houses = np.random.uniform(1, 10, 500)
purchases = 200 + 1.5 * flat + 3 * houses + np.random.normal(0, 50, 500)
realestate_data = pd.DataFrame({
'ID': ID,
'flat': flat,
'houses': houses,
'purchases': purchases
})
# Step 2: Split the data into independent (X) and target (y) variables
X_realestate = realestate_data[['flat', 'houses']]
y_realestate = realestate_data['purchases']
# Step 3: Split the dataset into training and testing sets
X_train_realestate, X_test_realestate, y_train_realestate, y_test_realestate = train_test_split(
    X_realestate, y_realestate, test_size=0.3, random_state=42)
# Step 4: Print the split data
print("Training set (X_train_realestate):")
print(X_train_realestate.head())
print("Testing set (X_test_realestate):")
print(X_test_realestate.head())
# Step 5: Build the linear regression model
model_realestate = LinearRegression()
model_realestate.fit(X_train_realestate, y_train_realestate)
# Step 6: Make predictions
y_pred_realestate = model_realestate.predict(X_test_realestate)
# Print the coefficients
print("Coefficients:", model_realestate.coef_)
print("Intercept:", model_realestate.intercept_)
# Step 7: Plot the results
plt.scatter(y_test_realestate, y_pred_realestate)
plt.xlabel("Actual Purchases")
plt.ylabel("Predicted Purchases")
plt.title("Linear Regression: Actual vs Predicted Purchases")
plt.show()
Output:-
Example structure of the dataset (illustrative values, not actual program output):
ID flat houses purchases
1 150.5 5.2 853.0
2 130.0 3.1 725.5
3 178.9 8.7 935.8
4 124.3 4.5 688.2
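Since head() shows only the first five rows of each set, printing the shapes is a quick way to confirm the split. The sketch below assumes a stand-in data frame built like the realestate data above; the names rng and data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the 500-row realestate data
rng = np.random.default_rng(0)
data = pd.DataFrame({
    'flat': rng.uniform(50, 200, 500),
    'houses': rng.uniform(1, 10, 500),
})
data['purchases'] = (200 + 1.5 * data['flat'] + 3 * data['houses']
                     + rng.normal(0, 50, 500))

X = data[['flat', 'houses']]
y = data['purchases']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# With 500 rows and test_size=0.3 the split is 350 training / 150 testing rows
print("Train:", X_train.shape, y_train.shape)
print("Test: ", X_test.shape, y_test.shape)
```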
3) Create a 'User' data set with 5 columns, namely User ID, Gender, Age,
EstimatedSalary and Purchased. Build a logistic regression model that
predicts, from the given parameters, whether a person will buy a car or not.
Program:-
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score
# Step 1: Create the User dataset
user_id = np.arange(1, 501)
gender = np.random.choice(['Male', 'Female'], 500)
age = np.random.randint(18, 70, 500)
estimated_salary = np.random.uniform(15000, 120000, 500)
purchased = np.random.choice([0, 1], 500)
user_data = pd.DataFrame({
'User ID': user_id,
'Gender': gender,
'Age': age,
'EstimatedSalary': estimated_salary,
'Purchased': purchased
})
# Step 2: Encode categorical 'Gender' feature
le = LabelEncoder()
user_data['Gender'] = le.fit_transform(user_data['Gender'])
# Step 3: Split the data into independent (X) and target (y) variables
X_user = user_data[['Age', 'EstimatedSalary', 'Gender']]
y_user = user_data['Purchased']
# Step 4: Split the dataset into training and testing sets
X_train_user, X_test_user, y_train_user, y_test_user = train_test_split(X_user, y_user,
test_size=0.3, random_state=42)
# Step 5: Build the logistic regression model
log_reg_model = LogisticRegression()
log_reg_model.fit(X_train_user, y_train_user)
# Step 6: Make predictions
y_pred_user = log_reg_model.predict(X_test_user)
# Step 7: Print accuracy
accuracy = accuracy_score(y_test_user, y_pred_user)
print("Accuracy of the Logistic Regression Model:", accuracy)
Output:-
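Accuracy of the Logistic Regression Model: close to 0.5 (Purchased is generated at random, so the exact value varies from run to run but stays near chance level)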
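For the model to have something to learn, Purchased has to depend on the features. The sketch below is one possible variant, not the original program: it assumes a made-up rule in which the purchase probability rises with Age and EstimatedSalary, and it standardises the features before fitting. The rule, its constants and the names z and clf are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
age = rng.integers(18, 70, 500)
salary = rng.uniform(15000, 120000, 500)

# Assumed rule: purchase probability rises with age and salary
z = 0.15 * (age - 44) + 0.00006 * (salary - 67500)
purchased = (rng.random(500) < 1 / (1 + np.exp(-z))).astype(int)

X = np.column_stack([age, salary])
X_train, X_test, y_train, y_test = train_test_split(
    X, purchased, test_size=0.3, random_state=42)

# Standardising puts age and salary on a comparable scale,
# which helps the solver converge
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
print("Accuracy:", accuracy)
```

With labels tied to the features in this way, the accuracy should sit clearly above the 0.5 chance level; the exact figure depends on the assumed rule and the random draw.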
SET B
1) Build a simple linear regression model for Fish Species Weight Prediction.
(Download the dataset from https://www.kaggle.com/aungpyaeap/fish-market?select=Fish.csv)
Program:-
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
# Step 1: Load the fish dataset
fish_data = pd.read_csv('Fish.csv')
# Step 2: Split the data into independent (X) and target (y) variables
# Fish.csv names its length columns Length1, Length2 and Length3; there is no 'Length' column
X_fish = fish_data[['Length1', 'Width', 'Height']]
y_fish = fish_data['Weight']
# Step 3: Split the dataset into training and testing sets
X_train_fish, X_test_fish, y_train_fish, y_test_fish = train_test_split(X_fish, y_fish,
test_size=0.3, random_state=42)
# Step 4: Build the linear regression model
fish_model = LinearRegression()
fish_model.fit(X_train_fish, y_train_fish)
# Step 5: Make predictions
y_pred_fish = fish_model.predict(X_test_fish)
# Print the coefficients
print("Coefficients:", fish_model.coef_)
print("Intercept:", fish_model.intercept_)
Output:-
   Length1  Length2  Length3  Height  Width  Weight
0     23.2     25.4     30.0    11.5    4.0   242.0
1     24.0     26.3     31.2    12.0    4.8   290.0
2     23.9     26.5     31.1    12.2    4.8   340.0
3     26.3     29.0     33.5    12.4    5.0   363.0
4     26.5     29.0     34.0    12.5    4.9   430.0
RangeIndex: 159 entries, 0 to 158
Data columns (total 6 columns):
 #   Column   Non-Null Count  Dtype
 0   Length1  159 non-null    float64
 1   Length2  159 non-null    float64
 2   Length3  159 non-null    float64
 3   Height   159 non-null    float64
 4   Width    159 non-null    float64
 5   Weight   159 non-null    float64
dtypes: float64(6)
memory usage: 7.6 KB
None
Mean Squared Error: 2746.50
R squared: 0.885
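The output above reports a mean squared error and an R-squared value, which the listed program does not compute. A minimal sketch of how such metrics could be obtained with sklearn.metrics follows. Because Fish.csv has to be downloaded separately, the sketch runs on synthetic stand-in data, so its numbers will not match the ones shown; the feature weights and the names rng, mse and r2 are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for fish measurements (NOT the Kaggle data)
rng = np.random.default_rng(0)
X = rng.uniform(5, 40, size=(159, 3))  # three hypothetical size features
y = 10 * X[:, 0] + 5 * X[:, 1] + 2 * X[:, 2] + rng.normal(0, 20, 159)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# MSE: average squared difference between actual and predicted values
mse = mean_squared_error(y_test, y_pred)
# R squared: share of the variance in y explained by the model
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:.2f}")
print(f"R squared: {r2:.3f}")
```

For the fish program itself, the same two calls would take y_test_fish and y_pred_fish as arguments.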
2) Use the iris dataset. Write a Python program to view some basic statistical
details like percentile, mean, std etc. of the species 'Iris-setosa',
'Iris-versicolor' and 'Iris-virginica'. Apply logistic regression on the dataset to
identify the different species (setosa, versicolor, virginica) of Iris flowers given
just 4 features: sepal and petal lengths and widths. Find the accuracy of the
model.
Program:-
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Step 1: Load the Iris dataset
iris = load_iris()
X_iris = iris.data
y_iris = iris.target
# Step 2: Split the dataset into training and testing sets
X_train_iris, X_test_iris, y_train_iris, y_test_iris = train_test_split(X_iris, y_iris, test_size=0.3,
random_state=42)
# Step 3: Build the logistic regression model
log_reg_iris = LogisticRegression(max_iter=200)
log_reg_iris.fit(X_train_iris, y_train_iris)
# Step 4: Make predictions
y_pred_iris = log_reg_iris.predict(X_test_iris)
# Step 5: Calculate accuracy
accuracy_iris = accuracy_score(y_test_iris, y_pred_iris)
print("Accuracy of Logistic Regression Model for Iris Dataset:", accuracy_iris)
Output:-
Accuracy of Logistic Regression Model for Iris Dataset: 0.9777777777777777
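The program above covers the classification part of the task only. One way to view the basic statistical details per species is sketched below, using pandas describe() on each species group. The data frame name iris_df and the 'species' column are assumptions introduced for this example.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
# Map the numeric targets (0, 1, 2) to the species names
iris_df['species'] = iris.target_names[iris.target]

# describe() reports count, mean, std, min, quartiles and max
for name, group in iris_df.groupby('species'):
    print(f"--- Iris-{name} ---")
    print(group.drop(columns='species').describe())
```

The 25%, 50% and 75% rows of each table are the percentiles asked for in the task.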