Machine Learning

Uploaded by

Shrenik Pittala
Notebook

January 1, 2025

Linear Regression

[19]: import numpy as np


import matplotlib.pyplot as plt

# Data points
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([5, 8, 9, 11, 20, 16, 17, 18, 21, 26])

# Number of observations
n = len(x)

# Mean values of x and y


mean_x = np.mean(x)
mean_y = np.mean(y)

# Calculate coefficients b1 and b0


numerator = np.sum(x * y) - (n * mean_x * mean_y)
denominator = np.sum(x**2) - (n * mean_x**2)

b1 = numerator / denominator
b0 = mean_y - b1 * mean_x

print("Estimated coefficients are:")


print(f"b0 = {b0}")
print(f"b1 = {b1}")

# Scatter plot
plt.scatter(x, y, color="b", label='Data', marker="o", s=100)

# Regression line
y_pred = b0 + b1 * x
plt.plot(x, y_pred, color='red', label='Regression Line', markersize=10)

plt.xlabel('x')
plt.ylabel('y')
plt.title("Simple Linear Regression", fontsize=30, color="magenta")
plt.legend()

plt.show()

Estimated coefficients are:


b0 = 3.799999999999999
b1 = 2.0545454545454547
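
As a cross-check on the closed-form estimates above, the same line can be fitted with np.polyfit, which solves the degree-1 least-squares problem. This is a minimal sketch using the data from the cell; the R-squared calculation is an addition that the original cell does not perform.

```python
import numpy as np

# Same data points as in the cell above
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([5, 8, 9, 11, 20, 16, 17, 18, 21, 26])

# np.polyfit returns coefficients from highest degree to lowest: [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)

# Coefficient of determination (an addition, not computed in the original cell)
y_hat = intercept + slope * x
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - np.mean(y)) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}, R^2 = {r_squared:.4f}")
```

The slope and intercept should agree with b1 and b0 up to floating-point rounding.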

Multiple Linear Regression

[15]: import pandas as pd


import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Load the dataset from a CSV file
file_path = r"C:\Users\P. Shrenik Kumar\Downloads\Housing.csv"  # Replace with your CSV file path
data = pd.read_csv(file_path)
print(data)
# Display the first few rows of the dataset
print(data.head())
# Define the independent variables (features) and the dependent variable (target)

X = data[['area', 'bedrooms', 'bathrooms']]
y = data['price']
# Split the data into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the linear regression model


model = LinearRegression()
# Train the model
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
# Output the model evaluation metrics
print(f'Mean Squared Error: {mse}')
print(f'R-squared: {r2}')
# Plot Actual vs Predicted
plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, color='blue')
plt.axline((0,0),slope=1,color='red')

price area bedrooms bathrooms stories mainroad guestroom basement \


0 13300000 7420 4 2 3 yes no no
1 12250000 8960 4 4 4 yes no no
2 12250000 9960 3 2 2 yes no yes
3 12215000 7500 4 2 2 yes no yes
4 11410000 7420 4 1 2 yes yes yes
.. … … … … … … … …
540 1820000 3000 2 1 1 yes no yes
541 1767150 2400 3 1 1 no no no
542 1750000 3620 2 1 1 yes no no
543 1750000 2910 3 1 1 no no no
544 1750000 3850 3 1 2 yes no no

hotwaterheating airconditioning parking prefarea furnishingstatus


0 no yes 2 yes furnished
1 no yes 3 no furnished
2 no no 2 yes semi-furnished
3 no yes 3 yes furnished
4 no yes 2 no furnished

.. … … … … …
540 no no 2 no unfurnished
541 no no 0 no semi-furnished
542 no no 0 no unfurnished
543 no no 0 no furnished
544 no no 0 no unfurnished

[545 rows x 13 columns]


price area bedrooms bathrooms stories mainroad guestroom basement \
0 13300000 7420 4 2 3 yes no no
1 12250000 8960 4 4 4 yes no no
2 12250000 9960 3 2 2 yes no yes
3 12215000 7500 4 2 2 yes no yes
4 11410000 7420 4 1 2 yes yes yes

hotwaterheating airconditioning parking prefarea furnishingstatus


0 no yes 2 yes furnished
1 no yes 3 no furnished
2 no no 2 yes semi-furnished
3 no yes 3 yes furnished
4 no yes 2 no furnished
Mean Squared Error: 2750040479309.0513
R-squared: 0.45592991188724474

[15]: <matplotlib.lines.AxLine at 0x1cbc6c36b40>
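
The mean squared error above is in squared price units, which makes it hard to read. A small sketch, using only the two numbers printed in the output, converts it to a root mean squared error (same units as price) and restates R-squared as a share of variance explained. The variable names are illustrative.

```python
import math

# Metrics reported in the output above
mse = 2750040479309.0513
r2 = 0.45592991188724474

# RMSE is in the same units as the target (price)
rmse = math.sqrt(mse)

print(f"RMSE: {rmse:,.0f}")
print(f"Share of variance explained: {r2:.1%}")
```

An RMSE of roughly 1.66 million against prices between about 1.75 and 13.3 million suggests that area, bedrooms, and bathrooms alone leave a large part of the price variation unexplained.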

Decision Tree Classifier

[6]: # Import necessary libraries


from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

from sklearn import tree


import matplotlib.pyplot as plt
# Load the Iris dataset
iris = load_iris()
X = iris.data # Features
y = iris.target # Labels
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the Decision Tree classifier


clf = DecisionTreeClassifier()
# Train the classifier
clf.fit(X_train, y_train)
# Predict on the test set
y_pred = clf.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test,y_pred)
class_report = classification_report(y_test, y_pred, target_names=iris.target_names)

print(accuracy)
print(cm)
print(class_report)
# Visualize the Decision Tree
plt.figure(figsize=(12, 8))
tree.plot_tree(clf, feature_names=iris.feature_names, class_names=iris.target_names)

plt.title("Decision Tree for Iris Dataset", color='red',size=42)


plt.show()

1.0
[[19 0 0]
[ 0 13 0]
[ 0 0 13]]
precision recall f1-score support

setosa 1.00 1.00 1.00 19


versicolor 1.00 1.00 1.00 13

virginica 1.00 1.00 1.00 13

accuracy 1.00 45
macro avg 1.00 1.00 1.00 45
weighted avg 1.00 1.00 1.00 45
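
The figures in the classification report can be recovered directly from the confusion matrix. The following sketch assumes scikit-learn's convention (rows are true classes, columns are predicted classes) and uses the matrix printed above; the variable names are illustrative.

```python
import numpy as np

# Confusion matrix printed above (rows: true class, columns: predicted class)
cm = np.array([[19, 0, 0],
               [0, 13, 0],
               [0, 0, 13]])

true_positives = np.diag(cm)
precision = true_positives / cm.sum(axis=0)  # divide by column sums (predicted counts)
recall = true_positives / cm.sum(axis=1)     # divide by row sums (true counts)
f1 = 2 * precision * recall / (precision + recall)
support = cm.sum(axis=1)                     # number of true samples per class
accuracy = true_positives.sum() / cm.sum()

print(precision, recall, f1, support, accuracy)
```

Because every off-diagonal entry is zero, all per-class scores come out as 1.0 on this particular 45-sample test split.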

KNN

[8]: # Import necessary libraries


from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the Iris dataset


data = load_iris()
X = data.data # Features
y = data.target # Labels
# Split the dataset into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the KNN classifier
k = 3 # Number of neighbors
knn = KNeighborsClassifier(n_neighbors=k)
knn.fit(X_train, y_train)
# Predict the labels for the test set
y_pred = knn.predict(X_test)
# Evaluate the classifier
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test,y_pred)
class_report = classification_report(y_test, y_pred, target_names=data.target_names)

print(accuracy)
print(cm)
print(class_report)

1.0
[[10 0 0]
[ 0 9 0]
[ 0 0 11]]
precision recall f1-score support

setosa 1.00 1.00 1.00 10


versicolor 1.00 1.00 1.00 9
virginica 1.00 1.00 1.00 11

accuracy 1.00 30
macro avg 1.00 1.00 1.00 30
weighted avg 1.00 1.00 1.00 30
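
To show what a k-nearest-neighbours prediction involves, here is a minimal NumPy sketch of the idea: compute Euclidean distances to the training points, take the k closest, and return the majority label. The function knn_predict and the toy data are hypothetical illustrations, not scikit-learn's implementation.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, query, k=3):
    """Predict the label of one query point by majority vote among its k nearest neighbours."""
    distances = np.linalg.norm(X_train - query, axis=1)  # Euclidean distance to each training point
    nearest = np.argsort(distances)[:k]                  # indices of the k closest points
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups in 2D
X_toy = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_toy, y_toy, np.array([1.1, 1.0])))
print(knn_predict(X_toy, y_toy, np.array([5.1, 5.0])))
```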

Logistic Regression

[9]: # Import necessary libraries


from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the Iris dataset


data = load_iris()
X = data.data # Features
y = data.target # Labels
# Split the dataset into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the Logistic Regression model


log_reg = LogisticRegression(max_iter=200)
log_reg.fit(X_train, y_train)

# Predict the labels for the test set
y_pred = log_reg.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test,y_pred)
class_report = classification_report(y_test, y_pred, target_names=data.target_names)

print(accuracy)
print(cm)
print(class_report)

1.0
[[10 0 0]
[ 0 9 0]
[ 0 0 11]]
precision recall f1-score support

setosa 1.00 1.00 1.00 10


versicolor 1.00 1.00 1.00 9
virginica 1.00 1.00 1.00 11

accuracy 1.00 30
macro avg 1.00 1.00 1.00 30
weighted avg 1.00 1.00 1.00 30
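
Conceptually, a multiclass logistic regression model computes one linear score per class and turns the scores into probabilities with a softmax. The sketch below illustrates that step only. The weights W and intercepts b are made-up values for illustration, not the coefficients learned by log_reg.

```python
import numpy as np

def softmax(scores):
    """Convert a vector of scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

# Hypothetical weights and intercepts for 3 classes and 4 features
W = np.array([[ 0.4,  0.9, -2.3, -1.0],
              [ 0.5, -0.3, -0.2, -0.8],
              [-0.9, -0.6,  2.5,  1.8]])
b = np.array([9.0, 2.0, -11.0])

sample = np.array([5.1, 3.5, 1.4, 0.2])  # first row of the Iris feature matrix
scores = W @ sample + b                  # one linear score per class
probs = softmax(scores)
predicted_class = int(np.argmax(probs))  # class with the highest probability

print(probs, predicted_class)
```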

K Means

[10]: # Import required libraries


from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

# Load the Iris dataset


X = load_iris().data

# Create and train the K-Means model


kmeans = KMeans(n_clusters=3, random_state=42).fit(X)

# Plot the clusters (using the first two features)


plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap='viridis')
plt.title("K-Means Clustering on Iris Dataset")
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.show()

The statement plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap='viridis') breaks down as follows:

1. plt.scatter
This is a function in the matplotlib.pyplot module that creates a scatter plot. A scatter plot
displays points in a 2D space, where each point represents a data sample, and its position is
determined by two numerical features (x and y).

2. X[:, 0]
• X is the feature matrix (data) loaded from the Iris dataset.
• X[:, 0] selects all rows (:) of the first column (0) from X. This column corresponds to the
feature “sepal length (cm)” in the Iris dataset.
• This becomes the x-coordinate for each data point in the scatter plot.

3. X[:, 1]
• Similar to X[:, 0], this selects the second column (1) of X, which corresponds to the feature
“sepal width (cm)” in the Iris dataset.
• This becomes the y-coordinate for each data point in the scatter plot.
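
The column slicing described in sections 2 and 3 can be tried on a small array. This sketch uses the first three rows of the Iris feature matrix; the names X_small, sepal_length, and sepal_width are illustrative.

```python
import numpy as np

# First three rows of the Iris feature matrix
# (columns: sepal length, sepal width, petal length, petal width, in cm)
X_small = np.array([[5.1, 3.5, 1.4, 0.2],
                    [4.9, 3.0, 1.4, 0.2],
                    [4.7, 3.2, 1.3, 0.2]])

sepal_length = X_small[:, 0]  # all rows, first column
sepal_width = X_small[:, 1]   # all rows, second column

print(sepal_length)
print(sepal_width)
print(sepal_length.shape)     # one value per row
```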

4. c=kmeans.labels_
• kmeans.labels_ contains the cluster labels assigned to each data point by the K-Means
model.
– For example, if there are 3 clusters, the labels might look like [0, 1, 2, 1, 0, ...].
– These labels are used to group data points by their cluster assignment.
• The c parameter assigns a different color to each cluster based on these labels.
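
A cluster label is the index of the centroid closest to a point. The sketch below shows that assignment step on made-up 2D points and centroids; it illustrates the idea and is not scikit-learn's KMeans implementation.

```python
import numpy as np

# Hypothetical 2D points and centroids
points = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5], [1.0, 9.0]])
centroids = np.array([[1.0, 1.5], [8.5, 9.0], [1.0, 8.5]])

# Distance from every point to every centroid, shape (n_points, n_centroids)
distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)

# Each point is labelled with the index of its nearest centroid
labels = distances.argmin(axis=1)
print(labels)
```

An array like labels, with one integer per point, is what gets passed to the c parameter.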

5. cmap='viridis'
• cmap stands for “color map,” which defines the set of colors used for the scatter plot.
• 'viridis' is Matplotlib's default color map: a perceptually uniform gradient that runs from
dark purple through blue and green to bright yellow.
• Each cluster label (e.g., 0, 1, 2) is mapped to a specific color within this gradient.

6. Putting It All Together

This line draws a scatter plot where:
• The x-coordinates are the sepal lengths (X[:, 0]).
• The y-coordinates are the sepal widths (X[:, 1]).
• The points are colored based on the clusters (kmeans.labels_), with colors chosen from the
viridis color map.

Example in Action

The Iris dataset contains 150 samples, so:
• X[:, 0] and X[:, 1] provide 150 x and y coordinates.
• kmeans.labels_ assigns one of three labels (e.g., 0, 1, 2) to each sample.
• cmap='viridis' ensures each label gets a distinct color.
When executed, this produces a visual representation of the clusters found by K-Means, making it
easy to observe patterns or groupings in the data.
This notebook was converted with convert.ploomber.io
