ML Lab-1

The document outlines various machine learning techniques, including computing central tendency measures, applying pre-processing techniques, and implementing algorithms such as KNN, decision trees, random forests, Naïve Bayes, support vector machines, linear regression, and logistic regression. Each technique is accompanied by sample code demonstrating its application on datasets, primarily using the Iris dataset. The document emphasizes model evaluation through metrics like accuracy, mean squared error, and classification reports.

1. Compute Central Tendency Measures: Mean, Median, Mode; Measures of Dispersion: Variance, Standard Deviation.

2. Apply the following Pre-processing techniques for a given dataset.

a. Attribute selection

b. Handling Missing Values

c. Discretization

d. Elimination of Outliers

3. Apply KNN algorithm for classification and regression

4. Demonstrate decision tree algorithm for a classification problem and perform parameter tuning for better results

5. Demonstrate decision tree algorithm for a regression problem

6. Apply Random Forest algorithm for classification and regression

7. Demonstrate Naïve Bayes Classification algorithm.

8. Apply Support Vector algorithm for classification

9. Demonstrate simple linear regression algorithm for a regression problem

10. Apply Logistic regression algorithm for a classification problem

11. Demonstrate Multi-layer Perceptron algorithm for a classification problem

12. Implement the K-means algorithm and apply it to the data you selected.
Evaluate performance by measuring the sum of the Euclidean distance of each
example from its class center. Test the performance of the algorithm as a function
of the parameter K.

13. Demonstrate the use of Fuzzy C-Means Clustering

14. Demonstrate the use of Expectation Maximization based clustering algorithm


1) Compute Central Tendency Measures: Mean, Median, Mode; Measures of Dispersion: Variance, Standard Deviation.

Program:
import numpy as np

import scipy.stats as stats

# Sample data

data = [10, 20, 30, 40, 50]

# Calculate central tendency

mean = np.mean(data)

median = np.median(data)

mode = stats.mode(data)

# Calculate dispersion

variance = np.var(data)

std_dev = np.std(data)

print("Mean:", mean)

print("Median:", median)

print("Mode:", mode)

print("Variance:", variance)

print("Standard Deviation:", std_dev)


OUTPUT:
Mean: 30.0

Median: 30.0

Mode: ModeResult(mode=array([10]), count=1, axis=0)

Variance: 200.0

Standard Deviation: 14.142135623730951
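2) Apply the following Pre-processing techniques for a given dataset: a. Attribute selection, b. Handling Missing Values, c. Discretization, d. Elimination of Outliers.

Program (a minimal sketch, assuming the Iris data loaded into a pandas DataFrame; a few values are blanked out artificially so the missing-value step has something to impute):

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target

# b. Handling Missing Values: inject a few NaNs, then impute with the column means
df.iloc[0:5, 0] = np.nan
df[iris.feature_names] = df[iris.feature_names].fillna(df[iris.feature_names].mean())

# a. Attribute selection: keep the 2 features most relevant to the target (ANOVA F-test)
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(df[iris.feature_names], df['target'])

# c. Discretization: bin a continuous attribute into 3 equal-width categories
df['petal length bin'] = pd.cut(df['petal length (cm)'], bins=3, labels=['short', 'medium', 'long'])

# d. Elimination of Outliers: drop rows more than 3 standard deviations from the column mean
z = (df[iris.feature_names] - df[iris.feature_names].mean()) / df[iris.feature_names].std()
df_clean = df[(z.abs() < 3).all(axis=1)]

print("Selected attributes shape:", X_selected.shape)
print("Rows remaining after outlier removal:", len(df_clean))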

3) Apply KNN algorithm for classification and regression

Program:

# Import necessary libraries

import pandas as pd

from sklearn.model_selection import train_test_split

from sklearn.preprocessing import StandardScaler

from sklearn.neighbors import KNeighborsClassifier

from sklearn.metrics import accuracy_score, confusion_matrix

# Step 1: Load the dataset (example using the Iris dataset)

from sklearn.datasets import load_iris

data = load_iris()

# Convert to pandas DataFrame

df = pd.DataFrame(data.data, columns=data.feature_names)

df['target'] = data.target
# Step 2: Preprocess data (if needed, e.g., scaling features)

X = df.drop('target', axis=1) # Features

y = df['target'] # Target

# Feature Scaling (important for KNN)

scaler = StandardScaler()

X_scaled = scaler.fit_transform(X)

# Step 3: Split the dataset into training and test sets

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.3, random_state=42)

# Step 4: Apply KNN Classifier

knn_classifier = KNeighborsClassifier(n_neighbors=5) # You can tune 'k'

knn_classifier.fit(X_train, y_train)

# Step 5: Make predictions and evaluate the model

y_pred = knn_classifier.predict(X_test)

# Evaluate the model

accuracy = accuracy_score(y_test, y_pred)

conf_matrix = confusion_matrix(y_test, y_pred)

print(f"Accuracy: {accuracy * 100:.2f}%")

print("Confusion Matrix:")

print(conf_matrix)
OUTPUT:

Accuracy: 100.00%
Confusion Matrix:
[[19 0 0]
[ 0 13 0]
[ 0 0 13]]
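The exercise also asks for regression. A minimal sketch, reusing data and train_test_split from the program above, with petal width (the fourth Iris feature) as the regression target (this target choice is an illustrative assumption, not part of the original program):

from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

# Illustrative assumption: predict petal width from the other three features
X_reg = data.data[:, :3]
y_reg = data.data[:, 3]

Xr_train, Xr_test, yr_train, yr_test = train_test_split(X_reg, y_reg, test_size=0.3, random_state=42)

# Apply KNN Regressor ('k' can be tuned just like for the classifier)
knn_regressor = KNeighborsRegressor(n_neighbors=5)
knn_regressor.fit(Xr_train, yr_train)
yr_pred = knn_regressor.predict(Xr_test)

print(f"Mean Squared Error: {mean_squared_error(yr_test, yr_pred):.3f}")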

4) Demonstrate decision tree algorithm for a classification problem and perform parameter tuning for better results

Program:

import pandas as pd

from sklearn.datasets import load_iris

from sklearn.model_selection import train_test_split, GridSearchCV

from sklearn.tree import DecisionTreeClassifier

from sklearn.metrics import accuracy_score, classification_report

import matplotlib.pyplot as plt

from sklearn import tree

# Load the Iris dataset

data = load_iris()

X = data.data # Features

y = data.target # Target (class labels)

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a Decision Tree Classifier


dt_classifier = DecisionTreeClassifier(random_state=42)

dt_classifier.fit(X_train, y_train)

# Make predictions

y_pred = dt_classifier.predict(X_test)

# Evaluate the model

accuracy = accuracy_score(y_test, y_pred)

print(f"Accuracy: {accuracy * 100:.2f}%")

# Classification Report

print("\nClassification Report:")

print(classification_report(y_test, y_pred))

# Define the parameter grid for tuning

param_grid = {
    'max_depth': [None, 5, 10, 15, 20],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'max_features': [None, 'sqrt', 'log2']
}

# Initialize GridSearchCV
grid_search = GridSearchCV(estimator=DecisionTreeClassifier(random_state=42),
                           param_grid=param_grid,
                           cv=5,                # 5-fold cross-validation
                           scoring='accuracy',  # Metric to optimize for
                           n_jobs=-1,           # Use all available CPUs
                           verbose=1)

# Fit GridSearchCV

grid_search.fit(X_train, y_train)

# Best parameters from GridSearchCV

print("Best Parameters:", grid_search.best_params_)

# Get the best model

best_dt_classifier = grid_search.best_estimator_

# Make predictions with the best model

y_pred_best = best_dt_classifier.predict(X_test)

# Evaluate the tuned model

accuracy_best = accuracy_score(y_test, y_pred_best)

print(f"Accuracy after tuning: {accuracy_best * 100:.2f}%")

# Classification Report for the tuned model

print("\nClassification Report for Tuned Model:")

print(classification_report(y_test, y_pred_best))

# Visualize the decision tree

plt.figure(figsize=(12, 8))

tree.plot_tree(best_dt_classifier, filled=True, feature_names=data.feature_names,


class_names=data.target_names)

plt.title("Decision Tree Visualization")


plt.show()

OUTPUT:

Accuracy: 100.00%

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30

Fitting 5 folds for each of 135 candidates, totalling 675 fits


Best Parameters: {'max_depth': None, 'max_features': None,
'min_samples_leaf': 4, 'min_samples_split': 2}
Accuracy after tuning: 100.00%

Classification Report for Tuned Model:


              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
5) Demonstrate decision tree algorithm for a regression problem

Program:

# Import necessary libraries

import numpy as np

import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split


from sklearn.tree import DecisionTreeRegressor

from sklearn.metrics import mean_squared_error

from sklearn.datasets import make_regression

# Generate synthetic dataset

X, y = make_regression(n_samples=100, n_features=1, noise=15, random_state=42)

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the Decision Tree Regressor

dt_regressor = DecisionTreeRegressor(max_depth=5, random_state=42)

# Train the model

dt_regressor.fit(X_train, y_train)

# Predict on the test set

y_pred = dt_regressor.predict(X_test)

# Evaluate the model

mse = mean_squared_error(y_test, y_pred)


print(f"Mean Squared Error: {mse:.2f}")

# Visualize the decision tree's predictions

X_grid = np.linspace(X.min(), X.max(), 500).reshape(-1, 1)

y_grid_pred = dt_regressor.predict(X_grid)

plt.scatter(X, y, color="blue", label="Data")

plt.plot(X_grid, y_grid_pred, color="red", label="Decision Tree Prediction")

plt.title("Decision Tree Regression")

plt.xlabel("Feature")

plt.ylabel("Target")

plt.legend()

plt.show()

OUTPUT:

Mean Squared Error: 396.47


6) Apply Random Forest algorithm for classification and regression

Program:
# Import necessary libraries

from sklearn.datasets import load_iris

from sklearn.ensemble import RandomForestClassifier

from sklearn.model_selection import train_test_split

from sklearn.metrics import accuracy_score, classification_report

# Load the dataset

iris = load_iris()

X, y = iris.data, iris.target

# Split into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the Random Forest Classifier

rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the model

rf_classifier.fit(X_train, y_train)

# Predict on the test set


y_pred = rf_classifier.predict(X_test)

# Evaluate the model

accuracy = accuracy_score(y_test, y_pred)

print(f"Accuracy: {accuracy:.2f}")

print("\nClassification Report:")

print(classification_report(y_test, y_pred))

OUTPUT:

Accuracy: 1.00

Classification Report:

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
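The exercise also calls for regression. A minimal sketch with RandomForestRegressor on synthetic data generated the same way as in exercise 5 (the dataset choice is an assumption for illustration; train_test_split is already imported above):

from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error

# Synthetic 1-D regression data (same settings as exercise 5)
X_reg, y_reg = make_regression(n_samples=100, n_features=1, noise=15, random_state=42)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(X_reg, y_reg, test_size=0.2, random_state=42)

# Random Forest Regressor with 100 trees
rf_regressor = RandomForestRegressor(n_estimators=100, random_state=42)
rf_regressor.fit(Xr_train, yr_train)
yr_pred = rf_regressor.predict(Xr_test)

print(f"Mean Squared Error: {mean_squared_error(yr_test, yr_pred):.2f}")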

7) Demonstrate Naïve Bayes Classification algorithm.

Program:
# Import necessary libraries

from sklearn.datasets import load_iris


from sklearn.model_selection import train_test_split

from sklearn.naive_bayes import GaussianNB

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

import seaborn as sns

import matplotlib.pyplot as plt

# Load the Iris dataset

iris = load_iris()

X, y = iris.data, iris.target

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the Naïve Bayes classifier

nb_classifier = GaussianNB()

# Train the model

nb_classifier.fit(X_train, y_train)

# Make predictions on the test set

y_pred = nb_classifier.predict(X_test)
# Evaluate the model

accuracy = accuracy_score(y_test, y_pred)

print(f"Accuracy: {accuracy:.2f}")

# Classification Report

print("\nClassification Report:")

print(classification_report(y_test, y_pred))

# Confusion Matrix

conf_matrix = confusion_matrix(y_test, y_pred)

sns.heatmap(conf_matrix, annot=True, fmt="d", cmap="Blues",
            xticklabels=iris.target_names, yticklabels=iris.target_names)

plt.title("Confusion Matrix")

plt.xlabel("Predicted")

plt.ylabel("Actual")

plt.show()

OUTPUT:
Accuracy: 1.00

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30

8) Apply Support Vector algorithm for classification

Program:

# Import necessary libraries

from sklearn.datasets import load_iris

from sklearn.model_selection import train_test_split

from sklearn.svm import SVC


from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

import seaborn as sns

import matplotlib.pyplot as plt

# Load the Iris dataset

iris = load_iris()

X, y = iris.data, iris.target

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the Support Vector Classifier with a linear kernel

svc_model = SVC(kernel='linear', random_state=42)

# Train the model

svc_model.fit(X_train, y_train)

# Make predictions on the test set

y_pred = svc_model.predict(X_test)

# Evaluate the model


accuracy = accuracy_score(y_test, y_pred)

print(f"Accuracy: {accuracy:.2f}")

# Classification Report

print("\nClassification Report:")

print(classification_report(y_test, y_pred, target_names=iris.target_names))

# Confusion Matrix

conf_matrix = confusion_matrix(y_test, y_pred)

sns.heatmap(conf_matrix, annot=True, fmt="d", cmap="Blues",
            xticklabels=iris.target_names, yticklabels=iris.target_names)

plt.title("Confusion Matrix")

plt.xlabel("Predicted")

plt.ylabel("Actual")

plt.show()

OUTPUT:

Accuracy: 1.00

Classification Report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      1.00      1.00         9
   virginica       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30

9) Demonstrate simple linear regression algorithm for a regression problem

Program:

# Import necessary libraries

import numpy as np

import matplotlib.pyplot as plt


from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression

from sklearn.metrics import mean_squared_error, r2_score

# Generate synthetic data

np.random.seed(42)

X = 2.5 * np.random.randn(100, 1) + 1.5 # Feature (Independent Variable)

y = 1.2 * X + np.random.randn(100, 1) * 0.8 # Target (Dependent Variable)

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the Linear Regression model

lr_model = LinearRegression()

# Train the model

lr_model.fit(X_train, y_train)

# Make predictions on the test set

y_pred = lr_model.predict(X_test)

# Evaluate the model


mse = mean_squared_error(y_test, y_pred)

r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:.2f}")

print(f"R-squared (R²) Score: {r2:.2f}")

# Visualize the results

plt.figure(figsize=(8, 6))

plt.scatter(X_test, y_test, color="blue", label="Actual Data")

plt.plot(X_test, y_pred, color="red", label="Regression Line")

plt.title("Simple Linear Regression")

plt.xlabel("Independent Variable (X)")

plt.ylabel("Dependent Variable (y)")

plt.legend()

plt.show()
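To inspect the fitted line, the learned slope and intercept can also be printed (an optional addition; with this synthetic data the slope should come out close to the true coefficient of 1.2):

print(f"Slope: {lr_model.coef_[0][0]:.2f}")
print(f"Intercept: {lr_model.intercept_[0]:.2f}")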

OUTPUT:

Mean Squared Error: 0.56


R-squared (R²) Score: 0.90
10) Apply Logistic regression algorithm for a classification problem

Data set:
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
0,137,40,35,168,43.1,2.288,33,1
5,116,74,0,0,25.6,0.201,30,0
3,78,50,32,88,31.0,0.248,26,1
10,115,0,0,0,35.3,0.134,29,0
2,197,70,45,543,30.5,0.158,53,1
8,125,96,0,0,0.0,0.232,54,1
4,110,92,0,0,37.6,0.191,30,0
10,168,74,0,0,38.0,0.537,34,1
10,139,80,0,0,27.1,1.441,57,0
1,189,60,23,846,30.1,0.398,59,1
5,166,72,19,175,25.8,0.587,51,1
7,100,0,0,0,30.0,0.484,32,1
0,118,84,47,230,45.8,0.551,31,1
7,107,74,0,0,29.6,0.254,31,1
1,103,30,38,83,43.3,0.183,33,0

Program:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset (Pima Indians Diabetes Dataset)


# You can replace this with your local dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-
diabetes.data.csv"
columns = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness',
'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome']
data = pd.read_csv(url, names=columns)

# Separate features (X) and target (y)


X = data.drop('Outcome', axis=1)
y = data['Outcome']

# Split the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Initialize the Logistic Regression model


log_reg_model = LogisticRegression(max_iter=200, random_state=42)

# Train the model


log_reg_model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = log_reg_model.predict(X_test)

# Evaluate the model


accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

# Classification Report
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=["No Diabetes",
"Diabetes"]))

# Confusion Matrix
conf_matrix = confusion_matrix(y_test, y_pred)
sns.heatmap(conf_matrix, annot=True, fmt="d", cmap="Blues",
xticklabels=["No Diabetes", "Diabetes"], yticklabels=["No Diabetes",
"Diabetes"])
plt.title("Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()

OUTPUT:
Accuracy: 0.75

Classification Report:
              precision    recall  f1-score   support

 No Diabetes       0.81      0.79      0.80        99
    Diabetes       0.64      0.67      0.65        55

    accuracy                           0.75       154
   macro avg       0.73      0.73      0.73       154
weighted avg       0.75      0.75      0.75       154
11. Demonstrate Multi-layer Perceptron algorithm for a classification
problem

Program:

import numpy as np

from sklearn.datasets import load_iris

from sklearn.model_selection import train_test_split

from sklearn.neural_network import MLPClassifier

from sklearn.metrics import accuracy_score

# Load Iris dataset

data = load_iris()
X, y = data.data, data.target

# Split into training and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define and train the MLP classifier

mlp = MLPClassifier(hidden_layer_sizes=(10,), activation='relu', solver='adam',
                    max_iter=500, random_state=42)

mlp.fit(X_train, y_train)

# Make predictions

y_pred = mlp.predict(X_test)

# Evaluate performance

accuracy = accuracy_score(y_test, y_pred)

print(f'Accuracy: {accuracy:.4f}')

OUTPUT:

Accuracy: 0.9333

12. Implement the K-means algorithm and apply it to the data you selected.
Evaluate performance by measuring the sum of the Euclidean distance of each
example from its class center. Test the performance of the algorithm as a
function of the parameter K.

Program:
import numpy as np

import matplotlib.pyplot as plt

from sklearn.cluster import KMeans

from sklearn.datasets import make_blobs

# Generate synthetic data

np.random.seed(42)

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0)

# Function to evaluate K-Means clustering performance

def evaluate_kmeans(X, max_k=10):
    sse = []  # Sum of Squared Errors (SSE)
    for k in range(1, max_k + 1):
        kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
        kmeans.fit(X)
        sse.append(kmeans.inertia_)  # SSE (inertia) for the given K
    return sse

# Evaluate performance for different values of K

max_k = 10

sse_values = evaluate_kmeans(X, max_k)


# Plot SSE vs K

plt.figure(figsize=(8, 5))

plt.plot(range(1, max_k + 1), sse_values, marker='o', linestyle='--', color='b')

plt.xlabel("Number of Clusters (K)")

plt.ylabel("Sum of Squared Errors (SSE)")

plt.title("K-Means Performance Evaluation (Elbow Method)")

plt.grid()

plt.show()
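The elbow plot above is based on inertia_, which is the sum of squared distances. The exercise asks for the sum of the plain Euclidean distances from each example to its class center; a minimal sketch of that evaluation, reusing X and max_k from above:

def sum_euclidean_distance(X, k):
    km = KMeans(n_clusters=k, random_state=42, n_init=10).fit(X)
    # Distance of each point to the center of its assigned cluster
    dists = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
    return dists.sum()

for k in range(1, max_k + 1):
    print(f"K = {k}: total Euclidean distance = {sum_euclidean_distance(X, k):.2f}")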

OUTPUT:
13. Demonstrate the use of Fuzzy C-Means Clustering

Program:

import numpy as np

import matplotlib.pyplot as plt

from sklearn.datasets import make_blobs

# Generate sample data

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.2, random_state=42)

# Initialize cluster centers randomly

cntr = X[np.random.choice(len(X), 3, replace=False)]

for _ in range(50):  # Reduce iterations for simplicity
    dist = np.linalg.norm(X[:, None] - cntr, axis=2)
    u = 1 / (dist ** 2 + 1e-8)        # Membership update for fuzzifier m=2; avoid division by zero
    u /= u.sum(axis=1, keepdims=True)
    um = u ** 2                       # Raise memberships to the power m=2 for the center update
    new_cntr = (um.T @ X) / um.sum(axis=0, keepdims=True).T
    if np.linalg.norm(new_cntr - cntr) < 1e-5:
        break
    cntr = new_cntr
# Assign clusters

labels = np.argmax(u, axis=1)

# Plot results

plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')

plt.scatter(cntr[:, 0], cntr[:, 1], c='red', marker='x', s=200)

plt.title("Fuzzy C-Means Clustering")

plt.show()

OUTPUT:
14. Demonstrate the use of Expectation Maximization based clustering
algorithm

Program:

import numpy as np

import matplotlib.pyplot as plt

from sklearn.mixture import GaussianMixture

# Generate synthetic data

np.random.seed(42)

X1 = np.random.normal(loc=2, scale=1.0, size=(100, 2)) # Cluster 1

X2 = np.random.normal(loc=-2, scale=1.0, size=(100, 2)) # Cluster 2

X3 = np.random.normal(loc=5, scale=1.5, size=(100, 2)) # Cluster 3

X = np.vstack((X1, X2, X3)) # Combine all clusters

# Apply Gaussian Mixture Model (Expectation-Maximization)

gmm = GaussianMixture(n_components=3, covariance_type='full', random_state=42)

gmm.fit(X)

# Predict cluster labels

labels = gmm.predict(X)

# Plot clusters
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', alpha=0.6)

plt.scatter(gmm.means_[:, 0], gmm.means_[:, 1], s=200, c='red', marker='X',
            label='Centroids')

plt.title("Expectation Maximization Clustering (GMM)")

plt.legend()

plt.show()

OUTPUT:
