PROGRAM 1:
AIM: To implement a program to load and view the dataset
ALGORITHM:
1. Download a dataset from Kaggle
2. Import the dataset and load it in a Jupyter notebook or in Google Colab.
3. Create a duplicate set.
4. Display the mean, median and other statistics using the describe() function.
PROGRAM:
import pandas as pd
import matplotlib.pyplot as plt
af = pd.read_csv("Most streamed spotify songs 2024.csv", encoding='unicode_escape')
new_af = af.copy()
new_af = new_af[["Artist", "release date", "All time rank",
                 "spotify playlist reach", "youtube views"]]
new_af.head(10)
OUTPUT:
RESULT: Thus the program to load and view the most-streamed-songs dataset has been successfully executed.
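NOTE: A quick way to confirm the load step is to inspect the frame's shape and column types. A minimal sketch, assuming the same CSV file and the new_af frame created in the program above:
print(new_af.shape)    # number of rows and columns
print(new_af.dtypes)   # data type of each selected column
new_af.info()          # non-null counts and memory usage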
PROGRAM 2
AIM: To display the summary and statistics of the dataset.
ALGORITHM:
1. Download a dataset from Kaggle
2. Import the dataset and load it in a Jupyter notebook or in Google Colab.
3. Create a duplicate set.
4. Display the mean, median and other statistics using the describe() function.
PROGRAM:
median = af["spotify popularity"].median()
print("Median is", median)
new_af.describe()
OUTPUT:
Median is 67.0
RESULT: Thus the program to display the summary and statistics has been successfully verified
and executed.
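NOTE: The other statistics named in the algorithm (mean, standard deviation, mode) can be computed the same way as the median. A small sketch, assuming the same af frame and "spotify popularity" column used above:
mean = af["spotify popularity"].mean()            # arithmetic mean
std = af["spotify popularity"].std()              # standard deviation
mode = af["spotify popularity"].mode().iloc[0]    # most frequent value
print("Mean is", mean)
print("Standard deviation is", std)
print("Mode is", mode)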
PROGRAM 3
AIM: To implement linear regression to perform prediction.
ALGORITHM:
1. Initialize Parameters: Start by initializing the parameters like coefficients (slope and
intercept).
2. Input Data: Gather the dataset containing the independent variable (X) and dependent
variable (Y).
3. Feature Scaling (Optional): Normalize or standardize the input data if necessary to ensure
better convergence.
4. Split Data: Divide the dataset into training and testing sets to evaluate the model.
5. Model Training: Implement a method to optimize the parameters (coefficients) based on the
training data. This can be done using techniques like Gradient Descent, Normal Equations, or
using libraries like scikit-learn.
6. Prediction: Use the learned parameters to predict outcomes for new data points or the test
set.
7. Evaluation: Measure the performance of the model using evaluation metrics like Mean
Squared Error (MSE), R-squared, etc.
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt

# Sample dataset
X = np.array([1, 2, 3, 4, 5])
Y = np.array([2, 3, 4, 5, 6])

# Function to perform linear regression using Ordinary Least Squares
def linear_regression_ols(X, Y):
    # Add a column of ones to X for the intercept term
    X_b = np.c_[np.ones((len(X), 1)), X]
    # Calculate theta using the Normal Equation: theta = (X^T * X)^(-1) * X^T * Y
    theta = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(Y)
    return theta

# Function to make predictions
def predict(X, theta):
    # Add a column of ones to X for the intercept term
    X_b = np.c_[np.ones((len(X), 1)), X]
    # Predict Y_hat = X_b * theta
    Y_pred = X_b.dot(theta)
    return Y_pred

# Perform linear regression
theta = linear_regression_ols(X, Y)

# Make predictions
X_new = np.array([6, 7])
predictions = predict(X_new, theta)

# Plotting the original data and the linear regression line
plt.scatter(X, Y, color='blue', label='Data points')
plt.plot(X_new, predictions, color='red', label='Linear Regression')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.title('Linear Regression using Ordinary Least Squares')
plt.show()

print(f"Predictions for X_new: {predictions}")
OUTPUT:
RESULT: Thus the implementation of linear regression to perform prediction has been successfully
executed.
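NOTE: Step 7 of the algorithm calls for evaluating the fit with metrics such as MSE and R-squared, which the program above stops short of. A minimal sketch of that step, reusing the X, Y, theta and predict() defined above:
# Evaluate the fit on the training points (Mean Squared Error and R-squared)
Y_fit = predict(X, theta)
mse = np.mean((Y - Y_fit) ** 2)
ss_res = np.sum((Y - Y_fit) ** 2)
ss_tot = np.sum((Y - np.mean(Y)) ** 2)
r_squared = 1 - ss_res / ss_tot
# The sample data lie exactly on a line, so MSE is essentially 0 and R-squared essentially 1
print(f"MSE: {mse:.4f}, R-squared: {r_squared:.4f}")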
PROGRAM 4.1
AIM: To implement Bayesian logistic regression for classification
ALGORITHM:
1. Initialize Parameters: Start with prior distributions for the model parameters (e.g., coefficients, intercept) and likelihood distributions based on the data.
2. Input Data: Gather the dataset containing features (X) and corresponding binary labels (Y).
3. Posterior Calculation: Use Bayesian inference techniques such as Markov Chain Monte Carlo (MCMC) or Variational Inference to compute the posterior distribution over the parameters given the data.
4. Prediction: Use the posterior distribution to predict the probability of classes for new data points.
5. Evaluation: Measure the performance of the model using metrics such as accuracy, precision, recall, and F1-score.
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Generate synthetic data
X, Y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Split data into train and test sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# Add intercept to X_train for bias term
X_train = np.c_[np.ones((len(X_train), 1)), X_train]

# Define sigmoid function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Initialize parameters randomly
np.random.seed(42)
theta = np.random.randn(X_train.shape[1])

# Bayesian Logistic Regression with Metropolis-Hastings sampling
def bayesian_logistic_regression(X, Y, num_samples=1000, burn_in=200):
    m, n = X.shape
    trace = np.zeros((num_samples, n))  # Trace to store samples of theta
    theta_current = theta.copy()
    acceptance_count = 0
    for i in range(num_samples):
        # Generate proposal from Gaussian distribution
        proposal = theta_current + np.random.randn(n)
        # Calculate prior probabilities (assuming uninformative priors)
        prior_current = np.sum(-np.log(1 + np.exp(-(X.dot(theta_current)))))
        prior_proposal = np.sum(-np.log(1 + np.exp(-(X.dot(proposal)))))
        # Calculate likelihoods
        likelihood_current = np.sum(Y * X.dot(theta_current) - np.log(1 + np.exp(X.dot(theta_current))))
        likelihood_proposal = np.sum(Y * X.dot(proposal) - np.log(1 + np.exp(X.dot(proposal))))
        # Calculate posterior probabilities
        posterior_current = likelihood_current + prior_current
        posterior_proposal = likelihood_proposal + prior_proposal
        # Accept or reject the proposal
        acceptance_prob = np.exp(posterior_proposal - posterior_current)
        accept = np.random.rand() < acceptance_prob
        if accept:
            theta_current = proposal
            acceptance_count += 1
        trace[i] = theta_current
    acceptance_rate = acceptance_count / num_samples
    print(f'Acceptance rate: {acceptance_rate}')
    return trace[burn_in:]

# Perform Bayesian Logistic Regression
trace = bayesian_logistic_regression(X_train, Y_train)

# Predictions for test data
X_test = np.c_[np.ones((len(X_test), 1)), X_test]
logits = X_test.dot(trace.mean(axis=0))
Y_pred = (sigmoid(logits) >= 0.5).astype(int)

# Calculate accuracy
accuracy = accuracy_score(Y_test, Y_pred)
print(f'Accuracy: {accuracy}')

# Plotting the coefficients distribution
plt.figure(figsize=(10, 6))
plt.hist(trace[:, 1:], bins=30, label='Coefficients')
plt.xlabel('Coefficient Value')
plt.ylabel('Frequency')
plt.title('Posterior Distribution of Coefficients')
plt.legend()
plt.grid(True)
plt.show()
OUTPUT:
Acceptance rate: 0.005
Accuracy: 0.69
RESULT: Thus the program to implement Bayesian logistic regression for classification has been successfully executed.
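NOTE: Step 5 of the algorithm also names precision, recall and the F1-score, which the program above does not report. A small sketch of that evaluation, reusing Y_test and Y_pred from the program:
from sklearn.metrics import classification_report, confusion_matrix
# Per-class precision, recall and F1-score, plus the confusion matrix
print(classification_report(Y_test, Y_pred))
print(confusion_matrix(Y_test, Y_pred))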
PROGRAM 4.2
AIM: To implement the SVM for classification
ALGORITHM:
i. Initialize Parameters: Start with setting parameters such as kernel type (linear,
polynomial, radial basis function (RBF)), regularization parameter (C), and kernel
coefficients (gamma).
ii. Input Data: Gather the dataset containing features (X) and corresponding binary
labels (Y).
iii. Model Training: Use the training data to fit the SVM model, adjusting the
parameters to maximize the margin between classes.
iv. Prediction: Use the learned SVM model to predict classes for new data points.
v. Evaluation: Measure the performance of the model using metrics such as accuracy,
precision, recall, and F1-score.
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report
# Generate synthetic data
X, Y = make_classification(n_samples=1000, n_features=20, random_state=42)
# Split data into train and test sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
# Define SVM model
svm_model = SVC(kernel='rbf', C=1.0, gamma='scale', random_state=42)
# Train SVM model
svm_model.fit(X_train, Y_train)
# Predictions for test data
Y_pred = svm_model.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(Y_test, Y_pred)
print(f'Accuracy: {accuracy}')
# Classification report
print(classification_report(Y_test, Y_pred))
# Plotting decision boundary (for 2D data)
if X.shape[1] == 2:
    # Plot decision boundary
    plt.figure(figsize=(8, 6))
    plt.scatter(X[:, 0], X[:, 1], c=Y, cmap='viridis', s=50, alpha=0.6)
    # Create grid to evaluate model
    xlim = plt.gca().get_xlim()
    ylim = plt.gca().get_ylim()
    xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 100),
                         np.linspace(ylim[0], ylim[1], 100))
    Z = svm_model.decision_function(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    # Plot decision boundary and margins
    plt.contour(xx, yy, Z, colors='k', levels=[-1, 0, 1], alpha=0.5,
                linestyles=['--', '-', '--'])
    plt.title('SVM Decision Boundary')
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.show()
OUTPUT:
Accuracy: 0.845
              precision    recall  f1-score   support

           0       0.80      0.89      0.84        93
           1       0.90      0.80      0.85       107

    accuracy                           0.84       200
   macro avg       0.85      0.85      0.84       200
weighted avg       0.85      0.84      0.85       200
RESULT: The given program for SVM classification is executed and verified successfully.
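NOTE: Step i of the algorithm mentions choosing the regularization parameter C and the kernel coefficient gamma. A minimal sketch of a cross-validated grid search over those parameters, assuming the same X_train and Y_train as in the program above:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
# Search a small grid of C and gamma values with 5-fold cross-validation
param_grid = {'C': [0.1, 1, 10], 'gamma': ['scale', 0.01, 0.1]}
grid = GridSearchCV(SVC(kernel='rbf', random_state=42), param_grid, cv=5)
grid.fit(X_train, Y_train)
print(grid.best_params_, grid.best_score_)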
PROGRAM 5.1
AIM : To implement the K-means clustering to categorize the data
ALGORITHM:
1. Initialize: Randomly select K initial centroids from the data.
2. Assignment: Assign each data point to the nearest centroid, forming K clusters.
3. Update: Recalculate the centroids of the clusters by taking the mean of all data points in
each cluster.
4. Repeat: Repeat steps 2 and 3 until the centroids no longer change (convergence) or for a
fixed number of iterations.
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt

# Generate synthetic data
def generate_data(n_samples=300, n_centers=4, random_seed=42):
    np.random.seed(random_seed)
    points_per_center = n_samples // n_centers
    centers = np.random.uniform(-10, 10, (n_centers, 2))
    X = np.vstack([center + np.random.randn(points_per_center, 2) for center in centers])
    return X

# K-Means Clustering
def k_means(X, k, max_iters=100):
    centroids = X[np.random.choice(X.shape[0], k, replace=False)]
    for _ in range(max_iters):
        # Assign each point to the nearest centroid
        distances = np.linalg.norm(X[:, np.newaxis] - centroids, axis=2)
        clusters = np.argmin(distances, axis=1)
        # Calculate new centroids
        new_centroids = np.array([X[clusters == j].mean(axis=0) for j in range(k)])
        # Check for convergence
        if np.all(centroids == new_centroids):
            break
        centroids = new_centroids
    return clusters, centroids

# Generate data and run K-Means
X = generate_data()
clusters_kmeans, centroids_kmeans = k_means(X, k=4)

# Plot K-Means results
plt.figure(figsize=(10, 6))
plt.scatter(X[:, 0], X[:, 1], c=clusters_kmeans, cmap='viridis', marker='o')
plt.scatter(centroids_kmeans[:, 0], centroids_kmeans[:, 1], c='red', marker='x')
plt.title('K-Means Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()
OUTPUT:
RESULT: Thus the program for K-means clustering has been executed successfully.
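NOTE: As a cross-check on the from-scratch implementation, the same data can be clustered with scikit-learn. A minimal sketch, assuming the X array generated in the program above:
from sklearn.cluster import KMeans
# Fit scikit-learn's KMeans on the same data and inspect the result
km = KMeans(n_clusters=4, n_init=10, random_state=42)
labels_sklearn = km.fit_predict(X)
print(km.cluster_centers_)   # centroids found by scikit-learn
print(km.inertia_)           # within-cluster sum of squares at convergence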
PROGRAM 5.2
AIM: To implement the mixture of gaussian models to categorize the data
ALGORITHM:
1. Initialize: Choose initial parameters for the Gaussian components (means, covariances, and mixing coefficients).
2. Expectation (E-step): Calculate the probability of each data point belonging to each Gaussian component.
3. Maximization (M-step): Update the parameters of the Gaussian components using the probabilities computed in the E-step.
4. Repeat: Repeat the E-step and M-step until convergence.
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt

# Generate synthetic data
def generate_data(n_samples=300, n_centers=4, random_seed=42):
    np.random.seed(random_seed)
    points_per_center = n_samples // n_centers
    centers = np.random.uniform(-10, 10, (n_centers, 2))
    X = np.vstack([center + np.random.randn(points_per_center, 2) for center in centers])
    return X

# Gaussian Mixture Model (GMM)
def gmm(X, k, max_iters=100):
    n_samples, n_features = X.shape
    # Initialize parameters
    np.random.seed(42)
    weights = np.ones(k) / k
    means = X[np.random.choice(X.shape[0], k, replace=False)]
    covariances = np.array([np.eye(n_features) for _ in range(k)])

    def gaussian(x, mean, cov):
        n = x.shape[0]
        diff = x - mean
        return (np.exp(-0.5 * np.dot(diff.T, np.linalg.solve(cov, diff))) /
                np.sqrt((2 * np.pi) ** n * np.linalg.det(cov)))

    def e_step(X, weights, means, covariances):
        responsibilities = np.zeros((n_samples, k))
        for i in range(n_samples):
            for j in range(k):
                responsibilities[i, j] = weights[j] * gaussian(X[i], means[j], covariances[j])
            responsibilities[i] /= np.sum(responsibilities[i])
        return responsibilities

    def m_step(X, responsibilities):
        weights = np.mean(responsibilities, axis=0)
        means = np.dot(responsibilities.T, X) / np.sum(responsibilities, axis=0)[:, np.newaxis]
        covariances = []
        for j in range(k):
            diff = X - means[j]
            cov = np.dot(responsibilities[:, j] * diff.T, diff) / np.sum(responsibilities[:, j])
            covariances.append(cov)
        return weights, means, np.array(covariances)

    for _ in range(max_iters):
        responsibilities = e_step(X, weights, means, covariances)
        weights, means, covariances = m_step(X, responsibilities)
    clusters = np.argmax(responsibilities, axis=1)
    return clusters, means

# Generate data and run GMM
X = generate_data()
clusters_gmm, means_gmm = gmm(X, k=4)

# Plot GMM results
plt.figure(figsize=(10, 6))
plt.scatter(X[:, 0], X[:, 1], c=clusters_gmm, cmap='viridis', marker='o')
plt.scatter(means_gmm[:, 0], means_gmm[:, 1], c='red', marker='x')
plt.title('Gaussian Mixture Model Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()
OUTPUT:
RESULT: Thus the program for Gaussian mixture model clustering has been executed successfully.
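NOTE: The same mixture can be fitted with scikit-learn's EM implementation as a sanity check. A minimal sketch, assuming the X array generated in the program above:
from sklearn.mixture import GaussianMixture
# Fit a 4-component GMM and inspect the recovered parameters
gm = GaussianMixture(n_components=4, random_state=42)
labels_sklearn = gm.fit_predict(X)
print(gm.means_)     # component means
print(gm.weights_)   # mixing coefficients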
PROGRAM 5.3
AIM: To implement hierarchical clustering to categorize the data
ALGORITHM:
1. Start: Treat each data point as a singleton cluster.
2. Merge: Find the pair of clusters that are closest and merge them into a single cluster.
3. Repeat: Repeat step 2 until only a single cluster remains or a stopping criterion is met (e.g., a
desired number of clusters).
4. Cut: Cut the dendrogram at the desired level to extract clusters.
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt

# Generate synthetic data
def generate_data(n_samples=300, n_centers=4, random_seed=42):
    np.random.seed(random_seed)
    points_per_center = n_samples // n_centers
    centers = np.random.uniform(-10, 10, (n_centers, 2))
    X = np.vstack([center + np.random.randn(points_per_center, 2) for center in centers])
    return X

# Calculate Euclidean distance
def euclidean_distance(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

# Compute the distance matrix
def compute_distance_matrix(X):
    n_samples = X.shape[0]
    distances = np.zeros((n_samples, n_samples))
    for i in range(n_samples):
        for j in range(i + 1, n_samples):
            distances[i, j] = euclidean_distance(X[i], X[j])
            distances[j, i] = distances[i, j]
    return distances

# Agglomerative (single-linkage) hierarchical clustering,
# stopping when the desired number of clusters is reached
def hierarchical_clustering(X, n_clusters=4):
    distances = compute_distance_matrix(X)
    n_samples = len(X)
    # Initialize clusters: each point starts as its own cluster
    clusters = [[i] for i in range(n_samples)]
    while len(clusters) > n_clusters:
        # Find the two closest clusters (single linkage: minimum pairwise distance)
        min_dist = float('inf')
        to_merge = (None, None)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = np.min([distances[p][q] for p in clusters[i] for q in clusters[j]])
                if d < min_dist:
                    min_dist = d
                    to_merge = (i, j)
        # Merge the two clusters
        i, j = to_merge
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters

# Function to extract cluster labels
def extract_clusters(clusters, n_samples):
    labels = np.zeros(n_samples)
    for cluster_id, cluster in enumerate(clusters):
        for index in cluster:
            labels[index] = cluster_id
    return labels

# Generate data and perform hierarchical clustering
X = generate_data()
final_clusters = hierarchical_clustering(X, n_clusters=4)

# Extract final cluster labels
cluster_labels = extract_clusters(final_clusters, len(X))

# Plot Hierarchical Clustering results
plt.figure(figsize=(10, 6))
plt.scatter(X[:, 0], X[:, 1], c=cluster_labels, cmap='viridis', marker='o')
plt.title('Hierarchical Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()
OUTPUT:
RESULT: Thus the desired program has been successfully executed.
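NOTE: Step 4 of the algorithm refers to cutting a dendrogram, which the from-scratch program does not draw. A small sketch using SciPy to build and cut one on the same data, assuming the X array generated in the program above:
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
import matplotlib.pyplot as plt
# Build the linkage matrix with single linkage (matching the program above),
# draw the dendrogram, and cut it into 4 flat clusters
Z = linkage(X, method='single')
dendrogram(Z)
plt.title('Dendrogram')
plt.show()
labels = fcluster(Z, t=4, criterion='maxclust')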
PROGRAM 6
AIM: To create a program to perform PCA
ALGORITHM:
1. Standardize the Data: Center the data by subtracting the mean of each feature from the
data. Optionally, scale each feature to unit variance.
2. Compute the Covariance Matrix: Calculate the covariance matrix of the centered data.
3. Calculate Eigenvalues and Eigenvectors: Find the eigenvalues and eigenvectors of the
covariance matrix. The eigenvectors determine the directions of the new feature space, and
the eigenvalues determine their magnitude.
4. Sort Eigenvalues and Eigenvectors: Sort the eigenvectors by decreasing eigenvalues and
select the top k eigenvectors to form a matrix (principal components).
5. Transform the Data: Project the original data onto the new feature space using the matrix of
principal components.
PROGRAM:
import numpy as np

def pca(X, n_components):
    # Step 1: Center the Data
    X_centered = X - np.mean(X, axis=0)
    # Step 2: Compute the Covariance Matrix
    cov_matrix = np.cov(X_centered, rowvar=False)
    # Step 3: Calculate Eigenvalues and Eigenvectors
    eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
    # Step 4: Sort Eigenvalues and Eigenvectors
    sorted_index = np.argsort(eigenvalues)[::-1]
    sorted_eigenvectors = eigenvectors[:, sorted_index]
    sorted_eigenvalues = eigenvalues[sorted_index]
    # Step 5: Select Top n_components Eigenvectors
    eigenvector_subset = sorted_eigenvectors[:, :n_components]
    # Step 6: Transform the Data
    X_reduced = np.dot(X_centered, eigenvector_subset)
    return X_reduced, sorted_eigenvalues, eigenvector_subset

# Example usage
if __name__ == "__main__":
    # Example data
    X = np.array([[2.5, 2.4],
                  [0.5, 0.7],
                  [2.2, 2.9],
                  [1.9, 2.2],
                  [3.1, 3.0],
                  [2.3, 2.7],
                  [2.0, 1.6],
                  [1.0, 1.1],
                  [1.5, 1.6],
                  [1.1, 0.9]])
    # Perform PCA
    X_reduced, eigenvalues, eigenvectors = pca(X, n_components=2)
    print("Reduced Data:\n", X_reduced)
    print("Eigenvalues:\n", eigenvalues)
    print("Eigenvectors:\n", eigenvectors)
OUTPUT:
Reduced Data:
[[ 0.82797019 -0.17511531]
[-1.77758033 0.14285723]
[ 0.99219749 0.38437499]
[ 0.27421042 0.13041721]
[ 1.67580142 -0.20949846]
[ 0.9129491 0.17528244]
[-0.09910944 -0.3498247 ]
[-1.14457216 0.04641726]
[-0.43804614 0.01776463]
[-1.22382056 -0.16267529]]
Eigenvalues:
[1.28402771 0.0490834 ]
Eigenvectors:
[[ 0.6778734 -0.73517866]
[ 0.73517866 0.6778734 ]]
RESULT: Thus the given program has been successfully executed.
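NOTE: The share of variance captured by each principal component follows directly from the sorted eigenvalues. A minimal sketch, reusing the eigenvalues returned by pca() above:
# Explained variance ratio: each eigenvalue divided by the total variance
explained_variance_ratio = eigenvalues / np.sum(eigenvalues)
print("Explained variance ratio:", explained_variance_ratio)
# With the eigenvalues printed above this works out to roughly [0.963, 0.037],
# so the first component alone captures about 96% of the variance.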
PROGRAM 7
AIM : To implement HMM to predict the sequential data.
ALGORITHM:
1. Define the Model: Specify the hidden states, the observation symbols, the start probabilities, the transition probabilities, and the emission probabilities.
2. Initialization: For the first observation, compute the probability of starting in each state and emitting that observation.
3. Recursion: For each later observation, compute for every state the most probable path ending in that state, keeping a backpointer to the previous state.
4. Termination: Select the state with the highest probability at the final time step.
5. Backtracking: Follow the backpointers to recover the most likely sequence of hidden states (the Viterbi path).
PROGRAM:
import numpy as np

class SimpleHMM:
    def __init__(self, states, observations, start_prob, trans_prob, emit_prob):
        self.states = states
        self.observations = observations
        self.start_prob = start_prob
        self.trans_prob = trans_prob
        self.emit_prob = emit_prob

    def viterbi(self, obs_sequence):
        n_states = len(self.states)
        n_obs = len(obs_sequence)
        # Initialize the dynamic programming tables
        viterbi_table = np.zeros((n_states, n_obs))
        backpointer_table = np.zeros((n_states, n_obs), dtype=int)
        # Initialization step
        first_obs = obs_sequence[0]
        for s in range(n_states):
            viterbi_table[s, 0] = self.start_prob[s] * self.emit_prob[s, first_obs]
            backpointer_table[s, 0] = 0
        # Recursion step
        for t in range(1, n_obs):
            for s in range(n_states):
                probabilities = viterbi_table[:, t-1] * self.trans_prob[:, s] * self.emit_prob[s, obs_sequence[t]]
                viterbi_table[s, t] = np.max(probabilities)
                backpointer_table[s, t] = np.argmax(probabilities)
        # Termination step
        best_path_prob = np.max(viterbi_table[:, n_obs-1])
        best_last_state = np.argmax(viterbi_table[:, n_obs-1])
        # Path backtracking
        best_path = np.zeros(n_obs, dtype=int)
        best_path[-1] = best_last_state
        for t in range(n_obs-2, -1, -1):
            best_path[t] = backpointer_table[best_path[t+1], t+1]
        return best_path, best_path_prob

# Example usage
if __name__ == "__main__":
    # Define the states, observations, and model parameters
    states = ['Rainy', 'Sunny']
    observations = ['Walk', 'Shop', 'Clean']
    start_probability = np.array([0.6, 0.4])
    transition_probability = np.array([
        [0.7, 0.3],  # From Rainy to Rainy/Sunny
        [0.4, 0.6],  # From Sunny to Rainy/Sunny
    ])
    emission_probability = np.array([
        [0.1, 0.4, 0.5],  # Probabilities of Walk/Shop/Clean from Rainy
        [0.6, 0.3, 0.1],  # Probabilities of Walk/Shop/Clean from Sunny
    ])
    # Create the HMM model
    hmm = SimpleHMM(states, observations, start_probability, transition_probability,
                    emission_probability)
    # Encode the observation sequence as integers
    obs_map = {obs: i for i, obs in enumerate(observations)}
    obs_sequence = np.array([obs_map['Walk'], obs_map['Shop'], obs_map['Clean'],
                             obs_map['Walk']])
    # Predict the most likely sequence of states
    state_sequence, probability = hmm.viterbi(obs_sequence)
    # Decode state sequence into state names
    state_names = [states[state] for state in state_sequence]
    print("Most likely states sequence:", state_names)
    print("Probability of the best path:", probability)
OUTPUT:
Most likely states sequence: ['Sunny', 'Rainy', 'Rainy', 'Sunny']
Probability of the best path: 0.0024192
RESULT: Thus the program to implement HMM has been successfully executed.
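NOTE: Viterbi returns only the single most likely state path. The total probability of the observation sequence is computed by the forward algorithm instead; a minimal sketch, assuming the start_probability, transition_probability, emission_probability and obs_sequence defined in the program above:
def forward(start_prob, trans_prob, emit_prob, obs_sequence):
    # alpha[s, t] = probability of the observations up to time t, ending in state s
    n_states = len(start_prob)
    n_obs = len(obs_sequence)
    alpha = np.zeros((n_states, n_obs))
    alpha[:, 0] = start_prob * emit_prob[:, obs_sequence[0]]
    for t in range(1, n_obs):
        for s in range(n_states):
            alpha[s, t] = np.sum(alpha[:, t-1] * trans_prob[:, s]) * emit_prob[s, obs_sequence[t]]
    return np.sum(alpha[:, -1])  # total probability of the observation sequence

print("P(observations):", forward(start_probability, transition_probability,
                                  emission_probability, obs_sequence))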
PROGRAM 8
AIM: To implement the CART learning algorithms to perform categorization.
ALGORITHM:
1. Input Data: Load the Iris dataset containing features (X) and class labels (y).
2. Split Data: Divide the dataset into training and testing sets.
3. Model Training: Fit a decision tree classifier using the Gini impurity criterion, limiting the tree depth to control overfitting.
4. Prediction: Use the trained tree to predict classes for the test set.
5. Evaluation: Measure the accuracy of the predictions on the test set.
6. Visualization: Plot the learned decision tree.
PROGRAM:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
import matplotlib.pyplot as plt
# Load the Iris dataset
data = load_iris()
X, y = data.data, data.target
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Create a Decision Tree classifier
clf = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=42)
# Train the classifier
clf.fit(X_train, y_train)
# Make predictions on the test set
y_pred = clf.predict(X_test)
# Evaluate the model
accuracy = clf.score(X_test, y_test)
print(f'Accuracy: {accuracy:.2f}')
# Visualize the Decision Tree
plt.figure(figsize=(12, 8))
tree.plot_tree(clf, feature_names=data.feature_names, class_names=data.target_names,
               filled=True)
plt.show()
OUTPUT:
Accuracy: 1.00
RESULT: Thus the given program for the implementation of CART learning algorithms to perform
categorization has been successfully executed.
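NOTE: The Gini criterion used by the classifier scores a node as 1 minus the sum of squared class proportions. A small sketch of that calculation, with an illustrative class-count vector:
import numpy as np

def gini_impurity(class_counts):
    # Gini = 1 - sum(p_i^2), where p_i is the proportion of class i in the node
    proportions = np.asarray(class_counts) / np.sum(class_counts)
    return 1.0 - np.sum(proportions ** 2)

# Example: a node holding 40, 35 and 25 samples of the three Iris classes
print(gini_impurity([40, 35, 25]))  # about 0.655; a pure node would score 0.0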
PROGRAM 9
AIM: To implement ensemble learning models to perform classification
ALGORITHM:
Bagging (e.g., Random Forest)
1. Create multiple subsets of the training data by sampling with replacement (bootstrap
sampling).
2. Train a base model (e.g., decision tree) on each subset independently.
3. Aggregate the predictions of all models by taking a majority vote (for classification) or
averaging (for regression).
Boosting (e.g., AdaBoost)
1. Initialize weights for all training examples.
2. Train a base model on the training data, weighted according to their current weights.
3. Evaluate the model and increase the weights of misclassified examples.
4. Train subsequent models iteratively, focusing more on difficult examples.
5. Combine the models by giving more weight to better-performing models.
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.decomposition import PCA
# Load Iris dataset and split into train and test sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Reduce dimensionality for visualization purposes
pca = PCA(n_components=2)
X_train_2d = pca.fit_transform(X_train)
X_test_2d = pca.transform(X_test)
# Train Random Forest classifier
rf_clf = RandomForestClassifier(random_state=42)
rf_clf.fit(X_train_2d, y_train)
# Train AdaBoost classifier
ada_clf = AdaBoostClassifier(random_state=42)
ada_clf.fit(X_train_2d, y_train)
# Function to plot decision boundaries
def plot_decision_boundaries(clf, X, y, ax, title):
    h = .02  # Step size in the mesh
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.3)
    ax.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k', marker='o')
    ax.set_title(title)
# Create plots
fig, axs = plt.subplots(1, 2, figsize=(14, 6))
# Plot Random Forest decision boundaries
plot_decision_boundaries(rf_clf, X_test_2d, y_test, axs[0], 'Random Forest')
# Plot AdaBoost decision boundaries
plot_decision_boundaries(ada_clf, X_test_2d, y_test, axs[1], 'AdaBoost')
plt.show()
OUTPUT:
RESULT: Thus the program to implement ensemble learning models to perform classification has been successfully executed and verified.
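NOTE: The program above only visualizes the decision boundaries. A small sketch comparing the two ensembles on the held-out test set, reusing rf_clf, ada_clf, X_test_2d and y_test from the program:
from sklearn.metrics import accuracy_score
# Compare test-set accuracy of the two ensembles on the 2D PCA features
rf_acc = accuracy_score(y_test, rf_clf.predict(X_test_2d))
ada_acc = accuracy_score(y_test, ada_clf.predict(X_test_2d))
print(f"Random Forest accuracy: {rf_acc:.2f}")
print(f"AdaBoost accuracy: {ada_acc:.2f}")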