Machine Learning Assignment

This document analyses student performance with a range of machine learning techniques, covering regression, classification, and clustering models. The methods include Linear Regression, Logistic Regression, Decision Trees, and K-Means Clustering, evaluated with metrics such as the R² score and the confusion matrix. The goal is to predict student performance on the Kaggle Students Performance in Exams dataset and compare how well each model does.

Uploaded by

www.shashanksaini1111


In [ ]: # This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle
# For example, here are several helpful packages to load
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/)
# You can also write temporary files to /kaggle/temp/, but they won't be saved

Student Performance Analysis Using Machine Learning

This notebook implements various machine learning techniques on the Student Performance dataset from Kaggle.

Included Techniques:

Simple & Multiple Linear Regression

Polynomial, Lasso & Ridge Regression
Naïve Bayes, Logistic Regression
Decision Tree, SVM, K-NN Classifier
Artificial Neural Network
K-Means and Hierarchical Clustering

Evaluation Metrics:

R² Score (Regression)
Confusion Matrix, F1 Score (Classification)
Silhouette Score (Clustering)
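The R² score listed above measures how much of the target's variance a regression model explains: R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations from the mean. Here is a minimal sketch that computes it by hand and checks the result against scikit-learn's `r2_score`. The four-point arrays are made-up illustration values, not data from the notebook.

```python
import numpy as np
from sklearn.metrics import r2_score

# Illustrative values only (not from the Student Performance dataset)
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_pred))  # both ≈ 0.9486
```

An R² of 1 means perfect predictions, and 0 means the model does no better than always predicting the mean. On held-out data the score can also be negative.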

In [1]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
# classification_report and silhouette_score are used in later cells
from sklearn.metrics import (mean_squared_error, r2_score, confusion_matrix,
                             classification_report, silhouette_score)
In [4]: df = pd.read_csv('/kaggle/input/students-performance-in-exams/StudentsPerfor
df = pd.get_dummies(df, drop_first=True)

X = df.drop(['math score'], axis=1)

y = df['math score']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, ran

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
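The preprocessing cell one-hot encodes the categorical columns and then standardises the features. It fits the scaler on the training split only, so no test-set statistics leak into training. The toy frame below is hypothetical: its column names are borrowed from the dataset, but its values are invented. It shows what `drop_first=True` does and illustrates the fit-on-train, transform-on-test pattern. The `dtype=float` argument is an addition here, so the dummy columns come out numeric rather than boolean.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical mini-frame with column names modelled on the notebook's dataset
toy = pd.DataFrame({
    "gender": ["female", "male", "female", "male"],
    "lunch": ["standard", "free/reduced", "standard", "standard"],
    "reading score": [72, 90, 95, 57],
})

# drop_first=True keeps k-1 indicator columns per categorical feature;
# the alphabetically first category of each becomes the implicit baseline
encoded = pd.get_dummies(toy, drop_first=True, dtype=float)
print(list(encoded.columns))  # ['reading score', 'gender_male', 'lunch_standard']

# Fit the scaler on the "training" rows only, then reuse its mean/std on the "test" row
train, test = encoded.iloc[:3], encoded.iloc[3:]
scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)
test_scaled = scaler.transform(test)
print(train_scaled.mean(axis=0))  # each training column now has mean ~0
```

pandas places the columns it does not encode ahead of the new dummy columns. If the real frame follows the same pattern, column 0 of `X` in the notebook would be 'reading score' once 'math score' is dropped. That column is the single predictor used in the simple regression that follows. This is an inference, because the notebook never prints its column order.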

📈 Regression Models
1️⃣ Simple Linear Regression
In [5]: from sklearn.linear_model import LinearRegression

model = LinearRegression()
# Simple linear regression: use only the first (scaled) feature column as the predictor
model.fit(X_train[:, [0]], y_train)
y_pred = model.predict(X_test[:, [0]])

print("R² Score:", r2_score(y_test, y_pred))

R² Score: 0.6804469009921283
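With a single predictor, ordinary least squares has a closed form: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). The sketch below uses synthetic data (an assumption, not the student dataset) to confirm that `LinearRegression` recovers the same two numbers.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption): one predictor with a linear trend plus noise
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=100)

# Closed-form OLS for one feature (population covariance and variance, ddof=0)
slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)
intercept = y.mean() - slope * x.mean()

model = LinearRegression().fit(x.reshape(-1, 1), y)
print(slope, model.coef_[0])
print(intercept, model.intercept_)
```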

2️⃣ Multiple Linear Regression

In [6]: model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("R² Score:", r2_score(y_test, y_pred))

R² Score: 0.8804332983749564
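Multiple linear regression generalises this to one weight per feature plus an intercept. You can find these by least squares on a design matrix with an extra column of ones. Here is a sketch on synthetic data (three made-up features with known weights) that compares `numpy.linalg.lstsq` with `LinearRegression`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption): three predictors with known weights and intercept 4.0
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 4.0 + rng.normal(scale=0.3, size=200)

# Prepend a column of ones so the intercept is estimated together with the weights
X_aug = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X_aug, y, rcond=None)

model = LinearRegression().fit(X, y)
print(beta[0], model.intercept_)  # intercept
print(beta[1:], model.coef_)      # one weight per feature
```

In the notebook, moving from one feature to all features raised the test R² from about 0.68 to about 0.88.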

3️⃣ Polynomial Regression

In [7]: poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X_train)

model = LinearRegression()
model.fit(X_poly, y_train)
y_pred = model.predict(poly.transform(X_test))

print("R² Score:", r2_score(y_test, y_pred))

R² Score: 0.8650480765142721
4️⃣ Ridge and Lasso Regression
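Ridge and Lasso add a penalty on the weights to the least-squares objective, scaled by `alpha`. Ridge uses an L2 penalty on the squared weights, and Lasso uses an L1 penalty on the absolute weights. In the notebook both scores are nearly identical to plain multiple regression, so the difference between the penalties is not visible there. The sketch below makes it visible. It uses synthetic data (an assumption: only 2 of 7 features matter) and a larger Lasso `alpha` of 0.5, chosen only to make the effect obvious.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (assumption): only the first 2 of 7 features affect y
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 7))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

# Ridge (L2) shrinks every weight but rarely makes one exactly zero;
# Lasso (L1) can set irrelevant weights to exactly zero
print("Ridge:", ridge.coef_.round(3))
print("Lasso:", lasso.coef_.round(3))
print("Exact zeros (Ridge, Lasso):", np.sum(ridge.coef_ == 0), np.sum(lasso.coef_ == 0))
```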
In [8]: from sklearn.linear_model import Ridge, Lasso

ridge = Ridge(alpha=1.0).fit(X_train, y_train)

lasso = Lasso(alpha=0.1).fit(X_train, y_train)

print("Ridge R²:", r2_score(y_test, ridge.predict(X_test)))

print("Lasso R²:", r2_score(y_test, lasso.predict(X_test)))

Ridge R²: 0.8805453685953484
Lasso R²: 0.8822147639745545

🤖 Classification Models

🔁 Convert Target to Binary (Pass/Fail)

In [9]: df['target'] = ['pass' if s >= 50 else 'fail' for s in df['math score']]

X = df.drop(['math score', 'target'], axis=1)

y = df['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, ran

X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
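The list comprehension labels each student 'pass' if the math score is at least 50. The sketch below gives a vectorised NumPy equivalent on hypothetical scores, together with a class-balance check. The balance matters here. The notebook's test split contains 166 'pass' and 34 'fail' rows, so always predicting 'pass' would already score 166 / 200 = 0.83 accuracy. Passing `stratify=y` to `train_test_split` is one way to keep that ratio the same across the splits.

```python
import numpy as np
import pandas as pd

# Hypothetical math scores; the 50-point threshold matches the notebook
scores = pd.Series([35, 50, 49, 78, 92, 61])

# Vectorised equivalent of: ['pass' if s >= 50 else 'fail' for s in scores]
target = pd.Series(np.where(scores >= 50, "pass", "fail"))
print(target.tolist())  # ['fail', 'pass', 'fail', 'pass', 'pass', 'pass']

# Class proportions: an imbalanced target makes raw accuracy look better than it is
print(target.value_counts(normalize=True))
```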

5️⃣ Logistic Regression

In [10]: from sklearn.linear_model import LogisticRegression

log_model = LogisticRegression()
log_model.fit(X_train, y_train)
log_preds = log_model.predict(X_test)

print("Confusion Matrix:\n", confusion_matrix(y_test, log_preds))

print("Classification Report:\n", classification_report(y_test, log_preds))
Confusion Matrix:
[[ 20 14]
[ 7 159]]
Classification Report:
              precision    recall  f1-score   support

        fail       0.74      0.59      0.66        34
        pass       0.92      0.96      0.94       166

    accuracy                           0.90       200
   macro avg       0.83      0.77      0.80       200
weighted avg       0.89      0.90      0.89       200
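Each row of the classification report can be derived from the confusion matrix. scikit-learn's `confusion_matrix` puts true labels on the rows and predicted labels on the columns, with labels in sorted order ('fail', 'pass'). The sketch below recomputes the 'fail' row of the Logistic Regression report (precision 0.74, recall 0.59, F1 0.66) from the matrix printed above. It also recomputes the overall accuracy.

```python
import numpy as np

# Logistic Regression confusion matrix from the notebook.
# Rows = true labels, columns = predicted labels, both ordered ['fail', 'pass']
cm = np.array([[20, 14],
               [7, 159]])

tp = cm[0, 0]  # true 'fail', predicted 'fail'
fn = cm[0, 1]  # true 'fail', predicted 'pass'
fp = cm[1, 0]  # true 'pass', predicted 'fail'

precision = tp / (tp + fp)                          # 20 / 27
recall = tp / (tp + fn)                             # 20 / 34
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
accuracy = np.trace(cm) / cm.sum()                  # 179 / 200

print(round(precision, 2), round(recall, 2), round(f1, 2), round(accuracy, 3))
```

The unrounded accuracy is 179 / 200 = 0.895, which the report displays as 0.90. A 'fail' recall of 0.59 means roughly four in ten failing students were predicted to pass.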

6️⃣ Naïve Bayes Classifier

In [11]: from sklearn.naive_bayes import GaussianNB

nb_model = GaussianNB()
nb_model.fit(X_train, y_train)
nb_preds = nb_model.predict(X_test)

print("Confusion Matrix:\n", confusion_matrix(y_test, nb_preds))

print("Classification Report:\n", classification_report(y_test, nb_preds))

Confusion Matrix:
[[ 23 11]
[ 18 148]]
Classification Report:
              precision    recall  f1-score   support

        fail       0.56      0.68      0.61        34
        pass       0.93      0.89      0.91       166

    accuracy                           0.85       200
   macro avg       0.75      0.78      0.76       200
weighted avg       0.87      0.85      0.86       200
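`GaussianNB` models each feature within each class as an independent normal distribution. It then predicts the class with the largest log prior plus summed log-density. The sketch below writes that calculation out by hand; the helper `gaussian_nb_predict` is hypothetical. It runs on synthetic two-class data and checks the result against scikit-learn, applying GaussianNB's default `var_smoothing=1e-9`. In the notebook many of the scaled features are one-hot dummies rather than roughly normal scores. That may be one reason Naïve Bayes tied with K-NN for the lowest accuracy (0.85), though this is a hypothesis, not a tested result.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def gaussian_nb_predict(X_train, y_train, X_test, var_smoothing=1e-9):
    """Hypothetical helper: Gaussian Naive Bayes prediction written out by hand."""
    classes = np.unique(y_train)
    # Same smoothing rule as GaussianNB: a small fraction of the largest feature variance
    eps = var_smoothing * X_train.var(axis=0).max()
    log_posteriors = []
    for c in classes:
        Xc = X_train[y_train == c]
        mean, var = Xc.mean(axis=0), Xc.var(axis=0) + eps
        log_prior = np.log(len(Xc) / len(X_train))
        # Sum of per-feature Gaussian log-densities (the "naive" independence assumption)
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (X_test - mean) ** 2 / var, axis=1)
        log_posteriors.append(log_prior + log_lik)
    return classes[np.argmax(np.column_stack(log_posteriors), axis=1)]

rng = np.random.default_rng(3)
# Synthetic two-class data (assumption): 'fail' centred at 0, 'pass' centred at 3
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array(["fail"] * 50 + ["pass"] * 50)
X_new = rng.normal(1.5, 1.5, (20, 2))

manual = gaussian_nb_predict(X, y, X_new)
sk = GaussianNB().fit(X, y).predict(X_new)
print((manual == sk).all())
```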

7️⃣ K-Nearest Neighbors (K-NN)

In [12]: from sklearn.neighbors import KNeighborsClassifier

knn_model = KNeighborsClassifier(n_neighbors=5)
knn_model.fit(X_train, y_train)
knn_preds = knn_model.predict(X_test)

print("Confusion Matrix:\n", confusion_matrix(y_test, knn_preds))

print("Classification Report:\n", classification_report(y_test, knn_preds))
Confusion Matrix:
[[ 9 25]
[ 4 162]]
Classification Report:
              precision    recall  f1-score   support

        fail       0.69      0.26      0.38        34
        pass       0.87      0.98      0.92       166

    accuracy                           0.85       200
   macro avg       0.78      0.62      0.65       200
weighted avg       0.84      0.85      0.83       200
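K-NN predicts the majority label among the k nearest training points, using Euclidean distance by default. It had the lowest 'fail' recall in the notebook (0.26): 25 of the 34 failing students were labelled 'pass'. Because 'pass' is the large majority class, a plain vote among five neighbours will often favour it. That is a plausible explanation rather than something the notebook tests. The from-scratch sketch below matches `KNeighborsClassifier` predictions; the helper `knn_predict` is hypothetical and the data is synthetic.

```python
import numpy as np
from collections import Counter
from sklearn.neighbors import KNeighborsClassifier

def knn_predict(X_train, y_train, X_test, k=5):
    """Hypothetical helper: majority vote among the k nearest training points (Euclidean)."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest = np.argsort(dists)[:k]
        preds.append(Counter(y_train[nearest]).most_common(1)[0][0])
    return np.array(preds)

rng = np.random.default_rng(4)
# Imbalanced synthetic data (assumption): 20 'fail' vs 80 'pass', overlapping clusters
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(1.5, 1, (80, 2))])
y = np.array(["fail"] * 20 + ["pass"] * 80)
X_new = rng.normal(0.75, 1.5, (30, 2))

manual = knn_predict(X, y, X_new, k=5)
sk = KNeighborsClassifier(n_neighbors=5).fit(X, y).predict(X_new)
print((manual == sk).all())
```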

8️⃣ Decision Tree Classification

In [13]: from sklearn.tree import DecisionTreeClassifier

tree_model = DecisionTreeClassifier(random_state=0)
tree_model.fit(X_train, y_train)
tree_preds = tree_model.predict(X_test)

print("Confusion Matrix:\n", confusion_matrix(y_test, tree_preds))

print("Classification Report:\n", classification_report(y_test, tree_preds))

Confusion Matrix:
[[ 20 14]
[ 11 155]]
Classification Report:
              precision    recall  f1-score   support

        fail       0.65      0.59      0.62        34
        pass       0.92      0.93      0.93       166

    accuracy                           0.88       200
   macro avg       0.78      0.76      0.77       200
weighted avg       0.87      0.88      0.87       200
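`DecisionTreeClassifier` chooses splits by Gini impurity by default (`criterion='gini'`). The sketch below defines the impurity of a node and of a candidate split; both helper names are hypothetical. The first node reuses the class counts of the notebook's test set purely as an example, and the split itself is invented.

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def weighted_gini(left, right):
    """Impurity of a candidate split, weighting each child by its size."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Class mix of the notebook's test set (34 fail, 166 pass), used only as an example node
node = np.array(["fail"] * 34 + ["pass"] * 166)
print(round(gini(node), 4))  # 0.2822

# A hypothetical split that isolates most 'fail' rows on the left
left = np.array(["fail"] * 30 + ["pass"] * 10)
right = np.array(["fail"] * 4 + ["pass"] * 156)
print(round(weighted_gini(left, right), 4))  # 0.114
```

A split is preferred when its weighted impurity is lower than the parent's (here 0.114 vs 0.2822). The notebook's tree has no depth limit, so parameters such as `max_depth` or `min_samples_leaf` are candidates for tuning, though the notebook does not explore them.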

9️⃣ Support Vector Machine (SVM) Classification
In [14]: from sklearn.svm import SVC

svm_model = SVC()
svm_model.fit(X_train, y_train)
svm_preds = svm_model.predict(X_test)

print("Confusion Matrix:\n", confusion_matrix(y_test, svm_preds))

print("Classification Report:\n", classification_report(y_test, svm_preds))
Confusion Matrix:
[[ 16 18]
[ 2 164]]
Classification Report:
              precision    recall  f1-score   support

        fail       0.89      0.47      0.62        34
        pass       0.90      0.99      0.94       166

    accuracy                           0.90       200
   macro avg       0.89      0.73      0.78       200
weighted avg       0.90      0.90      0.89       200
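`SVC()` with no arguments uses an RBF kernel, K(x, x') = exp(-gamma * ||x - x'||^2). Its default `gamma='scale'` is defined by scikit-learn as 1 / (n_features * X.var()). The sketch below builds this kernel matrix by hand on a few synthetic rows (an assumption) and checks it against `sklearn.metrics.pairwise.rbf_kernel`.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(5)
X = rng.normal(size=(6, 3))  # synthetic rows (assumption)

# gamma='scale' as used by SVC: 1 / (n_features * variance of all entries of X)
gamma = 1.0 / (X.shape[1] * X.var())

# K[i, j] = exp(-gamma * ||x_i - x_j||^2)
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K_manual = np.exp(-gamma * sq_dists)

print(np.allclose(K_manual, rbf_kernel(X, gamma=gamma)))  # True
```

In the notebook the SVM has the highest 'fail' precision (0.89) but a recall of only 0.47. `SVC(class_weight='balanced')` up-weights the minority class and is one way to trade some precision for recall. The notebook does not try it.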

🔟 Artificial Neural Network (ANN)

In [15]: from sklearn.neural_network import MLPClassifier

ann_model = MLPClassifier(hidden_layer_sizes=(10,), max_iter=300, random_sta

ann_model.fit(X_train, y_train)
ann_preds = ann_model.predict(X_test)

print("Confusion Matrix:\n", confusion_matrix(y_test, ann_preds))

print("Classification Report:\n", classification_report(y_test, ann_preds))

Confusion Matrix:
[[ 19 15]
[ 8 158]]
Classification Report:
              precision    recall  f1-score   support

        fail       0.70      0.56      0.62        34
        pass       0.91      0.95      0.93       166

    accuracy                           0.89       200
   macro avg       0.81      0.76      0.78       200
weighted avg       0.88      0.89      0.88       200

/usr/local/lib/python3.11/dist-packages/sklearn/neural_network/_multilayer_perceptron.py:686: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (300) reached and the optimization hasn't converged yet.
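`MLPClassifier(hidden_layer_sizes=(10,))` is a network with one hidden layer of 10 units. By default the hidden units use ReLU, and for a two-class target the output is a single logistic unit. The sketch below trains the same architecture on synthetic data and then reproduces `predict_proba` by hand from the learned `coefs_` and `intercepts_`. `random_state=0` is an assumption, because the notebook's value is cut off. The ConvergenceWarning is silenced here only to keep the output short.

```python
import warnings
import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 4))                        # synthetic features (assumption)
y = np.where(X[:, 0] + X[:, 1] > 0, "pass", "fail")  # synthetic labels

# Same architecture as the notebook; random_state=0 is an assumed value
with warnings.catch_warnings():
    warnings.simplefilter("ignore", ConvergenceWarning)
    clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=300, random_state=0).fit(X, y)

# Manual forward pass with the learned weights and biases
W1, W2 = clf.coefs_
b1, b2 = clf.intercepts_
hidden = np.maximum(0.0, X @ W1 + b1)               # ReLU hidden layer
p_pass = 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))  # logistic output for classes_[1] ('pass')

print(np.allclose(p_pass.ravel(), clf.predict_proba(X)[:, 1]))  # True
```

The ConvergenceWarning in the notebook means the default Adam optimiser stopped at `max_iter=300` before its loss had settled. The usual remedies are raising `max_iter` or enabling `early_stopping=True`, though it is untested whether either would change the scores above.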

📊 Clustering Models

1️⃣1️⃣ K-Means Clustering

In [16]: from sklearn.cluster import KMeans

X = df.drop(['math score', 'target'], axis=1)

X_scaled = scaler.fit_transform(X)
kmeans = KMeans(n_clusters=2, random_state=0).fit(X_scaled)
print("KMeans Silhouette Score:", silhouette_score(X_scaled, kmeans.labels_))

KMeans Silhouette Score: 0.10611407723279058

/usr/local/lib/python3.11/dist-packages/sklearn/cluster/_kmeans.py:870: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning
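For each point, the silhouette compares a, the mean distance to the other members of its own cluster, with b, the mean distance to the nearest other cluster. The score is s = (b - a) / max(a, b), averaged over all points. Values near 1 indicate compact, well-separated clusters, and values near 0 indicate overlap. The notebook's scores (0.106 for K-Means, 0.158 for hierarchical) fall in that second range. The sketch below computes the average by hand on synthetic blobs (an assumption) and checks it against `silhouette_score`. It also sets `n_init` explicitly, as the FutureWarning above suggests.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances, silhouette_score

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(4, 1, (30, 2))])  # synthetic blobs

# n_init set explicitly, which silences the FutureWarning
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

D = pairwise_distances(X)
s = np.empty(len(X))
for i in range(len(X)):
    same = labels == labels[i]
    a = D[i, same].sum() / (same.sum() - 1)  # mean distance to own cluster (D[i, i] = 0 is excluded)
    b = min(D[i, labels == c].mean()         # mean distance to the nearest other cluster
            for c in np.unique(labels) if c != labels[i])
    s[i] = (b - a) / max(a, b)

print(np.isclose(s.mean(), silhouette_score(X, labels)))  # True
```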

1️⃣2️⃣ Hierarchical Clustering

In [17]: from sklearn.cluster import AgglomerativeClustering

hclust = AgglomerativeClustering(n_clusters=2).fit(X_scaled)
print("Hierarchical Clustering Silhouette Score:", silhouette_score(X_scaled

Hierarchical Clustering Silhouette Score: 0.15767520836587193
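`AgglomerativeClustering` defaults to Ward linkage. Ward repeatedly merges the pair of clusters whose union gives the smallest increase in the within-cluster sum of squares. The sketch below uses well-separated synthetic blobs (an assumption). It builds the same kind of merge tree with SciPy's `linkage`, cuts it into two clusters with `fcluster`, and checks that both libraries find the same partition. You can also pass the SciPy linkage matrix `Z` to `scipy.cluster.hierarchy.dendrogram` to help choose the number of clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(8)
X = np.vstack([rng.normal(0, 1, (25, 2)), rng.normal(8, 1, (25, 2))])  # well-separated synthetic blobs

# scikit-learn: Ward linkage is the default
sk_labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)

# SciPy: build the full Ward merge tree, then cut it into 2 clusters
Z = linkage(X, method="ward")
scipy_labels = fcluster(Z, t=2, criterion="maxclust")

# Label numbering differs between libraries, so compare partitions with ARI (1.0 = identical)
print(adjusted_rand_score(sk_labels, scipy_labels))
```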

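📌 Conclusion
This notebook applies a range of machine learning algorithms to the Student Performance dataset: regression models to predict math scores, classifiers to predict pass/fail status, and clustering to explore the structure of the data. On the held-out test set, the models that use all features (multiple linear, Ridge and Lasso regression) reached R² ≈ 0.88, compared with 0.68 for the single-feature model. For pass/fail prediction, Logistic Regression and SVM reached the highest accuracy (0.90). However, recall on the minority "fail" class ranged from 0.26 (K-NN) to 0.68 (Naïve Bayes). Both clustering runs produced low silhouette scores (0.11 for K-Means, 0.16 for hierarchical), indicating weakly separated clusters.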

Thank you for reviewing this analysis!

Submitted by:
Raunak Kumar Singh

University Name:
Atmaram Sanatan Dharma College

Course Name:
B.Sc. Computer Science (Hons)

Date of Submission:
15-04-2025

This notebook was converted with convert.ploomber.io
