Prediction of Master of Science (M.S.) Admissions Using a Machine Learning Model
Dataset reference:
https://github.com/santhoshpkumar/StudentAdmissionsKeras
Step 1: Importing necessary libraries
In [32]:
from matplotlib import pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
import numpy as np; np.random.seed(0)
import seaborn as sns; sns.set()
import warnings; warnings.simplefilter('ignore')
Step 2: Importing dataset
In [33]:
dataset=pd.read_csv('binary.csv')
dataset.head()
Out[33]:
admit gre gpa rank
0 0 380 3.61 3
1 1 660 3.67 3
2 1 800 4.00 1
3 1 640 NaN 4
4 0 520 2.93 4
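Before handling missing values, it can help to eyeball the summary statistics. This optional cell (not part of the original notebook) uses standard pandas calls:

print(dataset.shape)       # expect (400, 4): 400 applicants, 4 columns
print(dataset.describe())  # count, mean, std, min, quartiles, max per column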
Step 3: Preprocessing, Analysis, and Feature Engineering
Replacing missing values (i.e., NaN) with column mean values
In [34]:
dataset.apply(lambda x: sum(x.isnull()))  # count missing values per column
Out[34]:
admit    0
gre      0
gpa      1
rank     0
dtype: int64
In [35]:
dataset.mean()  # mean of each column
Out[35]:
admit      0.317500
gre      587.700000
gpa        3.390401
rank       2.485000
dtype: float64
In [36]:
dataset['gpa'].tail()
Out[36]:
395    4.00
396    3.04
397    2.63
398    3.65
399    3.89
Name: gpa, dtype: float64
In [37]:
dataset.fillna(dataset.mean())  # preview: returns a copy with NaN replaced by column means (not yet assigned)
Out[37]:
admit gre gpa rank
0 0 380 3.610000 3
1 1 660 3.670000 3
2 1 800 4.000000 1
3 1 640 3.390401 4
4 0 520 2.930000 4
5 1 760 3.000000 2
6 1 560 2.980000 1
7 0 400 3.080000 2
8 1 540 3.390000 3
9 0 700 3.920000 2
10 0 800 4.000000 4
11 0 440 3.220000 1
12 1 760 4.000000 1
13 0 700 3.080000 2
14 1 700 4.000000 1
15 0 480 3.440000 3
16 0 780 3.870000 4
17 0 360 2.560000 3
18 0 800 3.750000 2
19 1 540 3.810000 1
20 0 500 3.170000 3
21 1 660 3.630000 2
22 0 600 2.820000 4
23 0 680 3.190000 4
24 1 760 3.350000 2
25 1 800 3.660000 1
26 1 620 3.610000 1
27 1 520 3.740000 4
28 1 780 3.220000 2
29 0 520 3.290000 1
... ... ... ... ...
370 1 540 3.770000 2
371 1 680 3.760000 3
372 1 680 2.420000 1
373 1 620 3.370000 1
374 0 560 3.780000 2
375 0 560 3.490000 4
376 0 620 3.630000 2
377 1 800 4.000000 2
378 0 640 3.120000 3
379 0 540 2.700000 2
380 0 700 3.650000 2
381 1 540 3.490000 2
382 0 540 3.510000 2
383 0 660 4.000000 1
384 1 480 2.620000 2
385 0 420 3.020000 1
386 1 740 3.860000 2
387 0 580 3.360000 2
388 0 640 3.170000 2
389 0 640 3.510000 2
390 1 800 3.050000 2
391 1 660 3.880000 2
392 1 600 3.380000 3
393 1 620 3.750000 2
394 1 460 3.990000 3
395 0 620 4.000000 2
396 0 560 3.040000 3
397 0 460 2.630000 2
398 0 700 3.650000 2
399 0 600 3.890000 3
400 rows × 4 columns
In [38]:
dataset = dataset.fillna(dataset.mean())  # overwrite dataset with the mean-imputed values
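An equivalent route is scikit-learn's SimpleImputer. This is a hedged alternative sketch (assumes scikit-learn 0.20 or later), not what the notebook ran:

from sklearn.impute import SimpleImputer

# Alternative sketch: mean-impute the gpa column with scikit-learn
imputer = SimpleImputer(strategy='mean')
dataset[['gpa']] = imputer.fit_transform(dataset[['gpa']])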
In [39]:
# Convert the DataFrame into a NumPy array
dataArray = dataset.values
Step 4: Splitting Data into Testing & Training Sets
In [40]:
# Splitting Input features & Output Variables
X = dataArray[:,1:4]
y = dataArray[:,0:1]
In [41]:
# Splitting training & testing sets
validation_size = 0.10
seed = 9
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=validation_size, random_state=seed)
print(X_train.shape)
print(X_test.shape)
print(Y_train.shape)
print(Y_test.shape)
(360, 3)
(40, 3)
(360, 1)
(40, 1)
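Since only about 32% of applicants are admitted (see the mean of admit above), a stratified split keeps the class ratio similar in both sets. A minimal sketch using train_test_split's stratify parameter (not used in the original run):

# Sketch: stratified variant of the same split (preserves the ~32% admit rate)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=validation_size, random_state=seed, stratify=y)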
Step 5: Applying Different Machine Learning Algorithms & Comparing
Here we run all seven algorithms on the training data and evaluate each one with 10-fold cross-validation (Step 6). As the results there show, LR and LDA perform best. The models compared are:
Logistic Regression (LR)
Linear Discriminant Analysis (LDA)
K-Nearest Neighbours (KNN)
Decision Tree (CART)
Random Forest (RF)
Gaussian Naive Bayes (NB)
Support Vector Machine (SVM)
In [42]:
num_trees = 200
max_features = 3
models = []
models.append(('LR', LogisticRegression()))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier()))
models.append(('RF', RandomForestClassifier(n_estimators=num_trees, max_features=max_features)))
models.append(('NB', GaussianNB()))
models.append(('SVM', SVC()))
# Fit models and evaluate
results = []
names = []
scoring = 'accuracy'
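One caveat: gre is on a scale of hundreds while gpa tops out at 4.0, so distance- and margin-based models (KNN, SVM) can be dominated by gre. A hedged sketch that adds a scaled SVM variant to the comparison via a scikit-learn Pipeline (an addition for illustration, not part of the original model list):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Sketch: standardize features before SVC so gre does not dominate the kernel
models.append(('ScaledSVM', Pipeline([('scaler', StandardScaler()),
                                      ('svc', SVC())])))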
Step 6: Applying Cross-Validation
In [43]:
# Cross-validation
for name, model in models:
    kfold = KFold(n_splits=10, random_state=7)
    cv_results = cross_val_score(model, X_train, Y_train, cv=kfold, scoring=scoring)
    results.append(cv_results)
    names.append(name)
    msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std())
    print(msg)
LR: 0.694444 (0.087841)
LDA: 0.702778 (0.083380)
KNN: 0.658333 (0.079592)
CART: 0.636111 (0.072913)
RF: 0.675000 (0.043123)
NB: 0.688889 (0.088541)
SVM: 0.658333 (0.093830)
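A box plot of the per-fold accuracies makes this comparison easier to read than the printed means alone. A minimal sketch (not in the original notebook) using the results and names collected above:

# Sketch: compare per-fold CV accuracy across models
plt.boxplot(results, labels=names)
plt.title('10-fold CV accuracy by algorithm')
plt.show()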
Step 7: Creating Prediction Model and Checking Accuracy of Model
In [44]:
# Step 1 - Create prediction model
model = LogisticRegression()
# Step 2 - Fit model
model.fit(X_train, Y_train)
# Step 3 - Predictions
predictions = model.predict(X_test)
# Step 4 - Check accuracy
print("Model --- LogisticRegression")
print("Accuracy: {}".format(accuracy_score(Y_test, predictions) * 100))
print(classification_report(Y_test, predictions))
Model --- LogisticRegression
Accuracy: 77.5
              precision    recall  f1-score   support
         0.0       0.81      0.94      0.87        31
         1.0       0.50      0.22      0.31         9
    accuracy                           0.78        40
   macro avg       0.65      0.58      0.59        40
weighted avg       0.74      0.78      0.74        40
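Logistic regression also exposes class probabilities, which are often more informative than hard labels for admissions-style decisions. A small sketch (not in the original notebook):

# Sketch: predicted probability of admission for each test-set student
admit_prob = model.predict_proba(X_test)[:, 1]
print(admit_prob[:5])  # first five probabilities, values in [0, 1]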
Step 8: Data Visualization
In [45]:
# Plotting the confusion matrix as a heatmap
cm = confusion_matrix(Y_test, predictions)
plt.figure(figsize=(3, 3))  # create the figure before drawing, so the heatmap uses it
sns.heatmap(cm, annot=True, xticklabels=['reject', 'admit'], yticklabels=['reject', 'admit'])
plt.show()
[Output: 2×2 heatmap of the confusion matrix]
Step 9: Testing Model with New Data
In [46]:
# Making predictions on some new data
new_data = [(720, 4, 1), (300, 2, 3), (400, 3, 4)]
# Convert to a NumPy array
new_array = np.asarray(new_data)
# Output labels
labels = ["reject", "admit"]
# Predictions
prediction = model.predict(new_array)
# Number of test cases used
no_of_test_cases, cols = new_array.shape
for i in range(no_of_test_cases):
    print("Status of student with GRE score = {}, GPA = {}, Rank = {} will be ----- {}".format(
        new_data[i][0], new_data[i][1], new_data[i][2], labels[int(prediction[i])]))
Status of student with GRE score = 720, GPA = 4, Rank = 1 will be ----- admit
Status of student with GRE score = 300, GPA = 2, Rank = 3 will be ----- reject
Status of student with GRE score = 400, GPA = 3, Rank = 4 will be ----- reject
Name: Omkar Rane, Block 1, Batch 1, Roll No: BETB118, ENTC (Machine Learning Elective)