Prediction of Master of Science (M.S.) Admissions Using a Machine Learning Model
Dataset reference:
https://github.com/santhoshpkumar/StudentAdmissionsKeras
Step 1: Importing necessary libraries
In [32]:
from matplotlib import pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
import numpy as np; np.random.seed(0)
import seaborn as sns; sns.set()
import warnings; warnings.simplefilter('ignore')
Step 2: Importing dataset
In [33]:
dataset=pd.read_csv('binary.csv')
dataset.head()
Out[33]:
admit gre gpa rank
0 0 380 3.61 3
1 1 660 3.67 3
2 1 800 4.00 1
3 1 640 NaN 4
4 0 520 2.93 4
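Before handling missing values, it can help to eyeball the summary statistics. This optional cell (not part of the original notebook) uses standard pandas calls:

print(dataset.shape)       # expect (400, 4): 400 applicants, 4 columns
print(dataset.describe())  # count, mean, std, min, quartiles, max per column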
Step 3: Preprocessing, Analysis, and Feature Engineering
Replacing missing values (i.e., NaN) with column mean values
In [34]:
dataset.apply(lambda x: sum(x.isnull()))  # count missing values per column
Out[34]:
admit    0
gre      0
gpa      1
rank     0
dtype: int64
In [35]:
dataset.mean()  # mean of each column
Out[35]:
admit      0.317500
gre      587.700000
gpa        3.390401
rank       2.485000
dtype: float64
In [36]:
dataset['gpa'].tail()
Out[36]:
395    4.00
396    3.04
397    2.63
398    3.65
399    3.89
Name: gpa, dtype: float64
In [37]:
dataset.fillna(dataset.mean())  # preview: returns a copy with NaN replaced by column means (not yet assigned)
Out[37]:
admit gre gpa rank
0 0 380 3.610000 3
1 1 660 3.670000 3
2 1 800 4.000000 1
3 1 640 3.390401 4
4 0 520 2.930000 4
5 1 760 3.000000 2
6 1 560 2.980000 1
7 0 400 3.080000 2
8 1 540 3.390000 3
9 0 700 3.920000 2
10 0 800 4.000000 4
11 0 440 3.220000 1
12 1 760 4.000000 1
13 0 700 3.080000 2
14 1 700 4.000000 1
15 0 480 3.440000 3
16 0 780 3.870000 4
17 0 360 2.560000 3
18 0 800 3.750000 2
19 1 540 3.810000 1
20 0 500 3.170000 3
21 1 660 3.630000 2
22 0 600 2.820000 4
23 0 680 3.190000 4
24 1 760 3.350000 2
25 1 800 3.660000 1
26 1 620 3.610000 1
27 1 520 3.740000 4
28 1 780 3.220000 2
29 0 520 3.290000 1
... ... ... ... ...
370 1 540 3.770000 2
371 1 680 3.760000 3
372 1 680 2.420000 1
373 1 620 3.370000 1
374 0 560 3.780000 2
375 0 560 3.490000 4
376 0 620 3.630000 2
377 1 800 4.000000 2
378 0 640 3.120000 3
379 0 540 2.700000 2
380 0 700 3.650000 2
381 1 540 3.490000 2
382 0 540 3.510000 2
383 0 660 4.000000 1
384 1 480 2.620000 2
385 0 420 3.020000 1
386 1 740 3.860000 2
387 0 580 3.360000 2
388 0 640 3.170000 2
389 0 640 3.510000 2
390 1 800 3.050000 2
391 1 660 3.880000 2
392 1 600 3.380000 3
393 1 620 3.750000 2
394 1 460 3.990000 3
395 0 620 4.000000 2
396 0 560 3.040000 3
397 0 460 2.630000 2
398 0 700 3.650000 2
399 0 600 3.890000 3
400 rows × 4 columns
In [38]:
dataset = dataset.fillna(dataset.mean())  # overwrite dataset with the mean-imputed values
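An equivalent route is scikit-learn's SimpleImputer. This is a hedged alternative sketch (assumes scikit-learn 0.20 or later), not what the notebook ran:

from sklearn.impute import SimpleImputer

# Alternative sketch: mean-impute the gpa column with scikit-learn
imputer = SimpleImputer(strategy='mean')
dataset[['gpa']] = imputer.fit_transform(dataset[['gpa']])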
In [39]:
# Convert the DataFrame into a NumPy array
dataArray = dataset.values
Step 4: Splitting Data into Testing & Training Sets
In [40]:
# Splitting Input features & Output Variables
X = dataArray[:,1:4]
y = dataArray[:,0:1]
In [41]:
# Splitting training & testing sets
validation_size = 0.10
seed = 9
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=validation_size, random_state=seed)
print(X_train.shape)
print(X_test.shape)
print(Y_train.shape)
print(Y_test.shape)
(360, 3)
(40, 3)
(360, 1)
(40, 1)
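Since only about 32% of applicants are admitted (see the mean of admit above), a stratified split keeps the class ratio similar in both sets. A minimal sketch using train_test_split's stratify parameter (not used in the original run):

# Sketch: stratified variant of the same split (preserves the ~32% admit rate)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=validation_size, random_state=seed, stratify=y)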
Step 5: Applying Different Machine Learning Algorithms & Comparing
Here we run all seven algorithms on the training data and evaluate each one with 10-fold cross-validation (Step 6). As the results there show, LR and LDA perform best. The models compared are:
Logistic Regression (LR)
Linear Discriminant Analysis (LDA)
K-Nearest Neighbours (KNN)
Decision Tree (CART)
Random Forest (RF)
Gaussian Naive Bayes (NB)
Support Vector Machine (SVM)
In [42]:
num_trees = 200
max_features = 3
models = []
models.append(('LR', LogisticRegression()))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier()))
models.append(('RF', RandomForestClassifier(n_estimators=num_trees, max_features=max_features)))
models.append(('NB', GaussianNB()))
models.append(('SVM', SVC()))
# Fit models and evaluate
results = []
names = []
scoring = 'accuracy'
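One caveat: gre is on a scale of hundreds while gpa tops out at 4.0, so distance- and margin-based models (KNN, SVM) can be dominated by gre. A hedged sketch that adds a scaled SVM variant to the comparison via a scikit-learn Pipeline (an addition for illustration, not part of the original model list):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Sketch: standardize features before SVC so gre does not dominate the kernel
models.append(('ScaledSVM', Pipeline([('scaler', StandardScaler()),
                                      ('svc', SVC())])))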
Step 6: Applying Cross-Validation
In [43]:
# Cross-validation
for name, model in models:
    kfold = KFold(n_splits=10, random_state=7)
    cv_results = cross_val_score(model, X_train, Y_train, cv=kfold, scoring=scoring)
    results.append(cv_results)
    names.append(name)
    msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std())
    print(msg)
LR: 0.694444 (0.087841)
LDA: 0.702778 (0.083380)
KNN: 0.658333 (0.079592)
CART: 0.636111 (0.072913)
RF: 0.675000 (0.043123)
NB: 0.688889 (0.088541)
SVM: 0.658333 (0.093830)
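A box plot of the per-fold accuracies makes this comparison easier to read than the printed means alone. A minimal sketch (not in the original notebook) using the results and names collected above:

# Sketch: compare per-fold CV accuracy across models
plt.boxplot(results, labels=names)
plt.title('10-fold CV accuracy by algorithm')
plt.show()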
Step 7: Creating Prediction Model and Checking Accuracy of Model
In [44]:
# Step 1 - Create prediction model
model = LogisticRegression()
# Step 2 - Fit model
model.fit(X_train, Y_train)
# Step 3 - Predictions
predictions = model.predict(X_test)
# Step 4 - Check accuracy
print("Model --- LogisticRegression")
print("Accuracy: {}".format(accuracy_score(Y_test, predictions) * 100))
print(classification_report(Y_test, predictions))
Model --- LogisticRegression
Accuracy: 77.5
              precision    recall  f1-score   support
         0.0       0.81      0.94      0.87        31
         1.0       0.50      0.22      0.31         9
    accuracy                           0.78        40
   macro avg       0.65      0.58      0.59        40
weighted avg       0.74      0.78      0.74        40
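Logistic regression also exposes class probabilities, which are often more informative than hard labels for admissions-style decisions. A small sketch (not in the original notebook):

# Sketch: predicted probability of admission for each test-set student
admit_prob = model.predict_proba(X_test)[:, 1]
print(admit_prob[:5])  # first five probabilities, values in [0, 1]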
Step 8: Data Visualization
In [45]:
# Plotting the confusion matrix as a heatmap
cm = confusion_matrix(Y_test, predictions)
plt.figure(figsize=(3, 3))  # create the figure before drawing, so the heatmap uses it
sns.heatmap(cm, annot=True, xticklabels=['reject', 'admit'], yticklabels=['reject', 'admit'])
plt.show()
[Output: 2×2 heatmap of the confusion matrix]
Step 9: Testing Model with New Data
In [46]:
# Making predictions on some new data
new_data = [(720, 4, 1), (300, 2, 3), (400, 3, 4)]
# Convert to a NumPy array
new_array = np.asarray(new_data)
# Output labels
labels = ["reject", "admit"]
# Predictions
prediction = model.predict(new_array)
# Number of test cases used
no_of_test_cases, cols = new_array.shape
for i in range(no_of_test_cases):
    print("Status of student with GRE score = {}, GPA = {}, Rank = {} will be ----- {}".format(
        new_data[i][0], new_data[i][1], new_data[i][2], labels[int(prediction[i])]))
Status of student with GRE score = 720, GPA = 4, Rank = 1 will be ----- admit
Status of student with GRE score = 300, GPA = 2, Rank = 3 will be ----- reject
Status of student with GRE score = 400, GPA = 3, Rank = 4 will be ----- reject
Name: Omkar Rane, Block 1, Batch 1, Roll No: BETB118, ENTC (Machine Learning Elective)