am19-eda-assignment4
November 28, 2024
Name: Swapnil Chaudhari
PRN: 2122000238
Roll No.: AM19
Assignment No. 5
B. Use 'Placement_Dataset.xlsx' and perform all of the encoding tasks listed below.
[1]: import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
[2]: df= pd.read_excel('Placement_Dataset.xlsx')
[3]: df
[3]: sl_no gender ssc_p ssc_b hsc_p hsc_b hsc_s degree_p \
0 1 M 67.00 Others 91.00 Others Commerce 58.00
1 2 M 79.33 Central 78.33 Others Science 77.48
2 3 M 65.00 Central 68.00 Central Arts 64.00
3 4 M 56.00 Central 52.00 Central Science 52.00
4 5 M 85.80 Central 73.60 Central Commerce 73.30
.. … … … … … … … …
210 211 M 80.60 Others 82.00 Others Commerce 77.60
211 212 M 58.00 Others 60.00 Others Science 72.00
212 213 M 67.00 Others 67.00 Others Commerce 73.00
213 214 F 74.00 Others 66.00 Others Commerce 58.00
214 215 M 62.00 Central 58.00 Others Science 53.00
degree_t workex etest_p specialisation mba_p status salary
0 Sci&Tech No 55.0 Mkt&HR 58.80 Placed 270000.0
1 Sci&Tech Yes 86.5 Mkt&Fin 66.28 Placed 200000.0
2 Comm&Mgmt No 75.0 Mkt&Fin 57.80 Placed 250000.0
3 Sci&Tech No 66.0 Mkt&HR 59.43 Not Placed NaN
4 Comm&Mgmt No 96.8 Mkt&Fin 55.50 Placed 425000.0
.. … … … … … … …
210 Comm&Mgmt No 91.0 Mkt&Fin 74.49 Placed 400000.0
211 Sci&Tech No 74.0 Mkt&Fin 53.62 Placed 275000.0
212 Comm&Mgmt Yes 59.0 Mkt&Fin 69.72 Placed 295000.0
213 Comm&Mgmt No 70.0 Mkt&HR 60.23 Placed 204000.0
214 Comm&Mgmt No 89.0 Mkt&HR 60.22 Not Placed NaN
[215 rows x 15 columns]
[4]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 215 entries, 0 to 214
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 sl_no 215 non-null int64
1 gender 215 non-null object
2 ssc_p 215 non-null float64
3 ssc_b 215 non-null object
4 hsc_p 215 non-null float64
5 hsc_b 215 non-null object
6 hsc_s 215 non-null object
7 degree_p 215 non-null float64
8 degree_t 215 non-null object
9 workex 215 non-null object
10 etest_p 215 non-null float64
11 specialisation 215 non-null object
12 mba_p 215 non-null float64
13 status 215 non-null object
14 salary 148 non-null float64
dtypes: float64(6), int64(1), object(8)
memory usage: 25.3+ KB
[5]: df.dtypes
[5]: sl_no int64
gender object
ssc_p float64
ssc_b object
hsc_p float64
hsc_b object
hsc_s object
degree_p float64
degree_t object
workex object
etest_p float64
specialisation object
mba_p float64
status object
salary float64
dtype: object
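Before encoding, it can help to list the non-numeric columns and count their distinct values, since that suggests which encoder suits each one. The following is a minimal sketch on a small hypothetical sample (the `sample` frame is made up for illustration and is not the real dataset):

```python
import pandas as pd

# Hypothetical stand-in for a few columns of the placement data.
sample = pd.DataFrame({
    "gender": ["M", "F", "M"],
    "ssc_p": [67.0, 79.33, 65.0],
    "hsc_s": ["Commerce", "Science", "Arts"],
    "status": ["Placed", "Placed", "Not Placed"],
})

# Keep only the non-numeric (categorical) columns.
categorical = sample.select_dtypes(exclude="number")

# Number of distinct values per categorical column.
cardinality = categorical.nunique()
print(cardinality)
```

Columns with two values (such as status) are natural candidates for label encoding, while columns with three or more unordered values (such as hsc_s) are usually one-hot encoded.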
1. Perform One Hot Encoding separately on the features degree_t and hsc_s.
[6]: df['degree_t'].unique()
[6]: array(['Sci&Tech', 'Comm&Mgmt', 'Others'], dtype=object)
[7]: ohe= OneHotEncoder()
ohe
[7]: OneHotEncoder()
[8]: feature_arr1= ohe.fit_transform(df[['degree_t']]).toarray()
feature_arr1
[8]: array([[0., 0., 1.],
[0., 0., 1.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[0., 1., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[0., 1., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 1., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[0., 1., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[0., 0., 1.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[0., 1., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[0., 1., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[0., 1., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[0., 0., 1.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[0., 0., 1.],
[1., 0., 0.],
[0., 0., 1.],
[0., 1., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 1., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 1., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.]])
[9]: feature_label1=ohe.categories_
feature_label1
[9]: [array(['Comm&Mgmt', 'Others', 'Sci&Tech'], dtype=object)]
[10]: features = feature_label1[0]  # categories_ holds one array per encoded column
[11]: df1= pd.DataFrame(feature_arr1, columns=features)
df1
[11]: Comm&Mgmt Others Sci&Tech
0 0.0 0.0 1.0
1 0.0 0.0 1.0
2 1.0 0.0 0.0
3 0.0 0.0 1.0
4 1.0 0.0 0.0
.. … … …
210 1.0 0.0 0.0
211 0.0 0.0 1.0
212 1.0 0.0 0.0
213 1.0 0.0 0.0
214 1.0 0.0 0.0
[215 rows x 3 columns]
[12]: df=pd.concat([df,df1],axis=1)
[13]: df.drop(['degree_t'],axis=1,inplace=True)
df
[13]: sl_no gender ssc_p ssc_b hsc_p hsc_b hsc_s degree_p workex \
0 1 M 67.00 Others 91.00 Others Commerce 58.00 No
1 2 M 79.33 Central 78.33 Others Science 77.48 Yes
2 3 M 65.00 Central 68.00 Central Arts 64.00 No
3 4 M 56.00 Central 52.00 Central Science 52.00 No
4 5 M 85.80 Central 73.60 Central Commerce 73.30 No
.. … … … … … … … … …
210 211 M 80.60 Others 82.00 Others Commerce 77.60 No
211 212 M 58.00 Others 60.00 Others Science 72.00 No
212 213 M 67.00 Others 67.00 Others Commerce 73.00 Yes
213 214 F 74.00 Others 66.00 Others Commerce 58.00 No
214 215 M 62.00 Central 58.00 Others Science 53.00 No
etest_p specialisation mba_p status salary Comm&Mgmt Others \
0 55.0 Mkt&HR 58.80 Placed 270000.0 0.0 0.0
1 86.5 Mkt&Fin 66.28 Placed 200000.0 0.0 0.0
2 75.0 Mkt&Fin 57.80 Placed 250000.0 1.0 0.0
3 66.0 Mkt&HR 59.43 Not Placed NaN 0.0 0.0
4 96.8 Mkt&Fin 55.50 Placed 425000.0 1.0 0.0
.. … … … … … … …
210 91.0 Mkt&Fin 74.49 Placed 400000.0 1.0 0.0
211 74.0 Mkt&Fin 53.62 Placed 275000.0 0.0 0.0
212 59.0 Mkt&Fin 69.72 Placed 295000.0 1.0 0.0
213 70.0 Mkt&HR 60.23 Placed 204000.0 1.0 0.0
214 89.0 Mkt&HR 60.22 Not Placed NaN 1.0 0.0
Sci&Tech
0 1.0
1 1.0
2 0.0
3 1.0
4 0.0
.. …
210 0.0
211 1.0
212 0.0
213 0.0
214 0.0
[215 rows x 17 columns]
[14]: df['hsc_s'].unique()
[14]: array(['Commerce', 'Science', 'Arts'], dtype=object)
[15]: feature_arr2= ohe.fit_transform(df[['hsc_s']]).toarray()
feature_arr2
[15]: array([[0., 1., 0.],
[0., 0., 1.],
[1., 0., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[1., 0., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[1., 0., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[1., 0., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[1., 0., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[1., 0., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[1., 0., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.]])
[16]: feature_label2 = ohe.categories_
feature_label2
[16]: [array(['Arts', 'Commerce', 'Science'], dtype=object)]
[17]: features = feature_label2[0]  # categories_ holds one array per encoded column
[18]: df2= pd.DataFrame(feature_arr2, columns=features)
df2
[18]: Arts Commerce Science
0 0.0 1.0 0.0
1 0.0 0.0 1.0
2 1.0 0.0 0.0
3 0.0 0.0 1.0
4 0.0 1.0 0.0
.. … … …
210 0.0 1.0 0.0
211 0.0 0.0 1.0
212 0.0 1.0 0.0
213 0.0 1.0 0.0
214 0.0 0.0 1.0
[215 rows x 3 columns]
[19]: df=pd.concat([df,df2],axis=1)
[20]: df.drop(['hsc_s'],axis=1,inplace=True)
df
[20]: sl_no gender ssc_p ssc_b hsc_p hsc_b degree_p workex etest_p \
0 1 M 67.00 Others 91.00 Others 58.00 No 55.0
1 2 M 79.33 Central 78.33 Others 77.48 Yes 86.5
2 3 M 65.00 Central 68.00 Central 64.00 No 75.0
3 4 M 56.00 Central 52.00 Central 52.00 No 66.0
4 5 M 85.80 Central 73.60 Central 73.30 No 96.8
.. … … … … … … … … …
210 211 M 80.60 Others 82.00 Others 77.60 No 91.0
211 212 M 58.00 Others 60.00 Others 72.00 No 74.0
212 213 M 67.00 Others 67.00 Others 73.00 Yes 59.0
213 214 F 74.00 Others 66.00 Others 58.00 No 70.0
214 215 M 62.00 Central 58.00 Others 53.00 No 89.0
specialisation mba_p status salary Comm&Mgmt Others Sci&Tech \
0 Mkt&HR 58.80 Placed 270000.0 0.0 0.0 1.0
1 Mkt&Fin 66.28 Placed 200000.0 0.0 0.0 1.0
2 Mkt&Fin 57.80 Placed 250000.0 1.0 0.0 0.0
3 Mkt&HR 59.43 Not Placed NaN 0.0 0.0 1.0
4 Mkt&Fin 55.50 Placed 425000.0 1.0 0.0 0.0
.. … … … … … … …
210 Mkt&Fin 74.49 Placed 400000.0 1.0 0.0 0.0
211 Mkt&Fin 53.62 Placed 275000.0 0.0 0.0 1.0
212 Mkt&Fin 69.72 Placed 295000.0 1.0 0.0 0.0
213 Mkt&HR 60.23 Placed 204000.0 1.0 0.0 0.0
214 Mkt&HR 60.22 Not Placed NaN 1.0 0.0 0.0
Arts Commerce Science
0 0.0 1.0 0.0
1 0.0 0.0 1.0
2 1.0 0.0 0.0
3 0.0 0.0 1.0
4 0.0 1.0 0.0
.. … … …
210 0.0 1.0 0.0
211 0.0 0.0 1.0
212 0.0 1.0 0.0
213 0.0 1.0 0.0
214 0.0 0.0 1.0
[215 rows x 19 columns]
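As a cross-check on the steps above, pandas can produce a comparable result in one call with `pd.get_dummies`. This is a swapped-in alternative, not the approach used in this notebook, and the `sample` frame below is a hypothetical stand-in for the dataset:

```python
import pandas as pd

# Hypothetical sample with the two features encoded in this task plus one numeric column.
sample = pd.DataFrame({
    "degree_t": ["Sci&Tech", "Comm&Mgmt", "Others"],
    "hsc_s": ["Commerce", "Science", "Arts"],
    "mba_p": [58.80, 66.28, 57.80],
})

# Encode both columns at once; the originals are dropped and the
# new columns are prefixed with the source column name.
encoded = pd.get_dummies(sample, columns=["degree_t", "hsc_s"], dtype=float)
print(encoded.columns.tolist())
```

Unlike a fitted `OneHotEncoder`, `get_dummies` does not remember the categories it saw, so it is better suited to one-off exploration than to transforming new data later.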
2. Perform Label Encoding on the feature status.
[21]: le=LabelEncoder()
le
[21]: LabelEncoder()
[22]: df['status'].unique()
[22]: array(['Placed', 'Not Placed'], dtype=object)
[24]: df['status']=le.fit_transform(df['status'])
[25]: df
[25]: sl_no gender ssc_p ssc_b hsc_p hsc_b degree_p workex etest_p \
0 1 M 67.00 Others 91.00 Others 58.00 No 55.0
1 2 M 79.33 Central 78.33 Others 77.48 Yes 86.5
2 3 M 65.00 Central 68.00 Central 64.00 No 75.0
3 4 M 56.00 Central 52.00 Central 52.00 No 66.0
4 5 M 85.80 Central 73.60 Central 73.30 No 96.8
.. … … … … … … … … …
210 211 M 80.60 Others 82.00 Others 77.60 No 91.0
211 212 M 58.00 Others 60.00 Others 72.00 No 74.0
212 213 M 67.00 Others 67.00 Others 73.00 Yes 59.0
213 214 F 74.00 Others 66.00 Others 58.00 No 70.0
214 215 M 62.00 Central 58.00 Others 53.00 No 89.0
specialisation mba_p status salary Comm&Mgmt Others Sci&Tech \
0 Mkt&HR 58.80 1 270000.0 0.0 0.0 1.0
1 Mkt&Fin 66.28 1 200000.0 0.0 0.0 1.0
2 Mkt&Fin 57.80 1 250000.0 1.0 0.0 0.0
3 Mkt&HR 59.43 0 NaN 0.0 0.0 1.0
4 Mkt&Fin 55.50 1 425000.0 1.0 0.0 0.0
.. … … … … … … …
210 Mkt&Fin 74.49 1 400000.0 1.0 0.0 0.0
211 Mkt&Fin 53.62 1 275000.0 0.0 0.0 1.0
212 Mkt&Fin 69.72 1 295000.0 1.0 0.0 0.0
213 Mkt&HR 60.23 1 204000.0 1.0 0.0 0.0
214 Mkt&HR 60.22 0 NaN 1.0 0.0 0.0
Arts Commerce Science
0 0.0 1.0 0.0
1 0.0 0.0 1.0
2 1.0 0.0 0.0
3 0.0 0.0 1.0
4 0.0 1.0 0.0
.. … … …
210 0.0 1.0 0.0
211 0.0 0.0 1.0
212 0.0 1.0 0.0
213 0.0 1.0 0.0
214 0.0 0.0 1.0
[215 rows x 19 columns]
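To confirm which integer stands for which label, the fitted encoder's `classes_` attribute can be inspected, and `inverse_transform` recovers the original strings. A minimal sketch on a hypothetical list of status values:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the status column.
status = ["Placed", "Placed", "Not Placed", "Placed"]

le = LabelEncoder()
codes = le.fit_transform(status)

# classes_ is sorted, and the position of each label is its integer code.
mapping = {label: code for code, label in enumerate(le.classes_)}
print(mapping)

# inverse_transform maps the integer codes back to the original labels.
decoded = le.inverse_transform(codes)
print(list(decoded))
```

Because the classes are sorted alphabetically, 'Not Placed' becomes 0 and 'Placed' becomes 1, which agrees with the status column in the output above.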