AM19 EDA Assignment4

Assignment 5 of EDA
am19-eda-assignment4

November 28, 2024

Name: Swapnil Chaudhari


PRN: 2122000238
Roll No.: AM19
Assignment No. 5
B. Use 'Placement_Dataset.xlsx' and perform all of the encoding tasks listed below.
[1]: import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder, LabelEncoder

[2]: df = pd.read_excel('Placement_Dataset.xlsx')

[3]: df

[3]:  sl_no gender  ssc_p    ssc_b  hsc_p    hsc_b     hsc_s  degree_p  \
0         1      M  67.00   Others  91.00   Others  Commerce     58.00
1         2      M  79.33  Central  78.33   Others   Science     77.48
2         3      M  65.00  Central  68.00  Central      Arts     64.00
3         4      M  56.00  Central  52.00  Central   Science     52.00
4         5      M  85.80  Central  73.60  Central  Commerce     73.30
..      ...    ...    ...      ...    ...      ...       ...       ...
210     211      M  80.60   Others  82.00   Others  Commerce     77.60
211     212      M  58.00   Others  60.00   Others   Science     72.00
212     213      M  67.00   Others  67.00   Others  Commerce     73.00
213     214      F  74.00   Others  66.00   Others  Commerce     58.00
214     215      M  62.00  Central  58.00   Others   Science     53.00

      degree_t  workex  etest_p  specialisation  mba_p      status    salary
0     Sci&Tech      No     55.0          Mkt&HR  58.80      Placed  270000.0
1     Sci&Tech     Yes     86.5         Mkt&Fin  66.28      Placed  200000.0
2    Comm&Mgmt      No     75.0         Mkt&Fin  57.80      Placed  250000.0
3     Sci&Tech      No     66.0          Mkt&HR  59.43  Not Placed       NaN
4    Comm&Mgmt      No     96.8         Mkt&Fin  55.50      Placed  425000.0
..         ...     ...      ...             ...    ...         ...       ...
210  Comm&Mgmt      No     91.0         Mkt&Fin  74.49      Placed  400000.0
211   Sci&Tech      No     74.0         Mkt&Fin  53.62      Placed  275000.0
212  Comm&Mgmt     Yes     59.0         Mkt&Fin  69.72      Placed  295000.0
213  Comm&Mgmt      No     70.0          Mkt&HR  60.23      Placed  204000.0
214  Comm&Mgmt      No     89.0          Mkt&HR  60.22  Not Placed       NaN

[215 rows x 15 columns]

[4]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 215 entries, 0 to 214
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 sl_no 215 non-null int64
1 gender 215 non-null object
2 ssc_p 215 non-null float64
3 ssc_b 215 non-null object
4 hsc_p 215 non-null float64
5 hsc_b 215 non-null object
6 hsc_s 215 non-null object
7 degree_p 215 non-null float64
8 degree_t 215 non-null object
9 workex 215 non-null object
10 etest_p 215 non-null float64
11 specialisation 215 non-null object
12 mba_p 215 non-null float64
13 status 215 non-null object
14 salary 148 non-null float64
dtypes: float64(6), int64(1), object(8)
memory usage: 25.3+ KB

[5]: df.dtypes

[5]: sl_no int64


gender object
ssc_p float64
ssc_b object
hsc_p float64
hsc_b object
hsc_s object
degree_p float64
degree_t object
workex object
etest_p float64
specialisation object
mba_p float64
status object

salary float64
dtype: object
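
Before encoding, it can help to list which columns are non-numeric and how many distinct values each one holds. The following is a minimal sketch on a small sample typed in from the first five rows displayed above (not the full spreadsheet); the names `sample`, `categorical_cols`, and `cardinality` are illustrative and do not appear in the original notebook.

```python
import pandas as pd

# Small sample copied from the first five rows shown above (not the full file).
sample = pd.DataFrame({
    "gender": ["M", "M", "M", "M", "M"],
    "hsc_s": ["Commerce", "Science", "Arts", "Science", "Commerce"],
    "degree_t": ["Sci&Tech", "Sci&Tech", "Comm&Mgmt", "Sci&Tech", "Comm&Mgmt"],
    "status": ["Placed", "Placed", "Placed", "Not Placed", "Placed"],
    "mba_p": [58.80, 66.28, 57.80, 59.43, 55.50],
})

# Non-numeric columns are the candidates for encoding.
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

# Number of distinct values in each categorical column.
cardinality = sample[categorical_cols].nunique()
print(cardinality)
```

On the full dataset the same two calls could be applied to `df` directly; the counts would then reflect all 215 rows rather than this five-row sample.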

1. Perform one-hot encoding separately on the features degree_t and hsc_s.


[6]: df['degree_t'].unique()

[6]: array(['Sci&Tech', 'Comm&Mgmt', 'Others'], dtype=object)

[7]: ohe = OneHotEncoder()
ohe

[7]: OneHotEncoder()

[8]: # Fit on degree_t; toarray() converts the sparse result to a dense array
feature_arr1 = ohe.fit_transform(df[['degree_t']]).toarray()
feature_arr1

[8]: array([[0., 0., 1.],


[0., 0., 1.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],

[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[0., 1., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[0., 1., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 1., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],

[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[0., 1., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[0., 0., 1.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],

[0., 1., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[0., 1., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[0., 1., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[0., 0., 1.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],

[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[0., 0., 1.],
[1., 0., 0.],
[0., 0., 1.],
[0., 1., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 1., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 1., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.]])

[9]: feature_label1 = ohe.categories_
feature_label1

[9]: [array(['Comm&Mgmt', 'Others', 'Sci&Tech'], dtype=object)]

[10]: # Flatten the single-element list of category arrays into column names
features = np.array(feature_label1).ravel()

[11]: df1 = pd.DataFrame(feature_arr1, columns=features)
df1

[11]:  Comm&Mgmt  Others  Sci&Tech
0            0.0     0.0       1.0
1            0.0     0.0       1.0
2            1.0     0.0       0.0
3            0.0     0.0       1.0
4            1.0     0.0       0.0
..           ...     ...       ...
210          1.0     0.0       0.0
211          0.0     0.0       1.0
212          1.0     0.0       0.0
213          1.0     0.0       0.0
214          1.0     0.0       0.0

[215 rows x 3 columns]

[12]: df = pd.concat([df, df1], axis=1)

[13]: df.drop(['degree_t'], axis=1, inplace=True)
df

[13]: sl_no gender ssc_p ssc_b hsc_p hsc_b hsc_s degree_p workex \
0 1 M 67.00 Others 91.00 Others Commerce 58.00 No
1 2 M 79.33 Central 78.33 Others Science 77.48 Yes
2 3 M 65.00 Central 68.00 Central Arts 64.00 No
3 4 M 56.00 Central 52.00 Central Science 52.00 No
4 5 M 85.80 Central 73.60 Central Commerce 73.30 No
.. … … … … … … … … …
210 211 M 80.60 Others 82.00 Others Commerce 77.60 No
211 212 M 58.00 Others 60.00 Others Science 72.00 No
212 213 M 67.00 Others 67.00 Others Commerce 73.00 Yes
213 214 F 74.00 Others 66.00 Others Commerce 58.00 No
214 215 M 62.00 Central 58.00 Others Science 53.00 No

etest_p specialisation mba_p status salary Comm&Mgmt Others \


0 55.0 Mkt&HR 58.80 Placed 270000.0 0.0 0.0
1 86.5 Mkt&Fin 66.28 Placed 200000.0 0.0 0.0
2 75.0 Mkt&Fin 57.80 Placed 250000.0 1.0 0.0
3 66.0 Mkt&HR 59.43 Not Placed NaN 0.0 0.0
4 96.8 Mkt&Fin 55.50 Placed 425000.0 1.0 0.0
.. … … … … … … …

210 91.0 Mkt&Fin 74.49 Placed 400000.0 1.0 0.0
211 74.0 Mkt&Fin 53.62 Placed 275000.0 0.0 0.0
212 59.0 Mkt&Fin 69.72 Placed 295000.0 1.0 0.0
213 70.0 Mkt&HR 60.23 Placed 204000.0 1.0 0.0
214 89.0 Mkt&HR 60.22 Not Placed NaN 1.0 0.0

Sci&Tech
0 1.0
1 1.0
2 0.0
3 1.0
4 0.0
.. …
210 0.0
211 1.0
212 0.0
213 0.0
214 0.0

[215 rows x 17 columns]

[14]: df['hsc_s'].unique()

[14]: array(['Commerce', 'Science', 'Arts'], dtype=object)

[15]: # Refit the same encoder on hsc_s (this overwrites ohe.categories_)
feature_arr2 = ohe.fit_transform(df[['hsc_s']]).toarray()
feature_arr2

[15]: array([[0., 1., 0.],


[0., 0., 1.],
[1., 0., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],

[1., 0., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[1., 0., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],

[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[1., 0., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],

[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[1., 0., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],

[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[1., 0., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[1., 0., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],

[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[0., 0., 1.]])

[16]: feature_label2 = ohe.categories_
feature_label2

[16]: [array(['Arts', 'Commerce', 'Science'], dtype=object)]

[17]: features = np.array(feature_label2).ravel()

[18]: df2 = pd.DataFrame(feature_arr2, columns=features)
df2

[18]:  Arts  Commerce  Science
0       0.0       1.0      0.0
1       0.0       0.0      1.0
2       1.0       0.0      0.0
3       0.0       0.0      1.0
4       0.0       1.0      0.0
..      ...       ...      ...
210     0.0       1.0      0.0
211     0.0       0.0      1.0
212     0.0       1.0      0.0
213     0.0       1.0      0.0
214     0.0       0.0      1.0

[215 rows x 3 columns]

[19]: df = pd.concat([df, df2], axis=1)

[20]: df.drop(['hsc_s'], axis=1, inplace=True)
df

[20]: sl_no gender ssc_p ssc_b hsc_p hsc_b degree_p workex etest_p \
0 1 M 67.00 Others 91.00 Others 58.00 No 55.0
1 2 M 79.33 Central 78.33 Others 77.48 Yes 86.5
2 3 M 65.00 Central 68.00 Central 64.00 No 75.0
3 4 M 56.00 Central 52.00 Central 52.00 No 66.0
4 5 M 85.80 Central 73.60 Central 73.30 No 96.8
.. … … … … … … … … …
210 211 M 80.60 Others 82.00 Others 77.60 No 91.0

211 212 M 58.00 Others 60.00 Others 72.00 No 74.0
212 213 M 67.00 Others 67.00 Others 73.00 Yes 59.0
213 214 F 74.00 Others 66.00 Others 58.00 No 70.0
214 215 M 62.00 Central 58.00 Others 53.00 No 89.0

specialisation mba_p status salary Comm&Mgmt Others Sci&Tech \


0 Mkt&HR 58.80 Placed 270000.0 0.0 0.0 1.0
1 Mkt&Fin 66.28 Placed 200000.0 0.0 0.0 1.0
2 Mkt&Fin 57.80 Placed 250000.0 1.0 0.0 0.0
3 Mkt&HR 59.43 Not Placed NaN 0.0 0.0 1.0
4 Mkt&Fin 55.50 Placed 425000.0 1.0 0.0 0.0
.. … … … … … … …
210 Mkt&Fin 74.49 Placed 400000.0 1.0 0.0 0.0
211 Mkt&Fin 53.62 Placed 275000.0 0.0 0.0 1.0
212 Mkt&Fin 69.72 Placed 295000.0 1.0 0.0 0.0
213 Mkt&HR 60.23 Placed 204000.0 1.0 0.0 0.0
214 Mkt&HR 60.22 Not Placed NaN 1.0 0.0 0.0

Arts Commerce Science


0 0.0 1.0 0.0
1 0.0 0.0 1.0
2 1.0 0.0 0.0
3 0.0 0.0 1.0
4 0.0 1.0 0.0
.. … … …
210 0.0 1.0 0.0
211 0.0 0.0 1.0
212 0.0 1.0 0.0
213 0.0 1.0 0.0
214 0.0 0.0 1.0

[215 rows x 19 columns]
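
The fit, wrap, concatenate, and drop steps used above for each feature can also be collapsed into a single call. The sketch below uses `pandas.get_dummies`, which is a different tool from the `OneHotEncoder` used in this notebook and is shown only as an alternative. The `sample` frame is typed in from the first five rows displayed earlier, and `encoded` is an illustrative name.

```python
import pandas as pd

# First five rows of the relevant columns, copied from the output shown earlier.
sample = pd.DataFrame({
    "sl_no": [1, 2, 3, 4, 5],
    "degree_t": ["Sci&Tech", "Sci&Tech", "Comm&Mgmt", "Sci&Tech", "Comm&Mgmt"],
    "hsc_s": ["Commerce", "Science", "Arts", "Science", "Commerce"],
})

# Encode both features at once; each new column is prefixed with its source name,
# and the original degree_t and hsc_s columns are dropped automatically.
encoded = pd.get_dummies(sample, columns=["degree_t", "hsc_s"], dtype=float)
print(encoded.columns.tolist())
```

Note that `get_dummies` only creates columns for categories present in the data it is given, so this five-row sample yields no `degree_t_Others` column; a fitted `OneHotEncoder` remembers its categories and is usually the safer choice when the same encoding must be reapplied to new data.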

2. Perform label encoding on the feature status.


[21]: le = LabelEncoder()
le

[21]: LabelEncoder()

[22]: df['status'].unique()

[22]: array(['Placed', 'Not Placed'], dtype=object)

[24]: df['status'] = le.fit_transform(df['status'])

[25]: df

[25]: sl_no gender ssc_p ssc_b hsc_p hsc_b degree_p workex etest_p \
0 1 M 67.00 Others 91.00 Others 58.00 No 55.0
1 2 M 79.33 Central 78.33 Others 77.48 Yes 86.5
2 3 M 65.00 Central 68.00 Central 64.00 No 75.0
3 4 M 56.00 Central 52.00 Central 52.00 No 66.0
4 5 M 85.80 Central 73.60 Central 73.30 No 96.8
.. … … … … … … … … …
210 211 M 80.60 Others 82.00 Others 77.60 No 91.0
211 212 M 58.00 Others 60.00 Others 72.00 No 74.0
212 213 M 67.00 Others 67.00 Others 73.00 Yes 59.0
213 214 F 74.00 Others 66.00 Others 58.00 No 70.0
214 215 M 62.00 Central 58.00 Others 53.00 No 89.0

specialisation mba_p status salary Comm&Mgmt Others Sci&Tech \


0 Mkt&HR 58.80 1 270000.0 0.0 0.0 1.0
1 Mkt&Fin 66.28 1 200000.0 0.0 0.0 1.0
2 Mkt&Fin 57.80 1 250000.0 1.0 0.0 0.0
3 Mkt&HR 59.43 0 NaN 0.0 0.0 1.0
4 Mkt&Fin 55.50 1 425000.0 1.0 0.0 0.0
.. … … … … … … …
210 Mkt&Fin 74.49 1 400000.0 1.0 0.0 0.0
211 Mkt&Fin 53.62 1 275000.0 0.0 0.0 1.0
212 Mkt&Fin 69.72 1 295000.0 1.0 0.0 0.0
213 Mkt&HR 60.23 1 204000.0 1.0 0.0 0.0
214 Mkt&HR 60.22 0 NaN 1.0 0.0 0.0

Arts Commerce Science


0 0.0 1.0 0.0
1 0.0 0.0 1.0
2 1.0 0.0 0.0
3 0.0 0.0 1.0
4 0.0 1.0 0.0
.. … … …
210 0.0 1.0 0.0
211 0.0 0.0 1.0
212 0.0 1.0 0.0
213 0.0 1.0 0.0
214 0.0 0.0 1.0

[215 rows x 19 columns]
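
After label encoding, the status column holds only 0 and 1, so it is worth confirming which label each code stands for. The sketch below shows one way to read the mapping back from a fitted `LabelEncoder` through its `classes_` attribute and `inverse_transform` method. The `status` list is copied from the first five rows shown earlier; `codes`, `mapping`, and `decoded` are illustrative names.

```python
from sklearn.preprocessing import LabelEncoder

# status values of the first five rows, as displayed before encoding.
status = ["Placed", "Placed", "Placed", "Not Placed", "Placed"]

le = LabelEncoder()
codes = le.fit_transform(status)

# classes_ is sorted, and each label's position in it is its integer code.
mapping = {label: code for code, label in enumerate(le.classes_)}
print(mapping)

# inverse_transform recovers the original labels from the codes.
decoded = le.inverse_transform(codes)
print(list(decoded))
```

Because the classes are sorted alphabetically, 'Not Placed' receives code 0 and 'Placed' receives code 1, which agrees with the encoded status column printed above.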
