In [3]: import pandas as pd
import numpy as np
# Set seed for reproducibility
np.random.seed(0)
# Create a DataFrame with 3 columns and 50 rows of random numeric data
data = np.random.rand(50, 3)
df = pd.DataFrame(data, columns=['A', 'B', 'C'])
# Replace 10% of the data with nulls: randomly pick 10% of the row labels and set those entire rows to NaN
null_indices = np.random.choice(df.index, size=int(0.1 * len(df)), replace=False)
df.loc[null_indices] = np.nan  # .loc selects by label; here labels equal positions (default RangeIndex)
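Note: the cell above blanks out whole rows. If the intent were instead to null 10% of individual cells, a minimal sketch (illustrative only, not executed in this run) could draw random flat positions:

# Sketch (assumption): set ~10% of individual cells to NaN rather than whole rows
n_cells = int(0.1 * df.size)                       # total number of cells to blank out
flat_pos = np.random.choice(df.size, size=n_cells, replace=False)
rows, cols = np.unravel_index(flat_pos, df.shape)  # flat positions -> (row, col) pairs
for r, c in zip(rows, cols):
    df.iat[r, c] = np.nan                          # .iat writes a single cell by position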
In [4]: # a. Identify and count missing values in a DataFrame
missing_values_count = df.isnull().sum()
print("Missing Values Count:")
print(missing_values_count)
Missing Values Count:
A 5
B 5
C 5
dtype: int64
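Note: a couple of related counts, shown here only as a sketch (not part of the executed cells):

# Sketch: total missing cells and a per-row breakdown
total_missing = df.isnull().sum().sum()      # single scalar: all NaN cells in the frame
missing_per_row = df.isnull().sum(axis=1)    # Series: NaN count for each row label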
In [5]: # b. Drop the column having more than 5 null values
# thresh is the minimum number of non-null values a column must have to be kept,
# so columns with more than 5 NaNs are dropped
df = df.dropna(thresh=len(df) - 5, axis=1)
print("\nDataFrame after dropping columns with more than 5 null values:")
print(df)
DataFrame after dropping columns with more than 5 null values:
A B C
0 0.548814 0.715189 0.602763
1 0.544883 0.423655 0.645894
2 0.437587 0.891773 0.963663
3 0.383442 0.791725 0.528895
4 0.568045 0.925597 0.071036
5 0.087129 0.020218 0.832620
6 NaN NaN NaN
7 0.799159 0.461479 0.780529
8 0.118274 0.639921 0.143353
9 0.944669 0.521848 0.414662
10 0.264556 0.774234 0.456150
11 0.568434 0.018790 0.617635
12 0.612096 0.616934 0.943748
13 0.681820 0.359508 0.437032
14 NaN NaN NaN
15 0.670638 0.210383 0.128926
16 NaN NaN NaN
17 0.438602 0.988374 0.102045
18 0.208877 0.161310 0.653108
19 0.253292 0.466311 0.244426
20 0.158970 0.110375 0.656330
21 0.138183 0.196582 0.368725
22 0.820993 0.097101 0.837945
23 0.096098 0.976459 0.468651
24 0.976761 0.604846 0.739264
25 0.039188 0.282807 0.120197
26 0.296140 0.118728 0.317983
27 0.414263 0.064147 0.692472
28 0.566601 0.265389 0.523248
29 0.093941 0.575946 0.929296
30 0.318569 0.667410 0.131798
31 0.716327 0.289406 0.183191
32 0.586513 0.020108 0.828940
33 0.004695 0.677817 0.270008
34 0.735194 0.962189 0.248753
35 0.576157 0.592042 0.572252
36 0.223082 0.952749 0.447125
37 0.846409 0.699479 0.297437
38 0.813798 0.396506 0.881103
39 0.581273 0.881735 0.692532
40 0.725254 0.501324 0.956084
41 0.643990 0.423855 0.606393
42 0.019193 0.301575 0.660174
43 NaN NaN NaN
44 0.135474 0.298282 0.569965
45 0.590873 0.574325 0.653201
46 0.652103 0.431418 0.896547
47 0.367562 0.435865 0.891923
48 0.806194 0.703889 0.100227
49 NaN NaN NaN
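Note: an equivalent and arguably more explicit formulation keeps only the columns whose null count stays at or below the threshold; a sketch under the same assumptions:

# Sketch: keep columns with at most 5 NaNs via a boolean column mask
max_nulls = 5
df_kept = df.loc[:, df.isnull().sum() <= max_nulls]  # same columns as the thresh-based dropna above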
In [7]: # c. Identify the row label having the maximum sum of all values in a row and drop that row
max_sum_row_label = df.sum(axis=1).idxmax()
df = df.drop(index=max_sum_row_label)
print("\nDataFrame after dropping row with maximum sum of values:")
print(df)
DataFrame after dropping row with maximum sum of values:
A B C
0 0.548814 0.715189 0.602763
1 0.544883 0.423655 0.645894
3 0.383442 0.791725 0.528895
4 0.568045 0.925597 0.071036
5 0.087129 0.020218 0.832620
6 NaN NaN NaN
7 0.799159 0.461479 0.780529
8 0.118274 0.639921 0.143353
9 0.944669 0.521848 0.414662
10 0.264556 0.774234 0.456150
11 0.568434 0.018790 0.617635
12 0.612096 0.616934 0.943748
13 0.681820 0.359508 0.437032
14 NaN NaN NaN
15 0.670638 0.210383 0.128926
16 NaN NaN NaN
17 0.438602 0.988374 0.102045
18 0.208877 0.161310 0.653108
19 0.253292 0.466311 0.244426
20 0.158970 0.110375 0.656330
21 0.138183 0.196582 0.368725
22 0.820993 0.097101 0.837945
23 0.096098 0.976459 0.468651
25 0.039188 0.282807 0.120197
26 0.296140 0.118728 0.317983
27 0.414263 0.064147 0.692472
28 0.566601 0.265389 0.523248
29 0.093941 0.575946 0.929296
30 0.318569 0.667410 0.131798
31 0.716327 0.289406 0.183191
32 0.586513 0.020108 0.828940
33 0.004695 0.677817 0.270008
34 0.735194 0.962189 0.248753
35 0.576157 0.592042 0.572252
36 0.223082 0.952749 0.447125
37 0.846409 0.699479 0.297437
38 0.813798 0.396506 0.881103
39 0.581273 0.881735 0.692532
40 0.725254 0.501324 0.956084
41 0.643990 0.423855 0.606393
42 0.019193 0.301575 0.660174
43 NaN NaN NaN
44 0.135474 0.298282 0.569965
45 0.590873 0.574325 0.653201
46 0.652103 0.431418 0.896547
47 0.367562 0.435865 0.891923
48 0.806194 0.703889 0.100227
49 NaN NaN NaN
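Note: df.sum(axis=1) skips NaN by default, so the all-NaN rows sum to 0.0 and can never win the maximum here (all values are non-negative). If such rows should be excluded from the comparison outright, a hedged variant (sketch only):

# Sketch: all-NaN rows produce NaN sums, which idxmax then ignores
row_sums = df.sum(axis=1, min_count=1)   # min_count=1: require at least one non-NaN value
max_sum_row_label = row_sums.idxmax()    # idxmax skips NaN sums by default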
In [8]: # d. Sort the DataFrame on the basis of the first column
df_sorted = df.sort_values(by='A')
print("\nDataFrame sorted on the basis of the first column:")
print(df_sorted)
DataFrame sorted on the basis of the first column:
A B C
33 0.004695 0.677817 0.270008
42 0.019193 0.301575 0.660174
25 0.039188 0.282807 0.120197
5 0.087129 0.020218 0.832620
29 0.093941 0.575946 0.929296
23 0.096098 0.976459 0.468651
8 0.118274 0.639921 0.143353
44 0.135474 0.298282 0.569965
21 0.138183 0.196582 0.368725
20 0.158970 0.110375 0.656330
18 0.208877 0.161310 0.653108
36 0.223082 0.952749 0.447125
19 0.253292 0.466311 0.244426
10 0.264556 0.774234 0.456150
26 0.296140 0.118728 0.317983
30 0.318569 0.667410 0.131798
47 0.367562 0.435865 0.891923
3 0.383442 0.791725 0.528895
27 0.414263 0.064147 0.692472
17 0.438602 0.988374 0.102045
1 0.544883 0.423655 0.645894
0 0.548814 0.715189 0.602763
28 0.566601 0.265389 0.523248
4 0.568045 0.925597 0.071036
11 0.568434 0.018790 0.617635
35 0.576157 0.592042 0.572252
39 0.581273 0.881735 0.692532
32 0.586513 0.020108 0.828940
45 0.590873 0.574325 0.653201
12 0.612096 0.616934 0.943748
41 0.643990 0.423855 0.606393
46 0.652103 0.431418 0.896547
15 0.670638 0.210383 0.128926
13 0.681820 0.359508 0.437032
31 0.716327 0.289406 0.183191
40 0.725254 0.501324 0.956084
34 0.735194 0.962189 0.248753
7 0.799159 0.461479 0.780529
48 0.806194 0.703889 0.100227
38 0.813798 0.396506 0.881103
22 0.820993 0.097101 0.837945
37 0.846409 0.699479 0.297437
9 0.944669 0.521848 0.414662
6 NaN NaN NaN
14 NaN NaN NaN
16 NaN NaN NaN
43 NaN NaN NaN
49 NaN NaN NaN
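Note: sort_values places NaN rows last by default; a quick sketch if the opposite placement or a descending sort were wanted:

# Sketch: descending sort on 'A' with NaN rows listed first
df_sorted_desc = df.sort_values(by='A', ascending=False, na_position='first')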
In [9]: # e. Remove all duplicates from the first column
df_unique = df.drop_duplicates(subset='A')
print("\nDataFrame after removing duplicates from the first column:")
print(df_unique)
DataFrame after removing duplicates from the first column:
A B C
0 0.548814 0.715189 0.602763
1 0.544883 0.423655 0.645894
3 0.383442 0.791725 0.528895
4 0.568045 0.925597 0.071036
5 0.087129 0.020218 0.832620
6 NaN NaN NaN
7 0.799159 0.461479 0.780529
8 0.118274 0.639921 0.143353
9 0.944669 0.521848 0.414662
10 0.264556 0.774234 0.456150
11 0.568434 0.018790 0.617635
12 0.612096 0.616934 0.943748
13 0.681820 0.359508 0.437032
15 0.670638 0.210383 0.128926
17 0.438602 0.988374 0.102045
18 0.208877 0.161310 0.653108
19 0.253292 0.466311 0.244426
20 0.158970 0.110375 0.656330
21 0.138183 0.196582 0.368725
22 0.820993 0.097101 0.837945
23 0.096098 0.976459 0.468651
25 0.039188 0.282807 0.120197
26 0.296140 0.118728 0.317983
27 0.414263 0.064147 0.692472
28 0.566601 0.265389 0.523248
29 0.093941 0.575946 0.929296
30 0.318569 0.667410 0.131798
31 0.716327 0.289406 0.183191
32 0.586513 0.020108 0.828940
33 0.004695 0.677817 0.270008
34 0.735194 0.962189 0.248753
35 0.576157 0.592042 0.572252
36 0.223082 0.952749 0.447125
37 0.846409 0.699479 0.297437
38 0.813798 0.396506 0.881103
39 0.581273 0.881735 0.692532
40 0.725254 0.501324 0.956084
41 0.643990 0.423855 0.606393
42 0.019193 0.301575 0.660174
44 0.135474 0.298282 0.569965
45 0.590873 0.574325 0.653201
46 0.652103 0.431418 0.896547
47 0.367562 0.435865 0.891923
48 0.806194 0.703889 0.100227
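Note: drop_duplicates treats NaN values in 'A' as equal to one another, which is why only the first all-NaN row (label 6) survives above. A sketch for deduplicating while keeping every NaN row, should that ever be preferred:

# Sketch: deduplicate on 'A' but retain all rows whose 'A' is NaN
mask = df['A'].isna() | ~df.duplicated(subset='A')
df_unique_keep_nan = df[mask]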
In [10]: # f. Find the correlation between the first and second column and the covariance between the second and third column
correlation_AB = df['A'].corr(df['B'])
covariance_BC = df['B'].cov(df['C'])
print("\nCorrelation between the first and second column:",correlation_AB)
print("Covariance between the second and third column:",covariance_BC)
Correlation between the first and second column: 0.05849765987946871
Covariance between the second and third column: -0.025965685609794554
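Note: the full pairwise matrices are also available directly; shown only as a sketch:

# Sketch: pairwise correlation and covariance matrices over all numeric columns
corr_matrix = df.corr()  # Pearson correlation by default; NaNs excluded pairwise
cov_matrix = df.cov()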
In [16]: # g. Discretize the second column and create 5 bins
# pd.qcut splits 'B' into 5 equal-frequency (quantile) bins; labels=False returns integer bin codes 0-4
df['B_bins'] = pd.qcut(df['B'], q=5, labels=False)
print("\nDataFrame with discretized second column:")
print(df)
DataFrame with discretized second column:
A B C B_bins
0 0.548814 0.715189 0.602763 4.0
1 0.544883 0.423655 0.645894 2.0
3 0.383442 0.791725 0.528895 4.0
4 0.568045 0.925597 0.071036 4.0
5 0.087129 0.020218 0.832620 0.0
6 NaN NaN NaN NaN
7 0.799159 0.461479 0.780529 2.0
8 0.118274 0.639921 0.143353 3.0
9 0.944669 0.521848 0.414662 2.0
10 0.264556 0.774234 0.456150 4.0
11 0.568434 0.018790 0.617635 0.0
12 0.612096 0.616934 0.943748 3.0
13 0.681820 0.359508 0.437032 1.0
14 NaN NaN NaN NaN
15 0.670638 0.210383 0.128926 1.0
16 NaN NaN NaN NaN
17 0.438602 0.988374 0.102045 4.0
18 0.208877 0.161310 0.653108 0.0
19 0.253292 0.466311 0.244426 2.0
20 0.158970 0.110375 0.656330 0.0
21 0.138183 0.196582 0.368725 0.0
22 0.820993 0.097101 0.837945 0.0
23 0.096098 0.976459 0.468651 4.0
25 0.039188 0.282807 0.120197 1.0
26 0.296140 0.118728 0.317983 0.0
27 0.414263 0.064147 0.692472 0.0
28 0.566601 0.265389 0.523248 1.0
29 0.093941 0.575946 0.929296 3.0
30 0.318569 0.667410 0.131798 3.0
31 0.716327 0.289406 0.183191 1.0
32 0.586513 0.020108 0.828940 0.0
33 0.004695 0.677817 0.270008 3.0
34 0.735194 0.962189 0.248753 4.0
35 0.576157 0.592042 0.572252 3.0
36 0.223082 0.952749 0.447125 4.0
37 0.846409 0.699479 0.297437 3.0
38 0.813798 0.396506 0.881103 1.0
39 0.581273 0.881735 0.692532 4.0
40 0.725254 0.501324 0.956084 2.0
41 0.643990 0.423855 0.606393 2.0
42 0.019193 0.301575 0.660174 1.0
43 NaN NaN NaN NaN
44 0.135474 0.298282 0.569965 1.0
45 0.590873 0.574325 0.653201 2.0
46 0.652103 0.431418 0.896547 2.0
47 0.367562 0.435865 0.891923 2.0
48 0.806194 0.703889 0.100227 3.0
49 NaN NaN NaN NaN
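Note: qcut yields quantile (equal-frequency) bins. If equal-width bins were wanted instead, a sketch with pd.cut (the string labels below are purely illustrative):

# Sketch: 5 equal-width bins over the range of 'B', with hypothetical labels
df['B_width_bins'] = pd.cut(df['B'], bins=5,
                            labels=['very low', 'low', 'mid', 'high', 'very high'])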
In [17]: print('By- Aaryan Pandey 13591')
By- Aaryan Pandey 13591