Pandas Ds

Uploaded by

feroz22sep


24/07/2025, 08:33 Pandas - Colab

Pandas
Pandas allows us to analyze big data and draw conclusions based on statistical theories.

Pandas can clean messy data sets, and make them readable and relevant.

Relevant data is very important in data science.

import pandas as pd

d = {'fruits': ["apple", "banana", "orange"], 'vegetables': ["tomato", "onion", "carrot"]}
# Build a DataFrame from the dictionary so the table below is displayed
pd.DataFrame(d)

   fruits vegetables
0   apple     tomato
1  banana      onion
2  orange     carrot

Pandas Series

A Pandas Series is like a column in a table.

It is a one-dimensional array holding data of any type.

import pandas as pd
a=[3,6,7]
s=pd.Series(a)
print(s)

0 3
1 6
2 7
dtype: int64

Labels
import pandas as pd
a=[3,6,7]
s=pd.Series(a)
print(s)
print(s[0])

0 3
1 6
2 7
dtype: int64
3

s=pd.Series(a,index=["x","y","z"])
print(s)

x 3
y 6
z 7
dtype: int64
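
Once custom labels are set, an item can be looked up by its label rather than by its position. A minimal sketch, reusing the values from the cell above:

```python
import pandas as pd

a = [3, 6, 7]
s = pd.Series(a, index=["x", "y", "z"])

# Look up a single value by its label
print(s["y"])

# Positional access is still available through iloc
print(s.iloc[0])
```

Using iloc for positions keeps label lookups and positional lookups clearly separated.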

Key/Value Objects as Series

You can also use a key/value object, like a dictionary, when creating a Series.

import pandas as pd
calories = {"day1": 420, "day2": 380, "day3": 390}
s = pd.Series(calories)
print(s)

day1 420
day2 380
day3 390
dtype: int64

import pandas as pd
Family={"father":"Chand Basha","Mother":"Fathima","D1":"Farasha","D2":"Sana","D3":"Firoz"}
s=pd.Series(Family)
print(s)

father Chand Basha

Mother Fathima
D1 Farasha
D2 Sana
D3 Firoz
dtype: object

DataFrames
Data sets in Pandas are usually multi-dimensional tables, called DataFrames.

A Series is like a column; a DataFrame is the whole table.

import pandas as pd
data={
"Exam": ["Python","Java","Data Science"],
"Marks":[80,90,100]
}
mydata=pd.DataFrame(data)
print(mydata)

Exam Marks
0 Python 80
1 Java 90
2 Data Science 100

import pandas as pd
data={
"Sisters": ["Farasha","Sana"],
"Parents": ["Chand","Fathima"]
}
mydta=pd.DataFrame(data)
print(mydta)

Sisters Parents
0 Farasha Chand
1 Sana Fathima

A Pandas DataFrame is a two-dimensional data structure, like a two-dimensional array or a table with rows and columns.

import pandas as pd

data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}

#load data into a DataFrame object:

df = pd.DataFrame(data)

print(df)

calories duration
0 420 50
1 380 40
2 390 45

Locate Row

As you can see from the result above, the DataFrame is like a table with rows and columns.

Pandas uses the loc attribute to return one or more specified rows.

print(mydta.loc[0])

Sisters Farasha
Parents Chand
Name: 0, dtype: object

print(mydta.loc[1])

Sisters Sana
Parents Fathima
Name: 1, dtype: object

print(mydata.loc[0])

Exam Python
Marks 80
Name: 0, dtype: object

print(mydata.loc[1])

Exam Java
Marks 90
Name: 1, dtype: object

Example

import pandas as pd
data={

"Rollno":[5801, 5802, 5803, 5804],

"Students":["Pavan", "Kavya", "Firoz", "Manoj"]
}
df=pd.DataFrame(data)
print(df)

Rollno Students
0 5801 Pavan
1 5802 Kavya
2 5803 Firoz
3 5804 Manoj

print(df.loc[0])

Rollno 5801
Students Pavan
Name: 0, dtype: object

print(df.loc[[1,2]])

Rollno Students
1 5802 Kavya
2 5803 Firoz
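
The two loc calls above return different kinds of objects: a single label gives a Series, while a list of labels gives a DataFrame. A short sketch that checks this on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    "Rollno": [5801, 5802, 5803, 5804],
    "Students": ["Pavan", "Kavya", "Firoz", "Manoj"],
})

one_row = df.loc[0]        # single label -> Series
two_rows = df.loc[[1, 2]]  # list of labels -> DataFrame

print(type(one_row).__name__)
print(type(two_rows).__name__)
```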

Named Indexes

import pandas as pd

data = {
"Emcet marks": [420, 380, 390],
"Rank": [50, 40, 45]
}

df = pd.DataFrame(data, index = ["Fahad", "Shazi", "Sania"])

print(df)

Emcet marks Rank

Fahad 420 50
Shazi 380 40
Sania 390 45

Locate Named Indexes

Use the named index in the loc attribute to return the specified row(s).

print(df.loc["Shazi"])

Emcet marks 380

Rank 40
Name: Shazi, dtype: int64

Load Files Into a DataFrame

Read CSV Files

A simple way to store big data sets is to use CSV files (comma-separated values files).

CSV files contain plain text and are a well-known format that can be read by almost any tool, including Pandas.

In our examples we will be using a CSV file called 'data.csv'.

from google.colab import files

uploaded = files.upload()


If you have a large DataFrame with many rows, Pandas will only display the first 5 rows and the last 5 rows:

import pandas as pd

df = pd.read_csv('data.csv')

print(df)

Duration Pulse Maxpulse Calories

0 60 110 130 409.1
1 60 117 145 479.0
2 60 103 135 340.0
3 45 109 175 282.4
4 45 117 148 406.0
.. ... ... ... ...
164 60 105 140 290.8
165 60 110 145 300.0
166 60 115 145 310.2
167 75 120 150 320.4
168 75 125 150 330.4

[169 rows x 4 columns]
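
If data.csv is not at hand, read_csv can be tried on a small in-memory CSV instead. The sketch below is an illustration only: it copies three rows from the output above into a string and uses io.StringIO in place of a file path. The name df_small is hypothetical.

```python
import io
import pandas as pd

# Three rows copied from the output above, for illustration only
csv_text = """Duration,Pulse,Maxpulse,Calories
60,110,130,409.1
60,117,145,479.0
45,109,175,282.4
"""

df_small = pd.read_csv(io.StringIO(csv_text))

print(df_small.head(2))  # first two rows
print(df_small.shape)    # (rows, columns)
```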

Use to_string() to print the entire DataFrame.

import pandas as pd

df = pd.read_csv('data.csv')

print(df.to_string())

Duration Pulse Maxpulse Calories

0 60 110 130 409.1
1 60 117 145 479.0
2 60 103 135 340.0
3 45 109 175 282.4
4 45 117 148 406.0
5 60 102 127 300.0
6 60 110 136 374.0
7 45 104 134 253.3
8 30 109 133 195.1
9 60 98 124 269.0
10 60 103 147 329.3
11 60 100 120 250.7
12 60 106 128 345.3
13 60 104 132 379.3
14 60 98 123 275.0
15 60 98 120 215.2
16 60 100 120 300.0
17 45 90 112 NaN
18 60 103 123 323.0
19 45 97 125 243.0
20 60 108 131 364.2

21 45 100 119 282.0
22 60 130 101 300.0
23 45 105 132 246.0
24 60 102 126 334.5
25 60 100 120 250.0
26 60 92 118 241.0
27 60 103 132 NaN
28 60 100 132 280.0
29 60 102 129 380.3
30 60 92 115 243.0
31 45 90 112 180.1
32 60 101 124 299.0
33 60 93 113 223.0
34 60 107 136 361.0
35 60 114 140 415.0
36 60 102 127 300.0
37 60 100 120 300.0
38 60 100 120 300.0
39 45 104 129 266.0
40 45 90 112 180.1
41 60 98 126 286.0
42 60 100 122 329.4
43 60 111 138 400.0
44 60 111 131 397.0
45 60 99 119 273.0
46 60 109 153 387.6
47 45 111 136 300.0
48 45 108 129 298.0
49 60 111 139 397.6
50 60 107 136 380.2
51 80 123 146 643.1
52 60 106 130 263.0
53 60 118 151 486.0
54 30 136 175 238.0
55 60 121 146 450.7
56 60 118 121 413.0

max_rows

The number of rows returned is defined in Pandas option settings.

You can check your system's maximum rows with the pd.options.display.max_rows statement.

print(pd.options.display.max_rows)

Handling Missing Data

import pandas as pd
data={

"Rollno":[5801, 5802, 5803, 5804],

"Students":["Pavan", "Kavya", None, "Manoj"]
}
df=pd.DataFrame(data)
print(df)

Rollno Students
0 5801 Pavan
1 5802 Kavya
2 5803 None
3 5804 Manoj

1. Check for Missing Values

print(df.isnull())
print(df.isnull().sum())

Rollno Students
0 False False
1 False False
2 False True
3 False False
Rollno 0
Students 1
dtype: int64

# Store the result in its own variable; assigning to df.dropped triggers a UserWarning
df_dropped = df.dropna()
print(df_dropped)

Rollno Students
0 5801 Pavan
1 5802 Kavya
3 5804 Manoj

df_filled = df.fillna(0)
print(df_filled)

Rollno Students
0 5801 Pavan
1 5802 Kavya
2 5803 0
3 5804 Manoj

# bfill() replaces the deprecated fillna(method="bfill")
df_bfill = df.bfill()
print(df_bfill)

Rollno Students
0 5801 Pavan
1 5802 Kavya
2 5803 Manoj
3 5804 Manoj

# mean() covers numeric columns only, so the missing name in "Students" stays None
df_filled_mean = df.fillna(df.mean(numeric_only=True))
print(df_filled_mean)

Rollno Students
0 5801 Pavan
1 5802 Kavya
2 5803 None
3 5804 Manoj
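
Filling with the column mean only has a visible effect when the gap sits in a numeric column. A minimal sketch, assuming a hypothetical Marks column that is not part of the notebook's data:

```python
import pandas as pd
import numpy as np

# Hypothetical data: one missing numeric value
marks = pd.DataFrame({
    "Rollno": [5801, 5802, 5803],
    "Marks": [80.0, np.nan, 100.0],
})

# The missing mark is replaced by the mean of the other two marks
filled = marks.fillna(marks.mean(numeric_only=True))
print(filled)
```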

Hierarchical Indexes

Hierarchical indexing, also known as multi-indexing, means setting more than one column as the index.

Why Use Hierarchical Indexing?

Hierarchical indexing offers several advantages:

Organized Data: It helps in organizing and structuring data in a more intuitive way.
Efficient Data Slicing: You can slice and dice data across multiple dimensions easily.
Enhanced Grouping: Grouping operations become more powerful and flexible.
Clearer Analysis: Complex data analysis becomes more manageable and understandable.

Creating a MultiIndex

Let’s start by creating a MultiIndex. Assume we have values recorded for cities grouped by country. Here’s how we can create a Series with a MultiIndex:

import pandas as pd
import numpy as np

index = pd.MultiIndex.from_tuples(
[('India', 'Delhi'), ('India', 'Mumbai'), ('USA', 'New York'), ('USA', 'LA')],
names=['Country', 'City']
)

data = pd.Series([100, 150, 200, 180], index=index)

print(data)

Country City
India Delhi 100
Mumbai 150
USA New York 200
LA 180
dtype: int64

arrays = [
['India', 'India', 'USA', 'USA'],
['Delhi', 'Mumbai', 'New York', 'LA']
]


index = pd.MultiIndex.from_arrays(arrays, names=('Country', 'City'))

df = pd.DataFrame({
'2022': [100, 120, 200, 180],
'2023': [130, 140, 220, 190]
}, index=index)

print(df)

2022 2023
Country City
India Delhi 100 130
Mumbai 120 140
USA New York 200 220
LA 180 190
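
The slicing advantage mentioned earlier can be seen with loc on this DataFrame. A short sketch, rebuilding the same data so that it runs on its own:

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["India", "India", "USA", "USA"],
     ["Delhi", "Mumbai", "New York", "LA"]],
    names=("Country", "City"),
)
df = pd.DataFrame(
    {"2022": [100, 120, 200, 180], "2023": [130, 140, 220, 190]},
    index=index,
)

# All cities of one country (selects on the outer level)
india = df.loc["India"]
print(india)

# One cell, addressed by a (Country, City) tuple and a column name
la_2023 = df.loc[("USA", "LA"), "2023"]
print(la_2023)
```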

Stacking and Unstacking

import pandas as pd

data = {
'State': ['Karnataka', 'Karnataka', 'Maharashtra', 'Maharashtra'],
'City': ['Bangalore', 'Mysore', 'Mumbai', 'Pune'],
'2022': [100, 120, 140, 160],
'2023': [110, 130, 150, 170]
}

df = pd.DataFrame(data)
df = df.set_index(['State', 'City'])
print(df)

2022 2023
State City
Karnataka Bangalore 100 110
Mysore 120 130
Maharashtra Mumbai 140 150
Pune 160 170

stacked = df.stack()
print(stacked)

State City
Karnataka Bangalore 2022 100
2023 110
Mysore 2022 120
2023 130
Maharashtra Mumbai 2022 140
2023 150
Pune 2022 160
2023 170
dtype: int64

unstacked = stacked.unstack()
print(unstacked)

2022 2023
State City
Karnataka Bangalore 100 110
Mysore 120 130
Maharashtra Mumbai 140 150
Pune 160 170

Swapping index levels

print(df)
swapped = df.swaplevel()
print(swapped)

2022 2023
State City
Karnataka Bangalore 100 110
Mysore 120 130
Maharashtra Mumbai 140 150
Pune 160 170
2022 2023
City State

Bangalore Karnataka 100 110
Mysore Karnataka 120 130
Mumbai Maharashtra 140 150
Pune Maharashtra 160 170

Sorting Index Levels

sorted_df = df.sort_index(level=0)
sorted_df2 = df.sort_index(level=1)
sorted_df3 = df.sort_index(level=[0, 1])
print(sorted_df3)

2022 2023
State City
Karnataka Bangalore 100 110
Mysore 120 130
Maharashtra Mumbai 140 150
Pune 160 170

Indexing with .xs() (cross-section)

print(df.xs('Karnataka'))
print(df.xs('Mumbai', level='City'))
print(df.loc[('Maharashtra', 'Pune'), '2022'])

2022 2023
City
Bangalore 100 110
Mysore 120 130
2022 2023
State
Maharashtra 140 150
160

Set Index with Multiple Columns

df_reset = df.reset_index()
print(df_reset)

State City 2022 2023

0 Karnataka Bangalore 100 110
1 Karnataka Mysore 120 130
2 Maharashtra Mumbai 140 150
3 Maharashtra Pune 160 170

df_multi = df_reset.set_index(['State', 'City'])

print(df_multi)

2022 2023
State City
Karnataka Bangalore 100 110
Mysore 120 130
Maharashtra Mumbai 140 150
Pune 160 170

Concat()

import pandas as pd
df1 = pd.DataFrame({'A': [1, 2], 'B': ['x', 'y']})
df2 = pd.DataFrame({'A': [3, 4], 'B': ['z', 'w']})
result = pd.concat([df1, df2])
print(result)

A B
0 1 x
1 2 y
0 3 z
1 4 w

pd.concat([df1, df2], axis=1)


A B A B

0 1 x 3 z

1 2 y 4 w
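
In the row-wise result further up, the index labels 0 and 1 each appear twice because concat() keeps the original indexes. One way to get a fresh 0..n-1 index, sketched on the same two DataFrames, is ignore_index=True:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': ['x', 'y']})
df2 = pd.DataFrame({'A': [3, 4], 'B': ['z', 'w']})

# ignore_index=True discards the original labels and renumbers the rows
result = pd.concat([df1, df2], ignore_index=True)
print(result)
```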

Merge

left = pd.DataFrame({
'ID': [1, 2, 3],
'Name': ['Alice', 'Bob', 'Charlie']
})

right = pd.DataFrame({
'ID': [2, 3, 4],
'Score': [85, 90, 95]
})
merged = pd.merge(left, right, on='ID', how='inner')
print(merged)

ID Name Score
0 2 Bob 85
1 3 Charlie 90

pd.merge(left, right, on='ID', how='left')

ID Name Score

0 1 Alice NaN

1 2 Bob 85.0

2 3 Charlie 90.0

pd.merge(left, right, on='ID', how='outer')

ID Name Score

0 1 Alice NaN

1 2 Bob 85.0

2 3 Charlie 90.0

3 4 NaN 95.0
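
To see where each row of an outer merge came from, merge() accepts indicator=True, which adds a _merge column. A small sketch on the same left and right DataFrames:

```python
import pandas as pd

left = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Alice', 'Bob', 'Charlie']
})
right = pd.DataFrame({
    'ID': [2, 3, 4],
    'Score': [85, 90, 95]
})

# _merge is 'left_only', 'right_only' or 'both' for each row
merged = pd.merge(left, right, on='ID', how='outer', indicator=True)
print(merged)
```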

join()
join() is a convenient method for combining columns of two DataFrames based on the index (by default).

It works similarly to SQL joins (left, right, outer, inner).

It’s a shortcut to merge() when joining on the index.
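
A minimal sketch of join() itself, assuming the id column is first moved into the index. The names left_df and right_df are hypothetical and used only for this illustration:

```python
import pandas as pd

left_df = pd.DataFrame({'id': [1, 2, 10, 12],
                        'val1': ['a', 'b', 'c', 'd']}).set_index('id')
right_df = pd.DataFrame({'id': [1, 2, 9, 8],
                         'val2': ['e', 'f', 'g', 'h']}).set_index('id')

# join() aligns rows on the index; how='left' is the default
joined = left_df.join(right_df)
print(joined)

# Keep only ids present in both DataFrames
inner = left_df.join(right_df, how='inner')
print(inner)
```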

import pandas as pd
d = {'id': [1, 2, 10, 12],
     'val1': ['a', 'b', 'c', 'd']}
a = pd.DataFrame(d)
a

id val1

0 1 a

1 2 b

2 10 c

3 12 d

import pandas as pd
d = {'id': [1, 2, 9, 8],
     'val2': ['e', 'f', 'g', 'h']}
b = pd.DataFrame(d)
b

id val2

0 1 e

1 2 f

2 9 g

3 8 h

Types of Joins in Pandas

We will use these two DataFrames to understand the different types of joins.

Pandas Inner Join

Inner join is the most common type of join you’ll be working with. It returns a DataFrame containing only the rows whose key values appear in both DataFrames. This is similar to the intersection of two sets.

df = pd.merge(a, b, on='id', how='inner')
df

id val1 val2

0 1 a e

1 2 b f

Pandas Full Outer Join

A full outer join returns all the rows from the left DataFrame and all the rows from the right DataFrame, matching up rows where possible and filling in NaN elsewhere. If every key is present in both DataFrames, the result is the same as an inner join.

df = pd.merge(a,b, on='id',how='outer')
df

id val1 val2

0 1 a e

1 2 b f

2 8 NaN h

3 9 NaN g

4 10 c NaN

5 12 d NaN

Pandas Left Join

With a left outer join, all the records from the first DataFrame are kept, whether or not their keys can be found in the second DataFrame. From the second DataFrame, only the records whose keys also appear in the first DataFrame are included; rows without a match get NaN in the second DataFrame's columns.

df = pd.merge(a,b, on='id',how='left')
df


id val1 val2

0 1 a e

1 2 b f

2 10 c NaN

3 12 d NaN

Pandas Right Outer Join

For a right join, all the records from the second DataFrame are kept. From the first DataFrame, only the records whose keys also appear in the second DataFrame are included; rows without a match get NaN in the first DataFrame's columns.

df = pd.merge(a,b, on='id',how='right')
df

id val1 val2

0 1 a e

1 2 b f

2 9 NaN g

3 8 NaN h

Pandas Index Join

To merge the DataFrames on their indices, pass left_index=True and right_index=True. Both DataFrames are then merged on the index using the default inner join.

df = pd.merge(a,b, right_index=True, left_index=True)
df

id_x val1 id_y val2

0 1 a 1 e

1 2 b 2 f

2 10 c 9 g

3 12 d 8 h

groupby()
The groupby() method allows you to group your data and execute functions on these groups.

import pandas as pd
data={
'Department':['HR','HR','IT','IT','Finance','Finance'],
'Employee':['A','B','C','D','E','F'],
'Salary':[1000,2000,3000,4000,5000,6000]
}
df=pd.DataFrame(data)
grouped=df.groupby('Department')
print(grouped['Salary'].sum())

Department
Finance 11000
HR 3000
IT 7000
Name: Salary, dtype: int64

Aggregate()

Aggregation is the process of combining multiple values into a single summary value. In Pandas, aggregation happens after grouping the data using groupby(). It is used to compute summary statistics such as the sum.

result=grouped['Salary'].aggregate('sum')
print(result)

Department
Finance 11000
HR 3000
IT 7000
Name: Salary, dtype: int64

print(df.groupby('Department')['Salary'].sum())

Department
Finance 11000
HR 3000
IT 7000
Name: Salary, dtype: int64

Multiple Aggregate Functions

result=grouped['Salary'].aggregate(['sum','mean','max'])
print(result)

sum mean max

Department
Finance 11000 5500.0 6000
HR 3000 1500.0 2000
IT 7000 3500.0 4000

Aggregation on Multiple Columns

df.groupby('Department').agg({'Salary':'sum','Employee':'count'})

Salary Employee

Department

Finance 11000 2

HR 3000 2

IT 7000 2

df.groupby('Department').agg({'Salary':['mean','min'],'Employee':['sum','max']})

Salary Employee

mean min sum max

Department

Finance 5500.0 5000 EF F

HR 1500.0 1000 AB B

IT 3500.0 3000 CD D
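
The dictionary form above produces column names taken from the source columns. An alternative, sketched here on the same Department data, is named aggregation, where each output column gets an explicit name. The names total_salary and headcount are chosen only for this illustration:

```python
import pandas as pd

data = {
    'Department': ['HR', 'HR', 'IT', 'IT', 'Finance', 'Finance'],
    'Employee': ['A', 'B', 'C', 'D', 'E', 'F'],
    'Salary': [1000, 2000, 3000, 4000, 5000, 6000]
}
df = pd.DataFrame(data)

# Each keyword is an output column: name=(source column, aggregation)
summary = df.groupby('Department').agg(
    total_salary=('Salary', 'sum'),
    headcount=('Employee', 'count'),
)
print(summary)
```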

Custom Aggregation Functions

Create a planet Dataset

import pandas as pd

data = {
'method': ['Radial Velocity', 'Radial Velocity', 'Transit', 'Transit', 'Imaging',
'Radial Velocity', 'Microlensing', 'Transit', 'Imaging', 'Transit'],
'number': [1, 1, 1, 2, 1, 1, 1, 3, 2, 1],
'orbital_period': [269.3, 874.8, 1.5, 2.2, 4100.0, 763.0, 1000.5, 3.5, 2000.0, 1.0],
'mass': [7.10, 2.21, 0.02, 0.03, 5.00, 2.60, 3.40, 0.01, 6.50, 0.02],
'distance': [77.4, 56.95, 300.0, 150.5, 25.0, 19.84, 4000.0, 80.0, 32.0, 75.0],
'year': [2006, 2008, 2012, 2014, 2005, 2011, 2013, 2015, 2010, 2011]
}

df = pd.DataFrame(data)
print(df)

method number orbital_period mass distance year

0 Radial Velocity 1 269.3 7.10 77.40 2006
1 Radial Velocity 1 874.8 2.21 56.95 2008
2 Transit 1 1.5 0.02 300.00 2012
3 Transit 2 2.2 0.03 150.50 2014
4 Imaging 1 4100.0 5.00 25.00 2005
5 Radial Velocity 1 763.0 2.60 19.84 2011

6 Microlensing 1 1000.5 3.40 4000.00 2013
7 Transit 3 3.5 0.01 80.00 2015
8 Imaging 2 2000.0 6.50 32.00 2010
9 Transit 1 1.0 0.02 75.00 2011

df.groupby('method')['mass'].mean()

mass

method

Imaging 5.75

Microlensing 3.40

Radial Velocity 3.97

Transit 0.02

dtype: float64

df.groupby('method')['mass'].aggregate(['count','mean', 'min', 'max'])

count mean min max

method

Imaging 2 5.75 5.00 6.50

Microlensing 1 3.40 3.40 3.40

Radial Velocity 3 3.97 2.21 7.10

Transit 4 0.02 0.01 0.03

df.groupby('year')['number'].sum()

number

year

2005 1

2006 1

2008 1

2010 2

2011 2

2012 1

2013 1

2014 2

2015 3

dtype: int64

df.groupby(['method','year']).size().unstack(fill_value=0)

year 2005 2006 2008 2010 2011 2012 2013 2014 2015

method

Imaging 1 0 0 1 0 0 0 0 0

Microlensing 0 0 0 0 0 0 1 0 0

Radial Velocity 0 1 1 0 1 0 0 0 0

Transit 0 0 0 0 1 1 0 1 1

df.groupby('method')['distance'].mean()


distance

method

Imaging 28.500000

Microlensing 4000.000000

Radial Velocity 51.396667

Transit 151.375000

dtype: float64

df.groupby('method')['distance'].aggregate(lambda x: x.max() - x.min())

distance

method

Imaging 7.00

Microlensing 0.00

Radial Velocity 57.56

Transit 225.00

dtype: float64

df.groupby('method').filter(lambda x: len(x)>2)

method number orbital_period mass distance year

0 Radial Velocity 1 269.3 7.10 77.40 2006

1 Radial Velocity 1 874.8 2.21 56.95 2008

2 Transit 1 1.5 0.02 300.00 2012

3 Transit 2 2.2 0.03 150.50 2014

5 Radial Velocity 1 763.0 2.60 19.84 2011

7 Transit 3 3.5 0.01 80.00 2015

9 Transit 1 1.0 0.02 75.00 2011

Pivot table

import pandas as pd
df = pd.DataFrame({
'A': ['John', 'Boby', 'Mina', 'Peter', 'Nicky'],
'B': ['Masters', 'Graduate', 'Graduate', 'Masters', 'Graduate'],
'C': [27, 23, 21, 23, 24]
})

df
table = pd.pivot_table(df, index=['A', 'B'])
table

                   C
A     B
Boby  Graduate  23.0
John  Masters   27.0
Mina  Graduate  21.0
Nicky Graduate  24.0
Peter Masters   23.0

import pandas as pd
import numpy as np
df = pd.DataFrame({
'A': ['John', 'Boby', 'Mina', 'Peter', 'Nicky'],
'B': ['Masters', 'Graduate', 'Graduate', 'Masters', 'Graduate'],
'C': [27, 23, 21, 23, 24]
})
# values='A' matches the output below, which lists names per age (C) and degree (B)
table = pd.pivot_table(df, values='A', index='C', columns='B', aggfunc='sum')
print(table)

B Graduate Masters
C
21 Mina NaN
23 Boby Peter
24 Nicky NaN
27 NaN John

import pandas as pd
import numpy as np
df = pd.DataFrame({
'A': ['John', 'Boby', 'Mina', 'Peter', 'Nicky'],
'B': ['Masters', 'Graduate', 'Graduate', 'Masters', 'Graduate'],
'C': [27, 23, 21, 23, 24]
})
table = pd.pivot_table(df, values='C', index=['A', 'B'], aggfunc='mean', margins=True)
table

                   C
A     B
Boby  Graduate  23.0
John  Masters   27.0
Mina  Graduate  21.0
Nicky Graduate  24.0
Peter Masters   23.0
All             23.6

import pandas as pd

df = pd.DataFrame({'Product': ['Carrots', 'Broccoli', 'Banana', 'Banana',

'Beans', 'Orange', 'Broccoli', 'Banana'],
'Category': ['Vegetable', 'Vegetable', 'Fruit', 'Fruit',
'Vegetable', 'Fruit', 'Vegetable', 'Fruit'],
'Quantity': [8, 5, 3, 4, 5, 9, 11, 8],
'Amount': [270, 239, 617, 384, 626, 610, 62, 90]})
df

Product Category Quantity Amount

0 Carrots Vegetable 8 270

1 Broccoli Vegetable 5 239

2 Banana Fruit 3 617

3 Banana Fruit 4 384

4 Beans Vegetable 5 626

5 Orange Fruit 9 610

6 Broccoli Vegetable 11 62

7 Banana Fruit 8 90

pivot = df.pivot_table(index=['Product'],
values=['Amount'],
aggfunc='sum')
print(pivot)

Amount
Product
Banana 1091
Beans 626
Broccoli 301
Carrots 270
Orange 610


pivot = df.pivot_table(index=['Category'],
values=['Amount'],
aggfunc='sum')
print(pivot)

Amount
Category
Fruit 1701
Vegetable 1197

pivot = df.pivot_table(index=['Product', 'Category'],

values=['Amount'], aggfunc='sum')
print(pivot)

Amount
Product Category
Banana Fruit 1091
Beans Vegetable 626
Broccoli Vegetable 301
Carrots Vegetable 270
Orange Fruit 610

pivot = df.pivot_table(index=['Category'], values=['Amount'],

aggfunc={'median', 'mean', 'min'})
print(pivot)

Amount
mean median min
Category
Fruit 425.25 497.0 90
Vegetable 299.25 254.5 62
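
A pivot table with a single index column and a single aggregation gives the same totals as a groupby() on that column. A short sketch on the same Product data that compares the two:

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Carrots', 'Broccoli', 'Banana', 'Banana',
                'Beans', 'Orange', 'Broccoli', 'Banana'],
    'Category': ['Vegetable', 'Vegetable', 'Fruit', 'Fruit',
                 'Vegetable', 'Fruit', 'Vegetable', 'Fruit'],
    'Quantity': [8, 5, 3, 4, 5, 9, 11, 8],
    'Amount': [270, 239, 617, 384, 626, 610, 62, 90]
})

pivot = df.pivot_table(index='Category', values='Amount', aggfunc='sum')
grouped = df.groupby('Category')['Amount'].sum()

# Compare the totals produced by the two approaches
print(pivot['Amount'].tolist() == grouped.tolist())
```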


Vectorized String Operations

Vectorized string operations in Pandas are powerful and efficient because they are optimized for performance and operate element-wise on
entire columns (i.e., Series) of string values without using loops.

In Pandas, you can access string methods using the .str accessor on a Series. Here's a clear overview with examples:

import pandas as pd

data = {
'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
'city': ['New York', 'los angeles', 'Chicago', 'Houston', 'PHOENIX']
}

df = pd.DataFrame(data)

1. Case Conversion

df['name'].str.lower()

name

0 alice

1 bob

2 charlie

3 david

4 eva

dtype: object

df['city'].str.upper()


city

0 NEW YORK

1 LOS ANGELES

2 CHICAGO

3 HOUSTON

4 PHOENIX

dtype: object

df['name'].str.title()

name

0 Alice

1 Bob

2 Charlie

3 David

4 Eva

dtype: object

2. String Matching and Searching

contains()

df['name'].str.contains('o')

name

0 False

1 True

2 False

3 False

4 False

dtype: bool

startswith()

df['name'].str.startswith('A')

name

0 True

1 False

2 False

3 False

4 False

dtype: bool

endswith()

df['city'].str.endswith('a')

city

0 False

1 False

2 False

3 False

4 False

dtype: bool

df['name'].str.match('A.*')

name

0 True

1 False

2 False

3 False

4 False

dtype: bool

3. String Replacement

df['name'].str.replace('a','A')

name

0 Alice

1 Bob

2 ChArlie

3 DAvid

4 EvA

dtype: object

df['name'].str[0:4]

name

0 Alic

1 Bob

2 Char

3 Davi

4 Eva

dtype: object

df['name'].str.slice(0, 3)

name

0 Ali

1 Bob

2 Cha

3 Dav

4 Eva

dtype: object

df['city'].str.len()

city

0 8

1 11

2 7

3 7

4 7
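
One practical difference between the .str methods and a plain Python loop is the handling of missing values. The sketch below assumes a hypothetical Series with one missing name. The vectorized call passes the gap through as NaN, while the loop version needs an explicit type check:

```python
import pandas as pd
import numpy as np

# Hypothetical data with one missing value
names = pd.Series(['Alice', 'Bob', np.nan, 'David'])

# Vectorized: the missing value is propagated instead of raising an error
upper_vec = names.str.upper()

# Loop equivalent: calling .upper() on NaN would fail, so guard with a type check
upper_loop = [n.upper() if isinstance(n, str) else n for n in names]

print(upper_vec)
print(upper_loop)
```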
