Pandas Ds

Uploaded by

feroz22sep
24/07/2025, 08:33 Pandas - Colab

Pandas
Pandas allows us to analyze large data sets and draw conclusions based on statistical theories.

Pandas can clean messy data sets, and make them readable and relevant.

Relevant data is very important in data science.

import pandas as pd

d = {'fruits': ["apple", "banana", "orange"], 'vegetables': ["tomato", "onion", "carrot"]}
pd.DataFrame(d)

   fruits vegetables
0   apple     tomato
1  banana      onion
2  orange     carrot

Pandas Series


A Pandas Series is like a column in a table.

It is a one-dimensional array holding data of any type.

import pandas as pd
a=[3,6,7]
s=pd.Series(a)
print(s)

0 3
1 6
2 7
dtype: int64

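Labels
If no index is specified, the values are labeled with their position, starting at 0. A label can be used to access a value.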
import pandas as pd
a=[3,6,7]
s=pd.Series(a)
print(s)
print(s[0])

0 3
1 6
2 7
dtype: int64
3

s=pd.Series(a,index=["x","y","z"])
print(s)

x 3
y 6
z 7
dtype: int64
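
Once a Series has custom labels, a value can be looked up by its label. The following is a minimal sketch that reuses the values from the example above; the variable names y_value and subset are chosen here for illustration only.

```python
import pandas as pd

s = pd.Series([3, 6, 7], index=["x", "y", "z"])

# Look up a single value by its label.
y_value = s["y"]
print(y_value)

# Select several labels at once; the result is again a Series.
subset = s[["x", "z"]]
print(subset)
```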

Key/Value Objects as Series

You can also use a key/value object, like a dictionary, when creating a Series.

import pandas as pd
calories = {"day1": 420, "day2": 380, "day3": 390}
s = pd.Series(calories)
print(s)

https://colab.research.google.com/drive/11AzfFuazrGwZXlQHz6mJWS2sZDyTUfkz#printMode=true 1/18
day1 420
day2 380
day3 390
dtype: int64

import pandas as pd
Family={"father":"Chand Basha","Mother":"Fathima","D1":"Farasha","D2":"Sana","D3":"Firoz"}
s=pd.Series(Family)
print(s)

father    Chand Basha
Mother        Fathima
D1            Farasha
D2               Sana
D3              Firoz
dtype: object
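
When only some entries of the dictionary are wanted, the index argument can name the keys to keep. A short sketch using the calories dictionary from above:

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

# Only the keys listed in `index` are kept, in the order given.
s = pd.Series(calories, index=["day1", "day2"])
print(s)
```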

DataFrames
Data sets in Pandas are usually multi-dimensional tables, called DataFrames.

A Series is like a column; a DataFrame is the whole table.

import pandas as pd
data={
"Exam": ["Python","Java","Data Science"],
"Marks":[80,90,100]
}
mydata=pd.DataFrame(data)
print(mydata)

Exam Marks
0 Python 80
1 Java 90
2 Data Science 100

import pandas as pd
data={
"Sisters": ["Farasha","Sana"],
"Parents": ["Chand","Fathima"]
}
mydta=pd.DataFrame(data)
print(mydta)

Sisters Parents
0 Farasha Chand
1 Sana Fathima

A Pandas DataFrame is a two-dimensional data structure, like a two-dimensional array or a table with rows and columns.

import pandas as pd

data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}

# Load the data into a DataFrame object:
df = pd.DataFrame(data)

print(df)

calories duration
0 420 50
1 380 40
2 390 45

Locate Row

As you can see from the result above, the DataFrame is like a table with rows and columns.

Pandas uses the loc attribute to return one or more specified row(s).

print(mydta.loc[0])

Sisters Farasha
Parents Chand
Name: 0, dtype: object

print(mydta.loc[1])

Sisters Sana
Parents Fathima
Name: 1, dtype: object

print(mydata.loc[0])

Exam     Python
Marks        80
Name: 0, dtype: object

print(mydata.loc[1])

Exam     Java
Marks      90
Name: 1, dtype: object

Example

import pandas as pd

data = {
    "Rollno": [5801, 5802, 5803, 5804],
    "Students": ["Pavan", "Kavya", "Firoz", "Manoj"]
}
df = pd.DataFrame(data)
print(df)

Rollno Students
0 5801 Pavan
1 5802 Kavya
2 5803 Firoz
3 5804 Manoj

print(df.loc[0])

Rollno 5801
Students Pavan
Name: 0, dtype: object

print(df.loc[[1,2]])

Rollno Students
1 5802 Kavya
2 5803 Firoz

Named Indexes

import pandas as pd

data = {
"Emcet marks": [420, 380, 390],
"Rank": [50, 40, 45]
}

df = pd.DataFrame(data, index = ["Fahad", "Shazi", "Sania"])

print(df)

       Emcet marks  Rank
Fahad          420    50
Shazi          380    40
Sania          390    45

Locate Named Indexes

Use the named index in the loc attribute to return the specified row(s).

print(df.loc["Shazi"])

Emcet marks    380
Rank            40
Name: Shazi, dtype: int64
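
loc selects by label, while the related iloc attribute selects by integer position. The sketch below rebuilds the named-index DataFrame from above and shows both; the variable names by_label, by_position, and rank are illustrative.

```python
import pandas as pd

data = {"Emcet marks": [420, 380, 390], "Rank": [50, 40, 45]}
df = pd.DataFrame(data, index=["Fahad", "Shazi", "Sania"])

# loc selects by label; a list of labels returns a DataFrame.
by_label = df.loc[["Fahad", "Sania"]]
print(by_label)

# iloc selects by integer position, regardless of the labels.
by_position = df.iloc[1]
print(by_position)

# loc also accepts a column label as a second argument.
rank = df.loc["Shazi", "Rank"]
print(rank)
```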

Load Files Into a DataFrame


Read CSV Files

A simple way to store big data sets is to use CSV files (comma-separated values files).

CSV files contain plain text and are a well-known format that can be read by most tools, including Pandas.

In our examples we will be using a CSV file called 'data.csv'.

from google.colab import files

# Opens Colab's file-upload widget; choose data.csv when prompted.
uploaded = files.upload()

If you have a large DataFrame with many rows, Pandas will only display the first 5 rows and the last 5 rows:

import pandas as pd

df = pd.read_csv('data.csv')

print(df)

     Duration  Pulse  Maxpulse  Calories
0          60    110       130     409.1
1          60    117       145     479.0
2          60    103       135     340.0
3          45    109       175     282.4
4          45    117       148     406.0
..        ...    ...       ...       ...
164        60    105       140     290.8
165        60    110       145     300.0
166        60    115       145     310.2
167        75    120       150     320.4
168        75    125       150     330.4

[169 rows x 4 columns]
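
To look at just the beginning or end of a large DataFrame, head() and tail() can be used. The sketch below does not read data.csv; it builds a stand-in DataFrame with 169 rows of made-up values so that it runs on its own.

```python
import pandas as pd

# Stand-in for data.csv: 169 rows of made-up values (an assumption for illustration).
df = pd.DataFrame({"Duration": [60] * 169, "Pulse": range(100, 269)})

first_rows = df.head()   # first 5 rows by default
last_rows = df.tail(3)   # last 3 rows
print(first_rows)
print(last_rows)
print(df.shape)          # (rows, columns)
```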

Use to_string() to print the entire DataFrame.

import pandas as pd

df = pd.read_csv('data.csv')

print(df.to_string())

Duration Pulse Maxpulse Calories
0 60 110 130 409.1
1 60 117 145 479.0
2 60 103 135 340.0
3 45 109 175 282.4
4 45 117 148 406.0
5 60 102 127 300.0
6 60 110 136 374.0
7 45 104 134 253.3
8 30 109 133 195.1
9 60 98 124 269.0
10 60 103 147 329.3
11 60 100 120 250.7
12 60 106 128 345.3
13 60 104 132 379.3
14 60 98 123 275.0
15 60 98 120 215.2
16 60 100 120 300.0
17 45 90 112 NaN
18 60 103 123 323.0
19 45 97 125 243.0
20 60 108 131 364.2

21 45 100 119 282.0
22 60 130 101 300.0
23 45 105 132 246.0
24 60 102 126 334.5
25 60 100 120 250.0
26 60 92 118 241.0
27 60 103 132 NaN
28 60 100 132 280.0
29 60 102 129 380.3
30 60 92 115 243.0
31 45 90 112 180.1
32 60 101 124 299.0
33 60 93 113 223.0
34 60 107 136 361.0
35 60 114 140 415.0
36 60 102 127 300.0
37 60 100 120 300.0
38 60 100 120 300.0
39 45 104 129 266.0
40 45 90 112 180.1
41 60 98 126 286.0
42 60 100 122 329.4
43 60 111 138 400.0
44 60 111 131 397.0
45 60 99 119 273.0
46 60 109 153 387.6
47 45 111 136 300.0
48 45 108 129 298.0
49 60 111 139 397.6
50 60 107 136 380.2
51 80 123 146 643.1
52 60 106 130 263.0
53 60 118 151 486.0
54 30 136 175 238.0
55 60 121 146 450.7
56 60 118 121 413.0

max_rows

The number of rows returned is defined in Pandas option settings.

You can check your system's maximum rows with the pd.options.display.max_rows statement.

print(pd.options.display.max_rows)

60
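
The limit can also be changed. One option, sketched below, is pd.option_context, which changes a setting only inside a with block; the small DataFrame and the variable names before, inside, and after are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"n": range(100)})

# Temporarily lower the limit; the old value is restored when the block exits.
before = pd.options.display.max_rows
with pd.option_context("display.max_rows", 10):
    inside = pd.options.display.max_rows
    print(df)  # truncated view
after = pd.options.display.max_rows
print(before, inside, after)
```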

Handling Missing Data


import pandas as pd

data = {
    "Rollno": [5801, 5802, 5803, 5804],
    "Students": ["Pavan", "Kavya", None, "Manoj"]
}
df = pd.DataFrame(data)
print(df)

Rollno Students
0 5801 Pavan
1 5802 Kavya
2 5803 None
3 5804 Manoj

1. Check for Missing Values

print(df.isnull())
print(df.isnull().sum())

Rollno Students
0 False False
1 False False
2 False True
3 False False
Rollno 0
Students 1
dtype: int64

2. Drop Rows with Missing Values

df_dropped = df.dropna()
print(df_dropped)

   Rollno Students
0    5801    Pavan
1    5802    Kavya
3    5804    Manoj

3. Fill Missing Values

df_filled = df.fillna(0)
print(df_filled)

   Rollno Students
0    5801    Pavan
1    5802    Kavya
2    5803        0
3    5804    Manoj

# Backward fill: each missing value is replaced by the next valid value below it.
df_bfill = df.bfill()
print(df_bfill)

   Rollno Students
0    5801    Pavan
1    5802    Kavya
2    5803    Manoj
3    5804    Manoj

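# The mean is computed for numeric columns only, so the missing name in
# 'Students' has no replacement value and stays None.
df_filled_mean = df.fillna(df.mean(numeric_only=True))
print(df_filled_mean)

   Rollno Students
0    5801    Pavan
1    5802    Kavya
2    5803     None
3    5804    Manoj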
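
Filling with the mean has a visible effect when the missing value sits in a numeric column. The sketch below uses a hypothetical Marks column that is not part of the original data.

```python
import pandas as pd
import numpy as np

# Hypothetical marks column with one missing value, for illustration only.
df = pd.DataFrame({
    "Rollno": [5801, 5802, 5803, 5804],
    "Marks": [80.0, np.nan, 70.0, 90.0],
})

# The mean of the known marks (80, 70, 90) replaces the missing entry.
filled = df.fillna(df.mean(numeric_only=True))
print(filled)
```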

Hierarchical Indexes

Hierarchical indexing, also known as multi-indexing, means using more than one column (level) as the index.

Why Use Hierarchical Indexing? Hierarchical indexing offers several advantages:

Organized Data: It helps in organizing and structuring data in a more intuitive way.
Efficient Data Slicing: You can slice and dice data across multiple dimensions easily.
Enhanced Grouping: Grouping operations become more powerful and flexible.
Clearer Analysis: Complex data analysis becomes more manageable and understandable.

Creating a MultiIndex

Let’s start by creating a MultiIndex. Assume we have one value for each city, grouped by country. Here’s how we can create a Series with a MultiIndex:

import pandas as pd
import numpy as np

index = pd.MultiIndex.from_tuples(
[('India', 'Delhi'), ('India', 'Mumbai'), ('USA', 'New York'), ('USA', 'LA')],
names=['Country', 'City']
)

data = pd.Series([100, 150, 200, 180], index=index)
print(data)

Country  City
India    Delhi       100
         Mumbai      150
USA      New York    200
         LA          180
dtype: int64

arrays = [
    ['India', 'India', 'USA', 'USA'],
    ['Delhi', 'Mumbai', 'New York', 'LA']
]

index = pd.MultiIndex.from_arrays(arrays, names=('Country', 'City'))

df = pd.DataFrame({
    '2022': [100, 120, 200, 180],
    '2023': [130, 140, 220, 190]
}, index=index)

print(df)

                  2022  2023
Country City
India   Delhi      100   130
        Mumbai     120   140
USA     New York   200   220
        LA         180   190
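
Rows of a MultiIndex DataFrame can be selected with loc, either by the outer level alone or by a tuple covering both levels. A minimal sketch that rebuilds the DataFrame above; the names india and mumbai_2023 are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['India', 'India', 'USA', 'USA'], ['Delhi', 'Mumbai', 'New York', 'LA']],
    names=('Country', 'City'),
)
df = pd.DataFrame({'2022': [100, 120, 200, 180],
                   '2023': [130, 140, 220, 190]}, index=index)

# Outer level only: all cities of one country.
india = df.loc['India']
print(india)

# Both levels as a tuple, plus a column label: a single value.
mumbai_2023 = df.loc[('India', 'Mumbai'), '2023']
print(mumbai_2023)
```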

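Stacking and Unstacking

stack() moves the column labels into a new innermost index level, and unstack() moves the innermost index level back into the columns.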

import pandas as pd

data = {
'State': ['Karnataka', 'Karnataka', 'Maharashtra', 'Maharashtra'],
'City': ['Bangalore', 'Mysore', 'Mumbai', 'Pune'],
'2022': [100, 120, 140, 160],
'2023': [110, 130, 150, 170]
}

df = pd.DataFrame(data)
df = df.set_index(['State', 'City'])
print(df)

                       2022  2023
State       City
Karnataka   Bangalore   100   110
            Mysore      120   130
Maharashtra Mumbai      140   150
            Pune        160   170

stacked = df.stack()
print(stacked)

State        City
Karnataka    Bangalore  2022    100
                        2023    110
             Mysore     2022    120
                        2023    130
Maharashtra  Mumbai     2022    140
                        2023    150
             Pune       2022    160
                        2023    170
dtype: int64

unstacked = stacked.unstack()
print(unstacked)

                       2022  2023
State       City
Karnataka   Bangalore   100   110
            Mysore      120   130
Maharashtra Mumbai      140   150
            Pune        160   170

Swapping index levels

print(df)
swapped = df.swaplevel()
print(swapped)

                       2022  2023
State       City
Karnataka   Bangalore   100   110
            Mysore      120   130
Maharashtra Mumbai      140   150
            Pune        160   170
                       2022  2023
City      State
Bangalore Karnataka     100   110
Mysore    Karnataka     120   130
Mumbai    Maharashtra   140   150
Pune      Maharashtra   160   170

Sorting Index Levels

sorted_df = df.sort_index(level=0)
sorted_df2 = df.sort_index(level=1)
sorted_df3 = df.sort_index(level=[0, 1])
print(sorted_df3)

                       2022  2023
State       City
Karnataka   Bangalore   100   110
            Mysore      120   130
Maharashtra Mumbai      140   150
            Pune        160   170

Indexing with .xs() (cross-section)

print(df.xs('Karnataka'))
print(df.xs('Mumbai', level='City'))
print(df.loc[('Maharashtra', 'Pune'), '2022'])

           2022  2023
City
Bangalore   100   110
Mysore      120   130
             2022  2023
State
Maharashtra   140   150
160

Set Index with Multiple Columns

df_reset = df.reset_index()
print(df_reset)

         State       City  2022  2023
0    Karnataka  Bangalore   100   110
1    Karnataka     Mysore   120   130
2  Maharashtra     Mumbai   140   150
3  Maharashtra       Pune   160   170

df_multi = df_reset.set_index(['State', 'City'])
print(df_multi)

                       2022  2023
State       City
Karnataka   Bangalore   100   110
            Mysore      120   130
Maharashtra Mumbai      140   150
            Pune        160   170

Concat()

import pandas as pd
df1 = pd.DataFrame({'A': [1, 2], 'B': ['x', 'y']})
df2 = pd.DataFrame({'A': [3, 4], 'B': ['z', 'w']})
result = pd.concat([df1, df2])
print(result)

A B
0 1 x
1 2 y
0 3 z
1 4 w
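
In the result above, the row labels 0 and 1 appear twice because each DataFrame keeps its own index. If fresh labels are preferred, ignore_index=True can be passed, as in this short sketch:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': ['x', 'y']})
df2 = pd.DataFrame({'A': [3, 4], 'B': ['z', 'w']})

# ignore_index=True discards the original row labels and renumbers from 0.
result = pd.concat([df1, df2], ignore_index=True)
print(result)
```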

pd.concat([df1, df2], axis=1)

   A  B  A  B
0  1  x  3  z
1  2  y  4  w

Merge

left = pd.DataFrame({
'ID': [1, 2, 3],
'Name': ['Alice', 'Bob', 'Charlie']
})

right = pd.DataFrame({
'ID': [2, 3, 4],
'Score': [85, 90, 95]
})
merged = pd.merge(left, right, on='ID', how='inner')
print(merged)

ID Name Score
0 2 Bob 85
1 3 Charlie 90

pd.merge(left, right, on='ID', how='left')

   ID     Name  Score
0   1    Alice    NaN
1   2      Bob   85.0
2   3  Charlie   90.0

pd.merge(left, right, on='ID', how='outer')

   ID     Name  Score
0   1    Alice    NaN
1   2      Bob   85.0
2   3  Charlie   90.0
3   4      NaN   95.0
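
To see which side each row of an outer merge came from, merge() accepts indicator=True, which adds a _merge column. A minimal sketch with the same left and right DataFrames:

```python
import pandas as pd

left = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
right = pd.DataFrame({'ID': [2, 3, 4], 'Score': [85, 90, 95]})

# The _merge column reports 'left_only', 'right_only', or 'both' for each row.
merged = pd.merge(left, right, on='ID', how='outer', indicator=True)
print(merged)
```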

JOIN()
join() is a convenient method for combining the columns of two DataFrames based on the index (by default).

It works similarly to SQL joins (left, right, outer, inner).

It’s a shortcut to merge() when joining on the index.

import pandas as pd

d = {'id': [1, 2, 10, 12],
     'val1': ['a', 'b', 'c', 'd']}
a = pd.DataFrame(d)
a

   id val1
0   1    a
1   2    b
2  10    c
3  12    d

import pandas as pd

d = {'id': [1, 2, 9, 8],
     'val2': ['e', 'f', 'g', 'h']}
b = pd.DataFrame(d)
b

   id val2
0   1    e
1   2    f
2   9    g
3   8    h
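
Because join() aligns on the index, one way to join these two DataFrames on id is to move that column into the index first. The following is a sketch of that approach, not the only way to do it; the name joined is illustrative.

```python
import pandas as pd

a = pd.DataFrame({'id': [1, 2, 10, 12], 'val1': ['a', 'b', 'c', 'd']})
b = pd.DataFrame({'id': [1, 2, 9, 8], 'val2': ['e', 'f', 'g', 'h']})

# join() aligns on the index, so move 'id' into the index first.
joined = a.set_index('id').join(b.set_index('id'), how='inner')
print(joined)
```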

Types of Joins in Pandas


We will use these two Dataframes to understand the different types of joins.

Pandas Inner Join

Inner join is the most common type of join you’ll be working with. It returns a DataFrame with only those rows whose key values appear in both DataFrames. This is similar to the intersection of two sets.

df = pd.merge(a, b, on='id', how='inner')
df

   id val1 val2
0   1    a    e
1   2    b    f

Pandas Full Outer Join

A full outer join returns all the rows from the left DataFrame and all the rows from the right DataFrame, matching up rows where possible and filling in NaN elsewhere. If every key appears in both DataFrames, the result is the same as the inner join.

df = pd.merge(a,b, on='id',how='outer')
df

   id val1 val2
0   1    a    e
1   2    b    f
2   8  NaN    h
3   9  NaN    g
4  10    c  NaN
5  12    d  NaN

Pandas Left Join

With a left outer join, all the records from the first DataFrame are kept, whether or not their keys can be found in the second DataFrame. From the second DataFrame, only the records whose keys also appear in the first DataFrame are included; rows without a match get NaN in the second DataFrame's columns.

df = pd.merge(a,b, on='id',how='left')
df


   id val1 val2
0   1    a    e
1   2    b    f
2  10    c  NaN
3  12    d  NaN

Pandas Right Outer Join

For a right join, all the records from the second DataFrame are kept. From the first DataFrame, only the records whose keys also appear in the second DataFrame are included.

df = pd.merge(a,b, on='id',how='right')
df

   id val1 val2
0   1    a    e
1   2    b    f
2   9  NaN    g
3   8  NaN    h

Pandas Index Join

To merge the DataFrames on their indexes, pass left_index=True and right_index=True. Both DataFrames are then merged on the index using the default inner join. Because both have an id column, Pandas adds the suffixes _x and _y to tell them apart.

df = pd.merge(a, b, right_index=True, left_index=True)
df

   id_x val1  id_y val2
0     1    a     1    e
1     2    b     2    f
2    10    c     9    g
3    12    d     8    h

groupby()
The groupby() method allows you to group your data and execute functions on these groups.

import pandas as pd
data={
'Department':['HR','HR','IT','IT','Finance','Finance'],
'Employee':['A','B','C','D','E','F'],
'Salary':[1000,2000,3000,4000,5000,6000]
}
df=pd.DataFrame(data)
grouped=df.groupby('Department')
print(grouped['Salary'].sum())

Department
Finance 11000
HR 3000
IT 7000
Name: Salary, dtype: int64

Aggregate()

Aggregation is the process of combining multiple values into a single summary value. In Pandas, aggregation happens after grouping the data using groupby(). It is used to compute summary statistics such as the sum.

result=grouped['Salary'].aggregate('sum')
print(result)

Department
Finance 11000
HR 3000
IT 7000
Name: Salary, dtype: int64

print(df.groupby('Department')['Salary'].sum())

Department
Finance 11000
HR 3000
IT 7000
Name: Salary, dtype: int64

Multiple Aggregate Functions

result=grouped['Salary'].aggregate(['sum','mean','max'])
print(result)

              sum    mean   max
Department
Finance     11000  5500.0  6000
HR           3000  1500.0  2000
IT           7000  3500.0  4000

Aggregation on Multiple Columns

df.groupby('Department').agg({'Salary':'sum','Employee':'count'})

            Salary  Employee
Department
Finance      11000         2
HR            3000         2
IT            7000         2

df.groupby('Department').agg({'Salary':['mean','min'],'Employee':['sum','max']})

            Salary       Employee
              mean   min      sum max
Department
Finance     5500.0  5000       EF   F
HR          1500.0  1000       AB   B
IT          3500.0  3000       CD   D
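
The dictionary form above produces two-level column names. When flat, descriptive column names are preferred, agg() also accepts keyword arguments of the form new_name=(column, function). A sketch using the same employee data; the output column names total_salary, average_salary, and headcount are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'HR', 'IT', 'IT', 'Finance', 'Finance'],
    'Employee': ['A', 'B', 'C', 'D', 'E', 'F'],
    'Salary': [1000, 2000, 3000, 4000, 5000, 6000],
})

# Each keyword names an output column: (source column, aggregation function).
summary = df.groupby('Department').agg(
    total_salary=('Salary', 'sum'),
    average_salary=('Salary', 'mean'),
    headcount=('Employee', 'count'),
)
print(summary)
```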

Custom Aggregation Functions

Create a Planet Dataset

import pandas as pd

data = {
'method': ['Radial Velocity', 'Radial Velocity', 'Transit', 'Transit', 'Imaging',
'Radial Velocity', 'Microlensing', 'Transit', 'Imaging', 'Transit'],
'number': [1, 1, 1, 2, 1, 1, 1, 3, 2, 1],
'orbital_period': [269.3, 874.8, 1.5, 2.2, 4100.0, 763.0, 1000.5, 3.5, 2000.0, 1.0],
'mass': [7.10, 2.21, 0.02, 0.03, 5.00, 2.60, 3.40, 0.01, 6.50, 0.02],
'distance': [77.4, 56.95, 300.0, 150.5, 25.0, 19.84, 4000.0, 80.0, 32.0, 75.0],
'year': [2006, 2008, 2012, 2014, 2005, 2011, 2013, 2015, 2010, 2011]
}

df = pd.DataFrame(data)
print(df)

            method  number  orbital_period  mass  distance  year
0  Radial Velocity       1           269.3  7.10     77.40  2006
1  Radial Velocity       1           874.8  2.21     56.95  2008
2          Transit       1             1.5  0.02    300.00  2012
3          Transit       2             2.2  0.03    150.50  2014
4          Imaging       1          4100.0  5.00     25.00  2005
5  Radial Velocity       1           763.0  2.60     19.84  2011
6     Microlensing       1          1000.5  3.40   4000.00  2013
7          Transit       3             3.5  0.01     80.00  2015
8          Imaging       2          2000.0  6.50     32.00  2010
9          Transit       1             1.0  0.02     75.00  2011

df.groupby('method')['mass'].mean()

                 mass
method
Imaging          5.75
Microlensing     3.40
Radial Velocity  3.97
Transit          0.02
dtype: float64

df.groupby('method')['mass'].aggregate(['count','mean', 'min', 'max'])

                 count  mean   min   max
method
Imaging              2  5.75  5.00  6.50
Microlensing         1  3.40  3.40  3.40
Radial Velocity      3  3.97  2.21  7.10
Transit              4  0.02  0.01  0.03

df.groupby('year')['number'].sum()

      number
year
2005       1
2006       1
2008       1
2010       2
2011       2
2012       1
2013       1
2014       2
2015       3
dtype: int64

df.groupby(['method','year']).size().unstack(fill_value=0)

year             2005  2006  2008  2010  2011  2012  2013  2014  2015
method
Imaging             1     0     0     1     0     0     0     0     0
Microlensing        0     0     0     0     0     0     1     0     0
Radial Velocity     0     1     1     0     1     0     0     0     0
Transit             0     0     0     0     1     1     0     1     1

df.groupby('method')['distance'].mean()


                    distance
method
Imaging            28.500000
Microlensing     4000.000000
Radial Velocity    51.396667
Transit           151.375000
dtype: float64

df.groupby('method')['distance'].aggregate(lambda x: x.max() - x.min())

                 distance
method
Imaging              7.00
Microlensing         0.00
Radial Velocity     57.56
Transit            225.00
dtype: float64
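
The lambda above can also be written as a named function, which may be easier to read and reuse. The sketch below uses a few rows of the planet data; value_range is a hypothetical helper name, not a Pandas function.

```python
import pandas as pd

# A subset of the planet data above, kept small for illustration.
df = pd.DataFrame({
    'method': ['Imaging', 'Imaging', 'Transit', 'Transit', 'Transit'],
    'distance': [25.0, 32.0, 300.0, 150.5, 75.0],
})

def value_range(values):
    """Return the spread between the largest and smallest value."""
    return values.max() - values.min()

# agg() calls value_range once per group with that group's distances.
spread = df.groupby('method')['distance'].agg(value_range)
print(spread)
```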

df.groupby('method').filter(lambda x: len(x)>2)

            method  number  orbital_period  mass  distance  year
0  Radial Velocity       1           269.3  7.10     77.40  2006
1  Radial Velocity       1           874.8  2.21     56.95  2008
2          Transit       1             1.5  0.02    300.00  2012
3          Transit       2             2.2  0.03    150.50  2014
5  Radial Velocity       1           763.0  2.60     19.84  2011
7          Transit       3             3.5  0.01     80.00  2015
9          Transit       1             1.0  0.02     75.00  2011

Pivot table

import pandas as pd
df = pd.DataFrame({
'A': ['John', 'Boby', 'Mina', 'Peter', 'Nicky'],
'B': ['Masters', 'Graduate', 'Graduate', 'Masters', 'Graduate'],
'C': [27, 23, 21, 23, 24]
})

df
table = pd.pivot_table(df, index=['A', 'B'])
table

                   C
A     B
Boby  Graduate  23.0
John  Masters   27.0
Mina  Graduate  21.0
Nicky Graduate  24.0
Peter Masters   23.0

import pandas as pd
import numpy as np
df = pd.DataFrame({
'A': ['John', 'Boby', 'Mina', 'Peter', 'Nicky'],
'B': ['Masters', 'Graduate', 'Graduate', 'Masters', 'Graduate'],
'C': [27, 23, 21, 23, 24]
})
# Use the names in column 'A' as the values, so each cell shows who has that value of 'C'.
table = pd.pivot_table(df, values='A', index='C', columns='B', aggfunc='sum')
print(table)

B  Graduate Masters
C
21     Mina     NaN
23     Boby   Peter
24    Nicky     NaN
27      NaN    John

import pandas as pd
import numpy as np
df = pd.DataFrame({
'A': ['John', 'Boby', 'Mina', 'Peter', 'Nicky'],
'B': ['Masters', 'Graduate', 'Graduate', 'Masters', 'Graduate'],
'C': [27, 23, 21, 23, 24]
})
table = pd.pivot_table(df, values='C', index=['A', 'B'], aggfunc='mean', margins=True)
table

                   C
A     B
Boby  Graduate  23.0
John  Masters   27.0
Mina  Graduate  21.0
Nicky Graduate  24.0
Peter Masters   23.0
All             23.6

import pandas as pd

df = pd.DataFrame({'Product': ['Carrots', 'Broccoli', 'Banana', 'Banana',
                               'Beans', 'Orange', 'Broccoli', 'Banana'],
                   'Category': ['Vegetable', 'Vegetable', 'Fruit', 'Fruit',
                                'Vegetable', 'Fruit', 'Vegetable', 'Fruit'],
                   'Quantity': [8, 5, 3, 4, 5, 9, 11, 8],
                   'Amount': [270, 239, 617, 384, 626, 610, 62, 90]})
df

    Product   Category  Quantity  Amount
0   Carrots  Vegetable         8     270
1  Broccoli  Vegetable         5     239
2    Banana      Fruit         3     617
3    Banana      Fruit         4     384
4     Beans  Vegetable         5     626
5    Orange      Fruit         9     610
6  Broccoli  Vegetable        11      62
7    Banana      Fruit         8      90

pivot = df.pivot_table(index=['Product'],
values=['Amount'],
aggfunc='sum')
print(pivot)

Amount
Product
Banana 1091
Beans 626
Broccoli 301
Carrots 270
Orange 610

pivot = df.pivot_table(index=['Category'],
values=['Amount'],
aggfunc='sum')
print(pivot)

Amount
Category
Fruit 1701
Vegetable 1197

pivot = df.pivot_table(index=['Product', 'Category'],
                       values=['Amount'], aggfunc='sum')
print(pivot)

Amount
Product Category
Banana Fruit 1091
Beans Vegetable 626
Broccoli Vegetable 301
Carrots Vegetable 270
Orange Fruit 610

pivot = df.pivot_table(index=['Category'], values=['Amount'],
                       aggfunc={'median', 'mean', 'min'})
print(pivot)

           Amount
             mean median min
Category
Fruit      425.25  497.0  90
Vegetable  299.25  254.5  62
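
A pivot table can also spread one field across the columns. The sketch below reuses the product data and puts Category in the columns; fill_value=0 replaces the empty cells that would otherwise be NaN.

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Carrots', 'Broccoli', 'Banana', 'Banana',
                'Beans', 'Orange', 'Broccoli', 'Banana'],
    'Category': ['Vegetable', 'Vegetable', 'Fruit', 'Fruit',
                 'Vegetable', 'Fruit', 'Vegetable', 'Fruit'],
    'Quantity': [8, 5, 3, 4, 5, 9, 11, 8],
    'Amount': [270, 239, 617, 384, 626, 610, 62, 90],
})

# Products become rows, categories become columns; empty cells are filled with 0.
pivot = df.pivot_table(index='Product', columns='Category', values='Amount',
                       aggfunc='sum', fill_value=0)
print(pivot)
```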


Vectorized String Operations


Vectorized string operations in Pandas are powerful and efficient because they are optimized for performance and operate element-wise on
entire columns (i.e., Series) of string values without using loops.

In Pandas, you can access string methods using the .str accessor on a Series. Here's a clear overview with examples:

import pandas as pd

data = {
'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
'city': ['New York', 'los angeles', 'Chicago', 'Houston', 'PHOENIX']
}

df = pd.DataFrame(data)
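
One practical difference from a hand-written loop is that the .str accessor skips missing values instead of raising an error. A small sketch with a made-up Series containing one missing name:

```python
import pandas as pd

# Made-up names with one missing entry, for illustration only.
names = pd.Series(['Alice', None, 'Charlie'])

# The .str accessor leaves the missing entry missing instead of failing.
upper = names.str.upper()
print(upper)

# A plain Python loop needs an explicit check before calling .upper().
upper_loop = [n.upper() if isinstance(n, str) else n for n in names]
print(upper_loop)
```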

1. Case Conversion

df['name'].str.lower()

      name
0    alice
1      bob
2  charlie
3    david
4      eva
dtype: object

df['city'].str.upper()


          city
0     NEW YORK
1  LOS ANGELES
2      CHICAGO
3      HOUSTON
4      PHOENIX
dtype: object

df['name'].str.title()

      name
0    Alice
1      Bob
2  Charlie
3    David
4      Eva
dtype: object

2. String Matching and Searching

contains()

df['name'].str.contains('o')

    name
0  False
1   True
2  False
3  False
4  False
dtype: bool
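
contains() is case-sensitive by default and treats its pattern as a regular expression. The sketch below shows both behaviours on the same names; the variable names has_a and starts_a_or_e are illustrative.

```python
import pandas as pd

names = pd.Series(['Alice', 'Bob', 'Charlie', 'David', 'Eva'])

# case=False ignores letter case, so the capital 'A' in 'Alice' also counts.
has_a = names.str.contains('a', case=False)
print(has_a)

# The pattern is a regular expression by default: names starting with A or E.
starts_a_or_e = names.str.contains('^[AE]')
print(starts_a_or_e)
```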

startswith()

df['name'].str.startswith('A')

    name
0   True
1  False
2  False
3  False
4  False
dtype: bool

endswith()

df['city'].str.endswith('a')

    city
0  False
1  False
2  False
3  False
4  False
dtype: bool

df['name'].str.match('A.*')

    name
0   True
1  False
2  False
3  False
4  False
dtype: bool

3. String Replacement

df['name'].str.replace('a','A')

      name
0    Alice
1      Bob
2  ChArlie
3    DAvid
4      EvA
dtype: object
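
replace() can also take a regular expression when regex=True is passed. A short sketch that masks every lowercase vowel in the same names; the name masked is illustrative.

```python
import pandas as pd

names = pd.Series(['Alice', 'Bob', 'Charlie', 'David', 'Eva'])

# With regex=True the pattern is a regular expression;
# here every lowercase vowel becomes '*'.
masked = names.str.replace('[aeiou]', '*', regex=True)
print(masked)
```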

df['name'].str[0:4]

   name
0  Alic
1   Bob
2  Char
3  Davi
4   Eva
dtype: object

df['name'].str.slice(0, 3)

  name
0  Ali
1  Bob
2  Cha
3  Dav
4  Eva
dtype: object

df['city'].str.len()

   city
0     8
1    11
2     7
3     7
4     7
