
Pandas

The document provides an introduction to the Pandas library in Python, highlighting its key features for data manipulation and analysis, including data structures like Series and DataFrame. It covers installation, creation of Series and DataFrames, basic operations, and methods for handling missing data. Additionally, it includes examples of statistical methods and data manipulation techniques such as selecting, adding, deleting, and renaming columns.

Uploaded by

ashupremchauhan


Pandas Data Science using Python

GEETA DESAI
KrishnanGEE

Module 1: Introduction to Pandas

1. Pandas

Pandas is an open-source Python library used for data manipulation and analysis. It provides data structures and
functions needed to work efficiently with structured data, particularly tabular data.

Key Features of Pandas:

• Fast and efficient data manipulation, including built-in handling of missing data.

• Flexible data structures: the Series and DataFrame structures hold and manipulate data loaded from
various formats (e.g., CSV, Excel).

• Integration with other libraries: Pandas works well with libraries like NumPy, Matplotlib, and Seaborn for
numerical computations and visualization.

Install Pandas using pip:

pip install pandas

To verify that Pandas is installed, import it in your Python environment and print its version:

import pandas as pd
print(pd.__version__)


2. Data Structures in Pandas

Pandas provides two primary data structures:

• Series: 1-dimensional array-like structure (similar to a column in a spreadsheet).

• DataFrame: 2-dimensional table-like structure (similar to an entire spreadsheet or SQL table).

Series

Creating a Series from a List:

import pandas as pd

data = [10, 20, 30, 40]
series = pd.Series(data)
print(series)

Output:

0    10
1    20
2    30
3    40
dtype: int64

Here, 0, 1, 2, 3 are the indices, and 10, 20, 30, 40 are the values.

Creating a Series from a Dictionary:

import pandas as pd

data = {'a': 10, 'b': 20, 'c': 30}
series = pd.Series(data)
print(series)

Output:

a    10
b    20
c    30
dtype: int64

In the example above, the dictionary keys become the index labels.
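A Series built from a list can also be given explicit labels through the index argument of pd.Series. The following is a minimal sketch; the values, the labels, and the variable name series_custom are made up for illustration.

```python
import pandas as pd

# Hypothetical example values and labels, chosen only for illustration
values = [10, 20, 30]
labels = ['x', 'y', 'z']

# Pass index= to attach custom labels instead of the default 0, 1, 2
series_custom = pd.Series(values, index=labels)
print(series_custom)

# The labels and the values can be inspected separately
print(list(series_custom.index))   # ['x', 'y', 'z']
print(series_custom.tolist())      # [10, 20, 30]
```

The labels then behave like the dictionary keys in the second example: series_custom['y'] returns 20.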

Accessing Elements in a Series:

We can access Series elements using their index or by using their key label:

import pandas as pd

data = [10, 20, 30, 40]

series_list = pd.Series(data)
print(series_list[0])

data = {'a': 10, 'b': 20, 'c': 30}

series_dict = pd.Series(data)
print(series_dict['b'])

Output:

10
20
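When the index holds labels rather than positions, it can help to state explicitly which kind of lookup is meant. A small sketch using the .loc (label-based) and .iloc (position-based) accessors; the variable names by_label, by_position and subset are only illustrative.

```python
import pandas as pd

series_dict = pd.Series({'a': 10, 'b': 20, 'c': 30})

by_label = series_dict.loc['b']      # label-based lookup
by_position = series_dict.iloc[1]    # position-based lookup (second element)
print(by_label, by_position)         # 20 20

# A list of labels returns a smaller Series
subset = series_dict.loc[['a', 'c']]
print(subset.tolist())               # [10, 30]
```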


Vectorized Operations:

We can apply operations directly to the entire Series:

import pandas as pd

data = [10, 20, 30, 40]

series_list = pd.Series(data)
print(series_list + 5) # Adds 5 to each element

data = {'a': 10, 'b': 20, 'c': 30}

series_dict = pd.Series(data)
print(series_dict + 5) # Adds 5 to each element

Output:

0 15
1 25
2 35
3 45
dtype: int64

a 15
b 25
c 35
dtype: int64
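Vectorized operations are not limited to adding a constant. As a hedged sketch, the same idea applies to multiplication and to comparisons, where the comparison result is a boolean Series that can be used as a filter. The names doubled, mask and filtered are illustrative only.

```python
import pandas as pd

series_list = pd.Series([10, 20, 30, 40])

doubled = series_list * 2        # element-wise multiplication
mask = series_list > 20          # element-wise comparison -> boolean Series
filtered = series_list[mask]     # keep only the elements where mask is True

print(doubled.tolist())   # [20, 40, 60, 80]
print(mask.tolist())      # [False, False, True, True]
print(filtered.tolist())  # [30, 40]
```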

Statistical Methods for Series:

mean(): Calculates the average of all the elements.

min(): Returns the smallest value among the elements.
max(): Returns the largest value among the elements.
std(): Computes the standard deviation, measuring the spread of the data. By default this is the sample standard deviation (ddof=1).
sum(): Adds up all the elements in the Series or DataFrame column.

import pandas as pd

data = [10, 20, 30, 40]

series_list = pd.Series(data)
print(series_list.mean()) # 25.0
print(series_list.sum()) # 100
print(series_list.min()) # 10
print(series_list.max()) # 40
print(series_list.std()) # 12.909944487358056

data = {'a': 30, 'b': 40, 'c': 50}

series_dict = pd.Series(data)
print(series_dict.mean()) # 40.0
print(series_dict.sum()) # 120
print(series_dict.min()) # 30
print(series_dict.max()) # 50
print(series_dict.std()) # 10.0


DataFrame

Creating a DataFrame from a Dictionary:

import pandas as pd

data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'Occupation': ['Engineer', 'Doctor', 'Artist']
}

df = pd.DataFrame(data)
print(df)

Output:

Name Age Occupation

0 Alice 25 Engineer
1 Bob 30 Doctor
2 Charlie 35 Artist

Creating a DataFrame from a List of Lists:

import pandas as pd

data = [['Alice', 25, "BLR"],

['Bob', 30, "TRPL"],
['Charlie', 35, "GLB"],
['Dread Wing', 30, "CYB"],
['Bumble Bee', 40, "CYB"],
['Arcee', 45, 'CYB']]

df = pd.DataFrame(data, columns=['Name', 'Age', 'Place'])

print(df)

Output:

         Name  Age Place
0       Alice   25   BLR
1         Bob   30  TRPL
2     Charlie   35   GLB
3  Dread Wing   30   CYB
4  Bumble Bee   40   CYB
5       Arcee   45   CYB

Viewing data

head(): Shows the first few rows (default 5 rows).

tail(): Shows the last few rows (default 5 rows).

info(): Displays information about the DataFrame including data types and memory usage.

describe(): Generates descriptive statistics like mean, min, max, and quartiles for numeric columns.


import pandas as pd

data = [['Alice', 25, "BLR"],

['Bob', 30, "TRPL"],
['Charlie', 35, "GLB"],
['Dread Wing', 30, "CYB"],
['Bumble Bee', 40, "CYB"],
['Arcee', 45, 'CYB']]

df = pd.DataFrame(data, columns=['Name', 'Age', 'Place'])

print("Head")
print(df.head(), '\n')
print("Tail")
print(df.tail(), '\n')
print("Description")
print(df.describe(),'\n')
print("Info")
df.info() # info() prints its report directly, so no print() is needed

Output:

Head
         Name  Age Place
0       Alice   25   BLR
1         Bob   30  TRPL
2     Charlie   35   GLB
3  Dread Wing   30   CYB
4  Bumble Bee   40   CYB

Tail
         Name  Age Place
1         Bob   30  TRPL
2     Charlie   35   GLB
3  Dread Wing   30   CYB
4  Bumble Bee   40   CYB
5       Arcee   45   CYB

Description
             Age
count   6.000000
mean   34.166667
std     7.359801
min    25.000000
25%    30.000000
50%    32.500000
75%    38.750000
max    45.000000

Info
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    6 non-null      object
 1   Age     6 non-null      int64
 2   Place   6 non-null      object
dtypes: int64(1), object(2)
memory usage: 272.0+ bytes
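head() and tail() also accept the number of rows to show, and the shape and columns attributes give a quick size check. A minimal sketch on the same data; first_two and last_two are illustrative names.

```python
import pandas as pd

data = [['Alice', 25, "BLR"], ['Bob', 30, "TRPL"], ['Charlie', 35, "GLB"],
        ['Dread Wing', 30, "CYB"], ['Bumble Bee', 40, "CYB"], ['Arcee', 45, "CYB"]]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Place'])

first_two = df.head(2)   # first 2 rows instead of the default 5
last_two = df.tail(2)    # last 2 rows

print(first_two['Name'].tolist())  # ['Alice', 'Bob']
print(last_two['Name'].tolist())   # ['Bumble Bee', 'Arcee']
print(df.shape)                    # (6, 3) -> 6 rows, 3 columns
print(list(df.columns))            # ['Name', 'Age', 'Place']
```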


Module 2: Data Manipulation with Pandas

Basic DataFrame Operations

Selecting Columns
import pandas as pd

data = [['Alice', 25, "BLR"],

['Bob', 30, "TRPL"],
['Charlie', 35, "GLB"],
['Dread Wing', 30, "CYB"],
['Bumble Bee', 40, "CYB"],
['Arcee', 45, 'CYB']]

df = pd.DataFrame(data, columns=['Name', 'Age', 'Place'])

print(df['Name']) # Returns as a Series

print("\n")

print(df[['Name', 'Age']]) # Returns as a DataFrame

Output:

0 Alice
1 Bob
2 Charlie
3 Dread Wing
4 Bumble Bee
5 Arcee
Name: Name, dtype: object

Name Age
0 Alice 25
1 Bob 30
2 Charlie 35
3 Dread Wing 30
4 Bumble Bee 40
5 Arcee 45


Selecting Rows (Using loc or iloc)

import pandas as pd

# Data as a dictionary
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Dread Wing', 'Bumble Bee', 'Arcee'],
'Age': [25, 30, 35, 30, 40, 45],
'Place': ['BLR', 'TRPL', 'GLB', 'CYB', 'CYB', 'CYB']
}

# Create DataFrame from dictionary

df = pd.DataFrame(data)

# Select the first row using iloc (position-based)

print(df.iloc[0],"\n")

# Select the first and third rows using iloc

print(df.iloc[[0, 2]],"\n")

# Select the first row using loc (label-based)

print(df.loc[0],"\n")

# Select rows where Age > 30

print(df[df['Age'] > 30],"\n")

# Select rows where Age > 30 and Place is 'CYB'

print(df[(df['Age'] > 30) & (df['Place'] == 'CYB')],"\n")

Output:

Name Alice
Age 25
Place BLR
Name: 0, dtype: object

Name Age Place

0 Alice 25 BLR
2 Charlie 35 GLB

Name Alice
Age 25
Place BLR
Name: 0, dtype: object

Name Age Place

2 Charlie 35 GLB
4 Bumble Bee 40 CYB
5 Arcee 45 CYB

Name Age Place

4 Bumble Bee 40 CYB
5 Arcee 45 CYB
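Row and column selection can be combined in one step with loc, and conditions can be joined with | ("or") as well as &. The following sketch reuses the data above; the variable names older and young_or_glb are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dread Wing', 'Bumble Bee', 'Arcee'],
    'Age': [25, 30, 35, 30, 40, 45],
    'Place': ['BLR', 'TRPL', 'GLB', 'CYB', 'CYB', 'CYB']
})

# Rows where Age > 30, but only the Name and Age columns
older = df.loc[df['Age'] > 30, ['Name', 'Age']]
print(older)

# Rows where Age < 30 OR Place is 'GLB' (| is the element-wise "or")
young_or_glb = df[(df['Age'] < 30) | (df['Place'] == 'GLB')]
print(young_or_glb['Name'].tolist())   # ['Alice', 'Charlie']

# Position-based slice: rows at positions 1 and 2 (end position 3 is excluded)
print(df.iloc[1:3]['Name'].tolist())   # ['Bob', 'Charlie']
```

Each condition needs its own parentheses, because & and | bind more tightly than comparison operators in Python.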


Adding, deleting, and renaming columns

Adding Column

import pandas as pd

# Sample DataFrame
data = {'Name': ['Rahul', 'Ananya', 'Vikram'],
'Age': [25, 30, 22]}

df = pd.DataFrame(data)

# Adding a new column

df['City'] = ['Delhi', 'Mumbai', 'Bangalore']

print(df)

print('\n')
# Adding a new column based on existing column
df['Age+5'] = df['Age'] + 5

print(df)

Output:

Name Age City

0 Rahul 25 Delhi
1 Ananya 30 Mumbai
2 Vikram 22 Bangalore

Name Age City Age+5

0 Rahul 25 Delhi 30
1 Ananya 30 Mumbai 35
2 Vikram 22 Bangalore 27
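Two other ways to add a column are assign(), which returns a new DataFrame, and insert(), which places the column at a chosen position. A hedged sketch; the column name IsAdult and the threshold 18 are assumptions made only for this example.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Rahul', 'Ananya', 'Vikram'],
                   'Age': [25, 30, 22]})

# assign() returns a new DataFrame and leaves df unchanged
df_with_flag = df.assign(IsAdult=df['Age'] >= 18)

# insert() modifies df in place and puts the column at position 1
df.insert(1, 'City', ['Delhi', 'Mumbai', 'Bangalore'])

print(list(df_with_flag.columns))  # ['Name', 'Age', 'IsAdult']
print(list(df.columns))            # ['Name', 'City', 'Age']
```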

Deleting Columns
import pandas as pd

# Sample DataFrame
data = {'Name': ['Rahul', 'Ananya', 'Vikram'],
'Age': [25, 30, 22],
'Age+5': [30, 35, 27]}

df = pd.DataFrame(data)

# Deleting the 'Age+5' column using drop()

df = df.drop('Age+5', axis=1)

print(df)

print('\n')
del df['Name']

print(df)


Output:

Name Age
0 Rahul 25
1 Ananya 30
2 Vikram 22

Age
0 25
1 30
2 22
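drop() can also remove several columns at once through its columns argument, and pop() removes a column while returning it. A small sketch; the City column and the missing Salary column are hypothetical additions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Rahul', 'Ananya', 'Vikram'],
                   'Age': [25, 30, 22],
                   'City': ['Delhi', 'Mumbai', 'Bangalore']})

# Drop two columns in one call; df itself is not modified
trimmed = df.drop(columns=['Age', 'City'])

# errors='ignore' skips labels that do not exist instead of raising KeyError
safe = df.drop(columns=['Salary'], errors='ignore')

# pop() removes the column from df in place and returns it as a Series
cities = df.pop('City')

print(list(trimmed.columns))  # ['Name']
print(list(safe.columns))     # ['Name', 'Age', 'City']
print(cities.tolist())        # ['Delhi', 'Mumbai', 'Bangalore']
print(list(df.columns))       # ['Name', 'Age']
```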

Renaming Columns

import pandas as pd

# Sample DataFrame
data = {'Name': ['Rahul', 'Ananya', 'Vikram'],
'Age': [25, 30, 22]}

df = pd.DataFrame(data)

# Renaming the 'Age' column to 'Years'

df = df.rename(columns={'Age': 'Years'})
print(df)

print('\n')

# Renaming all columns at once

df.columns = ['Full Name', 'Age in Years']
print(df)

Output:

Name Years
0 Rahul 25
1 Ananya 30
2 Vikram 22

Full Name Age in Years

0 Rahul 25
1 Ananya 30
2 Vikram 22
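rename() also accepts a function instead of a dictionary, and it can rename row labels through its index argument. A minimal sketch; the label 'first' is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Rahul', 'Ananya', 'Vikram'],
                   'Age': [25, 30, 22]})

# Apply a function to every column name
upper = df.rename(columns=str.upper)

# Rename a single row label (0 -> 'first'); other labels are untouched
relabeled = df.rename(index={0: 'first'})

print(list(upper.columns))    # ['NAME', 'AGE']
print(list(relabeled.index))  # ['first', 1, 2]
```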


Handling missing data (isnull(), dropna(), fillna())

isnull() and notnull()

import pandas as pd

# Sample DataFrame
data = {'A': [1, 2, None],
'B': [4, None, None]}
df = pd.DataFrame(data)

# Detect missing values
print(df.isnull(),'\n')
print(df.notnull(),'\n')
print(df.isnull().sum(),'\n')
print(df.notnull().sum(),'\n')

Output:

       A      B
0  False  False
1  False   True
2   True   True

       A      B
0   True   True
1   True  False
2  False  False

A    1
B    2
dtype: int64

A    2
B    1
dtype: int64
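The same boolean table can be summarised in other directions. As a sketch, summing along axis=1 counts missing values per row, any() flags columns that contain at least one missing value, and isna() is an alias of isnull(). The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None], 'B': [4, None, None]})

missing_per_row = df.isnull().sum(axis=1)      # count of missing values in each row
has_missing = df.isnull().any()                # True for columns with any missing value
total_missing = int(df.isnull().sum().sum())   # missing values in the whole DataFrame

print(missing_per_row.tolist())       # [0, 1, 2]
print(has_missing.tolist())           # [True, True]
print(total_missing)                  # 3
print(df.isna().equals(df.isnull()))  # True
```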

dropna()

Remove Rows with Any Missing Values

import pandas as pd

# Sample DataFrame with missing values

data = {
'A': [1, 2, None, 4],
'B': [None, 2, 3, 4],
'C': [1, 2, 3, 4]
}
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

df_dropped_any = df.dropna()
print("\nDataFrame after dropping rows with any missing values:")
print(df_dropped_any)

Output:
Original DataFrame:
A B C
0 1.0 NaN 1
1 2.0 2.0 2
2 NaN 3.0 3
3 4.0 4.0 4
DataFrame after dropping rows with any missing values:
A B C
1 2.0 2.0 2
3 4.0 4.0 4


Remove Rows Where All Values Are Missing

import pandas as pd

# Sample DataFrame with missing values

data = {
'A': [None, None, None, 4],
'B': [None, None, None, 4],
'C': [None, None, None, 4]
}
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

# Drop rows where all values are missing

df_dropped_all = df.dropna(how='all')
print("\nDataFrame after dropping rows where all values are missing:")
print(df_dropped_all)

Output:
Original DataFrame:
A B C
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 4.0 4.0 4.0

DataFrame after dropping rows where all values are missing:

A B C
3 4.0 4.0 4.0

Remove Columns with Any Missing Values

import pandas as pd

data = {
'A': [1, 2, None, 4],
'B': [None, 2, 3, 4],
'C': [1, 2, 3, 4]
}
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

df_dropped_cols_any = df.dropna(axis=1)
print("\nDataFrame after dropping columns with any missing values:")
print(df_dropped_cols_any)


Output:

Original DataFrame:
A B C
0 1.0 NaN 1
1 2.0 2.0 2
2 NaN 3.0 3
3 4.0 4.0 4

DataFrame after dropping columns with any missing values:

C
0 1
1 2
2 3
3 4

Remove Rows with Missing Values in Specific Columns

import pandas as pd

data = {
'A': [1, 2, None, 4],
'B': [None, 2, 3, 4],
'C': [1, 2, 3, 4]
}
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

df_dropped_specific = df.dropna(subset=['A', 'B'])

print("\nDataFrame after dropping rows with missing values in columns A and B:")
print(df_dropped_specific)

Output:

Original DataFrame:
A B C
0 1.0 NaN 1
1 2.0 2.0 2
2 NaN 3.0 3
3 4.0 4.0 4

DataFrame after dropping rows with missing values in columns A and B:

A B C
1 2.0 2.0 2
3 4.0 4.0 4
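Between "any" and "all" there is a middle option: the thresh argument of dropna() keeps only rows that have at least a given number of non-missing values. A hedged sketch with made-up data; kept is an illustrative name.

```python
import pandas as pd

# Hypothetical data: row 1 is entirely missing, rows 0 and 2 are partly missing
df = pd.DataFrame({
    'A': [1, None, None, 4],
    'B': [None, None, 3, 4],
    'C': [1, None, 3, 4]
})

# Keep rows with at least 2 non-missing values
kept = df.dropna(thresh=2)

print(list(kept.index))  # [0, 2, 3]
```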


fillna()

Fill Missing Values with a Constant Value

import pandas as pd

# Sample DataFrame with missing values

data = {
'A': [1, 2, None, 4],
'B': [None, 2, 3, None],
'C': [1, 2, None, 4]
}
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

# Fill missing values with a constant value (e.g., 0)

df_filled_constant = df.fillna(0)
print("\nDataFrame after filling missing values with 0:")
print(df_filled_constant)

Output:

Original DataFrame:
A B C
0 1.0 NaN 1.0
1 2.0 2.0 2.0
2 NaN 3.0 NaN
3 4.0 NaN 4.0

DataFrame after filling missing values with 0:

A B C
0 1.0 0.0 1.0
1 2.0 2.0 2.0
2 0.0 3.0 0.0
3 4.0 0.0 4.0


Forward and Backward Fill

import pandas as pd

# Creating the DataFrame

data = {'A': [1.0, None, 3.0, 4.0],
'B': [None, 2.0, None, 4.0],
'C': [1.0, 2.0, 3.0, None]}

df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

# Forward filling the missing values

ffill = df.ffill()

print("DataFrame after forward filling:")

print(ffill)

# Backward filling the missing values

bfill = df.bfill()

print("DataFrame after backward filling:")

print(bfill)

Output:

Original DataFrame:
A B C
0 1.0 NaN 1.0
1 NaN 2.0 2.0
2 3.0 NaN 3.0
3 4.0 4.0 NaN
DataFrame after forward filling:
A B C
0 1.0 NaN 1.0
1 1.0 2.0 2.0
2 3.0 2.0 3.0
3 4.0 4.0 3.0
DataFrame after backward filling:
A B C
0 1.0 2.0 1.0
1 3.0 2.0 2.0
2 3.0 4.0 3.0
3 4.0 4.0 NaN
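As the output shows, forward fill cannot fill a leading NaN and backward fill cannot fill a trailing one. A small sketch of two common adjustments: limiting how far a value is carried with limit, and chaining ffill() with bfill(). The data and variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical column with a leading gap and a two-row gap in the middle
df = pd.DataFrame({'A': [None, 1.0, None, None, 5.0]})

# Carry each value forward by at most one row
limited = df.ffill(limit=1)

# Forward fill first, then backward fill whatever is still missing
both = df.ffill().bfill()

print(limited['A'].isnull().tolist())  # [True, False, False, True, False]
print(both['A'].tolist())              # [1.0, 1.0, 1.0, 1.0, 5.0]
```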


Fill Missing Values with the Mean of a Column

import pandas as pd

# Sample DataFrame with missing values

data = {
'A': [1, 2, None, 4],
'B': [None, 2, 3, 4],
'C': [1, 2, 3, 4]
}
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

# Fill missing values in column 'A' with its mean

mean_A = df['A'].mean()
df['A'] = df['A'].fillna(mean_A)

print("\nDataFrame after filling missing values in column A with its mean:")

print(df)

Output:

Original DataFrame:
A B C
0 1.0 NaN 1
1 2.0 2.0 2
2 NaN 3.0 3
3 4.0 4.0 4

DataFrame after filling missing values in column A with its mean:

A B C
0 1.000000 NaN 1
1 2.000000 2.0 2
2 2.333333 3.0 3
3 4.000000 4.0 4


Fill Missing Values with Different Values for Each Column

import pandas as pd

# Sample DataFrame with missing values

data = {
'A': [1, None, 3, None],
'B': [None, 2, None, 4],
'C': [1, None, 3, 4]
}
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

# Fill missing values with different values for each column

fill_values = {'A': 0, 'B': 99, 'C': 5}
df_filled_diff = df.fillna(value=fill_values)
print("\nDataFrame after filling missing values with different values for each column:")
print(df_filled_diff)

Output:

Original DataFrame:
A B C
0 1.0 NaN 1.0
1 NaN 2.0 NaN
2 3.0 NaN 3.0
3 NaN 4.0 4.0

DataFrame after filling missing values with different values for each column:
A B C
0 1.0 99.0 1.0
1 0.0 2.0 5.0
2 3.0 99.0 3.0
3 0.0 4.0 4.0


Filling Missing Values with Values from Another Series or DataFrame

import pandas as pd

# Sample DataFrames
df1 = pd.DataFrame({
'A': [None, 2, None, 4],
'B': [1, None, 3, None]
})

df2 = pd.DataFrame({
'A': [5, 6, 7, 8],
'B': [9, 10, 11, 12]
})

print("Original DataFrame:")
print(df1)

print("\nFilling missing values using another DataFrame:")

df_filled_from_other = df1.fillna(df2)
print(df_filled_from_other)

Output:

Original DataFrame:
A B
0 NaN 1.0
1 2.0 NaN
2 NaN 3.0
3 4.0 NaN

Filling missing values using another DataFrame:

A B
0 5.0 1.0
1 2.0 10.0
2 7.0 3.0
3 4.0 12.0


Filling Missing Values with a Dictionary of Computed Values

import pandas as pd

# Sample DataFrame
data = {
'A': [None, 2, None, 4],
'B': [1, None, 3, None]
}
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

# Fill missing values using a different statistic for each column

df_filled = df.fillna({'A': df['A'].mean(), 'B': df['B'].median()})
print("\nDataFrame after filling missing values with mean for 'A' and median for 'B':")
print(df_filled)

Output:

Original DataFrame:
A B
0 NaN 1.0
1 2.0 NaN
2 NaN 3.0
3 4.0 NaN

DataFrame after filling missing values with mean for 'A' and median for 'B':
A B
0 3.0 1.0
1 2.0 2.0
2 3.0 3.0
3 4.0 2.0
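When every column should be filled with the same kind of statistic, the dictionary does not have to be written by hand: df.mean() returns a Series indexed by column name, and fillna() matches it to the columns. A minimal sketch on the same data; filled is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [None, 2, None, 4],
    'B': [1, None, 3, None]
})

# df.mean() gives one mean per column (A: 3.0, B: 2.0);
# fillna() uses each column's own mean for its missing values
filled = df.fillna(df.mean())

print(filled['A'].tolist())  # [3.0, 2.0, 3.0, 4.0]
print(filled['B'].tolist())  # [1.0, 2.0, 3.0, 2.0]
```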


Data Alignment and Operations

Arithmetic operations on DataFrames and Series

import pandas as pd

# Creating two DataFrames

df1 = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6]
}, index=['X', 'Y', 'Z'])

df2 = pd.DataFrame({
'A': [7, 8, 9],
'B': [10, 11, 12]
}, index=['X', 'Y', 'Z'])

# Arithmetic operation
print(df1 + df2, '\n')
print(df1 - df2, '\n')
print(df1 * df2, '\n')
print(df1 / df2)

Output:

    A   B
X   8  14
Y  10  16
Z  12  18

   A  B
X -6 -6
Y -6 -6
Z -6 -6

    A   B
X   7  40
Y  16  55
Z  27  72

          A         B
X  0.142857  0.400000
Y  0.250000  0.454545
Z  0.333333  0.500000

Data alignment during operations

Pandas aligns data automatically when performing arithmetic operations on objects that may
not have the same labels (for both Series and DataFrames). Values are matched on their row and
column labels, not on their position. The result contains the union of the labels, and any label that
is missing from one of the operands produces NaN in the result.

import pandas as pd

df1 = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6]
}, index=['X', 'Y', 'Z'])
print(df1)

df2 = pd.DataFrame({
'A': [7, 8, 9],
'B': [10, 11, 12]
}, index=['X', 'Y', 'Z'])
print(df2)

df3 = pd.DataFrame({
'A': [7, 8, 9],
'C': [10, 11, 12]
}, index=['X', 'Y', 'Z'])
print(df3)

# Addition with alignment
df_sum_misaligned = df1 + df3
print(df_sum_misaligned)

Output:

   A  B
X  1  4
Y  2  5
Z  3  6
   A   B
X  7  10
Y  8  11
Z  9  12
   A   C
X  7  10
Y  8  11
Z  9  12
    A   B   C
X   8 NaN NaN
Y  10 NaN NaN
Z  12 NaN NaN
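If NaN is not the desired result for labels that exist in only one DataFrame, the add() method takes a fill_value that stands in for the missing side. A hedged sketch on df1 and df3 from above; total is an illustrative name.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['X', 'Y', 'Z'])
df3 = pd.DataFrame({'A': [7, 8, 9], 'C': [10, 11, 12]}, index=['X', 'Y', 'Z'])

# Treat a value missing from one DataFrame as 0 before adding.
# Column A is summed; B keeps df1's values and C keeps df3's values.
total = df1.add(df3, fill_value=0)
print(total)
```

The methods sub(), mul() and div() accept fill_value in the same way.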


Broadcasting and handling mismatched data

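Broadcasting in Pandas refers to performing operations between objects of different dimensions, such
as a Series and a DataFrame. By default, the Series index is matched against the DataFrame's column
labels and the operation is repeated for every row. Labels that appear in only one of the two objects
produce NaN.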

import pandas as pd

df1 = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6]
}, index=['X', 'Y', 'Z'])
print(df1)

# Series for broadcasting
series = pd.Series([10, 20, 30], index=['A', 'B', 'C'])

# Broadcasting addition
df_broadcast = df1 + series
print(df_broadcast)

Output:

   A  B
X  1  4
Y  2  5
Z  3  6
    A   B   C
X  11  24 NaN
Y  12  25 NaN
Z  13  26 NaN
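To broadcast a Series down the rows instead of across the columns, the add() method can be told to match on the row index with axis=0. A minimal sketch; row_offsets, shifted and centered are illustrative names and the offsets are made-up values.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['X', 'Y', 'Z'])

# Hypothetical per-row offsets, labelled with the row index of df1
row_offsets = pd.Series([100, 200, 300], index=['X', 'Y', 'Z'])

# axis=0 matches the Series index against the row labels
shifted = df1.add(row_offsets, axis=0)
print(shifted['A'].tolist())  # [101, 202, 303]
print(shifted['B'].tolist())  # [104, 205, 306]

# A common use of the default column-wise broadcasting:
# subtract each column's mean from that column
centered = df1 - df1.mean()
print(centered['A'].tolist())  # [-1.0, 0.0, 1.0]
```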

Sorting Data

Sorting by index (sort_index()) and Sorting by values (sort_values())

import pandas as pd

# Creating a DataFrame
data = {'Name': ['Ravi', 'Anita', 'Priya', 'Amit'],
'Age': [28, 22, 25, 32]}
df = pd.DataFrame(data, index=['b', 'a', 'd', 'c'])

# Sorting by index (row labels)

sorted_df = df.sort_index()

print(sorted_df)
print('\n')

# Sorting by the 'Age' column

sorted_by_values = df.sort_values(by='Age')

print(sorted_by_values)

Output:
Name Age
a Anita 22
b Ravi 28
c Amit 32
d Priya 25

Name Age
a Anita 22
d Priya 25
b Ravi 28
c Amit 32
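Both methods sort in ascending order by default. A short sketch of descending order and of sorting by more than one column; the ages here are changed from the example above (two people share the age 28) purely to show how the second column breaks ties.

```python
import pandas as pd

# Hypothetical data with a tie in Age
df = pd.DataFrame({'Name': ['Ravi', 'Anita', 'Priya', 'Amit'],
                   'Age': [28, 22, 28, 32]},
                  index=['b', 'a', 'd', 'c'])

# Row labels in descending order
by_index_desc = df.sort_index(ascending=False)
print(list(by_index_desc.index))   # ['d', 'c', 'b', 'a']

# Oldest first; people of the same age are ordered by Name (A to Z)
by_age_then_name = df.sort_values(by=['Age', 'Name'], ascending=[False, True])
print(by_age_then_name['Name'].tolist())  # ['Amit', 'Priya', 'Ravi', 'Anita']
```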


Module 3: Reading and Writing Data to Files

Working With CSV file

Reading a CSV File

Let’s say we have a CSV file called employees.csv, stored in the same folder as the Python file, with the
following content:

Name,Department,Salary
Vijay,Sales,50000
Nisha,HR,60000
Anil,Finance,70000

Example

import pandas as pd

df = pd.read_csv('employees.csv')

print(df.head())

Output

    Name Department  Salary
0  Vijay      Sales   50000
1  Nisha         HR   60000
2   Anil    Finance   70000

Breakdown

Step 1: Importing Pandas We import the Pandas library

Step 2: Reading CSV File pd.read_csv() is used to read a CSV file and store its contents in a DataFrame.

Step 3: Displaying Data The .head() method displays the first 5 rows of the DataFrame to quickly inspect the data.
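The steps above can be tried without creating a file, because read_csv() also accepts a file-like object. The sketch below feeds the same CSV text through io.StringIO and uses the usecols argument to load only some columns; csv_text and high_paid are illustrative names.

```python
import io
import pandas as pd

# Same content as employees.csv, held in a string for this sketch
csv_text = """Name,Department,Salary
Vijay,Sales,50000
Nisha,HR,60000
Anil,Finance,70000
"""

# usecols= loads only the listed columns
df = pd.read_csv(io.StringIO(csv_text), usecols=['Name', 'Salary'])
print(list(df.columns))            # ['Name', 'Salary']

# Salary is parsed as a number, so it can be filtered directly
high_paid = df[df['Salary'] > 55000]
print(high_paid['Name'].tolist())  # ['Nisha', 'Anil']
```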

Writing CSV files (to_csv())

Let’s create and write the following data to a new CSV file new_employees.csv.

Example
import pandas as pd

data = {
'Name': ['Ramesh', 'Sunita', 'Amit'],
'Department': ['IT', 'Marketing', 'Operations'],
'Salary': [80000, 65000, 72000]
}

df = pd.DataFrame(data)

df.to_csv('new_employees.csv', index=False)

df_new = pd.read_csv('new_employees.csv')
print(df_new)

Breakdown

Step 0: Import library to the code

Step 1: Creating Data The sample data is stored in a dictionary format where each key represents a
column name.

Step 2: Creating DataFrame We convert the dictionary into a Pandas DataFrame.

Step 3: Writing CSV File The .to_csv() method is used to write the DataFrame to a CSV file named
new_employees.csv. We set index=False to avoid writing row indices into the CSV.

Step 4: Verifying Data We read the newly created CSV file back to verify its content.
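To see what index=False changes, to_csv() can be called without a file name, in which case it returns the CSV text as a string. A minimal sketch with a shortened version of the data above; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ramesh', 'Sunita'],
                   'Salary': [80000, 65000]})

# No path given: to_csv() returns the CSV text instead of writing a file
without_index = df.to_csv(index=False)
with_index = df.to_csv()

print(without_index.splitlines())
# ['Name,Salary', 'Ramesh,80000', 'Sunita,65000']

print(with_index.splitlines())
# [',Name,Salary', '0,Ramesh,80000', '1,Sunita,65000']
```

With the default index=True, the row labels are written as an extra, unnamed first column, which would come back as a column named "Unnamed: 0" when the file is read again with default settings.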

Working With XLSX file

Reading Excel files (read_excel())

Note: make sure the openpyxl module is installed (pip install openpyxl).

Let’s assume we have an Excel file students.xlsx with the following sheet:

Name Grade
Amit A
Priya B
Sohan A

Example
import pandas as pd

# Reading an Excel file

df = pd.read_excel('students.xlsx')

# Display the first 5 rows

print(df.head())

Output
    Name Grade
0   Amit     A
1  Priya     B
2  Sohan     A

Breakdown

Step 1: Import Pandas We import Pandas to handle the Excel file.

Step 2: Reading Excel File The pd.read_excel() function reads the students.xlsx file.

Step 3: Displaying Data We use .head() to inspect the first few rows of the DataFrame.

Writing Excel files (to_excel())

Let’s write a DataFrame to an Excel file called marks.xlsx

Example

import pandas as pd

# Creating sample data

data = {
'Student': ['Ravi', 'Kiran', 'Anjali'],
'Marks': [85, 90, 88]
}

# Creating DataFrame from the data

df = pd.DataFrame(data)

# Writing DataFrame to Excel file

df.to_excel('marks.xlsx', index=False)

# Verifying by reading the newly created Excel file

df_new = pd.read_excel('marks.xlsx')
print(df_new)

Breakdown

Step 1: Creating Data A dictionary is used to store the student names and their marks.

Step 2: Creating DataFrame The dictionary is converted into a Pandas DataFrame.

Step 3: Writing Excel File The DataFrame is written to an Excel file named marks.xlsx. The index=False
option prevents Pandas from writing row numbers into the Excel file.

Step 4: Verifying Data The newly created Excel file is read back into a DataFrame for verification.

Working With JSON file

Reading a JSON File: Let’s assume we have a JSON file data.json with the following content:

[
{"Name": "Nikhil", "Age": 23},
{"Name": "Sonal", "Age": 25},
{"Name": "Pooja", "Age": 22}
]

Example

import pandas as pd

# Reading a JSON file
df = pd.read_json('data.json')
# Display the first 5 rows
print(df.head())

Output:

     Name  Age
0  Nikhil   23
1   Sonal   25
2   Pooja   22

Breakdown

Step 1: Import Pandas We import Pandas to handle the JSON file.

Step 2: Reading JSON File The pd.read_json() function reads the JSON file and loads it into a DataFrame.

Step 3: Displaying Data The .head() method is used to display the first few rows of the DataFrame.

Writing a JSON File

Let’s write a DataFrame to a JSON file named cities.json

Example

import pandas as pd

# Creating sample data

data = {
'City': ['Delhi', 'Bangalore', 'Hyderabad'],
'Population': [3000000, 9000000, 6000000]
}

# Creating DataFrame from the data

df = pd.DataFrame(data)

# Writing DataFrame to JSON file

df.to_json('cities.json')

# Verifying by reading the newly created JSON file

df_new = pd.read_json('cities.json')
print(df_new)

Breakdown

Step 1: Creating Data A dictionary is used to store city names and their population.

Step 2: Creating DataFrame The dictionary is converted into a Pandas DataFrame.

Step 3: Writing JSON File The DataFrame is written to a JSON file named cities.json.

Step 4: Verifying Data The JSON file is read back to ensure it was written correctly.
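By default, to_json() writes one object per column. To get a list of row objects, like the data.json example earlier, the orient argument can be set to 'records'. The sketch below works on a string instead of a file and uses a shortened version of the data; json_text, records and df_back are illustrative names.

```python
import io
import json
import pandas as pd

df = pd.DataFrame({'City': ['Delhi', 'Bangalore'],
                   'Population': [3000000, 9000000]})

# No path given: to_json() returns the JSON text; one object per row
json_text = df.to_json(orient='records')

# Parse with the standard json module to inspect the structure
records = json.loads(json_text)
print(records[0])   # {'City': 'Delhi', 'Population': 3000000}

# Reading the text back gives an equivalent DataFrame
df_back = pd.read_json(io.StringIO(json_text))
print(df_back['City'].tolist())   # ['Delhi', 'Bangalore']
```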