1. Need and Overview of Pandas:
What is Pandas?
Pandas is a Python library for data manipulation and analysis. It provides data
structures like Series (1D) and DataFrame (2D), making it easy to work with
structured data.
Why is Pandas Needed?
● Efficiently handles large datasets.
● Simplifies data cleaning, transformation, and analysis.
● Integrates with libraries like NumPy and Matplotlib.
● Supports various file formats: CSV, Excel, JSON, SQL, etc.
2. Setup for Pandas:
Step 1: Install Pandas
Pandas can be installed using pip, Python's package manager. Run:
pip install pandas
For Jupyter Notebook/ Google Colab users, install Pandas using the following command
to ensure compatibility:
!pip install pandas
Step 2: Import Pandas
To use Pandas in your Python script or notebook, import it using the standard alias:
import pandas as pd
3. Pandas Data Structures: Series and DataFrame
Pandas provides two main data structures to handle and manipulate data efficiently: Series and
DataFrame.
i. Series
A Series is a one-dimensional labeled array that can hold data of any type (e.g.,
integers, floats, strings). It is similar to a column in a spreadsheet or a Python list with an
index.
Key Features
● Indexing: Each element has a unique label (index).
● Homogeneous: Holds data of a single type (e.g., all integers or all strings).
Code Example:
1. Creating a Series
2. Accessing Data in a Series
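The two examples above can be sketched as follows; the data values are illustrative.

```python
import pandas as pd

# 1. Creating a Series with a custom index
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# 2. Accessing data in a Series
print(s['b'])      # access by label -> 20
print(s.iloc[0])   # access by integer position -> 10
```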
ii. DataFrame
A DataFrame is a two-dimensional labeled data structure, similar to a table, with rows and columns.
Key Features
● Labeled Rows and Columns: Each row and column has a unique label (index
and column names).
● Heterogeneous: Columns can hold data of different types.
Note : Difference Between DataFrames and 2D Arrays:
DataFrames have labeled rows and columns, whereas arrays rely solely on numerical
indices.
Code Example:
1. Creating a DataFrame
2. Accessing Data in a DataFrame
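A minimal sketch of the two examples above; the column names and values are illustrative.

```python
import pandas as pd

# 1. Creating a DataFrame from a dictionary of columns
df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30]
})

# 2. Accessing data in a DataFrame
print(df['Name'])        # a single column (returns a Series)
print(df.loc[0, 'Age'])  # a single value by row label and column name -> 25
```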
Common File Formats for Datasets:
Note:
Parquet and Feather are file formats optimized for fast reading and writing of large
datasets. They are commonly used in data engineering and analytics for efficient
storage and processing.
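Each supported format has a matching reader/writer pair in pandas; a small sketch using CSV (the file name is a placeholder, and the other pairs are listed as comments):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2]})

# Write and read CSV; 'data.csv' is an illustrative file name
df.to_csv('data.csv', index=False)
df2 = pd.read_csv('data.csv')

# Analogous pairs exist for the other formats:
# df.to_excel(...)   / pd.read_excel(...)    (needs openpyxl)
# df.to_json(...)    / pd.read_json(...)
# df.to_parquet(...) / pd.read_parquet(...)  (needs pyarrow or fastparquet)
# pd.read_sql(...) reads from a database connection
```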
Common Methods for Inspecting Data in Pandas:
These methods are particularly helpful for inspecting large datasets by viewing a small subset at
the beginning, end, or randomly.
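The methods referred to here are presumably head(), tail(), and sample(); a minimal sketch on dummy data:

```python
import pandas as pd

df = pd.DataFrame({'x': range(100)})

print(df.head())     # first 5 rows (head(n) for the first n)
print(df.tail(3))    # last 3 rows
print(df.sample(2))  # 2 randomly chosen rows
```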
Details of DataFrames:
Labels (Columns, Index ), Shape, Size, Info, and Describe:
Pandas provides several methods to quickly understand and summarize the structure and
content of a DataFrame.
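A quick sketch of these attributes and methods on a toy DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

print(df.columns)     # column labels
print(df.index)       # row labels (index)
print(df.shape)       # (rows, columns) -> (2, 2)
print(df.size)        # total number of elements -> 4
df.info()             # dtypes, non-null counts, memory usage
print(df.describe())  # summary statistics for numeric columns
```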
Accessing Data Using .loc[] and .iloc[]
In pandas, .loc[] and .iloc[] are powerful indexers used to access and manipulate data in
a DataFrame.
1. .loc[]
.loc[] is primarily label-based indexing. It is used to access rows and columns by their
labels (names).
● It can accept a row label and column label to return a specific value or subset
of data.
● You can use boolean conditions with .loc[] as well.
Code Example:
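A minimal sketch of label-based access with .loc[]; the data is illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    {'Age': [25, 32, 41], 'City': ['NY', 'Chicago', 'Houston']},
    index=['a', 'b', 'c']
)

print(df.loc['b', 'Age'])         # single value by row and column label -> 32
print(df.loc['a':'b', ['City']])  # label slices include BOTH endpoints
print(df.loc[df['Age'] > 30])     # boolean condition selects matching rows
```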
2. .iloc[]
.iloc[] is primarily integer position-based indexing. It is used to access rows and columns
by their integer index positions.
● It works with integer-based indexing, so you can provide the position of the rows and
columns.
● It does not include the last index (like Python's usual behavior with slicing).
Code Example:
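A minimal sketch of position-based access with .iloc[], on the same illustrative data:

```python
import pandas as pd

df = pd.DataFrame(
    {'Age': [25, 32, 41], 'City': ['NY', 'Chicago', 'Houston']},
    index=['a', 'b', 'c']
)

print(df.iloc[0, 1])   # row 0, column 1 -> 'NY'
print(df.iloc[0:2])    # rows 0 and 1; end position 2 is EXCLUDED
print(df.iloc[:, 0])   # all rows of the first column
```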
When to Use:
● .loc[] is useful when you need to access data by names (labels).
● .iloc[] is best when you need to access data by integer position (index numbers).
Accessing Single Values Using .at[] and .iat[]
.at[] is used to access a single value in a DataFrame by label.
Example:
df.at[row_label, column_label]
.iat[] is used to access a single value in a DataFrame by integer position.
Example:
df.iat[row_position, column_position]
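A short sketch of both single-value accessors; the labels are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 30]}, index=['alice', 'bob'])

print(df.at['bob', 'Age'])  # by row and column label -> 30
print(df.iat[0, 0])         # by integer row and column position -> 25

# .at[] can also set a single value in place
df.at['alice', 'Age'] = 26
```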
Accessing Columns: Shorthand and Dot Notation
Shorthand Notation: Access a column in a DataFrame by label using square brackets.
Example:
df['column_name']
Dot Notation: Access a column in a DataFrame by label using dot notation.
Example:
df.column_name
Filtering Data Based on Conditions
You can filter data by applying conditions to one or more columns to return rows that meet the
specified criteria.
Syntax:
df[condition]
condition: Boolean condition applied to one or more columns.
Example:
1. Filter rows based on a single condition:
Condition: Age > 30
df[df['Age'] > 30]
2. Filter rows based on multiple conditions (AND):
Condition: Age > 30 and City is "Chicago"
df[(df['Age'] > 30) & (df['City'] == 'Chicago')]
3. Filter rows based on multiple conditions (OR):
Condition: Age > 30 or City is "New York"
df[(df['Age'] > 30) | (df['City'] == 'New York')]
4. Filter rows using isin() for multiple values:
Condition: City is either "Chicago" or "Houston"
df[df['City'].isin(['Chicago', 'Houston'])]
Note: You can apply conditions based on numerical comparisons, string matching, and more,
using & (AND) and | (OR) for combining multiple conditions.
Regular Expressions (Regex) in Pandas
Regular expressions allow you to filter, match, and manipulate string data in pandas columns
based on patterns.
Common Syntax:
1. Filter rows containing a pattern:
Syntax:
df[df['column_name'].str.contains('pattern', regex=True)]
2. Filter rows not containing a pattern:
Syntax:
df[~df['column_name'].str.contains('pattern', regex=True)]
3. Filter rows starting with a specific pattern:
Syntax:
df[df['column_name'].str.match('^pattern')]
4. Replace values using regex:
Syntax:
df['column_name'] = df['column_name'].str.replace('pattern',
'replacement', regex=True)
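A small worked example of the four operations above; the column data and patterns are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'City': ['New York', 'Newark', 'Chicago', 'Houston']})

# Rows containing 'New' anywhere -> New York, Newark
print(df[df['City'].str.contains('New', regex=True)])

# Rows starting with 'New' followed by whitespace -> New York only
print(df[df['City'].str.match(r'^New\s')])

# Replace a leading 'New ' with 'Old '
df['City'] = df['City'].str.replace(r'^New\s', 'Old ', regex=True)
print(df['City'].tolist())  # ['Old York', 'Newark', 'Chicago', 'Houston']
```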
Common patterns widely used with regex:
Transforming Data Using apply()
The apply() method in pandas is used to apply a custom function or a predefined operation
along the rows (axis=1) or columns (axis=0) of a DataFrame or on a Series.
Syntax:
For Series: Series.apply(func)
For DataFrame: DataFrame.apply(func, axis=0/1)
Example 1: Applying a Function to a Series
Example 2: Applying a Function to a Series
Example 3: Applying a Function Along DataFrame Rows
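The three examples above can be sketched as follows; whether Example 2 originally used a lambda is an assumption, and the data is illustrative:

```python
import pandas as pd

s = pd.Series([1, 2, 3])
df = pd.DataFrame({'a': [1, 2], 'b': [10, 20]})

# Example 1: a named function applied to a Series
def square(x):
    return x ** 2
print(s.apply(square))             # 1, 4, 9

# Example 2: a lambda applied to a Series
print(s.apply(lambda x: x + 100))  # 101, 102, 103

# Example 3: a function applied along DataFrame rows (axis=1)
print(df.apply(lambda row: row['a'] + row['b'], axis=1))  # 11, 22
```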
Transforming or Adding Data Using where()
The where() method in pandas is used to conditionally transform data. It retains values that
meet a given condition and replaces others with a specified value (default is NaN).
Syntax:
For Series: Series.where(cond, other=np.nan) # np -> numpy alias
For DataFrame: DataFrame.where(cond, other=np.nan, axis=0)
Example:
Let's use where() on a Series and a DataFrame:
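A minimal sketch of both forms; the data and thresholds are illustrative:

```python
import pandas as pd

s = pd.Series([10, 25, 40])
df = pd.DataFrame({'a': [1, 50], 'b': [100, 2]})

# Keep values > 20; replace the rest with NaN (the default 'other')
print(s.where(s > 20))            # NaN, 25.0, 40.0

# Keep values > 10; replace the rest with 0
print(df.where(df > 10, other=0))
```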
Inserting Columns:
Syntax : df.insert(position, new_column_name, column_data)
Example: df.insert(1, 'Gender', ['F', 'M'])
Dropping Columns
Syntax : df.drop(column_name, axis=1, inplace=True)
Example: df.drop('Gender', axis=1, inplace=True)
Renaming Columns
Syntax :
df.rename(columns={'old_column_name': 'new_column_name'},
inplace=True)
Example:
df.rename(columns={'name': 'FullName'}, inplace=True)
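The three operations above combined into one runnable sketch; the data is illustrative:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Insert 'Gender' as the second column (position 1)
df.insert(1, 'Gender', ['F', 'M'])

# Rename 'name' to 'FullName'
df.rename(columns={'name': 'FullName'}, inplace=True)

# Drop the 'Gender' column again
df.drop('Gender', axis=1, inplace=True)

print(df.columns.tolist())  # ['FullName', 'Age']
```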
Merging DataFrames: Inner, Outer, Left, Right Joins
Merging combines two DataFrames using a common key (or keys). Joins control how the
DataFrames are merged based on the relationship of their keys.
Code Examples:
1. Inner Join
2. Outer Join
3. Left Join
4. Right Join
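The four join types above can be sketched on two small DataFrames sharing a 'key' column; the data is illustrative:

```python
import pandas as pd

left = pd.DataFrame({'key': ['a', 'b', 'c'], 'L': [1, 2, 3]})
right = pd.DataFrame({'key': ['b', 'c', 'd'], 'R': [20, 30, 40]})

# 1. Inner join: only keys present in both -> b, c
inner = pd.merge(left, right, on='key', how='inner')

# 2. Outer join: union of all keys -> a, b, c, d (missing values become NaN)
outer = pd.merge(left, right, on='key', how='outer')

# 3. Left join: every key from 'left' -> a, b, c
left_j = pd.merge(left, right, on='key', how='left')

# 4. Right join: every key from 'right' -> b, c, d
right_j = pd.merge(left, right, on='key', how='right')
```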
Concatenating DataFrames
Concatenation in pandas refers to combining two or more DataFrames along a particular axis
(either rows or columns). The concat() function is used to join DataFrames either vertically
(stacking rows) or horizontally (joining columns).
Syntax:
pd.concat([df1, df2, ...], axis=0, join='outer', ignore_index=False)
Note:
axis: Determines whether to concatenate along rows (axis=0, default) or columns (axis=1).
join: Specifies how to handle columns that are not present in both DataFrames:
● 'outer' (default): Includes all columns (union of columns).
● 'inner': Includes only columns common to all DataFrames.
ignore_index: If True, the index is reset. If False, keeps the original index from each
DataFrame.
Code Example
1. Concatenate Vertically (Stacking Rows)
2. Concatenate Horizontally (Joining Columns)
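Both directions can be sketched as follows; the data is illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# 1. Vertically: stack rows, resetting the index
rows = pd.concat([df1, df2], axis=0, ignore_index=True)  # 4 rows, 2 columns

# 2. Horizontally: place the DataFrames side by side
cols = pd.concat([df1, df2], axis=1)                     # 2 rows, 4 columns
```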
Handling Null (Missing) Values in Pandas
Null values are represented as NaN in pandas. Handling them efficiently is essential for data
cleaning and preparation. Pandas provides several methods to detect, fill, or drop missing data.
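The common detect/fill/drop methods can be sketched as follows; the data is illustrative:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [np.nan, 5, 6]})

print(df.isnull())        # boolean mask of missing values
print(df.isnull().sum())  # count of missing values per column
print(df.fillna(0))       # replace NaN with a fixed value
print(df.dropna())        # drop rows containing any NaN
```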
Grouping Data Using groupby()
groupby() in pandas is a powerful tool for grouping data based on one or more columns,
followed by applying aggregation or transformation operations to each group. It is commonly
used for summarizing, aggregating, and transforming data.
Syntax:
df.groupby(by, axis=0, level=None, as_index=True, sort=True,
group_keys=True)
by: Column(s) or index level(s) to group by.
axis: Axis to group along (default is 0 for rows).
level: Group by a particular level (useful for MultiIndex).
as_index: If True (default), the group labels become the index.
sort: If True (default), the groups are sorted.
group_keys: If True (default), it includes group keys in the result.
Common Operations with groupby()
1. Aggregation (e.g., sum, mean)
2. Transformation (e.g., normalization, filling missing values)
3. Iteration (e.g., iterating over groups)
Code Example:
1. Grouping and Aggregating with sum()
2. Grouping and Aggregating with Multiple Functions
3. Grouping and Iterating Over Groups
4. Grouping by Multiple Columns
5. Transforming Data Within Groups Using transform()
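The five examples above can be sketched on one small DataFrame; the column names and values are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Dept': ['IT', 'IT', 'HR', 'HR'],
    'City': ['NY', 'NY', 'NY', 'LA'],
    'Salary': [100, 120, 80, 90],
})

# 1. Grouping and aggregating with sum()
print(df.groupby('Dept')['Salary'].sum())           # HR: 170, IT: 220

# 2. Multiple aggregation functions at once via agg()
print(df.groupby('Dept')['Salary'].agg(['mean', 'max']))

# 3. Iterating over groups
for name, group in df.groupby('Dept'):
    print(name, len(group))

# 4. Grouping by multiple columns
print(df.groupby(['Dept', 'City'])['Salary'].sum())

# 5. transform() returns a result aligned with the original rows
df['DeptMean'] = df.groupby('Dept')['Salary'].transform('mean')
print(df['DeptMean'].tolist())  # [110.0, 110.0, 85.0, 85.0]
```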