Data Analysis - 5th Unit

The document provides an overview of data analysis using Pandas, focusing on Pandas Series and DataFrames. It covers creating Series and DataFrames, performing operations like arithmetic calculations, boolean indexing, and handling missing values, as well as basic DataFrame manipulations such as adding and dropping columns. Additionally, it introduces data visualization techniques using Matplotlib, including line plots, bar graphs, histograms, and pie charts.

Uploaded by

niharikap229

Data analysis - 5th unit

Pandas Series:
Pandas Series are useful for organizing and working with one-dimensional
data. They provide labeled indexing, making it easy to access and manipulate
data. Series support operations like data alignment, handling missing data, and
mathematical calculations. They can be easily integrated with other Pandas
structures like DataFrames, making them a versatile tool for data analysis and
manipulation in Python.

Creating data using Pandas Series:

import pandas as pd
s = pd.Series([10, 20, 30, 40, 50])
print(s)

Output:
0 10
1 20
2 30
3 40
4 50
dtype: int64

The pd.Series() function creates a Series object from the provided list
[10, 20, 30, 40, 50]. The index is automatically generated as integers starting from 0.

import pandas as pd
k = pd.Series([1, 7, 2], index=[1, 2, 3])
print(k)

To set custom index labels, pass them to the index parameter. The labels can be
integers such as 1, 2, 3 or strings such as a, b, c.

Output:
1 1
2 7
3 2
dtype: int64
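
As a minimal sketch of the string-label case mentioned above, the same values can be indexed with a, b, c and then looked up by label. The label choice here is only illustrative.

```python
import pandas as pd

# Same values as above, but with string labels instead of integers
k = pd.Series([1, 7, 2], index=["a", "b", "c"])

# Look up a single element by its label
print(k["b"])          # 7

# The labels are stored in the Series' index
print(list(k.index))   # ['a', 'b', 'c']
```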

Pandas Series operations:

Pandas Series operations allow you to perform various operations on Series
objects, such as arithmetic operations, boolean operations, and more. Here are
some common operations:

1. Arithmetic Operations: You can perform arithmetic operations like addition,
subtraction, multiplication, and division on Series objects. For example:

import pandas as pd

# Create two Series

s1 = pd.Series([1, 2, 3, 4])
s2 = pd.Series([5, 6, 7, 8])

# Addition
result = s1 + s2
print(result)

Output:
0 6
1 8
2 10
3 12
dtype: int64

# Subtraction
result = s1 - s2
print(result)

Output:
0 -4
1 -4
2 -4
3 -4
dtype: int64

# Multiplication
result = s1 * s2
print(result)

Output:
0 5
1 12
2 21
3 32
dtype: int64

# Division
result = s1 / s2
print(result)

Output:
0 0.200000
1 0.333333
2 0.428571
3 0.500000
dtype: float64
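
The examples above use two Series that share the same index. The sketch below, with made-up labels, illustrates the data alignment mentioned in the introduction: values are matched by label rather than by position, and a label present in only one Series yields a missing value (NaN).

```python
import pandas as pd

# Two Series whose labels only partly overlap (labels are illustrative)
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# Addition pairs values by label: y -> 2 + 10, z -> 3 + 20.
# x and w each appear in only one Series, so their result is NaN.
total = a + b
print(total)
```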

2. Boolean Indexing:

In boolean indexing, an expression such as res > 2 creates a boolean mask by
comparing each element of the Series against the value 2. Indexing the Series
with that mask keeps only the elements where the mask is True.

import pandas as pd
res = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
k = res[res > 2]
print(k)

This code creates a Pandas Series res with values [1, 2, 3, 4] and custom
index labels ["a", "b", "c", "d"]. Then, it filters the Series using boolean
indexing to select only the elements that are greater than 2, and stores the
result in a new Series k.

Output:
c 3
d 4
dtype: int64
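
To make the mask itself visible, the sketch below prints it before using it. It also shows one way to combine two conditions, which goes slightly beyond the example above; the variable names mask and middle are introduced here for illustration.

```python
import pandas as pd

res = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])

# The comparison alone returns a boolean Series (the mask)
mask = res > 2
print(mask)    # a False, b False, c True, d True

# Conditions can be combined with & (and) or | (or);
# each condition must be wrapped in parentheses
middle = res[(res > 1) & (res < 4)]
print(middle)  # keeps b (2) and c (3)
```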

3. Descriptive Statistics: Pandas provides several methods to calculate
descriptive statistics for Series objects, such as mean(), median(), min(),
max(), etc. For example:

import pandas as pd

# Create a Series
s = pd.Series([1, 2, 3, 4])

# Mean
print("Mean:", s.mean())

# Median
print("Median:", s.median())

# Sum
print("Sum:", s.sum())

# Minimum
print("Minimum:", s.min())

# Maximum
print("Maximum:", s.max())

# Count
print("Count:", s.count())

Output:
Mean: 2.5
Median: 2.5
Sum: 10
Minimum: 1
Maximum: 4
Count: 4

4. Handling Missing values:

import pandas as pd

# Create a Series with missing values

s = pd.Series([15,85,None,74,56,None,87])

# Check for missing values

print(s.isnull())

0 False
1 False
2 True
3 False
4 False
5 True
6 False
dtype: bool

# Drop missing values

print(s.dropna())

0 15.0
1 85.0
3 74.0
4 56.0
6 87.0
dtype: float64

# Fill missing values with a specific value (here, 19)

print(s.fillna(19))

0 15.0
1 85.0
2 19.0
3 74.0
4 56.0
5 19.0
6 87.0
dtype: float64

These operations demonstrate how to check for missing values in a Pandas
Series (isnull()), how to drop those missing values (dropna()), and how to fill
missing values with a specific value (fillna()).
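
Two common follow-ups, sketched here on the same Series, are counting the missing values and filling them with the mean of the remaining values instead of a fixed constant. Filling with the mean is an illustrative choice, not something the notes above prescribe.

```python
import pandas as pd

s = pd.Series([15, 85, None, 74, 56, None, 87])

# isnull() gives True/False per element; summing counts the True values
n_missing = s.isnull().sum()
print(n_missing)  # 2

# mean() skips missing values by default, so this is the mean
# of the five known values: (15 + 85 + 74 + 56 + 87) / 5 = 63.4
filled = s.fillna(s.mean())
print(filled)
```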

Pandas DataFrame:
A DataFrame in Pandas is a two-dimensional, size-mutable, and heterogeneous
tabular data structure with labeled axes (rows and columns). It is similar to a
spreadsheet or SQL table, where each column can have a different data type.
DataFrames are particularly useful for data manipulation and analysis tasks, as
they provide powerful methods for handling and processing structured data.

Creating a DataFrame using Pandas:

import pandas as pd

# Creating a DataFrame from a dictionary

data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)
print(df)

      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles

This code creates a Pandas DataFrame df from a dictionary data , where each
key in the dictionary becomes a column in the DataFrame and the
corresponding list of values becomes the data in that column. The DataFrame is
then printed to the console, displaying the data in a tabular format.
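
As a small sketch of the point that each column can have its own data type, the shape and dtypes attributes can be inspected on the same DataFrame.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)

# (number of rows, number of columns)
print(df.shape)   # (3, 3)

# One data type per column: Age is an integer column,
# while Name and City hold text
print(df.dtypes)
```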

DataFrame operations:
1. Adding a new column:

df['Gender'] = ['Female', 'Male', 'Male']

print(df)

      Name  Age           City  Gender
0    Alice   25       New York  Female
1      Bob   30  San Francisco    Male
2  Charlie   35    Los Angeles    Male

Here we added a column called Gender.

2. Drop:

Dropping a column:

k = df.drop("City", axis=1)
print(k)

      Name  Age  Gender
0    Alice   25  Female
1      Bob   30    Male
2  Charlie   35    Male

Here we dropped the City column; axis=1 refers to columns. drop() returns a new
DataFrame, so df itself is unchanged.

Dropping a row:

k = df.drop(1, axis=0)
print(k)

      Name  Age         City  Gender
0    Alice   25     New York  Female
2  Charlie   35  Los Angeles    Male

Here we dropped the row with index label 1; axis=0 refers to rows.

3. Grouping by Gender and calculating mean age:

grouped_df = df.groupby('Gender')['Age'].mean()
print(grouped_df)

Gender
Female 25.0
Male 32.5
Name: Age, dtype: float64

The groupby method in Pandas is used to split the DataFrame into groups based
on some criteria, such as a specific column value. In this case,
df.groupby('Gender') groups the DataFrame df by the 'Gender' column.
After grouping, the ['Age'] part selects the 'Age' column from each group, and
the mean() method calculates the mean age for each group.
Finally, print(grouped_df) displays the resulting Series, where each index
corresponds to a unique value in the 'Gender' column, and each value is the
mean age of the group.

4. Selection and indexing:

import pandas as pd

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        # The fourth city is cut off in the source; ... (Ellipsis) holds its place
        'City': ['New York', 'San Francisco', 'Los Angeles', ...]}
df = pd.DataFrame(data)

# Access a single value
print("\nValue at index 1, column 'Name':")
print(df.at[1, 'Name'])

# Access a row using integer position

print("\nRow at index 2:")
print(df.iloc[2])

# Access a column using label

print("\nColumn 'Age':")
print(df['Age'])

Value at index 1, column 'Name':
Bob

Row at index 2:
Name Charlie
Age 35
City Los Angeles
Name: 2, dtype: object

Column 'Age':
0 25
1 30
2 35
3 40
Name: Age, dtype: int64

Selection and indexing in a Pandas DataFrame allow you to access and
manipulate data efficiently. Here's a short explanation:

1. Single Column Selection: You can select a single column by using square
brackets [] with the column name as a string. For example,
df['Column_Name'] selects the column named 'Column_Name' .

2. Multiple Columns Selection: To select multiple columns, you can pass a list
of column names inside the square brackets. For example, df[['Column1',
'Column2']] selects columns 'Column1' and 'Column2' .

3. Row Selection Using Label: You can use the loc[] indexer to select rows
by label. For example, df.loc['Label'] selects the row with the specified
label.
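
The three selection styles listed above can be sketched as follows. The row labels r1, r2, r3 are hypothetical and were added only so that loc[] has string labels to select by.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"],
     "Age": [25, 30, 35],
     "City": ["New York", "San Francisco", "Los Angeles"]},
    index=["r1", "r2", "r3"],  # hypothetical row labels
)

# 1. Single column: returns a Series
ages = df["Age"]

# 2. Multiple columns: pass a list, returns a DataFrame
subset = df[["Name", "City"]]

# 3. Row by label: returns that row as a Series
row = df.loc["r2"]

print(ages)
print(subset)
print(row)
```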

5. Reading a CSV file in Pandas:

The pd.read_csv() function reads a CSV file, for example one named 'data.csv',
into a DataFrame object called df. The DataFrame is a tabular data structure
that stores the data from the CSV file in rows and columns, allowing for easy
manipulation and analysis with Pandas' tools and functions.

• head() : View the first few rows of the DataFrame.
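
A minimal sketch of read_csv() and head() follows. So that it runs without any file on disk, the CSV content is a made-up string read through io.StringIO; with a real file you would pass its path, such as 'data.csv', to pd.read_csv() instead.

```python
import io
import pandas as pd

# Made-up CSV content standing in for a file such as 'data.csv'
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,San Francisco
Charlie,35,Los Angeles
David,40,Boston
Eve,28,Seattle
Frank,33,Denver
"""

# read_csv accepts a file path or a file-like object
df = pd.read_csv(io.StringIO(csv_text))

# head() returns the first 5 rows by default
print(df.head())

# Pass a number to choose how many rows to show
print(df.head(2))
```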

Filtering Data: Here we keep only the rows whose first_ings_score is greater
than 150.
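
The filter described above can be sketched with boolean indexing. The column name first_ings_score comes from the notes; the match_id column and all values are made up for illustration.

```python
import pandas as pd

# Made-up rows; only the column name first_ings_score is from the notes
matches = pd.DataFrame({
    "match_id": [1, 2, 3, 4],
    "first_ings_score": [131, 210, 177, 150],
})

# Keep the rows where the first-innings score is strictly above 150
high = matches[matches["first_ings_score"] > 150]
print(high)
```

Note that the row with a score of exactly 150 is excluded, because the comparison is strict.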

Matplotlib:
Line plot:
A line plot is a type of plot where data points are connected by straight lines. It
is often used to visualize data over a continuous interval.
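
A minimal line plot might look like the sketch below. The data points, labels, and output file name are made up; the Agg backend is selected only so that the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove to use an interactive window
import matplotlib.pyplot as plt

# Made-up data points
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

fig, ax = plt.subplots()

# plot() connects consecutive points with straight line segments
ax.plot(x, y, marker="o")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Line plot")

# Write the figure to an image file (file name is arbitrary)
fig.savefig("line_plot.png")
```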

Bar Graph:
Creating a bar graph

Now let's import a dataset using Pandas and visualize it with Matplotlib. Here
we use the IPL 2022 dataset.
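
The dataset itself is not reproduced in these notes, so the sketch below uses a small made-up DataFrame as a stand-in. The column names team and wins, and every value, are hypothetical; with the real file, the DataFrame would come from pd.read_csv().

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove to use an interactive window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the IPL 2022 dataset (made-up teams and counts)
df = pd.DataFrame({
    "team": ["Team A", "Team B", "Team C"],
    "wins": [9, 7, 4],
})

fig, ax = plt.subplots()

# One bar per row: categories on the x-axis, bar heights from the wins column
bars = ax.bar(df["team"], df["wins"])
ax.set_xlabel("Team")
ax.set_ylabel("Wins")
ax.set_title("Bar graph")

fig.savefig("bar_graph.png")
```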

Histogram:
In Matplotlib, you can create a histogram using the hist function.
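
A short sketch of hist follows, using made-up scores and an arbitrary choice of four bins. The function returns the count per bin, the bin edges, and the drawn bars.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove to use an interactive window
import matplotlib.pyplot as plt

# Made-up values to be grouped into bins
scores = [131, 150, 158, 162, 177, 189, 193, 210]

fig, ax = plt.subplots()

# bins=4 splits the range from min to max into 4 equal-width intervals
counts, bin_edges, patches = ax.hist(scores, bins=4)
ax.set_xlabel("Score")
ax.set_ylabel("Frequency")
ax.set_title("Histogram")

fig.savefig("histogram.png")
```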

Pie chart:
To draw a pie chart with multiple colors, labels, and percentages in Matplotlib,
you can use the pie function along with the autopct parameter to display the
percentages.
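
A minimal sketch of such a pie chart is given here. The labels, sizes, and colors are made up; only the pie function and the autopct parameter come from the description above.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove to use an interactive window
import matplotlib.pyplot as plt

# Made-up categories, values, and slice colors
labels = ["A", "B", "C", "D"]
sizes = [40, 30, 20, 10]
colors = ["gold", "lightcoral", "lightskyblue", "yellowgreen"]

fig, ax = plt.subplots()

# autopct formats each slice's share of the total as a percentage string
wedges, texts, autotexts = ax.pie(
    sizes, labels=labels, colors=colors, autopct="%1.1f%%"
)
ax.set_title("Pie chart")

fig.savefig("pie_chart.png")
```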

In this example, each slice of the pie chart will be colored according to the
colors list, and the autopct='%1.1f%%' parameter will display the percentage
values with one decimal place. Adjust the labels and colors lists according to
your data and color preferences.
