2,3. Introduction Pandas & Matplotlib

The document provides an introduction to the Pandas library for data analysis and manipulation in Python, covering its core features, data structures, and basic operations. It also introduces Matplotlib for data visualization, detailing various plot types and customization options. Hands-on examples with code are included for both libraries to facilitate understanding and practical application.

Uploaded by

roqia.nasimzada12

Introduction to Pandas and Matplotlib: dataset analysis and visualization
Content:
Introduction to Pandas
• What is Pandas?
• Core Features of Pandas
• Pandas Data Structures: Series & DataFrame
• Importing Data
• Basic Data Manipulation Operations
• Example Use Cases
• Hands-on Examples with Code
Introduction to Matplotlib
• What is Matplotlib?
• Why Matplotlib for Data Visualization?
• Basic Components of a Plot
• Common Plot Types
• Customizing Plots
• Hands-on Examples with Code
Introduction to pandas
• Pandas is a Python library for working with data sets.
• It is widely used for data analysis and manipulation.
• It is built on top of NumPy and provides easy-to-use data structures and operations.
• It has functions for analysing, cleaning, exploring, and manipulating data.
• The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was created by Wes McKinney in 2008.
Pandas core features
Key Features:
• Loading data into DataFrame and Series structures
• Handling missing data
• Data manipulation (filtering, aggregation, imputation, removal)
• Filtering data based on conditions
• Creating new columns based on existing data
• It is an open-source library

Key structures:
• Series (1D labeled array)
• DataFrame (2D labeled data, like a table)
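
The two structures can be illustrated with a minimal sketch. The names and values below are made up for illustration and are not taken from any dataset used in these slides.

```python
import pandas as pd

# A Series is a one-dimensional labeled array: values plus an index.
salaries = pd.Series([12000, 10000, 4500], index=["ahmad", "mahmood", "khalil"])

# A DataFrame is a two-dimensional table; each column is itself a Series.
employees = pd.DataFrame(
    {
        "name": ["ahmad", "mahmood", "khalil"],
        "salary": [12000, 10000, 4500],
    }
)

print(salaries["mahmood"])        # label-based lookup on a Series
print(employees.shape)            # (number of rows, number of columns)
print(type(employees["salary"]))  # selecting one column gives a Series
```

Selecting a column by name is the most common way to move from the table view to a single labeled array.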

Pandas codebase: https://github.com/pandas-dev/pandas

Importing and installation
Installation
• Open command prompt
• Run the command `pip install pandas`

Importing pandas
• To use the library, import it into the project with one of the following statements:
• `import pandas` or `import pandas as pd`
Importing example
Importing data to pandas
Data Importing:
• Pandas can easily import data from different file formats, such as CSV, Excel, and JSON.
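
A minimal sketch of two of these readers. To keep it self-contained, the CSV and JSON content is held in memory (the sample rows are invented) rather than read from disk; with real files, the path string is passed instead of the `io.StringIO` object. Reading `.xlsx` files with `pd.read_excel` additionally needs an Excel engine such as `openpyxl` to be installed.

```python
import io

import pandas as pd

# Hypothetical sample content standing in for files on disk.
csv_text = "EEID,salary\nE001,12000\nE002,10000\n"
json_text = '[{"EEID": "E001", "salary": 12000}, {"EEID": "E002", "salary": 10000}]'

from_csv = pd.read_csv(io.StringIO(csv_text))     # same call as pd.read_csv('file.csv')
from_json = pd.read_json(io.StringIO(json_text))  # same call as pd.read_json('file.json')

print(from_csv)
print(from_json)
```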
Exploring dataset
import pandas as pd

# data = pd.read_csv('file-name.csv')
# data = pd.read_excel('file-name.xlsx')  # the path to the file
data = pd.read_excel('ESD.xlsx')

# print(data.head())
# print(data.tail())
# print(data.info())
# print(data.describe())
# print(data.isnull().sum())
Handling duplicate data
import pandas as pd

# Handling duplicate values
data = pd.read_csv('company1.csv')

# Boolean Series: True for each EEID that repeats an earlier row
print(data['EEID'].duplicated())
print(data)

# Count the duplicated EEID values
print(data['EEID'].duplicated().sum())

# Count the non-null values in the salary column
print(data['salary'].count())

# Drop rows whose EEID duplicates an earlier row (returns a new DataFrame)
print(data.drop_duplicates('EEID'))

# To blank out duplicate values instead of dropping the rows
data['EEID'] = data['EEID'].where(~data['EEID'].duplicated(), other=pd.NA)
# Equivalent: data.loc[data['EEID'].duplicated(), 'EEID'] = pd.NA
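
Since `company1.csv` is not included here, the behaviour of `duplicated()` and `drop_duplicates()` can be tried on a small invented table. This is only a sketch; the column names mirror the ones used above and the rows are made up.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "EEID": ["E001", "E002", "E002", "E003"],
        "salary": [12000, 10000, 10000, 4500],
    }
)

# duplicated() marks every repeat after the first occurrence as True.
is_repeat = data["EEID"].duplicated()
print(is_repeat.tolist())

# keep="first" is the default; keep=False would drop every copy of a repeated ID.
deduplicated = data.drop_duplicates(subset="EEID", keep="first")
print(len(deduplicated))
```

Neither call modifies `data` itself; the result has to be assigned to a variable to be kept.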
Handling missing data
import numpy as np
import pandas as pd

hmd = pd.read_csv('company1.csv')

# Show missing data (True where a value is missing)

hmd.isnull()

# Count missing values per column

hmd.isnull().sum()

# Drop rows that contain null values (returns a new DataFrame)

hmd.dropna()

# Replace null values with a custom value (returns a new DataFrame)

hmd.replace(np.nan, 'default_value')

# Replace null values in a specific column

hmd['Name'] = hmd['Name'].replace(np.nan, 'no-name')
Handling missing data (continued)
Sometimes you can't just drop the missing data. A common alternative is to fill the gaps with a summary statistic of the column:
Mean = the average value (the sum of all values divided by number of values).
Median = the value in the middle, after you have sorted all values ascending.
Mode = the value that appears most frequently.

mean_salary = hmd['salary'].mean() # Calculate the mean

median_salary = hmd['salary'].median() # Calculate the median
mode_salary = hmd['salary'].mode()[0] # Calculate the mode (most frequent salary value)

# Print the calculated values

print(f"\nMean Salary: {mean_salary}")
print(f"Median Salary: {median_salary}")
print(f"Mode Salary: {mode_salary}")

hmd['salary'] = hmd['salary'].replace(np.nan, mode_salary) # Replace NaN with mode in the 'salary' column

print(hmd)
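
As an alternative to `replace(np.nan, ...)`, pandas also provides `fillna`, which targets missing values directly. A minimal sketch on invented salary values (the `salary_filled` column name is only for illustration), here filling with the median instead of the mode:

```python
import numpy as np
import pandas as pd

hmd = pd.DataFrame({"salary": [1000.0, np.nan, 3000.0, 1000.0, np.nan]})

# mean(), median() and mode() skip NaN values by default.
median_salary = hmd["salary"].median()

# fillna returns a new Series with every NaN replaced by the given value.
hmd["salary_filled"] = hmd["salary"].fillna(median_salary)

print(hmd)
```

The median is often preferred over the mean for salaries because a few very large values do not pull it upward.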
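Handling missing data (continued)
Filling missing data
• Forward filling
• Backward filling

For example, the mean and median cannot be computed for a categorical column such as gender, so a missing entry can instead be filled from a neighbouring row.

hmd['gender'] = hmd['gender'].bfill()  # backward fill: copy the next valid value upward

hmd['gender'] = hmd['gender'].ffill()  # forward fill: copy the last valid value downward
print(hmd)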
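
The direction of each fill is easiest to see on a short invented column. A minimal sketch:

```python
import pandas as pd

gender = pd.Series(["F", None, None, "M", None])

forward = gender.ffill()   # each gap takes the last valid value above it
backward = gender.bfill()  # each gap takes the next valid value below it

print(forward.tolist())
print(backward.tolist())
```

A backward fill cannot fill a gap in the last row, and a forward fill cannot fill one in the first row, which is one reason the two are sometimes applied one after the other.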
Data Transformation
esd = pd.read_excel('ESD.xlsx')

esd.loc[esd['Bonus %'] == 0 , "GetBonus"] = "No bonus"

esd.loc[esd['Bonus %'] > 0 , "GetBonus"] = "bonus"

print(esd.head(10))

# another example
esd = pd.read_excel('ESD.xlsx')

esd['describe_employee'] = esd['EEID'] + ' ' + esd['Full Name'].str.upper() + ' ' + esd['Job Title']
# Despite the column name, this is the salary left after a 10% deduction, not the tax amount
esd['tax'] = esd['Annual Salary'] - ((esd['Annual Salary'] / 100) * 10)

print(esd.head())
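
A two-way label like `GetBonus` can also be written in a single step with `numpy.where`. This is a sketch on invented bonus values, not the ESD.xlsx data. One difference from the two `.loc` assignments: here every row that is not greater than zero, including missing values, gets the "No bonus" label.

```python
import numpy as np
import pandas as pd

esd = pd.DataFrame({"Bonus %": [0.0, 0.15, 0.0, 0.30]})

# np.where(condition, value_if_true, value_if_false), evaluated row by row
esd["GetBonus"] = np.where(esd["Bonus %"] > 0, "bonus", "No bonus")

print(esd)
```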
Dataset summarization
sum(): Returns the sum of the values.
mean(): Returns the average of the values.
count(): Returns the number of non-NA/null observations.
max(): Returns the maximum value.
min(): Returns the minimum value.
median(): Returns the median value.

esd = pd.read_excel('ESD.xlsx')

agg_1 = esd.groupby(['Department' , 'Gender']).agg({"EEID":"count"})

agg_2 = esd.groupby(['Department' , 'Ethnicity']).agg({"EEID":"count"})

print(agg_1)
print(agg_2)
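
Several of the summary functions listed above can be applied in one `groupby` call by passing a list of function names to `agg`. A minimal sketch with invented departments and salaries:

```python
import pandas as pd

esd = pd.DataFrame(
    {
        "Department": ["IT", "IT", "Sales", "Sales", "Sales"],
        "Annual Salary": [100, 200, 50, 70, 90],
    }
)

# One row per department, one column per aggregation function.
summary = esd.groupby("Department")["Annual Salary"].agg(["count", "mean", "max"])

print(summary)
```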
Merge and join
import pandas as pd

employee = {
"id": [1,2,3,4],
'names': ['ahmad' , 'mahmood' , 'khalil' , 'khanwali']
}
employee2 = {
"id": [1,2,3,4],
'salary': [12000,10000,4500,8000]
}

df = pd.DataFrame(employee)
df1 = pd.DataFrame(employee2)

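# Inner join on the shared 'id' column
emp = pd.merge(df, df1, on='id')

# Left join: keep every row of df (the common column 'id' is used as the key)
emp = pd.merge(df, df1, how='left')
# Concatenate: stack the rows of df1 under df without matching ids
emp = pd.concat([df, df1])

# Each assignment overwrites emp, so the print below shows the concat result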

print(emp)
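
When both tables contain exactly the same ids, the join types give the same rows, so the difference is easier to see when the ids only partly overlap. A sketch with invented tables (`names` and `pay` are illustrative variable names):

```python
import pandas as pd

names = pd.DataFrame({"id": [1, 2, 3], "names": ["ahmad", "mahmood", "khalil"]})
pay = pd.DataFrame({"id": [2, 3, 4], "salary": [10000, 4500, 8000]})

inner = pd.merge(names, pay, on="id")               # ids present in both tables
left = pd.merge(names, pay, on="id", how="left")    # every id from the left table
outer = pd.merge(names, pay, on="id", how="outer")  # every id from either table
stacked = pd.concat([names, pay])                   # rows appended, no key matching

print(inner)
print(left)
print(outer)
print(stacked)
```

In `left` and `outer`, cells with no matching row are filled with NaN. `concat` keeps all six rows and also fills the columns a table does not have with NaN.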
Introduction to Matplotlib
A comprehensive library for creating static, animated, and interactive visualizations in Python.
• Built on NumPy and designed for easy and flexible plotting.
• Matplotlib was created by John D. Hunter.
• Matplotlib is open source and we can use it freely.
• https://matplotlib.org/stable/ -- documentation
• https://github.com/matplotlib/matplotlib -- codebase
Why Matplotlib
Advantages:
• Versatile: Supports a variety of plots (line, scatter, bar, etc.).
• Customizable: Extensive options for styling and formatting.
• Integration: Works well with Jupyter notebooks, other libraries like Pandas, and GUI applications.
Installation & usage:
• To install it: `pip install matplotlib`
• To use it: `import matplotlib.pyplot as plt`
Plot types
Common Plot Types are as follows:
• Line Plot: `plt.plot()`
• Scatter Plot: `plt.scatter()`
• Bar Plot: `plt.bar()`
• Histogram: `plt.hist()`
• Box Plot: `plt.boxplot()`
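
The pyplot functions listed above all draw onto a Figure (the whole canvas) and an Axes (one plotting area inside it). A minimal sketch that names those components explicitly; the data is invented, and the Agg backend is selected only as an assumption so that the code runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt

# Figure = the canvas, Axes = the area that holds the data, ticks and labels
fig, ax = plt.subplots(figsize=(6, 4))

ax.plot([1, 2, 3, 4], [10, 20, 15, 30], label="sales")  # a line on the axes
ax.set_title("Components of a plot")                    # title
ax.set_xlabel("Quarter")                                # x-axis label
ax.set_ylabel("Units")                                  # y-axis label
ax.legend()                                             # legend built from the labels

# In an interactive session, plt.show() would display the figure.
```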
Scatter plot
import numpy as np
import matplotlib.pyplot as pt

x_axis = np.random.random(50) * 100

y_axis = np.random.random(50) * 100

pt.scatter(x_axis,y_axis)
pt.show()

pt.scatter(x_axis,y_axis, color='#0000ff', marker="1", s= 100)

pt.show()
Line plot or chart
#line chart

years = [2000+ x for x in range(24)] # x axis

home_price = np.random.random(24) * 1000 # y axis

years1 = [2000+ x for x in range(24)] # x axis

home_price1 = np.random.random(24) * 500 # y axis

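# pt.plot(years, home_price)  # minimal call with default styling
pt.plot(years, home_price, c='red', lw=4, label='line-1')
# linestyle='--' can be added for a dashed line
pt.plot(years1, home_price1, label='line-2')
pt.legend(loc='upper right')  # the position is passed through the loc argument
pt.show()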
Barchart
import matplotlib.pyplot as plt

# Data for the bar chart

products = ['Product A', 'Product B', 'Product C', 'Product D']
sales = [120, 300, 250, 450]
# Create a bar chart
plt.bar(products, sales, color='skyblue', edgecolor='black', width=0.6)
# Add Title and Labels
plt.title('Sales of Products in Q1 2024', fontsize=16, fontweight='bold', color='darkblue')
plt.xlabel('Products', fontsize=12, fontweight='bold')
plt.ylabel('Sales (in Units)', fontsize=12, fontweight='bold')

# Gridlines (add gridlines for better readability)

plt.grid(True, which='both', axis='y', linestyle='--', linewidth=0.7)

# Annotate each bar with its value
for i, value in enumerate(sales):
    plt.text(i, value + 10, str(value), ha='center', fontsize=11)

plt.gca().set_facecolor('whitesmoke')  # Add a background color to the chart

plt.ylim(0, 1000)  # Set the limits for the y-axis
plt.tight_layout()  # Adjust padding so labels are not clipped
plt.show()  # Display the bar chart
Histogram chart
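# Data for a histogram (continuous data)
ages = [22, 23, 25, 26, 28, 30, 32, 35, 36, 37, 40, 42, 45, 48, 50]
# Overwrite the small sample above with 12,000 normally distributed values (mean 20, standard deviation 1.5)
ages = np.random.normal(20, 1.5, 12000)

# Creating a histogram
# pt.hist(ages, bins=5, color='lightgreen', edgecolor='black')
pt.hist(ages, bins=112, cumulative=True)  # cumulative: each bar counts all values up to its bin
pt.xlabel('Age')
pt.ylabel('Cumulative frequency')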
pt.title('Age Distribution')
pt.show()
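
What the `bins` and `cumulative` arguments do can be checked numerically with `numpy.histogram`, which performs the same kind of binning. A minimal sketch using the small hand-written age sample from the slide:

```python
import numpy as np

ages = np.array([22, 23, 25, 26, 28, 30, 32, 35, 36, 37, 40, 42, 45, 48, 50])

# Split the range 22..50 into 5 equal-width bins and count the values in each.
counts, edges = np.histogram(ages, bins=5)

# A cumulative histogram plots the running total of those counts.
cumulative = np.cumsum(counts)

print(edges)
print(counts)
print(cumulative)
```

The last cumulative value always equals the number of data points, which is why the final bar of a cumulative histogram is the tallest.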
Pie chart
# Company names and their market caps (in billions)
companies = [
'Apple', 'Microsoft', 'Nvidia', 'Saudi Aramco', 'Alphabet (Google)', 'Amazon',
'Meta Platforms (Facebook)', 'Berkshire Hathaway', 'TSMC', 'Eli Lilly',
'JPMorgan Chase', 'Tesla', 'Visa', 'Johnson & Johnson', 'ExxonMobil',
'Samsung', 'Chevron', 'Walmart', 'Pfizer', 'Procter & Gamble',
'Mastercard', 'Alibaba', 'Boeing', 'Cisco', 'IBM', 'Shell', 'American Express',
'Qualcomm', 'Verizon', 'Morgan Stanley'
]

market_caps = [
3100, 3100, 2200, 2100, 1700, 1400, 768, 768, 500, 500, 500, 800, 500, 450, 400,
320, 330, 648, 240, 320, 350, 240, 150, 216, 215, 211, 196, 189, 180, 179
]

# Create the pie chart

pt.figure(figsize=(10, 10))
pt.pie(market_caps, labels=companies, autopct='%2.2f%%', startangle=140, colors=pt.cm.Paired.colors)

# Add a title
pt.title('Market Share of Top 30 Companies by Market Cap (2024)')

# Display the pie chart

pt.show()
Boxplot or boxchart
# Sample data for the salaries in different departments
salaries = [
[45000, 48000, 52000, 55000, 58000, 60000, 62000, 67000, 70000], # Department A
[40000, 43000, 47000, 50000, 53000, 56000, 59000], # Department B
[30000, 35000, 40000, 42000, 45000, 47000, 49000, 50000], # Department C
[60000, 62000, 65000, 67000, 70000, 75000], # Department D
[50000, 52000, 54000, 55000, 58000, 60000, 62000, 65000] # Department E
]

# Create the box plot

plt.figure(figsize=(8, 6))
plt.boxplot(salaries, labels=['Dept A', 'Dept B', 'Dept C', 'Dept D', 'Dept E'])

# Add a title and labels

plt.title('Salary Distribution by Department')
plt.ylabel('Salary ($)')
plt.xlabel('Departments')

# Display the plot

plt.show()
Multiple Figures
import matplotlib.pyplot as plt
import numpy as np

# Generating sample data for 1 year (12 months)

months = [
'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'
]

# Sample prices for each cryptocurrency (in USD)

btc_prices = [
32000, 33000, 34000, 35000, 36000, 37000,
38000, 39000, 40000, 41000, 42000, 43000
] # Bitcoin (BTC)

eth_prices = [
2000, 2100, 2200, 2300, 2400, 2500,
2600, 2700, 2800, 2900, 3000, 3100
] # Ethereum (ETH)

# Create the first figure
plt.figure(1)

# Plotting the prices

plt.plot(months, btc_prices, marker='o', label='Bitcoin (BTC)', color='orange')

# Add a title and labels

plt.title('Cryptocurrency Prices Over 1 Year')
plt.xlabel('Months')
plt.ylabel('Price (USD)')
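
The listing above stops after the Bitcoin line. One way a second figure for the Ethereum prices might follow is sketched here; this is an assumption about the intent, not the original continuation. The data is shortened to three months, the marker and colour for Ethereum are arbitrary, and the Agg backend is chosen only so the code runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt

months = ['Jan', 'Feb', 'Mar']
btc_prices = [32000, 33000, 34000]
eth_prices = [2000, 2100, 2200]

# plt.figure(n) creates figure n, or makes it current if it already exists.
fig_btc = plt.figure(1)
plt.plot(months, btc_prices, marker='o', color='orange', label='Bitcoin (BTC)')
plt.title('Bitcoin price')
plt.legend()

# Everything after this call is drawn on the second figure.
fig_eth = plt.figure(2)
plt.plot(months, eth_prices, marker='s', color='purple', label='Ethereum (ETH)')
plt.title('Ethereum price')
plt.legend()

# In an interactive session, plt.show() would open both figures.
```

Separate figures are useful here because the two price ranges differ by an order of magnitude, so a shared y-axis would flatten the Ethereum line.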
Subplots and saving
import matplotlib.pyplot as plt
import numpy as np

# Generate x values
x = np.linspace(-2 * np.pi, 2 * np.pi, 1000)

# Calculate y values for each function

y_cos = np.cos(x) # Cosine wave
y_sin = np.sin(x) # Sine wave
y_tan = np.tan(x) # Tangent wave

# Create a figure with 3 subplots

fig, axs = plt.subplots(3, 1, figsize=(10, 12))

# Cosine Wave
axs[0].plot(x, y_cos, color='blue', label='Cosine Wave')
axs[0].set_title('Cosine Wave')
axs[0].set_ylabel('cos(x)')
axs[0].set_ylim(-1.5, 1.5) # Limit y-axis for better visibility
axs[0].grid(True)
axs[0].legend()
Subplots and saving (continued)
# Sine Wave
axs[1].plot(x, y_sin, color='orange', label='Sine Wave')
axs[1].set_title('Sine Wave')
axs[1].set_ylabel('sin(x)')
axs[1].set_ylim(-1.5, 1.5) # Limit y-axis for better visibility
axs[1].grid(True)
axs[1].legend()

# Tangent Wave
axs[2].plot(x, y_tan, color='green', label='Tangent Wave')
axs[2].set_title('Tangent Wave')
axs[2].set_ylabel('tan(x)')
axs[2].set_ylim(-10, 10) # Limit y-axis for better visibility
axs[2].grid(True)
axs[2].legend()

# Adjust layout to prevent overlap

plt.tight_layout()
# JPEG has no alpha channel, so save as PNG instead if a transparent background is needed
plt.savefig('subplots.jpeg', dpi=300, transparent=True)
# plt.show()
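
The tangent curve jumps between very large positive and negative values at its asymptotes, and a line plot joins those points with near-vertical strokes. One possible refinement, sketched below, replaces the extreme values with NaN, since Matplotlib leaves a gap in a line wherever the data is NaN. The threshold of 10 is an assumption chosen to match the y-limits used above, and `y_masked` is an illustrative name.

```python
import numpy as np

x = np.linspace(-2 * np.pi, 2 * np.pi, 1000)
y_tan = np.tan(x)

# Keep values within the visible range, mark everything else as missing.
y_masked = np.where(np.abs(y_tan) > 10, np.nan, y_tan)

print(np.isnan(y_masked).sum(), "points masked near the asymptotes")
```

Passing `y_masked` instead of `y_tan` to `axs[2].plot` would then draw each branch of the tangent as a separate segment.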
3d plotting
import numpy as np
import matplotlib.pyplot as pt

# Create a grid of x and y values

x = np.linspace(-5, 5, 100)
y = np.linspace(-5, 5, 100)
x, y = np.meshgrid(x, y)

# Calculate z values based on the mathematical formula

z = np.sin(np.sqrt(x**2 + y**2))

# Create a 3D plot
fig = pt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')

# Plot the surface

surface = ax.plot_surface(x, y, z, cmap='viridis', edgecolor='none')

# Add a color bar which maps values to colors

fig.colorbar(surface, shrink=0.5, aspect=10)

# Set titles and labels

ax.set_title('3D Plot of z = sin(sqrt(x^2 + y^2))')
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_zlabel('Z axis')

# Show the plot

pt.show()
3D plotting, second example: scatter plot
import numpy as np
import matplotlib.pyplot as pt

# Generate random data for the scatter plot

num_points = 100
x = np.random.rand(num_points) * 10 # X values
y = np.random.rand(num_points) * 10 # Y values
z = np.random.rand(num_points) * 10 # Z values

# Create a 3D scatter plot

fig = pt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')

# Scatter the data points

scatter = ax.scatter(x, y, z, c='g', marker='o', alpha=0.7)

# Set titles and labels

ax.set_title('3D Scatter Plot', fontsize=16)
ax.set_xlabel('X axis', fontsize=12)
ax.set_ylabel('Y axis', fontsize=12)
ax.set_zlabel('Z axis', fontsize=12)

# Show the plot

pt.show()
Using both pandas and matplotlib
import pandas as pd
import matplotlib.pyplot as plt

esd = pd.read_excel('ESD.xlsx')
agg_ = esd.groupby(['Ethnicity']).agg({"EEID": "count"})

ethnicity_counts = agg_.reset_index() # Reset index to make 'Ethnicity' a column

ethnicity_counts.columns = ['Ethnicity', 'Count'] # Rename columns for clarity

plt.figure(figsize=(8, 6)) # Set figure size

plt.pie(ethnicity_counts['Count'], labels=ethnicity_counts['Ethnicity'], autopct='%1.1f%%',
startangle=140)
plt.title('Employee Distribution by Ethnicity')
plt.axis('equal') # Equal aspect ratio ensures that pie is drawn as a circle.

plt.show()
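
A shorter route to the same kind of chart is sketched below: `value_counts()` counts each category directly, and the `plot` method on the resulting Series draws with Matplotlib. The ethnicity values are invented stand-ins for ESD.xlsx, and the Agg backend is chosen only so the code runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import pandas as pd

esd = pd.DataFrame(
    {"Ethnicity": ["Asian", "Latino", "Asian", "Black", "Asian", "Latino"]}
)

# Count the rows per category, sorted from most to least frequent.
ethnicity_counts = esd["Ethnicity"].value_counts()

# Series.plot returns the Matplotlib Axes it drew on.
ax = ethnicity_counts.plot(kind="pie", autopct="%1.1f%%", startangle=140)
ax.set_ylabel("")  # remove the default y-axis label
ax.set_title("Employee Distribution by Ethnicity")

print(ethnicity_counts)
```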
Dataset resources
Datasets
• https://archive.ics.uci.edu/ - UCI Machine Learning Repository
• https://datasetsearch.research.google.com/ - Google Dataset Search
• https://data.un.org/ - UN data
• https://www.statista.com - Statista
