Pandas

The document is a comprehensive tutorial on the Pandas library for Python, covering installation, data structures, and various operations such as data manipulation, handling missing data, and visualization. It includes sections on advanced topics like pivot tables and multi-indexing. The tutorial emphasizes the importance of Pandas in data science and analysis workflows.

Uploaded by

Tom Cruise
📘 Pandas Tutorial: From Beginner to Advanced

📌 Table of Contents
1. Introduction
2. Installing Pandas
3. Getting Started
4. Data Structures in Pandas
○ Series
○ DataFrame
5. Basic Operations
6. Indexing and Selecting Data
7. Handling Missing Data
8. Data Manipulation
9. GroupBy Operations
10. Merging, Joining, Concatenating
11. Time Series
12. Input/Output (I/O)
13. Visualization
14. Advanced Topics
15. Conclusion

🔍 1. Introduction
pandas is an open-source Python library that provides data structures and data analysis tools
for working with structured data (like Excel spreadsheets, SQL tables, CSV files, etc.).

💻 2. Installing Pandas
Install via pip:

pip install pandas

Or in Jupyter:

%pip install pandas

🚀 3. Getting Started
Import pandas:

import pandas as pd

Check version:

print(pd.__version__)

📦 4. Data Structures in Pandas


🧵 Series (1D)
s = pd.Series([1, 3, 5, None, 6, 8])
print(s)
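
A minimal sketch of what the Series above holds, assuming a default pandas installation: the `None` entry is stored as `NaN`, which upcasts the integers to `float64`. The name `total` is illustrative only.

```python
import pandas as pd

s = pd.Series([1, 3, 5, None, 6, 8])

# None is stored as NaN, so the integer values are upcast to float64
print(s.dtype)

# Count the missing entries
print(s.isna().sum())

# Reductions such as sum() skip NaN by default
total = s.sum()
print(total)
```
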
📊 DataFrame (2D)
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['NY', 'LA', 'Chicago']
}
df = pd.DataFrame(data)
print(df)

⚙️ 5. Basic Operations
df.head() # First 5 rows
df.tail(2) # Last 2 rows
df.shape # (rows, columns)
df.info() # Info about DataFrame
df.describe() # Statistical summary
df.columns # Column names
df.dtypes # Data types

🔍 6. Indexing and Selecting Data


df['Name'] # Select column
df[['Name', 'City']] # Multiple columns

df.loc[0] # Row by index label

df.iloc[1] # Row by integer position
df.iloc[0:2] # Slicing rows

df[df['Age'] > 25] # Conditional filter
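
With the default index, labels and positions happen to match, which hides the difference between `loc` and `iloc`. The sketch below uses a hypothetical frame named `people` with a non-default index to make the difference visible. It also shows one way to combine filter conditions.

```python
import pandas as pd

# Hypothetical frame with a non-default index so labels and positions differ
people = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]},
    index=[10, 20, 30],
)

by_label = people.loc[20]        # row whose index label is 20
by_position = people.iloc[0]     # first row, whatever its label

# loc slices include the end label; iloc slices exclude the end position
label_slice = people.loc[10:20]
pos_slice = people.iloc[0:2]

# Combine conditions with & or |, wrapping each condition in parentheses
filtered = people[(people["Age"] > 25) & (people["Name"] != "Charlie")]
print(filtered)
```
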

❓ 7. Handling Missing Data


df.isnull() # Detect missing values
df.dropna() # Drop rows with NaN
df.fillna(0) # Fill NaNs with 0
df['Age'].fillna(df['Age'].mean()) # Fill with mean
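
The calls above return new objects and leave `df` unchanged unless the result is assigned back. A small sketch, using a hypothetical frame `ages` that actually contains a missing value:

```python
import pandas as pd

# Hypothetical data with one missing age
ages = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, None, 35]})

# Count missing values per column
missing_per_column = ages.isnull().sum()

# Assign the result back to keep the filled values
filled = ages.copy()
filled["Age"] = filled["Age"].fillna(filled["Age"].mean())

# dropna() returns a new frame; ages itself still has the NaN
dropped = ages.dropna()

print(missing_per_column)
print(filled)
```
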
🛠️ 8. Data Manipulation
Add new column
df['Salary'] = [50000, 60000, 70000]

Rename columns
df.rename(columns={'Name': 'FullName'}) # Returns a renamed copy; df keeps 'Name'

Drop columns/rows
df.drop('Salary', axis=1) # Drop column
df.drop([0, 1], axis=0) # Drop rows

Sorting
df.sort_values('Age', ascending=False)
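
`drop` and `sort_values` return new DataFrames by default, so the original stays as it was unless the result is assigned. A short sketch, rebuilding the example frame so it runs on its own (the names `without_salary` and `by_age` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})
df["Salary"] = [50000, 60000, 70000]

# Both calls return new frames; df itself is not modified
without_salary = df.drop(columns="Salary")
by_age = df.sort_values("Age", ascending=False)

print(df.columns.tolist())
print(by_age)
```
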

🔄 9. GroupBy Operations
grouped = df.groupby('City')
grouped['Age'].mean()

df.groupby('City').agg({'Age': 'mean', 'Salary': 'sum'})
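
Grouping is easier to follow when a group has more than one row. The sketch below uses a hypothetical frame `staff` in which two people share a city:

```python
import pandas as pd

# Hypothetical data: two rows for NY, one for LA
staff = pd.DataFrame({
    "City": ["NY", "NY", "LA"],
    "Age": [25, 35, 30],
    "Salary": [50000, 70000, 60000],
})

# One row per city: mean age and total salary
summary = staff.groupby("City").agg({"Age": "mean", "Salary": "sum"})
print(summary)
```
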

🔗 10. Merging, Joining, Concatenating


Concatenate
pd.concat([df1, df2]) # Vertical
pd.concat([df1, df2], axis=1) # Horizontal

Merge
pd.merge(df1, df2, on='id')

Join
df1.join(df2.set_index('id'), on='id', how='left') # join matches df1['id'] against df2's index
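
The snippets above assume `df1` and `df2` already exist. A minimal sketch with two hypothetical frames that share an `id` column:

```python
import pandas as pd

# Hypothetical frames sharing an 'id' column
df1 = pd.DataFrame({"id": [1, 2, 3], "name": ["Alice", "Bob", "Charlie"]})
df2 = pd.DataFrame({"id": [2, 3, 4], "salary": [60000, 70000, 80000]})

# Inner merge (the default) keeps only ids present in both frames
inner = pd.merge(df1, df2, on="id")

# Left merge keeps every row of df1; unmatched salaries become NaN
left = pd.merge(df1, df2, on="id", how="left")

# Vertical concat; ignore_index=True renumbers the rows from 0
stacked = pd.concat([df1, df1], ignore_index=True)

print(inner)
print(left)
```
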

⏰ 11. Time Series


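dates = pd.date_range('2023-01-01', periods=6)
ts = pd.DataFrame({'value': range(6)}, index=dates) # Separate name so the earlier df is kept

ts.resample('D').sum() # Daily resample

ts.loc['2023'] # Select rows by year (df['2023'] no longer selects rows in pandas 2.x)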
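
Resampling daily data to daily bins changes nothing, so the effect is clearer with a wider bin. A sketch using the same six-day range, with `two_day` and `first_days` as illustrative names:

```python
import pandas as pd

dates = pd.date_range("2023-01-01", periods=6)
ts = pd.DataFrame({"value": range(6)}, index=dates)

# Sum the values in consecutive 2-day bins
two_day = ts.resample("2D").sum()

# Label-based date slicing with loc includes both endpoints
first_days = ts.loc["2023-01-01":"2023-01-03"]

print(two_day)
print(first_days)
```
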

💾 12. Input / Output (I/O)


Read files
pd.read_csv('data.csv')
pd.read_excel('data.xlsx')
pd.read_json('data.json')
pd.read_sql(query, connection)

Write files
df.to_csv('output.csv', index=False)
df.to_excel('output.xlsx') # Needs an Excel engine such as openpyxl installed
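
A quick way to try the CSV functions without touching the disk is an in-memory buffer. This sketch assumes only the standard library's `io.StringIO`; the frame names are illustrative.

```python
import io

import pandas as pd

df_out = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# Write CSV text into an in-memory buffer instead of a file
buffer = io.StringIO()
df_out.to_csv(buffer, index=False)

# Rewind the buffer and read the same data back
buffer.seek(0)
df_in = pd.read_csv(buffer)

print(df_in)
```

Passing `index=False` keeps the row index out of the file, so reading it back does not produce an extra unnamed column.
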

📈 13. Visualization
Basic plotting (uses Matplotlib):

import matplotlib.pyplot as plt

df['Age'].plot(kind='hist')
df.plot(x='Name', y='Salary', kind='bar')
plt.show()
🧠 14. Advanced Topics
●​ Pivot Tables​

df.pivot_table(values='Salary', index='City', aggfunc='mean')

●​ Apply functions​

df['Age'].apply(lambda x: x + 1)

●​ Categorical data​

df['City'] = df['City'].astype('category')

●​ MultiIndexing​

df.set_index(['City', 'Name'], inplace=True)
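
A sketch of the pivot table and MultiIndex items together, using a hypothetical frame `staff` in which each city has two rows. Sorting the index after `set_index` is a choice made here to keep lookups predictable, not something the list above requires.

```python
import pandas as pd

# Hypothetical data with two people per city
staff = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dana"],
    "City": ["NY", "LA", "NY", "LA"],
    "Salary": [50000, 60000, 70000, 80000],
})

# Mean salary per city
pivot = staff.pivot_table(values="Salary", index="City", aggfunc="mean")

# Two-level index; set_index returns a new frame unless inplace=True is passed
indexed = staff.set_index(["City", "Name"]).sort_index()

# Select by the outer level only, or by a full (City, Name) key
ny_rows = indexed.loc["NY"]
alice_salary = indexed.loc[("NY", "Alice"), "Salary"]

print(pivot)
print(ny_rows)
```
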

✅ 15. Conclusion
pandas is a core tool in most Python data science and data analysis workflows. It handles:

● Data cleaning
● Transformation
● Aggregation
● Input/output
● Time-series and categorical data
