Pandas Basics Guide

The document is a beginner's guide to the Pandas library in Python, covering data manipulation and analysis with key data structures like Series and DataFrame. It includes steps for installation, creating data structures, reading data from files, exploring datasets, selecting and filtering data, and handling missing values. Additionally, it touches on optional topics such as grouping, applying functions, merging DataFrames, creating pivot tables, and exporting data.

Uploaded by

mayurgbari52076

Pandas Library - Beginner Guide

Step 1: What is Pandas?


Pandas is a Python library used for data manipulation and analysis. It helps you work with
data in a table-like format, similar to an Excel sheet or a SQL table.

You mainly work with two data structures in Pandas:


- Series: 1D labeled array (like a single column)
- DataFrame: 2D labeled data (like a full table)

Step 2: Installing and Importing Pandas


# Install it if needed. The leading "!" is for notebook cells only;
# in a terminal, run: pip install pandas
!pip install pandas

# Import it
import pandas as pd

Step 3: Creating Data Structures


1. Series (1D array):
import pandas as pd
data = [10, 20, 30, 40]
s = pd.Series(data)
print(s)

With labels:
s = pd.Series(data, index=['a', 'b', 'c', 'd'])
print(s)
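
Once a Series has labels, you can read values by label or by position. The short sketch below reuses the same values and labels as above; the variable names (by_label, by_position, large) are only illustrative.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

by_label = s['b']        # Look up by label
by_position = s.iloc[1]  # Look up by integer position (0-based)
large = s[s > 20]        # Keep only the values greater than 20

print(by_label, by_position)
print(large)
```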

2. DataFrame (2D Table):


From dictionary:
data = {'Name': ['Mayur', 'Aniket', 'Jayesh'], 'Marks': [85, 90, 78]}
df = pd.DataFrame(data)
print(df)

From list of lists:


data = [['Mayur', 85], ['Aniket', 90], ['Jayesh', 78]]
df = pd.DataFrame(data, columns=['Name', 'Marks'])
print(df)

Step 4: Reading Data from File

# CSV File
df = pd.read_csv('your_file.csv')
print(df.head()) # First 5 rows
print(df.tail()) # Last 5 rows
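
If you do not have a CSV file at hand, you can try read_csv on text held in memory. This is a minimal sketch: the CSV content is made up for illustration and reuses the names and marks from Step 3, and io.StringIO stands in for a real file.

```python
import io

import pandas as pd

# Made-up CSV content; in practice this would be a file on disk
csv_text = "Name,Marks\nMayur,85\nAniket,90\nJayesh,78\n"

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))  # head() and tail() accept the number of rows to show
```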

Step 5: Exploring the Dataset


print(df.shape) # (rows, columns)
print(df.columns) # Column names
df.info() # Data types and non-null counts (prints by itself; wrapping it in print() adds a stray "None")
print(df.describe()) # Summary stats for numerical columns
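
The results of these calls are ordinary Python or Pandas objects, so you can store them and pick out single values. A small sketch using the Step 3 table (variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Mayur', 'Aniket', 'Jayesh'], 'Marks': [85, 90, 78]})

n_rows, n_cols = df.shape         # shape is a (rows, columns) tuple
column_names = list(df.columns)   # Convert the column index to a plain list

stats = df.describe()             # Summary table for the numeric columns only
highest = stats.loc['max', 'Marks']
average = stats.loc['mean', 'Marks']

print(n_rows, n_cols, column_names)
print(highest, average)
```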

Step 6: Selecting Data


# Single column
df['Name'] # Returns a Series

# Multiple columns
df[['Name', 'Marks']] # Returns a DataFrame

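# Access rows using iloc (by integer position)
df.iloc[0] # First row
df.iloc[1:3] # Rows at positions 1 and 2 (the end position, 3, is excluded)

# Access using loc (by label)
df.loc[0, 'Name'] # Value in the row labeled 0, column 'Name'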
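
With the default index (0, 1, 2, ...) loc and iloc look almost the same. The difference shows up when the row labels are not positions. In this sketch the labels 101 to 103 are invented roll numbers, used only to make that visible.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Mayur', 'Aniket', 'Jayesh'], 'Marks': [85, 90, 78]},
    index=[101, 102, 103],  # Hypothetical roll numbers used as row labels
)

first_name = df.iloc[0]['Name']       # Position 0, whatever its label is
labeled_name = df.loc[102, 'Name']    # Row whose label is 102

by_position = df.iloc[0:2]            # Positions 0 and 1 (end excluded)
by_label = df.loc[101:103]            # Labels 101 through 103 (end included)

print(first_name, labeled_name)
print(len(by_position), len(by_label))
```

Note that slicing with iloc leaves out the end position, while slicing with loc includes the end label.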

Step 7: Filtering Rows


df[df['Marks'] > 80] # Students with marks > 80
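
You can combine several conditions with & (and) and | (or). Each condition needs its own parentheses. A short sketch on the Step 3 table; the cut-off values are arbitrary:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Mayur', 'Aniket', 'Jayesh'], 'Marks': [85, 90, 78]})

# Both conditions must hold: marks above 80 AND below 90
middle = df[(df['Marks'] > 80) & (df['Marks'] < 90)]

# Either condition may hold: marks below 80 OR at least 90
extremes = df[(df['Marks'] < 80) | (df['Marks'] >= 90)]

print(middle)
print(extremes)
```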

Step 8: Adding New Column


df['Grade'] = ['A', 'A+', 'B'] # The list needs one value per row (3 rows here)
print(df)

Step 9: Handling Missing Data


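df.isnull() # Shows True for missing values
df.dropna() # Returns a copy without the rows that have missing values
df.fillna(0) # Returns a copy with missing values replaced by 0
# dropna() and fillna() do not change df itself; assign the result to keep it, e.g. df = df.fillna(0)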
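
Here is a small sketch of these three calls on a table with one missing mark. The missing value is made up for illustration, and filling with the column mean is just one possible choice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Mayur', 'Aniket', 'Jayesh'],
    'Marks': [85, np.nan, 78],  # np.nan marks a missing value
})

missing_per_column = df.isnull().sum()   # Count of missing values in each column
dropped = df.dropna()                    # New DataFrame without the incomplete row
filled = df.fillna({'Marks': df['Marks'].mean()})  # Fill 'Marks' with the column mean

print(missing_per_column)
print(dropped)
print(filled)
```

The original df still contains the missing value afterwards, because both dropna() and fillna() return new DataFrames.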

Step 10: Sorting

df.sort_values(by='Marks', ascending=False) # Returns a sorted copy; df keeps its order
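
Sorting keeps each row's old index label. If you want a clean 0, 1, 2 index after sorting, one option is reset_index(drop=True), as in this sketch on the Step 3 table:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Mayur', 'Aniket', 'Jayesh'], 'Marks': [85, 90, 78]})

# Highest marks first, then renumber the rows from 0
ranked = df.sort_values(by='Marks', ascending=False).reset_index(drop=True)

print(ranked)
```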

Optional but Useful Topics (if you have time):

1. Group By (Grouping and Aggregating)

Splits the rows into groups by a column's values and computes a summary for each group.

df.groupby('Grade')['Marks'].mean()
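
Grouping is easier to see when a group has more than one row. In this sketch a fourth student ('Rohan', with grade 'A') is added purely as a hypothetical example so that grade 'A' has two rows.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Mayur', 'Aniket', 'Jayesh', 'Rohan'],  # 'Rohan' is a made-up extra row
    'Marks': [85, 90, 78, 88],
    'Grade': ['A', 'A+', 'B', 'A'],
})

# Average marks for each grade
mean_by_grade = df.groupby('Grade')['Marks'].mean()

# Several summaries at once: mean and number of rows per grade
summary = df.groupby('Grade')['Marks'].agg(['mean', 'count'])

print(mean_by_grade)
print(summary)
```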

2. Apply Function (Custom Functions on Data)

Apply a lambda or custom function to a column.

df['Bonus'] = df['Marks'].apply(lambda x: x + 5)
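
For simple arithmetic you can usually skip apply and work on the whole column at once; apply is more useful when the logic needs a real function. The sketch below shows both. The to_grade function and its cut-offs (90 and 80) are assumptions chosen only so the result matches the grades used in Step 8.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Mayur', 'Aniket', 'Jayesh'], 'Marks': [85, 90, 78]})

# Column arithmetic gives the same result as apply(lambda x: x + 5)
df['Bonus'] = df['Marks'] + 5


def to_grade(marks):
    """Hypothetical grading rule, for illustration only."""
    if marks >= 90:
        return 'A+'
    if marks >= 80:
        return 'A'
    return 'B'


# apply calls to_grade once for each value in the column
df['Grade'] = df['Marks'].apply(to_grade)

print(df)
```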

3. Merging / Joining DataFrames

Combines two DataFrames on a shared key column, similar to SQL joins.

pd.merge(df1, df2, on='ID', how='inner')
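
The line above needs two DataFrames that share an 'ID' column. The sketch below builds two small ones with made-up IDs and compares an inner join with a left join.

```python
import pandas as pd

# Hypothetical tables that share the key column 'ID'
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Mayur', 'Aniket', 'Jayesh']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Marks': [90, 78, 65]})

# inner: keep only the IDs found in both tables
inner = pd.merge(df1, df2, on='ID', how='inner')

# left: keep every row of df1; marks are missing (NaN) where df2 has no match
left = pd.merge(df1, df2, on='ID', how='left')

print(inner)
print(left)
```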

4. Pivot Tables

Summarizes values by one or more grouping columns, like a pivot table in a spreadsheet.


df.pivot_table(values='Marks', index='Grade', aggfunc='mean')
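
A pivot table can also spread a second column across the top. This sketch assumes a 'Subject' column, which is not part of the earlier examples, and uses made-up marks.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Mayur', 'Mayur', 'Aniket', 'Aniket'],
    'Subject': ['Math', 'Science', 'Math', 'Science'],  # Hypothetical column
    'Marks': [85, 80, 90, 70],
})

# One row per student, one column per subject, mean marks in the cells
pivot = df.pivot_table(values='Marks', index='Name', columns='Subject', aggfunc='mean')

print(pivot)
```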

5. Exporting to File

Save the modified DataFrame to CSV or Excel.

df.to_csv('output.csv', index=False)

df.to_excel('output.xlsx', index=False) # Needs an Excel writer package such as openpyxl installed
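
To check an export, you can read the file back in. The sketch below writes the CSV into a temporary folder (so nothing is left behind) and reloads it; the folder handling is only for the demonstration.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Name': ['Mayur', 'Aniket', 'Jayesh'], 'Marks': [85, 90, 78]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'output.csv')
    df.to_csv(path, index=False)   # index=False leaves the row numbers out of the file
    reloaded = pd.read_csv(path)   # Read the file back to check its content

print(reloaded)
```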
