KEMBAR78
Introduction To Pandas For Data Analysis | PDF | Comma Separated Values | Computer Programming
0% found this document useful (0 votes)
28 views6 pages

Introduction To Pandas For Data Analysis

Analisa Data
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
28 views6 pages

Introduction To Pandas For Data Analysis

Analisa Data
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 6

Introduction to Pandas for Data Analysis

Estimated time: 10 Mins

Objective:
1. Learn what Pandas Series are and how to create them.
2. Understand how to access and manipulate data within a Series.
3. Discover the basics of creating and working with Pandas DataFrames.
4. Learn how to access, modify, and analyze data in DataFrames.
5. Gain insights into common DataFrame attributes and methods.

What is Pandas?
Pandas is a popular open-source data manipulation and analysis library for the Python programming
language. It provides a powerful and flexible set of tools for working with structured data, making it a
fundamental tool for data scientists, analysts, and engineers.
Pandas is designed to handle data in various formats, such as tabular data, time series data, and more,
making it an essential part of the data processing workflow in many industries.

Here are some key features and functionalities of Pandas:

Data Structures: Pandas offers two primary data structures - DataFrame and Series.

1. A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data


structure with labeled axes (rows and columns).
2. A Series is a one-dimensional labeled array, essentially a single column or row of data.

Data Import and Export: Pandas makes it easy to read data from various sources, including CSV files,
Excel spreadsheets, SQL databases, and more. It can also export data to these formats, enabling seamless
data exchange.

Data Merging and Joining: You can combine multiple DataFrames using methods like merge and join,
similar to SQL operations, to create more complex datasets from different sources.

Efficient Indexing: Pandas provides efficient indexing and selection methods, allowing you to access
specific rows and columns of data quickly.

Custom Data Structures: You can create custom data structures and manipulate data in ways that suit
your specific needs, extending Pandas' capabilities.

Importing Pandas:
Import Pandas using the import command, followed by the library's name.
Commonly, Pandas is imported as pd for brevity in code.
1. 1

1. import pandas as pd

Copied!

Data Loading:
• Pandas can be used to load data from various sources, such as CSV and Excel files.
• The read_csv function is used to load data from a CSV file into a Pandas DataFrame.

To read a CSV (Comma-Separated Values) file in Python using the Pandas library, you can use the
pd.read_csv() function. Here's the syntax to read a CSV file:
1. 1
2. 2
3. 3
4. 4

1. import pandas as pd
2.
3. # Read the CSV file into a DataFrame
4. df = pd.read_csv('your_file.csv')

Copied!

Replace 'your_file.csv' with the actual file path of your CSV file. Make sure that the file is located in the
same directory as your Python script, or you provide the correct file path.

What is a Series?
A Series is a one-dimensional labeled array in Pandas. It can be thought of as a single column of data with
labels or indices for each element. You can create a Series from various data sources, such as lists, NumPy
arrays, or dictionaries
Here's a basic example of creating a Series in Pandas:

1. 1
2. 2
3. 3
4. 4
5. 5
6. 6
7. 7

1. import pandas as pd
2.
3. # Create a Series from a list
4. data = [10, 20, 30, 40, 50]
5. s = pd.Series(data)
6.
7. print(s)

Copied!

In this example, we've created a Series named s with numeric data. Notice that Pandas automatically
assigned numerical indices (0, 1, 2, 3, 4) to each element, but you can also specify custom labels if
needed.

Accessing Elements in a Series


You can access elements in a Series using the index labels or integer positions. Here are a few common
methods for accessing Series data:

Accessing by label

1. 1

1. print(s[2]) # Access the element with label 2 (value 30)

Copied!

Accessing by position

1. 1

1. print(s.iloc[3]) # Access the element at position 3 (value 40)

Copied!

Accessing multiple elements

1. 1

1. print(s[1:4]) # Access a range of elements by label

Copied!

Series Attributes and Methods


Pandas Series come with various attributes and methods to help you manipulate and analyze data
effectively. Here are a few essential ones:

• values: Returns the Series data as a NumPy array.


• index: Returns the index (labels) of the Series.
• shape: Returns a tuple representing the dimensions of the Series.
• size: Returns the number of elements in the Series.
• mean(), sum(), min(), max(): Calculate summary statistics of the data.
• unique(), nunique(): Get unique values or the number of unique values.
• sort_values(), sort_index(): Sort the Series by values or index labels.
• isnull(), notnull(): Check for missing (NaN) or non-missing values.
• apply(): Apply a custom function to each element of the Series.

What is a DataFrames?
A DataFrame is a two-dimensional labeled data structure with columns of potentially different data types.
Think of it as a table where each column represents a variable, and each row represents an observation or
data point. DataFrames are suitable for a wide range of data, including structured data from CSV files,
Excel spreadsheets, SQL databases, and more.

Creating DataFrames from Dictionaries:


DataFrames can be created from dictionaries, with keys as column labels and values as lists representing
rows.
1. 1
2. 2
3. 3
4. 4
5. 5
6. 6
7. 7
8. 8
9. 9
10. 10
11. 11

1. import pandas as pd
2.
3. # Creating a DataFrame from a dictionary
4. data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
5. 'Age': [25, 30, 35, 28],
6. 'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago']}
7.
8. df = pd.DataFrame(data)
9.
10. print(df)
11.

Copied!

Column Selection:
You can select a single column from a DataFrame by specifying the column name within double brackets.
Multiple columns can be selected in a similar manner, creating a new DataFrame.
1. 1

1. print(df['Name']) # Access the 'Name' column

Copied!

Accessing Rows:

You can access rows by their index using .iloc[] or by label using .loc[].

1. 1
2. 2

1. print(df.iloc[2]) # Access the third row by position


2. print(df.loc[1]) # Access the second row by label

Copied!

Slicing:

You can slice DataFrames to select specific rows and columns.

1. 1
2. 2

1. print(df[['Name', 'Age']]) # Select specific columns


2. print(df[1:3]) # Select specific rows

Copied!
Finding Unique Elements:
Use the unique method to determine the unique elements in a column of a DataFrame.

1. 1

1. unique_dates = df['Age'].unique()

Copied!

Conditional Filtering:
You can filter data in a DataFrame based on conditions using inequality operators.
For instance, you can filter albums released after a certain year.
1. 1

1. high_above_102 = df[df['Age'] > 25]

Copied!

Saving DataFrames:
To save a DataFrame to a CSV file, use the to_csv method and specify the filename with a “.csv”
extension.Pandas provides other functions for saving DataFrames in different formats.
1. 1

1. df.to_csv('trading_data.csv', index=False)

Copied!

DataFrame Attributes and Methods


DataFrames provide numerous attributes and methods for data manipulation and analysis, including:

• shape: Returns the dimensions (number of rows and columns) of the DataFrame.
• info(): Provides a summary of the DataFrame, including data types and non-null counts.
• describe(): Generates summary statistics for numerical columns.
• head(), tail(): Displays the first or last n rows of the DataFrame.
• mean(), sum(), min(), max(): Calculate summary statistics for columns.
• sort_values(): Sort the DataFrame by one or more columns.
• groupby(): Group data based on specific columns for aggregation.
• fillna(), drop(), rename(): Handle missing values, drop columns, or rename columns.
• apply(): Apply a function to each element, row, or column of the DataFrame.

Pandas offers a wide range of methods beyond these examples. For more detailed
information, please refer to the official documentation available on the Pandas official
website.

Conclusion
In conclusion, mastering the use of Pandas Series and DataFrames is essential for effective data
manipulation and analysis in Python. Series provide a foundation for handling one-dimensional data with
labels, while DataFrames offer a versatile, table-like structure for working with two-dimensional data.
Whether you're cleaning, exploring, transforming, or analyzing data, these Pandas data structures, along
with their attributes and methods, empower you to efficiently and flexibly manipulate data to derive
valuable insights. By incorporating Series and DataFrames into your data science toolkit, you'll be well-
prepared to tackle a wide range of data-related tasks and enhance your data analysis capabilities.
To further your skills in data analysis with Pandas, consider the following next steps:

Practice:

Work with real datasets to apply what you've learned and gain hands-on experience.

Explore Documentation:

Visit the Pandas official website to explore the extensive documentation and discover more functions and
methods.

Author
Akansha Yadav

Changelog
Date Version Changed by Change Description
2023-10-02 1.0 Akansha Yadav Created Reading

You might also like