Pandas
Installation
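pandas is typically installed from a shell with pip or conda. The sketch below only checks that the import works; the variable name installed_version is chosen here for illustration.

```python
# Typically installed from a shell first, for example:
#   pip install pandas
# or, in a conda environment:
#   conda install pandas
import pandas as pd

# Confirm the import works and report which version is installed.
installed_version = pd.__version__
print("pandas version:", installed_version)
```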
What kind of data does pandas handle?
• Pandas is a powerful open-source data manipulation and analysis
library for Python.
• It is built on top of the NumPy library and provides data structures
such as Series and DataFrame that are designed to handle various
types of data.
The primary data structures in pandas:
• Series: A one-dimensional labeled array that can hold any data type. It
is similar to a column in a spreadsheet or a single column in a SQL
table.
• DataFrame: A two-dimensional labeled data structure with columns
that can be of different data types. It can be thought of as a table or a
spreadsheet, where each column is a Series.
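A minimal sketch of the two structures. The names, ages, and cities are made-up sample data, not taken from the slides.

```python
import pandas as pd

# A Series: one-dimensional values plus an index of labels.
ages = pd.Series([22, 35, 58], index=["Ann", "Bob", "Cara"], name="age")

# A DataFrame: a table whose columns may have different data types.
people = pd.DataFrame(
    {"age": [22, 35, 58], "city": ["Oslo", "Lima", "Pune"]},
    index=["Ann", "Bob", "Cara"],
)

# Selecting one column of a DataFrame returns a Series.
age_column = people["age"]
print(type(age_column).__name__)
print(ages.loc["Bob"])
```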
Pandas can handle various types of data,
including:
• Numeric Data: Integer and floating-point numbers.
• Text Data: String data and text.
• Time Series Data: Pandas provides robust support for time-based data
and includes various functions for time series analysis.
• Categorical Data: Data with a fixed, limited set of values.
• Missing Data: Pandas provides methods for handling missing data,
making it easy to clean and preprocess datasets.
• Mixed Data Types: Pandas DataFrames can handle columns with
different data types.
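The data types listed above can sit side by side in one DataFrame. This is a sketch with hypothetical column names and values; filling the missing price with the column mean is just one possible cleaning choice.

```python
import numpy as np
import pandas as pd

records = pd.DataFrame(
    {
        "quantity": [3, 5, 2],                      # numeric (integer)
        "price": [9.99, np.nan, 4.50],              # numeric (float) with a missing value
        "product": ["pen", "book", "cup"],          # text
        "sold_on": pd.to_datetime(["2024-01-05", "2024-01-06", "2024-01-07"]),  # dates
        "size": pd.Categorical(["S", "L", "S"], categories=["S", "M", "L"]),    # categorical
    }
)

# Each column keeps its own data type.
print(records.dtypes)

# Count missing values per column, then fill the missing price with the mean.
missing_per_column = records.isna().sum()
filled = records.fillna({"price": records["price"].mean()})
```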
• Pandas is widely used in data analysis, cleaning, transformation, and
exploration tasks, making it a valuable tool in the field of data science.
It is commonly used in conjunction with other libraries such as
NumPy, Matplotlib, and scikit-learn for comprehensive data analysis
and machine learning workflows.
• When working with tabular data, such as data stored in spreadsheets or
databases, pandas is the right tool for you.
• pandas will help you to explore, clean, and process your data.
• In pandas, a data table is called a DataFrame.
How do I read and write tabular data?
• In pandas, reading and writing tabular data is commonly done using
the read_* and to_* functions, where * can be replaced with various
file formats.
Reading Tabular Data: CSV
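A minimal sketch using read_csv and to_csv. The CSV text is made-up sample data held in memory so the example runs without a file; in practice a file path is usually passed instead.

```python
import io

import pandas as pd

# Sample CSV content; a path such as "scores.csv" could be passed instead.
csv_text = "name,score\nAnn,90\nBob,85\n"

# Read the CSV into a DataFrame.
scores = pd.read_csv(io.StringIO(csv_text))
print(scores)

# Write it back out; with no path given, to_csv returns the CSV as a string.
csv_out = scores.to_csv(index=False)
```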
Reading Tabular Data: Excel
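A sketch using to_excel and read_excel on a temporary file. It assumes an Excel engine such as openpyxl is installed; if not, pandas raises ImportError, which the sketch catches. The file and sheet names are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

scores = pd.DataFrame({"name": ["Ann", "Bob"], "score": [90, 85]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "scores.xlsx"  # hypothetical file name
    try:
        # Write one sheet, then read the same sheet back.
        scores.to_excel(path, sheet_name="scores", index=False)
        loaded = pd.read_excel(path, sheet_name="scores")
    except ImportError:
        # No Excel engine (for example openpyxl) is available.
        loaded = None

print(loaded)
```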
Reading Tabular Data: JSON
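A minimal sketch using read_json and to_json. The JSON text is made-up sample data held in memory; a file path or URL could be passed instead.

```python
import io
import json

import pandas as pd

# Sample JSON: a list of records, one per row.
json_text = '[{"name": "Ann", "score": 90}, {"name": "Bob", "score": 85}]'

# Read the JSON into a DataFrame.
scores = pd.read_json(io.StringIO(json_text))
print(scores)

# Write it back out as a list of records and parse it to compare.
json_out = scores.to_json(orient="records")
round_tripped = json.loads(json_out)
```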
Reading Tabular Data: SQL Database
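A sketch using to_sql and read_sql. It assumes an in-memory SQLite database from Python's standard library as a stand-in for a real database; the table name scores is hypothetical.

```python
import sqlite3

import pandas as pd

scores = pd.DataFrame({"name": ["Ann", "Bob"], "score": [90, 85]})

# In-memory SQLite database, used here only so the example is self-contained.
conn = sqlite3.connect(":memory:")

# Store the DataFrame as a table, then read a filtered result back.
scores.to_sql("scores", conn, index=False)
high = pd.read_sql("SELECT name, score FROM scores WHERE score > 86", conn)
conn.close()

print(high)
```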
• pandas integrates with many file formats and data sources out of the
box (CSV, Excel, SQL, JSON, Parquet,…).
• Importing data from each of these data sources is done with a
function whose name starts with read_ (for example, read_csv).
• Similarly, the to_* methods (for example, to_csv) are used to store data.
How do I select a subset of a table?
• Selecting or filtering specific rows and/or columns? Filtering the data
on a condition?
• Methods for slicing, selecting, and extracting the data you need are
available in pandas.
• In pandas, you can select a subset of a DataFrame using various
methods. Here are some common techniques:
1. Selecting Columns:
To select one or more columns, you can use square brackets or the loc and iloc indexers:
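A minimal sketch, assuming a small made-up DataFrame with name, age, and city columns.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cara", "Dev"],
        "age": [22, 35, 58, 41],
        "city": ["Oslo", "Lima", "Pune", "Oslo"],
    }
)

one_column = df["age"]                   # single brackets return a Series
two_columns = df[["name", "age"]]        # a list of names returns a DataFrame
by_label = df.loc[:, ["name", "city"]]   # all rows, columns chosen by label
by_position = df.iloc[:, [0, 2]]         # all rows, columns chosen by position
```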
2. Selecting Rows:
To select rows based on conditions, you can use boolean indexing:
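A minimal sketch, assuming a small made-up DataFrame. Conditions are combined with & (and) or | (or), with each condition in parentheses.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cara", "Dev"],
        "age": [22, 35, 58, 41],
        "city": ["Oslo", "Lima", "Pune", "Oslo"],
    }
)

# The comparison produces a boolean Series; rows where it is True are kept.
over_30 = df[df["age"] > 30]

# Two conditions combined; parentheses around each one are required.
oslo_over_30 = df[(df["age"] > 30) & (df["city"] == "Oslo")]
```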
3. Selecting both Rows and Columns:
You can combine row and column selections using loc or iloc:
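A minimal sketch, assuming a small made-up DataFrame. With loc, the part before the comma picks rows and the part after it picks columns.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cara", "Dev"],
        "age": [22, 35, 58, 41],
        "city": ["Oslo", "Lima", "Pune", "Oslo"],
    }
)

# Rows where age is over 30, keeping only the name and city columns.
names_over_30 = df.loc[df["age"] > 30, ["name", "city"]]

# The same idea with iloc: first two rows, first two columns, by position.
top_left = df.iloc[0:2, 0:2]
```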
4. Selecting by Position:
You can use the iloc indexer to select rows and columns by their integer positions:
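A minimal sketch, assuming a small made-up DataFrame. Positions start at 0 and slices exclude the end position, as in regular Python.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cara", "Dev"],
        "age": [22, 35, 58, 41],
        "city": ["Oslo", "Lima", "Pune", "Oslo"],
    }
)

first_row = df.iloc[0]          # first row, returned as a Series
first_two_rows = df.iloc[:2]    # rows at positions 0 and 1
block = df.iloc[1:3, 0:2]       # rows 1-2, columns 0-1
last_cell = df.iloc[-1, -1]     # last row, last column
```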
5. Using isin for Filtering:
You can use the isin method to filter rows based on a list of values:
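A minimal sketch, assuming a small made-up DataFrame. The ~ operator inverts the condition.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cara", "Dev"],
        "city": ["Oslo", "Lima", "Pune", "Oslo"],
    }
)

wanted = ["Oslo", "Pune"]

# Keep rows whose city is in the list.
in_wanted = df[df["city"].isin(wanted)]

# Keep rows whose city is NOT in the list.
not_in_wanted = df[~df["city"].isin(wanted)]
```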
6. Using query Method:
The query method lets you filter rows with a boolean expression written as a string, similar to a SQL WHERE clause:
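A minimal sketch, assuming a small made-up DataFrame. Column names are used directly inside the expression string.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cara", "Dev"],
        "age": [22, 35, 58, 41],
        "city": ["Oslo", "Lima", "Pune", "Oslo"],
    }
)

# Equivalent to df[(df["age"] > 30) & (df["city"] == "Oslo")].
result = df.query("age > 30 and city == 'Oslo'")
print(result)
```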
How to create plots in pandas?
• pandas can plot your data out of the box, using the power of
Matplotlib.
• You can pick the plot type (scatter, bar, boxplot,…) corresponding to
your data.
• Pandas integrates with the Matplotlib library to provide convenient
functions for creating various types of plots directly from DataFrames.
1. Line Plot:
2. Bar Plot:
3. Histogram:
4. Scatter Plot:
5. Box Plot:
6. Area Plot:
7. Pie Chart:
8. Kernel Density Estimate (KDE) Plot:
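The plot types listed above can be sketched with the kind argument of plot. The data is made up, and the Agg backend is an assumption so the example runs without a display. The KDE plot also needs SciPy, so it is attempted only if SciPy is importable.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5], "a": [2, 4, 3, 6, 5], "b": [1, 3, 2, 5, 4]}
)


def new_ax():
    """Create a fresh figure so each plot gets its own axes."""
    _, ax = plt.subplots()
    return ax


plots = {}
plots["line"] = df.plot(kind="line", x="x", y=["a", "b"], ax=new_ax())
plots["bar"] = df.plot(kind="bar", x="x", y="a", ax=new_ax())
plots["hist"] = df["a"].plot(kind="hist", bins=5, ax=new_ax())
plots["scatter"] = df.plot(kind="scatter", x="a", y="b", ax=new_ax())
plots["box"] = df[["a", "b"]].plot(kind="box", ax=new_ax())
plots["area"] = df.plot(kind="area", x="x", y=["a", "b"], ax=new_ax())
plots["pie"] = df.set_index("x")["a"].plot(kind="pie", ax=new_ax())

try:
    plots["kde"] = df["a"].plot(kind="kde", ax=new_ax())
except ImportError:
    pass  # SciPy is not installed, so the KDE plot is skipped

# In a script, plt.show() or fig.savefig(...) would go here.
plt.close("all")
```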
Multiple Plots:
You can create multiple plots in a
single figure using subplots:
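A minimal sketch using Matplotlib's subplots function and the ax argument of plot. The data and titles are made up, and the Agg backend is an assumption so the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4], "a": [2, 4, 3, 6], "b": [1, 3, 2, 5]})

# One figure with two side-by-side axes.
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(8, 3))

# Pass each axes object to plot so the plots share the figure.
df.plot(kind="line", x="x", y="a", ax=axes[0], title="Line")
df.plot(kind="bar", x="x", y="b", ax=axes[1], title="Bar")

fig.tight_layout()
```

Passing subplots=True to plot is another option; it draws each column on its own axes.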
How to create new columns derived from existing columns?
• There is no need to loop over all rows of your data table to do
calculations. Data manipulations on a column work elementwise.
• Adding a column to a DataFrame based on existing data in other
columns is straightforward.
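A minimal sketch of elementwise column calculations, assuming a made-up table of prices and quantities. The 50% discount is a hypothetical value for illustration.

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0, 5.0], "quantity": [3, 1, 4]})

# Arithmetic between columns works row by row, with no explicit loop.
df["total"] = df["price"] * df["quantity"]

# Comparisons are elementwise too and give a boolean column.
df["is_bulk"] = df["quantity"] >= 3

# assign returns a new DataFrame with the extra column.
df = df.assign(discounted_total=df["total"] * 0.5)
print(df)
```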
How to calculate summary statistics?
• Basic statistics (mean, median, min, max, counts…) are easily
calculable.
• These or custom aggregations can be applied on the entire data set, a
sliding window of the data, or grouped by categories. The latter is also
known as the split-apply-combine approach.
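A minimal sketch of the three cases: the entire data set, a sliding window, and groups. The team and score data is made up.

```python
import pandas as pd

df = pd.DataFrame(
    {"team": ["A", "A", "B", "B", "B"], "score": [10, 20, 30, 40, 50]}
)

# Entire data set.
overall_mean = df["score"].mean()
summary = df["score"].agg(["min", "max", "count"])

# Sliding window of two rows; the first value has no full window, so it is NaN.
rolling_mean = df["score"].rolling(window=2).mean()

# Split by team, apply the mean, combine into one result.
by_team = df.groupby("team")["score"].mean()
print(by_team)
```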
How to reshape the layout of tables?
• Change the structure of your data table in multiple ways.
• You can melt() your data table from wide to long/tidy form or pivot()
from long to wide format. With aggregations built-in, a pivot table is
created with a single command.
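A minimal sketch of melt, pivot, and pivot_table, assuming a made-up wide table of yearly sales per city.

```python
import pandas as pd

wide = pd.DataFrame(
    {"city": ["Oslo", "Lima"], "2023": [10, 20], "2024": [12, 25]}
)

# Wide to long: one row per (city, year) pair.
long = wide.melt(id_vars="city", var_name="year", value_name="sales")

# Long back to wide: one column per year.
back_to_wide = long.pivot(index="city", columns="year", values="sales")

# Pivot table with a built-in aggregation: total sales per city.
totals = long.pivot_table(index="city", values="sales", aggfunc="sum")
print(totals)
```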
How to combine data from multiple tables?
• Multiple tables can be concatenated both column-wise and row-wise,
and database-like join/merge operations are provided to combine
multiple tables of data.
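A minimal sketch of concat and merge, assuming made-up sales and name tables that share an id column.

```python
import pandas as pd

q1 = pd.DataFrame({"id": [1, 2], "sales": [100, 150]})
q2 = pd.DataFrame({"id": [3, 4], "sales": [120, 90]})
names = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cara"]})

# Row-wise: stack the two tables on top of each other.
stacked = pd.concat([q1, q2], ignore_index=True)

# Column-wise: place tables side by side, aligned on the index.
side_by_side = pd.concat([q1, names.iloc[:2, [1]]], axis=1)

# Database-like left join on the shared id column; id 4 has no name.
merged = stacked.merge(names, on="id", how="left")
print(merged)
```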
How to handle time series data?
• pandas has great support for time series and has an extensive set of
tools for working with dates, times, and time-indexed data.
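A minimal sketch of to_datetime, the dt accessor, and resample, assuming made-up temperature readings taken twice a day.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "timestamp": [
            "2024-01-01 08:00",
            "2024-01-01 20:00",
            "2024-01-02 08:00",
            "2024-01-02 20:00",
        ],
        "temperature": [10.0, 14.0, 12.0, 16.0],
    }
)

# Convert text to datetime values.
df["timestamp"] = pd.to_datetime(df["timestamp"])

# The dt accessor exposes date parts such as the weekday name.
df["weekday"] = df["timestamp"].dt.day_name()

# With a datetime index, resample groups the rows by day.
daily_mean = df.set_index("timestamp")["temperature"].resample("D").mean()
print(daily_mean)
```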
How to manipulate textual data?
• Data sets do not only contain numerical data. pandas provides a wide
range of functions to clean textual data and extract useful information
from it.
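A minimal sketch of the str accessor, assuming a made-up Series of untidy names.

```python
import pandas as pd

names = pd.Series(["  Ann Lee ", "bob STONE", "Cara  Diaz"])

# Clean: trim the ends, collapse repeated spaces, fix the capitalization.
cleaned = (
    names.str.strip()
    .str.replace(r"\s+", " ", regex=True)
    .str.title()
)

# Extract: take the last word of each name.
last_names = cleaned.str.split(" ").str[-1]

# Search: flag the rows that contain a given substring.
has_stone = cleaned.str.contains("Stone")
print(cleaned)
```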