Pandas

Pandas is a powerful Python library widely used in data science for data manipulation and analysis, providing structures like DataFrame and Series for handling relational data. It simplifies tasks such as data cleansing, merging datasets, and statistical analysis, making it essential for data preparation and exploration. Common applications include data cleaning, visualization, machine learning, and financial analysis, with its data structures built on top of Numpy for performance.

Uploaded by

usawant163
Pandas

DR. ARCHANA RAJE


What is Pandas?
➢ Python pandas is one of the most widely-used Python libraries in data
science and analytics.
➢ It provides high-performance, easy-to-use data structures and data analysis
tools.
➢ Pandas is a powerful Python library that is specifically designed to work
with "relational" or "labeled" data.
➢ Its two main data structures are the Series (one-dimensional) and the
DataFrame (a two-dimensional table).
➢ A DataFrame is a structure that contains column names and row labels.
➢ This Python package works well for data manipulation, exploring a dataset,
data analysis, and machine learning-related tasks.
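As a minimal sketch of the two structures named above, the following builds a Series and a DataFrame and looks up values by label. The fruit names and numbers are made-up sample data, not taken from the slides.

```python
import pandas as pd

# A Series is a one-dimensional labeled array: values plus an index.
prices = pd.Series(
    [10.5, 20.0, 7.25],
    index=["apple", "banana", "cherry"],
    name="price",
)

# A DataFrame is a two-dimensional table with column names and row labels.
fruit = pd.DataFrame(
    {"price": [10.5, 20.0, 7.25], "stock": [3, 0, 12]},
    index=["apple", "banana", "cherry"],
)

print(prices["banana"])               # lookup by row label in a Series
print(fruit.loc["cherry", "stock"])   # lookup by row label and column name
print(fruit.shape)                    # (rows, columns)
```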
Why Pandas?
Pandas simplifies many of the time-consuming, repetitive tasks involved in
working with data frames, such as:
➢ Import datasets - available in the form of spreadsheets, comma-
separated values (CSV) files, and more.
➢ Data cleansing - dealing with missing values and representing them as
NaN, NA, or NaT.
➢ Size mutability - columns can be added and removed from DataFrame
and higher-dimensional objects.
➢ Data normalization – normalize the data into a suitable format for
analysis.
➢ Data alignment - objects can be explicitly aligned to a set of labels.
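The points above can be illustrated with a short sketch. It assumes a tiny in-memory CSV (the names and scores are invented sample data) and shows importing, handling a missing value, adding and removing a column, and label alignment.

```python
import io
import pandas as pd

# Import: read CSV text; the empty score for Ravi is read as NaN.
csv_text = "name,score\nAsha,81\nRavi,\nMeena,92\n"
df = pd.read_csv(io.StringIO(csv_text))

# Data cleansing: count missing values, then fill them with the column mean.
n_missing = int(df["score"].isna().sum())
df["score"] = df["score"].fillna(df["score"].mean())

# Size mutability: add one column and remove another.
df["passed"] = df["score"] >= 85
df = df.drop(columns=["name"])

# Data alignment: arithmetic matches values by label, not by position.
a = pd.Series([1, 2], index=["x", "y"])
b = pd.Series([10, 20], index=["y", "z"])
total = a + b   # only "y" exists in both; "x" and "z" become NaN

print(df)
print(total)
```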
Why Pandas?
➢ Intuitive merging and joining of data sets – datasets can be combined
on shared key columns or index labels.
➢ Reshaping and pivoting of datasets – datasets can be reshaped
and pivoted as per the need.
➢ Efficient manipulation and extraction - manipulation and
extraction of specific parts of extensive datasets using intelligent
label-based slicing, indexing, and subsetting techniques.
➢ Statistical analysis - to perform statistical operations on datasets.
➢ Data visualization - Visualize datasets and uncover insights.
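A minimal sketch of merging, pivoting, label-based selection, and summary statistics follows. The `orders` and `customers` tables are hypothetical sample data chosen for illustration.

```python
import pandas as pd

orders = pd.DataFrame({
    "customer_id": [1, 2, 1, 3],
    "month": ["Jan", "Jan", "Feb", "Feb"],
    "amount": [100, 250, 50, 75],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "city": ["Pune", "Mumbai", "Pune"],
})

# Merging: attach each customer's city to every order.
merged = orders.merge(customers, on="customer_id", how="left")

# Reshaping and pivoting: total amount per city (rows) and month (columns).
by_city = merged.pivot_table(
    index="city", columns="month", values="amount", aggfunc="sum"
)

# Label-based subsetting: rows for Pune, keeping two named columns.
pune_rows = merged.loc[merged["city"] == "Pune", ["month", "amount"]]

# Statistical analysis: count, mean, std, min, quartiles, max in one call.
stats = merged["amount"].describe()

print(by_city)
print(pune_rows)
print(stats)
```

Mumbai has no orders in February, so that cell of the pivot table is NaN rather than zero.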
Applications of Pandas
The most common applications of Pandas are as follows:
➢ Data Cleaning: Pandas provides functionalities to clean messy data, deal with incomplete or
inconsistent data, handle missing values, remove duplicates, and standardize formats to do
effective data analysis.
➢ Data Exploration: Pandas can summarize statistics, find trends, and visualize data using its
built-in plotting functions or its Matplotlib and Seaborn integration.
➢ Data Preparation: Pandas can pivot, melt, convert variables, and merge datasets based on
common columns to prepare data for analysis.
➢ Data Analysis: Pandas supports descriptive statistics, time series analysis, group-by operations, and
custom functions.
➢ Data Visualisation: Pandas itself has basic plotting capabilities, and it works with data
visualisation libraries like Matplotlib, Seaborn, and Plotly for richer visualisations.
➢ Time Series Analysis: Pandas supports date/time indexing, resampling, frequency conversion, and
rolling statistics for time series data.
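Two of the applications above, data cleaning and time series analysis, can be sketched as follows. The city readings and daily sales figures are invented sample data.

```python
import pandas as pd

# Data cleaning: standardize text, remove duplicates, drop missing readings.
raw = pd.DataFrame({
    "city": [" pune", "Pune ", "MUMBAI", "MUMBAI"],
    "temp": [30.0, 30.0, None, 33.0],
})
raw["city"] = raw["city"].str.strip().str.title()
clean = raw.drop_duplicates().dropna(subset=["temp"])

# Time series: a date/time index enables resampling and rolling statistics.
idx = pd.date_range("2024-01-01", periods=6, freq="D")
sales = pd.Series([5, 7, 6, 8, 10, 9], index=idx)

two_day_totals = sales.resample("2D").sum()   # frequency conversion: daily to 2-day bins
rolling_mean = sales.rolling(window=3).mean() # 3-day moving average

print(clean)
print(two_day_totals)
print(rolling_mean)
```

The first two entries of the rolling mean are NaN because a full three-day window is not yet available.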
Applications of Pandas
The most common applications of Pandas are as follows:
➢ Data Aggregation and Grouping: Pandas groupby() function lets you aggregate data and
compute group-wise summary statistics or apply functions to groups.
➢ Data Input/Output: Pandas makes data input and export easy by reading and writing CSV,
Excel, JSON, SQL databases, and more.
➢ Machine Learning: Pandas works well with Scikit-learn for data preparation, feature
engineering, and model input data.
➢ Web Scraping: Pandas can be used alongside BeautifulSoup or Scrapy to organise and analyse
structured data extracted from web pages.
➢ Financial Analysis: Pandas is commonly used in finance for stock market data analysis,
financial indicator calculation, and portfolio optimization.
➢ Text Data Analysis: Pandas' string methods and regular expression support
help clean and analyse textual data.
➢ Experimental Data Analysis: Pandas makes it easy to manipulate and analyse large datasets,
prepare data for statistical tests, and visualise results.
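As a rough sketch of grouping, input/output, and text handling from the list above, the example below reads a small in-memory CSV. The departments, employee names, and salaries are hypothetical.

```python
import io
import pandas as pd

# Data input: read CSV text (a file path would work the same way).
csv_text = (
    "dept,employee,salary\n"
    "Sales,asha rao,50000\n"
    "Sales,ravi kumar,60000\n"
    "IT,meena shah,70000\n"
)
staff = pd.read_csv(io.StringIO(csv_text))

# Aggregation and grouping: group-wise summary statistics with groupby().
summary = staff.groupby("dept")["salary"].agg(["mean", "max"])

# Text data: vectorized string methods and regular expressions via .str
staff["employee"] = staff["employee"].str.title()
starts_with_r = staff["employee"].str.contains(r"^R", regex=True)

# Data output: write back to CSV text (pass a path to write a file instead).
out = staff.to_csv(index=False)

print(summary)
print(staff[starts_with_r])
print(out)
```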
Introduction to Data Structures
Pandas deals with the following three data structures −
➢ Series
➢ DataFrame
➢ Panel

These data structures are built on top of NumPy arrays, which means they are fast.

The best way to think of these data structures is that the higher dimensional data structure is a
container of its lower dimensional data structure. For example, DataFrame is a container of Series,
Panel is a container of DataFrame.

Data Structure   Dimensions   Description
Series           1            1D labeled homogeneous array, size-immutable.
Data Frames      2            General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns.
Panel            3            General 3D labeled, size-mutable array.
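The container relationship and the NumPy foundation can be checked with a short sketch. The column names and values are illustrative assumptions, not part of the slides.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "height_cm": [150.0, 162.5, 171.0],
    "name": ["A", "B", "C"],
})

# Selecting one column of a DataFrame yields a Series.
col = df["height_cm"]
print(type(col))

# The values of a numeric Series can be exposed as a NumPy array.
values = col.to_numpy()
print(type(values))

# A DataFrame can be assembled back from its Series, one per column.
rebuilt = pd.DataFrame({"height_cm": col, "name": df["name"]})
print(rebuilt.equals(df))
```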
Series
Series is a one-dimensional array-like structure with homogeneous data.

DataFrame
DataFrame is a two-dimensional array with heterogeneous data.

Panel
Panel is a three-dimensional data structure with heterogeneous data. It is hard to
represent the panel in graphical representation. But a panel can be illustrated
as a container of DataFrame.

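Note − DataFrame is widely used and one of the most important data structures. Panel is used much less; it was deprecated in pandas 0.20 and removed in pandas 0.25, so current versions provide only Series and DataFrame.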
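The difference between homogeneous and heterogeneous data can be seen by inspecting dtypes. This is a small sketch with made-up values; it covers only Series and DataFrame.

```python
import pandas as pd

# A Series holds a single dtype; mixing ints and floats upcasts to float.
whole = pd.Series([1, 2, 3])
mixed = pd.Series([1, 2.5])
print(whole.dtype, mixed.dtype)

# A DataFrame can hold a different dtype in each column.
df = pd.DataFrame({
    "id": [1, 2],
    "score": [9.5, 7.0],
    "name": ["x", "y"],
})
print(df.dtypes)

# Each column, taken on its own, is again a homogeneous Series.
print(df["score"].dtype)
```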
