This document provides an introduction to Python for data science, focusing on the Pandas library, which is used for data manipulation and analysis. It covers fundamental concepts such as Series, DataFrame, and Panel data structures, including their creation, characteristics, and methods for statistical analysis. It also includes installation instructions for Pandas and explains how to manage data effectively within these structures.
Replay
• OOPS Continued
• Anaconda Installation
• Jupyter Notebook Interface
• Working and Use of Jupyter Notebook
• Shortcuts for cell moving and marking
Session 9: Basic data manipulation with pandas
Agenda
• What is Pandas
  • Series
  • DataFrame
  • Panel
• Installing Pandas
• Creating DataFrame
• Adding data in DataFrame using Append Function
• Getting Shape and information of the data
• Getting Statistical Analysis of Data
• Dropping Columns from Data
• Dropping Rows from Data
Pandas
• Pandas is a Python library used for working with data sets.
• It has functions for analyzing, cleaning, exploring, and
manipulating data.
• The name "Pandas" refers to both "Panel Data" and
"Python Data Analysis". The library was created by Wes McKinney in
2008.
Series
• A Pandas Series is a 1-dimensional structure resembling an array
and containing homogeneous data. It is a linear data structure and
stores elements in a single dimension.
• Note: A Series is usually described as size-immutable, i.e. most
operations that add or remove elements return a new Series rather than
resizing the original. The values/elements in a Series can be changed in place.
Series Example
• A one-dimensional labeled array capable of holding any data type
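A minimal sketch of creating a Series, assuming pandas is installed. The sample values, the labels, and the name `scores` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data: three scores, each labeled by a name.
scores = pd.Series([85, 92, 78], index=["alice", "bob", "carol"], name="score")
print(scores)

# Elements are accessed by label, and values can be changed in place.
scores["bob"] = 95
print(scores["bob"])

# All elements of a Series share a single dtype.
print(scores.dtype)
```

The index labels play the role that positions play in a plain array, so elements can be looked up by name rather than by number.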
DataFrame
• The Pandas module provides the DataFrame, a 2-dimensional
structure resembling a 2-D array. The input data is arranged in
rows and columns.
• Note: The size of a DataFrame is mutable.
DataFrame Example
• A two-dimensional labeled data structure with columns of potentially
different types.
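A minimal sketch of creating a DataFrame from a dictionary of columns, assuming pandas is installed. The column names and values are hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample data: each key becomes a column, and columns may differ in type.
df = pd.DataFrame({
    "name": ["alice", "bob", "carol"],
    "age": [30, 25, 35],
    "height_m": [1.65, 1.80, 1.72],
})
print(df)
print(df.shape)   # (rows, columns)
print(df.dtypes)  # one dtype per column

# The size is mutable: assigning a new column grows the DataFrame.
df["is_adult"] = df["age"] >= 18
print(df.shape)
```

Each column of the DataFrame is itself a Series, which is why every column has its own dtype.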
Panel
• Older versions of the Pandas module offered a Panel, a 3-dimensional data
structure with 3 axes that serve the following functions:
  • items: (axis 0) Each item corresponds to a DataFrame contained in the Panel.
  • major_axis: (axis 1) It corresponds to the rows of each DataFrame.
  • minor_axis: (axis 2) It corresponds to the columns of each DataFrame.
Panel Data Example
• A three-dimensional data structure designed for handling 3D data.
• Panel was deprecated in Pandas 0.20.0 and removed in 0.25.0; users are encouraged to use multi-index DataFrames instead.
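import numpy as np
import pandas as pd

# pd.Panel exists only in pandas versions before 0.25.0; newer versions raise AttributeError here.
panel = pd.Panel(np.random.rand(2, 3, 4), items=['Item1', 'Item2'], major_axis=['A', 'B', 'C'], minor_axis=['X1', 'X2', 'X3', 'X4'])
print(panel)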
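On current pandas versions, the same 2 x 3 x 4 data can be held in a multi-index DataFrame. The following is one possible sketch, not the only layout: it reuses the labels from the Panel example and places the items and major axis in a two-level row index. The level names "item" and "major" are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

items = ["Item1", "Item2"]
major_axis = ["A", "B", "C"]
minor_axis = ["X1", "X2", "X3", "X4"]

# Same random 3D data as in the Panel example: shape (items, major, minor).
data = np.random.rand(len(items), len(major_axis), len(minor_axis))

# Two-level row index: every (item, major) pair becomes one row.
index = pd.MultiIndex.from_product([items, major_axis], names=["item", "major"])

# Flatten the first two axes into rows; the minor axis becomes the columns.
df = pd.DataFrame(
    data.reshape(len(items) * len(major_axis), len(minor_axis)),
    index=index,
    columns=minor_axis,
)
print(df)

# Selecting one item returns an ordinary DataFrame (rows A-C, columns X1-X4).
item1 = df.loc["Item1"]
print(item1)
```

Selecting on the outer index level with `df.loc["Item1"]` recovers the per-item DataFrame that `panel["Item1"]` would have returned.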
Installation
There are various ways to install the Python Pandas module. One of the easiest is to
install it with the Python package installer, pip.
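As a quick check, assuming the package was installed by running `pip install pandas` in a terminal, the following sketch confirms that the module can be imported and prints the installed version.

```python
# Run this after installing from a terminal with: pip install pandas
import pandas as pd

# If the import succeeds, pandas is installed; print which version was found.
print(pd.__version__)
```

With an Anaconda installation, pandas is typically already included, so the check should succeed without a separate pip step.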
Getting Statistical Analysis of Data
Statistical data analysis is the process of performing statistical
operations on data. It is a form of quantitative research, which seeks to quantify the
data. Quantitative data typically includes descriptive data, such as survey data
and observational data.
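A minimal sketch of getting summary statistics from a DataFrame, assuming pandas is installed. The columns and values are hypothetical sample data; `describe()` reports the count, mean, standard deviation, minimum, quartiles, and maximum of each numeric column.

```python
import pandas as pd

# Hypothetical sample data with two numeric columns.
df = pd.DataFrame({
    "age": [30, 25, 35],
    "height_m": [1.65, 1.80, 1.72],
})

# One summary table covering every numeric column.
summary = df.describe()
print(summary)

# Individual statistics are also available as separate methods.
print(df["age"].mean())
print(df["age"].median())
```

The result of `describe()` is itself a DataFrame, so a single figure can be read with, for example, `summary.loc["mean", "age"]`.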