PANDAS LIBRARY
Introduction
• “Pandas” – name derived from “Panel Data” and “Python Data Analysis”.
• A powerful and versatile library that simplifies data manipulation in Python.
• Built on the NumPy library, so many NumPy data structures are used or replicated in Pandas.
• Well suited to working with tabular data, such as spreadsheets and SQL tables.
• Its versatility makes it an essential tool for data analysts, scientists and engineers working
with structured data in Python.
Where and why is Pandas used in AI?
• Case study: marketing campaigns conducted by a company. The dataset contains information about
campaign type, budget, duration, reach, engagement metrics and sales performance.
• Pandas is used to load the dataset, display summary statistics and perform group-wise analysis to
understand the performance of the different marketing campaigns.
• Data visualization: the sales performance and average engagement metrics for each
campaign are plotted using Matplotlib, a popular plotting library in Python.
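The case study can be sketched as follows. The campaign rows below are invented purely for illustration (they are not the case-study dataset), and the column names `campaign_type`, `budget`, `duration_days`, `engagement` and `sales` are assumptions. The sketch shows summary statistics with `describe()` and a group-wise analysis with `groupby()`.

```python
import pandas as pd

# Hypothetical campaign data, invented for illustration only
campaigns = pd.DataFrame({
    "campaign_type": ["Email", "Social", "Email", "TV", "Social", "TV"],
    "budget": [1000, 2500, 1200, 8000, 3000, 7500],
    "duration_days": [10, 14, 7, 30, 21, 28],
    "engagement": [0.12, 0.30, 0.15, 0.08, 0.25, 0.10],
    "sales": [5000, 12000, 5600, 40000, 15000, 36000],
})

# Summary statistics (count, mean, std, min, quartiles, max) of the numeric columns
print(campaigns.describe())

# Group-wise analysis: total sales and average engagement for each campaign type
summary = campaigns.groupby("campaign_type").agg(
    total_sales=("sales", "sum"),
    avg_engagement=("engagement", "mean"),
)
print(summary)
```

If Matplotlib is installed, `summary.plot(kind="bar", subplots=True)` followed by `matplotlib.pyplot.show()` is one way to chart the two summary columns.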
Pandas and AI
• Pandas provides powerful data manipulation and aggregation functionality.
• Makes it easy to perform complex analysis and generate insightful data visualizations.
• This capability is invaluable in AI and data-driven decision-making processes, allowing
businesses to gain actionable insights from their data.
Data Structures in Pandas
• Pandas provides two data structures for manipulating data:
• a) Series and b) DataFrame
• Series – a 1D array containing a sequence of values of any data type (int,
float, list, string etc.). By default, each value is labelled with an integer starting from 0;
these labels are called the ‘index’.
• A Pandas Series is similar to a column in a spreadsheet.
• DataFrame – a 2D data structure, used when we need to work with
multiple columns at a time, i.e. to process tabular data.
• E.g. the result of a class, items in a restaurant menu, the reservation chart of a
train etc.
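A small sketch of both structures, using a made-up class result (the names and marks are hypothetical). It shows the default 0-based index of a Series and that each column of a DataFrame is itself a Series.

```python
import pandas as pd

# A Series: a 1D labelled array; the default index labels are 0, 1, 2, ...
marks = pd.Series([78, 92, 85])
print(marks)

# A Series can hold values of different types (stored with dtype object)
mixed = pd.Series([10, 2.5, "text", [1, 2]])
print(mixed)

# A DataFrame: a 2D table with several columns (hypothetical class result)
result = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena"],
    "marks": [78, 92, 85],
})
print(result)

# Selecting one column of a DataFrame gives back a Series
print(result["marks"])
```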
Creating a DataFrame
• Two methods:
1. using NumPy ndarrays
2. using a list of dictionaries
1. Syntax
array_object = np.array([list of items])
dataframe_object = pd.DataFrame([list of array_object], index=[list of row labels],
columns=[list of column labels])
Rows are accessed with the index or row labels.
Columns are accessed with column labels (or keys, in the case of a dictionary).
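A minimal sketch of method 1 with hypothetical values and labels. Each NumPy array becomes one row; note that `pd.DataFrame` takes the row labels through its `index` parameter. The last lines show row access by label (`loc`) and by position (`iloc`), and column access by label.

```python
import numpy as np
import pandas as pd

# Each NumPy array becomes one row of the DataFrame
array1 = np.array([10, 20, 30])
array2 = np.array([40, 50, 60])

# Row labels go in index, column labels in columns
df = pd.DataFrame([array1, array2],
                  index=["R1", "R2"],
                  columns=["A", "B", "C"])
print(df)

print(df.loc["R1"])  # a row by its label
print(df.iloc[1])    # a row by its integer position
print(df["B"])       # a column by its label
```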
2. Syntax
listDict = [{key: value, ...}, {key: value, ...}]
dataframe_object = pd.DataFrame(listDict, index=[list of row names as strings])
Each dictionary becomes a row; its keys become the column labels.
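A minimal sketch of method 2, using a hypothetical restaurant menu. Each dictionary becomes one row and the keys become column labels; a key that appears in only some dictionaries still becomes a column, with NaN filled in for the rows that lack it.

```python
import pandas as pd

# Each dictionary is one row; keys become column labels (hypothetical menu)
listDict = [
    {"item": "Tea", "price": 20},
    {"item": "Coffee", "price": 35},
    {"item": "Samosa", "price": 15, "veg": True},  # extra key -> extra column
]

# Row names are given as strings through index
menu = pd.DataFrame(listDict, index=["first", "second", "third"])
print(menu)  # "veg" is NaN for the first two rows
```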
Dealing with Rows and Columns
• Over to Python Script and Shell modes.
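As a starting point for the script or shell session, here is one possible sketch of common row and column operations on a small hypothetical DataFrame: adding a column, adding a row with `loc`, and deleting a column and a row with `drop`. The exact operations covered in the session may differ.

```python
import pandas as pd

# Hypothetical student data with string row labels
df = pd.DataFrame({"name": ["Asha", "Ravi"], "marks": [78, 92]},
                  index=["S1", "S2"])

# Add a column by assigning a list to a new column label
df["grade"] = ["B", "A"]

# Add a row by assigning to a new row label with .loc
df.loc["S3"] = ["Meena", 85, "A"]

# Delete a column (axis=1) and a row (axis=0); drop returns a new DataFrame
df = df.drop("grade", axis=1)
df = df.drop("S1", axis=0)
print(df)
```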