PANDASDATAFRAMES_SCIKIT_METRICS PPT.pptx
PANDAS LIBRARY
Introduction
• The name “Pandas” is derived from “Panel Data” and “Python Data Analysis”.
• Powerful and versatile library that simplifies the task of data manipulation in Python.
• Built upon the NumPy library, so many NumPy data structures and operations are used or replicated in Pandas.
• Pandas – well suited for working with tabular data such as spreadsheets and SQL tables.
• Versatility – makes it an essential tool for data analysts, scientists and engineers working
with structured data in Python.
Where and why Pandas in AI?
• Case study: marketing campaigns run by a company – the dataset contains the
campaign type, budget, duration, reach, engagement metrics and sales performance.
• Pandas – used to load the dataset, display summary statistics and perform group-wise analysis to
compare the performance of the different marketing campaigns.
• Data visualization – the sales performance and average engagement metrics for each
campaign are plotted using Matplotlib, a popular plotting library in Python.
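The case study can be sketched in a few lines of Pandas. This is a minimal illustration, not the original analysis: the column names and all values below are hypothetical, and in practice the data would be loaded from a file (e.g. with `pd.read_csv`) rather than typed in.

```python
import pandas as pd

# Hypothetical campaign data (illustrative values only, not from the source).
# In practice: campaigns = pd.read_csv("campaigns.csv")  # hypothetical file name
campaigns = pd.DataFrame({
    "campaign_type": ["Email", "Social", "Email", "TV", "Social", "TV"],
    "budget": [1000, 2500, 1200, 8000, 3000, 7500],
    "duration_days": [14, 30, 10, 21, 28, 14],
    "reach": [5000, 20000, 6500, 90000, 24000, 85000],
    "engagement": [0.12, 0.08, 0.15, 0.03, 0.09, 0.04],
    "sales": [4000, 9000, 5200, 30000, 11000, 27000],
})

# Summary statistics (count, mean, std, min, quartiles, max) for numeric columns
print(campaigns.describe())

# Group-wise analysis: total sales and average engagement per campaign type
summary = campaigns.groupby("campaign_type").agg(
    total_sales=("sales", "sum"),
    avg_engagement=("engagement", "mean"),
)
print(summary)
```

With Matplotlib installed, a call such as `summary["total_sales"].plot(kind="bar")` would draw the per-campaign sales comparison mentioned above.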
Pandas and AI
• Pandas provides powerful data manipulation and aggregation functionality.
• Makes it easy to perform complex analysis and generate insightful data visualizations.
• This capability is invaluable in AI and data-driven decision making processes, allowing
businesses to gain actionable insights from their data.
Data structures in Pandas
• Provides – two data structures for manipulating data.
• a) Series and b) Data Frame
• Series – 1-D labelled array holding values of any data type (int,
float, string, list, etc.); by default each value gets a numeric label,
starting from 0, called the ‘index’.
• A Pandas Series is similar to a single column in a spreadsheet.
• Data Frame – 2-D data structure used when we need to work with
multiple columns at a time, i.e. to process tabular data.
• E.g. the result of a class, the items in a restaurant menu, the reservation chart of a
train, etc.
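A minimal sketch of the two structures, using a hypothetical class result as the tabular example. It shows the default 0-based index of a Series and that each column of a DataFrame is itself a Series.

```python
import pandas as pd

# A Series: 1-D values with a default integer index starting at 0
marks = pd.Series([78, 92, 85])
print(marks)  # index labels are 0, 1, 2

# A DataFrame: 2-D table, here a hypothetical class result
result = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera"],
    "marks": [78, 92, 85],
})
print(result)

# Selecting one column of a DataFrame gives back a Series
print(type(result["marks"]))
```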
Creating a Data Frame
• 2 methods
1. using NumPy ndarrays
2. using list of Dictionaries
1. Syntax
array_object = np.array([list of items])
dataframe_object = pd.DataFrame([list of array_object], index=[list of row labels],
columns=[list of column labels])
Each array becomes one row. Rows are accessed with the index or row labels.
Columns are accessed with column labels, or with the key in the case of a dictionary.
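A minimal sketch of method 1, assuming three small arrays with made-up values. Note that the row labels are passed with the `index` parameter of `pd.DataFrame`; `.loc` selects a row by label, `.iloc` by integer position, and `df["B"]` selects a column by label.

```python
import numpy as np
import pandas as pd

# Each NumPy array becomes one row of the DataFrame
array1 = np.array([10, 20, 30])
array2 = np.array([40, 50, 60])
array3 = np.array([70, 80, 90])

df = pd.DataFrame(
    [array1, array2, array3],
    index=["r1", "r2", "r3"],   # row labels
    columns=["A", "B", "C"],    # column labels
)
print(df)

print(df.loc["r2"])   # a row, selected by its row label
print(df.iloc[0])     # a row, selected by its integer position
print(df["B"])        # a column, selected by its column label
```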
2. Syntax
listDict = [{key: value, ...}, {key: value, ...}]   (one dictionary per row; the keys become the column labels)
dataframe_object = pd.DataFrame(listDict, index=[list of row names as strings])
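A minimal sketch of method 2, using a hypothetical restaurant menu. Each dictionary supplies one row, and a key that is missing from some dictionaries still becomes a column, with NaN filled in for the rows that lack it.

```python
import pandas as pd

# Each dictionary is one row; its keys become the column labels
listDict = [
    {"item": "Idli", "price": 40},
    {"item": "Dosa", "price": 60},
    {"item": "Vada", "price": 30, "spicy": True},  # extra key only in this row
]

# Row names are given as strings through the index parameter
menu = pd.DataFrame(listDict, index=["m1", "m2", "m3"])
print(menu)  # rows m1 and m2 show NaN in the "spicy" column
```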
Dealing with Row and Columns
• Over to Python Script and Shell modes.
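As a starting point for the script and shell work, here is a minimal sketch of common row and column operations on a small, hypothetical marks table. It uses assignment to add a column, `.loc` to add a row, `drop` with `axis=1` or `axis=0` to delete a column or row, and `rename` to change a column label. `drop` and `rename` return a new DataFrame, so the result is assigned back.

```python
import pandas as pd

# Hypothetical marks table with student names as row labels
df = pd.DataFrame(
    {"maths": [78, 92], "science": [85, 88]},
    index=["Asha", "Ravi"],
)

# Add a column by assigning to a new column label
df["total"] = df["maths"] + df["science"]

# Add a row by assigning to a new row label with .loc
df.loc["Meera"] = [70, 75, 145]

# Delete a column (axis=1) and a row (axis=0)
df = df.drop("total", axis=1)
df = df.drop("Ravi", axis=0)

# Rename a column label
df = df.rename(columns={"maths": "mathematics"})
print(df)
```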