KEMBAR78
PYTHON PANDAS.pptx
PYTHON PANDAS
PANDAS STANDS FOR
“PYTHON DATA ANALYSIS LIBRARY”.
Pandas is a high-level data manipulation tool
developed by Wes McKinney.
It is built on the NumPy package, and its key data
structure is called the DataFrame.
DataFrames allow you to store and
manipulate tabular data in rows of observations and
columns of variables.
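A minimal sketch of that layout, using made-up example data (the `city` and `temperature` values are purely illustrative): each row is one observation and each column is one variable.

```python
import pandas as pd

# Each dict key becomes a column (variable); each list position is a row (observation).
# The values below are hypothetical example data.
weather = pd.DataFrame({
    "city": ["Oslo", "Lima", "Pune"],
    "temperature": [4.5, 19.0, 31.2],
})

print(weather.shape)           # (rows, columns) -> (3, 2)
print(list(weather.columns))   # the variables
print(weather["temperature"])  # a single column is a Series
print(weather.iloc[0])         # a single row (observation), also returned as a Series
```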
Using pandas, we can accomplish five typical steps in the
processing and analysis of data, regardless of the origin of
the data:
LOAD
PREPARE
MANIPULATE
MODEL
ANALYZE
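To make the five steps concrete, here is a minimal sketch on a tiny, made-up dataset (the column names `name`, `hours` and `score` are hypothetical). The MODEL step uses NumPy's `polyfit` for a simple straight-line fit; that is an illustrative choice, not something pandas itself provides.

```python
import io

import numpy as np
import pandas as pd

# LOAD: read a small CSV held in memory (hypothetical sample data).
raw = io.StringIO(
    "name,hours,score\n"
    "Ann,2,55\n"
    "Ben,,61\n"
    "Cal,5,78\n"
    "Dee,7,90\n"
)
df = pd.read_csv(raw)

# PREPARE: fill the missing 'hours' value with the column median.
df["hours"] = df["hours"].fillna(df["hours"].median())

# MANIPULATE: derive a new column from an existing one.
df["passed"] = df["score"] >= 60

# MODEL: fit a straight line, score = slope * hours + intercept.
slope, intercept = np.polyfit(df["hours"], df["score"], deg=1)

# ANALYZE: summarize the results.
print(df)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(df.groupby("passed")["score"].mean())
```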
Python with pandas is used in a wide range of
academic and commercial domains, including finance,
economics, statistics, analytics, etc.
Key features of pandas
Fast and efficient DataFrame object with default and
customized indexing.
Tools for loading data into in-memory data objects from
different file formats.
Data alignment and integrated handling of missing data.
Label-based slicing, indexing and subsetting of large data
sets.
Insertion and deletion of columns.
Grouping of data for aggregation and transformation.
High-performance merging and joining of data.
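Several of these features can be seen together in a short sketch. The `sales` and `regions` tables below are hypothetical data made up for illustration; the calls (`merge`, `groupby`, `drop`, `isna`) are standard pandas methods.

```python
import pandas as pd

# Data alignment: arithmetic between Series matches on index labels, not position.
s1 = pd.Series([1, 2], index=["x", "y"])
s2 = pd.Series([10, 20], index=["y", "z"])
aligned = s1 + s2   # x -> NaN, y -> 12, z -> NaN (labels present in only one Series)

# Hypothetical sales and region tables.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "C"],
    "amount": [100.0, 150.0, None, 80.0, 60.0],
})
regions = pd.DataFrame({
    "store": ["A", "B", "C"],
    "region": ["North", "South", "North"],
})

# Missing data: None in a float column is stored as NaN.
print(sales["amount"].isna().sum())   # number of missing values

# Merging/joining: attach each store's region, similar to a SQL left join on 'store'.
merged = sales.merge(regions, on="store", how="left")

# Group by: total amount per region (sum() skips NaN by default).
totals = merged.groupby("region")["amount"].sum()
print(totals)

# Inserting and deleting columns.
merged["amount_x2"] = merged["amount"] * 2
merged = merged.drop(columns="amount_x2")
```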
What’s cool about pandas is that it takes data (like a CSV or TSV
file, or a SQL database) and creates a Python object with rows
and columns, called a DataFrame, that looks very similar to a table in
statistical software (think Excel or SPSS, for example). This is
much easier to work with than lists and/or dictionaries processed
through for loops or list comprehensions.
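A small sketch of that difference, assuming some made-up tab-separated text in place of a real .tsv file: the plain-Python version reads rows with the standard `csv` module and loops over them, while pandas builds a DataFrame in one `read_csv` call (`sep="\t"` handles TSV; the default separator is a comma). For a SQL database, `pd.read_sql` plays a similar role given a database connection.

```python
import csv
import io

import pandas as pd

# Hypothetical tab-separated text standing in for a .tsv file.
tsv_text = "name\tage\nAlex\t10\nBob\t12\nClarke\t13\n"

# Plain Python: a list of dicts, then a loop to compute the average age.
rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
avg_loop = sum(int(r["age"]) for r in rows) / len(rows)

# pandas: one call builds a DataFrame with typed columns.
df = pd.read_csv(io.StringIO(tsv_text), sep="\t")
avg_pandas = df["age"].mean()

print(avg_loop, avg_pandas)
```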
INSTALLATION AND GETTING STARTED
In order to “get” pandas you need to install it. Older releases ran on
Python 2.7, but current pandas releases require Python 3. It also
depends on other libraries (like NumPy) and has optional
dependencies (like Matplotlib for plotting). Therefore, I think the
easiest way to get pandas set up is to install it through a distribution like
Anaconda, “a cross platform distribution for data analysis
and scientific computing.” There you can download the Windows, OS X and
Linux versions. Alternatively, pandas can be installed with pip: pip install pandas
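To confirm the installation worked, a quick check like the following imports both libraries and prints their versions (a small sketch; the version strings will differ from machine to machine).

```python
# Quick check that pandas (and NumPy, which it depends on) can be imported.
import numpy as np
import pandas as pd

print("pandas", pd.__version__)
print("numpy", np.__version__)
```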
In order to use pandas in your Python IDE (integrated
development environment), like Jupyter Notebook or Spyder,
you need to import the pandas library first. Importing a library
means loading it into memory so that it is there for you
to work with.
In order to import pandas, all you have to do is run the
following code:
import pandas as pd
import numpy as np
Usually you would add the second part (‘as pd’) so you
can access pandas with ‘pd.Command’ instead of
needing to write ‘pandas.Command’ every time you
need to use it. You would also import NumPy,
because it is a very useful library for scientific computing
with Python. Now pandas is ready for use!
Remember, you need to do this every time you start
a new Jupyter notebook, Spyder file, etc.
# Create a DataFrame from a list of lists
import pandas as pd
data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
print(df)
OUTPUT
     Name  Age
0    Alex   10
1     Bob   12
2  Clarke   13
# Create a DataFrame from a dictionary
import pandas as pd
data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data)
print(df)
OUTPUT
    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
3  Ricky   42
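Once the DataFrame exists, selecting and filtering are one-liners. A minimal sketch on the same Name/Age data; the threshold of 30 is just an example value.

```python
import pandas as pd

data = {"Name": ["Tom", "Jack", "Steve", "Ricky"], "Age": [28, 34, 29, 42]}
df = pd.DataFrame(data)

# Select one column: the result is a Series.
ages = df["Age"]

# Filter rows with a boolean condition: people older than 30.
older = df[df["Age"] > 30]
print(older)

# Sort rows by a column, largest first.
by_age = df.sort_values("Age", ascending=False)
print(by_age)
```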
# Create an indexed DataFrame
import pandas as pd
data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data, index=['rank 1', 'rank 2', 'rank 3', 'rank 4'])
print(df)
OUTPUT
         Name  Age
rank 1    Tom   28
rank 2   Jack   34
rank 3  Steve   29
rank 4  Ricky   42
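A custom index like this is what makes label-based access possible. A short sketch using `.loc` (by label) and `.iloc` (by integer position) on the same data:

```python
import pandas as pd

data = {"Name": ["Tom", "Jack", "Steve", "Ricky"], "Age": [28, 34, 29, 42]}
df = pd.DataFrame(data, index=["rank 1", "rank 2", "rank 3", "rank 4"])

# .loc selects by label: a whole row, or a row and a column together.
row = df.loc["rank 2"]          # Jack's row, returned as a Series
age = df.loc["rank 3", "Age"]   # a single value

# Label slices with .loc include both endpoints.
top_two = df.loc["rank 1":"rank 2"]

# .iloc selects by integer position, ignoring the labels.
first = df.iloc[0]

print(row, age, top_two, first, sep="\n\n")
```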
# Create a DataFrame from a list of dictionaries
import pandas as pd
data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
print(df)
OUTPUT
   a   b     c
0  1   2   NaN
1  5  10  20.0
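The NaN appears because the first dictionary has no 'c' key; because of it, column c is stored as floating point (hence 20.0). A minimal sketch of two common ways to handle the gap; filling with 0 is just an example choice.

```python
import pandas as pd

data = [{"a": 1, "b": 2}, {"a": 5, "b": 10, "c": 20}]
df = pd.DataFrame(data)

# Show which cells are missing (True where the value is NaN).
print(df.isna())

filled = df.fillna(0)   # replace NaN with a chosen value
dropped = df.dropna()   # drop rows that contain any NaN

print(filled)
print(dropped)
```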
It is also possible to get statistics on the entire DataFrame or on a Series (a
single column):
• df.mean(): returns the mean of each column
• df.corr(): returns the pairwise correlation between columns
• df.count(): returns the number of non-null values in each column
• df.max(): returns the highest value in each column
• df.min(): returns the lowest value in each column
• df.median(): returns the median of each column
• df.std(): returns the standard deviation of each column
In pandas 2.0 and later, mean(), corr(), median() and std() raise an error if the
DataFrame contains text columns; pass numeric_only=True, e.g.
df.mean(numeric_only=True), to restrict them to numeric columns.
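A short sketch applying these methods to the Name/Age DataFrame from earlier. Because 'Name' holds text, the numeric statistics are called with `numeric_only=True`; `describe()` bundles several of them into one table.

```python
import pandas as pd

data = {"Name": ["Tom", "Jack", "Steve", "Ricky"], "Age": [28, 34, 29, 42]}
df = pd.DataFrame(data)

# Restrict mean/median/std to numeric columns, since 'Name' is text.
print(df.mean(numeric_only=True))     # mean Age is 33.25
print(df.median(numeric_only=True))   # median Age is 31.5
print(df.std(numeric_only=True))      # sample standard deviation
print(df.count())                     # non-null values per column
print(df["Age"].max(), df["Age"].min())

# describe() summarizes the numeric columns: count, mean, std, min, quartiles, max.
print(df.describe())
```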
THANKS
