Getting Started with Pandas Cheatsheet

pandas is open-source and the most popular Python tool for data wrangling and analytics. It is fast, intuitive, and can handle multiple data formats such as CSV, Excel, JSON, HTML, and SQL.

Creating DataFrames

Create a pandas dataframe object by specifying the column names and index.

From a dictionary:

df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8],
                   "C": [3, 6, 9]}, index=[101, 102, 103])

From a multi-dimensional list:

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  index=[101, 102, 103], columns=['A', 'B', 'C'])

Both produce the same dataframe:

     A  B  C
101  1  2  3
102  4  5  6
103  7  8  9

Importing Data

Import data from a text file, Excel file, website, database, or nested JSON file.

pd.read_csv(file_location)    # import tabular CSV file
pd.read_table(file_location)  # import delimited text file
pd.read_excel(file_location)  # import Excel file
# connect and extract the data from SQL database
pd.read_sql(query, connection_object)
# import from JSON string, file, URL
pd.read_json(json_string)
# extract tables from HTML file, URL
pd.read_html(url)

Exporting Data

These commands are commonly used to export files in various formats, but you can also export the dataframe to binary Feather, HDF5, a BigQuery table, and a Pickle file.

df.to_csv(filename)    # export CSV tabular file
df.to_excel(filename)  # export Excel file
# write the dataframe to a SQL database table
df.to_sql(table_name, connection_object)
df.to_json(filename)   # export JSON format file

Inspecting Data

Understand the data and its distribution by using these commands.

df.head(n)  # view first n rows; use df.tail(n) for last n rows
# view the n largest values, ordered; use df.nsmallest(n, 'value')
# for the n smallest
df.nlargest(n, 'value')
df.sample(n=10)  # randomly select and display n rows
df.shape  # view number of rows and columns
# view the index, datatypes, and memory information
df.info()
# view statistical summary of numerical columns
df.describe()
# view unique values and counts of the city column
df.city.value_counts()

Subsetting

Select a single row or column and multiple rows or columns using these commands.

df['sale']  # select a single column
df[['sale', 'profit']]  # select two columns
df.iloc[10:20]  # select rows at positions 10 through 19 (end-exclusive)
# select all rows with columns at positions 2, 4, and 5
df.iloc[:, [2, 4, 5]]
# select all rows with columns from sale to profit
df.loc[:, 'sale':'profit']
# filter the dataframe using a logical condition and select the
# sale and profit columns
df.loc[df['sale'] > 10, ['sale', 'profit']]
df.iat[1, 2]  # select a single value by position
df.at[4, 'sale']  # select a single value by label

Querying

Filter rows using logical conditions. query() evaluates a boolean expression and returns the rows where it is true.

df.query('sale > 20')  # filter rows using a logical condition
df.query('sale > 20 and profit < 30')  # combining conditions
# string logical condition
df.query('company.str.startswith("ab")', engine="python")

Reshaping Data

Change the layout, rename the column names, append rows and columns, and sort values and index.

pd.melt(df)  # combine columns into rows
# convert rows into columns
df.pivot(columns='var', values='val')
pd.concat([df1, df2], axis=0)  # appending rows
pd.concat([df1, df2], axis=1)  # appending columns
# sort values by sale column from high to low
df.sort_values('sale', ascending=False)
df.sort_index()  # sort the index
df.reset_index()  # move the index to columns
# rename a column using a dictionary
df.rename(columns={'sale': 'sales'})
# remove the sales and profit columns from the dataframe
df.drop(columns=['sales', 'profit'])
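Worked example: Subsetting and Querying. A minimal sketch that exercises the selection commands on a small invented dataframe; the `sale`, `profit`, and `city` columns and their values are assumptions chosen to match the snippets above, not data from any real source.

```python
import pandas as pd

# small invented dataframe matching the column names used in the snippets
df = pd.DataFrame({"sale": [5, 15, 25, 35],
                   "profit": [1, 2, 3, 4],
                   "city": ["NY", "LA", "NY", "SF"]})

single_col = df["sale"]            # a Series holding one column
pair = df[["sale", "profit"]]      # a DataFrame holding two columns
# label-based filter: rows where sale > 10, keeping two columns
big_sales = df.loc[df["sale"] > 10, ["sale", "profit"]]
# expression-based filter with query()
queried = df.query("sale > 20 and profit < 4")
cell = df.iat[1, 0]                # value at row position 1, column position 0
```

The `loc` and `query` calls return the same kind of filtered dataframe; `query` is often more readable when combining several conditions.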
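Worked example: Reshaping. A minimal sketch of `melt`, `pivot`, `concat`, `rename`, and `sort_values`; the tiny dataframes and their column names (`var`, `val`, `A`) are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({"var": ["x", "y"], "val": [1, 2]})

melted = pd.melt(df)  # wide -> long: one (variable, value) row per cell
# long -> wide: one column per unique value of 'var'
pivoted = df.pivot(columns="var", values="val")

a = pd.DataFrame({"A": [1, 2]})
b = pd.DataFrame({"A": [3, 4]})
stacked = pd.concat([a, b], axis=0)  # append rows
side = pd.concat([a, b], axis=1)     # append columns

# rename a column, then sort it from high to low
ranked = a.rename(columns={"A": "sales"}).sort_values("sales", ascending=False)
```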
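Worked example: exporting and re-importing. A round-trip sketch with `to_csv` and `read_csv`, using a temporary file path (`data.csv` is an arbitrary name, not one referenced elsewhere in this sheet). Passing `index_col=0` on read restores the index that `to_csv` wrote as the first column.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]},
                  index=[101, 102, 103])

# write the dataframe to CSV; the index is saved as the first column
path = os.path.join(tempfile.mkdtemp(), "data.csv")
df.to_csv(path)

# read it back, treating the first column as the index
restored = pd.read_csv(path, index_col=0)

same = df.equals(restored)
```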
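Worked example: inspecting. A minimal sketch of the inspection commands; the `city`/`sale` data is invented to mirror the `df.city.value_counts()` snippet above.

```python
import pandas as pd

df = pd.DataFrame({"city": ["NY", "NY", "LA"], "sale": [30, 10, 20]})

first_two = df.head(2)              # first n rows
top = df.nlargest(1, "sale")        # row(s) with the largest sale values
shape = df.shape                    # (number of rows, number of columns)
counts = df["city"].value_counts()  # unique values with their frequencies
```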