Pandas DataFrame Basics Guide

The document discusses various methods for creating, loading, manipulating, and analyzing dataframes in Pandas. Key points include:
- Pandas series and dataframes can be created from arrays, dictionaries, and CSV files using functions like pd.Series(), pd.DataFrame(), and pd.read_csv().
- Data can be extracted from dataframes using indexing, column selection, .loc[], and .pivot_table(). Rows and columns can be renamed, merged, and concatenated.
- Methods like .head(), .info(), .describe() provide information about the data in a dataframe.

Uploaded by

Dev D Ghosh

PANDAS

You can create a Pandas series from an array-like object using the following command:

pd.Series(data, dtype=dtype)
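
As a minimal sketch of this call, the snippet below builds a float series from a list of integers. The values and the variable name `prices` are made up for illustration.

```python
import pandas as pd

# Hypothetical data: any array-like (list, tuple, NumPy array) works here.
prices = pd.Series([10, 20, 30], dtype="float64")

print(prices.dtype)     # float64
print(prices.tolist())  # [10.0, 20.0, 30.0]
```

The dtype is passed by keyword because the second positional argument of pd.Series() is the index, not the dtype.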

To create a dataframe from a dictionary, you can run the following command:

pd.DataFrame(dictionary_name)

You can also provide lists or arrays to create dataframes. In that case, specify the column names as shown below; otherwise Pandas labels the columns 0, 1, 2 and so on.

pd.DataFrame(list_of_rows, columns = ['column_1', 'column_2'])
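
The two construction routes can be compared side by side. This is only a sketch: the names `scores` and `rows` and the sample values are assumptions, not part of the original notes.

```python
import pandas as pd

# From a dictionary: the keys become the column names.
scores = {"name": ["Asha", "Ben"], "marks": [82, 75]}
df_from_dict = pd.DataFrame(scores)

# From a list of rows: the column names are passed explicitly.
rows = [["Asha", 82], ["Ben", 75]]
df_from_rows = pd.DataFrame(rows, columns=["name", "marks"])

print(df_from_rows.columns.tolist())      # ['name', 'marks']
print(df_from_dict.equals(df_from_rows))  # True
```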

You can use the following command to load data into a dataframe from a CSV file:

pd.read_csv(filepath, sep=',', header='infer')
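
To try the call without a file on disk, one option is to feed read_csv an in-memory buffer, as sketched below. The semicolon-separated sample text is invented for the example.

```python
import io
import pandas as pd

# Stand-in for a file on disk; read_csv accepts paths and file-like objects.
csv_text = "item;qty\npen;3\nbook;5\n"

# sep tells pandas which delimiter to split on; header='infer' takes the
# column names from the first line.
df = pd.read_csv(io.StringIO(csv_text), sep=";", header="infer")

print(df.columns.tolist())  # ['item', 'qty']
print(len(df))              # 2
```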

You can use the following code to change the row indices:

dataframe_name.index = list_of_row_labels

To set the index while loading the data from a file, you can use the parameter 'index_col':

pd.read_csv(filepath, index_col = column_number)
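
A small sketch of index_col in action, again using an in-memory buffer with made-up contents instead of a real file:

```python
import io
import pandas as pd

csv_text = "item,qty\npen,3\nbook,5\n"

# index_col=0 uses the first column ("item") as the row index.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(df.index.tolist())    # ['pen', 'book']
print(df.columns.tolist())  # ['qty']
```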

For the column headers, you can specify the column names using the following code:

dataframe_name.columns = list_of_column_names

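
Both the row index and the column headers can be replaced by plain assignment. The sketch below uses a tiny dataframe with invented labels to show the effect.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]])  # default labels are 0, 1 on both axes

df.index = ["row_a", "row_b"]    # length must match the number of rows
df.columns = ["col_x", "col_y"]  # length must match the number of columns

print(df.loc["row_b", "col_x"])  # 3
```
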
While working with Pandas, the dataframes may hold large volumes of data. Displaying the entire dataframe every time you want to inspect it is impractical. Instead, you can use the following code to view only the first few entries (five by default; pass a number to change this):

dataframe_name.head()

- dataframe.info(): This method prints information about the dataframe, which includes the index data type and column data types, the count of non-null values and the memory used.

- dataframe.describe(): This method produces descriptive statistics for the dataframe: the count, the central tendency (the mean, and the median reported as the 50% percentile) and the dispersion (standard deviation, minimum, maximum and quartiles). By default it summarises only the numeric columns of a mixed dataframe; pass include='all' to also get the count, number of unique values, most frequent value and its frequency for non-numeric columns.
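
The inspection methods head(), info() and describe() can be tried together on a small dataframe. The column names and values below are invented purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "marks": [82, 75, 91, 68, 77, 88],
    "grade": ["B", "C", "A", "D", "C", "B"],
})

print(df.head())      # first 5 rows by default
print(df.head(2))     # first 2 rows
df.info()             # prints dtypes, non-null counts and memory usage
print(df.describe())  # numeric columns only by default
print(df.describe(include="all"))  # also summarises the 'grade' column
```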

The selection of rows in dataframes is similar to the indexing you saw in NumPy arrays. With integer indices, the syntax df[start_index:end_index] subsets the rows by position, from start_index up to but not including end_index.
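
A quick sketch of row slicing on a five-row dataframe (the data is made up):

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30, 40, 50]})

subset = df[1:4]  # rows at positions 1, 2 and 3; position 4 is excluded

print(subset["value"].tolist())  # [20, 30, 40]
```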

You can select one or more columns from a dataframe using the following commands:

- df['column'] or df.column: It returns a series

- df[['col_x', 'col_y']]: It returns a dataframe
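
The difference between the two return types can be checked directly. This sketch uses an invented three-column dataframe.

```python
import pandas as pd

df = pd.DataFrame({"col_x": [1, 2], "col_y": [3, 4], "col_z": [5, 6]})

one_col = df["col_x"]               # single label: a Series
two_cols = df[["col_x", "col_y"]]   # list of labels: a DataFrame

print(type(one_col).__name__)   # Series
print(type(two_cols).__name__)  # DataFrame
```

Attribute access such as df.col_x only works when the column name is a valid Python identifier and does not clash with an existing dataframe attribute or method, so the bracket form is the safer default.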

You can use the loc indexer to extract rows and columns from a dataframe based on their labels:

dataframe.loc[[list_of_row_labels], [list_of_column_labels]]

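
As a small illustration of label-based selection, the sketch below picks two rows and one column from a dataframe whose labels and values are invented for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {"qty": [3, 5, 2], "price": [10.0, 250.0, 40.0]},
    index=["pen", "book", "mug"],
)

# Lists of labels on both axes return a dataframe.
picked = df.loc[["pen", "mug"], ["price"]]

print(picked.index.tolist())     # ['pen', 'mug']
print(picked["price"].tolist())  # [10.0, 40.0]
```
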
You can use the following code to rename row labels and columns. rename() returns a new dataframe, so assign the result to a variable:

dataframe.rename(index={row_index: "new_name"}, columns={column_name: "new_name"})

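A minimal sketch of rename() on one row label and one column, with made-up names. It also shows that the original dataframe is left untouched.

```python
import pandas as pd

df = pd.DataFrame({"qty": [3, 5]}, index=[0, 1])

# rename returns a new dataframe; df itself is not modified.
renamed = df.rename(index={0: "first"}, columns={"qty": "quantity"})

print(renamed.index.tolist())    # ['first', 1]
print(renamed.columns.tolist())  # ['quantity']
print(df.columns.tolist())       # ['qty']
```
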
You can use the following code to set a multilevel index in a dataframe:

dataframe.set_index([column_1, column_2])

To obtain data from such dataframes, you have to provide the row details as tuples inside a list. You can go through the code provided below for reference:

dataframe.loc[[(label_1, sub_label_1), (label_1, sub_label_2)], [column_label_1, column_label_2]]
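
Putting set_index() and tuple-based selection together, a sketch with an invented region/city dataset might look like this:

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["north", "north", "south"],
    "city": ["a", "b", "c"],
    "sales": [100, 150, 90],
    "returns": [5, 7, 2],
})

# Two columns become a two-level row index.
indexed = df.set_index(["region", "city"])

# Each row is addressed by a (level_1, level_2) tuple.
picked = indexed.loc[[("north", "a"), ("south", "c")], ["sales"]]

print(picked["sales"].tolist())  # [100, 90]
```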

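You can use the following command to reshape a dataframe into a pivot table in Pandas. Note that pivot() only rearranges the data and does not aggregate it, so each index/column pair must occur only once:

df.pivot(columns='grouping_variable_col', values='value_col', index='grouping_variable_row')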
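
The sketch below reshapes a small long-format table with pivot(). The data is invented, and each month/product pair appears exactly once, which pivot() requires because it does not aggregate.

```python
import pandas as pd

long_df = pd.DataFrame({
    "month": ["jan", "jan", "feb", "feb"],
    "product": ["pen", "book", "pen", "book"],
    "units": [3, 5, 4, 6],
})

# One row per month, one column per product, cells filled from 'units'.
wide = long_df.pivot(index="month", columns="product", values="units")

print(wide.loc["jan", "pen"])   # 3
print(wide.loc["feb", "book"])  # 6
```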

Using the pivot_table() function, you can specify the aggregate function that Pandas should apply to each of the value columns. It can be the same for every column or different for each one:

df.pivot_table(values, index, aggfunc={'value_1': 'mean', 'value_2': ['min', 'max', 'mean']})
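
A sketch of per-column aggregation with pivot_table() follows. The dataset is made up, and the aggregations are given as strings ('mean', 'min', 'max'), which is an assumption of this example rather than the only accepted form.

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "value_1": [10, 20, 30, 50],
    "value_2": [1, 3, 5, 9],
})

table = df.pivot_table(
    values=["value_1", "value_2"],
    index="region",
    aggfunc={"value_1": "mean", "value_2": ["min", "max", "mean"]},
)

# Columns are (value column, aggregation) pairs.
print(table.loc["north", ("value_1", "mean")])  # mean of 10 and 20
print(table.loc["south", ("value_2", "max")])   # largest of 5 and 9
```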

You can use the following command to merge two dataframes:

dataframe_1.merge(dataframe_2, on = ['column_1', 'column_2'], how = '____')

The how parameter in the code above specifies the type of merge to be performed:

- left: This keeps every key from the first (left) dataframe and adds the matching entries from the second; where there is no match, the added columns hold NaN.

- right: This keeps every key from the second (right) dataframe and adds the matching entries from the first.

- outer: This takes the union of the keys from both dataframes.

- inner: This takes the intersection of the keys from both dataframes. It is the default.
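
To see how the four merge types differ, the sketch below merges two small invented dataframes on a single key column and lists the keys that survive each merge.

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

keys_by_how = {}
for how in ["left", "right", "outer", "inner"]:
    merged = left.merge(right, on="key", how=how)
    keys_by_how[how] = merged["key"].tolist()
    print(how, keys_by_how[how])
# left ['a', 'b', 'c']
# right ['b', 'c', 'd']
# outer ['a', 'b', 'c', 'd']
# inner ['b', 'c']
```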

You can add columns or rows from one dataframe to another using the concat() function. Set axis = 0 to append rows and axis = 1 to append columns:

pd.concat([dataframe_1, dataframe_2], axis = _)
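
The effect of the axis argument is easiest to see from the resulting shapes. The sketch below uses two invented 2x2 dataframes; ignore_index=True is an optional extra that renumbers the rows after stacking.

```python
import pandas as pd

df_1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df_2 = pd.DataFrame({"a": [5, 6], "b": [7, 8]})

stacked = pd.concat([df_1, df_2], axis=0, ignore_index=True)  # adds rows
side_by_side = pd.concat([df_1, df_2], axis=1)                # adds columns

print(stacked.shape)       # (4, 2)
print(side_by_side.shape)  # (2, 4)
```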
