dropna() Drop missing observations
dropna(how='all') Drop observations where all cells are NA
dropna(axis=1, how='all') Drop columns where all values are missing
dropna(thresh=5) Drop rows that contain fewer than 5 non-missing values
fillna(0) Replace missing values with zeros
isnull() Returns True if the value is missing
notnull() Returns True for non-missing values
describe() Basic statistics (count, mean, std, min, quantiles, max)
min(), max() Minimum and maximum values
mean(), median(), mode() Arithmetic average, median and mode
var(), std() Variance and standard deviation
sem() Standard error of mean
skew() Sample skewness
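kurt() Kurtosis (Fisher's definition: a normal distribution gives 0)
distplot() Histogram (deprecated since seaborn 0.11; use histplot() or displot())
barplot() Estimate of central tendency (the mean by default) for a numeric variable
violinplot() Similar to boxplot(), but also shows the probability density of the data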
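The statistics rows of the list above (describe() through kurt()) can be tried on a small Series. This is a minimal sketch, assuming pandas is installed; the values are invented so the mean and median are easy to check by hand.

```python
import pandas as pd

# Invented sample values.
s = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

summary = s.describe()              # count, mean, std, min, 25%, 50%, 75%, max
lo, hi = s.min(), s.max()           # 2 and 9
avg, mid = s.mean(), s.median()     # 5.0 and 4.5
modes = s.mode().tolist()           # [4]; mode() returns a Series because ties are possible
variance, stdev = s.var(), s.std()  # sample statistics (ddof=1 by default)
std_err = s.sem()                   # stdev / sqrt(count)
skewness = s.skew()                 # sample skewness (positive here: long right tail)
kurtosis = s.kurt()                 # excess kurtosis (a normal distribution gives 0)
```

Because var() and std() divide by n - 1 by default, the variance here is 32/7 rather than the population value 4.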
df.dtypes is an attribute of a DataFrame that returns a Series with the data type of each column.
df.axes is an attribute of a DataFrame that returns a list containing the row axis labels and the column axis labels.
df.ndim is an attribute of a DataFrame that returns the number of dimensions of the DataFrame. For
a DataFrame, it always returns 2.
df.shape is an attribute of a DataFrame that returns a tuple representing the dimensions of the
DataFrame (number of rows, number of columns).
df.values is an attribute of a DataFrame that returns a NumPy representation of the DataFrame's
data as a 2D NumPy array. The pandas documentation recommends df.to_numpy() instead.
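A minimal sketch of these attributes, assuming pandas is installed; the frame and its column names are invented. Since they are attributes, they are read without parentheses, and calling one raises an error.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "age": [31, 25, 40]})

col_types = df.dtypes     # Series mapping each column name to its dtype
labels = df.axes          # [row index, column index]
dims = df.ndim            # 2
size = df.shape           # (3, 2)
arr = df.values           # 2-D NumPy array; df.to_numpy() is the recommended spelling

# Calling an attribute like a method fails: shape is a tuple, and tuples are not callable.
try:
    df.shape()
except TypeError as exc:
    error_message = str(exc)
```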
df.dropna() is a method used to remove rows or columns with missing values (NaN). By default, it
removes rows with any NaN values, but you can specify additional parameters like axis and how to
control the behavior.
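The dropna() variants from the list at the top, plus isnull() and notnull(), can be compared on one small frame. This is a minimal sketch, assuming pandas and NumPy are installed; the data is invented so that each call drops something different.

```python
import numpy as np
import pandas as pd

# Invented data: column "c" is entirely missing, and row 1 is entirely missing.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [4.0, np.nan, np.nan, 8.0],
    "c": [np.nan, np.nan, np.nan, np.nan],
})

any_na = df.dropna()                  # default: drop rows with any NaN (every row here)
all_na = df.dropna(how="all")         # drop only rows where every cell is NaN (row 1)
cols = df.dropna(axis=1, how="all")   # drop columns where every value is NaN (column c)
enough = df.dropna(thresh=2)          # keep rows with at least 2 non-NaN values (row 0)
missing = df.isnull()                 # True where a value is missing
present = df.notnull()                # the element-wise inverse of isnull()
```

All of these return new objects; the original df is left unchanged.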
df.sample([n]) is a method used to randomly select rows from the DataFrame. You can specify the number
of rows (n) or a fraction of the rows (frac); with neither, it returns a single row.
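A minimal sketch of sample(), assuming pandas is installed; the ten-row frame is invented. Passing random_state makes the draw repeatable, which is useful in notebooks and tests.

```python
import pandas as pd

df = pd.DataFrame({"x": range(10)})

three_rows = df.sample(n=3, random_state=0)   # 3 distinct random rows, repeatable via random_state
half = df.sample(frac=0.5, random_state=0)    # a fraction of the rows instead of a count
one_row = df.sample()                         # n defaults to 1 when neither n nor frac is given
```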
df.loc:
df.loc[] is an indexer (used with square brackets, not called like a method) that accesses a group of
rows and columns by label. It is primarily label-based, but may also be used with a boolean array.
df.iloc:
df.iloc[] is an indexer (used with square brackets) for integer-location based indexing: it selects rows
and columns by integer position, and may also be used with a boolean array.
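A minimal sketch contrasting the two indexers, assuming pandas is installed; the labels and values are invented. Note that a label slice with .loc includes its end label, while a position slice with .iloc excludes its end position.

```python
import pandas as pd

df = pd.DataFrame(
    {"city": ["Oslo", "Lima", "Pune"], "temp": [4, 19, 31]},
    index=["a", "b", "c"],
)

by_label = df.loc["a":"b", "temp"]    # rows 'a' through 'b' inclusive, column 'temp'
by_position = df.iloc[0:2, 1]         # positions 0 and 1 (2 is excluded), column position 1
warm = df.loc[df["temp"] > 10]        # boolean mask with .loc: rows 'b' and 'c'
# .iloc rejects a boolean Series carrying an index, so pass a plain array instead.
warm_pos = df.iloc[(df["temp"] > 10).to_numpy()]
```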
df.sort_values:
df.sort_values() is a method used to sort the DataFrame by the values of one or more columns (the by
parameter); with axis=1 it sorts the columns by the values in one or more rows.
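A minimal sketch of sort_values(), assuming pandas is installed; the team/score data is invented. Sorting by several columns takes a list for by and, optionally, a matching list for ascending.

```python
import pandas as pd

df = pd.DataFrame({"team": ["B", "A", "B", "A"], "score": [7, 9, 3, 5]})

by_score = df.sort_values(by="score")                       # ascending by default
by_score_desc = df.sort_values(by="score", ascending=False)
# First by team (A before B), then by score from highest to lowest within each team.
multi = df.sort_values(by=["team", "score"], ascending=[True, False])
```

The original row labels travel with the rows, so multi keeps index labels 1, 3, 0, 2; call reset_index(drop=True) afterwards if a fresh 0..n-1 index is wanted.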
DataFrame():
DataFrame() is a class constructor used to create a new DataFrame object, which is a two-
dimensional, size-mutable, and heterogeneous tabular data structure with labeled axes (rows and
columns).
read_csv() is a pandas function (pd.read_csv()) used to read data from a CSV file, URL, or file-like
object into a DataFrame.
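A minimal sketch of both ways of creating a DataFrame, assuming pandas is installed. An in-memory io.StringIO buffer stands in for a real CSV file here; the column names and rows are invented.

```python
import io
import pandas as pd

# Build a DataFrame directly from a dict of columns.
df = pd.DataFrame({"id": [1, 2], "label": ["x", "y"]})

# read_csv accepts a path, URL, or file-like object; the buffer plays the role of a file.
csv_text = "id,label\n1,x\n2,y\n"
from_csv = pd.read_csv(io.StringIO(csv_text))
```

The first line of the CSV text becomes the column labels by default (header=0).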
head():
head() is a method that returns the first n rows of a DataFrame, where n is the number of rows
specified as an argument (default is 5).
tail():
tail() is a method that returns the last n rows of a DataFrame, where n is the number of rows
specified as an argument (default is 5).
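A minimal sketch of head() and tail(), assuming pandas is installed; the 20-row frame is invented. A negative n is also accepted: head(-3) returns every row except the last three.

```python
import pandas as pd

df = pd.DataFrame({"n": range(1, 21)})   # 20 rows holding 1..20

first_five = df.head()       # n defaults to 5
last_three = df.tail(3)
all_but_last3 = df.head(-3)  # everything except the final 3 rows
```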
info():
info() is a method used to print a concise summary of a DataFrame, including the index dtype,
column dtypes, non-null counts, and memory usage.
describe():
describe() is a method used to generate descriptive statistics of the DataFrame's numerical columns by
default (include='all' adds the other columns), including count, mean, standard deviation, minimum,
maximum, and the 25th, 50th and 75th percentiles.
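A minimal sketch of info() and describe(), assuming pandas and NumPy are installed; the frame is invented. info() prints to standard output by default, so here its buf parameter captures the text instead.

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [9.5, np.nan, 12.0], "item": ["pen", "ink", "pad"]})

buf = io.StringIO()
df.info(buf=buf)                          # writes the summary into buf instead of printing it
summary_text = buf.getvalue()

numeric_stats = df.describe()             # numeric columns only; count ignores NaN
all_stats = df.describe(include="all")    # also summarises the text column (count, unique, top, freq)
```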
shape is an attribute of a DataFrame that returns a tuple representing the dimensions of the
DataFrame (number of rows, number of columns).
columns is an attribute of a DataFrame that returns an Index object containing the column labels.
index is an attribute of a DataFrame that returns an Index object containing the row labels.
set_index() is a method used to set the DataFrame's index (row labels) using one or more existing
columns.
reset_index() is a method used to reset the DataFrame's index (row labels) to the default integer
index and move the old index into a new column.
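A minimal sketch of set_index() and reset_index(), assuming pandas is installed; the column names and values are invented. The two calls are inverses for a simple single-column index.

```python
import pandas as pd

df = pd.DataFrame({"code": ["NO", "PE"], "value": [10, 20]})

indexed = df.set_index("code")      # 'code' becomes the row labels and leaves the columns
restored = indexed.reset_index()    # labels move back into a 'code' column; index is 0..n-1 again
```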
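drop() is a method used to remove rows or columns from a DataFrame by label (pass axis=1, or use the
columns= parameter, to drop columns).
fillna() is a method used to fill missing (NaN) values in a DataFrame with a specified value (a scalar or
a per-column dict) or method; recent pandas versions prefer ffill() and bfill() over the method argument.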
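A minimal sketch of drop() and several ways of filling gaps, assuming pandas and NumPy are installed; the frame, including the "tmp" column, is invented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, 6.0], "tmp": [0, 0, 0]})

no_tmp = df.drop(columns=["tmp"])                   # same as df.drop("tmp", axis=1)
no_first_row = df.drop(index=[0])                   # drop a row by its label
filled_const = df.fillna(0)                         # one value everywhere
filled_per_col = df.fillna({"a": -1.0, "b": 99.0})  # a different fill value per column
filled_forward = df.ffill()                         # carry the last valid value forward
```

Forward filling cannot fill a leading gap, so the first value of column b stays NaN in filled_forward.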
groupby() is a method used to group DataFrame rows by values in one or more columns, allowing for
aggregation and analysis.
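A minimal sketch of groupby() followed by aggregation, assuming pandas is installed; the department and salary data is invented. groupby() itself only builds a GroupBy object; results appear when an aggregation method is called on it.

```python
import pandas as pd

df = pd.DataFrame({
    "dept": ["ops", "dev", "ops", "dev", "dev"],
    "salary": [50, 70, 55, 80, 75],
})

grouped = df.groupby("dept")                                    # GroupBy object, keys sorted
mean_by_dept = grouped["salary"].mean()                         # dev -> 75.0, ops -> 52.5
stats_by_dept = grouped["salary"].agg(["count", "min", "max"])  # several aggregates at once
```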
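jointplot() Scatterplot of two variables with their marginal distributions
regplot() Regression plot
pairplot() Grid of pairwise plots across the numeric variables
boxplot() Box-and-whisker plot
swarmplot() Categorical scatterplot with non-overlapping points
factorplot() General categorical plot (renamed catplot() in seaborn 0.9)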
Pandas Methods Quiz
What does the df.head() method do in Pandas?
A) Returns the first 10 rows of the DataFrame
B) Returns the last 5 rows of the DataFrame
C) Returns the first 5 rows of the DataFrame
D) Returns the last 10 rows of the DataFrame
Answer: C) Returns the first 5 rows of the DataFrame

What does the df.tail(3) method do in Pandas?
A) Returns the first 3 rows of the DataFrame
B) Returns the last 3 rows of the DataFrame
C) Returns the last 5 rows of the DataFrame
D) Returns the first 5 rows of the DataFrame
Answer: B) Returns the last 3 rows of the DataFrame

What does the df.info() method do in Pandas?
A) Provides basic information about the DataFrame, such as the number of rows and columns
B) Returns descriptive statistics of the DataFrame
C) Removes rows with missing values
D) Sorts the DataFrame based on specified columns
Answer: A) Provides basic information about the DataFrame, such as the number of rows and columns

What does the df.describe() method do in Pandas?
A) Provides basic information about the DataFrame, such as the number of rows and columns
B) Returns descriptive statistics of the DataFrame
C) Removes rows with missing values
D) Sorts the DataFrame based on specified columns
Answer: B) Returns descriptive statistics of the DataFrame

What does the df.shape attribute return in Pandas?
A) Number of rows in the DataFrame
B) Number of columns in the DataFrame
C) Dimensions of the DataFrame (number of rows, number of columns)
D) Data types of each column in the DataFrame
Answer: C) Dimensions of the DataFrame (number of rows, number of columns)

What does the df.dropna() method do in Pandas?
A) Removes duplicate rows from the DataFrame
B) Removes rows with missing values
C) Removes specified columns from the DataFrame
D) Sorts the DataFrame based on specified columns
Answer: B) Removes rows with missing values

What does the df.groupby() method do in Pandas?
A) Groups the DataFrame by specified columns and applies an aggregate function
B) Sorts the DataFrame based on specified columns
C) Returns the unique values in a specified column
D) Merges two DataFrames based on common columns
Answer: A) Groups the DataFrame by specified columns and applies an aggregate function

What does the df.sort_values() method do in Pandas?
A) Groups the DataFrame by specified columns and applies an aggregate function
B) Sorts the DataFrame based on specified columns
C) Returns the unique values in a specified column
D) Merges two DataFrames based on common columns
Answer: B) Sorts the DataFrame based on specified columns

What does the df.rename() method do in Pandas?
A) Removes specified columns from the DataFrame
B) Renames specified columns in the DataFrame
C) Returns the unique values in a specified column
D) Merges two DataFrames based on common columns
Answer: B) Renames specified columns in the DataFrame

What does the df.drop_duplicates() method do in Pandas?
A) Removes duplicate rows from the DataFrame
B) Removes rows with missing values
C) Groups the DataFrame by specified columns and applies an aggregate function
D) Sorts the DataFrame based on specified columns
Answer: A) Removes duplicate rows from the DataFrame
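A minimal sketch of the last two answers, assuming pandas is installed; the frame and the new column names are invented. Both methods return a new DataFrame by default.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2], "B": ["x", "x", "y"]})

renamed = df.rename(columns={"A": "id", "B": "tag"})   # old name -> new name
deduped = df.drop_duplicates()                          # keeps the first of each duplicate row
```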

What does the df.mean() method do in Pandas?
A) Computes the mean value of each column in the DataFrame
B) Computes the median value of each column in the DataFrame
C) Computes the mode value of each column in the DataFrame
D) Computes the standard deviation of each column in the DataFrame
Answer: A) Computes the mean value of each column in the DataFrame

What does the df.sum() method do in Pandas?
A) Computes the sum of each row in the DataFrame
B) Computes the sum of each column in the DataFrame
C) Computes the cumulative sum of each column in the DataFrame
D) Computes the product of each column in the DataFrame
Answer: B) Computes the sum of each column in the DataFrame

What does the df.min() method do in Pandas?
A) Returns the minimum value of each column in the DataFrame
B) Returns the minimum value of each row in the DataFrame
C) Returns the index of the minimum value in each column
D) Returns the index of the minimum value in each row
Answer: A) Returns the minimum value of each column in the DataFrame

What does the df.max() method do in Pandas?
A) Returns the maximum value of each column in the DataFrame
B) Returns the maximum value of each row in the DataFrame
C) Returns the index of the maximum value in each column
D) Returns the index of the maximum value in each row
Answer: A) Returns the maximum value of each column in the DataFrame

What does the df.count() method do in Pandas?
A) Counts the number of non-null values in each column of the DataFrame
B) Counts the number of null values in each column of the DataFrame
C) Counts the number of non-null values in each row of the DataFrame
D) Counts the number of null values in each row of the DataFrame
Answer: A) Counts the number of non-null values in each column of the DataFrame
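The column-wise answers for sum() and count() come from the default axis=0; axis=1 switches to rows. This is a minimal sketch, assuming pandas and NumPy are installed, with invented data that includes missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

col_counts = df.count()         # non-null values per column: a -> 2, b -> 2
row_counts = df.count(axis=1)   # per row instead: 2, 1, 1
col_sums = df.sum()             # NaN is skipped: a -> 4.0, b -> 9.0
row_sums = df.sum(axis=1)       # 5.0, 5.0, 3.0
```

min(), max(), mean(), median() and std() follow the same axis convention and also skip NaN by default.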

What does the df.median() method do in Pandas?
A) Computes the median value of each column in the DataFrame
B) Computes the mean value of each column in the DataFrame
C) Computes the mode value of each column in the DataFrame
D) Computes the standard deviation of each column in the DataFrame
Answer: A) Computes the median value of each column in the DataFrame

What does the df.std() method do in Pandas?
A) Computes the standard deviation of each column in the DataFrame
B) Computes the mean value of each column in the DataFrame
C) Computes the mode value of each column in the DataFrame
D) Computes the median value of each column in the DataFrame
Answer: A) Computes the standard deviation of each column in the DataFrame

What does the df['col'].unique() method do in Pandas? (unique() is a Series method, so it is called on a single column.)
A) Returns the unique values in a specified column
B) Removes duplicate rows from the DataFrame
C) Computes the mean value of each column in the DataFrame
D) Computes the sum of each column in the DataFrame
Answer: A) Returns the unique values in a specified column

What does the df.nunique() method do in Pandas?
A) Counts the number of unique values in each column of the DataFrame
B) Returns the unique values in a specified column
C) Removes duplicate rows from the DataFrame
D) Computes the sum of each column in the DataFrame
Answer: A) Counts the number of unique values in each column of the DataFrame

What does the df['col'].value_counts() method do in Pandas? (On a whole DataFrame, df.value_counts() counts unique rows instead.)
A) Counts the number of unique values in each column of the DataFrame
B) Returns the unique values in a specified column
C) Computes the sum of each column in the DataFrame
D) Counts the occurrences of each unique value in a specified column
Answer: D) Counts the occurrences of each unique value in a specified column
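The last three questions are easy to mix up, so here they are side by side. This is a minimal sketch, assuming pandas is installed; the column names and values are invented. unique() and value_counts() are called on a single column (a Series), while nunique() works across the whole DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    "color": ["red", "blue", "red", "red", "green"],
    "size": [1, 2, 1, 3, 2],
})

colors = df["color"].unique()          # distinct values in order of first appearance
distinct_per_col = df.nunique()        # color -> 3, size -> 3
counts = df["color"].value_counts()    # red -> 3, then blue and green with 1 each
```

value_counts() sorts by frequency (most common first) by default, and nunique() excludes NaN unless dropna=False is passed.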