Pandas Library Documentation: Core Data Structures and Methods
May 31, 2025
1 Introduction
Pandas is a powerful Python library for data manipulation and analysis. It provides two primary
data structures: Series and DataFrame, which are built on top of NumPy arrays. This document
outlines these data structures, their key methods, and commonly used keyword arguments, with
explanations for practical use.
2 Pandas Series
A Series is a one-dimensional labeled array capable of holding data of any type (integers, strings,
floats, etc.). It is similar to a column in a spreadsheet or a vector with labels (index).
2.1 Key Attributes
• values: Returns the underlying data as a NumPy array (or an array-like object for extension dtypes).
• index: Returns the index (labels) of the Series.
• dtype: Returns the data type of the Series elements.
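As a minimal sketch of these attributes, the snippet below builds a small Series and reads each one. The sample data and variable names are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

values = s.values        # underlying data as a NumPy array
labels = list(s.index)   # index labels as a plain list
dtype = s.dtype          # integer dtype inferred from the input

print(values, labels, dtype)
```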
2.2 Key Methods
• Series(data=None, index=None, dtype=None, name=None, copy=False)
– Description: Creates a new Series.
– Parameters:
* data: Input data (list, dict, ndarray, scalar, etc.).
* index: Index labels (default: range index).
* dtype: Data type for the Series (e.g., int64, float64).
* name: Name of the Series (useful when converting to DataFrame).
* copy: If True, copies input data (default: False).
– Example: pd.Series([1, 2, 3], index=['a', 'b', 'c'], name='Numbers')
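The constructor parameters above can be sketched as follows. This assumes pandas is imported as pd; the variable names and values are illustrative, not part of the API.

```python
import pandas as pd

# From a list with explicit index labels and a name.
from_list = pd.Series([1, 2, 3], index=["a", "b", "c"], name="Numbers")

# From a dict: the keys become the index labels.
from_dict = pd.Series({"a": 1, "b": 2, "c": 3}, name="Numbers")

# Forcing a dtype at construction time.
as_float = pd.Series([1, 2, 3], dtype="float64")

# The Series name becomes the column label after conversion.
frame = from_list.to_frame()

print(from_dict)
print(as_float.dtype)
print(list(frame.columns))
```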
• head(n=5)
– Description: Returns the first n rows.
– Parameters:
* n: Number of rows to return (default: 5).
– Example: series.head(3) returns the first 3 elements.
• tail(n=5)
– Description: Returns the last n rows.
– Parameters:
* n: Number of rows to return (default: 5).
– Example: series.tail(2) returns the last 2 elements.
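A short sketch of head and tail on a ten-element Series; the data is a hypothetical range chosen so the results are easy to check.

```python
import pandas as pd

# Hypothetical data: the integers 0 through 9.
s = pd.Series(range(10))

first_three = s.head(3)   # first 3 elements
last_two = s.tail(2)      # last 2 elements
default_head = s.head()   # n defaults to 5

print(first_three.tolist(), last_two.tolist(), len(default_head))
```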
• describe(percentiles=None, include=None, exclude=None)
– Description: Generates descriptive statistics (count, mean, std, min, max, etc.).
– Parameters:
* percentiles: List of percentiles to include (e.g., [0.25, 0.5, 0.75]).
* include: Data types to include (e.g., 'all', [np.number]); ignored for a Series.
* exclude: Data types to exclude (e.g., [object]); ignored for a Series.
– Example: series.describe() summarizes numerical data.
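The following sketch shows describe with its defaults and with custom percentiles. The sample values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical numeric data.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Default summary: count, mean, std, min, quartiles, max.
summary = s.describe()

# Request the 10th and 90th percentiles instead of the quartiles.
custom = s.describe(percentiles=[0.1, 0.9])

print(summary)
print(custom)
```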
• mean(axis=0, skipna=True, numeric_only=False)
– Description: Computes the mean of the Series.
– Parameters:
* axis: Axis to compute (0 for Series, default).
* skipna: Exclude NA/NaN values (default: True).
* numeric_only: Include only numeric data (default: False).
– Example: series.mean() returns the average value.
• sum(axis=0, skipna=True, numeric_only=False)
– Description: Computes the sum of the Series.
– Parameters: Same as mean.
– Example: series.sum() returns the total sum.
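To illustrate how skipna affects mean and sum, here is a small sketch on a Series containing one missing value. The data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
s = pd.Series([1.0, 2.0, np.nan, 4.0])

mean_default = s.mean()             # NaN is skipped: (1 + 2 + 4) / 3
total = s.sum()                     # NaN is skipped: 1 + 2 + 4
mean_with_nan = s.mean(skipna=False)  # NaN propagates to the result

print(mean_default, total, mean_with_nan)
```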
• value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)
– Description: Counts unique values and returns a Series.
– Parameters:
* normalize: If True, returns relative frequencies (default: False).
* sort: Sort by frequency (default: True).
* ascending: Sort in ascending order (default: False).
* bins: Group data into bins for numeric data (default: None).
* dropna: Exclude NA/NaN values (default: True).
– Example: series.value_counts() counts occurrences of each value.
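A brief sketch of the value_counts parameters described above, using hypothetical data with one missing entry.

```python
import pandas as pd

# Hypothetical categorical data with one missing value.
s = pd.Series(["a", "b", "a", None, "a", "b"])

counts = s.value_counts()                 # absolute counts, NaN dropped
shares = s.value_counts(normalize=True)   # relative frequencies, NaN dropped
with_na = s.value_counts(dropna=False)    # missing values get their own row

# bins only applies to numeric data: split six numbers into two intervals.
nums = pd.Series([1, 2, 3, 4, 5, 6])
binned = nums.value_counts(bins=2)

print(counts)
print(shares)
print(with_na)
print(binned)
```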
• sort_values(ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False)
– Description: Sorts the Series by values.
– Parameters:
* ascending: Sort in ascending order (default: True).
* inplace: If True, modifies the Series in place (default: False).
* kind: Sorting algorithm ('quicksort', 'mergesort', 'heapsort', default: 'quicksort').
* na_position: Position of NA/NaN values ('first', 'last', default: 'last').
* ignore_index: If True, relabels the result with a default integer index 0, 1, ..., n - 1 (default: False).
– Example: series.sort_values(ascending=False) sorts in descending order.
• sort_index(ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False)
– Description: Sorts the Series by index labels.
– Parameters: Same as sort_values.
– Example: series.sort_index() sorts index in ascending order.
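The sorting methods above can be sketched on a small Series with one missing value. The data and labels are assumptions for illustration; note that the original Series is left unchanged because inplace defaults to False.

```python
import numpy as np
import pandas as pd

# Hypothetical data with unordered labels and one NaN.
s = pd.Series([3.0, np.nan, 1.0, 2.0], index=["d", "b", "a", "c"])

by_value = s.sort_values()                                   # NaN goes last
descending = s.sort_values(ascending=False, na_position="first")  # NaN goes first
renumbered = s.sort_values(ignore_index=True)                # index becomes 0..3
by_label = s.sort_index()                                    # sorted by labels

print(list(by_value.index))
print(list(descending.index))
print(list(renumbered.index))
print(list(by_label.index))
```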
3 Pandas DataFrame
A DataFrame is a two-dimensional, tabular data structure with labeled axes (rows and columns).
It is similar to a spreadsheet or SQL table.
3.1 Key Attributes
• columns: Returns the column labels.
• index: Returns the row labels.
• shape: Returns a tuple of (rows, columns).
• dtypes: Returns the data types of each column.
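A minimal sketch of these attributes on a small DataFrame; the column names, labels, and values are hypothetical.

```python
import pandas as pd

# Hypothetical 3x2 table with one integer and one float column.
df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [0.5, 1.5, 2.5]},
    index=["x", "y", "z"],
)

columns = list(df.columns)   # column labels
rows = list(df.index)        # row labels
shape = df.shape             # (number of rows, number of columns)
dtypes = df.dtypes           # one dtype per column, returned as a Series

print(columns, rows, shape)
print(dtypes)
```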
3.2 Key Methods
• DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)
– Description: Creates a new DataFrame.
– Parameters:
* data: Input data (dict, list, ndarray, Series, etc.).
* index: Row labels (default: range index).
* columns: Column labels (default: inferred from data).
* dtype: Data type for the DataFrame.
* copy: Copy input data (default: depends on input).
– Example: pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['x', 'y'])
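As a sketch of the different input types the constructor accepts, the snippet below builds the same table three ways. The variable names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# From a dict of columns.
from_dict = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["x", "y"])

# From a list of row dicts: the keys become column labels.
from_records = pd.DataFrame(
    [{"A": 1, "B": 3}, {"A": 2, "B": 4}],
    index=["x", "y"],
)

# From a 2-D NumPy array: column labels must be supplied explicitly.
from_array = pd.DataFrame(
    np.array([[1, 3], [2, 4]]),
    index=["x", "y"],
    columns=["A", "B"],
)

print(from_dict)
```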
• head(n=5)
– Description: Returns the first n rows.
– Parameters:
* n: Number of rows (default: 5).
– Example: df.head() shows the first 5 rows.
• tail(n=5)
– Description: Returns the last n rows.
– Parameters: Same as head.
– Example: df.tail(3) shows the last 3 rows.
• describe(percentiles=None, include=None, exclude=None)
– Description: Generates descriptive statistics for numeric columns.
– Parameters: Same as Series.describe.
– Example: df.describe() summarizes all numeric columns.
• groupby(by=None, axis=0, level=None, as_index=True, sort=True, dropna=True)
– Description: Groups data by specified columns for aggregation.
– Parameters:
* by: Column(s) or function to group by.
* axis: Axis to group along (default: 0); deprecated in recent pandas releases.
* level: Group by index level (if multi-index).
* as_index: Return group labels as index (default: True).
* sort: Sort group keys (default: True).
* dropna: Drop NA/NaN in group keys (default: True).
– Example: df.groupby('column').mean() computes mean per group.
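A small sketch of grouping and aggregating, including the effect of as_index. The column names team and score are hypothetical.

```python
import pandas as pd

# Hypothetical data: two teams with two scores each.
df = pd.DataFrame(
    {"team": ["a", "b", "a", "b"], "score": [1.0, 2.0, 3.0, 4.0]}
)

# Default as_index=True: group labels become the index of the result.
means = df.groupby("team")["score"].mean()

# as_index=False: group labels stay as a regular column.
flat = df.groupby("team", as_index=False)["score"].mean()

print(means)
print(flat)
```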
• merge(right, how='inner', on=None, left_on=None, right_on=None, suffixes=('_x', '_y'))
– Description: Merges two DataFrames using database-style joins.
– Parameters:
* right: DataFrame to merge with.
* how: Type of merge ('left', 'right', 'outer', 'inner', default: 'inner').
* on: Column(s) to join on.
* left_on: Columns from left DataFrame.
* right_on: Columns from right DataFrame.
* suffixes: Suffixes for overlapping column names (default: ('_x', '_y')).
– Example: df1.merge(df2, on='key', how='left') performs a left join.
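The join types and the suffix handling can be sketched with two small frames that share a key column and an overlapping value column. All names and values here are assumptions for illustration.

```python
import pandas as pd

# Hypothetical frames that overlap on keys 2 and 3.
left = pd.DataFrame({"key": [1, 2, 3], "value": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "value": ["x", "y", "z"]})

# Inner join keeps only keys present in both frames; the overlapping
# "value" column gets the default suffixes _x and _y.
inner = left.merge(right, on="key")

# Left join keeps every row of `left`; unmatched rows get NaN.
left_join = left.merge(right, on="key", how="left")

# Outer join keeps keys from either frame.
outer = left.merge(right, on="key", how="outer")

# Custom suffixes for the overlapping column.
renamed = left.merge(right, on="key", suffixes=("_left", "_right"))

print(inner)
print(left_join)
```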
• to_csv(path_or_buf=None, sep=',', index=True, encoding=None)
– Description: Writes DataFrame to a CSV file.
– Parameters:
* path_or_buf: File path or buffer (default: None, returns the CSV as a string).
* sep: Delimiter (default: comma).
* index: Write row labels (default: True).
* encoding: Encoding for output file (default: None, treated as 'utf-8').
– Example: df.to_csv('output.csv') saves DataFrame to CSV.
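As a sketch that avoids touching the file system, the snippet below leaves path_or_buf as None so that to_csv returns the CSV text directly. The frame contents are hypothetical.

```python
import pandas as pd

# Hypothetical 2x2 table with row labels.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["x", "y"])

# No path given: the CSV is returned as a string, row labels included.
csv_text = df.to_csv()

# Drop the row labels and use a semicolon as the delimiter.
no_index = df.to_csv(index=False, sep=";")

print(csv_text)
print(no_index)
```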
• sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False)
– Description: Sorts the DataFrame by specified column(s).
– Parameters:
* by: Column name(s) to sort by (single string or list).
* axis: Axis to sort (0 for rows, 1 for columns, default: 0).
* ascending: Sort ascending if True, descending if False (default: True).
* inplace: If True, modifies DataFrame in place (default: False).
* kind: Sorting algorithm ('quicksort', 'mergesort', 'heapsort', default: 'quicksort').