Python Chrat Book Pandas

it is pandas book

Uploaded by

murnalinikulkarn

Pandas Library Documentation: Core Data Structures and Methods

May 31, 2025

1 Introduction
Pandas is a powerful Python library for data manipulation and analysis. It provides two primary
data structures: Series and DataFrame, which are built on top of NumPy arrays. This document
outlines these data structures, their key methods, and commonly used keyword arguments, with
explanations for practical use.

2 Pandas Series
A Series is a one-dimensional labeled array capable of holding data of any type (integers, strings,
floats, etc.). It is similar to a column in a spreadsheet or a vector with labels (index).

2.1 Key Attributes


• values: Returns the underlying NumPy array.
• index: Returns the index (labels) of the Series.
• dtype: Returns the data type of the Series elements.
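The three attributes above can be inspected on a small Series. This is a minimal sketch; the sample data and the variable names (series, values, labels, kind) are made up for illustration, and the dtype is set explicitly so the result does not depend on the platform.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration.
series = pd.Series([1, 2, 3], index=["a", "b", "c"], name="Numbers", dtype="int64")

values = series.values        # the data as a NumPy array
labels = list(series.index)   # the index labels as a plain Python list
kind = series.dtype           # the element data type

print(values)
print(labels)
print(kind)
```

These are attributes, not methods, so they are read without parentheses.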

2.2 Key Methods


• Series(data=None, index=None, dtype=None, name=None, copy=False)
– Description: Creates a new Series.
– Parameters:

* data: Input data (list, dict, ndarray, scalar, etc.).

* index: Index labels (default: range index).

* dtype: Data type for the Series (e.g., int64, float64).

* name: Name of the Series (useful when converting to DataFrame).

* copy: If True, copies input data (default: False).


– Example: pd.Series([1, 2, 3], index=['a', 'b', 'c'], name='Numbers')
• head(n=5)
– Description: Returns the first n rows.
– Parameters:

* n: Number of rows to return (default: 5).


– Example: series.head(3) returns the first 3 elements.
• tail(n=5)
– Description: Returns the last n rows.
– Parameters:

* n: Number of rows to return (default: 5).


– Example: series.tail(2) returns the last 2 elements.
• describe(percentiles=None, include=None, exclude=None)
– Description: Generates descriptive statistics (count, mean, std, min, max, etc.).
– Parameters:

* percentiles: List of percentiles to include (e.g., [0.25, 0.5, 0.75]).

* include: Data types to include (e.g., 'all', [np.number]).

* exclude: Data types to exclude (e.g., [object]; the np.object alias was removed in NumPy 1.24).


– Example: series.describe() summarizes numerical data.
• mean(axis=0, skipna=True, numeric_only=False)
– Description: Computes the mean of the Series.
– Parameters:
* axis: Axis to compute (0 for Series, default).
* skipna: Exclude NA/NaN values (default: True).
* numeric_only: Include only numeric data (default: False).
– Example: series.mean() returns the average value.
• sum(axis=0, skipna=True, numeric_only=False)
– Description: Computes the sum of the Series.
– Parameters: Same as mean.
– Example: series.sum() returns the total sum.
• value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)
– Description: Counts unique values and returns a Series.
– Parameters:
* normalize: If True, returns relative frequencies (default: False).
* sort: Sort by frequency (default: True).
* ascending: Sort in ascending order (default: False).
* bins: Group data into bins for numeric data (default: None).
* dropna: Exclude NA/NaN values (default: True).
– Example: series.value_counts() counts occurrences of each value.
• sort_values(ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False)
– Description: Sorts the Series by values.
– Parameters:
* ascending: Sort in ascending order (default: True).
* inplace: If True, modifies the Series in place (default: False).
* kind: Sorting algorithm ('quicksort', 'mergesort', 'heapsort', default: 'quicksort').
* na_position: Position of NA/NaN values ('first', 'last', default: 'last').
* ignore_index: If True, relabels the result 0, 1, ..., n-1 after sorting (default: False).
– Example: series.sort_values(ascending=False) sorts in descending order.
• sort_index(ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False)
– Description: Sorts the Series by index labels.
– Parameters: Same as sort_values.
– Example: series.sort_index() sorts index in ascending order.
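The Series methods listed above can be tried together on one small example. This is a minimal sketch, not a complete tour: the sample values, labels, and variable names are invented for illustration, and one value is deliberately missing to show how skipna, dropna, and na_position behave.

```python
import pandas as pd

# Hypothetical sample data; one value is missing on purpose.
series = pd.Series([3.0, 1.0, None, 1.0], index=["d", "b", "c", "a"], name="Numbers")

first_two = series.head(2)   # the first two elements (labels d, b)
last_two = series.tail(2)    # the last two elements (labels c, a)
summary = series.describe()  # count, mean, std, min, quartiles, max

mean_skip = series.mean()              # NaN is skipped by default
mean_keep = series.mean(skipna=False)  # NaN propagates into the result
total = series.sum()                   # NaN is skipped by default

counts = series.value_counts()                      # NaN excluded by default
shares = series.value_counts(normalize=True)        # relative frequencies
counts_with_na = series.value_counts(dropna=False)  # NaN counted as its own value

# Sort by value, largest first, with the missing value placed at the top.
by_value = series.sort_values(ascending=False, na_position="first")

# Sort by index label instead of by value.
by_label = series.sort_index()

# Sort by value and discard the original labels in favor of 0..n-1.
renumbered = series.sort_values(ignore_index=True)

print(summary)
print(by_value)
print(by_label)
```

None of these calls modifies series, because inplace defaults to False; each returns a new object.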

3 Pandas DataFrame
A DataFrame is a two-dimensional, tabular data structure with labeled axes (rows and columns).
It is similar to a spreadsheet or SQL table.

3.1 Key Attributes


• columns: Returns the column labels.
• index: Returns the row labels.
• shape: Returns a tuple of (rows, columns).
• dtypes: Returns the data types of each column.
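The four attributes above can be read from a small DataFrame. This is a minimal sketch; the column names, row labels, and values are made up for illustration.

```python
import pandas as pd

# Hypothetical two-row table, used only for illustration.
df = pd.DataFrame({"A": [1, 2], "B": [3.5, 4.5]}, index=["x", "y"])

cols = list(df.columns)  # column labels
rows = list(df.index)    # row labels
shape = df.shape         # (number of rows, number of columns)
dtypes = df.dtypes       # a Series mapping each column to its dtype

print(cols)
print(rows)
print(shape)
print(dtypes)
```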

3.2 Key Methods


• DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)
– Description: Creates a new DataFrame.
– Parameters:

* data: Input data (dict, list, ndarray, Series, etc.).

* index: Row labels (default: range index).

* columns: Column labels (default: inferred from data).

* dtype: Data type for the DataFrame.

* copy: Copy input data (default: depends on input).


– Example: pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['x', 'y'])
• head(n=5)
– Description: Returns the first n rows.
– Parameters:

* n: Number of rows (default: 5).


– Example: df.head() shows the first 5 rows.
• tail(n=5)
– Description: Returns the last n rows.
– Parameters: Same as head.
– Example: df.tail(3) shows the last 3 rows.
• describe(percentiles=None, include=None, exclude=None)
– Description: Generates descriptive statistics for numeric columns.
– Parameters: Same as Series.describe.
– Example: df.describe() summarizes all numeric columns.
• groupby(by=None, axis=0, level=None, as_index=True, sort=True, dropna=True)
– Description: Groups data by specified columns for aggregation.
– Parameters:

* by: Column(s) or function to group by.

* axis: Axis to group along (default: 0; deprecated since pandas 2.1).

* level: Group by index level (if multi-index).

* as_index: Return group labels as index (default: True).

* sort: Sort group keys (default: True).

* dropna: Drop NA/NaN in group keys (default: True).


– Example: df.groupby('column').mean() computes the mean per group (in pandas 2.0 and later, pass numeric_only=True if the remaining columns are not all numeric).
• merge(right, how='inner', on=None, left_on=None, right_on=None, suffixes=('_x', '_y'))
– Description: Merges two DataFrames using database-style joins.
– Parameters:
* right: DataFrame to merge with.
* how: Type of merge ('left', 'right', 'outer', 'inner', default: 'inner').
* on: Column(s) to join on.
* left_on: Columns from left DataFrame.
* right_on: Columns from right DataFrame.
* suffixes: Suffixes for overlapping column names (default: ('_x', '_y')).
– Example: df1.merge(df2, on='key', how='left') performs a left join.
• to_csv(path_or_buf=None, sep=',', index=True, encoding=None)
– Description: Writes DataFrame to a CSV file.
– Parameters:
* path_or_buf: File path or buffer (default: None, returns the CSV text as a string).
* sep: Delimiter (default: comma).
* index: Write row labels (default: True).
* encoding: Encoding for output file (default: None, which is treated as 'utf-8').
– Example: df.to_csv('output.csv') saves DataFrame to CSV.
• sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False)
– Description: Sorts the DataFrame by specified column(s).
– Parameters:
* by: Column name(s) to sort by (single string or list).
* axis: Axis to sort (0 for rows, 1 for columns, default: 0).
* ascending: Sort ascending if True, descending if False (default: True).
* inplace: If True, modifies DataFrame in place (default: False).
* kind: Sorting algorithm ('quicksort', 'mergesort', 'heapsort', default: 'quicksort').
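The groupby, merge, sort_values, and to_csv entries above can be combined in one short example. This is a minimal sketch under stated assumptions: the two tables (orders, labels) and their column names are hypothetical, invented only to show how the documented parameters interact.

```python
import pandas as pd

# Hypothetical tables, invented for illustration.
orders = pd.DataFrame({
    "key": ["a", "b", "a", "c"],
    "value": [10, 20, 30, 40],
})
labels = pd.DataFrame({
    "key": ["a", "b"],
    "value": [1, 2],
})

# groupby: mean of "value" per key; the keys become the index (as_index=True).
per_key = orders.groupby("key")["value"].mean()

# merge: a left join keeps every row of `orders`. Both tables have a "value"
# column, so the result uses the default suffixes "_x" (left) and "_y" (right).
joined = orders.merge(labels, on="key", how="left")

# sort_values: descending by "value_x", relabeled 0..n-1 via ignore_index.
ranked = joined.sort_values(by="value_x", ascending=False, ignore_index=True)

# to_csv: with no path given, the CSV text is returned as a string.
csv_text = orders.to_csv(index=False)

print(per_key)
print(ranked)
print(csv_text)
```

Key "c" has no match in labels, so its value_y is NaN after the left join; an inner join (the default) would drop that row instead.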
