Data Handling
Dealing with Duplicates
▪ The drop_duplicates() method in Pandas removes duplicate rows from a
DataFrame, comparing either all columns or a chosen subset of them. By default,
it compares entire rows, retains the first occurrence of each row, and removes
any duplicates that follow.
▪ This example shows how duplicate rows are removed while retaining the first
occurrence using dataframe.drop_duplicates().
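The behaviour described above can be sketched as follows. This is a minimal illustration, not the original slide's example: the DataFrame, its column names ("name", "city"), and its values are hypothetical and chosen only to show the default call and the subset variant.

```python
import pandas as pd

# Hypothetical sample data: rows 0 and 2 are identical in every column.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Alice", "Bob"],
    "city": ["Paris", "Rome", "Paris", "Oslo"],
})

# Default behaviour: compare all columns and keep the first occurrence.
deduped = df.drop_duplicates()

# Compare only the "name" column; later rows with a repeated name are dropped.
by_name = df.drop_duplicates(subset=["name"])

print(deduped)
print(by_name)
```

drop_duplicates() returns a new DataFrame and leaves df unchanged. The keep parameter controls which copy survives: keep="last" retains the last occurrence, and keep=False drops every row that has a duplicate.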
Scaling Data (Normalization)
Aggregating, summarizing, and grouping data in Pandas from
https://www.geeksforgeeks.org/python/pandas-groupby-summarising-aggregating-and-grouping-data-in-python/
Understanding and creating boxplots for outlier detection from
https://www.geeksforgeeks.org/python/finding-the-outlier-points-from-matplotlib/
Working with Missing Data in Pandas from
https://www.geeksforgeeks.org/data-analysis/working-with-missing-data-in-pandas/