BASIC STATISTICS

Descriptive Analytics and Data Preprocessing on Sales & Discounts Dataset

Introduction
● To perform descriptive analytics, visualize data distributions, and preprocess
the dataset for further analysis.

Descriptive Analytics for Numerical Columns


● Objective: To compute and analyze basic statistical measures for numerical
columns in the dataset.
● Steps:
Load the dataset into a data analysis tool or programming environment
(e.g., Python with the pandas library).
Identify numerical columns in the dataset.
Calculate the mean, median, mode, and standard deviation for these
columns.
Provide a brief interpretation of these statistics.
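
The steps above can be sketched in Python with pandas. This is a minimal illustration, not the assignment's reference solution: the column names (Volume, AvgPrice, Brand) and the values are hypothetical stand-ins for the Sales & Discounts dataset, which in practice would be loaded from its file with pd.read_csv.

```python
import pandas as pd

# Hypothetical stand-in for the Sales & Discounts dataset (assumed columns and
# values); the real data would be loaded with pd.read_csv on the dataset file.
df = pd.DataFrame({
    "Volume": [10, 12, 12, 15, 40],
    "AvgPrice": [100.0, 250.0, 250.0, 400.0, 1000.0],
    "Brand": ["A", "B", "A", "C", "B"],
})

# Identify numerical columns by dtype.
numeric_cols = df.select_dtypes(include="number").columns.tolist()

# Mean, median, mode, and sample standard deviation for each numerical column.
# mode() can return several rows when there are ties; iloc[0] keeps the first.
summary = pd.DataFrame({
    "mean": df[numeric_cols].mean(),
    "median": df[numeric_cols].median(),
    "mode": df[numeric_cols].mode().iloc[0],
    "std": df[numeric_cols].std(),
})
print(summary)
```

For interpretation, a mean well above the median (as in both toy columns here) usually points to a right-skewed distribution or a few large values pulling the mean up.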

Data Visualization
● Objective: To visualize the distribution and relationship of numerical and
categorical variables in the dataset.
● Histograms:
Plot histograms for each numerical column.
Analyze the distribution (e.g., skewness, presence of outliers) and provide
inferences.
● Boxplots:
Create boxplots for numerical variables to identify outliers and the
interquartile range.
Discuss any findings, such as extreme values or unusual distributions.
● Bar Chart Analysis for Categorical Column:
Identify categorical columns in the dataset.
Create bar charts to visualize the frequency or count of each category.
Analyze the distribution of categories and provide insights.
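
One possible way to produce the three plot types is sketched below with pandas and matplotlib. The data, the column names, and the output file name distributions.png are assumptions made for illustration. The skewness value and the 1.5 x IQR rule are added as numeric companions to the plots; the 1.5 x IQR rule is the usual convention behind boxplot whiskers, not something the assignment prescribes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in data (assumed columns and values).
df = pd.DataFrame({
    "Volume": [10, 12, 12, 15, 40],
    "Brand": ["A", "B", "A", "C", "B"],
})

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

# Histogram: shape of the distribution of one numerical column.
df["Volume"].plot.hist(bins=5, ax=axes[0], title="Histogram of Volume")

# Boxplot: median, interquartile range, and points beyond the whiskers.
df.boxplot(column="Volume", ax=axes[1])
axes[1].set_title("Boxplot of Volume")

# Bar chart: frequency of each category.
counts = df["Brand"].value_counts()
counts.plot.bar(ax=axes[2], title="Count per Brand")

fig.tight_layout()
fig.savefig("distributions.png")  # hypothetical output file name

# Numeric companions to the plots: skewness and the 1.5 * IQR outlier rule.
skew = df["Volume"].skew()
q1, q3 = df["Volume"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["Volume"] < q1 - 1.5 * iqr) | (df["Volume"] > q3 + 1.5 * iqr)]
print(f"skewness: {skew:.2f}")
print(outliers)
```

In this toy data the single large Volume value shows up both as a long right tail in the histogram and as a point beyond the upper whisker of the boxplot.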

Standardization of Numerical Variables


● Objective: To scale numerical variables for uniformity, improving the dataset’s
suitability for analytical models.
● Steps:
Explain the concept of standardization (z-score normalization).
Standardize the numerical columns using the formula: z = (x - mu) / sigma,
where mu is the column mean and sigma is the column standard deviation.

Show before and after comparisons of the data distributions.
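
A minimal sketch of z-score standardization with pandas follows. The data and column names are hypothetical, and the use of the population standard deviation (ddof=0) is an assumption; using the sample standard deviation (ddof=1, the pandas default) is equally common and changes the result only slightly.

```python
import pandas as pd

# Hypothetical stand-in data (assumed columns and values).
df = pd.DataFrame({
    "Volume": [10, 12, 12, 15, 40],
    "AvgPrice": [100.0, 250.0, 250.0, 400.0, 1000.0],
})
numeric_cols = ["Volume", "AvgPrice"]

# Column-wise mean and standard deviation.
mu = df[numeric_cols].mean()
sigma = df[numeric_cols].std(ddof=0)  # assumption: population standard deviation

# z = (x - mu) / sigma, applied to every value in each column.
standardized = (df[numeric_cols] - mu) / sigma

# Before/after comparison: each column should end with mean 0 and std 1.
comparison = pd.DataFrame({
    "mean_before": mu,
    "std_before": sigma,
    "mean_after": standardized.mean(),
    "std_after": standardized.std(ddof=0),
})
print(comparison)
```

Standardization shifts and rescales each column but does not change the shape of its distribution, so a histogram of the standardized values looks the same as before apart from the axis scale.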

Conversion of Categorical Data into Dummy Variables


● Objective: To transform categorical variables into a format that can be
provided to ML algorithms.
● Steps:
Discuss the need for converting categorical data into dummy variables
(one-hot encoding).
Apply one-hot encoding to the categorical columns, creating binary (0 or
1) columns for each category.
Display a portion of the transformed dataset.
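
One-hot encoding can be sketched with pandas' get_dummies function. The data and the column names (Brand, Day) below are hypothetical; treating every non-numerical column as categorical is also an assumption that should be checked against the real dataset.

```python
import pandas as pd

# Hypothetical stand-in data (assumed columns and values).
df = pd.DataFrame({
    "Volume": [10, 12, 12],
    "Brand": ["A", "B", "A"],
    "Day": ["Mon", "Tue", "Mon"],
})

# Assumption: every non-numerical column is categorical.
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# Replace each categorical column with one 0/1 column per category.
encoded = pd.get_dummies(df, columns=categorical_cols, dtype=int)

# Display a portion of the transformed dataset.
print(encoded.head())
```

Passing drop_first=True to get_dummies drops one category per column, which is sometimes preferred for linear models because the full set of dummy columns is perfectly collinear.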

Conclusion
● Summarize the key findings from the descriptive analytics and data
visualizations.
● Reflect on the importance of data preprocessing steps like standardization
and one-hot encoding in data analysis and machine learning.
