Data Visualization using Python: Analyzing and Visualizing Sales Data for Retail Trend Analysis
Introduction
Group members
Nurazureen Binti Yahya
Mohammad Nizwan bin Mohd Nasir
Chuay Zi Yang
Project Goals
Objective
✓ To analyze and visualize sales data for a retail company to identify trends and patterns
Constraint
✓ Poor data quality can make it difficult to visualize data accurately or highlight trends and patterns
✓ Limited numerical data set
CRISP-DM Methodology
Technical Stacks
Python application
Microsoft Excel
Microsoft PowerPoint
Application and Software Packages used
Project Architecture and Data Preparation
Understand the business problem
Data preparation
• The dataset contains information about the sales of a retail company, including the date of sale, the product sold, the quantity, the price, and the total revenue
Data cleaning
• Clean and process the data by removing duplicates, handling missing values, and transforming the data if necessary
Exploratory data analysis
Deployment
Model Building
Data sources
Python libraries
Syntax:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
o Import the relevant data sources into Python using appropriate libraries and functions
o Add encoding='latin1' so that pandas is able to load the Latin characters present in the dataset
Import data sources into Python
Syntax:
data = pd.read_csv("C:/Users/Desktop/Desktop/Data Science/Innodatatics Python Project/sales_data_sample.csv",
                   encoding='latin1')
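The effect of encoding='latin1' can be sketched on a tiny in-memory file. This is only an illustration: the two rows, the CUSTOMERNAME column, and the variable names are made up for the example and are not taken from the project data.

```python
import io
import pandas as pd

# Hypothetical two-row sample; "é" is stored as the single Latin-1 byte 0xE9.
raw_bytes = "CUSTOMERNAME,SALES\nCafé Lyon,2871.0\nMini Wheels Co.,2765.9\n".encode("latin1")

# Decoding the same bytes as UTF-8 fails, because 0xE9 followed by a space
# is not a valid UTF-8 sequence.
try:
    raw_bytes.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# With encoding='latin1' every byte maps to a character, so the file loads.
sample = pd.read_csv(io.BytesIO(raw_bytes), encoding="latin1")

print(utf8_failed)
print(sample)
```

If a file is actually stored in another encoding, 'latin1' will still load it without an error but may show the wrong characters, so it is worth checking a few names after loading.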
Data cleaning
Quick observation to see the right type of data
Syntax:
print(data.head())
print(data.tail())
o By applying these syntaxes, we can easily obtain the top 5 and bottom 5 results from the data.
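A minimal sketch of the two calls on a made-up 12-row frame (the column names and values are placeholders, not the project data). It also shows that both are methods and need parentheses to return rows.

```python
import pandas as pd

# Hypothetical 12-row frame standing in for the sales data.
demo = pd.DataFrame({
    "ORDERLINE": list(range(1, 13)),
    "SALES": [100.0 * i for i in range(1, 13)],
})

top5 = demo.head()     # first 5 rows (n defaults to 5)
bottom5 = demo.tail()  # last 5 rows
top3 = demo.head(3)    # a different row count can be passed explicitly

print(top5)
print(bottom5)
# Without parentheses, demo.head is only a reference to the method itself.
print(callable(demo.head))
```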
Display column names
Syntax:
data.columns
o The column syntax retrieves all the header titles from the data source.
Create a new data frame with only the important data for processing
Syntax:
data1 = data[['QUANTITYORDERED', 'PRICEEACH', 'SALES', 'ORDERDATE', 'STATUS',
              'MONTH_ID', 'YEAR_ID', 'PRODUCTLINE', 'COUNTRY', 'DEALSIZE']]
o With the new data frame syntax, users can accurately copy the header names without having to constantly refer back to the Excel file.
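The two steps above can be sketched together on a small made-up frame. The rows, the ORDERNUMBER column, and the helper names (wanted, missing, subset) are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical miniature frame; the values are placeholders.
demo = pd.DataFrame({
    "ORDERNUMBER": [10107, 10121],
    "QUANTITYORDERED": [30, 34],
    "PRICEEACH": [95.70, 81.35],
    "SALES": [2871.00, 2765.90],
    "COUNTRY": ["USA", "France"],
})

# List the header names so they can be copied exactly.
print(demo.columns.tolist())

# Check the wanted names against the frame before selecting them,
# since a misspelled header would raise a KeyError.
wanted = ["QUANTITYORDERED", "PRICEEACH", "SALES", "COUNTRY"]
missing = [name for name in wanted if name not in demo.columns]

# .copy() gives an independent frame, so later edits do not touch demo.
subset = demo[wanted].copy()
print(subset)
```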
Check duplicated values (Method 1)
Syntax:
data1.duplicated()
o Method 1 displays the results for the top 5 and bottom 5 rows of the data; a result of "False" means that the row is not a duplicate of an earlier row in the data source.
Check duplicated values (Method 2)
Syntax:
data_check_dupl = data1.copy()
data_check_dupl['Duplicated'] = data1.duplicated()
o By utilizing this syntax, a new column is appended to a new data frame, and the user can verify each line from line 1 to line 2823 individually.
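A small sketch of Methods 1 and 2 on made-up rows, one of which repeats an earlier row. The data and the names flags, checked, and n_dup are illustrative assumptions.

```python
import pandas as pd

# Hypothetical rows; the third row repeats the first one exactly.
demo = pd.DataFrame({
    "PRODUCTLINE": ["Classic Cars", "Motorcycles", "Classic Cars", "Ships"],
    "QUANTITYORDERED": [30, 34, 30, 41],
})

# Method 1: one True/False flag per row; True marks a repeat of an earlier row.
flags = demo.duplicated()

# Method 2: keep the flags next to the data in a copy for row-by-row checking.
checked = demo.copy()
checked["Duplicated"] = flags

# Summing the flags counts the duplicated rows without reading every line.
n_dup = int(flags.sum())

print(checked)
print(n_dup)
```

Because the printed output of a long frame is truncated, a summary such as flags.sum() or flags.any() is a safer check than reading the displayed rows.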
Check duplicated values (Method 3)
Syntax:
data_check_dupl['Duplicated'].value_counts()
o Method 3 counts each value in the 'Duplicated' column; the output displays 2822 lines with no 'True' value, indicating that there are no duplicates present in the data.
Check missing values
Syntax:
data1.isnull().sum()
o The output shows that all key headers have a value of "0", which indicates that there are no missing values in any of the rows.
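To show what the two outputs look like when problems do exist, here is a sketch on made-up rows that contain one duplicate and two missing values. The data and variable names are assumptions for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: the last row repeats the first, and two cells are empty.
demo = pd.DataFrame({
    "SALES": [2871.0, np.nan, 3884.3, 2871.0],
    "COUNTRY": ["USA", "France", None, "USA"],
})

# Method 3: count how many rows are flagged False and how many True.
dup_counts = demo.duplicated().value_counts()

# Missing values: count the empty cells in each column.
missing_per_column = demo.isnull().sum()

print(dup_counts)
print(missing_per_column)
```

On clean data the first output has no True entry and the second shows 0 for every column.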
Visualization techniques
Check outliers
Syntax:
data1.describe()
o The output is presented in a tabular format for numerical data, displaying the mean, standard deviation, minimum, maximum, and other relevant statistics.
Check outliers (Box plot for sales data)
Syntax:
plt.boxplot(data1.SALES)
o The box plot graph reveals the presence of outliers in the data source.
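The points a box plot draws as outliers can also be listed numerically. The sketch below assumes matplotlib's default whisker length of 1.5 times the interquartile range and uses made-up sales values; it is an illustration, not the project's own outlier check.

```python
import pandas as pd

# Hypothetical sales values with one unusually large order.
sales = pd.Series([1000.0, 2000.0, 3000.0, 4000.0, 5000.0, 14000.0], name="SALES")

# Quartiles and interquartile range (the height of the box).
q1 = sales.quantile(0.25)
q3 = sales.quantile(0.75)
iqr = q3 - q1

# Whisker limits under the 1.5 * IQR rule.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Values outside the limits are the ones plotted as individual points.
outliers = sales[(sales < lower) | (sales > upper)]

print(lower, upper)
print(outliers)
```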
Histogram graph for month sales
o Histogram (Month Orders) – Orders peaked at year end, in tandem with holidays and festive seasons.
Histogram graph for quantity ordered
o Histogram (Quantity Ordered) – Most clients ordered quantities from 20 to 50.
Histogram graph for sales data
o Histogram (Sales Data) – Most of the sales made had values between 1000 and 6000.
Histogram graph for price each
o Histogram (Price Each) – Most of the products sold had prices between 90 and 100.
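The slides do not show the plotting syntax for these histograms, so the following is only one possible sketch using matplotlib. The quantities, bin edges, and labels are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical order quantities.
quantity = pd.Series([22, 25, 31, 36, 41, 45, 48, 49, 55, 66], name="QUANTITYORDERED")

fig, ax = plt.subplots()
# hist returns the count per bin, the bin edges, and the drawn bars.
counts, edges, _ = ax.hist(quantity, bins=[20, 30, 40, 50, 60, 70], edgecolor="black")
ax.set_title("Histogram graph for quantity ordered")
ax.set_xlabel("Quantity ordered")
ax.set_ylabel("Number of orders")

print(list(counts))
```

In an interactive session, plt.show() displays the figure. The same pattern applies to the other columns by swapping the series and the bin edges.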
Line graph for sales data
o Line Graph (Sales Data) – Sales increased from 2003 to 2004 but decreased significantly from 2004 to 2005.
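The yearly totals behind such a line graph can be sketched with a group-by on YEAR_ID. The rows below are made up, and the aggregation is an assumption about how the graph was produced, since the slides do not show the syntax.

```python
import pandas as pd

# Hypothetical order rows across three years.
demo = pd.DataFrame({
    "YEAR_ID": [2003, 2003, 2004, 2004, 2004, 2005],
    "SALES": [1000.0, 1500.0, 2000.0, 1200.0, 800.0, 900.0],
})

# Total sales per year: these are the points the line graph connects.
yearly = demo.groupby("YEAR_ID")["SALES"].sum()

# Year-over-year change; the first year has no earlier year to compare with.
change = yearly.diff()

print(yearly)
print(change)
```

Calling yearly.plot(marker="o") would draw the line. If the last year in the data covers only some months, its total will be lower for that reason alone, so it may be worth checking the ORDERDATE range before reading a drop as a trend.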
Pie chart for total sales by product type
o The classic cars product type has emerged as the top-selling product among all the types
Pie chart for total sales by country
o In terms of overall sales, the USA has outperformed other regions, emerging as the top-selling market
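The slice sizes of such a pie chart come from total sales per category. The sketch below uses made-up rows and assumed variable names; the same steps would apply to COUNTRY in place of PRODUCTLINE.

```python
import pandas as pd

# Hypothetical order rows.
demo = pd.DataFrame({
    "PRODUCTLINE": ["Classic Cars", "Motorcycles", "Classic Cars", "Ships"],
    "SALES": [3000.0, 1000.0, 2000.0, 2000.0],
})

# Total sales per product line, largest first.
totals = demo.groupby("PRODUCTLINE")["SALES"].sum().sort_values(ascending=False)

# Each slice as a percentage of all sales.
share_pct = (totals / totals.sum() * 100).round(1)

# The category with the largest total.
top = totals.idxmax()

print(share_pct)
print(top)
```

The chart itself could then be drawn with plt.pie(totals, labels=totals.index, autopct="%1.1f%%").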
Check data correlation
o To check the correlation between two
variables
o Sales and quantity ordered show a weak positive correlation
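The slides do not show the correlation syntax, so this is a minimal sketch using the pandas corr method, which computes the Pearson coefficient by default. The five rows are made up for illustration.

```python
import pandas as pd

# Hypothetical order rows.
demo = pd.DataFrame({
    "QUANTITYORDERED": [20, 30, 40, 50, 25],
    "SALES": [2000.0, 2500.0, 3900.0, 3000.0, 3500.0],
})

# Correlation between two chosen variables (a value from -1 to 1).
r = demo["SALES"].corr(demo["QUANTITYORDERED"])

# Correlation matrix for all numeric columns at once.
matrix = demo.corr()

print(round(r, 2))
print(matrix)
```

A value near 0 indicates a weak relationship and a value near 1 or -1 a strong one. The matrix can be passed to sns.heatmap(matrix, annot=True) for a visual summary.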