Restaurant Sales Analysis
Akshat Shah A007
Atharva Kawtikwar A012
Ronit Vengurlekar A049
Problem Statement:
The restaurant management wants to predict future sales based on historical
data and analyze sales patterns to optimize operations and increase revenue.
The restaurant management also wants to analyze sales data to gain insights
into revenue generation, popular pizza categories and sizes, seasonal
variations in sales, and average order values.
Solutions:
1. Analyze total revenue generated.
2. Calculate the average order value.
3. Determine the total pizzas sold and total number of orders.
4. Analyze the average pizzas per order.
5. Categorize pizzas by type and size to understand revenue distribution.
6. Identify seasonal patterns in sales revenue.
7. Build predictive models to forecast future sales.
8. Analyze historical sales data to identify trends and patterns.
9. Evaluate the performance of different predictive models.
10. Provide insights and recommendations based on analysis results.
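The order-level metrics in items 1 to 4 can be sketched on a small table. This is a minimal illustration, not the project's results: the rows below are made up, and only the column names (order_id, quantity, total_price) follow the pizza dataset used later in the report.

```python
import pandas as pd

# Hypothetical order lines (one row per pizza line item); values are invented.
orders = pd.DataFrame({
    "order_id":    [1, 1, 2, 3, 3, 3],
    "quantity":    [1, 2, 1, 1, 1, 2],
    "total_price": [12.0, 30.0, 16.5, 10.0, 20.0, 25.0],
})

total_revenue = orders["total_price"].sum()      # sum over all line items
total_orders = orders["order_id"].nunique()      # distinct orders, not rows
total_pizzas = orders["quantity"].sum()

# Both averages divide by the number of orders, not the number of rows.
avg_order_value = total_revenue / total_orders
avg_pizzas_per_order = total_pizzas / total_orders

print("Total Revenue:", total_revenue)
print("Total Orders:", total_orders)
print("Total Pizzas Sold:", total_pizzas)
print("Average Order Value:", round(avg_order_value, 2))
print("Average Pizzas per Order:", round(avg_pizzas_per_order, 2))
```

Counting distinct order_id values matters because one order usually spans several rows; dividing by the row count would understate both averages.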
Functionalities:
- Data preprocessing: Handling null values, feature engineering.
- Building predictive models: Linear Regression, K-Nearest Neighbors.
- Evaluation of model performance: Mean Squared Error (MSE), R-squared
(R2).
- Visualization of actual vs predicted sales.
- Calculate total revenue.
- Calculate average order value.
- Determine total pizzas sold.
- Analyze average pizzas per order.
- Categorize and analyze pizzas by category and size.
- Identify seasonal patterns in sales.
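The evaluation metrics listed above can be written out by hand to show what each one measures. The sketch below uses invented numbers (not results from this project) and checks the hand-written formulas against scikit-learn, which expects the arguments in the order (y_true, y_pred).

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Illustrative values only; these are not outputs of the report's models.
y_true = np.array([100.0, 120.0, 90.0, 110.0])
y_pred = np.array([98.0, 125.0, 95.0, 108.0])

errors = y_true - y_pred
mse = np.mean(errors ** 2)        # Mean Squared Error
rmse = np.sqrt(mse)               # Root MSE, in the same units as sales
mae = np.mean(np.abs(errors))     # Mean Absolute Error

# R-squared: 1 minus (residual sum of squares / total sum of squares).
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print("MSE:", mse, "RMSE:", rmse, "MAE:", mae, "R2:", r2)
```

R-squared is not symmetric in its arguments, so swapping actual and predicted values changes the score.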
Algorithms - Explanation:
- Linear Regression: Used to model the relationship between the week-over-week
change in sales and the changes observed in each of the previous 52 weeks
(lag features).
- K-Nearest Neighbors (KNN): Utilized for regression to predict sales based on
similar instances.
- Categorization of seasons is done using a function that maps months to
seasons (Spring, Summer, Fall, Winter).
- Grouping and aggregation functions are utilized to calculate statistics like
average unit price and revenue per category/size.
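The lag-feature setup behind the regression can be sketched as follows. This is a simplified illustration under stated assumptions: the weekly series is synthetic (a trend plus a 4-week cycle), and n_lags = 4 is a hypothetical choice made to keep the example small, whereas the project code uses 52 weekly lags.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic weekly sales (assumed for illustration): trend plus a 4-week cycle.
weeks = np.arange(60)
sales = pd.Series(200 + 2 * weeks + 10 * np.sin(2 * np.pi * weeks / 4))

n_lags = 4  # hypothetical; the project code uses 52
frame = pd.DataFrame({"sales_diff": sales.diff()})
for lag in range(1, n_lags + 1):
    # Column lag_k holds the sales change observed k weeks earlier.
    frame[f"lag_{lag}"] = frame["sales_diff"].shift(lag)
frame = frame.dropna().reset_index(drop=True)

# Chronological split: earlier rows train the model, the last 8 rows test it.
train, test = frame.iloc[:-8], frame.iloc[-8:]
model = LinearRegression().fit(train.drop(columns="sales_diff"), train["sales_diff"])
pred_diff = model.predict(test.drop(columns="sales_diff"))

# Undo the differencing: predicted sales = previous actual sales + predicted change.
prev_actual = sales.iloc[-9:-1].to_numpy()
pred_sales = prev_actual + pred_diff
print(pred_sales.round(2))
```

Modelling the differenced series removes the trend, which is why each prediction has to be added back to the previous week's actual sales before it can be compared with real sales figures.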
CODE:
# -*- coding: utf-8 -*-
"""Restaurant_Sales.ipynb
Automatically generated by Colaboratory.
Original file is located at
https://colab.research.google.com/drive/17Gco2dNtJS49EtOOAupKxK_s2Orq5cNq
"""
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from xgboost import XGBRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense , LSTM
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from datetime import datetime
import seaborn as sns
data=pd.read_csv("train.csv")
data.info()
data1=data.drop(['date'],axis=1)
sns.pairplot(data1)
# check for null values
data.isnull().sum()
"""Dropping item and stores"""
data = data.drop(['store','item'], axis=1 )
data.info()
data['date']=pd.to_datetime(data['date'])
plt.figure(figsize=(15,5))
plt.plot(data['date'], data['sales'])
plt.xlabel("Date")
plt.ylabel("Sales")
plt.title("Daily Customer Sales")
plt.show()
data['date']=data['date'].dt.to_period("W")
week_sales=data.groupby('date').sum().reset_index()
week_sales.head(10)
week_sales['date']=week_sales['date'].dt.to_timestamp()
plt.figure(figsize=(15,5))
plt.plot(week_sales['date'], week_sales['sales'])
plt.xlabel("Date")
plt.ylabel("Sales")
plt.title("Weekly Customer Sales")
plt.show()
week_sales['sales_diff']=week_sales['sales'].diff()
week_sales=week_sales.dropna()
week_sales.head(10)
supervised_data=week_sales.drop(['date','sales'], axis=1)
# Add the sales differences of the previous 52 weeks as lag features
for i in range(1,53):
    col_name="week"+str(i)
    supervised_data[col_name]=supervised_data['sales_diff'].shift(i)
supervised_data=supervised_data.dropna().reset_index(drop=True)
supervised_data.head(10)
"""Split data for previous and coming weeks"""
prev_data=supervised_data[:-52]
com_data=supervised_data[-52:]
print(prev_data.shape)
print(com_data.shape)
scaler= MinMaxScaler(feature_range=(-1,1))
scaler.fit(prev_data)
prev_data=scaler.transform(prev_data)
com_data=scaler.transform(com_data)
X_prev, y_prev= prev_data[:,1:], prev_data[:,0:1]
X_com, y_com=com_data[:,1:],com_data[:,0:1]
y_prev=y_prev.ravel()
y_com=y_com.ravel()
"""Prediction Model"""
sales_dates=week_sales['date'][-52:].reset_index(drop=True)
predict_df=pd.DataFrame(sales_dates)
act_sales=week_sales['sales'][-53:].to_list()
print(act_sales)
"""Linear Regression Model and Prediction"""
lr_model=LinearRegression()
lr_model.fit(X_prev,y_prev)
lr_pre=lr_model.predict(X_com)
lr_pre=lr_pre.reshape(-1,1)
lr_pre_com_set=np.concatenate([lr_pre,X_com],axis=1)
lr_pre_com_set=scaler.inverse_transform(lr_pre_com_set)
result=[]
for j in range(0,len(lr_pre_com_set)):
    result.append(lr_pre_com_set[j][0]+act_sales[j])
lr_pre_series=pd.Series(result,name="Linear Regression")
predict_df=predict_df.merge(lr_pre_series,left_index=True,right_index=True)
# sklearn metrics take (y_true, y_pred) in that order
lr_rmse=np.sqrt(mean_squared_error(week_sales['sales'][-52:],
    predict_df['Linear Regression']))
lr_mae=mean_absolute_error(week_sales['sales'][-52:],
    predict_df['Linear Regression'])
lr_r2=r2_score(week_sales['sales'][-52:], predict_df['Linear Regression'])
print("Linear Regression RMSE ",lr_rmse)
print("Linear Regression MAE ",lr_mae)
print("Linear Regression R2 ",lr_r2)
"""Predicted vs Actual sales"""
plt.figure(figsize=(15,5))
#Actual Sales
plt.plot(week_sales['date'],week_sales['sales'])
#predicted Sales
plt.plot(predict_df['date'], predict_df['Linear Regression'])
plt.title("Customer sales using Model")
plt.xlabel("Date")
plt.ylabel("Sales")
plt.legend(['Actual Sales','Predicted sales'])
plt.show()
"""K-Nearest Neighbors Model and Prediction"""
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error, r2_score
# Assuming the data is stored in a variable called 'data'
df = pd.DataFrame(data)
# Extract features from the date
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day
# Drop the original 'date' column
df = df.drop(columns=['date'])
# Split the data into features and target
X = df.drop(columns=['sales'])
y = df['sales']
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.2, random_state=42)
# Scale the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Create a KNN model
knn_model = KNeighborsRegressor(n_neighbors=3)
# Fit the model with the training data
knn_model.fit(X_train, y_train)
# Predict the sales
y_pred = knn_model.predict(X_test)
# Calculate the metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
# Print the metrics
print("MSE: ", mse)
print("R2: ", r2)
import matplotlib.pyplot as plt
# Assuming 'y_test' is your actual sales and 'y_pred' is your predicted sales
plt.figure(figsize=(10,6))
plt.plot(y_test.values, label='Actual')
plt.plot(y_pred, label='Predicted')
plt.title('Sales Prediction')
plt.xlabel('Observation')
plt.ylabel('Sales')
plt.legend()
plt.show()
# -*- coding: utf-8 -*-
"""Restaurant_Sales_2.ipynb
Automatically generated by Colaboratory.
Original file is located at
https://colab.research.google.com/drive/1BXnfzePwtIahek-Qx83OrhznLpZetPZH
"""
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
pizza_df = pd.read_excel('/content/Data Model - Pizza Sales.xlsx')
total_revenue = (pizza_df['total_price']).sum()
print("Total Revenue:", total_revenue)
avg_order_value = pizza_df.groupby('order_id')['total_price'].sum().mean()
print("Average Order Value:", avg_order_value)
total_pizzas_sold = pizza_df['quantity'].sum()
print("Total Pizzas Sold:", total_pizzas_sold)
total_orders = pizza_df['order_id'].nunique()
print("Total Orders:", total_orders)
avg_pizzas_per_order = pizza_df['quantity'].sum() / total_orders
print("Average Pizzas per Order:", avg_pizzas_per_order)
category_analysis = pizza_df.groupby('pizza_category').agg(
    average_unit_price=('unit_price', 'mean'),
    # Revenue = unit price x quantity, using the quantities of the same rows
    revenue_per_category=('unit_price',
        lambda x: (x * pizza_df.loc[x.index, 'quantity']).sum())
).sort_values(by='revenue_per_category', ascending=False)
print("Average Unit Price and Revenue by Category:\n", category_analysis)
# Revenue by Pizza Category
category_analysis['revenue_per_category'].plot(kind='bar', color='skyblue')
plt.xlabel('Pizza Category')
plt.ylabel('Revenue')
plt.title('Revenue by Pizza Category')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
size_analysis = pizza_df.groupby('pizza_size').agg(
    average_unit_price=('unit_price', 'mean'),
    # Revenue = unit price x quantity, using the quantities of the same rows
    revenue_per_size=('unit_price',
        lambda x: (x * pizza_df.loc[x.index, 'quantity']).sum())
).sort_values(by='revenue_per_size', ascending=False)
print("Average Unit Price and Revenue by Size:\n", size_analysis)
# Revenue by pizza size
size_analysis['revenue_per_size'].plot(kind='bar', color='green')
plt.xlabel('Pizza Size')
plt.ylabel('Revenue')
plt.title('Revenue by Pizza Size')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
def categorize_season(month):
    # 'month' is a full month name such as 'March' (from strftime('%B'))
    if month in ['March', 'April', 'May']:
        return 'Spring'
    elif month in ['June', 'July', 'August']:
        return 'Summer'
    elif month in ['September', 'October', 'November']:
        return 'Fall'
    else:
        return 'Winter'
pizza_df['season'] = (pd.to_datetime(pizza_df['order_date'])
    .dt.strftime('%B').map(categorize_season))
seasonal_revenue_analysis = pizza_df.groupby('season')['total_price'].sum()
print("Revenue by Season:\n", seasonal_revenue_analysis)
# Revenue by Season
seasonal_revenue_analysis.plot(kind='bar', color='red')
plt.xlabel('Season')
plt.ylabel('Revenue')
plt.title('Revenue by Season')
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()
Weka/Visualization:
- Visualizations are created using WEKA.
- Plots include bar graphs showing revenue by pizza category, size, and season.
- Plots include time series plots showing actual and predicted sales.
Patterns:
- Revenue patterns by pizza category, size, and season are identified.
- Seasonal variations in sales revenue are analyzed.
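The seasonal grouping can also be expressed with month numbers instead of month names, which avoids depending on the system locale. The sketch below is an illustrative alternative, not the project's code: the sample rows and dates are invented, MONTH_TO_SEASON is a hypothetical helper, and the mapping assumes Northern Hemisphere seasons as in the report.

```python
import pandas as pd

# Hypothetical sample rows; column names follow the report's pizza dataset.
sample = pd.DataFrame({
    "order_date": ["2015-01-15", "2015-04-02", "2015-07-04",
                   "2015-07-20", "2015-10-31", "2015-12-24"],
    "total_price": [20.0, 35.0, 50.0, 15.0, 40.0, 25.0],
})

# Month number -> season (Northern Hemisphere, same grouping as the report).
MONTH_TO_SEASON = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}

months = pd.to_datetime(sample["order_date"]).dt.month
sample["season"] = months.map(MONTH_TO_SEASON)

# Reindex so the seasons appear in calendar order rather than alphabetically.
season_order = ["Spring", "Summer", "Fall", "Winter"]
seasonal_revenue = sample.groupby("season")["total_price"].sum().reindex(season_order)
top_season = seasonal_revenue.idxmax()

print(seasonal_revenue)
print("Season with the highest revenue:", top_season)
```

Using idxmax() reports the single best season directly, while the reindexed series keeps the bar chart in a natural Spring-to-Winter order.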
Predictions:
- Predictive analysis for future sales trends could be conducted based on
identified patterns.
Individual Contribution to the Project:
- Akshat Shah: Contributed to data analysis and visualization.
- Atharva Kawtikwar: Assisted in data preprocessing and coding.
- Ronit Vengurlekar: Contributed to algorithm development and insights
generation.
Observations:
Seasonal Sales Patterns:
o Upon visual inspection of the sales data, it's evident that sales
exhibit seasonal patterns. Sales tend to peak during certain times
of the year, particularly in the middle part of the year.
o This observation suggests that there might be external factors
influencing customer behavior, such as seasonal events, holidays,
or weather conditions, which impact sales volume.
o Understanding these seasonal patterns is crucial for the restaurant
management to plan marketing strategies, menu promotions, and
staffing levels effectively to capitalize on peak sales periods and
optimize revenue generation.
Trend Analysis:
o Besides seasonal fluctuations, there may also be underlying trends
in the sales data. Trend analysis helps in identifying long-term
changes in sales volume over time.
o Detecting and understanding these trends can provide valuable
insights into factors influencing overall sales growth or decline,
such as changes in consumer preferences, economic conditions, or
competition in the market.
o By recognizing and adapting to these trends, the restaurant
management can implement strategies to sustain growth,
mitigate risks, and stay competitive in the market.
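One simple way to separate a long-term trend from short-term cycles is a rolling mean. The sketch below is only an illustration on synthetic daily data (assumed linear growth of 0.5 per day plus a weekly cycle); it is not derived from the project's dataset, and the start date is arbitrary.

```python
import numpy as np
import pandas as pd

# Synthetic daily sales (assumed): linear growth plus a 7-day cycle.
days = pd.date_range("2020-01-01", periods=56, freq="D")
t = np.arange(56)
daily = pd.Series(100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 7), index=days)

# A centered 7-day rolling mean averages out the weekly cycle,
# leaving the underlying trend. The first and last 3 days become NaN.
trend = daily.rolling(window=7, center=True).mean()

# Estimate the trend's slope with a least-squares straight line.
valid = trend.dropna()
x = np.arange(len(valid))
slope, intercept = np.polyfit(x, valid.to_numpy(), 1)

print("Estimated growth per day:", round(slope, 3))
```

The window length should match the cycle being removed (7 days here); a window that is too short leaves the cycle in, and one that is too long blurs genuine changes in the trend.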
Impact of External Factors:
o It's essential to consider the influence of external factors on sales
patterns, such as promotional activities, special events, or local
market dynamics.
o Peaks in sales during specific periods may coincide with
promotional campaigns, new product launches, or seasonal menu
offerings, indicating the effectiveness of marketing initiatives in
driving customer engagement and boosting sales.
o Additionally, analyzing the impact of external factors allows the
restaurant management to assess the success of marketing
strategies and make data-driven decisions to allocate resources
efficiently and maximize return on investment.
Customer Behavior Analysis:
o Sales patterns may also reflect underlying shifts in customer
behavior, preferences, or demographics.
o Analyzing customer data, such as purchase history, frequency of
visits, or order preferences, can provide valuable insights into
customer segmentation and targeting.
o Understanding customer behavior enables the restaurant
management to tailor products, services, and promotions to meet
the needs and preferences of different customer segments
effectively, enhancing customer satisfaction and loyalty.
Forecasting Challenges:
o While identifying sales patterns and trends is essential, forecasting
future sales accurately poses challenges due to the dynamic
nature of the restaurant industry and the influence of multiple
factors on customer behavior.
o Seasonal variations, trend changes, and unpredictable external
factors introduce uncertainty into sales forecasts, requiring
advanced predictive modeling techniques and continuous
monitoring and adjustment.
o By acknowledging these challenges and adopting robust
forecasting methodologies, the restaurant management can
improve the accuracy of sales predictions and make proactive
decisions to adapt to changing market conditions and maintain
profitability.
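One way to make forecast evaluation more realistic is to validate in time order, so that a model is never tested on dates earlier than its training data. The following sketch uses scikit-learn's TimeSeriesSplit on a synthetic series; the data, the feature choice (time index plus a 12-step sine/cosine pair), and the number of folds are all assumptions made for illustration, not part of the project's code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit

# Synthetic series (assumed): trend + 12-step seasonality + noise.
rng = np.random.default_rng(0)
n = 120
t = np.arange(n)
y = 50 + 0.3 * t + 5 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, n)
X = np.column_stack([t, np.sin(2 * np.pi * t / 12), np.cos(2 * np.pi * t / 12)])

splitter = TimeSeriesSplit(n_splits=4)
fold_mae = []
for train_idx, test_idx in splitter.split(X):
    # In every fold, all training rows come before all test rows.
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    preds = model.predict(X[test_idx])
    fold_mae.append(mean_absolute_error(y[test_idx], preds))

print("MAE per fold:", [round(m, 2) for m in fold_mae])
```

A random split mixes past and future rows, which can make a model look more accurate than it would be when forecasting genuinely unseen periods; comparing the per-fold errors also shows whether accuracy is stable over time.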
Overall, through detailed observation and analysis of sales data,
including seasonal patterns, trends, external factors, and customer
behavior, the restaurant management can gain actionable insights to
inform strategic planning, optimize operations, and drive sustainable
growth and success in the competitive restaurant industry landscape.
Conclusion:
The analysis of restaurant sales data reveals distinct seasonal patterns, with
sales peaking in the middle part of the year, indicating potential opportunities
for targeted promotions and menu offerings during these periods. Utilizing
predictive models like Linear Regression and K-Nearest Neighbors enables
forecasting of future sales trends, providing valuable insights for strategic
decision-making. Challenges such as forecasting accuracy and external factors
influence the interpretation of sales data, emphasizing the need for continuous
monitoring and adaptation. By leveraging these insights and refining predictive
models, the restaurant management can optimize operations, capitalize on
growth opportunities, and enhance competitiveness in the dynamic restaurant
industry landscape.