
Project Report TSF Extendec

The project report focuses on time series forecasting for shoe sales and soft drink production, utilizing historical data from January 1980 to July 1995. It includes exploratory data analysis, model application (including Logistic Regression, KNN, and various exponential smoothing techniques), and performance evaluation through metrics like RMSE. The objective is to develop accurate forecasting models to aid strategic planning and operational efficiency for the respective companies.

Uploaded by

SWETA KUMARI

TIME SERIES FORECASTING

(SHOE SALES &


SOFTDRINKS)

PROJECT REPORT
Student’s Name – Abinash Kumar Nag

PGP-DSBA

1 Problem 1:
1.1 Read the dataset. Do the descriptive statistics and do the null value condition check. Write an inference on it.
1.2 Perform Univariate and Bivariate Analysis. Do exploratory data analysis. Check for Outliers.
1.3 Encode the data (having string values) for Modelling. Is Scaling necessary here or not? Data Split: Split the data into train and test (70:30).
1.4 Apply Logistic Regression and LDA (linear discriminant analysis).
1.5 Apply KNN Model and Naïve Bayes Model. Interpret the results.
1.6 Model Tuning, Bagging (Random Forest should be applied for Bagging), and Boosting.
1.7 Performance Metrics: Check the performance of Predictions on Train and Test sets using Accuracy, Confusion Matrix, Plot ROC curve and get ROC_AUC score for each model. Final Model: Compare the models and write inference which model is best/optimized.
1.8 Based on these predictions, what are the insights?
2 Problem 2:
2.1 Find the number of characters, words, and sentences for the mentioned documents.
2.2 Remove all the stopwords from all three speeches.
2.3 Which word occurs the most number of times in his inaugural address for each president? Mention the top three words. (after removing the stopwords)
2.4 Plot the word cloud of each of the speeches of the variable. (after removing the stopwords)

LIST OF FIGURES
1.2 Histogram and box plot of NUMERICAL COLUMN 09
1.2 Count plot of CATEGORICAL VARIABLE 10
1.2 Count plot of 'economic.cond.national': 11
1.2 Count plot of 'economic.cond.household': 12
1.2 Count plot of 'Blair': 13
1.2 Count plot of 'Hague': 14
1.2 Count plot of 'Europe': 15
1.2 Count plot of 'political.knowledge': 16
1.2 Strip plot of 'vote' and 'age': 20
1.2 Strip plot of 'vote' and 'economic.cond.national': 21
1.2 Strip plot of 'vote' and 'economic.cond.household': 22
1.2 Strip plot of 'vote' and 'Blair': 24
1.2 Strip plot of 'vote' and 'Hague': 26
1.2 Strip plot of 'vote' and 'Europe': 27
1.2 Strip plot of 'vote' and 'political.knowledge': 28
1.2 Hist plot of vote 29
1.2 Countplot of Hague & Blair with reference to Economic Household condition & Economic Household national 30-32
Countplot of Hague, Blair with reference to political knowledge & Europe 32-33

1.2 Checking pair-wise distribution of the continuous variables: 34


1.2 Correlation matrix: 35
1.7 LGR - Regular - ROC and AUC - Train: 53
1.7 LGR - Regular - ROC and AUC - Test: 54
1.7 LGR - Regular - Confusion matrix - Train: 54
1.7 LGR - Regular - Confusion matrix - Test: 54
1.7 LGR - Tuned - ROC and AUC - Train: 55
1.7 LGR - Tuned - ROC and AUC - Test: 55
1.7 LGR - Tuned - Confusion matrix - Train: 65
1.7 LGR - Tuned - Confusion matrix - Test: 65
1.7 LDA - Regular - ROC and AUC - Train: 56
1.7 LDA - Regular - ROC and AUC - Test: 56
1.7 LDA - Regular - Confusion matrix - Train: 57
1.7 LDA - Regular - Confusion matrix - Test: 57
1.7 LDA - Tuned - ROC and AUC - Train: 66
1.7 LDA - Tuned - ROC and AUC - Test: 67
1.7 LDA - Tuned - Confusion matrix - Train: 67
1.7 LDA - Tuned - Confusion matrix - Test: 67
1.7 KNN - Regular - ROC and AUC - Train: 57
1.7 KNN - Regular - ROC and AUC - Test: 57
1.7 KNN - Regular - Confusion matrix - Train: 58

Problem 1:
Context:
In today's dynamic business environment, precise sales and
production forecasts are essential for strategic planning and
operational efficiency. Companies like IJK Shoe Company and RST
Firm have accumulated extensive monthly data on shoe sales and soft
drink production, respectively, spanning from January 1980 to July
1995. Leveraging advanced time series forecasting techniques, these
companies aim to utilize their historical data to predict future trends
accurately. This initiative enables them to make informed decisions,
optimize resource allocation, and adapt proactively to market
dynamics.

Objective:
The primary objective is to predict future sales for IJK Shoe Company and
production volumes for RST Firm over the next one year. By analyzing
the historical monthly data spanning from January 1980 to July 1995, our
goal is to develop accurate forecasting models that capture the underlying
patterns and seasonality inherent in the sales and production processes.
Through this task, we aim to empower IJK Shoe Company and RST Firm
with actionable insights that facilitate proactive planning, optimize
resource allocation, and enhance operational efficiency. By anticipating
future trends in sales and production, both companies can align their
strategies, streamline production-related activities, and capitalize on
emerging opportunities in their respective markets.

INTRODUCTION
This report consists of time series analysis and forecasting for 2 datasets:
• DATASET 1 - Monthly shoe sales data
• DATASET 2 - Monthly soft drink production data

Problem 1:
You are an analyst at the IJK shoe company and you are expected to
forecast the sales of pairs of shoes for the 12 months following the
end of the data. The shoe sales data have been given to you from
January 1980 to July 1995.
Data Source-Shoesales.csv

Problem 2:
You are an analyst at the RST soft drink company and you are expected
to forecast the production of the soft drink for the 12 months following
the end of the data. The soft drink production data have been given to
you from January 1980 to July 1995.
Data Source- SoftDrink.csv

1.1 Define the problem and perform Exploratory Data Analysis.


Read the data as an appropriate time series data - Plot the data, perform
EDA & decomposition.
Total No. Of Shoe Sales Data Entries:187
Total No. Of Soft Drink Data Entries: 187
No. Of Missing Values in both data = 0
No. Of Duplicate entries in Shoe Sales data = 0
No. Of Duplicate entries in Soft Drink data = 0
Both datasets are split in Train : Test at year 1991 - Test data starts at
1991
Forecasting models applied are:
• Linear Regression
• Simple Average
• 2-pt Moving Average
• Single exponential Smoothing
• Double Exponential Smoothing
• Triple Exponential Smoothing (Holt-Winter Model)
• ARIMA / SARIMA (Auto fitted)
• ARIMA / SARIMA (Manually fitted)
1] Read the data as an appropriate Time Series data and plot the data

Both datasets are read and stored as Pandas DataFrames for analysis.
The last 5 rows of the shoe sales data and the first 5 rows of the soft drink data are given below.

     YearMonth  Shoe_Sales
182  1995-03    188
183  1995-04    195
184  1995-05    189
185  1995-06    220
186  1995-07    274

     YearMonth  SoftDrinkProduction
0    1980-01    1954
1    1980-02    2302
2    1980-03    3054
3    1980-04    2414
4    1980-05    2226

Soft Drink Data Plot:

Shoe Sales Data Plot:

2. Perform appropriate Exploratory Data Analysis to understand the
data and also perform decomposition.

Since the data have no outliers or duplicated values, no treatment is needed.

EDA of Shoe sales:


MOM plot of shoe sales.

Observations:
Sales spike in the fourth quarter of the year, i.e. during November and December.

YOY plot of shoe sales.

Observations:
1987 saw a boom, with shoe sales at their maximum.
Shoes YOY Sales- All Months

Observation:

• The period between 1986 and 1988 saw the spike, especially in December,
followed by November.

• The difference in sales may be due to the holiday season at the end of the year.

EDA of Soft drinks:

MOM plot of Soft drinks:

Observations:

• December had the highest production across years. The boxplot gives an
overview of production across months. The 2nd highest production was in
November.

YOY plot of Soft Drink Production:

Observations:
• Maximum production was observed between 1994 and 1995.

YOY production for all months:

Observations:
• Maximum production is observed in the month of December.

Additive Decomposition of Shoe Sales:

Multiplicative Decomposition of Shoe Sales:-

Additive Decomposition of Soft Drink:

Multiplicative Decomposition of Soft Drink:

Since we are looking at change in absolute quantity for this particular dataset
we move on with using the additive model.
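
The additive model assumes series = trend + seasonal + residual. The following is a minimal sketch of a classical additive decomposition, written with the standard library only so that it runs anywhere. The function name and the synthetic series are assumptions for illustration; the report's own plots were presumably produced with a library routine, and the numbers here are not the shoe sales data.

```python
def additive_decompose(series, period=12):
    """Classical additive decomposition: series = trend + seasonal + residual.

    Assumes an even period (such as 12) and at least two full cycles of data.
    """
    n = len(series)
    half = period // 2
    trend = [None] * n
    # Centred 2 x period moving average: full weight inside, half weight on both ends.
    for t in range(half, n - half):
        inner = sum(series[t - half + 1 : t + half])
        ends = 0.5 * (series[t - half] + series[t + half])
        trend[t] = (inner + ends) / period
    # Average the detrended values for each position in the seasonal cycle.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    means = [sum(b) / len(b) for b in buckets]
    centre = sum(means) / period  # force the seasonal effects to sum to zero
    seasonal = [means[t % period] - centre for t in range(n)]
    residual = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, residual


# Synthetic monthly series (hypothetical): linear trend plus a fixed seasonal pattern.
pattern = [-30, -20, -10, 0, 5, 10, 15, 10, 0, -5, 5, 20]
series = [100 + 2 * t + pattern[t % 12] for t in range(48)]
trend, seasonal, residual = additive_decompose(series, period=12)
```

In a multiplicative decomposition the seasonal effect scales with the level instead of being added to it, which is why the additive form suits a series whose seasonal swings are roughly constant in absolute size.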

Shoe Sales Forecast

1.3] Split the data into training and test. The test data should start in
1991.

The training set has 132 monthly observations (January 1980 to December 1990)
and the test set has 55 (January 1991 to July 1995).

The test data start in 1991 and are used to evaluate every model built below.
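
As a quick check of the split sizes, the monthly periods can be enumerated and divided at 1991. This sketch uses plain (year, month) tuples rather than the report's DataFrames; the variable names are illustrative.

```python
# Monthly periods covered by both datasets: January 1980 to July 1995.
months = [(year, month) for year in range(1980, 1996) for month in range(1, 13)]
months = [m for m in months if m <= (1995, 7)]

# Training data end in December 1990; test data start in January 1991.
train = [m for m in months if m[0] < 1991]
test = [m for m in months if m[0] >= 1991]

print(len(months), len(train), len(test))  # 187 132 55
```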

Train data Head of the dataset: Test data Head of the dataset:

Train data Tail of the dataset:

Train Data Shape = (132, 1)

Test data Tail of the dataset:

Test Data Shape = (55, 1)

Graphic representation of Train and Test Split:

Shoe Sales- Train and Test split

1.4] Build various exponential smoothing models on the training
data and evaluate the model using RMSE on the test data.

Objective: The main objective of building so many models is to ensure we
pick an optimum model with the lowest RMSE and MAPE values.
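
For reference, the two accuracy measures can be written as below. This is a minimal sketch using the usual definitions (RMSE as the square root of the mean squared error, MAPE as the mean absolute percentage error); the function names are illustrative and the report does not state which implementation it used.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(ratios) / len(ratios)
```

RMSE is in the units of the series (pairs of shoes or units of production), while MAPE is scale-free, so the two can rank models differently.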
Strategy:
• We build Linear Regression, Naive and Simple Average models and check
their performance.
• We also build various exponential smoothing models and check their performance.
1.4.1]
Linear Regression:
Plot of the Shoe Sales of Linear Regression Model.
Linear Regression Model

Model Type RMSE


Regression On Time 244.810664
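
Regression on time fits a straight line to the training values against a time index 1, 2, ..., n and extends that line over the test period. A minimal least-squares sketch follows; the function names and the four-point example are assumptions for illustration, not the report's code or data.

```python
def fit_time_regression(train):
    """Ordinary least squares of the series on the time index 1..n."""
    n = len(train)
    times = range(1, n + 1)
    t_mean = sum(times) / n
    y_mean = sum(train) / n
    covariance = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, train))
    variance = sum((t - t_mean) ** 2 for t in times)
    slope = covariance / variance
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast_time_regression(intercept, slope, n_train, horizon):
    """Extend the fitted line over the next `horizon` time steps."""
    return [intercept + slope * (n_train + h) for h in range(1, horizon + 1)]


# Hypothetical series that lies exactly on y = 2t + 1.
intercept, slope = fit_time_regression([3, 5, 7, 9])
next_two = forecast_time_regression(intercept, slope, n_train=4, horizon=2)
```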

1.4.2]
Naive Model:

Naive Model

Model Type            RMSE
Regression On Time    244.810664
Naive Model           245.1213

Inference:
The RMSE value seems to be lowest for the naive model so far. But since the forecast is
constant through the years, it isn’t an ideal model for our dataset.

1.4.3]
Simple Average Forecast:

Model Type            RMSE
RegressionOnTime      266.2765
NaiveModel            245.1213
SimpleAverageModel    61.714

Inference:
The RMSE values seem to be lowest for the Simple Average Method so far. But
since the forecast is constant through the years, it isn’t an ideal model for our
dataset.
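
Both the naive and the simple average models give a flat forecast, which is why neither can follow a seasonal series. A minimal sketch, with illustrative function names and made-up numbers:

```python
def naive_forecast(train, horizon):
    """Repeat the last observed training value for every future period."""
    return [train[-1]] * horizon


def simple_average_forecast(train, horizon):
    """Repeat the mean of the training data for every future period."""
    return [sum(train) / len(train)] * horizon


history = [10, 20, 30, 60]  # hypothetical values, not the shoe sales data
naive = naive_forecast(history, 3)
average = simple_average_forecast(history, 3)
```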

1.4.4]
Moving Average Forecast:
• Moving Average Forecasting is a naive and effective technique in time
series forecasting.
• Moving average involves creating a new series where the values are
comprised of the average of raw observations in the original time
series.
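
A trailing moving average replaces each point with the mean of the current and the preceding observations in the window. The sketch below is a plain-Python illustration; the function name and the sample values are assumptions.

```python
def trailing_moving_average(series, window):
    """Mean of the current and the previous (window - 1) observations.

    Positions without a full window are returned as None.
    """
    out = [None] * len(series)
    for t in range(window - 1, len(series)):
        out[t] = sum(series[t - window + 1 : t + 1]) / window
    return out


smoothed = trailing_moving_average([10, 20, 30, 40], window=2)
```

A short window tracks the series closely, which tends to give a low RMSE when the averages are compared with the actual values one step at a time.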

Trailing Moving Average(2) Forecast

Model Type                     RMSE
RegressionOnTime               266.2765
NaiveModel                     245.1213
SimpleAverageModel             63.98457
4pointTrailingMovingAverage    40.500621

1.4.5]
Simple Exponential Smoothing:

Simple exponential smoothing is a time series forecasting method for univariate data
without a trend or seasonality.
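
The method keeps a single smoothed level, updated as level = alpha * y + (1 - alpha) * previous level, and forecasts that level for every future period. A minimal sketch, assuming the level is initialised with the first observation (library implementations may initialise differently):

```python
def simple_exp_smoothing(train, alpha, horizon):
    """Flat forecast from the final smoothed level."""
    level = train[0]  # assumed initial value
    for y in train[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


ses_forecast = simple_exp_smoothing([10, 20, 30], alpha=0.5, horizon=2)
```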

Simple Exponential Smoothening

Model Type                        RMSE
RegressionOnTime                  266.2765
NaiveModel                        245.1213
SimpleAverageModel                63.98457
4pointTrailingMovingAverage       40.500621
Simple Exponential Smoothening    192.641397

Double Exponential Smoothening
It employs a level component and a trend component at each period.
Double exponential smoothing uses two weights, (also called smoothing
parameters), to update the components at each period.
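
The two update equations can be sketched as follows. This is Holt's linear method with a simple initialisation (first value as level, first difference as trend), which is an assumption; the function name and the example series are illustrative.

```python
def holt_forecast(train, alpha, beta, horizon):
    """Double exponential smoothing: a level and a trend updated at each period."""
    level = train[0]
    trend = train[1] - train[0]
    for y in train[1:]:
        previous_level = level
        level = alpha * y + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # The h-step forecast extends the last level along the last trend.
    return [level + h * trend for h in range(1, horizon + 1)]


holt = holt_forecast([1, 2, 3, 4, 5], alpha=0.5, beta=0.5, horizon=3)
```

Because the forecast is a straight line, this model cannot reproduce seasonal peaks, which is consistent with its high RMSE on seasonal data.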

Figure-Simple and Double Exponential Smoothening

Model Type                        RMSE
RegressionOnTime                  266.2765
NaiveModel                        245.1213
SimpleAverageModel                63.98457
4pointTrailingMovingAverage       40.500621
Simple Exponential Smoothening    192.641397
Double Exponential Smoothening    247.788062

Triple Exponential Smoothing:

The Holt-Winters method is an extension of double exponential smoothing
(Holt’s method); it incorporates seasonality in addition to the level and trend
components.
The level captures the underlying pattern and represents the
average value of the series over time.
The trend represents the rate of change of the series over time.

Figure- Simple, Double and Triple Exponential Smoothening

Triple Exponential Smoothing (Multiplicative):

• This method is based on three smoothing equations: level (stationary component),
trend, and seasonal. This is the multiplicative model.
• The alpha value (smoothing level) at which the graph is plotted is 0.551, while
the beta (smoothing trend) is 0.0001 and the gamma (smoothing seasonal) is
0.30.
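
The three smoothing equations can be sketched as below for the multiplicative-seasonal case. The initial values (first-cycle mean, cycle-to-cycle slope, first-cycle ratios) and the form of the seasonal update are assumptions; fitted library models choose these differently, so this sketch will not reproduce the report's RMSE. The example series is synthetic.

```python
def holt_winters_multiplicative(train, alpha, beta, gamma, period, horizon):
    """Triple exponential smoothing with additive trend and multiplicative seasonality.

    Assumes at least two full seasonal cycles of strictly positive data.
    """
    first = train[:period]
    second = train[period : 2 * period]
    level = sum(first) / period
    trend = (sum(second) / period - level) / period
    seasonal = [y / level for y in first]
    for t in range(period, len(train)):
        y = train[t]
        old_factor = seasonal[t % period]
        previous_level = level
        level = alpha * (y / old_factor) + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonal[t % period] = gamma * (y / level) + (1 - gamma) * old_factor
    n = len(train)
    return [
        (level + h * trend) * seasonal[(n + h - 1) % period]
        for h in range(1, horizon + 1)
    ]


# Synthetic series: constant level 100 with a repeating 4-period seasonal pattern.
cycle = [50, 100, 150, 100]
hw = holt_winters_multiplicative(cycle * 3, alpha=0.551, beta=0.0001, gamma=0.30,
                                 period=4, horizon=4)
```

The smoothing parameters in the call are the values quoted above; for monthly data the period would be 12.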

Figure-Triple Exponential Smoothening (Multiplicative)

Prediction of all Models

Figure-1 Simple, Double and Triple Exponential (tuned)& Linear Model


Model Type                                                                  RMSE
RegressionOnTime                                                            266.2765
NaiveModel                                                                  245.1213
SimpleAverageModel                                                          63.98457
4pointTrailingMovingAverage                                                 40.500621
Simple Exponential Smoothening                                              192.641397
Double Exponential Smoothening                                              247.788062
Triple Exp Smoothing Model: Level 0.57, Trend 0.01, Seasonality 0.27        97.286929
Triple Exp Smoothing Model(tuned)                                           56.89

The RMSE values seem to be lowest for the 4 point Trailing Moving Average Method so far.

1.4] Check for stationarity - Make the data stationary (if needed)

• The Augmented Dickey-Fuller (ADF) test is a unit root test: it determines whether
the series has a unit root and is therefore non-stationary.
• The hypotheses in a simple form for the ADF test are:
H0: The Time Series has a unit root and is thus non-stationary.
H1: The Time Series does not have a unit root and is thus stationary.
• We want the series to be stationary for building ARIMA models, so we
want the p-value of this test to be less than the significance level alpha (0.05).
• When the ADF test was applied to the series we got a p-value of 0.602, which is higher than
0.05, hence we fail to reject the null hypothesis and conclude that the series is not
stationary.
• We therefore take a first-order (lag 1) difference of the series and check for stationarity again.
• The p-value after first-order differencing is 0.0234 < 0.05, hence we now reject the null
hypothesis and conclude that the differenced series is stationary.
• The results are shown below. After differencing, the test statistic is -3.144211,
while the number of lags used is 13.
• Now that the data is stationary, we can move on to building the ARIMA and
SARIMA models.
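
First-order differencing replaces each value with its change from the previous period, which removes a linear trend. The sketch below shows only the differencing step, with an illustrative function name and made-up numbers; the ADF test itself is normally run with a statistics library (for example `adfuller` in statsmodels) and is not re-implemented here.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both values exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


trending = [5 + 3 * t for t in range(6)]   # hypothetical series with a linear trend
differenced = difference(trending)          # constant after one difference
```

Differencing shortens the series by one observation per lag, so the ARIMA order d = 1 used below corresponds to this single difference.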

Results of Dickey-Fuller Test:

Test Statistic                  -1.717397
p-value                          0.6022172
#Lags Used                      13.000000
Number of Observations Used    173.000000
Critical Value (1%)             -3.468726
Critical Value (5%)             -2.878396
Critical Value (10%)            -2.575756
dtype: float64

Before Differencing:

After Differencing:

Results of Dickey-Fuller Test:

Test Statistic                  -3.144211
p-value                          0.023450
#Lags Used                      13.000000
Number of Observations Used    117.000000
Critical Value (1%)             -3.487517
Critical Value (5%)             -2.886578
Critical Value (10%)            -2.580124
dtype: float64

Stationary Shoe lag

1.5]
Build an automated version of the ARIMA/SARIMA model in
which the parameters are selected using the lowest Akaike
Information Criteria (AIC) on the training data and evaluate
this model on the test data using RMSE.

We first create a grid of all possible combinations of (p,d,q).

The range of ‘p’ and ‘q’ being (0,4) and ‘d’ a constant = 1.
An ARIMA model is fitted for each combination, and the combination with the lowest AIC
value is selected.
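
The grid search can be sketched as follows, assuming that (0,4) denotes Python's range(0, 4), i.e. the values 0 to 3. Under that assumption the grid has 16 entries and position 11 is (2, 1, 3), which agrees with the row index of the selected model. The selection helper takes the scoring function as an argument, because fitting an ARIMA model and reading its AIC needs a statistics library; the stand-in scorer below is purely hypothetical.

```python
import itertools

p_values = q_values = range(0, 4)
d = 1
pdq_grid = list(itertools.product(p_values, [d], q_values))


def select_by_aic(grid, aic_of):
    """Return the (order, aic) pair with the lowest AIC.

    `aic_of` is a caller-supplied function that fits a model for the given
    order on the training data and returns its AIC.
    """
    scored = [(order, aic_of(order)) for order in grid]
    return min(scored, key=lambda pair: pair[1])


# Hypothetical stand-in scorer, used only to show the call shape.
best_order, best_score = select_by_aic(
    pdq_grid, lambda order: abs(order[0] - 2) + abs(order[2] - 3)
)
```

AIC rewards goodness of fit on the training data while penalising the number of parameters, so the AIC-selected order is not guaranteed to give the lowest RMSE on the test data.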

param AIC

11 (2, 1, 3) 1480.805493

ARIMA SUMMARY

Graph:

Diagnostics:

Details:
Model Type RMSE MAPE
AIC-ARIMA(2,1,3) 184.648 85.73498

SARIMA:

Again, we create a grid of all possible combinations of (p,d,q) along with the seasonal
(P,D,Q) and a seasonality of 12.
The range of ‘p’ and ‘q’ being (0,4) and ‘d’ a constant = 1.
A SARIMA model is fitted for each combination, and the combination with the lowest AIC
value is selected.

param seasonal AIC


23 (0, 1, 2) (1, 0, 2, 12) 1156.165429

We now fit the train data with the model and forecast on the test set. And we
get the SARIMA Summary, graph and diagnostic results

Inference:

Model Type                           RMSE        MAPE
AIC-ARIMA(2,1,3)                     184.648     85.73498
AIC-SARIMA(0, 1, 2)(1, 0, 2, 12)     69.03066    26.45588

Summary:

Graph:

Details:

Check the performance of the models built

• The AR order is selected by looking at where the PACF plot cuts off (for
appropriate confidence interval bands) and the MA order is selected by
looking at where the ACF plot cuts off (for appropriate confidence interval
bands).
• The correct degree or order of differencing gives us the value of ‘d’, while the
‘p’ value is the order of the AR model and the ‘q’ value is the order of
the MA model.
• For SARIMA, the seasonal parameter ‘F’ can be determined by looking at the
ACF plot. The ACF plot is expected to show spikes at multiples of ‘F’,
thereby indicating the presence of seasonality.
Also, for seasonal models, the ACF and PACF plots behave a bit differently:
they will not always continue to decay as the number of lags increases.
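
To make the role of the ACF concrete, the sample autocorrelation at each lag can be computed directly. The sketch below uses the standard estimator (lagged products of deviations divided by the total sum of squared deviations); the function name and both example series are assumptions for illustration.

```python
def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    denominator = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denominator
        for k in range(max_lag + 1)
    ]


# A series that flips sign every period is negatively correlated at lag 1.
alternating = acf([1, -1] * 4, max_lag=2)

# A series repeating every 4 periods shows a spike at lag 4 (its seasonal period).
seasonal_acf = acf([0, 1, 0, -1] * 6, max_lag=4)
```

On monthly data with yearly seasonality the corresponding spikes would appear at lags 12, 24 and so on.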

We get the ‘p’ value from the PACF and the ‘q’ value from the ACF plot. The following are
the plots at d=1:

Figure Autocorrelation of Differenced Data

Figure Partial Autocorrelation of Differenced Data

An ARIMA(3,1,1) model is fitted. These orders were read from the ACF and PACF
plots.
Summary:

Figure-ACF/PACF(Summary)
Forecast:

graph
Diagnostics:

Figure-. Diagnostics
Observations:
Model Type RMSE MAPE
AIC-ARIMA(2,1,3) 184.648 85.73498
AIC-SARIMA(0, 1, 2)(1, 0, 2, 12) 69.03066 26.45588
ACF/PACF-ARIMA(3,1,1) 144.1839 66.91049

SARIMA:

We get the ‘p’ value from the PACF plot and the ‘q’ value from the ACF plot.
From the ACF and PACF plots above at d=1 with frequency = 12, we additionally find
P, D, Q by looking for seasonal peaks.

A SARIMA(3,1,1)(2, 0, 4, 12) model is fitted. These values have been found
from the ACF and PACF plots. We then get the SARIMA summary, graph and
diagnostic results.

Summary:

Graph:

Diagnostics:

Calculations:
Model Type RMSE MAPE
AIC-ARIMA(2,1,3) 184.648 85.73498
AIC-SARIMA(0, 1, 2)(1, 0, 2, 12) 69.03066 26.45588
ACF/PACF-ARIMA(3,1,1) 144.1839 66.91049
ACF/PACF-SARIMA(3,1,1)(2, 0, 4, 12) 109.9242 46.26953

Inference:

• Among the ARIMA and SARIMA models above, AIC-SARIMA(0, 1, 2)(1, 0, 2, 12) has the
lowest RMSE and MAPE. Additionally, ARIMA models are computationally efficient
and give us accurate predictions.
• The comparison also takes MAPE into consideration, as it is always a good idea to have
more than one accuracy measure.

1.7] Make a forecast for the next 12 months


The graph shows the forecast for the next 12 months using the ARIMA model.

Figure-Optimum Model Forecast for next 12 months

Insights & Recommendations:

• Sales tend to pick up in the second half of the year, especially in the last
quarter. December records the highest shoe sales.
• The spike may be due to the holiday season, especially at year end, when people
tend to buy gifts for others or for themselves.
• Sales peaked between 1986 and 1988. This peak may be due to widespread
interest and to innovations and offers made to lure customers into buying
the products, thus boosting sales.
• The company can increase sales provided it focuses on advertisement,
marketing and the launch of new types of shoes.
• By launching new variants and types of shoes, it can boost
sales, study the pattern of shoe sales, and then decide to
stop manufacturing the shoe types that are less in demand in the
market.
• This may help to achieve a year-on-year rise in shoe sales.
Soft Drink Production Forecast

1.4] Building Different models and checking RMSE

Linear Regression:

Plotting of Graph:

Linear Regression

Model Type RMSE


Regression On Time 798.150383

Model Type RMSE
Regression On Time 798.1503
SimpleAverageModel 934.353357929829

Moving Average Forecast:

Figure-11 Trailing Moving Average Forecast


Model Type RMSE
Regression On Time 798.1503
SimpleAverageModel 934.353357929829
MovingAverage(2 pt Trailing) 429.354079

The RMSE value is lowest for the 2 point Trailing Moving Average so far.

The alpha value (smoothing level) at which the graph is plotted is 0.119.

Figure- Simple Exponential Smoothening

Model Type RMSE


Regression On Time 798.1503
SimpleAverageModel 934.353357929829
MovingAverage(2 pt Trailing) 429.354079
Single Exp. Smoothing Model: Level 0.12 817.697561

Double Exponential Smoothening:

• Double exponential smoothing uses two weights (also called
smoothing parameters) to update the components at each
period.
• The alpha value (smoothing level) at which the graph is
plotted is 0.124, while the beta (smoothing trend) is 0.11.

Figure-Simple and Double Exponential Smoothening

Model Type                                               RMSE
Regression On Time                                       798.1503
SimpleAverageModel                                       934.353357929829
MovingAverage(2 pt Trailing)                             429.354079
Single Exp. Smoothing Model: Level 0.12                  817.697561
Double Exp Smoothing Model: Level 0.12, Trend 0.11       931.309018

Triple Exponential Model
• Triple exponential smoothing is used to handle the time series data
containing a seasonal component. This method is based on three
smoothing equations: stationary component, trend, and seasonal. Both
seasonal and trend can be additive or multiplicative. This is the additive
model.

• The alpha value or smoothening level at which the graph is plotted is 0.15,
while the beta or smoothening trend is 0.039 and gamma or smoothening
seasonal is 0.262.

Model Type                                                              RMSE
Regression On Time                                                      798.1503
SimpleAverageModel                                                      934.353357929829
MovingAverage(2 pt Trailing)                                            429.354079
Single Exp. Smoothing Model: Level 0.12                                 817.697561
Double Exp Smoothing Model: Level 0.12, Trend 0.11                      931.309018
Triple Exp Smoothing Model: Level 0.15, Trend 0.04, Seasonality 0.26    459.51

Among the exponential smoothing models, the RMSE is lowest for Triple Exponential
Smoothing (459.51). The 2 point Trailing Moving Average still has the lowest RMSE
overall (429.35).

Prediction of all Models


1.5] Checking for Stationarity


• The hypotheses in a simple form for the ADF test are:

H0: The Time Series has a unit root and is thus non-stationary.
H1: The Time Series does not have a unit root and is thus stationary.

• When the ADF test was applied to the series we got a p-value of 0.756854,
which is higher than 0.05, hence we fail to reject the null hypothesis and
conclude that the series is not stationary.
• The p-value after first-order differencing is 0.01345 < 0.05, hence we now
reject the null hypothesis and conclude that the differenced series is stationary.
• The test statistic value is -3.33, while the number of lags used is 12.
• Now that the data is stationary we can move on to building the
ARIMA and SARIMA models.

Before Differencing Stationarity Check:

After Differencing

Figure- Stationarity of Soft Drink Production at lag 1

1.6] ARIMA and SARIMA using lowest AIC method

ARIMA:

• We create a grid of all possible combinations of (p,d,q). The range of ‘p’ and ‘q’ being
(0,4) and ‘d’ a constant = 1
• An ARIMA model is fitted for each of the above combinations and we
choose the one with the least AIC value.
• The lowest AIC value is mentioned below:

param AIC

2 (2,0,2) 2054.66072

• We now fit the train data with the model and forecast on the test
set.

ARIMA Summary

GRAPH

Inference:

This is not a good model because its predictions are far from the test data.

SARIMA

We fit a SARIMA model for each of the above combinations and
choose the one with the least AIC value.

param seasonal AIC


26 (1,0,2) (1, 0 , 2, 5) 1865.43

We now fit the train data with the model and forecast on the test set. And we get the
SARIMA Summary, graph and diagnostic results

Summary

Graph:

Diagnostics:

Calculation

Inference:

The graph above shows that this is also not a good model, because the SARIMA
prediction is close to a straight line.

1.7] ARIMA and SARIMA based on the cut-off points of ACF and
PACF:

ARIMA:

We get the ‘p’ value from the PACF plot and the ‘q’ value from the ACF
plot. The following are the plots at d=1:

Figure-19 Autocorrelation of Differenced Data

Figure-20 Partial Autocorrelation of Differenced Data

We get the ARIMA summary, graph and diagnostic results.


SARIMA:
We then fit a SARIMA(1,0,2)(0, 0, 2, 5) model. These
values have been found from the ACF and PACF plots. We then get the
SARIMA summary, graph and diagnostic results.

SUMMARY

Graph

Diagnostic


1.9] Building of optimum model and 12 month forecast

Forecasting of Data with all the model.

Inference:

The model is a good model, as its predictions are close to the test
data.

1.9]Recommendations & Suggestions

• Production picks up in the second half of the year more than in the
first. December records the highest soft drink production.
• Higher production is most likely driven by
higher sales of the commodity.
• Surprisingly, sales are not highest in the summer season;
they are highest at year end.
• In the monthly as well as the yearly trend, we see that December is
the most popular month for soft drink production, and
production peaked between 1988 and 1990. This peak may
be due to better buying power of consumers or
to a new innovation in soft drinks.
• Since soft drink sales exceed the sales of the prior year,
manufacturers should ensure that they produce more than in the
preceding year.
• An audit of transportation and of any difficulties should be conducted before the peak,
so that there is no supply chain breakdown during the period.
