2020 3rd International Conference on Computing, Mathematics and Engineering Technologies – iCoMET 2020
ATM Cash Prediction Using Time Series Approach
Muhammad Rafi∗ , Mohammad Taha Wahab† , Muhammad Bilal Khan‡ , Hani Raza§
Department of Computer Science, FAST National University,
Karachi, Pakistan.
Email: ∗ muhammad.rafi@nu.edu.pk, † k163763@nu.edu.pk, ‡ k163778@nu.edu.pk, § k163803@nu.edu.pk
978-1-7281-4970-7/20/$31.00 © 2020 IEEE

Abstract: One of the main challenges in today's banking industry is to forecast the cash demand of an ATM network. Each ATM must be filled with the right amount, so that no customer transaction is rejected because the ATM is out of cash and no idle cash costs the bank the opportunity to earn profit on it. This paper proposes a time series model for forecasting the cash demand of each ATM in the network of a specific financial institution. Using the transaction data of each ATM, we build a Vector Auto Regressive Moving Average with Exogenous Variables (VARMAX) model for each ATM. We compare our approach with a recently proposed Recurrent Neural Network (RNN) approach, Long Short-Term Memory (LSTM), which was reported to perform best for this problem. Our proposed model using the exogenous variables performed better than this model. The study used a dataset comprising transactions of 7 ATMs from June 2013 to December 2015, located in some of the financially busy areas of Karachi, Pakistan. The Symmetric Mean Absolute Percentage Error (SMAPE) is used to report the results of the experiments.

Keywords: Vector Auto Regressive Moving Average, Symmetric Mean Absolute Percentage Error, Mean Squared Error, Time Series, Seasonal Auto Regressive Integrated Moving Average, Neural Networks.

I. INTRODUCTION
Automatic Teller Machines (ATMs) are devices used by financial institutions to offer a group of services automatically, around the clock, in public spaces. One of the major services ATMs offer is the disbursement of cash from a user's account. According to a survey conducted by the ATM Industry Association (ATMIA), there were more than 1.6 million ATMs in the world [12], and the number increases by 10-12% per year. Keeping the ATM network running smoothly under its regular transactional workload is very challenging for banks. ATM replenishment is the activity of refilling an ATM with cash. There are two important issues in maintaining the right amount of cash in an ATM. If the ATM is filled with too much cash, the excess lies idle and the bank loses the opportunity to invest it for profit. On the other hand, if the amount is not enough, then once all the cash has been dispensed there will be out-of-cash transactions, which leave customers dissatisfied with the service. Forecasting the actual cash demand of every ATM is a long-standing goal for banks. Financial institutions believe that the demand for cash depends on three important factors [13]:
1) ATM location
2) Seasonal factor
3) Historical patterns from the users
There are research works that use historical transactional data to predict the cash demand of an ATM. It is very important for banks to use intelligent processing techniques to optimize and reduce the cost of their operational tasks [11]. Cash demand forecasting for ATMs is a challenging problem, and there are two important approaches to solving it.
In the machine learning approach, algorithms such as regression, deep learning, random forests, and support vector machines are used for forecasting. An institution such as a bank might have multiple ATMs. A machine learning model can be trained with the individual ATM id as a feature. These algorithms do not handle seasonality factors very well. Moreover, the ATM id feature increases the complexity of the calculation linearly. By dropping the ATM 'ID' feature, we can split the data into sub-datasets to divide the workload. This gives us thousands of machine learning models, each trained on a smaller dataset [10]. Training time is also reduced by having different models for deposit and withdrawal. As a result, each ATM has two different machine learning models. The other approaches come from statistics, especially time series analysis [5]. We make two critical observations about the work in [8]. First, its authors reported that the time series model they developed performed very poorly on the dataset. We find that the poor results stem from applying the time series model without first preprocessing the data to make it stationary. The model they selected was also very simple and could not account for the implicit seasonal and cyclic components of demand. We study different time series models after suitable preprocessing and identification of exogenous variables from the data. Our VARMAX model performed exceptionally well on the same data. The second observation concerns the evaluation metric selected in [8]. They used the residual sum of squares (RSS) error. RSS is not a good metric for evaluating a time series model across a number of distinct devices such as ATMs. We favour the use of Symmetric Mean Absolute Percentage Error (SMAPE), an
accuracy measure derived from relative or percentage error. It has both a lower bound and an upper bound, so it can be used across all the distinct ATMs to report their forecast performance.
There are several contributions from this paper:
i A VARMAX model is proposed for ATM cash demand forecasting.
ii The data is preprocessed and transformed into a form ready to use with VARMAX, and we conducted several experiments on different ATMs over different timelines.
iii The experiments are critically evaluated and compared with the top performing work on the same dataset.
The paper is organized as follows. Section II describes the related work. Section III covers data pre-processing. Section IV discusses the methodology, and Section V explains the models used in the experiments. Section VI discusses the results of the experiments.

II. LITERATURE REVIEW
ATM cash flow prediction is a challenging problem, and the approaches in the literature achieve varying degrees of performance on this task. A handful use Neural Networks, and a few treat it as a time series problem and solve it with related models such as ARIMA, SARIMA, and VARMAX. The root problem lies in understanding the trend, seasonality, and exogenous variables in a time series [5].
For a multiplicative approach
$$Y = T \times C \times S \times I \tag{1}$$
For an additive approach
$$Y = T + C + S + I \tag{2}$$
where $T$ stands for Trend, $C$ for Cyclic, $S$ for Seasonality, and $I$ for Irregular (random movements). There are many ways to model and predict a time series, but some of them, such as the ARMA model, are difficult to apply. It is almost impossible to build such a model without the direct involvement of a human expert, especially with the Box-Jenkins methodology [9]. Moving Average appears to be a simple way to minimize the effect of different conditions in a time series, and it may perform competitively with more complex algorithms.
A primary source of data is the NN5 competition, which hosts 111 time series of ATMs with 735 known observations. Kamini, Vadlamani, and Kumar [6] carried out an experiment on this dataset using two methods. Method 1 is the traditional method of deseasonalisation with a lag of 7 days, followed by ARIMA(2,1,2) and other forecasting techniques including Multilayer Feed Forward Neural Network (MLFF), Wavelet Neural Network (WNN), and General Regression Neural Network (GRNN). In method 2, the chaotic nature of the time series is analysed by reconstructing the phase space using a delay time of 2 and an embedding dimension of 2. The best result of the study, a SMAPE of 14.71%, is yielded by GRNN. This result convincingly outperformed the results of [7].
Auto Regressive Integrated Moving Average (ARIMA) is a time series model well suited to short term prediction [2]. The ARIMA model gives high forecasting accuracy, which decreases as the prediction horizon grows. The method is suitable for high technology markets, especially banks, since it gives a significant indicator for the future. It was limited to short term forecasting and is not useful for the long term on this topic. Future research directions include other forecast horizons and other stock market data, such as industrial data.
Neural Networks are one of the most modern solutions available for time series analysis. Like any machine learning model, they require a vast amount of historical data in order to predict future values precisely. According to the results obtained by [3], Neural Networks (NN) provide better and more feasible results than a regression model. The Box-Jenkins approach, which uses the Autoregressive Moving Average, has been ideal for this kind of time series forecasting. Numerous attempts have been made to apply different neural networks to time series, and numerous studies have shown the superiority of Neural Networks over ARMA. However, it is questionable whether an Artificial Neural Network (ANN) can consistently outperform ARMA models in all situations. "Therefore, none of them is a universal model that is suitable for all circumstances" [9]. Support Vector Machines (SVM), unlike fuzzy networks, are used for binary classification, and they give promising results for both data classification and regression. The proposed daily cash demand forecasting methods for automatic teller machines show tolerable forecasting quality using Support Vector Regression (SVR), but according to tests carried out by [12], better results can be achieved with Artificial Neural Networks. Various factors affecting the prediction are used to build a network relating cash withdrawal and the actual cash demand. The input values for such an ANN include the weekday, the day of the month, the month of the year, and holidays. The output is the ATM cash demand for the next day, and the predictions are evaluated using Root Mean Square Error.

III. DATA PRE-PROCESSING
Data pre-processing is concerned with preparing the data for our models. Duplicates were removed and missing values were interpolated for the purpose of daily prediction. The data was tested for stationarity using the Dickey-Fuller test available in Python and made stationary by differencing up to the required degree. However, explicit differencing was carried out only for VARMAX, while ARIMA performs this step itself and returns predictions on the scale of the original values. For VARMAX, the values are integrated again after the data has been made stationary, to return to the original set of values that were differenced. VARMAX generally works on the presence of
exogenous values, hence the Holidays and Salary Week features were vectorized so that predictions are made in accordance with these exogenous values, which directly affect the series.

IV. METHODOLOGY
This section describes the time series approach previously attempted in [8]. That paper also describes a Neural Network approach, but our concern is a critical analysis of its time series modeling.

A. Previous Approach
1) Model Approach: Reference [8] uses a dataset covering a 3 year period of transactions from ATMs located in Pakistan. The authors state that Amount of Transaction and Count of Transaction are strongly correlated, with a correlation of about 0.98, which indicates that most transactions were of a small amount of cash. Their procedure used 30% of the data for training and the remaining 70% for testing. According to [8], time series modeling also finds hidden observations in the data and performs well when the series are strongly related. It recognizes the seasonal trends in the data, such as a high volume of transactions on public holidays and religious festivals. They assumed that the time series (TS) is stationary, i.e. that its statistical attributes such as mean and variance remain constant over time. This is essential since there is a high chance that the time series will follow an identical pattern in the future. The Dickey-Fuller test is used to test whether the data is stationary or non-stationary. After the test, they concluded that the data of some ATMs were not stationary. To make the dataset stationary, they used a moving average over a specific window size. ARIMA and ARMA models were then used to forecast and match the testing data, which proved to be highly insufficient, and they concluded that the results were not satisfactory.
This fails because, unlike interpolation, a moving average is just the mean over a specific window, which does not provide as much insight into the user data as the interpolation carried out in our approach. ARIMA, as its name suggests, uses differencing to make the data stationary for ARMA. The prediction must be reported on the original non-stationary scale: if it is evaluated on the stationary (differenced) data, the forecast does not map onto the test values and the resulting error figures are misleading. Hence the data is integrated back before predictions are evaluated. The previous approach uses the differenced dataset to predict, hence getting faulty error readings.
Another aspect is the handling of seasonality. Unlike the usual approach, the seasonal parameters were forced into the model rather than letting the model learn them itself, whereas a time series model uses features such as seasonality and trend implicitly.
2) Evaluation Approach: Residual Sum of Squares (RSS) was used as the evaluation metric for the time series approach. RSS is closely related to Mean Squared Error (MSE): it is the sum of the squared differences between the predicted and the test values, and dividing it by the total number of values gives the MSE. Hence:
$$\mathrm{MSE} = \frac{1}{N}\,\mathrm{RSS} = \frac{1}{N}\sum_{i=1}^{N} (f_i - y_i)^2 \tag{3}$$
where $N$ is the number of samples and $f_i$ is our estimate of $y_i$.
The difference between MSE and SMAPE is clear. MSE squares each residual to penalize errors and make them more prominent. However, this means the units are also squared, so the error is no longer expressed in the units of the original quantity.
$$\mathrm{SMAPE} = \frac{1}{n}\sum_{t=1}^{n} \frac{|F_t - A_t|}{(A_t + F_t)/2} \tag{4}$$
where $n$ is the total number of values, $F_t$ are the forecasted values, and $A_t$ are the actual values.

B. Proposed Approach
1) Model Approach: ATM cash prediction using the time series approach uses statistical models such as Auto Regression (AR), Auto Regressive Integrated Moving Average (ARIMA), and Vector Auto Regressive Moving Average with Exogenous variables (VARMAX). Our approach splits the data 70/30 into training and testing sets, respectively, which is the conventional ratio, unlike the 30/70 training/testing split described above. In the previous approach, data was made stationary using the Moving Average (MA) methodology, which is not a practical approach. Our approach performs the same task using differencing, which "Calculates the difference of a Data Frame element compared with another element in the Data Frame (default is the element in the same column of the previous row)" [4]. In such cases, differencing and power transformations are often used to remove the trend and to make the series stationary [1]. The ARIMA model uses the (p,d,q) notation, where 'd' is the order of integration, i.e. the number of differencing steps required to make the data stationary. In the previous approach, by contrast, a Moving Average was first used to make the time series stationary, and the result was then passed to the ARIMA model. In our ARIMA implementation, we pass the original time series into the model and then forecast. The results are satisfactory, while previously the RSS was deemed too high to proceed. Figure 1 shows the ARIMA model implemented and tested on a two month time frame, with a SMAPE of 22.5%. In the graph, blue marks the original values and red the predicted values.
Moving on, data analysis showed a high correlation between transaction amount and transaction count. We therefore used the multivariate time series model VARMAX, with which we can forecast two time series, i.e. transaction amount and transaction count, at a
time. VARMAX takes into account exogenous variables such as holidays, salary week, and even weekdays. We first made the time series stationary using differencing and then passed the values to our model for testing and prediction, obtaining an average SMAPE of 31.4%. In the VARMAX figures (Fig. 2 and Fig. 3), the original values are shown in blue and the predictions in red.

Fig. 1. ARIMA Model Test
Fig. 2. VARMAX Model for count
Fig. 3. VARMAX Model for transaction

The previous approach also implements seasonality explicitly by setting exogenous variables, whereas in a scenario such as this, where real time series data is used, seasonality must be implicit in the time series data.
2) Evaluation Approach: Our approach uses Symmetric Mean Absolute Percentage Error (SMAPE). Besides being considered the standard evaluation metric for time series analysis, it reports results as a percentage ranging from 0% to a maximum of 200%. When evaluating forecast accuracy, MAPE becomes infinite if there are zero values in a series. SMAPE is therefore used to evaluate error, and we use it for the same reason. It is also the main evaluation metric in the NN5 competition [6]. Further discrepancies regarding the earlier metric are discussed in Section IV-A2.

V. EXPERIMENTAL STUDIES
A. ARIMA
ARIMA is a statistical model for analysing and forecasting time series data. It is a simple yet precise method, considered one of the standards in time series prediction. The name itself describes the model: ARIMA stands for AutoRegressive Integrated Moving Average. It adds an integration step to ARMA, making the data stationary through differencing [4].
An ARIMA model is written as ARIMA(p,d,q), where the parameter values indicate the specific model used for training, testing, and forecasting.
ARIMA consists of the following parameters:
1) p: Number of lagged (previous) values used to predict the forecast.
2) d: Number of differencing steps needed to make the data stationary.
3) q: Order of the moving average part, i.e. the number of lagged forecast errors.
ARMA(p, q):
$$Y_t = \beta_0 + \beta_1 Y_{t-1} + \dots + \beta_p Y_{t-p} + \epsilon_t + \theta_1 \epsilon_{t-1} + \dots + \theta_q \epsilon_{t-q} \tag{5}$$
The differencing (if any) must be reversed to obtain the forecast $\hat{Y}_t$ from the forecast $\hat{y}_t$ of the differenced series:
if d = 0: $\hat{Y}_t = \hat{y}_t$
if d = 1: $\hat{Y}_t = \hat{y}_t + Y_{t-1}$
if d = 2: $\hat{Y}_t = \hat{y}_t + 2Y_{t-1} - Y_{t-2}$
Continuing the above approach, our methodology consists of 3 major steps:
1) Parsing the dataset.
2) Prediction using the time series approach.
3) Evaluation using the SMAPE metric.
The dataset, as always, is the heart of the implementation and consists of ATMs with their id and transactions. The time series is brought into a form the model can consume and is used for ARIMA prediction. The results obtained are then evaluated using the SMAPE metric.

B. VARMAX
VARMAX (Vector Autoregressive Moving Average model with exogenous variables) extends the ARMA/ARIMA model in two ways:
1) to work with time series with multiple response variables (vector time series).
2) to work with exogenous variables, or variables that are
independent of the other variables in the system.
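As a minimal sketch of these two extensions, the following Python snippet computes a one-step forecast for a vector of two response variables (transaction amount and transaction count) driven by two exogenous flags (holiday and salary week). The helper names `matvec` and `predict_one_step` and all coefficient values are assumptions made purely for illustration. The sketch omits the moving average term and is not the implementation used in the experiments.

```python
# Hypothetical illustration: one-step VARMAX-style prediction with one
# autoregressive lag, one exogenous term, and no moving average term.
# All coefficient values below are made up for the example.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def predict_one_step(phi, b0, const, y_prev, x_now):
    """Return C + Phi * Y_{t-1} + B_0 * X_t for a vector time series."""
    ar_part = matvec(phi, y_prev)    # effect of the previous responses
    exog_part = matvec(b0, x_now)    # effect of the exogenous variables
    return [c + a + e for c, a, e in zip(const, ar_part, exog_part)]


# Two response variables: [transaction amount, transaction count]
y_prev = [100.0, 10.0]
# Two exogenous variables: [is_holiday, is_salary_week]
x_now = [1.0, 0.0]

phi = [[0.5, 0.0],
       [0.0, 0.5]]      # 2 x 2 autoregressive parameters (assumed)
b0 = [[20.0, 30.0],
      [2.0, 3.0]]       # 2 x 2 exogenous parameters (assumed)
const = [10.0, 1.0]     # vector of constants (assumed)

forecast = predict_one_step(phi, b0, const, y_prev, x_now)
print(forecast)  # [80.0, 8.0]
```

Setting both exogenous flags to zero removes the exogenous contribution, which makes the separate roles of the response lags and the exogenous variables easy to see.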
The model includes both the dynamic relationship between
the multiple response variables and the relationship between
the dependent and independent variables. This formula
represents a non-seasonal VARMAX model:
$$Y_t = \sum_{i=1}^{p} \Phi_i Y_{t-i} + \sum_{i=0}^{b-1} B_i X_{t-i} + \sum_{i=1}^{q} \varphi_i E_{t-i} + C + E_t \tag{6}$$
In the preceding equation, $Y_t$ is a stationarized time series. The first term is the autoregressive component, the second term is the exogenous component, the third term is the moving average component, the fourth ($C$) is a vector of constants, and the fifth ($E_t$) is a vector of residual errors, and:
• $Y_t$ is a vector of n response variables
• $X_t$ is a vector of m exogenous variables
• $p$ is the number of previous periods of the response variables included in the model
• $q$ is the number of previous periods included in the moving average
• $b$ is the number of previous periods of exogenous variables
• $\Phi_i$ is an n × n matrix of autoregressive parameters
• $B_i$ is an n × m matrix of exogenous variable parameters
• $\varphi_i$ is an n × n matrix of moving average parameters
• $E_t$ is the difference between the actual and the predicted value of $Y_t$, $(Y_t - \hat{Y}_t)$
1) Model Implementation: The parsed data is first tested for correlation, and the tests show a strong positive correlation between transaction amount and transaction count. Secondly, a test is carried out to verify whether the dataset is stationary. If it is found non-stationary, it is differenced for the third step of the implementation. The data, now in a usable form, is then fitted with the VARMAX model. The results obtained are first undifferenced and converted back to their original form for the purpose of prediction. The predictions are evaluated using the SMAPE metric, which keeps the errors absolute and relative to the unit of the quantity and expresses them as a percentage.

VI. RESULT AND EVALUATION
For the same series of ATM datasets, [8] reports an RMSE (Root Mean Square Error) of 358950.12 over total transactions summing to around 2.5 million. A sample of their execution can be seen in the graph in Figure 4, where the blue line shows the original values and the orange line shows the predicted, or in this case tested, values from the model.

Fig. 4. Prediction over stationary data

For the same series of ATM datasets, our approach achieves an RMSE of 19994.81 and, as an additional error measure, a SMAPE of 22.52%, as shown in Figure 1. This is a clear and major difference in RMSE between the previously implemented approach of Rajwani et al. [8] and our ARIMA implementation. Figure 1 shows the result for our ARIMA model.

REFERENCES
[1] Ratnadip Adhikari and R. Agrawal. An Introductory Study on Time series Modeling and Forecasting. Jan. 2013. ISBN: 978-3-659-33508-2. DOI: 10.13140/2.1.2771.8084.
[2] Mohammad Almasarweh and Sadam Alwadi. "ARIMA Model in Predicting Banking Stock Market Data". In: Modern Applied Science 12 (Oct. 2018), p. 309. DOI: 10.5539/mas.v12n11p309.
[3] Pushkar Dandekar and Ketki Ranade. "ATM Cash Flow Management". In: International Journal of Innovation, Management and Technology 6 (Oct. 2015), pp. 343–347. DOI: 10.18178/ijimt.2015.6.5.627.
[4] Python Documentation. Pandas DataFrame Diff. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.diff.html. Accessed on 2019-10-21.
[5] Chihli Hung, Chih-Neng Hung, and Szu-Yin Lin. "Predicting Time Series Using Integration of Moving Average and Support Vector Regression". In: International Journal of Machine Learning and Computing 4 (Jan. 2014), pp. 491–495. DOI: 10.7763/IJMLC.2014.V6.460.
[6] Venkatesh Kamini, Ravi Vadlamani, and D Nagesh Kumar. "Chaotic Time Series Analysis with Neural Networks to forecast Cash Demand in ATMs". In: Dec. 2014. DOI: 10.1109/ICCIC.2014.7238399.
[7] Venkatesh Kamini et al. "Cash demand forecasting in ATMs by clustering and neural networks". In: European Journal of Operational Research 232 (Jan. 2014), pp. 383–392. DOI: 10.1016/j.ejor.2013.07.027.
[8] Akber Rajwani et al. "Regression Analysis for ATM Cash Flow Prediction". In: Jan. 2018. DOI: 10.1109/FIT.2017.00045.
[9] Ignacio Rojas et al. "Soft-computing techniques and ARMA model for time series prediction". In: Neurocomputing 71 (Jan. 2008), pp. 519–537. DOI: 10.1016/j.neucom.2007.07.018.
[10] Sefik Serengil and Alper Ozpinar. “ATM Cash
Flow Prediction and Replenishment Optimization with
ANN”. In: 11 (Jan. 2019), pp. 402–408. DOI: 10.29137/
umagd.484670.
[11] Sefik Serengil and Alper Ozpinar. “Workforce Op-
timization for Bank Operation Centers: A Machine
Learning Approach”. In: International Journal of In-
teractive Multimedia and Artificial Intelligence 4 (Dec.
2017), pp. 81–87. DOI: 10.9781/ijimai.2017.07.002.
[12] Rimvydas Simutis, Darius Dilijonas, and Lidija Bastina.
“Cash demand forecasting for ATM using neural net-
works and support vector regression algorithms”. In: 1
(Jan. 2008), pp. 2008–416.
[13] Rimvydas Simutis et al. “Optimization of cash manage-
ment for ATM network”. In: Information Technology
and Control 36 (Dec. 2010).