KEMBAR78
Machine Learning Based Decline Curve Analysis For | PDF | Time Series | Forecasting
0% found this document useful (0 votes)
122 views23 pages

Machine Learning Based Decline Curve Analysis For

This research article proposes using machine learning techniques for short-term oil production forecasting as an alternative to traditional decline curve analysis (DCA) methods. Specifically, the article tests two state-of-the-art time series models - DeepAR and Prophet - on production data from a well in the Midland fields in the USA. The results show that both DeepAR and Prophet can provide better understanding of well behavior compared to a single DCA model, and mitigate over or underestimates in forecasts. The machine learning approach performs short-term predictions without requiring selection of a DCA model, making it more flexible and able to account for uncertainty compared to traditional DCA methods.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
122 views23 pages

Machine Learning Based Decline Curve Analysis For

This research article proposes using machine learning techniques for short-term oil production forecasting as an alternative to traditional decline curve analysis (DCA) methods. Specifically, the article tests two state-of-the-art time series models - DeepAR and Prophet - on production data from a well in the Midland fields in the USA. The results show that both DeepAR and Prophet can provide better understanding of well behavior compared to a single DCA model, and mitigate over or underestimates in forecasts. The machine learning approach performs short-term predictions without requiring selection of a DCA model, making it more flexible and able to account for uncertainty compared to traditional DCA methods.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 23

Research Article

Energy Exploration & Exploitation


0(0) 1–23
Machine learning based ! The Author(s) 2021
DOI: 10.1177/01445987211011784
decline curve analysis for journals.sagepub.com/home/eea

short-term oil production


forecast

Amine Tadjer , Aojie Hong and


Reidar B Bratvold

Abstract
Traditional decline curve analyses (DCAs), both deterministic and probabilistic, use specific
models to fit production data for production forecasting. Various decline curve models have
been applied for unconventional wells, including the Arps model, stretched exponential model,
Duong model, and combined capacitance-resistance model. However, it is not straightforward to
determine which model should be used, as multiple models may fit a dataset equally well but
provide different forecasts, and hastily selecting a model for probabilistic DCA can underestimate
the uncertainty in a production forecast. Data science, machine learning, and artificial intelligence
are revolutionizing the oil and gas industry by utilizing computing power more effectively and
efficiently. We propose a data-driven approach in this paper to performing short term predictions
for unconventional oil production. Two states of the art level models have tested: DeepAR and
used Prophet time series analysis on petroleum production data. Compared with the traditional
approach using decline curve models, the machine learning approach can be regarded as” model-
free” (non-parametric) because the pre-determination of decline curve models is not required.
The main goal of this work is to develop and apply neural networks and time series techniques to
oil well data without having substantial knowledge regarding the extraction process or physical
relationship between the geological and dynamic parameters. For evaluation and verification
purpose, The proposed method is applied to a selected well of Midland fields from the USA.
By comparing our results, we can infer that both DeepAR and Prophet analysis are useful for
gaining a better understanding of the behavior of oil wells, and can mitigate over/underestimates
resulting from using a single decline curve model for forecasting. In addition, the proposed
approach performs well in spreading model uncertainty to uncertainty in production forecasting;
that is, we end up with a forecast which outperforms the standard DCA methods.

Department of Energy Resources, University of Stavanger, Stavanger, Norway


Corresponding author:
Amine Tadjer, University of Stavanger, Kjell Arholms gate 41, Stavanger 4036, Norway.
Email: amine.tadjer@uis.no

Creative Commons CC BY: This article is distributed under the terms of the Creative Commons Attribution
4.0 License (https://creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and
distribution of the work without further permission provided the original work is attributed as specified on the SAGE and
Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
2 Energy Exploration & Exploitation 0(0)

Keywords
Deep learning, probabilistic modeling, production forecasting, time series analysis

Introduction
Hydrocarbon production forecasting includes estimation of the ultimate recoveries and the
lifetimes of wells, which are material factors for decision-making in the oil and gas industry
because they can impact significantly economic evaluation and field development planning.
Although mathematically richer forecasting models (e.g., grid-based reservoir simulation
models) have been developed over the past decades, decline curve analysis (DCA) is still
widely used because of its simplicity: The mathematical formulations of DCA models are
simple with only a few parameters, and only production data are required to calibrate the
parameters. The Arps model (Arps, 1945) has been used for DCA for more than 60 years
and has been proved to perform well for conventional reservoirs. However, because of the
complexity of flow behaviors in unconventional reservoirs as several flow regimes are
involved (Adekoya, 2009; Joshi, 2012; Nelson, 2009) the Arps model may not be ideal,
and many other models have been proposed (e.g., the Stretched Exponential decline
model (Valk o and Lee, 2010), the Duong model (Duong, 2011) and the combined
capacitance-resistance model proposed by Pan (Pan, 2016). The Pan model is subsequently
referred as Pan CRM is this paper. Some researchers (e.g. (Gonzalez et al., 2012)) have
attempted to identify a single ed as Pan CRM is this paper. Some researchers (e.Hong et al.,
2019) have argued that selecting a single ed as Pan CRM is this paper. Some researchers (e.g.
(pacitance Resistaconfidence (i.e., trust the single model 100%) , which can cause significant
over/underestimates. Thus, their proposed approach incorporates multiple models by using
Monte Carlo simulation to assess the probability of each model and consequently provides a
probabilistic forecast of production. Some limitations of Hong et al.’s approach are: (1) a
collection of DCA models still needs to be predefined, and (2) the assessed probability of
each model is only a measure of the model’s relative goodness to other models. If, for
example, all the candidate models overestimate production, using Hong et al.’s approach
will still result in an overestimated forecast. Thus, an approach that does not require the
predefinition of DCA models is deemed preferable; i.e., using a non-parametric model.
Machine learning (ML) is still a relatively new technique in the oil and gas industry.
Several researchers have discussed the applications of ML for DCA. For instance,
(Gupta et al., 2014) used neural networks (NNs)d neural networkfor DCA. They first
trained the NNs using historical data to capture the decline in production in shale forma-
tions, and the trained model was then used for prediction. This study also used the autor-
egressive integrated moving average (ARIMA) (George et al., 2015), a time series analysis to
analyze the historical data and identify the trends and relationships of historical and pre-
dicted data. Although they applied these two methods for a sample size of around 30 wells,
but they did not quantify uncertainties in the forecasted results. (Ma and Liu, 2018) pre-
dicted the oil production using the novel multivariate nonlinear model based on traditional
Arps decline model and a kernel method. (Aditya et al., 2017) developed a novel predictive
modeling methodology that linked well completion and location features to DCA model
parameters. The objective of the methodology was to generate predicted decline curves at
potential new well locations. (Han et al., 2020) used Random Forest (RF) to develop a
Tadjer et al. 3

predictive model that can be used to predict productivity during the early phase of produc-
tion (within 6 months). The required datasets were obtained from 150 wells, targeting shale
gas, stationed at Eagle Ford shale formations. Reservoir properties, well stimulation and
completion were considered as key input parameters whilst the cumulative production of gas
during a span of 3 years was identified as the target variable.Although (Aditya et al., 2017;
Han et al., 2020) results were promising, the applicability of their methodology depends
heavily on the presence of specific geological, well stimulation and completion data, and the
quality and accuracy of the data have a big impact and influence, any anomaly in data
consequently make their results less promising.In the context of deep learning, (Luo et al.,
2019) built non-linear models using RF and Deep Neural Network (DNN) algorithms to
forecast the cumulative production of oil during a span of 6 months. The whole dataset was
obtained from around 3600 wells positioned at Eagle Ford formations. Key parameters
associated with geological parameters such as structural depth, thickness of the formation,
total organic carbon (TOC), number of calcite layers and average thickness of the layer
(thickness of the formation divided by the total count of the layers) were identified as the
input variables that impacted the productivity of wells in Eagle Ford. In the context of Deep
Recurrent Neural Networks, (Lee et al., 2019) used the Long short-term memory (LSTM)
algorithm to develop a model for forecasting future shale-gas production. The gas produc-
tion and shut-in period of the past were taken into account to deduce the input features. The
training dataset was collected from 300 wells located in Alberta, Canada, at the Duverney
formation. For 15 wells stationed in the same field, the model was tested. The trained model
demonstrated the ability to predict production rates over a longer period (55 months). They
found out that the method can be used even faster to forecast future production rates and
analyze the impact of added attributes such as the shut-in period. It was highlighted that the
approach would provide a more reliable and accurate forecast of the production of shale gas
and that this method can be used in both traditional and unconventional scenarios. In terms
of reliability and utilization, further tuning and improvement of the feature selection process
will produce a system with improved predictive capabilities. Stimulation parameters attrib-
ute derived from geological knowledge, and refracturing were proposed to be included as
possible features that have a critical impact on shale gas production and improve the
methode the methodgeological knowledge, and hat the ahose circumstances, the high-
intensity drilling associated with unconventional hydrocarbon resources and the underper-
formance of DCA make this technique more successful. (Zhan et al., 2019) checked the
LSTM methoduded as possible features that have a critiduction of oil over two years or even
further by using very little previous data acquired during the initial production phases.
From over 300 wells stationed in unconventional onshore formations, the required dataset
was obtained. Over the first few production years, it is possible to recover around 70% of
the total EUR from shale wells, after which a rapid decline is observed. The steepness of the
decline makes it hard to survey the trend that causes over-estimation. In such a dynamic
situation, they highlighted this methodology’s value for forecasting production and assess-
ing the reservoir. They found that the average difference between the accumulated produc-
tion estimated and observed stayed within 0.2%, while the variance did not exceed 5%.
(Sagheer and Kotb, 2019) tested deep LSTM (DLSTM) network predictive efficacy in which
more LSTM layers were stacked to address shallow structure limitations when operating
with data from long interval time series. They figured that the proposed approach per-
formed much better than other models used in the analysis, like those based on ARIMA,
deep gated recurrent unit (GRU), and deep RNN. Based on their applicability, the models
4 Energy Exploration & Exploitation 0(0)

were tested and validated in two real-field case studies, such as India’s Cambay Basin oil
field and China’s Huabei oil field. An ensemble empirical mode decomposition (EEMD)
based LSTM was suggested by (Liu et al., 2020) to increase the oil production forecasting
speed and accuracy. Two real-field events, the JD and SJ oilfields based in China, were
tested to determine and verify the efficacy of the model. The EEMD-LSTM model was
contrasted with models EEMD-Suport Vector Machine (SVM) and EEMD-Neural
netowrk. The EEMD-LSTM model has been found to work much better as compared to
the other models by producing the forecast perfectly and with great quality. Although
several machine learning and deep learning models have been proposed to learn better on
how to handle multiple seasonal patterns in oil production data. However, to the best of our
knowledge, no studies have yet applied such probabilistic model for production forecasting.
The novelty of this work is to improve upon the existing techniques used by petroleum
engineers to analyze and appraise oil wells. Evaluating oil well potential is a lengthy inves-
tigation process. This is because the production profiles can be complex, as they are driven
by reservoir physics and made even more challenging by a variety of operational events.
Petroleum engineers analyze and evaluate the production profiles of oil wells, understand
their underlying behavior, forecast their expected production, and identify opportunities for
performance improvements. The investigation process is, nevertheless, time-consuming.
This introduces opportunities to optimize these processes. Thus, State-of-the-art level prob-
abilistic machine learning methods are considered DeepAR (David et al., 2019) and Prophet
time series analysis (Taylor and Letham, 2007), that are known to be effective in pattern
recognition and outperforming the state-of-the-art forecasting methods on several prob-
lems. These two algorithms can be used to understand and predict the behavior of oil
wells. Our objective is to determine the viability of these algorithms in predicting the dis-
tribution of future outcomes, specifically with time series data representing the oil produc-
tion of petroleum without having substantial knowledge regarding the extraction process or
physical relationship between the geological and dynamic parameters. In the remainder of
this paper, we first review the DL and time series analysis modeling that will be used to
accomplish the task. Thereafter, we explain the evaluation metrics used to assess the quality
of the forecast; and finally, we present the experimental results and a discussion of our work.

Time series analysis and DCA


Time series analysis
A time series is a sequence of data obtained at many regular or irregular time intervals and
stored in a successive time order; for example, a sequence of measured oil production rates
over time. The objective of time series analysis is to extract useful statistical characteristics
(e.g., trend, pattern, and variability) from a time series, to determine a model that describes
the characteristics, to use the model for forecasting, and ultimately to leverage insights
gained from the analysis for decision supporting and making. Traditionally, time series
models can be classified into generative and discriminative models, depending on how the
target outcomes are modeled (Ng and Jordan, 2002). The main difference between the two
models is that generative models predict the conditional distribution of the future values of
the time series given relevant covariates while the discriminative models use the past value.
In this study, we will use discriminative models, as they are more flexible and require fewer
parameters and structural assumptions than generative models. For more details about
Tadjer et al. 5

generative and discriminative model, see (David et al., 2019; Gasthaus et al., 2019; Ng and
Jordan, 2002; Ruofeng et al., 2018).
A critical aspect of discriminative models is the process of reconstructing a single
sequence of data points to yield multiple response observations. To solve this, sequence-
to-sequence (seq2seq) (Cho et al., 2014) and autoregressive recurrent networks (David et al.,
2019) approaches were used to feed and generate output from time series prediction models.
In seq2seq, the model is fed a sequence of time series as inputs, and it produces a time series
sequence as output, unlike the autoregressive model, which reduces the sequence prediction
to a one-step-ahead problem.

DCA
DCA is a type of time series analysis with data type of oil production data. DCA aims to
predict the future production of a well or a field based on historical data. The prediction is
useful for evaluating the economics of the future production and supporting decisions such
as whether a well or a field should be abandoned. Panutoregressive model, which reduces
the se(Pan CRM) is DCA method. It is designed to capture the major flow regimes–transient
and semi-steady state flow regimes–relevant for an unconventional well. (Pan, 2016) pro-
posed a model to capture the productivity index behavior over both linear transient and
boundary-dominated flow. Its formula is given as:

b
J ¼ pffiffi þ J1 (1)
t

where J is the productivity, J1 is the constant productivity index that a well will eventually
reach at boundary dominated flow, b is the parameter of the linear transient flow, b is
related to the permeability in the analytical solution of linear flow into fractured wells
presented by (Wattenbarger et al., 1998). Pan obtained the empirical solution of rate over
time by combining the previous equation and a tank material balance equation.The stan-
dard form is given as:

b pffi
qðtÞ ¼ DPðpffiffi þ J1 Þeð2b tþJ1 t=ct Vp Þ (2)
t
where ct the total compressibility, Vp the drainage pore volume, and DP is the difference
between the initial reservoir pressure and the assumed constant flowing bottom hole pres-
sure. For small t, the Pan CRM may offer an unrealistically high rate, as q(t) approaches
infinity when t approaches 0. The Pan CRM is analytically derived and has all the param-
eters associated with a reservoir system For small ation.The standard form is given as:
University ofct, Vp, DP, b and J1 are determined through history matching with the goal
to minimize a predefined loss (or objective) function by adjusting the model parameters.

Machine-learning models and techniques for time series analysis


Prophet forecasting model
The Prophet forecasting is a bayesian nonlinear univariate generative model for time series
forecasting, which was developed by the Facebook Research team (Taylor and Letham,
6 Energy Exploration & Exploitation 0(0)

2007) for the purpose of creating high-quality multistep-ahead forecasting. This model tries
to address the following difficulties common to many types of time series forecasting and
modeling:

• Seasonal effects caused by human behavior: weekly, monthly, and yearly cycles; dips and
peaks on public holidays;
• Changes in trends due to new products and market events;
• Outliers.

The Prophet forecasting model utilizes the additive regression model, which comprises of
the following components:

yðtÞ ¼ gðtÞ þ sðtÞ þ hðtÞ þ et (3)

where y(t) is the variable of interest, g(t) is the piecewise linear or logistic growth curve for
modeling non-periodic changes in a time series, seasonality s(t) represents periodic changes
(e.g., weekly or yearly seasonality), h(t) reflects the effects of irregular holidays, and et
represents the error term that accounts for any uncertain changes not accommodated by
the model (usually, et is modeled as normally distributed noise).
We invoke the growth trend g(t) as a core component of the entire Prophet model. The
trend illustrates how the entire time series expands and how it is projected to evolve in the
future. For analysts, Prophet proposes two models: a piecewise-linear model and a
saturating-growth model.
Nonlinear, saturating growth is modeled using the logistic growth model, which occurs as
follows in its most basic form:

C
gðtÞ ¼ (4)
1 þ expkð1mÞ

Where m is an offset parameter, k is the growth rate, and C is the carrying capacity.
However, the value of C is not inherently a constant, which usually varies over time. It was
then replaced by a time-varying capability C(t). Moreover, the growth rate of k is not
constant. Therefore, it is presumed that the change-point where growth rates change has
been integrated and the growth rate between two change-points is constant.
The piecewise logistic-growth model is formed as follows:

C
gðtÞ ¼ T (5)
1 þ expðkþaðtÞ dÞðtðmþaðtÞT cÞÞ

where c is the vector of rate adjustments, d is the vector of correct adjustments at change-
points, and k þ aðtÞT d is the growth rate at time t. a(t) is defined by the following:
 
1 ts
aðtÞ ¼ (6)
0 otherwise

where, s is the time point of change in the growth rate.


Tadjer et al. 7

Linear growth is modeled using a constant growth rate piecewise, and its formula is
given as:

gðtÞ ¼ ðk þ aðtÞT dÞt þ ðm þ aðtÞT cÞ (7)

Where a(t), k, d, and c are the same as the nonlinear trend model.
In the time series, seasonality reflects periodic changes daily, weekly monthly and yearly
seasonality. To provide a versatile model of periodic effects, the Prophet forecasting model
depends on a Fourier series. Its smooth fitting formula is given as:
    
X
N
2pnt 2pnt
sðtÞ ¼ an cos þ bn sin (8)
n¼1
P P

where P is a regular period that the time series may have (for example, P ¼ 7 for weekly data
or P ¼ 365 for annual data) and N is the number of such cycles that we want to use in the
model. The final seasonal model appears as follows when combining all seasonal time series
models in s(t) into a vector X(t):

sðtÞ ¼ XðtÞb (9)

where b Normalð0; r2 Þ is needed before the seasonality to enforce a smoothing.


Holidays and events: To completely understand the effect on holidays of a business time
series or other major events such as workover, production shutdown for operations (for
example, a workover), these constraints are explicitly set by the Prophet forecast model.

Recurrent neural network (RNN)


Compared with the traditional artificial neural network (ANN), the structure of RNN
neuron is different from that of ANN by adding a cyclic connection, which form feedback
loops in hidden layers, and hence the information of the last item in RNN can be trans-
mitted to the current item. The structure of RNN neuron is shown in Figure 1. When the

Figure 1. The structure of Recurrent Neural Network.


8 Energy Exploration & Exploitation 0(0)

time series X ¼ ðx1 ; x2 ; x3 ; . . . ; xn Þ is input, the sequence of hidden layer is H ¼


ðh1 ; h2 ; h3 ; . . . ; hy Þ and the sequence of output layer is Y ¼ ðy1 ; y2 ; y3 ; . . . ; yn Þ.
The relationship of X, H and Y are listed in the following equations:
( )
hn ¼ rðWxh xn þ Whh hn1 þ bh Þ
(10)
yn ¼ Why hn þ by

where, r is the non-linear activation function, Wxh, Whh and Why are the weight matrix from
input to hidden layer, hidden layer to hidden layer and hidden layer to output, respectively,
bh and by are biased terms.

Long short-term memory neural network (LSTM)


The LSTM neural network model (Greff et al., 2017; Hochreiter and Schmidhuber, 1997) is
a type of RNN structure, which is widely used to solve sequence problems. An LSTM tends
to learn long -term dependencies and solve the vanishing gradient problems1 (Grosse, 2017),
an issue observed in training ANN with gradient based learning techniques as well as
backpropagation algorithms. An LSTM allows the storage of information extracted from
data over an extended time period, and shares the same parameters (i.e., network, weights)
across all timesteps.
The structure of the LSTM shown in Figure 2. consists of the long term state ðct Þ and
three multiplicative units N with gðit Þ, output gate ðot Þ, and forget gate ðft Þ— and equiva-
lently write, read, and reset information within the model’s cells. These three multiplicative
gates enable the LSTM memory cells to store and access information over long time periods.
The gates control the amount of information fed into the memory cell at a given timestep.
Unlike traditional RNN methods that overwrite new content at each timestep, the LSTM
state vector and weights are modified at each timestep to take into account any evolution of
the input-output relation occurring over time and carry that information over a long

Figure 2. Architecture of an LSTM cell (Geron, 2017).


Tadjer et al. 9

distance. The LSTM functions are listed as follows:


8 9
>
> it ¼ rðWhi ht1 þ Wci ct1 þ Wxi xt þ bi Þ > >
>
> >
>
>
> f ¼ r ð W h þ W c þ W x þ b Þ >
>
>
> t hf t1 cf t1 xf t f >
>
< ct ¼ tanhðWhc ht1 þ Wxc xt þ bc Þ =
(11)
>
> ct ¼ ft  ct1 þ it  ct >
>
>
> >
>
> ð
> ot ¼ r Who ht1 þ Wxo xt þ bo Þ >
>
>
> >
>
: ;
ht ¼ ot  tanhðct Þ

Where the input gate ðit Þ, a forget gate ðft Þ and previous cell state ð
c t Þ control the current cell
state ðht Þ, and the output gate ot Þ and current cell state ðct Þ are used to control the hidden state
ðht Þ at time t.r is the element-wise sigmoid function,  denotes the elementwise dot product
operator, xt Þ is the input vector at time t, and ht1 is the hidden state vector that store all the
useful information prior to time t. Wxi, Wxf, Wxc, and Wxo denote the weight matrices of different
gates for input xt Þ; Whi, Whf, Whc, and Who are the weight matrices for hidden state ht; Wci, and
Wcf denote the weight matrices of cell state ct1 ; and bi, bf, bc, and bo denote the bias vector.

Gated recurrent unit (GRU)


The GRU is similar to the LSTM, but with a simplified structure and parameters. It was first
introduced by (Kyunghyun et al., 2014). GRUs have been used in a variety of tasks that
require capturing long-term dependencies (Junyoung et al., 2014). Similar to the LSTM, the
GRU contains gating units that modulate the flow of information inside the unit. However,
unlike the LSTM, the GRU does not include separate memory cells, and contains only two
gates—the update gate and the reset gate—as displayed Figure 3. The update gate zt decides
how often the unit updates its activation functions. This process takes a linear sum between the
existing state and a newly computed state. The second gate within the GRU, the reset gate rt ,
acts to forget the previously computed state. The updated functions are listed as follows:
8   9
>
> ¼ 1  Þh þ ~ >
>
> h
> t
zt t1 z h
t t >
>
< =
zt ¼ rðWz xt þ Uz ht1 Þ (12)
>
> ~t ¼ tanhðWxt þ Uð rt  ht1 ÞÞ > >
>
> h >
>
: ;
rt ¼ rðWr xt þ Ur ht1 ÞÞ

Figure 3. Gated Recurrent Unit: rÞ and z are the reset and update gates, and ht and ~h t are the activation
and the candidate activation (Kyunghyun et al., 2014).
10 Energy Exploration & Exploitation 0(0)

Where the activation ht of the GRU at time t is a linear interpolation between the pre-
vious activation ht1 and the candidate activation h~t , W denotes the weight matrices, xt is the
input vector at time t,and U denotes the weight matrices of the cell state.

DeepAR
DeepAR is a generative, auto-regressive model. It consists of a recurrent neural network
(RNN) using Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU) cells
that takes the previous time points and covariates as input. In this study, We use the fore-
casting model from Salinas et al. (David et al., 2019). Unlike other methods of forecasting,
DeepAR jointly learns from every time series. In (David et al., 2019) publication, DeepAR
was outperforming the state-of-the-art forecasting methods on many problems.
Let zi;t be the value of time series i at time t, the objective is to model the conditional
distribution Pðzi ; t0:T jzi;1:t0 1 ; xi;1:T Þ, of the future of each time series ½zi;t0 ; zi;t0þ1 ; . . . ; zi;T  :¼
zi;t0:T , given its past ½zi;1 ; . . . ; zi;t02 ; zi;t01  :¼ zi;1:t01 , where t0 represents the time point from
which zi;t is assumed to be unknown at prediction time, and xi;1:T are covariates that are
presumed to be known for all time points. The time ranges ½1 : t0  1 and ½t0 : T are the
context range and the prediction range, respectively. The model is based on an auto-
regressive recurrent network, summarised in the Figure 4. The model distribution
QH ðzi;t0 jzi;1:t0 ; xi;1:T Þ; xi;1:T is considered to be a product of likelihood factors:

Y
T
QH ðzi;t0 jzi;1:t0 ; xi;1:T Þ ¼ QH ðzi;t jzi;1:t1 ; xi;1:T Þ
t¼0
(13)
Y
T
¼ lðzi;t jhðhi;t ; HÞÞ
t¼0

Figure 4. Model Summary: The network inputs are the covariates xi;t at each step t, the goal value at
the previous step zi;t1 , and the previous network output hi;t1 at each step t. The network output hi;t ¼
hðhi;t1 ; zi;t1 ; xi;t ; HÞ is then used to measure the parameters hi;t ¼ hðhi;t ; HÞ of the probability lðzjhÞ that is
used to train the parameters of the model. A sample ^z i;t lðjhi;t Þ is fed back to the next step instead of the
true value when zi;t is unknown (David et al., 2019).
Tadjer et al. 11

with hi;t the autoregressive recurrent network output hi;t ¼ hðhi;t1 ; zi;t1 ; xi;t ; HÞ - which will
be fed as the next timestep input for hi;tþ1 - hðÞ is a function that is implemented by a multi-
layer recurrent neural network with LSTM or GRU cells parametrized by k, summarised in
tood lðzi;t jhðhi;t ; HÞ being a fixed distribution parametrized by a function hðhi;t ; HÞ. The
hi;t0 1 , initial state contains zi;t0 1 , context range information required to predict values in
the prediction range.
Given the model parameters required to pred z^i;t0 :T  QH ðzi;t0 jzi;1:t0 ; xi;1:T Þ can be obtained
directly by ancestral sampling: First, hi;t0 1 is obtained as as a recurrent network output,
then we sample z^i;t0 :T  lðjhðh^i;t ; HÞÞ for t ¼ 1; . . . ; t0  1, where h^i;t ¼ hðhi;t1 ; z^i;t1 ; xi;t ; HÞ
is initialized with h^i;t0 1 ¼ hi;t0 1 and z^i;t0 1 ¼ zi;t0 1 . The use of these samples makes it
possible to calculate quantities, like the value distribution quantities, at a particular time
in the prediction range.

Likelihood model. The probability of lðzjhÞ should at best reflect the data statistical properties.
It can be selected between any potential possibility, for example, Bernoulli, Gaussian,
Binomial-negative, etc.
For instance, the mean and the standard deviation are the parameters h ¼ ðl; rÞ in the
Gaussian likelihood case. These are provided to the network output respectively by the
network output and softplus activation to ensure r > 0:

1 ðz  lÞ2 Þ
lG ðzjl; rÞ ¼ pffiffiffiffiffiffiffiffiffiffi expð Þ;
2pr2 2r2
(14)
lðhi;t Þ ¼ wTl hi;t þ bl ;

rðhi;t Þ ¼ logð1 þ expðwTr hi;t þ br ÞÞ

Loss function. The model parameter H which consists of the RNN hðÞ parameters and the
hðÞ parameters, can be learned by maximizing the log-likelihood, as follows:

X
N X
T
i¼ loglðzi;t jhðhi;t Þ (15)
i¼1 t¼t0

with the time series dataset zi;1:Ti:1;...;N and known related covariates xi;1:T . No inference is
needed to calculate the previous equation compared to state-space models with latent var-
iables, as hi;t is a deterministic input function. It can therefore be explicitly optimized with
respect to th latent variables, as tplus activ

Measures for evaluating forecast


As previously mentioned, the purpose of this task is to predict several future timesteps in the
target time series. Confidence intervals are also given and predicting the exact values (such
as point forecasting). These are based on percentiles calculated from a probability
12 Energy Exploration & Exploitation 0(0)

distribution based on a fixed number of samples (e.g. DeepAR model). To evaluate the
forecast accuracy, we use the mean continuous ranked probability score (mean CRPS).
Mean (CRPS): used to quantify both the accuracy and precision of a probabilistic fore-
cast (Hersbach, 2000). A higher value of mean CRPS indicates less accurate results. CRPS
can be defined as:
Z 1
CRPS ¼ ½pðxÞ  Hðx  xobs Þ2 dx (16)
1

Z x
Here, pðxÞ ¼ pðyÞdy is the cumulative distribution of a quantity of interest, and Hðx  xobs Þ
1
is the step function, i.e.,
 
0 if < 0
HðxÞ ¼ (17)
1 if P 0

For N samples, the CRPS can be evaluated as follows:

X
N
CRPS ¼ ci ci ¼ ai p2i þ bi ð1  pi Þ2 (18)
i¼0

where pi ¼ PðxÞ ¼ i=N; for xi < x < xiþ1 (piecewise constant function).
8 9
< 0 if xobs < xi =
ai ¼ xobs  xi if xi < xobs < xiþ1 (19)
: ;
xiþ1  xi if xobs > xiþ1
8 9
< xiþ1  xi if xobs < xi =
bi ¼ xiþ1  xobs if xi < xobs < xiþ1 (20)
: ;
0 if xobs > xiþ1

Data collection and preparation procedure


In this work, we use oil production data from wells in the Midland field. We have selected 22
Midland wells, relatively smooth data, which indicates fewer significant operational
changes. The selected Midland wells have been completed in a natural fractured reservoir
and measured monthly. However, there are some missing measurements (i.e., no recorded
values) for a few months for each selected well. We simply ignore these missing values. Some
measurements have recorded zero values, and we suspect they indicate temporary shut-
down for operations (e.g., a workover). The zero values may interfere with the training
process, so we remove them from the data, then the datasets are rescaled with a standard-
ization.The standardization is included in deep learning to improve neural networks con-
vergence. Table 1 lists the lengths of production history of the selected wells. The lengths
range from 105 to 362 months. No matter how long a well’s production history is, we use the
Tadjer et al. 13

Table 1. Production history length of selected midland field wells.


Well-ID 3 5 8 9 11 14 15 16 17 18 20
Length (months) 108 105 105 109 106 308 315 319 362 344 311
Well-ID 21 22 72 142 156 157 171 181 206 249 524
Length (months) 314 307 112 235 246 253 162 136 134 133 105

Table 2. DeepAR fixed training hyperparameters.

Hyperparameter Value

Epochs 100
Batch size 32
Batches/epoch 100

Table 3. DeepAR hyperparameters to optimize for each well.

Hyperparameter Value

Context length 24
Layers 1, 2, 4
Cell type GRU, LSTM
Cell hidden state size 1,54,560
Gaussian number 1, 3, 8
Dropout rate 0.1, 0.4, 0.6
Learning rate 1e-4, 1e-3, 1e-2

data of the last 24 months (regarded as a short term) for the blind test. Taking Well-ID3 as
an example: As shown in Figure 5, the data covers 108 months. The data from Month 1 to
84 are used for building training and forecasting model using DeepAR and Prophet model,
and the data from Month 85 to 108 are used for blind testing to assess the performance of
prediction results. The same procedure is applied to the 22 selected wells individually.

Models implementation
The two models considered, DeepAR and the Prophet time series, are evaluated based on
Midland datasets. The experimental setup which is shared for each dataset evaluation is first
described before the dataset experiments.

DeepAR. DeepAR is a model presented by (David et al., 2019) and then implemented in
Gluon Time Series (GluonTS). GluonTS is a toolkit developed by Amazon scientists based
on the Gluon framework (Alexander et al., 2019). It aims to regroup all the tools required to
build deep learning models for time series forecasting and anomaly detection. DeepAR
configurations are trained as it is not implemented without early stopping during its
14 Energy Exploration & Exploitation 0(0)

Figure 5. History of oil production rate, Well-ID3.

optimization (cf. Tables 2 and 3). If necessary, the final models are trained on the best
parameters without early stopping or validation failure on a new training set. For each
selected wells, the optimization is performed on the following parameters:

• The context length and the learning rate: The number of prior timesteps taken to make
the most precise forecasts. The tested context lengths depend on the data provided.
• Stacking layer number: The number of layers in the recurrent neural network.
• The cell type: Cell type in the recurrent neural network (GRU or LSTM).
• The number of gaussians: The number of Gaussians considered to be the probability
distribution of each timestep in Gaussianssianse the
• The dropout rate: The output of each LSTM cell is feed to a Zoneout2 cell which uses this
dropout rate (David et al., 2017).

Prophet time series implementation. An open-source implementation of the Temporal Prophet


time series model that was published with the paper (Taylor and Letham, 2007) can be
found on this web documentation. For each well, the main hyperparameters which can be
tuned are:

• Changepoint prior scale: This is likely the parameter that is most impactful. It determines
the trend change points in particular. If it is, too, the trend will overfit,if it is too small,
the trend will be underfitting, and variation that should have been modeled with trend
changes will be treated with the noise term instead. The default value of 0.05 works for
several time series, but this can be tuned; the range is [0.001, 0.5].
• Sasonality prior scale: This parameter regulates the flexibility of seasonality. Similarly, a
large value helps the seasonality respond to large variations, a small value shrinks the
magnitude of the seasonality. The default parameter is 10, with practically no regularisa-
tion being applied. This is because overfitting occurs here very rarely (there is inherent
Tadjer et al. 15

Figure 6. Oil production time series forecast- DeepAR. Left: Overview of forecast. Right: zoomed forecast.

Table 4. Mean CRPS of probabilistic forecast from DeepAR model.


Well-ID 3 5 8 9 11 14 15 16 17 18 20
Mean CRPS 69 29.15 74 93.5 80.9 15.25 7.89 19.17 9.91 12.01 15.18
Well-ID 21 22 72 142 156 157 171 181 206 249 524
Mean CRPS 23.18 22.14 8.35 27.39 3.47 10.50 21.67 21.37 19.69 27.80 29.91
16 Energy Exploration & Exploitation 0(0)

Figure 7. Oil production time series forecast- Prophet.


Tadjer et al. 17

regularisation because it is modeled with a truncated Fourier sequence, so it is filtered


practically low-pass). [0.01, 10] would possibly be a good range for tuning it.
• Growth: Options are le: This parameter regu

Results
Figure 6 demonstrates the forecast results for some selected wells using the DeepAR, the
means of forecasts (dashed steel blue curve) comparing to blind-test data (dashed red curve)
and Pan CRM model (black line curve). In general, the production forecast seems to be
reasonable, the DeepAR model can forecast both the upward and downward trends gen-
erally well and outperform the Pan CRM model, it is observed that the prediction intervals
are, mostly, containing the correct values, except for the well-ID11, this could be explained
by being incapable of predicting when changes in production are going to happen. We
quantify the accuracy of the probabilistic forecast using the mean CRPS score as listed in
Table 4. The results for prediction accuracy are quite satisfactory. In most cases, the mean
CRPS decreases as the length of production history increases. This indicates that a longer
production history (i.e., more data) will improve the DeepAR model forecast. A major
drawback of DeepAR is that it has very little to no interpretability. We cannot interpret
any physical meanings from the trained DeepAR model parameters.
Figure 7 shows forecasts from the trained Prophet models. The means of forecasts (the
dashed Steel blue curve) follow the blind-test data (dashed red curve in Figure 7) generally
well. The P5-P95 prediction intervals (grey band in Figure 7) covers most of the blind-test
data. However, for Well-ID8, the forecast significantly deviates from blind-test data and
fails to capture both trends and the peaks and troughs reasonably; more specifically, the
forecast underestimates the oil production rates.
Compared to Prophet, the DeepAR models represent distinct trends in the mean CRPS
score as listed in Table 5. This is possibly due to the DeepAR layerresent distinct trends in
the mean CRPS score as listed in est data and fails to capture both trends and the pf the
most previous historical data. Besides, DeepAR indicates that the lowest CRPS errors arise.
Simultaneously, the difference in values is minimal, even though this statement is only valid
for the 5-th and 95-th percentileses. This is demonstrated by the better coverage earned by
the longer periods that compensate for the 50-th percentile’s low accuracy.
Limitation: In the previous section, we presented DeepAR and Prophet trends in the
mean CRPS score as liste months (2 years). We evaluate the performance of the two methods
for a forecast horizon of 48 months, as displayed in both Figures 8 and 9, It can be obviously
seen that the two methods exhibited quite similar performance almost equally well when the
length of wells more than 300 months, for the most part, they well capture the trends of oil
production rate in blind tests, and the predictions yielded by each of the models appear to be
quite similar. The models were good at predicting trends and flat lines, but sometimes

Table 5. Mean CRPS of probabilistic forecast from Prophet model for each well.
Well-ID 3 5 8 9 11 14 15 16 17 18 20
Mean CRPS 63.4 29.15 174 153.5 180.9 22.83 14.45 32.06 9.41 16.45 25.73
Well-ID 21 22 72 142 156 157 171 181 206 249 524
Mean CRPS 38.21 33.06 14.55 34.27 13.42 14.20 24.65 27.24 18.39 28.63 36.21
18 Energy Exploration & Exploitation 0(0)

Figure 8. Oil production time series forecast- DeepAR - 48 months horizon forecast.
Tadjer et al. 19

Figure 9. Oil production time series forecast- Prophet - 48 months horizon forecast.
20 Energy Exploration & Exploitation 0(0)

undershot/overshot the peaks and troughs, i.e., Well-ID8. However, both the Prophet and
DeepAR did not match production data including quantifying uncertainty, with a small
historical data length. Based on the previous results, we can highlight that the two methods
enrich the family of time series analysis models by extracting the weighted differencing/trend
feature, and contribute to better performance in short-term oil production forecasts, and it
can be an alternative way for oil production forecasting in practical application.

Discussion and conclusion


The purpose of this work is to demonstrate a method of machine learning that could replace
or accelerate manual DCA for short-term oil and gas well forecasting. Probabilistic Prophet
time series analysis and more accurate deep learning models, DeepAR, were considered to
solve this problem. These two have been selected as they outperform the state-of-the-art
methods of forecasting on many topics. For time series forecasting, Prophet is a Bayesian
non-linear univariate generative model proposed by Facebook. The Prophet is also a struc-
tural time series analysis method that specifically models the impact of patterns, seasonality,
and events. For the Prophet, the cyclical duration and event date parameters are set the
same as our model. In contrast, DeepAR is an auto-regressive model based on cells with
GRU or LSTM recurrent neural networks. It learns the parameters for each forecast hori-
zon from a given probability allocation. Then, by sampling several times, one can sample
from certain probability distributions to forecast each horizon or compute confidence inter-
vals. The model validation was carried out on 22 separate midland reservoir field oil pro-
duction datasets. Each has had their outliers removed and missing data replaced. They were
also standardized as a pre- and post-processing to increase the model’s accuracy. Their
performances were evaluated based on mean CRPS metrics. The prediction length was
initially fixed to 24 months and planned to be increased to 48 months. The models first
went through a hyperparameter optimization to select to optimal parameters of each meth-
ods of each well. The results showed that the deep learning approach and Prophet analysis
yield a satisfactory result in short term forecast, but they may fail to identify long-term
trends in predictions unless the predictions are constantly adjusted. However, The both
approaches relies on the volume and granularity of data to develop capability for predicting
production over a long-time horizon.
This approach can be regarded as lume and granularity of data to develop capability for
predicting production over a long-time horizon.pproaches relies on thIt is important to
highlight some potential drawbacks of applying time series deep learning for oil production
prediction. Deep learning models may suffer significant errors when used for long-term
forecasts. This is in addition to their limited interpretability. That is because the predictions
are computed sequentially and depend on past predictions that have been appended to the
data. Thus, there is a gradual accumulation of error over time. Deep learning models have to
be retrained periodically as more data are collected. Otherwise, their predictions become
highly inaccurate after a long period. Furthermore, another difficulty that may arise when
applying deep learning is that an intermediate-to-expert level of knowledge may be required
during model creation and training, as opposed to other out-of-the-box machine learning
methods that can be trained easily by adjusting their hyperparameters. Therefore, general
NNs may require some adjustments to their cell architecture.
In conclusion, the precise prediction and learning performance presented in the paper
suggests that both Prophet and DeepAR are eligible for use in the petroleum industry’s
Tadjer et al. 21

non-linear short-term forecasting problems. Many steps should be taken to further


improve the performance of forecasting over long time horizons, such as the application
to spatiotemporal tasks or the use of an encoder-decoder from sequence to sequence,
where the contextual data (static and dynamic) would be integrated into the model archi-
tecture. Additionally, integrating physics constraints during the training of a deep neural
network. An advantage of such approach is that physics can be introduced into ML
approaches and could replace or speed up manual DCA to perform long term forecast
of oil and gas well.

Declaration of conflicting interests


The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/
or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this
article.

Notes
1. The term vanishing gradient refers to the fact that in a feedforward network (FFN) the back-
propagated error signal typically decreases (or increases) exponentially as a function of the distance
from the final layer.
2. Zoneout is an application of dropout where the values are reset to their previous state ðht ¼ ht1 Þ
instead of being dropped out ðht ¼ 0Þ.
3. https://facebook.github.io/prophet/

References
Adekoya F (2009) Production decline analysis of horizontal well in gas shale reservoirs. Master thesis,
West Virginia University, Morgantown, WV, USA.
Aditya V, Akhil DG and Srikanta M (2017) Modeling early time rate decline in unconventional
reservoirs using machine learning techniques. In: Abu Dhabi international petroleum exhibition
& conference, Abu Dhabi, UAE, 13–16 November.
Alexander A, Konstantinos B, Michael BS, et al. (2019) Gluonts: Probabilistic time series models in
python. Available at: https://arxiv.org abs/1906.05264
Arps JJ (1945) Analysis of decline curves. Trans.
Cho K, van Merrienboer B, Bahdanau D, et al. (2014) On the properties of neural machine translation:
Encoder-decoder approaches. In: Eighth workshop on syntax, semantics and structure in statistical
translation. Available at: https://nyuscholars.nyu.edu/en/publications/on-the-properties-of-neural-
machine-translation-encoder-decoder-a
David K, Tegan M, János K, et al. (2017) Zoneout: Regularizing rnns by randomly preserving hidden
activations.
David S, Valentin F and Jan G (2019) Deepar: Probabilistic forecasting with autoregressive recurrent
networks.
Duong AN (2011) Rate-decline analysis for fracture-dominated shale reservoir. SPE Reservoir
Evaluation & Engineering 14(3): 377–387.
Gasthaus J, Benidis K, Wang Y, et al. (2019) Probabilistic forecasting with spline quantile function
rnns. In: Proceedings of machine learning research (eds K Chaudhuri and M Sugiyama), volume 89,
pp.1901–1910.
22 Energy Exploration & Exploitation 0(0)

George EPB, Gwilym MJ, Gregory CR, et al. (2015) Time Series Analysis: Forecasting and Control.
Hoboken: John Wiley & Sons.
Geron A (2017) Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and
Techniques to Build Intelligent Systems. USA: OSA: Intelligent Sy
Gonzalez RA, Gong X and McVay DA (2012) Probabilistic decline curve analysis reliably quantifies
uncertainty in shale gas reserves regardless of stage of depletion. In: SPE eastern regional meeting.
Greff K, Srivastava RK, Koutnik J, et al. (2017) Lstm: A search space odyssey. IEEE Transactions on
Neural Networks and Learning Systems 28(10): 2222–2232.
Grosse R (2017) Lecture 15: Exploding and vanishing gradient. Available at: www.cs.toronto.edu/
rgrosse/courses/csc321_2017/readings/L15\%20Exploding\%20and\%20Vanishing\%20Gradients.
pdf
Gupta S, Fuehrer F and Benin CJ (2014) Production forecasting in unconventional resources using
data mining and time serie analysis. In: SPE/CSUR unconventional resources conference, Canada,
30 September–2 October 2014.
Han D, Jung J and Kwon S (2020) Comparative study on supervised learning models for productivity
forecasting of shale reservoirs based on a data-driven approach. Applied Sciences 10(4): 1267.
Hersbach H (2000) Decomposition of the continuous ranked probability score for ensemble prediction
systems. weather and forecasting. Weather and Forecasting 15(5): 559–570.
Hochreiter S and Schmidhuber J (1997) Long short term memory. Neural Computation 9(8):
1735–1780.
Hong A, Bratvold RB, Lake LW, et al. (2019) Integrating model uncertainty in probabilistic decline-
curve analysis for unconventional-oil-production forecasting. SPE Reservoir Evaluation &
Engineering 22(3): 861–876.
Joshi KJ (2012) Comparison of various deterministic forecasting techniques in shale gas reservoirs with
emphasis on the duong method. PhD thesis, Texas A&M University, College Station, TX, USA.
Junyoung C, Caglar G, KyungHyun C, et al. (2014), Empirical evaluation of gated recurrent neural
networks on sequence modeling.
Kyunghyun C, Bart van M, Caglar G, et al. (2014) Learning phrase representations using rnn encoder-
decoder for statistical machine translation.
Lee K, Lim J, Yoon D, et al. (2019) Prediction of shale-gas production at duvernay formation using
deep-learning algorithm. SPE Journal 24(6): 2423–2437.
Liu W, Liu DW and Gu J (2020) Forecasting oil production using ensemble empirical model decom-
position based long short-term memory neural network. Journal of Petroleum Science and
Engineering 189: 107013.
Luo G, Tian Y, Sharma A, et al. (2019) Eagle ford well insights using data-driven approaches. In:
International petroleum technology conference, Beijing, China, March 2019.
Ma X and Liu Z (2018) Predicting the oil production using the novel multivariate nonlinear
model based on arps decline model and kernel method. Neural Computing and Applications
29(2): 579–591.
Nelson PH (2009) Pore-throat sizes in sandstones, tight sandstones, and shales. AAPG Bulletin 93(3):
329–340.
Ng AY and Jordan MI (2002) On discriminative vs. generative classifiers: A comparison of logistic
regression and naive bayes. Advances in Neural Information Processing Systems 14: 841–848.
Pan Z (2016) Revised productivity index equation to improve transient history match for the capacitance
resistance model. Master thesis, University of Texas at Austin, Austin, USA.
Ruofeng W, Kari T, Balakrishnan N, et al. (2018), A multi-horizon quantile recurrent forecaster.
Sagheer A and Kotb M (2019) Time series forecasting of petroleum production using deep lstm
recurrent networks. Neurocomputing 323: 203–213.
Taylor SJ and Letham B (2007) Forecasting at scale.
Valko PP and Lee WJ (2010) A better way to forecast production from unconventional gas wells. In:
SPE annual technical conference and exhibition.
Tadjer et al. 23

Wattenbarger RA, El-Banbi AH, Villegas ME, et al. (1998), Production analysis of linear flow into
fractured tight gas wells. In: SPE Rocky Mountain Regional/low-permeability reservoirs sympo-
sium, Denver, USA, 5–8 April.
Zhan C, Sankaran S, LeMoine V, et al. (2019) Application of machine learning for production fore-
casting for unconventional resources. In: Unconventional resources technology conference
(URTEC).

You might also like