ICAO Strategic Objective: Economic Development of Air Transport
Introduction to Forecasting Analysis
ICAO Aviation Data Analyses Seminar
Middle East (MID) Regional Office
27-29 October
Economic Analysis and Policy (EAP) Section
Air Transport Bureau (ATB)
Long-Term Air Traffic Forecasts: GATO
Passengers and Cargo Traffic
Available at: www.icao.int
Past decade air transport trends
Demand drivers analysis
- Economic growth
- Liberalization
- Low Cost Carriers
- Improving technologies
Challenges for air traffic development
- Fuel prices
- Airport/ANSPs capacity constraints
- Competition and inter-modality
Forecasts
- Structure and methodology
- Passenger and cargo
- Results and analysis by route group
Background
Assembly Resolution A38-14
Appendix C : Forecasting, planning and economic analyses
The Assembly:
Requests the Council to prepare and maintain, as necessary, forecasts of future
trends and developments in civil aviation of both a general and a specific kind,
including, where possible, local and regional as well as global data, and to make
these available to Contracting States and support data needs of safety, security,
environment and efficiency
Requests the Council to develop one single set of long term traffic forecast, from
which customized or more detailed forecasts can be produced for various purposes,
such as air navigation systems planning and environmental analysis
Main terms and definitions
used in forecasting
analysis
Types of Data
Data can be broadly divided into the following three types:
- Time series data consist of data that are collected, recorded,
or observed over successive increments of time.
- Cross-sectional data are observations collected at a single
point in time.
- Panel data are cross-sectional measurements that are
repeated over time, such as yearly passengers carried for a
sample of airlines.
Of the three types of data, time series data is the most
extensively used in traffic forecasts.
Forecasting Timeframe
Short-term Forecasts
Short-term forecasts generally involve some form of
scheduling which may include for example the seasons of
the year for planning purposes.
The cyclical and seasonal factors are more important in
these situations.
Such forecasts are usually prepared every 6 months or on
a more frequent basis.
Some airport operators undertake ultra-short-term
forecasts (e.g. for the next month) in order to provide for
specific requirements such as adequate staffing in the
peaks.
Forecasting Timeframe
Medium-term Forecasts
Medium-term forecasts are generally prepared for
planning, scheduling, budgeting and resource
requirements purposes.
The trend factor, as well as the cyclical component, plays a
key role in the medium-term forecast as the year to year
variations in traffic growth are an important element in the
planning process.
Forecasting Timeframe
Long-term Forecasts
Long-term forecasts are used mostly in connection with strategic planning to
determine the level and direction of capital expenditures and to decide on
ways in which goals can be accomplished.
The trend element generally dominates long term situations and must be
considered in the determination of any long-run decisions.
It is also important that since the time span of the forecast horizon is long,
forecasts should be calibrated and revised at periodic intervals (every two or
three years depending on the situation).
The methods generally found to be most appropriate in long-term situations
are econometric analysis and lifecycle analysis.
Forecasting Timeframe
Forecasts Horizons
In some cases, aviation industry forecasts
call for much longer time horizons, up to 25-30
years.
This is particularly relevant for large airport
infrastructure projects and for aircraft
manufacturers, for example, when considering
the next generation of aircraft.
When looking at a 30-year horizon, it is advisable to consider a forecast scenario rather than
a forecast itself, because of the uncertainty associated with such a longer-term forecast.
Source: BAA (2011)
Such longer-term outlooks should take into account mega trends and the market maturity
likely to occur over the period.
Alternative Forecasting
Techniques
Source: ICAO Manual on Air Traffic Forecasting
ICAO forecasting
methodology
Bottom-up approach
[Figure: for each route group (RG #1, RG #2, ..., RG #n), historical traffic, explanatory variables and assumptions feed model development and selection, giving one econometric model per route group (#1 to #n) and that route group's traffic forecast. The route-group values are added together to give the World total: RG #1 + RG #2 + ... + RG #n = World.]
Basic Principle
In order to generate a forecast from a time series, a mathematical equation is to be found to replicate the historical data.
[Figure: actual observations and modelled values plotted against time period (horizontal axis 0 to 25, vertical axis 0 to 1,400,000). The gap between an actual value and its modelled value is the difference between actual and modelled data.]
Some Definitions
Error
The validity of a forecasting method depends on how accurately predictions can be made using that method. One approach to estimating accuracy is to compare the difference between an actual observed value and its modelled value:
$e_t = Y_t - \hat{Y}_t$
Where
$e_t$ = the error in time period t
$Y_t$ = the actual value in time period t
$\hat{Y}_t$ = the modelled value for time period t
Some Definitions
Sample (Arithmetic) Mean
Given a set of n values $Y_1, Y_2, \ldots, Y_n$, the
arithmetic mean is
$\bar{Y} = \frac{Y_1 + Y_2 + \cdots + Y_n}{n} = \frac{1}{n}\sum_{i=1}^{n} Y_i$
That is, the sum of the observations is divided by the number of
values included.
Median Calculation
Calculation of the Median
Example 1:
Raw Data: 24.1 22.6 21.5 23.7 22.6
Ordered: 21.5 22.6 22.6 23.7 24.1
Position: 1 2 3 4 5
Median position = 3, so Median = 22.6
Example 2:
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
Position: 1 2 3 4 5 6
Median position = 3.5, so Median = (7.7 + 8.9) / 2 = 8.3
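The two median examples can be checked with a short Python sketch. The function name `median_by_position` is illustrative only; it sorts the values and takes the middle one, or averages the two middle ones when the count is even.

```python
# Median by sorting and picking the middle position(s).
example_1 = [24.1, 22.6, 21.5, 23.7, 22.6]
example_2 = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]


def median_by_position(values):
    """Return the middle value, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle position exists.
        return ordered[mid]
    # Even count: the median position falls between two values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_by_position(example_1))
print(median_by_position(example_2))
```

The standard library function `statistics.median` follows the same rule and can be used instead.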
Some Definitions
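Deviation from the Mean:
$d_i = Y_i - \bar{Y}$
Some Definitions
The mean absolute deviation is the average of
the deviations about the mean, irrespective of the
sign:
$\text{MAD} = \frac{1}{n}\sum_{i=1}^{n} |Y_i - \bar{Y}|$
The variance is an average of the squared
deviations about the mean:
$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2$
The standard deviation is the square root of the
variance:
$S = \sqrt{S^2}$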
Example
Mean is $\bar{X} = 12$
From the table, we have
$\text{MAD} = \frac{18}{7} = 2.57$, $S^2 = \frac{58}{6} = 9.67$ and $S = \sqrt{9.67} = 3.11$.
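A minimal Python sketch of these three measures follows. The seven values are hypothetical and are not the slide's table; they were chosen only because they have a mean of 12, absolute deviations summing to 18 and squared deviations summing to 58, which matches the totals quoted in the example.

```python
from math import sqrt

# Hypothetical data (NOT the original table), chosen to match the quoted totals.
values = [8, 9, 10, 12, 14, 15, 16]

n = len(values)
mean = sum(values) / n
deviations = [v - mean for v in values]

# Mean absolute deviation: average of the absolute deviations (divide by n).
mad = sum(abs(d) for d in deviations) / n
# Sample variance: squared deviations divided by n - 1.
variance = sum(d * d for d in deviations) / (n - 1)
# Standard deviation: square root of the variance.
std_dev = sqrt(variance)

print(round(mad, 2), round(variance, 2), round(std_dev, 2))
```

Note that the example divides by n for the MAD and by n - 1 for the variance, and the sketch does the same.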
Some Definitions
Differences and Growth Rates
The (first) difference of a time series is given by:
$DY_t = Y_t - Y_{t-1}$
The growth rate for a time series is given by:
$GY_t = 100 \times \frac{Y_t - Y_{t-1}}{Y_{t-1}}$
Some Definitions
The log transform may be written as:
$L_t = \ln(Y_t)$
The (first) difference in logarithms becomes:
$DL_t = \ln(Y_t) - \ln(Y_{t-1})$
The inverse transformation is:
$Y_t = \exp(L_t)$
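The differences, growth rates and log differences defined above can be computed as in the following sketch. The series `traffic` is made-up illustrative data and the function names are not from the source.

```python
from math import exp, log

# Hypothetical yearly traffic figures, for illustration only.
traffic = [100.0, 110.0, 121.0]


def first_differences(series):
    """DY_t = Y_t - Y_(t-1), starting from the second observation."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


def growth_rates(series):
    """GY_t = 100 * (Y_t - Y_(t-1)) / Y_(t-1), in percent."""
    return [100 * (series[t] - series[t - 1]) / series[t - 1]
            for t in range(1, len(series))]


def log_differences(series):
    """DL_t = ln(Y_t) - ln(Y_(t-1))."""
    return first_differences([log(y) for y in series])


print(first_differences(traffic))
print(growth_rates(traffic))
print(log_differences(traffic))
# The inverse transformation recovers the original values (up to rounding).
print([exp(log(y)) for y in traffic])
```

For small changes, the log difference is close to the growth rate divided by 100, which is one reason the log transform is convenient for growth analysis.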
Some Definitions
Source: Song, Witt and Li (2009) The Advanced Econometrics of Tourism Demand,
London: Routledge.
Practical Example of Time
Series Models with Excel
Linear Trend
A Forecasting Model
Statistical (forecasting) model, linear trend:
$Y_t = \beta_0 + \beta_1 t + \varepsilon_t$
$\beta_0$ and $\beta_1$ are the level and slope (or trend) parameters, respectively.
$\varepsilon_t$ denotes a random error term corresponding to the part of the series that cannot be described by the model.
If we make appropriate assumptions about the nature of the error term, we can estimate the unknown parameters $\beta_0$ and $\beta_1$.
o Plus assumptions about the distribution of the random error term.
o The estimated model provides the forecast function, along with the framework to make statements about model uncertainty.
Linear Trend
Practical Example
Dataset
Linear Trend
Scatter Plot
The first step is to draw a scatter plot. The scatter plot seems to suggest that the data follow a linear trend.
[Figure: scatter plot of the dataset; horizontal axis 0 to 25, vertical axis 0 to 1,400,000.]
Linear Trend
Excel Illustration
Excel can be used for trend analysis.
First, highlight Columns A and B as
illustrated on the right.
Then, go to Insert > Scatter
and select the first option.
Linear Trend
Excel Illustration
Excel will then automatically
generate a scatter plot.
Put the cursor on the scatter plot,
right-click with the mouse and
select Add Trendline as shown
in the screenshot on the right.
Linear Trend
Excel Illustration
Then select
Linear
and
Display Equation on chart
as shown on the right.
Linear Trend
The figure beside shows that the data fit the model reasonably well. The equation is also presented:
f(x) = 46595.31x + 244852.01, with $R^2$ = 0.98
[Figure: scatter plot with fitted linear trendline; horizontal axis 0 to 25, vertical axis 0 to 1,400,000.]
Linear Trend
Generating Forecasts
After a trend curve that appears to fit the data
is established, the forecaster can then simply
extend the visually fitted trend curve to the
future period for which the forecast is desired.
For example, to forecast passenger numbers
at period 21, we simply plug 21 into the
equation. This is considered to be a simple
linear extrapolation of the data.
$\text{Pax}_{t=21} = 46{,}595 \times 21 + 244{,}852 = 1{,}223{,}347$
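Outside Excel, the same fit-then-extrapolate step can be sketched in Python. The helper names and the five-point series below are assumptions made for illustration; the dataset behind the slides is not reproduced here. Only the last line uses figures from the text, namely the rounded slope and intercept of the fitted equation.

```python
def fit_linear_trend(periods, values):
    """Least-squares fit of Y = intercept + slope * t; returns (intercept, slope)."""
    n = len(periods)
    t_bar = sum(periods) / n
    y_bar = sum(values) / n
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(periods, values))
    sxx = sum((t - t_bar) ** 2 for t in periods)
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    return intercept, slope


def forecast(intercept, slope, period):
    """Extend the fitted line to a future period."""
    return intercept + slope * period


# Hypothetical series, for illustration only.
periods = [1, 2, 3, 4, 5]
passengers = [290000, 340000, 385000, 430000, 480000]
intercept, slope = fit_linear_trend(periods, passengers)
print(intercept, slope, forecast(intercept, slope, 6))

# Rounded coefficients quoted in the text, evaluated at period 21.
print(forecast(244852, 46595, 21))  # 1223347
```

Using the unrounded coefficients from the chart (46595.31 and 244852.01) gives a slightly different value, so the rounding convention should be stated when results are reported.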
Exponential Trend
Analysis
An existing trend is exponential if it increases at a
steady percentage per time period.
If a trend is stable in percentage terms
(exponential growth), it can be expressed as:
$Y = a(1+b)^T$
or
$\ln(Y) = \ln(a) + T \times \ln(1+b)$
By taking logarithms, the exponential
formulation can be converted to a linear
formulation.
[Figure: exponential trend example; horizontal axis 0 to 25, vertical axis 0 to 1,400,000.]
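The log conversion suggests a simple estimation route: regress ln(Y) on T with ordinary least squares, then transform the intercept and slope back to a and b. The sketch below assumes this approach. The function name and the data are hypothetical, with the data generated from a = 1000 and b = 0.05 so that the result can be checked.

```python
from math import exp, log


def fit_exponential_trend(periods, values):
    """Fit Y = a * (1 + b)**T by least squares on ln(Y); returns (a, b)."""
    logs = [log(y) for y in values]
    n = len(periods)
    t_bar = sum(periods) / n
    l_bar = sum(logs) / n
    slope = (sum((t - t_bar) * (l - l_bar) for t, l in zip(periods, logs))
             / sum((t - t_bar) ** 2 for t in periods))
    intercept = l_bar - slope * t_bar
    # intercept estimates ln(a); slope estimates ln(1 + b).
    return exp(intercept), exp(slope) - 1


# Hypothetical series growing at exactly 5% per period.
periods = [0, 1, 2, 3, 4]
values = [1000 * 1.05 ** t for t in periods]
a, b = fit_exponential_trend(periods, values)
print(a, b)
```

Fitting in logs minimises errors in ln(Y) rather than in Y, so the result can differ slightly from a direct non-linear fit on noisy data.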
Exponential Trend
Analysis
To select exponential trend
analysis in EXCEL, we simply
tick the box for
Exponential
and
Display Equation
as illustrated on the right.
Polynomial Trend Analysis
The figure on the right shows
terminal passenger data from
London Luton airport to
Amsterdam Schiphol airport
from 1995 to 2009.
Traffic data in this case can be
modelled by a parabolic trend:
$Y = a + bT + cT^2$
With three constants, this
family of curves covers a wide
variety of shapes (either
concave or convex).
[Figure: terminal passengers by year; horizontal axis 1995 to 2011, vertical axis 0 to 600,000.]
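A parabolic trend can also be fitted by least squares without a spreadsheet. The sketch below solves the three normal equations with a small Gauss-Jordan elimination. The function name is illustrative and the data are hypothetical, generated from a = 2, b = 3 and c = -0.5. It assumes T is a small period index (0, 1, 2, ...) rather than a calendar year, which keeps the sums of powers well scaled.

```python
def fit_quadratic_trend(periods, values):
    """Least-squares fit of Y = a + b*T + c*T**2; returns (a, b, c)."""
    # s[k] = sum of T**k, r[k] = sum of Y * T**k
    s = [sum(t ** k for t in periods) for k in range(5)]
    r = [sum(y * t ** k for t, y in zip(periods, values)) for k in range(3)]
    # Augmented matrix of the normal equations.
    m = [[s[0], s[1], s[2], r[0]],
         [s[1], s[2], s[3], r[1]],
         [s[2], s[3], s[4], r[2]]]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [x - factor * p for x, p in zip(m[row], m[col])]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c


# Hypothetical concave series, for illustration only.
periods = [0, 1, 2, 3, 4, 5]
values = [2 + 3 * t - 0.5 * t * t for t in periods]
a, b, c = fit_quadratic_trend(periods, values)
print(a, b, c)
```

A negative c gives a concave curve and a positive c a convex one.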
Polynomial Trend Analysis
To select polynomial trend
analysis in Excel, we
simply tick the box for
Polynomial
and
Display Equation
as illustrated on the right.
Polynomial Trend Analysis
We may have a few points that fall outside of the underlying trend.
Normally this happens with monthly data and may be due to:
- Strikes, weather, sporting events
- Easter, which tends to move around
Possible treatments:
- Do nothing if there are no substantial effects on estimation
- Remove them from the data
- Adjust them to fit in with the underlying trend
[Figure: terminal passengers by year with outlying points; horizontal axis 1995 to 2011, vertical axis 0 to 600,000.]
Introduction to Regression
Analysis
Relationship Between
Variables
Regression analysis involves
relating the variable of interest
(Y), known as the dependent
variable, to one or more input
(or predictor or explanatory)
variables (X).
The regression line
represents the expected value
of Y, given the value(s) of the
inputs.
Relationship Between
Variables
The regression relationship
has a predictable component
(the relationship with the
inputs) and an unpredictable
(random error) component.
Thus, the observed values of
(X, Y) will not lie on a straight
line.
Simple Linear Regression Model
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
Here $Y_i$ is the dependent variable, $\beta_0$ the intercept, $\beta_1$ the slope coefficient, $X_i$ the independent variable and $\varepsilon_i$ the random error term. $\beta_0 + \beta_1 X_i$ is the linear component and $\varepsilon_i$ is the random error component.
$\beta_0$ and $\beta_1$ are the parameters that define the line.
$\varepsilon_i$ is the random term, which means that even the best line is unlikely to fit the data perfectly, so there is an error at each point.
We can define the line of best fit as the line that minimises some measure of this error.
In practice, this means that we look for the line that minimises the mean squared error.
Then we can say that linear regression finds values for the parameters that define the line of best fit through a set of points, and minimises the mean squared error.
Simple Linear Regression Model
For each observed value
Xi, an observed value of
Yi is generated by the
population model.
Simple Linear Regression Equation
In practice, we will be using
sample data to develop a
line.
The simple linear regression
equation, $\hat{y}_i = b_0 + b_1 x_i$,
provides an estimate of the
population regression line.
Least Square Estimators
To get the best line for predicting y
we want to make all of these errors
as small as possible.
We use the least squares principle to
determine a regression equation by
minimizing the sum of the squares
of the vertical distances (SSE)
between the actual Y values and the
predicted values of Y:
$\min \text{SSE} = \min \sum e_i^2 = \min \sum (y_i - \hat{y}_i)^2 = \min \sum [y_i - (b_0 + b_1 x_i)]^2$
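Simple Regression Model
Least Square Estimators
The slope coefficient estimator is:
$b_1 = r \frac{s_y}{s_x}$
where $s_x$ and $s_y$ are the sample standard deviations of X and Y, and r is the correlation coefficient:
$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$
And the constant or y-intercept is:
$b_0 = \bar{y} - b_1 \bar{x}$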
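These estimators translate directly into code. The sketch below computes r, the two sample standard deviations, and then the slope and intercept. The function name and the five data points are hypothetical.

```python
from math import sqrt


def simple_regression(x, y):
    """Return (b0, b1, r) using b1 = r * s_y / s_x and b0 = y_bar - b1 * x_bar."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    r = sxy / sqrt(sxx * syy)          # correlation coefficient
    s_x = sqrt(sxx / (n - 1))          # sample standard deviation of x
    s_y = sqrt(syy / (n - 1))          # sample standard deviation of y
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1, r


# Hypothetical observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, r = simple_regression(x, y)
print(b0, b1, r)
```

Algebraically, r * s_y / s_x simplifies to the sum of cross-deviations divided by the sum of squared x-deviations, which is the form most software computes.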
The Multiple Regression
Model
Least Squares Estimators for
Linear Models with two
Independent Variables
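$b_1 = \frac{\sum_i (y_i - \bar{y})(x_{1i} - \bar{x}_1) \sum_i (x_{2i} - \bar{x}_2)^2 - \sum_i (y_i - \bar{y})(x_{2i} - \bar{x}_2) \sum_i (x_{1i} - \bar{x}_1)(x_{2i} - \bar{x}_2)}{\sum_i (x_{1i} - \bar{x}_1)^2 \sum_i (x_{2i} - \bar{x}_2)^2 - \left[\sum_i (x_{1i} - \bar{x}_1)(x_{2i} - \bar{x}_2)\right]^2}$
$b_2 = \frac{\sum_i (y_i - \bar{y})(x_{2i} - \bar{x}_2) \sum_i (x_{1i} - \bar{x}_1)^2 - \sum_i (y_i - \bar{y})(x_{1i} - \bar{x}_1) \sum_i (x_{2i} - \bar{x}_2)(x_{1i} - \bar{x}_1)}{\sum_i (x_{1i} - \bar{x}_1)^2 \sum_i (x_{2i} - \bar{x}_2)^2 - \left[\sum_i (x_{1i} - \bar{x}_1)(x_{2i} - \bar{x}_2)\right]^2}$
$b_0 = \bar{y} - b_1 \bar{x}_1 - b_2 \bar{x}_2$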
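The two-variable estimators can be sketched in a few lines of Python. The function name and the data are hypothetical; the data are generated exactly from y = 1 + 2*x1 + 3*x2 so that the output is easy to verify.

```python
def two_variable_regression(y, x1, x2):
    """Closed-form least squares for y = b0 + b1*x1 + b2*x2; returns (b0, b1, b2)."""
    n = len(y)
    y_bar = sum(y) / n
    x1_bar = sum(x1) / n
    x2_bar = sum(x2) / n
    dy = [v - y_bar for v in y]
    d1 = [v - x1_bar for v in x1]
    d2 = [v - x2_bar for v in x2]
    s11 = sum(a * a for a in d1)               # sum of squared x1 deviations
    s22 = sum(a * a for a in d2)               # sum of squared x2 deviations
    s12 = sum(a * b for a, b in zip(d1, d2))   # cross-deviations of x1 and x2
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    denom = s11 * s22 - s12 ** 2               # zero if x1 and x2 are collinear
    b1 = (s1y * s22 - s2y * s12) / denom
    b2 = (s2y * s11 - s1y * s12) / denom
    b0 = y_bar - b1 * x1_bar - b2 * x2_bar
    return b0, b1, b2


# Hypothetical data, for illustration only.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]
b0, b1, b2 = two_variable_regression(y, x1, x2)
print(b0, b1, b2)
```

The shared denominator becomes zero when the two explanatory variables are perfectly correlated, in which case the coefficients cannot be separated.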
T-value
t Value
The t statistic corresponding to a
particular coefficient estimate is a
statistical measure of the confidence that
can be placed in the estimate.
Since regression coefficients are
estimates of the expected value or the
mean value from a normal distribution,
they have standard errors which can
themselves be estimated from the
observed data.
The t statistic is obtained by dividing the
value of the coefficient by its standard
error. The larger the magnitude of the t,
the greater is the statistical significance of
the relationship between the explanatory
variable and the dependent variable, and
the greater is the confidence that can be
placed in the estimated value of the
corresponding coefficient.
Likewise, the smaller the standard error of
the coefficient, the higher the confidence that can be
placed on the validity of the model.
T-value
t Value
Most of the computer
software packages available
for statistical analysis
provide the t values.
A value of about 2 is usually
considered as the critical
value of t. A t value whose magnitude is below
2 is considered not
significant, as little
confidence can be placed
on the precision of the
coefficient.
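As a rough illustration of how software arrives at a t value, the sketch below computes it for the slope of a simple regression. The standard-error formula used, sqrt(SSE / (n - 2) / sum of squared x-deviations), is the usual textbook one for a one-variable model with an intercept. It is an assumption here, since the source does not state it. The function name and data are hypothetical.

```python
from math import sqrt


def slope_t_value(x, y):
    """Return (b1, standard error of b1, t value) for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    # Sum of squared residuals around the fitted line.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    residual_variance = sse / (n - 2)   # two parameters were estimated
    se_b1 = sqrt(residual_variance / sxx)
    return b1, se_b1, b1 / se_b1


# Hypothetical observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b1, se_b1, t = slope_t_value(x, y)
print(b1, se_b1, t)
print("significant" if abs(t) >= 2 else "not significant")
```

With very few observations the critical value is noticeably larger than 2, so the rule of thumb is best treated as a first screen.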
Coefficient of
Determination, R2
Suppose we have a number of
observations of $y_i$ and calculate the
mean. Actual values vary around this
mean, and we can measure the
variation by the total sum of squares
(SStotal).
If we look carefully at this SStotal we
can separate it into two
components: SSE (sum of squares
due to error) and SST (sum of
squares due to regression).
When we build a regression model we
estimate values, so the regression
model explains some of the variation
of the actual observations from the mean.
Coefficient of
Determination, R2
$R^2 = \frac{SST}{SS_{total}} = \frac{\text{Variation explained by the model}}{\text{Total variation of the dependent variable}}$
note:
$0 \le R^2 \le 1$
This measure has a value between 0 and 1. If it is near to 1 then most of the
variation is explained by the regression line, there is little unexplained variation and
the line is a good fit of the data. If the value is near to 0 then most of the variation is
unexplained and the line is not a good fit.
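The decomposition of the total sum of squares and the resulting coefficient of determination can be sketched as follows. The variable names follow the notation above (SStotal, SST for the regression part, SSE for the error part). The function name and data are hypothetical.

```python
def sums_of_squares(x, y):
    """Fit y = b0 + b1*x by least squares; return (ss_total, sst, sse)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    ss_total = sum((yi - y_bar) ** 2 for yi in y)          # total variation
    sst = sum((fi - y_bar) ** 2 for fi in fitted)          # explained by the model
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) # unexplained
    return ss_total, sst, sse


# Hypothetical observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
ss_total, sst, sse = sums_of_squares(x, y)
r_squared = sst / ss_total
print(ss_total, sst, sse, r_squared)
```

For a least-squares line with an intercept, SStotal equals SST plus SSE, so the same value is obtained from 1 - SSE / SStotal.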
Multiple Linear
Regression
Least Square Estimators
We have to calculate the coefficients for each of the independent variables, but after seeing the arithmetic for multiple regression with two independent variables in the previous slide, you might guess, quite rightly, that the arithmetic is even messier for a regression with more than two independent variables.
Too complicated by hand!
This is why multiple regression is never tackled by hand.
Thankfully, a lot of standard software includes multiple regression as a standard function.
Development of an
Econometric Model
Development of an
Econometric Model
Selection of the Dependent Variable
Demand for air travel is usually measured by:
Departures
Number of passengers
Revenue Passenger Kilometres (RPKs)
Tonnes of freight
Freight tonne kilometres (FTKs)
Therefore, the above indicators are normally used as the
dependent variable in the regression analysis.
Development of an
Econometric Model
Selection of Explanatory Variables
The explanatory variables are expected to
represent an important influence on demand in
the particular circumstances.
The explanatory variables should be chosen from
those that are available from reliable sources.
The explanatory variables should be
independently predicted, either by a reliable
independent source or by the forecaster
Development of an
Econometric Model
Formulation of the Model
i) Linear
$Y = a + bX_1 + cX_2 + \ldots + zX_n$
ii) Multiplicative or log-log
$Y = aX_1^b X_2^c \ldots X_n^z$
$\log Y = \log(a) + b \log X_1 + c \log X_2 + \ldots + z \log X_n$
iii) Linear-log
$e^Y = aX_1^b X_2^c \ldots X_n^z$
$Y = \log(a) + b \log X_1 + c \log X_2 + \ldots + z \log X_n$
iv) Log-linear
$\log Y = a + bX_1 + cX_2 + \ldots + zX_n$
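To illustrate the multiplicative (log-log) form, the sketch below estimates a model with a single explanatory variable by regressing log Y on log X. Restricting it to one variable is a simplification made here, and the function name and data are hypothetical: the data are generated from a = 50 and b = 1.5, with X standing in for something like an income index.

```python
from math import exp, log


def fit_log_log(x, y):
    """Fit Y = a * X**b via least squares on log Y = log(a) + b*log X; returns (a, b)."""
    lx = [log(v) for v in x]
    ly = [log(v) for v in y]
    n = len(lx)
    lx_bar = sum(lx) / n
    ly_bar = sum(ly) / n
    b = (sum((u - lx_bar) * (v - ly_bar) for u, v in zip(lx, ly))
         / sum((u - lx_bar) ** 2 for u in lx))
    log_a = ly_bar - b * lx_bar
    return exp(log_a), b


# Hypothetical explanatory variable and traffic, for illustration only.
x = [100.0, 110.0, 125.0, 140.0]
y = [50 * v ** 1.5 for v in x]
a, b = fit_log_log(x, y)
print(a, b)
```

In this form the exponent b can be read as an elasticity: a 1% change in X is associated with roughly a b% change in Y, which is one reason the specification is common in demand modelling.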