The ML.EXPLAIN_FORECAST function
This document describes the ML.EXPLAIN_FORECAST function, which lets you
generate forecasts from a trained time series model together with the
components that explain them. The function only works on ARIMA_PLUS models with the
training option decompose_time_series enabled or on ARIMA_PLUS_XREG models.
The ML.EXPLAIN_FORECAST function encompasses the ML.FORECAST function
because its output is a superset of the results of ML.FORECAST.
Syntax
# `ARIMA_PLUS` models:
ML.EXPLAIN_FORECAST(
  MODEL `PROJECT_ID.DATASET.MODEL`,
  STRUCT(
    [HORIZON AS horizon]
    [, CONFIDENCE_LEVEL AS confidence_level]))

# `ARIMA_PLUS_XREG` models:
ML.EXPLAIN_FORECAST(
  MODEL `PROJECT_ID.DATASET.MODEL`,
  STRUCT(
    [HORIZON AS horizon]
    [, CONFIDENCE_LEVEL AS confidence_level]),
  { TABLE `PROJECT_ID.DATASET.TABLE` | (QUERY_STATEMENT) })
Arguments
ML.EXPLAIN_FORECAST takes the following arguments:
- PROJECT_ID: the project that contains the resource.
- DATASET: the dataset that contains the resource.
- MODEL: the name of the model.
- TABLE: the name of the input table that contains the data to be evaluated.

  If TABLE is specified, the input column names in the table must match the column names in the model, and their types should be compatible according to BigQuery implicit coercion rules. If there are unused columns from the table, they are ignored.

  The TABLE argument is required for the ARIMA_PLUS_XREG model.
- QUERY_STATEMENT: the GoogleSQL query that is used to generate the evaluation data. For the supported SQL syntax for the QUERY_STATEMENT clause in GoogleSQL, see Query syntax.

  If QUERY_STATEMENT is specified, the input column names from the query must match the column names in the model, and their types should be compatible according to BigQuery implicit coercion rules.
- HORIZON: an INT64 value that specifies the number of time points to forecast. The maximum value is the horizon value specified in the CREATE MODEL statement for the time series model, or 1000 if unspecified. The default value is 3. When forecasting multiple time series at the same time, this parameter applies to each time series.
- CONFIDENCE_LEVEL: a FLOAT64 value that specifies the percentage of the future values that fall in the prediction interval. The valid input range is [0, 1). The default value is 0.95.
Output
The ML.EXPLAIN_FORECAST function returns the following columns:
- time_series_id_col or time_series_id_cols: a value that contains the identifiers of a time series. time_series_id_col can be an INT64 or STRING value. time_series_id_cols can be an ARRAY<INT64> or ARRAY<STRING> value. Only present when forecasting multiple time series at once. The column names and types are inherited from the TIME_SERIES_ID_COL option as specified in the CREATE MODEL statement.
- time_series_timestamp: a TIMESTAMP value that contains the timestamp of the time series. This column has a type of TIMESTAMP regardless of the type of the input time_series_timestamp_col. For each time series, the output rows are sorted in chronological order by the time_series_timestamp value.
- time_series_type: a STRING value that contains either history or forecast. The rows that have a value of history in this column are used in training, either directly from the training table, or from interpolation using the training data.
- time_series_data: a FLOAT64 value that contains the data of the time series. For rows that have a value of history in the time_series_type column, time_series_data is either the training data or the interpolated value using the training data. For rows that have a value of forecast in the time_series_type column, time_series_data is the forecast value.
- time_series_adjusted_data: a FLOAT64 value that contains the adjusted data of the time series. For rows that have a value of history in the time_series_type column, this is the value after cleaning spikes and dips, adjusting the step changes, and removing the residuals. It is the aggregation of all the valid components: holiday effect, seasonal components, and trend. For rows that have a value of forecast in the time_series_type column, this is the forecast value, which is the same as the value of time_series_data.
- standard_error: a FLOAT64 value that contains the standard error of the residuals during the ARIMA fitting. The values are the same for all rows that have a value of history in the time_series_type column. For rows that have a value of forecast in the time_series_type column, this value increases with time, as the forecast values become less reliable.
- confidence_level: a FLOAT64 value that contains the user-specified confidence level or, if unspecified, the default value. This value is the same for all rows that have a value of forecast in the time_series_type column. This value is NULL for all rows that have a value of history in the time_series_type column.
- prediction_interval_lower_bound: a FLOAT64 value that contains the lower bound of the prediction result. Only rows that have a value of forecast in the time_series_type column have values other than NULL in this column.
- prediction_interval_upper_bound: a FLOAT64 value that contains the upper bound of the prediction result. Only rows that have a value of forecast in the time_series_type column have values other than NULL in this column.
- trend: a FLOAT64 value that contains the long-term increase or decrease in the time series data.
- seasonal_period_yearly: a FLOAT64 value that contains the time series data value affected by the time of the year. This value is NULL if no yearly effect is found.
- seasonal_period_quarterly: a FLOAT64 value that contains the time series data value affected by the time of the quarter. This value is NULL if no quarterly effect is found.
- seasonal_period_monthly: a FLOAT64 value that contains the time series data value affected by the time of the month. This value is NULL if no monthly effect is found.
- seasonal_period_weekly: a FLOAT64 value that contains the time series data value affected by the time of the week. This value is NULL if no weekly effect is found.
- seasonal_period_daily: a FLOAT64 value that contains the time series data value affected by the time of the day. This value is NULL if no daily effect is found.
- holiday_effect: a FLOAT64 value that contains the time series data value affected by different holidays. This is the sum of the maximum positive individual holiday effect and the minimum negative individual holiday effect. This is shown in the following formula, where \(H\) is the overall holiday effect and \(h(i)\) is the individual holiday effect:

  \[H=\max\limits_{h(i)>0} h(i) + \min\limits_{h(i)<0} h(i)\]

  This value is NULL if no holiday effect is found.
- spikes_and_dips: a FLOAT64 value that contains the unexpectedly high or low values of the time series. For rows that have a value of history in the time_series_type column, the value is NULL if no spike or dip is found. For rows that have a value of forecast in the time_series_type column, this value is NULL.
- step_changes: a FLOAT64 value that contains the abrupt or structural change in the distributional properties of the time series. For rows that have a value of history in the time_series_type column, this value is NULL if no step change is found. For rows that have a value of forecast in the time_series_type column, this value is NULL.
- residual: a FLOAT64 value that contains the difference between the actual time series and the fitted time series after model training. The residual value is only meaningful for historical data. For rows that have a value of forecast in the time_series_type column, the residual value is NULL.
- holiday_effect_holiday_name: a FLOAT64 value that contains the time series data value affected by the holiday that's identified in holiday_name. If no holiday effect is found, this value is NULL.

  There is one holiday_effect_holiday_name column for each holiday that's included in the model.
- attribution_feature_name: a FLOAT64 value that contains the attribution of each feature to the final forecast. This only applies to ARIMA_PLUS_XREG models. The value is calculated by multiplying the weight of the feature with the feature value. This is shown in the following formula, where \(\beta_{fn}\) is the weight of feature fn in the linear regression and \(X_{fn}\) is the numericalized feature value:

  \[attribution_{fn}=\beta_{fn} * X_{fn}\]
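
Because history and forecast rows are returned together, a common first step is to filter on time_series_type. The following query is a minimal sketch of that pattern, not a prescribed usage. It assumes a single-series ARIMA_PLUS model named `mydataset.mymodel` (the placeholder name used in the examples in this document) and returns only the forecast rows with their prediction intervals:

```sql
-- Sketch: keep only the forecast rows and their prediction intervals.
-- `mydataset.mymodel` is a placeholder for a trained ARIMA_PLUS model.
SELECT
  time_series_timestamp,
  time_series_data AS forecast_value,
  prediction_interval_lower_bound,
  prediction_interval_upper_bound,
  standard_error
FROM
  ML.EXPLAIN_FORECAST(
    MODEL `mydataset.mymodel`,
    STRUCT(30 AS horizon, 0.8 AS confidence_level))
WHERE
  time_series_type = 'forecast'
ORDER BY
  time_series_timestamp;
```

For a model that forecasts multiple time series, you would likely also select and order by the time series ID column or columns.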
Mathematical explanation
The mathematical relationship of the output columns is described in the following sections.
time_series_data
The time_series_data value is decomposed into several components to make it
easier to explain. For ARIMA_PLUS models, the components are the following:

- trend
- seasonal_period_yearly
- seasonal_period_quarterly
- seasonal_period_monthly
- seasonal_period_weekly
- seasonal_period_daily
- holiday_effect
- spikes_and_dips
- step_changes
- residual

For ARIMA_PLUS_XREG models, this list also includes the feature contribution
attribution_feature_name. For future data,
the spikes_and_dips, step_changes, and residual values aren't applicable.
The following formulas show what components make up the time_series_data
value for historical and forecast data for time series models.

For ARIMA_PLUS models:

- Historical data:

  time_series_data = trend + seasonal_period_yearly + seasonal_period_quarterly + seasonal_period_monthly + seasonal_period_weekly + seasonal_period_daily + holiday_effect + spikes_and_dips + step_changes + residual
- Forecast data:

  time_series_data = trend + seasonal_period_yearly + seasonal_period_quarterly + seasonal_period_monthly + seasonal_period_weekly + seasonal_period_daily + holiday_effect

For ARIMA_PLUS_XREG models:

- Historical data:

  time_series_data = trend + seasonal_period_yearly + seasonal_period_quarterly + seasonal_period_monthly + seasonal_period_weekly + seasonal_period_daily + holiday_effect + spikes_and_dips + step_changes + residual + (attribution_feature_1 + ... + attribution_feature_n)
- Forecast data:

  time_series_data = trend + seasonal_period_yearly + seasonal_period_quarterly + seasonal_period_monthly + seasonal_period_weekly + seasonal_period_daily + holiday_effect + (attribution_feature_1 + ... + attribution_feature_n)
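
One way to see the decomposition in practice is to add the component columns back together and compare the sum with time_series_data. The following query is a rough sketch for an ARIMA_PLUS model, assuming the placeholder model `mydataset.mymodel`. The recomposed and difference aliases are illustrative names, not output columns of the function. Components that are NULL (for example, a seasonal period that wasn't found, or spikes_and_dips on forecast rows) are treated as 0:

```sql
-- Sketch: recompose time_series_data from its components.
-- `mydataset.mymodel` is a placeholder for a trained ARIMA_PLUS model.
WITH explained AS (
  SELECT
    time_series_timestamp,
    time_series_type,
    time_series_data,
    -- NULL components contribute nothing to the sum.
    IFNULL(trend, 0)
      + IFNULL(seasonal_period_yearly, 0)
      + IFNULL(seasonal_period_quarterly, 0)
      + IFNULL(seasonal_period_monthly, 0)
      + IFNULL(seasonal_period_weekly, 0)
      + IFNULL(seasonal_period_daily, 0)
      + IFNULL(holiday_effect, 0)
      + IFNULL(spikes_and_dips, 0)
      + IFNULL(step_changes, 0)
      + IFNULL(residual, 0) AS recomposed
  FROM
    ML.EXPLAIN_FORECAST(
      MODEL `mydataset.mymodel`,
      STRUCT(30 AS horizon, 0.8 AS confidence_level))
)
SELECT
  time_series_timestamp,
  time_series_type,
  time_series_data,
  recomposed,
  ABS(time_series_data - recomposed) AS difference
FROM
  explained
ORDER BY
  time_series_timestamp;
```

If the formulas above hold for your model, difference should be close to 0 on every row, allowing for floating-point rounding. For an ARIMA_PLUS_XREG model, you would also need to add the attribution columns, whose names depend on the model's features.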
time_series_adjusted_data
The time_series_adjusted_data value is the value that remains after cleaning
spikes and dips, adjusting the step changes, and removing the residuals. Its
formula is the same for both historical and forecast data.
For ARIMA_PLUS models:

  time_series_adjusted_data = trend + seasonal_period_yearly + seasonal_period_quarterly + seasonal_period_monthly + seasonal_period_weekly + seasonal_period_daily + holiday_effect

For ARIMA_PLUS_XREG models:

  time_series_adjusted_data = trend + seasonal_period_yearly + seasonal_period_quarterly + seasonal_period_monthly + seasonal_period_weekly + seasonal_period_daily + holiday_effect + (attribution_feature_1 + ... + attribution_feature_n)
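
Taken together with the time_series_data formulas, this implies that on historical rows the gap between time_series_data and time_series_adjusted_data is made up of spikes_and_dips, step_changes, and residual. The following query is a sketch that uses this to surface the historical points that cleaning changed the most. It assumes the placeholder model `mydataset.mymodel`, and removed_by_cleaning is an illustrative alias rather than an output column:

```sql
-- Sketch: list the 10 historical points where the adjusted value differs
-- most from the original value.
-- `mydataset.mymodel` is a placeholder for a trained ARIMA_PLUS model.
SELECT
  time_series_timestamp,
  time_series_data,
  time_series_adjusted_data,
  time_series_data - time_series_adjusted_data AS removed_by_cleaning,
  spikes_and_dips,
  step_changes,
  residual
FROM
  ML.EXPLAIN_FORECAST(
    MODEL `mydataset.mymodel`,
    STRUCT(30 AS horizon, 0.8 AS confidence_level))
WHERE
  time_series_type = 'history'
ORDER BY
  ABS(time_series_data - time_series_adjusted_data) DESC
LIMIT 10;
```

Inspecting spikes_and_dips, step_changes, and residual side by side shows which of the three accounts for each gap.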
holiday_effect
The holiday_effect_holiday_name value is a subcomponent
of the holiday_effect value. The holiday_effect value is the sum of all the
holiday_effect_holiday_name values. For example, if you
specify holidays xmas and mlk, the formula is
holiday_effect = holiday_effect_xmas + holiday_effect_mlk.
ARIMA_PLUS example
The following example forecasts 30 time points with
a confidence level of 0.8:
SELECT
  *
FROM
  ML.EXPLAIN_FORECAST(
    MODEL `mydataset.mymodel`,
    STRUCT(30 AS horizon, 0.8 AS confidence_level))
ARIMA_PLUS_XREG example
The following example forecasts 30 time points with a
confidence level of 0.8, using the future feature values in the mydataset.mytable table:
SELECT
  *
FROM
  ML.EXPLAIN_FORECAST(
    MODEL `mydataset.mymodel`,
    STRUCT(30 AS horizon, 0.8 AS confidence_level),
    (SELECT * FROM `mydataset.mytable`))
What's next
- For more information about Explainable AI, see BigQuery Explainable AI overview.
- For more information about supported SQL statements and functions for time series forecasting models, see End-to-end user journeys for time series forecasting models.