Data Mining For Unemployment Rate Prediction Using Search
DOI 10.1007/s11761-012-0122-2
W. Xu (✉) · Z. Li · C. Cheng
School of Information, Renmin University of China, Beijing 100872, China
e-mail: weixu@ruc.edu.cn
Z. Li e-mail: ziang_lee@126.com
C. Cheng e-mail: chengcheng_ruc@126.com

T. Zheng
School of Economics and Management, Tsinghua University, Beijing 100084, China
e-mail: zhengtingting@hotmail.com

Abstract Unemployment rate prediction has become critically important because it can help governments make decisions and design policies. In previous studies, traditional univariate time series models and econometric methods for unemployment rate prediction have attracted much attention from governments, organizations, research institutes, and scholars. Recently, novel methods using search engine query data have been proposed to forecast the unemployment rate. In this paper, a data mining framework using search engine query data for unemployment rate prediction is presented. Under the framework, a set of data mining tools, including neural networks (NNs) and support vector regressions (SVRs), is developed to forecast the unemployment trend. In the proposed method, search engine query data related to employment activities are first extracted. Second, a feature selection model is used to reduce the dimension of the query data. Third, various NNs and SVRs are employed to model the relationship between the unemployment rate data and the query data, and a genetic algorithm is used to optimize the parameters and refine the features simultaneously. Fourth, an appropriate data mining method is chosen as the selected predictor by using cross-validation. Finally, the selected predictor with the best feature subset and proper parameters is used to forecast the unemployment trend. The empirical results show that the proposed framework clearly outperforms the traditional forecasting approaches and that support vector regression with a radial basis function (RBF) kernel is dominant for unemployment rate prediction. These findings imply that the data mining framework is efficient for unemployment rate prediction and can strengthen governments' quick responses and service capability.

Keywords Unemployment rate prediction · Data mining · Search engine query data · Government service

1 Introduction

Unemployment rate prediction has become critically important, in particular during economic recessions, because it can not only help governments make decisions and design policies, but also help practitioners better understand future economic trends. In recent years, unemployment rate forecasting has attracted much attention from governments, organizations, research institutes, and scholars, and a great number of methods have been proposed for it. Traditional univariate time series models have been proposed for unemployment rate prediction [3,13,20,22]. For example, a time deformation model is applied to US unemployment data, and the experimental results indicate that it performs better than other well-known models, such as the autoregressive integrated moving average (ARIMA) model [22]. Similarly, the autoregressive fractionally integrated moving average (ARFIMA) model is used to analyze the US unemployment trend, and the results show that ARFIMA forecasts better than the threshold autoregressive (TAR) and symmetric ARFIMA models [13].
                                                                                                                          123
                                                                                                                           SOCA
   Some macroeconomic variables, such as money supply, producer price index, interest rate, and gross national product (GNP), have been considered in unemployment rate prediction [10–12,15–17,21]. A smooth transition vector error-correction model (STVECM) is used to forecast the unemployment rates of the four non-Euro G-7 countries from economic indicators [15]. Similarly, a Markov-switching vector error-correction model (MS-VECM) is suggested to analyze the UK labor market [12]. Moreover, univariate and multivariate functional coefficient autoregressive (FCAR) models are presented and evaluated for multi-step unemployment rate prediction [10]. A pattern recognition method is developed to analyze the specific phenomenon of fast acceleration of unemployment [11].
   In recent years, Web information has been regarded as a useful resource for analyzing socioeconomic hot spots, such as influenza epidemic detection [8,23] and financial market prediction [2,14,18], and unemployment rate prediction using Web information has attracted growing attention from researchers and practitioners [1,4–7,19]. A new method using data on Internet activity is proposed to demonstrate strong correlations between keyword searches and unemployment rates, and the experimental results show that the method has strong potential for unemployment rate prediction [1]. An Internet job-search indicator called the Google Index (GI) is offered as the best leading indicator for predicting the US unemployment rate, and an out-of-sample comparison with other forecasting models shows that the GI indeed helps predict the US unemployment rate even after controlling for the effects of data snooping [6]; likewise, a novel indicator based on job-search-related Web queries is employed to predict quarterly unemployment rates in short samples [7]. The popularity of Web searches tracked by Google has also been suggested as an indicator of contemporaneous economic activity before the official data become available and/or are revised [19]. Moreover, Google Trends data are suggested for forecasting the US unemployment time series, and using them could improve the forecasting accuracy significantly [4,5].
   Different from the previous studies, a data mining method using neural networks has been used to forecast the unemployment rate with search engine query data, and the experimental results show that the proposed method outperforms the traditional methods [24]. Furthermore, by combining search engine query data and time series data, a hybrid forecasting model is suggested to improve the performance of unemployment rate prediction [25]. Since data mining techniques can make a significant contribution to unemployment rate prediction, this paper presents a data mining framework using search engine query data for unemployment rate prediction, and within the proposed framework, various data mining tools are validated and compared to examine the efficiency and effectiveness of the framework. In the proposed framework, an automated feature selection model is first constructed to reduce the dimension of the query data. Second, different data mining tools are employed to describe the relationship between the unemployment rate data and the search engine query data. Third, an optimal data mining model is selected as the predictor by using cross-validation. Finally, the selected predictor with proper parameters and the best feature subset is used to forecast the unemployment trend.
   The rest of this paper is organized as follows. The next section introduces some basic concepts of the data mining tools used in this paper, including NNs and SVRs. The data mining framework using search engine query data for unemployment rate prediction is proposed in Sect. 3. For illustration, the efficiency of the proposed framework and an empirical analysis of the unemployment trend using the data mining tools are reported in Sect. 4. Finally, conclusions and future research directions are summarized in Sect. 5.

2 Introduction to data mining tools

Data mining is a technique that discovers the internal rules of data by analyzing large quantities of data; in other words, it transforms large amounts of data into useful information. Data mining draws on theories from statistics, artificial intelligence, and other fields. In this paper, neural networks and support vector regressions are used to mine the internal rules of search engine query data and to predict the unemployment rate.

2.1 Neural networks

A neural network is a mathematical model that imitates the structure and functions of biological neural networks. It consists of interconnected artificial neurons distributed in an input layer, one or more hidden layers, and an output layer. Generally, in the learning phase, a neural network adapts its structure (its connection weights) based on the information that flows through the network. This nonlinear computational model is widely used to detect complex relationships between input and output data.
   The back-propagation neural network (BPNN) is a widely used neural network model in which information is transferred from the input layer to the output layer via the hidden layer(s). When the network output differs from the actual (target) value, the weights and thresholds are adjusted by back-propagating the errors, as shown in Fig. 1.

Fig. 1 The structure of BPNN [figure: input layer, hidden layer, output layer]

   When the first input information flows through the network and the output is produced, the back-propagation process begins. As mentioned above, the error between the produced value and the actual value is calculated and used to optimize the network with the help of an error function. The commonly used error function is the quadratic function:

E(t) = \frac{1}{2} \sum_{j} \left( a_j(t) - y_j(t) \right)^2        (1)

where y_j(t) is the value produced by the neural network at time period t, and a_j(t) is the actual value at time period t. Then, the connection weights are adjusted by the generalized delta learning rule:

\Delta w_{ji}(t) = \sum_{s=1}^{\varepsilon} \eta \left( a_j - y_j \right) f'(\cdot)\, y_i + \mu \, \Delta w_{ji}(t-1)        (2)

where η is the learning rate, μ is the momentum value, ε is the epoch size, f(·) is the activation function, f'(·) is its derivative, and y_i is the output of neuron i feeding neuron j. Besides, (a_j − y_j) is the error between the actual value and the produced value.
   The activation function of the traditional BPNN is the hyperbolic tangent function, defined as:

f(x) = \frac{2}{1 + e^{-2x}} - 1        (3)

   The learning rate is a parameter that determines the efficiency and effectiveness of finding the best solution. The larger the learning rate, the faster the learning process, but the training may oscillate. However, if the learning rate is relatively small, the training may get trapped in a local optimum.
   Different from BPNN, the radial basis function neural network (RBFNN) uses nonlinear radial basis functions (RBFs), such as the Gaussian function, as the activation function in the hidden layer:

f(x - \theta) = e^{-\frac{(x - \theta)^2}{\sigma^2}}        (4)

where θ is the center (mean) of the Gaussian function and σ^2 plays the role of the variance, controlling the width of the function. 'Spread' is a parameter that reflects how fast the RBF changes: the larger the spread, the smoother the resulting function approximation. If the spread is too large, many neurons are required to fit a fast-changing function, whereas if it is too small, many neurons are required to fit a smooth function.
   Similarly, for the wavelet neural network (WNN), a wavelet function embedded in the hidden layer is used as the activation function. This function can be described as follows:

f(x) = e^{-\frac{x^2}{2}} \cos(1.75x)        (5)

Support vector regression (SVR) is an adaptation of the support vector machine (SVM), a statistical learning method for classification proposed by Vapnik. The basic idea of SVR is to map the data from the input space into a high-dimensional feature space and then solve the problem by linear regression in that feature space.
   Given a training set {(x_i, y_i)}, i = 1, 2, ..., n, where x_i is the input data, y_i is the corresponding output, and n is the total number of data instances, the regression function of SVR is defined as:

f(x) = w \cdot \varphi(x) + b        (6)

where w and b denote the weight vector and the bias constant, respectively, and φ(x) is the function that maps the data from the input space into the high-dimensional feature space.
   In ε-SVR, the regression coefficients w and b are obtained by minimizing the regularized risk function below:

R(C) = C \sum_{i=1}^{n} L_\varepsilon\left( f(x_i), y_i \right) + \frac{1}{2} \lVert w \rVert^2        (7)

   In this function, the first term is the empirical risk and the second term is the regularization term. The regularization constant C strikes a balance between the two. In addition, L_ε(f(x), y) is the ε-insensitive loss function, defined as:

L_\varepsilon\left( f(x), y \right) = \begin{cases} 0, & \text{if } |y - f(x)| \le \varepsilon \\ |y - f(x)| - \varepsilon, & \text{otherwise} \end{cases}        (8)

where ε defines the size of the tube or, in other words, the maximum error allowed in the regression.
   By introducing slack variables ξ_i and ξ_i^*, the problem can be transformed into the following optimization problem:

\min_{w, b, \xi, \xi^*} \; C \sum_{i=1}^{n} \left( \xi_i + \xi_i^* \right) + \frac{1}{2} \lVert w \rVert^2
\text{s.t.} \quad y_i - \left( w \cdot \varphi(x_i) + b \right) \le \varepsilon + \xi_i, \quad \left( w \cdot \varphi(x_i) + b \right) - y_i \le \varepsilon + \xi_i^*, \quad \xi_i, \xi_i^* \ge 0, \quad i = 1, 2, \ldots, n        (9)

   Because choosing ε in the ε-insensitive loss function of ε-SVR is difficult, ν-SVR was designed to overcome this problem by introducing another parameter ν ∈ (0, 1] that controls the number of support vectors. And in ν-SVR, the optimization
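The neural-network formulas of Sect. 2.1 can be checked numerically. The following is a minimal Python sketch, not the authors' code. It implements the activation functions of Eqs. (3) to (5), the quadratic error of Eq. (1), and one generalized-delta update of Eq. (2) for a single tanh output neuron. It assumes an epoch size of one (one sample per update), and the function names, learning rate, and momentum values are hypothetical choices for illustration.

```python
import math

def tanh_act(x):
    """Eq. (3): f(x) = 2 / (1 + e^(-2x)) - 1, which equals tanh(x)."""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0

def gaussian_rbf(x, theta, sigma):
    """Eq. (4): Gaussian RBF centred at theta; sigma sets its width."""
    return math.exp(-((x - theta) ** 2) / sigma ** 2)

def morlet(x):
    """Eq. (5): wavelet activation e^(-x^2 / 2) * cos(1.75 x)."""
    return math.exp(-x * x / 2.0) * math.cos(1.75 * x)

def quadratic_error(actual, produced):
    """Eq. (1): E = 1/2 * sum_j (a_j - y_j)^2."""
    return 0.5 * sum((a - y) ** 2 for a, y in zip(actual, produced))

def delta_update(weights, prev_deltas, inputs, target, eta=0.1, mu=0.5):
    """One Eq. (2) step for a single tanh output neuron, assuming epoch size 1.

    delta_w_i(t) = eta * (a - y) * f'(net) * y_i + mu * delta_w_i(t - 1),
    where y_i is the i-th input signal and f'(net) = 1 - tanh(net)^2.
    Returns the updated weights and the deltas reused as momentum next step.
    """
    out = tanh_act(sum(w * x for w, x in zip(weights, inputs)))
    grad = (target - out) * (1.0 - out * out)  # (a_j - y_j) * f'(.)
    deltas = [eta * grad * x + mu * d for x, d in zip(inputs, prev_deltas)]
    return [w + d for w, d in zip(weights, deltas)], deltas

if __name__ == "__main__":
    w, d = [0.1, -0.2, 0.05], [0.0, 0.0, 0.0]
    x, a = [1.0, 0.5, -1.0], 0.6  # hypothetical input vector and target
    for _ in range(50):
        w, d = delta_update(w, d, x, a)
    y = tanh_act(sum(wi * xi for wi, xi in zip(w, x)))
    print("final E:", quadratic_error([a], [y]))
```

The momentum term reuses the previous step's weight change. This is why `delta_update` returns the deltas alongside the weights instead of discarding them.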
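A small numerical check of Eqs. (6) to (8) may also help. The Python sketch below assumes a linear kernel, so that φ(x) = x and f(x) = w·x + b can be evaluated directly. It only evaluates the ε-insensitive loss and the regularized risk R(C) for a given w and b. It does not solve problem (9), which in practice is handled by a dedicated solver such as LIBSVM (which provides both ε-SVR and ν-SVR). The function names and toy data are hypothetical.

```python
def eps_insensitive(pred, target, eps):
    """Eq. (8): no loss inside the eps-tube, |y - f(x)| - eps outside it."""
    gap = abs(target - pred)
    return 0.0 if gap <= eps else gap - eps

def linear_svr(w, b, x):
    """Eq. (6) with the identity feature map phi(x) = x (a linear kernel)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def regularized_risk(w, b, xs, ys, C, eps):
    """Eq. (7): C * sum_i L_eps(f(x_i), y_i) + 0.5 * ||w||^2."""
    empirical = sum(eps_insensitive(linear_svr(w, b, x), y, eps)
                    for x, y in zip(xs, ys))
    return C * empirical + 0.5 * sum(wi * wi for wi in w)

if __name__ == "__main__":
    xs, ys = [[0.0], [1.0], [2.0]], [0.6, 3.0, 4.0]  # toy data
    print(regularized_risk([2.0], 0.5, xs, ys, C=1.0, eps=0.2))
```

With w = 2, b = 0.5, ε = 0.2 and C = 1, the predictions are 0.5, 2.5 and 4.5. The first point lies inside the tube and costs nothing, and the other two each cost 0.5 − 0.2 = 0.3. Adding ½‖w‖² = 2 gives R = 0.6 + 2 = 2.6. Raising C makes the same tube violations weigh more heavily relative to the norm of w.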
Step 4: Prediction. The designed models are taken through an iterative validation process using evaluation methods such as cross-validation with different evaluation criteria, until the model with the best performance is selected. The selected predictor with the best feature subset and the optimal parameters is used to forecast the unemployment trend.

features is shown in Fig. 3, and the GA-based data mining methods are summarized in Fig. 4.

Fig. 3 Genetic representation [figure: a binary chromosome, e.g. 0 1 ... 0 ... 0 0 ... 1 | 0 1 0 ... 1, whose genes P1 ... Pm form the parameter set and whose genes F1 ... Fn form the feature set]

Fig. 4 The GA-based data mining method [flowchart: randomly initializing GA populations; selection; crossover; dataset; the selected data mining models with proper features and parameters; unemployment rate prediction]

   As can be seen from Fig. 4, a population consists of a group of chromosomes and is generated randomly in the first generation according to the number and size of the chromosomes. During the selection process, the fitness value of each chromosome is calculated by the fitness function, which serves as an evaluation indicator to determine whether the chromosome survives into the next generation: chromosomes with low fitness values are dropped, and new chromosomes are added automatically. From the second generation on, crossover and mutation may happen to some chromosomes with certain probabilities. Crossover means that two chromosomes exchange their genes from a fixed point and develop into two new chromosomes, while mutation is a sudden change in the genes of a chromosome. Then, the fitness function is applied again. This iteration continues until the maximum number of generations is reached. In this experiment, the maximum number of generations is set to 100 and the initial population size is set to 60, which means that 60 possible feature groups are selected randomly at first.
   The fitness function is calculated from the performance of the neural networks and of the support vector regressions separately. For the neural network models, three different neural networks, namely BPNN, RBFNN, and WNN, are implemented to train and test the selected features and parameter(s). For the support vector regression models, ε-SVR and ν-SVR are implemented with four different kernel functions: linear, polynomial, RBF, and sigmoid. To construct the fitness function, fivefold cross-validation is carried out. The data are divided evenly into five folds, and each time four folds are used to train the neural networks or support vector regressions, while the remaining fold is used as the testing set to validate the performance of the data mining models. The average RMSE over the five folds is then calculated, and 1/RMSE is taken as the value of the fitness function.

4 Empirical analysis

4.1 Data description and evaluation criteria

The US government releases the unemployment rate to the public only as a monthly report. To improve prediction performance, the Unemployment Initial Claims (UIC) series is used in our experiments instead of the unemployment rate itself. UIC is a leading indicator of the US labor market for estimating the unemployment rate and is issued weekly by the US Department of Labor. Thus, the weekly initial claims data are collected from the Web site of the US Department of Labor.
   On the other hand, as proposed in [4], two types of query data, "Local/Jobs" and "Society/Social Services/Welfare & Unemployment", are supposed to be related
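The GA loop described around Figs. 3 and 4 can be made concrete with a minimal, self-contained Python sketch. It is not the authors' implementation. As a hypothetical stand-in for the NNs and SVRs, which would need full training routines, each chromosome is scored with a k-nearest-neighbour regressor. The parameter genes P1 ... Pm decode to k, and the feature genes F1 ... Fn select input columns. Fitness is 1/RMSE averaged over fivefold cross-validation, as described above. The population size (60) and number of generations (100) default to the values reported in the text. The crossover and mutation probabilities, the contiguous fold split, and the keep-the-better-half selection rule are assumptions.

```python
import math
import random

def decode(chrom, m_bits):
    """Return (k, selected feature indices): P1..Pm encode k, F1..Fn are a 0/1 mask."""
    k = 1 + int("".join(str(g) for g in chrom[:m_bits]), 2)
    feats = [i for i, g in enumerate(chrom[m_bits:]) if g == 1]
    return k, feats

def knn_predict(train, x, k, feats):
    """Hypothetical stand-in model: mean target of the k nearest training rows."""
    nearest = sorted(train, key=lambda r: sum((r[0][i] - x[i]) ** 2 for i in feats))[:k]
    return sum(t for _, t in nearest) / len(nearest)

def fitness(chrom, X, y, m_bits, folds=5):
    """1 / average RMSE over `folds` contiguous, (near-)equal folds."""
    k, feats = decode(chrom, m_bits)
    if not feats:
        return 0.0  # an empty feature subset cannot predict anything
    n, rmses = len(X), []
    for f in range(folds):
        lo, hi = f * n // folds, (f + 1) * n // folds
        train = [(X[i], y[i]) for i in range(n) if not lo <= i < hi]
        sq = [(y[i] - knn_predict(train, X[i], k, feats)) ** 2 for i in range(lo, hi)]
        rmses.append(math.sqrt(sum(sq) / len(sq)))
    avg = sum(rmses) / folds
    return 1.0 / avg if avg > 0 else float("inf")

def crossover(a, b, rng):
    """Single-point crossover: exchange the genes after a random cut point."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(chrom, p_mut, rng):
    """Bit-flip mutation: each gene flips independently with probability p_mut."""
    return [1 - g if rng.random() < p_mut else g for g in chrom]

def run_ga(X, y, m_bits=3, pop_size=60, generations=100, p_cross=0.8, p_mut=0.02, seed=0):
    rng = random.Random(seed)
    n_genes = m_bits + len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda c: fitness(c, X, y, m_bits), reverse=True)
        survivors = ranked[: pop_size // 2]  # low-fitness chromosomes are dropped
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            if rng.random() < p_cross:
                a, b = crossover(a, b, rng)
            children += [mutate(a, p_mut, rng), mutate(b, p_mut, rng)]
        pop = survivors + children[: pop_size - len(survivors)]
    best = max(pop, key=lambda c: fitness(c, X, y, m_bits))
    return best, fitness(best, X, y, m_bits)

if __name__ == "__main__":
    data_rng = random.Random(42)
    # Hypothetical data: 6 candidate query series, only columns 0 and 3 matter.
    X = [[data_rng.random() for _ in range(6)] for _ in range(50)]
    y = [3 * r[0] - 2 * r[3] + data_rng.gauss(0, 0.05) for r in X]
    best, fit = run_ga(X, y, pop_size=20, generations=15)
    print("k, features:", decode(best, 3), "CV RMSE:", round(1 / fit, 4))
```

The better half of each generation is carried over unchanged, so the best fitness can never decrease from one generation to the next. Replacing `knn_predict` with a real BPNN or SVR trainer, and decoding more parameter genes (for example C and ν for ν-SVR), would bring the sketch closer to the method described above.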
   As revealed in Table 3, similar results can be found in terms of MAE. GA-SVR models perform better than GA-NN models, except for SVRs with the sigmoid kernel. In addition, ν-SVRs outperform ε-SVRs when their kernels are the same. The best average performance is generated by ν-SVR with the RBF kernel, which differs from the result in terms of RMSE. Moreover, the best single performance comes from ν-SVR with the RBF kernel in iteration 3.
   When performance is evaluated in terms of MAPE, as reflected in Table 4, the findings are nearly the same: (1) SVRs perform better than NNs in most circumstances, (2) ν-SVRs outperform ε-SVRs when kernels are the same, (3) the best average result comes from ν-SVR with the RBF kernel, and (4) ν-SVR with the RBF kernel in iteration 3 yields the best performance.
   Based on these similar results across the different performance measures, several implications can be drawn: (1) SVRs perform better than NNs in most circumstances, (2) ν-SVRs outperform ε-SVRs when kernels are the same, (3) WNN and SVR with the sigmoid kernel are not suitable for this problem because of their relatively poor performance compared with the others, and (4) the best average result comes from ν-SVR with the RBF kernel, which is therefore best suited for this problem.

4.4 Prediction and further discussion

According to the analyses above, ν-SVR with the RBF kernel in iteration 3 is chosen as the model for the final prediction. ν-SVR with the polynomial kernel in iteration 5, which performs best in terms of RMSE, is not chosen because (1) in terms of MAE and MAPE, ν-SVR with the RBF kernel in iteration 3 performs better, and (2) even in terms of RMSE, ν-SVR with the RBF kernel in iteration 3 performs only slightly worse (50505.49 versus 50330.03).
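The model comparison above relies on RMSE, MAE, and MAPE. The Python sketch below assumes their standard definitions, with MAPE expressed in percent. It also quantifies the RMSE gap quoted above: (50505.49 − 50330.03)/50330.03 ≈ 0.35%, which supports calling the RBF model only slightly worse. The function names and the toy series are illustrative, not the paper's data.

```python
import math

def rmse(actual, pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def mae(actual, pred):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def mape(actual, pred):
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)

def relative_gap(worse, better):
    """How much larger `worse` is than `better`, as a fraction of `better`."""
    return (worse - better) / better

if __name__ == "__main__":
    # RMSE values quoted in Sect. 4.4 (nu-SVR/RBF, iteration 3 vs nu-SVR/polynomial, iteration 5).
    print(f"{relative_gap(50505.49, 50330.03):.2%}")
    # Hypothetical weekly claims and predictions, for illustration only.
    actual = [400000.0, 420000.0, 410000.0]
    pred = [390000.0, 430000.0, 410000.0]
    print(rmse(actual, pred), mae(actual, pred), mape(actual, pred))
```

MAPE divides by the actual value, so it is only meaningful for series such as initial claims that stay well away from zero. RMSE penalizes large individual errors more heavily than MAE, which is one reason the two criteria can rank models differently.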
Selected features
No. 5, 8, 12, 13, 16, 19, 22, 24, 25, 29, 30, 31, 32, 35, 36, 38, 39, 41, 44, 45, 50, 51, 52, 53, 59, 60, 61, 62, 67, 69, 70, 73, 75, 76, 77, 78, 80, 81, 82, 85, 87, 88, 89, 91, 93, 95, 97, 99, and 100

   The parameters of this model and the selected features are listed in Table 5, and the keywords corresponding to the feature numbers are listed in the Appendix.
   When the selected model is applied to predict the real value of the unemployment rate, its performance is not as good as in the experiment above. This may be due to overfitting of the model in the training process. The prediction result of the selected model and the real unemployment rate are compared visually in Fig. 5, and it is reasonable to conclude that the predicted value generally follows the trend of the real unemployment rate. The RMSE, MAE, and MAPE are 68,182.55, 54,241.10, and 12.54, respectively. The worse performance may be caused by the outliers that occurred between 10-12-26 and 11-01-22.

Fig. 5 Prediction result with real unemployment rate value

5 Conclusions

This paper presents a novel data mining framework for unemployment rate prediction using search engine query data. Under the framework, GA-based data mining methods are proposed to forecast the unemployment rate, selecting the proper feature subset and the optimal parameters. In terms of the evaluation criteria, the empirical results show the efficiency and effectiveness of the proposed framework. They also reveal that, among these data mining tools, the GA-based ν-SVR with the RBF kernel shows dominant advantages for unemployment rate prediction. This indicates that the proposed framework can be used as a potential alternative for analyzing the unemployment trend. Besides, timely search engine query data can produce contemporaneous prediction results, which could help governments and scholars respond to the unemployment trend without delay.
   In addition, this study leaves some research questions for further work. First, under our proposed framework, other data mining tools, such as ensemble methods, can be used to forecast the unemployment trend for a more stable solution. Second, other Web information, including Web content information and Web link information, can be used to improve the forecast performance. Third, in this paper, the primary data set of search engine queries is relatively large, and thus an efficient feature group, which is small and reasonable, should be built to forecast the unemployment rate. Fourth, an online unemployment analysis and forecast system (UAFS) can be developed to assist governments and organizations with early warning and decision support. Finally, the proposed methodology can also be applied to other research fields, especially social hot spots such as the real estate market, the crude oil market, and the foreign exchange market.

Acknowledgments This research work was partly supported by 973 Project (Grant No. 2012CB316205), National Natural Science Foundation of China (Grant No. 71001103) and Beijing Natural Science Foundation (No. 9122013).

Appendix: The top 100 search engine query data

No.  Key words                            No.  Key words
1    filing unemployment                  51   ohio unemployment rate
2    unemployment filing for              52   unemployment ny
3    unemployment office                  53   unemployment compensation
4    file for unemployment                54   unemployment in az
5    unemployment file for                55   to apply for unemployment
6    unemployment state                   56   unemployment insurance claim
21   unemployment claims                  71   new york unemployment benefit
22   unemployment apply for               72   unemployment insurance benefit
23   apply for unemployment               73   unemployment dol
24   unemployment ca                      74   unemployment info
25   unemployment services                75   unemployment commission
26   unemployment security                76   michigan unemployment benefits
27   unemployment                         77   weekly unemployment insurance
28   to file unemployment                 78   weekly unemployment benefits
29   unemployment benefits                79   nyc unemployment benefits
30   file for unemployment online         80   green jobs
31   ohio unemployment benefits           81   how to claim unemployment
32   unemployment file claims             82   unemployment rate
33   to file for unemployment             83   unemployment insurance benefits
34   unemployment benefits pa             84   unemployment weekly benefits
35   unemployment benefit                 85   online unemployment application
36   nys dept labor                       86   unemployment rate ny
37   state unemployment benefit           87   jobs in usa
38   connecticut unemployment benefits    88   new york unemployment benefits
39   dept of unemployment                 89   benefits for unemployment
40   nys dept of labor                    90   police jobs
41   for unemployment                     91   dc unemployment
                                                                          benefits
7     state of unemployment     57    unemployment department
                                                                   42   uimn.org                   92    unemployment in kansas
                                       of labor
                                                                   43   unemployment in            93    mass unemployment benefits
8     insurance unemployment    58    department of labor
                                                                         michigan
                                       unemployment
                                                                   44   unemployment benefit       94    unemployment online
9     washington                59    labor department                   claim
       unemployment                     unemployment               45   unemployment payment       95    unemployment in florida
10    unemployment file         60    unemployment check           46   unemployment in            96    eligible for unemployment
11    unemployment insurance    61    unemployment for mn                 colorado
12    unemployment apply        62    unemployment in indiana      47   apply for unemployment     97    benefits of unemployment
                                                                          online                          insurance
13    department of             63    unemployment in california
       unemployment                                                48   unemployment benefits      98    unemployment eligibility
14    unemployment website      64    snag a job                          insurance
                                                                   49   application for            99    construction jobs
15    unemployment              65    unemployment grants                 unemployment
       application                                                 50   benefits unemployment      100   unemployment rate recession
16    unemployment new york     66    unemployment in                     insurance
                                       pennsylvania
17    washington state          67    unemployment benefit
       unemployment                    insurance
18    Wisconsinunemployment     68    claim unemployment benefit
        benefits                                                   References
19    insurance for             69    part time unemployment
        unemployment
20    apply for unemployment    70    security jobs                1. Askitas N, Zimmermann KF (2009) Google econometrics and
                                                                      unemployment forecasting. Appl Econom Q 55(2):107–120
2. Blasco N, Corredor P, Del Rio C, Santamaria R (2005) Bad news and Dow Jones make the Spanish stocks go round. Eur J Oper Res 163(1):253–275
3. Chen CI (2008) Application of the novel nonlinear grey Bernoulli model for forecasting unemployment rate. Chaos Solitons Fractals 37(1):278–287
4. Choi H, Varian H (2009) Predicting initial claims for unemployment benefits. Google technical report
5. Choi H, Varian H (2009) Predicting the present with Google trends. Google technical report
6. D'Amuri F (2009) Predicting unemployment in short samples with internet job search query data. MPRA Paper No. 18403:1–17
7. D'Amuri F, Marcucci J (2009) Google it! Forecasting the US unemployment rate with a Google job search index. MPRA Paper No. 18248:1–52
8. Ginsberg J, Mohebbi MH, Patel RS, Brammer L, Smolinski MS (2009) Detecting influenza epidemics using search engine query data. Nature 457(7232):1012–1014
9. Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182
10. Harvill JL, Ray BK (2005) A note on multi-step forecasting with functional coefficient autoregressive models. Int J Forecast 21(4):717–727
11. Keilis-Borok VI, Soloviev AA, Allegre CB, Sobolevskii AN (2005) Patterns of macroeconomic indicators preceding the unemployment rise in Western Europe and the USA. Pattern Recogn 38(3):423–435
12. Krolzig HM, Marcellino M (2002) A Markov-switching vector equilibrium correction model of the UK labour market. Empir Econ 27:233–254
13. Lahiani A, Scaillet O (2009) Testing for threshold effect in ARFIMA models: application to US unemployment rate data. Int J Forecast 25(2):418–428
14. Lan KC, Ho KS, Luk RWP, Yeung DS (2005) FNDS: a dialogue-based system for accessing digested financial news. J Syst Softw 78(2):180–193
15. Milas C, Rothman P (2008) Out-of-sample forecasting of unemployment rates with pooled STVECM forecasts. Int J Forecast 24(1):101–121
16. Proietti T (2003) Forecasting the US unemployment rate. Comput Stat Data Anal 42(3):451–476
17. Schanne N, Wapler R (2010) Regional unemployment forecasts with spatial interdependencies. Int J Forecast 26(4):908–926
18. Schumaker RP, Chen H (2009) A quantitative stock prediction system based on financial news. Inform Process Manag 45(5):571–583
19. Suhoy T (2009) Query indices and a 2008 downturn: Israeli data. Bank of Israel discussion paper
20. Tashman LJ (2000) Out-of-sample tests of forecast accuracy: an analysis and review. Int J Forecast 16(4):437–450
21. Terui N, van Dijk HK (2002) Combined forecasts from linear and nonlinear time series models. Int J Forecast 18(3):421–438
22. Vijverberg CPC (2009) A time deformation model and its time-varying autocorrelation: an application to US unemployment data. Int J Forecast 25(1):128–145
23. Xu W, Han ZW, Ma J (2010) A neural network based approach to detect influenza epidemics using search engine query data. In: Proceedings of the ninth international conference on machine learning and cybernetics, Qingdao, China, pp 1408–1412
24. Xu W, Zheng T, Li Z (2011) A neural network based forecasting method for the unemployment rate prediction using the search engine query data. In: Proceedings of the eighth IEEE international conference on e-business engineering, Beijing, China, pp 9–15
25. Xu W, Li Z, Chen Q (2012) Forecasting the unemployment rate by neural networks using search engine query data. In: Proceedings of the 45th Hawaii international conference on system sciences, Hawaii, US, pp 3591–3599