Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2017.Doi Number
Advanced sentiment analysis of social
media for short-term cryptocurrency
price prediction.
Krzysztof Wołk¹
¹ Polish-Japanese Academy of Information Technology, Warsaw, MZ 02-008 Poland
Corresponding author: Krzysztof Wołk (e-mail: kwolk@pja.edu.pl).
ABSTRACT Over the last few years, bitcoin has become a fundamental aspect of financial systems.
Bitcoin is one of the largest cryptocurrencies in terms of market capitalization, but not the only one.
Sentiment analysis, as a computational treatment of opinion, can therefore be used to predict the prices
of bitcoin and other cryptocurrencies over different time intervals. One of the key characteristics of the
crypto market is that its price fluctuations do not depend on institutional money regulation but on
people’s perceptions and opinions. Analysing the relationship between social media, web search and
price is therefore crucial for cryptocurrency price prediction. In this research, Twitter and Google Trends
are used to forecast the short-term prices of the main cryptocurrencies, as both are used to influence
purchasing decisions. This article uses an interpolated multi-model approach to analyse the impact of
social media on cryptocurrency prices. In summary, we show that the psychological and behavioural
attitudes of the population have a great impact on cryptocurrency prices, which are highly speculative.
INDEX TERMS Sentiment analysis, machine learning, cryptocurrencies, social media, speculative
models
VOLUME XX, 2017
I.   INTRODUCTION
Bitcoin, Ethereum, Electroneum, Ripple, Zcash (ZEC) and Monero (crypto in short) are cryptocurrencies,
an electronic form of currency. Crypto is decentralized: transactions take place without an
intermediary. It was introduced to the market in 2008 by Satoshi Nakamoto (as the Bitcoin project) and
circulates through peer-to-peer network transactions. Crypto differs from traditional forms of currency
transaction such as the banking system, as it allows its users to transact without operating fees and
without following the rules and authorities of financial institutions, which are full of fraud and
intense corruption.
   Bitcoin and other crypto are among the fastest-growing forms of digital transaction in the world. At
the start of 2017 the price of bitcoin was $863, but it rose to around $17,000 by the end of the same
year, a roughly twentyfold increase. This massive, unprecedented rise has captured worldwide attention
for digital currency transactions. Based on previous research, it is evident that bitcoin possesses
unique characteristics compared to traditional modes of transaction such as banking. This is because
its price fluctuation depends on people’s perception and opinions instead of following institutional
regulations.
   However, the value of crypto is volatile: its price keeps fluctuating over time, which creates
uncertainty for investors and for people who wish to use it as a currency. Twitter is one of the most
widely used social media platforms and collects the multidimensional views and perspectives of people
around the world. Twitter is therefore used as a marketing tool for crypto transactions, and hence it
can be used to predict crypto prices. Likewise, a web search tool such as Google Trends is one of the
most widely used research platforms; it provides a wide range of information and is therefore used as a
marketing
tool for crypto and is used to predict future prices. This study analyses the correlation between
Twitter activity, measured as the number of tweets, and the prices of crypto. In addition, the effect
of web search data such as Google Trends on the price of crypto is examined.
   Using sentiment analysis, we can therefore predict the price change of crypto over different time
intervals with different computational and statistical models, such as linear regression, boosting
methods and neural networks, and determine the significance of the coefficient of determination using
Twitter data and Google Trends. With linear modelling that takes the number of tweets and Google
Trends as inputs, we aim to predict the direction of crypto price changes accurately. According to
past research, sentiment analysis can be a good modelling technique for the capital market and
cryptocurrencies.
   In order to establish the usefulness of the data, only data that contains a certain set of keywords
(the full name and abbreviation of each cryptocurrency) is analysed. The underlying assumption is that
sentiment correlates with the movement of the financial instrument, such as Bitcoin. Prior research,
reviewed in Section II, suggests that this correlation exists. Google Trends data consists of relative
search volume scores for a given search term during a given time interval. Several researchers have
focused on using Google Trends data to predict the stock market. Many searches for bitcoin or some
other keyword could indicate a reaction to current events or predict a future event.
   When analysing the sentiment of Twitter users and Google searchers regarding the price of bitcoin,
the problem to be solved is to determine whether there is a correlation between Twitter data and the
price fluctuation of crypto. A second question is whether a naive prediction model based on sentiment
changes yields better results than random guessing.
II.   PREVIOUS RELATED WORK
   This paper builds on a wide range of related research. Some behavioural economists have argued that
decisions regarding financial systems are influenced by emotions and not by the value of capital
alone. This idea was also supported by Dollan [1], who argued that decision making is influenced by
emotions. Based on this research, there is an open possibility of finding beneficial tools, such as
sentiment analysis, which show that the price of a commodity may be affected by factors such as
emotions in addition to economic fundamentals.
   Recent research has pointed out clearly that people's purchasing decisions are influenced by
information found on websites and social media. A study conducted by Gallen Thomas showed that
Twitter data analysis correlated with people's views of the price of cryptocurrency. He also pointed
out that social media sentiment, such as Twitter sentiment, has a great impact on the end users of
cryptocurrency compared to the emotional state of the users. Bollen et al [2] studied Twitter
sentiment against the stock market. They used neural networks and causality analysis to determine
where the market was heading. Their results showed an ability to predict changes in the capital market
some days ahead, up to almost one week.
   Another study was carried out by Prosky et al [3], who used tensor networks to formulate a learning
model. They concentrated on Twitter data, carrying out sentiment analysis on it to see whether the
results were related to other stochastic events. Rather et al [4] combined a recurrent neural network
and a multiple linear regression into a hybrid model, which aimed to overcome the limitations of each
individual model. According to the researchers, these three methods could be useful for predicting the
price of cryptocurrencies, especially bitcoin.
   A study by Nie et al [5] showed that social media comments, such as those on Twitter, and web search
data, such as Google Trends, are very important information that can be used to predict the price of
bitcoin and other cryptocurrencies. Nie found that these factors were highly significant, with very
low p-values, for search terms related to mining and blockchains, which are important aspects of
cryptocurrencies.
   Karalevičius [6] used sentiment analysis to predict intraday Bitcoin movements from social media
forums. His conclusion was that short-term price fluctuations could be predicted with some degree of
accuracy, which diminished as the time horizon increased. This is significant for our project because
our time frame is short, mostly ten-minute or sixty-minute intervals. Garcia et al. [7] showed that it
was possible to use a combined strategy to predict the Bitcoin price using standard financial
modelling techniques and social media signals. These signals included target words, sentiment, and
other features that describe the changing environment of social media, such as post frequency and
comments. These researchers implemented a strategy that yielded a 32.29 percent daily gain. Valence
measures alone yielded a 0.1183 daily gain. With enough capital, at these rates, trading Bitcoin could
be profitable. The researchers back-tested their results, which adds some confidence in their
prediction model.
   In addition, a model formulated by Kristoufek et al [8] showed that Google Trends, as one of the
factors affecting the price of Bitcoin, had a strong positive correlation with that price, with a very
low p-value and high significance in testing. He also used a vector autoregression technique
which showed that Wikipedia information was also a good predictor for modelling the price of Bitcoin.
Stenquist Evita and Lonno Jacob, in a paper titled “Predicting the price of Bitcoin fluctuation using
Twitter sentiment analysis”, collected tweets relating to the price of Bitcoin and formulated a model
that was useful for predicting it [12]. They used the Valence Aware Dictionary and Sentiment Reasoner
(VADER) to analyse the sentiment of each tweet and classified tweets as positive, negative, or neutral.
Only the positive and negative tweets were kept for analysis.
III.   METHODS
   In this project we applied different predictive and descriptive models that are important for data
analysis. The work started with two predictive models for the price of a cryptocurrency based on
Twitter sentiment and Google Trends: least squares linear regression (LSLR) and Bayesian ridge
regression. Both are available in the Python library SKLEARN¹. This modelling approach was explained in
depth by Kuchibhotla et al [9], who argued that high dimensional data and methods have proliferated
throughout the literature for the last two decades.
   When the data are expressed as a linear combination of independent variables weighted by a
coefficient matrix, least squares linear regression seeks to minimize the residual error. To determine
the coefficients, we use an array of independent variables and an array of dependent variables. The
relationship between the predicted values Y, the coefficients and the inputs of the array X is
expressed as:
   $$\hat{Y} = \hat{\beta}_0 + \sum_{j=1}^{p} X_j \hat{\beta}_j$$
   To calculate the best coefficient matrix, we use the following formula:
   $$\beta = (X^{T} X)^{-1} X^{T} Y$$
   Another important technique used in the analysis is Bayesian ridge regression, modelled by Nie and
Ji (2014), who described feature learning as learning the transformation of raw data into data that is
useful for analysis and other purposes. Feature learning techniques can be either supervised or
unsupervised, and commonly include auto-encoders, dictionary learning, restricted Boltzmann machines,
k-means clustering and many other approaches. During the past few years, the restricted Boltzmann
machine has drawn more and more attention from researchers due to its capability of handling different
kinds of data and its efficient learning method.
   Bayesian ridge regression is similar to LSLR, but it adds a lambda parameter to the input values
that penalizes the beta coefficients and shifts them towards zero. Bayesian ridge regression returns a
probabilistic model with a Gaussian parameter. MacKay [10] describes the Bayesian model with a Gaussian
probability parameter in the following equation:
   $$p(\lambda) = \mathcal{N}(\alpha, \lambda^{-1} I_p)$$
   The priors over the Gaussian precision parameters alpha and lambda are chosen to be gamma
distributions. For the alpha and lambda parameters of the model we use the default value of $10^{-6}$.
These can be adjusted to the data using the SKLEARN package. Bayesian ridge regression assigns
coefficient values using the equation:
   $$\beta = (X^{T} X + \lambda I)^{-1} X^{T} Y$$
   In the equation above, I denotes the identity matrix, so the lambda term is applied only across the
diagonal elements.
   Boosting algorithms were also employed, specifically AdaBoost and gradient boosting. In general,
these boosting algorithms work by minimizing the error. The equation below illustrates how this is
done:
   $$E_t = \sum_{i} E\left(f_{t-1}(x_i) + \alpha_t h(x_i)\right)$$
   where E is the error during each iteration, and $\alpha_t h(x)$ is the weak learner for the
classifier function. Each result is also weighted. When implementing gradient boosting, the model
applies steepest descent (or gradient descent), updating the model by computing the derivative of the
residuals (loss) and a multiplier,
   $$r = \frac{dL(y_i, F(x_i))}{dF(x_i)}$$
   $$m = \arg\min \sum_{i=1}^{n} L(\dots)$$
   The information regarding crypto was retrieved from a web-based platform known as Crypto Compare²,
which provides historical prices for various cryptocurrencies. During data processing, the data was
time-indexed, concatenated, and averaged. Many models were then employed to make predictions. The
complete list of models and their results is given in our results section.
   As discussed earlier, we used the VADER³ sentiment analysis tool, which is sensitive to nuances in
the text such as punctuation, capitalization, negation, and amplification of lexicon values. Data
processing, together with variable transformation, was the most time-consuming aspect of the research.
All variables were included in the final model, given that they all had at least moderate correlation
coefficients, and there was no logical reason to exclude them given their potential predictive ability.
¹ SKLEARN: http://scikit-learn.org/
² https://www.cryptocompare.com/
³ https://github.com/cjhutto/vaderSentiment

IV.   MEASURES OF FIT AND RESULTS
A bagging method of many different models was used to generate the final prediction. In this bagging
method the results from the different models are collected and are either summed, averaged, or used to
identify the probability of their occurrence. We found that an ensemble method of learning was
beneficial for reducing the errors that can occur in any one particular model. Comparing linear
regression with the ensemble method, we found that the latter performed better. As measures of goodness
of fit we used the mean error and the correlation coefficient; a further measure, introduced below,
indicates potential profit.
   The correlation coefficient R² was calculated from the set of testing data. In practical
application, the models should be trained on the full set of data minus the final target value. Only
the final point, that is, the last unknown price value, should be predicted. We are only interested in
knowing how the final predicted value differs from the actual value. Therefore, another measure of fit
is introduced, called ±T or dT: the error from our target value in dollars. This is the most useful
measure of fit and establishes the potential to be profitable when trading crypto.
V.   RESULTS
The models that make up the hybrid are shown below, together with their measures of fit. Each model was
run on testing data to gather the R² and ME values, where ME is the mean error, R² is the correlation
coefficient, and ±T is the actual error when predicting the price on a brand-new data point (the final
interval). Sampling was initially done in 10-minute and 60-minute shifts. Overall, in this empirical
experiment the 10-minute shifts resulted in less error in the hybrid model, so they were used in the
experiments. Tables I-VI show the results of the different methods for Bitcoin, Ethereum, Electroneum,
Monero, Zcash (ZEC) and Ripple.
   To be more precise, the models we used were Support Vector Regression [13], Stochastic Gradient
Descent [14], Gradient Boosting Model [15], MLP Neural Network [11], Least Squares Linear Regression
[16], AdaBoost [17], Bayesian Ridge Regression [18], Decision Tree [19] and ElasticNet [20]; Hybrid is
the mean of all of them. This mean is what was actually used for prediction. The tweet frequency was
added as a transformed variable and graphed against the crypto prices. We found that tweets had a high
inverse correlation with the price, as if bad news caused an increase in post frequency. This is
illustrated in Figures 1-25, in which we also compare the Google Trends data with the crypto data, as
well as the tweet frequency data with the crypto data. These results imply that there are meaningful
relationships between those quantities.

                          TABLE I
                 MODEL RESULTS FOR BITCOIN
Models                             ME          R²          T.s.
Support Vector Regression          1357.482    0.706722    -384.664
Stochastic Gradient Descent        8706.509    0.684718    -254.258
Gradient Boosting Model            1370.471    0.704768    -209.353
MLP Neural Network                 1382.774    0.703728    -117.398
Least Squares Linear Regression    2000.951    0.685587    395.1489
AdaBoost                           1986.594    0.676574    -5.40525
Bayesian Ridge Regression          1234.013    0.720673    48.95157
Decision Tree                      8791.874    0.678843    359.0313
ElasticNet                         2280.217    0.769856    313.6858
Hybrid (Mean)                      498.6117    0.94169     151.6282

FIGURE 1.   Bitcoin price vs number of tweets
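The "Hybrid (Mean)" row can be illustrated with a short sketch. It is not the published tool: the data is synthetic, only six of the nine listed model families are included to keep it short, all hyperparameters are scikit-learn defaults or arbitrary choices, and the sign convention of the final-point error is an assumption. What it shows is the procedure described in the text: train each model on all intervals except the last, predict the last interval, and average the individual predictions.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor
from sklearn.linear_model import BayesianRidge, ElasticNet, LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (an assumption): the three columns play the role of
# tweet count, mean sentiment and Google Trends score per interval.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 100.0 + 5.0 * X[:, 0] - 3.0 * X[:, 1] + 2.0 * X[:, 2] + rng.normal(scale=0.5, size=200)

# Train on everything except the final interval, which is held out as the
# brand-new data point to predict.
X_train, y_train = X[:-1], y[:-1]
X_last, y_last = X[-1:], y[-1]

models = {
    "Least Squares Linear Regression": LinearRegression(),
    "Bayesian Ridge Regression": BayesianRidge(),
    "ElasticNet": ElasticNet(alpha=0.1),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "AdaBoost": AdaBoostRegressor(random_state=0),
    "Gradient Boosting Model": GradientBoostingRegressor(random_state=0),
}

predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = float(model.predict(X_last)[0])

# Hybrid (Mean): unweighted average of the individual predictions.
hybrid = float(np.mean(list(predictions.values())))
# Signed error on the held-out point (predicted minus actual, assumed convention).
final_error = hybrid - float(y_last)

for name, value in predictions.items():
    print(f"{name:32s} {value:10.3f}")
print(f"{'Hybrid (Mean)':32s} {hybrid:10.3f}  error vs actual: {final_error:+.3f}")
```

Averaging keeps the hybrid prediction between the most extreme individual predictions, which is one reason a single poorly fitting model has limited influence on the result.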
FIGURE 2.   All models vs Bitcoin price
FIGURE 3.   Bitcoin price vs predicted price
FIGURE 4.   Bitcoin price vs Google Trends
FIGURE 5.   Electroneum price vs number of tweets
FIGURE 6.   All models vs Electroneum price
FIGURE 7.   Electroneum price vs predicted price

                          TABLE II
               MODEL RESULTS FOR ELECTRONEUM
Models                             ME          R²          T.s.
Support Vector Regression          0.004487    0.946137    0.000601
Stochastic Gradient Descent        0.036371    0.922049    -0.00082
Gradient Boosting Model            0.005131    0.932071    0.000303
MLP Neural Network                 0.008413    0.935374    0.000488
Least Squares Linear Regression    0.004657    0.953642    0.001144
AdaBoost                           0.009122    0.927278    -0.00137
Bayesian Ridge Regression          0.006355    0.898442    0.001081
Decision Tree                      0.033998    0.952623    0.00044
ElasticNet                         0.007629    0.9364      -0.0008
Hybrid (Mean)                      0.001842    0.99163     0.001
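The three columns of the result tables can be made concrete with a small, dependency-free sketch. Several readings are assumptions rather than statements from the paper: ME is taken to be the mean absolute error, R² is computed as the coefficient of determination, and the T.s. column is taken to be the ±T error on the final interval, with the sign defined as predicted minus actual. The price values are hypothetical.

```python
def mean_error(actual, predicted):
    """Mean absolute error (assumed reading of the ME column)."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def final_point_error(actual_last, predicted_last):
    """Signed error on the final interval (predicted minus actual, assumed)."""
    return predicted_last - actual_last


# Hypothetical prices, for illustration only.
actual = [10.0, 12.0, 11.0, 13.0, 14.0]
predicted = [10.5, 11.5, 11.0, 13.5, 13.0]

me = mean_error(actual, predicted)
r2 = r_squared(actual, predicted)
dt = final_point_error(actual[-1], predicted[-1])
print(f"ME={me:.3f}  R2={r2:.3f}  dT={dt:+.3f}")
```

Under these definitions ME and R² summarize the fit over the whole test set, while dT looks only at the last point, which is why a model can score well on the first two and still miss the final price.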
FIGURE 8.   Electroneum price vs Google Trends

                          TABLE III
                MODEL RESULTS FOR ETHEREUM
Models                             ME          R²          T.s.
Support Vector Regression          75.8912     0.964188    -17.1537
Stochastic Gradient Descent        364.9388    0.966607    -12.644
Gradient Boosting Model            126.1168    0.968592    -3.84195
MLP Neural Network                 45.18472    0.964776    -12.367
Least Squares Linear Regression    126.2375    0.963377    -3.38566
AdaBoost                           73.22109    0.969224    16.75794
Bayesian Ridge Regression          73.57855    0.964501    -2.00044
Decision Tree                      615.5554    0.961733    17.85768
ElasticNet                         124.8832    0.968191    -6.17257
Hybrid (Mean)                      16.02903    0.994549    7.823218

FIGURE 9.   Ethereum price vs number of tweets.
FIGURE 10.   All models vs Ethereum price.
FIGURE 11.   Ethereum price vs predicted price.
FIGURE 12.   Ethereum price vs Google Trends.

                          TABLE IV
                 MODEL RESULTS FOR MONERO
Models                             ME          R²          T.s.
Support Vector Regression          32.21549    0.839515    -2.96132
Stochastic Gradient Descent        194.4944    0.815937    -2.35604
Gradient Boosting Model            45.53463    0.843566    -4.13285
MLP Neural Network                 32.52362    0.845643    8.301083
Least Squares Linear Regression    26.08474    0.862746    4.746065
AdaBoost                           30.88287    0.859943    2.887843
Bayesian Ridge Regression          50.84047    0.84862     3.112188
Decision Tree                      196.3985    0.859415    3.559582
ElasticNet                         44.01616    0.856563    -3.38396
Hybrid (Mean)                      13.79168    0.978258    -8.01742
FIGURE 13.   Monero price vs number of tweets.
FIGURE 14.   All models vs Monero price.
FIGURE 15.   Monero price vs predicted price.
FIGURE 16.   Monero price vs Google Trends.

                          TABLE V
                 MODEL RESULTS FOR RIPPLE
Models                             ME          R²          T.s.
Support Vector Regression          32.21549    0.839515    -2.96132
Stochastic Gradient Descent        194.4944    0.815937    -2.35604
Gradient Boosting Model            45.53463    0.843566    -4.13285
MLP Neural Network                 32.52362    0.845643    8.301083
Least Squares Linear Regression    26.08474    0.862746    4.746065
AdaBoost                           30.88287    0.859943    2.887843
Bayesian Ridge Regression          50.84047    0.84862     3.112188
Decision Tree                      196.3985    0.859415    3.559582
ElasticNet                         44.01616    0.856563    -3.38396
Hybrid (Mean)                      13.79168    0.978258    -8.01742

FIGURE 17.   Ripple price vs number of tweets.
FIGURE 18.   All models vs Ripple price.
FIGURE 19.   Ripple price vs predicted price.
FIGURE 20.   Ripple price vs Google Trends.

                          TABLE VI
                  MODEL RESULTS FOR ZCASH
Models                             ME          R²          T.s.
Support Vector Regression          32.21549    0.839515    -2.96132
Stochastic Gradient Descent        194.4944    0.815937    -2.35604
Gradient Boosting Model            45.53463    0.843566    -4.13285
MLP Neural Network                 32.52362    0.845643    8.301083
Least Squares Linear Regression    26.08474    0.862746    4.746065
AdaBoost                           30.88287    0.859943    2.887843
Bayesian Ridge Regression          50.84047    0.84862     3.112188
Decision Tree                      196.3985    0.859415    3.559582
ElasticNet                         44.01616    0.856563    -3.38396
Hybrid (Mean)                      13.79168    0.978258    -8.01742

FIGURE 21.   Zcash price vs number of tweets.
FIGURE 22.   All models vs Zcash price.
balance showed $114.82, which confirms that the method is profitable, especially given that the crypto
market is currently in a downturn. Our bot made about 1-3 transactions per day. For comparison, we also
ran the well-recognized KryptoBot, which converted $100 into $102.45 over the same period.
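The comparison between the two bots can be expressed as simple period returns. The sketch below uses the end balances quoted in the text; the 100 USD starting balance for our bot is an assumption made to match the KryptoBot run, and the function name is illustrative.

```python
def simple_return(start_balance, end_balance):
    """Fractional gain over the trading period."""
    return (end_balance - start_balance) / start_balance


# End balances quoted in the text. The 100 USD start for our bot is an
# assumption; the text states it only for KryptoBot.
our_bot = simple_return(100.0, 114.82)
krypto_bot = simple_return(100.0, 102.45)

print(f"our bot:    {our_bot:.2%}")
print(f"KryptoBot:  {krypto_bot:.2%}")
print(f"difference: {(our_bot - krypto_bot) * 100:.2f} percentage points")
```

Expressing both results as returns over the same period makes the gap independent of the amount of capital traded.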
FIGURE 23.   Zcash price vs predicted price.
VI.   CONCLUSIONS
   From the data analysis conducted, we can conclude that cryptocurrency fluctuations depend heavily on
social media sentiment and web search data such as Google Trends. Regarding the future price of a
cryptocurrency, we can conclude that Twitter sentiment with respect to crypto prices tends to be
positive: many people tweet about crypto even when its price goes down, which yields a positive Twitter
sentiment. However, we have identified some problems associated with predicting crypto prices. One is
the high variability of the currency, due to the volatile nature of cryptocurrency in the current
market. We also saw that bank regulations, political risk and regulatory agencies caused major
fluctuations of the currency during the period of this study. Our hybrid model, as shown in our
results, achieved consistently good results even on blind testing data. We found the most powerful
predictors to be Google Trends data together with general negative sentiment (including weighted
sentiment). Negative news carries a larger weight, as shown by the correlation values during our data
exploration phase. We recommend a hybrid model to help alleviate some of the deficiencies of any one
model, and most of the research supports this methodology. In summary, we show that the psychological
and behavioural attitudes of the population have a great impact on cryptocurrency prices, which are
highly speculative.
FIGURE 24.   Zcash price vs Google Trends.                          Finally, our solution is shared as Python tool on GitHub
                                                                 repository. The script is capable to customize to any
   From the above analysis which was performed by                currency type, allows custom windows for grouping tweets
comparing sentiment analysis on twitter data against crypto      and averaging sentiment and Google Trends data, it allows
prices for a certain time frame. We showed on the figures        custom number of tweets to extract, connects to Google
for each crypto in analysis the comparison and experiments,      Trends API to get search trends, connects to
the performance of the models against testing data, the          CryptoCompare API to get currency prices, connects to
results of these models for a interval of 10 minutes and the     Twitter API to get tweets containing specified keywords
implementation of our model against the full data set. The       (currencies), performs sentiment analysis on tweets using
last value in our predicted values represents the true           all described models, builds models using Google Trends
performance of the model when making a final prediction.         and sentiment analysis, makes predictions for future price,
We also showed separately the results of the hybrid model.       gives recommendations for each model and totals the
The best result gives an error of less than $6, when             number of buy / sell / hold recommendations for the group.
predicting the final price point. The volatility of crypto has
resulted in daily price swings that are much greater than our    REFERENCES
total error. This suggest the model could be profitable.                   [1]   P. Dolan, R. Edlin, "Is it really possible to build a bridge
                                                                                 between cost-benefit analysis and cost-effectiveness
   Finally, we did an empirical experiment on investing                          analysis?." Journal of Health Economics 21.5 (2002):
100$ and trading it using BitBay Cryptocurrency exchange                         827-843.
for a period of one month. For this we implemented the                     [2]   J. Bollen, H. Mao, X. Zeng, "Twitter mood predicts the
                                                                                 stock market." Journal of computational science 2.1
Python script that automatically gathered predictions every                      (2011): 1-8.
10 minutes, and if it found it profitable to buy, sell or                  [3]   J. Prosky, X. Song, A. Tan, M. Zhao, "Sentiment
exchange crypto (taking into account BitBay fees) the                            Predictability    for       Stocks."     arXiv      preprint
recommended action was taken. After a month the account                          arXiv:1712.05785 (2017).
          [4]    A.M. Rather, A. Agarwal, V. N. Sastry, "Recurrent
                 neural network and a hybrid model for prediction of
                 stock returns." Expert Systems with Applications 42.6
                 (2015): 3234-3241.
          [5]    S. Nie, Q. Ji, "Feature Learning Using Bayesian Linear
                 Regression Model." Pattern Recognition (ICPR), 2014
                 22nd International Conference on. IEEE, 2014.
          [6]    V. Karalevicius, N. Degrande, J. De Weerdt, "Using
                 sentiment analysis to predict interday Bitcoin price
                 movements." The Journal of Risk Finance 19.1 (2018):
                 56-75.
          [7]    D. Garcia, F. Schweitzer, "Social signals and
                 algorithmic trading of Bitcoin." Royal Society open
                 science 2.9 (2015): 150288.
          [8]    L. Kristoufek, "BitCoin meets Google Trends and
                 Wikipedia: Quantifying the relationship between
                 phenomena of the Internet era." Scientific reports 3
                 (2013): 3415.
          [9]    A.K. Kuchibhotla, L.D. Brown, A. Buja, E.I. George, L.
                 Zhao, "A Model Free Perspective for Linear Regression:
                 Uniform-in-model Bounds for Post Selection Inference."
                 arXiv preprint arXiv:1802.05801 (2018).
          [10]   D.J.C. MacKay, "Bayesian interpolation." Neural
                 computation 4.3 (1992): 415-447.
          [11]   A. Salinca, "Convolutional Neural Networks for
                 Sentiment Classification on Business Reviews." arXiv
                 preprint arXiv:1710.05978 (2017).
          [12]   Y.B. Kim, J. Lee, N. Park, J. Choo, J.H. Kim, C.H. Kim,
                 "When Bitcoin encounters information in an online
                 forum: Using text mining to analyse user opinions and
                 predict value fluctuation." PloS one 12.5 (2017):
                 e0177630.
          [13]   A.J. Smola, B. Schölkopf, "A tutorial on support vector
                 regression." Statistics and computing 14.3 (2004): 199-
                 222.
          [14]   L. Bottou, "Large-scale machine learning with
                 stochastic gradient descent." Proceedings of
                 COMPSTAT'2010. Physica-Verlag HD, 2010. 177-186.
          [15]   J.H. Friedman, "Stochastic gradient boosting."
                 Computational Statistics & Data Analysis 38.4 (2002):
                 367-378.
          [16]   S. Wold, A. Ruhe, H. Wold, W.J. Dunn, "The
                 collinearity problem in linear regression. The partial
                 least squares (PLS) approach to generalized inverses."
                 SIAM Journal on Scientific and Statistical Computing
                 5.3 (1984): 735-743.
          [17]   M. Collins, R.E. Schapire, Y. Singer, "Logistic
                 regression, AdaBoost and Bregman distances." Machine
                 Learning 48.1-3 (2002): 253-285.
          [18]   A.E. Hoerl, R.W. Kennard, "Ridge regression: biased
                 estimation for nonorthogonal problems." Technometrics
                 42.1 (2000): 80-86.
          [19]   R. Kohavi, "Scaling up the accuracy of Naive-Bayes
                 classifiers: a decision-tree hybrid." KDD. Vol. 96. 1996.
          [20]   H. Zou, T. Hastie, "Regularization and variable selection
                 via the elastic net." Journal of the Royal Statistical
                 Society: Series B (Statistical Methodology) 67.2 (2005):
                 301-320.