Twitter Sentiment Analysis: A Case Study in the Automotive Industry
Conference Paper · November 2015
DOI: 10.1109/AEECT.2015.7360594
2015 IEEE Jordan Conference on Applied Electrical Engineering and Computing Technologies (AEECT)
Twitter Sentiment Analysis: A Case Study in the Automotive Industry

Sarah E. Shukri, Business Information Technology Department, The University of Jordan, Amman, Jordan, Sar8141197@fgs.ju.edu.jo
Rawan I. Yaghi, Business Information Technology Department, The University of Jordan, Amman, Jordan, Roa8141203@fgs.ju.edu.jo
Ibrahim Aljarah, Business Information Technology Department, The University of Jordan, Amman, Jordan, i.aljarah@ju.edu.jo
Hamad Alsawalqah, Computer Information Systems Department, The University of Jordan, Amman, Jordan, h.sawalqah@ju.edu.jo
Abstract— Sentiment analysis is one of the fastest-growing research areas; it uses natural language processing, text mining, and computational linguistics to extract useful information that supports decision making. In recent years, social media websites have spread widely and their user bases have grown rapidly. The automotive industry is one of the largest economic sectors in the world, with more than 90 million cars and vehicles. It is highly competitive and requires sellers and automotive companies to analyze and attend carefully to consumers' opinions in order to achieve a competitive advantage in the market. Analysing consumers' opinions using social media data can be a very effective way for automotive companies to enhance their marketing targets and objectives. In this paper, a sentiment analysis case study in the automotive industry is presented. Text mining and sentiment analysis are used to analyze unstructured tweets on Twitter and to extract polarity and emotion classifications for automotive brands such as Mercedes, Audi, and BMW. The emotion classification results indicate that the "joy" category is better for BMW than for Mercedes and Audi, and that the "sadness" percentage is larger for Audi and Mercedes than for BMW. Furthermore, the polarity classification shows that BMW has 72% positive tweets, compared with 79% for Mercedes and 83% for Audi. In addition, BMW has 8% negative polarity, compared with 18% for Mercedes and 16% for Audi.

Keywords— Sentiment Analysis; Twitter; Automotive; Classification

I. INTRODUCTION

Others' opinions have always been an important piece of information for consumers when making a buying decision. Long before awareness of the World Wide Web became widespread, people often relied on their friends' recommendations and on specialized magazines or websites as their main sources of information. With the growth of the web over the last decade, social media now provides new tools to create and share useful information efficiently [1]. This has made it possible to find experiences and opinions almost everywhere (blogs, forums, social networks, news portals, content-sharing sites, etc.). Research indicates that using social media sites is considered the best way to grow a business in terms of money, time, effort, and other resources [2].

Although these opinions are meant to be helpful, their massive availability and unstructured nature make it difficult for companies to benefit from them. To solve this issue, a number of techniques for analysing user-generated data on social media sites have been developed. Sentiment analysis, also known as opinion mining, is one such recent technique. It uses natural language processing, text mining, and computational linguistics to extract useful information and knowledge from source data. The purpose of sentiment analysis is to classify the polarity of a source text as positive, neutral, or negative. Text mining is a crucial step in sentiment analysis, in which unstructured data are analysed and scored according to how closely they relate to a specific concept, so that they can later be classified based on that score [3].

The automotive industry is one of the largest and most competitive economic sectors in the world. Because of this competition, automotive companies are moving toward social media sites to reach more customers and advertise their products in a considerably short time.

Twitter is one of the fastest-growing social media websites in the world. It is a microblogging service that enables users to tweet on any topic with a maximum length of 140 characters. As of June 2015¹, Twitter has more than 500 million users, of which more than 302 million are active users. With an average of 500 million tweets created daily, Twitter has become one of the greatest sources of information available on the Internet [4]. Twitter data can therefore be very useful for automotive marketers, because it can be mined for consumers' opinions and reviews in the automotive industry using sentiment analysis. This can provide useful insights that help companies create a competitive advantage over their competitors.

¹ about.twitter.com/company
978-1-4799-7431-3/15/$31.00 ©2015 IEEE
This research applies sentiment analysis to people's opinions and reviews about three automotive companies: Mercedes, Audi, and BMW. To do so, tweets are extracted from Twitter and processed using text mining techniques. These tweets are then classified according to the sentiment expressed in their text [5]. In the end, each tweet is assigned to one of three categories: positive, negative, or neutral sentiment. Since attempts to apply sentiment analysis in the automotive industry are, to the best of our knowledge, very few [10, 11], the results of this research can provide further insights into the importance of analysing consumers' reviews and opinions in this industry.

The remainder of this paper is organized as follows: Section II presents related work. Section III presents the methodology. Section IV demonstrates the method on the case study and discusses the results. Section V concludes the paper with a summary and an outlook on future research directions.

II. RELATED WORK

With the explosion of Web 2.0 platforms, social media sites have become a huge source of consumer voices. Capturing and analyzing public opinion from social media sites has recently enjoyed a huge burst of research activity. One of the resulting emerging fields is sentiment analysis [1, 5], on which hundreds of papers have since been published. Among these, we focus on those most closely related to the work presented in this paper.

In [6], the authors analyzed three of the most popular companies in the pizza industry using text mining. They studied information from social media sites about the users of those companies and their competitors. The goal was to help the companies improve their services and strategies to attract more customers. They found that social media sites play an important role in creating competitive advantage. The authors recommended that a good understanding and use of social media users' information can improve companies' relationships with their users, their service levels, and the quality of their decisions.

Another work [7] presented a new approach to decision support for vehicle defect discovery. The authors applied techniques such as text mining and sentiment analysis to popular social media communities. Their focus was on improving vehicle quality management by analyzing social media. They found that a good analysis of social media data can improve automotive quality management strategies.

To overcome the challenges that developers may face when building opinion mining tools, the authors in [8] developed a rule-based approach that can analyze the linguistics of social media sites.

In [9], a case study applies sentiment analysis to Twitter. The authors presented a method for sentiment analysis and opinion mining using tweets. The first step of the method is collecting the corpus and preparing it for analysis; the second is building a model that classifies the tweets by sentiment (positive, negative, or neutral) using the Naive Bayes (NB) algorithm.

Another work [10] introduced the J.D. Power and Associates (JDPA) Sentiment Corpus. The JDPA corpus consists of users' blog posts containing opinions about automobiles. The authors also presented statistics, including inter-annotator agreement, and catalogued components of sentiment that occur naturally.

The authors in [11] used sentiment analysis on a data set of around 730,000 tweets published over a period of 19 weeks. Within this data set, they analyzed the tweets dealing with Toyota's corporate crisis in 2010. Their focus was on the dynamics of discussions in social media, in order to reflect the sentiments within these discussions. They identified and investigated specific stages of communication, which they called "quiet stages" and "peaks".

III. METHODOLOGY

As the use of social media sites grows, companies can use these sites to assess their own position in the market as well as that of their competitors. This can be done by studying the data generated by users on these sites, which reveal users' opinions and comments about the companies' products or services. Thus, in this paper we study the automotive industry in social media and try to answer the following questions:

· What is the rate of using these companies' data by users?
· What is the percentage of negative reviews and comments compared to positive ones?
· Which company leads the automotive sector based on the polarity classification of reviews and comments?

While social media provides great user engagement and leads to an incredibly high level of communication between user and seller, some industries still do not engage in social media. The automotive industry is a great example of engagement in social media. According to a 2014 CMO Council report, about 1 in 4 car buyers (23%) discussed other users' experiences and reviews before purchasing their car; 38% of car customers said that they will use social media for their next purchase; 84% of car customers use Facebook, with 24% of them using social media sites to purchase their last car; and between October 2012 and April 2013 the number of clicks on automotive ads on Facebook jumped from 16% to 39%².

² www.cmocouncil.org

In this paper, we first discuss the level of social media engagement of these three automotive manufacturers. We extracted the engagement percentages from the Talkwalker API³. Because BMW, Mercedes, and Audi are among the largest automotive brands in Europe, it is critical to discuss their level of engagement in social media. Figure 1 shows the engagement percentage on different social media sites.

³ www.talkwalker.com
As Figure 1 shows, BMW has the largest engagement percentage on Twitter, at 62%. Mercedes has the largest engagement percentage through online news, blogs, and other channels, with 18%, 6%, and 30%, respectively. Audi also has a higher engagement percentage through Twitter than Mercedes: 59% (Audi) versus 47% (Mercedes).

Figure 1. Social Media Sites engagement percentage

A. Data collection
In this paper, we collected data from Twitter using the Twitter API. The corpus contains 3000 tweets, which were extracted using R⁴.

⁴ https://www.r-project.org/

B. Data pre-processing
Tweets are filtered to keep only those in English. The corpus covers three car brands: Mercedes, Audi, and BMW, each represented by 1000 tweets. The tweets are extracted with a search query consisting of the "@" annotation followed by the brand name. To build a sound experiment, the data set for each brand was extracted from Twitter pages and users. We then prepared the extracted data sets by cleaning them of unnecessary elements such as retweets, username symbols, hashtags, numbers, punctuation, stop words, whitespace, and HTML links. In this paper, we applied the following text mining pre-processing techniques:

· Tokenization: reads the text to be mined, removes all tabs and punctuation between words, and replaces them with a white space.
· Filtering: removes words such as stop words, extremely frequent words, and rarely occurring words.
· Lemmatization: transforms all verbs to the infinitive and all nouns to the singular form.
· Stemming: returns all words to their basic forms, for example by removing the plural 's' from nouns and the 'ing' from verbs.

C. Sentiment Analysis Models
We used the Naïve Bayes (NB) classification algorithm to classify polarity and emotions in the sentiment analysis. The NB algorithm is simple, easy to implement, and efficient, with acceptable accuracy. Furthermore, two sentiment models are investigated, based on a polarity lexicon [13] and an emotions lexicon [14].

The NB algorithm is a simple probabilistic model that assumes all data attributes are independent. The model uses Bayes' theorem to solve classification problems by calculating the maximum posterior probability of the class label given the attribute set. Bayes' theorem is given by the following equation:

$$P(C \mid X) = \frac{P(X \mid C)\, P(C)}{P(X)} \qquad (1)$$

where C is a class label, X is the attribute set, P(C) and P(X|C) are the prior probability of the class and the conditional probability of the attributes given the class, and P(X) is the probability of the attribute set.

The first sentiment model uses an NB classifier that is trained on the training data set and makes use of Wiebe's polarity lexicon [13]. The training data set is annotated with three classes: positive, neutral, and negative tweets. The NB polarity classifier uses the polarity lexicon by matching tweet words against lexicon words. Once the training process is finished and the model is well trained, the second step tests the model using the testing data set, which is not labeled. The testing process is used to assess the accuracy of the built model. The last step is to validate the model and extract the polarity percentages for the three categories: positive, negative, and neutral.

The second NB classifier is trained on the training data set and makes use of the Strapparava emotions lexicon [14]. The training data set is annotated with seven classes: anger, disgust, fear, joy, sadness, surprise, and unknown. As in the polarity classification, the classifier relies on matching tweet words against emotions lexicon words.

IV. RESULTS

The tweets collected about BMW, Mercedes, and Audi contain the @BMW, @Mercedesbenz, and @Audi tags, respectively. Each tweet is analysed and classified as positive, negative, or neutral based on a query term and polarity classification. Table I, Table II, and Table III contain sample tweets about BMW, Mercedes, and Audi, respectively, together with their polarity classifications.

TABLE I: TWEETS' SAMPLES (BMW)
Tweet | Polarity Classification
#BMW Nice car, you can try it?" | Positive
Elegance and sportiness united in one vehicle: the new #BMW #series Coupé | Positive
such a bad car #BMW | Negative
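To make the path from a raw tweet to a polarity label more concrete, the pre-processing of Section III-B and the lexicon-matching NB model of Section III-C can be sketched on sample tweets like those in Table I. This Python sketch is only an illustration under stated assumptions and is not the R code used in this study: the stop-word list, the eight-word lexicon, the suffix-stripping stemmer, the uniform class prior, and all function names are hypothetical stand-ins (a real run would load the full polarity lexicon of [13]), and lemmatization and frequency-based filtering are omitted for brevity.

```python
import math
import re
from collections import Counter

# Hypothetical stop words and polarity lexicon (stand-ins for real resources).
STOP_WORDS = {"a", "an", "the", "to", "of", "such", "you", "can", "it", "is"}
RAW_LEXICON = {
    "nice": "positive", "excellent": "positive", "beautiful": "positive",
    "proud": "positive", "amazing": "positive",
    "bad": "negative", "worst": "negative", "rubbish": "negative",
}
CLASSES = ("positive", "negative")


def stem(word):
    """Very naive stemmer: strips a trailing 'ing' or a plural 's'."""
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


# Stem the lexicon entries too, so they still match stemmed tweet tokens.
LEXICON = {stem(w): label for w, label in RAW_LEXICON.items()}


def preprocess(tweet):
    """Clean a tweet and return its stemmed, stop-word-free tokens."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    text = re.sub(r"@\w+", " ", text)          # drop usernames
    text = re.sub(r"[^a-z\s]", " ", text)      # drop '#', numbers, punctuation
    tokens = text.split()                       # tokenize on whitespace
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def classify_polarity(tweet):
    """Naive Bayes over lexicon matches; equal scores count as neutral."""
    matches = [t for t in preprocess(tweet) if t in LEXICON]
    vocab = len(LEXICON)
    scores = {}
    for c in CLASSES:
        n_c = sum(1 for label in LEXICON.values() if label == c)
        log_post = math.log(1.0 / len(CLASSES))  # assumed uniform prior P(C)
        for t in matches:
            count = 1 if LEXICON[t] == c else 0
            # Laplace-smoothed estimate of P(word | C) from the lexicon
            log_post += math.log((count + 1) / (n_c + vocab))
        scores[c] = log_post
    if math.isclose(scores["positive"], scores["negative"]):
        return "neutral"
    return max(scores, key=scores.get)


def polarity_shares(tweets):
    """Percentage of tweets assigned to each polarity class."""
    labels = Counter(classify_polarity(t) for t in tweets)
    return {c: 100.0 * labels[c] / len(tweets)
            for c in ("positive", "negative", "neutral")}
```

Under these assumptions, `classify_polarity("such a bad car #BMW")` yields `"negative"`, while a tweet with no lexicon match falls back to `"neutral"` because both class scores reduce to the prior. Applying `polarity_shares` to each brand's tweets would give percentages of the kind reported in this section.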
TABLE II: TWEETS' SAMPLES (MERCEDES)
Tweet | Polarity Classification
@MercedesBenz Intelligent innovation and safety as never before. Preview of the future of the #EClass | Positive
Amazing @MercedesBenz 300 SLR | Positive
@MercedesBenz That's not what we'd expect. Please contact your local Workshop so that our Technicians inspect the issue. | Negative

TABLE III: TWEETS' SAMPLES (AUDI)
Tweet | Polarity Classification
@audi Probably one of my worst decisions was buying an | Negative
Proud to own an Audi @audi | Positive
@audi Sorry RPM but this is rubbish. There is so much great motor sport happening and you dish up crap | Negative
@Audi Excellent SUV from Audi! Beautiful Car! | Positive

The polarity classification for BMW, Mercedes, and Audi is shown in Figure 2. The figure shows that BMW has 72% positive tweets, compared with 79% for Mercedes and 83% for Audi. Furthermore, BMW has 8% negative polarity, compared with 18% for Mercedes and 16% for Audi. This gives a good indication to customers seeking to buy cars from manufacturers that have good reviews and comments from owners, and it indicates to competitors that Audi is a strong competitor.

Fig 2. Polarity Classification for BMW, Mercedes, Audi

Figure 3 shows the emotion classification results for the three automotive companies. For BMW, 79% of tweets are labeled "Unknown", 5% "Joy", 0.5% "Surprise", 9% "Sadness", 0% "Fear", 5.5% "Anger", and 1% "Disgust". For Mercedes, 56.6% are labeled "Unknown", 31.9% "Joy", 0.5% "Surprise", 4.1% "Sadness", 0.4% "Fear", 6.4% "Anger", and 0.1% "Disgust". For Audi, 63.2% are labeled "Unknown", 10% "Joy", 17.7% "Surprise", 5.1% "Sadness", 0.2% "Fear", 1.3% "Anger", and 2.4% "Disgust". These results give a good indicator for customers seeking to buy cars and help them make the right decision. We can note that the "joy" category was better for BMW than for Mercedes and Audi. This may be because positive reviews are not necessarily always "Joy"; other categories can also be regarded as positive when they carry no negative implication.

Fig 3. Emotion Classifications for BMW, Mercedes, and Audi

V. CONCLUSION

Sentiment analysis is considered one of the most attractive fields to study and apply in various sectors. In this paper, sentiment analysis models are applied to three leading automotive companies to extract the polarity and emotions (opinions) of customers about each company, which is very useful information for marketing. The results showed that Audi's positive polarity (83%) was the highest among the three companies, while its negative polarity (16%) was lower than that of Mercedes (18%), though not lower than that of BMW (8%). This means that, for example, offers on Audi's page would circulate to a higher number of satisfied people than on the pages of BMW and Mercedes.

We can conclude that Audi users are more satisfied than the other users. This will help users who are willing to buy a car to compare the three companies based on previous users' opinions. In addition, the emotion classification results were consistent with the polarity classifications and give more information about each polarity class.
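The brand comparison summarized above can be re-derived from the percentages reported for Figure 2. The short Python sketch below does only that arithmetic. The neutral share is inferred under the assumption that the three polarity classes sum to 100%, and the "net" score (positive minus negative) is a hypothetical summary measure introduced here for illustration; neither is reported in the study itself.

```python
# Positive and negative tweet percentages as reported for Figure 2.
REPORTED = {
    "BMW": {"positive": 72, "negative": 8},
    "Mercedes": {"positive": 79, "negative": 18},
    "Audi": {"positive": 83, "negative": 16},
}


def summarize(reported):
    """Add an inferred neutral share and a hypothetical net score per brand."""
    summary = {}
    for brand, shares in reported.items():
        pos, neg = shares["positive"], shares["negative"]
        summary[brand] = {
            "positive": pos,
            "negative": neg,
            "neutral": 100 - pos - neg,  # assumption: classes sum to 100%
            "net": pos - neg,            # hypothetical summary measure
        }
    return summary


def rank_by(summary, key):
    """Return brand names sorted from highest to lowest value of `key`."""
    return sorted(summary, key=lambda brand: summary[brand][key], reverse=True)


summary = summarize(REPORTED)
for brand in rank_by(summary, "positive"):
    print(brand, summary[brand])
```

Ranking by `"positive"` or by `"net"` both place Audi first, whereas ranking by `"negative"` shows BMW with the smallest negative share.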
REFERENCES
[1] Cambria, Erik, et al. "New avenues in opinion mining and sentiment analysis." IEEE Intelligent Systems 2 (2013): 15-21.
[2] Edosomwan, Simeon, et al. "The history of social media and its impact on business." Journal of Applied Management and Entrepreneurship 16.3 (2011): 79-91.
[3] Li, Nan, and Desheng Dash Wu. "Using text mining and sentiment analysis for online forums hotspot detection and forecast." Decision Support Systems 48.2 (2010): 354-368.
[4] Lima, Ana CES, and Leandro N. de Castro. "Automatic sentiment analysis of Twitter messages." Computational Aspects of Social Networks (CASoN), 2012 Fourth International Conference on. IEEE, 2012.
[5] Pang, Bo, and Lillian Lee. "Opinion mining and sentiment analysis." Foundations and Trends in Information Retrieval 2.1-2 (2008): 1-135.
[6] He, Wu, Shenghua Zha, and Ling Li. "Social media competitive analysis and text mining: A case study in the pizza industry." International Journal of Information Management 33.3 (2013): 464-472.
[7] Abrahams, Alan S., et al. "Vehicle defect discovery from social media." Decision Support Systems 54.1 (2012): 87-97.
[8] Maynard, Diana, Kalina Bontcheva, and Dominic Rout. "Challenges in developing opinion mining tools for social media." Proceedings of the @NLP can u tag #usergeneratedcontent (2012): 15-22.
[9] Pak, Alexander, and Patrick Paroubek. "Twitter as a Corpus for Sentiment Analysis and Opinion Mining." LREC. Vol. 10. 2010.
[10] Kessler, Jason S., and Nicolas Nicolov. "The JDPA Sentiment Corpus for the Automotive Domain."
[11] Stieglitz, Stefan, and Nina Krüger. "Analysis of sentiments in corporate Twitter communication–A case study on an issue of Toyota." Analysis 1 (2011): 1-2011.
[12] Rish, Irina. "An empirical study of the naive Bayes classifier." IJCAI 2001 workshop on empirical methods in artificial intelligence. Vol. 3. No. 22. IBM New York, 2001.
[13] Wilson, Theresa, Janyce Wiebe, and Paul Hoffmann. "Recognizing contextual polarity in phrase-level sentiment analysis." Proceedings of the conference on human language technology and empirical methods in natural language processing. Association for Computational Linguistics, 2005.
[14] Strapparava, Carlo, and Alessandro Valitutti. "WordNet Affect: an Affective Extension of WordNet." LREC. Vol. 4. 2004.