
An Effective Sentiment Classification by using Machine Learning Algorithms and Deep Convolution Neural Network
Ramya Perumal 1, a), Karthik Balasubramanian 1, b)

1 Dept. of CSE, Sona College of Technology, Junction Main Road, Salem-636005, Tamilnadu, India
2 Dept. of EEE, Sona College of Technology, Junction Main Road, Salem-636005, Tamilnadu, India

a) Corresponding author: ramyaperumal@sonatech.ac.in

Abstract. Nowadays social media plays a significant role in a wide range of our activities: analyzing a candidate's attitude for a job, gathering opinions before buying a product, acting as a forum for exchanging thoughts about current events across domains, creating public awareness about natural calamities, educating the public about fraudulent news spread by fakers, and mobilizing young aspirants to protest against societal issues. Grasping the opinions shared by experienced people about a product, film, event, news item, or political subject is one of the most noteworthy applications for the common man, and it extends to decision-making in our day-to-day activities. Text reviews consist of an enormous, sparse, non-uniform distribution of words represented as features. Text mining is the back-end process for these applications; it includes techniques such as feature representation, sentiment classification, and feature optimization. Classifying the opinions expressed by experienced users into positive and negative reviews is a challenging process, and it is the baseline of our work. This paper addresses the processes involved in analyzing the sentiments of text reviews and accurately classifying them by polarity. In the proposed work, we particularly focus on feature representation techniques, which have a major effect on the performance of sentiment classification. We explore different feature representation models, namely the TF-IDF vectorizer, word2vec vectorizer, and GloVe vectorizer, since these word embedding models interpret words and their syntactic and semantic relationships from the corpus differently. We also employ machine learning algorithms and a deep convolution neural network to perform a comparative study of sentiment classification. Word2vec in combination with a Deep Convolution Neural Network provides an accuracy of 85.7%, precision of 84.4%, recall of 87%, and F-measure of 85.7%, outperforming the other models.
1. INTRODUCTION
In this digital era, millions of text items are generated every second on social media, and the generated data possess valuable information [16]. It is noted that the majority of this data is in textual format, and it is mostly unstructured [23]. Extracting valuable information from such data requires different techniques from the text mining process. Social media is one such platform where voluminous data is generated in different varieties at great speed [10]. People are attracted to social media for exhibiting their interests and sharing their opinions about any event or product. Social media has gained popularity and now enables several applications: even a common man can make decisions about daily activities by examining the text reviews posted by many users about a topic of interest [9]. Sentiment analysis is the technique behind these applications from which the naïve user benefits. Extracting features from the text reviews facilitates sentiment classification, so feature representation plays a vital role. There are different text representation models: the TF-IDF model, the word2vec model, and the GloVe word embedding model [4]. The traditional TF-IDF model fails to capture the semantics of words in text reviews and does not consider word order. Among the word embedding models, word2vec overcomes these limitations of the traditional model: it determines the syntactic and semantic relationships that exist between words, uses contextual words to disambiguate words in text reviews, and preserves the order of words in the text reviews, which is most important for sentiment classification. These characteristics give this model promising results in sentiment analysis.
The evolution of machine learning algorithms has enabled a wide span of applications such as pattern recognition and spam detection. We use machine learning algorithms to perform sentiment classification. A machine learning algorithm replicates the human learning process: humans learn from experience, and in the same way machines learn from labeled data [2]. In the learning phase, the machine extracts the features associated with the labeled data; during the testing phase, it predicts the labels of unseen data based on those features. Our proposed model uses machine learning algorithms such as Logistic Regression, Support Vector Machine, and Random Forest to evaluate the performance of sentiment classification, and it gives good results in terms of accuracy, precision, recall, and f1-score.
For a comparative analysis of sentiment classification, we also use a Deep Convolution Neural Network (DCNN), which has gained popularity in Computer Vision, Natural Language Processing, and Image Classification [15]. Deep learning is a subset of machine learning. A DCNN constructs a deep neural network that exhibits behavior similar to how the human brain functions. It consists of an input layer, one or more hidden layers, and an output layer; the more hidden layers, the deeper the network, which in turn captures more inherent details of the data. Our task is to categorize the text reviews as positive or negative sentiment, and the performance measures show comparatively better improvement over machine learning algorithms in sentiment classification.
Our contributions to this work are as follows. We collect the dataset from an online data repository and pre-process the texts in the corpus for further experimentation. The work studies different feature representation models, as features play a vital part in sentiment classification: we investigate the TF-IDF vectorizer, word2vec vectorizer, and GloVe vectorizer individually on our dataset. We adopt both machine learning algorithms and a deep convolution neural network to perform a comparative analysis of sentiment classification, and we evaluate the proposed system using various evaluation measures, viz. precision, recall, f1-score, and accuracy. To improve classifier performance, we use feature optimization to fine-tune the hyper-parameters of the DCNN. We also document our work in the form of research articles. Our next plan is to extend the work using advanced techniques such as transfer learning in sentiment classification, since data availability is scarce for certain classes in real-world data. We could also extend our work in another direction through an unsupervised approach, in which labeled data is not available: we would identify the positive and negative features present in the text reviews and compute an overall score for each review to assign the documents to their corresponding sentiment class.
The rest of the paper is organized as follows. Section 2 outlines the related works of our proposed model. Section 3 discusses the proposed system and its operation. Section 4 describes the experiments. Section 5 provides the results and discussion, and finally, Section 6 presents the conclusion and future enhancements.
FIGURE 1. Machine Learning vs. Deep Learning
2. RELATED WORKS
Many existing works use different feature selection and feature reduction techniques to classify sentiments. In the early days, Singular Value Decomposition, a dimensionality reduction technique, was used to capture the lowest-dimensional representation of the input data. It uses matrix decomposition to obtain highly discriminative features that are sufficient to classify reviews into their sentiment classes. Different types of machine learning algorithms are also used to improve the performance of such systems; the performance of the classifiers relies completely on optimal feature selection or feature reduction. In existing work, the spider monkey crow optimization algorithm and a Deep RNN are used for feature selection and sentiment classification respectively [11][23]. The conventional tf-idf representation model, used in combination with machine learning algorithms such as naïve Bayes, Linear SVC, and logistic regression, provides good results. The n-gram model is used to preserve word order, but it suffers from data sparsity and increases time and space complexity. Sentiment classification can be conducted at the document level, at the sentence level, or on aspect-based features [12]. There are two different approaches for classifying sentiments: supervised and unsupervised. Our proposed system employs a supervised approach to document-level sentiment classification, where class labels are provided to the system. Document-level sentiment classification depends heavily on counting the positive and negative words and computing an overall score to label the test documents. Sentence-level sentiment classification considers each sentence in the document to capture its sentiment. Aspect-level sentiment classification considers each feature and its corresponding sentiment to classify unseen documents. In an unsupervised approach, class labels are not available, and the user needs to identify the positive and negative words by using SentiWordNet, a lexical database [11]. Such a lexical database is integrated into research works to capture the adjectives and adverbs present in the text corpus and to compute an overall score, based on the number of positive and negative words, that labels the documents accordingly.
The emergence of deep learning and its outstanding performance in computer vision have also motivated its use in Natural Language Processing, where its progress is likewise remarkable [17]. There are different types of deep neural networks, notably convolution neural networks and recurrent neural networks. The advantage of a deep neural network over machine learning algorithms is that it does not require a separate feature extraction technique; by default it performs feature extraction itself. A convolution neural network consists of a convolution layer and a pooling layer: the convolution layer extracts features ranging from high level to low level, and the pooling layer diminishes the computational time and space of the network. A recurrent neural network has a recurrent layer to remember previous data, but it suffers from vanishing and exploding gradient problems and does not capture longer dependencies in texts [16]. Variants of RNN such as LSTM and GRU are used to resolve these limitations; GRU has only two gates and is computationally faster than LSTM [17]. Many research works use CNN and RNN separately, and a few works use hybrid architectures that combine CNN and LSTM. Together, CNN and LSTM improve performance, particularly in sentiment classification: CNN captures local features from the texts, while LSTM captures longer-dependency features [4][19]. The bidirectional LSTM preserves both previous and future context information during forward and backward propagation respectively [16][17]. Multi-task learning is another effective method to improve performance, utilized when a sufficiently large dataset is not available for a single task; MTCNN-LSTM provides comparatively better results than conventional classification algorithms, particularly on the evaluation metrics viz. accuracy and F1-score [19]. Compared to complex model architectures, a simple single-layered BiLSTM with a global pooling mechanism outperforms them in sentiment classification [20]. The selection of a deep neural network architecture relies completely on the problem undertaken, the characteristics of the dataset, and label correctness. The performance of a deep neural network architecture can be improved by adjusting the weights of the hyper-parameters to reduce errors.

3. PROPOSED MODEL

The purpose of our proposed model is to categorize the text reviews into positive and negative sentiments. It consists of different modules such as preprocessing, feature representation models, data partitioning, machine learning models, and a deep convolution neural network model.
FIGURE 2. Block Diagram of the Proposed Model: Text Reviews → Preprocessing (tokenization, stop word removal, lemmatization) → Feature Extraction (Word2Vec, Tf-Idf, GloVe) → Data Partitioning → Model Building (ML/DL) on training data → Trained Model applied to test data → Sentiment Classification

3.1 Pre-processing Techniques


Tokenization

The process of transforming text reviews into tokens is known as Tokenization. The text reviews are converted to lowercase, and numerals, punctuation, and white space are eliminated. Tokens are character sequences that act as uniquely identifiable semantic units [13].

Stop word Removal

Certain words, such as conjunctions and prepositions, are considered irrelevant. These words or features are known as stop words; they provide no meaning to the text. The process of eliminating them is known as Stop word removal [13].

Lemmatization

The process of transforming a word to its base form, as uniquely identified in an IR dictionary, is known as Lemmatization. Typical text reviews are high-dimensional, and lemmatization is widely used to reduce the number of dimensions in text reviews.
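
As a minimal sketch of this preprocessing pipeline, the following assumes NLTK for stop words and lemmatization; the sample review and the cleaning regular expression are illustrative choices, not details from the paper.

# Illustrative preprocessing sketch (assumes NLTK); not the paper's exact code.
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords")
nltk.download("wordnet")

def preprocess(review):
    review = review.lower()                           # lowercase conversion
    review = re.sub(r"[^a-z\s]", " ", review)         # drop numerals and punctuation
    tokens = review.split()                           # tokenization
    stops = set(stopwords.words("english"))
    tokens = [t for t in tokens if t not in stops]    # stop word removal
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(t) for t in tokens]  # lemmatization

print(preprocess("The movie was surprisingly good, 10/10!"))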

3.2 Feature Representation Models


Tf-idf Vectorizer Model

This is the conventional model in which the words in the text corpus are represented as columns and each document corresponds to a row, constructing a term-document matrix. Each cell in the matrix holds the tf-idf value of the respective word in that text document. The term frequency represents the number of times a particular word occurs in the text document, while the inverse document frequency down-weights words that occur across many text documents in the corpus [8]. A term or feature is considered important in a text document if it has a high tf-idf value [1][13]; tf-idf thus acts as a weighting measure to identify the core features of the text documents in a corpus. The drawbacks of this conventional model are its high dimensionality and its failure to capture the semantic relationships that exist between words [14]. It also ignores word order. These are the limitations of the conventional model.
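
A short sketch of this vectorizer, assuming scikit-learn; the two toy documents stand in for the review corpus.

# Illustrative tf-idf sketch (assumes scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the film was great", "the film was boring"]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)         # term-document matrix, one row per document
print(vectorizer.get_feature_names_out())  # words represented as columns
print(X.toarray())                         # each cell holds a word's tf-idf weight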

Word2Vec Model

This is a text representation model that preserves the contextual information present in the text reviews. The words in the vocabulary are given distributed vector representations, and semantically related words are mapped to similar vectors. It is a prediction-based word embedding model with two variants: the Continuous Bag of Words (CBOW) and Skip-gram word embedding models [14]. Our proposed system uses the gensim word2vec vectorizer, which accepts a collection of text documents as input and provides a vector representation of each word in the text documents of the corpus [18]. The working principle of the word2vec model is that it constructs a neural network comprising an input layer, a hidden layer, and an output layer. The n-dimensional text documents are fed into the input layer. In the hidden layer, two network parameters, Wcontext and Wword, are learned, adjusting their weights so that semantically similar words receive similar vector representations. In this way, the latent relationships that exist between words in the corpus are revealed through the co-occurrence matrix generated in the hidden layer, which maintains contextual information for each word in the vocabulary. Its columns (Wcontext) and rows (Wword) both correspond to the unique words of the vocabulary, and each cell captures the co-occurrence statistics of a word pair. The output layer generates the vector representation of each word in the vocabulary.
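
Since the paper states that the gensim word2vec vectorizer is used, the following sketches that step; the toy sentences and the 100-dimensional vector size are illustrative assumptions.

# Illustrative gensim word2vec sketch; corpus and dimensions are assumptions.
from gensim.models import Word2Vec

sentences = [["the", "film", "was", "great"],
             ["the", "movie", "was", "excellent"]]
model = Word2Vec(sentences, vector_size=100, window=5,
                 min_count=1, sg=0)           # sg=0 selects CBOW, sg=1 Skip-gram
print(model.wv["film"].shape)                 # dense vector for one vocabulary word
print(model.wv.most_similar("film", topn=2))  # semantically close words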

Glove Vector Model

Count-based models rely on global co-occurrence counts from the entire corpus to compute word representations, whereas prediction-based models learn word representations from local context windows. GloVe is a hybrid model that combines the count-based and prediction-based approaches. It is a pre-trained embedding model in which words are mapped to their associated vector representations; the GloVe corpus consists of 3.3 billion words and their vector representations, and words not present in the pre-trained corpus are assigned the value 0. It represents words according to their co-occurrence statistics in the dataset [4]. Our proposed model provides the text documents as input to the GloVe vectorizer and obtains a dense vector representation of each word in the text documents. We employ the 100-dimensional pre-trained GloVe word embedding model. An embedding matrix of dimension v*d is constructed, where v corresponds to the number of unique words in the dataset and d is the dimension of the dense vectors [20]. GloVe uses matrix factorization to generate a lower-dimensional, dense representation of the words in a text corpus, and it provides global statistical information about the corpus.
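
The following sketches how the v*d embedding matrix can be built from a 100-dimensional pre-trained GloVe file; the file name and the placeholder vocabulary are assumptions, not details from the paper.

# Illustrative GloVe embedding-matrix sketch; file name is an assumption.
import numpy as np

embeddings = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        parts = line.split()
        embeddings[parts[0]] = np.asarray(parts[1:], dtype="float32")

word_index = {"film": 1, "great": 2}         # placeholder word-to-index vocabulary
d = 100
matrix = np.zeros((len(word_index) + 1, d))  # words absent from GloVe stay zero
for word, i in word_index.items():
    if word in embeddings:
        matrix[i] = embeddings[word]
print(matrix.shape)                          # the v*d embedding matrix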

3.3 Data Partitioning

Stratified ten-fold cross-validation is used, in which the data are randomly shuffled and split into ten folds. At each iteration, the i-th fold is held out as test data while the remaining folds are used as training data. This approach avoids bias, improves the performance of the classifiers, and preserves a uniform data distribution: stratified sampling ensures that samples are drawn evenly from all classes of the dataset.
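
A minimal sketch of this partitioning step, assuming scikit-learn; X and y are dummy placeholders for the feature vectors and sentiment labels.

# Illustrative stratified ten-fold partitioning (assumes scikit-learn).
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.random.rand(100, 5)          # dummy feature vectors
y = np.array([0, 1] * 50)           # balanced positive/negative labels

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # each fold preserves the class proportions of the full dataset
    print(f"fold {fold}: {len(train_idx)} train / {len(test_idx)} test samples")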

3.4 Machine Learning Models


In supervised learning, the text reviews, with their classes defined as positive or negative sentiment, are given as input to the proposed model. The proposed model uses both machine learning and deep learning algorithms to perform sentiment classification. As the name suggests, machine learning algorithms learn the data by experience [6]: the performance of a classifier is improved by providing more labeled data during training, and the model predicts the class of unseen data during testing.

Linear SVC classifier

The Linear SVC classifier uses a hyper-plane to separate the classes of high-dimensional data [10]. Since our proposed system performs binary sentiment classification, it uses a decision boundary line to classify positive and negative reviews [23]. The margin separating the two classes is maximized to obtain the optimal decision boundary from all possible decision lines:

w^T x_i + b >= +1 for y_i = +1 (1)

w^T x_i + b <= -1 for y_i = -1

Logistic Regression Classifier


Despite its name, the Logistic Regression classifier is used for classification. It uses a sigmoid function to compute the log-odds ratio and a maximum likelihood function for the given data belonging to a class. The sigmoid function's value ranges from 0 to 1. Say, for example, the threshold is set at 50%: a text review scored at 49% belongs to the negative class, while one at 51% belongs to the positive class. Such a hard cut-off is too harsh on its own, so we need a smoothing function that restates the same decision as how probable it is that a text document belongs to the corresponding class. The outcome of the resultant function Z (the dependent variable) is as follows:

Z = WX + B (2)

where X denotes the independent (input) variables and B is the bias.

hΘ(x) = Sigmoid(Z) (3)

Sigmoid(z) = 1 / (1 + e^(-z))

The underlying hypothesis function is the sigmoid function. Z lies between negative infinity and positive infinity: as Z goes to positive infinity, the predicted output Y approaches 1, and as Z goes to negative infinity, the predicted output approaches 0.

Random Forest Classifier

The Random Forest classifier is a supervised technique used for both regression and classification. As the name implies, a random forest consists of many decision trees [12]; by default, it constructs 100 decision trees from the given randomized samples. It is an ensemble-based learning method that performs better than a single decision tree, as it lessens over-fitting by averaging over the resultant trees considered for the given task [7]. It constructs a collection of decision trees on various subsets of the given dataset and predicts the class of a data point by majority voting of the decision trees. The selection of the attribute for splitting the data is based on measures such as information gain and entropy, given by

Information Gain (T, X) = Entropy(T) - Entropy(T, X) (4)

where T is the output variable, X is the feature to be split on, and Entropy(T, X) is calculated after the data are split on feature X.
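
The following sketches the comparative setup over the three classifiers, assuming scikit-learn and pre-computed feature vectors; the dummy data and hyper-parameters are illustrative assumptions.

# Illustrative classifier comparison (assumes scikit-learn); dummy features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

X = np.random.rand(100, 50)              # stand-in for vectorized reviews
y = np.array([0, 1] * 50)                # stand-in sentiment labels

models = {
    "Linear SVC": LinearSVC(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100),  # 100 trees
}
for name, clf in models.items():
    scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")  # stratified 10-fold
    print(f"{name}: mean accuracy {scores.mean():.3f}")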

3.5 Deep Learning Model

The Deep Convolution Neural Network owes its popularity to the fact that it simulates the working principle of the human brain. The deep convolution neural network and its operation are explained as follows.

Deep Convolution Neural Network

The Convolution Neural Network is a deep neural network widely used for image classification. It comprises a convolution layer, a pooling layer, and a fully connected layer. Since our proposed task is sentiment classification, which focuses only on texts, a 1-dimensional single-layered convolution layer is used. The word vectors generated by the word2vec, GloVe, or tf-idf vectorizer are provided as input to the convolution layer, which acts as a feature extractor that captures features ranging from a higher level to a lower level [5]. It uses a filter of size 5 that slides over the input matrix to extract the core features for text processing. A sentence of length n is given by

x1:n = x1 ⊕ x2 ⊕ · · · ⊕ xn (5)

where ⊕ is the concatenation operation and xi is the i-th word in the text document. The layer accepts input in matrix form, where each row is the distributed representation of a word and the number of columns is the length of the word vectors in the document. A feature ci is generated by the convolution operation applied to h consecutive words with a filter,

ci = f(w · xi:i+h-1 + b) (6)

where b is the bias term, and w and f(·) are the weights of the convolution filter and a non-linear activation function respectively. The resulting feature map is c = [c1, c2, ..., cn-h+1] [17].

The convolution layer is immediately followed by a max-pooling layer, which reduces the dimension of the feature vectors by eliminating redundant and irrelevant features: for each feature map, the max-pooling operation keeps the maximum value. With a filter size of 5, 128 feature maps are produced; finally, a fully connected layer is used as the output layer for classifying the text reviews. The non-linear activation function ReLU is used in the hidden layers to trigger the neurons in the network: its output increases as the input increases but is zero on the negative side. Dropout layers are employed to reduce bias and over-fitting [3]; dropout is typically placed after the max-pooling layer to leave out certain hidden units during the operation of the network and enhance its results. Batch normalization is used to regularize the network and lessen its complexity. Finally, we use the sigmoid activation function, whose output ranges from 0 to 1, with 0 denoting negative sentiment and 1 denoting positive sentiment.

FIGURE 3. Deep Convolution Neural Network

The following presents the structure of the CNN model. It consists of a single convolution layer followed by a max-pooling layer and a dense layer.

TABLE 1. Structure of the CNN model

Model: "sequential_1"
Layer (type)                                  Output Shape        Param #
embedding_1 (Embedding)                       (None, 1171, 128)   5275904
conv1d_1 (Conv1D)                             (None, 1167, 128)   82048
batch_normalization_1 (BatchNormalization)    (None, 1167, 128)   512
global_max_pooling1d_1 (GlobalMaxPooling1D)   (None, 128)         0
dropout_2 (Dropout)                           (None, 128)         0
dense_2 (Dense)                               (None, 128)         16512
dropout_3 (Dropout)                           (None, 128)         0
dense_3 (Dense)                               (None, 1)           129

Total params: 5,375,105
Trainable params: 5,374,849
Non-trainable params: 256
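
A hedged sketch of this architecture in Keras is given below; the vocabulary size and sequence length are back-calculated from the parameter counts in Table 1, and the dropout rates are illustrative assumptions since the paper does not report them.

# Illustrative Keras sketch of the Table 1 architecture; dropout rates assumed.
from tensorflow.keras import layers, models

vocab_size, max_len, embed_dim = 41218, 1171, 128   # implied by Table 1 shapes

model = models.Sequential([
    layers.Embedding(vocab_size, embed_dim, input_length=max_len),
    layers.Conv1D(128, 5, activation="relu"),  # filter size 5, 128 feature maps
    layers.BatchNormalization(),               # regularizes the network
    layers.GlobalMaxPooling1D(),               # max value of each feature map
    layers.Dropout(0.5),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),     # 0 = negative, 1 = positive
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()                                # reproduces the layout of Table 1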

3.6 Feature Optimization

Various feature optimization techniques are implemented in the proposed work; they are an equally essential process for improving the performance of the classifiers.

Data distribution

We employ a stratified sampling technique to maintain a uniform data distribution for each class. It helps avoid bias, which in turn improves the performance of the classifiers.

Hyper-parameter tuning

Various parameters are present in the model architecture. The weights are updated after each batch iteration to drive the network toward minimizing errors during back-propagation. In our proposed system, we concentrate on fine-tuning the learning rate parameter to make the network stable and free of errors. The learning rate acts as a controlling parameter that determines how rapidly the network updates its parameters: if its value is small, learning is slow but converges smoothly; if its value is high, learning is faster but may not converge. Initially, we adopt a trial-and-error approach to find the optimal learning rate, randomly picking learning rate values and analyzing the results. Later, we manually sweep learning rates from 10^-1 to 10^-7 to find the optimal value for the given dataset. We find that a learning rate of 10^-4 shows improvement in the performance of the classifier for the given dataset.

We then examine different learning rate methods that reduce the learning rate as training progresses, including time-based decay, step decay, and exponential decay. Initially, we set a constant learning rate as the default method, with momentum and decay rate set to 0 (momentum tends to yield faster convergence); this serves as a baseline for determining the optimal learning rate. In time-based decay, momentum is set to 0.8, and the decay rate is calculated from the learning rate and the number of epochs as follows:

decay rate = learning rate / epochs (7)

Step decay halves the learning rate every 10 epochs. In exponential decay, the learning rate falls off in exponential order and is given by

learning rate = initial learning rate * exp(-k*t) (8)

where t is the number of iterations and k is the order of magnitude 1, 2, 3, ...

We explore these learning rate methods to ensure our initialization of the learning rate is the optimal value for attaining performance improvements. ADAM, an adaptive learning-rate optimizer, is used in our work; it is very efficient compared to other optimizers.
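
The following sketches these decay schedules as Keras callbacks; the initial rate of 1e-4 follows the tuning result above, while the decay constants are the illustrative values described in the text.

# Illustrative learning-rate schedules (assumes Keras callbacks).
import math
from tensorflow.keras.callbacks import LearningRateScheduler

initial_lr, epochs = 1e-4, 25   # tuned learning rate and epoch budget from the text
K_DECAY = 1                     # order of magnitude k for exponential decay

def step_decay(epoch):
    return initial_lr * (0.5 ** (epoch // 10))      # halve the rate every 10 epochs

def exp_decay(epoch):
    return initial_lr * math.exp(-K_DECAY * epoch)  # lr = lr0 * exp(-k*t)

# attach one schedule during training, e.g.:
# model.fit(X, y, epochs=epochs, callbacks=[LearningRateScheduler(step_decay)])
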
Loss function

In deep learning, the selection of a suitable loss function is an important factor in achieving performance improvements. Binary cross-entropy is the suitable choice for binary classification.

Number of epochs

The number of epochs represents the number of times the training data is fed to the network during training. It is another significant parameter, because over-fitting or under-fitting can occur if it is not chosen properly. With too many epochs, the validation accuracy starts decreasing even while the training accuracy is still increasing; this is known as the over-fitting problem. Under-fitting occurs when the network does not properly learn the data during training. Our proposed model uses 25 epochs, within which it achieves good performance.

Batch-size

Mini-batching splits the dataset into subsets that are given to the network; the network updates its parameters after each subset. We set the batch size to 25 in our proposed system, which is sufficient to obtain good results. If the batch size is small, training may be slow; if it is large, it requires more memory [20].

Early stopping technique

This technique reduces the computational time and space complexity of the employed model architecture. If there is no improvement in the model's performance measure over several epochs, it stops the execution; this prevents the network from over-fitting. During training, the validation loss keeps decreasing until, at a certain point, it starts to increase even though the training loss is still falling, which means the network has begun to over-fit. Early stopping avoids this by determining when the validation loss reaches its minimal value and halting the execution of the network thereafter.
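
A minimal sketch of this technique with a Keras callback; the patience value is an illustrative assumption, as the paper does not report one.

# Illustrative early-stopping callback (assumes Keras); patience is assumed.
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor="val_loss",          # watch validation loss
                           patience=3,                  # epochs with no improvement
                           restore_best_weights=True)   # roll back to the best epoch
# model.fit(X_train, y_train, validation_split=0.1, epochs=25,
#           batch_size=25, callbacks=[early_stop])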

4. EXPERIMENTS

4.1 Dataset Collection

Our proposed work uses the Movie Review dataset, collected from the Kaggle online data repository. It includes two classes, positive and negative, each consisting of 12,500 text reviews, for 25,000 text reviews in total.

4.2 Experiment Setup

All the computational work is conducted on an AMD Ryzen 3 3250U processor with Radeon Graphics at 2.60 GHz and 8 GB RAM, running 64-bit Windows. The proposed system is implemented in Python in the Google Colab environment.

4.3 Evaluation measures


Accuracy

Accuracy is the ratio of the number of text reviews correctly categorized by the classifier to the total number of text reviews considered.

Accuracy = (TP+TN) / (TP+TN+FP+FN) (9)

TP: positive reviews correctly identified as positive
TN: negative reviews correctly identified as negative
FP: negative reviews incorrectly identified as positive
FN: positive reviews incorrectly identified as negative

Precision

Precision is a measure of the capability of the classifier not to label a negative sample as positive.

Precision = TP / (TP+FP) (10)

Recall

Recall is a measure of the capability of the classifier to find all the positive samples out of the total positive samples.

Recall = TP / (TP+FN) (11)


F-measure

F-measure is the harmonic mean that combines precision and recall into a single measure, capturing both properties [4][20].

F-Measure = (2 * Precision * Recall) / (Precision + Recall) (12)
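
These four measures can be computed directly, for instance with scikit-learn; the label arrays below are illustrative stand-ins for the true and predicted sentiments.

# Illustrative evaluation sketch (assumes scikit-learn); labels are stand-ins.
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_test = [1, 0, 1, 1, 0, 1]   # 1 = positive review, 0 = negative review
y_pred = [1, 0, 0, 1, 0, 1]

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F-measure:", f1_score(y_test, y_pred))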

5. RESULTS AND DISCUSSION


The proposed model is evaluated on a movie review dataset whose text reviews are to be classified into positive and negative sentiments. The performance of the proposed model and the corresponding plots are given below. We investigate different feature representation models, namely the tf-idf vectorizer, word2vec vectorizer, and GloVe vectorizer, in combination with a single-layered convolution neural network, fine-tuning the learning rate to obtain remarkable results. Among these architectures, CNN+word2vec provides the best results, with a precision of 84.4%, recall of 87%, accuracy of 85.7%, and f-measure of 85.7%. The traditional tf-idf vectorizer and the pre-trained word embedding model in combination with CNN show comparatively lower performance. We also tune the learning rate manually through random search, sweeping it over the range 10^-1 to 10^-7 for the given dataset. We find that a learning rate of 10^-4 provides a good improvement in performance compared to 10^-3, while the remaining values show poor performance. The GloVe vector, a word embedding model, in combination with CNN shows good performance, with an accuracy of 83.2%, precision of 81.4%, recall of 85.2%, and F-measure of 83.3% at a learning rate of 10^-3. The CNN in combination with the tf-idf vectorizer shows poor performance in all aspects at a learning rate of 10^-6.

TABLE 2. Performance measures of the proposed model

Models                                         Precision   Recall   Accuracy   F-measure
CNN+word2vec (Proposed Model)                  84.4        87       85.7       85.7
CNN+tfidf vectorizer                           47.4        71       53         54.7
CNN+glove (pre-trained word embedding model)   81.4        85.2     83.2       83.3
FIGURE 4. Performance measure of the proposed model

FIGURE 5. Training and validation accuracy versus epochs

FIGURE 6. Training and validation loss versus epochs

We perform a comparative analysis of the proposed model against machine learning models: a random forest classifier, a linear SVC classifier, and a logistic regression classifier. Our proposed model outperforms the machine learning algorithms in terms of precision, recall, accuracy, and f-measure.

TABLE 3. Performance measures of the machine learning algorithms

Machine Learning Models                                                  Precision   Recall   Accuracy   F1-Score
RF Classifier + word2vec with stratified sampling data                   82.5        83       81.2       81.5
Linear SVC Classifier + word2vec with stratified sampling data           84          84       83.7       84
LR Classifier + word2vec with stratified sampling data                   83.5        83.5     83.6       84
RF Classifier + tfidf vectorizer with stratified sampling data           82.5        82.5     82.4       82.5
Linear SVC Classifier + tfidf vectorizer with stratified sampling data   84.3        85.1     84.5       84.7
LR Classifier + tfidf vectorizer with stratified sampling data           84.3        85.5     85.4       85
RF Classifier + glove vectorizer with stratified sampling data           74.5        74.5     74.4       74.5
Linear SVC Classifier + glove vectorizer with stratified sampling data   75.5        76       75.9       75.7
LR Classifier + glove vectorizer with stratified sampling data           75.5        76       75.7       75.7

FIGURE 7. Performance measure of machine learning models

We briefly discuss the potential of our proposed model. We adopt different feature representation models: the tf-idf vectorizer, the word2vec vectorizer, and the GloVe vectorizer. The traditional tf-idf vectorizer relies entirely on the occurrences of features within a document and across the documents in a collection. The word2vec vectorizer depends on the co-occurrences of words, by which it captures the syntactic and semantic relationships of words in text documents. GloVe is a hybrid model that combines the count-based and prediction-based approaches. We explore the potential of these three feature representation techniques in our work; word2vec outperforms the other two.

We employ both machine learning models and a deep convolution neural network in our research work. The deep convolution neural network performs well compared to the machine learning algorithms. The learning rate hyper-parameter is adjusted to determine the optimal value and obtain good results. The advantage of our proposed system is its simplicity: it does not require any separate feature extraction technique, since by default the convolution neural network extracts features ranging from high level to low level. It is a single-layered convolution neural network and does not require any hybridization, nor does it need advanced techniques such as transfer learning. The pre-trained GloVe word embedding model is also compared with our proposed model and gives a remarkable performance. The performance of the proposed model relies heavily on the problem undertaken, the characteristics of the dataset, label correctness, and the selection of the model architecture [17].

6. CONCLUSION
Social media plays a vital part in our day-to-day activities when taking crucial decisions. It is an online forum where people share their thoughts, experiences, and grievances about any subject matter, and it provides valuable information for decision-making. Particularly in the e-commerce domain, both vendors and customers benefit from the reviews shared by the public: vendors can identify the pros and cons of their manufactured products, while customers can decide whether or not to buy a product based on the comments shared in the forum [17]. Its applications are not restricted to this scope alone; they broaden depending on the perspective of each domain. A wide variety of data is generated at great scale and high speed. Our proposed model, focusing on text reviews that carry both positive and negative comments, helps the common man understand, analyze, and quickly decide. It is evaluated on a movie review dataset consisting of 12,500 positive reviews and 12,500 negative reviews. The word embedding model word2vec provides the semantic and syntactic relationships of the texts and also preserves word order. The system uses both machine learning algorithms and a deep convolution neural network to perform the sentiment classification task, and the performance of both is evaluated via accuracy, precision, recall, and f1-score. Among them, the Deep Convolution Neural Network gives comparatively better results than the other algorithms, providing an accuracy of 85.7%, precision of 84.4%, recall of 87%, and F1-score of 85.7%.

In the future, there are two possible directions to enhance our work. The first is to extend the work by performing sentiment analysis through an unsupervised approach involving a greater amount of data. In this unsupervised approach, labeled data is not available in the real-world scenario, so terms or features become the most significant factor: positive and negative features identified in the text reviews are used to compute an overall sentiment score, after which different machine learning and deep learning algorithms can be explored to classify the text reviews into their classes. The second direction is supervised learning where enough data are available overall but certain classes are scarce in online data repositories; advanced techniques such as transfer learning can then be incorporated, with pre-trained word embedding models helping to improve the performance of our proposed system in classifying the sentiments.

REFERENCES

1. Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze, "An Introduction to Information Retrieval" (Cambridge University Press, England).
2. Tom M. Mitchell, "Machine Learning" (McGraw Hill Science), ISBN 0070428077.
3. M. Alhawarat and Ahmad O. Aseeri, "A Superior Arabic Text Categorization Deep Model (SATCDM)", IEEE Access, Vol. 8, pp. 24653-24661, 2020, DOI: 10.1109/ACCESS.2020.2970504.
4. Mehmet Umut Salur and Ilhan Aydin, "A Novel Hybrid Deep Learning Model for Sentiment Classification", IEEE Access, March 2020, DOI: 10.1109/ACCESS.2020.2982538.
5. Muhammad Pervez Akhter, Zheng Jiangbin, Irfan Raza Naqvi, Mohammed Abdelmajeed, Atif Mehmood, and Muhammad Tariq Sadiq, "Document-Level Text Classification Using Single-Layer Multisize Filters Convolutional Neural Network", IEEE Access, 2020, DOI: 10.1109/ACCESS.2020.2976744.
6. Christoph Tauchert, Marco Bender, Neda Mesbah, "Towards an Integrative Approach for Automated Literature Reviews Using Machine Learning", Proceedings of the 53rd Hawaii International Conference on System Sciences, 2020.
7. Jiun-Yu Wu, Yi-Cheng Hsiao, and Mei-Wen Nian, "Using supervised machine learning on large-scale online forums to classify course-related Facebook messages in predicting learning achievement within the personal learning environment", Taylor & Francis Group, Sep 2018, DOI: 10.1080/10494820.2018.1515085.
8. Rungroj Maipradit, Hideki Hata, Kenichi Matsumoto, "Sentiment Classification using N-gram IDF and Automated Machine Learning", pp. 1-4, Apr 2019, DOI: 10.1109/MS.2019.2919573.
9. Ming Zhang, Vasile Palade, Yan Wang, Zhicheng Ji, "Attention-based word embeddings using Artificial Bee Colony algorithm for aspect-level sentiment classification", Information Sciences, Vol. 545, Sep 2020, DOI: 10.1016/j.ins.2020.09.038.
10. Hafiz Muhammad Ahmed, Mazhar Javed Awan, Nabeel Sabir Khan, "Sentiment Analysis of Online Food Reviews using Big Data Analytics", Apr 2021, DOI: 10.17051/ilkonline.2021.02.93.
11. Aarti Chugh, Vivek Kumar Sharma, Sandeep Kumar, Anand Nayyar, Basit Qureshi, Manjot Kaur Bhatia, and Charu Jain, "Spider Monkey Crow Optimization Algorithm with Deep Learning for Sentiment Classification and Information Retrieval", IEEE Access, Feb 2021, DOI: 10.1109/ACCESS.2021.3055507.
12. Ishrat Nazeer, Mamoon Rashid, Sachin Kumar Gupta, Abhishek Kumar, "Use of Novel Ensemble Machine Learning Approach for Social Media Sentiment Analysis", IGI Global, Oct 2020, DOI: 10.4018/978-1-7998-4718-2.ch002.
13. Mowafy M., Rezk A., and El-bakry H. M., "An Efficient Classification Model for Unstructured Text Document", American Journal of Computer Science and Information Technology, ISSN 2349-3917, Feb 20, 2018, DOI: 10.21767/2349-3917.100016.
14. Eunjeong L. Park, Sungzoon Cho, and Pilsung Kang, "Supervised Paragraph Vector: Distributed Representations of Words, Documents and Class Labels", IEEE Access, Vol. 7, February 27, 2019, DOI: 10.1109/ACCESS.2019.2901933.
15. Shervin Minaee, Nal Kalchbrenner, Erik Cambria, Narjes Nikzad, Meysam Chenaghlu, and Jianfeng Gao, "Deep Learning-based Text Classification: A Comprehensive Review", April 2021, DOI: 10.1145/3439726.
16. Sakirin Tam, Rachid Ben Said, and Ö. Özgür Tanrıöver, "A ConvBiLSTM Deep Learning Model-based Approach for Twitter Sentiment Classification", IEEE Access, March 9, 2021, DOI: 10.1109/ACCESS.2021.3064830.
17. Seungwan Seo, Czangyeob Kim, Haedong Kim, Kyounghyun Mo, and Pilsung Kang, "Comparative Study of Deep Learning-based Sentiment Classification", IEEE Access, Jan 14, 2020, DOI: 10.1109/ACCESS.2019.2963426.
18. Mehmet Umut Salur and Ilhan Aydin, "A Novel Hybrid Deep Learning Model for Sentiment Classification", IEEE Access, Apr 16, 2020, DOI: 10.1109/ACCESS.2020.2982538.
19. Ning Jin, Jiaxian Wu, Xiang Ma, Ke Yan, and Yuchang Mo, "Multi-task Learning Model based on Multi-scale CNN and LSTM for Sentiment Classification", IEEE Access, May 17, 2020, DOI: 10.1109/ACCESS.2020.2989428.
20. Zabit Hameed and Begonya Garcia-Zapirain, "Sentiment Classification using a Single-Layered BiLSTM Model", IEEE Access, May 1, 2020, DOI: 10.1109/ACCESS.2020.2988550.
21. Rajeswari C., Sathiyabhama B., Devendiran S., Manivannan K., "Bearing fault diagnosis using wavelet packet transform, hybrid PSO and support vector machine", Procedia Engineering, Vol. 97, pp. 1772-1783, 2014, DOI: 10.1016/j.proeng.2014.12.329.
22. Rajeswari C., Sathiyabhama B., Devendiran S., Manivannan K., "A Gear fault identification using wavelet transform, rough set based GA, ANN and C4.5 algorithm", Procedia Engineering, pp. 338-344, 2014, DOI: 10.1016/j.proeng.2014.12.337.
23. Wahyu Calvin Frans Mariel, Siti Mariyah, and Setia Pramana, "Sentiment analysis: A Comparison of Deep Learning Neural Network Algorithm with SVM and Naïve Bayes for Indonesian text", International Conference on Data and Information Science, April 2018, DOI: 10.1088/1742-6596/971/1/012049.
