Call Center Customer Sentiment Analysis Using ML and NLP
2023 14th International Conference on Intelligent Systems: Theories and Applications (SITA) | 979-8-3503-0821-1/23/$31.00 ©2023 IEEE | DOI: 10.1109/SITA60746.2023.10373715
Abstract—In the contemporary digital era, call centers have significantly incorporated automation through callbots, but they often lack an essential aspect of customer service: empathy. This paper explores the integration of sentiment analysis into call center operations to introduce an emotional dimension to callbot interactions. Utilizing natural language processing and machine learning, the paper examines both text-based and signal-based sentiment analysis approaches. The proposed sentiment analysis architecture encompasses data input and output interfaces, preprocessing, sentiment analysis, and a decision manager module that can, for example, escalate calls to a human agent based on sentiment analysis results. The preprocessing steps, both for text and signal analysis, are outlined in detail, along with a review of relevant sentiment analysis algorithms. Through comprehensive experiments, the study demonstrates that the integration of sentiment analysis yields promising outcomes. In text-based analysis, SVM and LSTM models consistently performed well, achieving accuracy scores of 74% and 72%, respectively. For voice-based analysis, the MLP model exhibited the highest accuracy of 0.72 when using mel spectrogram features, while the RF model outperformed others with an accuracy of 0.78 using MFCC features. These results showcase the potential of sentiment analysis in humanizing callbot interactions, thereby enhancing customer satisfaction and service efficiency.

Index Terms—Sentiment Analysis, Call Centers, Callbots, NLP, Machine Learning, Text-based Analysis, Signal-based Analysis, Emotion Detection, SVM, Naive Bayes, Random Forest, LSTM, MLP, Mel Frequency Cepstral Coefficients (MFCC), Customer Satisfaction.

I. INTRODUCTION

In the era of digitalization and artificial intelligence, customer interaction has undergone a significant transformation. Nowadays, it is common to reach out to customer service and be greeted not by a human voice, but by a callbot. These automated systems, powered by complex algorithms and sometimes even neural networks, aim to streamline and expedite the handling of customer requests, while optimizing human resources. However, despite their technical efficiency, these callbots can lack one essential thing: empathy.

Feeling and emotion play a pivotal role in human communication. A displeased, frustrated, or confused customer requires a different approach than a satisfied or neutral one. In a traditional call center, a human agent can discern these nuances through the customer's tone, choice of words, and voice inflections. But a standard callbot can't do this... at least, not without the right technological aid.

This is where sentiment analysis, combining ML techniques and NLP, comes into play. By equipping callbots with this capability, it becomes possible to detect a customer's sentiment and act accordingly, whether it's to adjust the bot's response or to transfer the call to a human agent.

In this article, we will explore how such integration could not only humanize automated interactions but also greatly enhance the customer experience in call centers. We will outline the techniques and methodologies we've adopted, focusing on two main approaches: text-based analysis and signal-based vocal analysis.

II. LITERATURE REVIEW

This article is a continuation of our work on the development of an intelligent callbot [1], [2], [3].

Sentiment analysis is a firmly established subfield of NLP that endeavors to ascertain the emotion or sentiment conveyed within a textual entity [4]. It has found manifold applications, from social media analysis [5] to finance [6], and, of course, call centers.

A. Text-based Sentiment Analysis in Call Centers

Text-based sentiment analysis techniques have been extensively studied and applied to various domains. Within the call center context, call transcriptions can be analyzed to gauge customer sentiment [7]. Common methods include those based on bag-of-words models, such as SVM [8] and logistic regression [9], as well as more advanced techniques employing neural networks like LSTM [10] and BERT [11].

B. Signal-based Sentiment Analysis

Beyond lexical content, voice tone, pacing, pauses, and other prosodic features can carry significant insights into a speaker's sentiment or emotional state [12]. Techniques like Convolutional Neural Networks (CNN) [13] and Gaussian Mixture Models (GMM) [14] have been employed to classify emotions from voice signals.
Authorized licensed use limited to: Indian Institute of Technology Palakkad. Downloaded on July 08,2025 at 14:07:23 UTC from IEEE Xplore. Restrictions apply.
C. Challenges in Emotion Detection for Call Centers

Accurate emotion detection within call centers presents unique challenges. Background noise that may hamper signal quality [15], individual variability in emotion expression [16], and the need for real-time data processing [17] all complicate the endeavor.

steps involved in assessing and responding to the feelings of customers as they interact with a callbot.
Fig. 3. Preprocessing for Voice-based Analysis
istics and sentiments, consequently augmenting the precision of predictions [22].

d) Long Short-Term Memory: LSTM is a recurrent neural network (RNN) architecture created to solve the vanishing gradient issue in sequence modeling. It can capture long-term dependencies in sequential data, making it useful for sentiment analysis tasks involving text sequences [23].

e) Multi-Layer Perceptron: The MLP is a category of artificial neural network recognized for its multiple layers of interconnected nodes. It encompasses an input layer, one or more hidden layers, and an output layer. MLPs are trained using activation functions and backpropagation, rendering them adaptable to a multitude of tasks, such as sentiment analysis [24].

5) Text Vector Representation Techniques: Leveraging appropriate text vector representation techniques is crucial for effective sentiment analysis and other NLP tasks. In this section, we delve into two widely used techniques: TF-IDF and CountVectorizer.

a) TF-IDF (Term Frequency-Inverse Document Frequency): TF-IDF is a fundamental text vectorization method that evaluates the importance of words in a document relative to their frequency across a corpus. The technique combines Term Frequency (TF), which measures the occurrence of a word within a specific document, and Inverse Document Frequency (IDF), which gauges the rarity of the word across the entire collection [25].

b) CountVectorizer: This method is a straightforward and efficient approach for transforming textual data into numerical vectors. Its functioning involves the creation of a matrix, where each row represents a document and each column represents a distinct word from the complete collection of texts. The entries in the matrix indicate the frequency at which each word appears in its respective document [26].

Both TF-IDF and CountVectorizer are valuable tools for transforming raw text into structured numerical data, enabling ML algorithms to effectively analyze and classify text documents for sentiment analysis and various other applications.

6) Audio Feature Extraction: Efficient audio feature extraction is crucial for converting raw audio signals into meaningful numerical representations that can be utilized for sentiment analysis and other audio processing tasks. In this section, we delve into two prominent techniques: the Mel Spectrogram and MFCCs.

a) Mel Spectrogram: The Mel Spectrogram is a widely utilized technique in the field of audio analysis, which provides a visual representation of the frequency characteristics of an audio signal over a period of time. Its effectiveness in capturing variations in pitch and timbre makes it extremely valuable for detecting subtle emotional nuances in speech. The Mel Spectrogram divides the audio signal into short overlapping intervals, calculates the magnitude of the Fourier Transform for each interval, and then maps the resulting spectrum onto the Mel scale, a perceptually significant frequency scale. This process generates a two-dimensional matrix where one axis represents time, another represents frequency, and the intensity of each matrix element indicates the magnitude of the corresponding frequency component. The Mel Spectrogram serves as a fundamental feature for various audio-related tasks, such as analyzing the sentiment in call center interactions [27].

b) MFCCs (Mel Frequency Cepstral Coefficients): MFCCs represent a succinct portrayal of an audio signal's spectral characteristics. Commonly employed in speech and audio processing, they capture the unique timbral and phonetic attributes of a sound. The MFCC extraction process involves multiple steps, encompassing computation of the Mel Spectrogram, followed by the application of the Discrete Cosine Transform (DCT) to decorrelate the Mel frequency elements. The resultant coefficients outline the audio signal's spectral envelope shape, effectively compressing information while retaining pertinent attributes. MFCCs find extensive use in automatic speech recognition and emotion recognition tasks, rendering them a valuable tool for extracting distinguishing features from call center voice recordings [27].

Both the Mel Spectrogram and MFCCs play a pivotal role in transforming raw audio data into meaningful features that encapsulate the auditory traits of speech. These features empower sentiment analysis models to decode emotional expressions and sentiments conveyed through voice interactions in the call center milieu.

7) Comparison Metrics: To assess the performance of the sentiment analysis models applied to call center interactions, several key metrics are employed:

a) Recall: Recall measures the model's ability to correctly identify actual positive instances, calculated as:

Recall = True Positives / (True Positives + False Negatives)

b) F1-Score: The F1-score is the harmonic mean of precision and recall and offers a balanced assessment of a model's performance. This metric takes into account both false positives and false negatives and is particularly valuable when working with imbalanced datasets. It is computed using the following formula:

F1-score = 2 × (Precision × Recall) / (Precision + Recall)

c) Precision: The precision metric determines the correctness of positive predictions by calculating the ratio of accurately predicted positive instances to all instances predicted as positive:

Precision = True Positives / (True Positives + False Positives)

d) Accuracy: The accuracy of a model's predictions is determined by the proportion of correctly classified instances out of the total number of instances:

Accuracy = (True Positives + True Negatives) / Total Instances
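As a minimal illustration of how the four metrics above follow from confusion-matrix counts (a sketch for clarity, not the evaluation code used in the experiments; the example counts are invented):

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1-score, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)                           # correctness of positive predictions
    recall = tp / (tp + fn)                              # coverage of actual positives
    f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of the two
    accuracy = (tp + tn) / (tp + fp + fn + tn)           # fraction classified correctly
    return precision, recall, f1, accuracy

# Example: 60 true positives, 20 false positives, 20 false negatives, 100 true negatives
p, r, f1, acc = classification_metrics(tp=60, fp=20, fn=20, tn=100)
print(round(p, 2), round(r, 2), round(f1, 2), round(acc, 2))  # 0.75 0.75 0.75 0.8
```

Note that on an imbalanced test set accuracy can stay high while recall collapses, which is why the four metrics are reported together.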
Authorized licensed use limited to: Indian Institute of Technology Palakkad. Downloaded on July 08,2025 at 14:07:23 UTC from IEEE Xplore. Restrictions apply.
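The CountVectorizer and TF-IDF representations described in Section 5 can be sketched in a few lines of plain Python. This is a simplified textbook formulation (IDF = log(N / document frequency), whitespace tokenization); library implementations such as scikit-learn apply additional smoothing and normalization, and the example documents are invented:

```python
import math

def count_matrix(docs):
    """One row per document, one column per vocabulary word; entries are raw counts."""
    vocab = sorted({w for d in docs for w in d.split()})
    rows = [[d.split().count(w) for w in vocab] for d in docs]
    return vocab, rows

def tfidf_matrix(docs):
    """Weight each raw count by IDF = log(N / document frequency)."""
    vocab, counts = count_matrix(docs)
    n = len(docs)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    tfidf = [[row[j] * math.log(n / df[j]) for j in range(len(vocab))] for row in counts]
    return vocab, tfidf

docs = ["the agent was helpful", "the bot was not helpful"]
vocab, counts = count_matrix(docs)
print(vocab)      # alphabetical vocabulary shared by both documents
print(counts[0])  # raw term counts for the first document
```

Words that occur in every document (here "the", "was", "helpful") receive an IDF of zero under this variant, which is exactly the down-weighting of uninformative terms that motivates TF-IDF.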
B. Voice-based Analysis
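As background for the voice-based pipeline, the audio feature extraction steps described earlier (short overlapping windows, FFT magnitude, triangular mel filterbank, then a DCT to obtain MFCCs) can be sketched with NumPy. Frame size, hop length, and filter count below are illustrative assumptions, not the settings used in the paper's experiments:

```python
import numpy as np

def hz_to_mel(f):
    # standard HTK-style mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_spectrogram(signal, sr, n_fft=512, hop=256, n_mels=26):
    """Short overlapping Hann-windowed frames -> |FFT| -> triangular mel filterbank."""
    frames = [signal[s:s + n_fft] * np.hanning(n_fft)
              for s in range(0, len(signal) - n_fft + 1, hop)]
    spectra = np.abs(np.fft.rfft(np.array(frames), axis=1))  # (n_frames, n_fft//2 + 1)

    # triangular filters spaced evenly on the mel scale
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):
            fbank[i, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[i, k] = (r - k) / max(r - c, 1)
    return spectra @ fbank.T  # (n_frames, n_mels)

def mfcc(signal, sr, n_coeff=13):
    """Log mel spectrogram followed by a DCT-II to decorrelate the mel bands."""
    m = np.log(mel_spectrogram(signal, sr) + 1e-10)
    n_mels = m.shape[1]
    k = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeff), (2 * k + 1) / (2 * n_mels)))
    return m @ dct.T  # (n_frames, n_coeff)

# a 440 Hz test tone, one second at 16 kHz
sr = 16000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
print(mel_spectrogram(tone, sr).shape, mfcc(tone, sr).shape)
```

In practice a library such as librosa would typically be used for these features; the point of the sketch is that both representations reduce each frame of raw audio to a short vector that a classifier such as MLP or RF can consume.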
the contrary, in voice-based analysis, feature representation played a pivotal role in discerning the best models: MLP and LSTM for mel spectrogram and RF for MFCCs. However, it's paramount to consider factors such as interpretability, efficiency, and real-world application needs before finalizing any model for deployment. Moreover, while standard metrics provide a robust understanding of model performance, an in-depth examination, especially regarding edge cases, remains indispensable.

V. CONCLUSION

The integration of sentiment analysis in call centers offers a promising opportunity to enhance the customer experience by introducing an emotional dimension into automated interactions. Through the use of ML and NLP techniques, it becomes possible to detect customer sentiments from text and voice signals, paving the way for tailored responses and intelligent call routing. The experiments have shown promising results, but there are challenges to overcome, including adaptation to different accents and languages, as well as ongoing improvement of sentiment analysis models. The future of automated call centers could be much more than mechanical interaction. With the addition of artificial empathy, these systems could become true partners in delivering exceptional customer service.

In future endeavors, we plan to advance our research by focusing on the seamless integration of the sentiment analysis module within the callbot architecture. This integration aims to provide real-time monitoring and analysis of customer interactions, offering valuable insights into the callbot's performance and the emotional dynamics of the conversations.

REFERENCES

[1] I. Aattouri, M. Rida, and H. Mouncif, "A comparative study of learning algorithms on a call flow entering of a call center," 2021. DOI: 10.1007/978-3-030-73103-8_36.
[2] I. Aattouri, M. Rida, and H. Mouncif, "Creation of a callbot module for automatic processing of a customer service calls," 2021. DOI: 10.1007/978-3-030-76508-8_30.
[3] I. Aattouri, H. Mouncif, and M. Rida, "Modeling of an artificial intelligence based enterprise callbot with natural language processing and machine learning algorithms," IAES International Journal of Artificial Intelligence (IJ-AI), vol. 12, no. 2, pp. 943–955, 2023. DOI: 10.11591/ijai.v12.i2.pp943-955.
[4] B. Liu, Sentiment Analysis and Opinion Mining (Synthesis Lectures on Human Language Technologies), vol. 5, pp. 1–167, 2012.
[5] A. Pak and P. Paroubek, "Twitter as a corpus for sentiment analysis and opinion mining," in LREC, 2010.
[6] B. Wuthrich et al., "Daily stock market forecast from textual web data," in IEEE International Conference on Systems, Man, and Cybernetics, 1998.
[7] L. Zhao et al., "Call center customer complaint prediction with topic modeling and sentiment analysis," in International Conference on Web Information Systems Engineering, 2012.
[8] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.
[9] D. Jurafsky and J. H. Martin, Speech and Language Processing, 2019.
[10] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[11] J. Devlin et al., "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.
[12] B. Schuller et al., "A tutorial on paralinguistics in speech and language processing," Proceedings of the IEEE, 2011.
[13] Z. Zhang et al., "Pattern recognition in speech and language processing," Electrical Engineering & Automation, 2017.
[14] D. A. Reynolds and R. C. Rose, "Robust text-independent speaker identification using Gaussian mixture speaker models," IEEE Transactions on Speech and Audio Processing, 1995.
[15] U. Zölzer, Ed., DAFX: Digital Audio Effects. John Wiley & Sons, 2011.
[16] M. El Ayadi, M. S. Kamel, and F. Karray, "Survey on speech emotion recognition: Features, classification schemes, and databases," Pattern Recognition, vol. 44, no. 3, pp. 572–587, 2011.
[17] E. Ribeiro et al., "Real-time speech emotion recognition using GPU and its applicability in neurology," Health Information Science and Systems, vol. 3, no. S1, 2015.
[18] "French Twitter sentiment analysis dataset." [Online]. Available: https://www.kaggle.com/datasets/hbaflast/french-twitter-sentiment-analysis.
[19] "Canadian French Emotional (CaFE) speech dataset." [Online]. Available: https://zenodo.org/record/1478765.
[20] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.
[21] A. Y. Ng and M. I. Jordan, "The optimality of naive bayes," Advances in Neural Information Processing Systems, vol. 14, pp. 849–856, 2001.
[22] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[23] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[24] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, no. 6088, pp. 533–536, 1986.
[25] K. S. Jones and S. Walker, "A study of automatic text classification," Journal of Documentation, vol. 53, no. 1, pp. 1–32, 1997.
[26] C. D. Manning and H. Schütze, Foundations of Statistical Natural Language Processing. The MIT Press, 1999.
[27] J. P. Bello and L. R. Rabiner, "Mel frequency cepstral coefficients for music modeling," in 2010 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2010, pp. 5542–5545.