SENTIPUBLIKO Sentiment Analysis of Repos
International Journal of Advanced Trends in Computer Science and Engineering
Volume 9 No.2, March - April 2020
Adomar L. Ilao et al., International Journal of Advanced Trends in Computer Science and Engineering, 9(2), March - April 2020, 1744 – 1751
Available Online at http://www.warse.org/IJATCSE/static/pdf/file/ijatcse128922020.pdf
https://doi.org/10.30534/ijatcse/2020/128922020
ABSTRACT

Jejemon has become a form of communication dialect. It is a form of expression used by a particular social group known as Jejemon. However, Jejemon expressions come in different formats. These range from basic forms, such as changing letters to numbers, changing lowercase letters to uppercase letters, or inserting shortcut texts, to more complicated formats. This paper aims to classify Jejemon tweets as having positive, negative or neutral sentiment through sentiment analysis techniques. The experiment included three steps: translation of Jejemon-formatted tweets, reduction of sentiment scores on repost tweets, and sentiment classification. Analysis of the experiment results involved a Paired T-Test, a confusion matrix, precision, recall, f-score and accuracy. Using the cosine similarity algorithm, the translated Jejemon tweets were found to be 78.5% similar to the actual messages. Furthermore, using the Hybrid Algorithm, the Paired T-Test showed no significant difference between the new sentiment scores from translated expressions and the actual sentiment scores. The sentiment analysis metrics showed acceptable values: precision 71%, recall 76%, f-score 71% and accuracy 73%.

Key words: Sentiment Analysis, Social Media, Jejemon, Hybrid Algorithm, Cosine Similarity, Dictionary Substitution Approach, Tweet

1. INTRODUCTION

Human language is constantly changing, across time and across social groups. These changes draw negative perceptions from people who are unable to cope with new vocabulary or new visual representations. A possible downside of these new trends is miscommunication between social groups.

In the Philippines, several languages were invented to cater to specific social groups. Millennials and members of the third sex are predominantly the most socially active groups in terms of modern language expression [1]. They developed new language semantics such as Jejemon (p30pL3, o+h3r, pl4c3s) [2] and Bekimon (Aglipay, Chiquito, Churchill). However, the most controversial language trend in 2016 was Jejemon [3].

English teachers have observed that Jejemon language users suffer from weak speaking ability and mangled word spelling. These users average 12 text messages every day. Each Jejemon message is normally composed of symbols and phonetics. Furthermore, technological communication media focused on social connection, such as social media, created new Jejemon followers on Twitter. This is in addition to other Jejemon practices such as JEJETYPING, wearing a jeje-hat and posting jeje-photos online [4].

The overwhelming success of the Jejemon language led to its award as Word of the Year in 2010 by the Filipinas Institute of Translation Incorporated. The award was based on its significant socio-cultural, political, social and economic impact on Filipino life [5].

As an influential millennial language, Jejemon expression is a result of self-expression designed to cope with the limited space available in text messages and on Twitter [5]. Beyond translation, however, it is important to understand the actual feeling behind the person's message or opinion.

One's opinion can be classified through sentiment analysis, which labels an opinion as positive, negative or neutral via a polarity score. Another factor to consider is the impact of reposting, known on Twitter as retweeting, on document-level sentiment classification.

This study evaluates the accuracy and precision of the Hybrid Algorithm developed by Ilao and Fajardo [6] when applied to Jejemon tweets. The study also integrates a string similarity algorithm to reduce the sentiment polarity score of repost or retweet messages. This should give a better understanding of the side effects of repost messages in document-level sentiment evaluation.

2. REVIEW OF RELATED LITERATURES

2.1 Jejemon Word Structure

The Jejemon language became popular in 2010 because of the limited space available for text messages and tweets [5]. It is primarily composed of an alphabet known as Jejebet.
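The sketch below makes the letter-to-symbol substitution concrete by normalizing the three Jejemon examples cited in the Introduction. It assumes a hypothetical character map built only from substitutions mentioned or implied in this paper, plus case folding:

- "3" for e and "0" for o (from p30pL3)
- "+" for t (from o+h3r)
- "4" or "@" for a
- "1" or "!" for i

It is not the paper's Dictionary Substitution Approach. It also cannot resolve ambiguous symbols such as "8", which may stand for "te" or "ate".

```python
# Hypothetical Jejebet-to-letter map, built only from substitutions mentioned
# or implied in this paper; a real Jejebet inventory is larger and ambiguous.
JEJEBET = {
    "3": "e",  # p30pL3 -> people
    "0": "o",  # p30pL3 -> people
    "+": "t",  # o+h3r -> other
    "4": "a",  # pl4c3s -> places
    "@": "a",
    "1": "i",
    "!": "i",
}


def normalize_jejemon(token: str) -> str:
    """Replace Jejebet symbols with letters, then fold mixed case to lowercase."""
    return "".join(JEJEBET.get(ch, ch) for ch in token).lower()


if __name__ == "__main__":
    for word in ["p30pL3", "o+h3r", "pl4c3s"]:
        print(word, "->", normalize_jejemon(word))
```

A one-to-one character map only covers simple substitutions. Multi-letter readings such as "8" for "te" or "ate" need a word-level dictionary lookup with context.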
It uses a sentiment lexicon to assign a polarity value. A lexicon is composed of words or phrases, each labeled as having either a positive or a negative polarity orientation [12]. There are three strategies for building a sentiment lexicon: hand-crafted elaboration, automatic expansion from an initial list of seed words, and a corpus-based approach.

A comparative study [13] of lexicon-based review classification compared AFINN, General Inquirer, Micro-WNOP, Opinion Lexicon, SentiSense, SentiWordNet, Subjectivity Lexicon and WordNet-Affect. It reported 78% accuracy for SentiWordNet, which uses the WordNet corpus.

The numbers of positive, negative and neutral tweets determine the overall sentiment of the entire population of collected political tweets. The hybrid approach was tested on different political datasets. It yielded 88.33% accuracy, better than SentiWordNet and the VADER algorithm.

3. METHODOLOGY

The study was designed to implement the sentiment analysis approach illustrated in Figure 3.
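The lexicon-based scoring and tweet-count aggregation reviewed in Section 2 can be sketched as follows. The sketch makes three assumptions:

- The six-word lexicon is a hypothetical placeholder. It is not SentiWordNet, VADER or the Hybrid Algorithm.
- Each tweet is labeled by the sign of its summed word polarities.
- The overall label for the collection is simply the most frequent tweet label.

```python
from collections import Counter

# Hypothetical toy lexicon (word -> polarity); not SentiWordNet or VADER.
LEXICON = {"good": 1, "great": 1, "happy": 1, "bad": -1, "corrupt": -1, "sad": -1}


def tweet_polarity(tweet: str) -> str:
    """Label one tweet by the sign of its summed lexicon polarities."""
    score = sum(LEXICON.get(word, 0) for word in tweet.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def overall_sentiment(tweets):
    """Count tweet labels; the most frequent label is the overall sentiment."""
    counts = Counter(tweet_polarity(t) for t in tweets)
    if not counts:  # no tweets at all
        return "neutral", counts
    label, _ = counts.most_common(1)[0]
    return label, counts


if __name__ == "__main__":
    sample = ["good governance", "corrupt and bad", "happy people", "no comment"]
    print(overall_sentiment(sample))
```

This counting rule ignores score magnitude. A tweet with one negative word and a tweet with many negative words each add one vote.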
Furthermore, a customized Tagalog dictionary was constructed from a collection of rubbish English or Filipino words. It resolves some of the ambiguity arising from Jejemon expression translation.

3.1.4 Tokenization

Tokenization segments a Jejemon expression, which may take the form of a paragraph, sentences or words, into lexemes (single words).

3.1.5 Stemming

In this stage, each Jejemon lexeme is reduced to its base word. This is done by simplifying plural forms to singular or by removing prefixes, suffixes and infixes.

Sentiment classification with VADER normally uses the compound score. However, the hybrid approach applies the positive score and the negative score, as shown in Equation 1 and Equation 2.

3.3.3 Filipino Sentiment Polarity Score

The Filipino sentiment polarity score is applied to Filipino expressions using Equation 4. The positive and negative words come from a collected list of commonly used Tagalog political sentiment words. Every detected positive or negative word scores 1 point.

$$\text{Filipino Sentiment Polarity Score (FOSS)} = \frac{\text{Number of Positive Words} - \text{Number of Negative Words}}{\text{Number of Words in the Tweet}} \tag{4}$$
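Tokenization and Equation 4 can be illustrated with a minimal sketch, under these assumptions:

- Tokenization is a simple regular expression that yields lowercase lexemes.
- Two tiny hypothetical Tagalog word lists stand in for the paper's collected political sentiment list, which is not reproduced here.
- Stemming (Section 3.1.5) is omitted, so inflected forms would not match the lists.

```python
import re

# Hypothetical stand-ins for the collected list of commonly used Tagalog
# political sentiment words (the actual list is not reproduced in the paper).
POSITIVE = {"maganda", "masaya", "tapat"}      # beautiful, happy, honest
NEGATIVE = {"pangit", "malungkot", "kurakot"}  # ugly, sad, corrupt


def tokenize(text: str) -> list:
    """Segment a paragraph or sentence into lowercase word lexemes."""
    return re.findall(r"[a-zñ]+", text.lower())


def foss(text: str) -> float:
    """Equation (4): (positive words - negative words) / words in the tweet.

    Each detected positive or negative word counts as 1 point.
    Stemming is skipped here, so inflected forms will not match the lists.
    """
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    positive = sum(token in POSITIVE for token in tokens)
    negative = sum(token in NEGATIVE for token in tokens)
    return (positive - negative) / len(tokens)


if __name__ == "__main__":
    print(foss("Maganda at masaya ang araw"))  # 2 positive words out of 5 -> 0.4
```

Dividing by the total word count keeps the score in the range -1 to 1. As a result, long tweets with a single sentiment word score close to 0.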
Table 8: Similarity Percentage of Translated Expression and Actual Expression

Jejemon Format Domain | Strong Similarity (70% and Above) | Weak Similarity (Below 70%) | Average Cosine Similarity Percentage
Online Resource (Controlled Environment) | 92% | 7% | 81%
Personal Preference (Uncontrolled Environment) | 73% | 27% | 76%

The controlled group's translation similarity percentage ranges from 23% up to 100%. Table 9 reflects the factors that affected translation in the controlled group:

- word case;
- letters omitted by the Jejemon expression, which led to a different English or Filipino expression when translated because of ambiguity;
- proper spacing between Jejemon expressions;
- single/plural forms.

The uncontrolled group's similarity percentage ranges from 25% up to 100%. Unlike the controlled group, the uncontrolled group's factors were:

- special characters such as Ë, ë, $, @ or ü;
- repetition of letters such as "z" and "s";
- two readings of "8", either "te" or "ate";
- "i" written as either "!" or "1";
- "a" written as either "4" or "@";
- the Filipino word "po" written interchangeably as "po" or "poh".

After Jejemon translation, the new dataset, comprising English, Filipino or combined expressions, underwent pre-selection. Previous studies on machine translation of several Filipino dialects reported accuracy rates of 70.67% [22] and 69.5% [23]. Pre-selection of instances was therefore based on a similarity percentage between 70% and 100%. The qualified instances are shown in Table 10.

Table 10: New Instances after Pre-Selection

Languages | Controlled Environment | Uncontrolled Environment
English | 33 | 64
Filipino | 0 | 2
Combination | 1 | 1
Total | 34 | 67

Table 10 shows a 12% decrease in controlled environment instances, from 39 down to 34. Similarly, uncontrolled environment instances decreased by 27%, from 92 down to 67.

Table 11: Paired T-Test of Computed Sentiment Scores
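The cosine similarity used to compare translated and actual expressions (Table 8) and the 70% pre-selection cut-off can be sketched as below. The paper does not state how expressions are vectorized. This sketch assumes plain term-frequency vectors over lowercased, whitespace-separated tokens. The Table 11 discussion applies the same 70% cut-off to flag repost messages.

```python
import math
from collections import Counter

THRESHOLD = 0.70  # "strong similarity" cut-off (70% and above)


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the term-frequency vectors of two texts."""
    va = Counter(a.lower().split())
    vb = Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def preselect(pairs):
    """Keep (translated, actual) pairs whose similarity is at least 70%."""
    return [(t, a) for t, a in pairs if cosine_similarity(t, a) >= THRESHOLD]


if __name__ == "__main__":
    pairs = [
        ("i love my people", "i love my places"),  # 3 of 4 words shared -> 0.75
        ("hello po", "kumusta ka"),                # no shared words -> 0.0
    ]
    for t, a in pairs:
        print(f"{cosine_similarity(t, a):.2f}", t, "|", a)
    print(preselect(pairs))
```

Bag-of-words cosine ignores word order. A translation with the right words in the wrong order can still score as a strong match.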
Table 11 describes the relationship between the different sentiment scores. The new score is derived from the translated expression. The original score is derived from the actual expression (English, Filipino or a combination) provided by the participants. Furthermore, instances that cosine similarity found to be 70% or more similar were considered repost messages. All repost sentiment scores underwent either a 1% or a 3% reduction. As shown in Table 11, the p-values are 0.821 (controlled environment), 0.257 (uncontrolled environment) and 0.246 (after applying the recurrence percentage reduction). All three are greater than 0.05, implying no significant difference between the new scores and the original scores in any of these cases.

Table 12: Confusion Matrix of Controlled Environment

Actual \ Predicted | Negative (F) | Positive (T)
Negative (F) | 11 | 1
Positive (T) | 7 | 6

Table 13: Confusion Matrix of Uncontrolled Environment

Actual \ Predicted | Negative (F) | Positive (T)
Negative (F) | 31 | 3
Positive (T) | 9 | 11

Table 12 and Table 13 show the number of positive and negative instances classified correctly or incorrectly. Out of 79 instances, 59 were classified correctly and 20 were classified incorrectly.

Table 14: Sentiment Analysis Metric Summary

The negative sentiment in both environments achieved high precision of at least 91%. The lowest precision value, 46%, is for the positive sentiment of the Controlled Environment. The highest recall value, 86%, is also for the positive sentiment of the Controlled Environment Domain; the remaining recall values range from 61% to 79%. Lastly, the highest f-score, 84%, is for the negative sentiment of the Uncontrolled Environment Domain, and the remaining values fall between 60% and 84%.

Ilao's improved hybrid model achieved accuracies of 68% in the Controlled Environment Domain and 79% in the Uncontrolled Environment Domain. This gives an average of 74% across the collected Jejemon format domains.

Generally, the average precision, recall, f-score and accuracy were 70% or above across the different Jejemon format domains.

7. CONCLUSION

This paper experimented with sentiment analysis focused solely on the Filipino fascination with expressing ideas through the Jejemon language.

However, the Jejemon language raised issues about millennial English proficiency. As a form of communication, Jejemon is normally expressed through short text and special representations of words or expressions. Furthermore, Jejemon expression offers several language structures, ranging from simple to complicated techniques.

In general, machine translation of Jejemon expressions into their counterpart languages, namely English, Filipino or a combination, was successful as measured by cosine similarity. The similarity rates of translated expressions against actual expressions were 81% and 76% for the controlled and uncontrolled environments, respectively.

Some instances were not translated exactly as the actual expression. Even so, the paired T-Test showed no significant difference between the original polarity score and the new polarity score in either dataset domain. Furthermore, repost messages whose scores were reduced by the recurrence percentage reduction technique showed no significant difference in either domain, based on the available data.

ACKNOWLEDGEMENT

The authors would like to express their greatest gratitude to the Graduate School of the Technological Institute of the Philippines, Quezon City, for supporting the publication of this paper.
REFERENCES

1. Ramos, J. J. R. (2018). Bekimon in Social Networking Site: Modernized Idiolect and Sociolect. International Journal of Social Sciences & Humanities, 3(2), 16-25.
2. Nocon, N., Cuevas, G., Magat, D., Suministrado, P., & Cheng, C. (2014, October). NormAPI: An API for normalizing Filipino shortcut texts. In 2014 International Conference on Asian Language Processing (IALP) (pp. 207-210). IEEE. https://doi.org/10.1109/IALP.2014.6973494
3. Mongaya, K. M. (2010). http://karlomongaya.wordpress.com/2010/07/09/the-internet-as-corrective-of-individualist-culture/
4. The Social and Educational Influences of Jejemon Texting Style. (2010). Date Retrieved: November 10, 2019.
5. Tubac, A. (2017). Linguistic Innovations in the Jejemon Phenomenon. https://doi.org/10.13140/RG.2.2.18434.07360
6. Ilao, A., & Fajardo, A. (2019). Sentiment Analysis of Tweet Messages using Hybrid Approach Algorithm. In The 17th International Conference on ICT and Knowledge Engineering (ICT-KE). IEEE. https://doi.org/10.1109/ICTKE47035.2019.8966887
7. Cataan, J. C. (2011). A 'World' Within the World: The Jejemons as the 'Other' Culture. Unpublished Undergraduate Thesis, University of the Philippines College of Mass Communication.
8. Raghunathan, K., & Krawczyk, S. (2009). CS224N: Investigating SMS Text Normalization using Statistical Machine Translation. Department of Computer Science, Stanford University.
9. Devika, M. D., Sunitha, C., & Ganesh, A. (2016). Sentiment Analysis: A Comparative Study on Different Approaches. Procedia Computer Science, 87, 44-49. https://doi.org/10.1016/j.procs.2016.05.124
10. Romanyshyn, M. (2013). Rule-Based Sentiment Analysis of Ukrainian Reviews. International Journal of Artificial Intelligence & Applications, 4(4), 103.
11. Kawathekar, S. A., & Kshirsagar, M. M. (2012). Sentiments Analysis Using Hybrid Approach Involving Rule-Based & Support Vector Machines Methods. IOSRJEN, 2(1), 55-58. https://doi.org/10.9790/3021-0215558
12. Almatarneh, S., & Gamallo, P. (2018). A Lexicon Based Method to Search for Extreme Opinions. PLoS ONE, 13(5), e0197816. https://doi.org/10.1371/journal.pone.0197816
13. Cho, H., Lee, J. S., & Kim, S. (2013). Enhancing Lexicon-Based Review Classification by Merging and Revising Sentiment Dictionaries. In Proceedings of the Sixth International Joint Conference on Natural Language Processing (pp. 463-470).
14. Lu, W., Du, X., Hadjieleftheriou, M., & Ooi, B. C. (2014). Efficiently Supporting Edit Distance Based String Similarity Search Using B+-Trees. IEEE Transactions on Knowledge and Data Engineering, 26(12), 2983-2996.
15. Khuat, T., Duc Hung, N., & Thi My Hanh, L. (2015). A Comparison of Algorithms used to Measure the Similarity between Two Documents. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 4, 1117-1121.
16. Chen, H. (2012). String Metric and Word Similarity Applied to Information Retrieval (Doctoral Dissertation, Master's Thesis, School of Computing, University of Eastern Finland).
17. Prasanna Lakshmi, K., Shraddha, V., Abhinava, V., Kavya, K., & Gayathri, R. (2017). Sentiment Analysis and Prediction Using Text Mining. Indian Journal of Science and Technology, 10(28). https://doi.org/10.17485/ijst/2017/v10i28/113441
18. Ilao, A., & Fajardo, A. (2019). Sentiment Analysis of Tweet Messages using Hybrid Approach Algorithm. 17th International IEEE Conference of ICT and Knowledge Engineering 2019. https://doi.org/10.1109/ICTKE47035.2019.8966887
19. https://www.internetslang.com/trending.asp. Accessed Date: March 10, 2020.
20. https://abiword.github.io/enchant/. Accessed Date: March 12, 2020.
21. https://github.com/raymelon/tagalog-dictionary-scraper/blob/master/tagalog_dict.txt. Accessed Date: March 12, 2020.
22. Domingo and R. Roxas (2006). Utilizing Clues in Syntactic Relationships for Automatic Target Word Sense Disambiguation. Journal of Research for Science, Computing and Engineering, 3(3), 18-24. https://doi.org/10.3860/jrsce.v3i3.99
23. Alcantara and A. Borra (2008). Constituent Structure for Filipino: Induction through Probabilistic Approaches. Proceedings of the 22nd Pacific Asia Conference on Language, Information and Computation (PACLIC), 113-122.