2020 International Conference on Computer, Control, Electrical, and Electronics Engineering (ICCCEEE)
SudaBERT: A Pre-trained Encoder Representation
For Sudanese Arabic Dialect
Mukhtar Elgezouli*, Khalid N. Elmadani*, Muhammed Saeed*
University of Khartoum, Faculty of Engineering
Department of Electrical and Electronic Engineering
Algamaa Street, P.O. Box 321, Khartoum, Sudan
{mukhtaralgezoli, khalidnabigh, mohammed.yahia3}@gmail.com
*All authors have contributed equally
Abstract—Bidirectional Encoder Representations from Transformers (BERT) has proven to be very efficient at Natural Language Understanding (NLU), as it achieves state-of-the-art results on most NLU tasks. In this work we aim to utilize the power of BERT for the Sudanese Arabic dialect and produce a Sudanese word representation. We collected over 7 million sentences in the Sudanese dialect and used them to resume training of the pre-trained Arabic-BERT model, which was originally trained on a large Modern Standard Arabic (MSA) corpus. Our model -SudaBERT- achieves better performance on Sudanese sentiment analysis, which indicates that SudaBERT is better at understanding the Sudanese dialect, the domain we are interested in.

Index Terms—Sudanese Arabic Dialect, BERT, SudaBERT, Natural Language Understanding

I. INTRODUCTION

Early text representation was done using Bag-of-Words concepts such as TF-IDF (term frequency-inverse document frequency) to score token importance. Still, this method lacks semantic understanding of the words, since it is just a statistical representation of the words in a document. After TF-IDF came other statistical representations, such as Latent Semantic Analysis (LSA), but these were also complex statistical representations that still depend on word counts.

In 2013, Mikolov et al. introduced the idea of word embeddings (word vectors) [1]. The word2vec model contains a single hidden layer, which learns the meaning of words merely by processing a large corpus of unlabeled text. This unsupervised nature makes word2vec powerful, but it has some drawbacks. First, word2vec consists of only a single hidden layer, which is not sufficient to capture the rules of a language. Second, word2vec produces non-contextual word embeddings, in which a word has the same meaning regardless of the context that comes before and after it.

These issues were addressed by generating contextualized representations, such as ELMo [2], which uses a deep bidirectional LSTM and is trained in an unsupervised manner. Interestingly, each layer ends up learning a different characteristic of the sentence. Unlike traditional word embeddings such as word2vec and GloVe [3], the ELMo [2] vector assigned to a token or word is a function of the entire sentence containing that word. As a result, the same word can have different word vectors under different contexts. ELMo has some drawbacks, however. First, the complex Bi-LSTM structure makes it very slow to train and to generate embeddings; second, it struggles with long-term context dependencies. One major problem that all the above models suffer from is the high computational cost and the long training time. Also, LSTM layers unroll over the entire sequence, which requires sequential computation, so the model ends up not utilizing the full power of GPUs or TPUs. Those drawbacks restricted the availability of language models for non-English languages. To fill this gap, multilingual models have been trained to learn representations for more than 100 languages. Still, multilingual models fall far behind single-language models due to limited data representation and a small language-specific vocabulary, especially for Arabic, whose morphological and syntactic structure differs from the other languages in the multilingual model. Sudanese data, in turn, has its own differences from standard Arabic.

In this paper, we describe the process of collecting Sudanese dialect data and pre-training a BERT transformer model on it. We evaluate our model on two Arabic Natural Language Understanding downstream tasks: i) Sentiment Analysis and ii) Named Entity Recognition.

The paper is structured as follows: Section II describes previous work. In Section III we discuss the pre-training process used to develop SudaBERT. Section IV describes the datasets we used to evaluate our model. Section V presents the experimental setup. Section VI presents the results. Finally, the conclusion is given in Section VII.

II. RELATED WORK

A. Non-contextual Embedding

The first meaningful word representations appeared with the word2vec model developed by Mikolov et al. [1], followed by GloVe [3] and Facebook's FastText [4], as well as Arabic word2vec models such as AraVec [5]. All of these are non-contextual word representations - a word has the same meaning regardless of its position in the sentence. A significant advance was achieved with ELMo, which produces contextual embeddings.
B. Contextual Embedding

ELMo was tested on six benchmark Natural Language Processing tasks: named entity extraction, question answering, semantic role labeling, sentiment analysis, textual entailment, and coreference resolution. In all cases, the enhanced models achieved state-of-the-art performance. Since ELMo, more language representation models have been developed, such as ULMFiT [6], BERT [7], RoBERTa [8], XLNet [9], ALBERT [10], and T5 [11], which offered improved performance by exploring different pre-training methods, modified model architectures and larger training corpora. There are also Arabic BERT models such as AraBERT [12] and Arabic-BERT [13].
III. METHODOLOGY

Bidirectional Encoder Representations from Transformers (BERT) is a model introduced in the paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" [7]. It was a giant leap in NLP, as it allowed a model to be trained on unlabeled data and then -the same model- to be reused, with minimal fine-tuning (both in the amount of data and the training time), on downstream NLP tasks; in short, it allowed transfer learning to be fully exploited in NLP. BERT's architecture consists of multiple layers of a model called the Transformer [14]. The most common BERT model (BERT-base) has 12 layers and a hidden size of 768, accumulating to 110 million parameters. Several models use a similar Transformer architecture, but BERT distinguishes itself by its bidirectional nature and the way it is pre-trained to deliver this bidirectionality.

Although the Sudanese dialect corpus we collected is not small, it is not nearly enough to pre-train a BERT model from scratch. As a solution to this problem, we used a model pre-trained on MSA (Modern Standard Arabic) -Arabic-BERT [13]- and then continued the pre-training on our Sudanese dialect data.
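As a rough sketch of this continued pre-training setup (not the exact TensorFlow/TPU pipeline described in Section V), the Arabic-BERT checkpoint can simply be loaded and training resumed on the new corpus; the checkpoint identifier asafaya/bert-base-arabic is an assumption for the published Arabic-BERT weights.

```python
# Minimal sketch: start from a pre-trained Arabic checkpoint and continue
# pre-training (MLM + NSP heads) on Sudanese dialect text.
from transformers import BertForPreTraining, BertTokenizerFast

CHECKPOINT = "asafaya/bert-base-arabic"   # assumed name for Arabic-BERT [13]

tokenizer = BertTokenizerFast.from_pretrained(CHECKPOINT)
model = BertForPreTraining.from_pretrained(CHECKPOINT)

# The architecture and WordPiece vocabulary stay identical to Arabic-BERT;
# only the weights keep moving as training resumes on the dialect corpus.
```

Because training is resumed rather than restarted, SudaBERT shares Arabic-BERT's architecture and vocabulary.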
A. Pre-training Datasets

We used the Arabic-BERT model [13], which was already trained on the OSCAR Arabic corpus (https://oscar-corpus.com), containing about 8.1 billion cleaned Arabic sentences. We then pre-trained the model further on Sudanese data that we collected ourselves.

The first step in the training process was to collect a large amount of Sudanese dialect text. We collected about 13 million Sudanese sentences -each consisting of at least 20 characters- from Twitter and public Telegram channels. We then cleaned the data of everything that is not Sudanese Arabic: we removed all symbols (#, ?, :), (:, !, etc.) and emojis. After the cleaning step was completed, we ended up with more than seven million clean Sudanese sentences.
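The cleaning step can be sketched as follows. The exact filtering rules are not given in the paper, so the regular expressions below are illustrative assumptions; only the minimum length of 20 characters is taken from the text above.

```python
import re

# Assumption: keep Arabic letters, digits and whitespace; drop symbols,
# emoticons and emojis. The authors' actual rules may differ.
NON_ARABIC = re.compile(r"[^\u0600-\u06FF0-9\s]")

def clean_sentences(raw_sentences):
    cleaned = []
    for s in raw_sentences:
        s = NON_ARABIC.sub(" ", s)            # strip symbols and emojis
        s = re.sub(r"\s+", " ", s).strip()    # normalize whitespace
        if len(s) >= 20:                      # length filter stated in the paper
            cleaned.append(s)
    return cleaned
```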
B. Pre-training tasks

The second step was pre-training. The model is trained as a language model (LM) using relatively generic tasks; the two most commonly used LM tasks for BERT are Masked Language Modeling (MLM) and Next Sentence Prediction (NSP).

In Masked Language Modeling, some of the tokens (up to 15%) in the input sequence are randomly masked (replaced with the [MASK] token), and the model should then predict those masked tokens. This implementation raises a problem: the input sequences seen at fine-tuning or inference time will never contain the [MASK] token. To mitigate this, some of the selected tokens are replaced with a random word instead of [MASK].

In Next Sentence Prediction -which helps the model understand the relationship between two sentences- the model is given two sentences as input (separated by the [SEP] token). Half of the time, the second sentence actually follows the first one in the original text; in the other half, the second sentence is chosen at random. BERT is then required to predict whether the second sentence follows the first in the original text or not.
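To make the masking procedure concrete, here is a toy sketch. The 80/10/10 split between masking, random replacement and keeping the original token follows the recipe of the original BERT paper [7] and is shown only for illustration.

```python
import random

def mask_tokens(token_ids, mask_id, vocab_size, mlm_prob=0.15):
    """Pick ~15% of positions as prediction targets, then mask 80% of them,
    replace 10% with a random token and leave 10% unchanged."""
    labels = [-100] * len(token_ids)          # -100 marks positions that are not predicted
    for i, tok in enumerate(token_ids):
        if random.random() < mlm_prob:
            labels[i] = tok                   # the model must recover this original token
            r = random.random()
            if r < 0.8:
                token_ids[i] = mask_id        # replace with [MASK]
            elif r < 0.9:
                token_ids[i] = random.randrange(vocab_size)  # replace with a random word
            # else: keep the original token unchanged
    return token_ids, labels
```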
C. Sub-word Segmentation

Tokenization is the process of breaking raw text down into tokens; a vocabulary is then constructed from the most common/frequent words, and this vocabulary contains all the words that the model can understand.

BERT uses a special type of byte pair encoding (BPE) algorithm called the WordPiece tokenizer, in which the vocabulary is initialized with all the individual characters of the language. The most frequent/likely combinations of symbols in the vocabulary are then iteratively added to it, meaning that any new word can be represented using subwords or even individual characters, eliminating the chance of getting an unknown token in our text. The WordPiece tokenizer works extremely well with the Arabic language, removing the need for an Arabic-specific segmenter such as FARASA [15].
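For illustration, the snippet below shows how a WordPiece tokenizer splits words that are absent from the vocabulary into known subword pieces. The checkpoint name is an assumption for Arabic-BERT, the example phrase is arbitrary, and the exact split depends on the learned vocabulary.

```python
from transformers import BertTokenizerFast

# Assumed checkpoint name; SudaBERT uses the same vocabulary because
# pre-training was resumed from this model rather than restarted.
tokenizer = BertTokenizerFast.from_pretrained("asafaya/bert-base-arabic")

# Out-of-vocabulary words are split into subword pieces (continuation pieces
# carry a "##" prefix), so no [UNK] token is produced.
print(tokenizer.tokenize("مشتاقين ليكم شديد"))
```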
D. Fine-tuning

Finally, we did the fine-tuning, which means benefiting from a pre-trained language representation model by training it a little further on application-specific text. This approach has achieved impressive results on many language understanding tasks in different languages [16], [17].

1) Sentence classification: Before feeding the text into BERT, the [CLS] token is prepended to each sentence to act as a sentence representation. In order to fine-tune BERT for sentence classification, we inserted a classifier layer on top of the final hidden state corresponding to the [CLS] token, so the model should learn to encode all the information it needs into that hidden state. Figure 1 illustrates the steps we followed to fine-tune BERT for this task.

2) Named Entity Recognition: The same architecture is used for the named entity recognition (NER) task, where each word is divided into segments using the WordPiece tokenizer, prepended with the [CLS] token and fed into the model. Finally, the classifier layer predicts "Person", "Location", "Organization" or "Miscellaneous" for each word based on the final hidden representation of the [CLS] token.
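A hedged sketch of the sentence-classification head described above: BertForSequenceClassification in the transformers library places a single linear classifier over the pooled [CLS] representation, which matches this setup. The checkpoint name is again an assumption, and two labels stand in for a binary sentiment task.

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizerFast

CHECKPOINT = "asafaya/bert-base-arabic"   # assumed name; SudaBERT would be loaded the same way
tokenizer = BertTokenizerFast.from_pretrained(CHECKPOINT)
model = BertForSequenceClassification.from_pretrained(CHECKPOINT, num_labels=2)

# A single sentence is tokenized, [CLS]/[SEP] are added automatically, and the
# classifier returns one logit per label.
inputs = tokenizer("الخدمة دي ما شغالة خالص", return_tensors="pt",
                   truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits           # shape: (1, 2)
```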
Fig. 1. Steps of Fine-tuning BERT for Sentence Classification.
IV. EVALUATION

We evaluated our model on two NLU tasks, Sentiment Analysis (SA) and Named Entity Recognition (NER), and compared the results to the Arabic-BERT model from which we started training.

A. Sentiment Analysis

Due to the scarcity of Sudanese dialect content on the internet, we evaluated our model on only one Sudanese dialect (slang) dataset; the rest of the datasets are in MSA or another Arabic dialect.

1) AJGT: The Arabic Jordanian General Tweets dataset consists of 1,800 tweets annotated as positive (900) and negative (900) [18]. The tweets were written in MSA and the Jordanian dialect.

2) ArSenTD-LEV: The Arabic Sentiment Twitter Dataset for the Levantine dialect contains 4,000 tweets written in Arabic and equally retrieved from Jordan, Lebanon, Palestine and Syria [19]. The tweets are classified into very negative (630), negative (1,253), neutral (885), positive (835) and very positive (397).

3) ASTD: The Arabic Sentiment Tweets Dataset consists of over 10K Arabic sentiment tweets annotated as subjective negative, subjective positive, subjective mixed and objective [20]. We evaluated our model on the balanced version of the dataset (797 tweets in each class).

4) HARD: The Hotel Arabic-Reviews Dataset contains 93,700 hotel reviews in MSA as well as dialectal Arabic [21]. The balanced version of the dataset consists of 46,850 reviews for each of the positive and negative classes.

5) LABR: The Large Scale Arabic Book Reviews dataset contains over 63K book reviews in Arabic. Each book review comes with the review text, a rating (1 to 5) and other metadata [22]. We evaluated our model on the balanced two-class version, where the ratings are converted into positive (ratings 4 & 5) and negative (ratings 1 & 2) and rating 3 is ignored.

6) Sentiment analysis for Sudanese dialect: This dataset consists of the opinions of people on Twitter about the telecommunication services provided in Sudan [23]. It contains 4,712 tweets written in the Sudanese Arabic dialect, classified into negative (3,358), positive (716) and objective (638).

B. Named Entity Recognition

1) ANERCorp: The Arabic Named Entity Recognition corpus contains 150k tokens, 11% of which are named entities distributed among four entity categories: Person (39%), Location (30.4%), Organization (20.6%) and Miscellaneous (10%) [24].
V. EXPERIMENTS

A. Pre-training

Before beginning to pre-train the model, some preparations were done on the data (see the sketch after this section):
• The data was divided into small files, each containing 250K sentences.
• Each file was then tokenized, and sentence segmentation was added along with position segmentation.
• All files were then saved as TFRecord files to ease reading and loading them onto the TPU.

Data preparation and initialization of the training were done on a Google Cloud virtual machine instance; this instance contains an n1-standard vCPU with 3.75 GB of memory running a Linux-based image. The actual pre-training was done on a v3-8 TPU rented from Google Cloud Platform, and the data was stored in a GCP (Google Cloud Platform) bucket.

The pre-training was carried out with a batch size of 32 sentences per input and a learning rate of 1e-5, an order of magnitude less than if the pre-training were from scratch. The model converged after one million steps, equivalent to 14 hours of training time.

After the pre-training, the model achieved a masked language modeling accuracy of 0.53 and a next sentence prediction accuracy of 0.638.
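As a rough sketch of the sharding and serialization steps listed above (not the authors' actual script), the cleaned sentences could be written to TFRecord shards of 250K sentences each; the feature name below is an illustrative assumption, since a full BERT pipeline would store token ids and masks rather than raw text.

```python
import tensorflow as tf

SHARD_SIZE = 250_000  # sentences per file, as stated above

def write_shards(sentences, prefix="sudanese"):
    """Split the corpus into shards of 250K sentences and store each shard
    as a TFRecord file of serialized tf.train.Example records."""
    for start in range(0, len(sentences), SHARD_SIZE):
        path = f"{prefix}-{start // SHARD_SIZE:05d}.tfrecord"
        with tf.io.TFRecordWriter(path) as writer:
            for sent in sentences[start:start + SHARD_SIZE]:
                feature = {"text": tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[sent.encode("utf-8")]))}
                example = tf.train.Example(features=tf.train.Features(feature=feature))
                writer.write(example.SerializeToString())
```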
B. Fine-tuning

Unlike the pre-training process, we fine-tuned SudaBERT and Arabic-BERT using the GPUs provided by Google Colab. We trained on all sentiment analysis datasets for only 3 epochs -as recommended by [7]- with a learning rate of 2e-5, a batch size of 16 and a maximum sequence length of 128. We did the same when training the models on ANERCorp, but this time with a batch size of 128 and a maximum sequence length of 16.

For all datasets, we used the splits provided by the authors when available. Otherwise, we split the data into 80% for training and 20% for testing.
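The sentiment-analysis fine-tuning configuration above can be sketched with the Hugging Face Trainer as one possible implementation; the checkpoint name and the two-example toy dataset are assumptions standing in for the real data splits.

```python
import torch
from transformers import (BertForSequenceClassification, BertTokenizerFast,
                          Trainer, TrainingArguments)

CHECKPOINT = "asafaya/bert-base-arabic"   # assumed name; swap in SudaBERT's files to compare
tokenizer = BertTokenizerFast.from_pretrained(CHECKPOINT)
model = BertForSequenceClassification.from_pretrained(CHECKPOINT, num_labels=2)

# Toy stand-in for a tokenized sentiment dataset (real data: AJGT, ASTD, ... splits).
texts, labels = ["الشبكة ممتازة", "الخدمة سيئة"], [1, 0]
enc = tokenizer(texts, truncation=True, max_length=128, padding=True)

class ToyDataset(torch.utils.data.Dataset):
    def __len__(self):
        return len(labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in enc.items()}
        item["labels"] = torch.tensor(labels[i])
        return item

# Hyperparameters from Section V-B: 3 epochs, learning rate 2e-5, batch size 16.
args = TrainingArguments(output_dir="finetune-sa", num_train_epochs=3,
                         learning_rate=2e-5, per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=ToyDataset()).train()
```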
TABLE I
PERFORMANCE OF SUDABERT ON ARABIC DOWNSTREAM TASKS AND ONE SUDANESE DIALECT SENTIMENT ANALYSIS DATASET, COMPARED TO ARABIC-BERT

Dataset                                   Metric     Arabic-BERT   SudaBERT
AJGT (SA)                                 Accuracy   91.7          89.2
ArSenTD-LEV (SA)                          Accuracy   56.1          53.9
ASTD (SA)                                 Accuracy   59            51.6
HARD (SA)                                 Accuracy   95.8          95.5
LABR (SA)                                 Accuracy   83.5          80.8
Sentiment analysis for Sudanese dialect   Accuracy   75.4          76.2
                                          Macro-F1   57.7          60.6
ANERCorp (NER)                            Macro-F1   76.9          73.8

VI. RESULTS

Table I shows the experimental results of applying SudaBERT to the sentiment analysis and named entity recognition tasks, compared to Arabic-BERT [13].

The results in Table I show that SudaBERT achieves better performance on Sudanese dialect sentiment analysis than the state-of-the-art Arabic model Arabic-BERT, which indicates that our model works better on the Sudanese dialect. The results also show that Arabic-BERT achieves better performance than SudaBERT on Modern Standard Arabic (MSA) and the other Arabic dialects. We attribute this performance degradation of SudaBERT relative to Arabic-BERT to the additional epochs of pre-training on Sudanese data, which changed the embedding values of the model.

VII. CONCLUSION

In this study, we collected and cleaned Sudanese dialect data from Twitter and public Telegram channels. Then, we used the Arabic-BERT model as a checkpoint from which to start training SudaBERT on the collected data. Finally, we evaluated SudaBERT against Arabic-BERT on two NLU tasks: sentiment analysis and named entity recognition. The experimental results show higher performance of SudaBERT compared to Arabic-BERT when dealing with the Sudanese dialect, while Arabic-BERT was better at understanding MSA and other Arabic dialects.

REFERENCES

[1] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems, pp. 3111-3119, 2013.
[2] M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer, "Deep contextualized word representations," arXiv preprint arXiv:1802.05365, 2018.
[3] J. Pennington, R. Socher, and C. D. Manning, "GloVe: Global vectors for word representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532-1543, 2014.
[4] P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, "Enriching word vectors with subword information," Transactions of the Association for Computational Linguistics, vol. 5, pp. 135-146, 2017.
[5] A. B. Soliman, K. Eissa, and S. R. El-Beltagy, "AraVec: A set of Arabic word embedding models for use in Arabic NLP," Procedia Computer Science, vol. 117, pp. 256-265, 2017.
[6] J. Howard and S. Ruder, "Universal language model fine-tuning for text classification," arXiv preprint arXiv:1801.06146, 2018.
[7] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.
[8] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, "RoBERTa: A robustly optimized BERT pretraining approach," arXiv preprint arXiv:1907.11692, 2019.
[9] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le, "XLNet: Generalized autoregressive pretraining for language understanding," in Advances in Neural Information Processing Systems, pp. 5753-5763, 2019.
[10] Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, and R. Soricut, "ALBERT: A lite BERT for self-supervised learning of language representations," arXiv preprint arXiv:1909.11942, 2019.
[11] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, "Exploring the limits of transfer learning with a unified text-to-text transformer," arXiv preprint arXiv:1910.10683, 2019.
[12] W. Antoun, F. Baly, and H. Hajj, "AraBERT: Transformer-based model for Arabic language understanding," 2020.
[13] A. Safaya, M. Abdullatif, and D. Yuret, "KUISAIL at SemEval-2020 Task 12: BERT-CNN for offensive speech identification in social media," 2020.
[14] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," 2017.
[15] A. Abdelali, K. Darwish, N. Durrani, and H. Mubarak, "Farasa: A fast and furious segmenter for Arabic," in Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, (San Diego, California), pp. 11-16, Association for Computational Linguistics, June 2016.
[16] Y. Liu and M. Lapata, "Text summarization with pretrained encoders," 2019.
[17] K. N. Elmadani, M. Elgezouli, and A. Showk, "BERT fine-tuning for Arabic text summarization," 2020.
[18] K. M. Alomari, H. M. ElSherif, and K. Shaalan, "Arabic tweets sentimental analysis using machine learning," in International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, pp. 602-610, Springer, 2017.
[19] R. Baly, A. Khaddaj, H. Hajj, W. El-Hajj, and K. Bashir Shaban, "ArSenTD-LEV: A multi-topic corpus for target-based sentiment analysis in Arabic Levantine tweets," OSACT3, 2018.
[20] M. Nabil, M. Aly, and A. Atiya, "ASTD: Arabic sentiment tweets dataset," in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, (Lisbon, Portugal), pp. 2515-2519, Association for Computational Linguistics, Sept. 2015.
[21] A. Elnagar, Y. S. Khalifa, and A. Einea, "Hotel Arabic-reviews dataset construction for sentiment analysis applications," in Intelligent Natural Language Processing: Trends and Applications, pp. 35-52, Springer, 2018.
[22] M. Aly and A. Atiya, "LABR: A large scale Arabic book reviews dataset," in Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), (Sofia, Bulgaria), pp. 494-498, Association for Computational Linguistics, Aug. 2013.
[23] R. Ismail, M. Omer, M. Tabir, N. Mahadi, and I. Amin, "Sentiment analysis for Arabic dialect using supervised learning," in 2018 International Conference on Computer, Control, Electrical, and Electronics Engineering (ICCCEEE), pp. 1-6, 2018.
[24] Y. Benajiba and P. Rosso, "ANERsys 2.0: Conquering the NER task for the Arabic language by combining the maximum entropy with POS-tag information," in IICAI, pp. 1814-1823, 2007.