
LKAU23 at Qur'an QA 2023: Using Transformer Models for Retrieving Passages and Finding Answers to Questions from the Qur'an

Sarah Alnefaie[1][2], Abdullah N. Alsaleh[1][2], Eric Atwell[2], Mohammed Ammar Alsalka[2], Abdulrahman Altahhan[2]
[1] King Abdulaziz University
[2] University of Leeds
{scsaln, scanaa, e.s.atwell, m.a.alsalka, a.altahhan}@leeds.ac.uk

Abstract

The Qur'an QA 2023 shared task has two subtasks: the Passage Retrieval (PR) task and the Machine Reading Comprehension (MRC) task. Our participation in the PR task was to further train several Arabic pre-trained models using a Sentence-Transformers architecture and to ensemble the best performing models. The results on the test set did not reflect the results on the development set. CL-AraBERT achieved the best results, with a 0.124 MAP. We also participated in the MRC task by further fine-tuning the base and large variants of AraBERT using Classical Arabic and Modern Standard Arabic datasets. Base AraBERT achieved the best result on the development set with a partial average precision (pAP) of 0.49, while it achieved 0.5 on the test set. In addition, we applied an ensemble approach to the best performing models and post-processing steps to the final results. Our experiments with the development set showed that our proposed model achieved a 0.537 pAP. On the test set, our system obtained a pAP score of 0.49.

1 Introduction

The Arabic language poses many challenges in Natural Language Processing (NLP), including in the areas of Machine Reading Comprehension (MRC) and Passage Retrieval (PR). One of the most prominent recent NLP techniques applied to MRC and PR tasks in the Arabic language is pre-trained transformer-based models, which can achieve state-of-the-art performance (Alsubhi et al., 2021, 2022).

There are PR studies that use a dense approach based on pre-trained models (Karpukhin et al., 2020). This approach has outperformed traditional information retrieval, such as TF-IDF (Sammut and Webb, 2010), with Modern Standard Arabic (MSA; Alsubhi et al., 2022). To our knowledge, the dense approach has not been researched with Classical Arabic (CA). Therefore, our proposed system for Task A of the Qur'an QA 2023 shared task uses the dense approach by fine-tuning Arabic pre-trained models and then ensembling the best-scoring ones. The idea of Task A is to build a system that returns a list of Qur'anic passages that contain answers to a posed question/query (Malhas et al., 2023). However, the challenging aspect of this task is that some questions do not have an answer in the Qur'an. The first research question, RQ1, is: Does using Arabic pre-trained models in PR for CA outperform a traditional approach such as BM25?

Most recent studies on the Qur'an MRC task have tended to use transformer-based models along with the Qur'anic Reading Comprehension Dataset (QRCD) (Malhas et al., 2022). We noticed that they improved the performance of their systems using three approaches: (1) using additional MSA and/or CA datasets in fine-tuning (Mostafa and Mohamed, 2022; Aftab and Malik, 2022), (2) constructing an ensemble of different BERT models, and (3) applying appropriate post-processing steps to the final ranked list (ElKomy and Sarhan, 2022). To the best of our knowledge, no studies have combined these three approaches. Therefore, we applied the combination of those approaches for Task B of the Qur'an QA 2023 shared task. The goal of Task B was to build a model that takes a Qur'anic passage and an MSA question as input and extracts a ranked list of up to 10 answer spans for that question from the passage as output (Malhas et al., 2023). The new challenge in the second version of this task was that some questions have no answers. The second research question, RQ2, is: Does the combination of fine-tuning the models with a large CA dataset and/or MSA dataset, ensembling these models and then applying post-processing steps improve the results?

The paper's structure is as follows: In Section 2, related work is presented. Section 3 describes the datasets.
Proceedings of The First Arabic Natural Language Processing Conference (ArabicNLP 2023), pages 720–727
December 7, 2023 ©2023 Association for Computational Linguistics
This is followed by Section 4, which explains the proposed models. In Section 5, the results are discussed. Finally, the paper provides a conclusion.

2 Related Work

2.1 Task A: Passage Retrieval

Karpukhin et al. (2020) proposed their dense passage retrieval (DPR) system using the uncased BERT base model. Their system applies dual encoders so that passages are transformed into dense real-valued vectors, and then builds an index over all passages for retrieval. The input query is then encoded and mapped into the same vector space, and the passages nearest the query vector are retrieved. Their approach outperformed multiple other open-domain QA techniques on several QA datasets such as TriviaQA and SQuAD. Sachan et al. (2022) proposed the unsupervised passage re-ranker (UPR), in which the system utilizes zero-shot question generation for re-ranking passages in order to improve passage retrieval. It then computes relevance scores over the generated questions and sorts the results. Their approach outperformed DPR (Karpukhin et al., 2020) on several datasets, such as SQuAD and TriviaQA. Finally, Alsubhi et al. (2022) proposed a multilingual DPR model that was fine-tuned on Arabic datasets. Their model outperformed TF-IDF on two Arabic datasets, ARCD (Mozannar et al., 2019) and TyDiQA-GoldP (Clark et al., 2020).

2.2 Task B: Machine Reading Comprehension

Recently, several researchers have built MRC systems to answer questions about the Qur'an. All these studies used QRCD_v1.1 in the fine-tuning and evaluation phases (Malhas et al., 2022; Malhas and Elsayed, 2022). Some studies have proposed further fine-tuning the model using MSA datasets (Mostafa and Mohamed, 2022; Malhas and Elsayed, 2022). Mostafa and Mohamed (2022) developed an Arabic Qur'an MRC model by fine-tuning the AraELECTRA model using three MSA datasets: Ar-TyDi, Arabic-SQuAD and the Arabic Reading Comprehension Dataset (ARCD). Their model achieved a 0.54 pRR, 0.52 F1@1 and 0.23 EM. Other studies have proposed fine-tuning the model using a CA dataset. Sleem et al. (2022) fine-tuned AraBERTv02 using the Arabic Al-Qur'an Question and Answer Corpus (AQQAC) (Alqahtani, 2019). This model achieved scores of 0.52 pRR, 0.5 F1@1 and 0.25 EM.

ElKomy and Sarhan (2022) used the training and development sets of QRCD_v1.1 to fine-tune five different Arabic BERT models. They then used these five models individually to find the answers for the QRCD test set. To obtain good results, they implemented an ensemble approach over the results of these models. Finally, post-processing was applied to enhance the results. The results showed a pRR of 56.6, an EM of 26.8 and an F1@1 of 0.50.

To the best of our knowledge, no study has been conducted on the impact of combining the following three factors in building an Arabic Qur'an MRC model: First, Arabic pre-trained models are fine-tuned using CA and MSA datasets. Second, an ensembling approach is applied to the results using majority voting. Finally, the final list is refined through several post-processing steps.

3 Datasets

3.1 Task A: Passage Retrieval

The data comprised the Qur'anic passage collection (QPC) and questions from AyaTEC (Malhas and Elsayed, 2020). The QPC was developed by segmenting the Qur'an passages into topics, which resulted in 1,266 passages. There were 199 questions that were derived from the AyaTEC dataset. The Query Relevance Judgements (QRels) dataset contained 1,132 gold (answer-bearing) Qur'anic passages that answered the questions; these data were used in the training and development sets. Finally, the distribution of the dataset was 70%, 10% and 20% for the training, development and testing sets respectively.

3.2 Task B: Machine Reading Comprehension

In this study, we used three different datasets, as follows:

QRCD: QRCD_v1.2 consists of 1,399 question–passage–answer triplets in the training and development splits, as shown in Table 6. It was split 70%, 10%, and 20% for the training, development and test sets respectively (Malhas and Elsayed, 2022, 2020).

ARCD: It consists of 1,395 question–passage–answer triplets for Wikipedia passages (Mozannar et al., 2019).

Quran Question–Answer pairs (QUQA): It consists of 3,382 question–passage–answer triplets regarding the Arabic Holy Qur'an. This dataset was built using the available Qur'an AQQAC dataset
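The dual-encoder retrieval idea of Section 2.1 can be illustrated with a minimal, self-contained sketch. The bag-of-words `embed` function below is a hypothetical stand-in for a fine-tuned BERT encoder; only the encode-index-rank flow mirrors DPR.

```python
import hashlib
import numpy as np

def embed(text, dim=32):
    # Toy stand-in for a dense encoder: each token maps to a fixed
    # pseudo-random vector, and a text is the sum of its token vectors.
    vec = np.zeros(dim)
    for tok in text.lower().split():
        seed = int.from_bytes(hashlib.sha256(tok.encode()).digest()[:8], "little")
        vec += np.random.default_rng(seed).standard_normal(dim)
    return vec

def retrieve(question, passages, top_k=2):
    # Dense retrieval: encode all passages once (the "index"), encode
    # the query, then rank passages by cosine similarity to the query.
    p_vecs = np.array([embed(p) for p in passages])
    q_vec = embed(question)
    sims = p_vecs @ q_vec / (
        np.linalg.norm(p_vecs, axis=1) * np.linalg.norm(q_vec) + 1e-9)
    order = np.argsort(-sims)[:top_k]
    return [(passages[i], float(sims[i])) for i in order]

passages = ["patience in hardship", "charity to the poor", "rules of fasting"]
print(retrieve("rules of fasting", passages, top_k=1))
```

In a real DPR system the passage vectors would be pre-computed with a trained encoder and stored in an approximate nearest-neighbour index rather than compared exhaustively.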
Model                                           Encoder        MAP    MRR
BM25 (Robertson and Zaragoza, 2009)             -              0.17   0.313
ArabicBERT (Safaya et al., 2020)                bi-encoder     0.511  0.687
                                                cross-encoder  0.292  0.452
CL-AraBERT (Malhas and Elsayed, 2022)           bi-encoder     0.489  0.7
                                                cross-encoder  0.318  0.481
AraBERT (Antoun et al.)                         bi-encoder     0.461  0.662
                                                cross-encoder  0.351  0.54
CAMeL-BERT (Inoue et al., 2021)                 bi-encoder     0.455  0.606
                                                cross-encoder  0.351  0.505
Ensemble ArabicBERT & CL-AraBERT                bi-encoder     0.534  0.73
Ensemble ArabicBERT & CL-AraBERT & CAMeL-BERT   bi-encoder     0.487  0.688
Ensemble ArabicBERT & CL-AraBERT & AraBERT      bi-encoder     0.485  0.682

Table 1: The results on the development set for BM25, the individual Arabic pre-trained models and the ensemble method. MAP is the official evaluation metric. The cross-encoder is used for re-ranking the list of answers output by the bi-encoder.
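MAP, the official metric reported in Table 1, averages over questions the precision obtained at each rank where a gold passage appears. A minimal sketch (not the shared task's official scorer) might look like:

```python
def average_precision(ranked, relevant):
    """AP for one question: mean of precision@k at each rank k where a
    gold (relevant) passage appears, divided by the number of golds."""
    hits, precisions = 0, []
    for k, pid in enumerate(ranked, start=1):
        if pid in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP over a dict: question id -> (ranked passage ids, gold ids)."""
    aps = [average_precision(r, g) for r, g in runs.values()]
    return sum(aps) / len(aps)

runs = {
    "q1": (["p3", "p1", "p7"], {"p1"}),   # gold at rank 2 -> AP 0.5
    "q2": (["p2", "p4"], {"p2", "p4"}),   # golds at ranks 1, 2 -> AP 1.0
}
print(mean_average_precision(runs))  # 0.75
```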

(Alqahtani, 2019) and five available books. It is available in the GitHub repository.[1]

4 Proposed Models

4.1 Task A: Passage Retrieval

Sentence transformers, also known as Sentence-BERT (SBERT), introduced a bi-encoder that transforms a pair of sentences independently and maps them to dense vectors for efficient comparison when performing an information retrieval task (Thakur et al., 2021). Our proposed system uses a bi-encoder method for a semantic search task by further training Arabic pre-trained models with QRCD_v1.1 (Malhas et al., 2022). We also used the cross-encoder "mmarco-mMiniLMv2-L12-H384-v1"[2] for re-ranking; however, it did not improve the performance of the individual models.

Training the Models: We trained a set of four models using the SBERT architecture with the Arabic pre-trained models ArabicBERT (Safaya et al., 2020), CAMeL-BERT (Inoue et al., 2021), AraBERT (Antoun et al.) and CL-AraBERT (Malhas and Elsayed, 2022). Two datasets were used for training the models: the training set of Task A and QRCD_v1.1. Since most of the data were duplicated between QRCD_v1.1 and the training set of the PR task, we used the NoDuplicatesDataLoader function to remove any copies prior to training. We used the MultipleNegativesRankingLoss (MNRL) function, as it allows pairs of similar (positive) sentences to be used for training without labels. Finally, the QPC dataset was encoded with each model. All the models used the following parameters: 5 epochs, a learning rate of 2e-5, a maximum length of 512 and a batch size of 16.

Ensemble Approach: The ensemble method used for this task was to retrieve the top 20 answers from each Arabic pre-trained model. If an answer was found in the outputs of all models, we summed its scores and divided by the number of models to obtain an average score. These answers were then placed at the top of the ranked list in descending order of averaged score. If there were remaining places in the ranked list, we added the answers with the highest scores from any single model. Finally, we capped the ranked list at 10 answers.[3]

4.2 Task B: Machine Reading Comprehension

The pre-trained transformer-based models were the basis of our methodology. As a first step, we fine-tuned all available Arabic pre-trained models with the QRCD_v1.2 training set. There were nine Arabic pre-trained models: AraBERT base, AraBERT large, CAMeL-BERT, ArabicBERT, CL-AraBERT, AraELECTRA (Antoun et al., 2021), MARBERT, ARBERT (Abdul-Mageed et al., 2021) and QARiB (Abdelali et al., 2021). When we conducted our experiments, we set the batch size to 8 for AraBERT large and 16 for the rest of the models, the number of epochs to 4, and the learning rate to 1e-4.

[1] https://github.com/scsaln/HAQA-and-QUQA
[2] https://huggingface.co/nreimers/mmarco-mMiniLMv2-L12-H384-v1
[3] The code can be accessed at https://github.com/AlsalehAbdullah/Quran_PR_Task
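The MultipleNegativesRankingLoss used in "Training the Models" (Section 4.1) needs only (question, relevant passage) pairs: inside a batch, every other question's passage serves as an in-batch negative. A numpy sketch of the loss, assuming cosine scoring with a scale factor as in the Sentence-Transformers default:

```python
import numpy as np

def mnrl(q_emb, p_emb, scale=20.0):
    """Multiple-negatives ranking loss: for row i, passage i is the
    positive and every other passage in the batch is a negative.
    q_emb, p_emb: (batch, dim) question and passage embeddings."""
    q = q_emb / np.linalg.norm(q_emb, axis=1, keepdims=True)
    p = p_emb / np.linalg.norm(p_emb, axis=1, keepdims=True)
    scores = scale * (q @ p.T)  # (batch, batch) scaled cosine similarities
    # cross-entropy over each row, with the diagonal as the target class
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
print(mnrl(q, q))        # each question paired with its own passage
print(mnrl(q, q[::-1]))  # mismatched pairs give a larger loss
```

When each question embedding is most similar to its own passage, the diagonal dominates every row and the loss shrinks, which is why no explicit labels are needed.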

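The averaging scheme of Section 4.1's ensemble approach — average the scores of answers returned by every model, then fill any remaining places with the best single-model scores — can be sketched in a few lines. The dict-of-scores input format is an assumption for illustration:

```python
def ensemble_rank(model_runs, cap=10):
    """model_runs: list of {passage_id: score} dicts (each model's top 20).
    Passages found by every model are ranked first by average score;
    remaining slots take the highest single-model scores."""
    n = len(model_runs)
    common = set.intersection(*(set(r) for r in model_runs))
    ranked = sorted(common,
                    key=lambda p: sum(r[p] for r in model_runs) / n,
                    reverse=True)
    # Fill leftover places with the best scores from any single model.
    rest = sorted(
        {p for r in model_runs for p in r} - common,
        key=lambda p: max(r.get(p, float("-inf")) for r in model_runs),
        reverse=True)
    return (ranked + rest)[:cap]

runs = [{"p1": 0.9, "p2": 0.5, "p3": 0.4},
        {"p1": 0.7, "p2": 0.8, "p4": 0.6}]
print(ensemble_rank(runs))  # ['p1', 'p2', 'p4', 'p3']
```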
Model          QRCD   QRCD+QUQA   QRCD+ARCD   QRCD+QUQA+ARCD
AraBERT Large  0.165  0.482       0.162       -
AraBERT Base   0.402  0.458       0.433       0.49
MARBERT        0.326  0.089       0.291       -
ARBERT         0.357  0.38        0.343       -
QARiB          0.307  0.301       0.278       -
CAMeL-BERT     0.401  0.406       0.362       -
ArabicBERT     0.332  0.330       0.313       -
AraELECTRA     0.332  0.248       0.218       -
CL-AraBERT     0.373  0.383       0.358       -

Table 2: pAP@10 results of the different Arabic pre-trained models fine-tuned using different combinations of the datasets.

We attempted to improve the performance using the following three optimisation approaches:[4]

Transfer Learning: We conducted three experiments using this approach, further fine-tuning the models with different datasets. In the first experiment, we used the CA dataset QUQA. Second, the MSA dataset ARCD was used. Finally, a combination of QUQA and ARCD was used, but only for the models that showed an improvement in performance when using one of these two datasets individually.

Ensemble Approach: We used majority voting among the models to produce the final ranked-list results. We took the top 20 answers with their scores for each question from each model. We then computed the total score for each answer, which was the sum of the scores that answer obtained across all models. After that, we sorted the answers for each question based on the total score. Finally, we adopted the top 10 answers as the final ranked list.

Post-Processing: There were two issues when producing the ranked list: uninformative answers (as shown in Figure 1) and overlapping answers (as shown in Figure 2). The first issue was solved by removing these answers from the list. The second was overcome by applying a redundancy elimination algorithm (ElKomy and Sarhan, 2022).

Model                    pAP@10
Ensemble Vanilla (All)   0.466
Ensemble Vanilla (Best)  0.517
Ensemble POST (Best)     0.537

Table 3: The results of the ensemble approach. Ensemble Vanilla (All) refers to applying the ensemble approach to all models. Ensemble Vanilla (Best) refers to applying the ensemble approach to the two best-performing models (bert-large-arabertv02 and bert-base-arabertv02). Ensemble POST (Best) refers to Vanilla (Best) after applying the post-processing step.

5 Results and Discussion

5.1 Task A: Passage Retrieval

The official evaluation metric used for this task was mean average precision (MAP), but the mean reciprocal rank (MRR) was also reported.

Validation: As for the validation results, BM25 scored the lowest, with a 0.17 MAP. Among the pre-trained models, ArabicBERT performed the best of the individual models using a bi-encoder with a 0.511 MAP, while the ensemble of ArabicBERT and CL-AraBERT performed the best overall on the validation set with 0.534. Therefore, to address RQ1, the Arabic pre-trained models outperformed BM25 (see Table 1).

Testing: For the test set, we chose three methods based on their performance on the validation set: ArabicBERT, CL-AraBERT and the ensemble of ArabicBERT and CL-AraBERT. The test set results did not reflect the performance on the validation set, as can be seen in Table 4. CL-AraBERT performed the best with a 0.124 MAP, while the ensemble method was a close second with a 0.117 MAP. The ensemble method and CL-AraBERT successfully answered two questions with a perfect MAP score of 1, while 21 questions scored a MAP of 0.

[4] The code can be accessed at https://github.com/scsaln/RC
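The Task B ensemble and post-processing steps described in Section 4.2 — sum each answer's scores across models, rank by total score, then eliminate overlapping spans — can be sketched as a small pipeline. The (start, end, score) character-span representation is an illustrative assumption, not the system's actual data format:

```python
from collections import defaultdict

def ensemble_and_clean(model_answers, top_n=10):
    """model_answers: per-model lists of (start, end, score) answer spans.
    Sum each span's scores across models (majority-style voting), rank by
    total, then keep a span only if it does not overlap a higher-ranked
    one (the redundancy-elimination idea)."""
    totals = defaultdict(float)
    for answers in model_answers:
        for start, end, score in answers:
            totals[(start, end)] += score
    ranked = sorted(totals, key=totals.get, reverse=True)
    kept = []
    for start, end in ranked:
        # keep only spans that do not overlap an already-kept span
        if all(end <= s or start >= e for s, e in kept):
            kept.append((start, end))
        if len(kept) == top_n:
            break
    return kept

preds = [[(0, 10, 0.6), (5, 15, 0.3), (20, 30, 0.2)],
         [(5, 15, 0.5), (20, 30, 0.4)]]
print(ensemble_and_clean(preds))  # [(5, 15), (20, 30)]
```

Uninformative answers could be filtered out with an additional predicate before ranking; the sketch omits that step.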
Some of these happened to be no-answer questions, which the models failed to identify.

Model       MAP    MRR
Ensemble    0.117  0.36
ArabicBERT  0.07   0.20
CL-AraBERT  0.124  0.375

Table 4: Test set results of Task A.

5.2 Task B: Machine Reading Comprehension

The evaluation metric for Task B was partial average precision (pAP) (Malhas and Elsayed, 2022, 2020).

Validation: Column 'QRCD' in Table 2 presents the results of the models when they were fine-tuned using only the QRCD dataset. The AraBERT base model outperformed the other models with a 0.402 pAP@10.

First, we addressed RQ2, which asked whether the combination of the three factors enhanced the performance of the Qur'an MRC models. For the first factor, we further fine-tuned the models using the CA dataset QUQA and/or the MSA dataset ARCD. The results are shown in columns 'QRCD + QUQA', 'QRCD + ARCD' and 'QRCD + QUQA + ARCD' in Table 2. There are three interesting observations in the results. First, using the QUQA dataset led to improvements in more than half of the models. The best score was a pAP@10 of 0.482, obtained by AraBERT large. Second, training with the ARCD dataset enhanced the performance of the AraBERT base model only, with a 0.433 pAP@10. Third, using QUQA and ARCD at the same time to train AraBERT base improved the results, with a 0.49 pAP@10, compared to using QUQA and ARCD separately. For the second factor, we used the ensemble method over all the models; however, this approach did not yield the best performance, with a result of 0.466 pAP@10. We then ensembled the two best performing individual models, which were AraBERT base and AraBERT large. The results outperformed the other models with 0.517. For the third factor, we note that the post-processing step improved the results, based on the 'Ensemble POST (Best)' row shown in Table 3.

Testing: For the test set, we chose two methods based on the performance on the development set. They were (1) the ensemble of AraBERT base and AraBERT large with post-processing and (2) the AraBERT base model. The ensemble with the post-processing approach achieved a 0.498 pAP@10, while the AraBERT base model achieved the best performance with a 0.5 pAP@10, as can be seen in Table 5.

Model                 pAP@10
Ensemble POST (Best)  0.498
AraBERT Base          0.5

Table 5: Test set results of Task B.

When we analysed the model answers to questions from the development set, we identified the following: The model worked as a simple matching model. When part of the passage contained words from the question, it retrieved this part as an answer, even though its meaning did not answer the question (see Figure 3). Consequently, the system failed to predict the correct answer when the answer only shared semantically similar, rather than identical, words with the question (see Figure 4).

6 Conclusion

This paper presented our contributions to Task A (PR) and Task B (MRC) of the Qur'an QA 2023 shared task. Our proposed PR method was to train several Arabic pre-trained models on the QRCD dataset using the SBERT architecture and then ensemble combinations of these models. The ensemble method did not yield the best performance on the test set, although it had the best score on the development set. CL-AraBERT achieved the best results with a 0.124 MAP. Our proposed MRC system is based on combining the transfer learning and ensemble approaches for the best-performing models. Initially, we fine-tuned nine different Arabic pre-trained models using different data collections. We then applied the ensemble approach to the two best-performing models. Finally, we implemented appropriate post-processing steps. The combination of the base and large variants of AraBERT achieved the best results on the development set, with a 0.537 pAP@10. The second-highest score was achieved by base AraBERT with a 0.49 pAP@10. The results of applying these two models to the test set showed that the base AraBERT model was the best, with a score of 0.5 pAP@10, while the ensemble model achieved a score of 0.49 pAP@10.
Limitations

One of the most important factors affecting the performance of pre-trained models is the size of the dataset. The size of the dataset used for training in this study is minuscule compared to the size of the data available for the English language. Therefore, we noticed weak performance of the models in Arabic. There is an urgent need to build large data collections in Arabic.

Acknowledgement

The authors would like to express their deepest gratitude to King Abdulaziz University for the support. We thank the reviewers for their constructive comments.

References

Ahmed Abdelali, Sabit Hassan, Hamdy Mubarak, Kareem Darwish, and Younes Samih. 2021. Pre-training BERT on Arabic tweets: Practical considerations.

Muhammad Abdul-Mageed, AbdelRahim Elmadany, and El Moatez Billah Nagoudi. 2021. ARBERT & MARBERT: Deep bidirectional transformers for Arabic. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 7088–7105, Online. Association for Computational Linguistics.

Esha Aftab and Muhammad Kamran Malik. 2022. eRock at Qur'an QA 2022: Contemporary deep neural networks for Qur'an based reading comprehension question answers. In Proceedings of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection, pages 96–103.

Mohammad Mushabbab A. Alqahtani. 2019. Quranic Arabic semantic search model based on ontology of concepts. Ph.D. thesis, University of Leeds.

Kholoud Alsubhi, Amani Jamal, and Areej Alhothali. 2021. Pre-trained transformer-based approach for Arabic question answering: A comparative study. arXiv preprint arXiv:2111.05671.

Kholoud Alsubhi, Amani Jamal, and Areej M. Alhothali. 2022. Deep learning-based approach for Arabic open domain question answering. PeerJ Computer Science, 8.

Wissam Antoun, Fady Baly, and Hazem Hajj. AraBERT: Transformer-based model for Arabic language understanding. In LREC 2020 Workshop Language Resources and Evaluation Conference 11–16 May 2020, page 9.

Wissam Antoun, Fady Baly, and Hazem Hajj. 2021. AraELECTRA: Pre-training text discriminators for Arabic language understanding. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, pages 191–195, Kyiv, Ukraine (Virtual). Association for Computational Linguistics.

Jonathan H. Clark, Eunsol Choi, Michael Collins, Dan Garrette, Tom Kwiatkowski, Vitaly Nikolaev, and Jennimaria Palomaki. 2020. TyDi QA: A benchmark for information-seeking question answering in typologically diverse languages. Transactions of the Association for Computational Linguistics, 8:454–470.

Mohammed ElKomy and Amany M. Sarhan. 2022. TCE at Qur'an QA 2022: Arabic language question answering over Holy Qur'an using a post-processed ensemble of BERT-based models. arXiv preprint arXiv:2206.01550.

Go Inoue, Bashar Alhafni, Nurpeiis Baimukan, Houda Bouamor, and Nizar Habash. 2021. The interplay of variant, size, and task type in Arabic pre-trained language models. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, pages 92–104, Kyiv, Ukraine (Virtual). Association for Computational Linguistics.

Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 2020. Dense passage retrieval for open-domain question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6769–6781, Online. Association for Computational Linguistics.

Rana Malhas and Tamer Elsayed. 2020. AyaTEC: Building a reusable verse-based test collection for Arabic question answering on the Holy Qur'an. ACM Transactions on Asian and Low-Resource Language Information Processing, 19(6).

Rana Malhas and Tamer Elsayed. 2022. Arabic machine reading comprehension on the Holy Qur'an using CL-AraBERT. Information Processing & Management, 59(6):103068.

Rana Malhas, Watheq Mansour, and Tamer Elsayed. 2022. Qur'an QA 2022: Overview of the first shared task on question answering over the Holy Qur'an. In Proceedings of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection, pages 79–87, Marseille, France. European Language Resources Association.

Rana Malhas, Watheq Mansour, and Tamer Elsayed. 2023. Qur'an QA 2023 shared task: Overview of passage retrieval and reading comprehension tasks over the Holy Qur'an. In Proceedings of the First Arabic Natural Language Processing Conference (ArabicNLP 2023), Singapore.

Ali Mostafa and Omar Mohamed. 2022. GOF at Qur'an QA 2022: Towards an efficient question answering
for the Holy Qur'an in the Arabic language using a deep learning-based approach. In Proceedings of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection, pages 104–111.

Hussein Mozannar, Karl El Hajal, Elie Maamary, and Hazem Hajj. 2019. Neural Arabic question answering. arXiv preprint arXiv:1906.05394.

Stephen Robertson and Hugo Zaragoza. 2009. The probabilistic relevance framework: BM25 and beyond. Foundations and Trends in Information Retrieval, 3(4):333–389.

Devendra Sachan, Mike Lewis, Mandar Joshi, Armen Aghajanyan, Wen-tau Yih, Joelle Pineau, and Luke Zettlemoyer. 2022. Improving passage retrieval with zero-shot question generation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 3781–3797, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.

Ali Safaya, Moutasem Abdullatif, and Deniz Yuret. 2020. KUISAIL at SemEval-2020 task 12: BERT-CNN for offensive speech identification in social media. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 2054–2059, Barcelona (online). International Committee for Computational Linguistics.

Claude Sammut and Geoffrey I. Webb, editors. 2010. TF–IDF, pages 986–987. Springer US, Boston, MA.

Ahmed Sleem, Eman Mohammed Lotfy Elrefai, Marwa Mohammed Matar, and Haq Nawaz. 2022. Stars at Qur'an QA 2022: Building automatic extractive question answering systems for the Holy Qur'an with transformer models and releasing a new dataset. In Proceedings of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection, pages 146–153.

Nandan Thakur, Nils Reimers, Johannes Daxenberger, and Iryna Gurevych. 2021. Augmented SBERT: Data augmentation method for improving bi-encoders for pairwise sentence scoring tasks. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 296–310, Online. Association for Computational Linguistics.

A QRCD Dataset Distribution

In this appendix, Table 6 presents the distribution of the dataset.

Dataset      #Q   #Q-P Pairs  #Q-P-A Triplets
Training     174  992         1179
Development  25   163         220

Table 6: QRCD distribution. #Q shows the number of questions, #Q-P Pairs shows the number of question–passage pairs and #Q-P-A Triplets shows the number of question–passage–answer triplets.

B The Problems of the List of Answers

In this appendix, Figure 1 and Figure 2 present the problems we encountered in the list of answers.

C The Analysis and Discussion of Task B

In this appendix, Figure 3 and Figure 4 present the discussion of Task B (Machine Reading Comprehension).
Figure 1: Example of an uninformative answer.

Figure 2: Example of repeated answers.

Figure 3: Example 1 of an incorrect answer.

Figure 4: Example 2 of an incorrect answer.

