
LKAU23 at Qur'an QA 2023: Using Transformer Models for Retrieving Passages and Finding Answers to Questions from the Qur'an

Sarah Alnefaie[1][2], Abdullah N. Alsaleh[1][2], Eric Atwell[2], Mohammed Ammar Alsalka[2], Abdulrahman Altahhan[2]
[1] King Abdulaziz University
[2] University of Leeds
{scsaln, scanaa, e.s.atwell, m.a.alsalka, a.altahhan}@leeds.ac.uk

Abstract

The Qur'an QA 2023 shared task has two subtasks: the Passage Retrieval (PR) task and the Machine Reading Comprehension (MRC) task. Our participation in the PR task was to further train several Arabic pre-trained models using a Sentence-Transformers architecture and to ensemble the best performing models. The results on the test set did not reflect the results on the development set. CL-AraBERT achieved the best results, with a 0.124 MAP. We also participated in the MRC task by further fine-tuning the base and large variants of AraBERT using Classical Arabic and Modern Standard Arabic datasets. Base AraBERT achieved the best result on the development set with a partial average precision (pAP) of 0.49, while it achieved 0.5 on the test set. In addition, we applied an ensemble approach to the best performing models and post-processing steps to the final results. Our experiments with the development set showed that our proposed model achieved a 0.537 pAP. On the test set, our system obtained a pAP score of 0.49.

1 Introduction

The Arabic language poses many challenges in Natural Language Processing (NLP), including in the areas of Machine Reading Comprehension (MRC) and Passage Retrieval (PR). One of the most prominent recent NLP techniques applied to MRC and PR tasks in the Arabic language is pre-trained transformer-based models, which can achieve state-of-the-art performance (Alsubhi et al., 2021, 2022).

There are PR studies that use a dense approach based on pre-trained models (Karpukhin et al., 2020). This approach has outperformed traditional information retrieval, such as TF-IDF (Sammut and Webb, 2010), with Modern Standard Arabic (MSA; Alsubhi et al., 2022). To our knowledge, the dense approach has not been researched with Classical Arabic (CA). Therefore, our proposed system for Task A of the Qur'an QA 2023 shared task uses the dense approach by fine-tuning Arabic pre-trained models and then ensembling the best-scoring ones. The idea of Task A is to build a system that returns a list of Qur'anic passages that contain answers to a posed question/query (Malhas et al., 2023). However, the challenging aspect of this task is that some questions do not have an answer in the Qur'an. The first research question, RQ1, is: Does using Arabic pre-trained models in PR for CA outperform a traditional approach such as BM25?

Most recent studies on the Qur'an MRC task have tended to use transformer-based models along with the Qur'anic Reading Comprehension Dataset (QRCD) (Malhas et al., 2022). We noticed that they improved the performance of their systems using three approaches: (1) using additional MSA and/or CA datasets in fine-tuning (Mostafa and Mohamed, 2022; Aftab and Malik, 2022), (2) constructing an ensemble of different BERT models, and (3) applying appropriate post-processing steps to the final ranked list (ElKomy and Sarhan, 2022). To the best of our knowledge, no studies have combined these three approaches. Therefore, we applied the combination of those approaches for Task B of the Qur'an QA 2023 shared task. The goal of Task B was to build a model that takes a Qur'anic passage and an MSA question as input and extracts a ranked list of up to 10 answer spans for that question from the passage as output (Malhas et al., 2023). The new challenge in the second version of this task was that some questions have no answers. The second research question, RQ2, is: Does the combination of fine-tuning the models with a large CA dataset and/or MSA dataset, ensembling these models and then applying post-processing steps improve the results?

The paper's structure is as follows: In Section 2, related work is presented. Section 3 describes the datasets.
Proceedings of The First Arabic Natural Language Processing Conference (ArabicNLP 2023), pages 720–727
December 7, 2023 ©2023 Association for Computational Linguistics
This is followed by Section 4, which explains the proposed models. In Section 5, the results are discussed. Finally, the paper provides a conclusion.

2 Related Work

2.1 Task A: Passage Retrieval

Karpukhin et al. (2020) proposed their dense passage retrieval (DPR) system using the uncased BERT base model. Their system applies dual encoders so that passages are transformed into dense real-valued vectors, and then builds an index over all passages for retrieval. The input query is then encoded and mapped into the same vector space, and the passages nearest the query vector are retrieved. Their approach outperformed multiple other open-domain QA techniques on several QA datasets such as TriviaQA and SQuAD. Sachan et al. (2022) proposed the unsupervised passage re-ranker (UPR), in which the system utilizes zero-shot question generation for re-ranking passages in order to improve passage retrieval. It then computes relevance scores over the generated questions and sorts the results. Their approach outperformed DPR (Karpukhin et al., 2020) on several datasets, such as SQuAD and TriviaQA. Finally, Alsubhi et al. (2022) proposed a multilingual DPR model that was fine-tuned on Arabic datasets. Their model outperformed TF-IDF on two Arabic datasets, ARCD (Mozannar et al., 2019) and TyDiQA-GoldP (Clark et al., 2020).

2.2 Task B: Machine Reading Comprehension

Recently, several researchers have built MRC systems to answer questions about the Qur'an. All these studies used QRCD_v1.1 in the fine-tuning and evaluation phases (Malhas et al., 2022; Malhas and Elsayed, 2022). Some studies have proposed further fine-tuning the model using MSA datasets (Mostafa and Mohamed, 2022; Malhas and Elsayed, 2022). Mostafa and Mohamed (2022) developed an Arabic Qur'an MRC model by fine-tuning the AraELECTRA model using three MSA datasets: Ar-TyDi, Arabic-SQuAD and the Arabic Reading Comprehension Dataset (ARCD). Their model achieved a 0.54 pRR, 0.52 F1@1 and 0.23 EM. Other studies have proposed fine-tuning the model using a CA dataset. Sleem et al. (2022) fine-tuned AraBERTv02 using the Arabic Al-Qur'an Question and Answer Corpus (AQQAC) (Alqahtani, 2019). This model achieved scores of 0.52 pRR, 0.5 F1@1 and 0.25 EM.

ElKomy and Sarhan (2022) used the training and development sets of QRCD_v1.1 to fine-tune five different Arabic BERT models. They then used these five models individually to find the answers for the QRCD test set. To obtain good results, they implemented an ensemble approach over the results of these models. Finally, post-processing was applied to enhance the results. The results showed a pRR of 56.6, an EM of 26.8 and an F1@1 of 0.50.

To the best of our knowledge, no study has been conducted on the impact of combining the following three factors in building an Arabic Qur'an MRC model: First, Arabic pre-trained models are fine-tuned using CA and MSA datasets. Second, an ensembling approach is applied to the results using majority voting. Finally, the final list is refined through several post-processing steps.

3 Datasets

3.1 Task A: Passage Retrieval

The data comprised the Qur'anic passage collection (QPC) and questions from AyaTEC (Malhas and Elsayed, 2020). The QPC was developed by segmenting the Qur'an passages into topics, which resulted in 1,266 passages. There were 199 questions that were derived from the AyaTEC dataset. The Query Relevance Judgements (QRels) dataset contained 1,132 gold (answer-bearing) Qur'anic passages that answered the questions; these data were used in the training and development sets. Finally, the distribution of the dataset was 70%, 10% and 20% for the training, development and testing sets respectively.

3.2 Task B: Machine Reading Comprehension

In this study, we used three different datasets, as follows:

QRCD: QRCD_v1.2 consists of 1,399 question–passage–answer triplets in the training and development splits, as shown in Table 6. It was split 70%, 10%, and 20% for the training, development and test sets respectively (Malhas and Elsayed, 2022, 2020).

ARCD: It consists of 1,395 question–passage–answer triplets for Wikipedia passages (Mozannar et al., 2019).

Quran Question–Answer pairs (QUQA): It consists of 3,382 question–passage–answer triplets regarding the Arabic Holy Qur'an. This dataset was built using the available Qur'an AQQAC dataset
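The dual-encoder retrieval idea of Section 2.1 can be illustrated with a minimal, self-contained sketch. The bag-of-words `embed` function below is a hypothetical stand-in for a fine-tuned BERT encoder; only the encode-index-rank flow mirrors DPR.

```python
import hashlib
import numpy as np

def embed(text, dim=32):
    # Toy stand-in for a dense encoder: each token maps to a fixed
    # pseudo-random vector, and a text is the sum of its token vectors.
    vec = np.zeros(dim)
    for tok in text.lower().split():
        seed = int.from_bytes(hashlib.sha256(tok.encode()).digest()[:8], "little")
        vec += np.random.default_rng(seed).standard_normal(dim)
    return vec

def retrieve(question, passages, top_k=2):
    # Dense retrieval: encode all passages once (the "index"), encode
    # the query, then rank passages by cosine similarity to the query.
    p_vecs = np.array([embed(p) for p in passages])
    q_vec = embed(question)
    sims = p_vecs @ q_vec / (
        np.linalg.norm(p_vecs, axis=1) * np.linalg.norm(q_vec) + 1e-9)
    order = np.argsort(-sims)[:top_k]
    return [(passages[i], float(sims[i])) for i in order]

passages = ["patience in hardship", "charity to the poor", "rules of fasting"]
print(retrieve("rules of fasting", passages, top_k=1))
```

In a real DPR system the passage vectors would be pre-computed with a trained encoder and stored in an approximate nearest-neighbour index rather than compared exhaustively.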
Model                                           Encoder        MAP    MRR
BM25 (Robertson and Zaragoza, 2009)             -              0.17   0.313
ArabicBERT (Safaya et al., 2020)                bi-encoder     0.511  0.687
                                                cross-encoder  0.292  0.452
CL-AraBERT (Malhas and Elsayed, 2022)           bi-encoder     0.489  0.7
                                                cross-encoder  0.318  0.481
AraBERT (Antoun et al.)                         bi-encoder     0.461  0.662
                                                cross-encoder  0.351  0.54
CAMeL-BERT (Inoue et al., 2021)                 bi-encoder     0.455  0.606
                                                cross-encoder  0.351  0.505
Ensemble ArabicBERT & CL-AraBERT                bi-encoder     0.534  0.73
Ensemble ArabicBERT & CL-AraBERT & CAMeL-BERT   bi-encoder     0.487  0.688
Ensemble ArabicBERT & CL-AraBERT & AraBERT      bi-encoder     0.485  0.682

Table 1: The results on the development set for BM25, the individual Arabic pre-trained models and the ensemble method. MAP is the official evaluation metric. The cross-encoder is used for re-ranking the list of answers output by the bi-encoder.
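MAP, the official metric reported in Table 1, averages over questions the precision obtained at each rank where a gold passage appears. A minimal sketch (not the shared task's official scorer) might look like:

```python
def average_precision(ranked, relevant):
    """AP for one question: mean of precision@k at each rank k where a
    gold (relevant) passage appears, divided by the number of golds."""
    hits, precisions = 0, []
    for k, pid in enumerate(ranked, start=1):
        if pid in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP over a dict: question id -> (ranked passage ids, gold ids)."""
    aps = [average_precision(r, g) for r, g in runs.values()]
    return sum(aps) / len(aps)

runs = {
    "q1": (["p3", "p1", "p7"], {"p1"}),   # gold at rank 2 -> AP 0.5
    "q2": (["p2", "p4"], {"p2", "p4"}),   # golds at ranks 1, 2 -> AP 1.0
}
print(mean_average_precision(runs))  # 0.75
```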

(Alqahtani, 2019) and five available books. It is available in the GitHub repository.[1]

4 Proposed Models

4.1 Task A: Passage Retrieval

Sentence transformers, also known as Sentence-BERT (SBERT), introduced a bi-encoder that transforms a pair of sentences independently and maps them to dense vectors for efficient comparison when performing an information retrieval task (Thakur et al., 2021). Our proposed system uses a bi-encoder method for a semantic search task by further training Arabic pre-trained models with QRCD_v1.1 (Malhas et al., 2022). We also used the cross-encoder "mmarco-mMiniLMv2-L12-H384-v1"[2] for re-ranking; however, it did not improve the performance of the individual models.

Training the Models: We trained a set of four models using the SBERT architecture with the Arabic pre-trained models ArabicBERT (Safaya et al., 2020), CAMeL-BERT (Inoue et al., 2021), AraBERT (Antoun et al.) and CL-AraBERT (Malhas and Elsayed, 2022). Two datasets were used for training the models: the training set of Task A and QRCD_v1.1. Since most of the data were duplicated between QRCD_v1.1 and the training set of the PR task, we used the NoDuplicatesDataLoader function to remove any copies prior to training. We used the MultipleNegativesRankingLoss (MNRL) function, as it allows pairs of similar (positive) sentences to be used for training without labels. Finally, the QPC dataset was encoded with each model. All the models used the following parameters: 5 epochs, a learning rate of 2e-5, a maximum length of 512 and a batch size of 16.

Ensemble Approach: The ensemble method used for this task was to retrieve the top 20 answers from each Arabic pre-trained model. If an answer was found in the outputs of all models, we summed its scores and divided by the number of models to obtain an average score. These answers were then placed at the top of the ranked list in descending order of averaged score. If there were remaining places in the ranked list, we added the answers with the highest scores from any single model. Finally, we capped the ranked list at 10 answers.[3]

4.2 Task B: Machine Reading Comprehension

The pre-trained transformer-based models were the basis of our methodology. As a first step, we fine-tuned all available Arabic pre-trained models with the QRCD_v1.2 training set. There were nine Arabic pre-trained models: AraBERT base, AraBERT large, CAMeL-BERT, ArabicBERT, CL-AraBERT, AraELECTRA (Antoun et al., 2021), MARBERT, ARBERT (Abdul-Mageed et al., 2021) and QARiB (Abdelali et al., 2021). When we conducted our experiments, we set the batch size to 8 for AraBERT large and 16 for the rest of the models, the number of epochs to 4, and the learning rate to 1e-4.

[1] https://github.com/scsaln/HAQA-and-QUQA
[2] https://huggingface.co/nreimers/mmarco-mMiniLMv2-L12-H384-v1
[3] The code can be accessed at https://github.com/AlsalehAbdullah/Quran_PR_Task
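The MultipleNegativesRankingLoss used in "Training the Models" (Section 4.1) needs only (question, relevant passage) pairs: inside a batch, every other question's passage serves as an in-batch negative. A numpy sketch of the loss, assuming cosine scoring with a scale factor as in the Sentence-Transformers default:

```python
import numpy as np

def mnrl(q_emb, p_emb, scale=20.0):
    """Multiple-negatives ranking loss: for row i, passage i is the
    positive and every other passage in the batch is a negative.
    q_emb, p_emb: (batch, dim) question and passage embeddings."""
    q = q_emb / np.linalg.norm(q_emb, axis=1, keepdims=True)
    p = p_emb / np.linalg.norm(p_emb, axis=1, keepdims=True)
    scores = scale * (q @ p.T)  # (batch, batch) scaled cosine similarities
    # cross-entropy over each row, with the diagonal as the target class
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
print(mnrl(q, q))        # each question paired with its own passage
print(mnrl(q, q[::-1]))  # mismatched pairs give a larger loss
```

When each question embedding is most similar to its own passage, the diagonal dominates every row and the loss shrinks, which is why no explicit labels are needed.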

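The averaging scheme of Section 4.1's ensemble approach — average the scores of answers returned by every model, then fill any remaining places with the best single-model scores — can be sketched in a few lines. The dict-of-scores input format is an assumption for illustration:

```python
def ensemble_rank(model_runs, cap=10):
    """model_runs: list of {passage_id: score} dicts (each model's top 20).
    Passages found by every model are ranked first by average score;
    remaining slots take the highest single-model scores."""
    n = len(model_runs)
    common = set.intersection(*(set(r) for r in model_runs))
    ranked = sorted(common,
                    key=lambda p: sum(r[p] for r in model_runs) / n,
                    reverse=True)
    # Fill leftover places with the best scores from any single model.
    rest = sorted(
        {p for r in model_runs for p in r} - common,
        key=lambda p: max(r.get(p, float("-inf")) for r in model_runs),
        reverse=True)
    return (ranked + rest)[:cap]

runs = [{"p1": 0.9, "p2": 0.5, "p3": 0.4},
        {"p1": 0.7, "p2": 0.8, "p4": 0.6}]
print(ensemble_rank(runs))  # ['p1', 'p2', 'p4', 'p3']
```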
Model          QRCD   QRCD+QUQA   QRCD+ARCD   QRCD+QUQA+ARCD
AraBERT Large  0.165  0.482       0.162       -
AraBERT Base   0.402  0.458       0.433       0.49
MARBERT        0.326  0.089       0.291       -
ARBERT         0.357  0.38        0.343       -
QARiB          0.307  0.301       0.278       -
CAMeL-BERT     0.401  0.406       0.362       -
ArabicBERT     0.332  0.330       0.313       -
AraELECTRA     0.332  0.248       0.218       -
CL-AraBERT     0.373  0.383       0.358       -

Table 2: pAP@10 results of the different Arabic pre-trained models fine-tuned using different combinations of the datasets.

We attempted to improve the performance using the following three optimisation approaches:[4]

Transfer Learning: We conducted three experiments using this approach, further fine-tuning the models with different datasets. In the first experiment, we used the CA dataset QUQA. Second, the MSA dataset ARCD was used. Finally, a combination of QUQA and ARCD was used, but only for the models that showed an improvement in performance when using one of these two datasets individually.

Ensemble Approach: We used majority voting among the models to produce the final ranked-list results. We took the top 20 answers with their scores for each question from each model. We then computed the total score for each answer, which was the sum of the scores that answer obtained across all models. After that, we sorted the answers for each question based on the total score. Finally, we adopted the top 10 answers as the final ranked list.

Post-Processing: There were two issues when producing the ranked list: uninformative answers (as shown in Figure 1) and overlapping answers (as shown in Figure 2). The first issue was solved by removing these answers from the list. The second was overcome by applying a redundancy elimination algorithm (ElKomy and Sarhan, 2022).

Model                    pAP@10
Ensemble Vanilla (All)   0.466
Ensemble Vanilla (Best)  0.517
Ensemble POST (Best)     0.537

Table 3: The results of the ensemble approach. Ensemble Vanilla (All) refers to applying the ensemble approach to all models. Ensemble Vanilla (Best) refers to applying the ensemble approach to the two best-performing models (bert-large-arabertv02 and bert-base-arabertv02). Ensemble POST (Best) refers to Vanilla (Best) after applying the post-processing step.

5 Results and Discussion

5.1 Task A: Passage Retrieval

The official evaluation metric used for this task was mean average precision (MAP), but the mean reciprocal rank (MRR) was also reported.

Validation: As for the validation results, BM25 scored the lowest, with a 0.17 MAP. Among the pre-trained models, ArabicBERT performed the best of the individual models using a bi-encoder with a 0.511 MAP, while the ensemble of ArabicBERT and CL-AraBERT performed the best overall on the validation set with 0.534. Therefore, to address RQ1, the Arabic pre-trained models outperformed BM25 (see Table 1).

Testing: For the test set, we chose three methods based on their performance on the validation set: ArabicBERT, CL-AraBERT and the ensemble of ArabicBERT and CL-AraBERT. The test set results did not reflect the performance on the validation set, as can be seen in Table 4. CL-AraBERT performed the best with a 0.124 MAP, while the ensemble method was a close second with a 0.117 MAP. The ensemble method and CL-AraBERT successfully answered two questions with a perfect MAP score of 1, while 21 questions scored a MAP of 0.

[4] The code can be accessed at https://github.com/scsaln/RC
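The Task B ensemble and post-processing steps described in Section 4.2 — sum each answer's scores across models, rank by total score, then eliminate overlapping spans — can be sketched as a small pipeline. The (start, end, score) character-span representation is an illustrative assumption, not the system's actual data format:

```python
from collections import defaultdict

def ensemble_and_clean(model_answers, top_n=10):
    """model_answers: per-model lists of (start, end, score) answer spans.
    Sum each span's scores across models (majority-style voting), rank by
    total, then keep a span only if it does not overlap a higher-ranked
    one (the redundancy-elimination idea)."""
    totals = defaultdict(float)
    for answers in model_answers:
        for start, end, score in answers:
            totals[(start, end)] += score
    ranked = sorted(totals, key=totals.get, reverse=True)
    kept = []
    for start, end in ranked:
        # keep only spans that do not overlap an already-kept span
        if all(end <= s or start >= e for s, e in kept):
            kept.append((start, end))
        if len(kept) == top_n:
            break
    return kept

preds = [[(0, 10, 0.6), (5, 15, 0.3), (20, 30, 0.2)],
         [(5, 15, 0.5), (20, 30, 0.4)]]
print(ensemble_and_clean(preds))  # [(5, 15), (20, 30)]
```

Uninformative answers could be filtered out with an additional predicate before ranking; the sketch omits that step.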
Some of these happened to be no-answer questions, which the models failed to identify.

Model       MAP    MRR
Ensemble    0.117  0.36
ArabicBERT  0.07   0.20
CL-AraBERT  0.124  0.375

Table 4: Test set results of Task A.

5.2 Task B: Machine Reading Comprehension

The evaluation metric for Task B was partial average precision (pAP) (Malhas and Elsayed, 2022, 2020).

Validation: Column 'QRCD' in Table 2 presents the results of the models when they were fine-tuned using only the QRCD dataset. The AraBERT base model outperformed the other models with a 0.402 pAP@10.

First, we addressed RQ2, which asked whether the combination of the three factors enhanced the performance of the Qur'an MRC models. For the first factor, we further fine-tuned the models using the CA dataset QUQA and/or the MSA dataset ARCD. The results are shown in columns 'QRCD + QUQA', 'QRCD + ARCD' and 'QRCD + QUQA + ARCD' in Table 2. There are three interesting observations in the results. First, using the QUQA dataset led to improvements in more than half of the models. The best score was a pAP@10 of 0.482, obtained by AraBERT large. Second, training with the ARCD dataset enhanced the performance of the AraBERT base model only, with a 0.433 pAP@10. Third, using QUQA and ARCD at the same time to train AraBERT base improved the results, with a 0.49 pAP@10, compared to using QUQA and ARCD separately. For the second factor, we used the ensemble method over all the models; however, this approach did not yield the best performance, with a result of 0.466 pAP@10. We then ensembled the two best performing individual models, which were AraBERT base and AraBERT large. The results outperformed the other models with 0.517. For the third factor, we note that the post-processing step improved the results, based on the 'Ensemble POST (Best)' row shown in Table 3.

Testing: For the test set, we chose two methods based on the performance on the development set. They were (1) the ensemble of AraBERT base and AraBERT large with post-processing and (2) the AraBERT base model. The ensemble with the post-processing approach achieved a 0.498 pAP@10, while the AraBERT base model achieved the best performance with a 0.5 pAP@10, as can be seen in Table 5.

Model                 pAP@10
Ensemble POST (Best)  0.498
AraBERT Base          0.5

Table 5: Test set results of Task B.

When we analysed the model answers to questions from the development set, we identified the following: The model worked as a simple matching model. When part of the passage contained words from the question, it retrieved this part as an answer, even though its meaning did not answer the question (see Figure 3). Consequently, the system failed to predict the correct answer when the answer only shared semantically similar, rather than identical, words with the question (see Figure 4).

6 Conclusion

This paper presented our contributions to Task A (PR) and Task B (MRC) of the Qur'an QA 2023 shared task. Our proposed PR method was to train several Arabic pre-trained models on the QRCD dataset using the SBERT architecture and then ensemble combinations of these models. The ensemble method did not yield the best performance on the test set, although it had the best score on the development set. CL-AraBERT achieved the best results with a 0.124 MAP. Our proposed MRC system is based on combining the transfer learning and ensemble approaches for the best-performing models. Initially, we fine-tuned nine different Arabic pre-trained models using different data collections. We then applied the ensemble approach to the two best-performing models. Finally, we implemented appropriate post-processing steps. The combination of the base and large variants of AraBERT achieved the best results on the development set, with a 0.537 pAP@10. The second-highest score was achieved by base AraBERT with a 0.49 pAP@10. The results of applying these two models to the test set showed that the base AraBERT model was the best, with a score of 0.5 pAP@10, while the ensemble model achieved a score of 0.49 pAP@10.
Limitations

One of the most important factors affecting the performance of pre-trained models is the size of the dataset. The size of the dataset used for training in this study is minuscule compared to the size of the data available for the English language. Therefore, we noticed weak performance of the models in Arabic. There is an urgent need to build large data collections in Arabic.

Acknowledgement

The authors would like to express their deepest gratitude to King Abdulaziz University for the support. We thank the reviewers for their constructive comments.

References

Ahmed Abdelali, Sabit Hassan, Hamdy Mubarak, Kareem Darwish, and Younes Samih. 2021. Pre-training BERT on Arabic tweets: Practical considerations.

Muhammad Abdul-Mageed, AbdelRahim Elmadany, and El Moatez Billah Nagoudi. 2021. ARBERT & MARBERT: Deep bidirectional transformers for Arabic. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 7088–7105, Online. Association for Computational Linguistics.

Esha Aftab and Muhammad Kamran Malik. 2022. eRock at Qur'an QA 2022: Contemporary deep neural networks for Qur'an based reading comprehension question answers. In Proceedings of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection, pages 96–103.

Mohammad Mushabbab A. Alqahtani. 2019. Quranic Arabic semantic search model based on ontology of concepts. Ph.D. thesis, University of Leeds.

Kholoud Alsubhi, Amani Jamal, and Areej Alhothali. 2021. Pre-trained transformer-based approach for Arabic question answering: A comparative study. arXiv preprint arXiv:2111.05671.

Kholoud Alsubhi, Amani Jamal, and Areej M. Alhothali. 2022. Deep learning-based approach for Arabic open domain question answering. PeerJ Computer Science, 8.

Wissam Antoun, Fady Baly, and Hazem Hajj. AraBERT: Transformer-based model for Arabic language understanding. In LREC 2020 Workshop Language Resources and Evaluation Conference 11–16 May 2020, page 9.

Wissam Antoun, Fady Baly, and Hazem Hajj. 2021. AraELECTRA: Pre-training text discriminators for Arabic language understanding. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, pages 191–195, Kyiv, Ukraine (Virtual). Association for Computational Linguistics.

Jonathan H. Clark, Eunsol Choi, Michael Collins, Dan Garrette, Tom Kwiatkowski, Vitaly Nikolaev, and Jennimaria Palomaki. 2020. TyDi QA: A benchmark for information-seeking question answering in typologically diverse languages. Transactions of the Association for Computational Linguistics, 8:454–470.

Mohammed ElKomy and Amany M. Sarhan. 2022. TCE at Qur'an QA 2022: Arabic language question answering over Holy Qur'an using a post-processed ensemble of BERT-based models. arXiv preprint arXiv:2206.01550.

Go Inoue, Bashar Alhafni, Nurpeiis Baimukan, Houda Bouamor, and Nizar Habash. 2021. The interplay of variant, size, and task type in Arabic pre-trained language models. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, pages 92–104, Kyiv, Ukraine (Virtual). Association for Computational Linguistics.

Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 2020. Dense passage retrieval for open-domain question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6769–6781, Online. Association for Computational Linguistics.

Rana Malhas and Tamer Elsayed. 2020. AyaTEC: Building a reusable verse-based test collection for Arabic question answering on the Holy Qur'an. ACM Transactions on Asian and Low-Resource Language Information Processing, 19(6).

Rana Malhas and Tamer Elsayed. 2022. Arabic machine reading comprehension on the Holy Qur'an using CL-AraBERT. Information Processing & Management, 59(6):103068.

Rana Malhas, Watheq Mansour, and Tamer Elsayed. 2022. Qur'an QA 2022: Overview of the first shared task on question answering over the Holy Qur'an. In Proceedings of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection, pages 79–87, Marseille, France. European Language Resources Association.

Rana Malhas, Watheq Mansour, and Tamer Elsayed. 2023. Qur'an QA 2023 shared task: Overview of passage retrieval and reading comprehension tasks over the Holy Qur'an. In Proceedings of the First Arabic Natural Language Processing Conference (ArabicNLP 2023), Singapore.

Ali Mostafa and Omar Mohamed. 2022. GOF at Qur'an QA 2022: Towards an efficient question answering
for the Holy Qur'an in the Arabic language using a deep learning-based approach. In Proceedings of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection, pages 104–111.

Hussein Mozannar, Karl El Hajal, Elie Maamary, and Hazem Hajj. 2019. Neural Arabic question answering. arXiv preprint arXiv:1906.05394.

Stephen Robertson and Hugo Zaragoza. 2009. The probabilistic relevance framework: BM25 and beyond. Foundations and Trends in Information Retrieval, 3(4):333–389.

Devendra Sachan, Mike Lewis, Mandar Joshi, Armen Aghajanyan, Wen-tau Yih, Joelle Pineau, and Luke Zettlemoyer. 2022. Improving passage retrieval with zero-shot question generation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 3781–3797, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.

Ali Safaya, Moutasem Abdullatif, and Deniz Yuret. 2020. KUISAIL at SemEval-2020 task 12: BERT-CNN for offensive speech identification in social media. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 2054–2059, Barcelona (online). International Committee for Computational Linguistics.

Claude Sammut and Geoffrey I. Webb, editors. 2010. TF–IDF, pages 986–987. Springer US, Boston, MA.

Ahmed Sleem, Eman Mohammed Lotfy Elrefai, Marwa Mohammed Matar, and Haq Nawaz. 2022. Stars at Qur'an QA 2022: Building automatic extractive question answering systems for the Holy Qur'an with transformer models and releasing a new dataset. In Proceedings of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection, pages 146–153.

Nandan Thakur, Nils Reimers, Johannes Daxenberger, and Iryna Gurevych. 2021. Augmented SBERT: Data augmentation method for improving bi-encoders for pairwise sentence scoring tasks. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 296–310, Online. Association for Computational Linguistics.

A QRCD Dataset Distribution

In this appendix, Table 6 presents the distribution of the dataset.

Dataset      #Q   #Q-P Pairs  #Q-P-A Triplets
Training     174  992         1179
Development  25   163         220

Table 6: QRCD distribution. #Q shows the number of questions, #Q-P Pairs shows the number of question–passage pairs and #Q-P-A Triplets shows the number of question–passage–answer triplets.

B The Problems of the List of Answers

In this appendix, Figure 1 and Figure 2 present the problems we encountered in the list of answers.

C The Analysis and Discussion of Task B

In this appendix, Figure 3 and Figure 4 present the discussion of Task B (Machine Reading Comprehension).
Figure 1: Example of an uninformative answer.

Figure 2: Example of repeated answers.

Figure 3: Example 1 of an incorrect answer.

Figure 4: Example 2 of an incorrect answer.

