Adapting LLMs for Educational Use
Qingyao Li1, Lingyue Fu1, Weiming Zhang1, Xianyu Chen1, Jingwei Yu1, Wei Xia2, Weinan Zhang1∗, Ruiming Tang2 and Yong Yu1
1 Shanghai Jiao Tong University
2 Huawei Noah's Ark Lab
Abstract

Online education platforms, leveraging the internet to distribute education resources, seek to provide convenient education but often fall short in real-time communication with students. They often struggle to offer personalized education resources due to the challenge of addressing the diverse obstacles students encounter throughout their learning journey. Recently, the emergence of large language models (LLMs), such as ChatGPT, offers the possibility of resolving this issue by comprehending individual requests. Although LLMs have been successful in various fields, creating an LLM-based education system is still challenging because of the wide range of educational skills required. This paper reviews the recently emerged LLM research related to education.

[Figure 1: A student request handled by an LLM-based education assistant, drawing on the foundational capabilities of Programming, Writing, Knowledge-based Question Answering, Reasoning, and Mathematics.
Student: I am asked to write a Python program to calculate the sum from 1 to 100 in a simple way. Could you help me to solve this problem?
Education Assistant: Sure! You can use a mathematical formula to calculate the sum of numbers from 1 to 100. The formula for the sum of an arithmetic series is n(n+1)/2, where n is the last number in the series. In this case, n = 100. Here is a simple Python program using this formula:
    def calculate_sum():
        n = 100
        total_sum = n * (n + 1) // 2
        return total_sum
    result = calculate_sum()
    print(f"The sum of numbers from 1 to 100 is: {result}")]
[Figure: Taxonomy of the foundational education-related capabilities of LLMs and representative works.
Mathematics: Challenging Mathematics Problems (WizardMath [Luo et al., 2023a], MetaMath [Yu et al., 2023]); Multimodal Problems (MathVista [Lu et al., 2023], UniGeo [Chen et al., 2022b]); Mathematical Proof (LeanDojo [Yang et al., 2023b], LEGO-Prover [Xin et al., 2023]).
Writing: Text Summarization (BRIO [Liu et al., 2022], PROM [Ma et al., 2023b]); Grammatical Error Correction (CoEdit [Raheja et al., 2023], GrammarGPT [Fan et al., 2023]).
Programming: Foundational Code Writing (CodeLlama [Rozière et al., 2023], WizardCoder [Luo et al., 2023b]); Code Refinement (CodeT [Chen et al., 2022a], SEIDR [Liventsev et al., 2023]).
Reasoning: Supervised Finetuning (CAGE [Rajani et al., 2019], Scratchpad [Nye et al., 2021]); Prompt Engineering (CoT [Wei et al., 2022], Self-Consistency [Wang et al., 2022]); Hybrid Methodologies (Fine-tune-CoT [Ho et al., 2022], STaR [Zelikman et al., 2022]).]
answers. This approach leads to a significant improvement in the zero-shot accuracy of mathematical problem-solving.

Problems Involving Multi-Modal Information
Multi-modal inputs like geometric shapes in math problems require LLMs to understand text and image information to reach solutions. Research in multi-modal LLMs for mathematical reasoning is emerging [Lu et al., 2023; Chen et al., 2022b; Peng et al., 2023b]. This type of task poses high requirements on the formation and quality of the training data. Chen et al. [2022b] introduced UniGeo, a unified geometry problem benchmark combining calculation and proving tasks. Based on this dataset, the study presented a framework capable of simultaneously solving calculation and proving tasks through a sequence generation approach. Lu et al. [2023] proposed MATHVISTA, a benchmark for diverse mathematical and visual challenges. Frameworks for this task usually contain separate encoders for different modalities. For geometric problems, Zhang et al. [2023b] converted diagrams into text clauses, using a convolutional neural network and a language model for encoding and a GRU-based framework for answer generation. Additionally, Lu et al. [2022a] presented the Tabular Math Word Problems (TABMWP) dataset, which requires reasoning over textual and tabular data, and introduced PROMPTPG, a policy gradient-based selector for training and prompt construction for test samples.

Mathematical Proof
Unlike other types of mathematical problems, where LLMs primarily focus on reasoning and providing answers, LLMs' role in mathematical proofs emphasizes integration with proof assistants such as Coq [Barras et al., 1997], Isabelle [Nipkow et al., 2002], and Lean [de Moura et al., 2015]. These proof assistants correspond to specific programming languages, requiring users to input proofs in these languages, after which the assistant can verify the proof's correctness. Many LLM-based methods have been proposed to help with theorem proving [Jiang et al., 2023; Liu et al., 2023b; Xiong et al., 2023]. Based on these proof assistants, there are two main approaches to utilizing LLMs for mathematical proofs. The first approach is formal proof search, exemplified by models like GPT-f [Polu and Sutskever, 2020], which involves prompting LLMs to produce the next proof step (also called a 'tactic' in proof assistants) based on the current proof state and some optional context. Subsequently, Thor [Jiang et al., 2022b] was proposed to help select premises for theorem proving. Yang et al. [2023b] introduced an open-source framework named LeanDojo based on the Lean proof assistant. This framework comprises data, toolkits, models, and benchmarks, and it has led to the development of ReProver (Retrieval-Augmented Prover), which enhances proof accuracy by using retrieval methods to extract the premises on which LLMs base their proofs.
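To make the notions of 'proof state' and 'tactic' concrete, the toy Lean 4 proof below (an illustrative example, not taken from any of the systems above) proves a simple arithmetic identity in two tactic steps; after each step the proof assistant reports the remaining goal, which is exactly the context a formal proof-search model is prompted with to predict the next tactic.

    -- Toy Lean 4 proof: each tactic transforms the current proof state.
    example (a b c : Nat) : a + (b + c) = c + b + a := by
      rw [Nat.add_comm b c]   -- goal becomes: a + (c + b) = c + b + a
      rw [Nat.add_comm]       -- goal becomes: c + b + a = c + b + a, closed by reflexivity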
The second is natural proof translation, also known as autoformalization, represented by models like DSP [Jiang et al., 2022a], which converts math proofs written in natural language into formalized versions. This approach primarily addresses the issue of insufficient data for formal mathematical proofs: by utilizing autoformalization, a significant increase in this type of data can be achieved, which in turn enhances the proof-generating capabilities of neural provers fine-tuned on the expanded dataset. The strategy leverages the transformation of existing proofs into a formal structure, thereby enriching the training material for machine learning models in the field of mathematical proof generation. For example, Wu et al. [2022] employed LLMs for autoformalization, transforming mathematical proofs and problems expressed in natural language into formal specifications and proofs in the Isabelle language. The generated data was used to train a neural theorem prover, enhancing the effectiveness of the original prover. Following it, Cunningham et al. [2023] utilized an encoder-decoder framework based on the universal transformer architecture, converting both problem statements and mathematical proofs written in LaTeX into the language of the Coq interactive prover.

In mathematical education, proof problems are indispensable. Currently, LLMs for math proof primarily operate in the form of interactive theorem proving, in which LLMs complete proofs by interacting with software proof assistants. To realize completely automated theorem proving with LLMs, it is essential that these models possess not only strong reasoning skills but also the capability to formalize concepts effectively. There is no room for hallucination in mathematical proofs, which poses a formidable challenge for LLMs.

2.2 Writing
Writing proficiency is crucial for LLMs, underpinning their ability to comprehend inputs deeply and produce semantically and syntactically accurate outputs [Dong et al., 2022; Chang et al., 2023]. In education, the writing capability of LLMs holds the potential to transform how writing is taught. They can assist in content creation, simplify complex topics for students, and offer personalized educational materials. In this part, we dive into LLMs' writing capability on two education-related tasks: text summarization and grammatical error correction.

Text Summarization
Text summarization, which condenses long texts into summaries, challenges LLMs to understand diverse content like news and cross-lingual texts. It is evident that traditional fine-tuning methods are less effective with the advent of more advanced LLMs [Pu et al., 2023; Liu et al., 2022]. Pu et al. [2023] and Liang et al. [2022] showed that LLMs like ChatGPT initially lag behind fine-tuned models like T5 [Raffel et al., 2020] and BART [Lewis et al., 2019] in ROUGE scores [Lin, 2004] for text summarization. However, when human judges evaluate overall quality, LLMs outperform fine-tuned models and even standard human summaries, and are superior in aspects like factual consistency, fluency, and diversity. This discovery underscores the limitations of traditional evaluation methods and suggests a need for new paradigms to guide summarization tasks in the LLM era. For example, BRIO [Liu et al., 2022] implemented a ranking task to foster more diverse summarizations. Furthermore, Liu et al. [2023e] utilized a GPT model based on BRIO to directly generate training data to guide the learning process of other models, which is similar to the process of RLHF [Stiennon et al., 2020].
Grammatical Error Correction
We are well aware of the remarkable capability of LLMs to generate fluent and coherent conversations. However, from an educational perspective, the importance of producing grammatically correct dialogues cannot be overstated, especially for students learning a new language. The correctness of grammar in conversations plays a vital role in language acquisition, providing students with reliable examples to emulate and learn from. Numerous studies have evaluated the effectiveness of LLMs in grammatical error correction. Several works [Wu et al., 2023; Fang et al., 2023; Mohammed et al., 2023] first evaluated the error correction performance of closed-source LLMs such as ChatGPT. Although there exists a pronounced gap between ChatGPT and the previous SOTA models [Omelianchuk et al., 2020; Grammarly, 2023] on the overall F0.5 metric, closer analysis shows that ChatGPT underperforms other models in terms of precision but far exceeds them in terms of recall; that is, LLMs like ChatGPT are good at error detection. A detailed manual analysis of ChatGPT's outputs revealed that, in most cases, it maintained grammar accuracy better than the previous methods. However, it often overcorrects sentences to increase diversity and fluency, resulting in a decrease in precision. Nevertheless, it demonstrates that LLMs are adept at generating diverse and accurate content. Some studies have attempted to explore open-source LLMs, such as CoEdit [Raheja et al., 2023], which covered multiple text editing tasks (including GEC) by fine-tuning LLMs to integrate the capabilities brought by these tasks. GrammarGPT [Fan et al., 2023], on the other hand, asked ChatGPT to generate grammatically incorrect sentences to guide the training of other models. Collectively, these studies underscore the prowess of LLMs in generating correct text.

Leveraging LLMs' proficiency in text summarization and grammatical error correction can significantly benefit education. Their capability to condense complex material into concise summaries facilitates efficient learning, while error correction tools help improve students' writing and language skills. However, critical challenges need to be resolved to integrate these writing-related capabilities into education. It becomes evident that more refined evaluation metrics and task-specific optimizations are essential for LLMs.

2.3 Programming
Programming is a process of writing code and correcting it if unexpected results are obtained. Incorporating LLMs in programming education is reshaping the future of AI-assisted programming learning. LLMs could play multiple roles: as instructors providing guidance, as teaching assistants offering personalized tutorials, and as collaborative coding partners. Studies like [Ma et al., 2023a] demonstrated improved performance (17%) and efficiency (13%) among programming novices using LLM-based assistants. Research from [Phung et al., 2023] focused on programming education tasks, and benchmarks like [Fu et al., 2023] and [Ding et al., 2023] were used to evaluate the effectiveness of LLMs. This section mainly discusses LLMs' coding development from two perspectives, code writing and code refinement, corresponding to the two stages in programming.

Code Writing
Unlike natural language tasks, generating code requires a more rigorous token syntax and places higher demands on the training stage. A common method to improve LLMs' performance in generating code is to train or fine-tune them on extensive code datasets [Chen et al., 2021a; Nijkamp et al., 2023]. WizardCoder [Luo et al., 2023b] introduced the Evol-Instruct [Xu et al., 2023] method to generate more complex and diverse instruction datasets for code-related tasks. To emulate the iterative process of humans repeatedly modifying and reviewing code, InCoder [Fried et al., 2023] utilized bidirectional encoding instead of left-to-right encoding. In addition to next-token prediction, training or fine-tuning code-aimed LLMs on additional code-related tasks can enhance their programming capabilities. LLMs first learn language patterns and representations from a large amount of text data through unsupervised learning. Then, they can be fine-tuned on labeled code tasks, allowing them to learn more targeted code representations and gain a deeper understanding of code structure and semantics based on the provided labels. CodeT5+ [Wang et al., 2023b] introduced the concepts of unimodal and bimodal alignment, increasing the model's adaptability to function in different modes for various downstream tasks. During the bimodal alignment phase, the model synchronizes the representations of text-code pairs using multiple tasks, which improves its ability to understand and generate content across different modalities. CodeLlama [Rozière et al., 2023] also applied multi-task objectives, including autoregression and causal infilling prediction, which achieves better performance among open models. MFTCoder [Liu et al., 2023a] utilized the Multi-Task Learning (MTL) technique and incorporated a training loss computation algorithm to alleviate the instability and imbalance of multi-task training.

Code Refinement
In most cases, LLMs cannot generate the correct code at once. We can enable LLMs to generate a code sketch (either actual code or pseudocode) and utilize various methods to guide the model to modify and refine the code. By leveraging the inherent code correction ability of LLMs, the overall precision and quality of the code can be significantly enhanced. Liventsev et al. [2023] constructed a pipeline, Synthesize, Execute, Instruct, Debug, and Rank (SEIDR), which first generates multiple different programs, then filters and debugs them, ultimately selecting the best code among them. According to Magister et al. [2022], teaching an LLM to debug its program draft via few-shot demonstrations can improve performance on code generation tasks. Another debugging method is to let the LLM itself generate unit tests and check its code against them [Chen et al., 2022a]. By mimicking the human coding process, the LLM's programming ability is greatly enhanced. However, these methods lead to an increased number of calls to LLMs, resulting in a significant increase in inference time.
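The sketch below illustrates the general 'generate tests and check the code' idea in a simplified form; it is not the exact procedure of CodeT or SEIDR, and llm_generate is a hypothetical helper standing in for whatever code LLM is used. It selects the candidate program that passes the most model-generated tests.

    # Sketch of test-based candidate selection; llm_generate(prompt, n) is a
    # hypothetical helper returning n generated code strings.
    def passes(candidate_src: str, test_src: str) -> bool:
        env = {}
        try:
            exec(candidate_src, env)   # define the candidate function (sandbox this in practice)
            exec(test_src, env)        # run an assert-style test against it
            return True
        except Exception:
            return False

    def select_best(task: str, n_candidates: int = 5, n_tests: int = 5) -> str:
        candidates = llm_generate(f"Write a Python function for: {task}", n_candidates)
        tests = llm_generate(f"Write one assert statement testing a solution to: {task}", n_tests)
        # Keep the candidate that satisfies the most generated tests.
        return max(candidates, key=lambda c: sum(passes(c, t) for t in tests))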
In coding education, the guidance given by current LLMs still falls short of the level offered by human instructors. Firstly, there is significant room for improvement in the coding abilities of LLMs: when it comes to generating complex algorithms, LLMs' performance rapidly declines compared to the functionality achieved by humans [Chen et al., 2021b]. Additionally, due to the lack of real-world data, LLMs struggle to learn the intermediate thinking process of code writing, making it difficult for them to provide relevant explanations and instructions to beginners. As a result, the use of LLMs in programming education still needs to be improved, especially in terms of interpretability.

2.4 Reasoning
The reasoning capability of LLMs offers significant potential for educational use. LLMs can act as advanced tools, enhancing students' cognitive processes, providing personalized mentorship, and offering tailored learning support. Consequently, these models can bolster students' problem-solving skills and promote holistic cognitive development in an educational setting. This section reviews the strategies for developing LLMs' general reasoning ability.

Supervised Fine-tuning for Reasoning
Prior research on LLMs' reasoning ability is primarily based on fully supervised fine-tuning. For instance, Rajani et al. [2019] effectively fine-tuned a pre-trained GPT model to generate rationales for predictions based on the CoS-E dataset [Talmor et al., 2018], enhancing performance in commonsense question-answering tasks. However, the success of fine-tuning mainly depends on the availability of a dataset with explicit reasoning, which can be challenging to obtain. Furthermore, the inferences drawn from fine-tuning are mainly limited to specific domain datasets, relying more on the content within the fine-tuning data than on the LLM's inherent reasoning abilities.

Prompt Engineering for Reasoning
Efforts have been made in recent research to tackle the constraints inherent in fine-tuning LLMs: fine-tuned models tend to overfit specific dataset distributions, reducing their effectiveness on more diverse datasets. To mitigate this issue, numerous studies have developed strategies that utilize the robust reasoning abilities of LLMs, taking advantage of their extensive pre-trained parameters. One such strategy involves guiding LLMs to generate inference and reasoning processes through demonstrations or prompts. For example, Wei et al. [2022] introduced the "Chain of Thought" (CoT) method, which employs natural language reasoning steps as prompts to enhance the reasoning performance of LLMs. Similarly, Wang et al. [2022] proposed a self-consistency strategy that samples a variety of reasoning paths and selects the most consistent answer, which has been shown to improve model performance. Addressing the constraint of fixed manually annotated demonstrations, Diao et al. [2023] introduced an active selection approach, which dynamically identifies the most effective demonstrations relevant to task requirements from a pool of queries. Furthermore, Zhou et al. [2022] proposed a prompting strategy that decomposes complex problems into simpler sub-problems, encouraging sequential problem-solving and potentially enhancing the performance of LLMs further.
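As a minimal sketch of the self-consistency idea, the snippet below samples several chain-of-thought completions and majority-votes the final answer; llm_sample and the answer-extraction rule are assumptions for illustration, not the original implementation of Wang et al. [2022].

    # Sketch of self-consistency decoding: sample k reasoning paths, vote on answers.
    from collections import Counter

    def self_consistent_answer(question: str, k: int = 10) -> str:
        prompt = f"{question}\nLet's think step by step."
        answers = []
        for _ in range(k):
            completion = llm_sample(prompt, temperature=0.7)              # hypothetical sampling call
            answers.append(completion.rsplit("Answer:", 1)[-1].strip())   # assumes a final "Answer: ..." line
        return Counter(answers).most_common(1)[0][0]                      # most frequent answer wins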
Hybrid Methodologies for Reasoning
Despite the success of prompt engineering in improving the performance of LLMs, it does not change the model parameters and therefore does not boost their core reasoning skills. To address this limitation, new approaches combining fine-tuning with prompt engineering have been developed, aiming to enhance the model's reasoning abilities and responsiveness to prompts. One approach employs LLMs in a pedagogical capacity to instruct smaller student models. Studies by Ho et al. [2022] and Magister et al. [2022] demonstrated significant performance gains across various datasets by fine-tuning student models on the reasoning outputs of larger LLMs, even for smaller-scale models. Moreover, Zelikman et al. [2022] reported substantial performance improvements across multiple datasets by generating step-by-step rationales and fine-tuning models on those that lead to correct answers, thus facilitating model learning from its own reasoning.
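The loop below sketches the bootstrapping pattern behind such methods: sample rationales, keep only those that reach the correct answer, and fine-tune on them. It is a simplification rather than a faithful reproduction of STaR, and llm_sample, extract_answer, and fine_tune are hypothetical helpers.

    # Sketch of rationale bootstrapping over several rounds.
    def bootstrap_rationales(model, dataset, rounds: int = 3):
        for _ in range(rounds):
            keep = []
            for question, gold_answer in dataset:
                trace = llm_sample(model, f"{question}\nLet's think step by step.")
                if extract_answer(trace) == gold_answer:    # keep only correct reasoning traces
                    keep.append((question, trace))
            model = fine_tune(model, keep)                   # train on self-generated rationales
        return model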
In conclusion, the discussed methods have significantly advanced the reasoning capability of LLMs, optimally utilizing their unique features for diverse reasoning tasks. This enhancement can greatly benefit educational applications. However, it is crucial to recognize the limitations. As underscored by Valmeekam et al. [2022] and Ruis et al. [2022], LLMs struggle with complex reasoning tasks and those requiring implicit expressions. In complex reasoning scenarios, their performance can decrease notably. This is particularly relevant in educational contexts, where incorrect problem-solving modeled by LLMs could misguide students and lead to misunderstandings or flawed comprehension. Thus, despite LLMs' immense potential in education, their limitations must be carefully considered to ensure they facilitate rather than obstruct learning.

2.5 Knowledge-based Question Answering
In knowledge-based question answering with LLMs, the user presents a question to the LLM, and the LLM leverages knowledge-based methods to respond with the corresponding answer. Previous work by Ren et al. [2023] showed that LLMs have an inaccurate perception of their factual boundaries and often exhibit overconfidence. Many studies have explored and utilized external knowledge from open-world and domain-specific databases to enhance the knowledge base of these LLMs.

Open-domain QA
Open-domain question answering requires LLMs to accurately determine the reliability of information in the open world and craft their responses based on that understanding. Jiang et al. [2021] evaluated the accuracy of LLM responses to a particular question from the perspective of calibration. Through experiments, the researchers discovered that models such as T5, BERT, and GPT-2 are not well calibrated in QA tasks. While the study suggests that incorporating calibration-related methods into the fine-tuning process can effectively enhance QA performance, it is evident that purely pre-trained language models still face significant challenges in open-domain tasks. To overcome this challenge, many works tried to add additional information to help the LLMs answer correctly [Khandelwal et al., 2019; Guu et al., 2020; Borgeaud et al., 2022]. Lazaridou et al. [2022] employed information gathered from web searches as prompt input for the LLM, conditioning it to generate answers to questions. This approach effectively enables the LLM to use open-world information to answer questions.

The development of LLM-based open-domain question answering highlights significant challenges, particularly in dealing with hallucinations. In the context of an LLM-based education system, this issue becomes more critical, as seemingly correct yet wrong answers can mislead students. Drawing insights from approaches that introduce additional information from sources such as the web or textbooks can offer valuable lessons for the development of an LLM-based education system.

Domain-specific QA
Although LLMs are trained on vast corpora, they may still exhibit gaps in understanding specific domains. Typically, there are dedicated knowledge repositories for professional content that consolidate domain-specific knowledge, such as MedlinePlus (https://medlineplus.gov), GeeksforGeeks (https://www.geeksforgeeks.org), etc. Choi et al. [2023] utilized an external knowledge base to generate a set of question-answer pairs and then employed fine-tuning to transfer financial knowledge to LLMs, significantly improving financial question-answering tasks. Another common approach is to leverage the in-context learning capability of LLMs by incorporating retrieved knowledge from the knowledge base into prompts. Peng et al. [2023a] used text embeddings for retrieval, followed by the LLM's automatic feature extraction, to achieve higher accuracy in pest identification tasks. Zhang et al. [2023d] utilized K-nearest neighbors (KNN) to search for the most similar K records in an accounting database, serving as k-shot examples and greatly improving accounting efficiency. There are also works that train and improve the retriever encoder [Zhang et al., 2023c], as well as distill and refine the data in the database [Jeronymo et al., 2023]. Such retrieval frameworks have lower costs and can be applied more flexibly across different domains. Liu et al. [2023c] built upon a retrieval-enhanced framework and provided several plug-and-play modules, enabling its application in various professional fields.
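A minimal sketch of this retrieve-then-prompt pattern is shown below: the question is embedded, the nearest records in a small knowledge base are found by cosine similarity, and they are prepended to the prompt as k-shot context. The embed and llm_answer functions are hypothetical placeholders for whatever embedding model and LLM are actually used.

    # Sketch of embedding-based retrieval for k-shot prompting.
    import numpy as np

    def retrieve_examples(question: str, knowledge_base: list, k: int = 3) -> list:
        q = embed(question)                                   # hypothetical embedding call
        kb = np.stack([embed(doc) for doc in knowledge_base])
        sims = kb @ q / (np.linalg.norm(kb, axis=1) * np.linalg.norm(q) + 1e-8)
        top = np.argsort(-sims)[:k]                           # indices of the k nearest records
        return [knowledge_base[i] for i in top]

    def answer_with_context(question: str, knowledge_base: list) -> str:
        examples = retrieve_examples(question, knowledge_base)
        prompt = "\n\n".join(examples) + f"\n\nQuestion: {question}\nAnswer:"
        return llm_answer(prompt)                             # hypothetical LLM call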
By integrating Information Retrieval (IR) systems, LLMs can enhance their capabilities with professional knowledge, gaining valuable and precise supplemental information. Furthermore, according to Ren et al. [2023], retrieval augmentation can also be employed to improve LLMs' ability to perceive the boundaries of their factual knowledge, mitigating the issue of hallucinations. During the education process, different majors or courses involve different professional content. Applying external knowledge repositories as an enhancement mechanism can provide more accurate guidance in domain-specific contexts and mitigate the issues caused by misleading information. Therefore, domain-specific question-answering ability is crucial for developing an LLM-based education system.

2.6 Overall Development Status
We select representative benchmarks to assess the current development of LLMs across education-related capabilities. Specifically, we mainly collect results from three sources: Huggingface (https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard), OpenCompass (https://opencompass.org.cn/leaderboard-llm) and C-Eval (https://cevalbenchmark.com/static/leaderboard.html). The former two are comprehensive leaderboards; C-Eval is a Chinese evaluation suite for foundation models spanning 52 diverse disciplines. We collect performance data from popular general LLMs on these benchmarks, and the compiled results are presented in Table 1, where one can observe that:
• It is hard for a single LLM to be superior across all capabilities. GPT-4 exhibits the best overall performance among current LLMs. However, LLMs still lag significantly behind humans in some crucial abilities: on TruthfulQA [Lin et al., 2021], for example, humans achieve 94% accuracy, a performance level far superior to the optimal results of LLMs.
• Most LLMs display considerable variation in developing these skills. While certain models (such as Alpaca and Yi) might excel in text comprehension tasks, their effectiveness often diminishes in areas requiring deep understanding and reasoning, like Mathematics and Programming. This reveals the substantial challenges in building a unified education-focused LLM, since it may fail in certain areas.

Table 1: Overview of LLMs' performance on foundational education-related capabilities.

3 Potential of LLM-based Education System
LLMs can potentially transform online education by understanding a wide range of student questions, similar to human teachers. They aim to provide support across different subjects and skill levels. With the latest developments in LLMs, we suggest two approaches for creating LLM-based education systems. The first involves training a comprehensive LLM that can handle questions from various subjects. The second approach uses a mixture-of-experts (MoE) framework, integrating specialized models to support the system, with an LLM controller managing interactive dialogs with students.

3.1 Unified Approach
The most straightforward idea for establishing an LLM-based education system is to train a language model capable of answering students' questions across all subjects. As shown in Figure 3 (a), the foundational capabilities are included in the unified LLM, and the student can directly communicate with it and ask questions.

Figure 3: Two frameworks for an LLM-based education system. (a) is the unified framework, where one LLM model handles all the education-related questions. (b) is the MoE framework, where the LLM controller is responsible for task distribution and the corresponding expert model finishes the given task. (In the illustrated example, the controller calls the Mathematics model to ask how to calculate the sum from 1 to 100 in a simple way and the Programming model to write a Python program implementing the calculation.)

Research on whether LLMs can handle educational tasks has been underway. Wang and Demszky [2023] introduced three teacher coaching tasks for generative AI: (A) scoring transcript segments using classroom observation instruments, (B) identifying highlights and missed opportunities for effective instructional strategies, and (C) offering actionable suggestions to encourage more student reasoning. As evaluated by human teachers, ChatGPT's responses on these tasks for elementary math classroom transcripts are relevant to improving instruction, but they are often not novel or insightful. Beyond that, Phung et al. [2023] assessed the programming education ability of ChatGPT and GPT-4 by comparing them with human tutors. The results show that GPT-4 performs considerably better than ChatGPT and even comes close to human tutors in some scenarios, while also highlighting situations in which GPT-4 struggles; in particular, there remains a substantial gap between the performance of GPT-4 and that of human tutors in the grading-feedback and task-creation scenarios.

3.2 MoE Approach
Section 2 reviewed the current development of LLMs across various capabilities. Unfortunately, despite the existence of comprehensive language models such as GPT-4, these models often exhibit notable deficiencies in certain abilities. This situation poses a challenge, indicating that relying solely on a single LLM for educational guidance involving all these capabilities is currently a difficult task. Yet, LLMs can achieve excellent results when fine-tuned for individual capabilities, and their ability to comprehend human language is exceptionally strong. Therefore, we can aggregate models with distinct capabilities using a mixture-of-experts approach. By establishing an LLM-based controller for language interaction and task assignment with students, a currently feasible education system can be generated.

An education framework implemented with a mixture-of-experts (MoE) approach is illustrated in Figure 3(b), consisting of multiple models that excel in individual capabilities (not necessarily LLMs) and an LLM controller. The controller is mainly responsible for three tasks: (1) Understand the student's request and decide which specific area or areas the request is about. (2) Re-form the request to fit the input of the specific areas' expert models. (3) Aggregate the output of the related experts and generate the final response to the student.
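One possible, deliberately simplified realization of such a controller is sketched below: a routing LLM labels the request with the relevant capability areas, each matching expert is queried with a reformulated sub-request, and the controller composes the final reply. controller_llm and the expert callables are hypothetical placeholders, not an implementation of the framework in Figure 3(b).

    # Sketch of an LLM controller for the MoE-style education framework.
    EXPERTS = {
        "mathematics": math_expert,          # hypothetical expert models, one per capability
        "programming": programming_expert,
        "writing": writing_expert,
    }

    def handle_student_request(request: str) -> str:
        # (1) Decide which capability areas the request involves.
        raw = controller_llm(f"List the capabilities needed (comma-separated) for: {request}")
        areas = [a.strip().lower() for a in raw.split(",")]
        # (2) Re-form the request to fit each expert model's input.
        outputs = [EXPERTS[a](controller_llm(f"Rewrite this request for a {a} expert: {request}"))
                   for a in areas if a in EXPERTS]
        # (3) Aggregate the expert outputs into the final response.
        return controller_llm("Combine these expert answers into one reply for the student:\n"
                              + "\n".join(outputs))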
The advantage of this approach is that training is less challenging, requiring only training on individual capability-related datasets to obtain multiple models, each superior in a specific capability. However, the drawback lies in potential understanding errors introduced during inference by the interactions: this includes errors introduced by the LLM controller when assigning tasks based on student input and when integrating outputs from the various experts. Currently, this approach might be a realistic way of achieving an LLM-based educational assistant system.

4 Future Directions
Recently, more and more researchers have been trying to apply LLMs to handle education tasks, such as course design, student evaluation, lesson plan design, and others. Nevertheless, there are still many challenges and opportunities.

4.1 Interdisciplinary Reasoning Ability
Students may encounter interdisciplinary reasoning problems during real-world learning, requiring the education system to integrate multiple capabilities to formulate responses. As illustrated in Figure 1, the student intends to write a program to solve a mathematical problem, and the model needs first to comprehend the mathematical problem, devise a solution, and then generate the code. This process necessitates the model to synthesize both mathematical and programming capabilities. However, there is currently limited research on the integration of multiple interdisciplinary capabilities in LLMs at this stage, including both datasets and algorithms. Boyko et al. [2023] examined how LLMs augment scientific inquiry, code development, the scientific writing process, etc., and they propose that LLMs can foster interdisciplinary work by bridging knowledge gaps across scientific fields. However, they mainly discuss the LLMs' ability to help researchers' interdisciplinary collaboration instead of their ability to answer interdisciplinary questions. Cultivating an LLM to obtain this ability would help to develop a unified education system, which is an essential research direction.

4.2 Student Modeling
Establishing an LLM-based education system allows students to articulate their personalized needs through dialogs. Through such conversations, we can extract or infer personalized features about students, such as their current mastery of topics and their preferences in learning styles.

Besides modeling students from conversations, some research [Aher et al., 2023; Argyle et al., 2023] has shown that LLMs have certain abilities in simulating humans and generating human samples. Applied to education, this ability indicates a potential for LLM-based student simulation. In this way, for students with few interaction records, the LLM-based simulator could generate more samples and provide data to help the expert model better understand the student. It could also help human teachers develop their teaching skills.

4.3 Multi-modal Education
In education, multi-modal information is common, like geometry problems combining text and images or textbook concepts with illustrations. Building a general intelligent education system requires handling such multi-modal data. Thus, developing a multi-modal educational model is a promising future direction.

Notably, the development of multi-modal LLMs is rapidly advancing [Ye et al., 2023; Du et al., 2022]. Different kinds of architectures and pre-training tasks have been proposed [Du et al., 2022]. However, the education domain often exhibits unique distribution characteristics in multi-modal information. Therefore, it may place higher demands on the model's multi-modal reasoning capability.
5 Conclusion
In this paper, we presented an overview of the development of LLM-based education systems. We first reviewed the important developments of LLMs' education-related abilities. Then, we analyzed their potential and proposed two different ways of building such a system. We also highlighted the future directions that are worth working on. We hope this survey provides some insight into future research in this direction.

Acknowledgments
The SJTU team is partially supported by National Natural Science Foundation of China (62177033).

References
Gati V Aher, Rosa I Arriaga, and Adam Tauman Kalai. Using large language models to simulate multiple humans and replicate human subject studies. In International Conference on Machine Learning, pages 337–371. PMLR, 2023.
Md Adnan Arefeen, Biplob Debnath, and Srimat Chakradhar. Leancontext: Cost-efficient domain-specific question answering using llms. arXiv preprint arXiv:2309.00841, 2023.
Lisa P Argyle, Ethan C Busby, Nancy Fulda, Joshua R Gubler, Christopher Rytting, and David Wingate. Out of one, many: Using language models to simulate human samples. Political Analysis, 31(3):337–351, 2023.
Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, et al. Qwen technical report. arXiv preprint arXiv:2309.16609, 2023.
Bruno Barras, Samuel Boutin, Cristina Cornes, Judicaël Courant, Jean-Christophe Filliatre, Eduardo Gimenez, Hugo Herbelin, Gerard Huet, Cesar Munoz, Chetan Murthy, et al. The Coq proof assistant reference manual: Version 6.1. PhD thesis, Inria, 1997.
Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George Bm Van Den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, et al. Improving language models by retrieving from trillions of tokens. In International Conference on Machine Learning, pages 2206–2240. PMLR, 2022.
James Boyko, Joseph Cohen, Nathan Fox, Maria Han Veiga, Jennifer I Li, Jing Liu, Bernardo Modenesi, Andreas H Rauch, Kenneth N Reid, Soumi Tribedi, et al. An interdisciplinary outlook on large language models for scientific research. arXiv preprint arXiv:2311.04929, 2023.
Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Kaijie Zhu, Hao Chen, Linyi Yang, Xiaoyuan Yi, Cunxiang Wang, Yidong Wang, et al. A survey on evaluation of large language models. arXiv preprint arXiv:2307.03109, 2023.
Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Petroski Such, Dave Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William Hebgen Guss, Alex Nichol, Alex Paino, Nikolas Tezak, Jie Tang, Igor Babuschkin, Suchir Balaji, Shantanu Jain, William Saunders, Christopher Hesse, Andrew N. Carr, Jan Leike, Josh Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, and Wojciech Zaremba. Evaluating large language models trained on code, 2021.
Bei Chen, Fengji Zhang, Anh Nguyen, Daoguang Zan, Zeqi Lin, Jian-Guang Lou, and Weizhu Chen. Codet: Code generation with generated tests, 2022.
Jiaqi Chen, Tong Li, Jinghui Qin, Pan Lu, Liang Lin, Chongyu Chen, and Xiaodan Liang. Unigeo: Unifying geometry logical reasoning via reformulating mathematical expression. arXiv preprint arXiv:2212.02746, 2022.
Stephen Choi, William Gazeley, Siu Ho Wong, and Tingting Li. Conversational financial information retrieval model (confirm), 2023.
Garett Cunningham, Razvan C Bunescu, and David Juedes. Towards autoformalization of mathematics and code correctness: Experiments with elementary proofs. arXiv preprint arXiv:2301.02195, 2023.
Leonardo de Moura, Soonho Kong, Jeremy Avigad, Floris Van Doorn, and Jakob von Raumer. The lean theorem prover (system description). In Automated Deduction-CADE-25: 25th International Conference on Automated Deduction, Berlin, Germany, August 1-7, 2015, Proceedings 25, pages 378–388. Springer, 2015.
Shizhe Diao, Pengcheng Wang, Yong Lin, and Tong Zhang. Active prompting with chain-of-thought for large language models. arXiv preprint arXiv:2302.12246, 2023.
Yangruibo Ding, Zijian Wang, Wasi Uddin Ahmad, Hantian Ding, Ming Tan, Nihal Jain, Murali Krishna Ramanathan, Ramesh Nallapati, Parminder Bhatia, Dan Roth, and Bing Xiang. Crosscodeeval: A diverse and multilingual benchmark for cross-file code completion, 2023.
Chenhe Dong, Yinghui Li, Haifan Gong, Miaoxin Chen, Junxin Li, Ying Shen, and Min Yang. A survey of natural language generation. ACM Comput. Surv., 55(8), dec 2022.
Yifan Du, Zikang Liu, Junyi Li, and Wayne Xin Zhao. A survey of vision-language pre-trained models. arXiv preprint arXiv:2202.10936, 2022.
Yaxin Fan, Feng Jiang, Peifeng Li, and Haizhou Li. Grammargpt: Exploring open-source llms for native chinese grammatical error correction with supervised fine-tuning. In CCF International Conference on Natural Language Processing and Chinese Computing, pages 69–80. Springer, 2023.
Tao Fang, Shu Yang, Kaixin Lan, Derek F Wong, Jinpeng Hu, Lidia S Chao, and Yue Zhang. Is chatgpt a highly fluent grammatical error correction system? a comprehensive evaluation. arXiv preprint arXiv:2304.01746, 2023.
Daniel Fried, Armen Aghajanyan, Jessy Lin, Sida Wang, Eric Wallace, Freda Shi, Ruiqi Zhong, Wen tau Yih, Luke Zettlemoyer, and Mike Lewis. Incoder: A generative model for code infilling and synthesis, 2023.
Lingyue Fu, Huacan Chai, Shuang Luo, Kounianhua Du, Weiming Zhang, Longteng Fan, Jiayi Lei, Renting Rui, Jianghao Lin, Yuchen Fang, Yifan Liu, Jingkuan Wang, Siyuan Qi, Kangning Zhang, Weinan Zhang, and Yong Yu. Codeapex: A bilingual programming evaluation benchmark for large language models, 2023.
FudanUniversity. moss. https://github.com/OpenLMLab/MOSS, 2023.
Wensheng Gan, Zhenlian Qi, Jiayang Wu, and Jerry Chun-Wei Lin. Large language models in education: Vision and opportunities. arXiv preprint arXiv:2311.13160, 2023.
Mor Geva, Ankit Gupta, and Jonathan Berant. Injecting numerical reasoning skills into language models. arXiv preprint arXiv:2004.04487, 2020.
Grammarly. Grammarly, 2023.
Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Mingwei Chang. Retrieval augmented language model pre-training. In International Conference on Machine Learning, pages 3929–3938. PMLR, 2020.
Namgyu Ho, Laura Schmid, and Se-Young Yun. Large language models are reasoning teachers. arXiv preprint arXiv:2212.10071, 2022.
Vitor Jeronymo, Luiz Bonifacio, Hugo Abonizio, Marzieh Fadaee, Roberto Lotufo, Jakub Zavrel, and Rodrigo Nogueira. Inpars-v2: Large language models as efficient dataset generators for information retrieval, 2023.
Zhengbao Jiang, Jun Araki, Haibo Ding, and Graham Neubig. How can we know when language models know? on the calibration of language models for question answering. Transactions of the Association for Computational Linguistics, 9:962–977, 2021.
Albert Q Jiang, Sean Welleck, Jin Peng Zhou, Wenda Li, Jiacheng Liu, Mateja Jamnik, Timothée Lacroix, Yuhuai Wu, and Guillaume Lample. Draft, sketch, and prove: Guiding formal theorem provers with informal proofs. arXiv preprint arXiv:2210.12283, 2022.
Albert Qiaochu Jiang, Wenda Li, Szymon Tworkowski, Konrad Czechowski, Tomasz Odrzygóźdź, Piotr Miłoś, Yuhuai Wu, and Mateja Jamnik. Thor: Wielding hammers to integrate language models and automated theorem provers. Advances in Neural Information Processing Systems, 35:8360–8373, 2022.
Albert Q Jiang, Wenda Li, and Mateja Jamnik. Multilingual mathematical autoformalization. arXiv preprint arXiv:2311.03755, 2023.
Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, and Mike Lewis. Generalization through memorization: Nearest neighbor language models. arXiv preprint arXiv:1911.00172, 2019.
Angeliki Lazaridou, Elena Gribovskaya, Wojciech Stokowiec, and Nikolai Grigorev. Internet-augmented language models through few-shot prompting for open-domain question answering. arXiv preprint arXiv:2203.05115, 2022.
Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461, 2019.
Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, et al. Solving quantitative reasoning problems with language models, 2022. URL https://arxiv.org/abs/2206.14858.
Nian Li, Chen Gao, Yong Li, and Qingmin Liao. Large language model-empowered agents for simulating macroeconomic activities. arXiv preprint arXiv:2310.10436, 2023.
Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, et al. Holistic evaluation of language models. arXiv preprint arXiv:2211.09110, 2022.
Stephanie Lin, Jacob Hilton, and Owain Evans. Truthfulqa: Measuring how models mimic human falsehoods. arXiv preprint arXiv:2109.07958, 2021.
Jianghao Lin, Xinyi Dai, Yunjia Xi, Weiwen Liu, Bo Chen, Xiangyang Li, Chenxu Zhu, Huifeng Guo, Yong Yu, Ruiming Tang, et al. How can recommender systems benefit from large language models: A survey. arXiv preprint arXiv:2306.05817, 2023.
Chin-Yew Lin. Rouge: A package for automatic evaluation of summaries. In Text summarization branches out, pages 74–81, 2004.
lingyiwanwu. Yi. https://www.lingyiwanwu.com/, 2023.
Tiedong Liu and Bryan Kian Hsiang Low. Goat: Fine-tuned llama outperforms gpt-4 on arithmetic tasks. arXiv preprint arXiv:2305.14201, 2023.
Yixin Liu, Pengfei Liu, Dragomir Radev, and Graham Neubig. Brio: Bringing order to abstractive summarization. arXiv preprint arXiv:2203.16804, 2022.
Bingchang Liu, Chaoyu Chen, Cong Liao, Zi Gong, Huan Wang, Zhichao Lei, Ming Liang, Dajun Chen, Min Shen, Hailian Zhou, Hang Yu, and Jianguo Li. Mftcoder: Boosting code llms with multitask fine-tuning, 2023.
Chengwu Liu, Jianhao Shen, Huajian Xin, Zhengying Liu, Ye Yuan, Haiming Wang, Wei Ju, Chuanyang Zheng, Yichun Yin, Lin Li, et al. Fimo: A challenge formal dataset for automated theorem proving. arXiv preprint arXiv:2309.04295, 2023.
Jiongnan Liu, Jiajie Jin, Zihan Wang, Jiehan Cheng, Zhicheng Dou, and Ji-Rong Wen. Reta-llm: A retrieval-augmented large language model toolkit, 2023.
Junling Liu, Ziming Wang, Qichen Ye, Dading Chong, Peilin Zhou, and Yining Hua. Qilin-med-vl: Towards chinese large vision-language model for general healthcare. arXiv preprint arXiv:2310.17956, 2023.
Yixin Liu, Alexander R Fabbri, Pengfei Liu, Dragomir Radev, and Arman Cohan. On learning to summarize with large language models as references. arXiv preprint arXiv:2305.14239, 2023.
Vadim Liventsev, Anastasiia Grishina, Aki Härmä, and Leon Moonen. Fully autonomous programming with large language models. In Proceedings of the Genetic and Evolutionary Computation Conference. ACM, jul 2023.
Pan Lu, Liang Qiu, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, Tanmay Rajpurohit, Peter Clark, and Ashwin Kalyan. Dynamic prompt learning via policy gradient for semi-structured mathematical reasoning. arXiv preprint arXiv:2209.14610, 2022.
Pan Lu, Liang Qiu, Wenhao Yu, Sean Welleck, and Kai-Wei Chang. A survey of deep learning for mathematical reasoning. arXiv preprint arXiv:2212.10535, 2022.
Pan Lu, Hritik Bansal, Tony Xia, Jiacheng Liu, Chunyuan Li, Hannaneh Hajishirzi, Hao Cheng, Kai-Wei Chang, Michel Galley, and Jianfeng Gao. Mathvista: Evaluating mathematical reasoning of foundation models in visual contexts. arXiv preprint arXiv:2310.02255, 2023.
Haipeng Luo, Qingfeng Sun, Can Xu, Pu Zhao, Jianguang Lou, Chongyang Tao, Xiubo Geng, Qingwei Lin, Shifeng Chen, and Dongmei Zhang. Wizardmath: Empowering mathematical reasoning for large language models via reinforced evol-instruct. arXiv preprint arXiv:2308.09583, 2023.
Ziyang Luo, Can Xu, Pu Zhao, Qingfeng Sun, Xiubo Geng, Wenxiang Hu, Chongyang Tao, Jing Ma, Qingwei Lin, and Daxin Jiang. Wizardcoder: Empowering code large language models with evol-instruct, 2023.
Qianou Ma, Hua Shen, Kenneth Koedinger, and Tongshuang Wu. Hypocompass: Large-language-model-based tutor for hypothesis construction in debugging for novices, 2023.
Xinbei Ma, Yeyun Gong, Pengcheng He, Hai Zhao, and Nan Duan. Prom: A phrase-level copying mechanism with pre-training for abstractive summarization. arXiv preprint arXiv:2305.06647, 2023.
Lucie Charlotte Magister, Jonathan Mallinson, Jakub Adamek, Eric Malmi, and Aliaksei Severyn. Teaching small language models to reason. arXiv preprint arXiv:2212.08410, 2022.
Osamah Mohammed, Thaeer Mueen Sahib, Israa M Hayder, Sani Salisu, Misbah Shahid, et al. Chatgpt evaluation: Can it replace grammarly and quillbot tools? British Journal of Applied Linguistics, 3(2):34–46, 2023.
Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong. Codegen: An open large language model for code with multi-turn program synthesis, 2023.
Tobias Nipkow, Markus Wenzel, and Lawrence C Paulson. Isabelle/HOL: a proof assistant for higher-order logic. Springer, 2002.
Maxwell Nye, Anders Johan Andreassen, Guy Gur-Ari, Henryk Michalewski, Jacob Austin, David Bieber, David Dohan, Aitor Lewkowycz, Maarten Bosma, David Luan, et al. Show your work: Scratchpads for intermediate computation with language models. arXiv preprint arXiv:2112.00114, 2021.
Kostiantyn Omelianchuk, Vitaliy Atrasevych, Artem Chernodub, and Oleksandr Skurzhanskyi. Gector: grammatical error correction: tag, not rewrite. arXiv preprint arXiv:2005.12592, 2020.
OpenAI. chatgpt. https://chat.openai.com/, 2023.
OpenAI. Gpt-4 technical report, 2023.
Arkil Patel, Satwik Bhattamishra, and Navin Goyal. Are nlp models really able to solve simple math word problems? arXiv preprint arXiv:2103.07191, 2021.
Ruoling Peng, Kang Liu, Po Yang, Zhipeng Yuan, and Shunbao Li. Embedding-based retrieval with llm for effective agriculture information extracting from unstructured data, 2023.
Shuai Peng, Di Fu, Yijun Liang, Liangcai Gao, and Zhi Tang. Geodrl: A self-learning framework for geometry problem solving using reinforcement learning in deductive reasoning. In Findings of the Association for Computational Linguistics: ACL 2023, pages 13468–13480, 2023.
Tung Phung, Victor-Alexandru Pădurean, José Cambronero, Sumit Gulwani, Tobias Kohn, Rupak Majumdar, Adish Singla, and Gustavo Soares. Generative ai for programming education: Benchmarking chatgpt, gpt-4, and human tutors, 2023.
Stanislas Polu and Ilya Sutskever. Generative language modeling for automated theorem proving. arXiv preprint arXiv:2009.03393, 2020.
Xiao Pu, Mingqi Gao, and Xiaojun Wan. Summarization is (almost) dead. arXiv preprint arXiv:2309.09558, 2023.
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1):5485–5551, 2020.
Vipul Raheja, Dhruv Kumar, Ryan Koo, and Dongyeop Kang. Coedit: Text editing by task-specific instruction tuning. arXiv preprint arXiv:2305.09857, 2023.
Nazneen Fatema Rajani, Bryan McCann, Caiming Xiong, and Richard Socher. Explain yourself! leveraging language models for commonsense reasoning. arXiv preprint arXiv:1906.02361, 2019.
Ruiyang Ren, Yuhao Wang, Yingqi Qu, Wayne Xin Zhao, Jing Liu, Hao Tian, Hua Wu, Ji-Rong Wen, and Haifeng Wang. Investigating the factual knowledge boundary of large language models with retrieval augmentation, 2023.
Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, and Gabriel Synnaeve. Code llama: Open foundation models for code, 2023.
Laura Ruis, Akbir Khan, Stella Biderman, Sara Hooker, Tim Rocktäschel, and Edward Grefenstette. Large language models are not zero-shot communicators. arXiv preprint arXiv:2210.14986, 2022.
Nisan Stiennon, Long Ouyang, Jeffrey Wu, Daniel Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, and Paul F Christiano. Learning to summarize with human feedback. Advances in Neural Information Processing Systems, 33:3008–3021, 2020.
Alon Talmor, Jonathan Herzig, Nicholas Lourie, and Jonathan Berant. Commonsenseqa: A question answering challenge targeting commonsense knowledge. arXiv preprint arXiv:1811.00937, 2018.
Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, and Tatsunori B. Hashimoto. alpaca. https://crfm.stanford.edu/2023/03/13/alpaca.html, 2023.
TheVicunaTeam. vicuna. https://lmsys.org/blog/2023-03-30-vicuna/, 2023.
TigerResearch. Tigerbot. https://github.com/TigerResearch/TigerBot, 2023.
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.
Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.
Karthik Valmeekam, Alberto Olmo, Sarath Sreedharan, and Subbarao Kambhampati. Large language models still can't plan (a benchmark for llms on planning and reasoning about change). arXiv preprint arXiv:2206.10498, 2022.
Tu Vu, Mohit Iyyer, Xuezhi Wang, Noah Constant, Jerry Wei, Jason Wei, Chris Tar, Yun-Hsuan Sung, Denny Zhou, Quoc Le, et al. Freshllms: Refreshing large language models with search engine augmentation. arXiv preprint arXiv:2310.03214, 2023.
Rose E Wang and Dorottya Demszky. Is chatgpt a good teacher coach? measuring zero-shot performance for scoring and providing actionable insights on classroom instruction. arXiv preprint arXiv:2306.03090, 2023.
Yan Wang, Xiaojiang Liu, and Shuming Shi. Deep neural solver for math word problems. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 845–854, 2017.
Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171, 2022.
Xiaoxuan Wang, Ziniu Hu, Pan Lu, Yanqiao Zhu, Jieyu Zhang, Satyen Subramaniam, Arjun R Loomba, Shichang Zhang, Yizhou Sun, and Wei Wang. Scibench: Evaluating college-level scientific problem-solving abilities of large language models. arXiv preprint arXiv:2307.10635, 2023.
Yue Wang, Hung Le, Akhilesh Deepak Gotmare, Nghi D. Q. Bui, Junnan Li, and Steven C. H. Hoi. Codet5+: Open code large language models for code understanding and generation, 2023.
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837, 2022.
Yuhuai Wu, Albert Qiaochu Jiang, Wenda Li, Markus Rabe, Charles Staats, Mateja Jamnik, and Christian Szegedy. Autoformalization with large language models. Advances in Neural Information Processing Systems, 35:32353–32368, 2022.
Haoran Wu, Wenxuan Wang, Yuxuan Wan, Wenxiang Jiao, and Michael Lyu. Chatgpt or grammarly? evaluating chatgpt on grammatical error correction benchmark. arXiv preprint arXiv:2303.13648, 2023.
Huajian Xin, Haiming Wang, Chuanyang Zheng, Lin Li, Zhengying Liu, Qingxing Cao, Yinya Huang, Jing Xiong, Han Shi, Enze Xie, et al. Lego-prover: Neural theorem proving with growing libraries. arXiv preprint arXiv:2310.00656, 2023.
Jing Xiong, Jianhao Shen, Ye Yuan, Haiming Wang, Yichun Yin, Zhengying Liu, Lin Li, Zhijiang Guo, Qingxing Cao, Yinya Huang, et al. Trigo: Benchmarking formal mathematical proof reduction for generative language models. arXiv preprint arXiv:2310.10180, 2023.
Can Xu, Qingfeng Sun, Kai Zheng, Xiubo Geng, Pu Zhao, Jiazhan Feng, Chongyang Tao, and Daxin Jiang. Wizardlm: Empowering large language models to follow complex instructions, 2023.
Aiyuan Yang, Bin Xiao, Bingning Wang, Borong Zhang, Chao Yin, Chenxu Lv, Da Pan, Dian Wang, Dong Yan, Fan Yang, et al. Baichuan 2: Open large-scale language models. arXiv preprint arXiv:2309.10305, 2023.
Kaiyu Yang, Aidan M Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan Prenger, and Anima Anandkumar. Leandojo: Theorem proving with retrieval-augmented language models. arXiv preprint arXiv:2306.15626, 2023.
Zhen Yang, Ming Ding, Qingsong Lv, Zhihuan Jiang, Zehai He, Yuyi Guo, Jinfeng Bai, and Jie Tang. Gpt can solve mathematical problems without a calculator. arXiv preprint arXiv:2309.03241, 2023.
Qinghao Ye, Haiyang Xu, Jiabo Ye, Ming Yan, Haowei Liu, Qi Qian, Ji Zhang, Fei Huang, and Jingren Zhou. mplug-owl2: Revolutionizing multi-modal large language model with modality collaboration. arXiv preprint arXiv:2311.04257, 2023.
Longhui Yu, Weisen Jiang, Han Shi, Jincheng Yu, Zhengying Liu, Yu Zhang, James T Kwok, Zhenguo Li, Adrian Weller, and Weiyang Liu. Metamath: Bootstrap your own mathematical questions for large language models. arXiv preprint arXiv:2309.12284, 2023.
Zheng Yuan, Hongyi Yuan, Chuanqi Tan, Wei Wang, and Songfang Huang. How well do large language models perform in arithmetic tasks? arXiv preprint arXiv:2304.02015, 2023.
Eric Zelikman, Yuhuai Wu, Jesse Mu, and Noah D Goodman. Star: Bootstrapping reasoning with reasoning, 2022. URL https://arxiv.org/abs/2203.14465.
Aohan Zeng, Xiao Liu, Zhengxiao Du, Zihan Wang, Hanyu Lai, Ming Ding, Zhuoyi Yang, Yifan Xu, Wendi Zheng, Xiao Xia, et al. Glm-130b: An open bilingual pre-trained model. arXiv preprint arXiv:2210.02414, 2022.
Beichen Zhang, Kun Zhou, Xilin Wei, Wayne Xin Zhao, Jing Sha, Shijin Wang, and Ji-Rong Wen. Evaluating and improving tool-augmented computation-intensive math reasoning. arXiv preprint arXiv:2306.02408, 2023.
Ming-Liang Zhang, Fei Yin, and Cheng-Lin Liu. A multi-modal neural geometric solver with textual clauses parsed from diagram. arXiv preprint arXiv:2302.11097, 2023.
Peitian Zhang, Shitao Xiao, Zheng Liu, Zhicheng Dou, and Jian-Yun Nie. Retrieve anything to augment large language models, 2023.
Sabrina Zhang, Daksha Yadav, and Tom Jin. Cash transaction booking via retrieval augmented llm. In KDD 2023 Workshop on Robust NLP for Finance (RobustFin), 2023.
Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc Le, et al. Least-to-most prompting enables complex reasoning in large language models. arXiv preprint arXiv:2205.10625, 2022.
Aojun Zhou, Ke Wang, Zimu Lu, Weikang Shi, Sichun Luo, Zipeng Qin, Shaoqing Lu, Anya Jia, Linqi Song, Mingjie Zhan, et al. Solving challenging math word problems using gpt-4 code interpreter with code-based self-verification. arXiv preprint arXiv:2308.07921, 2023.