2023 9th International Conference on Advanced Computing and Communication Systems (ICACCS)

Automated Test Case Generation Using T5 and GPT-3

Abstract— Test case generation for a given topic can be a challenging and time-consuming task, particularly when the conversation about the topic includes boundary conditions and requirements. This process requires a complete understanding of the system and its scenarios to cover all possible examples, both good and bad. This paper proposes an automated solution for generating test cases using natural language processing techniques. The proposed methodology uses the T5 and GPT-3 models to extract the context and topic of a conversation and generate test cases. The model identifies relevant keywords and generates test cases based on them, providing context-based meanings of specific words and defining new terms used in the conversation. The proposed approach eliminates the need for manual test case generation and allows for the testing of more complex systems with a larger number of scenarios. This reduces the dependency on the expertise of the test designer, resulting in complete coverage of all scenarios. Additionally, this automation saves time and resources, making the process of test case generation more manageable. By utilizing NLP techniques, it helps to ensure that all scenarios are taken into account and that the generated test cases are comprehensive and accurate.

Keywords—Natural Language Processing, T5 Model, GPT-3, Text Conversation, Test Cases, Text Summarisation.

I. INTRODUCTION

Generating test cases from a conversation can present various obstacles that make the task arduous and time-consuming. One of the major challenges is the need for a thorough comprehension of the system and its potential scenarios to cover all examples, both positive and negative. This can be difficult, as the conversation about the topic may involve boundary conditions and requirements that must be considered.

Another issue is the dependency on the test designer's expertise. The test designer must have a solid understanding of the system and its scenarios to create comprehensive and accurate test cases. However, this dependency can result in overlooked sub-systems when generating test cases, leading to incomplete coverage of all scenarios.

Additionally, the manual process of creating test cases can be time-consuming and prone to human error, resulting in wasted resources and inefficiencies in the testing process. Furthermore, the test designer may miss certain edge cases while creating the test cases due to a lack of attention or understanding of the system.

This paper intends to develop an automated solution for test case generation for a given topic using natural language processing techniques, with the goal of improving the efficiency, accuracy, and completeness of the test case generation process. It also reduces the dependency on the expertise of the test designer and ensures complete coverage of all scenarios. This will not only save time and resources but also make the process of test case generation more manageable.

A. Implications Of The Study And Benefits

There is a requirement to improve the efficiency, accuracy, and completeness of the test case generation process by reducing the dependency on the expertise of the test designer and ensuring complete coverage of all scenarios. The potential benefits of the proposed solution include saving time and resources, making the process of test case generation more manageable, and improving the quality of various software applications such as automated customer service, quality assurance for chatbots, language understanding for virtual assistants, user testing for mobile apps, and text-based game testing.

Automating test case generation can offer many potential benefits to organizations involved in software development and testing. These benefits include reduced time and effort required for testing, improved quality of testing, reduced costs, improved collaboration and communication within development teams, and improved overall quality of software products.

The literature survey provides examples of related research, including methods for generating short test cases using an ensemble technique, model-based approaches for generating test cases from business process architecture, and techniques for improving the performance of natural
language understanding models. This study implies that an automated solution for test case generation using natural language processing techniques could provide significant benefits for the software development industry, by improving the efficiency and effectiveness of the testing process.

B. Applications

Automated customer service: By analysing text conversations between customers and support representatives, test cases can be automatically generated to ensure that the customer service system is able to handle common issues and questions.

Quality assurance for chatbots: Test cases can be generated from text conversations between users and chatbots to ensure that the chatbot is able to understand and respond appropriately to user input.

Language understanding for virtual assistants: Test cases can be generated from text conversations between users and virtual assistants to evaluate the assistant's ability to understand natural language and perform tasks.

User testing for mobile apps: Test cases can be generated from text conversations between users and the mobile app to evaluate the app's usability and functionality.

Text-based game testing: Test cases can be generated from text conversations between players and the game's NPCs (non-player characters) to ensure that the NPCs are able to understand and respond to player input.

As an aid to manual test case generation: By analysing text conversations between users, test cases can be generated that cover the common scenarios and edge cases, which can then be reviewed and refined by manual testers.

II. LITERATURE SURVEY

Zhuang et al. (2018) describe a method for generating short test cases using an ensemble technique [1]. It utilizes multiple rankers and a generation-based approach, and the final response is chosen based on the rankings provided by an ensemble module. However, it is important to note that the output of an ensemble model is difficult to predict and understand, making it less interpretable. Additionally, assembling an ensemble model can be challenging and time-consuming, and may result in a model with lower prediction accuracy than a single model.

The paper by Yazdani et al. (2019) aims to address the issue of a significant proportion of software errors stemming from a lack of understanding during the early stages of development [2]. To achieve this, the paper presents a model-based approach for automatically generating test cases from business process architecture and process characterizations. Using process models is an efficient way to find test cases for business sectors. The proposed approach includes a process-model-to-state-graph conversion algorithm, which sets up the model for data creation. Subsequently, a tool is used to automatically build the test cases based on the identified preconditions. The proposed method provides an efficient way to identify and generate test cases, which can help reduce errors and improve the software development process.

The paper [3] by Katuri et al. (2022) shows the development of an interface that transforms speech and other auditory inputs using a digital filter. One of the primary benefits of this technology is that it helps to avoid software errors in the translation of one system to another. As a result, it can prevent issues such as gender recognition errors and speech recognition failures. This interface can improve the accuracy of speech and auditory input translation, leading to more effective communication and less confusion.

Liu et al. (2021) investigated the sensitivity of GPT-3 to the choice of in-context examples [4]. The paper proposes a non-parametric selection approach called KATE, which retrieves in-context examples based on their semantic similarity to test samples. The study shows that this method improves GPT-3's performance on several natural language understanding and generation tasks compared to a random sampling baseline. Additionally, it was found that fine-tuning sentence embeddings for retrieval on task-related datasets results in further performance gains. The study suggests that this work will aid in understanding GPT-3's behavior and could be a useful step toward improving its few-shot capabilities.

The study [5] by Ni et al. (2021) presents a two-stage contrastive learning strategy for fine-tuning a pre-trained text-to-text model, T5, to create sentence encoders (ST5). The study suggests three architectures and compares the performance of encoder-only and encoder-decoder designs as sentence encoders. The results of extensive SentEval benchmark trials show that encoder-only models have high transfer performance, while encoder-decoder models are more effective on textual similarity tasks. The study also finds that increasing the model size from millions to billions of parameters results in significant benefits for sentence embedding, describes efficient techniques for creating ST5 from trained models, and highlights the value of increasing model size. The results imply that future advancements in the size and quality of pre-trained text-to-text models could lead to further benefits for sentence encoder models.

The P-tuning method proposed by Liu et al. (2021) enhances the natural language understanding (NLU) of GPT-3 by automatically searching for better prompts in the continuous space [6]. This method reduces dependence on a large validation set, has fewer adversarial prompts, and reduces overfitting. P-tuning recovers 64% of world knowledge and enables GPT-style models to compete with similar-size BERT models in NLU on the SuperGLUE benchmark. It also outperforms state-of-the-art methods on the few-shot SuperGLUE benchmark, improving on bidirectional models.

The paper [7] by Zolotareva et al. (2020) addresses abstractive text summarization, which has recently made significant progress by moving from linear models with sparse, handcrafted features to nonlinear neural network models with dense inputs. This success is due to the ability of deep learning models to capture complex patterns in natural language data without relying on handcrafted features. In this study, the authors investigate text summarization using sequence-to-sequence recurrent neural networks and Transfer Learning with a Unified Text-
to-Text Transformer approach. The experimental results demonstrate that the Transfer Learning-based model achieves considerable improvement for abstractive text summarization. The authors provide a comprehensive review of related works on abstractive text summarization and discuss the advantages and limitations of various approaches. They also discuss the potential applications of abstractive text summarization in various domains, including news and social media.

The commentary in the paper [8] by Floridi et al. focuses on the nature of reversible and irreversible questions and their relevance to the analysis of GPT-3, a third-generation autoregressive language model that uses deep learning to produce human-like text. The authors discuss three tests based on mathematical, semantic, and ethical questions to demonstrate that GPT-3 is not designed to pass any of them, and caution against interpreting GPT-3 as a step toward general artificial intelligence. The authors provide a literature survey on the industrialization of automatic and cheap production of good semantic artifacts, emphasizing the potential consequences of this development. Overall, the commentary highlights the need for a critical examination of the capabilities and limitations of AI technologies such as GPT-3, and the impact they may have on society.

The paper by Vel et al. (2021) highlights the need for efficient ways to access and extract relevant information from the exponential growth of digital information [9]. Text mining has emerged as a solution by enabling the mining of relevant information from unstructured or semi-structured text documents. The paper provides a comprehensive overview of text mining, including various concepts, techniques, pre-processing steps, applications, and issues, making it a valuable resource for researchers and practitioners interested in the field of text mining.

With the advent of automation, there is an increasing need for automated systems in answer assessment to reduce the workload of manual evaluators [10]. However, current systems are limited to option-based questions, making it difficult to evaluate theory answers. This paper presents the AutoEval system, an automatic exam paper evaluation system that uses natural language processing (NLP) methods to assess written responses. By automating the paper correction process, the system provides a reliable and standardized approach to evaluation while reducing the time and resources required for manual assessment. The paper discusses the use of NLP for grammatical analysis, syntactic analysis, semantic similarity, and database storage to evaluate exam papers. The study highlights the limitations of manual evaluation and the benefits of automated systems in providing unbiased and efficient assessment.

III. PROPOSED METHODOLOGY

This paper proposes a methodology that uses the T5 and GPT-3 models together. The T5 model is an encoder-decoder model that converts all natural language processing (NLP) problems into a text-to-text format. It is trained using a method called teacher forcing, which requires an input sequence and a corresponding target sequence for training. The input text is given by the user, but unnecessary information such as timestamps, names, and background noise is removed [11].

The context and topic of the conversation are derived from the pre-processed text using the YAKE extractor. Key phrases are selected by identifying 1-3-gram candidates that do not contain punctuation marks and do not begin or end with a stop word; the candidates are then weighted using the YAKE weighting scheme, and the 10 highest-scoring candidates are chosen as the key phrases [12].

The context is then tokenized using the SentencePiece library, and relevant sentences are extracted using the keywords. The sentences are combined and given as input to the T5 model for sequence-to-sequence encoding. Attention masks and input IDs are obtained from the encoding and are used to generate test cases for the given context.

We leverage the natural language generation (NLG) capability of the GPT-3 model, which is trained on online text to produce natural human-like text and can generate relevant text in response to any input. This allows the generated test cases to be expressed in natural language, including computer code and summaries. By using this methodology, the efficiency and accuracy of test case generation are improved, while the dependency on human expertise is reduced and comprehensive coverage of all possible scenarios is ensured. Essentially, the main purpose of using GPT-3 is to check the output generated by T5 and refine it further. The basic flow chart of the implementation is shown in Fig. 1.

Fig. 1. Basic flow chart of the implementation

A. Pre-Processing

When dealing with text in the form of documents, pre-processing is an essential step to prepare the data for further analysis.

The first step in document pre-processing involves removing unnecessary information such as timestamps and names, which are irrelevant to the text analysis. This can be achieved using a text wrapper function that puts the text in a more viewable format.

Once the content has been extracted, the next step is to tokenize the sentences. Tokenization is the process of breaking down the text into smaller components, or tokens, such as words or phrases, that can be easily processed by a computer.

In addition to tokenization, it is also important to identify the key phrases or topics within the text. This can be achieved using an unsupervised algorithm such as Multipartite or YAKE, which analyses the text to identify the most important words and phrases. The specific algorithm used may depend on the type of document being analysed and the desired outcome of the analysis.
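The pre-processing and key-phrase selection steps described above can be sketched in Python. This is a minimal sketch, not the paper's implementation: the timestamp and speaker-name patterns are illustrative assumptions about the transcript format, the stop-word list is abbreviated, and candidates are scored by simple frequency as a stand-in for the full YAKE statistical weighting.

```python
import re
from collections import Counter

# Abbreviated stop-word list for illustration; a real system would use a full list.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "on",
              "should", "at", "least", "be", "it", "for", "with"}

def preprocess(transcript: str) -> str:
    """Strip timestamps and speaker names (assumed formats), keeping the utterances."""
    text = re.sub(r"\[\d{2}:\d{2}(:\d{2})?\]", " ", transcript)   # e.g. "[12:03]"
    text = re.sub(r"^\s*\w+:\s*", "", text, flags=re.MULTILINE)   # e.g. "Tom: "
    return re.sub(r"\s+", " ", text).strip()

def candidate_phrases(text: str, max_n: int = 3):
    """Yield 1-3-gram candidates that span no punctuation and neither
    start nor end with a stop word, mirroring the YAKE candidate rules."""
    for chunk in re.split(r"[^\w\s]+", text):   # punctuation breaks candidates
        words = chunk.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                    continue
                yield " ".join(gram)

def top_key_phrases(text: str, k: int = 10):
    """Score candidates (here: raw frequency) and keep the k highest-scoring."""
    counts = Counter(candidate_phrases(text))
    return [phrase for phrase, _ in counts.most_common(k)]

transcript = """Tom: Password should be 8 characters long
Jane: Password should contain at least one uppercase letter"""
print(top_key_phrases(preprocess(transcript)))
```

On this fragment, "password" is the only repeated candidate and therefore ranks first; the real YAKE weighting additionally considers casing, position, and co-occurrence statistics.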
The pre-processing steps incorporated are shown in Fig. 2.

Fig. 2. Preprocessing steps incorporated

B. NLU Using T5 Model

The context produced after pre-processing is tokenized using the SentencePiece library, and relevant sentences are extracted using the keywords. The sentences are combined and given as input to the T5 model for sequence-to-sequence encoding. Attention masks and input IDs are obtained from the encoding and are used to generate test cases for the given context.

The T5 model is a text-to-text encoder-decoder model that is trained using teacher forcing, which requires input and target sequences. It is used to extract the context and topic of the conversation and generate the test cases. The input text is pre-processed before being fed to the model. The T5 model is considered one of the best models for natural language understanding. The processing done by the T5 model is shown in Fig. 3.

Fig. 3. Processing done by T5 Model

C. NLG Using GPT-3 Model

GPT-3 is trained to produce natural human text using online text and can generate relevant text in response to any input. The understanding from the T5 model, i.e., the attention mask and input IDs, is passed on to the GPT-3 model for its natural language generation capability. GPT-3 can generate relevant text for any input, including computer code and summaries. The processing done by the GPT-3 model is shown in Fig. 4.

Fig. 4. Processing done by GPT-3 Model

IV. DATASET

Training the T5 model with the WebNLG 2020 dataset has the potential to result in a high-quality natural language generation model. The dataset includes over 36,000 examples in two domains, with SPO triples and reference texts for each triple. However, training a high-quality model is challenging due to the complexity of the task, which requires the model to understand semantics and generate natural-sounding text. The quality of the resulting model depends on several factors, including the quality of the training data, the hyperparameters of the model, and the training process. Careful evaluation and hyperparameter tuning are crucial for achieving the best results.

After training the T5 model, GPT-3 is used to check its output. One way to do this is to use GPT-3 as a post-processing step, where the output of the T5 model is fed as input to GPT-3 to further refine the generated text. This can help improve the overall quality and coherence of the generated text.

V. EXPERIMENTAL RESULTS

The proposed approach of using T5 and GPT-3 for automated test case generation has the potential to significantly impact software testing. Traditionally, test case generation is a manual and time-consuming process, which can often result in inadequate test coverage and missed defects. With the use of machine learning models like T5 and GPT-3, it may be possible to automate the process of test case generation, thereby reducing the time and effort required for testing. The approach could potentially improve the accuracy and completeness of the generated test cases, leading to more effective testing and, ultimately, better quality software. It may also help to identify edge cases and corner cases that are often missed in manual test case generation. However, there are also potential limitations and challenges with this approach, such as the need for large amounts of training data and the risk of generating redundant or irrelevant test cases. Therefore, it is important to carefully evaluate and validate the effectiveness of this approach before implementing it in practice.

Consider a sample conversation. This can even be obtained as a transcript from one of the various meeting apps.

A. Testcase 1

The conversation here is between three people discussing passwords and the restrictions imposed on them. This is just an example; the conversation can be about anything and the proposed model will work on it, for instance a conversation about palindromes, or about identifying words starting with a particular letter and of fixed length. The output generated from the website using the proposed model is shown in Fig. 5.

Tom: Password should be 8 characters long
Jane: Password should contain at least one uppercase letter
Kjel: Password should contain at least one number
Jane: Password should contain at least one special character

Fig. 5. The output generated from the website using the proposed model
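For this conversation, the expected output amounts to checks on the four stated password rules, including boundary values such as a 7-character versus an 8-character password. The sketch below is hand-written to illustrate the kind of test cases the pipeline is expected to produce; it is not literal model output, and "8 characters long" is read here as a minimum length, which is an assumption.

```python
import re

def is_valid_password(password: str) -> bool:
    """Encodes the four rules stated by Tom, Jane, and Kjel."""
    return (
        len(password) >= 8                                  # Tom: at least 8 characters (assumed minimum)
        and re.search(r"[A-Z]", password) is not None       # Jane: at least one uppercase letter
        and re.search(r"[0-9]", password) is not None       # Kjel: at least one number
        and re.search(r"[^A-Za-z0-9]", password) is not None  # Jane: at least one special character
    )

# Test cases covering each rule and its boundary condition.
test_cases = [
    ("Passw0rd!", True),    # satisfies all four rules
    ("Pass0rd!", True),     # exactly 8 characters (length boundary)
    ("Pw0rd!!", False),     # 7 characters: too short
    ("passw0rd!", False),   # no uppercase letter
    ("Password!", False),   # no number
    ("Passw0rd", False),    # no special character
]

for password, expected in test_cases:
    result = is_valid_password(password)
    status = "PASS" if result == expected else "FAIL"
    print(f"{status}: {password!r} -> {result} (expected {expected})")
```

Each negative case violates exactly one rule, which is the edge-case coverage the paper argues an automated generator should provide.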
training data and the specific use case. It is important to carefully evaluate and compare different test case generation techniques before selecting an approach for a specific software testing project.

D. Limitations

Using GPT-3 as a post-processing step can be computationally expensive and time-consuming, as GPT-3 is a large and complex model that requires significant computational resources to run. Additionally, it may not always be necessary or practical to use GPT-3 to check the output of a T5 model, especially if the T5 model is performing well on its own.

Ultimately, the decision to use GPT-3 to check the output of a T5 model depends on the specific use case and the desired level of quality and refinement for the generated text.

Currently, the model also does not perform well when a large number of test cases is required. The model either cannot produce that many outputs, or the produced output may not be completely accurate.

VI. DISCUSSION

A. Explanation of Results

Consider a scenario where a company wants to test a new e-commerce platform they have developed. The platform has various features such as user registration, product search, shopping cart, checkout, and payment gateway. In order to test the platform, the testing team needs to generate a variety of test cases that cover all possible scenarios.

The traditional process of test case generation can be time-consuming and complex, requiring a deep understanding of the platform and its features. The testing team needs to create test cases for all possible scenarios, including boundary conditions and requirements. The manual process of test case generation can be prone to errors and may result in missed sub-systems, leading to incomplete coverage of scenarios.

To address these challenges, an automated solution using natural language processing techniques such as T5 and GPT-3 can be employed. By using these models, the system can generate test cases from text conversations with improved efficiency, accuracy, and completeness. This approach reduces the dependency on the expertise of the test designer and ensures that all scenarios are taken into account, resulting in comprehensive and accurate test cases.

For example, if a customer sends a text message to the customer support team saying "I am unable to complete the checkout process", the NLP models can be used to generate test cases that cover scenarios such as incorrect payment details, server downtime, or issues with the product itself. The generated test cases can be executed on the e-commerce platform, ensuring that all possible scenarios are covered and the platform is thoroughly tested.

The results of this study suggest that the use of NLP techniques in test case generation can be a promising solution, although the test cases in this paper are only a small sample of what is expected. The automated approach can save time and resources while improving the accuracy and completeness of test cases. However, there are limitations to this approach, such as the need for high-quality training data and the potential for model biases. Therefore, it is essential to validate the generated test cases and ensure that they cover all scenarios adequately.

Areas to be worked on could include developing a framework for validating and refining the generated test cases, evaluating the performance of different NLP models in test case generation, and exploring the potential of incorporating machine learning techniques for improving the efficiency and accuracy of the process. Additionally, the impact of the proposed automated approach on the overall testing process and its effectiveness in detecting faults could be evaluated in real-world scenarios.

B. Business Model

The business model provides a test case generation service for software development teams, quality assurance departments, and testing firms. Customers will provide conversation text, which will be pre-processed by removing irrelevant information. The model offers a subscription-based pricing system with different tiers and service level agreements. Various marketing channels will be used to promote the service, including digital ads and social media marketing. The business model aims to automate the test case generation process using NLP techniques to improve accuracy, reduce errors, and save time and resources.

VII. CONCLUSION AND FUTURE WORK

This paper proposes an automated solution for generating test cases from a given conversation. Our approach uses the T5 model for natural language understanding (NLU) of the conversation and GPT-3 for natural language generation (NLG) of the test cases. The T5 model is pre-trained on the WebNLG 2020 dataset and fine-tuned on the conversation. The GPT-3 model is fine-tuned using the knowledge obtained from the T5 model's NLU of the conversation's context. By combining the T5 model, which is well-suited for NLU tasks, and the GPT-3 model, which excels in NLG tasks, our proposed model can generate test cases without any human intervention. The model identifies relevant keywords and key phrases and generates test cases based on them. Additionally, it helps define new terms used in the conversation and provides context-based meanings of specific words. This model can be used for test-based development and will help in generating accurate and comprehensive test cases.

The proposed approach eliminates the need for manual test case generation, which can be tedious and error-prone. It also allows for the testing of more complex systems with a larger number of scenarios. By using natural language processing techniques, the methodology can quickly analyze large amounts of textual information and identify the most relevant information for generating test cases. This significantly reduces the time and resources required for test case generation.