Testing and Maintenance of Graphical User Interfaces
Contents

Acknowledgements
Abstract
Résumé en Français
1 Introduction
1.1 Context
1.2 Challenges and Objectives for Testing GUIs
1.3 Contributions
1.4 Overview of this thesis
II Contributions
3.1.3 Discussion
3.2 Relevance of the Fault Model: an empirical analysis
3.2.1 Introduction
3.2.2 Experimental Protocol
3.2.3 Classification and Analysis
3.2.4 Discussion
3.3 Are GUI Testing Tools Able to Detect Classified Failures? An Empirical Study
3.3.1 GUITAR and Jubula
3.3.2 Experiment
3.3.3 Results and Discussion
3.4 Forging faulty GUIs for benchmarking
3.4.1 Presentation of LaTeXDraw
3.4.2 Mutants Generation
3.4.3 How GUI testing tools kill our GUI mutants: a first experiment
3.5 Threats to Validity
3.6 Current limitations in testing advanced GUIs
3.6.1 Studying failures found by current GUI testing frameworks
3.6.2 Limits of the current GUI testing frameworks
3.7 GUI testing with richer UIDLs
3.7.1 Modeling GUIs with Malai UIDL
3.7.2 Interaction-action-flow graph
3.7.3 Case studies
3.8 Conclusion
Appendix
Bibliography
Acknowledgements

I would like to thank my supervisors, Benoit Baudry and Arnaud Blouin, for their guidance and mentoring that made this work possible. I would like to say that working in the Diverse team was a pleasure.
I owe special thanks to my Master's supervisor, Rossana Andrade, for introducing me to the research world and encouraging me to do a PhD abroad.
I would like to thank my family. Even so far away from home, they were always present in my life during these three years. I first wish to express my gratitude to my parents, Ademir and Maria Aparecida, for all their support, comprehension, and love. I would like to thank my brother Marcelo for helping me with his designer skills. Thanks to my sister Claudia for all her love and for giving us three awesome nephews: João Henrique, Arthur, and my goddaughter Júlia. A special thank you to my youngest sister Priscila for being a godmother and taking care of my precious Assolinha until the last moment of his life.
I would like to thank my beloved Assolinha, my eternal little gray, for making me a better person. You will always be in my heart and forever on my mind.
Finally, I thank my husband Manoel for encouraging me to accept this challenge. Thanks so much for your sacrifice, patience, support, and love. Your presence was vital. Thank you, my life. I love you.
Abstract
Graphical user interfaces (GUIs) are integral parts of interactive systems that require
interactions from their users to produce actions that modify the state of the system.
While GUI design and qualitative assessment is handled by GUI designers, integrating GUIs into software systems remains a software engineering task. The software engineering community pays special attention to the quality and reliability of software systems. Software testing techniques have been developed to find errors in code. Software quality criteria and measurement techniques have also been proposed to detect error-prone code.
In this thesis, we argue that the same attention has to be paid to the quality and reliability of GUIs, from a software engineering point of view. We specifically make two contributions on this topic. First, GUIs can be affected by errors stemming from development mistakes. While current GUI testing tools have demonstrated their ability to find several kinds of failures (e.g., crashes, regressions) on simple GUIs, various kinds of faults specific to GUIs still have to be identified and studied. The first contribution of this thesis is a fault model that identifies and classifies GUI faults. This fault model has been designed empirically by performing a round-trip process between the analysis of the state of the art of GUI design and bug reports. For each fault, we illustrate which part of a GUI is affected and how that fault manifests as a GUI failure. We show that GUI faults are diverse and require different testing techniques to be detected. We then develop GUI mutants derived from our fault model. These mutants are freely available for developers of GUI testing tools to evaluate the ability of their tools to find GUI failures.
Second, like any code artifact, GUI code must be maintained and is prone to evolution. Design smells are bad design choices in the code that may affect its quality. Maintenance operations can be performed to correct design smells. As for the first contribution, we focus on design smells that can specifically affect GUIs. We identify and characterize a new type of design smell, called Blob listener, which is a GUI-specific instance of a more general design smell, the God method, that characterizes methods that "know too much or do too much". It occurs when a GUI listener, which gathers events to treat and transform into commands, can produce more than one command. We propose a systematic static code analysis procedure that searches for Blob listeners, which we implement in a tool called InspectorGuidget. Experiments we conducted exhibit positive results regarding the ability of InspectorGuidget to detect Blob listeners. To counteract the use of Blob listeners, we propose good coding practices regarding the development of GUI listeners.
Résumé en Français
Context

Software systems generally rely on user interfaces to be controlled by their users. Such systems are then called interactive systems. User interfaces can take various forms, such as command-line interfaces, vocal interfaces, or, the focus of this thesis, graphical user interfaces. Graphical user interfaces are composed of graphical components, the best known of which are buttons, menus, and text fields. Users interact with these components (for example, by pressing a button with a pointing device such as the mouse) to produce an action¹ aiming, for instance, at modifying or viewing the data that the software manipulates.
The design and development of user interfaces, in particular graphical user interfaces, involve different roles. The designer conceives the interface, chooses how the user will interact with the system through it, and validates these choices. The software developer implements this user interface to integrate it into the system and validates this integration. The software development of user interfaces is a substantial task, since the resulting code can amount to 60% of the total software code [Mem07]. Moreover, the multiplication of input devices (touch screens, GPS, accelerometers, etc.) and platforms (tablets, smartphones, cockpits, etc.) makes the software development of these user interfaces more complex.
Since its beginnings, the software engineering community has paid special attention to the quality and reliability of software. Numerous software testing techniques have been developed to characterize and detect errors in software. Fault models identify and characterize the errors that can affect the different parts of a software system. In addition, software quality criteria and their measures make it possible to evaluate the quality of software code and to detect, upstream, code that is potentially error-prone. Static and dynamic analysis techniques scrutinize the software, respectively at rest and at runtime, to find errors or perform quality measurements. In this thesis, we advocate that the same attention must be paid to the quality and reliability of user interfaces, in the software engineering sense of the term. We believe that the code, or more generally the software artifacts, composing user interfaces involves specific errors that therefore require equally specific detection techniques.

¹ An action is also called an event [Mem07] or a command [GHJV95, BL00].
Graphical oracles analyze the results of a test execution and compare them to the expected results in order to deliver a verdict. Graphical oracles are designed to look for specific errors. The oracles currently handled by automated GUI test generation techniques mainly target crashes (for example, exceptions in Java) and regressions (for example, a button disappearing from one version to the next, preventing a test that uses it from executing correctly).
In parallel to software testing, the software quality field advocates the definition, characterization, detection, and (as automated as possible) correction of bad development practices [Fow99]. Indeed, the presence of errors in the code can be accentuated by the presence of bad practices [LS07, HZBS14]. The study of bad practices mainly concerns object-oriented code [BMMM98, Fow99, Mar04, PBDP+14, ZFS15]. Few works exist on bad practices in user interface development. The notion of "spaghetti of call-backs" is one of the first bad practices identified for user interfaces, highlighting the poor structure of the code tied to these interfaces [Mye91]. Architectural patterns such as MVC (Model-View-Controller) [KP88] or MVP (Model-View-Presenter) [Pot96] limit this problem by structuring the user interface code into different modules. The use of these architectural patterns, however, does not guarantee the absence of development problems, in particular with large and complex user interfaces [Kar08].
Challenges
User interface testing techniques have therefore been proposed to improve the quality and reliability of interfaces. We believe, however, that these techniques do not go far enough and that the domain of user interface testing should more broadly adapt concepts coming from software engineering. Moreover, these techniques target simple user interfaces, called WIMP (Window-Icon-Menu-Pointer) interfaces, which rely solely on keyboard-mouse interactions and the standard graphical components derived from them (buttons, checkboxes, etc.). Indeed, the diversification of input devices (touch screens, accelerometers, etc.) limits the use of WIMP interfaces, based on the keyboard and the mouse, in favor of so-called post-WIMP interfaces based on a wider range of devices, and thus of human-machine interactions and graphical components, with the following consequences:

1. Data are dynamically displayed by graphical components specifically developed for the purpose (which we will call ad hoc components).

3. Ad hoc interactions are designed to interact with these data and their associated components. From a software engineering point of view, such interactions are more complex than WIMP interactions: they aim at coming closer to the natural interactions that users perform in their everyday life (gestural interactions, vocal interactions, etc.). This generally implies interactions that are longer than a simple click on a button and in which several interaction modalities may be used at the same time (voice, gesture, etc.). Figure 3.7 illustrates this principle with a bimanual interaction represented as a finite-state machine, where each transition corresponds to an event produced by an input device.
[Figure 3.7: finite-state machine of a bimanual interaction; its transitions are labeled press, press, move | pressed, release, release, and voice | "abort".]
One of the major problems raised by this evolution is that user interface developers now face new types of errors that classical interface testing tools cannot find. Moreover, these tools focus on interface crashes and regressions [Mem07], even though interface-specific errors exist and require specific testing techniques to be identified.
Challenge #1: a UI fault model. An essential prerequisite for developing specific testing techniques for WIMP and post-WIMP user interfaces is a dedicated fault model, which has never been produced. A fault model provides an exhaustive classification and description of the faults affecting a particular domain, in this case user interfaces. Such a model allows testing-tool developers to devise a testing technique for each fault.
Challenge #2: a study of bad UI development practices. In parallel, the development of WIMP and post-WIMP interfaces involves code binding the user interface to the software. Like any software artifact, this code must be maintained, is likely to evolve, and can be affected by bad coding practices. For example, the bad practice called God method (aka. large method or blob method) identifies methods that are too long [Fow99]. Few works, however, have studied bad practices specific to user interface code.
Contributions
This thesis proposes two contributions in the domain of user interface testing and maintenance, as illustrated by Figure 4.2 (in green) and detailed below.
[Table excerpt, fault category "Incorrect data format or type": a date is displayed with five digits instead of six; a value in radians is displayed in degrees.]
[Figure: Blob listener detection process, involving GUI listeners, conditional commands detection, and Blob listeners detection.]
Good coding practices aiming at limiting the use of Blob listeners are then proposed.
Document structure
The document describing this thesis is structured as follows.
Chapter 5: this chapter concludes the thesis with a synthesis of the contributions and the research perspectives detailed in the next section.
Perspectives
The proposed fault model made it possible to identify the current limits of user interface testing tools. We believe that this model can serve as a basis for improving these tools. In particular:
• Graphical oracles. Each fault must be linked to a graphical oracle describing how to detect that fault. This step is complex because the oracles may require varied techniques. For example, to ensure that the graphical rendering of an object is correct, one can analyze a screenshot of it. By contrast, to ensure that a graphical object is in a given state, one can analyze the code defining that state.
Chapter 1
Introduction
1.1 Context
Software systems usually rely on user interfaces to be controlled by their users. Such systems are then called interactive systems. User interfaces can take various forms such as command-line interfaces, vocal interfaces, or graphical user interfaces (GUIs). GUIs are composed of graphical interactive objects called widgets, such as buttons, menus, and text fields. Users interact with these widgets (e.g., by pressing a button) to produce an action¹ that modifies the state of the system.
The development of GUIs involves multiple roles. The GUI design and qualitative
assessment is handled by GUI designers. Software engineers then integrate GUIs into
interactive systems and validate this integration using software testing techniques. GUI
development and validation is a major task in the development of an interactive system since GUIs may account for up to 60% of the total software code [Mye95, Mem07]. Besides, a multiplication of human interface devices (HIDs, e.g., tactile screens, gyroscopes) and a diversification of platforms (tablets, mobile, etc.) have been experienced over the last decade. As a result, software testers face user interfaces that have to be tested on multiple platforms with various HIDs, which increases the testing cost. Validating user interfaces, and in particular GUIs, from a software engineering point of view is now a crucial challenge. Similarly to the attention that has been paid to the validation of software systems over the last decades, we argue in this thesis that software testing must be tailored to consider GUI-specific defects. Indeed, GUIs have to be considered
as special software artifacts that require dedicated software testing techniques. Such
techniques can rely on mainstream software testing techniques, as we detail in this
document.
Standard widgets (e.g., buttons, menus) enable user interactions composed of a single input event (e.g., press). Such widgets work similarly across mainstream GUI toolkits² and mainly rely on the use of a mouse and a keyboard. Current GUI testing approaches have targeted GUIs composed of standard widgets (aka. standard GUIs [LBBC15] or WIMP GUIs [vD00])³. However,
the evolution of GUIs and how they are developed and maintained require increasing
efforts in the GUI testing domain as discussed below.
First, an ongoing trend in GUI design is the shift from designing "[user] interfaces to designing [user] interaction" [BL04]. The underlying goal is to embrace the diversity of HIDs and platforms to design user interfaces that are more adapted and natural to users, called post-WIMP interfaces. Post-WIMP GUIs require specific, ad hoc widgets [BL00, BL04, BB10] developed to fulfil a need that standard widgets cannot meet. Post-WIMP GUIs exploit the following mechanisms to be more adapted to users:
2. Ad hoc interactions are provided to interact with the data and their associated widgets. Developing and testing these interactions is more complex than for interactions from standard widgets. They involve multiple events (e.g., voice and gesture events) triggered by one or more modern HIDs such as tactile screens, gyroscopes, or eye trackers. An example is the bimanual interaction that can be performed on a multi-touch screen for zooming shapes using two fingers, as is more and more the case with mobile phones and tablets.
The aforementioned specificities in GUI design demand different ways of testing a GUI. For example, testing a GUI consists of verifying whether a user interaction produces the expected action. In WIMP GUIs, the interactions themselves are not tested since they are composed of a single event, which is embedded in the GUI toolkit. For example, the interaction of pressing the left mouse button has the same behavior on all GUI platforms, and thus the target is to test the action resulting from the execution of such an interaction. In post-WIMP GUIs, however, the ad hoc interactions involve multiple events and have special properties (e.g., feedback) that should be tested while they are performed.
Thus, we argue that a transition must occur in the GUI testing domain: move from
testing WIMP GUIs to testing post-WIMP GUIs.
One major challenge with this transition is that GUI developers and testers must handle new types of GUI faults that current standard GUI testing tools cannot detect. While GUI testing approaches have demonstrated their ability to find several kinds of failures (e.g., regressions, crashes) on standard GUIs [NRBM14, CHM12, APB+12, MPRS12, NSS10], various kinds of faults, which are specific to the characteristics of recent GUI developments, still need to be identified and studied.
² A GUI toolkit is a library (or a collection of libraries) that contains a set of widgets to support the development of GUIs, for instance the Java Swing or JavaFX toolkits.
³ WIMP stands for Windows, Icons, Menus, and Pointing device.
These faults mostly stem from post-WIMP GUIs, which embed more HCI (human-computer interaction) features such as direct manipulation and feedback [HHN85, Nor02, BL00, BB10]. In WIMP GUIs, checking whether a standard widget is correctly activated can be done easily using regression testing techniques [Mem07]. In post-WIMP GUIs, however, errors show up in ad hoc widgets, in their interactions, and in the graphical nature of these GUIs (e.g., data rendering). Thus, testing such GUIs requires new testing techniques.
Second, like any code artifact, GUI code should be analyzed statically to detect
implementation defects and design smells. Several bad design decisions can introduce design smells, such as violations of design principles (e.g., the OO encapsulation principle), ad hoc software evolution, and the non-use of well-known best practices (e.g., patterns [GHJV95], guidelines [Sun01]). Indeed, the presence of design smells may indicate fault-prone code [LS07, HZBS14]. For example, the design smell "God method"⁴ has been associated with the class error-probability [LS07].
Few efforts, however, have been made to investigate specific problems that arise from the development of GUI systems. To develop GUIs, software engineers have reused architectural design patterns, such as MVC [KP88] or MVP [Pot96] (Model-View-Controller/Presenter), that consider GUIs as first-class concerns (e.g., the View in these two patterns). These patterns clarify the implementation of GUIs by clearly separating concerns, thus minimizing the "spaghetti of call-backs" [Mye91]. Yet, the use of these patterns does not ensure that GUI code will not be affected by development problems, which become more apparent in large and complex GUIs [Kar08]. GUI development thus poses a challenge to GUI quality: identifying design smells that degrade GUI code quality. We call GUI design smells the design decisions that negatively affect the code responsible for the GUI behavior. These smells are recurrent issues that limit the maintenance and evolution of GUI systems. Furthermore, deciding whether GUI code contains design smells is not a trivial task due to the absence of GUI metrics to detect them. Software engineers have adapted well-known metrics such as cyclomatic complexity or code duplication to measure GUI complexity. Also, cyclomatic complexity has been correlated with the number of faults found in source code [KMHSB10]. So, we need to study which kinds of development practices affect the quality of GUI systems and which kinds of maintenance operations are required to correct them.
In this thesis, we address these two challenges by first providing a conceptual frame-
work for GUI testing, i.e., a GUI fault model, that covers the specificities of recent GUI
developments. A GUI fault model must provide clear definitions of the different kinds
of faults a GUI testing technique should look for. It then serves both as an objective for developing testing techniques and as a baseline for evaluating them. We address the second challenge through the development and empirical assessment of a novel static analysis technique, which specifically targets GUI code and focuses on one bad design practice that we call Blob listener. The objectives to improve the testing and maintenance of GUIs are listed as follows.
⁴ Also called long method. It is a method that has grown too large [Fow99].
GUI testing:
• Provide a conceptual framework for GUI testing, which integrates the characteristics of GUI development paradigms;
GUI maintenance:
1.3 Contributions
This thesis establishes two core contributions in the research on the testing and maintenance of GUIs. We detail these contributions below.
1. GUI fault model and its empirical assessment: We have proposed a GUI fault model based on HCI concepts to identify and classify GUI faults. These faults concern WIMP and post-WIMP GUIs and are described at two levels: user interface and user interaction. Our fault model tackles dual objectives: 1) provide a conceptual GUI testing framework against which GUI testers can evaluate their tool or technique; and 2) build a set of benchmark mutations, i.e., GUI mutants⁵, to evaluate the ability of GUI testing tools to detect failures in both WIMP and post-WIMP GUIs.
To evaluate our fault model, we have conducted three empirical studies. In the first study, we assess the relevance of our fault model w.r.t. faults currently observed in existing GUIs. The goal is to state whether our GUI fault model is relevant against failures found in real GUIs. We have successfully classified 279 GUI-related bug reports of five highly interactive open-source GUIs with our GUI fault model. In the second study, we have evaluated the ability of two well-known GUI testing tools to detect real GUI failures previously classified with our GUI fault model. These tools differ in how they build the GUI model: from a specification of the SUT or from the SUT itself by reverse engineering. We have observed some faults related to standard widgets that GUI testing tools cannot detect, such as GUI failures regarding the auto-completion feature or the data model. In the last study, we have executed these GUI tools against 65 GUI mutants derived from our fault model. These mutants are planted in a highly interactive open-source system, and they concern both standard widgets and ad hoc widgets. This evaluation has demonstrated that the GUI testing frameworks failed at detecting 43 out of 65 mutants, most of them related to ad hoc widgets and their user interactions.

⁵ All these mutants and the original version are freely available at https://github.com/arnobl/latexdraw-mutants
The above experiments allowed us to study the current limitations of standard GUI testing frameworks. Thus, we have demonstrated why GUI failures that come from recent GUI developments were not detected by these frameworks. As a secondary contribution, we have introduced a richer User Interface Description Language (UIDL), based on the concept of interaction-action-flow graph (IFG), to tackle these limitations. We have presented the initial feedback from the use of this approach in two case studies: an industrial project and an open-source interactive system. The preliminary results highlighted the capability of the IFG to produce test cases able to detect some defects in advanced GUIs.
2. GUI code quality assurance: We propose a novel static analysis technique to automatically detect one specific GUI design smell, called Blob listener. We identified and characterized Blob listeners as a new type of GUI design smell. It occurs when a GUI listener can produce more than one GUI command; a GUI command is a set of statements executed in reaction to a GUI event triggered by a widget (a code sketch after this list illustrates the smell). We implement this technique in a tool called InspectorGuidget, which currently focuses on detecting Blob listeners in Java Swing software systems. We focus on the Java Swing toolkit because of its popularity and the large quantity of Java Swing legacy code. Also, InspectorGuidget leverages the Eclipse development environment to raise warnings in the Eclipse Java editor when it suspects the presence of Blob listeners.
To design this tool, we first conducted an empirical study to identify the good and bad practices in GUI programming. We examined in detail the GUI controller code of open-source interactive systems stored in 511 code repositories on Web hosting services. These repositories contain 16,617 listener methods and 319,795 non-listener methods. The results pointed out a higher use of conditional statements (e.g., if) in listeners used to manage widgets. Second, we identified several good and bad coding practices in GUI code. The critical bad development practice is the Blob listener. We have observed that the use of Blob listeners in real GUI implementations complexifies the GUI code by, for instance, degrading the GUI code quality or introducing GUI faults. Third, we propose an algorithm to statically analyse GUI controllers and detect Blob listeners. Last, the ability of InspectorGuidget to detect the presence of Blob listeners is evaluated on six highly interactive Java systems. InspectorGuidget has successfully identified 67 out of 68 Blob listeners. The results exhibit a precision of 69.07% and a recall of 98.81%. Our experiments also show that 5% of the analyzed GUI listeners are Blob listeners.
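To make the smell concrete, here is a minimal illustrative sketch in Java Swing (the widget and class names are hypothetical, not taken from the analyzed systems): a single listener is registered on several widgets and must inspect the event source to decide which command to produce.

import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import javax.swing.JButton;
import javax.swing.JMenuItem;

// One listener registered on three widgets: it can produce three different
// commands depending on the event source, making it a Blob listener.
class EditorController implements ActionListener {
    private final JButton saveButton = new JButton("Save");
    private final JButton undoButton = new JButton("Undo");
    private final JMenuItem quitItem = new JMenuItem("Quit");

    EditorController() {
        saveButton.addActionListener(this);
        undoButton.addActionListener(this);
        quitItem.addActionListener(this);
    }

    @Override
    public void actionPerformed(final ActionEvent e) {
        final Object src = e.getSource();
        if (src == saveButton) {
            // command 1: save the document
        } else if (src == undoButton) {
            // command 2: undo the last action
        } else if (src == quitItem) {
            // command 3: quit the application
        }
    }
}

A design avoiding the smell registers one dedicated listener per widget, so that each listener produces exactly one command.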
The above contributions have been published [Lel13, LBB15, LBBC15] or submitted for publication [LBB+16]. The preliminary challenges of testing interactive systems appeared in [Lel13]. The work on the fault model was published in [LBB15]. The kinds of GUI faults have been identified by performing a complete research study based on HCI concepts and real bug reports. We also demonstrate the weakness of standard GUI testing tools at detecting several GUI failures by building a set of benchmark mutations derived from our fault model. The current limitations of GUI testing frameworks are published in [LBBC15]. In this work, we explain why standard GUI testing tools fail at testing advanced GUIs. We also present how a user interface description language dedicated to advanced GUIs can be applied to GUI testing to overcome these limitations. The work on code quality assurance has been submitted [LBB+16]. In this work, we present the empirical study investigating the coding practices that affect Java Swing GUIs. We also present and evaluate the static analysis approach that automatically detects the presence of Blob listeners.
Chapter 3 presents an original, complete fault model for GUIs. The GUI faults are explained in detail at two levels: user interface and user interaction. This chapter also gives several examples of GUI failures caused by these faults. The empirical studies performed to assess the fault model are described in detail. We also present the limitations of standard GUI testing frameworks in testing more advanced GUIs. We end this chapter by presenting the concept of interaction-action-flow graph to tackle these limitations.
Chapter 4 starts by showing the coding practices that may affect GUI code quality. These practices are analyzed through an empirical study of GUI controllers across 511 code repositories. We then introduce the critical bad coding practice that we identified and characterized as a new GUI design smell, called Blob listener. Next, we present an automated static analysis approach to detect this smell in GUI controllers. The empirical assessment of our approach's ability at detecting Blob listeners in six large, popular Java applications is presented in detail. We also propose good coding practices to avoid Blob listeners. This chapter ends with ongoing work towards refactoring Blob listeners.
Chapter 5 presents the main achievements of this thesis and future research directions. As future research, we present simple examples of bad coding practices, specific to the Java Swing toolkit, that we identified. Such practices may be used to develop a kind of FindBugs or PMD extension that targets GUIs.
Part I
Chapter 2
Context

This chapter introduces the main concepts of GUI design to illustrate the new challenges for GUI testing, and presents the limitations of the current state-of-the-art techniques.
First, we present the characteristics that compose a GUI and explain them through the two generations of GUIs (Section 2.1). Then, we detail the interactive features that make GUIs complex to develop and test. Second, we present the existing architectural design patterns used to develop GUIs (Section 2.2). Last, we present the seminal GUI V&V techniques applied to support GUI testing (Section 2.3). We also explain how the current solutions leverage these techniques to provide an automated GUI testing process and point out their drawbacks.
In the next subsection, we present the two generations of graphical user interface
styles: WIMP and post-WIMP GUIs.
Post-WIMP GUIs go beyond the mere use of mice and keyboards to fit new input devices such as tactile screens, gyroscopes, and finger/eye trackers. They are present in several domains, such as ubiquitous and mobile computing and virtual reality. These domains leverage novel human-computer interaction techniques to develop highly interactive GUIs with more "real world" interactions [JGH+08]. van Dam proposed that a post-WIMP GUI is one "containing at least one interaction technique not dependent on classical 2D widgets such as menus and icons" [vD97]. Thus, post-WIMP GUIs rely more and more on ad hoc widgets and their complex³ interactions.
Ad hoc widgets are mainly developed for a specific purpose, such as GUIs in avionics systems [FFP+13]. These widgets are dynamic and their underlying data are not "predefined" as in WIMP GUIs. Notably, objects in post-WIMP GUIs have more complex data structures, such as geometric objects in 2D (e.g., vector-based graphics systems) or 3D systems. These objects are more dynamic since they evolve as interactions are performed. For example, 3D objects change their appearance as the user interacts with them. Besides, ad hoc widgets allow users to interact more directly with objects (e.g., data). We call such interactions multi-event interactions. They are composed of multiple events (as opposed to mono-event interactions) produced by one or more input devices. So, post-WIMP interactions can be sequential (i.e., multiple successive events) and/or simultaneous (i.e., multiple concurrent interactions). For example, a multimodal interaction⁴ can be represented by a finite-state machine, where each transition is an event produced by an input device [BB10].
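As an illustration, the following Java sketch (hypothetical names; events are encoded as plain strings) models such a multi-event bimanual interaction as a finite-state machine: each call to onEvent consumes one low-level event produced by an input device.

enum BimanualState { START, ONE_FINGER_DOWN, TWO_FINGERS_DOWN, ENDED, ABORTED }

final class BimanualInteraction {
    private BimanualState state = BimanualState.START;

    // Feeds one input event, e.g., "press", "move", "release", "voice:abort".
    void onEvent(final String event) {
        switch (state) {
            case START:
                if (event.equals("press")) state = BimanualState.ONE_FINGER_DOWN;
                break;
            case ONE_FINGER_DOWN:
                if (event.equals("press")) state = BimanualState.TWO_FINGERS_DOWN;
                else if (event.equals("release")) state = BimanualState.ENDED;
                break;
            case TWO_FINGERS_DOWN:
                // a "move" event updates the zoom feedback and stays in this state
                if (event.equals("release")) state = BimanualState.ONE_FINGER_DOWN;
                else if (event.equals("voice:abort")) state = BimanualState.ABORTED;
                break;
            default:
                break; // ENDED and ABORTED are terminal states
        }
    }

    BimanualState state() { return state; }
}

Testing such an interaction means checking not only the final action but also the intermediate states and the feedback provided while the machine progresses.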
Figure 2.2 illustrates a graphical post-WIMP editor called CNP2000 [BLL00, BL04]. This editor was redesigned for manipulating colored Petri nets through complex interaction techniques. The GUI of CNP2000 is based on toolglasses, marking menus, a floating palette⁵, and bimanual interaction to manipulate objects. A bimanual interaction is a variant of direct manipulation that requires both hands to perform an action on a GUI (e.g., zooming or resizing a GUI object).
Direct manipulation lets users act directly on the graphical representations of their objects of interest, in opposition to the use of standard widgets that bring indirection between users and their objects of interest. For instance, scaling a shape using a bimanual interaction on its graphical representation is more direct than using a text field. So, developing direct manipulation GUIs usually implies the development of ad hoc widgets, such as the drawing area. These ad hoc widgets are usually more complex than standard ones since they rely on advanced interactions (e.g., bimanual, speech+pointing interactions) and a dedicated data representation (e.g., shapes painted in the drawing area). Testing such heterogeneous and ad hoc widgets is thus a major challenge.
Another seminal HCI concept is feedback [HHN85, Nor02, BL00, BB10]. Feedback is provided to users while they interact with GUIs. It allows users to continuously evaluate the outcome of their interactions with the system. Direct manipulation relies on feedback to provide users with continuous representations of the data objects of the system. Feedback is computed and provided by the system through the user interface and can take many forms. A first WIMP example is when users move the cursor over a button: to notify that the cursor is correctly positioned to interact with the button, the cursor changes its shape. A more sophisticated example is the selection process of most drawing editors, which can be done using a drag-and-drop (DnD) interaction. While the DnD is performed on the drawing area, a temporary rectangle is painted to notify users about the current selection area.
Another HCI concept is the notion of reversible actions [Shn83, HHN85, BB10]. The goal of reversible actions is to reduce user anxiety about making mistakes [Shn83]. In WIMP GUIs, reverting actions is reified by the undo/redo features, usually performed using buttons or shortcuts that revert the latest executed actions. In post-WIMP GUIs, recent work promotes the ability to cancel actions in progress [ACP12]. In such a case, users may cancel an interaction while it is executed, when it only changes the state of the view. For instance, moving a graphical object (e.g., a shape) to a new position can be reverted before the object reaches that position. This feature thus makes widgets more amenable to direct manipulation.
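A minimal sketch of reversible actions, assuming a command-style design with hypothetical names: each action knows how to revert itself, and a history stack supports undoing the latest executed action.

import java.util.ArrayDeque;
import java.util.Deque;

interface ReversibleAction {
    void execute();
    void undo();
}

final class ActionHistory {
    private final Deque<ReversibleAction> done = new ArrayDeque<>();

    void perform(final ReversibleAction action) {
        action.execute();
        done.push(action);
    }

    // Reverts the latest executed action, as a typical undo button does.
    void undoLast() {
        if (!done.isEmpty()) done.pop().undo();
    }
}

// Example: moving a shape can be reverted by translating it back.
final class Shape {
    double x, y;
    void translate(final double dx, final double dy) { x += dx; y += dy; }
}

final class MoveShape implements ReversibleAction {
    private final Shape shape;
    private final double dx, dy;

    MoveShape(final Shape shape, final double dx, final double dy) {
        this.shape = shape;
        this.dx = dx;
        this.dy = dy;
    }

    @Override public void execute() { shape.translate(dx, dy); }
    @Override public void undo() { shape.translate(-dx, -dy); }
}

Cancelling an action in progress can be supported in the same way, by undoing the partial changes made to the view before the action completes.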
All the HCI concepts introduced in this subsection are interactive features that require significant efforts to develop and test GUIs. Software engineers have paid special attention to integrating these features in GUIs. Such an integration has been done through complex pieces of code, which must be thoroughly analyzed and tested to detect faults. While GUI implementations rely on existing GUI architectural design patterns to minimize recurrent development problems and thus provide higher-quality code, GUI testing techniques have adapted existing testing techniques to provide GUI solutions that ensure the reliability of GUIs.
We thus present in the next sections the GUI architectural design patterns used to
develop GUIs and the GUI V&V techniques applied to GUI testing.
GUI architectural design patterns structure the components of an interactive system and define how they interact with each other. This section gives a brief overview of the mainstream architectural design patterns used to develop GUIs.
The core decomposition shared by all the GUI architectural design patterns is the separation between the data model of the system (aka. the functional core) and its representations (the views or user interfaces). This explicit separation permits the model to be independent of any view. So, multiple views can be developed for a single data model. Views can be added, removed, or modified without any impact on the model.
Based on this model-view separation, multiple GUI architectural design patterns have been proposed to refine this separation with details. The most well-known architectural design pattern is certainly the Model-View-Controller pattern (MVC) [KP88]. MVC organizes an interactive system in three main components supplemented with communications between them, as depicted by Figure 2.3. The model corresponds to the data model of the system. The view is the representation of this data model. The controller is the component that receives the events produced by users while interacting with the view and treats them to modify the model or the view. On changes, the model can notify its observers (in our case the views) so that they update themselves.
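A minimal sketch of this notification flow in Java, with hypothetical names (real toolkits provide their own observer plumbing):

import java.util.ArrayList;
import java.util.List;

// The model notifies its observers (the views) on changes.
interface ModelObserver { void modelChanged(); }

final class CounterModel {
    private int value;
    private final List<ModelObserver> observers = new ArrayList<>();

    void addObserver(final ModelObserver o) { observers.add(o); }
    int getValue() { return value; }

    void setValue(final int v) {
        value = v;
        observers.forEach(ModelObserver::modelChanged); // notify the views
    }
}

// A view registers itself as an observer and refreshes on notification.
final class CounterView implements ModelObserver {
    private final CounterModel model;

    CounterView(final CounterModel model) {
        this.model = model;
        model.addObserver(this);
    }

    @Override public void modelChanged() {
        System.out.println("value: " + model.getValue());
    }
}

// The controller updates the model in reaction to GUI events.
final class CounterController {
    private final CounterModel model;

    CounterController(final CounterModel model) { this.model = model; }

    // called by the view when a GUI event (e.g., a button press) occurs
    void onIncrementEvent() { model.setValue(model.getValue() + 1); }
}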
The transmission of events from a view to its controller usually relies on an event-driven programming pattern: an interaction produced through the view by a user using an input device (e.g., pressing a button using a mouse) produces a GUI event (e.g., a mouse event). This GUI event is then transmitted to the controller of the view, which receives it through a listener method (aka. a handler method). For instance, Listing 2.1 illustrates a Java Swing listener method actionPerformed implemented in a simple controller AController. This listener receives the events (ActionEvent) produced while interacting with buttons or menus. Such an event can then be treated by the controller in this method to perform some action.
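A minimal sketch of such a controller is shown below (its body is illustrative):

import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;

class AController implements ActionListener {
    @Override
    public void actionPerformed(final ActionEvent event) {
        // Treats the ActionEvent produced while interacting with a button
        // or a menu, e.g., to modify the model or the view.
    }
}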
A related pattern is the Model-View-Presenter pattern (MVP) [Pot96]. MVP shares many similarities with MVC. The main differences are the renaming of the controller as a presenter and the communications between the three components, as depicted by Figure 2.4. The presenter is the core of the MVP pattern since the model does not notify its views anymore. The presenter processes the events produced by the views and updates them. This change permits the model to be fully independent of observers (views). MVP is mainly used to develop Web applications where the model remains on the server, the view on the client, and the presenter is cut in two parts: one on the client and another on the server. In this case the presenter manages the communications between the client and the server. Implementations of MVP mainly rely on the use of listeners/handlers to treat GUI events.
The differences between MVC and MVP are, however, not so clear in practice since
these patterns provide few details about how to implement them. Many variations and
interpretations on MVC exist so that one may already be using MVP under the guise of
MVC. Besides, a second version of MVC, called MVC2, has been proposed and matches the MVP definition⁶. Beaudoux et al. discuss the distinctions between MVC and MVC2 [BCB13]. MVC2 evolves MVC to allow the definition of complex relationships between models and views, such as data binding mechanisms. Both patterns, however, do not provide mechanisms for separating the interactions (e.g., DnD) on presentations from the actions (e.g., Create) on the domain objects. In such a case, a mechanism should be provided for transforming a user interaction into an action that has an effect on the presentation and/or on the domain objects.
Other industrial and academic GUI architectural design patterns have been proposed. We focus on two of them, related to MVC and MVP, that rely on event-driven programming. The Model-View-ViewModel pattern (MVVM) is the core component of the WPF (Windows Presentation Foundation) toolkit developed by Microsoft [Smi09]. MVVM can be summarized as the MVP pattern supplemented with WPF features, such as data binding. Regarding academic patterns, Malai [Blo09, BB10] can be viewed as an MVC or MVP pattern supplemented with two other components: interaction and action. These components respectively permit developers to explicitly define user interactions as finite-state machines and actions as reversible GUI commands.
The use of GUI architectural design patterns clarifies the implementation of GUIs by clearly separating concerns, and thus helps developers avoid some problems such as the "spaghetti of call-backs" [Mye91]. The GUI implementations, however, still suffer from recurrent development problems.

⁶ http://www.javaworld.com/article/2076557/java-web-development/understanding-javaserver-pages-model-2-architecture.html
The most used technique to measure the quality and reliability of GUIs is testing. Testing GUIs aims at validating whether a GUI has the expected behavior: does the GUI conform to its requirements specification? Indeed, GUI testing exercises each possible state of a GUI through its components, looking for failures. Most GUI testing approaches focus on automated GUI testing to provide effective bug detection [Pim06, MPRS11, NRBM14]. For example, model-based GUI testing solutions rely on
models to drive the test case generation. In such a case, the reliability of the GUI
depends on the reliability of the models used for modeling and testing GUIs [BBG11].
The reliability of GUIs, however, is still affected by low-quality code. Static analyses have thus been proposed to measure the quality of GUI code. They aim at detecting problems in the source code that may lead to faults. Furthermore, V&V techniques are complementary, and several GUI V&V techniques can be combined to achieve a full GUI testing process. For example, model-based testing may use a dynamic analysis to build a GUI test model.
GUI V&V techniques can be supported by defect classification schemes. An example is fault models (Section 2.3.1.3), which are largely used as a starting point in fault-based testing such as mutation testing. Mutation testing leverages fault models to implement a set of mutations that should be detected by test suites. Thus, a complete test suite must detect all mutations derived from the fault model.
In the next subsections, we present seminal V&V techniques. This section is organized as follows. We first introduce the defect classification schemes adapted for GUIs (Section 2.3.1). Then, we present the current approaches to automate GUI testing, such as model-based testing (Section 2.3.2) and dynamic analysis (Section 2.3.3). Last, we present the GUI solutions that leverage static analysis (Section 2.3.4).
7. "System and Software Architecture" focus on defects that affect the entire software
system such as architectural errors.
8. "Test Definition or Execution Bugs" gather defects found in the definition, design,
and execution of tests, and the data used to validate the system (e.g., wrong test
initialization, incomplete test cases).
Moreover, each of the above categories is further broken down into subcategories. This hierarchical structure specializes the defect categories to classify a defect more precisely. So, a defect is represented by a four-digit number, where the first two digits identify the category and a subcategory, respectively. For example, the number 61xx classifies a defect within the category "Integration" and the subcategory "Internal interfaces" (see Table 5.1 in the Appendix).
The ODC taxonomy defines eight defect types: function, interface, checking, assignment, timing/serialization, build/package/merge, documentation, and algorithm. Also, the defect types can be associated with the software development phases (e.g., design, code). The interface type concerns defects in the communication between software components (e.g., modules, subsystems) or hardware (e.g., device drivers). An example of interface defect is a wrong type of variable used in the parameter of a function call.
The IEEE standard 1044 aims at classifying software anomalies. The term anomaly is used to describe any problem detected within the project, the product, or the software life cycle. Also, the IEEE standard distinguishes the terms problem, defect, fault, error, and failure. The terms fault and defect are not used interchangeably. For example, a defect is not considered a fault if it is found by inspection or static analysis activities. Thus, the IEEE standard proposes a scheme that allows software engineers to categorize defects, faults, and failures, and their relationships. The defect categories are represented by a set of attributes such as interface, logic, and data. Similarly to the ODC taxonomy, the root cause of a defect can be correlated with the phase of the software life cycle (e.g., requirements, design, code, configuration). Also, defects in the interface category concern the same problem: the communication between components. Table 2.1 illustrates the defects in this category.
The ODC extension to cover GUI defects was proposed by IBM Research [Res13a]. This extension defines six ODC triggers⁸ that concern the visual aspects of GUIs. Table 2.2 shows the triggers (highlighted in blue and italics) and the corresponding defect types. These types remain the same as described in ODC version 5.2 for the Software Design and Code classification [Res13b].
In Brooks' work [BRM09], one category that focuses on GUI faults is added (GUI faults), but no GUI fault classification is proposed.
Other defect classifications focus on supporting GUI testing automation. For instance, Xuebing has defined a small defect classification to drive a particular test case generation [Yan11]. This classification covers a few types of GUI defects, which are grouped in the three following categories:
1. Functional defects are defects found when events from functional objects (e.g.,
buttons, menus) are invoked;
2. Interactive defects are defects that concern the interaction between a user and the
SUT to exchange some information (e.g., data editing);
3. GUI adjustment defects are defects regarding the GUI adjustment objects, which are used to adjust the GUI (e.g., resizing or scrolling).
The above categories are based on the most common standard widgets. Table 2.3 illustrates these categories with their related widgets. A set of widgets is linked to each defect category (called a "defect class" by the author). For instance, the category "Functional defects" is linked to widgets that have their results reflected in GUI properties (e.g., buttons), whereas the category "Interactive defects" is related to widgets that receive inputs from the users (e.g., combo boxes).
Table 2.3: Defects and their related objects presented by Xuebing [Yan11]
Defect class           | Related object type
Functional defects     | Button, Tool Button, Menu Item, List Item, Tree View, etc.
Interactive defects    | EditBox, ComboBox, CheckBox, Spinner, TrackBar, etc.
GUI adjustment defects | Scroll Bar, Title Bar, Split Container, Tab Layouts, etc.
(zooming in and out), and Widget. Although both research studies have investigated failures in a context that brings many advances in terms of interactive features, no GUI defect classification or discussion about these kinds of failures is presented.
Mauser et al. propose a GUI failure classification for automotive systems [MKZD12]. This classification is based on three categories: design, content, and behavior. In the Design category, the failures refer to GUI layouts (e.g., color, font, position). In the Content category, the failures are associated with the displayed data such as text, animations, and symbols/icons. The failures in the Behavior category are caused by a wrong behavior of windows (e.g., a wrong pop-up) or widgets (e.g., a wrong focus). The authors focus on characterizing GUI failures based only on a small set of specific widgets designed for these kinds of GUIs. Furthermore, they do not consider failures that come from user interactions.
Definition 2.4 (Fault Model) A fault model describes a set of faults responsible for
a failure possibly at a high level of abstraction and is the basis for mutation testing.
This definition highlights two key points of fault models: 1. they provide different levels of abstraction; and 2. they qualify the effectiveness of testing techniques in terms of fault detection. A higher-level fault model describes faults covering both physical faults and software faults [vBDD+91]. Physical faults concern faults that affect the hardware components. One example is the well-known stuck-at fault model, which concerns physical faults within circuits in digital systems [MC71]. Software fault models provide a collection of faults that occur in software systems. These faults can be structured according to the software assets (e.g., specification, design, implementation). This allows software engineers to provide more detailed fault models and thus capture "what is going wrong" in a specific part of a system. Because of this specificity, fault models are widely applied to measure the fault-detection effectiveness of automated testing techniques. Binder wrote: "a fault model answers a simple question about testing techniques: why do the features called out by the technique warrant our effort?" [Bin99].
One way to measure the fault coverage of test suites is to use a fault-based testing technique such as mutation testing (aka. mutation analysis). Mutation testing creates several mutants of a software system [JH11]. Each mutant contains a fault derived from a fault model and is called a faulty version of the system. A fault is seeded in the system's code by a mutant operator⁹, which produces specific incorrect statements. The most common examples of mutant operators are the deletion/insertion/duplication of a statement and the replacement of relational (e.g., >, >=) or conditional (e.g., &&, ||) operators. Once the mutants are planted, they are executed against a test suite. A mutant is killed when the test suite detects its corresponding failure.
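For instance, the following Java snippet (hypothetical code, not taken from the cited studies) shows a mutant produced by a relational operator replacement; a test exercising the boundary value kills it.

// Original version: eligibility starts at age 65.
final class Discount {
    static boolean eligible(final int age) { return age >= 65; }

    // Mutant: the operator ">=" is replaced by ">", so the boundary
    // value 65 now behaves incorrectly.
    static boolean eligibleMutant(final int age) { return age > 65; }
}

// A test asserting eligible(65) == true kills this mutant, since the
// mutant returns false for age == 65.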
• show that although GUIs are present in several domains (e.g., OO systems or critical interactive systems), the current fault models do not describe GUI faults; and
• illustrate how fault models should be structured to support, for instance, fault-detection testing techniques.
State transition fault models. The GUI of a system is specified using User Interface Description Languages (UIDLs). Such languages may leverage state transition models to describe the GUI behavior. A state transition model is composed of states and transitions between these states. Bochmann et al. presented several faults that affect FSMs and Petri nets [vBDD+91]. For example, an FSM is composed of states (initial, end, intermediate), events (input, output), and transitions. FSM faults refer to problems found in such a structure, as detailed below.

⁹ Also called mutation operators, mutagenic operators, mutagens, mutation transformations, and mutation rules [OU01, JH11].
• transfer faults with additional states, when a transfer fault leads to a non-specified (i.e., additional) state; and additional or missing transitions from one state to another.
Likewise, Petri net faults refer to problems in Petri net components (e.g., arcs, places, transitions). Some examples of such faults are missing/additional input and output arcs, and incorrect transitions between places.
FSMs and Petri nets have been used to describe GUI models [NPLB09, BB10]. GUI models can serve as a basis to build GUI test models. One example of an FSM model used to build GUI test models is presented in Chapter 3. If a GUI model describes a GUI correctly, the test model will produce the test cases correctly. So, the accuracy of a test model also depends on how correctly the GUI representation is built.
Object-oriented fault models describe faults specific to object-oriented features. An example of OO fault is the violation of OO principles. Offutt et al. have proposed a fault model regarding the misuse of the inheritance and polymorphism principles [OAW+01]. Such faults are described independently of a programming language, but the manifestation of their failures depends on the specific language. Some examples of OO faults are listed below:
• Inconsistent Type Use (context swapping);
• Incomplete Construction;
parent (or ancestor) instead of its child. Also, the authors have proposed Java mutant
operators. For example, the Java mutant operator "this keyword deletion" introduces
the fault "this keyword misuse".
Likewise, Strecker et al. [SM08] have presented 12 mutant operators (called fault classes by the authors) for Java code that affect GUI test suites. These mutants, however, do not characterize GUI faults but Java mutant operators (e.g., class or method faults) that may affect a GUI. Table 2.4 illustrates some of them.
Aspect-oriented fault models refer to faults that arise from aspect-oriented programming. AOP fault models describe faults that are distinct from those of OO and procedure-oriented programming [ABA04]. Thus, AOP faults are most likely to appear during the implementation of cross-cutting concerns such as pointcuts, join points, and advice in the AspectJ¹⁰ language. Several AspectJ fault models have been proposed in the literature [ABA04, BA06, CTR05, DMM05]. Baekken et al. [BA06] describe several specific faults regarding AspectJ pointcuts. One example is the incorrect choice of a primitive pointcut, where one pointcut should be selected in place of another.
Concurrency fault models define several faults related to concurrent systems. The essential property of these systems is concurrency, which enables several computations to execute simultaneously. Developers have faced several problems in implementing such a property. One key issue is when the synchronization of concurrent systems is violated. Lu et al. [LPSZ08] have studied several bugs related to this problem. Each bug was analysed to identify its root cause and thus identify a bug pattern. The results of this analysis are presented in a fault model. This model is structured in three dimensions: bug pattern, bug manifestation, and bug fix strategy, which describe, respectively, the identified faults, their manifestation, and the strategy to fix them. Table 2.5 illustrates the kinds of faults presented in the bug pattern dimension.
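For instance, the following Java snippet (hypothetical code) sketches an atomicity violation, a typical bug pattern of this kind, together with a possible fix strategy.

// A shared counter accessed by several threads.
final class SharedCounter {
    private int value;

    // Buggy: value++ is a non-atomic read-modify-write. Two threads can
    // read the same value, and one increment is then lost: an atomicity
    // violation (bug pattern) that manifests as a wrong final count.
    void incrementBuggy() { value++; }

    // One fix strategy: make the whole read-modify-write atomic.
    synchronized void incrementFixed() { value++; }
}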
System-specific fault models describe faults collected from empirical observa-
tions of an application domain. Such fault models differ from the aforementioned mod-
els since they do not focus on characterizing faults from programming features and
languages. Instead, they are based on a system domain. One example is a preliminary
Web-specific fault model proposed by Ricca and Tonella [RT05]. This model was obtained by analysing the open-source bug repositories of Web applications written in different languages (e.g., PHP, JavaScript). The fault categories concern authentication problems, hyperlink problems, incorrect session management, incorrect generation of error pages, etc.
Another system domain in which fault models have been used is critical interactive systems. In such a domain, the interface contains a large amount of information that is shown on different displays using a range of technologies (e.g., LCD and CRT screens). This information must be coherent across all displays to avoid failures provoked by the operator. One example is the interface of instrumentation and control (I&C) systems for nuclear power plants. Fayollas et al. have used fault models to increase the reliability of safety-critical systems such as interactive cockpits in avionics systems [FFP+ 13]. The interactive cockpit user interfaces are composed of events, display managers, and widgets (e.g., push-buttons, radio buttons, edit boxes). These interfaces manage information from a human operator (e.g., crew members) and from user applications (e.g., aircraft systems). One way to validate such interfaces is to ensure that the information received from the operator (e.g., input events) and the data received from the applications are processed correctly before being visualized on GUIs. The three following failures must therefore be avoided:
1. Erroneous display refers to the visualization of information that differs from the data received from the applications;
2. Erroneous control refers to the transmission of actions that differ from the events triggered by the operators;
3. Inadvertent control refers to the transmission of actions that are not performed by the operators.
To prevent these failures, the authors have considered a fault model that covers software faults (e.g., design faults) and physical faults such as crash and transient faults.
By contrast, Pretschner et al. have proposed a generic fault model for quality assurance [PHEG13]. The authors qualify this model as generic since it aims at characterizing fault models rather than the faults themselves. They demonstrated how some existing fault models (e.g., finite state machine models) can be instantiated from it.
What is important about the aforementioned fault models is that they span several domains, some of which (e.g., critical interactive systems) are driven by GUIs. Although they describe specific faults that may result in failures (e.g., crashes) observed through a GUI, none of them presents or discusses the GUI faults that may be the root cause of such failures.
In this subsection, we presented the defect classification schemes for GUIs and pointed out their drawbacks. We observed that the current GUI schemes have several limitations in properly covering GUI faults. Such limitations help us to understand which kinds of faults existing GUI V&V techniques must overcome. We thus present in the next three subsections the seminal GUI V&V techniques and how they are designed to support GUI testing. The purpose is to show the main drawbacks of GUI testing approaches.
• GUI test models are test models that describe the structure and behavior of the GUI of the SUT. They contain all the possible sequences of user interactions. A GUI model, however, can be designed at different levels of abstraction: modelling standard GUIs based on single input events, such as event-flow graphs (EFG) [Mem07]; modelling advanced GUIs based on multi-event interactions, such as the interaction-action-flow graph (IFG) [LBBC15]; or modelling user scenarios to achieve a particular goal, such as task models [SCP08].
• Abstract GUI test cases are non-executable test scripts generated by traversing a GUI test model w.r.t. a test adequacy criterion. A test adequacy criterion is defined to constitute an adequate test suite (e.g., what to test in a GUI? ). It helps to determine whether the test cases adequately check the SUT. An abstract GUI test case represents one possible sequence of user interactions described in the GUI test model.
• Concrete GUI test cases are executable test scripts concretized from the abstract test cases. A concrete test case contains information (e.g., input data) that allows an abstract test case to be executed over the SUT (see the sketch after this list).
• GUI oracles yield test verdicts by comparing the actual results of test scripts with the expected ones. A GUI oracle model can be obtained from the requirements specifications or from a stable version of the GUI software (aka. the golden version, considered a fault-free version of a system and used for regression testing [YCM11, SKIH11]).
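The hypothetical Java sketch below illustrates the distinction between an abstract test case (a widget/event path through the test model) and its concretization with input data; all widget and step names are invented for the example.

```java
import java.util.List;
import java.util.Map;

public class TestCaseConcretization {
    public static void main(String[] args) {
        // Abstract test case: a path through the GUI test model,
        // expressed only in terms of widgets and events.
        List<String> abstractTestCase =
            List.of("click(newEntryButton)", "type(titleField)", "click(saveButton)");

        // Concretization: the input data that makes the abstract
        // sequence executable against the SUT.
        Map<String, String> inputData =
            Map.of("type(titleField)", "A GUI Fault Model");

        for (String step : abstractTestCase) {
            String data = inputData.getOrDefault(step, "");
            System.out.println("execute " + step
                + (data.isEmpty() ? "" : " with \"" + data + "\""));
        }
    }
}
```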
GUI test models can be automatically extracted from the SUT binaries by reverse engineering [Mem07] (i.e., using dynamic analysis). Such models of existing GUIs are effective for detecting crashes and regressions: the analysed GUI is then considered as a benchmark. In some cases, however, reverse engineering is not possible. For instance, the norms IEC 60964 and 61771, dedicated to the validation and design of nuclear power plants, require that developed systems conform to the legal requirements and that models be created from the requirements for that purpose [IEC95, IEC10]. In this case, GUI models are designed manually from the requirements. The testing process then targets mismatches between a system and its specifications. Paiva proposed an automated MBT approach for GUIs based on the requirements specification [Pim06]. This specification is written in the Spec# language, a formal programming language developed by Microsoft Research [BLS05], and then converted into an FSM to automatically generate GUI test cases.
Furthermore, GUI test models can be derived from GUI models, which are specified using a User Interface Description Language (UIDL). UIDLs are used to model GUIs in different domains (e.g., critical interactive systems, rich interactive GUI systems). For example, the Malai [BB10] and ICO [NPLB09] architectures have been used to model rich GUIs, i.e., post-WIMP GUIs, whereas the GUITAR testing framework [NRBM14] uses its own UIDL that captures GUI structures (the widgets that compose a GUI and their layout). Examples of GUI test models are state transition models (e.g., FSMs, extended FSMs such as Markov chains, or Petri nets), graphs (e.g., EFGs), or formal languages (e.g., the Z language). While Malai uses FSMs to represent GUIs, ICO leverages Petri nets. Similarly, UML can also be applied to model user interactions using its state machine diagrams [OMG07].
In the current GUI testing approaches, test models are mainly event-flow graphs
(EFG) [Mem07, APB+ 12]. EFGs contain all the possible sequences of user interactions
that can be performed within a GUI. One example is GUITAR, which leverages EFGs to support automated test case generation. Figure 2.7 gives a general overview of the GUITAR process. The first step consists of extracting the GUI structure (e.g., the widgets that compose a GUI and their properties, such as visibility) with the GUI Ripper tool. Then, this structure is converted into a GUI model, i.e., an EFG. Next, the EFG is traversed to generate test cases using the GUI tcgenerator tool. Finally, the GUI Replayer executes those test cases against the SUT. One example of a GUI test sequence to "Draw a black square" on the radio button demo shown in that figure is: <square; none; create shape>.
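The sketch below (plain Java, with an edge set assumed from the example sequence above rather than extracted from the real demo) shows the idea behind an EFG: a directed graph stating which event may directly follow which, traversed to enumerate test sequences of a given length.

```java
import java.util.List;
import java.util.Map;

public class EfgSketch {
    public static void main(String[] args) {
        // Hypothetical EFG of the radio button demo: for each event,
        // the events that may directly follow it.
        Map<String, List<String>> efg = Map.of(
            "square", List.of("none", "create shape"),
            "none", List.of("create shape", "square"),
            "create shape", List.of("square", "none"));

        // Enumerate all test sequences of length 3 starting from
        // "square", the way a test case generator traverses the EFG.
        for (String second : efg.get("square")) {
            for (String third : efg.get(second)) {
                System.out.println("<square; " + second + "; " + third + ">");
            }
        }
    }
}
```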
The main shortcoming of the MBT approaches for GUIs is that they are based on models targeting standard GUIs. Such models produce test cases that are limited to testing standard widgets and their mono-event interactions. An adaptation of these models is required to test advanced GUIs and thus to detect the new kinds of GUI faults. The current limitations of the GUI testing frameworks dedicated to testing standard GUIs (e.g., GUITAR) and a promising GUI test model for testing advanced GUIs will be discussed in Chapter 3. Furthermore, standard GUI test models (e.g., EFGs) can be built from the SUT by using dynamic analysis. This brings other drawbacks that we discuss in the next subsection.
GUI dynamic analysis techniques explore the GUI of the SUT to capture every possible sequence of events (this exploration is also called "crawling" or "ripping" the GUI). The goal is to interact with a GUI through all the possible GUI states. To do this, each widget (e.g., windows or buttons) is traversed and its events are triggered through the GUI. One example is the GUI Ripper step of GUITAR (see Figure 2.7), which dynamically explores a GUI. The information obtained from the GUI is the basis used to build a GUI test model. Three factors, however, should be considered to dynamically traverse a GUI [AFT+ 12]:
One problem that arises when the order of events is not correctly triggered is that a GUI test model can produce infeasible test sequences. For example, a widget may be disabled during the test execution when it should be enabled; in this case, the test may fail. Memon proposed an automated approach to repair the infeasible sequences generated by GUITAR [Mem08].
Figure 2.8 illustrates the two artifacts generated by the GUI Ripper tool: the GUI tree on the top and its EFG model on the bottom. The GUI Ripper tool [MBN03] dynamically executes a SUT and extracts its GUI tree. This GUI tree represents the hierarchical GUI structure, i.e., the widgets (e.g., w0, ..., w8, where w0 represents the button "Exit") and their properties. These properties have discrete values (e.g., integers or text) that compose a GUI state. The EFG represents the possible sequences of events behind the GUI structure. Each node of the EFG is an event triggered by a widget. For example, the nodes named create and exit in the EFG represent the events triggered by the buttons "Create" and "Exit" in the GUI tree, respectively.
An extension of GUI Ripper to support the Android platform, called AndroidRipper, is presented by Amalfitano et al. [AFT+ 12]. The process is similar to GUI Ripping, but the GUI structure is represented by a state machine model. This model contains the set of GUI states and state transitions. The main goal is to find GUI crashes. Also, Takala et al. proposed a dynamic solution to build models of Android applications and generate online GUI test cases [TKH11]. The solution uses two separate state machines to build the model, to facilitate its reuse on other device models. The first machine represents the state of the SUT and verifies it against the model. The second state machine uses the keyword method to capture the user interactions, for instance, tapping and dragging objects on the screen.
Mariani et al. proposed the AutoBlackTest technique, which builds a model and generates test cases incrementally and automatically while interacting with the GUI of the SUT [MPRS11, MPRS12]. While interacting with the SUT, the GUI is analyzed to extract its current state. Then, the behavioral model is updated to select and execute the next action. Thus, the model is built incrementally and the test cases are identified. These test cases are refined and a test suite is then generated automatically.
Although most GUI dynamic approaches build test models to automate steps of model-based GUI testing, other solutions generate sequences of events without creating a GUI test model. A typical example is the Monkey tools, such as the UI/Application Exerciser Monkey for Android systems. These tools interact with a GUI by randomly sending events to a device (or emulator), such as key presses, touch presses, or gestures. They are useful to find failures that result in crashes. Some model-based testing tools (e.g., the TEMA tools, http://tema.cs.tut.fi/) have reused Monkey tools to interact with a specific GUI (e.g., Android GUIs) and thus extract the GUI structure.
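The following Java sketch conveys the monkey idea on the desktop using java.awt.Robot; real Monkey tools for Android additionally send touch gestures and system-level events, which this minimal version does not attempt.

```java
import java.awt.AWTException;
import java.awt.Dimension;
import java.awt.Robot;
import java.awt.Toolkit;
import java.awt.event.InputEvent;
import java.util.Random;

public class MiniMonkey {
    public static void main(String[] args) throws AWTException {
        Robot robot = new Robot();
        Random random = new Random();
        Dimension screen = Toolkit.getDefaultToolkit().getScreenSize();

        // Fire 100 random clicks; crashes of the application under
        // test are the only failures this naive strategy can reveal.
        for (int i = 0; i < 100; i++) {
            robot.mouseMove(random.nextInt(screen.width), random.nextInt(screen.height));
            robot.mousePress(InputEvent.BUTTON1_DOWN_MASK);
            robot.mouseRelease(InputEvent.BUTTON1_DOWN_MASK);
            robot.delay(50); // small pause between events
        }
    }
}
```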
The main drawback of GUI dynamic analysis techniques is that they require a correct SUT to extract GUI models. If a faulty SUT is used, its corresponding GUI model will be built incorrectly, and the test cases generated from this model will consider GUI failures as correct behavior. For this reason, GUI testing techniques that use dynamic analysis to support test case generation have demonstrated their effectiveness mainly during regression testing.
Static code analysis is usually supported by automated tools. Several static code analysis solutions exist; they vary in scope from design smell detection [BMMM98, Fow99, MPN+ 12, SKBD14] to the maintenance and evolution of systems [Sta07, APB+ 12, ZLE13]. Such solutions aim at solving problems that affect aspects of code quality such as understandability and changeability. The main kinds of static analysis tools are presented below.
Style checkers assist developers in writing code that adheres to quality checks: sets of coding standards such as programming rules, naming conventions, and layout specifications. One example of a quality checker tool is Checkstyle (http://checkstyle.sourceforge.net/). This tool analyses the coding style and conventions of Java code, checking for whitespace, method and line length, empty blocks, etc.
Code metrics allow developers to calculate structural attributes of source code. Several studies have been conducted to establish metrics that measure the complexity of code, since complexity is a strong indicator of software quality and maintenance cost. Examples of code metrics are the number of lines of code (LOC), McCabe's cyclomatic complexity (CC) [McC76], and cohesion metrics [RRP14].
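As a reminder of how such a metric is computed, the hypothetical Java method below has a cyclomatic complexity of 4: three decision points plus one, following the usual extended counting that includes boolean operators.

```java
public class CcExample {
    // McCabe's cyclomatic complexity counts linearly independent
    // paths: CC = number of decision points + 1. This method has
    // three decision points (for, if, &&), hence CC = 4.
    public static int countEnabledWidgets(boolean[] visible, boolean[] enabled) {
        int count = 0;
        for (int i = 0; i < visible.length; i++) {  // decision point 1
            if (visible[i] && enabled[i]) {         // decision points 2 and 3
                count++;
            }
        }
        return count;
    }
}
```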
Code structures are useful to understand different aspects of a source code, such as control flow, data flow, and data structures. Control flow structures concern the paths of an execution and can thus be used, for instance, to identify unreachable code. Data flow structures track a data item as it is accessed and modified by the code. One example of a data-flow error is assigning an incorrect or invalid value to a variable.
Bug finders look for potential defects in the source code of a particular programming language. The goal is to raise warnings on code where the system will behave incorrectly. Such tools are based on several rules (or patterns, or idioms) that often indicate real defects. A well-known tool is FindBugs, which detects several kinds of errors in Java systems such as infinite recursive loops, deadlocks, and problems in exception handling. Another tool that identifies potential bugs and bad programming practices in Java code is PMD.
Design smell detectors look for bad design problems that degrade code quality. A design smell (aka. code smell or bad smell) is any symptom in the source code that may indicate a design problem or a poor design choice. The characterization and detection of object-oriented design smells have been widely studied [BMMM98, Fow99, GPEM09, MGDLM10, KVGS11, MPN+ 12, PBDP+ 14, ZFS15]. The most influential work on design smells is that of Fowler et al. [Fow99], which describes 22 design smells and the refactoring strategies to correct them.
Note that some tools have more than one purpose, i.e., they combine two or more of these solutions. For example, PMD looks for both bugs and bad programming practices. Similarly, style checkers detect design violations that may be characterized as design smells [GSS15]. Several tools leverage such combinations to cover a specific domain. This is the case of static analysis tools for GUIs. These tools can leverage code metrics (e.g., CC) or code structures (e.g., data flow structures) to formulate heuristics or to build/improve test models, which may also be used by detection strategies such as design smell detection.
Silva et al. use static analysis to automatically extract the GUI behavior from the source code [SSG+ 10]. The static analysis is implemented in a tool called GUISurfer. The GUISurfer process is illustrated in Figure 2.9. First, a language-dependent parser is used to obtain the abstract syntax tree (AST) from the source code. A code slicing technique is used to traverse this AST and extract only GUI-related data, which represents a GUI model. Then, a language-independent tool receives this model and generates the GUI output files (e.g., events, conditions, actions, and states). These files are transformed into behavioral models, such as FSMs, to support GUI testing.
Other static analysis approaches focus on GUI maintenance and evolution. Grechanik et al. propose a tool called REST to identify test scripts that are affected when the GUI of a system evolves [GXF09]. Zhang et al. have extended REST to automatically repair broken workflows in Swing GUIs [ZLE13]. A workflow is broken when GUI widgets are replaced or shifted in a new version of the GUI. The authors combine random testing and static analysis to infer workflows from a user's point of view. Their static analysis is limited to finding the methods related to a GUI action (we call GUI (or UI) actions GUI commands, as detailed in Chapter 4). Since a GUI listener can handle several actions but GUI actions present in the same listener are not tackled, the static analysis approach was less effective. This work highlights the difficulty "for a static analysis to distinguish UI actions [GUI commands] that share the same event handler [GUI listener]". In this thesis, we tackle this problem by developing an approach to accurately detect the GUI commands that compose GUI listeners.
Staiger also proposes a static analysis to extract GUI code, widgets, and their hierarchies in C/C++ software systems [Sta07]. This approach, however, is limited to finding relationships between GUI elements and thus does not analyze GUI event handlers. Zhang et al. propose a static analysis to find violations in GUIs [ZLE12]. These violations occur when GUI operations are invoked by non-UI threads, leading to a GUI error. The static analysis is applied to infer a static call graph and check for the violations. This technique is implemented on top of the WALA framework. An interesting finding reported by the authors is that "GUI developers have already used design patterns, run-time checks, and testing to avoid violating the single-GUI-thread rule. However, due to the huge space of possible UI interactions, hard-to-find invalid thread access errors still exist". In our work, we tackle a GUI design smell that affects GUI controllers, which are based on GUI architectural patterns.
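The Java sketch below, with hypothetical names, illustrates the single-GUI-thread rule violation targeted by [ZLE12] in Swing, together with the usual fix of marshalling GUI updates back onto the Event Dispatch Thread.

```java
import javax.swing.JLabel;
import javax.swing.SwingUtilities;

public class EdtViolation {
    private final JLabel status = new JLabel();

    // Violation: a background thread touches a Swing component
    // directly, breaking the single-GUI-thread rule.
    public void loadInBackgroundBroken() {
        new Thread(() -> {
            String result = longComputation();
            status.setText(result); // invoked from a non-UI thread
        }).start();
    }

    // Fix: marshal the GUI update back onto the Event Dispatch Thread.
    public void loadInBackgroundFixed() {
        new Thread(() -> {
            String result = longComputation();
            SwingUtilities.invokeLater(() -> status.setText(result));
        }).start();
    }

    private String longComputation() {
        return "done";
    }
}
```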
Ocariza et al. propose an approach to automatically find inconsistencies in MVC JavaScript applications [OPM15]. GUI controllers are statically analyzed to identify consistency issues (e.g., inconsistencies between variables and controller functions). This work is highly motivated by the weakly-typed nature of JavaScript. The main issue is the erroneous understanding of the relationship between the JavaScript code and the "Document Object Model" API, which is responsible for representing the hierarchy of HTML elements and their properties. The authors analyzed 300 bug reports to understand the issues specific to JavaScript applications (e.g., their root causes and propagation). The static analysis is implemented in a tool called AUREBESH, which detected 15 bugs in 22 real-world MVC applications. The authors established four patterns to explain the root causes of these bugs. For example, one pattern, called boolean assigned a string, occurs when a developer assigns a string to a variable that expects a boolean value.
Regarding GUI design smells, we first recall the characterization and detection of object-oriented design smells to explain why they cannot be used in the work presented in Chapter 4. Fowler [Fow99] and Brown et al. [BMMM98] characterized multiple design smells, resp. anti-patterns, associated with refactoring operations. The detection of design smells mainly relies on the combination of code metrics (e.g., LOC or CC) to form heuristics. Closely related, Moha et al. propose Decor, a method to specify and then detect design smells [MGDLM10]. In this thesis, we characterize a new type of GUI design smell, called Blob listener: a GUI listener that contains more than one GUI command (Section 4.3). Given this definition, the aforementioned approaches can hardly be used to detect Blob listeners. Doing so would require considering the number of GUI commands per GUI listener as a code metric, and specifying a rule based on this new metric to detect Blob listeners. While treating such a metric as a standard metric is relevant given the increasing interactivity of software systems, it is not possible yet. Moreover, our detection strategy aims at precisely locating each GUI command that composes a Blob listener, not just detecting Blob listeners.
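A minimal Swing sketch of a Blob listener, with hypothetical widget names, is shown below: a single actionPerformed method contains three GUI commands dispatched on the event source, which is exactly the structure our detection approach aims at locating.

```java
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import javax.swing.JButton;

public class DrawingController implements ActionListener {
    private final JButton saveButton = new JButton("Save");
    private final JButton deleteButton = new JButton("Delete");
    private final JButton undoButton = new JButton("Undo");

    public DrawingController() {
        saveButton.addActionListener(this);
        deleteButton.addActionListener(this);
        undoButton.addActionListener(this);
    }

    // Blob listener: one GUI listener that contains more than one
    // GUI command, dispatched by testing the event source.
    @Override
    public void actionPerformed(ActionEvent e) {
        Object src = e.getSource();
        if (src == saveButton) {
            // GUI command 1: save the drawing
        } else if (src == deleteButton) {
            // GUI command 2: delete the selected shapes
        } else if (src == undoButton) {
            // GUI command 3: undo the last action
        }
    }
}
```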
Sahin et al. propose an approach to generate design smell detection rules using a dedicated optimization technique [SKBD14]. The goal is to ease the definition of detection rules by automatically analyzing code instead of relying on a manual process. Closely related, Zanoni et al. propose an approach based on machine learning to detect design smells [ZFS15]. Khomh et al. propose a goal-question-metric-based approach to detect anti-patterns [KVGS11]. Because of GUI specificities, the Blob listener detection rule we propose does not rely on standard code metrics. These approaches, however, may be applied to detected Blob listeners to see whether interesting detection rules or results are produced and to what extent they can be used to refine our approach. Several research works on design smell characterization and detection are domain-specific. For instance, Moha et al. propose a characterization and detection process for service-oriented architecture anti-patterns [MPN+ 12]. Garcia et al. propose an approach for identifying architectural design smells [GPEM09]. Similarly, this thesis aims at showing that GUIs form another domain concerned by specific design smells that have to be characterized.
Unlike object-oriented design smells, little research work focuses on GUI design smells. We identified two research studies that focus on design smells for GUIs. These studies, however, leverage static analysis to detect the presence of design smells in other types of GUI artifacts, such as GUI dialog models [SCSS14] and GUI test scripts [CW12]. Also, the strategies for design smell detection may be based on metrics or on existing OO design smells. Silva et al. propose an external bad smell detection in dialog models [SCSS14]. External, as used by the authors, means looking for bad smells in the running system instead of in the source code. Dialogue models describe the behaviour of a user interface, i.e., its GUI components and their relationships. The authors use a static analysis approach, tooled by GUISurfer [SSG+ 10], to build such models. To detect the bad smells in the dialog models, the approach calculates two metrics, PageRank and Betweenness, which measure the user interface complexity. This complexity can reveal bad smells such as erroneously distributed complexity along the application behavior or misplaced central axes in the interaction between users and the system. The authors show how these metrics can be applied, but no detail about the bad smells is provided.
Chen and Wang [CW12] identify 11 bad smells in GUI test scripts, illustrated in Table 2.7. This choice is motivated by the fact that the structure of a GUI test script is similar to source code (e.g., statements) in some respects. First, a GUI test script is a sequence of actions that will be executed over the GUI of a SUT; these actions are represented by GUI events or by assertions, the latter concerning the correctness of the SUT. Second, a method also contains a sequence of statements that will be executed when it is invoked. For example, the bad smell long method (aka. God Method [Fow99]), a method that is too large since it contains too many statements, is analogous to the bad smell long keyword, which shows up in a keyword-driven test (KDT) script when one keyword contains too many actions. These two bad smells are similar since they tackle the same problem in different contexts: too many statements in a method versus too many actions in a test script. The authors also propose 16 refactoring methods to automatically remove such bad smells. The bad smell long keyword can be removed by applying the method "extract macro event", which moves several actions found in a keyword (or macro event) into a new one to reduce the original. In this thesis, we identify a new type of GUI design smell, the Blob listener, which is also related to the long (God) Method design smell: a Blob listener is a GUI-specific instance of the God Method. However, we do not investigate the correlation between Blob listeners and other GUI artifacts.
Table 2.7: Bad smells for GUI test scripts. Adapted from [CW12]
• Unsuitable Naming: a keyword, macro component, macro event, or primitive component is not properly named.
• Duplicated Actions: some duplicated actions appear in multiple places.
• Long Keyword or Long Macro Event: a keyword (or macro event) contains too many actions.
• Long Parameter List: the parameter list of a keyword or macro event is too long.
• Shotgun Surgery: multiple places need to be modified for a single change.
• Large Macro Component: a single macro component contains too many primitive components, macro events, and macro components.
• Feature Envy: a macro event uses another macro component's macro events or components excessively.
• Middle Man: a macro component delegates all its tasks to another macro component.
• Lack of Macro Events: a macro component does not have any macro events.
• Lack of Encapsulation: primitive events are used directly instead of encapsulating actions into macro events.
• Inconsistent Hierarchy: the macro component hierarchy is inconsistent with the structure of the GUI.
One may note that metrics play a special role in the detection strategies of design smells. However, few studies focus on establishing metrics that are specific to GUIs. Magel et al. [MA07] introduce five GUI structural metrics. We detail below the three of them that deal with the complexity of GUIs.
• Controls' Count (CC)/LOC measures how much of the code of a system is dedicated to implementing its GUI. It adapts the controls' count metric, i.e., the total number of controls that an interface contains. A high CC value may indicate a complex system. However, this metric alone does not reflect the GUI complexity, since some controls are easier to test than others.
• The GUI tree depth measures the depth of a GUI. The GUI structure is mapped to a tree model, and the depth of the tree is then calculated down to its deepest leaf node. The authors believe that this metric can be useful to select more representative GUI test scenarios. For instance, when a test scenario starts from the lowest-level control, the controls that have the same parent as the selected control may be excluded from the scenario. Such a reduction is based on heuristics that are not provided by the authors.
• The structure of the tree measures how complex a GUI is. The intuition is that a tree with most of its GUI controls near the top is more complex. From a testing point of view, this kind of tree reflects more user choices or selections.
These metrics aim at measuring how much effort is needed to test a particular GUI. They do not, however, indicate a problem in, for instance, the source code of a GUI.
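For illustration, the GUI tree depth metric can be sketched in Java over a Swing component hierarchy as follows; this is our reading of the metric, not code from [MA07].

```java
import java.awt.Component;
import java.awt.Container;

public class GuiTreeDepth {
    // Depth of the GUI tree rooted at "root": the length of the
    // longest path from the root to its deepest leaf node.
    public static int depth(Component root) {
        if (!(root instanceof Container)) {
            return 1; // a plain component is a leaf
        }
        int max = 0;
        for (Component child : ((Container) root).getComponents()) {
            max = Math.max(max, depth(child));
        }
        return max + 1; // count the root itself
    }
}
```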
Most GUI research studies have used mature metrics, such as CC, to measure the complexity of GUI code. Other metrics, however, may also be adapted to detect specific problems in GUI code, such as the number of changes (per commit, class, or method) and cohesion metrics. For instance, a cohesion metric has been applied to indicate fat interfaces [RRP14], i.e., classes that handle several methods from different clients. In our work, this metric could be adapted to measure the cohesion of GUI controllers that are Blob listeners. We believe that several metrics can be used or adapted to help software engineers identify problems in GUI code. In this thesis, we used cyclomatic complexity as an initial step to investigate the code of GUI controllers, and then established a more elaborate and precise rule, i.e., the number of GUI commands per GUI listener, to automatically detect Blob listeners.
In this section, we presented the GUI V&V techniques and the GUI defect classification schemes that support them. The limitations of GUI schemes allowed us to explain the main drawbacks of current GUI testing techniques. On the one hand, GUI testing techniques target the reliability of standard GUIs, so they fail to provide solutions for testing advanced GUIs and their interactive features. On the other hand, GUI analysis approaches target the quality of GUIs; none of them has addressed design smells that may degrade GUI code.
2.4 Conclusions
This chapter presented the state of the art of testing GUI systems. GUIs play a vital role in interactive systems. The new trend in GUI design brings several specificities that impact how GUIs should be tested. GUI testing has been applied to find GUI failures provoked by GUI defects. Several defect classification schemes have been proposed, but few of them focus on GUIs. The gap is twofold. First, the proposed GUI defect classifications are not properly defined, since they are based on a small set of standard widgets. Second, GUI schemes do not consider defects that arise from richer GUIs. For example, GUI errors that stem from the graphical nature of post-WIMP GUIs, their ad hoc widgets, and their complex interactions are not covered. Several kinds of faults specific to GUIs have been identified and described in a new GUI fault model, which will be presented in Chapter 3.
Different GUI V&V techniques were then detailed. They vary in their purpose w.r.t. the GUI artifact used to provide automated solutions. Most GUI approaches focus on automated GUI testing for standard GUIs. Also, MBT approaches that leverage dynamic analysis to build GUI test models have demonstrated their ability to detect crashes and regressions. Such approaches, however, require expressive GUI models to produce more effective test cases. Different kinds of UIDLs have thus been proposed to describe the intrinsic components of GUI design, which are represented as GUI models. Such models have been used to derive GUI test models and thus support test case generation. GUI V&V techniques also inspect the source code to measure code quality, for instance by detecting error-prone code that may result in GUI failures. Static analysis approaches have been applied for different purposes, such as GUI testing and GUI maintenance and evolution. We identified few studies that focus on GUI design smells. A new type of design smell specific to GUIs is identified and automatically detected by a novel GUI static analysis proposed in this thesis; the details will be presented in Chapter 4.
Part II
Contributions
Chapter 3
This chapter presents an original fault model for graphical user interfaces. This fault model is a GUI fault classification scheme structured at two levels: user interface faults and user interaction faults. For each fault, we illustrate which GUIs are affected and how the fault manifests as a GUI failure (Section 3.1). We evaluate the coverage of our fault model through an empirical analysis, identifying and classifying several GUI failures from open-source bug repositories (Section 3.2). We also assess the ability of two GUI testing tools (GUITAR and Jubula) to find the real GUI failures previously classified (Section 3.3).
The practical use of the GUI fault model is demonstrated by forging several mutants of a highly interactive open-source system (LaTeXDraw). These mutants implement the faults described in our fault model. We then conduct a third experiment to evaluate the ability of those GUI testing tools to detect these mutants (Section 3.4), and we discuss the reasons why several mutants are not killed. A precise analysis of the limits of GUI testing frameworks for testing advanced GUIs is presented in Section 3.6. We selected GUITAR, a standard GUI model-based testing framework, to study and explain the limitations of the current standard GUI testing approaches. We also present the concept of interaction-action-flow graph to tackle these limitations (Section 3.7). The benefits of this approach are demonstrated through two different use cases.
The contributions of this chapter have been published in [LBB15, LBBC15].
Definition 3.1 (GUI Fault) GUI faults are differences between an incorrect and a
correct behavior description of a GUI.
Definition 3.2 (GUI Error) A GUI error is an activation of a GUI fault that leads
to an unexpected GUI state.
A GUI fault can be introduced at different levels of GUI software (e.g., GUI code, GUI models). An illustration of a GUI fault is an incorrect line of GUI code in place of a correct one. For example, a GUI fault can be activated when an entry, such as a value typed into an input widget, is not handled correctly by the GUI code; an unexpected GUI state then manifests (e.g., a crash, as a GUI failure) when the user clicks on a button after typing this entry.
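The hypothetical Swing sketch below illustrates this scenario: the fault is an unvalidated parsing of the entry, and the failure (a crash) shows up only when the button is clicked.

```java
import javax.swing.JButton;
import javax.swing.JTextField;

public class AngleEditor {
    private final JTextField angleField = new JTextField();
    private final JButton applyButton = new JButton("Apply");

    public AngleEditor() {
        // GUI fault: the entry is parsed without being validated.
        // The fault is activated when the user types a non-numeric
        // value; the failure (a crash) shows up on the button click.
        applyButton.addActionListener(e -> {
            double angle = Double.parseDouble(angleField.getText()); // may throw
            rotateSelection(angle);
        });
    }

    private void rotateSelection(double angle) {
        // rotate the selected shapes (omitted)
    }
}
```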
Our fault model is divided into the following groups:
• User interface faults refer to faults that affect the structure and the behavior of
graphical components of GUIs.
• User interaction faults refer to faults that affect the interaction process, i.e., when
a user interacts with a GUI.
This fault category corresponds to unexpected GUI designs. Since GUIs are composed of widgets laid out in a given order, the first fault is the incorrect layout of widgets (GSA1). Possible failures corresponding to this fault occur when GUI widgets follow an unexpected layout (e.g., a wrong size or position). The next fault concerns the incorrect state of widgets (GSA2). Widgets' behavior is dynamic, and widgets can be in different states such as visible, enabled, or selected. This fault occurs when the current state of a widget differs from the expected one; for example, a widget is unexpectedly visible. The following fault concerns the unexpected appearance of widgets (GSA3). It covers aesthetic aspects of widgets not bound to the data model, such as look-and-feels, fonts, icons, or misspellings.
In many cases, widgets aim at editing and visualizing data. For example, in WIMP GUIs, text fields or lists can display simple data to be edited by users. Post-WIMP GUIs share this same principle, with the difference that the data representation is usually ad hoc and more complex. For example, the drawing area of a drawing editor paints the shapes of the data model. Such a drawing area has been developed for the specific case of this editor; it permits representing complex data (e.g., shapes) graphically in a single widget. In other cases, widgets aim at monitoring data only. This is notably the case for some GUIs in control systems of power plants, where data are not edited but monitored by users. The definition of data representations is complex and error-prone. It thus requires dedicated data presentation faults.
The first fault of this category is incorrect data rendering (DT1). DT1 is provoked when data is converted or scaled wrongly. Possible failures for this fault are manifested by an unexpected data appearance (e.g., wrong color, texture, opacity, shadow) or data layout (e.g., wrong position, geometry). The second fault concerns incorrect data properties (DT2). Properties define specific visualizations of data, such as selectable or focused. A possible failure is a web address that is not displayed as a hyperlink. The last fault (DT3) occurs when an incorrect data type or format is displayed; for instance, an angle value is displayed in radians instead of degrees.
In this section, we introduce the faults that concern user interactions. The proposed faults are based on the characteristics of WIMP and post-WIMP GUIs detailed in the previous section. For each fault, we separate our analysis into two parts: one dedicated to WIMP interactions and another to post-WIMP interactions. WIMP interactions refer to interactions performed on WIMP widgets. They are simple and composed of few events (click, key pressed, etc.); a click, for instance, is one interaction composed of the event mouse pressed followed by the event mouse released, whose simple behavior has led to considering a click as an event itself.
Post-WIMP interactions refer to interactions performed on post-WIMP widgets. Such interactions are more complex since they can be multimodal, i.e., involve multiple input devices (gesture, gyroscope, multi-touch screen); be concurrent (e.g., in bi-manual interactions the two hands evolve in parallel); or be composed of numerous events (e.g., multimodal interactions may be composed of sequences of pressure, move, and voice events). Following the direct manipulation principles, other particularities of post-WIMP interactions are that they aim at being as natural as possible and at providing users with the feeling of handling data directly (e.g., shapes in drawing editors).
Table 3.2 summarizes the user interaction faults and some of their potential failures for both WIMP and post-WIMP interactions. These faults are detailed as follows.
3.1.2.2 Action
This category groups the faults that concern the actions produced while interacting with the system. The first fault (ACT1) focuses on incorrect action results: the expected action is executed but its results are not correct. For instance, with a drawing editor, a failure can be the translation of a shape to the position (−x, −y) while the position (x, y) was expected. The root cause of this failure can be located in the action itself or in its settings. For instance, a first root cause of the previous failure can be an incorrect coding of the translation operation; a second root cause can be located in the settings of the translation action.
The second fault (ACT2) concerns the absence of an action when interacting with the system. For instance, this fault can occur when an interaction, such as a keyboard shortcut, is not correctly bound to its widget.
The third fault (ACT3) consists of the execution of a wrong action. The root cause of this fault can be that the wrong action is bound to a widget at a given instant. For instance: clicking on the button Save shows the dialogue box used for loading; or doing a DnD interaction on a drawing area selects shapes instead of translating them.
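A minimal Java sketch of an ACT1 fault, with hypothetical class names in the spirit of the LaTeXDraw mutants forged in Section 3.4, is given below: the translation action executes, but a sign error produces the (−x, −y) result described above.

```java
public class TranslateShapesAction {
    // ACT1 fault: the action is executed but its result is incorrect.
    // A sign error moves the shape to (-x, -y) instead of the
    // expected (x, y).
    public void execute(Shape shape, double x, double y) {
        shape.translate(-x, -y); // fault: should be shape.translate(x, y)
    }

    // Minimal shape abstraction used by the sketch.
    public static class Shape {
        double px, py;

        void translate(double dx, double dy) {
            px += dx;
            py += dy;
        }
    }
}
```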
3.1.2.3 Reversibility
This fault category groups three faults. The first fault (RVSB1) concerns the incorrect behavior of the undo/redo operations. Undo and redo operations usually rely on WIMP widgets such as buttons and key shortcuts. These operations revert or re-execute actions already terminated and stored by the system. A possible failure is the incorrect reversion of the latest executed action when the key shortcut ctrl+z is used.
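The Java sketch below, with hypothetical names, shows one way such an RVSB1 fault can arise in an undo mechanism: the latest action is reverted but never removed from the history.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class UndoManagerSketch {
    interface Action {
        void undo();
    }

    // Actions already terminated and stored by the system.
    private final Deque<Action> done = new ArrayDeque<>();

    public void register(Action action) {
        done.push(action);
    }

    // RVSB1 fault: the latest action is reverted but stays in the
    // history, so pressing ctrl+z twice reverts the same action
    // twice instead of reverting the two latest actions.
    public void undo() {
        if (!done.isEmpty()) {
            done.peek().undo(); // fault: should be done.pop().undo()
        }
    }
}
```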
Contrary to WIMP interactions, which are mainly one-shot, many post-WIMP interactions last some time, such as the DnD interaction. In such cases, users may be able to stop an interaction in progress. The second fault (RVSB2) thus consists of the incorrect interruption of the current interaction in progress. For instance, pressing the key "Escape" during a DnD does not stop it as expected. This fault could have been classified as an interaction behavior fault; we decided to consider it a reversibility fault since it concerns the ability to revert an ongoing interaction.
Once launched, actions may take time to be executed entirely. In this case, such actions can be interrupted. The third fault (RVSB3) concerns the incorrect interruption of an action in progress. A possible failure concerns the file loading operation: clicking on the button "Cancel" to stop the loading of a file does not work properly.
3.1.2.4 Feedback
Widgets are designed to provide immediate and continuous feedback to users while they interact with them. For instance, a progress bar that shows the loading progress of a file is a kind of feedback provided to users. The first fault of this category (FDBK1) concerns incorrect feedback provided by widgets, where the feedback reflects the current state of an action in progress. This fault focuses on actions that last in time and whose progress should be monitored by users.
The second fault (FDBK2) focuses on potentially long interactions (i.e., interactions that take a certain amount of time to be completed) whose progress should be discernible by users. For instance, with a drawing editor, when drawing a shape on the drawing area, the shape in creation should be visible so that the user knows the progression of her work. A possible failure: drawing a rectangle using a DnD interaction that works correctly does not show the created rectangle during the DnD as expected.
3.1.3 Discussion
The definition and the use of a fault model raise several questions that we discuss in this subsection.
What are the benefits of the proposed GUI fault model?
The benefits of our GUI fault model are twofold. First, a fault model is an exhaustive classification of faults for a specific concern [vBDD+ 91]. Providing a GUI fault model permits GUI developers and testers to have a precise idea of the different faults they must consider. As an illustration, Section 3.2 describes an empirical analysis we conducted to classify and discuss GUI failures of open-source GUIs. Second, our GUI fault model allows developers of GUI testing tools to evaluate the efficiency of their tools in terms of bug detection power w.r.t. a GUI-specific fault model. As detailed in Section 3.4, we created mutants of an existing GUI; each mutant contains one planted GUI fault that corresponds to one fault of our fault model. Developers of GUI testing tools can run their tools against these mutants for benchmarking purposes.
Should usability have been a GUI fault?
Answering this question requires re-explaining the definition of a fault: a fault is a difference between the observed behavior description and the expected one. Usability issues consist of reporting that the currently observed behavior of a specific part of a GUI falls short of being usable. That does not mean the observed behavior differs from the behavior expected by test oracles. Instead, it usually means that the expected behavior has not been defined correctly regarding some usability criteria. That is why we do not consider usability as a GUI fault. This reasoning can be extended to other concerns such as performance.
How can GUI failures be classified into a fault model?
A GUI failure is a perceivable manifestation of a GUI error. Classifying GUI failures thus requires having identified the root cause (i.e., the GUI fault) of each failure. So, classifying GUI failures can be done by experts of the GUI under test. These experts need sufficient information, such as patches, logs, or stack traces, to identify whether the root cause of a failure is a GUI fault and then to classify it. For example, a failure manifested through the GUI but caused by a precondition violation is not classified into the GUI fault model. Similarly, correctly classifying a GUI failure also requires qualifying the involved widgets (e.g., standard or ad hoc) as well as the interaction (e.g., mono-event or multi-event interaction).
3.2.1 Introduction
To assess the proposed fault model, we analyzed bug reports of five popular open-source software systems: Sweet Home 3D, File-roller, JabRef, Inkscape, and Firefox Android. These systems implement various kinds of widgets and interactions, and encompass different platforms (desktop and mobile). Their GUIs cover the following main features: indirect and direct manipulation; several input devices (e.g., mouse, keyboard, touch); ad hoc widgets such as canvases; discrete data manipulation (e.g., vector-based graphics); and undo/redo actions.
Each report was then manually analyzed to state whether it describes a GUI failure. Also, the selected bug reports have to provide explanations about the root cause of the failure, such as a patch or comments. This step is crucial to be able to categorize the failures using our GUI fault model according to their root cause. We also discarded failures identified as non-reproducible, duplicated, usability issues, or user misunderstandings. From this selection we kept 279 bug reports (in total for the five systems), each describing one GUI failure. The following subsections discuss these failures and the classification process.
Figure 3.1: Classification of the 279 bug reports using the GUI fault model
Table 3.3 shows the distribution of the 279 analyzed GUI failures per software system and category (user interface or user interaction). These results point out that Sweet Home 3D and Firefox Android seem to be more affected by user interface failures. Most of these failures concern the GUI structure and aesthetics faults, which can be explained by the complex and ad hoc GUI structure of these systems. The File-roller and JabRef GUIs include widgets with coarse-grained properties (i.e., simple input values such as numbers or text); most of their failures concern WIMP interactions classified into the action category. In contrast, Inkscape presented more failures classified as post-WIMP. Indeed, Inkscape, a vector graphics software, mainly relies on its drawing area, which provides users with different post-WIMP interactions. These failures have been categorized mainly into interaction behavior, action, and reversibility.
As depicted in Figure 3.2, 41% of these 279 GUI failures originate from faults classified into the user interface category and 59% from faults classified into the user interaction category. Most of the user interaction failures have been classified as incorrect action results (54%). This plot also highlights that only 25% of the analyzed user interface failures and 18% of the user interaction ones have been classified as post-WIMP. We comment on these results in the following subsection.
Figure 3.2: Manifestation of failures in the user interface and interaction levels
3.2.4 Discussion
The empirical results must be balanced with the fact that user interactions are less tangible than user interfaces. Users may report more GUI failures when they can perceive the failures graphically (e.g., an issue in the layout of a GUI or in the result of an action visible through the GUI). Users, however, may have difficulties detecting a failure in an interaction itself while interacting with the GUI. That may explain the low number of failures (4%) classified into interaction behavior. Another explanation may be the predominant use of WIMP widgets, which rely on simple interactions.
In our analysis, many failures that could be related to feedback were discarded since they concerned enhancements or usability issues, which are out of the scope of a GUI fault model as discussed previously. For instance, GUI failures concerning the lack of haptic feedback in Firefox Android were discarded. So, few faults (1%) were classified into this category. Another explanation may be the difficulty for users to identify feedback issues as real failures that should be reported.
We observed that some reported GUI failures are false positives regarding fault localization: if a report does not have enough information about the root cause of a failure (e.g., a patch or an exception log), the GUI failure can be classified into a wrong fault category. Consider, for example, the case where moving a shape using a DnD does not move it. At first glance, the root cause of this failure can be associated with an incorrect behavior of the DnD, so the failure could be categorized into interaction behavior. After analyzing the root cause, however, this failure turns out to be an action failure: the DnD works properly, but no action is bound to this interaction.
Likewise, the failures related to reversibility and feedback were easily identified through the steps to reproduce them. For example, in JabRef, "pressing the button "Undo" will clear all the text in the field, but then pressing the button "Redo" will not recover the text". Furthermore, some systems do not revert interactions step by step but entirely. This can imply a failure from a user's point of view, but sometimes it is considered an invalid failure (e.g., requirements vs. usability issues) by developers. In JabRef, the undo/redo actions did not revert discrete operations: pressing the button "Undo" clears all the text typed into different text fields instead of clearing only one field each time the button is pressed.
Another important point concerns WIMP vs. post-WIMP GUI faults. We classified more failures involving WIMP widgets than post-WIMP ones. A possible explanation is that, despite the increasing interactivity of GUIs, the analyzed GUIs still rely more on WIMP widgets and interactions. Moreover, users now master the behavior of WIMP widgets, so they can easily identify when these widgets provoke failures. This may not be the case with ad hoc and post-WIMP widgets.
Java Swing (i.e., JFC GUITAR version 1.1.1, http://sourceforge.net/apps/mediawiki/guitar/). In GUITAR, each test case is composed of a sequence of widget events. The generation of test cases can be parameterized with the size of that sequence (i.e., the test case length).
Jubula is a semi-automated GUI testing tool that leverages pre-defined libraries to create test cases. These libraries contain modules that can be reused to manually generate test sequences. The modules encompass actions (e.g., check, select) and interactions (e.g., click, drag-and-drop) over different GUI toolkits (e.g., Swing, SWT, RCP, mobile). We have reused the library dedicated to Java Swing (Jubula version 7.2) to write the test cases presented in the next experiments. This library contains actions to test only standard widgets, such as dragging a column/row of a table by passing an index. To test ad hoc widgets (e.g., a canvas), we made a workaround by mapping actions directly to these widgets. For example, to draw a shape on a canvas, we need to specify the exact position (e.g., drag-and-drop coordinates) where the interaction should be executed.
3.3.2 Experiment
We selected JabRef (http://jabref.sourceforge.net/), a software application to manage bibliographic references. JabRef is written in Java, which allows us to apply both GUITAR and Jubula. For each fault described in our GUI fault model, we selected one reported failure. To reproduce each failure, we downloaded the corresponding faulty version of JabRef. We used the exact test sequence (i.e., number of actions) required to reproduce each failure. In GUITAR, all test cases were generated automatically over a faulty version. In Jubula, each test case was created manually to detect one failure; the test sequences were extracted by analyzing the failure reports (e.g., the steps to reproduce a failure) and reusing Jubula's libraries. Then, GUITAR and Jubula ran all their test cases automatically to check whether the selected failure is found.
2. A failure was classified, but we could not reproduce it: it only occurred in a specific environment (e.g., operating system) or given a certain input (e.g., a particular database in JabRef).
The reported failures in JabRef are mostly related to WIMP widgets, so we would expect GUITAR and Jubula to detect them, but this was not the case. For instance, failure #1 reports an incorrect display of buttons' labels; its root cause is the incorrect size of a widget positioned to the left of them. Thus, this failure does not affect the values of the internal properties (e.g., text, event handlers) of those buttons. In GUITAR, checking the properties of that widget did not reveal this failure since the expected and actual values of its size property (e.g., width) remained the same. In Jubula, the concerned widget cannot be mapped for test case execution and thus cannot be tested.
Failures #2 and #3 refer to an incorrect menu path and a misspelling, respectively. Both failures were detected by Jubula but not by GUITAR. Indeed, GUITAR reverse-engineers an existing GUI to produce tests; if this GUI is faulty, GUITAR produces tests that consider these failures as the correct behavior.
Failures #8 and #13, which lead to a crash of the GUI, were found by both GUITAR and Jubula. However, failures #4, #6, #10, and #11, which affect the data model, were not detected by GUITAR for two reasons. First, GUITAR does not test the table entries in JabRef since they represent the data model; to do so, we would need to extend GUITAR to interact with them. Second, some test cases passed successfully even though a failure had been revealed: the events were fired properly (e.g., no exception) and the GUI properties matched the "expected" ones. For example, a text property of a status bar contains the value "Redo: change field" even when this action was actually not redone.
Similarly, failure #10 was not detected by Jubula. This failure reports an unexpected auto-completion when the action "save" is triggered by shortcuts. We reproduced this failure manually, but the test case was successfully replayed by Jubula: the input text was typed via the keyboard and saved automatically without any interference from the auto-completion feature.
Another point is the accuracy of the test cases created manually in Jubula. Detecting failure #6 depends on how the test case is written. For example, after adding a field that contains LaTeX commands (e.g., 100\%), its output in the preview window should not contain any command (e.g., it should show 100%). So, we can write a test case that checks the output in the preview window only by looking for commands (e.g., SelectPattern[%, equals] in ComponentText[preview] ), or a test case that checks whether the entire text matches the expected one (e.g., CheckText[100%, equals] in ComponentText[preview] ). However, the latter test case will fail since the text from the preview window in JabRef is internally represented as HTML and, in Jubula, the action's parameters cannot be specified in that format.
Our experiment does not aim at comparing both tools, since GUITAR is a fully automated tool contrary to Jubula. However, the results of this study highlight the current limitations of GUI testing tools. GUITAR and Jubula currently work mainly for detecting failures that affect the properties of standard widgets. Moreover, GUITAR does GUI regression testing: it considers a given GUI as the reference from which tests will be produced. If this GUI is faulty, GUITAR will produce tests that consider these failures as the correct behavior. A possible solution to overcome this issue is to base the test process on the specifications (requirements, etc.) of the GUI.
1. Data are dynamically presented in the widget in an ad hoc way. In our case, data is graphically represented in 2D.
2. Data presentations can be supplemented with widgets to interact directly with the underlying data. For example, in Figure 3.3 the shape "Circle" is surrounded by eight scaling handlers and one rotation handler.
3. Ad hoc interactions are provided to interact with the data representations and their associated widgets. These interactions are usually more complex than standard widget interactions. For instance, the scaling and rotation handlers in LaTeXDraw can be used with a drag-and-drop (DnD) interaction. With a multi-touch screen, one can zoom on shapes using two fingers (a bi-manual interaction), as is increasingly the case with mobile phones and tablets.
The principle of mutation analysis is to plant "artificial faults into the program and check if they are detected by the test. A program with a planted fault is called a mutant of the original program" [ZHM97]. Following this principle, we planted 65 faults in LaTeXDraw (http://sourceforge.net/projects/latexdraw/), a highly interactive open-source software system.
We created 65 mutants corresponding to the different faults of our proposed fault model. All these mutants and the original version are freely available7. Each mutant is documented to detail its planted fault and the oracle permitting to find it7. Multiple mutants have been created from each fault by using WIMP (22 mutants) or post-WIMP (43 mutants) widgets, and by varying the test case length (i.e., the number of actions required to provoke the failure). Each action (e.g., selecting a shape) requires a minimal number of events to be executed (e.g., in LaTeXDraw a DnD requires at least three events: press/move/release). Table 3.5 illustrates the attributes and their values for a GUI mutant8 planted in LaTeXDraw concerning the Action category.
Tables 3.6 and 3.7 summarize the number of forged mutants and the minimal and maximal test case lengths for user interface and user interaction faults, respectively. For example, a length 0..2 means that there exists at least one mutant requiring a minimum of 0 actions and one requiring a maximum of 2 actions. The fault RVSB3, however, is currently not covered by the LaTeXDraw mutants. Similarly, some planted mutants only rely on post-WIMP interactions or widgets (e.g., IB1, DT1).
7. https://github.com/arnobl/latexdraw-mutants
8. https://github.com/arnobl/latexdraw-mutants/tree/master/GUImutants/mutant35
3.4.3 How GUI testing tools kill our GUI mutants: a first experiment
We applied the GUI testing tools GUITAR and Jubula on the mutants to evaluate their ability to kill them. Our goal is not to benchmark these tools against each other but rather to highlight challenges not yet considered in testing interactive systems (e.g., post-WIMP interactions). GUITAR test cases have been generated automatically while Jubula ones have been written manually.
Considering the mutants planted at the user interface level, the Jubula and GUITAR tests killed the mutants that involve checking standard widget properties, such as layout (e.g., width, height) and state (e.g., enabled, focusable). It is also possible to test simple data (e.g., string values of text fields) on those widgets. However, most of the mutants that concern ad hoc widgets remained alive, notably when test cases involve testing complex data from the data model. For example, it is not possible to compare the actual shape on the canvas against the expected one. Even if some shape properties (e.g., the rotation angle) are presented in standard widgets (e.g., a spinner), GUITAR and Jubula cannot state whether the current values in these widgets match the expected shape rotation on the canvas.
Likewise, our GUITAR and Jubula tests cannot kill most of the user interaction mutants that result in a wrong presentation of shapes, in particular the mutants planted into the Reversibility and Feedback categories. For example, testing undo/redo operations in LaTeXDraw should compare all the states of a shape manipulated on the canvas. Moreover, the test verdicts in Jubula were "pass" even though interactions were defined incorrectly (e.g., the mouse cursor does not follow a DnD) or actions could not be executed (e.g., a button is disabled). In GUITAR, the generated test cases do not properly cover actions that have dependencies. For example, the action "Delete" in LaTeXDraw requires first selecting a shape on the canvas, but no test sequence containing "Select Shape" before "Delete Shape" was generated. Thus, some mutants could not be killed.
Table 3.8 gives an overview of the number of mutants killed by GUITAR and Jubula. The results show that both tools are not able to kill all the mutants for the four following reasons:
1. Testing LaTeXDraw with GUITAR and Jubula is limited to the test of the standard Swing widgets. In Jubula, the test cases can only be written using the libraries available for the Swing toolkit. In GUITAR, the basic package for Java Swing GUIs only covers standard widgets and mono-events (e.g., a click on a button).
4. It is not possible to give a test verdict for complex data. The oracles provided by the two GUI testing tools do not know the internal behavior of ad hoc widgets, their interaction features, and their data presentation.
These results answer the research question by highlighting the benefits of our fault model for measuring the ability of GUI testing tools to find GUI failures.
reports to describe failures. To deal with this, we based the classification on the bug
report artifacts (patches, logs, etc.) to identify the root cause of the reported failures.
1. All of them have been provoked by the use of standard widgets with mono-event
interactions, mainly buttons and text fields;
While several of the tested interactive systems provide advanced interactive features, all the reported failures are related to standard widgets. For instance, ArgoUML10 is a modeling tool with a drawing area for sketching diagrams, similar to LaTeXDraw. None of the nine failures found by GUITAR on ArgoUML was detected by interacting with this drawing area.
We applied GUITAR on LaTeXDraw to evaluate how it manages the mix of standard widgets and ad hoc ones, i.e., the drawing area and its content. While the standard widgets were successfully tested, no test script interacted with the drawing area. In the next section, we identify the reasons for this limitation and explain what is mandatory to resolve it.
creation process. In this subsection we explain how current GUI and test models hinder
the ability to test advanced GUIs.
Current GUI models are not expressive enough. The languages used to build GUI models, called UIDLs, are a cornerstone of the testing process. Their expressiveness has a direct impact on the concepts that can compose a GUI test model (e.g., an EFG) and, therefore, on the ability of the generated GUI test cases to detect various GUI failures. For instance, GUITAR uses its own UIDL that captures GUI structures (the widgets that compose a GUI and their layout). However, given the current trend of developing highly interactive GUIs that use ad hoc widgets, the UIDLs currently used to test GUIs are no longer expressive enough, as pointed out below.
1. UIDLs currently used for GUI testing describe the widgets but not how to interact with them. The reason behind this choice is that current GUI testing frameworks test standard widgets, whose behavior is the same on many GUI platforms. For instance, buttons work by pressing on them using a pointing device on all GUI platforms. This choice is no longer suited to advanced GUIs that rely more and more on ad hoc widgets and interactions: the behavior of these widgets has been developed specifically for one GUI and is thus not standard. As depicted in Figure 3.4, GUITAR embeds the definition of how to interact with widgets directly in the Java code of the framework. Test scripts notify the framework of the widgets to use on the SUT, and the framework uses its widget definitions to interact with them. Thus, supporting a new widget implies extending the framework. Even in this case, if users can interact with a widget using different interactions, the framework randomly selects one of the possible interactions. The choice of the interaction to use must therefore be clearly specified in GUI models. That would permit generating a test model (e.g., an EFG) that explores all the possible interactions instead of a single one. UIDLs must be expressive enough to express such interactions in GUI models.
Figure 3.4: Representation of how interactions are currently managed and the current limit (the SUT is composed of a button and of a drawing area on which a DnD is expected).
3. Current UIDLs do not describe the result of the use of a widget on the system. Users interact with a GUI for a given purpose. The action resulting from the interaction of a user with a GUI should be described in GUI models. Similarly, the dependencies between actions, and the fact that some actions can be undone and redone, should be specified as well. For instance, a paste action can be performed only if a copy or cut action has already been performed. That would help the creation of GUI oracles able to state whether one interaction had the expected results. For instance, with LaTeXDraw, a GUI model could specify that executing a DnD on the drawing area moves a shape to a specific position if a shape is targeted by the DnD. From this definition, a GUI oracle could check that, on a DnD, the shape has been moved to the expected position (see the sketch after this list). Another benefit would be the ability to reduce the number of generated test cases by leveraging the dependencies between actions. Such work has already been proposed by Cohen et al. [CHM12] and must be capitalized on in GUI models.
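As an illustration of such an oracle, consider the following minimal Java sketch; it is only a sketch under stated assumptions: Shape, TestDriver, and replayDnd are hypothetical helpers standing for the SUT accessor and the test driver, not part of any framework discussed in this chapter.

import java.awt.Point;

// Hypothetical sketch of a GUI oracle derived from the action definition
// "a DnD on the drawing area moves the targeted shape".
final class DndMoveOracle {
    interface Shape { Point position(); }                           // hypothetical SUT accessor
    interface TestDriver { void replayDnd(Point src, Point tgt); }  // hypothetical test driver

    static boolean check(Shape shape, TestDriver driver, Point src, Point tgt) {
        Point before = shape.position();
        driver.replayDnd(src, tgt); // replay the interaction on the SUT
        // The shape must have been translated by the DnD vector.
        Point expected = new Point(before.x + (tgt.x - src.x),
                                   before.y + (tgt.y - src.y));
        return shape.position().equals(expected);
    }
}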
Current EFGs mix interaction, widget, and action under the term event. Test models, i.e., EFGs in our case, are graphs of all the possible events that can be produced by interacting with a GUI. An event is both the widget used and its underlying interaction. The name of the event usually gives an indication of the action produced when interacting with the widget. For instance, Cohen et al. describe a test model example composed of the events Draw, Paste, Copy, Cut, etc., corresponding both to the menu items that produce these events and to their interaction, here Menu Pressed [CHM12]. The name of each event describes the action resulting from the use of the widget. However, this mix of concepts may hamper the testing process of advanced GUIs, as explained below.
1. Actions must be reified in EFGs. Currently, GUI oracles detect crashes or regressions and are not supplemented with information coming from EFGs. However, testing advanced GUIs requires detecting other kinds of failures, as explained in Section 3.4.1. In this chapter, we empirically identified and classified various kinds of GUI failures that can affect both WIMP and post-WIMP GUIs (see Section 3.1). For instance, with LaTeXDraw, a GUI oracle must be able to state whether a shape has been correctly moved. EFGs should contain information about the actions defined in GUI models in order to provide GUI oracles with the information mandatory for stating test verdicts.
2. Interactions and widgets must be clearly separated in EFGs. As depicted by the left part of Figure 3.5, mixing interaction and widget works for standard widgets that use a mono-event interaction. For testing ad hoc widgets using several multi-event interactions, EFGs should explicitly describe how the user interactions work. For instance, the right part of Figure 3.5 precisely specifies the events that compose the interaction performed by a user (here a multi-touch interaction) to rotate shapes or to cancel the interaction: two touch pressures followed by a move and a touch release rotate shapes; canceling the interaction in progress consists of pronouncing the word "cancel". In this case, EFGs would be able to support ad hoc interactions.
Figure 3.5: EFG sequence on the left (MenuSelectAll pressed). Interaction-action sequence on the right (press, press, move, release, and the voice event "cancel").
In this section, we explained the precise limits of the current GUI model-based testing approaches for testing advanced GUIs. We claim that these limitations stem from two facts: first, there is a lack of proper abstractions to build test models for testing advanced GUIs; second, only a few GUI oracles are currently considered, while we demonstrated in Section 3.1 the diversity of faults that can affect GUIs.
and design patterns, notably the instrumental interaction [BL00], the direct manipula-
tion [Shn83], and the Command and Memento design patterns [GHJV95]. The Malai
implementation11 provides a model-based UIDL.
Malai decomposes an interactive system into a set of presentations and instruments (see Figure 3.6). A presentation is composed of an abstract presentation and a concrete presentation. An abstract presentation is the data model of the system (the model in MVC). A concrete presentation is a graphical representation of an abstract presentation (the view in MVC).
Figure 3.6: Overview of Malai: an interactive system is composed of presentations (abstract presentation and concrete presentation) and instruments; events drive interactions (①), instruments transform interactions into actions (②), and actions are executed on presentations (③).
An action encapsulates what users can modify in the system. For instance, LaTeXDraw (recall Section 3.4.1) has numerous actions, such as rotating shapes. An action does not specify how users have to interact with the system to perform it; it just specifies the results of a user interaction on the system. An action can also depend on other actions to be executed. For example, the action paste can be executed only if an action copy or cut has been executed before. An action is executed on a presentation (link ③).
An interaction is represented by a finite-state machine (FSM) where each transition corresponds to an event produced by an input device (Figure 3.6, link ①). Using FSMs for defining interactions permits the conception of structured multi-event interactions, such as DnD, multi-touch, or multi-modal interactions. An interaction is independent of any interactive system that may use it. For instance, a bi-manual interaction, as depicted by Figure 3.7, is defined as an FSM using events produced by pointing devices or speech recognizers (e.g., pressure, voice). This interaction does not specify the actions to perform on a system when executed (e.g., rotate shapes). The interaction depicted by Figure 3.7 starts at the initial state (black circle) and ends when entering a terminal (double-lined circle) or an aborting (crossed-out circle) state. Aborting states permit users to abort the interaction they are performing. Transitions (e.g., press and move) can be supplemented with a condition constraining the triggering of the transition. For instance, the interaction goes into the aborting state through the transition voice only if the pronounced word is "abort" (a minimal code sketch of this FSM is given after Figure 3.7).
11. https://github.com/arnobl/Malai
Figure 3.7: The bi-manual interaction defined as an FSM (transitions: press, press, move | pressed, release, release, and voice | "abort").
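The following minimal Java sketch illustrates how such an interaction FSM could be encoded. It is an illustration only: the state and method names are hypothetical, and this is not the actual Malai API.

// A minimal FSM sketch of the bi-manual interaction of Figure 3.7.
enum BimanualState { INIT, ONE_PRESSED, TWO_PRESSED, ONE_RELEASED, TERMINATED, ABORTED }

final class BimanualInteraction {
    private BimanualState state = BimanualState.INIT;

    void press() { // two successive pressures start the interaction
        if (state == BimanualState.INIT) state = BimanualState.ONE_PRESSED;
        else if (state == BimanualState.ONE_PRESSED) state = BimanualState.TWO_PRESSED;
    }

    void move() {
        // move | pressed: self-transition, the interaction stays in TWO_PRESSED
    }

    void release() { // two releases end the interaction (terminal state)
        if (state == BimanualState.TWO_PRESSED) state = BimanualState.ONE_RELEASED;
        else if (state == BimanualState.ONE_RELEASED) state = BimanualState.TERMINATED;
    }

    void voice(String word) { // transition guarded by the condition word == "abort"
        if ("abort".equals(word) && state != BimanualState.TERMINATED)
            state = BimanualState.ABORTED; // aborting state
    }

    BimanualState current() { return state; }
}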
Because actions and interactions are independently defined, the role of instruments is to transform input interactions into output actions (link ②). Instruments reify the concept of tool that one uses in everyday life to manipulate objects [BL00]. For instance, Figure 3.8 describes the instrument Hand as it could have been defined in LaTeXDraw. The goal of this instrument is to move and rotate shapes. Performing these actions requires different interactions: rotating shapes can be done using the bi-manual interaction previously depicted; moving shapes can be done using a DnD interaction. In Malai, such interaction-action tuples are called interactors, and one instrument can have several interactors, i.e., one can handle an instrument with different interactions to execute different actions (a sketch of an interactor is given after Figure 3.8). In an interactor, the execution of the action is constrained by a condition. For example, the interactor Bimanual2Rotate (Figure 3.8) permits the execution of the action RotateShapes only if the source and target objects of the bi-manual interaction are the same shape: src == tgt ∧ src is Shape (using a bi-manual interaction to rotate a shape may imply that the two pressures are done on the targeted shape).
In Malai, the GUIs of a system are composed of the widgets provided by all the instruments: an interaction can be based on a widget (e.g., a click on a button) and, in this case, the instruments using such interactions create and provide these widgets.
Malai supports mono-event as well as multi-event interactions; interactions work as FSMs, which may help the test model generation process; actions can be defined and the dependencies between them can be specified; instruments and their interactors can be viewed as FSMs as well. Because interactions, interactors, and instruments are FSMs in Malai, the creation of test models is eased, as detailed in the next subsection. Such FSM models capture all the possible scenarios that users can perform while interacting with the system: a path in such an FSM corresponds to one scenario.
Figure 3.8: The instrument Hand composed of two interactors: Bimanual2Rotate (interaction Bimanual, condition src==tgt ∧ src is Shape, action RotateShapes) and DnD2Move (interaction DnD, condition src is Shape, action MoveShapes).
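As an illustration of the interactor concept, the following Java sketch binds one interaction to one action through a guard condition. The types are hypothetical stand-ins for the Malai concepts, not the actual Malai API; the interactor DnD2Move of Figure 3.8 would, for instance, be instantiated with the condition "the DnD source is a shape" and a factory producing MoveShapes actions.

import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical types standing in for the Malai concepts.
interface Interaction { Object source(); Object target(); }
interface Action { void execute(); }

// An interactor binds one interaction to one action, guarded by a condition.
final class Interactor<I extends Interaction> {
    private final Predicate<I> condition;
    private final Function<I, Action> actionFactory;

    Interactor(Predicate<I> condition, Function<I, Action> actionFactory) {
        this.condition = condition;
        this.actionFactory = actionFactory;
    }

    // Invoked when the interaction reaches a terminal state: the action is
    // created and executed only if the condition holds.
    void interactionDone(I interaction) {
        if (condition.test(interaction))
            actionFactory.apply(interaction).execute();
    }
}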
An interaction-action-flow graph (IFG) follows the same idea as an EFG by sequencing all the possible user interactions. The difference is that the concepts of interaction, action, and widget are clearly separated, and interactions and actions are included in IFGs. The goal of such a separation is to be able to test a widget using its different interactions, and to test that the effective result of one execution of an interaction is the result expected as defined in the action. So, an IFG is a sequence of interaction-to-action tuples. Each tuple is in fact an FSM composed of two states (the interaction and the action) and one transition (the condition that permits executing the action). For instance, Figure 3.9 describes an IFG composed of three instruments: the Pencil (①), which permits drawing shapes; the EditingSelector (②), which permits selecting the kind of shapes to draw; and the Hand (③), which permits moving or scaling shapes. At start-up, only the instruments Pencil and EditingSelector are activated and can be used, so the interaction-action tuples ① and ② are available. The instrument EditingSelector permits using either the Pencil or the Hand instrument; using this instrument can thus lead to all the tuples ①, ②, and ③. Figure 3.9 does not represent the widgets associated with each tuple, but the tuples ① and ③ deal with the drawing area while the tuple ② deals with a set of buttons.
To create such sequences from Malai models, we need to define which instruments can be used at a given instant. Malai models do not explicitly specify the relations between instruments, i.e., they do not specify which instruments can be handled after having used a given instrument. Such relations are mandatory to obtain from Malai models a graph of instruments from which test scripts can be produced.
Inferring these relations is automatically performed by analyzing the Malai actions: there exist actions that activate or inactivate instruments. For instance, in Figure 3.9, the instrument EditingSelector is dedicated to selecting (i.e., activating/inactivating) between the instruments Hand and Pencil. Thus, after having used the EditingSelector instrument, either the Hand or the Pencil instrument can be used (transitions ①). Similarly, after having used the instrument Hand, respectively Pencil, only the instrument EditingSelector can be used (Pencil, respectively Hand, is inactivated, transitions ②).
Figure 3.9: An IFG composed of three interaction-action tuples: the Pencil ① (interaction DnD, condition srcPt!=tgtPt, action AddShape), the EditingSelector ② (interaction ButtonPressed, condition button==handB ∨ button==penB, action SelectInstrument), and the Hand ③ (interactions DnD and Bimanual, actions MoveShapes and ResizeShapes).
Based on the IFG, test cases can be generated to test both standard and advanced GUIs. For instance, we were able to test multiple ad hoc interactions used on advanced widgets. Also, actions permitted comparing the effective result of interacting with the SUT against the expected one. We do not provide, however, a fully automated framework as in the case of GUITAR. Instead, we manually built the Malai models and manually executed the test scripts.
In the next subsection, we present the benefits of this approach through two different
use cases and describe research challenges to overcome.
LaTeXDraw
Table 3.9 gives an overview of the defects found during our experiment on LaTeXDraw. The manual execution of the tests led to the detection of four defects. None of them had been reported in the official bug tracker of LaTeXDraw. These defects concern different parts of the system.
Defects #1 and #2 revealed that two user interactions did not work as expected: a multi-click and a DnD interaction, both of them ad hoc interactions developed specifically for LaTeXDraw. With a multi-click interaction, users can click in several locations of the drawing area to create a shape composed of several points. In our case, executing the interaction did not perform the expected action, i.e., the creation of the shape. The DnD interaction can be aborted: while executing the DnD, the user can press the "escape" key to stop the interaction and not execute the associated action. These two defects cannot be found by GUITAR since they have been provoked by non-standard interactions, and since the two interactions are executed on the drawing area while, as explained above, GUITAR cannot support multiple interactions for the same widget.
Defect #3 was an issue in the data model: changes in LaTeXDraw's preferences were not taken into account.
Defect #4 was not a LaTeXDraw issue but a Java Swing one: clicking on the zoom buttons had no effect if performed too quickly. Still, this defect has been detected thanks to the action that zooms in and out on the drawing area.
• automating as far as possible the current error-prone, expensive, and manual GUI
testing process;
• finding GUI errors as early as possible in the development phases to reduce the
development cost;
• finding various kinds of GUI errors (not only crashes) on various kinds of ad hoc widgets.
12. http://www.cluster-connexion.fr/
The main constraint, imposed by norms [IEC95, IEC10], is that the testing process of I&C systems must be driven by their specifications. Producing entire GUI models using reverse engineering techniques from the SUT is therefore not possible (some specific information, however, can be extracted from GUIs to be used in GUI models, as discussed below).
Similarly to the previous case study (i.e., LaTeXDraw), we manually designed GUI models from the provided specifications using Malai. We then automatically generated abstract GUI tests. The latter are then concretized to produce executable GUI tests that run on top of the SCADE LifeCycle Generic Qualified Testing Environment (SCADE QTE), a tool for testing GUIs developed with the SCADE Display technologies13. The concretization phase includes two steps:
• Mapping abstract test cases to the targeted testing framework and SUT. This step requires manual operations to map GUI model elements to their corresponding elements in the GUI under test. For instance, the names of the widgets of the GUI under test have to be mapped to the widgets defined in the GUI models. To avoid this step, the specifications have to precisely specify the names to use, as discussed later in this section.
• Generating GUI oracles. We currently generate basic oracles that check the correct GUI workflow when interacting with widgets.
These preliminary results highlighted the capability of the IFG, based on the Malai UIDL, to produce test cases capable of detecting defects in advanced GUIs. However, while the expressiveness of Malai permits testing advanced GUIs, we face two GUI testing challenges related to the concretization phase of the MBT process (see Figure 2.6 in Chapter 2):
GUI oracles generation. GUI oracles have to be produced as automatically as possible from GUI models. Our proposed fault model describes several faults that imply different ways to detect them. GUI faults imply the development of multiple GUI oracles based on diametrically different techniques. For instance, the graphical nature of GUIs requires their graphical rendering to be checked. To do so, a possible oracle consists of comparing screenshots of GUIs to detect differences; this oracle thus uses image processing techniques. However, checking that a widget is correctly activated can be done using classical code unit testing techniques. These differences between GUI oracles complicate the GUI model-based testing process that has to generate such oracles. One challenge to tackle is how to write and generate GUI oracles that detect such failures. New GUI testing techniques thus have to be developed to automate the generation of GUI oracles able to detect failures other than standard GUI failures or crashes.
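As a concrete illustration of the screenshot-comparison kind of oracle mentioned above, the following minimal Java sketch fails the test when the actual screenshot differs from the reference one by more than a tolerated number of pixels. It is an illustrative example, not the oracle implementation used in this work.

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Minimal rendering oracle sketch based on pixel-by-pixel image comparison.
final class ScreenshotOracle {
    static boolean matches(File expectedFile, File actualFile, int maxDiffPixels) throws IOException {
        BufferedImage expected = ImageIO.read(expectedFile);
        BufferedImage actual = ImageIO.read(actualFile);
        if (expected.getWidth() != actual.getWidth()
                || expected.getHeight() != actual.getHeight()) return false;
        int diff = 0;
        for (int x = 0; x < expected.getWidth(); x++)
            for (int y = 0; y < expected.getHeight(); y++)
                if (expected.getRGB(x, y) != actual.getRGB(x, y)) diff++;
        return diff <= maxDiffPixels; // tolerance absorbs anti-aliasing noise
    }
}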
13. http://www.esterel-technologies.com/
3.8 Conclusion
This chapter proposes a GUI fault model that provides GUI testers with benchmarking tools to evaluate the ability of GUI testing tools to detect GUI failures. This fault model has been empirically assessed by analyzing 279 GUI bug reports of different open-source GUIs and classifying them into the model. We have used our fault model to evaluate the limits of GUI testing frameworks. The experiments have shown that while current GUI testing tools have demonstrated their ability to find several kinds of GUI failures, they also fail at detecting several of the GUI faults we identified. The underlying reasons are twofold. First, GUI failures may be related to the graphical rendering of GUIs. Testing GUI rendering is a complex task since current testing techniques mainly rely on code analysis, which can hardly capture graphical properties. Second, the current trend in GUI design is a shift from GUIs composed of standard widgets to GUIs relying on more complex interactions and ad hoc widgets, whereas current GUI testing frameworks rely on the concepts of standard widgets and their simple interactions to provide automated GUI testing.
New GUI testing techniques thus have to be proposed for testing, as automatically as possible, GUI rendering and complex interactions using ad hoc widgets. For this purpose, our fault model can assist GUI testing techniques in covering more GUI faults. We have provided 65 GUI mutants, planted into standard and ad hoc widgets of a highly interactive GUI system, that forge several GUI faults. These mutants are freely available and can thus be used by the testing community to achieve that purpose. For example, the third experiment we ran, on GUITAR and Jubula, has shown that 43 out of the 65 GUI mutants remained alive; most of them concern the graphical data rendering of ad hoc widgets and their complex interactions. These results help the developers of these tools understand which kinds of GUI faults were not detected.
Chapter 4
GUI Design Smells: The Case of Blob Listener
In Chapter 3, we presented a new GUI fault model that describes faults specific to GUIs. Numerous GUI testing techniques have been proposed to detect errors that affect GUI systems. Besides, GUI errors may be accentuated by the low quality of GUI code. For example, the presence of design smells has been correlated with the introduction of faults [LS07, HZBS14] and with other aspects of code quality such as maintainability [SYA+13]. Static analyses are developed to find defects in code or to measure code quality. To contribute to the quality assurance of GUI code, we focus on studying GUI design smells.
This chapter presents a novel static analysis technique to automatically detect a new type of GUI design smell, the Blob listener. In particular, we have designed new heuristics that enable searching for and locating Blob listeners in the source code. The main challenge of this analysis is to infer the code regions related to GUIs and GUI controllers, and to automatically locate the GUI commands handled by these controllers. The most critical step is to design and implement a heuristic that accurately detects GUI commands in GUI listeners. The reasons are twofold. First, GUI listeners may have conditional blocks that do not handle events produced by widgets, i.e., non-GUI commands. Moreover, we have identified three variants of Blob listeners, corresponding to different ways of identifying the widget that produced the event: 1. comparing a property of the widget; 2. checking the type of the widget; and 3. comparing widget references. Second, GUI commands can be nested within other commands. To overcome these issues, we have implemented an algorithm that recursively infers all the references in conditional statements and determines whether they refer to a GUI object. To manage the case of nested commands, our algorithm removes the irrelevant ones based on these variants.
We implemented our static analysis in a tool called InspectorGuidget. This tool is an Eclipse plug-in, publicly available1, dedicated to Java software systems based on the Swing toolkit. The ability of our tool to detect Blob listeners is evaluated on six representative highly interactive software systems. These systems cover several user interactions implemented in different kinds of GUI listeners.
1. https://github.com/diverse-project/InspectorGuidget
To build a ground truth for
our experiments we manually retrieved all instances of Blob listeners in each application.
We then ran our tool on each application to detect GUI listeners, commands, and Blob listeners.
In this chapter, we first introduce the motivation for studying GUI controllers and explain why we have identified the Blob listener as a GUI design smell (Section 4.1). We then present an empirical study investigating the current development practices regarding listeners in GUI controllers (Section 4.2). Based on this study, we identify and characterize the Blob listener as a GUI design smell in Section 4.3. We then propose a systematic static code analysis procedure to search for Blob listeners (Section 4.4). Next, we evaluate our approach and present the results (Section 4.5). The threats to validity of the experiments are presented in Section 4.6. We also discuss the scope of InspectorGuidget and the good coding practices we propose to avoid the use of Blob listeners (Section 4.7). This chapter ends with the preliminary steps towards refactoring Blob listeners (Section 4.8).
4.1 Introduction
Software engineers develop GUIs following widespread architectural design patterns that consider GUIs as first-class concerns (recall Chapter 2). GUI implementations rely on event-driven programming where events are handled by controllers (or presenters2). Listing 4.4 illustrates a simple example of a GUI controller in Java Swing code. In this example, the AController manages events produced by two widgets, b1 and b2 (Lines 2–3). To handle the events these widgets trigger in response to users' interactions, AController implements the ActionListener interface. The code that reacts to the events is included in the GUI listener method actionPerformed. A GUI command, i.e., a set of statements executed in reaction to a GUI event, is produced for each widget (Lines 8 and 10). We define a Blob listener as a GUI listener that can produce multiple GUI commands.
if (!BaseEditDelegate.getEditMode(m_gameData)
&& (!(e.isControlDown() || e.isAltDown() || e.isShiftDown())
&& e.getButton() == MouseEvent.BUTTON1
&& (m_clickedAt != null
&& m_releasedAt != null)))...
4. Faults. Faults have been reported in Blob listeners. One example concerns two issues reported in a GitHub code repository3. Listing 4.3 shows an excerpt of the faulty GUI code.
1 public class JWhiteBoard extends ReceiverAdapter
2 implements ActionListener, ChannelListener {
3 //..
4 clearButton=new JButton("Clean");
5 clearButton.addActionListener(this);
6 leaveButton=new JButton("Exit");
7 leaveButton.addActionListener(this);
8 //...more than 150 lines of code
9
10 @Override public void actionPerformed(ActionEvent e) {
11 String command=e.getActionCommand();
12 if("Clear".equals(command)) {//GUI fault in Command #1
13 if(noChannel) {
14 clearPanel();
15 return;
16 }
17 sendClearPanelMsg();
18 }
19 else if("Leave".equals(command)) {//GUI fault in Command #2
20 stop();
21 }//...
22 }//...
23 }
The listener method actionPerformed (Lines 10–22) handles two commands: Command #1 (Lines 12–18) and Command #2 (Lines 19–21). Command #1 and Command #2 each have a set of statements to be executed when the widgets labeled "Clear" and "Leave" are pressed, respectively. However, GUI failures occur when users interact with these widgets. The two GUI faults are located in the conditional expressions of these commands (Lines 12 and 19), used to check which widget produced the event: the strings used to identify each widget are incorrect. The correct strings are "Clean" and "Exit" (Lines 4 and 6) instead of "Clear" and "Leave", respectively. Both GUI faults are related to the Action category of our fault model (recall Table 3.2 in Chapter 3).
The above scenarios highlight our findings in the implementations of GUI controllers. These findings provide a strong motivation for characterizing Blob listeners as a new kind of GUI design smell, and for developing tools that detect them. Indeed, a Blob listener is a GUI-specific instance of a more general bad smell, the God method [BMMM98], which characterizes methods that "know too much or do too much".
In the next sections, we present our systematic approach to automatically identify and detect Blob listeners in GUI controllers.
RQ0 Is there any difference in terms of code metrics between GUI listener methods and methods that are not listeners?
This preliminary analysis focuses on the Java Swing toolkit because of its popularity
and the large quantity of Java Swing legacy code stored on various Web code hosting
services (e.g., github.com or sourceforge.net). In this study, we collected 511
code repositories from Github that contain references to Java Swing in their description8 .
4. http://docs.oracle.com/javase/8/javase-clienttechnologies.htm
5. http://www.gwtproject.org/
6. https://www.eclipse.org/swt/
7. The sources used in our study and the resulting data are available here: https://github.com/diverse-project/InspectorGuidget
8. The following Github query has been used: https://api.github.com/search/repositories?q=swing+GUI+in:description,readme+language:java&sort=stars&per_page=100. Because of resource constraints, only the 511 most starred repositories out of the 1200 available have been analyzed.
Java Swing provides a set of GUI listeners that controllers can implement. To identify GUI listener methods, we searched for all the methods that match a Java Swing listener method. The 511 repositories contain 1 935 262 lines of Java code (LoC): 16 617 listener methods (3.6 % of the total LoC) and 319 795 non-listener methods (96.4 % of the LoC). To obtain an overview of the structure and the complexity of each Java method, we collected the cyclomatic complexity (CC) and the number of control flow statements. Empty methods are ignored.
The results are provided in Table 4.1. We performed an independent-samples Mann-Whitney test [She07] (the data do not follow a normal distribution) to compare the data gathered from listener and non-listener methods, using a 95 % confidence level (i.e., p-value < 0.05). We observe important variations between listener and non-listener methods for all the measured metrics. Switch and loop statements per statement are used 62 % to 68 % less in listeners than in non-listener methods. Similarly, the mean CC of listeners is 16.4 % lower than the mean CC of non-listener methods. Meanwhile, there are 24.3 % more if statements per statement in listeners than in non-listener methods. All the p-values below 0.05 indicate that the variations we observed are statistically significant and not due to chance. Based on these observations, we conclude that Java Swing GUI listener methods are not developed like other methods.
Focusing now on if statements, Figure 4.1 gives the distribution of the number of listeners according to their number of if statements. Most listeners have fewer than three if statements (91.7 %, i.e., 13 227 of the 14 420 non-empty analyzed listeners). We also looked at the kinds of listeners that use more than two if statements in their code (8.3 %): 47.3 % (i.e., 564) of these listener methods are concrete implementations of actionPerformed, declared in the ActionListener interface (used to receive events from triggerable widgets such as buttons, check-boxes, and menus).
In the next section, we explain why listeners use more if statements than other methods.
Figure 4.1: Distribution of the number of listeners (log scale) according to their number of if statements.
Definition 4.1 (GUI Command) A GUI command [GHJV95, BL00], a.k.a. action [BB10], is a set of statements executed in reaction to a user interaction, captured by an input event, performed on a GUI. GUI commands may be supplemented with: a pre-condition checking whether the command fulfils the prerequisites to be executed; undo and redo functions for, respectively, canceling and re-executing the command.
Definition 4.2 (Blob listener) A Blob listener is a GUI listener that can produce several GUI commands. Blob listeners can produce several commands because of the multiple widgets they have to manage. In such a case, Blob listeners' methods (such as actionPerformed) may be composed of a succession of conditional statements that:
1. check whether the widget that produced the event to treat is the expected one, i.e., the widget that responds to the user interaction; and
2. execute the command once the widget is identified.
We identified three variants of the Blob listener, detailed in the next subsection.
In Java Swing, the properties used to identify widgets are mainly the name or the action command of these widgets. The action command is a string used to identify the kind of command the widget will trigger. Listing 4.5, related to Listing 4.4, shows how an action command (Lines 2 and 6) and a listener (Lines 3 and 7) can be associated with a widget in Java Swing during the creation of the user interface.
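As an illustration, such an association is typically written as follows in Java Swing (the widget names and action commands here are hypothetical, not those of Listing 4.5):

import java.awt.event.ActionListener;
import javax.swing.JButton;

final class UiBuilder {
    static void build(ActionListener controller) {
        JButton b1 = new JButton("Undo");
        b1.setActionCommand("undo");       // string later used to identify the widget
        b1.addActionListener(controller);  // the same controller is shared by both widgets

        JButton b2 = new JButton("Redo");
        b2.setActionCommand("redo");
        b2.addActionListener(controller);
    }
}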
Checking the type of the widget. The second variant of the Blob listener consists of checking the type of the widget that produced the event. Listing 4.6 depicts such a practice, where the type of the widget is tested using the operator instanceof (Lines 3, 5, 7, and 9). One may note that such if statements may have nested if statements that test properties of the widget, as explained in the previous point.
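A minimal sketch of this second variant (illustrative only, with hypothetical widgets and commands) is:

import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import javax.swing.AbstractButton;
import javax.swing.JButton;
import javax.swing.JCheckBox;

final class TypeCheckingController implements ActionListener {
    @Override public void actionPerformed(ActionEvent e) {
        Object src = e.getSource();
        if (src instanceof JButton) {
            // Command #1: executed for button widgets
        } else if (src instanceof JCheckBox) {
            // Command #2: executed for check-box widgets
        } else if (src instanceof AbstractButton) {
            // Command #3: fallback for other triggerable widgets
        }
    }
}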
In these three variants, multiple if statements are successively defined. Such successions are required when one single GUI listener gathers events produced by several widgets: the listener needs to identify the widget that produced the event to process.
These three variants of the design smell also appear in other Java GUI toolkits, namely SWT, GWT, and JavaFX. Examples for these toolkits are available on the webpage of this work1.
1. GUI listeners detection: GUI listeners that contain conditional blocks (conditional
GUI listeners) are automatically detected in the source code through a static
analysis (Section 4.4.2);
2. GUI commands detection: the GUI commands, produced while interacting with
widgets, that compose conditional GUI listeners are automatically detected using
a second static analysis (Section 4.4.3); and
3. Blob listeners detection: the analysis of these commands allows us to spot the
GUI listeners that are Blob listeners (Section 4.4.4), i.e., those having more than
one command.
Figure 4.2: The proposed process for automatically detecting Blob listeners (conditional GUI listeners detection, commands detection, Blob listeners detection).
For example, five nested conditional blocks (Lines 8, 10, 11, 13, and 15) compose the listener method actionPerformed in Listing 4.4 (Section 4.3). The first conditional block checks the type of the widget that produced the event (Line 8). This block contains three other conditional blocks that identify the widget using its action command (Lines 10, 13, and 15). Each of these three blocks encapsulates one command to execute in reaction to the event.
Algorithm 1 details the detection of conditional GUI listeners. The inputs are all the classes of an application and the list of classes of a GUI toolkit. First, the source code classes are processed to identify the GUI controllers. When a class implements a GUI listener (Line 5), all the implemented listener methods are retrieved (Line 6). For example, a class that implements the MouseMotionListener interface must implement the listener methods mouseDragged and mouseMoved. Next, each GUI listener is analyzed to identify those having at least one conditional statement (Lines 8 and 9). All listeners with such statements are considered as conditional GUI listeners (Line 10).
statement (Line 7). Next, these conditionals are analyzed to detect any reference to a GUI event or widget (Line 8). Typical references we found are, for instance, expressions such as e.getSource() instanceof Component or e.getSource() == copy, where e refers to a GUI event, Component to a Swing class, and copy to a Swing widget. The algorithm recursively analyzes the variables and class attributes used in the conditional statements until a reference to a GUI object is found in the controller class. For instance, the variable actionCmd in the following code excerpt is also considered by the algorithm.
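For illustration, such an excerpt may look as follows (a reconstruction with a hypothetical action command, not the original analyzed code):

import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;

final class UndoController implements ActionListener {
    @Override public void actionPerformed(ActionEvent e) {
        String actionCmd = e.getActionCommand(); // actionCmd transitively refers to the GUI event e
        if (actionCmd.equals("undo")) {          // conditional resolved by the recursive analysis
            // command executed when the widget with action command "undo" fired the event
        }
    }
}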
2. A potential command contains more than one potential command. The corresponding code excerpt is composed of four potential commands (Lines 1, 3, 5, and 7). In this case, the potential commands that contain multiple commands are not considered: in our example, the first potential command (Line 1) is ignored. One may note that this command checks the type of the widget, which is a variant of the Blob listener (see Section 4.3.1). The three nested commands, however, are the real commands triggered on user interactions.
These two cases are handled in Algorithm 2 (Lines 17–21). Given a potential command, all its nested potential commands are gathered (Lines 15–16). The function getNestCmds analyzes the commands by comparing their code line positions, statements, etc.: if one command C contains other commands, they are marked as nested to C. Then, for each potential command and its nested ones: if the number of nested commands equals 1, the single nested command is ignored (Lines 18–19); if the number of nested commands is greater than 1, the root command is ignored (Line 21).
4.5 Evaluation
To evaluate the efficiency of our approach, we address the three following research
questions:
RQ1 To what extent is the detection algorithm able to detect GUI commands in GUI
listeners correctly?
RQ2 To what extent is the detection algorithm able to detect Blob listeners correctly?
RQ3 Do all Blob listeners look alike?
4.5.1 Objects
We conducted our evaluation by selecting six well-known or large open-source software
systems based on the Java Swing toolkit: FastPhotoTagger, GanttProject, JAxoDraw,
Jmol, TerpPaint, and TripleA. For each system, we downloaded the source code and
configured its Java project in Eclipse.
Table 4.2 lists the subject systems and Table 4.3 presents their characteristics such
as their number of GUI listeners and LOC size.
4.5.2 Methodology
The accuracy of the static analyses that compose the detection algorithm is measured
by the recall and precision metrics [OPM15]. We ran InspectorGuidget on each
software system to detect GUI listeners, commands, and Blob listeners. We assume as
a precondition that only GUI listeners are correctly identified by our tool. Thus, to
measure the precision and recall of our automated approach, we manually analyzed all
the GUI listeners detected by InspectorGuidget to:
• Check conditional GUI Listeners. For each GUI listener, we manually checked
whether it contains at least one conditional GUI statement. The goal is to an-
swer RQ1 and RQ2 more precisely, by verifying whether all the conditional GUI
listeners are statically analyzed to detect commands and Blob listeners.
• Check commands. We analyzed the conditional statements of GUI listeners to
check whether they encompass commands. Then, recall measures the percentage
of relevant commands that are detected (Equation (4.1)). Precision measures the
percentage of detected commands that are relevant (Equation (4.2)).
Recall_cmd(%) = |{RelevantCmds} ∩ {DetectedCmds}| / |{RelevantCmds}| × 100    (4.1)
Precision_cmd(%) = |{RelevantCmds} ∩ {DetectedCmds}| / |{DetectedCmds}| × 100    (4.2)
Recall_blob(%) = |{RelevantBlobs} ∩ {DetectedBlobs}| / |{RelevantBlobs}| × 100    (4.3)
Precision_blob(%) = |{RelevantBlobs} ∩ {DetectedBlobs}| / |{DetectedBlobs}| × 100    (4.4)
Relevant Blob listeners are all the GUI listeners that handle more than one command (see Section 4.4). Detecting Blob listeners therefore depends on the accuracy of the command detection.
Table 4.4 shows the number of commands successfully detected per software system. TripleA presented the highest numbers of GUI listeners (580), conditional GUI listeners (174), and commands (152). One can notice that, despite the low number of conditional GUI listeners in TerpPaint (4), this software system has 34 detected commands. So, according to the sample we studied, the number of commands does not seem to be correlated with the number of conditional GUI listeners.
Table 4.4 also reports the number of FN and FP commands, and the values of the recall and precision metrics. TripleA and Jmol revealed the highest numbers of FNs, whereas TerpPaint presented the lowest. The precision of the command detection is 99.10 %: most of the commands (439/443) detected by our algorithm are relevant. We, however, noticed 76 relevant commands that were not detected, leading to an average recall of 86.05 %. Thus, our algorithm is less accurate in detecting all the commands than in detecting relevant ones. For example, TripleA revealed 44 FN commands and no false positive, leading to a recall of 77.55 % and a precision of 100 %. The four FP commands were observed in JAxoDraw (2) and Jmol (2), leading to precisions of 98.02 % and 98.10 %, respectively.
Figure 4.3 classifies the 76 FN commands according to the cause of their non-detection. 28 commands were not detected because widgets are used inside block statements rather than inside the conditional statements our approach analyzes; for example, their conditional expressions refer to boolean or integer types rather than widget or event types. 16 other commands were not detected since they rely on ad hoc widgets or GUI listeners. These widgets are developed for a specific purpose and rely on specific user interactions and complex data representations, as we explained in Chapter 3. Thus, our approach cannot identify widgets that are not developed with the Java Swing toolkit. All the FN commands reported in this category concern TripleA (14) and Jmol (2), which use several ad hoc widgets. Similarly, we found eight FN commands that use classes defined outside the Swing class hierarchy. A typical example is the use of widget models (e.g., the classes ButtonModel or TableModel) in GUI listeners.
Table 4.5 gives an overview of the results of the Blob listener detection per software system. The highest numbers of detected Blob listeners concern TripleA (22), Jmol (18), and JAxoDraw (16). Similarly to the command detection, we did not observe a correlation between the numbers of conditional GUI listeners, commands, and Blob listeners. Indeed, FastPhotoTagger, GanttProject, and TerpPaint presented quite similar numbers of detected Blob listeners despite different numbers of conditional GUI listeners.
The average recall is 98.81 %. Only the analysis of Jmol produced one FN Blob listener. In contrast to the command detection, we noticed a higher number of FPs (30). TripleA presented the highest number of FP Blob listeners (19), followed by Jmol (4). The average precision is 69.07 %. The average time spent to analyze the software systems is 10 810 milliseconds. It includes the time that Spoon takes to process all the classes plus the time to detect GUI commands and Blob listeners. The worst case is measured on TripleA, i.e., the largest system, with 16 732 milliseconds. Spoon takes a significant time to load the classes of large software systems (e.g., 12 437 out of 16 732 milliseconds for TripleA).
Figure 4.4 depicts the distribution of the FP Blob listeners. 14 FP Blob listeners are composed of commands based on checking the states of widgets; for instance, two commands can rely on the selection or the non-selection of a checkbox. 13 FP Blob listeners have several commands running over the same widget attribute. Two FP Blob listeners concern reverse commands, e.g., undo operations. One FP Blob listener concerns the state of user interactions: commands are in this case detected, while the GUI listener code just updates the running user interaction.
Listing 4.8 gives an example of a FP Blob listener detected in TripleA9. In this example, our algorithm detects two commands (Lines 6–7 and Lines 8–9). This GUI listener is therefore classified as a Blob listener. However, these two commands are similar since: i) they are executed over the same widget attributes (i.e., m_model and m_playerNameLabel); ii) they are reverse commands: enablePlayer vs disablePlayer; and iii) the conditional statement handles the state of the check-box widget (i.e., whether m_enabledCheckBox is selected or not). Thus, this GUI listener should not be detected as a Blob listener.
Most of the FP Blob listeners have commands contained in a single if-then-else statement (e.g., Lines 6–9). We observed that these statements are mainly used to handle the state of widgets, for example checking whether a check box is selected (e.g., Line 6) or deselected (e.g., Line 8). Also, most of the FP Blob listeners identified in TripleA are found in the listener registration of a widget (e.g., Line 3). Defining listeners during a widget registration, i.e., as an anonymous class, is a good practice (see Section 4.7.2) to avoid Blob listeners since we implement the corresponding listener
9. Path: src/games/strategy/engine/framework/startup/ui/ClientSetupPanel.java; lines: 316-326.
To answer RQ3 we analyze and discuss the Number of Commands per Blob listener
(NCB) for each software system. Figure 4.5 depicts this distribution.
Figure 4.5: Number of GUI commands per Blob listener per software system
This chart highlights that the NCB can vary within the same software system. For instance, the NCB for JAxoDraw ranges from 2 to 23, and for TerpPaint from 2 to 17. By contrast, the NCB for GanttProject and TripleA ranges from 2 to 4 and from 2 to 6, respectively.
In the case of JAxoDraw, the Blob listener detected with 23 commands handles events when widgets change their state. Listing 4.9 illustrates a code excerpt from this Blob listener. This example shows that, for each widget grouped in an "edit panel" that changes its value, the Blob listener implements a GUI command. One may note that using a single controller for all the widgets produces a "long method" bad smell with all its consequences; indeed, a Java method with more than a dozen lines is a strong indicator of a long method11. Besides, this Blob listener contains several duplicated code blocks. For example, Lines 6, 9, and 12 are the same for the commands #1, #2, and #3, respectively. These code lines get information from the model, for instance the sequence of values of the spinners. Duplicated code has negative effects on the maintenance and evolution of software systems [SYA+13]. One way to eliminate those lines is to move the duplicated code outside of the conditional statements, since they handle the same type of widget (i.e., JSpinner): the duplicated code must be placed before the conditional statements, for instance before Line 5.
10. Path: src/TerpPaint.java; lines: 4951-5275.
11. http://martinfowler.com/bliki/CodeSmell.html
The Blob listener with 17 commands in TerpPaint has 125 LoC. Its design solution is quite similar to that of the Blob listener in JAxoDraw; however, it is more difficult to understand since it has multiple and nested conditional statements. We observed several conditional statements used to identify the widgets that produced the event (Lines 5, 8, and 11). These conditional statements can be eliminated by following the good coding practices in GUI listeners that we present in Section 4.7.2.
The Blob listeners detected in TripleA and GanttProject seem less badly designed: the average NCB for both systems is less than or equal to three commands. Indeed, TripleA implements several GUI listeners as anonymous classes, which is one example of the good coding practices in GUI listeners.
We also observed Blob listeners with few commands but many LoC. For instance, the Blob listeners detected with three commands in TripleA range from 23 to 66 LoC, whereas in GanttProject one Blob listener with the same number of commands has 35 LoC. These results highlight that the LoC size of Blob listeners detected with the same NCB varies, even within the same software system.
Our implementation and the initial study (Section 4.2) focus on the Java Swing toolkit only, because of its popularity and the large quantity of Java Swing legacy code. We provide on the companion web page examples of Blob listeners in other Java GUI toolkits, namely GWT, SWT, and JavaFX1.
Construct validity. This threat relates to the perceived overall validity of the experiments. The detection of FNs and FPs required a manual analysis of all the GUI listeners of the software systems. To limit errors during this manual analysis, we added a debugging feature in InspectorGuidget that highlights GUI listeners in the code. We used this feature to browse all the GUI listeners and identify their commands in order to state whether these listeners are Blob listeners. During our manual analysis, we did not notice any error in the GUI listener detection. Another threat is the fact that we manually determined whether a listener is a Blob listener. To reduce this threat, we carefully inspected each GUI command highlighted in our tool.
4.7 Discussion
In the next two subsections, we discuss the scope of InspectorGuidget and good
practices that should be used to limit Blob listeners.
Listing 4.11: Good practice for defining controllers: one widget per listener
13. Path: src/TerpPaint.java; lines: 1125-1127; 4525-4557.
Listeners as lambdas. Listing 4.12 illustrates the same code as Listing 4.11 but using lambdas, supported since Java 8. Lambdas simplify the implementation of anonymous classes that have a single method to implement: the header declarations of the anonymous class and of its unique method are no longer required. The parameters of the unique method (here event, Line 10) are declared, followed by an arrow pointing to the instructions composing the body of the lambda (i.e., of the method). Note that only listeners composed of a single method can be written using lambdas.
Listing 4.12: Same code as in Listing 4.11 but using Java 8 lambdas
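As an illustration of this practice (hypothetical widgets and methods, not the exact code of Listings 4.11 and 4.12), one listener per widget can be written with lambdas as follows:

import javax.swing.JButton;

final class AController {
    private final JButton undoButton = new JButton("Undo");
    private final JButton redoButton = new JButton("Redo");

    AController() {
        // One widget per listener, written as Java 8 lambdas: the single
        // method of ActionListener is replaced by (event -> body).
        undoButton.addActionListener(event -> undo());
        redoButton.addActionListener(event -> redo());
    }

    private void undo() { /* command #1 */ }
    private void redo() { /* command #2 */ }
}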
the next steps towards refactoring Blob listeners. We have started the implementation of this algorithm. The global solution consists of three main steps:
1. detecting Blob listeners;
2. locating where the GUI commands included in Blob listeners are registered; and
3. extracting GUI commands by producing one command per listener, for instance commands into listeners as anonymous classes.
Step one is completed and evaluated, as detailed in Section 4.4 and Section 4.5. During the evaluation of the Blob listener detection, we identified some criteria to make our static analysis algorithm more precise and thus decrease the number of false positive Blob listeners (recall Section 4.7).
We have implemented step two based on the good coding practices that we explained in Section 4.7.2. Figure 4.6 shows the activity diagram of the algorithm that locates, in the source code, where the widget involved in a command is registered. First, the algorithm detects all the listener registrations in the source code (e.g., buttonUndo.addActionListener). Also, the listener methods that implement a listener interface targeted by a registration are recovered. This information is used to check whether one of these methods matches the listener method where a command was found; this preliminary check improves the overall performance of our algorithm. Next, the registrations are analyzed to obtain the widget they target. Once the widget is identified, the algorithm obtains the statements referring to this widget. These statements contain information about the widget properties, such as buttonUndo.setActionCommand("undo"). A similar process is performed to extract the properties of commands; the difference is that the properties of commands are extracted from their conditional statements (e.g., e.getActionCommand().equals("undo")). Finally, the properties of commands and widgets are compared: if they match, we relate a command to a registration.
Note that gathering all the properties of a widget or a command makes the matching more accurate. For example, some conditional expressions do not refer directly to a widget attribute: in e.getActionCommand().equals("undo"), e refers to a non-specified widget. In contrast, the widget attribute buttonUndo is easily identified in the conditional expression e.getSource() == buttonUndo.
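To make the comparison step concrete, the following minimal sketch relates a command to a registration when their property values intersect; the class and method names are illustrative, not the actual InspectorGuidget implementation:

import java.util.Set;

final class CommandRegistrationMatcher {
    // widgetProperties: values gathered from the widget statements, e.g.
    // {"undo"} from buttonUndo.setActionCommand("undo").
    // commandProperties: values extracted from the command conditionals,
    // e.g. {"undo"} from e.getActionCommand().equals("undo").
    boolean matches(Set<String> widgetProperties, Set<String> commandProperties) {
        for (String property : commandProperties) {
            if (widgetProperties.contains(property)) {
                return true; // the command targets this registration
            }
        }
        return false;
    }
}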
The last step consists of moving a command to its registration (in the case we choose to refactor a Blob listener as an anonymous class). However, several factors must be analysed to perform a safe refactoring that preserves the code behavior, for example the dependencies between the listeners as well as the attributes they share. Also, a GUI can be developed in different ways; for instance, a listener may be registered on a component that represents several widgets. In such a case, there is no single widget targeted by the registration (e.g., this.addEventListener).
While the goal of the second step is to locate the listener registrations corresponding to commands, we observed that this step can also be used to identify some GUI faults. For example, Listing 4.3 illustrates a Blob listener with GUI faults in its two commands (Command #1: Lines 12–18 and Command #2: Lines 19–21). In this case, our algorithm will not identify the listener registrations of either command, since the GUI faults cause a mismatch between the properties: the property values used by the commands to identify the widget that produced the event (e.g., the label "Clear " in Command #1) differ from the property values (e.g., the label "Clean") of the widgets targeted by the listener registrations (e.g., clearButton in Line 5).
Figure 4.6: The activity diagram of the algorithm to detect the command registration
4.9 Conclusion
We identified and characterized the Blob listener, a new design smell specific to GUIs. The empirical study has shown evidence that Java GUI listeners use more conditional statements than other methods. We pored over these listeners and identified that Blob listeners can degrade the quality of GUI code, for instance by leading to the introduction of GUI faults. We have observed three variations of a Blob listener that may occur in Java Swing listeners. Our proposed static analysis, implemented in InspectorGuidget, has successfully detected 67 Blob listeners out of 68 in six Java systems.
We believe that the presence of Blob listeners affects the quality of GUI code. We intend to investigate whether some GUI faults are accentuated by GUI design smells. This study can be assisted by our GUI fault model presented in Chapter 3. Understanding the effects of GUI design smells on GUI faults can help software engineers prevent specific GUI faults, for instance by addressing the research question: which kinds of GUI faults found in GUI controllers are correlated with the presence of Blob listeners? Also, this study will help us gather empirical evidence that an automated refactoring of Blob listeners is worthwhile.
Part III
Chapter 5
Conclusion
This chapter summarizes the contributions of this thesis to the GUI testing domain, and outlines directions for future research.
• Evaluation of two GUI testing tools against real GUI failures: some failures classified in our fault model that concern standard widgets were not detected by two standard GUI testing tools (GUITAR and Jubula). This evaluation demonstrates that some kinds of faults related to WIMP widgets are more difficult for GUI testing tools to detect, for instance GUI faults found in the auto-completion feature. Also, regression GUI testing tools are not able to detect some failures when a faulty GUI version is used to produce the test cases.
• Development of GUI mutants: 65 GUI mutants are derived from our fault model and planted in a highly interactive open-source system, LaTeXDraw (22 into standard widgets and 43 into ad hoc widgets). These mutants are freely available and can be reused by developers for benchmarking their GUI testing tools.
• Evaluation of GUI testing tools against GUI mutants: 43 out of 65 mutants were not killed by the standard GUI testing tools; 40 of them concern ad hoc widgets and their user interactions. This evaluation demonstrates that GUI testing frameworks fail to detect several failures that stem from ad hoc widgets, their multi-event interactions, and data presentation.
• An empirical analysis of GUI controllers: we analysed the code of Java Swing listeners retrieved from 511 code repositories on GitHub. The results pointed out a heavier use of conditional statements (e.g., if statements) in listeners, where they serve to identify widgets.
• A static analysis to automatically detect the presence of Blob listeners in Java systems: contrary to other code smells that can be located with structural rules over the whole code base of a software system, the detection of Blob listeners requires a preliminary semantic code analysis to isolate the parts of the code base related to the GUI implementation (e.g., identification of GUI listeners and conditional GUI listeners). The second step of our analysis consists in detecting the GUI commands, i.e., GUI actions, that each implemented GUI listener can produce.
• Good coding practices to avoid the use of Blob listeners: these practices represent
a step towards the automated refactoring of Blob listeners.
Experiments and tools. We have run our experiments on large interactive systems, which required building tools that can handle the complexity of real-world software.
The subject systems have been selected using several criteria, such as their interactive features (e.g., direct manipulation, feedback), input devices (e.g., mouse, touch), types of widgets (ad hoc), and data graphics (2D or 3D objects). We also used an open-source interactive system to calibrate our tools during the experiments. For example, LaTeXDraw has been used to develop the GUI mutants as well as to bootstrap InspectorGuidget against Java GUI listeners.
The empirical analysis that evaluates the GUI fault model against real GUI failures was performed by carefully selecting the bug reports of five interactive systems. We analysed the bug report artifacts (e.g., descriptions, patches, logs, or stack traces) to identify and classify the faults according to our fault model. Our examination of GUI development practices relies on open-source code hosting repositories, in which we identified several interactive systems.
To run the GUI testing tools (e.g., GUITAR and Jubula) against the GUI mutants, we studied their documentation, such as developer/user manuals, technical reports or research papers, homepages, and discussion forums. This study allowed us to minimize several threats regarding our experiments. To develop the GUI mutants, we studied the code of LaTeXDraw in different versions to understand the GUI behavior of Java Swing implementations.
To implement InspectorGuidget, we leveraged existing tools to provide a more mature solution. For example, our static analysis uses Spoon to transform and analyse Java source code. We also implemented InspectorGuidget as an Eclipse plugin to facilitate its integration with other Java tools. Indeed, the evaluation of InspectorGuidget has been performed on six Java applications, which differ from the systems on which we evaluated our fault model.
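For illustration, a minimal Spoon-based query in the spirit of our analysis is sketched below; the input path and the focus on actionPerformed methods are assumptions made for the example, not the actual InspectorGuidget code:

import spoon.Launcher;
import spoon.reflect.CtModel;
import spoon.reflect.code.CtIf;
import spoon.reflect.declaration.CtMethod;
import spoon.reflect.visitor.filter.TypeFilter;

public class CountListenerConditionals {
    public static void main(String[] args) {
        // Build a Spoon model of the analysed sources.
        Launcher launcher = new Launcher();
        launcher.addInputResource("src/main/java"); // assumed source folder
        CtModel model = launcher.buildModel();

        // Count the if statements inside each actionPerformed method.
        for (CtMethod<?> method : model.getElements(new TypeFilter<CtMethod<?>>(CtMethod.class))) {
            if ("actionPerformed".equals(method.getSimpleName())) {
                int nbIfs = method.getElements(new TypeFilter<>(CtIf.class)).size();
                System.out.println(method.getSignature() + ": " + nbIfs + " if statement(s)");
            }
        }
    }
}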
We built a complete data set of all the experiments performed during this research work. This data set can be used to study, for instance, the correlation between GUI implementations and their GUI faults.
5.2 Perspectives
The proposed fault model has been used to identify the limits of GUI testing tools. We believe that this model can serve as a basis for improving GUI testing techniques to cover more GUI faults. Also, these faults may be accentuated by the presence of GUI design smells. Examples are the two bad coding practices listeners with empty bodies and unsafe registration of listeners, explained below, that may introduce faults.
The directions for future research around the contributions of this thesis are presented below.
A failure may occur when such a listener is used in its subclasses, leading to a race condition. The following code depicts this case. When a subclass extends the class Listener, the call to super must be the first statement in its constructor. In this case, the event listener (i.e., SubClass.actionPerformed()) could get called while the subclass fields (e.g., list) are not initialized yet.
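A minimal sketch of this scenario is given below; the JButton widget is illustrative, while Listener, SubClass, and list come from the explanation above:

import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import java.util.ArrayList;
import java.util.List;
import javax.swing.JButton;

class Listener implements ActionListener {
    Listener(JButton button) {
        // Unsafe registration: 'this' escapes before the object (and any
        // subclass) is fully constructed.
        button.addActionListener(this);
    }
    @Override public void actionPerformed(ActionEvent e) { }
}

class SubClass extends Listener {
    private List<String> list;

    SubClass(JButton button) {
        super(button); // must be the first statement
        // If an event is dispatched before the next line runs,
        // SubClass.actionPerformed() is called while 'list' is still null.
        this.list = new ArrayList<>();
    }
    @Override public void actionPerformed(ActionEvent e) {
        list.add(e.getActionCommand()); // possible NullPointerException
    }
}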
1. http://www.ibm.com/developerworks/library/j-jtp0618/
Another bad practice is listeners with empty bodies. This practice is common with listener interfaces that require more than one method to be implemented (e.g., the MouseListener interface requires five listener methods). Such methods may be left with empty bodies. However, code with several empty method bodies is harder to read and maintain. Furthermore, such flexibility may introduce unintended faults. For example, a listener may be registered on a widget while none of its listener methods is implemented (e.g., all their bodies are empty or commented out). Listing 5.2 gives an example of this practice observed in Jmol. None of the three methods required by the MenuListener interface is implemented: empty bodies (Line 10 and Line 12) or a commented-out body (Line 7). In this case, an error may occur when a user interacts with that menu (i.e., processingMenu in Line 4). PMD detects unused private methods, i.e., private methods with an empty body and no comment in their body. This rule, however, will not detect Java listeners with empty bodies since they are public methods.
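A minimal sketch of the shape of code described above (the surrounding class is illustrative; the line numbers cited refer to the original Jmol listing):

import javax.swing.JMenu;
import javax.swing.event.MenuEvent;
import javax.swing.event.MenuListener;

class ProcessingMenuController {
    private final JMenu processingMenu = new JMenu("Processing");

    ProcessingMenuController() {
        // The MenuListener interface requires three methods; here none of
        // them does anything, yet the listener is still registered.
        processingMenu.addMenuListener(new MenuListener() {
            @Override public void menuSelected(MenuEvent e) {
                // updateProcessingMenu(e); // commented-out body
            }
            @Override public void menuDeselected(MenuEvent e) { } // empty
            @Override public void menuCanceled(MenuEvent e) { }   // empty
        });
    }
}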
Domain-specific mutants
GUI code can be affected by the different kinds of faults presented in our fault model. We have developed GUI mutants derived from our fault model using domain-independent mutation operators. GUI mutation operators, however, are domain-specific and thus may differ from one GUI toolkit to another. To develop Java Swing mutants, we studied the behavior of GUI implementations in Java code. Thus, we manually designed the mapping between the Java Swing mutants and the GUI faults of our fault model. For example, the creation of a standard interactive widget in Java Swing involves three basic operations:
1. instantiate a widget and add it to a GUI;
2. implement a GUI listener that defines the behavior of the widget; and
3. register the listener on the widget.
Given this behavior, GUI faults can be forged by breaking the code involved in these
operations.
An example is the mutation operator applied to the widget registration. This mutant changes the statement responsible for registering the listener on a widget. In such a case, when a user interacts with that widget, the desired action is not executed.
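A minimal sketch of such a mutant, with illustrative widget names:

import javax.swing.JButton;

class RegistrationMutantExample {
    private final JButton buttonUndo = new JButton("Undo");
    private final JButton buttonRedo = new JButton("Redo");

    RegistrationMutantExample() {
        // Original statement: buttonUndo.addActionListener(e -> undo());
        // Mutant: the listener is registered on the wrong widget, so
        // clicking buttonUndo does nothing and clicking buttonRedo
        // triggers undo() instead.
        buttonRedo.addActionListener(e -> undo());
    }

    private void undo() { /* execute the undo command */ }
}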
Furthermore, some faults are more difficult to model. One example is forging a fault into the drag-and-drop (DnD) interaction. The basic behavior of a DnD starts with a mouse press, followed by at least one move, and ends with a release. In Java Swing, a DnD can be implemented by three listener methods: mousePressed, mouseDragged, and mouseReleased. A Java mutation operator introducing the fault "interaction behavior" into a DnD may be developed by breaking some attributes shared by these Java Swing listeners. Similarly, the unsafe use of the this reference in constructors, exemplified previously by the bad practice unsafe registration of listeners, may be used to forge GUI faults.
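For illustration, the sketch below shows a Swing DnD handler whose three listener methods share an attribute; a mutation operator could corrupt this shared state to break the interaction behavior (all names are illustrative):

import java.awt.Point;
import java.awt.event.MouseAdapter;
import java.awt.event.MouseEvent;

class DnDHandler extends MouseAdapter {
    private Point start; // attribute shared by the three methods

    @Override public void mousePressed(MouseEvent e) {
        start = e.getPoint(); // the DnD starts here
    }
    @Override public void mouseDragged(MouseEvent e) {
        // A mutant could, e.g., reset 'start' here, breaking the drag:
        // moveShape(start, e.getPoint());
    }
    @Override public void mouseReleased(MouseEvent e) {
        start = null; // the DnD ends here
    }
    // The handler must be registered with both addMouseListener and
    // addMouseMotionListener to receive all three events.
}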
We also believe that the creation of GUI mutants may be more specific in other domains, for instance for GUI actions defined in XML. Future research can provide a mapping between GUI faults and domain-specific mutants. One challenge is to develop a program analysis to automatically introduce such mutants into GUIs. For instance, the Java Swing mutants that we developed may be automatically planted in other Java Swing GUIs.
5. The original code is available here: https://github.com/LinhTran95/Asg7--Team10/blob/master/jWhiteBoard/src/jWhiteBoard/JWhiteBoard.java
We can refactor this Blob listener using the lambdas supported by Java 8 (recall Section 4.7.2 in Chapter 4), since its implementation has a single method, i.e., actionPerformed. In this case, the method actionPerformed and the identification of the widgets that produced the event (used by the GUI commands) are not required anymore (see Lines 2–8 and Lines 9–11). Thus, the two reported faults will be removed once the refactoring is done.
Future research on the refactoring of Blob listeners should first select which kinds of good practices will be applied, and then provide the mechanisms to preserve the code's behavior. For instance, refactoring a Blob listener by producing one GUI command per listener (i.e., listeners as anonymous classes, recall Chapter 4) requires locating where the widget used in a GUI command is registered and handling its dependencies (e.g., shared attributes) within the Blob listener to perform the refactoring correctly. Thus, one challenge is to ensure that the refactoring approach will not introduce new GUI faults. To prevent this, the refactored software systems must be tested using non-regression GUI test suites generated from the original software systems. Another challenge is to ensure that the refactoring will not degrade the GUI code quality. So, we should compare the code quality of the original and refactored code by using code quality metrics such as cyclomatic complexity, code duplication, code cohesion, or the correction of GUI faults in Blob listeners.
Appendix
Beizer’s Taxonomy
Beizer’s taxonomy adapted by Brooks et al. [BRM09] is presented in Table 5.1.
[AFT+ 12] Domenico Amalfitano, Anna Rita Fasolino, Porfirio Tramontana, Salva-
tore De Carmine, and Atif M. Memon. Using GUI ripping for automated
testing of android applications. In Proc. of ASE’12, pages 258–261, 2012.
[APB+ 12] S. Arlt, A. Podelski, C. Bertolini, M. Schaf, I. Banerjee, and A.M. Memon.
Lightweight static analysis for GUI testing. In Software Reliability En-
gineering (ISSRE), 2012 IEEE 23rd International Symposium on, pages
301–310, 2012.
[BA06] J.S. Bækken and R.T. Alexander. A candidate fault model for AspectJ
pointcuts. In Software Reliability Engineering, 2006. ISSRE ’06. 17th
International Symposium on, pages 169–178, 2006.
[BB10] Arnaud Blouin and Olivier Beaudoux. Improving modularity and us-
ability of interactive systems with Malai. In Proceedings of the 2nd
ACM SIGCHI Symposium on Engineering Interactive Computing Sys-
tems, EICS’10, pages 115–124, 2010.
[BBG11] F. Belli, M. Beyazit, and N. Güler. Event-based GUI testing and reliability
assessment techniques – an experimental insight and preliminary results.
In Software Testing, Verification and Validation Workshops (ICSTW),
2011 IEEE Fourth International Conference on, pages 212–221, 2011.
[BC06] Jon Arvid Børretzen and Reidar Conradi. Results and experiences from
an empirical study of fault reports in industrial projects. In Proc. of
PROFES’06, pages 389–394, Berlin, Heidelberg, 2006. Springer-Verlag.
[BCB13] Olivier Beaudoux, Mickael Clavreul, and Arnaud Blouin. Binding orthog-
onal views for user interface design. In Proceedings of the 1st Workshop
on View-Based, Aspect-Oriented and Orthographic Software Modelling,
page 4. ACM, 2013.
[Bei90] Boris Beizer. Software Testing Techniques. Van Nostrand Reinhold Co.,
1990.
[BLL00] Michel Beaudouin-Lafon and Henry Michael Lassen. The architecture and
implementation of CPN2000, a post-WIMP graphical application. In Pro-
ceedings of the 13th Annual ACM Symposium on User Interface Software
and Technology, UIST ’00, pages 181–190, New York, NY, USA, 2000.
ACM.
[BLS05] Mike Barnett, K. Rustan M. Leino, and Wolfram Schulte. The Spec# pro-
gramming system: An overview. In Gilles Barthe, Lilian Burdy, Marieke
Huisman, Jean-Louis Lanet, and Traian Muntean, editors, Construction
and Analysis of Safe, Secure, and Interoperable Smart Devices, volume
3362 of Lecture Notes in Computer Science, pages 49–69. Springer Berlin
Heidelberg, 2005.
[BM11] Rex Black and Jamie L. Mitchell. Advanced Software Testing - Vol. 3:
Guide to the ISTQB Advanced Certification As an Advanced Technical
Test Analyst. Rocky Nook, 1st edition, 2011.
[BMN+ 11] Arnaud Blouin, Brice Morin, Grégory Nain, Olivier Beaudoux, Patrick Al-
bers, and Jean-Marc Jézéquel. Combining aspect-oriented modeling with
property-based reasoning to improve user interface adaptation. In Pro-
ceedings of the 3rd ACM SIGCHI Symposium on Engineering Interactive
Computing Systems, EICS ’11, pages 85–94, 2011.
[CBC+ 92] Ram Chillarege, Inderpal S Bhandari, Jarir K Chaar, Michael J Halliday,
Diane S Moebus, Bonnie K Ray, and M-Y Wong. Orthogonal defect
classification: a concept for in-process measurements. IEEE Trans. Softw.
Eng., 18(11):943–956, 1992.
[CHM12] M.B. Cohen, Si Huang, and A.M. Memon. AutoInSpec: Using missing
test coverage to improve specifications in GUIs. In Software Reliabil-
ity Engineering (ISSRE), 2012 IEEE 23rd International Symposium on,
pages 251–260, 2012.
[CTR05] M. Ceccato, P. Tonella, and F. Ricca. Is AOP code easier or harder to test
than OOP code? In Proceedings of the 1st Workshop on Testing Aspect-
Oriented Programs (WTAOP), held in conjunction with the 4th inter-
national Conference on Aspect-Oriented Software Development (AOSD),
2005.
[CW12] Woei-Kae Chen and Jung-Chi Wang. Bad smells and refactoring meth-
ods for GUI test scripts. In Software Engineering, Artificial Intelligence,
Networking and Parallel Distributed Computing (SNPD), 2012 13th ACIS
International Conference on, pages 289–294, 2012.
[FBZ12] Francesca Arcelli Fontana, Pietro Braione, and Marco Zanoni. Automatic
detection of bad smells in code: An experimental assessment. Journal of
Object Technology, 11(2):5:1–38, 2012.
[FFP+ 13] Camille Fayollas, Jean-Charles Fabre, Philippe A. Palanque, Eric Bar-
boni, David Navarre, and Yannick Deleris. Interactive cockpits as criti-
cal applications: a model-based and a fault-tolerant approach. IJCCBS,
4(3):202–226, 2013.
[GHJV95] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design
patterns: elements of reusable object-oriented software. Addison-Wesley
Longman Publishing Co., Inc., Boston, MA, USA, 1995.
[GPEM09] Joshua Garcia, Daniel Popescu, George Edwards, and Nenad Medvidovic.
Identifying architectural bad smells. In Software Maintenance and Reengi-
neering, 2009. CSMR’09. 13th European Conference on, pages 255–258.
IEEE, 2009.
[Gra92] Robert B. Grady. Practical Software Metrics for Project Management and
Process Improvement. Prentice-Hall, Inc., Upper Saddle River, NJ, USA,
1992.
[GXF09] M. Grechanik, Qing Xie, and Chen Fu. Maintaining and evolving GUI-directed test scripts. In Software Engineering, 2009. ICSE 2009. IEEE
31st International Conference on, pages 408–418, 2009.
[GZDG15] Maria Anna G. Gaitani, Vassilis E. Zafeiris, N.A. Diamantidis, and E.A.
Giakoumakis. Automated refactoring to the null object design pattern.
Information and Software Technology, 59(0):33–52, 2015.
[HZBS14] Tracy Hall, Min Zhang, David Bowes, and Yi Sun. Some code smells have
a significant but small effect on faults. ACM Transactions on Software
Engineering and Methodology, 23(4):33:1–33:39, September 2014.
[IEC95] IEC. Nuclear power plants - main control-room - verification and valida-
tion of design, 1995.
[IEE10] IEEE. Standard classification for software anomalies. IEEE Std 1044-2009
(Revision of IEEE Std 1044-1993), New York, 2010.
[JGH+ 08] Robert J.K. Jacob, Audrey Girouard, Leanne M. Hirshfield, Michael S.
Horn, Orit Shaer, Erin Treacy Solovey, and Jamie Zigelbaum. Reality-
based interaction: A framework for post-wimp interfaces. In Proceedings
of the SIGCHI Conference on Human Factors in Computing Systems, CHI
’08, pages 201–210, New York, NY, USA, 2008. ACM.
[JH11] Yue Jia and Mark Harman. An analysis and survey of the development
of mutation testing. IEEE Trans. Softw. Eng., 37(5):649–678, September
2011.
[KCM00] Sunwoo Kim, John A. Clark, and John A. McDermid. Class mutation:
Mutation testing for object-oriented programs. In Proc. Net.ObjectDays
Conf. Object-Oriented Software Systems, October 2000.
[LBB15] Valeria Lelli, Arnaud Blouin, and Benoit Baudry. Classifying and qualify-
ing GUI defects. In Software Testing, Verification and Validation (ICST),
2015 IEEE Eighth International Conference on, pages 1–10, April 2015.
[LBB+ 16] Valeria Lelli, Arnaud Blouin, Benoit Baudry, Fabien Coulon, and Olivier Beaudoux. Automatic detection of GUI design smells: The case of Blob listener. In Submitted to the International Conference on Software Testing,
Verification and Validation (ICST), 2016.
[LBBC15] Valeria Lelli, Arnaud Blouin, Benoit Baudry, and Fabien Coulon. On
model-based testing advanced GUIs. In 11th Workshop on Advances in
Model Based Testing (A-MOST 2015), pages 1–10, April 2015.
[LLS10] Ning Li, Zhanhuai Li, and Xiling Sun. Classification of software defect
detected by black-box testing: An empirical study. In Proc. of WCSE’10,
2010.
[Lm04] R.R. Lutz and I.C. Mikulski. Empirical analysis of safety-critical anoma-
lies during operations. IEEE Trans. Softw. Eng., pages 172–180, 2004.
[LPSZ08] Shan Lu, Soyeon Park, Eunsoo Seo, and Yuanyuan Zhou. Learning from
mistakes: a comprehensive study on real world concurrency bug char-
acteristics. In Proceedings of Architectural Support for Programming Lan-
guages and Operating Systems (ASPLOS), Seattle, WA, 2008.
[LS07] Wei Li and Raed Shatnawi. An empirical study of the bad smells and
class error probability in the post-release object-oriented system evolu-
tion. Journal of Systems and Software, 80(7):1120 – 1128, 2007. Dynamic
Resource Management in Distributed Real-Time Systems.
[MA07] Kenneth Magel and Izzat Alsmadi. GUI structural metrics and testability
testing. In Proceedings of the 11th IASTED International Conference on
Software Engineering and Applications, SEA ’07, pages 91–95, Anaheim,
CA, USA, 2007. ACTA Press.
[MBN03] Atif M. Memon, Ishan Banerjee, and Adithya Nagarajan. GUI ripping:
Reverse engineering of graphical user interfaces for testing. In Proceedings
of The 10th Working Conference on Reverse Engineering, November 2003.
[MBNR13] Atif Memon, Ishan Banerjee, Bao Nguyen, and Bryan Robbins. The first
decade of GUI ripping: Extensions, applications, and broader impacts.
In Proceedings of the 20th Working Conference on Reverse Engineering
(WCRE). IEEE Press, 2013.
[MPRS12] Leonardo Mariani, Mauro Pezze, Oliviero Riganelli, and Mauro Santoro. AutoBlackTest: Automatic black-box testing of interactive applications. In Proc. of ICST'12, pages 81–90. IEEE Computer Society, 2012.
[Nor02] Donald A. Norman. The Design of Everyday Things. Basic Books, reprint
paperback edition, 2002.
[NPLB09] David Navarre, Philippe Palanque, Jean-Francois Ladry, and Eric Bar-
boni. ICOs: A model-based user interface description technique dedi-
cated to interactive systems addressing usability, reliability and scalabil-
ity. ACM Trans. Comput.-Hum. Interact., 16(4):1–56, 2009.
[NRBM14] Bao N. Nguyen, Bryan Robbins, Ishan Banerjee, and Atif Memon. GUI-
TAR: an innovative tool for automated testing of gui-driven software.
Automated Software Engineering, 21(1):65–105, 2014.
[NSS10] Duc Hoai Nguyen, Paul Strooper, and Jörn Guy Süß. Automated func-
tionality testing through GUIs. In Proceedings of the 33rd Australasian
Conference on Computer Science, pages 153–162. Australian Computer
Society, 2010.
[OAW+ 01] Jeff Offutt, Roger Alexander, Ye Wu, Quansheng Xiao, and Chuck
Hutchinson. A Fault Model for Subtype Inheritance and Polymorphism.
In Software Reliability Engineering, International Symposium on, 2001.
[OPM15] Frolin Ocariza, Karthik Pattabiraman, and Ali Mesbah. Detecting in-
consistencies in JavaScript MVC applications. In Proceedings of the
ACM/IEEE International Conference on Software Engineering (ICSE),
11 pages. ACM, 2015.
[OU01] A. Jefferson Offutt and Ronald H. Untch. Mutation 2000: Uniting the orthogonal. In Mutation Testing for the New Century, pages 34–44.
Kluwer Academic Publishers, Norwell, MA, USA, 2001.
[PFV03] Ana C.R. Paiva, João C.P. Faria, and Raul F.A.M. Vidal. Specification-
based testing of user interfaces. In Interactive Systems. Design, Specifica-
tion, and Verification, volume 2844 of Lecture Notes in Computer Science,
pages 139–153. Springer Berlin Heidelberg, 2003.
[PMP+ 06] Renaud Pawlak, Martin Monperrus, Nicolas Petitprez, Carlos Noguera,
and Lionel Seinturier. Spoon v2: Large scale source code analysis and
transformation for java. Technical Report hal-01078532, INRIA, 2006.
[Res13b] IBM Research. Orthogonal defect classification v 5.2 for software design
and code. http://researcher.watson.ibm.com/researcher/files/us-pasanth/ODC-5-2.pdf, 2013. Online; accessed 02-June-2015.
[RT05] F. Ricca and P. Tonella. Web testing: a roadmap for the empirical re-
search. In Web Site Evolution, 2005. (WSE 2005). Seventh IEEE Inter-
national Symposium on, pages 63–70, 2005.
[SCP08] José L. Silva, José Creissac Campos, and Ana C. R. Paiva. Model-based
user interface testing with spec explorer and ConcurTaskTrees. Electron.
Notes Theor. Comput. Sci., 208:77–93, 2008.
[SCSS14] J.C. Silva, J.C. Campos, J. Saraiva, and J.L. Silva. An approach for
graphical user interface external bad smells detection. In Álvaro Rocha,
Ana Maria Correia, Felix B. Tan, and Karl A. Stroetmann, editors, New
Perspectives in Information Systems and Technologies, Volume 2, volume
276 of Advances in Intelligent Systems and Computing, pages 199–205.
Springer International Publishing, 2014.
[SKBD14] Dilan Sahin, Marouane Kessentini, Slim Bechikh, and Kalyanmoy Deb.
Code-smell detection as a bilevel problem. ACM Trans. Softw. Eng.
Methodol., 24(1):6:1–6:44, October 2014.
[SKIH11] Seyed Reza Shahamiri, Wan Mohd Nasir Wan Kadir, Suhaimi Ibrahim,
and Siti Zaiton Mohd Hashim. An automated framework for software test
oracle. Information and Software Technology, 53(7):774 – 788, 2011.
[SM08] Jaymie Strecker and Atif Memon. Relationships between test suites,
faults, and fault detection in GUI testing. In Proc. of ICST’08, pages
12–21, 2008.
[Smi09] Josh Smith. WPF apps with the model-view-viewmodel design pattern.
MSDN Magazine, February 2009.
[SSG+ 10] João Carlos Silva, Carlos Eduardo Silva, Rui Gonçalo, João Alexandre
Saraiva, and José Creissac Campos. The GUISurfer tool: towards a lan-
guage independent approach to reverse engineering GUI code. In Proceed-
ings of the 2nd ACM SIGCHI Symposium on Engineering interactive com-
puting systems (EICS’10), pages 181–186, Berlin, Germany, 2010. ACM.
[Sun01] Sun Microsystems. Java Look and Feel Design Guidelines. Addison-
Wesley, 2 edition, 2001.
[SYA+ 13] Dag I.K. Sjoberg, Aiko Yamashita, Bente C.D. Anda, Audris Mockus, and
Tore Dyba. Quantifying the effect of code smells on maintenance effort.
IEEE Transactions on Software Engineering, 39(8):1144–1156, 2013.
[TKH11] Tommi Takala, Mika Katara, and Julian Harty. Experiences of system-level model-based GUI testing of an android application. In Software Testing, Verification and Validation (ICST), 2011 IEEE Fourth International Conference on, 2011.
[vBDD+ 91] Gregor von Bochmann, Anindya Das, Rachida Dssouli, Martin Dubuc,
Abderrazak Ghedamsi, and Gang Luo. Fault models in testing. In Protocol
Test Systems, pages 17–30, 1991.
[vD97] Andries van Dam. Post-WIMP user interfaces. Commun. ACM, 40(2):63–
67, February 1997.
[vD00] Andries van Dam. Beyond WIMP. IEEE Computer Graphics and Appli-
cations, 20(1):50–51, 2000.
[Wei10] Stephan Weißleder. Test models and coverage criteria for automatic
model-based test generation with UML state machines. PhD thesis, Hum-
boldt University of Berlin, October 2010.
[Yan11] Xuebing Yang. Graphic User Interface Modelling and Testing Automation.
PhD thesis, School of Engineering and Science, Victoria University, May
2011.
[YCM11] Xun Yuan, M.B. Cohen, and A.M. Memon. GUI interaction testing:
Incorporating event context. Software Engineering, IEEE Transactions
on, 37(4):559–574, 2011.
[YPX13] Wei Yang, MukulR. Prasad, and Tao Xie. A grey-box approach for au-
tomated GUI-model generation of mobile applications. In Fundamental
Approaches to Software Engineering, volume 7793 of Lecture Notes in
Computer Science, pages 250–265. Springer Berlin Heidelberg, 2013.
[ZFS15] Marco Zanoni, Francesca Arcelli Fontana, and Fabio Stella. On apply-
ing machine learning techniques for design pattern detection. Journal of
Systems and Software, 103(0):102 – 117, 2015.
[ZHM97] Hong Zhu, Patrick A. V. Hall, and John H. R. May. Software unit test
coverage and adequacy. ACM Comput. Surv., 29(4):366–427, December
1997.
[ZLE12] Sai Zhang, Hao Lü, and Michael D. Ernst. Finding errors in multithreaded
GUI applications. In Proc. of ISSTA'12, pages 243–253. ACM,
2012.
[ZLE13] Sai Zhang, Hao Lü, and Michael D. Ernst. Automatically repairing broken
workflows for evolving GUI applications. In ISSTA 2013, Proceedings
of the 2013 International Symposium on Software Testing and Analysis,
pages 45–55, 2013.
[ZPK14] Razieh Nokhbeh Zaeem, Mukul R. Prasad, and Sarfraz Khurshid. Auto-
mated generation of oracles for testing user-interaction features of mobile
apps. In Proc. of ICST’14, 2014.
List of Figures
3.1 Classification of the 279 bug reports using the GUI fault model . . . . . 71
3.2 Manifestation of failures in the user interface and interaction levels . . . 72
3.3 Screen-shot of the LaTeXDraw’s GUI . . . . . . . . . . . . . . . . . . . . 77
3.4 Representation of how interactions are currently managed and the current
limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.5 EFG sequence on the left. Interaction-action sequence on the right. . . . 84
3.6 The architectural design pattern Malai . . . . . . . . . . . . . . . . . . . 85
3.7 Example of a bi-manual interaction modelled by Malai UIDL . . . . . . 86
3.8 Illustration of a Malai instrument . . . . . . . . . . . . . . . . . . . . . . 87
3.9 Example of an interaction-action-flow graph . . . . . . . . . . . . . . . . 88
List of Tables
List of Algorithms
Résumé

Since its beginnings, the software engineering community has paid special attention to the quality and reliability of software. Numerous software testing techniques have been developed to characterize and detect errors in software. Fault models identify and characterize the errors that can affect the different parts of a piece of software. Besides, software quality criteria and their measures make it possible to assess the quality of software code and to detect, ahead of time, code that is potentially error-prone. Static and dynamic analysis techniques scrutinize, respectively, the software at rest and at run-time to find errors or perform quality measurements.

In this thesis, we advocate that the same attention must be paid to the quality and reliability of user interfaces (or human-machine interfaces), in the software engineering sense of the term. This thesis therefore proposes two contributions to the domain of testing and maintenance of user interfaces: 1. classification and mutation of user interface errors; 2. quality of user interface code.

We first propose a GUI fault model. This model has been designed from standard HCI concepts to identify and classify GUI faults. Through an empirical study conducted on existing Java code, we have shown the existence of a recurrent bad practice in the development of GUI controllers, the objects that transform the events produced by the user interface into actions. We characterize this new bad practice, which we named Blob listener in reference to the Blob anti-pattern. We also propose a static analysis that automatically identifies the presence of Blob listeners in Java Swing interface code.

Abstract

The software engineering community pays special attention to the quality and the reliability of software systems. Software testing techniques have been developed to find errors in code. Software quality criteria and measurement techniques have also been assessed to detect error-prone code.

In this thesis, we argue that the same attention has to be paid to the quality and reliability of GUIs, from a software engineering point of view. We specifically make two contributions on this topic. First, GUIs can be affected by errors stemming from development mistakes. The first contribution of this thesis is a fault model that identifies and classifies GUI faults. We show that GUI faults are diverse and imply different testing techniques to be detected.

Second, like any code artifact, GUI code should be analyzed statically to detect implementation defects and design smells. As the second contribution, we focus on design smells that can affect GUIs specifically. We identify and characterize a new type of design smell, called Blob listener. It occurs when a GUI listener, that gathers events to treat and transform as commands, can produce more than one command. We propose a systematic static code analysis procedure that searches for Blob listeners, which we implement in a tool called InspectorGuidget. The experiments we conducted exhibit positive results regarding the ability of InspectorGuidget to detect Blob listeners. To counteract the use of Blob listeners, we propose good coding practices for the development of GUI listeners.