Computers
Review
Fraud Detection Using the Fraud Triangle Theory and Data
Mining Techniques: A Literature Review
Marco Sánchez-Aguayo, Luis Urquiza-Aguiar and José Estrada-Jiménez
Citation: Sánchez-Aguayo, M.; Urquiza-Aguiar, L.; Estrada-Jiménez, J.
Fraud Detection Using the Fraud Triangle Theory and Data Mining
Techniques: A Literature Review. Computers 2021, 10, 121.
https://doi.org/10.3390/computers10100121
Academic Editor: Francesca Fallucchi
Received: 16 June 2021
Accepted: September 2021
Published 30 September 2021
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and
conditions of the Creative Commons Attribution (CC BY) license
(https://creativecommons.org/licenses/by/4.0/).
Departamento de Informática y Ciencias de la Computación, Escuela Politécnica Nacional,
Ladrón de Guevara E11-253, Quito 170517, Ecuador
Departamento de Electrónica, Telecomunicaciones y Redes de Información, Escuela Politécnica Nacional,
Ladrón de Guevara E11-253, Quito 170517, Ecuador;
[email protected] (L.U.-A.); jose.estrada@epn.edu.ec (J.E.-J.)
* Correspondence: marco.sanchez01@epn.edu.ec
† These authors contributed equally to this work.
Abstract: Fraud entails deception in order to obtain illegal gains; thus, it is mainly evidenced within
financial institutions and is a matter of general interest. The problem is particularly complex, since
perpetrators of fraud could belong to any position, from top managers to payroll employees. Fraud
detection has traditionally been performed by auditors, who mainly employ manual techniques.
These could take too long to process fraud-related evidence. Data mining, machine learning, and,
more recently, deep learning strategies are being used to automate this type of processing. Many related
techniques have been developed to analyze, detect, and prevent fraud-related behavior, with the
fraud triangle associated with the classic auditing model being one of the most important of these.
This work aims to review current work related to fraud detection that uses the fraud triangle in
addition to machine learning and deep learning techniques. We used the Kitchenham methodology
to analyze the research works related to fraud detection from the last decade. This review provides
evidence that fraud is an area of active investigation. Several works related to fraud detection using
machine learning techniques were identified without the evidence that they incorporated the fraud
triangle as a method for more efficient analysis.
Keywords: fraud; machine learning; cybersecurity; human behavior
1. Introduction
Fraud has increased considerably in recent times, affecting the interests of both financial
institutions and their customers. A study conducted by Price Waterhouse Coopers
found that 30% of the companies that they surveyed had already been victims of fraud.
Moreover, 80% of their fraud was committed within the companies' ranks, especially
in administrative areas, such as accounting, operations, sales, and at the management
level, without leaving aside the customer service dependencies [1]. Fraud-related activities,
which are generally unknown within a company, determine a series of irregularities
and illicit acts characterized by intentional deception committed by fraudsters. Most of
the anomalies detected are due to the lack of internal control mechanisms, and in such
situations, scammers commit fraud by exploiting the weaknesses [2].
Fraud is considered a subset of internal threats, such as corruption, misappropriation
of assets, and fraudulent declarations, among others [3]. In a more formal definition, fraud
is “the use of one’s occupation for personal enrichment through the misuse or deliberate
misapplication of the resources or assets of the employing organization”, according to
the Association of Certified Fraud Examiners (ACFE) [4]. The ability to commit this
type of activity is based on the weakness of the control mechanisms that institutions
and companies have. In such circumstances, fraudsters commit acts of fraud by taking
advantage of these weaknesses.
Computers 2021, 10, 121. https://doi.org/10.3390/computers10100121 https://www.mdpi.com/journal/computers
Since it is committed by humans, fraud is tightly coupled with human behavior.
Thus, understanding the motivations of perpetrators or their psychological and personality
traits that drive them to cross ethical boundaries can provide a new perspective for fraud
detection [5].
Currently, there are different solutions [6] for detecting fraud, which are focused on
the use of different tools that perform statistical and parametric analyses based on data
mining techniques, as well as analyses of behavior, but none of them solve the problem of
timely fraud detection [7].
Given the complexity of analyzing human behavior to detect fraud, some approaches
in this line have been proposed to tackle some of the issues involved in this task. For
instance, some works aimed to improve the precision and increase the speed of data
processing through a hybrid automatic learning system [8] or through incremental learning [9].
Another challenge for fraud detection is the lack of data from which detection systems
learn, and [10] proposed a fraud-detection system that does not require previous
fraudulent examples. However, even when the data are available, large and small datasets
should be addressed differently [11]. In any case, as a human behavior, fraud detection is a
multidimensional problem, and so are some of the fraud-detection mechanisms proposed
in the literature [12,13].
There is a consensus that prevention should be a priority in order to minimize fraud
through proper risk management. Avoiding fraud saves time and financial resources,
since detecting it after it occurs has the consequence that the stolen assets are practically
irrecoverable. To enhance fraud prevention, organizations should focus on the root of the
problem by identifying the causes that lead people to commit fraud and by understanding
their behavior [14]. Many theories have attempted to answer this question, and the
most frequently cited in this context are Cressey's Fraud Triangle Theory (FTT) and Wolfe
and Hermanson's Fraud Diamond Theory (FDT) [15]. Both approaches analyze how
perpetrators go so far as to commit fraud, which is discussed below.
The study of fraud and its analysis is best explained with the help of the Fraud
Triangle Theory (FTT), which was proposed by Donald R. Cressey, a leading expert in the
sociology of crime. Cressey investigated why people committed fraud and determined
their responses based on three elements: pressure, opportunity, and rationalization. This
theory also mentions that these elements occur consecutively to provoke the desire to
commit fraud. The first necessary element is perceived pressure, which is related to the
motivation and drive behind the fraudulent actions of an individual. This motivation often
occurs in people who are under some form of financial stress [16]. The second element,
known as perceived opportunity, is nothing more than the action behind the crime and the
ability to commit it. Finally, the third component, known as rationalization, has to do with
the idea that the individual can rationalize their dishonest acts, making their illegal actions
seem justified and acceptable [17].
The FDT, considered an extended version of the FTT, integrates a new vertex with
the three that were already known: capacity [18]. Despite the cohesion among the three
vertices of pressure, opportunity, and rationalization, it is unlikely that people will commit
fraud unless they have the capacity (considered the fourth vertex). In other words, the
potential perpetrator must have the skills and ability to commit fraud [19].
Various theories of fraud have been used to explain the motivation of this phenomenon.
The FTT and FDT can be effectively used to detect the possibility of corporate fraud, where
the measurement of all of the associated variables will depend to a great extent on the data
used for the study, whether public or private [20].
Fraud analysis, when supported by data mining techniques, helps reduce the manual
parts of the detection verification process and makes the search for fraud more efficient.
It is impossible to guarantee the proper moral and ethical behavior of people, especially
in the workplace. Due to this reality, a valid option for identifying possible evidence of
fraud from available data is to use automatic learning algorithms. Many works cover fraud
detection and use data mining techniques as the primary focus [21-24]. Two criticisms of
data-mining-based fraud-detection research are frequently raised: the deficiency of the
actual public data available in this domain for conducting experiments [25] (appropriate
access to data for researching this area is extremely difficult due to privacy) and the lack
of well-documented and published methods and techniques.
1.1. Related Work
Here, we describe some systematic reviews whose main objectives were the analysis
and detection of fraud using automatic learning techniques and the application of
fraud theories.
Phua et al. [25] carried out a survey in which they identified the limitations of fraud-
detection methods and techniques and showed that this field can benefit from other related
areas. Specifically, unsupervised approaches may benefit from existing monitoring systems
and text extraction, semi-supervised, and game-theoretical approaches; spam and intrusion
detection communities can contribute to future fraud-detection investigations. However,
above all, the authors focused on the nature of the information and offered an interesting
reflection on the investigation of fraud detection based on data mining. They also referred
to the scarcity of publicly available and real data for carrying out experiments, as well as to
the lack of well-documented and published methods and techniques.
Zhou et al. [26] concluded that most fraud-detection systems employ at least one
supervised learning method and that unsupervised and semi-supervised learning methods
are also used. The study showed that these techniques can be used alone or in combination
to build more robust classifiers and that, without losing generality, these approaches are
relatively successful in detecting fraud and credit scoring. They mentioned that fraud
detection and data-mining-based credit scoring are subject to the same classification-related
issues, such as feature engineering, parameter selection, and hyperparameter tuning. The
authors also observed that fraud-related data are not abundant enough for investigators
to train and test their models and that complex financial scenarios are nearly impossible
to represent. They explained that fraud detection must constantly evolve, and it must
particularly depend on the industry in which it is applied.
The authors of [27] performed a meta-analysis to establish the effect of mapping
data samples from fraudulent companies to non-fraudulent companies using classification
methods by comparing the general classification precision found in the literature. The
results indicated that fraudulent samples could be matched equally to non-fraudulent
samples (1:1 data mapping) or could be unevenly mapped using a one-to-many ratio to
proportionally increase the sample size. Based on this meta-analysis, compared to statistical
techniques, machine learning approaches can achieve better classification precision,
specifically when the availability of sample data is low. Furthermore, high classification
precision can even be obtained with a dataset with 1:1 mapping by using machine learning
classification approaches.
The results mentioned by the authors of [28] clearly show that data mining techniques
have been applied more widely for fraud detection in other fields, such as insurance,
corporate, and credit card fraud. In this line, we found a lack of research on mortgage
fraud, money laundering, and security fraud.
The main data mining techniques used for detecting financial fraud are logistical mod-
els that provide immediate solutions to the problems inherent in detecting and classifying
fraudulent data. The authors of [29] conducted a review of the literature to address the
following research questions related to financial statement fraud (FSF): (1) Can FSF be
detected, how likely is it, and how can it be done? (2) What characteristics of the data can
be used to predict FSF? (3) What kind of algorithm can be used to detect FSF? (4) How can
detection performance be measured? (5) How effective are these algorithms in terms of
detecting fraud? This work presents a generic framework to guide this analysis.
The reviews mentioned above have something in common: They try to unveil the
main techniques used for fraud detection, such as machine learning methods (supervised,
unsupervised, and semi-supervised), and try to identify which of these are more effective.
This analysis was carried out in different scenarios, contrasting the results obtained and
specifying the study area in which they are most accurate. We could not find studies linking
fraud detection by means of machine learning techniques and the Fraud Triangle Theory.
Finally, we find it important to comment on some theories for the understanding
of fraud detection. Studies such as [15] analyzed the convergence and divergence of
two classic theories of fraud: the triangle theory and the diamond theory. There, the
concept of fraud and the convergence of the two classical theories were examined. This
work also discussed the differentiation between them. In doing so, the similarities and
differences between these theories were highlighted and appreciated. A discussion of the
two approaches contributes to the understanding of fraud, especially for fraud professionals
and fraud examiners.
1.2. Contribution
This research aims to compile the literature related to fraud detection from two
perspectives. On the one hand, we analyze works that consider human behavior as an
inherent risk factor in this problem, especially by using the FTT and FDT. Beyond exploring
these theories, on the other hand, our review analyzes different works where machine
learning techniques have been used for fraud detection. Moreover, we look for works that
integrate ML techniques with behavior-based theories of fraud, such as the FTT and FDT.
To do this, we used the well-known methodology of Barbara Kitchenham and formu-
lated three research questions. As a result, we provide an up-to-date and comprehensive
analysis of the subject. It will help in identifying, investigating, and evaluating the causes
that lead to fraud and in detecting it. This study can guide further research on the topic in
areas that the investigation has not considered.
The rest of this paper is organized as follows. Section 2 addresses the methodology
used to perform this review. Then, Section 3 summarizes our findings. After that, we
discuss the weaknesses and strengths of the techniques identified in Section 4. Finally,
Section 5 draws conclusions and describes future work.
2. Materials and Methods
A systematic literature review (SLR) was carried out for this research work. According
to [25], the purpose of an SLR is to provide a complete list of all studies related to specific
subject areas. Meanwhile, traditional reviews attempt to summarize the results of several
studies. An SLR uses an evidence-based approach to meticulously search for relevant
studies within a context to answer predefined research questions and select, evaluate,
and critically analyze the findings in order to answer those research questions; this is
done by following the recommendations reported in [30]. Considering the guidelines and
recommendations described by Barbara Kitchenham [31], a systematic literature review
must follow the methodological process illustrated in Figure 1.
2.1. Research Questions
As we stated, this article aims to review and summarize the works related to fraud
detection that is performed by using machine learning techniques or the Fraud Triangle
Theory. We do not restrict our search to any specific knowledge. The SLR research questions
(RQs) that we intend to answer in this paper are the following:
1. RQ1: How can fraud be detected by analyzing human behavior by applying fraud
theories?
2. RQ2: What machine or deep learning techniques are used to detect fraud?
3. RQ3: Using machine learning techniques, how can fraud cases be detected by analyzing
human behavior associated with the Fraud Triangle Theory?
2.2. Keywords
We looked for scientific publications related to fraud detection, its process of identification,
and its application to answering our research questions. We specifically targeted
works focused on fraud that relied on machine learning techniques or the Fraud Triangle
Theory. To this end, we created a base list of keywords that was built from the keywords
found in related research, as shown in Table 1.
[Figure: flowchart with the steps: define main problem and research questions; define research methodology; define search strings; search in scientific databases; download full-text versions; selection criteria; classification and coding; data extraction and analysis; writing and reviewing the paper.]
Figure 1. Methodology applied in the systematic literature review (SLR).
Table 1. Keywords.
No.  Keyword                 Abbreviation
1    fraud                   FR
2    fraud detection         FD
3    fraud triangle theory   FTT
4    fraud diamond theory    FDT
5    human behavior          HB
6    behavior patterns       BP
7    data mining             DM
8    machine learning        ML
9    deep learning           DL
2.3. Search Strategy
We employed the guidelines from [32,33] to define a search strategy in order to retrieve
as many relevant documents as possible. Our search strategy is described below.
2.3.1. Search Method
To find the most relevant publications for the topic addressed in this work, we queried
the following databases: IEEEXplore, ScienceDirect, ACM Digital Library, and Scopus.
We chose these databases because they offer the most essential and high-impact full-text
journals and conference proceedings that cover the ML and FD fields in general. We carried
out the searches in the titles, keywords, and abstracts of articles using the combinations of
terms introduced in the following section.
2.3.2. Search Terms
The search string was designed according to what was mentioned in [34]. Based
on the research questions, we constructed the following relationships: (“Data mining”
OR “Machine learning” OR “Deep Learning”) AND (“Detection Fraud” OR “Internal
Fraud” OR “Fraud Triangle” OR “Diamond Triangle” OR “Human Behavior”). The terms
within each group were combined using “OR” operators, and the two groups were joined
with an “AND” operator to build the search string. The
search terms in the string only matched the title, abstract, and keywords of the digital
databases' articles. It is essential to find the correct search field or combination, be it the
title, abstract, or full text, to apply in the search string and, thus, obtain effective results.
Searching only by the “title” does not always provide the most relevant
publications. Therefore, it can be necessary to include the “abstract” and, in other cases,
“the complete document” of the related publications.
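The construction of the search string can be sketched as follows. This is only a minimal illustration, assuming that the terms inside each parenthesized group are joined with OR and that the two groups are joined with AND, as in the string quoted above. The function name `build_search_string` is hypothetical, and the exact query syntax accepted by each database may differ.

```python
# Term groups quoted in the text: technique terms and fraud-related terms.
TECHNIQUE_TERMS = ["Data mining", "Machine learning", "Deep Learning"]
FRAUD_TERMS = [
    "Detection Fraud",
    "Internal Fraud",
    "Fraud Triangle",
    "Diamond Triangle",
    "Human Behavior",
]


def build_search_string(*groups):
    """Join the terms of each group with OR, then join the groups with AND.

    Hypothetical helper for illustration; not part of any database API.
    """
    clauses = []
    for terms in groups:
        quoted = " OR ".join(f'"{term}"' for term in terms)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)


query = build_search_string(TECHNIQUE_TERMS, FRAUD_TERMS)
print(query)
```

Keeping the term groups as plain lists makes it easy to rerun the same query with an added or removed keyword when the search is refined.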
2.3.3. Selection of Papers
Since the searches in the articles' full text resulted in many irrelevant publications, we
decided to apply the search criteria by incorporating the “abstracts” of the papers. This
means that an article was selected as a potential candidate if its title or abstract contained
the keywords defined in the search string. As a first filter, we evaluated each paper's title
and abstract according to the inclusion and exclusion criteria (see Table 2). We selected the
articles within the scope of the research questions. We thoroughly and entirely read the
previously selected articles (which passed the first filter) as a second filter. Ultimately, the
papers were included or excluded according to the inclusion and exclusion criteria. We
will focus next on explaining the inclusion and exclusion criteria. Additionally, the search was
limited to research written in English and published since 2010 [35].
Table 2. Inclusion/exclusion criteria.
No.  Inclusion Criteria
IC1  Indexed publications not older than 10 years.
IC2  Scope of study: Computer Science.
IC3  Primary studies (journal or articles).
IC4  Papers that discuss aspects regarding fraud detection.
IC5  The investigations considered have information relevant to the research questions.
No.  Exclusion Criteria
EC1  Papers in which the language is different from English cannot be selected.
EC2  Papers that are not available for reading and data collection (papers that are only accessible by paying or are not provided by the search engine) cannot be selected.
EC3  Duplicated papers cannot be selected.
EC4  Publications that do not meet any of the inclusion criteria cannot be selected.
EC5  Publications that do not describe scientific methodology cannot be selected.
2.4. Study Selection
As shown in Figure 2, the selection of studies was performed through the following
processes [36]:
1. Identification: The keywords were selected from the databases listed above according
to the research questions mentioned in the search method section. The search string
was applied only to the title and abstract, as a full-text search would produce many
irrelevant results [37]. The search period went from 2010 to 2021.
2. Filter: All possible primary studies' titles, abstracts, and keywords were checked
against the inclusion and exclusion criteria. If it was difficult to determine whether
an article should be included or not, it was reserved for the next phase.
3. Eligibility: At this stage, a complete reading of the text was carried out to determine
if the article should be included according to the inclusion and exclusion criteria.
4. Data extraction: After the filtering process, data were extracted from the selected
studies to answer RQ1-RQ3.
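The first three stages above can be sketched as a small screening pipeline. This is a hedged illustration rather than the authors' tooling: the `Record` fields and the `select_studies` function are hypothetical, the relevance flags stand in for the reviewers' manual judgments, and duplicates are detected here by normalized title only, which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical fields; real database exports use other column names.
    title: str
    year: int
    language: str
    relevant_abstract: bool   # reviewer judgment on title and abstract
    relevant_full_text: bool  # reviewer judgment after full reading


def select_studies(records):
    """Apply the identification, filter, and eligibility stages in order."""
    # Identification: restrict to the search period and language.
    identified = [r for r in records
                  if 2010 <= r.year <= 2021 and r.language == "English"]
    # Filter: keep records whose title or abstract was judged relevant.
    filtered = [r for r in identified if r.relevant_abstract]
    # Remove duplicates by normalized title, keeping the first occurrence.
    seen, unique = set(), []
    for r in filtered:
        key = r.title.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # Eligibility: keep records judged relevant after full-text reading.
    return [r for r in unique if r.relevant_full_text]
```

Returning the surviving records at the end leaves data extraction (stage 4) as a separate step that operates only on the final set of primary studies.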
[Figure: flowchart of the study selection process: search in databases to identify studies; exclude irrelevant studies based on analysis of their titles and abstracts; exclude studies based on full-text reading; obtain primary studies; writing and reviewing the paper.]
Figure 2. Process of the selection of studies.
2.5. Quality Assessment
Once we selected several primary studies based on the inclusion and exclusion criteria,
we assessed their quality. Following the guidelines in [36], three quality assessment (QA)
questions were defined to measure the research quality of each proposal and to provide a
quantitative comparison between the research works considered:
1. Are the topics covered in the article relevant for fraud detection? Yes: It explicitly
describes the topics related to fraud detection by applying ML techniques through
the FTT. Partially: Only a few are mentioned. No: It neither describes nor mentions
topics related to fraud detection using ML techniques through the FTT.
2. Were the limitations for the study of fraud detection detailed? Yes: It clearly explained
the limitations related to fraud detection by applying ML techniques through the
FTT. Partially: It mentioned the limitations but did not explain why. No: It did not
mention the limitations.
3. Did the study address systematic research? Yes: The study was developed system-
atically and applied an adequate methodology to obtain reliable findings. Partially:
The study was developed systematically and used a proper methodology but did not
provide details. No: The study was not explained in a clear way and the authors did
not apply an adequate methodology.
The scoring procedure was defined as follows: Y (Yes = 1), P (Partially = 0.5),
N (No = 0), or Unknown (i.e., the information was not specified).
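The scoring procedure can be illustrated with a short sketch. The text defines only the per-answer values (Y = 1, P = 0.5, N = 0); summing the three answers into one total per study and returning no score when any answer is Unknown are assumptions made here for illustration, and `quality_score` is a hypothetical name.

```python
# Per-answer values as defined in the scoring procedure.
SCORES = {"Y": 1.0, "P": 0.5, "N": 0.0}


def quality_score(answers):
    """Return the summed score of the three QA answers.

    Assumption: a study's score is the sum of its three answers, and
    any answer outside Y/P/N (e.g., "Unknown") yields None.
    """
    if len(answers) != 3:
        raise ValueError("expected one answer per QA question (three in total)")
    if any(answer not in SCORES for answer in answers):
        return None
    return sum(SCORES[answer] for answer in answers)


print(quality_score(["Y", "P", "N"]))
```

Under these assumptions, a study's total ranges from 0 to 3, which gives a simple basis for comparing the selected works.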
2.6. Data Extraction and Analysis
This section describes the data extraction process performed with the selected papers
and the analysis of the data extracted in order to answer the research questions of this
SLR. We extracted the required data from previously selected works that were accordingly
classified to answer the research questions, as shown in Table 3. The data extraction form
used for all selected primary studies is indicated in order to carry out an in-depth analysis.
Table 3. Data extraction form.
No.  Extracted Data                                Description                                                                  Type
1    Identity of the study                         Unique identity for the study                                                General
2    Bibliographic references                      Authors, year of publication, title, and source of publication               General
3    Type of study                                 Book, journal paper, conference paper, workshop paper                        General
4    The theories employed                         Description of the detection of fraud by applying the FTT and HB             RQ1
5    The techniques considered                     Description of the detection of fraud by applying ML/DM techniques           RQ2
6    Combination of techniques and theories used   Description of the analysis of theories and techniques used to detect fraud  RQ3
7    Findings and contributions                    Indication of the findings and contributions of the study                    General
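The extraction form in Table 3 maps naturally onto a record type with one field per row. The sketch below is an illustration only: the class name, field names, and the example values are hypothetical placeholders, not data from the review.

```python
from dataclasses import asdict, dataclass


@dataclass
class ExtractionRecord:
    study_id: str      # 1. Identity of the study (General)
    reference: str     # 2. Authors, year, title, and source (General)
    study_type: str    # 3. Book, journal, conference, or workshop paper (General)
    theories: str      # 4. Fraud detection by applying the FTT and HB (RQ1)
    techniques: str    # 5. Fraud detection by applying ML/DM techniques (RQ2)
    combination: str   # 6. Theories and techniques used together (RQ3)
    findings: str      # 7. Findings and contributions (General)


# Placeholder values for illustration only.
example = ExtractionRecord(
    study_id="S01",
    reference="(placeholder reference)",
    study_type="journal paper",
    theories="(placeholder)",
    techniques="(placeholder)",
    combination="(placeholder)",
    findings="(placeholder)",
)
print(asdict(example))
```

A fixed record type of this kind keeps the extracted data uniform across studies, which simplifies the later grouping by research question.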
We extracted the most representative papers related to the research questions based
on the search string and associated terms. The results of the analysis of the data obtained
are presented in the next section.
2.7. Synthesis
Many papers could contain keywords that were used in the search string, but they
could be irrelevant to our research questions. Therefore, a careful selection of documents
should include only those containing helpful information with respect to the research
approach and the answers to the different research questions. As shown in Figure 3, we
first searched each data source separately in order to later join the results obtained from the
various sources of information, resulting in a total of 1891 papers. We obtained the most
articles from Scopus, representing around 50% of all documents.
[Figure: chart of studies retrieved per search engine; papers found: 1891.]
Figure 3. Studies retrieved through search engines.
Table 4 shows the number of articles found per source according to the search for
keywords related to the search strings in the selected databases. The second column shows
the results of the initial selection of papers found in each source. The third column shows
the number of articles that were chosen after applying the exclusion criteria. The number
of articles that were selected after eliminating duplicate articles is presented in the fourth
column. Finally, the papers from each source that were selected after completing the
inclusion process are presented.
It was necessary to refine the papers obtained by previously eliminating irrelevant
studies to ensure that the works complied with the established selection criteria. Our
search in the databases, the application of the search string to only the titles and abstracts
of the articles, and the selection of articles that were published during the last eleven years
yielded 1891 records. After using the exclusion criteria on these records, we obtained
254 studies. The analysis of the duplicity of such studies enabled us to find 106 papers that
were relevant for a full-text review. Finally, after a full-text assessment, 32 studies [38-69]
were identified as a result of the analysis through the SLR technique. Therefore, a total of
32 publications met all of the inclusion criteria. The selection of studies from the initial
search identification phase and the final number of included studies are presented in
Figure 4. As initially proposed and to ensure that the resulting reviews contained relevant
information, we read the full text of the 32 studies to verify if they fit our adopted selection
criteria. As a result, all of these publications represented our final set of primary studies.
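The narrowing of the result set can be made concrete with a short calculation over the four counts reported above (1891, 254, 106, and 32). The sketch below is illustrative; the stage labels and the `retention` helper are hypothetical names, and the percentages are derived here rather than reported by the authors.

```python
# Counts reported in the text for each stage of the selection process.
stages = [
    ("records retrieved", 1891),
    ("after exclusion criteria", 254),
    ("after duplicate analysis", 106),
    ("after full-text assessment", 32),
]


def retention(stages):
    """Return (label, count, percent kept from the previous stage)."""
    rows = []
    for (_, previous), (label, count) in zip(stages, stages[1:]):
        rows.append((label, count, 100.0 * count / previous))
    return rows


for label, count, percent in retention(stages):
    print(f"{label}: {count} ({percent:.1f}% of previous stage)")
```

Seen this way, the title and abstract screening is by far the most selective step, while each later stage removes a smaller share of the remaining papers in absolute terms.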
Table 4. Number of papers found through the selection process.
Source   Papers Found   Abstract and Title   Duplicity   Selected
Scopus   960            7                    8           16
IEEE     al             68                   3           7
WoC      360            a                    16          9
ACM      230            48                   u           4
Total    1891           254                  106         32
[Figure: flowchart of the narrowing steps: filter by year, document source, type, and language (2010-2021; conference paper, article, journal; English): included n = 1231, excluded n = 660; exclude irrelevant studies based on abstracts and titles: included n = 254; duplicity: included n = 106; exclude irrelevant studies based on full text: included n = 32.]
Figure 4. Steps followed to narrow the search results.
Regarding the types of publications where the selected papers were available, we
found that 50% of them had been published in conferences and 50% in journals.
Table 5 shows the number of citations of the selected articles. The data presented
(column cited) provide only an approximation of the citation rates and are not intended for
comparisons among studies.
Regarding the period of publication of the selected articles, 32 studies were published
between 2010 and 2021. Furthermore, as shown in Figure 5, 2010, 2015, 2016, and 2017 had
the largest numbers of articles, while 2011, 2012, 2019, and 2020 had the lowest
numbers.
Table 5. Numbers of selected studies by type.
Ref. Cited Ref. Cited Ref. Cited Ref. Cited
Bl 905 LAB) 6 158] B [es] 95H
9] 16 (49) 6 (59] 2 155] 6
[40] 20, [50] 431 [60] 258
ta] 3 1] 9 {o] 5
[2] 55 52] 0 fox] 133
[43] 18 {53] 16 [63] 90.
Vo] 12054] 55 (64] 2
n {65] 7 (46) 2
7 [66] 3 47] 22
209 [67] 4 [69] 6
Figure 5. Number of articles by year of publication.
3. Results
As the result of our methodology, we found 32 documents that were published
between 2010 and 2021 that covered the most representative work on the topic of this paper.
We focused only on peer-reviewed papers from journals and conferences. All of them were
obtained as a result of searching for fraud-related topics in four scientific libraries. Table 6
shows a matrix built using the topics most closely related to the research questions and with
references to the corresponding articles. As can be seen, each column identifies a relevant
topic associated with the research questions. We can see that seven works were found
for RQ1 (Fraud Detection + Human Behavior + Fraud theory). In contrast, for RQ2 (Fraud
Detection + ML/DM techniques), 24 works were found, while for RQ3 (Fraud Detection
+ Human Behavior + ML/DM + Fraud theory), only one study was found. This suggests
that there is room for improving fraud detection, because RQ3 brings together most of the
topics established in the other research questions.
Table 6. Data extraction form.
No.  Ref.  Fraud Detection  Human Behavior  ML/DM Techniques  Fraud Theory
1    [38]  RQ1              RQ1             -                 RQ1
2    [39]  RQ1              RQ1             -                 RQ1
3    [40]  RQ1              RQ1             -                 RQ1
4    [41]  RQ1              RQ1             -                 RQ1
5    [42]  RQ1              RQ1             -                 RQ1
6    [43]  RQ1              RQ1             -                 RQ1
7    [70]  RQ1              RQ1             -                 RQ1
8    [45]  RQ2              -               RQ2               -
9    [46]  RQ2              -               RQ2               -
10   [47]  RQ2              -               RQ2               -
11   [48]  RQ2              -               RQ2               -
12   [49]  RQ2              -               RQ2               -
13   [50]  RQ2              -               RQ2               -
14   [51]  RQ2              -               RQ2               -
15   [52]  RQ2              -               RQ2               -
16   [53]  RQ2              -               RQ2               -
17   [54]  RQ2              -               RQ2               -
18   [55]  RQ2              -               RQ2               -
19   [56]  RQ2              -               RQ2               -
20   [57]  RQ2              -               RQ2               -
21   [58]  RQ2              -               RQ2               -
22   [59]  RQ2              -               RQ2               -
23   [60]  RQ2              -               RQ2               -
24   [61]  RQ2              -               RQ2               -
25   [62]  RQ2              -               RQ2               -
26   [63]  RQ2              -               RQ2               -
27   [64]  RQ2              -               RQ2               -
28   [65]  RQ2              -               RQ2               -
29   [66]  RQ2              -               RQ2               -
30   [67]  RQ2              -               RQ2               -
31   [68]  RQ2              -               RQ2               -
32   [69]  RQ3              RQ3             RQ3               RQ3
Table 7 shows the frequencies of the works found vs. the research question. As can be
seen, RQ2 is the most frequently investigated, accounting for 75% of the selected studies.
Seven papers were found for RQ1, accounting for 21.88%, and only one paper was found
for RQ3, accounting for 3.13%.
Table 7. Data extraction form.
RQ  Study Identifier  Frequency  Percentage
1   [38-43,70]        7          21.88
2   [45-68]           24         75
3   [69]              1          3.13
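The percentages in Table 7 follow directly from the number of works found per research question (seven, 24, and one out of 32). The sketch below reproduces them; it assumes that halves are rounded up at two decimals, which is what yields 3.13 rather than 3.12 for 1/32, and the `percentage` helper is a hypothetical name.

```python
from decimal import ROUND_HALF_UP, Decimal

# Number of selected studies per research question, as reported in the text.
counts = {"RQ1": 7, "RQ2": 24, "RQ3": 1}
total = sum(counts.values())  # 32 selected studies


def percentage(count, total):
    """Share of the total as a percentage with two decimals (halves round up)."""
    value = Decimal(count) * 100 / Decimal(total)
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


shares = {rq: percentage(n, total) for rq, n in counts.items()}
for rq, share in shares.items():
    print(rq, share)
```

`Decimal` is used instead of floating-point `round` because Python's built-in rounding resolves exact ties such as 3.125 to the nearest even digit.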
3.1. RQ1: How Can Fraud Be Detected by Analyzing Human Behavior by Applying Fraud Theories?
This section details the results obtained from the analysis of research papers that relate
fraud detection with the point of view of human behavior by applying the Fraud Triangle
Theory. The investigation is intended to answer RQ1. We answer this question through a
statistical analysis of the number of documents linked to the research question. According
to Table 6, seven works were found. Hoyer et al. [38] proposed a prototype in a generic
architectural model that considers the factors of the fraud triangle. In this way, in addition
to the analysis applied as part of a traditional fraud audit, human behavior is considered.
By doing this, the transactions examined by an auditor can be better differentiated and
prioritized. Behavioral patterns are found through the incorporation of the human factor.
These patterns appear in multiple sources of information, especially in users’ data, such as
in e-mails, messages, network traffic, and system records from which evidence of fraud
can be extracted.
Sanchez et al. [39] presented a framework that allows the identification of people
who commit fraud and is supported by the Fraud Triangle Theory. This proposal is based
on the use of a continuous audit that is installed on user devices, collects information
from agents, and employs the collection of phrases. They are subsequently analyzed to
identify fraud patterns through the analysis of human behavior and the treatment of the
results. In [40], based on primary data on the behavior of perpetrators who commit fraud,
the authors showed the complementarity between an ex-post analysis and the existing
literature on this topic. They suggested that the presence or absence of fraudulent intent
can be assessed by scrutinizing human behavior. Mackevicius and Giriunas [41] analyzed
the Fraud Triangle Theory and presented its associated elements: “motives, possibilities,
pressure, rationalization, incentive, and others”. They offered a theoretical analysis of
the fraud scales and their elements: motives, conditions, possibilities, and performance.
To this end, the authors analyzed 265 respondents (including accountants, stakeholders,
public officials, and inspectors in Central Java, Indonesia) by using structural equation
modeling (SEM) with the AMOS analysis tools. In [42], the authors assessed the Fraud
‘Triangle Theory and human behavior in order to study the factors of opportunity, financial
processes, and rationalization. The authors emphasized the importance of psychological
and moral aspects. The International Standard on Auditing (ISA) 240 focuses on the auditor’s
responsibility to assess fraud in an audit of financial statements. The authors of [43]
explored if the standard has been used effectively in Indonesia based on the proposed
fraud indicators through a fraud analysis. A questionnaire survey was conducted with
three groups of auditors: external, internal, and government auditors. This study examined
auditors’ perceptions of the importance and existence of warning signs of financial fraud
by using the fraud diamond. The findings indicate that the auditors were able to identify
these red flags by giving them high scores. On the contrary, regarding the “level of use”,
the scores were low.
Mekonnen et al. [70] presented an insider threat prevention and prediction model
based on the fraud diamond by combining various approaches, techniques, and IT tools, as
well as criminology and psychology. The deployment of this model involved the collection
of information about possible intentions by using privileged information within a context of
preserving privacy, thus enabling high-risk insider threats to be identified while balancing
privacy concerns.
3.2. RQ2: What Machine or Deep Learning Techniques Are Used to Detect Fraud?
This section reports the results of works that described the implementation of machine
learning and data analysis for fraud detection. We aimed to identify the most commonly
used machine or deep learning techniques in this realm. Table 7 shows that this research
question had the highest number of related works. Table 8 presents the main focus of the
articles and the ML/DL techniques used, as well as the dataset information. All of these
articles are summarized below.
There are works that enhance traditional security approaches. In [60], the need to use
Process-Aware Information Systems (PAIS) software in organizations and the importance of
fraud detection were investigated. The authors claimed that this tool is a must for organizations,
as its flexibility improves fraud detection. The authors of [63] sought to design an artifact
(hardware) for detecting communications from disgruntled employees through automated
text mining techniques. The artifact that they developed extended the layered approach in
order to combat internal security risks. They claimed that this phenomenon can be detected
in e-mail repositories by using employee dissatisfaction as the primary indicator of fraud
risk. Considering the methods of fraud detection based on simple comparisons, detection of
associations, clustering, prediction, and outliers, an automated fraud-detection framework
was proposed in [47]. The framework allowed fraud identification by using intelligent
agents, data fusion techniques, and various data mining techniques. In [67], the authors
proposed the detection of bank fraud through data extraction techniques, association,
grouping, forecasting, and classification to analyze customer data to identify patterns
leading to fraud. To conclude this group of papers, West et al. suggested that a higher level
of verification/authentication can be added to banking processes by identifying patterns.
To do this, the authors reviewed key performance metrics used to detect financial fraud,
with a focus on credit card fraud. They compared the effectiveness of these metrics to
detect if fraud was carried out. In addition, the performance of the application of various
computational intelligence techniques to this problem’s domain was also investigated, and
the efficacy of different binary classification methods was explored.
Table 8. Summary of works that used machine or deep learning techniques to detect fraud.
[Table body: for each of the works [45-68], the ML/DL techniques used, the dataset, and the main focus of the article.]
Neural Networks: NN; Decision Trees: DT; Bayesian Networks: BN; Random Forest: RF; K-means: KM; Support Vector Machine:
SVM; Artificial Neural Network: ANN; Multinomial Logistic Regression: MLR; Multilayer Direct Feed Neural Network: MLFF; Genetic
Programming: GP; Group Method of Data Management: GMDH; Logistic Regression: LR; Probabilistic NN: PNN; Binomial Logistic
Regression: BLR; Latent Dirichlet Allocation: LDA; K Nearest Neighbor: KNN; Deep Reinforcement Learning: DRL; Multivariate Latent
Class Clustering: MLCC; Convolutional Neural Network: CNN; Stacked Long Short-Term Memory: SLSTM; Naive Bayes: NB.
In [45], the authors summarized and compared different datasets and algorithms for
automated accounting fraud detection. The selected works addressed mining algorithms
that included statistical tests, regression analysis, NN, DT, BN, stack variables, etc.
Regression analysis was widely used to hide data. Generally, the effect of detection and
the precision of NN were higher than those of regression models. The overall conclusion
was that pattern detection is better than detection by an unaided auditor. Due to the
small size of the fraud samples, some publications reached decisions based on training
samples and may have overestimated the effects of the models. In [46], S. Wang presented
a hybrid detection model using machine learning and text mining methods for detecting
financial fraud. This model used financial and non-financial data and employed two ways
of selecting easy-to-explain characteristics. During the investigation, the author chose 120
fraudulent financial statements disclosed by the China Securities Regulatory Commission
(CSRC) between 2007 and 2016. He compared the performance of five machine learning
methods and found that the Random Forest method had the following advantages: (1) it
is suitable for processing high-dimensional data; (2) it avoids overfitting to some extent;
(3) it is robust and stable. Ravisankar et al. proposed the use of data mining techniques to
identify companies that resort to financial statement fraud [54]. Specifically, the authors
tested the MLFF, SVM, GP, GMDH, LR, and PNN techniques. The evaluation considered
the role of feature selection and relied on a dataset involving 202 Chinese companies. Their
results indicated that the PNN outperformed all of the methods without feature selection,
and the GP and PNN outperformed others with feature selection and with marginally equal
accuracies.
For other works that compared different ML methods, we found the following. In [53],
the authors developed three multiple-class classifiers (MLR, SVM, and BN) to detect and
classify misstatements according to the presence of fraud intent. Using the MetaCost tool,
the authors conducted cost-sensitive learning and solved class imbalance and asymmetric
misclassification costs. In [58], the use of data mining methods to detect fraud in electronic
ledgers through financial statements was explored. The Linear Regression, ANN, KNN,
SVM, Decision Stump, M5P Tree, J48 Tree, RF, and Decision Table techniques were used
for training. The authors of [61] detected credit card fraud by using supervised learning
algorithms, such as a DT and NB.
Focusing on the use or comparison of ANNs with other methods, Vimal Kumar et al. [49]
analyzed the challenges of detecting and preventing fraud in the banking industry when
having insider information. The authors reviewed some of the data analysis techniques for
detecting insider trading scams. Their work lists the best data mining techniques available
(NN, DT, and Bayesian Belief Networks), which have been proposed by many researchers
and employed in different industries. They concluded that the banking industry's primary
requirements are fraud detection and prevention and that data mining techniques can
help reduce fraud cases. In addition, the work in [50] proposed the use of NN to correlate
information from a variety of technological sources and databases in order to identify
suspicious account activity. The work in [52] applied data mining algorithms, such as a
SVM and ANNs, to detect financial fraud. The authors stated that the essential indicators of
financial fraud are profitability and efficiency. The incorporation of these factors improved
the accuracy of the SVM algorithm to 88.37%. The ANNs produced the highest precision,
90.97%, for data without feature selection. In [56], Mohanty et al. aimed to identify a
person of interest from the corpus of Enron email data released for research. They tried to
detect fraudulent activities by means of an ANN that used ReLU activation functions and was
trained with the Adam optimizer. Their work achieved high scores in terms of recall, accuracy, and
F1 score.
Regarding unsupervised approaches, a proposal to detect outliers using a modified
K-Means Clustering algorithm was presented in [48]. For this work, the detected outliers
were removed from the dataset to improve the grouping precision. They also validated
their approach against existing techniques and benchmark performance. The authors
of [51] presented a study on the use of K-Means Clustering and the AdaBoost Classifier,
comparing their accuracies and performances with an analysis of the past and present
models used for fraud detection.
Regarding the use of more sophisticated techniques for the problem of fraud detection
in financial reporting, the authors of [55] applied various natural language processing tech-
niques and supervised machine learning, including BLR, SVM, NN, ensemble techniques,
and LDA. They applied Latent Dirichlet Allocation (LDA) to a collection of 10-K financial
reports of documents available in the EDGAR database of the United States Securities and
Exchange Commission to generate a frequency matrix of documents and topics. In addition,
they applied evaluation metrics, such as the accuracy, receiver operating characteristic
curve, and area under the curve, to evaluate the performance of each algorithm. For the
resolution of problems in FFD, Li and Wong [57] proposed a new method based on
GBGP through multi-objective optimization and ensemble learning. They compared the proposed
method with LR, NN, SVM, BN, DT, AdaBoost, bagging, and LogitBoost on four FFD
datasets. The results showed the efficacy of the new approach on the given FFD problems,
including two real-life situations. The authors of [59] applied the theory of DRL through
two applications in banking and discussed its implementation for fraud detection. Using
a DT with a combination of the Luhn algorithm and the Hunt algorithm, Save et al. [62]
proposed a system that detects fraud in the processing of credit card transactions. The
validation of the card number is done through the Luhn algorithm. The authors of [64]
focused on the detection of external fraud. The use of a data mining approach in order
to reduce the risk of internal fraud was also discussed. Consequently, a descriptive data
mining strategy was applied instead of the widely used prediction data mining techniques.
The authors employed a multivariate latent class clustering algorithm for a case firm’s
procurement data. Their results suggested that their technique helps to assess the current
risk of internal fraud.
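The Luhn check mentioned above for the card-number validation step in [62] is a standard checksum and can be illustrated briefly. The sketch below is a generic implementation and not the code of the reviewed system; the function name is illustrative, and the sample numbers are well-known test values rather than real card numbers.

```python
def luhn_is_valid(card_number: str) -> bool:
    """Return True if the digits in `card_number` pass the Luhn checksum."""
    # Keep only the digits so that spaces or dashes in the input are ignored.
    digits = [int(ch) for ch in card_number if ch.isdigit()]
    if len(digits) < 2:
        return False

    checksum = 0
    # Walk from the rightmost digit; double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        checksum += digit
    return checksum % 10 == 0


print(luhn_is_valid("4539 1488 0343 6467"))  # well-formed test number
print(luhn_is_valid("4539 1488 0343 6468"))  # last digit altered
```

The checksum only detects malformed or mistyped numbers. It says nothing about whether a transaction is fraudulent, which is why [62] combines it with a decision tree.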
Exploring a deep learning model to learn short- and long-term patterns from an
unbalanced input dataset was an objective set by [65]. The data obtained were transactions
of an Indonesian bank in 2016-2017 with binary labels (no fraud or fraud). They also
explored the effects of sample ratios of non-fraud to fraud from 1 to 4 and three models: a
convolutional neural network (CNN), stacked long short-term memory (SLSTM),
and a CNN-LSTM hybrid. Using the area under the ROC curve (AUC) as the model
performance metric, the CNN achieved the highest AUC for R = 1, 2,3, 4, followed by the
SLSTM and CNN-LSTM. The authors of [66] proposed the implementation of both the
document clustering algorithm and a set of classification algorithms (DT, RF, and NB), along
with industry-appropriate use cases. In addition, the performance of the three classification
algorithms was compared by calculating the “Confusion Matrix”, which, in turn, was used
to calculate performance measures such as “accuracy”, “precision”, and “recall”.
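The performance measures named above can all be derived from the four cells of a binary confusion matrix. The following sketch shows the standard formulas; the function name and the counts are hypothetical and are not taken from [66].

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Derive accuracy, precision, recall, and F1 from confusion-matrix counts.

    tp: fraud cases flagged as fraud      fp: legitimate cases flagged as fraud
    fn: fraud cases missed                tn: legitimate cases left unflagged
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for 1000 transactions, 90 of which are fraudulent.
metrics = classification_metrics(tp=80, fp=20, fn=10, tn=890)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In fraud detection, recall is usually read alongside precision, because a model that rarely flags anything can still reach a high accuracy when fraud cases are rare.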
3.3. RQ3: Using Machine Learning Techniques, How Can Fraud Cases Be Detected by Analyzing
Human Behavior Associated with the Fraud Triangle Theory?
We found only one work related to this research question. This means that we obtained
few results when we tried keywords related to the topics most relevant to the research
questions (Fraud Detection + Human Behavior + Machine Learning Techniques + Fraud
Triangle Theory). Therefore, the combination of ML techniques and theories related to fraud
needs further investigation because it would integrate two knowledge fields (psychology
and data science) in order to improve fraud detection. In [69], the authors examined the
aspects of the fraud triangle using data mining techniques in order to evaluate attributes
such as pressure/incentive, opportunity, and attitude/rationalization, and, through the use
of expert questionnaires, they discussed whether their suggestion agreed with the results
obtained with the adoption of those techniques. The data extraction methods used in this
research included logistic regression, decision trees (CART), and artificial neural networks
(ANNs). They also compared data mining techniques and expert judgments. The ANNs
and CART achieved correct classification rates of 91.2% (ANN) and 90.4% (CART) on the
training samples and of 92.8% (ANN) and 90.3% (CART) on the test samples, which were
more precise than those of the logistic models, which only reached 83.7% and 88.5% of correct
classification in the assessment of the presence of fraud.
3.4. Quality Assessment
Once the QA questions were defined, we evaluated the primary studies identified in
the SLR. The score assigned to each study for each question is shown in Table 9.
Table 9. Quality assessment.
Ref          QA-1   QA-2   QA-3   Total Score  Max S (%)
[38]         P      P      Y      2            66.67
[39]         P      P      Y      2            66.67
[40]         N      N      N      0            0
[41]         P      Y      Y      2            66.67
[42]         N      N      N      0            0
[43]         N      N      N      0            0
[70]         P      P      Y      2            66.67
[45]         P      Y      Y      2.5          83.33
[46]         P      Y      Y      2.5          83.33
[47]         N      N      N      0            0
[48]         P      P      Y      2            66.67
[49]         P      Y      Y      2.5          83.33
[50]         P      P      Y      2            66.67
[51]         P      P      Y      2            66.67
[52]         P      P      Y      2            66.67
[53]         P      P      Y      2            66.67
[54]         N      N      N      0            0
[55]         P      P      Y      2            66.67
[56]         P      Y      Y      2.5          83.33
[57]         P      Y      Y      2.5          83.33
[58]         N      N      N      0            0
[59]         P      P      Y      2            66.67
[60]         P      Y      Y      2.5          83.33
[61]         N      N      N      0            0
[62]         N      N      N      0            0
[63]         P      Y      Y      2.5          83.33
[64]         N      N      N      0            0
[65]         P      P      Y      2            66.67
[66]         N      N      N      0            0
[67]         P      Y      Y      2.5          83.33
[68]         P      Y      Y      2.5          83.33
[69]         P      Y      Y      2.5          83.33
Total        10.5   16.5   22     49
Max QA       21.42  33.68  44.9   100
Total Score  47.62  73.81  100
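The per-study values in Table 9 are consistent with a scheme in which an answer of Y counts as 1 point, P as 0.5, and N as 0, so that a study can reach at most 3 points over the three QA questions. This mapping is inferred from the table values and is not stated in this section, so the sketch below should be read as an assumption; the names are illustrative.

```python
# Assumed point values for each answer (inferred from the totals in Table 9).
POINTS = {"Y": 1.0, "P": 0.5, "N": 0.0}
MAX_POINTS = 3.0  # three QA questions, at most one point each


def quality_score(answers):
    """Return (total score, percentage of the maximum) for one study."""
    total = sum(POINTS[answer] for answer in answers)
    return total, round(100 * total / MAX_POINTS, 2)


print(quality_score(["P", "P", "Y"]))  # pattern of the row for [38]
print(quality_score(["P", "Y", "Y"]))  # pattern of the row for [45]
print(quality_score(["N", "N", "N"]))  # pattern of the row for [40]
```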
The total of the accumulated scores from the QA questions can be observed in the
“Total Score” row, showing that QA3 has 22 points, corresponding to 44.9%, demonstrating
that this question was more representative in the review. QA2 followed this with 33.68%,
and QA1 followed with 21.42%. On the other hand, the last row identifies the percentage of
points collected by the values assigned for a given QA question with respect to the points
obtained if each selected study received the highest score. Refs. [45,46,49,56,57,60,63,67,69]
obtained the highest score of 2.5, which represents 83.33% of the maximum score that a
preliminary study could obtain; on the other hand, Refs. [38,39,41,44,48,50-53,55,59,65]
obtained a score of 2, which represents 66.67% of the maximum score. Refs. [40,42,43,47,54,58,
61,62,64,66] failed to get any scores, which means that their title and abstract showed that
they could answer the research question for this SLR, but after reviewing the full articles,
no features related to fraud detection using machine learning techniques were discussed.
4. Discussion
In this work, we have reviewed contributions related to fraud detection, with a special
emphasis on those addressing fraud detection from the perspective of the modeling of
human behavior.
Applying techniques related to the analysis of human behavior allowed us to consider
behavioral factors that could empower the detection of unusual transactions that would
not have been considered if using traditional auditing methods. By observing people’s
behavior, it can be seen that the human factor is closely related to the Fraud Triangle Theory.
On the other hand, the use of machine learning techniques to detect fraud was also
implemented in several works to predict behaviors related to this phenomenon. As a
result of our research, a significant number of articles (24) addressed this approach. In
this context, we found that mainly supervised and unsupervised algorithms are used for
fraud-detection analysis. The supervised strategy enables the blocking of fraud attempts
based on fraudulent and non-fraudulent samples. This is used in rule-based detection,
which automatically infers discriminatory rules from a labeled training set. In addition,
regarding fraud detection, our research unveiled that supervised algorithms regularly have
to deal with unbalanced classes, which might result in poor detection. Furthermore, these
techniques are unable to identify new fraud patterns. Unsupervised learning, however,
concentrates on the discovery of suspicious behavior as a proxy of fraud detection and,
thus, does not require prior knowledge about verified fraudulent cases.
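One common way of dealing with the unbalanced classes mentioned above is to randomly undersample the non-fraud class to a chosen non-fraud-to-fraud ratio, in the spirit of the ratios 1 to 4 explored in [65]. The sketch below is a generic illustration under that assumption and does not reproduce the procedure of any reviewed work; the function name, the label encoding (1 for fraud, 0 for non-fraud), and the toy data are hypothetical.

```python
import random


def undersample(labels, ratio, seed=0):
    """Return sorted indices that keep every fraud sample (label 1) and at most
    `ratio` times as many randomly chosen non-fraud samples (label 0)."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    fraud = [i for i, y in enumerate(labels) if y == 1]
    non_fraud = [i for i, y in enumerate(labels) if y == 0]
    keep = min(len(non_fraud), ratio * len(fraud))
    return sorted(fraud + rng.sample(non_fraud, keep))


# Toy dataset: 5 fraudulent and 95 legitimate transactions.
labels = [1] * 5 + [0] * 95
for ratio in (1, 2, 3, 4):
    kept = undersample(labels, ratio)
    n_fraud = sum(labels[i] for i in kept)
    print(f"ratio {ratio}: {n_fraud} fraud, {len(kept) - n_fraud} non-fraud")
```

Undersampling discards legitimate transactions, so it trades information about normal behavior for a more balanced training set. Whether that trade is worthwhile depends on the dataset.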
Our review focuses on fraud detection performed by means of machine learning
techniques or through analysis of human behavior based on the Fraud Triangle Theory. By
answering three research questions, we tried to unveil how both approaches are addressed
in the literature and how they may be jointly applied.
By answering RQ1, keywords such as human behavior and theories related to fraud
were linked, resulting in several related studies. The answer to RQ2 linked machine
learning techniques with fraud detection; this question was the one that generated the most
results. The analyzed questions each produced results in a specific field, but when trying
to combine these fields by answering RQ3, we found only one work linking fraud detection
by means of machine learning techniques with any theory related to fraud.
Despite the existence of works about detecting fraud in the areas of data mining and
fraud theories, no literature reviews that jointly covered these two areas were identified.
Table 10 presents a comparative summary of seven relevant SLRs and surveys performed
in the area of fraud detection, including our contribution.
Table 10. Comparison of related systematic literature reviews.
[Table body: for each SLR, the work, year, context, search period, data sources, candidate and selected primary studies, and quality assessment of primary studies.]
Data sources: 1: IEEE Xplore; 2: ACM DL; 3: Engineering Village (Compendex); 4: ISI Web of Science; 5: ScienceDirect; 6: Wiley InterScience Journals;
7: Google Scholar; 8: Citeseer; 9: SpringerLink; 10: Scopus; 11: Business Source Premier (EBSCO); 12: Emerald Full Text; 13: World Scientific
Net; 14: ProQuest.
In the “Context” column of Table 10, there are four SLRs that are exclusively related to
some aspect of data mining [25,26,28,29], while only one is related to some aspect of fraud
theory [75], in addition to other approaches [73,75]. The last row of Table 10 also presents
information about the SLR covered in this document, the context of which explores both
data mining and fraud theories together, unlike the other seven presented in this table.
These SLRs were published between 2007 and 2020, with the novelty that some of them
[26,29,73] do not mention the related search period. The research periods of [25,28,74] range
from 10 to 11 years, but include primary studies without making cuts in any specific year.
Some works do not specify the sources of data, and those doing so report a variable number
of data sources. Studies that mention data sources do not clearly explain their reasons
for selecting them. On the other hand, for our research, four data sources were chosen to
maximize the probability of identifying relevant candidate works as primary studies.
Both the number of candidate articles from the data sources and the number of selected
primary studies are presented in this table for each SLR. The differences in these numbers
may be related to the context of each investigation, e.g., data sources used, keywords, etc.
For our SLR, the number of reviewed works resulted from the searches in the different data
sources used in combination with the chosen keywords, while the final number of primary
studies was similar to those of other works. It should be noted that there are works that do
not mention this metric.
Although quality evaluation is not a mandatory parameter in the structure of an SLR,
according to [76], it is an essential contribution in this type of work in order to improve its
quality. None of the analyzed works clearly showed how an evaluation was carried out in
this regard. No criteria were mentioned for assessing the quality of the primary studies.
Our work was based on the evaluation criteria proposed by [77].
5. Conclusions and Future Work
Fraud detection is complex, as it requires the interpretation of human behavior, but
this is not the only issue. The lack of data available for training or testing detection
models significantly complicates the assessment of detection strategies. Even when data
are available, unbalanced datasets are the norm in this domain.
Accordingly, there are very different approaches that tackle the problem of fraud
detection, as well as systematic literature reviews that are intended to address these
limitations from a more global perspective. Thus, the purpose of this research was to
identify publications related to fraud detection through the use of ML techniques based on
the Fraud Triangle Theory. The proposed reference frameworks focus on developing tools
that allow auditors to perform fraud analyses more efficiently by shortening their detection
time through support from data mining techniques. Most of the works concentrate on
carrying out their analyses after fraud has been carried out in an attempt to shorten the
time taken to find results; thus, these proposals are reactive to such events.
Through this research, it was found that there are a significant number of research
projects that are being carried out in this specific area of fraud detection; in general,
they have a solid level of maturity. The large number of publications in conferences and
journals (representing 50% and 50% of primary studies, respectively) is substantial proof.
In addition, the results of the quality evaluation carried out for the primary studies showed
that the evaluation of their proposals was satisfactory in terms of the criteria of “relevance”,
“limitations”, and “methodology”. When we assumed an approach to fraud detection
through data mining techniques and the use of fraud theories associated with human
behavior, this SLR revealed very little evidence from studies supporting this approach, since
only one primary study was found, corresponding to 3.13% of the studies. When we
allowed partial coverage, that is, fraud detection by applying only data mining techniques,
24 primary studies (corresponding to 75%) could be classified. On the other hand, when
we analyzed the approach to the analysis and detection of fraud in which only theories
related to fraud that were associated with human behavior were considered, seven primary
studies (corresponding to 21.88%) were found to support this approach.
In this sense, only one study with evidence of the use of data mining techniques, the
application of fraud theories, and a corresponding analysis of human behavior to detect
fraud was identified, which means that there is a gap, and this is an appropriate field
to investigate.
As future work, we propose a review that focuses on detecting fraud and incorporates
an analysis of the availability of data and the lack of access to this resource, including
other data sources as possible alternatives.
Author Contributions: Conceptualization, M.S.-A. and L.U.-A.; methodology, M.S.-A. and J.E.-J.;
validation, M.S.-A., L.U.-A. and J.E.-J.; investigation, M.S.-A.; writing (original draft preparation),
M.S.-A.; writing (review and editing), L.U.-A. and J.E.-J.; supervision, L.U.-A. All authors have read
and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The data presented in this study are available on request from the
corresponding author. The data are not publicly available due to privacy limitations concerning the
use of personal information.
Acknowledgments: This work was partially supported by Escuela Politécnica Nacional under the
research project PILDETRI-2021-02 “Detección de fraude mediante análisis de tópicos y métodos de
clasificación”. Marco Sánchez is a recipient of a teaching assistant fellowship from Escuela Politécnica
Nacional for doctoral studies in computer science.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Shaikh, A.K.; Nazir, A. A novel dynamic approach to identifying suspicious customers in money transactions. Int. J. Bus. Intell.
Data Min. 2020, 17, 143-158.
2. Panigrahi, P.K. A framework for discovering internal financial fraud using analytics. In Proceedings of the 2011 International
Conference on Communication Systems and Network Technologies, Katra, India, 3-5 June 2011; pp. 323-327.
3. Silowash, G.; Cappelli, D.; Moore, A.; Trzeciak, R.; Shimeall, T.; Flynn, L. Common Sense Guide to Prevention and Detection of Insider
Threats, th ed.; Carnegie Mellon University CyLab: Pittsburgh, PA, USA, 2012.
4. Kassem, R. Detecting asset misappropriation: A framework for external auditor. Int. J. Account. Audit. Perform. Eval. 2014,
10, 1-42. [CrossRef]
5. Sayal, K.; Singh, G. What Role Does Human Behaviour Play in Corporate Frauds? Econ. Political Wkly. 2020, 5. Available online:
https://www.epw.in/engage/article/what-role-does-human-behaviour-play-corporate (accessed on 1 September 202).
6. Gabrielli, G.; Medioli, A. An overview of instruments and tools to detect fraudulent financial statements. Univers. J. Account. Financ.
2018, 7, 76-82. [CrossRef]
7. Dimitrijević, D.; Kalinić, Z. Software Tools Usage in Fraud Detection and Prevention in Governmental and External Audit
Organizations in the Republic of Serbia. In Knowledge-Economy-Society; Cracow University of Economics: Cracow, Poland, 2017;
7
8, Vynokurove, 0; Peleshko, D; Bondarenko, Ilyasoy V; Serchanoy, ; Plesk, M. Hybrid Machine Leaming System fr
Solving Fraud Detection Tasks. In Proceedings of the 2020 IEEE Third Intemational Conference on Data Stream Mining &
Processing (DSMP), Lviv, Ukraine, I-25 August 2020; pp. 1-5. [CrossRef]
3 Lebichot,B; Paldino, GM; Bontempi, G, Siblini, W. Tle, 1. Oble, F. Incremental learning strategies for credit cards fraud
detection: Extended abstract. In Proceedings of the 2020 IEEE 7th International Conference on Data Science and Advanced
Analytics (DSAA), Sydney, Australia, 6-9 October 2020; pp. 785-786. [Crossitf]
10, aia, RA Discrete Wavelet Transform Approach to Fraud Detection. In Proceedings of the International Conference on Network
and System Security, Helsinks Finland, 21-23 August 2017
11. Vynokurova, O.; Peleshko, D.; Zhernova, P.; Perova, I.; Kovalenko, A. Solving Fraud Detection Tasks Based on Wavelet-Neuro
Autoencoder. In Proceedings of the International Scientific Conference “Intellectual Systems of Decision Making and Problem of
Computational Intelligence”, Zalizniy Port, Ukraine, 25-29 May 2021; pp. 535-546. [CrossRef]
12. Omair, B.; Alturki, A. Taxonomy of Fraud Detection Metrics for Business Processes. IEEE Access 2020, 8, 71364-71377. [CrossRef]
13. Omair, B.; Alturki, A. Multi-Dimensional Fraud Detection Metrics in Business Processes and their Application. Int. J. Adv.
Comput. Sci. Appl. 2020, 11, 570. [CrossRef]
14. Ruankaew, T. The Fraud Factors. Int. J. Manag. Adm. Sci. (IJMAS) 2013, 2, 1-5.
15. Mansor, N.; Abdullahi, R. Fraud triangle theory and fraud diamond theory. Understanding the convergent and divergent for
future research. Int. J. Acad. Res. Account. Financ. Manag. Sci. 2015, 1, 38-45.
16. Burke, D.D.; Sanney, K.J. Applying the fraud triangle to higher education: Ethical implications. J. Legal Stud. Educ. 2018, 35.
[CrossRef]
17. Awang, N.; Hussin, N.S.; Razali, F.A.; Lyana, S.; Talib, A. Fraud Triangle Theory: Calling for New Factors. Editor. Board 2020, 7,
54-64.
18. Wolfe, D.T.; Hermanson, D.R. The fraud diamond: Considering the four elements of fraud. CPA J. 2004, 74, 38.
19. Ruankaew, T. Beyond the fraud diamond. Int. J. Bus. Manag. Econ. Res. (IJBMER) 2016, 7, 474-476.
20. Christian, N.; Basri, Y.; Arafah, W. Analysis of fraud triangle, fraud diamond and fraud pentagon theory to detecting corporate
fraud in Indonesia. Int. J. Bus. Manag. Technol. 2019, 3, 73-78.
21. Manolopoulos, Y.; Spathis, C.; Kirkos, E. Data Mining techniques for the detection of fraudulent financial statements. Expert Syst.
Appl. 2007, 32, 995-1003.
22. Meenatkshi, R.; Sivaranjani, K. Fraud detection in financial statement using data mining technique and performance analysis.
IJCTA 2016, 9, 407-413.
23. Al-Hashedi, K.G.; Magalingam, P. Financial fraud detection applying data mining techniques: A comprehensive review from
2009 to 2019. Comput. Sci. Rev. 2021, 40, 100402. [CrossRef]
24. Deng, W.; Huang, Z.; Zhang, J.; Xu, J. A Data Mining Based System For Transaction Fraud Detection. In Proceedings of the
2021 IEEE International Conference on Consumer Electronics and Computer Engineering (ICCECE), Guangzhou, China, 15-17
January 2021; pp. 542-548.
25. Phua, C.; Lee, V.; Smith, K.; Gayler, R. A comprehensive survey of data mining-based fraud detection research. arXiv 2010,
arXiv:1009.6119.
26. Zhou, X.; Cheng, S.; Zhu, M.; Guo, C.; Zhou, S.; Xu, P.; Xue, Z.; Zhang, W. A state of the art survey of data mining-based fraud
detection and credit scoring. In MATEC Web of Conferences; EDP Sciences: Les Ulis, France, 2018; Volume 189, p. 13002.
27. Gupta, S.; Mehta, S.K. Data Mining-based Financial Statement Fraud Detection: Systematic Literature Review and Meta-analysis
to Estimate Data Sample Mapping of Fraudulent Companies Against Non-fraudulent Companies. Glob. Bus. Rev. 2021. [CrossRef]
28. Ngai, E.W.; Hu, Y.; Wong, Y.H.; Chen, Y.; Sun, X. The application of data mining techniques in financial fraud detection: A
classification framework and an academic review of literature. Decis. Support Syst. 2011, 50, 559-569. [CrossRef]
29. Yue, D.; Wu, X.; Wang, Y.; Li, Y.; Chu, C.H. A review of data mining-based financial fraud detection research. In Proceedings of
the 2007 International Conference on Wireless Communications, Networking and Mobile Computing, Shanghai, China, 21-25
September 2007; pp. 5519-5522.
30. Sasirekha, M.; Thaseen, I.S.; Banu, J.S. An Integrated Intrusion Detection System for Credit Card Fraud Detection. In Advances in
Computing and Information Technology; Springer: Berlin/Heidelberg, Germany, 2012; pp. 55-60.
31. Dyba, T.; Kitchenham, B.A.; Jorgensen, M. Evidence-based software engineering for practitioners. IEEE Softw. 2008, 22, 58-65.
[CrossRef]
32. Staples, M.; Niazi, M. Experiences using systematic review guidelines. J. Syst. Softw. 2007, 80, 1425-1437. [CrossRef]
33. Kitchenham, B.; Charters, S. Guidelines for Performing Systematic Literature Reviews in Software Engineering; EBSE 2007-001, Keele
University and Durham University Joint Report; Kitchenham: Newcastle, UK, 2007. Available online:
https://citeseer.ist.psu.edu/viewdoc/download?doi=10.1.1.117.471&rep=rep1&type=pdf (accessed on 1 September 2021).
34. Cronin, P.; Ryan, F.; Coughlan, M. Undertaking a literature review: A step-by-step approach. Br. J. Nurs. 2008, 17, 38-43.
[CrossRef]
35. Zhang, H.; Babar, M.A.; Tell, P. Identifying relevant studies in software engineering. Inf. Softw. Technol. 2011, 53, 625-637.
[CrossRef]
36. Rouhani, B.D.; Mahrin, M.N.; Nikpay, F.; Ahmad, R.B.; Nikfard, P. A systematic literature review on Enterprise Architecture
Implementation Methodologies. Inf. Softw. Technol. 2015, 62, 1-20. [CrossRef]
37. Li, Y.; Peng, R.; Wang, B. Challenges in Context-Aware Requirements Modeling: A Systematic Literature Review. In Proceedings
of the Asia Pacific Requirements Engineering Conference, Melaka, Malaysia, 9-10 November 2017; pp. 140-155.
38. Hoyer, S.; Zakhariya, H.; Sandner, T.; Breitner, M.H. Fraud prediction and the human factor: An approach to include human
behavior in an automated fraud audit. In Proceedings of the 2012 45th Hawaii International Conference on System Sciences,
Maui, HI, USA, 4-7 January 2012; pp. 2362-2391.
39. Sánchez, M.; Torres, J.; Zambrano, P.; Flores, P. FraudFind: Financial fraud detection by analyzing human behavior. In Proceedings
of the 2018 IEEE 8th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 8-10
January 2018; pp. 281-286.
40. Sandhu, N. Behavioural red flags of fraud—A qualitative assessment. J. Hum. Values 2016, 2, 221-237. [CrossRef]
41. Mackevičius, J.; Giriūnas, L. Transformational research of the fraud triangle. Ekonomika 2013, 92, 150-168. [CrossRef]
42. Zulaikha, Z.; Hadiprajitno, P.; Rohman, A.; Handayani, R. Effect of attitudes, subjective norms and behavioral controls on the
intention and corrupt behavior in public procurement: Fraud triangle and the planned behavior in management accounting.
Accounting 2021, 7, 331-338. [CrossRef]
43. Omar, N.B.; Din, H.F.M. Fraud diamond risk indicator: An assessment of its importance and usage. In Proceedings of the 2010
International Conference on Science and Social Research (CSSR 2010), Kuala Lumpur, Malaysia, 5-7 December 2010; pp. 607-612.