Fraud Detection Using The Fraud Triangle Theory and Data

Review

Fraud Detection Using the Fraud Triangle Theory and Data Mining Techniques: A Literature Review

Marco Sánchez-Aguayo, Luis Urquiza-Aguiar and José Estrada-Jiménez

Departamento de Informática y Ciencias de la Computación, Escuela Politécnica Nacional, Ladrón de Guevara E11-253, Quito 170517, Ecuador
Departamento de Electrónica, Telecomunicaciones y Redes de Información, Escuela Politécnica Nacional, Ladrón de Guevara E11-253, Quito 170517, Ecuador; [email protected] (L.U.-A.); jose.estrada@epn.edu.ec (J.E.-J.)
* Correspondence: marco.sanchez01@epn.edu.ec
† These authors contributed equally to this work.

Citation: Sánchez-Aguayo, M.; Urquiza-Aguiar, L.; Estrada-Jiménez, J. Fraud Detection Using the Fraud Triangle Theory and Data Mining Techniques: A Literature Review. Computers 2021, 10, 121. https://doi.org/10.3390/computers10100121
Academic Editor: Francesc Fallucch
Received: 16 June 2021; Accepted: ? September 2021; Published: 30 September 2021
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Abstract: Fraud entails deception in order to obtain illegal gains; thus, it is mainly evidenced within financial institutions and is a matter of general interest. The problem is particularly complex, since perpetrators of fraud could belong to any position, from top managers to payroll employees. Fraud detection has traditionally been performed by auditors, who mainly employ manual techniques. These could take too long to process fraud-related evidence. Data mining, machine learning, and, as of recently, deep learning strategies are being used to automate this type of processing. Many related techniques have been developed to analyze, detect, and prevent fraud-related behavior, with the fraud triangle associated with the classic auditing model being one of the most important of these.
This work aims to review current work related to fraud detection that uses the fraud triangle in addition to machine learning and deep learning techniques. We used the Kitchenham methodology to analyze the research works related to fraud detection from the last decade. This review provides evidence that fraud is an area of active investigation. Several works related to fraud detection using machine learning techniques were identified, but without evidence that they incorporated the fraud triangle as a method for more efficient analysis.

Keywords: fraud; machine learning; cybersecurity; human behavior

1. Introduction

Fraud has increased considerably in recent times, affecting the interests of both financial institutions and their customers. A study conducted by PricewaterhouseCoopers found that 30% of the companies surveyed had already been victims of fraud. Moreover, 80% of that fraud was committed within the companies' ranks, especially in administrative areas such as accounting, operations, sales, and management, as well as in customer service departments [1]. Fraud-related activities, which generally go unnoticed within a company, comprise a series of irregularities and illicit acts characterized by intentional deception on the part of fraudsters. Most of the anomalies detected are due to the lack of internal control mechanisms, and in such situations, scammers commit fraud by exploiting these weaknesses [2].

Fraud is considered a subset of internal threats, such as corruption, misappropriation of assets, and fraudulent declarations, among others [3]. More formally, the Association of Certified Fraud Examiners (ACFE) defines fraud as "the use of one's occupation for personal enrichment through the misuse or deliberate misapplication of the resources or assets of the employing organization" [4].
The ability to commit this type of activity depends on the weakness of the control mechanisms that institutions and companies have in place. In such circumstances, fraudsters commit acts of fraud by taking advantage of these weaknesses.

Computers 2021, 10, 121. https://doi.org/10.3390/computers10100121 https://www.mdpi.com/journal/computers

Since it is committed by humans, fraud is tightly coupled with human behavior. Thus, understanding the motivations of perpetrators, or the psychological and personality traits that drive them to cross ethical boundaries, can provide a new perspective for fraud detection [5].

Currently, there are different solutions [6] for detecting fraud. They rely on tools that perform statistical and parametric analyses based on data mining techniques, as well as analyses of behavior, but none of them solves the problem of timely fraud detection [7].

Given the complexity of analyzing human behavior to detect fraud, several approaches have been proposed to tackle some of the issues involved in this task. For instance, some works aimed to improve precision and increase the speed of data processing through a hybrid machine learning system [8] or through incremental learning [9]. Another challenge for fraud detection is the lack of data from which detection systems can learn; the authors of [10] proposed a fraud-detection system that does not require previous fraudulent examples. However, even when data are available, large and small datasets should be addressed differently [11]. In any case, because it concerns human behavior, fraud detection is a multidimensional problem, and so are some of the fraud-detection mechanisms proposed in the literature [12,13].

There is a consensus that prevention should be a priority in order to minimize fraud through proper risk management.
Preventing fraud saves time and financial resources, because once fraud is detected after the fact, the stolen assets are practically irrecoverable. To enhance fraud prevention, organizations should focus on the root of the problem by identifying the causes that lead people to commit fraud and by understanding their behavior [14]. Many theories have attempted to answer this question, and the most frequently cited in this context are Cressey's Fraud Triangle Theory (FTT) and Wolfe and Hermanson's Fraud Diamond Theory (FDT) [15]. Both approaches, discussed below, analyze what leads perpetrators to commit fraud.

The study of fraud and its analysis is best explained with the help of the Fraud Triangle Theory (FTT), which was proposed by Donald R. Cressey, a leading expert in the sociology of crime. Cressey investigated why people committed fraud and explained their responses in terms of three elements: pressure, opportunity, and rationalization. The theory also holds that these elements occur consecutively to provoke the desire to commit fraud. The first necessary element is perceived pressure, which is related to the motivation and drive behind the fraudulent actions of an individual. This motivation often occurs in people who are under some form of financial stress [16]. The second element, perceived opportunity, is the means by which the crime is carried out and the ability to commit it. Finally, the third element, rationalization, refers to the individual's capacity to rationalize dishonest acts so that illegal actions seem justified and acceptable [17].

The FDT, considered an extended version of the FTT, adds a fourth vertex, capacity, to the three already known [18]. Despite the cohesion among the three vertices of pressure, opportunity, and rationalization, it is unlikely that people will commit fraud unless they have the capacity (considered the fourth vertex).
In other words, the potential perpetrator must have the skills and ability to commit fraud [19].

Various theories of fraud have been used to explain the motivation behind this phenomenon. The FTT and FDT can be used effectively to detect the possibility of corporate fraud, although the measurement of the associated variables depends to a great extent on the data used for the study, whether public or private [20].

Fraud analysis, when supported by data mining techniques, helps reduce the manual parts of the detection and verification process and makes the search for fraud more efficient. It is impossible to guarantee the proper moral and ethical behavior of people, especially in the workplace. Given this reality, a valid option for identifying possible evidence of fraud from available data is to use machine learning algorithms. Many works cover fraud detection with data mining techniques as their primary focus [21-24]. Two criticisms of data-mining-based fraud-detection research are frequently raised: the scarcity of real public data for conducting experiments [25] (appropriate access to data in this area is extremely difficult because of privacy concerns) and the lack of well-documented, published methods and techniques.

1.1. Related Work

Here, we describe some systematic reviews whose main objectives were the analysis and detection of fraud using machine learning techniques and the application of fraud theories.

Phua et al. [25] carried out a survey in which they identified the limitations of fraud-detection methods and techniques and showed that this field can benefit from other related areas. Specifically, unsupervised approaches may benefit from existing monitoring systems and text extraction, as well as from semi-supervised and game-theoretical approaches; spam and intrusion detection communities can contribute to future fraud-detection investigations.
However, above all, the authors focused on the nature of the information and offered an interesting reflection on fraud-detection research based on data mining. They also referred to the scarcity of publicly available real data for carrying out experiments, as well as to the lack of well-documented and published methods and techniques.

Zhou et al. [26] concluded that most fraud-detection systems employ at least one supervised learning method and that unsupervised and semi-supervised learning methods are also used. The study showed that these techniques can be used alone or in combination to build more robust classifiers and that, without loss of generality, these approaches are relatively successful in fraud detection and credit scoring. They mentioned that fraud detection and data-mining-based credit scoring are subject to the same classification-related issues, such as feature engineering, parameter selection, and hyperparameter tuning. The authors also observed that fraud-related data are not abundant enough for investigators to train and test their models and that complex financial scenarios are nearly impossible to represent. They explained that fraud detection must constantly evolve and that it depends strongly on the industry in which it is applied.

The authors of [27] performed a meta-analysis to establish the effect of mapping data samples from fraudulent companies to non-fraudulent companies using classification methods, by comparing the overall classification precision reported in the literature. The results indicated that fraudulent samples could be matched equally to non-fraudulent samples (1:1 data mapping) or could be unevenly mapped using a one-to-many ratio to proportionally increase the sample size. Based on this meta-analysis, machine learning approaches can achieve better classification precision than statistical techniques, specifically when the availability of sample data is low.
Furthermore, high classification precision can even be obtained with a dataset with 1:1 mapping by using machine learning classification approaches.

The results reported by the authors of [28] clearly show that data mining techniques have been applied more widely for fraud detection in fields such as insurance, corporate, and credit card fraud. In this line, we found a lack of research on mortgage fraud, money laundering, and security fraud. The main data mining techniques used for detecting financial fraud are logistic models, which provide immediate solutions to the problems inherent in detecting and classifying fraudulent data.

The authors of [29] conducted a review of the literature to address the following research questions related to financial statement fraud (FSF): (1) Can FSF be detected, how likely is it, and how can it be done? (2) What characteristics of the data can be used to predict FSF? (3) What kind of algorithm can be used to detect FSF? (4) How can detection performance be measured? (5) How effective are these algorithms in terms of detecting fraud? This work presents a generic framework to guide this analysis.

The reviews mentioned above have something in common: they try to unveil the main techniques used for fraud detection, such as machine learning methods (supervised, unsupervised, and semi-supervised), and try to identify which of these are more effective. This analysis was carried out in different scenarios, contrasting the results obtained and specifying the study area in which they are most accurate. We could not find studies linking fraud detection by means of machine learning techniques with the Fraud Triangle Theory.

Finally, we find it important to comment on some theories for the understanding of fraud detection. Studies such as [15] analyzed the convergence and divergence of two classic theories of fraud: the triangle theory and the diamond theory.
There, the concept of fraud and the convergence of the two classical theories were examined. This work also discussed the differentiation between them, highlighting the similarities and differences between these theories. A discussion of the two approaches contributes to the understanding of fraud, especially for fraud professionals and fraud examiners.

1.2. Contribution

This research aims to compile the literature related to fraud detection from two perspectives. On the one hand, we analyze works that consider human behavior as an inherent risk factor in this problem, especially by using the FTT and FDT. On the other hand, beyond exploring these theories, our review analyzes works in which machine learning techniques have been used for fraud detection. Moreover, we look for works that integrate ML techniques with behavior-based theories of fraud, such as the FTT and FDT.

To do this, we used the well-known methodology of Barbara Kitchenham and formulated three research questions. As a result, we provide an up-to-date and comprehensive analysis of the subject. It will help in identifying, investigating, and evaluating the causes that lead to fraud and in detecting it. This study can guide further research on the topic in areas that have not yet been investigated.

The rest of this paper is organized as follows. Section 2 addresses the methodology used to perform this review. Then, Section 3 summarizes our findings. After that, we discuss the weaknesses and strengths of the techniques identified in Section 4. Finally, Section 5 draws conclusions and describes future work.

2. Materials and Methods

A systematic literature review (SLR) was carried out for this research work. According to [25], the purpose of an SLR is to provide a complete list of all studies related to specific subject areas. Meanwhile, traditional reviews attempt to summarize the results of several studies.
An SLR uses an evidence-based approach to meticulously search for relevant studies within a context and to select, evaluate, and critically analyze their findings in order to answer predefined research questions; this is done by following the recommendations reported in [30]. Considering the guidelines and recommendations described by Barbara Kitchenham [31], a systematic literature review must follow the methodological process illustrated in Figure 1.

2.1. Research Questions

As stated, this article aims to review and summarize the works related to fraud detection that is performed by using machine learning techniques or the Fraud Triangle Theory. We do not restrict our search to any specific knowledge area. The SLR research questions (RQs) that we intend to answer in this paper are the following:

1. RQ1: How can fraud be detected by analyzing human behavior by applying fraud theories?
2. RQ2: What machine or deep learning techniques are used to detect fraud?
3. RQ3: Using machine learning techniques, how can fraud cases be detected by analyzing human behavior associated with the Fraud Triangle Theory?

2.2. Keywords

We looked for scientific publications related to fraud detection, its process of identification, and its application to answering our research questions. We specifically targeted works focused on fraud that relied on machine learning techniques or the Fraud Triangle Theory. To this end, we created a base list of keywords that was built from the keywords found in related research, as shown in Table 1.

[Figure 1. Methodology applied in the systematic literature review (SLR). Steps shown: define the main problem and research questions; define the research methodology; define search strings; search in scientific databases; download full-text versions; apply selection criteria; classification and coding; data extraction and analysis; writing and reviewing the paper.]

Table 1. Keywords.
No. | Keyword | Abbreviation
1 | fraud | FR
2 | fraud detection | FD
3 | fraud triangle theory | FTT
4 | fraud diamond theory | FDT
5 | human behavior | HB
6 | behavior patterns | BP
7 | data mining | DM
8 | machine learning | ML
9 | deep learning | DL

2.3. Search Strategy

We employed the guidelines from [32,33] to define a search strategy in order to retrieve as many relevant documents as possible. Our search strategy is described below.

2.3.1. Search Method

To find the most relevant publications for the topic addressed in this work, we queried the following databases: IEEEXplore, ScienceDirect, ACM Digital Library, and Scopus. We chose these databases because they offer the most essential and high-impact full-text journals and conference proceedings that cover the ML and FD fields in general. We carried out the searches in the titles, keywords, and abstracts of articles using the combinations of terms introduced in the following section.

2.3.2. Search Terms

The search string was designed according to what was mentioned in [34]. Based on the research questions, we constructed the following relationships: ("Data mining" OR "Machine learning" OR "Deep Learning") AND ("Detection Fraud" OR "Internal Fraud" OR "Fraud Triangle" OR "Diamond Triangle" OR "Human Behavior"). The alternatives within each group were combined with the "OR" operator, and the two groups were joined with the "AND" operator to build the search string. The search terms in the string only matched the title, abstract, and keywords of the digital databases' articles.

It is essential to find the correct search field or combination of fields, be it the title, abstract, or full text, to which the search string is applied in order to obtain effective results. In many cases, searching only by the "title" does not provide the most relevant publications. Therefore, it can be necessary to include the "abstract" and, in other cases, "the complete document" of the related publications.

2.3.3. Selection of Papers

Since the searches in the articles' full text resulted in many irrelevant publications, we decided to apply the search criteria to the "abstracts" of the papers. This means that an article was selected as a potential candidate if its title or abstract contained the keywords defined in the search string. As a first filter, we evaluated each paper's title and abstract according to the inclusion and exclusion criteria (see Table 2) and selected the articles within the scope of the research questions. As a second filter, we read in full the articles that had passed the first filter. Ultimately, the papers were included or excluded according to the inclusion and exclusion criteria, which are explained next. Additionally, the search was limited to research written in English and published since 2010 [35].

Table 2. Inclusion/exclusion criteria.
No. | Inclusion Criteria
IC1 | Indexed publications not older than 10 years.
IC2 | Scope of study: Computer Science.
IC3 | Primary studies (journal or articles).
IC4 | Papers that discuss aspects regarding fraud detection.
IC5 | The investigations considered have information relevant to the research questions.
No. | Exclusion Criteria
EC1 | Papers in which the language is different from English cannot be selected.
EC2 | Papers that are not available for reading and data collection (papers that are only accessible by paying or are not provided by the search engine) cannot be selected.
EC3 | Duplicated papers cannot be selected.
EC4 | Publications that do not meet any of the inclusion criteria cannot be selected.
EC5 | Publications that do not describe scientific methodology cannot be selected.

2.4. Study Selection

As shown in Figure 2, the selection of studies was performed through the following processes [36]:

1. Identification: The keywords were applied to the databases listed above according to the research questions, as described in the search method section. The search string was applied only to the title and abstract, as a full-text search would produce many irrelevant results [37]. The search period went from 2010 to 2021.
2. Filter: The titles, abstracts, and keywords of all possible primary studies were checked against the inclusion and exclusion criteria. If it was difficult to determine whether an article should be included or not, it was reserved for the next phase.
3. Eligibility: At this stage, a complete reading of the text was carried out to determine whether the article should be included according to the inclusion and exclusion criteria.
4. Data extraction: After the filtering process, data were extracted from the selected studies to answer RQ1-RQ3.

[Figure 2. Process of the selection of studies. Steps shown: search in scientific databases to identify studies and download full-text versions; apply the selection criteria, excluding irrelevant studies first on the basis of their titles and abstracts and then on the basis of a full-text read; obtain the primary studies; writing and reviewing the paper.]

2.5. Quality Assessment

Once we had selected the primary studies based on the inclusion and exclusion criteria, we assessed their quality. Following the guidelines in [36], three quality assessment (QA) questions were defined to measure the research quality of each proposal and to provide a quantitative comparison between the research works considered:

1. Are the topics covered in the article relevant for fraud detection? Yes: It explicitly describes the topics related to fraud detection by applying ML techniques through the FTT. Partially: Only a few are mentioned. No: It neither describes nor mentions topics related to fraud detection using ML techniques through the FTT.
2. Were the limitations for the study of fraud detection detailed? Yes: It clearly explained the limitations related to fraud detection by applying ML techniques through the FTT. Partially: It mentioned the limitations but did not explain why. No: It did not mention the limitations.
3. Did the study address systematic research? Yes: The study was developed systematically and applied an adequate methodology to obtain reliable findings. Partially: The study was developed systematically and used a proper methodology but did not provide details. No: The study was not explained in a clear way and the authors did not apply an adequate methodology.

The scoring procedure was defined as follows: Y (Yes = 1), P (Partially = 0.5), N (No = 0), or Unknown (i.e., the information was not specified).

2.6. Data Extraction and Analysis

This section describes the data extraction process performed with the selected papers and the analysis of the data extracted in order to answer the research questions of this SLR. We extracted the required data from the previously selected works, which were classified according to the research questions that they answer, as shown in Table 3. The table gives the data extraction form used for all selected primary studies in order to carry out an in-depth analysis.

Table 3. Data extraction form.
No. | Extracted Data | Description | Type
1 | Identity of the study | Unique identity for the study | General
2 | Bibliographic references | Authors, year of publication, title, and source of publication | General
3 | Type of study | Book, journal paper, conference paper, workshop paper | General
4 | The theories employed | Description of the detection of fraud by applying the FTT and HB | RQ1
5 | The techniques considered | Description of the detection of fraud by applying ML/DM techniques | RQ2
6 | Combination of techniques and theories used | Description of the analysis of theories and techniques used to detect fraud | RQ3
7 | Findings and contributions | Indication of the findings and contributions of the study | General

We extracted the most representative papers related to the research questions based on the search string and associated terms. The results of the analysis of the data obtained are presented in the next section.

2.7. Synthesis

Many papers could contain keywords that were used in the search string and still be irrelevant to our research questions. Therefore, a careful selection of documents should include only those containing helpful information with respect to the research approach and the answers to the different research questions.

As shown in Figure 3, we first searched each data source separately and then joined the results obtained from the various sources of information, resulting in a total of 1891 papers. We obtained the most articles from Scopus, representing around 50% of all documents.

[Figure 3. Studies retrieved through search engines (papers found: 1891).]

Table 4 shows the number of articles found per source according to the search for keywords related to the search strings in the selected databases. The second column shows the results of the initial selection of papers found in each source. The third column gives the number of articles that remained after applying the exclusion criteria. The number of articles that were selected after eliminating duplicate articles is presented in the fourth column. Finally, the last column presents the papers from each source that were selected after completing the inclusion process.

It was necessary to refine the papers obtained by first eliminating irrelevant studies to ensure that the works complied with the established selection criteria. Our search in the databases, the application of the search string to only the titles and abstracts of the articles, and the selection of articles that were published during the last eleven years yielded 1891 records.
After applying the exclusion criteria to these records, we obtained 254 studies. The analysis of duplicates among these studies left 106 papers that were relevant for a full-text review. Finally, after a full-text assessment, 32 studies [38-69] were identified as a result of the analysis through the SLR technique. Therefore, a total of 32 publications met all of the inclusion criteria. The selection of studies from the initial search identification phase and the final number of included studies are presented in Figure 4. As initially proposed, and to ensure that the resulting studies contained relevant information, we read the full text of the 32 studies to verify that they fit our adopted selection criteria. As a result, all of these publications formed our final set of primary studies.

Table 4. Number of papers found through the selection process.
Source | Papers Found | Abstract and Title | Duplicity | Selected
Scopus | 960 | [illegible] | [illegible] | 16
IEEE | [illegible] | 68 | [illegible] | 7
WoC | 360 | [illegible] | 16 | 9
ACM | 230 | 48 | [illegible] | 4
Total | 1891 | 254 | 106 | 32

[Figure 4. Steps followed to narrow the search results. Filter by year, document type, source type, and language (2010-2021; conference paper, article, journal; English): included n = 1231, excluded n = 660. Exclusion of irrelevant studies based on abstracts and titles: included n = 254. Duplicity criterion: included n = 106. Exclusion of irrelevant studies based on full text: included n = 32.]

Regarding the types of publications where the selected papers were available, we found that 50% of them had been published in conferences and 50% in journals. Table 5 shows the number of citations of the selected articles. The data presented (column "Cited") provide only an approximation of the citation rates and are not intended for comparisons among studies.

Regarding the period of publication of the selected articles, the 32 studies were published between 2010 and 2021.
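The selection funnel reported above (1891 records, then 254, 106, and 32) can be checked with a few lines of Python. This is only an illustrative sketch: the counts are those given in the text, while the stage labels and the names `STAGES` and `retention` are assumptions introduced here and are not part of the review protocol.

```python
# Counts are those reported in the text; the stage labels are illustrative shorthand.
STAGES = [
    ("records retrieved from the databases", 1891),
    ("after applying the exclusion criteria", 254),
    ("after removing duplicates", 106),
    ("after full-text assessment", 32),
]


def retention(stages):
    """Return (label, count, share of previous stage, share of initial records) per stage."""
    initial = stages[0][1]
    previous = initial
    rows = []
    for label, count in stages:
        rows.append((label, count, count / previous, count / initial))
        previous = count
    return rows


for label, count, of_previous, of_initial in retention(STAGES):
    print(f"{label:40s} {count:5d} {of_previous:8.1%} of previous {of_initial:8.1%} of initial")
```

Reporting each stage both relative to the previous stage and relative to the initial records makes it easy to see where most of the narrowing happens, here at the title and abstract screening.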
Furthermore, as shown in Figure 5, 2010, 2015, 2016, and 2017 had the largest numbers of articles, while 2011, 2012, 2019, and 2020 had the lowest.

Table 5. Numbers of selected studies by type.
[Columns: Ref., Cited, repeated across the width of the table; the cell values are not recoverable from this extraction.]

[Figure 5. Number of articles by year of publication.]

3. Results

As the result of our methodology, we found 32 documents published between 2010 and 2021 that covered the most representative work on the topic of this paper. We focused only on peer-reviewed papers from journals and conferences. All of them were obtained by searching for fraud-related topics in four scientific libraries.

Table 6 shows a matrix built using the topics most closely related to the research questions, with references to the corresponding articles. Each column identifies a relevant topic associated with the research questions. Seven works were found for RQ1 (Fraud Detection + Human Behavior + Fraud Theory). In contrast, 24 works were found for RQ2 (Fraud Detection + ML/DM Techniques), while only one study was found for RQ3 (Fraud Detection + Human Behavior + ML/DM + Fraud Theory). This suggests that there is room for improving fraud detection, because RQ3 brings together most of the topics addressed by the other research questions.

Table 6. Data extraction form.
No. | Ref | Fraud Detection | Human Behavior | ML/DM Techniques | Fraud Theory
1 | [38] | RQ1 | RQ1 | | RQ1
2 | [39] | RQ1 | RQ1 | | RQ1
3 | [40] | RQ1 | RQ1 | | RQ1
4 | [41] | RQ1 | RQ1 | | RQ1
5 | [42] | RQ1 | RQ1 | | RQ1
6 | [43] | RQ1 | RQ1 | | RQ1
7 | [70] | RQ1 | RQ1 | | RQ1
8 | [45] | RQ2 | | RQ2 |
9 | [46] | RQ2 | | RQ2 |
10 | [47] | RQ2 | | RQ2 |
11 | [48] | RQ2 | | RQ2 |
12 | [49] | RQ2 | | RQ2 |
13 | [50] | RQ2 | | RQ2 |
14 | [51] | RQ2 | | RQ2 |
15 | [52] | RQ2 | | RQ2 |
16 | [53] | RQ2 | | RQ2 |
17 | [54] | RQ2 | | RQ2 |
18 | [55] | RQ2 | | RQ2 |
19 | [56] | RQ2 | | RQ2 |
20 | [57] | RQ2 | | RQ2 |
21 | [58] | RQ2 | | RQ2 |
22 | [59] | RQ2 | | RQ2 |
23 | [60] | RQ2 | | RQ2 |
24 | [61] | RQ2 | | RQ2 |
25 | [62] | RQ2 | | RQ2 |
26 | [63] | RQ2 | | RQ2 |
27 | [64] | RQ2 | | RQ2 |
28 | [65] | RQ2 | | RQ2 |
29 | [66] | RQ2 | | RQ2 |
30 | [67] | RQ2 | | RQ2 |
31 | [68] | RQ2 | | RQ2 |
32 | [69] | RQ3 | RQ3 | RQ3 | RQ3

Table 7 shows the frequencies of the works found per research question. RQ2 is the most frequently investigated, accounting for 75% of the studies. Seven papers were found for RQ1, accounting for 21.88%, and only one paper was found for RQ3, accounting for 3.13%.

Table 7. Data extraction form.
RQ | Study Identifier | Frequency | Percentage
1 | [38-43,70] | 7 | 21.88
2 | [45-68] | 24 | 75
3 | [69] | 1 | 3.13

3.1. RQ1: How Can Fraud Be Detected by Analyzing Human Behavior by Applying Fraud Theories?

This section details the results obtained from the analysis of research papers that relate fraud detection to human behavior by applying the Fraud Triangle Theory. The investigation is intended to answer RQ1. We answer this question through a statistical analysis of the number of documents linked to the research question. According to Table 6, seven works were found.

Hoyer et al. [38] proposed a prototype in a generic architectural model that considers the factors of the fraud triangle. In this way, human behavior is considered in addition to the analysis applied as part of a traditional fraud audit. By doing this, the transactions examined by an auditor can be better differentiated and prioritized. Behavioral patterns are found through the incorporation of the human factor. These patterns appear in multiple sources of information, especially in users' data, such as e-mails, messages, network traffic, and system records, from which evidence of fraud can be extracted.

Sánchez et al. [39] presented a framework that allows the identification of people who commit fraud and is supported by the Fraud Triangle Theory.
This proposal is based on a continuous audit that relies on agents installed on user devices to collect information, including phrases. These are subsequently analyzed to identify fraud patterns through the analysis of human behavior and the treatment of the results. In [40], based on primary data on the behavior of perpetrators who commit fraud, the authors showed the complementarity between an ex-post analysis and the existing literature on this topic. They suggested that the presence or absence of fraudulent intent can be assessed by scrutinizing human behavior. Mackevicius and Giriunas [41] analyzed the Fraud Triangle Theory and presented its associated elements: “motives, possibilities, pressure, rationalization, incentive, and others”. They offered a theoretical analysis of the fraud scales and their elements: motives, conditions, possibilities, and performance. To this end, the authors analyzed 265 respondents (including accountants, stakeholders, public officials, and inspectors in Central Java, Indonesia) by using structural equation modeling (SEM) with the AMOS analysis tools. In [42], the authors assessed the Fraud Triangle Theory and human behavior in order to study the factors of opportunity, financial processes, and rationalization. The authors emphasized the importance of psychological and moral aspects.

The International Standard on Auditing ISA 240 focuses on the auditor's responsibility to assess fraud in an audit of financial statements. The authors of [43] explored whether the standard has been used effectively in Indonesia, based on the proposed fraud indicators, through a fraud analysis. A questionnaire survey was conducted with three groups of auditors: external, internal, and government auditors. This study examined auditors' perceptions of the importance and existence of warning signs of financial fraud by using the fraud diamond.
The findings indicate that the auditors were able to identify these red flags, giving them high scores. In contrast, the scores for the “level of use” were low. Mekonnen et al. [70] presented an insider threat prevention and prediction model based on the fraud diamond that combines various approaches, techniques, and IT tools, as well as criminology and psychology. Deploying this model involves collecting information about possible intentions by using privileged information in a privacy-preserving context, thus enabling high-risk insider threats to be identified while balancing privacy concerns.

3.2. RQ2: What Machine or Deep Learning Techniques Are Used to Detect Fraud?

This section reports the results of works that described the implementation of machine learning and data analysis for fraud detection. We aimed to identify the most commonly used machine or deep learning techniques in this realm. Table 7 shows that this research question had the highest number of related works. Table 8 presents the main focus of the articles and the ML/DL techniques used, as well as the dataset information. All of these articles are summarized below.

Some works enhance traditional security approaches. In [60], the need to use Process-Aware Information Systems (PAIS) software in organizations and the importance of fraud detection were investigated. The authors claimed that this tool is a must for organizations, as its flexibility improves fraud detection. The authors of [63] sought to design an artifact (hardware) for detecting communications from disgruntled employees through automated text mining techniques. The artifact that they developed extended the layered approach in order to combat internal security risks. They claimed that this phenomenon can be detected in e-mail repositories by using employee dissatisfaction as the primary indicator of fraud risk.
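To make the idea of text mining for employee dissatisfaction more concrete, the following is a minimal sketch of a keyword-based flagging step. It is not the artifact described in [63]: the term list `DISSATISFACTION_TERMS`, the scoring function, and the threshold are hypothetical choices made only for illustration, and a real system would likely rely on a validated lexicon or a trained classifier.

```python
import re
from collections import Counter

# Hypothetical lexicon of dissatisfaction terms (an assumption for this sketch,
# not taken from any of the reviewed works).
DISSATISFACTION_TERMS = {"unfair", "underpaid", "overworked", "angry", "quit", "hate"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def dissatisfaction_score(text):
    """Return the share of tokens that belong to the dissatisfaction lexicon."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hits = sum(counts[term] for term in DISSATISFACTION_TERMS)
    return hits / len(tokens)


def flag_emails(emails, threshold=0.1):
    """Return the indices of e-mails whose score reaches the threshold."""
    return [i for i, body in enumerate(emails)
            if dissatisfaction_score(body) >= threshold]


if __name__ == "__main__":
    sample = [
        "I am underpaid and overworked, this is unfair",
        "Meeting moved to 3 pm tomorrow",
    ]
    print(flag_emails(sample))
```

Flagged messages would only be candidates for review by an auditor; a score of this kind says nothing by itself about whether fraud occurred.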
Considering the methods of fraud detection based on simple comparisons, detection of associations, clustering, prediction, and outliers, an automated fraud-detection framework was proposed in [47]. The framework allowed fraud identification by using intelligent agents, data fusion techniques, and various data mining techniques. In [67], the authors proposed the detection of bank fraud through data extraction techniques, association, grouping, forecasting, and classification to analyze customer data and identify patterns leading to fraud. To conclude this group of papers, West et al. suggested that a higher level of verification/authentication can be added to banking processes by identifying patterns. To do this, the authors reviewed key performance metrics used to detect financial fraud, with a focus on credit card fraud. They compared the effectiveness of these metrics in detecting whether fraud was carried out. In addition, the performance of various computational intelligence techniques applied to this problem domain was investigated, and the efficacy of different binary classification methods was explored.

Table 8. Summary of works that used machine or deep learning techniques to detect fraud.
[Table body not recoverable from the source; for each of the works [45-68], it lists the ML/DL techniques used, the dataset, and the main focus.]
Abbreviations: Neural Networks: NN; Decision Trees: DT; Bayesian Networks: BN; Random Forest: RF; K-means: KM; Support Vector Machine: SVM; Artificial Neural Network: ANN; Multinomial Logistic Regression: MLR; Multilayer Feed Forward Neural Network: MLFF; Genetic Programming: GP; Group Method of Data Handling: GMDH; Logistic Regression: LR; Probabilistic NN: PNN; Binomial Logistic Regression: BLR; Latent Dirichlet Allocation: LDA; K Nearest Neighbor: KNN; Deep Reinforcement Learning: DRL; Multivariate Latent Class Clustering: MLCC; Convolutional Neural Network: CNN; Stacked Long Short-Term Memory: SLSTM; Naive Bayes: NB.

In [45], the authors summarized and compared different datasets and algorithms for automated accounting fraud detection. The selected works addressed mining algorithms that included statistical tests, regression analysis, NN, DT, BN, stack variables, etc. Regression analysis was widely used to hide data. Generally, the detection effect and the precision of NN were higher than those of regression models. The overall conclusion was that pattern detection is better than detection by an unaided auditor. Due to the small size of the fraud samples, some publications reached decisions based on training samples and may have overestimated the effects of the models. In [46], S. Wang presented a hybrid detection model using machine learning and text mining methods for detecting financial fraud. This model used financial and non-financial data and employed two ways of selecting easy-to-explain characteristics. During the investigation, the author chose 120 fraudulent financial statements disclosed by the China Securities Regulatory Commission (CSRC) between 2007 and 2016.
He compared the performance of five machine learning methods and found that the Random Forest method had the following advantages: (1) it is suitable for processing high-dimensional data; (2) it avoids overfitting to some extent; (3) it is robust and stable. Ravisankar et al. proposed the use of data mining techniques to identify companies that resort to financial statement fraud [54]. Specifically, the authors tested the MLFF, SVM, GP, GMDH, LR, and PNN techniques. The evaluation considered the role of feature selection and relied on a dataset involving 202 Chinese companies. Their results indicated that the PNN outperformed all of the methods without feature selection, and the GP and PNN outperformed the others with feature selection, with marginally equal accuracies.

For other works that compared different ML methods, we found the following. In [53], the authors developed three multiple-class classifiers (MLR, SVM, and BN) to detect and classify misstatements according to the presence of fraud intent. Using the MetaCost tool, the authors conducted cost-sensitive learning and addressed class imbalance and asymmetric misclassification costs. In [58], the use of data mining methods to detect fraud in electronic ledgers through financial statements was explored. The Linear Regression, ANN, KNN, SVM, Decision Stump, M5P Tree, J48 Tree, RF, and Decision Table techniques were used for training. The authors of [61] detected credit card fraud by using supervised learning algorithms, such as a DT and NB.

Focusing on the use or comparison of ANNs with other methods, Vimal Kumar et al. [49] analyzed the challenges of detecting and preventing fraud in the banking industry when insider information is involved. The authors reviewed some of the data analysis techniques for detecting insider trading scams. Their work lists the best data mining techniques available (NN, DT, and Bayesian Belief Networks), which have been proposed by many researchers and employed in different industries.
They concluded that the banking industry's primary requirements are fraud detection and prevention and that data mining techniques can help reduce fraud cases. In addition, the work in [50] proposed the use of NN to correlate information from a variety of technological sources and databases in order to identify suspicious account activity. The work in [52] applied data mining algorithms, such as an SVM and ANNs, to detect financial fraud. The authors stated that the essential indicators of financial fraud are profitability and efficiency. The incorporation of these factors improved the accuracy of the SVM algorithm to 88.37%. The ANNs produced the highest precision, 90.97%, for data without feature selection. In [56], Mohanty et al. aimed to identify a person of interest from the corpus of Enron e-mail data released for research. They tried to detect fraudulent activities by means of an ANN trained with the Adam optimizer and using the ReLU activation function. Their work achieved high scores in terms of recall, accuracy, and F1 score.

Regarding unsupervised approaches, a proposal to detect outliers using a modified K-Means Clustering algorithm was presented in [48]. In this work, the detected outliers were removed from the dataset to improve the grouping precision. The authors also validated their approach against existing techniques and benchmark performance. The authors of [51] presented a study on the use of K-Means Clustering and the AdaBoost Classifier, comparing their accuracies and performances with an analysis of the past and present models used for fraud detection.

Regarding the use of more sophisticated techniques for the problem of fraud detection in financial reporting, the authors of [55] applied various natural language processing techniques and supervised machine learning, including BLR, SVM, NN, ensemble techniques, and LDA.
They applied Latent Dirichlet Allocation (LDA) to a collection of 10-K financial reports available in the EDGAR database of the United States Securities and Exchange Commission to generate a frequency matrix of documents and topics. In addition, they applied evaluation metrics, such as the accuracy, the receiver operating characteristic curve, and the area under the curve, to evaluate the performance of each algorithm. To address FFD problems, Li and Wong [57] proposed a new method based on GBGP through multi-objective optimization and ensemble learning. They compared the proposed method with LR, NN, SVM, BN, DT, AdaBoost, bagging, and LogitBoost on four FFD datasets. The results showed the efficacy of the new approach on the given FFD problems, including two real-life situations. The authors of [59] applied the theory of DRL through two applications in banking and discussed its implementation for fraud detection.

Using a DT with a combination of the Luhn algorithm and the Hunt algorithm, Save et al. [62] proposed a system that detects fraud in the processing of credit card transactions. The validation of the card number is done through the Luhn algorithm. The authors of [64] noted the usual focus on the detection of external fraud and discussed the use of a data mining approach to reduce the risk of internal fraud. Consequently, a descriptive data mining strategy was applied instead of the widely used prediction data mining techniques. The authors employed a multivariate latent class clustering algorithm for a case firm's procurement data. Their results suggested that their technique helps to assess the current risk of internal fraud.

Exploring a deep learning model to learn short- and long-term patterns from an unbalanced input dataset was an objective set by [65]. The data obtained were transactions of an Indonesian bank in 2016-2017 with binary labels (no fraud or fraud).
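The card-number validation step mentioned above for [62] relies on the Luhn checksum, which can be sketched in a few lines. This is only an illustration of the standard Luhn check, not the system of Save et al.; the function name `luhn_is_valid` and the decision to ignore spaces and hyphens are assumptions made for this sketch.

```python
def luhn_is_valid(card_number: str) -> bool:
    """Return True if the digits of card_number pass the Luhn checksum."""
    # Assumption of this sketch: spaces and hyphens are treated as separators.
    cleaned = card_number.replace(" ", "").replace("-", "")
    if len(cleaned) < 2 or not (cleaned.isascii() and cleaned.isdigit()):
        return False

    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


if __name__ == "__main__":
    print(luhn_is_valid("4539 1488 0343 6467"))
    print(luhn_is_valid("7992 7398 710"))
```

A passing checksum only shows that the number is well formed; it cannot by itself indicate whether a transaction is fraudulent, which is why [62] combines it with a decision tree.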
The authors of [65] also explored the effects of sample ratios of non-fraud to fraud from 1 to 4 and three models: a convolutional neural network (CNN), a stacked long short-term memory (SLSTM) network, and a CNN-LSTM hybrid. Using the area under the ROC curve (AUC) as the model performance metric, the CNN achieved the highest AUC for R = 1, 2, 3, 4, followed by the SLSTM and CNN-LSTM. The authors of [66] proposed the implementation of both a document clustering algorithm and a set of classification algorithms (DT, RF, and NB), along with industry-appropriate use cases. In addition, the performance of the three classification algorithms was compared by calculating the “Confusion Matrix”, which, in turn, helped calculate performance measures such as “accuracy”, “precision”, and “recall”.

3.3. RQ3: Using Machine Learning Techniques, How Can Fraud Cases Be Detected by Analyzing Human Behavior Associated with the Fraud Triangle Theory?

We found only one work related to this research question. This means that we obtained few results when we tried keywords related to the topics most relevant to the research questions (Fraud Detection + Human Behavior + Machine Learning Techniques + Fraud Triangle Theory). Therefore, the combination of ML techniques and theories related to fraud needs further investigation, because it would integrate two knowledge fields (psychology and data science) in order to improve fraud detection.

In [69], the authors examined the aspects of the fraud triangle using data mining techniques in order to evaluate attributes such as pressure/incentive, opportunity, and attitude/rationalization, and, through the use of expert questionnaires, they discussed whether their suggestion agreed with the results obtained with the adoption of those techniques. The data extraction methods used in this research included logistic regression, decision trees (CART), and artificial neural networks (ANNs). They also compared data mining techniques and expert judgments.
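Measures such as accuracy, precision, and recall, as well as the correct classification rates reported by several of the reviewed works, all derive from the confusion matrix. The following minimal sketch shows the standard definitions for a binary fraud/no-fraud problem. The function names and the toy labels are assumptions for illustration and are not taken from any of the reviewed studies.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Derive accuracy, precision, and recall from the four counts."""
    total = tp + fp + tn + fn
    return {
        # Accuracy is the correct classification rate over all cases.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Precision: share of predicted fraud cases that are really fraud.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall: share of real fraud cases that were detected.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    # Toy labels: 1 = fraud, 0 = no fraud (illustrative data only).
    y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
    print(classification_metrics(*confusion_counts(y_true, y_pred)))
```

On unbalanced fraud data, accuracy alone can look high even when few fraud cases are caught, which is one reason precision and recall are usually reported alongside it.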
In [69], the ANN and CART models achieved correct classification rates of 91.2% (ANN) and 90.4% (CART) on the training samples and 92.8% (ANN) and 90.3% (CART) on the test samples, which were more accurate than the logistic model, which reached only 83.7% and 88.5% correct classification in the assessment of the presence of fraud.

3.4. Quality Assessment

Once the QA questions were defined, we evaluated the primary studies identified in the SLR. The score assigned to each study for each question is shown in Table 9.

Table 9. Quality assessment.

Ref.          QA-1    QA-2    QA-3    Total Score   Max S
[38]          P       P       Y       2             66.67
[39]          P       P       Y       2             66.67
[40]          N       N       N       0             0
[41]          P       Y       Y       2             66.67
[42]          N       N       N       0             0
[43]          N       N       N       0             0
[70]          P       P       Y       2             66.67
[45]          P       Y       Y       2.5           83.33
[46]          P       Y       Y       2.5           83.33
[47]          N       N       N       0             0
[48]          P       P       Y       2             66.67
[49]          P       Y       Y       2.5           83.33
[50]          P       P       Y       2             66.67
[51]          P       P       Y       2             66.67
[52]          P       P       Y       2             66.67
[53]          P       P       Y       2             66.67
[54]          N       N       N       0             0
[55]          P       P       Y       2             66.67
[56]          P       Y       Y       2.5           83.33
[57]          P       Y       Y       2.5           83.33
[58]          N       N       N       0             0
[59]          P       P       Y       2             66.67
[60]          P       Y       Y       2.5           83.33
[61]          N       N       N       0             0
[62]          N       N       N       0             0
[63]          P       Y       Y       2.5           83.33
[64]          N       N       N       0             0
[65]          P       P       Y       2             66.67
[66]          N       N       N       0             0
[67]          P       Y       Y       2.5           83.33
[68]          P       Y       Y       2.5           83.33
[69]          P       Y       Y       2.5           83.33
Total         10.5    16.5    22      49
Max QA        21.42   33.68   44.9    100
Total Score   47.62   73.81   100

The total of the accumulated scores from the QA questions can be observed in the “Total” row, showing that QA3 has 22 points, corresponding to 44.9%, which demonstrates that this question was the most representative in the review. QA2 followed with 33.68%, and QA1 with 21.42%. The last row, in turn, identifies the percentage of points collected by the values assigned for a given QA question with respect to the points obtained if each selected study received the highest score.

Refs. [45,46,49,56,57,60,63,67-69] obtained the highest score of 2.5, which represents 83.33% of the maximum score that a primary study could obtain. Refs. [38,39,41,48,50-53,55,59,65,70] obtained a score of 2, which represents 66.67% of the maximum score. Refs. [40,42,43,47,54,58,61,62,64,66] failed to get any score, which means that their titles and abstracts suggested that they could answer the research question for this SLR, but after reviewing the full articles, no features related to fraud detection using machine learning techniques were discussed.

4. Discussion

In this work, we have reviewed contributions related to fraud detection, with a special emphasis on those addressing fraud detection from the perspective of the modeling of human behavior. Applying techniques related to the analysis of human behavior allowed us to consider behavioral factors that could empower the detection of unusual transactions that would not have been considered with traditional auditing methods. By observing people's behavior, it can be seen that the human factor is closely related to the Fraud Triangle Theory.

On the other hand, machine learning techniques were also implemented in several works to predict behaviors related to fraud. As a result of our research, a significant number of articles (24) addressed this approach. In this context, we found that mainly supervised and unsupervised algorithms are used for fraud-detection analysis. The supervised strategy enables the blocking of fraud attempts based on fraudulent and non-fraudulent samples. This is used in rule-based detection, which automatically infers discriminatory rules from a labeled training set. In addition, our research revealed that supervised algorithms regularly have to deal with unbalanced classes, which might result in poor detection. Furthermore, these techniques are unable to identify new fraud patterns.
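One common way to deal with the unbalanced classes mentioned above is to undersample the non-fraud class to a chosen ratio of non-fraud to fraud cases, similar in spirit to the ratios R = 1 to 4 explored in [65]. The sketch below is an assumption-laden illustration, not the procedure of any reviewed work: the function name `undersample`, the label convention (1 = fraud, 0 = no fraud), and the fixed random seed are all hypothetical choices.

```python
import random


def undersample(records, labels, ratio=1, seed=0):
    """Keep every fraud case and at most ratio * n_fraud non-fraud cases.

    Labels are assumed to be 1 (fraud) or 0 (no fraud).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    fraud_idx = [i for i, y in enumerate(labels) if y == 1]
    legit_idx = [i for i, y in enumerate(labels) if y == 0]

    # Randomly pick the non-fraud cases to keep, without replacement.
    n_keep = min(len(legit_idx), ratio * len(fraud_idx))
    kept_legit = rng.sample(legit_idx, n_keep)

    kept = sorted(fraud_idx + kept_legit)  # preserve the original order
    return [records[i] for i in kept], [labels[i] for i in kept]


if __name__ == "__main__":
    # Toy data: 5 fraud cases among 100 transactions (illustrative only).
    labels = [1] * 5 + [0] * 95
    records = list(range(100))
    X, y = undersample(records, labels, ratio=2)
    print(len(y), sum(y))
```

Undersampling discards legitimate transactions, so it trades information for balance; class weighting or oversampling are alternative choices that a given study may prefer.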
Unsupervised learning, in contrast, concentrates on the discovery of suspicious behavior as a proxy for fraud detection and, thus, does not require prior knowledge about verified fraudulent cases.

Our review focuses on fraud detection performed by means of machine learning techniques or through analysis of human behavior based on the Fraud Triangle Theory. By answering three research questions, we tried to reveal how both approaches are addressed in the literature and how they may be jointly applied.

By answering RQ1, keywords such as human behavior and theories related to fraud were linked, resulting in several related studies. The answer to RQ2 linked machine learning techniques with fraud detection; this question was the one that generated the most results. The analyzed questions each produced results in a specific field, but when trying to combine these fields by answering RQ3, we found only one work linking fraud detection by means of machine learning techniques with a theory related to fraud.

Despite the existence of works about detecting fraud in the areas of data mining and fraud theories, no literature reviews that jointly covered these two areas were identified. Table 10 presents a comparative summary of seven relevant SLRs and surveys performed in the area of fraud detection, including our contribution.

Table 10. Comparison of related systematic literature reviews.
[Table body not recoverable from the source; for each SLR, it lists the work, year, context, search period, data sources, numbers of candidate works and selected primary studies, and whether a quality assessment of the primary studies was applied.]
Data sources: 1: IEEE Xplore; 2: ACM DL; 3: Engineering Village (Compendex); 4: ISI Web of Science; 5: ScienceDirect; 6: Wiley InterScience Journals; 7: Google Scholar; 8: Citeseer; 9: SpringerLink; 10: Scopus; 11: Business Source Premier (EBSCO); 12: Emerald Full Text; 13: World Scientific Net; 14: ProQuest.

In the “Context” column of Table 10, there are four SLRs that are exclusively related to some aspect of data mining [25,26,28,29], while only one is related to some aspect of fraud theory [75], in addition to other approaches [73,75]. The last row of Table 10 also presents information about the SLR covered in this document, the context of which explores both data mining and fraud theories together, unlike the other seven presented in this table.

These SLRs were published between 2007 and 2020, although some of them [26,29,73] do not mention the related search period. The research periods of [25,28,74] range from 10 to 11 years, but include primary studies without making cuts in any specific year.

Some works do not specify the sources of data, and those doing so report a variable number of data sources. Studies that mention data sources do not clearly explain their reasons for selecting them. For our research, in contrast, four data sources were chosen to maximize the probability of identifying relevant candidate works as primary studies.

Both the number of candidate articles from the data sources and the number of selected primary studies are presented in this table for each SLR. The differences in these numbers may be related to the context of each investigation, e.g., data sources used, keywords, etc. For our SLR, the number of reviewed works resulted from the searches in the different data sources used in combination with the chosen keywords, while the final number of primary studies was similar to those of other works.
It should be noted that there are works that do not mention this metric.

Although quality evaluation is not a mandatory parameter in the structure of an SLR, according to [76], it is an essential contribution in this type of work in order to improve its quality. None of the analyzed works clearly showed how an evaluation was carried out in this regard. No criteria were mentioned for assessing the quality of the primary studies. Our work was based on the evaluation criteria proposed by [77].

5. Conclusions and Future Work

Fraud detection is complex, as it requires the interpretation of human behavior, but this is not the only issue. The lack of data available for training or testing detection models significantly complicates the assessment of detection strategies. Even when data are available, unbalanced datasets are the norm in this domain.

Accordingly, there are very different approaches that tackle the problem of fraud detection, as well as systematic literature reviews that are intended to address these limitations from a more global perspective. Thus, the purpose of this research was to identify publications related to fraud detection through the use of ML techniques based on the Fraud Triangle Theory.

The proposed reference frameworks focus on developing tools that allow auditors to perform fraud analyses more efficiently by shortening their detection time through support from data mining techniques. Most of the works concentrate on carrying out their analyses after fraud has been committed, in an attempt to shorten the time taken to find results; thus, these proposals are reactive to such events.

Through this research, it was found that a significant number of research projects are being carried out in this specific area of fraud detection; in general, they have a solid level of maturity. The large number of publications in conferences and journals (representing 50% and 50% of primary studies, respectively) is substantial proof.
In addition, the results of the quality evaluation carried out for the primary studies showed that the evaluation of their proposals was satisfactory in terms of the criteria of “relevance”, “limitations”, and “methodology”.

When we assumed an approach to fraud detection through data mining techniques and the use of fraud theories associated with human behavior, this SLR revealed very little evidence from studies supporting this approach, since only one primary study was found, corresponding to 3.13% of the studies. When we allowed partial coverage, that is, fraud detection by applying only data mining techniques, 24 primary studies (corresponding to 75%) could be classified. On the other hand, when we analyzed the approach to the analysis and detection of fraud in which only theories related to fraud that were associated with human behavior were considered, seven primary studies (corresponding to 21.88%) were found to support this approach.

In this sense, only one study with evidence of the use of data mining techniques, the application of fraud theories, and a corresponding analysis of human behavior to detect fraud was identified, which means that there is a gap, and this is an appropriate field to investigate.

As future work, we propose a review that focuses on detecting fraud and incorporates an analysis of the availability of data and the lack of access to this resource, including other data sources as possible alternatives.

Author Contributions: Conceptualization, M.S.-A. and L.U.-A.; methodology, M.S.-A. and J.E.-J.; validation, M.S.-A., L.U.-A. and J.E.-J.; investigation, M.S.-A.; writing (original draft preparation), M.S.-A.; writing (review and editing), L.U.-A. and J.E.-J.; supervision, L.U.-A. All authors have read and agreed to the published version of the manuscript.

Funding: This research received no external funding.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy limitations concerning the use of personal information.

Acknowledgments: This work was partially supported by Escuela Politécnica Nacional under the research project PILDETRI-2021-02 “Detección de fraude mediante análisis de tópicos y métodos de clasificación”. Marco Sánchez is a recipient of a teaching assistant fellowship from Escuela Politécnica Nacional for doctoral studies in computer science.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Shaikh, A.K.; Nazir, A. A novel dynamic approach to identifying suspicious customers in money transactions. Int. J. Bus. Intell. Data Min. 2020, 17, 143-158.
2. Panigrahi, P.K. A framework for discovering internal financial fraud using analytics. In Proceedings of the 2011 International Conference on Communication Systems and Network Technologies, Katra, India, 3-5 June 2011; pp. 323-327.
3. Silowash, G.; Cappelli, D.; Moore, A.; Trzeciak, R.; Shimeall, T.; Flynn, L. Common Sense Guide to Prevention and Detection of Insider Threats, th ed.; Carnegie Mellon University CyLab: Pittsburgh, PA, USA, 2012.
4. Kassem, R. Detecting asset misappropriation: A framework for external auditors. Int. J. Account. Audit. Perform. Eval. 2014, 10, 1-42. [CrossRef]
5. Sayal, K.; Singh, G. What Role Does Human Behaviour Play in Corporate Frauds? Econ. Political Wkly. 2020, 5. Available online: https://www.epw.in/engage/article/what-role-does-human-behaviour-play-corporate (accessed on 1 September 2021).
6. Gabrielli, G.; Medioli, A. An overview of instruments and tools to detect fraudulent financial statements. Univers. J. Account. Financ. 2018, 7, 76-82. [CrossRef]
7. Dimitrijević, D.; Kalinić, Z. Software Tools Usage in Fraud Detection and Prevention in Governmental and External Audit Organizations in the Republic of Serbia.
In Knowledge-Economy-Society; Cracow University of Economics: Cracow, Poland, 2017; 7.
8. Vynokurova, O.; Peleshko, D.; Bondarenko, O.; Ilyasov, V.; Serzhantov, V.; Peleshko, M. Hybrid Machine Learning System for Solving Fraud Detection Tasks. In Proceedings of the 2020 IEEE Third International Conference on Data Stream Mining & Processing (DSMP), Lviv, Ukraine, 21-25 August 2020; pp. 1-5. [CrossRef]
9. Lebichot, B.; Paldino, G.M.; Bontempi, G.; Siblini, W.; He, L.; Oblé, F. Incremental learning strategies for credit cards fraud detection: Extended abstract. In Proceedings of the 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA), Sydney, Australia, 6-9 October 2020; pp. 785-786. [CrossRef]
10. Saia, R. A Discrete Wavelet Transform Approach to Fraud Detection. In Proceedings of the International Conference on Network and System Security, Helsinki, Finland, 21-23 August 2017.
11. Vynokurova, O.; Peleshko, D.; Zhernova, P.; Perova, I.; Kovalenko, A. Solving Fraud Detection Tasks Based on Wavelet-Neuro Autoencoder. In Proceedings of the International Scientific Conference “Intellectual Systems of Decision Making and Problem of Computational Intelligence”, Zalizniy Port, Ukraine, 25-29 May 2021; pp. 535-546. [CrossRef]
12. Omair, B.; Alturki, A. Taxonomy of Fraud Detection Metrics for Business Processes. IEEE Access 2020, 8, 71364-71377. [CrossRef]
13. Omair, B.; Alturki, A. Multi-Dimensional Fraud Detection Metrics in Business Processes and their Application. Int. J. Adv. Comput. Sci. Appl. 2020, 11, 570. [CrossRef]
14. Ruankaew, T. The Fraud Factors. Int. J. Manag. Adm. Sci. (IJMAS) 2013, 2, 1-5.
15. Mansor, N.; Abdullahi, R. Fraud triangle theory and fraud diamond theory. Understanding the convergent and divergent for future research. Int. J. Acad. Res. Account. Financ. Manag. Sci. 2015, 1, 38-45.
16. Burke, D.D.; Sanney, K.J. Applying the fraud triangle to higher education: Ethical implications. J. Legal Stud. Educ. 2018, 35. [CrossRef]
17.
Awang, N; Hussin, NS.; Razali, FA, Lyana, S; Talib, A. Fraud Triangle Theory: Calling for New Factors. Editor. Board 2020, 7, 54-64 18, Wolfe, D.T; Hermanson, DR. The fraud diamond: Considering the four elements of fraud. CPA J. 2004, 74, 38. 19. Ruankaew, T. Beyond the fraud diamond. Int. J Bus. Manag. Econ. Res. (I]BMER) 2016, 7, 474-476. 20. Christian, N.; Base, Y; Arafah, W. Analysis of fraud triangle, fraud diamond and fraud pentagon theory to detecting corporate fraud in Indonesia. Int. J. Bus. Manag. Technol. 2019, 3, 73-78, 21. Manolopoulos, ¥; Spathis, C, Kirkos, F. Data Mining techniques for the detection of fraudulent financial statements. Expert Syst. Appl. 2007, 32, 995-1003. 22. Mecnatkshi, R, Sivaranjani, K. Fraud detection in financial statement using data mining technique and performance analysis. ICTA 2016, 9, 407-113, 23. AlHlashedi, K.G; Magalingam, P. Financial fraud detection applying data mining techniques: A comprehensive review from 2009 to 2019. Comput. Sci, Rev. 2021, 40, 100402. [CrossRef] 24, Deng, W, Huang, Z.; Zhang, J; Xu, J. A Data Mining Based System For Transaction Fraud Detection. In Proceedings of the 2021 IEEE Intemational Conference on Consumer Electronics and Computer Engineering (ICCECE), Guangzhou, China, 15-17 January 2021; pp. 542-548. 25. Phua, C; Lee, V; Smith, K, Gayler, R.A comprehensive survey of data mining-based fraud detection research. arXio 2010, arXiv’ 10096119. 26. Zhou, X; Cheng, S; Zhu, M.; Guo, C; Zhou, $; Xu, P; Xue, Z.; Zhang, W. A state of the art survey of data mining-based fraud detection and credit scoring. In MATEC Web of Conferences; EDP Sciences: Les Ulis, France, 2018; Volume 189, p. 13002 27, Gupta, S; Mehta, SK, Data Mining-based Financial Statement Fraud Detection: Systematic Literature Review and Meta-analysis to Estimate Data Sample Mapping of Fraudulent Companies Against Non-fraudulent Companies. Glob. Bus. Rev. 2021. [CrossRef] 28. Ngai, EW; Hu, Y; Wong, YH; Chen, ¥; Sun, X. 
The application of data mining techniques in financial fraud detection: A. classification framework and an academic review of literature. Decis. Suppor! Syst. 2011, 50, 559-569, [CrossRef] 29, Yue, D.; Wu, X; Wang, ¥, Li, Ys Chu, CH. A review of data mining-based financial fraud detection research, In Proceedings of ‘the 2007 International Conference on Wireless Communications, Networking and Mobile Computing, Shanghai, China, 21-7 September 2007; pp. 5519-5522. 30, Sasirekha, M, Thaseen, 1S.; Banu, JS. An Integrated Intrusion Detection System for Credit Card Fraud Detection. In Addoances in Computing and Information Technology; Springer: Berlin Heidelberg, Germany, 2012; pp. 85-60. 31. Dyba, T; Kitchenham, B.A, Jorgensen, M. Evidence-based software enginzering for practitioners. IEEE Softw. 2008, 22, 58-65. [CrossRef] 32, Staples, M.;Niazi, M. Experiences using systematic review guidelines. J. Syst, Soft, 2007, 80, 1425-1437, [CrossRef] 33. Kitchenham, B Charters, S, Guidelines for Performing Systematic Literature Reviews in Software Engineering; EBSE 2007-001, Keele University and Durham University Joint Report; Kitchenham: Newcastle, UK, 2007. Available online: https: /citeseer ist psu edu/viewdoc/download?doi=10.1.1.117 471érep=rep léetype~pdf (accessed on I September 2021), Cronin, P; Ryan, F; Coughlan, M. Undertaking a literature review: A step-by-step approach. Br. J. Nuys. 2008, 17, 38-43. [CrossRef] Zhang, H.; Babar, M.A,; Tell, P, Identifying relevant studies in software engineering. Inf. Softw, Technol. 2011, 53, 625-637. [CrossRef] 36. Rouhani, B.D, Mahrin, MN; Nikpay, F; Ahmad, RB, Nikfard, P. A systematic literature review on Enterprise Architecture Implementation Methodologies. In, Softw. Technol. 2015, 62, 1-20. [CrossRef] 37. Li, Ys Peng, R,; Wang, B. Challenges in Context-Aware Requirements Modeling’ Systematic Literature Review. In Proceedings of the Asia Pacific Requirements Engeneering Conference, Melaka, Malaysia, 9-10 November 2017; pp. 
140-155. 38. Hoyer, 8, Zakhariya, H; Sandner, T; Breitner, MH. Fraud prediction and the human factor: An approach to include human behavior in an automated fraud audit. In Proceedings of the 2012 45th Hawaii International Conference on System Sciences, ‘Maui, HI, USA, 4-7 January 2012; pp. 2362-2391, 39, Sinchez, M; Torres, J; Zambrano, P; Flores, P. FraudFind: Financial fraud detection by analyzing human behavior. In Proceedings of the 2018 IBEE 8th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 8-10 January 2018; pp. 281-286, 40. Sandhu, N. Behavioural rd flags of fraud—A qualitative assessment. J. Hum. Values 2016, 2, 221-237. [CrossRef] 41. Mackevitius, J; Giriinas, L. Transformational research ofthe fraud triangle. Ekonomika 2013, 92, 150-168. [CrossRef] 42, Zulaikha, Z; Hadiprajitno, P; Rohman, A; Handayani, R, Effect of attitudes, subjective norms and behavioral controls on the intention and corrupt behavior in public procurement: Fraud triangle and the planned behavior in management accounting Accounting 2021, 7, 331-338, [CrossRef] 43. Omar, NB, Din, LEM. Fraud diamond risk indicator: An assessment ofits importance and usage. In Proceedings of the 2010 International Conference on Science and Social Research (CSSR 2010), Kuala Lumpur, Malaysia, 5-7 December 2010; pp. 607-812. ding the convergent and divergent for
